QUITA: Quantitative Index Text Analyzer

Miroslav Kubát, Vladimír Matlach

Department of General Linguistics, Palacký University, Czech Republic

Acknowledgement: QUITA was supported by the student project IGA (no. FF_2013_031) of Palacký University, Olomouc.

oltk.upol.cz/software

Our aim is to provide researchers from various disciplines (linguistics, criticism, history, sociology, psychology, politics, biology, etc.) with a user-friendly tool for quantitative text analysis. QUITA combines the main stages of any quantitative study: obtaining results, statistical testing and graphical visualization. No additional software, such as spreadsheet applications or dedicated statistical programs, is needed.

INDICATORS TO COMPUTE

Frequency structure indicators
o Type-Token Ratio (TTR)
o h-point (h)
o Vocabulary Richness (R1)
o Repeat Rate (RR)
o Relative Repeat Rate of McIntosh (RRmc)
o Hapax Legomenon Percentage (HL)
o Lambda (Λ)
o Gini Coefficient (G)
o Vocabulary Richness (R4)
o Curve Length (L)
o Curve Length Indicator (R)
o Entropy (H)
o Adjusted Modulus (A)
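Several of the frequency structure indicators are simple functions of the rank-frequency distribution. The sketch below illustrates TTR, the h-point, RR, entropy, the Gini coefficient, curve length and Lambda. It is not QUITA's code: the formulas are the ones commonly used in quantitative linguistics and are assumed here, ranks are assigned by descending frequency (ties in arbitrary order), entropy uses a base-2 logarithm and Lambda a base-10 logarithm. All function names are hypothetical. The remaining indicators (R1, RRmc, HL, R4, R, A) are not covered.

```python
# Sketch only: indicator formulas are assumed from the quantitative-linguistics
# literature and may differ in detail from QUITA's implementation.
import math
from collections import Counter


def rank_frequencies(tokens):
    """Type frequencies sorted in descending order; index 0 holds rank 1."""
    return sorted(Counter(tokens).values(), reverse=True)


def type_token_ratio(tokens):
    """TTR = V / N (number of types over number of tokens)."""
    return len(set(tokens)) / len(tokens)


def h_point(freqs):
    """Rank r with r == f(r); otherwise interpolate across the crossing."""
    for r, f in enumerate(freqs, start=1):
        if f == r:
            return float(r)
        if f < r:
            # f(r) - r strictly decreases, so the crossing lies between r - 1 and r.
            r1, f1 = r - 1, freqs[r - 2]
            r2, f2 = r, f
            return (f1 * r2 - f2 * r1) / (r2 - r1 + f1 - f2)
    raise ValueError("every f(r) exceeds r; the h-point is undefined")


def repeat_rate(freqs):
    """RR = sum of squared relative frequencies."""
    n = sum(freqs)
    return sum((f / n) ** 2 for f in freqs)


def entropy(freqs):
    """Shannon entropy H in bits (assumed base-2 logarithm)."""
    n = sum(freqs)
    return -sum((f / n) * math.log2(f / n) for f in freqs)


def gini(freqs):
    """G = (V + 1 - (2/N) * sum(r * f(r))) / V; equals 0 for a uniform distribution."""
    n, v = sum(freqs), len(freqs)
    weighted = sum(r * f for r, f in enumerate(freqs, start=1))
    return (v + 1 - 2 * weighted / n) / v


def curve_length(freqs):
    """L = sum of Euclidean distances between neighbouring points (r, f(r))."""
    return sum(math.hypot(a - b, 1) for a, b in zip(freqs, freqs[1:]))


def lambda_indicator(freqs):
    """Lambda = L * log10(N) / N (assumed base-10 logarithm)."""
    n = sum(freqs)
    return curve_length(freqs) * math.log10(n) / n


if __name__ == "__main__":
    tokens = "a a a a b b b c c d".split()
    freqs = rank_frequencies(tokens)
    print(freqs, type_token_ratio(tokens), h_point(freqs))
    print(repeat_rate(freqs), entropy(freqs), gini(freqs), lambda_indicator(freqs))
```

In the toy text the ranks 2 and 3 straddle the line f(r) = r, so the h-point is interpolated between them rather than read off directly.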

Miscellaneous indicators
o Verb Distances (VD)
o Activity (Q) & Descriptivity (D)
o Writer’s View (α)
o Average Token Length (ATL)
o Thematic Concentration (TC)
o Secondary Thematic Concentration (STC)
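Activity and Descriptivity are commonly defined from the counts of verbs (v) and adjectives (a) in a POS-tagged text as Q = v / (v + a) and D = a / (v + a), so that Q + D = 1. The sketch below assumes Universal Dependencies-style tags (`VERB`, `ADJ`), measures verb distance as the difference between the positions of consecutive verbs, and computes average token length in characters. These are illustrative assumptions rather than QUITA's documented behaviour, and the function names are hypothetical.

```python
# Sketch only: tag names and the verb-distance definition are assumptions.
from statistics import mean


def activity_descriptivity(tags, verb_tag="VERB", adj_tag="ADJ"):
    """Return (Q, D) with Q = v / (v + a) and D = a / (v + a)."""
    v = tags.count(verb_tag)
    a = tags.count(adj_tag)
    if v + a == 0:
        raise ValueError("text contains no verbs or adjectives")
    return v / (v + a), a / (v + a)


def verb_distances(tags, verb_tag="VERB"):
    """Differences between the positions of consecutive verbs."""
    positions = [i for i, t in enumerate(tags) if t == verb_tag]
    return [b - a for a, b in zip(positions, positions[1:])]


def average_token_length(tokens):
    """ATL: mean token length in characters."""
    return mean(len(t) for t in tokens)


if __name__ == "__main__":
    tokens = ["she", "reads", "old", "books", "and", "writes", "short", "poems"]
    tags = ["PRON", "VERB", "ADJ", "NOUN", "CCONJ", "VERB", "ADJ", "NOUN"]
    print(activity_descriptivity(tags))
    print(verb_distances(tags))
    print(average_token_length(tokens))
```

Because Q and D share the denominator v + a, a text can only become more "active" at the expense of being less "descriptive".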

TEXT-PROCESSING

Pre-processing
o Tokenizer (word, line, char, DNA triplet, DNA nucleotide)
o Multilingual lemmatizer (AR, CZ, DE, DK, EN, ES, FI, FR, IT, NL, PT, RO, RU, SE)
o POS tagger (distinguishes the parts of speech in a text)
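The tokenizer units can be approximated in a few lines. This minimal sketch rests on assumptions of ours, not on QUITA's tokenizer: words are maximal runs of Unicode word characters (lowercased, so digits and underscores also count), lines are non-empty stripped lines, characters exclude whitespace, and DNA input is reduced to the letters A, C, G and T before being split into nucleotides or non-overlapping triplets (an incomplete final triplet is dropped). The function `tokenize` and its mode strings are hypothetical. Lemmatization and POS tagging need language-specific models and are not sketched.

```python
# Sketch only: the tokenization rules below are assumptions, not QUITA's.
import re


def tokenize(text, mode="word"):
    """Split text into tokens of the chosen unit."""
    if mode == "word":
        # \w is Unicode-aware for str patterns, so Czech diacritics stay inside words.
        return re.findall(r"\w+", text.lower())
    if mode == "line":
        return [line.strip() for line in text.splitlines() if line.strip()]
    if mode == "char":
        return [c for c in text if not c.isspace()]
    if mode == "nucleotide":
        return [c for c in text.upper() if c in "ACGT"]
    if mode == "triplet":
        bases = "".join(tokenize(text, "nucleotide"))
        # Non-overlapping triplets; a trailing incomplete triplet is discarded.
        return [bases[i:i + 3] for i in range(0, len(bases) - 2, 3)]
    raise ValueError(f"unknown tokenization mode: {mode!r}")


if __name__ == "__main__":
    print(tokenize("Žluťoučký kůň úpěl."))
    print(tokenize("atg cgt aa", "triplet"))
```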

Post-processing
o N-grams (QUITA can create character, word or any other kind of n-grams)
o Text length reduction
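Both post-processing steps work on an already tokenized text. In this sketch, which is an assumption rather than QUITA's implementation, an n-gram is a contiguous window of n tokens joined by a separator (an empty separator over character tokens yields character n-grams), and text length reduction keeps the first `max_tokens` tokens. This is one simple way to make length-sensitive indicators such as TTR comparable across texts. The helper names are hypothetical.

```python
# Sketch only: windowing and truncation rules are assumptions.
def ngrams(tokens, n, sep=" "):
    """Contiguous n-grams over a token sequence, each joined by sep."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def reduce_length(tokens, max_tokens):
    """Truncate a token sequence to its first max_tokens tokens."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    return list(tokens[:max_tokens])


if __name__ == "__main__":
    words = "the cat sat on the mat".split()
    print(ngrams(words, 2))
    print(ngrams(list("text"), 3, sep=""))
    print(reduce_length(words, 3))
```

A sequence shorter than n simply produces no n-grams instead of raising an error.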

STATISTICAL COMPARISON

CREATING CHARTS
