basic concepts and tools for the toki pona minimal and ... · toki pona (tp) is a minimal conlang...

15
Basic concepts and tools for the Toki Pona minimal and constructed language RENATO FABBRI , Institute of Mathematical and Computer Sciences, University of São Paulo, BR A minimal constructed language (conlang) is useful for experiments and comfortable for making tools. The Toki Pona (TP) conlang is minimal both in the vocabulary (with only 14 letters and 124 lemmas) and in the (about) 10 syntax rules. The language is useful for being a used and somewhat established minimal conlang with at least hundreds of fluent speakers. This article exposes current concepts and resources for TP, and makes available Python (and Vim) scripted routines for the analysis of the language, synthesis of texts, syntax highlighting schemes, and the achievement of a preliminary TP Wordnet. Focus is on the analysis of the basic vocabulary, as corpus analyses were found. The synthesis is based on sentence templates, relates to context by keeping track of used words, and renders larger texts by using a fixed number of phonemes (e.g. for poems) and number of sentences, words and letters (e.g. for paragraphs). Syntax highlighting reflects morphosyntactic classes given in the official dictionary and different solutions are described and implemented in the well-established Vim text editor. The tentative TP Wordnet is made available in three patterns of relations between synsets and word lemmas. In summary, this text holds potentially novel conceptualizations about, and tools and results in analyzing, synthesizing and syntax highlighting the TP language. CCS Concepts: Applied computing Arts and humanities; Document management and text processing; Social and professional topics Cultural characteristics; Additional Key Words and Phrases: Constructed languages, natural language processing, syntax highlighting, wordnet, Toki Pona ACM Reference Format: Renato Fabbri. 2018. Basic concepts and tools for the Toki Pona minimal and constructed language. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 0, 0, Article 00 (July 2018), 15 pages. https://doi.org/0000001.0000001 1 INTRODUCTION Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore, the concepts are usually very general and different, and, without context, the words are not frequently related through meronymy and hyponymy. Such a linguistic setting is desired because of the simplicity which entails facilitated e.g. learning and tool making. Another reason why the minimal language design is compelling is the study and harnessing of the strong and weak forms of the Sapir-Whorf hypothesis (linguistic relativity), i.e. that language dictates or at least influences thought and world experience [21]. Accordingly, one uses a conlang as a thinking tool (or platform) or to make experiments about the influence language has on the thoughts of the ‘speaker’ (also writer and reader). Toki Pona is often described as a tool to meditate, to simplify the thinking processes, and as a way to modify the mood and impressions about the world. [2, 14, 16, 22] In [17], Sonja Lang Description of the language and main issues; analysis of the vocabulary; text synthesis and syntax highlighting; Wordnet synsets This is the corresponding author Author’s address: Renato Fabbri, Institute of Mathematical and Computer Sciences, University of São Paulo, Avenida Trabalhador São-carlense, 400 - Centro, São Carlos, SP, 668, BR, [email protected]. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government. As such, the United States government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for government purposes only. © 2018 Association for Computing Machinery. 2375-4699/2018/7-ART00 $15.00 https://doi.org/0000001.0000001 ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018. arXiv:1712.09359v3 [cs.CY] 4 Jul 2018

Upload: others

Post on 13-Oct-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

Basic concepts and tools for the Toki Pona minimal and constructedlanguage∗

RENATO FABBRI†, Institute of Mathematical and Computer Sciences, University of São Paulo, BR

A minimal constructed language (conlang) is useful for experiments and comfortable for making tools. The Toki Pona (TP)conlang is minimal both in the vocabulary (with only 14 letters and 124 lemmas) and in the (about) 10 syntax rules. Thelanguage is useful for being a used and somewhat established minimal conlang with at least hundreds of fluent speakers. Thisarticle exposes current concepts and resources for TP, and makes available Python (and Vim) scripted routines for the analysisof the language, synthesis of texts, syntax highlighting schemes, and the achievement of a preliminary TP Wordnet. Focus ison the analysis of the basic vocabulary, as corpus analyses were found. The synthesis is based on sentence templates, relatesto context by keeping track of used words, and renders larger texts by using a fixed number of phonemes (e.g. for poems) andnumber of sentences, words and letters (e.g. for paragraphs). Syntax highlighting reflects morphosyntactic classes given in theofficial dictionary and different solutions are described and implemented in the well-established Vim text editor. The tentativeTP Wordnet is made available in three patterns of relations between synsets and word lemmas. In summary, this text holdspotentially novel conceptualizations about, and tools and results in analyzing, synthesizing and syntax highlighting the TPlanguage.

CCS Concepts: • Applied computing → Arts and humanities; Document management and text processing; • Socialand professional topics→ Cultural characteristics;

Additional Key Words and Phrases: Constructed languages, natural language processing, syntax highlighting, wordnet, TokiPona

ACM Reference Format:Renato Fabbri. 2018. Basic concepts and tools for the Toki Pona minimal and constructed language. ACM Trans. AsianLow-Resour. Lang. Inf. Process. 0, 0, Article 00 (July 2018), 15 pages. https://doi.org/0000001.0000001

1 INTRODUCTIONToki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms).Therefore, the concepts are usually very general and different, and, without context, the words are not frequentlyrelated through meronymy and hyponymy. Such a linguistic setting is desired because of the simplicity whichentails facilitated e.g. learning and tool making. Another reason why the minimal language design is compellingis the study and harnessing of the strong and weak forms of the Sapir-Whorf hypothesis (linguistic relativity), i.e.that language dictates or at least influences thought and world experience [21]. Accordingly, one uses a conlangas a thinking tool (or platform) or to make experiments about the influence language has on the thoughts of the‘speaker’ (also writer and reader). Toki Pona is often described as a tool to meditate, to simplify the thinkingprocesses, and as a way to modify the mood and impressions about the world. [2, 14, 16, 22] In [17], Sonja Lang∗Description of the language and main issues; analysis of the vocabulary; text synthesis and syntax highlighting; Wordnet synsets†This is the corresponding author

Author’s address: Renato Fabbri, Institute of Mathematical and Computer Sciences, University of São Paulo, Avenida Trabalhador São-carlense,400 - Centro, São Carlos, SP, 668, BR, [email protected].

ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government.As such, the United States government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to doso, for government purposes only.© 2018 Association for Computing Machinery.2375-4699/2018/7-ART00 $15.00https://doi.org/0000001.0000001

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

arX

iv:1

712.

0935

9v3

[cs

.CY

] 4

Jul

201

8

Page 2: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

00:2 • Fabbri, R.

(the creator of TP), describes that she has seen the language been used successfully in the context of management,creation of texts, legal texts, etc.

This article provides a conceptual overview of the language, which is fit both to the newcomer and to the expertfor being considerably different from what was found in the literature [4, 14, 16], here with emphasis on simplicityand flexibility. Novel software routines for analysis, synthesis, syntax highlighting, and the achievement of atentative TP Wordnet [19] are also herein presented.

Next subsections hold a description of the general resources available, a historical note, and some words aboutnatural and constructed languages. Section 2 describes the TP phonology and syntax. Section 3 presents thesoftware routines we made available and immediate results, such as listings and statistics of words, poems andshort stories, coloring schemes, and preliminary TP Wordnet versions. Conclusions and further work are inSection 4. Appendix A holds considerations about the author’s usage of Toki Pona, with thoughts about rulebreaking and potentially new conlangs. Appendix B holds final words in Toki Pona.

1.1 Resources on Toki PonaOne might categorize current resources for the Toki Pona language in: references and learning material, corpus,websites, interaction groups (where users talk a post texts and comments), and software gadgets. The mainreferences of the language are: the official book “Toki Pona: The Language of Good” [16], authored by SonjaLang, the creator of the language; and the online book “o kama sona e toki pona!”, from jan Pije [14]. A numberof tools and other resources for dealing with Toki Pona were developed by the community. The most completelist is supposedly [3] and includes videos, musical pieces, artistic texts, reference documents (e.g. a Toki Pona -Esperanto dictionary), journalistic articles, and software tools. For a more comprehensive view of the resourcesavailable for the user, we suggest following the links in [3, 32].

1.2 Historical noteTP was developed as an internal and personal language by Sonja Lang in late 1990s [17]. A description wasreleased as a draft in 2001, and in 2007 some documents reported it to have a few hundred speakers. The Englishofficial book [16] was published only in 2014. In 2016, a version of the official book was published in French.Nowadays, one finds a number of texts about TP and written in TP (e.g. in social platforms such as Facebookgroups, microblogging, Telegram and IRC), and other diverse uses of TP e.g. for artificial intelligence and softwaretools [3].

1.3 Natural and constructed and artificial languagesA language one uses (or might use) to communicate by speaking and writing is called ‘natural language’. A‘constructed language’ (also planned or invented language), e.g. Esperanto, Toki Pona, and Lojban, is a naturallanguage built by someone or a group. An ‘artificial language’ is most usually a term used to designate a languageyield by artificial agents, such as by AI routines, or by humans in controlled experiments, and are considered e.g.within ‘cultural evolution’ studies. Formal languages are defined by tokens and rules to operate them, they spanfrom computer programming languages to math and formal models for natural languages.

Constructed languages (aka. conlangs) are in some contexts called artificial languages, but creators most oftenprefer to use the term ‘standardized’, ’constructed’ or ’planned’ language with the argument that the conlangsare rooted on natural languages, and ’artificial’ is thus misleading. The preferred terms seem to be planned orconstructed language or simply conlang. The construction of languages is called glossopoeia. TP is engineered forexperiments, meditation and philosophy; and useful at least as an auxiliary international language and as anartistic language. [29, 30]

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 3: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

Basic concepts and tools for the Toki Pona minimal and constructed language • 00:3

2 OVERVIEW OF THE LANGUAGEThis section describes very succinctly the formation of words and sentences in the Toki Pona language. Itshould enable a newcomer to grasp the essentials of the language and the experienced to acquire new insights.Furthermore, it is a solid reference of the phonological and syntax rules, and enables one to understand andmodify the software presented in Section 3.

2.1 PhonologyWords in Toki Pona are written using only 14 letters:

• Vowels a (open), e (mid front), o (mid back), i (close front), u (close back).• Consonants j, k, l, m, n, p, s, t, w:– Nasal: m (labial), n (coronal).– Plosive: p (labial), t (coronal), k (dorsal).– Fricative: s (coronal).– Approximant: w (labial), l (coronal), j (dorsal).

There are standard guidelines for pronunciation, but the language allows for considerable allophonic variation.For example, /p t k s l/ might be pronounced [p t k s l] or [b d g z !R]. Especially for poetry, one might consider jand w to be vowels (e.g. j as ’i’ and w as ’u’).

Syllables are of the form (C)V(n): an optional consonant, a vowel and an optional coronal nasal consonant (n).Non word-initial syllables must follow the pattern CV(N). These sequences are forbidden: ji, wu, wo, ti, nm, nn.

2.2 Syntax2.2.1 Fundamental notions. As in other natural languages, colloquial TP might have incomplete sentences

and deviate from the norm. The basic structure of sentences is: ‘subject’ (Noun) li ‘predicate’ (Verb) e ‘object’(Noun). The li might be repeated to associate more than one predicate to the subject. The particle li is omitted ifthe subject is a simple mi (I or us) or sina (you). A discussion about issues (potential problems) yield by this lastrule and how the author deals with them is in Appendix A. The particle e might be repeated to associate morethan one object to a predicate. Sentences might be related through la, ’sentence’ la ’sentence’, where the secondsentence is the main sentence, and the first sentence is a condition to the second. Multiple la-s are not describedin literature, but one might assume that the last sentences are conditions to the next, except in cases where thecontext suggests otherwise.

Noun and verb phrases are (usually) built with the non-particle words. The first word is the noun or verb andsubsequent words qualify the noun or verb (i.e. yield adjectives and adverbs). The pi particle might be used toseparate sequences of words to be evaluated before the relation yield by pi. As pi is often ill understood and used,the following structures might be handy for newbies and as a reference:

• No pi, ‘word word word’: word← (qualifies 1) word← (qualifies 2) word.• One pi, ’word pi word word’: word← (qualifies 2) [ word← (qualifies 1) word ].• Two pi-s: ‘word pi word word word pi word word’: word←5 [word 2 word] 3 word←4 word 1 word; or:word←5 [word 1 word] 2 word←4 word 3 word.

Further notes on the usage of pi:

• In a sequence of words, without pi, the second word qualifies the first, the third word qualifies the phraseyield by the first two words, the fourth word qualifies the phrase yield by the first three words and so on.• It is redundant to use pi before the last word in a noun or verb phrase, reason why it is most often omitted. Itsuse in this case is considered an error [14, 16], but, as one might notice, it does not add (much) information

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 4: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

00:4 • Fabbri, R.

through syntax because the order of qualifications is conserved. It adds as an emphasis because of greaterlength of the segment, as a preparation: ’jan lili pi mama’ (mommy’s child). Also used in [8].• The book by jan Pije [14] describes another use for pi: after li to mean possession, e.g. ‘soweli li pi sina’(your pet). This employment of pi might thus be regarded as correct, but is promptly written as a nounphrase (e.g. ‘soweli sina’) and is not mentioned by the official book [16].

All the words except the particles (li, e, la, pi, a, o, anu, en, seme, mu) and the words classified solely asprepositions (kepeken, lon, tan) are usable after any noun or verb phrase, (i.e. 107 words discarding synonyms).Notice that the phrase expresses a noun (in a subject or object) or a verb (in a predicate). And that the first wordof the phrase is the noun or verb, and that subsequent words are (used as) adjectives or adverbs. The pi precedes anoun or a verb to be qualified and then qualify the phrase that comes before it. These are reasonable expectationsand not strict conventions.At this point, the only missing syntax rule is related to the prepositions: kepeken, lon, sama, tan, tawa. They

might appear at the end of a phrase, might also be followed by another phrase, and require no particle. E.g. ‘toki*tan* jan Pije li pana e sona *tawa* mi’. Because prepositions can be used as adjectives or nouns or verbs oradverbs, etc, and might follow any noun or verb phrase, they are one of the four main sources of ambiguity: smallvocabulary; prepositions; absence of li after sina and mi; incomplete sentences. E.g. ’(mi [li]) pana tawa kon e ilopi suli mute’.

2.2.2 Particles. Beyond the structural particles (li, e, la, pi) presented in last section, other particles are:• a or kin, emphasis.• o, vocative or imperative (’jan lukin sitelen o, li wawa’)• taso, means however as sentence or ’only’ if adjective.• anu, en: ’or’ and ’and’. Used for nouns in noun phrases when subject or after pi. For verbs, repeat li. Forobject nouns, repeat e. If the noun is complementing a preposition (e.g. tawa, lon), one might repeat thepreposition. As TP is a recent language, and is particularly able to cope with variation due to its simplicity,one might advocate for using en and anu/en wherever there is no ambiguity. E.g.: ’mi tawa mute anu mokulon tenpo lili’, in the official documentation, might otherwise be regarded as wrong in strict canonical TP.• nanpa, denotes numbering.• seme, for questions, used next to the thing being asked for. ’tan seme la sina pana e sike?’.• mu, for animal noises. It is not only a particle, as in the official dictionary, but also a noun and a verb: ’mipakala e luka. mi mu mute’.

The vocabulary specifies these morphosyntactic classes: nouns, adjectives, verbs, pre-verbs, adverbs, preposi-tions, particles, and numbers. Such classification helps the speaker, specially the newcomer, but it also suggests adeviation from TP usage: the words might be used indistinctly as a noun (the first word of: a subject, a predicatewhen there is no object, an object, or a prepositional complement), an adjective (anything that follows an nounand is not a particle or a preposition), a verb (follows mi or sina or li), or an adverbs (follows a verb). The pre-verbs(wile, ken, awen, kama, lukin, sona), might be followed by a verb, but might also be understood as a verb qualifiedby the next word, which carries a very similar if not identical meaning (’wile moku’ might be understood bothas a pre-verb followed by a verb and as a verb followed by an adverb). The pre-verb words are all have othermorphosyntactic classes, in the official dictionary, such as adjective, noun, and verb. The only exception is wile,which is specified to be only a pre-verb, but is trivially used as a verb: ’mi wile e moku e telo’.

Thus, the classes given in the dictionary dictate little in practice: jan kala li lape lon ni. Where kala, lape and niare in this phrase as adjective, verb and noun, and are in the dictionary as noun, adjective and adjective.

2.2.3 ’li’ as the verb ’to be’. The particle li very often relates phrases as the ver ’to be’. E.g. ’ona li pona’ (thatis beautiful) and ’jan nasin li jan wawa’ (someone with techniques is someone strong).

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 5: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

Basic concepts and tools for the Toki Pona minimal and constructed language • 00:5

2.2.4 Recognizing POS tags by speaker and machine. One might perform a reasonable POS tagging by followingthese rules:

• Noun: the first word in a noun phrase. Starting a sentence if it is complete, after an ’e’, after a pi. Thepredicate is often a noun if the sentence has no object.• Adjective: second word on after an e and after second word on in the subject phrase if present.• Verb: after a li, mi or sina. If there is no object, the predicate might be a noun or a verb.• After a preposition, one might consider that what follows is 1) always a noun, or that 2) it can be a noun,adjective or verb, or that 3) it is a hybrid of noun, verb and adjective. [6]• If there is no object, the predicate might be parsed as being a hybrid of noun, verb and adjective.• Notice that there is ambiguity in the structure introduced by the omission of li after mi and sina. Also,when there is no object, a noun or a verb or an adjective might be in the verb position. These are sources ofsyntactic ambiguities in TP. They might be resolved or minimized considering the semantics of the words.An initial effort in this direction might be using the classes in the official dictionary to resolve ambiguitieswhenever possible. This solution is not optimal for correct POS tagging, and does not solve all possibleambiguities (there are words classified as noun and adjective, adjective and verb, noun and verb, particleand verb). Another source of ambiguity is the pre-verbs as described in the literature [14, 16] and in theend of Section 2.2.2.

2.2.5 Additional notes: synonyms, e ni, names, structure, other. The only synonyms on Toki Pona are: a andkin; lukin and oko; sin or namako; ale or ali. In formations such as toki e ni:, wile e ni:, tan ni: etc., ’(e) ni’ mightbe omitted and : used alone. Names are by default transliterated, but might not be. These and other issues arefurther described in Appendix A. John Clifford (a notable “toki-ponist”, aka. jan Kipo) states that there are notnoun, verb, adjective (or even prepositional) phrases in TP, but only phrases in a structure yield by particles andcontent words [6].Many other considerations might be made about the language, such as additional (deprecated and new)

words, situations where li is used and avoided, canonical and alternative employment of pi, counting systems,associations of TP word sequences to words in English, usage of images to represent TP words and sentences,and best practices in writing TP texts or for translation. For these and other matters, visit [3, 14, 16, 22].

3 SOFTWARE FOR ANALYSIS, SYNTHESIS, AND SYNTAX HIGHLIGHTINGIn this section, we describe software, statistic, natural language processing, and visualization results: 1) theanalysis of the Toki Pona (TP) language through statistics of the vocabulary obtained by processing the officialdictionary; 2) the synthesis of sentences, poems and paragraphs (e.g. short stories) in TP; 3) syntax highlightingfor TP, including fine-tuning and theoretical considerations; 4) an initial Wordnet [19] of TP words. Corpusanalysis was already found in [13]. The Python scripts (i.e. the toolbox) for obtaining all the results in thissection, and more, is publicly available in [10], to facilitate both the inspection of the results and the generationof derivatives by other interested parties. Such scripts are friendly to understand and alter.

3.1 Statistics of the vocabularyThis section focuses on the statistics of the vocabulary and of the syntactic rules: the letters, phonemes, wordsizes, possible combinations for words and sentences. The TPTabFig Python class was used to obtain all themeasurements and tables discussed herein, and hold further measurements and sets of words regarded as lesssuitable for this exposition than for a deeper investigation.[9]As described in Section 2, there are only 14 letters, and phonemes are also very restricted. There are 120

different words in the official vocabulary, 4 of them having synonyms. A total of 124 tokens (or lemmas), not

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 6: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

00:6 • Fabbri, R.

counting proper nouns (names) and punctuation. Table 1 shows the number of words related to each POS tag (ormorphosyntactic class) specified in the dictionary.

Table 1. "POS tags incident and chosen as preferential e.g. in text synthesis. The official dictionary often relates tokens tomore than one POS tag. For the text highlighting Plugin, for example, a token has to have an established tag to have adefined color. On the Chosen column, the tokens were regarded only once by choosing the first occurrence of [’PRE’, ’VERB’,’PREPOSITION’, ’PARTICLE’, ’ADJECTIVE’, ’NOUN’, ’NUMBER’] in the official dictionary.

POS All ChosenNOUN 58 49ADJECTIVE 40 34VERB 15 13PARTICLE 12 12PRE 6 6PREPOSITION 5 5NUMBER 4 1total 140 120

From the official 124 words, 26 of them (20.97%) have only one syllable, 85 (68.55%) have two syllables, and13 (10.48%) have three syllables. No official word has four or more syllables. In all the 124 tokens, there are235 syllables (68 different), and Table 2 exhibits the 10 most frequent syllables in the first, last, middle and allpositions. Middle-word syllables only occur 13 times, and are all different, with the exception of ’la’ which occurstwice. A complete list of the syllables and their frequencies (in different positions) is in the tf.hsyls variableafter executing the basic/tbBasic.py script. A list of all words, grouped by their size in syllables, and orderedalphabetically and by the number of letters, is in the tf.hlsyl__ variable.

Table 2. Frequency of syllables in Toki Pona considering all 235 syllables of the 124 tokens, only the first or last syllables oronly the middle syllables. In parenthesis are the count and percentage of the corresponding syllable. For more informationand a complete list of syllables, see Section 3.1.

rank all first last middle1 li (13, 5.53%) a (8, 6.45%) li (10, 8.06%) la (2, 15.38%)2 la (10, 4.26%) o (5, 4.03%) lo (6, 4.84%) je (1, 7.69%)3 ka (9, 3.83%) pi (5, 4.03%) na (6, 4.84%) ka (1, 7.69%)4 na (9, 3.83%) ka (4, 3.23%) la (5, 4.03%) ke (1, 7.69%)5 pa (9, 3.83%) la (4, 3.23%) ma (5, 4.03%) li (1, 7.69%)6 a (8, 3.40%) pa (4, 3.23%) pa (5, 4.03%) lu (1, 7.69%)7 ma (8, 3.40%) se (4, 3.23%) ka (4, 3.23%) ma (1, 7.69%)8 si (8, 3.40%) si (4, 3.23%) sa (4, 3.23%) me (1, 7.69%)9 lo (7, 2.98%) su (4, 3.23%) si (4, 3.23%) pe (1, 7.69%)10 pi (6, 2.55%) i (3, 2.42%) te (4, 3.23%) ta (1, 7.69%)

Vowel and consonant frequencies, and comparisons of vowels against consonants, are as shown in Table 3for start, end and internal positions. The limited number of consonants favors the language prosody to appearnatural as it resembles babbling.

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 7: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

Basic concepts and tools for the Toki Pona minimal and constructed language • 00:7

Table 3. Frequency of letters in Toki Pona. Columns freq, freq_I, freq_L and freq_M are the frequencies of the letters inany, initial, last and middle positions. The columns ’v’ and ’c’ that follow them are frequencies considering only vowels andconsonants. The most frequent vowel is ’a’ in any position, although it is more salient among words starting with a voweland among the last letter of the words. For starting, ending and middle positions, the second most frequent vowel varies.Among the consonants, ’n’ is the most frequent because it is the only consonant allowed in the last position and becausealmost 20% of the words end with ’n’. On the initial position, ’s’ is the most frequent consonant, while in middle position’l’ is the most frequent consonant. Many other conclusions may be drawn from this table and are useful e.g. for exploringsonorities in poems.

letter freq v c freq_I v c freq_L v c freq_M v ca 16.35 33.19 - 8.06 40.00 - 29.03 35.64 - 14.22 29.46 -e 8.60 17.45 - 2.42 12.00 - 11.29 13.86 - 10.78 22.32 -i 11.53 23.40 - 3.23 16.00 - 20.97 25.74 - 10.78 22.32 -o 7.55 15.32 - 4.03 20.00 - 14.52 17.82 - 6.03 12.50 -u 5.24 10.64 - 2.42 12.00 - 5.65 6.93 - 6.47 13.39 -j 2.10 - 4.13 3.23 - 4.04 0.00 - 0.00 2.59 - 5.00k 6.29 - 12.40 11.29 - 14.14 0.00 - 0.00 6.90 - 13.33l 9.22 - 18.18 12.10 - 15.15 0.00 - 0.00 12.50 - 24.17m 4.61 - 9.09 10.48 - 13.13 0.00 - 0.00 3.88 - 7.50n 10.48 - 20.66 6.45 - 8.08 18.55 - 100.00 8.19 - 15.83p 5.66 - 11.16 11.29 - 14.14 0.00 - 0.00 5.60 - 10.83s 6.29 - 12.40 13.71 - 17.17 0.00 - 0.00 5.60 - 10.83t 3.14 - 6.20 6.45 - 8.08 0.00 - 0.00 3.02 - 5.83w 2.94 - 5.79 4.84 - 6.06 0.00 - 0.00 3.45 - 6.67

Within the rules exposed in Section 2.1 (of 14 letters, 5 vowels, (C)V(n) phonemes, forbidden ji, wu, wo, ti, nn,nm syllables), 96 words are possible with 1 syllable, 8256 with 2 syllables, 710016 with 3 syllables. In the incidentwords of the official dictionary, no middle syllable end with the nasal consonant ’n’.

Possible sentence structures are unlimited. For a noun or verb phrase, one might quantify the possibilitiesby considering all words but the particles that are not classified also as something else (i.e. li, e, la, pi, a, o,anu, en, seme, mu) and the prepositions that are not classified also as something else (i.e. kepeken, lon, tan).This yields 120 − 13 = 107 words. As discussed in Section 2.2, the author advocates that, although words havespecific classes in the dictionary, they might be used indistinctly as nouns, adjectives, verbs and adverbs. Thus,one has 107 possibilities of noun and verb phrases with one word, 1072 = 11, 449 possibilities with two words,1073 = 1, 225, 043 with three words and so on. Be n, v , o, and p the number of words in the subject (noun),predicate (verb), object (noun) and prepositional (see Sections 2.2.4 and 2.2.5) phrases of a sentence, and assumethat the sentence has at most one prepositional phrase, one might quantify the possibilities using the formula:δ = 107n × 107v × 10o × 5 × 107p , where 5 stands for the possible prepositions. To account for the particles,one possibility is to assume for simplicity the use of at most one particle (not li, e, la, pi, resulting in 8 particlesand 9 possibilities) at each phrase, yielding: δ = 107n × 107v × 107o × 5 × 107p × 94. For example, assumen = v = o = p = 1, then δ = 1071 × 1071 × 1071 × 5 × 1071 × 94 = 4.300066 × 10+12, i.e. more then 4 trillionpossible sentences with single word phrases (i.e. no adjectives or adverbs), while allowing only one particle perphrase and only one predicate phrase (e.g. ’ona li moku e soweli lon supa’). [10].

3.2 Synthesis of textSuch counting exercises are also useful for (semi-)automated writing through scripting. The syntax organizes thewords in larger structures. The rhymes are very restricted and sonorities are bounded by the small vocabularyand simple syntax. There are some specific tasks for achieving texts, such as finding the number of syllablesconsidering the elisions, or handling interaction of the writer with the script to choose sentences or verses orstanzas. The package [10], herein presented, also has capabilities for synthesizing TP text. On its most basic level,

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 8: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

00:8 • Fabbri, R.

these routines yield noun, verb and prepositional phrases, and sentences. They also aim at making larger scaletexts by assuring the use of specific words (to entail context), e.g. to assist the creation of short narratives, and byemploying stylistic outlines for poems.

Figure 1 holds some synthesized texts with different color schemes for the syntax highlighting, as presented inthe next section. The textual synthesis described here, and implemented in the TPSynth class, might be enhancedin countless ways, e.g. as described in Section 4.Automatic and randomized synthesis of texts in TP is particularly useful because of the reduced vocabulary

where each word is related to a broad semantic field. Ideas often make sense in unexpected ways, and thus thesynthesis yields a procedure to explore the semantic possibilities within TP. One might object that the resultingtexts are unusual and even consider that they often hold insubstantial or unsound meaning. I advocate that theseunexpected formations are desirable for exploring the possibilities of the language and of thought, and for artisticendeavors. Also, to obtain texts which one finds usual or satisfactory in more strict or personal terms, such aperson might just write them normally or use the synthesized texts as raw material.

3.3 Syntax highlightingThe same package [10] has a Vim [7] syntax highlighting plugin for Toki Pona. The Python scripts hold routinesto arrange the coloring schemes (i.e. to specify which tokens are related to which coloring groups) which arestored in the syntax/tokipona.vim file. An online syntax highlighting gadget is found at [15], and the solutionhere described presents a number of enhancements: it is capable of coloring all morphosyntactic classes; it mightbe fine-tuned in the colors and their relations to sets of tokens; it is designed to be used within Vim and might beexported as HTML; the highlighting scheme is promptly rendered by a script; the resulting script might be usedin combination with virtually any color scheme.

Basically, the resulting syntax highlighting distinguishes the words among the morphosyntactic classes accord-ing to the official dictionary as given in Table 1. Also, some classes might be further defined (e.g. words beginningwith vowels) or joined (e.g. distinguishing only particles against rest, or particles and prepositions against therest). The colors are also promptly changed according to [7] and exemplified in the package documentation,specially the syntax/tokipona.vim syntax file

Currently, the Python package synthesizes the syntax file through theTPSynHigh class. The user has control of class precedence, merging and further details through tweaking suchroutine. The choice of precise coloring schemes involves fine tuning the color scheme being used in Vim (such as’blue’, ’elflord’ and ’gruvbox’), and Vim’s highlighting schemes as described in [7]. The usage of the package andplugin might be performed, in summary, through the following actions:

• Installation of the plugin, so that syntax highlighting of TP texts will be performed in any *.tp or*.tokipona file.• Tweaking of the syntax file by hand. Or• Instantiating the TPSynHigh class and then executing the mkSynHighFile()method of the resulting object.This generates a new tokipona.vim syntax file, at the right location in Vim folders, according to objectsettings.• Write a file with the .tokipona or .tp extension (inside Vim) using Toki Pona words. Reload the high-lighting scheme using :e whenever you change the syntax file by hand or through the Python routines.• Access the used highlighted groupswith :syntax. Access all the highlighting groupswith :so $VIMRUNTIME/syntax/hitest.vimor :hi. Change the coloring of a set of terms by associating a used group (e.g. tpADJECTIVE) to an existinggroup (e.g. Visual) such as in :highlight link tpADJECTIVE Visual. This may also be achieved changingTPSynHigh().colors attribute before executing mkSynHighFile().

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 9: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

Basic concepts and tools for the Toki Pona minimal and constructed language • 00:9

(a) Synthesized phrase.

(b) Synthesized sentence.

(c) Synthesized paragraph.

(d) Synthesized poem.

Fig. 1. Toki Pona texts synthesized by an instantiated TPSynth Python class [10]: (a) a phrase, (b) a sentence, (c) a paragraph,and (d) a poem. The unusual word combinations are convenient for exploring semantic possibilities. One might obtain manytexts to select the excerpts she finds fit for the intended use. For more information, see Section 3.2. The words are colored inaccordance with the considerations in Section 3.3.

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 10: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

00:10 • Fabbri, R.

Such procedures are facilitated by the basic/tpBasic.py. One may change: associations of tokens to HIGs;associations of regex patterns to HIGs; associations of HIGs to other HIGs; whole color schemes or just sectionsof it. The last of these tasks is promptly performed using the space-c-s and space-c-c normal Vim commandssupplied by the color plugin of the PRV framework [7]. The other tasks are more easily performed by scripting Vim,Python import and package source code, depending on the user ability and intended tasks. The use of differentcolor schemes within the same syntax file is exemplified in Figure 1. By copying only the syntax/tokipona.vimand ftdetect/tokipona.vim files, a user might benefit from the standard Vim commands. E.g. :colorschemeblue, :colorscheme solarize, :colorscheme gruvbox or :colorscheme elflord to see the same text coloredwith different color schemes (association of colors to sets of tokens). To interfere directly on the colors chosen,:highlight Normal guifg=#00000 guibg=#0000ff will change the standard foreground (text) color to pureblack and background to pure blue. See :h gui-colors and :h highlight for the way you might edit colorsdirectly. One might obtain an HTML file with the colored TP text using the :TOhtml command.

3.3.1 Advanced syntax highlighting considerations. Standard guidelines for syntax highlighting depend heavilyon cultural and use factors and have scarce scientific studies [1]. There are informed projects such as Solarized [23],which present solutions for some contexts. Strikingly, standard guidelines for syntax highlighting were not found.Therefore, we considered current data visualization theory [20, 25, 27, 28] to glimpse at the potential guidelines:

• The use of blue and other high frequency colors (such as violet) for the background and the avoidance oftheir use to fill small objects, such as letters and words.• Explore simplicity and elegance in the coloring schemes. Most tokens should be preferably of the samecolor, i.e. a small number of colors might be explored to achieve a clean visualization of texts.• A power-law distribution of tokens among colors might be well-suited to mimic natural occurring phe-nomena. The TP dictionary morphosyntactic classification of words resemble a long-tailed distribution: asshown in Table 1, few classes hold the vast majority of words.• Physical stimuli might be related to perceptual stimuli both through a power-law or an exponential law(respectively known as Steven’s law and Weber-Fechner’s law). This might be useful e.g. for coloring setsof tokens considering their similarity, or correctly choosing values for e.g. the hue or luminosity channels.• One usually wishes to maximize contrast, although taste and less wearing combinations might also dictatethe coloring choices.• Stipulate axes of parameters to set colors. Wavelength or frequency is an obvious axis given the considera-tions previously exposed, but they are not perceptually uniform. Considering perceptually uniform colorspaces in making choices may be relevant, such as CIElab and CIEluv.• Consider two types of color schemes: of those with a dark background, which are more comfortable at first;and of those with a light background, which are usually impressive (and even annoying) at first, but the eyeadapts in a few minutes, keeps you more stimulated, and is suitable for bright contexts (e.g. in daylight) .• The blue color is specially related to physiological stimulation of the body [18, 26], and programmers oftenreport that a blue background keeps them awake and more concentrated (the author also notices sucheffect).• Colors have been associated to enhancements in specific tasks, e.g. blue and red are respectively associatedto enhanced creativity and detail-oriented tasks in [18].• It is important to consider that a text editor user might stay many hours using the tool (and they very oftendo), and that the colors and formats involved are thus prone to entail a considerable effect in the body andmental activity, the quality of life, and work results of a writer (e.g. a programmer).• Ideally, one should have facilities for tuning the syntax highlighting (e.g. through keyboard shortcuts) asenvisioned and implemented in [7].

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 11: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

Basic concepts and tools for the Toki Pona minimal and constructed language • 00:11

The considerable irrelevance of the morphosyntactic classes, described in Section 2.2, suggests that an appro-priate coloring of words should either distinguish only between particles, prepositions and the rest, or considerthe syntax to identify the nouns, verbs, adjectives, adverbs, prepositions and particles. Also, the coloring ofpoems might be more appealing if considered e.g. the counting of syllables, the repeated letters or syllables, andthe ending syllables. Further insights might be obtained through the choices made and advocated in softwaresuch as text editors (e.g. Vim, Emacs, Sublime), packages dedicated to syntax highlighting (e.g. pigments1 andhighlight.js2), and other software (e.g. Linux terminals such as Xterm and gnome-terminal). Finally, the carefulchoice of fonts is known to have a relevant impact in comfort and productivity [24], e.g. sans-sheriff fonts aremore promptly read and yield a cleaner text than a serif font, and the same applies to a monospaced font whichis more likely to yield a cleaner text, at least for programming. One might also consider the mapping the textualstructures to sound [12] (i.e. parametrize the synthesis of sounds e.g. by the counting of specific tokens inside ascript, a function or class or around a variable of conditional or loop), which might be called “syntax sonification”.

3.4 Toki Pona WordnetA first Toki Pona Wordnet was constructed relating each of the TP words in the dictionary to English Wordnetsynsets [19] through the English lemmas. The canonical (i.e. Princeton) Wordnet only contains nouns, adjectives,verbs, and adverbs. Thus, particles were not considered. Numbers were considered adjectives. Words presentedas adjectives in the dictionary were considered both as adjectives and adverbs. Prepositions were considered inall classes. [19]

The TPWordnet class [10], provides such tentative TP Wordnets in their simplest form: the TP words are keysin a dictionary that returns the corresponding synsets. Three of such dictionaries are implemented: one where allsynsets related to the lemma is returned (the version most consistent with the expositions in Section 2), with4,027 synsets; another as such but excluding the prepositions, with 3,929 synsets; a last one that returns onlythe synsets registered in Wordnet with the same POS tag as the lemma is in the official dictionary, with 2,462synsets. Another immediate possibility, not implemented, would be to relate each Toki Pona word to the synsetssimultaneously associated to all the English words bounded by a semicolon in the dictionary. Such collection ofsynsets, and its relation to the whole Wordnet (the total number of synsets in Wordnet is 117, 659), might guidecreation of other conlangs (e.g. one might seek to use lemmas related to synsets that are very far apart, or thathas the most complete neighborhood possible).TP words are very general and each of them might mean many things. Thus, the words are not very easily

related e.g. by hyponymy or meronymy. Exceptions: jan (people) is a hyponym of soweli (mammal); kili is anhyponym of kasi; walo, pimeja, jelo, loje, laso are hyponyms of kule. There is at least one caveat: if a particularmeaning of each word is chosen, then there might be many other of such relations, e.g. lawa, luka, noka, sinpin,monsi, lukin, mute, kute, palisa, and lupa might be considered meronyms of sijelo. Antonyms are also speciallyuseful in TP: suno and mun, pona and jaki or ike, sinpin and munsi, lawa and noka, mije and meli, sike and palisa,pana and kama jo, pimeja and walo, weka and poka, sama and ante, ali and ala, anu and e, selo and insa.

4 CONCLUSIONS AND FURTHER WORKThis document presented a potentially novel description of Toki Pona (TP) language in Section 2, and innovativeand useful software routines for dealing with the language in Section 3. As a minimal conlang, TP is veryconvenient for cognitive experiments through linguistic semantics and for devising tools for analysis, creationand visualization, a fact to which this document is evidence. Other conlangs, and even non-planned languagesmight benefit from TP and this content in many ways, e.g. one might synthesize TP texts and translate them as

1http://pygments.org/docs/2https://highlightjs.org/

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 12: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

00:12 • Fabbri, R.

convenient, adapt the routines to obtain statistics of the vocabulary, or take advantage of the language descriptionto devise new conlangs or stylistic guidelines for any language.At the moment, the framework in [9] is being used, by the author, for TP chatter bots and for semantic

explorations of domain knowledge. Some of the many potential next steps should be highlighted:

• The text synthesis facilities described in Section 3.2 might be enhanced by further employing e.g. rhythm,meter, rhyme and form techniques [31].• Syntax highlighting that colors the tokens with respect to the syntactic positions and functions, and not ina fixed manner as is implemented and exposed in Section 3.3, may be implemented by using n-grams andother techniques from Natural Language Processing.• The TP Wordnet(s) described in Section 3.4 should be may be enhanced: implement a NLTK-like Wordnetinterface; double-check if each synset is correctly associated to the TP words; seek further synsets thatwere not found by the English lemmas in the official dictionary; implement the most restricted versiondescribed in Section 3.4, which relates each TP word only to the synsets that are related to all the Englishwords not separated by a semicolon; scrutinize the neighborhood of the synsets of the TP Wordnet in theEnglish Wordnet.• Conlanging: use Wordnet for making new conlangs; use insights derived from TPWordnet. Create conlangsfor specific uses: describing data, programs, scientific writings, creative writing (exploring the thinkingprocess). A conlang might have different modes that are specified by section headers or tags.• Explore TP relation to other languages (e.g. suno and suwi might come from sun and sweet), and thereasons that led Sonja (and maybe other people) to choose the 14 letters and the syllable structure. Thismight require a dedicated communication with the speaker community and the documentation keepers.• Understand how the corpus was gathered in tokipona.net.• Corpus-based analysis.• Publication of original TP texts and translations.• Write academic articles in TP (extending Section B which might be the first section of a scientific documentwritten in TP). The outline might be: a summary in both English and Toki Pona, for facilitated acquisition ofcontext, and an article in a canonical structure (e.g. introduction and related work, materials and methods,results and discussion, conclusions and further work), with an Appendix explaining the content and contextin English. Maybe something about complexity, statistics, physics, or computer science; or linguistics,philosophy, literature or psychology. Partners may learn the language in a few hours or weeks (with orwithout help). This is a very challenging step.• Make TP variations where each TP word is replaced by a word in a language with a large speaker population.For example, suno might be sun in an English variant, and sol in a Portuguese variant. A naive versionmight be obtained through choosing the first word in the description of an official vocabulary, as theyalready exist in English and French.• Explore a minimal conlang to describe software routines. In this sense, it seems advantageous to understandwhat ‘non-ambiguous’ means in Lojban, and in what sense one is able to compile and parse Lojban. Considerthe possibilities of and reasons for parsing TP.

A CURRENT USAGE OF TOKI PONA BY THE AUTHORDisclaimer: this whole section holds personal views and usages of the Toki Pona (TP) language. It is reasonableto use the standard sounds, but often [z] instead of s is more comfortable. Translation into Toki Pona (e.g. ofbiblical excepts), and the creation of new texts as poems and short stories, is both challenging and pleasant. Textsare shared e.g. in [11] and/or published in the TP Facebook groups. The li particle is most often not used aftersubjects sina and mi, in accordance with the norm, but, even so, they are useful when there are many predicates.

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 13: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

Basic concepts and tools for the Toki Pona minimal and constructed language • 00:13

E.g. sina li wawa li pimeja li lukin pona li moku e kasi mute. In such cases, the first li is sometimes omitted. Also,li is found after mi and sina where one finds that there is unwanted ambiguity, e.g. sina moku pona which mightbe ’sina li moku pona’ or ’sina moku pona (li jo e sike)’.

Names are by default transliterated, but one should notice that, as in other languages, names might be used asthey are in the correspondent mother tongue. E.g. the name Erdös is used in English and Portuguese althoughthe standard alphabet does not contain ö in such languages. The use of e.g. English (or German) words in TPtexts is found when needed, as happens often in scientific writing (kernel is a German word used in English,webpage and software are English words used in Portuguese).

Proposed notations for numbers seem numerous. E.g. one might indicate if two numbers are being multiplied(pi) or are in different scales (such as in decimal or binary notations): luka (pi) two mean 52. Most importantly, 52is a reasonable notation for a simple language. E.g. ’mi jo e jan sama nanpa 12’, or ’ona li lon e soweli 27’.

Avoiding needless words... ’e ni:’ may most often be written as ’:’. Subject phrase may be omitted (sometimesalso the li) if it is the same as in the last sentence. One may vary TP by grouping e.g. these words: noka and anpa;luka, suno, sike and lawa; pali and pana.Also, ambiguities introduced by omission of li and absence of a token to denote preposition, suggest the

possibility of a syntax that is always uniquely parsed (or at least less ambiguous).It is tempting to conceive a language with the same syntax of TP: ’sentence’ la ’sentence = subject + predicate

+ object’ (each term with a possible prepositional complement), but always using particles to bound the sentencesections, repeating them when the function is performed in the smallest scale: ’toki pi toki pona pi jan sona’meaning language in toki pona and of intellectuals vs ’toki pi toki pona pi pi jan sona’ meaning language in [tokipona of intelectuals]. The keyword for initiating a preposition complement might be ’a’, but it is already taken inTP. One possibility is to use ’a’ before a preposition and use ’ha’ or ’he’ instead of TP a. In such a setting, onemight enable pi, li, e, a (concept-qualifier, subject-predicate, predicate-object, concept-preposition) in any slot ofthe syntax template. Between sentences one may use: la pi, la li, la e, la a, and la la. This is a very fractal conlangproposal. Maybe also have a way to discern between a noun, a qualifier (adjectives/adverbs) and a verb, andaccept any of them for the subject, predicate, object, preposition, slots. Or assume a part-of-speech as default fora slot or for the first word in a slot.

B FINAL WORDS IN TOKI PONAtoki li nasin e lawa.li nasin tawa (pi) toki insa.toki pona li pona e nasin tan ni:

ona li jo e nimi nanpa lili.ona li pona.pona kepeken weka ’p’ li ona, a.o taso la toki pona li kalama li lukin pona.

sitelen en nasin li open e sitelensuli [3, 5, 8, 11, 14, 16].li sona. li nasin e toki e sona e lawa e lon.toki ni li wawa tawa jan mute nasin en toki.

wawa tawa toki pi jan sona.pi ilo nanpa en nanpa nasin.taso tawa toki sona a.sitelen sona, sitelen musi.

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 14: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

00:14 • Fabbri, R.

ilo lon Poki 3 li pana e sitelene sona tan sitelen,en nasin.Poki 2 li pana e nasin pi toki pona.e sitelen tawa kama sona.mi wile pali e sitelen lon toki pona.

sitelen lon nasin, sitelen sona,sitelen musi.ante e toki ponala ona li nasa.taso nasa li pona mute.li pona mute tawa lawa,tawa kama sona, tawa sitelen e toki.ante la toki pona o.toki e toki pona tawa sina.kama sona e toki pona sina.o pona tawa jan pi toki pona.

tawa jan Sonja, Birns-Sprage, Kipo, Pije,Siwejo, Malija, Tepan, jan kulupu mute.

ACKNOWLEDGMENTSThe author thanks FAPESP (project 2017/05838-3, coordinated by Prof. Dr. Maria Cristina Ferreira de Oliveira); TokiPona, Python and Vim authors, developers and literature maintainers; Toki Pona user and speaker communities;Mario Alzate Lopez for introducing me to Toki Pona, and Silverio Guazzelli Donatti for the stimulus that resultedfrom the usage of Toki Pona for chatting.

REFERENCES[1] Various authors. 2016. Syntax-highlighting color scheme studies. Software Engineering at Stack Exchange. Retrieved July 01, 2018

from https://softwareengineering.stackexchange.com/questions/89936/syntax-highlighting-color-scheme-studies[2] Marek Blahuš. 2011. Toki pona–eine minimalistische Plansprache. Spracherfindung und ihre Ziele. Beiträge der 20 (2011), 51–56.[3] Toki Pona community. 2018. toki pona central hub preparation. Retrieved July 01, 2018 from https://docs.google.com/document/d/

1Dzs-imNeZ8TMgdHUiiungJ4Yf97CJk9ylhQPXjWLsJU[4] Toki Pona community. 2018. Various Memrise courses on Toki Pona. Retrieved July 01, 2018 from https://www.memrise.com/courses/

english/?q=toki+pona[5] Toki Pona community. 2018. Wikipesija. Retrieved July 01, 2018 from http://tokipona.wikia.com[6] John Clifford et al. 2017. o awen, akesi wawa. Retrieved July 01, 2018 from https://www.facebook.com/groups/sitelen/permalink/

1597077093680003/[7] Renato Fabbri. 2017. An anthropological account of the Vim text editor: features and tweaks after 10 years of usage. arXiv preprint

arXiv:1712.06933 (2017).[8] Renato Fabbri. 2017. o awen, akesi wawa. Retrieved July 01, 2018 from https://www.facebook.com/groups/tokiponataso/permalink/

1895942817391318/[9] Renato Fabbri. 2018. PRV: Python, RDF and Vim. Retrieved July 03, 2018 from https://github.com/ttm/prv/[10] Renato Fabbri. 2018. Toki Pona tools and article. Retrieved July 01, 2018 from https://github.com/ttm/tokipona[11] Renato Fabbri. 2018. Toki Sona: a blog dedicated to Toki Pona. Retrieved July 01, 2018 from http://tokisona.github.io/[12] Renato Fabbri, Vilson Vieira da Silva Junior, Antônio Carlos Silvano Pessotti, Débora Cristina Corrêa, and Osvaldo N Oliveira Jr. 2014.

Musical elements in the discrete-time representation of sound. arXiv preprint arXiv:1412.6853 (2014).[13] Jim Henry III. 2018. lipu pi toki pona pi jan Jakopo. Retrieved July 01, 2018 from http://jimhenry.conlang.org/conlang/tokipona/

tokipona.htm[14] Bryant Knight. 2017. o kama sona e toki pona! Retrieved July 01, 2018 from http://tokipona.net/tp/janpije/okamasona.php

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.

Page 15: Basic concepts and tools for the Toki Pona minimal and ... · Toki Pona (TP) is a minimal conlang (constructed language) with only 124 words (120 without the synonyms). Therefore,

Basic concepts and tools for the Toki Pona minimal and constructed language • 00:15

[15] Bryant Knight. 2018. Online gadget for syntax highlighting Toki Pona text. Retrieved July 01, 2018 from http://tokipona.net/tp/DisplayText.aspx

[16] Sonja Lang. 2014. Toki Pona: The language of good.[17] Sonja Lang and Kris Broholm. 2014. AFP 20 – Sonja Lang: Toki Pona, Conlanging and the meaning of life. Retrieved July 01, 2018 from

https://actualfluency.com/afp-20-sonja-lang-toki-pona-conlanging-meaning-life/[18] Ravi Mehta and Rui Juliet Zhu. 2009. Blue or red? Exploring the effect of color on cognitive task performances. Science 323, 5918 (2009),

1226–1229.[19] George Miller. 1998. WordNet: An electronic lexical database. MIT press.[20] Tamara Munzner. 2014. Visualization analysis and design. AK Peters/CRC Press.[21] Sean P O’Neill. 2015. Sapir–Whorf Hypothesis. The International Encyclopedia of Language and Social Interaction (2015), 1–10.[22] Stephan Schneider. 2018. nasin pi toki pona. Retrieved July 01, 2018 from https://github.com/stefichjo/toki-pona[23] Ethan Schoonover. 2011. Solarized: precision colors for machines and people. Retrieved July 01, 2018 from http://ethanschoonover.com/

solarized[24] Avram Joel Spolsky. 2008. User interface design for programmers. Apress.[25] Alexandru C Telea. 2014. Data visualization: principles and practice (2nd. ed.). AK Peters/CRC Press.[26] Antoine U Viola, Lynette M James, Luc JM Schlangen, and Derk-Jan Dijk. 2008. Blue-enriched white light in the workplace improves

self-reported alertness, performance and sleep quality. Scandinavian journal of work, environment & health (2008), 297–306.[27] Matthew O Ward, Georges Grinstein, and Daniel Keim. 2015. Interactive data visualization: foundations, techniques, and applications. AK

Peters/CRC Press.[28] Colin Ware. 2012. Information visualization: perception for design. Elsevier.[29] Wikipedia contributors. 2017. Artistic language — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=

Artistic_language&oldid=808260717 [Online; accessed 3-July-2018].[30] Wikipedia contributors. 2018. Constructed language — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=

Constructed_language&oldid=846512368 [Online; accessed 1-July-2018].[31] Wikipedia contributors. 2018. Poetry — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Poetry&oldid=

847801129 [Online; accessed 1-July-2018].[32] Wikipedia contributors. 2018. Toki Pona — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Toki_Pona&

oldid=846623837 [Online; accessed 1-July-2018].

Received July 2018; accepted XXX XXXX

ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 0, No. 0, Article 00. Publication date: July 2018.