字库词库超文本界面 Character and Word Dictionaries · cui.unige.ch/isi/reports/chinese.pdf


Hypertext Interfaces for Chinese Character and Word Dictionaries

字库词库超文本界面

Claire-Lise Mottaz Jiang 苗清黎

Centre Universitaire d'Informatique
University of Geneva, Switzerland

计算机研究中心, 信息系统系
瑞士, 日内瓦大学

24, rue du Général-Dufour, 1211 Genève 4, Switzerland
[email protected]

Abstract.

For the same dictionary, different interfaces can be designed for different kinds of users. Interface designers and developers should thus be equipped with a tool that allows them to quickly design and test those interfaces. In this paper, I consider a Chinese-English character and word dictionary which is stored in a relational database. The dictionary interface is a hypertext composed of nodes and links. To build this interface I use a database publishing tool called Lazy, which generates a hypertext view (derived hypertext) of a given database. The Lazy system consists of a declarative hypertext view specification language, a node compiler and a node server that processes node requests. Since the language is purely declarative, it is fairly easy to construct and revise hypertext interfaces for the dictionary.

Using Lazy, I first build a construct called an initial structure and explain how it can be refined to produce a more usable interface. Then, I briefly introduce active hypertext links and show how they can be used to create a vocabulary learning tool. The paper concludes with some observations on the use of Lazy as well as some future directions.

1. Introduction

A bilingual dictionary is an essential tool for anyone learning a foreign language. But the same dictionary can be used in many different ways, depending on the user's profile (mother tongue, level of fluency in the foreign language, etc.). In the case of a Chinese-English dictionary, we could for example say that:
• English speakers generally need information about the pronunciation of Chinese characters, and might appreciate being told the radical and number of strokes.
• Chinese (Mandarin) speakers might be mainly interested in the English equivalent of a Chinese word.
• Chinese (non-Mandarin) speakers might be interested in the Mandarin pronunciation in addition to the English equivalent words.
• Some users need Chinese characters in simplified form and others in traditional form.

Apart from the differences in the type of data, there are also various preferences concerning the presentation of data and the interrogation mode, for example:
• Some users prefer to search the dictionary using the pronunciation, others prefer using the radical and the number of strokes.
• An English speaker might want to have the Chinese characters displayed in a large font, to make it easier to distinguish them, whereas Chinese speakers could deal with smaller characters.

When it comes to an electronic dictionary, it is therefore essential to be able to build different interfaces for different kinds of users on top of a single set of data.

In this paper, I will study the construction of a hypertext interface on top of a relational database in which dictionary data is stored. Designing a good hypertext interface is a difficult task. It is thus important to have a tool to build and test this interface quickly, in order to enable an iterative methodology. For this purpose I used a database publishing tool called Lazy.

Lazy follows a declarative approach: it consists in defining a hypertext structure and in specifying how to build the hypertext elements from the database content. Many other Chinese dictionaries on the web (e.g. the Chinese Characters Dictionary Web: http://www.zhongwen.com/zi.htm) use a procedural approach: the web pages are generated (dynamically or not) by a program (CGI programs in C or Perl, Java servlets, PHP scripts, etc.). These programs must contain both tags and programming constructs. They are therefore often difficult to read, and the hyperspace structure is hidden by the programming constructs. For these reasons, such code can be hard to maintain and update. The declarative approach is conceptually simple and tends to be closer to the information designer's conceptual level.

The paper is organized as follows: I will first describe the data which has been used in this project. In the following section, I will briefly present the Lazy system and the concept of hypertext view. After that, I will explain the interface design process, which starts with the construction of an initial structure which is then refined progressively. Then I will introduce active hypertext links and show how they can be used to create a vocabulary learning tool. Finally I will conclude the paper with some comments about this approach and give some ideas for future developments.

2. Data

For this project, two different sources of data are used. Both are stored within the same relational database.

Unicode Data

The character dictionary is made of data coming from the Unicode Standard, version 3.0. The CJK Unified Ideographs block (range: 4E00 - 9FA5) of the Unicode Standard contains various information on 20,902 ideographs: pronunciation in Mandarin, Cantonese, Japanese and Korean, definitions, different encoding information, etc. Here, I chose to keep only eight attributes for each of the 20,902 ideographs (the information that is the most useful for someone learning Chinese), knowing that the approach presented in this paper would remain the same with a bigger or smaller set of information. The relation (table) character described below is used to store information about those ideographs, with one tuple for each ideograph.

relation CHARACTER

attribute name       explanation
unicode              primary key; unicode scalar value (4 hex digits); example: 女 → 5973
definition           English definition of the character
radical              id of radical (KangXi); foreign key: radical(number)
additionalStrokes    number of strokes apart from the radical; example: 苗 → 艹 + 5 strokes
totalStrokes         number of strokes, radical included
mandarin             Mandarin pronunciation in pinyin
simplifiedVariant    same character written in simplified version *
traditionalVariant   same character written in traditional version *

* those attributes are null if there is no difference between the simplified and the traditional versions.


In addition to the ideographs, the Unicode Standard 3.0 also contains a block (range: 2F00 - 2FDF) dedicated to the KangXi radicals. Information from this block is stored in the relation radical.

relation RADICAL

attribute name   explanation
no               primary key; id of radical (KangXi)
name             name of radical in English
unicodeRad       unicode scalar value for radical *
unicodeChar      unicode scalar value for corresponding char. *

* KangXi radicals are classified both as radicals (range 2F00 - 2FDF) and as characters (range 4E00 - 9FA5).
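The double classification of the KangXi radicals can be illustrated with a minimal Python sketch. It is not the procedure used to fill the relation radical in this project (the paper does not describe it); the function name radical_row is hypothetical, and the sketch relies on Python's unicodedata module, which reflects a later version of the Unicode Standard than 3.0. It assumes that radical number n sits at code point 2F00 + n - 1 and that its compatibility decomposition gives the corresponding character.

```python
import unicodedata

KANGXI_BASE = 0x2F00  # first code point of the Kangxi Radicals block


def radical_row(no: int) -> tuple[int, str, str, str]:
    """Build a (no, name, unicodeRad, unicodeChar) tuple for radical number `no`."""
    if not 1 <= no <= 214:
        raise ValueError("KangXi radical numbers run from 1 to 214")
    rad = chr(KANGXI_BASE + no - 1)
    # Character names look like "KANGXI RADICAL WOMAN"; keep only the last part.
    name = unicodedata.name(rad).removeprefix("KANGXI RADICAL ")
    # The compatibility decomposition maps the radical to its unified ideograph.
    char = unicodedata.normalize("NFKC", rad)
    return (no, name, f"{ord(rad):04X}", f"{ord(char):04X}")
```

For radical number 38, for instance, the function pairs the radical code point 2F25 with the character code point 5973 (女), the same value used as an example for the attribute unicode of the relation character.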

CEDICT

In addition to the Unicode data, a Chinese-English word dictionary has been stored in the database. This is the CEDICT dictionary (a public domain Chinese-English dictionary, made by Paul Denisowski and Erik Peterson, available at: http://www.mandarintools.com) which contains 23,495 entries. An entry corresponds to a Chinese word or expression, its pronunciation in Mandarin pinyin and an English translation. The original CEDICT files are encoded in GB and Big5. Those files were first converted into Unicode (using the "Chinese Encoding Converter" from Erik Peterson). Then a program was written that retrieves the unicode scalar value of each character. This last conversion (Unicode → unicode scalar value) was necessary to establish links between the words of the CEDICT and the character information from the Unicode database.

The CEDICT dictionary has been stored in the following relations :

relation WORD

attribute name   explanation
wordId           primary key; id of word (1..23,495)
pinyin           Mandarin pronunciation in pinyin
english          English translation

relations COMPJIANTIZI and COMPFANTIZI * (here "comp" is a short name for composition)

attribute name   explanation
wordId           foreign key: word(wordId)
unicode          unicode scalar value of character
position         position of character in the word

* there are two tables so that both the simplified and the traditional versions can be stored.

Example: to store the word "程式语言" (word number 3808) in the database,

  程 7A0B    式 5F0F    语 8BED    言 8A00

one must have the following tuples:

  word(3808, 'cheng2 shi4 yu3 yan2', 'programming language')
  compJiantizi(3808, 7A0B, 1)
  compJiantizi(3808, 5F0F, 2)
  compJiantizi(3808, 8BED, 3)
  compJiantizi(3808, 8A00, 4)
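The conversion from a Unicode string to composition tuples can be sketched in a few lines of Python. This is only an illustration of the idea, not the program written for the project; the function name decompose is hypothetical.

```python
def decompose(word_id: int, word: str) -> list[tuple[int, str, int]]:
    """Return one (wordId, unicode, position) tuple per character of `word`."""
    return [
        # ord() gives the scalar value; it is written as 4 uppercase hex digits.
        (word_id, f"{ord(ch):04X}", position)
        for position, ch in enumerate(word, start=1)
    ]
```

Called as decompose(3808, "程式语言"), it yields the four compJiantizi tuples listed in the example above.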


Remark: in MS Internet Explorer 5.5, a character can be displayed by directly using its unicode scalar value in the HTML code. The syntax is &#xUUUU; where UUUU is the unicode scalar value.
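As a small illustration of this syntax, the following Python sketch builds such a reference from a stored scalar value and decodes it again with the standard html module. The helper name char_ref is hypothetical.

```python
import html


def char_ref(scalar_value: str) -> str:
    """Build the HTML numeric character reference for a hex scalar value."""
    int(scalar_value, 16)  # raises ValueError if the value is not hexadecimal
    return f"&#x{scalar_value};"
```

For the value 5973 this gives &#x5973;, which html.unescape turns back into the character 女.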

3. The Lazy System

In this paper, a dictionary is viewed as a hypertext that consists of nodes and links. Lazy is a database publishing tool (hypertext view generation system) that makes it possible to generate such a hypertext from a database.

The Lazy system consists of:
• a declarative hypertext view specification language
• a node compiler that checks the syntax of node definitions and stores the node definitions (in a coded form) in the data dictionary
• a node server (a Java servlet) that receives node requests from clients (browsers); loads the appropriate node definitions; executes database queries to build the node contents; and sends the resulting Web pages to the clients.

A hypertext view is a set of nodes and links that represent (a part of) the contents of a database. In the declarative approach, the hypertext components (nodes and links) are derived from the database content (relation tuples) according to a hypertext view specification, as shown in figure 1.

figure 1: generation of a hypertext view from a database and a hypertext view specification

Since the language is purely declarative, it is fairly easy to construct and revise different hyperspaces to represent the dictionary. With this tool it then becomes possible to adopt an iterative design methodology.

This section presents the Lazy hypertext view specification language. The presentation is rather informal and is based on examples.

A hypertext view specification consists of a set of node schemas. Each node schema specifies the relation (table) from which the node's content is to be drawn; the elements that form the node content; the selection and ordering criteria; and links to other nodes. A node definition takes the following form:

  node <node-name> [ <parameters> ]
    <element-list>
  from <relation>, ...
  selected by <expression>
  order by <expression>


An element of an element-list is either an expression of the form '<' type '>' '(' <element-list> ')' or a <simple-expression>. A simple expression may involve literal constants (string, integer, etc.), attribute names, parameter names, operators and functions.

Example: a node that displays all the characters with a given number of strokes, with pronunciation and English definition.

  node ideographWithStrokes[numberOfStrokes]
    <h1>("characters with " , numberOfStrokes, " strokes"),
    {
      <p> (
        <font size="+5" face="Ms Song">( "&#x" , unicode, "; " ),
        mandarin, " ", definition
      )
    }
  from character
  selected by totalstrokes = numberOfStrokes
  order by unicode

Elements enclosed in curly brackets '{' and '}' are repeated for each selected tuple, whereas other elements appear only once.

figure 2: simple node without links

Lazy shares many similarities with the "mail merge" functionality of MS Word (a mail merge is typically used to do a mass mailing). In Word, data, for example customer contact details, must be stored in a table. Then the user has to create a template document that contains text and presentation elements, as well as attributes of the table. When the merge operation is executed, a new document is created by repeating the template for each row of the table; for each repetition, the attribute name is replaced by its value in the current row of the table.
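The repetition rule for elements in curly brackets, and the mail merge analogy, can be sketched in Python with an in-memory SQLite table. This is not how the Lazy node server is implemented (it is a Java servlet whose internals are not described here); it only mimics what the node ideographWithStrokes produces. The function names and the three sample rows are illustrative assumptions.

```python
import sqlite3

# Illustrative sample rows only; the real table holds 20,902 ideographs.
SAMPLE = [
    ("4EBA", "man, person", 2, "ren2"),
    ("5927", "big", 3, "da4"),
    ("5973", "woman", 3, "nü3"),
]


def sample_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the relation character."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE character (unicode TEXT PRIMARY KEY, definition TEXT, "
        "totalStrokes INTEGER, mandarin TEXT)"
    )
    conn.executemany("INSERT INTO character VALUES (?, ?, ?, ?)", SAMPLE)
    return conn


def ideograph_with_strokes(conn: sqlite3.Connection, number_of_strokes: int) -> str:
    """Render the page: the heading once, then one <p> per selected tuple."""
    rows = conn.execute(
        "SELECT unicode, mandarin, definition FROM character "
        "WHERE totalStrokes = ? ORDER BY unicode",
        (number_of_strokes,),
    ).fetchall()
    parts = [f"<h1>characters with {number_of_strokes} strokes</h1>"]
    for unicode, mandarin, definition in rows:  # the part in curly brackets
        parts.append(
            f'<p><font size="+5" face="Ms Song">&#x{unicode}; </font>'
            f"{mandarin} {definition}</p>"
        )
    return "\n".join(parts)
```

The heading corresponds to an element outside the curly brackets and is emitted once; the loop body plays the role of the template that is repeated for each row.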


The node definition language supports three kinds of links:

reference: A reference link creates an active element whose action (when activated by a mouse click) consists in jumping to (opening) the referred node. A link specification refers to a node through its identity (schema name together with parameter values). Reference links correspond to the traditional links that are found in HTML pages.

include: An inclusion link creates a compound-component relationship between two nodes. The content of the included node is a part of the content of the parent node.

expand in place: An expand-in-place link is an inclusion link that defers the inclusion until the user activates the link. The content of a node with expand-in-place links will thus depend on user actions taken so far.

The dictionary interface development cycle consists in writing or editing source files that contain node schemas (node definitions); compiling these source files; and viewing (testing) the newly defined nodes in a Web browser. Since the system is dynamic, once a node definition has been modified and recompiled, the new version is immediately available to the clients (there is no site generation phase).

Every page that is viewed by a user is an instance of a node schema; thus any design problem can be easily located (as opposed to procedural approaches in which the same procedure may be used to generate several different Web pages).

4. Interface Design

In this section, I explain how one can construct an initial hypertext structure that reflects the structure of a given database and how that initial structure can be modified through some refinement operations.

Designing efficient and effective hyperspaces is a difficult task, probably because there is an extremely large number of paths that users can follow. It is thus difficult to ensure that the users will be able to reach any information node, that they will not get lost or disoriented in the hyperspace, that any information can be reached within a reasonable amount of time/number of clicks, etc. Since here the hypertext view is built on top of an existing database, there is already a conceptual schema (the database schema), declared by the relation schemes and the integrity constraints such as foreign key constraints. This schema shows the type of entities that are being considered and some semantic relationships (materialized by foreign key constraints) between these entities. However, a database schema is not sufficient to create good hyperspaces. Database design and hypertext design do not have the same objectives. Hypertext is meant to be used directly by humans; databases are most often accessed through various application software. In databases, information redundancy tends to be minimal, for example to avoid update difficulties. In hypertext, redundancy is often useful, to make reading easier. If one relies on the database schema at the semantic level, it is nevertheless possible to create a hypertext structure that is efficient for reading and navigating. The proposed design method for hypertext views proceeds in two phases: 1) define a first hypertext structure based on the database schema; 2) refine this structure by applying various operations to the specifications of nodes and links.


Initial structure

For the construction of an initial structure, it is assumed that the database schema is given and fixed. One obtains the initial structure by defining a node schema for each relation of the database. An instance of such a schema is intended to represent a single tuple of the relation. The node parameters correspond to the key of the relation. The contents of the node items are formed of all the relation's attributes. Links are formed by attributes or groups of attributes that refer to other relations (foreign keys). For instance, the initial node schemas corresponding to the relations character and radical would be:

  node Character[unicodeScalarValue]
    unicode, definition, radical, additionalStrokes, totalStrokes,
    mandarin, simplifiedVariant, traditionalVariant,
    href Radical[radical] (radical)
  from character
  selected by unicode = unicodeScalarValue

  node Radical[number]
    no, name, unicodeRad, unicodeChar
  from radical
  selected by no = number

figure 3: two nodes from the initial structure (with some presentation improvements): the initial node for table character (node parameter: unicode scalar value) and the initial node for table radical (node parameter: radical number)
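Because the initial structure follows mechanically from the database schema, its derivation can be sketched as a small generator. The Python code below is an assumption about how such a derivation could look, not a feature of Lazy; the class Relation and the function initial_node are hypothetical names, and the output only imitates the layout of the node schemas shown above.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    name: str
    key: str
    attributes: list[str]
    # Maps a foreign-key attribute to the name of the relation it refers to.
    foreign_keys: dict[str, str] = field(default_factory=dict)


def initial_node(rel: Relation, param: str) -> str:
    """Write the initial node schema for one relation as Lazy-like text."""
    # All attributes form the node content ...
    elements = [", ".join(rel.attributes)]
    # ... and each foreign key becomes a reference link to the target node.
    for attr, target in rel.foreign_keys.items():
        elements.append(f"href {target.capitalize()}[{attr}] ({attr})")
    return "\n".join([
        f"node {rel.name.capitalize()}[{param}]",
        "  " + ",\n  ".join(elements),
        f"from {rel.name}",
        f"selected by {rel.key} = {param}",
    ])
```

Applied to a description of the relation character with the foreign key radical, the generator produces a node schema with the same reference link, href Radical[radical] (radical), as the hand-written one.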

With the initial structure, it is possible to visualize all the information of the database, but it is still not sufficient to create a good dictionary interface. This structure must be refined. Some possible refinements are explained below.


Refinements

Node inclusion

When the user has to click several times to view two closely related pieces of information, there is a risk that (s)he has forgotten the first piece by the time (s)he reaches the second one. A way to reduce the number of navigation steps is the inclusion of nodes: instead of jumping from a first node to a second one, the second is included in the first. All the content of the two nodes is then displayed on the same page.

Example: nodes for word (wordId, pinyin, english) and compJiantizi[wordId]

  node Word[id]
    href WordInChinese[id] ("in chinese"),
    wordId, pinyin, english
  from word
  selected by wordId = id

  node WordInChinese[id]
    { character }
  from compJiantizi
  selected by wordId = id
  order by position

  node Word[9748]          node WordInChinese[9748]

  in chinese               界面
  9748
  jie4 mian4
  interface

figure 4: two nodes with a reference link

If we use include WordInChinese[id] instead of href WordInChinese[id] ("in chinese"), we would have:

  node Word[9748]

  界面
  9748
  jie4 mian4
  interface

figure 5: two nodes combined with an inclusion link

Remarks: The included node can itself include another node. A node can include any number of nodes.
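The difference between the two link kinds can be made concrete with a minimal Python sketch of the two nodes. It is only an illustration: the functions word and word_in_chinese are hypothetical, the anchor URL is an invented placeholder (the way Lazy addresses nodes is not shown in this paper), and the code points for 界 and 面 come from the Unicode tables, not from the text above.

```python
# Sample data for word number 9748, as in figures 4 and 5.
WORD = {9748: ("jie4 mian4", "interface")}
# (wordId, unicode, position); deliberately stored out of order.
COMP_JIANTIZI = [(9748, "9762", 2), (9748, "754C", 1)]


def word_in_chinese(word_id: int) -> str:
    """Node WordInChinese[id]: the word's characters, ordered by position."""
    rows = sorted((pos, code) for wid, code, pos in COMP_JIANTIZI if wid == word_id)
    return "".join(chr(int(code, 16)) for _, code in rows)


def word(word_id: int, include: bool) -> list[str]:
    """Node Word[id], with either an inclusion link or a reference link."""
    pinyin, english = WORD[word_id]
    if include:
        # Inclusion link: the other node's content becomes part of this page.
        first = word_in_chinese(word_id)
    else:
        # Reference link: only an anchor is shown (hypothetical URL form).
        first = f'<a href="WordInChinese?id={word_id}">in chinese</a>'
    return [first, str(word_id), pinyin, english]
```

With include set, the page for word 9748 lists 界面, 9748, jie4 mian4 and interface, as in figure 5; otherwise the first item is only an anchor, as in figure 4.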

Creation of index nodes

In the previous examples, the selection of a character, a radical or a word was always done with a numeric parameter. However, a dictionary user must be given other entry points: those numeric values are very useful from a technical viewpoint, but they usually have no meaning for a user (except in some cases, the number of the KangXi radical). So, as one cannot assume that the user will know which value corresponds to which character or word, we have to provide them with indexes.

In the case of the character dictionary, one very useful index is the list of KangXi radicals. An index is a node that displays all the possible values for an attribute. To create a compact radical index, one can use the following node definition:


  node radicals
    <h1>( <font face="arial" size="+2">( "KangXi Radicals" ) )
    {
      <font size="+2" face="Ms Song">(
        href CharWithRadical[no] ("&#x", unicodechar, ";")
      )
    }
  from radical
  selected by 1=1
  order by no

When one clicks on a radical, it opens the node CharWithRadical[no], which displays all the characters with this radical. The selection criterion is always true, so every tuple in the relation radical is selected. A similar index could be created for the number of strokes, because for this particular attribute, the values for the characters stored in the DB only range from 1 to 48. The number of items in the index is then small enough to be displayed "correctly".

Using the LIKE operator of SQL (+ the concat function)

In addition to the index of radicals, users must be given the possibility of searching the character and the word dictionaries using the pinyin or the English translations. So new nodes that select characters and words by their pronunciation or definition have to be added (for example: characterWithPinyin[pinyin] and characterWithEnglish[word]). However, two problems appear at that point:
• As characters may have several different pronunciations depending on the context, it is very difficult to find the exact parameter for the node characterWithPinyin[pinyin] if the selection part of the node is selected by mandarin=pinyin. For example the character 叔 can be pronounced jia4, shu1 or xia2. So, it is very unlikely that the user will be able to write the complete parameter (especially if (s)he is a beginner). It is much more probable that (s)he would want to get a list of all the characters which are pronounced "shu1" and then browse through the list to find the appropriate one. The problem is the same with characterWithEnglish[word]: one would like to know the characters whose definition contains a word, without having to write the exact definition as a parameter.
• Creating an index for the attributes mandarin or english (for the table character) is not realistic either: the list would be much too long to remain practical.

The easiest and most straightforward solution is then to use the LIKE operator of the SQL language in the selection part of the considered nodes.

Example:

  node CharacterWithPinyin[pinyin]
    <p>( "&#x", unicode, ";" ),
    <p>( definition ),
    <p>( mandarin )
  from character
  selected by mandarin like concat("%", concat(pinyin, "%"))

CharacterWithPinyin[miao] corresponds to the SQL query:

  select '&#x', unicode, ';', definition, mandarin
  from character
  where mandarin like '%miao%'

It selects all the characters whose pronunciation matches the pattern %miao%, where % represents zero or more characters.
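The same selection can be tried out with a short Python sketch on an in-memory SQLite table. It is only an illustration under stated assumptions: the function names and the three sample rows are invented for the example, and SQLite's || operator stands in for the concat function used in the node definition.

```python
import sqlite3

# Illustrative rows only: (unicode, definition, mandarin).
SAMPLE = [
    ("82D7", "sprouts", "miao2"),
    ("5999", "wonderful", "miao4"),
    ("5973", "woman", "nü3"),
]


def sample_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the relation character."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE character (unicode TEXT PRIMARY KEY, definition TEXT, mandarin TEXT)"
    )
    conn.executemany("INSERT INTO character VALUES (?, ?, ?)", SAMPLE)
    return conn


def character_with_pinyin(conn: sqlite3.Connection, pinyin: str) -> list[tuple]:
    """Select the characters whose mandarin attribute contains `pinyin`."""
    return conn.execute(
        "SELECT unicode, definition, mandarin FROM character "
        "WHERE mandarin LIKE '%' || ? || '%' ORDER BY unicode",
        (pinyin,),
    ).fetchall()
```

Searching for "miao" returns both sample characters pronounced miao2 and miao4, whereas "miao2" narrows the result to one. Note that a parameter containing % or _ would itself be interpreted as a wildcard.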


figure 6: example of navigation, starting with the list of radicals: 1. list of radicals; 2. characters corresponding to a given radical; 3. detailed info on a character; 4. list of words which contain the character


figure 7: hypertext view structure after several refinement steps.


Remarks:

With the current database schema, the links represented with dotted lines cannot be realised. For example, the node CharWithPinyin uses a LIKE operator in its selection part. If one uses "xiao4" as parameter, it will select characters whose mandarin attribute has values such as "xiao4", "xiao4, ye2", "xiao1, xiao4", etc. There is no immediate way to retrieve xiao4 from those values, so the reverse link cannot be established. To overcome this shortcoming, it would be necessary to decompose the mandarin attribute in the same way as a word is decomposed into characters. For this purpose, one should have an additional table compMandarin(unicode, pinyinElement). For the character 爷 it would give the tuples:

  character(7237, 'father, grandfather', …)
  compMandarin(7237, 'xiao4')
  compMandarin(7237, 'ye2')
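The proposed decomposition can be sketched in Python as follows. The sketch assumes that the readings in the mandarin attribute are separated by commas, as in the values quoted above; the function name decompose_mandarin is hypothetical.

```python
def decompose_mandarin(unicode: str, mandarin: str) -> list[tuple[str, str]]:
    """Split a comma-separated mandarin value into (unicode, pinyinElement) tuples."""
    return [(unicode, part.strip()) for part in mandarin.split(",") if part.strip()]
```

With one tuple per reading, a node could select on pinyinElement = pinyin with an exact match, and each displayed reading could carry a reverse link to the list of characters sharing it.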

Among the nodes that were part of the initial structure, some have been kept without modification (except for the presentation) whereas some have been removed from the hypertext view structure. For example, Character[unicode] still exists because it may make sense to a user: the unicode scalar value is not an artificially created value; it has been fixed by the Unicode Consortium. There is thus a way to know which value corresponds to which character (the node radical[no] has been kept for a similar reason). On the contrary, the node Word[wordId] has been removed because the value of the parameter has no meaning for the user. The number of a word has been assigned arbitrarily and there is no way for the user to know which number corresponds to which word.

5. Using Active Hypertext Links

The latest version of Lazy introduces a very useful new feature: active hypertext links. An active link is a "reference" link which, when executed, in addition to making a jump from one node to another node, triggers an operation (insert, update, delete) on the database. This operation is carried out on only one tuple at a time. Active links are used for the two applications described below.

Vocabulary lists

For people learning a foreign language, it is common to work with vocabulary lists. The most common types of vocabulary lists are:
• thematic lists (for example numbers, the human body, vehicles, family titles)
• lists for each lesson of a coursebook
For Chinese, one could also have:
• lists of characters with a given radical or phonetic element
• lists of words containing a given character (combinations)

These lists can be created during the navigation through the hypertext interface of the dictionary, with the help of active links. For this purpose, two new relations (tables) have been added to the database:

  CharCollection(unicode, collection)
    the character with unicode belongs to collection

  WordCollection(wordId, collection)
    the word with wordId belongs to collection

Then, some nodes which were previously only used for navigation have been modified as follows.

Example: in the node CharWithRadical[no] (figure 9)

  active href Character[unicode] (
    "put in collection : <br>" ,
    set collection = textfield(15),
    set unicode = unicode,
    on "ok" do insert charCollection
  )

The last element creates a button anchor with the label "ok".

When the node CharWithRadical is displayed in the browser, the user can put any of the characters in a collection by simply writing the name of the collection in the text field and clicking on the button "ok". When the click on the "ok" button occurs, the node Character is opened and, at the same time, a tuple is inserted in the relation charCollection (with the values unicode = unicode scalar value of the displayed character and collection = what has been typed in the text field).

Such a link is added to several nodes, in order to give the possibility to add characters and words to collections from various places in the hypertext interface.

Annotations

With a paper document (paper dictionary or a course book) it is easy to make handwritten annotations, for instance an example sentence. Here, we can use an active link (update) for the same purpose. An attribute "note" has been added to the relations character and word. Then, using a very simple node, the user can type a note that will be stored in the database.

Example: in the node editnote[unicode] (figure 10)

  active href Character[unicode] (
    <p>( "note : " ),
    <p>( set note = textarea(10,40, note) ),
    <p>( on "ok" do update character[unicode] )
  )

When one clicks on the "ok" button, the effect will be to open the node Character and, at the same time, update the attribute note in the relation character.
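The effect of the two active links described in this section can be sketched in Python with an in-memory SQLite database. This is a rough analogy, not the Lazy implementation: the function names are hypothetical, the tables are reduced to the attributes needed here, and the returned string merely stands for the identity of the node that is opened after the operation.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with the tables used by the active links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE character (unicode TEXT PRIMARY KEY, note TEXT);
        CREATE TABLE charCollection (unicode TEXT, collection TEXT);
        INSERT INTO character (unicode) VALUES ('5973');
        """
    )
    return conn


def put_in_collection(conn: sqlite3.Connection, unicode: str, collection: str) -> str:
    """Active link with an insert: add one tuple, then jump to Character[unicode]."""
    conn.execute("INSERT INTO charCollection VALUES (?, ?)", (unicode, collection))
    return f"Character[{unicode}]"  # identity of the node to open next


def edit_note(conn: sqlite3.Connection, unicode: str, note: str) -> str:
    """Active link with an update: change one tuple, then jump to Character[unicode]."""
    conn.execute("UPDATE character SET note = ? WHERE unicode = ?", (note, unicode))
    return f"Character[{unicode}]"
```

Each function touches a single tuple and then names the node to display, which mirrors the rule that an active link performs one database operation together with one jump.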

figure 8: hypertext view structure for vocabulary lists and annotations (the link labels in the diagram are insert, update and remove)


figure 9: management of a vocabulary list: 1. node displaying the list "family"; 2. during navigation, add a character to the list; 3. list after insertion; 4. list after removing a character


1. display detailed information about a character 2. creation of a personal note (update link) 3. back to information about the character

figure 10: annotations

6. Flashcards

Once a user has filled a vocabulary list, (s)he will want to memorize it. A practical way to learn vocabulary is to use a flashcard system. Flashcards can be made with small pieces of paper with the word written in English on one side and in Chinese on the other. Here flashcards are made with a node which uses "expand in place" links (these are similar to inclusion links, but the content of the included node is not displayed until the user clicks on the anchor). We suppose here that flashcards are used on only one vocabulary list at a time (for this example, we will have flashcards on characters only, but the same can of course be done for words). First, a program fetches all the characters of the vocabulary list, puts them in random order in the table flashcard(unicode, order) and assigns each of them an order number. Then we have a node which displays only the equivalent of one side of the paper card; the other side of the card is displayed only after one (or several) clicks, to check the answer. The node contains a reference link to itself, but for a different instance: the parameter, which represents the card number, is incremented.
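The preparation program is not listed in the paper. A minimal Python/SQLite sketch of the step described above could look as follows. The relations charCollection and flashcard(unicode, order) come from the text; the column types, the function names build_flashcards and next_card, the seed parameter and the sample list are assumptions made for illustration.

```python
import random
import sqlite3

def build_flashcards(conn, collection, seed=None):
    """Copy the characters of one vocabulary list into flashcard in random order."""
    rows = conn.execute(
        "SELECT unicode FROM charCollection WHERE collection = ?", (collection,)
    ).fetchall()
    codes = [row[0] for row in rows]
    random.Random(seed).shuffle(codes)
    conn.execute("DELETE FROM flashcard")  # only one list is used at a time
    conn.executemany(
        'INSERT INTO flashcard (unicode, "order") VALUES (?, ?)',
        [(code, number) for number, code in enumerate(codes, start=1)],
    )
    conn.commit()
    return len(codes)

def next_card(conn, card_num):
    """Return (character, number of the following card), or None past the last card."""
    row = conn.execute(
        'SELECT unicode FROM flashcard WHERE "order" = ?', (card_num,)
    ).fetchone()
    return None if row is None else (chr(row[0]), card_num + 1)

# Assumed schema and sample data: a list "family" with three characters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charCollection (unicode INTEGER, collection TEXT)")
conn.execute('CREATE TABLE flashcard (unicode INTEGER, "order" INTEGER)')
conn.executemany(
    "INSERT INTO charCollection VALUES (?, ?)",
    [(0x5BB6, "family"), (0x7236, "family"), (0x6BCD, "family")],
)
deck_size = build_flashcards(conn, "family", seed=0)
```

The attribute name order is a reserved word in SQL, so it is quoted in every statement. next_card plays the role of the self-referencing link: it returns the card for the current number together with the incremented number to request next.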

1. flashcard 2. check the answer with "expand in place" link

figure 11: flashcards


figure 12: hypertext view structure for character flashcards

figure 13: hypertext view structure for character flashcards

7. Conclusion and Future Directions

In this paper I presented the development of a hypertext interface for a Chinese-English dictionary with Lazy. Being purely declarative, Lazy allowed a very quick and easy development of the hypertext interface. The active links in particular made it possible to perform updates in the database with surprising ease.

Even though in this project I have been both the user and the developer, I believe Lazy is a very practical tool for iterative design, where successive prototypes are discussed with the users and then refined accordingly. Another advantage of Lazy is the existence of a hypertext schema, which makes it possible to perform various checks without having to navigate through the hypertext nodes. For example, by examining the hypertext schema in graphical form, it is possible to see how many navigation steps are necessary to reach some given information. It is also easy to identify dead-ends.
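The two checks mentioned here can also be expressed as simple graph computations. The following Python sketch assumes the hypertext schema has been exported as a mapping from node names to the nodes they link to. That export format, the function names, and the node names Home, RadicalList and Help are hypothetical; only CharWithRadical, Character and editnote appear in this paper.

```python
from collections import deque

def navigation_steps(schema, start):
    """Breadth-first search: minimum number of links to follow from start to each node."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in schema.get(node, []):
            if target not in steps:
                steps[target] = steps[node] + 1
                queue.append(target)
    return steps

def dead_ends(schema):
    """Nodes that have no outgoing link at all."""
    nodes = set(schema) | {t for targets in schema.values() for t in targets}
    return sorted(n for n in nodes if not schema.get(n))

# Hypothetical schema: node name -> nodes reachable through one link.
schema = {
    "Home": ["RadicalList"],
    "RadicalList": ["CharWithRadical"],
    "CharWithRadical": ["Character"],
    "Character": ["editnote", "CharWithRadical"],
    "editnote": ["Character"],
    "Help": [],
}
steps = navigation_steps(schema, "Home")
```

Nodes missing from the result of navigation_steps are unreachable from the chosen start node, which is a third check that comes for free with this representation.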



In the near future, I plan to create nodes with more complex selection criteria, in particular nodes that combine several parameters. An example of this would be a node that selects all the characters that have a given radical and whose pronunciation contains the diphthong "ie". Another example is a node that selects all the words that have a given number of characters, one of which is pronounced "jie1". A further interesting node would select all the words that have a similar pronunciation (i.e. the same pinyin if the tones are not taken into account, for example 化学 and 滑雪) and are thus often confusing for non-native speakers. One could also imagine a node that lists all the characters for which the traditional and the simplified versions do not have the same radical.
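The "similar pronunciation" criterion can be illustrated outside Lazy with a short Python sketch. It assumes that pinyin is stored with tone numbers and spaces between syllables (as in "jie1"); the function names and the sample word list are made up for the example.

```python
import re
from collections import defaultdict

def strip_tones(pinyin):
    """Remove tone numbers, e.g. 'hua4 xue2' becomes 'hua xue'."""
    return re.sub(r"[1-5]", "", pinyin)

def confusable_groups(words):
    """Group words sharing the same toneless pinyin; keep groups with several words."""
    groups = defaultdict(list)
    for word, pinyin in words:
        groups[strip_tones(pinyin)].append(word)
    return {key: members for key, members in groups.items() if len(members) > 1}

# Sample data (assumed storage format: syllables with tone numbers).
words = [
    ("化学", "hua4 xue2"),
    ("滑雪", "hua2 xue3"),
    ("学生", "xue2 sheng1"),
]
groups = confusable_groups(words)
```

With this sample list, 化学 and 滑雪 end up in the same group under the key "hua xue", while 学生 is left out because no other word shares its toneless pinyin.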

In the longer term, I would like to link the present dictionary with the English lexical database WordNet and try to generate new links between Chinese words and expressions using the relations between concepts that exist in WordNet (hyperonymy/hyponymy, meronymy/holonymy, etc.).

8. Acknowledgements

I would like to thank all my colleagues from the ISI group at the University of Geneva for their valuable help with Lazy.

9. References

www.mandarintools.com

www.zhongwen.com

Falquet G., Guyot J., Nerima L., Park S., Design and Analysis of Virtual Museums, in Proceedings of MW2001 (Museums and the Web Conference), Seattle, March 2001.

Falquet G., Guyot J., Nerima L., Active hypertext views on databases, CUI Technical Report, University of Geneva, 1999.

Henkins, J. H., New Ideographs in Unicode 3.0 and Beyond, 15th International Unicode Conference, San Jose, 1999.

The Unicode Consortium, The Unicode Standard, version 3.0, Addison-Wesley, 2000.

Wood A., Unicode and Multilingual Support in HTML, Fonts, Web Browsers and Other Applications, http://www.hclrss.demon.co.uk/unicode/