LANGUAGE DOCUMENTATION and DESCRIPTION Bahar Kocaman, Sezer Yurt, Gülden Berber


Post on 24-Feb-2016



By documenting languages we engage in amassing data for preservation. This will allow future generations to access data for languages even after they are gone.

Describing languages provides information about them, and has the extended effect that the mere existence of the descriptions may be empowering for endangered languages.

Fieldwork

Fieldwork is the procedure of acquiring linguistic data from language consultants, preferably in environments familiar to them, such as their homes or workplaces.

Prototypical fieldwork (Hyman, 2001): a linguist spends an extended amount of time with a community in an exotic place, documenting and recording a little-known language of the community with the help of local informants.

Fieldwork describes the activity of a researcher systematically analysing parts of a language, usually other than one's native language and usually within a community of speakers of that language (Sakel & Everett, 2012:5).

When planning for fieldwork, the linguist will have to consider what he wishes to investigate, how he will be accepted into the community, how he will work with the consultants, how he will go about gathering data, how to make that data as reliable and comprehensive as possible, and so on.

Language Documentation

Language documentation is the collection of raw data that may then be used for further analysis.

The merits of language documentation are that it gives linguists raw data to work with and preserves the cultural heritage of the community. While language documentation may not slow the rate of language death, it will preserve records of the languages for future generations.

An example of a large-scale language documentation program is The Rosetta Project, where information for over 1500 languages is currently stored, much of it publicly available.

Language Description

Language description seeks to illustrate the essentials of a language, based on available material. It provides analyses of a variety of areas of the language, such as its phonological, morphological, grammatical and syntactic systems, as well as, ideally, presenting a lexicon of the language.

Ideally the description is general enough to be comparable with other descriptions, but specific enough to capture the uniqueness of the language (Lehmann, 1999:6). The combined effect of descriptions and documentation may lead to greater recognition of the language, even on a political level. Reference materials, such as educational material, produced in combination with descriptive materials, might lead to greater awareness, and might even lead to the language becoming recognized enough to be taught in schools.

Sampling

Typological surveys are dependent on data from different languages, often from a large number of languages.

Including all human languages in an investigation is simply not possible, because we have only limited access to the languages of the world.

Statements about cross-linguistic patterns, tendencies and universals are therefore always based on a sample of languages.

Types of Samples

1. Probability Sample: In order to check for statistical tendencies and correlations of various features, we use a probability sample. For this kind of sample, we must set the variables beforehand and code the sample according to the presence or absence of those variables.

For example, we can check patterns for reduplication by choosing a set of variables, such as:

the language doesn't have reduplication;
the language has partial reduplication only;
the language has full reduplication only;
the language has both partial and full reduplication.

We then proceed to code each language of the sample according to those variables, choosing only one variable per language.
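The coding step for the reduplication example can be sketched in a few lines of Python. This is only an illustration: the language names and observations are invented placeholders, and the helper names (`Reduplication`, `code_sample`, `tally`) are hypothetical, not part of any existing tool.

```python
from collections import Counter
from enum import Enum


class Reduplication(Enum):
    """The four mutually exclusive values from the example above."""
    NONE = "no reduplication"
    PARTIAL_ONLY = "partial reduplication only"
    FULL_ONLY = "full reduplication only"
    BOTH = "both partial and full reduplication"


def code_sample(observations):
    """Assign exactly one Reduplication value to each language.

    `observations` maps a language name to a (has_partial, has_full) pair.
    """
    coded = {}
    for language, (has_partial, has_full) in observations.items():
        if has_partial and has_full:
            coded[language] = Reduplication.BOTH
        elif has_partial:
            coded[language] = Reduplication.PARTIAL_ONLY
        elif has_full:
            coded[language] = Reduplication.FULL_ONLY
        else:
            coded[language] = Reduplication.NONE
    return coded


def tally(coded):
    """Count how many languages received each value."""
    return Counter(coded.values())


# Invented placeholder data, for illustration only.
toy_observations = {
    "Language A": (False, False),
    "Language B": (True, False),
    "Language C": (True, True),
    "Language D": (False, True),
    "Language E": (True, True),
}

coded_sample = code_sample(toy_observations)
counts = tally(coded_sample)
for value in Reduplication:
    print(f"{value.value}: {counts[value]}")
```

Because every language receives exactly one value, the counts always add up to the size of the sample, which makes the resulting distribution easy to compare across samples.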

2. Variety Sample: This is mainly used for explorative research: when little is known about the form or construction under investigation it is important that the sample offers a maximum degree of the linguistic parameters [i.e. variables] involved (Rijkhoff & Bakker 1998:265).

3. Convenience Sample: This is a sample based on what kind of data one has access to.

Types of Bias

1. Bibliographical Bias: Small or remotely located languages, very often isolates or languages of unknown affiliation, are biased toward exclusion from samples. For instance, until Derbyshire (1977) published his description of Hixkaryana word order, object-initial languages were not to be found in any surveys of word order types.

2. Genetic (Genealogical) Bias: Some language families are overrepresented while others are underrepresented in the sample.

Many features of a language are inherited. If a sample is biased towards one family over others, a feature might look more or less common than it actually is, simply because of how it appears in the dominating family.

For example, tone isn't a common feature in Indo-European languages, but it is quite common in Niger-Congo languages. If a sample has a higher proportion of Indo-European languages than of other families, the pattern that is likely to emerge is that tone seems less common cross-linguistically than it actually is.

3. Areal Bias: Languages from the same linguistic area are overrepresented, which may skew the resulting pattern one way or another.

Linguistic areas are areas where languages have been in sustained contact and have influenced each other so that they have specific features not found in the languages outside the area. For example, the languages of the Balkan area, which belong to different genera of Indo-European, have postposed articles as opposed to the neighbouring languages outside the linguistic area, and as opposed to other languages of the same genera.

4. Typological Bias: One linguistic type is over- or underrepresented in a sample.

For example, if we want to check whether there is any correlation between adposition type and verb-object word order, we need to include languages of all types, such as those with prepositions and those with postpositions.

If we have an overrepresentation of languages with, for example, prepositions, we are likely to get a skewed pattern.

5. Cultural Bias: We have an over- or underrepresentation of the different cultures of the world in the sample.

There is a relation between certain aspects of the grammar of a language on the one hand and beliefs and practices of its speakers on the other hand (Bakker 2010:108).

For example, in a study on number marking, Lucy (1992) found that speakers of American English and speakers of Yucatec (Mayan (Mayan): Mexico) treat nouns differently:

The English speakers make a sharp distinction between mass and count nouns and have obligatory number marking for count nouns.

The Yucatec speakers treat most nouns as mass nouns and have optional number marking but an obligatory numeral classifier system.

When asked to sort pictures of objects, the English speakers tended to sort objects by shape, whereas the Yucatec speakers tended to sort objects by material composition.

Whether the number marking system derives from the cultural outlook (i.e. how one views and categorizes objects) or the cultural interpretation of objects derives from the linguistic structure is probably impossible to establish.

These biases are not independent of each other: if languages are closely related genetically, they are likely to have inherited common linguistic types from their ancestor language, to be spoken in the same area and by people sharing the same culture (Cristofaro 2005:91).

Finally, statistics may seem far removed from typology, but it is actually essential, since what we are dealing with is sets of data: samples aimed at representing the whole, from which we draw conclusions.
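To make the effect of genetic bias concrete, the sketch below compares a raw proportion with a family-weighted one on an invented toy sample. All language and family names are placeholders, and averaging the per-family proportions is just one simple way to dampen the influence of an overrepresented family; it is an assumption made for illustration, not a method described in the presentation.

```python
from collections import defaultdict

# (language, family, has_tone): invented placeholder data, for illustration only.
toy_sample = [
    ("L1", "Family X", False),
    ("L2", "Family X", False),
    ("L3", "Family X", False),
    ("L4", "Family X", False),
    ("L5", "Family X", True),
    ("L6", "Family X", False),
    ("L7", "Family Y", True),
    ("L8", "Family Y", True),
    ("L9", "Family Z", True),
    ("L10", "Family Z", False),
]


def raw_proportion(sample):
    """Share of languages with the feature, every language counted equally."""
    return sum(1 for _, _, has_feature in sample if has_feature) / len(sample)


def family_weighted_proportion(sample):
    """Average of the per-family shares, so every family counts equally."""
    by_family = defaultdict(list)
    for _, family, has_feature in sample:
        by_family[family].append(has_feature)
    per_family = [sum(values) / len(values) for values in by_family.values()]
    return sum(per_family) / len(per_family)


raw = raw_proportion(toy_sample)
weighted = family_weighted_proportion(toy_sample)
print(f"raw proportion with tone:             {raw:.2f}")
print(f"family-weighted proportion with tone: {weighted:.2f}")
```

In this toy sample Family X contributes six of the ten languages and mostly lacks the feature, so the raw proportion comes out lower than the family-weighted one. This mirrors the tone example: an overrepresented family can make a feature look rarer than it is.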

Databases

Databases, which have recently become popular, are beneficial for both compilers and the linguistic community. Some advantages of these databases are:

They benefit research by making large amounts of data accessible.

They allow compilers to be recognized for their painstaking work.

They can be continuously updated.

However, these databases may differ radically from each other, both in the selection of languages and in the approach to the entries.

For example:

There are databases with a vast number of languages but where the data provided for each language is restricted.

There are databases providing very elaborate information for each language, but the number of languages is smaller.

There are databases which look only at one specific language domain while other databases code a host of features and information about the language.

The following are three different kinds of databases:

World Atlas of Language Structures (WALS).

Atlas of Pidgin and Creole Language Structures (APiCS).

Automated Similarity Judgment Program (ASJP).

1. World Atlas of Language Structures (WALS)

It is a milestone in terms of large-scale databases.

Some positive features of WALS

It compiles a number of databases into one single unit, covering a great part of the abstract linguistic system, including phonology, morphology, syntax, grammar, and lexical features.

It also provides the first collected worldwide mapping of language systems.

It includes two chapters about sign languages.

Each linguistic feature is dealt with separately.

The Atlas provides metadata for each language, specifically including the location of the language and its genealogical classification.

Two negative features of WALS

Because authors were responsible for individual features, their chapters may contain a large number of languages, though these may not necessarily overlap with the languages in other chapters.

WALS completely ignores pidgin, creole and mixed languages.
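The overlap problem can be illustrated with plain sets. The feature and language names below are invented placeholders and do not reflect the actual contents of WALS; `shared_languages` is a hypothetical helper written for this sketch.

```python
# Each chapter codes its own set of languages: invented placeholder data.
toy_chapters = {
    "Feature 1": {"Lang A", "Lang B", "Lang C", "Lang D"},
    "Feature 2": {"Lang B", "Lang C", "Lang E"},
    "Feature 3": {"Lang C", "Lang D", "Lang E", "Lang F"},
}


def shared_languages(chapters, feature_names):
    """Return the languages that are coded for every feature in feature_names."""
    language_sets = [chapters[name] for name in feature_names]
    return set.intersection(*language_sets)


two_features = shared_languages(toy_chapters, ["Feature 1", "Feature 2"])
all_features = shared_languages(toy_chapters, list(toy_chapters))
print(sorted(two_features))
print(sorted(all_features))
```

Each chapter here covers three or four languages, yet only one language is coded for all three features. Any study that correlates several features can only use the languages in that intersection, which is why non-overlapping chapter samples are a practical limitation.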

2. Atlas of Pidgin and Creole Language Structures (APiCS)

In contrast to WALS, it is the first large-scale typological project for pidgin and creole languages.

Some positive aspects of APiCS

It attracts the attention of experts on pidgin, creole, and mixed languages.

Because the features are predefined and authors are responsible for specific languages, the cross-compatibility between languages is complete. The kind of information that can be found for one language can also be found for every other language in the database.

The instructions for the authors were to fill out a detailed questionnaire of features for the language of their expertise.

Each language is also described in a survey chapter containing a summary of the socio-historical background and a broad structural outline of the language.

Two negative aspects of APiCS

APiCS includes only pidgin, creole, and mixed languages, that is, selected languages that may or may not be of a specific typological sort. So a complete cross-comparison between APiCS and WALS is not possible.

The sample of APiCS is biased towards English-lexified contact languages.

3. Automated Similarity Judgment Program (ASJP)

It aims to provide an objective classification of the world's languages by means of lexicostatistical analysis.

Lexicostatistics is a technique used to compare the rates of change within a set of words in different languages in order to establish how far the languages are related and, if they are, when they separated from each other.

Some positive sides of ASJP

It computerizes the comparison between sets of words using a fixed algorithm.

The task for each contributor is to enter a set of 40 lexical items for as many languages as possible.

Some macro data is included for each language, such as genealogical affiliation, location and number of speakers.

Since the dataset is small, it is possible for contributors to submit a large number of languages.

Two negative sides of ASJP

Since we are dealing with a computerized comparison, the words have to be transcribed in a machine readable format.

And since the transcription is an approximation of the original, it is not possible to simply convert it back to a more detailed format in order to make it more accessible to others.
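The computerized comparison of word lists can be sketched as follows. ASJP's comparison is usually described as being based on Levenshtein (edit) distance over simplified transcriptions; the code below assumes that and shows only a length-normalized edit distance averaged over shared meanings. It is a simplified illustration, not the project's actual implementation, and the word forms are invented placeholders rather than real ASJP transcriptions.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_list_distance(list_a, list_b):
    """Average normalized distance over the meanings present in both lists."""
    shared = [meaning for meaning in list_a if meaning in list_b]
    if not shared:
        raise ValueError("the two word lists share no meanings")
    total = sum(normalized_distance(list_a[m], list_b[m]) for m in shared)
    return total / len(shared)


# Invented placeholder forms, for illustration only.
variety_1 = {"one": "tak", "two": "pili", "water": "nam"}
variety_2 = {"one": "tak", "two": "piri", "water": "wai"}

distance = mean_list_distance(variety_1, variety_2)
print(f"mean normalized distance: {distance:.3f}")
```

A value near 0 means the two lists are nearly identical, and a value near 1 means they share almost nothing. Because the comparison works on single characters, every sound has to be written as one machine-readable symbol, which is where the loss of transcription detail mentioned above comes from.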

METHODOLOGY OF SIGN LANGUAGES

The methodology for spoken languages differs from the methodology for sign languages.

Data Collection of Signed Languages

The methods used for data collection in spoken and signed languages are almost the same. However, there is a difference for signed languages.

The Sampling of Signed Languages

Given the current state of information on sign languages, typological surveys of them are convenience samples.

Documentation and Description of Signed Languages

The difference between spoken and signed languages in documentation and description is that signed languages are three-dimensional, visual/gestural languages, whereas spoken languages are two-dimensional, audio/oral languages.

Lastly, there are a number of endangered sign languages all over the world. For example, Hawaii Pidgin Sign Language and Benkala Sign Language are nearly extinct.

THANK YOU FOR YOUR ATTENTION