

  • User Feedback on Draft Devanagari Script Behaviour for Hindi Version 1.4.9

    S. No. | Version | Feedback/Remark from TDIL-DC Portal Users | Pertinence | Comments

    1. 1.4.9 I would strongly suggest the use of a fixed 4-byte encoding for a completely Indic script that includes ALL Indian languages, using bits 22 and 23, etc., in an LE32 system. The system would accept conjunct consonants, etc. (what the Latin group calls syllables) as distinct characters. This would help enormously in collation, as a huge amount of data composed in Indic languages is stored in India and abroad. From Dr. Navinchandra Mehta.

    NOT PERTINENT. Point well taken; however, it is not pertinent, since the document deals with script behaviour and not with storage issues.

    2. 1.4.9 The modern generation does not identify each syllable as a character (अक्षर). From ancient times until as recently as the mid-20th century, we accepted each syllable as an अक्षर. I think it is essential that we as Indians reserve some 128K or 256K bytes of the unused 32-bit range before others claim it. Modern computers are fast enough to handle 32-bit characters comfortably. From Dr. Navinchandra Mehta.

    NOT PERTINENT. Point well taken; however, it is not pertinent, since the document deals with script behaviour and not with storage issues.
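The proposals in items 1 and 2 can be set against the current Unicode code space with a little arithmetic. The sketch below is illustrative only and assumes the bit numbers are counted from 0 (counting from 1 gives 0x200000 and 0x400000, which are equally out of range): Unicode code points stop at U+10FFFF, which needs only 21 bits, so any value with bit 22 or 23 set lies outside what Python's `chr()` accepts. It also shows that a conjunct such as क्ष is encoded today as a sequence of three code points rather than as one character.

```python
import unicodedata

UNICODE_MAX = 0x10FFFF  # last valid Unicode code point; fits in 21 bits

# Assumption: bits are counted from 0, so "bit 22" means 1 << 22.
for bit in (22, 23):
    value = 1 << bit
    print(f"bit {bit}: 0x{value:06X}, beyond U+10FFFF: {value > UNICODE_MAX}")

# chr() refuses any value outside the Unicode code space.
try:
    chr(1 << 22)
except ValueError as err:
    print("chr(1 << 22) raised ValueError:", err)

# Today the conjunct क्ष is KA + VIRAMA + SSA: three code points, not one character.
KSHA = "\u0915\u094D\u0937"
print(len(KSHA), [unicodedata.name(c) for c in KSHA])
print("bits needed for U+10FFFF:", UNICODE_MAX.bit_length())
```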

    3. 1.4.9 We can get कृ, the short vowel sound, in Hindi, but we want कॄ, the long vowel sound. How can we get it? Please try to arrange in fonts the original vowels, ऋ (the short vowel) and ॠ (the long vowel), and their sounds as well. From Chandra Sekhar J K.

    NOT PERTINENT. Long ॠ is not used in Hindi and hence has not been incorporated.
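For reference, the long forms asked about in item 3 are already encoded in Unicode, so whether they display is a font question rather than an encoding one. The short sketch below uses Python's standard `unicodedata` module (the variable names are illustrative) to build the short and long syllables on क and print the official character names.

```python
import unicodedata

KA = "\u0915"             # क
SHORT_R_SIGN = "\u0943"   # vowel sign of ऋ
LONG_RR_SIGN = "\u0944"   # vowel sign of ॠ

kri = KA + SHORT_R_SIGN   # कृ (short)
krri = KA + LONG_RR_SIGN  # कॄ (long)

for label, syllable in (("short", kri), ("long", krri)):
    print(label, syllable, [unicodedata.name(c) for c in syllable])

# The independent vowel letters ऋ and ॠ.
print(unicodedata.name("\u090B"), "/", unicodedata.name("\u0960"))
```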

  • 4. 1.4.9

    Dr Navin Mehta has documented a page-wise analysis of script behaviour; his document is attached for reference. I have already commented that we have failed by not reserving more space for the Indic scripts, which I believe should have been much more syllable-oriented and should occupy 200,000 to 400,000 characters. I would strongly urge that space be reserved at bits 22 and 23. The following is my detailed analysis of, and suggestions about, your document “Devanagari Script Behaviour for Hindi Ver 1 4 9.pdf”. In this analysis, I have tried to show your version in black and my comments and suggestions in red, using brown for Hindi characters. I may have missed some areas, but I think it is generally readable. Here are the comments.

    Each of these is explained below:

    a. Choice of Character:

    Languages differ in the choice of characters from the Devanagari code page. Thus Marathi and Konkani use ळ and ऱ (the latter for generating the eyelash ra). These are not present in Hindi or Dogri. The Hindi ऍ (U+090D) is represented in Marathi and Konkani as ॲ (U+0972). Nukta is used in Hindi and Dogri but not in Marathi or Konkani.

    b. The shape of the given character.

    Although Marathi and Hindi share the same script, Devanāgarī, not only do they not share the same character inventory, but the representation of certain characters also differs. Thus the Hindi /la/ differs from the Marathi /la/ in the placement of the stem: Hindi /ल/, Marathi / /.

    c. Choice of Character: In general, I believe that each regional language should be allowed to use its own character forms, e.g. ळ and ऱ. My argument is based on the fact that in Gujarati, for example, one uses લ for the Hindi-Devanagari character ल. I do agree that the absence of the vertical stem in Marathi-Devanagari makes it somewhat difficult to split, but one has to learn to do it the way they have always done it. The same argument applies to the differences in the shapes of the numerals. In short, although Marathi uses an almost-Devanagari script, that script should be treated as a separate language script within the Sanskrit group. On page 4, I would accept the different way in which Marathi-Devanagari treats the word in question.

    NOT PERTINENT. The document deals with Hindi and not Marathi.

    Still on page 4: d. The collation order within the language. The collation order varies from language to language although they all share the same script. In the case of Hindi, क्ष, त्र and ज्ञ are sorted along with the first consonant of each ligature. Thus क्ष is sorted along with क, ज्ञ with ज and त्र with त. In Marathi, क्ष and ज्ञ occur at the end of the lexical sort, giving the two conjuncts the specific value of a letter. In Nepali, क्ष, त्र and ज्ञ are sorted at the end.

    Contrary to my acceptance of differences in the shapes of characters in Marathi-Devanagari, I think it is inadvisable to have different collation orders for different languages. In all of them, be it Hindi, Gujarati or Marathi, the collation should follow the Hindi order, i.e. क्ष, त्र and ज्ञ are sorted along with the first consonant of each ligature. Thus क्ष is sorted along with क, ज्ञ with ज and त्र with त.

    NOT PERTINENT. The document deals only with Hindi. The collation order is the one provided in the latest CLDR on the Unicode site and represents the sort order of Hindi accurately.
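The two collation rules described above can be contrasted with a small sort-key sketch. It is not the CLDR collation the reply refers to, nor a complete Hindi or Marathi ordering: plain code point order is used as a stand-in, and the weights 0x0980 and 0x0981 are arbitrary values chosen only to push क्ष and ज्ञ past every Devanagari code point. Because क्ष begins with the code point for क, plain code point order already keeps it inside the क group, as the Hindi rule expects; the second key treats क्ष and ज्ञ as separate letters sorted at the end, as described for Marathi.

```python
# Hypothetical weights placed after the whole Devanagari block (U+0900 to U+097F).
KSHA = "\u0915\u094D\u0937"  # क्ष
JNYA = "\u091C\u094D\u091E"  # ज्ञ
END_WEIGHTS = {KSHA: 0x0980, JNYA: 0x0981}


def codepoint_key(word: str) -> list[int]:
    """Plain code point order: क्ष stays in the क group and ज्ञ in the ज group."""
    return [ord(c) for c in word]


def conjuncts_last_key(word: str) -> list[int]:
    """Treat क्ष and ज्ञ as single letters that sort after every other letter."""
    key, i = [], 0
    while i < len(word):
        for conjunct, weight in END_WEIGHTS.items():
            if word.startswith(conjunct, i):
                key.append(weight)
                i += len(conjunct)
                break
        else:  # no conjunct starts here: fall back to the code point
            key.append(ord(word[i]))
            i += 1
    return key


words = ["क्षमा", "कमल", "ज्ञान", "जल", "हवा"]
print(sorted(words, key=codepoint_key))       # कमल, क्षमा, जल, ज्ञान, हवा
print(sorted(words, key=conjuncts_last_key))  # कमल, जल, हवा, क्षमा, ज्ञान
```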

  • Page 8: The other target group is the OS and application developer. Once the possible ligatures and consonant Mātrā combinations have been identified, there is a need to provide a list of maximum combinations within the language. “Devanagari Script behaviour for Hindi” is equally important for keyboard design, especially when supplemented by frequency data from a corpus. This is of great importance to me as a developer. I am glad you are addressing the issue.

    NOT PERTINENT. No comment, since it approves of the remark.

    Page 10 (and pages 22 and 23): Example ख in Hindi. What about the deeply guttural ख of Arabic? Have you provided for it, or will you be providing for it?

    NOT PERTINENT. The deeply guttural ख of Arabic is represented by ख़.

    What about the v in “have”, which is different from the sound in “wind”? The word “have” is one which can identify an Indian as distinct from an English person. Considering the overwhelming use of English words in both Devanagari and Gujarati media, it is necessary to have a separate character for “v” which is not the same as “व” or “व्”. See your comment 21 on page 18. I think a nukta on “व” is required to bring out the proper sound of v in the words “have”, “revolve”, “vine” and many other English words. In particular, I want to point out that the English pronounce “vine” and “wine” differently. Therefore, the character “व” is the sound in “wine” but NOT in “vine”, which should be represented with a nukta added. A LOT OF Indians do not understand this difference. My suggestion is to add this to the table of consonants on page 22, similar to the addition of “फ़”, an Urdu import.

    NOT PERTINENT. Since Unicode does not distinguish between English /w/ and /v/ insofar as Devanagari is concerned, the matter should be addressed to Unicode.
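The nukta letters discussed in these replies behave in a way worth knowing when processing text. As a sketch using Python's standard `unicodedata` module: ख़ and फ़ have precomposed code points (U+0959 and U+095E), but Unicode excludes them from composition, so NFC normalization yields the base letter followed by the nukta sign U+093C. A व with a nukta, as the commenter proposes, has no precomposed code point at all; it can only be written as the sequence व + ़, and how it renders depends on the font. The snippet illustrates encoding only and implies nothing about whether such a letter should be added.

```python
import unicodedata

KHHA = "\u0959"   # ख़ as one precomposed code point
FA = "\u095E"     # फ़ as one precomposed code point
NUKTA = "\u093C"  # DEVANAGARI SIGN NUKTA

# U+0958..U+095F are composition exclusions, so NFC decomposes them.
for ch in (KHHA, FA):
    nfc = unicodedata.normalize("NFC", ch)
    print(unicodedata.name(ch), "->", [unicodedata.name(c) for c in nfc])

# व with a nukta has no precomposed code point; it exists only as VA + NUKTA,
# and normalization leaves that sequence unchanged.
va_nukta = "\u0935" + NUKTA
print([unicodedata.name(c) for c in va_nukta])
print(unicodedata.normalize("NFC", va_nukta) == va_nukta)
```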

  • Page 14. The “Devanagari Script behaviour for Hindi” is limited to its synchronic use, i.e. the manner in which a given language as of today admits a character set within the script used to write it. It is not diachronic or historical in nature and does not study the evolution of the given script across centuries. I do not fully agree with this approach, interpretation or narrowness.

    NOT PERTINENT. The document is pertinent to the actual use of Hindi today and has no diachronic pertinence.

    Page 16. 6.1.3. Amendments needed in Unicode for Hindi language: None has been proposed by the experts who have mandated the document. I am surprised and appalled at the lack of interest. The Indic portion of Unicode is so deficient as to need at least another 256K characters, which can only be had by using a 32-bit character with the available bits 22, 23, etc.

    NOT PERTINENT. The matter should be referred to Unicode. This document is restricted to the shapes.

    Page 18. ऽ - Avagraha: For extra length with long vowels, as seen in the Sanskrit text /उपदेशेऽजनुनासिक/. I do not see any need for this item. It is only rarely a part of Hindi or any other current language of India. It occurs a lot in Sanskrit and Prakrit, but then so many more such signs occur in Sanskrit that it becomes a separate subject altogether. See http://www.sanskritweb.net/ and http://www.sanskritweb.net/itrans/addendum.pdf

    NOT PERTINENT. The avagraha is used in poetry and to quote Sanskrit words, and hence the experts agreed to retain it.

    Pages 22 and 23. My strong request, as covered above in the Page 10 comments.

    NOT PERTINENT. Please refer to the remarks above.

  • Page 25. We accept “क + ऋ = कृ”. We also write the half र. I have never understood why the half र does not have a place as a vowel sound, as above, but I am too small a person to even attempt to change things in that area. FOR MY PERSONAL USE ONLY, in the E000 area, I use it as a vowel for convenience of composition and because I have three vacancies in a line of 16 (hexadecimal) code positions.

    NOT PERTINENT. The rafar is not a vowel sound.

    Page 26. I also use the bottom rakar in my list of vowel attachments (see the Page 25 comments above). I will re-emphasize that I do this for MY convenience, and certainly do not wish to be drawn into any linguistic discussion about my interpretation. The displaced catenator shown there is not a problem for me.

    NOT PERTINENT. The remarks are for personal use.

    Page 29. The । and ॥ are a godsend to me, as I use them to mark the end of a sentence and the end of a paragraph respectively, and I am very comfortable with them.

    NOT PERTINENT. The remarks are for personal use.

    Page 30. I have already brought to your attention the request for the addition of “व” with a nukta in my comments about Page 10 and Pages 22 and 23.

    NOT PERTINENT. The matter should be addressed to Unicode. Please see the detailed comment on the same above.

    Pages 31 onwards. These would be affected if you accept the insertion of “व” with a nukta.

    NOT PERTINENT. Please see the remark above.
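The reply that the rafar is not a vowel sound matches how Unicode spells these forms. In the sketch below (Python's `unicodedata`; the variable names are illustrative), the vowel sign of ऋ is a single combining mark, while both the rafar (reph) of र्क and the bottom rakar of क्र are spelled as र + virama together with an ordinary consonant letter. Which visual form appears is decided by the font's shaping rules, not by a separate code point.

```python
import unicodedata

KA, RA = "\u0915", "\u0930"   # क, र
VIRAMA = "\u094D"             # ् (halant)
VOCALIC_R_SIGN = "\u0943"     # ृ

kri = KA + VOCALIC_R_SIGN     # कृ: consonant + dependent vowel sign
reph_ka = RA + VIRAMA + KA    # र्क: rafar (reph) drawn above क
ka_rakar = KA + VIRAMA + RA   # क्र: bottom rakar drawn below क

for label, text in (("vowel sign", kri), ("rafar", reph_ka), ("rakar", ka_rakar)):
    # Each entry: (character name, general category); Lo = letter, Mn = combining mark.
    parts = [(unicodedata.name(c), unicodedata.category(c)) for c in text]
    print(label, text, parts)
```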

  • Page 55. I am uncomfortable with the three-consonant conjuncts shown there and others like them. They exist prolifically in Sanskrit but rarely in Hindi. These are available through Google transliteration software, so I suppose they have to exist. For such three-consonant conjuncts, I have usually found and used other written forms, which are more convenient and comfortable for me. However, enough said about three-consonant conjuncts.

    NOT PERTINENT. The conjunct characters have been provided by Hindi experts who have validated the document.

    Page 61. Regarding the collation order as you have shown it there: I have been taught, and have usually used, the anuswar, the chandrabindu and कः AFTER कौ. But I will not quibble. Universal acceptability for collation is more important.

    NOT PERTINENT. The collation order is based on the CLDR as provided by Unicode.
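The Page 61 point about where कं, कँ and कः fall relative to कौ can be made concrete. In plain code point order, used here only as a baseline and not as the CLDR collation the reply cites, the chandrabindu (U+0901), anusvara (U+0902) and visarga (U+0903) are numbered below every vowel sign, so those syllables sort before का. The second key in the sketch assigns hypothetical weights after the vowel signs to reproduce the order the commenter was taught.

```python
ANUSVARA, CANDRABINDU, VISARGA = "\u0902", "\u0901", "\u0903"
# Hypothetical weights after all vowel signs, in the order the commenter learned them.
AFTER_MATRAS = {ANUSVARA: 0x0980, CANDRABINDU: 0x0981, VISARGA: 0x0982}

KA = "\u0915"
# कौ, कं, कँ, कः, का
syllables = [KA + "\u094C", KA + ANUSVARA, KA + CANDRABINDU, KA + VISARGA, KA + "\u093E"]


def codepoint_key(s: str) -> list[int]:
    """Raw code point order."""
    return [ord(c) for c in s]


def signs_after_matras_key(s: str) -> list[int]:
    """Move anusvara, chandrabindu and visarga after all vowel signs."""
    return [AFTER_MATRAS.get(c, ord(c)) for c in s]


print(sorted(syllables, key=codepoint_key))           # कँ, कं, कः, का, कौ
print(sorted(syllables, key=signs_after_matras_key))  # का, कौ, कं, कँ, कः
```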

  • 5. 1.4.9

    1. This document does not discuss keyboard layout (the Inscript keyboard layout). Including this topic would be a useful addition and would make the document more complete. In this connection, there is some scope for improvement in the current layout of the Hindi Inscript keyboard. Currently, the purna viram chihn (।) is on the shift level of the Inscript keyboard, which greatly inconveniences Hindi writers. Since the Hindi full stop occurs at the end of each sentence, it is a frequently used character. It is suggested that, on the Hindi keyboard, the position of the English full stop (.), which is currently on the normal level, be swapped with that of the Hindi purna viram, so that the purna viram is on the normal level and can be typed by pressing a single key. Currently it requires pressing two keys (Shift and full stop). This will greatly facilitate Hindi typing.

    Suitable changes in the Unicode numbers of . and । should be made to accommodate this change. Another drawback of the Hindi Inscript keyboard is that it lacks frequently used symbols such as ? + % @ ' " ; : etc. For all of these, Hindi writers have to switch to the English keyboard, type the symbols and then switch back to the Hindi keyboard to continue typing the Hindi text. This greatly slows Hindi typing. Ways should be found to accommodate all these symbols so that Hindi typing speed and work efficiency can be increased. From L. Balasubramaniam.

    NOT PERTINENT

    The Enhanced Inscript Keyboard document is a separate proposal pending approval before the Bureau of Indian Standards. The remark may be addressed to the team working on the said document.
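The swap proposed in item 5 can be modelled as a change to a key-to-character table. The sketch below uses a hypothetical two-entry fragment, not the official Inscript or Enhanced Inscript table, and assumes, as the commenter describes, that । is currently produced by Shift plus the full-stop key. Exchanging the two levels of that key is enough to put । on the normal level; in this model the characters themselves keep their existing code points, FULL STOP (U+002E) and DEVANAGARI DANDA (U+0964).

```python
import unicodedata

# Hypothetical fragment of a layout table: (key name, Shift pressed) -> character.
# Only the two entries discussed here; this is not the official Inscript table.
current_layout = {
    ("period", False): ".",       # FULL STOP on the normal level
    ("period", True): "\u0964",   # DEVANAGARI DANDA (purna viram) on the Shift level
}


def swap_levels(layout: dict, key: str) -> dict:
    """Return a copy of `layout` with the normal and Shift outputs of `key` exchanged."""
    swapped = dict(layout)
    swapped[(key, False)], swapped[(key, True)] = layout[(key, True)], layout[(key, False)]
    return swapped


proposed_layout = swap_levels(current_layout, "period")

for shift in (False, True):
    ch = proposed_layout[("period", shift)]
    level = "Shift" if shift else "normal"
    print(f"{level}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Keeping the swap as a pure table operation leaves the original layout untouched, so both versions can be compared side by side.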

  • 6. 1.4.9 Although this topic is not strictly related to script, it would be useful to have it in a comprehensive document of this nature. Currently it is very difficult to make out the gender of loan words from English, Urdu and other languages. This has led to great confusion in Hindi regarding the gender of words. Words like टिकट, पेंसिल and ट्रेन (to cite a few examples from English) are found to be used in both genders in Hindi. Even dictionaries assign different genders to the same word, e.g. फ़ाख़्ता, which according to some dictionaries is masculine (because of the masculine-indicating आ ending of Hindi) and according to others is feminine (because of the feminine-indicating आ ending of Sanskrit loan words like लता, पत़ा, िीत़ा, etc.). Clear guidelines on determining the gender of such words will greatly help to standardize the Hindi language. I am sure the Central Hindi Directorate and other Hindi institutions have deliberated on the gender of loan words, and it should be relatively easy to summarize their recommendations and include them in this document. I hope you will be able to add these two topics to this document and make it more comprehensive and useful. From L. Balasubramaniam.

    NOT PERTINENT

    This document is concerned with shapes and issues pertaining to them. Grammar and morphology are not within the purview of the document. It is requested that CHD be contacted to prepare a document on the issue.

  • 7. 1.4.9 The two short vowels ऎ and ऒ should be included.

    From Shree Devi Kumar: Short E/O are not really “for Dravidian transliteration” only; they were originally introduced by Hoernle for the Bihari languages Bhojpuri, Magadhi and Maithili (see http://www.unicode.org/L2/L2010/10471-dev-short-vowels.pdf). As per the LSI (Linguistic Survey of India) by Grierson, on Bihari and Awadhi: "As in Bihari, there is a short e as well as a long one, and a short o as well as o. Also a short ai and a short au." (https://archive.org/search.php?query=rosettaproject%20awadhi%20AND%20subject%3A%22Awadhi%20Detailed%20Description%22) "As in other Bihari dialects, the vowels e and o, and the diphthongs ai and au have each two sounds, a short and a long one. Accurate writers distinguish these when writing in the Deva-nagari character," (http://www.joao-roiz.jp/LSI/pdf/vol=5-2ff=36ft=36tid=fcbbfe64c14e8d3678c67e55b6b0e2ce4c083ec2)

    NOT PERTINENT

    The document deals with Hindi alone and not with Bhojpuri, Maithili or Magadhi. The information provided is gratefully acknowledged.
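As background to item 7, both short vowels and their dependent vowel signs are already encoded in the Unicode Devanagari block, so the reply turns on the scope of the document (which languages it covers) rather than on encoding. The sketch below simply prints their standard names with Python's `unicodedata` module; the dictionary name is illustrative.

```python
import unicodedata

# Independent short vowels mapped to their dependent vowel signs.
SHORT_VOWELS = {
    "\u090E": "\u0946",  # ऎ and its sign
    "\u0912": "\u094A",  # ऒ and its sign
}
KA = "\u0915"

for letter, sign in SHORT_VOWELS.items():
    print(letter, unicodedata.name(letter), "|", KA + sign, unicodedata.name(sign))
```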

  • Comment 1: Nukta is not used in Nepali either, because Nepali does not have uvular sounds or the other sounds for which Hindi uses the nukta.


    Comment 2: In Nepali, क्ष, ज्ञ and त्र are collated with क, ज and त respectively, not at the end. However, at the primary school level, children are taught to read and write them as if they came at the end.

    Comment 3: In Nepali, ँ (chandrabindu) is used uniformly for nasalization. ं (anusvara) is used in Sanskrit tatsam words, which follow the Sanskrit pancham varna rule, i.e. their pronunciation depends on the following character.

    Comment 4: Nepali has only six vowels, अ, आ, इ, उ, ए and ओ, and two diphthongs, ऐ and औ. However, ई, ऊ and ऋ are also used in writing.

    Comment 5: Since the encoding system (Unicode) is well established, the project can be extended to all Indic languages that use the Devanagari script. Why only 'Hindi'?

    NOT PERTINENT

    The document deals with Hindi alone and not with Nepali or any other language using the Devanagari script. However, the information provided is appreciated.

  • 8. 1.4.9 K P Tiwari

    Assistant General Manager

    Reserve Bank of India


    On page 44 of your document, शृ् is shown. However, the half letter श् cannot take both: it will be either श् + र, or else the combination of श् with the vowel sign of ऋ (श् + ृ). Please give some attention to this.

    Secondly, ever since computers arrived, owing to certain difficulties and a lack of resources at the beginning, the decimal point also came to be written using the full stop, whereas mathematically the decimal point used to be placed somewhat above the baseline. This is evident in old mathematics books. If the position of the decimal point in Devanagari were corrected, it would be a highly commendable piece of work. It is well known that typewriters had their own limitations, so for Devanagari writing too a makeshift approach to letters and the like was adopted within those limits. But today that limitation has been removed. Look at English itself: when French or German words such as tête-à-tête were typed in Roman script, the marks above the letters were not added, but in word processing today they are added automatically. A similar arrangement can be made in Hindi. Words whose incorrect forms are in circulation, or whose conjuncts could not be formed earlier, can easily be formed today. What is the need to write द्‌वंद्‌व in such a drawn-out form when the conjunct form of द्व can easily be produced? The drawn-out form is neither easy to read nor easy on the eye; in fact, I have heard Hindi learners pronounce it as "दव दव" (dav dav), whereas the conjunct form does not have this defect. I hope you will consider these suggestions and oblige me.

    NOT PERTINENT. The points raised by Shri Tiwari are interesting. The first point, regarding शृ्, is adequately covered in the footnote on the page mentioned. Insofar as the use of the full stop as a decimal point or a temperature mark is concerned, the issue, although interesting, is beyond the purview of this document. CHD should be requested to consider the same.
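The distinction in Shri Tiwari's last point, between a conjunct and the same letters written with a visible halant, is made in Unicode by a ZERO WIDTH NON-JOINER (U+200C) after the virama; both spellings otherwise use the same letters. The sketch below (Python's `unicodedata`; how a particular font renders each string is outside what the code can check) builds both spellings of द्व and shows that removing the ZWNJ turns the drawn-out form back into the conjunct spelling.

```python
import unicodedata

DA, VIRAMA, VA = "\u0926", "\u094D", "\u0935"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

conjunct = DA + VIRAMA + VA                # द्व: normally rendered as a conjunct
explicit_halant = DA + VIRAMA + ZWNJ + VA  # द्‌व: asks for a visible halant instead

print([unicodedata.name(c) for c in conjunct])
print([unicodedata.name(c) for c in explicit_halant])
# Dropping the ZWNJ turns the drawn-out spelling back into the conjunct spelling.
print(explicit_halant.replace(ZWNJ, "") == conjunct)
```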