Proposal for a Bangla (or Bengali) Script Root Zone Label Generation Ruleset (LGR) – Incorporating comments from IP

LGR Version: 4.0
Current Date: 2020-03-02
Document version: 4.8.2
Authors: Neo-Brahmi Generation Panel [NBGP]

1. General Information

This document lays down the Label Generation Ruleset (LGR) for the Bangla (or 'Bengali')1 script under the general rubric of the Neo-Brāhmī writing system. The three main components of the Bangla script LGR, i.e. (i) the code point repertoire, (ii) variants and (iii) Whole Label Evaluation rules, are described in detail here, after a brief historical background of the script in Section 3. All these components will be incorporated in a machine-readable format in an XML file named "proposal-bengali-lgr-02mar20-en.xml". Labels for testing can be found in the accompanying text document "bangla-test-labels-02mar20-en.txt".

2. Script for Which the LGR Is Proposed

ISO 15924 Code: Beng
ISO 15924 Key N°: 325
ISO 15924 English Name: Bengali (Bangla)
Latin transliteration of native script names [in IPA]: bɑːŋlɑː, ôxômiya
Native names of the script: বাংলা, অসমীয়া
Maximal Starting Repertoire (MSR) version: MSR-4

1 The term 'Bangla' is used in the descriptive text and the term 'Bengali' is used in the normative part of this proposal.


3. Background on Script & Principal Languages Using It

3.0 Introduction

'Bangla' (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language, with around 178.2 million speakers in Bangladesh (98% of the population) and 83.4 million speakers in the Indian states of West Bengal (68.37 million), Tripura (2.15 million), South Assam (7.3 million), Odisha (0.49 million) and Delhi (0.21 million), as well as in the Andaman and Nicobar Islands (close to a hundred thousand), accounting for 8.3% of India. It is a major language in Jharkhand (2.6 million) too, and a language with a sizable population in Bihar (0.44 million). Apart from these, there are huge Bangla-speaking diasporas spread all over the world. It is the seventh largest spoken and written language in the world. Bangla is the national and official language of Bangladesh and one of the 22 official languages of India (listed in the 8th Schedule of the Indian Constitution). It is also an honorary official language of Sierra Leone. The script is also called Bangla [102]; it is an eastern variety of the 'Brāhmī' writing system and is written from left to right. Historically it derives from the Brāhmī alphabet as used in the Ashokan inscriptions (269-232 BC).

Bangla and its cognate languages mentioned above together form a linguistic group known as Eastern New Indo-Aryan (NIA). There is a gross inadequacy of inscriptions and manuscripts in Eastern Apabhranśa or 'Avahaṭṭha', except for small inscriptions and the manuscripts of the Tantric Buddhist text titled 'Caryyācaryyaviniścaya' or the Caryā-Pada [114], dating back to the 9th-11th century. As a result, there is not much epigraphic evidence for the development of its writing system. However, whatever evidence is available of the genesis of the Bangla writing system is discussed in Section 3.1 [109]. Historically the Bangla language is divided into three periods, as evident from various sources:

(i) Firstly, the Old Bangla period (roughly 950-1000 to AD 1200-1350), of which three specimens are found: (a) the 47 Caryā songs, the Dohākōṣa of Saraha and the Dohākōṣa of Kānha (mostly in Apabhranśa) and the Ḍākārṇava (in a variety of Prakrit); (b) Old Bangla specimens of over 300 words in a commentary [141].

(ii) Then there is the Middle Bangla period (1200-1800 AD), again divided into three stages: (a) Transitional Middle Bangla (1200-1300 AD, for which no genuine specimens are found) [147], (b) Early Middle Bangla (1300-1500 AD) and (c) Late Middle Bangla (1500-1800 AD).

(iii) Finally, after 1800 AD we find Modern or New Bangla, marked by the introduction of written prose [109] in the books of Fort William College (established in 1800). The colloquial variety of Bangla based on the speech variety of Calcutta (now called 'Kolkata') made its first appearance through the Hutōm Pẽcāra Nakśā (1862) by Kaliprasanna Singha. The influence of English on the vocabulary, idioms and expressions, as well as on the writing styles, of Bangla is significant by this time. The fonts and types for Bangla developed during this time also spread to all parts of the Bangla speech community [101, 120]. The same fonts, with some extensions, were also used for the neighbouring languages deploying this writing system.

Bangla prose developed two literary styles during the 19th-20th centuries: the Sādhubhāṣā (সাধুভাষা, 'elegant language or style') and the Calitabhāṣā (চলিতভাষা, 'current language or modern style'). It is the latter style that is prevalent today in written prose. The Language Movement in Bangladesh (then East Pakistan) began in 1948, as civil society dissented against the elimination of the Bangla script from currency and stamps, which had been in use since the British Raj. The movement reached its pinnacle in 1952 when, on 21 February, the police fired on demonstrating students and civilians, causing numerous injuries and deaths.2 Later, following the Language Movement, on 27 April 1952 the All Party National Language Committee decided to demand the establishment of an organization for the promotion of the Bengali language. Bangla Academy, Dhaka, right from its inception in 1955, has been engaged in promoting and fostering Bangla as the lingua franca of the country, before and after independence from Pakistan in 1971. Through the various commissions and committees constituted by the Government of Bangladesh after independence in 1971 (Bāṅlādeśa Jātīya Śikṣā Kamiśana in 1972, Jātīya Śikṣā Upadeṣṭā Pariṣad in 1979, Bāṅlā Bhāṣā Bāstabāyana Sela in 1982, Bāṅlā Bhāṣā Kamiṭi in 1983, etc.3), Bangla was made the primary medium of instruction and communication in all governmental and educational activities. Through great struggle and bloodshed, the Bengalis established Bangla as an official language of the state.4

2 The UN declared Ekuśe February (21 February) as International Mother Language Day at the UNESCO General Conference in Paris on 17 November 1999, "in recognition of the sanctity and preservation of all vernacular languages in the world" [22].
3 Bāṅlā Bhāṣā Kamiṭi. 1983. Bāṅlā Bhāṣā Kamiṭi Riporṭ (Report of the Bangla Bhasha Committee). Dhaka: Śikṣā, Dharma, Krīṛā O Saṅskṛti Mantraṇālaya, People's Republic of Bangladesh.
4 Chakraborty, Rajib. 2018. The Fishermen's Community: A Language-Culture Interplay (A Study of Post-1971 Select Bangla Novels). Unpublished PhD dissertation, Visva-Bharati.

3.1 Written Bangla

The 'Bangla alphabet' (বাংলা লিপি, Bānglā lipi; ISO 15924) is derived from the Brāhmī writing system, which is related to the Nāgarī (also known as Devanāgarī5) script [108] as well as to the Tirhutā writing system [106]. Considered to be the fifth most widely used writing system in the world, this combined Bangla-Asamiyā-Manipuri script (showing some variations for Asamiyā and Meitei or Bisnupriya Manipuri) (130) was used in eastern Indian Sanskrit manuscripts too. For Chakma in India and Bangladesh, and for Kokborok in Tripura, it was and still is one of the scripts used. A close variant called Tirhutā (123; now also available in Unicode 10.0 as U+11480-U+114DF; see 110) or Mithilākṣara was used for Maithili from the 14th century until the early 20th century [106]. In this context one finds a mention of 'Sylheti Nāgarī lipi' or 'Siloti' (added to the Unicode Standard in March 2005 with the release of version 4.1), the details of which could be of interest only to historians and historical linguists (see 137 and 144). For all practical purposes, however, Sylheti Bangla is now generally written in the modern-day Bangla script. Originally, during the reign of the Pāla dynasty (750-1154 AD) in eastern India, and perhaps even earlier during the Malla period (694 AD onwards), the present-day Bangla writing system took a shape comparable to the modern-day one [111, 119]. The evolution from Brāhmī to the modern Bangla script can be presented in tabular form:

Modern Bangla: ক (k), জ (j), ম (m), র (r), স (s), অ (a)

Table 1: Pictorial depiction of the evolution of Brāhmī to Bangla

5 William Dwight Whitney, in his Sanskrit Grammar, unequivocally said: "This name (Devanāgarī) is of doubtful origin and value" (Whitney, William Dwight. 1994 [reprint]. Sanskrit Grammar. New Delhi: Motilal Banarsidass Publishers, p. 1).

The inscriptional evidence in Brāhmī is found in Archaic Brāhmī from the 3rd century BC to the 1st century BC, in Middle Brāhmī soon after (1st-3rd century AD), and then in Late Brāhmī (4th-6th century AD). This evidence can be seen in both Bangladesh and West Bengal [108] in 1) the Mahasthangarh inscriptions (Bogra district, Bangladesh; the ancient name being Pundranagara or Paundravardhanapura), 2) the Brāhmī (and Kharoṣṭhī) inscriptions from lower 'Gangetic Bengal', and 3) the copper plate inscriptions of the Imperial Guptas from the northern part of West Bengal and north-west Bangladesh, in the areas under Dharmaditya, Gopachandra and Samācāradeva (about whom one only knows from five copper plates found in Kotalipara in the Faridpur district of Bangladesh, one in Mallasarul in the Burdwan district of West Bengal and one in Jayramapura in the Baleswar district, now in Odisha). These epigraphs from the eastern part of undivided India (dating back to the 4th-6th centuries AD) show some characteristic features of letters (especially ম 'ma', ল 'la', শ 'śa', স 'sa' and হ 'ha') which led to the development of the eastern variety of the Gupta script.

Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī. Notable in this context are the Tippera copper plate inscription of the 'Samatata' rulers (139, p. 265) such as Lokanātha (dated to the latter half of the 7th century AD), the Kailan inscription of Śrīdhāraṇa Rāta, and the Ashrafpur copper plates. The letters seem to hang down from wedge-shaped solid triangles, with right-hand verticals bending down at the bottom, because of which the script was described by Prinsep and Fleet as Kuṭila-lipi (literally 'cursive writing style'), whereas the term Siddhamātrikā (as a mātrā or bar is placed over each of the letters) was used by Al-Biruni (973-1048) to designate the script of Northern India. The next stage of development is illustrated by the 9th-century copper plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr and Nalanda in Bihar of the time of Devapāla, and from Jagjīvanpur (Malda) of the reign of Mahendrapāla. The Siddhamātrikā (mentioned as 'Siddham' in Chinese sources) is said to have been prevalent also in this region up to the end of the tenth century. Also called the Gaurī (i.e. Gauḍī) in Pūrvadeśa or the Eastern country, it was regarded as the same script to which the appellative Proto-Bangla is given, with its characteristics in rudimentary forms in the period between AD 875 and AD 1025. In some epigraphs it is considered as belonging to the second quarter of the eleventh century AD. Flattening of the head-marks becomes prominent in comparison to the wedge-shaped serifs. An important landmark in the development of the Bangla script is the Ramganj copper plate inscription of Mahāmāṇḍalika in the last quarter of the eleventh century AD. It is the earliest document from this entire region which bears the letter m with a tick rising upwards. The full vowel i develops a tick at the right end of the upper horizontal bar above and a curved hook below. Initial e approaches the modern Bangla character. A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is illustrated in the inscriptions of the Varman, Sena and Deva rulers of the twelfth and thirteenth centuries [104].

The evolution of the Bangla script (cf. 136) is aligned with the story of the advancement of printing technology. The first 'movable type' script was technically created and used while printing Nathaniel Brassey Halhed's (1751-1830) 1778 book titled A Grammar of the Bengal Language. In 1785 Governor-General Warren Hastings (1732-1818) requested another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after. It is generally agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge to Panchanan Karmakar (d. 1804), a renowned artist in Bengal. Later it was Karmakar and his family who became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in this designing of the script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th century. The script was standardized by Pandit Ishwar Chandra Vidyasagar when the Bangla type fonts were to be used for publishing on a large scale under the Calcutta School Book Society [116 for several references]. Much later, in 1935, the Linotype technique invented by Ottmar Mergenthaler (1854-1899) in 1886 was introduced into Bangla printing through the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya; it began to be used by the Anandabazar Patrika group, later followed by others. Within a few years the more advanced Monotype technology came to be used in Bangla printing. However, in Bangla printing culture Monotype had very limited acceptance, and Linotype held the stage until digital technology eventually came in to replace all earlier techniques. All these stages can be presented in a table:

PERIOD | DESCRIPTION | NAMES
3rd century BC | Use of the Brāhmī and Kharoṣṭhī scripts begins in the subcontinent. Brāhmī was widely used during the reign of the Mauryan king Aśoka. In one theory Brāhmī is based on the North Semitic alphabet, suitably modified to fit the needs of local languages; it is currently believed to have been an independent development. | Brāhmī
1st-3rd century AD | The Kusana script, named after the Kusana royal dynasty. | Kusana script
4th-5th century AD | The next stage of its evolution was into the Gupta script, named after the Gupta royal dynasty. | Gupta script
7th century AD | Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī, giving rise to the Kuṭila-lipi. | Kuṭila-lipi
8th century AD | Copper plate inscriptions are found in Khalimpur, Bangladesh, from the reign of Dharmapāla; from Monghyr and Nālandā in Bihar, of the time of Devapāla; and from Jagjīvanpur in West Bengal, of the reign of Mahendrapāla. | Siddhamātrikā
9th century AD until 1025 AD | Proto-Bangla characteristics develop in rudimentary forms. An important landmark in the development of the Bangla script is the Ramganj copper plate inscription of Mahāmāṇḍalika, dated to the last quarter of the eleventh century AD. | Proto-Bangla script & language
12th-13th century AD | A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is found in the inscriptions of the Varman, Sena and Deva rulers of the twelfth and thirteenth centuries. | Matured Proto-Bangla
14th-15th century AD | The characteristics of the typical Bangla script began to develop, as can be seen in the copper plate inscription of Vijayamāṇikya I of Tripura dated 1478 AD, which also illustrates forms of Bangla letters in the fifteenth century AD. | Modern Bangla script era begins (see Ross 1999)
16th-17th century AD | The chart of the Bangla alphabet appended to the China Monuments (published in Amsterdam in 1667) and The Code of Gentoo Laws (published in London in 1776) both show a chart of the Bangla alphabet, with 16 vowel letters (including the long 'ৡ' (ḹ), Anusvāra and Visarga) and 34 consonants. | Printed charts of Bangla
18th-19th century AD | Charles Wilkins develops printing in Bangla in 1778 and Vidyasagar reforms it. | Bangla type fonts

Table 2: Development of the Bangla Writing System

The overall development of the Bangla script from the Kuṭila-lipi period to modern Bangla can be seen in Table 3 ([102] and [146]; also see the web page in [147]).

Table 3: Bangla Script in Different Centuries

3.2 Languages Considered

Below is a tabular representation of the languages using the Bangla script that are placed on the EGIDS scale 1-6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some used the Bangla script earlier (such as Bodo) or used it in West Bengal at some point of time (Santali), but later shifted to another writing system. Bodo is now written in Nāgarī or Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol Chiki (145). For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:

EGIDS Scale 1
EGIDS Scale 2
EGIDS Scale 3
EGIDS Scale 4
EGIDS Scale 5
EGIDS Scale 6

Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansi
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm Chin

Table 4: Main languages in India and Bangladesh that use the Bangla script, on the EGIDS scale

3.3 Notable Features of Bangla Script [150]

The Bangla writing system has certain features that show how it has to be written or how typesetting in Bangla can be done. This section is followed by a section that explains the code points (and fixed code point sequences) which show certain distinctive characteristics of Bangla and which make up the repertoire. The next sections will also cover the 'akshar'-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and its pronunciation.

The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to contain an accompanying 'inherent' vowel (theoretically before or after each consonant). This vowel varies between /ɔ/ and /o/ depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].

Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or in both of the last two positions at once [105].

All Bangla consonants, when pronounced in isolation, are uttered with an inherent vowel /ɔ/; hence ক 'k', খ 'kh' or গ 'g' are usually pronounced as [kɔ], [khɔ] or [gɔ], etc. Phonologically the Bangla vowel /ɔ/ corresponds to the Hindi schwa /ə/.

When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক্ (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ্ (ñ) + জ (j) = ঞ্জ (ñj), জ্ (j) + ঞ (ñ) = জ্ঞ (jñ), হ্ (h) + ম (m) = হ্ম (hm). There are other issues as well. র as the second member of a cluster is reduced to a secondary symbol, e.g. প্ (p) + র (r) = প্র (pr), ষ্ (ṣ) + ট্ (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra "camel"). য (y), when used as a primary symbol, represents /jɔ/ in Bangla, but its secondary symbol (allograph), the ya-phala, has two phonetic values. When added to the initial consonant of a word it is the vowel /æ/ (as in শ্যামল (śyāmala) "green", র‍্যাপার (ryāpāra) "wrapper", etc.), but after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র্ (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry). In the case of দ্ (d) + ধ (dh), গ্ (g) + ধ (dh) and ন্ (n) + ধ (dh), the shape of the second member is changed, giving দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র্ (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta "south-west"), used mostly in classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel only applies to the final consonant of the cluster.

In consonant clusters many consonants take a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st), etc. Apart from its full shape, র has two allographs: one is the 'repha', as found in র্ক (rk) and র্প (rp), and the other is the ra-phala, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another example, where the cerebral nasal consonant sign takes an unusual shape [151].
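Whatever shape a font draws for ক্ষ, ষ্ট্র or the ya-phala in শ্য, the stored text is simply the member consonants joined by the Hasanta (U+09CD), and the LGR operates on that stored text. The Python sketch below, which assumes only the standard library, makes this visible; the helper names `cluster` and `code_points` are illustrative, not part of any LGR tool.

```python
import unicodedata

HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA, i.e. the Hasanta

def cluster(*consonants: str) -> str:
    """Join consonants into one conjunct in logical (stored) order:
    a Hasanta follows every member except the last."""
    return HASANTA.join(consonants)

def code_points(text: str) -> list:
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

ksa = cluster("\u0995", "\u09B7")               # ক + ষ -> ক্ষ
ssttra = cluster("\u09B7", "\u099F", "\u09B0")  # ষ + ট + র -> ষ্ট্র
sya = cluster("\u09B6", "\u09AF")               # শ + য -> শ্য (ya-phala)

for line in code_points(ksa):
    print(line)
```

The opaque shape of ক্ষ is therefore a rendering matter: two labels that differ only in the font used to display them contain the same code points.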

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes or functional speech sounds (7 oral and 7 nasal vowels, and 30 consonants) (150), with some obvious redundancies; in one of the first phonemic analyses, however, the number was thought to be thirty-five phonemes [140].

As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called 'allographs', with a complementary distribution in each case. These graphs or markings are generally added at the following positions of the primary symbol [113]:

1) Below (e.g. কু (ku), ন্ত (nta), হ্র (hra), etc.)

2) Above (e.g. চঁ (cã), র্ক (rka), etc.)

3) Right side (e.g. কা (kā), কং (kaṃ), etc.)

4) Left side (e.g. কে (ke))

5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki), etc.)

6) Right side and above simultaneously (e.g. কী (kī))

7) Right side and left side simultaneously (e.g. কো (ko))

8) Right side, left side and above simultaneously (e.g. কৌ (kau))
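The positions above are visual. In Unicode text the vowel sign is always stored after the consonant, even when it is drawn to the left, and the two-part signs of কো and কৌ carry canonical decompositions into a left part and a right part. This matters for an LGR because labels are compared in Normalization Form C. A minimal Python check, assuming only the standard `unicodedata` module:

```python
import unicodedata

KA = "\u0995"
O_SIGN = "\u09CB"   # BENGALI VOWEL SIGN O, drawn on both sides of the consonant
AU_SIGN = "\u09CC"  # BENGALI VOWEL SIGN AU

# Canonical decompositions: left part E sign + right part AA sign / AU length mark
print(unicodedata.decomposition(O_SIGN))   # prints 09C7 09BE
print(unicodedata.decomposition(AU_SIGN))  # prints 09C7 09D7

# Typing ক + ে + া (left part, then right part) still yields কো after NFC
ko_split = KA + "\u09C7" + "\u09BE"
ko_nfc = unicodedata.normalize("NFC", ko_split)
print(ko_nfc == KA + O_SIGN)  # prints True
```

By contrast, কি and কৈ use single code points (U+09BF and U+09C8) with no decomposition, so their left-side placement is purely a rendering rule.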

As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which are relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called 'kārs' in Bangla (also referred to as mātrās in the other Neo-Brāhmī LGR documents), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, and
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,

there are some special vowel modifiers of উ, as in the following combined letters:

গু (gu) rather than writing it as গ (g) + ু (u)
রু (ru) rather than writing it as র (r) + ু (u)
শু (śu) rather than writing it as শ (ś) + ু (u)
হু (hu) rather than writing it as হ (h) + ু (u)
ন্তু (ntu) rather than writing it as ন্ (n) + ত (t) + ু (u)

Similarly, there are special vowel modifiers of ঊ or '(long) ū' as well, e.g. ভ্ (bh) + র (r) + ূ (ū) (ভ্রূ bhrū "eyebrow") and শ্ (ś) + র (r) + ূ (ū) (শ্রূ śrū), as well as of the ঋ sign (ṛ) after হ (h) (হৃ hṛ), etc.

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In this work there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and it has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, go on side by side, operative in different domains.

However, in preparing this LGR document the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jātīya Śikṣākrama o Pāṭhyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.6 After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), gave a direction in his inaugural address for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the 'homogenization of Bangla spelling' by 1988. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. This was published as a 'spelling dictionary' titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more comprehensive than 'The University of Calcutta proposals' made in 1936. Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha ('transparent') and Aswaccha ('opaque' or non-transparent), even adding Ardha Swaccha ('half transparent') in between the two. Some sample examples are:

6 Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.

Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.

Opaque: where neither of the two can be (easily) recognized, e.g. ক্ষ kṣ (ক্ k + ষ ṣ), জ্ঞ jñ (জ্ j + ঞ ñ), ঙ্গ ṅg (ঙ্ ṅ + গ g), হ্ম hm (হ্ h + ম m).

Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In the case of three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র str (স্ s + ত্ t + র r), ষ্ট্র ṣṭr (ষ্ ṣ + ট্ ṭ + র r), etc.

There were, in fact, two types of proposals. One concerned the shapes of the letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made them difficult for children to learn. The other type dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was 'one word, one spelling' to the greatest extent possible [151].

Below is a statement of the most salient changes that affect the consonant + vowel combinations [153]:

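a. The variants of the short u (হ্রস্ব উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So গু (gu) is now written as গ followed by the detached sign ু. Similarly রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু, and therefore also the clusters with the short u sign: ন্তু (ntu) > ন্ত + ু (ন + ্ + ত + ু), স্তু (stu) > স্ত + ু (স + ্ + ত + ু).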

b. The variants of the long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced: রূ (rū) > র + ূ, ভ্রূ (bhrū) > ভ্র + ূ (ভ bh + ্ + র r + ূ ū), দ্রূ (drū) > দ্র + ূ (দ d + ্ + র r + ূ ū), শ্রূ (śrū) > শ্র + ূ (শ ś + ্ + র r + ূ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + ৃ.
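The reforms in a to c change how a syllable is drawn, not which letters it contains. As far as the author of this sketch can tell, Unicode has no separate code point for a detached u-sign, so the traditional ligature and the reformed transparent shape come from one and the same code point sequence, and the choice between them is left to the font. The Python check below assumes only the standard library. It confirms that these sequences are already in NFC, which means no normalization step can merge or split them.

```python
import unicodedata

# Each traditional ligature and its reformed 'transparent' shape share one sequence
examples = {
    "gu": "\u0997\u09C1",               # GA + VOWEL SIGN U
    "ntu": "\u09A8\u09CD\u09A4\u09C1",  # NA + HASANTA + TA + VOWEL SIGN U
    "bhruu": "\u09AD\u09CD\u09B0\u09C2",  # BHA + HASANTA + RA + VOWEL SIGN UU
    "hri": "\u09B9\u09C3",              # HA + VOWEL SIGN VOCALIC R
}

for name, seq in examples.items():
    # NFC leaves these unchanged: there are no precomposed forms to fold into
    assert unicodedata.normalize("NFC", seq) == seq
    print(name, [f"U+{ord(c):04X}" for c in seq])
```

In practice, then, the old and the new systems described above can coexist in labels without creating distinct strings.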

Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal; each traditional cluster shape is listed with its transliteration and its members, which the Bangla Akademi innovation renders in a transparent or semi-transparent shape [153]:

ব্ধ bdh (ব্ b + ধ dh), দ্ধ ddh (দ্ d + ধ dh), ন্থ nth (ন্ n + থ th), ঞ্চ ñc (ঞ্ ñ + চ c), ঞ্ছ ñch (ঞ্ ñ + ছ ch), ঞ্জ ñj (ঞ্ ñ + জ j), ক্ত kt (ক্ k + ত t), ক্র kr (ক্ k + র r), গ্ধ gdh (গ্ g + ধ dh), ঙ্ক ṅk (ঙ্ ṅ + ক k), ঙ্গ ṅg (ঙ্ ṅ + গ g), ষ্ণ ṣṇ (ষ্ ṣ + ণ ṇ), ন্ধ্র ndhr (ন্ n + ধ্ dh + র r), ণ্ড্র ṇḍr (ণ্ ṇ + ড্ ḍ + র r), ক্ত্র ktr (ক্ k + ত্ t + র r).

3.3.1 The Consonants

As per the traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'Vargas' (pronounced 'Barga' in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated to voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:

'Varga' or Set | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | ক 'K' U+0995 | খ 'KH' U+0996 | গ 'G' U+0997 | ঘ 'GH' U+0998 | ঙ 'NG' U+0999
Palatal | চ 'C' U+099A | ছ 'CH' U+099B | জ 'J' U+099C | ঝ 'JH' U+099D | ঞ 'NY' U+099E
Retroflex | ট 'TT' U+099F | ঠ 'TTH' U+09A0 | ড 'DD' U+09A1 | ঢ 'DDH' U+09A2 | ণ 'NN' U+09A3
Dental | ত 'T' U+09A4 | থ 'TH' U+09A5 | দ 'D' U+09A6 | ধ 'DH' U+09A7 | ন 'N' U+09A8
Bilabial | প 'P' U+09AA | ফ 'PH' U+09AB | ব 'B' U+09AC | ভ 'BH' U+09AD | ম 'M' U+09AE

Table 5: Varga classification of Bangla consonants
(falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasal consonants, called the five 'Vargas')
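Table 5 maps neatly onto the Unicode code chart: each varga occupies five consecutive code points, in the same column order. The only irregularity is the unassigned U+09A9 between the dental and bilabial sets. The Python sketch below relies on that layout to regenerate the table and classify a consonant. `VARGA_START`, `varga` and `classify` are hypothetical names chosen for illustration.

```python
import unicodedata

# First code point of each varga; each varga spans five consecutive code points.
# U+09A9 (between the dental and bilabial sets) is unassigned.
VARGA_START = {
    "Velar": 0x0995,
    "Palatal": 0x099A,
    "Retroflex": 0x099F,
    "Dental": 0x09A4,
    "Bilabial": 0x09AA,
}
MANNER = ("unvoiced unaspirated", "unvoiced aspirated",
          "voiced unaspirated", "voiced aspirated", "nasal")

def varga(name: str) -> list:
    """Return the five letters of a varga in Table 5 column order."""
    start = VARGA_START[name]
    return [chr(start + i) for i in range(5)]

def classify(ch: str):
    """Return (varga, manner) for a varga consonant, else None."""
    cp = ord(ch)
    for name, start in VARGA_START.items():
        if start <= cp < start + 5:
            return name, MANNER[cp - start]
    return None  # non-varga consonant, vowel, sign, etc.

for name in VARGA_START:
    row = varga(name)
    print(name, " ".join(row), unicodedata.name(row[-1]))
```

For example, `classify("ধ")` gives `("Dental", "voiced aspirated")`, and a non-varga letter such as য gives `None`.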

Non-Varga consonants:
য 'Y' U+09AF
য় 'YY' U+09DF
র 'R' U+09B0
ড় 'RR' U+09DC
ঢ় 'RH' U+09DD
ল 'L' U+09B2
শ 'SH' U+09B6
ষ 'SS' U+09B7
স 'S' U+09B8
হ 'H' U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)
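Code point tables like this one are easy to get subtly wrong. One example is writing the Devanagari HA (U+0939) where the Bengali HA (U+09B9) is meant. A quick safeguard is to check each entry's Unicode name and block. The sketch below assumes only Python's standard `unicodedata` module. The helper `in_bengali_block` is hypothetical, and the block range U+0980..U+09FF is the Unicode Bengali block.

```python
import unicodedata

# The non-varga consonants as listed in Table 6
NON_VARGA = ["\u09AF", "\u09DF", "\u09B0", "\u09DC", "\u09DD",
             "\u09B2", "\u09B6", "\u09B7", "\u09B8", "\u09B9"]

def in_bengali_block(ch: str) -> bool:
    """True if the character lies in the Unicode Bengali block U+0980..U+09FF."""
    return 0x0980 <= ord(ch) <= 0x09FF

for ch in NON_VARGA:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), in_bengali_block(ch))

# A look-alike slip: U+0939 is the Devanagari letter, outside the Bengali block
print(unicodedata.name("\u0939"), in_bengali_block("\u0939"))
```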

3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)

As stated earlier, all consonants are pronounced in isolation with an implicit vowel (the central-back /ɔ/, the neutral vowel in Bangla) assumed to be associated with them [121]. This report uses the terms 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma'7 (= 'Dāṛi' in Bangla), the latter as preferred in Unicode (cf. Unicode 3.0 and above), to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in Unicode in a sense that differs from its traditional grammatical definition, and hence it requires some explanation here. Given the importance of this document, the note is included in this LGR document so that anybody referring to it can know the proper grammatical explanation of the term. Because a special sign is needed whenever the implicit vowel is stripped off, that symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, "kill" its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants; in rare cases this process can join up to five consonants. However, any maximum number of consonants joining to form one akṣara8 can only be bounded empirically. The observation above is based on the CIIL-EMILLE corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that someone may want a generic Top-Level Domain [gTLD] that has more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work this limit will not be enforced.
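Because the LGR does not cap cluster length, any such measurement is informational only. A reviewer of test labels might still want to see the longest cluster in a label. Below is a minimal Python sketch under stated assumptions. `longest_cluster` is a hypothetical helper, and it treats U+0995..U+09B9 as the consonant range. That range is a simplification: it ignores KHANDA TA and the precomposed RRA/RHA/YYA, which NFC decomposes anyway.

```python
HASANTA = "\u09CD"

def is_consonant(ch: str) -> bool:
    # Simplified: Bengali KA..HA (U+0995..U+09B9), including a few unassigned gaps
    return 0x0995 <= ord(ch) <= 0x09B9

def longest_cluster(label: str) -> int:
    """Return the largest number of consonants joined by Hasanta in a label."""
    best = run = 0
    prev_was_hasanta = False
    for ch in label:
        if is_consonant(ch):
            # Continue the cluster only if a Hasanta joined it to the previous consonant
            run = run + 1 if prev_was_hasanta else 1
            best = max(best, run)
            prev_was_hasanta = False
        elif ch == HASANTA:
            prev_was_hasanta = True
        else:
            prev_was_hasanta = False  # a vowel sign or other mark ends the cluster
    return best

print(longest_cluster("\u0989\u09B7\u09CD\u099F\u09CD\u09B0"))  # উষ্ট্র
```

The result is meant for reporting only; in keeping with the text above, it is not a validation rule.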

3.3.3 Vowels

Separate symbols exist for all 'Swaras' or vowels in Bangla, which are pronounced independently either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called a 'kār' in Bangla, or a mātrā in Nāgarī,9 is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kārs (mātrās) for all vowels except অ (pronounced /ɔ/). The correlation is shown as follows:

7 Virāma as used here is also a misnomer according to the Indian grammatical traditions: nowhere is the mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means 'syllable' in Indian grammatical traditions.

Vowel | Corresponding vowel sign (kāra / Mātrā)
অ ‘A’ U+0985 | (none: inherent vowel)
আ ‘AA’ U+0986 | া U+09BE
ই ‘I’ U+0987 | ি U+09BF
ঈ ‘II’ U+0988 | ী U+09C0
উ ‘U’ U+0989 | ু U+09C1
ঊ ‘UU’ U+098A | ূ U+09C2
ঋ Vocalic ‘R’ U+098B | ৃ U+09C3
ৠ Vocalic ‘RR’ U+09E0 | ৄ U+09C4
ঌ Vocalic ‘L’ U+098C | ৢ U+09E2
ৡ Vocalic ‘LL’ U+09E1 | ৣ U+09E3
এ ‘E’ U+098F | ে U+09C7
ঐ ‘AI’ U+0990 | ৈ U+09C8
ও ‘O’ U+0993 | ো U+09CB
ঔ ‘AU’ U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ ‘A’ U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ ‘A’ U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ ‘A’ U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla Vowels with corresponding kāras

9 Although the term ‘Mātrā’ in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
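The correspondence in Table 7 can be expressed as a lookup table. The Python sketch below is illustrative only (names hypothetical); it cross-checks each pair against the Unicode character names with the standard `unicodedata` module, relying on the fact that each vowel letter and its sign share the same name suffix (e.g. BENGALI LETTER AA and BENGALI VOWEL SIGN AA).

```python
import unicodedata

# Independent vowel -> dependent vowel sign (kāra), as in Table 7.
VOWEL_TO_KARA = {
    "\u0985": None,      # অ A: inherent vowel, no kāra
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u09E0": "\u09C4",  # ৠ -> ৄ
    "\u098C": "\u09E2",  # ঌ -> ৢ
    "\u09E1": "\u09E3",  # ৡ -> ৣ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}


def syllable(consonant: str, vowel: str) -> str:
    """Consonant + vowel as written in Bangla: অ adds no sign."""
    kara = VOWEL_TO_KARA[vowel]
    return consonant if kara is None else consonant + kara


def names_agree() -> bool:
    """Check that each vowel letter and its sign share a name suffix."""
    for vowel, kara in VOWEL_TO_KARA.items():
        if kara is None:
            continue
        v = unicodedata.name(vowel).removeprefix("BENGALI LETTER ")
        k = unicodedata.name(kara).removeprefix("BENGALI VOWEL SIGN ")
        if v != k:
            return False
    return True


print(syllable("\u0995", "\u0986"))  # কা
```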

3.3.4 The Anusvāra, onuʃʃār (ং U+0982)

The Anusvāra or onuʃʃār in Bangla at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga (set), as in লংকা. But it often also appears in such combinations when the last member of the combination is a non-velar, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.

3.3.5 Nasalization: Candrabindu (ঁ U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cad ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].

3.3.6 Nukta (় U+09BC)

The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences (YA + Nukta, U+09AF + U+09BC), (DDA + Nukta, U+09A1 + U+09BC) and (DDHA + Nukta, U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.

It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke), the IDNA protocol requires NFC, under which they are decomposed into base letter + nukta sequences. There could be sporadic attempts or cases of writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to maintain the sanctity of the loanword. These were attempts to make the Bangla writing system work like the IPA script. This usage, however, is not found in printed Bangla.
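The effect of the composition exclusions can be checked with Python's standard `unicodedata` module. This is a minimal illustration, not part of the LGR: NFC maps each atomic letter to its base + nukta sequence and leaves that sequence unchanged, which is why only the sequences can appear in an NFC-conforming label.

```python
import unicodedata

NUKTA = "\u09BC"
# Atomic code point -> (expected NFC sequence, Unicode short name)
ATOMIC = {
    "\u09DC": ("\u09A1" + NUKTA, "RRA"),  # ড় = DDA + NUKTA
    "\u09DD": ("\u09A2" + NUKTA, "RHA"),  # ঢ় = DDHA + NUKTA
    "\u09DF": ("\u09AF" + NUKTA, "YYA"),  # য় = YA + NUKTA
}

for atomic, (sequence, short) in ATOMIC.items():
    nfc = unicodedata.normalize("NFC", atomic)
    assert nfc == sequence                                      # decomposed by NFC
    assert unicodedata.normalize("NFC", sequence) == sequence   # stays decomposed
    assert not unicodedata.is_normalized("NFC", atomic)         # atomic form is not NFC
    print(short, " ".join(f"U+{ord(c):04X}" for c in nfc))      # e.g. RRA U+09A1 U+09BC
```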

3.3.7 Visarga, biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)

The Visarga, biʃɔrgo (U+0983), is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in the Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is rarely used now, even in other languages using the Bangla script. In the case of the LGR, the Avagraha is not part of the repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)

This note is pertinent to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. Note that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to instruct the rendering of a string where the script has the option between joining and non-joining characters. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

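• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) “Satyajit” and সৎ (sat) “honest”. This was typed as follows: ta + Hasanta + ZWJ (U+200D). Since Unicode 4.1, khaṇḍa-ta is also encoded atomically as U+09CE, which is the form included in the repertoire of this LGR.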

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending upon the context. E.g. র + ্ + য has two representations in Bangla: as র্য and as র‍্য. To get the form র্য one has to type র + ্ + য, but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words demanding ya-phalā after ra, which otherwise cannot be typed (rendered), because the same order ra + hasanta + antastha ja in medial and/or final position is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent an explicit Hasanta (Halant). To avoid conjunct formation where there is an explicit hasanta before the succeeding consonant, ZWNJ is used:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The use of ZWJ and ZWNJ has been ruled out of the root zone by the [Procedure]. Since they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP.

The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
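Because ZWJ and ZWNJ are excluded from the root zone, the only thing distinguishing র‍্য (ra + ya-phalā) from র্য (repha + ya) disappears. The short Python sketch below (names hypothetical) makes this concrete and shows a simple check a label validator might apply; it is illustrative, not the normative rule.

```python
RA, VIRAMA, YA = "\u09B0", "\u09CD", "\u09AF"
ZWJ, ZWNJ = "\u200D", "\u200C"

repha_ya = RA + VIRAMA + YA          # র্য: repha on ya, as in কার্য
ra_yaphala = RA + ZWJ + VIRAMA + YA  # ra with ya-phalā, as in র‍্যাপার


def has_joiner_controls(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ (not permitted in the root zone)."""
    return ZWJ in label or ZWNJ in label


print(has_joiner_controls(ra_yaphala))          # True
print(ra_yaphala.replace(ZWJ, "") == repha_ya)  # True: only ZWJ told them apart
```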

3.3.9 Use of Ya-phalaa

Ya-phalaa sequences are the two instances in Bangla where a Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English: acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. The Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant; only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, in which the Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
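Since these two sequences are the only places where a Hasanta may follow a vowel letter, a checker can treat any other vowel + Hasanta combination as invalid. A minimal Python sketch under that reading; the function names and the vowel range (U+0985..U+0994, U+09E0, U+09E1) are assumptions for illustration, and the normative rules are the WLE rules of the LGR XML.

```python
VIRAMA, YA, SIGN_AA = "\u09CD", "\u09AF", "\u09BE"

# The only two sequences in which a Hasanta follows an independent vowel.
ALLOWED = {
    "\u0985" + VIRAMA + YA + SIGN_AA,  # অ্যা
    "\u098F" + VIRAMA + YA + SIGN_AA,  # এ্যা
}


def is_independent_vowel(ch: str) -> bool:
    cp = ord(ch)
    return 0x0985 <= cp <= 0x0994 or cp in (0x09E0, 0x09E1)


def vowel_hasanta_ok(label: str) -> bool:
    """Reject a Hasanta after a vowel letter unless it starts অ্যা or এ্যা."""
    for i in range(1, len(label)):
        if label[i] == VIRAMA and is_independent_vowel(label[i - 1]):
            if label[i - 1:i + 3] not in ALLOWED:
                return False
    return True


print(vowel_hasanta_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: True
print(vowel_hasanta_ok("\u0987\u09CD\u09AF\u09BE"))                    # ই + ্যা: False
```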

3.3.10 Formation of Ra-phalaa and Repha Sequences

This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed/preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive to class these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks into their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE Symbols, p. 47) that follow. Moreover, it is noteworthy that the REPHA can also occur with KHANDA TA. The conditions in this context of KHANDA TA are such that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
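To illustrate the concern, the sketch below enumerates the labels that differ from a given label only by swapping র (U+09B0) and ৰ (U+09F0) at positions adjacent to a Hasanta, i.e. the repha and ra-phalā slots discussed above. It is a hypothetical Python helper, not the normative variant definition; Variant Set 4 and the whole-label rules in the LGR XML govern.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA, VIRAMA = "\u09B0", "\u09F0", "\u09CD"
SWAP = {BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}


def ra_slots(label: str) -> list[int]:
    """Indices of a RA directly before (repha) or after (ra-phalā) a Hasanta."""
    slots = []
    for i, ch in enumerate(label):
        if ch not in SWAP:
            continue
        after_virama = i > 0 and label[i - 1] == VIRAMA
        before_virama = i + 1 < len(label) and label[i + 1] == VIRAMA
        if after_virama or before_virama:
            slots.append(i)
    return slots


def lookalike_labels(label: str) -> set[str]:
    """Labels that render alike because only the RA in those slots differs."""
    slots = ra_slots(label)
    out = set()
    for flips in product((False, True), repeat=len(slots)):
        chars = list(label)
        for pos, flip in zip(slots, flips):
            if flip:
                chars[pos] = SWAP[chars[pos]]
        out.add("".join(chars))
    out.discard(label)
    return out


print(lookalike_labels("\u0985\u09B0\u09CD\u0995"))  # অর্ক: one look-alike, with ৰ
```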

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially NLP and computational linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles

The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode for transcription purposes only, or for archival purposes, will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope

The code point repertoire for the root zone being a very special case at the top of the protocol hierarchy, the canvas of available characters for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart: Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol: Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire (MSR): The Root Zone LGR is the repertoire of characters that are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names; the Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters.

Example: Bangla Sign Avagraha ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.

4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF
The Augmented Backus-Naur Form (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).

5 Repertoire

The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR.

It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 8 is valid for all three languages above.

For each of the code points, language references are given in the last column, titled “References and Comment”, of Table 8, the “Code Point Repertoire”. For full coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla Code Point Chart from Maximal Starting Repertoire [MSR] Version 3. Note that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variations in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:

Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block as in Figure 1, of the code points included in the MSR the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire: Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; 09DC is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; 09DD is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; 09DF is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122] [125]

Table 8: Bangla Code Point Repertoire
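A code-point-level check against Table 8 can be sketched as follows. This is a minimal Python illustration, not the normative LGR: it rebuilds the 61 single code points of Table 8 and the three nukta sequences (rows 28, 30 and 43), requires the label to be in NFC, and applies no whole-label evaluation rules. All names are hypothetical.

```python
import unicodedata


def _span(first: int, last: int) -> set[str]:
    """All characters from code point `first` to `last`, inclusive."""
    return {chr(cp) for cp in range(first, last + 1)}


# The 61 single code points of Table 8.
REPERTOIRE = (
    _span(0x0981, 0x0983)                                        # rows 1-3: signs
    | _span(0x0985, 0x098B) | set("\u098F\u0990\u0993\u0994")    # rows 4-14: vowels
    | _span(0x0995, 0x09A8) | _span(0x09AA, 0x09B0)              # consonants
    | {"\u09B2"} | _span(0x09B6, 0x09B9)
    | _span(0x09BE, 0x09C4) | set("\u09C7\u09C8\u09CB\u09CC")    # rows 50-60: kāras
    | set("\u09CD\u09CE\u09F0\u09F1")                            # rows 61-64
)
NUKTA = "\u09BC"
# Rows 28, 30, 43: NUKTA is allowed only after DDA, DDHA and YA.
NUKTA_BASES = set("\u09A1\u09A2\u09AF")


def in_repertoire(label: str) -> bool:
    if not unicodedata.is_normalized("NFC", label):
        return False  # IDNA requires NFC
    for i, ch in enumerate(label):
        if ch == NUKTA:
            if i == 0 or label[i - 1] not in NUKTA_BASES:
                return False
        elif ch not in REPERTOIRE:
            return False
    return True


print(len(REPERTOIRE))  # 61
```

For example, `in_repertoire("বাংলা")` passes, while the atomic U+09DC (not NFC), a nukta after ক, and Avagraha U+09BD are all rejected.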

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, in the repertoire, to enable the æ sound as in English ‘bat’, ‘cat’, etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire: Exclusion

There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire in this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a stand-alone code point, since it will never be used alone. It will be used only in sequences, for the three special characters in normalized form: ড়, ঢ়, য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ়, য় respectively.

Table 10b: Excluded Code Points

5.4 The Basis of the Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Form (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The ABNF will use the following operators:

Sr. No. | Operator | Function
1 | “|” | Alternative
2 | “[ ]” | Optional
3 | “*” | Variable Repetition
4 | “( )” | Sequence Group

Table 11: The ABNF Formalism

5.4.2 The Vowel Sequence

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding by users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in the document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences such as an anusvara followed by a chandrabindu on the one hand, and a visarga followed by a chandrabindu on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. Ya-phala is a presentation form of U+09AF, the Bangla letter য or ‘ya’, represented by the sequence <U+09CD ্ BENGALI SIGN VIRAMA (Bangla Hasanta or Virama), U+09AF য BENGALI LETTER YA>. Ya-phala has a special (subjoined) form ্য. Again, when combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as the “a” in the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
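The vowel-sequence combinations of 5.4.2.1 to 5.4.2.3 can be approximated with a regular expression. The Python sketch below is an illustration under stated assumptions, not the LGR's normative rule: V is taken to be the independent vowels of Table 8, VHCM is restricted to the two sequences of Table 9, and the tail follows the combinations listed above (B, D, X, DB, DX). Names are hypothetical.

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"  # independent vowels of Table 8
B, D, X = "\u0982", "\u0981", "\u0983"          # Anusvara, Candrabindu, Visarga
TAIL = f"(?:{D}{B}|{D}{X}|{B}|{D}|{X})?"         # [B | D | X | DB | DX]
VHCM = "[\u0985\u098F]\u09CD\u09AF\u09BE"        # only অ্যা and এ্যা (Table 9)

VOWEL_SEQUENCE = re.compile(f"(?:{VHCM}|{V}){TAIL}")


def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(s) is not None


print(is_vowel_sequence("\u0985\u0981\u0982"))  # অঁং (VDB): True
print(is_vowel_sequence("\u0985\u0982\u0982"))  # অংং (VBB): False
```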

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant optionally followed by a dependent vowel sign, kāra (Mātrā) [M], or Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Example:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

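5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

1*3(CH)C

Example:
CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य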

5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] 1*3(CH)CM may further be followed by B, D, X, DB or DX.

Example:
CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
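The consonant-sequence rules of 5.4.3 can likewise be sketched as a regular expression. This Python example is illustrative only and rests on assumptions: C is the consonants of Table 8 without Khanda Ta (which has its own sequence), each of the three nukta sequences counts as one C, and the bound of three (CH) groups mirrors the ABNF of 5.4.3.4 even though Section 3.3.2 states that the LGR does not enforce a cluster limit. Names are hypothetical.

```python
import re

# Consonants of Table 8 (Khanda Ta excluded; it has its own sequence),
# with the three nukta sequences counted as single consonants.
C = "(?:[\u09A1\u09A2\u09AF]\u09BC|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])"
H = "\u09CD"
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"
B, D, X = "\u0982", "\u0981", "\u0983"
TAIL = f"(?:{D}{B}|{D}{X}|{B}|{D}|{X})?"

# C H (pure consonant) | 0-3 (C H) groups, then C [M] [B | D | X | DB | DX]
CONSONANT_SEQUENCE = re.compile(f"(?:{C}{H}|(?:{C}{H}){{0,3}}{C}{M}?{TAIL})")


def is_consonant_sequence(s: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(s) is not None


print(is_consonant_sequence("\u0995\u09CD\u0995\u09C0\u0982"))  # ক্কীং (CHCMB): True
print(is_consonant_sequence("\u0995\u09CD\u09BE"))              # ক + ্ + া: False
```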

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The conditions in this context of KHANDA TA are that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
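The Khanda-Ta rule, including the restriction of C to the two RAs, reduces to a very small pattern. A minimal Python sketch (the function name is hypothetical, and the normative form is the rule in the LGR XML):

```python
import re

# Z alone, or [CH]Z where C is restricted to র (U+09B0) or ৰ (U+09F0).
KHANDA_TA_SEQUENCE = re.compile("(?:[\u09B0\u09F0]\u09CD)?\u09CE")


def is_khanda_ta_sequence(s: str) -> bool:
    return KHANDA_TA_SEQUENCE.fullmatch(s) is not None


print(is_khanda_ta_sequence("\u09B0\u09CD\u09CE"))  # র্ৎ: True
print(is_khanda_ta_sequence("\u0995\u09CD\u09CE"))  # ক্ৎ: False
```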

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where a Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. The Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant; only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, in which the Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, referred to as P, concerns the formation of repha and ra-phalā in this script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra “cycle”). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed/preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.

6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases belonging to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akṣara formation. These can cause confusion even to a careful observer, and hence are being proposed as variants.

Variants arise in a script when two or more identical-looking forms are produced with different storage or code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
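A further reason the two typings of কো need no variant treatment is normalization: U+09CB BENGALI VOWEL SIGN O has the canonical decomposition U+09C7 U+09BE, so NFC, which IDNA requires, turns the two-key form into the single-key form. A short check with Python's standard `unicodedata` module (illustrative, not part of the LGR):

```python
import unicodedata

KA, SIGN_E, SIGN_AA, SIGN_O = "\u0995", "\u09C7", "\u09BE", "\u09CB"
one_key = KA + SIGN_O             # কো typed with the single o-kāra
two_keys = KA + SIGN_E + SIGN_AA  # কো typed as e-kāra + ā-kāra

print(one_key == two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: NFC composes them
```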

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly, thinking it was a হ (ha) U+09B9, increasing the risks in label writing and identification. Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)
Fonts which represent the traditional Bangla writing system could tend to create this problem. Therefore these may be taken as cases of variants in Bangla.

CASE II
Another interesting example of a variant is encountered in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing ‘repha’ (involving ra + Hasanta) and ‘ra-phalā’ (involving Hasanta + ra). ‘Repha’ could be formed by two sequences (mainly because both Assamese and Bangla find place in the same Unicode block, and ‘B_RA’ as well as ‘A_RA’ refer to the same phonetic element). Here the final ligatures look the same and will be as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where:

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: As Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with repha look identical; the addition of repha does not make any difference. ‘Ra-phalā’ could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could cause confusion for end-users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. However, this will create blocked variant labels: e.g. if someone registers “ররর”, the variant label “ৰৰৰ” will be generated and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. “ৰৰ” or “ৰৰৰৰ”, it is still possible.
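The effect of defining র and ৰ as variant code points can be sketched as a variant-label generator. The minimal Python example below, with a hypothetical function name, returns every label other than the applied-for one that is reachable by swapping র and ৰ at any subset of positions; under the NBGP decision described above, each of these would be allocated with a blocked disposition. It does not model the LGR XML encoding or any registry policy.

```python
from itertools import product

# র (U+09B0, BENGALI LETTER RA) and ৰ (U+09F0, BENGALI LETTER RA WITH MIDDLE DIAGONAL)
RA_SET = ("\u09B0", "\u09F0")

def ra_variant_labels(label):
    """Return every label other than `label` obtained by replacing any
    subset of its র/ৰ characters with the other member of the pair."""
    choices = [RA_SET if ch in RA_SET else (ch,) for ch in label]
    variants = {"".join(combo) for combo in product(*choices)}
    variants.discard(label)   # the applied-for label itself is not a variant
    return variants

blocked = ra_variant_labels("\u09B0\u09B0\u09B0")   # "ররর"
print(len(blocked))                                  # 7
print("\u09F0\u09F0\u09F0" in blocked)               # True: "ৰৰৰ" is blocked
print("\u09F0\u09F0" in blocked)                     # False: "ৰৰ" stays available
```

The number of variant labels grows as 2^n for n occurrences of র/ৰ, which is why a single variant set between the two code points is enough to cover every repha and ra-phalā case.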

6.2 Cross-Script Variants
A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia11 (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses Oriya for the script, although Odia is now the official term.


1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
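Tables 14 and 15 can be read as a mapping from Bangla code points to their cross-script variants. A whole Bangla label has a cross-script counterpart only when every one of its code points has a variant in the target script. The minimal Python sketch below encodes just the two table pairs, keyed by the ISO 15924 codes Deva and Guru; the function name is hypothetical, and the sketch ignores the dispositions an actual LGR would attach to such variants.

```python
# Cross-script variant pairs from Tables 14 and 15 (Bangla -> other script).
CROSS_SCRIPT_VARIANTS = {
    "Deva": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Guru": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},  # ম -> ਸ, ি -> ਿ
}

def cross_script_counterpart(bangla_label, script):
    """Map every code point of a Bangla label to its variant in `script`.

    Returns None as soon as one code point has no listed variant, because a
    whole-label counterpart exists only when every position maps.
    """
    table = CROSS_SCRIPT_VARIANTS[script]
    mapped = []
    for ch in bangla_label:
        if ch not in table:
            return None
        mapped.append(table[ch])
    return "".join(mapped)

print(cross_script_counterpart("\u09AE\u09BF", "Deva"))  # मि
print(cross_script_counterpart("\u09AE\u09BF", "Guru"))  # ਸਿ
print(cross_script_counterpart("\u0995", "Deva"))        # None
```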

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in section 3.2 when written in the Bangla12 script. The rules have been drafted in such a way that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided under Code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.


D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9), the (æ) Ya-phalā sequences (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

It is worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters”, which could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”, out of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are listed below.


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.

2. H must be preceded by C.

3. M must be preceded by C. Example: কা

4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ

5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বঃ

6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
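Rules 1 to 9 can be read as constraints on which class of unit may immediately precede each code point. The Python sketch below is one possible way to check them, under several assumptions that are not taken from the LGR XML: the code point sets are a simplified subset of the Bengali block rather than the exact Section 5.1 repertoire; S1 and S2 are taken to be অ্যা and এ্যা (the combinations listed in Appendix 10.1); S is treated like M when it precedes another unit, following the remark under rule 7 that S ends with M; and ড়, ঢ়, য় are handled in their NFC form (consonant + nukta U+09BC). All names are hypothetical.

```python
import unicodedata

# Simplified class sets assumed for this sketch (not the normative repertoire).
V_SET = set(range(0x0985, 0x098D)) | {0x098F, 0x0990, 0x0993, 0x0994, 0x09E0, 0x09E1}
C_SET = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))
         | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1})
M_SET = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC, 0x09E2, 0x09E3}
SIGNS = {0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
NUKTA = 0x09BC
NUKTA_BASES = {0x09A1, 0x09A2, 0x09AF}   # ড, ঢ, য: NFC forms of ড়, ঢ়, য় (CN)
RA_FORMS = {0x09B0, 0x09F0}              # C2 in P: র, ৰ
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা

# Rules 2-7: classes allowed immediately before each class (P for Z checked separately).
MUST_FOLLOW = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},
}

def classify(cp):
    if cp in V_SET:
        return "V"
    if cp in C_SET:
        return "C"
    if cp in M_SET:
        return "M"
    return SIGNS.get(cp)

def tokenize(label):
    """Split an NFC label into (class, code point) units, or return None."""
    text = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(text):
        if text[i:i + 4] in S_SEQUENCES:          # S is one unit
            tokens.append(("S", ord(text[i])))
            i += 4
            continue
        cp = ord(text[i])
        if cp == NUKTA:                            # nukta only completes CN
            if i == 0 or ord(text[i - 1]) not in NUKTA_BASES:
                return None
        else:
            cls = classify(cp)
            if cls is None:
                return None
            tokens.append((cls, cp))
        i += 1
    return tokens

def passes_wle(label):
    tokens = tokenize(label)
    if not tokens:
        return False
    for i, (cls, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        if prev == "S":
            prev = "M"                             # S ends with a kāra
        if cls in ("V", "S"):                      # rules 8 and 9
            if prev == "H":
                return False
        elif cls in MUST_FOLLOW:
            ok = prev in MUST_FOLLOW[cls]
            if cls == "Z" and prev == "H":         # P = র/ৰ + H before ৎ
                ok = i >= 2 and tokens[i - 2][1] in RA_FORMS
            if not ok:
                return False
    return True
```

With these assumptions, কা, আং, অ্যাসিড and ভর্ৎসনা pass, while ্ক (H at the start), কীী (M after M), কংঃ (X after B) and ক্ৎ (Z after a Hasanta that is not part of P) fail. The examples in Section 7.2 below are consequences of the same preceding-class checks.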

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures, because any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋk ʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of the sound intended for the first word.


This is a unique situation, necessitated by the lack of hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, if required by the prevailing needs of the community, a future NBGP may consider revisiting this rule.
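The code point list given for ‘Bank of India’ makes the rule 8 conflict easy to locate mechanically. The short Python sketch below (hypothetical helper name; the set of independent vowels is an assumption of the sketch, not the Section 5.1 repertoire) finds every Hasanta that is immediately followed by an independent vowel, and shows that the form without the Hasanta of অফ্, which the text says is generally considered sufficient, no longer triggers the rule.

```python
# Code points listed above for the "Bank of India" label.
BANK_OF_INDIA = [
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995,   # ব্যাঙ্ক
    0x0985, 0x09AB, 0x09CD,                                   # অফ্
    0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE,   # ইন্ডিয়া
]
HASANTA = 0x09CD
# Independent vowels assumed for this sketch.
VOWELS = set(range(0x0985, 0x098D)) | {0x098F, 0x0990, 0x0993, 0x0994, 0x09E0, 0x09E1}

def hasanta_before_vowel(label):
    """Indices i where label[i] is a Hasanta immediately followed by a vowel (rule 8)."""
    return [i for i in range(len(label) - 1)
            if ord(label[i]) == HASANTA and ord(label[i + 1]) in VOWELS]

label = "".join(map(chr, BANK_OF_INDIA))
print(hasanta_before_vowel(label))        # [9]: the ্ of অফ্ before ই

# Dropping that Hasanta gives the form that satisfies rule 8.
without_h = label[:9] + label[10:]
print(hasanta_before_vowel(without_h))    # []
```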

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক | ्क
াক | ाक
ংক | ंक
ঁক | ँक
ঃক | ःक

As can be seen, such combinations automatically result in a “golu” or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is quasi-automatically applied wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S. Example:

অ্ | अ्
অং্ | कं्
কঁ্ | कँ्
কঃ্ | कः्
কি্ | कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (mātrā) is restricted to one; thus the following combinations are invalid. Example:

কংং | कंं
কঁঁ | कँँ
কঃঃ | कःः
কাঁঁ | काँँ
কীঃঃ | कीःः
অংং | अंं
অঁঁ | अँँ
অঃঃ | अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী | कीी

5. M is not permitted after V. Example:

ইা, ঈৌ | ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible. Example:

কংঃ | कंः
কঃং | कःं
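The six example groups above can also be expressed as forbidden patterns, one regular expression per group, which is closer in spirit to an ABNF than a preceding-class check. The minimal Python sketch below uses only the standard `re` module; the character classes are simplified subsets of the Bengali block chosen for illustration, and the lookahead exempts the S sequences অ্যা and এ্যা from group 2, since S1 and S2 are valid even where other rules would forbid them. The function name is hypothetical.

```python
import re

# Character classes assumed for this sketch (subsets of the Bengali block).
H = "\u09CD"                                                # hasanta
M = "\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC\u09E2\u09E3"     # kāras
V = "\u0985-\u098C\u098F\u0990\u0993\u0994\u09E0\u09E1"     # independent vowels
B, D, X = "\u0982", "\u0981", "\u0983"                      # ং ঁ ঃ
NOT_S = "(?![\u0985\u098F]\u09CD\u09AF\u09BE)"              # not the start of অ্যা / এ্যা

# One forbidden pattern per numbered example group in Section 7.2.
FORBIDDEN = [
    (1, re.compile(rf"^[{H}{M}{B}{D}{X}]")),                 # H/M/B/D/X at the start
    (2, re.compile(rf"[{B}{D}{X}{M}]{H}|{NOT_S}[{V}]{H}")),  # H after V/B/D/X/M (S ends in M)
    (3, re.compile(rf"([{B}{D}{X}])\1")),                    # the same B, D or X repeated
    (4, re.compile(rf"[{M}][{M}]")),                         # two kāras in a row
    (5, re.compile(rf"[{V}][{M}]")),                         # kāra directly after a vowel
    (6, re.compile(rf"{B}{X}|{X}{B}")),                      # ং + ঃ or ঃ + ং
]

def abnf_violations(label):
    """Return the numbers of the Section 7.2 example groups that `label` breaks."""
    return [number for number, pattern in FORBIDDEN if pattern.search(label)]

print(abnf_violations("\u0995\u0982\u0982"))                          # [3]  কংং
print(abnf_violations("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # []   অ্যাসিড
```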

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt of Bangladesh
Prof Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof Rafiqul Islam, National Professor of Humanities, Dhaka
Prof Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt of Bangladesh


Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka


9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata / New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān Ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka) Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla13 in particular, the following rules apply:

1. Khaṇḍa-ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, though we can cite a couple of native exceptions, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍa-ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C H can come with Khanda Ta only in the case where C is ra (র, 09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
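The three restrictions above are local patterns, so they can be checked with a simple scan over the label. The Python sketch below is a minimal illustration with hypothetical names. It assumes that restriction 1 forbids only ৎ, ঞ and ঙ as the first character (the য় and ৎস cases discussed under restriction 1 are left to policy), that restriction 2 admits only র before the Hasanta (this section covers Bangla only, per footnote 13), and that restriction 3 applies to any independent vowel followed by a Hasanta; the vowel set is a simplified subset chosen for the sketch.

```python
KHANDA_TA, NYA, NGA = "\u09CE", "\u099E", "\u0999"   # ৎ, ঞ, ঙ
HASANTA, RA = "\u09CD", "\u09B0"                      # ্, র
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা
# Independent vowels assumed for this sketch.
VOWELS = {chr(cp) for cp in range(0x0985, 0x098D)} | set("\u098F\u0990\u0993\u0994\u09E0\u09E1")

def restriction_errors(label):
    """Return the numbers of the Section 10.1 restrictions that `label` breaks."""
    errors = set()
    # Restriction 1: no ৎ, ঞ or ঙ at the beginning of a label.
    if label[:1] in (KHANDA_TA, NYA, NGA):
        errors.add(1)
    for i, ch in enumerate(label):
        # Restriction 2: C + Hasanta + ৎ only when C is র.
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] != RA:
                errors.add(2)
        # Restriction 3: a vowel followed by Hasanta only as অ্যা or এ্যা.
        if ch in VOWELS and label[i + 1:i + 2] == HASANTA:
            if label[i:i + 4] not in ALLOWED_VHCM:
                errors.add(3)
    return sorted(errors)

print(restriction_errors("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # []  ভর্ৎসনা
print(restriction_errors("\u0995\u09CD\u09CE"))                          # [2] ক্ৎ
```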

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names of Sylheti

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.


Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points


11 Appendix-II: Bengali consonants and their allographs

Consonant | Phonetic value | Allographs: clusters | Allographs: transparent form (Bangla Akademi font)

প | p | প+ত, প+ন, প+প, প+য, প+র, প+ল, প+স

ফ | pʰ | ফ+র, ফ+ল

ব | b | ব+জ, ব+দ, ব+ধ, ব+ব, ব+য, ব+র, ব+ল, ব+ভ | ব+ধ, হ+ব

ভ | bʱ | ভ+য, ভ+র, ভ+ল

ত | t | ত+ত, ত+ত+য, ত+ত+ব, ত+থ, ত+ন, ত+য, ত+ম, ত+ব, ত+র, প+ত, ক+ত, ক+ত+ব. There is a marked form of ত + Hasanta, namely ৎ, as in র্ৎ (র+ৎ) | ক+ত


থ | tʰ | থ+য, থ+র, স+থ, ত+থ, ন+থ | ন+থ, স+থ

দ | d | দ+গ, দ+ঘ, দ+দ, দ+ধ, দ+য, দ+ব, দ+ভ, দ+র, ন+দ, ন+দ+র, র+দ+র | দ+গ, দ+ধ

ধ | dʱ | ধ+ন, ধ+ম, ধ+য, ধ+র, গ+ধ, দ+ধ, ন+ধ | গ+ধ, দ+ধ, ব+ধ, ন+ধ

ট | ʈ | ট+ট, ট+য, ট+ব, ট+র, ক+ট, ষ+ট

ঠ | ʈʰ | ঠ+য, ণ+ঠ, ষ+ঠ

ড | ɖ | ড+ড, ড+য, ড+র

ঢ | ɖʱ | ঢ+য, ণ+ঢ

চ | tʃ | চ+চ, চ+ছ, চ+ছ+র, চ+ঞ, চ+য, ঞ+চ, শ+চ | ঞ+চ


ছ | tʃʰ | ছ+র, চ+ছ, ঞ+ছ, শ+ছ | ঞ+ছ

জ | dʒ | জ+জ, জ+জ+ব, জ+ঝ, জ+ঞ, জ+য, জ+র, ঞ+জ | ঞ+জ

ঝ | dʒʱ | (not privileged enough to have clusters as a first member) জ+ঝ, ঞ+ঝ

ক | k | ক+ক, ক+ট, ক+ত, ক+ত+র, ক+ত+ব, ক+ন, ক+ব, ক+ম, ক+য, ক+র, ক+ষ, ক+ষ+ণ, ক+ষ+ম, ক+ষ+ব, ক+ষ+য, ক+স, ঙ+ক | ক+ত, ক+ত+র, ক+ত+ব, ক+র, ঙ+ক

খ | kʰ | (not privileged enough to have clusters as a first member) ঙ+খ


গ | g | গ+গ, গ+দ, গ+ধ, গ+ন, গ+ব, গ+ম, গ+য, গ+র, গ+ল, ঙ+গ, র+ঙ+গ | গ+ধ, ঙ+গ

ঘ | gʱ | ঘ+ন, ঘ+য, ঘ+র, ঙ+ঘ

ঞ | This letter does not have any particular phonetic value but is mostly pronounced as n | ঞ+চ, ঞ+ছ, ঞ+জ, ঞ+ঝ, জ+ঞ | ঞ+চ, ঞ+ছ, ঞ+জ, ঞ+ঝ

ণ | n | ণ+ট, ণ+ঠ, ণ+ড, ণ+ড+র, ণ+ঢ, ণ+ণ, ণ+য, ণ+ব, ক+ষ+ণ, ষ+ণ, হ+ণ | ণ+ড, ষ+ণ

ঙ, ং | ŋ | ঙ+ক, ঙ+ক+র, ঙ+খ, ঙ+গ, ঙ+ঘ, ঙ+ক+ষ (in some contexts ঙ is replaced by ং, e.g. কং, অং) | ঙ+ক, ঙ+গ, ঙ+ঘ


ম | m | ম+ল, ম+প, ম+প+র, ম+ভ, ম+ম, ম+র, ত+ম, ধ+ম, হ+ম, ক+ষ+ম | হ+ম

ন | n | ন+ট, ন+ট+র, ণ+ঠ, ন+ড, ন+ড+র, ন+ত, ন+ত+র, ন+থ, ন+দ, ন+দ+র, ন+ধ, ন+ধ+র, ন+দ+ব, ন+ন, ন+ম, ন+য, ন+স, হ+ন | ন+থ, ন+ধ

শ | ʃ | শ+চ, শ+ছ, শ+ন, শ+ম, শ+র, শ+ল, শ+য

ষ | ʃ | ষ+ক, ষ+ট, ষ+ঠ, ষ+ণ, ষ+প, ষ+প+র, ষ+ফ, ষ+ট+র, ষ+য, ক+ষ, ক+ষ+ণ, ক+ষ+ম | ষ+ণ


স | s & ʃ | স+ক, স+ট, স+প, স+ফ, স+ত, স+থ, স+খ, স+য, স+র, স+ল, ক+স | স+থ

হ | h | হ+ণ, হ+ন, হ+ম, হ+য, হ+র, হ+ল | হ+ম

ড় | ɽ | ড়+গ

ঢ় | ɽʱ | (not privileged enough to have clusters)

য | dʒ | ক+য, স+য, র+য. The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant in a word, it is the vowel æ (as in শ্যামল, র‍্যাপার etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য. [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র | r | Two manifestations:
(i) রেফ repʰa, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (earlier র্দ্ধ্ব, a four-term cluster), etc. (placed over the following consonant)
(ii) র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows)

ল | l | ল+গ, ল+প, ল+ব, ল+ম, ল+ট, ল+ড, ল+ক, ল+দ, ল+য, গ+ল, ম+ল

ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | e.g. অঃ, কঃ

অব


3 Background on Script & Principal Languages Using It
3.0 Introduction
‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language, with around 178.2 million speakers in Bangladesh (98% of the population) and 83.4 million speakers in the Indian states of West Bengal (68.37 million), Tripura (2.15 million), South Assam (7.3 million), Odisha (0.49 million) and Delhi (0.21 million), as well as in the Andaman and Nicobar Islands (close to a hundred thousand), accounting for 8.3% of India. It is a major language in Jharkhand (2.6 million) too, and a language with a sizable population in Bihar (0.44 million). Apart from these, there is a huge Bangla-speaking diaspora spread all over the world. It is the seventh largest spoken and written language in the world. Bangla is the national and official language of Bangladesh and one of the 22 official languages of India (listed in the 8th Schedule of the Indian Constitution). It is also one of the official languages of Sierra Leone. The script is also called Bangla [102]; it is an eastern variety of the ‘Brāhmī’ writing system, written from left to right. Historically, it derives from the Brāhmī alphabet as used in the Ashokan inscriptions (269-232 BC).

Bangla and its cognate languages, as mentioned above, together form a linguistic group known as Eastern New Indo-Aryan (NIA). There is a gross inadequacy of inscriptions and manuscripts in Eastern Apabhranśa or ‘Avahaṭṭha’, except for small inscriptions and the manuscripts of the Tantric Buddhist text titled ‘Caryyācaryyaviniścaya’ or the Caryā-Pada [114], dating back to the 9th-11th century. As a result, there is not much epigraphic evidence for the development of its writing system. However, whatever evidence is available of the genesis of the Bangla writing system is discussed in section 3.1 [109]. Historically, the Bangla language is divided into three periods, as evident from various sources:

(i) Firstly, the Old Bangla period (roughly 950/1000 to AD 1200/1350), of which three specimens are found: (a) 47 Caryā songs, the Dohākōṣa of Saraha and the Dohākōṣa of Kānha (mostly in Apabhranśa) and the Ḍākārṇava (in a variety of Prakrit); (b) Old Bangla specimens of over 300 words in a commentary [141].

(ii) Then there is the Middle Bangla period (1200-1800 AD), again divided into three stages: (a) Transitional Middle Bangla (1200-1300 AD, for which no genuine specimens are found) [147], (b) Early Middle Bangla (1300-1500 AD) and (c) Late Middle Bangla (1500-1800 AD).

(iii) Finally, after 1800 AD we find Modern or New Bangla, marked by the introduction of written prose [109] in the books of Fort William College (established in 1800). The colloquial variety of Bangla based on the speech


variety of Calcutta (now called ‘Kolkata’) made its first appearance through the Hutōm Pẽcāra Nakśā (1862) by Kaliprasanna Singha. The influence of English on the vocabulary, idioms and expressions, as well as on the writing styles of Bangla, is significant by this time. The fonts and types for Bangla developed during this time also spread to all parts of the Bangla speech community [101, 120]. The same fonts, with some extensions, were also used for the neighbouring languages deploying this writing system.

Bangla prose developed two literary styles during the 19th-20th centuries: the Sādhubhāṣā (সাধুভাষা, ‘elegant language or style’) and the Calitabhāṣā (চলিতভাষা, ‘current language or modern style’). It is the latter style that is prevalent in written prose today. The Language Movement in Bangladesh (then East Pakistan) began in 1948, as civil society dissented to the elimination of the Bangla script from currency and stamps, which had been in use since the British Raj. The movement reached its pinnacle in 1952 when, on 21 February, the police fired on demonstrating students and civilians, causing numerous injuries and deaths.2 Later, following the Language Movement, on 27 April 1952 the All Party National Language Committee decided to demand the establishment of an organization for the promotion of the Bengali language. The Bangla Academy, Dhaka, right from its inception in 1955, has been engaged in promoting and fostering Bangla as the lingua franca of the country, before and after independence from Pakistan in 1971. Through the various commissions and committees constituted by the Government of Bangladesh after independence in 1971 (Bāṅlādeśa Jātīya Śikṣā Kamiśana in 1972, Jātīya Śikṣā Upadeṣṭā Pariṣad in 1979, Bāṅlā Bhāṣā Bāstabāyana Sela in 1982, Bāṅlā Bhāṣā Kamiṭi in 1983, etc.3), Bangla was made the primary medium of instruction and communication in all governmental and educational activities. Through great struggle and bloodshed, the Bengalis established Bangla as an official language of the state.4

2 UNESCO declared Ekuśe February (21st February) as International Mother Language Day at its General Conference in Paris on 17 November 1999, “in recognition of the sanctity and preservation of all vernacular languages in the world”.
3 Bāṅlā Bhāṣā Kamiṭi. 1983. Bāṅlā Bhāṣā Kamiṭi Riporṭ (Report of the Bangla Bhasha Committee). Dhaka: Śikṣā, Dharma, Krīṛā O Saṅskṛti Mantraṇālaya, People's Republic of Bangladesh.
4 Chakraborty, Rajib. 2018. The Fishermen’s Community: A Language-Culture Interplay (A Study of Post-1971 Select Bangla Novels). Unpublished PhD Dissertation, Visva-Bharati.


3.1 Written Bangla
The ‘Bangla alphabet’ (বাংলা লিপি, Bānglā lipi; ISO 15924: Beng) is derived from the Brāhmī writing system, which is related to the Nāgarī (also known as Devanāgarī5) script [108] as well as to the Tirhutā writing system [106]. Considered to be the fifth most widely used writing system in the world, this combined Bangla-Asamiyā-Manipuri script (showing some variations for Asamiyā and Meitei or Bishnupriya Manipuri) (130) was used in eastern Indian Sanskrit manuscripts too. For Chakma in India and Bangladesh and for Kokborok in Tripura, it was and still is one of the scripts used. A close variant called Tirhutā (123; now available also in Unicode 10.0 as U+11480 to U+114DF; see 110) or Mithilākṣara was used for Maithili from the 14th century until the early 20th century [106]. In this context one finds a mention of ‘Sylheti Nagarī lipi’ or ‘Siloti’ (added to the Unicode Standard in March 2005 with the release of version 4.1), the details of which could be of interest only to historians and historical linguists (see 137 and 144). But Sylheti Bangla is now generally written by many in the modern-day Bangla script for all practical purposes. Originally, during the reign of the Pāla dynasty (750-1154 AD) in eastern India, and even earlier, perhaps during the Malla period (694 AD onwards), the present-day Bangla writing system took a shape comparable to the modern-day one [111, 119]. A pictorial description of the evolution from Brāhmī to the modern Bangla script can be presented in tabular form:

Modern: ক (k), জ (j), ম (m), র (r), স (s), অ (a)

Table 1: Pictorial depiction of the evolution of Brāhmī to Bangla

5 William Dwight Whitney, in his Sanskrit Grammar, unequivocally said: “This name (Devanāgarī) is of doubtful origin and value” (Whitney, William Dwight. 1994 reprint. Sanskrit Grammar. New Delhi: Motilal Banarsidass Publishers, p. 1).


The inscriptional evidence in Brāhmī is found in Archaic Brāhmī from the 3rd century BC to the 1st century BC, in Middle Brāhmī soon after (1st-3rd century AD), and then in Late Brāhmī (4th-6th century AD). This evidence can be seen in both Bangladesh and West Bengal [108] in (1) the Mahasthanagara inscriptions (Bogra district, Bangladesh; the ancient name being Pundranagara or Paundravardhanapura), (2) Brāhmī (and Kharoṣṭhī) inscriptions from lower ‘Gangetic Bengal’, and (3) copper plate inscriptions of the Imperial Guptas from the northern part of West Bengal and north-west Bangladesh, in the areas under Dharmaditya, Gopachandra and Samācāradeva (about whom one only knows from five copper plates found in Kotalipara in the Faridpur district in Bangladesh, one in Mallasarul in the Burdwan district (West Bengal) and one in Jayramapura (Balesvara district, now in Odisha)). These epigraphs from the eastern part of undivided India (dating back to the 4th-6th centuries AD) show some characteristic features of letters (especially in ম ‘ma’, ল ‘la’, শ ‘śa’, স ‘sa’ and হ ‘ha’) which led to the development of the eastern variety of the Gupta script. Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī. Noteworthy in this context are the Tippera copper plate inscription of the ‘Samatata’ rulers (139, p. 265) such as Lokanātha (dated to the latter half of the 7th century AD), the Kailan inscription of Śrīdhāraṇa Rāta, and the Asrafpur copper plates. The letters seem to hang down from wedge-shaped solid triangles, with right-hand verticals bending down at the bottom, because of which the script was described by Prinsep and Fleet as Kuṭila-lipi (literally ‘cursive writing style’), whereas the term Siddhamātrikā (as a mātrā or bar is placed over each of the letters) was used by Al Biruni (973-1048) to designate the script of northern India. The next stage of development is illustrated by the 9th-century copper plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr and Nalanda of the time of Devapāla in Bihar, and from Jagjīvanpura (Malda) of the reign of Mahendrapāla. The Siddhamātrikā (mentioned as ‘Siddham’ in Chinese sources) is said to have been prevalent in this region too, up to the end of the tenth century. Also called the Gauri (i.e. Gandi) in Pūrvadeśa or the Eastern country, it was regarded as the same script to which the appellative Proto-Bangla is given, with its characteristics in rudimentary form in the period between AD 875 and AD 1025. In some epigraphs it is considered as belonging to the second quarter of the eleventh century AD. The flattening of head-marks becomes prominent in comparison to the wedge-shaped serifs. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika in the last quarter of the eleventh century AD. It is the earliest document from this entire region which bears the letter m with a tick rising upwards. The full vowel i develops a tick at the right end of the upper horizontal bar and a curved hook below. Initial e approaches the modern Bangla character. A mature form of Proto-Bangla, the immediate precursor of

6

BanglascriptisillustratedintheinscriptionsoftheVarmanaSenaandDevarulersofthetwelfthandthirteenthcenturies[104]TheevolutionoftheBanglascript(Cf136)isalignedwiththestoryofadvancementofprintingtechnologyThefirstldquoMovabletyperdquoscriptstechnicallycreatedandusedwhileprintingNathanielBrasseyHalheds (1751-1830)1778-book titled AGrammaroftheBengalLanguageIn1785Governor-GeneralWarrenHastings(1732-1818)requestedanother civilian Charles Wilkins (1749-1836) to cut punches for Bangla printingcharactersThecurrentprintedformofBanglascriptappearedsoonafterItisgenerallyagreedthatWilkinsdevelopedBanglaprintscript[111]HepassedonthisknowledgetoPancananaKarmakara(-1804)arenownedartistinBengalLateritwasKarmakarand his family that became famous in Bangla printing technology Shepherd wasanotherassistantofWilkinsinthisdesigningofscriptwhichbecamemoreangularwithsharperturnsandedges[133]Afewarchaiclettersweremodernizedduringthe19thcentury It was standardized by Pandit Ishwar Chandra Vidyasagar when the Banglatypefontsweretobeusedtopublishona largescaleundertheCalcuttaSchoolBookSociety[116forseveralreferences]Much later in1935 theLinotypetechnique inventedbyOttmarMergenthaler(1854-1899) in 1886was introduced intoBangla printing in 1935 by the efforts of SureshChandra Majumdar (1888-1954) Rajsekhar Basu (1880-1960) Jatindra Kumar Sen(1882-1966)andhisdiscipleSushilKumarBhattacharyaandhadbegunbeingusedbytheA nandabazaraPatrikagrouplaterfollowedbyothersWithinafewyearsthemoreadvancedmonotypetechnologycametobeusedinBanglaprintingHoweverinBanglaprinting culturemonotypehas a very limited acceptance and linotype held stage tilleventuallythedigitaltechnologycameintoreplaceallearliertechniquesAllthesecouldbepresentedinatable

PERIOD | DESCRIPTION | NAMES

3rd century BC | Use of the Brāhmī and Kharoṣṭhī scripts begins in the subcontinent. Brāhmī was widely used during the reign of the Mauryan King Aśoka. In one theory, Brāhmī is based on the North Semitic alphabet, suitably modified to fit the needs of local languages; it is currently believed to have been an independent development. | Brāhmī

1st-3rd century AD | The Kuṣāṇa script, named after the Kuṣāṇa royal dynasty. | Kuṣāṇa script

4th-5th century AD | The next stage of its evolution was into the Gupta script, named after the Gupta royal dynasty. | Gupta script

7th century AD | Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī, giving rise to the Kuṭila-lipi. | Kuṭila-lipi

8th century AD | Copper-plate inscriptions are found at Khalimpur (Bangladesh) of the reign of Dharmapāla, from Monghyr and Nālandā in Bihar of the time of Devapāla, and from Jagjīvanapura in West Bengal of the reign of Mahendrapāla. | Siddhamātrikā

9th century AD until 1025 AD | Proto-Bangla characteristics develop in rudimentary forms. An important landmark in the development of the Bangla script is the Ramaganja copper-plate inscription of Mahāmāṇḍalika, from the last quarter of the eleventh century AD. | Proto-Bangla script & language

12th-13th century AD | A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is found in the inscriptions of the Varmana, Sena and Deva rulers of the twelfth and thirteenth centuries. | Matured Proto-Bangla

14th-15th century AD | The characteristics of the typical Bangla script began to develop, as can be seen in the copper-plate inscription of Vijayamāṇikya I of Tripura dated 1478 AD, which also illustrates forms of Bangla letters in the fifteenth century AD. | Modern Bangla script era begins (see Ross 1999)

16th-17th century AD | The chart of the Bangla alphabet appended to the China Monuments, published in Amsterdam in 1667, and The Code of Gentoo Law, published in London in 1776, both show a chart of the Bangla alphabet, with 16 vowel letters (including the long ৡ 'ḹ', Anusvāra and Visarga) and 34 consonants. | Printed charts of Bangla

18th-19th century AD | Charles Wilkins develops printing in Bangla in 1778, and Vidyasagar reforms it. | Bangla type fonts

Table 2: Development of the Bangla Writing System

The overall development of the Bangla script from the Kuṭila-lipi period to modern Bangla can be seen in Table 3 ([102] and [146]; see also the web page in [147]).

Table 3: Bangla Script in Different Centuries

3.2 Languages Considered
Below is a tabular representation of the languages using the Bangla script that are placed on the EGIDS scale 1-6 (see [117] for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some used the Bangla script earlier (such as Bodo) or used it in West Bengal at some point in time (Santali), but have since shifted to another writing system: Bodo is now written in Nāgarī or Devanāgarī, and Santali is written in both Nāgarī/Devanāgarī and Ol Chiki [145]. For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:

EGIDS Scale 1

EGIDS Scale 2

EGIDS Scale 3

EGIDS Scale 4

EGIDS Scale 5

EGIDS Scale 6

Bangla (Bengali)

Santali, Bodo, Riang, Khumi, Mru(ng), Asho

Lepcha, Pnar, Koda, Kora, Chak

Asamiyā (Assamese)

Koch or Rajabansi

Malto or Malpahariya

Manipuri or Meitei

Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)

Chakma, Hajong, Mundari & Kurux (of Bangladesh)

Toto, Rohingya, Tippera, Megam, Tanchangya

Usoi, Limbu, Sadri or Oraon

Bhumij or Mundari, Bawm, Chin

Table 4: Main languages in India and Bangladesh that use the Bangla script, on the EGIDS scale

3.3 Notable Features of the Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written and how typesetting in Bangla can be done. This section is followed by a section that explains the code points (and fixed code-point sequences) that show certain distinctive characteristics of Bangla and that make up the repertoire. The subsequent sections also cover the 'akshar'-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and its pronunciation.

The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to carry an accompanying 'inherent' vowel (theoretically before or after each consonant). This vowel varies between ɔ and o depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].

Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or in both of the last two positions [105].

All Bangla consonants, when pronounced in isolation, are uttered with an inherent vowel -ɔ; hence ক 'k', খ 'kh' or গ 'g' are usually pronounced as [kɔ], [kʰɔ] or [gɔ], etc. Phonologically, the Bangla vowel -ɔ corresponds to the Hindi schwa ə.

When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla, many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক্ (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ্ (ñ) + জ (j) = ঞ্জ (ñj), জ্ (j) + ঞ (ñ) = জ্ঞ (jñ), হ্ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g. প্ (p) + র (r) = প্র (pr), ষ্ (ṣ) + ট্ (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra "camel"). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph), jɔ-phala, has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল (syamala) "green", র‍্যাপার (ryapara) "wrapper", etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র্ (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry). In the case of দ্ (d) + ধ (dh), গ্ (g) + ধ (dh) and ন্ (n) + ধ (dh), the shape of the second member is changed, giving দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র্ (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta "south-west"), used mostly in classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel only applies to the final consonant of the cluster.

In consonant clusters, many consonants take a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st), etc. র has two allographs apart from its full shape: one is the 'repha', as found in র্ক (rk) and র্প (rp), and the other is the ra-phalā, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another case, where the cerebral nasal consonant sign takes an unusual shape [151].

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) [150], or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].

As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called 'allographs', with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113]:

1) Below (e.g. কু (ku), ন্ত (nta), হ্র (hra), etc.)

2) Above (e.g. চ (ca), র্ক (rka), etc.)

3) Right side (e.g. কা (kā), কং (kaṅ), etc.)

4) Left side (e.g. কে (ke))

5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki), etc.)

6) Right side and above simultaneously (e.g. কী (kī))

7) Right side and left side simultaneously (e.g. কো (ko))

8) Right side, left side and above simultaneously (e.g. কৌ (kau))

As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called 'kārs' in Bangla (also referred to as mātrā in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,

ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,

ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, and

উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,

there are some special vowel-modifier shapes of উ, as in the following combined letters (in each case the code-point sequence is the ordinary one; only the rendered shape is special):

গু (gu), rather than the regular shape of গ (g) + ু (u)

রু (ru), rather than র (r) + ু (u)

শু (śu), rather than শ (ś) + ু (u)

হু (hu), rather than হ (h) + ু (u)

ন্তু (ntu), rather than ন্ (n) + ত (t) + ু (u)

Similarly, there can be special modifiers of ঊ or long ū as well, e.g. ভ্ (bh) + র (r) + ূ (ভ্রূ bhrū "eyebrow"), শ্ (ś) + র (r) + ূ (শ্রূ śrū), and the ṛ-kār ৃ after হ (h) (হৃ hṛ), etc.

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These have attempted to reduce the number of allographs of both vowels and consonants in clusters, and the result has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, coexist, each operative in different domains.

However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations and their variations across the sister scripts belonging to the family of Brāhmī writing systems. Bangla Academy, Dhaka, published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jātīya Śikṣākrama and Pāṭhyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement on the 'homogenization of Bangla spelling' was reached by 1988. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. This was published as a spelling dictionary titled Ākādemi Bānāna Abhidhāna (1997), which was considerably more comprehensive than 'The University of Calcutta proposals' made in 1936. Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha ('transparent') and Aswaccha ('opaque' or non-transparent), even adding Ardha Swaccha ('half transparent') in between the two. Some sample examples are:

⁶ Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.

Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.

Opaque: where neither of the two can be (easily) recognized, e.g. ক্ষ kṣ (ক্ k + ষ ṣ), জ্ঞ jñ (জ্ j + ঞ ñ), ঙ্গ ṅg (ঙ্ ṅ + গ g), হ্ম hm (হ্ h + ম m).

Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In the case of three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র str (স্ s + ত্ t + র r), ষ্ট্র ṣṭr (ষ্ ṣ + ট্ ṭ + র r), etc.

Therewere in fact two types of proposals One concerned the shape of the lettersthose of consonant + vowel (CV) combinations and conjuncts which is consonant +consonantcombinationsTherewerefurthercomplexshapesiethoseofconsonant+consonant+ (consonant+) vowel (CC(CV) signs as in y (pru) or z (skru) SomedecisionsinthisareawerenecessarybecauseafewoftheCC(C)symbolsrepresentedcomplexitiesthatmadelearningthemdifficultforthechildrenTheotherdealtwiththespellings ofwords onlywithout any reference to the shapes of letters inwhich theywere written The basic objective here was lsquoone word one spellingrsquo to the greatestextentthatwaspossible[151]

Below is a statement of the most salient changes that affect consonant + vowel combinations [153]:

a. The variants of the short u (হ্রস্ব উ-কার hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So the special shape of গু (gu) is replaced by গ with the regular ু sign. The same applies to রু (ru), শু (śu) and হু (hu), and therefore to a cluster + short u sign: ন্তু (ntu), i.e. ন + ্ + ত + ু, and স্তু (stu), i.e. স + ্ + ত + ু, now take the regular ু sign.

b. The variants of long u (দীর্ঘ ঊ-কার dīrgha ū-kāra) have also been reduced: রূ (rū), ভ্রূ (bhrū: ভ bh + ্ + র r + ূ ū), দ্রূ (drū: দ d + ্ + র r + ূ ū) and শ্রূ (śrū: শ ś + ্ + র r + ূ ū) now take the regular ূ sign.

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) now takes the regular ৃ sign.

Regarding consonant + consonant + (consonant) ... + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal; for each cluster below, the Bangla Akademi innovation replaces the traditional fused shape with one in which the members remain recognizable [153]:

ব্ধ bdh (ব্ b + ধ dh); দ্ধ ddh (দ্ d + ধ dh); ন্থ nth (ন্ n + থ th); ঞ্চ ñc (ঞ্ ñ + চ c); ঞ্ছ ñch (ঞ্ ñ + ছ ch); ঞ্জ ñj (ঞ্ ñ + জ j); ক্ত kt (ক্ k + ত t); ক্র kr (ক্ k + র r); গ্ধ gdh (গ্ g + ধ dh); ঙ্ক ṅk (ঙ্ ṅ + ক k); ঙ্গ ṅg (ঙ্ ṅ + গ g); ষ্ণ ṣṇ (ষ্ ṣ + ণ ṇ); ন্ধ্র ndhr (ন্ n + ধ্ dh + র r); ণ্ড্র ṇḍr (ণ্ ṇ + ড্ ḍ + র r); ক্ত্র ktr (ক্ k + ত্ t + র r)

3.3.1 The Consonants
As per the traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'vargas' (pronounced 'barga' in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-varga group [105]. Each varga, which corresponds to the stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), running from unvoiced unaspirated to voiced aspirated (in the fourth column) and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:

'Varga' or set | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal

Velar | ক 'K' U+0995 | খ 'KH' U+0996 | গ 'G' U+0997 | ঘ 'GH' U+0998 | ঙ 'NG' U+0999

Palatal | চ 'C' U+099A | ছ 'CH' U+099B | জ 'J' U+099C | ঝ 'JH' U+099D | ঞ 'NY' U+099E

Retroflex | ট 'TT' U+099F | ঠ 'TTH' U+09A0 | ড 'DD' U+09A1 | ঢ 'DDH' U+09A2 | ণ 'NN' U+09A3

Dental | ত 'T' U+09A4 | থ 'TH' U+09A5 | দ 'D' U+09A6 | ধ 'DH' U+09A7 | ন 'N' U+09A8

Bilabial | প 'P' U+09AA | ফ 'PH' U+09AB | ব 'B' U+09AC | ভ 'BH' U+09AD | ম 'M' U+09AE

Table 5: Varga classification of Bangla consonants

(Falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasal consonants, called the five 'vargas')

Non-varga consonants:

য 'Y' U+09AF

য় 'YY' U+09DF

র 'R' U+09B0

ড় 'RR' U+09DC

ঢ় 'RH' U+09DD

ল 'L' U+09B2

শ 'SH' U+09B6

ষ 'SS' U+09B7

স 'S' U+09B8

হ 'H' U+09B9

Table 6: Non-varga consonants (not falling into any of the five categories)
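The layout of Tables 5 and 6 mirrors the Unicode Bengali block: each varga occupies five consecutive code points, with a single gap at the unassigned U+09A9, so the bilabial row starts at U+09AA. The Python sketch below, with the hypothetical names `VARGA_STARTS`, `NON_VARGA` and `varga_grid`, rebuilds the tables from that layout and checks them against the character names in Python's standard `unicodedata` module. It is an illustration only and not part of the LGR.

```python
import unicodedata

# Hypothetical helper data: first code point of each varga row (Table 5).
# U+09A9 is unassigned, so the bilabial row starts at U+09AA.
VARGA_STARTS = {
    "Velar": 0x0995,
    "Palatal": 0x099A,
    "Retroflex": 0x099F,
    "Dental": 0x09A4,
    "Bilabial": 0x09AA,
}

# Non-varga consonants of Table 6, as single code points.
NON_VARGA = [0x09AF, 0x09DF, 0x09B0, 0x09DC, 0x09DD,
             0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9]


def varga_grid():
    """Map each varga to its five consonants, from unaspirated stop to nasal."""
    return {name: [chr(start + i) for i in range(5)]
            for name, start in VARGA_STARTS.items()}


if __name__ == "__main__":
    for name, row in varga_grid().items():
        print(f"{name:<10}", "  ".join(f"{c} U+{ord(c):04X}" for c in row))
    for cp in NON_VARGA:
        print(f"U+{cp:04X}", unicodedata.name(chr(cp)))
```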

3.3.2 The Implicit Vowel Killer: Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back -ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma'⁷ (= 'Dari' in Bangla), the latter preferred in Unicode (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in Unicode in a sense that differs from its traditional definition in grammar, and hence it requires some explanation here. Given the importance of the document, this note should be part of this LGR document, so that anybody referring to it can find the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, "kill" its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants; in rare cases this process can join up to five consonants. However, the maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically. This observation is based on the CIIL-EMILLE corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that someone may want a generic Top-Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word that admits a large number of consecutive consonants is transliterated into Bangla. Hence, in the Bangla LGR work this limit will not be enforced.
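Because the Hasanta is an ordinary code point stored between the consonants, a conjunct is simply consonant, Hasanta, consonant, and so on. The minimal Python sketch below (the helper names `build_cluster` and `consonant_count` are hypothetical) builds such sequences and, in line with the decision above, imposes no upper limit on the number of consonants. Whether a font renders the result as a single ligature is a rendering matter outside the code-point sequence.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)


def build_cluster(*consonants):
    """Join consonants with Hasanta; every member but the last loses its inherent vowel."""
    if len(consonants) < 2:
        raise ValueError("a conjunct needs at least two consonants")
    return HASANTA.join(consonants)


def consonant_count(cluster):
    """Count the members of a Hasanta-joined cluster (assumes no vowel signs inside)."""
    return cluster.count(HASANTA) + 1


# ka + Hasanta + ssa gives the conjunct kṣa; sa + t + ra gives str.
ksa = build_cluster("\u0995", "\u09B7")
stra = build_cluster("\u09B8", "\u09A4", "\u09B0")
print(" ".join(f"U+{ord(c):04X}" for c in stra))  # U+09B8 U+09CD U+09A4 U+09CD U+09B0
```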

3.3.3 Vowels
Separate symbols exist for all 'swaras' or vowels in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called 'kār' in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kārs (mātrās) for all vowels except অ (pronounced -ɔ). The correlation is shown as follows:

⁷ Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
⁸ This term needs to be disambiguated: akṣara also means 'syllable' in Indian grammatical traditions.

Vowel | Corresponding vowel sign (kār / mātrā)

অ 'A' U+0985 | (none: inherent vowel)

আ 'AA' U+0986 | া U+09BE

ই 'I' U+0987 | ি U+09BF

ঈ 'II' U+0988 | ী U+09C0

উ 'U' U+0989 | ু U+09C1

ঊ 'UU' U+098A | ূ U+09C2

ঋ Vocalic 'R' U+098B | ৃ U+09C3

ৠ Vocalic 'RR' U+09E0 | ৄ U+09C4

ঌ Vocalic 'L' U+098C | ৢ U+09E2

ৡ Vocalic 'LL' U+09E1 | ৣ U+09E3

এ 'E' U+098F | ে U+09C7

ঐ 'AI' U+0990 | ৈ U+09C8

ও 'O' U+0993 | ো U+09CB

ঔ 'AU' U+0994 | ৌ U+09CC

- | ৗ U+09D7

Can appear on top of অ 'A' U+0985 or any other vowel | ঁ U+0981 Candrabindu

Can appear after অ 'A' U+0985 or any other vowel | ং U+0982 Anusvara

Can appear after অ 'A' U+0985 or any other vowel | ঃ U+0983 Visarga

After any consonant | ্ U+09CD (Hasanta)

- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kārs

⁹ Although the term 'Mātrā' in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically available in Hindi and Bangla but missing in Gujarati.
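Table 7 can be read as a mapping from each independent vowel to its kār. The Python sketch below encodes that mapping (the names `VOWEL_TO_KAR` and `with_vowel` are hypothetical) and shows that a pre-posed kār such as ি is still stored after the consonant; the visual reordering is done by the renderer. The mapping follows the table and therefore still includes rare signs such as VOCALIC L that Section 4.2.1.3 excludes from the repertoire.

```python
import unicodedata

# Independent vowel -> dependent vowel sign (kār), following Table 7.
# অ (U+0985) has no kār: it is the consonant's inherent vowel.
VOWEL_TO_KAR = {
    "\u0986": "\u09BE",  # AA
    "\u0987": "\u09BF",  # I (pre-posed when rendered)
    "\u0988": "\u09C0",  # II
    "\u0989": "\u09C1",  # U
    "\u098A": "\u09C2",  # UU
    "\u098B": "\u09C3",  # VOCALIC R
    "\u09E0": "\u09C4",  # VOCALIC RR
    "\u098C": "\u09E2",  # VOCALIC L
    "\u09E1": "\u09E3",  # VOCALIC LL
    "\u098F": "\u09C7",  # E (pre-posed when rendered)
    "\u0990": "\u09C8",  # AI
    "\u0993": "\u09CB",  # O
    "\u0994": "\u09CC",  # AU
}


def with_vowel(consonant, vowel):
    """Return consonant + vowel in memory (logical) order."""
    if vowel == "\u0985":  # inherent vowel: nothing is added
        return consonant
    return consonant + VOWEL_TO_KAR[vowel]


if __name__ == "__main__":
    ki = with_vowel("\u0995", "\u0987")  # ka + i
    print([unicodedata.name(c) for c in ki])  # ['BENGALI LETTER KA', 'BENGALI VOWEL SIGN I']
```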

3.3.4 The Anusvāra, onuʃʃār (ং, U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'nasal consonant + Hasanta + consonant' where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations with a non-velar as the last member, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
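The relationship described above, an anusvāra standing for a homorganic nasal + Hasanta + consonant, can be made concrete at the code-point level. In the Python sketch below (hypothetical names `homorganic_nasal` and `conjunct_spelling`), the homorganic nasal is found by position within the varga rows of Table 5. The sketch also shows that লংকা and লঙ্কা are distinct code-point sequences that Unicode normalization does not unify. It says nothing about which spelling is correct in a given word; as noted above, that is a matter of Bangla usage.

```python
import unicodedata

ANUSVARA = "\u0982"  # BENGALI SIGN ANUSVARA
HASANTA = "\u09CD"   # BENGALI SIGN VIRAMA
# First code point of each varga row in Table 5; the nasal is always the 5th member.
VARGA_STARTS = (0x0995, 0x099A, 0x099F, 0x09A4, 0x09AA)


def homorganic_nasal(consonant):
    """Return the nasal of the consonant's varga, or None for non-varga letters."""
    cp = ord(consonant)
    for start in VARGA_STARTS:
        if start <= cp <= start + 4:
            return chr(start + 4)
    return None


def conjunct_spelling(consonant):
    """Build nasal + Hasanta + consonant, the cluster an anusvara can stand for."""
    nasal = homorganic_nasal(consonant)
    if nasal is None:
        raise ValueError("not a varga consonant")
    return nasal + HASANTA + consonant


# লংকা (with anusvara) and লঙ্কা (with the conjunct ঙ্ক) are alternative spellings...
anusvara_form = "\u09B2" + ANUSVARA + "\u0995\u09BE"
conjunct_form = "\u09B2" + conjunct_spelling("\u0995") + "\u09BE"
# ...but they remain different strings, even after NFC normalization.
print(unicodedata.normalize("NFC", anusvara_form) == unicodedata.normalize("NFC", conjunct_form))  # False
```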

3.3.5 Nasalization: Candrabindu (ঁ, U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, with a dot inside the half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
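A quick way to check sequences like the one above is to list the code points of a word together with their Unicode names. The small Python helper below (`describe` is a hypothetical name) uses only the standard `unicodedata` module. Note that the Candrabindu is stored after the vowel sign ā that it nasalizes.

```python
import unicodedata


def describe(text):
    """List each code point of a string as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


chand = "\u099A\u09BE\u0981\u09A6"  # চাঁদ 'moon'
for line in describe(chand):
    print(line)
# U+099A BENGALI LETTER CA
# U+09BE BENGALI VOWEL SIGN AA
# U+0981 BENGALI SIGN CANDRABINDU
# U+09A6 BENGALI LETTER DA
```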

3.3.6 Nukta (়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of the 'Nukta' is otherwise completely unnatural in Bangla.

It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR procedure, however, these decisions depend on the IDNA protocol through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example, U+09DC for ড় [ṛ], U+09DD for ঢ় [ṛh] and U+09DF for য় [y], each typed as a single keystroke), the IDNA protocol requires that they be treated as two-code-point sequences (base consonant + Nukta) within the Unicode Bengali block. It may be noted that there are sporadic cases of writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to preserve the integrity of the loanword. This amounts to using the Bangla writing system like the IPA. It is, however, not in use in printed Bangla.
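The effect of the composition exclusions can be checked directly with Python's standard `unicodedata` module. The sketch below (the names `ATOMIC` and `is_nfc` are illustrative) normalizes the three atomic letters to NFC and shows that each comes back as base consonant + Nukta, which is why the LGR lists the two-code-point sequences rather than U+09DC, U+09DD and U+09DF.

```python
import unicodedata

NUKTA = "\u09BC"
# Atomic letters that are Unicode composition exclusions (NFC never recomposes them).
ATOMIC = {"\u09DC": "RRA", "\u09DD": "RHA", "\u09DF": "YYA"}


def is_nfc(label):
    """True if the label is already in Normalization Form C."""
    return unicodedata.normalize("NFC", label) == label


for ch, name in ATOMIC.items():
    nfc = unicodedata.normalize("NFC", ch)
    print(name, "->", " + ".join(f"U+{ord(c):04X}" for c in nfc))
# RRA -> U+09A1 + U+09BC
# RHA -> U+09A2 + U+09BC
# YYA -> U+09AF + U+09BC
```

In a label-validation pipeline, a check like `is_nfc` would reject a label containing U+09DF, while the equivalent sequence U+09AF U+09BC passes.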

3.3.7 Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One example is দুঃখ duḥkho "sorrow", "unhappiness" (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in the Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো'পরাণি), and it is now rarely used even in other languages using the Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ, U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)
This note is pertinent to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has the option between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) "Satyajit" and সৎ (sat) "honest". This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending on the context. E.g. র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য one types র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words demanding a ya-phalā after ra, which otherwise cannot be typed (rendered), because ra + hasanta + antastha ja in the medial and/or final position already has a different reading: it is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent an explicit Hasanta (Halant). In order to avoid conjunct formation in cases where there is an explicit hasanta before the succeeding consonant, the ZWNJ is used:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon).

The use of ZWJ and ZWNJ has been ruled out of the root zone by the [Procedure]. Because these signs are used in Bangla to create alternate renderings, their insertion can affect searching as well as NLP.

The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
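The three typing sequences described in this section differ only in invisible joiners. The Python sketch below spells them out code point by code point; `has_joiner` and `strip_joiners` are hypothetical helpers, not part of any IDNA library. It also shows why the exclusion of ZWJ/ZWNJ matters: once the joiner is removed, the ra + ya-phalā sequence becomes identical to the repha + ya sequence, so a root-zone label cannot distinguish the two renderings.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"
RA, YA, HASANTA = "\u09B0", "\u09AF", "\u09CD"

REPHA_YA = RA + HASANTA + YA          # repha on ya, as in the word kārya
RA_YAPHALA = RA + ZWJ + HASANTA + YA  # ra followed by ya-phalā, as in the word for 'wrapper'
# prākkathan with an explicit hasanta: ka + hasanta + ZWNJ + ka
PRAKKATHAN = "\u09AA\u09CD\u09B0\u09BE\u0995\u09CD" + ZWNJ + "\u0995\u09A5\u09A8"


def has_joiner(label):
    """True if the label contains ZWJ or ZWNJ (both excluded from the root zone)."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label):
    """Remove ZWJ and ZWNJ, leaving only the visible-character code points."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


print(strip_joiners(RA_YAPHALA) == REPHA_YA)  # True: the distinction is lost
```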

3.3.9 Use of Ya-phalā + ā
Ya-phalā + ā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render a ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kār is used to elicit the æ sound, as in English 'acid' অ্যাসিড, 'association' অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, in which the Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
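Since a Hasanta directly after a full vowel is exactly the deviant feature described here, it is easy to detect in a label. The Python sketch below (the names `INDEPENDENT_VOWELS` and `hasanta_after_vowel` are hypothetical) flags such positions. It is one possible way to locate the অ্যা and এ্যা cases before applying context rules, and it does not reproduce the LGR's actual rule.

```python
# Independent vowel letters of the Bengali block (assigned U+0985-U+0994, plus U+09E0, U+09E1).
INDEPENDENT_VOWELS = {chr(cp) for cp in (
    0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B, 0x098C,
    0x098F, 0x0990, 0x0993, 0x0994, 0x09E0, 0x09E1)}
HASANTA = "\u09CD"


def hasanta_after_vowel(label):
    """Return the indices where a Hasanta directly follows an independent vowel."""
    return [i for i in range(1, len(label))
            if label[i] == HASANTA and label[i - 1] in INDEPENDENT_VOWELS]


acid = "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"  # অ্যাসিড 'acid'
bat = "\u09AC\u09CD\u09AF\u09BE\u099F"              # ব্যাট 'bat': Hasanta follows a consonant
print(hasanta_after_vowel(acid), hasanta_after_vowel(bat))  # [1] []
```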

3.3.10 Formation of Ra-phalā and Repha Sequences
This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2 H), where C2 is either

09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA, Unicode name

BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with the HASANTA, either RA loses its own implicit vowel (REPHA) or the implicit vowel of the preceding consonant is suppressed (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks into their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that follow. Moreover, it is noteworthy that the REPHA can also occur with KHANDA TA. In this context of KHANDA TA, the conditions are such that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
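The ambiguity can be illustrated at the code-point level. The Python sketch below (the helpers `ra_roles` and `ra_variants` are hypothetical) finds each RA that sits next to a Hasanta, labels it as repha or ra-phalā, and enumerates the look-alike labels obtained by swapping U+09B0 and U+09F0 in those positions. The exact contexts in which the LGR declares these variants are those of Variant Set 4 in the XML file; this sketch, which treats both a following and a preceding Hasanta, is only an approximation.

```python
from itertools import product

RA_BN, RA_AS, HASANTA = "\u09B0", "\u09F0", "\u09CD"
OTHER_RA = {RA_BN: RA_AS, RA_AS: RA_BN}


def ra_roles(label):
    """Classify each RA adjacent to a Hasanta as 'repha' (RA + H) or 'ra-phala' (H + RA)."""
    roles = []
    for i, ch in enumerate(label):
        if ch not in OTHER_RA:
            continue
        if i + 1 < len(label) and label[i + 1] == HASANTA:
            roles.append((i, "repha"))
        elif i > 0 and label[i - 1] == HASANTA:
            roles.append((i, "ra-phala"))
    return roles


def ra_variants(label):
    """All other labels obtained by swapping Bangla/Assamese RA in those positions."""
    positions = [i for i, _ in ra_roles(label)]
    variants = set()
    for choice in product((False, True), repeat=len(positions)):
        chars = list(label)
        for pos, swap in zip(positions, choice):
            if swap:
                chars[pos] = OTHER_RA[chars[pos]]
        variants.add("".join(chars))
    variants.discard(label)  # keep only the look-alikes, not the original
    return variants


arka = "\u0985\u09B0\u09CD\u0995"    # অর্ক 'the sun'
chakra = "\u099A\u0995\u09CD\u09B0"  # চক্র 'cycle'
print(ra_roles(arka), ra_roles(chakra))  # [(1, 'repha')] [(3, 'ra-phala')]
```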

4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in linguistics (especially NLP and computational linguistics), literature, language, history and epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code-point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code-point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code-point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of characters available for selection as part of the Root Zone code-point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code-point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character-inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR defines the repertoire of characters to be used for creating Root Zone TLDs, which in turn constitute an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters.

Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only those characters that are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters that have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic kār forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kār corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Form (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
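The three successive filters can be pictured as a small pipeline. The Python sketch below is only a model: the block test is exact, but the second step uses Unicode general categories as a rough stand-in for IDNA2008 validity and root-zone eligibility (the real derivation is defined in RFC 5892 and the MSR), and `POLICY_EXCLUDED` holds just the code points named in this section, not the full MSR exclusion list. The function name `first_failing_filter` is hypothetical.

```python
import unicodedata

BENGALI_BLOCK = range(0x0980, 0x0A00)
# Illustrative subset only: letters/marks that this section names as excluded.
POLICY_EXCLUDED = {
    0x09BD,                                  # AVAGRAHA (not in the MSR)
    0x09E0, 0x098C, 0x09E1, 0x09E2, 0x09E3,  # VOCALIC RR, L, LL and the L/LL signs
}


def first_failing_filter(ch):
    """Return the name of the first filter that rejects ch, or None if it passes."""
    if ord(ch) not in BENGALI_BLOCK:
        return "script block"
    # Rough stand-in for IDNA2008 + root-zone rules: keep only letters and marks,
    # which drops digits, symbols and punctuation. Not the real PVALID derivation.
    if unicodedata.category(ch) not in ("Lo", "Mn", "Mc"):
        return "not a letter or mark"
    if ord(ch) in POLICY_EXCLUDED:
        return "MSR / LGR policy"
    return None


for cp in (0x0995, 0x09C4, 0x09FA, 0x09BD):
    print(f"U+{cp:04X}", first_failing_filter(chr(cp)))
```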

5 Repertoire
The Bangla writing system is represented in UNICODE using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR.
It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of the Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 8 is valid for all three languages above.
For each of the code points, language references are given in the last column, titled "References and Comment", of Table 8, the "Code Point Repertoire". For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under the Bangla Unicode points (as given in UNICODE 6.3).
Before the details are presented, however, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. Note that the shapes of the reference glyphs in the code charts below are based on one of the many available fonts and are not prescriptive, because actual fonts, both UNICODE-compatible and TrueType ones, may vary. Consider the following code point table.


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block as in Figure 1, of the code points that are included in the MSR, the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
28 | U+09A1 U+09BC (for U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; U+09DC is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
30 | U+09A2 U+09BC (for U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; U+09DD is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | U+09AF U+09BC (for U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; U+09DF is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122][125]
Table 8: Bangla Code-Point Repertoire
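Because Table 8 is maintained by hand, a quick cross-check of its character names against the Unicode Character Database can catch transcription slips before the table is turned into the XML file. The sketch below uses Python's standard `unicodedata.name`; `TABLE_8_EXCERPT` and `name_mismatches` are hypothetical names, and the dictionary holds only a few rows, not the full table.

```python
import unicodedata

# Hypothetical excerpt of Table 8: code point -> character name as printed there.
TABLE_8_EXCERPT = {
    0x0981: "BENGALI SIGN CANDRABINDU",
    0x0985: "BENGALI LETTER A",
    0x098B: "BENGALI LETTER VOCALIC R",
    0x09B0: "BENGALI LETTER RA",
    0x09C4: "BENGALI VOWEL SIGN VOCALIC RR",
    0x09CD: "BENGALI SIGN VIRAMA",
    0x09CE: "BENGALI LETTER KHANDA TA",
    0x09F0: "BENGALI LETTER RA WITH MIDDLE DIAGONAL",
    0x09F1: "BENGALI LETTER RA WITH LOWER DIAGONAL",
}


def name_mismatches(table):
    """Map each code point whose listed name differs from the UCD to (listed, actual)."""
    result = {}
    for cp, listed in table.items():
        actual = unicodedata.name(chr(cp), "<unassigned>")
        if actual != listed:
            result[cp] = (listed, actual)
    return result


print(name_mismatches(TABLE_8_EXCERPT))  # {}
```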

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. They enable the conditional inclusion of BANGLA LETTER A and E followed by BANGLA SIGN VIRAMA and BANGLA LETTER YA, in turn followed by BANGLA VOWEL SIGN AA, so as to represent the æ sound as in English 'bat', 'cat', etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]
Table 9: Sequences
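Because S1 and S2 are ordinary code point strings, a tool can locate them with plain substring search. The sketch below is illustrative: `find_sequences` is a hypothetical helper, and the check that both sequences are already in NFC (so that normalization never rewrites them) relies on Python's standard `unicodedata` module.

```python
import unicodedata

S1 = "\u0985\u09cd\u09af\u09be"  # অ্যা: LETTER A, VIRAMA, LETTER YA, VOWEL SIGN AA
S2 = "\u098f\u09cd\u09af\u09be"  # এ্যা: LETTER E, VIRAMA, LETTER YA, VOWEL SIGN AA


def find_sequences(label):
    """Return sorted (index, name) pairs for every S1/S2 occurrence in label."""
    hits = []
    for name, seq in (("S1", S1), ("S2", S2)):
        start = label.find(seq)
        while start != -1:
            hits.append((start, name))
            start = label.find(seq, start + 1)
    return sorted(hits)


# Both sequences are stable under NFC, so a normalized label keeps them intact.
assert unicodedata.normalize("NFC", S1) == S1
assert unicodedata.normalize("NFC", S2) == S2
print(find_sequences(S2 + "\u0995" + S1))  # [(0, 'S2'), (5, 'S1')]
```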

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use
Table 10: Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only inside sequences, as part of the normalized forms of the three special characters ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively
Table 10b: Excluded Code Points
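The reason the precomposed letters are unavailable, and why NUKTA appears only after three bases, can be checked with Python's standard `unicodedata` module: U+09DC, U+09DD and U+09DF are Unicode composition exclusions, so NFC always decomposes them into base plus NUKTA. In this sketch, `nfc_code_points` and `nukta_ok` are hypothetical helper names.

```python
import unicodedata

NUKTA = "\u09bc"
NUKTA_BASES = {"\u09a1", "\u09a2", "\u09af"}  # ড, ঢ, য


def nfc_code_points(text):
    """Code points of the NFC form of text, as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]


def nukta_ok(label):
    """True if every NUKTA directly follows one of ড, ঢ, য."""
    return all(
        i > 0 and label[i - 1] in NUKTA_BASES
        for i, ch in enumerate(label)
        if ch == NUKTA
    )


# Each precomposed letter decomposes to base + U+09BC under NFC,
# e.g. BENGALI LETTER RRA ['U+09A1', 'U+09BC'].
for ch in ("\u09dc", "\u09dd", "\u09df"):
    print(unicodedata.name(ch), nfc_code_points(ch))
```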

5.4 The Basis of Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:
C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:
Sr Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group
Table 11: The ABNF Formalism


5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla need not be restricted to one. Going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units; however, it is valid and possible to have formations or sequences such as an anusvara followed by a chandrabindu on the one hand, and a visarga followed by a chandrabindu on the other, as in হ্যাংচা 'hænchā' and হ্যাঃ 'hæn' respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla.
A Vowel can optionally be followed by the combination of Hasanta/Virama [H] and Consonant [C] that forms a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>; it has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA ('aa', ā), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel
Example: V অ अ

5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX], or by the combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
Examples:
VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ
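Read as a regular expression over the category letters of Section 5.4.1, the vowel-sequence combinations of 5.4.2.1 to 5.4.2.3 collapse into one pattern: the ABNF optional "[ ]" maps to `?`, the alternative "|" stays `|`, and a sequence group "( )" becomes a non-capturing group. This Python sketch assumes the label has already been converted to category letters (in real text, the HCM slot is only the Ya-phalā sequence ্যা), and it covers only the vowel sequence, not the full Whole Label Evaluation rules.

```python
import re

# V, then an optional Ya-phala group HCM, then at most one of B, D, X, DB, DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def is_vowel_sequence(categories):
    """True if the category string is exactly one vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(categories) is not None


for cats in ("V", "VDX", "VHCM", "VHCMDB", "VDD", "VBB", "VXX"):
    print(cats, is_vowel_sequence(cats))
```

The last three strings are rejected, matching the statement in 5.4.2 that VDD, VBB and VXX are invalid orthographic units.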

5.4.3 The Consonant Sequence
5.4.3.1 A Single Consonant (C)
Example: C ক क
5.4.3.2 A Consonant with Conditions
A Consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.
Example:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

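5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):
*3(CH)C
Example:
CHC ন্ত → ন + ্ + ত   न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র   न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য   न्त्र्य → न + ् + त + ् + र + ् + य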

5.4.3.5 Subsets
While considering its subsets, we take the combination CHC as the representative example; the same applies equally to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Example:
CHCM ক্কী → ক + ্ + ক + ী   क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং   क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ   क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ   क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং   क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ   क्कँः → क + ् + क + ँ + ः
[B] *3(CH)CM may further be followed by B, D, X, DB or DX.
Example:
CHCMB ক্কীং → ক + ্ + ক + ী + ং   क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ   क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ   क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং   क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ   क्काँः → क + ् + क + ा + ँ + ः
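The consonant-sequence rules of 5.4.3.1 to 5.4.3.5 can be expressed the same way, as a regular expression over category letters, with the ABNF repetition *3(CH) becoming `(?:CH){1,3}` for the cluster case. This is a sketch under the assumption that the input is already a category string for a single syllable; because 5.4.3.2 allows H only after a lone consonant, the pattern keeps the single-consonant and cluster cases as separate alternatives.

```python
import re

# Single consonant: C followed by H, or by an optional kara and at most one of
# B, D, X, DB, DX (5.4.3.1 to 5.4.3.3). Cluster: *3(CH)C with at least one CH,
# optionally followed by M and then by B, D, X, DB or DX (5.4.3.4, 5.4.3.5).
CONSONANT_SEQUENCE = re.compile(
    r"C(?:H|M?(?:DB|DX|B|D|X)?)"
    r"|(?:CH){1,3}CM?(?:DB|DX|B|D|X)?"
)


def is_consonant_sequence(categories):
    """True if the category string is exactly one consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(categories) is not None


for cats in ("CH", "CMDX", "CHCHCHC", "CHCMDB", "CHCHCHCHC", "CMM"):
    print(cats, is_consonant_sequence(cats))
```

A five-consonant cluster (CHCHCHCHC) and a doubled kāra (CMM) are rejected, in line with the four-consonant limit of 5.4.3.4 and the single-M restriction.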

5.4.4 The Khanda-Ta Sequence
5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ
5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):
[CH]Z
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: In this context of KHANDA TA, the condition is that C should be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
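The two Khanda-Ta forms of 5.4.4 are small enough to check directly on code points. In this Python sketch, `is_khanda_ta_sequence` is a hypothetical helper; it implements only Z alone or [CH]Z with C restricted to the two RA letters, as stated in the note above, and does not cover the wider context rule for Z given in Section 7.

```python
RA_LETTERS = {"\u09b0", "\u09f0"}  # Bangla RA র, Assamese RA ৰ
HASANTA = "\u09cd"
KHANDA_TA = "\u09ce"


def is_khanda_ta_sequence(seq):
    """True for Z alone, or for [CH]Z where C is one of the two RA letters."""
    if seq == KHANDA_TA:
        return True
    return (len(seq) == 3 and seq[0] in RA_LETTERS
            and seq[1] == HASANTA and seq[2] == KHANDA_TA)


print(is_khanda_ta_sequence("\u09b0\u09cd\u09ce"))  # র্ৎ -> True
print(is_khanda_ta_sequence("\u0995\u09cd\u09ce"))  # ক্ৎ -> False
```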

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case concerns the formation of repha and ra-phalā in the script, referred to as P. Owing to co-occurrence with HASANTA, RA either loses its own inherent vowel (REPHA) or suppresses the inherent vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation; they can confuse even a careful observer and are hence proposed as variants.
Variants arise in a script when two or more identical-looking forms are produced from different stored code points. In Bangla, the e-kāra, the ā-kāra and the o-kāra have different code points. One can type the o-kāra with a consonant in one go, or obtain the same result by typing the e-kāra and the ā-kāra as two separate keys. A reader cannot differentiate between the two renderings of ko (কো), one typed with a single key and the other with two different keys. However, this is not considered a case of variant, because a kāra followed by a kāra is not allowed.
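Independently of the kāra-after-kāra restriction, Unicode normalization already folds the two typings of কো together: U+09CB has the canonical decomposition U+09C7 U+09BE, so an NFC-normalized label (as IDNA requires) can only contain the single-key form. A minimal check with Python's standard `unicodedata` module:

```python
import unicodedata

one_key = "\u0995\u09cb"          # KA + VOWEL SIGN O (কো typed with one key)
two_keys = "\u0995\u09c7\u09be"   # KA + VOWEL SIGN E + VOWEL SIGN AA (two keys)

print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True
```

The same holds for ৌ (U+09CC), whose canonical decomposition is U+09C7 U+09D7.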

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

If written in traditional orthography, the above combinations could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example 1). It could also be mistyped as হ (ha, U+09B9), increasing the risk of errors in label writing and identification. Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in গ্রন্থ grantha)
Fonts which represent the traditional Bangla writing system tend to create this problem. Therefore these may be taken as cases of variants in Bangla.
CASE II
Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla share the same UNICODE code points and 'B_RA' and 'A_RA' refer to the same phonetic element. Here the final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C
where
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)
Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ
Table 12: Example of Repha


Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with Repha look identical; the addition of Repha makes no visible difference. 'Ra-phalā' can be formed by two sequences on similar grounds, and the final ligatures would look the same:
(1) C1 + H + B_RA
(2) C1 + H + A_RA
where C1 = any consonant except Khanda-ta
Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ
Table 13: Example of Ra-phala

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g. if someone registers "ররর", the label "ৰৰৰ" will be generated as a variant and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, that is still possible.
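The label-level effect of declaring র and ৰ mutual variants can be illustrated by enumerating substitutions. The Python sketch below assumes, as LGR tools usually do, that every combination of substitutions is generated, so "ররর" yields seven variant labels (including "ৰৰৰ"), all blocked, while labels of a different length are unaffected. `variant_labels` is a hypothetical helper, not part of the XML LGR.

```python
from itertools import product

# র (U+09B0) and ৰ (U+09F0) are mutual variants (Section 6.1, CASE II).
RA_VARIANTS = {"\u09b0": ("\u09b0", "\u09f0"), "\u09f0": ("\u09f0", "\u09b0")}


def variant_labels(label):
    """Every label reachable by swapping any subset of RA letters, minus the original."""
    choices = [RA_VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


variants = variant_labels("\u09b0\u09b0\u09b0")           # ররর
print(len(variants), "\u09f0\u09f0\u09f0" in variants)   # 7 True
```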

6.2 Cross-Script Variants
A detailed cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.2) for reference.

¹¹ Unicode uses "Oriya" for the script, although "Odia" is now the official term.


1. Bangla and Nāgarī (Devanāgarī) Script
Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F
Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script
Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F
Table 15: Bangla and Gurmukhī cross-script variant code points
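Since root-zone labels are single-script, a Bangla label has a cross-script counterpart only when every one of its code points has a listed pair. The Python sketch below encodes Tables 14 and 15 as dictionaries and returns the whole-label variant, or `None` when any code point lacks a pair; `cross_script_variant` is a hypothetical helper, and the real mapping would live in the XML LGR files.

```python
# Cross-script variant pairs from Tables 14 and 15.
BENGALI_TO_DEVANAGARI = {"\u09ae": "\u092e", "\u09bf": "\u093f"}  # ম -> म, ি -> ि
BENGALI_TO_GURMUKHI = {"\u09ae": "\u0a38", "\u09bf": "\u0a3f"}    # ম -> ਸ, ি -> ਿ


def cross_script_variant(label, mapping):
    """Whole-label variant in the other script, or None if a code point has no pair."""
    if label and all(ch in mapping for ch in label):
        return "".join(mapping[ch] for ch in label)
    return None


print(cross_script_variant("\u09ae\u09bf", BENGALI_TO_DEVANAGARI))  # मि
print(cross_script_variant("\u0995\u09bf", BENGALI_TO_DEVANAGARI))  # None
```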

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table under Code Point Repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)
Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].
7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, below are the specific WLE rules that need to be implemented.


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C.
Example: ক্
3. M must be preceded by C.
Example: কা

4. D must be preceded by either V, C or M.
Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D.
Example: উঃ, খঃ, বঃ, দুঃ
6. B must be preceded by either V, C, M or D.
Example: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P.
Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may therefore also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
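Rules 1 to 9 are all "what may precede what" constraints, so they can be checked in one left-to-right pass once a label is split into the symbols of Table 16. The Python sketch below is a non-normative illustration: the category sets are transcribed from Table 8, S is matched before single code points, a base plus NUKTA counts as one C (rule 1), and S is treated as M for whatever follows it (as the note to rule 7 says). The names `tokenize` and `is_valid_label`, and the choice to reject non-NFC input, are assumptions of this sketch, not part of the XML LGR.

```python
import unicodedata

# Category tables transcribed from Table 8 (a sketch, not the normative XML LGR).
VOWELS = {0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
          0x098F, 0x0990, 0x0993, 0x0994}
CONSONANTS = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))
              | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1})
MATRAS = {0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3, 0x09C4,
          0x09C7, 0x09C8, 0x09CB, 0x09CC}
SIGNS = {0x09CD: "H", 0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CE: "Z"}
NUKTA, NUKTA_BASES = 0x09BC, {0x09A1, 0x09A2, 0x09AF}   # CN = ড় ঢ় য়
S_SEQUENCES = {(0x0985, 0x09CD, 0x09AF, 0x09BE),        # S1
               (0x098F, 0x09CD, 0x09AF, 0x09BE)}        # S2
RA = {0x09B0, 0x09F0}                                   # C2 in symbol P

# Rules 2 to 7: categories allowed immediately before each symbol.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}


def tokenize(label):
    """Split a label into (category, code point) tokens; None if out of repertoire."""
    cps = [ord(ch) for ch in label]
    tokens, i = [], 0
    while i < len(cps):
        if tuple(cps[i:i + 4]) in S_SEQUENCES:
            tokens.append(("S", cps[i]))
            i += 4
            continue
        cp = cps[i]
        if cp in CONSONANTS:
            tokens.append(("C", cp))
            # Rule 1: base + NUKTA is a single C (one of the CN forms).
            if cp in NUKTA_BASES and i + 1 < len(cps) and cps[i + 1] == NUKTA:
                i += 1
        elif cp in VOWELS:
            tokens.append(("V", cp))
        elif cp in MATRAS:
            tokens.append(("M", cp))
        elif cp in SIGNS:
            tokens.append((SIGNS[cp], cp))
        else:
            return None  # excluded code point, or a NUKTA not after ড/ঢ/য
        i += 1
    return tokens


def is_valid_label(label):
    """Apply the Section 7.1 rules to an NFC label (sketch)."""
    if not label or unicodedata.normalize("NFC", label) != label:
        return False
    tokens = tokenize(label)
    if tokens is None:
        return False
    for k, (cat, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k > 0 else None
        if prev == "S":
            prev = "M"  # S ends with M, so it behaves like M for what follows
        if cat == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA:
            prev = "P"  # RA + HASANTA before KHANDA TA
        if cat in ("V", "S") and prev == "H":
            return False  # rules 8 and 9
        if cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            return False  # rules 2 to 7 (also forbids these symbols at the start)
    return True


print(is_valid_label("\u09ad\u09b0\u09cd\u09ce\u09b8\u09a8\u09be"))  # ভর্ৎসনা -> True
print(is_valid_label("\u0995\u0982\u0982"))                          # কংং -> False
```

Because rules 4 to 6 list only V, C, M (and D) as predecessors, the same pass also rejects the doubled B, D or X and the B/X combinations given as invalid examples in Section 7.2.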

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in including such ligatures, because any user can easily create them even if they are not attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE), meaning 'Bank of India'. This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.


This is a unique situation, necessitated by the absence of the hyphen, space or Zero Width Non-Joiner character from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, if required by the prevailing needs of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF
Below are a few examples which help in understanding some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word.
Example:
্ক   ्क
াক   ाक
ংক   ंक
ঁক   ँक
ঃক   ःक
As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.
Example:
অ্   अ्
অং্   अं्
কঁ্   कँ्
কঃ্   कः्
কি্   कि्

3. The number of B, D or X permitted after a Consonant, a Vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalid:
Example:
কংং   कंं
কঁঁ   कँँ
কঃঃ   कःः
কাঁঁ   काँँ
কীঃঃ   कीःः
অংং   अंं
অঁঁ   अँँ
অঃঃ   अःः

4. The number of M permitted after a Consonant is restricted to one.
Example: কীী   कीी
5. M is not permitted after V.
Example: ইা, ঈৗ   ईा, ईौ
6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Example:
কংঃ   कंः
কঃং   कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon-Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka


9 References
[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.
[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano O Prakashan. Kolkata: Ananda Publishers.
[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.
[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.
[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.
[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.
[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.
[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.
[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.
[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.
[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.
[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.
[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.
[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.
[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.
[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.
[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.
[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Björn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.
[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.
[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.
[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.
[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.
[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.
[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.
[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.
[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.
[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.
[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.
[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.
[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.
[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.
[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.
[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. ‘Bangla Script: A Structural Study’. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. (1957). ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. (2015). ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. (1960). ‘The Phonemes of Bengali’. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. (2007). Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. (1966). Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. (1960). A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). (1942). Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. (1975). Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). (2014). Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: The Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. (2012). Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face-to-Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 - Core Specification, Chapter 12, p. 473.


10. Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichu-chor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. A CH (consonant + halant) sequence can precede Khaṇḍata only when C is ra (র, U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM (vowel + halant + consonant + mātrā) will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
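The three restrictions can be prototyped as a simple label check. The sketch below is illustrative only and rests on several assumptions: labels are NFC-normalized Unicode strings, “CH” is read as consonant + halant (U+09CD) and “VHCM” as vowel + halant + consonant + mātrā, and the function name `check_label` is hypothetical. It is not the normative Whole Label Evaluation logic of the LGR XML file, and the informal remarks about word-initial য় are deliberately not enforced.

```python
VIRAMA = "\u09CD"      # ্  BENGALI SIGN VIRAMA (halant / hasanta)
KHANDA_TA = "\u09CE"   # ৎ  BENGALI LETTER KHANDA TA
RA = "\u09B0"          # র  BENGALI LETTER RA
YA = "\u09AF"          # য  BENGALI LETTER YA
AA_SIGN = "\u09BE"     # া  BENGALI VOWEL SIGN AA
A, E = "\u0985", "\u098F"  # অ, এ

# Rule 1 (as read here): letters that may not begin a label (ৎ, ঞ, ঙ).
NOT_LABEL_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}
# Independent vowel letters from অ (U+0985) through ঔ (U+0994).
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}


def check_label(label: str) -> list[str]:
    """Return the rule violations found in a label (empty list if it passes)."""
    problems = []
    if label and label[0] in NOT_LABEL_INITIAL:
        problems.append(f"rule 1: label may not start with {label[0]}")
    for i, ch in enumerate(label):
        # Rule 2: consonant + halant before ৎ is allowed only when the consonant is র.
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == VIRAMA and label[i - 2] != RA:
            problems.append(f"rule 2: halant + ৎ after {label[i - 2]} at position {i - 2}")
        # Rule 3: vowel + halant is allowed only as অ/এ followed by য + া (অ্যা, এ্যা).
        if ch == VIRAMA and i >= 1 and label[i - 1] in INDEPENDENT_VOWELS:
            if label[i - 1] not in (A, E) or label[i + 1:i + 3] != YA + AA_SIGN:
                problems.append(f"rule 3: disallowed vowel + halant at position {i - 1}")
    return problems


if __name__ == "__main__":
    for word in ["ভর্ৎসনা", "অ্যাসিড", "ৎসেরিং"]:
        print(word, check_label(word) or "ok")
```

Returning a list of messages rather than a boolean keeps each rule visible, which is convenient when comparing the sketch against the test labels accompanying the proposal.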

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhi distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points
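Because these pairs were judged confusable but not variants, an implementation might still want to surface them, for example when reviewing applied-for labels. A minimal sketch, assuming a hypothetical helper `lookalikes` and hard-coding only the “Confusable” rows of Tables 18, 19 and 21 (the distinguishable pairs of Tables 20 and 22 are deliberately left out):

```python
import unicodedata

# Bangla code point -> cross-script look-alikes judged "Confusable" in Tables 18, 19, 21.
CONFUSABLE = {
    "\u0983": ["\u0903"],            # ঃ  ~ Devanagari visarga
    "\u0993": ["\u0909", "\u0B13"],  # ও  ~ Devanagari उ, Oriya ଓ
    "\u0998": ["\u0918", "\u0A2C"],  # ঘ  ~ Devanagari घ, Gurmukhi ਬ
    "\u0981": ["\u0945", "\u0A71"],  # candrabindu ~ Devanagari candra e, Gurmukhi addak
}


def lookalikes(label: str) -> list[tuple[int, str, list[str]]]:
    """Positions in a Bangla label whose character has a listed cross-script look-alike."""
    return [(i, ch, CONFUSABLE[ch]) for i, ch in enumerate(label) if ch in CONFUSABLE]


if __name__ == "__main__":
    for i, ch, others in lookalikes("ঘর"):
        print(i, unicodedata.name(ch), "->", [unicodedata.name(o) for o in others])
    # 0 BENGALI LETTER GHA -> ['DEVANAGARI LETTER GHA', 'GURMUKHI LETTER BA']
```

The result only flags positions for human review; it does not block or vary the label, which matches the NBGP decision not to define these as variants.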


11. Appendix-II: Bengali Consonants and Their Allographs

Consonants   Phonetic Value   Allographs

Clusters   Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s, ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র্ + য) [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala + a-kara, constitute a vowel sign representing the vowel অ্যা]


র r Two manifestations:

i) রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র্ + ধ্ + ব) (earlier র্দ্ধ্ব = র্ + দ্ + ধ্ + ব, a four-term cluster), etc. (placed over the following consonant);

ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃ, কঃ

অব


variety of Calcutta (now called ‘Kolkata’) made its first appearance through the Hutōm Pẽcāra Nakśā (1862) by Kaliprasanna Singha. By this time the influence of English on the vocabulary, idioms and expressions, as well as on the writing styles of Bangla, is significant. The fonts and types for Bangla developed during this period also spread to all parts of the Bangla speech community [101, 120]. The same fonts, with some extensions, were also used for the neighbouring languages deploying this writing system.

Bangla prose developed two literary styles during the 19th-20th century: the Sādhubhāṣā (সাধুভাষা, ‘elegant language or style’) and the Calitabhāṣā (চলিতভাষা, ‘current language or modern style’). It is the latter style that prevails in written prose today. The Language Movement in Bangladesh (then East Pakistan) began in 1948, when civil society dissented against the elimination of the Bangla script from currency and stamps, where it had been in use since the British Raj. The movement reached its pinnacle in 1952 when, on 21 February, the police fired on demonstrating students and civilians, causing numerous injuries and deaths.² Later, following the Language Movement, on 27 April 1952 the All Party National Language Committee decided to demand the establishment of an organization for the promotion of the Bengali language. The Bangla Academy, Dhaka, right from its inception in 1955, has been engaged in promoting and fostering Bangla as the lingua franca of the country, both before and after independence from Pakistan in 1971. Through the various commissions and committees constituted by the Government of Bangladesh after independence in 1971 (Bāṅlādeśa Jātīya Śikṣā Kamiśana in 1972, Jātīya Śikṣā Upadeṣṭā Pariṣad in 1979, Bāṅlā Bhāṣā Bāstabāyana Sela in 1982, Bāṅlā Bhāṣā Kamiṭi in 1983, etc.³), Bangla was made the primary medium of instruction and communication in all governmental and educational activities. Through great struggle and bloodshed, the Bengalis established Bangla as an official language of the state.⁴

2 The UN declared Ekuśe February (21st February) as International Mother Language Day at the UNESCO General Conference in Paris on 17 November 1999, “in recognition of the sanctity and preservation of all vernacular languages in the world” [22].

3 Bāṅlā Bhāṣā Kamiṭi. 1983. Bāṅlā Bhāṣā Kamiṭi Riporṭ (Report of the Bangla Bhasha Committee). Dhaka: Śikṣā, Dharma, Krīṛā o Saṅskṛti Mantraṇālaya, People’s Republic of Bangladesh.

4 Chakraborty, Rajib. 2018. The Fishermen’s Community: A Language-Culture Interplay (A Study of Post-1971 Select Bangla Novels). Unpublished PhD dissertation, Visva-Bharati.


3.1 Written Bangla
The ‘Bangla alphabet’ (বাংলা লিপি, Bānglā lipi; ISO 15924: Beng) is derived from the Brāhmī writing system, which is related to the Nāgarī (also known as Devanāgarī⁵) script [108] as well as to the Tirhutā writing system [106]. Considered to be the fifth most widely used writing system in the world, this combined Bangla-Asamiyā-Manipuri script (showing some variations for Asamiyā and for Meitei or Bisnupriya Manipuri) (130) was also used in eastern Indian Sanskrit manuscripts. For Chakma in India and Bangladesh, and for Kokborok in Tripura, it was, and still is, one of the scripts in use. A close variant called Tirhutā (123; now also available in Unicode 10.0 as U+11480-U+114DF; see 110), or Mithilākṣara, was used for Maithili from the 14th century until the early 20th century [106]. In this context one also finds mention of ‘Sylheti Nagarī lipi’ or ‘Siloti’ (added to the Unicode Standard in March 2005 with the release of version 4.1), the details of which are likely to interest only historians and historical linguists (see 137 and 144). For all practical purposes, Sylheti Bangla is now generally written in the modern-day Bangla script. During the reign of the Pāla dynasty (750-1154 AD) in eastern India, and perhaps even earlier during the Malla period (694 AD onwards), the Bangla writing system acquired a shape comparable to the modern-day one [111, 119]. The evolution from Brāhmī to the modern Bangla script is depicted in tabular form below.

Modern | ক | জ | ম | র | স | অ
Transliteration | k | j | m | r | s | a

Table 1: Pictorial depiction of the evolution of Brāhmī to Bangla

5 William Dwight Whitney, in his Sanskrit Grammar, unequivocally said: “This name (Devanāgarī) is of doubtful origin and value” (Whitney, William Dwight. 1994 reprint. Sanskrit Grammar. New Delhi: Motilal Banarsidass Publishers, p. 1).


The inscriptional evidence in Brāhmī is found in Archaic Brāhmī from the 3rd century BC to the 1st century BC, in Middle Brāhmī soon after (1st-3rd century AD), and then in Late Brāhmī (4th-6th century AD). This evidence can be seen in both Bangladesh and West Bengal [108] in: (1) the Mahasthanagara inscriptions (Bogra district, Bangladesh; the ancient name being Pundranagara or Paundravardhanapura); (2) Brāhmī (and Kharoṣṭhī) inscriptions from lower ‘Gangetic Bengal’; and (3) copper plate inscriptions of the Imperial Guptas from the northern part of West Bengal and north-west Bangladesh, in the areas under Dharmaditya, Gopachandra and Samācāradeva (about whom one knows only from five copper plates found in Kotalipara in the Faridpur district in Bangladesh, one in Mallasarul in the Burdwan district (West Bengal) and one in Jayramapura (Ballesvara district, now in Odisha)). These epigraphs from the eastern part of undivided India (dating back to the 4th-6th centuries AD) show some characteristic features of letters (especially in ম ‘ma’, ল ‘la’, শ ‘sa’, স ‘sa’ and হ ‘ha’) which led to the development of the eastern variety of the Gupta script. Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī. In this context one may cite the Tippera copper plate inscription of the ‘Samatata’ rulers (139, p. 265) such as Lokanātha (dated to the latter half of the 7th century AD), the Kailan inscription of Śrīdharaṇa Rāta, and the Ashrafpur copper plates. The letters seem to hang down from wedge-shaped solid triangles, with right-hand verticals bending down at the bottom, which is why Prinsep and Fleet described the script as Kuṭila-lipi (literally ‘cursive writing style’), whereas Al-Biruni (973-1048) used the term Siddhamātrikā (as a mātrā or bar is placed over each of the letters) to designate the script of northern India. The next stage of development is illustrated by the 9th-century copper plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr and Nalanda in Bihar of the time of Devapāla, and from Jagjīvanpur (Malda) of the reign of Mahendrapāla. The Siddhamātrikā (mentioned as ‘Siddham’ in Chinese sources) is said to have been prevalent in this region too up to the end of the tenth century. Also called the Gauri (i.e. Gauḍī) in Pūrvadeśa, the Eastern country, it was regarded as the same script that is given the appellative Proto-Bangla, showing characteristics in rudimentary forms in the period between AD 875 and AD 1025. In some epigraphs it is considered as belonging to the second quarter of the eleventh century AD. Flattening of head-marks becomes prominent in comparison to the wedge-shaped serifs. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika from the last quarter of the eleventh century AD. It is the earliest document from this entire region which bears the letter m with a tick rising upwards. The full vowel i develops a tick at the right end of the upper horizontal bar and a curved hook below. Initial e approaches the modern Bangla character. A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is illustrated in the inscriptions of the Varman, Sena and Deva rulers of the twelfth and thirteenth centuries [104].

The evolution of the Bangla script (cf. 136) is aligned with the advancement of printing technology. The first ‘movable type’ script was technically created and used for printing Nathaniel Brassey Halhed’s (1751-1830) 1778 book titled A Grammar of the Bengal Language. In 1785 Governor-General Warren Hastings (1732-1818) requested another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after. It is generally agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge to Pancanana Karmakara (d. 1804), a renowned artist in Bengal; later it was Karmakara and his family who became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in designing the script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th century. The script was standardized by Pandit Ishwar Chandra Vidyasagar when the Bangla type fonts were to be used for large-scale publishing under the Calcutta School Book Society [116, for several references]. Much later, the Linotype technique invented by Ottmar Mergenthaler (1854-1899) in 1886 was introduced into Bangla printing in 1935 through the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya; it was first used by the Ānandabāzār Patrikā group and later by others. Within a few years the more advanced Monotype technology came to be used in Bangla printing. However, Monotype found very limited acceptance in Bangla printing culture, and Linotype held the stage until digital technology eventually replaced all earlier techniques. These stages are summarized in the following table.

PERIOD | DESCRIPTION | NAMES
3rd century BC | Use of the Brāhmī and Kharoṣṭhī scripts begins in the subcontinent. Brāhmī was widely used during the reign of the Mauryan king Aśoka. In one theory Brāhmī is based on the North Semitic alphabet, suitably modified to fit the needs of local languages; it is currently believed to have been an independent development. | Brāhmī
1st-3rd century AD | The Kusana script, named after the Kusana royal dynasty. | Kusana script
4th-5th century AD | The next stage of its evolution was into the Gupta script, named after the Gupta royal dynasty. | Gupta script
7th century AD | Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī, giving rise to the Kuṭila-lipi. | Kuṭila-lipi
8th century AD | Copper plate inscriptions are found in Khalimpur, Bangladesh, of the reign of Dharmapāla; in Monghyr and Nālandā in Bihar, of the time of Devapāla; and in Jagjīvanapura in West Bengal, of the reign of Mahendrapāla. | Siddhamātrikā
9th century AD until 1025 AD | Proto-Bangla characteristics develop in rudimentary forms. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika, dated to the last quarter of the eleventh century AD. | Proto-Bangla script & language
12th-13th century AD | A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is found in the inscriptions of the Varman, Sena and Deva rulers of the twelfth and thirteenth centuries. | Matured Proto-Bangla
14th-15th century AD | The characteristics of the typical Bangla script began to develop, as can be seen in the copper plate inscription of Vijayamāṇikya I of Tripura dated 1478 AD, which also illustrates forms of Bangla letters in the fifteenth century AD. | Modern Bangla script era begins (see Ross 1999)
16th-17th century AD | The chart of the Bangla alphabet appended to the China Monuments, published in Amsterdam in 1667, and A Code of Gentoo Laws, published in London in 1776, both show a chart of the Bangla alphabet with 16 vowel letters (including long ‘ৡ’ ḹ, anusvāra and visarga) and 34 consonants. | Printed charts of Bangla
18th-19th century AD | Charles Wilkins develops printing in Bangla in 1778, and Vidyasagar reforms it. | Bangla type fonts

Table 2: Development of the Bangla writing system

The overall development of the Bangla script from the Kuṭila-lipi period to modern Bangla can be seen in Table 3 ([102] and [146]; see also the web page in [147]).

Table 3: Bangla script in different centuries

3.2 Languages Considered
Below is a tabular representation of the languages using the Bangla script that are placed on EGIDS Scale 1-6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some used the Bangla script earlier (such as Bodo) or used it in West Bengal at some point in time (Santali), but have since shifted to another writing system: Bodo is now written in Nāgarī or Devanāgarī, and Santali is written in both Nāgarī/Devanāgarī and Ol Chiki (145). For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:

EGIDS Scale 1 | EGIDS Scale 2 | EGIDS Scale 3 | EGIDS Scale 4 | EGIDS Scale 5 | EGIDS Scale 6

Bangla (Bengali)

Santali, Bodo, Riang, Khumi, Mru(ng), Asho

Lepcha, Pnar, Koda, Kora, Chak

Asamiyā (Assamese)

Koch or Rajabansī

Malto or Malpahariya

Manipuri or Meitei

Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)

Chakma, Hajong, Mundari & Kurux (of Bangladesh)

Toto, Rohingya, Tippera, Megam, Tanchangya

Usoi, Limbu, Sadri or Oraon

Bhumij or Mundari, Bawm, Chin

Table 4: Main languages in India and Bangladesh that use the Bangla script, on the EGIDS scale

3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written, or how typesetting in Bangla can be done. This section is followed by a section that explains the code points (and fixed code point sequences) which show certain distinctive characteristics of Bangla and which make up the repertoire. The next sections also cover the ‘akshar’-formation rules (ABNF) showing character classes, the Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and its pronunciation.

The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to contain an accompanying ‘inherent’ vowel (theoretically before or after each consonant). This vowel varies between ɔ and o depending on the position of the consonant in the word. At times these ‘assumed’ or ‘inherent’ vowels are not pronounced at all [142].

Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or on both sides of it (before and after) [105].

All Bangla consonants, when pronounced in isolation, are uttered with the inherent vowel ɔ; hence ক ‘k’, খ ‘kh’ or গ ‘g’ are usually pronounced as [kɔ], [khɔ] or [gɔ], etc. Phonologically, the Bangla vowel ɔ corresponds to the Hindi schwa ə.
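The inherent vowel is implicit in the Unicode encoding as well: a bare consonant code point is read with ɔ unless it is followed by a vowel sign or by the virama (U+09CD), which suppresses the vowel. The toy transliterator below makes that rule concrete. It is only a sketch: the three-consonant table and the name `transliterate` are hypothetical, and it ignores both the ɔ/o alternation and the inherent-vowel deletion mentioned above.

```python
VIRAMA = "\u09CD"  # ্ suppresses the inherent vowel
CONSONANTS = {"\u0995": "k", "\u0996": "kh", "\u0997": "g"}  # ক খ গ only (toy table)
VOWEL_SIGNS = {"\u09BE": "a", "\u09BF": "i", "\u09C1": "u"}  # া ি ু
INHERENT = "ɔ"


def transliterate(text: str) -> str:
    """Toy reading of ক/খ/গ sequences, adding the inherent vowel where it applies."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            # Neither a vowel sign nor a virama follows: the inherent vowel is read.
            if nxt != VIRAMA and nxt not in VOWEL_SIGNS:
                out.append(INHERENT)
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch != VIRAMA:
            raise ValueError(f"not in the toy table: {ch!r}")
    return "".join(out)


print(transliterate("কখগ"))  # kɔkhɔgɔ
print(transliterate("কি"))    # ki (ি is drawn before ক but stored after it)
```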

When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক্ (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ্ (ñ) + জ (j) = ঞ্জ (ñj), জ্ (j) + ঞ (ñ) = জ্ঞ (jñ), হ্ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g. প্ (p) + র (r) = প্র (pr), ষ্ (ṣ) + ট্ (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra ‘camel’). য (y), when used as a primary symbol, represents jɔ in Bangla, but its secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word it is the vowel æ (as in শ্যামল (syamala) ‘green’, র‍্যাপার (ryapara) ‘wrapper’, etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry). In the case of দ্ (d) + ধ (dh), গ্ (g) + ধ (dh) and ন্ (n) + ধ (dh), the shape of the second member changes, giving দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র্ (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta ‘southwest’), used mostly in classical borrowings, shows the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.
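In Unicode text none of these conjunct shapes has a code point of its own: a cluster is stored as its consonants joined by the virama (U+09CD), and the font selects the conjunct glyph. The two renderings of র + য are kept apart by a zero-width joiner (U+200D), following the convention given in the Unicode Standard’s Bengali section. A minimal sketch, with `cluster` as a hypothetical helper name:

```python
import unicodedata

VIRAMA = "\u09CD"  # ্ BENGALI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def cluster(*consonants: str) -> str:
    """Join consonants with the virama; how the conjunct is drawn is up to the font."""
    return VIRAMA.join(consonants)


KSSA = cluster("\u0995", "\u09B7")               # ক্ষ  (ক + ষ)
SSTTRA = cluster("\u09B7", "\u099F", "\u09B0")   # ষ্ট্র (ষ + ট + র)
REPH_YA = cluster("\u09B0", "\u09AF")            # র্য  (reph over য)
RA_YAPHALA = "\u09B0" + ZWJ + VIRAMA + "\u09AF"  # র‍্য  (র with ya-phala)

for s in (KSSA, SSTTRA, REPH_YA, RA_YAPHALA):
    print(s, [unicodedata.name(c) for c in s])
```

Because the stored sequence is the same regardless of whether a font draws an opaque or a transparent conjunct, spelling-reform changes of shape do not alter the code point sequence a label is made of.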

In consonant clusters many consonants take a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st), etc. Apart from its full shape, র has two allographs: one is ‘repha’, as found in র্ক (rk) and র্প (rp), and the other is ra-phala, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another case, where the cerebral nasal consonant sign takes a queer shape [151].

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes or functional speech sounds (7 oral and 7 nasal vowels and 30 consonants) [150], with some obvious redundancies; in one of the first phonemic analyses, the number was thought to be thirty-five phonemes [140].


As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called ‘allographs’, with a complementary distribution in each case. These graphs or markings are generally added at the following positions relative to the primary symbol [113]:

1) Below (e.g. কু (ku), ন্ত (nta), কূ (kū), হ্র (hra), etc.)

2) Above (e.g. চঁ (cã), র্ক (rka), etc.)

3) Right side (e.g. কা (kā), কং (kaṅ), etc.)

4) Left side (e.g. কে (ke))

5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki), etc.)

6) Right side and above simultaneously (e.g. কী (kī))

7) Right side and left side simultaneously (e.g. কো (ko))

8) Right side, left side and above simultaneously (e.g. কৌ (kau))
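Unicode records a comparable but coarser placement for each dependent vowel sign (the Indic_Positional_Category property), and the two-part signs ো and ৌ canonically decompose into their left and right pieces. Python’s `unicodedata` does not expose the positional property, so the table below is a hard-coded copy of what Unicode’s IndicPositionalCategory.txt is assumed to list (worth re-checking against the current data file); the decomposition, by contrast, comes from `unicodedata` itself. Unlike the list above, Unicode files ী simply under Right and ৈ under Left, without counting the strokes drawn above the headstroke.

```python
import unicodedata

# Indic_Positional_Category of Bengali dependent vowel signs (hard-coded copy).
POSITION = {
    "\u09BE": "Right",           # া  as in কা
    "\u09BF": "Left",            # ি  as in কি
    "\u09C0": "Right",           # ী  as in কী
    "\u09C1": "Bottom",          # ু  as in কু
    "\u09C2": "Bottom",          # ূ  as in কূ
    "\u09C3": "Bottom",          # ৃ  as in কৃ
    "\u09C7": "Left",            # ে  as in কে
    "\u09C8": "Left",            # ৈ  as in কৈ
    "\u09CB": "Left_And_Right",  # ো  as in কো
    "\u09CC": "Left_And_Right",  # ৌ  as in কৌ
}

# The two-sided signs are canonically equivalent to a left piece plus a right piece.
for sign in ("\u09CB", "\u09CC"):
    parts = unicodedata.normalize("NFD", sign)
    print(unicodedata.name(sign), POSITION[sign], "->", [unicodedata.name(p) for p in parts])
```

For LGR purposes this equivalence matters: a label typed as ক + ে + া must normalize (NFC) to ক + ো before it is compared against the repertoire.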

As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called ‘kārs’ in Bangla (also referred to as mātrās in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,

ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,

ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or

উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,

there are some special vowel modifiers of উ, i.e. ligated shapes, as in the following combined letters:

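গু (gu), written as a single ligated shape rather than as গ (g) + ু (u)

রু (ru), written as a single ligated shape rather than as র (r) + ু (u)

শু (śu), written as a single ligated shape rather than as শ (ś) + ু (u)

হু (hu), written as a single ligated shape rather than as হ (h) + ু (u)

ন্তু (ntu), written as a single ligated shape rather than as ন্ (n) + ত (t) + ু (u)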
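The letter-to-sign substitution listed above can be looked up from the Unicode character names, because each Bengali vowel letter named BENGALI LETTER X that has a dependent form is paired with a character named BENGALI VOWEL SIGN X. A sketch relying on that naming pattern (the helper `vowel_sign_for` is hypothetical). Note that the ligated shapes of গু, রু, শু, হু and ন্তু are a font matter: the stored sequence is the same consonant + U+09C1 either way.

```python
import unicodedata
from typing import Optional

PREFIX = "BENGALI LETTER "


def vowel_sign_for(letter: str) -> Optional[str]:
    """Dependent vowel sign for an independent vowel letter, or None if there is none."""
    name = unicodedata.name(letter, "")
    if not name.startswith(PREFIX):
        raise ValueError(f"not a Bengali letter: {letter!r}")
    try:
        return unicodedata.lookup("BENGALI VOWEL SIGN " + name[len(PREFIX):])
    except KeyError:
        return None  # অ is the inherent vowel; consonants have no vowel sign


for letter in "আইঈউ":
    sign = vowel_sign_for(letter)
    print(letter, f"U+{ord(sign):04X}", unicodedata.name(sign))

print("\u0997" + vowel_sign_for("উ"))  # গু: stored as গ (U+0997) + ু (U+09C1)
```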

Similarly, there are vowel modifiers of ঊ or ‘(long) ū’ as well, e.g. ভ্ (bh) + র (r) + ূ (ভ্রূ bhrū ‘eyebrow’) and শ্ (ś) + র (r) + ূ (শ্রূ śrū), and of ৃ (ṛ) after হ (h) (হৃ hṛ), etc.

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These have attempted to reduce the number of allographs of both vowels and consonants in clusters, and the results have been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, co-exist, each operative in different domains.

However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the family of Brāhmī writing systems. Bangla Academy, Dhaka, published Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jātīya Śikṣākrama and Pāṭhyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ After the establishment of the Bāṅlā Ākādemi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and broad agreement on the ‘homogenization of Bangla spelling’ was reached by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘spelling dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was considerably more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read, by making the symbols used, both single and combined, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], who used the terms Swaccha (‘transparent’) and Aswaccha (‘opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) between the two. Some examples are:

6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.


Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.

Opaque: where neither of the two can be (easily) recognized, e.g. ক্ষ kṣ (ক্ k + ষ ṣ), জ্ঞ jñ (জ্ j + ঞ ñ), ঙ্গ ṅg (ঙ্ ṅ + গ g), হ্ম hm (হ্ h + ম m).

Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In three-term clusters at least one symbol will not be transparent, e.g. স্ত্র str (স্ s + ত্ t + র r), ষ্ট্র ṣṭr (ষ্ ṣ + ট্ ṭ + র r), etc.

There were in fact two types of proposals. One concerned the shapes of the letters: those of consonant + vowel (CV) combinations and of conjuncts, i.e. consonant + consonant combinations. There were further complex shapes, namely those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt only with the spellings of words, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’, to the greatest extent possible [151].

Below is a statement of the most salient changes that affect the consonant + vowel combinations [153]:

a. The variants of the short u (হ্রস্ব উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So the ligated গু (gu) is now written as গ + ু. Similarly, রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু, and therefore for a cluster + short u sign: ন্তু (ntu) > ন + ্ + ত + ু, স্তু (stu) > স + ্ + ত + ু.

b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha ū-kāra) have also been reduced: রূ (rū) > র + ূ, ভ্রূ (bhrū) > ভ bh + ্ + র r + ূ ū, দ্রূ (drū) > দ d + ্ + র r + ূ ū, শ্রূ (śrū) > শ ś + ্ + র r + ূ ū.

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + ৃ.

Regarding consonant + consonant + (consonant)hellip+ (vowel) clusters PaschimbangaBanglaAkademi proposed transparent or semi-transparent shapes for clusters to theextentadmissibleinBanglawritingsystemSomeexampleswillclarifytheproposal(Aslashwillmeanthatthetraditionalcluster-shapeprecedesitwhiletheBanglaAkademiinnovationfollows)[153]

ব্ধ bdh (ব b + ধ dh), দ্ধ ddh (দ d + ধ dh), ন্থ nth (ন n + থ th), ঞ্চ ñc (ঞ ñ + চ c), ঞ্ছ ñch (ঞ ñ + ছ ch), ঞ্জ ñj (ঞ ñ + জ j), ক্ত kt (ক k + ত t), ক্র kr (ক k + র r), গ্ধ gdh (গ g + ধ dh), ঙ্ক ṅk (ঙ ṅ + ক k), ঙ্গ ṅg (ঙ ṅ + গ g), ষ্ণ ṣṇ (ষ ṣ + ণ ṇ), ন্ধ্র ndhr (ন n + ধ dh + র r), ণ্ড্র ṇḍr (ণ ṇ + ড ḍ + র r), ক্ত্র ktr (ক k + ত t + র r).

3.3.1 The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'Vargas' (pronounced 'Barga' in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each varga corresponds to stops at a certain place of articulation. It contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), running from unvoiced and unaspirated to voiced and aspirated (in the fourth column), and finally ending with a homorganic, or corresponding, nasal [107]. Consider the following table:

'Varga' (set) | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | ক 'K' U+0995 | খ 'KH' U+0996 | গ 'G' U+0997 | ঘ 'GH' U+0998 | ঙ 'NG' U+0999
Palatal | চ 'C' U+099A | ছ 'CH' U+099B | জ 'J' U+099C | ঝ 'JH' U+099D | ঞ 'NY' U+099E
Retroflex | ট 'TT' U+099F | ঠ 'TTH' U+09A0 | ড 'DD' U+09A1 | ঢ 'DDH' U+09A2 | ণ 'NN' U+09A3
Dental | ত 'T' U+09A4 | থ 'TH' U+09A5 | দ 'D' U+09A6 | ধ 'DH' U+09A7 | ন 'N' U+09A8
Bilabial | প 'P' U+09AA | ফ 'PH' U+09AB | ব 'B' U+09AC | ভ 'BH' U+09AD | ম 'M' U+09AE

Table 5: Varga classification of Bangla consonants
(Falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasal consonants, called the five 'Vargas')

Non-Varga | য 'Y' U+09AF | য় 'YY' U+09DF | র 'R' U+09B0 | ড় 'RR' U+09DC | ঢ় 'RH' U+09DD | ল 'L' U+09B2 | শ 'SH' U+09B6 | ষ 'SS' U+09B7 | স 'S' U+09B8 | হ 'H' U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)

3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (the central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. This report uses two terms for the character that marks the absence of this inherent vowel. The first is 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts). The second is 'Virāma'⁷ (= 'Da ri' in Bangla), as preferred in Unicode (cf. Unicode 3.0 and above). The term virāma has been adopted in Unicode in a sense that differs from its traditional definition in grammar, so it requires some explanation here. Given the importance of the document, this note should be part of this LGR document, so that anybody referring to it can know the proper grammatical explanation of the term. A special sign is needed whenever this implicit vowel is stripped off, and that symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, "kill" its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases this process can join up to five consonants. However, the maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically. This is an observation based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, one may want a generic Top-Level Domain [gTLD] with more than the observed maximum. This can happen when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, this limit will not be enforced in the Bangla LGR work.

⁷ Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
⁸ This term needs to be disambiguated. Akṣara also means 'syllable' in the Indian grammatical traditions.

3.3.3 Vowels
Separate symbols exist for all 'Swaras', or vowels, in Bangla. They are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign is attached to the consonant. This sign is called 'kār' in Bangla or Mātrā in Nāgarī⁹. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

Vowel | Corresponding vowel sign (kār / Mātrā)
অ 'A' U+0985 | -
আ 'AA' U+0986 | া U+09BE
ই 'I' U+0987 | ি U+09BF
ঈ 'II' U+0988 | ী U+09C0
উ 'U' U+0989 | ু U+09C1
ঊ 'UU' U+098A | ূ U+09C2
ঋ Vocalic 'R' U+098B | ৃ U+09C3
ৠ Vocalic 'RR' U+09E0 | ৄ U+09C4
ঌ Vocalic 'L' U+098C | ৢ U+09E2
ৡ Vocalic 'LL' U+09E1 | ৣ U+09E3
এ 'E' U+098F | ে U+09C7
ঐ 'AI' U+0990 | ৈ U+09C8
ও 'O' U+0993 | ো U+09CB
ঔ 'AU' U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ 'A' U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ 'A' U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ 'A' U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kārs

⁹ The term 'Mātrā' in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, which is typically found in Hindi and Bangla but is missing in Gujarati.

3.3.4 The Anusvāra onuʃʃār (ং U+0982)
The Anusvāra, or onuʃʃār in Bangla, sometimes represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the velar varga (set), as in লংকা. It also often appears in such combinations when a non-velar is the last member, as in ল্যাংটা 'naked' or ল্যাংচা 'a kind of sweet; to limp'. Before a non-varga consonant, the Anusvāra represents a nasal sound. That sound may have an alternative conjoined symbol representing the corresponding nasal consonant of the particular set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. In Bangla, by contrast, it is clearly demarcated where one must use the Anusvāra and where a conjunct cluster is required, with a nasal as the first or the second component.

3.3.5 Nasalization: Candrabindu (ঁ U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cā̃d 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].

3.3.6 Nukta (় U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, this LGR has to represent the Bangla letters য় YYA, ড় RRA and ঢ় RHA by the following sequences:
- YA + Nukta (U+09AF + U+09BC)
- DDA + Nukta (U+09A1 + U+09BC)
- DDHA + Nukta (U+09A2 + U+09BC)

The single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD) are not used, even though the use of 'Nukta' is otherwise completely unnatural in Bangla.
In the current Unicode Standard chart, these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard also prescribes methods to produce these three characters as atomic characters (for example 09DC for ড় [ṛ], 09DD for ঢ় [ṛh] and 09DF for য় [y], each as a single keystroke). Even so, the IDNA protocol requires that they be treated as sequences of a base consonant plus nukta, using code points from the Unicode Bengali block. There may be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). This is done only for the sake of correct pronunciation and for maintaining the sanctity of the loanword, much like using the Bangla writing system as if it were the IPA. It is, however, not in use in printed Bangla writing.
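A minimal Python sketch shows the effect described above, using only the standard-library `unicodedata` module. NFC maps each of the three atomic letters to its base consonant plus nukta and never recomposes them. The dictionary name and the helper `to_lgr_form` are illustrative only and are not part of the LGR.

```python
import unicodedata

# The three atomic letters that NFC never produces (composition exclusions),
# paired with the base + nukta sequence used by this LGR instead.
ATOMIC_TO_SEQUENCE = {
    "\u09DC": "\u09A1\u09BC",  # ড় RRA -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # ঢ় RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # য় YYA -> YA + NUKTA
}

def to_lgr_form(label: str) -> str:
    """Return the NFC form of a label, as IDNA (RFC 5891) requires."""
    return unicodedata.normalize("NFC", label)

for atomic, expected in ATOMIC_TO_SEQUENCE.items():
    nfc = to_lgr_form(atomic)
    assert nfc == expected  # decomposed, and not recomposed, by NFC
    print(f"U+{ord(atomic):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```

The last assertion in the loop is the practical point: a label typed with the atomic ড় and one typed with ড + ় become the same string once normalized.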

3.3.7 Visarga biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. An example is দুঃখ duhkho 'sorrow', 'unhappiness' (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in the Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is now rarely used even in other languages that use the Bangla script. The Avagraha is not part of this LGR's repertoire: it has been decided not to retain Avagraha (ঽ U+09BD) because the Maximal Starting Repertoire (MSR) blocks it in TLDs. Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)
This note concerns the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. Note that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has a choice between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.
Use of ZWJ

• In Bangla, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) 'Satyajit' and সৎ (sat) 'honest'. This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• ZWJ matters even more where the same combination of consonantal characters is represented differently depending on the context. For example, র + ্ + য has two representations in Bangla: র্য (repha on ya) and র‍্য (ra with ya-phalā). To get the form র্য one types র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is needed to render words that demand a ya-phalā after ra. Without it, such words cannot be typed (rendered), because medial and/or final position uses the same order, ra + hasanta + antastha ja. That sequence, ra + hasanta + antastha ja, is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• In Bangla, ZWNJ is used to represent the explicit Hasanta (Halant). Where an explicit hasanta precedes the succeeding consonant, ZWNJ is inserted to prevent conjunct formation.

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon).

The [Procedure] rules out the use of ZWJ and ZWNJ in the root zone. In Bangla these two signs are used to create alternate renderings, and inserting them can affect searching as well as NLP.

The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after a Hasanta) where default conjunct formation is to be explicitly restricted. In those cases, the Hasanta joining the two consonants that would form the conjunct needs to be explicitly shown.
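The following Python sketch spells out, code point by code point, the three typing sequences described in this section. It also adds a check that a label contains neither control character, which the root zone requires. The helper names `code_points` and `has_joiner_controls` are hypothetical. How each string is actually drawn depends on the font and shaping engine; that is not modelled here.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"
RA, YA, KA = "\u09B0", "\u09AF", "\u0995"
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA

def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)

def has_joiner_controls(label: str) -> bool:
    """True if the label contains ZWNJ or ZWJ, neither of which the root zone allows."""
    return ZWNJ in label or ZWJ in label

repha_on_ya = RA + HASANTA + YA              # র্য, as in কার্য
ra_with_ya_phala = RA + ZWJ + HASANTA + YA   # র‍্য, as in র‍্যাপার
explicit_hasanta = KA + HASANTA + ZWNJ + KA  # ক্‌ক, as in প্রাক্‌কথন

print(code_points(repha_on_ya))       # U+09B0 U+09CD U+09AF
print(code_points(ra_with_ya_phala))  # U+09B0 U+200D U+09CD U+09AF
print(code_points(explicit_hasanta))  # U+0995 U+09CD U+200C U+0995

for s in (ra_with_ya_phala, explicit_hasanta):
    assert has_joiner_controls(s)
    # Removing the invisible control leaves a different string over the same
    # letters, which is why such spellings interfere with searching.
    stripped = s.replace(ZWJ, "").replace(ZWNJ, "")
    assert stripped != s and not has_joiner_controls(stripped)
```

The sketch makes one point concrete: the two ra + ya spellings differ only by an invisible character, so a root zone label could not carry the distinction.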

3.3.9 Use of Ya-phalā
Ya-phalā sequences are the two instances in Bangla where a Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render a ya-phalā following অ or এ, one types U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity. The ya-phalā followed by the ā-kāra is used to elicit the [æ] sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. By nature, the Brāhmī script does not have a Hasanta after a vowel. The Hasanta is generally described as the 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant. Only consonants can carry the Hasanta. As seen here, however, Bangla orthography has a deviant feature: the Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

3.3.10 Formation of Ra-phalā and Repha Sequences
This case refers to the formation of repha and ra-phalā, as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

When it co-occurs with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance:
- repha = ra + Hasanta + C, e.g. র্ক (ra + Hasanta + ka), as in অর্ক arka 'the sun';
- ra-phalā = C + Hasanta + ra, e.g. ক্র (ka + Hasanta + ra), as in চক্র chakra 'cycle'.

In both cases, the slot for ra can hold either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā remain the same in both cases. The LGR therefore notes this concern about two RAs in disguise. In a label so generated, it would be completely impossible to tell them apart with the naked eye, which may lead to spoofing and other kinds of cyber irregularities. These two code points are classed as (blocking) variants because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). This is the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can only be distinguished by looking at their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could therefore be extremely dangerous, as the web user may never check the labels against their Unicode code points. This point is made explicit in Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. It is also noteworthy that the REPHA can occur with KHANDA TA. In that context, the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
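The Python sketch below illustrates the blocked-variant idea behind Variant Set 4. It is a simplification under stated assumptions. It folds ৰ to র wherever the RA touches a hasanta, on either side, to cover both repha and ra-phalā. Two different labels that become identical after folding would collide. This is not how the LGR's XML variant rules are written, and the function names are hypothetical.

```python
BANGLA_RA = "\u09B0"    # র
ASSAMESE_RA = "\u09F0"  # ৰ
HASANTA = "\u09CD"

def fold_ra_next_to_hasanta(label: str) -> str:
    """Replace ৰ with র wherever it is adjacent to a hasanta (repha or ra-phalā)."""
    chars = list(label)
    for i, ch in enumerate(chars):
        if ch != ASSAMESE_RA:
            continue
        repha = i + 1 < len(chars) and chars[i + 1] == HASANTA  # ra + hasanta + C
        ra_phala = i > 0 and chars[i - 1] == HASANTA            # C + hasanta + ra
        if repha or ra_phala:
            chars[i] = BANGLA_RA
    return "".join(chars)

def ra_variants_collide(a: str, b: str) -> bool:
    """True if two distinct labels differ only by র/ৰ in a hasanta context."""
    return a != b and fold_ra_next_to_hasanta(a) == fold_ra_next_to_hasanta(b)

# অর্ক written with র and with ৰ: visually identical, so they collide
print(ra_variants_collide("\u0985\u09B0\u09CD\u0995", "\u0985\u09F0\u09CD\u0995"))  # True
# র and ৰ outside a hasanta context look different, so they do not collide
print(ra_variants_collide("\u09B0\u09BE", "\u09F0\u09BE"))  # False
```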

4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to keep the fundamental philosophy behind those LGRs consistent across all other Brāhmī-derived scripts. The present LGR will cater to multiple languages at EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for selecting code points for the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These limits consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but they have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy. The canvas of characters available for the Root Zone code point repertoire is therefore already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
If a character needed by the script in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts of the Unicode Consortium.
ii. IDNA Protocol
As the character-encoding standard aiming at the fullest possible representation of a given script/language, Unicode has encoded, as far as possible, all the characters the script needs. Domain names, however, are a specialized case governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, an even more specialized case of domain names. Accordingly, the Root Zone LGR procedure introduces further exclusions on top of IDNA's allowed set of characters.
Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD) is allowed by the IDNA protocol but is not permitted in the Root Zone repertoire, as per the MSR.
To sum up, the restrictions start by admitting only those characters that are part of the code block of the given script/language. The IDNA protocol narrows this down further. Finally, the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
Because TLDs are identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters will also not be included. Examples are BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9).
4.2.1.3 No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms. Examples are the Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), and the allographic -kāra forms of the latter two: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in line with the Conservatism principle laid down in the Root Zone LGR procedure. In Bangla, however, the -kāra corresponding to VOCALIC RR ৠ (U+09E0), i.e. VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words. It is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in line with the Letter principle laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).

5 Repertoire
The Bangla writing system is represented in Unicode under the Bengali (Bangla) script name as enumerated in ISO 15924. It corresponds to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes for inclusion in the Bangla LGR. The Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, at the 8th Meeting of its Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd Aug 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, the Code Point Repertoire in Table 8 will be assumed valid for all three languages above. For each code point, language references are given in the last column, titled "References", of Table 8, the "Code Point Repertoire". To cover all Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced new complications, except for one particular character. Only a few representative languages at EGIDS scale 1-4 have been chosen for referencing. Together, however, they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as in Unicode 6.3). Before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. The reference glyphs in the code charts below are based on one of the many available fonts. They are not prescriptive, because actual fonts, both Unicode-compatible and TrueType ones, may show some variation. Consider the following code point table:

Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
- All characters that are included in the [MSR]: yellow background
- PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
- Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Among the code points of the Bangla Unicode block (Figure 1) that are included in the MSR, the following symbols need separate treatment:
- ৎ U+09CE Bangla Letter Khanda-Ta
- ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
- ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire Inclusion

No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candrabindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; 09DC is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; 09DD is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; 09DF is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122][125]

Table 8: Bangla Code-Point Repertoire
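As an illustration, the repertoire of Table 8 can be encoded as data and a label tested against it. The Python sketch below does this under two assumptions. First, the label must already be in NFC. Second, NUKTA is accepted only directly after ড, ঢ or য (Table 10b). It checks repertoire membership only and implements none of the whole-label evaluation rules. The names `SINGLE`, `NUKTA_BASES` and `in_repertoire` are hypothetical.

```python
import unicodedata

# Single code points of Table 8, as inclusive ranges (rows 28, 30 and 43 are
# the nukta sequences, handled separately below).
_RANGES = [
    (0x0981, 0x0983),  # candrabindu, anusvara, visarga
    (0x0985, 0x098B),  # অ .. ঋ
    (0x098F, 0x0990),  # এ ঐ
    (0x0993, 0x09A8),  # ও ঔ and ক .. ন
    (0x09AA, 0x09B0),  # প .. র
    (0x09B2, 0x09B2),  # ল
    (0x09B6, 0x09B9),  # শ ষ স হ
    (0x09BE, 0x09C4),  # া .. ৄ
    (0x09C7, 0x09C8),  # ে ৈ
    (0x09CB, 0x09CE),  # ো ৌ ্ ৎ
    (0x09F0, 0x09F1),  # ৰ ৱ
]
SINGLE = {cp for lo, hi in _RANGES for cp in range(lo, hi + 1)}
NUKTA = 0x09BC
NUKTA_BASES = {0x09A1, 0x09A2, 0x09AF}  # ড ঢ য, forming ড় ঢ় য়

def in_repertoire(label: str) -> bool:
    """True if every code point (or nukta sequence) of an NFC label is in Table 8."""
    if unicodedata.normalize("NFC", label) != label:
        return False  # IDNA requires NFC input
    prev = None
    for ch in label:
        cp = ord(ch)
        if cp == NUKTA:
            if prev not in NUKTA_BASES:
                return False  # nukta is never used alone
        elif cp not in SINGLE:
            return False
        prev = cp
    return True

assert len(SINGLE) == 61  # 64 rows minus the three nukta sequences
print(in_repertoire("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা -> True
print(in_repertoire("\u09DC"))  # atomic ড় is not NFC -> False
```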

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. They conditionally admit BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, and then by BENGALI VOWEL SIGN AA. These sequences make it possible to write the [æ] sound, as in English 'bat', 'cat', etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
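The sequences in Table 9 are the only places where a virama follows an independent vowel. A Python sketch of that condition follows, assuming the rule is simply "vowel + virama is allowed only as the start of S1 or S2". The function name is hypothetical, and this is not the LGR's own rule syntax.

```python
VIRAMA = "\u09CD"
# Independent vowels of Table 8 (rows 4-14)
VOWELS = {chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                             0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}
# Table 9: the only sequences in which a virama may follow a vowel
ALLOWED = {
    "\u0985\u09CD\u09AF\u09BE",  # S1 অ্যা
    "\u098F\u09CD\u09AF\u09BE",  # S2 এ্যা
}

def vowel_virama_ok(label: str) -> bool:
    """Reject any vowel + virama that does not start an S1 or S2 sequence."""
    for i in range(len(label) - 1):
        if label[i] in VOWELS and label[i + 1] == VIRAMA:
            if label[i:i + 4] not in ALLOWED:
                return False
    return True

print(vowel_virama_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড -> True
print(vowel_virama_ok("\u0987\u09CD\u09AF\u09BE"))                    # ই + ্যা -> False
```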

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Points | Glyph | Character Names | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It is used only as part of the sequences for three special characters in normalized form: ড়, ঢ়, য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDNs for Bangla (different versions) done for भारत or ভারত, drafted between 20-11-2009 and 18-07-2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism
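To apply the ABNF rules mechanically, a label can first be rewritten as a string of the variables of Section 5.4.1. The Python sketch below does this under two assumptions: the label is already NFC and within the Table 8 repertoire, and a base + nukta pair (ড় ঢ় য়) counts as one consonant C. The names `SYMBOL` and `to_symbols` are hypothetical.

```python
NUKTA = "\u09BC"

def _span(lo: int, hi: int) -> list:
    return [chr(cp) for cp in range(lo, hi + 1)]

# ABNF variables of Section 5.4.1 for the Table 8 code points
SYMBOL = {}
for ch in _span(0x0985, 0x098B) + _span(0x098F, 0x0990) + _span(0x0993, 0x0994):
    SYMBOL[ch] = "V"
for ch in (_span(0x0995, 0x09A8) + _span(0x09AA, 0x09B0) + _span(0x09B2, 0x09B2)
           + _span(0x09B6, 0x09B9) + _span(0x09F0, 0x09F1)):
    SYMBOL[ch] = "C"
for ch in _span(0x09BE, 0x09C4) + _span(0x09C7, 0x09C8) + _span(0x09CB, 0x09CC):
    SYMBOL[ch] = "M"
SYMBOL.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
               "\u09CD": "H", "\u09CE": "Z"})

def to_symbols(label: str) -> str:
    """Rewrite an NFC label over the Table 8 repertoire as ABNF variables."""
    out = []
    for ch in label:
        if ch == NUKTA:
            # ড় ঢ় য় (base + nukta) count as a single consonant C
            if not out or out[-1] != "C":
                raise ValueError("nukta must follow a consonant")
            continue
        out.append(SYMBOL[ch])  # KeyError: code point outside the repertoire
    return "".join(out)

print(to_symbols("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা -> CMBCM
print(to_symbols("\u0985\u09CD\u09AF\u09BE"))        # অ্যা -> VHCM
```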

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To help users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). More than one of D, B or X can follow a V in Bangla. Going by the rules illustrated in this document, however, formations such as VDD, VBB and VXX are invalid orthographic units. It is valid and possible to have an anusvāra followed by a candrabindu on the one hand, and a visarga followed by a candrabindu on the other, as in হ্যাংচা 'hænchā' and হ্যাঃ 'hæn' respectively. A Visarga or Anusvāra (onuʃʃār) can also follow a Candrabindu in Bangla. A vowel can optionally be followed by Hasanta/Virama [H] + Consonant [C] to form a ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA য ('ya'). It is represented by the sequence <U+09CD ্ BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA> and has the special form ্য. Combined further with U+09BE া BENGALI VOWEL SIGN AA ('aa', ā), it is used to transcribe [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5422 AVowelwithConditionsAVowelcanoptionallybefollowedbyAnusvāra[B]orCandrabindu[D]orVisarga[X]or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX] or combination ofHasanta(orVirama)[H]followedbyConsonant[C]followedbykāra(Mātrā)[M]

Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
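The operators of Table 11 map closely onto regular-expression syntax: "|" is alternation, "[ ]" becomes an optional "?", a bounded repetition such as "*3" becomes "{0,3}", and "( )" is a group. As an illustrative sketch only, assuming the vowel sequence is written over the symbol letters of the legend (V, H, C, M, B, D, X) rather than over code points, sections 5.4.2.1 to 5.4.2.3 reduce to V [HCM] [B | D | X | DB | DX]. The name `is_vowel_sequence` is hypothetical and is not part of the LGR XML file.

```python
import re

# Vowel sequence from 5.4.2.1-5.4.2.3, written over symbol letters
# (V, H, C, M, B, D, X) rather than over Unicode code points:
#   V [HCM] [B | D | X | DB | DX]
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is one valid vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ["V", "VD", "VDB", "VHCMDX", "VDD", "VBD", "VHC"]:
    print(s, is_vowel_sequence(s))
```

In this sketch VDD and VBD are rejected because each sign may appear once, and only in the orders D, B, X, DB or DX.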

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant is optionally followed by a dependent vowel sign, kāra (Mātrā) [M], or by Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example

CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

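5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Examples:
CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य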

5.4.3.5 Subsets
As a representative example we consider only the combination CHC; the same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:
CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
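Sections 5.4.3.1 to 5.4.3.5 can likewise be folded into one pattern. This is a minimal sketch, again over symbol letters rather than code points, assuming the consonant sequence is *3(CH) C [M] [B | D | X | DB | DX] with the pure consonant CH as a separate alternative; the name `is_consonant_sequence` is hypothetical.

```python
import re

# Consonant sequence from 5.4.3.1-5.4.3.5 over symbol letters:
#   *3(CH) C [M] [B | D | X | DB | DX]    or the pure consonant    C H
CONSONANT_SEQUENCE = re.compile(r"(?:(?:CH){0,3}CM?(?:DB|DX|B|D|X)?|CH)")


def is_consonant_sequence(symbols: str) -> bool:
    """True if the whole symbol string is one valid consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


for s in ["C", "CMDB", "CHCHCHC", "CHCHCMDX", "CH", "CHCHCHCHC", "CMBD"]:
    print(s, is_consonant_sequence(s))
```

The bound {0,3} on CH is what limits a cluster to four consonants, matching the "up to 4" in 5.4.3.4.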

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ (= ত্)

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition on KHANDA TA in this context is that the C must be either RA U+09B0 (র, used in Bangla) or RA U+09F0 (ৰ, used in Assamese).
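At the code point level, the Khanda-Ta sequence above is small enough to express directly. A minimal sketch, assuming labels are plain Python strings of Bengali-block code points; the names `KHANDA_TA_SEQUENCE` and `is_khanda_ta_sequence` are hypothetical.

```python
import re

# 5.4.4: a Khanda-Ta sequence is Z alone, or [CH]Z where C is RA.
# Both RA code points are accepted: U+09B0 (Bangla) and U+09F0 (Assamese).
KHANDA_TA_SEQUENCE = re.compile("(?:[\u09B0\u09F0]\u09CD)?\u09CE")


def is_khanda_ta_sequence(s: str) -> bool:
    """True if s is exactly ৎ, র্ৎ or ৰ্ৎ."""
    return KHANDA_TA_SEQUENCE.fullmatch(s) is not None


word = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"   # ভর্ৎসনা (bhartsanā)
print(KHANDA_TA_SEQUENCE.search(word).group() == "\u09B0\u09CD\u09CE")  # True
print(is_khanda_ta_sequence("\u0995\u09CD\u09CE"))                     # False: ক্ৎ
```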

10 Refer to Rule P in Section 7, Table 16.

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are briefly described here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, one types U+09CD plus U+09AF (ya) after the said vowel. This is a purely ligatural entity: Ya-phalā plus ā-kāra is used to elicit the [æ] sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can bear a Hasanta. Bangla orthography, however, shows a deviant feature here, in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case refers to the formation of repha and ra-phalā in the script, mentioned in Table 16 as P. Owing to co-occurrence with HASANTA, RA either loses its own inherent vowel (REPHA) or suppresses the inherent vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.
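A minimal sketch of how S and P could be located inside a label. It follows the Table 16 definitions literally: S = V1 H C1 M1 (অ or এ + virama + য + া) and P = C2 H (র or ৰ + virama), so P matches the repha-type order RA + Hasanta but not ra-phalā, where the Hasanta comes before RA. The names `S_PATTERN`, `P_PATTERN` and `special_cases` are hypothetical.

```python
import re

# S (Table 16): V1 H C1 M1, i.e. অ or এ + virama + য + া (the [æ] ligature).
S_PATTERN = re.compile("[\u0985\u098F]\u09CD\u09AF\u09BE")
# P (Table 16): C2 H, i.e. র (U+09B0) or ৰ (U+09F0) + virama.
P_PATTERN = re.compile("[\u09B0\u09F0]\u09CD")


def special_cases(label: str):
    """Return (symbol, start index) for every S and P occurrence, in label order."""
    hits = [("S", m.start()) for m in S_PATTERN.finditer(label)]
    hits += [("P", m.start()) for m in P_PATTERN.finditer(label)]
    return sorted(hits, key=lambda hit: hit[1])


print(special_cases("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: [('S', 0)]
print(special_cases("\u0985\u09B0\u09CD\u0995"))                    # অর্ক: [('P', 1)]
print(special_cases("\u099A\u0995\u09CD\u09B0"))                    # চক্র: []
```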


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes confusable cases into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed as variants. These are not cases of mere visual similarity, since they involve deviations from the widely accepted norms of Bangla akshara formation; they can confuse even a careful observer and are hence proposed as variants.
Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, the ā-kāra and the o-kāra have different code points. One can type o-kāra with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. Nevertheless, this is not considered a case of variants, because a kāra followed by a kāra is not allowed.
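The ko example can be checked with Python's standard `unicodedata` module. This sketch shows that the one-key and two-key spellings differ in storage, and that NFC normalization recomposes ে + া into the single o-kāra U+09CB. Since IDNA2008 requires U-labels to be in NFC, the two-key spelling is not a valid U-label as typed; the variable names are illustrative only.

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09C7", "\u09BE", "\u09CB"

one_key = KA + O_KAR            # কো typed with the o-kāra key
two_keys = KA + E_KAR + AA_KAR  # ক + ে + া typed as two separate keys

print(one_key == two_keys)                                # False: different storage
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: NFC recomposes
print(unicodedata.is_normalized("NFC", two_keys))         # False: not NFC as typed
```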

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.

CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases where থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 U+09CD U+09A5) versus স + Hasanta + হ (U+09B8 U+09CD U+09B9)

2. ন + Hasanta + থ (U+09A8 U+09CD U+09A5) versus ন + Hasanta + হ (U+09A8 U+09CD U+09B9)


If written in traditional orthography, the above combinations could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example no. 1). It could also be mistyped as হ (ha, U+09B9), increasing the risk in label writing and identification. Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থুল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in paragrantha)
The fonts that represent the traditional Bangla writing system tend to create this problem. Therefore these may be taken as cases of variants in Bangla.

CASE II
Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode code points and 'B_RA' and 'A_RA' refer to the same phonetic element. Here the final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resulting combinations with repha look identical; the addition of repha makes no difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. The blocking applies only at the level of the whole label: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
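A minimal sketch of variant label generation for the two in-script cases, assuming semantics like those of RFC 7940, in which every combination of variant members is generated and, here, blocked. CASE I is modelled as three-code-point sequence variants and CASE II as the single code point pair র/ৰ. The table layout and the names `segment`, `variant_labels` and `is_blocked` are hypothetical and are not taken from the NBGP's XML file.

```python
from itertools import product

# Hypothetical variant sets built from CASE I and CASE II above.
VARIANT_SETS = [
    ("\u09B8\u09CD\u09A5", "\u09B8\u09CD\u09B9"),  # CASE I:  স্থ / স্হ
    ("\u09A8\u09CD\u09A5", "\u09A8\u09CD\u09B9"),  # CASE I:  ন্থ / ন্হ
    ("\u09B0", "\u09F0"),                          # CASE II: র / ৰ
]
LOOKUP = {member: vset for vset in VARIANT_SETS for member in vset}
LONGEST = max(len(member) for member in LOOKUP)


def segment(label):
    """Split a label greedily (longest match first) into variant sets or single code points."""
    out, i = [], 0
    while i < len(label):
        for n in range(LONGEST, 0, -1):
            piece = label[i:i + n]
            if piece in LOOKUP:
                out.append(LOOKUP[piece])
                i += len(piece)
                break
        else:  # no variant set starts here: keep the code point as is
            out.append((label[i],))
            i += 1
    return out


def variant_labels(label):
    """Every label reachable by swapping variant members, excluding the label itself."""
    return {"".join(p) for p in product(*segment(label))} - {label}


def is_blocked(candidate, registered):
    """True if the candidate is a variant of an already registered label."""
    return any(candidate in variant_labels(r) for r in registered)
```

With this sketch, registering ররর blocks ৰৰৰ together with the six mixed spellings such as রৰর, while ৰৰ and ৰৰৰৰ stay available because they are not variants of ররর.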

6.2 Cross-Script Variants
A careful cross-script study of Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. There are certain other characters that are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
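Tables 14 and 15 can be read as a small mapping. A minimal sketch, assuming a simplified model in which a Bangla label has a single-script counterpart only when every one of its code points has a listed cross-script variant. The names `CROSS_SCRIPT` and `cross_script_variant` and the script keys "Deva" and "Guru" (the ISO 15924 codes) are choices made for this illustration.

```python
from typing import Optional

# Cross-script variant pairs from Tables 14 and 15 (Bangla -> other script).
CROSS_SCRIPT = {
    "Deva": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Guru": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},  # ম -> ਸ, ি -> ਿ
}


def cross_script_variant(label: str, script: str) -> Optional[str]:
    """Return the single-script counterpart of a Bangla label, if one exists."""
    table = CROSS_SCRIPT[script]
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None  # some code point has no listed variant in the target script


print(cross_script_variant("\u09AE\u09BF", "Deva"))  # মি -> मि
print(cross_script_variant("\u0995\u09BF", "Guru"))  # None: ক has no listed variant
```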

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ, BENGALI LETTER A) or 098F (এ, BENGALI LETTER E), H is 09CD (্, BENGALI SIGN VIRAMA), C1 is 09AF (য, BENGALI LETTER YA) and M1 is 09BE (া, BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্, BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is perhaps worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" of, theoretically, two to four consonants, which combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, below are the specific WLE rules that need to be implemented:


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
Example: ক্
3. M must be preceded by C. Example: কা

4. D must be preceded by V, C or M. Example: আঁ খঁ খাঁ হাঁ
5. X must be preceded by V, C, M or D. Example: উঃ খঃ বাঃ দঁঃ

6. B must be preceded by V, C, M or D. Example: আং ইং কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ কৎ র্ৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
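A minimal Python sketch of rules 1 to 9, under stated assumptions. The category tables are a simplified reading of the Bengali block, not the normative repertoire in proposal-bengali-lgr-02mar20-en.xml; labels are NFC-normalized first, so ড়, ঢ় and য় arrive as base letter + nukta and are folded back into one C (rule 1); S is folded into one unit that behaves like M for whatever follows, as the note to rule 7 says; and P is detected as র্ or ৰ্ immediately before ৎ. The helper names `category`, `units` and `is_valid` are hypothetical.

```python
import unicodedata

# Illustrative category tables (assumptions of this sketch).
V = set(range(0x0985, 0x098D)) | {0x098F, 0x0990, 0x0993, 0x0994, 0x09E0, 0x09E1}
C = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1)) | {0x09B2}
     | set(range(0x09B6, 0x09BA)) | {0x09F0, 0x09F1})
M = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}
SIGNS = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
VIRAMA, NUKTA = 0x09CD, 0x09BC
RA = {0x09B0, 0x09F0}                # র and ৰ, the C2 of symbol P
CN_BASES = {0x09A1, 0x09A2, 0x09AF}  # NFC keeps ড় ঢ় য় as base + nukta
S_SEQS = ([0x0985, VIRAMA, 0x09AF, 0x09BE], [0x098F, VIRAMA, 0x09AF, 0x09BE])

# Rules 2-7: the categories allowed immediately before each category.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}


def category(cp):
    if cp in V:
        return "V"
    if cp in C:
        return "C"
    if cp in M:
        return "M"
    return SIGNS.get(cp)  # None for anything outside this sketch's repertoire


def units(label):
    """Split an NFC label into (category, code points) units, folding S and CN."""
    cps = [ord(ch) for ch in unicodedata.normalize("NFC", label)]
    out, i = [], 0
    while i < len(cps):
        if cps[i:i + 4] in S_SEQS:                                 # S: অ্যা / এ্যা
            out.append(("S", cps[i:i + 4]))
            i += 4
        elif cps[i] in CN_BASES and cps[i + 1:i + 2] == [NUKTA]:   # rule 1: CN
            out.append(("C", cps[i:i + 2]))
            i += 2
        else:
            out.append((category(cps[i]), cps[i:i + 1]))
            i += 1
    return out


def is_valid(label):
    us = units(label)
    for k, (cat, _) in enumerate(us):
        if cat is None:
            return False
        prev = us[k - 1][0] if k > 0 else None
        if prev == "S":
            prev = "M"            # S ends with a kāra, so it acts as M
        if cat == "Z" and prev == "H" and k >= 2 and us[k - 2][1][-1] in RA:
            prev = "P"            # Ra-Hasanta before Khanda Ta
        if cat in ("V", "S") and prev == "H":
            return False          # rules 8 and 9
        if cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            return False          # rules 2-7 (also rejects these signs label-initially)
    return True
```

In this sketch, কঁং (C D B) passes while কংঁ (C B D) fails, and a label that starts with H, M, B, D, X or Z is rejected because nothing precedes the sign, which matches example 1 of Section 7.2.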

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋk ʌv ɪndiə) (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without H (U+09CD) is considered enough to represent the sound intended for the first word.


This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक

As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्
অং্ अं्   কঁ্ कँ्   কঃ্ कः्   কি্ कि्

3. The same B, D or X may occur only once after a consonant, a vowel or a kāra (Mātrā); thus the following combinations are invalid:

Example:

কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৌ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References
[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.
[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.
[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.
[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.
[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.
[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.
[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.
[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.
[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.
[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.
[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.
[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.
[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.
[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.
[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.
[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.
[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii + 316 pp.
[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.
[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.
[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.
[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.
[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.
[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.
[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.
[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.
[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.
[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.
[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.
[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.
[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.
[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.
[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.
[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.
[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.
[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.
[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda number), p. 1.
[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.
[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.
[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language 36(1): 22-59.
[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.
[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.
[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.
[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.
[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.
[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.
[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.
[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.
[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.
[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.
[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has Happened So Far in Terms of Script Reforms". Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka, and ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র, 09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
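The three restrictions above can be sketched as a single check. This is a minimal illustration assuming plain Bengali-block strings; it follows footnote 13 in covering only the Bangla language, which is why restriction 2 accepts র but not the Assamese ৰ. The vowel set and the name `meets_appendix_restrictions` are hypothetical, and the য় discussion in restriction 1 is deliberately not enforced because the text cites attested exceptions.

```python
# Hypothetical checker for the three Appendix 10.1 restrictions (Bangla only).
NOT_INITIAL = {"\u09CE", "\u099E", "\u0999"}    # ৎ, ঞ, ঙ
VIRAMA, KHANDA_TA, RA = "\u09CD", "\u09CE", "\u09B0"
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098C"
             "\u098F\u0990\u0993\u0994\u09E0\u09E1")
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা


def meets_appendix_restrictions(label: str) -> bool:
    if label[:1] in NOT_INITIAL:                       # restriction 1
        return False
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == VIRAMA and label[i - 2] != RA:
            return False                               # restriction 2: only র্ৎ
        if ch == VIRAMA and i >= 1 and label[i - 1] in VOWELS:
            if label[i - 1:i + 3] not in ALLOWED_VHCM:
                return False                           # restriction 3: only অ্যা, এ্যা
    return True


print(meets_appendix_restrictions("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(meets_appendix_restrictions("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))        # ৎসেরিং
```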

10.2 'Sylheti Nāgarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nāgarī or Siloti (129), such as 'Jalalabada Nāgarī', 'Fula (flower) Nāgarī', 'Muslim Nāgarī' or 'Muhammad Nāgarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nāgarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali consonants and their allographs

Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ) Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it merely doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য (ra followed by the ya-phalā form) and র্য (repha over য).

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র্ + য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, Ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:
i. রেফ (repʰ), as the first member of a cluster, e.g. প6ৎ6 not6 য6D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant)

ii. র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


3.1 Written Bangla
The 'Bangla alphabet' (বাংলা লিপি, Bānglā lipi; ISO 15924) is derived from the Brāhmī writing system, which is related to the Nāgarī (also known as Devanāgarī⁵) script [108] as well as to the Tirhutā writing system [106]. Considered to be the fifth most widely used writing system in the world, this combined Bangla-Asamiyā-Manipuri script (showing some variations for Asamiyā and Meitei or Bishnupriya Manipuri) (130) was also used in eastern Indian Sanskrit manuscripts. For Chakma in India and Bangladesh, and for Kokborok in Tripura, it was and still is one of the scripts used. A close variant called Tirhutā (123; now also available in Unicode 10.0 as U+11480-U+114DF; see 110), or Mithilākṣara, was used for Maithili from the 14th century until the early 20th century [106]. In this context one finds a mention of 'Sylheti Nāgarī lipi' or 'Siloti' (added to the Unicode Standard in March 2005 with the release of version 4.1), the details of which could be of interest only to historians and historical linguists (see 137 and 144). For all practical purposes, however, Sylheti Bangla is now generally written in the modern-day Bangla script. Originally, during the reign of the Pāla dynasty (750-1154 AD) in eastern India, and perhaps even earlier during the Malla period (694 AD onwards), the present-day Bangla writing system took a shape comparable to the modern one [111, 119]. A pictorial description of the evolution from Brāhmī to the modern Bangla script is presented here in tabular form.

Modern | ক | জ | ম | র | স | অ
Transliteration | k | j | m | r | s | a

Table 1: Pictorial depiction of the evolution of Brāhmī to Bangla

5 William Dwight Whitney, in his Sanskrit Grammar, unequivocally said: "This name (Devanāgarī) is of doubtful origin and value" (Whitney, William Dwight. 1994 reprint. Sanskrit Grammar. New Delhi: Motilal Banarsidass Publishers, p. 1).

The inscriptional evidence in Brāhmī is found in Archaic Brāhmī from the 3rd century BC to the 1st century BC, in Middle Brāhmī soon after (1st-3rd century AD), and then in Late Brāhmī (4th-6th century AD). This evidence can be seen in both Bangladesh and West Bengal [108] in: (1) the Mahasthanagara inscriptions (Bogra district, Bangladesh; the ancient name being Pundranagara or Paundravardhanapura); (2) Brāhmī (and Kharoṣṭhī) inscriptions from lower ‘Gangetic Bengal’; and (3) copper plate inscriptions of the Imperial Guptas from the northern part of West Bengal and north-west Bangladesh, in the areas under Dharmaditya, Gopachandra and Samācāradeva (about whom one only knows from five copper plates found in Kotalipara in the Faridpur district in Bangladesh, one in Mallasarul in the Burdwan district (West Bengal) and one in Jayramapura (Ballesvara district, now in Odisha)). These epigraphs from the eastern part of undivided India (dating back to the 4th-6th centuries AD) show some characteristic features of letters (especially in ম ‘ma’, ল ‘la’, শ ‘śa’, স ‘sa’ and হ ‘ha’), which led to the development of the eastern variety of the Gupta script. Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī. Examples include the Tippera copper plate inscription of the ‘Samatata’ rulers (139, p. 265) such as Lokanātha (dated to the latter half of the 7th century AD), the Kailan inscription of Śridharana Rāta, and the Astafpur copper plates. The letters seem to hang down from wedge-shaped solid triangles, with right-hand verticals bending down at the bottom, because of which the script was described by Prinsep and Fleet as Kuṭila-lipi (literally ‘cursive writing style’), whereas the term Siddhamātrikā (as a mātrā or bar is placed over each of the letters) was used by Al Biruni (973-1048) to designate the script of northern India. The next stage of development is illustrated by the 9th-century copper plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr and Nalanda in Bihar of the time of Devapāla, and from Jagjīvanpura (Malda) of the reign of Mahendrapāla. The Siddhamātrikā (mentioned as ‘Siddham’ in Chinese sources) is said to have been prevalent in this region too, up to the end of the tenth century. Also called the Gauri (i.e. Gandi) in Pūrvadeśa, or the Eastern country, it was regarded as the same script to which the appellation Proto-Bangla is given, with its characteristics in rudimentary form, in the period between AD 875 and AD 1025. In some epigraphs it is considered as belonging to the second quarter of the eleventh century AD. Flattening of head-marks becomes prominent in comparison to the wedge-shaped serifs. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika in the last quarter of the eleventh century AD. It is the earliest document from this entire region that bears the letter m with a tick rising upwards. The full vowel i develops a tick at the right end of the upper horizontal bar and a curved hook below. Initial e approaches the modern Bangla character. A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is illustrated in the inscriptions of the Varmana, Sena and Deva rulers of the twelfth and thirteenth centuries [104].

The evolution of the Bangla script (cf. 136) is aligned with the story of the advancement of printing technology. The first ‘movable type’ script was technically created and used while printing Nathaniel Brassey Halhed’s (1751-1830) 1778 book titled A Grammar of the Bengal Language. In 1785 Governor-General Warren Hastings (1732-1818) requested another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after. It is generally agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge to Pancanana Karmakara (d. 1804), a renowned artist in Bengal. Later it was Karmakara and his family who became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in this designing of the script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th century. The script was standardized by Pandit Ishwar Chandra Vidyasagar when Bangla type fonts were to be used for publishing on a large scale under the Calcutta School Book Society [116 for several references]. Much later, the Linotype technique, invented by Ottmar Mergenthaler (1854-1899) in 1886, was introduced into Bangla printing in 1935 through the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya, and began to be used by the Ānandabazara Patrikā group, later followed by others. Within a few years the more advanced Monotype technology came to be used in Bangla printing. However, in Bangla printing culture Monotype had very limited acceptance, and Linotype held the stage until digital technology eventually came in to replace all earlier techniques. All these developments are summarized in the following table:

PERIOD | DESCRIPTION | NAMES
3rd century BC | Use of the Brāhmī and Kharoṣṭhī scripts begins in the subcontinent. Brāhmī was widely used during the reign of the Mauryan king Aśoka. One theory holds that Brāhmī is based on the North Semitic alphabet, suitably modified to fit the needs of local languages; it is currently believed to have been an independent development. | Brāhmī
1st-3rd century AD | The Kusana script, named after the Kusana royal dynasty. | Kusana script
4th-5th century AD | The next stage of its evolution was into the Gupta script, named after the Gupta royal dynasty. | Gupta script
7th century AD | Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī, giving rise to the Kuṭila-lipi. | Kuṭila-lipi
8th century AD | Copper plate inscriptions are found in Khalimpur, Bangladesh, of the reign of Dharmapāla; from Monghyr and Nālandā in Bihar of the time of Devapāla; and from Jagjīvanapura in West Bengal of the reign of Mahendrapāla. | Siddhamātrikā
9th century AD until 1025 AD | Proto-Bangla characteristics develop in rudimentary forms. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika, found in the last quarter of the eleventh century AD. | Proto-Bangla Script & Language
12th-13th century AD | A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is found in the inscriptions of the Varmana, Sena and Deva rulers of the twelfth and thirteenth centuries. | Matured Proto-Bangla
14th-15th century AD | The characteristics of the typical Bangla script began to develop, as can be seen in the copper plate inscription of Vijayamāṇikya I of Tripura dated 1478 AD, which also illustrates the forms of Bangla letters in the fifteenth century AD. | Modern Bangla script era begins (see Ross 1999)
16th-17th century AD | The chart of the Bangla alphabet appended to the China Monuments, published in Amsterdam in 1667, and The Code of Gentoo Law, published in London in 1776, both show a chart of the Bangla alphabet. They show 16 vowel letters, including the long ‘ৡ’ (ḹ), Anusvāra and Visarga, and 34 consonants. | Printed charts of Bangla
18th-19th century AD | Charles Wilkins develops printing in Bangla in 1778 and Vidyasagar reforms it. | Bangla type fonts

Table 2: Development of the Bangla Writing System

The overall development of the Bangla script from the Kuṭila-lipi period to modern Bangla can be seen in Table 3 ([102] and [146]; see also the web page in [147]).

Table 3: Bangla Script in Different Centuries

3.2 Languages Considered
Below is the tabular representation of the languages using the Bangla script that are placed on EGIDS Scale 1-6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some had used the Bangla script earlier (such as Bodo), or used it in West Bengal at some point in time (Santali), but have later shifted to another writing system. Bodo is now written in Nāgarī or Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol-chiki (145). For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:

EGIDS Scale 1 | EGIDS Scale 2 | EGIDS Scale 3 | EGIDS Scale 4 | EGIDS Scale 5 | EGIDS Scale 6

Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansī
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm, Chin

Table 4: Main languages in India and Bangladesh that use the Bangla script, on the EGIDS scale

3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written, or how typesetting in Bangla can be done. This section is followed by a section that explains the code points (and fixed code point sequences) that show certain distinctive characteristics of Bangla and that make up the repertoire. The next sections will also cover the ‘akshar’-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and its pronunciation.

The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to contain an accompanying ‘inherent’ vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these ‘assumed’ or ‘inherent’ vowels are not pronounced at all [142].

Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or in both of the last two positions (before and after) [105].

All Bangla consonants, when pronounced in isolation, are uttered with an inherent vowel ɔ; hence ক ‘k’, খ ‘kh’ or গ ‘g’ are usually pronounced as [kɔ], [kʰɔ] or [gɔ] etc. Phonologically, the Bangla vowel ɔ corresponds to the Hindi schwa ə.

When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters, or conjoined consonants, are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters that are compounds of the consonant characters, e.g. ক্ (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ্ (ñ) + জ (j) = ঞ্জ (ñj), জ্ (j) + ঞ (ñ) = জ্ঞ (jñ), হ্ (h) + ম (m) = হ্ম (hm). There are other issues too: র as the second member of a cluster is reduced to a secondary symbol, e.g. প্ (p) + র (r) = প্র (pr), ষ্ (ṣ) + ট্ (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra ‘camel’). য (y), when used as a primary symbol, represents jɔ in Bangla, but its secondary symbol (allograph), jɔ-phala, has two phonetic values. When added to the initial consonant of a word it is the vowel æ (as in শ্যামল (syamala) ‘green’, র‍্যাপার (ryapara) ‘wrapper’ etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র্ (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry). In the case of দ্ (d) + ধ (dh), গ্ (g) + ধ (dh) and ন্ (n) + ধ (dh), the shape of the second member changes, giving দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র্ (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta ‘south-west’), used mostly in Classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel only applies to the final consonant of the cluster.

In consonant clusters many consonants take a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st) etc. র has two allographs apart from its full shape: one is ‘repha’, as found in র্ক (rk) and র্প (rp), and the other is ra-phala, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another case, where the cerebral nasal consonant sign takes an unusual shape [151].
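Whatever shape a conjunct takes on screen, it is stored as a plain sequence of consonant + HASANTA (U+09CD) + consonant; the font chooses the visual form. The sketch below illustrates this for two examples from the text. The helper names `conjunct` and `code_points` are hypothetical and exist only for this illustration.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)


def conjunct(*consonants: str) -> str:
    """Hypothetical helper: join consonants with HASANTA (C + H + C + ...)."""
    return HASANTA.join(consonants)


def code_points(text: str) -> list[str]:
    """Return the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


ksa = conjunct("\u0995", "\u09B7")                         # ক + ষ, rendered as ক্ষ (kṣ)
ustra = "\u0989" + conjunct("\u09B7", "\u099F", "\u09B0")  # উষ্ট্র 'camel'
print(code_points(ksa))    # ['U+0995', 'U+09CD', 'U+09B7']
print(code_points(ustra))  # ['U+0989', 'U+09B7', 'U+09CD', 'U+099F', 'U+09CD', 'U+09B0']
```

Note that the final ra-phala of ষ্ট্র is simply another C + HASANTA + RA step in the chain.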

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) (150), i.e. functional speech sounds, with some obvious redundancies; in one of the first phonemic analyses, however, the number was thought to be thirty-five phonemes [140].

As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called ‘allographs’, with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113]:

1) Below (e.g. কু (ku), ন্ত (nta), কূ (kū), হ্র (hra) etc.)
2) Above (e.g. চঁ (ca), র্ক (rka) etc.)
3) Right side (e.g. কা (kā), কং (kaṅ) etc.)
4) Left side (e.g. কে (ke))
5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki) etc.)
6) Right side and above simultaneously (e.g. কী (kī))
7) Right side and left side simultaneously (e.g. কো (ko))
8) Right side, left side and above simultaneously (e.g. কৌ (kau))

As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called ‘Kārs’ in Bangla (also referred to as Mātrā in the other Neo-Brāhmī LGR documents), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,
there are some special vowel modifiers of উ, as in the following combined letters:

গু (gu), rather than writing it as গ (g) + ু (u)
রু (ru), rather than writing it as র (r) + ু (u)
শু (śu), rather than writing it as শ (ś) + ু (u)
হু (hu), rather than writing it as হ (h) + ু (u)
ন্তু (ntu), rather than writing it as ন্ (n) + ত (t) + ু (u)

Similarly, there can be vowel modifiers of ঊ or ‘(long) ū’ as well, e.g. ভ্ (bh) + র (r) (ভ্রূ bhrū ‘eyebrow’), শ্ (ś) + র (r) (শ্রূ śrū), ঋ (ṛ) after হ (h) (হৃ hṛ) etc.

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In this there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and it has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now two systems, the old (traditional) and the new, go on side by side, operative in different domains.

However, in the preparation of this LGR document the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka, published Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Śiksakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha (‘transparent’) and Aswaccha (‘opaque’ or non-transparent), even adding Ardha Swaccha (‘half-transparent’) in between the two. Some sample examples are:

6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.

Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
Opaque: where neither of the two can be (easily) recognized: ক্ষ kṣ (ক্ k + ষ ṣ), জ্ঞ jñ (জ্ j + ঞ ñ), ঙ্গ ṅg (ঙ্ ṅ + গ g), হ্ম hm (হ্ h + ম m).
Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In three-term clusters at least one symbol will not be transparent, e.g. স্ত্র str (স্ s + ত্ t + র r), ষ্ট্র ṣṭr (ষ্ ṣ + ট্ ṭ + র r) etc.

There were in fact two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations, and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].

Below we place a statement of the most salient changes that affect consonant + vowel combinations [153]:

a. The variants of the short u (হ্রস্ব উ-কার hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So গু (gu) is now written as গ + ু. Similarly রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু, and therefore, for a cluster + short u sign, ন্তু (ntu) > ন্ত + ু (ন + ্ + ত + উ) and স্তু (stu) > স্ত + ু (স + ্ + ত + উ).

b. The variants of long u (দীর্ঘ ঊ-কার dīrgha ū-kāra) have also been reduced: রূ (rū) > র + ূ, ভ্রূ (bhrū) > ভ্র + ূ (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) > দ্র + ূ (দ d + ্ + র r + ঊ ū), শ্রূ (śrū) > শ্র + ূ (শ ś + ্ + র r + ঊ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + ৃ.

Regarding consonant + consonant + (consonant) ... + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it while the Bangla Akademi innovation follows) [153]:

ব্ধ bdh (ব্ b + ধ dh), দ্ধ ddh (দ্ d + ধ dh), ন্থ nth (ন্ n + থ th), ঞ্চ ñc (ঞ্ ñ + চ c), ঞ্ছ ñch (ঞ্ ñ + ছ ch), ঞ্জ ñj (ঞ্ ñ + জ j), ক্ত kt (ক্ k + ত t), ক্র kr (ক্ k + র r), গ্ধ gdh (গ্ g + ধ dh), ঙ্ক ṅk (ঙ্ ṅ + ক k), ঙ্গ ṅg (ঙ্ ṅ + গ g), ষ্ণ ṣṇ (ষ্ ṣ + ণ ṇ), ন্ধ্র ndhr (ন্ n + ধ্ dh + র r), ণ্ড্র ṇḍr (ণ্ ṇ + ড্ ḍ + র r), ক্ত্র ktr (ক্ k + ত্ t + র r)

3.3.1 The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘Varga’ (pronounced ‘Barga’ in Bangla) groups (sets or classes) distinguished by place of articulation, and one non-‘varga’ group [105]. Each varga, which corresponds to the stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated through voiced and aspirated (in the fourth column) and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:

‘Varga’ or Set | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | ক ‘K’ U+0995 | খ ‘KH’ U+0996 | গ ‘G’ U+0997 | ঘ ‘GH’ U+0998 | ঙ ‘NG’ U+0999
Palatal | চ ‘C’ U+099A | ছ ‘CH’ U+099B | জ ‘J’ U+099C | ঝ ‘JH’ U+099D | ঞ ‘NY’ U+099E
Retroflex | ট ‘TT’ U+099F | ঠ ‘TTH’ U+09A0 | ড ‘DD’ U+09A1 | ঢ ‘DDH’ U+09A2 | ণ ‘NN’ U+09A3
Dental | ত ‘T’ U+09A4 | থ ‘TH’ U+09A5 | দ ‘D’ U+09A6 | ধ ‘DH’ U+09A7 | ন ‘N’ U+09A8
Bilabial | প ‘P’ U+09AA | ফ ‘PH’ U+09AB | ব ‘B’ U+09AC | ভ ‘BH’ U+09AD | ম ‘M’ U+09AE

Table 5: Varga classification of Bangla consonants
(Falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasal consonants, called the five ‘Varga’)

Non-Varga | য ‘Y’ U+09AF, য় ‘YY’ U+09DF, র ‘R’ U+09B0, ড় ‘RR’ U+09DC, ঢ় ‘RH’ U+09DD, ল ‘L’ U+09B2, শ ‘SH’ U+09B6, ষ ‘SS’ U+09B7, স ‘S’ U+09B8, হ ‘H’ U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)
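The varga layout of Table 5 is mirrored by the Unicode Bengali block: each varga occupies five consecutive code points in the same column order. The sketch below rebuilds the Table 5 rows from their starting code points and prints the Unicode names as a cross-check. `VARGA_START` and the helper functions are illustrative names, not part of the LGR.

```python
import unicodedata

# First code point of each varga, taken from Table 5. The sketch relies on each
# varga occupying five consecutive code points, in the column order of Table 5.
VARGA_START = {
    "Velar": 0x0995,
    "Palatal": 0x099A,
    "Retroflex": 0x099F,
    "Dental": 0x09A4,
    "Bilabial": 0x09AA,  # U+09A9 is unassigned, so this row starts one step later
}


def varga_table() -> dict[str, list[str]]:
    """Rebuild the rows of Table 5 from their starting code points."""
    return {place: [chr(start + i) for i in range(5)]
            for place, start in VARGA_START.items()}


def short_name(ch: str) -> str:
    """Unicode name without the 'BENGALI LETTER ' prefix (e.g. 'KHA')."""
    return unicodedata.name(ch).removeprefix("BENGALI LETTER ")


for place, row in varga_table().items():
    print(f"{place:<10}", " ".join(row), [short_name(ch) for ch in row])
```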

3.3.2 The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (the central-back ɔ, the neutral vowel in Bangla) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dari’ in Bangla), as preferred in Unicode (cf. Unicode 3.0 and above), are used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in Unicode in a sense that differs from the traditional grammatical definition, and hence it requires some explanation here. Given the importance of the document, this note should be part of this LGR document so that anybody referring to it can find the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster one can, in common parlance, ‘kill’ its vowel and create conjuncts. In this manner conjunct characters can generally be written by joining two to four consonants; in rare cases this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically. This is an observation based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that someone may want a generic Top Level Domain [gTLD] containing a cluster with more than the observed maximum number of consonants cannot be ruled out. This can happen when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence in the Bangla LGR work this limit will not be enforced.
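Since the LGR deliberately enforces no upper limit on cluster length, a checker could at most report how long the longest C(HC)* chain in a label is. The sketch below does only that. The consonant test (U+0995 to U+09B9 plus a few extra letters) and the name `longest_cluster` are assumptions made for illustration; they are not taken from the LGR XML.

```python
HASANTA = "\u09CD"
# Hypothetical consonant test: U+0995..U+09B9 plus RRA, RHA, YYA and the
# Assamese RA/WA letters. Unassigned gaps inside the range do no harm here.
EXTRA_CONSONANTS = {"\u09DC", "\u09DD", "\u09DF", "\u09F0", "\u09F1"}


def is_consonant(ch: str) -> bool:
    return "\u0995" <= ch <= "\u09B9" or ch in EXTRA_CONSONANTS


def longest_cluster(label: str) -> int:
    """Length of the longest consonant chain joined by HASANTA.

    The value is only reported; following the text, no maximum is enforced.
    """
    best = run = 0
    after_hasanta = False
    for ch in label:
        if is_consonant(ch):
            run = run + 1 if after_hasanta else 1
            best = max(best, run)
            after_hasanta = False
        elif ch == HASANTA and run:
            after_hasanta = True
        else:  # vowel letter, vowel sign, sign, etc. ends the cluster
            run, after_hasanta = 0, False
    return best


print(longest_cluster("\u0989\u09B7\u09CD\u099F\u09CD\u09B0"))  # উষ্ট্র -> 3
print(longest_cluster("\u0995\u09BE\u099C"))                    # কাজ -> 1
```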

3.3.3 Vowels
Separate symbols exist for all ‘Swara’ or vowels in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign, called ‘kār’ in Bangla or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kārs (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means ‘syllable’ in Indian grammatical traditions.

Vowel | Corresponding vowel sign (kār / Mātrā)
অ ‘A’ U+0985 | -
আ ‘AA’ U+0986 | া U+09BE
ই ‘I’ U+0987 | ি U+09BF
ঈ ‘II’ U+0988 | ী U+09C0
উ ‘U’ U+0989 | ু U+09C1
ঊ ‘UU’ U+098A | ূ U+09C2
ঋ Vocalic ‘R’ U+098B | ৃ U+09C3
ৠ Vocalic ‘RR’ U+09E0 | ৄ U+09C4
ঌ Vocalic ‘L’ U+098C | ৢ U+09E2
ৡ Vocalic ‘LL’ U+09E1 | ৣ U+09E3
এ ‘E’ U+098F | ে U+09C7
ঐ ‘AI’ U+0990 | ৈ U+09C8
ও ‘O’ U+0993 | ো U+09CB
ঔ ‘AU’ U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ ‘A’ U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ ‘A’ U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ ‘A’ U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kārs

9 Although the term ‘Mātrā’ in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
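The vowel-to-kār correspondence of Table 7 can be written as a lookup table. The sketch below encodes the thirteen pairs, checks that each pair shares its Unicode name suffix (e.g. LETTER AA / VOWEL SIGN AA), and shows that a pre-posed kār such as ি is still stored after the consonant even though it is drawn before it. `VOWEL_TO_KAR` and `with_kar` are hypothetical names for this illustration.

```python
import unicodedata

# Independent vowel letter -> dependent vowel sign (kār), following Table 7.
# অ (U+0985) has no kār because it is the inherent vowel.
VOWEL_TO_KAR = {
    "\u0986": "\u09BE",  # AA
    "\u0987": "\u09BF",  # I (drawn before the consonant, stored after it)
    "\u0988": "\u09C0",  # II
    "\u0989": "\u09C1",  # U
    "\u098A": "\u09C2",  # UU
    "\u098B": "\u09C3",  # VOCALIC R
    "\u09E0": "\u09C4",  # VOCALIC RR
    "\u098C": "\u09E2",  # VOCALIC L
    "\u09E1": "\u09E3",  # VOCALIC LL
    "\u098F": "\u09C7",  # E
    "\u0990": "\u09C8",  # AI
    "\u0993": "\u09CB",  # O
    "\u0994": "\u09CC",  # AU
}


def names_match() -> bool:
    """Check that each vowel and its kār share the same Unicode name suffix."""
    for vowel, kar in VOWEL_TO_KAR.items():
        v = unicodedata.name(vowel).removeprefix("BENGALI LETTER ")
        k = unicodedata.name(kar).removeprefix("BENGALI VOWEL SIGN ")
        if v != k:
            return False
    return True


def with_kar(consonant: str, vowel: str) -> str:
    """Attach the kār of `vowel` to `consonant` in logical (storage) order."""
    return consonant + VOWEL_TO_KAR[vowel]


print(names_match())                                        # True
print([hex(ord(c)) for c in with_kar("\u0995", "\u0987")])  # ['0x995', '0x9bf']
```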

3.3.4 The Anusvāra onuʃʃār (ং U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it also often appears in such combinations when a non-velar is the last member, as in ল্যাংটা ‘naked’ or ল্যাংচা ‘a kind of sweet; to limp’. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, Bangla clearly demarcates where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.

3.3.5 Nasalization: Candrabindu (ঁ U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cād ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
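The code point sequence quoted for চাঁদ can be checked directly against the Unicode character database shipped with Python. The sketch below (the helper name `describe` is hypothetical) lists each code point with its name, showing that Candrabindu is stored after the vowel sign it nasalizes.

```python
import unicodedata


def describe(word: str) -> list[tuple[str, str]]:
    """List each code point of `word` with its Unicode character name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


moon = "\u099A\u09BE\u0981\u09A6"  # চাঁদ 'moon'
for cp, name in describe(moon):
    print(cp, name)
# U+099A BENGALI LETTER CA
# U+09BE BENGALI VOWEL SIGN AA
# U+0981 BENGALI SIGN CANDRABINDU
# U+09A6 BENGALI LETTER DA
```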

3.3.6 Nukta (় U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC); RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke), the IDNA protocol requires that we treat them as sequences of a base consonant plus Nukta. It may be noted that there could be sporadic attempts or cases of writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the sanctity of the loanword. This amounts to using the Bangla writing system like the IPA script. It is, however, not in use in printed Bangla.
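The effect of the composition exclusions can be observed with Python's standard `unicodedata.normalize`: NFC never produces the three precomposed letters, so an NFC-normalized label always carries base consonant + NUKTA. This is a small demonstration of the normalization behaviour described above, not part of the LGR itself.

```python
import unicodedata

# The three precomposed letters are Unicode composition exclusions, so NFC
# (required for IDN labels) always yields base consonant + NUKTA (U+09BC).
for ch in ("\u09DC", "\u09DD", "\u09DF"):  # ড়, ঢ়, য়
    nfc = unicodedata.normalize("NFC", ch)
    print(unicodedata.name(ch), "->", " + ".join(unicodedata.name(c) for c in nfc))
# BENGALI LETTER RRA -> BENGALI LETTER DDA + BENGALI SIGN NUKTA
# BENGALI LETTER RHA -> BENGALI LETTER DDHA + BENGALI SIGN NUKTA
# BENGALI LETTER YYA -> BENGALI LETTER YA + BENGALI SIGN NUKTA

# The decomposed sequence is already in NFC and is left unchanged.
print(unicodedata.is_normalized("NFC", "\u09AF\u09BC"))  # True
```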

3.3.7 Visarga biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho ‘sorrow’, ‘unhappiness’ (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in the Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and is now rarely used even in other languages that use the Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
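As a hypothetical illustration of that decision, a label checker could flag any code point outside the repertoire. The set below lists only Avagraha; the authoritative repertoire is the one in the LGR XML file, and `NOT_IN_REPERTOIRE` and `disallowed` are names invented for this sketch.

```python
# Hypothetical illustration: flag code points that are outside the proposed
# repertoire. Only Avagraha (U+09BD) is listed here; the authoritative
# repertoire is defined in the LGR XML file, not in this set.
NOT_IN_REPERTOIRE = {"\u09BD"}  # BENGALI SIGN AVAGRAHA


def disallowed(label: str) -> list[str]:
    """Return the U+XXXX values of code points not allowed in the label."""
    return [f"U+{ord(ch):04X}" for ch in label if ch in NOT_IN_REPERTOIRE]


dukkha = "\u09A6\u09C1\u0983\u0996"                               # দুঃখ (Visarga is kept)
verse = "\u09A8\u09B0\u09CB\u09BD\u09AA\u09B0\u09BE\u09A3\u09BF"  # নরোঽপরাণি
print(disallowed(dukkha))  # []
print(disallowed(verse))   # ['U+09BD']
```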

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)
This note is pertinent to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It should be noted that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script offers a choice between joining and non-joining forms. Without these control codes the string may be rendered in a form other than the one intended.
Use of ZWJ:

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) ‘Satyajit’ and সৎ (sat) ‘honest’. This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending on the context. For example, র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য one types র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that require ya-phalā after ra, which otherwise could not be typed (rendered), since ra + hasanta + antastha ja in medial and/or final position is already taken: ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally) etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ:

• ZWNJ is used in Bangla to represent the explicit Hasanta (Halant). To avoid conjunct formation where an explicit hasanta is wanted before the succeeding consonant, ZWNJ is inserted:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon).

The use of ZWJ and ZWNJ has been ruled out from the root zone by the [Procedure]. Because they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP.

The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly suppressed and the Hasanta joining the two consonants of the would-be conjunct needs to be shown explicitly.
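The sequences above differ only by invisible control characters, which is exactly why the root zone excludes them. The sketch below builds the typed sequences from the text, detects joiners, and shows that deleting ZWJ collapses the ya-phalā form onto the repha form. The helper names `has_joiner` and `strip_joiners` are hypothetical.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"
HASANTA, RA, YA = "\u09CD", "\u09B0", "\u09AF"

repha_on_ya = RA + HASANTA + YA          # ra + hasanta + ya: repha on ya, as in kārya
yaphala_on_ra = RA + ZWJ + HASANTA + YA  # ra + ZWJ + hasanta + ya: ya-phalā after ra
# prākkathana: ZWNJ after the first HASANTA keeps it visible and blocks the conjunct.
prakkathan = "\u09AA\u09CD\u09B0\u09BE\u0995\u09CD\u200C\u0995\u09A5\u09A8"


def has_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ (not allowed in the root zone)."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label: str) -> str:
    return label.replace(ZWJ, "").replace(ZWNJ, "")


print(has_joiner(repha_on_ya), has_joiner(yaphala_on_ra), has_joiner(prakkathan))
# False True True
print(strip_joiners(yaphala_on_ra) == repha_on_ya)  # True
```

Without the joiners both spellings reduce to the same code point sequence, so only the repha reading remains expressible in a root-zone label.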

3.3.9 Use of Ya-phalā
Ya-phalā sequences provide two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA
• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kār is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. As we see here, however, Bangla orthography has a deviant feature in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
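A whole-label rule could confine this deviant Hasanta-after-vowel pattern to the two ligatures named above. The sketch below shows one possible check under that assumption; the exact rule used in the LGR is defined in its WLE section, and `vowel_hasanta_ok` is a hypothetical name.

```python
HASANTA, YA, AA_SIGN = "\u09CD", "\u09AF", "\u09BE"
A, E = "\u0985", "\u098F"  # অ, এ


def is_vowel_letter(ch: str) -> bool:
    """Independent vowel letters U+0985..U+0994 plus VOCALIC RR / LL."""
    return "\u0985" <= ch <= "\u0994" or ch in ("\u09E0", "\u09E1")


def vowel_hasanta_ok(label: str) -> bool:
    """Hypothetical check: HASANTA right after a vowel letter is accepted only
    in the ligatures অ্যা and এ্যা (vowel + HASANTA + YA + AA sign)."""
    for i, ch in enumerate(label):
        if ch == HASANTA and i > 0 and is_vowel_letter(label[i - 1]):
            if label[i - 1] not in (A, E) or label[i + 1:i + 3] != YA + AA_SIGN:
                return False
    return True


acid = "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"  # অ্যাসিড 'acid'
print(vowel_hasanta_ok(acid))                        # True
print(vowel_hasanta_ok("\u0987\u09CD\u09AF\u09BE"))  # False: ই + HASANTA
```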

3.3.10 Formation of Ra-phalā and Repha Sequences
This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra ‘cycle’). The point is that in both cases the slot for ra can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of an adjacent Hasanta. The two RAs can only be told apart by looking at their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it is noteworthy that the REPHA can also occur with KHANDA TA. In this context of KHANDA TA, the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
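The repha / ra-phalā distinction and the RA variant concern can be sketched as follows. `ra_roles` classifies each RA by the position of its Hasanta, and `variant_key` folds the Assamese RA into the Bangla RA whenever it stands next to a Hasanta, so that two visually identical labels map to the same key. Folding in both positions (before and after Hasanta) is an assumption of this sketch; the normative variant rules are those of Variant Set 4 in the LGR XML.

```python
HASANTA = "\u09CD"
BANGLA_RA, ASSAMESE_RA = "\u09B0", "\u09F0"
RAS = {BANGLA_RA, ASSAMESE_RA}


def ra_roles(label: str) -> list[tuple[int, str]]:
    """Classify each RA as 'repha' (RA + H + C) or 'ra-phala' (C + H + RA)."""
    roles = []
    for i, ch in enumerate(label):
        if ch not in RAS:
            continue
        if label[i + 1:i + 2] == HASANTA:
            roles.append((i, "repha"))
        elif i >= 2 and label[i - 1] == HASANTA:
            roles.append((i, "ra-phala"))
    return roles


def variant_key(label: str) -> str:
    """Hypothetical blocking key: fold Assamese RA into Bangla RA next to HASANTA."""
    out = list(label)
    for i, ch in enumerate(label):
        next_to_hasanta = label[i + 1:i + 2] == HASANTA or (i > 0 and label[i - 1] == HASANTA)
        if ch == ASSAMESE_RA and next_to_hasanta:
            out[i] = BANGLA_RA
    return "".join(out)


arka_bn = "\u0985\u09B0\u09CD\u0995"  # arka written with Bangla RA
arka_as = "\u0985\u09F0\u09CD\u0995"  # same rendering with Assamese RA
print(ra_roles(arka_bn), ra_roles("\u099A\u0995\u09CD\u09B0"))  # chakra
# [(1, 'repha')] [(3, 'ra-phala')]
print(variant_key(arka_as) == variant_key(arka_bn))  # True
```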

4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in linguistics (especially NLP and computational linguistics), literature, language, history and epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters needed by the script in question, if a particular character is not encoded in Unicode it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode being the character encoding standard that aims at the maximum possible representation of a given script/language, it has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR being the repertoire of characters to be used for creating root zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA’s allowed set of characters.


Example: Bangla Sign Avagraha ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation markers present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures and other such iconic characters, like BANGLA ISSHAR ৺ (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.

4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic -kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ◌ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ◌ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ◌ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).

5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the Bangla LGR.

It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 8 is valid for all three languages above.

For each of the code points, language references have been given in the last column ("References and Comment") of Table 8, the Bangla Code-Point Repertoire. For full coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3).

However, before the details are presented, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background.
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background.
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background.

Given the Bangla Unicode block as in Figure 1, of the code points that are included in the MSR, the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ◌ঁ | BENGALI SIGN CANDRABINDU | Candrabindu | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]. U+09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]. U+09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]. U+09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112], [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
50 | U+09BE | ◌া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
51 | U+09BF | ◌ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
52 | U+09C0 | ◌ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
53 | U+09C1 | ◌ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
54 | U+09C2 | ◌ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
55 | U+09C3 | ◌ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
56 | U+09C4 | ◌ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112], [122]
57 | U+09C7 | ◌ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
58 | U+09C8 | ◌ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
59 | U+09CB | ◌ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
60 | U+09CC | ◌ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
61 | U+09CD | ◌্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112], [122], [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122], [125]
Table 8: Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences in the repertoire. They enable the conditional inclusion of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, so that the æ sound, as in English 'bat', 'cat', etc., can be written.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference
S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112], [122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]
Table 9: Sequences

5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ◌ৗ | BENGALI AU LENGTH MARK | Limited or declining use
Table 10: Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used in sequence in three special characters in normalized form: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ◌় | BENGALI SIGN NUKTA | Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively.
Table 10b: Excluded Code Points
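Why ড়, ঢ় and য় appear in Table 8 as two-code-point sequences can be checked with any Unicode normalizer. The sketch below uses only Python's standard `unicodedata` module and assumes that labels are handled in Normalization Form C (NFC), as IDNA2008 requires of U-labels. U+09DC, U+09DD and U+09DF are Unicode composition exclusions, so NFC replaces each of them with its base consonant followed by U+09BC BENGALI SIGN NUKTA and never recomposes them. The helper name `nfc_code_points` is illustrative only.

```python
import unicodedata

# Precomposed nukta letters from Table 8 (rows 28, 30 and 43).
NUKTA_LETTERS = {
    "BENGALI LETTER RRA": "\u09DC",
    "BENGALI LETTER RHA": "\u09DD",
    "BENGALI LETTER YYA": "\u09DF",
}

def nfc_code_points(text: str) -> list[str]:
    """Return the NFC form of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

for name, char in NUKTA_LETTERS.items():
    # Composition exclusions: NFC decomposes them and does not recompose,
    # so only the base + NUKTA sequence can ever occur in an NFC label.
    print(name, nfc_code_points(char))
```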

5.4 The Basis of the Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. No. | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group
Table 11: The ABNF Formalism


5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary.

A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). A V in Bangla may be followed by more than one of D, B or X, but not by the same sign twice: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to combine a candrabindu with an anusvara on the one hand and with a visarga on the other, as in হ্যাঁংচা 'hænchā' and হ্যাঁঃ 'hæn' respectively. That is, a Visarga or Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla.

A Vowel can also optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta), U+09AF য BENGALI LETTER YA>; it has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A Vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel
Example: V অ अ

5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX], or by a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].

Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ
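Sections 5.4.2.1 to 5.4.2.3 together describe one pattern over category letters, roughly V [HCM] [B | D | X | DB | DX] in the notation of Table 11. The Python sketch below maps each code point to its category and matches that pattern with a regular expression. It is an illustration under stated assumptions, not the NBGP's LGR: the category table is deliberately partial (only the characters used in the examples above), and the names `CATEGORY`, `categories` and `is_vowel_sequence` are hypothetical.

```python
import re

# Category letters from Section 5.4.1, only for the characters used in the
# examples of Section 5.4.2 (a deliberately small, partial table).
CATEGORY = {
    "\u0985": "V",  # অ BENGALI LETTER A
    "\u098F": "V",  # এ BENGALI LETTER E
    "\u09AF": "C",  # য BENGALI LETTER YA
    "\u09BE": "M",  # া BENGALI VOWEL SIGN AA
    "\u0982": "B",  # BENGALI SIGN ANUSVARA
    "\u0981": "D",  # BENGALI SIGN CANDRABINDU
    "\u0983": "X",  # BENGALI SIGN VISARGA
    "\u09CD": "H",  # BENGALI SIGN VIRAMA (Hasanta)
}

# V, optionally HCM (Ya-phala + aa-kara), optionally one of B, D, X, DB or DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")

def categories(text: str) -> str:
    """Map each code point to its category letter (KeyError if not in the table)."""
    return "".join(CATEGORY[ch] for ch in text)

def is_vowel_sequence(text: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(categories(text)) is not None

for sample in ["\u0985\u0981\u0982", "\u0985\u09CD\u09AF\u09BE\u0983", "\u0985\u0982\u0982"]:
    print(categories(sample), is_vowel_sequence(sample))
# prints: VDB True / VHCMX True / VBB False
```

The pattern accepts VDB and VHCMX and rejects VBB, in line with the statement that VDD, VBB and VXX are invalid. The WLE rules of Section 7 narrow VHCM further, to the two sequences S1 and S2 of Table 9.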

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:
CM কি, কু कि, कु

CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Example:
CMB কীং, কুং कीं, कुं

CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

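5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):
*3(CH)C

Example:
CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य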

5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

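Example:
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example: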

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
ক্কুং → ক + ্ + ক + ু + ং; क्कुं → क + ् + क + ु + ं

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
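The consonant-sequence rules of Sections 5.4.3.1 to 5.4.3.5 can likewise be read as one pattern: either a pure consonant CH, or *3(CH)C followed by an optional M and an optional B, D, X, DB or DX. In regular-expression terms *3(CH) becomes (CH){0,3}. The Python sketch below is a hypothetical illustration, not the LGR itself; the code-point ranges in `category` are coarse and also cover a few unassigned or excluded code points, which a real implementation would reject first.

```python
import re

def category(ch: str) -> str:
    """Approximate category letter for a Bengali code point (Section 5.4.1)."""
    cp = ord(ch)
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    named = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H"}
    if cp in named:
        return named[cp]
    raise ValueError(f"U+{cp:04X} is not handled by this sketch")

# A pure consonant (CH), or up to three (CH) pairs, then C, an optional
# kara M and an optional B, D, X, DB or DX.
CONSONANT_SEQUENCE = re.compile(r"CH|(?:CH){0,3}CM?(?:DB|DX|B|D|X)?")

def is_consonant_sequence(text: str) -> bool:
    cats = "".join(category(ch) for ch in text)
    return CONSONANT_SEQUENCE.fullmatch(cats) is not None
```

With these assumptions, ন্ত্র্য (CHCHCHC) and ক্কাঁঃ (CHCMDX) are accepted, while কংং, কীী and a five-consonant cluster are rejected, since (CH){0,3} caps the cluster at four consonants.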

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):
[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
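The [CH]Z rule, together with the note restricting C to the two RA letters, fits in a single regular expression over code points. The following Python sketch (hypothetical names `KHANDA_TA_SEQUENCE` and `is_khanda_ta_sequence`) shows that reading and locates the র্ৎ cluster inside ভর্ৎসনা.

```python
import re

# [CH]Z restricted as in the note above: the optional C must be
# BENGALI LETTER RA (U+09B0) or RA WITH MIDDLE DIAGONAL (U+09F0).
KHANDA_TA_SEQUENCE = re.compile("(?:[\u09B0\u09F0]\u09CD)?\u09CE")

def is_khanda_ta_sequence(text: str) -> bool:
    return KHANDA_TA_SEQUENCE.fullmatch(text) is not None

word = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # ভর্ৎসনা (bhartsanā)
match = KHANDA_TA_SEQUENCE.search(word)
print(match.start(), match.end())  # 1 4: the র্ৎ cluster
```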

5.4.5 Special Cases: S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, so only consonants can carry a Hasanta. But, as we see here, Bangla ends up with a deviant orthographic feature wherein Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, listed as P in Table 16, refers to the formation of repha and ra-phalā in the script. Owing to the co-occurrence with HASANTA, either RA loses its own implicit vowel (REPHA) or the implicit vowel of the consonant preceding RA is suppressed (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.
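The ordering difference between repha (RA + Hasanta + C) and ra-phalā (C + Hasanta + RA) can be made concrete with a short Python sketch. The helper `ra_forms` is hypothetical: it looks at each Hasanta that has a neighbour on both sides, reports a repha when RA precedes it and a ra-phalā when RA follows it, and accepts either র or ৰ in the RA slot. It does not check that the neighbours are consonants.

```python
RA = {"\u09B0", "\u09F0"}  # Bangla RA (র) and Assamese RA (ৰ)
HASANTA = "\u09CD"

def ra_forms(word: str) -> list[str]:
    """Label each inner Hasanta as part of a repha or a ra-phala, if any."""
    forms = []
    for i in range(1, len(word) - 1):
        if word[i] != HASANTA:
            continue
        if word[i - 1] in RA:
            forms.append("repha")      # RA + H + C
        elif word[i + 1] in RA:
            forms.append("ra-phala")   # C + H + RA
    return forms

print(ra_forms("\u0985\u09B0\u09CD\u0995"))  # অর্ক arka: ['repha']
print(ra_forms("\u099A\u0995\u09CD\u09B0"))  # চক্র cakra: ['ra-phala']
```

Because the RA slot accepts both letters, swapping র for ৰ leaves the result unchanged, which mirrors the point that the shapes stay the same.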


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but not identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and hence are being proposed as variants.

Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing the e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
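The ko example can also be checked with Unicode normalization, which is a further reason it need not be handled as a variant. IDNA2008 requires U-labels to be in NFC, and U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 followed by U+09BE, so NFC turns the two-key form back into the single code point. A minimal Python sketch using only the standard `unicodedata` module:

```python
import unicodedata

one_key = "\u0995\u09CB"         # KA + VOWEL SIGN O (single code point)
two_keys = "\u0995\u09C7\u09BE"  # KA + VOWEL SIGN E + VOWEL SIGN AA

print(one_key == two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: NFC composes E + AA into O
print(unicodedata.decomposition("\u09CB"))                # 09C7 09BE
```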

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.

CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


The above combinations, if written in traditional orthography, could be a little confusing, because the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as হ (ha, U+09B9) by someone who takes it for a ha, increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

The fonts which represent the traditional Bangla writing system could tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II
Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' as well as 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

Where:
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (◌্)
C = any consonant (theoretically)

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (◌্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (◌্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (◌্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (◌্) U+09A0 (ঠ) | ৰ্ঠ
Table 12: Example of Repha


Note: Since the repha formed from Bangla র and from Assamese ৰ has the same shape, the resulting combinations with ক and ঠ look identical; the choice of RA makes no visible difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

Where C1 = any consonant except Khanda-ta

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (◌্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (◌্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (◌্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (◌্) U+09F0 (ৰ) | ন্ৰ
Table 13: Example of Ra-phala

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, this could cause confusion for end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. But this will create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. However, the blocking applies only at the label level: if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, it is still possible.
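How a single র/ৰ variant set expands into blocked labels can be sketched by substituting every variant at every position. The Python below is a minimal illustration under simplifying assumptions: the variant map is hypothetical and every generated variant is treated as blocked, whereas the actual LGR XML assigns a disposition to each variant mapping.

```python
from itertools import product

# Hypothetical variant map for this sketch: Bangla RA (U+09B0) and Assamese
# RA (U+09F0) are each other's variants; every other code point maps to itself.
VARIANTS = {"\u09B0": ("\u09B0", "\u09F0"), "\u09F0": ("\u09F0", "\u09B0")}

def variant_labels(label: str) -> set[str]:
    """All labels reachable by swapping র and ৰ position by position, minus the label itself."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}

blocked = variant_labels("\u09B0" * 3)   # registering ররর
print(len(blocked))                      # 7
print("\u09F0" * 3 in blocked)           # True: ৰৰৰ is blocked
print("\u09F0" * 2 in blocked)           # False: ৰৰ is a different label
```

A three-letter label with two choices per position yields 2³ − 1 = 7 variant labels, including mixed forms such as রৰর, while labels of a different length such as ৰৰ or ৰৰৰৰ stay available.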

6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. There are certain other characters which are somewhat similar, but they have not been included here; they are provided in the Appendix (10.2) for reference.

¹¹ Unicode uses "Oriya" for the script, although "Odia" is now the official term.


1. Bangla and Nāgarī (Devanāgarī) Script
Bangla | Devanāgarī
ম U+09AE | म U+092E
◌ি U+09BF | ◌ि U+093F
Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script
Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
◌ি U+09BF | ◌ਿ U+0A3F
Table 15: Bangla and Gurmukhī cross-script variant code points

7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in Code Point Repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1 and S2 (from Table 9), also described as the (æ) Ya-phalā sequence V1 H C1 M1, where V1 is either U+0985 (অ BENGALI LETTER A) or U+098F (এ BENGALI LETTER E), H is U+09CD (◌্ BENGALI SIGN VIRAMA), C1 is U+09AF (য BENGALI LETTER YA) and M1 is U+09BE (◌া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either U+09B0 (র BENGALI LETTER RA) or U+09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is U+09CD (◌্ BENGALI SIGN VIRAMA).
Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It may also be worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, below are the specific WLE rules that need to be implemented.


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
Example: ক্
3. M must be preceded by C.
Example: কা
4. D must be preceded by either V, C or M.
Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D.
Example: উঃ, খঃ, বঃ, দঁঃ
6. B must be preceded by either V, C, M or D.
Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P.
Example: ইৎ, কৎ, র্ৎ
(S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H (details in 7.1.1, Case of V Preceded by H).
9. S CANNOT be preceded by H.
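All nine rules above are "must be preceded by" or "cannot be preceded by" constraints, so they can be checked in one left-to-right pass once a label is split into symbols. The Python sketch below is a minimal, hypothetical illustration of that idea, not the NBGP's LGR XML. It assumes approximate code-point ranges in `category`, treats S1/S2 and the CN nukta sequences as single tokens, counts S as M for whatever follows it (per the note under rule 7), and does not enforce the ABNF limits of Section 5.4, such as the maximum of four consonants in a cluster.

```python
def category(ch: str) -> str:
    """Approximate Table 16 symbol for one code point."""
    cp = ord(ch)
    named = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
    if cp in named:
        return named[cp]
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    raise ValueError(f"U+{cp:04X} is not handled by this sketch")

RA = {"\u09B0", "\u09F0"}
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}    # ড ঢ য (rule 1: CN)
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE",      # S1 অ্যা
               "\u098F\u09CD\u09AF\u09BE")      # S2 এ্যা

def tokenize(label: str) -> list[tuple[str, str]]:
    """Split a label into (symbol, text) tokens; S and CN count as one token."""
    tokens, i = [], 0
    while i < len(label):
        if label.startswith(S_SEQUENCES, i):
            tokens.append(("S", label[i:i + 4]))
            i += 4
        elif label[i] in NUKTA_BASES and label[i + 1:i + 2] == "\u09BC":
            tokens.append(("C", label[i:i + 2]))
            i += 2
        else:
            tokens.append((category(label[i]), label[i]))
            i += 1
    return tokens

# Rules 2-7: the symbols each symbol must be preceded by (P is handled separately).
MUST_FOLLOW = {
    "H": {"C"}, "M": {"C"}, "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"}, "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},
}

def wle_valid(label: str) -> bool:
    tokens = tokenize(label)
    for k, (sym, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k > 0 else None
        prev = "M" if prev == "S" else prev          # S ends with a kara
        if sym in ("V", "S") and prev == "H":        # rules 8 and 9
            return False
        if sym not in MUST_FOLLOW or prev in MUST_FOLLOW[sym]:
            continue
        if sym == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA:
            continue                                 # rule 7: P = RA + H
        return False
    return True
```

Under these assumptions the sketch accepts ভর্ৎসনা and অ্যাং, and rejects কংঃ, ক্ৎ and the joined label ব্যাঙ্কঅফ্ইন্ডিয়া discussed in 7.1.1.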

Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures, because any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE), meaning 'Bank of India'. This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.

This is a unique situation, necessitated by the absence of the hyphen, space or Zero Width Non-Joiner character from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word.
Example:
্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक
As can be seen, such a combination automatically results in a "golu" or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.
Example:
অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D or X permitted after a Consonant, a Vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalid.
Example:
কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a Consonant is restricted to one.
Example:
কীী कीी

5. M is not permitted after V.
Example:
ইা, ঈৌ ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Example:
কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka


9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (reprint of the Calcutta 1920 edition). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2011. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

54

[118] -----1980ScriptalchoiceandspellingreformAnessayinlanguageandplanningJournaloftheMSUniversityofBarodaSocialScienceNumber292173-186AmodifiedversionreprintedEAnnamalaiBjornJernuddandJoanRubinedsLanguagePlanningProceedingsofanInstituteMysoreCIIL405-417

[119] Sripantha1996JakhanChapakhanaEloKolkataPaschim-BangaBanglaAcademy

[120] SurAtul1986BanglaMudranerDushoBacharKolkataJijnasa

[121] ScriptBehaviourforBengaliVersion11TDILandC-DACPune

[122] BoraMahendra1981TheEvolutionofAssameseScriptJorhatAssamSahityaSabha

[123] ProposaltoEncodetheTirhutaScriptinISOIEC10646httpwwwunicodeorgL2L201111175r-tirhutapdfaccessedon25112017

[124] EthnologueAssameseintheLanguageCloudhttpswwwethnologuecomcloudasmaccessedon25112017

[125] BengalialphabetforManipurifoundinEthnologueManipuri(MeeteilonMeithei)httpswwwomniglotcomwritingmanipurihtmaccessedon20102019

[126] WikipediaBengalialphabethttpsenwikipediaorgwikiBengali_alphabetaccessedon25112017

[129]OmniglotSlyhetihttpwwwomniglotcomwritingsylotihtmaccessedon1052018

[130]WikipediaBishnupriyaManipurilanguagehttpsenwikipediaorgwikiBishnupriya_Manipuri_languageaccessedon25112017

[131] TheEMILLECIILCorpushttpmetashareeldaorgrepositorybrowsethe-emilleciil-corpusabdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20accessedon1052018

[132]TheEMILLECIILCorpushttpcatalogelrainfoproduct_infophpproducts_id=696accessedon1052018

[133] BanglaLanguageampScript httpswwwisicalacin~rc_banglabanglahtmlaccessedon1052018

[134] SarkarPabitra1992BanglaBananSanskarSamasyaoSambhabanaKolkataChirayataPrakashan

[135] SarkarPabitra1993BanglaBhasharYuktabyanjanBhasha1123-45

55

[136] DashNiladriShekharandBBChaudhuri1998BanglaScriptAStructuralStudyLinguisticsToday121-28Alsoavailableathttpswwwacademiaedu9967428Bangla_Script_A_Structural_Study

[137] DaniAhmedHasan(1957)lsquoSrīhatta-NāgarīLipirUtpattioBikāśrsquoBanglaAcademyPatrika(Dhaka)Vol12(Bhadra-Agrahayan1364BangabdaNumber)pg1

[138] WikipediaSylhetiNagari httpsenwikipediaorgwikiSylheti_Nagariaccessedon1952018

[139] FuruiRyosuke(2015)lsquoVariegatedAdaptationsStateFormationinBengalfromtheFifthtoSeventhCenturyrsquoinBhairabiPrasadSahuampHermannKulkeedsInterrogatingPoliticalSystemsIntegrativeProcessesandStatesinPre-ModernIndiaChapter9Pp255-73NewDelhiManohar

[140] FergusonCharesAandMunierChowdhury(1960)lsquoPhonesofBengalirsquoLanguageVol36No1pp22-59

[141] ShahidullahMuhammad(2007)BuddhistMysticSongsDhakaMowlaBrothers

[142] RayPunyaSloka(1966)BengaliLanguageHandbookWashington

[143] HaiMuhammadAbdul(1960)AphoneticandphonologicalstudyofnasalsandnasalizationinBengaliDhakaUniversityofDhaka

[144] UnicodeConsortiumProposalSummaryFormtoAccompanySubmissionsforAdditionstotheRepertoireofISOIEC10646UNICODEhttpswwwunicodeorgL2L200202387r-syloti-formpdfaccessedonMay212018

[145] WikipediaOlChiki(Unicodeblock)httpsenwikipediaorgwikiOl_Chiki_(Unicode_block)accessedonMay212018

[146]BanglaScripthttpwwwbangladesh2000combdbangla_scripthtmlaccessedonMay212018

[147] BhattacharyaAshutoshed(1942)GopichandrerGanCalcuttaCalcuttaUniversity

[149] DasSisirKumar(1975)SahibsandMunshisAnAccountoftheCollegeofFortWilliamCalcutta

[150] IslamRafiqulPabitraSarkarMahbubulHaqampRajibChakraborty(eds)(2014)BanglaAcademyPromitoBanglaByabaharikByakaran(AFunctionalGrammarofStandardBangla)DhakaBanglaAcademy

[151] SarkarPabitra[2013]lsquoBanglaSpellingReformtheLongandShortofItrsquoBanglaJournal19215-232

[152] BanglaAcademy(2012)BanglaAcademyPromitoBanglaBananerNiyam(StandardBanglaSpellingasadoptedbyBanglaAcademy)DhakaBanglaAcademy

56

[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018

[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473

57

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa Ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, though a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichu Chor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍa Ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH (consonant + Hasanta) can occur with Khaṇḍa Ta only in the case where C is ra (র, U+09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM (vowel + Hasanta + consonant + matra) will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
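The three restrictions above can be checked mechanically on a label. The following is a minimal sketch, assuming labels are plain Unicode strings without ZWJ/ZWNJ; the function and constant names are hypothetical, and the normative form of these rules is the one in the LGR XML file (proposal-bengali-lgr-02mar20-en.xml), not this code.

```python
# Hypothetical sketch of the three Appendix-I restriction rules (not the LGR XML).
HASANTA = "\u09CD"      # ্
KHANDA_TA = "\u09CE"    # ৎ
RA = "\u09B0"           # র
NO_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}  # ৎ, ঞ, ঙ (rule 1)

# Independent vowels: U+0985-U+098C, U+098F-U+0990, U+0993-U+0994, U+09E0-U+09E1
INDEPENDENT_VOWELS = (
    {chr(cp) for cp in range(0x0985, 0x098D)}
    | {"\u098F", "\u0990", "\u0993", "\u0994", "\u09E0", "\u09E1"}
)

# Rule 3: the only permitted vowel + hasanta + consonant + matra sequences
ALLOWED_VHCM = {
    "\u0985\u09CD\u09AF\u09BE",  # অ্যা
    "\u098F\u09CD\u09AF\u09BE",  # এ্যা
}


def check_restrictions(label: str) -> list[int]:
    """Return the numbers of the rules (1-3) that `label` violates."""
    violations = []
    # Rule 1: ৎ, ঞ, ঙ may not begin a label (initial য় is deliberately not checked).
    if label and label[0] in NO_INITIAL:
        violations.append(1)
    for i, ch in enumerate(label):
        # Rule 2: consonant + hasanta + ৎ is allowed only when that consonant is র.
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == HASANTA and label[i - 2] != RA:
            violations.append(2)
        # Rule 3: an independent vowel followed by hasanta must form অ্যা or এ্যা.
        if (ch in INDEPENDENT_VOWELS and i + 1 < len(label)
                and label[i + 1] == HASANTA and label[i:i + 4] not in ALLOWED_VHCM):
            violations.append(3)
    return violations
```

For example, ভর্ৎসনা and অ্যাসিড pass with an empty list, while ৎসেরিং reports rule 1. Returning rule numbers rather than a boolean keeps the sketch useful for explaining why a test label is rejected.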

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names of Sylheti Nagarī or Siloti (129), such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal had brought the script with him to Sylhet in the 13th-14th century (138), although some suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points
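Since the pairs marked "Confusable" in Tables 18, 19 and 21 are deliberately not defined as variants, one plausible use is an informational lookup that flags cross-script look-alikes in a label during review. The sketch below is hypothetical (the dictionary and function names are ours); it encodes only the confusable pairs and ignores the distinguishable ones in Tables 20 and 22.

```python
# Hypothetical lookup built from Tables 18, 19 and 21 (pairs marked "Confusable").
CONFUSABLE = {
    "\u0983": ["\u0903"],            # ঃ ~ Devanagari visarga ः
    "\u0993": ["\u0909", "\u0B13"],  # ও ~ Devanagari उ, Oriya ଓ
    "\u0998": ["\u0918", "\u0A2C"],  # ঘ ~ Devanagari घ, Gurmukhi ਬ
    "\u0981": ["\u0945", "\u0A71"],  # ঁ ~ Devanagari ॅ, Gurmukhi ੱ
}


def u(ch: str) -> str:
    """Format a character as U+XXXX."""
    return f"U+{ord(ch):04X}"


def cross_script_lookalikes(label: str) -> list[tuple[int, str, list[str]]]:
    """Return (position, Bangla code point, look-alike code points) per confusable char."""
    return [(pos, u(ch), [u(c) for c in CONFUSABLE[ch]])
            for pos, ch in enumerate(label) if ch in CONFUSABLE]
```

A label such as ঘর yields one hit at position 0 (U+0998 against U+0918 and U+0A2C); this only informs a reviewer and does not block or bundle the label.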

11 Appendix-II: Bengali consonants and their allographs

Consonants | Phonetic value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট | ʈ | ট্ট, ট্য, ট্ব, ট্র, ক্ট, ষ্ট

ঠ | ʈʰ | ঠ্য, ণ্ঠ, ষ্ঠ

ড | ɖ | ড্ড, ড্য, ড্র

ঢ | ɖʱ | ঢ্য, ণ্ঢ

চ | tʃ | চ্চ, চ্ছ, চ্ছ্র, চ্ঞ, চ্য, ঞ্চ, শ্চ

(A+চ)


ছ | tʃʰ | ছ্র, চ্ছ, ঞ্ছ, শ্ছ

$ (A+ছ)

জ | dʒ | জ্জ, জ্জ্ব, জ্ঝ, জ্ঞ, জ্য, জ্র, ঞ্জ

(A+জ)

ঝ | dʒʱ | (not privileged enough to have clusters as a first member) জ্ঝ, ঞ্ঝ

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ | kʰ | (not privileged enough to have clusters as a first member) ঙ্খ


গ | g | গ্গ, গ্দ, গ্ধ, গ্ন, গ্ব, গ্ম, গ্য, গ্র, গ্ল, ঙ্গ, র্ঙ্গ

( (+ধ) (H+গ) K (L+H+গ)

ঘ | gʱ | ঘ্ন, ঘ্য, ঘ্র, ঙ্ঘ

ঞ | This letter does not have any particular phonetic value, but is mostly pronounced as n | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ, জ্ঞ

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ | n | ণ্ট, ণ্ঠ, ণ্ড, ণ্ড্র, ণ্ঢ, ণ্ণ, ণ্য, ণ্ব, ক্ষ্ণ, ষ্ণ, হ্ণ

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং | ŋ | ঙ্ক, ঙ্ক্র, ঙ্খ, ঙ্গ, ঙ্ঘ, ঙ্ক্ষ (in some contexts ঙ্ is replaced by ং, as in কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ | ʃ | শ্চ, শ্ছ, শ্ন, শ্ম, শ্র, শ্ল, শ্য

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় | ɽʱ | (not privileged enough to have clusters)

য | dʒ | The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word it is the vowel æ (as in শ্যামল, র‍্যাপার etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য, স্য, র‍্য [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala + ā-kār, constitute a vowel sign representing the vowel অ্যা (æ)]


র | r | Two manifestations:
i. রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র্ + ধ্ + ব) (earlier র্দ্ধ্ব = র্ + দ্ + ধ্ + ব, a four-term cluster) etc. (placed over the following consonant)
ii. র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster, e.g. প্র, ক্র etc. (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | অঃ, কঃ

অব


The inscriptional evidence in Brāhmī is found in Archaic Brāhmī from the 3rd century BC to the 1st century BC, in Middle Brāhmī soon after (1st-3rd century AD), and then in Late Brāhmī (4th-6th century AD). This evidence can be seen in both Bangladesh and West Bengal [108] in (1) the Mahasthanagara inscriptions (Bogra district, Bangladesh; the ancient name being Pundranagara or Paundravardhanapura); (2) Brāhmī (and Kharoṣṭhī) inscriptions from lower 'Gangetic Bengal'; and (3) copper plate inscriptions of the Imperial Guptas from the northern part of West Bengal and north-west Bangladesh, in the areas under Dharmaditya, Gopachandra and Samācāradeva (about whom one knows only from five copper plates found in Kotalipara in the Faridpur district in Bangladesh, one in Mallasarul in the Burdwan district (West Bengal) and one in Jayramapura (Ballesvara district, now in Odisha)). These epigraphs from the eastern part of undivided India (dating back to the 4th-6th centuries AD) show some characteristic features of letters (especially in ম 'ma', ল 'la', শ 'śa', স 'sa' and হ 'ha') which led to the development of an eastern variety of the Gupta script. Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī. In this context, note the Tippera copper plate inscription of the 'Samatata' rulers (139, p. 265) such as Lokanātha (dated to the latter half of the 7th century AD), the Kailan inscription of Śrīdharaṇa Rāta, as well as the Ashrafpur copper plates. The letters seem to hang down from wedge-shaped solid triangles, with right-hand verticals bending down at the bottom, because of which the script was described by Prinsep and Fleet as Kuṭila-lipi (literally 'cursive writing style'), whereas the term Siddhamātrikā (as a mātrā or bar is placed over each of the letters) was used by Al Biruni (973-1048) to designate the script of northern India. The next stage of development is illustrated by the 9th-century copper plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr and Nalanda of the time of Devapāla in Bihar, and from Jagjīvanpura (Malda) of the reign of Mahendrapāla. The Siddhamātrikā (mentioned as 'Siddham' in Chinese sources) is said to have been prevalent in this region as well, up to the end of the tenth century. Also called the Gauri (i.e. Gaudi) in Pūrvadeśa or the eastern country, it was regarded as the same script to which the appellation Proto-Bangla is given, with its characteristics in rudimentary form in the period between AD 875 and AD 1025. In some epigraphs it is considered as belonging to the second quarter of the eleventh century AD. Flattening of head-marks becomes prominent in comparison to the wedge-shaped serifs. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika in the last quarter of the eleventh century AD. It is the earliest document from this entire region which bears the letter m with a tick rising upwards. The full vowel i develops a tick at the right end of the upper horizontal bar above and a curved hook below. Initial e approaches the modern Bangla character. A mature form of Proto-Bangla, the immediate precursor of

the Bangla script, is illustrated in the inscriptions of the Varman, Sena and Deva rulers of the twelfth and thirteenth centuries [104]. The evolution of the Bangla script (cf. 136) is aligned with the story of the advancement of printing technology. The first "movable type" script was technically created and used for printing Nathaniel Brassey Halhed's (1751-1830) 1778 book titled A Grammar of the Bengal Language. In 1785 Governor-General Warren Hastings (1732-1818) requested another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after. It is generally agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge to Pancanana Karmakara (d. 1804), a renowned artist in Bengal. Later it was Karmakara and his family who became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in the design of the script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th century. The script was standardized by Pandit Ishwar Chandra Vidyasagar when the Bangla type fonts were to be used for publishing on a large scale under the Calcutta School Book Society [116 for several references]. Much later, in 1935, the Linotype technique, invented by Ottmar Mergenthaler (1854-1899) in 1886, was introduced into Bangla printing through the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya, and began to be used by the Ānandabazar Patrika group, later followed by others. Within a few years the more advanced Monotype technology came to be used in Bangla printing. However, in Bangla printing culture Monotype had very limited acceptance, and Linotype held the stage until digital technology eventually came in to replace all earlier techniques. All these developments can be summarized in a table:

PERIOD | DESCRIPTION | NAMES

3rd century BC | Use of the Brāhmī and Kharoṣṭhī scripts begins in the subcontinent. Brāhmī was widely used during the reign of the Mauryan King Aśoka. In one theory Brāhmī is based on a North Semitic alphabet, suitably modified to fit the needs of local languages; it is currently believed to have been an independent development. | Brāhmī

1st-3rd century AD | The Kusana script, named after the Kusana royal dynasty. | Kusana script

4th-5th century AD | The next stage of its evolution was into the Gupta script, named after the Gupta royal dynasty. | Gupta script

7th century AD | Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī, giving rise to the Kuṭila-lipi. | Kuṭila-lipi

8th century AD | Some copper plate inscriptions are found in Khalimpur, Bangladesh, of the reign of Dharmapāla; from Monghyr and Nālandā in Bihar, of the time of Devapāla; and from Jagjīvanapura in West Bengal, of the reign of Mahendrapāla. | Siddhamātrikā

9th century AD until 1025 AD | Proto-Bangla characteristics develop in rudimentary forms. An important landmark in the development of the Bangla script is the Ramaganja copper plate inscription of Mahāmāṇḍalika, dating from the last quarter of the eleventh century AD. | Proto-Bangla script & language

12th-13th century AD | A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is found in the inscriptions of the Varman, Sena and Deva rulers of the twelfth and thirteenth centuries. | Matured Proto-Bangla

14th-15th century AD | The characteristics of the typical Bangla script began to develop, as can be seen in the copper plate inscription of Vijayamāṇikya-I of Tripura dated 1478 AD, which also illustrates forms of Bangla letters in the fifteenth century AD. | Modern Bangla script era begins (see Ross 1999)

16th-17th century AD | The chart of the Bangla alphabet appended to China Monuments (published in Amsterdam in 1667) and The Code of Gentoo Law (published in London in 1776) both show the Bangla alphabet: 16 vowel letters, including the long 'ৡ' 'ḹ', Anusvāra and Visarga, and 34 consonants. | Printed charts of Bangla

18th-19th century AD | Charles Wilkins develops printing in Bangla in 1778, and Vidyasagar reforms it. | Bangla type fonts

Table 2 – Development of the Bangla Writing System

The overall development of the Bangla script from the Kuṭila-lipi period to modern Bangla can be seen here in Table 3 ([102 and 146], and also see the web page in 147).

Table 3 – Bangla Script in Different Centuries

3.2 Languages Considered
Below is the tabular representation of the languages using the Bangla script that are placed on the EGIDS scale 1-6 (see 117 for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some had used the Bangla script earlier (such as Bodo) or used it in West Bengal at some point of time (Santali), but have later shifted to another writing system. Bodo is now written in Nāgarī or Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol Chiki (145). For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:

EGIDS Scale 1

EGIDS Scale 2

EGIDS Scale 3

EGIDS Scale 4

EGIDS Scale 5

EGIDS Scale 6

Bangla (Bengali)

Santali, Bodo, Riang, Khumi, Mru(ng), Asho

Lepcha, Pnar, Koda, Kora, Chak

Asamiyā (Assamese)

Koch or Rajabansi

Malto or Malpahariya

Manipuri or Meitei

Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)

Chakma, Hajong, Mundari & Kurux (of Bangladesh)

Toto, Rohingya, Tippera, Megam, Tanchangya

Usoi, Limbu, Sadri or Oraon

Bhumij or Mundari, Bawm Chin

Table 4 – Main languages in India and Bangladesh that use the Bangla script, on the EGIDS scale

3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written and how typesetting in Bangla can be done. This section is followed by a section that explains the code points (and fixed code-point sequences) which show certain distinctive characteristics of Bangla and which make up the repertoire. The next sections will also cover the 'akshar'-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and its pronunciation. The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to contain an accompanying 'inherent' vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].

Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before, after, or both before and after the consonant they follow in pronunciation [105].

All Bangla consonants, when pronounced in isolation, are uttered with an inherent vowel -ɔ; hence ক 'k', খ 'kh' or গ 'g' are usually pronounced as [kɔ], [khɔ] or [gɔ] etc. Phonologically, the Bangla vowel -ɔ corresponds to the Hindi schwa ə.

When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক্ (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ্ (ñ) + জ (j) = ঞ্জ (ñj), জ্ (j) + ঞ (ñ) = জ্ঞ (jñ), হ্ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g. প্ (p) + র (r) = প্র (pr), ষ্ (ṣ) + ট্ (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra "camel"). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word it is the vowel æ (as in শ্যামল (syamala) "green", র‍্যাপার (ryapara) "wrapper" etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry). In the case of দ্ (d) + ধ (dh), গ্ (g) + ধ (dh) and ন্ (n) + ধ (dh), the shape of the second member is changed, giving দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র্ (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta "southwest"), used mostly in classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.
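In Unicode text, every conjunct described above is stored simply as consonants joined by Hasanta (U+09CD); the special conjunct shape is chosen by the font. The two renderings of র + য are distinguished by a ZWJ after র, following our reading of the Bengali guidance in the Unicode Standard. The helper names below are hypothetical; this is a minimal sketch of the encoding, not of rendering.

```python
HASANTA = "\u09CD"   # ্ BENGALI SIGN VIRAMA
ZWJ = "\u200D"       # ZERO WIDTH JOINER
RA, YA = "\u09B0", "\u09AF"


def conjunct(*consonants: str) -> str:
    """Join consonants with hasanta, e.g. ক + ষ -> ক্ষ (drawn by the font as one glyph)."""
    return HASANTA.join(consonants)


def ra_ya(yaphala_form: bool) -> str:
    """র + য: reph form র্য by default, or ra with ya-phala র‍্য when a ZWJ follows র."""
    return RA + (ZWJ if yaphala_form else "") + HASANTA + YA


# ক্ষ and ষ্ট্র as stored code point sequences
print([f"U+{ord(c):04X}" for c in conjunct("\u0995", "\u09B7")])
print([f"U+{ord(c):04X}" for c in conjunct("\u09B7", "\u099F", "\u09B0")])
```

Because the special shapes of ক্ষ, জ্ঞ or দ্ধ live in the font, an LGR only has to constrain these underlying sequences.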

In consonant clusters many consonants take a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st) etc. Apart from its full shape, র has two allographs: one is 'repha', as found in র্ক (rk) and র্প (rp), and the other is ra-phala, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another case, where the cerebral nasal consonant sign takes an unusual shape [151].

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) [150] or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].

As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called 'allographs', with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113]:

1) Below (e.g. কু (ku), ন্ত (nta), হ্র (hra) etc.)

2) Above (e.g. চঁ (cã), র্ক (rka) etc.)

3) Right side (e.g. কা (kā), কং (kaṅ) etc.)

4) Left side (e.g. কে (ke))

5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki) etc.)

6) Right side and above simultaneously (e.g. কী (kī))

7) Right side and left side simultaneously (e.g. কো (ko))

8) Right side, left side and above simultaneously (e.g. কৌ (kau))
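These positions describe how the marks are drawn, not how they are stored. Unicode keeps Bangla text in logical (phonetic) order, so the left-side e-kār of কে is stored after ক, and the two-sided signs in positions 7 and 8 are canonically decomposable into a left part and a right part. The sketch below uses only Python's standard `unicodedata` module; the helper name `codepoints` is hypothetical.

```python
import unicodedata


def codepoints(s: str) -> list[str]:
    """List the code points of a string as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]


# কে (ke): the e-kar is drawn on the LEFT but stored AFTER the consonant.
print(codepoints("\u0995\u09C7"))                           # ['U+0995', 'U+09C7']
# ো (position 7) decomposes into left e-kar + right aa-kar.
print(codepoints(unicodedata.normalize("NFD", "\u09CB")))   # ['U+09C7', 'U+09BE']
# ৌ (position 8) decomposes into left e-kar + the AU length mark ৗ.
print(codepoints(unicodedata.normalize("NFD", "\u09CC")))   # ['U+09C7', 'U+09D7']
```

This canonical equivalence is one reason label processing normalizes input before the repertoire and WLE rules are applied.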

As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel matras, which are relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called 'kārs' in Bangla (also referred to as matras in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,

ই U+0987 BENGALI LETTER I is substituted by pre-posed ি U+09BF BENGALI VOWEL SIGN I,

ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or

উ U+0989 BENGALI LETTER U is substituted by ◌ু U+09C1 BENGALI VOWEL SIGN U by marking below the primary grapheme,

there are some special vowel modifiers of উ, as in the following combined letters:

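a special ligature for গু (gu), rather than writing it as গ (g) + ◌ু (u)

a special ligature for রু (ru), rather than writing it as র (r) + ◌ু (u)

a special ligature for শু (śu), rather than writing it as শ (ś) + ◌ু (u)

a special ligature for হু (hu), rather than writing it as হ (h) + ◌ু (u)

a special ligature for ন্তু (ntu), rather than writing it as ন্ (n) + ত (t) + ◌ু (u)

Similarly, there can be vowel modifiers of ঊ or '(long) ū' as well, e.g.

ভ (bh) + র (r) + ◌ূ (in ভ্রূ bhrū "eyebrow"), শ (ś) + র (r) + ◌ূ (in শ্রূ śrū), ঋ (ṛ) after হ (h) (in হৃ hṛ) etc.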

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These include an attempt to reduce the number of allographs of both vowels and consonants in clusters, which has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, exist side by side, operative in different domains.

However, in preparing this LGR document the aim has been to consider the widely used and usable sequences and combinations and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatiya Shiksakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012⁶. After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet/script and the spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and broad agreement was reached on the 'homogenization of Bangla spelling' by 1988. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. This was published as a 'Spelling Dictionary' titled Ākādemi Bānāna Abhidhāna (1997), which was considerably more comprehensive than 'The University of Calcutta proposals' made in 1936. Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha ('transparent') and Aswaccha ('opaque' or non-transparent), even adding Ardha Swaccha ('half transparent') in between the two. Some sample examples are:

6 Bangla Academy 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.

Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.

Opaque, where neither of the two can be (easily) recognized: ক্ষ kṣ (ক্ k + ষ ṣ), জ্ঞ jñ (জ্ j + ঞ ñ), ঙ্গ ṅg (ঙ্ ṅ + গ g), হ্ম hm (হ্ h + ম m).

Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In the case of three-term clusters at least one symbol will not be transparent, e.g. স্ত্র str (স্ s + ত্ t + র r), ষ্ট্র ṣṭr (ষ্ ṣ + ট্ ṭ + র r) etc.

Therewere in fact two types of proposals One concerned the shape of the lettersthose of consonant + vowel (CV) combinations and conjuncts which is consonant +consonantcombinationsTherewerefurthercomplexshapesiethoseofconsonant+consonant+ (consonant+) vowel (CC(CV) signs as in y (pru) or z (skru) SomedecisionsinthisareawerenecessarybecauseafewoftheCC(C)symbolsrepresentedcomplexitiesthatmadelearningthemdifficultforthechildrenTheotherdealtwiththespellings ofwords onlywithout any reference to the shapes of letters inwhich theywere written The basic objective here was lsquoone word one spellingrsquo to the greatestextentthatwaspossible[151]

Below we place a statement of the most salient changes that affect the consonant + vowel combinations [153]:

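a. The variants of the short u (হ্রস্ব উ-কার hrasva u-kāra) vowel sign have been brought down to one, i.e. ◌ু. So the traditional ligature for গু (gu) is now written transparently as গ + ◌ু. Similarly রু (ru) > র + ◌ু, শু (śu) > শ + ◌ু, হু (hu) > হ + ◌ু, and therefore, for a cluster + short u sign, ন্তু (ntu) > ন্ত + ◌ু (ন + ্ + ত + উ-কার) and স্তু (stu) > স্ত + ◌ু (স + ্ + ত + উ-কার).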

b. The variants of long u (দীর্ঘ ঊ-কার dīrgha ū-kāra) have also been reduced: রূ (rū) > র + ◌ূ, ভ্রূ (bhrū) > ভ্র + ◌ূ (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) > দ্র + ◌ূ (দ d + ্ + র r + ঊ ū), শ্রূ (śrū) > শ্র + ◌ূ (শ ś + ্ + র r + ঊ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + ◌ৃ.

Regarding consonant + consonant + (consonant) … + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it, while the Bangla Akademi innovation follows) [153]:

ব্ধ bdh (ব্ b + ধ dh), দ্ধ ddh (দ্ d + ধ dh), ন্থ nth (ন্ n + থ th), ঞ্চ ñc (ঞ্ ñ + চ c), ঞ্ছ ñch (ঞ্ ñ + ছ ch), ঞ্জ ñj (ঞ্ ñ + জ j), ক্ত kt (ক্ k + ত t), ক্র kr (ক্ k + র r), গ্ধ gdh (গ্ g + ধ dh), ঙ্ক ṅk (ঙ্ ṅ + ক k), ঙ্গ ṅg (ঙ্ ṅ + গ g), ষ্ণ ṣṇ (ষ্ ṣ + ণ ṇ), ন্ধ্র ndhr (ন্ n + ধ্ dh + র r), ণ্ড্র ṇḍr (ণ্ ṇ + ড্ ḍ + র r), ক্ত্র ktr (ক্ k + ত্ t + র r)

3.3.1 The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'vargas' (pronounced as 'barga' in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified as per their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated to voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:

'Varga' or Sets | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | ক 'K' U+0995 | খ 'KH' U+0996 | গ 'G' U+0997 | ঘ 'GH' U+0998 | ঙ 'NG' U+0999
Palatal | চ 'C' U+099A | ছ 'CH' U+099B | জ 'J' U+099C | ঝ 'JH' U+099D | ঞ 'NY' U+099E
Retroflex | ট 'TT' U+099F | ঠ 'TTH' U+09A0 | ড 'DD' U+09A1 | ঢ 'DDH' U+09A2 | ণ 'NN' U+09A3
Dental | ত 'T' U+09A4 | থ 'TH' U+09A5 | দ 'D' U+09A6 | ধ 'DH' U+09A7 | ন 'N' U+09A8
Bilabial | প 'P' U+09AA | ফ 'PH' U+09AB | ব 'B' U+09AC | ভ 'BH' U+09AD | ম 'M' U+09AE

Table 5 – Varga classification of Bangla consonants

(Falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasals, called the five 'vargas')

Non-Varga | য 'Y' U+09AF, য় 'YY' U+09DF, র 'R' U+09B0, ড় 'RR' U+09DC, ঢ় 'RH' U+09DD, ল 'L' U+09B2, শ 'SH' U+09B6, ষ 'SS' U+09B7, স 'S' U+09B8, হ 'H' U+09B9

Table 6 – Non-Varga consonants (not falling into any of the five categories)

3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts), or the term 'Virāma'⁷ (= 'Dari' in Bangla) as preferred in UNICODE (cf. Unicode 3.0 and above), are used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in UNICODE in a sense that differs from its traditional grammatical definition, and hence it requires some explanation here. Given the importance of the document, this note should be part of this LGR document so that anybody referring to it can know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, "kill" its vowel and create conjuncts. In this manner conjunct characters can generally be written by joining two to four consonants. In rare cases this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara⁸ has to be bounded empirically. This is an observation based on the CIIL-EMILLE corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that one may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work this limit will not be enforced.
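Since the limit is not enforced, the cluster length is only an empirical measure. A sketch of how one might measure it on corpus words or test labels follows: count the longest run of consonant (+ Hasanta + consonant)*. The consonant test is a simplification (it accepts the whole range U+0995 to U+09B9 plus ড় ঢ় য় ৎ, including a few unassigned code points that never occur in valid text), ZWJ/ZWNJ are skipped as transparent, and all names are hypothetical.

```python
HASANTA = "\u09CD"
JOINERS = "\u200C\u200D"  # ZWNJ, ZWJ: treated as transparent in this sketch


def is_consonant(ch: str) -> bool:
    """Simplified test for a Bengali consonant letter (ka..ha, ড় ঢ় য়, ৎ)."""
    return 0x0995 <= ord(ch) <= 0x09B9 or ch in "\u09DC\u09DD\u09DF\u09CE"


def longest_cluster(label: str) -> int:
    """Length of the longest consonant + (hasanta + consonant)* run in `label`."""
    best = run = 0
    prev = ""
    for ch in label:
        if ch in JOINERS:
            continue                      # do not let a joiner break the cluster
        if is_consonant(ch):
            # extend the run only if this consonant follows a hasanta
            run = run + 1 if prev == HASANTA else 1
            best = max(best, run)
        elif ch != HASANTA:
            run = 0                       # a vowel sign or other mark ends the run
        prev = ch
    return best
```

For instance, স্ত্র (s + t + r) gives 3 and কা gives 1; running this over a word list would reproduce the kind of empirical maximum discussed above without turning it into a rule.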

7 Virāma as used here is also a misnomer according to the Indian grammatical traditions: nowhere is the mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means 'syllable' in Indian grammatical traditions.

3.3.3 Vowels

Separate symbols exist for all 'Swara' or vowels in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called 'kār' in Bangla or 'Mātrā' in Nāgarī9 is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

Vowel | Corresponding vowel sign (kāra / Mātrā)

অ 'A' U+0985 | -
আ 'AA' U+0986 | া U+09BE
ই 'I' U+0987 | ি U+09BF
ঈ 'II' U+0988 | ী U+09C0
উ 'U' U+0989 | ু U+09C1
ঊ 'UU' U+098A | ূ U+09C2
ঋ Vocalic 'R' U+098B | ৃ U+09C3
ৠ Vocalic 'RR' U+09E0 | ৄ U+09C4
ঌ Vocalic 'L' U+098C | ৢ U+09E2
ৡ Vocalic 'LL' U+09E1 | ৣ U+09E3
এ 'E' U+098F | ে U+09C7
ঐ 'AI' U+0990 | ৈ U+09C8
ও 'O' U+0993 | ো U+09CB
ঔ 'AU' U+0994 | ৌ U+09CC
- | ৗ U+09D7 (AU length mark)
Can appear on top of অ 'A' U+0985 or any other vowel | ঁ U+0981 Candrabindu
Can appear after অ 'A' U+0985 or any other vowel | ং U+0982 Anusvāra
Can appear after অ 'A' U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kāras

9 The term 'Mātrā' in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
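The vowel-to-kāra correspondence in Table 7 follows a regular pattern in the Unicode character names (BENGALI LETTER X pairs with BENGALI VOWEL SIGN X). The following Python sketch relies only on that naming pattern and the standard `unicodedata` module; the function name `kara_for` is hypothetical. It returns `None` for অ, which has no kāra because it is the inherent vowel.

```python
import unicodedata


def kara_for(vowel: str):
    """Return the dependent vowel sign (kāra) paired with an independent vowel.

    Relies on the naming pattern BENGALI LETTER X / BENGALI VOWEL SIGN X.
    Returns None when no such sign exists (the inherent vowel অ).
    """
    name = unicodedata.name(vowel)                 # e.g. "BENGALI LETTER AA"
    suffix = name.removeprefix("BENGALI LETTER ")  # e.g. "AA"
    try:
        return unicodedata.lookup("BENGALI VOWEL SIGN " + suffix)
    except KeyError:
        return None


for v in "\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u09E0\u098F\u0990\u0993\u0994":
    sign = kara_for(v)
    print(f"U+{ord(v):04X}", "-" if sign is None else f"U+{ord(sign):04X}")
```

`str.removeprefix` requires Python 3.9 or later.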

3.3.4 The Anusvāra, onuʃʃār (ং - U+0982)

The Anusvāra or onuʃʃār in Bangla at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the velar varga (set), as in লংকা. But it also often appears in such combinations when a non-velar is the last member of the combination, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined written form using the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.

3.3.5 Nasalization: Candrabindu (ঁ - U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
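Although the Candrabindu is drawn on top of the letter, it is stored after the vowel sign in logical order. A minimal Python sketch, using only the standard `unicodedata` module (the helper name `code_points` is hypothetical), lists the stored sequence for চাঁদ:

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List the code points of a string in stored (logical) order."""
    return [f"U+{ord(ch):04X}" for ch in text]


moon = "\u099A\u09BE\u0981\u09A6"  # চাঁদ 'moon'
for cp, ch in zip(code_points(moon), moon):
    print(cp, unicodedata.name(ch))
```

The output places U+0981 BENGALI SIGN CANDRABINDU third, after U+09BE, matching the sequence given in the text.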

3.3.6 Nukta (় - U+09BC)

The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC), instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of 'Nukta' is otherwise completely unnatural in Bangla.

It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic code points (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each typed with a single keystroke), the IDNA protocol requires that they be represented as decomposed sequences of base consonant + Nukta. It may be noted that there could be sporadic attempts or cases of writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the sanctity of the loanword. Such usage amounts to making the Bangla writing system work like the IPA script; it is, however, not in use in printed Bangla.
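The effect of the composition exclusions described above can be checked directly with Python's standard `unicodedata.normalize`. This sketch only demonstrates Unicode normalization behaviour; it is not taken from the LGR tooling.

```python
import unicodedata

# Atomic code points that are composition exclusions in Unicode.
ATOMIC = {"\u09DC": "RRA", "\u09DD": "RHA", "\u09DF": "YYA"}

for ch, label in ATOMIC.items():
    nfc = unicodedata.normalize("NFC", ch)
    # Each atomic letter comes out of NFC as base consonant + U+09BC NUKTA.
    print(label, f"U+{ord(ch):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))

# The decomposed sequence is already in NFC and is left unchanged.
print(unicodedata.normalize("NFC", "\u09A1\u09BC") == "\u09A1\u09BC")  # True
```

Because NFC never recomposes these pairs, a root-zone label can only ever contain the two-code-point form.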

3.3.7 Visarga biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho 'sorrow, unhappiness' (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো'পরাণি), and it is now rarely used, even in other languages using the Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha ঽ (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)

This note is pertinent to the use of the Zero Width Joiner (ZWJ) and the Zero Width Non-Joiner (ZWNJ) in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has the option between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) "Satyajit" and সৎ (sat) "honest". This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending upon the context. E.g. র + ্ + য has two representations in Bangla: as র্য and as র‍্য. To get the form র্য one has to type র + ্ + য, but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words demanding a ya-phalā after ra, which otherwise cannot be typed (rendered), because the plain order ra + hasanta + antastha ja in medial and/or final position already has another use: ra + hasanta + antastha ja types a repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ in Bangla is used to represent an explicit Hasanta (Halant). In order to avoid conjunct formation in cases where there is an explicit hasanta before the succeeding consonant, the ZWNJ is used:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The use of ZWJ and ZWNJ has been ruled out from the root zone by the [Procedure]. Since they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP.

The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
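A short Python sketch illustrates the consequence of excluding ZWJ and ZWNJ from the root zone: once the joiner is removed, the ra + ya-phalā spelling and the repha-on-ya spelling become the same code point sequence, so the visual distinction cannot be expressed in a root-zone label. The helper names `contains_joiner` and `strip_joiners` are hypothetical and only illustrate the idea.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def contains_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ (excluded from the root zone)."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label: str) -> str:
    """Remove ZWJ/ZWNJ to show what a label collapses to without them."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


ra_yaphala = "\u09B0" + ZWJ + "\u09CD\u09AF"  # ra + ZWJ + hasanta + ya
repha_ya = "\u09B0\u09CD\u09AF"               # ra + hasanta + ya (repha on ya)
prakkathan = "\u09AA\u09CD\u09B0\u09BE\u0995\u09CD" + ZWNJ + "\u0995\u09A5\u09A8"

print(contains_joiner(ra_yaphala), contains_joiner(repha_ya))
print(strip_joiners(ra_yaphala) == repha_ya)
```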

3.3.9 Use of Ya-phalā + ā-kāra Sequences

There are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

For rendering a ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But, as seen here, Bangla ends up with a deviant orthographic feature wherein Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
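Because a Hasanta after a vowel is otherwise impossible in the script, a label checker could accept it only inside the two sequences listed above. The following Python sketch is a hypothetical check written for illustration, not the WLE rule from the LGR XML; it assumes the independent vowels of the repertoire (Table 8, rows 4 to 14).

```python
HASANTA = "\u09CD"
# Independent vowels in the repertoire (assumption: Table 8, rows 4-14).
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
# The only permitted vowel + Hasanta contexts: অ্যা and এ্যা.
ALLOWED = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")


def vowel_hasanta_ok(label: str) -> bool:
    """Reject a Hasanta after an independent vowel unless it starts অ্যা or এ্যা."""
    for i, ch in enumerate(label):
        if ch == HASANTA and i > 0 and label[i - 1] in VOWELS:
            # The four characters starting at the vowel must form a whole sequence.
            if label[i - 1:i + 3] not in ALLOWED:
                return False
    return True


print(vowel_hasanta_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
```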

3.3.10 Formation of Ra-phalā and Repha Sequences

This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA, Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive to class these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the two RAs is only distinguishable if one looks into their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that follow. Moreover, it is noteworthy that the REPHA can also occur with KHANDA TA. The condition in this context of KHANDA TA is that C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
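The variant relation between the two RAs can be sketched as a simple label-expansion function. This is a hypothetical illustration, not the mechanism of the LGR XML (which expresses variants declaratively): it assumes that an RA is swappable whenever it is directly followed by a Hasanta (repha) or directly preceded by one (ra-phalā), and it enumerates every combination of swaps.

```python
from itertools import product

B_RA, A_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"
SWAP = {B_RA: A_RA, A_RA: B_RA}


def ra_variants(label: str) -> set[str]:
    """Labels obtained by swapping র/ৰ wherever an RA touches a Hasanta."""
    slots = [
        i for i, ch in enumerate(label)
        if ch in SWAP and (
            (i + 1 < len(label) and label[i + 1] == HASANTA)  # repha: RA + H
            or (i > 0 and label[i - 1] == HASANTA)             # ra-phalā: H + RA
        )
    ]
    variants = set()
    for flips in product((False, True), repeat=len(slots)):
        chars = list(label)
        for pos, flip in zip(slots, flips):
            if flip:
                chars[pos] = SWAP[chars[pos]]
        variants.add("".join(chars))
    variants.discard(label)  # the original label is not its own variant
    return variants


print(ra_variants("\u0985\u09B0\u09CD\u0995"))  # variants of অর্ক
```

Under blocking semantics, registering any one of these labels would block the others.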

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.

4.1 Guiding Principles

The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous Use

Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope

The code point repertoire for the root zone being a very special case at the top of the protocol hierarchy, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart: Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol: Unicode being the character-encoding standard that provides the maximum possible representation of a given script/language, it has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire (MSR): The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on the set of characters allowed by IDNA.
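The three successive filters listed above can be modelled as set operations. The Python sketch below is illustrative only: the function names are hypothetical, and the two exclusion sets are small hand-picked subsets (the authoritative data are the IDNA2008 derived property table of RFC 5892 and the published MSR), so it should not be used to decide actual eligibility.

```python
import unicodedata


def in_bengali_block(ch: str) -> bool:
    """Filter 1: encoded (named) characters in the Bengali block U+0980..U+09FF."""
    return 0x0980 <= ord(ch) <= 0x09FF and unicodedata.name(ch, None) is not None


# Filter 2 (illustrative subset only): Bengali code points that are not PVALID
# under IDNA2008, e.g. the atomic nukta letters and symbol characters.
IDNA2008_NOT_PVALID = {"\u09DC", "\u09DD", "\u09DF", "\u09F9", "\u09FA"}

# Filter 3 (illustrative subset only): PVALID code points excluded by the MSR.
MSR_EXCLUDED = {"\u09BD"}  # BENGALI SIGN AVAGRAHA


def root_zone_candidates(wanted: set[str]) -> set[str]:
    """Apply the Unicode, IDNA and MSR filters in that order."""
    step1 = {ch for ch in wanted if in_bengali_block(ch)}
    step2 = step1 - IDNA2008_NOT_PVALID
    return step2 - MSR_EXCLUDED


print(root_zone_candidates({"\u0995", "\u09BD", "\u09DC", "\u09FA", "\u0915"}))
```

Each stage can only remove characters, which mirrors the narrowing described in the text.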

Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks

The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations

Abbreviations, weights and measures and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.

4.2.1.3 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, such as Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), as well as the allographic kāra forms of the latter two: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF

The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).

5 Repertoire

The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the code point repertoire under Table 8 is valid for all three languages above.

For each of the code points, language references are given in the last column, titled 'References', of Table 8, the "Code Point Repertoire". For complete coverage of the Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under the Bangla Unicode code points (as given in Unicode 6.3). However, before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many available fonts and are not prescriptive, because there can be some variation across actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:

Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
• All characters that are included in the [MSR]: yellow background
• PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
• Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block as in Figure 1, of the code points included in the MSR, the following symbols will need separate treatment:
• ৎ U+09CE Bangla Letter Khanda-Ta
• ৰ U+09F0 Asamiyā-Bangla Letter Ra with Middle Diagonal
• ৱ U+09F1 Asamiyā-Bangla Letter Ra with Lower Diagonal

5.1 Code Point Repertoire: Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122][125]

Table 8: Bangla Code Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences, which enable the conditional inclusion of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, in the repertoire, so as to represent the æ sound as in English 'bat', 'cat', etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire: Exclusion

There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used only in sequences, in the normalized forms of the three special characters ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for Exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points
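The rule that the Nukta never stands alone can be expressed as a simple context check: every U+09BC must directly follow one of the three permitted bases. This Python sketch is a hypothetical stand-in for the corresponding rule in the LGR XML, written only to illustrate the condition.

```python
NUKTA = "\u09BC"
# Bases that may carry the Nukta in this LGR: ড (DDA), ঢ (DDHA), য (YA).
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}


def nukta_ok(label: str) -> bool:
    """Every Nukta must directly follow one of the three permitted bases."""
    return all(
        i > 0 and label[i - 1] in NUKTA_BASES
        for i, ch in enumerate(label)
        if ch == NUKTA
    )


print(nukta_ok("\u09AF\u09BC"), nukta_ok("\u0995\u09BC"))  # য় allowed, ক় rejected
```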

5.4 The Basis of the Present IDN

The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The ABNF will use the following operators:

Sr. No. | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence

To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel (V). It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). The number of D, B or X signs that can follow a V in Bangla is not necessarily restricted to one; however, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. It is valid and possible to have sequences combining a Candrabindu with an Anusvāra on the one hand, and with a Visarga on the other, as in হ্যাঁংচা 'hæṅchā' and হ্যাঁঃ 'hæṅ' respectively: the possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla. A vowel can also optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>; it has the special form ্য. When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ
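The vowel-sequence grammar of 5.4.2 can be transcribed into a regular expression, roughly V [HCM] [B | D | X | DB | DX]. This Python sketch is an illustration under stated assumptions, not the normative ABNF: the character classes are assumptions based on Table 8, and HCM is kept as general as the ABNF states it, even though Table 9 narrows it in practice to ্যা after অ or এ.

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"                # independent vowels
C = "(?:[\u09A1\u09A2\u09AF]\u09BC|[\u0995-\u09B9\u09F0\u09F1])"  # consonants
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                 # kāras
H, B, D, X = "\u09CD", "\u0982", "\u0981", "\u0983"

# Optional tail: B, X, DB, DX or D alone (BD, DD, BB, XX are not allowed).
TAIL = f"(?:{D}?[{B}{X}]|{D})?"
VOWEL_SEQ = re.compile(f"{V}(?:{H}{C}{M})?{TAIL}")


def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQ.fullmatch(s) is not None


print(is_vowel_sequence("\u0985\u0981\u0982"))  # VDB: অঁং
```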

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:

CMB কীং, কুং कीं, कुं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

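5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Examples:

CHC ন্ত → ন + ্ + ত ; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র ; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য ; न्त्र्य → न + ् + त + ् + र + ् + य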

5.4.3.5 Subsets

While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:

CHCM ক্কী → ক ্ ক ী ; क्की → क ् क ी
CHCB ক্কং → ক ্ ক ং ; क्कं → क ् क ं
CHCD ক্কঁ → ক ্ ক ঁ ; क्कँ → क ् क ँ
CHCX ক্কঃ → ক ্ ক ঃ ; क्कः → क ् क ः
CHCDB ক্কঁং → ক ্ ক ঁ ং ; क्कँं → क ् क ँ ं
CHCDX ক্কঁঃ → ক ্ ক ঁ ঃ ; क्कँः → क ् क ँ ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:

CHCMB ক্কীং → ক ্ ক ী ং ; क्कीं → क ् क ी ं
CHCMB ক্কুং → ক ্ ক ু ং ; क्कुं → क ् क ु ं
CHCMD ক্কাঁ → ক ্ ক া ঁ ; क्काँ → क ् क ा ँ
CHCMX ক্কীঃ → ক ্ ক ী ঃ ; क्कीः → क ् क ी ः
CHCMDB ক্কাঁং → ক ্ ক া ঁ ং ; क्काँं → क ् क ा ँ ं
CHCMDX ক্কাঁঃ → ক ্ ক া ঁ ঃ ; क्काँः → क ् क ा ँ ः
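The consonant-sequence rules of 5.4.3 can likewise be written as a regular expression: either a pure consonant C H, or *3(C H) C [M] [B | D | X | DB | DX]. In this Python sketch the character classes are assumptions based on Table 8, and the cluster limit is a parameter (defaulting to the four consonants of 5.4.3.4), since Section 3.3.2 says the LGR itself does not enforce a hard maximum. The function name `consonant_sequence` is hypothetical.

```python
import re

C = "(?:[\u09A1\u09A2\u09AF]\u09BC|[\u0995-\u09B9\u09F0\u09F1])"  # consonants
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                 # kāras
H, B, D, X = "\u09CD", "\u0982", "\u0981", "\u0983"
TAIL = f"(?:{D}?[{B}{X}]|{D})?"  # B, X, DB, DX or D alone


def consonant_sequence(max_consonants: int = 4):
    """Compile CH | (CH){0,n-1} C [M] [tail] for clusters of up to n consonants."""
    joins = max_consonants - 1
    return re.compile(f"{C}{H}|(?:{C}{H}){{0,{joins}}}{C}{M}?{TAIL}")


pattern = consonant_sequence()
print(pattern.fullmatch("\u0995\u09CD\u0995\u09C0\u0982") is not None)  # CHCMB
```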

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination10

A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.

Note: The condition in this context of KHANDA TA is that C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
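The [CH]Z rule, with C restricted to the two RAs as stated in the note above, is small enough to express as one regular expression. This Python sketch is illustrative only; the constant name `KHANDA_TA_SEQ` is hypothetical.

```python
import re

# [CH]Z with C restricted to র (U+09B0) or ৰ (U+09F0) and Z = ৎ (U+09CE).
KHANDA_TA_SEQ = re.compile("(?:[\u09B0\u09F0]\u09CD)?\u09CE")


def is_khanda_ta_sequence(s: str) -> bool:
    return KHANDA_TA_SEQ.fullmatch(s) is not None


print(is_khanda_ta_sequence("\u09B0\u09CD\u09CE"))  # র্ৎ as in ভর্ৎসনা
```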

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering a ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But, as seen here, Bangla ends up with a deviant orthographic feature wherein Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, P, refers to the formation of repha and ra-phalā in the script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra "cycle"). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.

6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akṣara formation. They can cause confusion even to a careful observer and are hence proposed as variants.

Variants arise in a script when two or more identical-looking forms are produced with different stored code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two কো (ko), one typed with a single key and the other typed with two different keys. However, this will not be considered as a case of variants, because a kāra followed by a kāra is not allowed.
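One further observation, offered here as a supplementary illustration rather than as the panel's stated reasoning: since IDNA requires labels in NFC (see Section 3.3.6), the two-key spelling of কো is normalized to the single code point anyway, because U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 U+09BE. The Python sketch below uses only the standard `unicodedata` module.

```python
import unicodedata

KA = "\u0995"
two_keys = KA + "\u09C7\u09BE"  # ক + e-kāra + ā-kāra
one_key = KA + "\u09CB"         # ক + o-kāra (কো)

# U+09CB decomposes canonically to U+09C7 U+09BE, so NFC recomposes the pair.
print(unicodedata.decomposition("\u09CB"))                  # 09C7 09BE
print(unicodedata.normalize("NFC", two_keys) == one_key)    # True
```

The same holds for ৌ (U+09CC), which decomposes to U+09C7 U+09D7.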

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears in a Hasanta conjunct after স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 U+09CD U+09A5) versus

স + Hasanta + হ (U+09B8 U+09CD U+09B9)

2. ন + Hasanta + থ (U+09A8 U+09CD U+09A5) versus

ন + Hasanta + হ (U+09A8 U+09CD U+09B9)


The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly, the user thinking it was a হ (ha, U+09B9), increasing the risk of errors in label writing and identification. Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in paragrantha)
The fonts that represent the traditional Bangla writing system tend to create this problem. Therefore these may be taken as cases of variants in Bangla.
CASE II
Another interesting example of a variant is encountered in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode code points and 'B_RA' as well as 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

Where:

B_RA = U+09B0 BENGALI LETTER RA (র) or A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক

U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: As ক and ঠ are shared by Bangla and Assamese and look exactly the same, the resulting combinations with Repha look identical; the choice of RA in the Repha makes no visible difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

Where C1 = any consonant except Khanda-ta

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ

U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla Repha and Ra-phalā conjunct forms look the same, they could confuse end-users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. The block applies only at the label level; if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
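The blocking behaviour can be sketched as variant-label generation in the spirit of RFC 7940, simplified here to single-code-point variants and no dispositions. The names `VARIANT_SETS` and `variant_labels` are hypothetical. Note that enumerating every combination also yields mixed forms such as "ররৰ", not only the fully swapped "ৰৰৰ" named in the text, while labels of a different length are never produced.

```python
from itertools import product

# The single variant set {র U+09B0, ৰ U+09F0} proposed in CASE II.
VARIANT_SETS = [{"\u09B0", "\u09F0"}]

def variants_of(ch):
    """Return every member of ch's variant set, or just ch itself."""
    for vs in VARIANT_SETS:
        if ch in vs:
            return sorted(vs)
    return [ch]

def variant_labels(label):
    """All labels obtained by swapping characters within their variant sets,
    excluding the original label (these would be blocked)."""
    options = [variants_of(ch) for ch in label]
    return {"".join(p) for p in product(*options)} - {label}

blocked = variant_labels("\u09B0\u09B0\u09B0")   # ররর
print(len(blocked), "\u09F0\u09F0\u09F0" in blocked)
```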

6.2 Cross-Script Variants
A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in domain names. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are listed in the Appendix (10.3) for reference.

11 Unicode uses Oriya for the script although Odia is now the official term used


1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī

ম U+09AE | म U+092E

ি U+09BF | ि U+093F

Table 14 - Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī

ম U+09AE | ਸ U+0A38

ি U+09BF | ਿ U+0A3F

Table 15 - Bangla and Gurmukhī cross-script variant code points
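As a rough illustration of how the pairs in Tables 14 and 15 combine at label level, the hypothetical `cross_script_variant` function below maps a Bangla label into another script only when every code point has a listed counterpart. This is a sketch under that simplifying assumption, not the LGR's actual cross-script procedure.

```python
# Cross-script variant pairs from Tables 14 and 15.
BANGLA_TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম→म, ি→ि
BANGLA_TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম→ਸ, ি→ਿ

def cross_script_variant(label, table):
    """Return the whole-label counterpart in the other script, or None
    if some code point has no listed variant."""
    out = []
    for ch in label:
        if ch not in table:
            return None
        out.append(table[ch])
    return "".join(out)

print(cross_script_variant("\u09AE\u09BF", BANGLA_TO_DEVANAGARI))  # মি → मि
print(cross_script_variant("\u09AE\u09BF", BANGLA_TO_GURMUKHI))    # মি → ਸਿ
print(cross_script_variant("\u0995", BANGLA_TO_DEVANAGARI))        # None
```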

7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs required by all the languages mentioned in section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.


D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ - BENGALI SIGN VIRAMA).

Table 16 - Symbols used in WLE rules
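The two multi-code-point symbols of Table 16 can be located with plain substring matching. The sketch below, with the hypothetical name `find_multi_symbols`, covers only the Ya-phalā form of S and the two forms of P; S1 and S2 from Table 9 are not reproduced in this section and are omitted.

```python
# Symbols S (Ya-phalā form only) and P from Table 16, as code point strings.
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE",   # অ্যা = V1 H C1 M1 with V1 = অ
               "\u098F\u09CD\u09AF\u09BE"}   # এ্যা = V1 H C1 M1 with V1 = এ
P_SEQUENCES = {"\u09B0\u09CD",               # র্ = C2 H with Bangla RA
               "\u09F0\u09CD"}               # ৰ্ = C2 H with Assamese RA

def find_multi_symbols(label):
    """Return (symbol, start index) for each S or P occurrence in label."""
    hits = []
    for i in range(len(label)):
        if label[i:i + 4] in S_SEQUENCES:
            hits.append(("S", i))
        if label[i:i + 2] in P_SEQUENCES:
            hits.append(("P", i))
    return hits

print(find_multi_symbols("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
print(find_multi_symbols("\u0985\u09B0\u09CD\u0995"))                    # অর্ক
```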

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that can theoretically combine two to four consonants and create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].
7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are listed below.


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.

2. H must be preceded by C.
Example: ক্

3. M must be preceded by C.
Example: কা

4. D must be preceded by V, C or M.
Example: আঁ, খঁ, খাঁ, হাঁ

5. X must be preceded by V, C, M or D.
Example: উঃ, খঃ, বাঃ, দঁঃ

6. B must be preceded by V, C, M or D.
Example: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P.
Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
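Rules 1 to 9 can be sketched as a small code point checker. This is an illustration under stated assumptions, not the normative LGR: the repertoire sets below are an assumed subset (the authoritative repertoire is the LGR XML file), labels are NFC-normalized first so that ড়, ঢ়, য় become base + U+09BC NUKTA (the CN forms of rule 1), only the Ya-phalā form of S is handled, and P is recognised only as a predecessor of Z.

```python
import unicodedata

# Illustrative repertoire (assumption): consonants, vowels and kāras.
CONSONANTS = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1)) | {0x09B2}
              | set(range(0x09B6, 0x09BA)) | {0x09F0, 0x09F1})
VOWELS = set(range(0x0985, 0x098D)) | {0x098F, 0x0990, 0x0993, 0x0994, 0x09E0, 0x09E1}
MATRAS = {0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3, 0x09C4,
          0x09C7, 0x09C8, 0x09CB, 0x09CC, 0x09D7, 0x09E2, 0x09E3}
SIGNS = {0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
NUKTA = "\u09BC"
CN_BASES = {0x09A1, 0x09A2, 0x09AF}          # ড ঢ য (NFC forms of ড় ঢ় য়)
RAS = {"\u09B0", "\u09F0"}                   # র ৰ, for P = C2 H
S_SEQS = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা এ্যা

def tokenize(label):
    """NFC-normalise and map each unit to its WLE symbol."""
    s = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(s):
        cp = ord(s[i])
        if cp in CN_BASES and s[i + 1:i + 2] == NUKTA:
            tokens.append(("C", s[i:i + 2]))   # rule 1: CN counts as C
            i += 2
            continue
        if cp in CONSONANTS:
            sym = "C"
        elif cp in VOWELS:
            sym = "V"
        elif cp in MATRAS:
            sym = "M"
        elif cp in SIGNS:
            sym = SIGNS[cp]
        else:
            raise ValueError(f"U+{cp:04X} is outside the illustrative repertoire")
        tokens.append((sym, s[i]))
        i += 1
    return tokens

# Rules 2-7: symbols allowed immediately before each symbol.
ALLOWED_BEFORE = {
    "H": {"C"}, "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}

def violations(label):
    """Return (token index, message) for every rule the label breaks."""
    tokens = tokenize(label)
    syms = [t[0] for t in tokens]
    text = [t[1] for t in tokens]
    in_s = set()                                   # tokens inside an S sequence
    for i in range(len(tokens) - 3):
        if "".join(text[i:i + 4]) in S_SEQS:
            in_s.update(range(i, i + 4))
    problems = []
    for i, sym in enumerate(syms):
        prev = syms[i - 1] if i > 0 else None
        if sym == "Z" and i >= 2 and prev == "H" and text[i - 2] in RAS:
            prev = "P"                             # ra + Hasanta before Khanda Ta
        # S is valid even where the context rules would reject it.
        if sym in ALLOWED_BEFORE and i not in in_s and prev not in ALLOWED_BEFORE[sym]:
            problems.append((i, f"{sym} may not follow {prev or 'label start'}"))
        if sym == "V" and prev == "H":             # rules 8 and 9
            problems.append((i, "V/S may not follow H"))
    return problems
```

For example, `violations("কংং")` reports the second Anusvāra, while `violations("ভর্ৎসনা")` and `violations("অ্যাসিড")` return an empty list.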

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in including such ligatures, because any user can easily create them even if they are not attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning "Bank of India"). This is a case where two different words are joined together, the first of which ends with an H (অফ্) while the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound of that word.


This situation arises only because hyphen, space and the Zero Width Non-Joiner character are absent from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting it may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
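The prohibited pattern can be located directly in the code point list given for the "Bank of India" example. The sketch below uses a hypothetical helper, `vowel_after_hasanta`, and an assumed illustrative set of independent vowels. It shows that the label breaks the rule at exactly one position, and that the spelling without H (অফ instead of অফ্) does not.

```python
VIRAMA = 0x09CD
# Independent vowel letters (illustrative set: অ..ঌ, এ, ঐ, ও, ঔ, ৠ, ৡ).
VOWELS = set(range(0x0985, 0x098D)) | {0x098F, 0x0990, 0x0993, 0x0994, 0x09E0, 0x09E1}

# Code points of the "Bank of India" label, as listed in section 7.1.1.
BANK_OF_INDIA = [0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995,
                 0x0985, 0x09AB, 0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1,
                 0x09BF, 0x09DF, 0x09BE]

def vowel_after_hasanta(cps):
    """Indices of independent vowels that directly follow a Hasanta."""
    return [i for i in range(1, len(cps))
            if cps[i] in VOWELS and cps[i - 1] == VIRAMA]

print(vowel_after_hasanta(BANK_OF_INDIA))           # ই after ফ্
without_h = BANK_OF_INDIA[:9] + BANK_OF_INDIA[10:]  # drop the H after ফ
print(vowel_after_hasanta(without_h))
```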

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word.
Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक
As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्

অং্ अं्, কঁ্ कँ्, কঃ্ कः्, কি্ कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid.

Example:

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कीःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one.
Example: কীী कीी

5. M is not permitted after V.
Example: ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः

কঃং कःं
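The restrictions illustrated in 7.2 are not extra rules: they follow from the "must be preceded by" rules of 7.1. A minimal sketch at the level of symbol strings (e.g. "CBB" for কংং) makes this visible. The symbol strings below are one reading of the examples above, and S handling is omitted for brevity.

```python
# Rules 2-7 of section 7.1: symbols allowed immediately before each symbol.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}

def symbols_ok(symbols):
    """Check a string of WLE symbols such as "CMB" (S handling omitted)."""
    prev = None
    for sym in symbols:
        if sym in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[sym]:
            return False
        if sym == "V" and prev == "H":          # rule 8
            return False
        prev = sym
    return True

# Symbol patterns of the invalid examples in 7.2, grouped by its items 1-6.
INVALID_7_2 = ["HC", "MC", "BC", "DC", "XC",                        # 1
               "VH", "VBH", "CDH", "CXH", "CMH",                     # 2
               "CBB", "CDD", "CXX", "CMDD", "CMXX", "VBB", "VDD", "VXX",  # 3
               "CMM",                                                # 4
               "VM",                                                 # 5
               "CBX", "CXB"]                                         # 6
for pattern in INVALID_7_2:
    assert not symbols_ok(pattern), pattern
print("all", len(INVALID_7_2), "patterns rejected")
```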

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese, in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in Terms of Script Reforms". Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 - Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the structures of the formalism do not necessarily apply. To take care of such cases, restriction rules are put in place; they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa-ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native exceptions can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa-ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C + H can occur before Khanda Ta only where C is ra (র, 09B0):

র্ৎ as in ভর্ৎসনা

3. Only the following combinations of V H C M are allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান

(acid, association)
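The three Appendix 10.1 restrictions can be checked with a short sketch. The function name `appendix_restrictions` and the illustrative vowel set are hypothetical. The check follows the rules literally, so it flags label-initial ৎ even in transliterations such as ৎসেরিং, and it does not enforce the word-initial য় observation, which the text treats as having exceptions.

```python
import unicodedata

KHANDA_TA, NYA, NGA = "\u09CE", "\u099E", "\u0999"   # ৎ ঞ ঙ
VIRAMA, RA = "\u09CD", "\u09B0"                      # ্ র
# Independent vowels (illustrative set: অ..ঌ, এ, ঐ, ও, ঔ, ৠ, ৡ).
VOWELS = ({chr(cp) for cp in range(0x0985, 0x098D)}
          | set("\u098F\u0990\u0993\u0994\u09E0\u09E1"))
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা এ্যা

def appendix_restrictions(label):
    """Return one message per Appendix 10.1 restriction broken by label."""
    s = unicodedata.normalize("NFC", label)
    problems = []
    # Restriction 1: ৎ, ঞ and ঙ may not start a label.
    if s[:1] in (KHANDA_TA, NYA, NGA):
        problems.append(f"label may not start with {s[0]}")
    for i, ch in enumerate(s):
        # Restriction 2: C + H before ৎ only when C is র.
        if ch == KHANDA_TA and i >= 2 and s[i - 1] == VIRAMA and s[i - 2] != RA:
            problems.append(f"C + H + ৎ at {i - 2}: C must be র")
        # Restriction 3: a vowel followed by Hasanta must form অ্যা or এ্যা.
        if ch in VOWELS and s[i + 1:i + 2] == VIRAMA and s[i:i + 4] not in ALLOWED_VHCM:
            problems.append(f"V + H at {i} is not অ্যা or এ্যা")
    return problems

print(appendix_restrictions("ভর্ৎসনা"))   # []
print(appendix_restrictions("ৎসেরিং"))    # label-initial ৎ
```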

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps of the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17 - The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision

ঃ U+0983 | ः U+0903 | Confusable

ও U+0993 | उ U+0909 | Confusable

ঘ U+0998 | घ U+0918 | Confusable

ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision

ঘ U+0998 | ਬ U+0A2C | Confusable

ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision

ও U+0993 | ਤ U+0A24 | Distinguishable

শ U+09B6 | ਅ U+0A05 | Distinguishable

ম U+09AE | ਮ U+0A2E | Distinguishable

বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 - Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision

ও U+0993 | ଓ U+0B13 | Confusable

Table 21 - Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision

ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 - Bangla and Oriya distinguishable code points


11 Appendix-II: Bengali consonants and their allographs

Consonant | Phonetic value | Clusters (allographs, written as their component letters)

প | p | প+ত, প+ন, প+প, প+য, প+র, প+ল, প+স

ফ | pʰ | ফ+র, ফ+ল

ব | b | ব+জ, ব+দ, ব+ধ, ব+ব, ব+য, ব+র, ব+ল, ব+ভ

ভ | bʱ | ভ+য, ভ+র, ভ+ল

ত | t | ত+ত, ত+ত+য, ত+ত+ব, ত+থ, ত+ন, ত+য, ত+ম, ত+ব, ত+র; as second member: প+ত, ক+ত, ক+ত+ব. There is a marked form of ত + ্: ৎ (Khanda Ta), as also in র্ৎ.



থ | tʰ | থ+য, থ+র; as second member: স+থ, ত+থ, ন+থ

দ | d | দ+গ, দ+ঘ, দ+দ, দ+ধ, দ+য, দ+ব, দ+ভ, দ+র; as second member: ন+দ, ন+দ+র

ধ | dʱ | ধ+ন, ধ+ম, ধ+য, ধ+র; as second member: গ+ধ, দ+ধ, ন+ধ

ট | ʈ | ট+ট, ট+য, ট+ব, ট+র; as second member: ক+ট, ষ+ট

ঠ | ʈʰ | ঠ+য; as second member: ণ+ঠ, ষ+ঠ

ড | ɖ | ড+ড, ড+য, ড+র

ঢ | ɖʱ | ঢ+য; as second member: ণ+ঢ

চ | tʃ | চ+চ, চ+ছ, চ+ছ+র, চ+ঞ, চ+য; as second member: ঞ+চ, শ+চ



ছ | tʃʰ | ছ+র; as second member: চ+ছ, ঞ+ছ, শ+ছ

জ | dʒ | জ+জ, জ+জ+ব, জ+ঝ, জ+ঞ, জ+য, জ+র; as second member: ঞ+জ

ঝ | dʒʱ | (not privileged enough to have clusters as a first member); as second member: জ+ঝ, ঞ+ঝ

ক | k | ক+ক, ক+ট, ক+ত, ক+ত+র, ক+ত+ব, ক+ন, ক+ব, ক+ম, ক+য, ক+র, ক+ষ, ক+ষ+ণ, ক+ষ+ম, ক+ষ+ব, ক+ষ+য, ক+স; as second member: ঙ+ক

খ | kʰ | (not privileged enough to have clusters as a first member); as second member: ঙ+খ



গ | g | গ+গ, গ+দ, গ+ধ, গ+ন, গ+ব, গ+ম, গ+য, গ+র, গ+ল; as second member: ঙ+গ

ঘ | gʱ | ঘ+ন, ঘ+য, ঘ+র; as second member: ঙ+ঘ

ঞ | This letter does not have any particular phonetic value but is mostly pronounced as n. | ঞ+চ, ঞ+ছ, ঞ+জ, ঞ+ঝ; as second member: জ+ঞ

ণ | n | ণ+ট, ণ+ঠ, ণ+ড, ণ+ড+র, ণ+ঢ, ণ+ণ, ণ+য, ণ+ব; as second member: ক+ষ+ণ, ষ+ণ

ঙ, ং | ŋ | ঙ+ক, ঙ+ক+র, ঙ+খ, ঙ+গ, ঙ+ঘ, ঙ+ক+ষ (in some contexts ঙ is replaced by ং, e.g. কং, অং)



ম | m | ম+ল, ম+প, ম+প+র, ম+ভ, ম+র; as second member: ত+ম, ধ+ম, হ+ম, ক+ষ+ম

ন | n | ন+ট, ন+ট+র, ন+ড, ন+ড+র, ন+ত, ন+ত+র, ন+থ, ন+দ, ন+দ+র, ন+ধ, ন+ধ+র, ন+দ+ব, ন+ন, ন+ম, ন+য, ন+স; also listed: ণ+ঠ; as second member: হ+ন

শ | ʃ | শ+চ, শ+ছ, শ+ন, শ+ম, শ+র, শ+ল, শ+য

ষ | ʃ | ষ+ক, ষ+ট, ষ+ঠ, ষ+ণ, ষ+প, ষ+প+র, ষ+ফ, ষ+ট+র, ষ+য; as second member: ক+ষ, ক+ষ+ণ, ক+ষ+ম



স | s, ʃ | স+ক, স+ট, স+প, স+ফ, স+ত, স+থ, স+খ, স+য, স+র, স+ল; as second member: ক+স

হ | h | হ+ণ, হ+ন, হ+ম, হ+য, হ+র, হ+ল

ড় | ɽ | ড়+গ

ঢ় | ɽʱ | (not privileged enough to have clusters)

য | dʒ | The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word it is the vowel æ (as in শ্যামল, র‍্যাপার etc.). After a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

Clusters: ক+য, স+য, র+য. [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā + ā-kāra, constitute a vowel sign representing the vowel অ্যা.]



র | r | Two manifestations:
(i) রেফ (repʰa), as the first member of a cluster, placed over the following consonant, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (earlier র্দ্ধ্ব, a four-term cluster) etc.;
(ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster, placed under the consonant it follows, e.g. প্র, ক্র.

ল | l | ল+গ, ল+প, ল+ব, ল+ম, ল+ট, ল+ড, ল+ক, ল+দ, ল+য; as second member: গ+ল, ম+ল

ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | e.g. অঃ, কঃ



Bangla script is illustrated in the inscriptions of the Varmana, Sena and Deva rulers of the twelfth and thirteenth centuries [104]. The evolution of the Bangla script (cf. [136]) is closely tied to the advancement of printing technology. The first "movable type" script was technically created and used to print Nathaniel Brassey Halhed's (1751-1830) 1778 book A Grammar of the Bengal Language. In 1785 Governor-General Warren Hastings (1732-1818) asked another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing characters. The current printed form of the Bangla script appeared soon after. It is generally agreed that Wilkins developed the Bangla print script [111]. He passed this knowledge on to Pancanana Karmakara (d. 1804), a renowned artist in Bengal, and it was later Karmakara and his family who became famous in Bangla printing technology. Shepherd was another assistant of Wilkins in designing the script, which became more angular, with sharper turns and edges [133]. A few archaic letters were modernized during the 19th century. The script was standardized by Pandit Ishwar Chandra Vidyasagar when Bangla type fonts were to be used for large-scale publishing under the Calcutta School Book Society [116, for several references]. Much later, the Linotype technique, invented by Ottmar Mergenthaler (1854-1899) in 1886, was introduced into Bangla printing in 1935 through the efforts of Suresh Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen (1882-1966) and his disciple Sushil Kumar Bhattacharya; it was first used by the Anandabazar Patrika group and later by others. Within a few years the more advanced Monotype technology came into use in Bangla printing. However, Monotype found very limited acceptance in Bangla printing culture, and Linotype held the stage until digital technology eventually replaced all earlier techniques. All these stages can be presented in a table:

PERIOD | DESCRIPTION | NAME

3rd Century BC | Use of the Brāhmī and Kharoṣṭhī scripts begins in the subcontinent. Brāhmī was widely used during the reign of the Mauryan king Aśoka. In one theory Brāhmī is based on the North Semitic alphabet, suitably modified to fit the needs of local languages; it is currently believed to have been an independent development. | Brāhmī

1st-3rd Century AD | The Kuṣāṇa script, named after the Kuṣāṇa royal dynasty. | Kuṣāṇa script

4th-5th Century AD | The next stage of its evolution was into the Gupta script, named after the Gupta royal dynasty. | Gupta script

7th Century AD | Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī, giving rise to the Kuṭila-lipi. | Kuṭila-lipi

8th Century AD | Copper-plate inscriptions are found in Khalimpur, Bangladesh, from the reign of Dharmapāla; in Monghyr and Nālandā in Bihar, from the time of Devapāla; and in Jagjīvanapura in West Bengal, from the reign of Mahendrapāla. | Siddhamātṛkā

9th Century AD until 1025 AD | Proto-Bangla characteristics develop in rudimentary forms. An important landmark in the development of the Bangla script is the Ramaganja copper-plate inscription of Mahāmāṇḍalika, found in the last quarter of the eleventh century AD. | Proto-Bangla Script & Language

12th-13th Century AD | A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is found in the inscriptions of the Varmana, Sena and Deva rulers of the twelfth and thirteenth centuries. | Matured Proto-Bangla

14th-15th Century AD | The characteristics of the typical Bangla script began to develop, as can be seen in the copper-plate inscription of Vijayamāṇikya I of Tripura dated 1478 AD, which also illustrates the forms of Bangla letters in the fifteenth century AD. | Modern Bangla script era begins (see Ross 1999)

16th-17th Century AD | The chart of the Bangla alphabet appended to China Monuments, published in Amsterdam in 1667, and A Code of Gentoo Laws, published in London in 1776, both show a chart of the Bangla alphabet. They show 16 vowel letters, including the long ৡ (ḹ), Anusvāra and Visarga, and 34 consonants. | Printed Charts of Bangla

18th-19th Century AD | Charles Wilkins develops printing in Bangla in 1778 and Vidyasagar reforms it. | Bangla Type Fonts

Table 2: Development of the Bangla Writing System


The overall development of the Bangla script from the Kuṭila-lipi period to Modern Bangla can be seen in Table 3 ([102] and [146]; also see the web page in [147]).

Table 3: Bangla Script in Different Centuries

3.2 Languages Considered
Below is a table of the languages using the Bangla script, placed on EGIDS Scale 1-6 (see [117] for details). Some languages at EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some used the Bangla script earlier (such as Bodo) or used it in West Bengal at some point in time (Santali), but have since shifted to another writing system: Bodo is now written in Nāgarī or Devanāgarī, and Santali uses both Nāgarī/Devanāgarī and Ol Chiki [145]. For the purposes of the Bangla LGR, only languages at EGIDS scale 1 to 4 have been considered. Consider the following table:

EGIDS Scale 1

EGIDS Scale 2

EGIDS Scale 3

EGIDS Scale 4

EGIDS Scale 5

EGIDS Scale 6

Bangla (Bengali)

Santali, Bodo, Riang, Khumi, Mru(ng), Asho

Lepcha, Pnar, Koda, Kora, Chak

Asamiyā (Assamese)

Koch or Rajabansī

Malto or Malpahariya

Manipuri or Meitei

Bishnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)

Chakma, Hajong, Mundari & Kurux (of Bangladesh)

Toto, Rohingya, Tippera, Megam, Tanchangya

Usoi, Limbu, Sadri or Oraon

Bhumij or Mundari, Bawm, Chin

Table 4: Main languages in India and Bangladesh that use the Bangla script, on the EGIDS Scale

3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written and how typesetting in Bangla can be done. This section is followed by one that explains the code points (and fixed code point sequences) that show distinctive characteristics of Bangla and make up the repertoire. The subsequent sections also cover the 'akshar'-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and its pronunciation.
The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to carry an accompanying 'inherent' vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].

Vowels can be written as independent letters or with a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or both before and after it [105].

All Bangla consonants, when pronounced in isolation, are uttered with an inherent vowel ɔ; hence ক 'k', খ 'kh' or গ 'g' are usually pronounced as [kɔ], [khɔ] or [gɔ] etc. Phonologically, the Bangla vowel ɔ corresponds to the Hindi schwa ə.

When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonant clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters that are compounds of the consonant characters, e.g. ক (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ (ñ) + জ (j) = ঞ্জ (ñj), জ (j) + ঞ (ñ) = জ্ঞ (jñ), হ (h) + ম (m) = হ্ম (hm). There are other issues as well. র as the second member of a cluster is reduced to a secondary symbol, e.g. প (p) + র (r) = প্র (pr), ষ (ṣ) + ট (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra "camel"). য (y), when used as a primary symbol, represents jɔ in Bangla, but its secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word it is the vowel æ (as in শ্যামল (syamala) "green", র‍্যাপার (ryapara) "wrapper" etc.). After a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry). In the case of দ (d) + ধ (dh), গ (g) + ধ (dh) and ন (n) + ধ (dh), the shape of the second member changes, giving দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta "southwest"), used mostly in Classical borrowings, shows the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.

In consonant clusters, many consonants take a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb) and স্ত (st). Apart from its full shape, র has two allographs. One is 'repha', as found in র্ক (rk) and র্প (rp). The other is ra-phalā, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another cluster in which the cerebral nasal consonant sign takes a peculiar shape [151].
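In Unicode text, the conjuncts above are not separate characters. Each one is stored as its member consonants joined by the Hasanta (U+09CD, discussed in Section 3.3.2). The font then decides whether to draw a ligature, a reduced form or a visible Hasanta. The following Python sketch shows the underlying sequences for two of the examples. The helper names `cluster` and `codepoints` are illustrative only.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)


def cluster(*consonants: str) -> str:
    """Join consonants with Hasanta; the conjunct shape is left to the font."""
    return HASANTA.join(consonants)


def codepoints(text: str) -> str:
    """Render a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


ksa = cluster("\u0995", "\u09B7")             # ক + ষ = ক্ষ
stra = cluster("\u09B7", "\u099F", "\u09B0")  # ষ + ট + র = ষ্ট্র
print(codepoints(ksa))   # U+0995 U+09CD U+09B7
print(codepoints(stra))  # U+09B7 U+09CD U+099F U+09CD U+09B0
```

Because the shape is a rendering decision, the same sequence may look different from one font to another. A label generation ruleset deals only with the code-point sequences.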

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them). These correspond to forty-four phonemes, or functional speech sounds (7 oral and 7 nasal vowels and 30 consonants) [150], with some obvious redundancies. In one of the first phonemic analyses, however, the number was thought to be thirty-five phonemes [140].

As mentioned above, several graphemic symbols in Bangla have secondary shapes, technically called 'allographs', with a complementary distribution in each case. These graphs or markings are generally added to the primary symbol [113] in the following positions:

1) Below (e.g. কু (ku), ন্ত (nta), হ্র (hra), etc.)

2) Above (e.g. চ (ca), র্ক (rka), etc.)

3) Right side (e.g. কা (kā), কং (kaṅ), etc.)

4) Left side (e.g. কে (ke))

5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki), etc.)

6) Right side and above simultaneously (e.g. কী (kī))

7) Right side and left side simultaneously (e.g. কো (ko))

8) Right side, left side and above simultaneously (e.g. কৌ (kau))
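The positions listed above describe where a sign is drawn, not where it is stored. Unicode stores every vowel sign after its consonant (logical order). The two-part signs ো and ৌ also have canonical decompositions into a left part and a right part. The short Python sketch below uses only the standard `unicodedata` module to illustrate this for কে and কো. The variable names are illustrative.

```python
import unicodedata

KA = "\u0995"
E_SIGN = "\u09C7"   # ে  drawn to the LEFT of the consonant
O_SIGN = "\u09CB"   # ো  two-part sign (left + right)
AU_SIGN = "\u09CC"  # ৌ  two-part sign (left + right)

ke = KA + E_SIGN  # কে: stored consonant-first although ে is drawn first
ko = KA + O_SIGN  # কো

# The two-part signs decompose into their left and right components.
print(unicodedata.decomposition(O_SIGN))   # 09C7 09BE
print(unicodedata.decomposition(AU_SIGN))  # 09C7 09D7

# NFC recomposes the split spelling into the single two-part sign.
split_ko = KA + E_SIGN + "\u09BE"
print(unicodedata.normalize("NFC", split_ko) == ko)  # True
```

Root-zone labels have to be in NFC, as noted in Section 3.3.6. A split spelling such as ক + ে + া is therefore stored with the precomposed ো.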

Consider next the complementary distribution of vowel letters (word- or syllable-initial) and vowel signs (Matras), which is relevant for the ABNF. Besides some simple vowel modifiers, called 'Kārs' in Bangla (and referred to as Matra in the other Neo-Brāhmī LGR documents), Bangla vowels have some combinatory modifiers with certain consonants. For example, in the usual case:

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,

ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,

ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, and

উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme.

However, there are also some special vowel modifiers of উ, as in the following combined letters:

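গু (gu), in a special ligated shape, rather than as গ (g) + ু (u) with the regular u-sign

রু (ru), in a special ligated shape, rather than as র (r) + ু (u)

শু (śu), in a special ligated shape, rather than as শ (ś) + ু (u)

হু (hu), in a special ligated shape, rather than as হ (h) + ু (u)

ন্তু (ntu), in a special ligated shape, rather than as ন (n) + ত (t) + ু (u)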

Similarly, there are vowel modifiers of ঊ or '(Long) ū', e.g. ভ (bh) + র (r) + ূ (ū) in ভ্রূ bhrū "eyebrow" and শ (ś) + র (r) + ূ (ū) in শ্রূ śrū. There is also one for ঋ (ṛ) after হ (h), in হৃ hṛ.

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. This work attempted to reduce the number of allographs of both vowels and consonants in clusters. It has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151][152]. At present, two systems, the old (traditional) and the new, go on side by side, each operative in different domains.

However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the family of Brāhmī writing systems.

Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992. These followed the recommendations of a committee constituted through a workshop jointly organized by the Jātīya Śikṣākrama and Pāṭhyapustaka Board (National Curriculum and Textbook Board) in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶

The Bangla Akademi of West Bengal was established in 1986. Its first President, Annadasankar Ray (1904-2002), gave a direction in his inaugural address for standardizing the Bangla alphabet, script and spelling system. He clearly argued that the Akademi would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and by 1988 a broad agreement was reached on the 'homogenization of Bangla spelling'. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. This was published in a 'Spelling Dictionary' titled Ākādemi Bānāna Abhidhāna (1997), which was considerably more comprehensive than 'The University of Calcutta proposals' made in 1936.

Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read. The symbols used, both single and combined, were made more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153]. He used the terms Swaccha ('Transparent') and Aswaccha ('Opaque', or non-transparent), and added Ardha Swaccha ('half transparent') in between the two. Some examples are:

⁶ Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.

Transparent: ন্ন (nn), প্ত (pt) and স্ত (st), where both members of the cluster can be recognized.

Opaque: clusters where neither of the two members can be (easily) recognized, namely ক্ষ kṣ (ক k + ষ ṣ), জ্ঞ jñ (জ j + ঞ ñ), ঙ্গ ṅg (ঙ ṅ + গ g) and হ্ম hm (হ h + ম m).

Semi-transparent: প্র (pr) and র্প (rp), where one symbol is recognizable and the other is not. In three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র str (স s + ত t + র r) and ষ্ট্র ṣṭr (ষ ṣ + ট ṭ + র r).

There were, in fact, two types of proposals.

The first concerned the shapes of letters: consonant + vowel (CV) combinations and conjuncts, which are consonant + consonant combinations. There were also more complex shapes, those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols were complex enough to make them difficult for children to learn.

The second type dealt only with the spellings of words, without any reference to the shapes of the letters in which they were written. The basic objective here was 'one word, one spelling' to the greatest extent possible [151].

The most salient changes affecting consonant + vowel combinations are summarized below [153]:

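a. The variants of the short u vowel sign (হ্রস্ব উ-কার, hrasva u-kāra) have been brought down to one, the regular ু. So the ligated গু (gu) > গ + regular ু. Similarly, রু (ru) > র + ু, শু (śu) > শ + ু and হু (hu) > হ + ু. Therefore, for a cluster + short u sign, ন্তু (ntu) > transparent ন্ত + ু (ন + ্ + ত + ু) and স্তু (stu) > transparent স্ত + ু (স + ্ + ত + ু).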

b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha ū-kāra) have also been reduced:
রূ (rū) > র + regular ূ
ভ্রূ (bhrū) > ভ্র + ূ (ভ bh + ্ + র r + ূ ū)
দ্রূ (drū) > দ্র + ূ (দ d + ্ + র r + ূ ū)
শ্রূ (śrū) > শ্র + ূ (শ ś + ্ + র r + ূ ū)

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + regular ৃ.

For consonant + consonant + (consonant) … + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes to the extent admissible in the Bangla writing system [153]. In the Akademi's presentation, each traditional cluster shape is followed, after a slash, by the innovated shape. In Unicode, both shapes are encoded by the same code-point sequence and differ only in rendering. The clusters concerned are therefore listed here with their components:

ব্ধ bdh (ব b + ধ dh), দ্ধ ddh (দ d + ধ dh), ন্থ nth (ন n + থ th), ঞ্চ ñc (ঞ ñ + চ c), ঞ্ছ ñch (ঞ ñ + ছ ch), ঞ্জ ñj (ঞ ñ + জ j), ক্ত kt (ক k + ত t), ক্র kr (ক k + র r), গ্ধ gdh (গ g + ধ dh), ঙ্ক ṅk (ঙ ṅ + ক k), ঙ্গ ṅg (ঙ ṅ + গ g), ষ্ণ ṣṇ (ষ ṣ + ণ ṇ), ন্ধ্র ndhr (ন n + ধ dh + র r), ণ্ড্র ṇḍr (ণ ṇ + ড ḍ + র r), ক্ত্র ktr (ক k + ত t + র r)

3.3.1 The Consonants
According to traditional classification, Bangla consonants are categorized by their phonetic properties, especially place and manner of articulation [107]. There are five 'Vargas' (pronounced 'Barga' in Bangla), or groups (sets or classes), distinguished by place of articulation, plus one non-'varga' group [105]. Each varga corresponds to the stops at a certain place of articulation. It contains a series of five consonants ordered by their phonetic qualities (i.e. manner of articulation). The series runs from unvoiced unaspirated to voiced aspirated (in the fourth column) and ends with a homorganic, or corresponding, nasal [107]. Consider the following table:

'Varga' or Set | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | ক 'K' U+0995 | খ 'KH' U+0996 | গ 'G' U+0997 | ঘ 'GH' U+0998 | ঙ 'NG' U+0999
Palatal | চ 'C' U+099A | ছ 'CH' U+099B | জ 'J' U+099C | ঝ 'JH' U+099D | ঞ 'NY' U+099E
Retroflex | ট 'TT' U+099F | ঠ 'TTH' U+09A0 | ড 'DD' U+09A1 | ঢ 'DDH' U+09A2 | ণ 'NN' U+09A3
Dental | ত 'T' U+09A4 | থ 'TH' U+09A5 | দ 'D' U+09A6 | ধ 'DH' U+09A7 | ন 'N' U+09A8
Bilabial | প 'P' U+09AA | ফ 'PH' U+09AB | ব 'B' U+09AC | ভ 'BH' U+09AD | ম 'M' U+09AE

Table 5: Varga classification of Bangla consonants
(The consonants fall into a pattern of five sets, called the five 'Vargas': Unvoiced Unaspirated, Unvoiced Aspirated, Voiced Unaspirated, Voiced Aspirated and Nasals)

Non-Varga | য 'Y' U+09AF | য় 'YY' U+09DF | র 'R' U+09B0 | ড় 'RR' U+09DC | ঢ় 'RH' U+09DD
Non-Varga | ল 'L' U+09B2 | শ 'SH' U+09B6 | ষ 'SS' U+09B7 | স 'S' U+09B8 | হ 'H' U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)
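The Unicode Bengali block closely follows the varga layout of Table 5. Each varga occupies five consecutive code points in the order of the table columns, with U+09A9 left unassigned before the bilabial row. The minimal Python sketch below rebuilds each row from its first code point. It then checks a few character names, including HA from Table 6, against the Unicode Character Database. The names `VARGA_BASE` and `varga_row` are illustrative.

```python
import unicodedata

# First code point (unvoiced unaspirated) of each five-consonant varga.
VARGA_BASE = {
    "Velar": 0x0995,
    "Palatal": 0x099A,
    "Retroflex": 0x099F,
    "Dental": 0x09A4,
    "Bilabial": 0x09AA,  # U+09A9 is unassigned, so this row starts one later
}


def varga_row(name: str) -> list[str]:
    """Return the five consonants of a varga, in Table 5 column order."""
    base = VARGA_BASE[name]
    return [chr(base + i) for i in range(5)]


for name in VARGA_BASE:
    row = varga_row(name)
    # Print the row and the Unicode name of its nasal (last column).
    print(name, " ".join(row), unicodedata.name(row[4]))

# Non-varga HA from Table 6 is U+09B9 in the Bengali block.
print(hex(ord(unicodedata.lookup("BENGALI LETTER HA"))))  # 0x9b9
```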

3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel assumed to be associated with them [121]. In Bangla this neutral vowel is central back -ɔ.

This report uses two terms for the character that marks the absence of this inherent vowel. One is 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts). The other is 'Virāma'⁷ (= 'Dari' in Bangla), which is preferred in UNICODE (cf. Unicode 3.0 and above). Note that UNICODE uses the term virāma in a sense different from its traditional grammatical definition, so some explanation is needed. Given the importance of this document, the explanation is included here so that any reader can find the proper grammatical meaning of the term.

A special sign is needed whenever this implicit vowel is stripped off; this symbol is known as the Hasanta (= Halant) (U+09CD). Placing the Hasanta under the first consonant of a combination or cluster "kills" its vowel, in common parlance, and creates a conjunct. In this manner, conjunct characters can generally be written by joining two to four consonants, and in rare cases up to five.

However, any maximum number of consonants that can join to form one akṣara⁸ has to be determined empirically. The figure above is an observation based on the CIIL-Emille Corpora of Bangla words [132, 133], as seen in print today. Given the mixture of scripts and languages on the web, someone may want a generic Top Level Domain [gTLD] with more consonants than the observed maximum. This can happen when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, this limit will not be enforced in the Bangla LGR work.
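The LGR does not enforce a maximum cluster length, so the following sketch is only a diagnostic. It reports the longest run of consonants joined by Hasanta in a label. Two things are assumptions made for illustration, not rules taken from the LGR: the consonant test (code points U+0995 to U+09B9 with category Lo, plus U+09F0 and U+09F1) and the function name `longest_cluster`.

```python
import unicodedata

HASANTA = "\u09CD"
NUKTA = "\u09BC"


def is_consonant(ch: str) -> bool:
    """Assumed consonant test: Bengali KA..HA (assigned, category Lo) plus the two Assamese RAs."""
    cp = ord(ch)
    in_main_range = 0x0995 <= cp <= 0x09B9 and unicodedata.category(ch) == "Lo"
    return in_main_range or cp in (0x09F0, 0x09F1)


def longest_cluster(label: str) -> int:
    """Length of the longest run of consonants joined by Hasanta (diagnostic only)."""
    best = run = 0
    prev = ""
    for ch in label:
        if is_consonant(ch):
            # Extend the run only if this consonant directly follows a Hasanta.
            run = run + 1 if (prev == HASANTA and run > 0) else 1
            best = max(best, run)
        elif ch not in (HASANTA, NUKTA):
            run = 0  # a vowel letter, vowel sign or other mark ends the cluster
        if ch != NUKTA:
            prev = ch  # a Nukta stays attached to its consonant
    return best


print(longest_cluster("\u0989\u09B7\u09CD\u099F\u09CD\u09B0"))  # উষ্ট্র -> 3
```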

⁷ Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).

⁸ This term needs to be disambiguated: akṣara also means 'syllable' in Indian grammatical traditions.

3.3.3 Vowels
Bangla has separate symbols for all 'Swaras', or vowels. These letters are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign is attached to the consonant. This sign is called a 'kār' in Bangla, or a Mātrā in Nāgarī⁹. Since the consonant has the neutral vowel built in at its end, every vowel except অ (pronounced -ɔ) has an equivalent kāra (Mātrā). The correlation is shown below:

Vowel | Corresponding vowel sign (kāra / Mātrā)
অ 'A' U+0985 | -
আ 'AA' U+0986 | া U+09BE
ই 'I' U+0987 | ি U+09BF
ঈ 'II' U+0988 | ী U+09C0
উ 'U' U+0989 | ু U+09C1
ঊ 'UU' U+098A | ূ U+09C2
ঋ Vocalic 'R' U+098B | ৃ U+09C3
ৠ Vocalic 'RR' U+09E0 | ৄ U+09C4
ঌ Vocalic 'L' U+098C | ৢ U+09E2
ৡ Vocalic 'LL' U+09E1 | ৣ U+09E3
এ 'E' U+098F | ে U+09C7
ঐ 'AI' U+0990 | ৈ U+09C8
ও 'O' U+0993 | ো U+09CB
ঔ 'AU' U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ 'A' U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ 'A' U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ 'A' U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kārs

⁹ The term 'Mātrā' in Bangla, however, stands for an altogether different concept: the top bar placed over a letter, which is present in Hindi and Bangla but missing in Gujarati.
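Table 7 can be read as a lookup from an independent vowel to the kāra that replaces it after a consonant. The sketch below encodes that mapping in Python. `KARA` and `syllable` are hypothetical names, and the function does no validation beyond the dictionary lookup. Note that the pre-posed ি is still stored after the consonant.

```python
# Independent vowel -> dependent vowel sign (kāra), following Table 7.
KARA = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি (drawn before the consonant, stored after it)
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u09E0": "\u09C4",  # ৠ -> ৄ
    "\u098C": "\u09E2",  # ঌ -> ৢ
    "\u09E1": "\u09E3",  # ৡ -> ৣ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}
INHERENT = "\u0985"  # অ has no kāra: the bare consonant already carries it


def syllable(consonant: str, vowel: str) -> str:
    """Write consonant + vowel using the kāra from Table 7."""
    if vowel == INHERENT:
        return consonant
    return consonant + KARA[vowel]


print(syllable("\u0995", "\u0987"))  # কি
```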

3.3.4 The Anusvāra, onuʃʃār (ং, U+0982)
The Anusvāra, or onuʃʃār in Bangla, sometimes represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' when the second consonant belongs to the velar varga (set), as in লংকা (cf. লঙ্কা). It also often appears for such combinations when the last member is not a velar, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound. That sound may have an alternative conjoined spelling using the corresponding nasal consonant of the relevant set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. In Bangla, by contrast, it is clearly demarcated where the Anusvāra must be used and where a conjunct cluster with a nasal as its first or second component is required.
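The two spellings discussed above, লংকা and লঙ্কা, are different code-point sequences. Unicode normalization does not unify them, because ANUSVARA (U+0982) has no decomposition. The Python sketch below checks this with the standard `unicodedata` module. Whether such pairs should be related by LGR rules is a policy question that this sketch does not address.

```python
import unicodedata

with_anusvara = "\u09B2\u0982\u0995\u09BE"        # লংকা: LA + ANUSVARA + KA + AA sign
with_conjunct = "\u09B2\u0999\u09CD\u0995\u09BE"  # লঙ্কা: LA + NGA + HASANTA + KA + AA sign

print(unicodedata.decomposition("\u0982") == "")  # True: ANUSVARA has no decomposition
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, with_anusvara) == unicodedata.normalize(form, with_conjunct)
    print(form, same)  # False for every normalization form
```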

3.3.5 Nasalization: Candrabindu (ঁ, U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
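Candrabindu is drawn above the letter but is stored after the vowel it nasalizes. The sketch below prints the sequence quoted above for চাঁদ. The helper name `codepoints` is illustrative.

```python
def codepoints(text: str) -> str:
    """Render a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# চাঁদ 'moon': CA + AA sign + CANDRABINDU + DA (Candrabindu follows the vowel sign)
chand = "\u099A\u09BE\u0981\u09A6"
print(codepoints(chand))  # U+099A U+09BE U+0981 U+09A6
```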

3.3.6 Nukta (়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi), and both the term and the concept are borrowed in Bangla.

The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, this LGR must represent three Bangla letters as sequences:
য় YYA as YA + Nukta (U+09AF + U+09BC), instead of U+09DF
ড় RRA as DDA + Nukta (U+09A1 + U+09BC), instead of U+09DC
ঢ় RHA as DDHA + Nukta (U+09A2 + U+09BC), instead of U+09DD
This is so even though the use of a 'Nukta' is otherwise completely unnatural in Bangla.

In the current Unicode Standard chart, these characters are listed as additional consonants. Under the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard provides the three characters as atomic code points: 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each producible with a single keystroke. Even so, the IDNA protocol requires us to treat them as composite (base + Nukta) sequences built from code points allocated in the Unicode Bengali block.

There may be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). Such spellings serve only to indicate correct pronunciation and to preserve the integrity of the loanword. They amount to using the Bangla writing system like the IPA script. This practice is not, however, found in printed Bangla.
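The effect of the composition exclusions can be observed directly with Python's standard `unicodedata` module. Normalizing each atomic code point to NFC yields the base + Nukta sequence, which is already in NFC. The mapping name `ATOMIC_TO_SEQUENCE` is illustrative.

```python
import unicodedata

ATOMIC_TO_SEQUENCE = {
    "\u09DC": "\u09A1\u09BC",  # ড় RRA -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # ঢ় RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # য় YYA -> YA + NUKTA
}


def codepoints(text: str) -> str:
    """Render a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for atomic, sequence in ATOMIC_TO_SEQUENCE.items():
    nfc = unicodedata.normalize("NFC", atomic)
    # The atomic form is never NFC; the base + Nukta sequence is.
    print(codepoints(atomic), "->", codepoints(nfc), nfc == sequence)
# U+09DC -> U+09A1 U+09BC True
# U+09DD -> U+09A2 U+09BC True
# U+09DF -> U+09AF U+09BC True
```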

3.3.7 Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. An example is দুঃখ duḥkho "sorrow", "unhappiness" (U+09A6 U+09C1 U+0983 U+0996).

The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি rewritten as নরো'পরাণি), and it is now rarely used, even in other languages that use the Bangla script. The Avagraha is blocked in TLDs by the Maximal Starting Repertoire (MSR). It has therefore been decided not to retain it in the repertoire of this LGR.

Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
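Code points quoted for a Bangla word should all fall in the Bengali block (U+0980 to U+09FF). The Devanagari block (U+0900 to U+097F) sits immediately before it, so a slip between the two is easy to make and hard to see. A minimal Python check, using the hypothetical name `is_bengali_block`, is sketched below. Block membership is only a first filter, since the repertoire of this LGR is much narrower.

```python
def is_bengali_block(label: str) -> bool:
    """True if every code point lies in the Unicode Bengali block U+0980..U+09FF."""
    return all(0x0980 <= ord(ch) <= 0x09FF for ch in label)


dukkha = "\u09A6\u09C1\u0983\u0996"  # দুঃখ: DA + VOWEL SIGN U + VISARGA + KHA
mixed = "\u0926\u0941\u0983\u0916"   # Devanagari DA, U sign and KHA around a Bengali VISARGA

print(is_bengali_block(dukkha))  # True
print(is_bengali_block(mixed))   # False
```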

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)
This note concerns the use of the Zero Width Joiner (ZWJ) and the Zero Width Non-Joiner (ZWNJ) in Bangla. Note that Nepali, Konkani and Hindi use these two signs differently.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control how a string is rendered when the script allows a choice between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

• In Bangla, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) "Satyajit" and সৎ (sat) "honest". This is typed as ta + Hasanta + ZWJ (U+200D).

• ZWJ matters even more where the same combination of consonantal characters is represented differently depending on the context. For example, র + ্ + য has two representations in Bangla: র্য and র‍্য. The form র্য is typed as র + ্ + য. The form র‍্য is typed as র + ZWJ + ্ + য [154].

In other words, ZWJ is needed to render words that require ya-phalā after ra. Without it, such words cannot be typed (rendered), because the plain sequence ra + hasanta + antastha ja, in medial or final position, is already used for repha on antastha ja, as in কার্য (kaarjo). To get a ya-phalā after ra, it is therefore obligatory to place ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• In Bangla, ZWNJ is used to show an explicit Hasanta (Halant). Where an explicit hasanta precedes the next consonant, ZWNJ is used to prevent conjunct formation:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The [Procedure] rules out the use of ZWJ and ZWNJ in the root zone. In Bangla, these two signs serve only to create alternate renderings, and inserting them can also affect searching and NLP.

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation must be explicitly prevented. In these cases, the Hasanta joining the two consonants that would form the conjunct must be explicitly shown.
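The sketch below spells out the three sequences discussed in this section, together with a check for the joiners that the root zone rules out. Removing ZWJ from the ya-phalā form yields exactly the repha sequence. This shows why only the repha rendering can be represented in a root-zone label. Names such as `has_joiner` and `strip_joiners` are illustrative and are not part of any LGR tool.

```python
RA, YA, HASANTA = "\u09B0", "\u09AF", "\u09CD"
ZWJ, ZWNJ = "\u200D", "\u200C"

repha_ya = RA + HASANTA + YA          # repha over ya, as in কার্য
ra_yaphala = RA + ZWJ + HASANTA + YA  # ya-phalā after ra, as in র‍্যাপার
explicit_hasanta = "\u0995" + HASANTA + ZWNJ + "\u0995"  # visible Hasanta, as in প্রাক্‌কথন


def has_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, which the root zone rules out."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label: str) -> str:
    """Remove ZWJ and ZWNJ, leaving only the remaining code points."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


print(has_joiner(repha_ya), has_joiner(ra_yaphala), has_joiner(explicit_hasanta))  # False True True
print(strip_joiners(ra_yaphala) == repha_ya)  # True: both renderings collapse to one sequence
```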

3.3.9 Use of Ya-phalā + ā Sequences
In two cases in Bangla, Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render Ya-phalā after অ and এ, one types U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity: Ya-phalā plus ā-kāra is used to produce the æ sound. Examples are English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট and 'cap' ক্যাপ.

The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As seen here, however, Bangla orthography has a deviant feature: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
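The exception described above can be expressed as a check. Under this assumption, a Hasanta that follows an independent vowel is acceptable only in the sequences অ্যা and এ্যা. The Python sketch below is a hypothetical illustration of such a check. The actual whole-label evaluation rules of this LGR are defined in its WLE section and XML file, and may be formulated differently.

```python
VOWELS_ALLOWING_HASANTA = {"\u0985", "\u098F"}  # অ, এ
HASANTA, YA, AA_SIGN = "\u09CD", "\u09AF", "\u09BE"


def hasanta_after_vowel_ok(label: str) -> bool:
    """Hypothetical check: a Hasanta after an independent vowel (U+0985..U+0994)
    must be part of অ/এ + Hasanta + YA + AA sign."""
    for i, ch in enumerate(label):
        if ch != HASANTA or i == 0:
            continue
        prev = label[i - 1]
        if 0x0985 <= ord(prev) <= 0x0994:  # independent vowel letters
            if prev not in VOWELS_ALLOWING_HASANTA or label[i + 1:i + 3] != YA + AA_SIGN:
                return False
    return True


print(hasanta_after_vowel_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড -> True
print(hasanta_after_vowel_ok("\u0987\u09CD\u09AF\u09BE"))                    # ই + ্ + য + া -> False
```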

3.3.10 Formation of Ra-phalā and Repha Sequences
Repha and ra-phalā are formed as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্, BENGALI SIGN VIRAMA).

When RA co-occurs with HASANTA, it either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance:
repha = ra + Hasanta + C, e.g. র্ক (ra + Hasanta + ka), as in অর্ক arka "the sun"
ra-phalā = C + Hasanta + ra, e.g. ক্র (ka + Hasanta + ra), as in চক্র chakra "cycle"
In both cases, the ra slot, preceded or followed by the common Hasanta (U+09CD), can hold either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0). The shapes of repha and ra-phalā are the same in both cases.

The LGR notes this as a concern about the two RAs in disguise. In a label so generated, it would be completely impossible to tell them apart with the naked eye. This could lead to spoofing and other kinds of cyber irregularities. Because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0), these two code points are classed as (blocked) variants. This is the justification for Variant Set 4, though only in the context of Hasanta.

The two RAs can be told apart only by looking at their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could therefore be extremely dangerous, since a web user may never check a label against its Unicode code points. This point is made explicit with reference to Table 9 (sequences, p. 36) and Table 16 (WLE symbols, p. 47) below.

REPHA can also occur with KHANDA TA. In that context, C should be either RA U+09B0 (র, used in Bangla) or RA U+09F0 (ৰ, used in Assamese).
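The variant relation between র (U+09B0) and ৰ (U+09F0) can be illustrated by listing the labels obtained by swapping one RA for the other. In the Python sketch below, a RA counts as swappable whenever Hasanta immediately precedes or follows it, which covers both repha and ra-phalā. This reading of the context, and the function name `ra_variants`, are assumptions for illustration. Under a blocked-variant policy, registering one of these labels would block the others; the sketch only lists them.

```python
from itertools import product

BENGALI_RA, ASSAMESE_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"
SWAP = {BENGALI_RA: ASSAMESE_RA, ASSAMESE_RA: BENGALI_RA}


def ra_variants(label: str) -> set[str]:
    """Labels obtained by swapping র/ৰ wherever the RA touches a Hasanta (assumed context)."""
    options = []
    for i, ch in enumerate(label):
        followed = i + 1 < len(label) and label[i + 1] == HASANTA  # repha position
        preceded = i > 0 and label[i - 1] == HASANTA               # ra-phalā position
        if ch in SWAP and (followed or preceded):
            options.append((ch, SWAP[ch]))
        else:
            options.append((ch,))
    # Every combination of choices, minus the original label itself.
    return {"".join(combo) for combo in product(*options)} - {label}


print(ra_variants("\u0985\u09B0\u09CD\u0995"))  # অর্ক -> {'অৰ্ক'}
```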

4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) is made up of members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the NBGP, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. An attempt is made, however, to keep the fundamental philosophy behind these LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages at EGIDS levels 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for selecting code points for the repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage
Every character proposed should be in everyday use by a particular linguistic community. Characters encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous Use
Linguists should have an unambiguous understanding of how every proposed character is used in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These limits consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but they are spelt out separately for clarity.

4.2.1 External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy. The characters available for the Root Zone code point repertoire are therefore already constrained by the protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart
If a character needed by the script in question is not encoded in Unicode, it cannot be included in the code point repertoire. Such cases are quite rare. They are especially rare in the Bangla-Asamiyā-Manipuri writing system, given the Unicode Consortium's elaborate and exhaustive efforts at character inclusion.

ii. IDNA Protocol
Unicode, as the character-encoding standard, aims to represent each script/language as fully as possible. It has therefore encoded, as far as possible, all the characters a script needs. Domain names, however, are a specialized case governed by an additional protocol, IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from domain names.

iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters used to create Root Zone TLDs, which are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on the set of characters that IDNA allows.

Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD) is allowed by the IDNA protocol but is not permitted in the Root Zone Repertoire under the MSR.

To sum up, the first filter admits only characters that belong to the code block of the given script/language. The IDNA Protocol narrows this set further. Finally, the Maximal Starting Repertoire restricts the character set associated with the language even more.

4.2.1.1 No Punctuation Marks
Since TLDs are identifiers, punctuation marks used in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters will not be included either. Examples are BENGALI ISSHAR (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9).

4.2.1.3 No Rare and Obsolete Characters
Some characters were added to Unicode to accommodate rare forms. These include the Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1). They also include the allographic kāra forms of the latter two: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in line with the Conservatism principle laid down in the Root Zone LGR procedure. There is one exception. The kāra corresponding to VOCALIC RR ৠ (U+09E0), VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in Bangla in certain limited borrowed or Sanskritic words, and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for Classical Sanskrit will not be included. This is also consistent with the Letter principle laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF
The Augmented Backus-Naur Form (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
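The three successive filters of Section 4.2.1 amount to intersecting sets of code points. The Python sketch below shows the shape of that computation. The IDNA2008 PVALID set and the MSR set passed in are tiny illustrative stand-ins, NOT the real tables, and `successive_filters` is a hypothetical name. Filter i is approximated as 'assigned code point in the Bengali block'.

```python
import unicodedata


def in_bengali_block(cp: int) -> bool:
    """Filter i (approximation): assigned code points of the Unicode Bengali block."""
    return 0x0980 <= cp <= 0x09FF and unicodedata.category(chr(cp)) != "Cn"


def successive_filters(candidates: set[int], idna_pvalid: set[int], msr: set[int]):
    """Apply the three filters in order and return the set surviving each stage."""
    stage1 = {cp for cp in candidates if in_bengali_block(cp)}  # i. Unicode chart
    stage2 = stage1 & idna_pvalid                               # ii. IDNA2008
    stage3 = stage2 & msr                                       # iii. MSR
    return stage1, stage2, stage3


# Illustrative inputs only; these are NOT the real IDNA or MSR tables.
candidates = {0x0995, 0x09BD, 0x09FA, 0x0964}  # KA, AVAGRAHA, ISSHAR, DEVANAGARI DANDA
idna_pvalid = {0x0995, 0x09BD}                 # assumed subset of PVALID
msr = {0x0995}                                 # assumed subset of the MSR

s1, s2, s3 = successive_filters(candidates, idna_pvalid, msr)
print(sorted(hex(cp) for cp in s3))  # ['0x995']
```

With these inputs, the Devanagari danda fails filter i and ISSHAR fails filter ii. AVAGRAHA passes IDNA but is dropped by the MSR, mirroring the example above.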

5 Repertoire
The Bangla writing system is represented in UNICODE under the Bengali (Bangla) script name enumerated in ISO 15924, which covers languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR.

On 26th February 2016, the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) for the dis-unification of the Bangla and Asamiyā scripts. At the 8th Meeting of its Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23rd Aug 2017, the BIS decided to refer to ISO the proposal for recognition of the Assamese script in ISO/IEC 10646. Until the UNICODE Consortium takes further action, the Code Point Repertoire under Table 11 will be assumed valid for all three languages above.

Language references for each code point are given in the last column, titled References, of Table 8, the "Code Point Repertoire". For full coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to introduce new complications, except for one particular character. Only a few representative languages at EGIDS levels 1-4 have been chosen for referencing. Together, however, they cover all the code points required for all the languages the NBGP has considered, as listed under Bangla Unicode Points (UNICODE 6.3).

Before the details are presented, it is useful to look at the Bangla Code Point Chart from Maximal Starting Repertoire [MSR] Version 3. The reference glyphs in the code charts below are based on one of many available fonts. They are not prescriptive, because actual fonts, both UNICODE-compatible and TrueType, may vary. Consider the following code point table:


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
Yellow background: all characters included in the [MSR]
Pinkish background: PVALID in IDNA2008 but excluded from the [MSR]
White background: not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen)

Of the code points in the Bangla Unicode block (Figure 1) that are included in the MSR, the following symbols need separate treatment:

ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; 09DC is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; 09DD is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]. 09DF is the preferred code point; however, it is not available for the LGR under the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112], [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112], [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112], [122], [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112], [122], [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122], [125]

Table 8: Bangla Code-Point Repertoire

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences. These enable the conditional inclusion in the repertoire of Bangla LETTER A and LETTER E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, in turn followed by Bangla VOWEL SIGN AA, so that the æ sound (as in English 'bat', 'cat', etc.) can be written.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference


S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112], [122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Points | Glyph | Character Names | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It is used only in sequences, forming the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively.

Table 10b: Excluded Code Points
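Python's standard `unicodedata` module can be used to check how Sections 5.1 and 5.3 affect the three nukta letters. U+09DC, U+09DD and U+09DF are Unicode composition exclusions. NFC, which IDNA2008 requires for labels, therefore always produces the base consonant followed by U+09BC, and never the precomposed code point. The helper names `nfc_codepoints` and `nukta_ok` are hypothetical. This is a minimal sketch of the exclusion rule, not part of the proposal's XML file.

```python
import unicodedata

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য: the only hosts allowed by 5.3


def nfc_codepoints(text: str) -> list[str]:
    """Return the NFC form of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


def nukta_ok(label: str) -> bool:
    """Sketch of Section 5.3: U+09BC may appear only directly after ড, ঢ or য."""
    for i, ch in enumerate(label):
        if ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            return False
    return True


for precomposed in ("\u09DC", "\u09DD", "\u09DF"):  # ড় ঢ় য় as single code points
    print(unicodedata.name(precomposed), "->", nfc_codepoints(precomposed))
# BENGALI LETTER RRA -> ['U+09A1', 'U+09BC']
# BENGALI LETTER RHA -> ['U+09A2', 'U+09BC']
# BENGALI LETTER YYA -> ['U+09AF', 'U+09BC']
```

This is why rows 28, 30 and 43 of Table 8 list the two-code-point sequences instead of U+09DC, U+09DD and U+09DF.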

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:
C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) uses the following operators:

Sr No | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To make the sequences easier to follow for users of other Brahmi scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel (V). It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla is not necessarily restricted to one. The rules illustrated in this document nevertheless make formations such as VDD, VBB and VXX invalid orthographic units. However, sequences such as an anusvara followed by a chandrabindu on the one hand, and a visarga followed by a chandrabindu on the other, are valid and possible, as in হ্যাংচা 'hænchā' and হ্যাঃ 'hæn' respectively. A Visarga or Anusvāra (onuʃʃār) may also follow a Candrabindu in Bangla.
A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya'. It is represented by the sequence <U+09CD BENGALI SIGN VIRAMA (the Bangla Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When it is further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used to transcribe [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel
Example: V অ अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ
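The vowel-sequence combinations of 5.4.2 can be written as a single regular expression. The following is a sketch under stated assumptions, not the LGR's own encoding. It assumes that the independent vowels of the repertoire are U+0985 to U+098B, U+098F, U+0990, U+0993 and U+0994 (ঌ U+098C is excluded per Table 10). It limits the optional tail to the B, D, X, DB and DX combinations listed above, and restricts VHCM to S1 and S2 of Table 9. The names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical.

```python
import re

# Assumed classes (code points from Tables 8 and 9)
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"  # independent vowels, ঌ U+098C excluded
B, D, X = "\u0982", "\u0981", "\u0983"          # anusvara, candrabindu, visarga
VHCM = "[\u0985\u098F]\u09CD\u09AF\u09BE"       # S1 অ্যা and S2 এ্যা only

# Optional tail: B, D, X, DB or DX (at most one of each, candrabindu first)
TAIL = f"(?:{D}?[{B}{X}]|{D})?"
VOWEL_SEQUENCE = re.compile(f"(?:{VHCM}|{V}){TAIL}")


def is_vowel_sequence(s: str) -> bool:
    """True if s is exactly one vowel sequence in the sense of 5.4.2."""
    return VOWEL_SEQUENCE.fullmatch(s) is not None
```

For example, অঁং (VDB) and অ্যাঁঃ (VHCMDX) match, while অংং (VBB) and আ্যা (a VHCM not in Table 9) do not.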

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):
*3(CH)C

Examples:
CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य

5.4.3.5 Subsets
As a representative example of the subsets, only the combination CHC is considered here; the same applies equally to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:
CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
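The consonant sequences of 5.4.3 can be sketched as one regular expression in the same way. This is a hypothetical illustration, not the NBGP's implementation. The consonant class is assumed from Table 8 and includes the CN sequences ড়, ঢ় and য় (base plus U+09BC). A trailing H is accepted only on a single consonant, because 5.4.3.2 lists CH but 5.4.3.5 does not list a trailing H after a cluster. The ABNF repetition *3(CH) becomes the regex quantifier {0,3}.

```python
import re

# Assumed classes, with code points as listed in Table 8
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"                          # CN: ড় ঢ় য় (nukta form)
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                 # kāra (mātrā)
H = "\u09CD"                                                 # hasanta / virama
TAIL = "(?:\u0981?[\u0982\u0983]|\u0981)?"                    # B, D, X, DB or DX

# Either a pure consonant (CH), or *3(CH)C followed by an optional M and tail
CONSONANT_SEQUENCE = re.compile(f"{C}(?:{H}|(?:{H}{C}){{0,3}}{M}?{TAIL})")


def is_consonant_sequence(s: str) -> bool:
    """True if s is exactly one consonant sequence in the sense of 5.4.3."""
    return CONSONANT_SEQUENCE.fullmatch(s) is not None
```

With this pattern, ক্কীং (CHCMB) and ন্ত্র্য (four consonants) match. A cluster of five consonants and the double kāra in কীী are rejected.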

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):
[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: In this context, the condition on KHANDA TA is that the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
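Because the condition in 5.4.4.2 allows only র or ৰ before the Hasanta, the Khanda-Ta sequence can be expressed as a very small pattern. The name `KHANDA_TA_SEQUENCE` is hypothetical, and this is a sketch rather than the LGR's XML rule.

```python
import re

# Z on its own, or [CH]Z where C is র (U+09B0, Bangla) or ৰ (U+09F0, Assamese)
KHANDA_TA_SEQUENCE = re.compile("(?:[\u09B0\u09F0]\u09CD)?\u09CE")


def is_khanda_ta_sequence(s: str) -> bool:
    """True if s is exactly ৎ, র্ৎ or ৰ্ৎ."""
    return KHANDA_TA_SEQUENCE.fullmatch(s) is not None


match = KHANDA_TA_SEQUENCE.search("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE")  # ভর্ৎসনা
print(match.group() == "\u09B0\u09CD\u09CE")  # True: the র্ৎ cluster is found
```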

5.4.5 Special Cases: S and P
Two special cases involving sequences, referred to as S and P in Table 16 under Section 7, can be described briefly here. Let us take up S first. Bangla has two instances where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render a Ya-phalā after অ or এ, the user must type U+09CD plus U+09AF (ya) after the vowel. This is a purely ligatural entity: Ya-phalā plus ā-kāra is used to produce the æ sound, as in English 'bat', 'fat', etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. As we see here, however, Bangla orthography has a deviant feature in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case, P, refers to the formation of repha and ra-phalā in the script. Because RA co-occurs with HASANTA, it either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). In both cases, the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā nevertheless remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.

6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with them. Cases that belong to Group 2, however, are proposed as variants. These cases involve more than visual similarity: they deviate from the widely accepted norms of Bangla akshara formation. Because they can confuse even a careful observer, they are proposed as variants.
Variants arise in a script when two or more forms that look the same are stored as different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type ko by combining a consonant with o-kāra in one keystroke, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell apart the two ko (কো), one typed with a single key and the other with two different keys. Nevertheless, this is not considered a case of variants, because a kāra followed by a kāra is not allowed.
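There is an additional point, outside the NBGP's stated reasoning. IDNA2008 requires labels to be in Unicode Normalization Form C, and U+09CB BENGALI VOWEL SIGN O has the canonical decomposition <U+09C7, U+09BE>. The two-key spelling of কো therefore normalizes to the one-key spelling. The short check below uses Python's standard `unicodedata` module; the variable names are illustrative only.

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09C7", "\u09BE", "\u09CB"

one_key = KA + O_KAR            # ক + ো  (U+0995 U+09CB)
two_keys = KA + E_KAR + AA_KAR  # ক + ে + া  (U+0995 U+09C7 U+09BE)

print(one_key == two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: NFC recomposes ো
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True: NFD decomposes ো
```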

6.1 In-Script Variants
We do, however, propose two cases of true in-script variants in the Bangla script.
CASE I: Among true variants in Bangla, we draw attention to cases in which থ (tha, U+09A5) with Hasanta appears in conjuncts with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

In traditional orthography, these combinations can be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct may occur in initial, medial or final position (as shown below in example no. 1). A user may also mistake the থ for a হ (ha) and type U+09B9, which increases the risk of errors in writing and identifying labels.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in গ্রন্থ grantha)
Fonts that represent the traditional Bangla writing system tend to create this problem. These cases may therefore be treated as variants in Bangla.
CASE II: Another interesting example of variants arises in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise when typing 'repha' (ra + Hasanta) and 'ra-phalā' (Hasanta + ra). A repha can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' and 'A_RA' refer to the same phonetic element. The two resulting ligatures look the same:
(1) B_RA + H + C
(2) A_RA + H + C
where:
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: Because ক and ঠ are identical in Bangla and Assamese, the resulting combinations with repha also look identical; adding the repha makes no visible difference. On similar grounds, a ra-phalā can be formed by two sequences whose final ligatures look the same:
(1) C1 + H + B_RA
(2) C1 + H + A_RA
where C1 = any consonant except Khanda-ta

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phala

Because the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. The repha and ra-phalā cases therefore need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. This does, however, create blocked variant labels. For example, if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. The blocking applies only at the label level: someone else can still register other labels, such as ৰৰ or ৰৰৰৰ.
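Both in-script variant cases can be enumerated mechanically. The sketch below is hypothetical and only lists the variant labels; it does not reproduce the proposal's XML variant definitions or their dispositions. For Case I it assumes that থ and হ are interchangeable only directly after স or ন plus Hasanta. For Case II it assumes that র and ৰ are interchangeable at every position, as the single variant set described above implies.

```python
from itertools import product

RA_VARIANTS = ("\u09B0", "\u09F0")          # র (Bangla RA) ~ ৰ (Assamese RA), Case II
THA, HA, VIRAMA = "\u09A5", "\u09B9", "\u09CD"
THA_HOSTS = ("\u09B8", "\u09A8")            # স, ন: the Case I conjunct hosts


def variant_labels(label: str) -> set[str]:
    """Return label plus every label reachable through the two in-script cases."""
    options = []
    for i, ch in enumerate(label):
        if ch in RA_VARIANTS:
            options.append(RA_VARIANTS)                      # Case II: র ~ ৰ
        elif (ch in (THA, HA) and i >= 2 and label[i - 1] == VIRAMA
              and label[i - 2] in THA_HOSTS):
            options.append((THA, HA))                        # Case I: স্থ ~ স্হ, ন্থ ~ ন্হ
        else:
            options.append((ch,))
    return {"".join(combo) for combo in product(*options)}
```

For "ররর" this yields 2³ = 8 labels, including "ৰৰৰ", which would be blocked. "ৰৰ" is not among them, so it remains registrable, matching the example above.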

6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya). The study took into account the visual and technical confusion these scripts may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here; they are listed in the Appendix (10.3) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
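Tables 14 and 15 imply a simple whole-label test: a Bangla label has a Devanāgarī or Gurmukhī counterpart only if every one of its code points has a listed cross-script variant. The mapping and function names below are hypothetical. This is a sketch of that reasoning, not the LGR's XML.

```python
from typing import Optional

CROSS_SCRIPT = {
    "deva": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম ~ म, ি ~ ि (Table 14)
    "guru": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},  # ম ~ ਸ, ি ~ ਿ (Table 15)
}


def cross_script_variant(label: str, script: str) -> Optional[str]:
    """Return the label in `script` if every code point has a listed variant, else None."""
    mapping = CROSS_SCRIPT[script]
    if label and all(ch in mapping for ch in label):
        return "".join(mapping[ch] for ch in label)
    return None
```

Under this sketch, মি maps to Devanāgarī मि, whereas কি has no counterpart, because ক has no listed variant.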

7. Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the code point repertoire table (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though the other context rules would not allow them.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning that in Bangla, consonant letters (or graphemes) are physically joined to form "clusters". In theory, a cluster joins two to four consonants, which combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters": 290 bi-consonantal combinations, another 80 three-letter combinations, and 10 rarer four-letter ones [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
The following WLE rules, based on the prevalent patterns in Bangla and the restrictions described above, need to be implemented:

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed separately because S ends with M, so Z may also follow S.)
8. V CANNOT be preceded by H. See 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.

Let us now illustrate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations may seem unrealistic or rare, but it does no harm to allow such ligatures: any user can easily create them, even if they are not attested.

7.1.1 Case of V Preceded by H
In multi-word domains, V may sometimes need to follow an H. An example is ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋk ʌv ɪndiə: U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE), meaning 'Bank of India'. Here two words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to fully represent the intended sound. By and large, however, the first word written without an H (U+09CD) is considered sufficient to represent its intended sound.

This situation is unique: it arises only because the hyphen, space and Zero Width Non-Joiner characters are absent from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting it could make two labels (with and without H) look alike to the majority of the linguistic communities, so the NBGP explicitly prohibits it. Depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
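Each of the nine rules constrains the class of the preceding code point, so they can be prototyped directly. The following Python sketch is a hypothetical illustration, not the NBGP's XML implementation. It assumes the class memberships of Table 8, with ঋ U+098B as the last independent vowel, and accepts U+09BC only inside the CN sequences (Section 5.3). It exempts the four code points of S1/S2 from Rule 2, as Table 16 states.

```python
# Code point classes assumed from Table 8; anything else is treated as out of repertoire.
VOWELS = {chr(c) for c in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}
CONSONANTS = {chr(c) for c in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1),
                               0x09B2, *range(0x09B6, 0x09BA), 0x09F0, 0x09F1]}
MATRAS = {chr(c) for c in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC]}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H",
         "\u09CE": "Z", "\u09BC": "N"}          # N = nukta, valid only inside CN
NUKTA_BASES = "\u09A1\u09A2\u09AF"              # ড ঢ য, giving ড় ঢ় য়
S_TAIL = "\u09CD\u09AF\u09BE"                   # H + য + া after অ/এ (S1, S2)


def char_class(ch):
    if ch in VOWELS:
        return "V"
    if ch in CONSONANTS:
        return "C"
    if ch in MATRAS:
        return "M"
    return SIGNS.get(ch)


def is_valid_label(label: str) -> bool:
    classes = [char_class(ch) for ch in label]
    if not label or None in classes:
        return False
    # Mark the four code points of every S occurrence (অ্যা / এ্যা).
    in_s = [False] * len(label)
    for i in range(len(label) - 3):
        if label[i] in "\u0985\u098F" and label[i + 1:i + 4] == S_TAIL:
            in_s[i:i + 4] = [True] * 4

    for i, cls in enumerate(classes):
        # Rule 1: a CN (consonant + nukta) counts as C for whatever follows it.
        prev = None if i == 0 else ("C" if classes[i - 1] == "N" else classes[i - 1])
        if cls == "N" and (i == 0 or label[i - 1] not in NUKTA_BASES):
            return False                        # nukta only in ড় ঢ় য় (5.3)
        if cls == "H" and prev != "C" and not in_s[i]:
            return False                        # Rule 2 (S1/S2 exempt)
        if cls == "M" and prev != "C":
            return False                        # Rule 3
        if cls == "D" and prev not in ("V", "C", "M"):
            return False                        # Rule 4
        if cls in ("X", "B") and prev not in ("V", "C", "M", "D"):
            return False                        # Rules 5 and 6
        if cls == "Z":
            after_p = prev == "H" and i >= 2 and label[i - 2] in "\u09B0\u09F0"
            if prev not in ("V", "C", "M", "D", "B", "X") and not after_p:
                return False                    # Rule 7
        if cls == "V" and prev == "H":
            return False                        # Rules 8 and 9
    return True
```

For instance, the 'Bank of India' label spelled with অফ্ fails Rule 8, while the spelling without the hasanta after ফ passes. A label beginning with ৎ, such as ৎসেরিং, fails Rule 7 because nothing precedes the Khanda Ta.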

7.2 Additional Examples from Bangla ABNF
The following examples help illustrate some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:
্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक
As can be seen, such combinations automatically produce a "golu", or dotted circle, which marks them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied almost automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.
Example:
অ্ अ्
কং্ कं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. Only one B, D or X is permitted after a Consonant, Vowel or kāra (Mātrā); the following combinations are therefore invalid:
Example:
কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. Only one M is permitted after a Consonant.
Example:
কীী कीी

5. M is not permitted after V.
Example:
ইা, ঈৌ ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Example:
কংঃ कंः
কঃং कःं
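The invalid combinations illustrated in 7.2 can also be detected with simple negative patterns. The sketch below is hypothetical and covers only the six cases listed, so it is not a full WLE implementation. The class contents are assumed from Table 8. A negative lookahead exempts the S1/S2 sequences অ্যা and এ্যা from case 2, in line with Table 16.

```python
import re

# Class contents for use inside [...]; code points assumed from Table 8.
V = "\u0985-\u098B\u098F\u0990\u0993\u0994"   # independent vowels
M = "\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC"   # kāra (mātrā)
H, D, B, X = "\u09CD", "\u0981", "\u0982", "\u0983"
S_START = f"[\u0985\u098F]{H}\u09AF\u09BE"     # অ্যা / এ্যা are exempt (Table 9)

INVALID = {
    "1: H, M, B, D or X at the start": f"^[{H}{M}{B}{D}{X}]",
    "2: H after V, B, D, X or M": f"[{B}{D}{X}{M}]{H}|(?!{S_START})[{V}]{H}",
    "3: repeated B, D or X": f"([{B}{D}{X}])\\1",
    "4: more than one M": f"[{M}]{{2}}",
    "5: M after V": f"[{V}][{M}]",
    "6: Anusvara + Visarga or Visarga + Anusvara": f"{B}{X}|{X}{B}",
}
PATTERNS = {rule: re.compile(p) for rule, p in INVALID.items()}


def violations(label: str) -> list[str]:
    """Names of the 7.2 cases that the label violates (empty list if none)."""
    return [rule for rule, pat in PATTERNS.items() if pat.search(label)]
```

For example, কংং reports only case 3, কংঃ only case 6, and অ্যা reports nothing.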

8. Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9. References

[100] Unicode Consortium. 2017. The Unicode Standard, Version 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number, 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 12 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has Happened So Far in Terms of Script Reforms'. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.

10. Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and certain restriction rules apply when it is used for a specific language or script. In other words, some of the formalism's structures do not necessarily apply to a given language. Restriction rules are set in place to handle such cases; they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not normally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔṛo) from the poem 'Lichu-chor' by Kazi Nazrul Islam. There are also instances of word-initial ya (য়) in names, mostly of foreign origin; for example, Yaqub may be written as য়াকুব. More recently, transliterations of some Chinese and Japanese names into Bangla have produced Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only when C is ra (র, U+09B0), as in র্ৎ in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
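The three Bangla-specific restrictions above can be combined into a single check. The function name is hypothetical. The code follows this appendix, which covers Bangla only, so rule 2 accepts only র (U+09B0) before ৎ, unlike the Assamese-inclusive condition of 5.4.4.2. Word-initial য় is deliberately left unchecked, since the text above cites attested exceptions.

```python
KHANDA_TA, RA, VIRAMA = "\u09CE", "\u09B0", "\u09CD"
NOT_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}                              # ৎ, ঞ, ঙ
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def meets_appendix_restrictions(label: str) -> bool:
    """Hypothetical check of restriction rules 1 to 3 of Appendix 10.1 (Bangla only)."""
    if not label or label[0] in NOT_INITIAL:
        return False                                    # rule 1
    for i, ch in enumerate(label):
        if (ch == KHANDA_TA and i >= 1 and label[i - 1] == VIRAMA
                and (i < 2 or label[i - 2] != RA)):
            return False                                # rule 2: only র + ্ before ৎ
        if (ch in VOWELS and label[i + 1:i + 2] == VIRAMA
                and label[i:i + 4] not in ALLOWED_VHCM):
            return False                                # rule 3: only অ্যা / এ্যা
    return True
```

Under this sketch, ভর্ৎসনা and অ্যাসিড pass, while ৎসেরিং, ক্ৎ and আ্যা are rejected.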

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi), which accountants (perhaps of the Kāyastha community) in eastern Uttar Pradesh and Bihar used widely during the 1880s. Sylheti Nagarī or Siloti (129) has had several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' and 'Muhammad Nagarī'. Shah Jalal is said to have brought the script with him to Sylhet in the 13th-14th century (138), although some suggest that it was invented by the Afghan rulers of Sylhet (137). Others credit Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).

Table 17: The Script Table of Sylheti Nagarī or Siloti

13 This section specifically takes up restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints


11 Appendix-II: Bengali consonants and their allographs

Consonant | Phonetic value | Allographs (clusters, with the transparent form in the Bangla Akademi font)

[Cluster and transparent-form glyphs (Bangla Akademi font) not recoverable from the source text.]

প | p |
ফ | pʰ |
ব | b |
ভ | bʱ |
ত | t | There is a marked form of ত + ্ = ৎ (khaṇḍa-ta).
থ | tʰ |
দ | d |
ধ | dʱ |
ট | ʈ |
ঠ | ʈʰ |
ড | ɖ |
ঢ | ɖʱ |
চ | tʃ |
ছ | tʃʰ |
জ | dʒ |
ঝ | dʒʱ | Not privileged enough to have clusters as a first member.
ক | k |
খ | kʰ | Not privileged enough to have clusters as a first member.
গ | g |
ঘ | gʱ |
ঞ | (none) | This letter does not have any particular phonetic value, but is mostly pronounced as n.
ণ | n |
ঙ, ং | ŋ | In some contexts ঙ as the first member of a cluster is replaced by ং, as in কং, অং.
ম | m |
ন | n |
শ | ʃ |
ষ | ʃ |
স | s, ʃ |
হ | h |
ড় | ɽ |
ঢ় | ɽʱ | Not privileged enough to have clusters.
য | dʒ | The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word it is a vowel æ (as in শ্যামল, র‍্যাপার, etc.). But after a non-initial consonant it just doubles it in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য. Clusters: ক্য (ক্ + য), স্য (স্ + য), র‍্য (র + য). [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]
র | r | Two manifestations: (i) রেফ repʰ, as the first member of a cluster, e.g. র্প, র্ৎ, র্য, র্ধ্ব (র + ধ + ব; earlier র্দ্ধ্ব = র + দ + ধ + ব, a four-term cluster), placed over the following consonant; (ii) র-ফলা rɔ-pʰɔla, as the second or third member of a cluster, placed under the consonant it follows.
ল | l |
ঃ | h | Word-finally [h]; word-medially it doubles the pronunciation of the following consonant (e.g. অঃ, কঃ).


PERIOD | DESCRIPTION | NAMES
4th-5th Century AD | The next stage of its evolution was into the Gupta script, named after the Gupta royal dynasty. | Gupta script
7th Century AD | Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī, giving rise to the Kuṭila-lipi. | Kuṭila-lipi
8th Century AD | Some copperplate inscriptions are found: in Khalimpur, Bangladesh, from the reign of Dharmapāla; from Monghyr and Nālandā in Bihar, of the time of Devapāla; and from Jagjīvanapura in West Bengal, of the reign of Mahendrapāla. | Siddhamātṛkā
9th Century AD until 1025 AD | Proto-Bangla characteristics develop in rudimentary forms. An important landmark in the development of the Bangla script is the Ramaganja copperplate inscription of Mahāmāndalika, dating from the last quarter of the eleventh century AD. | Proto-Bangla Script & Language
12th-13th Century AD | A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is found in the inscriptions of the Varman, Sena and Deva rulers of the twelfth and thirteenth centuries. | Matured Proto-Bangla
14th-15th Century AD | The characteristics of the typical Bangla script began to develop, as can be seen in the copperplate inscription of Vijayamānikya-I of Tripura dated 1478 AD, which also illustrates forms of Bangla letters in the fifteenth century AD. | Modern Bangla script era begins (see Ross 1999)
16th-17th Century AD | Both the chart of the Bangla alphabet appended to China Monuments (published from Amsterdam in 1667) and The Code of Gentoo Law (published from London in 1776) show the Bangla alphabet. They show 16 vowel letters, including the long 'ৡ' (ḹ), Anusvāra and Visarga, and 34 consonants. | Printed charts of Bangla
18th-19th Century AD | Charles Wilkins develops printing in Bangla in 1778, and Vidyasagar reforms it. | Bangla type fonts

Table 2 – Development of the Bangla Writing System


The overall development of the Bangla script from the Kuṭila-lipi period to Modern Bangla can be seen in Table 3 ([102] and [146]; see also the web page in [147]).

Table 3 – Bangla Script in Different Centuries

3.2 Languages Considered

Below is the tabular representation of the languages using Bangla script that are placed on the EGIDS scale 1-6 (see [117] for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some had used Bangla script earlier (such as Bodo) or used it in West Bengal at some point in time (Santali), but have later shifted to another writing system: Bodo is now written in Nāgarī or Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol Chiki (145). For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:

EGIDS Scale 1 | EGIDS Scale 2 | EGIDS Scale 3 | EGIDS Scale 4 | EGIDS Scale 5 | EGIDS Scale 6

Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansī
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm, Chin

Table 4 – Main languages in India and Bangladesh that use Bangla script on the EGIDS scale

3.3 Notable Features of Bangla Script [150]

The Bangla writing system has certain features that show how it has to be written, or how type-setting in Bangla could be done. This section is followed by a section that explains the code points (and fixed code-point sequences) which show certain distinctive characteristics of Bangla and which make up the repertoire. The next sections will also cover the 'akshar'-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and pronunciation.

The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to contain an accompanying 'inherent' vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].

Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before, after, or both before and after the consonant they follow in pronunciation [105].

All Bangla consonants, when pronounced in isolation, are uttered with an inherent vowel -ɔ; hence ক 'k', খ 'kh' or গ 'g' are usually pronounced as [kɔ], [khɔ] or [gɔ], etc. Phonologically, the Bangla vowel -ɔ corresponds to the Hindi schwa ə.

When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক্ (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ্ (ñ) + জ (j) = ঞ্জ (ñj), জ্ (j) + ঞ (ñ) = জ্ঞ (jñ), হ্ (h) + ম (m) = হ্ম (hm). There are other issues also: র, as the second member of a cluster, is reduced to a secondary symbol, e.g. প্ (p) + র (r) = প্র (pr), ষ্ (ṣ) + ট্ (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra "camel"). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word it is a vowel æ (as in শ্যামল (syamala) "green", র‍্যাপার (ryapara) "wrapper", etc.). But after a non-initial consonant it just doubles it in pronunciation (as in কার্য, ধার্য, etc.). The র্ (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry). In the case of দ্ (d) + ধ (dh), গ্ (g) + ধ (dh) and ন্ (n) + ধ (dh), the shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র্ (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta "southwest"), used mostly in cases of Classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel only applies to the final consonant of the cluster.

In consonant clusters, many consonants take a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st), etc. র has two allographs apart from its full shape: one is 'repha', as found in র্ক (rk) and র্প (rp), and the other is ra-phala, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another case, where the cerebral nasal consonant sign takes a queer shape [151].

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) (150), or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].


As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called 'allographs', with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113]:

1) Below (e.g. কু (ku), ন্ত (nta), হ্র (hra), etc.)

2) Above (e.g. চঁ (cã), র্ক (rka), etc.)

3) Right side (e.g. কা (kā), কং (kaṅ), etc.)

4) Left side (e.g. কে (ke))

5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki), etc.)

6) Right side and above simultaneously (e.g. কী (kī))

7) Right side and left side simultaneously (e.g. কো (ko))

8) Right side, left side and above simultaneously (e.g. কৌ (kau))

As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which are relevant for ABNF, let us consider the following. Besides some simple vowel modifiers called 'kārs' in Bangla (also referred to as mātrā in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,

ই U+0987 BENGALI LETTER I is substituted by pre-posed ি U+09BF BENGALI VOWEL SIGN I,

ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or

উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, by marking below the primary grapheme,

there are some special vowel modifiers of উ, as in the following combined letters:

গু (gu) rather than writing as গ (g) + ু (u)

রু (ru) rather than writing as র (r) + ু (u)

শু (śu) rather than writing as শ (ś) + ু (u)

হু (hu) rather than writing as হ (h) + ু (u)

ন্তু (ntu) rather than writing as ন্ (n) + ত (t) + ু (u)

Similarly, there could be vowel modifiers of ঊ or '(long) ū' as well, e.g. ভ্ (bh) + র (r) + ূ (ভ্রূ bhrū "eyebrow"), শ্ (ś) + র (r) + ূ (শ্রূ śrū), ঋ-kār (ṛ) after হ (h) (হৃ hṛ), etc.

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In this, there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and it has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, go on side by side, operative in different domains.

However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jātīya Śikṣākrama and Pāṭhyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012 (6). After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the 'homogenization of Bangla spelling' by 1988. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. This was published as a 'Spelling Dictionary' titled Ākādemi Bānāna Abhidhāna (1997), which was considerably more comprehensive than 'The University of Calcutta proposals' made in 1936. Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha ('transparent') and Aswaccha ('opaque' or non-transparent), even adding Ardha Swaccha ('half transparent') in between the two. Some sample examples are:

6 Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.

Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.

Opaque, where neither of the two can be (easily) recognized: ক্ষ kṣ (ক্ k + ষ ṣ), জ্ঞ jñ (জ্ j + ঞ ñ), ঙ্গ ṅg (ঙ্ ṅ + গ g), হ্ম hm (হ্ h + ম m).

Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In the case of three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র str (স্ s + ত্ t + র r), ষ্ট্র ṣṭr (ষ্ ṣ + ট্ ṭ + র r), etc.

There were in fact two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was 'one word, one spelling' to the greatest extent possible [151].

Below we place a statement of the most salient changes that affect the consonant + vowel combinations [153]:

a. The variants of the short u (হ্রস্ব উ-কার hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So the traditional ligature forms of গু (gu), রু (ru), শু (śu) and হু (hu) are now written as the base letter followed by the regular ু, and therefore so are cluster + short u forms such as ন্তু (ntu: ন + ্ + ত + উ-kār) and স্তু (stu: স + ্ + ত + উ-kār).

b. The variants of long u (দীর্ঘ ঊ-কার dīrgha u-kāra) have also been reduced in the same way: রূ (rū), ভ্রূ (bhrū: ভ bh + ্ + র r + ঊ ū), দ্রূ (drū: দ d + ্ + র r + ঊ ū), শ্রূ (śrū: শ ś + ্ + র r + ঊ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) is now written with the regular ৃ sign.

Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash will mean that the traditional cluster shape precedes it while the Bangla Akademi innovation follows) [153]:

ব্ধ bdh (ব্ b + ধ dh), দ্ধ ddh (দ্ d + ধ dh), ন্থ nth (ন্ n + থ th), ঞ্চ ñc (ঞ্ ñ + চ c), ঞ্ছ ñch (ঞ্ ñ + ছ ch), ঞ্জ ñj (ঞ্ ñ + জ j), ক্ত kt (ক্ k + ত t), ক্র kr (ক্ k + র r), গ্ধ gdh (গ্ g + ধ dh), ঙ্ক ṅk (ঙ্ ṅ + ক k), ঙ্গ ṅg (ঙ্ ṅ + গ g), ষ্ণ ṣṇ (ষ্ ṣ + ণ ṇ), ন্ধ্র ndhr (ন্ n + ধ্ dh + র r), ণ্ড্র ṇḍr (ণ্ ṇ + ড্ ḍ + র r), ক্ত্র ktr (ক্ k + ত্ t + র r).

3.3.1 The Consonants

As per the traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'vargas' (pronounced 'barga' in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified according to their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated to voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:

'Varga' or Set | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | ক 'K' U+0995 | খ 'KH' U+0996 | গ 'G' U+0997 | ঘ 'GH' U+0998 | ঙ 'NG' U+0999
Palatal | চ 'C' U+099A | ছ 'CH' U+099B | জ 'J' U+099C | ঝ 'JH' U+099D | ঞ 'NY' U+099E
Retroflex | ট 'TT' U+099F | ঠ 'TTH' U+09A0 | ড 'DD' U+09A1 | ঢ 'DDH' U+09A2 | ণ 'NN' U+09A3
Dental | ত 'T' U+09A4 | থ 'TH' U+09A5 | দ 'D' U+09A6 | ধ 'DH' U+09A7 | ন 'N' U+09A8
Bilabial | প 'P' U+09AA | ফ 'PH' U+09AB | ব 'B' U+09AC | ভ 'BH' U+09AD | ম 'M' U+09AE

Table 5 – Varga classification of Bangla consonants

(Falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasals, called the five 'vargas')

Non-Varga:
য 'Y' U+09AF
য় 'YY' U+09DF
র 'R' U+09B0
ড় 'RR' U+09DC
ঢ় 'RH' U+09DD
ল 'L' U+09B2
শ 'SH' U+09B6
ষ 'SS' U+09B7
স 'S' U+09B8
হ 'H' U+09B9

Table 6 – Non-Varga consonants (not falling into any of the five categories)
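The varga layout of Tables 5 and 6 mirrors the order of the Unicode Bengali block. The following is a minimal sketch, under the assumption that only the code points listed in Table 5 matter (`VARGA_START` and `varga_row` are illustrative names, not part of the LGR XML). It shows that each varga occupies five consecutive code points, except that the bilabial row starts at U+09AA because U+09A9 is unassigned:

```python
import unicodedata

# First code point of each varga in the Unicode Bengali block (per Table 5).
VARGA_START = {
    "Velar": 0x0995,
    "Palatal": 0x099A,
    "Retroflex": 0x099F,
    "Dental": 0x09A4,
    "Bilabial": 0x09AA,  # U+09A9 is unassigned, so the bilabial row is offset
}

def varga_row(name):
    """Return the five consonants of a varga in Table 5 column order."""
    start = VARGA_START[name]
    return [chr(start + i) for i in range(5)]

if __name__ == "__main__":
    for name in VARGA_START:
        row = varga_row(name)
        print(name, " ".join(row), " ".join(f"U+{ord(c):04X}" for c in row))
    # The non-varga letter HA lives at U+09B9 in the Bengali block.
    print(unicodedata.name("\u09B9"))
```

Because the sketch derives the grid arithmetically, it also serves as a cross-check on the code points printed in the tables.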

3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)

As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma' (7) (= 'Dari' in Bangla), as preferred in Unicode (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virāma has been adopted in Unicode in a sense that is different from the traditional definition in grammar, and hence it requires some explanation here. Given the importance of the document, this note should be a part of this LGR document, so that anybody referring to it is able to find the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, "kill" its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases, this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara (8) has to be bounded empirically. This is an observation based on the CIIL-EMILLE corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages happening on the web, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
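At the code point level, a conjunct is simply consonants joined by U+09CD. This is a minimal sketch of that idea; the helper names `conjunct` and `consonant_count` are hypothetical, and the consonant range used for counting is an approximation (it also spans a few unassigned code points). In line with the text above, it deliberately imposes no upper bound on cluster length:

```python
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)

def conjunct(*consonants):
    """Join consonants with Hasanta, e.g. ক + ষ -> ক্ষ (U+0995 U+09CD U+09B7)."""
    return VIRAMA.join(consonants)

def consonant_count(cluster):
    """Count consonant letters (approximation: U+0995..U+09B9 and U+09DC..U+09DF)."""
    return sum(
        1
        for ch in cluster
        if 0x0995 <= ord(ch) <= 0x09B9 or 0x09DC <= ord(ch) <= 0x09DF
    )

if __name__ == "__main__":
    ksa = conjunct("\u0995", "\u09B7")           # ক্ষ
    stra = conjunct("\u09B8", "\u09A4", "\u09B0")  # স্ত্র
    print(ksa, [f"U+{ord(c):04X}" for c in ksa])
    print(stra, consonant_count(stra))  # no maximum is enforced
```

Whether the result renders as a single ligature or with a visible Hasanta depends on the font, not on the stored sequence.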

3.3.3 Vowels

Separate symbols exist for all 'swara', or vowels, in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called 'kār' in Bangla or Mātrā in Nāgarī (9) is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kārs (mātrās) for all vowels except অ (pronounced -ɔ). The correlation is shown as follows:

7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).

8 This term needs to be disambiguated. Akṣara also means 'syllable' in Indian grammatical traditions.

Vowel | Corresponding vowel sign (kār / mātrā)
অ 'A' U+0985 | (none: inherent vowel)
আ 'AA' U+0986 | া U+09BE
ই 'I' U+0987 | ি U+09BF
ঈ 'II' U+0988 | ী U+09C0
উ 'U' U+0989 | ু U+09C1
ঊ 'UU' U+098A | ূ U+09C2
ঋ Vocalic 'R' U+098B | ৃ U+09C3
ৠ Vocalic 'RR' U+09E0 | ৄ U+09C4
ঌ Vocalic 'L' U+098C | ৢ U+09E2
ৡ Vocalic 'LL' U+09E1 | ৣ U+09E3
এ 'E' U+098F | ে U+09C7
ঐ 'AI' U+0990 | ৈ U+09C8
ও 'O' U+0993 | ো U+09CB
ঔ 'AU' U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ 'A' U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ 'A' U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ 'A' U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7 – Bangla Vowels with corresponding kārs

9 Although the term 'Mātrā' in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, which is typically available in Hindi and Bangla but missing in Gujarati.
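Table 7 can be read as a lookup from independent vowel to kār. A minimal sketch, assuming only the pairs listed in the table (the names `KAR` and `with_vowel` are hypothetical), also shows two properties that matter for NFC labels: a pre-posed kār such as ি is stored after the consonant even though it is drawn before it, and the two-part signs ো and ৌ have canonical decompositions into ে + া and ে + ৗ (U+09D7):

```python
import unicodedata

# Independent vowel -> dependent vowel sign (kār), following Table 7.
# অ (U+0985) has no kār: it is the inherent vowel.
KAR = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি (stored after the consonant, rendered before it)
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u09E0": "\u09C4",  # ৠ -> ৄ
    "\u098C": "\u09E2",  # ঌ -> ৢ
    "\u09E1": "\u09E3",  # ৡ -> ৣ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}

def with_vowel(consonant, vowel):
    """Write consonant + vowel in logical (storage) order."""
    if vowel == "\u0985":  # inherent vowel: nothing to add
        return consonant
    return consonant + KAR[vowel]

if __name__ == "__main__":
    ko = with_vowel("\u0995", "\u0993")  # কো
    print(unicodedata.normalize("NFD", ko) == "\u0995\u09C7\u09BE")
    print(unicodedata.normalize("NFC", "\u0995\u09C7\u09BE") == ko)
```

Since IDNA labels must be in NFC, a label containing ে immediately followed by া or ৗ will carry the precomposed ো or ৌ rather than the two-part sequence.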

3.3.4 The Anusvāra or onuʃʃār (ং U+0982)

The Anusvāra, or onuʃʃār, in Bangla at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'nasal consonant + Hasanta + consonant' where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations where a non-velar is the last member of the combination, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
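The alternation between ং and a velar conjunct changes the underlying code points, not just the rendering, so লংকা and লঙ্কা are different strings. The following hypothetical folding helper is for illustration only; the LGR does not define this mapping as a rule or variant. It rewrites ঙ + Hasanta before a velar stop as ং so that the two spellings can be compared:

```python
import re

ANUSVARA = "\u0982"  # ং
# ঙ (U+0999) + Hasanta (U+09CD) followed by a velar stop ক খ গ ঘ (U+0995..U+0998)
NGA_VIRAMA_VELAR = re.compile("\u0999\u09CD([\u0995-\u0998])")

def fold_velar_nasal(word):
    """Rewrite ঙ্ + velar as ং + velar (illustrative only)."""
    return NGA_VIRAMA_VELAR.sub(ANUSVARA + r"\1", word)

if __name__ == "__main__":
    conjunct_form = "\u09B2\u0999\u09CD\u0995\u09BE"  # লঙ্কা
    anusvara_form = "\u09B2\u0982\u0995\u09BE"        # লংকা
    print(conjunct_form == anusvara_form)                    # False: different code points
    print(fold_velar_nasal(conjunct_form) == anusvara_form)  # True after folding
```

Non-velar cases such as ল্যাংটা are left untouched by this sketch, matching the text's point that ং there is not a mechanical substitute for a conjunct.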

3.3.5 Nasalization: Candrabindu (ঁ U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
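A quick way to confirm a documented code-point sequence is to list each character with its Unicode name. This small sketch uses only the standard `unicodedata` module; the helper name `describe` is hypothetical. Applied to চাঁদ, it shows that the candrabindu is stored after the vowel sign it nasalizes:

```python
import unicodedata

def describe(text):
    """List each code point of a string with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

if __name__ == "__main__":
    chand = "\u099A\u09BE\u0981\u09A6"  # চাঁদ 'moon'
    for line in describe(chand):
        print(line)
```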

3.3.6 Nukta (় U+09BC)

The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of 'Nukta' is otherwise completely unnatural in Bangla.

It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example, 09DC for ড় [ṛ], 09DD for ঢ় [ṛh] and 09DF for য় [y], each as a single keystroke), the IDNA protocol requires that we treat them as base + nukta sequences rather than as the atomic code points allocated in the Unicode Bengali block. It may be noted that there could be sporadic attempts or cases of writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the sanctity of the loanword. This amounts to using the Bangla writing system like the IPA script. It is, however, not in use in printed Bangla.
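The effect of the composition exclusions can be checked directly with Python's standard `unicodedata` module. In this minimal sketch (the names `ATOMIC_TO_SEQUENCE` and `to_label_form` are hypothetical), normalizing each atomic letter to NFC yields the base + nukta sequence, and the sequence stays decomposed under NFC:

```python
import unicodedata

# Atomic letters that NFC decomposes (composition exclusions) -> base + nukta (U+09BC)
ATOMIC_TO_SEQUENCE = {
    "\u09DC": "\u09A1\u09BC",  # ড় RRA -> ড DDA + nukta
    "\u09DD": "\u09A2\u09BC",  # ঢ় RHA -> ঢ DDHA + nukta
    "\u09DF": "\u09AF\u09BC",  # য় YYA -> য YA + nukta
}

def to_label_form(text):
    """Normalize to NFC, as IDNA requires; the three atomic letters come out as sequences."""
    return unicodedata.normalize("NFC", text)

if __name__ == "__main__":
    for atomic, sequence in ATOMIC_TO_SEQUENCE.items():
        print(f"U+{ord(atomic):04X}", to_label_form(atomic) == sequence)
```

This is why the atomic code points can never appear in a valid label: any NFC-conforming input already contains the two-code-point form.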

3.3.7 Visarga biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit, and represents a sound very close to h. One could quote as an example দুঃখ duhkho "sorrow", "unhappiness" (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো'পরাণি). It is rarely used now, even in other languages using Bangla script. In the case of the LGR, the Avagraha is not part of the repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in section 11 for a complete list of Bangla consonants and their allographs.
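Because several Brāhmī-derived blocks use parallel layouts, a code-point listing can silently drift into a neighbouring block (for example, Devanagari U+0926 instead of Bengali U+09A6 for DA). A minimal sketch of a block check follows; the helper name `non_bengali` is hypothetical, and the block range U+0980..U+09FF is the Unicode Bengali block:

```python
import unicodedata

BENGALI_BLOCK = range(0x0980, 0x0A00)  # U+0980..U+09FF

def non_bengali(text):
    """Return (code point, name) for every character outside the Bengali block."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) not in BENGALI_BLOCK
    ]

if __name__ == "__main__":
    duhkho = "\u09A6\u09C1\u0983\u0996"  # দুঃখ with Bengali code points
    mixed = "\u0926\u0941\u0983\u0916"   # same shape of sequence using Devanagari DA, U, KHA
    print(non_bengali(duhkho))  # []
    print(non_bengali(mixed))
```

A label mixing the two blocks would be rejected by the LGR anyway, so this check is mainly useful for proofreading documentation and test-label files.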

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)

This note is pertinent to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points that have been provided by the Unicode Standard to instruct the rendering of a string where the script has the option between joining and non-joining characters. Without these control codes, the string may be rendered in a form other than what is intended.

Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) "Satyajit" and সৎ (sat) "honest". This is typed as follows: ta + Hasanta + ZWJ (U+200D). (Since Unicode 4.1, khaṇḍa-ta also has its own code point, U+09CE BENGALI LETTER KHANDA TA.)

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending upon the context. E.g. র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য, one has to type র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words demanding ya-phalā after ra, which otherwise could not be typed (rendered), because ra + hasanta + antastha ja in medial and/or final position already has another meaning: ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent the explicit Hasanta or Halant. In order to avoid conjunct formation in cases where there is an explicit Hasanta before the succeeding consonant, the ZWNJ is used.

Consonant + Hasanta + ZWNJ + consonant = explicit Hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. Because they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP.

The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted, and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
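The sequences above, and the reason their exclusion from the root zone matters, can be shown at the code point level. In this sketch the names are illustrative, and `strip_joiners` is a hypothetical stand-in for what happens when a label cannot carry ZWJ/ZWNJ. Once the joiners are dropped, র‍্য (ra + ya-phalā) collapses into the same sequence as র্য (repha on ya), and প্রাক্‌কথন loses its explicit Hasanta:

```python
RA, YA, VIRAMA = "\u09B0", "\u09AF", "\u09CD"
ZWJ, ZWNJ = "\u200D", "\u200C"

repha_on_ya = RA + VIRAMA + YA            # র্য, as in কার্য
ra_with_yaphala = RA + ZWJ + VIRAMA + YA  # র‍্য, as in র‍্যাপার

# প্রাক্‌কথন: ZWNJ after the Hasanta keeps ক্ visible instead of forming ক্ক
prakkathan = "\u09AA\u09CD\u09B0\u09BE\u0995\u09CD" + ZWNJ + "\u0995\u09A5\u09A8"

def strip_joiners(label):
    """Drop ZWJ/ZWNJ, which the root zone does not permit."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")

if __name__ == "__main__":
    # Without ZWJ the two renderings collapse into one code point sequence.
    print(strip_joiners(ra_with_yaphala) == repha_on_ya)  # True
    print(len(prakkathan), len(strip_joiners(prakkathan)))
```

In other words, without the joiners a root zone label can express only one rendering of each of these pairs.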

3.3.9 Use of Ya-phalā

Ya-phalā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

For rendering ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the combination of ya-phalā and ā-kār is used to elicit the æ sound, as in English 'acid' অ্যাসিড, 'association' অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
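Because Hasanta after a vowel is otherwise anomalous, a label check needs a narrow exception for these two sequences. The following is a hypothetical sketch of such a check, not the WLE rule from the LGR XML. It rejects any independent vowel followed by Hasanta unless the vowel is অ or এ and the next character is য:

```python
VIRAMA, YA = "\u09CD", "\u09AF"
A, E = "\u0985", "\u098F"  # অ, এ
# Independent vowel letters U+0985..U+0994 (a few are unassigned) plus ৠ, ৡ
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)} | {"\u09E0", "\u09E1"}

def hasanta_after_vowel_ok(label):
    """Allow vowel + Hasanta only in the ya-phalā sequences অ্য / এ্য (hypothetical check)."""
    for i, ch in enumerate(label[:-1]):
        if ch in INDEPENDENT_VOWELS and label[i + 1] == VIRAMA:
            if ch not in (A, E) or label[i + 2 : i + 3] != YA:
                return False
    return True

if __name__ == "__main__":
    acid = A + VIRAMA + YA + "\u09BE\u09B8\u09BF\u09A1"  # অ্যাসিড
    print(hasanta_after_vowel_ok(acid))
    print(hasanta_after_vowel_ok("\u0987\u09CD\u09AF"))  # ই + Hasanta is not allowed
```

A consonant followed by Hasanta (an ordinary conjunct) is never affected by this check.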

3.3.10 Formation of Ra-phalā and Repha Sequences

This case refers to the formation of repha and ra-phalā, as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA, Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks into their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it is noteworthy that the REPHA can also occur with KHANDA TA. In this context of KHANDA TA, the condition is that C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
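How a blocking variant catches the two RAs can be sketched as a folding step: map ৰ to র wherever it sits next to a Hasanta, and treat two distinct labels that fold to the same string as colliding. This is a hypothetical illustration, not the normative definition, which lives in the LGR XML with its own context rules. Both helper names are assumptions:

```python
RA_BENGALI, RA_ASSAMESE, VIRAMA = "\u09B0", "\u09F0", "\u09CD"

def fold_ra(label):
    """Map ৰ (U+09F0) to র (U+09B0) wherever it is adjacent to a Hasanta (repha/ra-phalā)."""
    out = list(label)
    for i, ch in enumerate(label):
        if ch != RA_ASSAMESE:
            continue
        before = i > 0 and label[i - 1] == VIRAMA
        after = i + 1 < len(label) and label[i + 1] == VIRAMA
        if before or after:
            out[i] = RA_BENGALI
    return "".join(out)

def ra_variants_collide(a, b):
    """True if two labels differ only in which RA appears next to a Hasanta."""
    return a != b and fold_ra(a) == fold_ra(b)

if __name__ == "__main__":
    arka_bn = "\u0985\u09B0\u09CD\u0995"  # অর্ক with Bengali RA
    arka_as = "\u0985\u09F0\u09CD\u0995"  # same rendering with Assamese RA
    print(ra_variants_collide(arka_bn, arka_as))
```

A standalone ৰ with no adjacent Hasanta is left alone, reflecting that Variant Set 4 applies only in the Hasanta context.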

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in linguistics (especially NLP and computational linguistics), literature, language, history and epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

41 GuidingPrinciplesTheNBGP adopts followingbroadprinciples for selection of code-points in the code-pointrepertoireacrosstheboardforalltheNeo-Brāhmīscriptswithinitsambit

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rule-sets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone is a very special case at the top of protocol hierarchies. The canvas of characters available for selection as part of the Root Zone code point repertoire is therefore already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters that are needed by the script in question, a character that is not encoded in Unicode cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode is the character-encoding standard that provides the maximum possible representation of a given script/language, and it has encoded, as far as possible, all the characters needed by the script. However, the domain name is a specialized case and is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters which will be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters.
Example: Bangla Sign Avagraha ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down. Finally, an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
Because TLDs are identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BANGLA ISSHAR ৺ (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.
4.2.1.3 No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms. These include Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), as well as the allographic -kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).

5. Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR.
It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. At its 8th Meeting of the Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23 August 2017, the BIS decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 8 is valid for all three languages listed above.
For each of the code points, language references are given in the last column, titled "References and Comment", of Table 8, "Bangla Code-Point Repertoire". For full coverage of the Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to introduce new complications, except for one particular character. Although only a few representative languages on EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3).
Before the details are presented, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. Note that the shapes of the reference glyphs in the code charts below are based on one of many available fonts. They are not prescriptive, because actual fonts, both Unicode-compatible and TrueType ones, may vary. Consider the following code point table:


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Of the code points in the Bangla Unicode block (Figure 1) that are included in the MSR, the following symbols need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ◌ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
2 | U+0982 | ◌ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
3 | U+0983 | ◌ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]; 09DC is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]; 09DD is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]; 09DF is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
50 | U+09BE | ◌া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
51 | U+09BF | ◌ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
52 | U+09C0 | ◌ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
53 | U+09C1 | ◌ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
54 | U+09C2 | ◌ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
55 | U+09C3 | ◌ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
56 | U+09C4 | ◌ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112][122]
57 | U+09C7 | ◌ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
58 | U+09C8 | ◌ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
59 | U+09CB | ◌ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
60 | U+09CC | ◌ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
61 | U+09CD | ◌্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122][125]

Table 8: Bangla Code-Point Repertoire

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. They allow the conditional inclusion of Bangla LETTER A and LETTER E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA. These sequences enable the æ sound, as in English 'bat', 'cat', etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
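As a quick cross-check of Table 9, the two sequences can be rebuilt from their Unicode character names with Python's standard `unicodedata` module. This is a minimal sketch. The helper names `build_sequence` and `code_points` are illustrative only and are not part of the LGR.

```python
import unicodedata

# Table 9 sequences, listed by the Unicode character names used in the table.
SEQUENCES = {
    "S1": ["BENGALI LETTER A", "BENGALI SIGN VIRAMA",
           "BENGALI LETTER YA", "BENGALI VOWEL SIGN AA"],
    "S2": ["BENGALI LETTER E", "BENGALI SIGN VIRAMA",
           "BENGALI LETTER YA", "BENGALI VOWEL SIGN AA"],
}


def build_sequence(names):
    """Turn a list of Unicode character names into the corresponding string."""
    return "".join(unicodedata.lookup(name) for name in names)


def code_points(text):
    """Format a string as space-separated hex code points, e.g. '0985 09CD'."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


for label, names in SEQUENCES.items():
    print(label, code_points(build_sequence(names)))
```

Running it prints the code point column of Table 9: `S1 0985 09CD 09AF 09BE` and `S2 098F 09CD 09AF 09BE`.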

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only within a sequence, in the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for Exclusion
U+09BC | ◌় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points
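This sketch assumes that labels are compared in Unicode Normalization Form C, which IDNA2008 requires of U-labels. Under that assumption, the precomposed letters U+09DC, U+09DD and U+09DF cannot appear in a label. They are Unicode composition exclusions, so NFC always rewrites each of them as base letter + U+09BC. The sketch below uses Python's standard `unicodedata` module; `nfc_code_points` is an illustrative helper name.

```python
import unicodedata

# Precomposed letters that the LGR cannot use (Table 8, rows 28, 30 and 43).
PRECOMPOSED = {"\u09DC": "RRA", "\u09DD": "RHA", "\u09DF": "YYA"}


def nfc_code_points(text):
    """List the code points of the NFC form of text as 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]


for ch, name in PRECOMPOSED.items():
    # NFC decomposes each precomposed letter into base letter + BENGALI SIGN NUKTA.
    print(name, f"U+{ord(ch):04X}", "->", " ".join(nfc_code_points(ch)))
```

The decomposed results (for example U+09DC becoming U+09A1 U+09BC) are the normalized sequences listed in Table 8.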

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत (ভারত), drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) begins with the following variables:
C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The ABNF uses the following operators:

Sr. Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism


5.4.2 The Vowel Sequence
The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. However, it is valid and possible to have a chandrabindu followed by an anusvara on the one hand, and a chandrabindu followed by a visarga on the other, as in হ্যাংচা 'hæŋchā' and হ্যাঃ 'hæŋ' respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla.
A Vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF য BENGALI LETTER YA ('ya'), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special (subjoined) form, ্য. When it is combined with U+09BE া BENGALI VOWEL SIGN AA ('aa', ā), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A Vowel-sequence admits the following combinations:

5.4.2.1 A Single Vowel
Example: V → অ (अ)

5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX]. It can also be followed by a combination of Hasanta (or Virama) [H], then a Consonant [C], then a kāra (Mātrā) [M].

Examples:
VB → অং (अं)
VD → অঁ (अँ)
VX → অঃ (अः)
VDB → অঁং (अँं)
VDX → অঁঃ (अँः)
VHCM → অ্যা, এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB → অ্যাং, এ্যাং
VHCMD → অ্যাঁ, এ্যাঁ
VHCMX → অ্যাঃ, এ্যাঃ
VHCMDB → অ্যাঁং, এ্যাঁং
VHCMDX → অ্যাঁঃ, এ্যাঁঃ
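The vowel-sequence rules of 5.4.2 can be approximated with a regular expression. This is a hypothetical Python sketch, not the ABNF of the Appendix. It assumes that the VHCM part is limited to the S sequences of Table 9 (V1 = অ or এ, C = য, M = া), and it accepts only the sign combinations listed above: B, D, X, DB and DX.

```python
import re

# Character classes from the ABNF variables of 5.4.1 (code points from Table 8).
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"            # independent vowels
SIGNS = "(?:\u0981[\u0982\u0983]|[\u0981\u0982\u0983])?"  # optional DB, DX, B, D or X
S = "[\u0985\u098F]\u09CD\u09AF\u09BE"                    # V1 H YA AA (Table 9)

VOWEL_SEQUENCE = re.compile(f"(?:{S}|{V}){SIGNS}")


def is_vowel_sequence(text):
    """True if the whole text is one vowel sequence as sketched above."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

For example, `is_vowel_sequence("অ্যাং")` is true, while `is_vowel_sequence("অংং")` is false because VBB is not an allowed unit.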

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C → ক (क)

5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM → কি (कि)
CB → কং (कं)
CD → কঁ (कँ)
CX → কঃ (कः)
CH → ক্ (क्) (pure consonant)
CDB → কঁং (कँं)
CDX → কঁঃ (कँः)

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB → কীং (कीं)
CMD → কাঁ (काँ)
CMX → বীঃ (वीः)
CMDB → কাঁং (काँं)
CMDX → কাঁঃ (काँः)

5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):
*3(CH)C
Examples:
CHC → ন্ত = ন + ্ + ত (न् + त)
CHCHC → ন্ত্র = ন + ্ + ত + ্ + র (न् + त् + र)
CHCHCHC → ন্ত্র্য = ন + ্ + ত + ্ + র + ্ + য (न् + त् + र् + य)

5.4.3.5 Subsets
In considering the subsets, we take the combination CHC as a representative example; the same applies equally to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Examples:
CHCM → ক্কী = ক + ্ + ক + ী (क्की = क + ् + क + ी)
CHCB → ক্কং = ক + ্ + ক + ং (क्कं = क + ् + क + ं)
CHCD → ক্কঁ = ক + ্ + ক + ঁ (क्कँ = क + ् + क + ँ)
CHCX → ক্কঃ = ক + ্ + ক + ঃ (क्कः = क + ् + क + ः)
CHCDB → ক্কঁং = ক + ্ + ক + ঁ + ং (क्कँं = क + ् + क + ँ + ं)
CHCDX → ক্কঁঃ = ক + ্ + ক + ঁ + ঃ (क्कँः = क + ् + क + ँ + ः)
[B] *3(CH)CM may further be followed by B, D, X, DB or DX.
Examples:
CHCMB → ক্কীং = ক + ্ + ক + ী + ং (क्कीं = क + ् + क + ी + ं)
CHCMD → ক্কাঁ = ক + ্ + ক + া + ঁ (क्काँ = क + ् + क + ा + ँ)
CHCMX → ক্কীঃ = ক + ্ + ক + ী + ঃ (क्कीः = क + ् + क + ी + ः)
CHCMDB → ক্কাঁং = ক + ্ + ক + া + ঁ + ং (क्काँं = क + ् + क + ा + ँ + ं)
CHCMDX → ক্কাঁঃ = ক + ্ + ক + া + ঁ + ঃ (क्काँः = क + ् + क + ा + ँ + ः)
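The consonant-sequence patterns of 5.4.3 can also be sketched as a Python regular expression. A sequence is a cluster of up to four consonants, *3(CH)C. It either ends in H, or is followed by an optional M and one of the sign combinations B, D, X, DB or DX. The pattern and its name are illustrative assumptions rather than part of the LGR, and here C also covers the nukta sequences CN of Table 8.

```python
import re

# C: consonants of Table 8 plus CN (RRA, RHA, YYA written as base + U+09BC NUKTA).
C = "(?:[\u09A1\u09A2\u09AF]\u09BC|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])"
H = "\u09CD"                                              # Hasanta / Virama
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"             # kāra (Mātrā)
SIGNS = "(?:\u0981[\u0982\u0983]|[\u0981\u0982\u0983])?"  # optional DB, DX, B, D or X

# *3(CH)C, then either a final H (pure consonant) or an optional M plus signs.
CONSONANT_SEQUENCE = re.compile(f"(?:{C}{H}){{0,3}}{C}(?:{H}|{M}?{SIGNS})")


def is_consonant_sequence(text):
    """True if the whole text is one consonant sequence as sketched above."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

With this pattern, ন্ত্র্য (four consonants) is accepted, a five-consonant cluster is rejected, and a nukta is only accepted after ড, ঢ or য.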

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z → ৎ

5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):
[CH]Z
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'
Note: In this context of KHANDA TA, the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
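A minimal Python sketch of the [CH]Z combination under the condition stated in the note, namely that C must be one of the two RAs. The constant names are hypothetical.

```python
import re

RA = "[\u09B0\u09F0]"   # Bangla RA (র) or Assamese RA (ৰ)
H = "\u09CD"            # Hasanta / Virama
KHANDA_TA = "\u09CE"    # ৎ

# [CH]Z, where the optional C must be a RA.
KHANDA_TA_SEQUENCE = re.compile(f"(?:{RA}{H})?{KHANDA_TA}")
```

For example, `KHANDA_TA_SEQUENCE.search("ভর্ৎসনা").group()` returns the substring র্ৎ, whereas ক্ৎ does not match as a whole because its C is not a RA.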

5.4.5 Special Cases S and P
Two special cases involving sequences, referred to as S and P in Table 16 under Section 7, are briefly described here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. Bangla orthography thus ends up with a deviant feature here: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case concerns the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.


6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants in two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Cases that are confusable but not identical are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed to be considered as variants. These are not cases of mere visual similarity, because they involve deviations from the widely accepted norms of Bangla akṣara formation. They can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when two or more forms with different storage (code points) are produced. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two instances of ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
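A related observation applies if labels are required to be in NFC, as IDNA2008 specifies. U+09CB (o-kāra) canonically decomposes to U+09C7 + U+09BE, so normalization maps the two-key form of ko onto the single-key form. The following sketch uses Python's standard `unicodedata` module to show this. It supplements the reason given above rather than replacing it.

```python
import unicodedata

KA, E_KARA, AA_KARA, O_KARA = "\u0995", "\u09C7", "\u09BE", "\u09CB"

two_keys = KA + E_KARA + AA_KARA  # ko typed as e-kāra followed by ā-kāra
one_key = KA + O_KARA             # ko typed with the single o-kāra

nfc_two = unicodedata.normalize("NFC", two_keys)
nfd_one = unicodedata.normalize("NFD", one_key)
print(nfc_two == one_key)   # NFC composes the two-key form into the single code point
print(nfd_one == two_keys)  # NFD decomposes the single-key form into the two-key form
```

Both comparisons print `True`, so an NFC-checking registry would only ever see the single-key spelling.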

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
The first case of true variants in Bangla involves থ (tha, U+09A5) with Hasanta appearing in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
If written in traditional orthography, these combinations can be a little confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). A user may also mistype it as হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)
Fonts that represent the traditional Bangla writing system tend to create this problem. Therefore these may be taken as cases of variants in Bangla.
CASE II
Another interesting example of a variant arises from the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same in both cases:

(1) B_RA + H + C
(2) A_RA + H + C

where
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: Since ক and ঠ look exactly the same in Bangla and Assamese, the resultant combinations with Repha look identical; adding the Repha does not make any visible difference. On similar grounds, 'Ra-phalā' can be formed by two sequences whose final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phala

Since the Assamese and Bangla Repha and Ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will create blocked variant labels: for example, if someone registers "ররর", the label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. However, the block applies only at the label level. If someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
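Blocked variant labels could be enumerated from the র/ৰ variant pair as in the minimal Python sketch below. The function name `variant_labels` is hypothetical. The sketch assumes that every RA may be swapped independently, as with an ordinary LGR variant mapping, and it does not model the Hasanta context mentioned in Section 3. For a label with n RAs it yields 2^n - 1 variant labels.

```python
from itertools import product

# Variant pair from Section 6.1, Case II: Bangla RA <-> Assamese RA.
RA_VARIANTS = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}


def variant_labels(label):
    """Return every label obtained by swapping any subset of RAs (the original excluded)."""
    choices = [(ch, RA_VARIANTS[ch]) if ch in RA_VARIANTS else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}
```

For ররর this yields seven labels, including ৰৰৰ. Labels of a different length, such as ৰৰ, are never produced, which matches the behaviour described above.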

6.2 Cross Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.2) for reference.

¹¹ Unicode uses "Oriya" for the script, although Odia is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
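Because a label cannot mix scripts, a whole Bangla label can only collide with a Devanāgarī or Gurmukhī label if every one of its code points has a counterpart in Tables 14 and 15. A minimal Python sketch of that check follows. The function name `cross_script_variants` is hypothetical.

```python
# Cross-script variant pairs from Tables 14 and 15 (Bangla code point -> other script).
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},
}


def cross_script_variants(label):
    """Return {script: mapped label} for every script whose table covers the whole label."""
    result = {}
    for script, table in CROSS_SCRIPT.items():
        if label and all(ch in table for ch in label):
            result[script] = "".join(table[ch] for ch in label)
    return result
```

For example, the Bangla label মি maps to मि in Devanāgarī and to ਸਿ in Gurmukhī, while a label containing ক has no cross-script counterpart under these two tables.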

7. Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table of the Code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which could theoretically conjoin two to four consonants into new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters". Of these, bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].
7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows.

1. C is a set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, কাৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
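The rules above translate fairly directly into a left-context check over categorized tokens. The following Python sketch illustrates this under stated assumptions; it is not the normative LGR XML. It requires NFC input and treats the nukta sequences as a single C (rule 1). It exempts the Hasanta inside S (Table 16), and it accepts Z after H only when that H follows a RA (the P symbol). All names in it are hypothetical.

```python
import unicodedata

# Code point classes from Table 8 (names are illustrative, not from the LGR XML).
CONSONANTS = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))
              | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1})
VOWELS = {0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
          0x098F, 0x0990, 0x0993, 0x0994}
MATRAS = {0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3, 0x09C4,
          0x09C7, 0x09C8, 0x09CB, 0x09CC}
SIGNS = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # bases of CN (rule 1)
RAS = {"\u09B0", "\u09F0"}                    # C2 in the P symbol
S_VOWELS = {"\u0985", "\u098F"}               # V1 in the S symbol
# Rules 3-7: categories allowed immediately before M, D, X, B and Z.
ALLOWED_BEFORE = {"M": "C", "D": "VCM", "X": "VCMD", "B": "VCMD", "Z": "VCMDBX"}


def tokenize(label):
    """Split a label into (category, text) tokens, or return None outside the repertoire."""
    tokens = []
    for ch in label:
        cp = ord(ch)
        if ch == NUKTA:
            # Nukta only exists inside CN: attach it to a bare DDA, DDHA or YA base.
            if not tokens or tokens[-1][1] not in NUKTA_BASES:
                return None
            tokens[-1] = ("C", tokens[-1][1] + ch)
        elif cp in CONSONANTS:
            tokens.append(("C", ch))
        elif cp in VOWELS:
            tokens.append(("V", ch))
        elif cp in MATRAS:
            tokens.append(("M", ch))
        elif cp in SIGNS:
            tokens.append((SIGNS[cp], ch))
        else:
            return None
    return tokens


def is_s_hasanta(tokens, i):
    """True if the H at position i is the Hasanta inside S (V1 H YA AA)."""
    return (0 < i < len(tokens) - 2
            and tokens[i - 1][1] in S_VOWELS
            and tokens[i + 1][1] == "\u09AF"
            and tokens[i + 2][1] == "\u09BE")


def passes_wle(label):
    """Apply rules 1-9 of Section 7.1 to a whole label."""
    if unicodedata.normalize("NFC", label) != label:
        return False  # e.g. precomposed U+09DC instead of U+09A1 U+09BC
    tokens = tokenize(label)
    if not tokens:
        return False
    for i, (cat, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        if cat == "H":                    # rule 2, plus the S exception
            ok = prev == "C" or is_s_hasanta(tokens, i)
        elif cat == "V":                  # rules 8 and 9 (S starts with V)
            ok = prev != "H"
        elif cat == "Z" and prev == "H":  # rule 7 via P = RA + H
            ok = i >= 2 and tokens[i - 2][1] in RAS
        elif cat in ALLOWED_BEFORE:       # rules 3-7
            ok = prev is not None and prev in ALLOWED_BEFORE[cat]
        else:                             # C has no left-context rule
            ok = True
        if not ok:
            return False
    return True
```

For example, ভর্ৎসনা passes because its Z follows P, whereas ক্ৎ and ক্অ are rejected by rules 7 and 8 respectively.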

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage. There is no harm in including such ligatures, because any user can easily create them, even though they may not be attested combinations.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound of that word.
This situation arises only because hyphen, space and the Zero Width Non-Joiner are not in the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed after an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of linguistic communities, so the NBGP explicitly prohibits it. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from the Bangla ABNF

Below are a few examples that illustrate some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Examples:
্ক (्क)
াক (ाक)
ংক (ंक)
ঁক (ँक)
ঃক (ःक)
Such combinations automatically render with a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.
Examples:
অ্ (अ्)
অং্ (अं्)
কঁ্ (कँ्)
কঃ্ (कः्)
কি্ (कि्)

3. The number of B, D or X permitted after a Consonant, a Vowel or a kāra (Mātrā) is restricted to one. Thus the following combinations are invalid:
Examples:
কংং (कंं)
কঁঁ (कँँ)
কঃঃ (कःः)
কাঁঁ (काँँ)
কীঃঃ (कीःः)
অংং (अंं)
অঁঁ (अँँ)
অঃঃ (अःः)

4. The number of M permitted after a Consonant is restricted to one. Example:
কীী (कीी)
5. M is not permitted after V. Examples:
ইা, ঈৗ (ईा, ईौ)
6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible. Examples:
কংঃ (कंः)
কঃং (कःं)
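The counter-examples in 7.2 can be read as a small table of forbidden starting categories and forbidden (previous, current) category pairs. The Python sketch below is a hypothetical illustration of that reading only, not a full WLE implementation. Every character outside the categories it lists is treated as C, and the Hasanta inside S1/S2 is exempted as Table 16 requires.

```python
# Category of the code points that the Section 7.2 examples involve (from Table 8).
CATEGORY = {"\u09CD": "H", "\u0982": "B", "\u0981": "D", "\u0983": "X"}
CATEGORY.update({chr(cp): "M" for cp in (0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2,
                                        0x09C3, 0x09C4, 0x09C7, 0x09C8, 0x09CB, 0x09CC)})
CATEGORY.update({chr(cp): "V" for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                                        0x098B, 0x098F, 0x0990, 0x0993, 0x0994)})

CANNOT_START = {"H", "M", "B", "D", "X"}                          # example 1
FORBIDDEN_PAIRS = (
    {("V", "H"), ("B", "H"), ("D", "H"), ("X", "H"), ("M", "H")}  # example 2
    | {("B", "B"), ("D", "D"), ("X", "X")}                        # example 3
    | {("M", "M"), ("V", "M")}                                    # examples 4 and 5
    | {("B", "X"), ("X", "B")}                                    # example 6
)
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2


def first_violation(label):
    """Index of the first character matching a Section 7.2 counter-example, else None."""
    cats = [CATEGORY.get(ch, "C") for ch in label]  # everything else is treated as C
    if cats and cats[0] in CANNOT_START:
        return 0
    for i in range(1, len(cats)):
        if (cats[i - 1], cats[i]) in FORBIDDEN_PAIRS:
            if cats[i] == "H" and label[i - 1:i + 3] in S_SEQUENCES:
                continue  # the Hasanta inside S1/S2 is allowed (Table 16)
            return i
    return None
```

For instance, `first_violation("কংং")` returns 2, pointing at the second anusvāra, while `first_violation("অ্যা")` returns `None`.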

8 Contributors

81ExpertsfromIndia ProfessorUdayaNarayanaSinghChair-ProfessorofLinguisticsampDeanFacultyofArtsAmityUniversityHaryanaGurgaonPachgaonManesarPIN122431(Haryana)India

51

ProfessorPabitraSarkarformerlyVice-ChancellorRabindraBharatiUniversityKolkataDrAtiurRahmanKhanPrincipalTechnicalOfficerGISTGroupC-DACPunePIN411008(Maharashtra)IndiaMrRajibChakrabortyLinguistSocietyforNaturalLanguageTechnologyResearch(SNLTR)Module114amp130SDFBuildingSaltLakeSector-VKolkata-700091(WestBengal)IndiaMrAkshatJoshiProjectEngineerGISTGroupC-DACPunePIN411008(Maharashtra)IndiaMsMoumitaChowdhurySeniorTechnicalOfficerGISTGroupC-DACPunePIN411008(Maharashtra)IndiaMrChandrakantaMurasinghAgartalaTripuraSomeotherNBGPmembers

82ContributorsfromBangladeshJanabMustafaJabbarHonorableMinisterMinistryofPostsTelecommunicationsampInformationTechnologyGovtofBangladeshProfShamsuzzamanKhanFormerDirector-GeneralBanglaAcademyDhakaProfRafiqulIslamNationalProfessorofHumanitiesDhakaProfSwarochisSarkarDirectorInstituteofBangladeshStudiesRajshahiUniversityRajshahiBangladeshProfJinnatImtiazAliDirector-GeneralInternationalMotherLanguageInstituteDhakaMrMohammadMamunOrRashidDepartmentofBanglaJahangirnagarUniversityampMemberBangladeshComputerCouncilProfManiruzzamanformerlyProfessorChittagongUniversityChattagramBangladeshMrShyamSunderSikderSecretarySecretaryPostampTelecommunicationsDivisionGovtofBangladesh

52

MrMdMustafaKamalFormerDirectorGeneralBangladeshTelecommunicationsRegulatoryAuthorityGovernmentofBangladeshDhakaBrigadierGeneralMdMahfuzulKarimMajumderDirector-GeneralEngineeringampOperationsDivisionBangladeshTelecommunicationsRegulatoryAuthorityGovernmentofBangladeshDhakaMdZiarulIslamProgrammerPostsampTelecommunicationsDivisionGovernmentofBangladeshDhakaProfSyedShahriyarRahmanDepartmentofLinguisticsUniversityofDhakaDrMizanurRahmanDirector(In-Charge)TranslationTextBookandInternationalRelationsDivisionBanglaAcademyDhakaDrApareshBandyopadhyayDirectorBanglaAcademyDhakaMrMdMobarakHossainDirectorBanglaAcademyDhakaDrJalalAhmedDirectorBanglaAcademyDhakaMrJahangirHossainInternetSocietyBangladesh(ICANNALS)JanabSarwarMostafaChoudhuryBangladeshComputerCouncilDhakaJanabMdRashidWasifBangladeshComputerCouncilDhakaJanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka


9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje & Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29.2: 173-186. A modified version reprinted in E. Annamalai, Björn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1.1: 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1.2: 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1.2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A phonetic and phonological study of nasals and nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions will help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a few native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichu Chor' written by Kazi Nazrul Islam. However, there are instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C H can come with Khanda Ta only in the case where C is ra (র, U+09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড (acid), এ্যাসোসিয়েশান (association)
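As a rough illustration of how the three restriction rules above could be checked mechanically, here is a minimal Python sketch. The function name `restriction_errors` and the rule labels it returns are hypothetical; this is not the normative encoding used in the LGR XML file, and the prose remark about initial ya (য়) is deliberately not enforced because the text itself notes legitimate exceptions. Labels are first normalized to NFC, as IDNA requires.

```python
import unicodedata

HASANTA = "\u09CD"    # BENGALI SIGN VIRAMA (Hasanta)
KHANDA_TA = "\u09CE"  # BENGALI LETTER KHANDA TA
RA = "\u09B0"         # BENGALI LETTER RA
YA = "\u09AF"         # BENGALI LETTER YA
AA_SIGN = "\u09BE"    # BENGALI VOWEL SIGN AA

# Rule 1: code points that may not start a label (khanda ta, nya, nga).
FORBIDDEN_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}

# Independent vowels, i.e. the V of the V H C M pattern.
INDEPENDENT_VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098C"
                         "\u098F\u0990\u0993\u0994\u09E0\u09E1")

# Rule 3: the only V + H + C + M sequences allowed (অ্যা and এ্যা).
ALLOWED_VHCM = {"\u0985" + HASANTA + YA + AA_SIGN,
                "\u098F" + HASANTA + YA + AA_SIGN}


def restriction_errors(label: str) -> list:
    """Return hypothetical ids of the restriction rules that `label` breaks."""
    label = unicodedata.normalize("NFC", label)
    errors = []
    if label and label[0] in FORBIDDEN_INITIAL:
        errors.append("rule1")
    for i, ch in enumerate(label):
        # Rule 2: C + Hasanta + Khanda Ta is allowed only when C is ra.
        if (ch == KHANDA_TA and i >= 2 and label[i - 1] == HASANTA
                and label[i - 2] != RA):
            errors.append("rule2")
        # Rule 3: a Hasanta after an independent vowel must begin অ্যা or এ্যা.
        if (ch == HASANTA and i >= 1 and label[i - 1] in INDEPENDENT_VOWELS
                and label[i - 1:i + 3] not in ALLOWED_VHCM):
            errors.append("rule3")
    return errors


print(restriction_errors("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: []
print(restriction_errors("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))      # ৎসেরিং: ['rule1']
```

Returning a list of broken rules, rather than a single boolean, keeps the sketch useful for testing labels against each rule separately.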

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of Bangla script resembles the 'Kaithī' script (ISO 12954) used by the accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names of Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal had brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to the Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla        Devanāgarī    NBGP decision
ঃ U+0983      ः U+0903      Confusable
ও U+0993      उ U+0909      Confusable
ঘ U+0998      घ U+0918      Confusable
ঁ U+0981      ॅ U+0945      Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla        Gurmukhi      NBGP decision
ঘ U+0998      ਬ U+0A2C      Confusable
ঁ U+0981      ੱ U+0A71      Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla                 Gurmukhi      NBGP decision
ও U+0993               ਤ U+0A24      Distinguishable
শ U+09B6               ਅ U+0A05      Distinguishable
ম U+09AE               ਮ U+0A2E      Distinguishable
বা U+09AC and U+09BE   ਗ U+0A17      Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla        Oriya (Odia)  NBGP decision
ও U+0993      ଓ U+0B13      Confusable

Table 21: Bangla and Oriya confusable code points

Bangla        Oriya (Odia)  NBGP decision
ঘ U+0998      ସ U+0B38      Distinguishable

Table 22: Bangla and Oriya distinguishable code points
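Because the pairs in Tables 18, 19 and 21 were judged confusable but not variants, they would not generate variant labels; at most they could feed a review or warning step. A minimal sketch, assuming a hypothetical `lookalike_report` helper that lists, for each code point of a Bangla label, the names of the cross-script lookalikes recorded in those tables:

```python
import unicodedata
from collections import defaultdict

# Pairs judged "Confusable" in Tables 18, 19 and 21 (Bangla, other script).
CONFUSABLE_PAIRS = [
    ("\u0983", "\u0903"),  # ঃ ~ DEVANAGARI SIGN VISARGA
    ("\u0993", "\u0909"),  # ও ~ DEVANAGARI LETTER U
    ("\u0998", "\u0918"),  # ঘ ~ DEVANAGARI LETTER GHA
    ("\u0981", "\u0945"),  # ঁ ~ DEVANAGARI VOWEL SIGN CANDRA E
    ("\u0998", "\u0A2C"),  # ঘ ~ GURMUKHI LETTER BA
    ("\u0981", "\u0A71"),  # ঁ ~ GURMUKHI ADDAK
    ("\u0993", "\u0B13"),  # ও ~ ORIYA LETTER O
]

# Index from each Bangla code point to its set of lookalikes.
LOOKALIKES = defaultdict(set)
for bangla, other in CONFUSABLE_PAIRS:
    LOOKALIKES[bangla].add(other)


def lookalike_report(label: str) -> dict:
    """Map each listed Bangla code point in `label` to its lookalikes' names."""
    report = {}
    for ch in label:
        if ch in LOOKALIKES:  # membership test does not add keys
            report[ch] = sorted(unicodedata.name(o) for o in LOOKALIKES[ch])
    return report


print(lookalike_report("\u0998"))
# {'ঘ': ['DEVANAGARI LETTER GHA', 'GURMUKHI LETTER BA']}
```

The distinguishable pairs of Tables 20 and 22 are intentionally left out, since the NBGP decided they need no handling.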

11 Appendix-II: Bengali consonants and their allographs

Consonants    Phonetic value    Allographs: Clusters    Transparent form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ, and also র্ৎ (র + ্ + ৎ).

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts ঙ্ is replaced by ং.) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

স s, ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it is a vowel æ (as in শ্যামল, র‍্যাপার etc.). But after a non-initial consonant it just doubles it in pronunciation (as in কার্য, ধার্য etc.). The র + ্ + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র্ + য) [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala and a-kar, constitute a vowel sign representing the vowel অ্যা.]

র r Two manifestations:

i) রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, not6, র্য, D6 (+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster) etc. (placed over the following consonant)

ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster, e.g. প্র, ক্র etc. (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


The overall development of Bangla script from the Kuṭila-lipi period to modern Bangla can be seen in Table 3 ([102] and [146]; see also the web page in [147]).

Table 3: Bangla Script in Different Centuries

3.2 Languages Considered
Below is the tabular representation of the languages using Bangla script that are placed on the EGIDS scale 1-6 (see [117] for details). Some languages under EGIDS 5 and 6 have also developed their own scripts for printing and publishing. Some had used Bangla script earlier (such as Bodo) or used it in West Bengal at some point of time (Santali), but have later shifted to another writing system. Bodo is now written in Nāgarī or Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol-chiki [145]. For the purposes of the Bangla LGR, only languages belonging to EGIDS scale 1 to 4 have been considered. Consider the following table:

EGIDS Scale 1

EGIDS Scale 2

EGIDS Scale 3

EGIDS Scale 4

EGIDS Scale 5

EGIDS Scale 6

Bangla (Bengali)

Santali, Bodo, Riang, Khumi, Mru(ng), Asho

Lepcha, Pnar, Koda, Kora, Chak

Asamiyā (Assamese)

Koch or Rajabansī

Malto or Malpahariya

Manipuri or Meitei

Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)

Chakma, Hajong, Mundari & Kurux (of Bangladesh)

Toto, Rohingya, Tippera, Megam, Tanchangya

Usoi, Limbu, Sadri or Oraon

Bhumij or Mundari, Bawm, Chin

Table 4: Main languages in India and Bangladesh that use Bangla script, on the EGIDS scale

3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it has to be written, or how typesetting in Bangla can be done. This section is followed by a section that explains the code points (and fixed code point sequences) which show certain distinctive characteristics of Bangla and which make up the repertoire. The next sections will also cover the 'akshar'-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and context rules, as well as in-script and cross-script variants. Here we present some basic features of the script and pronunciation.

The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to carry an accompanying 'inherent' vowel (theoretically before or after each consonant). It varies between ɔ and o depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].

Vowels can be written as independent letters, or by using a variety of diacritical marks which are written above, below, before or after the consonant they follow in pronunciation, or on both sides of it [105].

All Bangla consonants, when pronounced in isolation, are uttered with an inherent vowel -ɔ; hence ক 'k', খ 'kh' or গ 'g' are usually pronounced as [kɔ], [khɔ] or [gɔ] etc. Phonologically, the Bangla vowel -ɔ corresponds to the Hindi schwa ə.

When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters or conjoined consonants are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ (ñ) + জ (j) = ঞ্জ (ñj), জ (j) + ঞ (ñ) = জ্ঞ (jñ), হ (h) + ম (m) = হ্ম (hm). There are other issues as well: র as the second member of a cluster is reduced to a secondary symbol, e.g. প (p) + র (r) = প্র (pr), ষ (ṣ) + ট (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra 'camel'). য (y), when used as a primary symbol, represents jɔ in Bangla. But its secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it is a vowel æ (as in শ্যামল (syamala) 'green', র‍্যাপার (ryapara) 'wrapper' etc.). But after a non-initial consonant it just doubles it in pronunciation (as in কার্য, ধার্য etc.). The র (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry). In the case of দ (d) + ধ (dh), গ (g) + ধ (dh) and ন (n) + ধ (dh), the shape of the second member is changed, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta 'southwest'), used mostly in cases of classical borrowings, shows the use of the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel only applies to the final consonant of the cluster.

In consonant clusters many consonants took a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st) etc. র has two allographs apart from its full shape: one is 'repha', as found in র্ক (rk) and র্প (rp), and the other is ra-phala, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another case, where the cerebral nasal consonant sign takes a queer shape [151].

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) [150], or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].

As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called 'allographs', with a complementary distribution in each case. These graphs or markings are generally added to the following positions of the primary symbol [113]:

1) Below (e.g. কু (ku), ন্ত (nta), হ্র (hra) etc.)

2) Above (e.g. চঁ (cã), র্ক (rka) etc.)

3) Right side (e.g. কা (ka), কং (kaṅ) etc.)

4) Left side (e.g. কে (ke))

5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki) etc.)

6) Right side and above simultaneously (e.g. কী (kī))

7) Right side and left side simultaneously (e.g. কো (ko))

8) Right side, left side and above simultaneously (e.g. কৌ (kau))
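The positions listed above describe where a sign is drawn, not the order in which it is stored. In Unicode text the vowel sign always follows the consonant in logical order, and the two-part signs of items 7 and 8 have canonical decompositions into a left part and a right part. The following Python sketch makes this visible using only the standard `unicodedata` module; the helper `code_points` is illustrative.

```python
import unicodedata

KA = "\u0995"  # BENGALI LETTER KA


def code_points(s: str) -> list:
    """Format each character of `s` as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]


ki = KA + "\u09BF"   # কি: sign drawn to the LEFT, but stored AFTER the consonant
ko = KA + "\u09CB"   # কো: sign drawn on both sides of the consonant
kau = KA + "\u09CC"  # কৌ

print(code_points(ki))                                 # ['U+0995', 'U+09BF']
print(code_points(unicodedata.normalize("NFD", ko)))   # ['U+0995', 'U+09C7', 'U+09BE']
print(code_points(unicodedata.normalize("NFD", kau)))  # ['U+0995', 'U+09C7', 'U+09D7']
# NFC, which IDNA requires, recombines the two parts into the single sign.
print(unicodedata.normalize("NFC", KA + "\u09C7\u09BE") == ko)  # True
```

This is why an LGR working on NFC code points never sees the split form of ো or ৌ, even if a user types the two halves separately.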

As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel matras, which are relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called 'kārs' in Bangla (also referred to as matras in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,

ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,

ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or

উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, by marking below the primary grapheme,

there are some special vowel modifiers of উ, as in the following combined letters:

গু gu, rather than writing as গ (g) + ু (u)

রু ru, rather than writing as র (r) + ু (u)

শু śu, rather than writing as শ (ś) + ু (u)

হু hu, rather than writing as হ (h) + ু (u)

ন্তু ntu, rather than writing as ন্ (n) + ত (t) + ু (u)

These combined letters are ligature shapes chosen by the font; in Unicode text each is still encoded as the consonant (or cluster) followed by U+09C1.

Similarly, there could be vowel modifiers of ঊ or '(long) ū' as well, e.g. ভ (bh) + র (r) in ভ্রূ (bhrū) 'eyebrow', শ (ś) + র (r) in শ্রূ (śrū), and ঋ (ṛ) after হ (h) in হৃ (hṛ) etc.

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. In this there has been an attempt to reduce the number of allographs of both vowels and consonants in clusters, and it has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, go on side by side, operative in different domains.

However, in preparing this LGR document the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatiya Shiksakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012⁶. After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the 'homogenization of Bangla spelling' by 1988. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. This was published as a 'Spelling Dictionary' titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more comprehensive than 'The University of Calcutta proposals' made in 1936. Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha ('transparent') and Aswaccha ('opaque' or non-transparent), even adding Ardha Swaccha ('half transparent') in between the two. Some sample examples are:

6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.

Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.

Opaque, where neither of the two can be (easily) recognized: ক্ষ kṣ (ক k + ষ ṣ), জ্ঞ jñ (জ j + ঞ ñ), ঙ্গ ṅg (ঙ ṅ + গ g), হ্ম hm (হ h + ম m).

Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In the case of three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র str (স s + ত t + র r), ষ্ট্র ṣṭr (ষ ṣ + ট ṭ + র r) etc.

There were in fact two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was 'one word, one spelling' to the greatest extent possible [151].

Below we place a statement of the most salient changes that affect the consonant + vowel combinations [153]:

a. The variants of the short u (হ্রস্ব উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So the traditional ligature for গু (gu) is now written as গ followed by the separate ু sign. The same applies to রু (ru), শু (śu) and হু (hu), and therefore to cluster + short u sign, as in ন্তু (ntu) (ন + ্ + ত + উ) and স্তু (stu) (স + ্ + ত + উ).

b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha ū-kāra) have also been reduced: রূ (rū), ভ্রূ (bhrū) (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) (দ d + ্ + র r + ঊ ū) and শ্রূ (śrū) (শ ś + ্ + র r + ঊ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) now uses the regular ৃ sign.

Regarding consonant + consonant + (consonant) … + (vowel) clusters, Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal [153]. The traditional shape and the Bangla Akademi innovation of each cluster below are encoded by the same code point sequence and differ only in the glyph drawn by the font:

ব্ধ bdh (ব b + ধ dh), দ্ধ ddh (দ d + ধ dh), ন্থ nth (ন n + থ th), ঞ্চ ñc (ঞ ñ + চ c), ঞ্ছ ñch (ঞ ñ + ছ ch), ঞ্জ ñj (ঞ ñ + জ j), ক্ত kt (ক k + ত t), ক্র kr (ক k + র r), গ্ধ gdh (গ g + ধ dh), ঙ্ক ṅk (ঙ ṅ + ক k), ঙ্গ ṅg (ঙ ṅ + গ g), ষ্ণ ṣṇ (ষ ṣ + ণ ṇ), ন্ধ্র ndhr (ন n + ধ dh + র r), ণ্ড্র ṇḍr (ণ ṇ + ড ḍ + র r), ক্ত্র ktr (ক k + ত t + র r).

3.3.1 The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'vargas' (pronounced as 'barga' in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified as per their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated to voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:

'Varga' (set)  Unvoiced -Asp    Unvoiced +Asp     Voiced -Asp      Voiced +Asp       Nasal
Velar          ক 'K' U+0995     খ 'KH' U+0996     গ 'G' U+0997     ঘ 'GH' U+0998     ঙ 'NG' U+0999
Palatal        চ 'C' U+099A     ছ 'CH' U+099B     জ 'J' U+099C     ঝ 'JH' U+099D     ঞ 'NY' U+099E
Retroflex      ট 'TT' U+099F    ঠ 'TTH' U+09A0    ড 'DD' U+09A1    ঢ 'DDH' U+09A2    ণ 'NN' U+09A3
Dental         ত 'T' U+09A4     থ 'TH' U+09A5     দ 'D' U+09A6     ধ 'DH' U+09A7     ন 'N' U+09A8
Bilabial       প 'P' U+09AA     ফ 'PH' U+09AB     ব 'B' U+09AC     ভ 'BH' U+09AD     ম 'M' U+09AE

Table 5: Varga classification of Bangla consonants
(falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasals, called the five 'vargas')
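The regular layout of Table 5 is mirrored in the Unicode Bengali block: each varga occupies five consecutive code points in the order of the table's columns, with U+09A9 left unassigned between the dental and bilabial rows. A small Python sketch that relies on this layout; the `classify` helper and the manner labels are illustrative and not part of the LGR:

```python
import unicodedata

# First code point of each varga; each varga spans five consecutive code points.
VARGA_START = {
    "Velar": 0x0995,
    "Palatal": 0x099A,
    "Retroflex": 0x099F,
    "Dental": 0x09A4,
    "Bilabial": 0x09AA,  # U+09A9 (between Dental and Bilabial) is unassigned
}
MANNERS = ("unvoiced unaspirated", "unvoiced aspirated",
           "voiced unaspirated", "voiced aspirated", "nasal")


def classify(ch: str):
    """Return (place, manner) for a varga consonant, or None otherwise."""
    cp = ord(ch)
    for place, start in VARGA_START.items():
        if start <= cp < start + 5:
            return place, MANNERS[cp - start]
    return None


# Rebuild the rows of Table 5 from the code point layout.
for place, start in VARGA_START.items():
    row = " ".join(chr(start + i) for i in range(5))
    print(f"{place:<10} {row}")
```

Non-varga consonants such as য or শ fall outside every range, so `classify` returns None for them, matching Table 6.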

Non-Varga consonants
য 'Y' U+09AF
য় 'YY' U+09DF
র 'R' U+09B0
ড় 'RR' U+09DC
ঢ় 'RH' U+09DD
ল 'L' U+09B2
শ 'SH' U+09B6
ষ 'SS' U+09B7
স 'S' U+09B8
হ 'H' U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)

3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central back -ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma'⁷ (= 'Dari' in Bangla), the latter as preferred in Unicode (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in Unicode in a sense that is different from the traditional definition of grammar, and hence it requires some explanation here. Considering the importance of the document, this note should be a part of this LGR document, so that anybody referring to it is able to know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, "kill" its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara⁸ has to be bounded empirically. This is an observation based on the CIIL-EMILLE corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages happening on the web, the possibility that one may want a generic Top Level Domain (gTLD) which has more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
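The conjunct-forming mechanism described above is purely a matter of code point order: a consonant, the Hasanta U+09CD, then the next consonant. A minimal Python sketch with hypothetical helper names; since the LGR does not enforce a maximum cluster length, `longest_cluster` only measures clusters and rejects nothing. Treating every assigned code point in U+0995..U+09B9 as a consonant is a simplification made for this example.

```python
import unicodedata

HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA
NUKTA = "\u09BC"    # BENGALI SIGN NUKTA


def is_consonant(ch: str) -> bool:
    """Assigned code points in the consonant range U+0995..U+09B9."""
    return "\u0995" <= ch <= "\u09B9" and unicodedata.name(ch, "") != ""


def conjunct(*consonants: str) -> str:
    """Join consonants with Hasanta, e.g. ক + ষ -> ক্ষ."""
    return HASANTA.join(consonants)


def longest_cluster(word: str) -> int:
    """Length of the longest run of consonants joined by Hasanta."""
    best = run = 0
    prev = ""
    for ch in word:
        if is_consonant(ch):
            # Continue the cluster only if the previous character was Hasanta.
            run = run + 1 if (prev == HASANTA and run) else 1
            best = max(best, run)
        elif ch not in (HASANTA, NUKTA):
            run = 0  # a vowel sign or other mark ends the cluster
        prev = ch
    return best


print(conjunct("\u0995", "\u09B7") == "\u0995\u09CD\u09B7")  # ক্ষ: True
print(longest_cluster(conjunct("\u09B8", "\u09A4", "\u09B0")))  # স্ত্র: 3
```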

7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means 'syllable' in Indian grammatical traditions.

3.3.3 Vowels
Separate symbols exist for all 'swara' or vowels in Bangla, which are pronounced independently, either at the beginning of the word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called 'kār' in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kārs (mātrās) for all vowels except অ (pronounced -ɔ). The correlation is shown as follows:

Vowel                        Corresponding vowel sign (kār / mātrā)
অ 'A' U+0985                 (none)
আ 'AA' U+0986                া U+09BE
ই 'I' U+0987                 ি U+09BF
ঈ 'II' U+0988                ী U+09C0
উ 'U' U+0989                 ু U+09C1
ঊ 'UU' U+098A                ূ U+09C2
ঋ Vocalic 'R' U+098B         ৃ U+09C3
ৠ Vocalic 'RR' U+09E0        ৄ U+09C4
ঌ Vocalic 'L' U+098C         ৢ U+09E2
ৡ Vocalic 'LL' U+09E1        ৣ U+09E3
এ 'E' U+098F                 ে U+09C7
ঐ 'AI' U+0990                ৈ U+09C8
ও 'O' U+0993                 ো U+09CB
ঔ 'AU' U+0994                ৌ U+09CC
-                            ৗ U+09D7 (AU length mark)
Could appear on top of অ 'A' U+0985 or any other vowel     ঁ U+0981 Candrabindu
Could appear after অ 'A' U+0985 or any other vowel         ং U+0982 Anusvara
Could appear after অ 'A' U+0985 or any other vowel         ঃ U+0983 Visarga
After any consonant                                        ্ U+09CD (Hasanta)
-                                                          ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kārs

9 Although the term 'Mātrā' in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically available in Hindi and Bangla but missing in Gujarati.
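The pairing in Table 7 can be cross-checked against the Unicode character database, because in the Bengali block each vowel sign shares its name suffix with the corresponding independent vowel (for example BENGALI LETTER AA and BENGALI VOWEL SIGN AA). The sketch below relies on that naming convention, which is an observation about this block rather than a general Unicode rule; `vowel_sign_for` is a hypothetical helper.

```python
import unicodedata

# Independent vowels in the order of Table 7.
INDEPENDENT_VOWELS = ("\u0985\u0986\u0987\u0988\u0989\u098A\u098B"
                      "\u09E0\u098C\u09E1\u098F\u0990\u0993\u0994")
PREFIX = "BENGALI LETTER "


def vowel_sign_for(vowel: str):
    """Return the dependent vowel sign (kār) for an independent vowel, or None."""
    suffix = unicodedata.name(vowel)[len(PREFIX):]
    try:
        return unicodedata.lookup("BENGALI VOWEL SIGN " + suffix)
    except KeyError:
        return None  # অ has no kār: the inherent vowel needs no sign


for v in INDEPENDENT_VOWELS:
    sign = vowel_sign_for(v)
    shown = "none" if sign is None else f"U+{ord(sign):04X}"
    print(f"{v} U+{ord(v):04X} -> {shown}")
```

The result reproduces the right-hand column of Table 7, including the absence of a sign for অ.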

3.3.4 The Anusvāra, onuʃʃār (ং, U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'nasal consonant + Hasanta + consonant' where the second consonant belongs to the velar varga or set, as in লংকা. But it often appears also in such combinations when a non-velar is the last member of the combination, as in ল্যাংটা 'naked' or ল্যাংচা 'a kind of sweet; to limp'. Before a non-varga consonant, the Anusvara represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
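The alternation described above, ঙ্ + velar versus ং + velar, produces two different code point sequences for what a reader perceives as the same word. A minimal Python sketch of the rewrite for the velar case only; the helper name is hypothetical, and nothing here implies that the LGR treats the two spellings as variants.

```python
import re

NGA_HASANTA = "\u0999\u09CD"              # ঙ + Hasanta
ANUSVARA = "\u0982"                       # ং
VELAR_STOPS = "\u0995\u0996\u0997\u0998"  # ক খ গ ঘ

# ঙ্ directly followed by a velar stop; the lookahead keeps the stop itself.
NGA_BEFORE_VELAR = re.compile(NGA_HASANTA + "(?=[" + VELAR_STOPS + "])")


def anusvara_spelling(word: str) -> str:
    """Rewrite ঙ্ + velar stop as ং + velar stop (e.g. লঙ্কা -> লংকা)."""
    return NGA_BEFORE_VELAR.sub(ANUSVARA, word)


lanka = "\u09B2\u0999\u09CD\u0995\u09BE"  # লঙ্কা
print(anusvara_spelling(lanka) == "\u09B2\u0982\u0995\u09BE")  # লংকা: True
```

Because the two results are distinct strings, any equivalence between them would have to be stated explicitly in the LGR; it cannot come from Unicode normalization.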

3.3.5 Nasalization: Candrabindu (ঁ, U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside the half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
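In stored text the Candrabindu follows the letter or vowel sign it nasalizes. The sketch below lists the code points of চাঁদ with their Unicode names, using only Python's standard `unicodedata` module; `describe` is an illustrative helper.

```python
import unicodedata

CHAND = "\u099A\u09BE\u0981\u09A6"  # চাঁদ 'moon'


def describe(word: str) -> list:
    """List each code point of `word` together with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in word]


for line in describe(CHAND):
    print(line)
# U+099A BENGALI LETTER CA
# U+09BE BENGALI VOWEL SIGN AA
# U+0981 BENGALI SIGN CANDRABINDU
# U+09A6 BENGALI LETTER DA
```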

3.3.6 Nukta (়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + NUKTA (U+09AF + U+09BC), DDA + NUKTA (U+09A1 + U+09BC) and DDHA + NUKTA (U+09A2 + U+09BC), instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of 'Nukta' is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides atomic code points for these three characters (U+09DC for ড় [r], U+09DD for ঢ় [rh] and U+09DF for য় [y], each entered as a single keystroke), the IDNA protocol requires that they be treated as sequences of a base consonant plus nukta, both of which are encoded in the Unicode Bengali block. It may be noted that there could be sporadic attempts or cases of writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the sanctity of the loanword. This amounts to using the Bangla writing system like the IPA script. It is, however, not in use in printed Bangla writing.
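The effect of the composition exclusions can be checked directly with Python's standard `unicodedata` module: NFC turns each precomposed letter into the base consonant plus nukta, and does not recompose the pair. The helper name `to_label_form` is illustrative.

```python
import unicodedata

PRECOMPOSED = ["\u09DC", "\u09DD", "\u09DF"]  # ড় RRA, ঢ় RHA, য় YYA


def to_label_form(text: str) -> str:
    """Normalize to NFC, as IDNA requires for labels."""
    return unicodedata.normalize("NFC", text)


for ch in PRECOMPOSED:
    nfc = to_label_form(ch)
    print(unicodedata.name(ch), "->", " + ".join(unicodedata.name(c) for c in nfc))
# BENGALI LETTER RRA -> BENGALI LETTER DDA + BENGALI SIGN NUKTA
# BENGALI LETTER RHA -> BENGALI LETTER DDHA + BENGALI SIGN NUKTA
# BENGALI LETTER YYA -> BENGALI LETTER YA + BENGALI SIGN NUKTA
```

Since NFC leaves the two-code-point sequence as it is, the sequence (not the atomic letter) is what the LGR repertoire has to list.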

3.3.7 Visarga, biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)

The Visarga (biʃɔrgo, U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho 'sorrow', 'unhappiness' (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is rarely used now, even in other languages using Bangla script. Because the Avagraha is blocked in TLDs as per the Maximal Starting Repertoire (MSR), it is not part of the LGR repertoire, and it has therefore been decided not to retain Avagraha (ঽ, U+09BD). Please see Appendix II in section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)
This note is pertinent to the use of Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these two signs in a different manner.
ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has the option between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.
Use of ZWJ
• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) "Satyajit" and সৎ (sat) "honest". This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending upon the context. E.g. র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য one has to type র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that demand a ya-phalā after ra, which otherwise cannot be typed (rendered), because the same order ra + hasanta + antastha ja already serves in the medial and/or final position: ra + hasanta + antastha ja is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:
ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য
Use of ZWNJ

• ZWNJ is used in Bangla to represent the explicit Hasanta (Halant). In order to avoid conjunct formation where there is an explicit hasanta before the succeeding consonant, ZWNJ is used:
Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)
The use of ZWJ and ZWNJ has been ruled out from the root zone by the [Procedure]. Because they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP.
The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
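To make the typing sequences above concrete, the sketch below spells them out as code-point strings and shows why excluding ZWJ and ZWNJ from the root zone matters: once the joiner is removed, the repha form and the ra + ya-phalā form collapse into the same code-point sequence. The helper names `has_joiner` and `strip_joiners` are hypothetical.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"

# র + ্ + য : renders as repha on ya (as in কার্য)
REPHA_ON_YA = RA + HASANTA + YA
# র + ZWJ + ্ + য : renders as ra followed by ya-phalā (as in র‍্যাপার)
RA_YA_PHALA = RA + ZWJ + HASANTA + YA
# প্রাক্‌কথন : ZWNJ keeps the hasanta on the first ক explicitly visible
PRAKKATHAN = "\u09AA\u09CD\u09B0\u09BE\u0995\u09CD" + ZWNJ + "\u0995\u09A5\u09A8"


def has_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ (not permitted in the root zone)."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label: str) -> str:
    """Remove ZWJ/ZWNJ, keeping only the remaining code points."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


if __name__ == "__main__":
    print(has_joiner(REPHA_ON_YA), has_joiner(RA_YA_PHALA), has_joiner(PRAKKATHAN))
    # Without the joiner, the two differently rendered forms become identical.
    print(strip_joiners(RA_YA_PHALA) == REPHA_ON_YA)
```

In other words, a root-zone label can express র্য but has no way to request র‍্য.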

3.3.9 Use of Ya-phalā
There are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):
• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA
• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA
For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowel. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
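The constraint that a Hasanta may follow an independent vowel only inside the two ligatures অ্যা and এ্যা can be expressed as a simple left-to-right scan. This is a minimal sketch for illustration (the function name `hasanta_after_vowel_ok` is hypothetical), not the normative WLE rule of Section 7.

```python
HASANTA = "\u09CD"
# Independent vowels of the repertoire: U+0985..U+098B, U+098F, U+0990, U+0993, U+0994
INDEPENDENT_VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
# S1 = অ্যা and S2 = এ্যা (vowel + hasanta + ya + ā-kāra)
YA_PHALA_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}


def hasanta_after_vowel_ok(label: str) -> bool:
    """Allow Hasanta directly after an independent vowel only as part of S1/S2."""
    for i, ch in enumerate(label):
        if ch == HASANTA and i > 0 and label[i - 1] in INDEPENDENT_VOWELS:
            # The four code points starting at the vowel must form S1 or S2.
            if label[i - 1:i + 3] not in YA_PHALA_SEQUENCES:
                return False
    return True
```

Hasanta after a consonant (as in ব্যাট) is not affected by this check; it only polices the deviant vowel + Hasanta case.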

3.3.10 Formation of Ra-phalā and Repha Sequences
This case refers to the formation of repha and ra-phalā, as follows:
Ra-Hasanta = (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ - BENGALI SIGN VIRAMA).
Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). The point is that in both cases the slot for ra could be Bangla ra র (U+09B0) or Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR notes this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the two RAs is distinguishable only by looking at their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that follow. Moreover, it is noteworthy that REPHA can also occur with KHANDA TA. The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র, used in Bangla) or RA U+09F0 (ৰ, used in Assamese).
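The positions at which the two RAs become visually indistinguishable can be located mechanically. The sketch below (function names are hypothetical) returns the index of every RA, Bangla or Assamese, that is immediately followed by Hasanta (repha) or immediately preceded by it (ra-phalā), and folds ৰ to র only at those positions, so that two labels differing only in a "disguised" RA compare equal.

```python
RA_BANGLA = "\u09B0"    # র
RA_ASSAMESE = "\u09F0"  # ৰ
HASANTA = "\u09CD"      # ্
RAS = (RA_BANGLA, RA_ASSAMESE)


def ra_hasanta_positions(label: str) -> list[int]:
    """Indices of RA adjacent to Hasanta: RA + H (repha) or H + RA (ra-phalā)."""
    hits = []
    for i, ch in enumerate(label):
        if ch not in RAS:
            continue
        followed = i + 1 < len(label) and label[i + 1] == HASANTA
        preceded = i > 0 and label[i - 1] == HASANTA
        if followed or preceded:
            hits.append(i)
    return hits


def fold_ra_in_context(label: str) -> str:
    """Replace Assamese RA with Bangla RA, but only where it touches Hasanta."""
    chars = list(label)
    for i in ra_hasanta_positions(label):
        chars[i] = RA_BANGLA
    return "".join(chars)


if __name__ == "__main__":
    bangla = "\u0985\u09B0\u09CD\u0995"    # অর্ক with Bangla RA
    assamese = "\u0985\u09F0\u09CD\u0995"  # same rendering with Assamese RA
    print(bangla == assamese, fold_ra_in_context(bangla) == fold_ra_in_context(assamese))
```

A standalone ৰ (not touching Hasanta) is left unchanged, mirroring the "only in the context of Hasanta" restriction stated for Variant Set 4.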

4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all other Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rule-sets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of the protocol hierarchy, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters.
Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic -kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
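The three successive filters can be pictured as a pipeline in which each stage can only remove code points. In the sketch below the IDNA and MSR sets are tiny hand-picked samples for illustration only: U+09FA ISSHAR is a symbol and therefore DISALLOWED under IDNA2008, and U+09BD AVAGRAHA is allowed by IDNA but excluded by the MSR, as stated above. A real implementation would load the complete IDNA2008 derived-property table and the MSR data instead; the function name is hypothetical.

```python
from typing import Optional

# Illustrative samples only, NOT the complete IDNA2008 or MSR data.
IDNA_DISALLOWED_SAMPLE = {0x09FA}  # BENGALI ISSHAR (symbol)
MSR_EXCLUDED_SAMPLE = {0x09BD}     # BENGALI SIGN AVAGRAHA


def first_failing_filter(cp: int) -> Optional[str]:
    """Name the first filter that rejects `cp`, or None if it passes all three."""
    if not 0x0980 <= cp <= 0x09FF:  # outside the Bengali code block
        return "Unicode block"
    if cp in IDNA_DISALLOWED_SAMPLE:
        return "IDNA"
    if cp in MSR_EXCLUDED_SAMPLE:
        return "MSR"
    return None


if __name__ == "__main__":
    for cp in (0x0995, 0x09FA, 0x09BD, 0x093D):
        print(f"U+{cp:04X}", first_failing_filter(cp))
```

Note that U+093D, the Devanāgarī Avagraha, is rejected by the very first filter, which is why the Bengali code point U+09BD is the one relevant here.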

5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR.
It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 8 is valid for all three languages above.
For each of the code points, language references are given in the last column, titled Reference, of Table 8, the "Code Point Repertoire". For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3).
Before the details are presented, however, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variations in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:


Figure 1 Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block as in Figure 1, among the code points included in the MSR the following symbols will need separate treatment:
ৎ U+09CE BENGALI LETTER KHANDA TA
ৰ U+09F0 Asamiyā-Bangla Letter Ra with Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra with Lower Diagonal

5.1 Code Point Repertoire Inclusion

No. | Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candrabindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
28 | U+09A1 U+09BC (for U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; U+09DC is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
30 | U+09A2 U+09BC (for U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; U+09DD is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
43 | U+09AF U+09BC (for U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; U+09DF is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122] [125]
Table 8 Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, so that the æ sound as in English 'bat', 'cat', etc. can be written.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]
Table 9 Sequences
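As a cross-check on Table 8, the sketch below encodes its 61 single code points as a Python set and admits Nukta (U+09BC) only directly after ড, ঢ or য, as described for rows 28, 30 and 43. It is a hypothetical helper (`in_repertoire`) that tests repertoire membership only; the context rules of Section 7, including the S1/S2 sequences of Table 9, are deliberately not modelled.

```python
NUKTA = 0x09BC
NUKTA_BASES = {0x09A1, 0x09A2, 0x09AF}  # ড, ঢ, য -> ড়, ঢ়, য়

# The 61 single code points of Table 8 (rows 28, 30 and 43 are sequences).
REPERTOIRE = (
    set(range(0x0981, 0x0984))                   # candrabindu, anusvara, visarga
    | {0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
       0x098F, 0x0990, 0x0993, 0x0994}           # independent vowels
    | set(range(0x0995, 0x09A9))                 # KA .. NA
    | set(range(0x09AA, 0x09B1))                 # PA .. RA
    | {0x09B2}                                   # LA
    | set(range(0x09B6, 0x09BA))                 # SHA .. HA
    | set(range(0x09BE, 0x09C5))                 # vowel signs AA .. VOCALIC RR
    | {0x09C7, 0x09C8, 0x09CB, 0x09CC}           # vowel signs E, AI, O, AU
    | {0x09CD, 0x09CE, 0x09F0, 0x09F1}           # virama, khanda ta, two RAs
)


def in_repertoire(label: str) -> bool:
    """True if every code point is in Table 8 and each Nukta follows ড, ঢ or য."""
    cps = [ord(ch) for ch in label]
    for i, cp in enumerate(cps):
        if cp == NUKTA:
            if i == 0 or cps[i - 1] not in NUKTA_BASES:
                return False
        elif cp not in REPERTOIRE:
            return False
    return True
```

For example, ভারত passes, while the atomic U+09DC, the Avagraha U+09BD and the excluded VOCALIC L U+098C are all rejected.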

5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use
Table 10 Excluded Code Points

5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It is used only in sequence, in the normalized forms of the three special characters ড় ঢ় য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড় ঢ় য় respectively
Table 10b Excluded Code Points

5.4 The Basis of Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:
C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:
Sr Number   Operator   Function
1   “|”   Alternative
2   “[ ]”   Optional
3   “*”   Variable Repetition
4   “( )”   Sequence Group
Table 11 The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary.
5.4.2 The Vowel Sequence
A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). A V in Bangla may be followed by more than one of D, B and X, but not by a repetition of the same sign: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have a Candrabindu combined with an Anusvāra (DB) on the one hand and a Candrabindu combined with a Visarga (DX) on the other, as in হ্যাঁংচা 'hænchā' and হ্যাঁঃ 'hæn' respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu thus exists in Bangla.
A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, 'ya'), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF BENGALI LETTER YA>, and has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA ('aa', ā), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট.
A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel
Example: V অ अ
5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].
Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা
5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
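The vowel-sequence combinations above can be approximated by a single regular expression. The sketch below is one possible translation under stated assumptions: the optional tail is B | D | X | DB | DX exactly as in the examples, and VHCM is restricted to the two ligatures অ্যা and এ্যা (S1 and S2). The names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical, and this is not the XML form used in the LGR file.

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"  # independent vowels (U+098C excluded)
B, D, X = "\u0982", "\u0981", "\u0983"          # anusvāra, candrabindu, visarga
# Optional tail: B | D | X | DB | DX
TAIL = f"(?:{D}[{B}{X}]?|[{B}{X}])?"
# Either the VHCM ligature অ্যা / এ্যা, or a single V, then the optional tail
VOWEL_SEQUENCE = re.compile(f"(?:[\u0985\u098F]\u09CD\u09AF\u09BE|{V}){TAIL}")


def is_vowel_sequence(s: str) -> bool:
    """True if `s` is exactly one vowel sequence as described in 5.4.2."""
    return VOWEL_SEQUENCE.fullmatch(s) is not None
```

Repetitions such as VBB (অংং) or the reversed order BD (অংঁ) fail to match, in line with the invalid formations named in 5.4.2.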

5.4.3 The Consonant Sequence
5.4.3.1 A Single Consonant (C)
Example: C ক क
5.4.3.2 A Consonant with Conditions
A consonant may optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].
Examples:
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः
5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.
Examples:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः
5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):
*3(CH)C
Examples:
CHC ন্ত → ন + ্ + ত   न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র   न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য   न्त्र्य → न + ् + त + ् + र + ् + य
5.4.3.5 Subsets
In considering its subsets, we take the combination CHC as a representative example; the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Examples:
CHCM ক্কী → ক + ্ + ক + ী   क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং   क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ   क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ   क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং   क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ   क्कँः → क + ् + क + ँ + ः
[B] *3(CH)CM may further be followed by B, D, X, DB or DX.
Examples:
CHCMB ক্কীং → ক + ্ + ক + ী + ং   क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ   क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ   क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং   क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ   क्काँः → क + ् + क + ा + ँ + ः
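The consonant-sequence rules can likewise be sketched as a regular expression. Under the assumptions of this sketch, a sequence is either a pure consonant (CH), or up to three CH pairs followed by a final C, an optional kāra and the same optional tail as for vowels (B | D | X | DB | DX); ড়, ঢ় and য় count as one C in their base + Nukta form. The names are hypothetical, and this is an illustration rather than the normative rule set.

```python
import re

# Consonants of Table 8; ড়, ঢ়, য় appear as base + Nukta (U+09BC)
C = "(?:[\u09A1\u09A2\u09AF]\u09BC|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])"
H = "\u09CD"                                          # hasanta / virama
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"         # kāra (mātrā)
TAIL = "(?:\u0981[\u0982\u0983]?|[\u0982\u0983])?"    # B | D | X | DB | DX
# CH (pure consonant), or *3(CH) C [M] [tail]
CONSONANT_SEQUENCE = re.compile(f"{C}{H}|(?:{C}{H}){{0,3}}{C}{M}?{TAIL}")


def is_consonant_sequence(s: str) -> bool:
    """True if `s` is exactly one consonant sequence as described in 5.4.3."""
    return CONSONANT_SEQUENCE.fullmatch(s) is not None
```

A cluster of five consonants, or a consonant carrying two kāras, is rejected, which matches the "up to 4" limit and the single-M slot of the ABNF.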

5.4.4 The Khanda Ta Sequence
5.4.4.1 A Single Khanda Ta (Z)
Example: Z ৎ
5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):
[CH]Z
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র, used in Bangla) or RA U+09F0 (ৰ, used in Assamese).
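The [CH]Z rule and its RA-only condition reduce to a local check around each Khanda Ta. A minimal sketch, assuming the label has already passed the repertoire check (the function name is hypothetical):

```python
KHANDA_TA = "\u09CE"        # ৎ
HASANTA = "\u09CD"          # ্
RAS = {"\u09B0", "\u09F0"}  # র (Bangla), ৰ (Assamese)


def khanda_ta_context_ok(label: str) -> bool:
    """[CH]Z check: a Hasanta directly before ৎ must itself follow a RA."""
    for i, ch in enumerate(label):
        if ch != KHANDA_TA:
            continue
        if i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True
```

Both ভর্ৎসনা with Bangla RA and the same word with Assamese RA pass, while ক্ৎ is rejected.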

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowel. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case, referred to as P in Table 16, concerns the formation of repha and ra-phalā in this script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra "cycle"). The point is that in both cases the slot for ra could be Bangla ra র (U+09B0) or Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.
10 Refer to Rule P in Section 7, Table 16.

6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can cause confusion even to a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more identical forms can be produced with different stored code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by another kāra is not allowed.

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly in the belief that it was a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)
The fonts which represent the traditional Bangla writing system tend to create this problem. Therefore these may be taken as cases of variants in Bangla.
CASE II
Another interesting example of variants is encountered in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' as well as 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C   (2) A_RA + H + C
where
B_RA = U+09B0 BENGALI LETTER RA (র), or
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ
Table 12 Example of Repha
Note: As Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with repha look identical; the addition of repha does not make any difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA   (2) C1 + H + A_RA
where C1 = any consonant except Khanda Ta
Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ
Table 13 Example of Ra-phala

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, this could cause confusability for end users. Hence the repha and ra-phalā cases need to be defined as variants. NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
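The label-level blocking described above can be illustrated by enumerating the variant labels of a given label. The sketch below assumes, as this section concludes, that র and ৰ form one variant set at the code-point level; the function name is hypothetical, and actual LGR tools compute variant dispositions from the XML rules rather than from this code.

```python
from itertools import product

SWAP = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}  # র <-> ৰ


def variant_labels(label: str) -> set[str]:
    """All labels obtained by swapping any non-empty subset of the RA code points."""
    options = [(ch, SWAP[ch]) if ch in SWAP else (ch,) for ch in label]
    labels = {"".join(combo) for combo in product(*options)}
    labels.discard(label)  # the registered label itself is not its own variant
    return labels


if __name__ == "__main__":
    blocked = variant_labels("\u09B0\u09B0\u09B0")  # ররর
    print(len(blocked), "\u09F0\u09F0\u09F0" in blocked, "\u09F0\u09F0" in blocked)
```

For ররর this yields seven blocked variants, including ৰৰৰ, while ৰৰ is unaffected because it is a different label. Since the count grows as 2^n - 1 in the number of RAs, a registry would more plausibly compare a canonical folded form than enumerate every variant.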

6.2 Cross Script Variants
A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.


1 Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2 Bangla and Gurmukhi Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
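Tables 14 and 15 can be read as a per-code-point mapping. The sketch below assumes that a cross-script variant label exists only when every code point of the Bangla label has a counterpart in the target script. How the integrated Root Zone LGR actually combines scripts is not described in this section. The table and function names are illustrative; "Deva" and "Guru" are the ISO 15924 codes for the two target scripts.

```python
from typing import Optional

# Cross-script variant pairs from Tables 14 and 15 (Bangla -> target script).
CROSS_SCRIPT = {
    "Deva": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Guru": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},  # ম -> ਸ, ি -> ਿ
}

def cross_script_variant(label: str, script: str) -> Optional[str]:
    """Look-alike label in `script`, or None if any code point lacks a variant."""
    table = CROSS_SCRIPT[script]
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None

print(cross_script_variant("\u09AE\u09BF", "Deva"))  # मि
print(cross_script_variant("\u09AE\u09BF", "Guru"))  # ਸਿ
print(cross_script_variant("\u0995\u09BF", "Deva"))  # None: ক has no cross-script variant
```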

7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table under Code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters". These could theoretically conjoin from two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters": bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the various restrictions, below are the specific WLE rules that need to be implemented.


1. C is a set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
Example: ক্
3. M must be preceded by C.
Example: কা
4. D must be preceded by either V, C or M.
Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D.
Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D.
Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P.
Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M, so Z may also follow S.)
8. V CANNOT be preceded by H. Details are in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
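The rules above can be prototyped as a left-to-right check over category symbols. This is a simplified sketch, not the normative XML LGR, and all function names are hypothetical. The limitations are:

- The code point ranges used for C, V and M are approximations that include a few unassigned positions.
- S covers only the Ya-phalā form অ্যা / এ্যা. S1 and S2 from Table 9 are not reproduced in this section, so they are omitted.

```python
import unicodedata

RA = ("\u09B0", "\u09F0")            # র, ৰ (C2 in the definition of P)
YA_PHALA_AA = "\u09CD\u09AF\u09BE"   # ্ য া (H C1 M1 in the definition of S)

def category(ch: str) -> str:
    """Simplified Table 16 category of one code point ('?' = not covered here)."""
    cp = ord(ch)
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09DC, 0x09DD, 0x09DF, 0x09F0, 0x09F1):
        return "C"
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x09BE <= cp <= 0x09CC or cp == 0x09D7:
        return "M"
    return {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}.get(cp, "?")

def tokenize(label: str) -> list[tuple[str, str]]:
    """Split an NFC-normalized label into (category, text) tokens."""
    text = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in ("\u0985", "\u098F") and text.startswith(YA_PHALA_AA, i + 1):
            tokens.append(("S", text[i:i + 4]))      # অ্যা / এ্যা as one unit
            i += 4
            continue
        if ch == "\u09BC" and tokens and tokens[-1][1] in ("\u09A1", "\u09A2", "\u09AF"):
            tokens[-1] = ("C", tokens[-1][1] + ch)   # CN: ড়, ঢ়, য় decomposed by NFC
        else:
            tokens.append((category(ch), ch))
        i += 1
    return tokens

# Rules 2-7: categories allowed immediately before each symbol (S ends with M).
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "S"},
}

def passes_wle(label: str) -> bool:
    tokens = tokenize(label)
    for k, (cat, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k > 0 else None
        if cat == "?":
            return False
        if cat in ("V", "S") and prev == "H":        # rules 8 and 9
            return False
        if cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            # Rule 7 also allows P (র/ৰ + Hasanta) before Z.
            is_p = cat == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA
            if not is_p:
                return False
    return True
```

Results on sample labels:

- Accepted: passes_wle("অ্যাসিড") and passes_wle("ভর্ৎসনা") return True.
- Rejected: ্ক, কীী, কংঃ and ক্অ, which matches the examples given in 7.2.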

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage. There is no harm in adding such ligatures, because any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋkʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning "Bank of India". Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound of the first word.


This is a unique situation, necessitated by the absence of hyphen, space and the Zero Width Non-Joiner character from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting this may create a perceptible similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
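To make the example concrete, the sketch below rebuilds the "Bank of India" label from the code points listed above and locates where rule 8 fires. It also shows that the H-less spelling favoured by most of the community raises no such conflict. The helper names are hypothetical, and the vowel-letter range is simplified.

```python
import unicodedata

HASANTA = "\u09CD"

def is_vowel_letter(ch: str) -> bool:
    """Independent vowel letters অ..ঔ (U+0985-U+0994, simplified range)."""
    return 0x0985 <= ord(ch) <= 0x0994

def vowel_after_hasanta(label: str) -> list[tuple[int, str]]:
    """(index, Unicode name) of every independent vowel that directly follows Hasanta."""
    return [(i, unicodedata.name(label[i])) for i in range(1, len(label))
            if label[i - 1] == HASANTA and is_vowel_letter(label[i])]

# ব্যাঙ্কঅফ্ইন্ডিয়া, built from the code point sequence given in the text.
with_h = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985"
          "\u09AB\u09CD\u0987\u09A8\u09CD\u09A1\u09BF\u09DF\u09BE")
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")  # drop H after ফ

print(vowel_after_hasanta(with_h))     # [(10, 'BENGALI LETTER I')]
print(vowel_after_hasanta(without_h))  # []
```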

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help to explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word.
Example:
্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक
As can be seen, such combinations automatically result in a "golu" (a dotted circle), marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.
Example:
অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. Only one B, D or X is permitted after a consonant, vowel or kāra (Mātrā), so the following combinations are invalid.
Example:
কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. Only one M is permitted after a consonant.
Example: কীী कीी
5. M is not permitted after V.
Example: ইা ঈৗ ईा ईौ
6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Example:
কংঃ कंः
কঃং कःं
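The restrictions illustrated in this section can also be sketched as regular expressions over NFC-normalized text. The pattern names and character classes are assumptions made for illustration. Only the Bangla side is checked, and the classes are simplified ranges. Rule 2 exempts অ্যা and এ্যা, which Appendix-I explicitly allows.

```python
import re
import unicodedata

H = "\u09CD"                # ্ Hasanta
M = "\u09BE-\u09CC\u09D7"   # dependent vowel signs (used inside [...])
BDX = "\u0981-\u0983"       # ঁ ং ঃ
V = "\u0985-\u0994"         # independent vowel letters

RESTRICTIONS = [
    ("1: starts with H, M, B, D or X", re.compile(f"^[{H}{M}{BDX}]")),
    # S ends with M, so "H after M" also covers "H after S".
    ("2: H after V, B, D, X, M or S",
     re.compile(f"[{BDX}{M}]{H}|(?![\u0985\u098F]{H}\u09AF\u09BE)[{V}]{H}")),
    ("3: B, D or X repeated", re.compile(f"([{BDX}])\\1")),
    ("4: more than one M", re.compile(f"[{M}]{{2,}}")),
    ("5: M after V", re.compile(f"[{V}][{M}]")),
    ("6: Anusvara next to Visarga", re.compile("\u0982\u0983|\u0983\u0982")),
]

def violations(label: str) -> list[str]:
    """Names of the illustrated restrictions that the label breaks."""
    text = unicodedata.normalize("NFC", label)
    return [name for name, pattern in RESTRICTIONS if pattern.search(text)]

for sample in ("\u0995\u09BE", "\u0995\u0982\u0982", "\u0987\u09BE", "\u0995\u0982\u0983"):
    print(sample, violations(sample))
```

NFC normalization comes first so that, for example, ে + া typed separately is treated as the single sign ো. Without it, rule 4 would wrongly fire on that sequence.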

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka


9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan o Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2011. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti o Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'Phones of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What Has Happened So Far in Terms of Script Reforms'. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, so when it is applied to a specific language/script, certain restriction rules apply. In other words, some of the Formalism structures do not necessarily apply in a given language. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not normally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C + H can be followed by Khanda Ta only when C is ra (র, 09B0):
র্ৎ, as in ভর্ৎসনা

3. Only the following combinations of V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (also pronounced together as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
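The sketch below checks the three restrictions above and assumes NFC-normalized input. The function name and messages are hypothetical. The initial য় discussed under rule 1 is deliberately not flagged, because the text treats it as unusual but attested rather than forbidden.

```python
import unicodedata

HASANTA, KHANDA_TA = "\u09CD", "\u09CE"
NOT_INITIAL = {"\u09CE", "\u099E", "\u0999"}   # ৎ, ঞ, ঙ (rule 1)
RA = {"\u09B0", "\u09F0"}                       # র, ৰ (rule 2)
VOWEL_LETTERS = {chr(cp) for cp in range(0x0985, 0x0995)}
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা

def appendix_violations(label: str) -> list[str]:
    text = unicodedata.normalize("NFC", label)
    problems = []
    if text and text[0] in NOT_INITIAL:
        problems.append("rule 1: initial " + unicodedata.name(text[0]))
    for i, ch in enumerate(text):
        # Rule 2: C + H + ৎ only when C is ra.
        if ch == KHANDA_TA and i >= 2 and text[i - 1] == HASANTA and text[i - 2] not in RA:
            problems.append(f"rule 2: C + H + KHANDA TA at index {i}")
        # Rule 3: V + H only inside অ্যা or এ্যা.
        if ch in VOWEL_LETTERS and text[i + 1:i + 2] == HASANTA and text[i:i + 4] not in ALLOWED_VHCM:
            problems.append(f"rule 3: V + H at index {i}")
    return problems

print(appendix_violations("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> []
print(appendix_violations("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))        # ৎসেরিং -> rule 1
```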

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924), which was used by accountants (perhaps of the Kāyastha community) in Eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. Sylheti Nagarī or Siloti (129) has had several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed and judged to be either (a) distinguishable, or (b) confusable but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points


11 Appendix-II: Bengali consonants and their allographs

Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ: ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts ঙ is replaced by ং, e.g. কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s & ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার etc.). After a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক + য), স্য (স + য), র‍্য (র + য) [র‍্য on its own is never used in Bangla orthography. র‍্যা is used, but then its last two symbols, ya-phala and a-kara, together constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:
i. রেফ (repʰa), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র + ধ + ব) (earlier র্দ্ধ্ব = র + দ + ধ + ব, a four-term cluster), etc. (placed over the following consonant)
ii. র-ফলা (rɔ-pʰɔla), as the second/third member of a cluster, e.g. প্র, ক্র, etc. (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


Devanāgarī, and for Santali one uses both Nāgarī/Devanāgarī and Ol-chiki (145). For the purposes of the Bangla LGR, only languages at levels 1 to 4 of the EGIDS scale have been considered. Consider the following table:

EGIDS Scale 1 | EGIDS Scale 2 | EGIDS Scale 3 | EGIDS Scale 4 | EGIDS Scale 5 | EGIDS 6

Bangla (Bengali)
Santali, Bodo, Riang, Khumi, Mru(ng), Asho
Lepcha, Pnar, Koda, Kora, Chak
Asamiyā (Assamese)
Koch or Rajabansī
Malto or Malpahariya
Manipuri or Meitei
Bisnupriya Manipuri, Kok-Borok (Tripura & Bangladesh)
Chakma, Hajong, Mundari & Kurux (of Bangladesh)
Toto, Rohingya, Tippera, Megam, Tanchangya
Usoi, Limbu, Sadri or Oraon
Bhumij or Mundari, Bawm Chin

Table 4: Main languages in India and Bangladesh that use Bangla Script on the EGIDS Scale

3.3 Notable Features of Bangla Script [150]
The Bangla writing system has certain features that show how it is to be written and how typesetting in Bangla can be done. This section is followed by a section explaining the code points (and fixed code point sequences) that show distinctive characteristics of Bangla and make up the repertoire. The subsequent sections also cover the 'akshar'-formation rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and Context Rules, as well as In-Script and Cross-Script Variants. Here we present some basic features of the script and its pronunciation. The Bangla script is an alpha-syllabic writing system in which all consonants are assumed to carry an accompanying 'inherent' vowel (theoretically before or after each consonant). This vowel varies between ɔ and o depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].

Vowels can be written as independent letters, or by using a variety of diacritical marks. These marks are written above, below, before or after the consonant they follow in pronunciation, or in both of the last two positions [105].

All Bangla consonants, when pronounced in isolation, are uttered with an inherent vowel -ɔ. Hence ক 'k', খ 'kh' or গ 'g' are usually pronounced as [kɔ], [khɔ] or [gɔ] etc. Phonologically, the Bangla vowel -ɔ corresponds to the Hindi schwa ə.

When consonants occur together in clusters, special conjunct letters are formed, and many of these consonantal clusters (conjoined consonants) are in use in printed Bangla. The letters for the consonants other than the final one in the group are generally reduced. A few special conjunct characters, however, are compounds of the consonant characters, e.g. ক (k) + ষ (s) = ক্ষ (ks), ঞ (n) + জ (j) = ঞ্জ (nj), জ (j) + ঞ (n) = জ্ঞ (jn), হ (h) + ম (m) = হ্ম (hm).

There are other issues as well. র as the second member of a cluster is reduced to a secondary symbol, e.g. প (p) + র (r) = প্র (pr), and ষ (s) + ট (t) + র (r) = ষ্ট্র (str) (as in উষ্ট্র ustra "camel").

য (y), when used as a primary symbol, represents jɔ in Bangla, but its secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল (syamala) "green", র‍্যাপার (ryapara) "wrapper" etc.). After a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry).

In the case of দ (d) + ধ (dh), গ (g) + ধ (dh) and ন (n) + ধ (dh), the shape of the second member changes, giving দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র (r) + ঋ (r) = র্ঋ (as in নৈর্ঋত nairrt "Southwest") is used mostly in classical borrowings. It shows the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.

In consonant clusters, many consonants take a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (ks), গ্ধ (gdh), জ্ঞ (jn), ঞ্চ (nc), ঞ্জ (nj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st) etc. Apart from its full shape, র has two allographs: one is 'repha', as in র্ক (rk) and র্প (rp), and the other is ra-phala, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another case, in which the cerebral nasal consonant sign takes an unusual shape [151].

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them). These correspond to forty-four phonemes, or functional speech sounds (7 oral and 7 nasal vowels and 30 consonants) [150], with some obvious redundancies. In one of the first phonemic analyses, the number was thought to be thirty-five phonemes [140].


As mentioned above, several graphemic symbols in Bangla have secondary shapes, technically called 'allographs', with a complementary distribution in each case. These graphs or markings are generally added at the following positions relative to the primary symbol [113]:

1) Below (e.g. কু (ku), ন্ত (nta), হ্র (hra) etc.)
2) Above (e.g. চঁ (ca), র্ক (rka) etc.)
3) Right side (e.g. কা (ka), কং (kan) etc.)
4) Left side (e.g. কে (ke))
5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki) etc.)
6) Right side and above simultaneously (e.g. কী (kī))
7) Right side and left side simultaneously (e.g. কো (ko))
8) Right side, left side and above simultaneously (e.g. কৌ (kau))

Let us now consider the complementary distribution of vowel letters (word- or syllable-initial) and vowel matras, which is relevant for the ABNF. Besides some simple vowel modifiers called 'Kars' in Bangla (also referred to as Matra in the other Neo-Brāhmī LGR documents), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, marked below the primary grapheme,

there are some special vowel modifiers of উ, as in the following combined letters:

গু (gu), written with a special ligated shape rather than as গ (g) + ু (u)
রু (ru), rather than র (r) + ু (u)
শু (śu), rather than শ (ś) + ু (u)
হু (hu), rather than হ (h) + ু (u)
ন্তু (ntu), rather than ন (n) + ত (t) + ু (u)

Similarly, there can be vowel modifiers of ঊ or '(long) ū' as well, e.g. ভ্রূ (bh + r + ū, as in bhrū "eyebrow"), শ্রূ (ś + r + ū, śrū), and ঋ (ṛ) after হ (h), giving হৃ (hṛ), etc.
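The substitution of vowel letters by vowel signs can be written as a lookup table. The sketch below assumes labels are stored in Unicode logical order. Two points follow from that:

- The pre-posed ি is still stored after its consonant.
- The special shapes of গু, রু, শু and হু are chosen by the font. In stored text they are simply the consonant followed by U+09C1.

The table also lists ঊ, ঋ, এ, ঐ, ও and ঔ, which the text does not enumerate; those pairs are the standard Unicode Bengali vowel signs. The names VOWEL_SIGN and with_vowel are hypothetical.

```python
import unicodedata

# Independent vowel letter -> dependent vowel sign (kāra) used after a consonant.
VOWEL_SIGN = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি (rendered before the consonant, stored after it)
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}

def with_vowel(consonant: str, vowel_letter: str) -> str:
    """Stored (logical-order) form of consonant + vowel; অ is the inherent vowel."""
    if vowel_letter == "\u0985":
        return consonant
    return consonant + VOWEL_SIGN[vowel_letter]

# গু, রু, শু, হু: whatever shape the font draws, each is two code points.
for c in ("\u0997", "\u09B0", "\u09B6", "\u09B9"):
    s = with_vowel(c, "\u0989")
    print(s, [unicodedata.name(ch) for ch in s])
```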

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These efforts have tried to reduce the number of allographs of both vowels and consonants in clusters. The results have been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, coexist and are used in different domains.

However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems.

Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992. These followed the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Śikṣākrama and Pāṭhyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012⁶.

After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), gave a direction in his inaugural address for the standardization of the Bangla alphabet, script and spelling system. He clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the 'homogenization of Bangla spelling' by 1988. Based on opinions received from different quarters, a unanimous list of 'rules' was agreed upon. It was published as a 'Spelling Dictionary' titled Ākādemi Bānāna Abhidhāna (1997), which was considerably more comprehensive than 'The University of Calcutta proposals' made in 1936.

Along with the 'rationalization' of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined, more 'transparent'. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153]. He used the terms Swaccha ('Transparent') and Aswaccha ('Opaque', or non-transparent), and even added Ardha Swaccha ('half transparent') in between the two. Some examples are given below.

6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.


Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
Opaque: clusters where neither of the two members can be (easily) recognized, e.g. ক্ষ ks (ক k + ষ s), জ্ঞ jn (জ j + ঞ n), ঙ্গ ṅg (ঙ ṅ + গ g), হ্ম hm (হ h + ম m).
Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র str (স s + ত t + র r), ষ্ট্র ṣṭr (ষ ṣ + ট ṭ + র r), etc.

There were, in fact, two types of proposals. One concerned the shape of the letters: consonant + vowel (CV) combinations and conjuncts (consonant + consonant combinations). There were also more complex shapes, i.e. consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols were complex enough to make learning them difficult for children. The other type dealt only with the spellings of words, without any reference to the shapes of the letters in which they were written. The basic objective here was 'one word, one spelling' to the greatest extent possible [151].

Below is a statement of the most salient changes affecting consonant + vowel combinations [153]:

a. The variants of the short u vowel sign (হ্রস্ব উ-কার, hrasva u-kāra) have been reduced to one, i.e. ু. So গু (gu) is now written as গ + ু. Similarly, রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু. Therefore, for a cluster + short u sign: ন্তু (ntu) > ন্ত + ু (ন + ্ + ত + উ), and স্তু (stu) > স্ত + ু (স + ্ + ত + উ).

b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced: রূ (rū) > র + ূ, ভ্রূ (bhrū) > ভ্র + ূ (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) > দ্র + ূ (দ d + ্ + র r + ঊ ū), শ্রূ (śrū) > শ্র + ূ (শ ś + ্ + র r + ঊ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been reduced to one: হৃ (hṛ) > হ + ৃ.

For consonant + consonant + (consonant) … + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes, to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it and the Bangla Akademi innovation follows it) [153]:

ব্ধ bdh (ব b + ধ dh), দ্ধ ddh (দ d + ধ dh), ন্থ nth (ন n + থ th), ঞ্চ ñc (ঞ ñ + চ c), ঞ্ছ ñch (ঞ ñ + ছ ch), ঞ্জ ñj (ঞ ñ + জ j), ক্ত kt (ক k + ত t), ক্র kr (ক k + র r), গ্ধ gdh (গ g + ধ dh), ঙ্ক ṅk (ঙ ṅ + ক k), ঙ্গ ṅg (ঙ ṅ + গ g), ষ্ণ ṣṇ (ষ ṣ + ণ ṇ), ন্ধ্র ndhr (ন n + ধ dh + র r), ণ্ড্র ṇḍr (ণ ṇ + ড ḍ + র r), ক্ত্র ktr (ক k + ত t + র r)
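Traditional and transparent cluster shapes are visual alternatives of the same letters. In plain Unicode text, they can only be told apart by control characters. The sketch below assumes the Unicode convention of placing ZERO WIDTH NON-JOINER (U+200C) after the Hasanta to request a visible-Hasanta (non-ligated) rendering. Whether the Akademi's transparent shapes are produced this way depends on the font, so this is an assumption. Because ZWNJ is not in the root zone repertoire, both spellings reduce to the same label. The helper names are hypothetical.

```python
import unicodedata

HASANTA, ZWNJ = "\u09CD", "\u200C"

def conjunct(c1: str, c2: str) -> str:
    """Default stored form; fonts normally draw a conjunct ligature."""
    return c1 + HASANTA + c2

def transparent(c1: str, c2: str) -> str:
    """Request a non-ligated rendering: visible Hasanta on c1, then c2."""
    return c1 + HASANTA + ZWNJ + c2

def root_zone_form(label: str) -> str:
    """Drop ZWNJ, which the root zone repertoire does not permit."""
    return unicodedata.normalize("NFC", label).replace(ZWNJ, "")

c = conjunct("\u09AC", "\u09A7")      # ব্ধ (bdh)
t = transparent("\u09AC", "\u09A7")   # same letters, visible Hasanta
print(c == t)                         # False: different code point sequences
print(root_zone_form(t) == c)         # True: identical once ZWNJ is removed
```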

3.3.1 The Consonants
In the traditional classification, Bangla consonants are categorized by their phonetic properties, especially place and manner of articulation [107]. There are five 'Varga' (pronounced 'Barga' in Bangla) groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each Varga corresponds to the stops at one place of articulation. It contains a series of five consonants ordered by phonetic quality (i.e. manner of articulation): from unvoiced unaspirated, through voiced aspirated (in the fourth column), to a homorganic (corresponding) nasal at the end [107]. Consider the following table:

‘Varga’ or Set | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | ক ‘K’ U+0995 | খ ‘KH’ U+0996 | গ ‘G’ U+0997 | ঘ ‘GH’ U+0998 | ঙ ‘NG’ U+0999
Palatal | চ ‘C’ U+099A | ছ ‘CH’ U+099B | জ ‘J’ U+099C | ঝ ‘JH’ U+099D | ঞ ‘NY’ U+099E
Retroflex | ট ‘TT’ U+099F | ঠ ‘TTH’ U+09A0 | ড ‘DD’ U+09A1 | ঢ ‘DDH’ U+09A2 | ণ ‘NN’ U+09A3
Dental | ত ‘T’ U+09A4 | থ ‘TH’ U+09A5 | দ ‘D’ U+09A6 | ধ ‘DH’ U+09A7 | ন ‘N’ U+09A8
Bilabial | প ‘P’ U+09AA | ফ ‘PH’ U+09AB | ব ‘B’ U+09AC | ভ ‘BH’ U+09AD | ম ‘M’ U+09AE

Table 5: Varga classification of Bangla consonants

(Falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasal consonants, called the five ‘Vargas’)

Non-Varga | য ‘Y’ U+09AF | য় ‘YY’ U+09DF | র ‘R’ U+09B0 | ড় ‘RR’ U+09DC | ঢ় ‘RH’ U+09DD
Non-Varga (cont.) | ল ‘L’ U+09B2 | শ ‘SH’ U+09B6 | ষ ‘SS’ U+09B7 | স ‘S’ U+09B8 | হ ‘H’ U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)

3.3.2 The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)

As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The term ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and the term ‘Virāma’⁷ (= ‘Da ri’ in Bangla), as preferred in Unicode (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in Unicode in a sense that differs from its traditional grammatical definition, and hence it requires some explanation here. Considering the importance of the document, this note should be a part of this LGR document, so that anybody referring to it is able to know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one could, in common parlance, ‘kill’ its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants. In rare cases this process can join up to five consonants. However, the maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically. This is an observation based on the CIIL-Emille corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that one may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
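The following Python sketch illustrates how such conjuncts are stored: not as separate characters, but as consonants joined by U+09CD. The helper names `make_conjunct` and `max_cluster_size` are hypothetical, and the consonant set is assumed to be the consonant rows of Table 8. In line with the decision above, the function only reports the longest cluster and enforces no limit.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)
NUKTA = "\u09BC"    # BENGALI SIGN NUKTA, part of ড় ঢ় য়

# Consonant letters of the proposed repertoire (assumed from Table 8).
CONSONANTS = {
    chr(cp)
    for cp in [
        *range(0x0995, 0x09A9),  # KA .. NA
        *range(0x09AA, 0x09B1),  # PA .. RA
        0x09B2,                  # LA
        *range(0x09B6, 0x09BA),  # SHA .. HA
        0x09F0, 0x09F1,          # RA WITH MIDDLE / LOWER DIAGONAL
    ]
}


def make_conjunct(*consonants: str) -> str:
    """Join consonants with Hasanta, e.g. ন, ত, র -> ন্ত্র."""
    return HASANTA.join(consonants)


def max_cluster_size(label: str) -> int:
    """Largest number of consonants joined by Hasanta anywhere in `label`.

    The value is only reported; no upper limit is enforced.
    """
    label = label.replace(NUKTA, "")  # treat ড় ঢ় য় like their base letters
    best = run = 0
    joined = False  # True right after consonant + Hasanta
    for i, ch in enumerate(label):
        if ch in CONSONANTS:
            run = run + 1 if joined else 1
            best = max(best, run)
            joined = False
        elif ch == HASANTA and i > 0 and label[i - 1] in CONSONANTS:
            joined = True
        else:
            # anything else (a kāra, a ZWNJ, ...) breaks the cluster
            joined = False
    return best


print(max_cluster_size(make_conjunct("ন", "ত", "র")))  # 3
```

Note that a Hasanta followed by ZWNJ does not extend a cluster in this sketch, mirroring the explicit-Hasanta behaviour described in Section 3.3.8.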

⁷ Virāma, as used here, is also a misnomer according to the Indian grammatical traditions: nowhere is the mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
⁸ This term needs to be disambiguated: akṣara also means ‘syllable’ in Indian grammatical traditions.

3.3.3 Vowels

Separate symbols exist for all ‘Swaras’, or vowels, in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called ‘kār’ in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

Vowel | Corresponding vowel sign (kāra / Mātrā)
অ ‘A’ U+0985 | -
আ ‘AA’ U+0986 | া U+09BE
ই ‘I’ U+0987 | ি U+09BF
ঈ ‘II’ U+0988 | ী U+09C0
উ ‘U’ U+0989 | ু U+09C1
ঊ ‘UU’ U+098A | ূ U+09C2
ঋ Vocalic ‘R’ U+098B | ৃ U+09C3
ৠ Vocalic ‘RR’ U+09E0 | ৄ U+09C4
ঌ Vocalic ‘L’ U+098C | ৢ U+09E2
ৡ Vocalic ‘LL’ U+09E1 | ৣ U+09E3
এ ‘E’ U+098F | ে U+09C7
ঐ ‘AI’ U+0990 | ৈ U+09C8
ও ‘O’ U+0993 | ো U+09CB
ঔ ‘AU’ U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ ‘A’ U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ ‘A’ U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ ‘A’ U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kārs

⁹ Although the term ‘Mātrā’ in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically available in Hindi and Bangla but missing in Gujarati.
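As a minimal sketch of Table 7 (assuming the mapping below, with hypothetical names `KARA` and `attach_vowel`), a consonant plus a non-inherent vowel is written by appending the corresponding kāra. In code point order the kāra always follows the consonant, even when it is displayed to its left, as ি and ে are.

```python
# Independent vowel -> dependent vowel sign (kāra), as listed in Table 7.
# অ (U+0985) has no kāra: it is the inherent vowel of every consonant.
KARA = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u09E0": "\u09C4",  # ৠ -> ৄ
    "\u098C": "\u09E2",  # ঌ -> ৢ
    "\u09E1": "\u09E3",  # ৡ -> ৣ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}
INHERENT = "\u0985"  # অ


def attach_vowel(consonant: str, vowel: str) -> str:
    """Write consonant + vowel in logical (code point) order."""
    if vowel == INHERENT:
        return consonant  # nothing to add: the vowel is implicit
    try:
        return consonant + KARA[vowel]
    except KeyError:
        raise ValueError(f"no kāra for U+{ord(vowel):04X}") from None


print(attach_vowel("ক", "ই"))  # কি (stored as U+0995 U+09BF)
```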

3.3.4 The Anusvāra onuʃʃār (ং U+0982)

The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it also often appears for such combinations with a non-velar as the last member, as in ল্যাংটা ‘naked’ or ল্যাংচা ‘a kind of sweet; to limp’. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.

3.3.5 Nasalization: Candrabindu (ঁ U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cā̃d ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].

3.3.6 Nukta (় U+09BC)

The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences <YA + Nukta, U+09AF + U+09BC>, <DDA + Nukta, U+09A1 + U+09BC> and <DDHA + Nukta, U+09A2 + U+09BC> instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.

It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example, 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each typed as a single keystroke), the IDNA protocol requires that we treat them as two-code-point sequences within the Unicode Bengali block. It may be noted that there could be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the sanctity of the loanword. This amounts to using the Bangla writing system like the IPA. It is, however, not in use in printed Bangla.
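The effect of the composition exclusions can be checked with Python's standard `unicodedata` module: normalizing each precomposed letter to NFC yields the base letter followed by U+09BC, which is why the LGR lists the two-code-point sequences. This is a quick illustration, not part of any LGR tooling.

```python
import unicodedata

# Precomposed letter -> the sequence NFC maps it to (composition exclusions).
PRECOMPOSED = {
    "\u09DC": "\u09A1\u09BC",  # ড় RRA -> ড DDA + nukta
    "\u09DD": "\u09A2\u09BC",  # ঢ় RHA -> ঢ DDHA + nukta
    "\u09DF": "\u09AF\u09BC",  # য় YYA -> য YA + nukta
}

for single, sequence in PRECOMPOSED.items():
    nfc = unicodedata.normalize("NFC", single)
    assert nfc == sequence, (single, nfc)
    # The two-code-point sequence is already NFC and is never recomposed.
    assert unicodedata.normalize("NFC", sequence) == sequence
    print(f"U+{ord(single):04X} -> " + " ".join(f"U+{ord(c):04X}" for c in nfc))
```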

3.3.7 Visarga biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)

The Visarga (biʃɔrgo, U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho ‘sorrow, unhappiness’ (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি). It is rarely used now, even in other languages using the Bangla script. In the case of the LGR, the Avagraha is not part of the repertoire: it has been decided not to retain Avagraha (ঽ, U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)

This note is pertinent to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these two signs in a different manner.


ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has the option between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) ‘Satyajit’ and সৎ (sat) ‘honest’. This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending on the context. E.g. র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য, one has to type র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words demanding ya-phalā after ra, which otherwise cannot be typed (rendered), because ra + hasanta + antastha ja has the same order in the medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent the explicit Hasanta or Halant. In order to avoid conjunct formation in cases where there is an explicit hasanta before the succeeding consonant, the ZWNJ is used:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The use of ZWJ and ZWNJ has been ruled out from the root zone by the [Procedure]. Since they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP.


The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented, and the Hasanta joining the two consonants that would otherwise form a conjunct needs to be shown explicitly.
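A short sketch of the sequences described above, written as Python string literals. The helpers `contains_joiner` and `strip_joiners` are hypothetical. Once the joiners are removed, as the root zone requires, the ZWJ spelling র‍্য collapses into the same code point sequence as the repha form র্য.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"

repha_ya = RA + HASANTA + YA           # র্য  repha over য, as in কার্য
ra_ya_phala = RA + ZWJ + HASANTA + YA  # র‍্য  ya-phalā after র, as in র‍্যাপার
# প্রাক্‌কথন: ZWNJ after the Hasanta keeps ক্ visible instead of forming ক্ক
prakkathan = "\u09AA\u09CD\u09B0\u09BE\u0995\u09CD" + ZWNJ + "\u0995\u09A5\u09A8"


def contains_joiner(label: str) -> bool:
    """True if the label uses ZWJ or ZWNJ (not allowed in the root zone)."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label: str) -> str:
    """Remove ZWJ and ZWNJ, leaving only the visible letters and signs."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


print(contains_joiner(ra_ya_phala), strip_joiners(ra_ya_phala) == repha_ya)
```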

3.3.9 Use of Ya-phalā

There are two instances in Bangla, both Ya-phalā sequences, where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা <0985 09CD 09AF 09BE>: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা <098F 09CD 09AF 09BE>: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the combination of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

3.3.10 Formation of Ra-phalā and Repha Sequences

This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra ‘cycle’). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR notes this concern about the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocked) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is distinguishable only by looking at their Unicode code points. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its code point values. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that are to follow. Moreover, it is noteworthy that the REPHA can also occur with KHANDA TA. The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
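A minimal sketch of how variant labels for the two RAs might be enumerated. The function `ra_variants` is hypothetical, and it assumes, for illustration only, that a swap between U+09B0 and U+09F0 is considered wherever the RA is immediately preceded or followed by U+09CD (covering both repha and ra-phalā); the authoritative contexts are those defined for Variant Set 4 in the XML LGR.

```python
from itertools import product

HASANTA = "\u09CD"
# Bangla RA <-> Assamese RA, treated here as (blocked) variants.
RA_SWAP = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}


def _next_to_hasanta(label: str, i: int) -> bool:
    before = i > 0 and label[i - 1] == HASANTA
    after = i + 1 < len(label) and label[i + 1] == HASANTA
    return before or after


def ra_variants(label: str) -> set:
    """All labels reachable by swapping র/ৰ where the RA touches a Hasanta."""
    positions = [i for i, ch in enumerate(label)
                 if ch in RA_SWAP and _next_to_hasanta(label, i)]
    variants = set()
    # Each swappable position is either kept or flipped: 2**n combinations.
    for flips in product((False, True), repeat=len(positions)):
        chars = list(label)
        for pos, flip in zip(positions, flips):
            if flip:
                chars[pos] = RA_SWAP[chars[pos]]
        variants.add("".join(chars))
    variants.discard(label)  # a label is not its own variant
    return variants


print(ra_variants("অর্ক"))  # the Assamese-RA spelling of অর্ক
```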

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in linguistics (especially NLP and computational linguistics), literature, language, history and epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to keep the fundamental philosophy behind building those LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.

4.1 Guiding Principles

The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of the protocol hierarchy, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart
Out of all the characters needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR being the repertoire of characters to be used for the creation of root zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on the IDNA's allowed set of characters.


Example: the Bangla sign Avagraha ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.

To sum up, the restrictions start off by admitting only those characters that are part of the code block of the given script/language. The IDNA protocol narrows this down further, and finally an additional filter, in the form of the Maximal Starting Repertoire, restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.

4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), i.e. VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
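The successive filtering can be pictured as repeated set intersection. In the sketch below, the IDNA and MSR sets are tiny hand-picked samples (an assumption made purely for illustration, not the real IDNA2008 or MSR tables); the point is only to show at which stage each candidate drops out.

```python
# Illustrative only: tiny hand-picked samples, NOT the real IDNA2008 or MSR tables.
BENGALI_BLOCK = range(0x0980, 0x0A00)          # U+0980..U+09FF
IDNA_PVALID_SAMPLE = {0x0995, 0x09BD, 0x09E6}  # KA, AVAGRAHA, DIGIT ZERO
MSR_SAMPLE = {0x0995}                           # KA

candidates = {
    0x0995: "BENGALI LETTER KA",
    0x09BD: "BENGALI SIGN AVAGRAHA",
    0x09E6: "BENGALI DIGIT ZERO",
    0x09FA: "BENGALI ISSHAR",
    0x0915: "DEVANAGARI LETTER KA",
}

filters = [
    ("Unicode Bengali block", lambda cp: cp in BENGALI_BLOCK),
    ("IDNA2008 PVALID", lambda cp: cp in IDNA_PVALID_SAMPLE),
    ("Maximal Starting Repertoire", lambda cp: cp in MSR_SAMPLE),
]


def apply_filters(codepoints):
    """Pass code points through the filters in order; record where each drops out."""
    remaining = set(codepoints)
    dropped = {}
    for name, keep in filters:
        for cp in sorted(remaining):
            if not keep(cp):
                dropped[cp] = name
        remaining = {cp for cp in remaining if keep(cp)}
    return remaining, dropped


remaining, dropped = apply_filters(candidates)
for cp, stage in sorted(dropped.items()):
    print(f"U+{cp:04X} {candidates[cp]}: removed by {stage}")
print("kept:", [f"U+{cp:04X}" for cp in sorted(remaining)])
```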

5 Repertoire

The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the disunification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd Aug 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the code point repertoire in Table 8 is valid for all three languages above. For each of the code points, language references are given in the last column, titled ‘References and Comment’, of Table 8, the ‘Code Point Repertoire’. For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under the Bangla Unicode points (as in Unicode 6.3). However, before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs in the code charts below are based on one of the many available fonts and are not prescriptive, because there could be some variation across actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block as in Figure 1, of the code points included in the MSR, the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire: Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment

1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125] U+09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125] U+09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125] U+09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122][125]

Table 8: Bangla Code Point Repertoire
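A minimal membership check against Table 8, assuming the code point ranges written out below (the names `SINGLES`, `NUKTA_SEQUENCES` and `in_repertoire` are hypothetical). Because U+09BC is excluded on its own (Table 10b), the three nukta letters are matched as two-code-point sequences. This checks the repertoire only, not the whole-label evaluation rules.

```python
# Single code points of Table 8 (the nukta letters are handled as sequences).
SINGLES = {
    chr(cp)
    for cp in [
        0x0981, 0x0982, 0x0983,                                    # ঁ ং ঃ
        *range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994,    # vowels
        *range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2,
        *range(0x09B6, 0x09BA),                                    # consonants
        *range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC,    # kāras
        0x09CD, 0x09CE, 0x09F0, 0x09F1,  # hasanta, khanda ta, RA variants
    ]
}
# Rows 28, 30 and 43: base letter + U+09BC NUKTA (never used alone).
NUKTA_SEQUENCES = {"\u09A1\u09BC", "\u09A2\u09BC", "\u09AF\u09BC"}

assert len(SINGLES) + len(NUKTA_SEQUENCES) == 64  # rows 1-64 of Table 8


def in_repertoire(label: str) -> bool:
    """True if every code point (or nukta pair) of `label` is in Table 8."""
    i = 0
    while i < len(label):
        if label[i:i + 2] in NUKTA_SEQUENCES:
            i += 2
        elif label[i] in SINGLES:
            i += 1
        else:
            return False
    return True


print(in_repertoire("বাংলা"), in_repertoire("\u09BD"))  # True False
```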

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, in turn followed by Bangla VOWEL SIGN AA, so as to represent the æ sound as in English ‘bat’, ‘cat’, etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference


S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
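The following sketch shows one way such a conditional inclusion might be checked, assuming that a Hasanta directly after a vowel letter is acceptable only as part of S1 or S2. The function name `hasanta_after_vowel_ok` is hypothetical; the normative rule is the one expressed in the XML LGR.

```python
HASANTA = "\u09CD"
VOWEL_LETTERS = {chr(cp) for cp in
                 [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}
# Table 9: the only places where a Hasanta may follow a vowel letter.
ALLOWED_SEQUENCES = {
    "\u0985\u09CD\u09AF\u09BE",  # S1 অ্যা
    "\u098F\u09CD\u09AF\u09BE",  # S2 এ্যা
}


def hasanta_after_vowel_ok(label: str) -> bool:
    """Reject a Hasanta after a vowel letter unless it is part of S1 or S2."""
    for i, ch in enumerate(label):
        if ch == HASANTA and i > 0 and label[i - 1] in VOWEL_LETTERS:
            # the vowel, Hasanta, YA and AA must all be present, in order
            if label[i - 1:i + 3] not in ALLOWED_SEQUENCES:
                return False
    return True


print(hasanta_after_vowel_ok("অ্যাসিড"), hasanta_after_vowel_ok("আ্যা"))  # True False
```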

5.2 Code Point Repertoire: Exclusion

There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire in this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only in sequences, for the three special characters ড়, ঢ় and য় in normalized form.

Unicode Code Point | Glyph | Character Name | Reason for Exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points

5.4 The Basis of Present IDN

The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) begins with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The ABNF will use the following operators:

Sr. No.   Operator   Function
1         "|"        Alternative
2         "[ ]"      Optional
3         "*"        Variable Repetition
4         "( )"      Sequence Group

Table 11: The ABNF Formalism
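To apply the ABNF to real labels, each code point first has to be mapped to its variable. The Python sketch below assumes the assignment implied by Table 8 and Section 5.4.1, with the nukta folded into ড, ঢ or য; the names `CATEGORY` and `classify` are hypothetical.

```python
VOWELS = [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]
CONSONANTS = [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2,
              *range(0x09B6, 0x09BA), 0x09F0, 0x09F1]
KARAS = [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC]

# Code point -> ABNF variable (Section 5.4.1).
CATEGORY = {chr(cp): "V" for cp in VOWELS}
CATEGORY.update({chr(cp): "C" for cp in CONSONANTS})
CATEGORY.update({chr(cp): "M" for cp in KARAS})
CATEGORY.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
                 "\u09CD": "H", "\u09CE": "Z"})

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড ঢ য -> ড় ঢ় য়


def classify(label: str) -> str:
    """Return the ABNF variable string of a label, e.g. বাংলা -> 'CMBCM'."""
    out = []
    for i, ch in enumerate(label):
        if ch == NUKTA and i > 0 and label[i - 1] in NUKTA_BASES:
            continue  # the nukta is part of the preceding consonant
        if ch not in CATEGORY:
            raise ValueError(f"U+{ord(ch):04X} is outside the repertoire")
        out.append(CATEGORY[ch])
    return "".join(out)


print(classify("বাংলা"))  # CMBCM
```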


5.4.2 The Vowel Sequence

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in the document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences such as an anusvara followed by a chandrabindu on the one hand, and a visarga followed by a chandrabindu on the other, as in হ্যাংঁচা ‘hænchā’ and হ্যাঃঁ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF Bangla letter য or ‘ya’, represented by the sequence <U+09CD BENGALI SIGN VIRAMA (the Bangla Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the ‘a’ of the English word ‘bat’, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX], or by a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].


Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
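Read as a pattern over the ABNF variables, the vowel sequence can be sketched as a regular expression. This assumes only the combinations enumerated in Sections 5.4.2.2 and 5.4.2.3; the pattern works on category strings (such as "VDB") rather than on Bangla text, and the name `is_vowel_sequence` is hypothetical. It does not check that a VHCM sequence uses exactly অ/এ, য and া (Table 9).

```python
import re

# B, D, X, DB or DX: the optional tail of Sections 5.4.2.2 and 5.4.2.3.
TAIL = r"(?:D?[BX]|D)"
# A vowel, optionally followed by HCM (Ya-phalā + ā-kāra), then the tail.
VOWEL_SEQUENCE = re.compile(rf"V(?:HCM)?{TAIL}?")


def is_vowel_sequence(categories: str) -> bool:
    """True if a string of ABNF variables is a valid vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(categories) is not None


for s in ["V", "VDB", "VHCMDX", "VDD", "VXX"]:
    print(s, is_vowel_sequence(s))
```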

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

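5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama): (CH)nC, where n is 1 to 3.

Example:

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य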

5.4.3.5 Subsets

While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] (CH)nCM may further be followed by B, D, X, DB or DX.

Example:

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
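The consonant-sequence shapes of 5.4.3.1 to 5.4.3.5 can likewise be written as one pattern over symbols. The Python sketch below assumes a small, hypothetical category table limited to the characters in the examples; it allows a single C with H, or with an optional M and one optional sign combination, and a conjunct of at most four consonants with an optional M and sign combination.

```python
import re

# Hypothetical, minimal category table for the examples in 5.4.3.
CATEGORY = {
    "\u0995": "C", "\u09A4": "C", "\u09A8": "C",  # ক ত ন
    "\u09AF": "C", "\u09B0": "C",                  # য র
    "\u09BE": "M", "\u09BF": "M", "\u09C0": "M",  # া ি ী
    "\u0982": "B", "\u0981": "D", "\u0983": "X",  # ং ঁ ঃ
    "\u09CD": "H",                                 # ্ hasanta
}

SIGNS = r"(?:DB|DX|B|D|X)?"
CONSONANT_SEQUENCE = re.compile(
    r"C(?:H|M?" + SIGNS + r")"      # 5.4.3.1 to 5.4.3.3
    r"|(?:CH){1,3}CM?" + SIGNS      # 5.4.3.4 and 5.4.3.5: up to 4 consonants
)


def is_consonant_sequence(text: str) -> bool:
    """Return True if text matches one of the consonant-sequence shapes."""
    symbols = "".join(CATEGORY.get(ch, "?") for ch in text)
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


print(is_consonant_sequence("\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF"))  # ন্ত্র্য: True
print(is_consonant_sequence("\u0995\u09CD\u0995\u09C0\u0982"))              # ক্কীং: True
print(is_consonant_sequence("\u0995\u0982\u0982"))                          # কংং: False
```

The upper bound of three CH pairs reflects the "up to 4" consonants stated in 5.4.3.4.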

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ (= ত্)

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.

Note: The condition for KHANDA TA in this context is that the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
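A minimal Python sketch of this contextual condition, assuming a label is given as a plain string of code points. It checks only the [CH]Z condition of 5.4.4.2 and none of the other rules in Section 7.

```python
RA_LETTERS = {"\u09B0", "\u09F0"}  # র (Bangla RA), ৰ (Assamese RA)
HASANTA = "\u09CD"                 # ্
KHANDA_TA = "\u09CE"               # ৎ


def khanda_ta_context_ok(label: str) -> bool:
    """If a Khanda Ta follows a Hasanta, the consonant before that
    Hasanta must be RA (U+09B0 or U+09F0)."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_LETTERS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ: False
```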

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the [æ] sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant. Only consonants can carry the Hasanta mark. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case concerns the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক = ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র = ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.

6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes the confusable cases into two groups:

Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from character formations normally perceived by the larger linguistic community.

For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are hence proposed as variants.

Variants arise in a script when two or more forms look the same but are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. Nevertheless, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
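There is a further reason why the two spellings of ko need not be handled as variants: IDNA2008 requires labels to be in Unicode Normalization Form C, and U+09CB BENGALI VOWEL SIGN O has the canonical decomposition <U+09C7, U+09BE>. Assuming labels are normalized as IDNA2008 requires, the following check with Python's standard `unicodedata` module shows that the two-key spelling composes to the one-key spelling.

```python
import unicodedata

KA = "\u0995"        # ক
E_KARA = "\u09C7"    # ে BENGALI VOWEL SIGN E
AA_KARA = "\u09BE"   # া BENGALI VOWEL SIGN AA
O_KARA = "\u09CB"    # ো BENGALI VOWEL SIGN O

two_keys = KA + E_KARA + AA_KARA  # ko typed as e-kara + aa-kara
one_key = KA + O_KARA             # ko typed with the single o-kara

# NFC composes e-kara + aa-kara into o-kara; NFD splits it back.
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True
```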

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I

As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 U+09CD U+09A5) versus স + Hasanta + হ (U+09B8 U+09CD U+09B9)
2. ন + Hasanta + থ (U+09A8 U+09CD U+09A5) versus ন + Hasanta + হ (U+09A8 U+09CD U+09B9)

If written in traditional orthography, the above combinations could be a little confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example 1). It could also be typed wrongly as হ (ha, U+09B9) by someone who takes it for a ha, increasing the risks in label writing and identification.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

Fonts that represent the traditional Bangla writing system tend to create this problem. Therefore these may be taken as cases of variants in Bangla.

CASE II

Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode code points and 'B_RA' as well as 'A_RA' refer to the same phonetic element. The final ligatures look the same and will be as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র), or A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with repha look identical; the choice of RA makes no visible difference once repha is formed. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. But this will create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. However, blocking applies only at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
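The consequence described above (registering one label blocks its র/ৰ counterparts of the same shape, but not labels of a different length) can be illustrated with a short Python sketch. It assumes a simple substitution model in which every র or ৰ in a label may be swapped independently and every resulting label is treated as blocked; the actual variant dispositions are governed by the LGR XML, not by this code.

```python
from itertools import product

# Variant set from Section 6.1, Case II: Bangla RA and Assamese RA.
VARIANTS = {
    "\u09B0": ("\u09B0", "\u09F0"),  # র -> র, ৰ
    "\u09F0": ("\u09F0", "\u09B0"),  # ৰ -> ৰ, র
}


def variant_labels(label: str) -> set[str]:
    """Every label obtained by swapping র/ৰ at any positions, minus the original."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


blocked = variant_labels("\u09B0\u09B0\u09B0")  # someone registers ররর
print(len(blocked))                             # 7
print("\u09F0\u09F0\u09F0" in blocked)          # True: ৰৰৰ is blocked
print("\u09F0\u09F0" in blocked)                # False: ৰৰ is a different label
```

With three RA positions there are 2³ = 8 spellings, so seven labels are blocked besides the registered one.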

6.2 Cross Script Variants

A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.

¹¹ Unicode uses Oriya for the script, although Odia is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
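Because only two code points per script pair are listed, a whole Bangla label can have a cross-script variant label only if it is written entirely with those code points (for example মি, which is stored as ম followed by ি). A minimal Python sketch of that check, assuming a simple code-point-by-code-point mapping taken from Tables 14 and 15:

```python
from typing import Dict, Optional

# Cross-script variant pairs from Tables 14 and 15.
BANGLA_TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম→म, ি→ि
BANGLA_TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম→ਸ, ি→ਿ


def cross_script_variant(label: str, table: Dict[str, str]) -> Optional[str]:
    """Map a Bangla label into another script if every code point has a
    listed cross-script variant; otherwise return None."""
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


print(cross_script_variant("\u09AE\u09BF", BANGLA_TO_DEVANAGARI) == "\u092E\u093F")  # True
print(cross_script_variant("\u0995\u09BF", BANGLA_TO_DEVANAGARI))  # None: ক is not listed
```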

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows.

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, কাৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures, because any user can easily create them even though they may not be attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning "Bank of India". Here two different words are joined together, the first of which ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent its intended sound.

This unique situation is necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
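Rules 2 to 9 only ever look at the symbol immediately before a character (plus one extra step back for P), so they can be prototyped as a left-to-right scan. The Python sketch below is an illustration under several assumptions: each code point's symbol is approximated from Bengali block ranges rather than taken from the LGR repertoire; the nukta-based normalized forms in CN (rule 1) are not handled and are rejected; and S1/S2 are recognized as whole tokens, so that their internal V + H does not trip rule 2. It is not a substitute for the WLE rules in the LGR XML.

```python
RA = {"\u09B0", "\u09F0"}                   # র, ৰ (C2 in rule P)
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE",  # S1: অ্যা
               "\u098F\u09CD\u09AF\u09BE")  # S2: এ্যা
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X",
         "\u09CD": "H", "\u09CE": "Z"}

# Rules 2-7: allowed symbol of the preceding token. S ends in M, so it is
# accepted wherever M is (the note to rule 7 says Z may follow S).
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},
}


def symbol(ch: str) -> str:
    """Approximate Table 16 symbol for one code point (assumed ranges)."""
    cp = ord(ch)
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09DC, 0x09DD, 0x09DF, 0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return SIGNS.get(ch, "?")


def tokenize(label: str) -> list[tuple[str, str]]:
    """Split a label into (symbol, text) tokens; S1/S2 become one S token."""
    tokens, i = [], 0
    while i < len(label):
        s = next((s for s in S_SEQUENCES if label.startswith(s, i)), None)
        if s:
            tokens.append(("S", s))
            i += len(s)
        else:
            tokens.append((symbol(label[i]), label[i]))
            i += 1
    return tokens


def passes_wle(label: str) -> bool:
    tokens = tokenize(label)
    for k, (sym, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k > 0 else None
        if sym == "?":
            return False                # outside this sketch's repertoire
        if sym in ("V", "S") and prev == "H":
            return False                # rules 8 and 9
        if sym == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA:
            continue                    # P (RA + Hasanta) before Khanda Ta
        if sym in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[sym]:
            return False                # rules 2-7
    return True


print(passes_wle("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
print(passes_wle("\u0995\u09CD\u0987"))                          # ক্ই (V after H): False
```

Under these rules the joined form ব্যাঙ্কঅফ্ইন্ডিয়া is rejected at ই, which is exactly the prohibition described in 7.1.1.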

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help in understanding some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक

As can be seen, such a combination automatically results in a "golu" or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid:

Example:

কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example: কীী कीी

5. M is not permitted after V. Example: ইা, ঈৗ; ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meetei lon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'Phones of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What Has Happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not generally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, e.g. the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, e.g. ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only when C is ra (র) (09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
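Rules 1 and 3 translate directly into label-level checks. The following is a Python sketch, assuming labels arrive as NFC strings and treating U+0985 to U+0994 as the vowel letters (an approximation). It applies rule 1 strictly as stated, so it also rejects the ৎস- transliterations mentioned above, which the rule does not yet admit; rule 2 is the [CH]Z condition of 5.4.4.2 and is not repeated here.

```python
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}  # ৎ, ঞ, ঙ (rule 1)
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE",          # অ্যা (rule 3)
                "\u098F\u09CD\u09AF\u09BE"}          # এ্যা (rule 3)
VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}  # approximate vowel range
HASANTA = "\u09CD"


def passes_restrictions(label: str) -> bool:
    """Apply Appendix 10.1 rules 1 and 3 to a whole label."""
    if not label or label[0] in FORBIDDEN_INITIAL:
        return False                                 # rule 1
    for i in range(len(label) - 1):
        if label[i] in VOWELS and label[i + 1] == HASANTA:
            if label[i:i + 4] not in ALLOWED_VHCM:
                return False                         # rule 3
    return True


print(passes_restrictions("\u098F\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # এ্যাসিড: True
print(passes_restrictions("\u0987\u09CD\u09AF\u09BE"))                    # ই্যা: False
```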

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali Consonants and Their Allographs

Consonant | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) জ্ঝ (জ্ + ঝ), ঞ্ঝ (ঞ্ + ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) ঙ্খ (ঙ্ + খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

ঞ্চ (ঞ্ + চ), ঞ্ছ (ঞ্ + ছ), ঞ্জ (ঞ্ + জ), ঞ্ঝ (ঞ্ + ঝ), জ্ঞ (জ্ + ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং ŋ ঙ্ক (ঙ্ + ক), ঙ্ক্র (ঙ্ + ক্ + র), ঙ্খ (ঙ্ + খ), ঙ্গ (ঙ্ + গ), ঙ্ঘ (ঙ্ + ঘ), ঙ্ক্ষ (ঙ্ + ক্ + ষ) (in some contexts ঙ is replaced by ং, e.g. কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র + য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, Ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:

i. রেফ (repʰ) as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র্ + ধ্ + ব) (earlier র্দ্ধ্ব = র্ + দ্ + ধ্ + ব, a four-term cluster), etc. (placed over the following consonant)

ii. র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster, e.g. প্র, ক্র, etc. (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h (word-finally); word-medially it doubles the pronunciation of the following consonant

অঃ, কঃ

অব


depending on the position of the consonant in the word. At times these 'assumed' or 'inherent' vowels are not pronounced at all [142].

Vowels can be written as independent letters or by using a variety of diacritical marks, which are written above, below, before or after the consonant they follow in pronunciation, or on both sides of it (before and after) [105].

All Bangla consonants, when pronounced in isolation, are uttered with an inherent vowel ɔ; hence ক 'k', খ 'kh' or গ 'g' are usually pronounced as [kɔ], [kʰɔ] or [gɔ], etc. Phonologically, the Bangla vowel ɔ corresponds to the Hindi schwa ə.

When consonants occur together in clusters, special conjunct letters are formed. In printed Bangla many of these consonantal clusters, or conjoined consonants, are in use. The letters for the consonants other than the final one in the group are generally reduced. But there are a few special conjunct characters which are compounds of the consonant characters, e.g. ক (k) + ষ (ṣ) = ক্ষ (kṣ), ঞ (ñ) + জ (j) = ঞ্জ (ñj), জ (j) + ঞ (ñ) = জ্ঞ (jñ), হ (h) + ম (m) = হ্ম (hm). There are other issues too: র as the second member of a cluster is reduced to a secondary symbol, e.g. প (p) + র (r) = প্র (pr), ষ (ṣ) + ট (ṭ) + র (r) = ষ্ট্র (ṣṭr) (as in উষ্ট্র uṣṭra 'camel'). য (y), when used as a primary symbol, represents jɔ in Bangla, but its secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল (syamala) 'green', র‍্যাপার (ryapara) 'wrapper', etc.). After a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র (r) + য (y) combination has two renderings: র‍্য (ry) and র্য (ry). In the case of দ (d) + ধ (dh), গ (g) + ধ (dh) and ন (n) + ধ (dh), the shape of the second member changes, e.g. দ্ধ (ddh), গ্ধ (gdh) and ন্ধ (ndh) respectively. The solitary example of র (r) + ঋ (ṛ) = র্ঋ (as in নৈর্ঋত nairṛta 'southwest'), used mostly in Classical borrowings, shows the secondary symbol of a consonant followed by the primary symbol of a vowel. The inherent vowel applies only to the final consonant of the cluster.

In consonant clusters, many consonants take a completely different form. Some typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ত্ত (tt), ন্ত (nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st), etc. র has two allographs apart from its full shape: one is 'repha', as found in র্ক (rk) and র্প (rp), and the other is ra-phalā, as in প্র (pr) and ক্র (kr). ষ্ণ (ṣ + ṇ) is another case, where the cerebral nasal consonant sign takes an unusual shape [151].

The Bangla script has at least fifty-two primary symbols and quite a few allographs (positional variants of them), corresponding to forty-four phonemes (7 oral and 7 nasal vowels and 30 consonants) [150], or functional speech sounds, with some obvious redundancies, although in one of the first phonemic analyses the number was thought to be thirty-five phonemes [140].

As mentioned above, in Bangla several graphemic symbols have secondary shapes, technically called 'allographs', with a complementary distribution in each case. These graphs or markings are generally added at the following positions of the primary symbol [113]:

1) Below (e.g. কু (ku), ন্ত (nta), হ্র (hra), etc.)
2) Above (e.g. চঁ (cã), র্ক (rka), etc.)
3) Right side (e.g. কা (kā), কং (kaṅ), etc.)
4) Left side (e.g. কে (ke))
5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki), etc.)
6) Right side and above simultaneously (e.g. কী (kī))
7) Right side and left side simultaneously (e.g. কো (ko))
8) Right side, left side and above simultaneously (e.g. কৌ (kau))

As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called ‘kārs’ in Bangla (also referred to as Mātrā in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or
উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U, by marking below the primary grapheme,

there are some special vowel modifiers of উ, as in the following combined letters:

গু gu, rather than writing গ (g) + ু (u)
রু ru, rather than writing র (r) + ু (u)
শু śu, rather than writing শ (ś) + ু (u)
হু hu, rather than writing হ (h) + ু (u)
ন্তু ntu, rather than writing ন (n) + ত (t) + ু (u)

Similarly, there can be vowel modifiers of ঊ, or ‘(long) ū’, as well, e.g. ভ (bh) + র (r) (ভ্রূ bhrū “eyebrow”), শ (ś) + র (r) (শ্রূ śrū), and ঋ (ṛ) after হ (h) (হৃ hṛ) etc.
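The special shapes listed above are a matter of glyph design: in Unicode text each is stored as the consonant (or Hasanta-joined cluster) followed by the ordinary vowel sign, so an LGR only ever sees that sequence. The Python sketch below, with a hypothetical helper `code_points`, prints the stored sequences for a few of the examples.

```python
# Hypothetical helper for illustration: show a string as U+XXXX code points.
def code_points(s: str) -> list[str]:
    return [f"U+{ord(ch):04X}" for ch in s]

VIRAMA = "\u09CD"          # BENGALI SIGN VIRAMA (Hasanta)
SIGN_U = "\u09C1"          # BENGALI VOWEL SIGN U
SIGN_UU = "\u09C2"         # BENGALI VOWEL SIGN UU
SIGN_VOCALIC_R = "\u09C3"  # BENGALI VOWEL SIGN VOCALIC R

# Whatever glyph the font draws, the stored text is consonant(s) + vowel sign.
examples = {
    "gu": "\u0997" + SIGN_U,                          # GA + U sign
    "ru": "\u09B0" + SIGN_U,                          # RA + U sign
    "shu": "\u09B6" + SIGN_U,                         # SHA + U sign
    "hu": "\u09B9" + SIGN_U,                          # HA + U sign
    "ntu": "\u09A8" + VIRAMA + "\u09A4" + SIGN_U,     # NA + virama + TA + U sign
    "bhru": "\u09AD" + VIRAMA + "\u09B0" + SIGN_UU,   # BHA + virama + RA + UU sign
    "hr": "\u09B9" + SIGN_VOCALIC_R,                  # HA + vocalic R sign
}

for name, text in examples.items():
    print(f"{name:5} {text}  {' '.join(code_points(text))}")
```

The point of the sketch is that the traditional and reformed shapes discussed below share one encoding; only the font decides which one appears.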

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These attempted to reduce the number of allographs of both vowels and consonants in clusters, and the result has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, operate side by side in different domains.

However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatiya Siksakrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.6 After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was considerably more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha (‘transparent’) and Aswaccha (‘opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) in between the two. Some sample examples are:

6 Bangla Academy. 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.


Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
Opaque: where neither of the two can be (easily) recognized, e.g. ক্ষ (ks) (ক k + ষ s), জ্ঞ (jn) (জ j + ঞ n), ঙ্গ (ng) (ঙ n + গ g), হ্ম (hm) (হ h + ম m).
Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In the case of three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র (str) (স s + ত t + র r), ষ্ট্র (str) (ষ s + ট t + র r) etc.

Therewere in fact two types of proposals One concerned the shape of the lettersthose of consonant + vowel (CV) combinations and conjuncts which is consonant +consonantcombinationsTherewerefurthercomplexshapesiethoseofconsonant+consonant+ (consonant+) vowel (CC(CV) signs as in y (pru) or z (skru) SomedecisionsinthisareawerenecessarybecauseafewoftheCC(C)symbolsrepresentedcomplexitiesthatmadelearningthemdifficultforthechildrenTheotherdealtwiththespellings ofwords onlywithout any reference to the shapes of letters inwhich theywere written The basic objective here was lsquoone word one spellingrsquo to the greatestextentthatwaspossible[151]

Below we give a statement of the most salient changes that affect the consonant + vowel combinations [153]:

a. The variants of the short u (হ্রস্ব উ-কার hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So the traditional ligature for গু (gu) is now written as গ with a detached ু. The same applies to রু (ru), শু (śu) and হু (hu), and therefore to a cluster + short u sign: ন্তু (ntu) (ন + ্ + ত + উ) and স্তু (stu) (স + ্ + ত + উ).

b. The variants of long u (দীর্ঘ ঊ-কার dīrgha u-kāra) have also been reduced: রূ (rū), ভ্রূ (bhrū) (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) (দ d + ্ + র r + ঊ ū), শ্রূ (śrū) (শ ś + ্ + র r + ঊ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) now uses the regular ṛ-kāra.

Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the extent admissible in the Bangla writing system. Some examples will clarify the proposal. Each traditional cluster below is followed by its components, which the Bangla Akademi innovation renders transparently [153]:

ব্ধ bdh (ব b + ধ dh), দ্ধ ddh (দ d + ধ dh), ন্থ nth (ন n + থ th), ঞ্চ ñc (ঞ ñ + চ c), ঞ্ছ ñch (ঞ ñ + ছ ch), ঞ্জ ñj (ঞ ñ + জ j), ক্ত kt (ক k + ত t), ক্র kr (ক k + র r), গ্ধ gdh (গ g + ধ dh), ঙ্ক ṅk (ঙ ṅ + ক k), ঙ্গ ṅg (ঙ ṅ + গ g), ষ্ণ ṣṇ (ষ ṣ + ণ ṇ), ন্ধ্র ndhr (ন n + ধ dh + র r), ণ্ড্র ṇḍr (ণ ṇ + ড ḍ + র r), ক্ত্র ktr (ক k + ত t + র r).

3.3.1 The Consonants
As per the traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘Vargas’ (pronounced ‘Barga’ in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each Varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated to voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:

‘Varga’ or Set | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | ক ‘K’ U+0995 | খ ‘KH’ U+0996 | গ ‘G’ U+0997 | ঘ ‘GH’ U+0998 | ঙ ‘NG’ U+0999
Palatal | চ ‘C’ U+099A | ছ ‘CH’ U+099B | জ ‘J’ U+099C | ঝ ‘JH’ U+099D | ঞ ‘NY’ U+099E
Retroflex | ট ‘TT’ U+099F | ঠ ‘TTH’ U+09A0 | ড ‘DD’ U+09A1 | ঢ ‘DDH’ U+09A2 | ণ ‘NN’ U+09A3
Dental | ত ‘T’ U+09A4 | থ ‘TH’ U+09A5 | দ ‘D’ U+09A6 | ধ ‘DH’ U+09A7 | ন ‘N’ U+09A8
Bilabial | প ‘P’ U+09AA | ফ ‘PH’ U+09AB | ব ‘B’ U+09AC | ভ ‘BH’ U+09AD | ম ‘M’ U+09AE

Table 5: Varga classification of Bangla consonants
(Falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasals, called the five ‘Vargas’)

Non-Varga:
য ‘Y’ U+09AF
য় ‘YY’ U+09DF
র ‘R’ U+09B0
ড় ‘RR’ U+09DC
ঢ় ‘RH’ U+09DD
ল ‘L’ U+09B2
শ ‘SH’ U+09B6
ষ ‘SS’ U+09B7
স ‘S’ U+09B8
হ ‘H’ U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)
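The layout of Table 5 is mirrored in the Unicode code chart: each varga occupies five consecutive code points in the order of the table's columns, with the unassigned U+09A9 as the only gap. The following Python sketch relies on that assumption to classify a consonant by varga and manner; the non-varga consonants of Table 6 fall outside the ranges and return None. The names `VARGA_START` and `classify` are hypothetical.

```python
# Varga (stop) consonants sit in five rows of five consecutive code points.
# U+09A9 is unassigned in the Bengali block, so the bilabial row starts at U+09AA.
VARGA_START = {
    "Velar": 0x0995,
    "Palatal": 0x099A,
    "Retroflex": 0x099F,
    "Dental": 0x09A4,
    "Bilabial": 0x09AA,
}
COLUMNS = (
    "unvoiced unaspirated",
    "unvoiced aspirated",
    "voiced unaspirated",
    "voiced aspirated",
    "nasal",
)

def classify(ch: str):
    """Return (varga, column) for a varga consonant, else None."""
    cp = ord(ch)
    for varga, start in VARGA_START.items():
        if start <= cp < start + 5:
            return varga, COLUMNS[cp - start]
    return None  # non-varga consonant (Table 6) or not a consonant at all

for ch in "\u0995\u099D\u09A3\u09A7\u09AE\u09AF\u09B9":
    print(ch, classify(ch))
```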

3.3.2 The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (ɔ in Bangla, the neutral vowel) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’7 (= ‘Da ri’ in Bangla), the latter preferred in Unicode (cf. Unicode 3.0 and above), are used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in Unicode in a sense that differs from its traditional grammatical definition, and hence it requires some explanation here. Given the importance of the document, this note should be part of this LGR document, so that anybody referring to it can know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, “kill” its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants; in rare cases this process can join up to five consonants. However, the maximum number of consonants joining to form one akṣara8 can only be bounded empirically. This observation is based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that one may want a generic Top-Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can happen when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
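As described above, a conjunct is encoded by placing the Hasanta (U+09CD) between consonants, and the LGR deliberately sets no upper bound on cluster length. The Python sketch below shows how such sequences are built and how one might measure the longest cluster in a label, for instance when checking a corpus against the empirical maximum. The `is_consonant` test is a simplification assumed for this illustration (it also accepts a few unassigned code points in its range), and `conjunct` and `longest_cluster` are hypothetical helpers, not part of the LGR.

```python
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)
NUKTA = "\u09BC"   # BENGALI SIGN NUKTA

def is_consonant(ch: str) -> bool:
    # Simplification for this sketch: U+0995..U+09B9 plus KHANDA TA
    # and the two Assamese RAs.
    cp = ord(ch)
    return 0x0995 <= cp <= 0x09B9 or cp in (0x09CE, 0x09F0, 0x09F1)

def conjunct(*consonants: str) -> str:
    """Join consonants with Hasanta, e.g. KA + virama + SSA."""
    return VIRAMA.join(consonants)

def longest_cluster(label: str) -> int:
    """Length of the longest run of consonants linked by Hasanta."""
    best = run = 0
    after_virama = False
    for ch in label:
        if is_consonant(ch):
            run = run + 1 if after_virama else 1
            best = max(best, run)
            after_virama = False
        elif ch == VIRAMA:
            after_virama = True
        elif ch == NUKTA:
            continue  # a nukta belongs to the preceding consonant
        else:
            run, after_virama = 0, False  # vowel, sign, etc. ends the cluster
    return best

ustra = "\u0989" + conjunct("\u09B7", "\u099F", "\u09B0")  # U + SSA-TTA-RA
print(longest_cluster(ustra))
```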

3.3.3 Vowels
Separate symbols exist for all ‘Swara’ or vowels in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called ‘kār’

7 Virāma as used here is also a misnomer according to the Indian grammatical traditions: nowhere is the mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means ‘syllable’ in Indian grammatical traditions.


in Bangla, or Mātrā in Nāgarī9, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kārs (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

Vowel | Corresponding vowel sign (kār / Mātrā)
অ ‘A’ U+0985 | - (inherent vowel)
আ ‘AA’ U+0986 | া U+09BE
ই ‘I’ U+0987 | ি U+09BF
ঈ ‘II’ U+0988 | ী U+09C0
উ ‘U’ U+0989 | ু U+09C1
ঊ ‘UU’ U+098A | ূ U+09C2
ঋ Vocalic ‘R’ U+098B | ৃ U+09C3
ৠ Vocalic ‘RR’ U+09E0 | ৄ U+09C4
ঌ Vocalic ‘L’ U+098C | ৢ U+09E2
ৡ Vocalic ‘LL’ U+09E1 | ৣ U+09E3
এ ‘E’ U+098F | ে U+09C7
ঐ ‘AI’ U+0990 | ৈ U+09C8
ও ‘O’ U+0993 | ো U+09CB
ঔ ‘AU’ U+0994 | ৌ U+09CC
- | ৗ U+09D7 (AU length mark)
Could appear on top of অ ‘A’ U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ ‘A’ U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ ‘A’ U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kārs

9 The term ‘Mātrā’ in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
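The correspondence in Table 7 amounts to a lookup from each independent vowel to its kār. The Python sketch below, with hypothetical names, spells a consonant + vowel syllable in logical (typing) order, in which even the pre-posed ি and ে are stored after the consonant. It also uses the standard `unicodedata` module to show that ো and ৌ have canonical decompositions into ে + া and ে + ৗ (U+09D7, BENGALI AU LENGTH MARK), so a syllable typed in the decomposed order normalizes to the single vowel sign under NFC.

```python
import unicodedata

# Independent vowel -> dependent vowel sign (kār), following Table 7.
# অ (U+0985) has no kār: it is the inherent vowel of every consonant.
KAR = {
    "\u0986": "\u09BE",  # AA -> sign AA
    "\u0987": "\u09BF",  # I -> sign I (drawn before the consonant, stored after it)
    "\u0988": "\u09C0",  # II -> sign II
    "\u0989": "\u09C1",  # U -> sign U
    "\u098A": "\u09C2",  # UU -> sign UU
    "\u098B": "\u09C3",  # vocalic R -> sign vocalic R
    "\u09E0": "\u09C4",  # vocalic RR -> sign vocalic RR
    "\u098C": "\u09E2",  # vocalic L -> sign vocalic L
    "\u09E1": "\u09E3",  # vocalic LL -> sign vocalic LL
    "\u098F": "\u09C7",  # E -> sign E (also drawn before the consonant)
    "\u0990": "\u09C8",  # AI -> sign AI
    "\u0993": "\u09CB",  # O -> sign O
    "\u0994": "\u09CC",  # AU -> sign AU
}

def syllable(consonant: str, vowel: str) -> str:
    """Write consonant + vowel in logical (typing) order."""
    if vowel == "\u0985":
        return consonant  # inherent vowel: no sign is written
    return consonant + KAR[vowel]

ko = syllable("\u0995", "\u0993")  # KA + sign O
# Canonical decompositions of the two-part vowel signs.
print(unicodedata.decomposition("\u09CB"), "|", unicodedata.decomposition("\u09CC"))
```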

3.3.4 The Anusvāra onuʃʃār (ং, U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘nasal consonant + Hasanta + consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it often also appears in such combinations where the last member is a non-velar, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet”, “to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
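The Anusvāra and the homorganic nasal conjunct are different code point sequences even where they are pronounced alike, as with লংকা and লঙ্কা. The sketch below, using hypothetical helper names, derives the homorganic nasal from the varga layout of Table 5 and builds both candidate spellings. It makes no judgement about which spelling Bangla orthography prefers in a given word; it only shows that the two are distinct strings.

```python
VIRAMA = "\u09CD"    # BENGALI SIGN VIRAMA (Hasanta)
ANUSVARA = "\u0982"  # BENGALI SIGN ANUSVARA

# First code point of each varga row -> its homorganic nasal (see Table 5).
VARGA_NASAL = {
    0x0995: "\u0999",  # velar -> NGA
    0x099A: "\u099E",  # palatal -> NYA
    0x099F: "\u09A3",  # retroflex -> NNA
    0x09A4: "\u09A8",  # dental -> NA
    0x09AA: "\u09AE",  # bilabial -> MA
}

def homorganic_nasal(consonant: str):
    cp = ord(consonant)
    for start, nasal in VARGA_NASAL.items():
        if start <= cp < start + 5:
            return nasal
    return None  # non-varga consonant: no single homorganic nasal

def nasal_spellings(prefix: str, consonant: str, suffix: str = "") -> tuple:
    """Return (anusvara spelling, nasal-conjunct spelling or None)."""
    nasal = homorganic_nasal(consonant)
    with_anusvara = prefix + ANUSVARA + consonant + suffix
    with_conjunct = prefix + nasal + VIRAMA + consonant + suffix if nasal else None
    return with_anusvara, with_conjunct

print(nasal_spellings("\u09B2", "\u0995", "\u09BE"))  # LA + ? + KA + sign AA
```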

3.3.5 Nasalization: Candrabindu (ঁ, U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, with a dot inside the half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].

3.3.6 Nukta (়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each typed as a single keystroke), the IDNA protocol requires that we treat them as sequences of a base consonant and Nukta, both encoded in the Unicode Bengali block. It may be noted that there may be sporadic attempts at writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the sanctity of the loanword. Such usage amounts to making the Bangla writing system work like the IPA script. It is, however, not in use in Bangla printing.
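The effect of the composition exclusions can be checked directly with Python's standard `unicodedata` module: normalizing each atomic letter to NFC yields the base consonant followed by Nukta, while the sequence form is left unchanged. This is a verification sketch only; the mapping table `ATOMIC` is written out by hand here.

```python
import unicodedata

NUKTA = "\u09BC"  # BENGALI SIGN NUKTA

# Atomic letters that NFC always decomposes (composition exclusions).
ATOMIC = {
    "\u09DC": "\u09A1" + NUKTA,  # RRA -> DDA + NUKTA
    "\u09DD": "\u09A2" + NUKTA,  # RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF" + NUKTA,  # YYA -> YA + NUKTA
}

for atomic, sequence in ATOMIC.items():
    nfc = unicodedata.normalize("NFC", atomic)
    shown = " ".join(f"U+{ord(c):04X}" for c in nfc)
    print(f"U+{ord(atomic):04X} -> {shown}")  # e.g. U+09DC -> U+09A1 U+09BC
    # The base + Nukta sequence is already in NFC and is never recomposed.
    assert nfc == sequence
    assert unicodedata.normalize("NFC", sequence) == sequence
```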

3.3.7 Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is now rarely used even in other languages using Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)
This note is pertinent to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It should be noted that Nepali, Konkani and Hindi use these two signs in a different manner.


ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has the option between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the intended one.
Use of ZWJ

• As far as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) “Satyajit” and সৎ (sat) “honest”. This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending upon the context. E.g. র + ্ + য has two representations in Bangla: as র্য and as র‍্য. To get the form র্য one has to type র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that demand a ya-phalā after ra, which otherwise cannot be typed (rendered), because the same order ra + hasanta + antastha ja is used in the medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally) etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• In Bangla, ZWNJ is used to represent the explicit Hasanta (Halant). To avoid conjunct formation in cases where there is an explicit hasanta before the succeeding consonant, the ZWNJ is used:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The use of ZWJ and ZWNJ has been ruled out from the root zone by the [Procedure]. Since they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP.


The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
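Because ZWJ and ZWNJ are excluded from the root zone, the two renderings of র + ্ + য cannot both be represented there. The sketch below shows the code point sequences described above and what remains when the joiners are removed. Note that a conforming LGR implementation rejects a label containing ZWJ or ZWNJ rather than silently stripping them; the hypothetical `strip_joiners` helper is used only to show which sequences collapse together.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"
RA, YA, KA = "\u09B0", "\u09AF", "\u0995"
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)

repha_ya = RA + VIRAMA + YA                 # repha over YA, as in karjo
ra_yaphala = RA + ZWJ + VIRAMA + YA         # ya-phala after RA, as in "wrapper"
explicit_hasanta = KA + VIRAMA + ZWNJ + KA  # visible Hasanta, as in prakkathan

def strip_joiners(label: str) -> str:
    """Drop ZWJ/ZWNJ (illustration only; the root zone rejects them)."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")

# Without the joiner, the ya-phala form becomes the repha form.
print(strip_joiners(ra_yaphala) == repha_ya)
# Without ZWNJ, the explicit Hasanta becomes an ordinary conjunct sequence.
print(strip_joiners(explicit_hasanta) == KA + VIRAMA + KA)
```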

3.3.9 Use of Ya-phalā + ā
Ya-phalā + ā sequences are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ākār is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
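The two permitted vowel + Hasanta sequences can be expressed as a simple whole-label check. The Python function below is a hypothetical illustration under the assumption that a Hasanta after any independent vowel is invalid unless it begins অ্যা or এ্যা; the normative rules are those in the LGR's XML file, not this sketch.

```python
VIRAMA, YA, SIGN_AA = "\u09CD", "\u09AF", "\u09BE"
LETTER_A, LETTER_E = "\u0985", "\u098F"

# The only vowel + Hasanta contexts accepted in this sketch.
ALLOWED = {LETTER_A + VIRAMA + YA + SIGN_AA, LETTER_E + VIRAMA + YA + SIGN_AA}
# Independent vowel letters (the unassigned gaps in this range are harmless here).
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)} | {"\u09E0", "\u09E1"}

def vowel_hasanta_ok(label: str) -> bool:
    """Reject a Hasanta after an independent vowel unless it starts A/E + ya-phala + aa."""
    for i, ch in enumerate(label):
        if ch == VIRAMA and i > 0 and label[i - 1] in INDEPENDENT_VOWELS:
            if label[i - 1:i + 3] not in ALLOWED:
                return False
    return True

acid = LETTER_A + VIRAMA + YA + SIGN_AA + "\u09B8\u09BF\u09A1"  # "acid"
print(vowel_hasanta_ok(acid))
```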

3.3.10 Formation of Ra-phalā and Repha Sequences
This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA, Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is, in


both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of an adjacent Hasanta. The difference between the RAs is distinguishable only by looking at their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it is noteworthy that REPHA can also occur with KHANDA TA. In this context, C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
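The blocked variant relation between the two RAs in Hasanta contexts can be sketched as follows. Assuming, as described above, that an RA is swappable both when it precedes a Hasanta (repha) and when it follows one (ra-phalā), the hypothetical `ra_variants` function enumerates every label obtained by swapping U+09B0 and U+09F0 at those positions. A real LGR processor following RFC 7940 derives variants from the variant sets and contexts defined in the XML file; this only illustrates which labels collide.

```python
from itertools import product

RA_BN, RA_AS, VIRAMA = "\u09B0", "\u09F0", "\u09CD"
OTHER_RA = {RA_BN: RA_AS, RA_AS: RA_BN}

def ra_positions(label: str) -> list[int]:
    """Indices of RAs next to a Hasanta (repha: RA + H; ra-phala: H + RA)."""
    out = []
    for i, ch in enumerate(label):
        if ch in OTHER_RA:
            before = i > 0 and label[i - 1] == VIRAMA
            after = i + 1 < len(label) and label[i + 1] == VIRAMA
            if before or after:
                out.append(i)
    return out

def ra_variants(label: str) -> set[str]:
    """All other labels reachable by swapping Bangla/Assamese RA at those positions."""
    pos = ra_positions(label)
    variants = set()
    for choice in product((False, True), repeat=len(pos)):
        chars = list(label)
        for i, swap in zip(pos, choice):
            if swap:
                chars[i] = OTHER_RA[chars[i]]
        variants.add("".join(chars))
    variants.discard(label)  # the original label is not its own variant
    return variants

arka = "\u0985" + RA_BN + VIRAMA + "\u0995"  # arka "the sun"
print(ra_variants(arka))
```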

4. Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code-point repertoire across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only or for archival purposes will not be considered for inclusion in the code-point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of available characters for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard for providing the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters.


Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures and other such iconic characters like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) etc. will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic kār forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kār corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Form (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).

5. Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 will be valid for all three languages above. For each of the code points, language references have been given in the last column, titled References, of Table 8, the “Code Point Repertoire”. For entire coverage of the Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to have introduced new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variations in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:


Figure 1: Bangla code page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008 or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block as in Figure 1, of the code points that are included in the MSR, the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No. | Unicode code point | Glyph | Character name | Category | Language(s) with EGIDS value | References and comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122][125]

Table 8: Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire: Bangla LETTER A or LETTER E, followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA. These sequences enable the conditional inclusion of the æ sound, as in English 'bat', 'cat', etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference


S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
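The two sequences in Table 9 can be written down as plain code point data. The sketch below is illustrative only: `S1`, `S2`, `as_text` and `sequence_at` are hypothetical helper names, not part of the LGR XML file, and the code merely reproduces the values given in the table.

```python
# Illustrative only: the Table 9 sequences as tuples of code points.
S1 = (0x0985, 0x09CD, 0x09AF, 0x09BE)  # অ + ্ + য + া -> অ্যা
S2 = (0x098F, 0x09CD, 0x09AF, 0x09BE)  # এ + ্ + য + া -> এ্যা


def as_text(seq):
    """Join a tuple of code points into a Python string."""
    return "".join(chr(cp) for cp in seq)


def sequence_at(label, pos):
    """Return "S1" or "S2" if that sequence starts at index pos of label, else None."""
    for name, seq in (("S1", S1), ("S2", S2)):
        if label.startswith(as_text(seq), pos):
            return name
    return None


if __name__ == "__main__":
    acid = "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"  # অ্যাসিড 'acid'
    print(sequence_at(acid, 0))  # S1
    print(sequence_at(acid, 1))  # None
```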

5.2 Code Point Repertoire Exclusion

There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire of this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see Section 3.3.6) is excluded from the repertoire, since it will never be used alone. It will be used only as part of a sequence in the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for Exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points
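Because the precomposed letters U+09DC, U+09DD and U+09DF are Unicode composition exclusions, Normalization Form C (NFC) always turns them into the base letter plus U+09BC, which is why the repertoire lists the two-code-point forms. A minimal check with Python's standard `unicodedata` module follows; the helper `nukta_ok` is a hypothetical illustration of the rule in this section, not the LGR's own implementation.

```python
import unicodedata

# Precomposed letters and the NFC (normalized) form the repertoire uses instead.
NORMALIZED = {
    "\u09DC": "\u09A1\u09BC",  # ড় BENGALI LETTER RRA -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # ঢ় BENGALI LETTER RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # য় BENGALI LETTER YYA -> YA + NUKTA
}

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য


def nukta_ok(label):
    """Hypothetical check: U+09BC may only directly follow ড, ঢ or য."""
    for i, ch in enumerate(label):
        if ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            return False
    return True


if __name__ == "__main__":
    for precomposed, expected in NORMALIZED.items():
        nfc = unicodedata.normalize("NFC", precomposed)
        # Composition exclusions: NFC does not recompose these letters.
        assert nfc == expected
        print(unicodedata.name(precomposed), "->",
              " ".join(f"U+{ord(c):04X}" for c in nfc))
```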

5.4 The Basis of the Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence

To facilitate understanding for users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary.

A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). More than one sign from D, B and X may follow a V in Bangla, but not the same sign twice: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to combine an anusvara with a chandrabindu on the one hand, and a visarga with a chandrabindu on the other, as in হ্যাংচা 'hænchā' and হ্যাঃ 'hæn' respectively. The possibility of a Visarga or Anusvāra following a Candrabindu exists in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. Ya-phala is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD ্ BENGALI SIGN VIRAMA (Bangla Hasanta or Virama), U+09AF য BENGALI LETTER YA>. Ya-phala has a special form, ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট.

A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX], or by a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].


Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
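Read as a pattern over the symbols of Section 5.4.1, Sections 5.4.2.1 to 5.4.2.3 describe a small regular language. The following Python sketch (the name `is_vowel_sequence` is hypothetical) encodes it as a regular expression over symbol strings such as "VDB". It assumes the label has already been mapped to symbols, and it does not check which letters fill the HCM slot; only the S1 and S2 sequences of Table 9 are actually allowed there.

```python
import re

# V, optionally H C M (the Ya-phala + aa-kar slot), optionally one of B, D, X, DB, DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:D[BX]|[BDX])?")


def is_vowel_sequence(symbols):
    """True if a symbol string such as "VDB" is a vowel sequence per Section 5.4.2."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


if __name__ == "__main__":
    for s in ["V", "VB", "VDX", "VHCMDB", "VDD", "VBB", "VXX", "VBD"]:
        print(s, is_vowel_sequence(s))
```

Writing DB and DX as `D[BX]` keeps the order fixed: a Candrabindu may be followed by an Anusvāra or a Visarga, but not the other way round.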

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

CM কি কু कि कु
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:

CMB কীং কুং कीं कुं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Examples:

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य

5.4.3.5 Subsets

While considering its subsets, we take the combination CHC as a representative example; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं

CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं

ক্কুং → ক + ্ + ক + ু + ং; क्कुं → क + ् + क + ु + ं

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
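Sections 5.4.3.1 to 5.4.3.5 can likewise be condensed into one pattern over the symbols. In this hypothetical Python sketch (`is_consonant_sequence` is an illustrative name), a single C may take a Hasanta (the pure consonant), while a cluster of up to four consonants, *3(CH)C, may take an optional M and then one of B, D, X, DB or DX. Whether a multi-consonant cluster may also end in H is not stated explicitly above, so the sketch assumes it may not.

```python
import re

# Either a pure consonant (CH), or up to three CH groups + C,
# then an optional kar (M), then optionally B, D, X, DB or DX.
CONSONANT_SEQUENCE = re.compile(r"CH|(?:CH){0,3}CM?(?:D[BX]|[BDX])?")


def is_consonant_sequence(symbols):
    """True if a symbol string such as "CHCMDB" is a consonant sequence per 5.4.3."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


if __name__ == "__main__":
    for s in ["C", "CH", "CMDX", "CHCHCHC", "CHCMDB", "CHCHCHCHC", "CMM", "CBB"]:
        print(s, is_consonant_sequence(s))
```

Because `fullmatch` must consume the whole string, five joined consonants (CHCHCHCHC) or a doubled kāra (CMM) are rejected.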

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ = ত্

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.

Note: The condition on KHANDA TA in this context is that the C should be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
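The note above amounts to a simple positional check. The following Python helper is a hypothetical illustration (`khanda_ta_cluster_ok` is not part of the LGR): it accepts a Hasanta directly before ৎ only when the consonant before that Hasanta is র or ৰ.

```python
RA_FORMS = {"\u09B0", "\u09F0"}  # র (Bangla RA), ৰ (Assamese RA)
VIRAMA = "\u09CD"                # ্ Hasanta
KHANDA_TA = "\u09CE"             # ৎ


def khanda_ta_cluster_ok(label):
    """Return False if some ৎ is preceded by Hasanta that does not follow র or ৰ."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True


if __name__ == "__main__":
    print(khanda_ta_cluster_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
    print(khanda_ta_cluster_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ: False
```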

5.4.5 Special Cases: S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are briefly described here.

Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case refers to the formation of repha and ra-phalā in the said script, mentioned in Table 16 as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.


6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but not identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with dealing with such cases. However, cases that belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are hence being proposed as variants.

Variants arise in a script when two or more identical-looking forms are built with different storage sequences or code points. In Bangla, e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
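Unicode normalization is another reason why the two spellings of ko never reach the variant stage: U+09CB BENGALI VOWEL SIGN O is canonically equivalent to U+09C7 U+09BE, so Normalization Form C (NFC), which IDNA2008 requires of labels, folds the two-key spelling into the single code point. A minimal check with Python's standard `unicodedata` module:

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09C7", "\u09BE", "\u09CB"

two_keys = KA + E_KAR + AA_KAR  # ক + ে + া, typed as two separate kāras
one_key = KA + O_KAR            # ক + ো, typed as a single kāra

# NFC composes U+09C7 U+09BE into U+09CB, so both spellings become one label.
assert unicodedata.normalize("NFC", two_keys) == one_key
# NFD goes the other way and splits o-kāra into its two canonical parts.
assert unicodedata.normalize("NFD", one_key) == two_keys
print("NFC form:", " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", two_keys)))
```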

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


If written in traditional orthography, the above combinations could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as হ (ha, U+09B9) by someone who takes it for a ha, increasing the risk of errors in label writing and identification.

Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in গ্রন্থ grantha)

Fonts that represent the traditional Bangla writing system may tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode code points, and 'B_RA' and 'A_RA' refer to the same phonetic element. Here the final ligatures look the same, as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where:

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with repha look identical; the addition of repha does not make any difference.

'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta.

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, this could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. However, this will create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
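Defining র and ৰ as mutual variants means that every label containing either letter has a set of variant labels, obtained by swapping them position by position. The Python sketch below enumerates that set; the function name `variant_labels` is hypothetical, and the sketch assumes every generated variant is blocked, as described above. It shows that registering ররর blocks ৰৰৰ and the mixed forms, but not ৰৰ or ৰৰৰৰ, which differ in length.

```python
from itertools import product

# র (U+09B0) and ৰ (U+09F0) are mutual variants; all other code points map to themselves.
RA_VARIANTS = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}


def variant_labels(label):
    """Return every label reachable by swapping র/ৰ at any subset of positions, minus the label itself."""
    choices = [(ch, RA_VARIANTS[ch]) if ch in RA_VARIANTS else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


if __name__ == "__main__":
    registered = "\u09B0" * 3                # ররর
    blocked = variant_labels(registered)
    print(len(blocked))                      # 7 (2**3 combinations minus the original)
    print("\u09F0" * 3 in blocked)           # True: ৰৰৰ is blocked
    print("\u09F0" * 2 in blocked)           # False: ৰৰ can still be registered
```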

6.2 Cross-Script Variants

A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain other characters that are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses Oriya for the script, although Odia is now the official term used.


1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
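Tables 14 and 15 can be held as a simple lookup from each Bangla code point to its cross-script counterparts. This Python sketch is illustrative only: the dictionary layout and the helper `cross_script_hits` are assumptions, and in the actual root zone process cross-script variants are expected to be resolved in the integrated Root Zone LGR, not by a stand-alone function like this.

```python
# Cross-script variant pairs from Tables 14 and 15.
CROSS_SCRIPT = {
    "\u09AE": {"Devanagari": "\u092E", "Gurmukhi": "\u0A38"},  # ম ~ म, ਸ
    "\u09BF": {"Devanagari": "\u093F", "Gurmukhi": "\u0A3F"},  # ি ~ ि, ਿ
}


def cross_script_hits(label):
    """List (index, Bangla char, script, counterpart) for each mapped code point in label."""
    hits = []
    for i, ch in enumerate(label):
        for script, other in CROSS_SCRIPT.get(ch, {}).items():
            hits.append((i, ch, script, other))
    return hits


if __name__ == "__main__":
    for i, ch, script, other in cross_script_hits("\u09AE\u09BF"):  # মি
        print(i, f"U+{ord(ch):04X}", script, f"U+{ord(other):04X}")
```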

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in the code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
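Before any rule in Section 7.1 can be applied, each label has to be turned into a string of the Table 16 symbols. The following Python sketch shows one way to do that. The function name `symbolize` and the code point ranges (taken from the Bengali block as used in Table 8) are assumptions for illustration. The sketch emits S for the Table 9 sequences, treats the nukta forms ড়, ঢ় and য় as a single C, and marks anything outside the repertoire with "?". It does not emit P; a Ra-Hasanta simply appears as C followed by H.

```python
# Hypothetical mapping from Bengali code points to the symbols of Table 16.
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}
CONSONANTS = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))
              | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1})
KARS = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}
SIGNS = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}

S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # S1, S2
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য take U+09BC (CN)


def symbolize(label):
    """Map a label to a string of Table 16 symbols, e.g. "অ্যাসিড" -> "SCMC"."""
    out, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            out.append("S")
            i += 4
            continue
        ch, cp = label[i], ord(label[i])
        if ch in NUKTA_BASES and label[i + 1:i + 2] == "\u09BC":
            out.append("C")  # normalized form of ড়, ঢ় or য়
            i += 2
            continue
        if cp in VOWELS:
            out.append("V")
        elif cp in CONSONANTS:
            out.append("C")
        elif cp in KARS:
            out.append("M")
        else:
            out.append(SIGNS.get(cp, "?"))  # "?" = not in this repertoire
        i += 1
    return "".join(out)
```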

It is perhaps worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which could theoretically conjoin from two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Given the prevalent patterns in Bangla and the various restrictions, below are the specific WLE rules that need to be implemented:


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, কাৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
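Rules 2 to 9 are all conditions on the symbol immediately before a given symbol, so they can be checked in a single left-to-right pass. The sketch below works on symbol strings like those used in the ABNF sections; `ALLOWED_BEFORE` and `wle_violations` are hypothetical names. Following the note under rule 7, it treats S as ending in M when it precedes another symbol. It assumes that P is emitted only for a Ra-Hasanta immediately before Z, and that rule 1 (treating ড়, ঢ় and য় as C) was already applied when the symbols were produced.

```python
# Allowed predecessors under rules 2-7; None (start of label) is never allowed.
ALLOWED_BEFORE = {
    "H": {"C"},                                # rule 2
    "M": {"C"},                                # rule 3
    "D": {"V", "C", "M"},                      # rule 4
    "X": {"V", "C", "M", "D"},                 # rule 5
    "B": {"V", "C", "M", "D"},                 # rule 6
    "Z": {"V", "C", "M", "D", "B", "X", "P"},  # rule 7
}
NOT_AFTER_HASANTA = {"V", "S"}                 # rules 8 and 9


def wle_violations(symbols):
    """Return (index, symbol, previous symbol) for every rule violation in a symbol string."""
    problems = []
    for i, sym in enumerate(symbols):
        prev = symbols[i - 1] if i > 0 else None
        # S ends with a kāra, so a following symbol sees it as M (note under rule 7).
        effective_prev = "M" if prev == "S" else prev
        allowed = ALLOWED_BEFORE.get(sym)
        if allowed is not None and effective_prev not in allowed:
            problems.append((i, sym, prev))
        elif sym in NOT_AFTER_HASANTA and prev in ("H", "P"):
            problems.append((i, sym, prev))
    return problems


if __name__ == "__main__":
    # 'Bank of India' with and without the Hasanta after ফ (Section 7.1.1)
    print(wle_violations("CHCMCHCVCHVCHCMCM"))  # [(10, 'V', 'H')]
    print(wle_violations("CHCMCHCVCVCHCMCM"))   # []
```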

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া, bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE), meaning 'Bank of India'. This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of the sound intended for the first word.


This is a unique situation, necessitated by the lack of the hyphen, space or Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In the future, depending on the prevailing requirements from the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Examples:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक

As can be seen, such a combination automatically results in a "golu", or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S. Examples:

অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid:


Examples:

কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Examples:

ইা ঈৌ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible. Examples:

কংঃ कंः
কঃং कःं
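The invalid formations listed in this section can also be written as forbidden substrings of a symbol string. This is a hypothetical Python sketch (`INVALID_PATTERNS` and `invalid_reasons` are illustrative names), meant to complement the per-rule checks of Section 7.1 rather than replace them; it assumes the label has already been mapped to symbols.

```python
import re

# Each entry: a forbidden pattern over symbols and the Section 7.2 item it encodes.
INVALID_PATTERNS = [
    (re.compile(r"^[HMBDX]"), "H, M, B, D or X at the beginning (no. 1)"),
    (re.compile(r"[VBDXMS]H"), "H after V, B, D, X, M or S (no. 2)"),
    (re.compile(r"BB|DD|XX"), "more than one B, D or X in a row (no. 3)"),
    (re.compile(r"MM"), "more than one M after a consonant (no. 4)"),
    (re.compile(r"VM"), "M after V (no. 5)"),
    (re.compile(r"BX|XB"), "Anusvara and Visarga combined (no. 6)"),
]


def invalid_reasons(symbols):
    """Return the Section 7.2 items violated by a symbol string (empty if none)."""
    return [reason for pattern, reason in INVALID_PATTERNS if pattern.search(symbols)]


if __name__ == "__main__":
    for s in ["HC", "CBB", "CMDD", "CMM", "VM", "CBX", "CHCM"]:
        print(s, invalid_reasons(s))
```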

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka


9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29.2: 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL. 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1.1: 23-45.


[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1.2: 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1.2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has happened So Far in Terms of Script Reforms'. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 - Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not normally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র) (09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)

10.2 'Sylheti Nāgarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nāgarī or Siloti [129], such as 'Jalalabada Nāgarī', 'Fula (flower) Nāgarī', 'Muslim Nāgarī' or 'Muhammad Nāgarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nāgarī or Siloti

10.3 Confusable Code Points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points


11 Appendix-II: Bengali Consonants and Their Allographs

Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts aelig is replaced by ং: কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s & ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ. The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it is a vowel æ (as in শ8ামল, র 8াপার etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কায6, ধায6 etc.). The র + ্ + য combination has two physical manifestations: র 8 and য6.

ক8(p+য) স8(+য) র 8(+য) [Just র 8 is never used in Bangla orthography; র 8া is, but then its last two symbols, ya-phala and a-kara, constitute a vowel sign representing the vowel অ8া.]


র r Two manifestations: (i) রেফ repʰ as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6 (+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster) etc. (placed over the following consonant); (ii) র-ফলা rɔ-pʰɔla as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant.

অঃ কঃ

অব


As mentioned above, several graphemic symbols in Bangla have secondary shapes, technically called ‘allographs’, each with a complementary distribution. These graphs or markings are generally added to the primary symbol [113] in the following positions:

1) Below (e.g. ক (ku), W (nta), ক (ku), ^ (hra) etc.)

2) Above (e.g. চ (ca), কH (rka) etc.)

3) Right side (e.g. কা (ka), কং (kan) etc.)

4) Left side (e.g. কে (ke))

5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki) etc.)

6) Right side and above simultaneously (e.g. কী (kī))

7) Right side and left side simultaneously (e.g. কো (ko))

8) Right side, left side and above simultaneously (e.g. কৌ (kau))

As for the complementary distribution of vowel letters (word- or syllable-initial) and vowel mātrās, which is relevant for the ABNF, let us consider the following. Besides some simple vowel modifiers called ‘Kārs’ in Bangla (also referred to as Mātrā in the other LGR documents of Neo-Brāhmī), there are some combinatory modifiers of Bangla vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by া U+09BE BENGALI VOWEL SIGN AA,

ই U+0987 BENGALI LETTER I is substituted by the pre-posed ি U+09BF BENGALI VOWEL SIGN I,

ঈ U+0988 BENGALI LETTER II is substituted by ী U+09C0 BENGALI VOWEL SIGN II, or

উ U+0989 BENGALI LETTER U is substituted by ু U+09C1 BENGALI VOWEL SIGN U by marking below the primary grapheme, there are some special vowel modifiers of উ, as in the following combined letters:

zwnj gu, rather than writing as গ (g) + ু (u)

h ru, rather than writing as র (r) + ু (u)

zwnj śu, rather than writing as শ (s) + ু (u)

j hu, rather than writing as হ (h) + ু (u)

k ntu, rather than writing as L (n) + ত (t) + ু (u)

Similarly, there could be vowel modifiers of ঊ or ‘(long) ū’ as well, e.g.:

m (bh) + র (r) (n bhru “eyebrow”), o (s) + র (r) (p sru), ঋ (r) after হ (h) (q hṛ) etc.

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These contributions attempted to reduce the number of allographs of both vowels and consonants in clusters, and the result has been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, operate side by side in different domains.

However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Śikṣākrama and Pathyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904-2002), gave a direction in his inaugural address for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha (‘Transparent’) and Aswaccha (‘Opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) between the two. Some examples are:

⁶ Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.

Transparent: r (nn), s (pt), [ (st), where both members of the cluster can be recognized.

Opaque: where neither of the two can be (easily) recognized: 8 ks (7 k + ষ s), = jn ( j + ঞ n), t ng (un + গ g), hm (gt h + ম m).

Semi-transparent: A (pr), পH (rp), where one symbol is recognizable and the other is not. In three-term clusters at least one symbol will not be transparent, e.g. v str (w s + x t + র r), D str (B s + C t + র r) etc.

There were, in fact, two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations, and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in y (pru) or z (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made them difficult for children to learn. The other type dealt only with the spellings of words, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].

Below is a statement of the most salient changes that affect the consonant + vowel combinations [153]:

a. The variants of the short u (^ উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So zwnj (gu) is now গু. Similarly h (ru) > রু, zwnj (śu) > শু, j (hu) > হু, and therefore, for a cluster + short u sign, k (ntu) > W (ন + ্ + ত + উ) and (stu) > [ (স + ্ + ত + উ).

b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced: (rū) > রূ, n (bhrū) > Y (ভ bh + ্ + র r + ঊ ū), (drū) > (দ d + ্ + র r + ঊ ū), p (śrū) > (শ ś + ্ + র r + ঊ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: q (hṛ) > হৃ.

Regarding consonant + consonant + (consonant)… + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters, to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash means that the traditional cluster shape precedes it while the Bangla Akademi innovation follows) [153]:

Xব ধ bdh ( b + ধ dh), Mদ ধ ddh (J d + ধ dh), ন থ nth (L n + থ th), Uঞ চ ñc (9 ñ + চ c), ঞ ছ $ ñch (9 + ছ), ঞ জ ñj (9 ñ + জ j), Sক ত & kt (7 k + ত t), T kr (7 k + র r), Nগ ধ ( gdh (K g + ধ dh), ) ṅk (u ṅ + ক k), t ṅg (u ṅ + গ g), + ṣṇ (B ṣ + ণ ṇ), ন ndhr (L n + dh + র r), - ṇḍr ( ṇ + ḍ + র r), ktr (7 k + x t + র r)

3.3.1 The Consonants

According to traditional classification, Bangla consonants are categorized by their phonetic properties, especially place and manner of articulation [107]. There are five ‘Varga’ (pronounced as ‘Barga’ in Bangla) groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each varga corresponds to the stops at a certain place of articulation. It contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation), running from Unvoiced and Unaspirated to Voiced and Aspirated (in the fourth column) and ending with a homorganic or corresponding nasal [107]. Consider the following table:

‘Varga’ or Sets | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | ক ‘K’ U+0995 | খ ‘KH’ U+0996 | গ ‘G’ U+0997 | ঘ ‘GH’ U+0998 | ঙ ‘NG’ U+0999
Palatal | চ ‘C’ U+099A | ছ ‘CH’ U+099B | জ ‘J’ U+099C | ঝ ‘JH’ U+099D | ঞ ‘NY’ U+099E
Retroflex | ট ‘TT’ U+099F | ঠ ‘TTH’ U+09A0 | ড ‘DD’ U+09A1 | ঢ ‘DDH’ U+09A2 | ণ ‘NN’ U+09A3
Dental | ত ‘T’ U+09A4 | থ ‘TH’ U+09A5 | দ ‘D’ U+09A6 | ধ ‘DH’ U+09A7 | ন ‘N’ U+09A8
Bilabial | প ‘P’ U+09AA | ফ ‘PH’ U+09AB | ব ‘B’ U+09AC | ভ ‘BH’ U+09AD | ম ‘M’ U+09AE

Table 5: Varga classification of Bangla consonants

(Falling into a pattern of five sets of Unvoiced Unaspirated, Unvoiced Aspirated, Voiced Unaspirated, Voiced Aspirated and Nasals, called the five ‘Varga’)

Non-Varga: য ‘Y’ U+09AF | য় ‘YY’ U+09DF | র ‘R’ U+09B0 | ড় ‘RR’ U+09DC | ঢ় ‘RH’ U+09DD | ল ‘L’ U+09B2 | শ ‘SH’ U+09B6 | ষ ‘SS’ U+09B7 | স ‘S’ U+09B8 | হ ‘H’ U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)
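The regular layout of Table 5 is mirrored in the Unicode Bengali block, so the varga and manner column of a stop or nasal can be computed from its code point. The following is a minimal Python sketch that assumes only the code points listed in Tables 5 and 6. The function name `varga_of` is hypothetical and not part of the LGR. Note that U+09A9 is unassigned, so the bilabial row starts at U+09AA instead of continuing the five-step pattern.

```python
# Minimal sketch: map a varga consonant to its row and column in Table 5.
# Row starts follow the Unicode Bengali block; U+09A9 is unassigned, so the
# bilabial row begins at U+09AA rather than U+09A9.
VARGAS = ["Velar", "Palatal", "Retroflex", "Dental", "Bilabial"]
COLUMNS = ["Unvoiced -Asp", "Unvoiced +Asp", "Voiced -Asp", "Voiced +Asp", "Nasal"]
ROW_STARTS = [0x0995, 0x099A, 0x099F, 0x09A4, 0x09AA]


def varga_of(ch: str):
    """Return (varga, column) for a varga consonant, or None otherwise.

    Non-varga consonants such as those in Table 6 return None.
    """
    if len(ch) != 1:
        return None
    cp = ord(ch)
    for row, start in enumerate(ROW_STARTS):
        if start <= cp < start + len(COLUMNS):
            return VARGAS[row], COLUMNS[cp - start]
    return None


if __name__ == "__main__":
    for letter in "\u0995\u099D\u09A3\u09A7\u09AE\u09AF":  # ক ঝ ণ ধ ম য
        print(letter, varga_of(letter))
```

A table lookup keyed by code point would work equally well. The arithmetic version simply makes the row-and-column structure of the varga system, and the one gap in the block, explicit.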

3.3.2 The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)

As stated earlier, all consonants are pronounced in isolation with an implicit vowel assumed to be associated with them (the back vowel ɔ, the neutral vowel in Bangla) [121]. This report uses two terms for the character that marks the absence of this inherent vowel: ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dari’ in Bangla), the latter being the term preferred in Unicode (cf. Unicode 3.0 and above). Unicode has adopted the term virama in a sense that differs from the traditional grammatical definition, so some explanation is needed here. This note is made part of the LGR document so that anybody referring to it can find the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, “kill” its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants; in rare cases this process can join up to five consonants. However, the maximum number of consonants that join to form one akṣara⁸ can only be established empirically. This observation is based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, one cannot rule out that someone may want a generic Top Level Domain [gTLD] with more than the observed maximum. This can happen when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, this limit will not be enforced in the Bangla LGR.
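Because the LGR deliberately sets no upper bound on the number of consonants in a conjunct, a checker would at most report cluster length rather than reject labels on that basis. The following is a minimal Python sketch under stated assumptions. A cluster is taken to be a run of consonants joined by U+09CD, a nukta (U+09BC) is treated as part of the preceding consonant, and the consonant ranges are an approximation written for illustration. `longest_cluster` is a hypothetical helper, not part of the LGR.

```python
HASANTA = "\u09CD"
NUKTA = "\u09BC"


def is_consonant(ch: str) -> bool:
    """Approximate test for a Bengali-block consonant letter (illustrative ranges)."""
    if len(ch) != 1:
        return False
    cp = ord(ch)
    return (
        0x0995 <= cp <= 0x09A8        # KA .. NA
        or 0x09AA <= cp <= 0x09B0     # PA .. RA
        or cp == 0x09B2               # LA
        or 0x09B6 <= cp <= 0x09B9     # SHA .. HA
        or cp in (0x09F0, 0x09F1)     # Assamese RA and WA
    )


def longest_cluster(label: str) -> int:
    """Length of the longest run of consonants joined by Hasanta (no limit enforced)."""
    best = run = 0
    joinable = False          # True right after "consonant + Hasanta"
    after_consonant = False
    for ch in label:
        if is_consonant(ch):
            run = run + 1 if joinable else 1
            best = max(best, run)
            joinable, after_consonant = False, True
        elif ch == NUKTA and after_consonant:
            continue          # the nukta stays attached to the consonant before it
        elif ch == HASANTA and after_consonant:
            joinable, after_consonant = True, False
        else:                 # vowels, vowel signs, ZWNJ etc. end the cluster
            run, joinable, after_consonant = 0, False, False
    return best


print(longest_cluster("\u09B8\u09CD\u09A4\u09CD\u09B0"))  # স্ত্র -> 3
```

The function only measures. Whether a long cluster is acceptable is left to the policy described above, which is not to enforce a maximum.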

⁷ Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virama; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).

⁸ This term needs to be disambiguated: akṣara also means ‘syllable’ in Indian grammatical traditions.

3.3.3 Vowels

Separate symbols exist for all ‘Swara’ or vowels in Bangla. These are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called ‘kār’ in Bangla (or Mātrā in Nāgarī⁹) is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kārs (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

Vowel | Corresponding vowel sign (kāras / Mātrās)
অ ‘A’ U+0985 | -
আ ‘AA’ U+0986 | া U+09BE
ই ‘I’ U+0987 | ি U+09BF
ঈ ‘II’ U+0988 | ী U+09C0
উ ‘U’ U+0989 | ু U+09C1
ঊ ‘UU’ U+098A | ূ U+09C2
ঋ Vocalic ‘R’ U+098B | ৃ U+09C3
ৠ Vocalic ‘RR’ U+09E0 | ৄ U+09C4
ঌ Vocalic ‘L’ U+098C | ৢ U+09E2
ৡ Vocalic ‘LL’ U+09E1 | ৣ U+09E3
এ ‘E’ U+098F | ে U+09C7
ঐ ‘AI’ U+0990 | ৈ U+09C8
ও ‘O’ U+0993 | ো U+09CB
ঔ ‘AU’ U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ ‘A’ U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ ‘A’ U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ ‘A’ U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kārs

⁹ The term ‘Mātrā’ in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, typically found in Hindi and Bangla but missing in Gujarati.
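Table 7 is essentially a lookup from independent vowels to their kārs. The following is a minimal Python sketch of that lookup. It covers only the rows whose letters are in the proposed repertoire (the vocalic RR, L and LL rows are left out), and `VOWEL_SIGN` and `cv` are hypothetical names. The sketch makes two points concrete. First, a pre-posed sign such as ি is stored after the consonant even though it is drawn to its left. Second, the two-part signs ো and ৌ are canonically equivalent to ে + া and ে + ৗ respectively. NFC, which the IDNA protocol requires, therefore folds the split spellings into the single code points.

```python
import unicodedata

INHERENT_A = "\u0985"  # অ has no kār: the bare consonant carries the vowel
VOWEL_SIGN = {
    "\u0986": "\u09BE",  # আ -> া
    "\u0987": "\u09BF",  # ই -> ি (drawn before the consonant, stored after it)
    "\u0988": "\u09C0",  # ঈ -> ী
    "\u0989": "\u09C1",  # উ -> ু
    "\u098A": "\u09C2",  # ঊ -> ূ
    "\u098B": "\u09C3",  # ঋ -> ৃ
    "\u098F": "\u09C7",  # এ -> ে
    "\u0990": "\u09C8",  # ঐ -> ৈ
    "\u0993": "\u09CB",  # ও -> ো
    "\u0994": "\u09CC",  # ঔ -> ৌ
}


def cv(consonant: str, vowel: str) -> str:
    """Spell consonant + vowel in logical (storage) order, normalized to NFC."""
    if vowel == INHERENT_A:
        return consonant
    return unicodedata.normalize("NFC", consonant + VOWEL_SIGN[vowel])


# ে + া typed as two code points normalizes to the single sign ো (U+09CB).
split_o = "\u0995\u09C7\u09BE"
assert unicodedata.normalize("NFC", split_o) == cv("\u0995", "\u0993")
```

Because of this equivalence, a label typed with the split sequence and one typed with U+09CB end up identical after NFC. The LGR therefore only needs to consider the composed form.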

3.3.4 The Anusvāra onuʃʃār (ং U+0982)

The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it also often appears in such combinations when the last member is a non-velar, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, Bangla clearly demarcates where one must use the Anusvāra and where one must use a conjunct cluster with a nasal as the first or second component.

3.3.5 Nasalization Candrabindu (ঁ U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cad ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside the half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].

3.3.6 Nukta (় U+09BC)

The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.

It is noted that the current Unicode Standard chart lists these characters as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard also provides these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each entered as a single keystroke). Even so, the IDNA protocol requires that they be treated as sequences of a base consonant plus nukta, using code points already allocated in the Unicode Bengali block. There may be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). These are made only for the sake of correct pronunciation and to maintain the sanctity of the loanword, and such usage makes the Bangla writing system work like the IPA script. It is, however, not in use in printed Bangla.
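The effect of the composition exclusions described above can be checked directly with Python's standard `unicodedata` module. This is a small illustration, not part of the LGR tooling, and `nfc_code_points` is a hypothetical helper. Each precomposed letter comes back from NFC as the base consonant followed by U+09BC. This is why the repertoire lists the two-code-point sequences.

```python
import unicodedata


def nfc_code_points(text: str) -> list:
    """Return the NFC form of `text` as a list of U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]


for letter in ("\u09DC", "\u09DD", "\u09DF"):  # ড় RRA, ঢ় RHA, য় YYA
    print(f"U+{ord(letter):04X} ->", " ".join(nfc_code_points(letter)))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```

The decomposed sequences are already in NFC, so they pass through unchanged. An application that accepts the single-key forms from the keyboard can therefore normalize first and then compare against the repertoire.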

3.3.7 Visarga biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. An example is দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is now rarely used even in other languages that use Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)

This note concerns the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these two signs differently.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has a choice between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

• In Bangla, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) “Satyajit” and সৎ (sat) “honest”. This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending on the context. For example, র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য, one types র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is needed to render words that demand a ya-phalā after ra. Without it, such words cannot be typed (rendered), because the same sequence ra + hasanta + antastha ja in medial and/or final position already has another reading: it types a repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally) etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent the explicit Hasanta or Halant. Where there is an explicit hasanta before the following consonant, the ZWNJ is used to prevent conjunct formation:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

ZWJ and ZWNJ have been ruled out from the root zone by the [Procedure]. Because these two signs are used in Bangla to create alternate renderings, their insertion can affect searching as well as NLP.

The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly blocked and the Hasanta between the two consonants that would form the conjunct needs to be explicitly shown.
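Writing out the sequences discussed in this section shows what is lost when joiners are not available. The following is a minimal Python sketch, and the constant names and helpers are hypothetical. Dropping the ZWJ turns the ra + ya-phalā spelling into the repha-over-ya spelling. Dropping the ZWNJ turns the explicit-hasanta spelling into an ordinary conjunct. Since the root zone excludes U+200C and U+200D, only the joiner-free spellings can occur in a label.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"
RA, YA, KA, HASANTA = "\u09B0", "\u09AF", "\u0995", "\u09CD"

REPHA_ON_YA = RA + HASANTA + YA              # repha over ya, as in কার্য
RA_YA_PHALA = RA + ZWJ + HASANTA + YA        # ra followed by ya-phala (needs ZWJ)
EXPLICIT_HASANTA = KA + HASANTA + ZWNJ + KA  # visible hasanta, as in the word for "preface"
CONJUNCT_KK = KA + HASANTA + KA              # ordinary ক্ক conjunct


def has_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ (both excluded from the root zone)."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label: str) -> str:
    """Remove ZWJ and ZWNJ, as a joiner-free label would have to be spelled."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


for s in (REPHA_ON_YA, RA_YA_PHALA, EXPLICIT_HASANTA):
    print(" ".join(f"U+{ord(c):04X}" for c in s), has_joiner(s))
```

Both joiner-bearing strings collapse onto another valid spelling once the joiners are removed. This is one concrete reason why their invisible presence would be a problem for search and comparison.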

3.3.9 Use of Ya-phalaa

There are two ya-phalā sequences in Bangla in which the Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity: the addition of ya-phalā and ā-kāra is used to produce the æ sound, as in English ‘acid’ অ্যাসিড, ‘association’ অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ etc. By nature, the Brāhmī script does not have a Hasanta after a vowel. Hasanta is generally described as the ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. As we see here, however, Bangla orthography ends up with a deviant feature: the Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
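The deviant pattern just described, a Hasanta immediately after a full vowel, can be captured in a simple context check. The following Python sketch is only an illustration under stated assumptions and is not the normative whole label evaluation rule, which lives in the XML file. It accepts a Hasanta after a consonant (optionally followed by a nukta). It accepts a Hasanta after অ or এ only when ya follows. The consonant test is a deliberately rough range check, and `hasanta_context_ok` is a hypothetical name.

```python
HASANTA, NUKTA, YA = "\u09CD", "\u09BC", "\u09AF"
VOWELS_BEFORE_YA_PHALA = {"\u0985", "\u098F"}  # অ, এ


def is_consonant(ch: str) -> bool:
    # Rough range check for illustration; it also admits a few unassigned code points.
    return len(ch) == 1 and (0x0995 <= ord(ch) <= 0x09B9 or ch in "\u09F0\u09F1")


def hasanta_context_ok(label: str) -> bool:
    """Check every Hasanta in the label against the two allowed contexts."""
    for i, ch in enumerate(label):
        if ch != HASANTA:
            continue
        prev = label[i - 1] if i > 0 else ""
        if prev == NUKTA and i > 1:
            prev = label[i - 2]      # e.g. DDA + nukta + Hasanta: look past the nukta
        if is_consonant(prev):
            continue
        nxt = label[i + 1] if i + 1 < len(label) else ""
        if prev in VOWELS_BEFORE_YA_PHALA and nxt == YA:
            continue                 # অ্য / এ্য: ya-phala after a full vowel
        return False
    return True
```

The exception is tied to the following ya as well as the preceding vowel. A bare vowel + Hasanta, such as ই followed by ্, is therefore still rejected.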

3.3.10 Formation of Ra-phalaa and Ref Sequences

This case concerns the formation of repha and ra-phalā, as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA, Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the ra slot can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same. The LGR notes this concern about the two RAs in disguise: in a label so generated, it would be completely impossible to tell them apart with the naked eye, which may lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a Hasanta. The two RAs can be distinguished only by looking at their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as a web user may never check the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. It is also noteworthy that the REPHA can occur with KHANDA TA. In this KHANDA TA context, the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
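The variant relation described above can be sketched as label generation: every RA adjacent to a Hasanta may be swapped for the other RA. The following Python sketch assumes that the swap applies on both sides of the Hasanta (repha and ra-phalā), as the examples অর্ক and অভ্র above suggest. The normative context for Variant Set 4 is the one in the XML file, and `ra_variants` is a hypothetical helper. In an LGR, labels generated this way would be blocked variants, so they would not be available for separate allocation.

```python
import itertools

BANGLA_RA, ASSAMESE_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"
SWAP = {BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}


def ra_variants(label: str) -> set:
    """Labels obtained by swapping any RA that is adjacent to a Hasanta.

    Covers repha (RA + Hasanta + C) and ra-phala (C + Hasanta + RA).
    The original label itself is not included in the result.
    """
    positions = [
        i for i, ch in enumerate(label)
        if ch in SWAP
        and (label[i + 1:i + 2] == HASANTA or label[i - 1:i] == HASANTA)
    ]
    variants = set()
    # Try every combination of swapped / unswapped RAs at those positions.
    for flips in itertools.product((False, True), repeat=len(positions)):
        chars = list(label)
        for pos, flip in zip(positions, flips):
            if flip:
                chars[pos] = SWAP[chars[pos]]
        variants.add("".join(chars))
    variants.discard(label)
    return variants


print(ra_variants("\u0985\u09B0\u09CD\u0995"))  # variants of অর্ক
```

An RA that does not touch a Hasanta, as in রাম, keeps its own identity and produces no variant. This matches the restriction of the variant to the Hasanta context.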

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. An attempt is made, however, to keep the fundamental philosophy behind building those LGRs consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selecting code points for the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous Use

Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope

The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy. The set of characters available for selection as part of the Root Zone code point repertoire is therefore already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart

Of all the characters needed by the script in question, a character that is not encoded in Unicode cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode, as the character-encoding standard aiming at the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names are a specialized case and are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire (MSR)

The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, an even more specialized case of domain names. The Root Zone LGR Procedure therefore introduces additional exclusions on IDNA’s allowed set of characters.


Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.

To sum up, the restrictions start by admitting only characters that are part of the code block of the given script/language. The IDNA Protocol narrows this set further, and finally the Maximal Starting Repertoire acts as an additional filter that restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks

Since TLDs are identifiers, punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations

Abbreviations, weights and measures and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) etc., will also not be included.

4.2.1.3 No Rare and Obsolete Characters

Some characters have been added to Unicode to accommodate rare forms, such as Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), as well as the allographic -kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. However, the -kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in Bangla in certain limited borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF

The Augmented Backus-Naur Form (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
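The three successive filters of section 4.2.1 can be sketched as a chain of predicates over code points. This Python example is a rough approximation for illustration only. `roughly_idna_pvalid` mimics just two parts of RFC 5892: the LetterDigits general categories, and rejection of code points that change under NFKC. It ignores the exception lists, contextual rules and other derivation steps. `MSR_EXCLUDED` is a hypothetical placeholder holding only the Avagraha named above, since the real MSR is a separately published table. Digits are dropped at the root-zone stage, following the colour convention of Figure 1.

```python
import unicodedata

# RFC 5892 "LetterDigits" general categories.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
# Placeholder for the MSR exclusions: only Avagraha is taken from the text above.
MSR_EXCLUDED = {0x09BD}


def in_bengali_block(cp: int) -> bool:            # filter i: the Unicode block
    return 0x0980 <= cp <= 0x09FF


def roughly_idna_pvalid(cp: int) -> bool:         # filter ii: rough IDNA2008 check
    ch = chr(cp)
    return (unicodedata.category(ch) in LETTER_DIGITS
            and unicodedata.normalize("NFKC", ch) == ch)


def root_zone_candidate(cp: int) -> bool:         # filter iii: MSR, and no digits
    return (in_bengali_block(cp)
            and roughly_idna_pvalid(cp)
            and cp not in MSR_EXCLUDED
            and unicodedata.category(chr(cp)) != "Nd")


for cp in (0x0995, 0x09BD, 0x09DC, 0x09E6, 0x09FA):
    print(f"U+{cp:04X}", root_zone_candidate(cp))
```

Ordering the checks this way mirrors the narrowing described in the text: each stage can only remove code points, never add them back.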

5 Repertoire

The Bangla writing system is represented in Unicode by the Bengali (Bangla) script name, as enumerated in ISO 15924, which corresponds to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes for inclusion in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. At its 8th Meeting, held on 23rd Aug 2017, the BIS Indian Language Technologies and Products Sectional Committee LITD 20 decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, the Code Point Repertoire under Table 11 will be assumed to be valid for all three languages above. For each code point, language references are given in the last column, titled Reference, of Table 8, the “Code Point Repertoire”. To cover all Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to have introduced any new complications except for one particular character. Only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, but together they cover all the code points required by all the languages that the NBGP has considered, as listed under Bangla Unicode Points (as given in Unicode 6.3). Before the details are presented, however, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. Note that the shapes of the reference glyphs in the code charts below are based on one of many available fonts and are not prescriptive, since actual fonts, both Unicode-compatible and TrueType, may vary. Consider the following code point table:

Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention: all characters included in the [MSR]: yellow background; PVALID in IDNA2008 but excluded from the [MSR]: pinkish background; not PVALID in IDNA2008 or ineligible for the root zone (digits, hyphen): white background.

Given the Bangla Unicode block as shown in Figure 1, the following symbols among the code points included in the MSR will need separate treatment:

ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire Inclusion

No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; 09DC is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; 09DD is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]

37 U+09AA প BENGALILETTERPA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

38 U+09AB ফ BENGALILETTERPHA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

39 U+09AC ব BENGALILETTERBA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

32

No UnicodeCodePoint

Glyph

CharacterName

Category Language(s)withEGIDSValue

ReferencesandComment

40 U+09AD ভ BENGALILETTERBHA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

41 U+09AE ম BENGALILETTERMA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

42 U+09AF য BENGALILETTERYA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

43 09AF09BC(U+09DF)

য় NormalizedformofBENGALILETTERYYA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]09DFisthepreferredcodepointhoweveritisnotavailableforLGRasperthestandardsgoverningthisLGRdevelopment

44 U+09B0 র BENGALILETTERRA

Consonant 1Bangla2Manipuri

[112][125]

45 U+09B2 ল BENGALILETTERLA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

46 U+09B6 শ BENGALILETTERSHA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

33

No UnicodeCodePoint

Glyph

CharacterName

Category Language(s)withEGIDSValue

ReferencesandComment

47 U+09B7 ষ BENGALILETTERSSA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

48 U+09B8 স BENGALILETTERSA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

49 U+09B9 হ BENGALILETTERHA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

50 U+09BE া BENGALIVOWELSIGNAA

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

51 U+09BF ি BENGALIVOWELSIGNI

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

52 U+09C0 ী BENGALIVOWELSIGNII

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

53 U+09C1 BENGALIVOWELSIGNU

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

34

No UnicodeCodePoint

Glyph

CharacterName

Category Language(s)withEGIDSValue

ReferencesandComment

54 U+09C2 BENGALIVOWELSIGNUU

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

55 U+09C3 BENGALIVOWELSIGNVOCALICR

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

56 U+09C4 BENGALIVOWELSIGNVOCALICRR

Kāra(Mātrā)

1Bangla2Assamese

[112][122]

57 U+09C7 l BENGALIVOWELSIGNE

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

58 U+09C8 m BENGALIVOWELSIGNAI

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

59 U+09CB lা BENGALIVOWELSIGNO

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

60 U+09CC lৗ BENGALIVOWELSIGNAU

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

35

No UnicodeCodePoint

Glyph

CharacterName

Category Language(s)withEGIDSValue

ReferencesandComment

61 U+09CD BENGALISIGNVIRAMA

Hasanta(=Halant)Virama(=Da ri)

1Bangla2Assamese2Manipuri

[112][122][125]

62 U+09CE ৎ BENGALILETTERKHANDATA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

63 U+09F0 ৰ BENGALILETTERRAWITHMIDDLEDIAGONAL

Consonant 2Assamese [122]

64 U+09F1 ৱ BENGALILETTERRAWITHLOWERDIAGONAL

Consonant 2Assamese2Manipuri

[122][125]

Table8BanglaCode-PointRepertoire
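One way to check a label against the repertoire in Table 8 is sketched below in Python. It is a hypothetical illustration, not the NBGP's XML LGR file: the names `SINGLE_CODE_POINTS`, `NUKTA_SEQUENCES` and `in_repertoire` are invented here, and the sketch assumes labels are in Unicode Normalization Form C (NFC), as IDNA2008 requires. Under NFC the precomposed U+09DC, U+09DD and U+09DF are always decomposed into base letter + U+09BC, because they are composition exclusions; this is why entries 28, 30 and 43 are listed as sequences.

```python
import unicodedata

# Hypothetical encoding of Table 8: the 61 single code points ...
SINGLE_CODE_POINTS = (
    {0x0981, 0x0982, 0x0983}                          # candrabindu, anusvara, visarga
    | {0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
       0x098B, 0x098F, 0x0990, 0x0993, 0x0994}        # independent vowels
    | set(range(0x0995, 0x09A9))                      # KA .. NA
    | set(range(0x09AA, 0x09B1))                      # PA .. RA
    | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9}        # LA, SHA, SSA, SA, HA
    | {0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3,
       0x09C4, 0x09C7, 0x09C8, 0x09CB, 0x09CC}        # vowel signs (kāra)
    | {0x09CD, 0x09CE, 0x09F0, 0x09F1}                # virama, khanda ta, Assamese RA/WA
)

# ... plus the three normalized nukta sequences (entries 28, 30 and 43).
NUKTA_SEQUENCES = {"\u09A1\u09BC", "\u09A2\u09BC", "\u09AF\u09BC"}


def in_repertoire(label: str) -> bool:
    """Return True if `label` is NFC and built only from Table 8 elements."""
    if unicodedata.normalize("NFC", label) != label:
        return False  # not in NFC, e.g. a precomposed U+09DC
    i = 0
    while i < len(label):
        if label[i:i + 2] in NUKTA_SEQUENCES:
            i += 2
        elif ord(label[i]) in SINGLE_CODE_POINTS:
            i += 1
        else:
            return False  # e.g. a lone NUKTA or an excluded code point
    return True


print(unicodedata.normalize("NFC", "\u09DC") == "\u09A1\u09BC")  # True
print(in_repertoire("\u09AC\u09BE\u0982\u09B2\u09BE"))            # বাংলা: True
print(in_repertoire("\u098C"))                                      # ঌ (excluded): False
```

Keeping the nukta sequences separate from the single code points mirrors the table's own split and makes the exclusion of a lone U+09BC (Section 5.3) fall out naturally.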

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes specific sequences that enable the conditional inclusion of Bangla LETTER A or LETTER E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, and then by Bangla VOWEL SIGN AA, in order to represent the æ sound as in English 'bat', 'cat', etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference
S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla, Assamese | [112], [122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire of this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only in sequences, forming the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To facilitate understanding for users of other Brahmi scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). The number of D, B or X that can follow a V in Bangla is not necessarily restricted to one. However, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. It is valid and possible to have combinations of a Candrabindu with an Anusvāra on the one hand, and of a Candrabindu with a Visarga on the other, as in হ্যাঁংচা 'hænchā' and হ্যাঁঃ 'hæn' respectively; that is, a Visarga or an Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla. A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF Bangla letter য ('ya'), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta), U+09AF য BENGALI LETTER YA>, and it has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA ('aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].


Examples:

VB অং अं

VD অঁ अँ

VX অঃ अः

VDB অঁং अँं

VDX অঁঃ अँः

VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
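The vowel-sequence grammar above, together with the Appendix 10.1 restriction that only অ and এ take the VHCM tail, can be expressed as a single regular expression. This is a minimal Python sketch; the pattern and function names are hypothetical, and it validates one vowel sequence, not a whole label.

```python
import re

# Character classes taken from Table 8 (hypothetical names).
VOWEL = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"            # V
YA_PHALA_AA = "\u09CD\u09AF\u09BE"                            # H + YA + AA (the HCM tail)
TAIL = "(?:\u0981\u0982|\u0981\u0983|\u0981|\u0982|\u0983)"  # DB | DX | D | B | X

# (অ or এ) + HCM, or a plain V; either may take one optional tail.
VOWEL_SEQUENCE = re.compile(
    "(?:[\u0985\u098F]" + YA_PHALA_AA + "|" + VOWEL + ")" + TAIL + "?"
)


def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(s) is not None


print(is_vowel_sequence("\u0985"))                    # অ: True
print(is_vowel_sequence("\u0985\u0981\u0982"))        # অঁং (VDB): True
print(is_vowel_sequence("\u0985\u09CD\u09AF\u09BE\u0983"))  # অ্যাঃ (VHCMX): True
print(is_vowel_sequence("\u0985\u0981\u0981"))        # অঁঁ (VDD): False
print(is_vowel_sequence("\u0987\u09CD\u09AF\u09BE"))  # ই্যা: False
```

Because the tail alternatives are listed in storage order (Candrabindu first), a sequence such as Anusvāra followed by Candrabindu is rejected, matching the DB and DX orderings given above.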

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example

CM িকক -कक

CB কং कं

CD কঁ कँ

CX কঃ कः

CH ক্ क् (pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Example CMB কীংকং कक

CMD কাঁ काँ

CMX বীঃ वीः

CMDB কাঁং काँं

CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example:

CHC ন্ত → ন + ্ + ত (न + ् + त)

CHCHC ন্ত্র → ন + ্ + ত + ্ + র (न + ् + त + ् + र)

CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য (न + ् + त + ् + र + ् + य)

5.4.3.5 Subsets
When considering the subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

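Example:

CHCM ক্কী → ক + ্ + ক + ী (क्की → क + ् + क + ी)

CHCB ক্কং → ক + ্ + ক + ং (क्कं → क + ् + क + ं)

CHCD ক্কঁ → ক + ্ + ক + ঁ (क्कँ → क + ् + क + ँ)

CHCX ক্কঃ → ক + ্ + ক + ঃ (क्कः → क + ् + क + ः)

CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং (क्कँं → क + ् + क + ँ + ं)

CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ (क्कँः → क + ् + क + ँ + ः)

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example:

CHCMB ক্কীং → ক + ্ + ক + ী + ং (क्कीं → क + ् + क + ी + ं)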

sup3ং rarr ক ক ং 4क rarr क क

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ (क्काँ → क + ् + क + ा + ँ)

CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ (क्कीः → क + ् + क + ी + ः)

CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং (क्काँं → क + ् + क + ा + ँ + ं)

CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ (क्काँः → क + ् + क + ा + ँ + ः)
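Likewise, the consonant-sequence grammar of 5.4.3 (C H, or *3(CH) C [M] [B / D / X / DB / DX]) can be sketched as a regular expression. The names are hypothetical. The nukta forms ড়, ঢ় and য় are treated as consonants, and a cluster ending in Hasanta is deliberately rejected because the subsections above do not list it.

```python
import re

# C: a nukta form (two code points) or any single consonant from Table 8.
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")
H = "\u09CD"                                                  # Hasanta
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                 # kāra
TAIL = "(?:\u0981\u0982|\u0981\u0983|\u0981|\u0982|\u0983)"   # DB | DX | D | B | X

# C H (pure consonant), or up to three C H pairs, a final C, optional M, optional tail.
CONSONANT_SEQUENCE = re.compile(
    "(?:" + C + H + "|(?:" + C + H + "){0,3}" + C + M + "?" + TAIL + "?)"
)


def is_consonant_sequence(s: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(s) is not None


print(is_consonant_sequence("\u0995\u09CD"))                                # ক্: True
print(is_consonant_sequence("\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF"))  # ন্ত্র্য: True
print(is_consonant_sequence("\u0995\u09CD\u0995\u09C0\u0982"))              # ক্কীং: True
print(is_consonant_sequence("\u0995\u09CD" * 4 + "\u0995"))                 # 5 consonants: False
print(is_consonant_sequence("\u0995\u0982\u0982"))                          # কংং: False
```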

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z = ৎ

5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
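The [CH]Z condition can be checked with a few lines of Python. This is a minimal sketch with hypothetical names: for each Khanda Ta it rejects a label-initial position (the ban stated in Appendix 10.1) and, when the Khanda Ta follows a Hasanta, requires the consonant before that Hasanta to be র or ৰ. It does not implement the full WLE rule 7.

```python
KHANDA_TA = "\u09CE"
HASANTA = "\u09CD"
RA_FORMS = {"\u09B0", "\u09F0"}  # Bangla RA and Assamese RA, per the note above


def khanda_ta_ok(label: str) -> bool:
    """Check every Khanda Ta in `label` against the [CH]Z condition."""
    for i, ch in enumerate(label):
        if ch != KHANDA_TA:
            continue
        if i == 0:
            return False  # Khanda Ta cannot start a label
        if label[i - 1] == HASANTA and (i < 2 or label[i - 2] not in RA_FORMS):
            return False  # the C in C + Hasanta + Khanda Ta must be RA
    return True


print(khanda_ta_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
print(khanda_ta_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ: False
```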

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta mark. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

Another case refers to the formation of repha and ra-phalā in the said script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, either RA loses its own implicit vowel (REPHA) or the preceding consonant loses its implicit vowel (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra "cycle"). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. However, cases belonging to Group 2 are proposed to be treated as variants. These cases are not of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation; they can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when the same visible form can be produced by two or more different code point sequences. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
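The কো example can be reproduced with Python's standard `unicodedata` module. This sketch shows a Unicode fact that complements the argument above: U+09CB BENGALI VOWEL SIGN O is canonically equivalent to U+09C7 + U+09BE, so, assuming labels are normalized to NFC before evaluation (as IDNA2008 requires), the two-key spelling is folded into the single-key one. The constant names are illustrative only.

```python
import unicodedata

ONE_KEY = "\u0995\u09CB"         # KA + VOWEL SIGN O
TWO_KEYS = "\u0995\u09C7\u09BE"  # KA + VOWEL SIGN E + VOWEL SIGN AA

print(ONE_KEY == TWO_KEYS)                                # False: different code points
print(unicodedata.normalize("NFC", TWO_KEYS) == ONE_KEY)  # True: NFC composes them
print(unicodedata.normalize("NFD", ONE_KEY) == TWO_KEYS)  # True: NFD decomposes U+09CB
```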

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.

CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus

স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus

ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in paragrantha)
The fonts which represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II
Another interesting example of a variant is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences (mainly because Assamese and Bangla share the same Unicode block, and 'B_RA' as well as 'A_RA' refer to the same phonetic element). Here the final ligatures look the same and will be as follows:

(1) B_RA + H + C
(2) A_RA + H + C

Where

B_RA = U+09B0 BENGALI LETTER RA (র), or
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with Repha look identical; the addition of Repha does not make any difference. 'Ra-phala' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

Where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phala

As the Assamese and Bangla Repha and Ra-phala conjunct forms look the same, this could cause confusability for end users. Hence the repha and ra-phala cases need to be defined as variants. The NBGP concluded that র and ৰ should be defined as variant code points, since a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. It is only blocked at the label level, though: if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, it is still possible.
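The blocking behaviour described above can be illustrated by enumerating the variant labels of a label. This Python sketch assumes the usual permutation-based variant generation, in which every র or ৰ position is swapped independently; the function name is hypothetical, and allocation policy (which label is allocatable and which are blocked) is not modelled.

```python
from itertools import product

RA_SET = ("\u09B0", "\u09F0")  # the single variant set {র, ৰ}


def variant_labels(label: str) -> set:
    """Return the label plus every label reachable by swapping র/ৰ per position."""
    choices = [RA_SET if ch in RA_SET else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


labels = variant_labels("\u09B0\u09B0\u09B0")  # ররর
print(len(labels))                             # 8: ররর itself plus 7 variant labels
print("\u09F0\u09F0\u09F0" in labels)          # True: ৰৰৰ is a variant of ররর
print("\u09F0\u09F0" in labels)                # False: ৰৰ is a different label
```

The number of variant labels grows as 2^n for n RA positions, which is why blocking happens only for labels of the same shape, never for ৰৰ or ৰৰৰৰ.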

6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Apart from the cases described in Section 6.1, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as cross-script variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

¹¹ Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.


1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 - Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 - Bangla and Gurmukhī cross-script variant code points
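Tables 14 and 15 can be read as symmetric code point pairs. The Python sketch below (hypothetical names) tests whether one label can be turned into another purely by substituting listed pairs position by position; a real LGR comparison would also apply each script's own whole-label evaluation, which is omitted here.

```python
# Tables 14 and 15 as unordered pairs (hypothetical encoding).
CROSS_SCRIPT_PAIRS = {
    frozenset({"\u09AE", "\u092E"}),  # Bangla MA / Devanagari MA
    frozenset({"\u09BF", "\u093F"}),  # Bangla vowel sign I / Devanagari vowel sign I
    frozenset({"\u09AE", "\u0A38"}),  # Bangla MA / Gurmukhi SA
    frozenset({"\u09BF", "\u0A3F"}),  # Bangla vowel sign I / Gurmukhi vowel sign I
}


def cross_script_variant(a: str, b: str) -> bool:
    """True if b is obtained from a by substituting listed pairs at each position."""
    if len(a) != len(b):
        return False
    return all(x == y or frozenset({x, y}) in CROSS_SCRIPT_PAIRS for x, y in zip(a, b))


print(cross_script_variant("\u09AE\u09BF", "\u092E\u093F"))  # মি vs मि: True
print(cross_script_variant("\u09AE\u09BF", "\u0A38\u0A3F"))  # মি vs ਸਿ: True
print(cross_script_variant("\u09AE", "\u0995"))              # ম vs ক: False
```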

7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can easily be translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16 - Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is perhaps worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that could theoretically conjoin from two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, below are the specific WLE rules that need to be implemented:


1. C is a set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C.

Example

3. M must be preceded by C. Example: কা

4. D must be preceded by either V, C or M. Example: আখখা হা

5. X must be preceded by either V, C, M or D. Example: উঃখঃবঃাঃ দ ঃ

6. B must be preceded by either V, C, M or D. Example: আং ইং কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
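The nine rules above can be prototyped as a left-to-right scan over categorized tokens. The Python sketch below is hypothetical (the names `CATEGORY`, `ALLOWED_BEFORE`, `tokenize` and `passes_wle` do not come from the LGR XML file) and makes two reading assumptions: an S token counts as ending in M when checking rules 4 to 7, as the note to rule 7 suggests, and P is recognized only where it immediately precedes a Z.

```python
VOWELS = [0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
          0x098B, 0x098F, 0x0990, 0x0993, 0x0994]
CONSONANTS = (list(range(0x0995, 0x09A9)) + list(range(0x09AA, 0x09B1))
              + [0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1])
KARAS = [0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3,
         0x09C4, 0x09C7, 0x09C8, 0x09CB, 0x09CC]

# Table 16 categories for single code points.
CATEGORY = {chr(cp): "V" for cp in VOWELS}
CATEGORY.update({chr(cp): "C" for cp in CONSONANTS})
CATEGORY.update({chr(cp): "M" for cp in KARAS})
CATEGORY.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
                 "\u09CD": "H", "\u09CE": "Z"})

CN = {"\u09A1\u09BC", "\u09A2\u09BC", "\u09AF\u09BC"}         # rule 1
S = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # S1, S2
RA = {"\u09B0", "\u09F0"}                                      # C2 of P

# Rules 2 to 7: the categories allowed immediately before each category.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},
}


def tokenize(label):
    """Split a label into (category, text) tokens; None if outside the repertoire."""
    tokens, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S:
            tokens.append(("S", label[i:i + 4]))
            i += 4
        elif label[i:i + 2] in CN:
            tokens.append(("C", label[i:i + 2]))
            i += 2
        elif label[i] in CATEGORY:
            tokens.append((CATEGORY[label[i]], label[i]))
            i += 1
        else:
            return None
    return tokens


def passes_wle(label):
    tokens = tokenize(label)
    if not tokens:
        return False
    for k, (cat, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k > 0 else None
        if cat in ("V", "S") and prev == "H":
            return False                                # rules 8 and 9
        if cat in ALLOWED_BEFORE:
            effective = "M" if prev == "S" else prev    # S ends with M
            if effective in ALLOWED_BEFORE[cat]:
                continue
            # Rule 7 also admits P (RA + HASANTA) before Z.
            is_p = cat == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA
            if not is_p:
                return False
    return True


print(passes_wle("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
print(passes_wle("\u0995\u09CD\u09CE"))                          # ক্ৎ: False
print(passes_wle("\u0995\u09CD\u0987"))                          # ক্ই: False (rule 8)
```

Tokenizing S and CN first keeps rules 1 and 9 trivial and lets the remaining rules be a single table of permitted predecessors, which is close to how such context rules are usually written in an LGR.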

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures: any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning Bank of India). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.


This is a unique situation, necessitated by the lack of the hyphen, the space and the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceptive similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such a combination automatically results in a "golu" or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian language syllable and is quasi-automatically applied wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Example

অ্ अ्

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a Consonant, Vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalidated:

Example:

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a Consonant is restricted to one. Example:

কীী की

5. M is not permitted after V. Example:

ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः

কঃং कःं
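The invalid patterns listed in this section can also be caught directly on the raw code points with regular expressions. This is a hypothetical Python sketch (`RESTRICTIONS` and `violations` are invented names): it reports which numbered items of this section a label violates and, for item 2, exempts the S1/S2 sequences অ্যা and এ্যা.

```python
import re

V = "\u0985-\u098B\u098F\u0990\u0993\u0994"  # vowels (used inside [...])
M = "\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC"  # kāras
BDX = "\u0981-\u0983"                        # candrabindu, anusvara, visarga
H = "\u09CD"                                 # hasanta

# (item number in this section, pattern that signals a violation)
RESTRICTIONS = [
    (1, re.compile("^[" + H + M + BDX + "]")),
    (2, re.compile("[" + M + BDX + "]" + H
                   + "|[\u0986-\u098B\u0990\u0993\u0994]" + H
                   + "|[\u0985\u098F]" + H + "(?!\u09AF\u09BE)")),  # S1/S2 exempt
    (3, re.compile("([" + BDX + "])\\1")),
    (4, re.compile("[" + M + "]{2}")),
    (5, re.compile("[" + V + "][" + M + "]")),
    (6, re.compile("\u0982\u0983|\u0983\u0982")),
]


def violations(label: str) -> list:
    """Return the item numbers whose pattern occurs in `label`."""
    return [n for n, pattern in RESTRICTIONS if pattern.search(label)]


print(violations("\u0995\u0982\u0983"))                          # কংঃ: [6]
print(violations("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: []
```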

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and language planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. (1957). 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. (2015). 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. (1960). 'Phones of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. (2007). Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. (1966). Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. (1960). A phonetic and phonological study of nasals and nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646 (UNICODE). https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). (1942). Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. (1975). Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). (2014). Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. (2012). Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র) (09B0):

র্ৎ as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
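The three Bangla-specific restrictions above can be sketched as a single check. The function name and constants are hypothetical. Following footnote 13, restriction 2 here admits only the Bangla RA (U+09B0) before Hasanta + Khanda Ta, which is stricter than the note in 5.4.4.2 that also admits the Assamese RA (U+09F0).

```python
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"
NOT_INITIAL = {"\u09CE", "\u099E", "\u0999"}  # ৎ, ঞ, ঙ (restriction 1)
VOWELS = {chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                             0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা


def bangla_restrictions_ok(label: str) -> bool:
    """Apply restrictions 1 to 3 of Appendix 10.1 (Bangla language only)."""
    if not label or label[0] in NOT_INITIAL:
        return False                           # restriction 1
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] != "\u09B0":
                return False                   # restriction 2: only র + ্ + ৎ
        if ch in VOWELS and label[i + 1:i + 2] == HASANTA:
            if label[i:i + 4] not in ALLOWED_VHCM:
                return False                   # restriction 3
    return True


print(bangla_restrictions_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং: False
print(bangla_restrictions_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: True
```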

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

10.2 ‘Sylheti Nāgarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names of Sylheti Nāgarī or Siloti [129], such as ‘Jalalabada Nāgarī’, ‘Fula (flower) Nāgarī’, ‘Muslim Nāgarī’ or ‘Muhammad Nāgarī’. It is said that Shah Jalal had brought the script with him to Sylhet in the 13th–14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to the Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17 – The Script Table of Sylheti Nāgarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points


11 Appendix-II: Bengali consonants and their allographs

Consonant | Phonetic Value | Allographs (Clusters) | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ) Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but it is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). But after a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক + য), স্য (স + য), র‍্য (র + য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা]


র r Two manifestations:
(i) রেফ (repʰ) as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র + ধ + ব; earlier র্দ্ধ্ব = র + দ + ধ + ব, a four-term cluster), etc. (placed over the following consonant);
(ii) র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


ন্তু (ntu), rather than writing as ন (n) + ত (t) + ু (u).

Similarly, there could be vowel modifiers of ঊ or ‘(long) ū’ as well, e.g.:

ভ (bh) + র (r) (ভ্রূ bhrū “eyebrow”), শ (ś) + র (r) (শ্রূ śrū), ঋ (ṛ) after হ (h) (হৃ hṛ), etc.

There have been many notable contributions to simplifying and modifying Bangla spellings and combinatory techniques, especially by scholars such as Pabitra Sarkar (1992) [134]. These have attempted to reduce the number of allographs of both vowels and consonants in clusters, and the results have been widely accepted in the printing of school texts in both Bangladesh and West Bengal [151, 152]. As of now, two systems, the old (traditional) and the new, go on side by side, operative in different domains.

However, in preparing this LGR document, the aim has been to consider the widely used and usable sequences and combinations, and their variations across the sister scripts belonging to the basket of Brāhmī writing systems. Bangla Academy, Dhaka published the Standard Bangla Spelling Rules in 1992, following the recommendations of a committee constituted through a workshop jointly organized by the Jatīya Śikṣākrama and Pāṭhyapustaka Board in 1988. A thoroughly revised edition of the Rules was published in September 2012.⁶ After the establishment of the Bangla Akademi of West Bengal in 1986, its first President, Annadasankar Ray (1904–2002), in his inaugural address gave a direction for the standardization of the Bangla alphabet, script and spelling system, and clearly argued that they would not blindly follow the Sanskritic model of conventional grammar. A broad list of proposals was sent to experts on Bangla, and a broad agreement was reached on the ‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was more comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with the ‘rationalization’ of spellings, another step was taken to make the writing system easier to read by making the symbols used, both single and combined ones, more ‘transparent’. These reforms were originally suggested by Sarkar (1987, first published in 1978) [134][153], where he used the terms Swaccha (‘Transparent’) and Aswaccha (‘Opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) in between the two. Some sample examples are:

6 Bangla Academy, 2012. Bāṅlā Ekāḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard Bangla Spelling Rules). Dhaka: Bangla Academy.


Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.
Opaque: where neither of the two could be (easily) recognized: ক্ষ kṣ (ক k + ষ ṣ), জ্ঞ jñ (জ j + ঞ ñ), ঙ্গ ṅg (ঙ ṅ + গ g), হ্ম hm (হ h + ম m).
Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In the case of three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র str (স s + ত t + র r), etc.

There were, in fact, two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations and of conjuncts, which are consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other type dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was ‘one word, one spelling’ to the greatest extent possible [151].

Below, we place a statement of the most salient changes that affect the consonant + vowel combinations [153]:

a. The variants of the short u (হ্রস্ব উ-কার, hrasva u-kāra) vowel sign have been brought down to one, i.e. ু. So গু (gu) is now written as গ + ু. Similarly, রু (ru) > র + ু, শু (śu) > শ + ু, হু (hu) > হ + ু, and therefore, for a cluster + short u sign, ন্তু (ntu) > (ন + ্ + ত + উ) and স্তু (stu) > (স + ্ + ত + উ).

b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced: রূ (rū) > র + ূ, ভ্রূ (bhrū) > (ভ bh + ্ + র r + ঊ ū), দ্রূ (drū) > (দ d + ্ + র r + ঊ ū), শ্রূ (śrū) > (শ ś + ্ + র r + ঊ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one: হৃ (hṛ) > হ + ৃ.

Regarding consonant + consonant + (consonant) ... + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters, to the extent admissible in the Bangla writing system. Some examples will clarify the proposal (a slash will mean that the traditional cluster-shape precedes it while the Bangla Akademi innovation follows) [153]:

ব্ধ bdh (ব b + ধ dh), দ্ধ ddh (দ d + ধ dh), ন্থ nth (ন n + থ th), ঞ্চ ñc (ঞ ñ + চ c), ঞ্ছ ñch (ঞ ñ + ছ ch), ঞ্জ ñj (ঞ ñ + জ j), ক্ত kt (ক k + ত t), ক্র kr (ক k + র r), গ্ধ gdh (গ g + ধ dh), ঙ্ক ṅk (ঙ ṅ + ক k), ঙ্গ ṅg (ঙ ṅ + গ g), ষ্ণ ṣṇ (ষ ṣ + ণ ṇ), ন্ধ্র ndhr (ন n + ধ dh + র r), ণ্ড্র ṇḍr (ণ ṇ + ড ḍ + র r), ক্ত্র ktr (ক k + ত t + র r)

3.3.1 The Consonants
As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five ‘vargas’ (pronounced ‘barga’ in Bangla), or groups (sets or classes), distinguished by place of articulation, and one non-‘varga’ group [105]. Each varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified according to their phonetic qualities (i.e. manner of articulation), beginning with unvoiced and unaspirated, going on to voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:

‘Varga’ or Set | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | ক ‘K’ U+0995 | খ ‘KH’ U+0996 | গ ‘G’ U+0997 | ঘ ‘GH’ U+0998 | ঙ ‘NG’ U+0999
Palatal | চ ‘C’ U+099A | ছ ‘CH’ U+099B | জ ‘J’ U+099C | ঝ ‘JH’ U+099D | ঞ ‘NY’ U+099E
Retroflex | ট ‘TT’ U+099F | ঠ ‘TTH’ U+09A0 | ড ‘DD’ U+09A1 | ঢ ‘DDH’ U+09A2 | ণ ‘NN’ U+09A3
Dental | ত ‘T’ U+09A4 | থ ‘TH’ U+09A5 | দ ‘D’ U+09A6 | ধ ‘DH’ U+09A7 | ন ‘N’ U+09A8
Bilabial | প ‘P’ U+09AA | ফ ‘PH’ U+09AB | ব ‘B’ U+09AC | ভ ‘BH’ U+09AD | ম ‘M’ U+09AE

Table 5 – Varga classification of Bangla consonants
(Falling into a pattern of five sets of Unvoiced Unaspirated, Unvoiced Aspirated, Voiced Unaspirated, Voiced Aspirated and Nasals, called the five ‘vargas’)
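Because the Unicode Bengali block encodes the five vargas in the same order as Table 5, the whole grid can be regenerated from five starting code points (the bilabial row starts at U+09AA because U+09A9 is unassigned in this block). The Python sketch below uses only the standard unicodedata module; the column labels are shorthand introduced here for illustration.

```python
import unicodedata

# First code point of each varga; each varga occupies five consecutive code points.
VARGA_STARTS = {
    "Velar": 0x0995,
    "Palatal": 0x099A,
    "Retroflex": 0x099F,
    "Dental": 0x09A4,
    "Bilabial": 0x09AA,  # U+09A9 is unassigned in the Bengali block
}
COLUMNS = ("-voice -asp", "-voice +asp", "+voice -asp", "+voice +asp", "nasal")


def varga_table() -> dict[str, list[tuple[str, str, str]]]:
    """Return {varga: [(column, character, Unicode name), ...]} for the 5x5 grid."""
    table = {}
    for varga, start in VARGA_STARTS.items():
        row = []
        for offset, column in enumerate(COLUMNS):
            ch = chr(start + offset)
            row.append((column, ch, unicodedata.name(ch)))
        table[varga] = row
    return table


for varga, row in varga_table().items():
    print(varga, " ".join(ch for _, ch, _ in row))
```

Deriving the grid from the code chart rather than typing it out is a cheap way to cross-check a hand-made table such as Table 5 against the Unicode character names.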

Non-Varga consonants:
য ‘Y’ U+09AF
য় ‘YY’ U+09DF
র ‘R’ U+09B0
ড় ‘RR’ U+09DC
ঢ় ‘RH’ U+09DD
ল ‘L’ U+09B2
শ ‘SH’ U+09B6
ষ ‘SS’ U+09B7
স ‘S’ U+09B8
হ ‘H’ U+09B9

Table 6 – Non-Varga consonants (not falling into any of the five categories)

3.3.2 The Implicit Vowel Killer Hasanta (called ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The terms ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and ‘Virāma’⁷ (= ‘Dari’ in Bangla), the latter preferred in Unicode (cf. Unicode 3.0 and above), have been used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in Unicode in a sense that is different from the traditional definition of grammar, and hence it requires some explanation here. Considering the importance of the document, this note should be a part of this LGR document, so that anybody referring to it is able to know the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one could, in common parlance, “kill” its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants; in rare cases, this process can join up to five consonants. However, the notion of a maximum number of consonants joining to form one akṣara⁸ has to be bounded empirically. This is an observation based on the CIIL-EMILLE corpora of Bangla words [132 & 133], as seen in print these days. Given the mixture of scripts and languages happening on the web, the possibility that one may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word which admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
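Since the LGR deliberately does not cap the number of consonants in a conjunct, a registry tool might still want to measure cluster length, for instance to flag unusually long clusters for manual review. The Python sketch below builds a conjunct by joining consonants with the Hasanta (U+09CD) and reports the longest Hasanta-joined run; the helper names, and the choice of which code points count as consonants, are assumptions made for illustration.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)
# Assumption: main consonant range U+0995..U+09B9 plus khanda-ta and the two Assamese RAs.
# A consonant written with a following nukta (U+09BC) is not handled by this sketch.
EXTRA_CONSONANTS = "\u09CE\u09F0\u09F1"


def is_consonant(ch: str) -> bool:
    return "\u0995" <= ch <= "\u09B9" or ch in EXTRA_CONSONANTS


def conjunct(*consonants: str) -> str:
    """Join consonants with Hasanta, e.g. conjunct('ক', 'ষ') gives ক্ষ."""
    return HASANTA.join(consonants)


def longest_cluster(label: str) -> int:
    """Number of consonants in the longest Hasanta-joined run in the label."""
    best = run = 0
    for i, ch in enumerate(label):
        if not is_consonant(ch):
            continue
        # A consonant continues the run only if it directly follows consonant + Hasanta.
        joined = i >= 2 and label[i - 1] == HASANTA and is_consonant(label[i - 2])
        run = run + 1 if joined else 1
        best = max(best, run)
    return best


print(longest_cluster("\u09B8\u09CD\u09A4\u09CD\u09B0\u09C0"))  # স্ত্রী: 3
```

A vowel followed by Hasanta (the অ্যা case of Section 3.3.9) does not start a run here, because the character before the Hasanta is not a consonant.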

3.3.3 Vowels
Separate symbols exist for all ‘Swara’, or vowels, in Bangla, which are pronounced independently, either at the beginning of the word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called ‘kār’ in Bangla or ‘Mātrā’ in Nāgarī⁹ is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kārs (mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virāma; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla, 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means ‘syllable’ in Indian grammatical traditions.

Vowel | Corresponding vowel sign (kār / mātrā)
অ ‘A’ U+0985 | –
আ ‘AA’ U+0986 | া U+09BE
ই ‘I’ U+0987 | ি U+09BF
ঈ ‘II’ U+0988 | ী U+09C0
উ ‘U’ U+0989 | ু U+09C1
ঊ ‘UU’ U+098A | ূ U+09C2
ঋ Vocalic ‘R’ U+098B | ৃ U+09C3
ৠ Vocalic ‘RR’ U+09E0 | ৄ U+09C4
ঌ Vocalic ‘L’ U+098C | ৢ U+09E2
ৡ Vocalic ‘LL’ U+09E1 | ৣ U+09E3
এ ‘E’ U+098F | ে U+09C7
ঐ ‘AI’ U+0990 | ৈ U+09C8
ও ‘O’ U+0993 | ো U+09CB
ঔ ‘AU’ U+0994 | ৌ U+09CC
– | ৗ U+09D7 (AU length mark)
Could appear on top of অ ‘A’ U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ ‘A’ U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ ‘A’ U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
– | ঽ U+09BD Avagraha

Table 7 – Bangla vowels with corresponding kārs

9 In Bangla, however, the term ‘Mātrā’ stands for an altogether different concept, viz. the top bar placed over a letter, typically available in Hindi and Bangla but missing in Gujarati.
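The pairing in Table 7 mirrors a regularity in the Unicode character names: each independent vowel named BENGALI LETTER X has a dependent sign named BENGALI VOWEL SIGN X, with অ as the exception. The Python sketch below relies on that naming pattern, which is an observation about the Unicode names rather than a rule stated in this LGR; the function name is hypothetical.

```python
import unicodedata
from typing import Optional

LETTER_PREFIX = "BENGALI LETTER "


def vowel_sign_for(vowel: str) -> Optional[str]:
    """Return the dependent vowel sign (kār) paired with an independent vowel.

    Returns None when no sign exists, as for অ (the inherent vowel).
    """
    name = unicodedata.name(vowel)  # e.g. "BENGALI LETTER AA"
    if not name.startswith(LETTER_PREFIX):
        raise ValueError(f"not a Bengali letter: {vowel!r}")
    sign_name = "BENGALI VOWEL SIGN " + name[len(LETTER_PREFIX):]
    try:
        return unicodedata.lookup(sign_name)
    except KeyError:
        return None


for cp in (0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
           0x098F, 0x0990, 0x0993, 0x0994):
    sign = vowel_sign_for(chr(cp))
    print(f"U+{cp:04X} -> U+{ord(sign):04X}")
```

Looking names up in this way avoids hard-coding the offsets between the two series, which are not uniform (compare U+0986 to U+09BE with U+09E1 to U+09E3).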

3.3.4 The Anusvāra onuʃʃār (ং, U+0982)
The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of ‘nasal consonant + Hasanta + consonant’ where the second consonant belongs to the velar varga or set, as in লংকা. But it also often appears for such combinations involving non-velars as the last member of the combination, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, in Bangla it is clearly demarcated where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.
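Because the Anusvāra can stand in for a velar nasal conjunct, the same word can occur as two different code point sequences, for example লংকা and লঙ্কা. The short Python sketch below (variable names are illustrative) shows that Unicode normalization does not relate the two spellings: the Anusvāra has no canonical decomposition, so any equivalence between them would have to be expressed by LGR rules, if it were wanted at all.

```python
import unicodedata

with_anusvara = "\u09B2\u0982\u0995\u09BE"        # লংকা: LA + ANUSVARA + KA + AA sign
with_conjunct = "\u09B2\u0999\u09CD\u0995\u09BE"  # লঙ্কা: LA + NGA + HASANTA + KA + AA sign


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# Both strings are already in NFC, and they remain different after normalization.
print(nfc(with_anusvara) == nfc(with_conjunct))  # False
print(len(with_anusvara), len(with_conjunct))     # 4 5
```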

3.3.5 Nasalization: Candrabindu (ঁ, U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].

3.3.6 Nukta (় U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences (YA + Nukta, U+09AF + U+09BC), (DDA + Nukta, U+09A1 + U+09BC) and (DDHA + Nukta, U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely unnatural in Bangla.
It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example, U+09DC for ড় [r], U+09DD for ঢ় [rh] and U+09DF for য় [y], each typed as a single keystroke), the IDNA protocol requires that they be treated as sequences of a base letter plus Nukta, each of whose code points is allocated in the Unicode Bengali block. It may be noted that there could be sporadic attempts at, or cases of, writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the sanctity of the loanword. This amounts to using the Bangla writing system like the IPA script. It is, however, not in use in printed Bangla.
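The effect of the composition exclusions can be checked directly with Python's standard unicodedata module: normalizing each atomic letter to NFC yields the base letter plus Nukta, and that sequence is never recomposed. The mapping below restates the three pairs given in this section; note that the Unicode name of U+09DD is BENGALI LETTER RHA.

```python
import unicodedata

# Atomic code point -> base letter + NUKTA (U+09BC), as listed in this section.
ATOMIC_TO_SEQUENCE = {
    "\u09DF": "\u09AF\u09BC",  # YYA -> YA + NUKTA
    "\u09DC": "\u09A1\u09BC",  # RRA -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # RHA -> DDHA + NUKTA
}

for atomic, sequence in ATOMIC_TO_SEQUENCE.items():
    nfc = unicodedata.normalize("NFC", atomic)
    # Composition exclusion: NFC decomposes the atomic form and does not recompose it.
    assert nfc == sequence
    assert unicodedata.normalize("NFC", sequence) == sequence
    print(unicodedata.name(atomic), "->", " + ".join(unicodedata.name(c) for c in nfc))
```

This is why a label containing the atomic forms would not be in NFC, and why only the two-code-point sequences can appear in the repertoire.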

3.3.7 Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in the Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is now rarely used, even in other languages using the Bangla script. Since Avagraha (ঽ, U+09BD) is blocked in TLDs as per the Maximal Starting Repertoire (MSR), it has been decided not to retain it in the LGR repertoire. Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
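Look-alike sequences from a neighbouring block (for instance Devanagari DA, vowel sign U and KHA around a Bengali Visarga) are an easy mistake when listing code points for a word such as দুঃখ. The Python sketch below is a coarse pre-check, using a hypothetical helper name, that flags code points outside the Bengali block (U+0980 to U+09FF) as well as the excluded Avagraha; the normative repertoire check is the one defined by the XML LGR.

```python
BENGALI_BLOCK = range(0x0980, 0x0A00)
AVAGRAHA = 0x09BD  # not in the root-zone repertoire (excluded by the MSR)


def suspicious_code_points(label: str) -> list[str]:
    """List code points that fall outside the Bengali block, plus any Avagraha."""
    return [f"U+{ord(ch):04X}" for ch in label
            if ord(ch) not in BENGALI_BLOCK or ord(ch) == AVAGRAHA]


dukkho = "\u09A6\u09C1\u0983\u0996"  # দুঃখ: DA + U sign + VISARGA + KHA, all Bengali
print(suspicious_code_points(dukkho))                      # []
print(suspicious_code_points("\u0926\u0941\u0983\u0916"))  # ['U+0926', 'U+0941', 'U+0916']
```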

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)
This note is pertinent to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to instruct the rendering of a string where the script has the option between joining and non-joining forms. Without the use of these control codes, the string may be rendered in a form other than the one intended.
Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) “Satyajit” and সৎ (sat) “honest”. This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending upon the context. E.g. র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য, one has to type in the following manner: র + ্ + য; but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is used in the rendering of words demanding a ya-phalā after ra, which is otherwise not possible to type (render), because the same order ra + hasanta + antastha ja in the medial and/or final position is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• In Bangla, ZWNJ is used to represent the explicit Hasanta or Halant. In order to avoid conjunct formation in cases where there is an explicit hasanta before the succeeding consonant, the ZWNJ is used:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon).

The use of ZWJ and ZWNJ has been ruled out from the root zone by the [Procedure]. Since they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP.

The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
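The sequences described in this section can be written out explicitly. In the Python sketch below (all names are hypothetical), removing the ZWJ from the ya-phalā-after-ra spelling leaves exactly the repha-on-ya sequence, which illustrates why, once joiners are excluded from the root zone, a label can only carry the joiner-free form.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA

repha_ya = "\u09B0" + HASANTA + "\u09AF"           # র্য: repha on ya, as in কার্য
ra_ya_phala = "\u09B0" + ZWJ + HASANTA + "\u09AF"  # র‍্য: ya-phala after ra, as in র‍্যাপার
explicit = "\u0995" + HASANTA + ZWNJ + "\u0995"    # ক্‌ক: visible Hasanta, no conjunct


def has_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, neither of which the root zone allows."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label: str) -> str:
    return label.replace(ZWJ, "").replace(ZWNJ, "")


print(has_joiner(ra_ya_phala), has_joiner(repha_ya))  # True False
print(strip_joiners(ra_ya_phala) == repha_ya)         # True
```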

3.3.9 Use of Ya-phalā
There are two ya-phalā sequences in Bangla in which Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

For rendering a ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowel. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as the ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
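The same ্যা ending (Hasanta + YA + vowel sign AA) attaches to a consonant in ব্যাট but to a full vowel in অ্যাসিড. The Python sketch below, with a hypothetical helper name, reports what precedes each ্যা in a label, which makes the deviant vowel + Hasanta case easy to spot; classifying U+0995 to U+09B9 as consonants is a simplifying assumption.

```python
YA_PHALA_AA = "\u09CD\u09AF\u09BE"  # HASANTA + YA + VOWEL SIGN AA (্যা)
INDEPENDENT_VOWELS = {chr(cp) for cp in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989,
                                        0x098A, 0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}


def ya_phala_hosts(label: str) -> list[tuple[str, str]]:
    """For each ্যা, return (preceding character, 'vowel' / 'consonant' / 'other')."""
    hosts = []
    start = label.find(YA_PHALA_AA)
    while start != -1:
        if start > 0:
            host = label[start - 1]
            if host in INDEPENDENT_VOWELS:
                kind = "vowel"
            elif "\u0995" <= host <= "\u09B9":
                kind = "consonant"
            else:
                kind = "other"  # e.g. a ZWJ, as in র‍্যা
            hosts.append((host, kind))
        start = label.find(YA_PHALA_AA, start + 1)
    return hosts


print(ya_phala_hosts("\u09AC\u09CD\u09AF\u09BE\u099F"))              # ব্যাট 'bat'
print(ya_phala_hosts("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড 'acid'
```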

3.3.10 Formation of Ra-phalā and Repha Sequences
This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA, Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the two RAs is only distinguishable if one looks into their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which are to follow. Moreover, it is noteworthy that the REPHA can also occur with KHANDA TA. The conditions in this context of KHANDA TA are such that C should be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
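A blocking-variant check needs, for a given label, the label obtained by swapping the two RAs where they are visually indistinguishable. The Python sketch below computes such a variant; it assumes, for illustration only, that both contexts discussed above (RA + Hasanta for repha and Hasanta + RA for ra-phalā) count as variant contexts, whereas the normative conditions are those of Variant Set 4 in the XML LGR. Restricting it to the repha context alone would mean dropping the `before` test.

```python
BENGALI_RA = "\u09B0"   # র BENGALI LETTER RA
ASSAMESE_RA = "\u09F0"  # ৰ BENGALI LETTER RA WITH MIDDLE DIAGONAL
HASANTA = "\u09CD"      # ্ BENGALI SIGN VIRAMA
SWAP = {BENGALI_RA: ASSAMESE_RA, ASSAMESE_RA: BENGALI_RA}


def ra_variant(label: str) -> str:
    """Swap র and ৰ wherever the RA touches a Hasanta (repha or ra-phala context).

    Assumption: both RA + Hasanta and Hasanta + RA are treated as variant contexts.
    """
    out = []
    for i, ch in enumerate(label):
        before = i > 0 and label[i - 1] == HASANTA
        after = i + 1 < len(label) and label[i + 1] == HASANTA
        out.append(SWAP[ch] if ch in SWAP and (before or after) else ch)
    return "".join(out)


arka = "\u0985\u09B0\u09CD\u0995"  # অর্ক: repha on KA
print(ra_variant(arka) == "\u0985\u09F0\u09CD\u0995")  # True
```

A registry applying a blocking policy would then refuse the variant label once the original is delegated; standalone RAs (not adjacent to a Hasanta) are left untouched.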

4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in linguistics (especially in NLP and computational linguistics), literature, language, history and epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, so as to assign a separate LGR to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of the protocol hierarchies, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR being the repertoire of characters which are going to be used for the creation of root-zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters.
Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though it is allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic -kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
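The three successive filters of Section 4.2.1 can be pictured as a small pipeline over the Bengali block. The Python sketch below is only an approximation: filter 2 is NOT the RFC 5892 derivation of IDNA2008 properties but a rough stand-in (letters and combining marks that are stable under NFC), and filter 3 removes only the code points named in Sections 4.2.1.1 to 4.2.1.3, whereas the real MSR excludes more. All function names are hypothetical.

```python
import unicodedata

# Filter 3 (partial): only the exclusions named in Section 4.2.1.
NAMED_EXCLUSIONS = {"\u09BD", "\u098C", "\u09E0", "\u09E1", "\u09E2", "\u09E3"}


def bengali_block() -> list[str]:
    """Filter 1: assigned code points of the Bengali block (U+0980..U+09FF)."""
    return [chr(cp) for cp in range(0x0980, 0x0A00)
            if unicodedata.category(chr(cp)) != "Cn"]


def idna_like(chars: list[str]) -> list[str]:
    """Filter 2 (rough approximation of IDNA2008, not RFC 5892 itself):
    keep letters and combining marks that are unchanged by NFC."""
    return [ch for ch in chars
            if unicodedata.category(ch) in {"Lo", "Mn", "Mc"}
            and unicodedata.normalize("NFC", ch) == ch]


def candidate_repertoire() -> list[str]:
    return [ch for ch in idna_like(bengali_block()) if ch not in NAMED_EXCLUSIONS]


repertoire = candidate_repertoire()
print(len(repertoire), "candidate code points after the three filters")
```

The NFC test in filter 2 is what removes the atomic YYA, RRA and RHA discussed in Section 3.3.6, while digits, currency signs and fractions drop out at the category test.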

5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name, as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above. For each of the code points, language references have been given in the last column, titled ‘References’, of Table 8, titled the “Code Point Repertoire”. For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1–4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variations in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:


Figure 1 – Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention: all characters that are included in the [MSR] have a yellow background; characters that are PVALID in IDNA2008 but excluded from the [MSR] have a pinkish background; characters that are not PVALID in IDNA2008 or are ineligible for the root zone (digits, hyphen) have a white background.

Given the Bangla Unicode block as in Figure 1, of the code points that are included in the MSR, the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125] U+09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125] U+09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125] U+09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama | Bangla (1), Assamese (2), Manipuri (2) | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122][125]

Table 8: Bangla Code-Point Repertoire
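To make Table 8 concrete, the following minimal Python sketch encodes its 61 single code points and accepts the nukta (U+09BC) only directly after ড, ঢ or য, which is how rows 28, 30 and 43 are expressed. It assumes labels are NFC-normalized first with Python's standard `unicodedata` module; the names `REPERTOIRE` and `in_repertoire` are illustrative only and are not part of the LGR XML file.

```python
import unicodedata

def _code_points(*items):
    """Expand single code points and inclusive (first, last) ranges into a set of characters."""
    chars = set()
    for item in items:
        first, last = item if isinstance(item, tuple) else (item, item)
        chars.update(chr(cp) for cp in range(first, last + 1))
    return chars

# Single code points of Table 8 (rows 28, 30 and 43 are nukta sequences, handled below).
REPERTOIRE = _code_points(
    (0x0981, 0x0983),                                   # candrabindu, anusvara, visarga
    (0x0985, 0x098B), 0x098F, 0x0990, 0x0993, 0x0994,   # independent vowels
    (0x0995, 0x09A8), (0x09AA, 0x09B0), 0x09B2,         # consonants
    (0x09B6, 0x09B9),
    (0x09BE, 0x09C4), 0x09C7, 0x09C8, 0x09CB, 0x09CC,   # vowel signs (kāra)
    0x09CD, 0x09CE, 0x09F0, 0x09F1,                     # virama, khanda ta, RA/WA variants
)
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য

def in_repertoire(label: str) -> bool:
    """Return True if every character of the NFC form of `label` is allowed by Table 8."""
    nfc = unicodedata.normalize("NFC", label)
    for i, ch in enumerate(nfc):
        if ch == NUKTA:
            # The nukta is never used alone (Section 5.3).
            if i == 0 or nfc[i - 1] not in NUKTA_BASES:
                return False
        elif ch not in REPERTOIRE:
            return False
    return bool(nfc)
```

For example, `in_repertoire("\u09DC")` returns True because NFC turns U+09DC into U+09A1 U+09BC, while U+098C (excluded in Section 5.2) is rejected.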

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences, which enable the conditional inclusion in the repertoire of BENGALI LETTER A and BENGALI LETTER E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, in order to represent the æ sound as in English 'bat', 'cat', etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It is used only in sequences, forming the normalized forms of the three special characters ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points
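Why U+09DC, U+09DD and U+09DF are unavailable (rows 28, 30 and 43 of Table 8) can be checked with Python's standard `unicodedata` module: these three characters are canonical composition exclusions, so NFC, the form IDNA2008 labels must be in, always yields the base consonant followed by U+09BC. The helper name `nfc_code_points` is illustrative.

```python
import unicodedata
from typing import List

def nfc_code_points(text: str) -> List[str]:
    """Return the NFC form of `text` as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

# Each precomposed nukta letter decomposes, and NFC does not recompose it.
for precomposed in ("\u09DC", "\u09DD", "\u09DF"):
    print(f"U+{ord(precomposed):04X} ->", " ".join(nfc_code_points(precomposed)))
```

The decomposed sequence is also stable: normalizing U+09A1 U+09BC to NFC leaves it unchanged, which is why the repertoire lists the two-code-point forms.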

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:
C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta / Virama
Z → Khanda Ta

The ABNF will use the following operators:

Sr. No. | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism


5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla is not necessarily restricted to one. Going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. However, it is valid and possible to have a candrabindu followed by an anusvara on the one hand and a candrabindu followed by a visarga on the other, as in হ্যাঁংচা 'hæṅchā' and হ্যাঁঃ 'hæṅ' respectively; that is, a Visarga or Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, 'ya'), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>, and has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa', ā), it is used for transcribing [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel
Example: V অ (अ)

5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:
VB অং (अं)
VD অঁ (अँ)
VX অঃ (अः)
VDB অঁং (अँं)
VDX অঁঃ (अँः)
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক (क)

5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM কি (कि)
CB কং (कं)
CD কঁ (कँ)
CX কঃ (कः)
CH ক্ (क्) (pure consonant)
CDB কঁং (कँं)
CDX কঁঃ (कँः)

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB কীং (कीं)
CMD কাঁ (काँ)
CMX বীঃ (वीः)
CMDB কাঁং (काँं)
CMDX কাঁঃ (काँः)

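5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):
*3(CH)C

Examples:
CHC ন্ত → ন + ্ + ত (न्त → न + ् + त)
CHCHC ন্ত্র → ন + ্ + ত + ্ + র (न्त्र → न + ् + त + ् + र)
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য (न्त्र्य → न + ् + त + ् + र + ् + य)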

5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:
CHCM ক্কী → ক + ্ + ক + ী (क्की → क + ् + क + ी)
CHCB ক্কং → ক + ্ + ক + ং (क्कं → क + ् + क + ं)
CHCD ক্কঁ → ক + ্ + ক + ঁ (क्कँ → क + ् + क + ँ)
CHCX ক্কঃ → ক + ্ + ক + ঃ (क्कः → क + ् + क + ः)
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং (क्कँं → क + ् + क + ँ + ं)
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ (क्कँः → क + ् + क + ँ + ः)

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:
CHCMB ক্কীং → ক + ্ + ক + ী + ং (क्कीं → क + ् + क + ी + ं)
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ (क्काँ → क + ् + क + ा + ँ)
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ (क्कीः → क + ् + क + ी + ः)
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং (क्काँं → क + ् + क + ा + ँ + ं)
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ (क्काँः → क + ् + क + ा + ँ + ः)

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):
[CH]Z
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

¹⁰ Refer to Rule P in Section 7, Table 16.

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography wherein Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case refers to the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.
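The syllable patterns of Sections 5.4.2 to 5.4.4 can be approximated with a regular expression. The Python sketch below is one possible translation of the ABNF, under the assumption that alternation maps to `|`, optional elements to `?` and the repetition bound on (CH) to `{0,3}`; it checks a single syllable only and leaves label-level context rules out. All names are illustrative and not part of the LGR.

```python
import re

# Character classes (Table 16 symbols), written as escapes to avoid font ambiguity.
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"                          # vowels
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"                                    # nukta forms (CN)
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")   # consonants
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                           # kāra (mātrā)
H = "\u09CD"                                                            # hasanta / virama
SIGNS = "(?:\u0981[\u0982\u0983]?|[\u0982\u0983])"                      # B | D | X | DB | DX
S = "[\u0985\u098F]\u09CD\u09AF\u09BE"                                  # অ্যা, এ্যা (Table 9)

VOWEL_SEQ = f"(?:{S}|{V}){SIGNS}?"
CONSONANT_SEQ = f"(?:{C}{H}){{0,3}}{C}(?:{H}|{M}?{SIGNS}?)"
KHANDA_TA_SEQ = f"(?:[\u09B0\u09F0]{H})?\u09CE"
SYLLABLE = re.compile(f"(?:{VOWEL_SEQ}|{CONSONANT_SEQ}|{KHANDA_TA_SEQ})")

def is_syllable(text: str) -> bool:
    """True if `text` is exactly one syllable of the sketched grammar."""
    return SYLLABLE.fullmatch(text) is not None
```

Because each call checks one syllable, a string such as ক্অ (CH followed by V) would pass if it were simply split into ক্ and অ; ruling it out requires the context rules of Section 7.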


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, identical code points are defined as variants. Cases that are confusable but not identical are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. These can cause confusion even to a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more forms look alike but are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one go, and obtain the same visual result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two renderings of ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered as a case of variant, because a kāra followed by a kāra is not allowed.
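The ko example can be checked with Python's standard `unicodedata` module. In Unicode, U+09CB canonically decomposes to U+09C7 U+09BE (and U+09CC to U+09C7 U+09D7), so NFC maps the two-key input to the single code point; this is a further reason the two-key form does not yield a separate variant. The helper `same_after_nfc` is illustrative.

```python
import unicodedata

def same_after_nfc(a: str, b: str) -> bool:
    """True if two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

one_key = "\u0995\u09CB"          # ক + ো (o-kāra, U+09CB)
two_keys = "\u0995\u09C7\u09BE"   # ক + ে (e-kāra) + া (ā-kāra)
print(same_after_nfc(one_key, two_keys))
```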

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears in a conjunct after স (sa, U+09B8) or ন (na, U+09A8) and Hasanta:
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct could occur in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly by a user who thinks it is a হ (ha, U+09B9), increasing the risk of errors in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in গ্রন্থ grantha)
Fonts which represent the traditional Bangla writing system may tend to create this problem. Therefore these may be taken as cases of variants in Bangla.
CASE II: Another interesting example of a variant is encountered in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences (mainly because Assamese and Bangla share the same UNICODE block, and 'B_RA' as well as 'A_RA' refer to the same phonetic element). Here the final ligatures look the same and will be as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with Repha look identical; the addition of Repha does not make any difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could be confusable for end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. However, this will create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. The blocking applies only at the label level; if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, it is still possible.
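The effect of declaring র (U+09B0) and ৰ (U+09F0) as mutual variants can be illustrated by enumerating the variant labels of a given label. The Python sketch below is a simplification that assumes a single symmetric variant pair and ignores the variant types (for example blocked or allocatable) that the LGR XML would assign; `VARIANTS` and `variant_labels` are hypothetical names.

```python
from itertools import product
from typing import Set

# Hypothetical variant map: each code point lists itself and its variants.
VARIANTS = {
    "\u09B0": ("\u09B0", "\u09F0"),  # র -> র, ৰ
    "\u09F0": ("\u09F0", "\u09B0"),  # ৰ -> ৰ, র
}

def variant_labels(label: str) -> Set[str]:
    """All labels reachable by substituting variant code points, excluding the label itself."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}

print(sorted(variant_labels("\u09B0\u09B0\u09B0")))
```

For ররর this yields 2³ − 1 = 7 labels, including ৰৰৰ and mixed forms such as রৰর, while ৰৰ, having a different length, is not among them and stays available.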

6.2 Cross-Script Variants
A brief cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.2) for reference.

¹¹ Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
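Tables 14 and 15 list code-point pairs. Assuming that a label cannot mix scripts, a whole-label cross-script variant exists only when every code point of the Bangla label has a listed counterpart. A minimal Python sketch restricted to the pairs in these two tables (the dictionary and function names are hypothetical):

```python
from typing import Optional

# Hypothetical cross-script map built only from Tables 14 and 15.
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম -> ਸ, ি -> ਿ
}

def cross_script_variant(label: str, script: str) -> Optional[str]:
    """Map a Bangla label to `script` if every code point has a listed counterpart."""
    table = CROSS_SCRIPT[script]
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None
```

For example, মি maps to मि in Devanāgarī and to ਸਿ in Gurmukhī, whereas মা has no cross-script variant because া is not listed.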

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided for the code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā sequence V1 H C1 M1, where V1 is either U+0985 (অ BENGALI LETTER A) or U+098F (এ BENGALI LETTER E), H is U+09CD (্ BENGALI SIGN VIRAMA), C1 is U+09AF (য BENGALI LETTER YA) and M1 is U+09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either U+09B0 (র BENGALI LETTER RA) or U+09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is U+09CD (্ BENGALI SIGN VIRAMA).

Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is perhaps also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that could theoretically conjoin from two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows.

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
Example: ক্
3. M must be preceded by C.
Example: কা

4. D must be preceded by V, C or M.
Examples: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by V, C, M or D.
Examples: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by V, C, M or D.
Examples: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P.
Examples: ইৎ, কৎ, কাৎ, পর্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures, because any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋkʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.
This is a unique situation, necessitated by the lack of a hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the root zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
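Rules 1 to 9 can be read as "allowed predecessor" constraints over a tokenized label. The Python sketch below is one possible reading, not the LGR XML itself: it tokenizes an NFC label into the Table 16 classes, treats S as a single token that behaves like M for whatever follows it (as the note to rule 7 suggests), and accepts Z after H only when that H belongs to P. All names are hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple

def _span(first: int, last: int) -> str:
    """Characters from code point `first` to `last`, inclusive."""
    return "".join(chr(cp) for cp in range(first, last + 1))

# Table 16 classes; CN (ড়, ঢ়, য় in NFC) is folded into C by the tokenizer.
CLASSES = {
    "V": _span(0x0985, 0x098B) + "\u098F\u0990\u0993\u0994",
    "C": (_span(0x0995, 0x09A8) + _span(0x09AA, 0x09B0) + "\u09B2"
          + _span(0x09B6, 0x09B9) + "\u09F0\u09F1"),
    "M": _span(0x09BE, 0x09C4) + "\u09C7\u09C8\u09CB\u09CC",
    "B": "\u0982", "D": "\u0981", "X": "\u0983", "H": "\u09CD", "Z": "\u09CE",
}
CLASS_OF = {ch: cls for cls, chars in CLASSES.items() for ch in chars}
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # S1, S2
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}   # ড, ঢ, য
RA = {"\u09B0", "\u09F0"}                      # র, ৰ (the C2 of P)

# Rules 2-7: which classes may immediately precede a given class.
ALLOWED_PREV = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},   # P is checked separately
}

def tokenize(label: str) -> Optional[List[Tuple[str, str]]]:
    """Split the NFC form of `label` into (class, text) tokens; None on a disallowed character."""
    text = unicodedata.normalize("NFC", label)
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        if text[i:i + 4] in S_SEQUENCES:
            tokens.append(("S", text[i:i + 4]))
            i += 4
            continue
        ch = text[i]
        if ch == NUKTA and tokens and tokens[-1][0] == "C" and tokens[-1][1] in NUKTA_BASES:
            tokens[-1] = ("C", tokens[-1][1] + ch)     # rule 1: CN counts as C
        elif ch in CLASS_OF:
            tokens.append((CLASS_OF[ch], ch))
        else:
            return None
        i += 1
    return tokens

def is_valid_label(label: str) -> bool:
    """Apply WLE rules 1-9 as predecessor checks (a sketch, not the LGR XML)."""
    tokens = tokenize(label)
    if not tokens:
        return False
    for k, (cls, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k else None
        if prev == "S":
            prev = "M"                          # S ends with M (note to rule 7)
        if cls in ("V", "S"):
            if prev == "H":                     # rules 8 and 9
                return False
        elif cls == "Z" and prev == "H":
            if tokens[k - 2][1] not in RA:      # only P = RA + H may precede Z
                return False
        elif cls in ALLOWED_PREV and prev not in ALLOWED_PREV[cls]:
            return False
    return True
```

Under this reading, বাংলা, ভর্ৎসনা and অ্যাসিড are accepted, while ক্অ (V after H), কংং, ংক and কীী are rejected, matching the invalid combinations described in this section.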

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word.
Examples:
্ক (्क)
াক (ाक)
ংক (ंक)
ঁক (ँक)
ঃক (ःक)
As can be seen, such a combination automatically results in a "golu", or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian-language syllable and is quasi-automatically applied wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.
Examples:
অ্ (अ्)
অং্ (अं्)
কঁ্ (कँ्)
কঃ্ (कः्)
কি্ (कि्)

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid:
Examples:
কংং (कंं)
কঁঁ (कँँ)
কঃঃ (कःः)
কাঁঁ (काँँ)
কীঃঃ (कीःः)
অংং (अंं)
অঁঁ (अँँ)
অঃঃ (अःः)

4. The number of M permitted after a consonant is restricted to one.
Example: কীী (कीी)
5. M is not permitted after V.
Examples: ইা, ঈৌ (ईा, ईौ)
6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Examples:
কংঃ (कंः)
কঃং (कःं)

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon-Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Posts & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Haraper Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and language planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1123-45.

[136] DashNiladriShekharandBBChaudhuri1998BanglaScriptAStructuralStudyLinguisticsToday121-28Alsoavailableathttpswwwacademiaedu9967428Bangla_Script_A_Structural_Study

[137] DaniAhmedHasan(1957)lsquoSrīhatta-NāgarīLipirUtpattioBikāśrsquoBanglaAcademyPatrika(Dhaka)Vol12(Bhadra-Agrahayan1364BangabdaNumber)pg1

[138] WikipediaSylhetiNagari httpsenwikipediaorgwikiSylheti_Nagariaccessedon1952018

[139] FuruiRyosuke(2015)lsquoVariegatedAdaptationsStateFormationinBengalfromtheFifthtoSeventhCenturyrsquoinBhairabiPrasadSahuampHermannKulkeedsInterrogatingPoliticalSystemsIntegrativeProcessesandStatesinPre-ModernIndiaChapter9Pp255-73NewDelhiManohar

[140] FergusonCharesAandMunierChowdhury(1960)lsquoPhonesofBengalirsquoLanguageVol36No1pp22-59

[141] ShahidullahMuhammad(2007)BuddhistMysticSongsDhakaMowlaBrothers

[142] RayPunyaSloka(1966)BengaliLanguageHandbookWashington

[143] HaiMuhammadAbdul(1960)AphoneticandphonologicalstudyofnasalsandnasalizationinBengaliDhakaUniversityofDhaka

[144] UnicodeConsortiumProposalSummaryFormtoAccompanySubmissionsforAdditionstotheRepertoireofISOIEC10646UNICODEhttpswwwunicodeorgL2L200202387r-syloti-formpdfaccessedonMay212018

[145] WikipediaOlChiki(Unicodeblock)httpsenwikipediaorgwikiOl_Chiki_(Unicode_block)accessedonMay212018

[146]BanglaScripthttpwwwbangladesh2000combdbangla_scripthtmlaccessedonMay212018

[147] BhattacharyaAshutoshed(1942)GopichandrerGanCalcuttaCalcuttaUniversity

[149] DasSisirKumar(1975)SahibsandMunshisAnAccountoftheCollegeofFortWilliamCalcutta

[150] IslamRafiqulPabitraSarkarMahbubulHaqampRajibChakraborty(eds)(2014)BanglaAcademyPromitoBanglaByabaharikByakaran(AFunctionalGrammarofStandardBangla)DhakaBanglaAcademy

[151] SarkarPabitra[2013]lsquoBanglaSpellingReformtheLongandShortofItrsquoBanglaJournal19215-232

[152] BanglaAcademy(2012)BanglaAcademyPromitoBanglaBananerNiyam(StandardBanglaSpellingasadoptedbyBanglaAcademy)DhakaBanglaAcademy


[153] Sarkar, Pabitra & Rajib Chakraborty (2018). "What has happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium (2018). The Unicode® Standard Version 11.0: Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, though a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichu Chor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH (consonant + Hasanta) can come with Khanda Ta only in the case where C is ra (র) (U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM (vowel + Hasanta + consonant + vowel sign) will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid association)
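Rules 1 and 2 are simple positional checks on the code point sequence, so they can be sketched in a few lines. The sketch below is a minimal illustration, assuming labels are already in NFC and considering only the Bangla RA (U+09B0), in line with footnote 13; the function names are hypothetical and are not taken from the LGR XML file.

```python
# Minimal sketch of Appendix I, rules 1 and 2, for NFC-normalized labels.
# Function names are hypothetical; they are not part of the LGR XML file.
KHANDA_TA = "\u09CE"  # ৎ
HASANTA = "\u09CD"    # ্ (virama)
RA = "\u09B0"         # র (Bangla RA only; Assamese is out of scope here)
NO_LABEL_START = {KHANDA_TA, "\u099E", "\u0999"}  # ৎ, ঞ, ঙ


def violates_start_rule(label: str) -> bool:
    """Rule 1: ৎ, ঞ and ঙ may not begin a label."""
    return bool(label) and label[0] in NO_LABEL_START


def violates_khanda_ta_rule(label: str) -> bool:
    """Rule 2: consonant + Hasanta may precede ৎ only if the consonant is র."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] != RA:
                return True
    return False


def check_label(label: str) -> list:
    problems = []
    if violates_start_rule(label):
        problems.append("rule 1: disallowed label-initial character")
    if violates_khanda_ta_rule(label):
        problems.append("rule 2: Hasanta before ৎ not preceded by র")
    return problems


print(check_label("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> []
print(check_label("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))        # ৎসেরিং -> rule 1
```

Rule 3 (VHCM) is left out here because it concerns a vowel followed by Hasanta rather than a positional restriction on consonants.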

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names of Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points
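Because the confusable pairs above are not defined as variants, they cannot block a label. At most they could feed an informational check. The sketch below collects the pairs marked "Confusable" in Tables 18, 19 and 21 and reports where they occur in a label. The dictionary and the helper `confusable_positions` are hypothetical illustrations and are not part of the LGR.

```python
# Cross-script pairs marked "Confusable" by the NBGP in Tables 18, 19 and 21.
# "Distinguishable" pairs are deliberately omitted. These pairs are not
# variants, so this hypothetical helper only flags them for human review.
CONFUSABLES = {
    "\u0983": ["\u0903"],            # ঃ ~ Devanagari visarga
    "\u0993": ["\u0909", "\u0B13"],  # ও ~ Devanagari U, Oriya O
    "\u0998": ["\u0918", "\u0A2C"],  # ঘ ~ Devanagari GHA, Gurmukhi BA
    "\u0981": ["\u0945", "\u0A71"],  # candrabindu ~ U+0945, Gurmukhi addak
}


def confusable_positions(label: str) -> list:
    """Return (index, code point, lookalike code points) per confusable char."""
    return [
        (i, f"U+{ord(ch):04X}", [f"U+{ord(c):04X}" for c in CONFUSABLES[ch]])
        for i, ch in enumerate(label)
        if ch in CONFUSABLES
    ]


print(confusable_positions("\u0998\u09B0"))  # ঘর
# [(0, 'U+0998', ['U+0918', 'U+0A2C'])]
```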


11 Appendix-II: Bengali consonants and their allographs

Consonants   Phonetic Value   Allographs: Clusters / Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ ং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (in some contexts ঙ is replaced by ং, as in কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, however, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + ্ + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক + য), স্য (স + য), র‍্য (র + য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā + ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:
i. রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র + ধ + ব) (earlier র্দ্ধ্ব = র + দ + ধ + ব, a four-term cluster), etc. (placed over the following consonant)
ii. র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h (word-finally); word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be recognized.

Opaque: where neither of the two could be (easily) recognized, e.g. ক্ষ ks (ক k + ষ s), জ্ঞ jn (জ j + ঞ n), ঙ্গ ng (ঙ n + গ g), হ্ম hm (হ h + ম m).

Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is not. In the case of three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র str (স s + ত t + র r), etc.

There were, in fact, two types of proposals. One concerned the shape of the letters: those of consonant + vowel (CV) combinations and of conjuncts, i.e. consonant + consonant combinations. There were further complex shapes, i.e. those of consonant + consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some decisions in this area were necessary because a few of the CC(C) symbols represented complexities that made learning them difficult for children. The other dealt with the spellings of words only, without any reference to the shapes of the letters in which they were written. The basic objective here was 'one word, one spelling', to the greatest extent possible [151].

Below we place a statement of the most salient changes that affect consonant + vowel combinations [153]:

a. The variants of the short u (উ-কার, hrasva u-kāra) vowel sign have been brought down to one. The ligatured forms of গু (gu), রু (ru), শু (śu) and হু (hu), and therefore those of clusters with a short u sign such as ন্তু (ntu: ন + ্ + ত + উ) and স্তু (stu: স + ্ + ত + উ), are now written with the single, detached u-kāra.

b. The variants of long u (দীর্ঘ ঊ-কার, dīrgha u-kāra) have also been reduced, e.g. রূ (rū), ভ্রূ (bhrū: ভ bh + ্ + র r + ঊ ū), দ্রূ (drū: দ d + ্ + র r + ঊ ū) and শ্রূ (śrū: শ ś + ্ + র r + ঊ ū).

c. The variants of ঋ-কার (ṛ-kāra, the secondary symbol of ṛ) have been brought down to one, e.g. হৃ (hṛ).

Regarding consonant + consonant + (consonant) … + (vowel) clusters, the Paschimbanga Bangla Akademi proposed transparent or semi-transparent shapes for clusters, to the extent admissible in the Bangla writing system [153]. The traditional and the reformed shapes of each cluster are encoded by the same code point sequence and differ only in the font. Some examples will clarify the proposal: ব্ধ bdh (ব b + ধ dh), দ্ধ ddh (দ d + ধ dh), ন্থ nth (ন n + থ th), ঞ্চ ñc (ঞ ñ + চ c), ঞ্ছ ñch (ঞ ñ + ছ ch), ঞ্জ ñj (ঞ ñ + জ j), ক্ত kt (ক k + ত t), ক্র kr (ক k + র r), গ্ধ gdh (গ g + ধ dh), ঙ্ক ṅk (ঙ ṅ + ক k), ঙ্গ ṅg (ঙ ṅ + গ g), ষ্ণ ṣṇ (ষ ṣ + ণ ṇ), ন্ধ্র ndhr (ন n + ধ dh + র r), ণ্ড্র ṇḍr (ণ ṇ + ড ḍ + র r) and ক্ত্র ktr (ক k + ত t + র r).

3.3.1 The Consonants

As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'Varga' (pronounced as 'Barga' in Bangla) or groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each Varga, which corresponds to stops at a certain place of articulation, contains a series of five consonants classified as per their phonetic qualities (i.e. manner of articulation), beginning from unvoiced and unaspirated, through voiced and aspirated (in the fourth column), and finally ending with a homorganic or corresponding nasal [107]. Consider the following table:

'Varga' or Sets | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | ক 'K' U+0995 | খ 'KH' U+0996 | গ 'G' U+0997 | ঘ 'GH' U+0998 | ঙ 'NG' U+0999
Palatal | চ 'C' U+099A | ছ 'CH' U+099B | জ 'J' U+099C | ঝ 'JH' U+099D | ঞ 'NY' U+099E
Retroflex | ট 'TT' U+099F | ঠ 'TTH' U+09A0 | ড 'DD' U+09A1 | ঢ 'DDH' U+09A2 | ণ 'NN' U+09A3
Dental | ত 'T' U+09A4 | থ 'TH' U+09A5 | দ 'D' U+09A6 | ধ 'DH' U+09A7 | ন 'N' U+09A8
Bilabial | প 'P' U+09AA | ফ 'PH' U+09AB | ব 'B' U+09AC | ভ 'BH' U+09AD | ম 'M' U+09AE

Table 5 – Varga classification of Bangla consonants

(Falling into a pattern of five sets of Unvoiced Unaspirated, Unvoiced Aspirated, Voiced Unaspirated, Voiced Aspirated and Nasals, called the five 'Varga')
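Table 5 maps neatly onto the Unicode layout: each varga occupies five consecutive code points in the same manner order as the table's columns. The sketch below encodes the table from the five starting code points and classifies a consonant by place and manner. The `classify` helper is a hypothetical illustration, not part of the LGR.

```python
import unicodedata

# Table 5 as data: each varga occupies five consecutive code points,
# ordered by manner of articulation. (U+09A9 between ন and প is unassigned.)
VARGA_START = {
    "Velar": 0x0995,      # ক
    "Palatal": 0x099A,    # চ
    "Retroflex": 0x099F,  # ট
    "Dental": 0x09A4,     # ত
    "Bilabial": 0x09AA,   # প
}
MANNER = ("unvoiced unaspirated", "unvoiced aspirated",
          "voiced unaspirated", "voiced aspirated", "nasal")


def classify(ch: str):
    """Return (place, manner) for a varga consonant, or None otherwise."""
    cp = ord(ch)
    for place, start in VARGA_START.items():
        if start <= cp < start + 5:
            return place, MANNER[cp - start]
    return None  # e.g. a non-varga consonant from Table 6


for place, start in VARGA_START.items():
    nasal = chr(start + 4)
    print(f"{place:<10} {nasal} {unicodedata.name(nasal)}")
```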

Non-Varga
য 'Y' U+09AF
য় 'YY' U+09DF
র 'R' U+09B0
ড় 'RR' U+09DC
ঢ় 'RH' U+09DD
ল 'L' U+09B2
শ 'SH' U+09B6
ষ 'SS' U+09B7
স 'S' U+09B8
হ 'H' U+09B9

Table 6 – Non-Varga consonants (not falling into any of the five categories)

3.3.2 The Implicit Vowel Killer: Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)

As stated earlier, all consonants are pronounced in isolation with an implicit vowel (the central-back ɔ in Bangla, as the neutral vowel) assumed to be associated with them [121]. The 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts), or 'Virāma'⁷ (= 'Dari' in Bangla) as preferred in UNICODE (cf. Unicode 3.0 and above), is used in this report to denote the character that marks the absence of this inherent vowel. It may be noted that the term virama has been adopted in UNICODE in a sense that is different from the traditional grammatical definition, and hence it requires some explanation here. Considering the importance of the document, this note should be a part of this LGR document, so that anybody referring to it is able to find the proper grammatical explanation of the term. Because a special sign is needed whenever this implicit vowel is stripped off, the symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one could, in common parlance, "kill" its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants; in rare cases, this process can join up to five consonants. However, the maximum number of consonants joining to form one akṣara⁸ can only be bounded empirically. This is an observation based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, the possibility that one may want a generic Top Level Domain [gTLD] with more than the observed maximum cannot be ruled out. This can be the case when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
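Since conjuncts are stored as consonant + Hasanta + consonant chains, the number of consonants in an akṣara can be measured directly from the code points. The sketch below groups such chains and reports the longest one. This is the kind of empirical count the paragraph refers to, and it is not a limit the LGR enforces. The consonant range is a simplifying assumption, and the helper names are hypothetical.

```python
HASANTA = "\u09CD"


def is_consonant(ch: str) -> bool:
    # Simplification: treats U+0995..U+09B9 as consonants and ignores nukta
    # sequences, ৎ and the Assamese letters ৰ/ৱ.
    return "\u0995" <= ch <= "\u09B9"


def clusters(word: str) -> list:
    """Group consonants joined as C + Hasanta + C into clusters.

    Any other character between them (a vowel sign, or ZWNJ after an
    explicit Hasanta) closes the current cluster.
    """
    result, current = [], []
    i = 0
    while i < len(word):
        ch = word[i]
        if is_consonant(ch):
            current.append(ch)
            joined = (i + 2 < len(word) and word[i + 1] == HASANTA
                      and is_consonant(word[i + 2]))
            if joined:
                i += 2  # jump to the next member of the conjunct
                continue
            result.append(current)
            current = []
        i += 1
    return result


def max_cluster_length(word: str) -> int:
    return max((len(c) for c in clusters(word)), default=0)


print(clusters("\u09A8\u09CD\u09A7\u09CD\u09B0"))  # ন্ধ্র -> one 3-member cluster
```

In প্রাক্‌কথন, the ZWNJ after the explicit Hasanta splits ক্‌ক into two single-consonant groups, so the longest cluster in that word is প্র, with two members.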

3.3.3 Vowels

Separate symbols exist for all 'Swara' or vowels in Bangla, which are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign called 'kār' in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Since the consonant has this built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

7 Virāma as used here is also a misnomer according to the Indian grammatical traditions. Nowhere is the mere absence of a vowel marked as virama; Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla (1961). A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).

8 This term needs to be disambiguated. Akṣara also means 'syllable' in Indian grammatical traditions.

Vowel | Corresponding vowel sign (kāra/Mātrā)
অ 'A' U+0985 | (none)
আ 'AA' U+0986 | া U+09BE
ই 'I' U+0987 | ি U+09BF
ঈ 'II' U+0988 | ী U+09C0
উ 'U' U+0989 | ু U+09C1
ঊ 'UU' U+098A | ূ U+09C2
ঋ Vocalic 'R' U+098B | ৃ U+09C3
ৠ Vocalic 'RR' U+09E0 | ৄ U+09C4
ঌ Vocalic 'L' U+098C | ৢ U+09E2
ৡ Vocalic 'LL' U+09E1 | ৣ U+09E3
এ 'E' U+098F | ে U+09C7
ঐ 'AI' U+0990 | ৈ U+09C8
ও 'O' U+0993 | ো U+09CB
ঔ 'AU' U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ 'A' U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ 'A' U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ 'A' U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7 – Bangla vowels with corresponding kāras

9 The term 'Mātrā' in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, typically available in Hindi and Bangla but missing in Gujarati.
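Table 7 can be read as a lookup from each independent vowel to its kāra, with অ having no entry because a bare consonant already carries the inherent vowel. The sketch below builds a consonant + vowel syllable in stored (logical) order. It also shows that the two-part o-kāra, typed as ে followed by া, is recomposed by NFC into the single code point ো (U+09CB), since U+09CB canonically decomposes to U+09C7 U+09BE. The `syllable` helper is a hypothetical illustration.

```python
import unicodedata

# Table 7 as data: independent vowel -> dependent vowel sign (kāra).
# অ has no kāra: a bare consonant already carries the inherent vowel.
A = "\u0985"
KARA = {
    "\u0986": "\u09BE", "\u0987": "\u09BF", "\u0988": "\u09C0",  # আ ই ঈ
    "\u0989": "\u09C1", "\u098A": "\u09C2", "\u098B": "\u09C3",  # উ ঊ ঋ
    "\u09E0": "\u09C4", "\u098C": "\u09E2", "\u09E1": "\u09E3",  # ৠ ঌ ৡ
    "\u098F": "\u09C7", "\u0990": "\u09C8",                      # এ ঐ
    "\u0993": "\u09CB", "\u0994": "\u09CC",                      # ও ঔ
}


def syllable(consonant: str, vowel: str) -> str:
    """Consonant + vowel in stored order; the i-kāra (U+09BF) is stored
    after the consonant even though it is drawn before it."""
    if vowel == A:
        return consonant
    return consonant + KARA[vowel]


# NFC recomposes e-kāra + aa-kāra into the single o-kāra code point.
typed = "\u0995\u09C7\u09BE"  # ক + ে + া
print(unicodedata.normalize("NFC", typed) == syllable("\u0995", "\u0993"))  # True
```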

3.3.4 The Anusvāra onuʃʃār (ং U+0982)

The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the velar varga or set, as in লংকা. It also often appears in such combinations when the last member is a non-velar, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol representing the corresponding nasal consonant of the particular set. Although Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, Bangla clearly demarcates where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or second component.
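At the code point level, the two ways of writing the velar case are different sequences, and Unicode normalization does not unify them. Any equivalence between them would therefore have to be stated explicitly. The sketch below makes this concrete for লংকা and লঙ্কা. The `alternate_velar_spelling` helper is a hypothetical illustration of how one spelling can be derived from the other. It is not a rule stated in this LGR.

```python
import unicodedata

ANUSVARA, NGA, HASANTA = "\u0982", "\u0999", "\u09CD"
VELARS = set("\u0995\u0996\u0997\u0998")  # ক খ গ ঘ

with_anusvara = "\u09B2\u0982\u0995\u09BE"        # লংকা
with_conjunct = "\u09B2\u0999\u09CD\u0995\u09BE"  # লঙ্কা


def alternate_velar_spelling(word: str) -> str:
    """Hypothetical helper: rewrite ং before a velar as ঙ + Hasanta."""
    out = []
    for i, ch in enumerate(word):
        if ch == ANUSVARA and i + 1 < len(word) and word[i + 1] in VELARS:
            out.append(NGA + HASANTA)
        else:
            out.append(ch)
    return "".join(out)


# NFC leaves both spellings distinct.
print(unicodedata.normalize("NFC", with_anusvara)
      == unicodedata.normalize("NFC", with_conjunct))  # False
print(alternate_velar_spelling(with_anusvara) == with_conjunct)  # True
```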

3.3.5 Nasalization: Candrabindu (ঁ U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].

3.3.6 Nukta (় U+09BC)

The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed into Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC), instead of the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD), although the use of 'Nukta' is otherwise completely unnatural in Bangla.

It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic code points (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each typed with a single keystroke), the IDNA protocol requires that they be treated as sequences of a base consonant plus Nukta, each of which has its own code point in the Unicode Bengali block. It may be noted that there could be sporadic attempts at writing Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to maintain the sanctity of the loanword. This amounts to using the Bangla writing system like the IPA script. Such usage is, however, not found in printed Bangla.
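The effect of the composition exclusions is easy to check with any conforming NFC implementation. The sketch below uses Python's standard `unicodedata` module. It shows that each atomic code point normalizes to base letter + Nukta and is never recomposed, which is why the LGR must list the sequences rather than the atomic letters.

```python
import unicodedata

# RRA, RHA and YYA are canonical composition exclusions: NFC decomposes the
# atomic code point into base letter + Nukta and never recomposes it.
ATOMIC = {"\u09DC": "RRA", "\u09DD": "RHA", "\u09DF": "YYA"}

for ch, name in ATOMIC.items():
    nfc = unicodedata.normalize("NFC", ch)
    print(name, f"U+{ord(ch):04X}", "->", " ".join(f"U+{ord(c):04X}" for c in nfc))
# RRA U+09DC -> U+09A1 U+09BC
# RHA U+09DD -> U+09A2 U+09BC
# YYA U+09DF -> U+09AF U+09BC
```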

3.3.7 Visarga biʃɔrgo (ঃ U+0983) and Avagraha (ঽ U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One example is দুঃখ duhkho "sorrow", "unhappiness" (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla script. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is now rarely used even in other languages that use the Bangla script. The Avagraha is not part of the LGR repertoire. It has therefore been decided not to retain Avagraha (ঽ) (U+09BD), because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
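Code point lists for Bangla words are easy to mistype with Devanāgarī values, because the two blocks are laid out in parallel: U+09A6 BENGALI LETTER DA corresponds to U+0926 DEVANAGARI LETTER DA. The sketch below flags such slips. It assumes the Unicode character name prefix is an adequate proxy for the script. Python's `unicodedata` does not expose the Script property, so a production check would use a library or data file that does. The helper name is hypothetical.

```python
import unicodedata


def non_bengali(code_points: list) -> list:
    """Return code points whose Unicode name does not start with BENGALI."""
    return [
        f"U+{cp:04X} {unicodedata.name(chr(cp))}"
        for cp in code_points
        if not unicodedata.name(chr(cp)).startswith("BENGALI ")
    ]


print(non_bengali([0x09A6, 0x09C1, 0x0983, 0x0996]))  # দুঃখ -> []
print(non_bengali([0x0926, 0x0941, 0x0983, 0x0916]))  # three DEVANAGARI entries
```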

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)

This note concerns the use of the Zero Width Joiner (ZWJ) and the Zero Width Non-Joiner (ZWNJ) in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these two signs in a different manner.


ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has the option of joining or non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) "Satyajit" and সৎ (sat) "honest". This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending on the context. E.g. র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য, one types র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that demand a ya-phalā after ra, which otherwise could not be typed (rendered), because the same order of ra + hasanta + antastha ja is used in the medial and/or final position. Indeed, ra + hasanta + antastha ja is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent the explicit Hasanta (Halant). In order to avoid conjunct formation where there is an explicit hasanta before the succeeding consonant, ZWNJ is used:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The use of ZWJ and ZWNJ has been ruled out of the root zone by the [Procedure]. Because they are used in Bangla to create alternate renderings, the insertion of these two signs can affect searching as well as NLP.


The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented, and the Hasanta between the two consonants that would otherwise form the conjunct needs to be shown explicitly.
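The sequences described above can be written out directly, which also shows what the root zone exclusion costs. Once ZWJ is removed, the ya-phalā spelling becomes identical to the repha spelling. The sketch below assumes the typing orders given in this section (র + ZWJ + ্ + য for ya-phalā after ra). The `root_zone_safe` helper is hypothetical and only checks for the two joiner controls.

```python
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"
ZWJ, ZWNJ = "\u200D", "\u200C"

repha_ya = RA + HASANTA + YA          # repha over ya, as in কার্য
ra_yaphala = RA + ZWJ + HASANTA + YA  # ya-phala after ra, as in "wrapper"
explicit_hasanta = "\u0995" + HASANTA + ZWNJ + "\u0995"  # ka + hasanta + ZWNJ + ka


def root_zone_safe(label: str) -> bool:
    """Reject labels containing ZWJ or ZWNJ, which the root zone excludes."""
    return ZWJ not in label and ZWNJ not in label


# Without ZWJ, the ya-phala spelling collapses into the repha spelling,
# so the distinction cannot be carried into root zone labels.
print(ra_yaphala.replace(ZWJ, "") == repha_ya)  # True
print([root_zone_safe(s) for s in (repha_ya, ra_yaphala, explicit_hasanta)])
# [True, False, False]
```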

3.3.9 Use of Ya-phalā

There are two Ya-phalā sequences in Bangla in which Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. As seen here, however, Bangla orthography has a deviant feature in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
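Because Hasanta after a vowel is otherwise abnormal, a label check can allow it only in the two sequences listed above (the VHCM restriction of Appendix I, rule 3). The sketch below expresses this with regular expressions. The patterns and the function name are assumptions about how such a rule could be written. They are not the LGR's own XML rule.

```python
import re

# Allowed: অ or এ + Hasanta + YA + AA-kara (অ্যা, এ্যা).
ALLOWED_VHCM = re.compile("[\u0985\u098F]\u09CD\u09AF\u09BE")
# Any independent vowel (U+0985..U+0994) immediately followed by Hasanta.
ANY_VOWEL_HASANTA = re.compile("[\u0985-\u0994]\u09CD")


def vowel_hasanta_ok(label: str) -> bool:
    """True if every vowel + Hasanta in the label starts an allowed VHCM."""
    allowed = {m.start() for m in ALLOWED_VHCM.finditer(label)}
    return all(m.start() in allowed for m in ANY_VOWEL_HASANTA.finditer(label))


print(vowel_hasanta_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড -> True
print(vowel_hasanta_ok("\u0987\u09CD\u09AF\u09BE"))                    # ই + ্ + য + া -> False
```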

3.3.10 Formation of Ra-phalā and Repha Sequences

This case refers to the formation of repha and ra-phalā, as follows:

Ra-Hasanta = (C2H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or by the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR notes this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). This provides the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can only be told apart by looking at their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the labels against their Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it is noteworthy that REPHA can also occur with KHANDA TA. In this context, the conditions on KHANDA TA require that C be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
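One way to see why the two RAs must be treated as blocked variants is to compute a comparison key that folds ৰ into র wherever the two are visually identical, and then treat labels with equal keys as colliding. The sketch below does this. It assumes that folding applies when RA is adjacent to Hasanta on either side (repha or ra-phalā), following the "followed or preceded" wording above. The exact context in Variant Set 4 may be narrower. The function name is hypothetical.

```python
BANGLA_RA, ASSAMESE_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"


def ra_variant_key(label: str) -> str:
    """Fold Assamese RA into Bangla RA when it sits next to Hasanta.

    Assumption: both repha (RA + Hasanta) and ra-phala (Hasanta + RA)
    contexts are folded; elsewhere the two letters look different.
    """
    out = list(label)
    for i, ch in enumerate(label):
        if ch == ASSAMESE_RA:
            before = i > 0 and label[i - 1] == HASANTA
            after = i + 1 < len(label) and label[i + 1] == HASANTA
            if before or after:
                out[i] = BANGLA_RA
    return "".join(out)


arka_bn = "\u0985\u09B0\u09CD\u0995"  # অর্ক with Bangla RA
arka_as = "\u0985\u09F0\u09CD\u0995"  # same rendering with Assamese RA
print(ra_variant_key(arka_as) == ra_variant_key(arka_bn))  # True: collision
```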

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) is made up of members with experience in Linguistics (especially NLP and computational linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to keep the fundamental philosophy behind building those LGRs consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.

4.1 Guiding Principles

The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous Use

Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but they have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope

Because the code point repertoire for the root zone is a very special case at the top of protocol hierarchies, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart
Of all the characters needed by the script in question, a character that is not encoded in Unicode cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters to be used for creating Root Zone TLDs, which in turn constitute an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters.
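The three successive filters amount to intersecting sets of code points. The sketch below takes the first filter from the Unicode data shipped with Python, so the block size depends on the Unicode version and may differ from the 93 entries cited in Section 5. The IDNA and MSR filters are supplied by the caller as placeholder sets. The real sets come from RFC 5892 derived properties and the published MSR and are not reproduced here. The function names are hypothetical.

```python
import unicodedata


def bengali_block() -> set:
    """Filter i: assigned code points in the Bengali block U+0980..U+09FF."""
    return {cp for cp in range(0x0980, 0x0A00)
            if unicodedata.category(chr(cp)) != "Cn"}


def root_zone_candidates(idna_pvalid: set, msr: set) -> set:
    """Apply the filters in order: Unicode block -> IDNA -> MSR.

    idna_pvalid and msr are placeholder sets supplied by the caller.
    """
    return bengali_block() & idna_pvalid & msr


# Toy inputs: Avagraha (U+09BD) passes IDNA but not the MSR, and
# Gurmukhi A (U+0A05) is outside the Bengali block.
toy_idna = {0x0995, 0x09BD, 0x0A05}
toy_msr = {0x0995, 0x0A05}
print(sorted(f"U+{cp:04X}" for cp in root_zone_candidates(toy_idna, toy_msr)))
# ['U+0995']
```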


Example: the Bangla sign Avagraha ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.

To sum up, the restrictions start by admitting only those characters that are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks
As TLDs are identifiers, the punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), will also not be included.

4.2.1.3 No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).

5 Repertoire

The Bangla writing system is represented in UNICODE using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.

For each of the code points, language references are given in the last column, titled 'Reference', of Table 8, titled "Code Point Repertoire". For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in UNICODE 6.3). Before the details are presented, however, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs in the code charts below are based on one of the many available fonts and are not prescriptive, because actual fonts, both UNICODE-compatible and TrueType ones, may vary. Consider the following code point table:


Figure 1 – Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
• All characters that are included in the [MSR]: yellow background
• PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
• Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block as in Figure 1, the following symbols among the code points included in the MSR will need separate treatment:
• ৎ U+09CE Bangla Letter Khanda-Ta
• ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
• ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire: Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candrabindu | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; U+09DC is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; U+09DD is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; U+09DF is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122] [125]

Table 8: Bangla Code-Point Repertoire
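As a reading aid, the repertoire in Table 8 can be expressed as a plain set of code points. The following is a minimal Python sketch and not the normative XML LGR. The constant names and the `in_repertoire` helper are hypothetical. The sketch assumes, as IDNA2008 does, that labels are checked in Normalization Form C (NFC). Following rows 28, 30 and 43, it accepts U+09BC only when it directly follows ড, ঢ or য.

```python
import unicodedata

# Table 8, single code points (hypothetical constant names, illustration only).
VOWELS = {0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
          0x098F, 0x0990, 0x0993, 0x0994}
CONSONANTS = (set(range(0x0995, 0x09A9))        # U+0995 KA .. U+09A8 NA
              | set(range(0x09AA, 0x09B1))      # U+09AA PA .. U+09B0 RA
              | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1})
KARAS = {0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3, 0x09C4,
         0x09C7, 0x09C8, 0x09CB, 0x09CC}
# candrabindu, anusvara, visarga, virama, khanda ta
SIGNS = {0x0981, 0x0982, 0x0983, 0x09CD, 0x09CE}
REPERTOIRE = VOWELS | CONSONANTS | KARAS | SIGNS

# Rows 28, 30 and 43: NUKTA is accepted only directly after DDA, DDHA or YA.
NUKTA = 0x09BC
NUKTA_BASES = {0x09A1, 0x09A2, 0x09AF}


def in_repertoire(label: str) -> bool:
    """True if `label` is already in NFC and uses only Table 8 entries."""
    if not label or unicodedata.normalize("NFC", label) != label:
        return False
    previous = None
    for ch in label:
        cp = ord(ch)
        if cp == NUKTA:
            if previous not in NUKTA_BASES:
                return False
        elif cp not in REPERTOIRE:
            return False
        previous = cp
    return True


print(len(REPERTOIRE))                                 # 61 (+3 nukta sequences = 64 rows)
print(in_repertoire("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা -> True
print(in_repertoire("\u09A1\u09BC"))                  # decomposed RRA -> True
print(in_repertoire("\u09DC"))                        # precomposed RRA, not NFC -> False
print(in_repertoire("\u098C"))                        # excluded (Table 10) -> False
```

The count of 61 single code points plus the three nukta sequences matches the 64 rows of Table 8. This check covers the repertoire only. The contextual Whole Label Evaluation rules of Section 7 are a separate layer.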

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. These enable the conditional inclusion of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, so that the æ sound, as in English 'bat', 'cat', etc., can be written.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It is used only in sequence, in the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for Exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points
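The reason rows 28, 30 and 43 list sequences instead of U+09DC, U+09DD and U+09DF can be checked directly. Unicode treats these three precomposed letters as composition exclusions, so NFC always replaces each one with its base letter plus U+09BC. IDNA2008 does not treat NFC-unstable code points as PVALID. The sketch below uses Python's standard `unicodedata` module. The helper name `nfc_code_points` is illustrative only.

```python
import unicodedata


def nfc_code_points(text: str) -> list:
    """Return the NFC form of `text` as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


# U+09DC, U+09DD and U+09DF are composition exclusions: NFC never produces
# them and always yields base letter + U+09BC BENGALI SIGN NUKTA instead.
for precomposed in ("\u09DC", "\u09DD", "\u09DF"):
    print(f"U+{ord(precomposed):04X} ->", " ".join(nfc_code_points(precomposed)))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```

The decomposed sequence is therefore the only stable representation, and an LGR has to describe ড়, ঢ় and য় in that form.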

5.4 The Basis of Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. No. | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). In Bangla, a V is not necessarily followed by only one D, B or X. However, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. It is valid and possible to have a candrabindu followed by an anusvara on the one hand, and a candrabindu followed by a visarga on the other, as in হ্যাঁংচা 'hæñchā' and হ্যাঁঃ 'hæñ' respectively. In other words, a Visarga or Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya'. It is represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta or VIRAMA), U+09AF য BENGALI LETTER YA> and has a special form (্য). When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used to transcribe [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].


Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ
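The ABNF operators in Table 11 map directly onto regular-expression syntax. "[ ]" becomes an optional group, "|" stays an alternative and "( )" becomes a group. This means the vowel sequence of Sections 5.4.2.1 to 5.4.2.3 can be tested on a string of category letters. The sketch below makes two simplifying assumptions. First, its category table covers only a few sample code points. Second, it does not enforce the further restriction in Table 16 (symbol S) that VHCM is limited to অ/এ + ্ + য + া. `CATEGORY`, `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical names.

```python
import re

# Category letters from Section 5.4.1, for a small sample of code points only
# (a full implementation would cover every row of Table 8).
CATEGORY = {
    "\u0985": "V", "\u0987": "V", "\u098F": "V",   # অ ই এ
    "\u0995": "C", "\u09AF": "C",                  # ক য
    "\u09BE": "M",                                 # া
    "\u0982": "B", "\u0981": "D", "\u0983": "X",   # ং ঁ ঃ
    "\u09CD": "H",                                 # ্
}

# V, optionally HCM, then optionally one of B | D | X | DB | DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:D?[BX]|D)?")


def categories(text: str) -> str:
    # Unknown characters map to "?" so they can never match the pattern.
    return "".join(CATEGORY.get(ch, "?") for ch in text)


def is_vowel_sequence(text: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(categories(text)) is not None


print(is_vowel_sequence("\u0985\u0981\u0982"))        # অঁং  (VDB)  -> True
print(is_vowel_sequence("\u0985\u09CD\u09AF\u09BE"))  # অ্যা (VHCM) -> True
print(is_vowel_sequence("\u0985\u0982\u0982"))        # অংং  (VBB)  -> False
```

The alternation `D?[BX]|D` is just a compact spelling of "B | D | X | DB | DX". It also shows why VDD, VBB and VXX fail: only one trailing sign can follow an optional D.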

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:

CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Example:

CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

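5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example:

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य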

5.4.3.5 Subsets
As a representative example of the subsets, we consider only the combination CHC. The same applies equally to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

Example:

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example:

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
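The consonant sequence of Sections 5.4.3.1 to 5.4.3.5 can be sketched the same way, as a regular expression over category letters. The ABNF "*3(CH)C" becomes `(?:CH){0,3}C`. The tail is then either a final H (the pure consonant of 5.4.3.2) or an optional M followed by an optional B, D, X, DB or DX. This is an illustration under stated assumptions: the category table is a hypothetical sample, and the pattern checks a single consonant sequence, not a whole label.

```python
import re

# Sample category table (Section 5.4.1 letters); illustrative only.
CATEGORY = {
    "\u0995": "C", "\u09A4": "C", "\u09A8": "C", "\u09B0": "C", "\u09AF": "C",  # ক ত ন র য
    "\u09BE": "M", "\u09BF": "M", "\u09C0": "M",                               # া ি ী
    "\u0982": "B", "\u0981": "D", "\u0983": "X",                               # ং ঁ ঃ
    "\u09CD": "H",                                                             # ্
}

# *3(CH)C, then either a final H or  [M] [B | D | X | DB | DX].
CONSONANT_SEQUENCE = re.compile(r"(?:CH){0,3}C(?:H|M?(?:D?[BX]|D)?)")


def is_consonant_sequence(text: str) -> bool:
    cats = "".join(CATEGORY.get(ch, "?") for ch in text)
    return CONSONANT_SEQUENCE.fullmatch(cats) is not None


print(is_consonant_sequence("\u0995\u09BE\u0981\u0982"))                      # কাঁং  (CMDB)    -> True
print(is_consonant_sequence("\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF"))  # ন্ত্র্য (CHCHCHC) -> True
print(is_consonant_sequence("\u0995\u09C0\u09C0"))                          # কীী  (CMM)     -> False
```

Bounding the repetition at three CH pairs is what limits clusters to four consonants, matching the "up to 4" stated in 5.4.3.4.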

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: In this context, the condition for KHANDA TA is that the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

10 Refer to Rule P in Section 7, Table 16.

5.4.5 Special Cases S and P
Two special cases involving sequences, referred to as S and P in Table 16 under Section 7, are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render a Ya-phalā following অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. As we see here, however, Bangla orthography has a deviant feature: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, P in Table 16, concerns the formation of repha and ra-phalā in this script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the ra slot can hold either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). In both cases, the shapes of repha and ra-phalā remain the same.


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.

For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed to be treated as variants. These are not cases of mere visual similarity. They involve deviations from the widely accepted norms of Bangla akshar formation, which can confuse even a careful observer, and hence they are proposed as variants.

Variants arise in a script when two or more forms look alike but are stored as different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell apart the two forms of ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
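One further observation, which the panel's argument above does not depend on, may be useful here. Unicode defines ো (U+09CB) as canonically equivalent to ে + া (U+09C7 U+09BE), and ৌ (U+09CC) as equivalent to ে + ৗ (U+09C7 U+09D7). NFC, which IDNA2008 labels must be in, therefore already folds the two-keystroke form into the single code point. The variable names below are illustrative. The calls are from Python's standard `unicodedata` module.

```python
import unicodedata

single_key = "\u0995\u09CB"       # ক + ো (U+09CB BENGALI VOWEL SIGN O)
two_keys = "\u0995\u09C7\u09BE"   # ক + ে (U+09C7) + া (U+09BE)

print(unicodedata.decomposition("\u09CB"))                   # 09C7 09BE
print(unicodedata.normalize("NFC", two_keys) == single_key)  # True
print(unicodedata.normalize("NFD", single_key) == two_keys)  # True
```

So the two spellings of কো cannot survive as distinct registered labels in the first place. This is consistent with the panel's decision not to treat them as variants.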

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


If written in traditional orthography, the above combinations can be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example 1). A user may also mistype it as হ (ha, U+09B9), believing that is the letter shown, which increases the risk of errors in writing and identifying labels. Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)

Fonts that represent the traditional Bangla writing system tend to create this problem. These may therefore be treated as cases of variants in Bangla.

CASE II: Another interesting example of a variant arises with the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (ra + Hasanta) and 'ra-phalā' (Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same, as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র), or
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: Because Bangla and Assamese ক and ঠ look exactly the same, the resulting combinations with Repha look identical; adding the Repha makes no difference. On similar grounds, 'Ra-phalā' can be formed by two sequences whose final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

Because the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. The repha and ra-phalā cases therefore need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels. For example, if someone registers "ররর", the label "ৰৰৰ" will be generated as a variant and blocked, and vice versa. The blocking applies only at the label level: someone else can still register other labels, e.g. "ৰৰ" or "ৰৰৰৰ".
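Because every র/ৰ position can independently take either form, a label with n RA positions has 2ⁿ − 1 variant labels. The minimal Python sketch below only enumerates them. It does not implement LGR dispositions: in the XML LGR the pair would be declared as variants and the resulting labels marked as blocked. `RA_CHOICES` and `ra_variant_labels` are hypothetical names for illustration.

```python
from itertools import product

# র (U+09B0) and ৰ (U+09F0) are declared mutual variants (Section 6.1, Case II).
RA_CHOICES = {"\u09B0": ("\u09B0", "\u09F0"), "\u09F0": ("\u09F0", "\u09B0")}


def ra_variant_labels(label: str) -> set:
    """Every label obtained by swapping র/ৰ at any subset of positions.

    The registered label itself is excluded; under the proposal these
    labels would be blocked, which this sketch does not model.
    """
    choices = [RA_CHOICES.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


variants = ra_variant_labels("\u09B0\u09B0\u09B0")   # ররর
print(len(variants))                                 # 7  (2**3 - 1)
print("\u09F0\u09F0\u09F0" in variants)              # True: ৰৰৰ is blocked
print("\u09F0\u09F0" in variants)                    # False: ৰৰ stays available
```

The last line shows the label-level nature of blocking described above: a label of a different length, such as "ৰৰ", is not among the variants of "ররর".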

6.2 Cross-Script Variants
A focused cross-script study for Bangla has been carried out with respect to its sister scripts Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Apart from the cases described in Section 6.1, there is no in-script variant in Bangla as far as orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.2) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.


1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
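Tables 14 and 15 can be read as per-code-point lookup tables. A Bangla label has a complete look-alike in the other script only if every one of its code points has a mapping. The following Python sketch illustrates that reading. The dictionary and function names are hypothetical, and the real cross-script variant handling is done by the integrated root zone LGR, not by code like this.

```python
# Tables 14 and 15 as lookup tables (hypothetical names).
BANGLA_TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম→म, ি→ि
BANGLA_TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম→ਸ, ি→ਿ


def cross_script_variant(label: str, mapping: dict):
    """Return the other-script look-alike of `label`, or None.

    A full look-alike exists only if *every* code point of the label
    has an entry in the mapping table.
    """
    if label and all(ch in mapping for ch in label):
        return "".join(mapping[ch] for ch in label)
    return None


print(cross_script_variant("\u09AE\u09BF", BANGLA_TO_DEVANAGARI))  # মি -> मि
print(cross_script_variant("\u09AE\u09BF", BANGLA_TO_GURMUKHI))    # মি -> ਸਿ
print(cross_script_variant("\u09AE\u09BE", BANGLA_TO_DEVANAGARI))  # None (া has no entry)
```

Because the tables are so small, only labels built entirely from ম and ি produce a cross-script look-alike under this reading.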

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the code point repertoire table (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though the other context rules would not allow them.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It may also be useful to mention here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters". In theory, two to four consonants can conjoin in this way and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters": 290 bi-consonantal combinations, another 80 three-letter combinations, and 10 rarer four-letter ones [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the restrictions described above, the following specific WLE rules need to be implemented:


1. C is the set consisting of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
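Rules 1 to 9 can be read as constraints on the category of the token immediately before each token. The Python sketch below tokenizes a label, treating ড়/ঢ়/য় as single C tokens (rule 1) and অ্যা/এ্যা as single S tokens. When checking Khanda Ta, it treats র্ or ৰ্ as P. It is an illustration under these assumptions and not the XML WLE rules of this proposal. All names are hypothetical. Because rules 2 to 7 require a predecessor, a label may never start with H, M, D, X, B or Z.

```python
# Category sets from Table 8 / Table 16 (hypothetical names for this sketch).
VOWELS = set(map(chr, [0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                       0x098B, 0x098F, 0x0990, 0x0993, 0x0994]))
CONSONANTS = set(map(chr, list(range(0x0995, 0x09A9)) + list(range(0x09AA, 0x09B1))
                     + [0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1]))
KARAS = set(map(chr, [0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3,
                      0x09C4, 0x09C7, 0x09C8, 0x09CB, 0x09CC]))
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA, NUKTA_BASES = "\u09BC", {"\u09A1", "\u09A2", "\u09AF"}   # rule 1 (CN)
RA = {"\u09B0", "\u09F0"}                                          # C2 of symbol P
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # অ্যা, এ্যা

# Rules 2-7: categories allowed immediately before each category.
# S ends with M, so it is accepted wherever M is (see the note to rule 7).
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},
}


def tokenize(label):
    """Split a label into (category, text) tokens; raise ValueError if invalid."""
    tokens, i = [], 0
    while i < len(label):
        if label.startswith(S_SEQUENCES, i):
            tokens.append(("S", label[i:i + 4]))
            i += 4
            continue
        ch = label[i]
        if ch in VOWELS:
            cat = "V"
        elif ch in CONSONANTS:
            cat = "C"
        elif ch in KARAS:
            cat = "M"
        elif ch in SIGNS:
            cat = SIGNS[ch]
        else:
            raise ValueError(f"U+{ord(ch):04X} is not in the repertoire")
        text = ch
        if ch in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            text += NUKTA                     # ড়, ঢ়, য় count as one C token
        tokens.append((cat, text))
        i += len(text)
    return tokens


def is_valid_label(label):
    try:
        tokens = tokenize(label)
    except ValueError:
        return False
    if not tokens:
        return False
    for i, (cat, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        if cat in ("V", "S") and prev == "H":          # rules 8 and 9
            return False
        if cat == "Z" and prev == "H" and i >= 2 and tokens[i - 2][1] in RA:
            prev = "P"                                   # র্ or ৰ্ before ৎ
        if cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            return False
    return True


print(is_valid_label("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> True
print(is_valid_label("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড -> True
print(is_valid_label("\u0995\u0982\u0983"))                          # কংঃ -> False (rule 5)
print(is_valid_label("\u0985\u09AB\u09CD\u0987"))                    # অফ্ই -> False (rule 8)
```

Treating S as one token is what lets অ্যা pass even though its internal Hasanta follows a vowel. This matches the statement in Table 16 that S1 and S2 are valid despite the other context rules.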

Let us now elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage. There is no harm in allowing such ligatures, because any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE), meaning 'Bank of India'. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to fully represent the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered sufficient to represent the sound intended for that word.


This unique situation arises because the hyphen, the space and the Zero Width Non-Joiner are not in the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting it may make two labels (with and without H) look similar to the majority of the linguistic communities, so the NBGP explicitly prohibits it. If required in future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help in understanding some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक

As can be seen, such combinations automatically result in a 'golu' or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. After a consonant, vowel or kāra (Mātrā), only one B, D or X is permitted; the following combinations are therefore invalid:


Example:

কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. Only one M is permitted after a consonant. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা, ঈৌ; ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permitted.

Example:

কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka


9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii + 316 pp.

[117] ----- (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). 2017. Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'Phones of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has happened So Far in Terms of Script Reforms'. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.


10. Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply, because in a given language some of the Formalism structures do not necessarily apply. Restriction rules are set in place to take care of such cases, and they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not normally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichu Chor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C + H (consonant + Hasanta) can precede Khaṇḍa Ta only when C is ra (র, U+09B0).

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations of V + H + C + M (vowel + Hasanta + consonant + mātrā) will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
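The three restriction rules above can be expressed as a simple label check. The Python sketch below is illustrative only. The function name `check_restrictions` and the rule labels are hypothetical, and the rules are applied exactly as worded above. The exceptions mentioned under rule 1 (য় in names, ৎস in transliterations) are not modelled. This is not the normative whole-label evaluation defined in the LGR XML file.

```python
# Hypothetical helper; code points are from the Unicode Bengali block.
KHANDA_TA = "\u09CE"  # ৎ BENGALI LETTER KHANDA TA
NYA = "\u099E"        # ঞ BENGALI LETTER NYA
NGA = "\u0999"        # ঙ BENGALI LETTER NGA
RA = "\u09B0"         # র BENGALI LETTER RA
HASANTA = "\u09CD"    # ্ BENGALI SIGN VIRAMA
YA, AA = "\u09AF", "\u09BE"            # য and the vowel sign া
VOWEL_A, VOWEL_E = "\u0985", "\u098F"  # অ and এ
# Independent vowels occupy U+0985..U+0994 (a few slots there are unassigned).
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}


def check_restrictions(label: str) -> list[str]:
    """Return the restriction rules (1-3 above) that the label violates."""
    errors = []
    # Rule 1: ৎ, ঞ and ঙ may not begin a label.
    if label and label[0] in {KHANDA_TA, NYA, NGA}:
        errors.append("rule 1")
    for i, ch in enumerate(label):
        # Rule 2: consonant + hasanta may precede ৎ only if the consonant is র.
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == HASANTA and label[i - 2] != RA:
            errors.append("rule 2")
        # Rule 3: a hasanta after an independent vowel is allowed only
        # in the sequences অ + ্ + য + া and এ + ্ + য + া.
        if ch == HASANTA and i > 0 and label[i - 1] in INDEPENDENT_VOWELS:
            if label[i - 1] not in {VOWEL_A, VOWEL_E} or label[i + 1:i + 3] != YA + AA:
                errors.append("rule 3")
    return errors


print(check_restrictions("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা, prints []
```

Returning a list of violated rules, rather than a single boolean, makes it easier to report to a registrant why a label was rejected.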

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi), which was used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. Sylheti Nagarī or Siloti (129) has had several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some suggest that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to the Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points

The following code points were analysed. In each case the panel concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points
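The "Confusable" decisions in Tables 18, 19 and 21 can be kept as plain data for advisory tooling. The Python sketch below is a hypothetical helper and not part of the LGR; the NBGP did not define these pairs as variants. It records the pairs, looks them up, and prints the character names using the standard `unicodedata` module.

```python
import unicodedata

# (Bangla, other-script) pairs marked "Confusable" in Tables 18, 19 and 21.
CONFUSABLE_PAIRS = [
    ("\u0983", "\u0903"),  # ঃ / Devanagari visarga
    ("\u0993", "\u0909"),  # ও / Devanagari U
    ("\u0998", "\u0918"),  # ঘ / Devanagari GHA
    ("\u0981", "\u0945"),  # candrabindu / Devanagari candra E sign
    ("\u0998", "\u0A2C"),  # ঘ / Gurmukhi BA
    ("\u0981", "\u0A71"),  # candrabindu / Gurmukhi addak
    ("\u0993", "\u0B13"),  # ও / Oriya O
]


def confusables_for(ch: str) -> list[str]:
    """Return the other-script characters recorded as confusable with a Bangla character."""
    return [other for bangla, other in CONFUSABLE_PAIRS if bangla == ch]


for bangla, other in CONFUSABLE_PAIRS:
    print(f"U+{ord(bangla):04X} {unicodedata.name(bangla)}  ~  "
          f"U+{ord(other):04X} {unicodedata.name(other)}")
```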


11. Appendix-II: Bengali Consonants and Their Allographs

Consonants | Phonetic Value | Allographs

Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত: ত + ্ = ৎ. ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts, ঙ is replaced by ং.) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s, ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + ্ + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক+য) স্য (স+য) র‍্য (র+য) [Just র‍্য is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala and ā-kara, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:

i. রেফ (repʰ) as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র+ধ+ব) (earlier র্দ্ধ্ব = র+দ+ধ+ব, a four-term cluster), etc. (placed over the following consonant)

ii. র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially, it doubles the pronunciation of the following consonant

অঃকঃ

অব


ক্র kr (ক k + র r); গ্ধ gdh (গ g + ধ dh); ঙ্ক ṅk (ঙ ṅ + ক k); ঙ্গ ṅg (ঙ ṅ + গ g); ষ্ণ ṣṇ (ষ ṣ + ণ ṇ); ন্ধ্র ndhr (ন n + ধ dh + র r); ণ্ড্র ṇḍr (ণ ṇ + ড ḍ + র r); ক্ত্র ktr (ক k + ত t + র r)

3.3.1 The Consonants

As per traditional classification, Bangla consonants are categorized according to their phonetic properties, especially in terms of place and manner of articulation [107]. There are five 'varga' (pronounced 'barga' in Bangla) or groups (sets or classes), distinguished by place of articulation, and one non-'varga' group [105]. Each varga corresponds to the stops at a certain place of articulation. It contains a series of five consonants classified by their phonetic qualities (i.e. manner of articulation). The series begins with the unvoiced unaspirated consonant, continues through to the voiced aspirated consonant (in the fourth column), and ends with the homorganic, or corresponding, nasal [107]. Consider the following table:

'Varga' or Sets | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | ক 'K' U+0995 | খ 'KH' U+0996 | গ 'G' U+0997 | ঘ 'GH' U+0998 | ঙ 'NG' U+0999
Palatal | চ 'C' U+099A | ছ 'CH' U+099B | জ 'J' U+099C | ঝ 'JH' U+099D | ঞ 'NY' U+099E
Retroflex | ট 'TT' U+099F | ঠ 'TTH' U+09A0 | ড 'DD' U+09A1 | ঢ 'DDH' U+09A2 | ণ 'NN' U+09A3
Dental | ত 'T' U+09A4 | থ 'TH' U+09A5 | দ 'D' U+09A6 | ধ 'DH' U+09A7 | ন 'N' U+09A8
Bilabial | প 'P' U+09AA | ফ 'PH' U+09AB | ব 'B' U+09AC | ভ 'BH' U+09AD | ম 'M' U+09AE

Table 5: Varga classification of Bangla consonants

(Falling into a pattern of five sets of unvoiced unaspirated, unvoiced aspirated, voiced unaspirated, voiced aspirated and nasal consonants, called the five 'varga')

Non-Varga | য 'Y' U+09AF | য় 'YY' U+09DF | র 'R' U+09B0 | ড় 'RR' U+09DC | ঢ় 'RH' U+09DD | ল 'L' U+09B2 | শ 'SH' U+09B6 | ষ 'SS' U+09B7 | স 'S' U+09B8 | হ 'H' U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)
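The varga rows of Table 5 map onto runs of five consecutive code points in the Bengali block. The one gap is U+09A9, which is unassigned, so the bilabial row starts at U+09AA. The Python sketch below rebuilds Table 5 from that observation. The names `PLACES`, `ROW_STARTS` and `varga_table` are illustrative and not taken from the LGR.

```python
import unicodedata

# Row order of Table 5.
PLACES = ["Velar", "Palatal", "Retroflex", "Dental", "Bilabial"]
# First code point of each row; U+09A9 is unassigned, hence the jump to U+09AA.
ROW_STARTS = [0x0995, 0x099A, 0x099F, 0x09A4, 0x09AA]


def varga_table() -> dict[str, list[str]]:
    """Rebuild Table 5: five consecutive code points per place of articulation."""
    return {place: [chr(start + k) for k in range(5)]
            for place, start in zip(PLACES, ROW_STARTS)}


for place, row in varga_table().items():
    print(f"{place:<10}", "  ".join(f"{ch} U+{ord(ch):04X}" for ch in row))
```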

3.3.2 The Implicit Vowel Killer Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)

As stated earlier, all consonants are pronounced in isolation with an implicit vowel (central-back ɔ in Bangla, the neutral vowel) assumed to be associated with them [121]. This report uses two terms for the character that marks the absence of this inherent vowel: 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma'⁷ (= 'Da ri' in Bangla), the latter being the term preferred in UNICODE (cf. Unicode 3.0 and above). The term virama has been adopted in UNICODE in a sense that differs from its traditional grammatical definition, so it requires some explanation here. Given the importance of this document, the note belongs in the LGR itself, so that anybody referring to it can find the proper grammatical explanation of the term.

A special sign is needed whenever this implicit vowel is stripped off, and that symbol is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one could, in common parlance, 'kill' its vowel and create conjuncts. In this manner, conjunct characters can generally be written by joining two to four consonants; in rare cases the process can join up to five. However, any maximum number of consonants joining to form one aksara⁸ can only be bounded empirically. The observation above is based on the CIIL-Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, one cannot rule out that someone may want a generic Top Level Domain [gTLD] with more than the observed maximum. This can happen when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, in the Bangla LGR work, this limit will not be enforced.
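Conjunct formation with Hasanta, as described above, is simply the insertion of U+09CD between consonants. The Python sketch below shows this and measures the longest Hasanta-linked consonant chain in a label, the quantity that the LGR deliberately leaves unbounded. The consonant range is a simplifying assumption: it covers U+0995..U+09B9 and ignores ZWJ/ZWNJ and nukta forms. The function names are hypothetical.

```python
HASANTA = "\u09CD"  # ্ BENGALI SIGN VIRAMA
# Simplified consonant range U+0995..U+09B9 (includes a few unassigned slots).
CONSONANTS = {chr(cp) for cp in range(0x0995, 0x09BA)}


def conjunct(*consonants: str) -> str:
    """Join consonants with Hasanta, so every consonant but the last loses its inherent vowel."""
    return HASANTA.join(consonants)


def longest_cluster(label: str) -> int:
    """Length of the longest consonant chain linked by Hasanta (C H C H C ...)."""
    best, i = 0, 0
    while i < len(label):
        if label[i] in CONSONANTS:
            run = 1
            while i + 2 < len(label) and label[i + 1] == HASANTA and label[i + 2] in CONSONANTS:
                run += 1
                i += 2
            best = max(best, run)
        i += 1
    return best


print(longest_cluster(conjunct("\u09A8", "\u09A7", "\u09B0")))  # ন্ধ্র, prints 3
```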

⁷ Virāma as used here is also a misnomer according to the Indian grammatical traditions, in which the mere absence of a vowel is nowhere marked as virama. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla (1961). A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
⁸ This term needs to be disambiguated: aksara also means 'syllable' in the Indian grammatical traditions.

3.3.3 Vowels

Separate symbols exist for all 'Swara', or vowels, in Bangla. These are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign is attached to the consonant; it is called a 'kār' in Bangla or a Mātrā in Nāgarī⁹. Since the consonant has the neutral vowel built in at its end, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

Vowel | Corresponding vowel sign (kāras / Mātrās)
অ 'A' U+0985 | -
আ 'AA' U+0986 | া U+09BE
ই 'I' U+0987 | ি U+09BF
ঈ 'II' U+0988 | ী U+09C0
উ 'U' U+0989 | ু U+09C1
ঊ 'UU' U+098A | ূ U+09C2
ঋ Vocalic 'R' U+098B | ৃ U+09C3
ৠ Vocalic 'RR' U+09E0 | ৄ U+09C4
ঌ Vocalic 'L' U+098C | ৢ U+09E2
ৡ Vocalic 'LL' U+09E1 | ৣ U+09E3
এ 'E' U+098F | ে U+09C7
ঐ 'AI' U+0990 | ৈ U+09C8
ও 'O' U+0993 | ো U+09CB
ঔ 'AU' U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ 'A' U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ 'A' U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ 'A' U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kārs

⁹ The term 'Mātrā' in Bangla, however, stands for an altogether different concept, viz. the top bar placed over a letter, which is typically found in Hindi and Bangla but is missing in Gujarati.
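The vowel-to-kār correspondence in Table 7 follows a regular naming pattern in Unicode. "BENGALI LETTER X" pairs with "BENGALI VOWEL SIGN X", and অ has no sign. The Python sketch below relies on that pattern, which holds for the Bengali block but is an observation rather than a Unicode guarantee. It uses only `unicodedata.name` and `unicodedata.lookup`; the function name `vowel_sign_for` is hypothetical.

```python
import unicodedata
from typing import Optional

LETTER_PREFIX = "BENGALI LETTER "
SIGN_PREFIX = "BENGALI VOWEL SIGN "


def vowel_sign_for(vowel: str) -> Optional[str]:
    """Return the dependent vowel sign (kār) for an independent Bangla vowel, or None."""
    name = unicodedata.name(vowel)
    if not name.startswith(LETTER_PREFIX):
        raise ValueError(f"not a Bengali letter: {name}")
    try:
        return unicodedata.lookup(SIGN_PREFIX + name[len(LETTER_PREFIX):])
    except KeyError:
        return None  # অ has no kār: the consonant already carries this vowel


VOWELS = "\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u09E0\u098C\u09E1\u098F\u0990\u0993\u0994"
for v in VOWELS:
    sign = vowel_sign_for(v)
    print(v, f"U+{ord(v):04X}", "-" if sign is None else f"{sign} U+{ord(sign):04X}")
```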

3.3.4 The Anusvāra, onuʃʃār (ং, U+0982)

The Anusvāra, or onuʃʃār in Bangla, at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the velar varga or set, as in লংকা. However, it also often appears in such combinations when the last member is a non-velar, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined written form, using the corresponding nasal consonant of the particular set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. In Bangla, by contrast, it is clearly demarcated where one must use the Anusvāra and where one must use a conjunct cluster with a nasal as the first or second component.

3.3.5 Nasalization: Candrabindu (ঁ, U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].

3.3.6 Nukta (়, U+09BC)

The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi); both the term and the concept of nukta are borrowings in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, this LGR has to represent the Bangla letters য় YYA, ড় RRA and ঢ় RHA by the sequences (YA + Nukta, U+09AF + U+09BC), (DDA + Nukta, U+09A1 + U+09BC) and (DDHA + Nukta, U+09A2 + U+09BC). The single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD) cannot be used, even though the use of 'Nukta' is otherwise completely unnatural in Bangla.

It is noted that the current Unicode Standard chart lists these characters as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard also provides these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each typed as a single keystroke). Even so, the IDNA protocol requires that they be treated as sequences of a base consonant plus nukta, both of which are encoded in the Unicode Bengali block.

There may be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loan words with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). These are made only to indicate the correct pronunciation and to preserve the sanctity of the loan word, in effect using the Bangla writing system like the IPA. Such usage is, however, not found in printed Bangla.
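The effect of the composition exclusions described above can be checked directly with Python's standard `unicodedata.normalize`. The sketch below assumes only that NFC is the normalization form required, as stated above. The names `EXPECTED` and `to_nfc` are illustrative.

```python
import unicodedata

# Precomposed letters and the nukta sequences NFC yields for them
# (U+09DC, U+09DD and U+09DF are Unicode composition exclusions).
EXPECTED = {
    "\u09DC": "\u09A1\u09BC",  # ড় RRA -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # ঢ় RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # য় YYA -> YA + NUKTA
}


def to_nfc(label: str) -> str:
    """Normalize a label to NFC, as IDNA (RFC 5891) requires."""
    return unicodedata.normalize("NFC", label)


for single, sequence in EXPECTED.items():
    result = to_nfc(single)
    assert result == sequence
    assert to_nfc(sequence) == sequence  # NFC never recomposes the pair
    print(unicodedata.name(single), "->",
          " + ".join(f"U+{ord(c):04X}" for c in result))
```

Because NFC never produces U+09DC, U+09DD or U+09DF, a repertoire built on NFC labels can only contain the two-code-point sequences.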

3.3.7 Visarga, biʃɔrgo (ঃ, U+0983), and Avagraha (ঽ, U+09BD)

The Visarga, biʃɔrgo (U+0983), is frequently used in Bangla loan words borrowed from Sanskrit and represents a sound very close to h. One example is দুঃখ duhkho "sorrow, unhappiness" (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো'পরাণি) and is now rarely used, even in other languages that use Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD), because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.
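Spelling out দুঃখ by code point confirms that all four characters come from the Bengali block (U+0980..U+09FF). The Python sketch below does this with the standard `unicodedata` module and adds a trivial check that rejects labels containing Avagraha. The helper names are hypothetical, and the check is not the LGR's own mechanism, which simply omits U+09BD from the repertoire.

```python
import unicodedata

AVAGRAHA = "\u09BD"                   # ঽ BENGALI SIGN AVAGRAHA, not in the root-zone repertoire
DUHKHO = "\u09A6\u09C1\u0983\u0996"   # দুঃখ


def code_points(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def uses_avagraha(label: str) -> bool:
    """True if the label contains Avagraha and would therefore be rejected."""
    return AVAGRAHA in label


for ch in DUHKHO:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```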

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)

This note is pertinent to the use of the Zero Width Joiner (ZWJ) and the Zero Width Non-Joiner (ZWNJ) in Bangla. Note that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script offers a choice between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) "Satyajit" and সৎ (sat) "honest". This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending on the context. For example, র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য one types র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is needed to render words that demand a ya-phalā after ra. Without it, such words cannot be typed (rendered) in medial and/or final position, because the plain order ra + hasanta + antastha ja produces something else: it types a repha on the consonant antastha ja, as in কার্য (kaarjo). It is therefore obligatory to use ZWJ after ra to get a ya-phalā after it, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent an explicit Hasanta (Halant). Where an explicit hasanta stands before the succeeding consonant, ZWNJ is inserted to prevent conjunct formation:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The [Procedure] has ruled out the use of ZWJ and ZWNJ in the root zone. In Bangla these two signs are used to create alternate renderings, and inserting them can affect searching as well as NLP.

The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented and the Hasanta between the two consonants that would otherwise form the conjunct needs to be shown explicitly.
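The code point sequences described above can be written out directly. The Python sketch below builds the repha form র্য and the ya-phalā-after-ra form র‍্য, builds an explicit-Hasanta sequence with ZWNJ, and applies a root-zone check that rejects any label containing either joiner. The constant and function names are hypothetical. The root-zone rule is modelled only as "no U+200C or U+200D anywhere", which is an assumption about how the [Procedure] exclusion would be enforced.

```python
RA, YA, HASANTA = "\u09B0", "\u09AF", "\u09CD"
ZWJ, ZWNJ = "\u200D", "\u200C"

REPHA_ON_YA = RA + HASANTA + YA              # র্য, as in কার্য
YA_PHALA_AFTER_RA = RA + ZWJ + HASANTA + YA  # র‍্য, as in র‍্যাপার


def explicit_hasanta(c1: str, c2: str) -> str:
    """Consonant + hasanta + ZWNJ + consonant: show the hasanta instead of forming a conjunct."""
    return c1 + HASANTA + ZWNJ + c2


def root_zone_ok(label: str) -> bool:
    """Reject any label containing ZWJ or ZWNJ (ruled out of the root zone)."""
    return ZWJ not in label and ZWNJ not in label


def strip_joiners(label: str) -> str:
    """Remove ZWJ and ZWNJ from a label."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


print(root_zone_ok(REPHA_ON_YA), root_zone_ok(YA_PHALA_AFTER_RA))  # prints True False
```

Once the joiners are removed, the two spellings collapse into the same code point sequence (`strip_joiners(YA_PHALA_AFTER_RA) == REPHA_ON_YA`). This shows why the distinction cannot survive in root-zone labels.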

3.3.9 Use of Ya-phalā

Ya-phalā sequences are the only two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render ya-phalā after অ and এ, one types U+09CD Hasanta plus U+09AF ya immediately after the vowel. This is a purely ligatural entity: ya-phalā plus ā-kara is used to produce the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As we see here, however, Bangla orthography ends up with a deviant feature: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
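The two vowel + Hasanta sequences listed above can be decoded with Python's `unicodedata.name` to confirm that the code points match the character names given. This is a small verification sketch; `spell_out` and the constant names are illustrative.

```python
import unicodedata

A_YA_PHALA_AA = "\u0985\u09CD\u09AF\u09BE"  # অ্যা
E_YA_PHALA_AA = "\u098F\u09CD\u09AF\u09BE"  # এ্যা


def spell_out(seq: str) -> str:
    """Join the Unicode character names of a sequence with ' + '."""
    return " + ".join(unicodedata.name(ch) for ch in seq)


for seq in (A_YA_PHALA_AA, E_YA_PHALA_AA):
    print(" ".join(f"{ord(ch):04X}" for ch in seq), "=", spell_out(seq))
```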

3.3.10 Formation of Ra-phalā and Repha Sequences

This case refers to the formation of repha and ra-phalā, as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্, BENGALI SIGN VIRAMA).

When it co-occurs with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). In both cases, the ra slot could hold either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same.

The LGR notes this concern about the two RAs in disguise. In a label so generated, it would be completely impossible to tell them apart with the naked eye, which could lead to spoofing and other kinds of cyber irregularities. These two code points are classed as (blocking) variants because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). This is the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can be distinguished only by looking at their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could therefore be extremely dangerous, since a web user may never check the labels against their Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow.

It is also noteworthy that REPHA can occur with KHANDA TA. In that context, the C should be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
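The blocking-variant behaviour described above can be sketched as follows. Every র/ৰ that stands next to a Hasanta is allowed to swap, all resulting labels are enumerated, and a candidate is treated as blocked if it falls in the variant set of an already registered label. The context used here (RA immediately preceded or followed by Hasanta, covering both ra-phalā and repha) is an assumption for illustration. The normative contexts and dispositions are those in the LGR XML file, and all names are hypothetical.

```python
from itertools import product

RA_BN = "\u09B0"    # র BENGALI LETTER RA
RA_AS = "\u09F0"    # ৰ BENGALI LETTER RA WITH MIDDLE DIAGONAL (Assamese ra)
HASANTA = "\u09CD"  # ্ BENGALI SIGN VIRAMA


def _next_to_hasanta(label: str, i: int) -> bool:
    """Assumed variant context: the RA is directly preceded or followed by Hasanta."""
    return (i > 0 and label[i - 1] == HASANTA) or (
        i + 1 < len(label) and label[i + 1] == HASANTA)


def variant_labels(label: str) -> set[str]:
    """All labels reachable by swapping র/ৰ in the assumed context (including the label itself)."""
    options = []
    for i, ch in enumerate(label):
        if ch in (RA_BN, RA_AS) and _next_to_hasanta(label, i):
            options.append((RA_BN, RA_AS))
        else:
            options.append((ch,))
    return {"".join(chars) for chars in product(*options)}


def blocked_by(candidate: str, registered: str) -> bool:
    """A candidate is blocked if it is a different label in the same variant set."""
    return candidate != registered and candidate in variant_labels(registered)


arka = "\u0985\u09B0\u09CD\u0995"  # অর্ক with Bangla RA
print(sorted(f"U+{ord(l[1]):04X}" for l in variant_labels(arka)))  # prints ['U+09B0', 'U+09F0']
```

Enumerating variants grows as 2^n in the number of RAs in context. That is acceptable for short TLD labels, but a real implementation would more likely compare canonicalized forms.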

4. Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) is made up of members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the NBGP, Bangla and eight other scripts, each belonging to a separate Unicode block, are being taken up, with a separate LGR assigned to each. An attempt is nevertheless made to keep the fundamental philosophy behind these LGRs consistent across all the Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles

The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous Use

Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These limits consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but they have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope

The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy. The canvas of characters available for selection as part of the Root Zone code point repertoire is therefore already constrained by the protocol layers beneath it. The following three main protocols or standards act as successive filters:

i. The Unicode Chart: If a character needed by the script in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol: Unicode is the character-encoding standard that aims at the fullest possible representation of a given script or language, and it has encoded, as far as possible, all the characters the script needs. Domain names, however, are a specialized case governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire (MSR): The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, which are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on the set of characters allowed by IDNA.

Example: Bangla Sign Avagraha ঽ (U+09BD), even though it is allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.

To sum up, the restrictions start by admitting only characters that are part of the code block of the given script or language. The IDNA Protocol narrows this set further, and finally the Maximal Starting Repertoire restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks

Since TLDs are identifiers, punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations

Abbreviations, weights and measures and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), will also not be included.

4.2.1.3 No Rare and Obsolete Characters

Some characters have been added to Unicode to accommodate rare forms. These include the Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), as well as the allographic -kāra forms of the latter two: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. In Bangla, however, the -kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF

The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
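The three successive filters of Section 4.2.1 can be pictured as a pipeline over candidate characters. In the Python sketch below, only the block test is real. `IDNA_PVALID_SAMPLE` and `MSR_SAMPLE` are tiny hypothetical stand-ins for the IDNA2008 derived-property table (RFC 5892) and the MSR, chosen to reflect the cases mentioned in this section. Avagraha and the Bengali digits pass IDNA but not the MSR. ISSHAR, being a symbol, is not PVALID. The danda (U+0964), used as the Bangla full stop, lies outside the Bengali block.

```python
# Hypothetical, abbreviated stand-ins for the real tables; the actual IDNA2008
# derived properties (RFC 5892) and the MSR are far larger.
IDNA_PVALID_SAMPLE = {"\u0995", "\u09BD", "\u09E6"}  # KA, AVAGRAHA, DIGIT ZERO
MSR_SAMPLE = {"\u0995"}                              # KA


def in_bengali_block(ch: str) -> bool:
    """Filter i: the character must be encoded in the Bengali block U+0980..U+09FF."""
    return 0x0980 <= ord(ch) <= 0x09FF


def successive_filters(candidates: list[str]) -> tuple[list[str], list[str], list[str]]:
    """Apply the Unicode-chart, IDNA and MSR filters in turn and return each stage."""
    stage1 = [c for c in candidates if in_bengali_block(c)]  # i. Unicode chart
    stage2 = [c for c in stage1 if c in IDNA_PVALID_SAMPLE]  # ii. IDNA protocol
    stage3 = [c for c in stage2 if c in MSR_SAMPLE]          # iii. MSR
    return stage1, stage2, stage3


candidates = ["\u0995", "\u09BD", "\u09FA", "\u09E6", "\u0964"]  # ক ঽ ৺ ০ ।
for stage in successive_filters(candidates):
    print([f"U+{ord(c):04X}" for c in stage])
```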

5. Repertoire

In UNICODE, the Bangla writing system is represented by the Bengali (Bangla) script, as enumerated in ISO 15924, which is used for languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR.

The Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. At its 8th Meeting of the Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd Aug 2017, the BIS decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above.

For each code point, language references are given in the last column, titled Reference, of Table 8, the "Code Point Repertoire". To cover all Bangla code points, references are given for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya. Kokborok written in Bangla script is not known to introduce new complications, except for one particular character. Only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing. Together, however, they cover all the code points required for all the languages that the NBGP has considered, as given under the Bangla Unicode points (as in UNICODE 6.3).

Before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. The shapes of the reference glyphs in the code charts below are based on one of the many available fonts. They are not prescriptive, since actual fonts, both UNICODE-compatible and TrueType, may vary. Consider the following code point table:

Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention: characters included in the [MSR] have a yellow background; characters that are PVALID in IDNA2008 but excluded from the [MSR] have a pinkish background; characters that are not PVALID in IDNA2008 or are ineligible for the root zone (digits, hyphen) have a white background.

Given the Bangla Unicode block as shown in Figure 1, the following symbols among the code points included in the MSR will need separate treatment:

ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire: Inclusion

No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]

8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]

15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]

22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]; 09DC is the preferred code point, but it is not available for the LGR under the standards governing this LGR development.

29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]; 09DD is the preferred code point, but it is not available for the LGR under the standards governing this LGR development.

31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]


33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]; 09DF is the preferred code point, but it is not available for the LGR under the standards governing this LGR development.

44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112], [125]

45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112], [122]

57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112], [122], [125]

62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]

64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122], [125]

Table 8: Bangla Code-Point Repertoire
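The comments on rows 28, 30 and 43 can be checked with Python's standard `unicodedata` module. The sketch below assumes, as IDNA2008 requires for U-labels, that every label must be in Unicode Normalization Form C. U+09DC, U+09DD and U+09DF are canonical composition exclusions, so NFC always rewrites them as base letter + NUKTA; the names `NUKTA_FORMS` and `code_points` are illustrative only.

```python
import unicodedata

# Precomposed nukta letters (rows 28, 30, 43) and the sequences used in the repertoire.
NUKTA_FORMS = {
    "\u09DC": "\u09A1\u09BC",  # RRA -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # YYA -> YA + NUKTA
}


def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for precomposed, sequence in NUKTA_FORMS.items():
    nfc = unicodedata.normalize("NFC", precomposed)
    # NFC never recomposes these letters, so the precomposed form cannot occur in an NFC label.
    assert nfc == sequence
    print(code_points(precomposed), "->", code_points(nfc))
```

The loop prints `U+09DC -> U+09A1 U+09BC` and the corresponding lines for RHA and YYA, which is why the repertoire lists the two-code-point sequences rather than the precomposed letters.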

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes two specific sequences that enable the conditional inclusion in the repertoire of Bangla LETTER A and LETTER E followed by Bangla SIGN VIRAMA, Bangla LETTER YA and Bangla VOWEL SIGN AA. These sequences represent the æ sound, as in English 'bat', 'cat', etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference


S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112], [122]

S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion

Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note

1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use

2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only within the sequences that form the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion

U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points

5.4 The Basis of the Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) begins with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta (Virama)
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) uses the following operators:

Sr Number | Operator | Function

1 | "|" | Alternative

2 | "[ ]" | Optional

3 | "*" | Variable Repetition

4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence

To facilitate understanding for users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X signs that can follow a V in Bangla is not necessarily restricted to one. Going by the rules illustrated in this document, however, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. On the other hand, it is valid and possible to have formations or sequences such as an anusvara followed by a chandrabindu, and a visarga followed by a chandrabindu, as in হ্যাংচা 'hæŋchā' and হ্যাঃ 'hæŋ' respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (the Bangla Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When it is combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].


Examples

VB অং अं

VD অঁ अँ

VX অঃ अः

VDB অঁং अँं

VDX অঁঃ अँः

VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples

VHCMB অ্যাং, এ্যাং

VHCMD অ্যাঁ, এ্যাঁ

VHCMX অ্যাঃ, এ্যাঃ

VHCMDB অ্যাঁং, এ্যাঁং

VHCMDX অ্যাঁঃ, এ্যাঁঃ
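The combinations admitted in 5.4.2.1 to 5.4.2.3 can be read as a regular expression over the ABNF variables. A minimal Python sketch, assuming a hand-written category map that covers only the characters used in these examples; the map, the function names and the regular expression are illustrative and are not taken from the LGR XML file. It encodes only the combinations listed above (B, D, X, DB, DX).

```python
import re

# Hypothetical category map: only the characters needed for the examples in 5.4.2.
CATEGORY = {
    "\u0985": "V",  # অ BENGALI LETTER A
    "\u098F": "V",  # এ BENGALI LETTER E
    "\u09AF": "C",  # য BENGALI LETTER YA
    "\u09BE": "M",  # া BENGALI VOWEL SIGN AA
    "\u09CD": "H",  # ্ BENGALI SIGN VIRAMA (Hasanta)
    "\u0982": "B",  # ং Anusvara
    "\u0981": "D",  # ঁ Candrabindu
    "\u0983": "X",  # ঃ Visarga
}

# V, optionally HCM, optionally exactly one of B, D, X, DB, DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def to_categories(text: str) -> str:
    """Map each character to its ABNF variable; unknown characters become '?'."""
    return "".join(CATEGORY.get(ch, "?") for ch in text)


def is_vowel_sequence(text: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(to_categories(text)) is not None


for sample in ["\u0985", "\u0985\u0981\u0982", "\u0985\u09CD\u09AF\u09BE\u0983", "\u0985\u0982\u0982"]:
    print(sample, to_categories(sample), is_vowel_sequence(sample))
```

With these definitions অ, অঁং (VDB) and অ্যাঃ (VHCMX) are accepted, while অংং (VBB) is rejected, in line with the statement that VBB, VDD and VXX are invalid.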

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example

CM িকক -कक

CB কং कं


CD কঁ कँ

CX কঃ कः

CH ক্ क् (pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Example CMB কীংকং कक

CMD কাঁ काँ

CMX বীঃ वीः

CMDB কাঁং काँं

CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example

CHC ন্ত → ন + ্ + ত; न + ् + त

CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न + ् + त + ् + र

CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न + ् + त + ् + र + ् + य

5.4.3.5 Subsets

While considering subsets, we take the combination CHC as a representative example; the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी

CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं

CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ


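CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः

CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं

CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं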

sup3ং rarr ক ক ং 4क rarr क क

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ

CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः

CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं

CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
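The consonant-sequence patterns of 5.4.3 can be sketched the same way. This version assumes that any code point in the consonant range U+0995 to U+09B9 (plus U+09F0 and U+09F1) counts as C and any code point in U+09BE to U+09CC counts as M, which is coarser than the actual repertoire in Table 8; the ABNF `*3(CH)` is written as the regex `(?:CH){0,3}`, and the function names are hypothetical.

```python
import re


def category(ch: str) -> str:
    """Rough ABNF variable for a Bengali-block character (sketch; ignores unassigned slots)."""
    cp = ord(ch)
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H"}.get(cp, "?")


# A pure consonant (CH), or up to three CH pairs followed by C, optionally
# followed by M and then optionally by one of B, D, X, DB, DX.
CONSONANT_SEQUENCE = re.compile(r"CH|(?:CH){0,3}CM?(?:DB|DX|B|D|X)?")


def is_consonant_sequence(text: str) -> bool:
    cats = "".join(category(ch) for ch in text)
    return CONSONANT_SEQUENCE.fullmatch(cats) is not None
```

Under this sketch ন্ত্র্য (CHCHCHC) and ক্কীং (CHCMB) are accepted, whereas a five-consonant cluster or কীী (CMM) is not.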

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā, 'scolding').

Note: In this context, the condition on KHANDA TA is that C must be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
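The condition in the note above can be expressed as a small context check. A minimal sketch, assuming labels are handled as Python strings of code points; `khanda_ta_ok` is a hypothetical helper that checks only the [CH]Z condition and none of the other rules (for instance, it does not reject a label that begins with ৎ).

```python
VIRAMA = "\u09CD"
KHANDA_TA = "\u09CE"
RA_LETTERS = {"\u09B0", "\u09F0"}  # র (Bangla RA) and ৰ (Assamese RA)


def khanda_ta_ok(label: str) -> bool:
    """Check [CH]Z: a KHANDA TA that follows a Hasanta needs RA before that Hasanta."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] not in RA_LETTERS:
                return False
    return True
```

For example, ভর্ৎসনা and ৰ্ৎ pass, while ক্ৎ fails because the consonant before the Hasanta is not RA.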

10 Refer to Rule P in Section 7, Table 16.

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. As we see here, however, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, P, refers to the formation of repha and ra-phalā in the script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.


6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:

Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Cases that are confusable but not identical are not proposed, since another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are hence proposed as variants.

Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type the o-kāra with a consonant in one go, or obtain the same result by typing the e-kāra and the ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this is not considered a case of variants, because a kāra followed by another kāra is not allowed.
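The kāra example can be checked with Python's standard `unicodedata` module. Assuming labels are required to be in NFC (as IDNA2008 requires for U-labels), the two-key sequence composes to the single code point U+09CB, so only one of the two spellings can appear in a valid label. The constant names below are illustrative.

```python
import unicodedata

KA = "\u0995"       # ক
E_KARA = "\u09C7"   # ে BENGALI VOWEL SIGN E
AA_KARA = "\u09BE"  # া BENGALI VOWEL SIGN AA
O_KARA = "\u09CB"   # ো BENGALI VOWEL SIGN O

one_key = KA + O_KARA             # কো typed with a single key
two_keys = KA + E_KARA + AA_KARA  # কো typed as e-kāra followed by ā-kāra

print(one_key == two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: identical after NFC
```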

6.1 In-Script Variants

We propose two cases of true in-script variants in the Bangla script.

CASE I

As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1 স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2 ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


If written in traditional orthography, the above combinations could be a little confusing, because the থ (tha) in the conjunct appears like a হ (ha). The conjunct could occur in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as হ (ha, U+09B9) by someone who takes it for a ha, increasing the risks in label writing and identification.

Examples

1 স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2 ন্থ and ন্হ (as in paragrantha)

Fonts that represent the traditional Bangla writing system could tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II

Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক

U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: As ক and ঠ look exactly the same in Bangla and Assamese, the resulting combinations with repha look identical; adding the repha does not make any difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ

U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phala

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could be confusable for end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. The block applies only at the label level; if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, it is still possible.
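The blocking behaviour described above follows directly from treating র and ৰ as one variant set. The sketch below assumes a simplified model in which each position holding either RA can be swapped independently (the permutation approach used when generating variant labels from an LGR) and in which every generated label other than the registered one is blocked; the function names and the 'allocated' and 'blocked' values are illustrative. Under this model, mixed forms such as রৰর are generated too, not only ৰৰৰ.

```python
from itertools import product
from typing import Dict, Set

# র (U+09B0) and ৰ (U+09F0) form a single variant set (Case II above).
VARIANT_SET = ("\u09B0", "\u09F0")


def variant_labels(label: str) -> Set[str]:
    """Every label obtained by swapping র/ৰ independently at each position (label included)."""
    choices = [VARIANT_SET if ch in VARIANT_SET else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


def dispositions(registered: str) -> Dict[str, str]:
    """Hypothetical disposition table: the registered label is allocated, its variants are blocked."""
    return {
        variant: "allocated" if variant == registered else "blocked"
        for variant in variant_labels(registered)
    }


table = dispositions("\u09B0\u09B0\u09B0")  # ররর
print(len(table))                   # 8: ররর plus seven blocked variants
print(table["\u09F0\u09F0\u09F0"])  # blocked (ৰৰৰ)
print("\u09F0\u09F0" in table)      # False: ৰৰ is a different label
```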

6.2 Cross-Script Variants

A focused cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are listed in the Appendix (10.3) for reference.

11 Unicode uses Oriya for the script, although Odia is now the official term.


1 Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī

ম U+09AE | म U+092E

ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2 Bangla and Gurmukhi Script

Bangla | Gurmukhī

ম U+09AE | ਸ U+0A38

ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
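Because a root zone label may not mix scripts, a cross-script variant is only relevant when every code point of a label has a counterpart in the other script. A minimal sketch of that check for Tables 14 and 15, assuming this whole-label simplification; the dictionary and function names are hypothetical.

```python
from typing import Dict, Optional

# Cross-script variant code points from Table 14 (Devanagari) and Table 15 (Gurmukhi).
TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম -> म, ি -> ि
TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম -> ਸ, ি -> ਿ


def cross_script_variant(label: str, table: Dict[str, str]) -> Optional[str]:
    """Return the whole-label variant in the target script, or None if any code point has no counterpart."""
    if not label or any(ch not in table for ch in label):
        return None
    return "".join(table[ch] for ch in label)


print(cross_script_variant("\u09AE\u09BF", TO_DEVANAGARI))  # मि
print(cross_script_variant("\u09AE\u09BF", TO_GURMUKHI))    # ਸਿ
print(cross_script_variant("\u0995\u09BF", TO_DEVANAGARI))  # None: ক has no variant
```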

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ BENGALI SIGN VIRAMA).

Table 16: Symbols used in WLE rules

It is perhaps also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows.

1 C is the set consisting of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.

2 H must be preceded by C.
Example: ক্

3 M must be preceded by C.
Example: কা

4 D must be preceded by either V, C or M.
Example: আঁ খঁ খাঁ হাঁ

5 X must be preceded by either V, C, M or D.
Example: উঃখঃবঃাঃ দ ঃ

6 B must be preceded by either V, C, M or D.
Example: আং ইং কং

7 Z must be preceded by V, C, M, D, B, X or P. (S is not listed because S ends with M; Z may also follow S.)
Example: ইৎকৎাৎাৎপ6ৎrৎ

8 V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9 S CANNOT be preceded by H.
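Rules 1 to 9 can be checked with a single left-to-right scan. The sketch below assumes that S1/S2 and the CN forms are first grouped into single tokens (so that the Hasanta inside S is exempt, as Table 16 states), that a preceding S counts as M (per the remark under rule 7), and that P is recognised as RA + Hasanta directly before Z. Its `category` function uses Unicode block ranges rather than the exact repertoire of Table 8, and all names are hypothetical; the normative form of these rules is the LGR XML file.

```python
from typing import List, Optional, Tuple

S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1 অ্যা, S2 এ্যা
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড ঢ য: bases of the CN forms ড় ঢ় য়
RA_LETTERS = {"\u09B0", "\u09F0"}             # C2 in the definition of P

# Rules 2-7: the symbols each symbol MUST follow ("P" = RA + Hasanta).
MUST_FOLLOW = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}
# Rules 8-9: the symbols each symbol must NOT follow.
MUST_NOT_FOLLOW = {"V": {"H"}, "S": {"H"}}


def category(ch: str) -> str:
    """Approximate Table 16 symbol for one code point (block ranges, not the exact repertoire)."""
    cp = ord(ch)
    if cp == 0x09CE:
        return "Z"
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H"}.get(cp, "?")


def tokenize(label: str) -> List[Tuple[str, str]]:
    """Split a label into (symbol, text) tokens, keeping S1/S2 and the CN forms whole."""
    tokens, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            tokens.append(("S", label[i:i + 4]))
            i += 4
        elif label[i] in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            tokens.append(("C", label[i:i + 2]))  # rule 1: CN counts as C
            i += 2
        else:
            tokens.append((category(label[i]), label[i]))
            i += 1
    return tokens


def wle_errors(label: str) -> List[str]:
    """Return the rule violations found; an empty list means the label passes rules 1-9."""
    tokens = tokenize(label)
    errors = []
    for i, (symbol, text) in enumerate(tokens):
        if symbol == "?":
            errors.append(f"{text!r} is not in the repertoire")
            continue
        prev: Optional[str] = tokens[i - 1][0] if i > 0 else None
        if prev == "S":
            prev = "M"  # S ends with M, so it satisfies "preceded by M"
        if symbol == "Z" and prev == "H" and i >= 2 and tokens[i - 2][1] in RA_LETTERS:
            prev = "P"  # RA + Hasanta directly before KHANDA TA
        allowed = MUST_FOLLOW.get(symbol)
        if allowed is not None and prev not in allowed:
            errors.append(f"{symbol} ({text!r}) cannot follow {prev or 'start of label'}")
        if prev in MUST_NOT_FOLLOW.get(symbol, set()):
            errors.append(f"{symbol} ({text!r}) cannot follow {prev}")
    return errors
```

With this sketch, ভর্ৎসনা, অ্যাসিড and পড়া (with ড় written as ড + NUKTA) produce no errors, whereas ংক, ক্ৎ and ফ্ই are each reported, the last one under rule 8.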

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures: any user can easily create them, even though they may not be attested.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া, bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent its intended sound.


This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
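Rule 8 can be illustrated on the example above. The sketch below builds the label from the listed code points, keeping U+09DF exactly as given even though the LGR itself would use U+09AF U+09BC, and reports every position where an independent vowel directly follows the Hasanta; `vowel_after_hasanta` and `BANK_OF_INDIA` are hypothetical names.

```python
from typing import List

VIRAMA = "\u09CD"
# Independent vowels অ..ঔ (block range; also covers a few unassigned or excluded code points).
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}

# ব্যাঙ্কঅফ্ইন্ডিয়া, using the code points exactly as listed above.
BANK_OF_INDIA = "".join(chr(cp) for cp in (
    0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995,  # ব্যাঙ্ক
    0x0985, 0x09AB, 0x09CD,                                  # অফ্
    0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE,  # ইন্ডিয়া
))


def vowel_after_hasanta(label: str) -> List[int]:
    """Indices where an independent vowel directly follows the Hasanta (rule 8)."""
    return [i for i in range(1, len(label))
            if label[i] in INDEPENDENT_VOWELS and label[i - 1] == VIRAMA]


print(vowel_after_hasanta(BANK_OF_INDIA))           # [10]: ই right after অফ্
without_h = BANK_OF_INDIA[:9] + BANK_OF_INDIA[10:]  # drop the Hasanta of অফ্
print(vowel_after_hasanta(without_h))               # []
```

Only ই, which follows অফ্, is flagged; once the Hasanta of অফ্ is dropped, as most of the linguistic community considers sufficient, the label no longer conflicts with rule 8.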

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1 H, M, B, D or X cannot occur at the beginning of a Bangla word.

Example

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such combinations automatically result in a "golu", or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2 H is not permitted after V, B, D, X, M or S.

Example

অ अ

অং क ক क কঃ कः ি )क

3 The number of B, D or X permitted after a consonant, a vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalid:


Example

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4 The number of M permitted after a consonant is restricted to one.

Example

কীী की

5 M is not permitted after V.

Example

ইা ঈৗ ईा ईौ

6 The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example

কংঃ कंः

কঃং कःं
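The invalid patterns listed in 7.2 can be detected with simple regular expressions over the sequence of ABNF variables. A minimal sketch, assuming the same coarse block-range categorisation as a simplification (not the exact repertoire) and leaving out item 2, which needs S to be recognised as a unit; the pattern names and functions are illustrative.

```python
import re
from typing import List


def symbol(ch: str) -> str:
    """Approximate ABNF variable for a Bengali-block code point (sketch)."""
    cp = ord(ch)
    if 0x0995 <= cp <= 0x09B9:
        return "C"
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H"}.get(cp, "?")


# One regex per invalid pattern illustrated in 7.2 (numbering follows that list).
INVALID = {
    "1: H, M, B, D or X at the start": re.compile(r"^[HMBDX]"),
    "3: repeated B, D or X": re.compile(r"BB|DD|XX"),
    "4: more than one M": re.compile(r"MM"),
    "5: M after V": re.compile(r"VM"),
    "6: Anusvara and Visarga together": re.compile(r"BX|XB"),
}


def violations(label: str) -> List[str]:
    symbols = "".join(symbol(ch) for ch in label)
    return [name for name, pattern in INVALID.items() if pattern.search(symbols)]
```

For instance, কংং is reported under item 3 and কংঃ under item 6, while কা passes.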

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

53

9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'Phones of Bengali'. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646 (UNICODE). https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What Has Happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification. Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1 Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, e.g. the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichu Chor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, e.g. ৎসেরিং (Tsering).

2 CH can combine with Khanda Ta only where C is ra (র) (09B0): র্ৎ, as in ভর্ৎসনা.

3 Only the following combinations with VHCM are allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
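Rules 1 and 3 above can be sketched as two additional checks (rule 2 is the same Khanda Ta condition described in 5.4.4.2). The helper below is hypothetical and covers only these Bangla-specific restrictions, using a block range for the independent vowels.

```python
from typing import List

VIRAMA = "\u09CD"
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}  # ৎ KHANDA TA, ঞ NYA, ঙ NGA (rule 1)
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা (rule 3)


def is_independent_vowel(ch: str) -> bool:
    return 0x0985 <= ord(ch) <= 0x0994


def bangla_restrictions(label: str) -> List[str]:
    """Return the Appendix 10.1 rule 1 and rule 3 problems found in a label."""
    problems = []
    if label and label[0] in FORBIDDEN_INITIAL:
        problems.append(f"{label[0]!r} may not begin a label")
    for i in range(len(label) - 1):
        # A vowel followed by Hasanta is acceptable only as the start of অ্যা or এ্যা.
        if is_independent_vowel(label[i]) and label[i + 1] == VIRAMA:
            if label[i:i + 4] not in ALLOWED_VHCM:
                problems.append(f"VHCM at index {i} is not অ্যা or এ্যা")
    return problems
```

Note that ৎসেরিং (Tsering), mentioned above as a recent transliteration, is rejected by rule 1 as stated, while অ্যাসিড passes.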

13 This section specifically takes up restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi), which was used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed. Each was found to be either (a) distinguishable, or (b) confusable but not enough to be defined as a variant code point.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhi distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points
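The pairs marked "Confusable" are not defined as variants, so the LGR itself does not link them. The sketch below is a hypothetical review aid, not part of the LGR. It lists the Bengali characters in a label that appear in Tables 18, 19 and 21, and it reports which Unicode blocks the label draws on. It assumes that block membership is a good enough proxy for script membership. The names `BLOCKS`, `CONFUSABLE`, `blocks_used` and `confusable_letters` are illustrative.

```python
# Hypothetical review aid, not part of the LGR: look-alike pairs taken from
# Tables 18, 19 and 21 (NBGP decision "Confusable").

# Unicode block ranges, used here as a rough proxy for script membership.
BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Oriya": (0x0B00, 0x0B7F),
}

# Bengali character -> confusable characters in other scripts.
CONFUSABLE = {
    "\u0983": ["\u0903"],            # VISARGA ~ Devanagari VISARGA
    "\u0993": ["\u0909", "\u0B13"],  # O ~ Devanagari U, Oriya O
    "\u0998": ["\u0918", "\u0A2C"],  # GHA ~ Devanagari GHA, Gurmukhi BA
    "\u0981": ["\u0945", "\u0A71"],  # CANDRABINDU ~ Devanagari CANDRA E, Gurmukhi ADDAK
}


def blocks_used(label: str) -> set:
    """Names of the blocks in BLOCKS that the label's characters fall in."""
    found = set()
    for ch in label:
        for name, (lo, hi) in BLOCKS.items():
            if lo <= ord(ch) <= hi:
                found.add(name)
    return found


def confusable_letters(label: str) -> list:
    """Bengali characters in the label that have a listed cross-script look-alike."""
    return [ch for ch in label if ch in CONFUSABLE]


print(blocks_used("\u0918\u09B0"))         # Devanagari GHA followed by Bengali RA
print(confusable_letters("\u0998\u09B0"))  # ঘর
```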


11 Appendix II: Bengali consonants and their allographs

Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ: ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but it is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ ং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts aelig is replaced by ং) কংঅং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When it is added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার etc.). After a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + ্ + য combination has two physical manifestations: র‍্য and র্য.

ক8(p+য) স8(+য) র 8(+য) [Just র‍্য is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala and a-kara, constitute a vowel sign representing the vowel অ্যা]


র r Two manifestations: i. রেফ (repʰ) as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6 (+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster) etc. (placed over the following consonant)

ii. র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster, e.g. etc. (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


Non-Varga
ল 'L' U+09B2
শ 'SH' U+09B6
ষ 'SS' U+09B7
স 'S' U+09B8
হ 'H' U+09B9

Table 6: Non-Varga consonants (not falling into any of the five categories)

3.3.2 The Implicit Vowel Killer: Hasanta (called 'Halant' or 'Halanta' in other Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel, assumed to be associated with them (central-back ɔ in Bangla, the neutral vowel) [121]. This report uses two terms for the character that marks the absence of this inherent vowel: 'Hasanta' (= 'Halant' or 'Halanta' in other Brāhmī-based scripts) and 'Virāma'⁷ (= 'Da ri' in Bangla), the latter being the term preferred in Unicode (cf. Unicode 3.0 and above). Unicode has adopted the term virama in a sense different from its traditional definition in grammar, so some explanation is needed here. Given the importance of the document, this note is made part of the LGR so that anyone referring to it can find the proper grammatical explanation of the term. Because a special sign is needed whenever the implicit vowel is stripped off, that sign is known as the Hasanta (= Halant) (U+09CD). By placing the Hasanta under the first consonant of a combination or cluster, one can, in common parlance, "kill" its vowel and create conjuncts. In this manner, conjunct characters are generally written by joining two to four consonants, and in rare cases up to five. The maximum number of consonants joining to form one akṣara⁸ is, however, an empirical bound, observed in the CIIL-EMILLE corpora of Bangla words [132 & 133] as seen in print these days. Given the mixture of scripts and languages on the web, one cannot rule out that someone may want a generic Top-Level Domain (gTLD) that exceeds the observed maximum. This can happen, for example, when a foreign-language word that admits a large number of consonants is transliterated into Bangla. Hence, this limit will not be enforced in the Bangla LGR.
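The LGR deliberately does not cap conjunct length, so the empirical maximum described above can only be measured, not enforced. Below is a minimal Python sketch, with hypothetical function names, that counts the consonants in the longest Hasanta-joined run of a label. It treats the range KA..HA plus KHANDA TA as "consonants", which is a simplification.

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA


def is_consonant(ch: str) -> bool:
    # Simplification: Bengali KA..HA plus KHANDA TA. The range also spans a few
    # unassigned code points, which is harmless for this measurement.
    return "\u0995" <= ch <= "\u09B9" or ch == "\u09CE"


def longest_cluster(label: str) -> int:
    """Consonant count of the longest C(HC)* run joined by Hasanta."""
    best, i = 0, 0
    while i < len(label):
        if is_consonant(label[i]):
            run = 1
            # Extend the run while the pattern Hasanta + consonant continues.
            while (i + 2 < len(label) and label[i + 1] == HASANTA
                   and is_consonant(label[i + 2])):
                run += 1
                i += 2
            best = max(best, run)
        i += 1
    return best


print(longest_cluster("স্ত্রী"))  # স + ্ + ত + ্ + র + ী
```

With this definition, স্ত্রী gives 3. Running the function over a word list would reproduce the kind of corpus observation cited above, without turning it into a rule.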

7 Virāma as used here is also a misnomer according to the Indian grammatical traditions: nowhere is the mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else (Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental Institute).
8 This term needs to be disambiguated: akṣara also means 'syllable' in Indian grammatical traditions.

3.3.3 Vowels
Bangla has separate symbols for all 'Swara', or vowels. They are pronounced independently, either at the beginning of a word or after another vowel or consonant sound. To indicate a vowel sound other than the implicit one, a vowel sign, called 'kār' in Bangla or Mātrā in Nagarī,⁹ is attached to the consonant. Because the consonant already carries the built-in neutral vowel, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

Vowel | Corresponding vowel sign (kāra/Mātrā)
অ 'A' U+0985 | -
আ 'AA' U+0986 | া U+09BE
ই 'I' U+0987 | ি U+09BF
ঈ 'II' U+0988 | ী U+09C0
উ 'U' U+0989 | ◌ু U+09C1
ঊ 'UU' U+098A | ◌ূ U+09C2
ঋ Vocalic 'R' U+098B | ◌ৃ U+09C3
ৠ Vocalic 'RR' U+09E0 | ◌ৄ U+09C4
ঌ Vocalic 'L' U+098C | ◌ৢ U+09E2
ৡ Vocalic 'LL' U+09E1 | ◌ৣ U+09E3
এ 'E' U+098F | ে U+09C7
ঐ 'AI' U+0990 | ৈ U+09C8
ও 'O' U+0993 | ো U+09CB
ঔ 'AU' U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ 'A' U+0985 or any other vowel | ◌ঁ U+0981 Candrabindu
Could appear after অ 'A' U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ 'A' U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ◌্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kāras

9 Although the term 'Mātrā' in Bangla stands for an altogether different concept, viz. the top bar placed over a letter, typically available in Hindi and Bangla but missing in Gujarati.

3.3.4 The Anusvāra onuʃʃār (ং, U+0982)
In Bangla, the Anusvāra or onuʃʃār at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the velar varga or set, as in লংকা. It also often appears in such combinations when the last member is a non-velar, as in ল্যাংটা 'naked' or ল্যাংচা 'a kind of sweet; to limp'. Before a non-varga consonant, the Anusvāra represents a nasal sound; for that sound there may be an alternative conjoined symbol using the corresponding nasal consonant of the particular set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. Bangla, by contrast, clearly demarcates where the Anusvāra must be used and where a conjunct cluster with a nasal as the first or second component is required.

3.3.5 Nasalization: Candrabindu (◌ঁ, U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
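The code point breakdown given above for 'moon' can be reproduced with Python's standard `unicodedata` module. The helper name `describe` is illustrative only.

```python
import unicodedata


def describe(label: str) -> list:
    """Return 'U+XXXX NAME' for each code point in the label."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in label]


for line in describe("\u099A\u09BE\u0981\u09A6"):  # চাঁদ 'moon'
    print(line)
# U+099A BENGALI LETTER CA
# U+09BE BENGALI VOWEL SIGN AA
# U+0981 BENGALI SIGN CANDRABINDU
# U+09A6 BENGALI LETTER DA
```

The output shows that the candrabindu is stored after the vowel sign it nasalizes, even though it is drawn above the letter.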

3.3.6 Nukta (◌়, U+09BC)
The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi); both the term and the concept of nukta are borrowed in Bangla. The IDNA protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, this LGR must represent the Bangla letters য় YYA, ড় RRA and ঢ় RHA as the following sequences:
- YA + Nukta (U+09AF + U+09BC), instead of the single code point YYA (U+09DF)
- DDA + Nukta (U+09A1 + U+09BC), instead of RRA (U+09DC)
- DDHA + Nukta (U+09A2 + U+09BC), instead of RHA (U+09DD)
This is so even though the use of 'Nukta' is otherwise completely unnatural in Bangla.
In the current Unicode Standard chart, these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA protocol, through a set of procedures developed by the IETF. The Unicode Standard also provides the three characters as atomic code points (09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each entered as a single keystroke). Even so, the IDNA protocol requires that they be treated as two-code-point sequences of a base letter and Nukta, both of which are encoded in the Unicode Bengali block. There may be sporadic cases of Muslim names, Urdu poetic words and Perso-Arabic loanwords being written with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). This is done only for the sake of correct pronunciation and to preserve the sanctity of the loanword, which amounts to using the Bangla writing system like the IPA script. Such usage is, however, not found in printed Bangla.
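The effect of the composition exclusions can be checked with Python's standard `unicodedata.normalize`. NFC turns each atomic letter into its base + Nukta sequence and never recomposes it. In this short sketch, the names `NUKTA_FORMS` and `hexlist` are illustrative.

```python
import unicodedata

# Atomic letter -> base letter + NUKTA sequence used in this LGR.
NUKTA_FORMS = {
    "\u09DC": "\u09A1\u09BC",  # RRA -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # YYA -> YA + NUKTA
}


def hexlist(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


for atomic, sequence in NUKTA_FORMS.items():
    nfc = unicodedata.normalize("NFC", atomic)  # decomposed, never recomposed
    print(hexlist(atomic), "->", hexlist(nfc), nfc == sequence)
# U+09DC -> U+09A1 U+09BC True
# U+09DD -> U+09A2 U+09BC True
# U+09DF -> U+09AF U+09BC True
```

Normalizing the two-code-point sequence leaves it unchanged. That is why the LGR can list only the sequence form.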

3.3.7 Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. An example is দুঃখ duhkho 'sorrow', 'unhappiness' (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো’পরাণি), and it is now rarely used even in other languages that use the Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D)
This note is pertinent to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. Note that Nepali, Konkani and Hindi use these two signs in a different manner.


ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script can choose between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.
Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) 'Satyajit' and সৎ (sat) 'honest'. This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• ZWJ is more important, however, where the same combination of consonantal characters is represented differently depending on the context. For example, র + ্ + য has two representations in Bangla: র্য (ya with repha) and র‍্য (ra with ya-phalā). To get র্য one types র + ্ + য, but র‍্য requires the sequence র + ZWJ + ্ + য [154]. In other words, ZWJ is needed to render words that require ya-phalā after ra. Without it, such words cannot be typed (rendered), because the plain order ra + hasanta + antastha ja in medial and/or final position produces the repha form. Indeed, ra + hasanta + antastha ja is exactly how repha on the consonant antastha ja is typed, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally) etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent an explicit Hasanta or Halant. Where an explicit hasanta precedes the following consonant, ZWNJ is inserted to prevent conjunct formation:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The [Procedure] has ruled out the use of ZWJ and ZWNJ in the root zone. Because these two signs are used in Bangla to create alternate renderings, their insertion can affect searching as well as NLP.


The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented and the Hasanta between the two consonants that would otherwise form the conjunct needs to be shown explicitly.
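The sketch below spells out the three sequences discussed in this section. It also shows what excluding ZWJ/ZWNJ from the root zone means in practice: a label containing either joiner is rejected. If the joiners are simply dropped, র‍্য collapses into the same code points as র্য. The helper names `root_zone_ok` and `strip_joiners` are hypothetical. Dropping joiners is shown only to illustrate the collapse, not as a recommended practice.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER
RA, HASANTA, YA, KA = "\u09B0", "\u09CD", "\u09AF", "\u0995"

# Same three letters, two renderings:
repha_over_ya = RA + HASANTA + YA          # ya with repha, as in কার্য
ra_with_yaphala = RA + ZWJ + HASANTA + YA  # ra with ya-phala, as in র‍্যাপার
# Explicit (visible) Hasanta instead of a conjunct, as in প্রাক্‌কথন:
explicit_hasanta = KA + HASANTA + ZWNJ + KA


def root_zone_ok(label: str) -> bool:
    """Reject labels containing ZWJ or ZWNJ, which the root zone rules out."""
    return ZWJ not in label and ZWNJ not in label


def strip_joiners(label: str) -> str:
    """Hypothetical: remove joiners, losing the rendering distinction."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


print(root_zone_ok(ra_with_yaphala))                    # False
print(strip_joiners(ra_with_yaphala) == repha_over_ya)  # True
```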

3.3.9 Use of Ya-phalaa
The ya-phalaa sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render ya-phalā after অ and এ, one types U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity: ya-phalā plus ā-kāra is used to produce the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ etc. By nature, the Brāhmī script does not place Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. Here, however, Bangla orthography ends up with a deviant feature: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

3.3.10 Formation of Ra-phalaa and Ref Sequences
This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance:
- repha = ra + Hasanta + C, e.g. র্ক (ra + Hasanta + ka), as in অর্ক arka 'the sun'
- ra-phalā = C + Hasanta + ra, e.g. ক্র (ka + Hasanta + ra), as in চক্র chakra 'cycle'

In both cases the ra slot can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), and the shapes of repha and ra-phalā are the same either way. The LGR notes this concern about the two RAs in disguise: in a label so generated, it would be completely impossible to tell them apart with the naked eye. This could lead to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocked) variants is that fully rendered labels can mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). This justifies Variant Set 4, though only in the context of an adjacent Hasanta.
The two RAs can be told apart only by examining their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could therefore be extremely dangerous, since a web user may never check the labels against their code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. REPHA can also occur with KHANDA TA. In that context, C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
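To make the variant relationship concrete, here is a minimal Python sketch that enumerates the labels obtained by swapping U+09B0 and U+09F0. As a simplifying assumption, the swap applies wherever the RA is immediately followed by Hasanta (repha) or immediately preceded by it (ra-phalā). The exact conditions of Variant Set 4 are those in the LGR XML, and `ra_variants` is a hypothetical name.

```python
from itertools import product

BENGALI_RA = "\u09B0"   # BENGALI LETTER RA
ASSAMESE_RA = "\u09F0"  # BENGALI LETTER RA WITH MIDDLE DIAGONAL
HASANTA = "\u09CD"      # BENGALI SIGN VIRAMA
SWAP = {BENGALI_RA: ASSAMESE_RA, ASSAMESE_RA: BENGALI_RA}


def ra_variants(label: str) -> list:
    """All labels reachable by swapping RA forms next to a Hasanta (simplified)."""
    positions = [
        i for i, ch in enumerate(label)
        if ch in SWAP and (
            (i + 1 < len(label) and label[i + 1] == HASANTA)  # repha: RA + Hasanta
            or (i > 0 and label[i - 1] == HASANTA)            # ra-phala: Hasanta + RA
        )
    ]
    variants = set()
    # Try every combination of swapped / not swapped at the eligible positions.
    for choice in product((False, True), repeat=len(positions)):
        chars = list(label)
        for pos, swap in zip(positions, choice):
            if swap:
                chars[pos] = SWAP[chars[pos]]
        variants.add("".join(chars))
    variants.discard(label)  # the original label is not its own variant
    return sorted(variants)


print(ra_variants("অর্ক"))  # one variant, with the Assamese ra
```

For অর্ক the sketch yields one variant, identical on screen. With a blocked disposition, such a variant would not be available for delegation alongside the original.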

4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) is made up of members with experience in linguistics (especially NLP and computational linguistics), literature, language, history and epigraphy. Under the NBGP, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. At the same time, the panel seeks to keep the fundamental philosophy behind these LGRs consistent across all Brāhmī-derived scripts. The present LGR will cater to the multiple languages at EGIDS levels 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These limits consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The root zone code point repertoire is a very special case at the top of the protocol hierarchy. The characters available for selection are therefore already constrained by the protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
If a character needed by the script in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, and especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
As the character-encoding standard intended to represent any given script or language as fully as possible, Unicode has encoded, as far as possible, all the characters needed by the script. Domain names, however, are a specialized case governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters that will be used to create root zone TLDs, which are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on the set of characters allowed by IDNA.


Example: the Bangla Sign Avagraha ঽ (U+09BD), although allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.
To sum up, the first filter admits only characters that are part of the code block of the given script/language. The IDNA protocol narrows this set further, and the Maximal Starting Repertoire then restricts the character set for the given language even more.
4.2.1.1 No Punctuation Marks
Since TLDs are identifiers, the punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), will also not be included.
4.2.1.3 No Rare and Obsolete Characters
Some characters have been added to Unicode to accommodate rare forms. Examples are the Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), together with the allographic -kāra forms of the latter two: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. In Bangla, however, VOWEL SIGN VOCALIC RR ৄ (U+09C4), the -kāra corresponding to VOCALIC RR ৠ (U+09E0), is still in active use in certain borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).

5 Repertoire
Unicode represents the Bangla writing system under the Bengali (Bangla) script name enumerated in ISO 15924. The script is used for languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri, and its Unicode BENGALI block has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes for inclusion in the Bangla LGR.
On 26th February 2016, the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) for the dis-unification of the Bangla and Asamiyā scripts. At its 8th meeting, held on 23rd August 2017, the BIS Indian Language Technologies and Products Sectional Committee (LITD 20) decided to refer the proposal for recognition of an Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes further action, the code point repertoire in Table 8 is assumed to be valid for all three languages.
Language references for each code point are given in the last column, 'References and Comment', of Table 8 (the 'Code Point Repertoire'). To cover all Bangla code points, references are given for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya. Kokborok written in the Bangla script is not known to introduce new complications, except for one particular character. Only a few representative languages at EGIDS levels 1-4 have been chosen for referencing. Together, however, they cover all the code points required by all the languages the NBGP has considered, as given under Bangla Unicode Points (as given in Unicode 6.3).
Before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. The reference glyphs in the code charts below are drawn from one of many available fonts and are not prescriptive, since actual fonts, both Unicode-compatible and TrueType, may vary. Consider the following code point table:


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Of the code points in the Bangla Unicode block (Figure 1) that are included in the MSR, the following symbols need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ◌ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; 09DC is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; 09DD is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; 09DF is the preferred code point, but it is not available for the LGR under the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
53 | U+09C1 | ◌ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
54 | U+09C2 | ◌ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
55 | U+09C3 | ◌ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
56 | U+09C4 | ◌ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
61 | U+09CD | ◌্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122] [125]

Table 8: Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences that enable the conditional inclusion of Bangla LETTER A and LETTER E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, in the repertoire. These sequences make it possible to write the /æ/ sound, as in English 'bat', 'cat', etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference


S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
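For implementers, the two sequences in Table 9 can be expressed as code point tuples. The Python sketch below is illustrative only: `SEQUENCES` and `find_sequences` are hypothetical names, not part of the LGR XML. It prints the Unicode character names of S1 and S2 and reports where either sequence occurs in a label.

```python
import unicodedata
from typing import List, Tuple

# Table 9 sequences as code point tuples (hypothetical helper data).
SEQUENCES = {
    "S1": (0x0985, 0x09CD, 0x09AF, 0x09BE),  # অ + ্ + য + া
    "S2": (0x098F, 0x09CD, 0x09AF, 0x09BE),  # এ + ্ + য + া
}


def find_sequences(label: str) -> List[Tuple[int, str]]:
    """Return (offset, name) for every S1/S2 occurrence in `label`."""
    hits = []
    for name, cps in SEQUENCES.items():
        seq = "".join(chr(cp) for cp in cps)
        start = label.find(seq)
        while start != -1:
            hits.append((start, name))
            start = label.find(seq, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    for name, cps in SEQUENCES.items():
        print(name, " + ".join(unicodedata.name(chr(cp)) for cp in cps))
    # অ্যাসিড ('acid') begins with S1.
    print(find_sequences("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))
```

Because both sequences share the tail <VIRAMA, YA, AA>, only the leading vowel distinguishes S1 from S2.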

5.2 Code Point Repertoire Exclusion

Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only within sequences, as the normalized form of three special characters: ড়, ঢ়, and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ, and U+09AF য to form ড়, ঢ়, and য়, respectively

Table 10b: Excluded Code Points
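The reason the nukta only ever appears inside a sequence can be checked with Unicode normalization. U+09DC, U+09DD and U+09DF decompose canonically to a base consonant plus U+09BC and are composition exclusions, so NFC keeps them in decomposed form; this is consistent with the note in Table 8 that U+09DF is not available to the LGR. The sketch below uses Python's standard `unicodedata` module; the helper `nukta_ok` is a hypothetical illustration of the CN idea in Section 7.1, not the LGR's own formulation.

```python
import unicodedata

NUKTA = "\u09BC"
# Precomposed letter -> the base consonant it decomposes to.
PRECOMPOSED = {
    "\u09DC": "\u09A1",  # ড় -> ড + ়
    "\u09DD": "\u09A2",  # ঢ় -> ঢ + ়
    "\u09DF": "\u09AF",  # য় -> য + ়
}

for pre, base in PRECOMPOSED.items():
    nfc = unicodedata.normalize("NFC", pre)
    # Composition exclusions: NFC leaves these letters decomposed.
    assert nfc == base + NUKTA
    print(f"U+{ord(pre):04X} -> " + " ".join(f"U+{ord(c):04X}" for c in nfc))


def nukta_ok(label: str) -> bool:
    """True if every NUKTA directly follows ড, ঢ or য (the CN sequences)."""
    return all(i > 0 and label[i - 1] in PRECOMPOSED.values()
               for i, ch in enumerate(label) if ch == NUKTA)
```

Running it prints `U+09DC -> U+09A1 U+09BC` and the two analogous lines for U+09DD and U+09DF.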

5.4 The Basis of the Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The ABNF uses the following operators:

Sr. Number | Operator | Function
1 | "|" | Alternative
2 | "[]" | Optional
3 | "*" | Variable Repetition
4 | "()" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence

To facilitate understanding for users of other Brahmi scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel (V). It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D), or a Visarga (biʃɔrgo) (X). More than one of D, B, or X may follow a V in Bangla; however, going by the rules illustrated in this document, formations that repeat the same sign, such as VDD, VBB, and VXX, are invalid orthographic units. It is valid and possible to have an anusvara followed by a chandrabindu on the one hand, and a visarga followed by a chandrabindu on the other, as in হ্যাংচা 'hæŋchā' and 'hæŋ' হ্যাঃ, respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. "Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya'", represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>. Ya-phalā has a special form, ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e., 'aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX], or by a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
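The vowel-sequence combinations in Sections 5.4.2.1 to 5.4.2.3 can be summarised as a regular expression over the ABNF symbols of Section 5.4.1. The sketch below is an illustration under stated assumptions: `to_symbols` classifies code points using our own reading of the repertoire in Table 8 (nukta sequences are not handled), and `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical names.

```python
import re


def to_symbols(text: str) -> str:
    """Map each code point to its ABNF symbol (Section 5.4.1).

    The code point sets are an assumed reading of Table 8; NUKTA and
    anything else outside them become '?'.
    """
    symbols = []
    for ch in text:
        cp = ord(ch)
        if 0x0985 <= cp <= 0x098B or cp in (0x098F, 0x0990, 0x0993, 0x0994):
            symbols.append("V")
        elif (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
              or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)):
            symbols.append("C")
        elif 0x09BE <= cp <= 0x09C4 or cp in (0x09C7, 0x09C8, 0x09CB, 0x09CC):
            symbols.append("M")
        else:
            symbols.append({0x0982: "B", 0x0981: "D", 0x0983: "X",
                            0x09CD: "H", 0x09CE: "Z"}.get(cp, "?"))
    return "".join(symbols)


# V or VHCM, optionally followed by exactly one of B, D, X, DB or DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def is_vowel_sequence(unit: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(to_symbols(unit)) is not None


if __name__ == "__main__":
    for unit in ["\u0985", "\u0985\u0981\u0982",
                 "\u0985\u09CD\u09AF\u09BE\u0983", "\u0985\u0982\u0982"]:
        print(unit, to_symbols(unit), is_vowel_sequence(unit))
```

The pattern accepts any VHCM shape; Appendix 10.1 further limits VHCM to S1 and S2, which this sketch does not check.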

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX].

Examples:

CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB, or DX.

Examples:

CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama): *3(CH)C

Examples:

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य

5.4.3.5 Subsets

When considering its subsets, we take the combination CHC as a representative example; the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB, or DX.

Examples:

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB, or DX.

Examples:

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
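The consonant-sequence combinations of Sections 5.4.3.1 to 5.4.3.5 fold into a similar pattern. This sketch reads `*3(CH)C` as zero to three CH pairs followed by C, then allows an optional M and at most one of B, D, X, DB or DX, with a bare CH (pure consonant) as a separate alternative. The helper names and code point sets are our assumptions, repeated here so that the block runs on its own.

```python
import re


def to_symbols(text: str) -> str:
    """ABNF symbol per code point; sets are an assumed reading of Table 8."""
    symbols = []
    for ch in text:
        cp = ord(ch)
        if 0x0985 <= cp <= 0x098B or cp in (0x098F, 0x0990, 0x0993, 0x0994):
            symbols.append("V")
        elif (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
              or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)):
            symbols.append("C")
        elif 0x09BE <= cp <= 0x09C4 or cp in (0x09C7, 0x09C8, 0x09CB, 0x09CC):
            symbols.append("M")
        else:
            symbols.append({0x0982: "B", 0x0981: "D", 0x0983: "X",
                            0x09CD: "H", 0x09CE: "Z"}.get(cp, "?"))
    return "".join(symbols)


# A bare CH, or up to four consonants joined by H with optional M and B/D/X/DB/DX.
CONSONANT_SEQUENCE = re.compile(r"CH|(?:CH){0,3}CM?(?:DB|DX|B|D|X)?")


def is_consonant_sequence(unit: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(to_symbols(unit)) is not None


if __name__ == "__main__":
    for unit in ["\u09A8\u09CD\u09A4\u09CD\u09B0",          # ন্ত্র (CHCHC)
                 "\u0995\u09CD\u0995\u09C0\u0982",          # ক্কীং (CHCMB)
                 "\u0995\u09C0\u09C0"]:                      # কীী (CMM)
        print(unit, to_symbols(unit), is_consonant_sequence(unit))
```

A five-consonant cluster such as ক্ক্ক্ক্ক fails to match, reflecting the "up to 4" limit of Section 5.4.3.4.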

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰

A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'. Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
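A minimal check for the Khanda-Ta condition of Section 5.4.4.2 might look as follows. It also rejects a label-initial ৎ, which follows from WLE rule 7 in Section 7.1 and from rule 1 of Appendix 10.1. The function name is hypothetical, and the check covers only Khanda-Ta placement, not the other context rules.

```python
RA_LETTERS = {"\u09B0", "\u09F0"}  # র (Bangla) and ৰ (Assamese)
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"


def khanda_ta_ok(label: str) -> bool:
    """Check every KHANDA TA in `label` against Section 5.4.4.

    A KHANDA TA may not open the label, and when a HASANTA precedes it,
    that HASANTA must itself follow a RA ([CH]Z with C = র or ৰ).
    """
    for i, ch in enumerate(label):
        if ch != KHANDA_TA:
            continue
        if i == 0:
            return False
        if label[i - 1] == HASANTA and (i < 2 or label[i - 2] not in RA_LETTERS):
            return False
    return True


if __name__ == "__main__":
    print(khanda_ta_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
    print(khanda_ta_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ
```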

10 Refer to P in Section 7, Table 16.

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the /æ/ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry a Hasanta. As we see here, however, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case concerns the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g., র্ক, i.e., ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g., ক্র, i.e., ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:

Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. Cases belonging to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akṣara formation; they can confuse even a careful observer and are hence proposed as variants.

Variants arise in a script when two or more forms are produced from different stored code points. In Bangla, the e-kāra, ā-kāra, and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial, or final position (as shown below in example 1). It could also be typed wrongly as a হ (ha, U+09B9), increasing the risk of errors in label writing and identification. Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in paragrantha)

The fonts that represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese, and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode code points and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resulting combinations with repha look identical; the addition of repha does not make any difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could cause confusion for end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g., if someone registers "ররর", the variant label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. The blocking applies only at the label level; if someone else needs to register other labels, e.g., "ৰৰ" or "ৰৰৰৰ", it is still possible.
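The effect of declaring র and ৰ a single variant set can be illustrated by enumerating variant labels. In the Python sketch below (hypothetical names; in practice the LGR XML and the registry's variant policy decide the dispositions), every RA in a label may be swapped for its variant, and all resulting labels other than the applied-for one are treated as blocked.

```python
from itertools import product
from typing import Set

BANGLA_RA = "\u09B0"    # র
ASSAMESE_RA = "\u09F0"  # ৰ
# One variant set {র, ৰ}: each RA may be replaced by either member.
VARIANT_SET = {BANGLA_RA: (BANGLA_RA, ASSAMESE_RA),
               ASSAMESE_RA: (BANGLA_RA, ASSAMESE_RA)}


def variant_labels(label: str) -> Set[str]:
    """All labels obtained by swapping any RA for its variant (label included)."""
    choices = [VARIANT_SET.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


def blocked_variants(label: str) -> Set[str]:
    """Variant labels other than the registered one; these would be blocked."""
    return variant_labels(label) - {label}


if __name__ == "__main__":
    for variant in sorted(blocked_variants(BANGLA_RA * 3)):
        print(variant)
```

Since swapping never changes a label's length, labels such as ৰৰ or ৰৰৰৰ are not produced from ররর, which matches the observation above that they remain registrable.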

6.2 Cross-Script Variants

A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī, and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
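The cross-script pairs in Tables 14 and 15 only yield a cross-script variant label when every code point of a Bangla label has a counterpart. The following sketch (hypothetical function and table names; the actual treatment of cross-script variants in the Root Zone LGR may differ) makes that whole-label condition explicit.

```python
from typing import Dict, Optional

# Cross-script variant pairs from Tables 14 and 15.
TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম -> म, ি -> ि
TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম -> ਸ, ি -> ਿ


def cross_script_variant(label: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the whole-label variant in the other script, or None when
    some code point of `label` has no cross-script variant."""
    if not label or any(ch not in mapping for ch in label):
        return None
    return "".join(mapping[ch] for ch in label)


if __name__ == "__main__":
    print(cross_script_variant("\u09AE\u09BF", TO_DEVANAGARI))  # মি -> मि
    print(cross_script_variant("\u09AE\u09BF", TO_GURMUKHI))    # মি -> ਸਿ
    print(cross_script_variant("\u0995\u09BF", TO_DEVANAGARI))  # কি has no variant
```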

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code Point Repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1 or S2 (from Table 9), i.e., the (æ) Ya-phalā sequence V1 H C1 M1, where V1 is either U+0985 (অ BENGALI LETTER A) or U+098F (এ BENGALI LETTER E), H is U+09CD (্ BENGALI SIGN VIRAMA), C1 is U+09AF (য BENGALI LETTER YA), and M1 is U+09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either U+09B0 (র BENGALI LETTER RA) or U+09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is U+09CD (্ BENGALI SIGN VIRAMA).

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the following are the specific WLE rules that need to be implemented:

1. C is the set of C and CN, where CN is the set of normalized forms of ড় ঢ় য়.
2. H must be preceded by C.
3. M must be preceded by C. Example: কা
4. D must be preceded by V, C, or M. Example: আঁ খঁ খাঁ হাঁ
5. X must be preceded by V, C, M, or D. Example: উঃ খঃ বঃ দঁঃ
6. B must be preceded by V, C, M, or D. Example: আং ইং কং
7. Z must be preceded by V, C, M, D, B, X, or P. Example: ইৎ কৎ (S is not listed because S ends with M, so Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
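The nine rules above are all phrased in terms of the preceding code point, so they can be checked in a single left-to-right pass. The Python sketch below is one possible reading, not the normative LGR: the category sets are our interpretation of Table 8, CN is modelled as ড, ঢ or য followed by U+09BC, and the hasanta inside S1/S2 is exempted as Table 16 states. `check_label` and the other names are hypothetical.

```python
from typing import List

S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2
RA_LETTERS = {"\u09B0", "\u09F0"}             # র and ৰ: the C2 of P = C2 H
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড ঢ য: the bases of CN


def category(ch: str) -> str:
    """Table 16 symbol for one code point (assumed reading of Table 8)."""
    cp = ord(ch)
    if 0x0985 <= cp <= 0x098B or cp in (0x098F, 0x0990, 0x0993, 0x0994):
        return "V"
    if (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
            or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)):
        return "C"
    if 0x09BE <= cp <= 0x09C4 or cp in (0x09C7, 0x09C8, 0x09CB, 0x09CC):
        return "M"
    return {0x0982: "B", 0x0981: "D", 0x0983: "X",
            0x09CD: "H", 0x09CE: "Z", 0x09BC: "N"}.get(cp, "?")


# Rules 2-6: categories allowed for the preceding code point.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
}


def check_label(label: str) -> List[str]:
    """Return rule violations; an empty list means the label passes."""
    errors = []
    prev = None  # category of the previous code point; None at the label start
    for i, ch in enumerate(label):
        cat = category(ch)
        if cat == "?":
            errors.append(f"{i}: U+{ord(ch):04X} is not in the repertoire")
        elif cat == "N":
            # Rule 1: NUKTA only inside CN (ড় ঢ় য়); the pair counts as one C.
            if i == 0 or label[i - 1] not in NUKTA_BASES:
                errors.append(f"{i}: NUKTA not after ড, ঢ or য")
            cat = "C"
        elif cat == "V":
            if prev == "H":  # rules 8 and 9 (S also begins with a V)
                errors.append(f"{i}: V preceded by H")
        elif cat == "Z":
            # Rule 7: V, C, M, D, B, X or P (RA + H) must precede Z.
            after_p = prev == "H" and i >= 2 and label[i - 2] in RA_LETTERS
            if prev not in {"V", "C", "M", "D", "B", "X"} and not after_p:
                errors.append(f"{i}: Z not preceded by V, C, M, D, B, X or P")
        elif cat == "H" and i >= 1 and label[i - 1:i + 3] in S_SEQUENCES:
            pass  # the H inside S1/S2 is allowed despite rule 2
        elif cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            errors.append(f"{i}: {cat} not preceded by {sorted(ALLOWED_BEFORE[cat])}")
        prev = cat
    return errors


if __name__ == "__main__":
    for label in ["\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1",  # অ্যাসিড
                  "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE",  # ভর্ৎসনা
                  "\u0995\u0982\u0983"]:                          # কংঃ
        print(label, check_label(label) or "OK")
```

A single pass suffices because no rule looks further back than two code points (the RA of P), which keeps the check linear in the label length.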

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese, and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures, because any user can easily create them even if they are not attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g., ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered sufficient to represent the sound intended for the first word.

This is a unique situation, necessitated by the absence of the hyphen, the space, and the Zero Width Non-Joiner character from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, if required by the prevailing needs of the community, the NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D, or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक

As can be seen, such combinations automatically result in a "golu", or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M, or S. Example:

অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D, or X permitted after a consonant, vowel, or kāra (Mātrā) is restricted to one; thus the following combinations are invalid. Example:

কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৌ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible. Example:

কংঃ कंः
কঃং कःं
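The invalid shapes listed in Section 7.2 can also be expressed as regular expressions over a symbol string, which gives an independent cross-check of the context rules in Section 7.1. In this sketch (hypothetical names; category sets assumed from Table 8), each S1/S2 occurrence is first collapsed to a single symbol `S`, so that its internal hasanta is not flagged.

```python
import re
from typing import List

S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2


def to_symbols(label: str) -> str:
    """Table 16 symbols, with each S1/S2 collapsed to a single 'S'."""
    for s in S_SEQUENCES:
        label = label.replace(s, "\x00")  # NUL never occurs in a real label
    out = []
    for ch in label:
        cp = ord(ch)
        if ch == "\x00":
            out.append("S")
        elif 0x0985 <= cp <= 0x098B or cp in (0x098F, 0x0990, 0x0993, 0x0994):
            out.append("V")
        elif (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
              or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)):
            out.append("C")
        elif 0x09BE <= cp <= 0x09C4 or cp in (0x09C7, 0x09C8, 0x09CB, 0x09CC):
            out.append("M")
        else:
            out.append({0x0982: "B", 0x0981: "D", 0x0983: "X",
                        0x09CD: "H", 0x09CE: "Z"}.get(cp, "?"))
    return "".join(out)


# The invalid shapes of Section 7.2 (points 1 to 6), over the symbol string.
INVALID_SHAPES = {
    "starts with H, M, B, D or X": re.compile(r"^[HMBDX]"),
    "H after V, B, D, X, M or S": re.compile(r"[VBDXMS]H"),
    "repeated B, D or X": re.compile(r"BB|DD|XX"),
    "more than one M": re.compile(r"[MS]M"),
    "M after V": re.compile(r"VM"),
    "Anusvara next to Visarga": re.compile(r"BX|XB"),
}


def violations(label: str) -> List[str]:
    symbols = to_symbols(label)
    return [name for name, rx in INVALID_SHAPES.items() if rx.search(symbols)]


if __name__ == "__main__":
    for label in ["\u0982\u0995", "\u0995\u09C0\u09C0", "\u0985\u09CD\u09AF\u09BE"]:
        print(label, to_symbols(label), violations(label))
```

Because S ends with M, the pattern `[MS]M` also rejects an extra kāra after S, in line with point 4.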

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon-Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29.2: 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: The Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What Has Happened So Far in Terms of Script Reforms'. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature; when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa-ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla generally does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, such as the word য়্যাব্বড়ো (yæbbɔṛo) from the poem 'Lichu-chor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa-ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only where C is ra (র, U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM are allowed:
→ অ্যা (together pronounced as /æ/), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as /æ/), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
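The label-level restrictions in rules 1 and 3 above can be sketched as follows. `NOT_LABEL_INITIAL` and `appendix_restrictions` are hypothetical names. The য় caveat in rule 1 is deliberately not enforced, since the text notes attested exceptions. Note also that under rule 1 as written, a transliteration such as ৎসেরিং would be rejected.

```python
from typing import List

# Rule 1: letters that may not start a label.
NOT_LABEL_INITIAL = {
    "\u09CE": "KHANDA TA",
    "\u099E": "NYA",
    "\u0999": "NGA",
}
# Rule 3: the only permitted VHCM combinations are S1 and S2 (Table 9).
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")
VIRAMA = "\u09CD"


def is_independent_vowel(ch: str) -> bool:
    return 0x0985 <= ord(ch) <= 0x0994


def appendix_restrictions(label: str) -> List[str]:
    problems = []
    if label and label[0] in NOT_LABEL_INITIAL:
        problems.append(f"label starts with {NOT_LABEL_INITIAL[label[0]]}")
    for i, ch in enumerate(label):
        # A VIRAMA right after an independent vowel is only legal inside S1/S2.
        if (ch == VIRAMA and i > 0 and is_independent_vowel(label[i - 1])
                and label[i - 1:i + 3] not in S_SEQUENCES):
            problems.append(f"VHCM at offset {i - 1} is neither S1 nor S2")
    return problems


if __name__ == "__main__":
    for label in ["\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1",  # অ্যাসিড
                  "\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"]:        # ৎসেরিং
        print(label, appendix_restrictions(label) or "OK")
```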

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī', or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ Thisletterdoesnothaveanyparticularphoneticvaluebutmostlypronouncedasn

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার etc.). After a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক8(p+য) স8(+য) র 8(+য) [র‍্য alone is never used in Bangla orthography. র‍্যা is used, but its last two symbols, ya-phala + a-kara, then constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:

i. রেফ repʰ, as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster) etc. (placed over the following consonant);

ii. র-ফলা rɔ-pʰɔla, as the second/third member of a cluster, e.g. etc. (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


in Bangla, or Mātrā in Nāgarī⁹, is attached to the consonant. Because the consonant carries this built-in neutral vowel, there are equivalent kāras (Mātrās) for all vowels except অ (pronounced ɔ). The correlation is shown as follows:

Vowel | Corresponding vowel sign (kāra/Mātrā)
অ ‘A’ U+0985 | -
আ ‘AA’ U+0986 | া U+09BE
ই ‘I’ U+0987 | ি U+09BF
ঈ ‘II’ U+0988 | ী U+09C0
উ ‘U’ U+0989 | ু U+09C1
ঊ ‘UU’ U+098A | ূ U+09C2
ঋ Vocalic ‘R’ U+098B | ৃ U+09C3
ৠ Vocalic ‘RR’ U+09E0 | ৄ U+09C4
ঌ Vocalic ‘L’ U+098C | ৢ U+09E2
ৡ Vocalic ‘LL’ U+09E1 | ৣ U+09E3
এ ‘E’ U+098F | ে U+09C7
ঐ ‘AI’ U+0990 | ৈ U+09C8
ও ‘O’ U+0993 | ো U+09CB
ঔ ‘AU’ U+0994 | ৌ U+09CC
- | ৗ U+09D7
Could appear on top of অ ‘A’ U+0985 or any other vowel | ঁ U+0981 Candrabindu
Could appear after অ ‘A’ U+0985 or any other vowel | ং U+0982 Anusvara
Could appear after অ ‘A’ U+0985 or any other vowel | ঃ U+0983 Visarga
After any consonant | ্ U+09CD (Hasanta)
- | ঽ U+09BD Avagraha

Table 7: Bangla Vowels with corresponding kāras

⁹ In Bangla, however, the term ‘Mātrā’ stands for an altogether different concept: the top bar placed over a letter. This bar is typically found in Hindi and Bangla but is missing in Gujarati.

3.3.4 The Anusvāra onuʃʃār (ং - U+0982)

The Anusvāra, or onuʃʃār, in Bangla at times represents a homorganic nasal, but not always. It replaces a conjunct group of the form ‘Nasal Consonant + Hasanta + Consonant’ where the second consonant belongs to the velar varga (set), as in লংকা. It also often appears in such combinations when the last member is a non-velar, as in ল্যাংটা “naked” or ল্যাংচা “a kind of sweet; to limp”. Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined spelling using the corresponding nasal consonant of that set. Modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal. Bangla, by contrast, clearly demarcates where one must use the Anusvāra and where a conjunct cluster with a nasal as the first or second component is required.

3.3.5 Nasalization: Candrabindu (ঁ - U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cad ‘moon’ (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].
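The storage order in this example can be checked directly. The minimal Python sketch below uses only the standard `unicodedata` module; the helper name `code_points` is ours, for illustration only. It lists the code points of চাঁদ and shows that the Candrabindu is stored after the vowel sign, i.e. after the whole syllable চা that it nasalizes.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as 'U+XXXX' strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

# চাঁদ 'moon': CA + VOWEL SIGN AA + CANDRABINDU + DA (the sequence quoted in 3.3.5)
moon = "\u099A\u09BE\u0981\u09A6"

for cp, ch in zip(code_points(moon), moon):
    print(cp, unicodedata.name(ch))
# U+099A BENGALI LETTER CA
# U+09BE BENGALI VOWEL SIGN AA
# U+0981 BENGALI SIGN CANDRABINDU
# U+09A6 BENGALI LETTER DA
```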

3.3.6 Nukta (় - U+09BC)

The nukta sign does not exist in Bangla orthography. It is used predominantly in many Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). Both the term and the concept of nukta are borrowed into Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The Unicode Standard's definition of NFC contains a number of composition exclusions. As a result, this LGR has to represent the Bangla letters য় YYA, ড় RRA and ঢ় RHA by sequences instead of by the single code points YYA (U+09DF), RRA (U+09DC) and RHA (U+09DD):
• YA + Nukta (U+09AF + U+09BC)
• DDA + Nukta (U+09A1 + U+09BC)
• DDHA + Nukta (U+09A2 + U+09BC)
The use of ‘Nukta’ is otherwise completely unnatural in Bangla.

Note that the current Unicode Standard chart lists these characters as additional consonants. Under the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. The Unicode Standard also prescribes ways to produce these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke). Even so, the IDNA protocol requires that we treat them as consonant + Nukta sequences rather than as the atomic code points allocated to them in the Unicode Bengali block.

There may be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph). This is done only to indicate correct pronunciation and to preserve the sanctity of the loanword, much like making the Bangla writing system work as an IPA-style script. Such forms are, however, not used in printed Bangla.
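The effect of the composition exclusions can be observed with Python's standard `unicodedata.normalize`. This is a small illustrative sketch, and `nfc_code_points` is a hypothetical helper name. Normalizing each precomposed letter to NFC yields exactly the consonant + Nukta sequence that this LGR uses. NFC never recomposes these sequences, so the sequence is the only form a valid label can carry.

```python
import unicodedata

# Precomposed letters that are Unicode composition exclusions (see 3.3.6).
PRECOMPOSED = {
    "RRA": "\u09DC",  # ড়
    "RHA": "\u09DD",  # ঢ়
    "YYA": "\u09DF",  # য়
}

def nfc_code_points(text):
    """Normalize text to NFC and return its code points as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

for name, ch in PRECOMPOSED.items():
    print(name, nfc_code_points(ch))
# RRA ['U+09A1', 'U+09BC']
# RHA ['U+09A2', 'U+09BC']
# YYA ['U+09AF', 'U+09BC']
```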

3.3.7 Visarga biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. An example is দুঃখ duhkho ‘sorrow’, ‘unhappiness’ (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি is re-written as নরো’পরাণি), and it is now rarely used even in other languages that use the Bangla script. The Avagraha is therefore not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs under the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

This note concerns the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) as used in Bangla. Note that Nepali, Konkani and Hindi use these two signs differently.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has a choice between joining and non-joining forms. Without these control codes, the string may be rendered in a different form from the one intended.

Use of ZWJ

• In Bangla, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) “Satyajit” and সৎ (sat) “honest”. This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• ZWJ matters even more where the same combination of consonantal characters is represented differently depending on context. For example, র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য, one types র + ্ + য. For র‍্য, the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is needed to render words that require ya-phalā after ra. Without it, such words cannot be typed (rendered), because the same order ra + hasanta + antastha ja occurs in medial and/or final position. Ra + hasanta + antastha ja is in fact used to type repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally) etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• In Bangla, ZWNJ is used to represent an explicit Hasanta (Halant). Where an explicit hasanta stands before the following consonant, ZWNJ is used to prevent conjunct formation:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The use of ZWJ/ZWNJ has been ruled out of the root zone by the [Procedure]. Because these two signs are used in Bangla to create alternate renderings, inserting them can affect searching as well as NLP.


The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly suppressed, so that the Hasanta between the two consonants is shown explicitly.
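The short Python sketch below illustrates why the two joiners matter. It uses plain string operations only, and the variable and function names are illustrative rather than part of any LGR tooling. In both pairs described above, the distinction between the two renderings is carried solely by an invisible ZWJ or ZWNJ. Without these code points, which the root zone does not permit, each pair collapses into a single code point sequence.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"
RA, YA, KA, HASANTA = "\u09B0", "\u09AF", "\u0995", "\u09CD"

# র + ্ + য renders as repha over ya (as in কার্য)
repha_ya = RA + HASANTA + YA
# র + ZWJ + ্ + য renders as ya-phala after ra (as in র‍্যাপার)
ra_ya_phala = RA + ZWJ + HASANTA + YA

# ক + ্ + ক forms the default conjunct ক্ক
conjunct = KA + HASANTA + KA
# ক + ্ + ZWNJ + ক keeps the Hasanta visible (as in প্রাক্‌কথন)
explicit_hasanta = KA + HASANTA + ZWNJ + KA

def without_joiners(text):
    """Drop ZWJ and ZWNJ, leaving only what a root-zone label can express."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")

print(without_joiners(ra_ya_phala) == repha_ya)       # True
print(without_joiners(explicit_hasanta) == conjunct)  # True
```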

3.3.9 Use of Ya-phalaa

Ya-phalaa sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render Ya-phalā after অ and এ, one types U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity: Ya-phalā plus ā-kāra is used to represent the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ etc. By nature, the Brāhmī script does not allow Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. Bangla orthography thus has a deviant feature here: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
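The two sequences can be spelled out code point by code point with Python's standard `unicodedata.name`. This is an illustrative sketch only, and `describe` is a hypothetical helper. The output makes the deviant feature visible: in both sequences the VIRAMA is stored directly after an independent vowel letter, not after a consonant.

```python
import unicodedata

SEQ_A = "\u0985\u09CD\u09AF\u09BE"  # অ্যা
SEQ_E = "\u098F\u09CD\u09AF\u09BE"  # এ্যা

def describe(seq):
    """Join the Unicode character names of seq with ' + '."""
    return " + ".join(unicodedata.name(ch) for ch in seq)

print(describe(SEQ_A))
# BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA
print(describe(SEQ_E))
# BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA
```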

3.3.10 Formation of Ra-phalaa and Ref Sequences

This case concerns the formation of repha and ra-phalā, as follows:

Ra-Hasanta = (C2H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

When RA co-occurs with HASANTA, it either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance:
• repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”)
• ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”)
In both cases the slot for ra can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā are the same either way.

The LGR notes this concern about the two RAs in disguise. In a label so generated, it would be impossible to distinguish between them with the naked eye, which may lead to spoofing and other kinds of cyber irregularities. These two code points are classed as (blocking) variants because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). This is the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can only be told apart by inspecting their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could therefore be extremely dangerous, since web users may never check the digital content (the labels) against its Unicode values (code points). This point is made explicitly with reference to Table 9 (sequences, p. 36) and Table 16 (WLE Symbols, p. 47), which follow.

Note also that REPHA can occur with KHANDA TA. In the context of KHANDA TA, the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
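Here is a minimal sketch of how such blocked variants could be enumerated. It assumes, as the examples অভ্র and শ্রম suggest, that an RA is affected when a Hasanta stands on either side of it. The text phrases the Variant Set 4 context as a following Hasanta, so the normative rule in the XML LGR may be narrower. The function names are hypothetical, and the code does not reproduce the LGR's actual variant machinery.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"
OTHER_RA = {BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}

def ra_positions(label):
    """Indices of RAs followed (repha) or preceded (ra-phala) by Hasanta (assumption)."""
    positions = []
    for i, ch in enumerate(label):
        if ch not in OTHER_RA:
            continue
        followed = i + 1 < len(label) and label[i + 1] == HASANTA
        preceded = i > 0 and label[i - 1] == HASANTA
        if followed or preceded:
            positions.append(i)
    return positions

def ra_variants(label):
    """All labels obtained by swapping any subset of those RAs (original excluded)."""
    positions = ra_positions(label)
    variants = set()
    for choice in product((False, True), repeat=len(positions)):
        chars = list(label)
        for pos, swap in zip(positions, choice):
            if swap:
                chars[pos] = OTHER_RA[chars[pos]]
        variants.add("".join(chars))
    variants.discard(label)
    return sorted(variants)

arka = "\u0985\u09B0\u09CD\u0995"  # অর্ক
print(ra_variants(arka) == ["\u0985\u09F0\u09CD\u0995"])  # True
```

Enumerating every subset reflects the fact that a label with two such RAs has three look-alike variants, not one.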

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) consists of members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the NBGP, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR for each. An attempt is nevertheless made to keep the fundamental philosophy behind these LGRs consistent across all Brāhmī-derived scripts. The present LGR will cater to the multiple languages of EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the code points of the Bangla LGR.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selecting code points for the code-point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage

Every character proposed should be in everyday use by a particular linguistic community. Characters that have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous Use

Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These limits consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but are spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope

The root zone code point repertoire is a very special case at the top of the protocol hierarchy. The set of characters available for it is therefore already constrained by the protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart
If a character needed by the script in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially for the Bangla-Asamiyā-Manipuri Writing System, given the elaborate and exhaustive character inclusion efforts of the Unicode Consortium.

ii. IDNA Protocol
As the character-encoding standard aiming at the fullest possible representation of a given script/language, Unicode has encoded, as far as possible, every character the script needs. Domain names, however, are a specialized case governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from domain names.

iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters used to create Root Zone TLDs, which are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters.


Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), although allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire under the MSR.

To sum up, the restrictions start by admitting only characters that belong to the code block of the given script/language. The IDNA Protocol narrows this set further. Finally, the Maximal Starting Repertoire acts as an additional filter that restricts the character set for the given language even more.

4.2.1.1 No Punctuation Marks

Since TLDs are identifiers, punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), will also not be included.

4.2.1.3 No Rare and Obsolete Characters

Some characters have been added to Unicode to accommodate rare forms:
• the Sanskritic VOCALIC RR ৠ (U+09E0)
• VOCALIC L ঌ (U+098C)
• VOCALIC LL ৡ (U+09E1)
• the allographic -kara forms of the latter two: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3)
All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. There is one exception. The -kara corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in Bangla in certain limited borrowed or Sanskritic words, and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit will not be included. This is also in accordance with the Letter principle laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF

The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
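The three successive filters of 4.2.1 amount to repeated set restriction. The Python sketch below is illustrative only. Instead of the real Unicode chart, IDNA2008 derived-property table (RFC 5892) and MSR, it uses small hand-picked excerpts, and all names are hypothetical. U+093D, the Devanagari avagraha, stands in for a character outside the Bengali block.

```python
# Illustrative excerpts only (assumption): the real inputs are the Unicode
# code chart, the IDNA2008 derived property table (RFC 5892) and the MSR.
BENGALI_BLOCK = range(0x0980, 0x0A00)

# Excerpt: Bengali-block symbols that IDNA2008 does not allow.
IDNA_DISALLOWED = {0x09F9, 0x09FA}  # currency denominator sixteen, isshar

# Excerpt: IDNA-valid code points that the MSR keeps out of the root zone.
MSR_EXCLUDED = {0x09BD}  # avagraha

def root_zone_candidates(code_points):
    """Apply the three filters of 4.2.1 in order and return what survives."""
    step1 = {cp for cp in code_points if cp in BENGALI_BLOCK}  # i. script block
    step2 = step1 - IDNA_DISALLOWED                            # ii. IDNA protocol
    step3 = step2 - MSR_EXCLUDED                               # iii. MSR
    return step3

sample = {0x0995, 0x093D, 0x09BD, 0x09F9, 0x09FA}
print(sorted(hex(cp) for cp in root_zone_candidates(sample)))  # ['0x995']
```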

5 Repertoire

In UNICODE, the Bangla Writing System is represented under the Bengali (Bangla) script name enumerated in ISO 15924, covering languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR.

On 26th February 2016, the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) for dis-unification of the Bangla and Asamiyā scripts. At the 8th Meeting of its Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23rd Aug 2017, the BIS decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes further action, the Code Point Repertoire in Table 8 is assumed to be valid for all three languages above.

Language references for each code point are given in the last column, titled Reference, of Table 8, the “Code Point Repertoire”. For full coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to introduce new complications, apart from one particular character. Only a few representative languages of EGIDS Scale 1-4 have been chosen for referencing. Together, however, they cover all the code points required for all the languages NBGP has considered, as listed under Bangla Unicode Points (UNICODE 6.3).

Before presenting the details, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. The shapes of the reference glyphs in the code charts below are based on one of many available fonts and are not prescriptive, since actual fonts, both UNICODE-compatible and True-Type, may vary. Consider the following code point table:


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
• All characters that are included in the [MSR]: yellow background
• PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
• Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block in Figure 1, the following code points included in the MSR need separate treatment:
• ৎ U+09CE Bangla Letter Khanda-Ta
• ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
• ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]


8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]


15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]


22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]


28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]. 09DC is the preferred code point; however, it is not available for the LGR under the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]. 09DD is the preferred code point; however, it is not available for the LGR under the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]


33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]. 09DF is the preferred code point; however, it is not available for the LGR under the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122] [125]

Table 8: Bangla Code-Point Repertoire

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. They allow BENGALI LETTER A and E to be followed by BENGALI SIGN VIRAMA, BENGALI LETTER YA and BENGALI VOWEL SIGN AA, so that the æ sound, as in English ‘bat’, ‘cat’ etc., can be written.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference


S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
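Table 9 can be read as a check: a Hasanta may follow an independent vowel letter only as part of S1 or S2. The Python below is a hypothetical sketch of that single condition, with names of our own choosing. It is not the whole-label evaluation rules of Section 7.

```python
VIRAMA = "\u09CD"
# Independent vowel letters in the repertoire (Table 8, rows 4-14).
VOWEL_LETTERS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
# Table 9: the only sequences in which a vowel letter may carry a Hasanta.
ALLOWED_SEQUENCES = {
    "\u0985\u09CD\u09AF\u09BE",  # S1 অ্যা
    "\u098F\u09CD\u09AF\u09BE",  # S2 এ্যা
}

def vowel_hasanta_ok(label):
    """Return False if a Hasanta follows a vowel letter outside S1/S2."""
    for i in range(1, len(label)):
        if label[i] == VIRAMA and label[i - 1] in VOWEL_LETTERS:
            if label[i - 1:i + 3] not in ALLOWED_SEQUENCES:
                return False
    return True

print(vowel_hasanta_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: True
print(vowel_hasanta_ok("\u0986\u09CD\u09AF\u09BE"))                    # আ + Hasanta: False
```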

5.2 Code Point Repertoire Exclusion

Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Points | Glyph | Character Names | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a stand-alone code point, since it is never used alone. It is used only within the sequences that form the normalized representations of three special characters: ড় ঢ় য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ◌় | BENGALI SIGN NUKTA | Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড় ঢ় য় respectively

Table 10b: Excluded Code Points
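Taken together, Table 8 and Table 10b amount to a simple membership test. The Python sketch below is hypothetical: the names are ours, and it is not the XML LGR. It accepts a label only if every code point is in Table 8 and NUKTA appears only directly after ড, ঢ or য. It normalizes to NFC first, so a precomposed U+09DC is checked as ড + NUKTA.

```python
import unicodedata

# Single code points of Table 8 (rows 1-27, 29, 31-42, 44-64).
REPERTOIRE = set(
    [0x0981, 0x0982, 0x0983]
    + list(range(0x0985, 0x098C))      # অ .. ঋ
    + [0x098F, 0x0990, 0x0993, 0x0994]
    + list(range(0x0995, 0x09A9))      # ক .. ন
    + list(range(0x09AA, 0x09B1))      # প .. র
    + [0x09B2]                         # ল
    + list(range(0x09B6, 0x09BA))      # শ .. হ
    + list(range(0x09BE, 0x09C5))      # া .. ৄ
    + [0x09C7, 0x09C8, 0x09CB, 0x09CC, 0x09CD, 0x09CE, 0x09F0, 0x09F1]
)
NUKTA = 0x09BC
NUKTA_BASES = {0x09A1, 0x09A2, 0x09AF}  # ড ঢ য (rows 28, 30, 43)

def in_repertoire(label):
    """Check an NFC label against Table 8, allowing NUKTA only after ড, ঢ or য."""
    cps = [ord(ch) for ch in label]
    for i, cp in enumerate(cps):
        if cp == NUKTA:
            if i == 0 or cps[i - 1] not in NUKTA_BASES:
                return False
        elif cp not in REPERTOIRE:
            return False
    return True

def check(label):
    """Normalize to NFC first, as IDNA requires, then test the repertoire."""
    return in_repertoire(unicodedata.normalize("NFC", label))

print(check("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা: True
print(check("\u09BD"))                          # avagraha: False
```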

5.4 The Basis of the Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (in different versions) done for भारत or ভারত, drafted between 20 November 2009 and 18 July 2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) begins with the following variables:
C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) uses the following operators:

1. “|”: Alternative
2. “[ ]”: Optional
3. “*”: Variable Repetition
4. “( )”: Sequence Group

Table 11: The ABNF Formalism


5.4.2 The Vowel Sequence

The following subsections give the Vowel Sequence and the Consonant Sequence pertinent to Bangla. Devanāgarī equivalents are provided where helpful for users of other Brāhmī scripts. A vowel sequence is made up of a single vowel, optionally followed by an Anusvāra onuʃʃār (B), a Candrabindu (D) or a Visarga biʃɔrgo (X). The number of D, B or X signs following a V in Bangla is not necessarily restricted to one. Going by the rules illustrated in this document, however, formations such as VDD, VBB and VXX are invalid orthographic units. On the other hand, an anusvara followed by a chandrabindu, and a visarga followed by a chandrabindu, are valid and possible sequences, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. A Visarga or Anusvāra (onuʃʃār) following a Candrabindu is also possible in Bangla.

A vowel can optionally be followed by Hasanta/Virama [H] plus Consonant [C] to form a Ya-phala. Ya-phala is a presentation form of U+09AF, the Bangla letter য (‘ya’), and is represented by the sequence <U+09CD BENGALI SIGN VIRAMA (the Bangla Hasanta or Virama), U+09AF য BENGALI LETTER YA>. It has a special form, ্য. Combined with U+09BE া BENGALI VOWEL SIGN AA (‘aa’, ā), it is used to transcribe [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by any one of the following:
• Anusvāra [B]
• Candrabindu [D]
• Visarga [X]
• Candrabindu + Anusvāra [DB]
• Candrabindu + Visarga [DX]
• Hasanta (or Virama) [H] + Consonant [C] + kāra (Mātrā) [M]

Examples:

VB অং अं

VD অঁ अँ

VX অঃ अः

VDB অঁং अँं

VDX অঁঃ अँः

VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং এ্যাং

VHCMD অ্যাঁ এ্যাঁ

VHCMX অ্যাঃ এ্যাঃ

VHCMDB অ্যাঁং এ্যাঁং

VHCMDX অ্যাঁঃ এ্যাঁঃ

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by any one of the following:
• a dependent vowel sign, kāra (Mātrā) [M]
• Anusvāra [B]
• Candrabindu [D]
• Visarga [X]
• Hasanta (also known as Virama) [H]
• Candrabindu + Anusvāra [DB]
• Candrabindu + Visarga [DX]

Example

CM িকক -कक

CB কং कं

CD কঁ कँ

CX কঃ कः

CH ক্ क् (pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Example CMB কীংকং कक

CMD কাঁ काँ

CMX বীঃ वीः

CMDB কাঁং काँं

CMDX কাঁঃ काँः

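5.4.3.4 Sequence of Consonants

A sequence of up to four consonants joined by Hasanta (also known as Virama):

*3(CH)C

Example:

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त

CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र

CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य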

5.4.3.5 Subsets

The combination CHC is used below as a representative example of the subsets. The same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी

CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं

CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ

CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः

CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं

CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं

sup3ং rarr ক ক ং 4क rarr क क

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ

CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः

CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं

CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰

A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’. Note: in this Khanda-Ta context, C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
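The production rules of 5.4.2 to 5.4.4 can be approximated with a regular expression. The Python sketch below is an illustration under stated assumptions: the character classes are our reading of Table 8, and the names are hypothetical. The optional tail [D][B|X] is a compact way of writing the alternatives none, B, D, X, DB and DX listed above.

```python
import re

# Character classes from Table 8 (a hypothetical encoding of the ABNF variables).
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"                    # vowels
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"                             # ড় ঢ় য় as base + NUKTA
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")  # other consonants
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                     # kāras
B, D, X, H, Z = "\u0982", "\u0981", "\u0983", "\u09CD", "\u09CE"
RA = "[\u09B0\u09F0]"                                             # Bangla or Assamese RA

TAIL = f"{D}?(?:{B}|{X})?"                             # none, B, D, X, DB or DX
VOWEL_SEQ = f"{V}(?:{H}{C}{M})?{TAIL}"                 # 5.4.2: V [HCM] tail
CONS_SEQ = f"(?:{C}{H}){{0,3}}{C}(?:{H}|{M}?{TAIL})"   # 5.4.3: *3(CH)C ...
KHANDA_SEQ = f"(?:{RA}{H})?{Z}"                        # 5.4.4: [CH]Z with C = RA

LABEL = re.compile(f"(?:{VOWEL_SEQ}|{CONS_SEQ}|{KHANDA_SEQ})+")

def matches_sequences(label):
    """True if label splits entirely into vowel, consonant and Khanda-Ta sequences."""
    return LABEL.fullmatch(label) is not None

print(matches_sequences("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা: True
print(matches_sequences("\u0985\u0982\u0982"))              # VBB: False
```

This sketch checks only the shape of each unit. Several constraints belong instead to the whole-label evaluation rules and are not enforced here:
• the four-consonant limit across unit boundaries (a unit may end in Hasanta);
• the RA-only condition of 5.4.4.2 (a consonant + Hasanta unit may precede ৎ);
• the S1/S2 restriction of Table 9.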

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as the 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (Cf. Unicode 10.0, p. 473 [100]).

The other case, P, refers to the formation of repha and ra-phalā. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.
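A minimal sketch of how the two special cases could be located in a label, assuming a label is a plain Python string of code points. The function name find_special_cases is hypothetical, and only the P definition of Table 16 (RA followed by Hasanta) is searched for; ra-phalā (Hasanta followed by RA) is not reported.

```python
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # অ্যা, এ্যা
RA = ("\u09B0", "\u09F0")  # Bangla RA, Assamese RA
HASANTA = "\u09CD"


def find_special_cases(label):
    """Return (kind, index) pairs for S sequences and P (RA + Hasanta) in a label."""
    found = []
    for i, ch in enumerate(label):
        if label.startswith(S_SEQUENCES, i):  # str.startswith accepts a tuple and a start index
            found.append(("S", i))
        if ch in RA and label[i + 1:i + 2] == HASANTA:
            found.append(("P", i))
    return found


print(find_special_cases("\u0985\u09B0\u09CD\u0995"))  # অর্ক
```

Because both RA code points are listed, অর্ক spelled with either RA reports the same P position, which is exactly why the two spellings cannot be told apart on screen.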


6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and hence are being proposed as variants.

Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
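A related observation, which the argument above does not depend on: Unicode defines U+09CB BENGALI VOWEL SIGN O as canonically equivalent to U+09C7 + U+09BE, and the IDNA protocol requires labels in Normalization Form C (NFC), so the two-key spelling is normalized to the single code point before any LGR rule applies. Python's standard unicodedata module shows this; the variable names are illustrative only.

```python
import unicodedata

ko_one_key = "\u0995\u09CB"         # ক + ো (VOWEL SIGN O)
ko_two_keys = "\u0995\u09C7\u09BE"  # ক + ে (VOWEL SIGN E) + া (VOWEL SIGN AA)

print(ko_one_key == ko_two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", ko_two_keys) == ko_one_key)  # True: NFC composes them
print(unicodedata.normalize("NFD", ko_one_key) == ko_two_keys)  # True: NFD decomposes them
```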

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein Hasanta with থ (tha, U+09A5) appears as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)
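If these two pairs are treated as sequence variants, the variant labels of a given label can be enumerated by swapping each occurrence independently. The sketch assumes exactly the two pairs listed above; the function name case_i_variants is hypothetical, and an actual LGR would express such mappings in its XML variant definitions rather than in code.

```python
from itertools import product

# The two CASE I pairs: conjuncts with THA and their HA look-alikes.
PAIRS = [
    ("\u09B8\u09CD\u09A5", "\u09B8\u09CD\u09B9"),  # স্থ / স্হ
    ("\u09A8\u09CD\u09A5", "\u09A8\u09CD\u09B9"),  # ন্থ / ন্হ
]
SWAP = {a: b for a, b in PAIRS}
SWAP.update({b: a for a, b in PAIRS})


def case_i_variants(label):
    """Return all labels reachable by swapping any CASE I conjunct for its partner."""
    pieces, i = [], 0
    while i < len(label):
        trigram = label[i:i + 3]
        if trigram in SWAP:
            pieces.append((trigram, SWAP[trigram]))  # either spelling is possible here
            i += 3
        else:
            pieces.append((label[i],))  # ordinary code point, kept as is
            i += 1
    variants = {"".join(choice) for choice in product(*pieces)}
    variants.discard(label)
    return variants


print(case_i_variants("\u09B8\u09CD\u09A5\u09BE\u09A8"))  # variants of স্থান
```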


The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as হ (ha, U+09B9) by someone who takes it for one, increasing the risks in label writing and identification. Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)

The fonts which represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of variants is encountered in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences (mainly because Assamese and Bangla share the same Unicode block, and 'B_RA' as well as 'A_RA' refer to the same phonetic element). Here the final ligatures look the same and will be as follows:

(1) B_RA + H + C
(2) A_RA + H + C

Where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with Repha look identical; adding Repha does not make any difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

Where C1 = any consonant except Khanda-ta

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla Repha and Ra-phalā conjunct forms look the same, they could confuse end-users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. But this will create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. However, the block applies only at the label level; if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, that is still possible.
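The blocking behaviour described above can be sketched by enumerating every label reachable through RA substitution. This is a minimal illustration, assuming, as this paragraph states, that র and ৰ form one unconditional variant set; the names RA_VARIANTS and ra_variant_labels are hypothetical.

```python
from itertools import product

# Assumption from this section: র (U+09B0) and ৰ (U+09F0) form one
# unconditional variant set, so either may replace the other anywhere.
RA_VARIANTS = {"\u09B0": ("\u09B0", "\u09F0"), "\u09F0": ("\u09F0", "\u09B0")}


def ra_variant_labels(label):
    """Return every label obtained by swapping any RA for its variant, minus the label itself."""
    choices = [RA_VARIANTS.get(ch, (ch,)) for ch in label]
    variants = {"".join(combo) for combo in product(*choices)}
    variants.discard(label)
    return variants


blocked = ra_variant_labels("\u09B0\u09B0\u09B0")  # ররর
print(len(blocked), "\u09F0\u09F0\u09F0" in blocked)  # 7 True
```

Variants preserve label length, so ৰৰ or ৰৰৰৰ never appear among the blocked variants of ররর, which matches the point that such labels remain registrable.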

6.2 Cross-Script Variants

A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.3) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.


1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
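A whole label has a cross-script variant only if every one of its code points has a counterpart in the other script. The sketch below encodes the pairs of Tables 14 and 15 as plain dictionaries; the names and the whole-label test are illustrative assumptions, not the LGR's actual cross-script mechanism.

```python
# Cross-script variant pairs from Tables 14 and 15.
BENGALI_TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম→म, ি→ि
BENGALI_TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম→ਸ, ি→ਿ


def cross_script_variant(label, table):
    """Return the mapped label if every code point has a cross-script variant, else None."""
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


print(cross_script_variant("\u09AE\u09BF", BENGALI_TO_DEVANAGARI))  # মি → मि
print(cross_script_variant("\u0995\u09BF", BENGALI_TO_DEVANAGARI))  # কি has no counterpart: None
```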

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided under Code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.


D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows:


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.

2. H must be preceded by C. Example: ক্

3. M must be preceded by C. Example: কা

4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ

5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ

6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
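The nine rules can be read as constraints on which category may immediately precede each character. The Python sketch below assumes an illustrative, non-normative subset of Bengali-block code points for each category. It tokenizes an NFC label, treating S sequences and the nukta forms of rule 1 as single units, and then applies rules 2 to 9. All names are hypothetical; the normative form of these rules is the LGR XML file.

```python
import unicodedata

# Illustrative (non-normative) category sets for the Bengali block.
VOWELS = {chr(cp) for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}
CONSONANTS = {chr(cp) for cp in range(0x0995, 0x09BA)} - {
    chr(cp) for cp in (0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5)  # unassigned
}
CONSONANTS |= {"\u09F0", "\u09F1"}  # Assamese RA and WA
MATRAS = {chr(cp) for cp in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC]}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
CN_BASES = {"\u09AF", "\u09A1", "\u09A2"}  # YA, DDA, DDHA (+ NUKTA = য়, ড়, ঢ়)
RA = {"\u09B0", "\u09F0"}
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা

# Rules 2-7: categories allowed immediately before a given category.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},  # P (RA + H) is checked separately
}


def tokenize(label):
    """Split a label into (category, text) tokens; return None on an unknown code point."""
    tokens, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            tokens.append(("S", label[i:i + 4]))
            i += 4
            continue
        ch = label[i]
        if ch in CN_BASES and label[i + 1:i + 2] == NUKTA:
            tokens.append(("C", label[i:i + 2]))  # rule 1: CN counts as C
            i += 2
            continue
        if ch in CONSONANTS:
            cat = "C"
        elif ch in VOWELS:
            cat = "V"
        elif ch in MATRAS:
            cat = "M"
        elif ch in SIGNS:
            cat = SIGNS[ch]
        else:
            return None
        tokens.append((cat, ch))
        i += 1
    return tokens


def is_valid_label(label):
    """Apply rules 1-9 of Section 7.1 to a label that must already be in NFC."""
    if not label or unicodedata.normalize("NFC", label) != label:
        return False
    tokens = tokenize(label)
    if tokens is None:
        return False
    for idx, (cat, _) in enumerate(tokens):
        prev = tokens[idx - 1][0] if idx > 0 else None
        if prev == "S":
            prev = "M"  # S ends with ā-kāra, so it behaves like M for what follows
        if cat in ("V", "S"):
            if prev == "H":  # rules 8 and 9
                return False
        elif cat == "Z":
            follows_p = prev == "H" and tokens[idx - 2][1] in RA
            if not (follows_p or prev in ALLOWED_BEFORE["Z"]):
                return False
        elif cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            return False
    return True
```

With this sketch, ভর্ৎসনা passes (Z follows P), ক্ৎ fails (Z follows H without RA), অফ্ই fails under rule 8, and a label containing the precomposed U+09DF is rejected because it is not in NFC.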

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋkʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning "Bank of India". This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.


This is a unique situation, necessitated by the lack of a hyphen, space or Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, the NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Example

অ্ अ्

অং্ अं्, কঁ্ कँ्, কঃ্ कः्, কি্ कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (mātrā) is restricted to one; thus the following combinations are invalid:


Example

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कीःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example

কংঃ कंः

কঃং कःं

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka


9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. Reprint, New Delhi: Asian Educational Services, 2003.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan o Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of the Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti o Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese, in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has Happened So Far in Terms of Script Reforms'. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichu-chor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র, 09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations of VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
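These three Bangla-specific restrictions are easy to check directly on the code-point string. The sketch below is a minimal illustration, assuming an illustrative set of vowel letters (not the LGR repertoire) and that only Bangla RA (U+09B0) is admitted before Hasanta + Khanda Ta, as rule 2 states for Bangla; the function name passes_bangla_restrictions is hypothetical.

```python
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"
BANGLA_RA = "\u09B0"
FORBIDDEN_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}  # ৎ, ঞ, ঙ (rule 1)
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা (rule 3)
# Illustrative vowel letters অ..ঋ, এ, ঐ, ও, ঔ (an assumption, not the LGR repertoire).
VOWELS = {chr(cp) for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}


def passes_bangla_restrictions(label):
    """Check rules 1-3 of Appendix 10.1 for a Bangla label."""
    if not label or label[0] in FORBIDDEN_INITIAL:
        return False
    for i, ch in enumerate(label):
        # Rule 2: C + Hasanta + Khanda Ta only when C is RA (U+09B0).
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == HASANTA and label[i - 2] != BANGLA_RA:
            return False
        # Rule 3: Hasanta after a vowel letter must belong to অ্যা or এ্যা.
        if ch == HASANTA and i >= 1 and label[i - 1] in VOWELS and label[i - 1:i + 3] not in ALLOWED_VHCM:
            return False
    return True


print(passes_bangla_restrictions("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
```

Note that under rule 1 the transliteration ৎসেরিং is rejected, even though the text acknowledges that such spellings occur in recent usage.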

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

Table 17: The Script Table of Sylheti Nagarī or Siloti

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

10.3 Confusable Code Points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points


11 Appendix-II: Bengali Consonants and Their Allographs

Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) জ্ঝ (জ্ + ঝ), ঞ্ঝ (ঞ্ + ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) ঙ্খ (ঙ্ + খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). But after a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র + য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā + ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:
i. রেফ repʰa, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (earlier র্দ্ধ্ব = র্ + দ্ + ধ্ + ব, a four-term cluster), etc. (placed over the following consonant)
ii. র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


Vowel | Corresponding vowel sign (kāras/mātrās)

Could appear after অ 'A' U+0985 or any other vowel

ং U+0982 Anusvara

Could appear after অ 'A' U+0985 or any other vowel

ঃ U+0983 Visarga

After any consonant | ্ U+09CD (Hasanta)

- | ঽ U+09BD Avagraha

Table 7: Bangla vowels with corresponding kāras

3.3.4 The Anusvāra onuʃʃār (ং - U+0982)

The Anusvāra or onuʃʃār in Bangla at times represents a homorganic nasal, but not always. It replaces a conjunct group of 'Nasal Consonant + Hasanta + Consonant' where the second consonant belongs to the velar varga (set), as in লংকা. But it also often appears in such combinations when a non-velar is the last member, as in ল্যাংটা "naked" or ল্যাংচা "a kind of sweet; to limp". Before a non-varga consonant, the Anusvāra represents a nasal sound that may have an alternative conjoined written symbol representing the corresponding nasal consonant of the particular set. Although modern Hindi, Marathi and Konkani prefer the anusvāra to the corresponding half-nasal, Bangla clearly demarcates where one must use the Anusvāra and where it has to be a conjunct cluster with a nasal as the first or the second component.

3.3.5 Nasalization: Candrabindu (ঁ - U+0981)

Candrabindu denotes nasalization of the preceding vowel, as in চাঁদ cãd 'moon' (U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as a nasalization marker in many Brāhmī-based scripts [143].

3.3.6 Nukta (় - U+09BC)

The nukta sign does not exist in Bangla orthography. It is predominantly used in many Brāhmī-derived scripts such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri and Sindhi). The term and the concept of nukta are borrowed in Bangla. The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form C (NFC), and RFC 7940 applies this requirement to LGRs. The definition of NFC in the Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences (YA + Nukta, U+09AF + U+09BC), (DDA + Nukta, U+09A1 + U+09BC) and (DDHA + Nukta, U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), although the use of 'Nukta' is otherwise completely unnatural in Bangla.

It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also provides these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke), the IDNA protocol requires that we treat them as base-plus-nukta sequences, using the code points allocated for those components in the Unicode Bengali block. It may be noted that there could be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to maintain the sanctity of the loanword. This amounts to using the Bangla writing system like the IPA script. It is, however, not in use in printed Bangla writing.
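The effect of the composition exclusions can be verified with Python's standard unicodedata module: normalizing each precomposed letter to NFC yields the base-plus-nukta sequence rather than the single code point. This is only an illustration of the Unicode behaviour the paragraph describes, not part of the LGR tooling.

```python
import unicodedata

# YYA, RRA and RRHA are composition exclusions: NFC leaves them decomposed.
for precomposed in ("\u09DC", "\u09DD", "\u09DF"):
    nfc = unicodedata.normalize("NFC", precomposed)
    name = unicodedata.name(precomposed)
    parts = " + ".join(f"U+{ord(ch):04X}" for ch in nfc)
    print(f"U+{ord(precomposed):04X} {name} -> {parts}")
```

Each line shows the decomposition, e.g. U+09DF BENGALI LETTER YYA becomes U+09AF + U+09BC, which is the form this LGR admits.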

3.3.7 Visarga biʃɔrgo (ঃ - U+0983) and Avagraha (ঽ - U+09BD)

The Visarga biʃɔrgo U+0983 is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One example is দুঃখ duhkho "sorrow", "unhappiness" (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো'পরাণি), and it is now rarely used, even in other languages using the Bangla script. The Avagraha is not part of the LGR repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

This note pertains to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It should be noted that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has the option between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) "Satyajit" and সৎ (sat) "honest". This is typed as follows: ta + Hasanta + ZWJ (U+200D). (Since Unicode 4.1, khaṇḍa-ta is also encoded atomically as U+09CE BENGALI LETTER KHANDA TA.)

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending on the context. E.g. র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য, one has to type র + ্ + য, but for র‍্য the sequence would be র + ZWJ + ্ + য [154]. In other words, ZWJ is needed to render words demanding ya-phalā after ra, which otherwise cannot be typed (rendered), because the plain order ra + hasanta + antastha ja in medial and/or final position already has another use: it types repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

bull TheuseofZWNJinBanglaisusedtorepresenttheexplicitHasantaorHalantInordertoavoidconjunctformationincaseswherethereisanexplicithasantabeforethesucceedingconsonanttheZWNJisused

Consonant+hasanta+ZWNJ+consonant=explicithasantaExampleAা7 কথন(prakkathanaprakkɔtʰon)

The use of ZWJ and ZWNJ has been ruled out of the root zone by the [Procedure]. Because these two signs are used in Bangla to create alternate renderings, their insertion can affect searching as well as NLP.

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
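To make the distinction concrete, here is a minimal Python sketch (an illustration only, not part of the LGR) that builds the three sequences discussed above from their code points and flags the invisible joiners. The helper name `contains_joiner` is hypothetical; it simply reflects the fact that ZWJ and ZWNJ are not in the root-zone repertoire, so a label containing either must be rejected.

```python
import unicodedata

ZWNJ = "\u200c"     # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"      # ZERO WIDTH JOINER
RA, YA, KA = "\u09b0", "\u09af", "\u0995"
HASANTA = "\u09cd"  # BENGALI SIGN VIRAMA

# ra + hasanta + ya: renders as repha on ya (as in the word for "work")
repha_on_ya = RA + HASANTA + YA
# ra + ZWJ + hasanta + ya: renders as ra followed by ya-phala (as in "wrapper")
ra_ya_phala = RA + ZWJ + HASANTA + YA
# ka + hasanta + ZWNJ + ka: explicit hasanta instead of the ka-ka conjunct
explicit_hasanta = KA + HASANTA + ZWNJ + KA


def contains_joiner(label: str) -> bool:
    """Hypothetical helper: True if the label contains ZWJ or ZWNJ."""
    return ZWJ in label or ZWNJ in label


for seq in (repha_on_ya, ra_ya_phala, explicit_hasanta):
    names = " + ".join(unicodedata.name(ch) for ch in seq)
    print(names, "->", "rejected" if contains_joiner(seq) else "allowed")
```

Removing the ZWJ from `ra_ya_phala` yields exactly `repha_on_ya`, i.e., the two spellings differ only by an invisible character.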

3.3.9 Use of Ya-phalā
The Ya-phalā sequences are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

For rendering Ya-phalā following অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English: acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
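The two ligatures can be spelled out code point by code point to confirm where the Hasanta falls. The short Python sketch below (illustrative; `spell_out` is a hypothetical helper, not part of any LGR tool) prints each sequence with its Unicode character names.

```python
import unicodedata

# Sequences S1 and S2: full vowel + virama + ya + vowel sign aa
S1 = "\u0985\u09cd\u09af\u09be"  # অ্যা
S2 = "\u098f\u09cd\u09af\u09be"  # এ্যা


def spell_out(seq: str) -> list:
    """Return 'U+XXXX NAME' for each code point in the sequence."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


for seq in (S1, S2):
    # The second code point is the virama: it directly follows a vowel
    # letter, which is the deviant feature described above.
    print(" + ".join(spell_out(seq)))
```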

3.3.10 Formation of Ra-phalā and Repha Sequences
This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g., র্ক, i.e., ra + Hasanta + ka, as in অর্ক arka 'the sun'); ra-phalā = C + Hasanta + ra (e.g., ক্র, i.e., ka + Hasanta + ra, as in চক্র chakra 'cycle'). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR notes this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. These two code points are classed as (blocking) variants because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the two RAs is only distinguishable if one looks into their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. Moreover, it is noteworthy that REPHA can also occur with KHANDA TA. The conditions in this context of KHANDA TA are such that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up to assign a separate LGR to each. However, an attempt is made to ensure that the fundamental philosophy behind building these LGRs remains consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode for transcription purposes only or for archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of available characters for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard for providing the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters that are going to be used for creating Root Zone TLDs, which in turn constitute an even more specialized case of domain names; the Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters. Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.
To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc., will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters that have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), and the allographic -kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Form (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).

5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name, as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the code point repertoire in Table 8 is valid for all three languages above. For each of the code points, language references are given in the last column, titled "References and Comment", of Table 8, the "Code Point Repertoire". For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under the Bangla Unicode points (as given in Unicode 6.3). However, before the details are presented, it is ideal to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:

Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block as in Figure 1, of the code points that are included in the MSR, the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; 09DC is the preferred code point, however it is not available for the LGR as per the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; 09DD is the preferred code point, however it is not available for the LGR as per the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; 09DF is the preferred code point, however it is not available for the LGR as per the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122] [125]

Table 8: Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences that enable the conditional inclusion in the repertoire of BENGALI LETTER A and E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA, for enabling the inclusion of the æ sound as in English 'bat', 'cat', etc.

Sr No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire in this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used only in sequences forming three special characters in normalized form: ড়, ঢ়, য়.

Unicode Code Point | Glyph | Character Name | Reason for Exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points
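In practice, U+09DC, U+09DD and U+09DF are Unicode composition exclusions, so NFC normalization (which IDNA2008 requires for U-labels) always yields the decomposed base + nukta form listed in Table 8. The Python sketch below shows this and checks the rule above that a nukta may only follow ড, ঢ or য; `nukta_ok` is a hypothetical helper, not part of the LGR.

```python
import unicodedata

NUKTA = "\u09bc"
NUKTA_BASES = {"\u09a1", "\u09a2", "\u09af"}  # ড, ঢ, য


def nukta_ok(label: str) -> bool:
    """True if every nukta in the label directly follows ড, ঢ or য."""
    for i, ch in enumerate(label):
        if ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            return False
    return True


# The precomposed letters stay decomposed under NFC (composition exclusions).
for precomposed in ("\u09dc", "\u09dd", "\u09df"):
    nfc = unicodedata.normalize("NFC", precomposed)
    print(f"U+{ord(precomposed):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))
```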

5.4 The Basis of Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Form (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Form (ABNF) will use the following operators:

Sr No. | Operator | Function
1 | "|" | Alternative
2 | "[]" | Optional
3 | "*" | Variable Repetition
4 | "()" | Sequence Group

Table 11: The ABNF Formalism
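The variables can be illustrated by a small classifier that rewrites a label as a string of ABNF symbols. This is a Python sketch under the assumption that the repertoire is exactly Table 8, with the three nukta forms counted as single consonants (C); the name `to_symbols` is hypothetical and not part of the LGR XML.

```python
VOWELS = set("অআইঈউঊঋএঐওঔ")
CONSONANTS = set("কখগঘঙচছজঝঞটঠডঢণতথদধনপফবভমযরলশষসহৰৱ")
KARAS = set("\u09be\u09bf\u09c0\u09c1\u09c2\u09c3\u09c4\u09c7\u09c8\u09cb\u09cc")
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09cd": "H", "\u09ce": "Z"}
NUKTA = "\u09bc"
NUKTA_BASES = "\u09a1\u09a2\u09af"  # ড, ঢ, য


def to_symbols(label: str) -> str:
    """Rewrite a label as ABNF variables; raise ValueError outside the repertoire."""
    out = []
    for i, ch in enumerate(label):
        if ch == NUKTA:
            # base + nukta is a single consonant, so the nukta adds no symbol
            if i == 0 or label[i - 1] not in NUKTA_BASES:
                raise ValueError(f"nukta not after DDA, DDHA or YA at position {i}")
        elif ch in VOWELS:
            out.append("V")
        elif ch in CONSONANTS:
            out.append("C")
        elif ch in KARAS:
            out.append("M")
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            raise ValueError(f"U+{ord(ch):04X} is not in the repertoire")
    return "".join(out)


print(to_symbols("\u0985\u09b0\u09cd\u0995"))  # অর্ক -> VCHC
```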

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To facilitate understanding for users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). The number of D, B or X that can follow a V in Bangla is not necessarily restricted to one; however, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. It is valid and possible to have a Candrabindu followed by an Anusvāra on the one hand and a Candrabindu followed by a Visarga on the other, as in হ্যাঁংচা 'hænchā' and হ্যাঁঃ 'hæn' respectively; that is, a Visarga or Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. "Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>." Ya-phalā has a special form, ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e., 'aa' (ā)), it is used for transcribing [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel
Example: V অ अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].

Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
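Read as a grammar over the ABNF symbols, Sections 5.4.2.1 to 5.4.2.3 amount to V [HCM] [B | D | X | DB | DX]. The Python regular-expression sketch below follows that reading; it works on symbol strings such as "VHCMDB" rather than on Bangla text, and it does not check that the HCM part is specifically ্যা (that restriction comes from S in Table 9). The names are illustrative.

```python
import re

# V, optionally HCM (ya-phala + aa-kara), optionally one of B, D, X, DB, DX
SIGN_TAIL = r"(?:D[BX]?|[BX])?"
VOWEL_SEQUENCE = re.compile(rf"V(?:HCM)?{SIGN_TAIL}")


def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is a single vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ("V", "VB", "VDX", "VHCM", "VHCMDB", "VDD", "VBD", "VM"):
    print(s, is_vowel_sequence(s))
```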

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], an Anusvāra [B], a Candrabindu [D], a Visarga [X], a Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM কি কু कि कु
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB কীং কুং कीं कुं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

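5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):
*3(CH)C

Examples:
CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य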

5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

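Examples:
CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
ক্কুং → ক + ্ + ক + ু + ং; क्कुं → क + ् + क + ु + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः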
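Sections 5.4.3.1 to 5.4.3.5 can likewise be read as one pattern over ABNF symbols: either a bare CH, or *3(CH) C followed optionally by M and then by one of B, D, X, DB or DX. The regular expression below is a Python sketch of that reading (the names are illustrative), not a replacement for the WLE rules of Section 7.

```python
import re

SIGN_TAIL = r"(?:D[BX]?|[BX])?"  # nothing, B, D, X, DB or DX
# CH alone, or up to three CH pairs + C, then optional M, then optional sign tail.
# Doubled braces keep {0,3} literal inside the f-string.
CONSONANT_SEQUENCE = re.compile(rf"(?:CH|(?:CH){{0,3}}CM?{SIGN_TAIL})")


def is_consonant_sequence(symbols: str) -> bool:
    """True if the whole symbol string is a single consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


for s in ("CH", "CMB", "CHCHCHC", "CHCMDX", "CHCHCHCHC", "CMM", "CBX"):
    print(s, is_consonant_sequence(s))
```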

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda Ta Combination (note 10)
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):
[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'. Note: the conditions in this context of KHANDA TA are that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā following অ and এ, it is necessary to type U+09CD plus U+09AF ya preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, referred to as P in Table 16, concerns the formation of repha and ra-phalā in this script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g., র্ক, i.e., ra + Hasanta + ka, as in অর্ক arka 'the sun'); ra-phalā = C + Hasanta + ra (e.g., ক্র, i.e., ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.

6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases that belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can cause confusion even to a careful observer and are hence proposed as variants.
Variants arise in a script when two or more forms are created with different stored code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, and get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
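The one-key and two-key spellings of কো are in fact canonically equivalent in Unicode: U+09CB BENGALI VOWEL SIGN O decomposes to U+09C7 + U+09BE. Since IDNA2008 requires U-labels to be in NFC, the two-key form is composed back to U+09CB before any rule is applied, which a short Python check confirms (the variable names are illustrative):

```python
import unicodedata

KA = "\u0995"
O_ONE_KEY = KA + "\u09cb"              # ka + BENGALI VOWEL SIGN O
O_TWO_KEYS = KA + "\u09c7" + "\u09be"  # ka + e-kara, then aa-kara

# NFC composes e-kara + aa-kara into the single o-kara code point
print(unicodedata.normalize("NFC", O_TWO_KEYS) == O_ONE_KEY)  # True
# NFD goes the other way
print(unicodedata.normalize("NFD", O_ONE_KEY) == O_TWO_KEYS)  # True
```

The same holds for ৌ (U+09CC), which decomposes to U+09C7 + U+09D7, the AU length mark excluded in Table 10.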

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears via Hasanta in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as হ (ha, U+09B9), increasing the risks in label writing and identification. Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)
Fonts that represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.
CASE II
Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences (mainly because both Assamese and Bangla are encoded in the same Unicode block, and 'B_RA' as well as 'A_RA' refer to the same phonetic element). Here the final ligatures look the same and will be as follows:

(1) B_RA + H + C
(2) A_RA + H + C

Where:
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with Repha look identical; the addition of Repha does not make any difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

Where C1 = any consonant except Khanda-ta

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla Repha and Ra-phalā conjunct forms look the same, this could cause confusability for end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, where a single variant set between র and ৰ covers all cases. But this will create blocked variant labels: e.g., if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. However, the blocking applies only at the label level; if someone else needs to register other labels, e.g., ৰৰ or ৰৰৰৰ, it is still possible.
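A minimal Python sketch of this variant set follows. It assumes the unconditional form described here (every র or ৰ may be swapped, regardless of context) and that all generated variant labels are blocked; `variant_labels` is a hypothetical helper, not the API of any LGR tool.

```python
from itertools import product

BANGLA_RA = "\u09b0"    # র
ASSAMESE_RA = "\u09f0"  # ৰ
# One variant set {র, ৰ}: each RA may be replaced by either member.
VARIANT_SET = (BANGLA_RA, ASSAMESE_RA)


def variant_labels(label: str) -> set:
    """All labels obtained by swapping র/ৰ anywhere, minus the label itself."""
    choices = [VARIANT_SET if ch in VARIANT_SET else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


blocked = variant_labels("\u09b0\u09b0\u09b0")  # ররর
print(len(blocked), "\u09f0\u09f0\u09f0" in blocked)  # 7 True
```

A label with n occurrences of either RA has 2^n − 1 variant labels, all of the same length; labels of a different length, such as ৰৰ or ৰৰৰৰ, never appear in that set, which is why they remain registrable.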

6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia (note 11) (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters that are somewhat similar, they have not been included here; they have been provided in the Appendix (10.2) for reference.

11 Unicode uses Oriya for the script, although Odia is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
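Assuming, as in the Root Zone LGR, that every label is drawn from a single script, a cross-script variant label can only arise when every code point of the label has a counterpart in the other script. The Python sketch below encodes Tables 14 and 15 under that assumption; the function and table names are hypothetical.

```python
from typing import Dict, Optional

# Cross-script variant pairs from Tables 14 and 15
TO_DEVANAGARI: Dict[str, str] = {"\u09ae": "\u092e", "\u09bf": "\u093f"}  # MA, sign I
TO_GURMUKHI: Dict[str, str] = {"\u09ae": "\u0a38", "\u09bf": "\u0a3f"}    # MA->SA, sign I


def cross_script_variant(label: str, table: Dict[str, str]) -> Optional[str]:
    """Map every code point through the table, or return None if any has no counterpart."""
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


print(cross_script_variant("\u09ae\u09bf", TO_DEVANAGARI))  # মি -> मि
```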

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla (note 12) script. The rules have been drafted in such a way that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in the code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is perhaps also ideal to mention here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be seen in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are given below.

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
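Taken together, rules 1 to 9 are left-context checks, which the following Python sketch implements on code points. It assumes NFC input, the repertoire of Table 8 plus the nukta forms, S1/S2 from Table 9 treated as single units ending in M (per the note under rule 7), and P recognized as র/ৰ + hasanta immediately before Z. `wle_valid` and its tables are illustrative names; the normative rules remain those in the machine-readable LGR XML file.

```python
NUKTA = "\u09bc"
NUKTA_BASES = "\u09a1\u09a2\u09af"   # DDA, DDHA, YA (rule 1: CN forms count as C)
RAS = "\u09b0\u09f0"                 # Bangla RA, Assamese RA: the C2 of symbol P
S_SEQUENCES = ("\u0985\u09cd\u09af\u09be", "\u098f\u09cd\u09af\u09be")  # S1, S2

SYMBOLS = dict.fromkeys("অআইঈউঊঋএঐওঔ", "V")
SYMBOLS.update(dict.fromkeys("কখগঘঙচছজঝঞটঠডঢণতথদধনপফবভমযরলশষসহৰৱ", "C"))
SYMBOLS.update(dict.fromkeys(
    "\u09be\u09bf\u09c0\u09c1\u09c2\u09c3\u09c4\u09c7\u09c8\u09cb\u09cc", "M"))
SYMBOLS.update({"\u0982": "B", "\u0981": "D", "\u0983": "X",
                "\u09cd": "H", "\u09ce": "Z"})

# Rules 2-7: the symbols allowed immediately before H, M, D, X, B and Z
ALLOWED_BEFORE = {"H": "C", "M": "C", "D": "VCM", "X": "VCMD", "B": "VCMD",
                  "Z": "VCMDBX"}


def wle_valid(label: str) -> bool:
    """Sketch of WLE rules 1-9 for an NFC label; False if any rule fails."""
    prev = None  # symbol of the previous token (S is treated as ending in M)
    i = 0
    while i < len(label):
        if label.startswith(S_SEQUENCES, i):
            if prev == "H":                       # rule 9
                return False
            prev, i = "M", i + 4
            continue
        ch = label[i]
        if ch == NUKTA:                           # rule 1: base + nukta is one C
            if prev != "C" or label[i - 1] not in NUKTA_BASES:
                return False
            i += 1
            continue
        sym = SYMBOLS.get(ch)
        if sym is None:                           # outside the repertoire
            return False
        if sym == "V" and prev == "H":            # rule 8
            return False
        if sym in ALLOWED_BEFORE:
            # rule 7 also allows Z after P, i.e. after RA + hasanta
            after_p = sym == "Z" and prev == "H" and label[i - 2] in RAS
            if not after_p and (prev is None or prev not in ALLOWED_BEFORE[sym]):
                return False
        prev, i = sym, i + 1
    return True
```

For instance, `wle_valid` accepts ভর্ৎসনা (Z after P) and অ্যাসিড (S at the start of the label) but rejects the Bank of India label of the next subsection, because ই follows the hasanta of অফ্ (rule 8).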

Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinationsCase of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াsঅtইিuয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অt)andthesecondwordbeginswithaV(ইিuয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstword


This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

ক क

াক ाक

ংক क

ক क

ঃক ःक

As can be seen, such a combination automatically results in a "golu" or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example

অ अ

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a Consonant, a Vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalidated:

Example

কংং क

ক क

কঃঃ कःः

কা का

কীঃঃ कःः

অংং अ

অ अ

অঃঃ अःः

4. The number of M permitted after a Consonant is restricted to one. Example:

কীী की

5. M is not permitted after V. Example:

ইাঈৗ ईाईौ

6. The combinations Anusvāra + Visarga as well as Visarga + Anusvāra are not permissible.

Example

কংঃ कः

কঃং कः

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India

Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh

Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf (accessed on 25.11.2017).

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm (accessed on 25.11.2017).

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm (accessed on 20.10.2019).

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet (accessed on 25.11.2017).

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm (accessed on 10.5.2018).

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language (accessed on 25.11.2017).

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/ (accessed on 10.5.2018).

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696 (accessed on 10.5.2018).

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html (accessed on 10.5.2018).

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. (1957). 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari (accessed on 19.5.2018).

[139] Furui, Ryosuke. (2015). 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. (1960). 'Phones of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. (2007). Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. (1966). Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. (1960). A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf (accessed on May 21, 2018).

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block) (accessed on May 21, 2018).

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html (accessed on May 21, 2018).

[147] Bhattacharya, Ashutosh (ed.). (1942). Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. (1975). Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). (2014). Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. (2012). Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What Has Happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core Specification. Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa-ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍa-ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C + H can come with Khaṇḍa-ta only in the case where C is ra (র) (U+09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
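The three Bangla-specific restrictions above sit on top of the general WLE rules of Section 7.1. A minimal Python sketch of them, working directly on code points, might look as follows. The function name and messages are hypothetical and the vowel set is an assumed approximation of V; the sketch does not attempt to model the word-initial য় discussion under rule 1, and it flags a word-initial ৎ as in ৎসেরিং because rule 1 disallows it.

```python
KHANDA_TA, NYA, NGA = "\u09CE", "\u099E", "\u0999"
HASANTA, RA = "\u09CD", "\u09B0"
# Independent vowels (assumed approximation of the V class).
VOWELS = {chr(c) for c in (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)}
# Rule 3: the only permitted V + H + C + M sequences.
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা


def bangla_restriction_errors(label: str) -> list[str]:
    """Check the Bangla-specific restrictions of Appendix 10.1 (sketch)."""
    errors = []
    # Rule 1: khanda-ta, NYA and NGA may not start a label.
    if label[:1] in {KHANDA_TA, NYA, NGA}:
        errors.append(f"label may not begin with U+{ord(label[0]):04X}")
    for i, ch in enumerate(label):
        # Rule 2: C + H + khanda-ta only when C is RA.
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == HASANTA and label[i - 2] != RA:
            errors.append(f"index {i}: C + H + khanda-ta requires C = RA")
        # Rule 3: a hasanta after an independent vowel only inside অ্যা / এ্যা.
        if (ch == HASANTA and i >= 1 and label[i - 1] in VOWELS
                and label[i - 1:i + 3] not in ALLOWED_VHCM):
            errors.append(f"index {i}: hasanta after a vowel outside অ্যা/এ্যা")
    return errors
```

For instance, ভর্ৎসনা and অ্যাসিড pass, whereas ৎসেরিং, ক্ৎ and ই্যা each produce one error.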

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable, or (b) confusable but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali consonants and their allographs

Consonants  Phonetic Value  Allographs

Clusters  Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ; ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ) Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts ঙ is replaced by ং, e.g. কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + ্ + য combination has two physical manifestations: র‍্য and র্য.

ক্য(p+য) স্য(+য) র‍্য(+য) [Just র‍্য is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala and ā-kara, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:

i. রেফ (repʰ) as the first member of a cluster, e.g. প6ৎ6 not6 য6D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster) etc. (placed over the following consonant);

ii. র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


The Unicode Standard contains a number of composition exclusions. As a result, the Bangla letters য় YYA, ড় RRA and ঢ় RRHA have to be represented in this LGR by the sequences YA + Nukta (U+09AF + U+09BC), DDA + Nukta (U+09A1 + U+09BC) and DDHA + Nukta (U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA (U+09DC) and RRHA (U+09DD), although the use of 'Nukta' is otherwise completely unnatural in Bangla. It is noted that in the current Unicode Standard chart these characters are listed as additional consonants. As per the LGR Procedure, however, these decisions depend on the IDNA Protocol, through a set of procedures developed by the IETF. Even though the Unicode Standard also prescribes methods to produce these three characters as atomic characters (for example 09DC for ড় [r], 09DD for ঢ় [rh] and 09DF for য় [y], each as a single keystroke), the IDNA protocol requires that they be treated as conjunct sequences of base consonant + Nukta, built from code points already allocated in the Unicode Bengali block. It may be noted that there could be sporadic attempts to write Muslim names, Urdu poetic words and Perso-Arabic loanwords with a nukta under ক (k), খ (kh), গ (g), জ (j) and ফ (ph), only for the sake of correct pronunciation and to maintain the sanctity of the loanword. Such usage amounts to making the Bangla writing system work like the IPA. It is, however, not found in printed Bangla.
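Because U+09DC, U+09DD and U+09DF are composition exclusions, Unicode Normalization Form C never produces them: it always decomposes them into base consonant + Nukta, and it does not recompose that sequence. The Python sketch below (standard library only; `to_lgr_form` is a hypothetical helper name) shows this behaviour, which is consistent with the sequence-based representation used in this LGR. Note that Unicode's character name for U+09DD is BENGALI LETTER RHA.

```python
import unicodedata

# The three atomic letters discussed above; each is a composition exclusion.
ATOMIC = ["\u09DC", "\u09DD", "\u09DF"]  # ড় RRA, ঢ় RHA, য় YYA


def to_lgr_form(label: str) -> str:
    """Normalize to NFC; for these three letters NFC yields base consonant + NUKTA."""
    return unicodedata.normalize("NFC", label)


for ch in ATOMIC:
    seq = to_lgr_form(ch)
    print(unicodedata.name(ch), "->", " + ".join(f"U+{ord(c):04X}" for c in seq))
# BENGALI LETTER RRA -> U+09A1 + U+09BC
# BENGALI LETTER RHA -> U+09A2 + U+09BC
# BENGALI LETTER YYA -> U+09AF + U+09BC
```

Running the decomposed form back through `to_lgr_form` leaves it unchanged, so a label typed with either the atomic letter or the Nukta sequence ends up with the same code points after normalization.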

3.3.7 Visarga biʃɔrgo (ঃ, U+0983) and Avagraha (ঽ, U+09BD)

The Visarga biʃɔrgo (U+0983) is frequently used in Bangla loanwords borrowed from Sanskrit and represents a sound very close to h. One could quote as an example দুঃখ duhkho 'sorrow', 'unhappiness' (U+09A6 U+09C1 U+0983 U+0996). The Avagraha ঽ (U+09BD) is mainly used in Sanskrit, Pali, Prakrit or Maithili texts written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি re-written as নরো'পরাণি). It is rarely used now, even in other languages using Bangla script. In the case of the LGR, Avagraha is not part of the repertoire: it has been decided not to retain Avagraha (ঽ) (U+09BD) because it is blocked in TLDs as per the Maximal Starting Repertoire (MSR). Please see Appendix II in Section 11 for a complete list of Bangla consonants and their allographs.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)
This note is pertinent to the use of the Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ) in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has the option between joining and non-joining forms. Without these control codes, the string may be rendered in a form other than the one intended.

Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) "Satyajit" and সৎ (sat) "honest". This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending upon the context. E.g. র + ্ + য has two representations in Bangla: as র্য and as র‍্য. To get the form র্য one has to type র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used in rendering words that demand a ya-phalā after ra, which otherwise cannot be typed (rendered), because the same order ra + hasanta + antastha ja occurs in medial and/or final position. Interestingly, ra + hasanta + antastha ja is used to type a repha on the consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent an explicit Hasanta (Halant). To avoid conjunct formation where there is an explicit hasanta before the succeeding consonant, ZWNJ is used:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. Because these signs are used in Bangla to create alternate renderings, their insertion can affect searching as well as NLP.

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta joining the two consonants participating in the conjunct formation needs to be explicitly shown.
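The consequence of excluding ZWJ and ZWNJ from the root zone can be made concrete at the code-point level. In the Python sketch below (names are hypothetical; how each string actually renders depends on the font and shaping engine), removing the ZWJ from the র‍্য spelling leaves exactly the র্য spelling, and a ZWNJ-marked explicit hasanta such as প্রাক্‌কথন is reported as containing a code point the root zone does not allow.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"
RA, HASANTA, YA = "\u09B0", "\u09CD", "\u09AF"

# Two spellings distinguished in the text; only the ZWJ differs.
REPHA_ON_YA = RA + HASANTA + YA            # usually rendered র্য (as in কার্য)
RA_WITH_YAPHALA = RA + ZWJ + HASANTA + YA  # usually rendered র‍্য (as in র‍্যাপার)

# প্রাক্‌কথন: KA + HASANTA + ZWNJ + KA keeps the hasanta visible.
PRAKKATHAN = "\u09AA\u09CD\u09B0\u09BE\u0995\u09CD\u200C\u0995\u09A5\u09A8"


def joiner_positions(label: str) -> list[str]:
    """Report ZWJ/ZWNJ occurrences, which the root zone repertoire excludes."""
    return [f"U+{ord(ch):04X} at index {i}"
            for i, ch in enumerate(label) if ch in (ZWJ, ZWNJ)]


print(joiner_positions(PRAKKATHAN))                     # ['U+200C at index 6']
print(RA_WITH_YAPHALA.replace(ZWJ, "") == REPHA_ON_YA)  # True
```

The second check is the point of the example: once the joiner is unavailable, a label can only carry the র্য sequence, so the র‍্য spelling cannot be expressed in the root zone at all.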

3.3.9 Use of Ya-phalaa
Ya-phalaa sequences are two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render a Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kara is used to elicit the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, 'bat' ব্যাট, 'fat' ফ্যাট, 'mat' ম্যাট, 'cap' ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

3.3.10 Formation of Ra-phalaa and Ref Sequences
This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্, BENGALI SIGN VIRAMA).

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra "cycle"). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks at their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa 'top, apex', অভ্র abhra 'cloud, the sky' and শ্রম śrama 'physical labour' could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that follow. Moreover, it is noteworthy that the REPHA can also occur with KHANDA TA. In this context of KHANDA TA, the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
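One way to make the Variant Set 4 concern concrete is to enumerate, for a given label, every label that differs only in which RA (U+09B0 or U+09F0) occupies a repha or ra-phalā slot. The Python sketch below treats an RA as swappable when it is immediately followed or preceded by Hasanta (U+09CD), following the "followed or preceded" description above. The function name is hypothetical, and how such variants are dispositioned (blocked or otherwise) is governed by the LGR XML, not by this sketch.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"
SWAP = {BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}


def ra_variant_labels(label):
    """Return labels differing from `label` only in the RA used next to a Hasanta."""
    # Positions of RA that sit in a repha (RA + H) or ra-phala (H + RA) slot.
    slots = [
        i for i, ch in enumerate(label)
        if ch in SWAP
        and (label[i + 1:i + 2] == HASANTA or (i > 0 and label[i - 1] == HASANTA))
    ]
    variants = set()
    for choice in product((False, True), repeat=len(slots)):
        chars = list(label)
        for i, swap in zip(slots, choice):
            if swap:
                chars[i] = SWAP[chars[i]]
        variants.add("".join(chars))
    variants.discard(label)  # keep only the labels that actually differ
    return variants


# অর্ক has one repha slot, so there is exactly one variant (with U+09F0).
print(ra_variant_labels("\u0985\u09B0\u09CD\u0995"))
```

A label with n such slots yields 2^n - 1 variants; for example, অর্কচক্র has one repha and one ra-phalā slot and therefore three variants, none of which a reader could tell apart from the original on screen.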

4 Overall Development Process and Methodology
The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially in NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent with all other Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles
The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles
4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.
4.2.1 External Limits of Scope
Since the code point repertoire for the root zone is a very special case at the top of the protocol hierarchy, the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart
Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that provides the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR is the repertoire of characters to be used for creating Root Zone TLDs, which in turn constitute an even more specialized case of domain names; the Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters.

Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even though allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR. To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.
4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) etc., will also not be included.
4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic -kara forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kara corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.
4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.
4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
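The three successive filters can be pictured as sieves applied one after another over the Bengali block. The Python sketch below is illustrative only: the IDNA step is approximated by Unicode general category (the actual IDNA2008 derivation in RFC 5892 has further categories and exceptions), and the MSR step uses only the exclusions named in this section and in the colour convention of Figure 1 (Avagraha and the digits), not the full MSR table. All function names are hypothetical.

```python
import unicodedata

BENGALI_BLOCK = range(0x0980, 0x0A00)

# Crude stand-in for IDNA2008 PVALID: letters, marks and decimal digits only.
IDNA_LIKE_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}

# Illustrative, partial MSR-style exclusions: AVAGRAHA, plus the Bengali digits
# that are ineligible for the root zone. The real MSR excludes more.
MSR_EXCLUDED = {0x09BD, *range(0x09E6, 0x09F0)}


def in_unicode_chart(cp):
    """Filter i: the code point is assigned in the Bengali block."""
    return unicodedata.category(chr(cp)) != "Cn"


def idna_like(cp):
    """Filter ii (approximation): the category would be PVALID-eligible."""
    return unicodedata.category(chr(cp)) in IDNA_LIKE_CATEGORIES


def msr_like(cp):
    """Filter iii (partial): not among the MSR exclusions named here."""
    return cp not in MSR_EXCLUDED


def candidate_repertoire(filters):
    """Apply each filter in turn, narrowing the set at every step."""
    cps = list(BENGALI_BLOCK)
    for keep in filters:
        cps = [cp for cp in cps if keep(cp)]
    return cps


repertoire = candidate_repertoire([in_unicode_chart, idna_like, msr_like])
```

In this sketch BENGALI ISSHAR (U+09FA, a symbol) is removed at the IDNA-like step, while Avagraha (U+09BD) survives that step and is removed only at the MSR step, matching the example given above.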

5 Repertoire
The Bangla Writing System is represented in UNICODE using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee LITD 20, held on 23rd Aug 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 11 is valid for all three languages above. For each of the code points, language references have been given in the last column, titled Reference, under Table 8, titled the "Code Point Repertoire". For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in Bangla script is not known to have introduced any new complications except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in UNICODE 6.3). However, before the details are presented, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation among actual fonts, both UNICODE-compatible and True-Type ones. Consider the following code point table:

Figure 1 – Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention: characters included in the [MSR]: yellow background; PVALID in IDNA2008 but excluded from the [MSR]: pinkish background; not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background.

GiventheBanglaUnicodeBlockasinFigure1forthecodepointsthoseareincludedintheMSRthefollowingsymbolswillneedaseparatetreatmentৎ U+09CE BanglaLetterKhanda-Taৰ U+09F0 Asamiyā-BanglaLetterRaWithMiddleDiagonalৱ U+09F1 Asamiyā-BanglaLetterRaWithLowerDiagonal

26

5.1 Code Point Repertoire Inclusion

No. | Unicode code point | Glyph | Character name | Category | Language(s) with EGIDS value | References and comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candrabindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; U+09DC is the preferred code point, but it is not available for the LGR under the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; U+09DD is the preferred code point, but it is not available for the LGR under the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; U+09DF is the preferred code point, but it is not available for the LGR under the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122] [125]

Table 8: Bangla Code-Point Repertoire
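The comment on rows 28, 30 and 43 follows from Unicode normalization: U+09DC, U+09DD and U+09DF are canonical composition exclusions, so NFC always yields the base consonant followed by NUKTA. The Python sketch below checks a label against Table 8 under two stated assumptions: the label is NFC-normalized first, and NUKTA is admitted only in the three sequences listed. The set literal is a hand transcription of Table 8, not the normative XML file, and the function name is illustrative.

```python
import unicodedata

# Hand transcription of Table 8 (illustrative; the XML file is normative).
REPERTOIRE = set(
    [0x0981, 0x0982, 0x0983]
    + list(range(0x0985, 0x098C))  # vowels U+0985..U+098B
    + [0x098F, 0x0990, 0x0993, 0x0994]
    + list(range(0x0995, 0x09A9))  # consonants U+0995..U+09A8
    + list(range(0x09AA, 0x09B1))  # consonants U+09AA..U+09B0
    + [0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9]
    + list(range(0x09BE, 0x09C5))  # vowel signs U+09BE..U+09C4
    + [0x09C7, 0x09C8, 0x09CB, 0x09CC, 0x09CD, 0x09CE, 0x09F0, 0x09F1]
)
# NUKTA (U+09BC) only after DDA, DDHA and YA (rows 28, 30 and 43).
NUKTA_SEQUENCES = {"\u09A1\u09BC", "\u09A2\u09BC", "\u09AF\u09BC"}


def in_repertoire(label: str) -> bool:
    """True if the NFC form of `label` uses only Table 8 code points/sequences."""
    s = unicodedata.normalize("NFC", label)
    i = 0
    while i < len(s):
        if s[i:i + 2] in NUKTA_SEQUENCES:
            i += 2
        elif ord(s[i]) in REPERTOIRE:
            i += 1
        else:
            return False
    return True


print(unicodedata.normalize("NFC", "\u09DC") == "\u09A1\u09BC")  # True
print(in_repertoire("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা: True
print(in_repertoire("\u098C"))  # excluded ঌ: False
```

The 61 single code points plus the three nukta sequences give the 64 rows of Table 8.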

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes two specific sequences for conditional inclusion in the repertoire: BENGALI LETTER A or BENGALI LETTER E, followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, and then by BENGALI VOWEL SIGN AA. These sequences enable the [æ] sound, as in English "bat", "cat", etc.

Sr. No. | Unicode code points | Sequence | Character names | Example languages using the sequence (not an exhaustive list) | Reference
S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code point | Glyph | Character name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It is used only in sequences forming the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode code point | Glyph | Character name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the vowel sequence and the consonant sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To facilitate understanding for users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). The number of D, B or X marks that can follow a V in Bangla is not necessarily restricted to one. Going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. However, it is valid and possible to have a candrabindu followed by an anusvara on the one hand, and a candrabindu followed by a visarga on the other, as in হ্যাঁংচা 'hæñchā' and হ্যাঁঃ 'hæñ' respectively. In other words, a Visarga or Anusvāra (onuʃʃār) can follow a Candrabindu in Bangla.
A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>; it has a special subjoined form (্য). When further combined with U+09BE া BENGALI VOWEL SIGN AA ('aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].

Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ
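The vowel-sequence grammar of 5.4.2 can be approximated as a regular expression: (V | VHCM) followed optionally by B, X, D, DB or DX. The Python sketch below assumes, following the Appendix (10.1, rule 3) and Table 9, that VHCM is limited to অ্যা and এ্যা. It is a reading aid rather than the normative rule set, and the names are illustrative.

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"  # independent vowels of Table 8
D, B, X = "\u0981", "\u0982", "\u0983"         # Candrabindu, Anusvara, Visarga
S = "[\u0985\u098F]\u09CD\u09AF\u09BE"          # VHCM, restricted to অ্যা / এ্যা

# (VHCM | V) followed optionally by B, X, D, DB or DX.
VOWEL_SEQUENCE = re.compile(f"(?:{S}|{V})(?:{B}|{X}|{D}[{B}{X}]?)?")


def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(s) is not None


print(is_vowel_sequence("\u0985\u0981\u0982"))  # অঁং (VDB): True
print(is_vowel_sequence("\u0985\u0982\u0981"))  # অংঁ (B before D): False
print(is_vowel_sequence("\u0985\u0982\u0982"))  # অংং (VBB): False
```

Writing DB and DX as "D followed by an optional B or X" encodes the point made above: a second mark is allowed only after the Candrabindu, never as a repeat.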

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama): 3(CH)C

Examples:
CHC ন্ত → ন + ্ + ত   न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র   न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য   न्त्र्य → न + ् + त + ् + र + ् + य

5.4.3.5 Subsets
[A] As a representative example, we consider only the combination CHC; the same is equally applicable to CHCHC and CHCHCHC. The combination may be followed by M, B, D, X, DB or DX.

Examples:
CHCM ক্কী → ক + ্ + ক + ী   क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং   क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ   क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ   क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং   क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ   क्कँः → क + ् + क + ँ + ः

[B] 3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:
CHCMB ক্কীং → ক + ্ + ক + ী + ং   क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ   क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ   क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং   क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ   क्काँः → क + ् + क + ा + ँ + ः
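The consonant-sequence grammar of 5.4.3 can be sketched the same way: either a pure consonant (CH), or up to three (CH) groups followed by C, then an optional M and an optional B, X, D, DB or DX. The Python sketch below treats the nukta forms ড় ঢ় য় as single consonants, as WLE rule 1 in Section 7 does. The pattern and function names are illustrative assumptions, not part of the proposal.

```python
import re

# Consonants of Table 8; a nukta form (ড় ঢ় য়) counts as one consonant.
C = "(?:[\u09A1\u09A2\u09AF]\u09BC|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])"
H = "\u09CD"                                       # Hasanta / Virama
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"     # kāra (Mātrā)
TAIL = "(?:\u0982|\u0983|\u0981[\u0982\u0983]?)"  # B | X | D | DB | DX

# Pure consonant CH, or (CH){0,3} C [M] [TAIL]; {{0,3}} is a literal {0,3}.
CONSONANT_SEQUENCE = re.compile(f"{C}{H}|(?:{C}{H}){{0,3}}{C}{M}?{TAIL}?")


def is_consonant_sequence(s: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(s) is not None


print(is_consonant_sequence("\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF"))  # ন্ত্র্য: True
print(is_consonant_sequence("\u0995\u09CD\u0995\u09BE\u0981\u0983"))      # ক্কাঁঃ: True
print(is_consonant_sequence("\u0995\u09C0\u09C0"))                        # কীী: False
```

The bound {0,3} on the (CH) group is what caps a cluster at four consonants.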

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition on KHANDA TA in this context is that C must be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
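The [CH]Z condition amounts to a simple check: whenever ৎ follows a Hasanta, the consonant before that Hasanta must be র or ৰ. A minimal Python sketch, assuming the label is already a valid code point sequence (the function name is hypothetical):

```python
HASANTA, KHANDA_TA = "\u09CD", "\u09CE"
RA_HASANTA = ("\u09B0\u09CD", "\u09F0\u09CD")  # র্ and ৰ্


def khanda_ta_ok(label: str) -> bool:
    """Every ৎ that follows a Hasanta must follow র্ or ৰ্."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and label[i - 1:i] == HASANTA:
            if label[max(0, i - 2):i] not in RA_HASANTA:
                return False
    return True


print(khanda_ta_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
print(khanda_ta_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ: False
```

A bare ৎ not preceded by a Hasanta is left to the other context rules (Section 7, rule 7).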

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. Notably, there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English "bat", "fat", etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a "vowel killer", although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case concerns the formation of repha and ra-phalā in this script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra "cycle"). The point is that in both cases the slot for ra can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.

6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes confusable forms into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, code points that look identical are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed to be treated as variants. These cases are not of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when two or more identical-looking forms are produced from different stored code point sequences. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
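The two "ko" inputs are also related by Unicode normalization: U+09CB BENGALI VOWEL SIGN O has the canonical decomposition <U+09C7, U+09BE>. Since IDNA2008 requires U-labels to be in NFC, the two-key input is folded into the single code point before any rule is applied, and the kāra-after-kāra prohibition covers any unnormalized input. A short Python illustration using only the standard `unicodedata` module:

```python
import unicodedata

one_key = "\u0995\u09CB"         # KA + VOWEL SIGN O (typed with one key)
two_keys = "\u0995\u09C7\u09BE"  # KA + VOWEL SIGN E + VOWEL SIGN AA (two keys)

# NFC composes the two-key input into U+09CB; NFD splits it back.
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True
print(unicodedata.decomposition("\u09CB"))                # 09C7 09BE
```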

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.

CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, can be somewhat confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). It can also be typed wrongly by a user who takes it for a হ (ha, U+09B9), increasing the risk of errors in writing and identifying labels.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)
Fonts that follow the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II
Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing "repha" (involving ra + Hasanta) and "ra-phalā" (involving Hasanta + ra). Repha can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and B_RA as well as A_RA refer to the same phonetic element. The final ligatures look the same:

(1) B_RA + H + C
(2) A_RA + H + C

where
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As ক and ঠ are shared by Bangla and Assamese and look exactly the same, the resulting combinations with repha look identical; the choice of ra for the repha makes no visible difference. Ra-phalā can be formed by two sequences on similar grounds, and the final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. Note that this will create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. However, blocking applies only at the label level: if someone else needs to register a different label, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
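A single র/ৰ variant set means that every position holding either letter can be swapped independently, so a label with n such letters has 2^n - 1 variant labels, not only the all-ৰ form. The Python sketch below enumerates them, assuming (for illustration) that all generated variants are blocked; the actual dispositions are those given in the proposal's XML file, and the names here are hypothetical.

```python
from itertools import product

# Hypothetical mapping for the single র <-> ৰ variant set.
VARIANT_OF = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}


def variant_labels(label: str) -> set:
    """Labels obtained by swapping any subset of র/ৰ, excluding `label` itself."""
    choices = [(ch, VARIANT_OF[ch]) if ch in VARIANT_OF else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


blocked = variant_labels("\u09B0\u09B0\u09B0")  # ররর
print(len(blocked))                     # 7
print("\u09F0\u09F0\u09F0" in blocked)  # ৰৰৰ: True
print("\u09F0\u09F0" in blocked)        # ৰৰ: False (a different label)
```

Because variants never change label length, "ৰৰ" and "ৰৰৰৰ" fall outside the blocked set, which matches the label-level blocking described above.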

6.2 Cross-Script Variants
A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in the domain name space. Apart from the two cases described in Section 6.1, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as cross-script variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.2) for reference.

¹¹ Unicode uses "Oriya" for the script, although "Odia" is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
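Tables 14 and 15 can be read as a lookup from a Bangla code point to its look-alike in each script. Because a root-zone label cannot mix scripts, a Bangla label can only have a whole-label look-alike in another script when every one of its code points has a counterpart there. The Python sketch below encodes that reading as an explicit assumption; the mapping transcribes Tables 14 and 15, and the function name is hypothetical.

```python
# Transcription of Tables 14 and 15: Bangla code point -> look-alike per script.
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম->म, ি->ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম->ਸ, ি->ਿ
}


def cross_script_variants(label: str) -> dict:
    """Whole-label look-alikes, assuming every code point needs a counterpart."""
    result = {}
    for script, mapping in CROSS_SCRIPT.items():
        if label and all(ch in mapping for ch in label):
            result[script] = "".join(mapping[ch] for ch in label)
    return result


# মি is stored as ম + ি (U+09AE U+09BF).
print(cross_script_variants("\u09AE\u09BF") == {"Devanagari": "\u092E\u093F",
                                                "Gurmukhi": "\u0A38\u0A3F"})  # True
print(cross_script_variants("\u0995"))  # {} (ক has no listed look-alike)
```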

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the code point repertoire table (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1 or S2 (from Table 9), i.e. the (æ) ya-phalā sequence V1 H C1 M1, where V1 is either U+0985 (অ BENGALI LETTER A) or U+098F (এ BENGALI LETTER E), H is U+09CD (্ BENGALI SIGN VIRAMA), C1 is U+09AF (য BENGALI LETTER YA) and M1 is U+09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either U+09B0 (র BENGALI LETTER RA) or U+09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is U+09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows.

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
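Rules 1 to 9 are all "what may come immediately before" constraints, so they can be checked in one left-to-right pass over a tokenized label. The Python sketch below is one possible reading, not the normative XML rules. It makes these assumptions: the label is NFC-normalized; S (অ্যা/এ্যা) and CN (ড় ঢ় য়) are treated as single tokens; S counts as M for whatever follows it, since it ends in া; and Z after a Hasanta counts as following P only when that Hasanta follows র or ৰ. All names are hypothetical.

```python
import unicodedata

CAT = {}  # Table 16 symbol for each Table 8 code point
for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]:
    CAT[chr(cp)] = "V"
for cp in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2,
           *range(0x09B6, 0x09BA), 0x09F0, 0x09F1]:
    CAT[chr(cp)] = "C"
for cp in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC]:
    CAT[chr(cp)] = "M"
CAT.update({"\u0981": "D", "\u0982": "B", "\u0983": "X",
            "\u09CD": "H", "\u09CE": "Z"})

S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # অ্যা এ্যা
NUKTA_BASES = "\u09A1\u09A2\u09AF"  # ড ঢ য (CN = base + U+09BC)
RA = {"\u09B0", "\u09F0"}           # র ৰ, the C2 of symbol P

# Rules 2-7: permitted category of the immediately preceding token.
ALLOWED_BEFORE = {
    "H": {"C"}, "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"}, "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}


def tokenize(label):
    """Split an NFC label into (category, text) tokens; S and CN are atomic."""
    s = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(s):
        if s.startswith(S_SEQUENCES, i):
            tokens.append(("S", s[i:i + 4]))
            i += 4
        elif s[i] in NUKTA_BASES and s[i + 1:i + 2] == "\u09BC":
            tokens.append(("C", s[i:i + 2]))  # rule 1: CN counts as C
            i += 2
        elif s[i] in CAT:
            tokens.append((CAT[s[i]], s[i]))
            i += 1
        else:
            raise ValueError(f"U+{ord(s[i]):04X} is not in the repertoire")
    return tokens


def wle_ok(label):
    prev, prev_text, before_prev = None, "", ""
    for cat, text in tokenize(label):
        if cat in ("V", "S") and prev == "H":  # rules 8 and 9
            return False
        effective = prev
        if cat == "Z" and prev == "H" and before_prev in RA:
            effective = "P"  # ৎ after র্ or ৰ্
        if cat in ALLOWED_BEFORE and effective not in ALLOWED_BEFORE[cat]:
            return False
        before_prev, prev_text = prev_text, text
        prev = "M" if cat == "S" else cat  # S ends with M (া)
    return True


print(wle_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: True
print(wle_ok("\u0995\u09CD\u0987"))                          # ক্ই: False (rule 8)
```

A label starting with H, M, D, X, B or Z fails automatically, because the "previous token" is then None, which no rule permits.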

Now let us elaborate on the rules with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures: any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning "Bank of India". Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered sufficient to represent its intended sound.
This situation arises only because the hyphen, the space and the Zero Width Non-Joiner are not in the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word.
Examples:
্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक
As can be seen, such a combination automatically results in a "golu", or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.
Examples:
অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid.
Examples:
কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one.
Example: কীী कीी

5. M is not permitted after V.
Example: ইা, ঈৌ  ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Examples:
কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr. Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr. Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr. Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms. Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr. Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr. Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr. Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr. Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr. Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr. Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr. Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr. Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr. Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms. Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr. Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr. Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms. Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr. Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr. Haseeb Rahman, CEO, Professionals' Systems, Dhaka

53

9 References[100] UnicodeConsortium2017UnicodeStandard100MountainViewCA

[101] BandyopadhyayChittaranjan1981DuiShatakerBanglaMudranoPrakashanKolkataAnandaPublishers

[102] BanerjiRD1919TheOriginoftheBengaliScriptKolkataNewDelhiAsianEducationalServices2003reprint

[103] ChatterjiSK1926TheOriginandDevelopmentoftheBengaliLanguageCalcuttaCalcuttaUniversityPress

[104] -----1939Bhasha-prakashBangalaVyakaran(AGrammaroftheBengaliLanguage)CalcuttaUniversityofCalcutta

[105] HaiMuhammadAbdul1964DhvaniVijnanOBanglaDhvani-tattwa(PhoneticsandBengaliPhonology)DhakaBanglaAcademy

[106]JhaSubhadra1958TheFormationofMaithiliLondonLuzacampCo

[107] KosticDjordjeDasRheaS1972AShortOutlineofBengaliPhoneticsCalcuttaStatisticalPublishingCompany

[108] MajumdarRC1971HistoryofAncientBengalCalcuttaGBhardwaj

[109] MazumdarBijaychandra19202000TheHistoryoftheBengaliLanguage(ReprCalcutta1920ed)NewDelhiAsianEducationalServices

[110] PandeyAnshuman2001ProposaltoEncodetheTirhutaScriptinISOIEC10646

[111] PalPalashBaran2001DhwanimalaBarnamalaKolkataPapyrus

[112] -----2007lsquoBanglaHarapherPanchParbarsquoInSwapanChakrabortyedMudranerSanskritiOBanglaBoiKolkataAbabhas

[113] RossFiona1999ThePrintedBengaliCharacteranditsEvolutionLondonCurzon

[114] ShastriMahamahopadhyayHaraPrasad1916HājārBacharērPurāṇaBāṅgālāBhāṣāyBauddhaGānōDōhāCalcuttaBangiyaSahityaParishat

[115] SinghUdayaNarayana(JointlyManiruzzaman)1983DiglossiainBangladeshandlanguageplanningCalcuttaGyanBharati214pp

[116] -----1987ABibliographyofBengaliLinguisticsMysoreCIILxii+316pp

[117] -----2017(withRajibChakrabortyBidishaBhattacharjeeampArimardanKumarTripathy)LanguagesandCulturesontheMarginGuidelinesforFieldworkonEndangeredLanguagesMimeoCentreforEndangeredLanguagesVisva-Bharati

54

[118] -----1980ScriptalchoiceandspellingreformAnessayinlanguageandplanningJournaloftheMSUniversityofBarodaSocialScienceNumber292173-186AmodifiedversionreprintedEAnnamalaiBjornJernuddandJoanRubinedsLanguagePlanningProceedingsofanInstituteMysoreCIIL405-417

[119] Sripantha1996JakhanChapakhanaEloKolkataPaschim-BangaBanglaAcademy

[120] SurAtul1986BanglaMudranerDushoBacharKolkataJijnasa

[121] ScriptBehaviourforBengaliVersion11TDILandC-DACPune

[122] BoraMahendra1981TheEvolutionofAssameseScriptJorhatAssamSahityaSabha

[123] ProposaltoEncodetheTirhutaScriptinISOIEC10646httpwwwunicodeorgL2L201111175r-tirhutapdfaccessedon25112017

[124] EthnologueAssameseintheLanguageCloudhttpswwwethnologuecomcloudasmaccessedon25112017

[125] BengalialphabetforManipurifoundinEthnologueManipuri(MeeteilonMeithei)httpswwwomniglotcomwritingmanipurihtmaccessedon20102019

[126] WikipediaBengalialphabethttpsenwikipediaorgwikiBengali_alphabetaccessedon25112017

[129]OmniglotSlyhetihttpwwwomniglotcomwritingsylotihtmaccessedon1052018

[130]WikipediaBishnupriyaManipurilanguagehttpsenwikipediaorgwikiBishnupriya_Manipuri_languageaccessedon25112017

[131] TheEMILLECIILCorpushttpmetashareeldaorgrepositorybrowsethe-emilleciil-corpusabdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20accessedon1052018

[132]TheEMILLECIILCorpushttpcatalogelrainfoproduct_infophpproducts_id=696accessedon1052018

[133] BanglaLanguageampScript httpswwwisicalacin~rc_banglabanglahtmlaccessedon1052018

[134] Sarkar, Pabitra (1992). Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra (1993). Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri (1998). Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan (1957). ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke (2015). ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury (1960). ‘The Phonemes of Bengali’. Language 36(1): 22-59.

[141] Shahidullah, Muhammad (2007). Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka (1966). Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul (1960). A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.) (1942). Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar (1975). Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.) (2014). Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra [2013]. ‘Bangla Spelling Reform: The Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy (2012). Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty (2018). “What has happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium (2018). The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. Restriction rules are put in place to handle such cases, and they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichu Chor’ written by Kazi Nazrul Islam. There are also instances of it being used at the beginning of names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র) (U+09B0), giving র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
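Rule 1 above can be checked mechanically by looking at the first code point of a label. The following is a minimal, hypothetical sketch and is not taken from the proposal's XML LGR. It covers only the three characters the rule names (ৎ, ঞ, ঙ). It does not model the transliteration cases mentioned for ৎস or the remarks on an initial য়.

```python
# Hypothetical check for Appendix-I rule 1: characters that may not begin a label.
KHANDA_TA = "\u09CE"  # ৎ BENGALI LETTER KHANDA TA
NYA = "\u099E"        # ঞ BENGALI LETTER NYA
NGA = "\u0999"        # ঙ BENGALI LETTER NGA (velar nasal)

FORBIDDEN_INITIAL = frozenset({KHANDA_TA, NYA, NGA})


def violates_initial_rule(label: str) -> bool:
    """Return True if the label starts with a character barred by rule 1."""
    return bool(label) and label[0] in FORBIDDEN_INITIAL


if __name__ == "__main__":
    for label in ("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982",  # ৎসেরিং (Tsering)
                  "\u0995\u09BE"):                        # কা
        print(label, violates_initial_rule(label))
```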

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’

This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. Sylheti Nagarī or Siloti was known by several other names (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others give the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points

The following code points were analysed. Each was found to be either (a) distinguishable or (b) confusable, but not to a degree that warrants defining them as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali consonants and their allographs

Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ; ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts aelig is replaced by ং) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When it is added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + ্ + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র + য) [র‍্য on its own is never used in Bangla orthography. র‍্যা is used, but in that case its last two symbols, ya-phala and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:
i. রেফ (repʰ) as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (earlier র্দ্ধ্ব = র + দ + ধ + ব, a four-term cluster), etc. (placed over the following consonant)

ii. র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to control the rendering of a string where the script has a choice between joining and non-joining forms. Without these control codes, the string may be rendered in a different form from the one intended.

Use of ZWJ

• As far as Bangla is concerned, ZWJ is used for the proper rendering of characters such as khaṇḍa-ta ৎ, as in সত্যজিৎ (satyajit) “Satyajit” and সৎ (sat) “honest”. This is typed as follows: ta + Hasanta + ZWJ (U+200D).

• However, ZWJ is more important where the same combination of consonantal characters is represented differently depending on the context. For example, র + ্ + য has two representations in Bangla: র্য and র‍্য. To get the form র্য, one types র + ্ + য, but for র‍্য the sequence is র + ZWJ + ্ + য [154]. In other words, ZWJ is used to render words that demand ya-phalā after ra. Without it, such words cannot be typed (rendered), because ra + hasanta + antastha ja in medial and/or final position already has another meaning: it is used to type repha on the consonant antastha ja, as in কার্য (kaarjo). To get a ya-phalā after the consonant ra, it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার (wrapper), র‍্যাশ (rash), র‍্যালি (rally), etc. The typing sequence is given below:

ra (র) + ZWJ + hasanta (্) + antastha ja (য) = র‍্য

Use of ZWNJ

• In Bangla, ZWNJ is used to represent the explicit Hasanta or Halant. Where an explicit hasanta stands before the following consonant, ZWNJ is inserted to prevent conjunct formation:

Consonant + hasanta + ZWNJ + consonant = explicit hasanta. Example: প্রাক্‌কথন (prakkathana, prakkɔtʰon)

The use of ZWJ/ZWNJ has been ruled out of the root zone by the [Procedure]. Because these two signs are used in Bangla to create alternate renderings, inserting them can affect searching as well as NLP.


The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where the default conjunct formation must be explicitly blocked and the Hasanta joining the two consonants must be shown explicitly.
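The two typing sequences for র + ্ + য can be compared at the code point level. In the minimal sketch below, the variable and function names are illustrative and do not come from the LGR XML. The only difference between the two spellings is the ZWJ. As we read the root zone exclusion of ZWJ and ZWNJ, one consequence is that the ra + ya-phalā spelling cannot appear as a distinct root zone label.

```python
import unicodedata

RA = "\u09B0"      # র BENGALI LETTER RA
VIRAMA = "\u09CD"  # ্ Hasanta
YA = "\u09AF"      # য BENGALI LETTER YA
ZWJ = "\u200D"
ZWNJ = "\u200C"

repha_on_ya = RA + VIRAMA + YA            # typed for repha over ya, as in কার্য
ra_with_yaphala = RA + ZWJ + VIRAMA + YA  # typed for ya-phala after ra, as in র‍্যাপার


def contains_joiner(label: str) -> bool:
    """True if the label contains ZWJ or ZWNJ, neither of which the root zone permits."""
    return ZWJ in label or ZWNJ in label


def code_points(s: str) -> list:
    """List each character as 'U+XXXX NAME' using the Unicode Character Database."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


if __name__ == "__main__":
    for s in (repha_on_ya, ra_with_yaphala):
        print(code_points(s), "contains joiner:", contains_joiner(s))
    # Removing the joiner collapses both spellings into the same code point sequence.
    print(ra_with_yaphala.replace(ZWJ, "") == repha_on_ya)
```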

3.3.9 Use of Ya-phalaa

The Ya-phalaa sequences are the two cases in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):

• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

To render Ya-phalā after অ and এ, one types U+09CD Hasanta plus U+09AF ya after the vowel. This is a purely ligatural entity. The added Ya-phalā and ā-kāra produce the æ sound, as in English acid অ্যাসিড, association অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. By nature, the Brāhmī script does not place Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As we see here, however, Bangla orthography has a deviant feature: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
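Since Hasanta may follow a full vowel only inside these two ligatures, a label checker can reject every other vowel + Hasanta combination. The sketch below is a hypothetical illustration. The vowel set is taken from the repertoire (Table 8). The normative treatment is in the WLE rules and Table 9 of the XML LGR.

```python
# Hypothetical check: Hasanta may follow a full vowel only inside
# the two ya-phala sequences অ্যা and এ্যা.
VOWELS = frozenset(chr(cp) for cp in (
    0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
    0x098F, 0x0990, 0x0993, 0x0994))
VIRAMA = "\u09CD"
ALLOWED = (
    "\u0985\u09CD\u09AF\u09BE",  # অ্যা: A + VIRAMA + YA + AA
    "\u098F\u09CD\u09AF\u09BE",  # এ্যা: E + VIRAMA + YA + AA
)


def vowel_virama_ok(label: str) -> bool:
    """Return False if a Hasanta follows a vowel outside the two permitted sequences."""
    for i in range(1, len(label)):
        if label[i] == VIRAMA and label[i - 1] in VOWELS:
            if label[i - 1:i + 3] not in ALLOWED:
                return False
    return True


if __name__ == "__main__":
    print(vowel_virama_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
```

Hasanta after a consonant, as in ব্যাট, is not affected by this check.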

3.3.10 Formation of Ra-phalaa and Ref Sequences

This case concerns the formation of repha and ra-phalā, as follows:

Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

When RA co-occurs with HASANTA, it either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”). The point is that in both cases the slot for ra can hold either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same in both cases. The LGR notes this concern about the two RAs in disguise, since it would be completely impossible to tell them apart with the naked eye in a label generated this way. This may in turn lead to spoofing and other kinds of cyber irregularities. The two code points are classed as (blocking) variants because fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). This is the justification for Variant Set 4, though only in the context of a following Hasanta. The two RAs can be distinguished only by looking at their Unicode values. Labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’, and শ্রম śrama ‘physical labour’ could therefore be extremely dangerous, as web users may never check the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47), which follow. It is also noteworthy that REPHA can occur with KHANDA TA. In this context, C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
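The variant relationship between the two RAs can be illustrated by computing, for a given label, the label that would be blocked alongside it. The sketch below is hypothetical. It swaps an RA only when the RA touches a Hasanta, i.e. in repha (RA + H) or ra-phalā (H + RA) position. Treating both positions as the variant context is our reading of the text above, which says the shapes are identical in both cases. The normative context is the one defined in the XML LGR.

```python
# Hypothetical sketch of the RA variant mapping (Variant Set 4).
RA_BN = "\u09B0"   # র BENGALI LETTER RA
RA_AS = "\u09F0"   # ৰ BENGALI LETTER RA WITH MIDDLE DIAGONAL
VIRAMA = "\u09CD"  # ্ Hasanta
SWAP = {RA_BN: RA_AS, RA_AS: RA_BN}


def ra_variant(label: str) -> str:
    """Return the label with every Hasanta-adjacent RA replaced by the other RA."""
    out = []
    for i, ch in enumerate(label):
        next_is_h = i + 1 < len(label) and label[i + 1] == VIRAMA
        prev_is_h = i > 0 and label[i - 1] == VIRAMA
        out.append(SWAP[ch] if ch in SWAP and (next_is_h or prev_is_h) else ch)
    return "".join(out)


if __name__ == "__main__":
    arka = "\u0985\u09B0\u09CD\u0995"  # অর্ক (repha)
    print([f"U+{ord(c):04X}" for c in ra_variant(arka)])
```

With blocked variants, once one of the two labels is delegated the other cannot be. The sketch only computes the counterpart and does not implement the allocation logic.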

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) is made up of members with experience in Linguistics (especially NLP and Computational Linguistics), Literature, Language, History and Epigraphy. Under the NBGP, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, and a separate LGR is being prepared for each. An attempt is made, however, to keep the fundamental philosophy behind these LGRs consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages at EGIDS levels 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about the Bangla LGR code points.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selecting code points for the code point repertoire, across the board, for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage

Every character proposed should be in everyday use in a particular linguistic community. Characters encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous Use

Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These limits consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope

The root zone code point repertoire is a very special case at the top of the protocol hierarchy. The set of characters available for selection is therefore already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart
If a character needed by the script in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol
Unicode is the character-encoding standard that aims at the maximum possible representation of a given script/language, and it has encoded, as far as possible, all the characters the script needs. Domain names, however, are a specialized case governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from domain names.

iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR defines the repertoire of characters to be used in creating root zone TLDs, which are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on the set of characters that IDNA allows.

Example: the Bangla Sign Avagraha ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.

To sum up, the restrictions start by admitting only characters that are part of the code block of the given script/language. The IDNA protocol narrows this set further, and finally the Maximal Starting Repertoire restricts the character set for the given language even more.

4.2.1.1 No Punctuation Marks

Since TLDs are identifiers, the punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations

Abbreviations, weights and measures, and other iconic characters such as BENGALI ISSHAR ৺ (U+09FA) and BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) will also not be included.

4.2.1.3 No Rare and Obsolete Characters

Some characters were added to Unicode to accommodate rare forms. These include the Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), as well as the allographic -kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, in compliance with the Conservatism principle laid down in the Root Zone LGR procedure. In Bangla, however, the -kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in a limited number of borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit will not be included. This is also in keeping with the Letter principle laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF

The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).

5 Repertoire

The Bangla writing system is represented in Unicode under the Bengali (Bangla) script name, as enumerated in ISO 15924, and covers languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block in Unicode, used for Bangla-Asamiyā-Manipuri, has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes for inclusion in the Bangla LGR.

It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. At the 8th Meeting of its Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd August 2017, the BIS decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, the code point repertoire in Table 8 will be assumed to be valid for all three languages above.

For each code point, language references are given in the last column, titled “References and Comment”, of Table 8, the “Code Point Repertoire”. For complete coverage of Bangla code points, references are given for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya. Kokborok written in the Bangla script is not known to introduce new complications, except for one particular character. Only a few representative languages at EGIDS levels 1-4 have been chosen for referencing, but together they cover all the code points required by all the languages the NBGP has considered, as listed under the Bangla Unicode code points (as given in Unicode 6.3).

Before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. The shapes of the reference glyphs in the code charts below are based on one of the many available fonts and are not prescriptive, since actual fonts, both Unicode-compatible and TrueType ones, may vary somewhat. Consider the following code point table:


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention: all characters included in the [MSR] - yellow background; PVALID in IDNA2008 but excluded from the [MSR] - pinkish background; not PVALID in IDNA2008 or ineligible for the root zone (digits, hyphen) - white background.

Given the Bangla Unicode block shown in Figure 1, the following code points included in the MSR need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]. 09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]. 09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]. 09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122] [125]

Table 8: Bangla Code-Point Repertoire
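The single code points of Table 8 can be collected into a set and used to screen a label. The following is a minimal sketch assuming the label is already in Unicode Normalization Form C, as IDNA requires. The set covers 61 code points, because rows 28, 30 and 43 are nukta sequences rather than single code points. The names `REPERTOIRE` and `in_repertoire` are illustrative only, and the sketch deliberately ignores the WLE rules.

```python
import unicodedata


def _span(lo: int, hi: int) -> set:
    """All characters from code point lo to hi inclusive."""
    return {chr(cp) for cp in range(lo, hi + 1)}


# Single code points listed in Table 8 (61 of the 64 rows; rows 28, 30, 43 are sequences).
REPERTOIRE = (
    _span(0x0981, 0x0983)                                              # candrabindu, anusvara, visarga
    | _span(0x0985, 0x098B) | {"\u098F", "\u0990", "\u0993", "\u0994"}  # vowels
    | _span(0x0995, 0x09A8) | _span(0x09AA, 0x09B0)                    # consonants
    | {"\u09B2"} | _span(0x09B6, 0x09B9)
    | _span(0x09BE, 0x09C4) | {"\u09C7", "\u09C8", "\u09CB", "\u09CC"}  # vowel signs (kara)
    | {"\u09CD", "\u09CE", "\u09F0", "\u09F1"}                         # virama, khanda ta, two RAs
)
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য: rows 28, 30 and 43


def in_repertoire(label: str) -> bool:
    """True if the NFC label uses only Table 8 code points and permitted nukta sequences."""
    if unicodedata.normalize("NFC", label) != label:
        return False
    for i, ch in enumerate(label):
        if ch == NUKTA:
            if i == 0 or label[i - 1] not in NUKTA_BASES:
                return False
        elif ch not in REPERTOIRE:
            return False
    return True


if __name__ == "__main__":
    print(len(REPERTOIRE), in_repertoire("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা
```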

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes specific sequences for the repertoire. These allow the conditional inclusion of Bangla LETTER A and LETTER E, each followed by Bangla SIGN VIRAMA, Bangla LETTER YA and Bangla VOWEL SIGN AA, so that the æ sound, as in English ‘bat’, ‘cat’, etc., can be written.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion

Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code Points | Glyph | Character Names | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code point not used alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It is used only within sequences, in the normalized forms of three special characters: ড়, ঢ়, য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively.

Table 10b: Excluded Code Points
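The reason U+09DC, U+09DD and U+09DF cannot be used directly is visible in Unicode normalization. These three characters are composition exclusions, so Normalization Form C (which IDNA labels must be in) always yields the decomposed base + nukta sequence. A short check with Python's standard `unicodedata` module:

```python
import unicodedata

# Precomposed nukta letters and the NFC form that a label must actually use.
PRECOMPOSED = {0x09DC: "RRA", 0x09DD: "RHA", 0x09DF: "YYA"}

if __name__ == "__main__":
    for cp, name in PRECOMPOSED.items():
        nfc = unicodedata.normalize("NFC", chr(cp))
        print(name, " ".join(f"U+{ord(c):04X}" for c in nfc))
```

The decomposed sequences remain decomposed under NFC. This is why rows 28, 30 and 43 of Table 8 list them as sequences and why the nukta appears only in that context.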

5.4 The Basis of the Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (in different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. Number | Operator | Function
1 | “|” | Alternative
2 | “[ ]” | Optional
3 | “*” | Variable Repetition
4 | “( )” | Sequence Group

Table 11: The ABNF Formalism
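To make the four operators concrete, they can be mapped onto regular-expression syntax and applied to strings of ABNF variable letters (for example “CHC” for ন্ত). The helper names below are hypothetical. The mapping is a sketch for illustration and is not how the XML LGR expresses its rules.

```python
import re


# Hypothetical helpers mapping the Table 11 operators onto Python regex syntax.
# They build patterns over strings of ABNF variable letters (C, V, M, B, D, X, H, Z).
def alt(*parts: str) -> str:
    """“|”: alternative."""
    return "(?:" + "|".join(parts) + ")"


def optional(part: str) -> str:
    """“[ ]”: optional."""
    return "(?:" + part + ")?"


def repeat(part: str, most: int) -> str:
    """“*n”: variable repetition, zero to n times."""
    return "(?:" + part + "){0," + str(most) + "}"


def group(*parts: str) -> str:
    """“( )”: sequence group."""
    return "(?:" + "".join(parts) + ")"


# *3(CH)C: up to three Consonant+Hasanta pairs followed by a Consonant
CLUSTER = re.compile(repeat(group("C", "H"), 3) + "C")
# [D](B|X): optional Candrabindu, then Anusvara or Visarga
SIGNS = re.compile(optional("D") + alt("B", "X"))

if __name__ == "__main__":
    for s in ("C", "CHC", "CHCHCHC", "CHCHCHCHC"):
        print(s, bool(CLUSTER.fullmatch(s)))
```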


5.4.2 The Vowel Sequence

The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of Ds, Bs or Xs that can follow a V in Bangla may not be restricted to one. Going by the rules illustrated in this document, however, formations such as VDD, VBB and VXX are clearly invalid orthographic units. It is valid and possible to have sequences such as an anusvāra followed by a candrabindu on the one hand and a visarga followed by a candrabindu on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. A Visarga or Anusvāra (onuʃʃār) can also follow a Candrabindu in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. Ya-phala is a presentation form of U+09AF, the Bangla letter য or ‘ya’, represented by the sequence <U+09CD ্ BENGALI SIGN VIRAMA (Bangla Hasanta or Virama), U+09AF য BENGALI LETTER YA>. Ya-phala has a special form, ্য. When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used to transcribe [æ], like the “a” in the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Examples: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
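Sections 5.4.2.1 to 5.4.2.3 together describe a small regular language over the ABNF variable letters. The sketch below works on letter strings rather than code points, which is a simplification on our part. At this level it cannot enforce that the HCM part is exactly the ya-phalā + ā-kāra of Table 9. It follows the formal list (DB and DX) and does not model the anusvāra/visarga + candrabindu order mentioned in the prose above.

```python
import re

# Hypothetical encoding of Sections 5.4.2.1 to 5.4.2.3 over ABNF variable letters:
# V, optionally HCM (ya-phala + a-kara), optionally D, then optionally B or X.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?D?[BX]?")

VALID = ["V", "VB", "VD", "VX", "VDB", "VDX",
         "VHCM", "VHCMB", "VHCMD", "VHCMX", "VHCMDB", "VHCMDX"]
INVALID = ["VDD", "VBB", "VXX", "VHC"]

if __name__ == "__main__":
    for s in VALID + INVALID:
        print(s, bool(VOWEL_SEQUENCE.fullmatch(s)))
```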

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:

CM কি कि
CB কং कं

CD কঁ कँ
CX কঃ कः
CH ক্ क् (Pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.
Examples
CMB  কীং  कीं
CMD  কাঁ  काँ
CMX  বীঃ  वीः
CMDB  কাঁং  काँं
CMDX  কাঁঃ  काँः

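5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama), i.e. up to three (CH) units followed by a final C, written 3(CH)C.
Examples
CHC  ন্ত → ন + ্ + ত  न्त → न + ् + त
CHCHC  ন্ত্র → ন + ্ + ত + ্ + র  न्त्र → न + ् + त + ् + र
CHCHCHC  ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য  न्त्र्य → न + ् + त + ् + र + ् + य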

5.4.3.5 Subsets
In considering the subsets, we take the combination CHC as a representative example; the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.
Examples
CHCM  ক্কী → ক + ্ + ক + ী  क्की → क + ् + क + ी
CHCB  ক্কং → ক + ্ + ক + ং  क्कं → क + ् + क + ं
CHCD  ক্কঁ → ক + ্ + ক + ঁ  क्कँ → क + ् + क + ँ
CHCX  ক্কঃ → ক + ্ + ক + ঃ  क्कः → क + ् + क + ः
CHCDB  ক্কঁং → ক + ্ + ক + ঁ + ং  क्कँं → क + ् + क + ँ + ं
CHCDX  ক্কঁঃ → ক + ্ + ক + ঁ + ঃ  क्कँः → क + ् + क + ँ + ः
[B] 3(CH)CM may further be followed by B, D, X, DB or DX.
Examples
CHCMB  ক্কীং → ক + ্ + ক + ী + ং  क्कीं → क + ् + क + ी + ं
CHCMD  ক্কাঁ → ক + ্ + ক + া + ঁ  क्काँ → क + ् + क + ा + ँ
CHCMX  ক্কীঃ → ক + ্ + ক + ী + ঃ  क्कीः → क + ् + क + ी + ः
CHCMDB  ক্কাঁং → ক + ্ + ক + া + ঁ + ং  क्काँं → क + ् + क + ा + ँ + ं
CHCMDX  ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ  क्काँः → क + ् + क + ा + ँ + ः
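Sections 5.4.3.1 to 5.4.3.5 can likewise be condensed into one pattern over the category symbols of Table 16. This is a sketch under the assumption that only the combinations listed above are intended; Khanda-Ta and the S/P special cases are handled separately in the text and are not modelled, and `is_consonant_sequence` is a hypothetical name.

```python
import re

# Optional trailing signs: B, D, X, DB or DX.
SIGNS = r"(?:B|D|X|DB|DX)?"
# Up to three (CH) units, a final C, then either a Hasanta (pure consonant)
# or an optional kāra followed by optional signs.
CONSONANT_SEQUENCE = re.compile(rf"(?:CH){{0,3}}C(?:H|M?{SIGNS})")


def is_consonant_sequence(categories: str) -> bool:
    """Return True if a string of category symbols is a listed consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(categories) is not None


for seq in ["C", "CH", "CMDX", "CHCHCHC", "CHCMB", "CHCHCHCHC", "CMM", "CBD"]:
    print(seq, is_consonant_sequence(seq))
```

The `{0,3}` bound encodes the limit of four consonants per cluster; a fifth consonant (CHCHCHCHC) is rejected, as are a doubled kāra (CMM) and signs in the wrong order (CBD).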

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single ‘Khanda’-Ta (Z)
Example
Z  ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama): [CH]Z
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition in this context of Khanda-Ta is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
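The condition on [CH]Z can be checked directly on code points. A minimal sketch, assuming labels are plain Python strings; `khanda_ta_clusters_ok` is a hypothetical helper, and a Khanda-Ta that is not preceded by a Hasanta is left to the other rules.

```python
RA_LETTERS = {"\u09B0", "\u09F0"}  # র (Bangla RA), ৰ (Assamese RA)
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"


def khanda_ta_clusters_ok(label: str) -> bool:
    """Check that every C + Hasanta + Khanda-Ta cluster uses র or ৰ as its C."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            # The consonant before the Hasanta must be one of the two RAs.
            if i < 2 or label[i - 2] not in RA_LETTERS:
                return False
    return True


print(khanda_ta_clusters_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # True: ভর্ৎসনা
print(khanda_ta_clusters_ok("\u0995\u09CD\u09CE"))                          # False: ক্ৎ
```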

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case, referred to as P in Table 16, concerns the formation of repha and ra-phalā in the script. Owing to co-occurrence with Hasanta, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra ‘cycle’). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.
10 Refer to Rule P in Section 7, Table 16.
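Spelling out the stored code points makes the two special cases concrete: S is a four-code-point sequence that starts with a full vowel, and the two repha spellings differ only in which RA they store. A minimal sketch using Python's standard `unicodedata` module; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(text: str) -> str:
    """List each code point of `text` with its Unicode character name."""
    return " + ".join(f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text)


# S: the Ya-phalā sequences after a full vowel
S1 = "\u0985\u09CD\u09AF\u09BE"        # অ্যা
S2 = "\u098F\u09CD\u09AF\u09BE"        # এ্যা
# P: repha over ক, formed with the Bangla RA and with the Assamese RA
REPHA_BANGLA = "\u09B0\u09CD\u0995"    # র্ক
REPHA_ASSAMESE = "\u09F0\u09CD\u0995"  # ৰ্ক

for seq in (S1, S2, REPHA_BANGLA, REPHA_ASSAMESE):
    print(describe(seq))
print(REPHA_BANGLA == REPHA_ASSAMESE)  # False: same shape, different code points
```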


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more identical-looking forms are produced with different storage, i.e. different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type the o-kāra with a consonant in one go, or obtain the same result by typing the e-kāra and the ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
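The ko example can be checked directly. As background not stated in the text above, Unicode defines U+09CB BENGALI VOWEL SIGN O as canonically equivalent to U+09C7 followed by U+09BE, and IDNA2008 requires labels to be in Normalization Form C, so the two-key input is folded into the single code point before any LGR rule applies. A minimal sketch using Python's standard `unicodedata` module:

```python
import unicodedata

KO_ONE_KEY = "\u0995\u09CB"          # ক + ো (U+09CB BENGALI VOWEL SIGN O)
KO_TWO_KEYS = "\u0995\u09C7\u09BE"   # ক + ে (U+09C7 e-kāra) + া (U+09BE ā-kāra)

print(KO_ONE_KEY == KO_TWO_KEYS)                                # False: different storage
print(unicodedata.normalize("NFC", KO_TWO_KEYS) == KO_ONE_KEY)  # True: composed to U+09CB
print(unicodedata.normalize("NFD", KO_ONE_KEY) == KO_TWO_KEYS)  # True: decomposes back
```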

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example 1). It could also be typed wrongly as হ (ha, U+09B9) by someone who takes it for one, increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in গ্রন্থ grantha)
The fonts which represent the traditional Bangla writing system could tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.
CASE II: Another interesting example of variants is encountered in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing ‘repha’ (involving ra + Hasanta) and ‘ra-phalā’ (involving Hasanta + ra). ‘Repha’ could be formed by two sequences (mainly because Assamese and Bangla share the same Unicode code points, and ‘B_RA’ as well as ‘A_RA’ refer to the same phonetic element). Here the final ligatures look the same and will be as follows:

(1) B_RA + H + C
(2) A_RA + H + C
Where
B_RA = U+09B0 BENGALI LETTER RA (র), or A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ
Table 12: Example of Repha

Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with Repha look identical; the addition of Repha does not make any difference. ‘Ra-phalā’ could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA
Where C1 = any consonant except Khanda-ta

Example
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ
Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. But this will create blocked variant labels: e.g. if someone registers “ররর”, variant labels such as “ৰৰৰ” will be generated and blocked, and vice versa. However, blocking happens only at the label level: if someone else needs to register other labels, e.g. “ৰৰ” or “ৰৰৰৰ”, it is still possible.
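The blocking behaviour just described can be illustrated by enumerating the variant labels of a label when র and ৰ are mapped to each other. This is a minimal sketch under two assumptions: the mapping applies at every position (Section 3.3.10 suggests restricting it to the Hasanta context, which is not modelled here), and every generated variant is blocked. `variant_labels` is a hypothetical helper, not part of any LGR tool.

```python
from itertools import product

# Hypothetical variant table: র (U+09B0) and ৰ (U+09F0) map to each other.
VARIANTS = {"\u09B0": ("\u09B0", "\u09F0"), "\u09F0": ("\u09F0", "\u09B0")}


def variant_labels(label: str) -> set[str]:
    """Return every label obtained by independently swapping each র/ৰ,
    excluding the original label itself (all assumed to be blocked)."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


blocked = variant_labels("\u09B0\u09B0\u09B0")   # ররর
print(len(blocked))                                # 7
print("\u09F0\u09F0\u09F0" in blocked)             # True: ৰৰৰ is blocked
print("\u09F0\u09F0" in blocked)                   # False: ৰৰ stays registrable
```

A label of three RAs yields 2³ − 1 = 7 blocked variants, including the all-Assamese “ৰৰৰ”; a label of a different length never appears, which is why “ৰৰ” and “ৰৰৰৰ” remain available.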

6.2 Cross-Script Variants
A concise cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in the domain name space. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses ‘Oriya’ for the script, although ‘Odia’ is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script
Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F
Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script
Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F
Table 15: Bangla and Gurmukhī cross-script variant code points
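Tables 14 and 15 can be read as lookup tables from Bangla code points to their cross-script counterparts. The sketch below assumes, as a simplification not stated in the text, that a Bangla label only has a cross-script counterpart when every one of its code points appears in the table. The keys "Deva" and "Guru" are the ISO 15924 codes; `cross_script_variant` is a hypothetical helper.

```python
from typing import Optional

# Tables 14 and 15: Bangla code point -> visually similar code point in the
# other script, keyed by ISO 15924 code.
CROSS_SCRIPT = {
    "Deva": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Guru": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},  # ম -> ਸ, ি -> ਿ
}


def cross_script_variant(label: str, script: str) -> Optional[str]:
    """Map every code point of a Bangla label into `script`, or return None
    if any code point has no listed counterpart there."""
    table = CROSS_SCRIPT[script]
    if not all(ch in table for ch in label):
        return None
    return "".join(table[ch] for ch in label)


print(cross_script_variant("\u09AE\u09BF", "Deva") == "\u092E\u093F")  # True: মি -> मि
print(cross_script_variant("\u0995", "Deva"))                          # None: ক is not listed
```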

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table provided under Code point repertoire (Section 5.1):

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1 or S2 (from Table 9), the (æ) Ya-phalā sequences V1 H C1 M1, where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)
Table 16: Symbols used in WLE rules
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also perhaps worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters” that could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”, of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be seen in Pabitra Sarkar (1993) [135].
7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, below are the specific WLE rules that need to be implemented:

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures, because any user can easily create them even if they are not attested.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning ‘Bank of India’). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.

This is a unique situation necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the set of characters permitted in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, a future NBGP may consider revisiting this rule.
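The rules of Section 7.1 can be sketched as a checker over the category symbols of Table 16. This is an illustrative sketch, not the normative LGR. The code point sets for V, C and M approximate the repertoire for the sake of the example. S is treated like M for the characters that may follow it, since S ends with M. A consonant plus nukta stands in for CN. Labels are NFC-normalized first, so ড় (U+09DC) arrives as ড plus nukta. All helper names are hypothetical.

```python
import unicodedata

# Assumed code point sets (the normative repertoire is the LGR XML file).
VOWELS = {0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
          0x098F, 0x0990, 0x0993, 0x0994}
MATRAS = {0x09BE, 0x09BF, 0x09C0, 0x09C1, 0x09C2, 0x09C3,
          0x09C7, 0x09C8, 0x09CB, 0x09CC}
SIGNS = {0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
NUKTA = 0x09BC
RA = {"\u09B0", "\u09F0"}                                            # C2 of symbol P
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা

# Rules 2-7: categories allowed immediately before each category.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "S"},
}


def is_consonant(cp: int) -> bool:
    return (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
            or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1))


def tokenize(label: str) -> list[tuple[str, str]]:
    """Split an NFC label into (category, text) tokens; S and C+nukta are single tokens."""
    label = unicodedata.normalize("NFC", label)
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            tokens.append(("S", label[i:i + 4]))
            i += 4
            continue
        cp = ord(label[i])
        if cp == NUKTA and tokens and tokens[-1][0] == "C":
            tokens[-1] = ("C", tokens[-1][1] + label[i])  # rule 1: CN counts as C
        elif cp in VOWELS:
            tokens.append(("V", label[i]))
        elif is_consonant(cp):
            tokens.append(("C", label[i]))
        elif cp in MATRAS:
            tokens.append(("M", label[i]))
        elif cp in SIGNS:
            tokens.append((SIGNS[cp], label[i]))
        else:
            raise ValueError(f"U+{cp:04X} is outside this sketch's repertoire")
        i += 1
    return tokens


def passes_wle(label: str) -> bool:
    """Apply WLE rules 2-9 of Section 7.1 to a label."""
    tokens = tokenize(label)
    for i, (cat, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        if cat in ("V", "S"):
            if prev == "H":                          # rules 8 and 9
                return False
        elif cat != "C":
            if prev is None:                         # H, M, B, D, X, Z cannot start a label
                return False
            if cat == "Z" and prev == "H":           # rule 7 via P (Ra-Hasanta)
                if i < 2 or tokens[i - 2][1] not in RA:
                    return False
            elif prev not in ALLOWED_BEFORE[cat]:    # rules 2-7
                return False
    return True
```

For example, `passes_wle("\u0995\u0981\u0982")` (কঁং, CDB) is accepted, while কংঃ, ক্অ and ক্ৎ are rejected by rules 5, 8 and 7 respectively. The same checker also rejects the invalid formations listed in 7.2.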

7.2 Additional Examples from Bangla ABNF
Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:
্ক  ्क
াক  ाक
ংক  ंक
ঁক  ँक
ঃক  ःक
As can be seen, such combinations automatically result in a “golu” or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever supported by the OS.
2. H is not permitted after V, B, D, X, M or S. Example:
অ্  अ्
অং্  अं्
কঁ্  कँ्
কঃ্  कः्
কি্  कि्
3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid. Example:
কংং  कंं
কঁঁ  कँँ
কঃঃ  कःः
কাঁঁ  काँँ
কীঃঃ  कीःः
অংং  अंं
অঁঁ  अँँ
অঃঃ  अःः
4. The number of M permitted after a consonant is restricted to one. Example:
কীী  कीी
5. M is not permitted after V. Example:
ইা ঈৗ  ईा ईौ
6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible. Example:
কংঃ  कंः
কঃং  कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India

Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka

9 References
[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.
[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.
[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.
[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.
[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.
[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.
[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.
[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.
[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.
[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.
[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.
[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.
[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.
[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.
[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.
[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.
[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.
[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.
[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 292, 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.
[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.
[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.
[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.
[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.
[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.
[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.
[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.
[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.
[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.
[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.
[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.
[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.
[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.
[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.
[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1123-45.
[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 121-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study
[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 12 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.
[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.
[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.
[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language Vol. 36, No. 1, pp. 22-59.
[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.
[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.
[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.
[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.
[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.
[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.
[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.
[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.
[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.
[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.
[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa-ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, but we can cite a couple of native exceptions, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍa-ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র) (09B0):
র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
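The three Bangla-specific restrictions above can be sketched as an extra check run alongside the WLE rules. Assumptions: the restriction on an initial য় is not enforced, because the text itself cites attested exceptions; the recent ৎস- transliterations are rejected like any other initial ৎ; restriction 2 follows the Bangla-only wording (র, U+09B0) rather than the RA pair used for Assamese in Section 7. The name `appendix_restrictions_ok` is hypothetical.

```python
HASANTA, KHANDA_TA = "\u09CD", "\u09CE"
YA, AA, RA_BANGLA = "\u09AF", "\u09BE", "\u09B0"
# Independent vowels U+0985..U+0994; unassigned slots in the range are harmless.
VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}
NOT_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}  # ৎ, ঞ, ঙ (restriction 1)


def appendix_restrictions_ok(label: str) -> bool:
    """Check restrictions 1-3 of Appendix 10.1 on a Bangla label."""
    if not label or label[0] in NOT_INITIAL:
        return False
    for i, ch in enumerate(label):
        if ch != HASANTA or i == 0:
            continue
        prev = label[i - 1]
        # Restriction 3: a Hasanta after a vowel is allowed only in অ্যা and এ্যা.
        if prev in VOWELS and not (prev in ("\u0985", "\u098F")
                                   and label[i + 1:i + 3] == YA + AA):
            return False
        # Restriction 2: C + Hasanta + Khanda-Ta only when C is র (U+09B0).
        if label[i + 1:i + 2] == KHANDA_TA and prev != RA_BANGLA:
            return False
    return True


print(appendix_restrictions_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # True: অ্যাসিড
print(appendix_restrictions_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))        # False: ৎসেরিং
```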

10.2 ‘Sylheti Nagarī Lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924), used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar and in wide use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).
13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī
Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable
Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhī
Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable
Table 19: Bangla and Gurmukhī confusable code points
Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable
Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)
Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable
Table 21: Bangla and Oriya confusable code points
Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable
Table 22: Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali Consonants and their Allographs
Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ  dʒʱ  (not privileged enough to have clusters as a first member)  জ্ঝ (জ্ + ঝ), ঞ্ঝ (ঞ্ + ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ  kʰ  (not privileged enough to have clusters as a first member)  ঙ্খ (ঙ্ + খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ  This letter does not have any particular phonetic value, but it is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং  ŋ  ঙ্ক (ঙ্ + ক), ঙ্ক্র (ঙ্ + ক্ + র), ঙ্খ (ঙ্ + খ), ঙ্গ (ঙ্ + গ), ঙ্ঘ (ঙ্ + ঘ), ঙ্ক্ষ (ঙ্ + ক্ + ষ) (In some contexts ঙ is replaced by ং, e.g. কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ়  ɽʱ  (not privileged enough to have clusters)

য  dʒ  The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.
ক্য (ক্ + য), স্য (স্ + য), র‍্য (র + য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র  r  Two manifestations:
i. রেফ repʰ, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (earlier র্দ্ধ্ব, a four-term cluster), etc. (placed over the following consonant)
ii. র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ  h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


The Zero Width Non-Joiner (ZWNJ) is an invisible character used in certain cases (after Hasanta) where default conjunct formation is to be explicitly prevented and the Hasanta joining the two consonants that would otherwise form the conjunct needs to be explicitly shown.

3.3.9 Use of Ya-phalā
The Ya-phalā sequences are the two instances in Bangla where Hasanta is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E):
• অ্যা 0985 09CD 09AF 09BE: BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA
• এ্যা 098F 09CD 09AF 09BE: BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA

For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘acid’ অ্যাসিড, ‘association’ অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

3.3.10 Formation of Ra-phalā and Repha Sequences
This case refers to the formation of repha and ra-phalā as follows:
Ra-Hasanta = (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).
Owing to co-occurrence with Hasanta, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra ‘cycle’). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two code points as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasanta. The difference between the RAs is only distinguishable if one looks into their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicit with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE symbols, p. 47) that follow. Moreover, it is noteworthy that REPHA can also occur with KHANDA TA. The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) has been formed by members with experience in Linguistics (especially NLP/Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up, with a separate LGR assigned to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent across all Brāhmī-derived scripts. The present LGR will cater to multiple languages at EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles

The NBGP adopts the following broad principles for selecting code points for the code point repertoire, across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage

Every character proposed should be in everyday usage in a particular linguistic community. Characters that have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous Use

Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope

The code point repertoire for the root zone is a very special case at the top of the protocol hierarchy. The canvas of characters available for selection as part of the Root Zone code point repertoire is therefore already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart: Out of all the characters needed by the script in question, a character that is not encoded in Unicode cannot be incorporated in the code point repertoire. Such cases are quite rare, especially in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol: Unicode is the character-encoding standard that provides the maximum possible representation of a given script/language, and it has encoded, as far as possible, all the characters needed by the script. However, domain names are a specialized case and are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire (MSR): The Root Zone LGR is the repertoire of characters that will be used to create Root Zone TLDs, which are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on IDNA's allowed set of characters. Example: Bangla Sign Avagraha ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.

To sum up, the restrictions start by admitting only characters that are part of the code block of the given script/language. The IDNA protocol then narrows this set down. Finally, the Maximal Starting Repertoire acts as an additional filter that restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks

Since TLDs are identifiers, punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like BANGLA ISSHAR ৺ (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.

4.2.1.3 No Rare and Obsolete Characters

Some characters have been added to Unicode to accommodate rare forms. Examples are Sanskritic VOCALIC RR ৠ (U+09E0), VOCALIC L ঌ (U+098C) and VOCALIC LL ৡ (U+09E1), along with the allographic -kāra forms of the latter two symbols, VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kāra corresponding to VOCALIC RR ৠ (U+09E0), namely VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words. It is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF

The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).

5 Repertoire

The Bangla writing system is represented in UNICODE under the Bengali (Bangla) script name as enumerated in ISO 15924, and corresponds to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR.

The Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for dis-unification of the Bangla and Asamiyā scripts. At its 8th Meeting, held on 23rd Aug 2017, the BIS Indian Language Technologies and Products Sectional Committee (LITD 20) decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, the Code Point Repertoire in Table 8 will be assumed to be valid for all three languages above.

For each code point, language references are given in the last column, titled Reference, of Table 8, the “Code Point Repertoire”. For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Only a few representative languages at EGIDS scale 1-4 have been chosen for referencing. Together, however, they cover all the code points required by all the languages that NBGP has considered, as given under Bangla Unicode Points (in UNICODE 6.3). Before the details are presented, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. The shapes of the reference glyphs in the code charts below are based on one of the many available fonts. They are not prescriptive, because actual fonts may vary, both UNICODE-compatible and True-Type ones. Consider the following code point table:


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR] - Yellow background
PVALID in IDNA2008 but excluded from the [MSR] - Pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen) - White background

Given the Bangla Unicode block as in Figure 1, the following code points included in the MSR will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candrabindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvāra) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122][125]

Table 8: Bangla Code-Point Repertoire
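The repertoire above can be turned into a membership check. The following is a minimal Python sketch under stated assumptions. It assumes Table 8 is the complete repertoire: 61 single code points, plus the nukta allowed only after ড, ঢ or য to form rows 28, 30 and 43. It also assumes a label must already be in NFC, as IDNA2008 requires. The function name `in_repertoire` is hypothetical; the normative repertoire is the LGR XML file. The NFC requirement also explains why rows 28, 30 and 43 list decomposed sequences: U+09DC, U+09DD and U+09DF are excluded from canonical composition, so NFC always rewrites them as base plus nukta.

```python
import unicodedata

# Table 8 as a set of single code points (hand-copied sketch, 61 entries).
REPERTOIRE = (
    {0x0981, 0x0982, 0x0983}                                         # rows 1-3
    | set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}  # vowels
    | set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))        # consonants
    | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9}
    | set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}  # kara signs
    | {0x09CD, 0x09CE, 0x09F0, 0x09F1}                               # rows 61-64
)
NUKTA = 0x09BC
NUKTA_BASES = {0x09A1, 0x09A2, 0x09AF}  # ড ঢ য -> ড় ঢ় য় (rows 28, 30, 43)

def in_repertoire(label: str) -> bool:
    """Check a label against Table 8, allowing the nukta only after ড, ঢ or য."""
    if unicodedata.normalize("NFC", label) != label:
        return False  # IDNA2008 labels must already be in NFC
    cps = [ord(ch) for ch in label]
    for i, cp in enumerate(cps):
        if cp == NUKTA:
            if i == 0 or cps[i - 1] not in NUKTA_BASES:
                return False
        elif cp not in REPERTOIRE:
            return False
    return bool(cps)

# U+09DC is not NFC-stable: it always normalizes to ড + nukta.
print(unicodedata.normalize("NFC", "\u09DC") == "\u09A1\u09BC")  # True
```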

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. These sequences enable the conditional inclusion of Bangla LETTER A and E followed by Bangla SIGN VIRAMA, Bangla LETTER YA and Bangla VOWEL SIGN AA, so that the æ sound, as in English ‘bat’, ‘cat’, etc., can be written.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
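As a cross-check of Table 9, this small Python sketch rebuilds the "Character Names" column from the listed code points using the standard `unicodedata` module. It also prints the stored string, which renders as অ্যা or এ্যা. The dictionary and helper names are illustrative only.

```python
import unicodedata

# Code points of the two Table 9 sequences.
SEQUENCES = {
    "S1": (0x0985, 0x09CD, 0x09AF, 0x09BE),
    "S2": (0x098F, 0x09CD, 0x09AF, 0x09BE),
}

def character_names(cps):
    """Rebuild Table 9's 'Character Names' column from the code points."""
    return [unicodedata.name(chr(cp)) for cp in cps]

def as_text(cps):
    """Join the code points into the string stored in a label."""
    return "".join(chr(cp) for cp in cps)

for key, cps in SEQUENCES.items():
    print(key, as_text(cps), " + ".join(character_names(cps)))
```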

5.2 Code Point Repertoire Exclusion

Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It is used only within sequences, in the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points

5.4 The Basis of the Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr Number | Operator | Function
1 | “|” | Alternative
2 | “[ ]” | Optional
3 | “*” | Variable Repetition
4 | “( )” | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence

To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla is not necessarily restricted to one. Going by the rules illustrated in this document, however, formations such as VDD, VBB and VXX are clearly invalid orthographic units. On the other hand, a candrabindu followed by an anusvāra, and a candrabindu followed by a visarga, are valid and possible, as in হ্যাঁংচা ‘hænchā’ and হ্যাঁঃ ‘hæn’ respectively. In other words, a Visarga or Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, Bangla letter য or ‘ya’, represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Hasanta), U+09AF য BENGALI LETTER YA>. It has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used to transcribe [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX]. Alternatively, it can be followed by Hasanta (or Virama) [H], then a Consonant [C], then a kāra (Mātrā) [M].


Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:

CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example:

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य

5.4.3.5 Subsets

For the subsets, we take the combination CHC as a representative example; the same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

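Examples:

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
CHCMB ক্কিং → ক + ্ + ক + ি + ং; क्किं → क + ् + क + ि + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः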

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰

A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.

Note: In this context of KHANDA TA, the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
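The ABNF units of Sections 5.4.2 to 5.4.4 translate fairly directly into a regular expression. The Python sketch below shows one way to do this. It assumes ABNF `*3(CH)C` maps to `(CH){0,3}C` and `[ ]` maps to `?`. It also assumes that the VHCM case is limited to the S1/S2 sequences of Table 9, and that a Khanda-Ta may be preceded only by RA + Hasanta. The sketch recognizes a single syllable-like unit. It does not segment whole labels or apply the Section 7 context rules. The names `UNIT` and `is_abnf_unit` are illustrative.

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"
# C includes the normalized nukta forms ড় ঢ় য় (rule 1 of Section 7.1).
C = "(?:[\u09A1\u09A2\u09AF]\u09BC|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])"
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"
H, B, D, X, Z = "\u09CD", "\u0982", "\u0981", "\u0983", "\u09CE"
S = "[\u0985\u098F]\u09CD\u09AF\u09BE"  # S1/S2 from Table 9 (the VHCM case)
MODS = f"{D}?[{B}{X}]?"                 # nothing, B, D, X, DB or DX

VOWEL_SEQ = f"(?:{S}|{V}){MODS}"                                # 5.4.2
CONSONANT_SEQ = f"(?:{C}{H}|(?:{C}{H}){{0,3}}{C}{M}?{MODS})"    # 5.4.3
KHANDA_TA_SEQ = f"(?:[\u09B0\u09F0]{H})?{Z}"                    # 5.4.4

UNIT = re.compile(f"{VOWEL_SEQ}|{CONSONANT_SEQ}|{KHANDA_TA_SEQ}")

def is_abnf_unit(text: str) -> bool:
    """True if `text` is exactly one vowel, consonant or Khanda-Ta unit."""
    return UNIT.fullmatch(text) is not None

print(is_abnf_unit("\u0995\u09CD\u0995\u09C0\u0982"))  # ক্কীং (CHCMB)
print(is_abnf_unit("\u0995\u0982\u0982"))              # কংং (two anusvaras)
```

Under this reading, a Hasanta-final unit is allowed only for a single consonant (CH). This follows 5.4.3.5, which does not list H among the followers of a cluster.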

5.4.5 Special Cases S and P

Two special cases involving sequences, referred to as S and P in Table 16 under Section 7, can be described briefly here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render a Ya-phalā after অ or এ, one must type U+09CD plus U+09AF ya after the vowel. This is a purely ligatural entity: the added Ya-phalā and ā-kāra produce the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, referred to as P in Table 16, concerns the formation of repha and ra-phalā in the script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra “cycle”). The point is that in both cases the slot for ra could be Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.


6 Variants

This section discusses the variants in the Bangla script. The NBGP places these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases in Group 2, however, are proposed as variants. These are not cases of mere visual similarity; they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are therefore proposed as variants.

Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell apart the two ko (কো), one typed with a single key and the other with two keys. However, this is not considered a case of variant, because a kāra followed by a kāra is not allowed.
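The কো example can be checked with Python's standard `unicodedata` module. U+09CB BENGALI VOWEL SIGN O has the canonical decomposition U+09C7 U+09BE. Normalization Form C (NFC) therefore composes the two-key spelling into the single o-kāra. IDNA2008 accepts only labels that are already in NFC, so the two-key form cannot reach the root zone as a separate label. This complements the WLE rule that a kāra must be preceded by a consonant. The variable names below are illustrative.

```python
import unicodedata

KA, E_KARA, AA_KARA, O_KARA = "\u0995", "\u09C7", "\u09BE", "\u09CB"

one_key = KA + O_KARA              # কো typed with the single o-kāra key
two_keys = KA + E_KARA + AA_KARA   # কো typed as e-kāra + ā-kāra

print(one_key == two_keys)                                # False: different storage
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: NFC composes them
print(unicodedata.decomposition(O_KARA))                  # 09C7 09BE
```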

6.1 In-Script Variants

We nevertheless propose two cases of true in-script variants in the Bangla script.

CASE I: The first concerns conjuncts in which থ (tha, U+09A5) joins, via Hasanta, with a preceding স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


Written in traditional orthography, the above combinations can be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown in example no. 1 below). A user may also type it wrongly, taking it to be a হ (ha, U+09B9), which increases the risk in label writing and identification. Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

Fonts that represent the traditional Bangla writing system tend to create this problem. These may therefore be taken as cases of variants in Bangla.

CASE II: Another interesting variant arises in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases occur when typing ‘repha’ (ra + Hasanta) and ‘ra-phalā’ (Hasanta + ra). ‘Repha’ can be formed by two sequences, mainly because Assamese and Bangla share the same UNICODE block and ‘B_RA’ and ‘A_RA’ represent the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র), or
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: Since ক and ঠ are the same in Bangla and Assamese, the resulting combinations with repha look identical; the choice of RA in the repha makes no visible difference. On similar grounds, ‘Ra-phalā’ can be formed by two sequences, and the final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-Ta

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

Because the Assamese and Bangla repha and ra-phalā conjunct forms look the same, end users could confuse them. The repha and ra-phalā cases therefore need to be defined as variants. NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels. For example, if someone registers “ররর”, the label “ৰৰৰ” will be generated as a variant and blocked, and vice versa. The blocking applies only at the label level: someone else can still register other labels, e.g. ৰৰ or ৰৰৰৰ.

6.2 Cross-Script Variants

A close cross-script study of Bangla has been carried out against sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya). It took into account the visual and technical confusion these scripts may cause as labels in web domains. Moreover, as far as the orthography is concerned, Bangla has no in-script variant. The NBGP proposes the following characters as variants. Some other characters are somewhat similar but have not been included here; they are given in the Appendix (10.2) for reference.

11 Unicode uses Oriya for the script although Odia is now the official term used


1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
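Tables 14 and 15 can be read as a mapping from each Bangla code point to its cross-script counterparts. The Python sketch below tests whether a Bangla label and a label in another script are cross-script variants under that mapping. It assumes every position must map to a listed counterpart, and that the other label must lie entirely in one script. Script membership is approximated by Unicode block, which is a deliberate simplification. All names are illustrative.

```python
# Tables 14 and 15: Bangla code point -> cross-script variant code points.
CROSS_SCRIPT = {
    "\u09AE": {"\u092E", "\u0A38"},  # ম ~ म (Devanagari), ਸ (Gurmukhi)
    "\u09BF": {"\u093F", "\u0A3F"},  # ি ~ ि (Devanagari), ਿ (Gurmukhi)
}

def block_of(ch: str) -> str:
    """Crude script test by Unicode block (a simplification for this sketch)."""
    cp = ord(ch)
    if 0x0900 <= cp <= 0x097F:
        return "Devanagari"
    if 0x0A00 <= cp <= 0x0A7F:
        return "Gurmukhi"
    return "other"

def is_cross_script_variant(bangla: str, other: str) -> bool:
    """True if every Bangla code point maps, position by position, onto `other`."""
    if not bangla or len(bangla) != len(other):
        return False
    if len({block_of(ch) for ch in other}) != 1:
        return False  # the other label must be written in a single script
    return all(o in CROSS_SCRIPT.get(b, set()) for b, o in zip(bangla, other))

print(is_cross_script_variant("\u09AE\u09BF", "\u092E\u093F"))  # মি vs मि
```

Because both the Bangla and Devanāgarī i-kāra are stored after their consonant, a position-by-position comparison is enough for these two pairs.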

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the code point repertoire table (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā sequence V1 H C1 M1, where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if the other context rules do not allow them.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA, Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that Bangla consonant letters (or graphemes) are physically joined to form “clusters”. In theory these join two to four consonants and combine into new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”: 290 bi-consonantal combinations, another 80 three-letter combinations, and 10 rarer four-letter ones [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the restrictions described above, the following WLE rules need to be implemented:


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage. There is no harm in covering such ligatures, because any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H

There may be multi-word domains in which V needs to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning Bank of India. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the H to be written explicitly to represent the intended sound fully. By and large, however, the first word written without the H (U+09CD) is considered enough to represent its intended sound.


This unique situation arises because hyphen, space and the Zero Width Non-Joiner are not in the permissible set of characters of the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting it may make two labels (with and without H) look alike to the majority of the linguistic communities, so the NBGP explicitly prohibits it. A future NBGP may revisit this rule if the community's requirements call for it.
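The nine rules of Section 7.1 can be sketched as a single left-to-right pass over a label. The Python code below is a minimal, non-normative illustration. The category sets are copied from Table 8 and Table 16, and S1/S2 are tokenized as one unit so that their internal vowel + Hasanta does not trip rule 2. Where rules 4 to 7 accept M, S is also accepted, since S ends with M. P (RA + Hasanta) is checked only where it precedes a Khanda-Ta. The token layout and function names are our own assumptions, and the sketch does not enforce the four-consonant cluster limit of 5.4.3.4.

```python
V = {0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A, 0x098B,
     0x098F, 0x0990, 0x0993, 0x0994}
C = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))
     | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1})
M = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}
SIGNS = {0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
NUKTA, NUKTA_BASES, RAS = 0x09BC, {0x09A1, 0x09A2, 0x09AF}, {0x09B0, 0x09F0}
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2

# Rules 2-7: categories allowed immediately before each category.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "S"},  # P is checked separately
}

def tokenize(label):
    """Return [(category, code point)], or None if a character is outside Table 8."""
    tokens, i = [], 0
    while i < len(label):
        if label.startswith(S_SEQUENCES, i):
            tokens.append(("S", None))  # S1/S2 count as one unit
            i += 4
            continue
        cp = ord(label[i])
        if cp == NUKTA:
            # Rule 1: ড় ঢ় য় (base + nukta) behave as a single C.
            if not tokens or tokens[-1][1] not in NUKTA_BASES:
                return None
            tokens[-1] = ("C", None)
        elif cp in V:
            tokens.append(("V", cp))
        elif cp in C:
            tokens.append(("C", cp))
        elif cp in M:
            tokens.append(("M", cp))
        elif cp in SIGNS:
            tokens.append((SIGNS[cp], cp))
        else:
            return None
        i += 1
    return tokens

def is_valid_label(label):
    """Apply WLE rules 1-9 of Section 7.1 to a label."""
    tokens = tokenize(label)
    if not tokens:
        return False
    for k, (cat, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k else None
        if cat in ("V", "S") and prev == "H":
            return False  # rules 8 and 9
        if cat == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RAS:
            continue  # rule 7: P (RA + Hasanta) before Khanda Ta
        if cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            return False
    return True
```

With this sketch, ব্যাঙ্কঅফ্ইন্ডিয়া is rejected by rule 8, while the same label without the Hasanta after ফ passes.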

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक

As can be seen, such combinations automatically produce a “golu” (dotted circle), marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. Only one B, D or X is permitted after a consonant, vowel or kāra (Mātrā). The following combinations are therefore invalid:

Example:

কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. Only one M is permitted after a consonant. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৌ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh

52

MrMdMustafaKamalFormerDirectorGeneralBangladeshTelecommunicationsRegulatoryAuthorityGovernmentofBangladeshDhakaBrigadierGeneralMdMahfuzulKarimMajumderDirector-GeneralEngineeringampOperationsDivisionBangladeshTelecommunicationsRegulatoryAuthorityGovernmentofBangladeshDhakaMdZiarulIslamProgrammerPostsampTelecommunicationsDivisionGovernmentofBangladeshDhakaProfSyedShahriyarRahmanDepartmentofLinguisticsUniversityofDhakaDrMizanurRahmanDirector(In-Charge)TranslationTextBookandInternationalRelationsDivisionBanglaAcademyDhakaDrApareshBandyopadhyayDirectorBanglaAcademyDhakaMrMdMobarakHossainDirectorBanglaAcademyDhakaDrJalalAhmedDirectorBanglaAcademyDhakaMrJahangirHossainInternetSocietyBangladesh(ICANNALS)JanabSarwarMostafaChoudhuryBangladeshComputerCouncilDhakaJanabMdRashidWasifBangladeshComputerCouncilDhakaJanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka

53

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R.D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S.K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R.C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2011. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and language planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL. 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese, in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B.B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan, 1364 Bangabda Number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A phonetic and phonological study of nasals and nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646 (UNICODE). https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, but a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র) (U+09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:

→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
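The three Appendix-I restrictions can also be expressed as checks over a label. The Python sketch below is a hypothetical illustration, not the normative XML LGR; the function name `appendix_rule_failures` is illustrative, and accepting the Assamese ra ৰ (U+09F0) alongside র in rule 2 is an assumption drawn from the note on the two RAs in Section 3, not from rule 2 itself.

```python
# Hypothetical sketch of Appendix-I rules 1-3; not the normative LGR XML.
HASANTA, KHANDA_TA = "\u09CD", "\u09CE"

# Rule 1: ৎ, ঞ and ঙ may not begin a label.
NO_LABEL_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}
# Rule 2: C+H before ৎ only when C is ra (Bangla র; Assamese ৰ is an assumption).
RAS = {"\u09B0", "\u09F0"}
# Rule 3: V+H+C+M only as অ্যা or এ্যা.
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}


def appendix_rule_failures(label: str) -> list[int]:
    """Return the numbers of the Appendix-I rules that the label breaks."""
    failures = []
    if label and label[0] in NO_LABEL_INITIAL:
        failures.append(1)
    for i, ch in enumerate(label):
        # Rule 2: consonant + Hasanta + Khanda Ta needs ra as the consonant.
        if (ch == KHANDA_TA and i >= 2 and label[i - 1] == HASANTA
                and label[i - 2] not in RAS):
            failures.append(2)
        # Rule 3: Hasanta directly after an independent vowel.
        if (ch == HASANTA and i >= 1 and label[i - 1] in INDEPENDENT_VOWELS
                and label[i - 1:i + 3] not in ALLOWED_VHCM):
            failures.append(3)
    return failures


print(appendix_rule_failures("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(appendix_rule_failures("\u09CE\u09B8"))  # label-initial ৎ
```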

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’

This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names of Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to the Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not confusable enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
U+0981 | U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
U+0981 | U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhi distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points
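Because the pairs marked "Confusable" in Tables 18, 19 and 21 are deliberately not defined as variants, a reviewer may still want to see which of them occur in a candidate label. The Python sketch below is a hypothetical review aid built by hand from those tables; `CONFUSABLES` and `confusable_report` are illustrative names, and the helper has no normative role in the LGR.

```python
# Hypothetical review aid built from Tables 18, 19 and 21 (confusable, not variants).
CONFUSABLES = {
    0x0981: [("Devanagari", 0x0945), ("Gurmukhi", 0x0A71)],
    0x0983: [("Devanagari", 0x0903)],
    0x0993: [("Devanagari", 0x0909), ("Oriya", 0x0B13)],
    0x0998: [("Devanagari", 0x0918), ("Gurmukhi", 0x0A2C)],
}


def confusable_report(label: str) -> list[tuple[str, str, str]]:
    """List (Bangla code point, other script, look-alike) for each hit."""
    report = []
    for ch in label:
        for script, other in CONFUSABLES.get(ord(ch), []):
            report.append((f"U+{ord(ch):04X}", script, f"U+{other:04X}"))
    return report


for row in confusable_report("\u0998\u09B0"):  # ঘর
    print(row)
```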


11 Appendix-II: Bengali consonants and their allographs

Consonants | Phonetic Value | Allographs (Clusters) | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র). There is a marked form of ত+ = ৎ: ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts, aelig is replaced by ং) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s & ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it is the vowel æ (as in শ্যামল, র‍্যাপার etc.). After a non-initial consonant, however, it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র+য combination has two physical manifestations: র‍্য and র্য.

ক8(p+য) স8(+য) র 8(+য) [Just র‍্য is never used in Bangla orthography; র‍্যা is, but then its last two symbols, Ya-phala + a-kara, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations: (i) রেফ repʰ, as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6 (+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster) etc. (placed over the following consonant); (ii) র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant.

অঃকঃ

অব


both the cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0) followed/preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases. The LGR makes a note of this point of concern with respect to the two RAs in disguise, as it would be completely impossible to distinguish between them with the naked eye in a label so generated, which may consequently lead to concerns related to spoofing and other kinds of cyber irregularities. The motive for classing these two CPs as (blocking) variants is that fully rendered labels may mask the distinction between the Bangla ra র (U+09B0) and the Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in the context of a following Hasant. The difference between the RAs is only distinguishable if one looks into their Unicode values. Therefore labels such as অর্ক arka, শীর্ষ śīrṣa ‘top, apex’, অভ্র abhra ‘cloud, the sky’ and শ্রম śrama ‘physical labour’ could be extremely dangerous, as the web user may never verify the digital content (the labels) against its Unicode code points. This point is made explicitly with reference to Table 9 (of sequences, p. 36) and Table 16 (of WLE Symbols, p. 47) that are to follow. Moreover, it is noteworthy that the REPHA can also occur with KHANDA TA. The conditions in this context of KHANDA TA are liable to be such that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
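To make the blocking-variant idea concrete, the Python sketch below enumerates the labels that differ from a given label only by swapping র (U+09B0) and ৰ (U+09F0) next to Hasanta. It is a hypothetical illustration, not the Variant Set 4 definition in the XML LGR. It assumes one reading of "in the context of following Hasant": a ra is swappable when it is directly followed by Hasanta (repha) or directly preceded by it (ra-phalā), matching the examples অর্ক and শ্রম above. The name `ra_variants` is illustrative.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA, HASANTA = "\u09B0", "\u09F0", "\u09CD"
SWAP = {BANGLA_RA: ASSAMESE_RA, ASSAMESE_RA: BANGLA_RA}


def ra_variants(label: str) -> set[str]:
    """Return labels obtained by swapping র/ৰ where they touch Hasanta."""
    positions = [
        i for i, ch in enumerate(label)
        if ch in SWAP and (
            (i + 1 < len(label) and label[i + 1] == HASANTA)  # repha context
            or (i > 0 and label[i - 1] == HASANTA)  # ra-phalā context
        )
    ]
    variants = set()
    # Every combination of "swap" / "keep" at the swappable positions.
    for choice in product((False, True), repeat=len(positions)):
        chars = list(label)
        for pos, swap in zip(positions, choice):
            if swap:
                chars[pos] = SWAP[chars[pos]]
        variants.add("".join(chars))
    variants.discard(label)  # a label is not its own variant
    return variants


print(ra_variants("\u0985\u09B0\u09CD\u0995"))  # অর্ক with ৰ in place of র
```

Under a blocking policy, registering one label would make every member of this set unavailable to other applicants.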

4 Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having experience in Linguistics (especially in NLP/Computational Linguistics), Literature, Language, History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and eight other scripts belonging to separate Unicode blocks are being taken up to assign a separate LGR to each. However, an attempt is made to ensure that the fundamental philosophy behind building those LGRs is consistent with all other Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to EGIDS scale 1 to 4 (see Table 4) that use the Bangla script. The following guiding principles are used in making decisions about Bangla LGR code points.

4.1 Guiding Principles

The NBGP adopts the following broad principles for the selection of code points in the code-point repertoire across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage
Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles

The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rule-sets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of available characters for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart
Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, and especially so in the Bangla-Asamiyā-Manipuri Writing System, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol
Unicode being the character-encoding standard for providing the maximum possible representation of a given script/language, it has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA’s allowed set of characters.

Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR.

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA Protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures and other such iconic characters, like BENGALI ISSHAR ৺ (U+09FA), BENGALI CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) etc., will also not be included.

4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1) and the allographic -kara forms of the latter two symbols, VOWEL SIGN VOCALIC L (U+09E2) and VOWEL SIGN VOCALIC LL (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the -kara corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
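The successive filters above can be pictured as set subtractions over the Bengali block (U+0980 to U+09FF). The Python sketch below uses only the standard `unicodedata` module. It is an approximation under stated assumptions, not the MSR: "letters and marks only" is a rough proxy for IDNA2008 PVALID, the NFC-stability test stands in for the IDNA requirement that labels be normalized, and the exclusion sets are hand-copied from Sections 4.2.1.3, 5.2 and 5.3. The set `ASSUMED_MSR_EXCLUDED` is hypothetical: it lists letters and marks in the block that Table 8 does not contain.

```python
import unicodedata

# Hand-copied exclusions from Sections 4.2.1.3, 5.2 and 5.3 (not MSR data).
RARE_OR_OBSOLETE = {0x098C, 0x09D7, 0x09E0, 0x09E1, 0x09E2, 0x09E3}
NOT_USED_ALONE = {0x09BC}  # BENGALI SIGN NUKTA
# ANJI, AVAGRAHA, VEDIC ANUSVARA, SANDHI MARK: absent from Table 8 and
# assumed here to be outside the MSR (hypothetical).
ASSUMED_MSR_EXCLUDED = {0x0980, 0x09BD, 0x09FC, 0x09FE}


def candidate_repertoire(start: int = 0x0980, end: int = 0x09FF) -> list[int]:
    """Apply the successive filters to one Unicode block."""
    excluded = RARE_OR_OBSOLETE | NOT_USED_ALONE | ASSUMED_MSR_EXCLUDED
    kept = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        category = unicodedata.category(ch)
        if category == "Cn":  # filter 1: not encoded in Unicode
            continue
        if category[0] not in "LM":  # drop digits, punctuation, symbols
            continue
        if unicodedata.normalize("NFC", ch) != ch:  # e.g. U+09DC, U+09DD, U+09DF
            continue
        if cp in excluded:  # root-zone exclusions
            continue
        kept.append(cp)
    return kept


repertoire = candidate_repertoire()
print(len(repertoire))  # single code points left after all filters
```

Under these assumptions, the count should equal the 64 rows of Table 8 minus its three nukta sequences, which is a useful cross-check when the table is edited.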

5 Repertoire

The Bangla Writing System is represented in UNICODE using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee, LITD 20, held on 23rd Aug 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 8 is valid for all three languages above.

For each of the code points, language references have been given in the last column, titled ‘References and Comment’, of Table 8, the “Code Point Repertoire”. For complete coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in UNICODE 6.3). Before the details are presented, however, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation across actual fonts, both UNICODE-compatible and TrueType ones. Consider the following code point table:

Figure 1 – Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008 or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode Block as in Figure 1, of the code points that are included in the MSR, the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire Inclusion

No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; 09DC is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; 09DD is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]; 09DF is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122] [125]

Table 8 – Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, so as to include the æ sound, as in English ‘bat’, ‘cat’ etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9 – Sequences

5.2 Code Point Repertoire Exclusion

There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire in this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Points | Glyph | Character Names | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10 – Excluded Code Points

5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see Section 3.3.6) is excluded from the repertoire since it is never used alone. It occurs only inside sequences, namely the normalized forms of three special characters: ড় ঢ় য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড় ঢ় য় respectively

Table 10b: Excluded Code Points
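The nukta rule rests on Unicode normalization: ড়, ঢ় and য় are composition exclusions, so their NFC form (which IDNA2008 requires for labels) is the base letter followed by U+09BC. The Python sketch below checks this with the standard `unicodedata` module and flags any nukta that is not attached to ড, ঢ or য. The helper names (`nfc`, `stray_nukta`) are illustrative only and are not part of the LGR XML.

```python
import unicodedata

NUKTA = "\u09BC"                       # BENGALI SIGN NUKTA
NUKTA_LETTERS = {
    "\u09DC": "\u09A1",                # ড় RRA -> ড DDA + NUKTA
    "\u09DD": "\u09A2",                # ঢ় RHA -> ঢ DDHA + NUKTA
    "\u09DF": "\u09AF",                # য় YYA -> য YA + NUKTA
}

def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)

def stray_nukta(label: str) -> bool:
    """True if, after NFC, some NUKTA is not attached to ড, ঢ or য."""
    label = nfc(label)
    bases = set(NUKTA_LETTERS.values())
    return any(ch == NUKTA and (i == 0 or label[i - 1] not in bases)
               for i, ch in enumerate(label))

for letter in NUKTA_LETTERS:
    # Each precomposed letter normalizes to <base, NUKTA>.
    print(f"U+{ord(letter):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc(letter)))
```

The loop prints `U+09DC -> U+09A1 U+09BC`, `U+09DD -> U+09A2 U+09BC` and `U+09DF -> U+09AF U+09BC`, which is why only the sequences, and never a free-standing nukta, need to be admitted.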

5.4 The Basis of the Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:
C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. No. | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding by users of other Brāhmī-based scripts, Devanāgarī equivalents are provided wherever necessary.

5.4.2 The Vowel Sequence
A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). More than one of D, B or X can follow a V in Bangla; however, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. It is valid and possible to combine a Candrabindu with an Anusvāra on the one hand and with a Visarga on the other, as in হ্যাঁংচা 'hæŋchā' and হ্যাঁঃ 'hæŋ' respectively; in Bangla the Visarga or Anusvāra follows the Candrabindu in such combinations.
A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, 'ya'), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF BENGALI LETTER YA>, and it has a special subjoined form, ্য. When further combined with U+09BE BENGALI VOWEL SIGN AA (া, 'aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel
Example: V  অ  अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX], or by a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:
VB  অং  अं
VD  অঁ  अँ
VX  অঃ  अः
VDB  অঁং  अँं
VDX  অঁঃ  अँः
VHCM  অ্যা  এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB  অ্যাং  এ্যাং
VHCMD  অ্যাঁ  এ্যাঁ
VHCMX  অ্যাঃ  এ্যাঃ
VHCMDB  অ্যাঁং  এ্যাঁং
VHCMDX  অ্যাঁঃ  এ্যাঁঃ
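Sections 5.4.2.1 to 5.4.2.3 amount to a small regular language: a single vowel, or the S sequence অ্যা/এ্যা (the only VHCM forms admitted, per Appendix rule 3 in Section 10.1), followed by at most one of B, D, X, DB or DX. The Python sketch below encodes that reading with the standard `re` module. The vowel class is an assumption (Table 8 is the authoritative repertoire), and `is_vowel_sequence` is a hypothetical helper, not part of the LGR XML.

```python
import re

# Assumed vowel class: ঌ (U+098C) and the rare vowels of 4.2.1.3 are left out.
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"   # অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ
D, B, X = "\u0981", "\u0982", "\u0983"           # Candrabindu, Anusvāra, Visarga
S = "[\u0985\u098F]\u09CD\u09AF\u09BE"           # VHCM, only as অ্যা / এ্যা

# Optional tail: B, D, X, DB or DX (the Candrabindu always comes first).
TAIL = f"(?:[{B}{X}]|{D}[{B}{X}]?)?"
VOWEL_SEQUENCE = re.compile(f"(?:{S}|{V}){TAIL}")

def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(s) is not None

for sample in ["অং", "অঁঃ", "এ্যাঁং", "অংং", "অংঁ"]:
    print(sample, is_vowel_sequence(sample))
```

Run as written, it reports True for অং, অঁঃ and এ্যাঁং, and False for অংং (a repeated Anusvāra) and অংঁ (an Anusvāra followed by a Candrabindu).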

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C  ক  क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign, kāra (Mātrā) [M], or by Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM  কি  कि
CB  কং  कं
CD  কঁ  कँ
CX  কঃ  कः
CH  ক্  क् (pure consonant)
CDB  কঁং  कँं
CDX  কঁঃ  कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB  কীং  कीं
CMD  কাঁ  काँ
CMX  বীঃ  वीः
CMDB  কাঁং  काँं
CMDX  কাঁঃ  काँः

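5.4.3.4 Sequence of Consonants
A sequence of consonants (up to four) joined by Hasanta (also known as Virama):
*3(CH)C

Examples:
CHC  ন্ত → ন + ্ + ত  न्त → न + ् + त
CHCHC  ন্ত্র → ন + ্ + ত + ্ + র  न्त्र → न + ् + त + ् + र
CHCHCHC  ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য  न्त्र्य → न + ् + त + ् + र + ् + य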

5.4.3.5 Subsets
In considering the subsets, we take the combination CHC as a representative example; however, the same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.
Examples:
CHCM  ক্কী → ক + ্ + ক + ী  क्की → क + ् + क + ी
CHCB  ক্কং → ক + ্ + ক + ং  क्कं → क + ् + क + ं
CHCD  ক্কঁ → ক + ্ + ক + ঁ  क्कँ → क + ् + क + ँ
CHCX  ক্কঃ → ক + ্ + ক + ঃ  क्कः → क + ् + क + ः
CHCDB  ক্কঁং → ক + ্ + ক + ঁ + ং  क्कँं → क + ् + क + ँ + ं
CHCDX  ক্কঁঃ → ক + ্ + ক + ঁ + ঃ  क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.
Examples:
CHCMB  ক্কীং → ক + ্ + ক + ী + ং  क्कीं → क + ् + क + ी + ं
CHCMD  ক্কাঁ → ক + ্ + ক + া + ঁ  क्काँ → क + ् + क + ा + ँ
CHCMX  ক্কীঃ → ক + ্ + ক + ী + ঃ  क्कीः → क + ् + क + ी + ः
CHCMDB  ক্কাঁং → ক + ্ + ক + া + ঁ + ং  क्काँं → क + ् + क + ा + ँ + ं
CHCMDX  ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ  क्काँः → क + ् + क + ा + ँ + ः
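Sections 5.4.3.1 to 5.4.3.5 can be sketched the same way: *3(CH)C gives one to four consonants joined by Hasanta, after which the sequence ends either in a Hasanta (pure consonant) or in an optional kāra plus the B/D/X/DB/DX tail. Allowing a final Hasanta after a multi-consonant cluster is an assumption taken from WLE rule 2 in Section 7.1, since 5.4.3.5 lists only M, B, D, X, DB and DX. The consonant and kāra classes are likewise assumed, and the function name is hypothetical.

```python
import re

# Assumed classes (Table 8 is authoritative). CN = normalized ড় ঢ় য় (base + NUKTA).
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"   # kāras (vowel signs)
H, D, B, X = "\u09CD", "\u0981", "\u0982", "\u0983"

TAIL = f"(?:[{B}{X}]|{D}[{B}{X}]?)?"             # B, D, X, DB or DX
CLUSTER = f"(?:{C}{H}){{0,3}}{C}"                # *3(CH)C: one to four consonants
# A cluster ends either in a pure consonant (H) or in [M] plus the optional tail.
CONSONANT_SEQUENCE = re.compile(f"{CLUSTER}(?:{H}|{M}?{TAIL})")

def is_consonant_sequence(s: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(s) is not None
```

With these classes, ন্ত্র্য and ক্কাঁং are accepted, while a five-consonant cluster, কীী (two kāras) and কংঃ (Anusvāra followed by Visarga) are rejected.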

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z  ৎ = ত্

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):
[CH]Z
Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition for KHANDA TA in this context is that C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

10 Refer to Rule P in Section 7, Table 16.

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the [æ] sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, in which the Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case concerns the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own inherent vowel (REPHA) or the inherent vowel of the preceding consonant is suppressed (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants in two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but not identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. However, cases belonging to Group 2 are proposed to be treated as variants. These are not cases of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are hence proposed as variants.
Variants arise in a script when two or more forms that look alike are stored with different code points. In Bangla, the e-kāra, the ā-kāra and the o-kāra have different code points. One can type the o-kāra with a consonant in one go, or get the same result by typing the e-kāra and the ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
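In Unicode terms the two spellings of কো are canonically equivalent: U+09CB BENGALI VOWEL SIGN O decomposes to U+09C7 followed by U+09BE. The minimal Python check below uses the standard `unicodedata` module. It is offered as a complementary observation rather than the proposal's own argument: since IDNA2008 requires labels to be in NFC, the two-key input would be normalized to the one-key form before any LGR rule is applied.

```python
import unicodedata

KO_ONE_KEY = "\u0995\u09CB"          # ক + BENGALI VOWEL SIGN O (U+09CB)
KO_TWO_KEYS = "\u0995\u09C7\u09BE"   # ক + VOWEL SIGN E (U+09C7) + VOWEL SIGN AA (U+09BE)

# The two inputs are different code point sequences ...
print(KO_ONE_KEY == KO_TWO_KEYS)                                # False
# ... but canonically equivalent: NFC composes, NFD decomposes.
print(unicodedata.normalize("NFC", KO_TWO_KEYS) == KO_ONE_KEY)  # True
print(unicodedata.normalize("NFD", KO_ONE_KEY) == KO_TWO_KEYS)  # True
```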

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

Written in traditional orthography, the above combinations can be confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown in example 1 below). A user could also mistype it as হ (ha, U+09B9), increasing the risks in label writing and identification. Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in গ্রন্থ grantha)
Fonts that represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:
(1) B_RA + H + C
(2) A_RA + H + C
where
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As the ক and ঠ used in Bangla and Assamese look exactly the same, the resulting combinations with repha look identical; adding the repha makes no visible difference. 'Ra-phalā' can be formed by two sequences on similar grounds, and the final ligatures look the same:
(1) C1 + H + B_RA
(2) C1 + H + A_RA
where C1 = any consonant except Khanda Ta

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded that র and ৰ should be defined as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g. if someone registers "ররর", the label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
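Because র and ৰ form a single variant set, each position holding either letter can independently take either value, so a label with n such positions has 2^n - 1 variant labels besides itself. The Python sketch below enumerates them with `itertools.product`. Treating every generated label as blocked follows the text above; the function name is hypothetical and does not come from the LGR XML.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA = "\u09B0", "\u09F0"   # র and ৰ
RA_SET = (BANGLA_RA, ASSAMESE_RA)

def variant_labels(label: str) -> set[str]:
    """All labels obtained by swapping any র/ৰ positions, excluding the original."""
    choices = [RA_SET if ch in RA_SET else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}

blocked = variant_labels(BANGLA_RA * 3)
print(len(blocked))                  # 7
print(ASSAMESE_RA * 3 in blocked)    # True  (ৰৰৰ is blocked)
print(ASSAMESE_RA * 2 in blocked)    # False (ৰৰ is a different label)
```

For ররর this yields seven blocked labels, ৰৰৰ as well as mixed forms such as রৰর, while ৰৰ and ৰৰৰৰ, being of different length, are unaffected.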

6.2 Cross-Script Variants
A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in the web domain. Apart from the cases described in Section 6.1, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters that are somewhat similar have not been included here; they are listed in the Appendix (Section 10.3) for reference.

11 Unicode uses Oriya for the script, although Odia is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script
Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script
Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
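One way to read Tables 14 and 15 is as mapping tables applied code point by code point. Assuming, as in the Root Zone LGR framework, that a label may not mix scripts, a Bangla label has a whole-label variant in Devanāgarī or Gurmukhī only if every one of its code points has an entry. The helper below is a hypothetical sketch of that reading, not the LGR's own mechanism.

```python
from typing import Dict, Optional

# Cross-script variant pairs from Tables 14 and 15.
TO_DEVANAGARI = {"\u09AE": "\u092E",   # ম -> म
                 "\u09BF": "\u093F"}   # ি -> ि
TO_GURMUKHI = {"\u09AE": "\u0A38",     # ম -> ਸ
               "\u09BF": "\u0A3F"}     # ি -> ਿ

def cross_script_variant(label: str, table: Dict[str, str]) -> Optional[str]:
    """Return the variant label in the other script, or None.

    Assumes single-script labels: a variant exists only if every code
    point of the Bangla label has a mapping in `table`.
    """
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None
```

For example, মি maps to मि in Devanāgarī and to ਸਿ in Gurmukhī, while any label containing ক has no variant under these two tables.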

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1 or S2 (from Table 9), i.e. the (æ) Ya-phalā sequence V1 H C1 M1, where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80 and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows.

1. C is the set of C and CN, where CN is the set of normalized forms of ড় ঢ় য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1 (Case of V Preceded by H).
9. S CANNOT be preceded by H.
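Rules 1 to 9 can be read as constraints on the category of the unit immediately before each character. The Python sketch below tokenizes a label, treating S1/S2 as one unit and the NFC forms of ড়/ঢ়/য় as one consonant, and then checks the predecessor of every unit. It is an illustration, not the LGR XML: the category sets are an assumed approximation of Table 8, and `tokenize`, `wle_ok` and the other names are hypothetical.

```python
import unicodedata

def _chars(*ranges):
    """Expand (first, last) code point ranges into a set of characters."""
    return {chr(cp) for lo, hi in ranges for cp in range(lo, hi + 1)}

# Assumed repertoire by category (Table 8 is authoritative).
CATEGORY = {}
CATEGORY.update(dict.fromkeys(_chars((0x0995, 0x09A8), (0x09AA, 0x09B0), (0x09B2, 0x09B2),
                                     (0x09B6, 0x09B9), (0x09F0, 0x09F1)), "C"))
CATEGORY.update(dict.fromkeys(_chars((0x0985, 0x098B), (0x098F, 0x0990), (0x0993, 0x0994)), "V"))
CATEGORY.update(dict.fromkeys(_chars((0x09BE, 0x09C4), (0x09C7, 0x09C8), (0x09CB, 0x09CC)), "M"))
CATEGORY.update({"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"})

NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}                        # CN: ড় ঢ় য় after NFC
S_UNITS = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # S1 অ্যা, S2 এ্যা
RA = {"\u09B0", "\u09F0"}                                           # র, ৰ (for P)

# Allowed predecessor categories for rules 2 to 7 (S counts as ending in M).
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},
}

def tokenize(label):
    """Split an NFC label into (category, text) units; None if out of repertoire."""
    label = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_UNITS:
            tokens.append(("S", label[i:i + 4]))
            i += 4
        elif label[i] in NUKTA_BASES and label[i + 1:i + 2] == "\u09BC":
            tokens.append(("C", label[i:i + 2]))       # rule 1: CN counts as C
            i += 2
        elif label[i] in CATEGORY:
            tokens.append((CATEGORY[label[i]], label[i]))
            i += 1
        else:
            return None
    return tokens

def wle_ok(label):
    tokens = tokenize(label)
    if not tokens:
        return False
    for k, (cat, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k else None
        if cat == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA:
            prev = "P"                                 # Ra-Hasanta before Khanda Ta
        if cat in ("V", "S") and prev == "H":
            return False                               # rules 8 and 9
        if cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            return False                               # rules 2 to 7
    return True
```

With this reading, the Bank of India example in 7.1.1 is rejected when written with a Hasanta before ই and accepted without it, and ভর্ৎসনা passes because র্ counts as P before the Khanda Ta.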

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures: any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE; in normalized form U+09DF is U+09AF U+09BC), meaning 'Bank of India'. Here two different words are joined together, the first of which ends with an H (অফ্) while the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent its intended sound fully.

This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the set of characters permissible in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:
্ক  ्क
াক  ाक
ংক  ंक
ঁক  ँक
ঃক  ःक
As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S. Example:
অ্  अ्
কং্  कं्
কঁ্  कँ्
কঃ্  कः्
কি্  कि्

3. The number of B, D or X permitted after a consonant, a vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalid. Example:
কংং  कंं
কঁঁ  कँँ
কঃঃ  कःः
কাঁঁ  काँँ
কীঃঃ  कीःः
অংং  अंं
অঁঁ  अँँ
অঃঃ  अःः

4. The number of M permitted after a consonant is restricted to one. Example: কীী  कीी
5. M is not permitted after V. Example: ইা ঈৌ  ईा ईौ
6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible. Example:
কংঃ  कंः
কঃং  कःं
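Each restriction in 7.2 can also be read as a pattern that must not occur anywhere in a label. The sketch below expresses restrictions 1 to 6 as Python regular expressions and reports which ones a label violates. The vowel and kāra classes are assumed, and substituting কা for S before matching is an illustrative device of this sketch, not something the proposal specifies.

```python
import re

V = "\u0985-\u098B\u098F\u0990\u0993\u0994"   # independent vowels (assumed set)
M = "\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC"   # kāras (assumed set)
H, D, B, X = "\u09CD", "\u0981", "\u0982", "\u0983"
S = re.compile("[\u0985\u098F]\u09CD\u09AF\u09BE")   # অ্যা / এ্যা

# One "must not occur" pattern per restriction in 7.2.
INVALID = {
    1: re.compile(f"^[{H}{M}{B}{D}{X}]"),      # H, M, B, D or X at the start
    2: re.compile(f"[{V}{B}{D}{X}{M}]{H}"),    # H after V, B, D, X or M
    3: re.compile(f"([{B}{D}{X}])\\1"),         # the same B, D or X twice
    4: re.compile(f"[{M}][{M}]"),               # two kāras in a row
    5: re.compile(f"[{V}][{M}]"),               # a kāra directly after a vowel
    6: re.compile(f"{B}{X}|{X}{B}"),            # Anusvāra and Visarga together
}

def violated_rules(label: str) -> list[int]:
    # S is exempt from these patterns, so it is replaced by a neutral C + M
    # stand-in (কা); an H after S is then still caught by restriction 2.
    probe = S.sub("\u0995\u09BE", label)
    return [n for n, pattern in INVALID.items() if pattern.search(probe)]
```

For instance, `violated_rules("কংং")` returns `[3]`, while অ্যা and কাঁঃ return an empty list.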

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.
[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano-Prakashan. Kolkata: Ananda Publishers.
[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata/New Delhi: Asian Educational Services (2003 reprint).
[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.
[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.
[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.
[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.
[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.
[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.
[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.
[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.
[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.
[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.
[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.
[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.
[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.
[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.
[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.
[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.
[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.
[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.
[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.
[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.
[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.
[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.
[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meetei lon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.
[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.
[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.
[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.
[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.
[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.
[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.
[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.
[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.
[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.
[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.
[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.
[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.
[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language 36(1): 22-59.
[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.
[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.
[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.
[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.
[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.
[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.
[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.
[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.
[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.
[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.
[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has Happened So Far in Terms of Script Reforms'. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka and ICANN at the Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, some of the formalism's structures do not necessarily apply in a given language. Restriction rules are set in place to take care of such cases; they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not normally allow ya (য়) at the beginning of a word either, but native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔɽo) from the poem 'Lichuchor' by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only where C is ra (র, U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
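The three Bangla-specific restrictions above are positional checks that can be sketched in a few lines of Python. Note that rule 2 here admits only the Bangla র (U+09B0), following this appendix, whereas Section 5.4.4.2 also admits the Assamese ৰ (U+09F0) for Assamese. The function and constant names are hypothetical.

```python
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
HASANTA, KHANDA_TA, BANGLA_RA = "\u09CD", "\u09CE", "\u09B0"
NOT_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}                          # ৎ ঞ ঙ (rule 1)
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা এ্যা (rule 3)

def bangla_appendix_ok(label: str) -> bool:
    if not label or label[0] in NOT_INITIAL:
        return False                   # rule 1: ৎ, ঞ and ঙ cannot start a label
    for i, ch in enumerate(label):
        nxt = label[i + 1:i + 2]
        # Rule 3: a vowel may be followed by Hasanta only inside অ্যা / এ্যা.
        if ch in VOWELS and nxt == HASANTA and label[i:i + 4] not in ALLOWED_VHCM:
            return False
        # Rule 2: C + Hasanta before Khanda Ta only when C is র (U+09B0).
        if ch == KHANDA_TA and label[i - 1] == HASANTA and (i < 2 or label[i - 2] != BANGLA_RA):
            return False
    return True
```

Under these checks ভর্ৎসনা and অ্যাসিড pass, while ৎসেরিং fails rule 1, as the appendix intends for IDN labels even though the spelling is found in transliteration.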

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi), used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar and widely in use during the 1880s. Sylheti Nagarī or Siloti [129] has had several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī
Bangla | Devanāgarī | NBGP decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi
Bangla | Gurmukhi | NBGP decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)
Bangla | Oriya (Odia) | NBGP decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali Consonants and Their Allographs

Consonant | Phonetic value | Allographs: clusters | Transparent form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒ The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র + য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা]


র r Two manifestations: (i) রেফ repʰ, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র্ + ধ্ + ব) (earlier র্দ্ধ্ব = র্ + দ্ + ধ্ + ব, a four-term cluster), etc. (placed over the following consonant); (ii) র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially, it doubles the pronunciation of the following consonant

অঃকঃ

অব


purposes only or for archival purposes will not be considered for inclusion in the code-point repertoire.

4.1.1.2 Unambiguous Use
Every character proposed should have an unambiguous understanding among linguists about its usage in the language.

4.2 Exclusion Principles
The main exclusion principle is that of External Limits on Scope. These consist of protocols or standards which are prerequisites to the Label Generation Rule-sets. All further principles are in fact subsumed under these limitations, but have been spelt out separately for the sake of clarity.

4.2.1 External Limits of Scope
The code point repertoire for the root zone being a very special case at the top of protocol hierarchies, the canvas of available characters for selection as a part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Chart
Out of all the characters that are needed by the script in question, if a particular character is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, and especially so in the Bangla-Asamiyā-Manipuri writing system, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol
Unicode, being the character-encoding standard for providing the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

iii. Maximal Starting Repertoire (MSR)
The Root Zone LGR being the repertoire of characters which are going to be used for the creation of Root Zone TLDs, which in turn constitute an even more specialized case of domain names, the Root Zone LGR procedure introduces additional exclusions on IDNA's allowed set of characters.


Example: BANGLA SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the MSR.

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. The IDNA protocol further narrows this down, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation marks present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BANGLA ISSHAR ৺ (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9), etc. will also not be included.

4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L (U+09E2) and VOWEL SIGN VOCALIC LL (U+09E3). All such characters are excluded, in compliance with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and the Appendix (Section 10.1).
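The exclusions in 4.2.1.1 to 4.2.1.3 can be illustrated with Python's standard `unicodedata` module. This is a minimal sketch, not the NBGP's procedure: it assumes the Unicode general category (`P*` for punctuation, `S*`/`N*` for symbols and number-like signs) as a rough stand-in for the punctuation and symbol principles, and uses a hypothetical hand-written set `LISTED_EXCLUSIONS` for the avagraha and the rare/obsolete letters named above. It does not implement the IDNA2008 or MSR filters themselves.

```python
import unicodedata
from typing import Optional

# Hypothetical hand-made list: the avagraha (MSR exclusion, 4.2.1) and the
# rare/obsolete letters named in 4.2.1.3. VOWEL SIGN VOCALIC RR (U+09C4)
# is deliberately NOT listed, because the proposal retains it.
LISTED_EXCLUSIONS = {
    0x09BD,  # BENGALI SIGN AVAGRAHA
    0x098C,  # BENGALI LETTER VOCALIC L
    0x09E0,  # BENGALI LETTER VOCALIC RR
    0x09E1,  # BENGALI LETTER VOCALIC LL
    0x09E2,  # BENGALI VOWEL SIGN VOCALIC L
    0x09E3,  # BENGALI VOWEL SIGN VOCALIC LL
}


def exclusion_reason(cp: int) -> Optional[str]:
    """Return a reason for excluding `cp`, or None if none of these checks fire.

    The general category is only an approximation of the punctuation
    (4.2.1.1) and symbol/abbreviation (4.2.1.2) principles.
    """
    category = unicodedata.category(chr(cp))
    if category.startswith("P"):
        return "punctuation (4.2.1.1)"
    if category.startswith(("S", "N")):
        return "symbol, number or abbreviation (4.2.1.2)"
    if cp in LISTED_EXCLUSIONS:
        return "listed exclusion (4.2.1, 4.2.1.3)"
    return None


for cp in (0x09FA, 0x09F9, 0x09BD, 0x09E0, 0x09C4, 0x0995):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {exclusion_reason(cp)}")
```

Letters such as the avagraha are ordinary `Lo` characters to Unicode, which is why a category test alone cannot reproduce the MSR and an explicit list is needed.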

5 Repertoire
The Bangla writing system is represented in Unicode using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in Unicode has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the Bangla LGR.

It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes any further action, it will be assumed that the Code Point Repertoire in Table 8 is valid for all three languages above.

For each of the code points, language references have been given in the last column, titled References, of Table 8, titled the "Code Point Repertoire". For entire coverage of Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under the Bangla Unicode points (as given in Unicode 6.3).

However, before the details are presented, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variations in actual fonts, both Unicode-compatible and TrueType ones. Consider the following code point table:


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention: all characters that are included in the [MSR] have a yellow background; characters that are PVALID in IDNA2008 but excluded from the [MSR] have a pinkish background; characters that are not PVALID in IDNA2008 or are ineligible for the root zone (digits, hyphen) have a white background.

Given the Bangla Unicode block as in Figure 1, of the code points that are included in the MSR, the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; U+09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; U+09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; U+09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122][125]

Table 8: Bangla Code-Point Repertoire
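Table 8 can be checked mechanically. The sketch below is illustrative only (the normative form of the repertoire is the LGR XML file) and assumes that labels are brought into Unicode Normalization Form C first. That assumption also explains rows 28, 30 and 43: U+09DC, U+09DD and U+09DF are composition exclusions, so NFC rewrites them as base consonant + NUKTA. The names `REPERTOIRE_RANGES`, `NUKTA_SEQUENCES` and `uses_only_repertoire` are hypothetical.

```python
import unicodedata

# Inclusive ranges covering the 61 single code points listed in Table 8.
# BENGALI SIGN NUKTA (U+09BC) is deliberately absent: it is only allowed
# inside the three sequences below (see 5.3).
REPERTOIRE_RANGES = [
    (0x0981, 0x0983), (0x0985, 0x098B), (0x098F, 0x0990), (0x0993, 0x09A8),
    (0x09AA, 0x09B0), (0x09B2, 0x09B2), (0x09B6, 0x09B9), (0x09BE, 0x09C4),
    (0x09C7, 0x09C8), (0x09CB, 0x09CE), (0x09F0, 0x09F1),
]
REPERTOIRE = {cp for lo, hi in REPERTOIRE_RANGES for cp in range(lo, hi + 1)}

# Rows 28, 30 and 43: RRA, RHA and YYA in their NFC (decomposed) form.
NUKTA_SEQUENCES = {"\u09A1\u09BC", "\u09A2\u09BC", "\u09AF\u09BC"}


def uses_only_repertoire(label: str) -> bool:
    """True if every character of the NFC form of `label` is allowed by Table 8."""
    text = unicodedata.normalize("NFC", label)
    i = 0
    while i < len(text):
        if text[i:i + 2] in NUKTA_SEQUENCES:
            i += 2  # consume the base consonant and the nukta together
        elif ord(text[i]) in REPERTOIRE:
            i += 1
        else:
            return False
    return True


print(len(REPERTOIRE), len(REPERTOIRE) + len(NUKTA_SEQUENCES))
print(uses_only_repertoire("\u09AD\u09BE\u09B0\u09A4"))  # ভারত
print(uses_only_repertoire("\u09DC"))  # precomposed RRA, accepted after NFC
```

Treating the nukta pairs as indivisible units, rather than adding U+09BC to the set, is what keeps a stray nukta (for example after ক) out of valid labels.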

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion of Bangla LETTER A and LETTER E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, in the repertoire, to represent the [æ] sound as in English 'bat', 'cat', etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire in the LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.

Sr No | Code Points | Glyph | Character Names | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used in sequence in three special characters in normalized form: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points

5.4 The Basis of Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators (Sr Number, Operator, Function):

1. "|": Alternative
2. "[ ]": Optional
3. "*": Variable Repetition
4. "( )": Sequence Group

Table 11: The ABNF Formalism


5.4.2 The Vowel Sequence
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given. To facilitate understanding for users of other Brahmi scripts, equivalents in Devanāgarī are provided wherever necessary.

A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). The number of D, B or X which can follow a V in Bangla need not be restricted to one. Going by the rules illustrated in this document, it is clear that formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have sequences such as a Candrabindu followed by an Anusvāra on the one hand, and a Candrabindu followed by a Visarga on the other, as in হ্যাঁংচা 'hænchā' and হ্যাঁঃ 'hæn' respectively. In other words, a Visarga or an Anusvāra may follow a Candrabindu in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. "Ya-phala is a presentation form of U+09AF Bangla letter য or 'ya', represented by the sequence <U+09CD (BENGALI SIGN VIRAMA, i.e. the Bangla Hasanta or Virama), U+09AF য BENGALI LETTER YA>." Ya-phala thus has a special form, ্য. When it is further combined with U+09BE া BENGALI VOWEL SIGN AA ('aa', ā), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel
Example:
V অ अ

5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].

Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
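The vowel-sequence combinations of 5.4.2.1 to 5.4.2.3 can be approximated by a regular expression. This is a sketch under stated assumptions, not the normative ABNF: the vowel class is taken from Table 8, the VHCM form is limited to অ and এ as in Table 9, and the optional tail `D? (B|X)?` is one way of encoding the permitted endings B, D, X, DB and DX. The names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical.

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"  # independent vowels in Table 8
D, B, X = "\u0981", "\u0982", "\u0983"          # candrabindu, anusvara, visarga
HCM = "\u09CD\u09AF\u09BE"                      # hasanta + YA + vowel sign AA

# Optional tail: nothing, B, D, X, DB or DX. BB, DD, XX, BD and XD cannot match.
TAIL = f"{D}?[{B}{X}]?"

# Per Table 9, the VHCM form is only available for অ (U+0985) and এ (U+098F).
VOWEL_SEQUENCE = re.compile(f"(?:[\u0985\u098F]{HCM}|{V}){TAIL}")


def is_vowel_sequence(text: str) -> bool:
    """True if `text` is exactly one vowel sequence as described in 5.4.2."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None


samples = [
    "\u0985\u0982",                          # VB: অং
    "\u0985\u0981\u0982",                    # VDB: অঁং
    "\u0985\u09CD\u09AF\u09BE\u0981\u0983",  # VHCMDX: অ্যাঁঃ
    "\u0985\u0982\u0982",                    # VBB: invalid
    "\u0987\u09CD\u09AF\u09BE",              # ই + ya-phala + aa: not in Table 9
]
for sample in samples:
    print(sample, is_vowel_sequence(sample))
```

Putting `D?` before `[BX]?` is what forbids the reversed orders BD and XD while still accepting DB and DX.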

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example:
C ক क

5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

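5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Examples:
CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य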
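The ABNF expression *3(CH)C translates directly into a bounded regular-expression repetition. The sketch below is illustrative only: the consonant class is assembled from Table 8 (with the nukta forms of RRA, RHA and YYA treated as single consonants), and `CLUSTER` and `cluster_members` are hypothetical names.

```python
import re
from typing import List, Optional

# Consonants of Table 8; the nukta forms of RRA, RHA and YYA are tried first.
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")
H = "\u09CD"  # hasanta / virama

# ABNF *3(CH)C: zero to three (consonant + hasanta) pairs, then a consonant,
# i.e. a cluster of one to four consonants.
CLUSTER = re.compile(f"(?:{C}{H}){{0,3}}{C}")


def cluster_members(text: str) -> Optional[List[str]]:
    """Split a well-formed cluster into its consonants; None if it is not one."""
    if CLUSTER.fullmatch(text) is None:
        return None
    return text.split(H)


# Input: ন্ত্র্য, the CHCHCHC example above.
print(cluster_members("\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF"))
```

A five-consonant chain such as ক্ক্ক্ক্ক is rejected by this pattern, matching the "up to 4" limit stated above.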

5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:
CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example:
Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
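The [CH]Z restriction of 5.4.4.2 (C must be one of the two RAs) can be sketched as a small pattern. This covers only the Khanda-Ta sequence as written here; the broader WLE rule on Khanda Ta in Section 7 is not implemented. `KHANDA_TA_SEQUENCE` and `is_khanda_ta_sequence` are hypothetical names.

```python
import re

P = "[\u09B0\u09F0]\u09CD"  # RA (Bangla U+09B0 or Assamese U+09F0) + hasanta
Z = "\u09CE"                 # BENGALI LETTER KHANDA TA

# 5.4.4: a Khanda Ta on its own, or [CH]Z with C restricted to the two RAs.
KHANDA_TA_SEQUENCE = re.compile(f"(?:{P})?{Z}")


def is_khanda_ta_sequence(text: str) -> bool:
    """True if `text` is exactly ৎ, র্ৎ or ৰ্ৎ."""
    return KHANDA_TA_SEQUENCE.fullmatch(text) is not None


word = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # ভর্ৎসনা (bhartsanā)
match = KHANDA_TA_SEQUENCE.search(word)
print(match.group() if match else None)
print(is_khanda_ta_sequence("\u0995\u09CD\u09CE"))  # ক্ৎ: KA is not an allowed C
```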

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). For rendering Ya-phalā following অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the [æ] sound as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

Another case refers to the formation of repha and ra-phalā in the said script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and hence are being proposed as variants.

Variants arise in a script when two or more identical-looking forms are produced with different storage sequences or code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
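The two spellings of ko are also unified by Unicode normalization, which IDNA2008 applies by requiring labels to be in Normalization Form C (NFC). The check below uses only Python's standard `unicodedata` module and is offered as a supporting illustration, not as part of the NBGP's rules.

```python
import unicodedata

ko_one_key = "\u0995\u09CB"          # KA + VOWEL SIGN O (single key)
ko_two_keys = "\u0995\u09C7\u09BE"   # KA + VOWEL SIGN E + VOWEL SIGN AA (two keys)

# U+09CB has the canonical decomposition <U+09C7, U+09BE>, so NFC turns the
# two-key spelling into the single code point, and NFD does the reverse.
print(unicodedata.normalize("NFC", ko_two_keys) == ko_one_key)  # True
print(unicodedata.normalize("NFD", ko_one_key) == ko_two_keys)  # True

# The same holds for AU: U+09CC decomposes to <U+09C7, U+09D7 AU LENGTH MARK>.
print(unicodedata.normalize("NFC", "\u0995\u09C7\u09D7") == "\u0995\u09CC")  # True
```

Because the two-key form never survives normalization, a rule limiting a consonant to one kāra is enough to rule out the unnormalized spelling.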

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.

CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly in the belief that it was a হ (ha, U+09B9), increasing the risk of errors in label writing and identification.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

The fonts which represent the traditional Bangla writing system could tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II
Another interesting example of a variant is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences (mainly because both Assamese and Bangla find a place in the same Unicode block, and 'B_RA' as well as 'A_RA' refer to the same phonetic element). Here the final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

Where:

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: As Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with Repha look identical; the addition of Repha does not make any difference. 'Ra-phala' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

Where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phala

As the Assamese and Bangla Repha and Ra-phala conjunct forms look the same, this could cause confusion for end users. Hence the repha and ra-phala cases need to be defined as variants. The NBGP concluded that র and ৰ should be defined as variant code points, since a single variant set between র and ৰ covers all cases. But this will create blocked variant labels: e.g. if someone registers "ররর", the label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. However, the block applies only at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
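The blocking behaviour just described can be sketched by enumerating every label reachable by swapping র and ৰ. This is a simplified illustration, assuming a single two-member variant set and ignoring the variant types and dispositions of the actual LGR XML format (RFC 7940); `VARIANTS` and `variant_labels` are hypothetical names.

```python
from itertools import product
from typing import Set

# Variant set from 6.1, CASE II: Bangla RA and Assamese RA are mutual variants.
VARIANTS = {
    "\u09B0": ("\u09B0", "\u09F0"),  # র -> র or ৰ
    "\u09F0": ("\u09F0", "\u09B0"),  # ৰ -> ৰ or র
}


def variant_labels(label: str) -> Set[str]:
    """Every label obtained by swapping র/ৰ at any positions, minus `label` itself."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


registered = "\u09B0\u09B0\u09B0"  # ররর
blocked = variant_labels(registered)
print(len(blocked), "\u09F0\u09F0\u09F0" in blocked)
print("\u09F0\u09F0" in blocked)  # ৰৰ is a different label, so it stays available
```

A label with k occurrences of র or ৰ has 2^k - 1 variant labels, which is why blocking is done per label rather than per code point.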

6.2 Cross-Script Variants
A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.2) for reference.

11 Unicode uses Oriya for the script, although Odia is now the official term used.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
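Tables 14 and 15 can be read as a lookup from a Bangla code point to its confusable counterpart in each sister script. The sketch assumes that a cross-script variant label exists only when every character of the Bangla label has a counterpart in the target script; that assumption, and the names `CROSS_SCRIPT` and `cross_script_variant`, are illustrative and not taken from the proposal.

```python
from typing import Optional

# Tables 14 and 15: Bangla code point -> confusable code point per script.
CROSS_SCRIPT = {
    "\u09AE": {"Devanagari": "\u092E", "Gurmukhi": "\u0A38"},  # ম ~ म ~ ਸ
    "\u09BF": {"Devanagari": "\u093F", "Gurmukhi": "\u0A3F"},  # ি ~ ि ~ ਿ
}


def cross_script_variant(label: str, script: str) -> Optional[str]:
    """Map every character of a Bangla label into `script`, or None if any has no pair."""
    mapped = []
    for ch in label:
        target = CROSS_SCRIPT.get(ch, {}).get(script)
        if target is None:
            return None  # one unmapped character means no whole-label variant
        mapped.append(target)
    return "".join(mapped)


print(cross_script_variant("\u09AE\u09BF", "Devanagari"))  # মি
print(cross_script_variant("\u09AE\u09BF", "Gurmukhi"))
print(cross_script_variant("\u0995", "Devanagari"))  # ক has no listed counterpart
```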

7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in the code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ, BENGALI LETTER A) or 098F (এ, BENGALI LETTER E), H is 09CD (্, BENGALI SIGN VIRAMA), C1 is 09AF (য, BENGALI LETTER YA) and M1 is 09BE (া, BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্, BENGALI SIGN VIRAMA).

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that could theoretically conjoin from two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", out of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be seen in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows:


1. C is a set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Examples: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Examples: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Examples: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Examples: ইৎ, কৎ, পর্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
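Rules 1 to 9 are all "what may precede this character" conditions, so they can be checked by tokenizing a label and looking at each token's predecessor. The sketch below is an unofficial reading of the rules, not the normative LGR XML: it assumes NFC input, treats S1/S2 and the CN nukta pairs as single tokens, counts S as ending in M (per the note to rule 7), and reads rule 7 strictly, so a label cannot begin with Khanda Ta. All names (`CATEGORY`, `tokenize`, `is_valid_label`) are hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple

# Character classes from Table 8 / Table 16 (C, V, M, B, D, X, H, Z).
CATEGORY = {}
for lo, hi, cat in [
    (0x0985, 0x098B, "V"), (0x098F, 0x0990, "V"), (0x0993, 0x0994, "V"),
    (0x0995, 0x09A8, "C"), (0x09AA, 0x09B0, "C"), (0x09B2, 0x09B2, "C"),
    (0x09B6, 0x09B9, "C"), (0x09F0, 0x09F1, "C"),
    (0x09BE, 0x09C4, "M"), (0x09C7, 0x09C8, "M"), (0x09CB, 0x09CC, "M"),
]:
    for cp in range(lo, hi + 1):
        CATEGORY[chr(cp)] = cat
CATEGORY.update({"\u0981": "D", "\u0982": "B", "\u0983": "X",
                 "\u09CD": "H", "\u09CE": "Z"})

CN = {"\u09A1\u09BC", "\u09A2\u09BC", "\u09AF\u09BC"}          # rule 1
S = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}   # S1, S2
RA = {"\u09B0", "\u09F0"}                                      # C2 of P

# Rules 2-7: categories allowed immediately before each sign.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},  # the P case is handled separately
}


def tokenize(label: str) -> Optional[List[Tuple[str, str]]]:
    """NFC-normalize and split into (category, text) tokens; None if out of repertoire."""
    text = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(text):
        if text[i:i + 4] in S:
            tokens.append(("S", text[i:i + 4]))
            i += 4
        elif text[i:i + 2] in CN:
            tokens.append(("C", text[i:i + 2]))
            i += 2
        elif text[i] in CATEGORY:
            tokens.append((CATEGORY[text[i]], text[i]))
            i += 1
        else:
            return None
    return tokens


def is_valid_label(label: str) -> bool:
    tokens = tokenize(label)
    if not tokens:
        return False
    for k, (cat, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k > 0 else None
        if prev == "S":
            prev = "M"  # S ends with VOWEL SIGN AA (note to rule 7)
        if cat in ("V", "S"):
            if prev == "H":
                return False  # rules 8 and 9
        elif cat == "Z" and prev == "H":
            if k < 2 or tokens[k - 2][1] not in RA:
                return False  # only P (র্ or ৰ্) may precede Z via hasanta
        elif cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            return False
    return True


print(is_valid_label("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(is_valid_label("\u0995\u0982\u0983"))  # কংঃ (B followed by X)
```

Checking predecessors token by token, rather than writing one large regular expression, keeps each rule visible as one line of `ALLOWED_BEFORE` or one branch of the loop.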

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures, because any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋk ʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE), meaning "Bank of India". This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered sufficient to represent the sound intended for the first word.

This is a unique situation necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner character from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Examples:
্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक
As can be seen, such combinations will automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is quasi-automatically applied wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S. Examples:
অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D or X permitted after a Consonant, a Vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalid. Examples:
কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a Consonant is restricted to one. Example:
কীী कीी

5. M is not permitted after V. Examples:
ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible. Examples:
কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof Rafiqul Islam, National Professor of Humanities, Dhaka
Prof Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān Ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] ----- (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). 2017. Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.05.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.05.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.05.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.05.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.05.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'Phones of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.05.2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.05.2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.05.2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has happened So Far in terms of Script Reforms'. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the structures of the formalism do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not generally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র) (U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
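The three Bangla-specific restrictions above can be checked mechanically. The following Python sketch is a minimal, hypothetical checker (the function name `restriction_violations` and its messages are illustrative, not part of the LGR XML); it assumes labels are already NFC-normalized and covers only rules 1-3 as worded here, not the full set of Whole Label Evaluation rules.

```python
# Hypothetical checker for the Section 10.1 restrictions (rules 1-3 only).
KHANDA_TA = "\u09CE"  # ৎ
NYA = "\u099E"        # ঞ
NGA = "\u0999"        # ঙ
RA = "\u09B0"         # র
VIRAMA = "\u09CD"     # ্ (Hasanta)
YA = "\u09AF"         # য
AA_SIGN = "\u09BE"    # া
# Independent vowels in the repertoire: অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
VH_VOWELS = {"\u0985", "\u098F"}  # only অ and এ may take a following Hasanta


def restriction_violations(label: str) -> list[str]:
    """Return the rule violations found; an empty list means the label passes."""
    problems = []
    # Rule 1: ৎ, ঞ and ঙ may not begin a label.
    if label and label[0] in (KHANDA_TA, NYA, NGA):
        problems.append("rule 1: label starts with a forbidden character")
    for i, ch in enumerate(label):
        # Rule 2: consonant + Hasanta before ৎ is allowed only if the consonant is র.
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == VIRAMA and label[i - 2] != RA:
            problems.append(f"rule 2: consonant before Hasanta + Khanda Ta at {i} is not র")
        # Rule 3: Hasanta directly after a vowel is allowed only in অ্যা / এ্যা.
        if ch == VIRAMA and i >= 1 and label[i - 1] in VOWELS:
            if label[i - 1] not in VH_VOWELS or label[i + 1:i + 3] != YA + AA_SIGN:
                problems.append(f"rule 3: Hasanta after a vowel at {i} is not অ্যা/এ্যা")
    return problems


print(restriction_violations("ভর্ৎসনা"))  # []
print(restriction_violations("অ্যাসিড"))  # []
print(restriction_violations("ৎসেরিং"))   # ['rule 1: label starts with a forbidden character']
```

Note that a transliterated name such as ৎসেরিং, which the text says may occur, is still rejected here because rule 1 is applied exactly as stated.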

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not to the extent of being defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points
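Because every code point in the Bangla repertoire (Table 8) lies in the Bengali Unicode block, the cross-script pairs in Tables 18-22 cannot be mixed inside one label; what remains is possible confusion between whole labels in different scripts, which the text leaves to string-similarity review. A minimal sketch of such a single-script filter, assuming the block range U+0980-U+09FF as the boundary (the helper name `foreign_code_points` is hypothetical):

```python
import unicodedata

BENGALI_BLOCK = range(0x0980, 0x0A00)  # U+0980..U+09FF


def foreign_code_points(label: str) -> list[tuple[str, str]]:
    """List code points outside the Bengali block, with their Unicode names."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in label
        if ord(ch) not in BENGALI_BLOCK
    ]


# Devanagari घ (U+0918) substituted for its Bangla look-alike ঘ (U+0998)
print(foreign_code_points("\u0918\u09B0"))  # [('U+0918', 'DEVANAGARI LETTER GHA')]
print(foreign_code_points("\u0998\u09B0"))  # []
```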

11 Appendix-II: Bengali consonants and their allographs

Consonant | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প | p | প্ত, প্ন, প্প, প্য, প্র, প্ল, প্স |
ফ | pʰ | ফ্র, ফ্ল |
ব | b | ব্জ, ব্দ, ব্ধ, ব্ব, ব্য, ব্র, ব্ল, ব্ভ | ব্ধ, হ্ব
ভ | bʱ | ভ্য, ভ্র, ভ্ল |
ত | t | ত্ত, ত্ত্য, ত্ত্ব, ত্থ, ত্ন, ত্য, ত্ম, ত্ব, ত্র, প্ত, ক্ত, ক্ত্ব. There is a marked form of ত + ্ = ৎ, as in র্ৎ. | ক্ত


থ | tʰ | থ্য, থ্র, স্থ, ত্থ, ন্থ | ন্থ, স্থ
দ | d | দ্গ, দ্ঘ, দ্দ, দ্ধ, দ্য, দ্ব, দ্ভ, দ্র, ন্দ, ন্দ্র, র্দ্র | দ্গ, দ্ধ
ধ | dʱ | ধ্ন, ধ্ম, ধ্য, ধ্র, গ্ধ, দ্ধ, ন্ধ | গ্ধ, দ্ধ, ব্ধ, ন্ধ
ট | ʈ | ট্ট, ট্য, ট্ব, ট্র, ক্ট, ষ্ট |
ঠ | ʈʰ | ঠ্য, ণ্ঠ, ষ্ঠ |
ড | ɖ | ড্ড, ড্য, ড্র |
ঢ | ɖʱ | ঢ্য, ণ্ঢ |
চ | tʃ | চ্চ, চ্ছ, চ্ছ্র, চ্ঞ, চ্য, ঞ্চ, শ্চ | ঞ্চ


ছ | tʃʰ | ছ্র, চ্ছ, ঞ্ছ, শ্ছ | ঞ্ছ
জ | dʒ | জ্জ, জ্জ্ব, জ্ঝ, জ্ঞ, জ্য, জ্র, ঞ্জ | ঞ্জ
ঝ | dʒʱ | (not privileged enough to have clusters as a first member) জ্ঝ, ঞ্ঝ |
ক | k | ক্ক, ক্ট, ক্ত, ক্ত্র, ক্ত্ব, ক্ন, ক্ব, ক্ম, ক্য, ক্র, ক্ষ, ক্ষ্ণ, ক্ষ্ম, ক্ষ্ব, ক্ষ্য, ক্স, ঙ্ক | ক্ত, ক্ত্র, ক্ত্ব, ক্র, ঙ্ক
খ | kʰ | (not privileged enough to have clusters as a first member) ঙ্খ |


গ | g | গ্গ, গ্দ, গ্ধ, গ্ন, গ্ব, গ্ম, গ্য, গ্র, গ্ল, ঙ্গ, র্ঙ্গ | গ্ধ, ঙ্গ, র্ঙ্গ
ঘ | gʱ | ঘ্ন, ঘ্য, ঘ্র, ঙ্ঘ |
ঞ | This letter does not have any particular phonetic value but is mostly pronounced as n | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ, জ্ঞ | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ
ণ | n | ণ্ট, ণ্ঠ, ণ্ড, ণ্ড্র, ণ্ঢ, ণ্ণ, ণ্য, ণ্ব, ক্ষ্ণ, ষ্ণ, হ্ণ | ণ্ড, ণ্ড্র, ষ্ণ
ঙ, ং | ŋ | ঙ্ক, ঙ্ক্র, ঙ্খ, ঙ্গ, ঙ্ঘ, ঙ্ক্ষ (in some contexts ঙ্ is replaced by ং, as in কং, অং) | ঙ্ক, ঙ্গ, ঙ্ঘ


ম | m | ম্ল, ম্প, ম্প্র, ম্ভ, ম্ম, ম্র, ত্ম, ধ্ম, হ্ম, ক্ষ্ম | হ্ম
ন | n | ন্ট, ন্ট্র, ণ্ঠ, ন্ড, ন্ড্র, ন্ত, ন্ত্র, ন্ত্র্য, ন্থ, ন্দ, ন্দ্র, ন্ধ, ন্ধ্র, ন্দ্ব, ন্ন, ন্ম, ন্য, ন্স, হ্ন | ন্থ, ন্ধ
শ | ʃ | শ্চ, শ্ছ, শ্ন, শ্ম, শ্র, শ্ল, শ্য |
ষ | ʃ | ষ্ক, ষ্ট, ষ্ঠ, ষ্ণ, ষ্প, ষ্প্র, ষ্ফ, ষ্ট্র, ষ্য, ক্ষ, ক্ষ্ণ, ক্ষ্ম | ষ্ণ


স | s & ʃ | স্ক, স্ট, স্প, স্ফ, স্ত, স্থ, স্খ, স্য, স্র, স্ল, ক্স | স্থ
হ | h | হ্ণ, হ্ন, হ্ম, হ্য, হ্র, হ্ল | হ্ম
ড় | ɽ | ড়্গ |
ঢ় | ɽʱ | (not privileged enough to have clusters) |
য | dʒ | The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার etc.), but after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য. | ক্য (ক্ + য), স্য (স্ + য), র‍্য (র + য). [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র | r | Two manifestations:
i. রেফ repʰ, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র্ + ধ্ + ব; earlier written র্দ্ধ্ব = র্ + দ্ + ধ্ + ব, a four-term cluster) etc. (placed over the following consonant);
ii. র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows). |
ল | l | ল্গ, ল্প, ল্ব, ল্ম, ল্ট, ল্ড, ল্ক, ল্দ, ল্য, গ্ল, ল্ল, ম্ল |
ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | অঃ, কঃ |


Example: BENGALI SIGN AVAGRAHA ঽ (U+09BD), even if allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the MSR. To sum up the restrictions: start off by admitting only such characters as are part of the code block of the given script/language; the IDNA protocol further narrows this down; and finally an additional filter, in the form of the Maximal Starting Repertoire, restricts the character set associated with the given language even more.

4.2.1.1 No Punctuation Marks
The TLDs being identifiers, punctuation markers present in Brāhmī-based scripts will not be included.

4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures, and other such iconic characters like BANGLA ISSHAR ৺ (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN ৹ (U+09F9) etc. will also not be included.

4.2.1.3 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as the Sanskritic VOCALIC RR ৠ (U+09E0) and VOCALIC L ঌ (U+098C), as well as VOCALIC LL ৡ (U+09E1), and the allographic kāra forms of the latter two symbols: VOWEL SIGN VOCALIC L ৢ (U+09E2) and VOWEL SIGN VOCALIC LL ৣ (U+09E3). All such characters are excluded, which complies with the Conservatism principle as laid down in the Root Zone LGR procedure. However, in Bangla the kāra corresponding to VOCALIC RR ৠ (U+09E0), which is VOWEL SIGN VOCALIC RR ৄ (U+09C4), is still in active use in certain limited borrowed or Sanskritic words and is therefore retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit will not be included. This is also in consonance with the Letter principle as laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and in the Appendix (Section 10.1).
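The exclusions named in this section act as a deny-list layered on top of what IDNA2008 would otherwise permit. A minimal sketch, assuming only the code points explicitly named in 4.2.1 (this is an illustrative subset, not the complete MSR exclusion set; `excluded_in` is a hypothetical helper):

```python
import unicodedata

# Code points named in Section 4.2.1 as excluded from the Root Zone repertoire.
EXCLUDED = {0x09BD, 0x09F9, 0x09FA, 0x098C, 0x09E0, 0x09E1, 0x09E2, 0x09E3}


def excluded_in(label: str) -> list[str]:
    """Return 'U+XXXX NAME' for each excluded code point found in the label."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in label if ord(ch) in EXCLUDED]


print(excluded_in("\u0995\u09C4"))  # [] (VOWEL SIGN VOCALIC RR U+09C4 is retained)
print(excluded_in("\u098C"))        # ['U+098C BENGALI LETTER VOCALIC L']
```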

5 Repertoire
The Bangla Writing System is represented in UNICODE using the Bengali (Bangla) script name as enumerated in ISO 15924, corresponding to languages such as Asamiyā (Assamese), Bangla (Bengali) and Manipuri. The BENGALI block used for Bangla-Asamiyā-Manipuri in UNICODE has 93 entries. This section details the code-point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the Bangla LGR.

It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26th February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in its 8th Meeting of the Indian Language Technologies and Products Sectional Committee (LITD 20) held on 23rd Aug 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE Consortium takes any further action, it will be assumed that the Code Point Repertoire under Table 8 is valid for all three languages named above.

For each of the code points, language references are given in the last column ("References and Comment") of Table 8, the Code Point Repertoire. For complete coverage of the Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced any new complications except for one particular character. Though only a few representative languages under EGIDS Scale 1-4 have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given under Bangla Unicode Points (as given in UNICODE 6.3). However, before the details are presented, it is useful to look at the Bangla Code Point Chart from the Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of the reference glyphs given below in the code charts are based on one of the many fonts designed and are not prescriptive, because there could be some variation in actual fonts, both UNICODE-compatible and TrueType ones. Consider the following code point table:

Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode Block as in Figure 1, among the code points that are included in the MSR, the following symbols will need separate treatment:
ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] U+09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] U+09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] U+09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122][125]

Table 8: Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, in turn followed by Bangla VOWEL SIGN AA, in the repertoire, to allow for the æ sound as in English 'bat', 'cat' etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
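A minimal sketch of a membership check against Table 8 follows. It assumes the table is the complete single-code-point repertoire, treats the three nukta rows as "U+09BC only directly after ড, ঢ or য", and does not validate the S1/S2 contexts or any Whole Label Evaluation rule; the names `REPERTOIRE` and `in_repertoire` are hypothetical.

```python
def _span(first: int, last: int) -> set[int]:
    return set(range(first, last + 1))

# Single code points from Table 8 (rows 28, 30 and 43 are base + nukta sequences).
REPERTOIRE = (
    _span(0x0981, 0x0983)        # ঁ ং ঃ
    | _span(0x0985, 0x098B)      # অ .. ঋ
    | {0x098F, 0x0990, 0x0993, 0x0994}
    | _span(0x0995, 0x09A8)      # ক .. ন
    | _span(0x09AA, 0x09B0)      # প .. র
    | {0x09B2}                   # ল
    | _span(0x09B6, 0x09B9)      # শ ষ স হ
    | _span(0x09BE, 0x09C4)      # া .. ৄ
    | {0x09C7, 0x09C8, 0x09CB, 0x09CC, 0x09CD, 0x09CE, 0x09F0, 0x09F1}
)
NUKTA = 0x09BC
NUKTA_BASES = {0x09A1, 0x09A2, 0x09AF}  # ড ঢ য, giving ড় ঢ় য়

# 61 single code points + 3 nukta sequences = the 64 rows of Table 8
assert len(REPERTOIRE) + 3 == 64


def in_repertoire(label: str) -> bool:
    cps = [ord(ch) for ch in label]
    for i, cp in enumerate(cps):
        if cp == NUKTA:
            # Nukta is never used alone: it must follow ড, ঢ or য directly.
            if i == 0 or cps[i - 1] not in NUKTA_BASES:
                return False
        elif cp not in REPERTOIRE:
            return False
    return True


print(in_repertoire("বাংলা"))          # True
print(in_repertoire("\u09DC"))         # False: precomposed RRA is not in the repertoire
print(in_repertoire("\u09A1\u09BC"))   # True: ড + nukta
```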

5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that find a place in Unicode but have not been included in the repertoire of this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only in sequences forming the three special characters ড়, ঢ় and য় in normalized form.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points
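Why the nukta sequences are needed can be seen directly from Unicode normalization: U+09DC, U+09DD and U+09DF are composition exclusions, so NFC (which IDNA2008 requires of labels) decomposes them into base + nukta and never recomposes them. A short check using Python's standard `unicodedata` module (the helper name is illustrative):

```python
import unicodedata


def nfc_code_points(text: str) -> list[str]:
    """Code points of the NFC form of text, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


for precomposed in ("\u09DC", "\u09DD", "\u09DF"):  # ড় ঢ় য়
    print(f"U+{ord(precomposed):04X} ->", " ".join(nfc_code_points(precomposed)))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```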

5.4 The Basis of the Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:
C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism
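Before the ABNF rules can be applied, each character of a label has to be mapped to one of the variables above. A minimal sketch of that mapping, assuming the classes follow Table 8 and that ড়, ঢ় and য় (stored as base + nukta) count as a single C; the function `abnf_classes` is hypothetical:

```python
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
KARAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
CONSONANT_RANGES = [(0x0995, 0x09A8), (0x09AA, 0x09B0), (0x09B2, 0x09B2),
                    (0x09B6, 0x09B9), (0x09F0, 0x09F1)]
NUKTA = "\u09BC"


def abnf_classes(label: str) -> str:
    """Map a label to its string of ABNF variables (C, V, M, B, D, X, H, Z)."""
    out = []
    i = 0
    while i < len(label):
        ch = label[i]
        cp = ord(ch)
        if ch in VOWELS:
            out.append("V")
        elif ch in KARAS:
            out.append("M")
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        elif any(lo <= cp <= hi for lo, hi in CONSONANT_RANGES):
            out.append("C")
            # ড়, ঢ়, য় are stored as base + nukta but count as one consonant
            if ch in "\u09A1\u09A2\u09AF" and label[i + 1:i + 2] == NUKTA:
                i += 1
        else:
            raise ValueError(f"U+{cp:04X} is outside the repertoire")
        i += 1
    return "".join(out)


print(abnf_classes("কাঁ"))      # CMD
print(abnf_classes("ভর্ৎসনা"))  # CCHZCCM
```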

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel (V). It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). More than one of these signs may follow a V in Bangla, but the same sign may not be repeated: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have a Candrabindu combined with an Anusvāra on the one hand and with a Visarga on the other, as in হ্যাংচা 'hænchā' and হ্যাঃ 'hæn' respectively; that is, a Visarga or Anusvāra (onuʃʃār) can follow a Candrabindu in Bangla.

A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta or VIRAMA), U+09AF য BENGALI LETTER YA>, and has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX], or by a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
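The vowel-sequence rules of 5.4.2.1 to 5.4.2.3 amount to (V | VHCM) [D] [B | X], with VHCM limited to অ্যা and এ্যা. One possible regular-expression rendering is sketched below; it assumes Candrabindu always precedes Anusvāra or Visarga, as in the DB and DX forms, and the names used are hypothetical.

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"  # অ .. ঋ, এ, ঐ, ও, ঔ
VHCM = "[\u0985\u098F]\u09CD\u09AF\u09BE"       # অ্যা / এ্যা (sequences S1, S2)
D = "\u0981"                                     # Candrabindu
BX = "[\u0982\u0983]"                            # Anusvara or Visarga

# (V | VHCM) [D] [B | X] covers V, VB, VD, VX, VDB, VDX and the VHCM forms
VOWEL_SEQUENCE = re.compile(f"(?:{VHCM}|{V}){D}?{BX}?")


def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQUENCE.fullmatch(s) is not None


print(is_vowel_sequence("অঁং"))     # True  (VDB)
print(is_vowel_sequence("অ্যাঁঃ"))  # True  (VHCMDX)
print(is_vowel_sequence("অংং"))     # False (VBB is invalid)
```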

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

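5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Examples:
CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य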

5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:
CHCM ক্কী → ক ্ ক ী; क्की → क ् क ी
CHCB ক্কং → ক ্ ক ং; क्कं → क ् क ं
CHCD ক্কঁ → ক ্ ক ঁ; क्कँ → क ् क ँ
CHCX ক্কঃ → ক ্ ক ঃ; क्कः → क ् क ः
CHCDB ক্কঁং → ক ্ ক ঁ ং; क्कँं → क ् क ँ ं
CHCDX ক্কঁঃ → ক ্ ক ঁ ঃ; क्कँः → क ् क ँ ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:
CHCMB ক্কীং → ক ্ ক ী ং; क्कीं → क ् क ी ं
CHCMD ক্কাঁ → ক ্ ক া ঁ; क्काँ → क ् क ा ँ
CHCMX ক্কীঃ → ক ্ ক ী ঃ; क्कीः → क ् क ी ः
CHCMDB ক্কাঁং → ক ্ ক া ঁ ং; क्काँं → क ् क ा ँ ं
CHCMDX ক্কাঁঃ → ক ্ ক া ঁ ঃ; क्काँः → क ् क ा ँ ः
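The consonant rules of 5.4.3.1 to 5.4.3.5 can be summarised as CH | *3(CH) C [M] [D] [B | X]. A minimal regular-expression sketch, assuming that a bare trailing Hasanta (pure consonant) is allowed only after a single consonant as listed in 5.4.3.2, and that ড়, ঢ়, য় are stored as base + nukta; the pattern names are hypothetical:

```python
import re

# ড, ঢ, য may carry a nukta (ড় ঢ় য়); other consonants are single code points.
C = "(?:[\u09A1\u09A2\u09AF]\u09BC|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])"
H = "\u09CD"                                   # Hasanta / Virama
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"  # kāra signs
D = "\u0981"                                   # Candrabindu
BX = "[\u0982\u0983]"                          # Anusvara or Visarga

# CH (pure consonant) | *3(CH) C [M] [D] [B | X]
CONSONANT_SEQUENCE = re.compile(f"{C}{H}|(?:{C}{H}){{0,3}}{C}{M}?{D}?{BX}?")


def is_consonant_sequence(s: str) -> bool:
    return CONSONANT_SEQUENCE.fullmatch(s) is not None


print(is_consonant_sequence("ন্ত্র্য"))  # True  (CHCHCHC, four consonants)
print(is_consonant_sequence("ক্কাঁঃ"))   # True  (CHCMDX)
print(is_consonant_sequence("কংঁ"))      # False (Anusvara before Candrabindu)
```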

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition on KHANDA TA in this context is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) preceded by these vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case concerns the formation of repha and ra-phalā in the script, referred to in Table 16 as P. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, ra + Hasanta + ka, as in অর্ক arka "the sun"), and ra-phalā = C + Hasanta + ra (e.g. ক্র, ka + Hasanta + ra, as in চক্র cakra "cycle"). The point is that in both cases the slot for ra can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.

6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases that belong to Group 2 are proposed as variants. These cases are not of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and hence are being proposed as variants.

Variants arise in a script when two or more identical-looking forms are produced from different stored sequences or code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two instances of ko (কো), one typed with a single key and the other with two different keys. However, this will not be treated as a case of variants, because a kāra followed by a kāra is not allowed.
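The two-key case can also be checked against Unicode normalization: IDNA2008 requires U-labels to be in Normalization Form C (NFC), and Unicode defines U+09CB BENGALI VOWEL SIGN O as canonically equivalent to U+09C7 + U+09BE, so NFC folds the two-key input into the single code point. The same mechanism lies behind the comments on rows 28, 30 and 43 of the repertoire in Section 5.1: U+09DC, U+09DD and U+09DF are composition exclusions, so their NFC form stays decomposed (base letter + U+09BC NUKTA). A minimal check with Python's standard library:

```python
import unicodedata

ko_one_key = "\u0995\u09CB"          # ক + ো (U+09CB VOWEL SIGN O)
ko_two_keys = "\u0995\u09C7\u09BE"   # ক + ে + া (VOWEL SIGN E, VOWEL SIGN AA)

# Canonically equivalent: NFC composes U+09C7 + U+09BE into U+09CB.
print(unicodedata.normalize("NFC", ko_two_keys) == ko_one_key)  # True

# U+09DC is a composition exclusion, so NFC keeps ড় as DDA + NUKTA.
rra_nfc = unicodedata.normalize("NFC", "\u09DC")
print([f"U+{ord(c):04X}" for c in rra_nfc])  # ['U+09A1', 'U+09BC']
```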

6.1 In-Script Variants

Nevertheless, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus

স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus

ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, can be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). It can also be typed wrongly as হ (ha, U+09B9) by a user who takes it for that letter, increasing the risks in label writing and identification.

Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)

The fonts that represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode code points and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where:

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As ক and ঠ are the same in Bangla and Assamese, the resulting combinations with repha look identical; the repha itself has the same shape whichever RA is used. 'Ra-phalā' can be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-Ta

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end-users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. The block applies only at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
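The variant labels that such variant sets generate can be enumerated by taking, at every position, either the original code point or its variant. The sketch below rests on stated assumptions: it treats র/ৰ (CASE II) as a plain variant pair and swaps থ/হ (CASE I) only after স or ন plus Hasanta. It does not reproduce how the proposal's XML file (or the RFC 7940 LGR format) encodes contexts or the 'blocked' disposition, and all names are hypothetical.

```python
from itertools import product

HASANTA = "\u09CD"
RA_SWAP = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}     # CASE II: র <-> ৰ
THA_HA_SWAP = {"\u09A5": "\u09B9", "\u09B9": "\u09A5"}  # CASE I: থ <-> হ
SA_NA = {"\u09B8", "\u09A8"}  # স, ন: CASE I applies only after these + Hasanta


def variant_labels(label: str) -> set[str]:
    """Enumerate every label reachable by swapping variant code points."""
    options = []
    for i, ch in enumerate(label):
        alternatives = {ch}
        if ch in RA_SWAP:
            alternatives.add(RA_SWAP[ch])
        if (ch in THA_HA_SWAP and i >= 2
                and label[i - 1] == HASANTA and label[i - 2] in SA_NA):
            alternatives.add(THA_HA_SWAP[ch])
        options.append(sorted(alternatives))
    # Cartesian product over all positions, minus the original label itself.
    return {"".join(combo) for combo in product(*options)} - {label}


print(len(variant_labels("\u09B0\u09B0\u09B0")))  # ররর has 7 variants, including ৰৰৰ
```

Note that the full product also yields mixed labels such as রৰর; under a single র/ৰ variant set these would be blocked along with ৰৰৰ, while a label of a different length (ৰৰ, ৰৰৰৰ) is never among them.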

6.2 Cross-Script Variants

A focused cross-script study of Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in web domains. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
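Assuming, as is usual for root-zone LGRs, that a label must be entirely in one script, a Bangla label has a Devanāgarī or Gurmukhī cross-script variant only when every one of its code points appears in Table 14 or Table 15. A hypothetical sketch of that whole-label mapping:

```python
from typing import Optional

TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # Table 14: ম -> म, ি -> ि
TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # Table 15: ম -> ਸ, ি -> ਿ


def cross_script_variant(label: str, table: dict) -> Optional[str]:
    """Map every code point of the label, or return None if one has no counterpart."""
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


print(cross_script_variant("\u09AE\u09BF", TO_DEVANAGARI))  # মি -> मि
print(cross_script_variant("\u0995\u09BF", TO_DEVANAGARI))  # কি has no full mapping: None
```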

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1 or S2 (from Table 9), the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.
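One way to apply these symbols is to rewrite a label as a string of category letters before evaluating the rules. The sketch below is illustrative only: the code-point ranges stand in for the authoritative repertoire of Section 5.1 (they include, for example, U+098C VOCALIC L, which may not be in the repertoire), and the name `to_symbols` is hypothetical. S and P are matched first so that their internal Hasanta is not evaluated on its own, and a consonant followed by U+09BC NUKTA (the CN forms ড়, ঢ়, য়) is read as a single C.

```python
HASANTA, NUKTA = "\u09CD", "\u09BC"
RA_FORMS = {"\u09B0", "\u09F0"}                 # C2 in P: র, ৰ
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE",      # অ্যা
               "\u098F\u09CD\u09AF\u09BE"}      # এ্যা
CN_BASES = {"\u09A1", "\u09A2", "\u09AF"}       # ড, ঢ, য (+ NUKTA = ড়, ঢ়, য়)

# Assumed code-point classes; Section 5.1 is the authoritative repertoire.
CONSONANTS = ({chr(cp) for cp in range(0x0995, 0x09BA)}
              - {"\u09A9", "\u09B1", "\u09B3", "\u09B4", "\u09B5"}) | RA_FORMS | {"\u09F1"}
VOWELS = {chr(cp) for cp in range(0x0985, 0x098D)} | {"\u098F", "\u0990", "\u0993", "\u0994"}
MATRAS = {chr(cp) for cp in range(0x09BE, 0x09C5)} | {"\u09C7", "\u09C8", "\u09CB", "\u09CC"}
SIGNS = {"\u0981": "D", "\u0982": "B", "\u0983": "X", HASANTA: "H", "\u09CE": "Z"}


def to_symbols(label: str) -> str:
    """Rewrite a label as Table 16 symbols (longest special sequence first)."""
    out, i = [], 0
    while i < len(label):
        ch, nxt = label[i], label[i + 1:i + 2]
        if label[i:i + 4] in S_SEQUENCES:
            sym, width = "S", 4
        elif ch in RA_FORMS and nxt == HASANTA:
            sym, width = "P", 2
        elif ch in CN_BASES and nxt == NUKTA:
            sym, width = "C", 2
        elif ch in CONSONANTS:
            sym, width = "C", 1
        elif ch in VOWELS:
            sym, width = "V", 1
        elif ch in MATRAS:
            sym, width = "M", 1
        elif ch in SIGNS:
            sym, width = SIGNS[ch], 1
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside the assumed repertoire")
        out.append(sym)
        i += width
    return "".join(out)


print(to_symbols("\u0985\u09B0\u09CD\u0995"))                    # অর্ক -> VPC
print(to_symbols("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড -> SCMC
```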

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows:

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.

2. H must be preceded by C.

Example: ক্

3. M must be preceded by C. Example: কা

4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ

5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ

6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
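Rules 2 to 9 only constrain the symbol that immediately precedes another, so they can be evaluated in a single left-to-right pass once rule 1 has been applied by reading CN forms as C. The following self-contained sketch uses the same assumed code-point classes as a stand-in for Section 5.1. Treating P like H in rules 8 and 9 (because P ends in Hasanta) is an interpretation, not an explicit rule in the text, and the function names are hypothetical.

```python
HASANTA, NUKTA = "\u09CD", "\u09BC"
RA_FORMS = {"\u09B0", "\u09F0"}
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা
CN_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য + NUKTA (rule 1)
CONSONANTS = ({chr(cp) for cp in range(0x0995, 0x09BA)}
              - {"\u09A9", "\u09B1", "\u09B3", "\u09B4", "\u09B5"}) | RA_FORMS | {"\u09F1"}
VOWELS = {chr(cp) for cp in range(0x0985, 0x098D)} | {"\u098F", "\u0990", "\u0993", "\u0994"}
MATRAS = {chr(cp) for cp in range(0x09BE, 0x09C5)} | {"\u09C7", "\u09C8", "\u09CB", "\u09CC"}
SIGNS = {"\u0981": "D", "\u0982": "B", "\u0983": "X", HASANTA: "H", "\u09CE": "Z"}

# Symbols allowed directly before a symbol (None, i.e. label start, is never allowed).
ALLOWED_BEFORE = {
    "H": {"C"},                                     # rule 2
    "M": {"C"},                                     # rule 3
    "D": {"V", "C", "M"},                           # rule 4
    "X": {"V", "C", "M", "D"},                      # rule 5
    "B": {"V", "C", "M", "D"},                      # rule 6
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},  # rule 7 and its note on S
}
# Symbols forbidden directly before a symbol.
FORBIDDEN_BEFORE = {"V": {"H", "P"}, "S": {"H", "P"}}  # rules 8 and 9


def to_symbols(label: str) -> str:
    out, i = [], 0
    while i < len(label):
        ch, nxt = label[i], label[i + 1:i + 2]
        if label[i:i + 4] in S_SEQUENCES:
            sym, width = "S", 4
        elif ch in RA_FORMS and nxt == HASANTA:
            sym, width = "P", 2
        elif ch in CN_BASES and nxt == NUKTA:
            sym, width = "C", 2
        elif ch in CONSONANTS or ch in VOWELS or ch in MATRAS or ch in SIGNS:
            sym = "C" if ch in CONSONANTS else "V" if ch in VOWELS else "M" if ch in MATRAS else SIGNS[ch]
            width = 1
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside the assumed repertoire")
        out.append(sym)
        i += width
    return "".join(out)


def passes_wle(label: str) -> bool:
    """Check rules 2-9 on the symbol string of a label."""
    prev = None  # None stands for the start of the label
    for sym in to_symbols(label):
        if sym in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[sym]:
            return False
        if sym in FORBIDDEN_BEFORE and prev in FORBIDDEN_BEFORE[sym]:
            return False
        prev = sym
    return True


print(passes_wle("\u0995\u09CD\u0995\u09BE\u0981\u0983"))  # ক্কাঁঃ (CHCMDX): True
print(passes_wle("\u0995\u0982\u0983"))                    # কংঃ: False, X after B
```

Because the checks run on symbols rather than raw code points, the Hasanta inside অ্যা is never tested against rule 2, which is the exemption Table 16 grants to S1 and S2.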

Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures, because any user can easily create them even if they are not attested.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋk ʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09AF U+09BC U+09BE), meaning "Bank of India". Here two different words are joined together, the first of which ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. However, by and large, the form of the first word without H (U+09CD) is considered sufficient to represent the sound intended for that word.

This is a unique situation, necessitated by the absence of the hyphen, space and Zero Width Non-Joiner characters from the set of characters permissible in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
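Detecting the prohibited pattern itself needs only a look at adjacent code points. A minimal sketch, assuming the independent vowel letters U+0985 to U+0994 form V and storing য় in its NFC form U+09AF U+09BC; `v_after_h` is a hypothetical name:

```python
HASANTA = "\u09CD"
# Assumed V set: independent vowel letters of the Bengali block.
VOWELS = {chr(cp) for cp in range(0x0985, 0x098D)} | {"\u098F", "\u0990", "\u0993", "\u0994"}


def v_after_h(label: str) -> list[int]:
    """Return the offsets of vowel letters that directly follow a Hasanta."""
    return [i for i in range(1, len(label))
            if label[i] in VOWELS and label[i - 1] == HASANTA]


with_h = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995"          # ব্যাঙ্ক
          "\u0985\u09AB\u09CD"                                   # অফ্
          "\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE")    # ইন্ডিয়া
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")  # অফ + ইন্ডিয়া

print(v_after_h(with_h))     # [10]: ই follows the Hasanta of অফ্
print(v_after_h(without_h))  # []
```

The label without H passes, which matches the NBGP's position that the first word is adequately represented without U+09CD.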

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such combinations automatically result in a "golu", or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Example

অ্ अ्

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a consonant, vowel or kāra (mātrā) is restricted to one; thus the following combinations are invalid:

Example

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कीःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example

কংঃ कंः

কঃং कःं
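Rules 3, 4 and 6 above reduce to a short list of forbidden adjacent pairs, which can be written as regular expressions and used as a quick pre-filter. This sketch assumes that the only permitted sign pairs are DB and DX (as in the CHCDB and CHCDX examples of Section 5) and that the kāra set is U+09BE to U+09C4, U+09C7, U+09C8, U+09CB and U+09CC; the names are hypothetical.

```python
import re

# B or X followed by any of D, B, X; or D repeated. DB and DX remain allowed.
SIGN_PAIR = re.compile("[\u0982\u0983][\u0981\u0982\u0983]|\u0981\u0981")
# Two kāras in a row (assumed kāra set, see lead-in).
KARA_PAIR = re.compile("[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]{2}")


def breaks_sign_limits(label: str) -> bool:
    """True if the label contains a forbidden B/D/X pair or a repeated kāra."""
    return bool(SIGN_PAIR.search(label) or KARA_PAIR.search(label))


invalid = ["\u0995\u0982\u0982", "\u0995\u0981\u0981", "\u0995\u0983\u0983",  # কংং কঁঁ কঃঃ
           "\u0995\u09C0\u09C0", "\u0995\u0982\u0983", "\u0995\u0983\u0982"]  # কীী কংঃ কঃং
valid = ["\u0995\u09CD\u0995\u09BE\u0981\u0982",                              # ক্কাঁং
         "\u0995\u09CD\u0995\u09BE\u0981\u0983"]                              # ক্কাঁঃ
print(all(breaks_sign_limits(s) for s in invalid))  # True
print(any(breaks_sign_limits(s) for s in valid))    # False
```

A pre-filter like this does not replace the full WLE evaluation of Section 7.1; it only catches the repeated-sign and repeated-kāra cases listed above.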

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii + 316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number, 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has Happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, e.g. ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only where C is ra (র, 09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:

→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
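Rules 1 and 3 of this appendix can be checked directly on code points. In this hypothetical sketch, only ৎ, ঞ and ঙ are barred from the first position (the text leaves the status of initial য় open, so it is not enforced), and a vowel letter may be followed by Hasanta only as part of অ্যা or এ্যা. The V set is an assumed stand-in for the repertoire of Section 5.1.

```python
HASANTA = "\u09CD"
NOT_INITIAL = {"\u09CE", "\u099E", "\u0999"}   # ৎ, ঞ, ঙ (rule 1)
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE",     # অ্যা (rule 3)
                "\u098F\u09CD\u09AF\u09BE"}     # এ্যা
VOWELS = {chr(cp) for cp in range(0x0985, 0x098D)} | {"\u098F", "\u0990", "\u0993", "\u0994"}


def appendix_rules_ok(label: str) -> bool:
    """Apply the label-initial restriction and the VHCM restriction."""
    if not label or label[0] in NOT_INITIAL:
        return False
    for i in range(len(label) - 1):
        if label[i] in VOWELS and label[i + 1] == HASANTA:
            if label[i:i + 4] not in ALLOWED_VHCM:
                return False
    return True


print(appendix_rules_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: True
print(appendix_rules_ok("\u09CE\u09B8"))                                # ৎস...: False
```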

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924 code Kthi), which was used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. Sylheti Nagarī or Siloti [129] had several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed and found to be either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali Consonants and their Allographs

Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব

Asamiyā-Manipuri in Unicode has 93 entries. This section details the code point repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to include in the Bangla LGR. It may be mentioned here that the Government of Assam submitted a proposal to the Bureau of Indian Standards (BIS) on 26 February 2016 for the dis-unification of the Bangla and Asamiyā scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and Products Sectional Committee (LITD 20), held on 23 August 2017, decided to refer the proposal for recognition of the Assamese script in ISO/IEC 10646 to ISO. Until the Unicode Consortium takes further action, it will be assumed that the code point repertoire under Table 11 is valid for all three languages mentioned above.

For each code point, language references are given in the last column, titled "Reference", of Table 8, the "Code Point Repertoire". For full coverage of the Bangla code points, references for Bangla, Asamiyā (Assamese), Manipuri (Meitei) and Bishnupriya are given. Kokborok written in the Bangla script is not known to have introduced many new complications, except for one particular character. Though only a few representative languages at EGIDS levels 1 to 4 have been chosen for referencing, together they cover all the code points required for all the languages the NBGP has considered, as given under Bangla Unicode Points (in Unicode 6.3). Before the details are presented, however, it is useful to look at the Bangla code point chart from the Maximal Starting Repertoire [MSR] Version 3. Note that the shapes of the reference glyphs in the code charts below are based on one of the many fonts designed and are not prescriptive, because actual fonts, both Unicode-compatible and TrueType ones, can vary. Consider the following code point table:

Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention:
All characters that are included in the [MSR]: yellow background
PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
Not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen): white background

Given the Bangla Unicode block in Figure 1, of the code points included in the MSR, the following symbols will need separate treatment:

ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire Inclusion

No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment

1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; 09DC is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development.

29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; 09DD is the preferred code point, but it is not available for the LGR as per the standards governing this LGR development.

31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]

32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]


33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] U+09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122][125]

Table 8: Bangla Code-Point Repertoire
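Table 8 mixes single code points with three two-code-point entries (rows 28, 30 and 43). To illustrate how such a repertoire might be applied, the sketch below checks a string against rows 15-64 only. Rows 1-14 fall outside this section, so independent vowels and the anusvāra, candrabindu and visarga signs are deliberately missing. The set names and the greedy two-character lookahead are choices made for this sketch and are not taken from the proposal's XML file.

```python
# Partial repertoire check: covers only Table 8 rows 15-64 (rows 1-14 of the
# table are outside this excerpt, so independent vowels and signs are absent).

SINGLE_CODE_POINTS = (
    set(range(0x0995, 0x09A9))            # KA .. NA (U+09A9 is unassigned)
    | set(range(0x09AA, 0x09B1))          # PA .. RA
    | {0x09B2}                            # LA
    | set(range(0x09B6, 0x09BA))          # SHA .. HA
    | set(range(0x09BE, 0x09C5))          # vowel signs AA .. VOCALIC RR
    | {0x09C7, 0x09C8, 0x09CB, 0x09CC}    # vowel signs E, AI, O, AU
    | {0x09CD, 0x09CE, 0x09F0, 0x09F1}    # VIRAMA, KHANDA TA, RA with middle/lower diagonal
)

# Rows 28, 30 and 43: nukta letters are admitted only as two-code-point sequences.
NUKTA_SEQUENCES = {"\u09A1\u09BC", "\u09A2\u09BC", "\u09AF\u09BC"}


def in_partial_repertoire(label: str) -> bool:
    """True if every code point of `label` is covered by rows 15-64 of Table 8."""
    i = 0
    while i < len(label):
        if label[i:i + 2] in NUKTA_SEQUENCES:
            i += 2                        # consume the whole sequence
        elif ord(label[i]) in SINGLE_CODE_POINTS:
            i += 1
        else:
            return False                  # e.g. U+09DC, or a bare NUKTA U+09BC
    return True
```

For example, কা and ড়া pass, while the precomposed U+09DC and a lone NUKTA U+09BC are rejected.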

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes specific sequences that allow the conditional inclusion of BENGALI LETTER A and BENGALI LETTER E followed by BENGALI SIGN VIRAMA, BENGALI LETTER YA and BENGALI VOWEL SIGN AA. These sequences represent the æ sound, as in English 'bat', 'cat', etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference


S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]

S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA + BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
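The two sequences in Table 9 can be spelled out code point by code point with Python's standard `unicodedata` module. The sketch below uses a hypothetical helper named `describe`. It also checks that NFC normalization leaves each sequence unchanged. We assume this matters because labels are compared in normalized form.

```python
import unicodedata

# Table 9 sequences, written out code point by code point.
S1 = "\u0985\u09CD\u09AF\u09BE"   # অ্যা
S2 = "\u098F\u09CD\u09AF\u09BE"   # এ্যা


def describe(seq: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point of `seq`."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


for label, seq in (("S1", S1), ("S2", S2)):
    # NFC leaves both sequences unchanged, so they can be stored as typed.
    assert unicodedata.normalize("NFC", seq) == seq
    print(label, " + ".join(describe(seq)))
```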

5.2 Code Point Repertoire Exclusion

Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Points | Glyph | Character Names | Note

1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It occurs only inside sequences, in the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion

U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively.

Table 10b: Excluded Code Points
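One way to see why ড়, ঢ় and য় appear in Table 8 as base-plus-NUKTA sequences is Unicode normalization. This section does not name the governing standards, so the following is our assumption: labels are handled in NFC. U+09DC, U+09DD and U+09DF are composition exclusions, which means NFC always rewrites them as the base letter followed by U+09BC. A minimal check with Python's `unicodedata`:

```python
import unicodedata

# Precomposed nukta letters named in Table 8 (rows 28, 30 and 43).
PRECOMPOSED = {"\u09DC": "RRA", "\u09DD": "RHA", "\u09DF": "YYA"}

for ch, short_name in PRECOMPOSED.items():
    nfc = unicodedata.normalize("NFC", ch)
    # These are composition exclusions: NFC decomposes them and never recomposes.
    print(f"U+{ord(ch):04X} ({short_name}) -> "
          + " ".join(f"U+{ord(c):04X}" for c in nfc))
```

Running it prints U+09DC (RRA) -> U+09A1 U+09BC, U+09DD (RHA) -> U+09A2 U+09BC and U+09DF (YYA) -> U+09AF U+09BC.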

5.4 The Basis of the Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (in different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr Number | Operator | Function

1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given.

5.4.2 The Vowel Sequence

To help users of other Brāhmī-derived scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). More than one of D, B and X may follow a V in Bangla, but the same sign may not be repeated: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have a chandrabindu followed by an anusvāra on the one hand, and a chandrabindu followed by a visarga on the other, as in হ্যাঁংচা 'hænchā' and হ্যাঁঃ 'hæn' respectively. In other words, a Visarga or Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla.

A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. Ya-phala is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta or Virama), U+09AF য BENGALI LETTER YA>. Ya-phala has a special form (্য). When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used to transcribe [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
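The combinations in 5.4.2.1 to 5.4.2.3 amount to a small regular grammar: one vowel (or অ্যা/এ্যা), optionally followed by D, DB, DX, B or X. Below is a minimal sketch using Python's `re` module. The vowel set is only an illustrative subset, which is an assumption. Restricting VHCM to অ and এ follows restriction 3 in Appendix 10.1.

```python
import re

# Illustrative subset (assumption): a few independent vowels, not the full V set.
V = "\u0985\u0986\u0987\u098F"                     # অ আ ই এ
VHCM = "[\u0985\u098F]\u09CD\u09AF\u09BE"           # অ্যা or এ্যা only
# Optional trailing signs: D, DB, DX, B or X (D = ঁ, B = ং, X = ঃ).
SIGNS = "(?:\u0981[\u0982\u0983]?|[\u0982\u0983])?"

VOWEL_SEQUENCE = re.compile(f"(?:{VHCM}|[{V}]){SIGNS}")


def is_vowel_sequence(s: str) -> bool:
    """True if `s` is exactly one vowel sequence as described in 5.4.2."""
    return VOWEL_SEQUENCE.fullmatch(s) is not None
```

With this grammar, অঁং (VDB) and অ্যাঁঃ (VHCMDX) match. অংং (VBB), অংঁ (B before D) and আ্যা do not.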

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant may optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:

CM কি কু कि कु
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Example:

CMB কীং কুং कक
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example:

CHC ন্ত → ন + ্ + ত (न् + त)
CHCHC ন্ত্র → ন + ্ + ত + ্ + র (न् + त् + र)
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য (न् + त् + र् + य)

5.4.3.5 Subsets

When considering subsets, we take the combination CHC as a representative example; the same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:

CHCM ক্কী → ক + ্ + ক + ী (क्की → क + ् + क + ी)
CHCB ক্কং → ক + ্ + ক + ং (क्कं → क + ् + क + ं)
CHCD ক্কঁ → ক + ্ + ক + ঁ (क्कँ → क + ् + क + ँ)
CHCX ক্কঃ → ক + ্ + ক + ঃ (क्कः → क + ् + क + ः)
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং (क्कँं → क + ् + क + ँ + ं)
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ (क्कँः → क + ् + क + ँ + ः)

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

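Example:

CHCMB ক্কীং → ক + ্ + ক + ী + ং (क्कीं → क + ् + क + ी + ं)
ক্কুং → ক + ্ + ক + ু + ং (क्कुं → क + ् + क + ु + ं)
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ (क्काँ → क + ् + क + ा + ँ)
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ (क्कीः → क + ् + क + ी + ः)
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং (क्काँं → क + ् + क + ा + ँ + ं)
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ (क्काँः → क + ् + क + ा + ँ + ः)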
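Sections 5.4.3.1 to 5.4.3.5 describe up to three consonant-plus-Hasanta pairs (*3(CH)C). These are followed either by a bare Hasanta or by an optional kāra and an optional D, DB, DX, B or X. The sketch below encodes that reading with Python's `re` module. For brevity it omits Khanda Ta and the nukta sequences. Its character classes come from Table 8, not from the proposal's XML file.

```python
import re

C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1]"   # consonants of Table 8
H = "\u09CD"                                                         # Hasanta / Virama
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                        # kāra (Mātrā)
SIGNS = "(?:\u0981[\u0982\u0983]?|[\u0982\u0983])"                  # D, DB, DX, B, X

# *3(CH)C, then either a bare H (pure consonant) or optional M and optional signs.
CONSONANT_SEQUENCE = re.compile(f"(?:{C}{H}){{0,3}}{C}(?:{H}|{M}?{SIGNS}?)")


def is_consonant_sequence(s: str) -> bool:
    """True if `s` is exactly one consonant sequence under this reading of 5.4.3."""
    return CONSONANT_SEQUENCE.fullmatch(s) is not None
```

With this pattern, ক্কী, ক্, কাঁং and ন্ত্র্য are accepted. Five joined consonants, কীী, কংং and ক্ং are rejected.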

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single 'Khanda'-Ta (Z)

Example: Z ৎ = ত্

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.

Note: In this context, the C before KHANDA TA must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
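The [CH]Z condition is local: whenever a Hasanta sits directly before Khanda Ta, the consonant before that Hasanta must be র or ৰ. Below is a sketch of that single check. The function name is hypothetical.

```python
RA = {"\u09B0", "\u09F0"}   # Bangla RA and Assamese RA
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"


def khanda_ta_combination_ok(label: str) -> bool:
    """Check the [CH]Z condition: a Hasanta before Khanda Ta must follow RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            # The C in [CH]Z must exist and must be র or ৰ.
            if i < 2 or label[i - 2] not in RA:
                return False
    return True
```

For instance, ভর্ৎসনা (U+09AD U+09B0 U+09CD U+09CE U+09B8 U+09A8 U+09BE) passes, whereas ক্ৎ fails.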

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to produce the æ sound, as in English 'bat', 'fat', etc. By nature, the Brāhmī script does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant. Only consonants can carry the Hasanta. As we see here, however, Bangla has a deviant orthographic feature in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case concerns the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). In both cases the slot for ra may be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā nevertheless remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.

6 Variants

This section discusses variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.

For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed to be treated as variants. These are not cases of mere visual similarity, since they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are therefore proposed as variants.

Variants arise in a script when two or more forms look the same but are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one keystroke, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot distinguish the two ko (কো), one typed with a single key and the other with two keys. However, this is not treated as a case of variants, because a kāra followed by another kāra is not allowed.

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I

As far as true variants in Bangla are concerned, we may draw attention to cases in which থ (tha, U+09A5) follows a Hasanta in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

When written in traditional orthography, the above combinations can be somewhat confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). It may also be mistyped as হ (ha, U+09B9), which increases the risk of errors in writing and identifying labels.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

Fonts that represent the traditional Bangla writing system tend to create this problem. These may therefore be taken as cases of variants in Bangla.

CASE II

Another interesting example of variants occurs with the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise when typing 'repha' (ra + Hasanta) and 'ra-phalā' (Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same, as follows:

(1) B_RA + H + C    (2) A_RA + H + C

where:

B_RA = U+09B0 BENGALI LETTER RA (র), or
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: ক and ঠ are the same characters in Bangla and Assamese, so the resulting combinations with Repha look identical; the Repha itself adds no visible difference. 'Ra-phala' can be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA    (2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta.

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phala

Because the Assamese and Bangla Repha and Ra-phala conjunct forms look the same, they could confuse end users. The repha and ra-phala cases therefore need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels. For example, if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. The block applies only at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", that is still possible.
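With র and ৰ in a single variant set, the variant labels of a registered label are all the strings obtained by swapping র/ৰ at any position. The sketch below enumerates them with `itertools.product`. It only lists the labels. It does not model the dispositions (for example, blocked versus allocatable) that a real LGR assigns, and it covers CASE II only.

```python
from itertools import product

# Assumed single variant set (CASE II): র U+09B0 <-> ৰ U+09F0.
VARIANT_SET = ("\u09B0", "\u09F0")


def variant_labels(label: str) -> set[str]:
    """Every label obtained by swapping র/ৰ at any position (original included)."""
    options = [VARIANT_SET if ch in VARIANT_SET else (ch,) for ch in label]
    return {"".join(choice) for choice in product(*options)}


def blocked_variants(label: str) -> set[str]:
    """Variant labels other than the registered one (blocked, per Section 6.1)."""
    return variant_labels(label) - {label}
```

For ররর this yields 2³ = 8 labels, with ৰৰৰ among them. ৰৰ and ৰৰৰৰ are not variants of it, so they remain available, as in the example above.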

6.2 Cross-Script Variants

A focused cross-script study of Bangla has been carried out against sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), with attention to the visual and technical confusion they may cause as domain labels. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here; they are listed in the Appendix (10.3) for reference.

¹¹ Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the code point repertoire table (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1 or S2 (from Table 9), i.e. the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though the other context rules do not allow them.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth noting that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters"; in theory, two to four consonants can conjoin and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters". Of these, 290 are bi-consonantal combinations, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the following specific WLE rules need to be implemented:

1. The set C comprises the consonants C together with CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by V, C or M. Example: আঁ খঁ খাঁ হাঁ
5. X must be preceded by V, C, M or D. Example: উঃখঃবঃাঃ দ ঃ
6. B must be preceded by V, C, M or D. Example: আং ইং কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M; Z may therefore also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
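Rules 1-9 are all checks on the immediately preceding symbol, so they can be sketched as a single left-to-right scan. The sketch below makes several assumptions. The vowel set V is assumed, because rows 1-14 of Table 8 are not reproduced here. S1/S2 and the CN forms are treated as single tokens. S is treated as ending in M, following the note to rule 7. This is an illustration only, not the normative rules in the proposal's XML file.

```python
# Table 16 symbols. The V set below is an assumption (rows 1-14 of Table 8
# are not reproduced in this section).
SIMPLE = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2
RA = {"\u09B0", "\u09F0"}


def category(ch: str) -> str:
    cp = ord(ch)
    if 0x0985 <= cp <= 0x098B or cp in (0x098F, 0x0990, 0x0993, 0x0994):
        return "V"
    if (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
            or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)):
        return "C"
    if 0x09BE <= cp <= 0x09C4 or cp in (0x09C7, 0x09C8, 0x09CB, 0x09CC):
        return "M"
    if cp in SIMPLE:
        return SIMPLE[cp]
    raise ValueError(f"U+{cp:04X} is outside this sketch's repertoire")


def tokenize(label: str) -> list[tuple[str, str]]:
    """Split a label into (symbol, text) tokens; S and CN are single tokens."""
    tokens, i = [], 0
    while i < len(label):
        if label.startswith(S_SEQUENCES, i):
            tokens.append(("S", label[i:i + 4]))
            i += 4
        elif label[i] in "\u09A1\u09A2\u09AF" and label[i + 1:i + 2] == "\u09BC":
            tokens.append(("C", label[i:i + 2]))   # rule 1: CN counts as C
            i += 2
        else:
            tokens.append((category(label[i]), label[i]))
            i += 1
    return tokens


# Rules 2-7: which symbols may immediately precede each symbol.
ALLOWED_BEFORE = {
    "H": {"C"}, "M": {"C"}, "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"}, "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},           # P is handled below
}


def passes_wle(label: str) -> bool:
    tokens = tokenize(label)
    for k, (sym, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k else None
        if prev == "S":
            prev = "M"                             # S ends with a kāra
        if sym in ("V", "S") and prev == "H":      # rules 8 and 9
            return False
        if sym in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[sym]:
            # Rule 7 also allows P (র্ or ৰ্) directly before Z.
            is_p = sym == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA
            if not is_p:
                return False
    return True
```

Under these assumptions, কা, কঁং, অ্যাঁ, ড়া and ভর্ৎসনা pass. াক, ক্অ, ক্ৎ, কংঁ, কংঃ and ৎস fail.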

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage. There is no harm in admitting such ligatures, however, because any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H

In multi-word domains, V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning 'Bank of India'). Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to fully represent the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered sufficient to represent the intended sound of that word.

This situation arises only because the hyphen, the space and the Zero Width Non-Joiner are absent from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. For the majority of the linguistic communities, permitting this may create a perceived similarity between two labels (with and without H); the NBGP has therefore explicitly prohibited it. If the community's requirements change in future, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help illustrate some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक

As can be seen, such combinations automatically produce a "golu", or dotted circle, which marks them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्
অং্ कं् কঁ্ कँ् কঃ্ कः् কি্ कि्

3. Each of B, D and X may occur only once after a consonant, vowel or kāra (Mātrā); the following combinations are therefore invalid:

Example:

কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. Only one M is permitted after a consonant. Example:

কীী की

5. M is not permitted after V. Example:

ইা ঈৌ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon-Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan o Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti o Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii + 316 pp.

[117] ----- (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). 2017. Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1123-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 121-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'Phones of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: The Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What Has Happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and certain restriction rules apply when it is used for a specific language or script. In other words, some of the formalism's structures do not necessarily apply to a given language. Restriction rules are put in place to handle such cases and help fine-tune the ABNF. For Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Bangla generally does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' by Kazi Nazrul Islam. It is also used word-initially in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. More recently, when transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can be combined with Khanda Ta only when C is ra (র) (09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
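Restrictions 1 and 3 can be sketched as two simple checks. The first rejects a forbidden first character. The second allows a Virama after an independent vowel only inside অ্যা or এ্যা. The vowel set and the function name are assumptions for illustration. Restriction 2 is the same Khanda Ta condition as in 5.4.4.2 and is omitted here.

```python
NOT_LABEL_INITIAL = {"\u09CE", "\u099E", "\u0999"}   # ৎ, ঞ, ঙ (restriction 1)
VIRAMA = "\u09CD"
VHCM_VOWELS = {"\u0985", "\u098F"}                    # অ, এ (restriction 3)
# Assumed set of independent vowels (not reproduced in this section).
VOWELS = {chr(cp) for cp in (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)}


def appendix_restrictions_ok(label: str) -> bool:
    if not label or label[0] in NOT_LABEL_INITIAL:
        return False
    for i in range(1, len(label)):
        # A Virama after an independent vowel is allowed only as part of অ্যা / এ্যা.
        if label[i] == VIRAMA and label[i - 1] in VOWELS:
            if label[i - 1] not in VHCM_VOWELS or label[i + 1:i + 3] != "\u09AF\u09BE":
                return False
    return True
```

Under restriction 1, a transliteration such as ৎসেরিং is rejected, even though the text above notes it as an emerging usage. অ্যাসিড passes.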

¹³ This section deals specifically with restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered here.

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924), which was used by accountants (perhaps from the Kāyastha community) in Eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. Sylheti Nagarī or Siloti [129] has had several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Others ascribe it to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed and found to be either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points


11 Appendix-II: Bengali Consonants and Their Allographs

Consonants | Phonetic Value | Allographs (Clusters) | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র). There is a marked form of ত + hasanta = ৎ; ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts, aelig is replaced by ং) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ8ামল, র 8াপার, etc.). After a non-initial consonant, however, it just doubles that consonant in pronunciation (as in কায6, ধায6, etc.). The র + য combination has two physical manifestations: র 8 and য6.

ক8(p+য) স8(+য) র 8(+য) [র 8 on its own is never used in Bangla orthography; র 8া is, but then its last two symbols, ya-phala and ā-kāra, constitute a vowel sign representing the vowel অ8া.]


র r Two manifestations:

i. রেফ (repʰ) as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6 (+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant)

ii. র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster, e.g. etc. (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Manipuri

Colour convention: characters included in the [MSR] have a yellow background; characters that are PVALID in IDNA2008 but excluded from the [MSR] have a pinkish background; characters that are not PVALID in IDNA2008 or are ineligible for the root zone (digits, hyphen) have a white background.

Given the Bangla Unicode block as shown in Figure 1, among the code points that are included in the MSR the following symbols need separate treatment:

ৎ U+09CE Bangla Letter Khanda-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal


5.1 Code Point Repertoire Inclusion

No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; U+09DC is the preferred code point; however, it is not available for the LGR under the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; U+09DD is the preferred code point; however, it is not available for the LGR under the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]; U+09DF is the preferred code point; however, it is not available for the LGR under the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122] [125]

Table 8 – Bangla Code-Point Repertoire
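The comment in rows 28, 30 and 43 can be checked mechanically. U+09DC, U+09DD and U+09DF are Unicode composition exclusions, so Normalization Form C (NFC), which IDNA2008 requires of U-labels, always replaces each of them with the base consonant followed by U+09BC BENGALI SIGN NUKTA; this is why the two-code-point sequences appear in the repertoire. The short Python sketch below uses only the standard `unicodedata` module; `PRECOMPOSED` and `nfc_code_points` are illustrative names introduced here, not part of the LGR.

```python
import unicodedata

# Precomposed nukta letters that Table 8 replaces by base + U+09BC (rows 28, 30, 43).
PRECOMPOSED = {
    "\u09DC": "BENGALI LETTER RRA",
    "\u09DD": "BENGALI LETTER RHA",
    "\u09DF": "BENGALI LETTER YYA",
}


def nfc_code_points(text: str) -> list[str]:
    """Return the code points of the NFC form of text, formatted as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


for ch, name in PRECOMPOSED.items():
    # Each letter is a composition exclusion, so NFC yields the decomposed pair.
    print(f"{name}: U+{ord(ch):04X} -> {' '.join(nfc_code_points(ch))}")
```

The same check shows that the decomposed pair itself (for example U+09A1 U+09BC) is left unchanged by NFC, so it is the stable form to list in an LGR.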

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes specific sequences that enable the conditional inclusion in the repertoire of Bangla LETTER A and LETTER E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, and then by Bangla VOWEL SIGN AA, so that the æ sound (as in English ‘bat’, ‘cat’, etc.) can be written.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference
S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9 – Sequences

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10 – Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only as part of a sequence in the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively.

Table 10b – Excluded Code Points

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (in different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Form (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The ABNF uses the following operators:

Sr Number | Operator | Function
1 | “|” | Alternative
2 | “[ ]” | Optional
3 | “*” | Variable Repetition
4 | “( )” | Sequence Group

Table 11 – The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To facilitate understanding for users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). More than one of these signs may follow a V in Bangla, but the same sign may not be repeated: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have a Candrabindu followed by an Anusvāra on the one hand, and a Candrabindu followed by a Visarga on the other, as in হ্যাঁংচা ‘hænchā’ and হ্যাঁঃ ‘hæn’ respectively; that is, a Visarga or Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla. A vowel can optionally be followed by the combination Hasanta/Virama [H] + Consonant [C] to form a Ya-phala. Ya-phala is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF BENGALI LETTER YA>, and it is rendered in a special subjoined form. When further combined with U+09BE BENGALI VOWEL SIGN AA (া, ‘aa’ (ā)), it is used for transcribing [æ], as in the ‘a’ of the English word ‘bat’, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX], or by a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].


Examples

VB অং अं

VD অঁ अँ

VX অঃ अः

VDB অঁং अँं

VDX অঁঃ अँः

VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples

VHCMB অ্যাং এ্যাং

VHCMD অ্যাঁ এ্যাঁ

VHCMX অ্যাঃ এ্যাঃ

VHCMDB অ্যাঁং এ্যাঁং

VHCMDX অ্যাঁঃ এ্যাঁঃ

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example

CM িকক -कक

CB কং कं


CD কঁ कँ

CX কঃ कः

CH ক् क् (pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Example CMB কীংকং कक

CMD কাঁ काँ

CMX বীঃ वीः

CMDB কাঁং काँं

CMDX কাঁঃ काँः

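5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त

CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र

CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य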

5.4.3.5 Subsets
In considering the subsets, we take the combination CHC as a representative example; the same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी

CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं

CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ

CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः

CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं

CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं

sup3ং rarr ক ক ং 4क rarr क क

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ

CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः

CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं

CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda-Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’. Note: in this context, the condition on KHANDA TA is that C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
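The vowel, consonant and Khanda-Ta sequences of Sections 5.4.2 to 5.4.4 translate fairly directly into a regular expression. The Python sketch below is illustrative only and rests on stated assumptions: the symbol classes are taken from Table 8, `*3(CH)C` is read as zero to three CH pairs before the final C, and a label is treated as any concatenation of sequences. Because sequences are simply concatenated, the four-consonant limit only shapes how a single sequence is parsed, and the Section 7 rules (for example, no V after H) are not enforced here. `matches_abnf` and the constant names are hypothetical.

```python
import re

# Symbol classes, assumed from the repertoire in Table 8 (sketch only).
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"                  # independent vowels
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"                            # CN: ড়, ঢ়, য় as base + nukta
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                   # kāra (mātrā)
D, B, X, H, Z = "\u0981", "\u0982", "\u0983", "\u09CD", "\u09CE"
S = "[\u0985\u098F]\u09CD\u09AF\u09BE"                          # S1/S2: অ্যা, এ্যা
TAIL = f"{D}?[{B}{X}]?"                                          # nothing, D, B, X, DB or DX

VOWEL_SEQ = f"(?:{S}|{V}){TAIL}"                                 # 5.4.2
CONSONANT_SEQ = f"(?:{C}{H}){{0,3}}{C}(?:{H}|{M}?{TAIL})"        # 5.4.3: *3(CH)C then H, or [M] + tail
KHANDA_TA_SEQ = f"(?:[\u09B0\u09F0]{H})?{Z}"                     # 5.4.4: [CH]Z with C = র or ৰ
LABEL = re.compile(f"(?:{VOWEL_SEQ}|{CONSONANT_SEQ}|{KHANDA_TA_SEQ})+")


def matches_abnf(label: str) -> bool:
    """True if the label is a concatenation of the 5.4 vowel, consonant and Khanda-Ta sequences."""
    return LABEL.fullmatch(label) is not None
```

With these definitions, ভর্ৎসনা, অ্যাসিড and ক্কাঁঃ are accepted, whereas াক, কংং and কংঃ are rejected.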

5.4.5 Special Cases: S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render a Ya-phalā after অ or এ, it is necessary to type U+09CD plus U+09AF ya after the said vowel. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case refers to the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra ‘cycle’). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:

Group 1: Confusing due to pure visual similarity.

Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed to be treated as variants. These are not cases of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshara formation; they can confuse even a careful observer and are therefore proposed as variants.

Variants arise in a script when the same visible form can be produced from different stored code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one keystroke, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other with two different keys. However, this is not considered a case of variant, because a kāra followed by a kāra is not allowed.
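The single-key and two-key forms of কো are also canonically equivalent in Unicode: U+09CB BENGALI VOWEL SIGN O has the canonical decomposition U+09C7 U+09BE. Since IDNA2008 U-labels must be in NFC, the two-key form is composed to U+09CB before any LGR is applied. A minimal Python check with the standard `unicodedata` module follows; the variable names and the `nfc` helper are illustrative only.

```python
import unicodedata

KA = "\u0995"                                   # ক
E_KARA, AA_KARA, O_KARA = "\u09C7", "\u09BE", "\u09CB"

single_key = KA + O_KARA                        # কো typed with U+09CB
two_keys = KA + E_KARA + AA_KARA                # কো typed as e-kāra + ā-kāra


def nfc(text: str) -> str:
    """Normalize text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


print(single_key == two_keys)                   # False: different code point sequences
print(nfc(single_key) == nfc(two_keys))         # True: NFC composes U+09C7 U+09BE into U+09CB
```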

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


Written in traditional orthography, the above combinations can be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown in example 1 below). It can also be mistyped as হ (ha, U+09B9) by someone who takes it for a ha, increasing the risk of errors in label writing and identification.

Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)

Fonts that represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of a variant arises with ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing ‘repha’ (ra + Hasanta) and ‘ra-phalā’ (Hasanta + ra). ‘Repha’ can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and ‘B_RA’ and ‘A_RA’ refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12 – Example of Repha


Note: ক and ঠ are the same in Bangla and Assamese, and the repha looks the same whichever RA it is formed from, so the resultant combinations look identical. On similar grounds, ‘ra-phala’ can be formed by two sequences, and the final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13 – Example of Ra-phala

As the Assamese and Bangla repha and ra-phala conjunct forms look the same, they could confuse end users. Hence the repha and ra-phala cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This, however, creates blocked variant labels: e.g. if someone registers “ররর”, the variant label “ৰৰৰ” is generated and blocked, and vice versa. The blocking applies only at the label level; if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, that is still possible.
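Declaring র and ৰ as a variant pair means that every position holding either letter can be swapped independently, so a label with n such positions has 2^n - 1 variant labels. The sketch below enumerates them with `itertools.product`. It is a simplification that ignores RFC 7940 variant types and dispositions (such as ‘blocked’), and `VARIANTS` and `variant_labels` are hypothetical names. For ররর it yields seven variant labels, including ৰৰৰ and mixtures such as ররৰ, while labels of a different length such as ৰৰ are never produced.

```python
from itertools import product

# Assumed variant mapping for this sketch: Bangla RA and Assamese RA as a variant pair.
VARIANTS = {"\u09B0": ["\u09F0"], "\u09F0": ["\u09B0"]}


def variant_labels(label: str) -> set[str]:
    """Return every label formed by replacing any subset of code points with a listed variant.

    The original label is excluded; variant types and dispositions are not modelled.
    """
    choices = [[ch] + VARIANTS.get(ch, []) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


variants = variant_labels("\u09B0\u09B0\u09B0")   # ররর
print(len(variants))                             # 7
```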

6.2 Cross-Script Variants
A careful cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as domain name labels. Apart from the cases in Section 6.1, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses ‘Oriya’ for the script, although ‘Odia’ is now the official term.


1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 – Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 – Bangla and Gurmukhī cross-script variant code points
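Tables 14 and 15 can be read as a lookup from Bangla code points to their cross-script counterparts. In the hypothetical sketch below, a whole-label cross-script variant is assumed to exist only when every code point of the label has a listed counterpart, on the assumption that a root-zone label cannot mix scripts; `CROSS_SCRIPT_VARIANTS` and `cross_script_variant` are illustrative names, not part of the LGR.

```python
from typing import Optional

# Cross-script variant pairs transcribed from Tables 14 and 15 (Bangla -> other script).
CROSS_SCRIPT_VARIANTS = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},   # ম -> म, ি -> ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},     # ম -> ਸ, ি -> ਿ
}


def cross_script_variant(label: str, script: str) -> Optional[str]:
    """Return the whole-label variant in `script`, or None if any code point lacks a listed variant."""
    mapping = CROSS_SCRIPT_VARIANTS[script]
    if all(ch in mapping for ch in label):
        return "".join(mapping[ch] for ch in label)
    return None


print(cross_script_variant("\u09AE\u09BF", "Devanagari") == "\u092E\u093F")  # True
```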

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the code point repertoire table (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16 – Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is perhaps also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters”, which can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”, of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows.


1 C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.

2 H must be preceded by C.
Example: ক্

3 M must be preceded by C.
Example: কা

4 D must be preceded by either V, C or M.
Example: আখখা হা

5 X must be preceded by either V, C, M or D.
Example: উঃখঃবঃাঃ দ ঃ

6 B must be preceded by either V, C, M or D.
Example: আং ইং কং

7 Z must be preceded by V, C, M, D, B, X or P.
Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M; Z may also follow S.)

8 V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9 S CANNOT be preceded by H.
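Rules 2 to 9 each constrain only the symbol immediately before a given symbol, so they can be checked in one left-to-right pass once the label is split into symbols. The Python below is a hypothetical sketch of that logic, not the normative XML LGR. `tokenize`, `check_wle` and `ALLOWED_BEFORE` are names introduced here; S is tokenized as one symbol that ends in M (per the note to rule 7), CN letters are tokenized as base plus NUKTA, and P is recognised only as the context immediately before Z.

```python
from typing import Optional

VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
CONSONANTS = {chr(cp) for cp in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2,
                                 *range(0x09B6, 0x09BA), 0x09F0, 0x09F1]}
MATRAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")
SIGNS = {"\u0981": "D", "\u0982": "B", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
NUKTA_BASES = set("\u09A1\u09A2\u09AF")                                 # rule 1: CN = ড়, ঢ়, য়
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2 (Table 9)
RA = set("\u09B0\u09F0")                                                # C2 of the P symbol

# Rules 2-7: symbols allowed immediately before each symbol (label start is never allowed).
# S ends with M, so it is accepted wherever M is (note to rule 7).
ALLOWED_BEFORE = {
    "H": {"C"},                                      # rule 2
    "M": {"C"},                                      # rule 3
    "D": {"V", "C", "M", "S"},                       # rule 4
    "X": {"V", "C", "M", "D", "S"},                  # rule 5
    "B": {"V", "C", "M", "D", "S"},                  # rule 6
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},   # rule 7
}


def tokenize(label: str) -> list[tuple[str, str]]:
    """Split a label into (symbol, text) pairs; raise ValueError outside the repertoire."""
    tokens, i = [], 0
    while i < len(label):
        if label.startswith(S_SEQUENCES, i):
            tokens.append(("S", label[i:i + 4]))
            i += 4
            continue
        ch = label[i]
        if ch in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            tokens.append(("C", label[i:i + 2]))
            i += 2
            continue
        if ch in VOWELS:
            symbol = "V"
        elif ch in CONSONANTS:
            symbol = "C"
        elif ch in MATRAS:
            symbol = "M"
        elif ch in SIGNS:
            symbol = SIGNS[ch]
        else:
            raise ValueError(f"U+{ord(ch):04X} is not in the repertoire")
        tokens.append((symbol, ch))
        i += 1
    return tokens


def check_wle(label: str) -> Optional[str]:
    """Return None if the label passes, else a short message naming the first violation."""
    try:
        tokens = tokenize(label)
    except ValueError as err:
        return str(err)
    prev = None
    for idx, (symbol, _) in enumerate(tokens):
        if symbol in ("V", "S") and prev == "H":
            return f"{symbol} preceded by H (rules 8-9)"
        context = prev
        # র্ or ৰ্ directly before ৎ forms the P symbol of Table 16.
        if symbol == "Z" and prev == "H" and tokens[idx - 2][1] in RA:
            context = "P"
        allowed = ALLOWED_BEFORE.get(symbol)
        if allowed is not None and context not in allowed:
            return f"{symbol} preceded by {prev or 'label start'}"
        prev = symbol
    return None
```

Under these assumptions, ভর্ৎসনা, অ্যাসিড and কাঁং pass, while কংং, াক, ক্ৎ and the H-then-V junction in অফ্ই are rejected.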

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, and may not be attested, but there is no harm in admitting such ligatures, because any user can easily create them.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋkʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered sufficient to represent the sound intended for the first word.


This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner character from the set of characters permissible in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help in understanding some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1 H, M, B, D or X cannot occur at the beginning of a Bangla word.

Example

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such combinations automatically result in a “golu” or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is quasi-automatically applied wherever supported by the OS.

2 H is not permitted after V, B, D, X, M or S.

Example

অ্ अ्

অং क ক क কঃ कः ি )क

3 The number of B, D or X permitted after a consonant, a vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalid:


Example

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4 The number of M permitted after a consonant is restricted to one.

Example

কীী की

5 M is not permitted after V.

Example

ইা ঈৗ ईा ईौ

6 The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example

কংঃ कंः

কঃং कःं

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata

Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India

Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India

Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India

Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India

Mr Chandrakanta Murasingh, Agartala, Tripura

Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh

Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka

Prof. Rafiqul Islam, National Professor of Humanities, Dhaka

Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh

Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka

Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council

Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh

Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka

Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka

Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka

Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka

Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka

Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka

Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka

Dr Jalal Ahmed, Director, Bangla Academy, Dhaka

Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)

Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka

Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka

Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka

Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow

Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum

Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)

Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka

Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka

Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka


9. References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān Ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and language planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29.2: 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue, Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1.1: 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1.2: 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1.2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A phonetic and phonological study of nasals and nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core Specification, Chapter 12, p. 473.


10. Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the structures allowed by the formalism do not necessarily occur. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not normally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only when C is ra (র, U+09B0), giving র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM are allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
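Rules 1 and 3 above can be checked mechanically. The following is a minimal Python sketch, not the panel's implementation: it assumes the label is already in NFC and drawn from the Table 8 repertoire, and `restriction_errors` is a hypothetical helper name. Rule 2 and the label-initial য় case (for which the text itself cites exceptions) are deliberately left out.

```python
# Minimal sketch of Appendix I restriction rules 1 and 3 for Bangla labels.
# restriction_errors() is a hypothetical helper, not part of the LGR XML.
KHANDA_TA, NYA, NGA = "\u09ce", "\u099e", "\u0999"   # ৎ ঞ ঙ
VIRAMA, YA, AA = "\u09cd", "\u09af", "\u09be"        # ্ য া
LETTER_A, LETTER_E = "\u0985", "\u098f"              # অ এ
# Independent vowels of the repertoire (Table 8, rows 4-14).
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994")


def restriction_errors(label: str) -> list[str]:
    """Return the rule violations found; an empty list means none were found."""
    errors = []
    # Rule 1: ৎ, ঞ and ঙ may not begin a label.
    if label[:1] in (KHANDA_TA, NYA, NGA):
        errors.append("rule 1: forbidden label-initial character")
    # Rule 3: a virama right after a vowel is only allowed in অ্যা and এ্যা.
    for i in range(1, len(label)):
        if label[i] == VIRAMA and label[i - 1] in VOWELS:
            allowed = label[i - 1] in (LETTER_A, LETTER_E) and label[i + 1:i + 3] == YA + AA
            if not allowed:
                errors.append(f"rule 3: invalid vowel + virama at index {i}")
    return errors


acid = "\u0985\u09cd\u09af\u09be\u09b8\u09bf\u09a1"     # অ্যাসিড
tsering = "\u09ce\u09b8\u09c7\u09b0\u09bf\u0982"        # ৎসেরিং
print(restriction_errors(acid))     # []
print(restriction_errors(tsering))  # ['rule 1: forbidden label-initial character']
```

Whether ৎসেরিং should eventually be admitted, as the transliteration trend described in rule 1 suggests, is a policy question this sketch does not settle.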

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi), used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points


11. Appendix-II: Bengali consonants and their allographs

Consonants | Phonetic Value | Allographs (Clusters; Transparent Form in the Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ, which also occurs in the cluster র্ৎ.

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ ট্ট (ট+ট), ট্য (ট+য), ট্ব (ট+ব), ট্র (ট+র), ক্ট (ক+ট), ষ্ট (ষ+ট)

ঠ ʈʰ ঠ্য (ঠ+য), ণ্ঠ (ণ+ঠ), ষ্ঠ (ষ+ঠ)

ড ɖ ড্ড (ড+ড), ড্য (ড+য), ড্র (ড+র)

ঢ ɖʱ ঢ্য (ঢ+য), ণ্ঢ (ণ+ঢ)

চ tʃ চ্চ (চ+চ), চ্ছ (চ+ছ), চ্ছ্র (চ+ছ+র), চ্ঞ (চ+ঞ), চ্য (চ+য), ঞ্চ (ঞ+চ), শ্চ (শ+চ)

(A+চ)


ছ tʃʰ ছ্র (ছ+র), চ্ছ (চ+ছ), ঞ্ছ (ঞ+ছ), শ্ছ (শ+ছ)

$ (A+ছ)

জ dʒ জ্জ (জ+জ), জ্জ্ব (জ+জ+ব), জ্ঝ (জ+ঝ), জ্ঞ (জ+ঞ), জ্য (জ+য), জ্র (জ+র), ঞ্জ (ঞ+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) জ্ঝ (জ+ঝ), ঞ্ঝ (ঞ+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) ঙ্খ (ঙ+খ)


গ g গ্গ (গ+গ), গ্দ (গ+দ), গ্ধ (গ+ধ), গ্ন (গ+ন), গ্ব (গ+ব), গ্ম (গ+ম), গ্য (গ+য), গ্র (গ+র), গ্ল (গ+ল), ঙ্গ (ঙ+গ), র্ঙ্গ (র+ঙ+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ঘ্ন (ঘ+ন), ঘ্য (ঘ+য), ঘ্র (ঘ+র), ঙ্ঘ (ঙ+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

ঞ্চ (ঞ+চ), ঞ্ছ (ঞ+ছ), ঞ্জ (ঞ+জ), ঞ্ঝ (ঞ+ঝ), জ্ঞ (জ+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং ŋ ঙ্ক (ঙ+ক), ঙ্ক্র (ঙ+ক+র), ঙ্খ (ঙ+খ), ঙ্গ (ঙ+গ), ঙ্ঘ (ঙ+ঘ), ঙ্ক্ষ (ঙ+ক+ষ) (In some contexts ঙ is replaced by ং, e.g. কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার etc.). After a non-initial consonant, however, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য (র + ZWJ + ্ + য) and র্য.

ক্য (ক+য), স্য (স+য), র‍্য (র+য) [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, Ya-phala and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:
i. রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র+ধ+ব) (earlier র্দ্ধ্ব = র+দ+ধ+ব, a four-term cluster) etc. (placed over the following consonant)
ii. র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h (word-finally); word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


5.1 Code Point Repertoire Inclusion

No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
1 | U+0981 | ঁ | BENGALI SIGN CANDRABINDU | Candra-bindu | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
2 | U+0982 | ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvara) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
3 | U+0983 | ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
4 | U+0985 | অ | BENGALI LETTER A | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
5 | U+0986 | আ | BENGALI LETTER AA | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
6 | U+0987 | ই | BENGALI LETTER I | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
7 | U+0988 | ঈ | BENGALI LETTER II | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]; 09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]; 09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]; 09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122][125]

Table 8 – Bangla Code-Point Repertoire

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. They enable the conditional inclusion of Bangla LETTER A and LETTER E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, so that the æ sound, as in English 'bat', 'cat' etc., can be written.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9 – Sequences
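The Character Names column of Table 9 is enough to rebuild each sequence with Python's standard `unicodedata` module. The sketch below also confirms that both sequences are unchanged by NFC normalization (IDN labels must be in NFC), so they can be stored exactly as listed. It is illustrative only; the dictionary and function names are hypothetical.

```python
import unicodedata

# Table 9 sequences, written with the character names listed in the table.
SEQUENCES = {
    "S1": ["BENGALI LETTER A", "BENGALI SIGN VIRAMA",
           "BENGALI LETTER YA", "BENGALI VOWEL SIGN AA"],
    "S2": ["BENGALI LETTER E", "BENGALI SIGN VIRAMA",
           "BENGALI LETTER YA", "BENGALI VOWEL SIGN AA"],
}


def build_sequence(names: list[str]) -> str:
    """Turn a list of Unicode character names into the corresponding string."""
    return "".join(unicodedata.lookup(name) for name in names)


for sr_no, names in SEQUENCES.items():
    seq = build_sequence(names)
    code_points = " ".join(f"{ord(ch):04X}" for ch in seq)
    # NFC leaves the sequence untouched, so it can be stored exactly as listed.
    print(sr_no, code_points, unicodedata.normalize("NFC", seq) == seq)

# Prints:
# S1 0985 09CD 09AF 09BE True
# S2 098F 09CD 09AF 09BE True
```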

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10 – Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It occurs only inside sequences, in the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b – Excluded Code Points
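U+09DC, U+09DD and U+09DF are composition exclusions in Unicode, so NFC normalization maps each of them to its base letter followed by U+09BC. This is why the repertoire lists the two-code-point sequences instead of the precomposed letters. The sketch below shows this with Python's `unicodedata`, together with a hypothetical check (`nukta_is_valid`, not part of the LGR) that every nukta follows ড, ঢ or য.

```python
import unicodedata

NUKTA = "\u09bc"
NUKTA_BASES = "\u09a1\u09a2\u09af"  # ড ঢ য


def nukta_is_valid(label: str) -> bool:
    """True if every nukta in the NFC form of the label follows ড, ঢ or য."""
    label = unicodedata.normalize("NFC", label)
    return all(i > 0 and label[i - 1] in NUKTA_BASES
               for i, ch in enumerate(label) if ch == NUKTA)


# U+09DC, U+09DD and U+09DF are composition exclusions: NFC decomposes them.
for precomposed in ("\u09dc", "\u09dd", "\u09df"):
    nfc = unicodedata.normalize("NFC", precomposed)
    print(unicodedata.name(precomposed), "->", " ".join(f"U+{ord(c):04X}" for c in nfc))

# Prints:
# BENGALI LETTER RRA -> U+09A1 U+09BC
# BENGALI LETTER RHA -> U+09A2 U+09BC
# BENGALI LETTER YYA -> U+09AF U+09BC
```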

5.4 The Basis of the Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11 – The ABNF Formalism
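Before the ABNF can be applied to an actual label, each code point has to be mapped to one of the variables of 5.4.1. The following is a minimal Python sketch of that step, assuming the Table 8 repertoire and treating the ড়/ঢ়/য় base + nukta pairs as a single C. `classify` and the table names are hypothetical, not taken from the LGR XML.

```python
# Hypothetical classifier mapping repertoire code points to the ABNF variables of 5.4.1.
def _chars(*ranges):
    """Expand inclusive (first, last) code point ranges into a set of characters."""
    return {chr(cp) for first, last in ranges for cp in range(first, last + 1)}


CONSONANTS = _chars((0x0995, 0x09A8), (0x09AA, 0x09B0), (0x09B2, 0x09B2),
                    (0x09B6, 0x09B9), (0x09F0, 0x09F1))
VOWELS = _chars((0x0985, 0x098B), (0x098F, 0x0990), (0x0993, 0x0994))
MATRAS = _chars((0x09BE, 0x09C4), (0x09C7, 0x09C8), (0x09CB, 0x09CC))
CATEGORY = {
    **dict.fromkeys(CONSONANTS, "C"),
    **dict.fromkeys(VOWELS, "V"),
    **dict.fromkeys(MATRAS, "M"),
    "\u0982": "B",  # anusvara
    "\u0981": "D",  # candrabindu
    "\u0983": "X",  # visarga
    "\u09cd": "H",  # hasanta / virama
    "\u09ce": "Z",  # khanda ta
}
NUKTA, NUKTA_BASES = "\u09bc", "\u09a1\u09a2\u09af"


def classify(label: str) -> str:
    """Return the label as a string of ABNF variables, e.g. 'CHCM'."""
    symbols = []
    for i, ch in enumerate(label):
        if ch == NUKTA and i > 0 and label[i - 1] in NUKTA_BASES:
            continue  # ড় ঢ় য় are stored as base + nukta but count as one C
        if ch not in CATEGORY:
            raise ValueError(f"U+{ord(ch):04X} is not in the repertoire")
        symbols.append(CATEGORY[ch])
    return "".join(symbols)


print(classify("\u0985\u09cd\u09af\u09be"))        # অ্যা -> VHCM
print(classify("\u09a8\u09cd\u09a4\u09cd\u09b0"))  # ন্ত্র -> CHCHC
```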

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To facilitate understanding for users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). More than one of D, B and X may follow a V in Bangla, but not the same sign twice: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have formations or sequences such as an anusvāra followed by a candrabindu on the one hand, and a visarga followed by a candrabindu on the other, as in হ্যাংঁচা 'hænchā' and হ্যাঃঁ 'hæn' respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla. A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. Ya-phala is a presentation form of U+09AF, Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta or VIRAMA), U+09AF য BENGALI LETTER YA>. Ya-phala has a special form, ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A Vowel-sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
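The vowel-sequence combinations of 5.4.2.1-5.4.2.3 can be sketched as a regular expression over strings of ABNF symbols, like those in the example columns above. This is an illustrative translation under stated assumptions, not the LGR's own rule syntax: ABNF "[ ]" becomes "?", and the restriction of H to অ and এ is not checked here. `is_vowel_sequence` is a hypothetical name.

```python
import re

# 5.4.2: V, optionally the HCM of অ্যা / এ্যা, then optionally B, D, X, DB or DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if a string of ABNF symbols is a well-formed vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ("VB", "VDX", "VHCMDB", "VDD", "VBB", "VXX"):
    print(s, is_vowel_sequence(s))
# VB, VDX and VHCMDB print True; VDD, VBB and VXX print False.
```

Because the optional group admits only one of B, D, X, DB or DX, repeated signs such as VDD are rejected, matching the invalid orthographic units named in 5.4.2.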

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Examples:
CHC ন্ত → ন + ্ + ত; न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न + ् + त + ् + र + ् + य

5.4.3.5 Subsets
Among the possible subsets, we take the combination CHC as a representative example; the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:
CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
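The consonant-sequence rules of 5.4.3.2-5.4.3.5 can likewise be sketched as a regular expression over ABNF symbols. This is one possible reading, stated as an assumption: a lone consonant may end in H (the pure consonant), while otherwise up to three CH pairs ("*3(CH)" in ABNF, "{0,3}" here) precede a final C that takes an optional M and then an optional B, D, X, DB or DX. The function name is hypothetical.

```python
import re

# 5.4.3: "CH" for a pure consonant; otherwise *3(CH)C, optional M,
# then an optional B, D, X, DB or DX.
CONSONANT_SEQUENCE = re.compile(r"CH|(?:CH){0,3}CM?(?:DB|DX|B|D|X)?")


def is_consonant_sequence(symbols: str) -> bool:
    """True if a string of ABNF symbols is a well-formed consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


for s in ("CH", "CMDX", "CHCHCHC", "CHCMDB", "CHCHCHCHC", "CHH"):
    print(s, is_consonant_sequence(s))
# The first four print True; CHCHCHCHC (five consonants) and CHH print False.
```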

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single 'Khanda' Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition for KHANDA TA in this context is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
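The [CH]Z condition amounts to a context check on the code point before a hasanta + ৎ pair. The following is a minimal sketch assuming NFC input. It accepts either RA, as the Note allows; Appendix I (rule 2) limits Bangla itself to র, so a Bangla-only profile would drop U+09F0 from `RAS`. The function name is hypothetical.

```python
KHANDA_TA, VIRAMA = "\u09ce", "\u09cd"   # ৎ ্
RAS = {"\u09b0", "\u09f0"}               # র (Bangla), ৰ (Assamese)


def khanda_ta_context_ok(label: str) -> bool:
    """Check that any hasanta directly before ৎ belongs to a RA ([CH]Z with C = RA)."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i > 0 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] not in RAS:
                return False
    return True


print(khanda_ta_context_ok("\u09ad\u09b0\u09cd\u09ce\u09b8\u09a8\u09be"))  # ভর্ৎসনা -> True
print(khanda_ta_context_ok("\u0995\u09cd\u09ce"))                          # ক্ৎ -> False
```

A lone ৎ after a full consonant, as in মহৎ, is the single-Z case of 5.4.4.1 and passes unchanged.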

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat' etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
Another case concerns the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.

6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups. Group 1: confusing due to pure visual similarity. Group 2: confusing due to deviation from character formations as normally perceived by the larger linguistic community. For Group 1, identical code points are defined as variants; confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can cause confusion even to a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more identical-looking forms are produced from different stored code points. In Bangla, the e-kāra, the ā-kāra and the o-kāra have different code points. One can type o with a consonant in one go, and get the same result by typing e-kāra and ā-kāra as two separate keys; a reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
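The two ways of typing কো differ only in stored code points, and Unicode already treats them as canonically equivalent: U+09CB BENGALI VOWEL SIGN O decomposes to U+09C7 U+09BE. Since IDNA2008 (RFC 5891) only admits labels in Normalization Form C, the two-key string is not a valid label as typed, and its NFC form is exactly the single-key spelling. A small check with Python's standard `unicodedata` module illustrates this:

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09c7", "\u09be", "\u09cb"  # ক ে া ো

two_keys = KA + E_KAR + AA_KAR  # ko typed as e-kāra followed by ā-kāra
one_key = KA + O_KAR            # ko typed with the single o-kāra key

# U+09CB canonically decomposes to U+09C7 U+09BE, so NFC folds the two-key
# typing into the single-key one, and NFD does the reverse.
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True
```

This is a complementary observation to the panel's own reasoning (a kāra may not follow a kāra); both lead to the same conclusion that no variant is needed here.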

6.1 In-Script Variants
We do, however, propose two cases of true in-script variants in the Bangla script.

CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears with Hasanta in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

If written in traditional orthography, the above combinations could be a little confusing, as the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example 1). It could also be typed wrongly, in the belief that it was a হ (ha, U+09B9), increasing the risks in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)
Fonts which represent the traditional Bangla writing system may tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II
Another interesting example of a variant is encountered in the ra + Hasanta and Hasanta + ra combinations used in writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise when typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences (mainly because Assamese and Bangla share the same Unicode block, and 'B_RA' and 'A_RA' refer to the same phonetic element). The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র), or A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12 – Example of Repha

Note: As ক and ঠ look exactly the same in Bangla and Assamese, and the repha takes the same shape whichever RA it is formed from, the resulting combinations look identical. 'Ra-phala' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13 – Example of Ra-phala

As the Assamese and Bangla Repha and Ra-phala conjunct forms look the same, they could confuse end users. Hence the repha and ra-phala cases need to be defined as variants. NBGP concluded to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. However, this will create blocked variant labels: e.g., if someone registers “ররর”, the variant label “ৰৰৰ” will be generated and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. “ৰৰ” or “ৰৰৰৰ”, it is still possible.
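The blocking behaviour just described can be illustrated with a short Python sketch. It is not the normative XML LGR: it assumes a hypothetical `VARIANT_SETS` table holding only the র/ৰ pair and treats every generated variant label as blocked, without modelling the variant types a real LGR would assign.

```python
from itertools import product

# Hypothetical variant table: র (U+09B0) and ৰ (U+09F0) form one variant set;
# every other code point is only a variant of itself.
VARIANT_SETS = {
    "\u09B0": ("\u09B0", "\u09F0"),
    "\u09F0": ("\u09F0", "\u09B0"),
}


def variant_labels(label: str) -> set:
    """Return every variant label of `label`, excluding `label` itself.

    Each position independently takes any member of its variant set, so a
    label containing n occurrences of র/ৰ has 2**n - 1 variant labels.
    """
    choices = [VARIANT_SETS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


if __name__ == "__main__":
    blocked = variant_labels("\u09B0\u09B0\u09B0")   # registered label ররর
    print(len(blocked))                              # 7
    print("\u09F0\u09F0\u09F0" in blocked)           # True: ৰৰৰ is blocked
    print("\u09F0\u09F0" in blocked)                 # False: ৰৰ is another label
```

Because variants are generated position by position, labels of a different length (such as ৰৰ or ৰৰৰৰ) never appear in the blocked set, which matches the statement that blocking operates only at the label level.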

6.2 Cross Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses “Oriya” for the script, although “Odia” is now the official term.


1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 - Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 - Bangla and Gurmukhī cross-script variant code points
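Assuming, as in the Root Zone LGR procedure, that every label must be drawn from a single script, a cross-script variant can exist only when every code point of the Bangla label has a counterpart in the same target script. The Python sketch below illustrates that check using only the pairs in Tables 14 and 15; the dictionary and function names are hypothetical.

```python
from typing import Dict, Optional

# Cross-script variant pairs from Table 14 (Devanagari) and Table 15 (Gurmukhi).
TO_DEVANAGARI: Dict[str, str] = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম→म, ি→ि
TO_GURMUKHI: Dict[str, str] = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম→ਸ, ি→ਿ


def cross_script_variant(label: str, table: Dict[str, str]) -> Optional[str]:
    """Return the whole-label variant in the target script, or None.

    Returns None as soon as one code point has no counterpart, because a
    partially mapped label would mix two scripts.
    """
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


print(cross_script_variant("\u09AE\u09BF", TO_DEVANAGARI))  # मि
print(cross_script_variant("\u09AE\u09BF", TO_GURMUKHI))    # ਸਿ
print(cross_script_variant("\u09AE\u09BE", TO_DEVANAGARI))  # None (া has no pair)
```

Under these two tables only labels written entirely with ম and ি (for example মি) yield a cross-script variant label.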

7. Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in Code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.


D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even where they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16 - Symbols used in WLE rules

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters”, which could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”, of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows:


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.

2. H must be preceded by C.

Example

3. M must be preceded by C. Example: কা

4. D must be preceded by either V, C or M. Example: আখখা হা

5. X must be preceded by either V, C, M or D. Example: উঃখঃবঃাঃ দ ঃ

6. B must be preceded by either V, C, M or D. Example: আং ইং কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
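The nine rules can be sketched as a small Python validator. This is an illustration under stated assumptions, not the normative LGR XML: the category sets are transcribed from Table 8 and Table 16, entries 1-7 of Table 8 (not shown in this section) are assumed to be U+0981-U+0983 and U+0985-U+0988, and the helper names are hypothetical. S1/S2 and the nukta sequences are tokenized first so that the rules see each of them as a single unit.

```python
"""Minimal sketch of the Section 7.1 WLE rules (not the normative LGR XML)."""

# Category sets from Table 8 / Table 16. Assumption: Table 8 entries 1-7
# are U+0981-U+0983 and U+0985-U+0988.
V = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
C = {chr(cp) for cp in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1),
                        0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1]}
CN = {"\u09A1\u09BC", "\u09A2\u09BC", "\u09AF\u09BC"}  # normalized ড়, ঢ়, য় (rule 1)
M = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
S = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # S1, S2 (Table 9)
RA = {"\u09B0", "\u09F0"}  # C2 in the definition of P

# Rules 2-7 as "allowed preceding category". S ends with M, so whatever may
# follow M may also follow S; "P" stands for র/ৰ + Hasanta.
ALLOWED_AFTER = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},
}


def tokenize(label):
    """Split a label into (category, text) pairs; S and CN count as one token."""
    tokens, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S:
            tokens.append(("S", label[i:i + 4]))
            i += 4
        elif label[i:i + 2] in CN:
            tokens.append(("C", label[i:i + 2]))
            i += 2
        else:
            ch = label[i]
            cat = "V" if ch in V else "C" if ch in C else "M" if ch in M else SIGNS.get(ch)
            if cat is None:
                raise ValueError(f"U+{ord(ch):04X} is not in the repertoire")
            tokens.append((cat, ch))
            i += 1
    return tokens


def is_valid_label(label):
    """Apply WLE rules 1-9; a code point outside the repertoire is invalid."""
    if not label:
        return False
    try:
        tokens = tokenize(label)
    except ValueError:
        return False
    prev = None  # category of the preceding token (None at the label start)
    for idx, (cat, _) in enumerate(tokens):
        before = prev
        # Rule 7: Hasanta preceded by র/ৰ counts as P before Khanda Ta.
        # (An H at idx - 1 was itself preceded by C, so idx - 2 >= 0.)
        if cat == "Z" and prev == "H" and tokens[idx - 2][1] in RA:
            before = "P"
        if cat in ALLOWED_AFTER and before not in ALLOWED_AFTER[cat]:
            return False
        if cat in ("V", "S") and prev == "H":  # rules 8 and 9
            return False
        prev = cat
    return True


print(is_valid_label("\u0995\u09BE"))              # True  (কা)
print(is_valid_label("\u0995\u0982\u0982"))        # False (কংং, rule 6)
print(is_valid_label("\u0985\u09AB\u09CD\u0987"))  # False (অফ্ই, rule 8)
```

Rule 8 is what rejects the multi-word form discussed under 7.1.1: the Hasanta that closes অফ্ is immediately followed by the vowel ই.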

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋk ʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning Bank of India. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent its intended sound.


This is a unique situation, necessitated by the absence of the hyphen, space and Zero Width Non-Joiner characters from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may make two labels (with and without H) look alike to the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such combinations automatically result in a “golu”, or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example

অ अ

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a Consonant, Vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid:


Example

কংং क

ক क

কঃঃ कःः

কা का

কীঃঃ कःः

অংং अ

অ अ

অঃঃ अःः

4. The number of M permitted after a Consonant is restricted to one. Example:

কীী की

5. M is not permitted after V. Example:

ইাঈৗ ईाईौ

6. The combinations of Anusvāra + Visarga as well as Visarga + Anusvāra are not permissible.

Example

কংঃ कः

কঃং कः

8. Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka


9. References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2011. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān Ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. (1957). ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. (2015). ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. (1960). ‘The Phonemes of Bengali’. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. (2007). Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. (1966). Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. (1960). A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). (1942). Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. (1975). Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). (2014). Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. (2012). Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened so far in terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.


10. Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only in the case where C is ra (র) (09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান

(acid association)
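The three Bangla-specific restrictions can be expressed as simple string checks. The Python sketch below is an illustration only: the function and constant names are hypothetical, it covers rules 1-3 exactly as stated above, and it does not model the exceptions the text mentions (য় in foreign names, or ৎস in transliterated names such as Tsering).

```python
from typing import List

# Appendix 10.1 restriction rules 1-3 (Bangla only); names are illustrative.
KHANDA_TA, HASANTA, RA = "\u09CE", "\u09CD", "\u09B0"
NO_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}  # ৎ, ঞ, ঙ may not start a label
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def restriction_violations(label: str) -> List[str]:
    """Return human-readable violations of Appendix 10.1 rules 1-3."""
    problems = []
    if label[:1] in NO_INITIAL:
        problems.append("rule 1: label starts with ৎ, ঞ or ঙ")
    for i in range(2, len(label)):
        # Rule 2: C + Hasanta + Khanda Ta is allowed only when C is র.
        if label[i] == KHANDA_TA and label[i - 1] == HASANTA and label[i - 2] != RA:
            problems.append(f"rule 2: C + H + ৎ at index {i - 2} with C other than র")
    for i in range(len(label) - 1):
        # Rule 3: a vowel followed by Hasanta must begin অ্যা or এ্যা.
        if label[i] in VOWELS and label[i + 1] == HASANTA and label[i:i + 4] not in ALLOWED_VHCM:
            problems.append(f"rule 3: V + H at index {i} is not অ্যা or এ্যা")
    return problems


print(restriction_violations("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: []
print(restriction_violations("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))        # ৎসেরিং: rule 1
```

The second call shows that a transliteration such as ৎসেরিং, although the text notes it is now encountered, is still flagged by rule 1 as written.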

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was in wide use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script with 32 symbols are reproduced here (138).

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points


11. Appendix-II: Bengali consonants and their allographs

Consonants | Phonetic Value | Allographs

Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts aelig is replaced by ং.) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, however, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: ya-phala under র (র‍্য) and repha over য (র্য).

ক্য (ক + য), স্য (স + য), র‍্য (র + য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala + ā-kāra, constitute a vowel sign representing the vowel অ্যা]


র r Two manifestations:
i. রেফ (repʰa), as the first member of a cluster, e.g. প6ৎ6 not6 য6D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant)

ii. র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment

8 | U+0989 | উ | BENGALI LETTER U | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
9 | U+098A | ঊ | BENGALI LETTER UU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
10 | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
11 | U+098F | এ | BENGALI LETTER E | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
12 | U+0990 | ঐ | BENGALI LETTER AI | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
13 | U+0993 | ও | BENGALI LETTER O | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
14 | U+0994 | ঔ | BENGALI LETTER AU | Vowel | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


15 | U+0995 | ক | BENGALI LETTER KA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
16 | U+0996 | খ | BENGALI LETTER KHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
17 | U+0997 | গ | BENGALI LETTER GA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
18 | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
19 | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
20 | U+099A | চ | BENGALI LETTER CA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
21 | U+099B | ছ | BENGALI LETTER CHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125] 09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125] 09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125] 09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant) / Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122][125]

Table 8: Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences, which enable the conditional inclusion in the repertoire of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, to represent the æ sound as in English ‘bat’, ‘cat’, etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference


S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Points | Glyph | Character Names | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code point not used alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used only as part of the sequences for three special characters in normalized form: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded Code Points
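Why ড়, ঢ় and য় appear only as base letter + NUKTA sequences can be checked with Python's standard `unicodedata` module. The sketch assumes labels are handled in Unicode Normalization Form C, as IDNA2008 requires; the helper names are illustrative. U+09DC, U+09DD and U+09DF are Unicode composition exclusions, so NFC never produces them and always yields the two-code-point sequence instead.

```python
import unicodedata


def lgr_form(label: str) -> str:
    """Return the NFC form of a label (the form compared against the repertoire)."""
    return unicodedata.normalize("NFC", label)


def code_points(text: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for precomposed in ("\u09DC", "\u09DD", "\u09DF"):
    print(code_points(precomposed), "->", code_points(lgr_form(precomposed)))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```

Because NFC also leaves an already decomposed sequence untouched, both ways of typing ড় end up as the same repertoire sequence.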

5.4 The Basis of Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr Number | Operator | Function
1 | “|” | Alternative
2 | “[ ]” | Optional
3 | “*” | Variable Repetition
4 | “( )” | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To facilitate understanding for users of other Brahmi scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). The number of D, B or X which can follow a V in Bangla is not necessarily restricted to one. However, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units, whereas it is valid and possible to combine a Candrabindu with an Anusvāra on the one hand and with a Visarga on the other, as in হ্যাংচা ‘hænchā’ and ‘hæn’ হ্যাঃ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. “Ya-phala is a presentation form of U+09AF Bangla letter য or ‘ya’”, represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta or VIRAMA), U+09AF য BENGALI LETTER YA>. Ya-phala has a special form (্য). When it is further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX], or by a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].

Examples:

VB অং अं

VD অঁ अँ

VX অঃ अः

VDB অঁং अँं

VDX অঁঃ अँः

VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং এ্যাং

VHCMD অ্যাঁ এ্যাঁ

VHCMX অ্যাঃ এ্যাঃ

VHCMDB অ্যাঁং এ্যাঁং

VHCMDX অ্যাঁঃ এ্যাঁঃ
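The vowel-sequence combinations of 5.4.2.1 to 5.4.2.3 can be condensed into a single pattern over the category symbols of 5.4.1. The following is a minimal Python sketch, not part of the LGR XML: it assumes the label has already been mapped to a string of category letters (V, H, C, M, B, D, X), and the names `VOWEL_SEQUENCE` and `is_vowel_sequence` are our own illustrative choices.

```python
import re

# Vowel sequence from 5.4.2: V, optionally followed by the ya-phala
# combination H C M, then optionally by exactly one of DB, DX, B, D or X.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def is_vowel_sequence(categories: str) -> bool:
    """Return True if the whole category string matches the vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(categories) is not None


for seq in ["V", "VB", "VDX", "VHCM", "VHCMDB", "VDD", "VBB", "VXX"]:
    print(seq, is_vowel_sequence(seq))
```

Because each trailing sign is matched at most once, the invalid formations VDD, VBB and VXX fail, while the DB and DX pairs described above are accepted.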

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

CM কি কু कि कु

CB কং कं

CD কঁ कँ

CX কঃ कः

CH ক্ क् (pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:

CMB কীং কুং कीं कुं

CMD কাঁ काँ

CMX বীঃ वीः

CMDB কাঁং काँं

CMDX কাঁঃ काँः

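5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Examples:

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त

CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र

CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य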

5.4.3.5 Subsets

While considering the subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Examples:

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी

CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं

CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ

CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः

CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं

CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Examples:

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ

CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः

CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं

CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
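The consonant-sequence patterns of 5.4.3.1 to 5.4.3.5 can likewise be written as one pattern over category letters. This is a minimal Python sketch under the same assumption as before (the label is already mapped to category letters); `CONSONANT_SEQUENCE` and `is_consonant_sequence` are hypothetical names, and a pure consonant (CH) is modelled only for a single C, since the subsets in 5.4.3.5 list no trailing H.

```python
import re

# Consonant sequence from 5.4.3: either a pure consonant (CH), or up to
# four consonants joined by H, optionally followed by one kāra (M) and then
# optionally by exactly one of DB, DX, B, D or X.
CONSONANT_SEQUENCE = re.compile(r"(?:CH|(?:CH){0,3}CM?(?:DB|DX|B|D|X)?)")


def is_consonant_sequence(categories: str) -> bool:
    """Return True if the whole category string matches the consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(categories) is not None


for seq in ["C", "CH", "CMB", "CHCHCHC", "CHCMDX", "CHCHCHCHC", "CMM", "CHCH"]:
    print(seq, is_consonant_sequence(seq))
```

The bound `{0,3}` on the CH group is what limits a cluster to four consonants, mirroring *3(CH)C.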

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda-Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.

Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
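The Khanda-Ta condition above (a Hasanta before ৎ must itself follow RA) can be checked directly on code points. This is a minimal sketch assuming labels are ordinary Python strings of Unicode code points; the function name `khanda_ta_context_ok` is hypothetical, and the check deliberately covers only this one condition (a label-initial ৎ, for instance, is handled by other rules).

```python
VIRAMA = "\u09CD"                 # BENGALI SIGN VIRAMA (Hasanta)
KHANDA_TA = "\u09CE"              # BENGALI LETTER KHANDA TA
RA_FORMS = {"\u09B0", "\u09F0"}   # Bangla RA র and Assamese RA ৰ


def khanda_ta_context_ok(label):
    """Return False if any ৎ preceded by a Hasanta lacks RA before that Hasanta."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ
```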

5.4.5 Special Cases: S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) after the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

Another case refers to the formation of repha and ra-phalā in the said script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.
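To make the S and P sequences concrete, the sketch below spells out their code points using Python's standard `unicodedata.name`. It is illustrative only: the helper `spell_out` and the example strings are our own choices, not part of the LGR.

```python
import unicodedata


def spell_out(text):
    """Return one 'U+XXXX NAME' string per code point in `text`."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# S: the æ ligature অ্যা (full vowel + virama + YA + AA)
S_EXAMPLE = "\u0985\u09CD\u09AF\u09BE"
# P: repha with the Bangla RA (র্ক) and with the Assamese RA (ৰ্ক)
P_BANGLA = "\u09B0\u09CD\u0995"
P_ASSAMESE = "\u09F0\u09CD\u0995"

for text in (S_EXAMPLE, P_BANGLA, P_ASSAMESE):
    print(text, "=", " + ".join(spell_out(text)))
```

The two P strings differ only in their first code point, which is exactly why their rendered repha looks the same.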


6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:

Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akshar formation. They can cause confusion even to a careful observer and hence are being proposed as variants.

Variants arise in a script when two or more forms are created with different storage or code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant at one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two instances of ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

If written in traditional orthography, the above combinations could be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can appear in initial, medial or final positions (as shown in example 1 below). It could also be mistyped as হ (ha, U+09B9) by a user who takes it for a ha, increasing the risk of errors in label writing and identification. Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)

The fonts which represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' as well as 'A_RA' refer to the same phonetic element. Here the final ligatures look the same; the two sequences are:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক

U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12 - Example of Repha

Note: As the ক and ঠ used by Bangla and Assamese are exactly the same, the resultant combinations with repha look identical; the choice of RA makes no visible difference to the repha.

'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ

U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13 - Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, this could cause confusion for end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. But this will create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. However, it is only blocked at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
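Variant label generation with a single র/ৰ variant set can be illustrated by substituting each code point with every member of its variant set. This is a minimal sketch, not the procedure of any particular LGR tool: `VARIANT_SETS` and `variant_labels` are hypothetical names, and variant dispositions (such as "blocked") are not modelled.

```python
from itertools import product

# Variant sets: each set lists code points that are variants of one another.
# Only the র (U+09B0) / ৰ (U+09F0) pair from 6.1, CASE II, is modelled here.
VARIANT_SETS = [{"\u09B0", "\u09F0"}]


def variant_labels(label):
    """Return all labels obtained by swapping code points within a variant set,
    excluding the original label itself."""
    choices = []
    for ch in label:
        alternatives = {ch}
        for variant_set in VARIANT_SETS:
            if ch in variant_set:
                alternatives = variant_set
        choices.append(sorted(alternatives))
    return {"".join(combo) for combo in product(*choices)} - {label}


generated = variant_labels("\u09B0\u09B0\u09B0")  # "ররর"
print(len(generated))                     # 2**3 - 1 = 7 variant labels
print("\u09F0\u09F0\u09F0" in generated)  # "ৰৰৰ" is among them
print("\u09F0\u09F0" in generated)        # a shorter label is not
```

Since variants are generated position by position, labels of a different length such as "ৰৰ" or "ৰৰৰৰ" are never produced, which is why they remain registrable.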

6.2 Cross-Script Variants

A focused cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in the web domain. Apart from the cases described in 6.1, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses "Oriya" for the script, although "Odia" is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī

ম U+09AE | म U+092E

ি U+09BF | ि U+093F

Table 14 - Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī

ম U+09AE | ਸ U+0A38

ি U+09BF | ਿ U+0A3F

Table 15 - Bangla and Gurmukhī cross-script variant code points
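Tables 14 and 15 can be read as a lookup from Bangla code points to their cross-script counterparts. The sketch below assumes, as is usual for root zone labels, that a label is drawn from a single script, so a whole-label cross-script variant exists only when every code point of the Bangla label has a counterpart. `CROSS_SCRIPT` and `whole_label_cross_script_variants` are hypothetical names for illustration.

```python
# Cross-script variant code points from Tables 14 and 15.
# Key: Bangla code point; value: {script: variant code point}.
CROSS_SCRIPT = {
    "\u09AE": {"Devanagari": "\u092E", "Gurmukhi": "\u0A38"},  # ম ~ म, ਸ
    "\u09BF": {"Devanagari": "\u093F", "Gurmukhi": "\u0A3F"},  # ি ~ ि, ਿ
}


def whole_label_cross_script_variants(label):
    """Return {script: variant_label} for each script into which every code
    point of the Bangla label can be mapped; any unmapped code point means
    there is no whole-label variant in that script."""
    result = {}
    for script in ("Devanagari", "Gurmukhi"):
        mapped = [CROSS_SCRIPT.get(ch, {}).get(script) for ch in label]
        if label and all(m is not None for m in mapped):
            result[script] = "".join(mapped)
    return result


print(whole_label_cross_script_variants("\u09AE\u09BF"))  # মি
print(whole_label_cross_script_variants("\u0995\u09BF"))  # কি
```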

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table provided in Code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1 or S2 (from Table 9), the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16 - Symbols used in WLE rules

12 As used by the Unicode Standard, denoting and including both Assamese and Maṇipuri.

It is perhaps appropriate to mention here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be seen in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are listed below:

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.

2. H must be preceded by C.

3. M must be preceded by C. Example: কা

4. D must be preceded by V, C or M. Examples: আঁ, খঁ, খাঁ, হাঁ

5. X must be preceded by V, C, M or D. Examples: উঃ, খঃ, বাঃ, দঁঃ

6. B must be preceded by V, C, M or D. Examples: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P. Examples: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details are given in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
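Rules 1 to 9 can be sketched as a small checker that first maps a label to category tokens and then tests each token's immediate predecessor. This is a minimal, non-normative Python sketch: the category tables are an assumed, simplified subset of the repertoire in Section 5.1 (the LGR XML file remains authoritative), the label is NFC-normalized first (under which U+09DF decomposes to U+09AF U+09BC), and `tokenize` and `wle_violations` are hypothetical names.

```python
import unicodedata

# Category tables: an assumed, simplified subset of the repertoire (Section 5.1),
# chosen for illustration only; the normative list is the LGR XML file.
V_CHARS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
C_CHARS = ({chr(cp) for cp in range(0x0995, 0x09BA)}
           - {"\u09A9", "\u09B1", "\u09B3", "\u09B4", "\u09B5"}) | {"\u09F0"}
M_CHARS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C7\u09C8\u09CB\u09CC")
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # CN: ড়, ঢ়, য় as base + nukta
RA_FORMS = {"\u09B0", "\u09F0"}               # C2 of symbol P: র, ৰ
S_VOWELS = {"\u0985", "\u098F"}               # V1 of symbol S: অ, এ

# Rules 2-7: allowed immediate predecessor categories (None = label start).
MUST_FOLLOW = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}
NEVER_AFTER_H = {"V", "S"}                    # rules 8 and 9


def tokenize(label):
    """Map a label to (category, text) tokens, collapsing CN and S sequences."""
    label = unicodedata.normalize("NFC", label)  # U+09DF becomes U+09AF U+09BC
    tokens, i = [], 0
    while i < len(label):
        ch = label[i]
        if ch in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            tokens.append(("C", label[i:i + 2]))  # rule 1: CN counts as C
            i += 2
            continue
        if ch in V_CHARS:
            cat = "V"
        elif ch in C_CHARS:
            cat = "C"
        elif ch in M_CHARS:
            cat = "M"
        elif ch in SIGNS:
            cat = SIGNS[ch]
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        tokens.append((cat, ch))
        i += 1
    merged, j = [], 0
    while j < len(tokens):
        w = tokens[j:j + 4]
        if (len(w) == 4 and w[0][1] in S_VOWELS and w[1][0] == "H"
                and w[2][1] == "\u09AF" and w[3][1] == "\u09BE"):
            merged.append(("S", "".join(text for _, text in w)))  # অ্যা / এ্যা
            j += 4
        else:
            merged.append(tokens[j])
            j += 1
    return merged


def wle_violations(label):
    """Return (token_index, category, reason) for each broken WLE rule 2-9."""
    tokens = tokenize(label)
    problems = []
    for i, (cat, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        if prev == "H" and cat in NEVER_AFTER_H:
            problems.append((i, cat, "cannot be preceded by H"))
        allowed = MUST_FOLLOW.get(cat)
        if allowed is None:
            continue
        effective = prev
        if prev == "S":
            effective = "M"  # S ends with a kāra, so it acts like M
        elif prev == "H" and i >= 2 and tokens[i - 2][1] in RA_FORMS:
            effective = "P"  # RA + H is symbol P
        if effective not in allowed:
            problems.append((i, cat, f"must be preceded by one of {sorted(allowed)}"))
    return problems


bank_of_india = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985\u09AB\u09CD"
                 "\u0987\u09A8\u09CD\u09A1\u09BF\u09DF\u09BE")
print(wle_violations(bank_of_india))  # rule 8: ই directly after the H of অফ্
```

Treating S as a single token is a design choice of this sketch: it lets the ya-phalā ligature bypass rules 2 and 3 (as Table 16 allows) while still being caught by rule 9.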

Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in including such ligatures: any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning Bank of India). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Examples:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Examples:

অ্ अ्

অং্ अं्

কঁ্ कँ्

কঃ্ कः्

কি্ कि्

3. The number of B, D or X permitted after a Consonant, a Vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalid:

Examples:

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कीःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a Consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Examples:

ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Examples:

কংঃ कंः

কঃং कःं
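Items 1 to 6 above can also be read as forbidden patterns over category strings, which gives a quick cross-check against the positive rules of 7.1. This is a minimal sketch assuming the label is already mapped to category letters, with S standing for the whole অ্যা / এ্যা sequence; `FORBIDDEN` and `broken_items` are hypothetical names.

```python
import re

# Negative patterns for items 1-6 of 7.2, written over category strings
# (C, V, M, H, B, D, X, with S standing for the whole অ্যা / এ্যা sequence).
FORBIDDEN = [
    (1, re.compile(r"^[HMBDX]")),   # H, M, B, D or X at the start of a word
    (2, re.compile(r"[VBDXMS]H")),  # H after V, B, D, X, M or S
    (3, re.compile(r"BB|DD|XX")),   # more than one B, D or X in a row
    (4, re.compile(r"MM")),         # more than one M after a consonant
    (5, re.compile(r"VM")),         # M directly after V
    (6, re.compile(r"BX|XB")),      # Anusvāra + Visarga in either order
]


def broken_items(categories):
    """Return the 7.2 item numbers whose forbidden pattern occurs in `categories`."""
    return [item for item, pattern in FORBIDDEN if pattern.search(categories)]


print(broken_items("CBB"))  # কংং
print(broken_items("CBX"))  # কংঃ
print(broken_items("CDB"))  # কঁং, which is allowed
```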

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf (accessed on 25.11.2017).

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm (accessed on 25.11.2017).

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm (accessed on 20.10.2019).

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet (accessed on 25.11.2017).

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm (accessed on 10.5.2018).

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language (accessed on 25.11.2017).

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20 (accessed on 10.5.2018).

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696 (accessed on 10.5.2018).

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html (accessed on 10.5.2018).

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya O Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari (accessed on 19.5.2018).

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'Phones of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646 (UNICODE). https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf (accessed on 21.5.2018).

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block) (accessed on 21.5.2018).

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html (accessed on 21.5.2018).

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has happened So Far in terms of Script Reforms'. Paper presented at the Face-to-Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only where C is ra (র, 09B0), giving র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:

→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)

→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
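Restrictions 1 and 3 above can be checked directly on code points. This is a minimal sketch, assuming labels are Python strings; `appendix_restrictions` and the constant names are hypothetical, and restriction 2 (Khanda Ta after CH) is not repeated here.

```python
VIRAMA = "\u09CD"
NO_LABEL_START = {"\u09CE", "\u099E", "\u0999"}                  # ৎ, ঞ, ঙ (rule 1)
INDEPENDENT_VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}   # U+0985..U+0994
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা


def appendix_restrictions(label):
    """Return messages for Appendix 10.1 restrictions 1 and 3 broken by `label`."""
    issues = []
    if label[:1] in NO_LABEL_START:
        issues.append("rule 1: label starts with ৎ, ঞ or ঙ")
    for i in range(1, len(label)):
        # A Hasanta directly after an independent vowel is only allowed in অ্যা / এ্যা.
        if label[i] == VIRAMA and label[i - 1] in INDEPENDENT_VOWELS:
            if label[i - 1:i + 3] not in ALLOWED_VHCM:
                issues.append(f"rule 3: vowel + Hasanta at index {i - 1} is not অ্যা or এ্যা")
    return issues


print(appendix_restrictions("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
print(appendix_restrictions("\u0987\u09CD\u09AF\u09BE"))                    # ই্যা
```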

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17 - The Script Table of Sylheti Nagarī or Siloti

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

10.3 Confusable Code Points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision

ঃ U+0983 | ः U+0903 | Confusable

ও U+0993 | उ U+0909 | Confusable

ঘ U+0998 | घ U+0918 | Confusable

ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 - Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision

ঘ U+0998 | ਬ U+0A2C | Confusable

ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 - Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision

ও U+0993 | ਤ U+0A24 | Distinguishable

শ U+09B6 | ਅ U+0A05 | Distinguishable

ম U+09AE | ਮ U+0A2E | Distinguishable

বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 - Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision

ও U+0993 | ଓ U+0B13 | Confusable

Table 21 - Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision

ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 - Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali Consonants and Their Allographs

Consonants | Phonetic Value | Allographs

Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) জ্ঝ (জ+ঝ), ঞ্ঝ (ঞ+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) ঙ্খ (ঙ+খ)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

ঞ্চ (ঞ+চ), ঞ্ছ (ঞ+ছ), ঞ্জ (ঞ+জ), ঞ্ঝ (ঞ+ঝ), জ্ঞ (জ+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং ŋ ঙ্ক (ঙ+ক), ঙ্ক্র (ঙ+ক+র), ঙ্খ (ঙ+খ), ঙ্গ (ঙ+গ), ঙ্ঘ (ঙ+ঘ), ঙ্ক্ষ (ঙ+ক+ষ) (In some contexts ঙ is replaced by ং, e.g. কং, অং.)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য (ra with ya-phalā) and র্য (ya with repha).

ক্য (ক+য), স্য (স+য), র‍্য (র+য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, Ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]

র r Two manifestations: (i) রেফ repʰ, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র+ধ+ব) (earlier র্দ্ধ্ব = র+দ+ধ+ব, a four-term cluster), etc. (placed over the following consonant); (ii) র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃ, কঃ

অব


No. Unicode Code Point

Glyph

Character Name

Category | Language(s) with EGIDS Value

References and Comment

15 U+0995 ক BENGALI LETTER KA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

16 U+0996 খ BENGALI LETTER KHA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

17 U+0997 গ BENGALI LETTER GA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

18 U+0998 ঘ BENGALI LETTER GHA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

19 U+0999 ঙ BENGALI LETTER NGA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

20 U+099A চ BENGALI LETTER CA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

21 U+099B ছ BENGALI LETTER CHA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

22 U+099C জ BENGALI LETTER JA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

23 U+099D ঝ BENGALI LETTER JHA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

24 U+099E ঞ BENGALI LETTER NYA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

25 U+099F ট BENGALI LETTER TTA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

26 U+09A0 ঠ BENGALI LETTER TTHA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

27 U+09A1 ড BENGALI LETTER DDA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

28 09A1 09BC (U+09DC)

ড় Normalized form of BENGALI LETTER RRA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125] 09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.

29 U+09A2 ঢ BENGALI LETTER DDHA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

30 09A2 09BC (U+09DD)

ঢ় Normalized form of BENGALI LETTER RHA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125] 09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.

31 U+09A3 ণ BENGALI LETTER NNA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]

32 U+09A4 ত BENGALI LETTER TA

Consonant | Bangla (1), Manipuri (2), Assamese (2)

[112] [122] [125]


33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
43 | U+09AF U+09BC (for U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]; U+09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant) / Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122][125]

Table 8: Bangla Code-Point Repertoire

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes two specific sequences that enable the conditional inclusion of BENGALI LETTER A and BENGALI LETTER E followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, again followed by BENGALI VOWEL SIGN AA. These sequences are included in the repertoire to represent the /æ/ sound, as in English 'bat', 'cat', etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference


S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
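The composition of the two sequences in Table 9 can be cross-checked against the Unicode Character Database. The following minimal Python sketch (the dictionary `SEQUENCES` and the helper `describe` are illustrative names, not part of the LGR) lists each code point with its Unicode name and confirms that neither sequence is altered by NFC normalization:

```python
import unicodedata

# Table 9 sequences: vowel + VIRAMA + YA + VOWEL SIGN AA (ya-phala with aa-kara, for /ae/)
SEQUENCES = {
    "S1": "\u0985\u09CD\u09AF\u09BE",  # অ্যা
    "S2": "\u098F\u09CD\u09AF\u09BE",  # এ্যা
}


def describe(seq: str) -> list:
    """Return 'U+XXXX NAME' for every code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


for key, seq in SEQUENCES.items():
    # Neither sequence is changed by NFC normalization
    assert unicodedata.normalize("NFC", seq) == seq
    print(key, seq, "; ".join(describe(seq)))
```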

5.2 Code Point Repertoire Exclusion
There are some characters of the Bangla script that have a place in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it is never used alone. It is used only in sequences, forming the normalized forms of three special characters: ড়, ঢ়, and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ, and U+09AF য to form ড়, ঢ়, and য় respectively

Table 10b: Excluded Code Points
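The unavailability of the precomposed U+09DC, U+09DD and U+09DF is consistent with Unicode normalization: these three characters are composition exclusions, so Normalization Form C (NFC) always replaces each of them with the base letter followed by U+09BC and never recomposes the pair. A minimal Python sketch using only the standard `unicodedata` module (the mapping name `PRECOMPOSED` is illustrative):

```python
import unicodedata

NUKTA = "\u09BC"
# Precomposed nukta letters and the base letters they decompose to
PRECOMPOSED = {
    "\u09DC": "\u09A1",  # ড় RRA -> ড DDA + NUKTA
    "\u09DD": "\u09A2",  # ঢ় RHA -> ঢ DDHA + NUKTA
    "\u09DF": "\u09AF",  # য় YYA -> য YA + NUKTA
}

for composed, base in PRECOMPOSED.items():
    nfc = unicodedata.normalize("NFC", composed)
    # NFC decomposes the single code point into base + NUKTA ...
    assert nfc == base + NUKTA
    # ... and leaves the two-code-point sequence as it is (no recomposition)
    assert unicodedata.normalize("NFC", base + NUKTA) == base + NUKTA
    print(f"U+{ord(composed):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))
```

This is why rows 28, 30 and 43 of Table 8 list the two-code-point sequences rather than the single code points.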

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → Kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. No. | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition


4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the vowel sequence and the consonant sequence pertinent to Bangla are given, to facilitate understanding.

5.4.2 The Vowel Sequence
To facilitate understanding for users of other Brahmi-derived scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) [B], a Candrabindu [D] or a Visarga (biʃɔrgo) [X]. In Bangla, a V is not necessarily restricted to a single following D, B or X. Going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. However, it is valid and possible to have an anusvara combined with a chandrabindu on the one hand, and a visarga combined with a chandrabindu on the other, as in হ্যাংচা 'hænchā' and হ্যাঃ 'hæn' respectively; that is, a Visarga or Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla. A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, 'ya'), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>, and it has a special form, ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA ('aa', ā), it is used for transcribing [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].


Examples:
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
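The vowel-sequence combinations of 5.4.2.1 to 5.4.2.3 can be approximated by a single regular expression. The following Python sketch is illustrative only (the pattern names are hypothetical, not taken from the LGR XML). It encodes V [B | D | X | DB | DX] and V H C M [B | D | X | DB | DX], restricting V H C M to the two Table 9 sequences, as rule 3 of Appendix 10.1 requires:

```python
import re

V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"   # independent vowels (U+098C excluded, Table 10)
B, D, X = "\u0982", "\u0981", "\u0983"           # anusvara, candrabindu, visarga
SIGNS = f"(?:{D}{B}|{D}{X}|{B}|{D}|{X})"          # DB | DX | B | D | X
VHCM = "[\u0985\u098F]\u09CD\u09AF\u09BE"         # only অ্যা and এ্যা (Table 9)

# A single vowel or an allowed VHCM, optionally followed by one sign group
VOWEL_SEQUENCE = re.compile(f"^(?:{VHCM}|{V}){SIGNS}?$")


def is_vowel_sequence(s: str) -> bool:
    return VOWEL_SEQUENCE.match(s) is not None
```

For example, অঁঃ (DX) and অ্যাঃ (VHCMX) are accepted, while অংং (BB), অঃঁ (XD) and ই্যা are rejected.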

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:
CM কি কী कि की
CB কং कं


CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Example:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

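5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):
*3(CH)C
Example:
CHC ন্ত → ন + ্ + ত न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য न्त्र्य → न + ् + त + ् + र + ् + य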

5.4.3.5 Subsets
As a representative example of its subsets, we consider only the combination CHC; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

Example:
CHCM ক্কী → ক + ্ + ক + ী क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ क्कँ → क + ् + क + ँ


CHCX ক্কঃ → ক + ্ + ক + ঃ क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ क्कँः → क + ् + क + ँ + ः
[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example:
CHCMB ক্কীং → ক + ্ + ক + ী + ং क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ क्काँः → क + ् + क + ा + ँ + ः
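The consonant-sequence forms of 5.4.3.1 to 5.4.3.5 can likewise be written as one regular expression. This Python sketch is a hypothetical illustration, not the LGR's own encoding: it accepts a pure consonant (C H), or *3(C H) C followed by an optional kāra and an optional sign group, and treats the nukta sequences for ড়, ঢ়, য় as single consonants:

```python
import re

# Consonant C: a nukta sequence (CN) for ড় ঢ় য়, or any single repertoire consonant
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")
H = "\u09CD"                                                  # hasanta / virama
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                 # kara (matra) signs in Table 8
SIGNS = "(?:\u0981\u0982|\u0981\u0983|\u0982|\u0981|\u0983)"  # DB | DX | B | D | X

# A pure consonant (C H), or *3(C H) C followed by optional M and optional signs
CONSONANT_SEQUENCE = re.compile(f"^(?:{C}{H}|(?:{C}{H}){{0,3}}{C}{M}?{SIGNS}?)$")


def is_consonant_sequence(s: str) -> bool:
    return CONSONANT_SEQUENCE.match(s) is not None
```

With this pattern ন্ত্র্য (four consonants) and ক্কাঁ (CHCMD) are accepted, while a five-consonant cluster, কংং or কীী is rejected.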

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ = ত্

5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full

10 Refer to Rule P in Section 7 Table 16


vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya preceded by the said vowels. This is a purely ligatural entity, and the addition of ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. But, as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case refers to the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by Bangla ra র (U+09B0) or Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.


6. Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. The confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases that belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akshar formation. They can confuse even a careful observer and are hence being proposed as variants.
Variants arise in a script when two or more forms are produced with different storage, i.e. different code points. In Bangla, the e-kāra, the ā-kāra and the o-kāra have different code points. One can type o-kāra with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
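The two ways of typing ko can also be compared at the code-point level. Independently of the kāra-after-kāra rule cited above, Unicode canonical equivalence already treats them as the same string: U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 + U+09BE, so NFC composes the two-key input into the single o-kāra. A minimal Python check with the standard `unicodedata` module (the variable names are illustrative):

```python
import unicodedata

KA, E_KARA, AA_KARA, O_KARA = "\u0995", "\u09C7", "\u09BE", "\u09CB"

one_key = KA + O_KARA             # কো typed with the o-kara key
two_keys = KA + E_KARA + AA_KARA  # কো typed as e-kara + aa-kara

print(len(one_key), len(two_keys))  # different code-point counts: 2 and 3
# Both normalize to the same NFC string, the one with the single o-kara
assert unicodedata.normalize("NFC", two_keys) == one_key
assert unicodedata.normalize("NFD", one_key) == two_keys
```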

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as হ (ha, U+09B9), increasing the risk of errors in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in grantha)
Fonts that represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.
CASE II
Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode code points and 'B_RA' as well as 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: As ক and ঠ look exactly the same in Bangla and Assamese, the resultant combinations with repha look identical; the addition of repha does not make any difference.
'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, this could cause confusion for end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded that র and ৰ should be defined as variant code points, so that a single variant set between র and ৰ covers all cases. However, this will create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. The blocking applies only at the label level; if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
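To illustrate CASE II, the following Python sketch enumerates the candidate variant labels produced by a single variant set {র, ৰ}. It assumes, for illustration only, that every occurrence of র or ৰ may be swapped independently; how each generated label is disposed of (for example, blocked) is decided by the LGR's variant types and is not modelled here. The names `VARIANTS` and `variant_labels` are hypothetical.

```python
from itertools import product

BANGLA_RA, ASSAMESE_RA = "\u09B0", "\u09F0"   # র, ৰ
# One variant set covering both the repha and the ra-phala cases
VARIANTS = {BANGLA_RA: (BANGLA_RA, ASSAMESE_RA), ASSAMESE_RA: (ASSAMESE_RA, BANGLA_RA)}


def variant_labels(label: str) -> set:
    """Every label reachable by swapping any subset of র/ৰ, excluding the label itself."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


print(sorted(variant_labels("\u09B0\u09B0\u09B0")))  # ররর -> 7 variant labels, incl. ৰৰৰ
```

Because the substitution is code point for code point, only labels of the same length are produced, which matches the observation above that ৰৰ or ৰৰৰৰ remain registrable alongside ররর.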

6.2 Cross-Script Variants
A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses Oriya for the script although Odia is now the official term used


1. Bangla and Nāgarī/Devanāgarī Script
Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script
Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
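Tables 14 and 15 list code-point pairs; to see their effect on labels, the following Python sketch applies them under the assumption, made here for illustration only, that a Bangla label has a counterpart in the other script only when every one of its code points has a listed variant (so the counterpart is itself a single-script label). The table and function names are hypothetical.

```python
from typing import Optional

# Tables 14 and 15: Bangla code point -> cross-script variant
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম -> ਸ, ি -> ਿ
}


def cross_script_counterpart(label: str, script: str) -> Optional[str]:
    """Map a Bangla label into `script`, or return None if any code point has no listed variant."""
    table = CROSS_SCRIPT[script]
    if not label or any(ch not in table for ch in label):
        return None
    return "".join(table[ch] for ch in label)


print(cross_script_counterpart("\u09AE\u09BF", "Devanagari"))  # মি -> मि
```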

7. Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in the code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.


D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta

S → S1, S2 (from Table 9), i.e. the (æ) ya-phalā sequence (V1 H C1 M1), where V1 is either U+0985 (অ BENGALI LETTER A) or U+098F (এ BENGALI LETTER E), H is U+09CD (্ BENGALI SIGN VIRAMA), C1 is U+09AF (য BENGALI LETTER YA) and M1 is U+09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either U+09B0 (র BENGALI LETTER RA) or U+09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is U+09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

It is also apt to mention here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].
7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows:


1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1 Case of V Preceded by H.
9. S CANNOT be preceded by H.

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: even if they are not attested, any user can easily create them.
7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of its intended sound.


This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner character from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, the NBGP may consider revisiting this rule.
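The context rules of Section 7.1 can be prototyped before they are encoded in the LGR XML. The Python sketch below is a simplified, hypothetical checker, not the NBGP's implementation. It classifies each code point using Tables 8 and 16, merges ড/ঢ/য + NUKTA into a single C (rule 1), treats S1/S2 as single units that behave like M for whatever follows them, rejects any code point outside the repertoire, and approximates rule 7's P as a HASANTA that follows র or ৰ.

```python
from typing import List, Optional, Tuple

NUKTA, VIRAMA = "\u09BC", "\u09CD"
VOWELS = {chr(c) for c in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}
CONSONANTS = {chr(c) for c in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2,
                                *range(0x09B6, 0x09BA), 0x09F0, 0x09F1]}
MATRAS = {chr(c) for c in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC]}
OTHERS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", VIRAMA: "H", "\u09CE": "Z"}
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}                            # ড ঢ য
RA = {"\u09B0", "\u09F0"}                                              # র ৰ
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # S1, S2

# Rules 2-7: category -> categories allowed immediately before it
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},  # plus P, handled separately below
}


def tokenize(label: str) -> Optional[List[Tuple[str, str]]]:
    """Split a label into (category, text) tokens; None if a code point is outside the repertoire."""
    tokens, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            tokens.append(("S", label[i:i + 4]))
            i += 4
        elif label[i] in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            tokens.append(("C", label[i:i + 2]))  # CN counts as C (rule 1)
            i += 2
        else:
            ch = label[i]
            if ch in VOWELS:
                cat = "V"
            elif ch in CONSONANTS:
                cat = "C"
            elif ch in MATRAS:
                cat = "M"
            elif ch in OTHERS:
                cat = OTHERS[ch]
            else:
                return None
            tokens.append((cat, ch))
            i += 1
    return tokens


def is_valid_label(label: str) -> bool:
    tokens = tokenize(label)
    if not tokens:
        return False
    for i, (cat, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        if prev == "S":
            prev = "M"  # S ends with VOWEL SIGN AA, so B, D, X or Z may follow it
        if cat in ("V", "S") and prev == "H":
            return False  # rules 8 and 9
        if cat == "Z" and prev == "H" and i >= 2 and tokens[i - 2][1] in RA:
            continue  # P + Z, e.g. র্ৎ
        if cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            return False
    return True
```

For instance, বাংলা, অ্যাসিড and ভর্ৎসনা pass, while কংং (two anusvāras), ক্অ (V after H), ক্ৎ (Khanda Ta after a non-RA Hasanta) and a label containing the precomposed U+09DC are rejected.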

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word.
Example:
্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक
As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is quasi-automatically applied wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.
Example:
অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid:


Example:
কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one.
Example:
কীী कीी
5. M is not permitted after V.
Example:
ইা ঈৗ ईा ईौ
6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Example:
কংঃ कंः
কঃং कःं

8. Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka


9. References
[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.
[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.
[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.
[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.
[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.
[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan o Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.
[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.
[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.
[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.
[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.
[110] Pandey, Anshuman. 2011. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.
[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.
[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti o Bangla Boi. Kolkata: Ababhas.
[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.
[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.
[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.
[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.
[117] ----- (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). 2017. Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.
[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.
[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.
[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.
[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.
[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.
[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.
[125] Bengali alphabet for Manipuri found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.
[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.
[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.
[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.
[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.
[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.
[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.
[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.
[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.
[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 12 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.
[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.
[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.
[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language 36(1): 22-59.
[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.
[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.
[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.
[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.
[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.
[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.
[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.
[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.
[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.
[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.
[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has happened So Far in Terms of Script Reforms'. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.


10. Appendix I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, though we can cite a couple of native examples, e.g. the word য়্যাব্বড়ো (yæbbɔṛo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, e.g. ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র, U+09B0):
র্ৎ as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
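The three Bangla-specific restrictions above sit on top of the general WLE rules of Section 7. A minimal Python sketch of just these three checks follows; the constant and function names are hypothetical. As footnote 13 notes, these restrictions concern the Bangla language only, which is why rule 2 here admits only র, whereas the note in 5.4.4.2 also allows ৰ for Assamese.

```python
KHANDA_TA, VIRAMA, BANGLA_RA = "\u09CE", "\u09CD", "\u09B0"
NO_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}  # rule 1: ৎ, ঞ, ঙ may not start a label
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # rule 3: অ্যা, এ্যা
VOWELS = {chr(c) for c in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}


def bangla_restrictions_ok(label: str) -> bool:
    """Apply only the three Appendix 10.1 restrictions; the general WLE rules are not checked here."""
    if not label or label[0] in NO_INITIAL:
        return False
    for i, ch in enumerate(label):
        # Rule 2: C + HASANTA before KHANDA TA only when C is Bangla RA (U+09B0)
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == VIRAMA and label[i - 2] != BANGLA_RA:
            return False
        # Rule 3: a vowel followed by HASANTA must begin one of the two allowed V H C M sequences
        if ch in VOWELS and label[i + 1:i + 2] == VIRAMA and label[i:i + 4] not in ALLOWED_VHCM:
            return False
    return True
```

Under these checks ভর্ৎসনা and অ্যাসিড pass, while ৎসেরিং, ঙা, ই্যা and ভৰ্ৎ are rejected.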

10.2 'Sylheti Nagarī Lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924), used by accountants (perhaps of the Kāyastha community) in eastern Uttar Pradesh and Bihar and widely in use during the 1880s. There were several other names of Sylheti

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.


Nagarī or Siloti (129), such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi
Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali consonants and their allographs

Consonant | Phonetic value | Allographs: clusters and their transparent form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ; র্ৎ (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but it is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (in some contexts ঙ is replaced by ং, e.g. কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When it is added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it only doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য(p+য) স্য(+য) র‍্য(+য) [র‍্য on its own is never used in Bangla orthography. র‍্যা is used, but its last two symbols, Ya-phala and ā-kāra, then form a vowel sign representing the vowel অ্যা.]

র r Two manifestations:
i. রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (+deg+ব) (earlier র্দ্ধ্ব = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant)
ii. র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃ, কঃ

অব


No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment

22 | U+099C | জ | BENGALI LETTER JA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
23 | U+099D | ঝ | BENGALI LETTER JHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
24 | U+099E | ঞ | BENGALI LETTER NYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
25 | U+099F | ট | BENGALI LETTER TTA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
26 | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
27 | U+09A1 | ড | BENGALI LETTER DDA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]


28 | U+09A1 U+09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125] U+09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
30 | U+09A2 U+09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125] U+09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]


33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125] U+09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | Bangla (1), Manipuri (2) | [112] [125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112] [122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112] [122] [125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112] [122] [125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122] [125]

Table 8 – Bangla Code-Point Repertoire

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes two specific sequences for conditional inclusion in the repertoire. Each consists of BENGALI LETTER A or E, followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, and then BENGALI VOWEL SIGN AA. These sequences represent the æ sound, as in English 'bat', 'cat', etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9 – Sequences
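As a cross-check on Table 9, the sketch below writes S1 and S2 out as Python strings using the standard `unicodedata` module. The constant and function names are hypothetical and are not taken from the LGR XML file. The sketch confirms each member's Unicode name and that both sequences are already in Normalization Form C, which IDNA2008 labels must be.

```python
import unicodedata

# Hypothetical constants for the Table 9 sequences (illustration only).
S1 = "\u0985\u09CD\u09AF\u09BE"  # অ + VIRAMA + YA + AA, rendered অ্যা
S2 = "\u098F\u09CD\u09AF\u09BE"  # এ + VIRAMA + YA + AA, rendered এ্যা

def describe(seq):
    """Return the Unicode character names of each code point in seq."""
    return [unicodedata.name(ch) for ch in seq]

for label, seq in (("S1", S1), ("S2", S2)):
    # NFC leaves both sequences unchanged, so they can appear in labels as-is.
    assert unicodedata.normalize("NFC", seq) == seq
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in seq), describe(seq))
```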

5.2 Code Point Repertoire Exclusion

Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Points | Glyph | Character Names | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10 – Excluded Code Points

5.3 Code point not used alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a standalone code point because it is never used alone. It is used only as part of the sequences that form the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively.

Table 10b – Excluded Code Points
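Python's standard `unicodedata` module shows why rows 28, 30 and 43 of Table 8 list the nukta sequences instead of the precomposed letters U+09DC, U+09DD and U+09DF. Each of the three is a composition exclusion, so NFC always returns the base consonant followed by NUKTA. This is consistent with the standards constraint the table mentions. The sketch below is minimal, and the helper name is hypothetical.

```python
import unicodedata

NUKTA = "\u09BC"
# Precomposed letters mentioned in rows 28, 30 and 43 of Table 8,
# mapped to the base consonant of their canonical decomposition.
PRECOMPOSED = {"\u09DC": "\u09A1", "\u09DD": "\u09A2", "\u09DF": "\u09AF"}

def nfc_form(ch):
    """Return the NFC form of ch as a tuple of U+XXXX labels."""
    return tuple(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", ch))

for pre, base in PRECOMPOSED.items():
    # Composition exclusion: NFC decomposes the letter and never recomposes it.
    assert unicodedata.normalize("NFC", pre) == base + NUKTA
    print(f"U+{ord(pre):04X} ->", " ".join(nfc_form(pre)))
```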

5.4 The Basis of Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (different versions) done for भारत or ভারত and drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The ABNF uses the following operators:

Sr Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11 – The ABNF Formalism


5.4.2 The Vowel Sequence

The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary.

A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra/onuʃʃār (B), a Candrabindu (D) or a Visarga/biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla is not necessarily restricted to one. However, the rules illustrated in this document make it clear that formations such as VDD, VBB and VXX are invalid orthographic units. By contrast, it is valid to have an anusvara followed by a chandrabindu, and a visarga followed by a chandrabindu. Examples are হ্যাংচা 'hænchā' and হ্যাঃ 'hæn' respectively. A Visarga or Anusvāra (onuʃʃār) following a Candrabindu is also possible in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. The Unicode Standard describes Ya-phala as follows: "Ya-phala is a presentation form of U+09AF Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla sign Hasanta or Virama), U+09AF য BENGALI LETTER YA>." Ya-phala has a special form, ্য. When it is combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used to transcribe [æ], as in the "a" of the English word "bat", which is written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples

VB অং अं

VD অঁ अँ

VX অঃ अः

VDB অঁং अँं

VDX অঁঃ अँः

VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ
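The vowel-sequence shapes in 5.4.2.1 to 5.4.2.3 can be summarized as one regular expression over the category symbols. The sketch below is a restatement, not the panel's ABNF. It assumes the label has already been mapped to category letters. It also cannot enforce that C and M inside HCM must be য and া (Table 9), so that check would have to be done separately. The function name is hypothetical.

```python
import re

# Hypothetical restatement of 5.4.2: a vowel, an optional HCM (the æ sequence),
# then at most one of B, D, X, DB or DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")

def is_vowel_sequence(categories):
    """True if the category string matches the 5.4.2 vowel-sequence shape."""
    return VOWEL_SEQUENCE.fullmatch(categories) is not None

examples = ["V", "VB", "VD", "VX", "VDB", "VDX", "VHCM", "VHCMDX"]
invalid = ["VDD", "VBB", "VXX", "VBX"]
print([is_vowel_sequence(s) for s in examples])  # all True
print([is_vowel_sequence(s) for s in invalid])   # all False
```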

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example

CM িকক -कक

CB কং कं

CD কঁ कँ

CX কঃ कः

CH ক্ क् (pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Example CMB কীংকং कक

CMD কাঁ काँ

CMX বীঃ वीः

CMDB কাঁং काँं

CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example:

CHC ন্ত → ন + ্ + ত  न + ् + त

CHCHC ন্ত্র → ন + ্ + ত + ্ + র  न + ् + त + ् + र

CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য  न + ् + त + ् + र + ् + य

5.4.3.5 Subsets

As a representative example, only the combination CHC is considered below; however, the same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:

CHCM ক্কী → ক ্ ক ী  क्की → क ् क ी

CHCB ক্কং → ক ্ ক ং  क्कं → क ् क ं

CHCD ক্কঁ → ক ্ ক ঁ  क्कँ → क ् क ँ

CHCX ক্কঃ → ক ্ ক ঃ  क्कः → क ् क ः

CHCDB ক্কঁং → ক ্ ক ঁ ং  क्कँं → क ् क ँ ं

CHCDX ক্কঁঃ → ক ্ ক ঁ ঃ  क्कँः → क ् क ँ ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example

CHCMB ক্কীং → ক ্ ক ী ং  क्कीं → क ् क ी ं

sup3ং rarr ক ক ং 4क rarr क क

CHCMD ক্কাঁ → ক ্ ক া ঁ  क्काँ → क ् क ा ँ

CHCMX ক্কীঃ → ক ্ ক ী ঃ  क्कीः → क ् क ी ः

CHCMDB ক্কাঁং → ক ্ ক া ঁ ং  क्काँं → क ् क ा ँ ं

CHCMDX ক্কাঁঃ → ক ্ ক া ঁ ঃ  क्काँः → क ् क ा ँ ः
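The consonant-sequence shapes of 5.4.3.1 to 5.4.3.5 can be summarized the same way. The sketch below covers up to three (CH) pairs, a final C, and then either a bare H (pure consonant) or an optional M followed by at most one of B, D, X, DB or DX. It is a restatement over category letters, not the panel's ABNF. The regex and function name are hypothetical.

```python
import re

# Hypothetical restatement of 5.4.3 as a regex over category letters:
# *3(CH) C, then either H (pure consonant) or [M] followed by [B|D|X|DB|DX].
CONSONANT_SEQUENCE = re.compile(r"(?:CH){0,3}C(?:H|M?(?:DB|DX|B|D|X)?)")

def is_consonant_sequence(categories):
    """True if the category string matches the 5.4.3 consonant-sequence shape."""
    return CONSONANT_SEQUENCE.fullmatch(categories) is not None

valid = ["C", "CM", "CH", "CDX", "CHC", "CHCHCHC", "CHCMDB", "CHCHCHCMDX"]
invalid = ["CHCHCHCHC", "CMM", "CHH", "CBB", "CHM"]
print([is_consonant_sequence(s) for s in valid])    # all True
print([is_consonant_sequence(s) for s in invalid])  # all False
```

The limit of three (CH) pairs encodes the "up to 4" consonants of 5.4.3.4. A five-consonant cluster such as CHCHCHCHC is therefore rejected.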

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single 'Khanda'-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.

Note: In this KHANDA TA context, C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

5.4.5 Special Cases S and P

Two special cases involving sequences are referred to as S and P in Table 16 under Section 7. They are described briefly here, starting with S.

In two instances in Bangla, Hasanta (U+09CD) is preceded by a full vowel: U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E. To render Ya-phalā following অ or এ, one must type U+09CD plus U+09AF ya after the vowel. This is a purely ligatural entity. Adding Ya-phalā and ā-kāra elicits the æ sound, as in English 'bat', 'fat', etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. Bangla orthography nevertheless has a deviant feature here: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The second case, P, concerns the formation of repha and ra-phalā in the script. Because it co-occurs with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance:
- repha = ra + Hasanta + C, e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun";
- ra-phalā = C + Hasanta + ra, e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra "cycle".

In both cases the ra slot can hold the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā remain the same either way.

¹⁰ Refer to Rule P in Section 7, Table 16.

6 Variants

This section discusses the variants in the Bangla script. The NBGP divides these confusing variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Cases that are confusable but not identical are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed as variants. They are not cases of mere visual similarity, since they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are therefore proposed as variants.

Variants arise in a script when two or more forms are built from different stored code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one keystroke, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell apart the two ko (কো), one typed with a single key and the other with two keys. However, this will not be considered a case of variants, because a kāra followed by a kāra is not allowed.
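The two ways of typing কো can be compared with Python's standard `unicodedata` module. U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 U+09BE, so NFC (which IDNA2008 requires of labels) turns the two-key form into the one-key form. The `has_double_kara` helper is a hypothetical illustration of the "kāra followed by a kāra" restriction. It treats the range U+09BE to U+09CC as kāras, which is an assumption that presumes the input is limited to the Table 8 repertoire.

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09C7", "\u09BE", "\u09CB"

one_key = KA + O_KAR             # কো typed with U+09CB
two_keys = KA + E_KAR + AA_KAR   # কো typed as e-kāra + ā-kāra

# Different code point sequences that render identically...
assert one_key != two_keys
# ...but U+09CB decomposes canonically to U+09C7 U+09BE, so NFC unifies them.
assert unicodedata.normalize("NFC", two_keys) == one_key

def has_double_kara(label):
    """Hypothetical check: True if two dependent vowel signs (kāras) are adjacent."""
    karas = {chr(cp) for cp in range(0x09BE, 0x09CD)}  # assumed kāra range
    return any(a in karas and b in karas for a, b in zip(label, label[1:]))

print(has_double_kara(one_key), has_double_kara(two_keys))  # False True
```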

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I

The first case of true variants in Bangla involves thа থ (tha, U+09A5) in a Hasanta conjunct with স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

In traditional orthography, these combinations can be slightly confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown in example no. 1 below). A user could also type it wrongly, thinking it was হ (ha, U+09B9), which increases the risk of errors in label writing and identification.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

Fonts that represent the traditional Bangla writing system tend to create this problem. These may therefore be treated as cases of variants in Bangla.

CASE II

Another interesting case of variants arises with the ra + Hasanta and Hasanta + ra combinations in labels written in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variants appear when typing 'repha' (ra + Hasanta) and 'ra-phalā' (Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same, as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12 – Example of Repha

Note: Since ক and ঠ look exactly the same in Bangla and Assamese, the resulting combinations with Repha look identical; adding Repha makes no visible difference.

On similar grounds, 'Ra-phala' can be formed by two sequences, and the final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13 – Example of Ra-phala

Because the Assamese and Bangla Repha and Ra-phala conjunct forms look the same, they could confuse end users. The repha and ra-phala cases therefore need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels. For example, if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register a different label, e.g. "ৰৰ" or "ৰৰৰৰ", that is still possible.
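Below is a minimal sketch of how such variant labels can be enumerated. It assumes the common LGR approach in which every combination of per-code-point variants is a variant label, so a label with n occurrences of র/ৰ has 2^n forms including itself. The sketch treats every variant other than the registered label as blocked. That is an assumption: the actual disposition depends on the variant types defined in the LGR XML file. The table and function names are hypothetical.

```python
from itertools import product

# Hypothetical variant table: র (U+09B0) and ৰ (U+09F0) are mutual variants.
VARIANTS = {"\u09B0": ("\u09B0", "\u09F0"), "\u09F0": ("\u09F0", "\u09B0")}

def variant_labels(label):
    """Return every label obtained by choosing র or ৰ at each RA position."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)}

registered = "\u09B0\u09B0\u09B0"                  # ররর
blocked = variant_labels(registered) - {registered}
print(len(blocked))                                 # 7
print("\u09F0\u09F0\u09F0" in blocked)              # True: ৰৰৰ is blocked
print("\u09F0\u09F0" in blocked)                    # False: ৰৰ is a different label
```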

6.2 Cross-Script Variants

A focused cross-script study has been carried out for Bangla against sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya). It considered the visual and technical confusion these scripts may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Some other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

¹¹ Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 – Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 – Bangla and Gurmukhī cross-script variant code points
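Because a root zone label must be written in a single script, a Bangla label can have a cross-script variant label only if every one of its code points has a mapping in Tables 14 and 15. The minimal sketch below, whose dictionary and function names are hypothetical, makes that consequence explicit.

```python
# Hypothetical lookup built from Tables 14 and 15 (Bangla -> other script).
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম->म, ি->ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম->ਸ, ি->ਿ
}

def cross_script_variant(label, script):
    """Return the variant label in `script`, or None if any code point lacks a mapping."""
    table = CROSS_SCRIPT[script]
    if all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None

print(cross_script_variant("\u09AE\u09BF", "Devanagari"))  # मि
print(cross_script_variant("\u0995\u09AE", "Gurmukhi"))    # None: ক has no mapping
```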

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules required by all the languages mentioned in Section 3.2 when they are written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category listed in the Code point repertoire table (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1 or S2 (from Table 9), i.e. the (æ) Ya-phalā sequence (V1 H C1 M1), where:
- V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E);
- H is 09CD (্ BENGALI SIGN VIRAMA);
- C1 is 09AF (য BENGALI LETTER YA);
- M1 is 09BE (া BENGALI VOWEL SIGN AA).
S1 and S2 are valid even though the other context rules would not allow them.
P → Ra-Hasanta (C2 H), where:
- C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL);
- H is 09CD (্ BENGALI SIGN VIRAMA).

Table 16 – Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth noting that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters". In theory, two to four consonants can conjoin in this way and create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters": 290 are bi-consonantal, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows:

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C.

Example

3. M must be preceded by C. Example: কা
4. D must be preceded by V, C or M. Example: আঁ খঁ খাঁ হাঁ
5. X must be preceded by V, C, M or D. Example: উঃখঃবঃাঃ দ ঃ
6. B must be preceded by V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed separately because S ends with M, so Z may also follow S.)
8. V CANNOT be preceded by H. See 7.1.1, Case of V preceded by H, for details.
9. S CANNOT be preceded by H.
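The interaction of rules 1 to 9 can be made concrete with the minimal Python checker sketched below. It is not the NBGP's implementation and does not read the XML LGR file. The function names are hypothetical. The code point ranges used to assign V, C and M are assumptions that presume the label has already been restricted to the Table 8 repertoire. S (Table 9) is collapsed into a single token, which is how the sketch exempts it from rule 2. P is recognized only where rule 7 needs it.

```python
# Hypothetical sketch of the Section 7.1 WLE rules (1-9); not the LGR XML.
V1 = ("\u0985", "\u098F")                     # অ, এ: first member of S
SEQ_TAIL = "\u09CD\u09AF\u09BE"               # ্ + য + া completes S1/S2
RA = ("\u09B0", "\u09F0")                     # র, ৰ: first member of P
NUKTA_BASES = ("\u09A1", "\u09A2", "\u09AF")  # ড, ঢ, য form ড় ঢ় য় (CN)
SPECIAL = {"\u0981": "D", "\u0982": "B", "\u0983": "X",
           "\u09CD": "H", "\u09CE": "Z"}

def category(ch):
    """Map a repertoire code point to its WLE symbol (ranges are assumed)."""
    cp = ord(ch)
    if ch in SPECIAL:
        return SPECIAL[ch]
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x0995 <= cp <= 0x09B9 or ch in ("\u09F0", "\u09F1"):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return None

def tokenize(label):
    """Split a label into (symbol, text) tokens, folding S and CN sequences."""
    tokens, i = [], 0
    while i < len(label):
        ch = label[i]
        if ch in V1 and label[i + 1:i + 4] == SEQ_TAIL:
            tokens.append(("S", label[i:i + 4]))
            i += 4
            continue
        if ch == "\u09BC":  # nukta only as part of ড় ঢ় য় (rule 1)
            if not tokens or tokens[-1][1] not in NUKTA_BASES:
                return None
            tokens[-1] = ("C", tokens[-1][1] + ch)
        else:
            sym = category(ch)
            if sym is None:
                return None
            tokens.append((sym, ch))
        i += 1
    return tokens

# Symbols allowed immediately before each symbol. S ends with M, so it may
# precede D, X, B and Z just as M does.
ALLOWED_BEFORE = {
    "H": {"C"},                                # rule 2
    "M": {"C"},                                # rule 3
    "D": {"V", "C", "M", "S"},                 # rule 4
    "X": {"V", "C", "M", "D", "S"},            # rule 5
    "B": {"V", "C", "M", "D", "S"},            # rule 6
    "Z": {"V", "C", "M", "D", "B", "X", "S"},  # rule 7 (P handled below)
}

def is_valid_label(label):
    tokens = tokenize(label)
    if not tokens:
        return False
    for k, (sym, _) in enumerate(tokens):
        prev = tokens[k - 1][0] if k else None
        if sym in ("V", "S") and prev == "H":  # rules 8 and 9
            return False
        if sym in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[sym]:
            # Rule 7 also admits P (র/ৰ + H) immediately before Z.
            is_p = sym == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA
            if not is_p:
                return False
    return True

for word in ("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE",  # ভর্ৎসনা
             "\u0995\u0982\u0982",                           # কংং
             "\u0985\u09AB\u09CD\u0987"):                    # অফ্ই
    print(word, is_valid_label(word))  # True, False, False
```

The rejections listed in 7.2 (for example a double B, D or X, a second M, M after V, and B next to X) fall out of rules 3 to 6 in this sketch. They need no separate checks.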

Each rule is elaborated below with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage. There is no harm in including such ligatures, because any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H

In multi-word domains, V may need to be allowed to follow an H. An example is ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋkʌv ɪndiə) (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Here two words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require H to be present explicitly to represent the intended sound fully. By and large, however, the first word written without an H (U+09CD) is considered enough to represent its intended sound.

This situation arises only because hyphen, space and the Zero Width Non-Joiner character are absent from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting it could make two labels (with and without H) perceptually similar for the majority of the linguistic communities, so the NBGP explicitly prohibits it. If prevailing requirements from the community call for it, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that illustrate some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word.

Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such combinations automatically result in a "golu", or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example

অ্ अ्

অং क ক क কঃ कः ি )क

3. Only one B, D or X is permitted after a Consonant, Vowel or kāra (Mātrā). The following combinations are therefore invalid:

Example

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कीःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. Only one M is permitted after a Consonant.

Example

কীী कीी

5. M is not permitted after V.

Example

ইা ঈৌ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example

কংঃ कंः

কঃং कःं

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon (Pachgaon), Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References[100] UnicodeConsortium2017UnicodeStandard100MountainViewCA

[101] BandyopadhyayChittaranjan1981DuiShatakerBanglaMudranoPrakashanKolkataAnandaPublishers

[102] BanerjiRD1919TheOriginoftheBengaliScriptKolkataNewDelhiAsianEducationalServices2003reprint

[103] ChatterjiSK1926TheOriginandDevelopmentoftheBengaliLanguageCalcuttaCalcuttaUniversityPress

[104] -----1939Bhasha-prakashBangalaVyakaran(AGrammaroftheBengaliLanguage)CalcuttaUniversityofCalcutta

[105] HaiMuhammadAbdul1964DhvaniVijnanOBanglaDhvani-tattwa(PhoneticsandBengaliPhonology)DhakaBanglaAcademy

[106]JhaSubhadra1958TheFormationofMaithiliLondonLuzacampCo

[107] KosticDjordjeDasRheaS1972AShortOutlineofBengaliPhoneticsCalcuttaStatisticalPublishingCompany

[108] MajumdarRC1971HistoryofAncientBengalCalcuttaGBhardwaj

[109] MazumdarBijaychandra19202000TheHistoryoftheBengaliLanguage(ReprCalcutta1920ed)NewDelhiAsianEducationalServices

[110] PandeyAnshuman2001ProposaltoEncodetheTirhutaScriptinISOIEC10646

[111] PalPalashBaran2001DhwanimalaBarnamalaKolkataPapyrus

[112] -----2007lsquoBanglaHarapherPanchParbarsquoInSwapanChakrabortyedMudranerSanskritiOBanglaBoiKolkataAbabhas

[113] RossFiona1999ThePrintedBengaliCharacteranditsEvolutionLondonCurzon

[114] ShastriMahamahopadhyayHaraPrasad1916HājārBacharērPurāṇaBāṅgālāBhāṣāyBauddhaGānōDōhāCalcuttaBangiyaSahityaParishat

[115] SinghUdayaNarayana(JointlyManiruzzaman)1983DiglossiainBangladeshandlanguageplanningCalcuttaGyanBharati214pp

[116] -----1987ABibliographyofBengaliLinguisticsMysoreCIILxii+316pp

[117] -----2017(withRajibChakrabortyBidishaBhattacharjeeampArimardanKumarTripathy)LanguagesandCulturesontheMarginGuidelinesforFieldworkonEndangeredLanguagesMimeoCentreforEndangeredLanguagesVisva-Bharati

54

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue. Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'Phones of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646 (UNICODE). https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0: Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place. These restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র) (U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
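Rules 1 and 3 above are simple enough to check mechanically. The sketch below is only an illustration, not part of the NBGP's LGR XML: it assumes labels are Python strings of code points, implements just rule 1 (no initial ৎ, ঞ or ঙ) and rule 3 (vowel + virama only inside অ্যা or এ্যা), and the helper name `appendix_violations` and the independent-vowel set are hypothetical.

```python
# Hypothetical sketch of Appendix-I rules 1 and 3; not taken from the LGR XML.
VIRAMA = "\u09CD"
FORBIDDEN_INITIAL = {
    "\u09CE": "KHANDA TA",  # ৎ
    "\u099E": "NYA",        # ঞ
    "\u0999": "NGA",        # ঙ
}
# Rule 3: the only vowel + virama + consonant + kāra sequences allowed.
ALLOWED_VHCM = {
    "\u0985\u09CD\u09AF\u09BE",  # অ্যা
    "\u098F\u09CD\u09AF\u09BE",  # এ্যা
}
# Assumed independent-vowel set: U+0985..U+098B plus এ, ঐ, ও, ঔ.
INDEPENDENT_VOWELS = {chr(c) for c in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}


def appendix_violations(label: str) -> list:
    """List rule 1 and rule 3 violations found in a label."""
    problems = []
    if label and label[0] in FORBIDDEN_INITIAL:
        problems.append(f"rule 1: label starts with {FORBIDDEN_INITIAL[label[0]]}")
    for i, ch in enumerate(label):
        # A vowel directly followed by a virama must begin অ্যা or এ্যা.
        if ch in INDEPENDENT_VOWELS and label[i + 1:i + 2] == VIRAMA:
            if label[i:i + 4] not in ALLOWED_VHCM:
                problems.append(f"rule 3: vowel + virama at index {i} is not অ্যা or এ্যা")
    return problems


print(appendix_violations("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
print(appendix_violations("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))        # ৎসেরিং
```

The sketch follows rule 1 as stated, so it rejects ৎসেরিং even though the text notes that such transliterations are starting to appear.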

10.2 'Sylheti Nāgarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names for Sylheti Nāgarī or Siloti (129), such as 'Jalalabada Nāgarī', 'Fula (flower) Nāgarī', 'Muslim Nāgarī' or 'Muhammad Nāgarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17 – The Script Table of Sylheti Nāgarī or Siloti

10.3 Confusable code points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points


11 Appendix-II

Bengali consonants and their allographs

Consonants | Phonetic Value | Allographs (Clusters) | Allographs (Transparent Form, Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts aelig is replaced by ং.) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it is the vowel æ (as in শ্যামল, র‍্যাপার etc.). After a non-initial consonant, however, it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র্ + য) [Just র‍্য is never used in Bangla orthography; র‍্যা is, but then its last two symbols, Ya-phala and a-kara, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:

i. রেফ repʰa, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র্ + ধ্ + ব) (earlier র্দ্ধ্ব = র্ + দ্ + ধ্ + ব, a four-term cluster) etc. (placed over the following consonant)

ii. র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment

28 | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125] 09DC is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
29 | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
30 | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125] 09DD is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
31 | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
32 | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]


33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125] 09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla; 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla; 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla; 2 Assamese; 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla; 2 Manipuri; 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese; 2 Manipuri | [122][125]

Table 8 – Bangla Code-Point Repertoire
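One way to cross-check the Character Name column of Table 8 is against the Unicode Character Database that ships with Python's standard `unicodedata` module. The sketch below covers only a sample of the single-code-point rows shown above; the dictionary `TABLE_8_ROWS` and the helper `name_mismatches` are hypothetical names, and the result depends on the Unicode version bundled with the Python interpreter (all of these characters are long-standing).

```python
import unicodedata

# Sample of single-code-point rows from Table 8: row number -> (code point, listed name).
TABLE_8_ROWS = {
    29: (0x09A2, "BENGALI LETTER DDHA"),
    44: (0x09B0, "BENGALI LETTER RA"),
    50: (0x09BE, "BENGALI VOWEL SIGN AA"),
    56: (0x09C4, "BENGALI VOWEL SIGN VOCALIC RR"),
    59: (0x09CB, "BENGALI VOWEL SIGN O"),
    61: (0x09CD, "BENGALI SIGN VIRAMA"),
    62: (0x09CE, "BENGALI LETTER KHANDA TA"),
    63: (0x09F0, "BENGALI LETTER RA WITH MIDDLE DIAGONAL"),
    64: (0x09F1, "BENGALI LETTER RA WITH LOWER DIAGONAL"),
}


def name_mismatches(rows):
    """Return the row numbers whose listed name differs from the UCD name."""
    return [
        row
        for row, (cp, listed) in sorted(rows.items())
        if unicodedata.name(chr(cp), "<unnamed>") != listed
    ]


print(name_mismatches(TABLE_8_ROWS))  # an empty list means every sampled name matches
```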

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion, in the repertoire, of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA. These sequences allow the æ sound, as in English 'bat', 'cat', etc., to be written.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference


S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9 – Sequences

5.2 Code Point Repertoire Exclusion

There are some characters of the Bangla script that have a place in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10 – Excluded Code Points


5.3 Code point not used alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It is used only in sequence, in the normalized forms of three special characters: ড়, ঢ় and য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively.

Table 10b – Excluded Code Points
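The sequence forms in rows 28, 30 and 43 of Table 8 are consistent with Unicode normalization: U+09DC, U+09DD and U+09DF are canonical composition exclusions, so Normalization Form C (the form IDNA2008 expects for labels) always yields the consonant followed by NUKTA, never the precomposed letter. A quick check with Python's standard `unicodedata` module; the mapping name `NUKTA_FORMS` is illustrative only.

```python
import unicodedata

# Precomposed letter -> the consonant + NUKTA sequence listed in Table 8.
NUKTA_FORMS = {
    "\u09DC": "\u09A1\u09BC",  # ড় = ড + NUKTA
    "\u09DD": "\u09A2\u09BC",  # ঢ় = ঢ + NUKTA
    "\u09DF": "\u09AF\u09BC",  # য় = য + NUKTA
}

for precomposed, sequence in NUKTA_FORMS.items():
    nfc = unicodedata.normalize("NFC", precomposed)
    # Composition exclusions decompose under NFC and are never recomposed,
    # so the sequence form is also left unchanged by NFC.
    print(
        f"U+{ord(precomposed):04X} ->",
        " ".join(f"U+{ord(c):04X}" for c in nfc),
        nfc == sequence,
        unicodedata.normalize("NFC", sequence) == sequence,
    )
```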

5.4 The Basis of Present IDN

The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr Number | Operator | Function
1 | "|" | Alternative
2 | "[ ]" | Optional
3 | "*" | Variable Repetition
4 | "( )" | Sequence Group

Table 11 – The ABNF Formalism
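To make the variables concrete, the sketch below maps a label to its string of ABNF symbols. It rests on stated assumptions: the vowel set is assumed (rows 1-27 of Table 8 are not reproduced here), a NUKTA after ড, ঢ or য is folded into the preceding consonant (the CN forms of rule 1 in Section 7.1), and `to_symbols` is a hypothetical helper rather than anything defined by the LGR.

```python
# Assumed symbol classes; the vowel set in particular is an assumption.
V = {chr(c) for c in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}
C = {chr(c) for c in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2,
                      *range(0x09B6, 0x09BA), 0x09F0, 0x09F1]}
M = {chr(c) for c in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC]}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA, NUKTA_BASES = "\u09BC", {"\u09A1", "\u09A2", "\u09AF"}


def to_symbols(label: str) -> str:
    """Translate a label into ABNF symbol letters, e.g. 'কা' -> 'CM'."""
    out, i = [], 0
    while i < len(label):
        ch = label[i]
        if ch in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            out.append("C")  # ড়, ঢ় or য় counts as one consonant
            i += 2
            continue
        if ch in V:
            out.append("V")
        elif ch in C:
            out.append("C")
        elif ch in M:
            out.append("M")
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        i += 1
    return "".join(out)


print(to_symbols("\u09A8\u09CD\u09A4\u09CD\u09B0"))  # ন্ত্র
```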

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence

To help users of other Brāhmī scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). The number of D, B or X that can follow a V in Bangla is not always restricted to one. Going by the rules illustrated in this document, however, formations such as VDD, VBB and VXX are clearly invalid orthographic units. It is nevertheless valid and possible to have combinations of chandrabindu and anusvara on the one hand, and of chandrabindu and visarga on the other, as in হ্যাঁংচা 'hæŋchā' and হ্যাঁঃ 'hæŋ' respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu exists in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. Ya-phala is a presentation form of U+09AF, the Bangla letter য or 'ya', represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla SIGN Hasanta or VIRAMA), U+09AF য BENGALI LETTER YA>; it has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), it is used for transcribing [æ], as in the "a" of the English word "bat", written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].


Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
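The combinations listed in 5.4.2.1 to 5.4.2.3 form a small regular language over the ABNF symbols. Assuming a vowel sequence has already been written as a string of the letters V, H, C, M, B, D and X, the pattern below accepts exactly V, VB, VD, VX, VDB, VDX, VHCM and VHCM followed by B, D, X, DB or DX. It is a sketch only: it does not check which vowel and consonant fill the VHCM slot (only অ্যা and এ্যা of Table 9 are allowed), and `VOWEL_SEQUENCE` is a hypothetical name.

```python
import re

# Vowel sequence in ABNF-like notation: V [H C M] [B | D | X | D B | D X]
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is one valid vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ["V", "VB", "VDB", "VHCM", "VHCMDX", "VDD", "VBB", "VXX", "VBD"]:
    print(s, is_vowel_sequence(s))
```

Formations such as VDD, VBB and VXX are rejected because at most one optional suffix is allowed, and VBD fails because only D may come first in a two-sign suffix.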

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A Consonant is optionally followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:

CM কি কু कि कु
CB কং कं


CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can be optionally followed by B, D, X, DB or DX.

Example:

CMB কীং কুং कीं कुं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example:

CHC ন্ত → ন + ্ + ত (न + ् + त)
CHCHC ন্ত্র → ন + ্ + ত + ্ + র (न + ् + त + ् + र)
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য (न + ् + त + ् + र + ् + य)

5.4.3.5 Subsets

In considering its subsets, we take the combination CHC as a representative example; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:

CHCM ক্কী → ক + ্ + ক + ী (क्की → क + ् + क + ी)
CHCB ক্কং → ক + ্ + ক + ং (क्कं → क + ् + क + ं)
CHCD ক্কঁ → ক + ্ + ক + ঁ (क्कँ → क + ् + क + ँ)


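CHCX ক্কঃ → ক + ্ + ক + ঃ (क्कः → क + ् + क + ः)
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং (क्कँं → क + ् + क + ँ + ं)
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ (क्कँः → क + ् + क + ँ + ः)

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example:

CHCMB ক্কীং → ক + ্ + ক + ী + ং (क्कीं → क + ् + क + ी + ं)
ক্কুং → ক + ্ + ক + ু + ং (क्कुं → क + ् + क + ु + ं)
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ (क्काँ → क + ् + क + ा + ँ)
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ (क्कीः → क + ् + क + ी + ः)
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং (क्काँं → क + ् + क + ा + ँ + ं)
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ (क्काँः → क + ् + क + ा + ँ + ः)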
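Likewise, the consonant patterns of 5.4.3.1 to 5.4.3.5 (at most three C H pairs before the final consonant, as in *3(CH)C, plus the pure consonant CH) can be written as one pattern over symbol strings. This is a sketch under the same assumption that the label has already been mapped to symbol letters; it does not enforce the Khanda Ta or RA-specific conditions, and `CONSONANT_SEQUENCE` is a hypothetical name.

```python
import re

# *3(C H) C [M] [B | D | X | D B | D X], or a single pure consonant C H
CONSONANT_SEQUENCE = re.compile(r"(?:CH){0,3}CM?(?:DB|DX|B|D|X)?|CH")


def is_consonant_sequence(symbols: str) -> bool:
    """True if the whole symbol string is one valid consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


for s in ["CH", "CHCHCHC", "CHCMDX", "CHCHCHCHC", "CMM", "CBX"]:
    print(s, is_consonant_sequence(s))
```

Five joined consonants (CHCHCHCHC), two kāras (CMM) and anusvāra followed by visarga (CBX) are all rejected, matching the limits stated above.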

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A single 'Khanda'-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
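The [CH]Z condition in the note above can be sketched as a contextual check on the code points immediately before each ৎ. The function `khanda_ta_context_ok` is hypothetical; it tests only this C + HASANTA context and leaves the other Khanda Ta constraints (rule 7 of Section 7.1) aside.

```python
KHANDA_TA, VIRAMA = "\u09CE", "\u09CD"
RA_FORMS = {"\u09B0", "\u09F0"}  # Bangla র and Assamese ৰ


def khanda_ta_context_ok(label: str) -> bool:
    """Return False if some ৎ follows C + HASANTA where C is not a RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == VIRAMA:
            if label[i - 2] not in RA_FORMS:
                return False
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                        # ক্ৎ
```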

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya, preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant. Only consonants can have the Hasanta marked. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

Another case refers to the formation of repha and ra-phalā in the said script, referred to as P in Table 16. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra "cycle"). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.


6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla Akshar formation. They can cause confusion even to a careful observer and hence are being proposed as variants.

Variants arise in a script when two or more visually identical forms have different storage, i.e. different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
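The কো example is also covered by Unicode canonical equivalence: U+09CB BENGALI VOWEL SIGN O decomposes to U+09C7 U+09BE, so once a label is normalized to NFC (as IDNA2008 expects) the two keyings become the same code point sequence. A quick check with Python's standard `unicodedata` module; the variable names are illustrative.

```python
import unicodedata

one_key = "\u0995\u09CB"         # ক + ো (VOWEL SIGN O)
two_keys = "\u0995\u09C7\u09BE"  # ক + ে (VOWEL SIGN E) + া (VOWEL SIGN AA)

print(one_key == two_keys)                                # different code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # NFC composes ে + া into ো
print(unicodedata.normalize("NFD", one_key) == two_keys)  # NFD splits ো back apart
```

Because NFC leaves no separate e-kāra + ā-kāra pair, the "kāra followed by kāra" rule and normalization point the same way.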

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I

As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears in a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


If written in traditional orthography, the above combinations could be a little confusing, because the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly by someone who takes it for a হ (ha, U+09B9), increasing the risk of errors in label writing and identification.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

The fonts which represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II

Another interesting example of a variant is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' as well as 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র), or A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12 – Example of Repha


Note: As Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with Repha look identical; which RA produced the Repha makes no visible difference. 'Ra-phala' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13 – Example of Ra-phala

As the Assamese and Bangla Repha and Ra-phala conjunct forms look the same, they could confuse end users. Hence the repha and ra-phala cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: for example, if someone registers "ররর", the label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. The block applies only at the label level; if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, this is still possible.
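The two in-script cases can be sketched as variant-label generation in the spirit of an LGR: each র/ৰ may be swapped for the other anywhere (Case II), and থ/হ may be swapped only directly after স্ or ন্ (Case I). This is a simplified model that assumes every variant mapping is symmetric and ignores the dispositions (such as "blocked") that the actual LGR XML attaches; `variant_labels` and `choices_at` are hypothetical helpers.

```python
from itertools import product

VIRAMA = "\u09CD"
RA_PAIR = ("\u09B0", "\u09F0")         # র, ৰ (Case II)
THA_HA = ("\u09A5", "\u09B9")          # থ, হ (Case I)
THA_HA_CONTEXT = {"\u09B8", "\u09A8"}  # স, ন


def choices_at(label: str, i: int) -> tuple:
    """Code points that may stand at position i in some variant label."""
    ch = label[i]
    if ch in RA_PAIR:
        return RA_PAIR
    if ch in THA_HA and i >= 2 and label[i - 1] == VIRAMA and label[i - 2] in THA_HA_CONTEXT:
        return THA_HA
    return (ch,)


def variant_labels(label: str) -> set:
    """All labels reachable by variant substitution, including the original."""
    options = [choices_at(label, i) for i in range(len(label))]
    return {"".join(combo) for combo in product(*options)}


labels = variant_labels("\u09B0\u09B0\u09B0")  # ররর
print(len(labels), "\u09F0\u09F0\u09F0" in labels)
```

Because substitution never changes the length of a label, registering ররর touches ৰৰৰ but not ৰৰ or ৰৰৰৰ, which matches the behaviour described above.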

6.2 Cross Script Variants

A detailed cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

¹¹ Unicode uses Oriya for the script, although Odia is now the official term.


1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 – Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 – Bangla and Gurmukhī cross-script variant code points
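Under these two tables, a cross-script variant label can only arise when every code point of a Bangla label has a counterpart in the other script. A minimal sketch using just the pairs of Tables 14 and 15 follows; the mapping names and `cross_script_variant` are hypothetical, and a real LGR would express this through its variant and disposition machinery rather than a plain lookup.

```python
from typing import Dict, Optional

# Cross-script variant pairs from Tables 14 and 15.
BENGALI_TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম→म, ি→ि
BENGALI_TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম→ਸ, ি→ਿ


def cross_script_variant(label: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the mapped label if every code point has a counterpart, else None."""
    if label and all(ch in mapping for ch in label):
        return "".join(mapping[ch] for ch in label)
    return None


print(cross_script_variant("\u09AE\u09BF", BENGALI_TO_DEVANAGARI))  # মি -> मि
print(cross_script_variant("\u0995\u09BF", BENGALI_TO_DEVANAGARI))  # কি has no counterpart
```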

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table provided under the code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16 – Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are given below:


1. C is a set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, য়.
2. H must be preceded by C.
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁখ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ খঃ বঃাঃ দ ঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ কৎ াৎ াৎ প6ৎ rৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures: any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋkʌv ɪndiə) (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning Bank of India. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.


This is a unique situation, necessitated by the lack of the hyphen, space or Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
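Rules 1 to 9 are all conditions on the symbol immediately before each position, so they can be sketched as a single left-to-right pass. The code below is an illustration, not the normative LGR XML: the vowel set is assumed, S1/S2 and the CN forms are tokenized as single units, allowing B, D and X after S is an assumption drawn from Section 5.4.2.3, and `tokenize` and `wle_errors` are hypothetical names. The Bank of India label is written with য + NUKTA instead of U+09DF, in line with the repertoire.

```python
# Assumed symbol classes (the vowel set in particular is an assumption).
V = {chr(c) for c in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}
C = {chr(c) for c in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2,
                      *range(0x09B6, 0x09BA), 0x09F0, 0x09F1]}
M = {chr(c) for c in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC]}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA, NUKTA_BASES = "\u09BC", {"\u09A1", "\u09A2", "\u09AF"}
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # S1, S2
RA = {"\u09B0", "\u09F0"}  # C2 of P = C2 H

# Rules 2-7: symbols allowed immediately before each symbol.
MAY_FOLLOW = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "S"},  # P is handled separately
}


def tokenize(label: str) -> list:
    """Split a label into (symbol, text) tokens; S1/S2 and CN forms are single tokens."""
    tokens, i = [], 0
    while i < len(label):
        ch = label[i]
        if label[i:i + 4] in S_SEQUENCES:
            step, sym = 4, "S"
        elif ch in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            step, sym = 2, "C"  # rule 1: CN counts as C
        elif ch in V:
            step, sym = 1, "V"
        elif ch in C:
            step, sym = 1, "C"
        elif ch in M:
            step, sym = 1, "M"
        elif ch in SIGNS:
            step, sym = 1, SIGNS[ch]
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        tokens.append((sym, label[i:i + step]))
        i += step
    return tokens


def wle_errors(label: str) -> list:
    """Return a description of every WLE rule violation in the label."""
    tokens = tokenize(label)
    errors = []
    for i, (sym, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i else "start"
        # Rule 7 also allows Z after P, i.e. after RA + HASANTA.
        after_p = sym == "Z" and prev == "H" and i >= 2 and tokens[i - 2][1] in RA
        if sym in MAY_FOLLOW and prev not in MAY_FOLLOW[sym] and not after_p:
            errors.append(f"{sym} cannot follow {prev} (token {i})")
        if sym in ("V", "S") and prev == "H":  # rules 8 and 9
            errors.append(f"{sym} cannot follow H (token {i})")
    return errors


bank_of_india = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985"
                 "\u09AB\u09CD\u0987\u09A8\u09CD\u09A1\u09BF\u09AF\u09BC\u09BE")
print(wle_errors(bank_of_india))  # the ই after অফ্ is reported under rule 8
```

The same pass also rejects the invalid examples of Section 7.2, such as কংঃ (X after B) and কীী (M after M).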

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help to explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

ক क

াক ाक

ংক क

ক क

ঃক ःक

As can be seen, such combinations automatically result in a “golu” or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example

অ अ

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a Consonant, a Vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalid:


Example

কংং क

ক क

কঃঃ कःः

কা का

কীঃঃ कःः

অংং अ

অ अ

অঃঃ अःः

4. The number of M permitted after a Consonant is restricted to one. Example:

কীী की

5. M is not permitted after V. Example:

ইাঈৗ ईाईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example

কংঃ कः

কঃং कः

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt of Bangladesh
Prof Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof Rafiqul Islam, National Professor of Humanities, Dhaka
Prof Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt of Bangladesh


Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore, Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka


9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M. S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue, Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri found in Ethnologue, Manipuri (Meeteilon/Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia, Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot, Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia, Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia, Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘Phones of Bengali’. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia, Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some structures of the Formalism do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Bangla does not normally allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়8াvেড়া (yæbbɔRo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning (as in য়াwব). In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example xেসিরং (Tsering).

2. C H can come with Khanda Ta only in the case where C is ra (র, U+09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
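Restrictions 1 to 3 are narrow enough to check directly on the code-point string. The Python sketch below illustrates them under stated assumptions and is not part of the proposal: `bangla_restrictions` is a hypothetical helper, it treats U+0985 to U+0994 as the independent vowels, it checks only the Bangla RA (U+09B0) for restriction 2 because this appendix does not cover Assamese, and it leaves out the initial য় case because the text cites attested exceptions.

```python
from typing import List

# Hypothetical helper illustrating Appendix I restrictions 1-3 for Bangla.
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}      # KHANDA TA, NYA, NGA
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE",             # A + Hasanta + YA + AA
                "\u098F\u09CD\u09AF\u09BE"}             # E + Hasanta + YA + AA
VIRAMA, KHANDA_TA, BANGLA_RA = "\u09CD", "\u09CE", "\u09B0"


def bangla_restrictions(label: str) -> List[str]:
    """Return messages for each Appendix I restriction the label breaks."""
    problems: List[str] = []
    # Restriction 1: these letters may not begin a label.
    if label and label[0] in FORBIDDEN_INITIAL:
        problems.append(f"label may not start with U+{ord(label[0]):04X}")
    for i, ch in enumerate(label):
        # Restriction 2: C + Hasanta before Khanda Ta only when C is RA.
        if (ch == KHANDA_TA and i >= 2 and label[i - 1] == VIRAMA
                and label[i - 2] != BANGLA_RA):
            problems.append(f"Khanda Ta at {i} follows a Hasanta not preceded by RA")
        # Restriction 3: an independent vowel + Hasanta must start an allowed VHCM.
        if 0x0985 <= ord(ch) <= 0x0994 and label[i + 1:i + 2] == VIRAMA:
            if label[i:i + 4] not in ALLOWED_VHCM:
                problems.append(f"vowel + Hasanta at {i} is not an allowed VHCM")
    return problems
```

Under these assumptions, ভর্ৎসনা and অ্যাসিড pass, while a label starting with ৎ or containing ক্ৎ is reported.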

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. Sylheti Nagarī or Siloti [129] has had several other names, such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points


11 Appendix-II: Bengali Consonants and Their Allographs

Consonants  Phonetic Value  Allographs: Clusters, Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র). There is a marked form of ত + ্ = ৎ; ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts aelig is replaced by ং: কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s & ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it represents the vowel æ (as in শ8ামল, র 8াপার, etc.). After a non-initial consonant, however, it simply doubles that consonant in pronunciation (as in কায6, ধায6, etc.). The র + য combination has two physical manifestations: র 8 and য6.

ক8(p+য)স8(+য)র 8(+য) [র 8 alone is never used in Bangla orthography; র 8া is, but then its last two symbols, Ya-phala + a-kara, constitute a vowel sign representing the vowel অ8া.]


র r Two manifestations:
i. রেফ (repʰ) as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant)
ii. র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment

33 | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
34 | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
35 | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
36 | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
37 | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
38 | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
39 | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]


40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | U+09AF U+09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]; U+09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]


47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]


54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]


61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | 1 Bangla, 2 Assamese, 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122][125]

Table 8 – Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. They enable the conditional inclusion of Bangla LETTER A and LETTER E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, in order to represent the æ sound as in English ‘bat’, ‘cat’, etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference


S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9 – Sequences

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script that are encoded in Unicode have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10 – Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire as a standalone code point, since it is never used alone. It is used only inside sequences, in the normalized forms of three special characters: ড়, ঢ়, য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively.

Table 10b – Excluded Code Points
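The nukta forms can stay as base + NUKTA sequences because NFC never recomposes them: U+09DC, U+09DD and U+09DF decompose canonically into base + U+09BC and are composition exclusions in Unicode, so normalization always yields the two-code-point sequences. The Python sketch below uses only the standard `unicodedata` module to demonstrate this. `to_lgr_form` and `has_lone_nukta` are hypothetical helper names, and flagging a NUKTA after any other base is a reading of Table 10b rather than a rule quoted from the LGR XML.

```python
import unicodedata

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}   # DDA, DDHA, YA


def to_lgr_form(label: str) -> str:
    """NFC; U+09DC, U+09DD and U+09DF come out as base + NUKTA."""
    return unicodedata.normalize("NFC", label)


def has_lone_nukta(label: str) -> bool:
    """True if a NUKTA appears anywhere other than after DDA, DDHA or YA."""
    label = to_lgr_form(label)
    return any(ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES)
               for i, ch in enumerate(label))


for precomposed in ("\u09DC", "\u09DD", "\u09DF"):
    decomposed = to_lgr_form(precomposed)
    print(unicodedata.name(precomposed), "->",
          " ".join(f"U+{ord(c):04X}" for c in decomposed))
# prints:
# BENGALI LETTER RRA -> U+09A1 U+09BC
# BENGALI LETTER RHA -> U+09A2 U+09BC
# BENGALI LETTER YYA -> U+09AF U+09BC
```

This is also why row 43 of Table 8 lists য় as U+09AF U+09BC rather than U+09DF.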

5.4 The Basis of the Present IDN
The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) uses the following operators:

Sr Number  Operator  Function
1  “|”  Alternative
2  “[ ]”  Optional
3  “*”  Variable Repetition
4  “( )”  Sequence Group

Table 11 – The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To facilitate understanding for users of other Brahmi scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel, which may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). Repetitions of the same mark are not allowed: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, a Candrabindu may combine with an Anusvāra or a Visarga, as in হ8াংচা ‘hænchā’ and হ8াঃ ‘hæn’ respectively; in other words, a Visarga or an Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla.
A vowel can also optionally be followed by a Hasanta/Virama [H] + Consonant [C] combination to form a Ya-phalā. “Ya-phalā is a presentation form of U+09AF, Bangla letter য or ‘ya’.” It is represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA> and has the special form ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].


Examples

VB অং अ

VD অ अ

VX অঃ अः

VDB অং अ

VDX অঃ अः

VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
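Since the productions in 5.4.2.1 to 5.4.2.3 combine the ABNF symbols only through alternation and optionality, they translate directly into a regular expression. The sketch below assumes the label has already been mapped to a string of ABNF symbols (for example অ্যাং becomes VHCMB); that mapping step and the name `is_vowel_sequence` are illustrative and not part of the proposal.

```python
import re

# 5.4.2.1-5.4.2.3 over ABNF symbols: V, optionally HCM, then optionally one
# of B | D | X | DB | DX. Repeated marks such as VDD therefore cannot match.
MODIFIER = "(?:DB|DX|B|D|X)"
VOWEL_SEQUENCE = re.compile(f"V(?:HCM)?{MODIFIER}?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if a string of ABNF symbols forms exactly one vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None
```

For instance, `is_vowel_sequence("VHCMDB")` is true, while `"VDD"` and `"VBD"` are rejected, matching the invalid formations noted in 5.4.2.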

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example

CM িকক -कक

CB কং क


CD ক क

CX কঃ कः

CH p क (pure consonant)

CDB কং क

CDX কঃ कः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Example CMB কীংকং कक

CMD কা का

CMX বীঃ वीः

CMDB কাং का

CMDX কাঃ काः

5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example:

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त

CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र

CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य

5.4.3.5 Subsets
In considering the subsets, we take the combination CHC as a representative example; the same applies equally to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

Example:
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी

CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं

CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ


CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः

CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं

CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं

sup3ং rarr ক ক ং 4क rarr क क

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ

CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः

CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं

CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
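Sections 5.4.3.1 to 5.4.3.5 can likewise be read as one pattern. The ABNF repetition `*3(CH)C` means zero to three Consonant + Hasanta pairs followed by a final consonant, which is `(?:CH){0,3}C` in regex notation. That is followed by an optional kāra and an optional B, D, X, DB or DX; a bare trailing Hasanta is allowed only on a single consonant (5.4.3.2). The sketch below assumes the label has already been mapped to ABNF symbols (for example ক্কীং becomes CHCMB); `is_consonant_sequence` is a hypothetical helper name.

```python
import re

MODIFIER = "(?:DB|DX|B|D|X)"
# Either a single pure consonant (CH), or *3(CH)C with optional M and modifier.
CONSONANT_SEQUENCE = re.compile(f"(?:CH|(?:CH){{0,3}}CM?{MODIFIER}?)")


def is_consonant_sequence(symbols: str) -> bool:
    """True if a string of ABNF symbols forms exactly one consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None
```

With this pattern, a five-consonant cluster such as `"CHCHCHCHC"` is rejected because the repetition stops at three CH pairs, in line with the limit of four consonants in 5.4.3.4.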

5.4.4 The Khanda Ta Sequence

5.4.4.1 A Single ‘Khanda’ Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: In this context, the condition for KHANDA TA is that C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
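The Khanda Ta sequence is small enough to express directly over code points: a bare ৎ, or RA (U+09B0 or U+09F0) plus Hasanta plus ৎ. The regular expression below is a minimal sketch of that condition; `is_khanda_ta_sequence` and `khanda_ta_contexts` are hypothetical helper names.

```python
import re
from typing import List

# Z alone, or [CH]Z where C is RA: U+09B0 (Bangla) or U+09F0 (Assamese).
KHANDA_TA_SEQUENCE = re.compile("(?:[\u09B0\u09F0]\u09CD)?\u09CE")


def is_khanda_ta_sequence(text: str) -> bool:
    """True if the text is exactly one valid Khanda Ta sequence."""
    return KHANDA_TA_SEQUENCE.fullmatch(text) is not None


def khanda_ta_contexts(label: str) -> List[str]:
    """Each Khanda Ta in the label, with its RA + Hasanta prefix if present."""
    return [m.group() for m in KHANDA_TA_SEQUENCE.finditer(label)]
```

For ভর্ৎসনা, `khanda_ta_contexts` returns the single context র্ৎ, while `is_khanda_ta_sequence("ক্ৎ")` is false because ক is not a RA.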

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are briefly described here. Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render a Ya-phalā after অ or এ, one must type U+09CD plus U+09AF ya after the vowel. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. By nature, the Brāhmī script does not have a Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As we see here, however, Bangla orthography ends up with a deviant feature in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case, referred to as P in Table 16, concerns the formation of repha and ra-phalā in the script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra “cycle”). The point is that in both cases the slot for ra can hold either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same.

¹⁰ Refer to Rule P in Section 7, Table 16.
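Both special cases can be located mechanically: S is one of the two fixed four-code-point sequences অ্যা and এ্যা, and P is a RA adjacent to a Hasanta, forming a repha when the RA comes first and a ra-phalā when the Hasanta comes first. The sketch below reports each occurrence and which RA was used, information relevant to the variant treatment of repha and ra-phalā. It is illustrative only: `special_cases` is a hypothetical name, and reporting a RA with a Hasanta on both sides as ra-phalā is an assumption.

```python
from typing import List, Tuple

VIRAMA = "\u09CD"
RA_NAMES = {"\u09B0": "Bangla RA", "\u09F0": "Assamese RA"}
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE",   # A + Hasanta + YA + AA
               "\u098F\u09CD\u09AF\u09BE")   # E + Hasanta + YA + AA


def special_cases(label: str) -> List[Tuple[int, str, str]]:
    """List (index, kind, detail) for each S sequence, repha and ra-phala."""
    found: List[Tuple[int, str, str]] = []
    for i, ch in enumerate(label):
        if label.startswith(S_SEQUENCES, i):
            found.append((i, "S", label[i:i + 4]))
        elif ch in RA_NAMES and i > 0 and label[i - 1] == VIRAMA:
            found.append((i, "P: ra-phala", RA_NAMES[ch]))   # C + H + RA
        elif ch in RA_NAMES and label[i + 1:i + 2] == VIRAMA and i + 2 < len(label):
            found.append((i, "P: repha", RA_NAMES[ch]))      # RA + H + C
    return found
```

For অর্ক this reports one repha written with the Bangla RA; for চক্ৰ it reports one ra-phalā written with the Assamese RA.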


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusable cases into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases belonging to Group 2, however, are proposed as variants. These are not cases of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla Akshar formation. They can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when two or more identical-looking forms have different storage, that is, different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have separate code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot distinguish between the two ko (কো), one typed with a single key and the other with two different keys. However, this is not treated as a case of variants, because a kāra followed by a kāra is not allowed.

6.1 In-Script Variants
We do, however, propose two cases of true in-script variants in the Bangla script.
CASE I
The first case involves conjuncts in which থ (tha, U+09A5) follows a Hasanta after স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


When written in traditional orthography, the above combinations can be confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown in example 1 below). A user could also mistype it as হ (ha, U+09B9), increasing the risk of errors in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)
Fonts that represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.
CASE II
Another variant case arises with the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). It occurs when typing the ‘repha’ (ra + Hasanta) and the ‘ra-phalā’ (Hasanta + ra). A repha can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and ‘B_RA’ and ‘A_RA’ refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

Where

B_RA = U+09B0 BENGALI LETTER RA (র), or
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12 – Example of Repha


Note: Since ক and ঠ are the same in Bangla and Assamese, the resulting combinations with the two repha sequences look identical; the choice of RA makes no visible difference. On similar grounds, a ‘ra-phalā’ can be formed by two sequences whose final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda Ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13 – Example of Ra-phala

As the Assamese and Bangla Repha and Ra-phala conjunct forms look the same, this could confuse end users. Hence the repha and ra-phala cases need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. However, this will create blocked variant labels: e.g., if someone registers “ররর”, the label “ৰৰৰ” will be generated as a variant and will be blocked, and vice versa. The blocking applies only to that label; if someone else needs to register other labels, e.g. “ৰৰ” or “ৰৰৰৰ”, it is still possible.
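The effect of a single variant set {র, ৰ} on allocation can be illustrated with a short Python sketch. It assumes, purely for illustration, that র and ৰ are the only variant pair and that every generated variant label is blocked; the helper name `variant_labels` is hypothetical, and the normative variant dispositions are those in the LGR XML file.

```python
from itertools import product

# The single variant set proposed in Section 6.1, CASE II:
# BENGALI LETTER RA (U+09B0) <-> BENGALI LETTER RA WITH MIDDLE DIAGONAL (U+09F0).
RA_VARIANTS = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}


def variant_labels(label):
    """All labels reachable by swapping র/ৰ at any subset of positions,
    excluding the original label (hypothetical helper)."""
    choices = [(ch, RA_VARIANTS[ch]) if ch in RA_VARIANTS else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


blocked = variant_labels("ররর")
print(len(blocked))       # 7, i.e. 2**3 combinations minus the registered label
print("ৰৰৰ" in blocked)   # True
print("ৰৰ" in blocked)    # False: a label of a different length is unaffected
```

Because variants are generated position by position, a label containing n occurrences of র or ৰ yields 2^n - 1 variant labels, while labels of a different length never collide with it.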

6.2 Cross Script Variants

A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses “Oriya” for the script, although “Odia” is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 - Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 - Bangla and Gurmukhī cross-script variant code points
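The two tables can be applied mechanically. The Python sketch below assumes, for illustration only, that a Bangla label has a cross-script variant label only when every one of its code points appears in the table; that whole-label rule and the helper name `cross_script_variant` are assumptions, not the normative LGR behaviour.

```python
# Cross-script variant pairs from Table 14 (Devanagari) and Table 15 (Gurmukhi).
BANGLA_TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম→म, ি→ि
BANGLA_TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম→ਸ, ি→ਿ


def cross_script_variant(label, table):
    """Rewrite the label code point by code point in the other script, or return
    None when some code point has no cross-script variant (hypothetical helper)."""
    if not label or any(ch not in table for ch in label):
        return None
    return "".join(table[ch] for ch in label)


print(cross_script_variant("মি", BANGLA_TO_DEVANAGARI))  # मि
print(cross_script_variant("মি", BANGLA_TO_GURMUKHI))    # ਸਿ
print(cross_script_variant("মা", BANGLA_TO_DEVANAGARI))  # None: া has no variant
```

Since only ম and ি are mapped, only labels built entirely from these two code points (such as মি) would have a cross-script variant label under this assumption.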

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table in the Code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), i.e. (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even where they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16 - Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters”: two to four consonants can theoretically conjoin and combine to create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”, out of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows:

1. C is the set of C and CN, where CN is the set of normalized forms of ড় ঢ় য়.
2. H must be preceded by C.
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
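The nine rules can be approximated by a small checker that first tokenizes a label into the categories of Table 16 and then tests each token against its predecessor. This is a sketch under stated assumptions, not the LGR's XML rules: the category sets approximate Table 8 (consonants are taken as the range U+0995 to U+09B9 plus U+09F0/U+09F1), S is accepted wherever M is accepted before B, D, X or Z (following the note under rule 7 and Section 5.4.2.3), and NFC is used so that a precomposed U+09DF becomes U+09AF U+09BC. The names `tokenize` and `wle_valid` are hypothetical.

```python
import unicodedata

# Category sets approximating Tables 8 and 16 (assumption, not the normative repertoire).
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
KARAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড ঢ য, giving ড় ঢ় য় (rule 1, CN)
RA = {"\u09B0", "\u09F0"}  # র and ৰ, the C2 of symbol P
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2

# Rules 2-7: each category may only follow one of the listed categories.
MUST_FOLLOW = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},
}


def is_consonant(ch):
    # Approximation: Bengali consonant range plus U+09F0/U+09F1.
    return "\u0995" <= ch <= "\u09B9" or ch in "\u09F0\u09F1"


def tokenize(label):
    """Split a label into (category, text) pairs."""
    label = unicodedata.normalize("NFC", label)  # U+09DF becomes U+09AF U+09BC
    tokens, i = [], 0
    while i < len(label):
        if label.startswith(S_SEQUENCES, i):
            tokens.append(("S", label[i:i + 4]))
            i += 4
            continue
        ch = label[i]
        if ch in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            tokens.append(("C", label[i:i + 2]))
            i += 2
            continue
        if ch in VOWELS:
            cat = "V"
        elif ch in KARAS:
            cat = "M"
        elif ch in SIGNS:
            cat = SIGNS[ch]
        elif is_consonant(ch):
            cat = "C"
        else:
            cat = "?"  # outside this sketch's repertoire
        tokens.append((cat, ch))
        i += 1
    return tokens


def wle_valid(label):
    tokens = tokenize(label)
    for i, (cat, _) in enumerate(tokens):
        if cat == "?":
            return False
        prev = tokens[i - 1][0] if i > 0 else None
        # Symbol P: RA + H immediately before Z.
        if cat == "Z" and prev == "H" and i >= 2 and tokens[i - 2][1] in RA:
            prev = "P"
        if cat in MUST_FOLLOW and prev not in MUST_FOLLOW[cat]:
            return False
        if cat in ("V", "S") and prev == "H":  # rules 8 and 9
            return False
    return True


print(wle_valid("ভর্ৎসনা"))  # True
print(wle_valid("ক্ৎ"))      # False: Khanda Ta after H needs RA before the H
```

A sign at the start of a label has no predecessor, so rules 2 to 7 also reject labels beginning with H, M, B, D, X or Z.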

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋk ʌv ɪndiə) (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of the sound intended for the first word.

This is a unique situation, necessitated by the lack of the hyphen, space or Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence this is explicitly prohibited by the NBGP. In future, if required by the prevailing needs of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given just for reference purposes and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक

As can be seen, such combinations automatically result in a “golu” or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S. Example:

অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D or X permitted after a consonant, a vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalid. Example:

কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example: কীী कीी

5. M is not permitted after V. Example: ইা ঈৌ ईा ईौ

6. The combinations of Anusvāra + Visarga as well as Visarga + Anusvāra are not permissible. Example:

কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.
[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.
[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).
[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.
[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.
[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.
[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.
[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.
[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.
[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.
[110] Pandey, Anshuman. 2011. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.
[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.
[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.
[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.
[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.
[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.
[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.
[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.
[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.
[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.
[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.
[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.
[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.
[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.
[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.
[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.
[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.
[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.
[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.
[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.
[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.
[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.
[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.
[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1123-45.
[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 121-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study
[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 12 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.
[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.
[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.
[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘Phones of Bengali’. Language Vol. 36, No. 1, pp. 22-59.
[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.
[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.
[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.
[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.
[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.
[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.
[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.
[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.
[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.
[151] Sarkar, Pabitra. 2013. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.
[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the structures of the formalism do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র, 09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
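The three restrictions can be checked directly on the code points of a label. The sketch below is illustrative only: the helper name `passes_restrictions` is hypothetical, and, following this Bangla-only section, restriction 2 admits only র (U+09B0), whereas symbol P in Table 16 also admits ৰ (U+09F0).

```python
# Restriction 1 (Section 10.1): code points that may not start a label.
NOT_INITIAL = {"\u09CE", "\u099E", "\u0999"}  # ৎ, ঞ, ঙ
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"
BANGLA_RA = "\u09B0"
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
# Restriction 3: the only V + H + C + M combinations allowed (S1 and S2 of Table 9).
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা


def passes_restrictions(label):
    if not label or label[0] in NOT_INITIAL:
        return False
    for i, ch in enumerate(label):
        # Restriction 2: C + H before Khanda Ta only when C is ra (র).
        if (ch == KHANDA_TA and i >= 2 and label[i - 1] == HASANTA
                and label[i - 2] != BANGLA_RA):
            return False
        # Restriction 3: H right after an independent vowel only inside S1/S2.
        if (ch == HASANTA and i > 0 and label[i - 1] in VOWELS
                and label[i - 1:i + 3] not in ALLOWED_VHCM):
            return False
    return True


for word in ["অ্যাসিড", "ভর্ৎসনা", "ৎসেরিং", "ক্ৎ"]:
    print(word, passes_restrictions(word))
```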

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’

This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

Table 17 - The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 - Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 - Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 - Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 - Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 - Bangla and Oriya distinguishable code points

11 Appendix-II

Bengali consonants and their allographs

Consonant | Phonetic value | Allographs (clusters) | Transparent form (Bangla Akademi font)
প | p | প্ত, প্ন, প্প, প্য, প্র, প্ল, প্স |
ফ | pʰ | ফ্র, ফ্ল |
ব | b | ব্জ, ব্দ, ব্ধ, ব্ব, ব্য, ব্র, ব্ল, ব্ভ | ব্ধ, হ্ব
ভ | bʱ | ভ্য, ভ্র, ভ্ল |
ত | t | ত্ত, ত্ত্য, ত্ত্ব, ত্থ, ত্ন, ত্য, ত্ম, ত্ব, ত্র, প্ত, ক্ত, ক্ত্ব, ন্ত, ন্ত্র্য, ন্ত্র; there is a marked form of ত + ্ = ৎ, as in র্ৎ | ক্ত
থ | tʰ | থ্য, থ্র, স্থ, ত্থ, ন্থ | ন্থ, স্থ
দ | d | দ্গ, দ্ঘ, দ্দ, দ্ধ, দ্য, দ্ব, দ্ভ, দ্র, ন্দ, ন্দ্র, র্দ্র | দ্গ, দ্ধ
ধ | dʱ | ধ্ন, ধ্ম, ধ্য, ধ্র, গ্ধ, দ্ধ, ন্ধ | দ্ধ, ব্ধ, ন্ধ
ট | ʈ | ট্ট, ট্য, ট্ব, ট্র, ক্ট, ষ্ট |
ঠ | ʈʰ | ঠ্য, ণ্ঠ, ষ্ঠ |
ড | ɖ | ড্ড, ড্য, ড্র |
ঢ | ɖʱ | ঢ্য, ণ্ঢ |
চ | tʃ | চ্চ, চ্ছ, চ্ছ্র, চ্ঞ, চ্য, ঞ্চ, শ্চ | ঞ্চ
ছ | tʃʰ | ছ্র, চ্ছ, ঞ্ছ, শ্ছ | ঞ্ছ
জ | dʒ | জ্জ, জ্জ্ব, জ্ঝ, জ্ঞ, জ্য, জ্র, ঞ্জ | ঞ্জ
ঝ | dʒʱ | (not privileged enough to have clusters as a first member) জ্ঝ, ঞ্ঝ |
ক | k | ক্ক, ক্ট, ক্ত, ক্ত্র, ক্ত্ব, ক্ন, ক্ব, ক্ম, ক্য, ক্র, ক্ষ, ক্ষ্ণ, ক্ষ্ম, ক্ষ্ব, ক্ষ্য, ক্স, ঙ্ক | ক্ত, ক্ত্র, ক্ত্ব, ক্র, ঙ্ক
খ | kʰ | (not privileged enough to have clusters as a first member) ঙ্খ |
গ | g | গ্গ, গ্দ, গ্ধ, গ্ন, গ্ব, গ্ম, গ্য, গ্র, গ্ল, ঙ্গ, র্ঙ্গ | ঙ্গ
ঘ | gʱ | ঘ্ন, ঘ্য, ঘ্র, ঙ্ঘ |
ঞ | This letter does not have any particular phonetic value but is mostly pronounced as n. | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ, জ্ঞ | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ
ণ | n | ণ্ট, ণ্ঠ, ণ্ড, ণ্ড্র, ণ্ঢ, ণ্ণ, ণ্য, ণ্ব, ক্ষ্ণ, ষ্ণ, হ্ণ | ণ্ড, ণ্ড্র, ষ্ণ
ঙ, ং | ŋ | ঙ্ক, ঙ্ক্র, ঙ্খ, ঙ্গ, ঙ্ঘ, ঙ্ক্ষ (in some contexts ঙ is replaced by ং, as in কং, অং) | ঙ্ক, ঙ্গ, ঙ্ঘ
ম | m | ম্ল, ম্প, ম্প্র, ম্ভ, ম্ম, ম্র, ত্ম, ধ্ম, হ্ম, ক্ষ্ম | হ্ম
ন | n | ন্ট, ন্ট্র, ন্ড, ন্ড্র, ন্ত, ন্ত্র, ন্ত্র্য, ন্থ, ন্দ, ন্দ্র, ন্ধ, ন্ধ্র, ন্দ্ব, ন্ন, ন্ম, ন্য, ন্স, হ্ন | ন্থ, ন্ধ
শ | ʃ | শ্চ, শ্ছ, শ্ন, শ্ম, শ্র, শ্ল, শ্য |
ষ | ʃ | ষ্ক, ষ্ট, ষ্ঠ, ষ্ণ, ষ্প, ষ্প্র, ষ্ফ, ষ্ট্র, ষ্য, ক্ষ, ক্ষ্ণ, ক্ষ্ম | ষ্ণ
স | s, ʃ | স্ক, স্ট, স্প, স্ফ, স্ত, স্থ, স্খ, স্য, স্র, স্ল, ক্স | স্থ
হ | h | হ্ণ, হ্ন, হ্ম, হ্য, হ্র, হ্ল | হ্ম
ড় | ɽ | ড়্গ |
ঢ় | ɽʱ | (not privileged enough to have clusters) |
য | dʒ | The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant in a word, it is the vowel æ (as in শ্যামল, র‍্যাপার etc.). But after a non-initial consonant it just doubles it in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য. Clusters: ক্য, স্য, র‍্য [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, Ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা]. |
র | r | Two manifestations: (i) রেফ repʰa, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (earlier র্দ্ধ্ব, a four-term cluster) etc. (placed over the following consonant); (ii) র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows). |
ল | l | ল্গ, ল্প, ল্ব, ল্ম, ল্ট, ল্ড, ল্ক, ল্দ, ল্য, গ্ল, ম্ল |
ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | অঃ, কঃ |


No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
40 | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
41 | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
42 | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
43 | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125] 09DF is the preferred code point; however, it is not available for the LGR as per the standards governing this LGR development.
44 | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Manipuri | [112][125]
45 | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
46 | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
47 | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
48 | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
49 | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
50 | U+09BE | া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
51 | U+09BF | ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
52 | U+09C0 | ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
53 | U+09C1 | ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
54 | U+09C2 | ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
55 | U+09C3 | ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
56 | U+09C4 | ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112][122]
57 | U+09C7 | ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
58 | U+09C8 | ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
59 | U+09CB | ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
60 | U+09CC | ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant) / Virama (= Dari) | 1 Bangla, 2 Assamese, 2 Manipuri | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Manipuri, 2 Assamese | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Assamese, 2 Manipuri | [122][125]

Table 8 - Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences which enable the conditional inclusion in the repertoire of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, again followed by Bangla VOWEL SIGN AA, so that the æ sound (as in English ‘bat’, ‘cat’ etc.) can be written.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code points (not an exhaustive list) | Reference
S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112][122]
S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9 - Sequences
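The two sequences in Table 9 can be cross-checked against the Unicode Character Database with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `describe` is hypothetical. It also confirms that both sequences are already in NFC, so normalization leaves them unchanged.

```python
import unicodedata

S1 = "\u0985\u09CD\u09AF\u09BE"  # অ্যা
S2 = "\u098F\u09CD\u09AF\u09BE"  # এ্যা


def describe(seq):
    """Code points and Unicode character names of a sequence."""
    codes = " ".join(f"{ord(ch):04X}" for ch in seq)
    names = ", ".join(unicodedata.name(ch) for ch in seq)
    return f"{codes}: {names}"


for tag, seq in (("S1", S1), ("S2", S2)):
    print(tag, describe(seq))
    print("  unchanged by NFC:", unicodedata.normalize("NFC", seq) == seq)
```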

5.2 Code Point Repertoire Exclusion

There are some characters of the Bangla script that are encoded in Unicode but have not been included in the repertoire in this LGR proposal. The reason for excluding ঌ (U+098C) and ৗ (U+09D7) is that they are rare and obsolete characters.

Sr. No. | Code Points | Glyph | Character Names | Note
1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10 - Excluded Code Points

5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire since it will never be used alone. It will be used only in sequence, in the normalized forms of three special characters: ড় ঢ় য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড় ঢ় য় respectively.

Table 10b - Excluded Code Points
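The reason the nukta only ever appears in sequence can be seen from Unicode normalization: the precomposed letters U+09DC, U+09DD and U+09DF are composition exclusions, so NFC decomposes them into base letter + U+09BC. A minimal Python check using the standard `unicodedata` module (the helper name `nfc_code_points` is hypothetical):

```python
import unicodedata

PRECOMPOSED = ["\u09DC", "\u09DD", "\u09DF"]  # ড় ঢ় য়


def nfc_code_points(text):
    """Code points of text after NFC normalization, as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


for ch in PRECOMPOSED:
    print(f"U+{ord(ch):04X} ->", " ".join(nfc_code_points(ch)))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```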

5.4 The Basis of the Present IDN

The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. Number   Operator   Function
1            “|”        Alternative
2            “[ ]”      Optional
3            “*”        Variable Repetition
4            “( )”      Sequence Group

Table 11 - The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence

To facilitate understanding for users of other Brahmi scripts, equivalents in Devanāgarī are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). The number of signs (D, B or X) which can follow a V in Bangla is not necessarily restricted to one. Going by the rules illustrated in this document, formations that repeat the same sign, such as VDD, VBB and VXX, are clearly invalid orthographic units. However, it is valid and possible to have sequences such as a Candrabindu followed by an Anusvāra on the one hand and a Candrabindu followed by a Visarga on the other, as in হ্যাঁংচা ‘hæncā’ and হ্যাঁঃ ‘hæn’ respectively. In other words, a Visarga or Anusvāra following a Candrabindu is possible in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] + Consonant [C] to form a Ya-phalā. “Ya-phalā is a presentation form of U+09AF Bangla letter য or ‘ya’, represented by the sequence <U+09CD (BENGALI SIGN VIRAMA, i.e. the Bangla Hasanta), U+09AF য BENGALI LETTER YA>.” Ya-phalā has a special form, ্য. When it is further combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], as in the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:

VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ
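The vowel-sequence grammar of 5.4.2 maps directly onto a regular expression over the category letters of Table 16 (ABNF “[ ]” becomes `?`, “( )” becomes a non-capturing group, “|” stays alternation). The Python sketch below assumes the label has already been converted to category letters, and it checks only the shape of the sequence: it does not verify that the C and M of a VHCM are য and া, which the S sequences enforce. The names are hypothetical.

```python
import re

# V, optionally the Ya-phala part H C M, then at most one of B, D, X, DB, DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def is_vowel_sequence(categories):
    """True if a string of category letters forms one vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(categories) is not None


for seq in ["V", "VDB", "VHCMDX", "VDD", "VBB", "VXX"]:
    print(seq, is_vowel_sequence(seq))
```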

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A Consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:

CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Example:

CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama): *3(CH)C

Example:

CHC ন্ত → ন + ্ + ত (न + ् + त)
CHCHC ন্ত্র → ন + ্ + ত + ্ + র (न + ् + त + ् + र)
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য (न + ् + त + ् + र + ् + य)

5.4.3.5 Subsets

While considering the subsets, we take the combination CHC as a representative example; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:

CHCM ক্কী → ক + ্ + ক + ী (क्की → क + ् + क + ी)
CHCB ক্কং → ক + ্ + ক + ং (क्कं → क + ् + क + ं)
CHCD ক্কঁ → ক + ্ + ক + ঁ (क्कँ → क + ् + क + ँ)
CHCX ক্কঃ → ক + ্ + ক + ঃ (क्कः → क + ् + क + ः)
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং (क्कँं → क + ् + क + ँ + ं)
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ (क्कँः → क + ् + क + ँ + ः)

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example:

CHCMB ক্কীং → ক + ্ + ক + ী + ং (क्कीं → क + ् + क + ी + ं)
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ (क्काँ → क + ् + क + ा + ँ)
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ (क्कीः → क + ् + क + ी + ः)
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং (क्काँं → क + ् + क + ा + ँ + ं)
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ (क्काँः → क + ् + क + ा + ँ + ः)
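Sections 5.4.3.1 to 5.4.3.5 can likewise be combined into one regular expression over category letters: either a single pure consonant (CH), or up to three C+H pairs, a final C, an optional kāra and at most one of B, D, X, DB or DX (ABNF “*3(CH)” becomes `(?:CH){0,3}`). This is a sketch of the sequences as listed above, not of the LGR's actual rules; in particular it does not accept a cluster ending in H, which the listed forms do not include. The names are hypothetical.

```python
import re

CONSONANT_SEQUENCE = re.compile(r"CH|(?:CH){0,3}CM?(?:DB|DX|B|D|X)?")


def is_consonant_sequence(categories):
    """True if a string of category letters forms one consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(categories) is not None


for seq in ["C", "CH", "CMDX", "CHCHCHC", "CHCMDB", "CHCHCHCHC", "CMM", "CBB"]:
    print(seq, is_consonant_sequence(seq))
```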

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama): [CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.

Note: The condition for KHANDA TA in this context is that C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
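The two permitted Khanda Ta sequences are small enough to check explicitly. A minimal Python sketch, with a hypothetical function name, accepting Z alone or [CH]Z where C is র or ৰ:

```python
KHANDA_TA = "\u09CE"               # ৎ
HASANTA = "\u09CD"                 # ্
RA_LETTERS = {"\u09B0", "\u09F0"}  # র (Bangla) and ৰ (Assamese)


def is_khanda_ta_sequence(seq):
    """Z on its own, or [CH]Z where C is র or ৰ (Section 5.4.4)."""
    if seq == KHANDA_TA:
        return True
    return (
        len(seq) == 3
        and seq[0] in RA_LETTERS
        and seq[1] == HASANTA
        and seq[2] == KHANDA_TA
    )


for s in ["ৎ", "র্ৎ", "ৰ্ৎ", "ক্ৎ"]:
    print(s, is_khanda_ta_sequence(s))
```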

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’ etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta mark. But as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case concerns the formation of repha and ra-phalā in the script, referred to as P in the table above. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra “cycle”). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.

6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can cause confusion even to a careful observer and are hence being proposed as variants.

Variants arise in a script when two or more identical-looking forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or get the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
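The ko example can also be checked with Unicode normalization. U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 U+09BE, so NFC (which IDNA2008 requires of IDN labels) turns the two-key input into the single code point. The sketch below uses Python's standard `unicodedata` module; it is offered as a complement to the kāra-after-kāra argument above, not as the NBGP's stated reasoning.

```python
import unicodedata

ko_single = "\u0995\u09CB"          # ক + ো (VOWEL SIGN O, one key)
ko_two_keys = "\u0995\u09C7\u09BE"  # ক + ে (VOWEL SIGN E) + া (VOWEL SIGN AA), two keys

# U+09CB decomposes canonically to <U+09C7, U+09BE>, so NFC recomposes the
# two-key input and NFD splits the single key back into two code points.
print(unicodedata.normalize("NFC", ko_two_keys) == ko_single)  # True
print(unicodedata.normalize("NFD", ko_single) == ko_two_keys)  # True
```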

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I

As far as true variants in Bangla are concerned, we may draw our attention to cases wherein থ (tha, U+09A5) with Hasanta appears as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, could be a little confusing, since the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly as হ (ha, U+09B9), increasing the risks in label writing and identification. Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)

The fonts which represent the traditional Bangla writing system could tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II

Another interesting example of variant is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences (mainly because Assamese and Bangla share the same Unicode block, and 'B_RA' as well as 'A_RA' refer to the same phonetic element). Here the final ligatures look the same and will be as follows:

(1) B_RA + H + C
(2) A_RA + H + C

Where

B_RA = U+09B0 BENGALI LETTER RA (র) or A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12 - Example of Repha

Note: As ক and ঠ look exactly the same in Bangla and Assamese, the resultant combinations with Repha look identical; the addition of Repha does not make any difference. 'Ra-phala' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

Where C1 = any consonant except Khanda-ta

Example

Sequence 1 (Using Bangla RA) | Ligature 1 | Sequence 2 (Using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13 - Example of Ra-phala

As the Assamese and Bangla Repha and Ra-phala conjunct forms look the same, this could cause confusion for end-users. Hence the repha and ra-phala cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. But this will create blocked variant labels: e.g., if someone registers "ররর", the variant label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. However, the block applies only at the label level; if someone else needs to register other labels, e.g., ৰৰ or ৰৰৰৰ, it is still possible.
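The blocking behaviour described above can be illustrated by generating variant labels from the র/ৰ variant set. The Python sketch below assumes that variant labels are computed as the cross-product of per-code-point substitutions (the approach described in RFC 7940); the function name is hypothetical, and the normative variant data lives in the LGR XML file. Besides ৰৰৰ, this also yields mixed labels such as রৰর, while labels of a different length (ৰৰ, ৰৰৰৰ) never appear, matching the last sentence above.

```python
from itertools import product

# Variant set from Section 6.1, CASE II: র (U+09B0) <-> ৰ (U+09F0).
VARIANT_SET = ("\u09B0", "\u09F0")

def variant_labels(label: str) -> set:
    """All labels obtained by independently swapping র/ৰ at each position,
    excluding the original label (these are the labels that would be blocked)."""
    choices = [VARIANT_SET if ch in VARIANT_SET else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}

blocked = variant_labels("\u09B0\u09B0\u09B0")  # ররর
print(len(blocked))                      # 7 (2**3 combinations minus the original)
print("\u09F0\u09F0\u09F0" in blocked)   # True: ৰৰৰ
print("\u09F0\u09F0" in blocked)         # False: ৰৰ has a different length
```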

6.2 Cross-Script Variants

A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 - Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 - Bangla and Gurmukhī cross-script variant code points

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table provided under Code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1 or S2 (from Table 9), i.e., the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16 - Symbols used in WLE rules

12 As used by the Unicode Standard, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which could theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be seen in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the following are the specific WLE rules that need to be implemented:

1. C is a set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.

Example

3. M must be preceded by C. Example: কা

4. D must be preceded by either V, C or M. Example: আঁ খঁ খাঁ হাঁ

5. X must be preceded by either V, C, M or D. Example: উঃখঃবঃাঃ দ ঃ

6. B must be preceded by either V, C, M or D. Example: আং ইং কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1 Case of V Preceded by H.

9. S CANNOT be preceded by H.
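The rules above can be read as a table of permitted predecessors per symbol. The following is a minimal Python sketch of how rules 1-9 might be checked on an NFC label; it tokenizes per Table 16 (S1/S2 as single S tokens, and ড় ঢ় য় in their normalized base + nukta form counted as C, following rule 1), treats a র/ৰ + H directly before ৎ as P, and reports violations as (token index, symbol, predecessor). All names and the code point sets are illustrative assumptions; the normative rules are those encoded in the LGR XML file.

```python
# Illustrative symbol classes for Table 16 (assumed subset of the repertoire).
_UNASSIGNED = {0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5}
C = {chr(cp) for cp in range(0x0995, 0x09BA) if cp not in _UNASSIGNED} | {"\u09F0", "\u09F1"}
V = {chr(cp) for cp in (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)}
M = {chr(cp) for cp in (*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC)}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2 (Table 9)
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড ঢ য; + U+09BC gives the CN forms
RA = {"\u09B0", "\u09F0"}                     # C2 in the definition of P

# Rules 2-7: permitted predecessors (None stands for the start of the label).
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "S", "P"},
}

def tokenize(label):
    """Split a label into (symbol, text) tokens."""
    tokens, i = [], 0
    while i < len(label):
        seq = next((s for s in S_SEQUENCES if label.startswith(s, i)), None)
        if seq is not None:
            tokens.append(("S", seq))
            i += len(seq)
            continue
        ch = label[i]
        if ch in C:
            sym = "C"
            if ch in NUKTA_BASES and label[i + 1:i + 2] == "\u09BC":
                ch = label[i:i + 2]  # CN: base consonant + nukta
        elif ch in V:
            sym = "V"
        elif ch in M:
            sym = "M"
        elif ch in SIGNS:
            sym = SIGNS[ch]
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        tokens.append((sym, ch))
        i += len(ch)
    return tokens

def wle_violations(label):
    tokens = tokenize(label)
    problems = []
    for k, (sym, _text) in enumerate(tokens):
        prev = tokens[k - 1][0] if k > 0 else None
        if sym == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA:
            prev = "P"  # র্ / ৰ্ directly before ৎ counts as P
        allowed = ALLOWED_BEFORE.get(sym)
        if allowed is not None and prev not in allowed:
            problems.append((k, sym, prev))
        if sym in ("V", "S") and prev == "H":  # Rules 8 and 9
            problems.append((k, sym, prev))
    return problems

print(wle_violations("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # [] (ভর্ৎসনা)
print(wle_violations("\u0995\u09C0\u09C0"))                          # [(2, 'M', 'M')] (কীী)
```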

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage and may not be attested, but there is no harm in adding such ligatures, because any user can easily create them.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g., ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋkʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning Bank of India). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound of the first word.

This is a unique situation necessitated by the lack of the hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceptible similarity between two labels (with and without H) for the majority of the linguistic communities; hence this is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, the NBGP may consider revisiting this rule.
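Rule 8 applied to the example above can be sketched in a few lines of Python: the label exactly as given in 7.1.1 contains ই directly after the Hasanta of অফ্ and is therefore rejected, while the form without that H passes this particular check. The function name and the vowel set are assumptions made for illustration.

```python
VIRAMA = "\u09CD"  # Hasanta
# Independent vowel letters treated as V in this sketch (assumed subset of Table 8).
VOWELS = {chr(cp) for cp in (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)}

def vowel_after_hasanta(label: str) -> list:
    """Indices at which an independent vowel directly follows Hasanta (Rule 8)."""
    return [i for i in range(1, len(label)) if label[i] in VOWELS and label[i - 1] == VIRAMA]

# ব্যাঙ্কঅফ্ইন্ডিয়া, using the code points listed in 7.1.1
with_h = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995"
          "\u0985\u09AB\u09CD"
          "\u0987\u09A8\u09CD\u09A1\u09BF\u09DF\u09BE")
# The same label without the H of অফ্ (i.e. written with অফ)
without_h = with_h.replace("\u09AB\u09CD\u0987", "\u09AB\u0987")

print(vowel_after_hasanta(with_h))     # [10]
print(vowel_after_hasanta(without_h))  # []
```

Note that the code point list in 7.1.1 uses U+09DF for য়. U+09DF is a Unicode composition exclusion, so in NFC the same letter is stored as U+09AF U+09BC, which is the CN form referred to in rule 1.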

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help in understanding some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is quasi-automatically applied wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Example

অ্ अ्

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid:

Example

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example

কংঃ कंः

কঃং कःं
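Most of the restrictions illustrated in 7.2 follow directly from the "must be preceded by" rules 2-6 of Section 7.1. The Python sketch below checks symbol strings (rather than code points) against an assumed predecessor table; Z and the P context are left out to keep it short, and all names are hypothetical.

```python
# Permitted predecessors from rules 2-6 of Section 7.1 (None = label start).
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
}

def symbols_ok(symbols: str) -> bool:
    """True if every symbol in the string has a permitted predecessor."""
    prev = None
    for sym in symbols:
        allowed = ALLOWED_BEFORE.get(sym)
        if allowed is not None and prev not in allowed:
            return False
        prev = sym
    return True

# Invalid examples of Section 7.2, written as symbol strings.
invalid = {
    "H first": "HC",
    "M first": "MC",
    "H after V": "VH",
    "B twice": "CBB",
    "X twice": "CXX",
    "M twice": "CMM",
    "M after V": "VM",
    "Anusvara + Visarga": "CBX",
    "Visarga + Anusvara": "CXB",
}
for name, symbols in invalid.items():
    print(name, symbols_ok(symbols))  # each line ends with False

print(symbols_ok("CMDB"), symbols_ok("CHCMDX"))  # True True
```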

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore, Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka) Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has Happened So Far in Terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0, Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, e.g., the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র) (09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:

→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
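Restrictions 1 and 3 above can be expressed as simple checks on a label. The following Python sketch is a hypothetical illustration (the function name and vowel set are assumptions): it rejects labels starting with ৎ, ঞ or ঙ, and accepts a Hasanta directly after an independent vowel only as part of অ্যা or এ্যা. Initial য় is not blocked here, since the text notes that it does occur in some names; restriction 2 is a separate context check on ৎ.

```python
VIRAMA = "\u09CD"
# Rule 1: characters that may not begin an IDN label (ৎ, ঞ, ঙ).
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}
# Rule 3: the only permitted V H C M combinations, অ্যা and এ্যা.
ALLOWED_VHCM = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")
# Independent vowel letters (assumed subset of the repertoire).
VOWELS = {chr(cp) for cp in (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)}

def appendix_i_ok(label: str) -> bool:
    if not label or label[0] in FORBIDDEN_INITIAL:
        return False
    for i in range(1, len(label)):
        # A Hasanta straight after an independent vowel is only allowed inside অ্যা / এ্যা.
        if label[i] == VIRAMA and label[i - 1] in VOWELS:
            if label[i - 1:i + 3] not in ALLOWED_VHCM:
                return False
    return True

print(appendix_i_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # True: অ্যাসিড
print(appendix_i_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))        # False: ৎসেরিং
print(appendix_i_ok("\u0987\u09CD\u09AF\u09BE"))                    # False: ই্যা
```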

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names of Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to the Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17 - The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 - Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 - Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 - Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 - Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 - Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali Consonants and Their Allographs

Consonants | Phonetic Value | Allographs (Clusters; Transparent Form in Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) জ্ঝ (জ+ঝ) ঞ্ঝ (ঞ+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) ঙ্খ (ঙ+খ)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

ঞ্চ (ঞ+চ) ঞ্ছ (ঞ+ছ) ঞ্জ (ঞ+জ) ঞ্ঝ (ঞ+ঝ) জ্ঞ (জ+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it is a vowel æ (as in শ্যামল, র‍্যাপার, etc.). But after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য (ya-phala under ra) and র্য (repha over ya).

ক্য (ক+য) স্য (স+য) র‍্য (র+য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, Ya-phala + a-kara, constitute a vowel sign representing the vowel অ্যা.]

র r Two manifestations: (i) রেফ repʰa, as the first member of a cluster, e.g., র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র+ধ+ব) (earlier র্দ্ধ্ব = র+দ+ধ+ব, a four-term cluster), etc. (placed over the following consonant); (ii) র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


33

No UnicodeCodePoint

Glyph

CharacterName

Category Language(s)withEGIDSValue

ReferencesandComment

47 U+09B7 ষ BENGALILETTERSSA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

48 U+09B8 স BENGALILETTERSA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

49 U+09B9 হ BENGALILETTERHA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

50 U+09BE া BENGALIVOWELSIGNAA

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

51 U+09BF ি BENGALIVOWELSIGNI

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

52 U+09C0 ী BENGALIVOWELSIGNII

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

53 U+09C1 BENGALIVOWELSIGNU

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

34

No UnicodeCodePoint

Glyph

CharacterName

Category Language(s)withEGIDSValue

ReferencesandComment

54 U+09C2 BENGALIVOWELSIGNUU

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

55 U+09C3 BENGALIVOWELSIGNVOCALICR

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

56 U+09C4 BENGALIVOWELSIGNVOCALICRR

Kāra(Mātrā)

1Bangla2Assamese

[112][122]

57 U+09C7 l BENGALIVOWELSIGNE

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

58 U+09C8 m BENGALIVOWELSIGNAI

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

59 U+09CB lা BENGALIVOWELSIGNO

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

60 U+09CC lৗ BENGALIVOWELSIGNAU

Kāra(Mātrā)

1Bangla2Manipuri2Assamese

[112][122][125]

35

No UnicodeCodePoint

Glyph

CharacterName

Category Language(s)withEGIDSValue

ReferencesandComment

61 U+09CD BENGALISIGNVIRAMA

Hasanta(=Halant)Virama(=Da ri)

1Bangla2Assamese2Manipuri

[112][122][125]

62 U+09CE ৎ BENGALILETTERKHANDATA

Consonant 1Bangla2Manipuri2Assamese

[112][122][125]

63 U+09F0 ৰ BENGALILETTERRAWITHMIDDLEDIAGONAL

Consonant 2Assamese [122]

64 U+09F1 ৱ BENGALILETTERRAWITHLOWERDIAGONAL

Consonant 2Assamese2Manipuri

[122][125]

Table8BanglaCode-PointRepertoire

Apart from the above individual code-points the Neo-Brāhmī Generation Panel alsoproposes some specific sequences which enable conditional inclusion of the BanglaLETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA againfollowed by Bangla VOWEL SIGN AA in the repertoire for enabling inclusion of aeligsoundasinEnglishlsquobatrsquolsquocatrsquoetc

SrNo

UnicodeCodePoints

Sequence

CharacterNames Examplelanguagesusingthecode-point

(Notexhaustive

list)

Reference

36

SrNo

UnicodeCodePoints

Sequence

CharacterNames Examplelanguagesusingthecode-point

(Notexhaustive

list)

Reference

S1 098509CD09AF09BE

অ8া BENGALILETTERABENGALISIGNVIRAMABENGALILETTERYABENGALIVOWELSIGNAA

BanglaAssamese

[112][122]

S2 098F09CD09AF09BE

এ8া BENGALILETTEREBENGALISIGNVIRAMABENGALILETTERYABENGALIVOWELSIGNAA

Bangla [112]

Table9Sequences

52 CodePointRepertoireExclusionTherearesomecharactersoftheBanglascriptthatfindplaceintheUnicodebuthavenot been included in the repertoire in the LGR proposal The reason for excludingঌ(U+098C)andৗ(U+09D7)isthattheyarerareandobsoletecharacters

SrNo CodePoints

Glyph CharacterNames Note

1 U+098C ঌ BENGALILETTERVOCALICL Limitedordeclininguse

2 U+09D7 ৗ BENGALIAULENGTHMARK Limitedordeclininguse

Table10ExcludedCodePoints

37

53 CodepointnotusedaloneBENGALI SIGN NUKTA U+09BC (See 336) is excluded from repertoire since it will never be used alone It will be used as sequence in three special characters in normalized form for ড় ঢ় য়

UnicodeCodePoint

Glyph CharacterName Reasonforexclusion

U+09BC BENGALISIGNNUKTA

Never used alone Only used together with U+09A1 ড

U+09A2 ঢ U+09AF য as to form ড় ঢ় য় respectively

Table10bExcludedCodePoints

54 TheBasisofPresentIDNThepresentLGRhasalsobenefitedfromtheearlierworkonIDNforBangla(differentversions)doneforभारतorভারতdraftedbetween20112009and18072013

541 TheABNFVariablesTheAugmentedBackus-NaurFormalism(ABNF)beganwiththefollowingvariables

CrarrConsonantVrarrVowelMrarrkāra(Mātrā)BrarrAnusvāra(onuʃʃār)DrarrCandrabinduXrarrVisarga(biʃɔrgo)HrarrHasantaViramaZrarrKhandaTa

TheAugmentedBackus-NaurFormalism(ABNF)willusethefollowingOperators

SrNumber Operator Function

1 ldquo|ldquo Alternative

2 ldquo[]rdquo Optional

3 ldquordquo VariableRepetition

38

4 ldquo()rdquo SequenceGroup

Table11TheABNFFormalism

InwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaaregiventofacilitateunderstanding

542 TheVowelSequenceInwhatfollowstheVowelSequenceandtheConsonantSequencepertinenttoBanglaare given To facilitate understanding of other Brahmi script users equivalents inDevanāgarīareprovidedwherevernecessaryAvowelsequenceismadeupofasinglevowelItmaybefollowedbutnotnecessarily(optionally)byanAnusvāraonuʃʃār(B)Candrabindu(D)oraVisargabiʃɔrgo(X)ThenumberofDBorXwhichcanfollowaV inBanglamaynotberestrictedtooneGoingbytherules illustratedinthedocument it isclearthat formationssuchasVDDVBBandVXXare invalidorthographicunitsHowever it isvalidandpossible tohaveformationsorsequencessuchasanusvarafollowedbyachandrabinduononehandandvisarga followed by a chandrabindu on the other as in হ8াংচা lsquohaelignchārsquo and lsquohaelignrsquo হ8াঃrespectivelyThepossibilityof aVisarga orAnusvāra (onuʃʃār) followingaCandrabinduexists inBangla Vowel can optionally be followed by a combination of Hasanta Virama [H]Consonant [C] to formaYa-phala ldquoYa-phala isapresentation formofU+09AFBanglaletterযorlsquoyarsquoRepresentedbythesequenceltU+09CDieBENGALISIGNVIRAMABangla SIGNHasanta or VIRAMA U+09AF -য BENGALI LETTER YAgt Ya-phala has aspecialformয়AgainwhencombinedwithU+09BEাBENGALIVOWELSIGNAA(ielsquoaarsquo(ā))itisusedfortranscribing[aelig]asintheldquoardquointheEnglishwordldquobatrdquowritteninBanglaasব8াটAVowel-sequenceadmitsthefollowingcombinations

5.4.2.1 A Single Vowel

Example: V অ (अ)

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by a Consonant [C] followed by a kāra (Mātrā) [M].

Examples:

VB    অং    अं
VD    অঁ    अँ
VX    অঃ    अः
VDB   অঁং   अँं
VDX   অঁঃ   अँः
VHCM  অ্যা, এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB   অ্যাং, এ্যাং
VHCMD   অ্যাঁ, এ্যাঁ
VHCMX   অ্যাঃ, এ্যাঃ
VHCMDB  অ্যাঁং, এ্যাঁং
VHCMDX  অ্যাঁঃ, এ্যাঁঃ
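The vowel-sequence combinations of 5.4.2.1 to 5.4.2.3 can be read as a small regular language over the ABNF variables of 5.4.1. The following is a non-normative sketch in Python using the standard `re` module. It assumes category tables that cover only the Bengali-block letters and signs discussed in this section (digits and nukta sequences are left out), and `categories` and `is_vowel_sequence` are hypothetical helper names, not part of the LGR.

```python
import re

# Assumption: category tables for Bengali-block letters and signs only.
SIGNS = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}
KARAS = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}
CONSONANTS = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))
              | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1})


def categories(text):
    """Map each code point of `text` to its ABNF variable letter."""
    letters = []
    for ch in text:
        cp = ord(ch)
        if cp in SIGNS:
            letters.append(SIGNS[cp])
        elif cp in VOWELS:
            letters.append("V")
        elif cp in CONSONANTS:
            letters.append("C")
        elif cp in KARAS:
            letters.append("M")
        else:
            raise ValueError(f"U+{cp:04X} is outside this sketch's tables")
    return "".join(letters)


# V, then an optional HCM (Ya-phalā + ā-kāra), then an optional B, D, X, DB or DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def is_vowel_sequence(text):
    return VOWEL_SEQUENCE.fullmatch(categories(text)) is not None


samples = ["\u0985",                          # অ      V
           "\u0985\u0981\u0982",              # অঁং    VDB
           "\u0985\u09CD\u09AF\u09BE\u0983",  # অ্যাঃ  VHCMX
           "\u0985\u0982\u0982",              # অংং    VBB (invalid)
           "\u0985\u0983\u0981"]              # অঃঁ    VXD (invalid)
for sample in samples:
    print(sample, categories(sample), is_vowel_sequence(sample))
```

Ordering DB and DX before the single-sign alternatives is only for readability; `fullmatch` backtracks either way.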

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক (क)

5.4.3.2 A Consonant with Conditions

A consonant may optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

CM    কি, কী   कि, की
CB    কং       कं
CD    কঁ       कँ
CX    কঃ       कः
CH    ক্       क् (pure consonant)
CDB   কঁং      कँं
CDX   কঁঃ      कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Examples:

CMB   কীং, কিং   कीं, किं
CMD   কাঁ        काँ
CMX   বীঃ        वीः
CMDB  কাঁং       काँं
CMDX  কাঁঃ       काँः

5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Examples:

CHC       ন্ত → ন + ্ + ত                  (न + ् + त)
CHCHC     ন্ত্র → ন + ্ + ত + ্ + র          (न + ् + त + ् + र)
CHCHCHC   ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য   (न + ् + त + ् + र + ् + य)

5.4.3.5 Subsets

As a representative example of the subsets, we consider the combination CHC only; however, the same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

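Examples:

CHCM    ক্কী → ক + ্ + ক + ী        क्की → क + ् + क + ी
CHCB    ক্কং → ক + ্ + ক + ং        क्कं → क + ् + क + ं
CHCD    ক্কঁ → ক + ্ + ক + ঁ        क्कँ → क + ् + क + ँ
CHCX    ক্কঃ → ক + ্ + ক + ঃ        क्कः → क + ् + क + ः
CHCDB   ক্কঁং → ক + ্ + ক + ঁ + ং    क्कँं → क + ् + क + ँ + ं
CHCDX   ক্কঁঃ → ক + ্ + ক + ঁ + ঃ    क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.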

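Examples:

CHCMB   ক্কীং → ক + ্ + ক + ী + ং       क्कीं → क + ् + क + ी + ं
        ক্কিং → ক + ্ + ক + ি + ং       क्किं → क + ् + क + ि + ं
CHCMD   ক্কাঁ → ক + ্ + ক + া + ঁ       क्काँ → क + ् + क + ा + ँ
CHCMX   ক্কীঃ → ক + ্ + ক + ী + ঃ       क्कीः → क + ् + क + ी + ः
CHCMDB  ক্কাঁং → ক + ্ + ক + া + ঁ + ং   क्काँं → क + ् + क + ा + ँ + ं
CHCMDX  ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ   क्काँः → क + ् + क + ा + ँ + ः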
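The consonant-sequence forms of 5.4.3.1 to 5.4.3.5 can be approximated by a single pattern: either a pure consonant CH, or up to three (CH) links followed by C, an optional kāra and an optional B, D, X, DB or DX. This is a non-normative Python sketch. It assumes that only the Bengali-block letters and signs in the table below need to be classified, and `categories` and `is_consonant_sequence` are hypothetical helper names.

```python
import re

# Assumption: category tables for Bengali-block letters and signs only.
SIGNS = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}
KARAS = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}
CONSONANTS = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))
              | {0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1})


def categories(text):
    """Map each code point of `text` to its ABNF variable letter."""
    letters = []
    for ch in text:
        cp = ord(ch)
        if cp in SIGNS:
            letters.append(SIGNS[cp])
        elif cp in VOWELS:
            letters.append("V")
        elif cp in CONSONANTS:
            letters.append("C")
        elif cp in KARAS:
            letters.append("M")
        else:
            raise ValueError(f"U+{cp:04X} is outside this sketch's tables")
    return "".join(letters)


# CH (pure consonant), or *3(CH)C [M] [B / D / X / DB / DX].
CONSONANT_SEQUENCE = re.compile(r"CH|(?:CH){0,3}CM?(?:DB|DX|B|D|X)?")


def is_consonant_sequence(text):
    return CONSONANT_SEQUENCE.fullmatch(categories(text)) is not None


samples = ["\u0995\u09CD",                             # ক্       CH
           "\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF",  # ন্ত্র্য  CHCHCHC
           "\u0995\u09CD\u0995\u09C0\u0982",             # ক্কীং    CHCMB
           "\u0995\u09CD" * 4 + "\u0995",                # five consonants (invalid)
           "\u0995\u09C0\u09C0"]                         # কীী      CMM (invalid)
for sample in samples:
    print(categories(sample), is_consonant_sequence(sample))
```

The `{0,3}` bound is what encodes the "up to 4 consonants" limit of 5.4.3.4.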

5.4.4 The Khanda Ta Sequence

5.4.4.1 A Single Khanda Ta (Z)

Example: Z ৎ (= ত্)

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.

Note: The condition for KHANDA TA in this context is that C must be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
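The RA condition on [CH]Z can be checked directly on code points. A minimal Python sketch, assuming the label is already a plain string of Bengali-block characters; `khanda_ta_ok` is a hypothetical name, and the check covers only the condition in the note above, not the other WLE rules.

```python
KHANDA_TA = "\u09CE"               # ৎ
VIRAMA = "\u09CD"                  # ্ (Hasanta)
RA_LETTERS = {"\u09B0", "\u09F0"}  # র (Bangla RA) and ৰ (Assamese RA)


def khanda_ta_ok(label):
    """True if every ৎ that follows a Hasanta has র or ৰ before that Hasanta."""
    for i, ch in enumerate(label):
        if ch != KHANDA_TA:
            continue
        if i >= 1 and label[i - 1] == VIRAMA:
            # [CH]Z: the C in front of the Hasanta must be a RA.
            if i < 2 or label[i - 2] not in RA_LETTERS:
                return False
    return True


print(khanda_ta_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(khanda_ta_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ
```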

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render Ya-phalā following অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity: the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can carry the Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, P, refers to the formation of repha and ra-phalā in the script. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun') and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.
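Both special cases can be inspected at the code-point level with Python's standard `unicodedata` module. The sketch below uses illustrative variable names and a hypothetical `describe` helper. It lists the code points of S1, shows the Hasanta that follows a vowel letter, and confirms that Unicode normalization leaves the two repha spellings distinct. This is why they have to be related through variants rather than through NFC.

```python
import unicodedata

S1 = "\u0985\u09CD\u09AF\u09BE"           # অ্যা
S2 = "\u098F\u09CD\u09AF\u09BE"           # এ্যা
REPHA_BANGLA_RA = "\u09B0\u09CD\u0995"    # র + ্ + ক
REPHA_ASSAMESE_RA = "\u09F0\u09CD\u0995"  # ৰ + ্ + ক


def describe(seq):
    """List each code point of `seq` with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


for line in describe(S1):
    print(line)

# Hasanta directly after a full vowel letter: the anomaly behind symbol S.
print(unicodedata.name(S1[0]), "->", unicodedata.name(S1[1]))

# NFC does not merge the two spellings of the same visible repha.
same = (unicodedata.normalize("NFC", REPHA_BANGLA_RA)
        == unicodedata.normalize("NFC", REPHA_ASSAMESE_RA))
print(same)  # False
```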

6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusable variants into two groups:

Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from the character formations normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These are not cases of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akṣara formation. They can confuse even a careful observer and are hence being proposed as variants.

Variants arise in a script when two or more forms are produced with different storage, i.e. with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
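A complementary way to see why the two-key ko is not treated as a variant: IDNA2008 requires U-labels to be in Unicode Normalization Form C. NFC recomposes e-kāra + ā-kāra into the single o-kāra code point, and e-kāra + AU LENGTH MARK into au-kāra. A minimal Python sketch using the standard `unicodedata` module:

```python
import unicodedata

two_keys = "\u0995\u09C7\u09BE"   # ক + ে (e-kāra) + া (ā-kāra), typed as two keys
one_key = "\u0995\u09CB"          # ক + ো (o-kāra), typed as one key

# NFC recomposes the two-key spelling into the single o-kāra.
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True

# Same for au-kāra: ে + ৗ (U+09D7 AU LENGTH MARK) composes to ৌ (U+09CC).
print(unicodedata.normalize("NFC", "\u0995\u09C7\u09D7") == "\u0995\u09CC")  # True
```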

6.1 In-Script Variants

We propose two cases of true in-script variants in the Bangla script.

CASE I

As far as true variants in Bangla are concerned, we may draw attention to cases where থ (tha, U+09A5), preceded by Hasanta, appears in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
   স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
   ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

If written in traditional orthography, the above combinations could be a little confusing, since the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example 1). It could also be mistyped as হ (ha, U+09B9), increasing the risks in label writing and identification.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

Fonts which represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II

Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block, and 'B_RA' as well as 'A_RA' refer to the same phonetic element. The final ligatures look the same:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resulting combinations with repha look identical; adding the repha makes no difference.

'Ra-phalā' can be formed by two sequences on similar grounds, and the final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda Ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end-users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, since a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: for example, if someone registers "ররর", the label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
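Variant-label generation for the র/ৰ set can be sketched as a cross product over the positions where either letter occurs, in the style of the variant-label computation described in RFC 7940. The sketch assumes that each position is substituted independently. Under that assumption, a label such as ররর yields not only ৰৰৰ but also the mixed spellings. `variant_labels` is a hypothetical helper.

```python
from itertools import product

# Assumption: র (U+09B0) and ৰ (U+09F0) form one variant set, as concluded above.
VARIANT_SET = {"\u09B0": ("\u09B0", "\u09F0"), "\u09F0": ("\u09F0", "\u09B0")}


def variant_labels(label):
    """Labels reachable by swapping র/ৰ at any subset of positions (label itself excluded)."""
    options = [VARIANT_SET.get(ch, (ch,)) for ch in label]
    return {"".join(choice) for choice in product(*options)} - {label}


variants = variant_labels("\u09B0" * 3)  # ররর
print(len(variants))                     # 7
print("\u09F0" * 3 in variants)          # ৰৰৰ -> True
print("\u09F0" * 2 in variants)          # ৰৰ  -> False
```

Labels of a different length, such as ৰৰ or ৰৰৰৰ, are never produced, which matches the observation that they remain available for registration.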

6.2 Cross-Script Variants

A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as domain labels. Moreover, as far as the orthography is concerned, Bangla has no in-script variants beyond the cases described in 6.1. The following characters are being proposed by the NBGP as variants. Certain other characters which are somewhat similar have not been included here; they are provided in the Appendix (10.3) for reference.

¹¹ Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
◌ি U+09BF | ◌ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
◌ি U+09BF | ◌ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
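Because root-zone labels are single-script, a cross-script variant can only exist for a Bangla label whose every code point has a counterpart in Table 14 or Table 15. A minimal Python sketch of that check follows. It assumes the two tables are the complete cross-script mappings, and `cross_script_variant` is a hypothetical name.

```python
# Assumption: Tables 14 and 15 are the complete cross-script mappings.
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম -> ਸ, ি -> ਿ
}


def cross_script_variant(label, script):
    """Whole-label variant in `script`, or None if any code point lacks a counterpart."""
    table = CROSS_SCRIPT[script]
    if not all(ch in table for ch in label):
        return None
    return "".join(table[ch] for ch in label)


print(cross_script_variant("\u09AE\u09BF", "Devanagari"))  # मि
print(cross_script_variant("\u09AE\u09BF", "Gurmukhi"))    # ਸਿ
print(cross_script_variant("\u09AE\u09BE", "Devanagari"))  # None (া has no listed counterpart)
```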

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table in the Code Point Repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that can theoretically conjoin from two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the following are the specific WLE rules that need to be implemented:

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C. Example: ক্
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1 Case of V Preceded by H.
9. S CANNOT be preceded by H.
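The nine rules lend themselves to a mechanical check. The following is a minimal Python sketch, not the normative LGR XML, and it rests on several assumptions. The category tables cover only Bengali-block letters and signs, with no digits. The label is normalized to NFC, and CN and S1/S2 are folded into single tokens before checking. S is treated like M when it precedes B, D, X or Z, following the note to rule 7. `tokenize` and `check_wle` are hypothetical helper names.

```python
import unicodedata

# Assumption: Bengali-block letters and signs only (no digits).
VOWELS = set(map(chr, range(0x0985, 0x098C))) | set(map(chr, (0x098F, 0x0990, 0x0993, 0x0994)))
CONSONANTS = (set(map(chr, range(0x0995, 0x09A9))) | set(map(chr, range(0x09AA, 0x09B1)))
              | set(map(chr, (0x09B2, 0x09B6, 0x09B7, 0x09B8, 0x09B9, 0x09F0, 0x09F1))))
KARAS = set(map(chr, range(0x09BE, 0x09C5))) | set(map(chr, (0x09C7, 0x09C8, 0x09CB, 0x09CC)))
SIGNS = {"\u0981": "D", "\u0982": "B", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}                            # CN: ড় ঢ় য় (normalized)
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # S1, S2
RA_LETTERS = {"\u09B0", "\u09F0"}                                       # C2 of symbol P

# Allowed predecessor categories; "^" stands for the start of the label.
ALLOWED_BEFORE = {
    "H": {"C"},                                # rule 2
    "M": {"C"},                                # rule 3
    "D": {"V", "C", "M", "S"},                 # rule 4
    "X": {"V", "C", "M", "D", "S"},            # rule 5
    "B": {"V", "C", "M", "D", "S"},            # rule 6
    "Z": {"V", "C", "M", "D", "B", "X", "S"},  # rule 7 (P handled separately)
}


def tokenize(label):
    """Split an NFC-normalized label into (category, text) tokens."""
    label = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:       # S1/S2 are taken whole
            tokens.append(("S", label[i:i + 4]))
            i += 4
            continue
        ch = label[i]
        if ch in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:  # rule 1: CN counts as C
            tokens.append(("C", label[i:i + 2]))
            i += 2
            continue
        if ch in VOWELS:
            cat = "V"
        elif ch in CONSONANTS:
            cat = "C"
        elif ch in KARAS:
            cat = "M"
        elif ch in SIGNS:
            cat = SIGNS[ch]
        else:                                   # includes a stray NUKTA
            raise ValueError(f"U+{ord(ch):04X} is not handled by this sketch")
        tokens.append((cat, ch))
        i += 1
    return tokens


def check_wle(label):
    """Return a list of violated rules; an empty list means the label passes."""
    tokens = tokenize(label)
    errors = []
    for idx, (cat, _) in enumerate(tokens):
        prev = tokens[idx - 1][0] if idx > 0 else "^"
        if cat == "Z" and prev == "H" and idx >= 2 and tokens[idx - 2][1] in RA_LETTERS:
            continue                            # rule 7: Z after P (র্ or ৰ্)
        if cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            errors.append(f"{cat} cannot follow {prev} (token {idx})")
        if cat in ("V", "S") and prev == "H":   # rules 8 and 9
            errors.append(f"{cat} cannot follow H (token {idx})")
    return errors
```

For instance, `check_wle("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE")` (ভর্ৎসনা) returns an empty list. The joined label for "Bank of India" discussed in 7.1.1 is reported once, because ই follows the Hasanta of অফ্.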

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in including such ligatures, because any user can easily create them even though they may not be attested.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning Bank of India. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.

This unique situation arises because hyphen, space and the Zero Width Non-Joiner are absent from the set of characters permissible in the Root Zone repertoire. Otherwise, V never needs to follow an H. Permitting it may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. If the prevailing requirements of the community demand it, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help in understanding some of the rules that the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Examples:

্ক   ्क
াক   ाक
ংক   ंक
ঁক   ँक
ঃক   ःक

As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S. Examples:

অ্    अ्
অং্
কঁ্   कँ्
কঃ্   कः्
কি্   कि्

3. The number of B, D or X permitted after a consonant, a vowel or a kāra (Mātrā) is restricted to one; the following combinations are therefore invalid:

Examples:

কংং    कंं
কঁঁ    कँँ
কঃঃ    कःः
কাঁঁ   काँँ
কীঃঃ   कीःः
অংং    अंं
অঁঁ    अँँ
অঃঃ    अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী   कीी

5. M is not permitted after V. Example:

ইা, ঈৌ   ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible. Examples:

কংঃ   कंः
কঃং   कःं
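The examples in 7.2 can also be spot-checked without a full category table. The sketch below uses Python's `unicodedata` module. H, M, B, D and X all have a combining-mark general category (Mn or Mc), so example 1 reduces to inspecting the first character. Examples 3, 4 and 6 reduce to checks on adjacent pairs, with Candrabindu + Anusvāra and Candrabindu + Visarga kept valid as in 5.4.2.2. `section_7_2_problems` is a hypothetical name, and the kāra table is an assumption.

```python
import unicodedata

B_D_X = {"\u0982", "\u0981", "\u0983"}                        # ং ঁ ঃ
ALLOWED_PAIRS = {("\u0981", "\u0982"), ("\u0981", "\u0983")}  # DB and DX only
# Assumption: the kāra (Mātrā) signs of the repertoire.
KARAS = set(map(chr, range(0x09BE, 0x09C5))) | set(map(chr, (0x09C7, 0x09C8, 0x09CB, 0x09CC)))


def section_7_2_problems(label):
    problems = []
    # Example 1: a combining sign (H, M, B, D, X) cannot open a label.
    if label and unicodedata.category(label[0]).startswith("M"):
        problems.append("label starts with a combining sign")
    for a, b in zip(label, label[1:]):
        # Examples 3 and 6: at most one of B/D/X, except Candrabindu + Anusvāra/Visarga.
        if a in B_D_X and b in B_D_X and (a, b) not in ALLOWED_PAIRS:
            problems.append(f"invalid sign pair U+{ord(a):04X} U+{ord(b):04X}")
        # Example 4: only one kāra after a consonant.
        if a in KARAS and b in KARAS:
            problems.append("two consecutive kāras")
    return problems


print(section_7_2_problems("\u09CD\u0995"))        # ্ক: starts with a combining sign
print(section_7_2_problems("\u0995\u0981\u0982"))  # কঁং: []
```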

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Posts & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān Ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo, Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese, in The Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646 (UNICODE). https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has happened So Far in Terms of Script Reforms'. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0, Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the formalism's structures do not necessarily apply. To take care of such cases, restriction rules are set in place; they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Bangla generally does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning (as in য়াকুব). In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only where C is ra (র, 09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)

¹³ This section specifically takes up restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered here.

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps of the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. Sylheti Nagarī or Siloti had several other names [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed, and the panel concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11 Appendix II: Bengali Consonants and Their Allographs

Consonant | Phonetic value | Allographs (clusters)
প | p | প্ত, প্ন, প্প, প্য, প্র, প্ল, প্স
ফ | pʰ | ফ্র, ফ্ল
ব | b | ব্জ, ব্দ, ব্ধ, ব্ব, ব্য, ব্র, ব্ল, ব্ভ, হ্ব
ভ | bʱ | ভ্য, ভ্র, ভ্ল
ত | t | ত্ত, ত্ত্য, ত্ত্ব, ত্থ, ত্ন, ত্য, ত্ম, ত্ব, ত্র, প্ত, ক্ত, ক্ত্ব, ন্ত্র্য. There is a marked form of ত + ্ = ৎ (cf. র্ৎ).
থ | tʰ | থ্য, থ্র, স্থ, ত্থ, ন্থ
দ | d | দ্গ, দ্ঘ, দ্দ, দ্ধ, দ্য, দ্ব, দ্ভ, দ্র, ন্দ, ন্দ্র, র্দ্র
ধ | dʱ | ধ্ন, ধ্ম, ধ্য, ধ্র, গ্ধ, দ্ধ, ন্ধ, ব্ধ
ট | ʈ | ট্ট, ট্য, ট্ব, ট্র, ক্ট, ষ্ট
ঠ | ʈʰ | ঠ্য, ণ্ঠ, ষ্ঠ
ড | ɖ | ড্ড, ড্য, ড্র
ঢ | ɖʱ | ঢ্য, ণ্ঢ
চ | tʃ | চ্চ, চ্ছ, চ্ছ্র, চ্ঞ, চ্য, ঞ্চ, শ্চ
ছ | tʃʰ | ছ্র, চ্ছ, ঞ্ছ, শ্ছ
জ | dʒ | জ্জ, জ্জ্ব, জ্ঝ, জ্ঞ, জ্য, জ্র, ঞ্জ
ঝ | dʒʱ | not privileged enough to have clusters as a first member; জ্ঝ, ঞ্ঝ
ক | k | ক্ক, ক্ট, ক্ত, ক্ত্র, ক্ত্ব, ক্ন, ক্ব, ক্ম, ক্য, ক্র, ক্ষ, ক্ষ্ণ, ক্ষ্ম, ক্ষ্ব, ক্ষ্য, ক্স, ঙ্ক
খ | kʰ | not privileged enough to have clusters as a first member; ঙ্খ
গ | g | গ্গ, গ্দ, গ্ধ, গ্ন, গ্ব, গ্ম, গ্য, গ্র, গ্ল, ঙ্গ, র্ঙ্গ
ঘ | gʱ | ঘ্ন, ঘ্য, ঘ্র, ঙ্ঘ
ঞ | no particular phonetic value of its own; mostly pronounced as n | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ, জ্ঞ
ণ | n | ণ্ট, ণ্ঠ, ণ্ড, ণ্ড্র, ণ্ঢ, ণ্ণ, ণ্য, ণ্ব, ক্ষ্ণ, ষ্ণ, হ্ণ
ঙ, ং | ŋ | ঙ্ক, ঙ্ক্র, ঙ্খ, ঙ্গ, ঙ্ঘ, ঙ্ক্ষ (in some contexts ঙ is replaced by ং, e.g. কং, অং)
ম | m | ম্ল, ম্প, ম্প্র, ম্ভ, ম্ম, ম্র, ত্ম, ধ্ম, হ্ম, ক্ষ্ম
ন | n | ন্ট, ন্ট্র, ন্ড, ন্ড্র, ন্ত, ন্ত্র, ন্ত্র্য, ন্থ, ন্দ, ন্দ্র, ন্ধ, ন্ধ্র, ন্দ্ব, ন্ন, ন্ম, ন্য, ন্স, হ্ন
শ | ʃ | শ্চ, শ্ছ, শ্ন, শ্ম, শ্র, শ্ল, শ্য
ষ | ʃ | ষ্ক, ষ্ট, ষ্ঠ, ষ্ণ, ষ্প, ষ্প্র, ষ্ফ, ষ্ট্র, ষ্য, ক্ষ, ক্ষ্ণ, ক্ষ্ম
স | s, ʃ | স্ক, স্ট, স্প, স্ফ, স্ত, স্থ, স্খ, স্য, স্র, স্ল, ক্স
হ | h | হ্ণ, হ্ন, হ্ম, হ্য, হ্র, হ্ল
ড় | ɽ | ড়্গ
ঢ় | ɽʱ | not privileged enough to have clusters
য | dʒ | ক্য, স্য, র‍্য
র | r | see note below
ল | l | ল্গ, ল্প, ল্ব, ল্ম, ল্ট, ল্ড, ল্ক, ল্দ, ল্য, গ্ল, ম্ল
ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | অঃ, কঃ

Note on য: The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it merely doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য. [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā and ā-kāra, constitute a vowel sign representing the vowel অ্যা.]

Note on র: It has two manifestations:
(i) রেফ repʰ, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র + ধ + ব; earlier র্দ্ধ্ব = র + দ + ধ + ব, a four-term cluster), etc. (placed over the following consonant);
(ii) র-ফলা rɔ-pʰɔla, as the second or third member of a cluster (placed under the consonant it follows).


No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment
54 | U+09C2 | ◌ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
55 | U+09C3 | ◌ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
56 | U+09C4 | ◌ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | Bangla (1), Assamese (2) | [112][122]
57 | U+09C7 | ◌ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
58 | U+09C8 | ◌ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
59 | U+09CB | ◌ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
60 | U+09CC | ◌ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
61 | U+09CD | ◌্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112][122][125]
62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112][122][125]
63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]
64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122][125]

Table 8: Bangla Code-Point Repertoire

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also proposes some specific sequences for the repertoire. They enable the conditional inclusion of Bangla LETTER A and E followed by Bangla SIGN VIRAMA and Bangla LETTER YA, in turn followed by Bangla VOWEL SIGN AA, so that the æ sound, as in English 'bat', 'cat', etc., can be written.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example Languages Using the Sequence (not an exhaustive list) | Reference

S1 | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112] [122]

S2 | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
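The short Python sketch below makes Table 9 concrete. It spells S1 and S2 out code point by code point and prints their Unicode character names using the standard `unicodedata` module. The helper `contains_ae_sequence` is hypothetical. It only shows how a tool might detect the æ sequences in a label and is not part of the LGR itself.

```python
import unicodedata
from typing import List

# Sequences S1 and S2 of Table 9, spelled out code point by code point.
S1 = "\u0985\u09CD\u09AF\u09BE"  # অ + ্ + য + া  (renders as অ্যা)
S2 = "\u098F\u09CD\u09AF\u09BE"  # এ + ্ + য + া  (renders as এ্যা)


def describe(seq: str) -> List[str]:
    """Return 'U+XXXX NAME' for each code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


def contains_ae_sequence(label: str) -> bool:
    """Hypothetical helper: True if the label uses S1 or S2."""
    return S1 in label or S2 in label


for tag, seq in (("S1", S1), ("S2", S2)):
    print(tag, "; ".join(describe(seq)))
```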

5.2 Code Point Repertoire Exclusion
Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note

1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use

2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It occurs only as part of sequences, in normalized form, for three special characters: ড় ঢ় য়.

Unicode Code Point | Glyph | Character Name | Reason for Exclusion

U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড় ঢ় য় respectively

Table 10b: Excluded Code Points
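You can check the normalization behaviour behind Section 5.3 with Python's standard `unicodedata` module. ড় (U+09DC), ঢ় (U+09DD) and য় (U+09DF) are Unicode composition exclusions, so Normalization Form C (NFC) keeps each of them as a base letter plus NUKTA. The `nukta_ok` function is a hypothetical sketch of the "never used alone" condition. It is not written in the LGR's own rule syntax.

```python
import unicodedata

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য

# U+09DC (ড়), U+09DD (ঢ়) and U+09DF (য়) are composition exclusions,
# so NFC keeps them decomposed as base letter + NUKTA.
for ch in ("\u09DC", "\u09DD", "\u09DF"):
    nfc = unicodedata.normalize("NFC", ch)
    print(f"U+{ord(ch):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))


def nukta_ok(label: str) -> bool:
    """Hypothetical check: after NFC, every NUKTA must follow ড, ঢ or য."""
    s = unicodedata.normalize("NFC", label)
    return all(
        ch != NUKTA or (i > 0 and s[i - 1] in NUKTA_BASES)
        for i, ch in enumerate(s)
    )
```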

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (in different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr. Number | Operator | Function

1 | "|" | Alternative

2 | "[ ]" | Optional

3 | "*" | Variable Repetition

4 | "( )" | Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence
To help users of other Brāhmī-derived scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). The number of D, B or X that can follow a V in Bangla is not necessarily restricted to one. However, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. It is valid and possible to have sequences such as an anusvāra followed by a candrabindu on the one hand, and a visarga followed by a candrabindu on the other, as in হ্যাংঁচা 'hænchā' and হ্যাঃঁ 'hæn' respectively. A Visarga or Anusvāra (onuʃʃār) may also follow a Candrabindu in Bangla.
A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF, the Bangla letter য or 'ya'. It is represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA> and has a special form, ্য. When it is further combined with U+09BE া BENGALI VOWEL SIGN AA ('aa' (ā)), it is used to transcribe [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX]. It can also be followed by a combination of Hasanta (or Virama) [H], then Consonant [C], then kāra (Mātrā) [M].

Examples

VB অং अं

VD অঁ अँ

VX অঃ अः

VDB অঁং अँं

VDX অঁঃ अँः

VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples: VHCMB অ্যাং, এ্যাং; VHCMD অ্যাঁ, এ্যাঁ; VHCMX অ্যাঃ, এ্যাঃ; VHCMDB অ্যাঁং, এ্যাঁং; VHCMDX অ্যাঁঃ, এ্যাঁঃ
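The vowel-sequence combinations of 5.4.2.1 to 5.4.2.3 can be read as a regular expression over the category symbols. The Python sketch below makes some assumptions:

- The label has already been mapped to symbols (V, H, C, M, B, D, X).
- The pattern name is illustrative.
- It follows the combinations listed in 5.4.2.2 and 5.4.2.3 (B, D, X, DB, DX).

At the symbol level the sketch cannot check that the C and M in VHCM are য and া. That check is what restricts VHCM to S1 and S2.

```python
import re

# Symbols as in Table 16: V vowel, H hasanta, C consonant, M kāra,
# B anusvāra, D candrabindu, X visarga.
# V, then an optional HCM (in practice only S1/S2), then at most one of
# B, D, X, DB or DX, as listed in 5.4.2.2 and 5.4.2.3.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:DB|DX|B|D|X)?")


def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is one vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None


for s in ("V", "VB", "VDX", "VHCM", "VHCMDB", "VDD", "VXX", "VBD"):
    print(s, is_vowel_sequence(s))
```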

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example

CM কি, কী कि, की

CB কং कं

CD কঁ कँ

CX কঃ कः

CH ক্ क् (pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Example: CMB কীং, কিং कीं, किं

CMD কাঁ काँ

CMX বীঃ वीः

CMDB কাঁং काँं

CMDX কাঁঃ काँः

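5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example

CHC ন্ত → ন + ্ + ত; न्त → न + ् + त

CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र

CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य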

5.4.3.5 Subsets
When considering the subsets, we take the combination CHC as a representative example. The same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

Example

CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी

CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं

CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ

CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः

CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं

CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example

CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं

ক্কিং → ক + ্ + ক + ি + ং; क्किं → क + ् + क + ि + ं

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ

CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः

CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं

CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
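The consonant-sequence combinations of 5.4.3.1 to 5.4.3.5 can likewise be sketched as one regular expression over category symbols. This is an illustrative reading that assumes the input is already a symbol string. Following 5.4.3.2, the sketch accepts a bare final H only after a single consonant.

```python
import re

# Either a single consonant optionally followed by H (pure consonant) or
# by an optional M and at most one of B, D, X, DB, DX (5.4.3.1-5.4.3.3);
# or two to four consonants joined by H, i.e. *3(CH)C with at least one
# CH, optionally followed by M and then B, D, X, DB or DX (5.4.3.4-5.4.3.5).
CONSONANT_SEQUENCE = re.compile(
    r"C(?:H|M?(?:DB|DX|B|D|X)?)"
    r"|(?:CH){1,3}CM?(?:DB|DX|B|D|X)?"
)


def is_consonant_sequence(symbols: str) -> bool:
    """True if the whole symbol string is one consonant sequence."""
    return CONSONANT_SEQUENCE.fullmatch(symbols) is not None


for s in ("C", "CH", "CHCHC", "CHCMDX", "CHCHCHCHC", "CMM"):
    print(s, is_consonant_sequence(s))
```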

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)
Example: Z ৎ = ত্

5.4.4.2 A Khanda-Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: In this context, the condition on KHANDA TA is that the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
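The [CH]Z condition of 5.4.4.2 is a context check on the character two positions before KHANDA TA. The hypothetical `khanda_ta_context_ok` function below sketches that check in Python using the standard library only. It does not check the other context rules that apply to Z.

```python
RA_LETTERS = {"\u09B0", "\u09F0"}  # র (Bangla RA), ৰ (Assamese RA)
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"


def khanda_ta_context_ok(label: str) -> bool:
    """[CH]Z: wherever a hasanta precedes ৎ, the letter before it must be RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_LETTERS:
                return False
    return True


bhartsana = "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"  # ভর্ৎসনা
print(khanda_ta_context_ok(bhartsana))              # True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))   # ক্ৎ -> False
```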

5.4.5 Special Cases: S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here.

Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel: U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E. To render Ya-phalā after অ and এ, one has to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to produce the æ sound, as in English 'bat', 'fat', etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta. As we see here, however, Bangla ends up with a deviant feature in its orthography: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case concerns the formation of repha and ra-phalā in the script, listed in the table above as P. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). Examples:

- Repha = ra + Hasanta + C, e.g. র্ক (ra + Hasanta + ka), as in অর্ক arka 'the sun'.
- Ra-phalā = C + Hasanta + ra, e.g. ক্র (ka + Hasanta + ra), as in চক্র cakra 'cycle'.

In both cases, the slot for ra can be filled by the Bangla ra র (U+09B0) or by the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.


6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases that belong to Group 2, however, are proposed as variants. These are not cases of mere visual similarity, since they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer and are therefore proposed as variants.
Variants arise in a script when two or more forms are produced from different stored sequences or code points. In Bangla, the e-kāra, the ā-kāra and the o-kāra have different code points. One can type the o-kāra with a consonant in one go, or get the same result by typing the e-kāra and the ā-kāra as two separate keys. A reader cannot tell apart the two forms of ko (কো), one typed with a single key and the other with two keys. Even so, this is not treated as a case of variants, because a kāra followed by a kāra is not allowed.
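The o-kāra case also collapses under Unicode normalization, independently of the NBGP's reasoning above. U+09CB BENGALI VOWEL SIGN O canonically decomposes to U+09C7 + U+09BE. IDNA2008 requires labels to be in Normalization Form C, so the two-keystroke spelling normalizes to the one-keystroke one. The Python sketch below checks this using only the standard `unicodedata` module. The variable names are illustrative.

```python
import unicodedata

KA = "\u0995"                      # ক
one_key = KA + "\u09CB"            # ক + ো  (U+09CB BENGALI VOWEL SIGN O)
two_keys = KA + "\u09C7\u09BE"     # ক + ে + া  (e-kāra, then ā-kāra)

# U+09CB canonically decomposes to U+09C7 U+09BE, so NFC turns the
# two-keystroke spelling into the single code point.
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", one_key)])
# ['U+0995', 'U+09C7', 'U+09BE']
```

The same holds for the au-kāra: U+09CC decomposes to U+09C7 + U+09D7.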

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases where থ (tha, U+09A5) appears with Hasanta in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus

স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus

ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


If written in traditional orthography, the above combinations can be a little confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). It can also be typed wrongly by someone who takes it for a হ (ha) U+09B9, which increases the risk of errors in writing and identifying labels.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)
Fonts that represent the traditional Bangla writing system tend to create this problem. These may therefore be taken as cases of variants in Bangla.
CASE II
Another interesting example of variants arises with ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (ra + Hasanta) and 'ra-phalā' (Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla are encoded in the same Unicode block and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক

U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: Because the Bangla and Assamese ক and ঠ look exactly the same, the resulting combinations with Repha look identical; adding Repha makes no visible difference. On similar grounds, 'Ra-phalā' can be formed by two sequences whose final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ

U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

Because the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. The repha and ra-phalā cases therefore need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels. For example, if someone registers "ররর", the label "ৰৰৰ" will be generated as a variant and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", that is still possible.
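You can sketch the blocking behaviour described above by enumerating variant labels. For illustration, the Python code below assumes two things:

- Every combination of র/ৰ substitutions is generated.
- All variants other than the registered label are blocked.

The function names are hypothetical, and the LGR XML file remains the normative definition.

```python
from itertools import product
from typing import Set

# Assumed variant set: র (U+09B0) and ৰ (U+09F0) are mutual variants.
RA_SET = ("\u09B0", "\u09F0")
VARIANTS = {"\u09B0": RA_SET, "\u09F0": RA_SET}


def variant_labels(label: str) -> Set[str]:
    """Every label reachable by replacing each র/ৰ with either member of the set."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(p) for p in product(*choices)}


def blocked_variants(label: str) -> Set[str]:
    """Variants other than the registered label itself (all blocked here)."""
    return variant_labels(label) - {label}


registered = "\u09B0\u09B0\u09B0"                    # ররর
blocked = blocked_variants(registered)
print(len(blocked))                                  # 7
print("\u09F0\u09F0\u09F0" in blocked)               # ৰৰৰ -> True
print("\u09F0\u09F0" in variant_labels(registered))  # ৰৰ -> False
```

Because variants are generated position by position, labels of a different length such as ৰৰ or ৰৰৰৰ never appear in the set. This matches the statement that they remain registrable.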

6.2 Cross-Script Variants
A careful cross-script study of Bangla has been carried out against sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya). The study kept in mind the visual and technical confusion these scripts may cause as labels on the web. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.


1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī

ম U+09AE | म U+092E

ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī

ম U+09AE | ਸ U+0A38

ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
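Tables 14 and 15 can be read as per-script lookup tables. The Python sketch below builds the cross-script counterpart of a Bangla label whenever every code point in it has a listed variant. The dictionary layout and the function name are illustrative assumptions.

```python
from typing import Optional

# Tables 14 and 15: Bangla code point -> variant code point per script.
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম→म, ি→ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম→ਸ, ি→ਿ
}


def cross_script_variant(label: str, script: str) -> Optional[str]:
    """Return the label in `script` if every code point has a listed variant."""
    table = CROSS_SCRIPT[script]
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


print(cross_script_variant("\u09AE\u09BF", "Gurmukhi") == "\u0A38\u0A3F")  # True
```

Only ম and ি are listed, so under this mapping only labels made up entirely of these two code points get a complete counterpart.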

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when they are written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table of the code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.


D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules
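Before the rules can be applied, each code point has to be mapped to one of these symbols. The Python sketch below does this for a whole label. The vowel, kāra and consonant sets are assumptions reconstructed from Table 16 and the code points cited in this section; Table 8 and the XML file are authoritative. Treating base + NUKTA as one C follows Rule 1 of Section 7.1. S and P are not collapsed here because they are sequences, not single code points.

```python
import unicodedata
from typing import Optional

NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য (Rule 1: set CN)

SINGLE = {"\u0981": "D", "\u0982": "B", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
MATRAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C4\u09C7\u09C8\u09CB\u09CC")
# Assigned BENGALI LETTERs in U+0995..U+09B9, plus the two Assamese RA letters.
CONSONANTS = {
    chr(cp) for cp in range(0x0995, 0x09BA)
    if unicodedata.name(chr(cp), "").startswith("BENGALI LETTER")
} | {"\u09F0", "\u09F1"}


def symbol(ch: str) -> Optional[str]:
    """Table 16 symbol for one code point, or None if outside this sketch."""
    if ch in SINGLE:
        return SINGLE[ch]
    if ch in VOWELS:
        return "V"
    if ch in MATRAS:
        return "M"
    if ch in CONSONANTS:
        return "C"
    return None


def to_symbols(label: str) -> str:
    """NFC-normalize, then map each code point to its Table 16 symbol."""
    s = unicodedata.normalize("NFC", label)
    out = []
    for i, ch in enumerate(s):
        if ch == NUKTA and i > 0 and s[i - 1] in NUKTA_BASES:
            continue  # base + nukta counts as a single C
        sym = symbol(ch)
        if sym is None:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        out.append(sym)
    return "".join(out)


print(to_symbols("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> CCHZCCM
```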

It is perhaps also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form 'clusters'. In theory, two to four consonants can conjoin in this way and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters". Of these, 290 are bi-consonantal combinations, three-letter combinations account for another 80, and the rarer four-letter clusters number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the various restrictions, the following WLE rules need to be implemented:


1. C is a set of C and CN, where CN is the set of normalized forms of ড় ঢ় য়.
2. H must be preceded by C.

Example: ক্

3. M must be preceded by C. Example: কা

4. D must be preceded by V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ

5. X must be preceded by V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ

6. B must be preceded by V, C, M or D. Example: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ (after P). (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H (details in 7.1.1, Case of V Preceded by H).

9. S CANNOT be preceded by H.
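Rules 2 to 9 are all "must (not) be preceded by" constraints, so they can be checked in one left-to-right pass. The Python sketch below does this under the following assumptions:

- The code point classification is a coarse, range-based stand-in for Table 8.
- S1 and S2 are exempted from the other context rules, as Table 16 specifies, but are still checked against Rule 9.
- P is recognised when Z follows a hasanta whose preceding letter is র or ৰ.

Function and constant names are hypothetical and are not taken from the LGR XML.

```python
import unicodedata
from typing import Optional

NUKTA = "\u09BC"
RA = {"\u09B0", "\u09F0"}                      # C2 of symbol P
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}   # Rule 1: set CN
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2
FIXED = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}

# Rules 2-7: symbol -> symbols allowed immediately before it.
PRECEDED_BY = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},  # P is handled separately below
}


def symbol(ch: str) -> Optional[str]:
    """Coarse, range-based mapping to Table 16 symbols (an assumption)."""
    cp = ord(ch)
    if cp in FIXED:
        return FIXED[cp]
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return None


def wle_ok(label: str) -> bool:
    s = unicodedata.normalize("NFC", label)
    prev = None  # symbol of the previous unit (None at label start)
    i = 0
    while i < len(s):
        seq = next((q for q in S_SEQUENCES if s.startswith(q, i)), None)
        if seq is not None:              # S1/S2 bypass the context rules...
            if prev == "H":              # ...except Rule 9
                return False
            prev, i = "M", i + len(seq)  # S ends with M
            continue
        ch = s[i]
        if ch == NUKTA and i > 0 and s[i - 1] in NUKTA_BASES:
            i += 1                       # base + nukta stays a single C
            continue
        sym = symbol(ch)
        if sym is None:
            return False
        if sym == "V" and prev == "H":   # Rule 8
            return False
        allowed = PRECEDED_BY.get(sym)
        if allowed is not None and prev not in allowed:
            is_p = sym == "Z" and prev == "H" and i >= 2 and s[i - 2] in RA
            if not is_p:                 # Rule 7 also admits P (RA + hasanta)
                return False
        prev = sym
        i += 1
    return True


print(wle_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> True
```

Several of the invalid combinations listed in Section 7.2, such as কংং, কংঃ and কীী, are rejected by this sketch as a consequence of Rules 6, 5 and 3.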

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage. There is no harm in admitting such ligatures, since any user can create them easily even if they are not attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H. An example is ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to represent the intended sound fully. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound of that word.


This is a unique situation, caused by the absence of the hyphen, the space and the Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. For the majority of the linguistic communities, permitting it may make two labels (one with H and one without) look alike, so the NBGP explicitly prohibits it. A future NBGP may reconsider this rule if the community's requirements change.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help illustrate some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such a combination automatically results in a "golu", or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example

অ্ अ्

অং্ अं्, কঁ্ कँ्, কঃ্ कः्, কি্ कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one. The following combinations are therefore invalid:

Example

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कीःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৌ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example

কংঃ कंः

কঃং कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon (Pachgaon, Manesar), PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka


9 References
[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29.2: 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf (accessed on 25.11.2017).

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm (accessed on 25.11.2017).

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meetei lon, Meithei). https://www.omniglot.com/writing/manipuri.htm (accessed on 20.10.2019).

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet (accessed on 25.11.2017).

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm (accessed on 10.5.2018).

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language (accessed on 25.11.2017).

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20 (accessed on 10.5.2018).

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696 (accessed on 10.5.2018).

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html (accessed on 10.5.2018).

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1.1: 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1.2: 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1.2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari (accessed on 19.5.2018).

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'Phones of Bengali'. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646 (UNICODE). https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf (accessed on May 21, 2018).

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block) (accessed on May 21, 2018).

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html (accessed on May 21, 2018).

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as Adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What Has Happened So Far in Terms of Script Reforms'. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0, Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature; when it is applied to a specific language or script, certain restriction rules apply. In other words, some of the formalism's structures do not necessarily apply to a given language. Restriction rules are set in place to handle such cases, and they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' by Kazi Nazrul Islam. There are also instances of য় being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. More recently, transliterations of some Chinese and Japanese names into Bangla have raised the possibility of Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only where C is ra (র) (09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান

(acid, association)
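Item 1 above amounts to a check on the first code point of a label. The hypothetical Python sketch below covers only the three letters named in item 1. Under it, ৎসেরিং would be rejected, which is exactly the tension the item describes. The sketch does not restrict য় at the start, because the item cites names that use it.

```python
# Hypothetical label-initial check for Appendix 10.1, item 1 (Bangla only).
NOT_INITIAL = {
    "\u09CE",  # ৎ KHANDA TA
    "\u099E",  # ঞ NYA
    "\u0999",  # ঙ NGA (velar nasal)
}


def initial_ok(label: str) -> bool:
    """True if the label is non-empty and does not start with a barred letter."""
    return bool(label) and label[0] not in NOT_INITIAL


print(initial_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং -> False
print(initial_ok("\u09DF\u09BE\u0995\u09C1\u09AC"))        # য়াকুব -> True
```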

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924), which was used by accountants (perhaps of the Kāyastha community) in eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. Sylheti Nagarī or Siloti [129] has been known by several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that the Afghan rulers of Sylhet invented it [137]. Others credit Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points
The following code points were analysed. Each was found to be either (a) distinguishable or (b) confusable, but not enough to be defined as a variant code point.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision

ঃ U+0983 | ः U+0903 | Confusable

ও U+0993 | उ U+0909 | Confusable

ঘ U+0998 | घ U+0918 | Confusable

ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision

ঘ U+0998 | ਬ U+0A2C | Confusable

ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision

ও U+0993 | ਤ U+0A24 | Distinguishable

শ U+09B6 | ਅ U+0A05 | Distinguishable

ম U+09AE | ਮ U+0A2E | Distinguishable

বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision

ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision

ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points


11 Appendix-II: Bengali Consonants and Their Allographs

Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র). There is a marked form of ত + ্ = ৎ: ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ ং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts ঙ is replaced by ং, as in কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s & ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phalā has two phonetic values. When it is added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র্ + য). [র‍্য on its own is never used in Bangla orthography. র‍্যা is used, but its last two symbols, Ya-phalā and ā-kāra, then form a vowel sign representing the vowel অ্যা.]


র r Two manifestations: (i) রেফ (repʰ), as the first member of a cluster, e.g. প6ৎ6 not6 য6D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant);

(ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h Pronounced h word-finally. Word-medially, it doubles the pronunciation of the following consonant.

অঃকঃ

অব


No | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment

61 | U+09CD | ্ | BENGALI SIGN VIRAMA | Hasanta (= Halant), Virama (= Da ri) | Bangla (1), Assamese (2), Manipuri (2) | [112], [122], [125]

62 | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | Bangla (1), Manipuri (2), Assamese (2) | [112], [122], [125]

63 | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | Assamese (2) | [122]

64 | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | Assamese (2), Manipuri (2) | [122], [125]

Table 8: Bangla Code-Point Repertoire

Apart from the individual code points above, the Neo-Brāhmī Generation Panel also proposes two specific sequences for the repertoire. In each, BENGALI LETTER A or BENGALI LETTER E is followed by BENGALI SIGN VIRAMA and BENGALI LETTER YA, and then by BENGALI VOWEL SIGN AA. These sequences allow the conditional inclusion of the /æ/ sound, as in English 'bat', 'cat', etc.

Sr No | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not an exhaustive list) | Reference

S1 | U+0985 U+09CD U+09AF U+09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112], [122]

S2 | U+098F U+09CD U+09AF U+09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences
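The two sequences in Table 9 can be located in a label with a plain substring search over code points. The sketch below does this. It is not part of the proposal's XML LGR, and the names `S_SEQUENCES` and `find_s_sequences` are hypothetical.

```python
# Hypothetical helper for illustration; not taken from the proposal's XML file.
S_SEQUENCES = {
    "S1": "\u0985\u09CD\u09AF\u09BE",  # অ + ্ + য + া  -> অ্যা
    "S2": "\u098F\u09CD\u09AF\u09BE",  # এ + ্ + য + া  -> এ্যা
}

def find_s_sequences(label: str) -> list[tuple[int, str]]:
    """Return (start index, sequence name) for every S1/S2 occurrence in the label."""
    hits = []
    for name, seq in S_SEQUENCES.items():
        start = label.find(seq)
        while start != -1:
            hits.append((start, name))
            start = label.find(seq, start + 1)
    return sorted(hits)

# অ্যাসিড ("acid") starts with S1
print(find_s_sequences("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # [(0, 'S1')]
```

A ya-phalā after a consonant, as in ক্যা, is not an S sequence, so the function returns an empty list for it.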

5.2 Code Point Repertoire Exclusion

Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No | Code Point | Glyph | Character Name | Note

1 | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use

2 | U+09D7 | ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It occurs only inside sequences, in the normalized forms of three special characters: ড়, ঢ়, and য়.

Unicode Code Point | Glyph | Character Name | Reason for Exclusion

U+09BC | ় | BENGALI SIGN NUKTA | Never used alone. Used only together with U+09A1 ড, U+09A2 ঢ, and U+09AF য to form ড়, ঢ়, and য় respectively.

Table 10b: Excluded Code Points
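Python's standard `unicodedata` module shows why the nukta appears only inside sequences. The precomposed letters ড়, ঢ়, and য় (U+09DC, U+09DD, U+09DF) are composition exclusions in Unicode, so NFC normalization always produces the base letter followed by U+09BC. The helper name `nfc_codepoints` is hypothetical.

```python
import unicodedata

def nfc_codepoints(text: str) -> list[str]:
    """Return the NFC form of text as a list of U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

# Precomposed nukta letters: U+09DC ড়, U+09DD ঢ়, U+09DF য়
for letter in ("\u09DC", "\u09DD", "\u09DF"):
    print(f"U+{ord(letter):04X} ->", " ".join(nfc_codepoints(letter)))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```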

5.4 The Basis of the Present IDN

The present LGR has also benefited from earlier work on IDN for Bangla (different versions), done for भारत or ভারত and drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → Kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The ABNF uses the following operators (Sr Number, Operator, Function):

1. "|": Alternative
2. "[ ]": Optional
3. "*": Variable Repetition
4. "( )": Sequence Group

Table 11: The ABNF Formalism
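The sketch below makes the variables concrete by mapping each code point of a label to its ABNF variable. The code-point ranges are an assumption. They cover only the main vowels, consonants, kāras and signs discussed here, not the full proposed repertoire. Following Rule 1 of Section 7, a consonant followed by the nukta counts as a single C. The names `to_symbols`, `VOWELS`, `CONSONANTS` and `KARAS` are hypothetical.

```python
# Assumed code-point ranges (a subset of the Bengali block, for illustration only).
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}
CONSONANTS = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1)) | {0x09B2}
              | set(range(0x09B6, 0x09BA)) | {0x09F0, 0x09F1})
KARAS = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}
SIGNS = {0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
NUKTA = 0x09BC

def to_symbols(label: str) -> str:
    """Translate a Bengali label into a string of ABNF variables (C, V, M, B, D, X, H, Z)."""
    symbols = []
    for ch in label:
        cp = ord(ch)
        if cp == NUKTA and symbols and symbols[-1] == "C":
            continue  # C + nukta (CN) still counts as one C
        if cp in VOWELS:
            symbols.append("V")
        elif cp in CONSONANTS:
            symbols.append("C")
        elif cp in KARAS:
            symbols.append("M")
        elif cp in SIGNS:
            symbols.append(SIGNS[cp])
        else:
            raise ValueError(f"U+{cp:04X} is outside this sketch's repertoire")
    return "".join(symbols)

print(to_symbols("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> CCHZCCM
```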

The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. For the benefit of users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary.

5.4.2 The Vowel Sequence

A vowel sequence is made up of a single vowel (V). It may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D), or a Visarga (biʃɔrgo) (X). More than one of these signs may follow a V in Bangla, but the same sign may not appear twice: by the rules illustrated in this document, formations such as VDD, VBB, and VXX are invalid orthographic units. It is, however, valid to have a candrabindu followed by an anusvāra, or a candrabindu followed by a visarga, as in হ্যাঁংচা 'hæñchā' and হ্যাঁঃ 'hæñ' respectively. In other words, a Visarga or an Anusvāra (onuʃʃār) may follow a Candrabindu in Bangla.

A vowel can also optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, 'ya'). It is represented by the sequence <U+09CD BENGALI SIGN VIRAMA (the Bangla Hasanta), U+09AF য BENGALI LETTER YA> and has the special subjoined form ্য. Combined further with U+09BE া BENGALI VOWEL SIGN AA ('aa', ā), it is used to transcribe [æ], as in the 'a' of the English word 'bat', written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H], followed by a Consonant [C], followed by a kāra (Mātrā) [M].

Examples:

VB অং अं

VD অঁ अँ

VX অঃ अः

VDB অঁং अँं

VDX অঁঃ अँः

VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX].

Examples: VHCMB অ্যাং, এ্যাং; VHCMD অ্যাঁ, এ্যাঁ; VHCMX অ্যাঃ, এ্যাঃ; VHCMDB অ্যাঁং, এ্যাঁং; VHCMDX অ্যাঁঃ, এ্যাঁঃ
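The combinations in 5.4.2.1 to 5.4.2.3 can be summarized as one regular expression over the ABNF symbols. This sketch assumes the label has already been converted to a symbol string such as "VB" or "VHCMDX". It only checks the shape: restricting HCM to অ্যা and এ্যা is left to the S rule in Section 7. The names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical.

```python
import re

# 5.4.2 over ABNF symbols: V, an optional HCM (ya-phalā + ā-kāra),
# then at most one of B, D, X, DB or DX.
VOWEL_SEQUENCE = re.compile(r"V(?:HCM)?(?:D[BX]?|B|X)?")

def is_vowel_sequence(symbols: str) -> bool:
    """True if the whole symbol string is a valid vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(symbols) is not None

for s in ("VB", "VDX", "VHCMDB", "VDD", "VBD"):
    print(s, is_vowel_sequence(s))
```

A D may be followed by a B or an X, but nothing may follow a B or an X. As a result, VDB and VDX pass, while VBD, VDD and VXX are rejected.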

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], an Anusvāra [B], a Candrabindu [D], a Visarga [X], a Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB], or Candrabindu + Visarga [DX].

Examples:

CM কি, কী कि, की

CB কং कं

CD কঁ कँ

CX কঃ कः

CH ক্ क् (pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB, or DX.

Examples:

CMB কীং कीं

CMD কাঁ काँ

CMX বীঃ वीः

CMDB কাঁং काँं

CMDX কাঁঃ काँः

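5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) is joined by Hasanta (also known as Virama):

*3(CH)C

Examples:

CHC ন্ত → ন + ্ + ত | न्त → न + ् + त

CHCHC ন্ত্র → ন + ্ + ত + ্ + র | न्त्र → न + ् + त + ् + र

CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য | न्त्र्य → न + ् + त + ् + र + ् + य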
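The pattern *3(CH)C can also be checked directly against Bengali text. In this sketch, the consonant class is an assumption: the ranges U+0995 to U+09A8 and U+09AA to U+09B0, plus U+09B2, U+09B6 to U+09B9, U+09F0 and U+09F1. Each consonant may be followed by an optional nukta. The names `CLUSTER` and `is_cluster` are hypothetical.

```python
import re

# Assumed consonant class; an optional nukta (U+09BC) keeps CN counting as C.
CONSONANT = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1]\u09BC?"
VIRAMA = "\u09CD"
# *3(CH)C: zero to three consonant+virama pairs, then a final consonant
CLUSTER = re.compile(f"(?:{CONSONANT}{VIRAMA}){{0,3}}{CONSONANT}")

def is_cluster(text: str) -> bool:
    """True if text is one to four consonants joined by virama."""
    return CLUSTER.fullmatch(text) is not None

print(is_cluster("\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF"))  # ন্ত্র্য: four consonants
```

A fifth consonant, or a trailing virama, makes the whole-string match fail.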

5.4.3.5 Subsets

The combination CHC is used below as a representative example of the subsets. The same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB, or DX.

Examples:

CHCM ক্কী → ক ্ ক ী | क्की → क ् क ी

CHCB ক্কং → ক ্ ক ং | क्कं → क ् क ं

CHCD ক্কঁ → ক ্ ক ঁ | क्कँ → क ् क ँ

CHCX ক্কঃ → ক ্ ক ঃ | क्कः → क ् क ः

CHCDB ক্কঁং → ক ্ ক ঁ ং | क्कँं → क ् क ँ ं

CHCDX ক্কঁঃ → ক ্ ক ঁ ঃ | क्कँः → क ् क ँ ः

[B] *3(CH)CM may further be followed by B, D, X, DB, or DX.

Examples:

CHCMB ক্কীং → ক ্ ক ী ং | क्कीं → क ् क ी ं

CHCMD ক্কাঁ → ক ্ ক া ঁ | क्काँ → क ् क ा ँ

CHCMX ক্কীঃ → ক ্ ক ী ঃ | क्कीः → क ् क ी ः

CHCMDB ক্কাঁং → ক ্ ক া ঁ ং | क्काँं → क ् क ा ँ ं

CHCMDX ক্কাঁঃ → ক ্ ক া ঁ ঃ | क्काँः → क ् क ा ँ ः
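Subsets [A] and [B] can be combined into one pattern over ABNF symbols: up to three CH pairs, a final C, an optional M, and then at most one of B, D, X, DB or DX. This is a sketch assuming symbol-string input. The names `CONSONANT_SYLLABLE` and `is_consonant_syllable` are hypothetical.

```python
import re

# *3(CH)C [M] [B / D / X / DB / DX], written as a Python regular expression
CONSONANT_SYLLABLE = re.compile(r"(?:CH){0,3}CM?(?:D[BX]?|B|X)?")

def is_consonant_syllable(symbols: str) -> bool:
    """True if the symbol string fits subsets [A] or [B] of 5.4.3.5."""
    return CONSONANT_SYLLABLE.fullmatch(symbols) is not None

for s in ("CHCM", "CHCDX", "CHCMDB", "CHCMM", "CHCBX"):
    print(s, is_consonant_syllable(s))
```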

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z ৎ

5.4.4.2 A Khanda Ta Combination (note 10)

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'. Note: in this Khanda Ta context, C must be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
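The [CH]Z condition can be sketched as a scan over the label. Whenever a Khanda Ta directly follows a virama, the code point before that virama must be one of the two RA letters. This checks only the CH case; other predecessors of Z fall under Rule 7 of Section 7. The name `khanda_ta_ok` is hypothetical.

```python
KHANDA_TA = "\u09CE"
VIRAMA = "\u09CD"
RA_FORMS = {"\u09B0", "\u09F0"}  # Bangla RA (র) and Assamese RA (ৰ)

def khanda_ta_ok(label: str) -> bool:
    """Check the [CH]Z condition: a virama before Khanda Ta must follow RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True

print(khanda_ta_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> True
print(khanda_ta_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ -> False
```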

10 Refer to Rule P in Section 7, Table 16.

5.4.5 Special Cases S and P

Two special cases involving sequences, referred to as S and P in Table 16 under Section 7, are described briefly here. Take S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the ya-phalā together with the ā-kāra produces the /æ/ sound, as in English 'bat', 'fat', etc. By nature, the Brāhmī script does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. Here, however, Bangla orthography shows a deviant feature: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, referred to as P in Table 16, concerns the formation of repha and ra-phalā in the script. Because of its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). In both cases, the slot for ra can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā remain the same either way.
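The sketch below shows that only the RA slot differs between the two spellings. It reports repha and ra-phalā positions and accepts either RA code point. The consonant ranges are an assumption, and the names `is_consonant` and `ra_forms` are hypothetical.

```python
RA = {"\u09B0", "\u09F0"}  # Bangla RA (র) and Assamese RA (ৰ)
VIRAMA = "\u09CD"

def is_consonant(ch: str) -> bool:
    """Assumed consonant ranges for this sketch; Khanda Ta (U+09CE) is not included."""
    cp = ord(ch)
    return (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
            or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1))

def ra_forms(label: str) -> list[tuple[str, int]]:
    """Label each RA that forms a repha (RA+H+C) or a ra-phala (C+H+RA)."""
    found = []
    for i, ch in enumerate(label):
        if ch not in RA:
            continue
        if label[i + 1:i + 2] == VIRAMA and is_consonant(label[i + 2:i + 3] or " "):
            found.append(("repha", i))
        if i >= 2 and label[i - 1] == VIRAMA and is_consonant(label[i - 2]):
            found.append(("ra-phala", i))
    return found

print(ra_forms("\u0985\u09B0\u09CD\u0995"))  # অর্ক -> [('repha', 1)]
print(ra_forms("\u099A\u0995\u09CD\u09F0"))  # চক্ৰ (Assamese RA) -> [('ra-phala', 3)]
```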

6 Variants

This section discusses the variants in the Bangla script. The NBGP sorts these confusable variants into two groups:

Group 1: Confusing due to pure visual similarity.

Group 2: Confusing because they deviate from character formations as normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Cases that are confusable but not identical are not proposed, because another panel (the String Similarity Assessment Panel) is responsible for them. Cases in Group 2, however, are proposed as variants. They involve more than visual similarity: they deviate from the widely accepted norms of Bangla akshara formation and can confuse even a careful observer.

Variants arise in a script when two or more forms that look the same are stored with different code points. In Bangla, the e-kāra, the ā-kāra, and the o-kāra have different code points. One can type the o-kāra with a consonant in one keystroke, or get the same result by typing the e-kāra and the ā-kāra as two separate keys. A reader cannot tell the two instances of ko (কো) apart, one typed with a single key and the other with two. This is nevertheless not treated as a case of variants, because a kāra followed by a kāra is not allowed.
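The two ways of typing কো differ at the code-point level but are canonically equivalent in Unicode. IDNA2008 (RFC 5891) requires labels to be in Normalization Form C, so the two-key spelling is normalized to the single code point U+09CB before registration. The check below uses Python's standard `unicodedata` module. The variable names are illustrative only.

```python
import unicodedata

one_key = "\u0995\u09CB"          # ক + ো (BENGALI VOWEL SIGN O)
two_keys = "\u0995\u09C7\u09BE"   # ক + ে (e-kāra) + া (ā-kāra)

print(one_key == two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: NFC composes e-kāra + ā-kāra
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True: NFD decomposes o-kāra
```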

6.1 In-Script Variants

We do, however, propose two cases of true in-script variants in the Bangla script.

CASE I: Among true variants in Bangla, the first involves conjuncts in which থ (tha, U+09A5) appears with Hasanta after স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

Written in traditional orthography, these combinations can be confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial, or final position (see example no. 1 below). A user might also type it as হ (ha, U+09B9) by mistake, which increases the risk of errors in writing and identifying labels. Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)

Fonts that follow the traditional Bangla writing system tend to create this problem, so these may be treated as cases of variants in Bangla.

CASE II: Another example of variants arises in ra + Hasanta and Hasanta + ra combinations when labels are written in the Bangla script (for languages such as Bangla, Assamese, and Manipuri). The variant cases occur when typing 'repha' (ra + Hasanta) and 'ra-phalā' (Hasanta + ra). 'Repha' can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and 'B_RA' and 'A_RA' stand for the same phonetic element. The final ligatures look the same, as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where:

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক

U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: ক and ঠ are shared by Bangla and Assamese and look exactly the same, and the repha has the same shape whichever RA is used. The resulting combinations with repha therefore look identical. For the same reason, 'ra-phalā' can be formed by two sequences whose final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda Ta.

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ

U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phala

Because the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users, so the repha and ra-phalā cases need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This does create blocked variant labels. For example, if someone registers "ররর", the variant label "ৰৰৰ" is generated and blocked, and vice versa. The blocking applies only at the label level: other labels, e.g. "ৰৰ" or "ৰৰৰৰ", can still be registered by someone else.
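This sketch assumes the variant set {র, ৰ} is applied independently at every position, the usual permutation approach for LGRs described in RFC 7940. It enumerates the variant labels generated for "ররর", which includes "ৰৰৰ" from the example above. Dispositions such as "blocked" are not modelled, and the name `variant_labels` is hypothetical.

```python
from itertools import product

# Single variant set from 6.1 Case II: Bangla RA <-> Assamese RA
VARIANT_SET = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}

def variant_labels(label: str) -> set[str]:
    """All labels obtained by swapping any subset of RA positions (the original excluded)."""
    options = [(ch, VARIANT_SET[ch]) if ch in VARIANT_SET else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*options)} - {label}

variants = variant_labels("\u09B0\u09B0\u09B0")  # ররর
print(len(variants))  # 7 (2**3 combinations minus the original)
```

A label of a different length, such as "ৰৰ", is never produced, which matches the statement that such labels remain available.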

6.2 Cross-Script Variants

A concise cross-script study of Bangla has been carried out against sister scripts such as Devanāgarī, Gurmukhī, and Odia (note 11) (formerly Oriya). It considers the visual and technical confusion these scripts may cause when used in domain labels. Apart from the two cases in Section 6.1, Bangla has no in-script variants as far as orthography is concerned. The NBGP proposes the following characters as variants. Some other characters are somewhat similar but have not been included here; they are listed in Appendix I (10.3) for reference.

11 Unicode uses Oriya for the script, although Odia is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī

ম U+09AE | म U+092E

ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī

ম U+09AE | ਸ U+0A38

ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
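This sketch assumes a root-zone label is written in a single script. Under that assumption, a cross-script variant label exists only when every code point of a Bangla label has a counterpart in the other script. Only the two code points from Tables 14 and 15 are mapped. The names `CROSS_SCRIPT_TABLES` and `cross_script_variant_labels` are hypothetical.

```python
# Cross-script variant code points from Tables 14 and 15 (Bangla -> other script).
CROSS_SCRIPT_TABLES = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম -> ਸ, ি -> ਿ
}

def cross_script_variant_labels(label: str) -> dict[str, str]:
    """Map script name -> variant label, for scripts that cover every code point of label."""
    result = {}
    for script, table in CROSS_SCRIPT_TABLES.items():
        if label and all(ch in table for ch in label):
            result[script] = "".join(table[ch] for ch in label)
    return result

print(cross_script_variant_labels("\u09AE\u09BF"))  # মি
```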

7 Whole Label Evaluation Rules (WLE)

This section gives the WLE rules required by all the languages listed in Section 3.2 when they are written in the Bangla (note 12) script. The rules are drafted so that they can be translated easily into the LGR specification. The symbols below are used in the WLE rules for each Indic Syllabic Category, as given in the table under Code Point Repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1 or S2 (from Table 9), i.e. the (æ) ya-phalā sequence V1 H C1 M1, where V1 is either U+0985 (অ BENGALI LETTER A) or U+098F (এ BENGALI LETTER E), H is U+09CD (্ BENGALI SIGN VIRAMA), C1 is U+09AF (য BENGALI LETTER YA), and M1 is U+09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though the other context rules would not allow them.

P → Ra-Hasanta (C2 H), where C2 is either U+09B0 (র BENGALI LETTER RA) or U+09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is U+09CD (্ BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is worth noting that Bangla consonant letters (or graphemes) are physically joined to form "clusters". In theory, a cluster conjoins two to four consonants and combines them into new shapes. Dash and Chaudhuri (1998) report "nearly 380 unique consonant clusters": 290 bi-consonantal combinations, another 80 three-letter combinations, and 10 rarer four-letter ones [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the following WLE rules need to be implemented:

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ়, and য়.

2. H must be preceded by C. Example: ক্

3. M must be preceded by C. Example: কা

4. D must be preceded by V, C, or M. Examples: আঁ, খঁ, খাঁ, হাঁ

5. X must be preceded by V, C, M, or D. Examples: উঃ, খঃ, বাঃ, দঁঃ

6. B must be preceded by V, C, M, or D. Examples: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X, or P. Examples: ইৎ, কৎ, ভর্ৎসনা. (S is not listed because it ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
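The sketch below implements Rules 1 to 9 as a check of each symbol against the one before it. S and P are read as single tokens, following Table 16. It is not the normative XML form of these rules. The code-point ranges are an assumed subset of the repertoire, and all names (`tokenize`, `ALLOWED_BEFORE`, `wle_violations`) are hypothetical.

```python
# Sketch of the WLE rules of Section 7.1 (assumed repertoire subset).
NUKTA, VIRAMA = "\u09BC", "\u09CD"
RA = {"\u09B0", "\u09F0"}                                          # C2 in P (Ra-Hasanta)
S_SEQS = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # S1, S2 (Table 9)
SIGNS = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}

def symbol_of(ch):
    cp = ord(ch)
    if 0x0985 <= cp <= 0x098B or cp in (0x098F, 0x0990, 0x0993, 0x0994):
        return "V"
    if (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
            or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)):
        return "C"
    if 0x09BE <= cp <= 0x09C4 or cp in (0x09C7, 0x09C8, 0x09CB, 0x09CC):
        return "M"
    return SIGNS.get(cp)  # None: outside this sketch's repertoire

def tokenize(label):
    """Split a label into WLE symbols, reading S and P as single tokens."""
    tokens, i = [], 0
    while i < len(label):
        if label.startswith(S_SEQS, i):
            tokens.append("S")
            i += 4
        elif label[i] in RA and label[i + 1:i + 2] == VIRAMA:
            tokens.append("P")
            i += 2
        else:
            sym = symbol_of(label[i])
            if sym is None:
                raise ValueError(f"U+{ord(label[i]):04X} is outside this sketch's repertoire")
            tokens.append(sym)
            i += 1
            if sym == "C" and label[i:i + 1] == NUKTA:  # Rule 1: CN counts as C
                i += 1
    return tokens

# Rules 2-7: allowed predecessors. S ends with a kāra, so it may stand in for M.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},
}

def wle_violations(label):
    """Return a list of rule violations (an empty list means the label passes)."""
    tokens = tokenize(label)
    errors = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i else None
        if tok in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[tok]:
            errors.append(f"{tok} at token {i} cannot follow {prev or 'label start'}")
        if tok in ("V", "S") and prev in ("H", "P"):  # Rules 8 and 9 (P ends in H)
            errors.append(f"{tok} at token {i} cannot follow H")
    return errors

print(wle_violations("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> []
```

Because every rule restricts only the immediately preceding symbol, a single left-to-right pass is enough. Doubled signs (কংং) and kāra-initial labels (াক) are rejected without any extra rules.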

Each rule is elaborated below with examples from the script, with the Bangla, Assamese, and Manipuri communities in mind. Some combinations of characters may seem unrealistic or rare and may not be attested. There is still no harm in allowing such ligatures, because any user can easily create them.

7.1.1 Case of V Preceded by H

In multi-word domains, V may need to follow an H. An example is ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋk ʌv ɪndiə; U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Two different words are joined here: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community need the H to be written explicitly to represent the intended sound fully. By and large, however, the first word written without an H (U+09CD) is considered enough to represent its intended sound.

This situation arises only because the hyphen, the space, and the Zero Width Non-Joiner are not in the permissible character set of the Root Zone repertoire. Otherwise, V never needs to follow an H. For most of the linguistic communities, permitting this could make two labels (with and without H) look alike, so the NBGP explicitly prohibits it. A future NBGP may revisit this rule if the community's requirements change.

7.2 Additional Examples from Bangla ABNF

The examples below illustrate some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D, or X cannot occur at the beginning of a Bangla word. Examples:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

Such combinations automatically show a "golu", or dotted circle, which marks them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied almost automatically wherever the OS supports it.
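The dotted circle appears because a label that starts with a combining mark leaves that mark without a base letter. In Unicode, H, M, B, D, and X in this script all have general category Mn or Mc. A leading-mark check can therefore use the general category from Python's standard `unicodedata` module. Khanda Ta (ৎ, U+09CE) has category Lo, so this check alone does not cover Z. The function name is hypothetical.

```python
import unicodedata

def starts_with_combining_mark(label: str) -> bool:
    """True if the label's first code point has general category Mn, Mc or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")

for example in ("\u09CD\u0995", "\u09BE\u0995", "\u0982\u0995", "\u0981\u0995", "\u0983\u0995", "\u0995"):
    print(" ".join(f"U+{ord(c):04X}" for c in example), starts_with_combining_mark(example))
```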

2. H is not permitted after V, B, D, X, M, or S. Examples:

অ্ अ्

অং্ अं्

কঁ্ कँ्

কঃ্ कः्

কি্ कि्

3. After a consonant, a vowel, or a kāra (Mātrā), only one B, D, or X is permitted, so the following combinations are invalid. Examples:

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कीःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. Only one M is permitted after a consonant. Example: কীী कीी

5. M is not permitted after V. Examples: ইা, ঈৗ ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permitted. Examples:

কংঃ कंः

কঃং कःं
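Examples 2 to 6 above can also be read as a list of forbidden adjacent pairs of ABNF symbols. The sketch below encodes those pairs as one regular expression over a symbol string, under that reading. The names `FORBIDDEN_PAIRS` and `forbidden_pairs` are hypothetical.

```python
import re

# Adjacent symbol pairs rejected by examples 2-6 of 7.2:
# H after V/B/D/X/M/S; doubled B, D, X or M; M after V; B+X and X+B.
FORBIDDEN_PAIRS = re.compile(r"[VBDXMS]H|BB|DD|XX|MM|VM|BX|XB")

def forbidden_pairs(symbols: str) -> list[str]:
    """Return the forbidden adjacent pairs found in a symbol string such as 'CMDB'."""
    return [m.group() for m in FORBIDDEN_PAIRS.finditer(symbols)]

for s in ("VH", "CBB", "CMM", "VM", "CBX", "CHCMDX"):
    print(s, forbidden_pairs(s))
```

The valid combinations DB and DX (for example CHCMDX) contain no forbidden pair, so the result for them is an empty list.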

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon (Pachgaon, Manesar), PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan o Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti o Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'Phones of Bengali'. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has Happened So Far in Terms of Script Reforms'. Paper presented at the Face-to-Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic. When it is applied to a specific language or script, certain restriction rules apply, because some of its structures do not necessarily hold in a given language. These restriction rules fine-tune the ABNF. For Bangla (note 13) in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Bangla also does not allow ya (য়) at the beginning of a word. A couple of native exceptions can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' by Kazi Nazrul Islam. It also appears word-initially in names, mostly of foreign origin, such as Yaqub, which may be written with an initial ya (য়) as য়াকুব. Recently, when some Chinese and Japanese names are transliterated into Bangla, Khaṇḍa ta (ৎ) followed by sa (স) also appears at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only where C is ra (র, U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM are allowed:

→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)

→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
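Restrictions 1 and 3 above can be sketched as two simple checks. The first rejects labels that start with ৎ, ঞ, or ঙ. The second accepts a vowel letter followed by virama only when it begins অ্যা or এ্যা. The vowel range U+0985 to U+0994 is an assumption, and `appendix_violations` is a hypothetical name. Restriction 2 is the Khanda Ta condition from 5.4.4.2 and is not repeated here.

```python
NOT_INITIAL = {"\u09CE", "\u099E", "\u0999"}  # ৎ, ঞ, ঙ
ALLOWED_VHCM = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # অ্যা, এ্যা
VIRAMA = "\u09CD"

def is_vowel_letter(ch: str) -> bool:
    # Independent vowel letters, assumed here to be U+0985..U+0994.
    return 0x0985 <= ord(ch) <= 0x0994

def appendix_violations(label: str) -> list[str]:
    """Check Appendix I restrictions 1 (forbidden initials) and 3 (only অ্যা / এ্যা as VHCM)."""
    problems = []
    if label[:1] in NOT_INITIAL:
        problems.append(f"label may not begin with U+{ord(label[0]):04X}")
    for i in range(len(label) - 1):
        if is_vowel_letter(label[i]) and label[i + 1] == VIRAMA:
            if not label.startswith(ALLOWED_VHCM, i):
                problems.append(f"vowel + virama at position {i} is not অ্যা or এ্যা")
    return problems

print(appendix_violations("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং: one problem
```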

13 This section deals specifically with restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri are not covered in this section.

10.2 'Sylheti Nagarī Lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi), which accountants (perhaps of the Kāyastha community) in eastern Uttar Pradesh and Bihar used widely during the 1880s. Sylheti Nagarī or Siloti was also known by several other names [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī', or 'Muhammad Nagarī'. Shah Jalal is said to have brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was invented by the Afghan rulers of Sylhet [137]. Others credit Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed. Each was found to be either (a) distinguishable, or (b) confusable but not enough to be defined as a variant code point.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision

ঃ U+0983 | ः U+0903 | Confusable

ও U+0993 | उ U+0909 | Confusable

ঘ U+0998 | घ U+0918 | Confusable

ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision

ঘ U+0998 | ਬ U+0A2C | Confusable

ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision

ও U+0993 | ਤ U+0A24 | Distinguishable

শ U+09B6 | ਅ U+0A05 | Distinguishable

ম U+09AE | ਮ U+0A2E | Distinguishable

বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision

ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision

ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali Consonants and Their Allographs

Consonants | Phonetic Value | Allographs

Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র). There is a marked form of ত + ্ = ৎ: ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts, ঙ is replaced by ং.) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

স s, ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant in a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). But after a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক8(p+য)স8(+য)র 8(+য) [Just র‍্য is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala and a-kara, constitute a vowel sign representing the vowel অ্যা.]

র r Two manifestations:

(i) রেফ (repʰ) as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant)

(ii) র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


Sr No    Unicode Code Points    Sequence    Character Names    Example languages using the code point (not exhaustive list)    Reference

S1    0985 09CD 09AF 09BE    অ্যা    BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA    Bangla, Assamese    [112] [122]

S2    098F 09CD 09AF 09BE    এ্যা    BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA    Bangla    [112]

Table 9 – Sequences
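The short Python sketch below makes Table 9 concrete. It spells S1 and S2 out as code points and recovers their Unicode character names with the standard `unicodedata` module. The dictionary `TABLE_9` and the helper `describe` are illustrative names. They are not part of the LGR XML.

```python
import unicodedata

# Table 9 sequences as code points (illustrative layout, not the LGR XML format).
TABLE_9 = {
    "S1": [0x0985, 0x09CD, 0x09AF, 0x09BE],  # অ + ্ + য + া
    "S2": [0x098F, 0x09CD, 0x09AF, 0x09BE],  # এ + ্ + য + া
}


def describe(seq_id):
    """Return the string for a Table 9 sequence and its Unicode character names."""
    cps = TABLE_9[seq_id]
    text = "".join(chr(cp) for cp in cps)
    names = [unicodedata.name(chr(cp)) for cp in cps]
    return text, names


for seq_id in TABLE_9:
    text, names = describe(seq_id)
    print(seq_id, text, "; ".join(names))
```

Both sequences are already in NFC. Normalization therefore leaves them unchanged, and they can be compared code point by code point.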

5.2 Code Point Repertoire Exclusion

Some characters of the Bangla script are encoded in Unicode but have not been included in the repertoire of this LGR proposal. ঌ (U+098C) and ৗ (U+09D7) are excluded because they are rare and obsolete characters.

Sr No    Code Points    Glyph    Character Names    Note

1    U+098C    ঌ    BENGALI LETTER VOCALIC L    Limited or declining use

2    U+09D7    ৗ    BENGALI AU LENGTH MARK    Limited or declining use

Table 10 – Excluded Code Points

5.3 Code Point Not Used Alone

BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire because it is never used alone. It is used only as part of the sequences that make up the normalized forms of the three special characters ড় ঢ় য়.

Unicode Code Point    Glyph    Character Name    Reason for exclusion

U+09BC    ◌়    BENGALI SIGN NUKTA    Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড় ঢ় য় respectively.

Table 10b – Excluded Code Points
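The exclusions in Tables 10 and 10b can be checked mechanically. The sketch below assumes that labels are compared in Unicode Normalization Form C (NFC). The precomposed letters U+09DC, U+09DD and U+09DF are canonical composition exclusions, so NFC turns each of them into base letter + U+09BC. This matches the statement above that the nukta appears only in those normalized forms. The helper name `check_nukta_and_exclusions` is hypothetical.

```python
import unicodedata

EXCLUDED = {0x098C, 0x09D7}                   # Table 10: ঌ and ৗ
NUKTA = "\u09BC"                              # Table 10b: never used alone
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড, ঢ, য


def check_nukta_and_exclusions(label):
    """List repertoire problems in an NFC-normalized label (illustrative only)."""
    label = unicodedata.normalize("NFC", label)
    problems = []
    for i, ch in enumerate(label):
        if ord(ch) in EXCLUDED:
            problems.append(f"U+{ord(ch):04X} is excluded from the repertoire")
        elif ch == NUKTA and (i == 0 or label[i - 1] not in NUKTA_BASES):
            problems.append(f"nukta at index {i} is not attached to ড, ঢ or য")
    return problems


# Under NFC, each precomposed letter becomes base letter + nukta.
for precomposed in ("\u09DC", "\u09DD", "\u09DF"):
    nfc = unicodedata.normalize("NFC", precomposed)
    print(" ".join(f"U+{ord(c):04X}" for c in nfc))
```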

5.4 The Basis of the Present IDN

The present LGR has also benefited from the earlier work on IDN for Bangla (different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables

The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following operators:

Sr Number    Operator    Function

1    “|”    Alternative

2    “[]”    Optional

3    “*”    Variable Repetition

4    “()”    Sequence Group

Table 11 – The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence

The Vowel Sequence and the Consonant Sequence pertinent to Bangla are given below. To help users of other Brahmi scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. The vowel may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). More than one of D, B or X may follow a V in Bangla. However, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. Two kinds of combination are valid and possible. One is an anusvara followed by a chandrabindu, as in হ্যাংচা 'hæŋchā'. The other is a visarga followed by a chandrabindu, as in হ্যাঃ 'hæŋ'. A Visarga or Anusvāra (onuʃʃār) may also follow a Candrabindu in Bangla.

A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phala. “Ya-phala” is a presentation form of U+09AF, the Bangla letter য or 'ya'. It is represented by the sequence <U+09CD ্ BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>, and it has a special (subjoined) form. When Ya-phala is combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. 'aa' (ā)), the result is used for transcribing [æ]. An example is the “a” of the English word “bat”, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V    অ    अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by any one of the following:
- Anusvāra [B]
- Candrabindu [D]
- Visarga [X]
- Candrabindu + Anusvāra [DB]
- Candrabindu + Visarga [DX]
- Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M]

Examples:

VB    অং    अं

VD    অঁ    अँ

VX    অঃ    अः

VDB    অঁং    अँं

VDX    অঁঃ    अँः

VHCM    অ্যা, এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by any one of the following:
- Anusvāra [B]
- Candrabindu [D]
- Visarga [X]
- Candrabindu + Anusvāra [DB]
- Candrabindu + Visarga [DX]

Examples:

VHCMB    অ্যাং, এ্যাং

VHCMD    অ্যাঁ, এ্যাঁ

VHCMX    অ্যাঃ, এ্যাঃ

VHCMDB    অ্যাঁং, এ্যাঁং

VHCMDX    অ্যাঁঃ, এ্যাঁঃ
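The vowel-sequence ABNF above can be approximated with a regular expression. This is a sketch under three stated assumptions:
- The vowel class `V` is an assumed subset of the repertoire: U+0985 to U+098B, U+098F, U+0990, U+0993 and U+0994.
- Only the sign combinations enumerated in 5.4.2.2 (B, D, X, DB, DX) are accepted.
- VHCM is restricted to অ্যা and এ্যা, following rule 3 of Appendix 10.1.

```python
import re

# Assumed character classes (a subset of the repertoire, for illustration only).
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"  # independent vowels; U+098C excluded
B, D, X = "\u0982", "\u0981", "\u0983"          # anusvara, candrabindu, visarga
YA_PHALA_AA = "\u09CD\u09AF\u09BE"              # H + য + া, the HCM tail of VHCM

# Optional trailing signs from 5.4.2.2: DB | DX | B | D | X
SIGNS = f"(?:{D}{B}|{D}{X}|{B}|{D}|{X})?"

# V [signs]  |  (অ or এ) H C M [signs]
VOWEL_SEQUENCE = re.compile(f"(?:{V}|[\u0985\u098F]{YA_PHALA_AA}){SIGNS}")


def is_vowel_sequence(text):
    """True if `text` is exactly one vowel sequence under this sketch."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None


for sample in ["\u0985\u0982", "\u0985\u0981\u0983", "\u098F\u09CD\u09AF\u09BE\u0982", "\u0985\u0982\u0982"]:
    print(sample, is_vowel_sequence(sample))
```

`fullmatch` makes the whole string match a single vowel sequence, so the invalid repetition VBB (অংং) is rejected.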

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C    ক    क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by any one of the following:
- a dependent vowel sign, kāra (Mātrā) [M]
- Anusvāra [B]
- Candrabindu [D]
- Visarga [X]
- Hasanta (also known as Virama) [H]
- Candrabindu + Anusvāra [DB]
- Candrabindu + Visarga [DX]

Example:

CM    কি, কী    कि, की

CB    কং    कं

CD    কঁ    कँ

CX    কঃ    कः

CH    ক্    क् (pure consonant)

CDB    কঁং    कँं

CDX    কঁঃ    कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Example:

CMB    কীং    कीं

CMD    কাঁ    काँ

CMX    বীঃ    वीः

CMDB    কাঁং    काँं

CMDX    কাঁঃ    काँः

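5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

*3(CH)C

Example:

CHC    ন্ত → ন + ্ + ত    न्त → न + ् + त

CHCHC    ন্ত্র → ন + ্ + ত + ্ + র    न्त्र → न + ् + त + ् + र

CHCHCHC    ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য    न्त्र्य → न + ् + त + ् + र + ् + य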

5.4.3.5 Subsets

The combination CHC is used below as a representative example of these subsets; the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:

CHCM    ক্কী → ক + ্ + ক + ী    क्की → क + ् + क + ी

CHCB    ক্কং → ক + ্ + ক + ং    क्कं → क + ् + क + ं

CHCD    ক্কঁ → ক + ্ + ক + ঁ    क्कँ → क + ् + क + ँ

CHCX    ক্কঃ → ক + ্ + ক + ঃ    क्कः → क + ् + क + ः

CHCDB    ক্কঁং → ক + ্ + ক + ঁ + ং    क्कँं → क + ् + क + ँ + ं

CHCDX    ক্কঁঃ → ক + ্ + ক + ঁ + ঃ    क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example:

CHCMB    ক্কীং → ক + ্ + ক + ী + ং    क्कीं → क + ् + क + ी + ं

CHCMD    ক্কাঁ → ক + ্ + ক + া + ঁ    क्काँ → क + ् + क + ा + ँ

CHCMX    ক্কীঃ → ক + ্ + ক + ী + ঃ    क्कीः → क + ् + क + ी + ः

CHCMDB    ক্কাঁং → ক + ্ + ক + া + ঁ + ং    क्काँं → क + ् + क + ा + ँ + ं

CHCMDX    ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ    क्काँः → क + ् + क + ा + ँ + ः
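The consonant-sequence patterns of 5.4.3 can also be sketched as one regular expression. A consonant sequence is either a pure consonant (CH), or up to four consonants joined by H, followed by an optional kāra and an optional B/D/X/DB/DX. The consonant and kāra classes below are assumptions for illustration. They include the nukta forms ড় ঢ় য় as base + U+09BC, and they are not the normative repertoire.

```python
import re

# Assumed classes for illustration; the normative repertoire is in the LGR XML.
C = ("(?:[\u09A1\u09A2\u09AF]\u09BC"                       # ড় ঢ় য় as base + nukta
     "|[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1])")
M = "[\u09BE-\u09C3\u09C7\u09C8\u09CB\u09CC]"              # kāra (dependent vowel signs)
H = "\u09CD"                                               # hasanta / virama
SIGNS = "(?:\u0981\u0982|\u0981\u0983|[\u0981\u0982\u0983])?"  # DB | DX | B | D | X

# Either a pure consonant (C H), or *3(C H) C [M] [signs].
CONSONANT_SEQUENCE = re.compile(
    f"(?:{C}{H}|(?:{C}{H}){{0,3}}{C}{M}?{SIGNS})"
)


def is_consonant_sequence(text):
    """True if `text` is exactly one consonant sequence under this sketch."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None


print(is_consonant_sequence("\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF"))  # ন্ত্র্য
```

The `{0,3}` bound encodes the "up to 4" consonant limit. A fifth consonant therefore makes the match fail.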

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda-Ta (Z)

Example: Z    ৎ

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.

Note: In this context, the condition for KHANDA TA is that C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
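The [CH]Z condition can be checked directly on code points. The sketch below flags any ৎ that follows a hasanta whose consonant is not র or ৰ. It checks only this one condition; the other positions of Z are governed by the WLE rules in Section 7. `khanda_ta_violations` is a hypothetical helper name.

```python
KHANDA_TA = "\u09CE"
HASANTA = "\u09CD"
RA_FORMS = {"\u09B0", "\u09F0"}  # Bangla ra র, Assamese ra ৰ


def khanda_ta_violations(label):
    """Return indices of ৎ preceded by a hasanta that is not attached to র or ৰ."""
    bad = []
    for i, ch in enumerate(label):
        if ch != KHANDA_TA:
            continue
        if i >= 1 and label[i - 1] == HASANTA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                bad.append(i)
    return bad


print(khanda_ta_violations("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
```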

5.4.5 Special Cases S and P

Two special cases involving sequences are referred to as S and P in Table 16 under Section 7, and can be described briefly here. Let us take up S first. Notably, there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel: U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E. To render Ya-phalā after অ or এ, one types U+09CD plus U+09AF ya after the vowel. This is a purely ligatural entity. Adding Ya-phalā and ā-kāra elicits the æ sound, as in English 'bat', 'fat', etc.

The Brāhmī script by nature does not place a Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. Bangla orthography nevertheless has a deviant feature here: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case, referred to as P in Table 16, is the formation of repha and ra-phalā in this script. Both arise from the co-occurrence of RA with HASANTA:
- In REPHA, RA loses its own implicit vowel: repha = ra + Hasanta + C. For example, র্ক (ra + Hasanta + ka) occurs in অর্ক arka 'the sun'.
- In RA-PHALĀ, the Hasanta suppresses the implicit vowel of the consonant preceding RA: ra-phalā = C + Hasanta + ra. For example, ক্র (ka + Hasanta + ra) occurs in চক্র cakra 'cycle'.

In both cases the ra slot could hold either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā remain the same either way.

10 Refer to Rule P in Section 7, Table 16.
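The two possible ra slots are easiest to see at the code-point level. In the sketch below, `repha` and `ra_phala` are hypothetical helpers that build the sequences described above. With a suitable font, the Bangla-ra and Assamese-ra spellings render with the same repha or ra-phalā shape. Their code points still differ, and that difference is what makes them confusable.

```python
BANGLA_RA = "\u09B0"    # র
ASSAMESE_RA = "\u09F0"  # ৰ
HASANTA = "\u09CD"


def repha(consonant, ra=BANGLA_RA):
    """ra + Hasanta + C, displayed as a repha over C."""
    return ra + HASANTA + consonant


def ra_phala(consonant, ra=BANGLA_RA):
    """C + Hasanta + ra, displayed as a ra-phalā under C."""
    return consonant + HASANTA + ra


def code_points(text):
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for ra in (BANGLA_RA, ASSAMESE_RA):
    print(code_points(repha("\u0995", ra)), "|", code_points(ra_phala("\u0995", ra)))
```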

6 Variants

This section discusses variants in the Bangla script. The NBGP sorts these confusable variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing because they deviate from character formations as normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Confusable but non-identical cases are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed as variants. These cases involve more than visual similarity: they deviate from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer, which is why they are proposed as variants.

Variants arise in a script when two or more forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one keystroke, or get the same visual result by typing e-kāra and ā-kāra as two separate keys. A reader cannot tell apart the two ko (কো), one typed with a single key and the other with two keys. However, this is not treated as a case of variant, because a kāra followed by a kāra is not allowed.
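The কো example can be checked with Unicode normalization. One fact is not stated in the proposal: U+09CB BENGALI VOWEL SIGN O has the canonical decomposition U+09C7 U+09BE. The two typed spellings therefore become identical once a label is normalized to NFC. The helper name `same_after_nfc` is illustrative.

```python
import unicodedata

one_key = "\u0995\u09CB"          # ক + ো (U+09CB BENGALI VOWEL SIGN O)
two_keys = "\u0995\u09C7\u09BE"   # ক + ে (U+09C7) + া (U+09BE)


def same_after_nfc(a, b):
    """True if the two spellings normalize to the same NFC string."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(one_key == two_keys)                # False: different code points
print(same_after_nfc(one_key, two_keys))  # True: U+09CB decomposes to U+09C7 U+09BE
```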

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I

Among true variants in Bangla, we may draw attention to conjuncts of Hasanta with থ (tha, U+09A5). These appear with স (sa, U+09B8) and with ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus

স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus

ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

In traditional orthography, these combinations can be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). A user may also mistakenly type হ (ha, U+09B9), which increases the risk of errors in writing and identifying labels.

Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)

Fonts that represent the traditional Bangla writing system tend to create this problem. These may therefore be taken as cases of variants in Bangla.

CASE II

Another interesting case of variants arises with the ra + Hasanta and Hasanta + ra combinations. It affects labels in the Bangla script for languages such as Bangla, Assamese and Manipuri. The variants arise when typing 'repha' (ra + Hasanta) and 'ra-phalā' (Hasanta + ra). 'Repha' can be formed by two sequences. This is mainly because Assamese and Bangla share the same Unicode block, and 'B_RA' and 'A_RA' refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where:

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (Using Bangla RA)    Ligature 1    Sequence 2 (Using Assamese RA)    Ligature 2

U+09B0 (র) U+09CD (্) U+0995 (ক)    র্ক    U+09F0 (ৰ) U+09CD (্) U+0995 (ক)    ৰ্ক

U+09B0 (র) U+09CD (্) U+09A0 (ঠ)    র্ঠ    U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ)    ৰ্ঠ

Table 12 – Example of Repha

Note: Bangla and Assamese ক and ঠ look exactly the same, so the resulting combinations with Repha also look identical; adding Repha makes no difference. 'Ra-phalā' can be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (Using Bangla RA)    Ligature 1    Sequence 2 (Using Assamese RA)    Ligature 2

U+0995 (ক) U+09CD (্) U+09B0 (র)    ক্র    U+0995 (ক) U+09CD (্) U+09F0 (ৰ)    ক্ৰ

U+09A8 (ন) U+09CD (্) U+09B0 (র)    ন্র    U+09A8 (ন) U+09CD (্) U+09F0 (ৰ)    ন্ৰ

Table 13 – Example of Ra-phala

The Assamese and Bangla Repha and Ra-phala conjunct forms look the same, which could confuse end users. The repha and ra-phala cases therefore need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will create blocked variant labels. For example, if someone registers “ররর”, the label “ৰৰৰ” will be generated as a variant and blocked, and vice versa. The block applies only at the label level. If someone else needs to register other labels, e.g. “ৰৰ” or “ৰৰৰৰ”, that is still possible.
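Below is a minimal sketch of how variant labels could be enumerated for the in-script cases above, under two assumptions:
- র/ৰ form a single variant pair at the code-point level, as the NBGP concluded.
- The CASE I conjunct pairs স্থ/স্হ and ন্থ/ন্হ act as sequence-level variants. This is assumed for illustration only.

The sketch does not model the variant types (blocked or allocatable) recorded in the LGR XML. `variant_labels` is a hypothetical helper.

```python
# Variant pairs: র/ৰ (CASE II) and, as an illustrative assumption,
# the CASE I conjunct pairs স্থ/স্হ and ন্থ/ন্হ.
VARIANT_PAIRS = [
    ("\u09B0", "\u09F0"),
    ("\u09B8\u09CD\u09A5", "\u09B8\u09CD\u09B9"),
    ("\u09A8\u09CD\u09A5", "\u09A8\u09CD\u09B9"),
]


def variant_labels(label):
    """Return the label plus every label reachable by swapping variant sequences."""
    alternatives = {}
    for a, b in VARIANT_PAIRS:
        alternatives.setdefault(a, set()).update((a, b))
        alternatives.setdefault(b, set()).update((a, b))
    # Try longer sequences first so that স্থ is matched before single code points.
    keys = sorted(alternatives, key=len, reverse=True)
    results = set()

    def walk(i, prefix):
        if i == len(label):
            results.add(prefix)
            return
        for key in keys:
            if label.startswith(key, i):
                for alt in alternatives[key]:
                    walk(i + len(key), prefix + alt)
                return
        walk(i + 1, prefix + label[i])

    walk(0, "")
    return results


print(sorted(variant_labels("\u09B0\u09B0\u09B0")))
```

For “ররর” this yields eight labels, every mix of র and ৰ, including “ৰৰৰ”. Labels of a different length, such as “ৰৰ”, are never produced. This agrees with the observation above that such labels remain registrable.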

6.2 Cross-Script Variants

A focused cross-script study has been carried out for Bangla against its sister scripts Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya). The study considered the visual and technical confusions these scripts may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here; they are listed in the Appendix (10.3) for reference.

11 Unicode uses Oriya for the script although Odia is now the official term used

1. Bangla and Nāgarī/Devanāgarī Script

Bangla    Devanāgarī

ম U+09AE    म U+092E

ি U+09BF    ि U+093F

Table 14 – Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla    Gurmukhī

ম U+09AE    ਸ U+0A38

ি U+09BF    ਿ U+0A3F

Table 15 – Bangla and Gurmukhī cross-script variant code points
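Tables 14 and 15 map single code points. The sketch below assumes that a whole Bangla label has a cross-script variant label only when every one of its code points has a counterpart in the other script. The dictionary layout and `cross_script_variant` are hypothetical, and the sketch does not show how the LGR XML encodes cross-script variants.

```python
# Cross-script variant code points from Tables 14 and 15.
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম→म, ি→ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম→ਸ, ি→ਿ
}


def cross_script_variant(label, script):
    """Return the label rewritten in `script` if every code point maps, else None."""
    table = CROSS_SCRIPT[script]
    if all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


print(cross_script_variant("\u09AE\u09BF", "Devanagari"))  # মি
```

A label such as মা (with া U+09BE) has no counterpart under these two tables, so no cross-script variant is produced for it.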

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules required by all the languages mentioned in Section 3.2 when they are written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the code point repertoire (Section 5.1):

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9), i.e. the (æ) Ya-phalā sequence V1 H C1 M1, where:
- V1 is either 0985 (অ, BENGALI LETTER A) or 098F (এ, BENGALI LETTER E)
- H is 09CD (্, BENGALI SIGN VIRAMA)
- C1 is 09AF (য, BENGALI LETTER YA)
- M1 is 09BE (া, BENGALI VOWEL SIGN AA)
S1 and S2 are valid even though the other context rules do not allow them.

P → Ra-Hasanta (C2 H), where:
- C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL)
- H is 09CD (্, BENGALI SIGN VIRAMA)

Table 16 – Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that Bangla consonant letters (or graphemes) are physically joined to form “clusters”. Such clusters can theoretically combine two to four consonants and create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”. Of these, 290 are bi-consonantal combinations, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules to be implemented are as follows:

1. C is the set of C and CN, where CN is the set of normalized forms of ড় ঢ় য়.

2. H must be preceded by C. Example: ক্

3. M must be preceded by C. Example: কা

4. D must be preceded by V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ

5. X must be preceded by V, C, M or D. Example: উঃ, খঃ, বাঃ

6. B must be preceded by V, C, M or D. Example: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details are in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare. There is still no harm in allowing such ligatures, because any user can easily create them, even if the combinations are not attested.

7.1.1 Case of V Preceded by H

Multi-word domains may need a V to follow an H. An example is ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Here two different words are joined together. The first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit H to represent the intended sound fully. By and large, however, the first word without an H (U+09CD) is considered enough to represent its intended sound.

This situation arises only because the hyphen, the space and the Zero Width Non-Joiner are not in the permissible character set of the Root Zone repertoire. Otherwise, V never needs to follow an H. For most linguistic communities, permitting it could make two labels (with and without H) appear similar. The NBGP therefore explicitly prohibits it. If community requirements change, a future NBGP may consider revisiting this rule.
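The rules in 7.1 translate fairly directly into a context check on a tokenized label. The sketch below is one possible reading of them, not the normative LGR XML. It makes these assumptions:
- The repertoire subset is the one listed in the code.
- Input is normalized to NFC, so ড় ঢ় য় become base + nukta tokens of category C (rule 1).
- S is a single token that behaves like M for rules 4 to 7, since S ends with M.
- P is recognized when Z is preceded by র or ৰ plus H.

All names are illustrative.

```python
import unicodedata

# Assumed repertoire subset for this sketch (the normative sets are in the LGR XML).
CONSONANTS = {chr(cp) for cp in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1),
                                 0x09B2, *range(0x09B6, 0x09BA), 0x09F0, 0x09F1]}
NUKTA_FORMS = {"\u09A1\u09BC", "\u09A2\u09BC", "\u09AF\u09BC"}  # rule 1: CN
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
MATRAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C7\u09C8\u09CB\u09CC")
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # S (Table 16)
RA = {"\u09B0", "\u09F0"}                                              # C2 in P

# Categories allowed immediately before each category (rules 2 to 7).
# S ends with M, so it is accepted wherever M is.
ALLOWED_PREV = {
    "H": {"C"},                                     # rule 2
    "M": {"C"},                                     # rule 3
    "D": {"V", "C", "M", "S"},                      # rule 4
    "X": {"V", "C", "M", "D", "S"},                 # rule 5
    "B": {"V", "C", "M", "D", "S"},                 # rule 6
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},  # rule 7
}


def tokenize(label):
    """Split an NFC-normalized label into (category, text) tokens."""
    label = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            tokens.append(("S", label[i:i + 4]))
            i += 4
            continue
        if label[i:i + 2] in NUKTA_FORMS:
            tokens.append(("C", label[i:i + 2]))
            i += 2
            continue
        ch = label[i]
        if ch in CONSONANTS:
            cat = "C"
        elif ch in VOWELS:
            cat = "V"
        elif ch in MATRAS:
            cat = "M"
        elif ch in SIGNS:
            cat = SIGNS[ch]
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        tokens.append((cat, ch))
        i += 1
    return tokens


def wle_violations(label):
    """Return readable WLE violations; an empty list means the label passes."""
    tokens = tokenize(label)
    errors = []
    for i, (cat, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        # Rule 7: র/ৰ followed by H counts as P when it precedes khanda ta.
        if cat == "Z" and prev == "H" and i >= 2 and tokens[i - 2][1] in RA:
            prev = "P"
        if cat in ALLOWED_PREV and prev not in ALLOWED_PREV[cat]:
            errors.append(f"{cat} at token {i} cannot follow {prev or 'the label start'}")
        if cat in ("V", "S") and prev == "H":  # rules 8 and 9
            errors.append(f"{cat} at token {i} cannot follow H")
    return errors
```

Under this reading, the label ব্যাঙ্কঅফ্ইন্ডিয়া from 7.1.1 produces exactly one violation: ই follows H (rule 8). This is in line with the NBGP's decision to prohibit that case.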

7.2 Additional Examples from Bangla ABNF

Below are a few examples that illustrate some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক    ्क

াক    ाक

ংক    ंक

ঁক    ँक

ঃক    ःक

As can be seen, such a combination automatically renders with a “golu” or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian-language syllable, and the OS applies it quasi-automatically wherever it is supported.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্    अ्

অং্    अं्

কঁ্    कँ्

কঃ্    कः्

কি্    कि्

3. After a consonant, vowel or kāra (Mātrā), at most one B, D or X is permitted. The following combinations are therefore invalid:

Example:

কংং    कंं

কঁঁ    कँँ

কঃঃ    कःः

কাঁঁ    काँँ

কীঃঃ    कीःः

অংং    अंं

অঁঁ    अँँ

অঃঃ    अःः

4. At most one M is permitted after a consonant. Example:

কীী    कीी

5. M is not permitted after V. Example:

ইা, ঈৗ    ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ    कंः

কঃং    कःं
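The Section 7.2 restrictions can also be expressed as a blacklist of adjacent categories, which is a convenient way to test labels against them. This sketch makes two deliberate simplifications:
- The categorization is coarse. Any code point that is not a vowel, a matra or one of the four signs counts as C.
- The S sequences অ্যা and এ্যা are exempted by treating each one as a single S token.

The names are illustrative.

```python
S_SEQUENCES = ("\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE")  # অ্যা, এ্যা


def category(ch):
    """Coarse category of one code point (sketch only)."""
    cp = ord(ch)
    if ch == "\u09CD":
        return "H"
    if ch == "\u0982":
        return "B"
    if ch == "\u0981":
        return "D"
    if ch == "\u0983":
        return "X"
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x09BE <= cp <= 0x09CC or cp == 0x09D7:
        return "M"
    return "C"  # everything else is treated as a consonant here


def categories(label):
    """Categorize a label, collapsing অ্যা / এ্যা into one S token."""
    cats, i = [], 0
    while i < len(label):
        if label.startswith(S_SEQUENCES, i):
            cats.append("S")
            i += 4
        else:
            cats.append(category(label[i]))
            i += 1
    return cats


FORBIDDEN_INITIAL = {"H", "M", "B", "D", "X"}                           # example 1
FORBIDDEN_PAIRS = {
    ("V", "H"), ("B", "H"), ("D", "H"), ("X", "H"), ("M", "H"), ("S", "H"),  # 2
    ("B", "B"), ("D", "D"), ("X", "X"),                                      # 3
    ("M", "M"), ("S", "M"),                                                  # 4
    ("V", "M"),                                                              # 5
    ("B", "X"), ("X", "B"),                                                  # 6
}


def violations(label):
    """Return the forbidden start category and adjacent pairs found in a label."""
    cats = categories(label)
    found = []
    if cats and cats[0] in FORBIDDEN_INITIAL:
        found.append(("start", cats[0]))
    for pair in zip(cats, cats[1:]):
        if pair in FORBIDDEN_PAIRS:
            found.append(pair)
    return found
```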

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo, Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 12 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A., and Munier Chowdhury. 1960. 'Phones of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply, because some of its structures do not necessarily hold in a given language. These restrictions handle such cases and help to fine-tune the ABNF. For Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Bangla does not allow ya (য়) at the beginning of a word either, though a couple of native examples can be cited. One is the word য়্যাব্বড়ো (yæbbɔṛo) from the poem 'Lichuchor' by Kazi Nazrul Islam. Word-initial ya (য়) is also used in names, mostly of foreign origin, such as Yaqub (as in য়াকুব). More recently, transliterations of some Chinese and Japanese names into Bangla have produced a word-initial Khaṇḍata (ৎ) followed by sa (স), e.g. ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only when C is ra (র) (09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:

→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)

→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
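Rules 1 and 3 above can be sketched as two simple checks. Rule 2 (Khanda Ta after CH) is left out of this sketch, and so is word-initial য়, because the text records native and borrowed exceptions for it. The helper name `appendix_i_violations` and the vowel set are assumptions made for illustration.

```python
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")  # assumed subset
HASANTA = "\u09CD"
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}                      # rule 1: ৎ, ঞ, ঙ
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # rule 3: অ্যা, এ্যা


def appendix_i_violations(label):
    """Apply rules 1 and 3 of Appendix I (10.1) to a label; returns messages."""
    problems = []
    if label and label[0] in FORBIDDEN_INITIAL:
        problems.append(f"rule 1: {label[0]} may not begin a label")
    for i, ch in enumerate(label):
        # A vowel followed by hasanta is only acceptable as অ্যা or এ্যা.
        if ch in VOWELS and label[i + 1:i + 2] == HASANTA and label[i:i + 4] not in ALLOWED_VHCM:
            problems.append(f"rule 3: V + H at index {i} is not অ্যা or এ্যা")
    return problems


print(appendix_i_violations("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
```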

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924). Kaithī was used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and was widely in use during the 1880s. Sylheti Nagarī or Siloti has had several other names [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' and 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138]. Some have suggested instead that the Afghan rulers of Sylhet invented it [137], and some credit Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed. Each was judged either (a) distinguishable or (b) confusable, but not confusable enough to be defined as a variant code point.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla    Devanāgarī    NBGP Decision

ঃ U+0983    ः U+0903    Confusable

ও U+0993    उ U+0909    Confusable

ঘ U+0998    घ U+0918    Confusable

ঁ U+0981    ॅ U+0945    Confusable

Table 18 – Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla    Gurmukhi    NBGP Decision

ঘ U+0998    ਬ U+0A2C    Confusable

ঁ U+0981    ੱ U+0A71    Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla    Gurmukhi    NBGP Decision

ও U+0993    ਤ U+0A24    Distinguishable

শ U+09B6    ਅ U+0A05    Distinguishable

ম U+09AE    ਮ U+0A2E    Distinguishable

বা U+09AC and U+09BE    ਗ U+0A17    Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ) Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value of its own, but is mostly pronounced as [n].

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং ŋ s(aelig+ক) uuml(aelig+p+র) egrave(aelig+খ) eth(aelig+গ) ocirc(aelig+ঘ) yacute(aelig+p+ষ) (In some contexts aelig is replaced by ং.) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s, ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant in a word, it is a vowel [æ] (as in শ্যামল, র‍্যাপার etc.). After a non-initial consonant, however, it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক্ + য), স্য (স্ + য), র‍্য (র + ্য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phalā + ā-kāra, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:
(i) রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র্ + ধ্ + ব) (earlier র্দ্ধ্ব = র্ + দ্ + ধ্ + ব, a four-term cluster) etc. (placed over the following consonant);
(ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h (word-finally); word-medially, it doubles the pronunciation of the following consonant

অঃকঃ

অব


5.3 Code Point Not Used Alone
BENGALI SIGN NUKTA U+09BC (see 3.3.6) is excluded from the repertoire, since it is never used alone. It is used only within the sequences that form three special characters in normalized form: ড়, ঢ় and য়.

Unicode code point | Glyph | Character name | Reason for exclusion
U+09BC | ় | BENGALI SIGN NUKTA | Never used alone; only used together with U+09A1 ড, U+09A2 ঢ and U+09AF য to form ড়, ঢ় and য় respectively

Table 10b: Excluded code points
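The exclusion of a standalone U+09BC interacts with Unicode normalization. ড় (U+09DC), ঢ় (U+09DD) and য় (U+09DF) are composition exclusions, so NFC, the form IDNA2008 requires for labels, stores each of them as its base letter followed by NUKTA. The Python sketch below shows this and checks that a NUKTA only ever follows one of the three permitted bases; the helper names are our own and are not part of the LGR.

```python
import unicodedata

NUKTA = "\u09BC"
# Precomposed letter -> base letter it decomposes to (followed by NUKTA).
NUKTA_LETTERS = {"\u09DC": "\u09A1", "\u09DD": "\u09A2", "\u09DF": "\u09AF"}
NUKTA_BASES = set(NUKTA_LETTERS.values())  # ড, ঢ, য


def normalized_nukta_forms() -> dict:
    """Map each precomposed letter to its NFC form (base + NUKTA)."""
    return {ch: unicodedata.normalize("NFC", ch) for ch in NUKTA_LETTERS}


def nukta_ok(label: str) -> bool:
    """True if every NUKTA in the NFC form of `label` follows ড, ঢ or য."""
    text = unicodedata.normalize("NFC", label)
    for i, ch in enumerate(text):
        if ch == NUKTA and (i == 0 or text[i - 1] not in NUKTA_BASES):
            return False
    return True


if __name__ == "__main__":
    for letter, form in normalized_nukta_forms().items():
        print(letter, " ".join(f"U+{ord(c):04X}" for c in form))
```

Each printed line shows the base letter followed by U+09BC, which is why the LGR can list ড়, ঢ় and য় as sequences while excluding NUKTA on its own.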

5.4 The Basis of the Present IDN
The present LGR has also benefited from earlier work on IDN for Bangla (in different versions) done for भारत or ভারত, drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables
The Augmented Backus-Naur Formalism (ABNF) began with the following variables:

C → Consonant
V → Vowel
M → Kāra (Mātrā)
B → Anusvāra (onuʃʃār)
D → Candrabindu
X → Visarga (biʃɔrgo)
H → Hasanta/Virama
Z → Khanda Ta

The ABNF uses the following operators:

Sr. No. | Operator | Function
1 | “|” | Alternative
2 | “[ ]” | Optional
3 | “*” | Variable repetition
4 | “( )” | Sequence group

Table 11: The ABNF formalism

In what follows, the vowel sequence and the consonant sequence pertinent to Bangla are given.

5.4.2 The Vowel Sequence
To help users of other Brahmi scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel, which may optionally be followed by an Anusvāra (onuʃʃār) (B), a Candrabindu (D) or a Visarga (biʃɔrgo) (X). More than one of D, B or X can follow a V in Bangla, but not the same sign twice: going by the rules illustrated in this document, formations such as VDD, VBB and VXX are invalid orthographic units. However, it is valid and possible to have sequences such as an anusvara followed by a chandrabindu on the one hand, and a visarga followed by a chandrabindu on the other, as in হ্যাংচা ‘hænchā’ and হ্যাঃ ‘hæn’ respectively. The possibility of a Visarga or Anusvāra (onuʃʃār) following a Candrabindu also exists in Bangla.
A vowel can optionally be followed by a combination of Hasanta/Virama [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (Bangla Hasanta), U+09AF য BENGALI LETTER YA>; it has a special form, ্য. When further combined with U+09BE া BENGALI VOWEL SIGN AA (‘aa’, ā), it is used for transcribing [æ], as in the ‘a’ of the English word ‘bat’, written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example: V অ अ

5.4.2.2 A Vowel with Conditions
A vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a combination of Hasanta (or Virama) [H] followed by Consonant [C] followed by kāra (Mātrā) [M].

Examples:

VB অং अं

VD অঁ अँ

VX অঃ अः

VDB অঁং अँं

VDX অঁঃ अँः

VHCM অ্যা এ্যা

5.4.2.3 VHCM Sequence
A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:
VHCMB অ্যাং এ্যাং
VHCMD অ্যাঁ এ্যাঁ
VHCMX অ্যাঃ এ্যাঃ
VHCMDB অ্যাঁং এ্যাঁং
VHCMDX অ্যাঁঃ এ্যাঁঃ
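The vowel-sequence combinations in 5.4.2.1 to 5.4.2.3 can be summarized as one regular expression. The Python sketch below assumes a common subset of independent vowels (the normative repertoire is in Section 5.1) and, following restriction rule 3 in 10.1, allows VHCM only as অ্যা or এ্যা. It encodes only the combinations listed in those subsections (B, D, X, DB, DX), not every sequence mentioned in the surrounding prose.

```python
import re

# Assumed vowel letters (a subset of Bengali independent vowels).
V = "[\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994]"
D, B, X = "\u0981", "\u0982", "\u0983"    # candrabindu, anusvara, visarga
H, YA, AA = "\u09CD", "\u09AF", "\u09BE"  # virama, য, া

# At most one trailing sign group: B, D, X, DB or DX.
SIGNS = f"(?:{D}[{B}{X}]?|[{B}{X}])?"
# VHCM restricted to অ/এ + virama + য + া (the [æ] forms).
VHCM = f"[\u0985\u098F]{H}{YA}{AA}"

VOWEL_SEQUENCE = re.compile(f"(?:{VHCM}|{V}){SIGNS}")


def is_vowel_sequence(text: str) -> bool:
    """True if `text` is exactly one vowel sequence of the kinds listed in 5.4.2."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None
```

For example, অঁং and এ্যাঁঃ match, while অংং (a repeated sign) and ই্যা (VHCM on a vowel other than অ or এ) do not.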

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions
A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:

CM কি कि

CB কং कं

CD কঁ कँ

CX কঃ कः

CH ক্ क् (pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can optionally be followed by B, D, X, DB or DX.

Example:
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

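5.4.3.4 Sequence of Consonants
A sequence of consonants (up to four) joined by Hasanta (also known as Virama):

*3(CH)C

Example:
CHC ন্ত → ন + ্ + ত   न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র   न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য   न्त्र्य → न + ् + त + ् + र + ् + य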
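The *3(CH)C pattern means at most three consonant + Hasanta pairs followed by a final consonant, that is, one to four consonants in storage order. The minimal Python sketch below builds such a cluster and checks it. The consonant class (U+0995 to U+09B9 plus ৰ and ৱ) and the function name are assumptions for illustration; unassigned code points inside that range are not filtered out.

```python
import re
from typing import Sequence

VIRAMA = "\u09CD"
# Rough consonant class (assumption): U+0995..U+09B9 plus Assamese ৰ and ৱ.
C = "[\u0995-\u09B9\u09F0\u09F1]"
# *3(CH)C: up to three consonant + virama pairs, then one consonant.
CLUSTER = re.compile(f"(?:{C}{VIRAMA}){{0,3}}{C}")


def build_cluster(consonants: Sequence[str]) -> str:
    """Join one to four consonants with virama, e.g. ন, ত, র, য -> ন্ত্র্য."""
    if not 1 <= len(consonants) <= 4:
        raise ValueError("a cluster has one to four consonants")
    cluster = VIRAMA.join(consonants)
    if CLUSTER.fullmatch(cluster) is None:
        raise ValueError("input is not a sequence of consonants")
    return cluster
```

Calling `build_cluster(["ন", "ত", "র", "য"])` gives the stored sequence behind ন্ত্র্য; a fifth consonant is rejected, matching the limit of four.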

5.4.3.5 Subsets
As a representative example of the subsets, we consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

Example:
CHCM ক্কী → ক + ্ + ক + ী   क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং   क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ   क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ   क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং   क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ   क्कँः → क + ् + क + ँ + ः

[B] *3(CH)CM may further be followed by B, D, X, DB or DX.

Example:
CHCMB ক্কীং → ক + ্ + ক + ী + ং   क्कीं → क + ् + क + ी + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ   क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ   क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং   क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ   क्काँः → क + ् + क + ा + ँ + ः
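Taken together, 5.4.3.2 to 5.4.3.5 describe a consonant sequence as a cluster of up to four consonants, followed either by a final Hasanta or by an optional kāra and at most one of B, D, X, DB or DX. A Python regular-expression sketch of that shape follows; the character classes for C and M are rough Unicode-range assumptions, not the Section 5.1 repertoire.

```python
import re

C = "[\u0995-\u09B9\u09F0\u09F1]"  # assumed consonant class
H = "\u09CD"                       # hasanta / virama
M = "[\u09BE-\u09CC\u09D7]"        # assumed kāra (mātrā) class
D, B, X = "\u0981", "\u0982", "\u0983"
SIGNS = f"(?:{D}[{B}{X}]?|[{B}{X}])?"  # B, D, X, DB or DX, at most once

# *3(CH)C, then either a final H (pure consonant) or an optional M plus signs.
CONSONANT_SEQUENCE = re.compile(f"(?:{C}{H}){{0,3}}{C}(?:{H}|{M}?{SIGNS})")


def is_consonant_sequence(text: str) -> bool:
    """True if `text` is exactly one consonant sequence as described in 5.4.3."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None
```

With this pattern ক্কীং and ক্কাঁঃ are accepted, while কীী (two kāras), কংঃ (B followed by X) and ক্ং (a sign after Hasanta) are rejected.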

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single Khanda Ta (Z)
Example: Z ৎ (= ত্)

5.4.4.2 A Khanda Ta Combination¹⁰
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: In this context, the condition on KHANDA TA is that C must be either RA U+09B0 (র), used in Bangla, or RA U+09F0 (ৰ), used in Assamese.
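The [CH]Z condition can be checked by looking two positions back from each ৎ. The minimal Python sketch below assumes NFC input and covers only this condition; what else may precede Z is governed by the WLE rules in Section 7. The function name is hypothetical.

```python
VIRAMA = "\u09CD"
KHANDA_TA = "\u09CE"
RA_FORMS = {"\u09B0", "\u09F0"}  # র (Bangla) and ৰ (Assamese)


def khanda_ta_combinations_ok(label: str) -> bool:
    """A virama directly before ৎ must itself follow র or ৰ."""
    for i, ch in enumerate(label):
        if ch != KHANDA_TA:
            continue
        if i >= 1 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True
```

ভর্ৎসনা and the Assamese form with ৰ pass, উৎসব passes because no Hasanta precedes ৎ, and ক্ৎ fails.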

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BANGLA LETTER A and U+098F এ BANGLA LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the [æ] sound, as in English ‘bat’, ‘fat’ etc. The Brahmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry a Hasanta. But as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel, in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).
The other case refers to the formation of repha and ra-phalā in the script, referred to as P in Table 16. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra ‘cycle’). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

¹⁰ Refer to Rule P in Section 7, Table 16.

6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants in two groups:
Group 1: confusing due to pure visual similarity.
Group 2: confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Confusable but not identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases that belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve deviations from the widely accepted norms of Bangla akshar formation. They can cause confusion even to a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more forms are produced with different storage, i.e. different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two instances of ko (কো), one typed with a single key and the other with two different keys. Nevertheless, this is not considered a case of variant, because a kāra followed by a kāra is not allowed.
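The two ways of typing কো described above can be compared directly. U+09CB BENGALI VOWEL SIGN O is canonically equivalent to U+09C7 (e-kāra) followed by U+09BE (ā-kāra), so NFC, which IDNA2008 requires for U-labels, turns the two-key form into the single code point before any LGR rule applies. The Python snippet below only illustrates that normalization; it is not part of the proposed LGR.

```python
import unicodedata

ONE_KEY = "\u0995\u09CB"         # ক + ো (o-kāra typed as one key)
TWO_KEYS = "\u0995\u09C7\u09BE"  # ক + ে + া (e-kāra and ā-kāra typed separately)


def same_after_nfc(a: str, b: str) -> bool:
    """True if the two strings are identical once both are NFC-normalized."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(same_after_nfc(ONE_KEY, TWO_KEYS))
    print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", TWO_KEYS)])
```

Run as a script, this prints True followed by ['U+0995', 'U+09CB'].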

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.

CASE I
As far as true variants in Bangla are concerned, we may draw attention to cases where থ (tha, U+09A5) appears in a Hasanta conjunct after স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)


The above combinations, if written in traditional orthography, can be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown below in example no. 1). It can also be typed wrongly as হ (ha, U+09B9) by someone who takes it for a হ, increasing the risks in label writing and identification.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

Fonts that represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II
Another interesting example of variants is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing ‘repha’ (involving ra + Hasanta) and ‘ra-phalā’ (involving Hasanta + ra). ‘Repha’ can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode code points, and ‘B_RA’ as well as ‘A_RA’ refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of repha


Note: Since ক and ঠ look exactly the same in Bangla and Assamese, the resulting combinations with repha look identical; the choice of RA makes no visible difference. ‘Ra-phalā’ can be formed by two sequences on similar grounds, and the final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda Ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they can confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This will, however, create blocked variant labels: e.g. if someone registers “ররর”, the variant label “ৰৰৰ” will be generated and blocked, and vice versa. The blocking applies only at the level of that label; if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, that is still possible.
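The blocking behaviour described above can be sketched by enumerating every label reachable through the proposed variant sets. The Python sketch below assumes that the র/ৰ pair and the two CASE I conjunct pairs are each modelled as a simple symmetric variant set; the names VARIANT_SETS, variant_labels and blocked_variants are our own, and a real LGR in the RFC 7940 XML format would additionally carry variant types and dispositions.

```python
from itertools import product
from typing import Iterator, List, Set

# Variant sets from 6.1, each modelled as a symmetric set (assumption).
VARIANT_SETS = [
    {"\u09B0", "\u09F0"},                          # CASE II: র / ৰ
    {"\u09B8\u09CD\u09A5", "\u09B8\u09CD\u09B9"},  # CASE I: স্থ / স্হ
    {"\u09A8\u09CD\u09A5", "\u09A8\u09CD\u09B9"},  # CASE I: ন্থ / ন্হ
]
ALTERNATIVES = {member: sorted(group) for group in VARIANT_SETS for member in group}
LONGEST = max(len(key) for key in ALTERNATIVES)


def segments(label: str) -> Iterator[List[str]]:
    """Yield the alternatives for each piece of `label`, longest match first."""
    i = 0
    while i < len(label):
        for size in range(min(LONGEST, len(label) - i), 0, -1):
            piece = label[i:i + size]
            if piece in ALTERNATIVES:
                yield ALTERNATIVES[piece]
                i += size
                break
        else:
            yield [label[i]]
            i += 1


def variant_labels(label: str) -> Set[str]:
    """All labels in the variant set of `label`, including `label` itself."""
    return {"".join(choice) for choice in product(*segments(label))}


def blocked_variants(label: str) -> Set[str]:
    """Variant labels that would be blocked once `label` is registered."""
    return variant_labels(label) - {label}
```

For ররর this yields eight labels, seven of which (including ৰৰৰ) would be blocked; ৰৰ is not among them, so it stays available, as the text notes. স্থান likewise has exactly one variant, স্হান.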

6.2 Cross-Script Variants
A careful cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels in the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in Appendix I (10.3) for reference.

¹¹ Unicode uses ‘Oriya’ for the script, although ‘Odia’ is now the official term.


1. Bangla and Nāgarī/Devanāgarī script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
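Tables 14 and 15 can be read as per-code-point mappings. Assuming, as in the Root Zone LGR, that every label must be entirely in one script, a Bangla label can only have a cross-script variant label when every one of its code points has a counterpart. The mapping tables and function in this Python sketch are illustrative and are not taken from the proposal's XML.

```python
from typing import Dict, Optional

# Tables 14 and 15: Bangla code point -> cross-script variant.
TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}  # ম -> म, ি -> ि
TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}    # ম -> ਸ, ি -> ਿ


def cross_script_variant(label: str, table: Dict[str, str]) -> Optional[str]:
    """Whole-label variant in the other script, or None if any code point
    of `label` has no counterpart (the result would mix scripts)."""
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


if __name__ == "__main__":
    print(cross_script_variant("\u09AE\u09BF", TO_DEVANAGARI))
    print(cross_script_variant("\u0995\u09AE", TO_GURMUKHI))
```

The first call returns मि for the Bangla label মি; the second returns None, because ক has no Gurmukhi variant.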

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table under Code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9): the (æ) Ya-phalā sequences V1 H C1 M1, where V1 is either U+0985 (অ BENGALI LETTER A) or U+098F (এ BENGALI LETTER E), H is U+09CD (্ BENGALI SIGN VIRAMA), C1 is U+09AF (য BENGALI LETTER YA) and M1 is U+09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either U+09B0 (র BENGALI LETTER RA) or U+09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is U+09CD (্ BENGALI SIGN VIRAMA).

Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form ‘clusters’, which can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”, of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows:


1. C is the set formed by the consonants together with CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
Example: ক্
3. M must be preceded by C.
Example: কা

4. D must be preceded by V, C or M.
Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by V, C, M or D.
Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by V, C, M or D.
Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P.
Example: ইৎ, কৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.

Let us now elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in including such ligatures: any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋk ʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning ‘Bank of India’. Here two different words are joined together, the first of which ends with an H (অফ্) while the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for that word.


This is a unique situation, necessitated by the absence of the hyphen, space or Zero Width Non-Joiner character from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence this is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

7.2 Additional Examples from the Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such a combination automatically results in a ‘golu’, or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्

অং্ अं्

কঁ্ कँ्

কঃ্ कः्

কি্ कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid:

Example:

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कीःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৗ   ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः

কঃং कःं
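Rules 2 to 9 in 7.1 all constrain what may directly precede a symbol, so they can be checked in one left-to-right pass once a label is split into Table 16 symbols, and the invalid examples in 7.2 follow from those constraints. The Python sketch below assumes a rough mapping from code points to symbols by Unicode range (the normative mapping is the repertoire table in Section 5.1), NFC handling of ড়, ঢ় and য়, and treats an S sequence (অ্যা, এ্যা) as a single symbol that behaves like M at its end. It is an illustration, not the NBGP's implementation.

```python
import unicodedata
from typing import List, Optional, Tuple

NUKTA, VIRAMA = "\u09BC", "\u09CD"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # CN: ড়, ঢ়, য় after NFC
RA_FORMS = {"\u09B0", "\u09F0"}               # র and ৰ, used by P
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা

# Rules 2-7: symbols that may directly precede each symbol.
# S ends with M, so it is accepted wherever M is (see the note to rule 7).
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "S"},  # P is checked separately
}
# Rules 8-9: V and S may not follow H.
FORBIDDEN_BEFORE = {"V": {"H"}, "S": {"H"}}
SIGN_SYMBOLS = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}


def symbol(ch: str) -> Optional[str]:
    """Rough Table 16 symbol for one code point (assumed ranges)."""
    if unicodedata.category(ch) == "Cn":  # unassigned code point
        return None
    cp = ord(ch)
    if cp in SIGN_SYMBOLS:
        return SIGN_SYMBOLS[cp]
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x0995 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1):
        return "C"
    if 0x09BE <= cp <= 0x09CC or cp == 0x09D7:
        return "M"
    return None


def tokenize(label: str) -> List[Tuple[str, str]]:
    """Split the NFC form of `label` into (symbol, text) pairs."""
    text = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(text):
        if text[i:i + 4] in S_SEQUENCES:
            tokens.append(("S", text[i:i + 4]))
            i += 4
            continue
        sym = symbol(text[i])
        if sym is None:
            raise ValueError(f"U+{ord(text[i]):04X} is not handled by this sketch")
        if sym == "C" and text[i + 1:i + 2] == NUKTA:
            if text[i] not in NUKTA_BASES:
                raise ValueError("NUKTA is only allowed in ড়, ঢ় and য়")
            tokens.append(("C", text[i:i + 2]))
            i += 2
            continue
        tokens.append((sym, text[i]))
        i += 1
    return tokens


def wle_violations(label: str) -> List[str]:
    """Describe each violation of rules 2-9 in 7.1; empty if there are none."""
    try:
        tokens = tokenize(label)
    except ValueError as err:
        return [str(err)]
    problems = []
    for k, (sym, text) in enumerate(tokens):
        prev = tokens[k - 1][0] if k else None
        if sym in ALLOWED_BEFORE:
            ok = prev in ALLOWED_BEFORE[sym]
            # Rule 7 via P: ৎ may follow র্ or ৰ্.
            if sym == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA_FORMS:
                ok = True
        else:
            ok = prev not in FORBIDDEN_BEFORE.get(sym, set())
        if not ok:
            problems.append(f"{sym} {text!r} cannot follow {prev}")
    return problems


def is_valid_label(label: str) -> bool:
    return not wle_violations(label)
```

Under these assumptions কংং, কংঃ, কঃং, কীী, ইা and াক are rejected, as is the joined ‘Bank of India’ label from 7.1.1 (V after H), while ভর্ৎসনা (Z after P) and অ্যাসিড (S at the start) pass.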

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh


Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka


9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan o Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (repr. of the Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti o Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] ----- (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). 2017. Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. ‘Scriptal choice and spelling reform: An essay in language and planning’. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. ‘Bangla Bhashar Yuktabyanjan’. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. ‘Bangla Script: A Structural Study’. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. ‘What has happened So Far in Terms of Script Reforms’. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, some of the formalism's structures do not necessarily apply in a given language. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Bangla does not normally allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning (as in য়াকুব). In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only when C is ra (র, U+09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as [æ]), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as [æ]), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
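The three Bangla-specific restrictions above can be layered on top of the general rules. The Python sketch below reads them strictly. It rejects a label starting with ৎ, ঞ or ঙ, even though rule 1 itself mentions transliterations such as ৎসেরিং. It accepts CH + ৎ only for U+09B0 র, so Assamese ৰ is rejected here, unlike under the general note in 5.4.4.2. It accepts a vowel followed by virama only as অ্যা or এ্যা. The vowel set and the function name are assumptions for illustration.

```python
from typing import List

VIRAMA, KHANDA_TA, RA = "\u09CD", "\u09CE", "\u09B0"
NO_LABEL_START = {"\u09CE", "\u099E", "\u0999"}  # ৎ, ঞ, ঙ (rule 1)
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা
# Assumed set of independent vowels that could start a VHCM sequence.
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def bangla_restriction_violations(label: str) -> List[str]:
    """Check restriction rules 1-3 of 10.1 on an NFC label (strict reading)."""
    problems = []
    if label and label[0] in NO_LABEL_START:
        problems.append("label starts with ৎ, ঞ or ঙ")
    for i, ch in enumerate(label):
        # Rule 2: CH + ৎ only when C is র (U+09B0).
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == VIRAMA:
            if i < 2 or label[i - 2] != RA:
                problems.append(f"ৎ at index {i} follows a virama not preceded by র")
        # Rule 3: a vowel followed by virama must begin অ্যা or এ্যা.
        if ch in VOWELS and label[i + 1:i + 2] == VIRAMA:
            if label[i:i + 4] not in ALLOWED_VHCM:
                problems.append(f"VHCM at index {i} is not অ্যা or এ্যা")
    return problems
```

With this reading, ভর্ৎসনা and অ্যাসিড produce no violations, while ভৰ্ৎ, ৎসেরিং and ই্যা each produce at least one.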

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924), which was used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. There were several other names of Sylheti

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.


Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)

Table17ndashTheScriptTableofSylhetiNagarıorSiloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-II: Bengali consonants and their allographs

Consonants   Phonetic Value   Allographs: Clusters   Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ: ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but it is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts ঙ is replaced by ং: কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When it is added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক8(p+য)স8(+য)র 8(+য) [র‍্য on its own is never used in Bangla orthography. র‍্যা is used, but its last two symbols, Ya-phala and ā-kāra, then constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations: (i) রেফ (repʰ), as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6 (+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant);

(ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব


4   “( )”   Sequence Group

Table 11 – The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla are given to facilitate understanding.

5.4.2 The Vowel Sequence

To help users of other Brāhmī scripts, Devanāgarī equivalents are provided wherever necessary. A vowel sequence is made up of a single vowel. It may optionally be followed by an Anusvāra, onuʃʃār (B), a Candrabindu (D) or a Visarga, biʃɔrgo (X). The number of D, B or X that can follow a V in Bangla is not necessarily restricted to one. However, going by the rules illustrated in this document, formations such as VDD, VBB and VXX are clearly invalid orthographic units. It is valid and possible to have a Candrabindu followed by an Anusvāra on the one hand, and a Candrabindu followed by a Visarga on the other, as in হ্যাঁংচা ‘hæŋchā’ and হ্যাঁঃ ‘hæŋ’ respectively. In other words, a Visarga or an Anusvāra (onuʃʃār) can follow a Candrabindu in Bangla.

A vowel can also optionally be followed by a combination of Hasanta/Virāma [H] and Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF BENGALI LETTER YA (য, ‘ya’), represented by the sequence <U+09CD BENGALI SIGN VIRAMA (the Bangla Hasanta), U+09AF BENGALI LETTER YA>. Ya-phalā has a special subjoined form (্য). When combined with U+09BE া BENGALI VOWEL SIGN AA (i.e. ‘aa’ (ā)), it is used for transcribing [æ], like the “a” in the English word “bat”, which is written in Bangla as ব্যাট. A vowel sequence admits the following combinations:

5.4.2.1 A Single Vowel

Example:

V        অ        अ

5.4.2.2 A Vowel with Conditions

A vowel can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX]. It can also be followed by a combination of Hasanta (or Virāma) [H], then a Consonant [C], then a kāra (Mātrā) [M].

Examples:

VB       অং       अं

VD       অঁ       अँ

VX       অঃ       अः

VDB      অঁং      अँं

VDX      অঁঃ      अँः

VHCM     অ্যা     এ্যা

5.4.2.3 VHCM Sequence

A VHCM sequence can optionally be followed by an Anusvāra [B], a Candrabindu [D], a Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples:

VHCMB    অ্যাং    এ্যাং
VHCMD    অ্যাঁ    এ্যাঁ
VHCMX    অ্যাঃ    এ্যাঃ
VHCMDB   অ্যাঁং   এ্যাঁং
VHCMDX   অ্যাঁঃ   এ্যাঁঃ
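The vowel-sequence combinations listed above can be approximated by a single regular expression. The Python sketch below is illustrative only and is not taken from the LGR XML file. It rests on three assumptions: the independent-vowel set is a subset of the Bengali block; the HCM part is narrowed to Ya-phalā + ā-kāra (U+09CD U+09AF U+09BE), as described in 5.4.2; and the names `VOWEL_SEQUENCE` and `is_vowel_sequence` are hypothetical.

```python
import re

# Assumed subset of independent vowels: অ..ঋ (U+0985-U+098B), এ ঐ, ও ঔ.
V = "[\u0985-\u098B\u098F\u0990\u0993\u0994]"
# HCM narrowed to Ya-phala + aa-kara: U+09CD (H), U+09AF (ya), U+09BE (aa).
HCM = "\u09CD\u09AF\u09BE"
# Optional tail: nothing, D, B, X, DB or DX (Candrabindu U+0981 always first).
TAIL = "(?:\u0981?[\u0982\u0983]|\u0981)?"

VOWEL_SEQUENCE = re.compile(f"{V}(?:{HCM})?{TAIL}")


def is_vowel_sequence(text: str) -> bool:
    """True if the whole string is one vowel sequence as sketched in 5.4.2."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None


if __name__ == "__main__":
    samples = ["\u0985", "\u0985\u0981\u0982", "\u0985\u09CD\u09AF\u09BE\u0983",
               "\u0985\u0982\u0982", "\u0985\u0982\u0981"]
    for sample in samples:
        print(repr(sample), is_vowel_sequence(sample))
```

Under these assumptions, অঁং (VDB) and অ্যাঃ (VHCMX) match, while অংং (VBB) and অংঁ (Anusvāra before Candrabindu) do not. Because the tail has a single optional Candrabindu slot ahead of one Anusvāra or Visarga, repeated signs are rejected without having to list every invalid form.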

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example:

C        ক        क

5.4.3.2 A Consonant with Conditions

A consonant can optionally be followed by one of the following: a dependent vowel sign, kāra (Mātrā) [M]; an Anusvāra [B]; a Candrabindu [D]; a Visarga [X]; a Hasanta (also known as Virāma) [H]; Candrabindu + Anusvāra [DB]; or Candrabindu + Visarga [DX].

Example

CM িকক -कक

CB       কং       कं

CD       কঁ       कँ

CX       কঃ       कः

CH       ক্       क्       (pure consonant)

CDB      কঁং      कँं

CDX      কঁঃ      कँः

5.4.3.3 CM Sequence

A CM sequence can optionally be followed by B, D, X, DB or DX.

Example CMB কীংকং कक

CMD      কাঁ      काँ

CMX      বীঃ      वीः

CMDB     কাঁং     काँं

CMDX     কাঁঃ     काँः

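5.4.3.4 Sequence of Consonants

A sequence of consonants (up to 4) joined by Hasanta (also known as Virāma):

3(CH)C

Example:

CHC        ন্ত → ন + ্ + ত                      न्त → न + ् + त

CHCHC      ন্ত্র → ন + ্ + ত + ্ + র              न्त्र → न + ् + त + ् + र

CHCHCHC    ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য      न्त्र्य → न + ् + त + ् + र + ् + य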

5.4.3.5 Subsets

As a representative example of its subsets, only the combination CHC is considered here. The same applies equally to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:

CHCM     ক্কী → ক + ্ + ক + ী          क्की → क + ् + क + ी

CHCB     ক্কং → ক + ্ + ক + ং          क्कं → क + ् + क + ं

CHCD     ক্কঁ → ক + ্ + ক + ঁ          क्कँ → क + ् + क + ँ

CHCX     ক্কঃ → ক + ্ + ক + ঃ          क्कः → क + ् + क + ः

CHCDB    ক্কঁং → ক + ্ + ক + ঁ + ং      क्कँं → क + ् + क + ँ + ं

CHCDX    ক্কঁঃ → ক + ্ + ক + ঁ + ঃ      क्कँः → क + ् + क + ँ + ः

[B] 3(CH)CM may further be followed by B, D, X, DB or DX.

Example:

CHCMB    ক্কীং → ক + ্ + ক + ী + ং      क्कीं → क + ् + क + ी + ं

sup3ং rarr ক ক ং 4क rarr क क

CHCMD    ক্কাঁ → ক + ্ + ক + া + ঁ          क्काँ → क + ् + क + ा + ँ

CHCMX    ক্কীঃ → ক + ্ + ক + ী + ঃ          क्कीः → क + ् + क + ी + ः

CHCMDB   ক্কাঁং → ক + ্ + ক + া + ঁ + ং      क्काँं → क + ् + क + ा + ँ + ं

CHCMDX   ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ      क्काँः → क + ् + क + ा + ँ + ः
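A similar regular-expression sketch can be written for the consonant sequence of 5.4.3. It makes the following assumptions:

- The consonant and kāra sets are subsets of the Bengali block.
- Up to three (H C) repetitions encode the four-consonant limit of 5.4.3.4.
- A bare CH (5.4.3.2) is accepted, but a cluster ending in H is not, since the subsets in 5.4.3.5 do not list it.

`CONSONANT_SEQUENCE` and `is_consonant_sequence` are hypothetical names.

```python
import re

# Assumed consonant set: ক..ন (U+0995-U+09A8), প..র (U+09AA-U+09B0), ল,
# শ ষ স হ (U+09B6-U+09B9) and the Assamese ৰ (U+09F0).
C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0]"
# Assumed dependent vowel signs (kara / matra).
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"
H = "\u09CD"
# Optional tail: nothing, D, B, X, DB or DX.
TAIL = "(?:\u0981?[\u0982\u0983]|\u0981)?"

# Either a pure consonant (CH), or one to four consonants joined by H,
# optionally followed by a single M and then the optional tail.
CONSONANT_SEQUENCE = re.compile(f"{C}(?:{H}|(?:{H}{C}){{0,3}}{M}?{TAIL})")


def is_consonant_sequence(text: str) -> bool:
    """True if the whole string is one consonant sequence as sketched in 5.4.3."""
    return CONSONANT_SEQUENCE.fullmatch(text) is not None


if __name__ == "__main__":
    samples = ["\u0995\u09CD", "\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF",
               "\u0995\u09CD\u0995\u09BE\u0981\u0982", "\u0995\u09C0\u09C0"]
    for sample in samples:
        print(repr(sample), is_consonant_sequence(sample))
```

With these assumptions, ক্ (CH), ন্ত্র্য (CHCHCHC) and ক্কাঁং (CHCMDB) are accepted. A five-consonant cluster and কীী (two kāras) are rejected.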

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single ‘Khanda’-Ta (Z)

Example:

Z        ৎ = ত্

5.4.4.2 A Khanda Ta Combination¹⁰

A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virāma):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.

Note: The condition for KHANDA TA in this context is that the C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

¹⁰ Refer to Rule P in Section 7, Table 16.

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here.

Let us take up S first. There are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel: U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E. To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the vowel. This is a purely ligatural entity. The addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. By nature, the Brāhmī script does not have a Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry it. As we see here, however, Bangla orthography ends up with a deviant feature: Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The second case concerns the formation of repha and ra-phalā in the script, referred to in the table above as P. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ).

- Repha = ra + Hasanta + C, e.g. র্ক (ra + Hasanta + ka), as in অর্ক arka ‘the sun’.
- Ra-phalā = C + Hasanta + ra, e.g. ক্র (ka + Hasanta + ra), as in চক্র cakra ‘cycle’.

In both cases, the slot for ra can be filled by either the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD). The shapes of repha and ra-phalā remain the same in both cases.
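The S and P sequences above can be spelled out code point by code point. The helper names below (`ya_phala_aa`, `repha`, `ra_phala`, `code_points`) are hypothetical. The sketch only shows that the Bangla-RA and Assamese-RA spellings are different strings, even though, as noted above, they render identically.

```python
HASANTA = "\u09CD"      # ্ BENGALI SIGN VIRAMA
BANGLA_RA = "\u09B0"    # র
ASSAMESE_RA = "\u09F0"  # ৰ
YA = "\u09AF"           # য
AA_KARA = "\u09BE"      # া


def ya_phala_aa(vowel: str) -> str:
    """S: V1 + H + ya + aa-kara, defined only for অ and এ."""
    if vowel not in ("\u0985", "\u098F"):
        raise ValueError("S is defined only for U+0985 and U+098F")
    return vowel + HASANTA + YA + AA_KARA


def repha(consonant: str, ra: str = BANGLA_RA) -> str:
    """Repha: RA + H + C (P followed by a consonant)."""
    return ra + HASANTA + consonant


def ra_phala(consonant: str, ra: str = BANGLA_RA) -> str:
    """Ra-phala: C + H + RA."""
    return consonant + HASANTA + ra


def code_points(text: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    ka = "\u0995"
    print(code_points(ya_phala_aa("\u0985")))
    print(code_points(repha(ka)), "|", code_points(repha(ka, ASSAMESE_RA)))
    print(code_points(ra_phala(ka)), "|", code_points(ra_phala(ka, ASSAMESE_RA)))
```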


6 Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.

For Group 1, any identical code points are defined as variants. Cases that are confusable but not identical are not proposed, because another panel (the String Similarity Assessment Panel) is entrusted with such cases. Cases that belong to Group 2, however, are proposed as variants. These cases are not of mere visual similarity, since they involve deviations from the widely accepted norms of Bangla akshara formation. They can confuse even a careful observer, and hence they are proposed as variants.

Variants arise in a script when two or more forms are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o-kāra with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra with two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this is not considered a case of variant, because a kāra followed by a kāra is not allowed.

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases in which থ (tha, U+09A5) appears, via Hasanta, in a conjunct with স (sa, U+09B8) or ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

If written in traditional orthography, the above combinations could be somewhat confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in initial, medial or final position (as shown in example no. 1 below). A user might also mistype it as হ (ha, U+09B9), which increases the risk of errors in label writing and identification. Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থুল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in গ্রন্থ grantha)

The fonts that represent the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of a variant arises in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise when typing ‘repha’ (involving ra + Hasanta) and ‘ra-phalā’ (involving Hasanta + ra). ‘Repha’ can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block and because ‘B_RA’ and ‘A_RA’ refer to the same phonetic element. The final ligatures look the same, as follows:

(1) B_RA + H + C
(2) A_RA + H + C

Where:

B_RA = U+09B0 BENGALI LETTER RA (র), or
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA)       Ligature 1   Sequence 2 (using Assamese RA)     Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক)    র্ক          U+09F0 (ৰ) U+09CD (্) U+0995 (ক)    ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ)    র্ঠ          U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ)    ৰ্ঠ

Table 12 – Example of Repha

Note: Because the Bangla and Assamese ক and ঠ look exactly the same, the resulting combinations with Repha look identical; adding the Repha makes no difference. On similar grounds, ‘Ra-phalā’ can be formed by two sequences whose final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

Where C1 = any consonant except Khanda-ta

Example:

Sequence 1 (using Bangla RA)       Ligature 1   Sequence 2 (using Assamese RA)     Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র)    ক্র          U+0995 (ক) U+09CD (্) U+09F0 (ৰ)    ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র)    ন্র          U+09A8 (ন) U+09CD (্) U+09F0 (ৰ)    ন্ৰ

Table 13 – Example of Ra-phalā

Because the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This, however, creates blocked variant labels. For example, if someone registers “ররর”, the label “ৰৰৰ” will be generated as a variant and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. “ৰৰ” or “ৰৰৰৰ”, that is still possible.
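The blocking behaviour described above can be illustrated by enumerating variant labels. This is a minimal sketch. It assumes that র and ৰ are mutual variants and that every variant other than the registered label is blocked. In the actual LGR, the variant mappings and their dispositions are declared in the XML file (RFC 7940 format), and the function names here are hypothetical.

```python
from itertools import product

# Assumption: র (U+09B0) and ৰ (U+09F0) are mutual variants, and every
# variant label other than the registered one is blocked.
VARIANTS = {
    "\u09B0": ("\u09B0", "\u09F0"),
    "\u09F0": ("\u09F0", "\u09B0"),
}


def variant_labels(label: str) -> set[str]:
    """All labels obtained by swapping র/ৰ at any subset of positions."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


def blocked_labels(registered: str) -> set[str]:
    """Variant labels that become unavailable once `registered` is taken."""
    return variant_labels(registered) - {registered}


if __name__ == "__main__":
    taken = "\u09B0\u09B0\u09B0"  # ররর
    blocked = blocked_labels(taken)
    print(len(blocked))                     # 2**3 - 1 = 7 variants blocked
    print("\u09F0\u09F0\u09F0" in blocked)  # ৰৰৰ is among them
    print("\u09F0\u09F0" in blocked)        # ৰৰ is a different label, still available
```

Each র or ৰ position doubles the number of variants, so a label with n such letters has 2^n - 1 blocked variants. Labels of a different length, such as ৰৰ, are unaffected.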

6.2 Cross Script Variants

A crisp cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here; they are listed in the Appendix (10.3) for reference.

¹¹ Unicode uses Oriya for the script, although Odia is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla        Devanāgarī
ম U+09AE      म U+092E
ি U+09BF      ि U+093F

Table 14 – Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla        Gurmukhī
ম U+09AE      ਸ U+0A38
ি U+09BF      ਿ U+0A3F

Table 15 – Bangla and Gurmukhī cross-script variant code points

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs required by all the languages mentioned in Section 3.2 when they are written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9), or the (æ) Ya-phalā sequence (V1 H C1 M1). Here V1 is either U+0985 (অ BENGALI LETTER A) or U+098F (এ BENGALI LETTER E); H is U+09CD (্ BENGALI SIGN VIRAMA); C1 is U+09AF (য BENGALI LETTER YA); and M1 is U+09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though the other context rules do not allow them.

P → Ra-Hasanta (C2 H). Here C2 is either U+09B0 (র BENGALI LETTER RA) or U+09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is U+09CD (্ BENGALI SIGN VIRAMA).

Table 16 – Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters”. These can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”: 290 bi-consonantal combinations, another 80 three-letter combinations and 10 rarer four-letter ones [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are listed below:

1. C is a set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.

2. H must be preceded by C.

3. M must be preceded by C. Example: কা

4. D must be preceded by either V, C or M. Example: আখখা হা

5. X must be preceded by either V, C, M or D. Example: উঃখঃবঃাঃ দ ঃ

6. B must be preceded by either V, C, M or D. Example: আং ইং কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. See 7.1.1, Case of V Preceded by H, for details.

9. S CANNOT be preceded by H.
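Rule 1 depends on Unicode normalization. ড়, ঢ় and য় are canonical composition exclusions, so in NFC they are stored as the base consonant followed by BENGALI SIGN NUKTA (U+09BC). The following sketch uses only Python's standard `unicodedata` module to show these CN forms; `cn_forms` is a hypothetical helper name.

```python
import unicodedata

# U+09DC ড়, U+09DD ঢ় and U+09DF য় are composition exclusions, so NFC keeps
# them decomposed as base consonant + BENGALI SIGN NUKTA (U+09BC).
PRECOMPOSED = {"RRA": "\u09DC", "RHA": "\u09DD", "YYA": "\u09DF"}


def cn_forms() -> dict[str, str]:
    """Return the NFC code point sequence of each precomposed letter."""
    forms = {}
    for name, letter in PRECOMPOSED.items():
        nfc = unicodedata.normalize("NFC", letter)
        forms[name] = " ".join(f"U+{ord(ch):04X}" for ch in nfc)
    return forms


if __name__ == "__main__":
    for name, sequence in cn_forms().items():
        print(name, sequence)
```

Any checker that applies rules 2 to 9 therefore has to treat the two-code-point sequences (for example U+09A1 U+09BC) as a single C.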

Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, and may not be attested. There is no harm in including such ligatures, however, because any user can easily create them.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H. An example is ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning Bank of India. Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require H to be present explicitly to fully represent the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered sufficient to represent the intended sound of that word.

This unique situation arises because the hyphen, the space and the Zero Width Non-Joiner character are not in the permissible set of characters of the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting this could make two labels (with and without H) perceptually similar for the majority of the linguistic communities, so the NBGP explicitly prohibits it. If the prevailing requirements of the community call for it, a future NBGP may consider revisiting this rule.
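To make the rules above concrete, here is a minimal Python sketch of a WLE checker. It is not the NBGP's implementation or the LGR XML. The category tables are simplified subsets of the Bengali block. S (অ্যা, এ্যা) is grouped into a single token before checking, and Z is also accepted after P (র or ৰ followed by Hasanta). All names are hypothetical.

```python
import unicodedata

# Hypothetical, simplified category tables (assumed subsets of the Bengali block).
VOWELS = {chr(c) for c in (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)}
CONSONANTS = {chr(c) for c in (*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1),
                                0x09B2, *range(0x09B6, 0x09BA), 0x09F0)}
MATRAS = {chr(c) for c in (*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC)}
SIGNS = {"\u0981": "D", "\u0982": "B", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}  # ড ঢ য: base + nukta gives ড় ঢ় য় (rule 1)
S_FORMS = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা
RA_FORMS = {"\u09B0", "\u09F0"}  # র, ৰ: the C2 of P

# Rules 2-7: categories allowed immediately before each category.
# S ends with M, so it is accepted wherever M is (see the note to rule 7).
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "D", "S"},
    "B": {"V", "C", "M", "D", "S"},
    "Z": {"V", "C", "M", "D", "B", "X", "S"},  # P is handled separately
}


def tokenize(label: str) -> list[tuple[str, str]]:
    """Split a label (after NFC) into (category, text) tokens, grouping S."""
    tokens: list[tuple[str, str]] = []
    for ch in unicodedata.normalize("NFC", label):
        if ch == NUKTA and tokens and tokens[-1][1] in NUKTA_BASES:
            tokens[-1] = ("C", tokens[-1][1] + ch)  # CN counts as C
        elif ch in VOWELS:
            tokens.append(("V", ch))
        elif ch in CONSONANTS:
            tokens.append(("C", ch))
        elif ch in MATRAS:
            tokens.append(("M", ch))
        elif ch in SIGNS:
            tokens.append((SIGNS[ch], ch))
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
    grouped: list[tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        text = "".join(t for _, t in tokens[i:i + 4])
        if len(tokens[i:i + 4]) == 4 and text in S_FORMS:
            grouped.append(("S", text))  # V1 H C1 M1 becomes one S token
            i += 4
        else:
            grouped.append(tokens[i])
            i += 1
    return grouped


def wle_violations(label: str) -> list[str]:
    """Describe every violation of WLE rules 2-9 found in the label."""
    tokens = tokenize(label)
    problems = []
    for i, (cat, text) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        if cat == "Z" and prev == "H" and i >= 2 and tokens[i - 2][1] in RA_FORMS:
            continue  # rule 7: Z after P (র or ৰ followed by H)
        if cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            problems.append(f"{cat} ({text}) cannot follow {prev or 'the label start'}")
        elif cat in ("V", "S") and prev == "H":
            problems.append(f"{cat} ({text}) cannot follow H (rules 8 and 9)")
    return problems


if __name__ == "__main__":
    bank_of_india = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985\u09AB"
                     "\u09CD\u0987\u09A8\u09CD\u09A1\u09BF\u09DF\u09BE")
    print(wle_violations(bank_of_india))
    print(wle_violations("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
```

For the Bank of India example, ব্যাঙ্কঅফ্ইন্ডিয়া, the sketch reports exactly one problem: the V ই following H, which is the prohibition explained in 7.1.1. ভর্ৎসনা and অ্যাসিড pass. Grouping S before applying the "preceded by" checks is one way to honour the statement that S1 and S2 are valid even though the other context rules would reject their internal V H pair.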

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক    ्क

াক    ाक

ংক    ंक

ঁক    ँक

ঃক    ःक

As can be seen, such a combination automatically results in a “golu” or dotted circle, which marks it as an invalid formation. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্    अ्

অং্    अं्    কঁ্    कँ्    কঃ্    कः्    কি্    कि्

3. Only one B, D or X is permitted after a Consonant, Vowel or kāra (Mātrā). The following combinations are therefore invalid:

Example:

কংং    कंं

কঁঁ    कँँ

কঃঃ    कःः

কাঁঁ    काँँ

কীঃঃ    कीःः

অংং    अंं

অঁঁ    अँँ

অঃঃ    अःः

4. Only one M is permitted after a Consonant. Example:

কীী    कीी

5. M is not permitted after V. Example:

ইা ঈৗ    ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ    कंः

কঃং    कःं

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon (Pachgaon, Manesar), PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.
[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.
[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. Reprint, New Delhi: Asian Educational Services, 2003.
[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.
[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.
[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.
[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.
[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.
[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.
[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.
[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.
[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.
[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.
[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.
[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.
[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.
[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.
[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.
[118] -----. 1980. ‘Scriptal choice and spelling reform: An essay in language and planning’. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.
[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.
[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.
[121] TDIL and C-DAC Pune. Script Behaviour for Bengali, Version 1.1.
[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.
[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.
[124] Ethnologue. Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.
[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.
[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.
[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.
[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.
[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.
[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.
[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.
[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.
[135] Sarkar, Pabitra. 1993. ‘Bangla Bhashar Yuktabyanjan’. Bhasha 1(1): 23-45.
[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. ‘Bangla Script: A Structural Study’. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study
[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.
[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.
[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.
[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language 36(1): 22-59.
[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.
[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.
[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.
[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.
[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.
[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.
[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.
[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.
[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.
[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.
[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.
[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. ‘What has happened so far in terms of script reforms’. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply, because some of the Formalism structures do not necessarily apply in a given language. Restriction rules are set in place to take care of such cases, and they help to fine-tune the ABNF. For Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5).

Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for instance the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. Word-initial ya is also used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব.

More recently, when some Chinese and Japanese names are transliterated into Bangla, Khaṇḍata (ৎ) followed by sa (স) can occur at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can occur with Khanda Ta only where C is ra (র) (U+09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with VHCM are allowed:

→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
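The three Appendix-I restrictions can likewise be checked directly on the code point string. This is a minimal, hypothetical sketch covering only rules 1 to 3 above. It is Bangla-only, so rule 2 accepts র but not ৰ. `appendix_violations` is an illustrative name, and the vowel set is an assumed subset.

```python
# Assumed subset of independent vowels (U+0985-U+098B, এ ঐ, ও ঔ).
VOWELS = {chr(c) for c in (*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)}
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}  # ৎ, ঞ, ঙ (rule 1)
KHANDA_TA, HASANTA, RA = "\u09CE", "\u09CD", "\u09B0"
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা


def appendix_violations(label: str) -> list[str]:
    """Check a label against restrictions 1-3 of Appendix-I (10.1)."""
    problems = []
    if label and label[0] in FORBIDDEN_INITIAL:
        problems.append(f"rule 1: label may not begin with U+{ord(label[0]):04X}")
    for i, ch in enumerate(label):
        # Rule 2: C + H immediately before ৎ is allowed only when C is র.
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == HASANTA and label[i - 2] != RA:
            problems.append(f"rule 2: C + H before khanda ta at index {i} is not ra + H")
        # Rule 3: a vowel followed by H must begin অ্যা or এ্যা.
        if (ch == HASANTA and i >= 1 and label[i - 1] in VOWELS
                and label[i - 1:i + 3] not in ALLOWED_VHCM):
            problems.append(f"rule 3: VHCM at index {i - 1} is not an allowed form")
    return problems


if __name__ == "__main__":
    print(appendix_violations("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))  # ৎসেরিং
    print(appendix_violations("\u098F\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # এ্যাসিড
```

With these assumptions, ৎসেরিং (Tsering) is flagged by rule 1, in line with the restriction that Khaṇḍata may not begin a label. ভর্ৎসনা and এ্যাসিড pass.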


O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব

Page 39: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

39

Examples
VB অং अं
VD অঁ अँ
VX অঃ अः
VDB অঁং अँं
VDX অঁঃ अँः
VHCM অ্যা, এ্যা

5.4.2.3 VHCM Sequence: A VHCM sequence can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Examples
VHCMB অ্যাং, এ্যাং
VHCMD অ্যাঁ, এ্যাঁ
VHCMX অ্যাঃ, এ্যাঃ
VHCMDB অ্যাঁং, এ্যাঁং
VHCMDX অ্যাঁঃ, এ্যাঁঃ
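The vowel patterns illustrated above (V with an optional B, D, X, DB or DX tail) and the VHCM sequence of 5.4.2.3 can be expressed as one regular expression. This is a minimal sketch, assuming a simplified class of independent vowels (U+0985 to U+098C, U+098F, U+0990, U+0993, U+0994) rather than the exact LGR repertoire; the names `VOWEL_SEQ` and `is_vowel_sequence` are hypothetical and not part of the LGR XML.

```python
import re

# Simplified independent-vowel class V (an assumption, not the exact LGR repertoire).
V = "[\u0985-\u098C\u098F\u0990\u0993\u0994]"
# VHCM is only অ্যা / এ্যা: (অ | এ) + HASANTA + YA + AA-KARA.
VHCM = "[\u0985\u098F]\u09CD\u09AF\u09BE"
# Optional tail, two-mark alternatives first: DB | DX | B | D | X
# (D = U+0981 candrabindu, B = U+0982 anusvara, X = U+0983 visarga).
TAIL = "(?:\u0981\u0982|\u0981\u0983|[\u0981\u0982\u0983])?"
VOWEL_SEQ = re.compile(f"(?:{VHCM}|{V}){TAIL}")


def is_vowel_sequence(s: str) -> bool:
    """Return True if s is V or VHCM, optionally followed by B, D, X, DB or DX."""
    return VOWEL_SEQ.fullmatch(s) is not None


if __name__ == "__main__":
    samples = {
        "VDB অঁং": "\u0985\u0981\u0982",
        "VHCMDX অ্যাঁঃ": "\u0985\u09CD\u09AF\u09BE\u0981\u0983",
        "B before D (invalid)": "\u0985\u0982\u0981",
        "ই + ya-phala (invalid)": "\u0987\u09CD\u09AF\u09BE",
    }
    for name, s in samples.items():
        print(name, is_vowel_sequence(s))
```

Listing the two-mark tails before the single marks keeps the alternation readable; because `fullmatch` is used, the engine backtracks either way.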

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)
Example: C ক क

5.4.3.2 A Consonant with Conditions: A consonant can optionally be followed by a dependent vowel sign kāra (Mātrā) [M], Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virama) [H], Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example
CM কি कि
CB কং कं
CD কঁ कँ
CX কঃ कः
CH ক্ क् (pure consonant)
CDB কঁং कँं
CDX কঁঃ कँः

5.4.3.3 CM Sequence: A CM sequence can optionally be followed by B, D, X, DB or DX.

Example
CMB কীং कीं
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

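5.4.3.4 Sequence of Consonants: A sequence of consonants (up to 4) joined by Hasanta (also known as Virama), i.e. up to three CH pairs followed by a final C, written 3(CH)C.

Example
CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य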
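A cluster of up to four consonants joined by Hasanta can be checked, and spelled out in the same “+” notation as the examples above, with a short sketch. The consonant class is a simplified assumption (it omits the nukta forms), and `CLUSTER`, `is_cluster` and `decompose` are illustrative names, not part of the LGR.

```python
import re

# Simplified consonant class C (an assumption): ক..ন, প..র, ল, শ..হ, ৰ, ৱ.
C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1]"
H = "\u09CD"  # BENGALI SIGN VIRAMA (Hasanta)
# 3(CH)C: one to three consonant + Hasanta pairs, then a final consonant.
CLUSTER = re.compile(f"(?:{C}{H}){{1,3}}{C}")


def is_cluster(s: str) -> bool:
    return CLUSTER.fullmatch(s) is not None


def decompose(s: str) -> str:
    """Spell out a cluster code point by code point, e.g. 'ন + ্ + ত'."""
    return " + ".join(s)


if __name__ == "__main__":
    for s in ("\u09A8\u09CD\u09A4",                           # ন্ত (CHC)
              "\u09A8\u09CD\u09A4\u09CD\u09B0",               # ন্ত্র (CHCHC)
              "\u09A8\u09CD\u09A4\u09CD\u09B0\u09CD\u09AF"):  # ন্ত্র্য (CHCHCHC)
        print(decompose(s), is_cluster(s))
```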

5.4.3.5 Subsets: While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.
[A] The combination may be followed by M, B, D, X, DB or DX.

Example
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] 3(CH)CM may further be followed by B, D, X, DB or DX.

Example
CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं

sup3ং rarr ক ক ং 4क rarr क क

CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
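Combining [A] and [B], a conjunct syllable is a 3(CH)C cluster followed by an optional kāra and an optional B, D, X, DB or DX tail. A minimal regular-expression sketch follows; the consonant and kāra classes are simplified assumptions, and `SYLLABLE` and `is_conjunct_syllable` are hypothetical names.

```python
import re

C = "[\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09F0\u09F1]"  # consonants (simplified)
H = "\u09CD"                                                         # Hasanta
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"                        # kāras (simplified)
TAIL = "(?:\u0981\u0982|\u0981\u0983|[\u0981\u0982\u0983])?"          # DB | DX | B | D | X
# [A] cluster + (M | tail) and [B] cluster + M + tail together give: cluster M? tail?
SYLLABLE = re.compile(f"(?:{C}{H}){{1,3}}{C}{M}?{TAIL}")


def is_conjunct_syllable(s: str) -> bool:
    return SYLLABLE.fullmatch(s) is not None


if __name__ == "__main__":
    print(is_conjunct_syllable("\u0995\u09CD\u0995\u09BE\u0981\u0983"))  # ক্কাঁঃ (CHCMDX)
    print(is_conjunct_syllable("\u0995\u09CD\u0995\u0982\u0983"))        # ক্কংঃ (B + X)
```

Because the kāra and the tail are each optional and each appears at most once, repeated kāras or stacked tail marks fail to match.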

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A Single ‘Khanda’ Ta (Z)
Example: Z ৎ

5.4.4.2 A Khanda Ta Combination¹⁰: A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama).

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) ‘scolding’.
Note: The condition for KHANDA TA in this context is that C must be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
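The [CH]Z condition, together with the rule in Appendix 10.1 that ৎ may not begin a label, can be sketched as a simple scan. This checks only these two conditions, not the full WLE rule set, and `khanda_ta_ok` is a hypothetical helper name.

```python
RA = {"\u09B0", "\u09F0"}   # র (Bangla RA), ৰ (Assamese RA)
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"        # ৎ


def khanda_ta_ok(label: str) -> bool:
    """Check the [CH]Z condition: a Hasanta before ৎ must follow র or ৰ."""
    for i, ch in enumerate(label):
        if ch != KHANDA_TA:
            continue
        if i == 0:
            return False                       # ৎ cannot begin a label
        if label[i - 1] == HASANTA and (i < 2 or label[i - 2] not in RA):
            return False                       # C + Hasanta + ৎ with C other than RA
    return True


if __name__ == "__main__":
    print(khanda_ta_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
    print(khanda_ta_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ
```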

¹⁰ Refer to Rule P in Section 7, Table 16.

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). To render Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English ‘bat’, ‘fat’, etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a ‘vowel killer’, although it actually indicates the absence of a vowel after the marked consonant, and only consonants can carry the Hasanta mark. But as we see here, Bangla ends up with a deviant feature in its orthography, in which Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

Another case concerns the formation of repha and ra-phalā in the said script, referred to as P in Table 16. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka ‘the sun’), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra ‘cycle’). The point is that in both cases the slot for ra can be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed (repha) or preceded (ra-phalā) by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.
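The S and P sequences are easiest to see at the code-point level. The sketch below (the helper `code_points` is an illustrative name) prints the stored form of অ্যা, এ্যা and of repha and ra-phalā built with each RA. It also shows that Unicode normalization (NFC, via Python's standard `unicodedata` module) does not merge the র and ৰ spellings, so normalization alone does not make them equivalent labels.

```python
import unicodedata


def code_points(s: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


SEQUENCES = {
    "S1 অ্যা": "\u0985\u09CD\u09AF\u09BE",
    "S2 এ্যা": "\u098F\u09CD\u09AF\u09BE",
    "repha, Bangla RA (র্ক)": "\u09B0\u09CD\u0995",
    "repha, Assamese RA": "\u09F0\u09CD\u0995",
    "ra-phala, Bangla RA (ক্র)": "\u0995\u09CD\u09B0",
    "ra-phala, Assamese RA": "\u0995\u09CD\u09F0",
}

for name, s in SEQUENCES.items():
    print(f"{name}: {code_points(s)}")

# The two repha spellings stay distinct under NFC.
bn = SEQUENCES["repha, Bangla RA (র্ক)"]
asm = SEQUENCES["repha, Assamese RA"]
print(unicodedata.normalize("NFC", bn) == unicodedata.normalize("NFC", asm))  # False
```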

6 Variants
This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. Cases that are confusable but not identical are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviation from the widely accepted norms of Bangla akshar formation. They can confuse even a careful observer and hence are being proposed as variants.
Variants arise in a script when two or more forms that look alike are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant in one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
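The two ways of typing ko can be compared directly with Python's standard `unicodedata` module. The single-key and two-key strings differ as stored, but Unicode Normalization Form C composes ে + া into ো. Since IDNA2008 requires U-labels to be in NFC, this gives a further reason, alongside the kāra + kāra restriction above, why the two-key spelling need not be treated as a separate variant.

```python
import unicodedata

one_key = "\u0995\u09CB"         # ক + ো (BENGALI VOWEL SIGN O)
two_keys = "\u0995\u09C7\u09BE"  # ক + ে (VOWEL SIGN E) + া (VOWEL SIGN AA)

print(one_key == two_keys)                                # False: different code points
print(unicodedata.normalize("NFC", two_keys) == one_key)  # True: NFC composes ে + া into ো
print(unicodedata.normalize("NFD", one_key) == two_keys)  # True: NFD splits ো back apart
```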

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.
CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) with Hasanta appears as a conjunct with স (sa, U+09B8) and ন (na, U+09A8):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

If written in traditional orthography, the above combinations can be a little confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct can occur in the initial, medial or final position (as shown below in example 1). A writer may also type it wrongly as হ (ha, U+09B9), mistaking it for ha, which increases the risk in label writing and identification.
Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)
Fonts that follow the traditional Bangla writing system tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.
CASE II: Another interesting example of variants arises with the ra + Hasanta and Hasanta + ra combinations in labels written in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing ‘repha’ (involving ra + Hasanta) and ‘ra-phalā’ (involving Hasanta + ra). ‘Repha’ can be formed by two sequences, mainly because Assamese and Bangla share the same Unicode block, and ‘B_RA’ as well as ‘A_RA’ refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C
where
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ
Table 12 - Example of Repha

Note: ক and ঠ look exactly the same in Bangla and Assamese, and the repha takes the same shape whichever RA it is formed from, so the resulting combinations look identical; adding the repha makes no visible difference. On similar grounds, ‘ra-phalā’ can be formed by two sequences whose final ligatures look the same:
(1) C1 + H + B_RA
(2) C1 + H + A_RA
where C1 = any consonant except Khanda-ta

Example
Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ
Table 13 - Example of Ra-phala

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded that র and ৰ should be defined as variant code points, so that a single variant set between র and ৰ covers all cases. However, this will create blocked variant labels: e.g. if someone registers “ররর”, the variant label “ৰৰৰ” will be generated and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, that is still possible.
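How a blocked variant set arises from the র/ৰ mapping can be sketched by substituting the variant at every position where it occurs. This assumes, as in the variant-label computation described in RFC 7940, that each occurrence is substituted independently, so mixed labels such as রৰর are generated as well as ৰৰৰ; the table `VARIANTS` and the function `variant_labels` are hypothetical names.

```python
from itertools import product

# Hypothetical variant table: র (U+09B0) and ৰ (U+09F0) are mutual variants.
VARIANTS = {
    "\u09B0": ("\u09B0", "\u09F0"),
    "\u09F0": ("\u09F0", "\u09B0"),
}


def variant_labels(label: str) -> set[str]:
    """Every label obtained by independently swapping র/ৰ, excluding the label itself."""
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}


if __name__ == "__main__":
    blocked = variant_labels("\u09B0\u09B0\u09B0")   # registering ররর
    print(len(blocked))                              # 7 = 2**3 - 1
    print("\u09F0\u09F0\u09F0" in blocked)           # True: ৰৰৰ is blocked
    print("\u09F0\u09F0" in blocked)                 # False: ৰৰ remains available
```

Labels of a different length, such as ৰৰ or ৰৰৰৰ, never appear in the generated set, which matches the statement that blocking applies only at the label level.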

6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as labels on the web domain. Moreover, as far as orthography is concerned, there is no in-script variant in Bangla. The following characters are being proposed by the NBGP as variants. Although certain other characters are somewhat similar, they have not been included here; they are listed in the Appendix (10.3) for reference.

¹¹ Unicode uses Oriya for the script, although Odia is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script
Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F
Table 14 - Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script
Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F
Table 15 - Bangla and Gurmukhī cross-script variant code points

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table under Code point repertoire (Section 5.1).

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1 or S2 (from Table 9): the (æ) Ya-phalā sequence (V1 H C1 M1), where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ BENGALI SIGN VIRAMA)
Table 16 - Symbols used in WLE rules
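To make the notation concrete, the following sketch maps each code point of a label to its Table 16 symbol, so that, for example, বাংলা becomes CMBCM. The code-point ranges are a simplified assumption rather than the LGR repertoire, and the composite symbols S and P, which span several code points, are not detected here.

```python
# Simplified code point -> symbol mapping for Table 16 (an assumption; the
# authoritative repertoire is in the LGR XML file). S and P are
# multi-code-point patterns and are not detected by this sketch.
SINGLE = {0x0982: "B", 0x0981: "D", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}


def symbol(ch: str) -> str:
    cp = ord(ch)
    if 0x0985 <= cp <= 0x098C or cp in (0x098F, 0x0990, 0x0993, 0x0994):
        return "V"
    if (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
            or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)):
        return "C"
    if 0x09BE <= cp <= 0x09C4 or cp in (0x09C7, 0x09C8, 0x09CB, 0x09CC):
        return "M"
    return SINGLE.get(cp, "?")


def symbol_string(label: str) -> str:
    return "".join(symbol(ch) for ch in label)


if __name__ == "__main__":
    print(symbol_string("\u09AC\u09BE\u0982\u09B2\u09BE"))  # বাংলা -> CMBCM
    print(symbol_string("\u0995\u09CD\u0995\u09C0"))        # ক্কী -> CHCM
```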

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters”, which can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”, of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are given below.

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
3. M must be preceded by C. Example: কা
4. D must be preceded by V, C or M. Example: আঁ, খাঁ, হাঁ
5. X must be preceded by V, C, M or D. Example: উঃ, খঃ
6. B must be preceded by V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ (S is not listed because S ends with M; Z may therefore also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1 Case of V Preceded by H.
9. S CANNOT be preceded by H.
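Rule 1 refers to the normalized forms of ড়, ঢ় and য় because these precomposed letters are Unicode composition exclusions: under NFC they are stored as the base consonant followed by BENGALI SIGN NUKTA (U+09BC). This can be confirmed with Python's standard `unicodedata` module; an implementation working on NFC input therefore has to treat consonant + nukta as a single C.

```python
import unicodedata

for ch in ("\u09DC", "\u09DD", "\u09DF"):   # ড়, ঢ়, য়
    nfc = unicodedata.normalize("NFC", ch)
    print(f"U+{ord(ch):04X} ->", " ".join(f"U+{ord(c):04X}" for c in nfc))
# U+09DC -> U+09A1 U+09BC
# U+09DD -> U+09A2 U+09BC
# U+09DF -> U+09AF U+09BC
```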

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in admitting such ligatures, because any user can easily create them even if they are not attested.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning ‘Bank of India’). Here two different words are joined together, the first of which ends with an H (অফ্) while the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the sound intended for the first word.

This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed after an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
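Taken together, rules 1 to 9 can be prototyped as a left-to-right check of each symbol against its immediate predecessor. The sketch below is a simplified illustration, not the LGR's own WLE implementation: the code-point ranges are assumptions, S (অ্যা/এ্যা) is treated as one unit that behaves like M for what follows, and consonant + nukta is folded into C as rule 1 requires. Applied to the ‘Bank of India’ example above, it rejects the label because of the ফ্ + ই sequence (rule 8).

```python
import unicodedata

RA = {"\u09B0", "\u09F0"}                 # র, ৰ (P = RA + Hasanta)
S_TAIL = "\u09CD\u09AF\u09BE"             # Hasanta + YA + AA-kara (after অ or এ)
ALLOWED_PREV = {                          # rules 2-7: permitted predecessors
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}


def symbol(ch):
    cp = ord(ch)
    if 0x0985 <= cp <= 0x098C or cp in (0x098F, 0x0990, 0x0993, 0x0994):
        return "V"
    if (0x0995 <= cp <= 0x09A8 or 0x09AA <= cp <= 0x09B0 or cp == 0x09B2
            or 0x09B6 <= cp <= 0x09B9 or cp in (0x09F0, 0x09F1)):
        return "C"
    if 0x09BE <= cp <= 0x09C4 or cp in (0x09C7, 0x09C8, 0x09CB, 0x09CC):
        return "M"
    return {0x0982: "B", 0x0981: "D", 0x0983: "X",
            0x09CD: "H", 0x09CE: "Z", 0x09BC: "N"}.get(cp, "?")


def tokenize(label):
    """Split an NFC label into (symbol, text) tokens, grouping S and C + nukta."""
    text = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(text):
        if text[i] in "\u0985\u098F" and text[i + 1:i + 4] == S_TAIL:
            tokens.append(("S", text[i:i + 4]))
            i += 4
            continue
        sym = symbol(text[i])
        if sym == "N" and tokens and tokens[-1][0] == "C":
            tokens[-1] = ("C", tokens[-1][1] + text[i])   # rule 1: CN counts as C
        else:
            tokens.append((sym, text[i]))
        i += 1
    return tokens


def wle_ok(label):
    tokens = tokenize(label)
    if not tokens:
        return False
    for k, (sym, _) in enumerate(tokens):
        if sym in ("N", "?"):
            return False                        # outside this sketch's repertoire
        if k == 0:
            if sym not in ("V", "C", "S"):
                return False                    # marks, H and Z cannot start a label
            continue
        prev = tokens[k - 1][0]
        prev = "M" if prev == "S" else prev     # S ends with a kāra
        if sym == "Z" and prev == "H" and k >= 2 and tokens[k - 2][1] in RA:
            prev = "P"                          # র/ৰ + Hasanta before ৎ
        if sym in ("V", "S") and prev == "H":
            return False                        # rules 8 and 9
        if sym in ALLOWED_PREV and prev not in ALLOWED_PREV[sym]:
            return False                        # rules 2-7
    return True


if __name__ == "__main__":
    bank_of_india = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995\u0985\u09AB"
                     "\u09CD\u0987\u09A8\u09CD\u09A1\u09BF\u09DF\u09BE")
    print(wle_ok(bank_of_india))   # False (ফ্ + ই violates rule 8)
```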

7.2 Additional Examples from Bangla ABNF
Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक
As can be seen, such combinations automatically produce a “golu” or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example
অ্ अ्

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a consonant, vowel or kāra (Mātrā) is restricted to one; thus the following combinations are invalid:

Example
কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:
কীী कीी

5. M is not permitted after V. Example:
ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Example
কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka

9 References
[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.
[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.
[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. Reprint, New Delhi: Asian Educational Services, 2003.
[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.
[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.
[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan o Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.
[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.
[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.
[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.
[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (reprint of the Calcutta 1920 edition). New Delhi: Asian Educational Services.
[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.
[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.
[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti o Bangla Boi. Kolkata: Ababhas.
[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.
[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.
[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.
[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.
[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal Choice and Spelling Reform: An Essay in Language and Planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.
[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.
[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.
[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.
[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.
[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.
[124] Ethnologue. Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.
[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.
[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.
[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.
[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.
[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.
[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.
[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.
[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.
[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar, and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.
[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.
[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.
[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.
[140] Ferguson, Charles A., and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language 36(1): 22-59.
[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.
[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.
[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.
[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.
[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.
[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.
[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.
[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.
[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.
[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.
[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. ‘What Has Happened So Far in Terms of Script Reforms’. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.
[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0: Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the structures of the formalism do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, though a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, e.g. ৎসেরিং (Tsering).
2. CH can combine with Khanda Ta only when C is ra (র, 09B0), as in র্ৎ in ভর্ৎসনা.
3. Only the following VHCM combinations are allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
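Rules 1 and 3 of this appendix translate into two extra checks on top of the WLE rules. The sketch below is illustrative (the function name `bangla_restrictions_ok` is hypothetical, and the vowel range is a simplified assumption). It leaves out য় at the start of a label, since the text above cites attested exceptions, and like this section it covers only Bangla, not Assamese or Manipuri.

```python
NO_INITIAL = {"\u09CE", "\u099E", "\u0999"}          # ৎ, ঞ, ঙ (rule 1)
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE",          # অ্যা (rule 3)
                "\u098F\u09CD\u09AF\u09BE"}          # এ্যা
HASANTA = "\u09CD"


def is_vowel(ch: str) -> bool:
    """Independent vowels (simplified range, an assumption)."""
    cp = ord(ch)
    return 0x0985 <= cp <= 0x098C or cp in (0x098F, 0x0990, 0x0993, 0x0994)


def bangla_restrictions_ok(label: str) -> bool:
    if not label or label[0] in NO_INITIAL:
        return False
    for i, ch in enumerate(label):
        # A Hasanta right after a full vowel is allowed only inside অ্যা / এ্যা.
        if ch == HASANTA and i > 0 and is_vowel(label[i - 1]):
            if label[i - 1:i + 3] not in ALLOWED_VHCM:
                return False
    return True


if __name__ == "__main__":
    print(bangla_restrictions_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড
    print(bangla_restrictions_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))        # ৎসেরিং
```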

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17 - The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī
Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable
Table 18 - Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi
Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable
Table 19 - Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable
Table 20 - Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)
Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable
Table 21 - Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable
Table 22 - Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali consonants and their allographs

Consonants  Phonetic Value  Allographs: Clusters (Transparent Form, Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র). There is a marked form of ত + ্ = ৎ: ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ ং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts ঙ is replaced by ং, e.g. কং, অং.)

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব

Page 40: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

40

CD কঁ कँ

CX কঃ कः

CH ক্ क् (Pure consonant)

CDB কঁং कँं

CDX কঁঃ कँः

5.4.3.3 CM Sequence
A CM sequence can be optionally followed by B, D, X, DB or DX.

Example:
CMB কীং কং कक
CMD কাঁ काँ
CMX বীঃ वीः
CMDB কাঁং काँं
CMDX কাঁঃ काँः

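5.4.3.4 Sequence of Consonants
A sequence of consonants (up to 4) joined by Hasanta (also known as Virama):

3(CH)C

Example:
CHC ন্ত → ন + ্ + ত; न्त → न + ् + त
CHCHC ন্ত্র → ন + ্ + ত + ্ + র; न्त्र → न + ् + त + ् + र
CHCHCHC ন্ত্র্য → ন + ্ + ত + ্ + র + ্ + য; न्त्र्य → न + ् + त + ् + र + ् + य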

5.4.3.5 Subsets
While considering its subsets, as a representative example we will consider the combination CHC only; however, the same is equally applicable to CHCHC and CHCHCHC.

[A] The combination may be followed by M, B, D, X, DB or DX.

Example:
CHCM ক্কী → ক + ্ + ক + ী; क्की → क + ् + क + ी
CHCB ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCD ক্কঁ → ক + ্ + ক + ঁ; क्कँ → क + ् + क + ँ
CHCX ক্কঃ → ক + ্ + ক + ঃ; क्कः → क + ् + क + ः
CHCDB ক্কঁং → ক + ্ + ক + ঁ + ং; क्कँं → क + ् + क + ँ + ं
CHCDX ক্কঁঃ → ক + ্ + ক + ঁ + ঃ; क्कँः → क + ् + क + ँ + ः

[B] 3(CH)CM may further be followed by a B, D, X, DB or DX.

Example:
CHCMB ক্কীং → ক + ্ + ক + ী + ং; क्कीं → क + ् + क + ी + ं
ক্কং → ক + ্ + ক + ং; क्कं → क + ् + क + ं
CHCMD ক্কাঁ → ক + ্ + ক + া + ঁ; क्काँ → क + ् + क + ा + ँ
CHCMX ক্কীঃ → ক + ্ + ক + ী + ঃ; क्कीः → क + ् + क + ी + ः
CHCMDB ক্কাঁং → ক + ্ + ক + া + ঁ + ং; क्काँं → क + ् + क + ा + ँ + ं
CHCMDX ক্কাঁঃ → ক + ্ + ক + া + ঁ + ঃ; क्काँः → क + ् + क + ा + ँ + ः
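The syllable shapes listed in 5.4.3 can be summarised as a single pattern over the category symbols of Table 16. The following is a minimal Python sketch, assuming a label (or syllable) has already been reduced to a string of symbols such as "CHCMDB"; the pattern and the name `is_syllable` are illustrative only and are not taken from the LGR XML file.

```python
import re

# Hypothetical pattern for the shapes in 5.4.3 (not the normative LGR):
#   up to three C+H pairs, then a consonant, then either a final H
#   (pure consonant) or an optional kāra (M) followed by an optional
#   B, D, X, DB or DX tail.
SYLLABLE = re.compile(r"(?:CH){0,3}C(?:H|M?(?:DB|DX|B|D|X)?)")


def is_syllable(symbols: str) -> bool:
    """Return True if the whole symbol string matches one of the listed shapes."""
    return SYLLABLE.fullmatch(symbols) is not None


for shape in ["CH", "CMDX", "CHCMDB", "CHCHCHC", "CHCHCHCHC", "CBD"]:
    print(shape, is_syllable(shape))
```

Because the tail only admits DB or DX as two-symbol combinations, an order such as BD is rejected, and a fifth consonant in a cluster falls outside the "up to 4" limit.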

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A single 'Khanda'-Ta (Z)
Example: Z = ৎ

5.4.4.2 A Khanda Ta Combination[10]
A Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama).

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'.
Note: The condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

5.4.5 Special Cases S and P
Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) can be described briefly here. Let us take up S in the first instance. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E). For rendering Ya-phalā following অ and এ, it is necessary to type U+09CD plus U+09AF ya preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as the 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

Another case refers to the formation of repha and ra-phalā in the said script, referred to above as P. Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALA). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka "the sun"); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra "cycle"). The point is that in both cases the slot for ra could be the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

10 Refer to Rule P in Section 7, Table 16.

6 Variants
This section talks about the variants in the Bangla script. The NBGP categorizes these confusing variants in two groups:
Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from the character formations normally perceived by the larger linguistic community.
For Group 1, any identical code points are defined as variants. The confusable but not identical cases are not proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. However, cases which belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla Akshar formation. They can cause confusion even to a careful observer and are hence being proposed as variants.

Variants are generated in a script when two or more forms are formed with different storage or code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant at one go, or type e-kāra and ā-kāra as two separate keys, getting the same result. A reader cannot differentiate between the two ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
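The ko example can be checked directly against Unicode normalization. In the sketch below (Python standard library only), U+09CB BENGALI VOWEL SIGN O has the canonical decomposition U+09C7 + U+09BE, so the two-key and the one-key spellings become identical under NFC. It assumes labels are NFC-normalized before evaluation, as IDNA2008 requires of U-labels; the variable names are illustrative.

```python
import unicodedata

KA, E_KAR, AA_KAR, O_KAR = "\u0995", "\u09C7", "\u09BE", "\u09CB"

typed_with_two_keys = KA + E_KAR + AA_KAR   # ক + ে + া
typed_with_one_key = KA + O_KAR             # ক + ো

# The raw code point sequences differ ...
print(typed_with_two_keys == typed_with_one_key)
# ... but NFC composes U+09C7 U+09BE into U+09CB, so they converge.
print(unicodedata.normalize("NFC", typed_with_two_keys) == typed_with_one_key)
# The canonical decomposition recorded in the Unicode Character Database.
print(unicodedata.decomposition(O_KAR))
```

Once normalized, the two-key form no longer contains two consecutive kāras, which is why the pair does not need a variant definition.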

6.1 In-Script Variants
However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw our attention to cases wherein Hasanta with (U+09A5) থ (tha) appears as a conjunct with (U+09B8) স (sa) and (U+09A8) ন (na):
1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in the initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly, thinking it was a হ (ha) U+09B9, increasing the risk of errors in label writing and identification.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in paragrantha)

The fonts which represent the traditional Bangla writing system could tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of variants is encountered in ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences (mainly because Assamese and Bangla share the same Unicode block, and 'B_RA' as well as 'A_RA' refer to the same phonetic element). Here the final ligatures look the same and will be as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where
B_RA = U+09B0 BENGALI LETTER RA (র), or
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:
Sequence 1 (using Bangla RA) → Ligature 1 | Sequence 2 (using Assamese RA) → Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) → র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) → ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) → র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) → ৰ্ঠ

Table 12: Example of Repha

Note: As Bangla and Assamese ক and ঠ look exactly the same, the resultant combinations with Repha look identical; the addition of Repha does not make any difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta

Example:
Sequence 1 (using Bangla RA) → Ligature 1 | Sequence 2 (using Assamese RA) → Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) → ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) → ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) → ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) → ন্ৰ

Table 13: Example of Ra-phala

As the Assamese and Bangla Repha and Ra-phala conjunct forms look the same, this could cause confusability for end-users. Hence the repha and ra-phala cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, where a single variant set between র and ৰ covers all cases. But this will create blocked variant labels: e.g. if someone registers "ররর", the variant label "ৰৰৰ" will be generated as a variant and will be blocked, and vice versa. However, the block applies only at the label level; if someone else needs to register other labels, e.g. "ৰৰ" or "ৰৰৰৰ", it is still possible.
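The effect of the two in-script cases on label generation can be sketched as follows. This is a simplified Python illustration, not the LGR engine: it assumes র/ৰ form one code point variant set (CASE II) and treats থ/হ as interchangeable only directly after স or ন plus Hasanta (CASE I). All combinations, including mixed ones, are produced; how each is dispositioned (for example, blocked) is decided by the LGR XML file, which this sketch does not model. The function names are hypothetical.

```python
from itertools import product

RA_BANGLA, RA_ASSAMESE = "\u09B0", "\u09F0"      # র, ৰ
THA, HA, VIRAMA = "\u09A5", "\u09B9", "\u09CD"   # থ, হ, ্
SA, NA = "\u09B8", "\u09A8"                      # স, ন


def choices_at(label: str, i: int) -> list[str]:
    """Code points that may stand at position i of a variant label."""
    ch = label[i]
    if ch in (RA_BANGLA, RA_ASSAMESE):                       # CASE II
        return [RA_BANGLA, RA_ASSAMESE]
    if (ch in (THA, HA) and i >= 2                           # CASE I
            and label[i - 1] == VIRAMA and label[i - 2] in (SA, NA)):
        return [THA, HA]
    return [ch]


def variant_labels(label: str) -> set[str]:
    """All labels reachable by swapping variant code points, minus the original."""
    options = [choices_at(label, i) for i in range(len(label))]
    return {"".join(p) for p in product(*options)} - {label}


print(len(variant_labels(RA_BANGLA * 3)))
print(RA_ASSAMESE * 3 in variant_labels(RA_BANGLA * 3))
print(variant_labels(SA + VIRAMA + THA) == {SA + VIRAMA + HA})
```

Labels of a different length, such as ৰৰ or ৰৰৰৰ, can never be produced from ররর, which matches the point above that blocking applies only to the whole label.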

6.2 Cross-Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia[11] (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as variants. Although there are certain characters which are somewhat similar, they have not been included here; they have been provided in the Appendix (10.3) for reference.

11 Unicode uses Oriya for the script, although Odia is now the official term used.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 - Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 - Bangla and Gurmukhī cross-script variant code points
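One way to read Tables 14 and 15 is as whole-label substitution: a Bangla label has a Devanāgarī (or Gurmukhī) counterpart only if every one of its code points appears in the table. The Python sketch below encodes just the four pairs listed; the dictionary layout and the function name are assumptions for illustration, and the normative definition remains the LGR XML file.

```python
# Hypothetical encoding of Tables 14 and 15 (Bangla code point -> other script).
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম→म, ি→ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম→ਸ, ি→ਿ
}


def cross_script_variants(label: str) -> dict[str, str]:
    """Return {script: label} for each script in which the whole label has a counterpart."""
    result = {}
    for script, mapping in CROSS_SCRIPT.items():
        if label and all(ch in mapping for ch in label):
            result[script] = "".join(mapping[ch] for ch in label)
    return result


print(cross_script_variants("\u09AE\u09BF"))  # মি (ম + ি in storage order)
print(cross_script_variants("\u0995"))        # ক has no listed counterpart
```

A label containing any character outside the table (for example ক) yields no cross-script variant at all, since a single unmatched code point already distinguishes the labels.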

7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Bangla[12] script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as mentioned in the table provided in the Code point repertoire (Section 5.1):

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1/S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ - BENGALI SIGN VIRAMA)

Table 16 - Symbols used in WLE rules

12 As used by the Unicode, denoting and including both Assamese and Maṇipuri.
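To apply the WLE rules, a label first has to be mapped onto the symbols of Table 16. The Python sketch below does this for the Bengali block, assuming NFC input (so ড়, ঢ়, য় arrive as base + U+09BC NUKTA and are counted as a single C, the CN set of rule 1), recognising S as the exact sequences অ্যা/এ্যা and P as র or ৰ followed by Hasanta. The character sets are abbreviated assumptions, not the full code point repertoire of Section 5.1, and `to_symbols` is a hypothetical name.

```python
import unicodedata


def _chars(*codes: int) -> set[str]:
    """Build a set of characters from code point integers."""
    return {chr(c) for c in codes}


# Abbreviated category sets (assumptions, not the Section 5.1 repertoire).
VOWELS = _chars(*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994)
CONSONANTS = _chars(*[c for c in range(0x0995, 0x09BA)
                      if c not in (0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5)],
                    0x09F0, 0x09F1)
MATRAS = _chars(*range(0x09BE, 0x09C4), 0x09C7, 0x09C8, 0x09CB, 0x09CC)
SINGLE = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA, VIRAMA = "\u09BC", "\u09CD"
S_INITIALS = {"\u0985", "\u098F"}   # অ, এ
S_TAIL = "\u09CD\u09AF\u09BE"       # ্ + য + া
RA = {"\u09B0", "\u09F0"}           # র, ৰ


def to_symbols(label: str) -> list[str]:
    """Map a Bengali-script label to Table 16 symbols after NFC normalization."""
    s = unicodedata.normalize("NFC", label)
    out: list[str] = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch in S_INITIALS and s[i + 1:i + 4] == S_TAIL:
            out.append("S")         # অ্যা / এ্যা as one unit
            i += 4
        elif ch in RA and s[i + 1:i + 2] == VIRAMA:
            out.append("P")         # Ra + Hasanta
            i += 2
        elif ch in CONSONANTS:
            out.append("C")
            # CN: after NFC, ড়/ঢ়/য় are base + nukta; count the pair once.
            i += 2 if s[i + 1:i + 2] == NUKTA else 1
        elif ch in VOWELS:
            out.append("V")
            i += 1
        elif ch in MATRAS:
            out.append("M")
            i += 1
        elif ch in SINGLE:
            out.append(SINGLE[ch])
            i += 1
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
    return out


print(to_symbols("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
```

For ভর্ৎসনা the sketch yields C, P, Z, C, C, M, i.e. the Khanda Ta is preceded by P exactly as 5.4.4.2 describes.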

It is perhaps worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters" that could theoretically conjoin from two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", out of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, p. 4]. More details of such combinations can be seen in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Based on the prevalent patterns in Bangla and the various restrictions, below are the specific WLE rules that need to be implemented:

1. C is a set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
Example: ক্
3. M must be preceded by C.
Example: কা
4. D must be preceded by either of V, C or M.
Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either of V, C, M or D.
Example: উঃ, খঃ, বঃ, দঁঃ
6. B must be preceded by either of V, C, M or D.
Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P.
Example: ইৎ, কৎ
(S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning Bank of India). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of a hyphen, a space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence this is explicitly prohibited by the NBGP. In future, if required by the prevailing needs of the community, a future NBGP may consider revisiting this rule.
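The nine rules of 7.1 are all statements about which symbol may (or may not) immediately precede another, so they can be checked with a small lookup table. The Python sketch below works on a list of Table 16 symbols in which S and P have already been collapsed into single symbols; a symbol at the start of the label has no predecessor and therefore fails any "must be preceded by" rule. The name `check_wle` and the table layout are hypothetical, and the sketch is not a substitute for the WLE rules in the LGR XML file.

```python
# Predecessors allowed by rules 2-7 of 7.1. At the start of a label the
# predecessor is None, which is in none of these sets.
MUST_FOLLOW = {
    "H": {"C"},                                     # rule 2
    "M": {"C"},                                     # rule 3
    "D": {"V", "C", "M"},                           # rule 4
    "X": {"V", "C", "M", "D"},                      # rule 5
    "B": {"V", "C", "M", "D"},                      # rule 6
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},  # rule 7 (S ends with M)
}
# Predecessors forbidden by rules 8 and 9.
MUST_NOT_FOLLOW = {"V": {"H"}, "S": {"H"}}


def check_wle(symbols: list[str]) -> list[tuple[int, str]]:
    """Return (position, symbol) for every symbol that breaks a rule."""
    violations = []
    for i, sym in enumerate(symbols):
        prev = symbols[i - 1] if i > 0 else None
        if sym in MUST_FOLLOW and prev not in MUST_FOLLOW[sym]:
            violations.append((i, sym))
        elif prev in MUST_NOT_FOLLOW.get(sym, set()):
            violations.append((i, sym))
    return violations


print(check_wle(list("CHCMDB")))  # a shape listed in 5.4.3
print(check_wle(list("CBB")))     # কংং: a second Anusvāra
print(check_wle(list("CHV")))     # V after H (rule 8)
```

Several of the restrictions listed in 7.2 (a second B, D or X, the Anusvāra/Visarga combinations, a kāra after a vowel) fall out of these adjacency rules without separate checks.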

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word.
Example:
্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक
As can be seen, such combinations automatically result in a "golu" or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is quasi-automatically applied wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.
Example:
অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D or X permitted after a consonant, a vowel or a kāra (Mātrā) is restricted to one; thus the following combinations are invalidated:
Example:
কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one.
Example: কীী की
5. M is not permitted after V.
Example: ইা ঈৗ ईा ईौ
6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Example:
কংঃ कंः
কঃং कःं

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata/New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje & Das, Rhea S. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL. 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan (1957). 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke (2015). 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury (1960). 'Phones of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad (2007). Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka (1966). Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul (1960). A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.) (1942). Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar (1975). Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.) (2014). Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy (2012). Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 - Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla[13] in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, e.g. the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. However, there are instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning (as in য়াকুব). In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khanda Ta only in the case where C is ra (র) (09B0):
র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
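The three Bangla-specific restrictions above operate on actual code points rather than on symbols. The following is a minimal Python sketch, assuming NFC input; the function name and the message wording are hypothetical. য় at the start of a label is deliberately not rejected, since the text cites legitimate exceptions, and restriction 2 is applied exactly as stated here (only র, U+09B0), which is narrower than the র/ৰ condition of 5.4.4.2.

```python
VIRAMA, KHANDA_TA, RA = "\u09CD", "\u09CE", "\u09B0"
NO_INITIAL = {"\u09CE", "\u099E", "\u0999"}     # ৎ, ঞ, ঙ (restriction 1)
VOWELS = {chr(c) for c in range(0x0985, 0x098C)} | {"\u098F", "\u0990", "\u0993", "\u0994"}
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE",     # অ্যা
                "\u098F\u09CD\u09AF\u09BE"}     # এ্যা


def bangla_restrictions(label: str) -> list[tuple[int, str]]:
    """Return (position, reason) pairs for violations of restrictions 1-3 of 10.1."""
    problems = []
    if label[:1] in NO_INITIAL:
        problems.append((0, f"U+{ord(label[0]):04X} may not begin a label"))
    for i, ch in enumerate(label):
        # Restriction 2: C + Hasanta + Khanda Ta only when C is RA (U+09B0).
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == VIRAMA and label[i - 2] != RA:
            problems.append((i, "C + Hasanta + Khanda Ta requires C = U+09B0"))
        # Restriction 3: a vowel followed by Hasanta only in the two S sequences.
        if ch in VOWELS and label[i + 1:i + 2] == VIRAMA and label[i:i + 4] not in ALLOWED_VHCM:
            problems.append((i, "V + H + C + M is limited to U+0985/U+098F + U+09CD U+09AF U+09BE"))
    return problems


print(bangla_restrictions("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা
print(bangla_restrictions("\u09CE\u09B8"))                                # ৎস...
```

As written, a transliteration such as ৎসেরিং (Tsering), which the text mentions as an emerging possibility, is rejected by restriction 1.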

10.2 'Sylheti Nagarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script with 32 symbols are reproduced here (138).

Table 17 - The Script Table of Sylheti Nagarī or Siloti

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 - Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 - Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 - Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 - Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 - Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali Consonants and Their Allographs

Consonants | Phonetic Value | Allographs (Clusters) | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ: ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts aelig is replaced by ং.) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s, ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word it is the vowel æ (as in শ্যামল, র‍্যাপার etc.). But after a non-initial consonant it just doubles it in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক8(p+য)স8(+য)র 8(+য) [র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols, Ya-phala and a-kara, constitute a vowel sign representing the vowel অ্যা.]

র r Two manifestations:
(i) রেফ repʰ as the first member of a cluster, e.g. প6ৎ6 not6 য6D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster) etc. (placed over the following consonant)
(ii) র-ফলা rɔ-pʰɔla as the second/third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব

Page 41: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

41

CHCX sup3ঃ rarrক ক ঃ 4कःrarrककঃ

CHCDB sup3 ং rarrক ক ং 4कrarrकक

CHCDX sup3ঃ rarrক ক ঃ 4कःrarrककः [B] 3(CH)CMmayfurtherbefollowedbyaBDXDBorDX

Example

CHCMB sup3ীং rarr ক ক ী ং 4क rarr क क ी

sup3ং rarr ক ক ং 4क rarr क क

CHCMD sup3া rarr ক ক া 4का rarr क क ा

CHCMX sup3ীঃ rarr ক ক ী ঃ 4कः rarr क क ी ः

CHCMDB sup3াংrarr ক ক া ং 4काrarr क क ा

CHCMDX sup3াঃ rarr ক ক া ঃ 4काः rarr क क ा ः

5.4.4 The Khanda-Ta Sequence

5.4.4.1 A single 'Khanda'-Ta (Z). Example: Z = ৎ

5.4.4.2 A Khanda Ta combination¹⁰: a Khanda Ta can be preceded by a consonant and Hasanta (also known as Virama):

[CH]Z

Example: র + ্ + ৎ = র্ৎ, as in ভর্ৎসনা (bhartsanā) 'scolding'. Note: the condition in this context of KHANDA TA is that the C should be either RA U+09B0 (র) (used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).

10 Refer to Rule P in Section 7, Table 16.

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section 7) are described briefly here. Let us take up S first. It is noteworthy that there are two instances in Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A and U+098F এ BENGALI LETTER E). For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD plus U+09AF (ya) preceded by the said vowels. This is a purely ligatural entity, and the addition of Ya-phalā and ā-kāra is used to elicit the æ sound, as in English 'bat', 'fat', etc. The Brāhmī script by nature does not have Hasanta after a vowel. Hasanta is generally described as a 'vowel killer', although it actually indicates the absence of a vowel after the marked consonant; only consonants can have the Hasanta marked. But, as we see here, Bangla ends up with a deviant feature in its orthography, wherein Hasanta comes immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

The other case concerns the formation of repha and ra-phalā in the said script, referred to as P in Table 16. Owing to its co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka 'the sun'), and ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra 'cycle'). The point is that in both cases the slot for ra could be filled by the Bangla ra র (U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta (U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.
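The Khanda Ta condition of 5.4.4.2 can be checked mechanically. The Python sketch below is a hypothetical helper (not part of the XML LGR) that accepts ৎ after a Hasanta only when the consonant before that Hasanta is র (U+09B0) or ৰ (U+09F0). It also rejects a label-initial ৎ, following Rule 7 of Section 7, which requires Z to be preceded by another symbol.

```python
KHANDA_TA = "\u09CE"              # ৎ
HASANTA = "\u09CD"                # ্
RA_FORMS = {"\u09B0", "\u09F0"}   # র (Bangla), ৰ (Assamese)


def khanda_ta_context_ok(label: str) -> bool:
    """Hypothetical check for the [CH]Z condition described in 5.4.4.2."""
    for i, ch in enumerate(label):
        if ch != KHANDA_TA:
            continue
        if i == 0:
            return False  # Z must be preceded by something
        if label[i - 1] == HASANTA and (i < 2 or label[i - 2] not in RA_FORMS):
            return False  # C + Hasanta + Z is allowed only when C is a RA
    return True


print(khanda_ta_context_ok("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা: True
print(khanda_ta_context_ok("\u0995\u09CD\u09CE"))                          # ক্ৎ: False
```

A plain consonant directly before ৎ (as in কৎ) is left alone by this helper, since the condition applies only to the C + Hasanta + Z combination.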


6. Variants

This section discusses the variants in the Bangla script. The NBGP categorizes these confusing variants into two groups:

Group 1: Confusing due to pure visual similarity.
Group 2: Confusing due to deviation from character formations normally perceived by the larger linguistic community.

For Group 1, only identical code points are defined as variants. Confusable but non-identical cases are not proposed, as another panel (the String Similarity Assessment Panel) is entrusted with such cases. However, cases that belong to Group 2 are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Bangla akshara formation. They can cause confusion even to a careful observer and hence are being proposed as variants.

Variants arise in a script when two or more forms that look the same are stored with different code points. In Bangla, the e-kāra, ā-kāra and o-kāra have different code points. One can type o with a consonant at one go, or obtain the same result by typing e-kāra and ā-kāra as two separate keys. A reader cannot differentiate between the two instances of ko (কো), one typed with a single key and the other typed with two different keys. However, this will not be considered a case of variant, because a kāra followed by a kāra is not allowed.
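The o-kāra case can be confirmed from Unicode data: U+09CB (ো) has a canonical decomposition to U+09C7 (ে) + U+09BE (া), so the one-key and two-key spellings are canonically equivalent, and NFC normalization, which IDNA2008 requires of U-labels, folds them into one form. A minimal check with Python's standard `unicodedata` module:

```python
import unicodedata

ONE_KEY = "\u0995\u09CB"         # ক + ো (BENGALI VOWEL SIGN O)
TWO_KEYS = "\u0995\u09C7\u09BE"  # ক + ে (VOWEL SIGN E) + া (VOWEL SIGN AA)

print(unicodedata.decomposition("\u09CB"))                # 09C7 09BE
print(ONE_KEY == TWO_KEYS)                                # False: different code points
print(unicodedata.normalize("NFC", TWO_KEYS) == ONE_KEY)  # True: same text after NFC
```

Assuming labels are NFC-normalized before the LGR is applied, the two-key form never needs a variant mapping; if it were submitted unnormalized, it would read as M followed by M and fail Rule 3 of Section 7.1 (M must be preceded by C).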

6.1 In-Script Variants

However, we propose two cases of true in-script variants in the Bangla script.

CASE I: As far as true variants in Bangla are concerned, we may draw attention to cases wherein থ (tha, U+09A5) appears in a conjunct, through Hasanta, with স (sa, U+09B8) and ন (na, U+09A8):

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)

2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, could be a little confusing, as the থ (tha) in the conjunct appears like a হ (ha). The conjunct could be in initial, medial or final position (as shown below in example no. 1). It could also be typed wrongly in the belief that it was a হ (ha, U+09B9), increasing the risk of errors in label writing and identification. Examples:

1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and ন্হ (as in paragrantha)

The fonts which represent the traditional Bangla writing system could tend to create this problem. Therefore, these may be taken as cases of variants in Bangla.

CASE II: Another interesting example of variant is encountered in the ra + Hasanta and Hasanta + ra combinations when writing labels in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variant cases arise in typing 'repha' (involving ra + Hasanta) and 'ra-phalā' (involving Hasanta + ra). 'Repha' could be formed by two sequences (mainly because Assamese and Bangla share the same Unicode block, and 'B_RA' and 'A_RA' refer to the same phonetic element). Here the final ligatures look the same, as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র),
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ),
H = U+09CD BENGALI SIGN VIRAMA (্),
C = any consonant (theoretically).

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক
U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha

Note: As the Bangla and Assamese ক and ঠ look exactly the same, the resulting combinations with Repha look identical; the addition of Repha does not make any difference. 'Ra-phalā' could be formed by two sequences on similar grounds, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta.

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla Repha and Ra-phalā conjunct forms look the same, they could confuse end-users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded that র and ৰ should be defined as variant code points, so that a single variant set between র and ৰ covers all cases. However, this will create blocked variant labels: e.g. if someone registers “ররর”, the variant label “ৰৰৰ” will be generated and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, it is still possible.
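A minimal Python sketch of how the two in-script variant cases turn one registered label into a set of blocked variant labels. It assumes that every র/ৰ position may be swapped independently (Case II), and that থ/হ are swapped only immediately after স or ন plus Hasanta (Case I). That context rule and all names here are illustrative assumptions; the authoritative mappings and dispositions are those in the XML LGR.

```python
from itertools import product

HASANTA = "\u09CD"
RA_PAIR = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}   # Case II: র <-> ৰ
THA, HA = "\u09A5", "\u09B9"                          # Case I: থ <-> হ
CASE_I_BASES = {"\u09B8", "\u09A8"}                   # only after স or ন + Hasanta


def choices_at(label: str, i: int) -> list[str]:
    """Code points that may appear at position i in a variant of the label."""
    ch = label[i]
    if ch in RA_PAIR:
        return [ch, RA_PAIR[ch]]
    if (ch in (THA, HA) and i >= 2 and label[i - 1] == HASANTA
            and label[i - 2] in CASE_I_BASES):
        return [THA, HA]
    return [ch]


def variant_labels(label: str) -> set[str]:
    """All variant labels of the given label, excluding the label itself."""
    options = [choices_at(label, i) for i in range(len(label))]
    return {"".join(combo) for combo in product(*options)} - {label}


ra3 = "\u09B0" * 3                            # ররর
print(len(variant_labels(ra3)))               # 7
print("\u09F0" * 3 in variant_labels(ra3))    # True: ৰৰৰ is among them
```

Under these assumptions ররর yields seven blocked variants (every mix of র and ৰ, including ৰৰৰ), while a label of a different length such as ৰৰ is never among them, which matches the label-level blocking described above.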

6.2 Cross-Script Variants

A careful cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusions they may cause as labels in the domain name system. Apart from the two cases discussed in Section 6.1, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are being proposed by the NBGP as cross-script variants. Certain other characters that are somewhat similar have not been included here; they are listed in the Appendix (10.3) for reference.

11 Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
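Assuming, as in the Root Zone LGR procedure, that a label may not mix scripts, a Bangla label can only collide with a label in another script when every one of its code points has a cross-script variant. The Python sketch below is limited to the pairs in Tables 14 and 15, uses ISO 15924 codes as dictionary keys, and is an illustration of such a check rather than part of the proposal's XML.

```python
from typing import Optional

# Cross-script variant pairs from Tables 14 and 15 (Bangla -> other script).
CROSS_SCRIPT = {
    "Deva": {"\u09AE": "\u092E", "\u09BF": "\u093F"},   # ম -> म, ি -> ि
    "Guru": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},   # ম -> ਸ, ি -> ਿ
}


def cross_script_twin(label: str, script: str) -> Optional[str]:
    """Return the label rewritten entirely in the other script, or None when
    some code point has no cross-script variant."""
    table = CROSS_SCRIPT[script]
    if all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


print(cross_script_twin("\u09AE\u09BF", "Deva") == "\u092E\u093F")  # True (মি vs मि)
print(cross_script_twin("\u0995\u09BF", "Deva"))                     # None (ক has no twin)
```

A label such as মি therefore has a whole-label twin in both Devanāgarī (मि) and Gurmukhī (ਸਿ), while any label containing a code point outside the tables has none.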

7. Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in the code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1 or S2 (from Table 9), i.e. the (æ) Ya-phalā sequence V1 H C1 M1, where V1 is either 0985 (অ BENGALI LETTER A) or 098F (এ BENGALI LETTER E), H is 09CD (্ BENGALI SIGN VIRAMA), C1 is 09AF (য BENGALI LETTER YA) and M1 is 09BE (া BENGALI VOWEL SIGN AA). S1 and S2 are valid even though they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র BENGALI LETTER RA) or 09F0 (ৰ ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL), and H is 09CD (্ BENGALI SIGN VIRAMA).

Table 16: Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is perhaps also apt to mention here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters”, which can theoretically conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”, of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Given the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are listed below:

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.

2. H must be preceded by C.

Example: ক্

3. M must be preceded by C. Example: কা

4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ

5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ

6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, াৎ, াৎ, প6ৎ, rৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1 (Case of V Preceded by H).

9. S CANNOT be preceded by H.
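Rules 2 to 9 can be applied mechanically once a label is split into the symbols of Table 16. The following Python sketch does this. The repertoire sets are a hypothetical approximation of Unicode's Bengali block rather than the normative repertoire in the XML LGR; S is recognised only as অ্যা or এ্যা; P is recognised only as র/ৰ + Hasanta directly before ৎ; and treating S like M for whatever follows it is an interpretation of the note to Rule 7. All function names are illustrative.

```python
# Hypothetical approximation of the repertoire; the normative list is in the XML LGR.
VOWELS = {chr(c) for c in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                           0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}
CONSONANTS = (
    {chr(c) for c in range(0x0995, 0x09A9)}      # ক .. ন
    | {chr(c) for c in range(0x09AA, 0x09B1)}    # প .. র
    | {"\u09B2"}                                 # ল
    | {chr(c) for c in range(0x09B6, 0x09BA)}    # শ ষ স হ
    | {"\u09DC", "\u09DD", "\u09DF"}             # ড় ঢ় য় (precomposed CN)
    | {"\u09F0", "\u09F1"}                       # ৰ ৱ (Assamese)
)
MATRAS = {chr(c) for c in range(0x09BE, 0x09C5)} | {"\u09C7", "\u09C8", "\u09CB", "\u09CC"}
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা
RA_FORMS = {"\u09B0", "\u09F0"}                                         # র, ৰ

# Rules 2-7: symbol -> symbols allowed immediately before it.
MUST_FOLLOW = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}
NOT_AFTER_H = {"V", "S"}  # Rules 8 and 9


def tokenize(label: str) -> list[str]:
    """Split a label into Table 16 symbols, treating S and P as single units."""
    tokens, i = [], 0
    while i < len(label):
        ch = label[i]
        if label[i:i + 4] in S_SEQUENCES:
            tokens.append("S")
            i += 4
        elif ch in RA_FORMS and label[i + 1:i + 3] == "\u09CD\u09CE":
            tokens.append("P")  # ra + Hasanta directly before Khanda Ta
            i += 2
        elif ch == NUKTA and tokens and tokens[-1] == "C":
            i += 1  # consonant + nukta (decomposed CN) stays one C
        else:
            if ch in VOWELS:
                tokens.append("V")
            elif ch in CONSONANTS:
                tokens.append("C")
            elif ch in MATRAS:
                tokens.append("M")
            elif ch in SIGNS:
                tokens.append(SIGNS[ch])
            else:
                raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
            i += 1
    return tokens


def wle_violations(label: str) -> list[str]:
    """Return one message per violation of Rules 2-9 (empty list = label passes)."""
    tokens = tokenize(label)
    problems = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i else None
        effective = "M" if prev == "S" else prev  # S ends with a mātrā
        if tok in MUST_FOLLOW and effective not in MUST_FOLLOW[tok]:
            problems.append(f"{tok} at token {i} cannot follow {prev or 'the label start'}")
        if tok in NOT_AFTER_H and prev == "H":
            problems.append(f"{tok} at token {i} cannot follow H")
    return problems


bank_of_india = ("\u09AC\u09CD\u09AF\u09BE\u0999\u09CD\u0995"   # ব্যাঙ্ক
                 "\u0985\u09AB\u09CD"                            # অফ্
                 "\u0987\u09A8\u09CD\u09A1\u09BF\u09DF\u09BE")   # ইন্ডিয়া
print(wle_violations(bank_of_india))                             # ['V at token 10 cannot follow H']
print(wle_violations(bank_of_india.replace("\u09AB\u09CD", "\u09AB")))  # []
```

The Bank of India label discussed in 7.1.1 trips Rule 8 because ই follows the Hasanta of অফ্; dropping that Hasanta gives a label with no violations, which is the form the NBGP considers sufficient.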

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures: any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋk ʌv ɪndiə; U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning 'Bank of India'. Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H (U+09CD) is considered sufficient for full representation of the sound intended for the first word.

This is a unique situation, necessitated by the lack of a hyphen, space or Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

7.2 Additional Examples from the Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such combinations automatically result in a “golu” or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a consonant, vowel or kāra (mātrā) is restricted to one; thus the following combinations are invalid:

Example:

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कीःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা, ঈৗ; ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः

কঃং कःं
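The restrictions illustrated in 7.2 follow from the rules in 7.1. The Python sketch below restates Rules 2 to 9 on symbol strings (the Table 16 letters, one per character, with S and P already collapsed into single symbols) and checks the symbol pattern of each example above. The encoding of the examples as symbol strings is an illustrative assumption, and the predecessor table is repeated so that the block runs on its own.

```python
# Rules 2-9 of Section 7.1 restated on symbol strings (one character per symbol).
MUST_FOLLOW = {
    "H": "C",
    "M": "C",
    "D": "VCM",
    "X": "VCMD",
    "B": "VCMD",
    "Z": "VCMDBXP",
}
NOT_AFTER_H = "VS"


def symbols_valid(symbols: str) -> bool:
    """True if no symbol breaks Rules 2-9 with respect to its predecessor."""
    prev = None
    for sym in symbols:
        effective = "M" if prev == "S" else prev  # S ends with a mātrā
        if sym in MUST_FOLLOW and (effective is None or effective not in MUST_FOLLOW[sym]):
            return False
        if sym in NOT_AFTER_H and prev == "H":
            return False
        prev = sym
    return True


# Symbol patterns of the invalid examples in 7.2, keyed by item number.
INVALID_7_2 = {
    1: ["HC", "MC", "BC", "DC", "XC"],            # H, M, B, D, X label-initially
    2: ["VH", "CBH", "CDH", "CXH", "CMH", "SH"],   # H after V, B, D, X, M, S
    3: ["CBB", "CDD", "CXX", "CMDD", "CMXX", "VBB", "VDD", "VXX"],
    4: ["CMM"],                                   # two mātrās on one consonant
    5: ["VM"],                                    # mātrā after a vowel
    6: ["CBX", "CXB"],                            # Anusvāra + Visarga and vice versa
}
VALID = ["CHCDX", "CHCMDB", "CHCMDX", "CMDB", "PZ", "SZ"]

for item, patterns in INVALID_7_2.items():
    print(item, all(not symbols_valid(p) for p in patterns))  # True for every item
print(all(symbols_valid(p) for p in VALID))                   # True
```

Read this way, items 1 to 6 are not extra rules but consequences of the predecessor constraints in 7.1, which is why they are presented only as examples.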

8. Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr. Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr. Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake Sector-V, Kolkata-700091 (West Bengal), India
Mr. Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms. Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr. Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr. Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr. Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr. Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr. Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr. Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr. Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr. Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr. Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms. Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr. Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr. Imran Hossen, CEO, EyeSoft, and key member of the Bangladesh Association of Software & Information Services (BASIS)
Ms. Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr. Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr. Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9. References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata / New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. 'Scriptal choice and spelling reform: An essay in language and planning'. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Björn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emille-ciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. 'Bangla Bhashar Yuktabyanjan'. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. 'Bangla Script: A Structural Study'. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. 'What has happened So Far in Terms of Script Reforms'. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0, Core Specification, Chapter 12, p. 473.

10. Appendix I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa Ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across the possibility of Khaṇḍa Ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can combine with Khanda Ta only where C is ra (র, U+09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
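Restrictions 1 and 3 can be expressed as two simple checks. The Python sketch below is a hypothetical helper: it forbids ৎ, ঞ and ঙ in label-initial position and accepts a Hasanta directly after a vowel only inside অ্যা or এ্যা. It does not make an exception for the ৎস transliteration case mentioned under restriction 1, leaves label-initial য় alone because the text above does not turn it into a firm rule, and does not repeat restriction 2.

```python
FORBIDDEN_INITIAL = {"\u09CE", "\u099E", "\u0999"}   # ৎ, ঞ, ঙ (restriction 1)
HASANTA = "\u09CD"
VHCM_ALLOWED = {"\u0985\u09CD\u09AF\u09BE",           # অ্যা (restriction 3)
                "\u098F\u09CD\u09AF\u09BE"}           # এ্যা
VOWELS = {chr(c) for c in (0x0985, 0x0986, 0x0987, 0x0988, 0x0989, 0x098A,
                           0x098B, 0x098F, 0x0990, 0x0993, 0x0994)}


def appendix_i_ok(label: str) -> bool:
    """Hypothetical check of Appendix I restrictions 1 and 3 for a Bangla label."""
    if label and label[0] in FORBIDDEN_INITIAL:
        return False
    for i, ch in enumerate(label):
        # A Hasanta directly after a vowel is acceptable only inside অ্যা or এ্যা.
        if ch == HASANTA and i > 0 and label[i - 1] in VOWELS:
            if label[i - 1:i + 3] not in VHCM_ALLOWED:
                return False
    return True


print(appendix_i_ok("\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"))  # অ্যাসিড: True
print(appendix_i_ok("\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"))        # ৎসেরিং: False
```

A consonant with ya-phalā such as ক্যা is unaffected, because the Hasanta there follows a consonant rather than a vowel.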

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

10.2 'Sylheti Nagarī Lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti [129], such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some suggest that it was an invention of the Afghan rulers of Sylhet [137]. Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here [138].

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable Code Points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11. Appendix II: Bengali Consonants and Their Allographs

Consonants  Phonetic Value  Allographs

Clusters  Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ, as in র্ৎ (+xৎ).

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ) Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক) uuml(aelig+p+র) egrave(aelig+খ) eth(aelig+গ) ocirc(aelig+ঘ) yacute(aelig+p+ষ) (In some contexts ঙ is replaced by ং, as in কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s & ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). But after a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য (ক + ্ + য), স্য (স + ্ + য), র‍্য (র + ্ + য) [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala and ā-kāra, constitute a vowel sign representing the vowel অ্যা]


র r Two manifestations: (i) রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র + ধ + ব; earlier র্দ্ধ্ব = র + দ + ধ + ব, a four-term cluster), etc. (placed over the following consonant); (ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows).

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃ, কঃ

অব

Page 42: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

42

vowel (U+0985 অ - BANGLA LETTER A and U+098F এ - BANGLA LETTER E) ForrenderingYa-phalāfollowedbyঅandএitisnecessarytotypeU+09CDplusU+09AFyapreceded by the said vowels This is a purely ligatural entity and the addition ofYa-phalāandā-kāraisusedtoelicittheaeligsoundasinEnglishlsquobatrsquolsquofatrsquoetcTheBrahmıscriptbynaturedoesnothaveHasantaafteravowelHasantaisgenerallydescribedaslsquovowel killerrsquo although it actually indicates absence of a vowel after the markedconsonant Only the consonants can have the Hasanta marked But as we see hereBanglaendsupwithadeviantfeatureintheorthographyhereinwhichHasantacomesimmediatelyafteravowelinligaturesঅGাandএGা(CfUnicode100p473[100])Another case refers to the formation of repha and ra-phalā in the said script andmentioned in the tableaboveasPOwing toco-occurrencewithHASANTARAeitherlosesitsownimplicitvowel(REPHA)orsuppressestheimplicitvoweloftheprecedingconsonant(RA-PHALA )Forinstancerepha=ra+Hasanta+C(egকHiera+Hasanta+

kaasinঅকHarkaldquothesunldquo)ra-phalā=C+Hasanta+ra(egTieka+Hasanta+raas

inচTcakraldquocycleldquo)ThepointisinboththecasestheslotforracouldbeBanglaraর

(U+09B0)ortheAssameseraৰ(U+09F0)followedprecededbythecommonHasanta(U+09CD)whereastheshapesofrephaandra-phalāinboththecasesremainthesame

43

6 VariantsThissection talksabout thevariants in theBanglascriptTheNBGPcategorizes theseconfusinglyvariantsintwogroupsGroup1ConfusingduetopurevisualsimilarityGroup2ConfusingduetodeviationfromnormallyperceivedcharacterformationsbylargerlinguisticcommunityForGroup1anyidenticalcodepointsaredefinedasvariantsTheconfusablebutnotidenticalcasesarenotproposedasthereisanotherpanel(Stringsimilarityassessmentpanel)entrustedtodealwithsuchcasesHowevercaseswhichbelongtoGroup2areproposedtobeconsideredasvariantsThesecasesarenotofmerevisualsimilarityasthey involve some deviations from the widely accepted norms of Bangla Aksharformations These can cause confusion even to a careful observer and hence beingproposedasvariantsThe variants are generated in a script when two or more forms are formed withdifferent storage or code points In Bangla the e-kāra ā-kāra and the o-kāra havedifferentcodepointsOnecantypeowithaconsonantatonegoandthesamebytypinge-kāra and ā-kāra as two separate keys getting the same results A reader cannotdifferentiatebetween the twoko (েকা)one typedwitha singlekeyand theotheronetypedwithtwodifferentkeysMoreoverthiswillnotbeconsideredasacaseofvariantbecauseakārafollowedbyakāraisnotallowed

61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus

স +Hasanta+হ(U+09B8+U+09CD+U+09B9)

2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus

ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)

44

Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)

2 andনহ(asinparagrantha)The fontswhichrepresent traditionalBanglawritingsystemcould tend tocreate thisproblemThereforethesemaybetakenascasesofvariantsinBanglaCASEIIAnotherinterestingexampleofvariantisencounteredinra+HasantaandHasanta+racombinationsinwritinglabelsintheBanglascript(forlanguagessuchasBanglaAssameseandManipuri)Thevariantcasesariseintypinglsquorepharsquo(involvingra+Hasanta)andlsquora-phalārsquo(involvingHasanta+ra)lsquoRepharsquocouldbeformedbytwosequences(mainlybecausebothAssameseandBanglafindplaceinthesameUNICODEpointsandlsquoB_RArsquoaswellaslsquoA_RArsquorefertothesamephoneticelement)Herethefinalligatureslookthesameandwillbeasfollows

(1) B_RA+H+C(2) A_RA+H+C

Where

B_RA = U+09B0BENGALILETTERRA(র)orA_RA = U+09F0BENGALILETTERRAWITHMIDDLEDIAGONAL(ৰ)H = U+09CDBENGALISIGNVIRAMA()C = anyconsonant(theoretically)

Example

Sequence1(UsingBanglaRA)

Ligature1

Sequence2(UsingAssameseRA)

Ligature2

U+09B0(র)U+09CD()U+0995(ক) কH U+09F0(ৰ)U+09CD()U+0995(ক) কH

U+09B0(র)U+09CD()U+09A0(ঠ) ঠH U+09F0(ৰ)U+09CD()U+09A0(ঠ) ঠH

Table12ExampleofRepha

Note: As ক and ঠ look exactly the same in Bangla and Assamese, the resulting combinations with repha look identical; adding the repha does not make any visual difference. On similar grounds, ‘ra-phalā’ can be formed by two sequences whose final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda-ta.

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (◌্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (◌্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (◌্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (◌্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could confuse end users. Hence the repha and ra-phalā cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. However, this will create blocked variant labels: for example, if someone registers “ররর”, the variant label “ৰৰৰ” will be generated and blocked, and vice versa. The blocking applies only at the label level: if someone else needs to register a different label, e.g. “ৰৰ” or “ৰৰৰৰ”, that is still possible.
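The label-level behaviour described above can be illustrated with a short Python sketch that enumerates variant labels by replacing র with ৰ (and vice versa) at every combination of positions. It assumes the single one-to-one variant pair defined above, permutes each position independently, and ignores WLE rules and dispositions; in the LGR XML the generated labels would carry the blocked disposition. The names RA_VARIANTS and ra_variant_labels are hypothetical.

```python
from itertools import product

# Hypothetical one-to-one variant pair: U+09B0 BENGALI LETTER RA (র)
# and U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ).
RA_VARIANTS = {"\u09B0": "\u09F0", "\u09F0": "\u09B0"}


def ra_variant_labels(label):
    """Enumerate labels obtained by swapping র <-> ৰ at any non-empty subset of positions."""
    # Each position contributes its original character, plus its variant if it has one.
    options = [(ch, RA_VARIANTS[ch]) if ch in RA_VARIANTS else (ch,) for ch in label]
    labels = {"".join(combo) for combo in product(*options)}
    labels.discard(label)  # the applied-for label is not its own variant
    return labels
```

Under this assumption, ররর yields seven variant labels, including ৰৰৰ and mixed forms such as রৰর, whereas a shorter or longer label such as ৰৰ or ৰৰৰৰ is never among them, matching the point that blocking applies only at the label level.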

6.2 Cross-Script Variants

A detailed cross-script study for Bangla has been carried out with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual and technical confusion they may cause as domain labels. Apart from the two cases in Section 6.1, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as cross-script variants. Although certain other characters are somewhat similar, they have not been included here; they are listed for reference in the Appendix (10.3).

¹¹ Unicode uses Oriya for the script, although Odia is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
◌ি U+09BF | ◌ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
◌ি U+09BF | ◌ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
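Because a whole label can only collide with a label in another script when every code point in it has a cross-script counterpart, Tables 14 and 15 affect only labels built entirely from ম and ি (such as মি). The Python sketch below illustrates that check under the assumption of a simple per-code-point substitution. It does not apply the WLE rules of either script, and CROSS_SCRIPT and cross_script_variant are hypothetical names.

```python
# Hypothetical encoding of the cross-script variant pairs in Tables 14 and 15.
CROSS_SCRIPT = {
    "Devanagari": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Gurmukhi": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},    # ম -> ਸ, ি -> ਿ
}


def cross_script_variant(label, script):
    """Return the label's variant in `script`, or None if some code point has no counterpart."""
    mapping = CROSS_SCRIPT[script]
    if not label or not all(ch in mapping for ch in label):
        return None
    return "".join(mapping[ch] for ch in label)
```

For example, মি maps to मि in Devanāgarī and to ਸਿ in Gurmukhī, while any label containing another Bangla code point has no whole-label cross-script variant under these tables.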

7. Whole Label Evaluation Rules (WLE)

This section provides the WLE rules required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can easily be translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as given in the table in Code point repertoire (Section 5.1):

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9) or (ae) Ya-phalā (V1 H C1 M1), where V1 is either U+0985 (অ, BENGALI LETTER A) or U+098F (এ, BENGALI LETTER E), H is U+09CD (◌্, BENGALI SIGN VIRAMA), C1 is U+09AF (য, BENGALI LETTER YA) and M1 is U+09BE (◌া, BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where C2 is either U+09B0 (র, BENGALI LETTER RA) or U+09F0 (ৰ, ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is U+09CD (◌্, BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is worth mentioning here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters”, which can theoretically conjoin two to four consonants and create new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”, of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the various restrictions, the specific WLE rules that need to be implemented are as follows:

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য় (U+09A1 U+09BC, U+09A2 U+09BC and U+09AF U+09BC).
2. H must be preceded by C.
3. M must be preceded by C. Example: কা
4. D must be preceded by either V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by either V, C, M or D. Example: উঃ, খঃ, বঃ, …াঃ, দঁঃ
6. B must be preceded by either V, C, M or D. Example: আং, ইং, কং
7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, …াৎ, র্ৎ (S is not listed because S ends with M; Z may therefore also follow S.)
8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.
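To make the nine rules concrete, the following Python sketch tokenizes a label into the categories of Table 16 and checks the "must be preceded by" and "cannot be preceded by" conditions. It is an illustration under stated assumptions, not the normative LGR: the category sets are a hypothetical subset built from Unicode code point ranges rather than the Section 5.1 repertoire, S is limited to the ya-phalā sequences অ্যা and এ্যা (S1 and S2 from Table 9 are omitted), and the names tokenize and wle_violations are invented for this example. Because U+09DC, U+09DD and U+09DF are Unicode composition exclusions, NFC leaves ড়, ঢ় and য় as base + nukta (U+09BC), which is how rule 1's CN is handled here.

```python
import unicodedata


def _chars(*ranges):
    """Expand inclusive code point ranges into a set of characters."""
    return {chr(cp) for lo, hi in ranges for cp in range(lo, hi + 1)}


# Hypothetical, partial category sets (assumption; see Section 5.1 for the real repertoire).
V = _chars((0x0985, 0x098B), (0x098F, 0x0990), (0x0993, 0x0994))
C = _chars((0x0995, 0x09A8), (0x09AA, 0x09B0), (0x09B2, 0x09B2),
           (0x09B6, 0x09B9), (0x09F0, 0x09F1))
M = _chars((0x09BE, 0x09C4), (0x09C7, 0x09C8), (0x09CB, 0x09CC))
SIGNS = {"\u0981": "D", "\u0982": "B", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}   # ড, ঢ, য -> CN: ড়, ঢ়, য়
RA = {"\u09B0", "\u09F0"}                      # র, ৰ (C2 in symbol P)
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE",     # অ্যা
               "\u098F\u09CD\u09AF\u09BE"}     # এ্যা

# Rules 2-7: categories allowed immediately before each category.
ALLOWED_PREV = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P"},
}


def tokenize(label):
    """Split an NFC-normalized label into (category, text) units."""
    label = unicodedata.normalize("NFC", label)
    units, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQUENCES:
            units.append(("S", label[i:i + 4]))
            i += 4
            continue
        ch = label[i]
        if ch in NUKTA_BASES and label[i + 1:i + 2] == NUKTA:
            units.append(("C", label[i:i + 2]))  # rule 1: CN counts as C
            i += 2
            continue
        if ch in C:
            units.append(("C", ch))
        elif ch in V:
            units.append(("V", ch))
        elif ch in M:
            units.append(("M", ch))
        elif ch in SIGNS:
            units.append((SIGNS[ch], ch))
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        i += 1
    return units


def wle_violations(label):
    """Return rule violations as strings; an empty list means the label passes."""
    units = tokenize(label)
    problems = []
    for idx, (cat, text) in enumerate(units):
        prev = set()
        if idx > 0:
            pcat = units[idx - 1][0]
            prev.add("M" if pcat == "S" else pcat)   # S ends with a kāra (M1)
            if pcat == "H" and idx > 1 and units[idx - 2][1] in RA:
                prev.add("P")                        # র্ or ৰ্ forms P
        if cat in ALLOWED_PREV and not prev & ALLOWED_PREV[cat]:
            problems.append(f"{cat} ({text}) at unit {idx} has no allowed predecessor")
        if cat in ("V", "S") and "H" in prev:
            problems.append(f"{cat} ({text}) at unit {idx} follows H (rules 8-9)")
    return problems
```

Applied to the "Bank of India" label discussed in 7.1.1, the sketch reports V following H, while the same label without the virama after ফ passes. Rules 2 to 7 also already exclude doubled B, D or X and the B + X and X + B combinations listed in 7.2, because none of these signs appears in the others' allowed-predecessor sets.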

Now let us elaborate on each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in allowing such ligatures: any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋk ʌv ɪndiə; U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning "Bank of India". Here two different words are joined together: the first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to fully represent the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered enough to represent the intended sound of the first word.

This is a unique situation necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting it may create a perceived similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help one understand some of the rules the ABNF puts in place. They are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक

As can be seen, such combinations automatically result in a “golu”, or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. The number of B, D or X permitted after a consonant, vowel or kāra (mātrā) is restricted to one; thus the following combinations are invalid:

Example:

কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example: কীী कीी

5. M is not permitted after V. Example: ইা, ঈৗ ईा, ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः
কঃং कःं

8. Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore, Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka

9. References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran O Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2011. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29.2: 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1.1: 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1.2: 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1.2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in Terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0, Core Specification, Chapter 12, p. 473.

10. Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the structures the formalism allows do not necessarily occur. Restriction rules are put in place to take care of such cases and to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C + H can combine with Khanda Ta only where C is ra (র, U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following V H C M combinations will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
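Three of these restrictions are stated as plain rules and can be checked mechanically: no label-initial ৎ, ঞ or ঙ; a virama before ৎ only after র; and vowel + virama only inside অ্যা or এ্যা. The Python sketch below checks just those three, as an illustration. It applies rule 1 as stated and does not model the য় or ৎস exceptions discussed in the text; the constant and function names (such as bangla_restrictions) are hypothetical.

```python
# Hypothetical checker for three Bangla-language restrictions from Section 10.1.
VIRAMA = "\u09CD"
KHANDA_TA = "\u09CE"
RA = "\u09B0"
FORBIDDEN_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}   # ৎ, ঞ, ঙ
VOWELS = {chr(cp) for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা


def bangla_restrictions(label):
    """Return a list of restriction violations; an empty list means none were found."""
    problems = []
    # Rule 1: certain letters may not begin a label.
    if label and label[0] in FORBIDDEN_INITIAL:
        problems.append(f"label may not begin with U+{ord(label[0]):04X}")
    for i, ch in enumerate(label):
        # Rule 2: consonant + virama + khanda ta only when the consonant is র.
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == VIRAMA and label[i - 2] != RA:
            problems.append(f"position {i - 2}: only \u09B0 may take a virama before \u09CE")
        # Rule 3: a vowel followed by a virama is allowed only inside অ্যা or এ্যা.
        if ch == VIRAMA and i >= 1 and label[i - 1] in VOWELS and label[i - 1:i + 3] not in ALLOWED_VHCM:
            problems.append(f"position {i - 1}: vowel + virama is not \u0985\u09CD\u09AF\u09BE or \u098F\u09CD\u09AF\u09BE")
    return problems
```

With this sketch, ভর্ৎসনা and অ্যাসিড pass, while ৎস, ঙক and ক্ৎ are flagged.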

10.2 ‘Sylheti Nagarī Lipi’ or ‘Siloti’

This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924 code Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. Sylheti Nagarī or Siloti [129] was known by several other names, such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some suggest that it was an invention of the Afghan rulers of Sylhet [137]. Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17: The Script Table of Sylheti Nagarī or Siloti

¹³ This section specifically takes up restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri are not covered in this section.

10.3 Confusable Code Points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhī

Bangla | Gurmukhī | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhī confusable code points

Bangla | Gurmukhī | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11. Appendix-II: Bengali Consonants and Their Allographs

Each cluster is given by its components, e.g. (প+ত) = প্ত; “?” marks an illegible component.

Consonant | Phonetic value | Allographs: clusters | Allographs: transparent form (Bangla Akademi font)
প | p | (প+ত) (প+ন) (প+প) (প+য) (প+র) (প+ল) (প+স) (?+প) (?+প) |
ফ | pʰ | (ফ+র) (ফ+ল) (?+ফ) |
ব | b | (?+জ) (?+দ) (?+ধ) (ব+ব) (ব+য) (?+র) (?+ল) (?+ভ) (?+ব) (?+ব) | (ব+ধ) (হ+ব)
ভ | bʱ | (ভ+য) (?+র) (?+ল) |
ত | t | (ত+ত) (ত+ত+য) (ত+ত+ব) (ত+থ) (ত+ন) (ত+য) (ত+ম) (ত+?+য) (ত+ব) (ত+র) (প+ত) (ক+ত) (ক+ত+ব) (?+ত) (?+ত+?+য) (?+ত+র). There is a marked form of ত + ◌্, namely ৎ (also with repha: র্ৎ). | (ক+ত)
থ | tʰ | (থ+য) (থ+র) (?+থ) (ত+থ) (?+থ) | (ন+থ) (?+থ)
দ | d | (দ+গ) (দ+ঘ) (দ+দ) (দ+ধ) (দ+য) (দ+ব) (দ+ভ) (দ+র) (?+দ) (?+দ) (?+দ+র) (?+দ+র) | (দ+গ) (দ+ধ)
ধ | dʱ | (ধ+ন) (ধ+ম) (ধ+য) (ধ+র) (গ+ধ) (দ+ধ) (?+ধ) (?+ধ) | (?+ধ) (দ+ধ) (ব+ধ) (ন+ধ)
ট | ʈ | (ট+ট) (ট+য) (ট+ব) (ট+র) (ক+ট) (ষ+ট) |
ঠ | ʈʰ | (ঠ+য) (ণ+ঠ) (ষ+ঠ) |
ড | ɖ | (ড+ড) (ড+য) (ড+র) |
ঢ | ɖʱ | (ঢ+য) (ণ+ঢ) |
চ | tʃ | (চ+চ) (চ+ছ) (চ+ছ+র) (চ+ঞ) (চ+য) (ঞ+চ) (শ+চ) | (ঞ+চ)
ছ | tʃʰ | (ছ+র) (চ+ছ) (ঞ+ছ) (শ+ছ) | (ঞ+ছ)
জ | dʒ | (জ+জ) (জ+জ+ব) (জ+ঝ) (জ+ঞ) (জ+য) (জ+র) (ঞ+জ) | (ঞ+জ)
ঝ | dʒʱ | Not privileged enough to have clusters as a first member: (জ+ঝ) (ঞ+ঝ) |
ক | k | (ক+ক) (ক+ট) (ক+ত) (ক+ত+র) (ক+ত+ব) (ক+ন) (ক+ব) (ক+ম) (ক+য) (ক+র) (ক+ষ) (ক+ষ+ণ) (ক+ষ+ম) (ক+ষ+ব) (ক+ষ+য) (ক+স) (ঙ+ক) (?+ক+র) | (ক+ত) (ক+ত+র) (ক+ত+ব) (ক+র) (ঙ+ক) (?+ক+র)
খ | kʰ | Not privileged enough to have clusters as a first member: (ঙ+খ) |
গ | g | (গ+গ) (গ+দ) (গ+ধ) (গ+ন) (গ+ব) (গ+ম) (গ+য) (গ+র) (গ+ল) (ঙ+গ) (?+ঙ+গ) | (?+ধ) (ঙ+গ) (?+ঙ+গ)
ঘ | gʱ | (ঘ+ন) (ঘ+য) (ঘ+র) (ঙ+ঘ) |
ঞ | No particular phonetic value; mostly pronounced as n | (ঞ+চ) (ঞ+ছ) (ঞ+জ) (ঞ+ঝ) (জ+ঞ) | (ঞ+চ) (ঞ+ছ) (ঞ+জ) (ঞ+ঝ)
ণ | n | (ণ+ট) (ণ+ঠ) (ণ+ড) (ণ+ড+র) (ণ+ঢ) (ণ+ণ) (ণ+য) (ণ+ব) (ক+ষ+ণ) (ষ+ণ) (?+ণ) | (ণ+ড) (ণ+ড+র) (ষ+ণ)
ঙ, ং | ŋ | (ঙ+ক) (ঙ+ক+র) (ঙ+খ) (ঙ+গ) (ঙ+ঘ) (ঙ+ক+ষ). In some contexts ঙ্ is replaced by ং (e.g. কং, অং). | (ঙ+ক) (ঙ+গ) (ঙ+ঘ)
ম | m | (?+ল) (?+প) (?+প+র) (?+ভ) (?+?+র) (?+ম) (?+র) (ত+ম) (ধ+ম) (?+ম) (ক+ষ+ম) | (হ+ম)
ন | n | (?+ট) (?+ট+র) (ণ+ঠ) (?+ড) (?+ড+র) (?+ত) (?+ত+র) (?+ত+?+য) (?+থ) (?+দ) (?+দ+র) (?+ধ) (?+ধ+র) (?+দ+ব) (?+ন) (?+ম) (ন+য) (?+স) (?+ন) | (ন+থ) (ন+ধ) (ন+?+র)
শ | ʃ | (শ+চ) (শ+ছ) (শ+ন) (শ+ম) (শ+র) (শ+ল) (শ+য) |
ষ | ʃ | (ষ+ক) (ষ+ট) (ষ+ঠ) (ষ+ণ) (ষ+প) (ষ+প+র) (ষ+ফ) (ষ+ট+র) (ষ+য) (ক+ষ) (ক+ষ+ণ) (ক+ষ+ম) | (ষ+ণ)
স | s & ʃ | (?+ক) (?+ট) (?+প) (?+ফ) (?+ত) (?+থ) (?+ট) (?+ক) (?+খ) (স+য) (?+র) (?+ল) (ক+স) | (?+থ)
হ | h | (?+ণ) (?+ন) (?+ম) (হ+য) (?+র) (?+ল) | (হ+ম)
ড় | ɽ | (ড়+গ) |
ঢ় | ɽʱ | Not privileged enough to have clusters |
য | dʒ. The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার etc.), but after a non-initial consonant it merely doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য. | (ক+য) (স+য) (র+য). র‍্য alone is never used in Bangla orthography; র‍্যা is, but then its last two symbols (ya-phalā and ā-kāra) constitute a vowel sign representing the vowel অ্যা. |
র | r | Two manifestations: (i) রেফ repʰa, as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র+ধ+ব) (earlier র্দ্ধ্ব = র+দ+ধ+ব, a four-term cluster) etc., placed over the following consonant; (ii) র-ফলা rɔ-pʰɔla, as the second or third member of a cluster, placed under the consonant it follows. |
ল | l | (?+গ) (?+প) (?+ব) (?+ম) (?+ট) (?+ড) (?+ক) (?+দ) (ল+য) (গ+ল) (?+ল) (?+ল) |
ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | Examples: অঃ, কঃ |



61 InScriptVariantsHoweverweproposetwocasesoftruein-scriptvariantsinBanglascriptCASEIAs far as true variants in Bangla are concernedwemay drawour attention to caseswhereinHasantawith(U+09A5)থ(tha)appearsasconjunctwith(U+09B8)স(sa)and(U+09A8)ন(na)1 স+Hasanta+থ(U+09B8+U+09CD+U+09A5)versus

স +Hasanta+হ(U+09B8+U+09CD+U+09B9)

2 ন+Hasanta+থ(U+09A8+U+09CD+U+09A5)versus

ন+Hasanta+হ(U+09A8+U+09CD+U+09B9)

44

Theabovecombinationsifwrittenintraditionalorthographycouldbelittleconfusingwheretheথ(tha)inconjunctappearslikeaহ(ha)Theconjunctcouldbeintheinitialmedialorfinalpositions(asshownbelowinegno1)Itcouldbetypedwrongaswellthinking itwas aহ (ha)U+09B9 increasing the chances of risks in labelwriting andidentificationExamples1 acuteandসহ(asinacuteানsthānaacuteলsthulaাacuteGsvāsthyaঅacuteায়ীasthāyı)

2 andনহ(asinparagrantha)The fontswhichrepresent traditionalBanglawritingsystemcould tend tocreate thisproblemThereforethesemaybetakenascasesofvariantsinBanglaCASEIIAnotherinterestingexampleofvariantisencounteredinra+HasantaandHasanta+racombinationsinwritinglabelsintheBanglascript(forlanguagessuchasBanglaAssameseandManipuri)Thevariantcasesariseintypinglsquorepharsquo(involvingra+Hasanta)andlsquora-phalārsquo(involvingHasanta+ra)lsquoRepharsquocouldbeformedbytwosequences(mainlybecausebothAssameseandBanglafindplaceinthesameUNICODEpointsandlsquoB_RArsquoaswellaslsquoA_RArsquorefertothesamephoneticelement)Herethefinalligatureslookthesameandwillbeasfollows

(1) B_RA+H+C(2) A_RA+H+C

Where

B_RA = U+09B0BENGALILETTERRA(র)orA_RA = U+09F0BENGALILETTERRAWITHMIDDLEDIAGONAL(ৰ)H = U+09CDBENGALISIGNVIRAMA()C = anyconsonant(theoretically)

Example

Sequence1(UsingBanglaRA)

Ligature1

Sequence2(UsingAssameseRA)

Ligature2

U+09B0(র)U+09CD()U+0995(ক) কH U+09F0(ৰ)U+09CD()U+0995(ক) কH

U+09B0(র)U+09CD()U+09A0(ঠ) ঠH U+09F0(ৰ)U+09CD()U+09A0(ঠ) ঠH

Table12ExampleofRepha

45

NoteAsBanglaandAssameseকandঠlookexactlythesametheresultantcombinationswithRephalookidenticalAdditionofRephadoesnotmakeanydifferencelsquoRa-phalarsquocouldbeformedbytwosequencesonsimilargroundsandthefinalligatureswouldlookthesame

(1) C1+H+B_RA(2) C1+H+A_RA

WhereC1 = anyconsonantsexceptKhanda-ta

Example

Sequence1(UsingBanglaRA)

Ligature1

Sequence2(UsingAssameseRA)

Ligature2

U+0995(ক)U+09CD()U+09B0(র) U+0995(ক)U+09CD()U+09F0(ৰ)

U+09A8(ন)U+09CD()U+09B0(র) ) U+09A8(ন)U+09CD()U+09F0(ৰ) )

Table13ExampleofRa-phala

AstheAssameseandBanglaRephaandRa-phalaconjunctformslookthesamethiscouldcauseconfusabilitytotheend-usersHencetherephaandra-phalacasesneedtobedefinedasvariantsNBGPconcludedtodefineরandৰasvariantcodepointswhereonlyonevariantsetbetweenরandৰcouldcoverallcasesButthiswillcreateblockedvariantlabelsegif

someoneregistersldquoরররrdquothevariantlabelldquoৰৰৰrdquowillbegeneratedasvariantandwillbeblockedandviceversaHoweveritisonlyblockedatthelabellevelifsomeoneelseneedstoregisterotherlabelsegৰৰorৰৰৰৰitisstillpossible

62 CrossScriptVariantsAcrispcrossscriptstudyforBanglahasbeendonewithrespecttosisterscriptssuchasDevanāgarī Gurmukhı and Odia11(formerly Oriya) keeping in mind the visual andtechnicalconfusionstheymaycauseaslabelsonthewebdomainMoreoverthereisnoin-script variant in Bangla as far as the orthography is concerned The followingcharacters are being proposed by the NBPG as variants Although there are certaincharacters which are somewhat similar they but have not been included here TheyhavebeenprovidedintheAppendix(102)forreference

11 Unicode uses Oriya for the script although Odia is now the official term used

46

1 BanglaandNāgarīDevanāgarīScript

Bangla Devanāgarī

মU+09AE

मU+092E

িU+09BF

िU+093F

Table14-BanglaandDevanāgarīcross-scriptvariantcodepoint

2 BanglaandGurmukhiScript

Bangla Gurmukhı

মU+09AE

ਸU+0A38

িU+09BF

ਿU+0A3F

Table15-BanglaandGurmukhıcross-scriptvariantcodepoint

7 WholeLabelEvaluationRules(WLE) ThissectionprovidestheWLEsthatarerequiredbyallthelanguagesmentionedinsection32whenwritten inBangla12ScriptThe ruleshavebeendrafted in suchawaythattheycanbeeasilytranslatedintotheLGRspecificationsBelow are the symbols used in the WLE rules for each of the Indic SyllabicCategoryasmentionedinthetableprovidedinCodepointrepertoire(Section51)

C rarr Consonant

M rarr Kāra(Mātrā)

V rarr Vowel

B rarr Anusvāra

12 As used by the Unicode denoting and including both Assamese and Maṇipuri

47

D rarr Candrabindu

X rarr Visarga

H rarr Hasanta

Z rarr KhandaTa

S rarr S1S2(fromTable9)or(ae)Ya-phalā(V1HC1M1)whereV1iseither0985(অ-BENGALILETTERA)or098F(এ-BENGALILETTERE)His09CD(-BENGALISIGNVIRAMA)C1is-09AF(য-BENGALILETTERYA)M1is-09BE(া-BENGALIVOWELSIGNAA)S1 and S2 are valid even they are not allowed by the other context rules

P rarr Ra-Hasanta(C2H)whereC2iseither09B0(র-BENGALILETTERRA)or 09F0 (ৰ - ASSAMESE LETTER RA

Unicode name BENGALI LETTER RAWITHMIDDLEDIAGONAL)

His09CD(-BENGALISIGNVIRAMA)

Table16-SymbolsusedinWLErules

ItisalsoperhapsidealtomentionherethatinBanglatheconsonantletters(orgraphemes)arephysicallyjoinedtoformldquoclustersrdquothatcouldtheoreticallyconjoinfromtwotofourconsonantsandcombinetocreatenewshapesDashandChaudhuri(1998)statethatthereareldquonearly380uniqueconsonantclustersrdquooutofwhichBi-consonantalcombinationsare290three-lettercombinationsaccountforanother80andtherareroneswithfourlettersnumber10more[136Pg4]MoredetailsofsuchcombinationscouldbeseeninPabitraSarkar(1993)[135] 71FinalSetofWLERulesTheprevalentpatternsinBanglaandvariousrestrictionsbelowarethespecificWLErulesthatneedtobeimplemented

48

1 CisasetofCandCNwhereCNisthesetofnormalizedformsofড়ঢ়য়2 HmustbeprecededbyC

Example

3 MmustbeprecededbyCExampleকা

4 DmustbeprecededbyeitherofVCorMExampleআখখা হা

5 XmustbeprecededbyeitherofVCMorDExampleউঃখঃবঃাঃ দ ঃ

6 BmustbeprecededbyeitherofVCMorDExampleআংইংকং

7 ZmustbeprecededbyVCMDBXorPExampleইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M Z may also follow S)rdquo

8 VCANNOTbeprecededbyHDetailsin711CaseofVprecededbyH

9 SCANNOTbeprecededbyH

Now letuselaborateeach rulewithexamples from the scriptkeeping inmindthe Bangla Assamese and Manipuri communities Some combinations ofcharactersmayseemunrealisticorrareinusagebutthereisnoharminaddingsuch ligaturesbecause it ispossible tocreate thembyanyusereasilybutmaynotbeattestedcombinationsCase of V Preceded by H There could be cases involvingmulti-word domains where Vmay need to beallowedtofollowanHegব8াsঅtইিuয়া baeligŋkʌv ɪndiə (U+09ACU+09CDU+09AFU+09BEU+0999U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1U+09BFU+09DFU+09BE)(meaningBankofIndia)ThisisthecasewheretwodifferentwordsarejoinedtogetherfirstofwhichendswithanH(অt)andthesecondwordbeginswithaV(ইিuয়া)Somesectionsofthe linguistic community require the explicit presence of H for fullrepresentation of the sound intended However by and large the form of thefirstwordwithoutanH(U+09CD)isconsideredenoughforfullrepresentationofthesoundintendedforthefirstword

49

This isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule

72 AdditionalExamplesfromBanglaABNF

Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive

1HMBDorXcannotoccurinthebeginningofaBanglawordExample

ক क

াক ाक

ংক क

ক क

ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of theIndianlanguagesyllableandisquasi-automaticallyappliedwhereversupportedbytheOS

2HisnotpermittedafterVBDXMS

Example

অ अ

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a consonant, a vowel or a kāra (Mātrā) is restricted to one. The following combinations are therefore invalid:

Example:

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী की

5. M is not permitted after V. Example:

ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः

কঃং कःं
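Examples 1 to 6 above all describe character sequences that must never occur, so they can be expressed as regular expressions. The following Python sketch does this under stated assumptions. The category classes approximate the Unicode Bengali block, and S is limited to অ্যা and এ্যা. The dictionary keys and the function name are hypothetical. The sketch only illustrates the examples. The authoritative form is the set of WLE rules in Section 7.1 as encoded in the LGR.

```python
import re

# Assumed category classes (Unicode Bengali block), used only for this sketch.
H = "\u09CD"  # hasanta
B = "\u0982"  # anusvara
D = "\u0981"  # candrabindu
X = "\u0983"  # visarga
V = "[\u0985-\u098C\u098F\u0990\u0993\u0994\u09E0\u09E1]"        # independent vowels
M = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC\u09D7\u09E2\u09E3]"  # vowel signs (kāra)
S = "[\u0985\u098F]\u09CD\u09AF\u09BE"                            # অ্যা / এ্যা

# Each example rule of 7.2 written as a pattern that must NOT occur.
# Rule 2 uses a negative lookahead so the permitted S sequence is not flagged.
FORBIDDEN = {
    "1: H, M, B, D or X at the start": re.compile(f"^(?:{H}|{M}|{B}|{D}|{X})"),
    "2: H after V, B, D, X or M": re.compile(f"(?:{B}|{D}|{X}|{M}){H}|(?!{S}){V}{H}"),
    "3: B, D or X repeated": re.compile(f"([{B}{D}{X}])\\1"),
    "4: M repeated": re.compile(f"{M}{M}"),
    "5: M after V": re.compile(f"{V}{M}"),
    "6: anusvara and visarga together": re.compile(f"{B}{X}|{X}{B}"),
}


def violations(label: str) -> list[str]:
    """Names of the 7.2 patterns found in the label (empty list if none)."""
    return [name for name, pattern in FORBIDDEN.items() if pattern.search(label)]


if __name__ == "__main__":
    for word in ["\u0995\u0982\u0982", "\u0995\u0983\u0982", "\u0995\u09BE\u0982"]:
        print(word, violations(word))
```

For example, কংং is reported only under rule 3 and কঃং only under rule 6, while a well-formed sequence such as কাং matches nothing. অ্যাসিড also passes, because the lookahead exempts the permitted S sequence.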

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services (2003 reprint).

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M. S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan, 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A phonetic and phonological study of nasals and nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0, Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply. In other words, some of the Formalism structures do not necessarily apply in a given language. Restriction rules are set in place to handle such cases, and they help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa Ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. There are also instances of য় being used in names, mostly of foreign origin. For example, Yaqub may be written with ya (য়) at the beginning, as in য়াকুব. More recently, when some Chinese and Japanese names are transliterated into Bangla, Khaṇḍa Ta (ৎ) followed by sa (স) can appear at the beginning of a word, for example ৎসেরিং (Tsering).

2. C H can combine with Khaṇḍa Ta only where C is ra (র, U+09B0), i.e. র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
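The three restrictions above can be sketched in Python as follows. This is an illustration under stated assumptions and not the NBGP's implementation. The consonant, vowel and vowel-sign sets are taken from the Unicode Bengali block. Word-initial য় is not coded, because the text above records attested uses of it in names. `appendix_i_errors` is a hypothetical helper.

```python
# Illustrative sketch of the Appendix I (10.1) restrictions for Bangla.
# All sets below are assumptions based on the Unicode Bengali block.
HASANTA, KHANDA_TA, RA = "\u09CD", "\u09CE", "\u09B0"

# Rule 1: characters that may not begin a label (ৎ, ঞ, ঙ).
NO_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}

VOWELS = {chr(c) for c in [*range(0x0985, 0x098D), 0x098F, 0x0990,
                           0x0993, 0x0994, 0x09E0, 0x09E1]}
UNASSIGNED = {0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5}
CONSONANTS = {chr(c) for c in range(0x0995, 0x09BA) if c not in UNASSIGNED}
MATRAS = {chr(c) for c in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8,
                           0x09CB, 0x09CC, 0x09D7, 0x09E2, 0x09E3]}

# Rule 3: the only V + H + C + M sequences allowed (অ্যা and এ্যা).
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}


def appendix_i_errors(label: str) -> list[str]:
    """Return a message for each Appendix I restriction the label breaks."""
    errors = []
    if label and label[0] in NO_INITIAL:
        errors.append("rule 1: forbidden initial character")
    for i, ch in enumerate(label):
        # Rule 2: consonant + hasanta + Khanda Ta only when the consonant is RA.
        if (ch == KHANDA_TA and i >= 2 and label[i - 1] == HASANTA
                and label[i - 2] in CONSONANTS and label[i - 2] != RA):
            errors.append(f"rule 2: C+H+Khanda Ta with C other than RA at {i}")
        # Rule 3: any V + H + C + M window must be one of the allowed sequences.
        window = label[i:i + 4]
        if (len(window) == 4 and window[0] in VOWELS and window[1] == HASANTA
                and window[2] in CONSONANTS and window[3] in MATRAS
                and window not in ALLOWED_VHCM):
            errors.append(f"rule 3: V+H+C+M not allowed at {i}")
    return errors


if __name__ == "__main__":
    for word in ["\u09CE\u09B8",                               # ৎস
                 "\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE",   # ভর্ৎসনা
                 "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1"]:  # অ্যাসিড
        print(word, appendix_i_errors(word))
```

In this sketch ভর্ৎসনা and অ্যাসিড pass, while a label beginning with ৎস is rejected by rule 1. The transliterated names mentioned under rule 1 would therefore be blocked.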

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’

This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924). Kaithī was used by accountants (perhaps of the Kāyastha community) in Eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. Sylheti Nagarī or Siloti has had several other names (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. Shah Jalal is said to have brought the script with him to Sylhet in the 13th-14th century (138). Others have suggested that it was invented by the Afghan rulers of Sylhet (137), and some credit Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script with 32 symbols are reproduced here (138).

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17: The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points

The following code points were analysed. Each was found to be either (a) distinguishable or (b) confusable, but not enough to be defined as a variant code point.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision

ঃ U+0983 | ः U+0903 | Confusable

ও U+0993 | उ U+0909 | Confusable

ঘ U+0998 | घ U+0918 | Confusable

ঁ U+0981 | ॅ U+0945 | Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision

ঘ U+0998 | ਬ U+0A2C | Confusable

ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision

ও U+0993 | ਤ U+0A24 | Distinguishable

শ U+09B6 | ਅ U+0A05 | Distinguishable

ম U+09AE | ਮ U+0A2E | Distinguishable

বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20: Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision

ও U+0993 | ଓ U+0B13 | Confusable

Table 21: Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision

ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22: Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali consonants and their allographs

Consonant | Phonetic value | Allographs (clusters) | Transparent form (Bangla Akademi font)

প | p | (প+ত) (প+ন) (প+প) (প+য) (প+র) (প+ল) (প+স) (+প) (+প) | -

ফ | pʰ | (ফ+র) (ফ+ল) (+ফ) | -

ব | b | (+জ) (+দ) (+ধ) (+ব) (ব+য) (+র) (+ল) (+ভ) (+ব) (+ব) | (ব+ধ) (হ+ব)

ভ | bʱ | (ভ+য) (+র) (+ল) | -

ত | t | (ত+ত) (ত+ত+য) (ত+ত+ব) (ত+থ) (ত+ন) (ত+য) (ত+ম) (ত++য) (ত+ব) (ত+র) (প+ত) (ক+ত) (ক+ত+ব) (+ত) (+ত++য) (+ত+র); there is a marked form of ত + ্, namely ৎ, which also takes repha as র্ৎ | (ক+ত)

থ | tʰ | (থ+য) (থ+র) (+থ) (ত+থ) (+থ) | (ন+থ) (+থ)

দ | d | (দ+গ) (দ+ঘ) (দ+দ) (দ+ধ) (দ+য) (দ+ব) (দ+ভ) (দ+র) (+দ) (+দ) (+দ+র) (+দ+র) | (দ+গ) (দ+ধ)

ধ | dʱ | (ধ+ন) (ধ+ম) (ধ+য) (ধ+র) (গ+ধ) (দ+ধ) (+ধ) (+ধ) | (+ধ) (দ+ধ) (ব+ধ) (ন+ধ)

ট | ʈ | (ট+ট) (ট+য) (ট+ব) (ট+র) (ক+ট) (ষ+ট) | -

ঠ | ʈʰ | (ঠ+য) (ণ+ঠ) (ষ+ঠ) | -

ড | ɖ | (ড+ড) (ড+য) (ড+র) | -

ঢ | ɖʱ | (ঢ+য) (ণ+ঢ) | -

চ | tʃ | (চ+চ) (চ+ছ) (চ+ছ+র) (চ+ঞ) (চ+য) (ঞ+চ) (শ+চ) | (ঞ+চ)

ছ | tʃʰ | (ছ+র) (চ+ছ) (ঞ+ছ) (শ+ছ) | (ঞ+ছ)

জ | dʒ | (জ+জ) (জ+জ+ব) (জ+ঝ) (জ+ঞ) (জ+য) (জ+র) (ঞ+জ) | (ঞ+জ)

ঝ | dʒʱ | Not privileged enough to have clusters as a first member; (জ+ঝ) (ঞ+ঝ) | -

ক | k | (ক+ক) (ক+ট) (ক+ত) (ক+ত+র) (ক+ত+ব) (ক+ন) (ক+ব) (ক+ম) (ক+য) (ক+র) (ক+ষ) (ক+ষ+ণ) (ক+ষ+ম) (ক+ষ+ব) (ক+ষ+য) (ক+স) (ঙ+ক) (+ক+র) | (ক+ত) (ক+ত+র) (ক+ত+ব) (ক+র) (ঙ+ক) (+ক+র)

খ | kʰ | Not privileged enough to have clusters as a first member; (ঙ+খ) | -

গ | g | (গ+গ) (গ+দ) (গ+ধ) (গ+ন) (গ+ব) (গ+ম) (গ+য) (গ+র) (গ+ল) (ঙ+গ) (+ঙ+গ) | (+ধ) (ঙ+গ) (+ঙ+গ)

ঘ | gʱ | (ঘ+ন) (ঘ+য) (ঘ+র) (ঙ+ঘ) | -

ঞ | This letter does not have any particular phonetic value but is mostly pronounced as n | (ঞ+চ) (ঞ+ছ) (ঞ+জ) (ঞ+ঝ) (জ+ঞ) | (ঞ+চ) (ঞ+ছ) (ঞ+জ) (ঞ+ঝ)

ণ | n | (ণ+ট) (ণ+ঠ) (ণ+ড) (ণ+ড+র) (ণ+ঢ) (ণ+ণ) (ণ+য) (ণ+ব) (ক+ষ+ণ) (ষ+ণ) (+ণ) | (ণ+ড) (ণ+ড+র) (ষ+ণ)

ঙ, ং | ŋ | (ঙ+ক) (ঙ+ক+র) (ঙ+খ) (ঙ+গ) (ঙ+ঘ) (ঙ+ক+ষ); in some contexts ঙ is replaced by ং, as in কং, অং | (ঙ+ক) (ঙ+গ) (ঙ+ঘ)

ম | m | (+ল) (+প) (+প+র) (+ভ) (++র) (+ম) (+র) (ত+ম) (ধ+ম) (+ম) (ক+ষ+ম) | (হ+ম)

ন | n | (+ট) (+ট+র) (ণ+ঠ) (+ড) (+ড+র) (+ত) (+ত+র) (+ত++য) (+থ) (+দ) (+দ+র) (+ধ) (+ধ+র) (+দ+ব) (+ন) (+ম) (ন+য) (+স) (+ন) | (ন+থ) (ন+ধ) (ন++র)

শ | ʃ | (শ+চ) (শ+ছ) (শ+ন) (শ+ম) (শ+র) (শ+ল) (শ+য) | -

ষ | ʃ | (ষ+ক) (ষ+ট) (ষ+ঠ) (ষ+ণ) (ষ+প) (ষ+প+র) (ষ+ফ) (ষ+ট+র) (ষ+য) (ক+ষ) (ক+ষ+ণ) (ক+ষ+ম) | (ষ+ণ)

স | s, ʃ | (+ক) (+ট) (+প) (+ফ) (+ত) (+থ) (+ট) (+ক) (+খ) (স+য) (+র) (+ল) (ক+স) | (+থ)

হ | h | (+ণ) (+ন) (+ম) (হ+য) (+র) (+ল) | (হ+ম)

ড় | ɽ | (ড়+গ) | -

ঢ় | ɽʱ | Not privileged enough to have clusters | -

য | dʒ | The secondary symbol (allograph) jɔ-phala has two phonetic values. Added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র+য combination has two physical manifestations: র‍্য and র্য. Clusters: (ক+য) (স+য) (র+য). [র‍্য on its own is never used in Bangla orthography. র‍্যা is used, but there its last two symbols, ya-phala and a-kara, form a vowel sign representing the vowel অ্যা.] | -

র | r | Two manifestations: (i) রেফ (repʰa), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (+ধ+ব) (earlier র্দ্ধ্ব = +দ+ধ+ব, a four-term cluster), etc. (placed over the following consonant); (ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows) | -

ল | l | (+গ) (+প) (+ব) (+ম) (+ট) (+ড) (+ক) (+দ) (ল+য) (গ+ল) (+ল) (+ল) | -

ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | অঃ, কঃ | -

অব
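Each cluster in the Appendix II table is written as a '+'-separated formula of its member consonants. The Python sketch below assumes that a cluster is encoded simply as its members joined by hasanta (U+09CD). It shows how such a formula maps to the code point sequence that a font then renders as the conjunct. The sketch does not cover forms that need other characters (for example the ZWJ in র‍্য), and it rejects formulas whose first member was not recovered, such as (+প). The helper name is hypothetical.

```python
# Sketch: the Unicode sequence behind a cluster formula such as (ক+ষ+ম).
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA


def cluster_from_formula(formula: str) -> str:
    """Join the members of '(A+B+...)' with hasanta; reject formulas with gaps."""
    members = [m.strip() for m in formula.strip().strip("()").split("+")]
    if len(members) < 2 or not all(members):
        raise ValueError(f"incomplete cluster formula: {formula!r}")
    return HASANTA.join(members)


if __name__ == "__main__":
    for f in ["(প+ত)", "(ক+ষ+ম)", "(জ+জ+ব)"]:
        seq = cluster_from_formula(f)
        print(f, seq, [f"U+{ord(ch):04X}" for ch in seq])
```

Whether a given sequence appears as a stacked conjunct, a half form or a visible hasanta depends on the font. The encoded sequence stays the same in each case.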


In traditional orthography the above combinations can be a little confusing, because the থ (tha) in the conjunct looks like a হ (ha). The conjunct may occur in initial, medial or final position (as shown in example no. 1 below). A user may also mistype it as হ (ha, U+09B9), which increases the risks in label writing and identification. Examples:

1. স্থ and সহ (as in স্থান sthāna, স্থূল sthula, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)

2. ন্থ and নহ (as in গ্রন্থ grantha)

Fonts that represent the traditional Bangla writing system tend to create this problem. These may therefore be treated as cases of variants in Bangla.

CASE II: Another example of a variant arises from the ra + Hasanta and Hasanta + ra combinations when labels are written in the Bangla script (for languages such as Bangla, Assamese and Manipuri). The variants appear when typing ‘repha’ (ra + Hasanta) and ‘ra-phalā’ (Hasanta + ra). ‘Repha’ can be formed by two sequences. This is mainly because Assamese and Bangla share the same Unicode block, and ‘B_RA’ and ‘A_RA’ refer to the same phonetic element. The final ligatures look the same and are as follows:

(1) B_RA + H + C
(2) A_RA + H + C

where

B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (্)
C = any consonant (theoretically)

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+09B0 (র) U+09CD (্) U+0995 (ক) | র্ক | U+09F0 (ৰ) U+09CD (্) U+0995 (ক) | ৰ্ক

U+09B0 (র) U+09CD (্) U+09A0 (ঠ) | র্ঠ | U+09F0 (ৰ) U+09CD (্) U+09A0 (ঠ) | ৰ্ঠ

Table 12: Example of Repha


Note: Since Bangla and Assamese ক and ঠ look exactly the same, the resulting combinations with Repha look identical, and adding Repha makes no difference. For similar reasons, ‘Ra-phala’ can be formed by two sequences whose final ligatures look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda Ta

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2

U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ

U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13: Example of Ra-phala

The Assamese and Bangla Repha and Ra-phala conjunct forms look the same and could confuse end-users. The repha and ra-phala cases therefore need to be defined as variants. The NBGP decided to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This does create blocked variant labels. For example, if someone registers “ররর”, the label “ৰৰৰ” is generated as a variant and blocked, and vice versa. The blocking applies only at the label level: another registrant can still register other labels, e.g. ৰৰ or ৰৰৰৰ.
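The Python sketch below illustrates the blocking behaviour just described. It enumerates variant labels by swapping each র/ৰ independently, in the spirit of RFC 7940 variant label generation. It is a simplification under assumptions. It models only this one variant pair and treats every generated variant as blocked. It does not model the dispositions that the XML LGR would actually assign. The function names are hypothetical.

```python
from itertools import product

# Assumed variant mapping for this sketch: Bangla RA (U+09B0) and
# Assamese RA (U+09F0) are mutual variants, per the NBGP decision above.
RA_VARIANTS = {
    "\u09B0": ("\u09B0", "\u09F0"),
    "\u09F0": ("\u09F0", "\u09B0"),
}


def variant_labels(label: str) -> set[str]:
    """Every label obtained by independently swapping each RA (includes the label)."""
    choices = [RA_VARIANTS.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


def blocked_by(registered: str) -> set[str]:
    """Variant labels that would be blocked once `registered` is allocated."""
    return variant_labels(registered) - {registered}


if __name__ == "__main__":
    rrr = "\u09B0" * 3                      # ররর
    blocked = blocked_by(rrr)
    print(len(blocked))                     # 7 mixed or all-ৰ labels
    print("\u09F0" * 3 in blocked)          # True: ৰৰৰ is blocked
    print("\u09F0" * 2 in blocked)          # False: ৰৰ is a different label
```

Applied to a ra-phala label such as ক্র from Table 13, the sketch yields ক্ৰ as the only other variant. This matches the confusability argument above.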

6.2 Cross Script Variants

A crisp cross-script study for Bangla has been carried out against sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya). The study considered the visual and technical confusions these scripts may cause in domain name labels. There is no in-script variant in Bangla as far as the orthography is concerned. The NBGP proposes the following characters as variants. Certain other characters are somewhat similar but have not been included here; they are listed in the Appendix (10.3) for reference.

11 Unicode uses Oriya for the script, although Odia is now the official term.

1. Bangla and Nāgarī/Devanāgarī Script

Bangla | Devanāgarī

ম U+09AE | म U+092E

ি U+09BF | ि U+093F

Table 14: Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī

ম U+09AE | ਸ U+0A38

ি U+09BF | ਿ U+0A3F

Table 15: Bangla and Gurmukhī cross-script variant code points
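The Python sketch below gives a rough illustration of how Tables 14 and 15 could be applied. It checks whether a label in another script is a position-by-position variant of a Bangla label, using only the pairs listed in those two tables. This is a simplification under assumptions. The normative cross-script behaviour is defined by the XML LGR files, not by this code, and the names used here are hypothetical.

```python
# Cross-script variant pairs from Tables 14 and 15 (the only ones proposed).
CROSS_SCRIPT_VARIANTS = {
    "\u09AE": {"\u092E", "\u0A38"},   # Bangla ম ~ Devanagari म, Gurmukhi ਸ
    "\u09BF": {"\u093F", "\u0A3F"},   # Bangla ি ~ Devanagari ि, Gurmukhi ਿ
}


def is_cross_script_variant(bangla: str, other: str) -> bool:
    """True if every position of `other` is a listed variant of the Bangla code point."""
    if len(bangla) != len(other):
        return False
    return all(o in CROSS_SCRIPT_VARIANTS.get(b, set())
               for b, o in zip(bangla, other))


if __name__ == "__main__":
    print(is_cross_script_variant("\u09AE\u09BF", "\u092E\u093F"))  # মি vs मि
    print(is_cross_script_variant("\u09AE\u09BF", "\u0A38\u0A3F"))  # মি vs ਸਿ
    print(is_cross_script_variant("\u0995", "\u0915"))              # ক vs क: not listed
```

Only labels built entirely from the listed pairs are flagged. Look-alikes that the NBGP classified as merely confusable (Appendix 10.3) are deliberately left out.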

7 Whole Label Evaluation Rules (WLE)

This section provides the WLEs required by all the languages mentioned in Section 3.2 when they are written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table under Code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where:
V1 is either 0985 (অ, BENGALI LETTER A) or 098F (এ, BENGALI LETTER E)
H is 09CD (্, BENGALI SIGN VIRAMA)
C1 is 09AF (য, BENGALI LETTER YA)
M1 is 09BE (া, BENGALI VOWEL SIGN AA)
S1 and S2 are valid even if they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where:
C2 is either 09B0 (র, BENGALI LETTER RA) or 09F0 (ৰ, ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL)
H is 09CD (্, BENGALI SIGN VIRAMA)

Table 16: Symbols used in WLE rules

12 As used by the Unicode Standard, denoting and including both Assamese and Maṇipuri.

It is also worth noting here that in Bangla the consonant letters (or graphemes) are physically joined to form “clusters”. In theory a cluster can conjoin two to four consonants, combining them into new shapes. Dash and Chaudhuri (1998) state that there are “nearly 380 unique consonant clusters”. Of these, 290 are bi-consonantal combinations, another 80 are three-letter combinations, and the rarer four-letter ones number 10 more [136, p. 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Based on the prevalent patterns in Bangla and the restrictions described above, the specific WLE rules that need to be implemented are as follows.

1. C is a set of C and CN, where CN is the set of normalized forms of ড় ঢ় য়.

2. H must be preceded by C.

Example:

3. M must be preceded by C. Example: কা

4. D must be preceded by either of V, C or M. Example: আঁ খঁ খাঁ হাঁ

5. X must be preceded by either of V, C, M or D. Example: উঃ খঃ বঃাঃ দঁঃ

6. B must be preceded by either of V, C, M or D. Example: আং ইং কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1 Case of V Preceded by H.

9. S CANNOT be preceded by H.
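Rules 2 to 9 each constrain only the unit immediately before a symbol, so the rule set lends itself to a simple left-to-right check. The Python sketch below is illustrative and rests on several assumptions:

- the code point sets approximate the repertoire of Section 5.1;
- rule 1 is handled by NFC-normalizing the label and treating consonant plus nukta (U+09BC) as one C;
- S is modelled only as the Ya-phalā sequences অ্যা and এ্যা (S1 and S2 from Table 9 are not reproduced here);
- `tokenize` and `wle_errors` are hypothetical names.

The normative form of these rules is the one in the machine-readable LGR.

```python
import unicodedata

# Symbols from Table 16 (code point sets are assumptions for this sketch).
H, B, D, X, Z, NUKTA = "\u09CD", "\u0982", "\u0981", "\u0983", "\u09CE", "\u09BC"
RA_FORMS = {"\u09B0", "\u09F0"}                    # C2 in P = Ra-Hasanta
V = {chr(c) for c in [*range(0x0985, 0x098D), 0x098F, 0x0990,
                      0x0993, 0x0994, 0x09E0, 0x09E1]}
UNASSIGNED = {0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5}
C = {chr(c) for c in range(0x0995, 0x09BA) if c not in UNASSIGNED} | {"\u09F0", "\u09F1"}
M = {chr(c) for c in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8,
                      0x09CB, 0x09CC, 0x09D7, 0x09E2, 0x09E3]}
S_SEQUENCES = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা
SIGNS = {H: "H", B: "B", D: "D", X: "X", Z: "Z"}

# Rules 2-7: symbols allowed immediately before each symbol.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"},
    "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X", "P", "S"},   # S ends with M
}


def tokenize(label: str) -> list[tuple[str, str]]:
    """Split an NFC-normalized label into (symbol, text) units."""
    text = unicodedata.normalize("NFC", label)
    tokens, i = [], 0
    while i < len(text):
        if text[i:i + 4] in S_SEQUENCES:
            tokens.append(("S", text[i:i + 4]))
            i += 4
        elif text[i] in C and text[i + 1:i + 2] == NUKTA:
            # Rule 1: consonant + nukta (the NFC form of ড় ঢ় য়) is one C.
            tokens.append(("C", text[i:i + 2]))
            i += 2
        else:
            ch = text[i]
            cat = "C" if ch in C else "V" if ch in V else "M" if ch in M else SIGNS.get(ch, "?")
            tokens.append((cat, ch))
            i += 1
    return tokens


def wle_errors(label: str) -> list[str]:
    """Messages for each violated WLE rule, indexed by token position."""
    tokens = tokenize(label)
    errors = []
    for i, (cat, _) in enumerate(tokens):
        prev = tokens[i - 1][0] if i > 0 else None
        if cat == "Z" and prev == "H" and i >= 2 and tokens[i - 2][1] in RA_FORMS:
            prev = "P"                               # Ra-Hasanta before Khanda Ta
        if cat == "?":
            errors.append(f"{i}: not in this sketch's repertoire")
        elif cat in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cat]:
            errors.append(f"{i}: {cat} preceded by {prev}")
        elif cat in ("V", "S") and prev == "H":     # rules 8 and 9
            errors.append(f"{i}: {cat} preceded by H")
    return errors
```

Run on the Bank of India label from 7.1.1, the sketch reports a single violation of rule 8, at the ই that follows অফ্. The U+09DF in that label is normalized to য plus nukta and accepted as one C under rule 1. ভর্ৎসনা passes because the Khanda Ta follows P.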


2CHcancomewithKhandaTainonlythecasewhereCisra(র)(09B0)

ৎ6 asinভৎ6 সনা

3OnlyfollowingcombinationswithVHCMwillbeallowedrarrঅ8া(togetherpronouncedasaelig)asinঅ8ািসড(acid)rarrএ8া(togetheralsopronouncedasaelig)asinএ8ািসডএ8ােসািসেয়শান

(acidassociation)

102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti

13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section

58

Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)

Table17ndashTheScriptTableofSylhetiNagarıorSiloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ Thisletterdoesnothaveanyparticularphoneticvaluebutmostlypronouncedasn

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব


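Note: The Bangla and Assamese Repha forms over consonants such as ক and ঠ look exactly the same, so the resultant combinations with Repha look identical. The choice of RA in a Repha makes no visible difference. On similar grounds, 'Ra-phala' could be formed by two sequences, and the final ligatures would look the same:

(1) C1 + H + B_RA
(2) C1 + H + A_RA

where C1 = any consonant except Khanda Ta, B_RA = BENGALI LETTER RA (U+09B0) and A_RA = ASSAMESE LETTER RA (U+09F0).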

Example:

Sequence 1 (using Bangla RA) | Ligature 1 | Sequence 2 (using Assamese RA) | Ligature 2
U+0995 (ক) U+09CD (্) U+09B0 (র) | ক্র | U+0995 (ক) U+09CD (্) U+09F0 (ৰ) | ক্ৰ
U+09A8 (ন) U+09CD (্) U+09B0 (র) | ন্র | U+09A8 (ন) U+09CD (্) U+09F0 (ৰ) | ন্ৰ

Table 13 – Example of Ra-phala

The Assamese and Bangla Repha and Ra-phala conjunct forms look the same, which could confuse end-users. Hence the Repha and Ra-phala cases need to be defined as variants. The NBGP concluded to define র and ৰ as variant code points, so that a single variant set between র and ৰ covers all cases. This does create blocked variant labels. For example, if someone registers "ররর", the variant label "ৰৰৰ" will be generated and blocked, and vice versa. However, the blocking applies only at the label level. If someone else needs to register other labels, e.g. ৰৰ or ৰৰৰৰ, that is still possible.
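The blocking behaviour described above can be illustrated with a minimal Python sketch. It makes the following assumptions:

- There is a single symmetric variant pair, র (U+09B0) / ৰ (U+09F0), with a blocked disposition.
- Every position is substituted independently, so mixed labels such as রৰর are generated too. This mirrors the usual per-position permutation of variant labels, but it is an assumption about the XML rather than a statement from this proposal.
- The function names are hypothetical. The normative behaviour is whatever the LGR XML file specifies.

```python
from itertools import product

# Hypothetical sketch: র (U+09B0) and ৰ (U+09F0) as one symmetric variant pair
# whose variant labels are blocked (Section 6.1). Not the normative LGR XML.
RA_CHOICES = {"\u09B0": ("\u09B0", "\u09F0"), "\u09F0": ("\u09F0", "\u09B0")}


def variant_labels(label: str) -> set:
    """All labels reachable by swapping র/ৰ at any position (label included)."""
    options = [RA_CHOICES.get(ch, (ch,)) for ch in label]
    return {"".join(combo) for combo in product(*options)}


def variant_key(label: str) -> str:
    """Fold ৰ onto র so every member of one variant set shares one key."""
    return label.replace("\u09F0", "\u09B0")


def is_blocked(candidate: str, registered: set) -> bool:
    """True if the candidate equals, or is a variant of, a registered label."""
    return variant_key(candidate) in {variant_key(r) for r in registered}


if __name__ == "__main__":
    taken = {"ররর"}
    print(sorted(variant_labels("ররর")))  # 8 labels, from ররর to ৰৰৰ
    print(is_blocked("ৰৰৰ", taken))       # True: variant of ররর
    print(is_blocked("ৰৰ", taken))        # False: a different label
```

Blocking is keyed on the whole label, so ৰৰ and ৰৰৰৰ remain registrable even after ররর is taken, as the text above says.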

6.2 Cross Script Variants
A crisp cross-script study for Bangla has been done with respect to sister scripts such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya). It keeps in mind the visual and technical confusion these scripts may cause as labels on the web domain. Moreover, there is no in-script variant in Bangla as far as the orthography is concerned. The following characters are proposed by the NBGP as variants. Certain other characters are somewhat similar but have not been included here; they are provided in the Appendix (10.3) for reference.

¹¹ Unicode uses 'Oriya' for the script, although 'Odia' is now the official term.

1. Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 – Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhi Script

Bangla | Gurmukhī
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 – Bangla and Gurmukhī cross-script variant code points
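Tables 14 and 15 can be read as a small lookup from Bengali code points to their cross-script counterparts. The Python sketch below assumes that a whole-label cross-script variant exists only when every code point of the label has a listed counterpart. That is a simplifying assumption; the LGR XML governs the actual behaviour. The table name and function are hypothetical, and the script keys are ISO 15924 codes.

```python
from typing import Optional

# Cross-script variant pairs from Tables 14 and 15 (Bengali -> other script).
CROSS_SCRIPT = {
    "Deva": {"\u09AE": "\u092E", "\u09BF": "\u093F"},  # ম -> म, ি -> ि
    "Guru": {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"},  # ম -> ਸ, ি -> ਿ
}


def cross_script_variant(label: str, script: str) -> Optional[str]:
    """Rewrite a Bengali label in `script` if every code point has a listed
    counterpart; otherwise return None (no whole-label variant)."""
    table = CROSS_SCRIPT[script]
    if label and all(ch in table for ch in label):
        return "".join(table[ch] for ch in label)
    return None


print(cross_script_variant("মি", "Deva"))  # मि
print(cross_script_variant("মা", "Deva"))  # None
```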

7 Whole Label Evaluation Rules (WLE)
This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla¹² script. The rules have been drafted so that they can be easily translated into the LGR specifications. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table provided under Code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khanda Ta
S → S1, S2 (from Table 9), the (æ) Ya-phalā sequences (V1 H C1 M1), where:
- V1 is either 0985 (অ – BENGALI LETTER A) or 098F (এ – BENGALI LETTER E);
- H is 09CD (্ – BENGALI SIGN VIRAMA);
- C1 is 09AF (য – BENGALI LETTER YA);
- M1 is 09BE (া – BENGALI VOWEL SIGN AA).
S1 and S2 are valid even though they are not allowed by the other context rules.
P → Ra-Hasanta (C2 H), where:
- C2 is either 09B0 (র – BENGALI LETTER RA) or 09F0 (ৰ – ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL);
- H is 09CD (্ – BENGALI SIGN VIRAMA).

Table 16 – Symbols used in WLE rules

¹² As used by Unicode, denoting and including both Assamese and Maṇipuri.
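Table 16 can be made concrete with a small Python sketch that maps each code point of a label to its symbol. The sketch makes these assumptions:

- Input is in NFC, so ড়, ঢ় and য় appear as consonant + nukta (U+09BC), i.e. the CN forms of rule 1.
- The code point sets are an illustrative subset of the Bengali block, not the normative repertoire of Section 5.1.
- S and P span several code points, so here they show up through their components (V H C M and C H).

```python
import unicodedata

# Illustrative subsets of the Bengali block (assumption: the normative
# repertoire is the table in Section 5.1 and the LGR XML file).
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
MATRAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C7\u09C8\u09CB\u09CC\u09D7")
CONSONANTS = {chr(cp) for cp in range(0x0995, 0x09BA)
              if cp not in (0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5)}
CONSONANTS |= {"\u09F0", "\u09F1"}                 # ৰ, ৱ
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", "\u09CD": "H", "\u09CE": "Z"}
NUKTA = "\u09BC"


def symbol_of(ch: str) -> str:
    """Table 16 symbol for a single code point ('?' if it has none)."""
    if ch in CONSONANTS:
        return "C"
    if ch in VOWELS:
        return "V"
    if ch in MATRAS:
        return "M"
    return SIGNS.get(ch, "?")


def symbol_string(label: str) -> str:
    """Classify an NFC label; a nukta is folded into the consonant before it,
    so the CN forms ড়, ঢ়, য় each count as one C."""
    label = unicodedata.normalize("NFC", label)
    return "".join(symbol_of(ch) for ch in label if ch != NUKTA)


print(symbol_string("অ্যাসিড"))  # VHCMCMC
print(symbol_string("য়"))        # C (NFC turns U+09DF into য + nukta)
```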

It is also worth mentioning that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters". In theory these can conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters" [136, p. 4]:
- bi-consonantal combinations number 290;
- three-letter combinations account for another 80;
- the rarer four-letter combinations number 10 more.
More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules
Given the prevalent patterns in Bangla and the various restrictions, below are the specific WLE rules that need to be implemented.

1. C is the set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.
2. H must be preceded by C.
3. M must be preceded by C. Example: কা

4. D must be preceded by V, C or M. Example: আঁ, খঁ, খাঁ, হাঁ
5. X must be preceded by V, C, M or D. Example: উঃ, খঃ, বাঃ, দঁঃ
6. B must be preceded by V, C, M or D. Example: আং, ইং, কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎ, কৎ, র্ৎ. (S is not listed because S ends with M; Z may also follow S.)
8. V CANNOT be preceded by H. See 7.1.1, Case of V Preceded by H.
9. S CANNOT be preceded by H.

Each rule is elaborated below with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage. There is no harm in adding such ligatures, because any user can easily create them even if they are not attested.

7.1.1 Case of V Preceded by H
In multi-word domains, V may need to be allowed to follow an H. An example is ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋkʌv ɪndiə), meaning 'Bank of India':

U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE

Here two different words are joined together. The first ends with an H (অফ্) and the second begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H to fully represent the intended sound. By and large, however, the first word without an H (U+09CD) is considered enough to represent its intended sound.

This situation arises only because the Root Zone repertoire does not permit a hyphen, a space or the Zero Width Non-Joiner. Otherwise, V never needs to follow an H. For the majority of the linguistic communities, permitting it could make two labels (with and without H) look perceptually similar. Hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, a future NBGP may consider revisiting this rule.
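Rules 1–9 lend themselves to a left-to-right check in which each symbol only needs to look at its predecessor. The Python sketch below is one possible encoding, under these assumptions:

- The code point sets approximate the repertoire of Section 5.1.
- S (অ্যা, এ্যা) is matched as a unit before any other symbol.
- P is recognised only where it immediately precedes Z.
- S is accepted wherever M is accepted as a predecessor (of D, X, B and Z). The text above states this explicitly only for Z.
- All names are hypothetical, and the LGR XML file is normative.

```python
import unicodedata

# Hypothetical encoding of WLE rules 1-9 (Section 7.1). The code point sets
# approximate the repertoire; the LGR XML file remains normative.
HASANTA, NUKTA = "\u09CD", "\u09BC"
RA = {"\u09B0", "\u09F0"}                           # C2 in symbol P
S_SEQS = {"\u0985\u09CD\u09AF\u09BE",               # অ্যা
          "\u098F\u09CD\u09AF\u09BE"}               # এ্যা
SIGNS = {"\u0982": "B", "\u0981": "D", "\u0983": "X", HASANTA: "H", "\u09CE": "Z"}
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
MATRAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C7\u09C8\u09CB\u09CC\u09D7")
CONSONANTS = {chr(cp) for cp in range(0x0995, 0x09BA)
              if cp not in (0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5)} | RA | {"\u09F1"}
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}        # ড ঢ য -> CN forms ড় ঢ় য়

# Rules 2-7: symbols allowed immediately before each restricted symbol.
# Assumption: S ends in a matra, so it is accepted wherever M is.
ALLOWED_BEFORE = {
    "H": {"C"},
    "M": {"C"},
    "D": {"V", "C", "M", "S"},
    "X": {"V", "C", "M", "S", "D"},
    "B": {"V", "C", "M", "S", "D"},
    "Z": {"V", "C", "M", "S", "D", "B", "X", "P"},
}


def to_symbols(label):
    """Split an NFC label into (symbol, text) units; raise on unknown input."""
    label = unicodedata.normalize("NFC", label)
    units, i = [], 0
    while i < len(label):
        if label[i:i + 4] in S_SEQS:
            units.append(("S", label[i:i + 4]))
            i += 4
            continue
        ch = label[i]
        if ch in CONSONANTS:
            if label[i + 1:i + 2] == NUKTA:          # rule 1: CN forms only
                if ch not in NUKTA_BASES:
                    raise ValueError("nukta is only allowed on ড, ঢ, য")
                units.append(("C", label[i:i + 2]))
                i += 2
                continue
            units.append(("C", ch))
        elif ch in VOWELS:
            units.append(("V", ch))
        elif ch in MATRAS:
            units.append(("M", ch))
        elif ch in SIGNS:
            units.append((SIGNS[ch], ch))
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        i += 1
    return units


def wle_valid(label):
    """True if the label satisfies WLE rules 1-9."""
    try:
        units = to_symbols(label)
    except ValueError:
        return False
    prev = None                                     # None = start of label
    for idx, (sym, _text) in enumerate(units):
        if sym in ("V", "S") and prev == "H":       # rules 8 and 9
            return False
        if sym in ALLOWED_BEFORE:
            before = prev
            # P = র/ৰ + H; the H itself was already checked to follow a C
            if sym == "Z" and prev == "H" and units[idx - 2][1] in RA:
                before = "P"
            if before not in ALLOWED_BEFORE[sym]:
                return False
        prev = sym
    return True


print(wle_valid("ভর্ৎসনা"), wle_valid("ক্ৎ"))  # True False
```

For instance, `wle_valid("ব্যাঙ্কঅফ্ইন্ডিয়া")` returns False because ই follows a hasanta (rule 8). By contrast, `wle_valid("ভর্ৎসনা")` returns True because ৎ follows P.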

7.2 Additional Examples from Bangla ABNF

Below are a few examples that help explain some of the rules the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:
্ক ्क
াক ाक
ংক ंक
ঁক ँक
ঃক ःक
Such combinations automatically result in a "golu", or dotted circle, marking them as invalid formations. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S. Example:
অ্ अ्
অং্ अं्
কঁ্ कँ्
কঃ্ कः्
কি্ कि्

3. Only one B, D or X is permitted after a consonant, vowel or kāra (mātrā). Thus the following combinations are invalid. Example:
কংং कंं
কঁঁ कँँ
কঃঃ कःः
কাঁঁ काँँ
কীঃঃ कीःः
অংং अंं
অঁঁ अँँ
অঃঃ अःः

4. Only one M is permitted after a consonant. Example:
কীী कीी

5. M is not permitted after V. Example:
ইা ईा
ঈৗ ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible. Example:
কংঃ कंः
কঃং कःं
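The restrictions in 7.2 can also be written as patterns that flag an invalid label outright. The Python sketch below encodes restrictions 1–6 as regular expressions over NFC text. Its assumptions are:

- The character classes are an illustrative subset of the Bengali block.
- The exception for অ্যা / এ্যা in restriction 2 is carried over from symbol S in Table 16.

It is meant as a cross-check, not a replacement for the LGR XML.

```python
import re
import unicodedata

# Illustrative character classes (subsets of the Bengali block).
V = "\u0985-\u098B\u098F\u0990\u0993\u0994"        # independent vowels
V_NOT_S = "\u0986-\u098B\u0990\u0993\u0994"         # vowels other than অ and এ
M = "\u09BE-\u09C3\u09C7\u09C8\u09CB\u09CC\u09D7"   # kāras (mātrās)
H, B, D, X = "\u09CD", "\u0982", "\u0981", "\u0983"
YA_AA = "\u09AF\u09BE"                              # য + া, completing অ্যা / এ্যা

# Each restriction of Section 7.2 as a pattern that matches a violation.
RESTRICTIONS = [
    (1, re.compile(f"^[{H}{M}{B}{D}{X}]")),
    (2, re.compile(f"[{V_NOT_S}{M}{B}{D}{X}]{H}|[\u0985\u098F]{H}(?!{YA_AA})")),
    (3, re.compile(f"{B}{B}|{D}{D}|{X}{X}")),
    (4, re.compile(f"[{M}]{{2}}")),
    (5, re.compile(f"[{V}][{M}]")),
    (6, re.compile(f"{B}{X}|{X}{B}")),
]


def violations(label):
    """Return the numbers of the 7.2 restrictions that the label breaks."""
    label = unicodedata.normalize("NFC", label)
    return [num for num, pattern in RESTRICTIONS if pattern.search(label)]


print(violations("কংঃ"))  # [6]
```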

8 Contributors

8.1 Experts from India
Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii + 316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. (1957). 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. (2015). 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. (1960). 'The Phonemes of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. (2007). Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. (1966). Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. (1960). A phonetic and phonological study of nasals and nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). (1942). Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. (1975). Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). (2014). Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. (2012). Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply; in other words, some of the Formalism structures do not necessarily apply in a given language. To take care of such cases, restriction rules are set in place, and these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa Ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5).

Bangla does not allow ya (য়) at the beginning of a word either, though a couple of native examples can be cited. One is the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichuchor' by Kazi Nazrul Islam. There are also instances of initial য় in names, mostly of foreign origin: Yaqub, for example, may be written with ya (য়) at the beginning, as in য়াকুব.

In very recent times, Khaṇḍa Ta (ৎ) followed by sa (স) at the beginning of a word has appeared when transliterating some Chinese and Japanese names into Bangla, e.g. ৎসেরিং (Tsering).

2. C H can combine with Khaṇḍa Ta only where C is ra (র, 09B0):
র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
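The three restrictions above can be checked independently of the WLE rules. The Python sketch below reports which of restrictions 1–3 a label breaks. Its assumptions are:

- Input is in NFC.
- Footnote 13 limits this section to Bangla, so only র (not ৰ) is accepted before ্ৎ.
- Initial য় is deliberately not rejected, because the text cites attested exceptions.

The function name is hypothetical.

```python
import unicodedata

# Hypothetical checks for the three Appendix I restrictions (Bangla only).
HASANTA, KHANDA_TA, RA = "\u09CD", "\u09CE", "\u09B0"
NO_INITIAL = {"\u09CE", "\u099E", "\u0999"}          # ৎ, ঞ, ঙ
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE",          # অ্যা
                "\u098F\u09CD\u09AF\u09BE"}          # এ্যা
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")
MATRAS = set("\u09BE\u09BF\u09C0\u09C1\u09C2\u09C3\u09C7\u09C8\u09CB\u09CC\u09D7")
CONSONANTS = {chr(cp) for cp in range(0x0995, 0x09BA)
              if cp not in (0x09A9, 0x09B1, 0x09B3, 0x09B4, 0x09B5)}


def appendix_i_errors(label):
    """Return the sorted numbers of the Appendix I restrictions the label breaks."""
    label = unicodedata.normalize("NFC", label)
    errors = set()
    # 1: ৎ, ঞ and ঙ may not start a label (initial য় is not rejected here)
    if label[:1] in NO_INITIAL:
        errors.add(1)
    # 2: consonant + hasanta + ৎ only when the consonant is র
    for i in range(len(label) - 2):
        c, h, z = label[i:i + 3]
        if c in CONSONANTS and h == HASANTA and z == KHANDA_TA and c != RA:
            errors.add(2)
    # 3: V H C M only as অ্যা or এ্যা
    for i in range(len(label) - 3):
        v, h, c, m = label[i:i + 4]
        if (v in VOWELS and h == HASANTA and c in CONSONANTS and m in MATRAS
                and label[i:i + 4] not in ALLOWED_VHCM):
            errors.add(3)
    return sorted(errors)


print(appendix_i_errors("ক্ৎ"))  # [2]
```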

10.2 'Sylheti Nāgarī lipi' or 'Siloti'
This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi). Kaithī was used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar and was widely in use during the 1880s.

Sylheti Nāgarī or Siloti (129) had several other names, such as 'Jalalabada Nāgarī', 'Fula (flower) Nāgarī', 'Muslim Nāgarī' or 'Muhammad Nāgarī'. Its origin is disputed:
- it is said that Shah Jalal brought the script with him to Sylhet in the 13th–14th century (138);
- some have suggested that it was an invention of the Afghan rulers of Sylhet (137);
- some ascribe the credit to Buddhist Bhikkhus from Nepal.

Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

Table 17 – The Script Table of Sylheti Nāgarī or Siloti

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language; Assamese and Maṇipuri have not been covered in this section.

10.3 Confusable code points
The following code points were analysed. Each was found to be either (a) distinguishable or (b) confusable, but not enough to be defined as a variant code point.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points
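The pairs in Tables 18, 19 and 21 are confusable but deliberately not variants, so an LGR will not block them. A registry might nonetheless want to surface them for human review. The Python sketch below records those pairs in a hypothetical lookup table and lists the positions in a Bengali label where a look-alike exists. It does not affect label validity.

```python
# Confusable (but not variant) pairs from Tables 18, 19 and 21.
CONFUSABLE = {
    "\u0983": ["\u0903"],             # ঃ ~ Devanagari ः
    "\u0993": ["\u0909", "\u0B13"],   # ও ~ Devanagari उ, Oriya ଓ
    "\u0998": ["\u0918", "\u0A2C"],   # ঘ ~ Devanagari घ, Gurmukhi ਬ
    "\u0981": ["\u0945", "\u0A71"],   # ঁ ~ Devanagari U+0945, Gurmukhi U+0A71
}


def review_hints(label):
    """List (position, code point, look-alikes) for a human reviewer.
    Advisory only: these pairs do not make labels variants of each other."""
    return [(i, ch, CONFUSABLE[ch]) for i, ch in enumerate(label) if ch in CONFUSABLE]


for pos, ch, others in review_hints("ঘর"):
    print(pos, f"U+{ord(ch):04X}", [f"U+{ord(o):04X}" for o in others])
# prints: 0 U+0998 ['U+0918', 'U+0A2C']
```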

11 Appendix-II
Bengali consonants and their allographs

Consonant | Phonetic value | Allographs: clusters | Allographs: transparent form (Bangla Akademi font)
প | p | প্ত, প্ন, প্প, প্য, প্র, প্ল, প্স | –
ফ | pʰ | ফ্র, ফ্ল | –
ব | b | ব্জ, ব্দ, ব্ধ, ব্ব, ব্য, ব্র, ব্ল | ব্ধ, হ্ব
ভ | bʱ | ভ্য, ভ্র, ভ্ল | –
ত | t | ত্ত, ত্ত্য, ত্ত্ব, ত্থ, ত্ন, ত্য, ত্ম, ত্ব, ত্র, প্ত, ক্ত, ক্ত্ব. There is a marked form of ত + ্ = ৎ (Khaṇḍa Ta). | ক্ত

থ | tʰ | থ্য, থ্র, ত্থ, ন্থ, স্থ | ন্থ, স্থ
দ | d | দ্গ, দ্ঘ, দ্দ, দ্ধ, দ্য, দ্ব, দ্ভ, দ্র, ন্দ, ন্দ্র | দ্গ, দ্ধ
ধ | dʱ | ধ্ন, ধ্ম, ধ্য, ধ্র, গ্ধ, দ্ধ, ন্ধ | দ্ধ, ব্ধ, ন্ধ
ট | ʈ | ট্ট, ট্য, ট্ব, ট্র, ক্ট, ষ্ট | –
ঠ | ʈʰ | ঠ্য, ণ্ঠ, ষ্ঠ | –
ড | ɖ | ড্ড, ড্য, ড্র | –
ঢ | ɖʱ | ঢ্য, ণ্ঢ | –
চ | tʃ | চ্চ, চ্ছ, চ্ছ্র, চ্ঞ, চ্য, ঞ্চ, শ্চ | ঞ্চ

ছ | tʃʰ | ছ্র, চ্ছ, ঞ্ছ, শ্ছ | ঞ্ছ
জ | dʒ | জ্জ, জ্জ্ব, জ্ঝ, জ্ঞ, জ্য, জ্র, ঞ্জ | ঞ্জ
ঝ | dʒʱ | (does not occur as the first member of a cluster) জ্ঝ, ঞ্ঝ | –
ক | k | ক্ক, ক্ট, ক্ত, ক্ত্র, ক্ত্ব, ক্ন, ক্ব, ক্ম, ক্য, ক্র, ক্ষ, ক্ষ্ণ, ক্ষ্ম, ক্ষ্ব, ক্ষ্য, ক্স, ঙ্ক | ক্ত, ক্ত্র, ক্ত্ব, ক্র, ঙ্ক
খ | kʰ | (does not occur as the first member of a cluster) ঙ্খ | –

গ | g | গ্গ, গ্দ, গ্ধ, গ্ন, গ্ব, গ্ম, গ্য, গ্র, গ্ল, ঙ্গ | ঙ্গ
ঘ | gʱ | ঘ্ন, ঘ্য, ঘ্র, ঙ্ঘ | –
ঞ | no particular phonetic value; mostly pronounced as n | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ, জ্ঞ | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ
ণ | n | ণ্ট, ণ্ঠ, ণ্ড, ণ্ড্র, ণ্ঢ, ণ্ণ, ণ্য, ণ্ব, ক্ষ্ণ, ষ্ণ, হ্ণ | ণ্ড, ণ্ড্র, ষ্ণ
ঙ, ং | ŋ | ঙ্ক, ঙ্ক্র, ঙ্খ, ঙ্গ, ঙ্ঘ, ঙ্ক্ষ (in some contexts ঙ is replaced by ং, e.g. কং, অং) | ঙ্ক, ঙ্গ, ঙ্ঘ

ম | m | ম্ল, ম্প, ম্প্র, ম্ভ, ম্ম, ম্র, ত্ম, ধ্ম, হ্ম, ক্ষ্ম | হ্ম
ন | n | ন্ট, ন্ট্র, ণ্ঠ, ন্ড, ন্ড্র, ন্ত, ন্ত্র, ন্থ, ন্দ, ন্দ্র, ন্ধ, ন্ধ্র, ন্দ্ব, ন্ন, ন্ম, ন্য, ন্স, হ্ন | ন্থ, ন্ধ
শ | ʃ | শ্চ, শ্ছ, শ্ন, শ্ম, শ্র, শ্ল, শ্য | –
ষ | ʃ | ষ্ক, ষ্ট, ষ্ঠ, ষ্ণ, ষ্প, ষ্প্র, ষ্ফ, ষ্ট্র, ষ্য, ক্ষ, ক্ষ্ণ, ক্ষ্ম | ষ্ণ

স | s & ʃ | স্ক, স্ট, স্প, স্ফ, স্ত, স্থ, স্খ, স্য, স্র, স্ল, ক্স | স্থ
হ | h | হ্ণ, হ্ন, হ্ম, হ্য, হ্র, হ্ল | হ্ম
ড় | ɽ | ড়্গ | –
ঢ় | ɽʱ | (does not form clusters) | –
য | dʒ | See note below. Clusters: ক্য, স্য, র‍্য | –

Note on য: The secondary symbol (allograph) jɔ-phala has two phonetic values:
- added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.);
- after a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.).
The র + য combination has two physical manifestations: র‍্য and র্য. র‍্য on its own is never used in Bangla orthography. র‍্যা is used, but its last two symbols (ya-phala and a-kara) then constitute a vowel sign representing the vowel অ্যা.

র | r | See note below. | –
ল | l | ল্গ, ল্প, ল্ব, ল্ম, ল্ট, ল্ড, ল্ক, ল্দ, ল্য, গ্ল, ম্ল | –
ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | e.g. অঃ, কঃ | –

Note on র: it has two manifestations:
(i) রেফ (repʰ), as the first member of a cluster, placed over the following consonant, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব. The last was earlier written র্দ্ধ্ব = র্ + দ্ + ধ্ + ব, a four-term cluster.
(ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster, placed under the consonant it follows.


1 Bangla and Nāgarī (Devanāgarī) Script

Bangla | Devanāgarī
ম U+09AE | म U+092E
ি U+09BF | ि U+093F

Table 14 - Bangla and Devanāgarī cross-script variant code points

2 Bangla and Gurmukhi Script

Bangla | Gurmukhi
ম U+09AE | ਸ U+0A38
ি U+09BF | ਿ U+0A3F

Table 15 - Bangla and Gurmukhi cross-script variant code points
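The variant pairs in Tables 14 and 15 can be held as a small lookup table. The following Python is only an illustrative sketch: the names `CROSS_SCRIPT_VARIANTS`, `cross_script_variants()` and `is_cross_script_variant()` are hypothetical, and it does not model how the machine-readable LGR file treats these variants; it only answers whether two code points form one of the listed pairs.

```python
from typing import Dict

# Hypothetical lookup built from Tables 14 and 15:
# Bangla code point -> {target script: variant code point}.
CROSS_SCRIPT_VARIANTS: Dict[int, Dict[str, int]] = {
    0x09AE: {"Devanagari": 0x092E, "Gurmukhi": 0x0A38},  # ম ~ म, ম ~ ਸ
    0x09BF: {"Devanagari": 0x093F, "Gurmukhi": 0x0A3F},  # ি ~ ि, ি ~ ਿ
}


def cross_script_variants(cp: int) -> Dict[str, int]:
    """Return the listed variants of a Bangla code point (empty if none)."""
    return CROSS_SCRIPT_VARIANTS.get(cp, {})


def is_cross_script_variant(a: int, b: int) -> bool:
    """True if a and b form one of the listed pairs, in either order."""
    return (b in cross_script_variants(a).values()
            or a in cross_script_variants(b).values())


print(is_cross_script_variant(0x09AE, 0x092E))  # True: ম and म (Table 14)
print(is_cross_script_variant(0x0998, 0x0918))  # False: ঘ and घ are not listed
```

Keying the table on the Bangla code point keeps the lookup one-directional; the symmetric check in `is_cross_script_variant()` covers queries that start from the Devanāgarī or Gurmukhi side.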

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Bangla12 script. The rules have been drafted so that they can be easily translated into the LGR specification. Below are the symbols used in the WLE rules for each Indic Syllabic Category, as listed in the table provided under Code point repertoire (Section 5.1).

C → Consonant

M → Kāra (Mātrā)

V → Vowel

B → Anusvāra

D → Candrabindu

X → Visarga

H → Hasanta

Z → Khanda Ta

S → S1, S2 (from Table 9) or (æ) Ya-phalā (V1 H C1 M1), where V1 is either 0985 (অ - BENGALI LETTER A) or 098F (এ - BENGALI LETTER E), H is 09CD (্ - BENGALI SIGN VIRAMA), C1 is 09AF (য - BENGALI LETTER YA) and M1 is 09BE (া - BENGALI VOWEL SIGN AA). S1 and S2 are valid even if they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H), where C2 is either 09B0 (র - BENGALI LETTER RA) or 09F0 (ৰ - ASSAMESE LETTER RA; Unicode name BENGALI LETTER RA WITH MIDDLE DIAGONAL) and H is 09CD (্ - BENGALI SIGN VIRAMA).

Table 16 - Symbols used in WLE rules

12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

It is also appropriate to mention here that in Bangla the consonant letters (or graphemes) are physically joined to form "clusters", which can in theory conjoin two to four consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that there are "nearly 380 unique consonant clusters", of which bi-consonantal combinations number 290, three-letter combinations account for another 80, and the rarer ones with four letters number 10 more [136, Pg 4]. More details of such combinations can be found in Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Given the prevalent patterns in Bangla and the various restrictions, the following are the specific WLE rules that need to be implemented.

1. C is a set of C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.

2. H must be preceded by C.

Example:

3. M must be preceded by C. Example: কা

4. D must be preceded by either of V, C or M. Example: আখখা হা

5. X must be preceded by either of V, C, M or D. Example: উঃখঃবঃাঃ দ ঃ

6. B must be preceded by either of V, C, M or D. Example: আং ইং কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1 Case of V Preceded by H.

9. S CANNOT be preceded by H.
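Rules 1 to 9 are local checks on adjacent symbols, so they can be sketched as a single scan over a label. The Python below is a hedged illustration, not the LGR's own implementation: the code-point sets are assumptions taken from the Unicode Bengali block rather than from the repertoire in Section 5.1, only the Ya-phalā form of S is recognised (S1 and S2 from Table 9 are omitted), and the names `tokenize()` and `wle_violations()` are hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple

# Assumed code-point sets (Unicode Bengali block), not the normative repertoire.
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}
CONSONANTS = (set(range(0x0995, 0x09A9)) | set(range(0x09AA, 0x09B1))
              | {0x09B2} | set(range(0x09B6, 0x09BA)) | {0x09F0, 0x09F1})
MATRAS = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC}
SIGNS = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
NUKTA, NUKTA_BASES = 0x09BC, {0x09A1, 0x09A2, 0x09AF}  # CN: ড় ঢ় য়
RA = {0x09B0, 0x09F0}                                   # C2 of symbol P
S_TAIL = [0x09CD, 0x09AF, 0x09BE]                       # H + য + া

Token = Tuple[str, int]  # (WLE symbol, first code point)


def tokenize(label: str) -> List[Token]:
    """Map a label to WLE symbols (rule 1: C covers both C and CN)."""
    cps = [ord(ch) for ch in unicodedata.normalize("NFC", label)]
    toks: List[Token] = []
    i = 0
    while i < len(cps):
        cp = cps[i]
        if cp in (0x0985, 0x098F) and cps[i + 1:i + 4] == S_TAIL:
            toks.append(("S", cp))        # অ্যা / এ্যা as one unit
            i += 4
            continue
        if cp in NUKTA_BASES and cps[i + 1:i + 2] == [NUKTA]:
            toks.append(("C", cp))        # CN counts as one consonant
            i += 2
            continue
        if cp in CONSONANTS:
            sym = "C"
        elif cp in VOWELS:
            sym = "V"
        elif cp in MATRAS:
            sym = "M"
        elif cp in SIGNS:
            sym = SIGNS[cp]
        else:
            raise ValueError(f"U+{cp:04X} is not covered by this sketch")
        toks.append((sym, cp))
        i += 1
    return toks


# Rules 2-7: symbols allowed immediately before each sign, with rule number.
ALLOWED_BEFORE = {
    "H": ({"C"}, 2),
    "M": ({"C"}, 3),
    "D": ({"V", "C", "M"}, 4),
    "X": ({"V", "C", "M", "D"}, 5),
    "B": ({"V", "C", "M", "D"}, 6),
    "Z": ({"V", "C", "M", "D", "B", "X", "S"}, 7),  # P is checked below
}


def wle_violations(label: str) -> List[str]:
    """Return one message per broken rule; an empty list means the label passes."""
    toks = tokenize(label)
    errors: List[str] = []
    for i, (sym, _) in enumerate(toks):
        prev: Optional[str] = toks[i - 1][0] if i > 0 else None
        if sym in ALLOWED_BEFORE:
            allowed, rule = ALLOWED_BEFORE[sym]
            ok = prev in allowed
            if sym == "Z" and prev == "H" and i >= 2:
                # P = Ra-Hasanta: the H must itself follow র or ৰ.
                ok = toks[i - 2][0] == "C" and toks[i - 2][1] in RA
            if not ok:
                errors.append(f"rule {rule}: {sym} after {prev or 'start'}")
        if sym == "V" and prev == "H":
            errors.append("rule 8: V after H")
        if sym == "S" and prev == "H":
            errors.append("rule 9: S after H")
    return errors


print(wle_violations("\u0995\u09be"))              # কা -> []
print(wle_violations("\u0995\u0982\u0983"))        # কংঃ -> ['rule 5: X after B']
print(wle_violations("\u0995\u09b0\u09cd\u09ce"))  # কর্ৎ -> [] (Z after P)
```

Treating S and CN as single tokens before applying the neighbour checks is what lets the H inside অ্যা or এ্যা escape rule 2, matching the note that S1 and S2 are valid even where other context rules would reject them.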

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even though they may not be attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া (bæŋk ʌv ɪndiə; U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE), meaning "Bank of India". Here two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for a full representation of the intended sound. By and large, however, the form of the first word without an H (U+09CD) is considered sufficient to represent the sound intended for that word.

This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner from the permissible set of characters in the Root Zone repertoire. Otherwise, V never needs to be allowed to follow an H. Permitting this may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, if the prevailing requirements of the community demand it, a future NBGP may consider revisiting this rule.
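The "Bank of India" example can be checked mechanically. This Python sketch (the function name `vowel_after_hasanta()` is hypothetical, and the vowel set is an assumption taken from the Unicode Bengali block) rebuilds the label from the code points listed above, normalises it to NFC, and reports where an independent vowel follows the hasanta. It also shows that U+09DF (য়) becomes U+09AF U+09BC under NFC, which is the normalized form that rule 1 counts as CN.

```python
import unicodedata
from typing import List

# Code points listed in the text for ব্যাঙ্কঅফ্ইন্ডিয়া ("Bank of India").
BANK_OF_INDIA = [0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995,
                 0x0985, 0x09AB, 0x09CD, 0x0987, 0x09A8, 0x09CD, 0x09A1,
                 0x09BF, 0x09DF, 0x09BE]
HASANTA = 0x09CD
# Assumed independent vowels অ-ঋ, এ, ঐ, ও, ঔ (not the normative repertoire).
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}


def vowel_after_hasanta(label: str) -> List[int]:
    """Indices in the NFC form where an independent vowel follows a hasanta."""
    cps = [ord(ch) for ch in unicodedata.normalize("NFC", label)]
    return [i for i in range(1, len(cps))
            if cps[i - 1] == HASANTA and cps[i] in VOWELS]


label = "".join(chr(cp) for cp in BANK_OF_INDIA)
without_h = label[:9] + label[10:]      # drop the hasanta of অফ্ (index 9)

print(vowel_after_hasanta(label))       # [10]: ই follows ফ্
print(vowel_after_hasanta(without_h))   # []
# U+09DF is a composition exclusion, so NFC keeps it decomposed.
print([hex(ord(c)) for c in unicodedata.normalize("NFC", "\u09df")])
# ['0x9af', '0x9bc']
```

Dropping the hasanta of অফ্ removes the only V-after-H position, which mirrors the text's observation that the form without H is generally considered sufficient.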

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules that the ABNF puts in place. They are given for reference only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, such a combination automatically results in a "golu" or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever the OS supports it.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ্ अ्

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a consonant, a vowel or a kāra (mātrā) is restricted to one; thus the following combinations are invalid.

Example:

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः

কঃং कःं
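The examples in 7.2 reduce to two patterns: a sign that may not open a label, and pairs of adjacent symbols that may not occur. A minimal Python sketch of that reading follows; the function name and character classes are hypothetical, only the Bangla column is handled, and S is not modelled separately because it ends in M, so the (M, H) pair also covers S followed by H.

```python
from typing import List

# Hypothetical classes; anything not listed is treated as a consonant (C).
MARKS = {0x09CD: "H", 0x0982: "B", 0x0981: "D", 0x0983: "X"}
MATRAS = set(range(0x09BE, 0x09C5)) | {0x09C7, 0x09C8, 0x09CB, 0x09CC, 0x09D7}
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}

CANNOT_START = {"H", "M", "B", "D", "X"}                         # item 1
FORBIDDEN_PAIRS = {
    ("V", "H"), ("B", "H"), ("D", "H"), ("X", "H"), ("M", "H"),  # item 2
    ("B", "B"), ("D", "D"), ("X", "X"),                          # item 3
    ("M", "M"),                                                  # item 4
    ("V", "M"),                                                  # item 5
    ("B", "X"), ("X", "B"),                                      # item 6
}


def symbol(cp: int) -> str:
    """Classify one code point into H, B, D, X, M, V or (by default) C."""
    if cp in MARKS:
        return MARKS[cp]
    if cp in MATRAS:
        return "M"
    if cp in VOWELS:
        return "V"
    return "C"


def example_violations(label: str) -> List[str]:
    """Report a forbidden opening symbol and every forbidden adjacent pair."""
    syms = [symbol(ord(ch)) for ch in label]
    found = [f"{syms[0]} at start"] if syms and syms[0] in CANNOT_START else []
    for i in range(1, len(syms)):
        if (syms[i - 1], syms[i]) in FORBIDDEN_PAIRS:
            found.append(f"{syms[i - 1]}{syms[i]} at {i}")
    return found


for word in ["\u09cd\u0995", "\u0995\u0982\u0982", "\u0995\u09c0\u09c0",
             "\u0987\u09be", "\u0995\u0983\u0982", "\u0995\u09be"]:
    print(word, example_violations(word))
```

For ্ক, কংং, কীী, ইা, কঃং and কা the loop reports `['H at start']`, `['BB at 2']`, `['MM at 2']`, `['VM at 1']`, `['XB at 2']` and `[]` respectively. A pair table like this is convenient for reviewing examples, but it is deliberately coarser than the WLE rules themselves (for instance it does not recognise CN or the S unit).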

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2011. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] ----- (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). 2017. Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL. 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), pg. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. 'The Phonemes of Bengali'. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far in terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 - Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the Formalism structures do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla13 in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, e.g. the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichu Chor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C H can be followed by Khanda Ta only where C is ra (র) (09B0), i.e. র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations of V H C M are allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
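The three Bangla-specific restrictions above can likewise be sketched as code-point checks. This Python example is illustrative only: the function name is hypothetical, the vowel set is an assumption, য় at the start of a label is deliberately not rejected because the text cites attested exceptions, and rule 2 checks only র (U+09B0) because this section covers Bangla, whereas symbol P in Table 16 also admits ৰ (U+09F0).

```python
from typing import List

HASANTA, KHANDA_TA, RA = 0x09CD, 0x09CE, 0x09B0
NOT_INITIAL = {0x09CE, 0x099E, 0x0999}          # ৎ, ঞ, ঙ (rule 1)
# Assumed independent vowels (Unicode Bengali block), hypothetical.
VOWELS = set(range(0x0985, 0x098C)) | {0x098F, 0x0990, 0x0993, 0x0994}
ALLOWED_V_H = {0x0985, 0x098F}                   # অ and এ (rule 3)
YA_AA = [0x09AF, 0x09BE]                         # য + া


def appendix_i_violations(label: str) -> List[str]:
    """Return 'rule N at i' for each Appendix-I restriction the label breaks."""
    cps = [ord(ch) for ch in label]
    errors: List[str] = []
    if cps and cps[0] in NOT_INITIAL:
        errors.append("rule 1 at 0")
    for i in range(2, len(cps)):
        # Rule 2: C + H + ৎ is allowed only when C is র.
        if cps[i] == KHANDA_TA and cps[i - 1] == HASANTA and cps[i - 2] != RA:
            errors.append(f"rule 2 at {i}")
    for i in range(len(cps) - 1):
        # Rule 3: V + H must continue as অ্যা or এ্যা.
        if cps[i] in VOWELS and cps[i + 1] == HASANTA:
            if cps[i] not in ALLOWED_V_H or cps[i + 2:i + 4] != YA_AA:
                errors.append(f"rule 3 at {i}")
    return errors


print(appendix_i_violations("\u0985\u09cd\u09af\u09be\u09b8\u09bf\u09a1"))  # অ্যাসিড -> []
print(appendix_i_violations("\u09ad\u09b0\u09cd\u09ce\u09b8\u09a8\u09be"))  # ভর্ৎসনা -> []
print(appendix_i_violations("\u09ce\u09b8\u09c7\u09b0\u09bf\u0982"))        # ৎসেরিং -> ['rule 1 at 0']
```

As written, rule 1 rejects ৎসেরিং even though the text notes that such transliterations do occur; whether to relax the rule for them is a policy question that this sketch does not settle.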

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

10.2 'Sylheti Nagarī Lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, and widely in use during the 1880s. Sylheti Nagarī or Siloti [129] has had several other names, such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century [138], although some have suggested that it was an invention of the Afghan rulers of Sylhet [137]. Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here [138].

Table 17 - The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī (Devanāgarī)

Bangla | Devanāgarī | NBGP decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
ঁ U+0981 | ॅ U+0945 | Confusable

Table 18 - Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP decision
ঘ U+0998 | ਬ U+0A2C | Confusable
ঁ U+0981 | ੱ U+0A71 | Confusable

Table 19 - Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 - Bangla and Gurmukhi distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 - Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 - Bangla and Oriya distinguishable code points

11 Appendix-II: Bengali consonants and their allographs

Consonant | Phonetic value | Allographs (clusters) | Transparent form (Bangla Akademi font)

প | p | প+ত, প+ন, প+প, প+য, প+র, প+ল, প+স, ?+প, ?+প | -

ফ | pʰ | ফ+র, ফ+ল, ?+ফ | -

ব | b | ব+জ, ব+দ, ব+ধ, ?+ব, ব+য, ব+র, ব+ল, ব+ভ, ?+ব, ?+ব | ব+ধ, হ+ব

ভ | bʱ | ভ+য, ভ+র, ভ+ল | -

ত | t | ত+ত, ত+ত+য, ত+ত+ব, ত+থ, ত+ন, ত+য, ত+ম, ত+?+য, ত+ব, ত+র, প+ত, ক+ত, ক+ত+ব, ?+ত, ন+ত+?+য, ?+ত+র. There is a marked form of ত + ্ = ৎ (also in র্ৎ). | ক+ত

থ | tʰ | থ+য, থ+র, স+থ, ত+থ, ন+থ | ন+থ, স+থ

দ | d | দ+গ, দ+ঘ, দ+দ, দ+ধ, দ+য, দ+ব, দ+ভ, দ+র, ?+দ, ন+দ, ন+দ+র, র্দ্র (র+দ+র) | দ+গ, দ+ধ

ধ | dʱ | ধ+ন, ধ+ম, ধ+য, ধ+র, গ+ধ, দ+ধ, ?+ধ, ন+ধ | গ+ধ, দ+ধ, ব+ধ, ন+ধ

ট | ʈ | ট+ট, ট+য, ট+ব, ট+র, ক+ট, ষ+ট | -

ঠ | ʈʰ | ঠ+য, ণ+ঠ, ষ+ঠ | -

ড | ɖ | ড+ড, ড+য, ড+র | -

ঢ | ɖʱ | ঢ+য, ণ+ঢ | -

চ | tʃ | চ+চ, চ+ছ, চ+ছ+র, চ+ঞ, চ+য, ঞ+চ, শ+চ | ঞ+চ

ছ | tʃʰ | ছ+র, চ+ছ, ঞ+ছ, শ+ছ | ঞ+ছ

জ | dʒ | জ+জ, জ+জ+ব, জ+ঝ, জ+ঞ, জ+য, জ+র, ঞ+জ | ঞ+জ

ঝ | dʒʱ | (not privileged enough to have clusters as a first member) জ+ঝ, ঞ+ঝ | -

ক | k | ক+ক, ক+ট, ক+ত, ক+ত+র, ক+ত+ব, ক+ন, ক+ব, ক+ম, ক+য, ক+র, ক+ষ, ক+ষ+ণ, ক+ষ+ম, ক+ষ+ব, ক+ষ+য, ক+স, ঙ+ক, ?+ক+র | ক+ত, ক+ত+র, ক+ত+ব, ক+র, ঙ+ক, ?+ক+র

খ | kʰ | (not privileged enough to have clusters as a first member) ঙ+খ | -

গ | g | গ+গ, গ+দ, গ+ধ, গ+ন, গ+ব, গ+ম, গ+য, গ+র, গ+ল, ঙ+গ, র্ঙ্গ (র+ঙ+গ) | গ+ধ, ঙ+গ, র+ঙ+গ

ঘ | gʱ | ঘ+ন, ঘ+য, ঘ+র, ঙ+ঘ | -

ঞ | This letter does not have any particular phonetic value, but is mostly pronounced as n | ঞ+চ, ঞ+ছ, ঞ+জ, ঞ+ঝ, জ+ঞ | ঞ+চ, ঞ+ছ, ঞ+জ, ঞ+ঝ

ণ | n | ণ+ট, ণ+ঠ, ণ+ড, ণ+ড+র, ণ+ঢ, ণ+ণ, ণ+য, ণ+ব, ক+ষ+ণ, ষ+ণ, হ+ণ | ণ+ড, ণ+ড+র, ষ+ণ

ঙ, ং | ŋ | ঙ+ক, ঙ+ক+র, ঙ+খ, ঙ+গ, ঙ+ঘ, ঙ+ক+ষ (in some contexts ঙ is replaced by ং: কং, অং) | ঙ+ক, ঙ+গ, ঙ+ঘ

ম | m | ম+ল, ম+প, ম+প+র, ম+ভ, ম+?+র, ?+ম, ম+র, ত+ম, ধ+ম, হ+ম, ক+ষ+ম | হ+ম

ন | n | ন+ট, ন+ট+র, ণ+ঠ, ন+ড, ন+ড+র, ন+ত, ন+ত+র, ন+ত+?+য, ন+থ, ন+দ, ন+দ+র, ন+ধ, ন+ধ+র, ন+দ+ব, ?+ন, ন+ম, ন+য, ন+স, হ+ন | ন+থ, ন+ধ, ন+?+র

শ | ʃ | শ+চ, শ+ছ, শ+ন, শ+ম, শ+র, শ+ল, শ+য | -

ষ | ʃ | ষ+ক, ষ+ট, ষ+ঠ, ষ+ণ, ষ+প, ষ+প+র, ষ+ফ, ষ+ট+র, ষ+য, ক+ষ, ক+ষ+ণ, ক+ষ+ম | ষ+ণ

স | s & ʃ | স+ক, স+ট, স+প, স+ফ, স+ত, স+থ, স+খ, স+য, স+র, স+ল, ক+স | স+থ

হ | h | হ+ণ, হ+ন, হ+ম, হ+য, হ+র, হ+ল | হ+ম

ড় | ɽ | ড়+গ | -

ঢ় | ɽʱ | (not privileged enough to have clusters) | -

য | dʒ | The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word it is the vowel æ (as in শ্যামল, র‍্যাপার etc.), but after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য etc.). The র + য combination has two physical manifestations: র‍্য and র্য. Clusters: ক+য, স+য, র+য (র‍্য). [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala + a-kara, constitute a vowel sign representing the vowel অ্যা.] | -

র | r | Two manifestations: (i) রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র+ধ+ব) (earlier র্দ্ধ্ব = র+দ+ধ+ব, a four-term cluster) etc., placed over the following consonant; (ii) র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster, placed under the consonant it follows. | -

ল | l | ল+গ, ল+প, ল+ব, ল+ম, ল+ট, ল+ড, ল+ক, ল+দ, ল+য, গ+ল, ?+ল, ম+ল | -

ঃ | h word-finally; word-medially it doubles the pronunciation of the following consonant | অঃ, কঃ; অব | -


1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াvেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াwব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexamplexেসিরং(Tsering)

2CHcancomewithKhandaTainonlythecasewhereCisra(র)(09B0)

ৎ6 asinভৎ6 সনা

3OnlyfollowingcombinationswithVHCMwillbeallowedrarrঅ8া(togetherpronouncedasaelig)asinঅ8ািসড(acid)rarrএ8া(togetheralsopronouncedasaelig)asinএ8ািসডএ8ােসািসেয়শান

(acidassociation)

102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti

13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section

58

Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)

Table17ndashTheScriptTableofSylhetiNagarıorSiloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ Thisletterdoesnothaveanyparticularphoneticvaluebutmostlypronouncedasn

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব

Page 48: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

48

1. C is the set comprising C and CN, where CN is the set of normalized forms of ড়, ঢ় and য়.

2. H must be preceded by C.

Example:

3. M must be preceded by C. Example: কা

4. D must be preceded by V, C or M. Example: আখখা হা

5. X must be preceded by V, C, M or D. Example: উঃখঃবঃাঃ দ ঃ

6. B must be preceded by V, C, M or D. Example: আং ইং কং

7. Z must be preceded by V, C, M, D, B, X or P. Example: ইৎকৎাৎাৎপ6ৎrৎ (S is not listed because S ends with M; Z may also follow S.)

8. V CANNOT be preceded by H. Details in 7.1.1, Case of V Preceded by H.

9. S CANNOT be preceded by H.
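The precedence rules above lend themselves to a simple left-to-right check. The following Python sketch is illustrative only: it assumes the class letters stand for consonant (C), independent vowel (V), vowel sign (M), hasanta U+09CD (H), anusvara U+0982 (B), candrabindu U+0981 (D) and visarga U+0983 (X), covers only a hypothetical subset of the Bengali block, and leaves out Z, P and S (and therefore rules 7 and 9). The names `CLASS_OF`, `classify` and `violations` are made up for this example and are not taken from the proposal's XML file.

```python
import unicodedata

# Hypothetical class table covering part of the Bengali block; the authoritative
# repertoire and class definitions are in the proposal's XML file, not here.
CLASS_OF = {}
for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]:
    CLASS_OF[chr(cp)] = "V"  # independent vowels অ..ঋ, এ, ঐ, ও, ঔ
for cp in [*range(0x0995, 0x09A9), *range(0x09AA, 0x09B1), 0x09B2, *range(0x09B6, 0x09BA)]:
    CLASS_OF[chr(cp)] = "C"  # consonants ক..ন, প..র, ল, শ..হ
for cp in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC, 0x09D7]:
    CLASS_OF[chr(cp)] = "M"  # dependent vowel signs (matras)
CLASS_OF.update({"\u09CD": "H", "\u0982": "B", "\u0981": "D", "\u0983": "X"})

NUKTA = "\u09BC"
# Under NFC, ড়, ঢ় and য় become base + nukta; these bases give the CN set (rule 1).
NUKTA_BASES = {"\u09A1", "\u09A2", "\u09AF"}


def classify(label):
    """Return one class letter per unit of the NFC label (base + nukta is one C)."""
    classes, prev = [], ""
    for ch in unicodedata.normalize("NFC", label):
        if ch == NUKTA and prev in NUKTA_BASES:
            prev = ch  # absorb the nukta into the preceding C
            continue
        if ch not in CLASS_OF:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's repertoire")
        classes.append(CLASS_OF[ch])
        prev = ch
    return classes


# Rules 2-6: the classes allowed immediately before H, M, D, X and B.
ALLOWED_BEFORE = {"H": {"C"}, "M": {"C"}, "D": {"V", "C", "M"},
                  "X": {"V", "C", "M", "D"}, "B": {"V", "C", "M", "D"}}


def violations(label):
    """Return (position, class) pairs that break rules 2-6 or rule 8 (no H before V)."""
    classes = classify(label)
    found = []
    for i, cls in enumerate(classes):
        prev = classes[i - 1] if i > 0 else None
        if cls in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[cls]:
            found.append((i, cls))
        elif cls == "V" and prev == "H":
            found.append((i, cls))
    return found


print(violations("\u0985\u09CD"))  # অ্ : H after V -> [(1, 'H')]
```

Positions refer to the class sequence rather than the raw string, because a base consonant and its nukta collapse into a single C. A production validator would instead evaluate the whole-label rules defined in the LGR XML file.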

Now let us elaborate each rule with examples from the script, keeping in mind the Bangla, Assamese and Manipuri communities. Some combinations of characters may seem unrealistic or rare in usage, but there is no harm in adding such ligatures: any user can easily create them, even if they are not attested combinations.

7.1.1 Case of V Preceded by H

There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. ব্যাঙ্কঅফ্ইন্ডিয়া bæŋkʌv ɪndiə (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE) (meaning "Bank of India"). This is the case where two different words are joined together, the first of which ends with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large the form of the first word without an H (U+09CD) is considered sufficient to represent the intended sound of the first word.


This is a unique situation, necessitated by the absence of the hyphen, the space and the Zero Width Non-Joiner character from the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceptible similarity between two labels (with and without H) for the majority of the linguistic communities; hence it is explicitly prohibited by the NBGP. In future, depending on the prevailing requirements from the community, a future NBGP may consider revisiting this rule.
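The code points listed for the "Bank of India" example make the H + V junction easy to locate mechanically. The Python sketch below, whose helper names are hypothetical, finds every hasanta that directly precedes an independent vowel and builds the H-less form that, according to the text, is by and large considered sufficient. It is only an illustration, not a normalization step prescribed by the LGR; the independent-vowel set is an assumed subset of the Bengali block.

```python
# Code points of the "Bank of India" label, copied from the text above.
BANK_OF_INDIA = [0x09AC, 0x09CD, 0x09AF, 0x09BE, 0x0999, 0x09CD, 0x0995,  # ব্যাঙ্ক
                 0x0985, 0x09AB, 0x09CD,                                  # অফ্
                 0x0987, 0x09A8, 0x09CD, 0x09A1, 0x09BF, 0x09DF, 0x09BE]  # ইন্ডিয়া

HASANTA = 0x09CD
# Hypothetical independent-vowel set for this sketch: অ..ঋ, এ, ঐ, ও, ঔ.
VOWELS = {*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994}


def hasanta_before_vowel(cps):
    """Indices i where cps[i] is the hasanta and cps[i + 1] is an independent vowel."""
    return [i for i in range(len(cps) - 1)
            if cps[i] == HASANTA and cps[i + 1] in VOWELS]


def without_hasanta_before_vowel(cps):
    """Drop each hasanta that directly precedes an independent vowel."""
    drop = set(hasanta_before_vowel(cps))
    return [cp for i, cp in enumerate(cps) if i not in drop]


print(hasanta_before_vowel(BANK_OF_INDIA))  # [9]: the hasanta closing অফ্
print("".join(map(chr, without_hasanta_before_vowel(BANK_OF_INDIA))))
```

Only the hasanta at index 9 (the one ending অফ্) is flagged; the other three hasantas sit inside conjuncts and are untouched.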

7.2 Additional Examples from Bangla ABNF

Below are a few examples which help one understand some of the rules the ABNF puts in place. These are given for reference purposes only and are not meant to be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

্ক ्क

াক ाक

ংক ंक

ঁক ँक

ঃক ःक

As can be seen, each such combination automatically results in a "golu" or dotted circle, marking it as an invalid formation. This is an intrinsic property of the Indian-language syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M or S.

Example:

অ अ

অং क ক क কঃ कः ি )क

3. The number of B, D or X permitted after a consonant, a vowel or an akāra (mātrā) is restricted to one; thus the following combinations are invalid:

Example:

কংং कंं

কঁঁ कँँ

কঃঃ कःः

কাঁঁ काँँ

কীঃঃ कःः

অংং अंं

অঁঁ अँँ

অঃঃ अःः

4. The number of M permitted after a consonant is restricted to one. Example:

কীী कीी

5. M is not permitted after V. Example:

ইা ঈৗ ईा ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.

Example:

কংঃ कंः

কঃং कःं
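Taken together, rules 1-6 of this section amount to a handful of local patterns. The Python sketch below flags each of them. It is a hypothetical helper, not the proposal's own validator: the sign and vowel sets are an assumed subset of the Bengali block, rule 3 is read as forbidding the same sign twice in a row (rules 5 and 6 of Section 7.1 still let X or B follow D), and the class S from rule 2 is not modelled.

```python
import unicodedata

# Hypothetical sign and vowel sets for this sketch (not read from the LGR XML).
HASANTA, ANUSVARA, CANDRABINDU, VISARGA = "\u09CD", "\u0982", "\u0981", "\u0983"
SIGNS = {ANUSVARA, CANDRABINDU, VISARGA}  # B, D, X
MATRAS = {chr(cp) for cp in [*range(0x09BE, 0x09C5), 0x09C7, 0x09C8, 0x09CB, 0x09CC, 0x09D7]}
VOWELS = {chr(cp) for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}


def section_7_2_hits(label):
    """Return the Section 7.2 example rules (1-6) whose patterns occur in label."""
    s = unicodedata.normalize("NFC", label)  # e.g. ে + া composes to ো
    hits = []
    if s and (s[0] in SIGNS or s[0] in MATRAS or s[0] == HASANTA):
        hits.append(1)  # H, M, B, D or X at the start
    for a, b in zip(s, s[1:]):
        if b == HASANTA and (a in VOWELS or a in SIGNS or a in MATRAS):
            hits.append(2)  # H after V, B, D, X or M (S is not modelled here)
        if a == b and a in SIGNS:
            hits.append(3)  # the same B, D or X twice in a row
        if a in MATRAS and b in MATRAS:
            hits.append(4)  # two matras in a row
        if a in VOWELS and b in MATRAS:
            hits.append(5)  # matra after an independent vowel
        if {a, b} == {ANUSVARA, VISARGA}:
            hits.append(6)  # anusvara + visarga in either order
    return hits


print(section_7_2_hits("\u0995\u0982\u0983"))  # কংঃ -> [6]
```

Normalizing first matters: a user who types ে followed by া gets the single vowel sign ো under NFC, so the label is not wrongly reported under rule 4.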

8 Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts, Amity University Haryana, Gurgaon, Pachgaon, Manesar, PIN 122431 (Haryana), India
Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt. of Bangladesh
Prof. Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof. Rafiqul Islam, National Professor of Humanities, Dhaka
Prof. Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof. Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof. Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt. of Bangladesh
Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft, and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals' Systems, Dhaka

9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudrano Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata; New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. 'Bangla Harapher Panch Parba'. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and language planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue. Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia. Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot. Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia. Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. (1957). 'Srīhatta-Nāgarī Lipir Utpatti o Bikāś'. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia. Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. (2015). 'Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century'. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. (1960). 'The Phonemes of Bengali'. Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. (2007). Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. (1966). Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. (1960). A phonetic and phonological study of nasals and nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia. Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). (1942). Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. (1975). Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). (2014). Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. 'Bangla Spelling Reform: the Long and Short of It'. Bangla Journal 19: 215-232.

[152] Bangla Academy. (2012). Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. "What has happened So Far In terms of Script Reforms". Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the structures of the Formalism do not necessarily apply. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold 'varga' (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, though a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem 'Lichu Chor' written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning (as in য়াকুব). In very recent times, while transliterating some Chinese and Japanese names in Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C H can come with Khanda Ta (ৎ) only in the case where C is ra (র) (U+09B0), i.e. র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations of V, H, C, M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
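The three restrictions above can likewise be checked with simple string tests. In this Python sketch the set names and the function `appendix_i_hits` are hypothetical; the code points are the ones named in rules 1-3 (ৎ U+09CE, ঞ U+099E, ঙ U+0999, র U+09B0, and the sequences অ্যা and এ্যা), while the independent-vowel set is an assumed subset of the Bengali block.

```python
import unicodedata

# Code points named in rules 1-3 above; the set names are hypothetical.
KHANDA_TA, HASANTA, RA = "\u09CE", "\u09CD", "\u09B0"
NO_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}  # ৎ, ঞ, ঙ
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE",    # অ্যা
                "\u098F\u09CD\u09AF\u09BE"}    # এ্যা
VOWELS = {chr(cp) for cp in [*range(0x0985, 0x098C), 0x098F, 0x0990, 0x0993, 0x0994]}


def appendix_i_hits(label):
    """Return the Appendix I rules (1-3) that label appears to break."""
    s = unicodedata.normalize("NFC", label)
    hits = []
    if s[:1] in NO_INITIAL:
        hits.append(1)  # label starts with ৎ, ঞ or ঙ
    for i in range(len(s) - 2):
        if s[i + 1] == HASANTA and s[i + 2] == KHANDA_TA and s[i] != RA:
            hits.append(2)  # C + H + ৎ where C is not র
    for i in range(len(s) - 1):
        if s[i] in VOWELS and s[i + 1] == HASANTA and s[i:i + 4] not in ALLOWED_VHCM:
            hits.append(3)  # V + H outside the two permitted sequences
    return hits


print(appendix_i_hits("\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE"))  # ভর্ৎসনা -> []
```

Under rule 1 as stated, a transliteration such as ৎসেরিং is still flagged, because the label begins with ৎ.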

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

10.2 'Sylheti Nagarī lipi' or 'Siloti'

This version of the Bangla script resembles the 'Kaithī' script (ISO 15924 code: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, widely in use during the 1880s. There were several other names of Sylheti Nagarī or Siloti (129), such as 'Jalalabada Nagarī', 'Fula (flower) Nagarī', 'Muslim Nagarī' or 'Muhammad Nagarī'. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to the Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with 32 symbols, are reproduced here (138).

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points

The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points
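Since none of the pairs in Tables 18-22 is defined as a variant, they are documentation rather than LGR data. If one wanted to keep them alongside tooling, a lookup such as the following Python sketch would serve; the dictionary layout and the function names `describe` and `confusables_of` are assumptions, while the code points and decisions are transcribed from the tables. The standard `unicodedata.name` function is used only to spell out each character.

```python
import unicodedata

# Pairs transcribed from Tables 18-22. None of them is a variant in the LGR;
# this dictionary is only an illustrative lookup, not part of the XML file.
CROSS_SCRIPT_PAIRS = {
    ("\u0983", "\u0903"): "Confusable",       # Table 18, Devanagari
    ("\u0993", "\u0909"): "Confusable",
    ("\u0998", "\u0918"): "Confusable",
    ("\u0981", "\u0945"): "Confusable",
    ("\u0998", "\u0A2C"): "Confusable",       # Table 19, Gurmukhi
    ("\u0981", "\u0A71"): "Confusable",
    ("\u0993", "\u0A24"): "Distinguishable",  # Table 20, Gurmukhi
    ("\u09B6", "\u0A05"): "Distinguishable",
    ("\u09AE", "\u0A2E"): "Distinguishable",
    ("\u09AC\u09BE", "\u0A17"): "Distinguishable",
    ("\u0993", "\u0B13"): "Confusable",       # Table 21, Oriya
    ("\u0998", "\u0B38"): "Distinguishable",  # Table 22, Oriya
}


def describe(pair):
    """Spell out both sides of a pair by Unicode character name, plus the NBGP decision."""
    names = [" + ".join(unicodedata.name(ch) for ch in side) for side in pair]
    return f"{names[0]} vs {names[1]}: {CROSS_SCRIPT_PAIRS[pair]}"


def confusables_of(bangla):
    """Return the non-Bangla sequences recorded as confusable with a Bangla sequence."""
    return [other for (bn, other), decision in CROSS_SCRIPT_PAIRS.items()
            if bn == bangla and decision == "Confusable"]


for pair in CROSS_SCRIPT_PAIRS:
    print(describe(pair))
```

For example, `confusables_of("\u0998")` returns the Devanāgarī घ and the Gurmukhi ਬ, but not the Oriya ସ, which the NBGP judged distinguishable.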


11 Appendix-II: Bengali consonants and their allographs

Consonants | Phonetic Value | Allographs: Clusters | Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ: ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts ঙ is replaced by ং, e.g. কং, অং)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s, ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word, it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক8(p+য)স8(+য)র 8(+য) [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala + a-kara, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:
(i) রেফ (repʰ) as the first member of a cluster, e.g. প6ৎ6 not6 য6 D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant)
(ii) র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃ কঃ

অব

Page 49: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

49

This isauniquesituationnecessitatedbythelackofhyphenspaceortheZeroWidthNon-joinercharacterinthepermissiblesetofcharactersintheRootzonerepertoire Otherwise V is never required to be allowed to follow an HPermittingthismaycreateaperceptivesimilaritybetweentwolabels(withandwithout H) for majority of the linguistic communities hence this is explicitlyprohibitedbytheNBGPIn future if required depending on the prevailing requirements from thecommunitythefutureNBGPmayconsiderrevisitingthisrule

72 AdditionalExamplesfromBanglaABNF

Belowarea fewexampleswhichhelponeunderstandsomeof the rulesABNFputsinplaceThesearejustgivenforreferencepurposesandarenotmeanttobecomprehensive

1HMBDorXcannotoccurinthebeginningofaBanglawordExample

ক क

াক ाक

ংক क

ক क

ঃক ःकAscanbeseensuchcombinationwillresultautomaticallyinaldquogolurdquooradottedcircle marking it as an invalid formation This is an intrinsic property of theIndianlanguagesyllableandisquasi-automaticallyappliedwhereversupportedbytheOS

2HisnotpermittedafterVBDXMS

Example

অ अ

অং क ক क কঃ कः ি )क

3NumberofBDorXpermittedafterConsonantorVowelorakāra(Mātrā)isrestrictedtoonethusthefollowingcombinationsareinvalidated

50

Example

কংং क

ক क

কঃঃ कःः

কা का

কীঃঃ कःः

অংং अ

অ अ

অঃঃ अःः

4NumberofMpermittedafterConsonantisrestrictedtooneExample

কীী की5MisnotpermittedafterV Example

ইাঈৗ ईाईौ6ThecombinationsofAnusvāra+ VisargaaswellasVisarga+Anusvāraarenotpermissible

Example

কংঃ कः

কঃং कः

8 Contributors

81ExpertsfromIndia ProfessorUdayaNarayanaSinghChair-ProfessorofLinguisticsampDeanFacultyofArtsAmityUniversityHaryanaGurgaonPachgaonManesarPIN122431(Haryana)India

51

ProfessorPabitraSarkarformerlyVice-ChancellorRabindraBharatiUniversityKolkataDrAtiurRahmanKhanPrincipalTechnicalOfficerGISTGroupC-DACPunePIN411008(Maharashtra)IndiaMrRajibChakrabortyLinguistSocietyforNaturalLanguageTechnologyResearch(SNLTR)Module114amp130SDFBuildingSaltLakeSector-VKolkata-700091(WestBengal)IndiaMrAkshatJoshiProjectEngineerGISTGroupC-DACPunePIN411008(Maharashtra)IndiaMsMoumitaChowdhurySeniorTechnicalOfficerGISTGroupC-DACPunePIN411008(Maharashtra)IndiaMrChandrakantaMurasinghAgartalaTripuraSomeotherNBGPmembers

82ContributorsfromBangladeshJanabMustafaJabbarHonorableMinisterMinistryofPostsTelecommunicationsampInformationTechnologyGovtofBangladeshProfShamsuzzamanKhanFormerDirector-GeneralBanglaAcademyDhakaProfRafiqulIslamNationalProfessorofHumanitiesDhakaProfSwarochisSarkarDirectorInstituteofBangladeshStudiesRajshahiUniversityRajshahiBangladeshProfJinnatImtiazAliDirector-GeneralInternationalMotherLanguageInstituteDhakaMrMohammadMamunOrRashidDepartmentofBanglaJahangirnagarUniversityampMemberBangladeshComputerCouncilProfManiruzzamanformerlyProfessorChittagongUniversityChattagramBangladeshMrShyamSunderSikderSecretarySecretaryPostampTelecommunicationsDivisionGovtofBangladesh


Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md. Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof. Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md. Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md. Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore, Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka


9. References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan o Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti o Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and language planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘Phones of Bengali’. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A phonetic and phonological study of nasals and nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core Specification. Chapter 12, p. 473.


10. Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature. When it is applied to a specific language or script, certain restriction rules apply. In other words, some of the Formalism's structures do not necessarily apply to a given language. Restriction rules are set in place to take care of such cases, and these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa Ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, but a couple of native examples can be cited. One is the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of য় being used in names, mostly of foreign origin. For example, Yaqub may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, transliterations of some Chinese and Japanese names into Bangla have introduced the possibility of Khaṇḍa Ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C H can combine with Khaṇḍa Ta (ৎ) only in the case where C is ra (র, U+09B0).

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association)
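The three restrictions above can likewise be sketched as a check over Unicode code points. This is a hypothetical helper, not part of the proposal's XML file. It assumes Unicode input, treats any independent vowel followed by halant as acceptable only when it begins অ্যা or এ্যা, and does not model the য় discussion in rule 1, which the text leaves open.

```python
# Hypothetical sketch of the Appendix-I restrictions; names are illustrative.
HALANT = "\u09CD"      # H: hasanta
KHANDA_TA = "\u09CE"   # ৎ
RA = "\u09B0"          # র
NO_INITIAL = {KHANDA_TA, "\u099E", "\u0999"}  # ৎ, ঞ, ঙ may not start a label
VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")  # V (assumed subset)
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # অ্যা, এ্যা


def appendix_i_violations(label: str) -> list[int]:
    """Return the numbers of Appendix-I rules 1-3 that `label` breaks."""
    broken = set()
    if label and label[0] in NO_INITIAL:
        broken.add(1)  # forbidden label-initial letter
    for i, ch in enumerate(label):
        # Rule 2: C + H + ৎ is only allowed when C is র (as in ভর্ৎসনা).
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == HALANT and label[i - 2] != RA:
            broken.add(2)
        # Rule 3: V + H must begin one of the two permitted V H C M sequences.
        if ch in VOWELS and label[i + 1:i + 2] == HALANT and label[i:i + 4] not in ALLOWED_VHCM:
            broken.add(3)
    return sorted(broken)


if __name__ == "__main__":
    for word in ["\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE",        # ভর্ৎসনা
                 "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1",        # অ্যাসিড
                 "\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"]:             # ৎসেরিং
        print(word, appendix_i_violations(word))
```

With these checks, ভর্ৎসনা and অ্যাসিড pass, while ৎসেরিং is flagged under rule 1. This shows that the transliteration case mentioned in rule 1 would still be rejected by the restriction as stated.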

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’

This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924), which was used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar and was widely in use during the 1880s. Sylheti Nagarī or Siloti (129) had several other names, such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ and ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138). Others have suggested that it was an invention of the Afghan rulers of Sylhet (137), and some ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points

The following code points were analysed. The NBGP concluded that each is either (a) distinguishable or (b) confusable, but not confusable enough to be defined as a variant code point.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points
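The confusable pairs in Tables 18, 19 and 21 are deliberately not turned into variants. A reviewer could still use them as data, for instance to list which characters of a Bangla label have a documented look-alike in a neighbouring script. The sketch below is hypothetical tooling built on that assumption. The dictionary simply transcribes the three tables, and the report format is our own.

```python
# Confusable (not variant) pairs transcribed from Tables 18, 19 and 21.
CONFUSABLE = {
    "\u0983": [("Devanagari", "\u0903")],                       # ঃ ~ ः
    "\u0993": [("Devanagari", "\u0909"), ("Oriya", "\u0B13")],  # ও ~ उ, ଓ
    "\u0998": [("Devanagari", "\u0918"), ("Gurmukhi", "\u0A2C")],  # ঘ ~ घ, ਬ
    "\u0981": [("Devanagari", "\u0945"), ("Gurmukhi", "\u0A71")],  # ঁ ~ ॅ, ੱ
}


def confusable_report(label: str) -> list[str]:
    """List each character of `label` that has a documented cross-script look-alike."""
    lines = []
    for ch in label:
        for script, other in CONFUSABLE.get(ch, []):
            lines.append(f"U+{ord(ch):04X} ~ {script} U+{ord(other):04X}")
    return lines


if __name__ == "__main__":
    print(confusable_report("\u0998\u09B0"))  # ঘর
```

Because these pairs are below the variant threshold, a report like this could inform a human review, but it would not block or bundle labels.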


11. Appendix-II: Bengali consonants and their allographs

Consonants   Phonetic Value   Allographs (Clusters; Transparent Form in Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত (ত + ্ = ৎ): ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but it is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙ, ং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts ঙ is replaced by ং, as in কং, অং.)

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (not privileged enough to have clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When it is added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it only doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক8(p+য)স8(+য)র 8(+য) [র‍্য on its own is never used in Bangla orthography. র‍্যা is used, but its last two symbols, ya-phala and a-kara, together constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:
i. রেফ (repʰ), as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant)

ii. র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব
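In Unicode text, the distinctions drawn above between রেফ, র-ফলা and য-ফলা, including the two manifestations of র + য, come down to the choice of code-point sequence around the hasanta. This table was prepared with the Bangla Akademi font, so the sketch below reflects our reading rather than the NBGP's. It assumes Unicode input and classifies each hasanta by the visual form it produces. Following our understanding of the Unicode Standard's Bengali section, র + ্ + য gives reph over য, while র + ZWJ + ্ + য gives র with ya-phala. Whether ZWJ is admissible in a root-zone label is for the repertoire to decide, not this helper. The "initial" test only looks at the letter immediately before the hasanta, which is a simplification.

```python
# Hypothetical helper; constants name Unicode code points, not font glyphs.
RA, YA, HALANT, ZWJ = "\u09B0", "\u09AF", "\u09CD", "\u200D"


def cluster_forms(word: str) -> list[tuple[int, str]]:
    """Classify each hasanta in `word` by the visual form it produces (sketch)."""
    forms = []
    for i, ch in enumerate(word):
        if ch != HALANT or i == 0 or i + 1 >= len(word):
            continue  # not a hasanta between two characters
        left, right = word[i - 1], word[i + 1]
        if left == ZWJ and i >= 2 and word[i - 2] == RA and right == YA:
            forms.append((i, "ya-phala on ra (ZWJ form)"))  # র‍্য
        elif left == RA:
            forms.append((i, "reph"))  # র্য, র্ৎ, ...
        elif right == RA:
            forms.append((i, "ra-phala"))  # প্র, ...
        elif right == YA:
            where = "initial" if i == 1 else "non-initial"
            forms.append((i, f"ya-phala ({where} consonant)"))
    return forms


if __name__ == "__main__":
    for word in ["\u09B6\u09CD\u09AF\u09BE\u09AE\u09B2",          # শ্যামল
                 "\u0995\u09BE\u09B0\u09CD\u09AF",                  # কার্য
                 "\u09B0\u200D\u09CD\u09AF\u09BE\u09AA\u09BE\u09B0"]:  # র‍্যাপার
        print(word, cluster_forms(word))
```

On the examples from the য row, শ্যামল shows a ya-phala on an initial consonant, কার্য a reph over য, and র‍্যাপার the ZWJ form of ya-phala on র.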

Page 50: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

50

Example

কংং क

ক क

কঃঃ कःः

কা का

কীঃঃ कःः

অংং अ

অ अ

অঃঃ अःः

4NumberofMpermittedafterConsonantisrestrictedtooneExample

কীী की5MisnotpermittedafterV Example

ইাঈৗ ईाईौ6ThecombinationsofAnusvāra+ VisargaaswellasVisarga+Anusvāraarenotpermissible

Example

কংঃ कः

কঃং कः

8 Contributors

81ExpertsfromIndia ProfessorUdayaNarayanaSinghChair-ProfessorofLinguisticsampDeanFacultyofArtsAmityUniversityHaryanaGurgaonPachgaonManesarPIN122431(Haryana)India

51

ProfessorPabitraSarkarformerlyVice-ChancellorRabindraBharatiUniversityKolkataDrAtiurRahmanKhanPrincipalTechnicalOfficerGISTGroupC-DACPunePIN411008(Maharashtra)IndiaMrRajibChakrabortyLinguistSocietyforNaturalLanguageTechnologyResearch(SNLTR)Module114amp130SDFBuildingSaltLakeSector-VKolkata-700091(WestBengal)IndiaMrAkshatJoshiProjectEngineerGISTGroupC-DACPunePIN411008(Maharashtra)IndiaMsMoumitaChowdhurySeniorTechnicalOfficerGISTGroupC-DACPunePIN411008(Maharashtra)IndiaMrChandrakantaMurasinghAgartalaTripuraSomeotherNBGPmembers

82ContributorsfromBangladeshJanabMustafaJabbarHonorableMinisterMinistryofPostsTelecommunicationsampInformationTechnologyGovtofBangladeshProfShamsuzzamanKhanFormerDirector-GeneralBanglaAcademyDhakaProfRafiqulIslamNationalProfessorofHumanitiesDhakaProfSwarochisSarkarDirectorInstituteofBangladeshStudiesRajshahiUniversityRajshahiBangladeshProfJinnatImtiazAliDirector-GeneralInternationalMotherLanguageInstituteDhakaMrMohammadMamunOrRashidDepartmentofBanglaJahangirnagarUniversityampMemberBangladeshComputerCouncilProfManiruzzamanformerlyProfessorChittagongUniversityChattagramBangladeshMrShyamSunderSikderSecretarySecretaryPostampTelecommunicationsDivisionGovtofBangladesh

52

MrMdMustafaKamalFormerDirectorGeneralBangladeshTelecommunicationsRegulatoryAuthorityGovernmentofBangladeshDhakaBrigadierGeneralMdMahfuzulKarimMajumderDirector-GeneralEngineeringampOperationsDivisionBangladeshTelecommunicationsRegulatoryAuthorityGovernmentofBangladeshDhakaMdZiarulIslamProgrammerPostsampTelecommunicationsDivisionGovernmentofBangladeshDhakaProfSyedShahriyarRahmanDepartmentofLinguisticsUniversityofDhakaDrMizanurRahmanDirector(In-Charge)TranslationTextBookandInternationalRelationsDivisionBanglaAcademyDhakaDrApareshBandyopadhyayDirectorBanglaAcademyDhakaMrMdMobarakHossainDirectorBanglaAcademyDhakaDrJalalAhmedDirectorBanglaAcademyDhakaMrJahangirHossainInternetSocietyBangladesh(ICANNALS)JanabSarwarMostafaChoudhuryBangladeshComputerCouncilDhakaJanabMdRashidWasifBangladeshComputerCouncilDhakaJanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka

53

9 References[100] UnicodeConsortium2017UnicodeStandard100MountainViewCA

[101] BandyopadhyayChittaranjan1981DuiShatakerBanglaMudranoPrakashanKolkataAnandaPublishers

[102] BanerjiRD1919TheOriginoftheBengaliScriptKolkataNewDelhiAsianEducationalServices2003reprint

[103] ChatterjiSK1926TheOriginandDevelopmentoftheBengaliLanguageCalcuttaCalcuttaUniversityPress

[104] -----1939Bhasha-prakashBangalaVyakaran(AGrammaroftheBengaliLanguage)CalcuttaUniversityofCalcutta

[105] HaiMuhammadAbdul1964DhvaniVijnanOBanglaDhvani-tattwa(PhoneticsandBengaliPhonology)DhakaBanglaAcademy

[106]JhaSubhadra1958TheFormationofMaithiliLondonLuzacampCo

[107] KosticDjordjeDasRheaS1972AShortOutlineofBengaliPhoneticsCalcuttaStatisticalPublishingCompany

[108] MajumdarRC1971HistoryofAncientBengalCalcuttaGBhardwaj

[109] MazumdarBijaychandra19202000TheHistoryoftheBengaliLanguage(ReprCalcutta1920ed)NewDelhiAsianEducationalServices

[110] PandeyAnshuman2001ProposaltoEncodetheTirhutaScriptinISOIEC10646

[111] PalPalashBaran2001DhwanimalaBarnamalaKolkataPapyrus

[112] -----2007lsquoBanglaHarapherPanchParbarsquoInSwapanChakrabortyedMudranerSanskritiOBanglaBoiKolkataAbabhas

[113] RossFiona1999ThePrintedBengaliCharacteranditsEvolutionLondonCurzon

[114] ShastriMahamahopadhyayHaraPrasad1916HājārBacharērPurāṇaBāṅgālāBhāṣāyBauddhaGānōDōhāCalcuttaBangiyaSahityaParishat

[115] SinghUdayaNarayana(JointlyManiruzzaman)1983DiglossiainBangladeshandlanguageplanningCalcuttaGyanBharati214pp

[116] -----1987ABibliographyofBengaliLinguisticsMysoreCIILxii+316pp

[117] -----2017(withRajibChakrabortyBidishaBhattacharjeeampArimardanKumarTripathy)LanguagesandCulturesontheMarginGuidelinesforFieldworkonEndangeredLanguagesMimeoCentreforEndangeredLanguagesVisva-Bharati

54

[118] -----1980ScriptalchoiceandspellingreformAnessayinlanguageandplanningJournaloftheMSUniversityofBarodaSocialScienceNumber292173-186AmodifiedversionreprintedEAnnamalaiBjornJernuddandJoanRubinedsLanguagePlanningProceedingsofanInstituteMysoreCIIL405-417

[119] Sripantha1996JakhanChapakhanaEloKolkataPaschim-BangaBanglaAcademy

[120] SurAtul1986BanglaMudranerDushoBacharKolkataJijnasa

[121] ScriptBehaviourforBengaliVersion11TDILandC-DACPune

[122] BoraMahendra1981TheEvolutionofAssameseScriptJorhatAssamSahityaSabha

[123] ProposaltoEncodetheTirhutaScriptinISOIEC10646httpwwwunicodeorgL2L201111175r-tirhutapdfaccessedon25112017

[124] EthnologueAssameseintheLanguageCloudhttpswwwethnologuecomcloudasmaccessedon25112017

[125] BengalialphabetforManipurifoundinEthnologueManipuri(MeeteilonMeithei)httpswwwomniglotcomwritingmanipurihtmaccessedon20102019

[126] WikipediaBengalialphabethttpsenwikipediaorgwikiBengali_alphabetaccessedon25112017

[129]OmniglotSlyhetihttpwwwomniglotcomwritingsylotihtmaccessedon1052018

[130]WikipediaBishnupriyaManipurilanguagehttpsenwikipediaorgwikiBishnupriya_Manipuri_languageaccessedon25112017

[131] TheEMILLECIILCorpushttpmetashareeldaorgrepositorybrowsethe-emilleciil-corpusabdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20accessedon1052018

[132]TheEMILLECIILCorpushttpcatalogelrainfoproduct_infophpproducts_id=696accessedon1052018

[133] BanglaLanguageampScript httpswwwisicalacin~rc_banglabanglahtmlaccessedon1052018

[134] SarkarPabitra1992BanglaBananSanskarSamasyaoSambhabanaKolkataChirayataPrakashan

[135] SarkarPabitra1993BanglaBhasharYuktabyanjanBhasha1123-45

55

[136] DashNiladriShekharandBBChaudhuri1998BanglaScriptAStructuralStudyLinguisticsToday121-28Alsoavailableathttpswwwacademiaedu9967428Bangla_Script_A_Structural_Study

[137] DaniAhmedHasan(1957)lsquoSrīhatta-NāgarīLipirUtpattioBikāśrsquoBanglaAcademyPatrika(Dhaka)Vol12(Bhadra-Agrahayan1364BangabdaNumber)pg1

[138] WikipediaSylhetiNagari httpsenwikipediaorgwikiSylheti_Nagariaccessedon1952018

[139] FuruiRyosuke(2015)lsquoVariegatedAdaptationsStateFormationinBengalfromtheFifthtoSeventhCenturyrsquoinBhairabiPrasadSahuampHermannKulkeedsInterrogatingPoliticalSystemsIntegrativeProcessesandStatesinPre-ModernIndiaChapter9Pp255-73NewDelhiManohar

[140] FergusonCharesAandMunierChowdhury(1960)lsquoPhonesofBengalirsquoLanguageVol36No1pp22-59

[141] ShahidullahMuhammad(2007)BuddhistMysticSongsDhakaMowlaBrothers

[142] RayPunyaSloka(1966)BengaliLanguageHandbookWashington

[143] HaiMuhammadAbdul(1960)AphoneticandphonologicalstudyofnasalsandnasalizationinBengaliDhakaUniversityofDhaka

[144] UnicodeConsortiumProposalSummaryFormtoAccompanySubmissionsforAdditionstotheRepertoireofISOIEC10646UNICODEhttpswwwunicodeorgL2L200202387r-syloti-formpdfaccessedonMay212018

[145] WikipediaOlChiki(Unicodeblock)httpsenwikipediaorgwikiOl_Chiki_(Unicode_block)accessedonMay212018

[146]BanglaScripthttpwwwbangladesh2000combdbangla_scripthtmlaccessedonMay212018

[147] BhattacharyaAshutoshed(1942)GopichandrerGanCalcuttaCalcuttaUniversity

[149] DasSisirKumar(1975)SahibsandMunshisAnAccountoftheCollegeofFortWilliamCalcutta

[150] IslamRafiqulPabitraSarkarMahbubulHaqampRajibChakraborty(eds)(2014)BanglaAcademyPromitoBanglaByabaharikByakaran(AFunctionalGrammarofStandardBangla)DhakaBanglaAcademy

[151] SarkarPabitra[2013]lsquoBanglaSpellingReformtheLongandShortofItrsquoBanglaJournal19215-232

[152] BanglaAcademy(2012)BanglaAcademyPromitoBanglaBananerNiyam(StandardBanglaSpellingasadoptedbyBanglaAcademy)DhakaBanglaAcademy

56

[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018

[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473

57

10 Appendix-I

101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply

1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াvেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াwব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexamplexেসিরং(Tsering)

2CHcancomewithKhandaTainonlythecasewhereCisra(র)(09B0)

ৎ6 asinভৎ6 সনা

3OnlyfollowingcombinationswithVHCMwillbeallowedrarrঅ8া(togetherpronouncedasaelig)asinঅ8ািসড(acid)rarrএ8া(togetheralsopronouncedasaelig)asinএ8ািসডএ8ােসািসেয়শান

(acidassociation)

102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti

13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section

58

Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)

Table17ndashTheScriptTableofSylhetiNagarıorSiloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ Thisletterdoesnothaveanyparticularphoneticvaluebutmostlypronouncedasn

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব


Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata
Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research (SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West Bengal), India
Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra), India
Mr Chandrakanta Murasingh, Agartala, Tripura
Some other NBGP members

8.2 Contributors from Bangladesh
Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications & Information Technology, Govt of Bangladesh
Prof Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka
Prof Rafiqul Islam, National Professor of Humanities, Dhaka
Prof Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University, Rajshahi, Bangladesh
Prof Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka
Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University & Member, Bangladesh Computer Council
Prof Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh
Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt of Bangladesh


Mr Md Mustafa Kamal, Former Director General, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering & Operations Division, Bangladesh Telecommunications Regulatory Authority, Government of Bangladesh, Dhaka
Md Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of Bangladesh, Dhaka
Prof Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka
Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International Relations Division, Bangla Academy, Dhaka
Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka
Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka
Dr Jalal Ahmed, Director, Bangla Academy, Dhaka
Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)
Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka
Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka
Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications Regulatory Authority, Dhaka
Ms Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink and ICANN Fellow
Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum
Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software & Information Services (BASIS)
Ms Shahida Khatun, Director, Folklore, Museum & Archive Division, Bangla Academy, Dhaka
Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka
Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka


9 References

[100] Unicode Consortium. 2017. The Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan o Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje, and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2011. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti o Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and language planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.


[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC, Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.


[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka), Vol. 1(2) (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language 36(1): 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A phonetic and phonological study of nasals and nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646/UNICODE. https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on May 21, 2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on May 21, 2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on May 21, 2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. 2013. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far in terms of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core Specification, Chapter 12, p. 473.


10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature; when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the structures the Formalism permits do not necessarily occur. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, for example the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH (a consonant followed by halant) can precede Khanda Ta (ৎ) only in the case where C is ra (র, U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM (vowel + halant + consonant + matra) will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid);
→ এ্যা (together also pronounced as æ), as in এ্যাসিড, এ্যাসোসিয়েশান (acid, association).
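The three restriction rules above can be expressed as simple checks on a label's code point sequence. The sketch below is hypothetical: it is not the normative whole-label evaluation syntax of the accompanying XML file, it assumes the label is already an NFC string of Bengali code points, and it reads rule 3 as "an independent vowel may be followed by the halant only in অ্যা or এ্যা".

```python
from __future__ import annotations

# Hypothetical sketch of the Appendix-I restriction rules as label checks.
# Assumes NFC input made of Bengali code points; not the LGR's normative rules.

VIRAMA = "\u09CD"      # ্ halant (H)
KHANDA_TA = "\u09CE"   # ৎ
RA = "\u09B0"          # র
YA = "\u09AF"          # য
AA_SIGN = "\u09BE"     # া
NO_LABEL_START = {KHANDA_TA, "\u099E", "\u0999"}  # ৎ, ঞ, ঙ (rule 1)
VHCM_VOWELS = {"\u0985", "\u098F"}                # অ, এ (rule 3)


def is_independent_vowel(ch: str) -> bool:
    """True for Bengali independent vowel letters (অ .. ঔ, ৠ, ৡ)."""
    cp = ord(ch)
    return 0x0985 <= cp <= 0x0994 or cp in (0x09E0, 0x09E1)


def violations(label: str) -> list[str]:
    """Describe every rule 1-3 violation found in the label."""
    problems: list[str] = []
    # Rule 1: ৎ, ঞ and ঙ may not begin a label.
    if label and label[0] in NO_LABEL_START:
        problems.append(f"rule 1: label starts with {label[0]}")
    for i, ch in enumerate(label):
        # Rule 2: consonant + halant may precede ৎ only when the consonant is র.
        if ch == KHANDA_TA and i >= 2 and label[i - 1] == VIRAMA and label[i - 2] != RA:
            problems.append(f"rule 2: {label[i - 2]} + halant before ৎ at index {i}")
        # Rule 3: vowel + halant is allowed only as অ্যা or এ্যা (V H C M).
        if is_independent_vowel(ch) and label[i + 1:i + 2] == VIRAMA:
            if ch not in VHCM_VOWELS or label[i + 2:i + 4] != YA + AA_SIGN:
                problems.append(f"rule 3: disallowed vowel + halant at index {i}")
    return problems


if __name__ == "__main__":
    for word in ["\u09AD\u09B0\u09CD\u09CE\u09B8\u09A8\u09BE",   # ভর্ৎসনা
                 "\u0985\u09CD\u09AF\u09BE\u09B8\u09BF\u09A1",   # অ্যাসিড
                 "\u09CE\u09B8\u09C7\u09B0\u09BF\u0982"]:        # ৎসেরিং
        print(word, violations(word) or "ok")
```

In this sketch ৎসেরিং is reported under rule 1 even though the text notes that such spellings now occur in transliterated names; the check simply applies the rule as stated.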

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names for Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal had brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist Bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17 – The Script Table of Sylheti Nagarī or Siloti

10.3 Confusable code points
The following code points were analysed and found to be either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP Decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points


10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP Decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP Decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP Decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP Decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points
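Since these pairs are deliberately not defined as variants, they can at most inform manual review. The sketch below is a hypothetical review aid, not anything defined by the LGR: it records Tables 18 to 22 as data, uses Python's standard `unicodedata` module to print the character names, and lists the pairs judged Confusable whose Bangla side occurs in a given label.

```python
from __future__ import annotations

import unicodedata

# (Bangla sequence, other-script code point, NBGP decision), transcribed from Tables 18-22.
PAIRS = [
    ("\u0983", "\u0903", "Confusable"),        # Devanagari (Table 18)
    ("\u0993", "\u0909", "Confusable"),
    ("\u0998", "\u0918", "Confusable"),
    ("\u0981", "\u0945", "Confusable"),
    ("\u0998", "\u0A2C", "Confusable"),        # Gurmukhi (Tables 19 and 20)
    ("\u0981", "\u0A71", "Confusable"),
    ("\u0993", "\u0A24", "Distinguishable"),
    ("\u09B6", "\u0A05", "Distinguishable"),
    ("\u09AE", "\u0A2E", "Distinguishable"),
    ("\u09AC\u09BE", "\u0A17", "Distinguishable"),
    ("\u0993", "\u0B13", "Confusable"),        # Oriya (Tables 21 and 22)
    ("\u0998", "\u0B38", "Distinguishable"),
]


def describe(seq: str) -> str:
    """Join the Unicode character names of a code point sequence."""
    return " + ".join(unicodedata.name(ch) for ch in seq)


def confusable_partners(label: str) -> list[tuple[str, str]]:
    """Pairs marked Confusable whose Bangla side occurs in the label."""
    return [(bn, other) for bn, other, decision in PAIRS
            if decision == "Confusable" and bn in label]


if __name__ == "__main__":
    for bn, other, decision in PAIRS:
        print(f"{describe(bn)}  vs  {describe(other)}: {decision}")
```

The sketch only reports matches; it takes no position on whether a label should be allocated.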


11 Appendix-II: Bengali consonants and their allographs

Consonants  Phonetic Value  Allographs  Clusters  Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত, namely ৎ (khaṇḍa ta); with reph it appears as র্ৎ.

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (not privileged enough to have clusters as a first member) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (not privileged enough to have clusters as a first member) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value, but it is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts the ঙ্ form is replaced by ং, e.g. কং, অং.)

) (H+ক) (H+গ) U (H+ঘ)
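The Transparent Form column spells each cluster as a "+"-joined list of its members (some members were lost in extraction above). Assuming the usual Unicode model in which the members of a Bengali conjunct are joined by the virama (U+09CD), a transparent form can be turned into the code point sequence a label would actually contain, and back. The function names below are hypothetical, and the sketch deliberately ignores vowel signs and nukta forms.

```python
from __future__ import annotations

# Hypothetical helpers: convert between a cluster's member list (as in the
# "Transparent Form" column) and its Unicode encoding with virama joins.

VIRAMA = "\u09CD"  # ্ BENGALI SIGN VIRAMA


def is_consonant(ch: str) -> bool:
    """Rough test for the base Bengali consonant range ক (U+0995) .. হ (U+09B9)."""
    return 0x0995 <= ord(ch) <= 0x09B9


def compose(members: list[str]) -> str:
    """Join cluster members with the virama, e.g. দ + ধ + ব -> দ্ধ্ব."""
    if len(members) < 2 or not all(len(m) == 1 and is_consonant(m) for m in members):
        raise ValueError("a cluster needs two or more single consonant letters")
    return VIRAMA.join(members)


def decompose(cluster: str) -> list[str]:
    """Split a virama-joined cluster back into its member consonants."""
    return cluster.split(VIRAMA)


if __name__ == "__main__":
    for members in (["\u09AA", "\u09A4"],                       # প + ত
                    ["\u09A6", "\u09A7", "\u09AC"],             # দ + ধ + ব
                    ["\u09B0", "\u09A6", "\u09A7", "\u09AC"]):  # র + দ + ধ + ব
        cluster = compose(members)
        print(" + ".join(members), "->", cluster, [f"U+{ord(c):04X}" for c in cluster])
```

Composing র + দ + ধ + ব this way yields U+09B0 U+09CD U+09A6 U+09CD U+09A7 U+09CD U+09AC, which a font renders with reph over দ্ধ্ব.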

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব

Page 52: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

52

MrMdMustafaKamalFormerDirectorGeneralBangladeshTelecommunicationsRegulatoryAuthorityGovernmentofBangladeshDhakaBrigadierGeneralMdMahfuzulKarimMajumderDirector-GeneralEngineeringampOperationsDivisionBangladeshTelecommunicationsRegulatoryAuthorityGovernmentofBangladeshDhakaMdZiarulIslamProgrammerPostsampTelecommunicationsDivisionGovernmentofBangladeshDhakaProfSyedShahriyarRahmanDepartmentofLinguisticsUniversityofDhakaDrMizanurRahmanDirector(In-Charge)TranslationTextBookandInternationalRelationsDivisionBanglaAcademyDhakaDrApareshBandyopadhyayDirectorBanglaAcademyDhakaMrMdMobarakHossainDirectorBanglaAcademyDhakaDrJalalAhmedDirectorBanglaAcademyDhakaMrJahangirHossainInternetSocietyBangladesh(ICANNALS)JanabSarwarMostafaChoudhuryBangladeshComputerCouncilDhakaJanabMdRashidWasifBangladeshComputerCouncilDhakaJanabIstiaqueArifSeniorAssistantDirectorBangladeshTelecommunicationsRegulatoryAuthorityDhakaMsAfifaAbbasInformationSecurityandGovernanceLeadEngineeratBanglalinkandICANNFellowMr Mohammad Abdul Haque Secretary General Bangladesh Internet GovernanceForumMrImranHossenCEOEyeSoftandkeymemberofBangladeshAssociationofSoftwareampInformationServices(BASIS)MsShahidaKhatunDirectorFolkloreMuseumampArchiveDivisionBanglaAcademyDhakaMrSyedAshikRehmanCEOBengalMediaCorporationDhakaMrHaseebRahmanCEOProfessionalsrsquoSystemsDhaka

53

9 References[100] UnicodeConsortium2017UnicodeStandard100MountainViewCA

[101] BandyopadhyayChittaranjan1981DuiShatakerBanglaMudranoPrakashanKolkataAnandaPublishers

[102] BanerjiRD1919TheOriginoftheBengaliScriptKolkataNewDelhiAsianEducationalServices2003reprint

[103] ChatterjiSK1926TheOriginandDevelopmentoftheBengaliLanguageCalcuttaCalcuttaUniversityPress

[104] -----1939Bhasha-prakashBangalaVyakaran(AGrammaroftheBengaliLanguage)CalcuttaUniversityofCalcutta

[105] HaiMuhammadAbdul1964DhvaniVijnanOBanglaDhvani-tattwa(PhoneticsandBengaliPhonology)DhakaBanglaAcademy

[106]JhaSubhadra1958TheFormationofMaithiliLondonLuzacampCo

[107] KosticDjordjeDasRheaS1972AShortOutlineofBengaliPhoneticsCalcuttaStatisticalPublishingCompany

[108] MajumdarRC1971HistoryofAncientBengalCalcuttaGBhardwaj

[109] MazumdarBijaychandra19202000TheHistoryoftheBengaliLanguage(ReprCalcutta1920ed)NewDelhiAsianEducationalServices

[110] PandeyAnshuman2001ProposaltoEncodetheTirhutaScriptinISOIEC10646

[111] PalPalashBaran2001DhwanimalaBarnamalaKolkataPapyrus

[112] -----2007lsquoBanglaHarapherPanchParbarsquoInSwapanChakrabortyedMudranerSanskritiOBanglaBoiKolkataAbabhas

[113] RossFiona1999ThePrintedBengaliCharacteranditsEvolutionLondonCurzon

[114] ShastriMahamahopadhyayHaraPrasad1916HājārBacharērPurāṇaBāṅgālāBhāṣāyBauddhaGānōDōhāCalcuttaBangiyaSahityaParishat

[115] SinghUdayaNarayana(JointlyManiruzzaman)1983DiglossiainBangladeshandlanguageplanningCalcuttaGyanBharati214pp

[116] -----1987ABibliographyofBengaliLinguisticsMysoreCIILxii+316pp

[117] -----2017(withRajibChakrabortyBidishaBhattacharjeeampArimardanKumarTripathy)LanguagesandCulturesontheMarginGuidelinesforFieldworkonEndangeredLanguagesMimeoCentreforEndangeredLanguagesVisva-Bharati

54

[118] -----1980ScriptalchoiceandspellingreformAnessayinlanguageandplanningJournaloftheMSUniversityofBarodaSocialScienceNumber292173-186AmodifiedversionreprintedEAnnamalaiBjornJernuddandJoanRubinedsLanguagePlanningProceedingsofanInstituteMysoreCIIL405-417

[119] Sripantha1996JakhanChapakhanaEloKolkataPaschim-BangaBanglaAcademy

[120] SurAtul1986BanglaMudranerDushoBacharKolkataJijnasa

[121] ScriptBehaviourforBengaliVersion11TDILandC-DACPune

[122] BoraMahendra1981TheEvolutionofAssameseScriptJorhatAssamSahityaSabha

[123] ProposaltoEncodetheTirhutaScriptinISOIEC10646httpwwwunicodeorgL2L201111175r-tirhutapdfaccessedon25112017

[124] EthnologueAssameseintheLanguageCloudhttpswwwethnologuecomcloudasmaccessedon25112017

[125] BengalialphabetforManipurifoundinEthnologueManipuri(MeeteilonMeithei)httpswwwomniglotcomwritingmanipurihtmaccessedon20102019

[126] WikipediaBengalialphabethttpsenwikipediaorgwikiBengali_alphabetaccessedon25112017

[129]OmniglotSlyhetihttpwwwomniglotcomwritingsylotihtmaccessedon1052018

[130]WikipediaBishnupriyaManipurilanguagehttpsenwikipediaorgwikiBishnupriya_Manipuri_languageaccessedon25112017

[131] TheEMILLECIILCorpushttpmetashareeldaorgrepositorybrowsethe-emilleciil-corpusabdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20accessedon1052018

[132]TheEMILLECIILCorpushttpcatalogelrainfoproduct_infophpproducts_id=696accessedon1052018

[133] BanglaLanguageampScript httpswwwisicalacin~rc_banglabanglahtmlaccessedon1052018

[134] SarkarPabitra1992BanglaBananSanskarSamasyaoSambhabanaKolkataChirayataPrakashan

[135] SarkarPabitra1993BanglaBhasharYuktabyanjanBhasha1123-45

55

[136] DashNiladriShekharandBBChaudhuri1998BanglaScriptAStructuralStudyLinguisticsToday121-28Alsoavailableathttpswwwacademiaedu9967428Bangla_Script_A_Structural_Study

[137] DaniAhmedHasan(1957)lsquoSrīhatta-NāgarīLipirUtpattioBikāśrsquoBanglaAcademyPatrika(Dhaka)Vol12(Bhadra-Agrahayan1364BangabdaNumber)pg1

[138] WikipediaSylhetiNagari httpsenwikipediaorgwikiSylheti_Nagariaccessedon1952018

[139] FuruiRyosuke(2015)lsquoVariegatedAdaptationsStateFormationinBengalfromtheFifthtoSeventhCenturyrsquoinBhairabiPrasadSahuampHermannKulkeedsInterrogatingPoliticalSystemsIntegrativeProcessesandStatesinPre-ModernIndiaChapter9Pp255-73NewDelhiManohar

[140] FergusonCharesAandMunierChowdhury(1960)lsquoPhonesofBengalirsquoLanguageVol36No1pp22-59

[141] ShahidullahMuhammad(2007)BuddhistMysticSongsDhakaMowlaBrothers

[142] RayPunyaSloka(1966)BengaliLanguageHandbookWashington

[143] HaiMuhammadAbdul(1960)AphoneticandphonologicalstudyofnasalsandnasalizationinBengaliDhakaUniversityofDhaka

[144] UnicodeConsortiumProposalSummaryFormtoAccompanySubmissionsforAdditionstotheRepertoireofISOIEC10646UNICODEhttpswwwunicodeorgL2L200202387r-syloti-formpdfaccessedonMay212018

[145] WikipediaOlChiki(Unicodeblock)httpsenwikipediaorgwikiOl_Chiki_(Unicode_block)accessedonMay212018

[146]BanglaScripthttpwwwbangladesh2000combdbangla_scripthtmlaccessedonMay212018

[147] BhattacharyaAshutoshed(1942)GopichandrerGanCalcuttaCalcuttaUniversity

[149] DasSisirKumar(1975)SahibsandMunshisAnAccountoftheCollegeofFortWilliamCalcutta

[150] IslamRafiqulPabitraSarkarMahbubulHaqampRajibChakraborty(eds)(2014)BanglaAcademyPromitoBanglaByabaharikByakaran(AFunctionalGrammarofStandardBangla)DhakaBanglaAcademy

[151] SarkarPabitra[2013]lsquoBanglaSpellingReformtheLongandShortofItrsquoBanglaJournal19215-232

[152] BanglaAcademy(2012)BanglaAcademyPromitoBanglaBananerNiyam(StandardBanglaSpellingasadoptedbyBanglaAcademy)DhakaBanglaAcademy

56

[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018

[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473

57

10 Appendix-I

101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply

1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াvেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াwব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexamplexেসিরং(Tsering)

2CHcancomewithKhandaTainonlythecasewhereCisra(র)(09B0)

ৎ6 asinভৎ6 সনা

3OnlyfollowingcombinationswithVHCMwillbeallowedrarrঅ8া(togetherpronouncedasaelig)asinঅ8ািসড(acid)rarrএ8া(togetheralsopronouncedasaelig)asinএ8ািসডএ8ােসািসেয়শান

(acidassociation)

102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti

13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section

58

Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)

Table17ndashTheScriptTableofSylhetiNagarıorSiloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ Thisletterdoesnothaveanyparticularphoneticvaluebutmostlypronouncedasn

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব

Page 53: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

53

9 References

[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View, CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan. Kolkata: Ananda Publishers.

[102] Banerji, R. D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian Educational Services, 2003 reprint.

[103] Chatterji, S. K. 1926. The Origin and Development of the Bengali Language. Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali Language). Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics and Bengali Phonology). Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje and Rhea S. Das. 1972. A Short Outline of Bengali Phonetics. Calcutta: Statistical Publishing Company.

[108] Majumdar, R. C. 1971. History of Ancient Bengal. Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr. of Calcutta 1920 ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC 10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty (ed.), Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London: Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (jointly with Maniruzzaman). 1983. Diglossia in Bangladesh and Language Planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017 (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar Tripathy). Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and planning. Journal of the M.S. University of Baroda, Social Science Number 29(2): 173-186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin (eds.), Language Planning: Proceedings of an Institute. Mysore: CIIL, 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1. TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646. http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf, accessed on 25.11.2017.

[124] Ethnologue: Assamese, in the Language Cloud. https://www.ethnologue.com/cloud/asm, accessed on 25.11.2017.

[125] Bengali alphabet for Manipuri, found in Ethnologue: Manipuri (Meeteilon, Meithei). https://www.omniglot.com/writing/manipuri.htm, accessed on 20.10.2019.

[126] Wikipedia: Bengali alphabet. https://en.wikipedia.org/wiki/Bengali_alphabet, accessed on 25.11.2017.

[129] Omniglot: Sylheti. http://www.omniglot.com/writing/syloti.htm, accessed on 10.5.2018.

[130] Wikipedia: Bishnupriya Manipuri language. https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language, accessed on 25.11.2017.

[131] The EMILLE/CIIL Corpus. http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20, accessed on 10.5.2018.

[132] The EMILLE/CIIL Corpus. http://catalog.elra.info/product_info.php?products_id=696, accessed on 10.5.2018.

[133] Bangla Language & Script. https://www.isical.ac.in/~rc_bangla/bangla.html, accessed on 10.5.2018.

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata: Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1(1): 23-45.

[136] Dash, Niladri Shekhar and B. B. Chaudhuri. 1998. Bangla Script: A Structural Study. Linguistics Today 1(2): 1-28. Also available at https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study.

[137] Dani, Ahmed Hasan. 1957. ‘Srīhatta-Nāgarī Lipir Utpatti o Bikāś’. Bangla Academy Patrika (Dhaka) Vol. 1, No. 2 (Bhadra-Agrahayan 1364 Bangabda Number), p. 1.

[138] Wikipedia: Sylheti Nagari. https://en.wikipedia.org/wiki/Sylheti_Nagari, accessed on 19.5.2018.

[139] Furui, Ryosuke. 2015. ‘Variegated Adaptations: State Formation in Bengal from the Fifth to Seventh Century’. In Bhairabi Prasad Sahu & Hermann Kulke (eds.), Interrogating Political Systems: Integrative Processes and States in Pre-Modern India, Chapter 9, pp. 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. 1960. ‘The Phonemes of Bengali’. Language Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. 2007. Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. 1966. Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. 1960. A Phonetic and Phonological Study of Nasals and Nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium. Proposal Summary Form to Accompany Submissions for Additions to the Repertoire of ISO/IEC 10646 (UNICODE). https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf, accessed on 21.5.2018.

[145] Wikipedia: Ol Chiki (Unicode block). https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block), accessed on 21.5.2018.

[146] Bangla Script. http://www.bangladesh2000.com.bd/bangla_script.html, accessed on 21.5.2018.

[147] Bhattacharya, Ashutosh (ed.). 1942. Gopichandrer Gan. Calcutta: Calcutta University.

[149] Das, Sisir Kumar. 1975. Sahibs and Munshis: An Account of the College of Fort William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). 2014. Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013]. ‘Bangla Spelling Reform: the Long and Short of It’. Bangla Journal 19: 215-232.

[152] Bangla Academy. 2012. Bangla Academy Promito Bangla Bananer Niyam (Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. ‘What has Happened So Far in Terms of Script Reforms’. Paper presented at the Face to Face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core Specification, Chapter 12, p. 473.

10 Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)
The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language or script, certain restriction rules apply. In other words, in a given language some of the structures the formalism permits do not necessarily occur. To take care of such cases, restriction rules are set in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and to the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although a couple of native examples can be cited, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ written by Kazi Nazrul Islam. There are also instances of it in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning (য়াকুব). In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across Khaṇḍata (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. C H (consonant + hasanta) can occur with Khaṇḍa Ta (ৎ) only where C is ra (র, U+09B0):

র্ৎ, as in ভর্ৎসনা

3. Only the following combinations with V H C M will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
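Rules 1-3 above can be read as simple checks over a label's code points. The sketch below is a minimal illustration only, assuming a label is a Python string of Bengali code points; every name in it is hypothetical and is not taken from the LGR XML file. Rule 2 is read here as "consonant + hasanta + ৎ", and rule 3 as "independent vowel + hasanta + য + া". The full ABNF and whole-label evaluation rules are not modelled.

```python
# A sketch of Appendix-I rules 1-3 as label checks. All names are
# hypothetical; they are not taken from the LGR XML file.
RA = "\u09B0"         # র
YA = "\u09AF"         # য
AA_SIGN = "\u09BE"    # া (a-kara, the matra M)
VIRAMA = "\u09CD"     # ্ (hasanta, H)
KHANDA_TA = "\u09CE"  # ৎ
NGA = "\u0999"        # ঙ
NYA = "\u099E"        # ঞ

# Rule 1: code points that may not begin a label.
NO_INITIAL = {KHANDA_TA, NYA, NGA}

# Rule 3: the only V + H + C + M sequences allowed (অ্যা and এ্যা).
ALLOWED_VHCM = {
    "\u0985" + VIRAMA + YA + AA_SIGN,
    "\u098F" + VIRAMA + YA + AA_SIGN,
}
# Independent vowels অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ.
INDEPENDENT_VOWELS = set("\u0985\u0986\u0987\u0988\u0989\u098A\u098B\u098F\u0990\u0993\u0994")


def rule1_ok(label: str) -> bool:
    """Rule 1: the label must not start with ৎ, ঞ or ঙ."""
    return not label or label[0] not in NO_INITIAL


def rule2_ok(label: str) -> bool:
    """Rule 2: C + H may precede ৎ only when C is র (giving র্ৎ)."""
    for i in range(len(label) - 2):
        if label[i + 1] == VIRAMA and label[i + 2] == KHANDA_TA and label[i] != RA:
            return False
    return True


def rule3_ok(label: str) -> bool:
    """Rule 3: an independent vowel followed by H must begin অ্যা or এ্যা."""
    for i in range(len(label) - 1):
        if label[i] in INDEPENDENT_VOWELS and label[i + 1] == VIRAMA:
            if label[i:i + 4] not in ALLOWED_VHCM:
                return False
    return True


def passes_appendix_rules(label: str) -> bool:
    return rule1_ok(label) and rule2_ok(label) and rule3_ok(label)
```

For example, ভর্ৎসনা and অ্যাসিড pass all three checks, while ৎসেরিং fails rule 1 even though such spellings are attested in transliteration.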

10.2 ‘Sylheti Nāgarī lipi’ or ‘Siloti’
This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi), used by accountants (perhaps of the Kāyastha community) in Eastern Uttar Pradesh and Bihar and widely in use during the 1880s. There were several other names for Sylheti Nāgarī or Siloti (129), such as ‘Jalalabada Nāgarī’, ‘Fula (flower) Nāgarī’, ‘Muslim Nāgarī’ or ‘Muhammad Nāgarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th-14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Others ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

Table 17 – The Script Table of Sylheti Nāgarī or Siloti

10.3 Confusable code points
The following code points were analysed, and it was concluded that they are either (a) distinguishable or (b) confusable, but not enough to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla | Devanāgarī | NBGP decision
ঃ U+0983 | ः U+0903 | Confusable
ও U+0993 | उ U+0909 | Confusable
ঘ U+0998 | घ U+0918 | Confusable
◌ঁ U+0981 | ◌ॅ U+0945 | Confusable

Table 18 – Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla | Gurmukhi | NBGP decision
ঘ U+0998 | ਬ U+0A2C | Confusable
◌ঁ U+0981 | ◌ੱ U+0A71 | Confusable

Table 19 – Bangla and Gurmukhi confusable code points

Bangla | Gurmukhi | NBGP decision
ও U+0993 | ਤ U+0A24 | Distinguishable
শ U+09B6 | ਅ U+0A05 | Distinguishable
ম U+09AE | ਮ U+0A2E | Distinguishable
বা U+09AC and U+09BE | ਗ U+0A17 | Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla | Oriya (Odia) | NBGP decision
ও U+0993 | ଓ U+0B13 | Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla | Oriya (Odia) | NBGP decision
ঘ U+0998 | ସ U+0B38 | Distinguishable

Table 22 – Bangla and Oriya distinguishable code points
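Since the NBGP treats the pairs in Tables 18-22 as confusable but not as variants, they do not block a label. A registry tool might at most surface them for human review. The sketch below shows one way to hold the tables as data and report the confusable pairs that a label touches; it is a minimal illustration, assuming the pairs are transcribed by hand from the tables above, and the data layout and function name are hypothetical.

```python
from __future__ import annotations

# Hypothetical transcription of Tables 18-22:
# (Bangla sequence, look-alike in the other script, script, NBGP decision)
PAIRS = [
    ("\u0983", "\u0903", "Devanagari", "Confusable"),
    ("\u0993", "\u0909", "Devanagari", "Confusable"),
    ("\u0998", "\u0918", "Devanagari", "Confusable"),
    ("\u0981", "\u0945", "Devanagari", "Confusable"),
    ("\u0998", "\u0A2C", "Gurmukhi", "Confusable"),
    ("\u0981", "\u0A71", "Gurmukhi", "Confusable"),
    ("\u0993", "\u0A24", "Gurmukhi", "Distinguishable"),
    ("\u09B6", "\u0A05", "Gurmukhi", "Distinguishable"),
    ("\u09AE", "\u0A2E", "Gurmukhi", "Distinguishable"),
    ("\u09AC\u09BE", "\u0A17", "Gurmukhi", "Distinguishable"),
    ("\u0993", "\u0B13", "Oriya", "Confusable"),
    ("\u0998", "\u0B38", "Oriya", "Distinguishable"),
]


def confusable_hits(label: str) -> list[tuple[str, str, str]]:
    """Return (Bangla, look-alike, script) for each 'Confusable' pair whose
    Bangla side occurs in the label. Distinguishable pairs are skipped, and
    nothing here turns a pair into a variant."""
    return [
        (bangla, other, script)
        for bangla, other, script, decision in PAIRS
        if decision == "Confusable" and bangla in label
    ]
```

A label containing ঘ would be flagged against Devanāgarī घ and Gurmukhi ਬ, but not against Oriya ସ, which Table 22 lists as distinguishable.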

11 Appendix-II: Bengali consonants and their allographs

Consonant | Phonetic value | Allographs (clusters) | Transparent form (Bangla Akademi font)

প | p | প্ত, প্ন, প্প, প্য, প্র, প্ল, প্স, (…+প), (…+প)

ফ | pʰ | ফ্র, ফ্ল, (…+ফ)

ব | b | ব্জ, ব্দ, ব্ধ, ব্ব, ব্য, ব্র, ব্ল, (…+ভ), (…+ব), (…+ব) | ব্ধ, হ্ব

ভ | bʱ | ভ্য, ভ্র, ভ্ল

ত | t | ত্ত, ত্ত্য, ত্ত্ব, ত্থ, ত্ন, ত্য, ত্ম, (ত্+…+য), ত্ব, ত্র, প্ত, ক্ত, ক্ত্ব, (…+ত), ন্ত্র্য, (…+ত্+র). There is a marked form of ত + ্, namely ৎ (khaṇḍa ta), which also takes reph: র্ৎ. | ক্ত

থ | tʰ | থ্য, থ্র, স্থ, ত্থ, ন্থ | ন্থ, স্থ

দ | d | দ্গ, দ্ঘ, দ্দ, দ্ধ, দ্য, দ্ব, দ্ভ, দ্র, (…+দ), ন্দ, ন্দ্র, র্দ্র | দ্গ, দ্ধ

ধ | dʱ | ধ্ন, ধ্ম, ধ্য, ধ্র, গ্ধ, দ্ধ, (…+ধ), ন্ধ | গ্ধ, দ্ধ, ব্ধ, ন্ধ

ট | ʈ | ট্ট, ট্য, ট্ব, ট্র, ক্ট, ষ্ট

ঠ | ʈʰ | ঠ্য, ণ্ঠ, ষ্ঠ

ড | ɖ | ড্ড, ড্য, ড্র

ঢ | ɖʱ | ঢ্য, ণ্ঢ

চ | tʃ | চ্চ, চ্ছ, চ্ছ্র, চ্ঞ, চ্য, ঞ্চ, শ্চ | ঞ্চ

ছ | tʃʰ | ছ্র, চ্ছ, ঞ্ছ, শ্ছ | ঞ্ছ

জ | dʒ | জ্জ, জ্জ্ব, জ্ঝ, জ্ঞ, জ্য, জ্র, ঞ্জ | ঞ্জ

ঝ | dʒʱ | (does not occur as the first member of a cluster) জ্ঝ, ঞ্ঝ

ক | k | ক্ক, ক্ট, ক্ত, ক্ত্র, ক্ত্ব, ক্ন, ক্ব, ক্ম, ক্য, ক্র, ক্ষ, ক্ষ্ণ, ক্ষ্ম, ক্ষ্ব, ক্ষ্য, ক্স, ঙ্ক, (…+ক্+র) | ক্ত, ক্ত্র, ক্ত্ব, ক্র, ঙ্ক, (…+ক্+র)

খ | kʰ | (does not occur as the first member of a cluster) ঙ্খ

গ | g | গ্গ, গ্দ, গ্ধ, গ্ন, গ্ব, গ্ম, গ্য, গ্র, গ্ল, ঙ্গ, র্ঙ্গ | গ্ধ, ঙ্গ, (…+ঙ্+গ)

ঘ | gʱ | ঘ্ন, ঘ্য, ঘ্র, ঙ্ঘ

ঞ | This letter has no particular phonetic value of its own, but is mostly pronounced as n. | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ, জ্ঞ | ঞ্চ, ঞ্ছ, ঞ্জ, ঞ্ঝ

ণ | n | ণ্ট, ণ্ঠ, ণ্ড, ণ্ড্র, ণ্ঢ, ণ্ণ, ণ্য, ণ্ব, ক্ষ্ণ, ষ্ণ, হ্ণ | ণ্ড, (ণ্+…+র), ষ্ণ

ঙ, ং | ŋ | ঙ্ক, ঙ্ক্র, ঙ্খ, ঙ্গ, ঙ্ঘ, ঙ্ক্ষ (in some contexts ঙ্ is replaced by ং, as in কং, অং) | ঙ্ক, ঙ্গ, ঙ্ঘ

ম | m | ম্ল, ম্প, ম্প্র, ম্ভ, (ম্+…+র), (…+ম), ম্র, ত্ম, ধ্ম, হ্ম, ক্ষ্ম | হ্ম

ন | n | ন্ট, ন্ট্র, ণ্ঠ, ন্ড, ন্ড্র, ন্ত, ন্ত্র, ন্ত্র্য, ন্থ, ন্দ, ন্দ্র, ন্ধ, ন্ধ্র, ন্দ্ব, (…+ন), ন্ম, ন্য, ন্স, হ্ন | ন্থ, ন্ধ, (ন্+…+র)

শ | ʃ | শ্চ, শ্ছ, শ্ন, শ্ম, শ্র, শ্ল, শ্য

ষ | ʃ | ষ্ক, ষ্ট, ষ্ঠ, ষ্ণ, ষ্প, ষ্প্র, ষ্ফ, ষ্ট্র, ষ্য, ক্ষ, ক্ষ্ণ, ক্ষ্ম | ষ্ণ

স | s & ʃ | স্ক, স্ট, স্প, স্ফ, স্ত, স্থ, স্খ, স্য, স্র, স্ল, ক্স | স্থ

হ | h | হ্ণ, হ্ন, হ্ম, হ্য, হ্র, হ্ল | হ্ম

ড় | ɽ | ড়্গ

ঢ় | ɽʱ | (does not form clusters)

য | dʒ | The secondary symbol (allograph) jɔ-phala has two phonetic values. When added to the initial consonant of a word it is the vowel æ (as in শ্যামল, র‍্যাপার, etc.), but after a non-initial consonant it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক্য, স্য, র‍্য [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala and a-kara, together constitute a vowel sign representing the vowel অ্যা.]

র | r | Two manifestations:
i. রেফ (repʰ), as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র্+ধ্+ব) (earlier র্দ্ধ্ব, i.e. র্+দ্+ধ্+ব, a four-term cluster), etc. (placed over the following consonant)
ii. র-ফলা (rɔ-pʰɔla), as the second or third member of a cluster, e.g. … (placed under the consonant it follows)

ল | l | ল্গ, ল্প, ল্ব, ল্ম, ল্ট, ল্ড, ল্ক, ল্দ, ল্য, গ্ল, (…+ল), ম্ল

ঃ | h (word-finally); word-medially it doubles the pronunciation of the following consonant | অঃ, কঃ

অব
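In Unicode text, every cluster in the table above is stored simply as its member consonants separated by hasanta (U+09CD). Which allograph appears on screen is left to the font. The sketch below illustrates this under two assumptions drawn from the Indic conventions in the Unicode Core Specification: that ZWNJ placed after the hasanta requests a visible hasanta (roughly what the table calls the transparent form), and that ZWJ placed before the hasanta distinguishes ra with ya-phala (র‍্য) from reph over ya (র্য). The helper names are hypothetical.

```python
VIRAMA = "\u09CD"  # ্ BENGALI SIGN VIRAMA (hasanta)
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

RA = "\u09B0"  # র
YA = "\u09AF"  # য


def cluster(*members: str) -> str:
    """Join consonants with hasanta; a font may draw the result as one
    conjunct glyph (an allograph) or as a stacked sequence."""
    return VIRAMA.join(members)


def transparent(*members: str) -> str:
    """Same members, with ZWNJ after each hasanta to request an explicit
    (visible) hasanta instead of a conjunct."""
    return (VIRAMA + ZWNJ).join(members)


# ra + ya-phala (র‍্য): ZWJ before the hasanta asks for ra followed by
# ya-phala rather than reph over ya.
RA_YA_PHALA = RA + ZWJ + VIRAMA + YA
# reph over ya (র্য), as in কার্য.
REPH_YA = cluster(RA, YA)
```

For instance, cluster("র", "ধ", "ব") yields the code point sequence behind র্ধ্ব, the three-term reph cluster listed in the র row.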

Page 54: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

54

[118] -----1980ScriptalchoiceandspellingreformAnessayinlanguageandplanningJournaloftheMSUniversityofBarodaSocialScienceNumber292173-186AmodifiedversionreprintedEAnnamalaiBjornJernuddandJoanRubinedsLanguagePlanningProceedingsofanInstituteMysoreCIIL405-417

[119] Sripantha1996JakhanChapakhanaEloKolkataPaschim-BangaBanglaAcademy

[120] SurAtul1986BanglaMudranerDushoBacharKolkataJijnasa

[121] ScriptBehaviourforBengaliVersion11TDILandC-DACPune

[122] BoraMahendra1981TheEvolutionofAssameseScriptJorhatAssamSahityaSabha

[123] ProposaltoEncodetheTirhutaScriptinISOIEC10646httpwwwunicodeorgL2L201111175r-tirhutapdfaccessedon25112017

[124] EthnologueAssameseintheLanguageCloudhttpswwwethnologuecomcloudasmaccessedon25112017

[125] BengalialphabetforManipurifoundinEthnologueManipuri(MeeteilonMeithei)httpswwwomniglotcomwritingmanipurihtmaccessedon20102019

[126] WikipediaBengalialphabethttpsenwikipediaorgwikiBengali_alphabetaccessedon25112017

[129]OmniglotSlyhetihttpwwwomniglotcomwritingsylotihtmaccessedon1052018

[130]WikipediaBishnupriyaManipurilanguagehttpsenwikipediaorgwikiBishnupriya_Manipuri_languageaccessedon25112017

[131] TheEMILLECIILCorpushttpmetashareeldaorgrepositorybrowsethe-emilleciil-corpusabdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20accessedon1052018

[132]TheEMILLECIILCorpushttpcatalogelrainfoproduct_infophpproducts_id=696accessedon1052018

[133] BanglaLanguageampScript httpswwwisicalacin~rc_banglabanglahtmlaccessedon1052018

[134] SarkarPabitra1992BanglaBananSanskarSamasyaoSambhabanaKolkataChirayataPrakashan

[135] SarkarPabitra1993BanglaBhasharYuktabyanjanBhasha1123-45

55

[136] DashNiladriShekharandBBChaudhuri1998BanglaScriptAStructuralStudyLinguisticsToday121-28Alsoavailableathttpswwwacademiaedu9967428Bangla_Script_A_Structural_Study

[137] DaniAhmedHasan(1957)lsquoSrīhatta-NāgarīLipirUtpattioBikāśrsquoBanglaAcademyPatrika(Dhaka)Vol12(Bhadra-Agrahayan1364BangabdaNumber)pg1

[138] WikipediaSylhetiNagari httpsenwikipediaorgwikiSylheti_Nagariaccessedon1952018

[139] FuruiRyosuke(2015)lsquoVariegatedAdaptationsStateFormationinBengalfromtheFifthtoSeventhCenturyrsquoinBhairabiPrasadSahuampHermannKulkeedsInterrogatingPoliticalSystemsIntegrativeProcessesandStatesinPre-ModernIndiaChapter9Pp255-73NewDelhiManohar

[140] FergusonCharesAandMunierChowdhury(1960)lsquoPhonesofBengalirsquoLanguageVol36No1pp22-59

[141] ShahidullahMuhammad(2007)BuddhistMysticSongsDhakaMowlaBrothers

[142] RayPunyaSloka(1966)BengaliLanguageHandbookWashington

[143] HaiMuhammadAbdul(1960)AphoneticandphonologicalstudyofnasalsandnasalizationinBengaliDhakaUniversityofDhaka

[144] UnicodeConsortiumProposalSummaryFormtoAccompanySubmissionsforAdditionstotheRepertoireofISOIEC10646UNICODEhttpswwwunicodeorgL2L200202387r-syloti-formpdfaccessedonMay212018

[145] WikipediaOlChiki(Unicodeblock)httpsenwikipediaorgwikiOl_Chiki_(Unicode_block)accessedonMay212018

[146]BanglaScripthttpwwwbangladesh2000combdbangla_scripthtmlaccessedonMay212018

[147] BhattacharyaAshutoshed(1942)GopichandrerGanCalcuttaCalcuttaUniversity

[149] DasSisirKumar(1975)SahibsandMunshisAnAccountoftheCollegeofFortWilliamCalcutta

[150] IslamRafiqulPabitraSarkarMahbubulHaqampRajibChakraborty(eds)(2014)BanglaAcademyPromitoBanglaByabaharikByakaran(AFunctionalGrammarofStandardBangla)DhakaBanglaAcademy

[151] SarkarPabitra[2013]lsquoBanglaSpellingReformtheLongandShortofItrsquoBanglaJournal19215-232

[152] BanglaAcademy(2012)BanglaAcademyPromitoBanglaBananerNiyam(StandardBanglaSpellingasadoptedbyBanglaAcademy)DhakaBanglaAcademy

56

[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018

[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473

57

10 Appendix-I

101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply

1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াvেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াwব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexamplexেসিরং(Tsering)

2CHcancomewithKhandaTainonlythecasewhereCisra(র)(09B0)

ৎ6 asinভৎ6 সনা

3OnlyfollowingcombinationswithVHCMwillbeallowedrarrঅ8া(togetherpronouncedasaelig)asinঅ8ািসড(acid)rarrএ8া(togetheralsopronouncedasaelig)asinএ8ািসডএ8ােসািসেয়শান

(acidassociation)

102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti

13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section

58

Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)

Table17ndashTheScriptTableofSylhetiNagarıorSiloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ Thisletterdoesnothaveanyparticularphoneticvaluebutmostlypronouncedasn

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব

Page 55: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

55

[136] DashNiladriShekharandBBChaudhuri1998BanglaScriptAStructuralStudyLinguisticsToday121-28Alsoavailableathttpswwwacademiaedu9967428Bangla_Script_A_Structural_Study

[137] DaniAhmedHasan(1957)lsquoSrīhatta-NāgarīLipirUtpattioBikāśrsquoBanglaAcademyPatrika(Dhaka)Vol12(Bhadra-Agrahayan1364BangabdaNumber)pg1

[138] WikipediaSylhetiNagari httpsenwikipediaorgwikiSylheti_Nagariaccessedon1952018

[139] FuruiRyosuke(2015)lsquoVariegatedAdaptationsStateFormationinBengalfromtheFifthtoSeventhCenturyrsquoinBhairabiPrasadSahuampHermannKulkeedsInterrogatingPoliticalSystemsIntegrativeProcessesandStatesinPre-ModernIndiaChapter9Pp255-73NewDelhiManohar

[140] FergusonCharesAandMunierChowdhury(1960)lsquoPhonesofBengalirsquoLanguageVol36No1pp22-59

[141] ShahidullahMuhammad(2007)BuddhistMysticSongsDhakaMowlaBrothers

[142] RayPunyaSloka(1966)BengaliLanguageHandbookWashington

[143] HaiMuhammadAbdul(1960)AphoneticandphonologicalstudyofnasalsandnasalizationinBengaliDhakaUniversityofDhaka

[144] UnicodeConsortiumProposalSummaryFormtoAccompanySubmissionsforAdditionstotheRepertoireofISOIEC10646UNICODEhttpswwwunicodeorgL2L200202387r-syloti-formpdfaccessedonMay212018

[145] WikipediaOlChiki(Unicodeblock)httpsenwikipediaorgwikiOl_Chiki_(Unicode_block)accessedonMay212018

[146]BanglaScripthttpwwwbangladesh2000combdbangla_scripthtmlaccessedonMay212018

[147] BhattacharyaAshutoshed(1942)GopichandrerGanCalcuttaCalcuttaUniversity

[149] DasSisirKumar(1975)SahibsandMunshisAnAccountoftheCollegeofFortWilliamCalcutta

[150] IslamRafiqulPabitraSarkarMahbubulHaqampRajibChakraborty(eds)(2014)BanglaAcademyPromitoBanglaByabaharikByakaran(AFunctionalGrammarofStandardBangla)DhakaBanglaAcademy

[151] SarkarPabitra[2013]lsquoBanglaSpellingReformtheLongandShortofItrsquoBanglaJournal19215-232

[152] BanglaAcademy(2012)BanglaAcademyPromitoBanglaBananerNiyam(StandardBanglaSpellingasadoptedbyBanglaAcademy)DhakaBanglaAcademy

56

[153] SarkarPabitraampRajibChakraborty2018ldquoWhathashappenedSoFarIntermsof Script Reformsrdquo Paper presented at the Face to Face meeting jointly held by theBanglaAcademyDhakaampICANNatBanglaAcademyDhakaon10072018

[154] The Unicode Consortium 2018 The Unicodereg Standard Version 110 ndash CoreSpecificationChapter12P473

57

10 Appendix-I

101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply

1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াvেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াwব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexamplexেসিরং(Tsering)

2CHcancomewithKhandaTainonlythecasewhereCisra(র)(09B0)

ৎ6 asinভৎ6 সনা

3OnlyfollowingcombinationswithVHCMwillbeallowedrarrঅ8া(togetherpronouncedasaelig)asinঅ8ািসড(acid)rarrএ8া(togetheralsopronouncedasaelig)asinএ8ািসডএ8ােসািসেয়শান

(acidassociation)

102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti

13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section

58

Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)

Table17ndashTheScriptTableofSylhetiNagarıorSiloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (does not occur as the first member of a cluster) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (does not occur as the first member of a cluster) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter does not have any particular phonetic value but is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts, aelig is replaced by ং.) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s & ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (does not form clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When it is added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, however, it merely doubles the consonant sound in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য and র্য.

ক8(p+য) স8(+য) র 8(+য) [র‍্য on its own is never used in Bangla orthography; র‍্যা is, but then its last two symbols, ya-phala and a-kara, constitute a vowel sign representing the vowel অ্যা.]


র r Two manifestations:

i) রেফ (repʰ) as the first member of a cluster, e.g. র্প, র্ৎ, র্দ্র, র্য, র্ধ্ব (র+ধ+ব; earlier র্দ্ধ্ব = র+দ+ধ+ব, a four-term cluster), etc. (placed over the following consonant)

ii) র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially, it doubles the pronunciation of the following consonant.

অঃ কঃ

অব
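The parenthesised notation in the table lists the members of each cluster joined by "+". In Unicode text those members are separated by the hasanta (U+09CD), so the notation, and the reph/ra-phala split described for র, can be sketched as below. This is a hypothetical illustration that assumes bare consonant clusters without vowel signs; the function names are our own, and the visual shape (reph above, ra-phala below) is chosen by the font and shaping engine, not by the code points themselves.

```python
HASANTA = "\u09CD"  # ্ joins the members of a conjunct
RA = "\u09B0"       # র


def members(cluster: str) -> list[str]:
    """Split a bare consonant cluster (no vowel signs) into its members."""
    return [part for part in cluster.split(HASANTA) if part]


def transparent_form(cluster: str) -> str:
    """Write a cluster as its members joined by '+', as in the table's (x+y) notation."""
    return "+".join(members(cluster))


def ra_forms(cluster: str) -> list[str]:
    """Classify each র: first member -> reph; second/third member -> ra-phala."""
    return ["reph" if idx == 0 else "ra-phala"
            for idx, m in enumerate(members(cluster)) if m == RA]
```

For example, র্দ্র (U+09B0 U+09CD U+09A6 U+09CD U+09B0) has the transparent form র+দ+র, with one reph and one ra-phala.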


[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What Has Happened So Far in Terms of Script Reforms”. Paper presented at the face-to-face meeting jointly held by the Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka, on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard, Version 11.0 – Core Specification, Chapter 12, p. 473.


10. Appendix-I

10.1 Augmented Backus-Naur Formalism (ABNF)

The Augmented Backus-Naur Formalism (ABNF) is generic in nature, and when it is applied to a specific language/script, certain restriction rules apply. In other words, in a given language some of the structures the formalism permits do not necessarily occur. To take care of such cases, restriction rules are put in place; these restrictions help to fine-tune the ABNF. In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa Ta (ৎ) is NOT allowed at the beginning of an IDN label. The same applies to ঞ and the velar nasal ঙ in the Bangla scheme of the five-fold ‘varga’ (as defined under Table 5). Moreover, Bangla does not allow ya (য়) at the beginning of a word either, although we can cite a couple of native examples, such as the word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichu Chor’ written by Kazi Nazrul Islam. There are also instances of it being used in names, mostly of foreign origin, such as Yaqub, which may be written with ya (য়) at the beginning, as in য়াকুব. In very recent times, while transliterating some Chinese and Japanese names into Bangla, one does come across the possibility of Khaṇḍa Ta (ৎ) followed by sa (স) at the beginning of a word, for example ৎসেরিং (Tsering).

2. CH can come with Khaṇḍa Ta only in the case where C is ra (র, U+09B0): র্ৎ, as in ভর্ৎসনা.

3. Only the following combinations with VHCM will be allowed:
→ অ্যা (together pronounced as æ), as in অ্যাসিড (acid)
→ এ্যা (together also pronounced as æ), as in এ্যাসিড এ্যাসোসিয়েশান (acid association)
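The three restrictions can be expressed as simple checks on a label's code points. The sketch below is a minimal illustration, assuming NFC Unicode input and reading V, H, C and M as independent vowel, hasanta (U+09CD), consonant and vowel sign; `check_label`, the vowel range and the error strings are our own hypothetical choices. It is not the normative rule set, whose machine-readable form is the LGR XML file, and it ignores the ya (য়) and ৎস exceptions discussed under rule 1.

```python
# Code points used by the three restriction rules (Unicode Bengali block).
KHANDA_TA = "\u09CE"  # ৎ
NYA = "\u099E"        # ঞ
NGA = "\u0999"        # ঙ
RA = "\u09B0"         # র
HASANTA = "\u09CD"    # ্ (H)
# Independent vowels অ..ঔ (V); unassigned slots in this range are harmless here.
VOWELS = {chr(cp) for cp in range(0x0985, 0x0995)}
# Rule 3: the only permitted V+H+C+M sequences, অ্যা and এ্যা.
ALLOWED_VHCM = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}


def check_label(label: str) -> list[str]:
    """Return a list of rule violations; an empty list means the label passes."""
    errors = []
    # Rule 1: no label may start with ৎ, ঞ or ঙ.
    if label[:1] in (KHANDA_TA, NYA, NGA):
        errors.append("rule 1: label starts with ৎ, ঞ or ঙ")
    for i, ch in enumerate(label):
        nxt = label[i + 1:i + 2]
        # Rule 2: C+H followed by ৎ is allowed only when C is র.
        if ch == HASANTA and nxt == KHANDA_TA and label[i - 1:i] != RA:
            errors.append(f"rule 2: C+H+ৎ at index {i - 1} where C is not র")
        # Rule 3: V+H may only begin অ্যা or এ্যা.
        if ch in VOWELS and nxt == HASANTA and label[i:i + 4] not in ALLOWED_VHCM:
            errors.append(f"rule 3: disallowed V+H sequence at index {i}")
    return errors
```

With this sketch, ভর্ৎসনা and অ্যাসিড pass, while ৎসেরিং fails rule 1 and ক্ৎ fails rule 2.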

¹³ This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language. Assamese and Maṇipuri have not been covered in this section.

10.2 ‘Sylheti Nagarī lipi’ or ‘Siloti’

This version of Bangla script resembles the ‘Kaithī’ script (ISO 15924: Kthi) used by accountants (perhaps by the Kāyastha community) in eastern Uttar Pradesh and Bihar, which was widely in use during the 1880s. There were several other names of Sylheti Nagarī or Siloti (129), such as ‘Jalalabada Nagarī’, ‘Fula (flower) Nagarī’, ‘Muslim Nagarī’ or ‘Muhammad Nagarī’. It is said that Shah Jalal brought the script with him to Sylhet in the 13th–14th century (138), although some have suggested that it was an invention of the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist bhikkhus from Nepal. Purely for historical reasons, the details of the script, with its 32 symbols, are reproduced here (138).

Table 17 – The Script Table of Sylheti Nagarī or Siloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ Thisletterdoesnothaveanyparticularphoneticvaluebutmostlypronouncedasn

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব

Page 57: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

57

10 Appendix-I

101 AugmentedBackus-NaurFormalism(ABNF)The Augmented Backus-Naur Formalism (ABNF) is generic in nature and whenappliedtoaspecificlanguagescriptcertainrestrictionrulesapplyInotherwordsinagivenlanguagesomeoftheFormalismstructuresdonotnecessarilyapplyTotakecareofsuchcasesrestrictionrulesaresetinplaceTheserestrictionswillhelptofine-tunetheABNFIncaseofBangla13inparticularthefollowingrulesapply

1Khaṇḍata (ৎ) is NOT allowed at the beginning of an IDN label The sameappliestoঞandthevelarnasalঙintheBanglaSchemeoffive-foldlsquovargarsquo(asdefined under Table 5) Moreover Bangla does not allow ya (য়) in thebeginningofawordeitherbutwecanciteacoupleofnativeexamples forexamplethewordয়8াvেড়া(yaeligbbɔRo)fromthepoemlsquoLichuchorrsquowrittenbyKaziNazrul IslamHowever there are instances of it being used in namesmostlyof foreignoriginsuchasYaqubwhichmaybewrittenwithya(য়) inthe beginning as inয়াwব) In very recent timeswhile transliterating someChineseandJapanesenamesinBanglaonedoescomeacrossthepossibilityofKhaṇḍata (ৎ) followedbysa (স) inthebeginningofaword forexamplexেসিরং(Tsering)

2CHcancomewithKhandaTainonlythecasewhereCisra(র)(09B0)

ৎ6 asinভৎ6 সনা

3OnlyfollowingcombinationswithVHCMwillbeallowedrarrঅ8া(togetherpronouncedasaelig)asinঅ8ািসড(acid)rarrএ8া(togetheralsopronouncedasaelig)asinএ8ািসডএ8ােসািসেয়শান

(acidassociation)

102 lsquoSylhetiNagarılipirsquoorlsquoSilotirsquoThisversionofBanglascriptresemblesthe lsquoKaithīrsquoscript(ISO12954)usedbytheAccountants (perhaps by the Kāyastha community) in Eastern Uttar Pradesh andBiharndashwidelyinuseduringthe1880sTherewereseveralothernamesofSylheti

13 This section specifically takes up issues of restrictions pertaining to Bangla (Bengali) language Assamese and Maṇipuri have not been covered in this section

58

Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)

Table17ndashTheScriptTableofSylhetiNagarıorSiloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ Thisletterdoesnothaveanyparticularphoneticvaluebutmostlypronouncedasn

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব

Page 58: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

58

Nagarı or Siloti (129)ndash suchas lsquoJalalabadaNagarırsquo lsquoFula (flower)Nagarırsquo lsquoMuslimNagarırsquoorlsquoMuhammadNagarırsquoItissaidthatShahJalalahadbroughtthescriptwithhim in13th-14thCentury inSylhet (138)althoughsomesuggested that itwasaninvention by the Afghan rulers of Sylhet (137) Some ascribe the credit to theBuddhistBhikkhusfromNepalPurelyforhistoricalreasonsthedetailsofthescriptwith32symbolsarereproducedhere(138)

Table17ndashTheScriptTableofSylhetiNagarıorSiloti

103 ConfusablecodepointsThe following code points were analysed and concluded that they are either (a)distinguishable or (b) confusable but not enough to be defined as variant codepoints

1031 BanglaandNāgarīorDevanāgarī

Bangla Devanāgarī NBGP

Decision

ঃ U+0983 ः U+0903 Confusable

ওU+0993 उU+0909 Confusable

ঘU+0998 घU+0918 Confusable

U+0981 U+0945 Confusable

Table18BanglaandDevanāgarīconfusablecodepoints

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ Thisletterdoesnothaveanyparticularphoneticvaluebutmostlypronouncedasn

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ)(Insomecontextsaeligisreplacedbyং)কংঅং

) (H+ক) (H+গ) U (H+ঘ)

64

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)

65

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

স sampʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (notprivilegedenoughtohaveclusters)

য dʒThesecondarysymbol(allograph)jɔ-phalahastwophoneticvaluesWhenaddedtotheinitialconsonantinaworditisavowelaelig(asinশ8ামলর 8াপারetc)Butafteranon-initialconsonantitjustdoublesitinpronunciation(asinকায6ধায6etc)The+যcombinationhastwophysicalmanifestationsmdashর 8andয6

ক8(p+য)স8(+য)র 8(+য)[Justর 8isneverusedinBanglaorthographyর 8াisbutthenitslasttwosymbolsYa-phalaa-karaconstituteavowelsignrepresentingthevowelঅ8া]

66

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

র r Twomanifestationsmdashi lরফrepʰasthefirst

memberofaclusteregপ6ৎ6 not6 য6D6(+deg+ব)(earlierE6=+brvbar+deg+বafour-termcluster)etc(placedoverthefollowingconsonant)

ii র-ফলাrɔ-pʰɔlaasthesecondthirdmemberofaclustereg etc(placedundertheconsonantitfollows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ hwordfinallywordmediallyitdoublesthepronunciationofthefollowingconsonant

অঃকঃ

অব

Page 59: Proposal for a Bangla (or Bengali) Script Root Zone …‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-Aryan language with around 178.2 million

59

1032 BanglaandGurmukhi

Bangla Gurmukhi NBGPdecision

ঘU+0998 ਬU+0A2C Confusable

U+0981 U+0A71 Confusable

Table19BanglaandGurmukhiconfusablecodepoints

Bangla Gurmukhi

NBGPdecision

ওU+0993 ਤU+0A24

Distinguishable

শU+09B6 ਅU+0A05

Distinguishable

মU+09AE ਮU+0A2E

Distinguishable

বাU+09ACandU+09BE

ਗU+0A17

Distinguishable

Table20ndashBanglaandGurmukhıdistinguishablecodepoints

1033BanglaandOriya(Odia)

Bangla Oriya(Odia) NBGPDecision

ওU+0993 ଓU+0B13 Confusable

Table21ndashBanglaandOriyadistinguishablecodepoints

Bangla Oriya(Odia) NBGP

Decision

ঘU+0998 ସU+0B38 Distinguishable

Table22ndashBanglaandOriyadistinguishablecodepoints

60

11 Appendix-IIBengaliconsonantsandtheirallographs

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র)Thereisamarkedformofত+=ৎৎ6 (+xৎ)

amp (5+ত)

61

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)

62

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (notprivilegedenoughtohaveclustersasafirstmember)Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (notprivilegedenoughtohaveclustersasafirstmember)egrave(aelig+খ)

63

Consonants PhoneticValue Allographs

Clusters TransparentForm(BanglaAkademifont)

গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)


11. Appendix-II: Bengali consonants and their allographs

Columns: Consonants | Phonetic Value | Allographs, subdivided into Clusters and Transparent Form (Bangla Akademi font)

প p y(z+ত)(z +ন)|(z +প)প8(z +য)r(z +র)(z +ল)~(z +স)(+প)(+প)

ফ pʰ (t +র)(t +ল)(+ফ)

ব b ( +জ)( +দ)( +ধ)v(+ব)ব8(+য)(+র)(+ল)ভ(+ভ)(+ব)(+ব)

(0+ধ) 2 (3+ব)

ভ bʱ ভ8(+য)(+র)(+ল)

ত t (x+ত)8(x+x+য)(x+x+ব)(x+থ)(x+ন)ত8(x+য)(x+ম)8(x++য)(x+ব)(x+র)y(z+ত)(p+ত)(p+x+ব)(+ত)q8(+x++য) (+x+র) There is a marked form of ত + ্ = ৎ (khanda ta); ৎ6 (+xৎ)

amp (5+ত)


থ tʰ থ8(iexcl+য)cent(iexcl+র)pound(+থ)(x+থ)curren(+থ)

(7+থ) 9 (+থ)

দ d yen(brvbar+গ)sect(brvbar+ঘ)uml(brvbar+দ)copy(brvbar+ধ)দ8(brvbar+য)ordf(brvbar+ব)laquo(brvbar+ভ)not(brvbar+র)(+দ)shy(+দ)reg(+brvbar+র)not6 (+brvbar+র)

(lt+গ) gt (lt+ধ)

ধ dʱ macr(deg+ন)plusmn(deg+ম)ধ8(deg+য)sup2(deg+র)sup3(acute+ধ)copy(brvbar+ধ)(+ধ)micro(+ধ)

( (+ধ) gt (lt+ধ) (0+ধ) (7+ধ)

ট ʈ para(middot+ট)ট8(middot+য)cedil(middot+ব)sup1(middot+র)ordm(p+ট)raquo(frac14+ট)

ঠ ʈʰ ঠ8(frac12+য)frac34(iquest+ঠ)Agrave(frac14+ঠ)

ড ɖ Aacute(Acirc+ড)ড8(Acirc+য)Atilde(Acirc+র)

ঢ ɖʱ ঢ8(Auml+য)Aring(iquest+ঢ)

চ tʃ AElig(Ccedil+চ)Egrave(Ccedil+ছ)Eacute(Ccedil+Ecirc+র)Euml(Ccedil+ঞ)চ8(Ccedil+য)Igrave(Iacute+চ)Icirc(Iuml+চ)

(A+চ)


ছ tʃʰ ETH(Ecirc+র)Egrave(Ccedil+ছ)Ntilde(Iacute+ছ)Ograve(Iuml+ছ)

$ (A+ছ)

জ dʒ Oacute(Ocirc+জ)Otilde(Ocirc+Ocirc+ব)Ouml(Ocirc+ঝ)times(Ocirc+ঞ)জ8(Ocirc+য)Oslash(Ocirc+র)Ugrave(Iacute+জ)

(A+জ)

ঝ dʒʱ (does not occur as the first member of a cluster) Ouml(Ocirc+ঝ)Uacute(Iacute+ঝ)

ক k Ucirc(p+ক)ordm(p+ট)(p+ত)Uuml(p+x+র)(p+x+ব)Yacute(p+ন)THORN(p+ব)szlig(p+ম)ক8(p+য)agrave(p+র)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)auml(p+frac14+ব)aacute8(p+frac14+য)aring(p+স)s(aelig+ক)ccedil(+p+র)

amp (5+ত) (5+E+র) G (5+E+ব) (5+র) ) (H+ক) J (+5+র)

খ kʰ (does not occur as the first member of a cluster) egrave(aelig+খ)


গ g eacute(acute+গ)ecirc(acute+দ)sup3(acute+ধ)euml(acute+ন)igrave(acute+ব)iacute(acute+ম)গ8(acute+য)icirc(acute+র)iuml(acute+ল)eth(aelig+গ)eth6 (+aelig+গ)

( (+ধ) (H+গ) K (L+H+গ)

ঘ gʱ ntilde(ograve+ন)ঘ8(ograve+য)oacute(ograve+র)ocirc(aelig+ঘ)

ঞ This letter has no fixed phonetic value; it is mostly pronounced as n.

Igrave(Iacute+চ)Ntilde(Iacute+ছ)Ugrave(Iacute+জ)Uacute(Iacute+ঝ)times(Ocirc+ঞ)

(A+চ) $ (A+ছ) (A+জ) M (A+ঝ)

ণ n otilde(iquest+ট)frac34(iquest+ঠ)ouml(iquest+ড)divide(iquest+Acirc+র)Aring(iquest+ঢ)oslash(iquest+ণ)ণ8(iquest+য)ugrave(iquest+ব)acirc(p+frac14+ণ)uacute(frac14+ণ)ucirc(+ণ)

O (P+ড) - (P+R+র) + (S+ণ)

ঙং ŋ s(aelig+ক)uuml(aelig+p+র)egrave(aelig+খ)eth(aelig+গ)ocirc(aelig+ঘ)yacute(aelig+p+ষ) (In some contexts ঙ is replaced by ং.) কং অং

) (H+ক) (H+গ) U (H+ঘ)


ম m thorn(+ল)yuml(+প)(+z+র)(+ভ)(++র)$(+ম)(+র)(x+ম)plusmn(deg+ম)amp(+ম)atilde(p+frac14+ম)

W (3+ম)

ন n (+ট)((+middot+র))(iquest+ঠ)u(+ড)(+Acirc+র)(+ত)q(+x+র)q8(+x++য)curren(+থ)shy(+দ)reg(+brvbar+র)micro(+ধ)+(+deg+র)(+brvbar+ব)-(+ন)(+ম)ন8(+য)(+স)0(+ন)

(7+থ) (7+ধ) (7+Y+র)

শ ʃ Icirc(Iuml+চ)Ograve(Iuml+ছ)1(Iuml+ন)2(Iuml+ম)3(Iuml+র)4(Iuml+ল)শ8(Iuml+য)

ষ ʃ 5(frac14+ক)raquo(frac14+ট)Agrave(frac14+ঠ)uacute(frac14+ণ)6(frac14+প)7(frac14+z+র)8(frac14+ফ)raquo(frac14+ট)9(frac14+middot+র)Agrave(frac14+ঠ)uacute(frac14+ণ)ষ8(frac14+য)aacute(p+ষ)acirc(p+frac14+ণ)atilde(p+frac14+ম)

+ (S+ণ)


স s & ʃ (+ক)(+ট)(+প)(+ফ)lt(+ত)pound(+থ)(+ট)(+ক)=(+খ)স8(+য)gt(+র)(+ল)aring(p+স)

9 (+থ)

হ h ucirc(+ণ)0(+ন)amp(+ম)হ8(+য)(+র)A(+ল)

W (3+ম)

ড় ɽ B(C+গ)

ঢ় ɽʱ (does not form clusters)

য dʒ The secondary symbol (allograph) jɔ-phala has two phonetic values. When it is added to the initial consonant of a word, it represents the vowel æ (as in শ্যামল, র‍্যাপার, etc.). After a non-initial consonant, it simply doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র + য combination has two physical manifestations: র‍্য (র followed by ya-phala) and র্য (reph over য).

ক8(p+য)স8(+য)র 8(+য) [র‍্য on its own is never used in Bangla orthography. র‍্যা is used, but there its last two symbols, ya-phala and a-kara, together form a vowel sign representing the vowel অ্যা.]


র r Two manifestations:

i) রেফ (repʰ) as the first member of a cluster, e.g. প6 ৎ6 not6 য6 D6(+deg+ব) (earlier E6 = +brvbar+deg+ব, a four-term cluster), etc. (placed over the following consonant)

ii) র-ফলা (rɔ-pʰɔla) as the second or third member of a cluster (placed under the consonant it follows)

ল l F(+গ)(+প)G(+ব)H(+ম)I(+ট)J(+ড)K(+ক)F(+গ)L(+দ)ল8(+য)iuml(acute+ল)(+ল)thorn(+ল)

ঃ h word-finally; word-medially it doubles the pronunciation of the following consonant

অঃকঃ

অব
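The transparent forms in the table describe each cluster as a sequence of member consonants, with all but the last written as half-forms. The LGR's code point repertoire is expressed in Unicode, and in Unicode such a cluster is encoded as its member consonants joined by the hasanta (U+09CD BENGALI SIGN VIRAMA). The legacy Bangla Akademi glyph codes in the table have no Unicode code points of their own. The minimal Python sketch below illustrates this encoding. It also covers the two encodings of র + য described in the য and র rows and the marked form ৎ (U+09CE). The helper names `cluster`, `ra_ya` and `names` are hypothetical and chosen only for illustration; they are not part of the LGR or its XML file. The ZWJ sequence is shown only to make the orthographic distinction concrete, not as a statement about what a root-zone label may contain.

```python
from __future__ import annotations

import unicodedata

# Standard Unicode code points for the Bengali signs discussed in the table.
HASANTA = "\u09CD"    # BENGALI SIGN VIRAMA (hasanta): links the members of a cluster
ZWJ = "\u200D"        # ZERO WIDTH JOINER: requests the ra + ya-phala shape
RA = "\u09B0"         # BENGALI LETTER RA
YA = "\u09AF"         # BENGALI LETTER YA
KHANDA_TA = "\u09CE"  # BENGALI LETTER KHANDA TA, the marked form of ta + hasanta


def cluster(*consonants: str) -> str:
    """Join member consonants with hasanta (hypothetical helper).

    Assumes each argument is a single consonant code point; it performs
    no validation against the LGR repertoire or its rules.
    """
    if len(consonants) < 2:
        raise ValueError("a cluster needs at least two consonants")
    return HASANTA.join(consonants)


def ra_ya(reph: bool) -> str:
    """Return one of the two encodings of RA + YA.

    reph=True  -> RA + HASANTA + YA        (reph over ya, as in karjo)
    reph=False -> RA + ZWJ + HASANTA + YA  (ra followed by ya-phala)
    """
    if reph:
        return cluster(RA, YA)
    return RA + ZWJ + HASANTA + YA


def names(text: str) -> list[str]:
    """List the Unicode names of the code points in text."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    # A three-member cluster: ক + ত + র joined by two hasantas.
    print(names(cluster("ক", "ত", "র")))
    # A ya-phala cluster is encoded as consonant + hasanta + য.
    print(names(cluster("ক", YA)))
    # The two manifestations of র + য differ only by the ZWJ.
    print(names(ra_ya(reph=True)))
    print(names(ra_ya(reph=False)))
    print(names(KHANDA_TA))
```

Printing code point names rather than the strings themselves makes the otherwise invisible ZWJ and hasanta explicit. This is useful when checking which of the two র + য forms a test label actually contains.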
