MINPURAA: A PANDITHAM BASED
MULTILINGUAL MAIL SERVER AND CLIENT
J. Ramesh
98MX17
Guided by: Dr. P. Navaneethan
System Configuration
Hardware:
Pentium II Processor (333 MHz)
128 MB RAM
2.1 GB HDD
Software:
Platform: Microsoft Windows 2000 Professional
JAVA 2
Kawa 3.22 (IDE for JAVA)
Macromedia Fontographer 4.1
High Logic Font Creator Programmer 2.1
Motivation
Existing multilingual e-mail facilities send e-mails in non-character-oriented formats.
Most multilingual products rely on fonts and not on the language.
User names and passwords are still accepted only in English.
The usage of e-mail is limited to people who know English.
‘To take IT to rural areas’ is the primary motivation.
Requisites
A character encoding scheme for processing multilingual strings; PANDITHAM is used.
A protocol for sending mails; SMTP (RFC 821) is used.
A protocol for message formats; Internet Message Formats (RFC 822) is used.
Region-based, 16-bit, character-oriented fonts.
PANDITHAM 4.0 - An overview
Introduction:
PANDITHAM is an acronym for the Protocol for ApplicatioNs Development In THAmizh and Multilingual computing.
It was developed, and is being improved, by Dr. P. Navaneethan.
It gives life to individual characters in each language by assigning unique values (in that language) to them.
PANDITHAM Principle …
I Want ARUN! (In English)
Drop letters in 1, 18, 21, 14
[Figure: the letters A (1), R (18), U (21) and N (14) are shown against a 26-slot English (ENG) table, next to a 247-slot Thamizh (TAM) table.]
Now I Want அருண் (In Tamil)
Drop letters in 1, 161, 91
[Figure: the three Thamizh letters are shown against slots 1, 161 and 91 of the 247-slot TAM table, next to the 26-slot ENG table.]
Why Protocol ?
PANDITHAM packs the 247 Thamizh characters into an 8-bit space as follows:
அ = 9, ஆ = 10, இ = 11, ஈ = 12, ..., ஔ = 20, ஃ = 21, க = 22, கா = 23, ..., கௌ = 33, க் = 34, ங = 35, ..., ஞு = 65, ..., னௌ = 254, ன் = 255
The above lexical order follows Thirukkural Thelivurai (திருக்குறள் தெளிவுரை) by Dr. M. Varadarasanar.
The value 65 stands for ‘A’ in ASCII and ‘ஞு’ in Thamizh. To resolve this ambiguity, the machine needs a set of RULES. Hence the Protocol.
The remaining nine values (0-8) are used as PANDITHAM control and punctuation characters.
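The packing in the table above can be sketched in Python as follows. This is only an illustrative sketch, not the PANDITHAM implementation: the function names and index arguments are hypothetical, and the consonants are assumed to follow the traditional Thamizh alphabet order. The constants are read off the table: the 12 vowels start at 9, the aytham is 21, and each of the 18 consonants takes 13 consecutive values starting at 22.

```python
# Layout constants read off the slide's table.
VOWEL_BASE = 9             # first vowel
AYTHAM = 21                # the aytham follows the 12 vowels (9..20)
CONSONANT_BASE = 22        # first form of the first consonant
FORMS_PER_CONSONANT = 13   # 12 vowel-combined forms, then the pure consonant

# 12 vowels + aytham + 18 consonants x 13 forms = 247 characters.
TOTAL_CHARACTERS = 12 + 1 + 18 * FORMS_PER_CONSONANT


def vowel_value(vowel_index):
    """Value of the vowel at position 0..11 in lexical order."""
    if not 0 <= vowel_index < 12:
        raise ValueError("vowel_index must be in 0..11")
    return VOWEL_BASE + vowel_index


def consonant_value(consonant_index, form_index):
    """Value of one consonant form.

    consonant_index: 0..17 (assumed traditional alphabet order).
    form_index: 0..11 for the vowel-combined forms, 12 for the pure consonant.
    """
    if not 0 <= consonant_index < 18:
        raise ValueError("consonant_index must be in 0..17")
    if not 0 <= form_index < FORMS_PER_CONSONANT:
        raise ValueError("form_index must be in 0..12")
    return CONSONANT_BASE + consonant_index * FORMS_PER_CONSONANT + form_index


print(consonant_value(0, 0), consonant_value(0, 12), consonant_value(17, 12))
# 22 34 255
```

Under these assumptions the last form of the last consonant lands exactly on 255, which is why the 247 characters fit in the 8-bit space once values 0-8 are reserved.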
The Rule
The escape character DLC (Default Language Code) is the first byte of a PANDITHAM string and is followed by the code assigned to that language.
This is followed by a monolingual string, which can switch to a different language if need be.
Language switching can happen in 2 different ways:
Way 1: There are at least 2 characters in the new language, in which case the escape character DLC is used once again.
Way 2: There is exactly one character in the new language, in which case the escape character MLC (Momentary Language Code) is used. This escape character conveys that the language switch is momentary in nature.
Language Codes:
ASCII - 05h
THAMIZH1 - 08h
THAMIZH2 - 09h
TELUGU - 0Ah
KANNADA - 0Bh
MALAYALAM - 0Ch
Example:
String = மாதேஸ்வரன் R
DLC TM1 மா தே MLC TM2 ஸ் வ ர ன் SP MLC ASC R
DLC TM1 8C 6B MLC TM2 E4 BF A5 FF SP MLC ASC 52 NULL
Length of the string : 15 bytes
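The switching rule and the example above can be sketched as a small Python encoder. This is a hedged illustration, not the actual PANDITHAM code: the byte values chosen for DLC, MLC and SP are hypothetical (the slides only say that control and punctuation codes lie in 0-8), the function name `encode` is made up, and double-byte languages are not handled. The language codes are the ones listed in the slides.

```python
NULL = 0x00
DLC = 0x01   # hypothetical value for the Default Language Code escape
MLC = 0x02   # hypothetical value for the Momentary Language Code escape
SP = 0x03    # hypothetical value for the language-neutral space

# Language codes as listed in the slides.
LANG_CODE = {"ASC": 0x05, "TM1": 0x08, "TM2": 0x09,
             "TEL": 0x0A, "KAN": 0x0B, "MAL": 0x0C}


def encode(chars):
    """Encode (language, value) pairs; language None marks a neutral code.

    Assumes single-byte languages and a string that starts with a
    language character.
    """
    out = []
    current = None  # language selected by the most recent DLC
    for i, (lang, value) in enumerate(chars):
        if lang is not None and lang != current:
            # Length of the run of consecutive characters in the new language.
            run = 1
            while i + run < len(chars) and chars[i + run][0] == lang:
                run += 1
            if current is None or run >= 2:
                out += [DLC, LANG_CODE[lang]]  # Way 1, or the mandatory first DLC
                current = lang
            else:
                out += [MLC, LANG_CODE[lang]]  # Way 2: one momentary character
        out.append(value)
    out.append(NULL)
    return bytes(out)


# The example string from the slide, using the byte values shown there.
example = [("TM1", 0x8C), ("TM1", 0x6B), ("TM2", 0xE4),
           ("TM1", 0xBF), ("TM1", 0xA5), ("TM1", 0xFF),
           (None, SP), ("ASC", 0x52)]
print(len(encode(example)))  # 15
```

Because an MLC does not change the default language, the three THAMIZH1 bytes after E4 need no further escape, which is what keeps the example at 15 bytes.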
The Rule (contd.)
The protocol rules are modified slightly to accommodate Telugu, Kannada, etc. in a better way, by using a scheme similar to the DBCS (Double Byte Coding Scheme) followed for Japanese.
DLC TEL U L U L … DLC TM1 B B B … NULL
Value of a Telugu letter will be U * 256 + L, where U and L are the upper and lower bytes of the pair.
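The double-byte value rule can be sketched as below. The function names are hypothetical and the bytes in the usage line are arbitrary, not real Telugu codes; the only thing taken from the slide is the formula U * 256 + L.

```python
def dbcs_value(upper, lower):
    """Combine the two bytes of a double-byte character into one value."""
    if not (0 <= upper <= 255 and 0 <= lower <= 255):
        raise ValueError("both bytes must be in 0..255")
    return upper * 256 + lower


def dbcs_bytes(value):
    """Split a 16-bit value back into its (upper, lower) bytes."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return divmod(value, 256)


print(hex(dbcs_value(0x12, 0x34)))  # 0x1234
```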
Design of Region and Language Databases
[E-R diagram: a Region has Fonts (1:N) and a Region has Languages (1:N).]
Structures of Database
Type Region
    regionCode As Byte       /* Unique Region code */
    regionName As String     /* Name of the Region */
    defaultFont As int       /* Default font code of the region */
End Type
Type Region_Fonts
    fontCode As int          /* Unique Font Code */
    fontName As String       /* Name of the Font */
    regionCode As Byte       /* Font Association to a Region */
End Type
Structures of Database (Contd.)
Type Language                /* Language Record Structure */
    langCode As Byte         /* Unique Language Code */
    langName As String       /* Language Name */
    regionCode As Byte       /* Unique Region Code */
    DBCS As Boolean          /* Double Byte Coding Scheme */
    fontLocation As Char     /* Location in 16 Bit Font Table, sizeof (Char) is 2 Bytes */
    fontUnits As Byte        /* No. Of Font Units, 1 Unit = 256 Slots */
    weight As Byte           /* Language Weight for sorting */
End Type
Slots allotted in 16-bit Font table
0000H - 7FFFH    Universal
8000H - 80FFH    THAMIZH1 (Pure Thamizh)
8100H - 81FFH    THAMIZH2 (Grantha)
8200H - 8AFFH    Telugu
...   - FFFFH    Other languages in this region (Kannada, Malayalam, etc.); the figure also marks 8BFFH
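The relation between the fontLocation and fontUnits fields of the Language record and the slot ranges in the font table can be sketched as follows. The function name is hypothetical; the only facts used are that one font unit is 256 slots and that the table is 16 bits wide. The nine units for Telugu are inferred from the 8200H and 8AFFH labels in the figure, not stated in the slides.

```python
SLOTS_PER_UNIT = 256  # "1 Unit = 256 Slots" in the Language record


def font_slot_range(font_location, font_units):
    """Return the (first, last) slot a language occupies in the 16-bit table."""
    first = font_location
    last = font_location + font_units * SLOTS_PER_UNIT - 1
    if first < 0 or font_units < 1 or last > 0xFFFF:
        raise ValueError("range does not fit in the 16-bit font table")
    return first, last


# THAMIZH1 with one unit starting at 8000H.
print([hex(v) for v in font_slot_range(0x8000, 1)])  # ['0x8000', '0x80ff']
```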
Merits
Language based ordering is feasible
Network congestion is low
Lexical order sorting is easy
No mis-scripting (eg. eci)
No kerning problem (eg. ]f)
Ease of speech synthesis
PANDITHAM as applied to other languages
Name : Krishna Reddy PANDITHAM rep.: DLC ASC K r i s h n a R e d d y NULL Length : 16 bytes
Name : kiRxf]a erdfF PANDITHAM rep.: DLC TM1 ki R MLC TM2 xf ]a SP er df F NULL Length : 13 bytes
Name : h.v{^w¨ UvxD« PANDITHAM rep.: DLC KAN .v{ ^w¨ Uvx D« NULL
Length : 12 bytes
On the Average, to represent Kannada or Telugu strings in PANDITHAM, it may require about 2 bytes per character
Storage Requirements
Monolingual (eg. tiRkfKbqf, any English text): 1 Byte / Character
Bilingual (Thamizh & Grantha): 1.1 Bytes / Character in the likely case
    Best case: most of the characters belong to the same language (eg. tiRkfKbqf): 1 Byte / Character
    Worst case: alternate letters switch between two different languages (eg. haihErxf): 2 Bytes / Character
Multilingual: the average will depend on the languages present
    In the worst case the string is multilingual to the core, i.e., alternate letters switch between two different languages
    eg. ½ ai h Uv n (22 bytes)
    DLC HIN ½ MLC TM1 ai MLC TM2 h MLC KAN Uv MLC ENG n null
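Performance of various schemes - A Comparison
Schemes compared: 7 bit ASCII; 8 bit ASCII (Glyph Based); Unicode (ISCII Based); Unicode (Char Based); PANDITHAM
Lingual:
    7 bit ASCII: Mono (English)
    8 bit ASCII (Glyph Based): Bi
    Unicode (ISCII Based): Multi
    Unicode (Char Based): Multi
    PANDITHAM: Multi
Storage per char:
    7 bit ASCII: 1 Byte
    8 bit ASCII (Glyph Based): 1 - 3 Bytes
    Unicode (ISCII Based): 3.5 Bytes for Thamizh
    Unicode (Char Based): 2 Bytes
    PANDITHAM: Best case 1 Byte, worst case 2 Bytes, likely case 1.1 Bytes
Network congestion:
    7 bit ASCII: Very low
    8 bit ASCII (Glyph Based): Very high
    Unicode (ISCII Based): Extremely high
    Unicode (Char Based): Very high
    PANDITHAM: Low
Lexical order sorting:
    7 bit ASCII: Simple
    8 bit ASCII (Glyph Based): Complex, and parsing required
    Unicode (ISCII Based): Complex
    Unicode (Char Based): Simple
    PANDITHAM: Simple
Flexibility in language ordering?
    7 bit ASCII: N.A.
    8 bit ASCII (Glyph Based): No
    Unicode (ISCII Based): No
    Unicode (Char Based): No
    PANDITHAM: Yes
Speech synthesis:
    7 bit ASCII: Difficult
    8 bit ASCII (Glyph Based): Complex, parsing required, and discontinuous
    Unicode (ISCII Based): Complex
    Unicode (Char Based): Simple
    PANDITHAM: Simple
Random processing of letters?
    7 bit ASCII: Yes
    8 bit ASCII (Glyph Based): No
    Unicode (ISCII Based): No
    Unicode (Char Based): Yes
    PANDITHAM: Simple for monolingual (Pure Thamizh)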
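Features of various schemes
Schemes compared: 7 bit ASCII; 8 bit ASCII (Glyph Based); Unicode (ISCII Based); Unicode (Char. Based); PANDITHAM
Lingual:
    7 bit ASCII: Mono (English)
    8 bit ASCII (Glyph Based): Bilingual
    Unicode (ISCII Based): Indian
    Unicode (Char. Based): Multi
    PANDITHAM: Multi
Character rendering:
    7 bit ASCII: Simple
    8 bit ASCII (Glyph Based): Simple, but not always, and time consuming
    Unicode (ISCII Based): Parsing required, time consuming
    Unicode (Char. Based): Simple
    PANDITHAM: Simple
Eliminates kerning problem?
    7 bit ASCII: Yes
    8 bit ASCII (Glyph Based): No (eg. ]ffff)
    Unicode (ISCII Based): May be
    Unicode (Char. Based): Yes
    PANDITHAM: Yes
Eliminates mis-script problem?
    7 bit ASCII: Yes
    8 bit ASCII (Glyph Based): No (eg. eci)
    Unicode (ISCII Based): Yes
    Unicode (Char. Based): Yes
    PANDITHAM: Yes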
An Overview of E-mail
Mail Transfer Agents (MTAs)
    Permanent programs that run on the hosts
    Listen for e-mail
    Save the e-mail for the local users
    Host computers run MTAs, which are also known as Mail Servers
Mail User Agents (MUAs)
    Run by the user to send or receive e-mails
    Provide an interface to view the e-mail
    Facilitate communication with the MTA
Study of Existing Multilingual E-mail Providers
IndoMail (By Lastech Systems):
It is a Mail User Agent (client program)
Supports 12 Indian languages including Thamizh
Supports various keyboard layouts including the Tamil99 keyboard
The mail message is despatched in Rich Text Format, which is very costly
Fonts are attached with the mail
Glyph-based editing (glyphs ‘N’ and ‘ÿ’ make ‘Nÿ’)
Mis-scripting (eg. Eci) and kerning (eg. XI, ]f) problems are encountered
Study of Existing Multilingual E-mail Providers (contd.)
www.bharatmail.com:
It is a web-based mail service
Multilingual e-mail
No standard keyboard layout has been used
The mail message is converted to an image format (.gif) and sent to the destination
Mis-scripting and kerning problems are identified
The end-user should know English to send e-mail in the Thamizh service, since it uses transliteration technology, i.e., ‘ka’ becomes ‘க’ and ‘amma’ becomes ‘அம்மா’
An Overview of SMTP
SMTP is the protocol used for communication between MTAs and between MUAs and MTAs
The objective is to transfer mail reliably and efficiently
Clients use 4-letter commands to communicate with the server
The server responds with a 3-digit numeric code
SMTP servers usually listen on port 25
The SMTP Model
[Figure: the User and a File System feed the Sender-SMTP, which exchanges SMTP Commands/Replies and Mail with the Receiver-SMTP; the Receiver-SMTP is attached to its own File System.]
Opening and Closing Connection
Sender-SMTP (MUA)              Receiver-SMTP (MTA)
                               220 Ready
HELO <panditham>
                               250 PMS-Ok
QUIT
                               221 PMS-service closing transmission channel
Sending Mail
Sender-SMTP (MUA)              Receiver-SMTP (MTA)
MAIL FROM:<panditham>
                               250 PMS-Ok
RCPT TO:<ØÊå@Nt>
                               250 PMS-Ok
DATA
                               354 PMS-Start mail input
<Happy Thamizh new year é×Ìʵ Og×ÏZNt>
<CRLF>.<CRLF>
                               250 PMS-Ok
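The exchange above can be sketched from the client's side with two small helpers. This is an illustrative sketch only: the function names are hypothetical, only single-line replies such as "250 PMS-Ok" are handled, and the leading-dot doubling follows the transparency rule of RFC 821 rather than anything stated in the slides.

```python
def parse_reply(line):
    """Split a single-line SMTP reply into (3-digit code, text)."""
    code, _, text = line.rstrip("\r\n").partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError("not a single-line SMTP reply: %r" % line)
    return int(code), text


def data_payload(body_lines):
    """Build what is sent after DATA, ending with <CRLF>.<CRLF>.

    A body line that starts with "." gets an extra "." so that it cannot be
    mistaken for the end-of-message marker (RFC 821 transparency).
    """
    stuffed = ["." + line if line.startswith(".") else line
               for line in body_lines]
    return "\r\n".join(stuffed) + "\r\n.\r\n"


print(parse_reply("354 PMS-Start mail input"))
# (354, 'PMS-Start mail input')
```

A client would send MAIL FROM, RCPT TO and DATA in turn, check each reply code with `parse_reply`, and transmit `data_payload(...)` only after the 354 reply.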
E-R Diagram
[E-R diagram: User Has Message (N:M).]
Table Design
Table: USER
Name               Type       Size (in bytes)
Record_size        Short      2
User_id_no         Integer    4
Password           Byte       Var
User_id            Byte       Var
User_name          Byte       Var
Sex                Byte       1
Alternate_email    Byte       Var
Contact_phone      Byte       Var
MailDayBox         Byte       1
Table: USER_MESSAGE_<0-6>
Name               Type       Size (in bytes)
Record_size        Short      2
User_id_no         Integer    4
Message_id         Long       8
IsRead             Boolean    1
IsUrgent           Boolean    1
Date               Byte       Var
Sender             Byte       Var
Subject            Byte       Var
Table: MESSAGE
Name               Type       Size (in bytes)
Record_size        Long       8
Message_id         Long       8
Message_data       Byte       Var
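Usage of MailDayBox Field
[Figure: the bits of the one-byte MailDayBox field, numbered 7 down to 0, with an example bit pattern; bit 0 is linked to USER_MESSAGE_0 and bit 6 to USER_MESSAGE_6.]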
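One way to read the MailDayBox field is as a bit mask over the seven USER_MESSAGE tables. The sketch below assumes, without confirmation from the slides, that bit d being set means the user has mail in USER_MESSAGE_d; the function names are hypothetical.

```python
DAY_TABLES = 7  # USER_MESSAGE_0 .. USER_MESSAGE_6


def set_day(mail_day_box, day):
    """Return the field with the bit for the given day table set."""
    if not 0 <= day < DAY_TABLES:
        raise ValueError("day must be in 0..6")
    return mail_day_box | (1 << day)


def clear_day(mail_day_box, day):
    """Return the field with the bit for the given day table cleared."""
    if not 0 <= day < DAY_TABLES:
        raise ValueError("day must be in 0..6")
    return mail_day_box & ~(1 << day) & 0xFF


def days_with_mail(mail_day_box):
    """List the day tables whose bit is set (assumed meaning: mail present)."""
    return [day for day in range(DAY_TABLES) if mail_day_box & (1 << day)]


print(days_with_mail(0b0100011))  # [0, 1, 5]
```

Under this reading the server would only need to open the USER_MESSAGE tables whose bit is set, instead of scanning all seven for each user.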
Format of E-mail message
An e-mail consists of a header part and a message part
Headers are terminated by a null line
The message is terminated by <CRLF>.<CRLF>
Headers Used
To:            Date:
Cc:            Urgent:
From:          ImmDel:
Subject:       Delivery-Date:
Day:           Received:
A sample mail message
To: 5ì×
Cc:
From: "Administrator" <admin>
Subject: åʳËþ/Panditham
Day: 3 /* 0-Sunday, 1-Monday and so on */
Date: 06/12/2000
Urgent: 0 /* 0-Not Urgent, and 1-Urgent */
Immdel: 0 /* 0-Do Nothing, and 1-Delete on Read */
Delivery-Date: Wed Dec 06 11:31:15 GMT+05:30 2000
Received: from panditham1/164.16.18.181

A½é¹*@/Hello,
åʳËþ ô½±°hM oVZON åÍR n3ËúZR ôZN ؽw. ËgNt I'gNlZR "åʳËþ" G½u öNN6ZR ô½±°hM AµñåRþ.
Thank you for registering with Panditham Mail Service. Mail to "admin" for any clarification and/or help.
ؽw/Regards,
åʳËþ R_/The Panditham Team.
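The message layout described above (headers, a null line, then a body closed by <CRLF>.<CRLF>) can be sketched as a small parser. This is an assumption-laden illustration: the function name is hypothetical, each header is assumed to sit on one line, and the sample uses plain ASCII values rather than PANDITHAM-encoded strings.

```python
END_OF_MESSAGE = "\r\n.\r\n"


def parse_message(raw):
    """Split a raw message into a header dictionary and the body text."""
    head, _, body = raw.partition("\r\n\r\n")  # the null line ends the headers
    headers = {}
    for line in head.split("\r\n"):
        name, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        headers[name.strip()] = value.strip()
    if body.endswith(END_OF_MESSAGE):
        body = body[:-len(END_OF_MESSAGE)]
    return headers, body


raw = ('To: admin\r\n'
       'Cc:\r\n'
       'From: "Administrator" <admin>\r\n'
       'Day: 3\r\n'
       'Urgent: 0\r\n'
       'Delivery-Date: Wed Dec 06 11:31:15 GMT+05:30 2000\r\n'
       '\r\n'
       'Hello,\r\n.\r\n')
headers, body = parse_message(raw)
print(headers["Day"], headers["Urgent"], body)  # 3 0 Hello,
```

Splitting at the first colon only keeps values such as the Delivery-Date timestamp intact even though they contain colons themselves.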
Supporting tools
Thamizh Keyboard driver in the form of Component
Tool for Language Database Maintenance
Multilingual Text components
Multilingual password component
Multilingual Message Box
Tool for creating interface data
16-bit fonts: Muhil, Aruvi and Thamizhan have been developed
Features of PANDITHAM Mail Server
Understands Multilingual strings
Uses Character oriented protocol PANDITHAM for processing
multilingual strings
Registration of a new User in multilingual form,
i.e., 4ù×_J is a valid user-id
Handles multiple users at the same time
Tools provided to monitor the clients
Supports unique features such as Urgent mail and
Delete on read mail.
Features of PANDITHAM Mail Client
Provides Multilingual User Interface
Character Oriented data entry
No Kerning Problem (eg. XI, ]f)
No Mis-Scripting (eg. Eci)
Optimized data transfer for Multilingual strings.
Uses Tamil99 Keyboard Layout (Phonetic) for Thamizh
English mails can be sent to other mail servers
Also features a POP3 client facility, so that mails can be read from other servers that support POP3
Conclusion
Multilingual handling is highly reliable, as the core engine of the system is based on PANDITHAM.
The system can easily accommodate new languages when the appropriate keyboard drivers are provided in the form of components.
The system can be further improved by incorporating a POP3 server feature in the PANDITHAM Mail Server.
ؽw
Visit: www.psgtech.ac.in/panditham/
Mail: [email protected]