
MINPURAA: A PANDITHAM BASED MULTILINGUAL MAIL SERVER AND CLIENT

J. Ramesh

98MX17

Guided by: Dr. P. Navaneethan

System Configuration

Hardware:

Pentium II Processor (333 MHz)

128 MB RAM

2.1 GB HDD

Software:

Platform: Microsoft Windows 2000 Professional

JAVA 2

Kawa 3.22 (IDE for JAVA)

Macromedia Fontographer 4.1

High Logic Font Creator Program 2.1

Motivation

Existing multilingual E-mail facilities support sending E-mails only in non-character-oriented formats.

Most multilingual products rely on fonts, not on the language.

User names and passwords are still accepted only in English.

The use of E-mail is limited to people who know English.

'To take IT to rural areas' is the primary motivation.

Requisites

A character encoding scheme for processing multilingual strings; PANDITHAM is used.

A protocol for sending mails; SMTP (RFC 821) is used.

A format for mail messages; the Internet text message format (RFC 822) is used.

Region-based, 16-bit, character-oriented fonts.

Introduction: PANDITHAM 4.0 - An overview

PANDITHAM is an acronym for the Protocol for ApplicatioNs Development In THAmizh and Multilingual computing.

It was developed, and is still being improved, by Dr. P. Navaneethan.

It gives life to individual characters in each language by assigning them unique values (within that language).


PANDITHAM Principle …

I want ARUN! In English

Drop the letters into slots 1, 18, 21 and 14 of the 26-slot English (ENG) table (A = 1, R = 18, U = 21, N = 14).

[Figure: the letters A, R, U and N dropped into their slots in the ENG table (1 .. 26), shown beside the 247-slot Thamizh (TAM) table (1 .. 247)]
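Here is a minimal sketch of the principle for English. It is illustrative only and is not code from the project. Each letter's slot is its position in the 26-letter ENG table, so ARUN becomes 1, 18, 21, 14.

```python
from __future__ import annotations


def eng_slots(word: str) -> list[int]:
    """Return the 1-based slot of each letter in the ENG table (A = 1 ... Z = 26)."""
    slots = []
    for ch in word.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"{ch!r} is not an English letter")
        slots.append(ord(ch) - ord("A") + 1)
    return slots


print(eng_slots("ARUN"))  # [1, 18, 21, 14]
```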

Now I want அருண் (ARUN) in Tamil

Drop the letters into slots 1, 161 and 91 of the 247-slot Thamizh (TAM) table (அ = 1, ரு = 161, ண் = 91).

[Figure: the letters அ, ரு and ண் dropped into their slots in the TAM table (1 .. 247), shown beside the ENG table (1 .. 26)]

Why Protocol?

PANDITHAM packs the 247 Thamizh characters into an 8-bit space as follows:

அ = 9, ஆ = 10, இ = 11, ஈ = 12, ..., ஔ = 20, ஃ = 21, க = 22, கா = 23, ..., கௌ = 33, க் = 34, ங = 35, ..., ஞு = 65, ..., னௌ = 254, ன் = 255

This lexical order has been taken from திருக்குறள் தெளிவுரை (Thirukkural Thelivurai) by Dr. M. Varadarasanar.

The value 65 stands for 'A' in ASCII and for 'ஞு' in Thamizh. To resolve this ambiguity, the machine needs a set of rules. Hence the protocol.

The remaining nine values (0-8) are used as PANDITHAM control and punctuation characters.
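The code values above follow a regular pattern:

- The 12 vowels take 9-20.
- ஃ takes 21.
- Each of the 18 consonants then gets a block of 13 codes: the consonant with each of the 12 vowel signs, followed by the bare consonant with pulli. The last block ends exactly at 255.

The sketch below rebuilds the THAMIZH1 table from that pattern. The formula is inferred from the listed values; the slides do not state it. Unicode strings are used only for display. The sketch also shows the byte-65 ambiguity, and that the slot numbers 1, 161 and 91 used earlier are simply code - 8.

```python
from __future__ import annotations

# Lexical order inferred from the values on the slide.
VOWELS = ["அ", "ஆ", "இ", "ஈ", "உ", "ஊ", "எ", "ஏ", "ஐ", "ஒ", "ஓ", "ஔ"]
AYTHAM = "ஃ"
CONSONANTS = ["க", "ங", "ச", "ஞ", "ட", "ண", "த", "ந", "ப",
              "ம", "ய", "ர", "ல", "வ", "ழ", "ள", "ற", "ன"]
# Unicode vowel signs in the same order as VOWELS ("" = inherent a):
# ா ி ீ ு ூ ெ ே ை ொ ோ ௌ
SIGNS = ["", "\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
         "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]
PULLI = "\u0bcd"  # ் marks a consonant without a vowel


def build_tm1_table() -> dict[int, str]:
    """Map each THAMIZH1 byte value (9-255) to the letter it stands for."""
    table = {9 + i: v for i, v in enumerate(VOWELS)}   # 9 .. 20
    table[21] = AYTHAM
    for c_idx, cons in enumerate(CONSONANTS):          # 22 .. 255
        base = 22 + 13 * c_idx
        for s_idx, sign in enumerate(SIGNS):
            table[base + s_idx] = cons + sign
        table[base + 12] = cons + PULLI
    return table


TM1 = build_tm1_table()

# Same byte, two meanings: hence the need for language-switching rules.
print(chr(65), TM1[65])                     # A ஞு
# Slot n (1-247) on the earlier slides corresponds to byte n + 8.
print([TM1[n + 8] for n in (1, 161, 91)])   # அ, ரு, ண்
```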

The Rule

The escape character DLC (Default Language Code) is the first byte of a PANDITHAM string and is followed by the code assigned to that language.

This is followed by a monolingual string; if need be, the string can switch to a different language.

Language switching can be done in two ways:

Way 1: If there are at least two consecutive characters in the new language, the escape character DLC is used again.

Way 2: If there is exactly one character in the new language, the escape character MLC (Momentary Language Code) is used. This escape character conveys that the language switch is momentary: after that one character, the default language applies again.

Language codes:

ASCII - 05h

THAMIZH1 - 08h

THAMIZH2 - 09h

TELUGU - 0Ah

KANNADA - 0Bh

MALAYALAM - 0Ch
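A minimal decoder for these rules might look like the sketch below. The slides list the language codes, but not the byte values of DLC, MLC, SP and NULL. They only say that 0-8 are reserved for control characters, so those four values are hypothetical placeholders.

Two further assumptions:

- Telugu, Kannada and Malayalam are treated as two-byte languages, following the DBCS modification of the rule.
- Their upper bytes never fall in the 0-8 control range.

```python
from __future__ import annotations

# Hypothetical control values: the slides only say that 0-8 are reserved.
NULL, DLC, MLC, SP = 0x00, 0x01, 0x02, 0x03
# Language codes as listed on the slide.
ASC, TM1, TM2, TEL, KAN, MAL = 0x05, 0x08, 0x09, 0x0A, 0x0B, 0x0C
DBCS_LANGS = {TEL, KAN, MAL}   # two bytes per character: value = U * 256 + L


def decode(data: bytes) -> list[tuple[int, int]]:
    """Split a PANDITHAM string into (language code, character value) pairs.

    A space is returned as (language in effect, SP).
    """
    if not data or data[0] != DLC:
        raise ValueError("a PANDITHAM string must start with DLC")
    chars = []
    default = None
    i = 0
    while i < len(data) and data[i] != NULL:
        if data[i] == DLC:            # Way 1: switch the default language
            default = data[i + 1]
            i += 2
            continue
        lang = default
        if data[i] == MLC:            # Way 2: switch for the next character only
            lang = data[i + 1]
            i += 2
        if data[i] == SP:
            chars.append((lang, SP))
            i += 1
        elif lang in DBCS_LANGS:
            chars.append((lang, data[i] * 256 + data[i + 1]))
            i += 2
        else:
            chars.append((lang, data[i]))
            i += 1
    return chars


# The 15-byte example from the slides (control byte values hypothetical).
example = bytes([DLC, TM1, 0x8C, 0x6B, MLC, TM2, 0xE4, 0xBF, 0xA5, 0xFF,
                 SP, MLC, ASC, 0x52, NULL])
for lang, value in decode(example):
    print(f"language {lang:02X}h  value {value:02X}h")
```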

Example:

String = மாதேஸ்வரன் R

DLC TM1 மா தே MLC TM2 ஸ் வ ர ன் SP MLC ASC R

DLC TM1 8C 6B MLC TM2 E4 BF A5 FF SP MLC ASC 52 NULL

Length of the string: 15 bytes
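Going the other way, an encoder has to choose DLC or MLC by looking ahead at how long the run of characters in the new language is. This sketch makes the following assumptions:

- The byte values for DLC, MLC, SP and NULL are hypothetical. The language codes and character bytes are taken from the slides.
- The first character always gets DLC, since every string must start with it.
- A string that starts with a space is not handled.

Fed the example above, the encoder reproduces the 15-byte sequence.

```python
from __future__ import annotations

NULL, DLC, MLC, SP = 0x00, 0x01, 0x02, 0x03      # hypothetical control values
ASC, TM1, TM2 = 0x05, 0x08, 0x09                 # language codes from the slide


def encode(chars: list[tuple[int | None, int]]) -> bytes:
    """Encode (language, value) pairs; language None marks a space (SP)."""
    out = bytearray()
    default = None
    i = 0
    while i < len(chars):
        lang, value = chars[i]
        if lang is None:                       # space: no language switch needed
            out.append(SP)
            i += 1
            continue
        run = 1                                # length of this same-language run
        while i + run < len(chars) and chars[i + run][0] == lang:
            run += 1
        if lang != default:
            if default is None or run >= 2:    # Way 1: switch the default
                out += bytes([DLC, lang])
                default = lang
            else:                              # Way 2: momentary switch
                out += bytes([MLC, lang])
        for _, v in chars[i:i + run]:
            # values above 255 are DBCS characters: U = v // 256, L = v % 256
            out += bytes(divmod(v, 256)) if v > 255 else bytes([v])
        i += run
    out.append(NULL)
    return bytes(out)


# மாதேஸ்வரன் R, with the character bytes given on the slide
name = [(TM1, 0x8C), (TM1, 0x6B), (TM2, 0xE4), (TM1, 0xBF), (TM1, 0xA5),
        (TM1, 0xFF), (None, SP), (ASC, 0x52)]
print(len(encode(name)))   # 15
```

Choosing by run length keeps a single borrowed Grantha letter at two bytes of overhead (MLC plus code). This avoids the four bytes a DLC switch and a switch back would cost.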

The Rule (contd.)

The protocol rules are modified slightly to accommodate Telugu, Kannada, etc. better, by using a scheme similar to the one followed for Japanese, namely DBCS (Double Byte Coding Scheme).

DLC TEL U L U L … DLC TM1 B B B … NULL

The value of a Telugu letter is U * 256 + L, where U is the upper byte and L the lower byte; a THAMIZH1 letter (B) still takes a single byte.
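The DBCS value formula is simple arithmetic. Here is a sketch of packing and unpacking it; the sample bytes are arbitrary.

```python
from __future__ import annotations


def dbcs_value(upper: int, lower: int) -> int:
    """Combine the upper byte U and lower byte L into one letter value."""
    if not (0 <= upper <= 255 and 0 <= lower <= 255):
        raise ValueError("U and L must each fit in one byte")
    return upper * 256 + lower


def dbcs_bytes(value: int) -> tuple[int, int]:
    """Split a 16-bit letter value back into (U, L)."""
    return divmod(value, 256)


print(dbcs_value(0x0C, 0x2C))   # 3116
print(dbcs_bytes(3116))         # (12, 44)
```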

Design of Region and Language Databases

[Figure: E-R diagram: a Region has N Fonts (1:N), and a Region has N Languages (1:N)]

Structures of Database

Type Region
    regionCode   As Byte     /* Unique region code */
    regionName   As String   /* Name of the region */
    defaultFont  As int      /* Default font code of the region */
End Type

Type Region_Fonts
    fontCode     As int      /* Unique font code */
    fontName     As String   /* Name of the font */
    regionCode   As Byte     /* Font association to a region */
End Type

Structures of Database (Contd.)

Type Language                /* Language record structure */
    langCode     As Byte     /* Unique language code */
    langName     As String   /* Language name */
    regionCode   As Byte     /* Unique region code */
    DBCS         As Boolean  /* Double Byte Coding Scheme */
    fontLocation As Char     /* Location in the 16-bit font table; sizeof (Char) is 2 bytes */
    fontUnits    As Byte     /* No. of font units; 1 unit = 256 slots */
    weight       As Byte     /* Language weight for sorting */
End Type
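For illustration, the three record types could be written as Python dataclasses; the project itself was written in Java. Field meanings follow the comments above. The `fonts_of` helper, the region name and the codes in the sample rows are hypothetical. The THAMIZH1 code and font location come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Region:
    region_code: int      # Byte: unique region code
    region_name: str      # name of the region
    default_font: int     # default font code of the region


@dataclass
class RegionFont:
    font_code: int        # unique font code
    font_name: str        # name of the font
    region_code: int      # font association to a region (1 region : N fonts)


@dataclass
class Language:
    lang_code: int        # Byte: unique language code
    lang_name: str        # language name
    region_code: int      # 1 region : N languages
    dbcs: bool            # uses the Double Byte Coding Scheme
    font_location: int    # start slot in the 16-bit font table (a 2-byte Char)
    font_units: int       # number of font units, 1 unit = 256 slots
    weight: int           # language weight for sorting


def fonts_of(region: Region, fonts: list[RegionFont]) -> list[str]:
    """Hypothetical helper: names of all fonts associated with a region."""
    return [f.font_name for f in fonts if f.region_code == region.region_code]


# Illustrative rows only (region name, font codes and weight are made up).
south = Region(region_code=1, region_name="South Indian", default_font=1)
fonts = [RegionFont(1, "Muhil", 1), RegionFont(2, "Aruvi", 1),
         RegionFont(3, "Thamizhan", 1)]
thamizh1 = Language(0x08, "THAMIZH1", 1, dbcs=False,
                    font_location=0x8000, font_units=1, weight=1)
print(fonts_of(south, fonts))   # ['Muhil', 'Aruvi', 'Thamizhan']
```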

Slots allotted in 16-bit Font table

0000H - 7FFFH: Universal

8000H - 80FFH: THAMIZH1 (Pure Thamizh)

8100H - 81FFH: THAMIZH2 (Grantha)

8200H onwards: Telugu, followed by the other languages in this region (Kannada, Malayalam, etc.) up to FFFFH; the figure also marks the boundaries 8AFFH and 8BFFH
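Given a language's fontLocation and fontUnits from the Language record, a character value could be placed in the 16-bit font table by simple offsetting. The sketch assumes two things:

- The slot is fontLocation + value.
- A value must fit within fontUnits × 256 slots.

This mapping is an assumption consistent with the record fields and the slot table; the slides do not spell it out. The THAMIZH1 start of 8000H and THAMIZH2 start of 8100H (one unit each) are taken from the table.

```python
from __future__ import annotations

UNIT = 256  # slots per font unit, per the Language record


def font_slot(font_location: int, font_units: int, value: int) -> int:
    """Map a character value of a language to its slot in the 16-bit font table."""
    if not 0 <= value < font_units * UNIT:
        raise ValueError(f"value {value:#x} outside this language's {font_units} unit(s)")
    slot = font_location + value
    if slot > 0xFFFF:
        raise ValueError("slot beyond the 16-bit font table")
    return slot


THAMIZH1 = (0x8000, 1)   # (fontLocation, fontUnits) from the slot table
THAMIZH2 = (0x8100, 1)

print(hex(font_slot(*THAMIZH1, 0x8C)))   # 0x808c
print(hex(font_slot(*THAMIZH2, 0xE4)))   # 0x81e4
```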

Merits

Language-based ordering is feasible

Network congestion is low

Lexical-order sorting is easy

No mis-scripting (e.g. eci)

No kerning problem (e.g. ]f)

Ease of speech synthesis
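PANDITHAM assigns codes in lexical order, so sorting monolingual strings reduces to comparing their byte sequences. Here is a minimal sketch using a handful of the THAMIZH1 code values listed on the "Why Protocol?" slide. Words are given as tuples of letters, since one Thamizh letter such as கா is two Unicode code points.

```python
from __future__ import annotations

# A few THAMIZH1 codes from the slide ("க\u0bbe" is கா, "க\u0bcd" is க்).
CODE = {"அ": 9, "ஆ": 10, "இ": 11, "ஈ": 12,
        "க": 22, "க\u0bbe": 23, "க\u0bcd": 34, "ங": 35}


def panditham_bytes(word: tuple[str, ...]) -> bytes:
    """Encode a word (tuple of letters) as its THAMIZH1 byte string."""
    return bytes(CODE[letter] for letter in word)


words = [("க\u0bbe",), ("அ", "க\u0bcd"), ("ங",), ("ஆ",), ("க",)]
for w in sorted(words, key=panditham_bytes):   # plain byte-wise comparison
    print("".join(w))
```

No collation table is needed: the sort key is just the stored bytes.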

PANDITHAM as applied to other languages

Name: Krishna Reddy
PANDITHAM rep.: DLC ASC K r i s h n a R e d d y NULL
Length: 16 bytes

Name: கிருஷ்ணா ரெட்டி
PANDITHAM rep.: DLC TM1 கி ரு MLC TM2 ஷ் ணா SP ரெ ட் டி NULL
Length: 13 bytes

Name: h.v{^w¨ UvxD«
PANDITHAM rep.: DLC KAN .v{ ^w¨ Uvx D« NULL
Length: 12 bytes

On average, representing Kannada or Telugu strings in PANDITHAM may require about 2 bytes per character.

Storage Requirements

Monolingual (e.g. திருக்குறள், any English text): 1 byte / character

Bilingual (Thamizh & Grantha):
Best case: most of the characters belong to the same language, e.g. திருக்குறள்: 1 byte / character
Worst case: alternate letters switch between the two languages, e.g. haihErxf: 2 bytes / character
Likely case: 1.1 bytes / character

Multilingual: the average will depend on the languages present. In the worst case the string is multilingual to the core, i.e., alternate letters switch between different languages, e.g. ½ ai h Uv n (22 bytes):
DLC HIN ½ MLC TM1 ai MLC TM2 h MLC KAN Uv MLC ENG n NULL
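The best-case and worst-case figures follow from the switching rule. A change of language costs two bytes: DLC or MLC, plus the language code. So a string whose letters alternate between two languages costs about 2 bytes per letter, while a monolingual one approaches 1.

The sketch below does that arithmetic under two assumptions: one byte per letter (no DBCS languages), and a space counted as a letter of the current language. Under those assumptions it reproduces the 16-byte and 13-byte lengths of the Krishna Reddy examples.

```python
from __future__ import annotations


def panditham_size(langs: list[str]) -> int:
    """Bytes needed when letter i is in language langs[i] (1 byte per letter)."""
    size = 0
    default = None
    i = 0
    while i < len(langs):
        run = 1                                 # consecutive letters in langs[i]
        while i + run < len(langs) and langs[i + run] == langs[i]:
            run += 1
        if langs[i] != default:
            size += 2                           # DLC or MLC, plus the language code
            if default is None or run >= 2:     # DLC changes the default; MLC does not
                default = langs[i]
        size += run
        i += run
    return size + 1                             # terminating NULL


mono = ["TM1"] * 100
alternating = ["TM1", "TM2"] * 50
print(panditham_size(mono) / 100)          # 1.03
print(panditham_size(alternating) / 100)   # 2.03
```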

Performance of various schemes - A Comparison

Schemes compared (in this order): 7-bit ASCII / 8-bit ASCII (glyph based) / Unicode (ISCII based) / Unicode (char based) / PANDITHAM

Lingual: Mono (English) / Bi / Multi / Multi / Multi

Storage per char: 1 byte / 1-3 bytes / 3.5 bytes for Thamizh / 2 bytes / best case 1 byte, worst case 2 bytes, likely case 1.1 bytes

Network congestion: Very low / Very high / Extremely high / Very high / Low

Lexical order sorting: Simple / Complex, parsing required / Complex / Simple / Simple

Flexibility in language ordering?: N.A. / No / No / No / Yes

Speech synthesis: Difficult / Complex, parsing required and discontinuous / Complex / Simple / Simple

Random processing of letters?: Yes / No / No / Yes / Simple for monolingual (pure Thamizh)

Features of various schemes

Schemes compared (in this order): 7-bit ASCII / 8-bit ASCII (glyph based) / Unicode (ISCII based) / Unicode (char based) / PANDITHAM

Lingual: Mono (English) / Bilingual / Indian multilingual / Multi / Multi

Character rendering: Simple / Simple, but not always; time consuming / Parsing required, time consuming / Simple / Simple

Eliminates kerning problem? (e.g. ]ffff): Yes / No / May be / Yes / Yes

Eliminates mis-script problem? (e.g. eci): Yes / No / Yes / Yes / Yes

An Overview of E-mail

Mail Transfer Agents (MTAs)
Permanent programs that run on the hosts
Listen for E-mail
Save the E-mail for the local users
Host computers that run MTAs are also known as mail servers

Mail User Agents (MUAs)
Run by the user to send or receive E-mails
Provide an interface to view the E-mail
Facilitate communication with the MTA

Study of Existing Multilingual E-mail Providers

IndoMail (by Lastech Systems):

It is a Mail User Agent (client program).

Supports 12 Indian languages, including Thamizh.

Supports various keyboard layouts, including the Tamil99 keyboard.

The mail message is dispatched in Rich Text Format, which is very costly.

Fonts are attached with the mail.

Glyph-based editing (the glyphs 'N' and 'ÿ' make 'Nÿ').

Mis-scripting and kerning problems are encountered (e.g. Eci; XI, ]f).

www.bharatmail.com:

It is a web-based mail service.

Multilingual E-mail.

No standard keyboard layout is used.

The mail message is converted to an image (.gif) and sent to the destination.

Mis-scripting and kerning problems were identified.

The end user must know English to send E-mail in Thamizh, since the service uses transliteration technology, i.e., 'ka' becomes 'க' and 'amma' becomes 'அம்மா'.

An Overview of SMTP

Protocol used for communication between MTAs, and between MUAs and MTAs

Objective is to transfer mail reliably and efficiently

Clients use 4-letter commands to communicate with the server

The server responds with a 3-digit numeric code

SMTP servers usually listen on port 25
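Here is a sketch of the two message shapes just described: the client's 4-letter command word and the server's 3-digit reply code. The reply classes by first digit are those defined in RFC 821. Only single-line replies are handled; multi-line replies put a hyphen after the code.

```python
from __future__ import annotations

# Meaning of the first digit of a reply code, as defined in RFC 821.
REPLY_CLASSES = {
    2: "positive completion",
    3: "positive intermediate",
    4: "transient negative completion",
    5: "permanent negative completion",
}


def parse_reply(line: str) -> tuple[int, str]:
    """Split a single-line reply such as '250 PMS-Ok' into (code, text)."""
    code, _, text = line.rstrip("\r\n").partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not a single-line SMTP reply: {line!r}")
    return int(code), text


def command_verb(line: str) -> str:
    """The 4-letter command word that starts a client line, e.g. 'MAIL'."""
    return line[:4].upper()


code, text = parse_reply("354 PMS-Start mail input\r\n")
print(code, REPLY_CLASSES[code // 100])   # 354 positive intermediate
print(command_verb("rcpt TO:<admin>"))    # RCPT
```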

The SMTP Model

[Figure: the User works with the Sender-SMTP, which exchanges SMTP commands/replies and mail with the Receiver-SMTP; each SMTP side has its own File System]

Opening and Closing Connection

Sender-SMTP (MUA) = S, Receiver-SMTP (MTA) = R

R: 220 Ready
S: HELO <panditham>
R: 250 PMS-Ok
S: QUIT
R: 221 PMS-service closing transmission channel

Sending Mail

Sender-SMTP (MUA) = S, Receiver-SMTP (MTA) = R

S: MAIL FROM:<panditham>
R: 250 PMS-Ok
S: RCPT TO:<ØÊå@Nt>
R: 250 PMS-Ok
S: DATA
R: 354 PMS-Start mail input
S: <Happy Thamizh new year é×Ìʵ Og×ÏZNt>
S: <CRLF>.<CRLF>
R: 250 PMS-Ok
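To make the two dialogues concrete, the toy Receiver-SMTP below replays them in memory. It is only a sketch:

- It has no sockets and simplifies command parsing.
- Only the "PMS" reply texts come from the slides.
- The 500 and 503 replies are standard RFC 821 codes added for unexpected input. They are not behaviour documented for the PANDITHAM Mail Server.

```python
from __future__ import annotations


class ToyReceiverSMTP:
    """In-memory Receiver-SMTP answering like the dialogue on the slides."""

    GREETING = "220 Ready"

    def __init__(self) -> None:
        self.mailbox = []          # delivered (sender, recipients, lines) tuples
        self.in_data = False
        self._reset()

    def _reset(self) -> None:
        self.sender = None
        self.recipients = []
        self.lines = []

    def handle(self, line: str) -> str | None:
        """Process one line from the Sender-SMTP; return the reply, if any."""
        if self.in_data:
            if line == ".":                        # <CRLF>.<CRLF> ends the mail
                self.mailbox.append((self.sender, self.recipients, self.lines))
                self.in_data = False
                self._reset()
                return "250 PMS-Ok"
            self.lines.append(line)                # no reply to data lines
            return None
        cmd = line.upper()
        if cmd.startswith("HELO"):
            return "250 PMS-Ok"
        if cmd.startswith("MAIL FROM:"):
            self._reset()
            self.sender = line[len("MAIL FROM:"):]
            return "250 PMS-Ok"
        if cmd.startswith("RCPT TO:"):
            if self.sender is None:
                return "503 Bad sequence of commands"
            self.recipients.append(line[len("RCPT TO:"):])
            return "250 PMS-Ok"
        if cmd == "DATA":
            if not self.recipients:
                return "503 Bad sequence of commands"
            self.in_data = True
            return "354 PMS-Start mail input"
        if cmd == "QUIT":
            return "221 PMS-service closing transmission channel"
        return "500 Syntax error, command unrecognized"


session = ["HELO <panditham>", "MAIL FROM:<panditham>", "RCPT TO:<admin>",
           "DATA", "Happy Thamizh new year", ".", "QUIT"]
smtp = ToyReceiverSMTP()
print(smtp.GREETING)
for line in session:
    reply = smtp.handle(line)
    if reply is not None:
        print(f"{line} -> {reply}")
```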

E-R Diagram

[Figure: the entities User and Message, linked by the relationship Has with cardinality N:M]

Table Design

Table: USER

Name             Type     Size (in bytes)
Record_size      Short    2
User_id_no       Integer  4
Password         Byte     Var
User_id          Byte     Var
User_name        Byte     Var
Sex              Byte     1
Alternate_email  Byte     Var
Contact_phone    Byte     Var
MailDayBox       Byte     1

Table: USER_MESSAGE_<0-6>

Name          Type     Size (in bytes)
Record_size   Short    2
User_id_no    Integer  4
Message_id    Long     8
IsRead        Boolean  1
IsUrgent      Boolean  1
Date          Byte     Var
Sender        Byte     Var
Subject       Byte     Var

Table: MESSAGE

Name          Type     Size (in bytes)
Record_size   Long     8
Message_id    Long     8
Message_data  Byte     Var
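Here is a sketch of how a USER_MESSAGE_<0-6> record could be serialized from this layout. The fixed-size fields follow the table. The following are assumptions:

- Big-endian byte order, which is how Java's DataOutputStream writes shorts, ints, longs and booleans.
- A 2-byte length prefix on each variable-length field.
- Record_size counting the whole record.

```python
import struct

# Fixed-size part of a USER_MESSAGE_<0-6> record, in table order:
# Record_size (Short, 2) | User_id_no (Integer, 4) | Message_id (Long, 8)
# | IsRead (Boolean, 1) | IsUrgent (Boolean, 1); '>' = big-endian, no padding.
FIXED = struct.Struct(">hiq??")


def pack_user_message(user_id_no: int, message_id: int, is_read: bool,
                      is_urgent: bool, date: bytes, sender: bytes,
                      subject: bytes) -> bytes:
    # Assumption: each "Byte Var" field is stored with a 2-byte length prefix.
    var = b"".join(struct.pack(">h", len(f)) + f for f in (date, sender, subject))
    return FIXED.pack(FIXED.size + len(var), user_id_no, message_id,
                      is_read, is_urgent) + var


def unpack_user_message(buf: bytes) -> tuple:
    size, user_id_no, message_id, is_read, is_urgent = FIXED.unpack_from(buf, 0)
    offset, fields = FIXED.size, []
    for _ in range(3):                         # Date, Sender, Subject
        (n,) = struct.unpack_from(">h", buf, offset)
        fields.append(buf[offset + 2:offset + 2 + n])
        offset += 2 + n
    return (size, user_id_no, message_id, is_read, is_urgent, *fields)


record = pack_user_message(7, 42, False, True, b"06/12/2000", b"admin", b"Hello")
print(FIXED.size, len(record))   # 16 42
print(unpack_user_message(record))
```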

Usage of MailDayBox Field

[Figure: bits 0-7 of the 1-byte MailDayBox field, shown against the tables USER_MESSAGE_0 ... USER_MESSAGE_6]
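One reading of the MailDayBox figure is one bit per day table: bit d of the 1-byte field is set when the user has mail in USER_MESSAGE_d. Days are numbered from 0 = Sunday, as in the Day: header. This interpretation is an assumption. Here is a sketch of the bit handling:

```python
from __future__ import annotations


def mark_day(box: int, day: int) -> int:
    """Set the bit for USER_MESSAGE_<day> (day 0-6, 0 = Sunday)."""
    if not 0 <= day <= 6:
        raise ValueError("day must be 0-6")
    return box | (1 << day)


def clear_day(box: int, day: int) -> int:
    """Clear the bit once that day's table holds no more mail for the user."""
    return box & ~(1 << day) & 0xFF


def tables_to_scan(box: int) -> list[str]:
    """Only the day tables whose bit is set need to be searched."""
    return [f"USER_MESSAGE_{d}" for d in range(7) if box & (1 << d)]


box = 0
box = mark_day(box, 0)   # mail filed in USER_MESSAGE_0 (Sunday)
box = mark_day(box, 3)   # mail filed in USER_MESSAGE_3 (Wednesday)
print(bin(box), tables_to_scan(box))   # 0b1001 ['USER_MESSAGE_0', 'USER_MESSAGE_3']
```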

Format of E-mail message

An E-mail consists of a header part and a message part.

The headers are terminated by a null (empty) line.

The message is terminated by <CRLF>.<CRLF>.

Headers used:

To:, Cc:, From:, Subject:, Day:, Date:, Urgent:, ImmDel:, Delivery-Date:, Received:
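Here is a minimal parser for this format. It splits at the first empty line and stops the body at <CRLF>.<CRLF>. Header names are compared case-insensitively, since the sample message writes Immdel where the header list says ImmDel. Folded (continued) header lines are not handled. The Day: numbering (0 = Sunday) is the one given in the sample.

```python
from __future__ import annotations

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def parse_message(raw: str) -> tuple[dict[str, str], str]:
    """Split a raw message into (headers, body).

    Headers end at the first empty line; the body ends at <CRLF>.<CRLF>.
    """
    head, _, rest = raw.partition("\r\n\r\n")
    body, _, _ = rest.partition("\r\n.\r\n")
    headers = {}
    for line in head.split("\r\n"):
        name, _, value = line.partition(":")   # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers, body


raw = ("To: admin\r\nCc:\r\nSubject: Panditham\r\nDay: 3\r\n"
       "Urgent: 0\r\nImmdel: 1\r\n\r\nHello,\r\nThank you.\r\n.\r\n")
headers, body = parse_message(raw)
print(DAYS[int(headers["day"])])   # Wednesday
print(headers["urgent"] == "1")    # False: not urgent
print(headers["immdel"] == "1")    # True: delete on read
```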

A sample mail message

To: 5ì×
Cc:
From: "Administrator" <admin>
Subject: åʳËþ/Panditham
Day: 3 /* 0-Sunday, 1-Monday and so on */
Date: 06/12/2000
Urgent: 0 /* 0-Not Urgent, and 1-Urgent */
Immdel: 0 /* 0-Do Nothing, and 1-Delete on Read */
Delivery-Date: Wed Dec 06 11:31:15 GMT+05:30 2000
Received: from panditham1/164.16.18.181

A½é¹*@/Hello,

åʳËþ ô½±°hM oVZON åÍR n3ËúZR ôZN ؽw. ËgNt I'gNlZR "åʳËþ" G½u öNN6ZR ô½±°hM AµñåRþ.

Thank you for registering with Panditham Mail Service. Mail "admin" for any clarification and/or help.

ؽw/Regards,
åʳËþ R_/The Panditham Team.

Supporting tools

Thamizh keyboard driver in the form of a component

Tool for language database maintenance

Multilingual text components

Multilingual password component

Multilingual message box

Tool for creating interface data

16-bit fonts Muhil, Aruvi and Thamizhan have been developed

Features of PANDITHAM Mail Server

Understands multilingual strings

Uses the character-oriented protocol PANDITHAM for processing multilingual strings

Registers new users in multilingual form, i.e., 4ù×_J is a valid user-id

Handles multiple users at the same time

Provides tools to monitor the clients

Supports unique features such as urgent mail and delete-on-read mail

Features of PANDITHAM Mail Client

Provides a multilingual user interface

Character-oriented data entry

No kerning problem (e.g. XI, ]f)

No mis-scripting (e.g. Eci)

Optimized data transfer for multilingual strings

Uses the Tamil99 keyboard layout (phonetic) for Thamizh

English mails can be sent to other mail servers

Also features a POP3 client facility, so that mails can be read from other servers that support POP3

Conclusion

The multilingual handling of the system is highly reliable, as its core engine is based on PANDITHAM.

The system can easily accommodate new languages when the appropriate keyboard drivers are provided in the form of components.

The system can be further improved by incorporating a POP3 server feature in the PANDITHAM Mail Server.


ؽw (Regards)

Visit: www.psgtech.ac.in/panditham/

Mail: panditham@yahoo.com
