Post on 30-May-2018


BiDi in the Wild: Challenges of the Unicode Bidirectional Algorithm

Moriel Schottlender

Software Engineer

Wikipedia’s Right-to-Left support

Right-to-Left Wikipedias

● ~260 Wikipedias in Left-to-Right

● ~17 Wikipedias in Right-to-Left

Arabic Wikipedia: ~1,000,000 users, ~375,000 articles

Persian Wikipedia: ~514,000 users, ~460,000 articles

Hebrew Wikipedia: ~277,000 users, ~175,000 articles

Editing Right-to-Left Wikipedias

A brief history of Right-to-Left support online

Long long ago

● Computers mostly only knew Left-to-Right

● Supporting non-Latin scripts required special fonts

● There was no real Right-to-Left support

Solution: Writing backwards

enod eb ot dah gnihtemoS

Something had to be done

Pre-BiDi Solution: Visual and Logical encoding

● Visual: שלום עולם

● Logical: שלום עולם

Char order: 1 2 3 4

Char order: 4 3 2 1

(Someone had to type this backwards!)

Unicode Bidirectional Algorithm

Unicode Bidirectional Algorithm

http://unicode.org/reports/tr9/

If all text on a page is uniform (all RTL or all LTR) the ordering of the display text is unambiguous.

● RTL content can include digits (written LTR)

● RTL content can be mixed with LTR content

Examples:

אני הולכת להרצות בUnicode Conference בSanta Clara

(I am going to give a talk at the Unicode Conference in Santa Clara)

צריך להתקשר ל 555-123-4567

(Need to call 555-123-4567)

The Bidirectional Algorithm is meant to solve ambiguity in rendering order.

Quick primer to BiDi entity types

● Strong: Affect the directionality of entities around them (Alphabet)

● Weak: Do not affect the directionality of entities around them (Punctuation*, digits)

● Neutral: Take the directionality of the context they’re in (Space, newline, tab, etc)

http://unicode.org/reports/tr9/
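The three categories in the primer can be inspected with Python's standard `unicodedata` module, which exposes each character's Bidi_Class. The sketch below is only a rough illustration: the grouping of classes into the sets `STRONG`, `WEAK` and `NEUTRAL` and the helper name `bidi_category` are assumptions made for this example. Note that UAX #9 puts many punctuation marks, such as parentheses, in the "Other Neutral" class rather than among the weak types.

```python
import unicodedata

# Coarse grouping of Bidi_Class values into the three categories on the slide.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}


def bidi_category(ch):
    """Return 'strong', 'weak', 'neutral' or 'explicit' for one character."""
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG:
        return "strong"
    if cls in WEAK:
        return "weak"
    if cls in NEUTRAL:
        return "neutral"
    # Embedding, override and isolate controls, or unassigned code points.
    return "explicit"


for ch in ["a", "ש", "7", "+", " ", "("]:
    print(repr(ch), unicodedata.bidirectional(ch), bidi_category(ch))
```

Running it shows, for instance, that a Hebrew letter is strong (class R), a digit is weak (class EN) and a space is neutral (class WS).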

עברית 123

Numbers

RTL

עברית 1 23

LTR

RTL

LTR

(Whitespace is neutral)

עברית 1 2 3

RTL

Text and numbers

English 1 2 3 Hebrew 1 2 3 English

Strong Weak Strong Weak Strong

English 1 2 3 3 2 1 עברית English

LTR RTL LTR
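The LTR / RTL / LTR split on this slide can be approximated in a few lines. This is a deliberately rough sketch and not the UAX #9 algorithm: it assumes that every weak or neutral character simply inherits the direction of the last strong character before it, which is enough to reproduce this example but ignores the real rules for numbers, separators and neutrals. The function name `rough_runs` is hypothetical.

```python
import unicodedata
from itertools import groupby


def rough_runs(text, base="LTR"):
    """Group text into (direction, substring) runs.

    Simplification: strong characters set the direction; every other
    character inherits the direction of the last strong character seen
    (or the base direction if none has been seen yet).
    """
    current = base
    tagged = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            current = "LTR"
        elif cls in ("R", "AL"):
            current = "RTL"
        tagged.append((current, ch))
    return [
        (direction, "".join(ch for _, ch in group))
        for direction, group in groupby(tagged, key=lambda item: item[0])
    ]


runs = rough_runs("English 1 2 3 עברית 1 2 3 English")
print([direction for direction, _ in runs])
```

The digits after "English" stay in the LTR run, while the digits after the Hebrew word are pulled into the RTL run, which is why they appear on the other side of it when rendered.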

The confusing issue of the parentheses

Demo

Parentheses

(hello) (שלום)

Good luck with HTML

Or math comparisons

<a href="http://wikipedia.org" title="foo">bar</a>

LTR

<a href="http://wikipedia.org" title="foo">שלום</a>

LTR RTL LTR

<a href="http://wikipedia.org" title="אהלן">שלום</a>

LTR RTL LTR

[[קובץ:Moriel schottlender.jpg|250px|שמאל|זו אני!]]

([[File:Moriel schottlender.jpg|250px|left|This is me!]])

RTL LTR RTL RTL

Your brain on BiDi

Credit: U.S. Navy photo by Photographer’s Mate 2nd Class Aaron Peterson. Public Domain.

https://commons.wikimedia.org/wiki/File:US_Navy_020712-N-5471P-010_EOD_teams_detonate_expired_ordnance_in_the_Kuwaiti_desert.jpg

The tale of an animated bitmap

fig.bmp

\u202epmb.gif

#!/usr/bin/env python
# U+202E (RIGHT-TO-LEFT OVERRIDE) makes the name "pmb.gif" display as "fig.bmp".
import shutil
shutil.copy("animated.gif", u"\u202Epmb.gif")

Code by David Chan
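A file name like the one above can be checked for explicit bidi formatting characters before it is shown to a user. The following is a minimal sketch, assuming that flagging or stripping these characters is acceptable for the application; the helper names `find_bidi_controls` and `strip_bidi_controls` are hypothetical and not taken from the talk.

```python
import unicodedata

# Explicit bidi formatting characters defined in UAX #9.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(name):
    """Return (index, Unicode name) for each explicit bidi control in name."""
    return [
        (index, unicodedata.name(ch))
        for index, ch in enumerate(name)
        if ch in BIDI_CONTROLS
    ]


def strip_bidi_controls(name):
    """Return name with all explicit bidi controls removed."""
    return "".join(ch for ch in name if ch not in BIDI_CONTROLS)


suspicious = "\u202epmb.gif"
print(find_bidi_controls(suspicious))
print(strip_bidi_controls(suspicious))
```

With the override stripped, the name displays as what it really is, `pmb.gif`, rather than as a bitmap.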

Control characters

● Implicit directional formatting

○ U+200E: LEFT-TO-RIGHT MARK (LRM)

○ U+200F: RIGHT-TO-LEFT MARK (RLM)

● Explicit directional embedding

○ U+202A: LEFT-TO-RIGHT EMBEDDING (LRE)

○ U+202B: RIGHT-TO-LEFT EMBEDDING (RLE)

○ U+202C: POP DIRECTIONAL FORMATTING (PDF)

● Explicit directional override

○ U+202D: LEFT-TO-RIGHT OVERRIDE (LRO)

○ U+202E: RIGHT-TO-LEFT OVERRIDE (RLO)

● Explicit directional isolate

○ U+2066: LEFT-TO-RIGHT ISOLATE (LRI)

○ U+2067: RIGHT-TO-LEFT ISOLATE (RLI)

○ U+2068: FIRST STRONG ISOLATE (FSI)

○ U+2069: POP DIRECTIONAL ISOLATE (PDI)

When BiDi is technically correct but practically wrong

Solution: Force Isolation

New topic created on [board name]: “<bdi>[topic title]</bdi>”
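Isolation can be applied when a message is assembled, either with the `<bdi>` element in HTML or with the FSI and PDI control characters in plain text. The sketch below assumes a notification template like the one on the slide; the function names and the sample board and topic values are hypothetical.

```python
import html

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_plain(text):
    """Wrap text in FSI...PDI so it cannot reorder the surrounding sentence."""
    return f"{FSI}{text}{PDI}"


def isolate_html(text):
    """Escape text and wrap it in <bdi>, the HTML counterpart of an isolate."""
    return f"<bdi>{html.escape(text)}</bdi>"


def topic_message_plain(board_name, topic_title):
    # Only the user-supplied title is isolated, as on the slide.
    return f"New topic created on {board_name}: “{isolate_plain(topic_title)}”"


def topic_message_html(board_name, topic_title):
    return (
        f"New topic created on {html.escape(board_name)}: "
        f"“{isolate_html(topic_title)}”"
    )


print(topic_message_html("General", "שלום (hello)"))
```

Because the title is isolated, its direction is resolved on its own and the surrounding quotation marks and colon stay where the sentence puts them.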

Dates

TLV אל IST 28 במאי, 8:40

LTR LTR RTL RTL

(TLV to IST 28 May, 8:40)

LTR email in RTL clients

LTR client

RTL client

12

1 2

Solution: Always define content directionality
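One way to define content directionality explicitly is to detect it once, when the content is created, and store it in the markup instead of leaving the guess to each client. The sketch below uses a first-strong heuristic in the spirit of rules P2 and P3 of UAX #9, simplified in that it does not skip over isolated text. The function names and the choice of a `<div>` wrapper are assumptions for illustration.

```python
import html
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Return 'ltr' or 'rtl' based on the first strong character in text."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    # No strong character at all (for example only digits or punctuation).
    return default


def wrap_with_direction(body_text):
    """Wrap escaped text in a block that states its direction explicitly."""
    direction = first_strong_direction(body_text)
    return f'<div dir="{direction}">{html.escape(body_text)}</div>'


print(wrap_with_direction("Hello, this is an English email."))
print(wrap_with_direction("שלום, זה מייל בעברית."))
```

An English message wrapped this way keeps its left-to-right layout even when it is opened in a client whose interface is right-to-left.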

Applications implement BiDi inconsistently

Inconsistent implementation of BiDi (Facebook)

Web

Mobile

Maximum 2 Terms

Terms 2 Maximum

BiDi not implemented???

Inconsistent automatic detection of direction (Google Hangouts)

desktop

mobile

desktop

mobile

No auto-flip

auto-flip

auto-flip

auto-flip

Even real life ignores BiDi (and Unicode) a lot

מסורת

سنت

التقلید

When BiDi itself is confusing

Numbers, math and phone numbers

Numbers are rendered Left-to-Right even in (most) Right-to-Left contexts

LTR: Phone number 123-456-7890

RTL: מספר טלפון 123-456-7890

LTR: Phone number +1-234-567-9012

RTL: מספר טלפון +1-234-567-9012

LTR: There are 1-2 things but 4 - 5 others

spaces

RTL: יש 1-2 דברים אבל 4 - 5 אחרים

spaces (flipped)

Plus / Minus signs are weak
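The weak status of the plus and minus signs is visible in their Bidi_Class: both are "ES" (European Separator), while the digits are "EN" (European Number). The sketch below prints those classes and shows one possible mitigation, wrapping the phone number in a left-to-right isolate so that a leading "+" stays attached to the number in an RTL sentence. The helper names are hypothetical, and using LRI and PDI here is an assumption rather than something the talk prescribes.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def bidi_classes(text):
    """Return the Bidi_Class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def ltr_isolate(text):
    """Wrap text so it is laid out left-to-right, isolated from its context."""
    return f"{LRI}{text}{PDI}"


print(bidi_classes("+1-234"))

# The number is resolved on its own, independent of the Hebrew around it.
sentence = "מספר טלפון " + ltr_isolate("+1-234-567-9012")
print(sentence)
```

The spaced range "4 - 5" flips in RTL text because the spaces break the number into separate pieces; the same isolate wrapper can keep such a range together.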

Printed in Israel

Printed abroad

Bonus: Emoticons

LTR: :)

RTL: (:

LTR: :(

RTL: ):

LTR: :D

RTL: D:

LTR: :P

RTL: P:

(Bonus fail)

Web

Android: Google Slides

Android: Google Drive

RTL Users expect nothing good

RTL Users are used to their computer not quite cooperating

Now what?

● Consistency in implementing the Bidirectional algorithm

● Standard in predicting directionality while typing

● Improving isolation of numbers and dates

● Consistent punctuation within sentences

(We solved “jumping” parentheses; let’s solve periods, commas, and colons!)

Remember Parentheses...

LTR: MSchottlender (WMF)

RTL: (MSchottlender (WMF

http://unicode.org/reports/tr9/#Paired_Brackets
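Parentheses differ from periods, commas and colons in their Unicode properties, which helps explain why they could get dedicated handling. The sketch below only inspects two properties through Python's `unicodedata` module: Bidi_Class and Bidi_Mirrored. It does not implement the paired-bracket rule itself, and the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch):
    """Return (Unicode name, Bidi_Class, Bidi_Mirrored flag) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        bool(unicodedata.mirrored(ch)),
    )


for ch in "()<>.,:":
    print(ch, describe(ch))
```

Brackets and comparison signs are neutral, mirrored characters, whereas the period, comma and colon are common separators (class CS) with no mirrored counterpart.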

Keep RTLing

http://rtl.wtf

?snoitseuQ
