Post on 30-May-2018
BiDi in the Wild
Challenges of the Unicode Bidirectional Algorithm
Moriel Schottlender
Software Engineer
Wikipedia’s Right-to-Left support
Right-to-Left Wikipedias
● ~260 Wikipedias in Left-to-Right
● ~17 Wikipedias in Right-to-Left
Arabic Wikipedia ~1,000,000 users
~375,000 articles
Persian Wikipedia ~514,000 users
~460,000 articles
Hebrew Wikipedia ~277,000 users
~175,000 articles
Editing Right-to-Left Wikipedias
A brief history of Right-to-Left support online
Long long ago
● Computers mostly only knew Left to Right
● Supporting non-Latin scripts required special fonts
● There was no real Right-to-Left support
Solution: Writing backwards
enod eb ot dah gnihtemoS
Something had to be done
Pre-BiDi Solution: Visual and Logical encoding
● Visual: שלום עולם
● Logical: שלום עולם
Char order: 1 2 3 4
Char order: 4 3 2 1
(Someone had to type this backwards!)
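The difference between the two storage orders can be sketched in a few lines of Python. This is only an illustration under a strong assumption: a single line of purely right-to-left text, with no digits, Latin words, or combining marks. The helper names `to_visual` and `to_logical` are hypothetical and not part of any library.

```python
def to_visual(logical: str) -> str:
    """Reverse a logically ordered string into legacy "visual" storage order.

    Assumption: the line is purely RTL, so the visual order is simply the
    logical (typing) order reversed. Mixed-direction text needs the full
    Bidirectional Algorithm instead.
    """
    return logical[::-1]


def to_logical(visual: str) -> str:
    """Undo to_visual for the same single-direction case."""
    return visual[::-1]


if __name__ == "__main__":
    typed = "שלום עולם"        # logical order: the order the characters are typed
    stored = to_visual(typed)  # what a visually encoded page would have stored
    print([hex(ord(ch)) for ch in stored])
    print(to_logical(stored) == typed)
```

The same reversal is what produced the backwards English line on the earlier slide.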
Unicode Bidirectional Algorithm
Unicode Bidirectional Algorithm
http://unicode.org/reports/tr9/
If all text on a page is uniform (all RTL or all LTR), the ordering of the display text is unambiguous.
● RTL content can include digits (written LTR)
● RTL content can be mixed with LTR content
The Bidirectional Algorithm is meant to solve ambiguity in rendering order.
Examples:
אני הולכת להרצות בUnicode Conference בSanta Clara (I am going to give a talk at the Unicode Conference in Santa Clara)
צריך להתקשר ל 555-123-4567 (Need to call 555-123-4567)
Quick primer to BiDi entity types
● Strong: Affect the directionality of entities around them (Alphabet)
● Weak: Do not affect the directionality of entities around them (Punctuation*, digits)
● Neutral: Take the directionality of the context they’re in (Space, newline, tab, etc.)
http://unicode.org/reports/tr9/
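As a rough illustration of the three groups, the sketch below uses Python's standard `unicodedata.bidirectional()` to look up a character's bidi class and bucket it. The grouping sets follow the strong / weak / neutral categories of UAX #9; the function name `bidi_category` and the `"other"` bucket (explicit formatting characters and unassigned code points) are assumptions made for this example.

```python
import unicodedata

# Bidi classes grouped by the categories used in UAX #9.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}


def bidi_category(ch: str) -> str:
    """Return 'strong', 'weak', 'neutral' or 'other' for one character."""
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG:
        return "strong"
    if cls in WEAK:
        return "weak"
    if cls in NEUTRAL:
        return "neutral"
    return "other"  # explicit embeddings, overrides, isolates, or unassigned


if __name__ == "__main__":
    for ch in ["a", "א", "1", "+", ",", " ", "("]:
        print(repr(ch), unicodedata.bidirectional(ch), bidi_category(ch))
```

Running it shows why "Punctuation" carries an asterisk: `+` and `,` come back as weak separators, while `(` is classed as a neutral.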
Numbers
● עברית 123: RTL, LTR
● עברית 1 23: RTL, LTR
(Whitespace is neutral)
● עברית 1 2 3: RTL
Text and numbers
English 1 2 3 Hebrew 1 2 3 English
Strong Weak Strong Weak Strong
English 1 2 3 3 2 1 עברית English
LTR RTL LTR
The confusing issue of the parentheses
Demo
Parentheses
(hello) (שלום)
Good luck with HTML
Or math comparisons
<a href="http://wikipedia.org" title="foo">bar</a>
Runs: LTR
<a href="http://wikipedia.org" title="foo">שלום</a>
Runs: LTR, RTL, LTR
<a href="http://wikipedia.org" title="אהלן">שלום</a>
Runs: LTR, RTL, LTR
[[קובץ:Moriel schottlender.jpg|250px|שמאל|זו אני!]]
([[File:Moriel schottlender.jpg|250px|left|This is me!]])
RTL LTR RTL RTL RTL
Your brain on BiDi
Credit: U.S. Navy photo by Photographer’s Mate 2nd Class Aaron Peterson. Public Domain.
https://commons.wikimedia.org/wiki/File:US_Navy_020712-N-5471P-010_EOD_teams_detonate_expired_ordnance_in_the_Kuwaiti_desert.jpg
The tale of an animated bitmap
fig.bmp
\u202epmb.gif
#!/usr/bin/env python
import shutil
# Copy the GIF under a name that starts with U+202E (RIGHT-TO-LEFT OVERRIDE),
# so the characters "pmb.gif" are displayed reversed, as "fig.bmp".
shutil.copy("animated.gif", u"\u202Epmb.gif")
Code by David Chan
Control characters
● Implicit directional formatting
○ U+200E: LEFT-TO-RIGHT MARK (LRM)
○ U+200F: RIGHT-TO-LEFT MARK (RLM)
● Explicit directional embedding
○ U+202A: LEFT-TO-RIGHT EMBEDDING (LRE)
○ U+202B: RIGHT-TO-LEFT EMBEDDING (RLE)
○ U+202C: POP DIRECTIONAL FORMATTING (PDF)
● Explicit directional override
○ U+202D: LEFT-TO-RIGHT OVERRIDE (LRO)
○ U+202E: RIGHT-TO-LEFT OVERRIDE (RLO)
● Explicit directional isolate
○ U+2066: LEFT-TO-RIGHT ISOLATE (LRI)
○ U+2067: RIGHT-TO-LEFT ISOLATE (RLI)
○ U+2068: FIRST STRONG ISOLATE (FSI)
○ U+2069: POP DIRECTIONAL ISOLATE (PDI)
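Because these characters are invisible, a file name such as the one in the animated-bitmap example gives no hint that an override is present. A minimal sketch of how the characters listed above could be detected or removed follows. The helper names are hypothetical, and blindly stripping controls is an assumption that suits things like file-name checks; in ordinary text the same characters may be legitimate and needed.

```python
import unicodedata

# The control characters from the list above.
BIDI_CONTROLS = {
    "\u200e", "\u200f",                                # LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(text: str) -> list:
    """Return (index, Unicode name) for every bidi control in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def strip_bidi_controls(text: str) -> str:
    """Return text with all bidi control characters removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


if __name__ == "__main__":
    name = "\u202epmb.gif"
    print(find_bidi_controls(name))
    print(strip_bidi_controls(name))
```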
When BiDi is technically correct but practically wrong
Solution: Force Isolation
New topic created on [board name]: “<bdi>[topic title]</bdi>”
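A sketch of how such a message could be assembled in code, assuming the user-supplied values arrive as plain strings. The function names are made up for this example; `html.escape` is the standard-library escaper. The plain-text variant wraps the value in FIRST STRONG ISOLATE / POP DIRECTIONAL ISOLATE, which is the character-level counterpart of `<bdi>`.

```python
from html import escape

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str) -> str:
    """Plain-text isolation: wrap text in FSI ... PDI."""
    return f"{FSI}{text}{PDI}"


def isolate_html(text: str) -> str:
    """HTML isolation: escape text and wrap it in <bdi>."""
    return f"<bdi>{escape(text)}</bdi>"


def topic_notice_html(board: str, title: str) -> str:
    """Build the notification from the slide, isolating only the title."""
    return f"New topic created on {escape(board)}: “{isolate_html(title)}”"


if __name__ == "__main__":
    print(topic_notice_html("Help desk", "שלום עולם!"))
    print(repr(isolate("שלום עולם!")))
```

Isolating the inserted value keeps its direction from leaking into the surrounding sentence, whatever language the title happens to be in.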
Dates
TLV אל IST 28 במאי, 8:40
LTR LTR RTL RTL
(TLV to IST 28 May, 8:40)
LTR email in RTL clients
[Screenshots: LTR client vs. RTL client]
Solution: Always define content directionality
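One way to define directionality when the author has not declared it is to guess from the first strong character, similar in spirit to the paragraph-level rule in UAX #9 and to HTML's `dir="auto"`. The sketch below is a simplification: it ignores isolates and embeddings, and both function names and the `fallback` parameter are assumptions for this example.

```python
import unicodedata
from html import escape


def first_strong_direction(text: str):
    """Return 'ltr' or 'rtl' from the first strong character, else None."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return None  # only digits, punctuation, whitespace, etc.


def paragraph_html(text: str, fallback: str = "ltr") -> str:
    """Emit a paragraph whose direction is always stated explicitly."""
    direction = first_strong_direction(text) or fallback
    return f'<p dir="{direction}">{escape(text)}</p>'


if __name__ == "__main__":
    print(paragraph_html("Hello, world"))
    print(paragraph_html("123 שלום"))
    print(paragraph_html("555-123-4567"))
```

When the true direction is known (for example from the sender's locale or an explicit user choice), stating it directly is more reliable than any guess.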
Applications implement BiDi inconsistently
Inconsistent implementation of BiDi (Facebook)
● Web: Maximum 2 Terms
● Mobile: Terms 2 Maximum
BiDi not implemented???
Inconsistent automatic detection of direction (Google Hangouts)
[Screenshots: desktop and mobile; labels: No auto-flip, auto-flip, auto-flip, auto-flip]
Even real life ignores BiDi (and Unicode) a lot
מסורת (Hebrew: “tradition”)
سنت (Persian: “tradition”)
التقلید (Arabic: “the tradition”)
When BiDi itself is confusing
Numbers, math and phone numbers
Numbers are rendered Left-to-Right even in (most) Right-to-Left contexts
LTR: Phone number 123-456-7890
RTL: מספר טלפון 123-456-7890
LTR: Phone number +1-234-567-9012
RTL: מספר טלפון +1-234-567-9012
LTR: There are 1-2 things but 4 - 5 others (spaces)
RTL: יש 1-2 דברים אבל 4 - 5 אחרים (spaces, flipped)
Plus / Minus signs are weak
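The different behavior of "1-2", "4 - 5" and a leading "+" can be traced to the bidi classes involved. The sketch below lists the classes with `unicodedata.bidirectional()` and applies a simplified reading of rule W4 in UAX #9 (a single European separator directly between two European digits is treated as part of the number). It skips the earlier resolution steps of the real algorithm, and `separator_joins_numbers` is a hypothetical helper name.

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Return (character, bidi class) pairs for text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def separator_joins_numbers(text: str, i: int) -> bool:
    """Simplified W4 check for the separator at index i.

    True only when text[i] is a European separator (ES, e.g. '+' or '-')
    sitting directly between two European digits (EN), with no spaces.
    """
    if not 0 < i < len(text) - 1:
        return False
    cls = unicodedata.bidirectional
    return (
        cls(text[i]) == "ES"
        and cls(text[i - 1]) == "EN"
        and cls(text[i + 1]) == "EN"
    )


if __name__ == "__main__":
    print(bidi_classes("4 - 5"))
    print(separator_joins_numbers("1-2", 1))     # hyphen glued between digits
    print(separator_joins_numbers("4 - 5", 2))   # hyphen surrounded by spaces
    print(separator_joins_numbers("+1-234", 0))  # leading plus sign
```

A separator that fails this check is left to take its direction from the surrounding context, which is why the spaced range and the leading plus sign move in RTL text while "1-2" stays intact.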
Printed in Israel
Printed abroad
Bonus: Emoticons
Emoticons
LTR :)   RTL (:
LTR :(   RTL ):
LTR :D   RTL D:
LTR :P   RTL P:
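Part of the emoticon behavior can be explained with two character properties from the standard `unicodedata` module: the bidi class, and whether the character is mirrored in right-to-left runs. The sketch below only inspects those properties and does not reorder text; `describe` is a name invented for this example.

```python
import unicodedata


def describe(text: str) -> list:
    """Return (character, bidi class, is mirrored) for each character."""
    return [
        (ch, unicodedata.bidirectional(ch), unicodedata.mirrored(ch) == 1)
        for ch in text
    ]


if __name__ == "__main__":
    for emoticon in [":)", ":(", ":D", ":P"]:
        print(emoticon, describe(emoticon))
```

Parentheses are neutral and mirrored, so in an RTL run both their position and their glyph flip, whereas `D` and `P` are strong left-to-right letters that keep their shape and only change position relative to the colon.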
(Bonus fail)
[Screenshots: Web; Android: Google Slides; Android: Google Drive]
RTL Users expect nothing good
RTL Users are used to their computer not quite cooperating
Now what?
● Consistency in implementing the Bidirectional algorithm
● Standard in predicting directionality while typing
● Improving isolation of numbers and dates
● Consistent punctuation within sentences
(We solved “jumping” parentheses; let’s solve periods, commas, and colons!)
Remember Parentheses...
LTR: MSchottlender (WMF)
RTL: (MSchottlender (WMF
http://unicode.org/reports/tr9/#Paired_Brackets
Keep RTLing
?snoitseuQ
http://rtl.wtf