The 9th Bit: Encodings in Ruby 1.9
DESCRIPTION
Talk on Ruby 1.9's Encoding API given at RubyConf Brasil, 2010.
TRANSCRIPT
The 9th Bit: Encodings in Ruby 1.9
Norman Clarke
Encoding API
One of the most visible changes to Ruby in 1.9
invalid multibyte char (US-ASCII)
invalid byte sequence in US-ASCII / UTF-8
`encode': "\xE2\x80\xA6" from UTF-8 to ISO-8859-1
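The last two errors can be reproduced in a few lines. A minimal sketch, assuming Ruby 1.9 or later; the message wording differs between versions, so the code checks exception classes rather than exact text. `capture_error` is a hypothetical helper. The first error, `invalid multibyte char`, is raised by the parser for a source file with non-ASCII text and no magic comment, so it is not reproduced here.

```ruby
# Returns the exception raised by the block, or nil if none was raised.
def capture_error
  yield
  nil
rescue StandardError => e
  e
end

# "\u2026" (HORIZONTAL ELLIPSIS) is the bytes E2 80 A6 in UTF-8, the same
# bytes as in the message above. Latin-1 has no such character.
undefined = capture_error { "\u2026".encode("ISO-8859-1") }
puts undefined.class   # Encoding::UndefinedConversionError

# A lone 0xE2 byte is not valid UTF-8; matching a regexp against it raises.
broken = "\xE2".dup.force_encoding("UTF-8")
invalid = capture_error { broken =~ /x/ }
puts invalid.class     # ArgumentError
puts invalid.message   # invalid byte sequence in UTF-8
```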
Today’s Topics
• Character Encodings
• Ruby’s Encoding API
• Avoiding problems with UTF-8
Why should I care?
Ruby 1.9.2 is much faster than 1.8.7
but forces you to be aware of encodings
Encodings are boring, but you can't ignore them forever
flickr.com/photos/29213152@N00/2410328364/
Character Encoding
Algorithm for interpreting a sequence of bytes as characters in a written language
ASCII

    0 nul    1 soh    2 stx    3 etx    4 eot    5 enq    6 ack    7 bel
    8 bs     9 ht    10 nl    11 vt    12 np    13 cr    14 so    15 si
    16 dle  17 dc1   18 dc2   19 dc3   20 dc4   21 nak   22 syn   23 etb
    24 can  25 em    26 sub   27 esc   28 fs    29 gs    30 rs    31 us
    32 sp   33 !     34 "     35 #     36 $     37 %     38 &     39 '
    40 (    41 )     42 *     43 +     44 ,     45 -     46 .     47 /
    48 0    49 1     50 2     51 3     52 4     53 5     54 6     55 7
    56 8    57 9     58 :     59 ;     60 <     61 =     62 >     63 ?
    64 @    65 A     66 B     67 C     68 D     69 E     70 F     71 G
    72 H    73 I     74 J     75 K     76 L     77 M     78 N     79 O
    80 P    81 Q     82 R     83 S     84 T     85 U     86 V     87 W
    88 X    89 Y     90 Z     91 [     92 \     93 ]     94 ^     95 _
    96 `    97 a     98 b     99 c    100 d    101 e    102 f    103 g
    104 h  105 i    106 j    107 k    108 l    109 m    110 n    111 o
    112 p  113 q    114 r    115 s    116 t    117 u    118 v    119 w
    120 x  121 y    122 z    123 {    124 |    125 }    126 ~    127 del
ASCII: 7 bits
a: 97 = 0110 0001 (the leading bit is always 0)
Latin1: 8 bits
ã: 227 = 1110 0011
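Ruby can show these bit patterns directly. A small sketch, assuming Ruby 1.9 or later; `bits` is a hypothetical helper that formats each byte as eight binary digits, and the last line shows that the same "ã" takes two bytes in UTF-8.

```ruby
# encoding: utf-8
# Format every byte of a string as an 8-digit binary number.
def bits(str)
  str.bytes.map { |b| b.to_s(2).rjust(8, "0") }
end

a       = "a"                            # ASCII: the top bit stays 0
a_latin = "\u00E3".encode("ISO-8859-1")  # "ã" in Latin-1: top bit set
a_utf8  = "\u00E3"                       # the same "ã" in UTF-8

p [a.ord, bits(a)]              # [97, ["01100001"]]
p [a_latin.ord, bits(a_latin)]  # [227, ["11100011"]]
p bits(a_utf8)                  # ["11000011", "10100011"]
```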
wikipedia.org/wiki/ISO/IEC_8859-1
Other 8-bit Encodings
Work for most languages
256 is not enough for:
• Chinese
• Japanese
• some others
かたかな
한국어/조선말
中文
字喃/낄喃/끻喃
fanpop.com/spots/gandalf/images/7018563/title/gandalf-vs-el-balrog-fanart
8-bit overlap
The same byte, 202, means a different character in each 8-bit encoding:
Ê Ę Ъ ت Κ ส Ź
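This is easy to confirm: take the single byte 202, label it with different 8-bit encodings, and convert each result to UTF-8 for display. A sketch assuming a Ruby build with the standard ISO-8859 transcoders; `decode_as` is a hypothetical helper, and only four of the characters above are checked.

```ruby
# One raw byte with no meaningful encoding yet (ASCII-8BIT).
BYTE = [202].pack("C")

# Relabel the byte (force_encoding changes no data), then convert it to
# UTF-8 so the character can be printed and compared.
def decode_as(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).encode("UTF-8")
end

%w[ISO-8859-1 ISO-8859-2 ISO-8859-5 ISO-8859-7].each do |name|
  puts "#{name}: #{decode_as(BYTE, name)}"
end
```

The loop prints Ê, Ę, Ъ and Κ: one byte, four different characters.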
Unicode: An Improbable Success
¡cn:中文!
Used internally by Perl, Java, Python 3, Haskell and others
Unicode in Japan: not as popular
Ruby 1.9: Character Set Independence
Ruby’s Encoding API
• Source code
• String
• Regexp
• IO
• Encoding
Source

    # coding: utf-8
    class Canção
      GÊNEROS = [:forró, :carimbó, :afoxé]
      attr_accessor :gênero
    end

    asa_branca = Canção.new
    asa_branca.gênero = :forró
    p asa_branca.gênero
Warnings
• Breaks syntax highlighting
• #inspect, #p don’t work as of 1.9.2
• Some editors/programmers will probably mess up your code
• Just because you can, doesn’t mean you should
String

    # encoding: utf-8
    string = "ã"
    string.length      #=> 1
    string.bytesize    #=> 2
    string.bytes.to_a  #=> [195, 163]
    string.encode! "ISO-8859-1"
    string.length      #=> 1
    string.bytesize    #=> 1
    string.bytes.to_a  #=> [227]
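String

    # encoding: utf-8
    # hypothetical setup: the same two bytes, labeled two ways
    a2 = "ã"
    a1 = a2.dup.force_encoding("ASCII-8BIT")

    puts a1                         # displays "ã" on a UTF-8 terminal
    puts a2                         # displays "ã"
    a1.encoding                     #=> ASCII-8BIT
    a2.encoding                     #=> UTF-8
    a1.bytes.to_a == a2.bytes.to_a  #=> true
    a1 == a2                        #=> false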
Regexp

    # vim: set fileencoding=utf-8
    pat = /ã/
    pat.encoding              #=> UTF-8
    pat.encode! "ISO-8859-1"  #=> FAIL: NoMethodError, Regexp has no #encode!
    pat = "ã".encode "ISO-8859-1"
    regexp = Regexp.new(pat)  #=> OK
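Why does the encoding of a regexp matter? Matching a UTF-8 regexp that contains a non-ASCII character against a Latin-1 string raises an error, while a regexp built from a Latin-1 string matches. A sketch assuming Ruby 1.9 or later; the variable names are illustrative.

```ruby
# encoding: utf-8
latin1  = "\u00E3".encode("ISO-8859-1")  # "ã" as the single byte 227
utf8_re = /\u00E3/                        # regexp literal with UTF-8 encoding

# Mixing the two encodings in one match is an error...
error = begin
  utf8_re =~ latin1
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.message if error

# ...but a regexp built from a Latin-1 string carries that encoding.
latin1_re = Regexp.new("\u00E3".encode("ISO-8859-1"))
p latin1_re.encoding   # #<Encoding:ISO-8859-1>
p latin1_re =~ latin1  # 0
```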
IO

    f = File.open("file.txt", "r:ISO-8859-1")
    data = f.read
    data.encoding  #=> ISO-8859-1
IO

    f = File.open("file.txt", "rb:UTF-16BE:UTF-8")
    data = f.read
    data.encoding  #=> UTF-8
IO

    f = File.open("file.txt", "r:BINARY")  # (or "rb")
    data = f.read
    data.encoding  #=> ASCII-8BIT
    data.force_encoding "UTF-8"
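The three reading modes can be compared on a throwaway file. A sketch assuming Ruby 1.9.3 or later and the standard `tempfile` library; the file holds the single Latin-1 byte for "ã" (227), and the `"latin1"` name prefix is arbitrary.

```ruby
# encoding: utf-8
require "tempfile"

# Scratch file holding one Latin-1 byte: 227, which is "ã".
tmp = Tempfile.new("latin1")
tmp.binmode
tmp.write([227].pack("C"))
tmp.close

# 1. Label only: the bytes are untouched, the string says ISO-8859-1.
tagged = File.open(tmp.path, "r:ISO-8859-1") { |f| f.read }

# 2. Transcode while reading: external ISO-8859-1, internal UTF-8.
converted = File.open(tmp.path, "r:ISO-8859-1:UTF-8") { |f| f.read }

# 3. Read raw bytes, then relabel them by hand.
raw = File.open(tmp.path, "rb") { |f| f.read }
relabeled = raw.dup.force_encoding("ISO-8859-1")

p [tagged.encoding.name, tagged.bytes.to_a]        # ["ISO-8859-1", [227]]
p [converted.encoding.name, converted.bytes.to_a]  # ["UTF-8", [195, 163]]
p raw.encoding == Encoding::BINARY                 # true: "rb" means binary
p relabeled.encoding.name                          # "ISO-8859-1"

tmp.unlink
```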
Encoding

    Encoding.list.size  #=> 95 (the count varies by Ruby version)
    Encoding.default_external = "ISO-8859-1"
    Encoding.default_internal = "UTF-8"
    File.open("latin1.txt", "r") do |file|
      p file.external_encoding  #=> ISO-8859-1
      data = file.read
      p data.encoding           #=> UTF-8
    end
UTF-8: a Unicode Encoding
Unicode, UTF-8, UTF-16, UTF-32, UCS-2, etc.
UTF-8
Backwards-compatible with ASCII
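UTF-8 spends one byte on an ASCII character and up to four on others, while UTF-16 uses two or four and UTF-32 always four. A sketch comparing byte counts, assuming Ruby 1.9 or later; `byte_counts` is a hypothetical helper, and characters are written as `\u` escapes.

```ruby
# encoding: utf-8
# Byte counts for one character in three Unicode encodings.
def byte_counts(char)
  {
    "UTF-8"    => char.encode("UTF-8").bytesize,
    "UTF-16BE" => char.encode("UTF-16BE").bytesize,
    "UTF-32BE" => char.encode("UTF-32BE").bytesize
  }
end

# a, ã, 日 and the musical G clef (a character outside the BMP).
["a", "\u00E3", "\u65E5", "\u{1D11E}"].each do |char|
  printf("U+%04X %p\n", char.ord, byte_counts(char))
end

# ASCII text has identical bytes in US-ASCII and UTF-8.
ascii = "Ruby".encode("US-ASCII")
p ascii.bytes.to_a == ascii.encode("UTF-8").bytes.to_a   # true
```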
Use UTF-8 unless you have a good reason not to
UTF-8 and HTML
`<meta http-equiv="content-type" content="text/html;charset=UTF-8" />`
UTF-8 and HTML
日本語
UTF-8 and HTML
`<form action="/" accept-charset="UTF-8">`
UTF-8 and HTML
f.html?l=日本語
UTF-8 and HTML
f.html?l=%26%2326085%3B%26%2326412%3B
%26%2335
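The second query string looks like what a browser sends when the form's character set cannot represent 日本語: each character becomes an HTML numeric character reference, which is then URL-encoded. Decoding the complete part shown on the slide makes this visible. A sketch assuming Ruby 1.9.2 or later for `URI.decode_www_form_component`; the regexp-based reference decoding is a simplification that handles decimal references only.

```ruby
# encoding: utf-8
require "uri"

# The complete part of the query-string value from the slide.
raw = "%26%2326085%3B%26%2326412%3B"

# Step 1: undo the URL encoding, leaving HTML numeric character references.
refs = URI.decode_www_form_component(raw)

# Step 2: replace each decimal reference &#NNNN; with the character it names.
text = refs.gsub(/&#(\d+);/) { $1.to_i.chr(Encoding::UTF_8) }

puts refs  # &#26085;&#26412;
puts text  # 日本
```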
...here's where things get kind of strange.
Case Folding

    "JOÃO".downcase  #=> "joÃo"  (Ruby 1.9 through 2.3)
    "joão".upcase    #=> "JOãO"
Case Folding

    # Unicode gem
    require "unicode"
    Unicode.downcase("JOÃO")

    # Active Support
    require "active_support/core_ext/string/multibyte"
    "JOÃO".mb_chars.downcase
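For comparison with later Ruby versions, which postdate this talk: since Ruby 2.4, `String#downcase` and `#upcase` apply Unicode case mapping by default, and the `:ascii` option restores the old ASCII-only behaviour. A sketch assuming Ruby 2.4 or later:

```ruby
# encoding: utf-8
name = "JO\u00C3O"   # "JOÃO"

lower       = name.downcase              # "joão": full Unicode mapping
lower_ascii = name.downcase(:ascii)      # "joÃo": what Ruby 1.9 returned
upper_ascii = "jo\u00E3o".upcase(:ascii) # "JOãO"

p lower, lower_ascii, upper_ascii
```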
Equivalence

    # NOT always true
    "João" == "João"
Two ways to represent many characters
"ã" or "a" + "\u0303" (a followed by a combining tilde)
Composed

    a = Unicode.normalize_C("ã")
    a.bytes.to_a  #=> [195, 163]
Decomposed

    a = Unicode.normalize_D("ã")
    a.bytes.to_a  #=> [97, 204, 131]
Why?

    dec = Unicode.normalize_D("ã")
    dec =~ /a/   # match
    comp = Unicode.normalize_C("ã")
    comp =~ /a/  # no match
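On Ruby 2.2 and later, `String#unicode_normalize` covers the same ground as the Unicode gem's `normalize_C` and `normalize_D`. A sketch assuming Ruby 2.2+, with both forms of "ã" written as escapes so the difference is visible in the source:

```ruby
# encoding: utf-8
composed   = "\u00E3"    # ã as one code point (NFC)
decomposed = "a\u0303"   # a + COMBINING TILDE (NFD)

p composed.bytes.to_a     # [195, 163]
p decomposed.bytes.to_a   # [97, 204, 131]
p composed == decomposed  # false: different bytes, same text

# Converting either form into the other:
p composed.unicode_normalize(:nfd) == decomposed   # true
p decomposed.unicode_normalize(:nfc) == composed   # true

# Only the decomposed form contains a plain "a".
p composed =~ /a/    # nil
p decomposed =~ /a/  # 0
```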
Normalize string keys!!!
You have been warned

    { "João" => "authorized", "João" => "not authorized" }
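One way to follow the "normalize string keys" advice is to normalize at the boundary of the hash itself. A minimal sketch, assuming Ruby 2.2+ for `String#unicode_normalize` (on 1.9 the Unicode gem's `Unicode.normalize_C` plays the same role); `NormalizedHash` is a hypothetical class, not a library API, and only implements the few methods used here.

```ruby
# Wraps a Hash and NFC-normalizes String keys on every write and read.
class NormalizedHash
  def initialize
    @store = {}
  end

  def []=(key, value)
    @store[normalize(key)] = value
  end

  def [](key)
    @store[normalize(key)]
  end

  def size
    @store.size
  end

  private

  def normalize(key)
    key.is_a?(String) ? key.unicode_normalize(:nfc) : key
  end
end

composed   = "Jo\u00E3o"    # precomposed a-tilde
decomposed = "Joa\u0303o"   # a + combining tilde

plain = { composed => "authorized", decomposed => "not authorized" }
p plain.size   # 2: the keys differ byte for byte

users = NormalizedHash.new
users[composed]   = "authorized"
users[decomposed] = "not authorized"  # same key after NFC, so this overwrites
p users.size        # 1
p users[composed]   # "not authorized"
```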
Some libraries
• Unicode
• Active Support
• Java’s stdlib
Cleaning up bad data: avoid Iconv
Tidy Bytes

    require "active_support"
    require "active_support/multibyte/unicode"
    include ActiveSupport::Multibyte
    Unicode.tidy_bytes(@bad_string)
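When repairing the bytes is not required, core Ruby can simply replace invalid sequences. This is a different technique from `tidy_bytes`, which tries to reinterpret stray bytes as CP1252/Latin-1 characters rather than discard them. A sketch assuming Ruby 2.1+ for `String#scrub`, with the older encode round trip for 1.9 and 2.0; the sample string is made up.

```ruby
# A Latin-1 "e-acute" byte (0xE9) stuck inside a UTF-8 string.
bad = "caf\xE9 ok".dup.force_encoding("UTF-8")
p bad.valid_encoding?   # false

# Ruby 2.1+: replace each invalid byte sequence with "?".
fixed = bad.scrub("?")

# Ruby 1.9/2.0: transcode to another Unicode encoding and back, so the
# converter runs and applies :invalid => :replace along the way.
fixed_19 = bad.encode("UTF-16BE", :invalid => :replace, :replace => "?").encode("UTF-8")

p fixed      # "caf? ok"
p fixed_19   # "caf? ok"
```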
MySQL
Set encoding options early
Approximating ASCII: "João" => "joao"

    # 1: decompose
    @s = Unicode.normalize_D(@s)
    # 2: delete accent marks
    @s.gsub!(/[^\x00-\x7F]/, '')
    # 3: FAIL
OK
ã → a, á → a, ê → e, ü → u, à → a, ç → c
FAIL
ß → "", ø → "", œ → "", æ → ""
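The decompose-then-strip steps, sketched with core Ruby instead of the Unicode gem (assuming Ruby 2.2+ for `String#unicode_normalize`); `naive_ascii` is a hypothetical name. It reproduces both rows: ß, ø, œ and æ have no canonical decomposition, so step 2 deletes them outright. Note that it does not downcase.

```ruby
# Step 1: decompose (NFD); step 2: delete every non-ASCII character.
def naive_ascii(s)
  s.unicode_normalize(:nfd).gsub(/[^\x00-\x7F]/, "")
end

ok_row   = "\u00E3\u00E1\u00EA\u00FC\u00E0\u00E7"  # the OK characters
fail_row = "\u00DF\u00F8\u0153\u00E6"              # the FAIL characters

p naive_ascii(ok_row)       # "aaeuac"
p naive_ascii(fail_row)     # ""
p naive_ascii("Jo\u00E3o")  # "Joao"
```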
Use instead:
• Active Support’s Inflector.transliterate
• I18n.transliterate
• Babosa
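These libraries avoid the FAIL row by consulting a table of approximations before stripping anything. A minimal sketch of that idea, not of any of those libraries' actual code: `APPROXIMATIONS` and `approximate_ascii` are hypothetical, the four-entry table covers only the FAIL characters above, and Ruby 2.2+ is assumed for `String#unicode_normalize`.

```ruby
# Hypothetical approximation table for characters that have no canonical
# decomposition; real libraries ship far larger, locale-aware tables.
APPROXIMATIONS = {
  "\u00DF" => "ss", # sharp s
  "\u00F8" => "o",  # o with stroke
  "\u0153" => "oe", # oe ligature
  "\u00E6" => "ae"  # ae ligature
}

# Look each character up in the table first, then fall back to the
# decompose-and-strip approach for everything else.
def approximate_ascii(s)
  mapped = s.each_char.map { |c| APPROXIMATIONS.fetch(c, c) }.join
  mapped.unicode_normalize(:nfd).gsub(/[^\x00-\x7F]/, "")
end

p approximate_ascii("\u00DF\u00F8\u0153\u00E6")  # "ssooeae"
p approximate_ascii("Jo\u00E3o")                 # "Joao"
```

With Active Support loaded, the library route is `ActiveSupport::Inflector.transliterate(string)`, which delegates to `I18n.transliterate`.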
To Sum Up...
Ruby is weird
Use UTF-8
Normalize UTF-8 keys
Configure MySQL properly for UTF-8