Text
Copyright © Software Carpentry 2010
This work is licensed under the Creative Commons Attribution License
See http://software-carpentry.org/license.html for more information.
Python
Python Text
How to represent characters?
American English in the 1960s:
26 characters × {upper, lower}
+ 10 digits
+ punctuation
+ special characters for controlling teletypes
(new line, carriage return, form feed, bell, …)
= 128 codes in all: 7 bits per character (the ASCII standard)
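The count on this slide can be checked in Python. This is a sketch in Python 3 using the standard `string` module; splitting the control codes as 0 to 31 plus DEL (127), with the space counted separately, is our own breakdown rather than the slide's.

```python
import string

# Printable groups from the slide
letters = string.ascii_letters       # 26 letters x {upper, lower} = 52
digits = string.digits               # 10 digits
punctuation = string.punctuation     # 32 punctuation marks

# Teletype control codes: 0-31 plus DEL (127); space (32) is counted on its own
controls = [chr(code) for code in range(32)] + [chr(127)]

total = len(letters) + len(digits) + len(punctuation) + len(controls) + 1  # +1 for space
print(total, 2 ** 7)  # 128 128

# Some of the control characters named on the slide, with their ASCII codes
for name, char in [("new line", "\n"), ("carriage return", "\r"),
                   ("form feed", "\f"), ("bell", "\a")]:
    print(name, ord(char))
```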
How to represent text?
1. Fixed-width records
A crash reduces / your expensive computer / to a simple stone.
stored as three 23-character records (· = padding):
A crash reduces········
your expensive computer
to a simple stone.·····
Easy to get to line N: it starts at offset N × record length
But may waste space on padding
What if lines are longer than the record length?
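A minimal sketch of fixed-width records in Python 3, using the three lines above and a record width of 23 (the longest line). `RECORD_WIDTH` and `get_record` are illustrative names, not part of any library, and padding with spaces stands in for the · on the slide.

```python
RECORD_WIDTH = 23  # width of the longest line in the example
lines = ["A crash reduces", "your expensive computer", "to a simple stone."]

# Pad every line to the same width; a longer line simply does not fit
records = []
for line in lines:
    if len(line) > RECORD_WIDTH:
        raise ValueError("line longer than record: " + repr(line))
    records.append(line.ljust(RECORD_WIDTH))
data = "".join(records)

def get_record(data, n, width=RECORD_WIDTH):
    """Return line n (counting from 0) by jumping straight to offset n * width."""
    start = n * width
    # rstrip removes the padding (and would also remove real trailing spaces)
    return data[start:start + width].rstrip()

print(get_record(data, 2))                          # to a simple stone.
print(len(data), sum(len(line) for line in lines))  # 69 56
```

The 13-character difference is pure padding. Raising an error for an over-long line is only one possible answer to the slide's last question.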
2. Stream with embedded end-of-line markers
Same text, stored as one stream (¶ marks each end of line):
A crash reduces¶your expensive computer¶to a simple stone.
More flexible: lines can be any length
Wastes less space: no padding
Skipping ahead is harder: finding line N means scanning past N end-of-line markers
What to use for end of line?
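For comparison, a sketch of the stream approach in Python 3, assuming '\n' as the end-of-line marker (the Unix choice). `get_line` is an illustrative name. Finding line N means scanning past N markers, which is why skipping ahead is harder.

```python
text = "A crash reduces\nyour expensive computer\nto a simple stone."

def get_line(text, n, marker="\n"):
    """Return line n (counting from 0) by scanning past n end-of-line markers."""
    start = 0
    for _ in range(n):
        # str.index raises ValueError if the text has fewer than n + 1 lines
        start = text.index(marker, start) + len(marker)
    end = text.find(marker, start)
    return text[start:] if end == -1 else text[start:end]

print(get_line(text, 2))  # to a simple stone.
print(len(text))          # 58: 56 characters of text plus 2 markers, no padding
```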
Unix: newline ('\n')
Windows: carriage return + newline ('\r\n')
Oh dear…
On Windows, Python converts '\r\n' to '\n' when reading text files, and back again when writing them
To prevent this (e.g., when reading image files)
open the file in binary mode
reader = open('mydata.dat', 'rb')
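A sketch of the difference in Python 3. One caveat: Python 3's text mode translates '\r\n' (and a lone '\r') to '\n' when reading on every platform, not just Windows, while binary mode ('rb') returns the bytes untouched. The temporary file is created just for this example.

```python
import os
import tempfile

# Write a small file with Windows-style line endings, byte for byte
path = os.path.join(tempfile.mkdtemp(), "mydata.dat")
with open(path, "wb") as writer:
    writer.write(b"first\r\nsecond\r\n")

# Text mode: line endings are translated to '\n'
with open(path, "r") as reader:
    text = reader.read()

# Binary mode: no translation at all
with open(path, "rb") as reader:
    raw = reader.read()

print(repr(text))  # 'first\nsecond\n'
print(raw)         # b'first\r\nsecond\r\n'
```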
Back to characters…
How to represent ĕ, β, Я, …?
7 bits = 0…127
8 bits (a byte) = 0…255
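Different companies/countries defined different
meanings for 128…255
So their files did not play nicely together: the same byte could mean a different character on each system
And East Asian scripts have thousands of characters, far too many to fit in 8 bits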
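Python 3 still ships codecs for many of those 8-bit encodings, which makes the clash easy to see. The sketch below decodes the same byte, 0xE2, with three of them (Latin-1 for Western Europe, cp1251 for Cyrillic, ISO 8859-7 for Greek), then prints the Unicode numbers of the slide's characters plus one Chinese character.

```python
byte = bytes([0xE2])  # one byte value, three meanings
for codec in ["latin-1", "cp1251", "iso8859_7"]:
    print(codec, byte.decode(codec))
# latin-1 â
# cp1251 в
# iso8859_7 β

# Unicode numbers for the slide's characters and a Chinese one;
# 中 has a value far beyond anything a single byte can hold
for char in "ĕβЯ中":
    print(char, ord(char))
```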
1990s: Unicode standard
Defines a mapping from characters to integers (called code points)
Does not, by itself, say how to store those integers
32 bits per character will do it...
...but wastes a lot of space in common cases (plain English text becomes four times larger than ASCII)
Use in memory (for speed)
Use something else on disk and over the wire
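The space cost is easy to measure in Python 3. This sketch encodes one line of ASCII text with 'utf-32-le' (4 bytes per character; the '-le' variant skips the byte-order mark) and, for comparison, with UTF-8.

```python
text = "A crash reduces"  # 15 ASCII characters

fixed = text.encode("utf-32-le")  # 32 bits (4 bytes) per character
compact = text.encode("utf-8")    # 1 byte per ASCII character

print(len(text), len(fixed), len(compact))  # 15 60 15
print([ord(char) for char in "Aβ"])         # [65, 946]: the character-to-integer mapping
```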
(Almost) everyone uses a variable-length encoding
called UTF-8 instead
First 128 characters (old ASCII) stored in 1 byte each
Next 1920 stored in 2 bytes, etc.
Byte pattern                          Bits   Code points
0xxxxxxx                               7     U+0000 to U+007F
110yyyyy 10xxxxxx                     11     U+0080 to U+07FF
1110zzzz 10yyyyyy 10xxxxxx            16     U+0800 to U+FFFF
11110www 10zzzzzz 10yyyyyy 10xxxxxx   21     U+10000 to U+10FFFF
The good news is, you don't need to know these details
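Still, the table is small enough to turn into code. Here is a sketch of a hand-written encoder in Python 3 that follows the bit patterns above and checks itself against the built-in codec. `utf8_encode` is an illustrative name, and unlike the real codec it does not reject surrogates (U+D800 to U+DFFF) or values above U+10FFFF.

```python
def utf8_encode(code_point):
    """Encode one code point by hand, following the UTF-8 bit patterns."""
    if code_point < 0x80:                       # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                      # 110yyyyy 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point < 0x10000:                    # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    # 11110www 10zzzzzz 10yyyyyy 10xxxxxx
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])

# Compare the hand-written version with Python's built-in codec
for char in "Aĕβ中\U0001F600":
    by_hand = utf8_encode(ord(char))
    assert by_hand == char.encode("utf-8")
    print(char, len(by_hand), by_hand.hex())
```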
Python 2.x provides two kinds of string:
Classic (str): one byte per character
Unicode (unicode): "big enough" per character (2 or 4 bytes, depending on how Python was built)
Write u'the string' for Unicode
Must specify encoding when converting from
Unicode to bytes
Use UTF-8
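In Python 3 the two kinds became `bytes` and `str` (which is always Unicode), and the u'' prefix is still accepted. This sketch applies the same rules there: encode explicitly, use UTF-8, and for files pass the encoding to `open`. The temporary file is created just for this example.

```python
import os
import tempfile

text = u"ĕβЯ"                # Unicode string; u prefix optional in Python 3
data = text.encode("utf-8")   # Unicode -> bytes: the encoding is named explicitly
print(len(text), len(data))   # 3 6

# ASCII has no codes for these characters, so encoding with it fails
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("ASCII cannot encode", repr(text))

# For files, give the encoding to open() and it encodes for you
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")
with open(path, "w", encoding="utf-8") as writer:
    writer.write(text)
with open(path, "rb") as reader:
    on_disk = reader.read()
print(on_disk == data)  # True
```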
October 2010
created by
Greg Wilson