Python Programming Language Primer (From a Data Analysis Perspective)
Sharma Chakravarthy
Information Technology Laboratory (IT Lab)
Computer Science and Engineering Department
The University of Texas at Arlington, Arlington, TX 76019
Email: [email protected]
URL: http://itlab.uta.edu/sharma
2
Communicating with a computer
Operating systems (Unix/Linux, Windows, Mac OS) mediate between a human and the computer
− OS commands allow you to create files, manage resources and users, execute programs, etc.
Other applications allow us to use the computer in different ways
− Editors, IDEs, video players, compilers, Skype, …
Programming languages are among the most commonly used mechanisms for specifying/describing and executing computations, for example:
− Sorting, search, shortest path, querying
− File content processing
− Registering for courses, …
3
Communicating with a computer
A program written in a programming language is compiled and/or interpreted
Compilation generates binary code (for the target platform) that is executed on that platform
− Cannot compile for one platform and execute on another
Interpreters process each statement of the program
− Shell scripts, Perl, and Lisp are interpreted
Some languages generate bytecode, which is platform independent
− Java generates bytecode
− Pascal generated something similar very early on
Firmware has the BIOS (Basic Input/Output System)
Hardware includes the CPU and RAM, where programs run
Early Computer
2/10/2022 © your name 4
By NASA - Great Images in NASA Description, Public Domain, https://commons.wikimedia.org/w/index.php?curid=6455009
An IBM 704 Mainframe (1957)
2
Early Personal Computer
By Rama & Musée Bolo - File:IBM_PC-IMG_7271.jpg, CC BY-SA 2.0 fr, https://commons.wikimedia.org/w/index.php?curid=94784371
IBM PC: Typically had 640 K main memory and a 5 to 20 MB hard drive
6
Programming Languages Saga
Ultimately, binary or executable code is what runs on a processor (however it is generated)
CISC (Complex Instruction Set Computer)
− System/360, VAX, AMD, and Intel x86 CPUs
RISC (Reduced Instruction Set Computer): simpler instructions, so a program typically needs more of them (and more RAM)
− ARM, AVR, PA-RISC, and SPARC
Assembly languages were developed to raise the level of abstraction
− Believe it or not, earlier IBM OSs were written in assembly language (similar to bytecode) and had 100,000+ statements
Programming languages were again used to raise the level of abstraction (remember the DBMS evolution)
Different languages were developed for different purposes, with different features, and as the technology evolved
7
Programming Languages Saga
FORTRAN: FORmula TRANslation, 1950's, by IBM for scientific computation
− Limited data types: arrays, matrices
− Compiled
− No recursion
− Global variables (in very early versions)
Lisp: interpreted, based on Lambda Calculus (J McCarthy)
− Widely used in AI, expert systems
− Dialects: Common Lisp, Scheme
− Even Lisp machines were marketed (I used one from Symbolics)
COBOL: for business data processing (COmmon Business Oriented Language)
− English-like, verbose, for financial applications
− Formatting could be tailored
Basic: one of the early interpreted languages
8
Programming Languages Saga
Then came a slew of languages (mainly imperative, procedural)
Algol 58, 60, 68 (Algorithmic language)
− Overcame problems with Fortran
− Introduced code blocks (begin … end)
− Introduced nested static/lexical scoping, nested functions, recursion
− BNF (formal grammar) was used for its syntax specification
Pascal (for teaching programming) by N Wirth (1970)
− Turbo Pascal was popular
− Structured (not the same as OO): subroutines as blocks, recursion, and data structures were introduced
Others: PL/1, Simula, BCPL, B, C, SNOBOL (string handling language from the 60's)
C was perhaps the most popular one (followed B)
3
9
Programming Languages Saga
Then came object-oriented languages
− C++ (1985), Java (1995); most previous languages gained an OO version, including OO Fortran, OO Cobol, etc.
C++ was backward compatible with C
− Hence, they had to compromise on a few things: pointers, pointer arithmetic, functions, interrupts, …
Smalltalk, Modula, Objective-C, …
Ada (1983): developed by DoD (replaced 100+ languages used by DoD)
− Kitchen-sink approach (by a committee)
− No one else seems to use it now
− Rational (the company behind Rational Rose) came into existence based on Ada
10
Programming Languages Saga
Java was developed from scratch in 1995 with a focus on object-orientation and operating system independence
Tried to overcome the limitations of C++. Was easy as there was no baggage!
− Immutable object references, object visibility through inheritance (private, protected, public)
− Single inheritance (unlike C++), interfaces to support multiple inheritance to some extent
− No pointer arithmetic
− Extensive exception handling (try, catch, throw, finally, …)
− Documentation (Javadoc) built into the language
− OS-independent graphics support
− Reflection (an important capability)
− No separation of specification and program (.hh and .cc files)
Interestingly, it was developed by Gosling, who was a Lisp admirer and had earlier developed an Emacs editor (Gosling Emacs, extended through a Lisp-like language)
11
Language Features: Python
Pros
− General-purpose language
− Preferred by beginners
− Easier to learn
− Dynamically typed
− Automated garbage collection
− Lots of machine learning libraries
− Fast prototyping
− Platform-independent
− Large community
− Job availability
− Future potential
Cons
− Slow since interpreted
− Database access layer is not very mature
− Not suitable for hardware-near programming
− Not suitable for mobile development
− Not suitable for game development
− Slower than C/C++, Java
Source: https://www.neuralnine.com/top-5-programming-languages-for-2020/
Language Features: Java
Pros
− General-purpose language
− Easy to learn
− Large professional frameworks
− Perfect for enterprise applications
− Automated garbage collection
− Platform-independent
− Large community
− Supports multi-threading
− Native Android language
Cons
− Not optimal for game development
− Not many machine learning libraries
− Not suitable for hardware-near programming
− Slower than C/C++
4
13
Language Features: C/C++
Pros
− General purpose programming language
− Very good performance
− Hardware-near programming
− Good for game development
− Old, but still widely used
− Portability
− Supports multi-threading
− Great community
− Job opportunities
Cons
− Complex syntax
− Not object-oriented (supports struct)
− Not good for beginners
− Not suitable for mobile development
− Too low level for many complex tasks
Others: JavaScript, C#, …
14
Python
Python (named after the British comedy group Monty Python) was conceived in the 80's by its designer Guido van Rossum of the Netherlands and was influenced by many languages (ABC, SETL, Perl, …)
− It was supposed to be fun to use (like its name)
Python 2.0 was released in 2000, 3.0 in 2008
Is an interpreted high-level programming language and emphasizes code readability with the use of indentation
As of 2021, 3rd most popular PL behind Java and C
Object-oriented, uses dynamic typing
− Variable names are untyped (no need to indicate type)
− Type constraints are not checked at compile time
Python implementation alternatives
− CPython, Jython, IronPython, Stackless, PyPy
15
Is Python compiled? interpreted?
import dis
def example(x):
    for i in range(x):
        print(2*i)
dis.dis(example)

  3           0 SETUP_LOOP              28 (to 30)
              2 LOAD_GLOBAL              0 (range)
              4 LOAD_FAST                0 (x)
              6 CALL_FUNCTION            1
              8 GET_ITER
        >>   10 FOR_ITER                16 (to 28)
             12 STORE_FAST               1 (i)

  4          14 LOAD_GLOBAL              1 (print)
             16 LOAD_CONST               1 (2)
             18 LOAD_FAST                1 (i)
             20 BINARY_MULTIPLY
             22 CALL_FUNCTION            1
             24 POP_TOP
             26 JUMP_ABSOLUTE           10
        >>   28 POP_BLOCK
        >>   30 LOAD_CONST               0 (None)
             32 RETURN_VALUE

Here is an example of a short Python function and its bytecode (the exact opcodes differ between Python versions; this listing is from an older 3.x release)
In Python, the source code is compiled into a much simpler form called bytecode. These are instructions similar in spirit to CPU instructions, but instead of being executed by the CPU, they are executed by software called a virtual machine. (These are not VMs that emulate entire operating systems, just a simplified CPU execution environment.)
Compiled to bytecode, and the bytecode is interpreted (it can also be compiled)
JIT compilers (e.g., in Java virtual machines) convert bytecode into machine code for speed!
16
Python
To convert Python into bytecode, you do not invoke a compiler. It is done implicitly
− You run the Python code directly (.py file), or
− You write Python statements at the interactive prompt (>>>). Colab does not show the interactive prompt
− In both cases bytecode is generated
The notion of bytecode came about to avoid a multiplicative number of compilers
− 4 languages, 3 target platforms = 4*3 = 12 compilers
With bytecode (additive)
− 4 languages, 3 target platforms = 4 bytecode compilers + 3 runtime systems = 7 implementations instead of 12
5
17
A word of advice and caution
Do not get carried away by a language and its popularity (comes and goes)
Languages support same/similar things in slightly different ways depending on their vintage
What is important are the features of the language and your understanding of how to use it effectively
Object orientation (provides encapsulation)
− Fewer mistakes, design using interacting objects
Protections against making mistakes
− Pointers vs. object references, garbage collection
− C and even C++ suffered from this
Lexical/static scoping vs. dynamic scoping
− Easier to debug (do not have to trace calls)
Call by value vs. call by reference
− Understanding side effects
Static typing (type cannot change) vs. loose typing vs. dynamic typing
− Ease vs. mistakes tradeoff
Strong, weak, or no typing (has to declare type)
− Understandability of code
18
Learning Python
You need to invest in hands-on learning to get the hang of any programming language
− Making mistakes will certainly help you learn/understand
− Learning with minimal mistakes is the smart way. For this, you need to understand a few concepts clearly. Hope I can help you with this
But, YOU have to get your feet wet
− Use colab.research.google.com for interactive usage
You cannot learn a language just by reading and understanding code
− This is not poetry! Writing poetry is, perhaps, even harder
You cannot master a recipe by memorizing it!
− The proof is in the taste of the pudding
Proof of your understanding is in the time it takes to develop correct, easy to understand programs!
19
Another word of caution
Do not confuse or equate design with programming
Different things; they need different skill sets
Design involves developing multiple algorithms, data type choices, and their interaction
Programming entails implementing a given design in a specific, chosen PL (programming language)
Design is language independent
Implementation is NOT!
Matching/mapping design to an implementation in a PL is a non‐trivial task
Choosing appropriate PL features is not easy!
Which one is harder? Design? Or programming?
Python Primer
We will use 3.X, not 2.X
(unfortunately, Python is not always backward compatible!)
YOU are going to write your Python code using Colaboratory from Google Research
colab.research.google.com
Colaboratory, or "Colab" for short, allows you to write and execute Python in your browser with
• Zero configuration required
• Free access to GPUs (not that we need it!)
• Easy sharing
Source: Donna French
Tutorial: https://docs.python.org/3/tutorial/index.html
6
21
Python
So, let us understand Python concepts and their intuition along with examples so you can practice further on your own (in 2 to 3 classes)
1. Overall philosophy of Python
2. Variables, typing, strings, and print statements
3. Pattern matching
4. Scoping rules, polymorphism
5. Mutable and immutable concepts
6. Control structures
7. Call by value, reference
8. File read/write, serialization
9. Data structures
10. Objects, inheritance
11. Namespace, garbage collection, …
22
Python: Overall Philosophy
Borrow ideas whenever it makes sense
Things should be as simple as possible, but no simpler (Occam's principle)
Platform independence
Zen of Python (a few from Tim Peters)
Readability counts
− Strict indentation (no statement separator!)
Errors should never pass silently
If the implementation is hard to explain, it's a bad idea
Sparse is better than dense
Namespaces are one honking great idea ‐‐ let's do more of those!
23
Python
I will cover the following with examples so you can practice further on your own
Overall philosophy of Python
Typing, strings, and Print statements
Scoping rules, polymorphism
Mutable and immutable
Control structures
Call by value, reference
File read/write, serialization
Data structures
Objects, inheritance
Namespace, garbage collection, …
24
Python: core data types (built into the PL)
Numbers: 999, 3.14156, Decimal(), Fraction()
Strings: 'dasc 5300', "Bob's", b'a\x01c', u'sp\xc4m', r'C:\novel\trial'
Boolean: True (1) and False (0); what is: True + 10, True < False
Lists: [1, [2, 'three', 4.5], list(range(10))] -- ordered
Tuples: (1, 'jack', 'f', 22, 'cse'), named tuple – ordered (an integer literal with leading zeros, such as 0001, is a syntax error in Python 3)
Dictionaries: {'food' : 'pizza', 'juice' : 'orange'} – accessed by key, not by position (insertion order is preserved since Python 3.7)
Files: open('census.txt', 'r'), open('result.txt', 'w')
Sets: {'john', 'mary', 'uma'}, set('abc') – no order! (set() takes a single iterable)
− | (union), - (diff), & (intersection), ^ (XOR), > (proper superset), < (proper subset)
Other core values – None, NaN (NaN is a float value, float('nan'))
Program unit types: functions, modules, classes
Implementation-related types: compiled code, stack tracebacks
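A small sketch of the Boolean questions and the set operators listed above. The names a and b and the sample values are made up for illustration.

```python
# Booleans behave like the integers 1 and 0 in arithmetic and comparisons.
bool_sum = True + 10        # 1 + 10
bool_cmp = True < False     # 1 < 0

# Illustrative sets (values are assumptions for this example).
a = {'john', 'mary', 'uma'}
b = {'mary', 'raj'}

union = a | b               # members of either set
difference = a - b          # members of a that are not in b
intersection = a & b        # members of both sets
symmetric = a ^ b           # members of exactly one set
is_superset = a > {'john'}  # proper superset test

# set() takes one iterable; a string contributes its characters.
chars = set('abc')

# Dictionaries keep insertion order (Python 3.7+), but are accessed by key.
menu = {'food': 'pizza', 'juice': 'orange'}
keys_in_order = list(menu)

print(bool_sum, bool_cmp)
print(union, difference, intersection, symmetric, is_superset)
print(chars, keys_in_order)
```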
7
Python: variables, objects, and references
Variables are names. Identifier syntax is followed
− Starts with a letter or underscore; no hyphens or other special characters; digits are OK after the first character; a leading underscore is used, by convention, for private identifiers
Upper and lowercase should be used properly to indicate semantics
placeOrder is better than placeorder or PLACEORDER
Variables are created when a value is assigned
Different from other PLs
Variable Type: the notion of type lives with objects, NOT names
Same name can be associated with objects of different types
Variable use: is replaced with the object it currently refers to (or points to). All variables in Python should be assigned before they are used!
Python: variables, objects, and references
Consider name → reference → object
a = 3
Now, add
b = 3
Now, add
a = 'dasc'
[Figure: a and b first both reference the object 3; after the last assignment, a references 'dasc' while b still references 3]
Notion of 'shared reference'
Mutable – ability to change "in-place"
− Integer objects are immutable (also strings)
− Sets allow in-place change using set methods (update)
Python: equality due to shared references
>>> l = [1, 3, 5]
>>> m = l
>>> l == m    # comparison of object values
True
>>> l is m    # is operator checks object identity. Both references should point to the SAME object!
True
On the other hand,
>>> l = [1, 2, 3]
>>> m = [1, 2, 3]
>>> l == m    # checks values; corresponds to the equals method of Java
True
>>> l is m    # compares object references (identity); here there are 2 copies of [1, 2, 3]
False         # Hence False
However,
>>> x = 99
>>> y = 99
>>> x == y
True
>>> x is y
True          # why?
Small objects are NOT replicated, for efficiency! There is one copy of 99 (CPython caches small integers; this is an implementation detail, so do not rely on is to compare numbers)
sys.getrefcount(object) returns the number of references to the object
Please look up "weak reference"; implemented in the weakref standard library module
If X is Y then X == Y is True
If X == Y then X is Y need not be True!
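A minimal sketch of why shared references matter for mutable objects: a change made through one name is visible through the other. The variable n and the use of list() to make a copy are additions for illustration.

```python
l = [1, 3, 5]
m = l             # m and l now share a reference to the same list object
m.append(7)       # in-place change through m ...

shared_identity = l is m   # still one object
l_after = list(l)          # ... is visible through l as well

n = list(l)       # list() builds a new list object with the same values
n.append(9)       # changing the copy leaves l untouched

print(l, m, shared_identity)
print(n, n == l, n is l)
```

Assignment never copies an object; it only binds another name to it. An explicit copy is needed when independent lists are wanted.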
28
Python: Typing
Python is dynamically typed, not statically typed
Python values, not variables, carry type information
− Unlike Java, C, C++ (int a; only allows integer values to be stored in a)
All variables in Python hold references to objects
− Has implications that need to be understood
# This is a single line comment in Python
alist = ['a', 'm', 'z']
print(alist)
alist = 3.14
print(alist)
alist = 55
print(alist)
print(z)    # in-line comment. What happens?
-----------------
['a', 'm', 'z']
3.14
55
----------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-5-d61a29e07a40> in <module>()
      5 alist= 55
      6 print(alist)
----> 7 print(z)

NameError: name 'z' is not defined
Type of a variable is allowed to change over its lifetime
8
Python: Typing comparison
Java
String Name = "Python";
int age = 30;
double version = 3.9;
C
char Name[10] = "Python";
int age = 30;
double version = 3.9;
Python
Name = "Python"
age = 30
version = 3.9
+ Can make it easier to get started quickly
- Can be error prone
Statically typed languages reject a type error at compile time; Python reports it only at run time, when the offending statement executes, or produces an unexpected result.
From: Donna French
Python Operators
Operator   Description                Example   Result
+          Addition                   3 + 4     7
-          Subtraction                7 - 2     5
*          Multiplication             5 * 6     30
/          Division                   32 / 3    10.666666666666666
%          Modulus (mod)              32 % 3    2
//         Integer (floor) division   32 // 3   10
**         Exponentiation             4 ** 2    16
Operator * is overloaded; for strings and numerical values
Operator + is overloaded; for strings and numerical values
Operator / is undefined for some types (e.g., strings)
Supports bitwise operations (<<, …)
import math       # math is a module that can be imported; its names are used as module.name
math.pi           # once imported, all of its functions and constants become available
3.141592653589793
math.sqrt(85)     # functions are invoked as module.function_name(); you need to look up available functions (HW)
9.219544457292887
math.isnan(99)    # NaN – not a number
False
31
Python: Typing
Dynamic typing can lead to some unexpected results, if you are not careful
len = 10
width = 5
print(len*width)
len = ['dasc']      # list
print(len*width)
width = [' 5300']
print(len+width)
len = 'dasc'        # string
width = ' 5300'
print(len+width)
-----------------------------------
50
['dasc', 'dasc', 'dasc', 'dasc', 'dasc']    # repetition
['dasc', ' 5300']                           # list concatenation
dasc 5300                                   # string concatenation
What is going on?
− * is an overloaded operator and repeats a list or string n times
− + is also overloaded and acts as list/string concatenation
− At the same time, / and * between two strings will give an error! (Why?)
− Note: the name len used here shadows the built-in function len(); avoid this in real code
print(len/width) or print(len*width)
----------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-5dae687cc566> in <module>()
      9 width = ' 5300'
     10 print(len+width)
---> 11 print(len/width)
     12

TypeError: unsupported operand type(s) for /: 'str' and 'str'
Python Strings
A string is a sequence of characters.
A string in Python is immutable
- value cannot be changed 'in-place'
- path = 'abc'
- print(path)
- path[1] = 'd'   # TypeError: 'str' object does not support item assignment!
- print(path)
Concatenation or replace creates a NEW string
Using + with two strings will concatenate them together.
Using * with a string and an integer repeats the string
Operator / will give an error!
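The string rules above can be tried out as follows. This is only a sketch; the variable names other than path are made up for the example.

```python
path = 'abc'

# Item assignment on a string raises TypeError because strings are immutable.
try:
    path[1] = 'd'
    message = ''
except TypeError as err:
    message = str(err)

# replace, + and * all build NEW string objects; path itself is unchanged.
new_path = path.replace('b', 'd')
joined = path + 'def'
repeated = path * 3

print(message)
print(path, new_path, joined, repeated)
```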
9
Python Strings
So…single quotes or double quotes?
' and " are interchangeable most of the time…
\ is an escape character; widely used as such!
Use " instead of ' to quote the string, or use \ to escape the '
We can calculate the length of a string by using len()
The in operator can be used to check for a substring
Remember, the index starts from 0 and goes up to 33; total 34 characters
10
Python Primer
In a string
\n is used to add a newline to a text string
\t is used to add a tab
\r is carriage return
\v is vertical tab
\xhh is the character with hex value hh
Use the line continuation character \ to break a statement into multiple lines.
Python: strings
You need to be very careful while using Python strings
Especially when escape characters are embedded!
path = 'C:\new\text.dat'      # myfile = open('C:\new\text.dat', 'w')
print(path)
path = 'C:\\new\\text.dat'    # escaping using \ character
print(path)
path = r'C:\new\text.dat'     # disable escape: raw string mode
print(path)
----------------------------------
C:
ew      ext.dat               (\n became a newline and \t a tab)
C:\new\text.dat
C:\new\text.dat
Strings can be indexed, sliced
− path[3] is 'n'
− path[i:j] extracts the contiguous section of the sequence from i to j-1
− path[1:3] is ':\', path[1:-1] is ':\new\text.da'
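A short sketch that checks the three spellings of the path and the indexing claims. The names plain, escaped, and raw are introduced here for illustration only.

```python
plain = 'C:\new\text.dat'       # \n and \t are read as newline and tab
escaped = 'C:\\new\\text.dat'   # each \\ stands for one literal backslash
raw = r'C:\new\text.dat'        # raw string: backslashes are kept as written

# The plain version is two characters shorter: each two-character escape
# sequence collapsed into a single control character.
print(len(plain), len(escaped), len(raw))
print(escaped == raw)

# Indexing and slicing on the raw string.
print(raw[3])       # the character after the first backslash
print(raw[1:3])     # positions 1 and 2
print(raw[1:-1])    # everything except the first and last character
```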
39
Python: indexing
Python index starts from 0, as in most other languages
− Same for arrays, strings, lists, … Not sets (Why?)
Goes from 0 to len(a) - 1; also from -1 to -len(a)
− A negative index gives an out-of-bounds error in other languages (e.g., ArrayIndexOutOfBoundsException in Java)
[Figure: the array/string 'DASC 5300' with index 0 to (len() - 1), or 8, from the left, and index -1 to -len() from the right (unique to Python)]
40
Python: strings
Extended slicing s[i:j:k] accepts a step or stride k, which defaults to +1
− from i up to < j in steps of k
Allows for skipping items and reversing order (HW problem); good test problem!
Can also be used as
− 'spam'[1:3] results in 'pa'
− 'spam'[slice(1,3)] results in 'pa'
slice() returns an object that can be used as an index!
Line = 'dasc 5300 or CSE 5300'
Line.split(' ') gives
['dasc', '5300', 'or', 'CSE', '5300']
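A few extended-slicing cases, sketched on the string used in the indexing figure. The variable names are made up for the example.

```python
s = 'DASC 5300'

every_other = s[::2]       # start to end, taking every 2nd character
reversed_s = s[::-1]       # a step of -1 walks the string backwards
last_four = s[-4:]         # negative start counts from the right
piece = s[slice(1, 3)]     # a slice object works like 1:3

print(every_other)
print(reversed_s)
print(last_four)
print(piece)
```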
11
41
Python: bounds checking
len() function applies to all arrays, lists, sets, and others
A bound check is done to prevent access beyond the current size
alist = [1, 2, 3, 4, 5]
print(alist, len(alist))
alist[4] = 6
print(alist, len(alist))
myset = {'a', 'b', 'c', 'a'}
print(myset, len(myset))
alist[10] = 10
---------------------------------
[1, 2, 3, 4, 5] 5
[1, 2, 3, 4, 6] 5
{'b', 'a', 'c'} 3
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-14-733bb04efd44> in <module>()
     24 myset = {'a', 'b', 'c', 'a'}
     25 print(myset, len(myset))
---> 26 alist[10]=10

IndexError: list assignment index out of range
Look up comprehensions in Python (for list, set, …)
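As a starting point for that lookup, here is a small sketch of list, set, and dictionary comprehensions. The data values are illustrative.

```python
alist = [1, 2, 3, 4, 5]

# List comprehension: build a new list from an expression over each item.
squares = [n * n for n in alist]

# An optional if clause filters the items.
even_squares = [n * n for n in alist if n % 2 == 0]

# Set comprehension: duplicates collapse, and there is no order.
letters = {ch for ch in 'banana'}

# Dictionary comprehension: key: value pairs.
lengths = {word: len(word) for word in ['dasc', 'cse']}

print(squares, even_squares)
print(letters, lengths)
```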
Python Primer
How do we format our output? We use the built-in function format or the string method .format
format is used by substituting replacement fields for placeholders in the string using the formats specified within the placeholder.
format(replacementField, …)
Python: formatting
Type of Replacement Field   Type Conversion Character   Output Format
String                      s                           String in default format
Integer                     c                           Converts an integer to its Unicode equivalent
Integer                     d                           Decimal integer (default)
Floating point              f or F                      Floating point with a precision of 6
Precision                   .n                          Display n digits after the decimal point
print("%s %d %c %.2f %.3f" % (x, y, z, a, b))    # the same conversion characters used with the older % operator
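The conversion characters in the table can be used in three equivalent ways. This sketch compares the % operator, str.format, and f-strings (f-strings are not mentioned in the slides and are added here for comparison); the values are made up.

```python
course, number, code_point, gpa = 'dasc', 5300, 65, 3.14159

# Older % operator with a tuple of values.
percent_style = '%s %d %c %.2f' % (course, number, code_point, gpa)

# str.format with replacement fields; the conversion follows the colon.
format_style = '{} {:d} {:c} {:.2f}'.format(course, number, code_point, gpa)

# f-string: the expression and its format specification sit inside the braces.
f_style = f'{course} {number:d} {code_point:c} {gpa:.2f}'

print(percent_style)
print(format_style)
print(f_style)
```

All three produce the same text; c turns the integer 65 into the character with that code point, and .2f keeps two digits after the decimal point.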
Python Primer
Built‐in Functions
We just learned how to use len() to get the length of a string. Function len() is built into the language.
Some built‐in functions return values and other do not
None is a special value that means nothing was returned
12
45
Python
I will cover the following with examples so you can practice further on your own Overall philosophy of Python
Typing, strings, and Print statements
Pattern matching (regular expressions in Python)
Scoping rules, polymorphism
Mutable and immutable
Control structures
Call by value, reference
File read/write, serialization
Data structures
Objects, inheritance
Namespace, garbage collection, …
46
Python: Pattern matching
Pattern matching is typically done using scripts and is essential and very useful for processing text data (given in project 1)
− Project 1 data is given as text (in .csv file format)
Regular expressions are used for pattern matching
− The Python module re supports this and needs to be imported
− Substring matching, wild card specification, and splitting can be done (quite extensive)
− 50+ lines of Java code can be expressed in 1 or 2 lines in Python or Perl!
. any char
^ start of string (also "not", when it is the first character inside [])
$ end of string
* 0 or more, + 1 or more
\w any letter, digit, or _ ; \W any char not part of \w
\d any digit; \D any char that is not a digit
47
Python: search and match
re.search() takes a regular expression pattern and a string and searches for that pattern anywhere within the string. If the search is successful, search() returns the first match object; otherwise it returns None. None is a keyword
re.match() tries to match the pattern only at the beginning of the string and returns the match object if found; otherwise it returns None.
The difference: both return the first match, but re.match() looks only at the beginning of the string. If the pattern occurs only somewhere in the middle of the string, re.match() returns None.
re.search() scans the whole string, even if it contains multiple lines, and returns the first match found on any line
re.IGNORECASE is a useful flag
Use of () creates groups (returned as a tuple by groups()), which can be indexed
[a-f] any one char enclosed in []
{i,j} min and max repetitions explicitly indicated
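A compact sketch of the difference between re.match() and re.search(), plus the IGNORECASE flag and groups. The sample text and variable names are illustrative.

```python
import re

text = 'my Cookie jar'

at_start = re.match('Cookie', text)      # pattern is not at position 0
anywhere = re.search('Cookie', text)     # found later in the string
ignoring_case = re.search('cookie', text, flags=re.IGNORECASE)
grouped = re.search(r'(\w+) (\w+)', text)  # two groups: first two words

print(at_start)
print(anywhere.span(), anywhere.group())
print(ignoring_case.group())
print(grouped.groups())
```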
48
Python: search and match
import re
print("‐‐‐‐‐‐‐‐1")
match1= re.search('Co.k.e', "Cookie") # . matches any character
print(match1)
print(match1.group())
print(match1.groups())
‐‐‐‐‐‐‐‐1
<re.Match object; span=(0, 6), match='Cookie'>
Cookie
()
print("‐‐‐‐‐‐‐‐2")
match1= re.search('Co{1,3}k.e', "My Cookie") # 1 min and max 3 of prev char
print(match1) # max has to be >= min
‐‐‐‐‐‐‐‐2
<re.Match object; span=(3, 9), match='Cookie'> # anywhere in the string
match1= re.search('Co{2,1}k.e', "My Cookie")
error: min repeat greater than max repeat at position 3 #UNDERSTAND
13
49
Python: search and match
a{m}     # matches exactly m occurrences of a
a{m,n}   # matches minimum m and max n occurrences of a
         # if m is omitted, it is assumed to be 0
*, +     # are greedy. If the RE <.*> is matched against '<a> b <c>', it will match the entire string,
         # whereas <.*?> will match only '<a>'
match1 = re.search('Co{1,2}k.e', "My Coookie")   # what will this match?
print(match1)
------------------------------------------
None
50
Python: search and match
print("‐‐‐‐‐‐‐‐3")
match1= re.search('Co{3,4}k.e', "my Cookie jar") # needs min 3 and max 4 o's after C; "Cookie" has only 2
print(match1)
‐‐‐‐‐‐‐‐3
None
print("‐‐‐‐‐‐‐‐4")
match1= re.search('Co.*e', "Cookie") # * preceding char 0 or more times
print(match1) # [. ]* for . or space 0 or more times
‐‐‐‐‐‐‐‐4
<re.Match object; span=(0, 6), match='Cookie'>
print("‐‐‐‐‐‐‐‐5")
match1= re.search('(Co.*e)', "my Cookie in a Cookie jar") # () grouping
print(match1)
print(match1.group(0), " :: ", match1.group(1)) # group(1) is first explicit capture
print(match1.groups()) # group(0) is the entire substring matched by ur regex
‐‐‐‐‐‐‐‐5
<re.Match object; span=(3, 21), match='Cookie in a Cookie'> # why BOTH Cookie?
Cookie in a Cookie :: Cookie in a Cookie
('Cookie in a Cookie',) # adds comma if only one element to indicate tuple
51
Python: search and match
print("--------7")
m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")   # match at the beginning only
print(m.group(0))   # The entire match
print(m.group(1))   # The first parenthesized subgroup.
print(m.group(2))   # The second parenthesized subgroup
print(m.groups())   # all groups as a tuple
--------7
Isaac Newton
Isaac
Newton
('Isaac', 'Newton')
print("--------8")
m = re.search(r"(\w+) (\w+)", "99, Isaac Newton, physicist")   # try match on this!
print(m.group(0))   # The entire match
print(m.group(1))   # The first parenthesized subgroup.
print(m.group(2))   # The second parenthesized subgroup.
print(m.groups())   # all groups as a tuple
--------8
Isaac Newton
Isaac
Newton
('Isaac', 'Newton')
52
Python: Pattern matching examples
match = re.match('/(.)*', '/usr/home/python/examples')
print(match)
<re.Match object; span=(0, 25), match='/usr/home/python/examples'>
print(re.match('/.*', '/usr/home/python/examples'))
# what will the above give?
match = re.match('/(.)', '/usr/home/python/examples')
print(match)
<re.Match object; span=(0, 2), match='/u'>
match = re.match('/(.)/', '/usr/home/python/examples')
print(match)
None
match = re.match('(.)*/', '/usr/home/python/examples')
print(match)
<re.Match object; span=(0, 17), match='/usr/home/python/'>
14
53
Python: re grouping
A group is a part of a regex pattern enclosed in parentheses () metacharacter. We create a group by placing the regex pattern inside the set of parentheses ( and )
Capturing groups are numbered by counting their opening parentheses from left to right.
Also, capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses (, ). For example, in the expression, ((\w)(\s\d)), there are three groups
((\w)(\s\d))   # group 1
(\w)           # group 2
(\s\d)         # group 3
54
Python: re grouping
Anything you have in parentheses () will be a capture group. Using the group(group_number) method of the regex Match object, we can extract the matching value of each group. group_number starts with 1
The groups() method: using the groups() method of a Match object, we can extract all the group matches at once. It provides all matches in tuple format.
Group number 0 is always the entire match (the substring matched by the whole pattern, not the whole target string). If you call the group() method with no arguments at all or with 0 as an argument, you will get the entire match.
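The numbering of nested groups can be checked with the three-group expression from the previous slide. The target string 'dasc 5300' is an assumption made for this sketch.

```python
import re

# Groups are numbered by their opening parenthesis, left to right:
# group 1 = ((\w)(\s\d)), group 2 = (\w), group 3 = (\s\d)
m = re.search(r'((\w)(\s\d))', 'dasc 5300')

whole = m.group(0)      # the entire match, not the entire target string
outer = m.group(1)
letter = m.group(2)
space_digit = m.group(3)
all_groups = m.groups() # groups 1..3 as a tuple; group 0 is not included

print(repr(whole), m.span())
print(repr(outer), repr(letter), repr(space_digit))
print(all_groups)
print(m.string)         # the original target string is kept separately
```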
55
Python: other functions
findall(pattern, string, flags=0)
Finds all the possible matches in the entire sequence and returns them as a list of strings (a list of tuples when the pattern has more than one group). Each returned item represents one match. group() cannot be applied, as the items are strings or tuples, not match objects!
Flags: re.VERBOSE, re.DOTALL, re.IGNORECASE, re.MULTILINE, re.ASCII
finditer(string[, position[, end_position]]) (method of a compiled pattern)
Similar to findall() - it finds all the possible matches in the entire sequence but returns regex match objects as an iterator
finditer() might be an excellent choice when you want to have more information returned to you about your search. The returned regex match object holds not only the sequence that matched but also their positions in the original text.
import re
target_string = "The price of ice-creams PINEAPPLE 20 MANGO 30 CHOCOLATE 40"
# two groups enclosed in separate ( and ) brackets
# group 1: find all uppercase words
# group 2: find all numbers
# you can compile a pattern or directly pass it to the finditer() method
pattern = re.compile(r"(\b[A-Z]+\b).(\b\d+\b)")
# find all matches to groups
for match in pattern.finditer(target_string):
    print(match.group(1))   # extract words
    print(match.group(2))   # extract numbers
    print(match.groups())
    print(match.group(0))
print("1:", re.search(pattern, target_string).group(0))
print("2:", re.search(pattern, target_string).group(1))
print("3:", re.search(pattern, target_string).group(2))
-------------------------------------------------
PINEAPPLE
20
('PINEAPPLE', '20')
PINEAPPLE 20
MANGO
30
('MANGO', '30')
MANGO 30
CHOCOLATE
40
('CHOCOLATE', '40')
CHOCOLATE 40
1: PINEAPPLE 20
2: PINEAPPLE
3: 20
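For comparison, a sketch of what findall() returns with and without groups, and how finditer() exposes positions. The shortened target string is an assumption for this example.

```python
import re

target = 'PINEAPPLE 20 MANGO 30 CHOCOLATE 40'

# No groups in the pattern: findall returns a list of matched strings.
numbers = re.findall(r'\b\d+\b', target)

# Two groups in the pattern: findall returns a list of tuples, one per match.
pairs = re.findall(r'(\b[A-Z]+\b).(\b\d+\b)', target)

# finditer yields match objects, so positions are available as well.
positions = [m.span() for m in re.finditer(r'\b\d+\b', target)]

print(numbers)
print(pairs)
print(positions)
```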
Python re split
split(pattern, string, maxsplit=0, flags=0)
Splits string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.
print(re.split('[a-f]', '10a34B8547', flags=re.IGNORECASE))
['10', '34', '8547']
print(re.split('[a-f]', '10a34B8547', flags=re.IGNORECASE, maxsplit=1))
['10', '34B8547']   # only first and the rest of the string
print(re.split('[a-f]', '10a34B8547'))
['10', '34B8547']   # note the difference from above!
statement = "please contact us at: [email protected], [email protected]"
match = re.split(r'[:,]', statement)
print(match)
['please contact us at', ' [email protected]', ' [email protected]']   # can iterate through this list or use index
58
Python: re split
Be careful in using + and *
− + causes the resulting RE to match 1 or more repetitions of the preceding RE (* matches 0 or more). ab+ will match 'a' followed by any non-zero number of 'b's; it will not match just 'a'. [ab]+ will match 1 or more occurrences of any of a or b
If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list
Default separator of str.split() is any white space
txt = "welcome to dasc 5300"
print(txt.split())
print(re.split(r'\W+', 'Words, words, words, after all'))
print(re.split(r'(\W+)', 'Words, words, words, after all'))   # () also captures the separators
print(re.split(r'\W+', 'Words, words, words.', maxsplit=1))
------------------------------------------------------------------------------
['welcome', 'to', 'dasc', '5300']   # list of split elements
['Words', 'words', 'words', 'after', 'all']
['Words', ', ', 'words', ', ', 'words', ', ', 'after', ' ', 'all']   # words and captured separators
['Words', 'words, words.']   # note 2 strings, first and everything else due to maxsplit=1
Python: re Split
print(re.split(r'\W+', 'Words, words, words, after all!'))
['Words', 'words', 'words', 'after', 'all', ''] # note empty string at the end
print(re.split(r'\W+', 'Words, words, words, after all'))
['Words', 'words', 'words', 'after', 'all'] # NO empty string at the end
print(re.split(r'(\W+)', 'Words, words, words.'))
['Words', ', ', 'words', ', ', 'words', '.', ''] # () also returns the separators
print(re.split(r'(\W+)', 'Words, words, words'))
['Words', ', ', 'words', ', ', 'words'] # why no empty string at the end?
print(re.split(r'\W+', 'Words, words, words.', maxsplit=1))
['Words', 'words, words.']
print(re.split(r'(\W)', 'Words, words, words, 1')) # () grouping
['Words', ',', '', ' ', 'words', ',', '', ' ', 'words', ',', '', ' ', '1']
print(re.split(r'(\W+)', 'Words, words, words.')[0:2])
['Words', ', '] # first 2 elements; ', ' is included due to grouping
print(re.split(r'\W+', 'Words, words, words.')[0:2])
['Words', 'words']
60
Python re: greedy vs. non‐greedy
Quantifiers such as * and + are greedy by default: they match as much text as possible
heading = r'<h1>Title</h1>'
re.match(r'<.*>', heading).group()
-------------------------
'<h1>Title</h1>' # whole string is matched
On the other hand,
re.match(r'<.*?>', heading).group()
-----------------------------------
'<h1>'
Adding ? after the quantifier makes it perform the match in a non-greedy or minimal fashion; that is, as few characters as possible will be matched
The shortest match from the starting position, in some sense
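The greedy and non-greedy behavior can be seen side by side with re.findall. This is a small sketch; the html string is made up for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"   # hypothetical sample text

# Greedy: .* runs to the last '>' in the string, so there is one long match.
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first '>' it can, so each tag matches separately.
lazy = re.findall(r"<.*?>", html)

print(greedy)
print(lazy)
```

The non-greedy form is usually what is wanted when pulling out individual tags or fields.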
61
Python
I will cover the following with examples so you can practice further on your own
Overall philosophy of Python
Typing, strings, and Print statements
Pattern matching
Scope rules
Functions and polymorphism
Garbage collection
Mutable and immutable
Control structures
Call by value, reference
File read/write, serialization
Data structures
Objects, inheritance
Namespace, garbage collection, …
62
Python: Lexical scoping
Lexical or static scoping: the location of the name declaration determines its scope, not the (function) calls
− Used in Java, C++, … (most PLs)
− Can be determined by looking at the code
− Unrelated to the runtime call stack
Python uses lexical scoping based on assignment (as declaration is not needed)
Names must be assigned (not declared) before they are used
Just about everything related to names, including scope classification, happens at assignment time
Location of assignment determines the namespace association (binding) and the scope of its visibility
Dynamic scoping is determined at runtime
− Early versions of Lisp, APL, SNOBOL etc.
Scope rule: the "current" binding for a given name is the one encountered most recently during execution, using the stack
With dynamic scope:
− Name-to-object bindings cannot be determined by a compiler, in general
− Hard to keep track of active bindings when reading a program text
− Most languages therefore use lexical scoping and are compiled, or use a compiler/interpreter mix
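A short sketch of what lexical scoping means in practice: the name x inside show() is resolved where show() is defined, not where it is called. The function names are illustrative only.

```python
x = "global"

def show():
    return x                    # resolved in the module where show() is defined

def caller():
    x = "local to caller"       # a dynamically scoped language would pick this one
    return show()

result = caller()
print(result)
```

Under dynamic scoping the call would have seen the caller's x; Python returns the module-level value.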
64
Python Scope Details
def is executable code; it creates a function object and binds it to the function name
lambda creates a function object but returns it as a result
− Used for in-line functions where def does not work (advanced)
return sends a result object back to the caller
global declares module-level variables that are to be assigned
nonlocal declares enclosing function variables that are to be assigned
Calls/invocation are the same
Arguments are passed by object reference (no name aliasing)
Arguments are passed by position unless you say otherwise
Python LEGB scope lookup Rule
Local (function): names assigned in any way within a function (def or lambda), and not declared global in that function
Enclosing function locals: names in the local scope of any and all enclosing functions (def, lambda), from inner to outer
Global (module): names at the top level of a module file, or declared global in a def within the file
Built-in (Python): names pre-assigned in the built-in names module: open, range, SyntaxError, …
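The lookup order can be traced with one name defined at several levels. This is a minimal sketch; all function names here are made up for illustration.

```python
x = "global"                 # G: module level

def outer():
    x = "enclosing"          # E: local to outer, enclosing for the nested defs

    def uses_local():
        x = "local"          # L: found first
        return x

    def uses_enclosing():
        return x             # no local x, so the enclosing one is used

    return uses_local(), uses_enclosing()

def uses_global():
    return x                 # no local or enclosing x, so the global one is used

def uses_builtin():
    return len("dasc")       # len is not assigned in this file: found in built-ins

lookups = outer() + (uses_global(), uses_builtin())
print(lookups)
```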
66
Python Scope Details
If a variable is assigned inside a def, it is local to that function
If a variable is assigned in an enclosing def, it is nonlocal to nested functions
If a variable is assigned outside of all defs, it is global to the entire file
                               exec order
x = 6                          # 1  global name, in outermost scope
print(x)                       # 2
def fun():                     # 5
    x = 99                     # 6  local to fun()
    print(x)                   # 7
    y = 9                      # 8  nonlocal to fun_nested(); local to fun()
    def fun_nested():          # 10 nested function inside fun()
        print(y)               # 11
        x = 77                 # 12 local to fun_nested()
        print(x)               # 13
    fun_nested()               # 9
print(x)                       # 3
fun()                          # 4
                               # 14 insert print(y) here? See what happens (HW or test)
----------------------------------------
6
6
99
9
77
67
Python Scope Details
def intersect(seq1, seq2):
    res = []
    for x in seq1:          # 'in' used in a loop header
        if x in seq2:       # 'in' used as a membership test
            res.append(x)
    return res

def append(seq1, seq2):
    res = []
    for x in seq1:
        for x in seq2:
            res.append(x)
    return res

print(intersect('dasc', 'cse'))
print(intersect([1,4], [4]))
print(append('dasc', 'cse'))
print(intersect([1,4], (4)))
---------------------------------------
['s', 'c']
[4]
['c', 's', 'e', 'c', 's', 'e', 'c', 's', 'e', 'c', 's', 'e'] # repeated 4 times, why?
error
However, print(intersect([1,4], (4,))) works correctly! Why?
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-1b9ea9b12748> in <module>()
     15 print(append('dasc', 'cse'))
     16 print(intersect([1,4], [4]))
---> 17 print(intersect([1,4], (4)))

<ipython-input-10-1b9ea9b12748> in intersect(seq1, seq2)
      2     res = []
      3     for x in seq1:
----> 4         if x in seq2:
      5             res.append(x)
      6     return res

TypeError: argument of type 'int' is not iterable
In Python, the first argument of a method (conventionally self) is bound to the object used for invoking the method
68
Python
I will cover the following with examples so you can practice further on your own
Overall philosophy of Python
Typing, strings, and Print statements
Pattern matching
Scoping rules
Python Functions and polymorphism
Mutable and immutable
Call by value, reference
Input, operators, Boolean
File read/write, serialization
Control structures
Data structures
Objects, inheritance
Namespace, garbage collection, …
Python: input
So how do we prompt for input in Python?
Let's look at the built‐in function input
Format for using input
result = input(prompt) # e.g., result = input("enter: ")
result will be a string and prompt is typically a quoted string
Built-in functions can be used to convert the string result of input to other data types
For example, the returned string can be converted using int(result) or float(result)
Python: relational operators
Same as other languages
Python: Boolean
True and False (with capital T and F) are keywords
All objects have an inherent Boolean true or false value
Any non‐zero number or non‐empty object is true
Zero, empty object, and None are false
Boolean operators stop evaluating (“short circuit”) as soon as result is known
Important. Need to be careful!
if f1() or f2(): … may not reach f2 evaluation, if f1() is true
If evaluating both is important, use
tmp1, tmp2 = f1(), f2()
if tmp1 or tmp2: … # no parentheses needed around conditions
Python: Boolean
Remember “short circuit” evaluation (left to right)
X = a or b or c or None
Assigns to X the first "non-empty" (true) object among a, b, and c, or None if all of them are empty (or false)
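Short-circuit evaluation can be observed by recording which functions actually run. A small sketch; f1, f2, and the calls list are stand-ins for illustration.

```python
calls = []

def f1():
    calls.append("f1")
    return True

def f2():
    calls.append("f2")
    return False

# 'or' stops at the first true operand, so f2() is never called here.
if f1() or f2():
    pass
short_circuit_calls = list(calls)

# Evaluate both first when the side effects of f2() matter.
calls.clear()
tmp1, tmp2 = f1(), f2()
both_calls = list(calls)

# 'or' returns the first true operand itself, not just True or False.
a, b, c = "", [], "dasc"
X = a or b or c or None
print(short_circuit_calls, both_calls, X)
```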
Python: Boolean: check these
0 and 10      gives 0
1 and 0       gives 0
0 and -10     gives 0
-1 and 0      gives 0
1 and 1       gives 1
3 and 9       gives 9
-5 and -10    gives -10
74
Python
I will cover the following with examples so you can practice further on your own Overall philosophy of Python
Typing, strings, and Print statements
Pattern matching
Scoping rules
Python Functions and polymorphism
Mutable and immutable
Call by value, reference
Input, operators, Boolean
File read/write, serialization
Control structures, Recursion
Objects, inheritance
75
Python: if statement
An if test followed by one or more optional elif ("else if") tests and a final optional else block
if test:            # remember "short circuit" evaluation
    statements1
elif test2:         # no need for () around condition
    statements2     # any number of elif
else:               # remember the colon; default case, no test
    statements3
No switch or case statement (before the match statement added in Python 3.10). Use elif
Ternary expression as in C is allowed
a = y if x else z   # is the same as
if x:
    a = y
else:
    a = z
76
Python: if statement
choice = 'ham'
print({'spam': 22,
       'ham': 33,
       'eggs': 44,
       'bacon': 55}[choice])
Will print 33
This is called a dictionary-based switch
It is indexed using the key
Dictionaries and lists can span across lines
Any open syntactic pair (), {}, [] lets a statement continue on the next line
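A dictionary-based switch raises KeyError for an unknown key. One way to get a default case, sketched here with a hypothetical helper price_of, is the dictionary's get method.

```python
prices = {"spam": 22, "ham": 33, "eggs": 44, "bacon": 55}

def price_of(choice):
    # .get() supplies a default, playing the role of a switch's default case.
    return prices.get(choice, "unknown item")

ham_price = price_of("ham")
missing = price_of("toast")
print(ham_price, missing)
```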
77
Python: while loop
Has an optional else case unlike most other languages
while test:
    statements
else:
    statements
break jumps out of the closest enclosing loop
continue jumps to the top (loop header line) of the closest enclosing loop
pass does nothing; an empty statement
loop else block runs if and only if the loop is exited normally (i.e., without hitting a break)
while True: pass # is an infinite loop
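The loop else block is easiest to see in a search loop. The sketch below uses a made-up helper, first_factor, and assumes n is an integer of at least 2.

```python
def first_factor(n):
    """Return the smallest factor of n greater than 1, or None if n is prime."""
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            break            # exits the loop; the else block is skipped
        candidate += 1
    else:
        return None          # loop ended normally: no factor was found
    return candidate

print(first_factor(15), first_factor(13))
```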
78
Python: for loop
A generic iterator in Python; it can step through items in any ordered sequence or other iterable objects
The for statement works on strings, lists, tuples, and other built-in iterables as well as user-defined objects
Has a header line that specifies an assignment target (or targets) along with the object you want to step through. Has an optional else clause
for target in object:
    statements
else:               # exactly like in the while statement
    statements
break and continue are the same as in while
total = 0
for x in [1, 2, 3, 4]:
    total = total + x
print(total)
10
79
Python: for loop
If you are iterating thru a sequence of tuples, the loop target can actually be a tuple of targets
T = [(1,2), (3,4), (5,6)]
for (a, b) in T:
    print(a, b)
--------------------
1 2
3 4
5 6
Can also loop thru both keys and values in a dictionary
D = {'a': 1, 'b': 4, 'c': 6}
for key in D:
    print(key, '=>', D[key])
-------------------------------
a => 1
b => 4
c => 6
Please look up extended sequence assignments
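Two related idioms, sketched with the same dictionary D: items() yields key and value together, and extended sequence assignment uses a starred name to collect leftovers.

```python
D = {"a": 1, "b": 4, "c": 6}

# items() yields (key, value) tuples, which the loop header unpacks.
pairs = []
for key, value in D.items():
    pairs.append(f"{key} => {value}")

# Extended sequence assignment: the starred name collects the leftover items.
first, *rest = [1, 2, 3, 4]
head, *middle, last = "dasc"
print(pairs, first, rest, head, middle, last)
```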
80
Python: range
range(start, stop, step): start and step are optional
start: (lower limit) the first number of the sequence. The default value is 0 if not specified. For example, in range(0, 10), start = 0 and stop = 10
stop: (upper limit) numbers are generated up to, but not including, this integer. range() never includes the stop number in its result
step: the increment. Each number in the sequence is generated by adding step to the preceding number. The default value is 1 if not specified. For example, in range(0, 6, 1), step = 1
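A few calls make the three parameters concrete. range is lazy, so list() is used here only to show the values.

```python
only_stop = list(range(5))            # start defaults to 0, step to 1
with_step = list(range(2, 10, 3))     # 2, then keep adding 3 while below 10
counting_down = list(range(10, 0, -2))  # a negative step counts down; stop is still excluded
print(only_stop, with_step, counting_down)
```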
81
Python: List comprehension
Syntax is derived from a construct in set theory notation that applies an operation to each item in the set
Teen = { y | y > 12 and y < 20 }
The syntax for a Python comprehension is
L = [11, 12, 13, 14, 15]
res = [x + 10 for x in L]
print(res)
--------------------------------
[21, 22, 23, 24, 25]
Which can also be obtained using
res = []
for x in L:
    res.append(x + 10)
print(res)
--------------------------------------
[21, 22, 23, 24, 25]
List comprehensions are more concise to write!
Likely to run faster than a manual for loop
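A comprehension can also carry a condition, which brings it closer to the set-builder form of Teen. A small sketch; the ages list is made up.

```python
ages = [5, 13, 19, 20, 12, 16]        # hypothetical data

# Mirrors Teen = { y | y > 12 and y < 20 }
teens = [y for y in ages if 12 < y < 20]

# The same syntax builds dictionaries (and sets) as well.
squares = {n: n * n for n in range(4)}
print(teens, squares)
```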
82
Recursion: General
When the solution to a problem depends on smaller instances of the same problem, it can be computed either with iteration or recursion
Recursion calls the same solution from inside the solution to solve the problem
− Uses a language feature (most PLs support recursive calls)
− Needs a stack to store intermediate results and to unwind the recursive calls to compute the result!
Iteration needs to identify and index the smaller instances at programming time
Factorial and the Fibonacci series are naturally expressed as recursive expressions!
− N! = N * (N-1)!, with 1! = 1 and 0! = 1 (base cases)
83
Python: Recursion Example
# watch out for stack overflow
import sys
sys.setrecursionlimit(1500)
def fact(n):
    if n > 10:              # base case is never reached when counting down from 10
        return 1
    else:
        return n*fact(n-1)  # direct recursion (not a tail call: the multiply happens after the call returns)
print(fact(10)) # what will be the answer?
----------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-4-d450c16c557f> in <module>()
      8         return n*fact(n-1)
      9
---> 10 print(fact(10))
Look up indirect recursion!
You can change the recursion limit!
Answer for fact(15)? HW
84
Python: Recursion Example
# watch out for stack overflow
import sys
sys.setrecursionlimit(1500)
def fact(n):
    if n == 1:
        return 1
    else:
        return n*fact(n-1)  # direct recursion (not a tail call)
print(fact(10)) # what will be the answer?
print(fact(20))
print(fact(100))
----------------------------------
3628800
2432902008176640000
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
You can change the recursion limit!
85
Python: Recursion Example
Summing a list of arbitrary numbers
def mySum(L):
    print(L)
    if not L:                           # if list is empty
        return 0
    else:
        return L[0] + mySum(L[1:])      # recursive call
mySum([1, 2, 3, 4, 5])
15
With the print statement
[1, 2, 3, 4, 5]
[2, 3, 4, 5]
[3, 4, 5]
[4, 5]
[5]
[]
15
mySum(L[5, 4, 3, 2, 1]) # will this work correctly? Answer? (HW)
Any recursive solution has a base case and a recursive case
− Empty list returns 0 (base case)
− Recursive call to the same function: mySum(L[1:])
Always, in a recursive solution, there will be a monotonically decreasing object which will reach the base case after a finite number of calls
− In this example, the list decreases in size, reaching an empty list
Beware of stack overflow. It occurs when recursion is not properly coded
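For comparison, here is a sketch of factorial written both ways, with a base case that also covers 0!. The names fact_recursive and fact_iterative are illustrative; math.factorial from the standard library serves as a cross-check.

```python
import math

def fact_recursive(n):
    if n <= 1:                          # base case covers both 0! and 1!
        return 1
    return n * fact_recursive(n - 1)    # n shrinks toward the base case

def fact_iterative(n):
    result = 1
    for k in range(2, n + 1):           # the smaller instances are indexed explicitly
        result *= k
    return result

print(fact_recursive(10), fact_iterative(10), math.factorial(10))
```

The iterative version needs no call stack, so it is not subject to the recursion limit.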
86
Recursion: General
Stacks and Heaps are in‐memory data structures used by a compiler
They serve different purposes
A stack is needed for allocating program local variables in a block structured language, for subroutine and recursive calls
− Stacks are LIFO (last in first out)
A heap is used for global variables and allocation of objects
Typically, given a memory allocation, heap is used from top downwards and stack from bottom upwards (Why?)
87
Recursion: General
Stack grows and shrinks
Heap is managed based on deletion and creation of objects
Stack overflow occurs when there is no more space for stack to grow
Memory exceeded occurs when there is no more heap space
Data and code segments are not shown
88
Python
I will cover the following with examples so you can practice further on your own
Overall philosophy of Python
Typing, strings, and Print statements
Pattern matching
Scoping rules
Call by value, reference
Python Functions and polymorphism
Mutable and immutable
Control structures
File read/write, serialization
Data structures
Objects, inheritance
Namespace, garbage collection, …
89
Call by value, references
All PLs pass arguments to functions using the above concepts (others also exist)
Call by value essentially sends a copy of the object
− Hence, any modification inside the function is local
− Copy is modified, not the original!
In Python, immutable arguments are effectively passed by value [why?]
− Strings, integers, …
In Python, mutable arguments are effectively passed by reference
− Objects are passed by reference (or pointer)
− Hence, any in-place change to the object is equivalent to changing the original object
− Lists, dictionaries, …
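The difference shows up when a function either rebinds its parameter or changes the object in place. A minimal sketch with made-up function names:

```python
def rebind(text):
    text = text + "!"        # builds a new string; the caller's name is untouched
    return text

def mutate(items):
    items.append(99)         # in-place change: visible to the caller

def rebind_list(items):
    items = [0]              # rebinds the local name only; caller unaffected

s = "dasc"
rebind(s)
nums = [1, 2]
mutate(nums)
rebind_list(nums)
print(s, nums)
```

Even for a mutable argument, plain assignment to the parameter name does not affect the caller; only in-place changes do.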
Python: variables, objects, and references
Consider name → reference → object
a = 3
Now, add
b = 3
Now, add
a = 'dasc'
Notion of 'shared reference'
Mutable: ability to change "in-place"
− Integer objects are immutable (also strings)
− Sets allow in-place change using set methods (update)
[Figure: names a and b both reference the integer object 3, so a and b are aliases for the same object; after a = 'dasc', a references the string object 'dasc']
91
Argument matching basics
Positionals: matched from left to right
Keywords: matched by argument name
− Positional arguments cannot follow keyword arguments
− Need to be careful when specifying args by position (has to be left to right)
Defaults: specify values for optional arguments that aren't passed
Varargs unpacking (in a call): pass arbitrarily many positional or keyword arguments
− Use the single asterisk (*) syntax to unpack a sequence (e.g., a list) into separate positional arguments
− Use two asterisks (**) to unpack a dictionary into keyword arguments
Varargs collecting (in a def): collect arbitrarily many positional or keyword arguments
− A parameter preceded by * collects extra positionals into a tuple; one preceded by ** collects extra keywords into a dictionary
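Collecting and unpacking can be sketched with one function. The function report and its sep option are hypothetical, chosen only to show the mechanics.

```python
def report(title, *scores, **options):
    # scores is a tuple of extra positionals; options is a dict of extra keywords.
    sep = options.get("sep", ", ")
    return title + ": " + sep.join(str(s) for s in scores)

# Collecting: 90 and 85 land in scores, sep lands in options.
collected = report("quiz", 90, 85, sep=" | ")

# Unpacking: * spreads the list into positionals, ** spreads the dict into keywords.
args = [70, 80]
kwargs = {"sep": "/"}
unpacked = report("lab", *args, **kwargs)
print(collected, unpacked)
```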
92
Argument matching basics
def whatDay(mon, mon2, Date='12-09-2021', day='monday'):
    print(Date, day, mon, mon2)
whatDay(99, day=100, Date='sat', mon=500)
----> 3 whatDay(99, day=100, Date='sat', mon=500)
TypeError: whatDay() got multiple values for argument 'mon'
whatDay(99, day=100, Date='sat', '500')
SyntaxError: positional argument follows keyword argument
whatDay(99, day=100, Date='sat', mon2=700)
sat 100 99 700
whatDay(99, 500, day='100', Date='sat')
sat 100 99 500
whatDay('jan', 'feb')
12-09-2021 monday jan feb
93
Python
I will cover the following with examples so you can practice further on your own
Overall philosophy of Python
Typing, strings, and Print statements
Pattern matching
Scoping rules
Call by value, reference
Python Functions and polymorphism
Mutable and immutable objects
Control structures
Input, File read/write, serialization
Data structures
Objects, inheritance
Namespace, garbage collection, …
94
Python: Functions
A function groups a set of statements so they can be run more than once in a program
− Invocation or calling
− Avoids cutting and pasting the same code
However, we are likely to make small changes with cut and paste
− Parameters are used for this customization
A basic program structure
− Code reuse
− Design by procedural decomposition
− Understandability of design
95
Python: Function related statements
Statement          Example
call expression    myfunc('sharma', 256, pos='professor')   # keyword arguments come after positional ones
def                def twice(message):          # note colon
                       print(message*2)         # note indentation
return             def adder(a, b=1, *c):       # variable num of args
                       return a+b+c[0]
nonlocal (3.X)     def outer():
                       x = 'old'
                       def changer():
                           nonlocal x; x = 'new'
global             x = 'old'
                   def changer():
                       global x; x = 'new'
yield              def squares(x):
                       for i in range(x): yield i**2
lambda             funcs = [lambda x: x**2, lambda x: x**3]
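The global and nonlocal rows can be exercised with two small counters. This is a sketch; the names counter, bump_global, and make_counter are made up.

```python
counter = 0                      # module-level (global) name

def bump_global():
    global counter               # without this, 'counter += 1' raises UnboundLocalError
    counter += 1

def make_counter():
    count = 0                    # local to make_counter, enclosing for step()
    def step():
        nonlocal count           # rebind the enclosing function's variable
        count += 1
        return count
    return step

bump_global()
bump_global()
step = make_counter()
step()
second = step()
print(counter, second)
```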
96
Python: Functions
def is executable code
A new statement (starts on a line)
Does not exist until Python reaches and executes def
You can have def inside if statements, while loops, and even other defs (nested)
def creates a function object and assigns it to a name (the function name)
Function objects can be assigned to other names and can be stored in a list
97
Python: Functions
lambda is an anonymous function
lambda creates a function object and returns it as the result of the expression
Name assignment is not automatic as in a def
lambda is an expression, not a statement. Hence, it can appear in places where def cannot (inside a list literal or call arguments, etc.)
A lambda's body is a single expression, not a block of statements
F = lambda x, y, z: x+y+z
F(2, 3, 4) gives 9
98
Python: lambda usage
L = [lambda x: x**2,    # inline function definition
     lambda x: x**3,
     lambda x: x**4]    # list of callable functions
for f in L:
    print(f(2))         # prints 4, 8, and 16
print(L[0](3))          # prints 9
What will print(L[2](2)) give? # expect these on tests
------------------------------- OR ----------------------
def f1(x): return x**2
def f2(x): return x**3
def f3(x): return x**4
L = [f1, f2, f3]
for f in L:
    print(f(2))         # prints 4, 8, 16
99
Python: Functions
return sends a result object to the caller
− Synchronous call (interrupts are asynchronous)
− Returns an object or None to the caller
yield sends a return object to the caller, but remembers where it left off (suspends)
Used by generators to suspend state and resume later to produce a series of results over time (advanced)
global declares module level variables that are to be assigned!
nonlocal declares enclosing function variables that are to be assigned!
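How yield suspends and resumes can be seen by pulling values one at a time with next(). A minimal sketch based on the squares generator:

```python
def squares(limit):
    for i in range(limit):
        yield i ** 2          # hands back one value, then pauses here

gen = squares(4)
first = next(gen)             # runs until the first yield
second = next(gen)            # resumes after the yield, runs to the next one
remaining = list(gen)         # drains whatever is left
print(first, second, remaining)
```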
100
Python: Functions
Arguments are passed by assignment (object reference)
Caller and callee share objects by references
Rebinding an argument name within a function does not change the corresponding name in the caller, but changing passed-in mutable objects in place can change objects shared by the caller, and serve as a function result
− Strings are immutable; they cannot be changed in place, so operations on them inside the function have no effect on the value seen by the caller!
Arguments are passed by position, unless stated otherwise
− Arguments are used left to right
− Name = value keyword syntax can be used
Arguments, return values are not declared
There are no type constraints on functions.
A single function can be applied to a variety of object types
101
Polymorphism (in Python)
Behavior based on the number of parameters and their types
− In Java, multiply(int x, int y), multiply(float x, float y), multiply(int x) are polymorphic (overloaded)
− Which one is called depends on parameter types and number
− Else error
In the absence of polymorphism
− Have to name each function differently: multiply_int(), multiply_real(), etc.
− User has to call them appropriately
With polymorphism
− You make the system do the grunt work!
− Raising the level of abstraction!
102
Polymorphism in Python
Behavior based on the number of parameters and their types
− In Java, multiply(int x, int y), multiply(float x, float y), multiply(int x) are polymorphic
− Which one is called depends on parameter types and number
− Else error
In Python
def times(x, y):
    return x*y
times(2, 4)         gives 8
times(3.14, 4)      gives 12.56
times('dasc', 3)    gives 'dascdascdasc'
In Python, * works on BOTH numbers and sequences
Because we never declare the types of variables, arguments, or return types in Python, we can use times to either multiply numbers or repeat sequences!
Because it's a dynamically typed language, polymorphism runs rampant in Python. Every operation is a polymorphic operation: printing, indexing, * and much more! Feature, not bug!
103
Polymorphism in Python
Philosophical differences with statically typed languages like C/C++ and Java
In lieu of declared types, extra testing is necessary to make sure the code works correctly!
What is done by the compiler now needs to be done by the developer
By and large, coding is done to object interfaces, not data types
This style of polymorphism is also known as duck typing
− Your code is not supposed to care whether an object is a duck, only whether it quacks. The implementation of "quacking" is up to the object
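Both ideas can be sketched together: times relies on whatever * means for its arguments, and make_it_quack relies only on the presence of a quack method. The classes Duck and Robot and the function make_it_quack are made up for illustration.

```python
def times(x, y):
    return x * y             # whatever '*' means for the type of x

class Duck:
    def quack(self):
        return "Quack"

class Robot:
    def quack(self):
        return "Beep (quack)"

def make_it_quack(thing):
    return thing.quack()     # no type check: any object with quack() works

results = [times(2, 4), times("dasc", 3), times([1, 2], 2)]
sounds = [make_it_quack(obj) for obj in (Duck(), Robot())]
print(results, sounds)
```

Passing an object without a quack method would fail only at call time with AttributeError, which is why extra testing matters.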
104
Polymorphism example
def intersect(seq1, seq2):
    res = []
    for x in seq1:
        if x in seq2:
            res.append(x)
    return res
intersect('dasc', 'cse')
['s', 'c']
print(intersect([1,4], [4]))
[4]
print(intersect([1, 4, 5], (1,5)))
[1, 5]
print(intersect([1,4], (4,))) # (4,) makes it a tuple and is iterable! Gives [4]
[4]
print(intersect([1,4], (4))) # (4) is just the int 4, which is not iterable
<ipython-input-7-1e560012f002> in intersect(seq1, seq2)
      2     res = []
      3     for x in seq1:
----> 4         if x in seq2:
      5             res.append(x)
      6     return res
TypeError: argument of type 'int' is not iterable
105
Python
I will cover the following with examples so you can practice further on your own Overall philosophy of Python
Typing, strings, and Print statements
Pattern matching
Scoping rules, polymorphism
Call by value, reference
Garbage collection in Python
Mutable and immutable
Control structures
File read/write, serialization
Data structures
Objects, inheritance
106
Python: Garbage collection
Objects are allocated in memory and take up memory space
Over the course of a program execution, more and more memory is allocated for various purposes
− Should understand stack and heap allocations
Memory space is still allocated even if you stop using that variable
Eventually, your program runs out of available memory space and exits
Recovering that "unused" memory is the concept of "garbage collection"
Has to be done by the user in C/C++ and is not easy
Java and Python do garbage collection without user intervention
How does the computer know when memory is not used anymore? Not an easy task. (why?)
− aliasing!
107
Python: Garbage collection
Consider the following code
Alist = ['dasc 5300', 'cse 5300']
.
.
.
Alist = 28.8
Alist = 33
The above is straightforward! Memory of list and float can be garbage collected!
108
Python: Garbage collection
It is not always that easy
If an object is referenced by multiple names (aliases), then you cannot garbage collect unless all of the references are gone
Another issue: cyclic references
− Not easy to identify! Overhead, slows things down
Hence, some bookkeeping is necessary (reference counters)
So, there is a runtime overhead for automatic garbage collection
− Cycle check can be disabled in Python (benefit?)
User doing garbage collection is error prone!
− Very critical for programs that use lots of memory space!
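The cycle problem can be observed with the standard gc and weakref modules. This sketch assumes CPython, where reference counting frees most objects immediately and a separate cycle collector handles the rest; the Node class is made up.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.partner = None

def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a      # a and b reference each other
    return weakref.ref(a)            # a weak reference does not keep 'a' alive

gc.disable()                         # turn off the automatic cycle check
probe = make_cycle()
alive_before = probe() is not None   # the cycle keeps both reference counts above zero
gc.collect()                         # run the cycle collector by hand
alive_after = probe() is not None
gc.enable()
print(alive_before, alive_after)
```

With the cycle check disabled, reference counting alone never reclaims the two nodes; an explicit gc.collect() does.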
109
Python
I will cover the following with examples so you can practice further on your own
Overall philosophy of Python
Typing, strings, and Print statements
Pattern matching
Scoping rules
Python Functions and polymorphism
Mutable and immutable
Call by value, reference
Input, operators, Boolean
File read/write, serialization
Control structures
Objects, inheritance
Namespace, garbage collection, …
Python: file operations
The built‐in open function creates a Python file object, which serves as a link to the file residing on your machine
output = open(r'C:\mydata.txt', 'w') # opens file for write; note raw string
infile = open('data', 'r') # read mode, 'r' is default; naming it input would shadow the built-in
aString = infile.read() # reads entire file into a single string
aString = infile.read(n) # reads up to n characters (or bytes) into a string
aString = infile.readline() # read next line (including \n newline) into a string
aList = infile.readlines() # read entire file into list of line strings (with \n)
-----------------------------------------
output.write(aString) # write a string of characters (or bytes) into file
output.writelines(aList) # write all the strings in the list into file
output.close() # manual close (done for you when file is collected)
output.flush() # flush output buffer to disk without closing
anyfile.seek(n) # change the position to offset n for next operation
Python: file operations
When you use the open function, you have to close the file manually
This may not happen if there is an exception raised by the code
A try statement with finally is used to circumvent that
myfile = open(r'C:\code\textdata', 'w')
try:
    process myfile …
finally:            # block always executed
    myfile.close()
However, using a with statement is a better practice. It automatically closes the file even if the code encounters an exception
with open(r'C:\code\textdata', 'w') as myfile:
    process myfile …
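A runnable sketch of the with statement, writing to a temporary directory so it does not depend on any particular path. The file name and contents are placeholders.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "textdata.txt")   # throwaway location

with open(path, "w") as myfile:
    myfile.write("dasc 5300\n")
    myfile.writelines(["cse 5300\n", "python primer\n"])
closed_after_with = myfile.closed        # the file is already closed here

with open(path) as myfile:               # 'r' is the default mode
    lines = [line.rstrip("\n") for line in myfile]
print(closed_after_with, lines)
```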
Python: file operations
for line in open('data.txt'): # file iterators read line by line
    code for processing line
Files are buffered and seekable
− Remember buffers are flushed by the OS when they get full
− close and flush operations empty the buffer into the file (even when they are not full)
File statements along with the re package are a powerful data pre-processing tool for data analysis
Alternative usages
open('myFile.txt').read() # read entire file into a string
print(open('myFile.txt').read()) # read and print the entire file
for line in open('myFile.dat'):
    print(line, end=':') # instead of a line feed, end adds the specified char
113
Memory Usage
When you execute a program, all data structures created during the program are allocated/stored in main memory
Data has to be in memory for any computation or modification
You typically create and delete data structures while running a program (memory allocation / deallocation)
All the data structures created in memory go away when program finishes (unless it is saved explicitly)
Programs read/create data, work on it, write results out to a file
RAM and persistence
Data in memory is not accessible after the program ends
− To preserve modifications from one run to another, it has to be explicitly written to persistent storage
RAM is volatile. Hence, it is erased when there is a power failure or system crash
− Flash and other memory types are not volatile
− RAM is fast, but expensive and volatile
To store data on disk (non-volatile or persistent), files are used
− In ASCII or Unicode or binary format
When persistent storage is used, data has to be transferred to memory for processing
Computation can be done only on data in memory/registers by the CPU
115
Computing in Memory
Different types of data structures and operations on them are supported by all programming languages
− arrays, vectors, linked lists, trees (several kinds), hash tables (dictionary in Python), B and B+ trees, etc.
All programming languages (including Python) support only in-memory data structures and possibly serialization to persist them in files
Each data structure serves a different purpose and has to be chosen with understanding and justification to meet algorithm/application needs
− Array is faster, but less flexible
− Hash tables support associative search (constant time on average)
− Trees support search in log time
Hence, data structures need to be chosen carefully to meet the needs of the algorithm or application
Understanding subtle differences is important!
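The three access patterns can be sketched with built-in types. Python has no built-in search tree, so binary search over a sorted list (the standard bisect module) stands in for logarithmic-time lookup here; the data is made up.

```python
import bisect

names = ["ada", "bob", "cyd", "dee", "eve"]      # hypothetical data, kept sorted
ages = {"ada": 36, "bob": 41, "cyd": 29, "dee": 52, "eve": 33}

# Linear scan: checks elements one by one.
position_linear = names.index("dee")

# Hash table (dict): associative lookup by key, constant time on average.
age_of_dee = ages["dee"]

# Binary search on sorted data: a logarithmic number of comparisons,
# the same idea a balanced search tree exploits.
position_binary = bisect.bisect_left(names, "dee")
print(position_linear, age_of_dee, position_binary)
```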
116
Usage Paradigms
Typical program
− Initializes data, works on it, produces results, prints/writes results, and finishes
The above is not sufficient for all needs
Alternate paradigm
− Load persisted data, manipulate, and automatically (i.e., without user intervention) persist changes for subsequent usage
Many applications need this paradigm
− When you edit a file, you want results (edits) to persist across editing sessions
− When you use Mymav, you want your registrations to persist across sessions
− UTA keeps track of their students, courses taken, transcripts for many years
117
Data Analysis perspective
Most of the data used for analysis are in files, spread sheets, or databases (in ASCII or as text)
Typically, all data is loaded for performing analysis
The way you are doing project 1
You will output/save some results, charts, plots into files (persistent store)
What happens if entire data cannot be loaded into memory?
Instead of 30,000+ rows, what if I gave 100 Million rows for project 1?
You cannot always use “load all data & analyze” approach!
118
Alternative and its implications
If the data set is very large (cannot be held in memory)
− Bring portions of data as needed for analysis
− May have to bring the same data multiple times from disk
− By staging data between disk and memory
Implications of the above
− Need for a main memory buffer and its management (complex)
− Buffer replacement policies (suitable ones)
− Same main memory algorithm may not be appropriate: disk-based sort vs. in-memory sort, association rule mining
− Impedance mismatch
− Has impact on response time
− Real-time analysis is difficult in this mode: Amazon, Netflix recommendations, click stream analysis etc.
− Stream data processing addressed these QoS (Quality of Service: latency, throughput, and memory usage) issues
− New algorithms, novel techniques
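Bringing in portions of data as needed can be sketched by streaming a CSV file row by row instead of loading it whole. The file below is a small generated stand-in (its columns id and value are made up); the same loop would work on a file far larger than memory.

```python
import csv
import os
import tempfile

# Build a small stand-in file; a real data set would already be on disk.
path = os.path.join(tempfile.mkdtemp(), "rows.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "value"])
    for i in range(1, 1001):
        writer.writerow([i, i * 2])

# Stream the file: only the current row is held in memory at any time.
count, total = 0, 0
with open(path, newline="") as f:
    for row in csv.DictReader(f):
        count += 1
        total += int(row["value"])
mean_value = total / count
print(count, mean_value)
```

A running sum and count are enough for a mean; analyses that need several passes would reread the file, which is the cost the slide points out.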
What is Impedance mismatch?
If you are analyzing data stored on disks, you have to understand this very clearly
CPU is way, way faster than disk access. In fact, it is a million or more times faster
CPU accesses are in nanoseconds and disk accesses are still in milliseconds!
Because of this speed difference, it takes much longer to bring data from a disk
So, the CPU may end up waiting/idling most of the time for data (if not architected properly)!
This is called impedance mismatch! DBMSs (and other systems) are architected keeping this in mind
Text vs. binary format on disk
For human readability, data is stored on disks in text format or can be output into a text format
− CSV, XML, YAML, …
However, these need to be interpreted/converted after reading for computation, which can be expensive or time-consuming
Alternatively, data can be stored in binary format on disk
− A more concise representation (.pyc, .o, .jpeg, .mp4, etc.)
But loading back into memory can be complicated
Not human readable
Avoids reconstruction in some environments
Object and data structure persistence
Earlier programming languages did not provide support for persisting data structures or objects created in memory during a program execution
− This had to be done by the user
Current programming languages, including Python, support writing (serializing) complex objects (cyclic as well as embedded objects) to a file in binary and reloading them later
Requires certain constraints to be satisfied
− Object definitions should not change in between
Versioning of programs needs to be compatible
Also adds vulnerabilities
DBMS storage & data structure persistence
They are not the same
Persistent Java and persistent C++ were attempted
− By declaring objects to be persistent
− Did not gain traction
DBMSs store ALL data on disk and manage it
− Relations
− Indexes, views
− Catalogs
− Stored procedures, etc.
BTW, DBMSs also support embedding of SQL in programming languages!
123
Storing data on Disk
Data structures stored on disk have to be mapped properly to corresponding in-memory data structures/objects
− B and B+ trees, hash tables, others as well
Everything on disk is a file
− Program, data, executable code, image, video, relations, indexes
− ASCII, binary, encrypted, compressed, …
Pointers in memory are very different from pointers on disk (please understand this clearly)
− Memory-based pointers cannot be copied and stored on disk
− Memory pointers have to be converted into disk pointers and converted back to memory pointers (pointer swizzling)
Java and Python allow serialized objects to be written to a file in binary and loaded later
This is much more efficient than reconstructing the object in memory from an ASCII file representation
124
Object serialization
The process of converting application data to another format (usually binary) suitable for transportation is called serialization
The process of reading data back in after it has been serialized is called unserialization or deserialization
Vulnerabilities arise when developers write code that accepts serialized data from users and attempts to deserialize it for use in the program
Depending on the language, this can lead to all sorts of consequences
Malicious code can be inserted into the serialized binary data
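Why deserialization is risky can be seen with Python's pickle module (covered on the following slides). The sketch below is deliberately harmless: the class name Innocent is hypothetical, and the callable it smuggles in is just len. The point is only that the data, not the program, decides what gets called on load.

```python
import pickle

class Innocent:
    """Looks like plain data, but controls what unpickling will call."""
    def __reduce__(self):
        # On load, pickle calls len("abc"); an attacker could name any callable here
        return (len, ("abc",))

payload = pickle.dumps(Innocent())
result = pickle.loads(payload)   # runs len("abc") instead of rebuilding an Innocent
print(result)
```

The value that comes back is whatever the embedded call returned, which is why pickle data from untrusted sources should never be loaded.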
125
Python: Object serialization
How do you save (or persist) data structures from one program run to another?
Pickling is the process whereby a Python object hierarchy is converted into a byte stream (usually not human readable) to be written to a file; this is also known as serialization. Unpickling is the reverse operation, whereby a byte stream is converted back into a working Python object hierarchy.
− Serialization is also called marshalling or flattening
Pickle is operationally the simplest way to store/persist objects. The Python pickle module is an object-oriented way to store objects directly in a special storage format.
126
Pickling in Python
+ Pickle can store and reproduce dictionaries and lists very easily.
+ Stores object attributes and restores them to the same state.
- It does not save an object's code, only its attribute values.
- It cannot store file handles or connection sockets.
- Unpickling data from an untrusted source is unsafe (potential for malicious pickle data)
Usage
import pickle
Write a variable to a file, something like
pickle.dump(myObject, outfile, protocol)
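A complete round trip might look like the sketch below. The dictionary contents and the file name my_object.pkl are made up for illustration; the file is written to a temporary directory so the example leaves nothing behind in the working directory.

```python
import os
import pickle
import tempfile

my_object = {"name": "Sam", "scores": [91, 85], "active": True}

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "my_object.pkl")

with open(path, "wb") as outfile:      # pickle files must be opened in binary mode
    pickle.dump(my_object, outfile, protocol=pickle.DEFAULT_PROTOCOL)

with open(path, "rb") as infile:
    restored = pickle.load(infile)     # only unpickle data you trust

print(restored == my_object, restored is my_object)
```

The restored object is equal to the original but is a new object in memory, which is what persisting "from one program run to another" requires.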
127
Pickle and Marshal
Always use pickle. marshal is a more primitive serialization module, used for .pyc files
Recursive objects are handled by pickle
Shared objects are stored only once!
pickle can save and restore class instances transparently.
However, for pickling, the class definition must be importable and live in the same module as when the object was stored.
The marshal serialization is not guaranteed to be portable across Python versions
pickle serialization is guaranteed to be backwards compatible across Python versions, provided a compatible protocol is chosen via the protocol parameter
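The handling of shared and recursive objects can be checked directly. The small structures below are invented for the example; the identity tests (is) show that sharing and cycles survive the round trip.

```python
import pickle

shared = [1, 2, 3]
container = {"a": shared, "b": shared}   # two references to one list
cycle = []
cycle.append(cycle)                      # a list that contains itself

container_copy = pickle.loads(pickle.dumps(container))
cycle_copy = pickle.loads(pickle.dumps(cycle))

print(container_copy["a"] is container_copy["b"])   # sharing is preserved
print(cycle_copy[0] is cycle_copy)                  # the cycle is preserved
```

A text format that writes values only would have duplicated the shared list and could not have represented the cycle at all.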
128
Relationship to others
Comparison with JSON (JavaScript Object Notation)
JSON is a text serialization format (it outputs Unicode text, although most of the time it is then encoded to UTF-8), while pickle is a binary serialization format;
JSON is human-readable, while pickle is not
− Lightweight data interchange format
JSON is interoperable and widely used outside of the Python ecosystem, while pickle is Python-specific;
129
Relationship to others
Comparison with JSON (JavaScript Object Notation)
JSON, by default, can only represent a subset of the Python built-in types, and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, by clever usage of Python's introspection facilities, termed reflection in Java; complex cases can be tackled by implementing specific object APIs);
Unlike pickle, de‐serializing untrusted JSON does not in itself create an arbitrary code execution vulnerability.
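The difference in type coverage shows up even with built-in and standard-library types. In this sketch the record and the date are arbitrary sample values chosen for the example.

```python
import datetime
import json
import pickle

record = {"id": 7, "coords": (1, 2)}

# JSON has no tuple type, so the tuple comes back as a list
via_json = json.loads(json.dumps(record))
# pickle preserves the tuple
via_pickle = pickle.loads(pickle.dumps(record))

# A date object is rejected by json unless a custom encoder is supplied
enrolled = datetime.date(2022, 2, 10)
try:
    json.dumps(enrolled)
    json_accepts_date = True
except TypeError:
    json_accepts_date = False

date_copy = pickle.loads(pickle.dumps(enrolled))
print(via_json["coords"], via_pickle["coords"], json_accepts_date, date_copy)
```

JSON trades this fidelity for readability and interoperability with non-Python programs.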
130
Relationship to other Python Modules
Pickle is Python-specific
The module pickletools contains tools for analyzing data streams generated by pickle. The pickletools source code has extensive comments about the opcodes used by pickle protocols.
protocol parameter
− Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
− Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
− Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
131
Relationship to other Python Modules
protocol parameter
− Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.
− Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.
− Protocol version 5 was added in Python 3.8. It adds support for out‐of‐band data and speedup for in‐band data. Refer to PEP 574 for information about improvements brought by protocol 5.
Use the default protocol value, pickle.DEFAULT_PROTOCOL (it depends on the Python version)
pickle.HIGHEST_PROTOCOL selects the highest protocol version available
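The effect of the protocol parameter can be explored with a short loop. The sample dictionary is made up; the loop simply tries every protocol the running interpreter supports and records the size of the result.

```python
import pickle

data = {"course": "CSE 6331", "grades": [3.7, 4.0]}

sizes = {}
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=proto)
    # No protocol argument is needed when loading: the reader detects it
    assert pickle.loads(blob) == data
    sizes[proto] = len(blob)

print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)
print(sizes)
```

The printed defaults and sizes depend on the Python version in use, so no particular numbers are claimed here.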
132
Relationship to others
The data format used by pickle is Python‐specific
This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing)
However it means that non‐Python programs may not be able to reconstruct pickled Python objects
XDR here stands for External Data Representation, a standard binary data serialization format (RFC 4506) used, for example, by ONC RPC
− It should not be confused with eXtended Detection and Response, a cybersecurity technology that shares the acronym
Will not discuss this further
133
JSON, XDR, YAML, XML, …
There are many markup languages and data formats. You have certainly heard of HTML (HyperText Markup Language)
XDR (External Data Representation) serves a specific purpose: a binary encoding for exchanging data between machines
XML (eXtensible Markup Language) is also human readable
− XML parsers are used, with XQuery for traversal/search
− not well-integrated into programming languages
− Quite verbose
YAML (originally Yet Another Markup Language, now YAML Ain't Markup Language) is more of a data format
− Is a superset of JSON
As they say, “the good thing about standards is that there are so many of them!”
134
Serialization and Persistence
Not the same
Serialization is used by PLs for writing an object hierarchy into a file, reading it back, and re-creating the same objects
Persistence is used by DBMSs for storing ALL your data (relations, data types used, etc.) in a way that is transparent to the user; data is staged back and forth transparently
− by bringing it into memory, processing it, and writing it back
No need for the entire file to be loaded into memory
− Important and critical difference between main memory algorithms and DBMS operations
− Think of MyMav loaded entirely into memory!
Staging disk-based data into memory (back and forth) is a complex process that uses a buffer manager, alternate file formats, page fetches, replacement policies, disk I/O optimization, etc.
Data Structures in Python
137
Python
I will cover the following with examples so you can practice further on your own
Overall philosophy of Python
Typing, strings, and Print statements
Pattern matching
Scoping rules
Python Functions and polymorphism
Mutable and immutable
Call by value, reference
Input, operators, Boolean
File read/write, serialization
Control structures
Objects, inheritance, Namespaces
138
Programming paradigms
Functional programming: Haskell, Scala, Erlang
Structured or modular programming: C, C++, PHP, C#
Imperative programming: Pascal, C, Java, Python
Non-procedural language: SQL
Declarative programming (relationship between input and output): Pure Lisp, Prolog
Logic programming (uses first-order logic): Prolog
Object-oriented design and object-oriented programming (OOP)
Source: https://help.sap.com/doc/saphelp_nw73ehp1/7.31.19/en-US/c3/225b5654f411d194a60000e8353423/content.htm?no_cache=true
140
Object Oriented Programming (OOP)
Program = data structures + algorithms
Object = data + code
Smalltalk: one of the early PLs to support dynamic typing and objects (1972, from Xerox PARC)
Loops was a very early multi-paradigm programming system from Xerox
Object-orientation came about from a software engineering perspective as an alternative to functional and other forms of programming
Idea: cooperating collection of objects
− Objects communicate by messages (synchronous and asynchronous)
Uses encapsulation and abstraction
Classes are organized into hierarchies (or taxonomies)
Cleaner than global functions and data structures (structs)
Methods can only operate on objects for which they are designed
141
OOP: general
Using polymorphism and inheritance, object‐oriented programming allows you to reuse individual components. In an object‐oriented system, the amount of work involved in revising and maintaining the system is reduced, since many problems can be detected and corrected in the design phase
Object‐oriented programming (OOP) is a computer programming model that organizes software design around data, or objects, rather than functions and logic. An object can be defined as a data field that has unique attributes and behavior.
142
OOP: general
The structure, or building blocks, of object-oriented programming include the following:
Classes: user-defined data types that act as the blueprint for individual objects, attributes and methods.
Objects: instances of a class created with specifically defined data. Objects can correspond to real-world objects or an abstract entity. When a class is defined initially, the description is the only thing that is defined.
Methods: functions that are defined inside a class that describe the behaviors of an object. Each method contained in class definitions starts with a reference to an instance object. Additionally, the subroutines contained in an object are called instance methods. Programmers use methods for reusability or keeping functionality encapsulated inside one object at a time.
Attributes: defined in the class template and represent the state of an object. Objects have data stored in their attribute fields. Class attributes belong to the class itself.
143
OOP Principles: Encapsulation
Encapsulation principle states that all important information is contained inside an object and only select information is exposed.
The implementation and state of each object are privately held inside a defined class.
Other objects do not have access to this class or the authority to make changes (without proper permissions)
They are only able to call a list of public functions or methods. This characteristic of data hiding provides greater program security and avoids unintended data corruption.
144
OOP Principles: Abstraction, inheritance
Abstraction: Objects only reveal internal mechanisms that are relevant for the use of other objects, hiding any unnecessary implementation code.
The derived class can have its functionality extended.
This concept can help developers more easily make additional changes or additions over time
Inheritance: Classes can reuse code from other classes. Relationships and subclasses between objects can be assigned, enabling developers to reuse common logic while still maintaining a unique hierarchy. This property of OOP forces a more thorough data analysis, reduces development time and ensures a higher level of accuracy.
145
OOP Principles: Polymorphism
Polymorphism: Objects are designed to share behaviors and they can take on more than one form. The program will determine which meaning or usage is necessary for each execution of that object from a parent class, reducing the need to duplicate code.
Polymorphism allows different types of objects to pass through the same interface.
A child class is then created, which extends the functionality of the parent class.
We saw the times(x, y) example earlier and how it can be invoked for different data types
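As a reminder of that idea, here is a minimal sketch. The body of times below is an assumption (the earlier slide is not reproduced here); it relies only on the fact that the * operator means different things for different types.

```python
def times(x, y):
    # The meaning of * depends on the run-time types of x and y
    return x * y

print(times(3, 4))        # integer multiplication
print(times("ab", 3))     # string repetition
print(times([1, 2], 2))   # list repetition
```

The same function serves all three calls because each type supplies its own implementation of the multiplication operator.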
146
OOD & OOP benefits: summing up
Modularity
Reusability
Productivity
Easily upgradable and scalable
Interface descriptions
Security
Flexibility
A widely used design and programming paradigm
For DBMS, EER (Enhanced Entity Relationship) diagrams are widely used for data modeling (the underlying ER model was introduced in 1976)
For OO design UML (Unified Modeling Language) is widely used for design, use cases, etc. (developed in 1994)
Both are graphical languages for visualizing design
147
OOP Terminology: General
Class – object description and instance generation factory (using the class constructor)
Object: instance of a class – an instance created using the factory method of a class
Attributes – defined in the class (can include other classes)
− Instance attributes – are part of every object created
− Class or static attributes – only one copy exists for all instances (used for aggregates, etc.)
Methods – functions specific to a class that can act upon attributes
− Static methods – cannot access instance methods and instance attributes
− Instance methods – can access static attributes and static methods
148
OOP Terminology: Inheritance
Inheritance (“is-a”) relationship # specialization vs. generalization
− A Hummer is a vehicle // a vehicle is not always a Hummer
− A RushOrder is an Order
− A rectangle is a polygon
Other relation types
Dependence (“uses-a”) # need
− Order object uses Account object
− Student object uses Course object to check for registered courses
Aggregation (“has-a”) and composition # grouping
− Car has an engine, 4 wheels, several seats
− Order has multiple items
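The is-a and has-a relationships map directly onto Python syntax. The class names below follow the slide's examples; the attributes (wheels, horsepower) and the describe method are invented for illustration.

```python
class Vehicle:
    def __init__(self, wheels):
        self.wheels = wheels

    def describe(self):
        return f"{type(self).__name__} with {self.wheels} wheels"

class Hummer(Vehicle):                      # is-a: every Hummer is a Vehicle
    def __init__(self):
        super().__init__(wheels=4)

class Engine:
    def __init__(self, horsepower):
        self.horsepower = horsepower

class Car(Vehicle):
    def __init__(self, horsepower):
        super().__init__(wheels=4)
        self.engine = Engine(horsepower)    # has-a: a Car is composed of an Engine

h = Hummer()
c = Car(150)
print(h.describe(), isinstance(h, Vehicle), c.engine.horsepower)
```

Inheritance gives Hummer the describe method for free, while composition gives Car an Engine object it can delegate to.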
© Sharma Chakravarthy © 149
Inheritance
• Class hierarchy (terminology)
– Direct superclass
• Inherited explicitly (one level up hierarchy)
– Indirect superclass
• Inherited two or more levels up hierarchy (multilevel)
– Single inheritance
• Inherits from only one superclass
– Multiple inheritance
• Inherits from multiple superclasses
– Java does not support multiple inheritance of classes (unlike C++)
– Python supports multiple inheritance (like C++)
• All classes inherit from the object class in Python (every class is itself an instance of type); the Object class is the root of the hierarchy in Java
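Multiple inheritance and the root of the hierarchy can both be inspected at run time. The class names in this sketch are hypothetical; __mro__ is the method resolution order Python uses to decide which superclass method wins.

```python
class Person:
    def role(self):
        return "person"

class Employee(Person):
    def role(self):
        return "employee"

class Student(Person):
    def role(self):
        return "student"

class TeachingAssistant(Employee, Student):   # multiple inheritance
    pass

ta = TeachingAssistant()
mro_names = [cls.__name__ for cls in TeachingAssistant.__mro__]

print(ta.role())                               # resolved left to right along the MRO
print(mro_names)
print(issubclass(TeachingAssistant, object))   # object is the root of every hierarchy
print(type(TeachingAssistant) is type)         # the class itself is an instance of type
```

Employee is listed first among the base classes, so its role method is found before Student's.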
151
OOP Terminology: General
Information hiding: languages support information hiding
− Visibility of attributes (both instance and static/class)
− Visibility of methods (both instance and static/class)
Principle of minimum exposure
For example, for both attributes and methods, Java supports
− Public
− Protected
− Private
Has implications for inheritance
− Single inheritance, and
− Interfaces in lieu of multiple inheritance
152
Python: Classes and Methods
There are plenty of data structures available immediately in Python to get going
One does not have to start with classes, as that requires planning and design
Strategic or enterprise code development
− Long-term and stable
Tactical code development (where time is in short supply)
− Quick and dirty!
− This is the attraction of Python for data analysis!
Why use classes?
− If you are building your own analysis system that will be used over a period of time, investment in it is justifiable
− Use appropriate features to develop a system that can be used and maintained over a period of time
Classes: Python vs. Java
public class Student {
    private String name;
    private int age;
    private String phNum;

    public Student(String n, int a, String p) {
        this.name = n;
        this.age = a;
        this.phNum = p;
    }

    public String getName() {
        return name;
    }
}
Java classes are defined with file name same as class name
Only one public class per file
Best practices are identified
class Student:
    def __init__(self, name, age, phnum):
        self.name = name
        self.age = age
        self.phNum = phnum
You can declare a class anywhere, in any file, at any time
You can save this file as stud.py
Double underscores (dunder) surround init: __init__
The self prefix indicates instance attributes (self is the equivalent of this in Java)
Attributes get defined inside __init__ when they are assigned!
Equivalent of Java constructor
See what happens if you leave out self in __init__
− The first argument is always interpreted as self
− Without the self. prefix, an assignment creates only a local variable
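Both effects can be tried in a few lines. The class names Student and NoSelf and their attributes are illustrative only.

```python
class Student:
    def __init__(self, name, age):
        self.name = name    # stored on the instance
        age = age           # no self. prefix: only rebinds the local name, then discarded

class NoSelf:
    def __init__(name, age):   # no parameter called self: 'name' receives the new instance
        name.age = age

s = Student("Sam", 24)
has_name = hasattr(s, "name")
has_age = hasattr(s, "age")

n = NoSelf(24)                 # 24 is bound to 'age', the instance to 'name'
print(has_name, has_age, n.age)
```

The name self is a convention, not a keyword; what matters is that the first parameter of a method always receives the instance.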
Classes: Python details
class Student:
    def __init__(self, name, age, phnum):
        self.name = name
        self.age = age
        self.phNum = phnum
Python allows attributes to be added and initialized outside of the constructor
s = Student('Sam', 24, '1234567890')
s.classification = 'sophomore'
print(s.classification)
BAD practice!
− ONLY for this instance
− Difficult to understand and introduces subtle bugs!
All attributes are public in Python (no public, private, protected as in Java)
However, attributes with a single leading _ (underscore) are considered non-public! You are not supposed to access them, but you can
Attributes with two leading underscores (__) are considered strictly non-public; hence, the name is changed (mangled) by Python to _ClassName__attribute to discourage access/usage
− But you can still access them once you know the naming convention
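The two underscore conventions behave as follows. The attribute names _gpa and __ssn and the sample values are made up for the example.

```python
class Student:
    def __init__(self, name, gpa, ssn):
        self.name = name     # public
        self._gpa = gpa      # single underscore: non-public by convention only
        self.__ssn = ssn     # two leading underscores: stored as _Student__ssn

s = Student("Sam", 3.8, "000-00-0000")

print(s._gpa)                               # works, but you are not supposed to do this
direct_access_fails = not hasattr(s, "__ssn")
mangled_value = s._Student__ssn             # the mangled name is still reachable
print(direct_access_fails, mangled_value)
```

Name mangling is a deterrent rather than access control: it prevents accidental clashes and casual access, not determined access.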
Classes: Python class attributes
class student:
    totalStudents = 0    # class attribute: one copy shared by the whole class

    def __init__(self, name, age, phnum):
        self.name = name
        self.age = age
        self.phNum = phnum
        student.totalStudents += 1

    def display(self):
        print(f'{self.name}, {self.age}, {self.phNum}')

    @staticmethod
    def totStudents():
        return student.totalStudents

s = student('Sam', 24, "1234567890")
y = student('john', 25, '345')
s.display(); y.display()
print(f"total students: {student.totalStudents}")
print(student.totStudents()); print(y.totalStudents)
print(s.totalStudents)
In Python, you can declare a variable outside of a method. It is treated as a class attribute (static in Java)
There is only one copy of this attribute for the whole class
Instances DO NOT have their own copy of this attribute!
You refer to a class variable using the class name, as it is not part of an instance
You can also refer to it using an instance (don't do it)
You can also refer to it using filename.classname.attribute, e.g. stud.student.totalStudents, if you used stud.py to store this class
Output:
Sam, 24, 1234567890
john, 25, 345
total students: 2
2
2
2
156
Python: Access Control
In Java, you access private attributes using setters and getters
Java's best practice of using private attributes accessed through public getters and setters is one of the reasons why Java code tends to be more verbose than Python.
In contrast, you can access attributes in Python directly, since everything is public
You can access anything, anytime, anywhere
You can even delete attributes in Python (not possible in Java)! The deletion applies only to that instance
Example: del y.age # do not use unless absolutely justified!
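A short sketch of what del does to one instance and not to another. The class and values mirror the earlier slides; only the two-attribute constructor is a simplification.

```python
class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age

s = Student("Sam", 24)
y = Student("john", 25)

del y.age                                     # removes the attribute from y only
print(hasattr(y, "age"), hasattr(s, "age"))
```

Any later code that reads y.age raises AttributeError, which is exactly the kind of subtle bug the warning above is about.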
Python Namespaces
Names are looked up in the order Local, Enclosing, Global, Built-in (LEGB)
Built-in (Python): names pre-assigned in the built-in names module: open, range, SyntaxError, …
Local (function): names assigned in any way within a function (def or lambda), and not declared global in that function
Enclosing function locals: names in the local scope of any and all enclosing functions (def or lambda), from inner to outer
Global (module): names assigned at the top level of a module file, or declared global in a def within the file (a module corresponds roughly to a package in Java)
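The four scopes can be seen at work in one small module. All function and variable names below are invented for the example, and the code is meant to be run as a top-level script or module.

```python
label = "global"                  # G: module-level name

def outer():
    label = "enclosing"           # E: local to outer, enclosing scope for the inner functions

    def inner():
        label = "local"           # L: local to inner
        return label

    def inner_no_local():
        return label              # not local, so it is found in the enclosing scope

    return inner(), inner_no_local()

def reads_global():
    return label                  # no local or enclosing binding, so the global is used

def uses_builtin():
    return len("abc")             # B: len is found in the built-in namespace

results = (outer(), reads_global(), uses_builtin())
print(results)
```

Each lookup stops at the first scope that has a binding for the name, working outward from the innermost function.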
158
Python: decorators
In Python, properties provide controlled access to non-public attributes using decorator syntax
You should be using this instead of accessing non‐public attributes directly!
Properties allow functions to be declared in Python classes that are analogous to Java getter and setter methods (including deletion of attributes)
@property
def phNum(self):
    return self._phNum

@phNum.deleter
def phNum(self):
    print("Warning: phNum attribute will be deleted")  # only for this instance
    del self._phNum
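Put together in a class, a property with getter, setter, and deleter might look like the sketch below. The digit check in the setter is an assumed validation rule added for illustration, and the phone numbers are placeholders.

```python
class Student:
    def __init__(self, name, phnum):
        self.name = name
        self._phNum = phnum             # non-public storage

    @property
    def phNum(self):                    # getter, used as s.phNum
        return self._phNum

    @phNum.setter
    def phNum(self, value):             # setter, used as s.phNum = ...
        if not value.isdigit():
            raise ValueError("phone number must contain only digits")
        self._phNum = value

    @phNum.deleter
    def phNum(self):                    # deleter, used as del s.phNum
        del self._phNum

s = Student("Sam", "1234567890")
s.phNum = "8175550100"
print(s.phNum)
```

Callers keep the simple attribute syntax, while the class keeps the ability to validate or log every access, which is the role getters and setters play in Java.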
159
Python: Others
Reflection is supported in Python (like Java)
Refers to examining an object or class from within the object or class. A very useful feature for enterprise systems
Can check types at run time and take decisions
− Avoids editing, re-compiling and re-loading
Uses type() to display the type of a variable
Uses isinstance() to determine if a given variable is an instance of a specific class or of one of its subclasses
Using dir() in Python, you can view every attribute and function contained in any object, including the dunder methods
You can call methods through reflection
You can make a list of methods/functions
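These reflection facilities can be combined as in the following sketch. The Student class and its greet method are hypothetical; type, isinstance, dir, getattr, and callable are all built-ins.

```python
class Student:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}"

s = Student("Sam")

type_name = type(s).__name__
is_student = isinstance(s, Student)
is_object = isinstance(s, object)

# Names of callable, non-dunder attributes found by dir()
methods = [n for n in dir(s) if callable(getattr(s, n)) and not n.startswith("__")]

# Call a method chosen by name at run time
greeting = getattr(s, "greet")()
print(type_name, is_student, is_object, methods, greeting)
```

Because the method name is just a string, it could come from a configuration file or user input, which is what makes reflection useful for systems that must adapt without being edited and reloaded.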
160
Python: Others
Multi‐threaded programming
Today, most non-trivial systems use threads, and it is important to understand their usage
− Cores can be used beneficially
Debugging is difficult in the presence of race conditions
Synchronization
Critical sections
Semaphores
Conditional waits
Locking
Remote Procedure Calls (RPC)
Distributed systems development
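A critical section protected by a lock can be sketched with the standard threading module. The shared counter, the number of threads, and the iteration count are arbitrary choices for the example.

```python
import threading

state = {"count": 0}          # shared data updated by all threads
lock = threading.Lock()

def add_many(n):
    for _ in range(n):
        with lock:            # critical section: one thread at a time
            state["count"] += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                  # wait for every thread to finish

print(state["count"])
```

Without the lock, the read-modify-write on the counter is not guaranteed to be atomic, and updates can be lost; that kind of race condition is what makes multi-threaded code hard to debug.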
161
Python: Summary
+ It is very easy to get going in Python (compared to other equivalent languages)
+ Code size is likely to be small
+ Tons of libraries make it easier
+ Good for “tactical” usage (for quick results)
+ Not much design is needed for analysis
- “Strategic” or enterprise systems require design and long-term usage and maintenance
- Doing them in Python is the same as doing it in other languages
- Some of Python's advantages may become less useful/relevant
- Python is very forgiving. Therefore, the burden is on you to not make mistakes and shoot yourself in the foot!
Questions/comments
CSE 6331
For more information visit: http://itlab.uta.edu
Spring 2019