Python for Data Science (DataCamp)
ch1: Python basics
print()
type(): int, float, str, bool
# In Python, double quotes " " and single quotes ' ' have identical functionality, unlike PHP or Bash
In [16]: 2 + 3
Out[16]: 5
In [17]: 'ab' + 'cd'
Out[17]: 'abcd'
help(function): open up documentation
ch2: Python list
[a, b, c]
Can contain any type, and can mix different types
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
fam2 = [["liz", 1.73], ["emma", 1.68], ["mom", 1.71], ["dad", 1.89]]
In [13]: type(fam)
Out[13]: list
In [14]: type(fam2)
Out[14]: list
Subsetting lists
fam[3]: 1.68
fam[-1]: 1.89
fam[3:5]: [1.68, 'mom']
[ start : end ]: start is inclusive, end is exclusive
fam[:4]: ['liz', 1.73, 'emma', 1.68]
Adding elements: fam + ["me", 1.79]
Delete elements: del(fam[2])
# Note: a list is not a primitive value, so after y = x, y refers to the same list as x. Any in-place change made through y also changes x. To get an independent copy, use list(x) or x[:].
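The note on list references can be checked with a short sketch. The list below is a shortened, hypothetical version of fam, and the names x, y and z are chosen only for illustration.

```python
# Hypothetical list, shortened from the fam example above.
x = ["liz", 1.73, "emma", 1.68]

y = x          # y refers to the same list object as x
y[1] = 1.80    # changing the list through y also changes x

z = x[:]       # slicing creates a new list (list(x) works too)
z[3] = 1.70    # changing the copy leaves x untouched

print(x)  # ['liz', 1.8, 'emma', 1.68]
print(z)  # ['liz', 1.8, 'emma', 1.7]
```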
ch3: Functions & Packages
max(list)  # maximum of a list (the elements must be comparable, e.g. all numbers)
len()  # length of a list or string
round()  # round(number[, ndigits]): round a number to a given precision in decimal digits (default 0)
round(1.68, 1)  # 1.7
list.count() method counts how many times an element occurs in a list and returns that count.
fam.append("me")  # append adds an element to the end of the list
Methods: everything is an object. Objects have methods associated with them, depending on their type.
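A small sketch of how the available methods depend on the type of the object. The variable names (sister, mom_index and so on) are made up for illustration; the methods themselves are standard list and str methods.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
sister = "liz"

# List methods
mom_index = fam.index("mom")      # position of the first "mom"
height_count = fam.count(1.73)    # how many times 1.73 occurs

# String methods (strings are immutable, so these return new strings)
sister_cap = sister.capitalize()
sister_rep = sister.replace("z", "sa")

print(mom_index, height_count, sister_cap, sister_rep)  # 4 1 Liz lisa
```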
ch4: NumPy
List recap: powerful, collection of different types, change/add/remove
But lists lack mathematical operations over whole collections, and speed
Solution: NumPy
Numeric Python
Alternative to Python list: NumPy array
Calculations over entire arrays
Easy and fast
Installation in the terminal: pip3 install numpy
NumPy functions: np.mean, np.median, np.corrcoef, np.std, np.sort, np.sum
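A minimal sketch of why arrays help, assuming NumPy is installed. The height and weight values are illustrative, not course data. The same arithmetic that fails on plain lists works element-wise on arrays.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89]   # metres, illustrative values
weight = [65.4, 59.2, 63.6, 88.4]   # kilograms, illustrative values

# Plain lists do not support arithmetic over the whole collection.
try:
    weight / height ** 2
except TypeError as err:
    print("lists:", err)

# NumPy arrays apply the calculation to every element.
np_height = np.array(height)
np_weight = np.array(weight)
bmi = np_weight / np_height ** 2
print(bmi)

# A few of the summary functions listed above.
print(np.mean(np_height), np.median(np_height), np.std(np_height))
corr = np.corrcoef(np_height, np_weight)   # 2x2 correlation matrix
print(corr)
```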
intermediate_ch1: Matplotlib
Functions: Visualization; Data structures; Control structures; Case study
import matplotlib.pyplot as plt
plt.scatter(x, y)
help(plt.hist)
plt.xlabel('Year')
plt.title('…')
plt.yticks([0, 2, 4, 6, 8], ['Germany', 'Dutch', 'China', 'US', 'UK'])
intermediate_ch2: Dictionaries & Pandas
dict_name[key]
result: value
Dictionaries can contain key: value pairs where the values are again dictionaries.
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }
# Print out the capital of France
print(europe['france']['capital'])
# Create sub-dictionary data
data = {'capital':'rome', 'population':59.83}
# Add data to europe under key 'italy'
europe['italy'] = data
print(europe)
Pandas
Pandas is an open source library, providing high-performance, easy-to-use data structures and data analysis tools for Python.
The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.
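Building a DataFrame from a dictionary might look like the sketch below, assuming pandas is installed. The capitals and populations reuse the europe values from these notes; the "country" column and the two-letter row labels are assumptions added for illustration.

```python
import pandas as pd

# Each key becomes a column label, each list becomes that column's data.
data = {
    "country": ["Spain", "France", "Germany", "Norway"],
    "capital": ["Madrid", "Paris", "Berlin", "Oslo"],
    "population": [46.77, 66.03, 80.62, 5.084],
}
df = pd.DataFrame(data)

# Replace the default 0..3 row labels with custom ones (hypothetical labels).
df.index = ["ES", "FR", "DE", "NO"]
print(df)
```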
Index and select data:
Square brackets
Advanced methods: loc, iloc
# Note: The single bracket version, df['col'], gives a Pandas Series; the double bracket version, df[['col']], gives a Pandas DataFrame.
loc and iloc allow you to select both rows and columns from a DataFrame: loc selects by label, iloc selects by integer position.
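The bracket, loc and iloc variants can be compared side by side. This is a sketch on a small hypothetical DataFrame; the row labels and variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"capital": ["Madrid", "Paris", "Berlin"],
     "population": [46.77, 66.03, 80.62]},
    index=["ES", "FR", "DE"],   # hypothetical row labels
)

col_series = df["capital"]      # single brackets: Series
col_frame = df[["capital"]]     # double brackets: one-column DataFrame

# loc: select by row and column labels
cell_by_label = df.loc["FR", "capital"]
sub_by_label = df.loc[["ES", "DE"], ["capital"]]

# iloc: select by integer position
row_by_pos = df.iloc[1]
sub_by_pos = df.iloc[[0, 2], [0]]

print(type(col_series).__name__, type(col_frame).__name__)  # Series DataFrame
print(cell_by_label)  # Paris
```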
# Note: differences between a Pandas Series and a DataFrame
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
So the Series is the data structure for a single column of a DataFrame, not only conceptually, but literally, i.e. the data in a DataFrame is actually stored in memory as a collection of Series.
https://stackoverflow.com/questions/26047209/what-is-the-difference-between-a-pandas-series-and-a-single-column-dataframe
https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm
https://www.tutorialspoint.com/python_pandas/python_pandas_series.htm
intermediate_ch3: Comparison Operators
Comparison Operators: how Python values relate
<, >, <=, >=, ==, !=
Boolean Operators: and, or, not
# Note: when dealing with NumPy arrays, use np.logical_and(a, b), np.logical_or(a, b) and np.logical_not(a) for element-wise boolean operations
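A short sketch of the NumPy note, assuming NumPy is installed. The bmi values are illustrative. Comparisons on arrays already work element-wise; it is the plain and/or/not keywords that need the np.logical_* replacements.

```python
import numpy as np

bmi = np.array([21.9, 21.0, 21.8, 24.7, 21.4])   # illustrative values

above_21 = bmi > 21   # comparison operators work element-wise on arrays

# `bmi > 21 and bmi < 22` would raise a ValueError, because the truth
# value of a whole array is ambiguous. Use np.logical_and instead.
in_range = np.logical_and(bmi > 21, bmi < 22)

print(in_range)        # [ True False  True False  True]
print(bmi[in_range])   # [21.9 21.8 21.4]
```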
Conditional Statements:
if condition :
    expression 1
elif condition :
    expression 2
else :
    expression 3
Filtering Pandas DataFrame:
Example-1 Compare: select countries with an area over 8 million km2
Example-2 Boolean operators: also numpy.logical_and/or/not()
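Both filtering examples can be sketched on a small DataFrame. The brics table below, its row labels and its approximate area values (in million km2) are assumptions for illustration, not data taken from these notes.

```python
import numpy as np
import pandas as pd

brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
     "area": [8.516, 17.10, 3.286, 9.597, 1.221]},   # million km2, approximate
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Example-1: a comparison on a column gives a boolean Series,
# which can then be used to select rows.
is_huge = brics["area"] > 8
print(brics[is_huge])

# Example-2: combine two conditions element-wise.
between = np.logical_and(brics["area"] > 8, brics["area"] < 10)
print(brics[between])
```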
intermediate_ch4: While loop
while: repeat the action as long as the condition is true
while condition :
    expression
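A minimal while loop sketch. The starting value and the steps counter are illustrative; the loop stops as soon as the condition error > 1 is no longer true.

```python
error = 50.0
steps = 0

# Keep dividing while the error is still above 1.
while error > 1:
    error = error / 4
    steps += 1
    print(error)

# Prints 12.5, 3.125, 0.78125 and then stops.
```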
for loop: for each var in seq, execute expression
for var in seq :
    expression
enumerate(obj): iterate over (index, value) pairs of an iterable
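A sketch of a for loop with enumerate, using an illustrative list of heights. The lines list is only there to collect the output.

```python
fam = [1.73, 1.68, 1.71, 1.89]   # illustrative heights

lines = []
for index, height in enumerate(fam):
    # enumerate yields the position and the value on every iteration
    lines.append("index " + str(index) + ": " + str(height))

print("\n".join(lines))
```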
Loop over string
Loop over Dictionary: dict.items()
Loop over NumPy arrays: np.nditer(obj)
Loop over DataFrame: my_pandas_dataframe.iterrows()
Pandas method: apply(function): apply a function to each element of a Series (e.g. one DataFrame column) without writing an explicit loop
Recap:
Dictionary: for key, val in dict.items() :
NumPy array: for var in np.nditer(my_array) :
DataFrame: for lab, row in my_pandas_dataframe.iterrows() :
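The loop patterns in the recap, plus apply, can be sketched together, assuming NumPy and pandas are installed. All data values and the name_length column are made up for illustration.

```python
import numpy as np
import pandas as pd

# String: one character per iteration
letters = [ch.upper() for ch in "fam"]

# Dictionary: key and value per iteration
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}
for key, val in world.items():
    print(key, val)

# 2D NumPy array: np.nditer visits every element, row by row
arr = np.array([[1.73, 1.68], [65.4, 59.2]])
flat = [float(v) for v in np.nditer(arr)]

# DataFrame: row label and row (as a Series) per iteration
df = pd.DataFrame(
    {"country": ["Brazil", "Russia"], "capital": ["Brasilia", "Moscow"]},
    index=["BR", "RU"],
)
for lab, row in df.iterrows():
    print(lab, row["capital"])

# apply: same idea without an explicit loop
df["name_length"] = df["country"].apply(len)
print(df)
```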
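intermediate_ch5: Random Numbers
import numpy as np
np.random.seed(num)  # set the seed so the random numbers are reproducible
np.random.rand()  # random float in [0, 1)
np.random.randint(start, end)  # random integer from start (inclusive) to end (exclusive)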
Throw coins 10 times, count number of times tails appeared, store this number in final_tails list. Repeat 100 times.
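One way the coin experiment could be written, assuming NumPy is installed. The seed value and the convention that 1 means tails are assumptions.

```python
import numpy as np

np.random.seed(123)   # arbitrary seed, only for reproducibility

final_tails = []
for run in range(100):            # repeat the experiment 100 times
    tails = 0
    for toss in range(10):        # throw the coin 10 times
        coin = np.random.randint(0, 2)   # 0 or 1; 1 is taken to mean tails
        tails += coin
    final_tails.append(tails)

print(final_tails)
```

Each entry of final_tails is the number of tails in one run of 10 throws, so a histogram of the list approximates the distribution of that count.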
== & is
== is for value equality. Use it when you would like to know if two objects have the same value.
is is for reference equality. Use it when you would like to know if two references refer to the same object.
In general, when you are comparing something to a simple type, you are usually checking for value equality, so you should use ==. For example, the intention of your example is probably to check whether x has a value equal to 2 (==), not whether x is literally referring to the same object as 2.
>>> a = 500
>>> b = 500
>>> a == b
True
>>> a is b
False
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
>>> b == a
True
>>> b = a[:]
>>> b is a
False
>>> b == a
True
https://stackoverflow.com/questions/132988/is-there-a-difference-between-and-is-in-python