Lecture 2 - Collecting, Analyzing, and Visualizing Data with Python
TRANSCRIPT
![Page 1: Lecture 2 - Collecting, Analyzing, and Visualizing Data with Python … 2 - Collecting... · • Using web scraping frameworks like Scrapy • Writing your own code. Using Application](https://reader036.vdocument.in/reader036/viewer/2022063006/5fb535d1fc3b5355396fec34/html5/thumbnails/1.jpg)
COLLECTING, ANALYZING, AND VISUALIZING DATA WITH PYTHON PART I DR. MICHAEL FIRE
Collecting Data
There are several ways to collect data:
• Using existing datasets
• Creating/simulating your own dataset
• Using web scraping
• Using an API
Web Scraping
We can collect data through web scraping using one of the following methods:
• Using simple tools like wget
• Using Selenium for dynamically loaded pages
• Using web scraping frameworks like Scrapy
• Writing your own code
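As a rough illustration of the "writing your own code" option, the sketch below pulls the links out of a page using only the Python standard library. The `LinkCollector` class, the `extract_links` helper, and the sample HTML are made up for this example; they are not taken from the lecture.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Made-up page content standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <a href="/datasets/iris.csv">Iris</a>
  <a href="https://example.com/about">About</a>
  <a name="top">No link here</a>
</body></html>
"""

links = extract_links(SAMPLE_HTML, "https://example.com/index.html")
print(links)
```

In real use the HTML text would come from a downloaded page, for example via `urllib.request.urlopen(url).read().decode()`, rather than from a string literal.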
Using Application Programming Interfaces
We can use various websites’ Application Programming Interfaces (APIs) to collect data from platforms such as:
• Twitter
• Reddit
• Google Maps
• Kaggle
• GitHub
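Most web APIs follow the same pattern: build a request URL with query parameters, then parse the JSON response. The sketch below shows that pattern with the standard library only. The endpoint `api.example.com`, the parameter names, and the response fields are all hypothetical and do not belong to any of the platforms listed above; each real API documents its own.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1/search"


def build_request_url(query, page=1, per_page=50):
    """Build a GET URL with URL-encoded query parameters."""
    params = {"q": query, "page": page, "per_page": per_page}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_items(response_text):
    """Turn a JSON response body into a list of (name, score) tuples."""
    payload = json.loads(response_text)
    return [(item["name"], item["score"]) for item in payload.get("items", [])]


# Made-up response body standing in for what a server might return.
SAMPLE_RESPONSE = (
    '{"items": [{"name": "alpha", "score": 12}, {"name": "beta", "score": 7}]}'
)

url = build_request_url("data science", page=2)
rows = parse_items(SAMPLE_RESPONSE)
print(url)
print(rows)
```

Real APIs usually also require an access token and enforce rate limits, so a collection script would typically add authentication headers and pauses between requests.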
Recommended Reading
• Python Data Science Handbook, Chapter 1, "IPython: Beyond Normal Python", by Jake VanderPlas
• The Unix Shell, by Software Carpentry Foundation
• Practical Introduction to Web Scraping in Python, by Colin OKeefe
MANIPULATING DATA
NUMERICAL PYTHON (NUMPY)
Source: Python Data Science Handbook, Chapter 1 IPython: Beyond Normal Python by Jake VanderPlas
NumPy - The Basics
• Supports large multi-dimensional arrays and matrices
• Contains a large collection of high-level mathematical functions to operate on these arrays
• Provides tools for reading / writing array data to disk
Useful Reading:
• Chapter 4, "NumPy Basics: Arrays and Vectorized Computation", Python for Data Analysis, by Wes McKinney
• Chapter 2, "Introduction to NumPy", Python Data Science Handbook, by Jake VanderPlas
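The three NumPy capabilities listed on this slide (multi-dimensional arrays, array-wide mathematical functions, and reading/writing arrays to disk) can be sketched in a few lines. The array contents and the file name `array.npy` are arbitrary choices for this example.

```python
import os
import tempfile

import numpy as np

# A 2-D array (matrix) with 3 rows and 4 columns holding 0.0 .. 11.0.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

# High-level functions operate on whole arrays without Python loops.
col_means = a.mean(axis=0)   # mean of each column
scaled = np.sqrt(a) * 2.0    # element-wise square root, then scaling

# Write the array to disk in NumPy's binary .npy format and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "array.npy")
    np.save(path, a)
    restored = np.load(path)

print(a.shape, col_means)
print(np.array_equal(a, restored))
```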
WORKING WITH PANDAS & DATAFRAMES
Pandas
Pros:
• Provides flexible and expressive data structures
• Easy to handle missing data
• Columns can easily be added and deleted
Cons:
• Works well only up to several gigabytes of data
• Mostly single-threaded
• Group By operations can be complex
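A small sketch of the points above: detecting and filling missing data, adding and deleting a column, and a simple group-by. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up measurements; one temperature is missing (NaN).
df = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "temp_c": [20.0, np.nan, 10.0, 15.0],
})

# Missing data: count the gaps, then fill them with the column mean.
n_missing = int(df["temp_c"].isna().sum())
filled = df.fillna({"temp_c": df["temp_c"].mean()})

# Columns can be added by assignment and deleted with drop().
filled["temp_f"] = filled["temp_c"] * 9 / 5 + 32
filled = filled.drop(columns=["temp_c"])

# A basic group-by: mean temperature per group.
mean_by_group = filled.groupby("group")["temp_f"].mean()

print(n_missing)
print(mean_by_group)
```

Filling with the column mean is only one possible strategy; depending on the data, dropping incomplete rows with `dropna()` may be more appropriate.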
“My rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset”
Wes McKinney, 2017
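One way to apply this rule of thumb is to measure how much memory a DataFrame occupies and multiply by 5 and 10. The helper `recommended_ram_bytes` below is a hypothetical name introduced for this sketch, and applying the rule to the in-memory size (rather than the size on disk) is an assumption.

```python
import numpy as np
import pandas as pd

# A small example frame: 1000 rows of int64 and float64 (8 bytes per value).
df = pd.DataFrame({
    "id": np.arange(1000, dtype=np.int64),
    "value": np.zeros(1000, dtype=np.float64),
})

# Total bytes used by the column data (index excluded).
data_bytes = int(df.memory_usage(index=False, deep=True).sum())


def recommended_ram_bytes(dataset_bytes, low=5, high=10):
    """Return the (low, high) RAM range suggested by the 5x to 10x rule."""
    return dataset_bytes * low, dataset_bytes * high


low, high = recommended_ram_bytes(data_bytes)
print(data_bytes, low, high)
```

Passing `deep=True` makes pandas also count the memory held by Python objects such as strings, which matters for text-heavy columns.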
Pandas Objects
[Diagram relating NumPy, Series, and DataFrame; labels: Values, Column]
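The relationship between these objects can be sketched as follows: a Series wraps a NumPy array of values together with an index, and each column of a DataFrame is itself a Series. The names `values`, `s`, and `df` and the sample numbers are arbitrary.

```python
import numpy as np
import pandas as pd

# A plain NumPy array holds the raw values.
values = np.array([1.5, 2.5, 3.5])

# A Series adds an index (labels) and a name on top of the values.
s = pd.Series(values, index=["a", "b", "c"], name="x")

# A DataFrame is a collection of Series that share the same index.
df = pd.DataFrame({"x": s, "y": s * 2})

col = df["x"]        # selecting one column returns a Series
raw = s.to_numpy()   # the underlying values as a NumPy array

print(type(raw).__name__, type(col).__name__, df.shape)
```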
Let’s move to the Notebook