Scraping in 60 minutes

Paul Bradshaw, Leanpub.com/scrapingforjournalists

Post on 21-Feb-2017

Page 2: Scraping in 60 minutes

How do you scrape?

Aron Pilhofer, News Rewired

Page 3: Scraping in 60 minutes

Scraping tools

WYSIWYG tools (Import.io, OutWit Hub)
Google Sheets =IMPORT
ScraperWiki, Morph.io

Page 4: Scraping in 60 minutes

OutWit Hub

Page 5: Scraping in 60 minutes

Import.io

Page 6: Scraping in 60 minutes

Import.io

Page 8: Scraping in 60 minutes

Chrome extensions:

Page 9: Scraping in 60 minutes

Edit column > Add column by fetching URLs…

Page 10: Scraping in 60 minutes

https://ifttt.com/channels

Page 16: Scraping in 60 minutes

Call it what you want

Put it where you want

Page 22: Scraping in 60 minutes

Function (Arguments) (aka parameters)

Page 23: Scraping in 60 minutes

Function (arguments)

=SUM(A2:A50)

=AVERAGE(B2:B300)

=COUNTIF(A10:A3000,"Smith")

Page 24: Scraping in 60 minutes

Function (parameters)

=SUM(range of cells to be summed)

=AVERAGE(range of cells to be averaged)

=COUNTIF(range of cells to be counted,what to count)
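
The same "function (arguments)" pattern carries over to programming languages. As a rough sketch in Python, not anything taken from the slides: `sum` and `mean` are built in or in the standard library, while `count_if` is a hypothetical helper written here for illustration, and the sample data is made up. It only does exact matching, which is a simplification of what COUNTIF can do.

```python
from statistics import mean


def count_if(values, target):
    """Count how many items in values are equal to target."""
    return sum(1 for value in values if value == target)


# Made-up sample data standing in for two spreadsheet columns.
ages = [34, 27, 45, 27]
surnames = ["Smith", "Jones", "Smith", "Patel"]

total = sum(ages)                      # like =SUM(range)
average = mean(ages)                   # like =AVERAGE(range)
smiths = count_if(surnames, "Smith")   # like =COUNTIF(range, "Smith")

print(total, average, smiths)
```

In each case the name before the brackets is the function, and whatever goes inside the brackets is its arguments.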

Page 25: Scraping in 60 minutes

("string", index)
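
The transcript does not say which function this slide refers to, so the following is only an illustration of the pattern: one argument is a string (text in quotes) and the other is an index (a number saying which item you want). The sketch below assumes a hypothetical helper, `grab_table`, that takes a string of HTML and the position of the table to return, counting from 1. It uses only Python's standard library and does not handle nested tables.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the text of every cell, grouped by table and by row."""

    def __init__(self):
        super().__init__()
        self.tables = []       # each table is a list of rows
        self._in_cell = False  # True while we are inside <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.tables[-1][-1][-1] += data.strip()


def grab_table(html, index):
    """Return the rows of the index-th table in html (first table is 1)."""
    parser = TableCollector()
    parser.feed(html)
    return parser.tables[index - 1]


# A made-up page containing two tables.
page = (
    "<table><tr><td>a</td></tr></table>"
    "<table><tr><th>Name</th><th>Votes</th></tr>"
    "<tr><td>Smith</td><td>120</td></tr></table>"
)

rows = grab_table(page, 2)   # a string argument, then an index
print(rows)
```

Changing the index from 2 to 1 returns the first table instead, which is the whole point of having an index argument.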

Page 26: Scraping in 60 minutes

Tip: search for documentation

Page 27: Scraping in 60 minutes

Variable

Page 28: Scraping in 60 minutes

Variables
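
A variable is a name that stores a value so it can be reused or changed later. A minimal sketch in Python, assuming a made-up address (`example.com`) and hypothetical variable names chosen only for illustration:

```python
# Store two values under names we choose.
base_url = "https://example.com/results?page="   # hypothetical address
page_number = 3

# Combine the variables to build the full address.
full_url = base_url + str(page_number)
print(full_url)

# Change one variable and the same recipe gives the next page.
page_number = page_number + 1
next_url = base_url + str(page_number)
print(next_url)
```

This is why variables matter for scraping: the part of an address that changes from page to page can be held in a variable while the rest stays fixed.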

Page 29: Scraping in 60 minutes

Jargon checklist:

Function
Arguments
Parameters
String
Index
Variable
Documentation

Page 30: Scraping in 60 minutes

IMPORTXML
IMPORTDATA
IMPORTFEED
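
To give a feel for what a function like IMPORTXML does behind the scenes, here is a rough Python analogue, offered as a sketch and not as how Google Sheets is implemented. The real function takes a URL and an XPath query; this hypothetical `import_xml` helper takes XML text that has already been downloaded, and relies on the limited XPath subset that Python's `xml.etree.ElementTree` supports. The feed content is made up.

```python
import xml.etree.ElementTree as ET


def import_xml(xml_text, query):
    """Return the text of every element in xml_text matching query."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(query)]


# A made-up feed standing in for a page fetched from the web.
feed = """<rss><channel>
<item><title>Council budget cut</title></item>
<item><title>School ratings published</title></item>
</channel></rss>"""

# The query says: find every <title> inside an <item>, at any depth.
titles = import_xml(feed, ".//item/title")
print(titles)
```

The two arguments mirror the jargon above: both are strings, and the function returns whatever matches the query.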

Page 31: Scraping in 60 minutes

Thank you.