jdp15 import.io workshop
![Page 1: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/1.jpg)
jpd15, June 2015
Ignacio Elola @ignacio_elola
Web data? Extracting data from the web
![Page 2: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/2.jpg)
who am I?
web data and import.io
example: text analysis with import.io and MonkeyLearn
summary
![Page 3: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/3.jpg)
import.io?
the Web as a data source
![Page 4: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/4.jpg)
What is import.io?
● Machine reading the web
● Point-and-click UI
● Map the data on a web page
● Algorithms will turn it into structured data
● Real-time through an API
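To make "map the data on a web page and turn it into structured data" concrete, here is a minimal Python sketch of the underlying idea, not of import.io itself: it reads the rows of an HTML table and maps each cell onto column names chosen by the user, which is roughly what labelling columns in a point-and-click UI achieves. The page fragment, the column names and the `extract` helper are hypothetical.

```python
from html.parser import HTMLParser


class TableToRecords(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that only contain <th> cells
                self.rows.append(self._row)
            self._row = None


def extract(html, columns):
    """Hypothetical helper: map each table row onto user-chosen column names."""
    parser = TableToRecords()
    parser.feed(html)
    parser.close()
    return [dict(zip(columns, row)) for row in parser.rows]


# Hypothetical page fragment standing in for a real web page.
PAGE = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Book A</td><td>12.50</td></tr>
  <tr><td>Book B</td><td>8.00</td></tr>
</table>
"""

records = extract(PAGE, ["title", "price"])
print(records)
# [{'title': 'Book A', 'price': '12.50'}, {'title': 'Book B', 'price': '8.00'}]
```

Keeping the values as strings mirrors what a page actually contains; type conversion (prices to numbers, for example) is a separate cleaning step.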
![Page 5: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/5.jpg)
What is import.io? (continued)
● Custom Crawlers
● Auto extraction
● Authenticated APIs
● Cloud scaling
● Wide range of integration options
![Page 6: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/6.jpg)
Structure the web
![Page 7: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/7.jpg)
import.io consists of 4 tools
● Magic
● Extractor
● Crawler
● Connector
![Page 8: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/8.jpg)
and completely free...
![Page 9: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/9.jpg)
import.io Magic
![Page 10: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/10.jpg)
Sometimes we need to train the tool ourselves
![Page 11: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/11.jpg)
import.io Extractor
![Page 12: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/12.jpg)
import.io Extractor lets you structure a single page of data
![Page 13: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/13.jpg)
import.io Extractor lets you structure a single page of data
● Custom XPaths
● Custom Regex
● Updatable in real-time
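Custom XPaths and custom regular expressions are the manual escape hatch when automatic mapping is not precise enough. The sketch below illustrates both techniques with Python's standard library only, assuming a hypothetical, well-formed page (ElementTree needs valid XML and supports only a subset of XPath). It does not show import.io's own XPath or regex settings; `extract_articles`, `DATE_RE` and the page content are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (ElementTree needs valid XML).
PAGE = """
<html><body>
  <div class="article">
    <h2>Headline one</h2>
    <span class="meta">Published 12/06/2015 by Ana</span>
  </div>
  <div class="article">
    <h2>Headline two</h2>
    <span class="meta">Published 13/06/2015 by Luis</span>
  </div>
</body></html>
"""

# Custom regex: capture a dd/mm/yyyy date inside a longer string.
DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def extract_articles(xml_text):
    root = ET.fromstring(xml_text.strip())
    rows = []
    # Custom XPath (ElementTree implements a limited subset of XPath 1.0).
    for article in root.findall(".//div[@class='article']"):
        title = article.findtext("h2", default="").strip()
        meta = article.findtext("span[@class='meta']", default="")
        match = DATE_RE.search(meta)
        # Reorder the captured day/month/year groups into ISO 8601 (yyyy-mm-dd).
        date = f"{match.group(3)}-{match.group(2)}-{match.group(1)}" if match else None
        rows.append({"title": title, "date": date})
    return rows


print(extract_articles(PAGE))
# [{'title': 'Headline one', 'date': '2015-06-12'}, {'title': 'Headline two', 'date': '2015-06-13'}]
```

The XPath picks *which* element to read; the regex then picks *which part* of that element's text to keep, which is why the two are usually combined.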
![Page 14: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/14.jpg)
Sometimes we need to extract data from a lot of URLs
![Page 15: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/15.jpg)
Sometimes we need to extract data from a lot of URLs
import.io Crawler
![Page 16: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/16.jpg)
Sometimes we need to extract data from a lot of URLs
● import.io Crawler
● import.io Extractor (bulk queries)
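When the URLs are already known (for example, a list of article pages), the "bulk queries" approach amounts to running one extractor over every URL. A minimal sketch, assuming a trivial stand-in extractor that only reads the `<title>` tag; `bulk_extract`, `extract_title` and the example URLs are hypothetical. The `fetcher` argument lets the example run without network access.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def fetch(url, timeout=10):
    """Download a page over the network (the default fetcher)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_title(html):
    """Hypothetical stand-in for a trained extractor: grab the <title> text."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def bulk_extract(urls, fetcher=fetch, workers=4):
    """Run the same extractor over many known URLs, a few in parallel."""
    def run(url):
        try:
            return {"url": url, "title": extract_title(fetcher(url)), "error": None}
        except Exception as exc:  # record the failure and keep going
            return {"url": url, "title": None, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input URLs
        return list(pool.map(run, urls))


# In-memory pages instead of real downloads (hypothetical data).
PAGES = {
    "https://example.com/a": "<html><title>Page A</title></html>",
    "https://example.com/b": "<html><title> Page B </title></html>",
}
results = bulk_extract(list(PAGES) + ["https://example.com/missing"],
                       fetcher=PAGES.__getitem__)
for row in results:
    print(row)
```

Catching errors per URL matters in bulk jobs: one broken page should produce one failed row, not abort the whole batch.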
![Page 17: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/17.jpg)
Sometimes we need to extract data from a lot of URLs we don’t know in advance
import.io Crawler
![Page 18: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/18.jpg)
The import.io Crawler needs minimal input and gives you maximum output
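When the URLs are not known in advance, a crawler discovers them by following links from a starting page. The following is a generic breadth-first crawler sketch, assuming that only pages on the start URL's host matter and that a page limit is enough to stop; it is not how the import.io Crawler is implemented. `crawl`, `LinkParser` and the in-memory `SITE` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetcher, max_pages=100):
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetcher(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical three-page site held in memory instead of fetched.
SITE = {
    "https://example.com/": '<a href="/news">News</a> <a href="https://other.org/">Elsewhere</a>',
    "https://example.com/news": '<a href="/news/1#top">Story 1</a> <a href="/">Home</a>',
    "https://example.com/news/1": "<p>Story text</p>",
}
pages = crawl("https://example.com/", SITE.__getitem__)
print(pages)
```

Each discovered page would then go through an extractor like the ones above. A real crawler would also need politeness delays and robots.txt handling, which are left out here.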
![Page 19: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/19.jpg)
Sometimes we need to interact with the website
![Page 20: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/20.jpg)
The import.io Connector performs page interactions, such as searches, and
extracts the resulting data.
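Many site searches end up as a plain GET request with the query in the URL, so one way to reproduce a search interaction in code is to build that URL directly and extract from the result page. A sketch under that assumption: the endpoint, the `q`/`page` parameter names and the `result` CSS class are hypothetical, and sites that search through POST forms or JavaScript need a different approach.

```python
import re
from urllib.parse import urlencode, urlsplit, urlunsplit

RESULT_RE = re.compile(r'<a class="result"[^>]*>(.*?)</a>')


def build_search_url(base_url, query, page=1):
    """Turn a search-box interaction into the GET request it often produces.
    'q' and 'page' are hypothetical parameter names; real sites differ."""
    scheme, netloc, path, _query, _fragment = urlsplit(base_url)
    params = urlencode({"q": query, "page": page})
    return urlunsplit((scheme, netloc, path, params, ""))


def run_searches(queries, fetcher, base_url="https://example.com/search"):
    """Run one search per query and extract the result titles from each page."""
    results = {}
    for query in queries:
        html = fetcher(build_search_url(base_url, query))
        results[query] = RESULT_RE.findall(html)
    return results


# Hypothetical result page keyed by the URL the search would request.
BASE = "https://example.com/search"
FAKE_PAGES = {
    build_search_url(BASE, "open data"):
        '<a class="result" href="/r1">Open data portal</a>'
        '<a class="result" href="/r2">Data journalism</a>',
}
print(run_searches(["open data"], FAKE_PAGES.__getitem__, BASE))
```

Turning the interaction into a parameterised URL is what makes it repeatable: the same search can then be run for many queries, much like the bulk queries above.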
![Page 21: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/21.jpg)
Example: analyzing newspapers with import.io and MonkeyLearn
![Page 22: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/22.jpg)
Example: analyzing newspapers with import.io and MonkeyLearn
https://github.com/ignacioelola/web-text-analyzer
![Page 23: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/23.jpg)
Example: analyzing newspapers with import.io and MonkeyLearn
![Page 24: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/24.jpg)
Example: analyzing newspapers with import.io and MonkeyLearn
![Page 25: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/25.jpg)
Example: analyzing newspapers with import.io and MonkeyLearn
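The repository linked above contains the actual code for this example. As a rough illustration of the pipeline's shape only (extracted headlines in, topic counts per newspaper out), here is a sketch in which a toy keyword classifier stands in for the MonkeyLearn classifier. The topics, keywords, headlines and function names are all hypothetical and do not reflect the MonkeyLearn API.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword lists; in the real pipeline a trained
# MonkeyLearn classifier would do this step instead.
TOPIC_KEYWORDS = {
    "economy": {"unemployment", "jobs", "market", "inflation"},
    "politics": {"government", "elections", "parliament", "minister"},
    "sports": {"league", "team", "goal", "match"},
}


def classify(text):
    """Toy stand-in for a text classifier: topic with the most keyword hits."""
    words = set(re.findall(r"\w+", text.lower()))
    scores = {topic: len(words & keywords) for topic, keywords in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def topics_by_source(articles):
    """articles: iterable of (source, headline) pairs, e.g. rows from an extractor."""
    counts = defaultdict(Counter)
    for source, headline in articles:
        counts[source][classify(headline)] += 1
    return counts


# Hypothetical extracted headlines from two newspapers.
ARTICLES = [
    ("paper_a", "Government announces early elections"),
    ("paper_a", "Unemployment falls as jobs grow"),
    ("paper_b", "Local team wins the league final"),
    ("paper_b", "Weather stays warm"),
]
summary = topics_by_source(ARTICLES)
print({source: dict(counter) for source, counter in summary.items()})
```

Keeping extraction and classification as separate steps means either side can be swapped out (a different newspaper, a different classifier) without touching the other.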
![Page 26: jdp15 import.io workshop](https://reader030.vdocument.in/reader030/viewer/2022032505/55c7ddb8bb61eb96108b45bd/html5/thumbnails/26.jpg)
Thanks!
Q & A