Source: luminati.io/static/London-Workshop.pdf
Luminati Provides Web-Transparency
Web Scraping Proxy Management Workshop
Consumers opt in to the network in return for free use of a partner's application
Luminati developed a global P2P network of 35M+ consumers willing to help
How do we get users' active consent?
How does it work?
We use a peer’s IP address only when a device meets 3 conditions:
Businesses can now view the web as these 35M global consumers see it
Luminati Proxy Networks Available
Crawling Network Architecture
Luminati Proxy Manager
Robot Detection
Techniques for Bot Detection
● IP reputation
● Browser headers and cookies
● Device fingerprints
● User behaviour and history
● IP leaks
IP Reputation
● Type
● Request rate
● Account association
● Blacklisted IPs
● Inconsistencies
Browser Fingerprints
● User uniqueness on the web
● Users become more unique as the entropy level increases
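The link between entropy and uniqueness can be made concrete with a small calculation. The sketch below is only illustrative: it assumes the fingerprint attributes are statistically independent, and the population shares are made-up numbers, not figures from the workshop.

```python
import math


def surprisal_bits(share: float) -> float:
    """Bits of identifying information in a value shared by `share` of users."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return -math.log2(share)


def fingerprint_bits(shares: dict[str, float]) -> float:
    """Total bits of a fingerprint, assuming the attributes are independent."""
    return sum(surprisal_bits(s) for s in shares.values())


# Hypothetical share of users who report the same value for each attribute.
example = {
    "user_agent": 1 / 64,
    "screen_size": 1 / 16,
    "timezone": 1 / 8,
}

bits = fingerprint_bits(example)
# Under the independence assumption, about one user in 2**bits shares
# this exact combination of values.
one_in = 2 ** bits
```

Each extra attribute adds bits, so the group of users who look identical shrinks quickly as more properties are collected.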
Browser Fingerprint Examples
User Agent Uniqueness
● Desktop <> Mobile
● Android <> iOS
Audio Fingerprints
AudioContext properties:
Image from http://getwallpapers.com
Symptoms: blocked <> cloaked <> reCAPTCHA
How to Prevent Getting Blocked or Cloaked
● Request rate
● Country and city discovery
● Managing headers and fingerprints
● HTTP protocol version (e.g. HTTP/2)
● Persistence
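Of the points above, request rate is the easiest to control on the client side. The following is a minimal sketch of a throttle that enforces a minimum gap between requests. The class name `Throttle` and its parameters are hypothetical and not part of any Luminati tool, and the right interval depends on the target site.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None        # time at which the previous request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` before every request keeps a scraper under a chosen rate; keeping one throttle per target site (or per IP) is one possible design.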
How to Overcome Common Blockades
● Use different IPs, geos and networks
○ Waterfall routing
● Auto retry and banning IPs
○ Optimize IP cooling period
● New IP and fingerprints
○ Error code, reCAPTCHA, cloaked
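Auto retry with IP banning and a cooling period can be sketched as follows. This is an illustration under assumptions, not how Luminati implements it: `IpPool`, `fetch_with_retry`, and the `fetch` and `is_blocked` callbacks are hypothetical names, and the caller decides what counts as a blocked response (error code, reCAPTCHA page, cloaked content).

```python
import time
from typing import Callable, Optional


class IpPool:
    """Hand out proxy IPs, benching any IP that was blocked for a cooling period."""

    def __init__(self, ips, cooling_seconds: float, clock=time.monotonic):
        self._ips = list(ips)
        self._cooling = cooling_seconds
        self._clock = clock
        self._banned_until = {}  # ip -> time at which it may be used again

    def pick(self) -> Optional[str]:
        """Return the first IP that is not cooling down, or None if all are."""
        now = self._clock()
        for ip in self._ips:
            if ip not in self._banned_until or self._banned_until[ip] <= now:
                return ip
        return None

    def ban(self, ip: str) -> None:
        """Bench an IP until the cooling period has passed."""
        self._banned_until[ip] = self._clock() + self._cooling


def fetch_with_retry(pool: IpPool,
                     fetch: Callable[[str], str],
                     is_blocked: Callable[[str], bool],
                     max_attempts: int = 3) -> str:
    """Call fetch(ip) until a response is not blocked, banning each failed IP."""
    for _ in range(max_attempts):
        ip = pool.pick()
        if ip is None:
            break  # every IP is still cooling down
        body = fetch(ip)
        if not is_blocked(body):
            return body
        pool.ban(ip)
    raise RuntimeError("all attempts were blocked")
```

A shorter cooling period returns IPs to the pool sooner but raises the chance of hitting the same block again, which is the trade-off the slide refers to as optimizing the cooling period.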
Waterfall Routing
Target Website
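Waterfall routing can be read as "try the next network only when the current one fails". The sketch below assumes that reading. The tier names and their ordering are hypothetical, and `fetch` stands in for whatever function sends the request through a given network.

```python
from typing import Callable, Iterable, Optional, Tuple


def waterfall(networks: Iterable[str],
              fetch: Callable[[str], Optional[str]]) -> Tuple[str, str]:
    """Try each network in order; return (network, body) for the first success."""
    tried = []
    for network in networks:
        body = fetch(network)  # None means the request was blocked or failed
        if body is not None:
            return network, body
        tried.append(network)
    raise RuntimeError(f"blocked on every network: {tried}")


# Hypothetical tiers, ordered from cheapest to most expensive.
TIERS = ["datacenter", "residential", "mobile"]
```

Ordering the tiers by cost means the more expensive networks are used only for requests that the cheaper ones could not complete.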
Luminati’s Unblocker
curl --proxy <username>:<password>@unblock.zproxy.lum-superproxy.io:22225 https://example.com
Just make a simple request and let us handle the rest!
● Automatic Retry: Automatically retries the request upon a failed response
● Network Rotation: Routes through multiple networks automatically (waterfall)
● Manages Headers: Automatic header management based on site requirements
● Manages Cookies: IP priming and cookie management based on overall request load
● Country Discovery: Chooses the right country IP based on your request or target site
● Detection and Matching: Ensures the response is of the right content type
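The same request can be sent from a script. The sketch below uses only Python's standard library and reuses the proxy host and port from the curl command. The helper names `proxy_url` and `build_unblocker_opener` are hypothetical, and it assumes the endpoint is a plain HTTP proxy, which is what curl assumes when `--proxy` is given without a scheme.

```python
import urllib.parse
import urllib.request

# Host and port copied from the curl command above.
UNBLOCKER_HOST = "unblock.zproxy.lum-superproxy.io:22225"


def proxy_url(username: str, password: str) -> str:
    """Build the proxy URL, percent-encoding the credentials."""
    user = urllib.parse.quote(username, safe="")
    pwd = urllib.parse.quote(password, safe="")
    return f"http://{user}:{pwd}@{UNBLOCKER_HOST}"


def build_unblocker_opener(username: str, password: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    url = proxy_url(username, password)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)


# Usage, with real credentials in place of the placeholders:
#   opener = build_unblocker_opener("<username>", "<password>")
#   html = opener.open("https://example.com", timeout=60).read()
```

Retries, network rotation, and header and cookie handling then happen on the proxy side, so the client code stays this small.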