Mining the Geo Needles in the Social Haystack
Matthew Russell's "Mining the Geo Needles in the Social Haystack" from Where 2.0 (April 19, 2011 - Santa Clara, CA)
![Page 1: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/1.jpg)
Mining the Geo Needles in the Social Haystack
(Where 2.0, 2011)
Matthew A. Russell
http://linkedin.com/in/ptwobrussell
@ptwobrussell
![Page 2: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/2.jpg)
About Me
2
•VP of Engineering @ Digital Reasoning Systems
•Principal @ Zaffra
•Author of Mining the Social Web et al.
•Triathlete-in-training
@SocialWebMining
![Page 3: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/3.jpg)
Objectives
3
•Orientation to geo data in the social web space
•Hands-on exercises for analyzing/visualizing geo data
•Whet your appetite and send you away motivated and with useful tools/insight
![Page 4: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/4.jpg)
Approximate Schedule
•Microformats: 10 minutes
•Twitter : 15 minutes
•LinkedIn: 15 minutes
•Facebook: 15 minutes
•Text-mining: 15 minutes
•General Q&A (time-permitting)
4
![Page 5: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/5.jpg)
Development
•Your local machine
•Python version 2.{6,7}
•Recommend Windows users try ActivePython
•We'll handle the rest along the way
5
![Page 6: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/6.jpg)
Agile Data Solutions
Microformats
![Page 7: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/7.jpg)
Microformats
•My definition: "conventions for unambiguously including structured data into web pages in an entirely value-added way" (MTSW, p19)
•Bookmark and browse: http://microformats.org
•Examples:
•geo, hCard, hEvent, hResume, XFN
7
![Page 8: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/8.jpg)
geo
8
<!-- Download MTSW pp 30-34 from XXX -->
<!-- The multiple class approach -->
<span style="display: none" class="geo">
  <span class="latitude">36.166</span>
  <span class="longitude">-86.784</span>
</span>
<!-- When used as one class, the separator must be a semicolon -->
<span style="display: none" class="geo">36.166; -86.784</span>
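To make the two geo forms concrete, here is a minimal sketch of pulling coordinates out of markup like the above. It is written for Python 3 and uses only the standard library's `html.parser`; the `GeoParser` class and its attribute names are illustrative assumptions, not code from the talk or from MTSW, and the sketch assumes well-formed markup with no void elements nested inside the geo element.

```python
from html.parser import HTMLParser


class GeoParser(HTMLParser):
    """Collect (latitude, longitude) pairs from elements with class="geo".

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.points = []    # finished (lat, lon) tuples
        self._depth = 0     # nesting depth inside the current geo element
        self._field = None  # "latitude" or "longitude" while inside such a span
        self._parts = {}    # values gathered from the multiple-class form
        self._text = []     # raw text gathered for the single-class form

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
            if "latitude" in classes:
                self._field = "latitude"
            elif "longitude" in classes:
                self._field = "longitude"
        elif "geo" in classes:
            self._depth = 1
            self._parts, self._text = {}, []

    def handle_data(self, data):
        if not self._depth:
            return
        if self._field:
            self._parts[self._field] = data.strip()
        elif self._depth == 1:
            self._text.append(data)

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._depth -= 1
        self._field = None
        if self._depth == 0:
            if "latitude" in self._parts and "longitude" in self._parts:
                pair = [self._parts["latitude"], self._parts["longitude"]]
            else:
                # Single-class form: "lat; lon" separated by a semicolon
                pair = "".join(self._text).split(";")
            if len(pair) == 2:
                self.points.append((float(pair[0]), float(pair[1])))


html = """
<span style="display: none" class="geo">
  <span class="latitude">36.166</span>
  <span class="longitude">-86.784</span>
</span>
<span style="display: none" class="geo">36.166; -86.784</span>
"""
parser = GeoParser()
parser.feed(html)
print(parser.points)
```

Both forms should yield the same pair, which is the point of the convention: a consumer does not need to know which variant the page author chose.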
![Page 9: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/9.jpg)
Exercise!
•View source @ http://en.wikipedia.org/wiki/List_of_U.S._national_parks
•Use http://microform.at to extract the geo data as KML
•http://microform.at/?type=geo&url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FList_of_U.S._national_parks
•Try pasting this URL into Google Maps and see what happens
9
![Page 10: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/10.jpg)
10
Exercise Results
• Feel free to hack on the KML
• http://code.google.com/apis/kml/documentation/
• Google Earth can be fun too
• But you already knew that
• We'll see it later...
![Page 11: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/11.jpg)
![Page 12: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/12.jpg)
Twitter Data
12
•There's geo data in the user profile
•And in tweets...
• ...if the user enabled it in their prefs
•And even in the 140 chars of the tweet itself
![Page 13: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/13.jpg)
A Tweet as JSON
13
{
  "user" : {
    "name" : "Matthew Russell",
    "description" : "Author of Mining the Social Web; International Sex Symbol",
    "location" : "Franklin, TN",
    "screen_name" : "ptwobrussell",
    ...
  },
  "geo" : { "type" : "Point", "coordinates" : [36.166, -86.784] },
  "text" : "Franklin, TN is the best small town in the whole wide world #WIN",
  ...
}
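As a rough illustration of the three places geo hints can hide in a tweet, the sketch below reads them out of a trimmed copy of the JSON. It assumes Python 3; `geo_signals` is a hypothetical helper name, and the field names are only those visible in the sample tweet.

```python
import json

# Trimmed copy of the sample tweet; elided fields are simply left out.
tweet = json.loads("""
{
  "user": {"screen_name": "ptwobrussell", "location": "Franklin, TN"},
  "geo": {"type": "Point", "coordinates": [36.166, -86.784]},
  "text": "Franklin, TN is the best small town in the whole wide world #WIN"
}
""")


def geo_signals(tweet):
    """Return the geo hints a tweet carries; any of them may be missing."""
    geo = tweet.get("geo") or {}
    coords = geo.get("coordinates") if geo.get("type") == "Point" else None
    return {
        # Free-text location from the user profile
        "profile_location": (tweet.get("user") or {}).get("location"),
        # Structured coordinates, present only if the user enabled geo
        "coordinates": tuple(coords) if coords else None,
        # The tweet text itself, which needs text mining to be useful
        "text": tweet.get("text"),
    }


print(geo_signals(tweet))
```

Treating every field as optional matters in practice, since "geo" is null for most tweets.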
![Page 14: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/14.jpg)
Exercise!
14
•In your browser, try accessing this URL:
http://api.twitter.com/1/users/show.json?screen_name=ptwobrussell
•In a terminal with Python, try it programmatically:
$ sudo easy_install twitter # 1.6.1 is the current version
$ python
>>> import twitter
>>> t = twitter.Twitter()
>>> user = t.users.show(screen_name='ptwobrussell')
>>> import json
>>> print json.dumps(user, indent=2)
![Page 15: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/15.jpg)
Recipe #21
•Geocode locations in profiles:
•https://github.com/ptwobrussell/Recipes-for-Mining-Twitter/blob/master/recipe__geocode_profile_locations.py
•Recipe #21 from 21 Recipes for Mining Twitter
15
![Page 16: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/16.jpg)
Sample Results
16
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
  <Folder>
    <name>Geocoded profiles for Twitterers showing up in search results for ... </name>
    <Placemark>
      <Style>
        <LineStyle>
          <color>cc0000ff</color>
          <width>5.0</width>
        </LineStyle>
      </Style>
      <name>Paris</name>
      <Point>
        <coordinates>2.3509871,48.8566667,0</coordinates>
      </Point>
    </Placemark>
    ...
</kml>
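For readers who want to produce output shaped like this themselves, here is a small sketch using Python 3's standard `xml.etree.ElementTree`. It is not the recipe's code; `to_kml` and its parameters are assumed names, and styling is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def to_kml(places, folder_name="Geocoded profiles"):
    """Build a KML string from (name, latitude, longitude) tuples."""
    kml = ET.Element("kml", xmlns="http://earth.google.com/kml/2.0")
    folder = ET.SubElement(kml, "Folder")
    ET.SubElement(folder, "name").text = folder_name
    for name, lat, lon in places:
        placemark = ET.SubElement(folder, "Placemark")
        ET.SubElement(placemark, "name").text = name
        point = ET.SubElement(placemark, "Point")
        # KML orders coordinates as longitude,latitude,altitude.
        ET.SubElement(point, "coordinates").text = "%s,%s,0" % (lon, lat)
    return ET.tostring(kml, encoding="unicode")


print(to_kml([("Paris", 48.8566667, 2.3509871)]))
```

The longitude-first ordering is the detail most likely to trip people up, since geocoders usually hand back (latitude, longitude).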
![Page 17: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/17.jpg)
Recipe #20
•Visualizing results with a Dorling Cartogram:
•https://github.com/ptwobrussell/Recipes-for-Mining-Twitter/blob/master/recipe__dorling_cartogram.py
•Recipe #20 from 21 Recipes for Mining Twitter
17
![Page 18: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/18.jpg)
18
Sample Results
![Page 19: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/19.jpg)
Recipe #22 (?!?)
19
•Extracting "geo" fields from a batch of search results
•https://github.com/ptwobrussell/Recipes-for-Mining-Twitter/blob/master/recipe__geocode_tweets.py
•Not in current edition of 21 Recipes for Mining Twitter
•Just checked in especially for you
![Page 20: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/20.jpg)
Sample Results
20
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, {u'type': u'Point', u'coordinates': [32.802900000000001, -96.828100000000006]}, {u'type': u'Point', u'coordinates': [33.793300000000002, -117.852]}, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, {u'type': u'Point', u'coordinates': [35.512099999999997, -97.631299999999996]}, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
•Unfortunately (???), "geo" data for tweets seems really scarce
•Varies according to a particular user's privacy mindset?
•Examining only Twitter users who enable "geo" would be interesting in and of itself
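One way to put a number on that scarcity is to count the non-null entries in a batch. The sketch below assumes Python 3 and a list shaped like the sample results (one "geo" value per tweet); `geo_coverage` is a hypothetical name and the input list is made up for illustration, not taken from the results above.

```python
def geo_coverage(geo_fields):
    """Return (tagged, total, fraction) for a list of tweet "geo" values."""
    tagged = [g for g in geo_fields if g is not None]
    total = len(geo_fields)
    fraction = len(tagged) / total if total else 0.0
    return len(tagged), total, fraction


# Illustrative batch: mostly None, as in real search results.
batch = [None] * 8 + [
    {"type": "Point", "coordinates": [32.8029, -96.8281]},
    {"type": "Point", "coordinates": [33.7933, -117.852]},
]
print(geo_coverage(batch))
```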
![Page 21: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/21.jpg)
Mining the 140 Characters
•Not a trivial exercise
•Mining natural language data is hard
•Mining bastardized natural language data is even harder
•We'll look at mining natural language data later
21
![Page 22: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/22.jpg)
Fun Possibilities
22
#TeaParty
#JustinBieber
![Page 23: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/23.jpg)
Oh, and by the way...
23
![Page 24: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/24.jpg)
OAuth 1.0a - Now
import twitter
from twitter.oauth_dance import oauth_dance

# Get these from http://dev.twitter.com/apps/new
consumer_key, consumer_secret = 'key', 'secret'

(oauth_token, oauth_token_secret) = oauth_dance('MiningTheSocialWeb', consumer_key, consumer_secret)

auth = twitter.oauth.OAuth(oauth_token, oauth_token_secret, consumer_key, consumer_secret)

t = twitter.Twitter(domain='api.twitter.com', auth=auth)
![Page 25: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/25.jpg)
OAuth 2.0 - "Soon"
     +----------+          Client Identifier      +---------------+
     |         -+----(A)--- & Redirect URI ------>|               |
     | End-user |                                 | Authorization |
     |    at    |<---(B)-- User authenticates --->|     Server    |
     | Browser  |                                 |               |
     |         -+----(C)-- Authorization Code ---<|               |
     +-|----|---+                                 +---------------+
       |    |                                         ^      v
      (A)  (C)                                        |      |
       |    |                                         |      |
       ^    v                                         |      |
     +---------+                                      |      |
     |         |>---(D)-- Client Credentials, --------'      |
     |   Web   |          Authorization Code,                |
     |  Client |            & Redirect URI                   |
     |         |                                             |
     |         |<---(E)----- Access Token -------------------'
     +---------+       (w/ Optional Refresh Token)
See http://tools.ietf.org/html/draft-ietf-oauth-v2-10#section-1.4.1
![Page 26: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/26.jpg)
![Page 27: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/27.jpg)
LinkedIn Data
27
•Coarsely grained geo data is available in user profiles
•"Greater Nashville Area", "San Francisco Bay", etc.
•Most geocoders don't seem to recognize these names...
•No geocoordinates! (Yet???)
•Mitigation approach: (1) transform/normalize and then (2) geocode
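Step (1) of that mitigation can be sketched with a couple of regular expressions that strip the regional qualifiers before the name is handed to a geocoder. This is a Python 3 sketch under assumptions: the prefix and suffix lists are guesses based on the two examples above, not an exhaustive catalogue of LinkedIn region names, and `normalize_location` is a hypothetical helper.

```python
import re

# Assumed qualifiers; extend these after inspecting real profile data.
_PREFIX = re.compile(r"^(?:Greater|Metro)\s+", re.IGNORECASE)
_SUFFIX = re.compile(
    r"\s+(?:Metro(?:politan)? Area|Bay Area|Area|Bay)$", re.IGNORECASE
)


def normalize_location(name):
    """Reduce a coarse region label to something a geocoder may recognize."""
    name = _PREFIX.sub("", name.strip())
    return _SUFFIX.sub("", name)


for raw in ["Greater Nashville Area", "San Francisco Bay"]:
    print(raw, "->", normalize_location(raw))
```

The normalized string would then go through step (2), an ordinary geocoder lookup.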
![Page 28: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/28.jpg)
Exercise!
•Get an API key at http://code.google.com/apis/maps/signup.html
28
$ easy_install geopy
$ python
>>> import geopy
>>> g = geopy.geocoders.Google(GOOGLE_MAPS_API_KEY)
>>> results = g.geocode("Nashville", exactly_one=False)
>>> for r in results:
...     print r # (u'Nashville, TN, USA', (36.165889, -86.784443))
•See also https://github.com/ptwobrussell/Recipes-for-Mining-Twitter/blob/master/etc/geocoding_pattern.py
![Page 29: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/29.jpg)
Diving Deeper
•Example 6-14 from MTSW (pp194-195) works through an extended example and dumps KML output that includes the clustered results
•See http://github.com/ptwobrussell/Mining-the-Social-Web/python_code/linkedin__geocode.py
29
![Page 30: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/30.jpg)
Clustering
•First half of MTSW Chapter 6 (pp167-188) provides a good/detailed intro
•Think of clustering as "approximate matching"
•The task of grouping items together according to a similarity metric
• It's among the most useful algorithmic techniques in all of data mining
•The catch: It's a hard problem.
•What do you name the clusters once you've created them?
30
![Page 31: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/31.jpg)
Example Output
31
![Page 32: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/32.jpg)
Better Data Exploration
32
![Page 33: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/33.jpg)
Clustering Approaches
•Agglomerative (hierarchical)
•Greedy
•Approximate
•k-means
33
![Page 34: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/34.jpg)
k-Means Algorithm
34
1. Randomly pick k points in the data space as initial values that will be used to compute the k clusters: K1, K2, ..., Kk.
2. Assign each of the n points to a cluster by finding the nearest Ki, effectively creating k clusters and requiring k*n comparisons.
3. For each of the k clusters, calculate the centroid (the mean of the cluster) and reassign its Ki value to be that value. (Hence, you’re computing “k-means” during each iteration of the algorithm.)
4. Repeat steps 2–3 until the members of the clusters do not change between iterations. Generally speaking, relatively few iterations are required for convergence.
Let's try it: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
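The four steps can be sketched in a few lines of plain Python 3. This is an illustrative implementation, not the code from MTSW: it works on 2-D points with squared Euclidean distance, and it seeds the centroids by sampling from the data itself rather than from anywhere in the data space, which is a common simplification.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster 2-D points; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Step 1: pick k initial values (here: k of the data points).
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (k*n comparisons).
        new_assignments = [
            min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            for p in points
        ]
        # Step 4: stop once cluster membership no longer changes.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

For geo data the same loop applies to (latitude, longitude) pairs, though plain Euclidean distance is only a rough stand-in for distance on the globe.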
![Page 35: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/35.jpg)
Step 0 (init)
35
![Page 36: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/36.jpg)
Step 1
36
![Page 37: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/37.jpg)
Step 2
37
![Page 38: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/38.jpg)
Step 3
38
![Page 39: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/39.jpg)
Step 4
39
![Page 40: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/40.jpg)
Step 5
40
![Page 41: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/41.jpg)
Step 6
41
![Page 42: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/42.jpg)
Step 7
42
![Page 43: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/43.jpg)
Step 8
43
![Page 44: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/44.jpg)
Step 9 (done)
44
![Page 45: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/45.jpg)
k-Means Applied
45
![Page 46: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/46.jpg)
![Page 47: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/47.jpg)
Facebook Data
47
•Ridiculous amounts of data (all kinds) are available via the FB Platform
•Current location, hometown, "checkins"
•Access to the FB platform data is relatively painless:
•Social Graph: http://developers.facebook.com/docs/reference/api/
•FQL: http://developers.facebook.com/docs/reference/fql/
![Page 48: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/48.jpg)
FQL Checkins
•See http://developers.facebook.com/docs/reference/fql/checkin/
48
![Page 49: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/49.jpg)
FQL Connections
•See http://developers.facebook.com/docs/reference/fql/connection/
49
![Page 50: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/50.jpg)
Sample FQL
•An excerpt from MTSW Example 9-18 (pp306-308) conveys the gist:
50
fql = FQL(ACCESS_TOKEN)
q = \
"""select name, current_location, hometown_location from user where uid in
   (select target_id from connection where source_id = me() and target_type = 'user')"""
results = fql.query(q)
![Page 51: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/51.jpg)
Example "App"
•Basic idea is simple
•You already have the tools to geocode and plot on a map...
•See also: http://answers.oreilly.com/topic/2555-a-data-driven-game-using-facebook-data/
51
![Page 52: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/52.jpg)
FB Platform Demo
•Minimal sample app at http://miningthesocialweb.appspot.com
•Source is at http://github.com/ptwobrussell/Mining-the-Social-Web/web_code/facebook_gae_demo_app
52
![Page 53: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/53.jpg)
Text Mining
![Page 54: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/54.jpg)
References
54
•MTSW Chapter 7 (Google Buzz: TF-IDF, Cosine Similarity, and Collocations)
•MTSW Chapter 8 (Blogs et al.: Natural Language Processing and Beyond)
![Page 55: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/55.jpg)
"Legacy" NLP
55
•"Legacy" => Classic Information Retrieval (IR) techniques
•Often (but not always) uses a "bag of words" model
•tf-idf metric is usually the root of the core strategy
•Variations on cosine similarity are often the fruition
•Additional higher order analytics are possible, but inevitably cannot be optimal for deep semantic analysis
•Virtually every A-list search engine has started here
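To ground the bag-of-words, tf-idf, and cosine similarity bullets, here is a compact Python 3 sketch. It uses one common tf-idf variant (term frequency normalized by document length, idf as log(N/df)); MTSW Chapter 7 may weight terms differently, and the function names and toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "nashville is in tennessee".split(),
    "franklin is in tennessee".split(),
    "paris is in france".split(),
]
v = tfidf_vectors(docs)
print(cosine(v[0], v[1]), cosine(v[0], v[2]))
```

Words that occur in every document ("is", "in") get a weight of zero, so the similarity is driven by the place names, which is the behavior that makes this family of techniques useful for the location question that follows.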
![Page 56: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/56.jpg)
A Vector Space
56
![Page 57: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/57.jpg)
How might you discover locations from text using "legacy" techniques?
57
![Page 58: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/58.jpg)
Some possibilities
58
•Combinations of language dependent "hacks"
•n-gram detection/examination
•bigrams, trigrams, etc.
•"Proper Case" hints
•"Chipotle Mexican Grill"
•prepositional phrase cues
•"in the garden", "at the store"
•Gazetteers
•lists of "well-known" locations like "Statue of Liberty"
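A few of these hacks can be combined in a short Python 3 sketch: n-grams are looked up in a gazetteer, and runs of Proper Case words are accepted when a preposition introduces them. The gazetteer contents, the preposition list, and the `candidate_locations` name are all hypothetical, and the approach is English-specific.

```python
import re

# Hypothetical, tiny gazetteer; a real one would hold many thousands of names.
GAZETTEER = {"statue of liberty", "santa clara", "nashville"}

# A preposition followed by a run of capitalized words ("of" may join them).
CUE = re.compile(
    r"\b(?:in|at|near|from)\s+([A-Z]\w*(?:\s+(?:of\s+)?[A-Z]\w*)*)"
)


def candidate_locations(text, max_n=3):
    """Return a set of strings that look like locations in the text."""
    words = re.findall(r"[A-Za-z']+", text)
    found = set()
    # n-gram lookup (unigrams up to trigrams) against the gazetteer
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if gram.lower() in GAZETTEER:
                found.add(gram)
    # "Proper Case" phrases introduced by a prepositional cue
    found.update(m.group(1) for m in CUE.finditer(text))
    return found


text = ("We had lunch at Chipotle Mexican Grill in Santa Clara "
        "before visiting the Statue of Liberty.")
print(sorted(candidate_locations(text)))
```

Heuristics like these are cheap but noisy, so their output is best treated as candidates to be filtered rather than as final answers.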
![Page 59: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/59.jpg)
"Modern" NLP Pipeline
59
•A deeper "understanding" of the data is much harder
•End of Sentence (EOS) Detection
•Tokenization
•Part-of-Speech Tagging
•Chunking
•Anaphora Resolution
•Extraction
•Entity Resolution
•Blending in "legacy" IR techniques can be very helpful in reducing noise
![Page 60: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/60.jpg)
Entity Interactions
60
![Page 61: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/61.jpg)
Quality Metrics
61
•Precision = TP/(TP+FP)
•Recall = TP/(TP+FN)
•F1 = (2*P*R)/(P+R)
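These three formulas translate directly into code. The Python 3 sketch below scores a set of extracted locations against a hand-labeled set; `precision_recall_f1` and the example sets are made up for illustration.

```python
def precision_recall_f1(predicted, actual):
    """Score predicted items against the true items using set overlap."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # true positives: correctly extracted
    fp = len(predicted - actual)   # false positives: extracted but wrong
    fn = len(actual - predicted)   # false negatives: missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


predicted = {"Santa Clara", "Statue of Liberty", "Chipotle Mexican Grill", "Where"}
actual = {"Santa Clara", "Statue of Liberty", "Nashville"}
print(precision_recall_f1(predicted, actual))
```

Here 2 of the 4 predictions are correct (precision 0.5) and 2 of the 3 true locations were found (recall 2/3), so F1 works out to 4/7.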
![Page 62: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/62.jpg)
Exercise!
•Get a webpage:
•curl http://example.com/foo.html > foo.html
•Extract the text:
•curl -d @foo.html "http://www.datasciencetoolkit.org/html2story" > foo.json
•Extract the locations:
•curl -d @foo.json "http://www.datasciencetoolkit.org/text2places"
•NOTE: Windows users can work directly at http://www.datasciencetoolkit.org
62
![Page 63: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/63.jpg)
Tools to Investigate
•NLTK - http://nltk.org
•Data Science Toolkit - http://www.datasciencetoolkit.org
•WordNet - http://wordnet.princeton.edu/
63
![Page 64: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/64.jpg)
Q&A
![Page 65: Mining the Geo Needles in the Social Haystack](https://reader035.vdocument.in/reader035/viewer/2022062319/554f63b6b4c9058a148b498b/html5/thumbnails/65.jpg)
The End