Post on 08-Aug-2020
Search and the ‘Net @ 2015
Michael Hunter Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council Member Libraries’ Staff
Sponsored by the Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library 2015
For today . . .
The Searchscape
Current Evolution of Search
New Services
The Social Web and Research
Data Visualization
Bing, Yahoo and DuckDuckGo
Linklist
http://people.hws.edu/hunter/searchnet15links.htm
USC Annenberg’s Digital Future Report 2014 http://www.digitalcenter.org/wp-content/uploads/2014/12/2014-Digital-Future-Report.pdf "General Internet Activities"
E-Reading Rises as Device Ownership Jumps By Kathryn Zickuhr and Lee Rainie http://www.pewinternet.org/2014/01/16/e-reading-rises-as-device-ownership-jumps
American adults 18+ - % who read at least 1 book in that year
American adults 18+ - % who own each device
New Top Level Domains
First made available 1/29/14
Over 150 now live on donuts.co (2/15/15)
Content-significant
.bike, .energy, .delivery, .legal, .guru
Brand-specific – “vanity domains”
.android, .walmart, .nyc
Allow for non-Roman scripts - Arabic, Chinese, etc.
Require proof of identity/relationship to TLD
Unique TLD costs $185,000
Growth of Query Types over 1 year http://searchenginewatch.com/sew/how-to/2383498/how-will-voice-search-impact-a-search-marketers-world
Voice Search http://googleblog.blogspot.com/2014/10/omg-mobile-voice-survey-reveals-teens.html
Google's Voice Search (2010) 36 languages
Apple's Siri (2011) 11 languages
MS's Cortana (2014) 6 languages
Study by Northstar; 1400 American smartphone users, 400 age 13-17, 1000 18+
40% - ask for directions
39% - dictate a text message
32% - make a phone call
27% - check the weather
23% 18+ - questions about cooking
51% 13-17, 32% 18+ - "just for fun"
When do we use voice search?
[Bar chart, scale 0-70: voice search use with friends, in the bathroom, while cooking, while exercising and while watching TV, for ages 18+ and 13-18]
Web Search in 2014 Who’s crawling the Web?
Bing (aka Yahoo!)
Gigablast
Blekko
DuckDuckGo
Baidu
Yandex
Market Share Growth Oct. 2013– Oct. 2014
www.comscore.com
[Bar chart, scale 0-80: search market share for Google, Bing, Yahoo!, Ask and AOL, 2013 vs. 2014]
The current evolution of search: WAY beyond keyword matching
Semantic Processing - internal to the search engine
Predictive Operations - from data about and from the user
ANSWERS, NOT JUST SEARCH RESULTS
Semantic Processes
NLP Parsing
Pattern Matching
Knowledgebase Entities
Structured Data
Term Frequency Data
NLP Parsing
Machine-learned meaning derived from human or natural language speech or text. (Adapted from Wikipedia)
Analysis of large sets of documents (corpora) that have been human-annotated with parts of speech and other semantic information
Machine “learns” the relationships and meaning through statistical inference
Visualization at http://nlpviz.bpodgursky.com
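The statistical idea on this slide can be sketched in a few lines of Python. This is only an illustration: the tiny annotated corpus, the tag names and the `train` and `tag` helpers are made up for the example, and production systems rely on far larger corpora and richer models.

```python
from collections import Counter, defaultdict

# Hypothetical human-annotated corpus: (word, part-of-speech) pairs.
ANNOTATED = [
    ("the", "DET"), ("search", "NOUN"), ("returns", "VERB"), ("results", "NOUN"),
    ("users", "NOUN"), ("search", "VERB"), ("the", "DET"), ("web", "NOUN"),
    ("a", "DET"), ("search", "NOUN"), ("engine", "NOUN"),
]

def train(pairs):
    """Count how often each word carries each tag in the annotated data."""
    counts = defaultdict(Counter)
    for word, pos in pairs:
        counts[word.lower()][pos] += 1
    return counts

def tag(sentence, counts, default="NOUN"):
    """Label each word with the tag it carried most often in training."""
    result = []
    for word in sentence.lower().split():
        seen = counts.get(word)
        best = seen.most_common(1)[0][0] if seen else default
        result.append((word, best))
    return result

model = train(ANNOTATED)
print(tag("the search engine", model))
```

Here "search" is tagged as a noun because the annotators marked it as a noun more often than as a verb; words never seen in training fall back to the assumed default tag.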
Knowledgebase Entities Google’s Knowledge Graph – Bing’s Satori
Google’s Knowledge Graph – rooted in the (human) community-created entities in Freebase
Crowdsourcing too slow; often ignores specialized areas of knowledge, non-English content
Knowledge Vault – Automated extraction of raw data and creation of entities derived from that data
DOM trees - structures that help browsers represent and interact with documents in HTML and other formats (Wikipedia)
More Semantic Processing…
Term Frequency Data
Frequency, proximity, order
Aids in discovery across subject areas, filetypes and entire domains
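A rough sketch of the three signals named above (frequency, proximity, order), assuming a simple word tokenizer. The helper names and the sample sentence are invented for illustration and do not describe how any particular engine computes these values.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(tokens, term):
    """Indexes at which a term occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]

def term_frequency(tokens, term):
    """Frequency: how many times the term occurs."""
    return len(positions(tokens, term))

def min_distance(tokens, a, b):
    """Proximity: smallest gap, in words, between a and b (None if either is absent)."""
    pos_a, pos_b = positions(tokens, a), positions(tokens, b)
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)

def in_order(tokens, a, b):
    """Order: True if some occurrence of a comes before some occurrence of b."""
    pos_a, pos_b = positions(tokens, a), positions(tokens, b)
    return bool(pos_a and pos_b) and min(pos_a) < max(pos_b)

doc = "Voice search is growing. Teens use voice search for fun; adults use search for directions."
tokens = tokenize(doc)
print(term_frequency(tokens, "search"))         # frequency
print(min_distance(tokens, "voice", "search"))  # proximity
print(in_order(tokens, "voice", "search"))      # order
```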
Pattern Matching Algorithms
Focuses on recognition of patterns and regularities in text, data and images
Structured Data
Structured Web tables and data sets
(.xls, .kml, .sdf)
Human created tags – Schema.org
Schema.org
Organization backed by Google, Bing, Yahoo and other engines to standardize metadata for use by crawler-based services.
Helps create "real" answers and "rich snippets"
Schema example: Restaurant with a menu
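As a hedged illustration of what such markup can look like, the sketch below builds a Schema.org `Restaurant` description with a menu and prints it as JSON-LD, one of the syntaxes Schema.org supports. The restaurant name, cuisine, dish and price are invented; the slide itself does not show the markup used in the original example.

```python
import json

# Hypothetical restaurant; every value below is made up for illustration.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Bistro",
    "servesCuisine": "Italian",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Geneva",
        "addressRegion": "NY",
    },
    "hasMenu": {
        "@type": "Menu",
        "hasMenuSection": {
            "@type": "MenuSection",
            "name": "Dinner",
            "hasMenuItem": [
                {
                    "@type": "MenuItem",
                    "name": "Margherita Pizza",
                    "offers": {
                        "@type": "Offer",
                        "price": "12.00",
                        "priceCurrency": "USD",
                    },
                }
            ],
        },
    },
}

# The JSON-LD text would be embedded in the page inside a
# <script type="application/ld+json"> element.
markup = json.dumps(restaurant, indent=2)
print(markup)
```

Because each value is labelled with a type and a property name, a crawler can read the cuisine or a dish price directly instead of guessing it from the page text.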
Predictive Operations: Inferring the user’s intent
“The Holy Grail of Search”
Location-based results – IP and GPS
Weather, entertainment, restaurants…..
Anonymous past searches and user behavior
Personal data volunteered by user
Time of day
Device used
Semantic Processing and Predictive Operations
--Correctly interpret the query, or a portion of the query
--Give a “best guess” answer based on highly trusted sources (knowledgebase) and similar searches
--Aggregate and grow the knowledgebase through iterative, real-time web crawls
Discovery Apps: Personalized Search on Steroids
Combines your
Personal preferences
Location
Demographic characteristics
Social network data
People, Preferences, Interests, Events
Suggests entertainment, restaurants and more
Chat with your social network friends
“Current events you may like within X miles”
Gravy – Free on iTunes
Personal Assistant Apps
Connects to your
Calendar
Facebook events
Prompts for transportation times, quickest routes
Includes some discovery and chat features
Relies heavily on user-supplied personal data
Sunrise, Tempo, et al.
Apps and the Deep Web
Currently, crawler-based search engines cannot access content in apps
Posts
Links
Personal data
Education apps continue to grow in content, quality and use
Google is working on indexing them…..
New Services
Izik.com
Search app by Blekko search engine
Launched as a tablet app in 2013
Now accessible via desktop/laptop/smartphone
Searches Web, Twitter, News sources
Dynamically clustered results
Focuses on popular culture, shopping, news
Individual results can be shared via social networks
Qwant A fresh approach to search
Aims to offer a European-based service that respects users’ privacy
Launched in France in 2013
Search verticals offered:
Web, Media (News), People (Social Networks), Boards (Online Forums, mostly European)
Results clusters offer “refine search”: Web, News, Social, Shopping
Qnowledge Graph (from Wikipedia and other general sources)
16 interface languages, which influence search results
Binpad Hierarchical results clustering
Search options: Web, Wiki, Pubmed
Results clusters include other closely related items, often in hierarchical order
All results include images
Verticals include News, Edu, TV, Editor
Editor still in development
Project of Xdroid, Inc., an enterprise search software developer in Hungary
CC Search search.creativecommons.org/
Searches media in the public domain
Flickr, YouTube, Jamendo, Wikimedia Commons, SoundCloud and others…..
Some sponsored results appear that are not in the public domain
Verify use conditions for each result
Search and the Dark Web
Dark Web- Networks with server addresses intentionally obscured
Often house online criminal activities
Includes Tor network hidden services, which have the .onion TLD
Only accessible via Tor's private browser
Content is not password-protected, but is not accessible to crawler-based services due to lack of linkage
Memex DOD’s Dark Web Search Engine
Software to visualize and organize big data
Searches text, handwritten text, images, geographic data embedded in photos….
Identifies hidden relationships among websites, deep web sites and forums
Can access Dark Web obscured networks
Used in online criminal investigations
Sex-trafficking ads
ISIS-funding and other money laundering
Contact memex@darpa.mil
http://www.wsj.com/articles/sleuthing-search-engine-even-better-than-google-1423703464
The Social Web and Research
Why search the social web???
Public responses, attitudes, opinions
Breaking news, events
Trending topics and people
Latest product reviews
First-hand accounts of events-text, image, audio, video (primary sources)
Security, technology topics (latest virus, etc.)
Locate individuals/experts and their networks
People interested in a topic/hobby
Social web research projects
BuzzSumo - meta for social networks
Discovers the most shared content
Crawls FB, TW, LinkedIn, Pinterest, Google+
Advanced search features
Boolean, URL or domain search, author search, Twitter user search
Filters: Article, Infographic, Guest Post, Giveaways, Interviews, Videos, Date
Requires (free) account; other fee-based options
Twitter Search - search.twitter.com
Now includes every public Tweet since 2006
Searchable with all search features previously available at twitter.com/search-advanced
Indexes ca. ½ trillion tweets, and grows by several billion tweets a week.
Tweets deal with “everyday human experiences to major historical events”
Entire TV and sports seasons, conferences, places, events, industry discussions
Long-lived hashtags across countries, ideologies
#ScotlandDecides #HongKong #Ferguson #Hamas
Opinion Mining with TW
Identify TW users with differing opinions on a debated topic
Linguistic analysis of ca. 1 million public tweets with “guns” or “gun control” sent 4/15/13-4/18/13
Members of TW lists such as “Prevent Gun Violence” or “Guns Save Lives” – Sample of 263
85 for reforms 178 against reforms
Belonging to no relevant TW lists – Sample 500
276 for, 120 against, 204 did not voice opinion (re-tweets of relevant tweets from others)
Ashwin Rajadesingan and Huan Liu, “Identifying Users with Opposing Opinions in Twitter Debates” http://www.public.asu.edu/~huanliu/papers/sbp14.pdf
TW as social indicator and health predictor – UPenn study
Linguistic and emoticon analysis of geo-tagged tweets combined with health data from over 1,300 US counties
Tweets expressing negative emotions-stress, anger, fatigue-are associated with higher heart disease risk
Tweets with positive emotions-optimism, enthusiasm-are associated with lower levels of risk
http://www.upenn.edu/pennnews/news/twitter-can-predict-rates-coronary-heart-disease-according-penn-research
Topic-based Twitter Lists
Can provide very latest top news, tips and cutting-edge research in a topic or interest
Slowly gaining popularity-require set-up time and maintenance
Locating lists using hashtags
topic or associated element
#tax #IRS
person, place, event associated with topic
#olympians #worldcup
“101 best twitter lists to follow” http://www.postplanner.com/101-best-twitter-lists-to-follow/
music.twitter.com twitter’s own music location service
Education and the social searchscape
Offers first-hand accounts of events and conditions
Informative of current world cultures and trends on a wide range of subjects
Gateway to blogs and other online communication that can enhance scholarship
Channel for updates to educational programs
Embedded links and other information often highly relevant and recent
Requires careful evaluation of information found there
Data Visualization
Enables patterns to emerge in big data
More accessible to visual learners
Facilitates sharing across languages
Can be made compatible with a wide range of data formats
Responsive to real-time changes
Showcase of 2014 projects:
http://flowingdata.com/2014/12/19
Bing, Yahoo and DuckDuckGo
Looking for a niche
Bing and Yahoo represent 29% of all US searches http://comscore.com 12/1/14
Yahoo
Focus is on local and personalized search results
Now partnered with Yelp, local business search engine
Bing
Focus is on lifestyle, travel, images, maps
Social search results (FB, TW) in a sidebar
Bing Image Search
High quality images
Related search offered, based on descriptive text associated with the image
Clustering by topic
Filters
Size, Color, Type, Layout, People, Date, License, SafeSearch
Image Match with a URL or image you upload
Entity Comparisons
Google Bing
Bing for Schools http://www.bing.com/classroom
Safe search filters and ad-free environment
Requires registration by a school
Not possible to access it for home use
Daily lesson plan available based on the image used each day on the Bing homepage
Excludes Bing apps
DuckDuckGo http://ddg.gg
Offers anonymous search functionality
Popularity spiked after NSA PRISM search engine scandal
Does not save search history of any type
Google does, using it "to increase relevancy"
Included as a search option in Apple's latest version of Safari
Has been blocked in China !!!
Knowledge Vault Beyond the Graph…..
Knowledge Graph seeded from Freebase entities and human additions
Automated generation of entities increases number and discovers hidden relationships among entities and their attributes
Entities now appear at top of results page with related topics or other relevant information
Type of additional information varies depending on entity
Graph database stores data in nodes and relationships.
http://www.oaddo.org/home
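A toy sketch of the "nodes and relationships" idea, assuming nothing about how the Knowledge Graph or any real graph database is implemented. The `TinyGraph` class, its method names and the relationship labels are hypothetical.

```python
from collections import defaultdict

class TinyGraph:
    """Minimal in-memory graph: nodes hold properties, edges hold named relationships."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # node id -> list of (relation, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, source, relation, target):
        self.edges[source].append((relation, target))

    def related(self, source, relation):
        """Return the ids reachable from source via the given relationship."""
        return [t for r, t in self.edges.get(source, []) if r == relation]

graph = TinyGraph()
graph.add_node("hws", name="Hobart and William Smith Colleges", kind="College")
graph.add_node("geneva", name="Geneva", kind="City")
graph.add_node("ny", name="New York", kind="State")
graph.relate("hws", "LOCATED_IN", "geneva")
graph.relate("geneva", "LOCATED_IN", "ny")

# Follow two relationships in a row: which state is the college in?
city = graph.related("hws", "LOCATED_IN")[0]
state = graph.related(city, "LOCATED_IN")[0]
print(graph.nodes[state]["name"])
```

Following relationships from node to node in this way is what lets an entity panel show related topics alongside the entity itself.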
Right to be Forgotten ruling EU's European Court of Justice, May 2014
G. and other search engines must remove results deemed to be "inadequate, irrelevant or no longer relevant, or excessive in relation to the purposes for which they were processed and in the light of the time that has elapsed."
http://curia.europa.eu/jcms/upload/docs/application/pdf/2014-05/cp140070en.pdf
Does not require them to be removed from the servers on which they are located
Makes the content more difficult to find
Of the initial 12,000 removal requests
33% - fraud accusations
20% - related to violent/serious crimes
12% - related to child pornography arrests
App indexing
G. indexes content from apps that open their content to G's crawlers (7/3/14)
Results from apps are combined with mobile search results if the searcher has that app installed on their mobile device.
Will play an increasing role in web search across devices
Google’s device-dependent results sets
The intent and context of queries varies between devices
G.'s search results on mobile devices vary from those on desktops or laptops by as much as 43%
Mobile results
Tend to focus more on local-based results
Display pages with smaller file size, on average
Based on analysis of first 30 results for 10,000 keyword searches
“US Google Ranking Factors 2014” http://www.searchmetrics.com/news-and-events/mobile-optimization/
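One plausible way to express "results vary by X%" is the share of top desktop results that do not appear in the mobile results for the same query. The sketch below assumes that definition; the slide does not describe the Searchmetrics methodology, and the function name and sample URLs are invented.

```python
def percent_different(desktop, mobile, depth=30):
    """Percentage of the top `depth` desktop results missing from the top `depth` mobile results."""
    top_desktop = set(desktop[:depth])
    top_mobile = set(mobile[:depth])
    if not top_desktop:
        return 0.0
    return 100.0 * len(top_desktop - top_mobile) / len(top_desktop)

# Hypothetical result lists for one query: ten desktop results,
# six of which also appear on mobile alongside four local results.
desktop_results = [f"site{i}.example" for i in range(10)]
mobile_results = desktop_results[:6] + [f"local{i}.example" for i in range(4)]

print(percent_different(desktop_results, mobile_results))
```

Averaging this figure over many keyword searches would give a single summary percentage; other definitions, such as comparing ranking positions, would give different numbers.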
http://www.comscore.com/Insights/Presentations_and_Whitepapers/2013/The_Digital_World_in_Focus
Maps Gallery, In-depth articles
Interactive digital thematic map collections
Historic city plans
Climate trends
Housing affordability
Shipwrecks
Up-to-date evacuation routes
In-depth articles caveat
"How to write the In Depth Articles that Google Loves" copyblogger.com
Content farm orientation?
Requires careful evaluation of each item; unvetted websites in particular
Google's tech projects
Google for Kids - under 13; more parental controls
Project Loon - Provide Web access via high-altitude balloons
Self-driving cars
Google Glass 2
Smart contact lenses
Continuous health monitoring via disease-detecting nanoparticles
Liftware - stabilized spoon for tremor sufferers
"Google Tracker 2015" http://arstechnica.com
A bit of historical perspective: Top 5 http://www.washingtonpost.com/news/the-intersect/wp/2014/12/15/from-lycos-to-ask-jeeves-to-facebook-tracking-the-20-most-popular-web-sites-every-year-since-1996/
Search in the Future
Will continue to be more specialized
Shopping - Amazon
Travel - Kayak
Movies - IMDB
Real-time news - TW
Discovery software will integrate more diverse types of data, from crowdsourced to expert
Data overload will continue
Social web will increase as a tool for social change
Search engines will be challenged by governments worldwide in the areas of commercial monopoly and individual privacy
Thank You and Enjoy Your Searching!
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3014 hunter@hws.edu