Search and the ‘Net @ 2013
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council Member Libraries’ Staff
Sponsored by the Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library, 2012
For today . . .
The Searchscape
Entity-based Search
New Services and Tools
The Social Web
Bing, Blekko, DuckDuckGo, Exalead
News from Google
A Privacy Primer
Trends and Future Directions
Linklist
http://people.hws.edu/hunter/searchnet13links.htm
America at the Digital Turning Point
Center for the Digital Future – USC Annenberg School for Communication
www.digitalcenter.org/pdf/CDF_10_year_digital_turning_point.pdf
Longitudinal study over 10 years
Over 2,000 US households surveyed each year
“…online behavior changes relentlessly.”
“…constant social connection, unlimited access to information, and unprecedented abilities to purchase.”
“…online technology creates extraordinary demands on our time, major concerns about privacy, and fundamental questions about the proliferation of the digital realm…”
America at the Digital Turning Point
Selected highlights
Americans view the Internet as an important information source, yet many Internet users do not trust much of the information (there)
Our privacy is lost.
Most printed daily newspapers will be gone in about five years.
The sheer overwhelming nature of technology may be reaching a critical point.
Because of online technology, work is increasingly a 24/7 experience.
America at the Digital Turning Point
Time spent face-to-face with family in the household since the Internet
The Web Worldwide
Data from the International Telecommunication Union, 2011
Total population – ca. 7 b.
Connected to the Web – ca. 2 b.
Mobile subscriptions – ca. 6 b.
Mobile subscriptions forecast for 2017 – 9 b., with 5 b. mobile broadband connections
GLOBAL 5,981,000,000
Developed nations 1,461,000,000
Developing nations 4,520,000,000
World Internet Project
www.worldinternetproject.net
By using the Internet, people like you can better understand politics – 2009 reporting countries
New Top Level Domains (ICANN 1/11/12)
.com domains almost exhausted for new website names
  “Someone got there first”
New businesses must pay domain brokers for an address or register a new one with unnatural, insignificant words
Now possible to purchase a unique TLD (.mycompany or .ourtrademark or .ourbrand)
Fee: $185,000, with a waiting period of 2 years.
Domain Registration
Currently unrestricted: .com .info .net .org
Currently require proof of eligibility: .edu .coop .mil .gov .int .museum .xxx .aero .asia
Search engines and satisfaction
mdgadvertising.com (data from Pew Research)
How often do you actually find the information you’re looking for with search engines?
Entity-based search
The back end: how search engines worked until now
Matched query terms to terms in their crawler-created database
Results refined by
  Linkage patterns
  Popularity
  Personalization
  Other (?????)
Ambiguous terms abound
“kings” “jaguar” “Apollo”
Can a system know????
“Charles Dickens”
  This searcher wants information about and books by him
“Frank Lloyd Wright”
  This searcher wants information about and pictures of buildings designed by him
The basics….
Entity database seeded with a large “bag of nouns” and supplemented with nouns from web crawls identified through natural language processing
These nouns are mapped to another database of information related and/or relevant to those nouns through n.l.p. beyond simple text matches
Results can be customized based on click responses from previous anonymous searches for that query
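A minimal sketch of the idea on this slide, not any engine's actual implementation: a query is matched to an entity record rather than to page text, and the related items are ordered by earlier anonymous clicks. The entity records, click counts, and the function name `entity_search` are all hypothetical.

```python
# Toy entity store: every record and field name here is hypothetical.
ENTITIES = {
    "charles dickens": {
        "type": "author",
        "related": ["Oliver Twist", "Great Expectations"],
    },
    "frank lloyd wright": {
        "type": "architect",
        "related": ["Fallingwater", "Guggenheim Museum"],
    },
}

# Aggregated, anonymous click counts per (query, result) pair (invented numbers).
CLICKS = {
    ("charles dickens", "Great Expectations"): 40,
    ("charles dickens", "Oliver Twist"): 25,
}


def entity_search(query):
    """Return the entity type and its related items, most-clicked first."""
    key = " ".join(query.lower().split())  # normalize case and spacing
    entity = ENTITIES.get(key)
    if entity is None:
        return None  # no entity match; a real engine would fall back to text matching
    ranked = sorted(
        entity["related"],
        key=lambda item: CLICKS.get((key, item), 0),
        reverse=True,
    )
    return {"type": entity["type"], "related": ranked}
```

An ambiguous or unknown term such as "jaguar" returns None here, which is where a real system would need disambiguation or plain text matching.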
Yahoo Research paper - 2009
http://research.yahoo.com/files/pods09-woc.pdf
Extract structured data (addresses, prices, item #, etc.) from web documents and associate it with an entity
Link relationships between entities
  An actor to his films and other actors he has worked with
Discover categorizing information in the document’s content
  Subject headings
  Reviews ( :) or :( )
  Type of food served
The front end
Google’s Knowledge Graph: Focused on questions and answers
  Contextual box for ambiguous terms with short descriptions
Bing’s Satori: Focused on potential “actions” associated with the entity
  Searchers for a rock band usually want to buy a recording, find lyrics or get tickets
  “Snapshot” panel – entity-based results from the social web (yours and others’)
Benefits of entity-based search
Greater predictability of searcher satisfaction
Discovers related information that does not contain the search term(s)
Disambiguates many terms
Colocates related information from across the Web in a variety of filetypes
Future challenges: the “long tail”
Entities are now limited to the most popular topics
Currently no way to map complex queries to an entity or entity group
  “volcanic eruptions in the 18th century”
  “Lady Gaga concerts in a warm location”
Currently limited to English only
Including more entities in English and other languages will greatly increase processing and impact response time
http://marketingland.com/new-social-discovery-engine-bottlenose-aims-to-take-over-real-time-exploration-17024
Bottlenose: A realtime meta
Launched 8/12 (public beta)
Homepage access via login to your social network (gives Bottlenose access also)
Click into Social Search tab and search a category with no login (11/27/12)
Searches all the major social networks
Events, trending topics and people
Tabs to sort, organize and display
Mobile apps available
Terrier – www.smartfp7.eu
Open source
Research project of the EU based at U. of Glasgow
Real-time information about the “real world”
  Current traffic conditions at a specific intersection
  My friends’ favorite bar right now
“Smart Cities” concept
  Physical spaces covered in an array of intelligent sensors which communicate and can be searched for information
Zuula: a multi-meta
Web search includes Google, Bing, Yahoo, Gigablast, Exalead, Alexa, EntireWeb, Mahalo, Mojeek
Unique sources and settings available for each type of search: Web, News, Images, Tags, Blogs, Jobs
Tab through results from each source engine
Polymeta
Web search includes Google, Bing, Ask, Yahoo, Exalead
Source selection available for each search type: Web, News, Images, Videos, Twitter, Blogs
Twitter search is limited to top 50 containing your search terms
Faceted and graphed results available
Related results from other search types appear to the right
Searchteam.com
Search engine with wiki-like, real-time collaborative work spaces
“Collective knowledge from your trusted social network circles”
Web sites
Videos (YouTube)
Images
Reference (Wikipedia)
Educational
Books and Articles (Amazon)
Faceted results and suggested searches
  Related main topics
  Subtopics
  Related searches (suggested)
Searchteam.com
SearchSpaces
  Organize and share links
  Online forum for collaborative searching with friends
Small database
Educational tab not inclusive of all .edu domains
Results counts unreliable
GapVis
nrabinowitz.github.com/gapvis/index.html
Maps occurrences of geographic places in texts
Currently includes public domain texts of Graeco-Roman literature
Project of classical scholars and visualization designers in the US and UK
In beta
Why search the social web???
Public responses/attitudes/primary sources
Breaking news
Trending topics and people
Latest product reviews
Companies and competition
Security, technology topics (latest virus, etc.)
Locate individuals and their networks
  Who they follow, who follows them
  People interested in a topic/hobby
Monitor collaborations
Social Networks in the Egyptian Revolution
1/25/11-2/11/11
Enabling protesters to become citizen journalists
Mining Today’s Social Web: The trust factors
People you don’t know
  Wikipedia
  Human-created databases, directories
    “I need a few good sites on solar energy”
    Mahalo, Ipl2.org
  Q&A Services
    “How do I repair my garage door opener?”
    Yahoo Answers, Answers.com, Mahalo Answers
Mining Today’s Social Web: The trust factors
People you follow
  Twitter – human-created Tweets
    “What’s the buzz on Beyonce?”
People you know
  Post a question to friends and family
    “What type of Mac should I buy?”
  Facebook, LinkedIn, Google+, Bing (login via Facebook)
Tumblr
Microblogging platform; requires free account
Allows users to post multimedia and other content to a “tumblog”
Search options
  www.tumblr.com – posts searchable by author-supplied tags only; no keyword search
  Tumblow.com – offers keyword search
  Google site search – more comprehensive than Tumblow
    site:www.tumblr.com search term(s)
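The Google site search option can be scripted. The sketch below only builds the search URL from a site name and search terms. The helper name `site_search_url` is hypothetical, and it assumes the familiar `https://www.google.com/search?q=` address form.

```python
from urllib.parse import urlencode


def site_search_url(site, terms):
    """Build a Google URL that limits a keyword search to one site."""
    query = f"site:{site} {terms}"
    # urlencode percent-encodes the colon and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("www.tumblr.com", "solar energy"))
# https://www.google.com/search?q=site%3Awww.tumblr.com+solar+energy
```

The same helper works for any site that lacks a good internal keyword search; only the `site` argument changes.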
Twittermining
Some tweets are more “authoritative” than others…
Access to unfiltered, real-time perspective on what people are thinking and doing
Authority (and usefulness) of a tweet depends on
  Who sent it
  The number and “authority” of their followers
  When it was sent
  Documents/sites it refers to
Twittermining Tools
Twitter.com
  Requires a (free) account
  Only the latest 2 weeks available
  Searchable by hashtag (#)
    Author-designated keyword or significant term or phrase
#rochester #jobs #marketing
Twittermining Tools
Discover Tab (access via your account)
  Launched 5/12
  Offers personalized content based on your Twitter activity
  Favorites, follows, retweets, and more by people you follow
  Who to follow – Twitter accounts suggested for you based on who you follow
  Browse categories (<25) and people/organizations heavily associated with the categories
Twittermining Tools
https://twitter.com/search-advanced
  No account required
  Only the latest 2 weeks available
  Advanced search features
    Booleans
    Hashtag
    Language limit
    Author search (tweets from or to)
    “Near this place”
    Attitude – positive, negative, question
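The advanced search form assembles a single query string from its fields. A rough sketch of that assembly follows. The helper name `twitter_query` is hypothetical, and the operator spellings (`from:`, `near:`, `lang:`, and `:)`, `:(`, `?` for attitude) are an assumption about what the form generated at the time, so check them against the form itself.

```python
def twitter_query(terms, hashtag=None, from_user=None, near=None,
                  lang=None, attitude=None):
    """Combine advanced-search fields into one query string."""
    parts = [terms]
    if hashtag:
        parts.append("#" + hashtag.lstrip("#"))
    if from_user:
        parts.append("from:" + from_user.lstrip("@"))
    if near:
        parts.append(f'near:"{near}"')
    if lang:
        parts.append("lang:" + lang)
    if attitude:
        # Assumed attitude markers: smiley, frown, question mark
        markers = {"positive": ":)", "negative": ":(", "question": "?"}
        parts.append(markers[attitude])
    return " ".join(parts)


print(twitter_query("library jobs", hashtag="rochester", attitude="question"))
# library jobs #rochester ?
```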
Twittermining Tools
Storify.com
Users build social stories, bringing together media scattered across the Web into a coherent narrative
Access material shared with and by you and public posts
Postings, status updates, photos, videos, podcasts from Twitter, Facebook, YouTube, Flickr, Instagram and more
Discover others with similar interests
Requires (free) account, via Facebook or
The Fallacy of the Superior Search Engine
Conrad Saam*
Is there a difference in the quality of search results from Google and Bing?
Data set of 100 difficult queries
  “clean crayon off an led t.v. screen”
  “Who was Kim Jong Un’s mother?”
  “wii new release rumors”
*http://searchengineland.com/google-fails-to-trounce-bing-again-the-fallacy-of-the-superior-search-engine-revisited-107238
The Fallacy of the Superior Search Engine
Evaluative factors
  Timeliness
  One-click access to information
  Volume of content
  Lack of spam
  Authoritative sites appear in first 3 results
The winner???
  Google 296, Bing 274
  “Bing needs to be a much better search engine than Google to make it worth the switch”
Microsoft’s Bing
Redesigned 6/8/12
Social search results now located in the new Social Sidebar (Facebook-based)
When logged in through Facebook
  Ask friends
  Friends who might know
  People who know
  Feed of questions you’ve asked your FB friends through Bing
Without a FB login
  Sidebar results come from public posts
What Bing is NOW
Travel – Price Predictor
Video – Hover and get a preview
Music
  Artists – All content related to the artist (entity-based search)
  Events – FanSnap (meta for ticket purchasing)
Shopping – Hottest deals on the web right now
Maps – Malls and Airports added
Everywhere – Xbox, Mobile, iPad
Curating the web with Blekko
http://blekko.com (still in beta!)
Human/crawler service
Blekko (human) editors create “topic” and “built-in” slashtags used to label content in the Blekko crawler database.
Registered users can create their own tags for any site in the Blekko database for a personal, searchable web
Slashtags help refine results and eliminate spam
Small but well curated database
“AdSpam” algo blocked 1.5 m. sites in the first 6 months
Blekko: Under the hood
3 search options
  Web results
  Slashtags (human/expert curation)
  Likes (Facebook friends’ curation)
Adding a slashtag limits the search to those sites so tagged
Note: adding multiple Blekko “topic” slashtags limits the search to sites which have ALL the tags
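The "ALL the tags" rule amounts to a subset test: each added slashtag narrows the results. A small sketch with an invented site-to-tag table (the site names, tags, and function name are hypothetical, not Blekko data or Blekko's API):

```python
# Hypothetical curated table: site -> set of topic slashtags assigned to it.
SITE_TAGS = {
    "example-health.org": {"health", "nutrition"},
    "example-recipes.com": {"nutrition", "cooking"},
    "example-news.com": {"news"},
}


def sites_for_slashtags(*slashtags):
    """Return the sites that carry every one of the given slashtags."""
    wanted = {tag.lstrip("/") for tag in slashtags}
    # A site qualifies only if the wanted tags are a subset of its tags.
    return sorted(site for site, tags in SITE_TAGS.items() if wanted <= tags)


print(sites_for_slashtags("/nutrition"))
# ['example-health.org', 'example-recipes.com']
print(sites_for_slashtags("/nutrition", "/cooking"))
# ['example-recipes.com']
```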
Blekko this year
Slashtags now automatically added to searches in 500 broad categories based on aggregated anonymous search behavior.
For suggested slashtags-Search term/
Adding /monte gives you results from 3 engines; sources revealed only after you select the most relevant results set
Received substantial investment from major Russian search engine Yandex
DuckDuckGo – http://ddg.gg
Home and search results pages redesigned
Related “Search Suggestions” on results pages
“Goodies” – user-supplied questions with answers in 20 broad categories
  Entertainment
  Food & Drink
  Travel
  Programming
  Sysadmin
  Web Design
Exalead – http://exalead.com/search
Enterprise search company based in France with free web search as product demo
Advanced search options appear as questions
Database well maintained
Faceted search results
Used by several of the major metaengines
Personalization and Social Networks in Google Results: A Timeline
2005 – Sites you visited given a boost (Opt-in via Google account)
2009 – Sites your IP address visited given a boost by default (Opt-out possible)
2009 – Sites mentioned by your personal social network given a boost, but separated from main results (Opt-in)
2011 – Social network results blended with main results (Opt-in)
Personalization and Social Networks: 2012 – Search Plus Your World
Boosts in results ranking
  Based on IP search behavior (Opt-out)
  Based on personal search behavior (Opt-in)
  Based on your social networks (Opt-in)
  Based on Google+ public posts (Default; multiple steps needed to opt-out)
  Based on your private Google+ network posts (Opt-in)
IP-based personalization
To permanently opt-out go to Search Settings
To opt-out on a per-search basis use the toggle (top right)
Personalization based on your personal search behavior is still opt-in
Google+ plus.google.com
Google’s social network (requires a Google account)
Launched 9/19/11 (access to Twitter ended 7/2/11)
Currently over 400 m. users, 100 m. active on a monthly basis
  Facebook currently over 1.01 b. active users
Offers “hangouts” – video chat rooms within the social network
Businesses and organizations allowed
Google+
“Google+1” allows a Google+ member to give a site a vote of approval
Web search results include +1 votes, sometimes location-based
Best access to content is through Google: site:plus.google.com search term(s)
Search Lesson Plans and Common Core Standards
Part of Google’s search education initiative
5 main topics with beginner, intermediate and advanced levels
  Picking the right search terms
  Understanding search results
  Narrowing a search to get the best results
  Searching for evidence for research tasks
  Evaluating credibility of sources
google.com/insidesearch/searcheducation/lessons.html
Search Lesson Plans
Focus is using Google, but adaptable to other sources
Each plan lists Common Core Standards addressed
Include illustrative slides and suggested assessments of student work
“A Google-a-day challenge” questions with answers
Good strategies for deep web searching in Advanced Level of Lesson #1
AAP Lawsuit settled
2005 – Association of American Publishers and McGraw-Hill, Pearson, Penguin, John Wiley, Simon & Schuster allege copyright violation in the Library scanning project
2012 – Google settles with publishers, who may now remove their books or journals from the Library project
Authors Guild suit remains unsettled
Content Removal Requests 1/12 – 6/12
Top 6 countries
Country Total Requests
US 4167
UK 3193
Brazil 2310
Turkey 2084
Germany 1903
France 1250
Google’s policy for its account-based services
New unified privacy policy in effect 3/1/12
User profiles and individual search behavior will be shared among all Google services that require a login
Account holders cannot opt-out of this sharing
Separate privacy policies still in effect for Google Books and Chrome
Google’s policy for services not requiring an account
Covers Search, YouTube
IP-based personalization in effect since 2009
“We will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent”
Remarketing or retargeting in the Google ad network
Company and other websites tag visitors with an IP-based (personally anonymous) cookie
When you visit other sites in Google’s ad network you will see ads from sites you have visited before based on these cookies
How to opt-out of remarketing/retargeting in your browser
  Turn off Web history
  Clear/Remove Web history
  Accept no cookies
Bing’s privacy policy
For MS services that require a Windows Live ID
  “…information collected through one MS service may be combined with information obtained through other Microsoft services.”
Signing into one service may automatically sign you into other Microsoft services
To opt-out
  Use separate browsers for each MS service you access
  Sign in and out of your accounts throughout the day to de-couple specific activities
DuckDuckGo
Does not collect or share personal information
No browser cookies stored
No personally identifiable or IP-based search histories stored
No IP addresses stored
Very comprehensive with high-quality search results
Search Engine Trends in 2012
Reversal in transparency at the major services
Increasing personalization as the norm
Explosion of social network influence
Stronger anti-competitive allegations
Modest Bing marketshare gains
“The nature of the Internet is undergoing a paradigm shift” – Matthew Berk (Zyxt Labs)
http://zyxt.com/post/26851542949/study-of-1-3-billion-urls-22-of-web-pages-reference
2012 study of 1.3 billion URLs
22% of web pages contain Facebook URLs
Among 500 m. hardcoded links to Facebook only 3.5 m. are unique
URLs from Common Crawl (open repository of web crawl data that can be accessed and analyzed by everyone)
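To put the study's figures in proportion, the arithmetic can be worked through directly. This uses only the numbers quoted on the slide and assumes the 22% applies to the full set of 1.3 billion URLs; the variable names are illustrative.

```python
total_urls = 1_300_000_000      # URLs in the study
facebook_share = 0.22           # share of pages containing Facebook URLs
hardcoded_links = 500_000_000   # hardcoded links pointing to Facebook
unique_targets = 3_500_000      # distinct Facebook URLs among those links

pages_with_facebook = total_urls * facebook_share
unique_ratio = unique_targets / hardcoded_links

print(f"{pages_with_facebook:,.0f} pages reference Facebook")
# 286,000,000 pages reference Facebook
print(f"{unique_ratio:.1%} of the hardcoded links are unique")
# 0.7% of the hardcoded links are unique
```

In other words, fewer than one in a hundred of those links points somewhere distinct; the rest repeat the same Facebook addresses.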
“The Internet is shifting….” – M. Berk
from unstructured to structured content
  Structured content can be parsed and formatted into any other type of content
  Unstructured content – static html
from websites to entities
  Nodes in social and other networks that contain or link to websites and other content
from links to connection
  Growth of business and personal presence on the social web
In the future ---
Mobile search will continue to grow rapidly
Entity-based search will continue to develop
Personalization will grow but more slowly as users better understand the consequences
Social networks will continue as powerful tools for grassroots political movements
Web access and web search will attract more government scrutiny worldwide
Thank You and Enjoy Your Searching!
Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3014 [email protected]