
Search and the ‘Net @ 2015

Michael Hunter Reference Librarian

Hobart and William Smith Colleges

For Rochester Regional Library Council Member Libraries’ Staff

Sponsored by the Rochester Regional Library Council

Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library, 2015

For today . . .

The Searchscape

Current Evolution of Search

New Services

The Social Web and Research

Data Visualization

Bing, Yahoo and DuckDuckGo

Google

Linklist

http://people.hws.edu/hunter/searchnet15links.htm

USC Annenberg’s Digital Future Report 2014 http://www.digitalcenter.org/wp-content/uploads/2014/12/2014-Digital-Future-Report.pdf "General Internet Activities"

E-Reading Rises as Device Ownership Jumps, by Kathryn Zickuhr and Lee Rainie http://www.pewinternet.org/2014/01/16/e-reading-rises-as-device-ownership-jumps

[Chart: American adults 18+, % who read at least one book in that year]

[Chart: American adults 18+, % who own each device]

New Top Level Domains

First made available 1/29/14

Over 150 now live on donuts.co (2/15/15)

Content-significant

.bike, .energy, .delivery, .legal, .guru

Brand-specific – “vanity domains”

.android, .walmart, .nyc

Allow for non-Roman scripts: Arabic, Chinese, etc.

Require proof of identity/relationship to TLD

Applying for a unique TLD costs $185,000

Growth of Query Types over 1 year http://searchenginewatch.com/sew/how-to/2383498/how-will-voice-search-impact-a-search-marketers-world

Voice Search http://googleblog.blogspot.com/2014/10/omg-mobile-voice-survey-reveals-teens.html

Google's Voice Search (2010) 36 languages

Apple's Siri (2011) 11 languages

Microsoft's Cortana (2014) 6 languages

Study by Northstar: 1,400 American smartphone users (400 aged 13-17, 1,000 aged 18+)

40% - ask for directions

39% - dictate a text message

32% - make a phone call

27% - check the weather

23% (18+) - ask questions about cooking

51% (13-17), 32% (18+) - "just for fun"

When do we use voice search?

[Chart: % of teens (13-17) and adults (18+) who use voice search with friends, in the bathroom, while cooking, while exercising, and while watching TV]

Web Search in 2014: Who’s crawling the Web?

Google

Bing (which also powers Yahoo! search)

Gigablast

Blekko

DuckDuckGo

Baidu

Yandex

Market Share Growth, Oct. 2013 - Oct. 2014 (www.comscore.com)

[Chart: market share (%) of Google, Bing, Yahoo!, Ask and AOL, 2013 vs. 2014]

The current evolution of search: WAY beyond keyword matching

Semantic Processing: internal to the search engine

Predictive Operations: from data about and from the user

ANSWERS, NOT JUST SEARCH RESULTS

Semantic Processes

NLP Parsing

Pattern Matching

Knowledgebase Entities

Structured Data

Term Frequency Data

NLP Parsing

Machine-learned meaning derived from human or natural language speech or text. (Adapted from Wikipedia)

Analysis of large sets of documents (corpora) that have been human-annotated with parts of speech and other semantic information

Machine “learns” the relationships and meaning through statistical inference

Visualization at http://nlpviz.bpodgursky.com
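The statistical idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming a tiny hypothetical hand-tagged corpus: the program "learns" each word's most frequent part-of-speech tag from the human annotations, which is the simplest form of statistical inference from a corpus. Real NLP parsers use far larger corpora and richer models, and this is not the method of any particular search engine.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, part-of-speech) pairs, standing in for
# a large human-annotated corpus.
TAGGED_CORPUS = [
    [("the", "DET"), ("librarian", "NOUN"), ("searches", "VERB"), ("the", "DET"), ("web", "NOUN")],
    [("a", "DET"), ("search", "NOUN"), ("returns", "VERB"), ("answers", "NOUN")],
    [("users", "NOUN"), ("search", "VERB"), ("the", "DET"), ("catalog", "NOUN")],
    [("the", "DET"), ("search", "NOUN"), ("failed", "VERB")],
]

def train(corpus):
    """Count how often each word carries each tag, then keep the most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(model, sentence, default="NOUN"):
    """Tag a new sentence; words never seen in training get a default tag."""
    return [(w, model.get(w, default)) for w in sentence.lower().split()]

model = train(TAGGED_CORPUS)
print(tag(model, "The librarian searches the catalog"))
```

Because "search" appears twice as a noun and once as a verb in the toy corpus, the model tags it as a noun; ambiguity like this is why real systems also look at the surrounding words.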

Knowledgebase Entities: Google’s Knowledge Graph and Bing’s Satori

Google’s Knowledge Graph – rooted in the (human) community-created entities in Freebase

Crowdsourcing is too slow and often ignores specialized areas of knowledge and non-English content

Knowledge Vault – Automated extraction of raw data and creation of entities derived from that data

DOM trees: structures that help browsers represent and interact with documents in HTML and other formats (Wikipedia)
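To make automated extraction from DOM trees a little more concrete, here is a minimal sketch using Python's standard `html.parser`. It assumes a simple HTML table whose first column names an entity and whose header row names attributes, and it turns each cell into an (entity, attribute, value) triple. The table contents are illustrative; Knowledge Vault's actual extractors are far more elaborate and are not shown here.

```python
from html.parser import HTMLParser

class TableTripleExtractor(HTMLParser):
    """Walk the DOM events of an HTML table and collect its cells row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell, self._in_cell = [], None, [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell, self._cell = True, []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)

def table_to_triples(html):
    parser = TableTripleExtractor()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    # First column = entity; remaining header cells = attribute names.
    return [(row[0], attr, value) for row in body for attr, value in zip(header[1:], row[1:])]

PAGE = """
<table>
  <tr><th>Institution</th><th>Location</th><th>ZIP code</th></tr>
  <tr><td>Hobart and William Smith Colleges</td><td>Geneva, NY</td><td>14456</td></tr>
</table>
"""
print(table_to_triples(PAGE))
```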

More Semantic Processing…

Term Frequency Data

Frequency, proximity, order

Aids in discovery across subject areas, filetypes and entire domains
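The three signals named above (frequency, proximity, order) can be computed for a pair of query terms as follows. This is an illustrative sketch only: the sample text is made up, and no engine's actual ranking formula is implied.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_stats(text, term_a, term_b):
    """Return (frequency of term_a, frequency of term_b, proximity, order)."""
    tokens = tokenize(text)
    freq = Counter(tokens)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    # Proximity: smallest distance, in words, between any occurrences of the two terms.
    gaps = [abs(i - j) for i in pos_a for j in pos_b]
    proximity = min(gaps) if gaps else None
    # Order: does term_a appear before term_b anywhere in the text?
    in_order = any(i < j for i in pos_a for j in pos_b)
    return freq[term_a], freq[term_b], proximity, in_order

doc = "Voice search is growing. Teens use voice search with friends; adults use search for directions."
print(term_stats(doc, "voice", "search"))
```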

Pattern Matching Algorithms

Focuses on recognition of patterns and regularities in text, data and images
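At its simplest, pattern matching in text can be done with hand-written regular expressions. The sketch below, which uses two hypothetical patterns, pulls US-style phone numbers and short dates out of a string; production systems learn far more general patterns, including patterns in data and images.

```python
import re

# Two hand-written patterns (illustrative only).
PHONE = re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}")        # e.g. (315) 781-3014
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")     # e.g. 4/15/13

def find_patterns(text):
    """Return every phone number and date found in the text."""
    return {"phones": PHONE.findall(text), "dates": DATE.findall(text)}

sample = "Tweets sent 4/15/13-4/18/13; questions to (315) 781-3014."
print(find_patterns(sample))
```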

Structured Data

Structured Web tables and data sets (.xls, .kml, .sdf)

Human-created tags: Schema.org

Schema.org

Organization backed by Google, Bing, Yahoo! and other engines to standardize metadata for use by crawler-based services.

Helps create "real" answers and "rich snippets"

Schema example: Restaurant with a menu
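A restaurant-with-menu example might look roughly like the JSON-LD below, generated here from Python so the structure is easy to see. The restaurant, phone number and menu item are hypothetical, and the property names (`servesCuisine`, `hasMenu`, `hasMenuSection`, `hasMenuItem`) follow the current schema.org vocabulary, which may differ from the markup shown on the original slide in 2015.

```python
import json

restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Bistro",          # hypothetical restaurant
    "servesCuisine": "Italian",
    "telephone": "+1-555-0100",        # fictional number
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Geneva",
        "addressRegion": "NY",
    },
    "hasMenu": {
        "@type": "Menu",
        "hasMenuSection": {
            "@type": "MenuSection",
            "name": "Dinner",
            "hasMenuItem": [
                {
                    "@type": "MenuItem",
                    "name": "Margherita pizza",
                    "offers": {"@type": "Offer", "price": "12.00", "priceCurrency": "USD"},
                },
            ],
        },
    },
}

# Crawlers read JSON-LD embedded in the page inside a script tag.
snippet = '<script type="application/ld+json">\n' + json.dumps(restaurant, indent=2) + "\n</script>"
print(snippet)
```

JSON-LD is only one of the syntaxes schema.org supports; the same vocabulary can also be written as microdata or RDFa attributes inside the page's HTML.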

Predictive Operations: Inferring the user’s intent

“The Holy Grail of Search”

Location-based results – IP and GPS

Weather, entertainment, restaurants…

Anonymous past searches and user behavior

Personal data volunteered by user

Time of day

Device used
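Two of the signals listed above, location and time of day, can be combined into a toy re-ranking of local results. This is a hypothetical sketch: the places, opening hours and the penalty of 100 for closed businesses are invented for illustration, and real engines weigh many more signals. Distance uses the standard haversine great-circle formula.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def rank(places, user_lat, user_lon, hour):
    """Sort places by distance from the user, pushing closed places far down."""
    def score(place):
        distance = haversine_km(user_lat, user_lon, place["lat"], place["lon"])
        open_now = place["opens"] <= hour < place["closes"]
        return distance + (0 if open_now else 100)  # hypothetical penalty
    return sorted(places, key=score)

# Hypothetical restaurants near Geneva and Rochester, NY.
PLACES = [
    {"name": "Cafe A", "lat": 42.869, "lon": -76.983, "opens": 7, "closes": 15},
    {"name": "Diner B", "lat": 42.875, "lon": -76.990, "opens": 6, "closes": 22},
    {"name": "Bistro C", "lat": 43.156, "lon": -77.608, "opens": 11, "closes": 23},
]

# A user in Geneva searching at 8 p.m.
for place in rank(PLACES, 42.868, -76.985, hour=20):
    print(place["name"])
```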

Semantic Processing and Predictive Operations together

--Correctly interpret the query, or a portion of the query

--Give a “best guess” answer based on highly trusted sources (knowledgebase) and similar searches

--Aggregate and grow the knowledgebase through iterative, real-time web crawls

Discovery Apps: Personalized Search on Steroids

Combines your

Personal preferences

Location

Demographic characteristics

Social network data

People, Preferences, Interests, Events

Suggests entertainment, restaurants and more

Chat with your social network friends

“Current events you may like within X miles”

Gravy: free on iTunes

Personal Assistant Apps

Connects to your

E-mail

Calendar

Facebook events

Prompts for transportation times, quickest routes

Includes some discovery and chat features

Relies heavily on user-supplied personal data

Sunrise, Tempo, et al.

Apps and the Deep Web

Currently, crawler-based search engines cannot access content in apps:

Posts

Links

Personal data

Education apps continue to grow in content, quality and use

Google is working on indexing them.

New Services

Izik.com

Search app by Blekko search engine

Launched as a tablet app in 2013

Now accessible via desktop/laptop/smartphone

Searches Web, Twitter, News sources

Dynamically clustered results

Focuses on popular culture, shopping, news

Individual results can be shared via social networks

Qwant: A fresh approach to search

Aims to offer a European-based service that respects users’ privacy

Launched in France in 2013

Search verticals offered:

Web, Media (News), People (Social Networks)

Boards (Online Forums, mostly European)

Results clusters offer “refine search”

Web, News, Social, Shopping

Qnowledge Graph (from Wikipedia and other general sources)

16 interface languages, which influence search results

Binpad: Hierarchical results clustering

Search options: Web, Wiki, Pubmed

Results clusters include other closely related items, often in hierarchical order

All results include images

Verticals include News, Edu, TV, Editor

Editor still in development

Project of Xdroid, Inc., an enterprise search software developer in Hungary
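Izik and Binpad do not publish their clustering methods, so the following is only a generic illustration of hierarchical results clustering: single-linkage agglomerative clustering of result snippets by word overlap (Jaccard similarity). The snippets and the 0.2 stopping threshold are hypothetical. The merge history is the hierarchy; the clusters left when merging stops are what a results page would show.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sets of words."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(snippets, threshold=0.2):
    """Single-linkage agglomerative clustering; returns (merge history, final clusters)."""
    words = [set(s.lower().split()) for s in snippets]
    clusters = [[i] for i in range(len(snippets))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: similarity of the closest pair of members.
                sim = max(jaccard(words[i], words[j]) for i in clusters[x] for j in clusters[y])
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        sim, x, y = best
        if sim < threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[x] + clusters[y]
        merges.append((sim, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges, clusters

results = [
    "jaguar car dealer prices",
    "jaguar car reviews prices",
    "jaguar cat habitat rainforest",
    "jaguar cat diet rainforest",
]
merges, clusters = cluster(results)
print(clusters)
```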

CC Search: search.creativecommons.org/

Searches media licensed under Creative Commons or in the public domain

Flickr, YouTube, Jamendo, Wikimedia Commons, SoundCloud and others

Some sponsored results appear that are not in the public domain

Verify use conditions for each result

Search and the Dark Web

Dark Web: networks with server addresses intentionally obscured

Often house online criminal activity

Includes Tor network hidden services, which use the .onion TLD

Only accessible via Tor’s private browser

Content is not password-protected, but is not accessible to crawler-based services due to lack of linkage

Memex: DOD’s Dark Web Search Engine

Software to visualize and organize big data

Searches text, handwritten text, images, geographic data embedded in photos….

Identifies hidden relationships among websites, deep web sites and forums

Can access Dark Web obscured networks

Used in online criminal investigations

Sex-trafficking ads

ISIS-funding and other money laundering

Contact memex@darpa.mil

http://www.wsj.com/articles/sleuthing-search-engine-even-better-than-google-1423703464

The Social Web and Research

Why search the social web?

Public responses, attitudes, opinions

Breaking news, events

Trending topics and people

Latest product reviews

First-hand accounts of events in text, image, audio and video (primary sources)

Security, technology topics (latest virus, etc.)

Locate individuals/experts and their networks

People interested in a topic/hobby

Social web research projects

BuzzSumo - meta for social networks

Discovers the most shared content

Crawls FB, TW, LinkedIn, Pinterest, Google+

Advanced search features

Boolean search; URL or domain search

Author search; Twitter user search

Filters: article, infographic, guest post, giveaways, interviews, videos, date

Requires a (free) account; fee-based options also available

Twitter Search - search.twitter.com

Now includes every public Tweet since 2006

Searchable with all search features previously available at twitter.com/search-advanced

Indexes ca. ½ trillion tweets and grows by several billion tweets a week.

Tweets deal with “everyday human experiences to major historical events”

Entire TV and sports seasons; conferences; places; events; industry discussions

Long-lived hashtags across countries, ideologies

#ScotlandDecides #HongKong #Ferguson #Hamas

Opinion Mining with TW

Identify TW users with differing opinions on a debated topic

Linguistic analysis of ca. 1 million public tweets with “guns” or “gun control” sent 4/15/13-4/18/13

Members of TW lists such as “Prevent Gun Violence” or “Guns Save Lives” – Sample of 263

85 for reforms, 178 against reforms

Belonging to no relevant TW lists – Sample 500

276 for, 120 against, 204 did not voice an opinion (re-tweets of relevant tweets from others)

Ashwin Rajadesingan and Huan Liu, “Identifying Users with Opposing Opinions in Twitter Debates” http://www.public.asu.edu/~huanliu/papers/sbp14.pdf
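As a much simpler illustration of opinion mining than the study above, the sketch below labels users "for", "against" or "none" by counting stance words in their own tweets and ignoring plain re-tweets. The word lists and users are hypothetical, and this lexicon approach is not the method Rajadesingan and Liu used; it only shows the general shape of the task.

```python
from collections import Counter

# Hypothetical stance lexicon, for illustration only.
PRO_REFORM = {"#preventgunviolence", "background", "checks"}
AGAINST_REFORM = {"#gunssavelives", "#2a", "rights"}

def stance(tweets):
    """Label a user 'for', 'against' or 'none' from the words in their own tweets."""
    score = 0
    for tweet in tweets:
        text = tweet.lower()
        if text.startswith("rt "):
            continue  # plain re-tweets are not counted as the user's own opinion
        for word in text.split():
            score += (word in PRO_REFORM) - (word in AGAINST_REFORM)
    if score > 0:
        return "for"
    if score < 0:
        return "against"
    return "none"

users = {
    "user1": ["We need universal background checks #PreventGunViolence"],
    "user2": ["Protect our rights #GunsSaveLives"],
    "user3": ["RT @someone: background checks now"],
}
tally = Counter(stance(tweets) for tweets in users.values())
print(tally)
```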

TW as social indicator and health predictor: UPenn study

Linguistic and emoticon analysis of geotagged tweets combined with health data from over 1,300 US counties

Tweets expressing negative emotions (stress, anger, fatigue) are associated with higher heart disease risk

Tweets with positive emotions (optimism, enthusiasm) are associated with lower levels of risk

http://www.upenn.edu/pennnews/news/twitter-can-predict-rates-coronary-heart-disease-according-penn-research

Topic-based Twitter Lists

Can provide very latest top news, tips and cutting-edge research in a topic or interest

Slowly gaining popularity; lists require set-up time and maintenance

Locating lists using hashtags

topic or associated element

#tax #IRS

person, place, event associated with topic

#olympians #worldcup

“101 best twitter lists to follow” http://www.postplanner.com/101-best-twitter-lists-to-follow/

music.twitter.com: Twitter’s own music location service

Education and the social searchscape

Offers first-hand accounts of events and conditions

Informative of current world cultures and trends on a wide range of subjects

Gateway to blogs and other online communication that can enhance scholarship

Channel for updates to educational programs

Embedded links and other information often highly relevant and recent

Requires careful evaluation of information found there

Data Visualization

Enables patterns to emerge in big data

More accessible to visual learners

Facilitates sharing across languages

Can be made compatible with a wide range of data formats

Responsive to real-time changes

Showcase of 2014 projects:

http://flowingdata.com/2014/12/19

Bing, Yahoo and DuckDuckGo

Looking for a niche

Bing and Yahoo represent 29% of all US searches http://comscore.com 12/1/14

Yahoo

Focus is on local and personalized search results

Now partnered with Yelp, a local business search engine

Bing

Focus is on lifestyle, travel, images, maps

Social search results (FB, TW) in a sidebar

Bing Image Search

High quality images

Related search offered, based on descriptive text associated with the image

Clustering by topic

Filters: size, color, type, layout, people, date, license, SafeSearch

Image Match: search with a URL or an image you upload
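Bing does not document how Image Match works, so as an illustration of one common near-duplicate technique, here is an "average hash" sketch. It assumes the images have already been shrunk to small grayscale grids (a real pipeline would do that step with an imaging library), and the 5-bit distance threshold is a hypothetical choice. Each pixel becomes 1 if it is brighter than the image's mean and 0 otherwise, so uniform brightness changes leave the hash unchanged.

```python
def average_hash(pixels):
    """pixels: 2-D list of grayscale values, already shrunk (e.g. to 8x8)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of bit positions where two hashes differ."""
    return sum(a != b for a, b in zip(h1, h2))

def is_match(img1, img2, max_distance=5):
    """Treat two images as the same picture if their hashes are close."""
    return hamming(average_hash(img1), average_hash(img2)) <= max_distance

# Tiny 4x4 stand-ins for real images.
ORIGINAL = [[10, 200, 10, 200], [200, 10, 200, 10], [10, 200, 10, 200], [200, 10, 200, 10]]
BRIGHTER = [[p + 20 for p in row] for row in ORIGINAL]   # same picture, brightened
INVERTED = [[255 - p for p in row] for row in ORIGINAL]  # a genuinely different picture

print(is_match(ORIGINAL, BRIGHTER), is_match(ORIGINAL, INVERTED))
```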

Entity Comparisons: Google vs. Bing

Bing for Schools http://www.bing.com/classroom

Safe search filters and ad-free environment

Requires registration by a school

Not possible to access it for home use

Daily lesson plan available based on the image used each day on the Bing homepage

Excludes Bing apps

DuckDuckGo http://ddg.gg

Offers anonymous search functionality

Popularity spiked after the NSA PRISM surveillance scandal

Does not save search history of any type

G. does, using it "to increase relevancy"

Included as a search option in Apple's latest version of Safari

Has been blocked in China!

Google

Knowledge Vault: Beyond the Graph

Knowledge Graph seeded from Freebase entities and human additions

Automated generation of entities increases their number and discovers hidden relationships among entities and their attributes

Entities now appear at top of results page with related topics or other relevant information

Type of additional information varies depending on entity

Graph database stores data in nodes and relationships.

http://www.oaddo.org/home
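The nodes-and-relationships idea can be sketched in memory: each fact is an edge between two nodes, and a breadth-first search finds an indirect ("hidden") chain of relationships between two entities. This is a toy illustration, not how a production graph database such as Neo4j stores or queries data; the relationship names are hypothetical, and the facts are taken from this presentation's own title slides.

```python
from collections import deque

# Facts as (subject, relationship, object) edges.
FACTS = [
    ("Search and the 'Net @ 2015", "presented_by", "Michael Hunter"),
    ("Search and the 'Net @ 2015", "sponsored_by", "Rochester Regional Library Council"),
    ("Michael Hunter", "works_at", "Hobart and William Smith Colleges"),
    ("Hobart and William Smith Colleges", "located_in", "Geneva, NY"),
]

def build_graph(triples):
    """Adjacency list; each edge is stored in both directions so it can be walked either way."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
        graph.setdefault(obj, []).append(("inverse:" + rel, subj))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search for the shortest chain of relationships from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

graph = build_graph(FACTS)
for step in find_path(graph, "Rochester Regional Library Council", "Geneva, NY"):
    print(step)
```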

Right to be Forgotten ruling: Court of Justice of the European Union, May 2014

G. and other search engines must remove results deemed to be "inadequate, irrelevant or no longer relevant, or excessive in relation to the purposes for which they were processed and in the light of the time that has elapsed."

http://curia.europa.eu/jcms/upload/docs/application/pdf/2014-05/cp140070en.pdf

Does not require the content itself to be removed from the servers on which it is located

Makes the content more difficult to find

Of the initial 12,000 removal requests

33% - fraud accusations

20% - related to violent/serious crimes

12% - related to child pornography arrests

App indexing

G. indexes content from apps that open their content to G.'s crawlers (7/3/14)

Results from apps are combined with mobile search results if the searcher has that app installed on their mobile device.

Will play an increasing role in web search across devices

Google’s device-dependent results sets

The intent and context of queries vary between devices

G.'s search results on mobile devices vary from those on desktops or laptops by as much as 43%

Mobile results

Tend to focus more on local-based results

Display pages with smaller file size, on average

Based on analysis of first 30 results for 10,000 keyword searches

“US Google Ranking Factors 2014” http://www.searchmetrics.com/news-and-events/mobile-optimization/

http://www.comscore.com/Insights/Presentations_and_Whitepapers/2013/The_Digital_World_in_Focus
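A difference figure like "as much as 43%" can be computed by comparing the top-N result lists returned for the same keyword on each device. The sketch below is a hypothetical version of such a metric (the share of mobile results absent from the desktop list, averaged over keywords); the exact method used in the Searchmetrics study is not described here, and the result lists are invented.

```python
def percent_different(mobile, desktop):
    """Share (%) of URLs in the mobile top-N that do not appear in the desktop top-N."""
    mobile_set, desktop_set = set(mobile), set(desktop)
    return 100 * len(mobile_set - desktop_set) / len(mobile_set)

def average_difference(results_by_keyword):
    """Average the per-keyword difference over all keywords analyzed."""
    diffs = [percent_different(m, d) for m, d in results_by_keyword.values()]
    return sum(diffs) / len(diffs)

# Hypothetical top-5 lists; the study itself used the first 30 results for 10,000 keywords.
mobile_pizza = ["example.com/a", "example.com/b", "example.com/c", "example.com/d", "example.com/e"]
desktop_pizza = ["example.com/a", "example.com/b", "example.com/c", "example.com/x", "example.com/y"]
weather = ["example.org/1", "example.org/2", "example.org/3"]

results = {"pizza": (mobile_pizza, desktop_pizza), "weather": (weather, weather)}
print(percent_different(mobile_pizza, desktop_pizza), average_difference(results))
```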

Maps Gallery, In-depth articles

Interactive digital thematic map collections

Historic city plans, climate trends, housing affordability, shipwrecks

Up-to-date evacuation routes

In-depth articles caveat

"How to write the In Depth Articles that Google Loves" copyblogger.com

Content farm orientation?

Requires careful evaluation of each item; unvetted websites in particular

Google's tech projects

Google for Kids - under 13; more parental controls

Project Loon - Provide Web access via solar-powered, high-altitude balloons

Self-driving cars

Google Glass 2

Smart contact lenses

Continuous health monitoring via disease-detecting nanoparticles

Liftware - stabilized spoon for tremor sufferers

"Google Tracker 2015" http://arstechnica.com

A bit of historical perspective: Top 5 http://www.washingtonpost.com/news/the-intersect/wp/2014/12/15/from-lycos-to-ask-jeeves-to-facebook-tracking-the-20-most-popular-web-sites-every-year-since-1996/

Search in the Future

Will continue to be more specialized

Shopping: Amazon; Travel: Kayak

Movies: IMDB; Real-time news: TW

Discovery software will integrate more diverse types of data, from crowdsourced to expert

Data overload will continue

The social web will grow as a tool for social change

Search engines will be challenged by governments worldwide in the areas of commercial monopoly and individual privacy

Thank You and Enjoy Your Searching!

Michael Hunter

Reference Librarian

Hobart and William Smith Colleges

Geneva, NY 14456

(315) 781-3014 hunter@hws.edu
