SEARCH ENGINE DEPENDENCY AND ITS INFLUENCE ON DATA QUALITY By Ronan CHARDONNEAU

Uploaded by nanor on 03-Sep-2014

DESCRIPTION

Conference slides about search engine dependency and its influence on data quality

TRANSCRIPT

Page 1: Search Engine Dependency Conference

SEARCH ENGINE DEPENDENCY AND ITS INFLUENCE ON DATA QUALITY

By Ronan CHARDONNEAU

Page 2

Index

I - Introduction to the world of search engines
II - Risks of search engine dependency
III - How to solve the equation?
IV - Future of Google and information research
V - Conclusion

I-Introduction II-Risks III-Solutions IV-Future V-Conclusion

Page 3

The World of Search engines

Page 4

Market configuration: top 10 search websites in the world for August 2007

Target: users over 15 years old, at home and at work. Source: comScore qSearch 2.0

Page 5

Leaders per country

Source: map made using data from « Alexa, the Web Information Company » (2008).

Page 6

A win or lose market

Page 7

Approximation of the languages of content available on the Internet

Source: Internet World Stats

Page 8

What has already been proved

• Studies show that the Internet is the main information provider (at least in Europe and America);
• When surfing the Internet, search engines are the most used websites;
• People trust search engine results;
• When searching the Internet, people mainly use one single search engine;

Page 9

Brief summary

• Google is the market leader; its followers are far behind;
• 8 search engine leaders and probably eight continents on the Internet;
• A market defined by the adoption of a standard for searching (<50%);
• Content is mainly in English, Chinese is important, and there is quality content in Japanese, German and Korean;
• Internet users cannot live without search engines and are loyal to a specific one;

Page 10

Risks of search engine dependency and its influence on data quality

Page 11

Definition: the behaviour of never questioning the results coming from one single search engine.

It normally starts when you hear sentences such as:

- "Why should I bother using other search engines because I find everything I want with Google?"

- "Am I really taking any risks when I use Google?"

- Google is in the top 100 websites (or higher) of every country in the world;

- Google has been recognized as the most powerful brand;

Page 12

• Who is Google? Well... it is our friend;
• We can carry it everywhere; it is relevant and convenient (quick display, associated services);
• But:

– You have to know how to deal with it;
– You have to know its limits;
– You have to know its potential;

Page 13

Consequences

• If you don't know how to deal with it:
- You will never use its true capabilities;
- You will probably take the first information that is displayed;
• If you don't know its limits:
- If you cannot find the information, you may think that it does not exist;
- You may even think that the technology does not exist elsewhere;
• If you don't know its potential:
- You will not get better at searching;

Page 14

Advertisement

• The search engine business model is based on advertising (99% of Google's revenues come from it);
• However, studies show that some categories of adults (pre-Internet generations) do not distinguish between commercial and non-commercial links;
• Some search engines are more commercial than others;
• The better you know a search engine (Google), the more you can practise Search Engine Optimization;

Page 15

Google is not an isolated case:
• Baidu dependency in China and Yandex dependency in Russia;
• Seznam dependency in the Czech Republic;
• Naver dependency in South Korea;
• Yahoo dependency in Japan and many other Asian countries;

Page 16

Brief summary

• Search engine dependency is comfortable and therefore understandable;
• But for many reasons it leads to mass-consumption information (blog phenomenon, advertising…), which is not the best kind;
• In our countries it is Google dependency, but keep in mind that Europe and the Americas are not the centre of the world;

Page 17

How to solve the equation?

Page 18

First point

• If an answer exists... we should look for it;
• At the moment there is no miracle solution for lazy searching;
• But there are ways to get closer to the answer;

Page 19

Three pillars:
• Learn how to use the technology
• Breaking the habits
• Technological awareness

Page 20

Concrete case: Google

Learn how to use the technology:
• Perform advanced searches:
– Simple operators (« », link:, define:, ?, *, ~, …);
– Complex queries: ?intitle:index.of? "" -filetype:html -filetype:asp -wiki -ringtone -filetype:htm -posts -lyrics -filetype:shtml -filetype:php -filetype:doc -filetype:pdf -filetype:txt mpeg wma avi wmv
– Google Advanced Search;
• Use other Google services such as Google Alerts;
• Use Google's sub-search engines such as Google Scholar;

Breaking the habits:
- Practise what you have learnt and force yourself to keep doing so;
- Results will come and you will get used to it;
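The advanced-search operators listed above can also be assembled from named parts instead of being typed by hand. This is a minimal sketch, assuming Google's standard `https://www.google.com/search?q=` URL format; `build_query` and `search_url` are hypothetical helper names, and only the quoted-phrase, `intitle:` and `-` exclusion (including `-filetype:`) operators from the slide are covered.

```python
from urllib.parse import urlencode


def build_query(terms=(), phrase=None, intitle=None,
                exclude_words=(), exclude_filetypes=()):
    """Assemble a Google-style advanced query (hypothetical helper)."""
    parts = []
    if intitle:
        parts.append(f"intitle:{intitle}")  # word must appear in the page title
    if phrase:
        parts.append(f'"{phrase}"')  # exact-phrase match
    parts.extend(terms)  # ordinary keywords
    parts.extend(f"-{word}" for word in exclude_words)  # drop pages with these words
    parts.extend(f"-filetype:{ext}" for ext in exclude_filetypes)  # drop these file types
    return " ".join(parts)


def search_url(query):
    """Encode the query as a search URL (assumes Google's ?q= parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(terms=["mpeg", "wma"], intitle="index.of",
                    exclude_words=["wiki", "lyrics"],
                    exclude_filetypes=["html", "php"])
print(query)
# intitle:index.of mpeg wma -wiki -lyrics -filetype:html -filetype:php
print(search_url(query))
```

Building the request from named parts makes a long query like the `?intitle:index.of?` example easier to reuse and adjust than retyping it each time.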

Page 21

Concrete case: Google

Technological awareness:

As you get better at searching, you will discover new technologies that you will have to learn.

For example: Google Alerts tells you that a new search engine is coming out, and then you try it;

Page 22

Technological awareness: Google

Google Ads

Google Advanced Search

Do you know iGoogle?

When Google promotes its own technology, there is a good chance that it is worthwhile

Page 23

Technological awareness: How to select the best

• The search engine market is a world of buzz:

• Where every search engine wants to beat Google;
• But are they really providing a technical revolution?

Page 24

• Real-time information: the Twitter example

When Google starts taking an interest in someone's technology, it is probably a good one

Start to look at what Google does not have

Page 25

• Finding similar websites: Who is like it?

Unfortunately, it only works for popular websites

Start to look at what Google does not have

Page 26

Another way of searching for information: social bookmarking

Advantages: you find unindexed websites;

Disadvantages: rubbish websites, advertising?

Start to look at what Google does not have

Page 27

Graphical display

Start to look at what Google does not have

Page 28

Look for specialized search engines:
- People: 123 People, CV gadget, Pipl…
- Jobs: Indeed, JobiJoba…
- Tutorials: Tutosearch, …
- Torrent: Toorgle, …
- Scientific information: Scirus, …
- Information in a specific language: Yandex for Russian, Baidu for Chinese…

Start to look where Google is not the best

Page 29

How to improve data quality on the Internet?

• Triangle method: locate three independent sources that point to the same answer;

• Recent events in Tibet showed how important it is to look at different sources of information, even outside your own country;

Source 1: Washington Post
Source 2: Le Parisien
Source 3: AntiCnn.com
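The triangle method can be sketched as a simple check: group the collected answers and keep only those confirmed by at least three independent sources. This is a minimal sketch under stated assumptions: the `(url, answer)` input format and the `triangulate` function are hypothetical, answers are compared after lowercasing, and "independent" is approximated by distinct host names, which is only a rough proxy (two outlets can still copy the same story).

```python
from collections import defaultdict
from urllib.parse import urlparse


def triangulate(findings, required=3):
    """Keep answers backed by at least `required` independent sources.

    findings: list of (source_url, answer) pairs (hypothetical format).
    Assumption: sources are "independent" when their host names differ.
    """
    hosts_by_answer = defaultdict(set)
    for url, answer in findings:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.org and example.org as one source
        key = answer.strip().lower()  # naive normalisation of the answer text
        hosts_by_answer[key].add(host)
    return {answer: sorted(hosts)
            for answer, hosts in hosts_by_answer.items()
            if len(hosts) >= required}


# Placeholder URLs, not real sources.
findings = [
    ("https://www.example.org/a", "Answer A"),
    ("https://news.example.com/b", "answer a"),
    ("https://example.net/c", "Answer A"),
    ("https://www.example.org/d", "Answer B"),
    ("https://example.org/e", "Answer B"),
]
print(triangulate(findings))
# {'answer a': ['example.net', 'example.org', 'news.example.com']}
```

Counting distinct hosts rather than raw hits keeps one site repeating itself from looking like three confirmations; a stricter version would also check ownership or whether the sources cite each other.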

Page 30

Brief summary

• Learn how to use it, change your habits, be aware;
• Be curious;
• Think about other ways to look for information;
• Three independent sources of information;

Page 31

Future of Google and information research

Page 32

Semantic search

• You get a feed instead of entering your request;
• Everyone is talking about semantic search;
• But it is not mature yet, a buzzword again (there are not many suggestions);
• Poor results if developed from scratch (poor index) or if developed by huge companies (few suggestions);

Page 33

Some issues to fix

• How can pictures be indexed well? Are solutions such as Google Image Labeler the best?

• How to index videos?
• How to index sounds?

Page 34

A Google that will have to change

• Too much information on the Internet;
• A Google that is splitting up and providing more and more sub-search engines;
• The development of high-bandwidth connections, which means graphical interfaces;

• A technological awareness that is difficult to pass on;

Page 35

But a Google more and more present in our life

• Forecasts point in that direction;
• Development of an OS for cell phones, a Web browser, Web software applications (Google slides, Google « excel »...)

Page 36

The only question is: how will they do it?

Google in 1998 / Google 11 years later

Page 37

Brief summary

• Google will be with us in the future and we have to get used to it;

• Information research will be more and more assisted, but you will still lag behind if you do not perform advanced searches;

• In the near future some issues will remain (indexing of pictures…);

Page 38

Conclusion

Page 39

What you have to keep in mind

• If you are dependent, at least be dependent in the right way;

• Apply the triangle method;
• Reconsider the information process each time (think differently);

Page 40

Recommendations

Master's thesis about search engine dependency:
- http://www.pandia.com/index.html

List of search engines:
- http://www.pandia.com/powersearch/index.html
- http://www.philb.com/whichengine.htm

To learn more about search engines: Pandia search:
- www.pandiasearch.com

Documentaries:
- Google: Behind the Screen by IJsbrand van Veelen
http://www.youtube.com/watch?v=TBNDYggyesc&hl=fr
- The Great Firewall of China
http://www.youtube.com/watch?v=IWsXhNJFj78&hl=fr

Page 41

Thank you for your attention

http://moteurs-de-recherches-alternatifs.blogspot.com