search engine dependency conference
DESCRIPTION
Conference slides about search engine dependency and its influence on data quality
TRANSCRIPT
SEARCH ENGINE DEPENDENCY AND ITS INFLUENCE ON DATA QUALITY
By Ronan CHARDONNEAU
Index
I - Introduction to the world of search engines
II - Risks of search engine dependency
III - How to solve the equation?
IV - Future of Google and information research
V - Conclusion
I-Introduction II-Risks III-Solutions IV-Future V-Conclusion
The World of Search engines
Market configuration
Top 10 search websites in the world for August 2007
Target: users aged 15 and over, at home and at work
Source: comScore qSearch 2.0
Leaders per country
Source: map based on data from Alexa, the Web Information Company (2008).
A win or lose market
Estimated share of content available on the Internet, by language
Source: Internet World Stats
What has already been shown
• Studies show that the Internet is the main source of information (at least in Europe and America);
• Search engines are the most used websites when people browse the Internet;
• People trust search engine results;
• When searching the Internet, people mostly use one single search engine.
Brief summary
• Google is the market leader; its followers are far behind;
• There are 8 leading search engines, and probably eight "continents" on the Internet;
• A market defined by the adoption of standards (<50%) for searching;
• Content is mainly in English; Chinese is important; there is quality content in Japanese, German and Korean;
• Internet users cannot do without search engines and are loyal to a specific one.
Risks of search engine dependency and its influence on data quality
Definition
The habit of not questioning the results that come from one single search engine.
It normally starts when you hear sentences such as:
- "Why should I bother using other search engines when I find everything I want with Google?"
- "Am I really taking any risk by using Google?"
Some facts behind this behaviour:
- Google ranks in the top 100 websites, or higher, in every country in the world;
- Google has been recognized as the most powerful brand.
• Who is Google? Well... It is our friend;
• We can carry it everywhere; it is relevant and convenient (quick display, associated services);
• But:
– You have to know how to deal with it;
– You have to know its limits;
– You have to know its potential.
Consequences
• If you don't know how to deal with it:
- You will never use its true capabilities;
- You will probably take the first piece of information that is displayed;
• If you don't know its limits:
- When you cannot find a piece of information, you may think that it does not exist;
- You may even think that the technology does not exist elsewhere;
• If you don't know its potential:
- You will not get better at searching.
Advertisement
• The business model of search engines is based on advertising (99% of Google's revenue comes from it);
• However, studies show that some categories of adults (the non-Internet generations) cannot tell commercial links from non-commercial ones;
• Some search engines are more commercial than others;
• The better you know a search engine (Google), the better you can practise Search Engine Optimization.
Google is not an isolated case
• Baidu dependency in China and Yandex dependency in Russia;
• Seznam dependency in the Czech Republic;
• Naver dependency in South Korea;
• Yahoo dependency in Japan and many other Asian countries.
Brief summary
• Search engine dependency is comfortable and therefore understandable;
• But for many reasons it favours mass-consumption information (the blog phenomenon, advertising…), which is not the best kind;
• In our countries it is Google dependency, but keep in mind that Europe and the Americas are not the center of the world.
How to solve the equation?
First point
• If an answer exists... we should look for it;
• At the moment there is no miracle solution for lazy searching;
• But there are ways to get closer to the answer.
Three pillars
Learn how to use the technology
Breaking the habits
Technological awareness
Concrete case: Google
Learn how to use the technology:
• Perform advanced searches:
– Simple operators (« », link:, define:, ?, *, ~, …);
– Complex query: ?intitle:index.of? "" -filetype:html -filetype:asp -wiki -ringtone -filetype:htm -posts -lyrics -filetype:shtml -filetype:php -filetype:doc -filetype:pdf -filetype:txt mpeg wma avi wmv
– Google Advanced Search;
• Use other Google services such as Google Alerts;
• Use Google's specialized search engines such as Google Scholar;
Breaking the habits:
- Practise what you have learnt, and force yourself to do so;
- Results will come and you will get used to it.
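The operators listed on this slide can also be combined programmatically. The following is a minimal Python sketch and not part of the original slides: the function names build_query and search_url, their parameters, and the example phrase are hypothetical, and the https://www.google.com/search?q=... URL pattern is assumed from common usage rather than from a documented API. It only assembles a quoted phrase, an intitle: restriction, excluded words and excluded file types into one query string.

```python
from urllib.parse import urlencode


def build_query(phrase="", intitle="", include=(), exclude_terms=(), exclude_filetypes=()):
    """Assemble a search query from the operators shown on the slide.

    All parameter names are illustrative; they are not part of any Google API.
    """
    parts = []
    if intitle:
        parts.append(f"intitle:{intitle}")  # restrict matches to page titles
    if phrase:
        parts.append(f'"{phrase}"')  # exact-phrase match
    parts.extend(include)  # plain keywords
    parts.extend(f"-{term}" for term in exclude_terms)  # excluded words
    parts.extend(f"-filetype:{ext}" for ext in exclude_filetypes)  # excluded file types
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Return a URL-encoded search URL (the base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


query = build_query(
    intitle="index.of",
    phrase="search engine",  # hypothetical phrase, for illustration only
    include=["mpeg", "avi"],
    exclude_terms=["wiki", "lyrics"],
    exclude_filetypes=["html", "php"],
)
print(query)
print(search_url(query))
```

Writing the query this way makes each exclusion explicit, which is one way to practise the operators until they become a habit.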
Concrete case: Google
Technological awareness:
As you get better at searching, you will discover new technologies that you will have to learn.
For example: Google Alerts tells you that a new search engine is coming out, and then you try it.
Technological awareness: Google
Google Ads
Google Advanced Search
Do you know iGoogle?
When Google promotes one of its own technologies, chances are that it is worthwhile
Technological awareness: How to select the best
• The search engine market is a world of buzz:
• Every search engine wants to beat Google;
• But are they really delivering a technical revolution?
• Real time information: the Twitter example
When Google starts taking an interest in someone else's technology, it is probably a good one
Start to look at what Google does not have
• Finding similar websites: Who is like it?
Unfortunately, it only works for popular websites
Start to look at what Google does not have
Another way of searching for information: social bookmarking
Advantages: you find unindexed websites;
Disadvantages: low-quality websites, advertising?
Start to look at what Google does not have
Graphical display
Start to look at what Google does not have
Look for specialized search engines
- People: 123 People, CV gadget, Pipl…
- Jobs: Indeed, JobiJoba…
- Tutorials: Tutosearch, …
- Torrent: Toorgle, …
- Scientific information: Scirus, …
- Information in a specific language: Yandex for Russian, Baidu for Chinese…
Start to look where Google is not the best
• Triangle method: locate three independent sources that point to the same answer;
• Recent events in Tibet showed how important it is to look at different sources of information, including sources outside your own country;
How to improve data quality on the Internet?
Source 1: Washington Post
Source 2: Le Parisien
Source 3: AntiCnn.com
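The triangle method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's tool: the function names source_domain and triangulate are hypothetical, the URLs and answers are placeholders, and "different domain" is used as a rough stand-in for "independent source", which a human reader would still need to verify (two sites may republish the same agency report).

```python
from collections import defaultdict
from urllib.parse import urlparse


def source_domain(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def triangulate(claims, min_sources=3):
    """Keep only answers reported by at least `min_sources` distinct domains.

    `claims` is an iterable of (url, answer) pairs. Treating a distinct
    domain as an independent source is a simplifying assumption.
    """
    domains_by_answer = defaultdict(set)
    for url, answer in claims:
        # Normalize case and surrounding whitespace before comparing answers.
        domains_by_answer[answer.strip().lower()].add(source_domain(url))
    return {
        answer: sorted(domains)
        for answer, domains in domains_by_answer.items()
        if len(domains) >= min_sources
    }


# Hypothetical data: these URLs and answers are placeholders, not real reports.
claims = [
    ("https://www.news-a.example/story", "Event happened on Friday"),
    ("https://news-b.example/report", "event happened on friday"),
    ("https://news-c.example/item", "Event happened on Friday "),
    ("https://news-a.example/other", "Event happened on Monday"),
    ("https://www.news-a.example/again", "Event happened on Monday"),
]
confirmed = triangulate(claims)
print(confirmed)
```

In this sketch, two pages from the same site count as one source, so the "Monday" answer is not confirmed even though it appears twice.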
Brief summary
• Learn how to use the technology, change your habits, be aware;
• Be curious;
• Think about other ways to look for information;
• Rely on three independent sources of information.
Future of Google and information research
Semantic search
• You get a feed instead of entering your query;
• Everyone is talking about semantic search;
• But it is not mature yet: a world of buzz again (there are not many suggestions);
• Poor results if developed from scratch (poor index); few suggestions if developed by huge companies.
Some issues to fix
• How to index pictures well? Are solutions such as Google Image Labeler the best?
• How to index videos?
• How to index sounds?
A Google which will have to change
• Too much information on the Internet;
• A Google that is fragmenting and offering more and more specialized search engines;
• The development of high-bandwidth connections, which means more graphical interfaces;
• A technological awareness that is difficult to pass on.
But a Google more and more present in our life
• Forecasts point in that direction;
• Development of an operating system for cell phones, a Web browser, Web applications (Google slides, Google « excel »…)
The only question is how they will do it.
Google in 1998
Google 11 years later
Brief summary
• Google will be with us in the future and we have to get used to it;
• Searching for information will be more and more assisted, but you will still fall behind if you do not perform advanced searches;
• In the near future some issues will remain (indexing of pictures…)
Conclusion
What you have to keep in mind
• If you are dependent, at least be dependent wisely;
• Apply the triangle method;
• Reconsider the information process each time (think differently).
Recommendations
Master's thesis about search engine dependency:
- http://www.pandia.com/index.html
List of search engines:
- http://www.pandia.com/powersearch/index.html
- http://www.philb.com/whichengine.htm
To learn more about search engines: Pandia search:
- www.pandiasearch.com
Documentaries:
- Google: Behind the screen by IJsbrand van Veelen http://www.youtube.com/watch?v=TBNDYggyesc&hl=fr
- The Great Firewall of China http://www.youtube.com/watch?v=IWsXhNJFj78&hl=fr
Thank you for your attention
http://moteurs-de-recherches-alternatifs.blogspot.com