Seek and you will find? Yes ... but probably not what you were expecting
NetIKX, 18th March 2015, #netikx72
British Dental Association, 64 Wimpole Street, London W1G 8YS
Karen Blakeman
RBA Information Services
http://www.rba.co.uk/
[email protected] twitter.com/karenblakeman
Presentation available for a short time at: http://www.rba.co.uk/as/
This work is licensed under a Creative Commons Attribution-Share-Alike License.
EU - so-called “right to be forgotten” ruling
www.rba.co.uk 2
Mario Costeja Gonzalez
Edition of Monday, January 19, 1998, page 23 - Newspaper - Lavanguardia.es http://hemeroteca.lavanguardia.com/preview/1998/01/19/pagina-23/33842001/pdf.html
EU Court of Justice ruled that Google is a “data controller” under Data Protection legislation and must remove, if requested, links to information that is “inadequate, irrelevant .... or excessive” from search results on a person’s name.
Applies to search engines in the EU, Norway, Liechtenstein, Iceland and Switzerland
17/03/2015
Scale of EU 'right to be forgotten' rules revealed by Google http://www.dailymail.co.uk/news/article-2952260/The-scale-EU-right-forgotten-rules-revealed-Google-says-forced-delete-260-000-links-legislation-criticised-protecting-terrorists-criminals.html
Spanish Newspapers Suddenly Regret Forcing Google Out Of Spain - http://uk.businessinsider.com/spanish-newspapers-have-changed-their-minds-and-are-now-begging-google-news-to-stay-2014-12
How Google News Lives On In Spain Despite Being Closed http://searchengineland.com/google-noticias-lives-google-spains-homepage-211146
Oh joy - NOT!
More UK information vanishes into GOV.UK http://www.rba.co.uk/wordpress/2015/02/28/more-uk-information-vanishes-into-gov-uk/
Where’s the information gone to?
List of departments, agencies and public bodies at https://www.gov.uk/government/organisations
“Home pages” on GOV.UK
Data and information may still be on the old websites
Data may have been moved to http://data.gov.uk/
Information may have been sent to http://www.nationalarchives.gov.uk/webarchive/
Or information may have been “lost”
Don’t rely on just GOV.UK search – use Google/Bing site: command combined with filetype: if appropriate
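The site: and filetype: combination can be sketched as a small helper that builds the query string and a search URL. This is only an illustration: the function names and the search terms are hypothetical, and the one thing assumed about the engines is that both take the query text in a `q` parameter.

```python
from urllib.parse import urlencode


def build_site_query(terms, site, filetype=None):
    """Combine search terms with the site: and (optionally) filetype: operators."""
    parts = [terms, f"site:{site}"]
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query, engine="google"):
    """Return a search URL for the query (assumes the engine reads the 'q' parameter)."""
    bases = {
        "google": "https://www.google.co.uk/search",
        "bing": "https://www.bing.com/search",
    }
    return f"{bases[engine]}?{urlencode({'q': query})}"


# Hypothetical example: PDFs about flood defence spending anywhere on gov.uk
query = build_site_query("flood defence spending", "gov.uk", filetype="pdf")
print(query)
print(search_url(query))
print(search_url(query, engine="bing"))
```

Pointing site: at an old departmental domain instead of gov.uk is one way to check whether documents are still sitting on the previous website.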
"Yes Minister" The Skeleton in the Cupboard (TV Episode 1982) - Quotes - IMDb http://www.imdb.com/title/tt0751825/quotes
James Hacker: [reads memo] This file contains the complete set of papers, except for a number of secret documents, a few others which are part of still active files, some correspondence lost in the floods of 1967...
James Hacker: Was 1967 a particularly bad winter?
Sir Humphrey Appleby: No, a marvellous winter. We lost no end of embarrassing files.
James Hacker: [reads] Some records which went astray in the move to London and others when the War Office was incorporated in the Ministry of Defence, and the normal withdrawal of papers whose publication could give grounds for an action for libel or breach of confidence or cause embarrassment to friendly governments.
James Hacker: That's pretty comprehensive. How many does that normally leave for them to look at?
James Hacker: How many does it actually leave? About a hundred?... Fifty?... Ten?... Five?... Four?... Three?... Two?... One?... *Zero?*
Sir Humphrey Appleby: Yes, Minister.
[Add “transfer to GOV.UK” to the list of excuses]
http://www.nationalarchives.gov.uk/webarchive/
Remember - Google knows best
Google very kindly....
1. Goes to great lengths to personalise your results according to your search history, contacts, location, device, phase of the moon, the train, bus or tram you take to work and anything else it can think of
2. Rewrites your search for you by leaving out some of your terms and looking for weird and wonderful alternatives
3. Doesn’t bother you with everything that might be relevant
4. Changes its algorithms on a regular basis to keep you on your toes
5. Constantly conducts experiments on you to ensure that you don’t feel forgotten
Google no longer looks at keywords in isolation
Tries to make “sense” of your search and put it into context: handles natural language queries and uses what others have searched for and clicked on
Constantly changing – all bets are off when it comes to predicting what your results will look like
How you ask your question and the device you are using are both taken into account
Provides Quick Answers and “facts”: extracts from websites giving you the “answer”
http://googlesystem.blogspot.co.uk/2013/11/google-knowledge-graph-gets-confused.html
One of many wrong Quick Answers submitted to me by a delegate at a recent conference
Many thanks to Philip Stirups for the example. About 24 hours after this screenshot was taken, Google corrected the error.
Google "Henry VIII wives": Jane Seymour reveals search engine's blind spots http://www.slate.com/blogs/future_tense/2013/09/23/google_henry_viii_wives_jane_seymour_reveals_search_engine_s_blind_spots.html
Image courtesy of Will Oremus
Waitrose Caversham opening times New Year’s Day
Google used the standard opening times in its answer, not the seasonal opening times
http://searchengineland.com/google-shows-source-credit-quick-answers-knowledge-graph-203293
But Google’s choice of “basic factual data” may be wrong!
Google wants to rank websites based on facts not links - 28 February 2015 - New Scientist http://www.newscientist.com/article/mg22530102.600-google-wants-to-rank-websites-based-on-facts-not-links.html
http://arxiv.org/abs/1502.03519
Or maybe not....
Google: We Are Not Using Facts For Search Engine Ranking Now https://www.seroundtable.com/google-fact-ranking-not-happening-19979.html
Artificial intelligence
Artificial Intelligence machine plays video games like a pro - CBBC Newsround http://www.bbc.co.uk/newsround/31633702
Google buys UK artificial intelligence startup Deepmind for £400m http://www.theguardian.com/technology/2014/jan/27/google-acquires-uk-artificial-intelligence-startup-deepmind
Google buys two more UK artificial intelligence startups http://www.theguardian.com/technology/2014/oct/23/google-uk-artificial-intelligence-startups-machine-learning-dark-blue-labs-vision-factory
Google Public Data Explorer: Minimum Wage http://www.google.com/publicdata/
Some countries are missing e.g. Germany
http://www.rightmove.co.uk/ - uses Land Registry data
Land Registry data often goes missing. I know that 10 months ago the sold price for number 90 was listed as £185,000, for a sale in 2012.
http://landregistry.data.gov.uk/app/ppd/search
Data doesn’t show up via the Land Registry Open Data interface either.
Missing data
Error report filed with the Land Registry - still waiting for a response
Why might a property/price paid not appear in the data?
Seems not that uncommon according to discussion boards – usually data entry error (but the above example was in the open data sets until a few months ago)
Absence of price – gift of property or purchase of a share
Impractical to calculate price e.g. bulk purchase of properties
Commercial transactions
https://www.gov.uk/about-the-price-paid-data#data-excluded-from-the-house-price-index-and-price-paid-data
Raw data files were downloaded and searched: the data for number 90 is missing
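A check like this on the raw files can be sketched in a few lines. The column positions below are an assumption about the headerless Price Paid Data CSV layout and should be confirmed against the column list published with the files. The sample rows, street name and function name are made up for illustration.

```python
import csv
import io

# Assumed column positions in the headerless Price Paid Data CSV.
# Verify these against the published column list before relying on them.
PRICE, DATE, POSTCODE, PAON, STREET = 1, 2, 3, 7, 9


def find_sales(rows, house_number, street, postcode_prefix=""):
    """Return (date, price) for every row matching the house number and street."""
    matches = []
    for row in rows:
        if len(row) <= STREET:
            continue  # skip short or malformed rows
        if (row[PAON].strip() == house_number
                and row[STREET].strip().upper() == street.upper()
                and row[POSTCODE].startswith(postcode_prefix)):
            matches.append((row[DATE], int(row[PRICE])))
    return matches


# Made-up sample rows, for illustration only. A real file would be read with
# open(path, newline="") in place of io.StringIO.
SAMPLE = '''"{A1}","150000","2012-03-01 00:00","XX1 1AA","T","N","F","88","","EXAMPLE ROAD","","EXAMPLETOWN","EXAMPLE","EXAMPLESHIRE","A"
"{A2}","160000","2013-07-15 00:00","XX1 1AA","T","N","F","92","","EXAMPLE ROAD","","EXAMPLETOWN","EXAMPLE","EXAMPLESHIRE","A"
'''

rows = list(csv.reader(io.StringIO(SAMPLE)))
print(find_sales(rows, "90", "Example Road"))  # no row for number 90 in the sample
print(find_sales(rows, "88", "Example Road"))
```

An empty result only shows that no matching row is in the file searched. It does not say why, so it is worth running the same search across every yearly file before reporting a record as missing.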
Free Companies House data to boost UK economy - Press releases - GOV.UK https://www.gov.uk/government/news/free-companies-house-data-to-boost-uk-economy
Companies House free data
http://download.companieshouse.gov.uk/en_accountsdata.html
Bulk data – all or nothing
Large daily files available as zipped files
No support provided – you’re on your own!
Companies House free data
Each file within the zipped file is a separate document. Note the uninformative file names!
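Before unpacking one of these archives it helps to list what is inside. A minimal sketch using Python's standard zipfile module is below. The archive is a tiny in-memory stand-in with hypothetical file names, not a real Companies House download.

```python
import io
import zipfile


def list_members(zip_source):
    """Return (name, size in bytes) for each document inside a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Build a small stand-in archive in memory; the file names are hypothetical.
# For a downloaded file, pass its path to list_members() instead.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("doc_0001.html", "<html>first</html>")
    archive.writestr("doc_0002.html", "<html>second</html>")

for name, size in list_members(buffer):
    print(name, size)
```

Listing the members first gives a count and the sizes without extracting anything, which matters when a daily file holds many thousands of separately named documents.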
Variable Pitch http://www.variablepitch.co.uk/stations/1310/
Uses public electricity micro-generation data
Variable Pitch http://www.variablepitch.co.uk/stations/2580/
Virginia Station is the hydroelectric installation at Windsor Castle – no data!
FoI request for generation data for Virginia Station
https://www.whatdotheyknow.com/request/request_electricity_output_of_ro
http://www.independent.co.uk/news/uk/home-news/royal-family-granted-new-right-of-secrecy-2179148.html
And finally.....
Per capita consumption of cheese (US) correlates with Number of people who died by becoming tangled in their bedsheets http://www.tylervigen.com/view_correlation?id=7