MSc Projects – Information Searching, Peter Hancox, Computer Science

Post on 07-Jul-2020

Page 1: MSc Projects Information Searching Peter Hancox Computer …rzb/Information_searching.pdf · 2011-07-21 · MSc Projects – Information Searching Science Citation Index Comprehensiveness/coverage

MSc Projects – Information Searching

Peter Hancox, Computer Science

Why should you be searching?

Information searching/retrieval is about:
– saving you time by finding ways to solve problems, produce better designs, discover problem domains, benchmark your work;
– learning from other people’s work;
– developing problem-solving skills;
– keeping your examiners happy.

IR is more than Google

Most people’s way of finding information is to use Google.

Google is great for some things - mainly finding undisputed facts.

Google relies on indexing WWW pages - so it is as:
– complete as the WWW;
– accurate as the WWW.

What were the surnames of my grandfathers?

Is the Earth flat? See: http://www.alaska.net/~clund/e_djublonskopf/Flatearthsociety.htm

The Computer Science literature

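[Pie chart: composition of the Computer Science literature. Categories: conference papers, journal articles, technical reports, theses, books, other material (e.g. programs), WWW pages. Segment labels: 3%, 7%, 39%, 26%, 5%, 3%, 17%; the pairing of labels with categories did not survive extraction.]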

How to know you’re getting quality

How do you, a novice, know you’re reading high quality scientific/technical literature?

– Journal papers - the best are “peer-reviewed”
– Conference papers - the best are “peer-reviewed”
– Books - the best are published by the best publishers, e.g. Oxford, Cambridge, MIT, …
– Technical reports - the best probably come from the best universities and companies …

But how do you know which are the best?

The answer is very simple:

Use specialised information retrieval databases that have:
– excellent coverage;
– excellent currency.

Some IR theory and practice - 1

There are three kinds of search:

– finding simple facts - use Google (with care)
– current awareness - keeping yourself up-to-date
– retrospective searching - finding some (or all) of the literature on a topic

This lecture is mainly about retrospective searching.

Some IR theory and practice - 2

A document set can be divided into relevant and irrelevant documents:

Some IR theory and practice - 3

A document set can be divided into relevant and irrelevant documents:

\[
\text{Precision} = \frac{\text{no. of relevant documents retrieved}}{\text{total no. of documents retrieved}} = \frac{100}{160} = 62.5\%
\]

Some IR theory and practice - 4

A document set can be divided into relevant and irrelevant documents:

\[
\text{Recall} = \frac{\text{no. of relevant documents retrieved}}{\text{total no. of relevant documents}} = \frac{100}{200} = 50\%
\]
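
The two measures can be checked with a few lines of Python. This is only a minimal sketch: the function names and the document ID ranges are invented for illustration, and only the counts (200 relevant documents, 160 retrieved, 100 in both sets) come from the slides.

```python
def precision(relevant: set, retrieved: set) -> float:
    """Fraction of the retrieved documents that are relevant."""
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant: set, retrieved: set) -> float:
    """Fraction of the relevant documents that were retrieved."""
    return len(relevant & retrieved) / len(relevant)


# Hypothetical document IDs, chosen only to reproduce the slide's counts:
# 200 relevant documents, 160 retrieved, 100 of them in both sets.
relevant_docs = set(range(0, 200))
retrieved_docs = set(range(100, 260))

print(f"Precision = {precision(relevant_docs, retrieved_docs):.1%}")
print(f"Recall    = {recall(relevant_docs, retrieved_docs):.1%}")
```

Both functions share the same numerator (relevant documents that were actually retrieved); they differ only in what that count is divided by.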

Some IR theory and practice - 5

The paradox of searching? It seems impossible to get 100% precision and 100% recall.
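
One way to see the tension is to walk down a ranked result list and measure both figures at each cut-off. The sketch below rests on assumptions: the ranking string (R = relevant, N = not relevant), the total of 5 relevant documents, and the helper name `precision_recall_at_cutoffs` are all invented and do not come from the lecture.

```python
def precision_recall_at_cutoffs(ranking: str, total_relevant: int):
    """Return (k, precision, recall) after reading the first k results.

    `ranking` is a string of 'R' (relevant) and 'N' (not relevant) marks,
    in the order a search system returned the documents.
    """
    rows = []
    hits = 0
    for k, mark in enumerate(ranking, start=1):
        if mark == "R":
            hits += 1
        rows.append((k, hits / k, hits / total_relevant))
    return rows


# Hypothetical ranked output: relevant items cluster near the top.
ranking = "RRNRNNRNNR"
rows = precision_recall_at_cutoffs(ranking, total_relevant=5)

for k, p, r in rows:
    print(f"top {k:2d}: precision {p:.0%}, recall {r:.0%}")
```

In this made-up ranking, reading only the top two results gives 100% precision but 40% recall, while reading all ten gives 100% recall but only 50% precision.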

Some IR theory and practice - 6

Bradford’s law of scattering:

Colloquially: to find all relevant scientific literature on a topic, you have to look in all the literature; to find ~90% of the literature, you only have to look in 10% of the literature.

More formally: the returns of extending a search for references in science journals diminish exponentially.
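
Bradford's law is usually stated in terms of zones: rank journals by how many relevant articles they carry, split them into zones holding about the same number of articles, and the number of journals per zone grows roughly as 1 : n : n². The sketch below illustrates that zoning under stated assumptions: the journal counts are invented to show the pattern, and `bradford_zones` is a hypothetical helper, not something from the lecture.

```python
def bradford_zones(articles_per_journal, zones=3):
    """Split journals into zones holding roughly equal numbers of articles.

    Journals are ranked from most to least productive; a zone is closed as
    soon as it reaches its share of the total, and the last zone takes
    whatever remains. Returns (journals_in_zone, articles_in_zone) pairs.
    """
    ranked = sorted(articles_per_journal, reverse=True)
    target = sum(ranked) / zones
    result = []
    journals = articles = 0
    for count in ranked:
        journals += 1
        articles += count
        if articles >= target and len(result) < zones - 1:
            result.append((journals, articles))
            journals = articles = 0
    result.append((journals, articles))
    return result


# Invented data: relevant articles found in each of 28 journals.
sample = [30, 30] + [10] * 6 + [3] * 20

for number, (journals, articles) in enumerate(bradford_zones(sample), start=1):
    print(f"zone {number}: {journals} journals supply {articles} articles")
```

With these invented figures the first 2 journals supply a third of the articles, while the final third is scattered across 20 journals, which is the diminishing return the slide describes.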

Some IR theory and practice - 7

Bradford’s law of scattering means that we can concentrate searching on a fairly small subset of the literature and get most results.

Specialised information retrieval databases are designed to retrieve large amounts of literature from the optimum number of journals. Google isn’t designed to do this.

Choosing databases - books

Don’t use Amazon - it only has books currently on sale that it can source.

Use a copyright deposit library:
– British Library
– Library of Congress
– Cambridge UL

Choosing databases - journals and conference papers

The best keyword-based Computer Science services are:
– Inspec: http://www.engineeringvillage2.org
– ACM Guide to Computing Literature: http://portal.acm.org/guide.cfm

Choosing databases - journals and conference papers

Interdisciplinary services with substantial Computing coverage:
– Medline: http://gateway.ovid.com/autologin.html
– Compendex/Engineering Index: http://www.engineeringvillage2.org

Choosing databases - journals and conference papers

Single publisher services - perhaps with full text access:
– IEEE Xplore: http://ieeexplore.ieee.org/

Inspec - coverage and currency

Includes:
– 3,500 journals, many of them computing science and applications journals
– conference papers: 1,500 conferences added each year
– seems to include reports, theses, etc., but how satisfactory is the coverage?

Journals seem to be completely indexed within ~6 months of publication.

Inspec - indexing

How are the entries indexed?
– Classification scheme
– Controlled language
– Keywords
  • taken from title
  • taken from abstract
  • written by the indexer

ti: coherence
ti: inference
ti: representation
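
A title-restricted query of this kind can be pictured as a lookup in an index built only from title words. The sketch below is an illustration under assumptions, not a description of how Inspec works: the records, the `build_title_index` and `search` helpers, and the way the `ti:` prefix is parsed are all invented.

```python
from collections import defaultdict

# Hypothetical records; the titles are invented for illustration.
RECORDS = {
    1: "Coherence and inference in discourse representation",
    2: "Statistical inference for network traffic",
    3: "Knowledge representation with description logics",
}


def build_title_index(records):
    """Map each lower-cased title word to the set of record IDs containing it."""
    index = defaultdict(set)
    for record_id, title in records.items():
        for word in title.lower().split():
            index[word].add(record_id)
    return index


def search(index, query):
    """Answer a query of the form 'ti: word' using the title index."""
    field, _, term = query.partition(":")
    if field.strip() != "ti":
        raise ValueError("only the title field (ti) is indexed in this sketch")
    return sorted(index.get(term.strip().lower(), set()))


title_index = build_title_index(RECORDS)
for query in ("ti: coherence", "ti: inference", "ti: representation"):
    print(query, "->", search(title_index, query))
```

Keywords taken from the abstract or written by the indexer could be handled the same way, with one index per field.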

Inspec - searching

Demonstration based on handout.

Science Citation Index - Subject coverage

The scope is so wide as to be multidisciplinary.

It indexes:
– journals - almost 5,300 science journals including at least 200 computing journals and probably more.

It doesn’t directly index:
– conferences
– books
– reports
– theses

Inspec indexes 3,500 mainly relevant journals

Science Citation Index - Comprehensiveness/coverage

Covers many of the principal journals in computing:
– has a wide computer science coverage, choosing the most widely respected journals rather than showing (e.g.) an engineering bias.

Science Citation Index - Subject overlap

SCI overlaps with several other indexing services.
– Compendex has many of the same core journals - but also has conferences.
– Inspec has many of the same core journals and lots of other journals - and also has conferences.

Science Citation Index - Record content

How much information do the entries contain?
– Basic bibliographic information
– Abstract
– Institution - e.g. University of Birmingham
– Language of original article

Science Citation Index - Indexing

How are the entries indexed?
– Keywords
  • taken from title
  • taken from abstract
  • written by the indexer
– Citations

SCI - searching

Demonstration based on handout.

So what does SCI retrieve?

If you use it as a keyword-based index:
Only journals/serials

If you search for citations:
Anything that authors cite …
– journal & conference papers, books, theses, technical reports
– letters, WWW pages, newspapers, conversations …
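
Citation searching can be pictured as inverting reference lists: instead of asking what an article cites, you ask which articles cite a given item. The following is a minimal sketch with invented article labels and a hypothetical `build_citation_index` helper; it does not describe how SCI itself is implemented.

```python
from collections import defaultdict

# Hypothetical reference lists: citing journal article -> items it cites.
# Cited items can be of any type: books, theses, reports, web pages, ...
REFERENCES = {
    "Article A (journal)": ["Book X", "Thesis Y", "Article B (journal)"],
    "Article B (journal)": ["Book X", "Technical report Z"],
    "Article C (journal)": ["Thesis Y", "Web page W"],
}


def build_citation_index(references):
    """Invert citing -> cited lists into cited -> sorted list of citing articles."""
    cited_by = defaultdict(list)
    for citing, cited_items in references.items():
        for item in cited_items:
            cited_by[item].append(citing)
    return {item: sorted(citing) for item, citing in cited_by.items()}


citation_index = build_citation_index(REFERENCES)
print(citation_index["Book X"])    # articles that cite a book
print(citation_index["Thesis Y"])  # articles that cite a thesis
```

Only journal articles appear as citing sources here, yet the index keys include a book, a thesis, a report and a web page, which mirrors the point that a citation search reaches anything authors cite.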

Searching SCI for citations

Points to think about

Does the use of citations improve recall and/or precision?

What criteria are used to include cited items?
– Are items cited because they are relevant?
– Because the author wrote them?
– To criticize an alternative approach?
– To impress readers with the author’s erudition?

The End
