
Adegbilero-Iwari, Idowu is: An IFLA/OCLC Fellow, An Emerging Technology Librarian, A DeGruyter Open Access Funding Board Member and a

Understanding the Depth of Google Scholar and its Implication for Webometrics Ranking of Higher Institutions

Presented by Adegbilero-Iwari, Idowu

At the 2016 Open Access Week Programme, Elizade University, Ilara-Mokin, Ondo State, Nigeria

Date: 25th October, 2016

Open Access, briefly

• Open Access is the free, immediate, online availability of research articles, coupled with the rights to use these articles fully in the digital environment

• Ways authors can provide open access:

• self-archiving their journal articles in an open access repository, also known as 'green' open access, or

• publishing in an open access journal, known as 'gold' open access but with the payment of an APC

Which One? OR

• Open • Locked

Open Access according to Peter Suber

• Peter Suber of Harvard Library’s OSC, author of Open Access, puts it this way:

"The basic idea of OA is simple: Make research literature available online without price barriers and without most permission barriers."

• If this was said in Harvard, then we must sing it in Africa

But the Dilemma

• Institutions in resource-poor countries are often unable to subscribe to paywalled journals or databases

• In the same vein, their scholars are often unable to pay the more than $1,000 that a high-impact-factor journal will ask for as open access charges

• The result? Research outputs of such nations will remain in the dark, or their scholars will fall for predatory OA publishers charging as little as $100

And Some help? The website that shows you this image, that is it!

http://whyopenresearch.org/

Reducing publishing costs: Tips from the Site

i. Find a no-cost open access journal: as of 2014, over 70% of journals indexed in the Directory of Open Access Journals charged authors no publication fee

ii. Find a low-cost open access journal

iii. Request a waiver

Archiving and Publishing Colours

• According to Bill Hubbard of SHERPA’s Repository Support Project, we have

• Green: can archive pre-print and post-print

• Blue: can archive post-print (i.e. final draft post-refereeing)

• Yellow: can archive pre-print (i.e. pre-refereeing)

• White: archiving not formally supported

• Open Access Publishing i.e. Gold Publishing or Gold route to open access: author pays cost of article publication and the work is freely available.

Open Access and Webometrics Ranking of World Universities

• The overall goals are Visibility and Impact

• Thus, the relationship between OA and the Webometrics Ranking (the Ranking) is direct

• The aims of the Ranking are:

• 1. To improve the Web presence of research and academic institutions

• 2. To promote Open Access to research

Webometrics Ranking

• The Ranking measures the strength of universities’ web presence using their:

• A. web domain

• B. Sub-pages

• C. Rich files

• D. Scholarly articles

Webometrics

• Largest academic ranking of Higher Education Institutions

• Performed by Cybermetrics Lab (Spanish National Research Council, CSIC)

• Started in 2004 based on ARWU and released twice per year since 2006

• Global in scope and based on the web presence and impact of universities

The Ranking is not:

• meant to evaluate websites, their design or usability, or the popularity of their contents according to the number of visits or visitors

• Rather, it measures:

• All of the universities’ tripartite mission: research, teaching and “the economic relevance of the technology transfer to industry, the community engagement”

Categories of the Webometrics Ranking

i. The Ranking Web of World Universities (Green)

ii. The Ranking of Institutional Repositories (Red)

iii. The Ranking of Hospitals (Grey)

iv. The Ranking of Research Centers (Blue)

v. The Ranking of Business Schools* (Orange)

Why Learn Google Scholar?

• The Google Scholar score carries a 30% weight in both i and ii above

• It is, thus, worth understanding the depth of Google Scholar

Google Scholar (GS)

• An indexer*

• A machine (Search Engine)

• A research tool

• A researcher’s ladder to the top

• Or, as a researcher, why search Google when you have Google Scholar???

All of Google Scholar

Searching Google Scholar

• Basic Search

Searching Google Scholar: Advanced search

Sorting search results

Google Scholar: H-Index (Hirsch Index or Hirsch Number)

• The h-index is an author-level metric that attempts to measure both the productivity and citation impact of the publications of a scientist or scholar based on the set of the scientist's most cited papers and the number of citations that they have received in other publications

Google Scholar: H-Index (Hirsch Index or Hirsch Number)

• The index can also be applied to the productivity and impact of a scholarly journal as well as a group of scientists, such as a department or university or country

• The index was suggested in 2005 by Jorge E. Hirsch, a physicist at University of California, San Diego as a tool for determining theoretical physicists' relative quality

H-Index simply defined

• Put simply, a scholar with an index of h has published h papers, each of which has been cited in other papers at least h times

• the h-index reflects both the number of publications and the number of citations per publication

H-index calculated

• If f is the function that corresponds to the number of citations for each publication, we compute the h index as follows:

• First, we order the values of f from the largest to the lowest value.

• Then, we look for the last position at which f is greater than or equal to the position number (we call this position h)

H-index cont’d

• Example: if we have a researcher with 6 publications A, B, C, D, E and F with 11, 9, 7, 4, 3 and 2 citations, respectively, the h-index is equal to 4 because the 4th publication has 4 citations and the 5th has only 3

• Tools for measuring H-Index: Web of Science, Scopus, Google Scholar
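The calculation steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `h_index` and the plain list of citation counts are assumptions made for the example, not something Google Scholar exposes.

```python
def h_index(citations):
    """Return the h-index for a list of per-publication citation counts."""
    # Step 1: order the citation counts from largest to smallest.
    ranked = sorted(citations, reverse=True)
    h = 0
    # Step 2: find the last position whose count is >= the position number.
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# Six publications with 11, 9, 7, 4, 3 and 2 citations.
print(h_index([11, 9, 7, 4, 3, 2]))  # 4
```

For these six publications the 4th paper has 4 citations (4 >= 4) while the 5th has only 3 (3 < 5), so the function returns 4.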

An example,

And this!

Google Scholar Metrics

• Google Scholar Metrics provide an easy way for authors to quickly gauge the visibility and influence of recent articles in scholarly publications.

• Scholar Metrics summarize recent citations to many publications, to help authors as they consider where to publish their new research.

Coverage of Publications

• Scholar Metrics currently cover articles published between 2011 and 2015, both inclusive. The metrics are based on citations from all articles that were indexed in Google Scholar in June 2016.

Included Publications:

• journal articles from websites that follow Scholar’s inclusion guidelines;

• selected conference articles in Computer Science and Electrical Engineering;

• preprints from arXiv, SSRN, NBER and RePEc; for these sites, metrics are computed for individual collections, e.g., "arXiv Superconductivity (cond-mat.supr-con)" or "CEPR Discussion Papers".

Excluded Publications:

• court opinions, patents, books, and dissertations;

• publications with fewer than 100 articles published between 2011 and 2015;

• publications that received no citations to articles published between 2011 and 2015.

The Scholar Metrics

Top Publications with highest h5-index

GS as Indexer: Getting Included

Channel 1: Individual Author’s Website

Upload your paper to your website, e.g., www.example.edu/~professor/jpdr2009.pdf, and add a link to it on your publications page, such as www.example.edu/~professor/publications.html.

Criteria for Inclusion

• the full text of your paper is in a PDF file that ends with ".pdf",

• the title of the paper appears in a large font on top of the first page,

• the authors of the paper are listed right below the title on a separate line, and

• there's a bibliography section titled, e.g., "References" or "Bibliography" at the end.

Once these are done, GS search robots should normally find your paper and include it.

Channel 2

Institutional Repositories

• Institutional repositories should use the latest version of popular repository software such as Eprints (eprints.org), Digital Commons (digitalcommons.bepress.com), or DSpace (dspace.org) to host researchers’ papers.

• Repositories must be configured for indexing in Google Scholar

Channel 3

Journal Publishers

• Three options:

• Use established journal hosting services, e.g., Atypon and Highwire

• Or

• Use Aggregators that host many journals on a single website, such as JSTOR or SciELO only if they support full-text indexing in GS

• Or

• Use Open Journal Systems (OJS) software that's available for download from the Public Knowledge Project (PKP) if you have technical expertise to manage your site

• The content of your website needs to meet two basic criteria:

• 1. It consists of scholarly articles: journal papers, conference papers, technical reports, or their drafts, dissertations, pre-prints, post-prints, or abstracts

• 2. Each article's page shows the abstract (or contains the full text of the article)

Guidelines for Contents

Things the site must avoid: it must not require users (or search robots) to sign in, install special software, accept disclaimers, dismiss popup or interstitial advertisements, click on links or buttons, or scroll down the page before they can read the entire abstract of the paper.

1. File formats must be HTML or PDF with searchable text, and each file must not exceed 5MB

2. A good browse interface that lets search robots discover your article URLs

Crawl Guidelines Note:

Just like Google Search, GS uses automated software, known as "robots" or "crawlers", to fetch your files for inclusion in the search results.

Guide for organizing a website containing a small number of publications:

• list all articles on a single HTML page, such as www.example.edu/~professor/publications.html, and include links to their full text in the PDF format

For sites containing thousands of publications:

• list them by the date of publication or the date of record entry, instead of relying on browse-by-author or browse-by-keyword interfaces

• create an additional browse interface that lists only the articles added in the last two weeks

• Flash, JavaScript, or form-based navigation makes it hard for GS’s automated system to find your articles, so if your site uses any of these, add a browse-by-date interface that uses only simple HTML GET links

3. Website availability: at all times to both crawler and users

4. Robots exclusion protocol:

Crawl Guideline cont’d

Your website should block robots from accessing large dynamically generated spaces that aren't useful in the discovery of your articles, such as shopping carts and comment forms. However, it must NOT block Google's search robots from accessing your articles or your browse URLs.
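One way to check this rule before deploying a site is to test the robots exclusion file with Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the rules in `ROBOTS_TXT`, the `www.example.edu` URLs, the helper names and the choice of "Googlebot" as the user agent are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks carts and comment forms, nothing else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /comments/
"""


def build_parser(robots_txt):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def blocked_urls(parser, urls, user_agent="Googlebot"):
    """Return the URLs that the given user agent is NOT allowed to fetch."""
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)

# Hypothetical article and browse URLs that must stay crawlable.
must_be_crawlable = [
    "https://www.example.edu/articles/2016/paper-01.pdf",
    "https://www.example.edu/browse/by-date.html",
]

print(blocked_urls(parser, must_be_crawlable))  # []
print(parser.can_fetch("Googlebot", "https://www.example.edu/cart/checkout"))  # False
```

An empty list from `blocked_urls` means none of the article or browse URLs is hidden from the crawler, while the cart URL remains blocked as intended.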

• Things to do:

1. When preparing article URLs:

Each paper must have its own unique URL in order for it to be included in Google Scholar. Place each article and each abstract in a separate HTML or PDF file.

2.a. When Configuring the meta-tags:

Configure your repository or journal management software to export bibliographic data in HTML "<meta>" tags, e.g.:

• The title tag, e.g., citation_title or DC.title, must contain the title of the paper

• The publication date tag, e.g., citation_publication_date or DC.issued, must contain the date of publication
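As a rough illustration of what such exported tags might look like, the Python sketch below builds the "<meta>" lines for one paper. The function name, the sample paper details and the date format are assumptions made for the example. The `citation_author` tag is not mentioned in these slides; it is taken from Google Scholar's inclusion guidelines. Repository software such as DSpace or OJS normally generates these tags itself.

```python
from html import escape


def scholar_meta_tags(title, authors, publication_date):
    """Build Google Scholar style <meta> tags for one paper's HTML page."""
    fields = [("citation_title", title)]
    # One citation_author tag per author, in the order they appear on the paper.
    fields += [("citation_author", author) for author in authors]
    fields.append(("citation_publication_date", publication_date))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in fields
    )


# Hypothetical paper details, used only to show the output shape.
print(scholar_meta_tags(
    "Open Access & Research Visibility",
    ["Doe, Jane", "Roe, Richard"],
    "2016/10/25",
))
```

Escaping the values with `html.escape` keeps characters such as "&" or quotation marks in a title from breaking the tag.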

Indexing Guidelines Note:

• GS uses automated software, known as "parsers", to identify bibliographic data of your papers, as well as references between the papers.

• Incorrect identification of bibliographic data or references will lead to poor indexing of your site.

2.b. Indexing of content without the meta-tags

If it is not possible to implement the HTML "<meta>" tags, e.g., if your papers are only available in the PDF format, then the document needs to be visually laid out according to the following conventions:

i. The title of the paper must be the largest chunk of text at the top of the page, e.g., in font size 24

ii. The authors of the paper must be listed right before or right after the title, in a slightly smaller font that is still larger than normal text, e.g., font size 16-23

iii. Include a bibliographic citation to a published version of the paper on a line by itself, placed inside the header or the footer of the first page of the PDF file; if the paper is unpublished, include the full date of its present version on a line by itself

iv. Avoid use of Type 3 fonts in PDF files, because they're often generated with missing or incorrect font size and character encoding information

3. When Marking the References

• Mark the section of the paper that contains references to other works with a standard heading, such as "References" or "Bibliography", on a line just by itself

• Individual references inside this section should be either numbered "1. - 2. - 3." or "[1] - [2] - [3]" in PDF, or put inside an "<ol>" list in HTML.

• The text of each reference must be a formal bibliographic citation in a commonly used format, without free-form commentary.

Note:

References are identified automatically by the parser software; they're not entered or corrected by human operators.
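The two reference layouts described above can be sketched in Python as follows. This is a hedged illustration only: both function names, the `<h2>` heading level and the placeholder citations are assumptions made for the example, not part of Google Scholar's guidelines.

```python
from html import escape


def references_to_html(references):
    """Render references as an HTML <ol> list under a standard heading."""
    items = "\n".join(f"  <li>{escape(ref)}</li>" for ref in references)
    return f"<h2>References</h2>\n<ol>\n{items}\n</ol>"


def number_references(references):
    """Render references as "[1] ...", "[2] ..." lines for a PDF or text source."""
    lines = ["References"]  # the heading sits on a line by itself
    lines += [f"[{n}] {ref}" for n, ref in enumerate(references, start=1)]
    return "\n".join(lines)


# Placeholder citations, not real works.
refs = [
    "Author, A. (2015). Title of the first work. Journal Name, 1(1), 1-10.",
    "Author, B. (2016). Title of the second work. Journal Name, 2(1), 11-20.",
]
print(references_to_html(refs))
print(number_references(refs))
```

Each entry is kept as a single formal citation with no free-form commentary, which is what the parser software expects.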

Bibliography

1. Google Scholar. https://scholar.google.com/intl/en/scholar/inclusion.html#indexing

2. Wikipedia. https://en.wikipedia.org/wiki/H-index#i10-index

3. Bill Hubbard. http://www.sherpa.ac.uk/documents/sherpaplusdocs/Nottingham-colour-guide.pdf

4. Peter Suber. https://osc.hul.harvard.edu/policies/

5. Cornell University Library. http://guides.library.cornell.edu/c.php?g=32272&p=203391

6. SPARC. http://www.sparc.arl.org/issues/open-access#sthash.EzrFGvc1.dpuf

7. Why Open Research? http://whyopenresearch.org/costs.html

8. http://www.123rf.com/photo_7911221_3d-man-on-bicycle.html

9. http://www.deviantart.com/tag/asante