Scientific Web Intelligence The Birth of a New Research Field Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK



The Problem

To map patterns of communication between researchers in a country based upon university web sites

Patterns of communication are also mapped based upon journal citations or journal title words

Provides useful information about the structure and evolution of research fields

Can identify previously unknown field connections

Web analysis could illustrate wider and more current patterns

Part 1: Hyperlink Analysis

Citation counts are known to be reasonable indicators of research quality, but is the same true for inlink counts?

Counts of links to universities within a country can correlate significantly with measures of research productivity

The significance of this result is in giving ‘permission’ to investigate the use of inter-university links for researching scholarly communication
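A minimal sketch of how such a correlation could be checked. The slides do not say which correlation measure was used; Spearman's rank correlation is assumed here because link counts are typically highly skewed. The inlink counts and productivity scores below are hypothetical illustration values, not data from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for five universities (not from the slides).
inlinks = [1200, 450, 3100, 800, 150]
productivity = [40.0, 22.5, 95.0, 30.0, 25.0]
rho = spearman(inlinks, productivity)
print(round(rho, 3))
```

A high coefficient on real data would only show that the two rankings agree; it says nothing about why universities attract links.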

Links to UK universities against their research productivity

The reason for the strong correlation is the quantity of Web publication, not its quality

This is different to citation analysis

Most links are only loosely related to research

90% of links between UK university sites have some connection with scholarly activity, including teaching and research

But less than 1% are equivalent to citations

So link counts do not measure research dissemination but are more a natural by-product of scholarly activity

Cannot use link counts to assess research

Can use link counts to track an aspect of communication

Some Hyperlink Patterns

Patterns in counts of links between university Web sites

Universities tend to link to neighbours

Universities cluster geographically

Language is a factor in international interlinking

English the dominant language for Web sites in the Western EU

In a typical country, 50% of pages are in the national language(s) and 50% in English

Non-English-speaking countries interlink extensively in English

{Research with Rong Tang & Liz Price}

Can map patterns of international communication

Counts of links between EU universities in Swedish are represented by arrow thickness.

Counts of links between EU universities in French are represented by arrow thickness.

Which language???

Which language???

Disciplinary Patterns

Links and subject areas

Linking patterns vary enormously by discipline

No evidence of a significant geographic trend

Disciplinary differences in the extent of interlinking: e.g., History Web use is very low, Chemistry is very high

Individual research projects can have an enormous impact upon individual departments

E.g. Arts web sites are often for specific exhibitions or for digital media projects

Links not frequent enough to reliably reveal patterns of interdisciplinarity

Stretching links: colinks, couplings

For the UK academic Web, about 42% of domains connected by links alone host similar disciplines, and about 43% of domains connected by links, colinks or couplings

But over 100 times more domains are colinked or coupled than are directly linked

Links in any form are less than 50% reliable as indicators of subject similarity
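The three kinds of connection can be sketched as follows, assuming the usual definitions: two domains are colinked when some third domain links to both, and coupled when both link to the same third domain. The function name, the input shape and the domain names are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def connected_pairs(links):
    """Classify domain pairs from a mapping {source: set of target domains}.

    Returns three sets of unordered pairs: directly linked, colinked
    (both linked to by the same source) and coupled (both linking to
    the same target). Self-links are ignored.
    """
    direct, colinked, coupled = set(), set(), set()
    inlinks = {}  # target -> set of sources linking to it
    for source, targets in links.items():
        others = targets - {source}
        for target in others:
            direct.add(frozenset((source, target)))
            inlinks.setdefault(target, set()).add(source)
        # Any two targets of one source are colinked by that source.
        for a, b in combinations(sorted(others), 2):
            colinked.add(frozenset((a, b)))
    # Any two sources sharing a target are coupled through that target.
    for sources in inlinks.values():
        for a, b in combinations(sorted(sources), 2):
            coupled.add(frozenset((a, b)))
    return direct, colinked, coupled


# Hypothetical link data between four made-up domains.
links = {
    "a.example.ac.uk": {"b.example.ac.uk", "c.example.ac.uk"},
    "d.example.ac.uk": {"c.example.ac.uk"},
}
direct, colinked, coupled = connected_pairs(links)
print(len(direct), len(colinked), len(coupled))
```

Because every source with n outlinks produces n(n-1)/2 colinked pairs, indirect connections quickly outnumber direct ones, which is consistent with the ratio reported above.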

Text Mining Approaches

Hyperlinks are not frequent enough or systematic enough to yield reliable evidence of connections at a low level

Alternative is to look for words in common

E.g., the frequency with which words associated with psychology are found in computer science web sites

Clustering web pages/sites based upon word occurrences (cf. journal title word clustering)
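A minimal sketch of the word-counting step, assuming each site's text has already been crawled and concatenated into one string per domain. It produces two figures per word: total occurrences and the number of domains containing the word. The function name, tokenisation rule and sample texts are assumptions made for illustration, not the method used in the study.

```python
import re
from collections import Counter


def word_stats(site_texts):
    """Count words across sites given {domain: text}.

    Returns (frequency, domains): total occurrences of each word, and
    the number of domains in which each word appears at least once.
    """
    frequency = Counter()
    domains = Counter()
    for text in site_texts.values():
        # Crude tokenisation: lowercase runs of ASCII letters only.
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        frequency.update(counts)        # adds per-site occurrence counts
        domains.update(counts.keys())   # adds 1 per word per domain
    return frequency, domains


# Hypothetical text from two made-up departmental sites.
site_texts = {
    "cs.example.ac.uk": "Cognition and memory models. Memory systems lab.",
    "psy.example.ac.uk": "Memory and cognition research.",
}
frequency, domains = word_stats(site_texts)
print(frequency["memory"], domains["memory"])
```

Restricting the counts to a vocabulary associated with one discipline, and looking them up in another discipline's sites, gives the kind of cross-subject signal described above.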

Text clustering – early results

Word         Frequency   Domains   Importance
business         59806       408     0.005902
marketing        16987       242     0.004476
finance           8300       217     0.002826
economics        15509       261     0.002726
banking           2010       123     0.002717
management       76754       465     0.002569
sitemap           2419        62     0.001874
accounting        8162       197     0.001613
auckland         55604       414     0.001546

Which discipline?

Word         Frequency   Domains   Importance
template          3356       147     0.001355
assignment       15610       240     0.001186
copyright        16780       278     0.001166
changed           7172       284     0.001152
sst                199        33     0.001071
semester         18364       319     0.001009
systems          44521       451     0.000949
lab               7709       261     0.000861
comments         16931       354     0.000842

Scientific Web Intelligence

Standard hyperlink and text mining approaches are inadequate for identifying low level inter-subject connections

Either extensive human intervention or artificial intelligence techniques are needed to extract useful information

Hence the founding of Scientific Web Intelligence

Scientific Web Intelligence

Objective: to combine techniques from Information Science, Web Mining and Web Intelligence to extract patterns of interdisciplinarity from university Web sites

Opportunities

Develop graphical techniques to display the data

Develop AI/Data Mining techniques to analyse the data

Extend the techniques to other domains, e.g. business web intelligence