Cybermetrics Theory and practice Isidro F. Aguillo Version 2 (Nov’11) [email protected]



2

Presenter: Isidro F. Aguillo

Current position
  Head, Cybermetrics Lab, Spanish National Research Council (CSIC)
Background
  MSc. Biology (Univ. Complutense, Madrid)
  MID (Univ. Carlos III, Madrid)
  DEA (Univ. Granada)
  Doctor Honoris Causa (Univ. Indonesia)
Research topics & other working activities
  Rankings Portal: webometrics.info
  Research projects: QEAVIS (e-humanities), MAVIR (multilingual Web), CARTO (R&D cartography), ICYTnet (Virtual Libraries)
  EU funded projects: ACUMEN (indicators portfolio for individuals), OpenAIRE (EU central repository), WISER (cybermetrics), EICSTES (R&D web indicators), PEKING (knowledge management), IMPACT-INFO2000 (information society)
  Founder and editor of the e-journal “Cybermetrics”
  300 seminars and conferences in over 100 universities all over the world

3

Agenda
I. Descriptive Cybermetrics
  Methods and tools
  Web indicators
II. Applied Webometrics
  Positioning in search engines
  Optimising web contents
III. Usagemetrics
  Log files and visits analysis
  Popularity

Descriptive Cybermetrics Web Analysis

See also: Usability, Accessibility, Web Metrics

MODULE 1

5

Cybermetrics is the discipline devoted to the quantitative description of the contents and communication processes that take place in cyberspace
  Cyberspace is the set of contents accessible in electronic format. Because the Internet is universally accessible, the term is used as a synonym for the Internet of contents: basically, but not exclusively, the webspace

Since cyber-scientometrics is the most developed subfield, for practical reasons it is referred to either by the more general term Cybermetrics or by the more specific term Webometrics

Definition

6

Quantitative disciplines

[Figure: overlapping fields labelled informetrics, bibliometrics, scientometrics, cybermetrics, webometrics and cyberscientometrics. Adapted from Björneborn]

7

Relationships

[Figure: scientometrics (basic and applied) and its relationships. Field labels: Webometrics, Informetrics, Mathematics/Physics, Librarianship and Documentation, Sociology of science, History of science, Economy, Life sciences, Other sciences/Humanities. Application labels: Scientific documentation, Services for / Investigation in Libraries, Scientific policy, Investigation management]

Source: www.ulb.ac.be/unica/docs/Sch-com-2004-pres-Glanzel.ppt

8

Web presence reflects the activities of an institution or individual more fully than traditional publications on paper
  In academia, professors, researchers and students put on the Web unpublished material, first drafts, preliminary versions of papers, course materials, presentation slides or databases

The Web reaches a larger audience than traditional scientific communication media
  Scientific journals have a restricted distribution

The hypertext nature of the Web makes it possible to discover hidden patterns between different institutional sites
  Academic sites link to other sites of a marked economic, industrial, cultural, political or social character

Advantages of the quantitative approach

9

New application areas

Webometrics
  Topology of hypertext networks
  Social networks
  PageRank, HITS
  Comparative analysis of search engines
Cyberscientometrics
  Studies of e-mail and forums
  “Big Science” & Grid
  Cybergeography and cyberdemography
  New units: institutional Web sites
  New indicators
    Visibility
    Popularity

10

Cybergeography, cyberdemography
Data and sources
  Internet Geography Project  www.zooknic.com
  Cybergeography  www.cybergeography.org
  Clickz Surveys  www.clickz.com/stats
  Blog  www.internetworldstats.com/blog.htm
  Demography and Geography of the Internet
    www.sociosite.org/demography.php
    www.sociosite.net/topics/webgeography.php
  Internet Demographics Directory  internet-demographics.netfirms.com

11

Cyberdemography (I)

www.internetworldstats.com/stats.htm

12

Cyberdemography (II)

13

Cyberdemography (III)

www.internetworldstats.com/stats7.htm

14

Size of Internet: Infrastructures
Hosts
  Lottor (World)  www.isc.org/ds
  RIPE (Europe)  www.ripe.net/info/stats/hostcount/
  Asia Web Watch  www.ciolek.com/Asia-Web-Watch/main-page.html
Servers
  Netcraft  www.netcraft.com
Domains
  World  www.norid.no/domenenavnbaser/domreg.html
  Domain worldwide
    www.domainworldwide.com
    www.verisign.com/Resources/Naming_Services_Resources/Domain_Name_Industry_Brief/
  Germany (and others)  www.denic.de/en/domains/statistiken
  Studies (outdated)  www.zooknic.com

15

Internet evolution (Lottor)

16

Lottor

http://ftp.isc.org/www/survey/reports/2011/01/bynum.txt

Web servers

17

http://news.netcraft.com/archives/web_server_survey.html

18

Web contents
Webspace
  Spireproject: 10,000 million (10/02)  spireproject.com/art13.htm
  Present day: 40+40,000 million
Deposits
  Archive  www.archive.org
  Google Cache  www.google.com
Traffic
  80% of browsing sessions on the Web involve the use of a search engine or a directory. Yahoo and, especially, Google are the most important intermediaries

19

Wayback Machine

20

The problem with the gTLD
gTLD
  First ones: .com, .org, .net, .int (.eu.int)
  New ones: .biz, .info, .name, .aero, .coop, .museum, .eu, .cat
  De facto: .cx, .tv, .cc
  Special cases: .edu
Experiments
  Google/Bing/Exalead
    Filter operator “site:”
    Problems with some ccTLD
      Domains and countries
      International domains (gTLD)
  IP translators
    IP Locator 1.41
    AW IP Locator 1.8  www.atelierweb.com/iploc
    IP Address Locator  www.geobytes.com/IpLocator.htm?GetLocation
    Ip2location  www.ip2location.com/free.asp

21

Google: Languages and countries

22

Mentions

23

Academic Webspace

Sites
  Institutional domains
    OCLC Web Characterization (1998-2002)  http://www.oclc.org/research/projects/archive/wcp/
Sites and institutional sites
  Netcraft, October 2011: 500 million web sites
  About 50% active (250 million) × 5-10 institutional sites per site ≈ 2,000 million institutional sites
Academic webspace
  Academic subdomains
    Not every country

24

Academic subdomains

ac.ae  ac.in  ac.rw    edu.am  edu.cn  edu.hk  edu.mm  edu.pk  edu.ua
ac.at  ac.ir  ac.se    edu.ar  edu.co  edu.hn  edu.mn  edu.pl  edu.uy
ac.bd  ac.je  ac.sg    edu.au  edu.cu  edu.hu  edu.mo  edu.pr  edu.ve
ac.be  ac.jp  ac.sz    edu.az  edu.dm  edu.jm  edu.mp  edu.pt  edu.vg
ac.bw  ac.ke  ac.th    edu.ba  edu.do  edu.jo  edu.mt  edu.py  edu.vn
ac.by  ac.kr  ac.tz    edu.bb  edu.dz  edu.kg  edu.mx  edu.qa  edu.ws
ac.ci  ac.lk  ac.ug    edu.bh  edu.ec  edu.kh  edu.my  edu.ru  edu.ye
ac.cn  ac.lv  ac.uk    edu.bm  edu.ee  edu.kn  edu.na  edu.sa  edu.yu
ac.cr  ac.ma  ac.uz    edu.bn  edu.eg  edu.kw  edu.nf  edu.sg  edu.za
ac.cy  ac.mu  ac.vn    edu.bo  edu.gd  edu.ky  edu.ng  edu.sh  edu.zm
ac.fj  ac.mz  ac.yu    edu.br  edu.ge  edu.kz  edu.ni  edu.st
ac.gg  ac.nz  ac.za    edu.bs  edu.gh  edu.lb  edu.np  edu.sv
ac.gs  ac.pa  ac.zm    edu.bt  edu.gr  edu.lc  edu.om  edu.to
ac.id  ac.pg  ac.zw    edu.by  edu.gs  edu.li  edu.pa  edu.tr
ac.il  ac.pl  acad.bg  edu.bz  edu.gt  edu.lv  edu.pe  edu.tt
ac.im  ac.ru  edu.al   edu.ck  edu.gu  edu.mk  edu.ph  edu.tw
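To delimit an academic webspace, hostnames can be tested against the academic subdomains listed on this slide. The following is a minimal sketch under stated assumptions: the suffix set is only an illustrative subset of the list, the function name `is_academic_host` is hypothetical, and treating the bare `.edu` gTLD as academic follows the deck's "Special cases: .edu" note.

```python
# Illustrative subset of the academic second-level suffixes listed on the slide.
ACADEMIC_SUFFIXES = {"ac.uk", "ac.jp", "ac.za", "edu.au", "edu.cn", "edu.mx", "acad.bg"}


def is_academic_host(hostname, suffixes=ACADEMIC_SUFFIXES):
    """Return True if the hostname sits under an academic suffix or under .edu."""
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] == "edu":
        # Special case: the .edu gTLD is academic on its own.
        return True
    # Require at least one label in front of the two-label suffix.
    return len(labels) >= 3 and ".".join(labels[-2:]) in suffixes


if __name__ == "__main__":
    for host in ("www.ox.ac.uk", "www.unam.edu.mx", "www.mit.edu", "www.example.com"):
        print(host, is_academic_host(host))
```

A production list would need every suffix from the table, and countries without an academic subdomain (the slide's "Not every country") need a curated list of institutional domains instead.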

25

Academic databases
Public Web
  Google Scholar  scholar.google.com
    Publish or Perish  www.harzing.com/pop.htm
    Citations Gadget  code.google.com/p/citations-gadget/
  MS Academic Search  academic.research.microsoft.com
  Scirus  www.scirus.com
  CiteSeerX  citeseerx.ist.psu.edu
  Citebase  www.citebase.org
  Paracite  paracite.eprints.org
  DBLP  dblp.uni-trier.de
  ScienceDirect  www.sciencedirect.com
  (US) Science Gov  www.science.gov
  In-extenso  www.in-extenso.org

26

Context

[Figure: public Web versus private Web and visible Web versus invisible Internet, locating databases, repositories and electronic journals]

27

Google Scholar

28

Scholar (II)

28

Papers in university domains (January ’07)

29

Scholar: Publish or Perish

30

Google Scholar Citations (testing)

31

Microsoft Academic Search

32

MAS Author entry

33

MAS Institution entry

34

MAS Comparing institutions

35

CiteSeerX

36

Rich files and media files
Rich files
  Definition and types
    Adobe Acrobat (pdf) and Postscript (ps)
    MS Office: Word (doc, rtf), Excel (xls), Powerpoint (ppt)
  Size
    Filter operators: filetype (Google, Live, Exalead)
Media files
  Definition and types
    FilExt  www.filext.com
  Locating them in search engines
    Terms
    Filter operators
    Autonomous databases

37

Google (filetype)

38

Bing (filetype)

39

Images in search engines

40

Languages on the Net

Sources and studies
  Users according to language
    Global Reach  global-reach.biz/globstats/index.php3
  Composition of the webspace
    Experiments with search engines: Google, Yahoo!, Bing (ex-Live) Search, Ask (Teoma), Copernic

41

Users according to language

http://www.glreach.com/globstats/index.php3

42

Languages on the Net

Languages used to access Google  www.google.com/press/zeitgeist.html

43

Languages (Google)

Language      <lr> value     Language     <lr> value
Arabic        lang_ar        Icelandic    lang_is
Chinese (S)   lang_zh-CN     Italian      lang_it
Chinese (T)   lang_zh-TW     Japanese     lang_ja
Czech         lang_cs        Korean       lang_ko
Danish        lang_da        Latvian      lang_lv
Dutch         lang_nl        Lithuanian   lang_lt
English       lang_en        Norwegian    lang_no
Estonian      lang_et        Portuguese   lang_pt
Finnish       lang_fi        Polish       lang_pl
French        lang_fr        Romanian     lang_ro
German        lang_de        Russian      lang_ru
Greek         lang_el        Spanish      lang_es
Hebrew        lang_iw        Swedish      lang_sv
Hungarian     lang_hu        Turkish      lang_tr

44

Countries (Google)

Country and code (five pairs per row)

Andorra AD | Bhutan BT | Estonia EE | Guinea-Bissau GW | Kazakhstan KZ
United Arab Emirates AE | Bouvet Island BV | Egypt EG | Guyana GY | Lao PDR LA
Afghanistan AF | Botswana BW | Western Sahara EH | Hong Kong HK | Lebanon LB
Antigua and Barbuda AG | Belarus BY | Eritrea ER | Heard and Mc Donald Islands HM | Saint Lucia LC
Anguilla AI | Belize BZ | Spain ES | Honduras HN | Liechtenstein LI
Albania AL | Canada CA | Ethiopia ET | Croatia (Hrvatska) HR | Sri Lanka LK
Armenia AM | Cocos (Keeling) Islands CC | European Union EU | Haiti HT | Liberia LR
Netherlands Antilles AN | Congo, DR CD | Finland FI | Hungary HU | Lesotho LS
Angola AO | Central African Republic CF | Fiji FJ | Indonesia ID | Lithuania LT
Antarctica AQ | Congo CG | Falkland Islands (Malvinas) FK | Ireland IE | Luxembourg LU
Argentina AR | Switzerland CH | Micronesia, FS FM | Israel IL | Latvia LV
American Samoa AS | Cote D'ivoire CI | Faroe Islands FO | India IN | Libya LY
Austria AT | Cook Islands CK | France FR | British Indian Ocean Terr. IO | Morocco MA
Australia AU | Chile CL | France, Metropolitan FX | Iraq IQ | Monaco MC
Aruba AW | Cameroon CM | Gabon GA | Iran IR | Moldova MD
Azerbaijan AZ | China CN | United Kingdom UK | Iceland IS | Madagascar MG
Bosnia and Herzegovina BA | Colombia CO | Grenada GD | Italy IT | Marshall Islands MH
Barbados BB | Costa Rica CR | Georgia GE | Jamaica JM | Macedonia, FYR MK
Bangladesh BD | Cuba CU | French Guiana GF | Jordan JO | Mali ML
Belgium BE | Cape Verde CV | Ghana GH | Japan JP | Myanmar MM
Burkina Faso BF | Christmas Island CX | Gibraltar GI | Kenya KE | Mongolia MN
Bulgaria BG | Cyprus CY | Greenland GL | Kyrgyzstan KG | Macau MO
Bahrain BH | Czech Republic CZ | Gambia GM | Cambodia KH | Northern Mariana Islands MP
Burundi BI | Germany DE | Guinea GN | Kiribati KI | Martinique MQ
Benin BJ | Djibouti DJ | Guadeloupe GP | Comoros KM | Mauritania MR
Bermuda BM | Denmark DK | Equatorial Guinea GQ | Saint Kitts and Nevis KN | Montserrat MS
Brunei Darussalam BN | Dominica DM | Greece GR | Korea, DPR KP | Malta MT
Bolivia BO | Dominican Republic DO | South Georgia/South Sandwich I. GS | Korea, Republic of KR | Mauritius MU
Brazil BR | Algeria DZ | Guatemala GT | Kuwait KW | Maldives MV
Bahamas BS | Ecuador EC | Guam GU | Cayman Islands KY | Malawi MW

45

Countries II (Google)

Country and code (three pairs per row)

Mexico MX | Qatar QA | Tokelau TK
Malaysia MY | Reunion RE | Turkmenistan TM
Mozambique MZ | Romania RO | Tunisia TN
Namibia NA | Russian Federation RU | Tonga TO
New Caledonia NC | Rwanda RW | East Timor TP
Niger NE | Saudi Arabia SA | Turkey TR
Norfolk Island NF | Solomon Islands SB | Trinidad and Tobago TT
Nigeria NG | Seychelles SC | Tuvalu TV
Nicaragua NI | Sudan SD | Taiwan TW
Netherlands NL | Sweden SE | Tanzania TZ
Norway NO | Singapore SG | Ukraine UA
Nepal NP | St. Helena SH | Uganda UG
Nauru NR | Slovenia SI | United States Minor Outlying I. UM
Niue NU | Svalbard and Jan Mayen Is. SJ | United States US
New Zealand NZ | Slovakia (Slovak Republic) SK | Uruguay UY
Oman OM | Sierra Leone SL | Uzbekistan UZ
Panama PA | San Marino SM | Holy See (Vatican City State) VA
Peru PE | Senegal SN | Saint Vincent and the Grenadines VC
French Polynesia PF | Somalia SO | Venezuela VE
Papua New Guinea PG | Suriname SR | Virgin Islands (British) VG
Philippines PH | Sao Tome and Principe ST | Virgin Islands (U.S.) VI
Pakistan PK | El Salvador SV | Vietnam VN
Poland PL | Syria SY | Vanuatu VU
St. Pierre and Miquelon PM | Swaziland SZ | Wallis and Futuna Islands WF
Pitcairn PN | Turks and Caicos Islands TC | Samoa WS
Puerto Rico PR | Chad TD | Yemen YE
Palestine PS | French Southern Territories TF | Mayotte YT
Portugal PT | Togo TG | Yugoslavia YU
Palau PW | Thailand TH | South Africa ZA
Paraguay PY | Tajikistan TJ | Zambia ZM

46

Lists of universities

Braintrack  www.braintrack.com
Universities Worldwide  univ.cc
Galilei  www.galilei.com.ar
Webometrics Catalogue  www.webometrics.info/university_by_country_select.asp
HEIR  siu.no/heir
General Education Online  www.findaschool.org
International Colleges and Universities  www.4icu.org
Portal Tecnociencia  www.tecnociencia.es
Universia  www.universia.es
Canadian Universities  www.uwaterloo.ca/canu
U.S. Universities by State  www.utexas.edu/world/univ/state
Top American Research Universities  thecenter.ufl.edu
UK Higher Education Map  www.scit.wlv.ac.uk/ukinfo/uk.map.html
Times World Universities Rankings  www.thes.co.uk/worldrankings
German University Ranking  www.university-ranking.org
Academic Ranking of World Universities  ed.sjtu.edu.cn/ranking.htm
All Universities around the World  www.bulter.nl/universities
Ranking of China Universities  rank2005.netbig.com
Alphabetical Index of Japanese Universities  camp.ff.tku.ac.jp/TOOL-BOX/JapanUNIV

47

Personal agents (I)
Website extractors
  AaronWebVacuum 2.9  www.surfwarelabs.com
  JOC WebSpider 5.7  www.jocsoft.com
  Teleport Pro 1.64  www.tenmax.com
  Leech 4.3  www.aeria.com
  WebCopier 5.4  www.maximumsoft.com
  BlackWidow 6.28  www.softbytelabs.com
  MemoWeb 4.0  www.goto.fr
  Offline Commander 2.1  www.zylox.com
  WebReaper 10  www.webreaper.net
  Offline Explorer Pro 5.9  www.metaproducts.com
  Website Extractor 10.0  www.asona.org
  WebWhacker 5.0  www.bluesquirrel.com
  WebZip 7.1  www.spidersoft.com
  Website2PDF 1.0  www.spidersoft.com
  Medusa 1.2  www.candego.com

48

Personal agents (II)

Link checkers
  Alert LinkRunner 6.01  www.alertbookmarks.com/lr
  HTML Link Validator 4.47  www.lithopssoft.com
  HTML Validator Professional 11  www.htmlvalidator.com
  Link Checker Pro 3.3  www.link-checker-pro.com
  LinkScan Workstation 12.1  www.elsop.com
  Web Link Validator 5.5  www.relsoftware.com/wlv
  Xenu's Link Sleuth 1.3  home.snafu.de/tilman/xenulink.html

49

Personal agents (III)

HTML extractors
  WebData Extractor 6.0  www.webextractor.com
Experiments
  Site extraction with the offline browser Teleport Pro
  Mapping of the extracted site with Xenu
    Link checking
  Direct mapping of the site with Xenu
    Link checking
  Size of the site according to the search engines
    Google, Yahoo, Exalead, Ask, Gigablast

50

WebDataExtractor

51

Website extraction, checking and mapping

52

Cybermetrics of search engines
Search engines: Characteristics and problems
8 “different” big search engines
  Google
  Yahoo Search (now supplied by Bing)
  Bing (ex-Live) Search
  Ask (ex-Teoma)
  Exalead
  Wisenut
  Gigablast
  Alexa
Studies about search engines
  Search Engine Showdown  searchengineshowdown.com
  Search Engine Watch  searchenginewatch.com

53

Only seven (+one)?

[Table: search sites and the database supplying each (columns: Site, Database) for 2003, 2004-2005 and 2006-2007. Sites: Google, Netscape, Yahoo, AltaVista, AllTheWeb, Lycos, iWon, HotBot, MSN Search, Live, Teoma, Ask Jeeves, Ask, Alexa, A9, Exalead, Wisenut, Gigablast, HereUAre. Databases named: Google, Yahoo, AltaVista, AllTheWeb, FAST, Inktomi, Teoma, Ask, MSN Search, Live, Alexa, Exalead, Wisenut, Gigablast]

54

Cybermetrics of search engines

              GOOGLE            BING (LIVE)     EXALEAD            ASK             GIGABLAST
TLD           site:xx           site:xx         site:xx            site:xx         site:xx
Domain        site:aa.xx        site:aa.xx      site:aa.xx         site:aa.xx      site:aa.xx
Directory     site:aa.xx/bb     site:aa.xx/bb   NO                 site:aa.xx/bb   NO
Word in url   inurl:xx          NO              inurl:xx, url:xx   inurl:xx        inurl:xx
Link          link:aa.xx/b.htm  NO              link:www.aa.xx     (NO)            (NO)
Link domain   NO                NO              link:aaa.xx        NO              NO
File type     filetype:yy       filetype:yy     filetype:yy        filetype:yy     filetype:yy
Language      Advanced          Advanced        Advanced           Advanced        NO
Country       Advanced          (Advanced)      Advanced           Advanced        NO
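The filter operators in the table can be combined into query strings for counting pages or rich files per domain. The sketch below only builds the query text; it does not call any search engine, the helper names are hypothetical, and whether a given engine still honours an operator is an assumption to verify before use.

```python
def site_query(domain, filetype=None, term=None):
    """Build a query restricted to a domain, optionally to a file type and a term."""
    parts = []
    if term:
        parts.append(term)
    parts.append(f"site:{domain}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def external_inlink_query(host, domain):
    """Build an Exalead-style query for links to a host, excluding the site's own pages."""
    return f"link:{host} -site:{domain}"


if __name__ == "__main__":
    print(site_query("csic.es"))                      # size of a domain
    print(site_query("csic.es", filetype="pdf"))      # rich files within it
    print(external_inlink_query("www.csic.es", "csic.es"))
```

The second helper follows the "link: -site:" combination that the deck lists for Exalead under inlinks.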

55

URL-mention

56

Outlinks

57

Quality, visibility and impact
Quantitative evaluation of institutional websites
The Google model
  ToolBar installation (toolbar.google.com)
  Page Rank
    Logarithmic scale
      rankwhere.com/google-page-rank.php
      www.rustybrick.com/pagerank-prediction.php
    Components: visibility + weight
Visibility
  Types of links: inlinks, outlinks, self-links, back-links
  Calculation using search engines
  Web impact (WebIF)
  Link quality: Link inspectors
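The slide names PageRank as Google's visibility measure without describing how it is computed. As background, here is a minimal sketch of the published PageRank idea (power iteration with a damping factor). The toy graph, the damping value 0.85, the iteration count and the function name are assumptions for illustration; this is not Google's production ranking, nor the 0-10 logarithmic value shown by the toolbar.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same base share from the random jump.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_web = {"A": ["C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 3))
```

In the toy graph, C is linked from two pages and ends up ranked highest, which matches the slide's reading of rank as visibility (number of inlinks) combined with the weight of the linking pages.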

58

Google Toolbar

59

RankWhere

60

PageRank Prediction

61

urltrends

62

Nutch

63

Popularity
Number of visits
  Difficult to obtain for comparative studies
Relative position
  Popularity according to www.alexa.com
    Only domains
    Worldwide coverage
    Some “absolute” values
    Temporal evolution
    Geographic biases (>> Asia)
  Snapshot  snapshot.compete.com
    Only USA!!!
  Ranking.com  www.ranking.com
  Traffic Estimate  www.trafficestimate.com
  Popularity according to Netcraft  toolbar.netcraft.com/site_report
    Institutional sites and variants
    More restricted coverage
Not comparable

64

Alexa

65

Limits of Alexa

66

Inequalities in Alexa

Position         % visits
Top 3            23
Top 500          45
Number 10        5
Number 100       0.1
Number 1,000     0.06
Number 10,000    0.02

67

Snapshot

68

Ranking.com

69

Netcraft

70

Working with links

Visibility
  Inlinks (incoming links)
    Yahoo Site Explorer
    Exalead: link: -site:
  Outlinks (outgoing links) = Luminosity
    Link inspectors
Web impact
  Definition of WebIF
    Calculation = Visibility / size
Quality
  Link checkers

71

Basic terminology

B has an outlink to C: ~ reference
B has an inlink from A: ~ citation
B has a selflink: ~ self-citation
E and F are reciprocally linked
A is transitively linked with H via B-D
A has a transversal link to G: short cut
C and D are co-linked from B, i.e. shared inlinks: co-citation
B and E are co-linking to D, i.e. shared outlinks: bibliographic coupling

[Figure: link diagram of nodes A to H illustrating the relations above, with the co-links marked]
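The link terminology on this slide maps directly onto set operations over a directed graph. The sketch below is an illustration only: the edge list is reconstructed from the slide's sentences (the figure itself is not available), so the link directions are an assumption, and all function names are hypothetical.

```python
# Directed links (source, target) reconstructed from the slide's statements.
EDGES = {
    ("A", "B"), ("B", "B"), ("B", "C"), ("B", "D"),
    ("E", "D"), ("E", "F"), ("F", "E"), ("D", "H"), ("A", "G"),
}


def outlinks(node, edges=EDGES):
    """Pages that node links to (~ references), ignoring selflinks."""
    return {t for s, t in edges if s == node and t != node}


def inlinks(node, edges=EDGES):
    """Pages that link to node (~ citations), ignoring selflinks."""
    return {s for s, t in edges if t == node and s != node}


def has_selflink(node, edges=EDGES):
    """True if node links to itself (~ self-citation)."""
    return (node, node) in edges


def reciprocal(a, b, edges=EDGES):
    """True if a and b link to each other."""
    return (a, b) in edges and (b, a) in edges


def co_linked_from(a, b, edges=EDGES):
    """Sources linking to both a and b (shared inlinks, ~ co-citation)."""
    return inlinks(a, edges) & inlinks(b, edges)


def co_linking_to(a, b, edges=EDGES):
    """Targets linked from both a and b (shared outlinks, ~ bibliographic coupling)."""
    return outlinks(a, edges) & outlinks(b, edges)


if __name__ == "__main__":
    print(co_linked_from("C", "D"))   # sources shared by C and D
    print(co_linking_to("B", "E"))    # targets shared by B and E
```

On real data the edge set would come from a crawler or link checker rather than a hand-written list.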

72

Cyberscientometrics
Development of R&D indicators in the Web
  Units
    Institutional site
  Models
  Indicators
Co-sitation, social networks and theory of the “small world”
  Small World  www.db.dk/lb/2002smallworld.pps
Bibliometrics of e-journals and document repositories
  CiteSeerX  citeseerx.ist.psu.edu
  CiteBase  citebase.eprints.org/cgi-bin/search
  Google Scholar  scholar.google.com
  Arxiv  arxiv.org
  Scirus  www.scirus.com
  DBLP  dblp.uni-trier.de

73

Web indicators

[Figure: diagram relating Web indicators to R&D indicators and Information Society indicators (input / output). Discipline labels: Scientometrics, Bibliometrics, Patentometrics, Webometrics, Cybermetrics]

74

Building Indicators
Experiments
  Codification
    Institutional
    Subject (UNESCO)
    Geographic (NUTS)
  Indicators calculation
    Visibility (sitations)
      Visibility of the rich files
      Visibility of articles in repositories
      Visibility of electronic journals
    Impact (WebIF)
    Diversity
    Co-citation

75

Web Impact factor (WebIF)
  Visibility (sitations) / Size (No. of pages)
Webometrics (Academic) Rank
  Composite indicators
    Size
      No. of Webpages
      No. of files
        Rich files: pdf, ppt, doc, ps
      No. of papers
        Google Scholar
        Other bibliographic databases
    Visibility
      Incoming external links
      Mentions
    Popularity
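The WebIF ratio defined on this slide (visibility divided by size) can be sketched as follows. The function name and the sample counts are hypothetical; in practice both figures would come from search engine queries or a crawl.

```python
def web_impact_factor(external_inlinks, pages):
    """WebIF = visibility (incoming external links) / size (number of pages)."""
    if pages <= 0:
        raise ValueError("the site must have at least one page")
    return external_inlinks / pages


if __name__ == "__main__":
    # Hypothetical counts for two sites of very different size.
    print(web_impact_factor(1200, 400))    # large site
    print(web_impact_factor(90, 20))       # small site
```

Because size is the denominator, a small site with few inlinks can score higher than a large, well-linked one, which is one reason to read WebIF alongside the absolute size and visibility counts.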

76

Webometrics Ranking

www.webometrics.info

77

Size (number of pages)

78

Direct crawling

79

Other rankings

http://vcmike.blogspot.com/2006/01/ranking-colleges-using-google-and-oss.html

80

Other rankings: G-factor

http://www.universitymetrics.com/g-factor

81

Related (I)

82

Related (II)

Applied Cybermetrics
Search Engine Optimization (SEO)

Web Positioning

MODULE 2

84

Applied Cybermetrics
The aim is not only to publish on the Web, but to gain visibility
  Getting a large number of visits (a real audience close to the potential one)
  Receiving external links
  Being present in directories and portals
A search engine is used in 80% of web sessions
  Web positioning is the key to increasing visibility
Quality influences the chances of a good position, but so do...
  The volume of information
  The hypertext structure
  The annotation of contents

85

Positioning
Presence measurements
  Directory indexing
  Pages actually indexed by a search engine / Total pages
Visibility measurements
  Page Rank
  Prominence by terms
Measurements of access and usage
  Popularity
    • Absolute: Number of visits
    • Relative: Alexa Ranking
  Usage
    • Number of downloaded files
    • Average time per visit
    • Most frequent referral terms

86

PageRank Google

87

Problems
Design is irrelevant, or even counterproductive
  Few indexable contents on the main page
  Flash animations or Java applets that hinder the robots’ navigation
Invisible Internet
  Databases and dynamic web pages cannot be indexed by search engines
Link quality
  External and internal links need continuous maintenance and updating
Rich files
  Document files are handy for distributing information with added value
    • Formats: pdf, ppt, doc, ps

88

Tools
  Webmasters World  tools.webmastersworld.org
  SEO Encyclopedia  www.seopedia.info
  Webmasters Tools  tools.devshed.com
  SEO Online  www.seoonline.info
  PageStrength  www.seomoz.org/tools/page-strength.php
  Data Centers Tool  www.seocritique.com/datacentertool
  SEO Tools  www.seochat.com/seo-tools
  SEO Web Directory  www.seowebdirectory.com/SEO_Tools
  SEO Company  www.seocompany.ca/tool/seo-tools.html
  SEO ToolSet  www.webconfs.com

89

90

91

Criteria (Google)
Hypertext structure
  Maturity: Depth of the institutional sites
  Visibility: PageRank
  Neighborhood: External and internal links
Number of times that the search terms appear
Relative position of the search terms
  Title and URL
  Metadata
  Headings
  ALT tags and external anchors
Updating periodicity
  Freshness (new contents)
Popularity: Page visits
Local aspects (geographic, languages)

92

Criteria (Google)

93

Presence of terms in the URL
Very relevant
  Preferably in the domain or subdomain
  Recommended no longer than 30 characters
  The order is important
    http://better.good.xx/aceptable
Whole words, not truncated
  http://lib.univ.edu (NO)
  http://library.university.edu (YES)
Independent terms/phrases (dash/underscore)
  Universidad-Complutense = +Universidad +Complutense
  Universidad_Complutense = “Universidad Complutense”

94

Agapea

95

Presence of terms in Title
Very relevant
  Contents of the <TITLE> tag!!!
    Keywords, not a title
    The position is important: select the first words carefully
    Long phrase, without stop words (~60 characters)
    Don't repeat terms; bilingual option
    Institutional identification, geographic localization
  The contents of the <Hn> heading tags are also considered
    The <H1> heading provides the title
    Move generic words (“Hello”, “Welcome”, “Page of”) to lower levels, <H2> or <H3>

96

Terms in Title

97

Metatags
They are not so important
Description
  Up to 250 characters
  Reusable tag for versions in other languages
  The position is important: choose the first words wisely
  Don’t repeat words
Keywords
  Up to 20 terms
  Terms SHOULD also appear in the text
  Reusable tag for versions in other languages
  The position is important: choose the first words wisely
  Don’t repeat words
Description pre-cataloging
  Use other tags: Dublin Core model (15 repeatable elements)
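The length recommendations from this slide and the title slide (title of about 60 characters, description up to 250 characters, up to 20 keywords) can be checked automatically. This is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, the keywords are assumed to be comma-separated, and the thresholds are the deck's rules of thumb rather than limits published by any search engine.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collect the <title> text and the named <meta> tags of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_head(html):
    """Check title, description and keywords against the slide's limits."""
    parser = HeadAudit()
    parser.feed(html)
    keywords = [k.strip() for k in parser.meta.get("keywords", "").split(",") if k.strip()]
    return {
        "title_ok": 0 < len(parser.title.strip()) <= 60,
        "description_ok": 0 < len(parser.meta.get("description", "")) <= 250,
        "keywords_ok": 0 < len(keywords) <= 20,
    }


if __name__ == "__main__":
    page = (
        "<html><head><title>Cybermetrics Lab CSIC Madrid</title>"
        '<meta name="description" content="Quantitative analysis of the Web.">'
        '<meta name="keywords" content="cybermetrics, webometrics, rankings">'
        "</head><body></body></html>"
    )
    print(audit_head(page))
```

A fuller audit could also flag repeated words and check that each keyword appears in the body text, as the slide recommends.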

98

Generating META tags
  Meta Builder 2  vancouver-webpages.com/META/mk-metas.html
  Meta Tags Generator  www.meta-tags.us
  MetaTags Generator  tools.webmastersworld.org/MetatagsGenerator.php
  Meta Tag Generator  www.invision-graphics.com/meta-tag-generator.html
  Meta Tag Generator  www.submitcorner.com/Tools/Meta
  DC-Dot  www.ukoln.ac.uk/metadata/dcdot/

99

Keywords in text
Select them correctly
  Study synonyms, variants and similar terms in other languages
  Analyze usage in search engines
Density
  Total: Up to 25%
  Individual: Up to 5%
Position
  Heading tags <Hn>
  First paragraphs
  Font modifying tags
    Bold <B><strong>; Italic <I>; Font size
  Promote the proximity of terms (where appropriate)
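The density limits above can be measured on a page's text. The sketch below assumes one reading of the slide: "individual" is the share of all words taken by one keyword, and "total" is the combined share of all chosen keywords. The function name, the simple word tokenizer and the sample text are assumptions for illustration.

```python
import re
from collections import Counter


def keyword_density(text, keywords):
    """Return (per-keyword density, total density) as fractions of all words."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {k: 0.0 for k in keywords}, 0.0
    counts = Counter(words)
    individual = {k: counts[k.lower()] / len(words) for k in keywords}
    return individual, sum(individual.values())


if __name__ == "__main__":
    sample = "Web metrics for the web"
    per_keyword, total = keyword_density(sample, ["web", "metrics"])
    # Flag keywords above the slide's 5% individual limit.
    too_dense = [k for k, d in per_keyword.items() if d > 0.05]
    print(per_keyword, total, too_dense)
```

Only single-word keywords are handled here; phrases would need an n-gram count, and the dedicated tools listed later in the deck also weigh prominence (position in the page).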

100

More about keywords
Alternative text ALT
  Very important
  Used to give meaning to images, graphs and banners
  Specific treatment similar to title
  Up to 250 characters
Anchor terms in the links
  Use keywords
  The pages that link to ours are very important
  Also relevant for the internal navigational links

101

Google-bombing

102

Google Trends

103

Google Timeline & Map

104

Links to external pages
Link density
  Average of links/page (incl. internal) ~ 20
  Structure resource lists in hierarchical directories
    Each category, one or more pages
Target pages
  Link to good pages
    Main page (whenever appropriate)
    Pages with high PR
    Updated pages
    Local > .edu > .org > .info > .com
  Check frequently that links are still active
  Avoid links to link farms
  Select the link text carefully (avoid “here”, “page”)

105

Characteristics of the institutional sites
Domain
  Own
  Avoid acronyms, provide content
  Local, .org, .info, .name versus .com
  Subdomain: Inherits PR from site root
  Don’t change domain!!!
Medium-sized and big institutional sites
  Preferably large
Updating
  Frequently
  Increase number of pages (maintain new/old rate)
Promote inlinks
Promote visits
  Keep statistics

106

Characteristics of the pages
Size
  Small or medium-sized, <100 k
  But 40-50 k can be a great volume of text
  Structure groups of pages correctly through consecutive links (back-next)
  Medium or big-sized
Updating
  Frequent, but not that much
  Change contents, not the address
    Reduce restructuring to a minimum
Versions
  In different pages
    In other languages
    In other formats (pdf, doc, ps, ppt, ...)

107

Barriers for robots
Links hidden, incomplete or without meaning
  Graphics and entry banners without a text-mode link
    Especially Flash files
    The presence of ALT text is also important
  JavaScript in navigational menus
    With hidden links
    With relative, incomplete links (without URL Base declaration)
  Frames (but NOT always!!)
  Orphan pages
Avoid redirections and aliases
  Refresh tags
  Institutional farms (site.es; site.com; site.org)
Dynamic pages
  Reduce the length and complexity of the URLs: give them a meaning

108

Robot-friendly
File robots.txt
  Don’t overuse “no index”
Map of the site (html and xml)
Navigational internal links
  Only the necessary ones
Registration in referral sites
  In search engines (not very important, it only speeds up indexing)
  In directories (in Yahoo it increases visibility)
  In supersites (trick: Wikipedia)
Fight against invisibility
  Static pages
  Support submenus
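A robots.txt file can be tested before publication to confirm that it does not hide more than intended. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the example.org URLs and the helper name `is_allowed` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a single private directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, agent="*"):
    """Return True if the given robots.txt lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.org/index.html",
                "http://www.example.org/private/report.pdf"):
        print(url, is_allowed(ROBOTS_TXT, url))
```

Running such a check over the site's main entry pages and rich files is a quick way to catch an over-broad Disallow rule.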

109

“Visible” Internet

110

Hacking strategies (to avoid)
Invisible texts
Pixel links
Link farms
  Link buying
  Visits buying
Duplicate texts
Cloaking
  Different pages for the search engine than for the user
Hacking mirrors

111

Tools: Words’ Density

Site Content Analyzer 2.2.15  www.sitecontentanalyzer.com
Good Keywords 2.0  www.goodkeywords.com
Keyword Density  www.keyworddensity.com
Keyw. Dens. & Prominence 1.2  www.ranks.nl/tools/spider.html
Keyword Density Analyzer  tool.motoricerca.info/keyword-density.phtml
KDAnalyzer Version 2.0  www.webjectives.com/keyword.htm
Google Adwords  adwords.google.com/select/KeywordSandbox
Keyword Density Analyzer 1.3  www.searchengineworld.com/cgi-bin/kwda.cgi
Keyword Investigator  www.keywordster.com/keyword-investigator.htm
GRKda  www.grsoftware.net/search_engines/software/grkda.html

112

Keyword Density & Prominence

113

Tools: Position
  Accurate Monitor 2.5  www.cleverstat.com
  Advanced Web Ranking 4.7  www.advancedwebranking.com
  AgentWebRanking Pro 2.6  www.agentwebranking.com
  IBP 9  www.axandra.com
  Dynamic Web Ranking 7.0  www.dynamicwebrank.com
  Link Popularity Analysis 2.0  www.link-popularity-analysis.com
  Link Popularity Check 3.0  www.checkyourlinkpopularity.com
  Link Survey 1.5  www.antssoft.com
  RankSpy 1.3  www.searchutilities.com/rankspy
  Trellian SEO Toolkit  www.trellian.com/seotoolkit
  Web CEO 6.0  www.webceo.com

114

WebPosition

115

Advanced Web Ranking

116

Quality: Duplicates, broken links

117

Evolution and persistence

Volatility
Persistence
  Changes in web pages are usually minor or cosmetic
  The frequency of change varies with the domain
  The magnitude of the change depends largely on page size
  Large pages change more, and more frequently

research.microsoft.com/research/sv/sv-pubs/p97-fetterly/p97-fetterly.pdf

118

Generating Contents Personal pages (also research groups or departments)

Access to full texts files (academic publications)

Institutional Repositories

Papers, books and book chapters, dissertations, …

Multimedia repositories

Portal of journals

Local institutional journals

Super-sites

Added value directories of (web) resources

119

Added value

120

Personal pages Current situation

Few scholars with their own personal webpage, most of them with a limited amount of contents

Bad positioning practices, especially regarding the URL

Personal Branding

Increased Impact (global audiences)

Efficient Networking (peers and non-peers)

Complements your formal scholarly communication

Reflects the diversity of your activities (and of yourself)

Not only reactive but also proactive

It is easy, fast and cheap

121

A model

[Figure: layout of a model personal page, with these blocks]
Institutional Logo & Banner
Name of the group, department or faculty
Index: Papers, Conferences, Books, Teaching, Projects, Popular Science, Prizes, Hobbies, Press notes, Blog / Web 2.0, Statistics, CV (pdf)
Photo
Contact info
General comments and presentation
Links
News, relevant new info
Next conferences

Updated 5-July-2012

thebook.virtualknowledgestudio.nl/author/paul-wouters

http://johnclements.net/home

Usage metrics
Tracking and Analyzing Visits

MODULE 3

123

Web Usage Mining

Definitions
  Data mining: Knowledge extraction from databases
  Web Mining: Gathering and analysis of the visit patterns of a Web site
    It is not about searching for or retrieving information about that site
Objectives: Aspects to explore
  Association (joining)
  Classification and clustering
  Transversal patterns
  Sequential patterns
  Similarities
Visits
  Web site analysis
  Log files: Definition and structure
  Software for log analysis
    Practice with WebTrends Analysis Suite (www.netiq.com)

124

Taxonomy of the Web Mining

[Figure: taxonomy tree. Web Mining branches into mining of the Web use and mining of Web contents. Further labels: database mining; mining based on agents (search engines, metasearchers, personal agents); invisible Internet (identification, description, analysis tools)]

125

Log files (logbook)

File that automatically records all data about the visits that a web site receives
  IP address of the visitor
  Visited URLs
  Time of visit
  Time spent on the visit
  URL from which the visit came
  Type of request
  Type of response
  Size of response (bytes)
  Browser used
  etc.

Apache web log
205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] "GET /~sophal/whole5.gif HTTP/1.0" 200 9609 "http://www.csua.berkeley.edu/~sophal/whole.html" "Mozilla/4.0 (compatible; MSIE 5.0; AOL 6.0; Windows 98; DigExt)"
216.35.116.26 - - [29/Mar/2002:03:59:40 -0800] "GET /~alexlam/resume.html HTTP/1.0" 200 2674 "-" "Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html)"
202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/indextop.html HTTP/1.1" 200 3510 "http://www.csua.berkeley.edu/~tahir/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
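Each field listed on this slide can be recovered from an Apache log line with a regular expression. The sketch below assumes the common "combined" log format shown in the sample entries; the pattern, the field names and the function name `parse_line` are illustrative choices, not part of any log analysis product named in the deck.

```python
import re
from collections import Counter

# One line of the Apache "combined" format: host, identity, user, time,
# request, status, size, referrer and user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Two of the sample entries from the slide.
SAMPLE_LINES = [
    '205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] "GET /~sophal/whole5.gif HTTP/1.0" '
    '200 9609 "http://www.csua.berkeley.edu/~sophal/whole.html" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; AOL 6.0; Windows 98; DigExt)"',
    '202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/indextop.html HTTP/1.1" '
    '200 3510 "http://www.csua.berkeley.edu/~tahir/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    records = [r for r in map(parse_line, SAMPLE_LINES) if r]
    print(Counter(r["url"] for r in records))       # most requested URLs
    print(Counter(r["referrer"] for r in records))  # where the visits came from
```

Counting other fields in the same way (IP address, status, user agent) gives the basic visit statistics that the log analyzers listed later report.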

126

Utilities
Questions to answer
  How has the information been used?
  How frequently?
  What is the most and the least popular (visited)?
  Where do the visitors come from? Where do they exit from?
  Where do they spend more time?
  How much time do they spend?
  Which paths do visitors follow the most?
  Who are the visitors?
  Where do they come from?
  How did they arrive?

127

Visits trackers
  Google Analytics  www.google.com/analytics
  Yahoo Web Analytics  web.analytics.yahoo.com
  StatCounter  www.statcounter.com
  ActiveMeter  www.activemeter.com
  123Statmore  www.123stat.com
  Counter Central  www.countercentral.com
  Digits Web Counter  www.digits.com
  Free Hit Counter  www.ritecounter.com
  GoStats  www.gostats.com
  MyWebStats  www.mywebstats.org
  OneStat Free  www.onestatfree.com
  OneStat  www.onestat.com
  Opentracker  www.opentracker.net
  ShinyStat  www.shinystat.com
  TDstats  www.tdstats.com
  TheCounter  www.thecounter.com
  WebSTAT  www.webstat.com
  What Counter  www.whatcounter.com

128

Google Analytics

129

Google Analytics (II)

130

Google Analytics (III)

131

StatCounter

132

Log file analysis software
  10-Strike Log-Analyzer 1.53  www.10-strike.com
  123LogAnalyzer 3.3  www.123loganalyzer.com
  Log2Stats 1.5  www.bitstrike.com
  AdvancedLogAnalyzer 2.1  www.abacre.com/ala/index.htm
  Alterwind Log Analyzer 4.0  www.alterwind.com
  Analog 6.0  www.analog.cx
  Analyse Spider 3.01  www.analysespider.com
  Deep Log Analyzer 4.0  www.deep-software.com
  eWebLogAnalyzer 2.3  www.esoftys.com
  FastStats Analyzer 4.1  www.mach5.com/products/analyzer
  Nihuo Web Log Analyzer 4.07  www.nihuo.com
  SawMill 8.5  www.sawmill.net
  SmarterStats 6.5  www.smartertools.com
  Surfstats 2011  www.surfstats.com
  WebLogStorming 2.6  www.datalandsoftware.com/weblog
  WebLogExpert 7.4  www.weblogexpert.com
  WebTrends Analytics 10  www.webtrends.com

133

10-Strike Log Analyzer

134

123-Log Analyzer

135

SawMill

136

Exercises
Experiments
  Funnel Web 5.0
  Practice with log files
    Total and disaggregated visits
    Most popular pages and directories
    Downloaded files
    Points of entry and exit
    Visitor demography
    Entry referrals (origin, browser and search engine words used)

137

Configuring Funnel Web

138

Results

139

Referrals

140

Bibliography/Webliography

General Bibliography/Webliography  www.cindoc.csic.es/cybermetrics/links03.html

Björneborn, L. & Ingwersen, P. (2001). Perspectives of webometrics. Scientometrics, 50(1): 65-82. http://www.db.dk/lb/2001webometrics.pdf

van Raan, A. F. J. (2001). Bibliometrics and internet: Some observations and expectations. Scientometrics, 50(1): 59-63.

Bar-Ilan, J. (2001). Data collection methods on the Web for informetric purposes: A review and analysis. Scientometrics, 50(1): 7-32.

Björneborn, L. (2004). Small-world link structures across an academic web space: a library and information science approach. PhD dissertation. Royal School of Library and Information Science. xxxvi, 399 p. ISBN 87-7415-276-9. http://www.db.dk/lb/phd/phd-thesis.pdf

Jepsen, E.T.; Seiden, P.; Ingwersen, P.; Björneborn, L. & Borlund, P. (2005). Characteristics of scientific web publications: preliminary data gathering and analysis. Journal of the American Society for Information Science and Technology. Special Issue on Webometrics.

Björneborn, L. & Ingwersen, P. (2005). Towards a basic framework for webometrics. Journal of the American Society for Information Science and Technology. Special Issue on Webometrics.

Thelwall, M.; Vaughan, L. & Björneborn, L. (2005). Webometrics. Annual Review of Information Science and Technology, 39.

Ingwersen, P. & Björneborn, L. (2004). Methodological issues of webometric studies. In: Glänzel, W. et al. (eds.). Quantitative Science and Technology Research. Kluwer Academic Publishers.

The Statistical Cybermetrics Research Group. Wolverhampton University. http://cybermetrics.wlv.ac.uk

Alonso Berrocal, J.L.; Figuerola, C.G. & Zazo, A.F. (2004). Cibermetría: nuevas técnicas de estudio aplicables al Web. Ediciones Trea, Gijón. 207 p.

Faba Perez, C.; Guerrero Bote, V. P. & Moya Anegón, F. (2004). Fundamentos y técnicas cibermétricas: modelos cuantitativos de análisis. Junta de Extremadura, Mérida. Serie Sociedad de la Información, no. 18. 216 p.

Prime, C.; Bassecoulard, E. & Zitt, M. (2002). Co-citations and co-sitations: A cautionary view on an analogy. Scientometrics, 54(2): 291-308.