web content analysis (pdf, 2390kb)

44
Web Content Analysis Web Content Analysis Web Content Analysis Web Content Analysis Workshop in Advanced Techniques for Political Workshop in Advanced Techniques for Political Communication Research: Web Content Analysis Session Communication Research: Web Content Analysis Session Professor Rachel Gibson, University of Manchester Professor Rachel Gibson, University of Manchester

Upload: phungngoc

Post on 05-Feb-2017

221 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Web Content Analysis (PDF, 2390KB)

Web Content AnalysisWeb Content AnalysisWeb Content AnalysisWeb Content Analysis

Workshop in Advanced Techniques for Political Workshop in Advanced Techniques for Political

Communication Research: Web Content Analysis SessionCommunication Research: Web Content Analysis Session

Professor Rachel Gibson, University of ManchesterProfessor Rachel Gibson, University of Manchester

Page 2: Web Content Analysis (PDF, 2390KB)

Aims of the SessionAims of the Session

• Defining and understanding the Web as on object of study &

challenges to analysing web content

• Defining the shift from Web 1.0 to Web 2.0 or ‘social media’

• Approaches to analysing web content, particularly in relation to • Approaches to analysing web content, particularly in relation to

election campaign research and party homepages.

• Beyond websites? Examples of how web 2.0 campaigns are being

studied – key research questions and methodologies.

Page 3: Web Content Analysis (PDF, 2390KB)

The Web as an object of studyThe Web as an object of study

• Inter-linkage

• Non-linearity

• Interactivity

• Multi-media

• Global reach• Global reach

• Ephemerality

Mitra, A and Cohen, E. ‘Analyzing the Web: Directions and Challenges’ in Jones, S.

(ed) Doing Internet Research1999 Sage

Page 4: Web Content Analysis (PDF, 2390KB)

From Web 1.0 to Web 2.0From Web 1.0 to Web 2.0

Tim Berners-Lee“Web 1.0 was all about connecting people. It was an interactive space, and I think Web 2.0 is of

course a piece of jargon, nobody even knows what it means. If Web 2.0 for you is blogs and

wikis, then that is people to people. But that was what the Web was supposed to be all

along. And in fact you know, this ‘Web 2.0’, it means using the standards which have been

produced by all these people working on Web 1.0,” produced by all these people working on Web 1.0,”

(from Anderson, 2007: 5)

Web 1.0

The “read only” web

Web 2.0 – ‘What is Web 2.0’ Tim O’Reilly (2005)

“The “read/write” web

Page 5: Web Content Analysis (PDF, 2390KB)

Web 2.0Web 2.0

• Technological definition

– Web as platform, supplanting desktop pc

– Inter-operability , through browser access a wide range of programmes that fulfil a variety of online

tasks – i.e. manage a home page, communicate with friends, share/publish pictures, receive news.

• Social understanding• Social understanding

– Web 2.0 is based around social networking activities – relies on and built through ‘social or

participatory’ software.

– Users create, distribute, value-add to online content through blogs, facebook profiles, online video,

wikis, tagging tools. ‘Produsers’ Bruns (2007) or ‘Prosumers’ – distinction between producers and

users/consumers of content disappears.

– Hallmark of these applications is the way in which they devolve creative and classificatory power to

‘ordinary’ users.

Page 6: Web Content Analysis (PDF, 2390KB)

Web 1.0 & Web 2.0 applications Web 1.0 & Web 2.0 applications –– ‘social media’‘social media’

Web 1.0

• Websites

• Email

• Discussion forums/chat room/IM

Web 2.0 or ‘social media’

• Blogs

• Social Network Sites (Facebook)

• Online Video sharing/posting (YouTube)

• Online Photo sharing/posting (Flickr)

• Microblogging sites (Twitter) @, #, RT

• Online Link sharing/posting (Del.icio.us, Digg)

Page 7: Web Content Analysis (PDF, 2390KB)
Page 8: Web Content Analysis (PDF, 2390KB)
Page 9: Web Content Analysis (PDF, 2390KB)
Page 10: Web Content Analysis (PDF, 2390KB)

Methodologies used for analysing online campaigns Methodologies used for analysing online campaigns (supply(supply--side)side)

Adapted from offline - `digitized methods` (Rodgers, 2010) Web 1.0

• Online interviews, elite surveys

• Online discourse analysis

• Online content/feature analysis – ‘Classic ‘ or offline approaches

– ‘Web specific – mixed mode– ‘Web specific – mixed mode

Online specific - ‘digital methods’ Web 2.0

• Hyperlinking and webspheres – Voson, Issue Crawler, SocScibotOnline, NodeXL,

• Blog analysis toolkits and search engines –Technorati . BAT (Blog Analysis Toolkit)

• Twitter and facebook scrapers - Infoscape Lab, 140kit.com, Twapperkeeper, Discovertext

• YouTube scrapers -Tubemogul, Tubekit

• Wikipedia – Wikiscanner

Page 11: Web Content Analysis (PDF, 2390KB)
Page 12: Web Content Analysis (PDF, 2390KB)
Page 13: Web Content Analysis (PDF, 2390KB)

Methodologies for analysing online campaigns Methodologies for analysing online campaigns (‘demand’ side)(‘demand’ side)

User Focused (demand)

• Focus groups: Price and Capella (2001); Stromer-Galley and Foot (2002)

• Surveys: Bimber and Davis (2003); Gibson & McAllister (2007)

• Experiments: Iyengar (2002); Lupia & Philpott (2005)• Experiments: Iyengar (2002); Lupia & Philpott (2005)

• User statistics: Hitwise, Sitemeter, Alexa, Google Analytics

• Computer tracking devices: Phorm

• Online SNA software: NodeXL,

Page 14: Web Content Analysis (PDF, 2390KB)

Discourse analysis to study web contentDiscourse analysis to study web content

• Focus of the analysis is on web as socio-cultural ‘texts’.

• Described and explored from a qualitative perspective to uncover underlying meaning and significance.

• Examples of studies:– Markham (1998) Howard (2002) Hakken (1999) Hine (2000)– Markham (1998) Howard (2002) Hakken (1999) Hine (2000)

– Benoit and Benoit (2000) ‘The Virtual Campaign: Presidential Primary Websites in Campaign 2000’ American Communication Journal

– Warnick (1998) ‘Appearance or Reality? Political Parody on the Web in Campaign 96’ Critical Studies in Mass Communication

Page 15: Web Content Analysis (PDF, 2390KB)

Web specific content/feature analysisWeb specific content/feature analysis

• Focus is on the web page/site as unit of analysis

• Structural-Functional & Quantitative - systematic set of criteria developed and applied to sites to yield largely quantitative measures of content, functionality, usability, and design features.

– Quantitative-automated - Bauer and Scharl (2000)– Quantitative semi-automated– Quantitative semi-automated

• Web 1.0• Web 1.5?• Action Centers

• Heuristic Evaluation - Nielsen & Molich (1990) ‘Heuristic evaluation of user interfaces’; Collings and Pearce (2002); Yates (2006).

Page 16: Web Content Analysis (PDF, 2390KB)

Bauer and Scharl (2000)

Internet Research

Page 17: Web Content Analysis (PDF, 2390KB)
Page 18: Web Content Analysis (PDF, 2390KB)

Quantitative semiQuantitative semi--automatedautomated

1. Web 1.0 static/fixed homepages

- Parties and campaigns sites Gibson and Ward (2000; 2002) Foot & Schneider (2002) - E-government field see Baker (2009) Pina et al. (2009) Panopoulou et al (2008) West (2007) Henriksson et a. (2006) Holzer and Kim (2005); Garcia et al (2005) http://www.insidepolitics.org/egovt06us.pdf

2. Web 1.5? - Updated web 1.0 schemes to incorporate web 2.0 elements.Party and Campaign home pages - Gulati & Wiliams, 2006; Foot and Schneider 2006Party and Campaign home pages - Gulati & Wiliams, 2006; Foot and Schneider 2006NSM - Stein, 2009

3. ‘Action Centers’ - MyBO, Membersnet, MyConservatives, LibDemACT (Gibson, 2010; Lilleker and Jackson, 2010)

Page 19: Web Content Analysis (PDF, 2390KB)

Gibson & Ward (2000)

SSCORE

Page 20: Web Content Analysis (PDF, 2390KB)
Page 21: Web Content Analysis (PDF, 2390KB)

Gibson & Ward (2002) AJPS

Page 22: Web Content Analysis (PDF, 2390KB)

Gibson & Ward (2002) AJPS

Page 23: Web Content Analysis (PDF, 2390KB)

Gibson & Ward (2002) AJPS

Page 24: Web Content Analysis (PDF, 2390KB)

Stein, L.. 2009 NM&S

Page 25: Web Content Analysis (PDF, 2390KB)

Stein, L.. 2009 NM&S

Page 26: Web Content Analysis (PDF, 2390KB)

Stein, L.. 2009 NM&S

Page 27: Web Content Analysis (PDF, 2390KB)

Stein, L.. 2009 NM&S

Page 28: Web Content Analysis (PDF, 2390KB)
Page 29: Web Content Analysis (PDF, 2390KB)
Page 30: Web Content Analysis (PDF, 2390KB)
Page 31: Web Content Analysis (PDF, 2390KB)

Gibson, R. 2010 APSA

Page 32: Web Content Analysis (PDF, 2390KB)

HyperlinkingHyperlinking and and websphereswebspheres

• Focus is on inter-relational/textual element of websites and the wider context. Unit

of analysis is the web network rather than individual sites.

• Thematic set of websites are identified, ‘captured’/archived over time and the

linkage patterns explored as well as content.

– Method pioneered by Schneider and Foot (2004) ‘Web Sphere analysis’.– Method pioneered by Schneider and Foot (2004) ‘Web Sphere analysis’.

– Define a Web Sphere as a ‘hyperlinked set of dynamically defined digital resources

spanning multiple Web sites and deemed relevant to related to a central theme or

‘object’.’ The borders of the sphere are defined relationship to central theme/object and

a given time period.

See Library of Congress, Internet Archive http://www.archive.org/index.php &

WebArchivist.org http://webarchivist.org/

Page 33: Web Content Analysis (PDF, 2390KB)

Hyperlink analysisHyperlink analysis

More sophisticated/automated methods have developed for academic research since around 2003.

– Adaptations of search engine technology – most simple form of online link analysis.

– Hindman et al. (2003) ‘Googlearchy’ pioneered the term ‘googlearchy’ referred to a ‘power law .operating in regard to the prominence of sites.

– Adamic and Glance (2005) examined linkage between left and right-wing blogs in the U.S. http://www.blogpulse.com/papers/2005/AdamicGlanceBlogWWW.pdfU.S. http://www.blogpulse.com/papers/2005/AdamicGlanceBlogWWW.pdf

– Ackland and Gibson (2007) comparative analysis of political party systems linkage using VOSON http://anu.voson.edu.au crawler to examine the out and inbound linkages among political parties in 6 different democracies. Hyperlinks = networked communication – inclusivity, identity, opponent dismissal, force multiplication. Test via nos to links to other parties, target of links by tld, inter-linkage within party groups.

– For tools see http://www.issuecrawler.net ; http://www.touchgraph.comhttp://socscibot.wlv.ac.uk/

Page 34: Web Content Analysis (PDF, 2390KB)
Page 35: Web Content Analysis (PDF, 2390KB)

InterInter--linkage between Parties by ideological familylinkage between Parties by ideological family

Table 4: Inter-linkage between political parties by party type (1)

Far Left

(2)

Left

(3)

Centre

(4)

Right

(5)

Far Right

(6)

Ecol.

(7)

Reg.

(8)

Total no. of parties linking to

other parties

(9)

Total no.

of links

made to

other

parties

(10)

% of links

within party

family

parties

Ecologist 0 0 2 1 0 6 0 6 14 79

Far Left 11 7 0 2 1 3 1 12 71 46

Left 3 4 3 2 0 2 0 2 22 50

Centre 0 4 2 2 0 1 0 5 12 25

Right 1 3 1 8 1 2 1 9 20 55

Far Right 0 0 0 1 2 0 0 3 7 71

Regionalist 0 0 0 0 0 0 1 1 1 100

39 147 Note: The numbers in columns 1-7 are counts of parties (e.g. 11 far left parties linked to other far left parties). Note that the

same party can appear in more than once in a given row in columns 1-7, and this is why column 8 (showing the total number of parties that have linked to another seed site) is not equal to the sum of the preceding columns.

Page 36: Web Content Analysis (PDF, 2390KB)

Beyond websites…?Beyond websites…?

Blog analysis – key studies

Types, Content, Structure

Sweetster-Trammel, Kaye D. 2007. ‘Candidate campaign Blogs: Directly Reaching Out to the Youth Vote.’

The American Behavioral Scientist. 50(9): 1255-1264.

Xifra, J. and A. Huerta2008. s ‘Blogging PR: An Exploratory Analysis of Public Relations’ Public Relations

Review 34:265-275.

Herring, S. et al. ‘Weblogs as a bridging genre’ Information Technology and People 18(2): 2005: 142-171.Herring, S. et al. ‘Weblogs as a bridging genre’ Information Technology and People 18(2): 2005: 142-171.

Influence

Karpf, D. 2008. ‘Measuring Influence in the Political Blogosphere’ Institute Politics and Technology Review,

Institute for Politics, Democracy and the Internet. :33-41

http://www.the4dgroup.com/BAI/articles/PoliTechArticle.pdf

Etling, B. et al. 2010. ‘Mapping the Arabic blogosphere: politics and dissent online’ 12: 1225-1243.

Elmer, G.; Ryan, P.M.; Devereaux, Z.; Langlois, G.; Redden, J. and McKelvey, F. 2007. Election Bloggers:

Methods for Determining Political Influence. First Monday 12(4).

Drezner and Farrell. 2008. ‘The Power and Politics of Blogs’ Public Choice (134): 15-30

Page 37: Web Content Analysis (PDF, 2390KB)

Beyond websites…?Beyond websites…?

Twitter analysis

• Boyd, D. et al. 2010 ‘Tweet Tweet Retweet’ Working paper.

http://www.danah.org/papers/TweetTweetRetweet.pdf

• Tumasjan et al. 2010. ‘Predicting Elections with Twitter’ Association

for the Advancement of Artificial Intelligence. Social Science for the Advancement of Artificial Intelligence. Social Science

Computer Review

http://ssc.sagepub.com/content/early/2010/09/24/089443931038655

7.full.pdf+html

• Zuckerman, E. ‘Studying Twitter and the Moldovan Protests’

www.Ethanzuckermann.com/blog/

Page 38: Web Content Analysis (PDF, 2390KB)
Page 39: Web Content Analysis (PDF, 2390KB)

Beyond websites…?Beyond websites…?

YouTube analysis• Shah, C. 2010. ‘Supporting Research Data Collection from YouTube with Tubekit’ Journal of

Information Technology & Politics 7(2&3): 226-240

• Wallsten, K. 2010. ‘Yes We Can’: How Online Viewership, Blog Discussion, Campaign

Statement, and Mainstream Media Coverage Produced by a Viral Video Phenomenon.’ Statement, and Mainstream Media Coverage Produced by a Viral Video Phenomenon.’

Journal of Information Technology & Politics 7(2&3): 163-181

• Zink, M. Suh, K and J. Kurose. 2009. ‘Characteristic of YouTube network traffic at a campus

network,’ Computer Networks 53: 501-514

• Tian, Y. 2010. ‘Organ Donation on Web 2.0: Content and Audience Analysis of Organ

Donation Videos on YouTube’ Health Communication 25: 238-246.

Page 40: Web Content Analysis (PDF, 2390KB)

Online resources and tools for collection and Online resources and tools for collection and analysing web 2.0 content.analysing web 2.0 content.

Digital Tool Kits

Digital Methods Initiative - https://wiki.digitalmethods.net/Dmi/ToolDatabase Richard Rodgers,Natively Digital: The Link | The URL | The Tag | The Domain | The PageRank | The Robots.txt Device Centric: Google | Google Images |

Google News | Google Blog Search | Yahoo | YouTube | Del.icio.us | Technorati | Wikipedia | Alexa | IssueCrawler | Twitter | Facebook

Infoscape Research Lab Tools www.infoscapelab.ca Greg Elmer

- Blog Aggregator – measures activity levels over time and hyperlinks cited in blog posts

- YouTube & Twitter scraper – title, tags, links, date uploaded, author info, views per week- YouTube & Twitter scraper – title, tags, links, date uploaded, author info, views per week

- Facebook Group scraper - title, type, number and members of group

Training / Research

• Exploring Online Research Methods Website: http://www.geog.le.ac.uk/ORM/site/home.htm

• Digital Methods Initiative Training Course https://www.digitalmethods.net/Digitalmethods/WebHome

• http://nms.sagepub.com/

• http://jcmc.indiana.edu/

• http://www.connectedaction.net/

• http://www.methodspace.com/group/sageresearchmethodsonline

Page 41: Web Content Analysis (PDF, 2390KB)
Page 42: Web Content Analysis (PDF, 2390KB)
Page 43: Web Content Analysis (PDF, 2390KB)

Concluding commentsConcluding comments• There are a growing number of options open to researchers seeking to capture and analyze

campaigns and their influence on voters.

• Approaches have evolved as Web itself has. From adaptation ‘traditional’ methods suitable to analysis of static point to mass ‘old’ media to newer web ‘specific’ tools that allow for capture and analysis of more dynamic ‘many to many’ medium.

• Network analysis tools increasingly used – reflecting a shift in the understanding and practice of the Web as a dynamic social context - ‘conversational’ - rather than a fixed infrastructure.

• The notion of ‘content’ itself has changed fundamentally - user-generated rather than editor • The notion of ‘content’ itself has changed fundamentally - user-generated rather than editor controlled (although MSM still dominant in news).

• This has some significant implications for content analysis:

1. Linkage of the context to the content. To interpret meaning of web content – twitter feed, facebook profiles, locate it in the wider and ongoing stream of dialogue.

2. . Locating content – inter-operability of the various platforms means that content is shared across multiple sites. So need to ‘follow ‘ the content.

3. Volume of content increases - capturing and archiving issues

4. Quality of content changes . A new language or terminology emerging content = actions. Posts, embeds, widgets, tweets, email, views, followers, ‘befriending’ or ‘liking’ a politician/party . A new software language has grown up - @, #, RT, http. Automating/manual coding.

Page 44: Web Content Analysis (PDF, 2390KB)

Short demo of a digital methodsShort demo of a digital methods

Getting started with NodeXL.

Manual: Hansen et al. Analyzing Social Media Networks with NodeXL

Not focused on web content per se but structure or influence of that content. So ex.

compare politicians’ twitter networks, or a trending topic – import by #.

Example: Fake FB network but you can import pajek, ucinet datasets or network data

from twitter, youtube, email lists from twitter, youtube, email lists

‘Vertices’ = nodes or ‘agents’ and ‘edges’ or ties between nodes.

‘Edges’ are relationships between them - ‘undirected’ or ‘directed’

‘Unweighted’ edge is simplest – whether relationship exists. ‘Weighted’ adds

information (strength, frequency of the tie) have darker/thicker lines.

NodeXL works through setting up an ‘edge list’ which represents a network. This

network can then be represented in a graphical form. Through this you can

calculate various measures of centrality of individuals or nodes in the network.

Group function – can group types of individuals and calculate a collective weight.