
    Computer Methods and Programs in Biomedicine 100 (2010) 16–23

    journal homepage: www.intl.elsevierhealth.com/journals/cmpb

    Social Web mining and exploitation for serious applications: Technosocial Predictive Analytics and related technologies for public health, environmental and national security surveillance

    Maged N. Kamel Boulos a,*, Antonio P. Sanfilippo b, Courtney D. Corley b, Steve Wheeler c

    a Faculty of Health, University of Plymouth, Drake Circus, Plymouth, Devon PL4 8AA, UK

    b Techno-Social Predictive Analytics Initiative, Pacific Northwest National Laboratory, P.O. Box 999, Richland, WA 99352, USA

    c Faculty of Education, University of Plymouth, Drake Circus, Plymouth, Devon PL4 8AA, UK

    Article info

    Article history:

    Received 3 December 2009

    Received in revised form 10 February 2010

    Accepted 12 February 2010

    Keywords:

    Social Web

    Web 2.0

    Disease surveillance

    Technosocial Predictive Analytics

    Abstract

    This paper explores Technosocial Predictive Analytics (TPA) and related methods for Web

    data mining where users' posts and queries are garnered from Social Web (Web 2.0) tools

    such as blogs, micro-blogging and social networking sites to form coherent representations

    of real-time health events. The paper includes a brief introduction to commonly used Social

    Web tools such as mashups and aggregators, and maps their exponential growth as an open

    architecture of participation for the masses and an emerging way to gain insight about

    the collective health status of whole populations. Several health-related tool examples

    are described and demonstrated as practical means through which health professionals might create clear location-specific pictures of epidemiological data such as flu outbreaks.

    © 2010 Elsevier Ireland Ltd. All rights reserved.

    * Corresponding author at: Room 05, 3 Portland Villas, Faculty of Health, University of Plymouth, Drake Circus, Plymouth, Devon PL4 8AA, UK. Tel.: +44 0 1752 586530; fax: +44 0 7053 487881.

    E-mail addresses: [email protected] (M.N. Kamel Boulos), [email protected] (A.P. Sanfilippo), [email protected] (C.D. Corley), [email protected] (S. Wheeler).

    1. Background

    The rapid development and growing popularity of the

    Social Web, also commonly known as Web 2.0, and its

    applications, coupled with the increasing availability of

    mobile Internet-enabled devices (such as netbooks and Apple iPhone), has ensured that the Internet is now a truly

    integral part of everyday life. The Social Web is a particularly

    dynamic medium that captures the pulse of the

    society in real time. Blogs, micro-blogs such as Twitter

    (http://twitter.com/), and social networking sites such as

    Facebook (http://www.facebook.com/), enable people to publish

    their personal stories, opinions, product reviews and

    a great deal more in real time. They can also share geo-

    tagged alerts and reports about their current location and

    allow others to follow their whereabouts, all in real time,

    using location-aware social services such as Microsoft Vine

    (http://www.vine.net/). Such applications have greater power

    when coupled to Social Web mashup services and aggre-

    gators. Such tools weave data from disparate sources into a

    new, compound data source or service [1,2]. A useful exam-

    ple of the latter is HealthMap, the global disease alert map

    (http://www.healthmap.org/en).

    0169-2607/$ – see front matter © 2010 Elsevier Ireland Ltd. All rights reserved.

    doi:10.1016/j.cmpb.2010.02.007


    Fig. 1 – How To Use Who Is Sick? (Screenshot from http://whoissick.org/sickness/).

    The dynamic nature and continuously updated features of

    the Social Web make it a fertile environment for intelligence gathering in a variety of disciplines, enabling users to tap into

    the wisdom of the crowds. Further, the Social Web is useful

    for gleaning the collective impression of the masses regard-

    ing current matters, events and products [3]. In fact, it is not

    uncommon these days, for example, to see shoppers (including

    those who do not make their purchases through the Internet)

    consulting online user reviews and ratings on sites such as

    Amazon.com about products they are planning to buy before

    making their purchase decisions.

    This paper will briefly review a number of emerging

    technologies and tools, including Technosocial Predictive Analytics

    pioneered at Pacific Northwest National Laboratory,

    USA (http://predictiveanalytics.pnl.gov/), that can be used to exploit these inherent Social Web features in real or near-real

    time and harness them for public health, environmental and

    national security surveillance purposes.

    2. A brief introduction to Technosocial Predictive Analytics and some related technologies and tool examples

    2.1. Who Is Sick: the power of user-generated maps

    Who Is Sick (http://whoissick.org/sickness/, Fig. 1) was

    launched in 2006 as a location-aware service for tracking and monitoring sickness, with the aim of providing current and

    local sickness information by the public to the public. People

    can anonymously post information about their own illness

    and use a familiar Google Maps interface to search and fil-

    ter illness by symptoms, sex, age, and location. The maps act

    as a crude syndromic surveillance tool in real time. A person

    feeling unwell can instantly check what sicknesses are circu-

    lating within their own locale. Health-conscious individuals

    conduct similar scans and take any necessary preventive measures

    in preparation. People traveling to another area of the

    country or abroad can use the service to learn if there are any

    sicknesses going around that they need to be aware of. One

    should be warned however, that information quality is always

    dependent upon the accuracy of user-reported details and on

    sufficient numbers of people (who are actually sick) using the tool to report their symptoms in a timely manner.

    2.2. We Feel Fine: differentiating between informative and affective text on the Social Web

    Created in 2006, We Feel Fine (http://www.wefeelfine.org/)

    is a Web-based applet and API (Application Programming

    Interface: http://wefeelfine.org/api.html) that can be used to

    assess the fluctuating effects of real world events and factors

    such as the Stock Index or the weather on people's feelings

    and emotions, and ascertain a crude idea about the general

    mood of populations at different dates and locations around

    the world [4].

    At the heart of We Feel Fine is a data crawler

    that automatically scours the Web every 10 min, harvesting

    data in the form of human feelings from a

    large number of blogs and social pages such as those

    hosted by MySpace (http://www.myspace.com/) and Blogger

    (http://www.blogger.com/). We Feel Fine scans blog posts for

    occurrences of the phrases "I feel" and "I am feeling", and

    once any of these is found, the system goes backward to the

    beginning of the sentence and forward to the end of the sen-

    tence, and then saves the full sentence in a database. Saved

    sentences are next scanned against a manually compiled list

    of about 5000 pre-identified feelings (adjectives and some

    adverbs). If a valid feeling is found, the corresponding sentence is considered to represent one person who feels that

    way. We Feel Fine also attempts to extract the username of

    the post's author from the URL of his/her blog post (e.g., steve-wheeler

    in http://steve-wheeler.blogspot.com/) and uses it

    to automatically traverse the blog hosting site to locate that

    user's profile page. From the profile page, it is often possible to

    extract the age, gender, country, state, and city of residence

    of the blog owner. We Feel Fine can also retrieve the local

    weather conditions for the blog owner's city at the time the

    post was written. This process results in about 15,000 to 20,000

    saved feelings (with associated data) per day. For display pur-

    poses, the top 200 feelings were manually assigned colours

    that loosely correspond to the tone of the feeling, e.g., happy


    Fig. 2 – Screenshot of the available filtering options (feeling, gender, age, weather, location [country, city], date) in the We

    Feel Fine applet (the applet can be accessed by clicking on the "Open We Feel Fine" link at http://www.wefeelfine.org/).

    positive feelings are graphically represented using bright yellow and sad negative feelings are depicted in dark blue. The

    applet also offers a number of filtering options for querying

    the underlying database (Fig. 2).
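    The harvesting steps described above can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the actual We Feel Fine implementation: the tiny FEELINGS set stands in for the manually compiled list of about 5000 feelings, the sentence splitter is deliberately naive, and the function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical, tiny stand-in for the manually compiled list of ~5000 feelings.
FEELINGS = {"happy", "sad", "numb", "tired", "fine", "sick"}

# The two trigger phrases named in the text.
TRIGGER = re.compile(r"\bI (?:feel|am feeling)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_feelings(post_text):
    """Return (sentence, feeling) pairs for sentences containing a trigger phrase."""
    results = []
    for sentence in split_sentences(post_text):
        if not TRIGGER.search(sentence):
            continue
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word in FEELINGS:
                # One sentence is counted as one person who feels that way.
                results.append((sentence, word))
                break
    return results


def username_from_blog_url(url):
    """Guess the author name from a subdomain-style blog URL (assumed layout)."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    if len(parts) >= 3 and parts[0] != "www":
        return parts[0]
    return None
```

    For example, calling username_from_blog_url on http://steve-wheeler.blogspot.com/ yields steve-wheeler, mirroring the example in the text. A production crawler would additionally need profile-page scraping and weather lookups, which are omitted here.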

    While We Feel Fine functions as intended with its

    pre-coordinated search strategy and list of feelings, it is

    rather limited and cannot fully capture all possible affective

    expressions and differentiate between them and informative

    sentences in Social Web text. One can easily appreciate how

    complex this task is by searching for a keyword such

    as "flu" at BingTweets (http://bingtweets.com/) and observing

    the nonstop tweets in the resulting real-time Twitter feed

    pane: there will almost always be people tweeting about

    their own flu sickness (e.g., "back to bed to try and sleep off this flu. My sinuses feel numb!" counts as one flu case) and

    others just sending informative tweets about flu (sharing

    a news item, some related research or information, or asking

    a question, e.g., "How business travelers can avoid swine

    flu: http://bit.ly/zBOuY" does not count as a flu case). The

    challenge here is to devise an algorithm that can reliably differentiate

    between the two types of tweets automatically and

    in real time. Ni et al. [5] describe one solution using a machine

    learning method for classifying informative and affective blog

    posts.
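    To make the distinction concrete, the following is a deliberately crude rule-based baseline, offered only as a sketch. It is not the machine learning method of Ni et al. [5]; the cues used (first-person words, presence of a link, a trailing question mark) and the function name are assumptions chosen for illustration, and such rules would misclassify many real tweets.

```python
import re

# Assumed cues: first-person words suggest a self-report; links and
# questions suggest an informative (non-case) tweet.
FIRST_PERSON = re.compile(r"\b(i|i'm|my|me|mine)\b", re.IGNORECASE)
URL = re.compile(r"https?://\S+")


def classify_tweet(text):
    """Label a tweet 'affective' (self-report) or 'informative' using crude cues."""
    has_url = bool(URL.search(text))
    is_question = text.rstrip().endswith("?")
    # Strip links first so URL fragments cannot trigger the pronoun pattern.
    first_person = bool(FIRST_PERSON.search(URL.sub("", text)))
    if first_person and not has_url and not is_question:
        return "affective"
    return "informative"
```

    Applied to the two example tweets quoted above, the first is labelled affective and the second informative. A learned classifier trained on labelled posts would be expected to generalize far better than these hand-written rules.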

    We Feel Fine also attempts to perform some geographical

    classification of the feelings it continually gathers from the

    blogosphere, but again it does so in a rather limited fashion.

    The limitations extend beyond looking at users' profiles for location details (which are not always available), because there

    may also be issues specific to individual posts. A post by, for example,

    a user from the USA might relate to that user's experience at

    a totally different location to the one they are normally residing

    in, e.g., the blogger's holidays in Italy. Fortunately, more

    sophisticated technologies exist today, such as MetaCarta

    (http://www.metacarta.com/), which can be applied to sift

    through Social Web text and extract much more compre-

    hensive geographical references, while intelligently handling

    any geographical name ambiguities or inconsistencies in the

    scanned text [6].

    2.3. Google Flu Trends: an example of infodemiology and infoveillance in action

    Eysenbach [7,8] defines infodemiology as the science of

    distribution and determinants of information in an elec-

    tronic medium, specifically the Internet, or in a population,

    with the ultimate aim to inform public health and pub-

    lic policy. He further goes on to define a related concept,

    infoveillance, as a similar science whose primary aim is

    surveillance. Infodemiology and infoveillance rely on auto-

    mated and continuous analysis of unstructured, free text

    information available on the Internet. This includes analysis of

    search engine queries (or what Eysenbach calls the demand

    side), as well as of what is being published on Web sites, blogs,


    Twitter, etc. (or the supply side, to use again Eysenbach's

    terminology).

    Building on the same infodemiology and infoveillance

    concepts described by Eysenbach, researchers at Google found

    that certain search terms are good indicators of flu activity

    [9]. Google Flu Trends (http://www.google.org/flutrends/) uses

    aggregated Google search data to estimate flu activity up to

    two weeks quicker than traditional systems. Such an early detection of disease activity (in close to real time), when followed

    by an appropriate rapid response, has the potential of

    reducing the impact of both seasonal and pandemic influenza.

    Trends (continuously updated) are currently available from

    Google for Australia, New Zealand, the United States (Fig. 3),

    Mexico and sixteen more countries. A related project, Infovigil

    (http://www.infovigil.com/), has recently been launched by

    Eysenbach and colleagues at the Centre for Global eHealth

    Innovation, Toronto, Canada, and is introduced in [8]. (See also

    the Technosocial Predictive Analytics approach to monitoring

    of seasonal influenza later in this paper.)
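    As a rough illustration of how aggregated query data might be turned into an activity estimate, the sketch below fits a straight line between the log-odds of the ILI-related query fraction and the log-odds of the ILI visit fraction, which is the general model form reported by Ginsberg et al. [9]. The function names, the plain least-squares fit and any input values are assumptions for illustration; this is not Google's production system.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit: map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_flu_model(query_fractions, ili_fractions):
    """Fit logit(ILI fraction) as a linear function of logit(query fraction)."""
    xs = [logit(q) for q in query_fractions]
    ys = [logit(p) for p in ili_fractions]
    return fit_line(xs, ys)


def estimate_ili(query_fraction, intercept, slope):
    """Estimate the ILI visit fraction for a newly observed query fraction."""
    return inv_logit(intercept + slope * logit(query_fraction))
```

    Once fitted on historical weeks, estimate_ili can be evaluated as soon as the current week's query fraction is known, which is what allows an estimate ahead of conventional surveillance reports.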

    2.4. Emergency and public health virtual situation rooms using 3D serious gaming technology: getting the big picture in real time

    Conventional situation rooms are routinely used to oversee

    public health emergencies and disaster management operations

    in real time. Nowadays, large amounts of emergency data

    are increasingly coming from a wide range of sources in real

    or near-real time and need to be cross-linked and visualized

    where they spatially belong on maps of the affected regions.

    Also, emergencies are usually managed by multi-professional

    teams who, not uncommonly, are distributed in multiple geographic

    locations. For these reasons, we have started

    to see physical situation rooms gradually being replaced (or combined) with virtual situation rooms that use online collaborative

    (and mostly 2D, i.e., two-dimensional) platforms such as

    Depiction (http://www.depiction.com/), in addition to conven-

    tional Web conferencing. These platforms, although usable

    and helpful, leave much to be desired, as they are lacking the

    third dimension, which is needed to create a proper percep-

    tion of the emergency space, as well as a sense of co-presence

    of other virtual team members [1,10,11].

    To overcome this limitation, Kamel Boulos [11,12] pro-

    posed the development of novel emergency and public health

    virtual situation rooms in a suitable 3D (three-dimensional)

    virtual world (sometimes also called 3D serious gaming

    platform), where animated 3D graphical self-representations (avatars) of experts and professionals can collaborate together

    and discuss incident data in real time in a simulated 3D

    space representing the physical location where the emer-

    gency/public health incident of interest is unfolding and

    reflecting all changes taking place there (again in real time).

    The real-time link between the virtual world and the physical

    world incident would be two-way and multimodal involving

    geo-tagged physical/environmental sensor data feeds, citizen-

    contributed data (as found in Microsoft Vine and Who Is

    Sick), data gleaned via automatic analysis of Social Web

    content, textual and 3D spatialized audio/voice exchanges

    between virtual team members, video feeds, 3D simulations

    and animations, various Web mashups, and shared desk-

    top applications, among other possibilities. Various what-if

    scenarios can also be explored and tested in the virtual

    environment, making the proposed rooms much suited for

    real-time emergency and disaster management, e.g., for man-

    aging an influenza pandemic and planning and coordinating

    actions at global, regional and local levels.

    To achieve the above vision, the proposed collaborative

    and interactive situation room platform would marry virtual globes or 3D mirror worlds (such as

    Google Earth™: http://earth.google.com/) and 3D virtual

    worlds (such as Second Life: http://secondlife.com/ and

    OpenSim: http://opensimulator.org/) [1,11,12], and complement

    and tightly integrate them with other key technologies,

    e.g., real time, geo-tagged RSS (Really Simple Syndication)

    data feeds and geo-mashups (using Web services such as

    Yahoo! Pipes: http://pipes.yahoo.com/). The platform would

    weave data and services in real time from different sources

    into a new rich datascape that better reflects the current

    situation (the big picture) in novel visual ways that are easier

    to understand and manage. The platform has to be secure,

    enabling multiple distributed persons to see each other, visualize relevant data together in unique ways, conduct

    3D simulation and training scenarios, and collaborate in

    real time, each according to their assigned role and access

    privileges. True stereoscopic vision can be added, if needed,

    using readily available technologies such as NVIDIA 3D Vision

    (http://www.nvidia.com/object/3D_Vision_Overview.html) for

    more realistic 3D visualization, with a better sense of 3D depth

    and relief. It is noteworthy that it is now possible to stream a

    3D virtual world to a suitable mobile phone (see, for example,

    http://www.youtube.com/watch?v=XwRnjbkljnc) or other

    Internet-enabled, small form factor mobile devices, making

    this vision end-user device and platform-independent and

    thus suitable for those members of the emergency operations team who are on the move.
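    The idea of weaving several geo-tagged sources into one datascape can be sketched as a simple merge-and-filter step. The GeoItem record, its field names and the source labels used below are hypothetical; a real platform would also need feed parsing, access control and 3D rendering, none of which is shown.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GeoItem:
    """One geo-tagged report from any source (field names are assumptions)."""
    timestamp: datetime
    lat: float
    lon: float
    source: str
    text: str


def merge_feeds(*feeds):
    """Merge feeds, each already sorted by timestamp, into one ordered list."""
    return list(heapq.merge(*feeds, key=lambda item: item.timestamp))


def within_bbox(items, south, west, north, east):
    """Keep only the items located inside the given bounding box."""
    return [
        item for item in items
        if south <= item.lat <= north and west <= item.lon <= east
    ]
```

    Merging by timestamp and then filtering by bounding box gives each team member a single, time-ordered view of what is being reported in the region they are responsible for.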

    2.5. Technosocial Predictive Analytics: a systems science approach to proactive anticipatory reasoning

    Events occur daily that challenge the health, security and

    sustainable growth of our society, and often find us unpre-

    pared for the catastrophic outcomes. These events involve

    the interaction of complex processes such as climate change,

    emerging infectious diseases, energy reliability, terrorism,

    nuclear proliferation, natural and man-made disasters, and

    geopolitical, social and economic vulnerabilities. If we are to

    prevent the adversities and leverage the opportunities that emerge from these events, integrated anticipatory reasoning

    has to become an everyday activity. Technosocial Predictive

    Analytics (TPA [13]) supports anticipatory reasoning through a

    multidisciplinary approach to strategic analysis and response

    that enables a concerted decision-making effort by analysts

    and policymakers.

    There is increased awareness among subject-matter

    experts, analysts, and decision makers that a combined

    understanding of interacting physical and human factors is

    essential in addressing strategic decision making proactively.

    For example, multilevel studies that consider a broad range

    of biological, family, community, socio-cultural, environmen-

    tal, policy, and macro-level economic factors are necessary to


    Fig. 3 – Google Flu Trends for the United States (screenshot of http://www.google.org/flutrends/intl/en_us/). The site was

    displaying "data current through: October 04, 2009" on the day this screenshot was taken (on October 05, 2009). Users can also view flu activity by individual state.

    prevent and mitigate the emergence of health threats such

    as childhood obesity [14] and drug addiction [15]. Integrated

    understanding of the infrastructural, social and ideational

    context in which contentious social movements operate is

    an essential analytical step in framing the emergence of

    violent behaviour [16]. Combined understanding of anthro-

    pogenic effects (e.g., chemical waste) and natural processes

    (e.g., solar variation) is needed to predict the impact of global

    warming [17]. Insights on human cognitive and emotional

    biases improve our understanding of economic decisions and their effect on market prices, returns, and the allocation

    of resources [18]. TPA provides theoretical and operational

    advancements of these insights.

    The human brain provides a basic framework for mem-

    ory and prediction in which decision making emerges as a

    process of pattern storage, recognition and projection rooted

    in our experience of the world and driven by perception

    and creativity [19]. Qualities such as the ability to focus on

    what is perceived to be most important in the moment

    and the capacity to make quick decisions by insight and

    intuition make human judgment uniquely effective [20,21].

    However, the same qualities can also be responsible for fal-

    lacious reasoning when judgment lacks sufficient knowledge

    and expertise [22], and is biased by factors such as group-

    think [23,24], limitations in memory and focus attention

    [25,26], positive framing [27] and increased confidence in

    extreme judgments and highly correlated observables [28].

    TPA addresses these gaps by using knowledge management

    and modeling to inform human judgment and analytical gaming

    to neutralize biased reasoning. More specifically, TPA helps

    analysts and policymakers provide better proactive analysis

    and response by enabling naturalistic decision making while

    countering adverse influences on human judgment through a combined set of capabilities that (a) provide multidisciplinary

    knowledge reach-back to inform analysis and response dur-

    ing decision making; (b) supplement the expertise of the

    analyst and policymaker with simulated scenarios generated

    by computational models that integrate human and physi-

    cal factors, and (c) engage analysts and policymakers within

    a gaming environment that stimulates creative critical rea-

    soning through visual analysis and collaborative/competitive

    work, as described in Fig. 4a.

    TPA offers a collaborative visual analytic environment

    with user-friendly access to predictive models in the areas

    of security, energy and the environment that are informed

    by knowledge management processes, as shown in Fig. 4b.


    Fig. 4 – Top diagram (a) describes TPA's capability components. Bottom diagram (b) illustrates TPA's operational

    environment.

    Such an environment promotes model integration through

    innovative algorithms that enable the combination of diverse

    modeling paradigms such as Bayesian networks and system

    dynamics. Integrated modeling is informed by a Knowl-

    edge Encapsulation Framework that facilitates the distillation

    and vetting of relevant knowledge from heterogeneous

    data streams, including social media. The TPA collaborative

    working environment enables analysts and policymakers to

    stress-test the validity of their analysis and policy plans using

    techniques such as role-playing and serious gaming. Gaming

    data are stored over time and analyzed to make inferences

    about strategic decision making in context from social inter-

    action during game-play.

    TPA provides an ideal systems science approach for pub-

    lic health informatics with reference to challenges such as

    emerging infectious diseases. Current epidemic models char-

    acterize effects of transportation, demographic traits, schools,

    healthcare access, and host-to-host dynamics. TPA can extend


    Fig. 5 – CDC ILINet vs. normalized weekly frequency of blog posts containing ILI-related flu keywords.

    these existing modeling paradigms through the integration

    of socio-cultural models to provide a more accurate characterization of how public health responses reflect factors

    such as social order, individual and community-based health

    knowledge, and behaviour change due to public health com-

    munications in mass media and emerging Web technologies.

    One specific application of the TPA approach to public

    health informatics is the monitoring of seasonal influenza

    through Web and social media. As described earlier in this

    paper, many Internet users formulate search queries and post

    entries in blogs and discussion forums using terms related to

    influenza-like-illness (ILI) to identify sites and other users that

    offer medical diagnosis and advice [9,29]. There is a clear sense

    in which an increase/decrease in the number of ILI-based

    searches and posts in blogs and discussion forums reflects a higher/lower public concern for the spreading of influenza and

    can therefore be used to monitor seasonal influenza. Such an

    insight is corroborated by observed correlations of ILI-based

    blog posts and surveillance reports from sentinel health-

    care providers. For example, Fig. 5 plots ILI-based blog posts

    (source: http://www.spinn3r.com) with outpatient ILI Surveil-

    lance Network (ILINet) by the US Centers for Disease Control

    and Prevention on a weekly basis for the period from October

    05, 2008 to January 25, 2009. The evaluation of Pearson's

    correlation coefficient for the two data series yields a strong

    (r = 0.545) and statistically significant (95% confidence) correla-

    tion value. Such results and analyses can greatly improve the

    quality of existing models of seasonal influenza.
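    The correlation check described above can be reproduced in a few lines. The sketch below computes Pearson's r for two series and the associated t statistic; it assumes 17 weekly observations (one per week from October 05, 2008 to January 25, 2009 inclusive), which is an inference from the stated period rather than a figure given in the text, and it does not use the actual ILINet or blog data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Assumed sample size: 17 weekly points for the stated period.
t_value = t_statistic(0.545, 17)
```

    Under that assumption the t statistic is about 2.52, which exceeds the two-tailed 5% critical value of 2.131 for 15 degrees of freedom, consistent with the reported significance at the 95% confidence level.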

    3. Conclusions

    Technosocial Predictive Analytics is still in its infancy, but

    nevertheless provides health professionals with a powerful

    method of exploiting vast, continuous streams of user-

    generated content on a rapidly proliferating worldwide social

    network. As the technology develops it will provide more

    sophisticated and reliable tools for public health use. It must

    be emphasized however, that such techniques are only appli-

    cable where large proportions of a society regularly use Social

    Web services. There are still many people and sometimes

    entire communities who do not use the Web on a regular basis.

    For such communities there may be no easy way to gain general health data, and it is inadvisable to extrapolate findings

    from other communities in an attempt to predict their health

    state. However, as Social Web use increases exponentially due

    to technology and connectivity becoming more widely available

    and affordable, this situation will rapidly change. There is

    therefore a need to monitor the continuously shifting demo-

    graphic characteristics of a variety of Social Web services that

    populations are using. Each social network service and its

    prevailing user characteristics such as age, gender and preva-

    lent user geographic locations should be considered as these

    data can help in better interpreting any intelligence or results

    derived from these services.

    References

    [1] M.N. Kamel Boulos, M. Scotch, K.-H. Cheung, D. Burden, Web GIS in practice VI: a demo playlist of geo-mashups for public health neogeographers, International Journal of Health Geographics 7 (2008) 38, Available at: http://www.ij-healthgeographics.com/content/pdf/1476-072X-7-38.pdf (last accessed 2.12.09).

    [2] S. Wheeler, M.N. Kamel Boulos, Mashing, burning, mixing and the destructive creativity of Web 2.0: applications for medical education, RECIIS: Electronic Journal of Communication, Information and Innovation in Health 1 (1) (2007) 27–33, Available at: http://www.revista.cict.fiocruz.br/index.php/reciis/article/view/49/51 (last accessed 2.12.09).

    [3] M.N. Kamel Boulos, S. Wheeler, The emerging Web 2.0 social software: an enabling suite of sociable technologies in health and healthcare education, Health Information and Libraries Journal 24 (March (1)) (2007) 2–23.

    [4] R. Mittal, S. Nikkila, Analyzing the effect of real world events on feelings using we feel fine, Available at: http://www.public.asu.edu/~rmittal2/docs/MittalNikkila_ProjectReport.doc (last accessed 2.12.09).

    [5] X. Ni, G.-R. Xue, X. Ling, Y. Yu, Q. Yang, Exploring in the weblog space by detecting informative and affective articles, in: Proceedings of the 16th International Conference on World Wide Web, Banff, Alberta, Canada (Session: Industrial Practice & Experience), 8–12 May, 2007, pp. 281–290, Available at: http://www2007.org/papers/paper225.pdf or http://doi.acm.org/10.1145/1242572.1242611 (last accessed 2.12.09).

    [6] M.N. Kamel Boulos, On geography and medical journalology: a study of the geographical distribution of articles published in a leading medical informatics journal between 1999 and 2004, International Journal of Health Geographics 4 (March 1) (2005) 7.

    [7] G. Eysenbach, Infodemiology: tracking flu-related searches on the web for syndromic surveillance, AMIA Annual Symposium Proceedings (2006) 244–248.

    [8] G. Eysenbach, Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet, Journal of Medical Internet Research 11 (March (1)) (2009) e11.

    [9] J. Ginsberg, M.H. Mohebbi, R.S. Patel, L. Brammer, M.S. Smolinski, L. Brilliant, Detecting influenza epidemics using search engine query data, Nature 457 (February (7232)) (2009) 1012–1014.

    [10] G.F. Welch, D.H. Sonnenwald, H. Fuchs, B. Cairns, K. Mayer-Patel, H.M. Söderholm, R. Yang, A. State, H. Towles, A. Ilie, M.K. Ampalam, S. Krishnan, V. Noel, M. Noland, J.E. Manning, 3D medical collaboration technology to enhance emergency healthcare, Journal of Biomedical Discovery and Collaborations 4 (2009) 4.

    [11] M.N. Kamel Boulos, D. Burden, Web GIS in practice V: 3-D interactive and real-time mapping in Second Life, International Journal of Health Geographics 6 (51) (2007), Available at: http://www.ij-healthgeographics.com/content/pdf/1476-072X-6-51.pdf (last accessed 2.12.09).

    [12] M.N. Kamel Boulos, Novel emergency/public health situation rooms and more using 4-D GIS, in: Presented at: ISPRS WG IV/4 International Workshop on Virtual Changing Globe for Visualisation & Analysis (VCGVA2009), Wuhan University, Wuhan, Hubei, China, 27–28 October, 2009 (Published in ISPRS Archives, vol. XXXVIII ISSN No: 1682-1777 PART 4/W10). Available at: http://www.isprs.org/proceedings/XXXVIII/4-W10/papers/VCGVA2009_03608_Boulos.pdf (last accessed 10.02.10).

    [13] A. Sanfilippo, A.J. Cowell, L. Malone, R. Riensche, J. Thomas, S. Unwin, P. Whitney, P.C. Wong, Technosocial predictive analytics in support of naturalistic decision making, in: Proceedings of NDM9, the Ninth International Conference on Naturalistic Decision Making, London, UK, June, 2009.

    [14] NIH RFA-HD-08-023, Innovative Computational and Statistical Methodologies for the Design and Analysis of Multilevel Studies on Childhood Obesity (R01). Request for Applications (RFA) Number: RFA-HD-08-023, Department of Health and Human Services, National Institutes of Health, 2008.

    [15] NIDA, Drugs, Brains, and Behavior: The Science of Addiction. National Institutes of Health Publication No. 07-5605, National Institute on Drug Abuse, 2007.

    [16] Q. Wiktorowicz, Islamic Activism: A Social Movement Theory Approach, Indiana University Press, Bloomington, IN, 2004.

    [17] A. Gore, An Inconvenient Truth: The Planetary Emergency of Global Warming and What We Can Do About It, Rodale Books, New York, 2006.

    [18] S. Mullainathan, R.H. Thaler, Behavioral economics, International Encyclopedia of the Social & Behavioral Sciences (2001) 1094–1100.

    [19] J. Hawkins, S. Blakeslee, On Intelligence, Owl Books, New York, NY, 2004.

    [20] G. Gigerenzer, Gut Feelings: The Intelligence of the Unconscious, Penguin Books, New York, NY, 2007.

    [21] M. Gladwell, Blink: The Power of Thinking Without Thinking, Little, Brown and Company, Boston, MA, 2005.

    [22] G. Klein, Sources of Power: How People Make Decisions, MIT Press, Cambridge, MA, 1998.

    [23] I. Janis, Victims of Groupthink: A Psychological Study of Foreign-policy Decisions and Fiascoes, Houghton, Mifflin, Boston, 1972.

    [24] J. Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations, Little, Brown, London, 2004.

    [25] G. Miller, The magical number seven, plus or minus two: some limits on our capacity for processing information, The Psychological Review 63 (1956) 81–89.

    [26] R.J. Heuer Jr., Psychology of Intelligence Analysis, Center for the Study of Intelligence, Central Intelligence Agency, Washington, DC, 1999.

    [27] A. Tversky, D. Kahneman, The framing of decisions and the psychology of choice, Science 211 (4481) (1981) 453–458.

    [28] D. Kahneman, A. Tversky, On the psychology of prediction, Psychological Review 80 (1973) 237–251.

    [29] P. Polgreen, Y. Chen, D. Pennock, F. Nelson, Using internet searches for influenza surveillance, Clinical Infectious Diseases 47 (November (11)) (2008) 1443–1448.
