data socialscienceprogramme


Upload: dan-mcquillan

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Data socialscienceprogramme

{Data|Social} Science
29/11/13 14:00-18:30 NAB314

14:00 Introduction

14:15 Big Data Practices
Evelyn Rupert

14:30 Data as Design Tool
Rebecca Fiebrink

14:45 Gamification, visualisation and crowd-sourcing
Frederic Fol Leymarie

15:00 Understanding Game play behaviour
Jeremy Gow

15:15 Big Data and Disasters
Dhiraj Murthy

15:30 Coffee

15:45 Ethical Challenges for Data Science
Dan McQuillan

16:00 Values in modern digital relations and traditional prosperity theology
Bev Skeggs

16:15 Legible Machine Learning
Marco Gillies

16:30 On some Data Science applications in interdisciplinary research
Daniel Stamate

16:45 MSc Data Science
Daniel Stamate

17:00 Discussion and Drinks

Big Data Practices
Evelyn Rupert

I use the term ‘Big Data practices’ to suggest that what is ‘big’ about Big Data are changing practices that are reconfiguring four kinds of relations: social, method, data, and research. I’ll focus on the last of these and on how our academic craft is generating Big Data, from online research articles to other forms of digital content such as websites, databases, blogs, profiles, images, tweets, podcasts and so on. Through these online media academics are re-versioning and multiplying their research outputs such that the main output – the research article – is but one part of a larger and longer process of relations and practices accumulated as data on the internet. How might we think about this? I’ll respond to this in relation to the journal I am editing, Big Data & Society. I’ll discuss how we are organising the journal as a digital space for linking out to related content and developing a ‘lively’ logo built on the co-word analysis of journal keywords to explore how it is part of the practices making up what ‘is’ Big Data.
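
As a rough illustration of what a co-word analysis of journal keywords can involve, the sketch below counts how often pairs of keywords appear together on the same article. It is a minimal example under stated assumptions, not the journal's actual method: the function name `coword_counts` and the keyword lists are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists):
    """Count how often each unordered pair of keywords shares an article."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Normalise case and drop duplicates within a single article.
        unique = sorted({k.strip().lower() for k in keywords})
        # Every pair of keywords on the same article co-occurs once.
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical keyword lists, one per article (illustrative only).
articles = [
    ["Big Data", "ethics", "surveillance"],
    ["big data", "social media", "ethics"],
    ["social media", "disasters"],
]

counts = coword_counts(articles)
for (first, second), n in counts.most_common(3):
    print(f"{first} + {second}: {n}")
```

The resulting pair counts could serve as edge weights in a keyword network, which is one plausible basis for a visualisation that changes as new articles are added.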

Data as Design Tool
Rebecca Fiebrink

Gamification, visualisation and crowd-sourcing
Frederic Fol Leymarie

Gamification, visualisation and crowd-sourcing, with, as an illustration, our new BBSRC grant: DockIt: a Crowd-Sourced Molecular Docking Puzzle Game. I will address the potential to apply this approach to other complex big data & analytics problems, in particular in the realm of smart-cities.
Information retrieval: the need for better multimedia search and data management. I will illustrate what we can contribute with recent ongoing research on a novel way to search images using shape information; work funded in part by the EU FET project CEEDs. I will say a few words about CEEDs as well, which focuses on novel interfaces for human users dealing with complex big data problems: http://ceeds-project.eu/

Understanding Game play behaviour
Jeremy Gow

Big Data and Disasters
Dhiraj Murthy

Though natural disasters are the product of meteorological, seismic, and other physical actors, they are always social events. Specifically, the ways in which warning occurs, disasters are responded to, and how reconstruction takes place are all mediated by sociopolitical factors. These three time envelopes of pre-disaster, disaster, and aftermath are particularly important in studying disasters. Social media is 'always on' and ubiquitous and these traits have meant that data is being generated during all three time periods. The volume of data being collected on various social media is immense and easily places it within the category of Big Data. My recent work has been focused on data from Hurricane Sandy. The storm caused over $65 billion in damage, making it the second costliest storm in U.S. history. In this project, I examine the behavior of Twitter users from October 22, 2012 to November 3, 2012, using mentions, links and hashtags for data analysis. We found that certain Twitter rose to prominence depending on the stage of the storm. For example, in the days following Hurricane Sandy’s initial landfall, users became more interested in relief efforts. Data was collected from October 22, 2012 to November 3, 2012, giving a two-week window of Twitter activity. We utilized the Twitter API to collect geo-located tweets from 50 major US cities. Tweets were filtered for three storm-related terms: “hurricane”, “storm” and “sandy”, yielding a total of 142,768 tweets. A second project I am working on refined this data by following any links to Instagram images within the tweet. This search returned 11,964 Instagram images that were hand coded into thirteen separate categories. By studying these images, we were able to discern which categories rose to prominence during the three time envelopes. For example, food images were mostly dominant pre-disaster and, during the disaster, damage-related images were dominant. The data and methods of both projects will be briefly introduced.

Ethical Challenges for Data Science
Dan McQuillan

This presentation will interrogate key ethical challenges that are arising at the borders of social science and computing, and will suggest some approaches to transform these tensions into productive lines of research. In a post-PRISM environment, big data research needs to distinguish itself from surveillance. 'Because we can' is not an adequate rationale for researching social media and the data exhaust because it is indistinguishable from the dynamics of the NSA on one hand and Silicon Valley on the other. Do ethics committees understand the implications of heterogeneous metadata better than the judiciary who failed in their oversight of PRISM? Further, the algorithms are as important as the data: a computing-based understanding of algorithms must be combined with a sociological appreciation of their consequences. We are already seeing a proliferation of 'predictive methods' with the application of data science and machine learning to everything from Wonga loans to drone strikes.
Rapid development of methods is outpacing the development of a social framework for their governance. By drilling down to issues of data construction, and looking at algorithms through a combination of Foucault and cybernetics, this presentation will propose participatory methods as an important new line of development in data science, and suggest that emerging areas of citizen science are finding an appropriate balance of the empirical and the ethical.

Values in modern digital relations and traditional prosperity theology
Bev Skeggs

There has been a great deal of interest in how capital has intervened in almost every area of life, leading some to propose new forms of capital, e.g. ‘emotional capitalism’, and others to suggest that processes of valuation are now the major method for understanding the social world. Whilst, no doubt, capital behaves according to its own logic, finding new lines of flight, converting affects into value, making multi-culturalism marketable, generating new forms of bio-capital, and making many of our actions subject to the logic of calculation, this project asks if anything is left behind. Is there anything that cannot be capitalized upon? Many social theories reproduce the logic of capital. But if we only understand the world from the perspective of this logic what do we miss seeing? My previous research projects have drawn attention to how values are formed beyond value, unnoticed and unseen, producing new ways of being and doing in the world, organized differently through spatial and temporal co-ordinates. This project consolidates and expands this analysis by exploring values (and their relationship to value) through two limit cases that attempt to convert all values to value: modern digital relations and traditional prosperity theology.

Legible Machine Learning
Marco Gillies

This talk will give an overview of research that uses machine learning as part of a tool to enable actors and ordinary gamers to design the movement and behaviour of a virtual character. They use data of their movements as the means for customising the algorithms that control the characters. The key challenge in this work is how to debug the models when they go wrong and do not work as intended. Learning algorithms are often opaque, even to expert researchers, making them difficult to debug. This research has led us to the importance of designing algorithms and tools that are legible to users. This means that they must support a clearly legible conceptual model both in their interface and the algorithm itself. We will conclude with a brief discussion of how this might apply to data research in the social sciences.

On some Data Science applications in interdisciplinary research
Daniel Stamate

We present a series of applications of Machine Learning, Statistical Data Mining and Big Data Analytics and research work in: (a) predicting medical treatment outcomes based on genotype data in medical sectors in which efficient treatment prescribing is paramount but in which the trial and error approach to prescribing a working treatment is current practice; (b) diagnosing cancer patients based on gene expression data; (c) the evaluation of forecasting models in the renewable energy sector (wind time series); (d) web mining and sentiment analysis; (e) mining census data. A brief introduction of the new Data Science & Soft Computing Lab and its activity will conclude this presentation.

16:45 MSc Data Science
Daniel Stamate

We outline the profile of this new MSc programme in Data Science, and the opportunities it brings to its students, in particular in studying cutting-edge Data Science technologies, and in being exposed to and potentially involved in interdisciplinary research work in the College, to which these students could contribute with their expertise in Machine Learning, Statistical Data Mining, and Big Data Management and Analytics during their final project work or possibly in subsequent PhD study. Indeed, these fields inspire new trends not only in industry but in every other sector of activity, including research, in which processing and analysing data brings unprecedented challenges and offers unprecedented opportunities. In this presentation we also want to suggest concrete ways in which the Data Science MSc students could be offered the opportunity to be inspired by the interdisciplinary research activities developed in the College's departments, an opportunity which could potentially be followed by the involvement of some of these students in these activities.