EPSRC Cross-SAT Big Data Workshop:
Well Sorted Materials
5th August 2015
Contents
Introduction
Dendrogram
Tree Map
Heat Map
Raw Group Data
For an online, interactive version of the visualisations in this document, go here:
www.well-sorted.org/output/EPSRCBigData
Introduction
Dear participant,
Thank you for taking part in submitting and sorting your ideas.
This document contains several visualisations of your ideas, grouped by the average of your online sorts. They are:
Dendrogram - This tree shows each submitted idea and its similarity to the others. The lower two ideas 'join', the more people grouped those two ideas together. For example, if two ideas join at the bottom, every person grouped those two together.
Tree Map - This visualisation presents an 'average' grouping. It is calculated by 'cutting' the Dendrogram at the dashed line so that any items which join lower than that line are placed in the same group. In addition, rectangles which share a side of the same length are more similar to each other than their peers.
Heat Map - This visualisation shows a similarity matrix where each idea is given a colour at the intersection with another idea, showing how similar the two are. This is useful to see how well formed a group is. The more red there is in a group (shown by the black lines), the more similar the ideas inside it were judged to be.
Raw Group Data - This table shows every submitted idea and its longer description. They are shown in the same order as the Dendrogram (so similar ideas are close to each other) and split into the coloured groups used in the Tree Map. In addition, each idea has been given a unique number so they are easier to find.
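The grouping described above can be illustrated with a minimal sketch. It is not Well-Sorted's actual implementation: the function names, the toy data, the use of average linkage and the cut height of 0.5 are all assumptions made for illustration. The similarity between two ideas is taken as the fraction of participants who placed them in the same group, and the 'cut' keeps merging clusters until the closest pair is further apart than the cut height.

```python
from itertools import combinations


def similarity_matrix(sorts, n_items):
    """Fraction of participants who put each pair of ideas in the same group."""
    sim = [[0.0] * n_items for _ in range(n_items)]
    for groups in sorts:                      # one participant's sort
        for group in groups:                  # one group within that sort
            for i, j in combinations(sorted(group), 2):
                sim[i][j] += 1.0
                sim[j][i] += 1.0
    for i in range(n_items):
        for j in range(n_items):
            sim[i][j] = 1.0 if i == j else sim[i][j] / len(sorts)
    return sim


def cut_groups(sim, cut_height):
    """Agglomerative clustering (average linkage, an assumption) with
    distance = 1 - similarity, stopped once the closest pair of clusters
    is further apart than cut_height."""
    clusters = [[i] for i in range(len(sim))]

    def dist(a, b):
        # Mean distance over all cross-cluster pairs of ideas.
        return sum(1.0 - sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min(
            (dist(a, b), x, y)
            for (x, a), (y, b) in combinations(enumerate(clusters), 2)
        )
        if d > cut_height:
            break
        clusters[x] = clusters[x] + clusters[y]   # x < y, so index x stays valid
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: three participants sorting four ideas (0..3).
sorts = [
    [[0, 1], [2, 3]],
    [[0, 1], [2], [3]],
    [[0, 1, 2], [3]],
]
sim = similarity_matrix(sorts, 4)
groups = cut_groups(sim, cut_height=0.5)
print(groups)
```

In this toy data every participant grouped ideas 0 and 1 together, so they join at the very bottom of the tree (distance 0) and stay together after the cut, while ideas 2 and 3 remain separate. The matrix `sim` plays the role of the similarity matrix coloured in the Heat Map.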
EPSRC Cross-SAT Big Data Workshop - Materials Generated by Well-Sorted.Org Page 1
Raw Group Data
Colour | # | Title | Description
Red | 1 | Anonymity and Privacy | Pulling together data to gain greater insights. However, this has an impact on privacy. Even anonymisation or pseudonymisation have challenges as data can be worked back to the source. Mathematical techniques to prevent deduction of data would be useful.
Red | 2 | Data privacy for Big Data | Understanding how to achieve adequate levels of privacy given the difficulty in using traditional methods of cleaning and delinking.
Red | 3 | Data personalisation and de-identification | Finding robust, scalable, practical ways of reconciling the tension between the need to identify/own personal data (and data derived from personal data) while also being able to contribute to population level analyses of data without being identified.
Red | 4 | Cybersecurity | Analysis of how big data can be used to track activity in an organisation and detect possible cyber attacks, track threats and campaigns and provide context and insight into the form of attacks. This could enable early warning and prevention.
Red | 5 | Trusted Crowd Sourced Information | How can we develop methods that ensure crowd-sourced information is accurate and trustworthy?
Colour | # | Title | Description
Blue | 6 | Applying techniques to new areas | For many big data problems there is a lack of expertise in the field to understand the correct approaches to employ and this needs data scientists and application scientists to work together.
Blue | 7 | EPS big data problems | There are a huge number of areas in the EPS disciplines where big data problems exist, but the expertise to tackle them does not. This is especially important for EPSRC engagement with industry.
Blue | 8 | Innovation ready data scientists | How do we produce the multidisciplinary researchers and entrepreneurs who are commercially aware, can talk/engage with people, and yet also have the data analytic and visualisation skills?
Blue | 9 | Develop the skill set | There is a significant lack of people with proper experience and knowledge of handling big / distributed datasets. Those people with academic experience often don't have experience with the tools used in industry.
Blue | 10 | Raise the profile of Statistical Science in UK | At the heart of Big Data are the statistical methods required to make sane and rational inferences that lead to actionable knowledge. SS is central to realising the potential of Big Data and this is a good thing for EPSRC supporting SS.
Colour | # | Title | Description
Green | 11 | Taking the data analyst out of the loop | How do we create data exploration interfaces and associated methodologies that enable (and guide) non-experts to explore their own data to discover and then exploit value? Is this partly an education issue?
Green | 12 | Improving accessibility to big data | Research to create methods & tools that non-experts can use to explore the potential of big data - enabling wider uptake and new kinds of innovation and impact driven by a wide range of people.
Green | 13 | Complex Data Visualisation | How can we develop visualisations that make complex data sets easier to understand and analyse?
Colour | # | Title | Description
Orange | 14 | Defining the physiological envelope | Sophisticated physics-based simulations of individual patients are performed using exquisitely accurate anatomical data from medical images. The boundary conditions are equally important and should be personalised from information in the clinical record.
Orange | 15 | Uncertainty and variation in physiological models | We need to learn how to characterise and to represent the uncertainty and variation in information that comes from clinical data, and to develop methods for the propagation into physiological models for diagnosis and interventional planning.
Orange | 16 | Dynamic modelling of complex real-life systems | Synthetic and playable data-driven models of complex interacting systems in biology, engineering, health, environment, transport, robotics, manufacturing and public policy, unsupervised learning of emergent phenomena capable of driving decision making.
Orange | 17 | Define a connectedness in research landscape | Almost all sciences are becoming more reliant on the sensible analysis and production of Big Data. Many of these disparate disciplines are being tied together by the need for common computational statistical methods to make the advances BD promises.
Orange | 18 | Application-specific research | Work in support of applications in the Digital Economy and PaCCS programmes. Specifically there will be emerging "big data" challenges from the ESRC Research and Evidence Hub and also the new EPSRC IoT Research Hub.
Orange | 19 | Opportunity to deliver impact in number of areas | There is no doubt that BD presents many opportunities to physical sciences - how this is to be harnessed and deliver impact can be facilitated by EPSRC - ATI being one example of many.
Colour | # | Title | Description
Purple | 20 | Economic and social models for data | Much of the impact from big data should come from the new economies and social structures that it engenders. Can we model this in a way that gives useful, predictive abilities for economies and societies of the future?
Purple | 21 | Social and realtime data | Making use of new forms of data coming from social networks and sensor networks to augment curated data from longitudinal studies.
Purple | 22 | Symbiotic Human-Machine Collaboration | Understand the science of how humans and machines can work together in the most effective way that makes the best use of their complementary data analysis skills.
Colour | # | Title | Description
Yellow | 23 | Closing loops | Creating new and automated ways of collecting in-use, through and end of life data and feeding it back to early stages of product development processes - to improve existing products and inform the development of new ones.
Yellow | 24 | Archiving software is essential for big data | Software often generates big data. If this is the case, it is not necessary to always keep the big data but it is important to make sure the software generating the data is archived in a sustainable and recoverable manner.
Yellow | 25 | When do we no longer need a dataset? | The increase in speed of software can mean it is better to reproduce data than store it. A life expectancy of this data might therefore be four times as long as it takes to generate it. Understanding this life cycle is key to an effective big data policy.
Yellow | 26 | Lots of little bits of data make big data | A lot, if not most, of big data is made up of many bits of small data. Engendering a culture of sustainably documenting and archiving small data is a critical component of many scientific areas where EPSRC can contribute by promoting best practice.
Yellow | 27 | Sharing Big Data from Many Producers | Understanding how to move big data where the model is not simply having one large instrument, but is instead many producers of large amounts of data who need to share and combine subsets places different stresses on the infrastructure we have in place.
Yellow | 28 | Data discovery and aggregation | Within large organisations data is often distributed in separate IT systems, many using commercial enterprise software. How do we apply map-reduce type operations at a meta-level? How do we securely aggregate data that may be commercially sensitive?
Yellow | 29 | Methods/infrastructure to integrate models/data | The model-based interpretation of Big Data requires effective and efficient integration of the data with the modelling and simulation tools. We need to develop the whole area of physics-based Reduced-Order Modelling, and the infrastructure to integrate.
Colour | # | Title | Description
Pink | 30 | Creation of Synthetic Benchmark Datasets | Based on an understanding of application areas, the creation of large scale, realistic data sets that can be used to benchmark potential solutions.
Colour | # | Title | Description
Silver | 31 | Data Analytics for Disruptive Business Models | New ways of collecting, analysing, visualising heterogeneous data (especially that of new double-sided markets and platforms). What are the best ways to encourage the two sides to provide value?
Silver | 32 | Landing decisions | Businesses don't need more data or insights, they need better decisions based on data. How to translate big data into business changes and impact (beyond good sounding case studies) is difficult.
Silver | 33 | Asset optimisation | How do we organise assets to provide optimised service offerings? E.g. real-time asset tracking and health monitoring; portfolio management; predicting customer behaviour and usage patterns.
Silver | 34 | Product optimisation | How do we optimise products using all relevant product data? E.g. physics based design simulations; manufacturing process data; service data from current products in the field; knowledge of the market place and competitor products.
Colour | # | Title | Description
Brown | 35 | Intelligent simulations | Big data could be used to inform the design, validation and verification of computational simulations - bringing process simulations closer to real-world processes, akin to CAD for 3D products.
Colour | # | Title | Description
Cyan | 36 | Mathematics of Information | Mathematics is the language of information and data. Some mathematical areas (harmonic analysis, optimization, computation, topology) are already engaged in Big Data research; the challenge is to create intellectual space for further developments.
Cyan | 37 | Transformation at the maths / CS interface | Using problems of challenging data (big, heterogeneous, streaming, soft, uncertain, partial, garbled) to generate novel research at the interface between mathematics and computer science, especially in algorithms, complexity, computability and reasoning.
Cyan | 38 | Algorithms | Theoretical Computer Sciences are fairly weak in the UK; the challenge is expanding capacity in the general area of Algorithms: deterministic, random, combinatorial, mixed..., their design, analysis and complexity.
Cyan | 39 | Data algorithms at scale | Developing robust algorithms to give good enough decisions/predictions in situations of very high data volumes/velocities and/or at very low levels of data integrity (through heterogeneity or measurement uncertainty).
Cyan | 40 | Development of new algorithms for big data | Given the size and complexity of some big data problems, new approaches are needed that combine statistics, computer science and high performance computing to tackle the analysis.
Cyan | 41 | Analytics Efficiency | As data grows exponentially, although advances in processing it are also speeding up, there comes a point when the cost of analysing it exceeds the value gained. Research into efficient analytic techniques will help to continue getting the benefit from this.
Cyan | 42 | Cross-cutting machine learning | Machine learning is the machine room of Big Data. It spans themes in Statistics, Functional Analysis, Approximation Theory, Optimization, Computer Science and Engineering. The challenge is to plan research there as truly inter-disciplinary.
Cyan | 43 | Novel Machine Learning Methods | New algorithms to support real-time analysis of uncertain, incomplete, inconsistent and possibly corrupted data.
Cyan | 44 | Scalable Machine Learning Frameworks | Interactive programming notebooks could be the new Excel. If these get linked to distributed computing and easy to use programming frameworks, business can get faster access to insights without specialised staff.
Cyan | 45 | Multi-scale data analysis | Applications involving very large data sets and streams being mined for structure, pattern and community detection on a variety of scales simultaneously. Examples: social, economic, behavioural as well as systems biology, gen- and proteomics, cosmology.