The Rensselaer IDEA: Data Exploration
DESCRIPTION
The Rensselaer Institute for Data Exploration and Applications is addressing new modes of data exploration and integration to enhance the work of researchers on campus and beyond. This talk outlines the "data exploration" technologies being explored.
TRANSCRIPT
![Page 1: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/1.jpg)
Data Exploration
Jim Hendler
Director, Rensselaer Institute for Data Exploration and Applications
THE RENSSELAER IDEA
Rensselaer Polytechnic Institute, USA
http://www.cs.rpi.edu/~hendler
![Page 2: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/2.jpg)
IDEA
Data-driven research areas at RPI
• Data-driven Medical and Healthcare Applications
• Predictive Models for Business and Economics
• “Biome” studies for Built and Natural Environments
• Question Answering from texts and data
• Resiliency Models for Population-Scale Problems and cyber-security domains
• Semantically-enabled Data Services for Science and Engineering Research
• Materials genome and nano-manufacturing informatics
• Platforms for testing Policy and Open Data issues
• …
![Page 3: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/3.jpg)
The Rensselaer IDEA: empowering our researchers
Data discovery, integration, and interaction technologies
Application-specific data tools
![Page 4: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/4.jpg)
The trunk: Shared Data Technologies
High Performance Modeling and Simulation
• Center for Computational Innovation
Cognitive Computing
• Watson at Rensselaer IBM Partnership
Perceptualization
• Experimental Multimedia Performing Arts Center
Data Science
• Data Science Research Center
![Page 5: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/5.jpg)
Roots: Data Exploration
Discover
Integrate
Validate
Explain
Geekopedia: Data exploration helps a data consumer focus an information search on the pertinent aspects of relevant data before true analysis can be achieved. In large data sets, data is not gathered or controlled in a focused manner. Even in smaller data sets, data that are not gathered with a rigid and specific technique can end up disorganized, with a myriad of subsets each…
DATA
![Page 6: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/6.jpg)
Data Exploration Challenges
Discover
Integrate
Validate
Explain
These needs live outside traditional data/info architectures
![Page 7: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/7.jpg)
Discovery needs semantics
How do you find the Data you need?
Middle Eastern Terrorists for $800?
![Page 8: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/8.jpg)
Discovery – there’s a lot out there
![Page 9: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/9.jpg)
Discovery needs more than keywords
World Bank: Africa
US Data.gov: Crop
Africover: Agriculture
Kenya: Agricultural
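As a rough illustration of why keyword search falls short here, the sketch below assumes the four labels on this slide are the terms under which each source files data that is relevant to one question (say, agriculture in Africa). The concept table and both function names are hypothetical, invented for this example; a real system would take the term relationships from a shared vocabulary or ontology rather than a hand-written dictionary.

```python
# Catalog entries as (source, label) pairs, taken from the slide.
CATALOG = [
    ("World Bank", "Africa"),
    ("US Data.gov", "Crop"),
    ("Africover", "Agriculture"),
    ("Kenya", "Agricultural"),
]

# Hypothetical concept table: each label maps to the concepts it is about.
CONCEPTS = {
    "africa": {"africa"},
    "crop": {"agriculture"},
    "agriculture": {"agriculture"},
    "agricultural": {"agriculture"},
}


def keyword_search(query):
    """Return sources whose label equals the query string (case-insensitive)."""
    return [src for src, label in CATALOG if label.lower() == query.lower()]


def concept_search(query):
    """Return sources whose label shares at least one concept with the query."""
    wanted = CONCEPTS.get(query.lower(), set())
    return [
        src
        for src, label in CATALOG
        if CONCEPTS.get(label.lower(), set()) & wanted
    ]


print(keyword_search("agriculture"))  # only the literal match
print(concept_search("agriculture"))  # every source filed under a related term
```

The literal search finds a single source, while the concept-based search also reaches the sources that use "Crop" and "Agricultural" for the same idea.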
![Page 10: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/10.jpg)
Integration needs Semantics
Person
RIN 660125137
Address # 1118
Address St Pinehurst
Address zip 12203
Course topic CSCI
Course # 4961
Campus Personnel
RPI ID 660125137
Name Hendler
Campus Classes
CRN 1118
Name Intro to Physics
YES
NO!!!!
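The point of this slide can be sketched in a few lines, reading it as: RIN 660125137 and RPI ID 660125137 should be linked (YES), while Address # 1118 and CRN 1118 should not (NO), even though the values are equal. The semantic type names and the `can_link` function below are assumptions made up for this illustration, not part of any RPI system.

```python
# Hypothetical semantic types for some of the fields shown on the slide.
FIELD_TYPES = {
    ("Person", "RIN"): "rpi_person_id",
    ("Person", "Address #"): "street_number",
    ("Campus Personnel", "RPI ID"): "rpi_person_id",
    ("Campus Classes", "CRN"): "course_registration_number",
}


def can_link(field_a, value_a, field_b, value_b):
    """Link two values only if they are equal AND their fields mean the same thing."""
    type_a = FIELD_TYPES.get(field_a)
    type_b = FIELD_TYPES.get(field_b)
    return type_a is not None and type_a == type_b and value_a == value_b


# Same value, same meaning: a valid join.
print(can_link(("Person", "RIN"), "660125137",
               ("Campus Personnel", "RPI ID"), "660125137"))

# Same value, different meaning: a street number is not a course number.
print(can_link(("Person", "Address #"), "1118",
               ("Campus Classes", "CRN"), "1118"))
```

A join on values alone would accept both pairs; the type check is what rejects the second one.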
![Page 11: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/11.jpg)
Semantic Web and Linked Data (UK)
County Council
Ordnance Survey
Royal Mail
IOGDC Open Data Tutorial
![Page 12: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/12.jpg)
http://logd.tw.rpi.edu
Data Mashups
![Page 13: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/13.jpg)
Validation needs semantics
Easy for us
![Page 14: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/14.jpg)
Hard for machines…
Head to head comparison shows that burglaries in Avon and Somerset (UK) far exceed those in Los Angeles, California
![Page 15: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/15.jpg)
Data + everything else you know
Same or different?
Do the terms mean the same? Are they collected in the same way? Are they processed differently? …
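One way to make these questions machine-checkable is to attach them to each dataset as metadata and compare before comparing numbers. The sketch below is only an illustration under that assumption: the key names, the `mismatches` function, and all metadata values are placeholders invented here, not the actual definitions used by either police force.

```python
# The three questions from the slide, as metadata keys (key names are hypothetical).
ASPECTS = ("definition", "collection", "processing")


def mismatches(meta_a, meta_b):
    """List the aspects on which two datasets' metadata are missing or disagree."""
    return [
        aspect
        for aspect in ASPECTS
        if meta_a.get(aspect) is None or meta_a.get(aspect) != meta_b.get(aspect)
    ]


# Placeholder metadata, invented for illustration only.
avon_somerset = {
    "definition": "burglary definition A",
    "collection": "method X",
    "processing": "raw counts",
}
los_angeles = {
    "definition": "burglary definition B",
    "collection": "method X",
    "processing": "raw counts",
}

problems = mismatches(avon_somerset, los_angeles)
if problems:
    print("Not directly comparable; check:", ", ".join(problems))
else:
    print("Metadata agree; a head-to-head comparison is plausible.")
```

A head-to-head chart would only be drawn when the list of mismatches is empty; otherwise the differences are surfaced to the person exploring the data.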
![Page 16: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/16.jpg)
Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007)
Validation/Explanation need knowledge
Statistical correlation needs explanation
![Page 17: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/17.jpg)
Explanation also needs Semantics
Inference Web: McGuinness – various DoD/IC projects
![Page 18: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/18.jpg)
Closing the loop: where do the semantics come from?
Data
Prediction
Model
Design
How do we go from the predictive analytics of Big Data to models/explanations that allow new understanding?
![Page 19: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/19.jpg)
1. Better tools for Analytics, Agents and HPC
Make the tools and algorithms being developed by RPI researchers more “reusable” and multitask (including HPC data-analytic tools)
![Page 20: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/20.jpg)
2. Next-Gen Visualization (at scale)
How can multi-modal, multi-user, large-scale sensory interaction (visualization, sonification, haptics) change the way we understand data?
![Page 21: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/21.jpg)
3. Include “agents” in the modeling
Develop technologies that enable researchers to work with “human-based” data at larger scales and in new ways
• Population-scale computing models for agent-based simulations
![Page 22: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/22.jpg)
Approach
Platform: Research in using supercomputers for discrete modeling
• Carothers’ ROSS model
KR Model:
• Weaver’s restricted rules on graphs
Challenge problem:
• Classification algorithms at petaflop scale
• “Logical” (nonlinear, discontinuous) agents
![Page 23: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/23.jpg)
4. Exploit Cognitive Computing
IDEA will be the hub of Rensselaer’s cognitive-computing research
• e.g., answer questions such as “Why” and “How”, integrated with large-scale simulations
![Page 24: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/24.jpg)
Watson’s parallel model
Distributed (coarse-grained) parallelism
© Making Watson Fast, IBM J Res and Dev, 3/4 2012
![Page 25: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/25.jpg)
DeepQA type approach best on large clusters
(Physical) Simulation runs on supercomputers
Cognitive Computing at Scale
![Page 26: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/26.jpg)
Approach: link these computational models
Surmise (unproven): Cognitive Computing on a fast (large) cluster can query computations run against data generated by simulations (physical or agent-based) on the supercomputer
![Page 27: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/27.jpg)
5. Data services will provide synergy across disciplines
• Semantics is a key technology for common data services
Discovery, Integration, Validation, Curation, Citation, Archiving …
![Page 28: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/28.jpg)
Conclusions
• The “warehouse” is only a small part of the data ecosystem
• Database technologies are only part of the story
• Discovery, Integration, …, validation, explanation are key to solving problems with data
• Closing the loop means “exploring” our data
• Humans are still a key player in this
• The Rensselaer IDEA will explore
• Data-driven applications and tools, but also…
• … multimodal visualization, multiscale and agent modeling, cognitive computing, and semantic data platforms
![Page 29: The Rensselaer IDEA: Data Exploration](https://reader033.vdocument.in/reader033/viewer/2022052618/554d94a8b4c905575e8b47d0/html5/thumbnails/29.jpg)
Rensselaer Institute for Data Exploration and Applications