Big Data and Tangibles - TEI '13

Posted on 01-Jul-2015


DESCRIPTION

Slides created for the 2013 conference on Tangible, Embedded and Embodied Interaction (TEI '13).

TRANSCRIPT

From Big Data to Insights: Opportunities and Challenges for TEI in Genomics

Orit Shaer, Ali Mazalek, Brygg Ullmer, Miriam K. Konkel

Outline

Introduction to genomics/motivation

Design challenges

Case studies

Opportunities for TEI

Going forward

Genomics

“While the work is a challenge, making genetics interactive is potentially as transformative as the move from batch processing to time sharing.” (Bafna, V., et al., Communications of the ACM, Jan. 2013)

Project flow (diagram): Genome Sequencing Project

Stages shown: Sequencing Centers; High-throughput Sequencing; Draft Sequence; Finished Sequence; Sequence Archiving; Genome Annotation; DNA, Sequence, Protein; Prediction, Pathways; Comparative Analysis; Target Selection

Schkolne, Ishii, and Schröder 2004.

TEI for Scientists

Gillet et al. 2005

Brooks et al. 1990: Project GROPE

Tabard, A., et al. 2011. eLabBench.

Challenges

Scale

Heterogeneous Data

Diverse Audience

Scale

File system at the Broad Institute: 13+ PB

One run of an Illumina HiSeq 2500: 6 billion paired-end sequences (600 gigabases, or 120 Gb/day)

1000 Genomes Project: 692 collaborators, 110 institutions, >15 groups in (bi-)weekly conference calls

Blue Waters cluster: >380K CPU cores + >3K GPUs
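The HiSeq 2500 figures above can be sanity-checked with a little arithmetic. This Python sketch divides total output by read count and by daily throughput. The constant and function names are illustrative. The inputs are the numbers quoted on the slide, reading “Gb” as gigabases.

```python
# Back-of-envelope arithmetic for the HiSeq 2500 figures quoted on the slide.
# Inputs come from the slide ("Gb" read as gigabases); names are illustrative.

READS_PER_RUN = 6_000_000_000      # 6 billion paired-end sequence reads
BASES_PER_RUN = 600_000_000_000    # 600 gigabases per run
BASES_PER_DAY = 120_000_000_000    # 120 gigabases per day


def mean_read_length(bases: int, reads: int) -> float:
    """Average number of bases per sequence read."""
    return bases / reads


def run_duration_days(bases: int, bases_per_day: int) -> float:
    """Days of instrument time implied by total output and daily throughput."""
    return bases / bases_per_day


if __name__ == "__main__":
    print(f"mean read length: {mean_read_length(BASES_PER_RUN, READS_PER_RUN):.0f} bases")
    print(f"run duration: {run_duration_days(BASES_PER_RUN, BASES_PER_DAY):.1f} days")
```

With those inputs, each read averages 100 bases, and a full run takes about 5 days of instrument time.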

Heterogeneous Data

Diverse Audience

Genomic Scientists

Citizen Scientists

General Public

Future Scientists

How can TEI systems be designed to

• Empower citizens to make informed health decisions?

• Communicate scientific data to communities?

• Enhance learning of complex concepts?

• Support experts interacting with big data?


Case Studies

Tabletop Genome Browsing & Primer Design

Tangible-targeted Computational Genomics

Tangibles for Visualizing Systems Biology

Locate

Learn

Retrieve

Annotate

Compare

Human genome: understanding ca. 2012 (pie chart)

Mobile elements: 48.4%

Processed pseudogenes: 1.0%

Tandem repeats & low-complexity DNA: 2.4%

Dark matter: 46.6%

Protein & RNA coding regions: 1.6%

Composition of other primate genomes is very similar
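The chart's percentages are easier to grasp in absolute terms, so this sketch converts them into approximate megabases. It rests on two assumptions. First, the five percentages (48.4, 1.0, 2.4, 46.6, 1.6) pair with the chart labels in the order the slide lists them. Second, the haploid human genome is taken to be roughly 3.1 billion bases; that figure does not come from the slides. `to_megabases` is a hypothetical helper.

```python
import math

# Assumption (not from the slides): haploid human genome of ~3.1 billion bases.
GENOME_SIZE_BP = 3_100_000_000

# Chart percentages, paired with labels in the order they appear on the slide.
COMPOSITION_PCT = {
    "Mobile elements": 48.4,
    "Processed pseudogenes": 1.0,
    "Tandem repeats & low-complexity DNA": 2.4,
    "Dark matter": 46.6,
    "Protein & RNA coding regions": 1.6,
}


def to_megabases(pct_by_class: dict, genome_bp: int) -> dict:
    """Convert percentage shares of a genome into approximate megabases (Mb)."""
    total = sum(pct_by_class.values())
    # Guard against shares that do not add up to the whole genome.
    if not math.isclose(total, 100.0, abs_tol=0.05):
        raise ValueError(f"shares sum to {total}, not 100")
    return {name: genome_bp * pct / 100 / 1e6 for name, pct in pct_by_class.items()}


if __name__ == "__main__":
    for name, mb in to_megabases(COMPOSITION_PCT, GENOME_SIZE_BP).items():
        print(f"{name:40s} ~{mb:,.0f} Mb")
```

Under these assumptions, the protein- and RNA-coding share comes to roughly 50 Mb. Mobile elements and dark matter each account for well over a billion bases.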

Tangibles-targeted computational genomics

Example projects: rhesus, orangutan, human, marmoset genomes

• Often multi-institution, multi-person efforts

– Above articles: ~250, 100 co-authors

• Often long duration (e.g., 4-6 years before first publication)

• Iterative fusion of computational and “wet bench” analyses

• Some analyses are “big CPU” (e.g., 200 CPU cores for weeks); others, “big RAM” (200+ GB RAM)
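The “big CPU” label is easier to appreciate in core-hours. This minimal sketch uses hypothetical durations of two and four weeks in place of the slide's “weeks”. It also assumes that all 200 cores stay busy around the clock. `core_hours` is an illustrative helper and is not part of any named tool.

```python
HOURS_PER_WEEK = 7 * 24


def core_hours(cores: int, weeks: float) -> float:
    """CPU core-hours consumed by `cores` cores running continuously for `weeks` weeks."""
    return cores * weeks * HOURS_PER_WEEK


if __name__ == "__main__":
    # 200 cores as on the slide; the 2- and 4-week durations are assumptions.
    for weeks in (2, 4):
        print(f"200 cores x {weeks} weeks = {core_hours(200, weeks):,.0f} core-hours")
```

At two weeks this comes to 67,200 core-hours, and at four weeks to 134,400.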

Tangible Visualization: persistent representations of people, projects, activities…

Interactions 2012.07: Entangling space, form, light, time, computational STEAM, and cultural artifacts

CS3: Systems Biology Modeling

Lessons learned

TEI can facilitate immediate, visible, and easily reversible manipulations

• How to design TEI for open-ended creative inquiries?

Tangible representations can facilitate multi-stage workflows

• Important for execution and tracking of complex analyses

• Need parametrized, annotatable representations of complex large datasets

TEI could facilitate collaboration for distributed and co-located teams

• Large interdisciplinary teams and distributed work are common in this area

• Users can jointly manipulate assumptions and see consequences

Tangible tools can support understanding and discovery

• Provide access to different pieces of the problem (data, reactions)

• Help users form accurate mental models through tangible/embodied manipulation

Opportunities for TEI Engagement

Understanding Complex Problems

Visualizing Biological Data

Enabling Large Collaborations

Supporting Diverse Audiences

Managing Varied Timescales

Understanding Complex Problems

Enabling Large Collaborations

Managing Varied Timescales

Powers of 10,000:

• Milliseconds

• Minutes

• Months

• Millennia

Entangling Space, Form, Light, Time, Computational STEAM, and Cultural Artifacts
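The “Powers of 10,000” framing can be checked numerically. Each step from milliseconds to minutes to months to millennia should span roughly four orders of magnitude. This sketch assumes that a month is one twelfth of a mean Gregorian year (365.2425 days). The `TIMESCALES` table and `step_ratios` helper are illustrative.

```python
import math

# Durations in seconds. A "month" is assumed to be 1/12 of a mean
# Gregorian year (365.2425 days); this choice is for illustration only.
SECONDS_PER_DAY = 24 * 60 * 60
MEAN_YEAR_S = 365.2425 * SECONDS_PER_DAY

TIMESCALES = [
    ("millisecond", 1e-3),
    ("minute", 60.0),
    ("month", MEAN_YEAR_S / 12),
    ("millennium", 1000 * MEAN_YEAR_S),
]


def step_ratios(scales):
    """Ratio between each consecutive pair of (name, seconds) timescales."""
    return [
        (a_name, b_name, b / a)
        for (a_name, a), (b_name, b) in zip(scales, scales[1:])
    ]


if __name__ == "__main__":
    for smaller, larger, ratio in step_ratios(TIMESCALES):
        print(f"{smaller} -> {larger}: x{ratio:,.0f} (10^{math.log10(ratio):.2f})")
```

The three ratios come out to about 60,000, 43,829, and 12,000. That puts each step between roughly 10^4.1 and 10^4.8, so “powers of 10,000” is a fair approximation.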

Examples

• Many genome projects: 5+ years

• Sequencing Lincoln’s DNA: under active discussion since 1991

• Most of us sequenced within a decade? Materially impacting all our descendants

Going forward

• Some aspects with broad TEI and computational-science synergies

• How to visualize and engage with data, activity, and progress spanning many systems, people, places, and timescales?

• What representational forms and device ecologies are most appropriate for large, abstract data?

• Facilitating engagement with big data in ways that highlight connections between multiple forms of evidence

• Some aspects specific to genomics

• 2023: we anticipate that most of us in this room, plus many thousands of species, will have genomes fully or partially sequenced

• Commonalities and distinctions in engagement by scientists, students, street people, senators, senior citizens, solicitors, …

THANKS!

Orit Shaer: oshaer@wellesley.edu

Ali Mazalek: mazalek@gatech.edu

Brygg Ullmer: ullmer@lsu.edu

Miriam Konkel: konkel@lsu.edu

Consuelo Valdes (Wellesley College) and Andy Wu (Georgia Tech).

This work has been partially funded by NSF IIS-1017693, DRL-097394084, and CNS-1126739.
