cancer informatics data visualisation presentation

Visual Analytics: An exploration Presentation on Ideation of cancer informatics representation By – Rupam and linu

Upload: rupam-das

Post on 13-Jul-2015


Page 1: Cancer informatics data visualisation presentation

Visual Analytics: An Exploration

Presentation on Ideation of Cancer Informatics Representation

By Rupam and Linu

Page 2

Big Data

Big Data is about the growing challenge that organizations face as they deal with large and fast-growing sources of data or information that also present a complex range of analysis and use problems. These include:

• Having a computing infrastructure that can ingest, validate, and analyze high volumes (size and/or rate) of data

• Assessing mixed data (structured and unstructured) from multiple sources

• Dealing with unpredictable content with no apparent schema or structure

• Enabling real-time or near-real-time collection, analysis, and answers

Page 3

Big Data Analytics

Big data is a big industry. Research conducted at the Massachusetts Institute of Technology shows that companies that use “data-directed decision making” enjoy a 5% to 6% increase in productivity. There is a strong link between an effective data management strategy and financial performance. Companies that use data most effectively stand out from the rest.

The potential advantages of big data analytics within the medical field have resulted in public policy initiatives to mine and leverage such data. David Cameron, Prime Minister of the United Kingdom, recently announced that every NHS patient would henceforth be a “research patient” whose medical record would be “opened up” for research by private healthcare firms.

Page 4

Concerns regarding Big Data

• Incremental Effect: The accumulation of personal data has an incremental adverse effect on privacy. A researcher will draw entirely different conclusions from a string of online search queries consisting of different sets of words.

• Automated Decision-Making: The relegation of decisions about an individual’s life to automated processes based on algorithms and artificial intelligence raises concerns about discrimination, self-determination, and the narrowing of choice.

• Predictive Analysis: Big data may facilitate predictive analysis with stark implications for individuals susceptible to disease, crime, or other socially stigmatizing characteristics or behaviors.

• Lack of Access and Exclusion: Big data tilts an already uneven scale in favor of organizations and against individuals. The big benefits of big data, the argument goes, accrue to government and big business, not to individuals, and they often come at individuals’ expense. In the words of the adage, “if you're not paying for it, you're not the customer; you're the product.”

• The Ethics of Analytics: Drawing the Line: Like any other type of research, data analytics can cross the threshold of unethical behavior.

Page 5

Data Analysis Scenario

Page 6

Need for Informatics

As information technology becomes an integral part of health care, it is important to collect and analyze data in a way that makes the information understandable and useful.

This big data surge presents many challenges: data quality; moving and managing big data sets; reducing large sets of genes to find biomarkers; making sense of the data in a repeatable manner; integrating clinical data with molecular/genomic data for analytics; and scalable analysis.

Page 7

Why do we do data visualization?

Seeing and understanding pictures is a natural human instinct, whereas understanding numerical data is a skill that takes years of schooling, and even then many people are not good with numbers [4]. Trends and relations are much easier to find in a well-drawn picture, because visual presentation takes advantage of the vast, and often underutilized, capacity of the human eye to detect information in pictures and illustrations. Data visualization shifts the load from numerical reasoning to visual reasoning. Getting information from pictures takes far less time than reading through text and numbers, which is why many decision makers prefer information presented in graphical rather than textual form.

Page 8

Entity:
• point
• line (curve)
• polyline
• glyph
• surface
• solid
• image
• text

Data attributes:
• Numeric, symbolic (or mixed): e.g. 123 or @
• Scalar, vector, or complex structure
• Various units: e.g. meters, inches
• Discrete or continuous: e.g. 1, 2, 3
• Spatial, quantity, category, temporal, relational, structural
• Accurate or approximate
• Dense or sparse
• Ordered or non-ordered
• Disjoint or overlapping
• Binary, enumerated, multilevel
• Independent or dependent
• Multidimensional, etc.

Quality criteria:
• Efficient: minimize chart-junk, show the data, maximize the data-ink ratio, erase non-data-ink, erase redundant data-ink
• Effective: viewers can interpret it easily
• Accurate: sufficient for correct quantitative evaluation
• Aesthetics: must not offend the viewer's senses
• Adaptable: can adjust to serve multiple needs

Page 9

Keywords

Interactive, Restrictions, Open Source, Aesthetics, Limitations, Complex Data, Easy To Use, Density, Customizable, Developer Tools, Mapping, Exploring Data, Flexible, Types of Visualisation, Languages, Relatedness, Output Format, Online, Tools for Interaction, Colour, Text, Single Format, Animation, Various Input Sources, Live Data Streams, Intuitive

Page 10

Data Visualization Tools: Present Scenario

1. Dygraphs: dygraphs is a fast, flexible open source JavaScript charting library. It allows users to explore and interpret dense data sets.
Keywords: Colour, Tools for Interaction, Intuitive, Text, Restrictions, Interactive, Online, Languages, Flexible, Developer Tools, Customizable, Density, Complex Data, Limitations, Aesthetics, Open Source

2. Circos
Keywords: Intuitive, Text, Colour, Tools for Interaction, Online, Flexible, Exploring Data, Developer Tools, Customizable, Density, Aesthetics, Open Source

Other examples:

ZingChart (keyword: Interactive): ZingChart is a JavaScript charting library and feature-rich API set that lets you build interactive Flash or HTML5 charts. It offers over 100 chart types to fit your data.

3. Timeline
4. Exhibit
5. Leaflet
6. Visual.ly
7. Dipity
8. Many Eyes
9. Google Charts
10. Crossfilter

Page 11

Personas

Name: Dhwanit Shah

Job title/major responsibilities: Data Scientist in SAP Labs

Demographics: 28 years old; unmarried; has B.Tech and M.Tech degrees in Computer Science Engineering

Goals and tasks: He is a focused, goal-oriented person with very good communication skills. One of his concerns is maintaining quality across all program output.

Spends his work time:
• Coding and building data visualisation software
• Reviewing and testing the built software
• Doing literature review and compiling the prerequisites of the software

Environment: His workplace has round-the-clock Wi-Fi and is equipped with a workstation where he runs high-end simulations of his code. He works 12 hours a day and keeps track of every activity and task on his mobile task manager.

Views on visualisation software: He has a project in which he has to compare one set of DNA with another set that is coded in the software. He puts the input directly into the software, which performs the internal calculations and gives the result. He wants the software to be interactive, intuitive to use, and to perform only the required tasks.

Page 12

Name: Ranjit Kumar

Job title/major responsibilities: Senior Biologist in SAP Labs

Demographics: 32 years old; married; has a PhD degree in Bioinformatics Engineering

Goals and tasks: He is a focused, goal-oriented person with a very good grasp of concepts. One of his concerns is the analysis and collection of legible data from the information provided to him in the software.

Spends his work time:
• Performing experimental tests for proper functioning of the data visualisation platform
• Reviewing and testing the built software
• Doing literature review and compiling the prerequisites of the software

Environment: His workplace has round-the-clock Wi-Fi and is equipped with a workstation. He works in the laboratory and performs various experiments in cancer research. He works 12 hours a day and keeps track of every activity and task that happens in the software.

Views on visualisation software: He has a project in which he has to compare one set of DNA with another set that is coded in the software. He feeds in the input and sets all the parameters carefully to get the required data. He closely observes and takes note of every step of the process. The data taken from each step helps him to prove his stated hypothesis. Further, he can use the large data set for the software's benefit, drawing on the information to improve and innovate in the software.

Page 13

What more can be done?

• Why not show data relevant to the task at hand?
• Why can’t the data be presented more realistically rather than in an abstract format?
• Why shouldn’t the screen layout change rather than staying in a static circular format (layover, clouds, threads, n-dimension axis)?
• Why not make it easier for non-researchers, practitioners, doctors and patients to understand?
• Can we make the interaction more intuitive and natural? Why not consider touch as a mode of interaction?
• Can we build in ways to filter out specific chromosomes, DNA sequences and base pairs and remove the rest of the data?
• Can we help convert data to information? Is it possible to add semantics to the relationships of base pairs?
• Why not enable multimodal search (text, numbers, zoom, click, etc.) within the data for expert users?
• Can we provide tools to dig into the data?
• Navigation and walkthroughs?
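The filtering question on this slide can be illustrated with a minimal Python sketch. It assumes each data point is a dictionary with hypothetical `chromosome` and `position` fields; these names, and the sample records, are made up for illustration and do not come from any real data set or tool.

```python
def filter_records(records, chromosomes=None, start=None, end=None):
    """Keep only records on the chosen chromosomes and inside [start, end].

    Any argument left as None places no restriction on that field.
    The 'chromosome' and 'position' keys are hypothetical field names.
    """
    selected = []
    for rec in records:
        if chromosomes is not None and rec["chromosome"] not in chromosomes:
            continue
        if start is not None and rec["position"] < start:
            continue
        if end is not None and rec["position"] > end:
            continue
        selected.append(rec)
    return selected


# Made-up sample records, for illustration only.
sample = [
    {"chromosome": "2", "position": 1500},
    {"chromosome": "5", "position": 300},
    {"chromosome": "5", "position": 9000},
    {"chromosome": "X", "position": 42},
]

only_chr5 = filter_records(sample, chromosomes={"5"})
chr5_window = filter_records(sample, chromosomes={"5"}, start=0, end=1000)
```

A view built on such a filter would draw only the selected records and leave the rest out, which is one way of being selective about what is displayed.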

Page 14

Short Term Goal

To advance the state of the art by enriching data visualization techniques and methods for data analytics and transformation.

Long Term Goal

To improve the efficiency and scale at which data analysts can work while simultaneously lowering the threshold so that a broader audience can engage with data.

Page 15

Current Visualization methods

Page 16

Map

Data can be laid out as a physical map so that the user can explore the visual.

Page 17

Exploding circle, Sphere

This type of chart displays the contribution of each value to a total while emphasizing individual values.

Application ex:

Here we could use selection as a way to explode the circle and perhaps to view information on the selected part.
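As a rough illustration of the exploding-circle idea, the Python sketch below turns a list of values into slice angles proportional to each value's share of the total, and computes the offset that pushes a selected slice outward along its bisector. The function names and the offset distance are assumptions made for this example, not part of any charting library.

```python
import math


def slice_angles(values):
    """Return (start, end) angles in degrees, one pair per value.

    Each slice's sweep is proportional to the value's share of the total.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    angles = []
    start = 0.0
    for v in values:
        sweep = 360.0 * v / total
        angles.append((start, start + sweep))
        start += sweep
    return angles


def explode_offset(start, end, distance):
    """Return the (dx, dy) shift that moves a slice outward along its bisector."""
    mid = math.radians((start + end) / 2.0)
    return (distance * math.cos(mid), distance * math.sin(mid))


angles = slice_angles([1, 1, 2])
dx, dy = explode_offset(*angles[0], distance=0.1)
```

On selection, a renderer would translate the chosen slice by `(dx, dy)` and could show its details in the gap that opens up.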

Page 18

3D circle

A sphere with various histograms and data embedded on its surface, which explodes further to show much deeper information.

Application ex:

Page 19

Universe model

The user zooms into the different categories of elements provided and explores each one to its limits.

Page 20

Bar Graph

Bar graphs can be used to compare different variables effectively and to show trends in the data clearly. The bars can be given various shapes, as in cylinder, cone, or pyramid charts.

Page 21

Dot Graph

Intensity and probability information can be shown as a dot graph with visual graphics.

Page 22

Wave graph

The wave carries the information of the respective pattern and shows the fluctuations among the variables.

Page 23

Line Graph

This type of chart displays trends over time or across categories. It is also available with markers displayed at each data value. With line graphs we can show the variations in the data along with the patterns unfolding alongside them.

Page 24

Terrain and Heat Maps

Here data is represented by colors. This provides an immediate visual summary of the information and helps the viewer to understand complex data sets.
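To make "data is represented by colors" concrete, here is a minimal Python sketch that maps each value to an RGB colour by linear interpolation between a low and a high colour. The blue-to-red scale and the function names are arbitrary choices for illustration; real heat maps often use carefully designed colour scales instead.

```python
def lerp_color(low, high, t):
    """Blend two RGB tuples; t=0 gives low, t=1 gives high (t is clamped)."""
    t = min(1.0, max(0.0, t))
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


def heat_colors(values, low=(0, 0, 255), high=(255, 0, 0)):
    """Map each value to a colour according to its position between min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        lerp_color(low, high, (v - lo) / span if span else 0.0)
        for v in values
    ]


colors = heat_colors([0, 2, 10])
```

The smallest value gets the low colour, the largest gets the high colour, and everything else falls in between.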

Page 25

Brainstorming on Data Visualization Patterns

Page 26

Sap Model:

• Circle divided into arcs and colors.
• Bézier curves representing each arc, drawn in the arc's color.
• Dot graph provides population density.
• Circle with big strokes filters the incoming and outgoing curves.
• Outermost circle (the one with speedometer-like tick lines) divides the circle into numbered divisions.
• Bar histograms depict SNP count.
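The curves that link two arcs in a circular layout like this one can be sketched as quadratic Bézier curves. The Python sketch below samples such a curve between two angular positions on the circle. Placing the control point at the centre of the circle is an assumption made here for simplicity; the slide does not say how the curves are constructed.

```python
import math


def point_on_circle(angle_deg, radius=1.0):
    """Return the (x, y) point at the given angle on a circle centred at the origin."""
    a = math.radians(angle_deg)
    return (radius * math.cos(a), radius * math.sin(a))


def bezier_link(angle_a, angle_b, radius=1.0, steps=20):
    """Sample a quadratic Bezier curve joining two positions on the circle.

    The control point is the circle's centre (an assumption), which pulls
    the curve inward between the two end points.
    """
    p0 = point_on_circle(angle_a, radius)
    p1 = (0.0, 0.0)
    p2 = point_on_circle(angle_b, radius)
    points = []
    for i in range(steps + 1):
        t = i / steps
        w0, w1, w2 = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((
            w0 * p0[0] + w1 * p1[0] + w2 * p2[0],
            w0 * p0[1] + w1 * p1[1] + w2 * p2[1],
        ))
    return points


link = bezier_link(0, 90, steps=4)
```

Drawing the sampled points as a polyline in the colour of the source arc gives one link between two arcs.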

Page 27

Polygonal Structure

Data is shown in a polygon of n dimensions. Each side indicates a pattern, and the sides are interlinked by straight, translucent lines.

Application ex:

For smaller sets of categories, we can change the layout so that the number of sides of the polygon equals the number of categories.
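A minimal Python sketch of the "number of categories = number of sides" layout: it places one vertex of a regular polygon per category, starting at the top and going counter-clockwise. The function name and the example categories are made up for illustration.

```python
import math


def polygon_vertices(categories, radius=1.0):
    """Return {category: (x, y)}, one regular-polygon vertex per category.

    The first category sits at the top; the rest follow counter-clockwise.
    """
    n = len(categories)
    if n < 3:
        raise ValueError("a polygon needs at least 3 categories")
    vertices = {}
    for i, name in enumerate(categories):
        angle = math.pi / 2 + 2 * math.pi * i / n
        vertices[name] = (radius * math.cos(angle), radius * math.sin(angle))
    return vertices


# Hypothetical categories, for illustration only.
layout = polygon_vertices(["age", "gender", "disease", "ethnicity"])
```

The interlinking lines would then be drawn between points interpolated along the edges that join consecutive vertices.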

Page 28

Page 29

N Dimensional Illustration

Data is shown in n-dimensional illustrations. When a vast amount of data is illustrated, many patterns pass through the location of each data item, which makes finding that particular item very hard. In that case n-dimensional illustrations are very useful for visualization.

Page 30

[Figure: filter panel. Categories: Age, Gender, Disease, Patients, Nationality, Genology, Ethnicity. Controls: Filters, Age, Settings.]

Page 31

[Figure: filter panel with Age selected. Controls: Age, Settings.]

Page 32

Cell Structure

The cell and all its constituents are shown, and the user zooms further into the pattern, down to the deepest level, to find information.

Page 33

This model is based on the cell structure, where animation and 3D graphics can make things interesting and keep users engaged, intrigued and absorbed.

Here the cell is bounded by a circle containing all the constituent chromosomes. We can scroll towards the nucleus to see an individual chromosome or a set of chromosomes (such as 2, 5, etc.) within that boundary.

Page 34

Here each chromosome is tagged with a particular color, and the arc outside it carries the same color and shows information about it.

After selection, the chosen chromosome pops out of the screen, where we can show various information and data related to its genes. The user can then use the scroll button to zoom further into a gene to see areas of defect such as SNPs.

Page 35

In this model a single chromosome is shown with the genes that have a high probability of SNPs marked. Arcs protrude from each marked portion of a gene and connect it to the chromosome to which it is linked.

Further information related to the cancer concerned can be found in the pop-up box near the marked gene. The chromosome in the center has the same color as its arc, to show the correlation between the two.

Page 36

In this model the chromosome is placed towards the periphery, at the position of the corresponding chromosomal arc. From here the bending arcs from the smaller circle connect to various other arcs, which depict other chromosomes.

Page 37

Helical structure

The actual DNA structure can be shown along with the corresponding information, so that the user can explore it by scrolling and zooming.

Application ex:

Page 38

Helical structure

In this model the whole circle can be embedded in the chromosome. Scrolling reveals successive levels of information: the user can zoom deeper into the chromosome as well as move along it to find regions of high SNP density.
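Finding a high-density SNP region while moving along a chromosome can be approximated with a sliding window over SNP positions. The Python sketch below is one simple way to do it; the window size and the positions are made-up values, and a real tool would likely use a more careful density estimate.

```python
def densest_window(positions, window):
    """Return (start, count) for the window of the given width with most SNPs.

    A window starts at an SNP position and covers [start, start + window).
    Returns (None, 0) when there are no positions.
    """
    ordered = sorted(positions)
    best_start, best_count = None, 0
    left = 0
    for right, pos in enumerate(ordered):
        # Shrink the window from the left until it is narrower than `window`.
        while pos - ordered[left] >= window:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_start, best_count = ordered[left], count
    return best_start, best_count


# Made-up SNP positions (in base pairs), for illustration only.
start, count = densest_window([900, 150, 100, 170, 160], window=100)
```

The view could then jump or zoom to the returned start position and highlight the window.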

Page 39

Map

Page 40

Exploding circle, Sphere

Page 41

3D circle

Page 42

Universe model

Page 43

Day dreaming about the setup?

Dimensionality of the data:
• Chromosomal data = 12-15 dimensions
• Patient data = 5-10 dimensions
• Researchers data = 5-10 dimensions
• Diagnostic data = 4-10 dimensions

We are dealing with 30-50 dimensions for each data node (base pair) to be presented to a user. There has to be a facility to search for correlations within 50 dimensions! What’s the way? Humans can at best see 3 dimensions. OK, let’s add sound and movement: 5 dimensions! How do we do that? Visualization beyond 3D is not possible, so we will have to be selective about what is displayed at any one time.
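To give a feel for the scale of "correlations within 50 dimensions", the Python sketch below counts the possible pairs of dimensions and ranks pairs by the absolute value of their Pearson correlation, so that only the strongest few need to be displayed. Using Pearson correlation as the selection rule is an assumption made for this example, and the sample columns are made up.

```python
from itertools import combinations
from math import comb, sqrt

# 50 dimensions give 50 * 49 / 2 = 1225 distinct pairs to compare.
PAIRS_IN_50_DIMENSIONS = comb(50, 2)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def strongest_pairs(columns, top=3):
    """Return the `top` pairs of column names with the largest |correlation|."""
    scored = [
        (abs(pearson(a, b)), name_a, name_b)
        for (name_a, a), (name_b, b) in combinations(columns.items(), 2)
    ]
    scored.sort(reverse=True)
    return [(name_a, name_b, r) for r, name_a, name_b in scored[:top]]


# Made-up columns, for illustration only.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 0, 1, 0]}
best = strongest_pairs(columns, top=1)
```

A display could then show only the dimensions that appear in the top-ranked pairs instead of all 50 at once.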

Page 44

References:
[1] Zhao Kaidi, “Data visualization”
[2] Saiganesh Swaminathan, Conglei Shi, “Creating Physical Visualization with MakerVis”
[3] Jeffrey Heer, Ed H. Chi, “Separating the swarm: Categorization Methods for User Sessions on the Web”
[4] Jeffrey Heer, danah boyd, “Vizster: Visualizing Online Social Networks”
[5] Jeffrey Heer, Maneesh Agrawala, “Software Design Patterns for Information Visualization”
[6] Jeffrey Heer, “Socializing Visualization”
[4] Ed H. Chi, Adam Rosien, Jeffrey Heer, “LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition”
[5] Sarah J. Waterson, Jason I. Hong, Tim Sohn, “What Did They Do? Understanding Clickstreams with the WebQuilt Visualization System”
[6] Jason I. Hong, Jeffrey Heer, Sarah Waterson, “WebQuilt: A Proxy-based Approach to Remote Web Usability Testing”
[7] Shaun Bangay, “Visview: A system for the visualization of Multi-dimensional data”, in “Visual Data Exploration and Analysis V”

Page 45

[8] David S. Ebert, Randall M. Rohrer, Christopher D. Shaw, Pradyut Panda, James M. Kukla, D. Aaron Roberts, “Procedural Shape Generation for Multi-dimensional Data Visualization”, in “Data Visualization”
[9] Daniel Keim, “Visual Support for Query Specification and Data Mining”
[10] Matthew O. Ward, “A Taxonomy of Glyph Placement Strategies for Multidimensional Data Visualization”
[11] Helena Mitasova, Lubos Mitas, Bill Brown, Irina Kosinovsky, Dave Gerdes, Terry Baker, John Isaacson, “Interpolation and Visualization from 3D and 4D scattered Data Using GRASS GIS”
[12] Rich Balfour, application of “fourDviz”, from http://www.infinitytechnologies.com/itweb/html/4d.html
[14] U. Axen, I. Choi, “Using Additive Sound Synthesis to Analyze Simplicial Complexes”, in “Proceedings of 1994 International Conference on Auditory Display”
[20] Jim Foley and Bill Ribarsky, “Next-generation Data Visualization Tools”, in L. Rosenblum, R. A. Earnshaw, J. Encarnacao, H. Hagen, A. Kaufman, S. Klimenko, G. Nielson, F. Post, and D. Thalmann, editors, “Scientific Visualization, Advances and Challenges”
[21] K. V. Mardia, J. T. Kent, and J. M. Bibby, “Multivariate Analysis”

Page 46

And many, many more…