Data Science and Visualization
2014 Summer Internship - Tetherless World Constellation
Sumithra Gnanasekar, Lakshmi Chenicheri

Upload: drew

Post on 21-Jan-2016


Objective

• Visualize datasets compliant with the Minimum Information about a Marker Gene Sequence (MiMarks) standard

• A dark data exercise

MiMarks

• A standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences

• Describes the environment from which the sample was taken

• Ensures that contextual data are collected and submitted with the sequence data

MiMarks Checklist

Datasets

• Two datasets from a bacterial diversity study in the Western English Channel

• The study focused on the seasonal structure of microbial communities

• Dataset 1 was converted from Excel to CSV

• Dataset 2 was converted from the Sequence Read Archive (SRA) format to CSV

• Data cleaning was performed to extract the relevant fields
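
The slides do not describe the cleaning rules, so the following is only a minimal sketch of the kind of field extraction meant above, assuming the converted CSV has a header row. The function name `select_fields` and the example column names are hypothetical, and dropping any row with an empty kept cell is an assumed policy, not the interns' documented method.

```python
import csv
import io
from typing import Iterable


def select_fields(csv_text: str, fields: Iterable[str]) -> list[dict[str, str]]:
    """Keep only the named columns and drop rows where any of them is empty.

    Hypothetical helper: the slides do not describe the actual cleaning rules.
    """
    wanted = list(fields)
    reader = csv.DictReader(io.StringIO(csv_text))
    available = reader.fieldnames or []
    missing = [f for f in wanted if f not in available]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    cleaned = []
    for row in reader:
        # DictReader fills cells missing from a short row with None.
        subset = {f: (row[f] or "").strip() for f in wanted}
        # Skip incomplete records instead of guessing the missing values.
        if all(subset.values()):
            cleaned.append(subset)
    return cleaned
```

For example, on a file with the columns `sample,depth,nitrate,notes`, calling `select_fields(text, ["sample", "depth", "nitrate"])` drops the `notes` column and every row whose depth or nitrate cell is empty.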

Tools for Visualization

• R

• Google Charts integrated with R

• Shiny (RStudio)

• D3.js

D3.js was ultimately chosen for its flexibility and the range of visualizations it supports

Scatter Plot Dataset 1

• Allows the user to filter fields

• Supports drilling down and expanding

• Groups data points by field

• Useful for identifying correlations between variables
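
The filter and group interactions listed above reduce to two data operations performed before the points are drawn. The published visualization does this in D3.js in the browser; the Python sketch below only illustrates the idea, and the names `filter_rows` and `group_points`, as well as any column names used with them, are hypothetical.

```python
from collections import defaultdict


def filter_rows(rows: list[dict], field: str, low: float, high: float) -> list[dict]:
    """Range filter: keep rows whose numeric `field` lies in [low, high]."""
    return [r for r in rows if low <= float(r[field]) <= high]


def group_points(rows: list[dict], x: str, y: str, by: str) -> dict[str, list[tuple[float, float]]]:
    """Collect (x, y) points per distinct value of `by`, e.g. one colour per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[by]].append((float(r[x]), float(r[y])))
    return dict(groups)
```

Filtering to a depth range and then grouping nitrate against phosphate by a hypothetical `season` column would give one series per season, each of which can be drawn in its own colour.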

Analysis of Scatter Plot Dataset 1

• Depth, density, total_Depth (total depth of the water column), longitude and latitude appeared uncorrelated with the other environmental variables

• Near-linear correlations between nitrate and silicate, and between nitrate and phosphate
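
These relationships were judged from the plots. One common way to put a number on a "near-linear correlation" is Pearson's correlation coefficient r, which is close to +1 or -1 for a strong linear relationship and near 0 when there is no linear relationship. The sketch below is an illustration, not an analysis the slides report; `pearson_r` is a hypothetical name. An r near 0 rules out only a *linear* association, which is a weaker statement than statistical independence.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances share the 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```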

Scatter Plot Dataset 2

• Allows the user to filter fields

• Supports drilling down and expanding

Analysis of Scatter Plot Dataset 2

Linear trends were seen in the scatter plots of:

1. Spots vs Bases

2. Nitrate vs Phosphate

3. Org_nitro vs Org_carb

4. Temperature vs Density
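
A linear trend like those listed above can be summarized by an ordinary least-squares line, which could also be overlaid on the scatter plot as a trend line. This is a hedged sketch: the slides do not say whether any fit was computed, and `fit_line` is a hypothetical helper.

```python
from typing import Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    return slope, mean_y - slope * mean_x
```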

Temporal Visualization

Allows the user to filter values by time and analyze the effect on other variables
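
The time filter works like a brush on a timeline: the selected date range restricts the records, and each dependent view is recomputed from that subset. A minimal Python sketch of the idea follows; `filter_by_time`, `mean_of`, and the `date` record key are assumptions, and the actual visualization runs in the browser.

```python
from datetime import date
from typing import Optional


def filter_by_time(records: list[dict], start: date, end: date, key: str = "date") -> list[dict]:
    """Keep records whose date lies in the inclusive range [start, end]."""
    return [r for r in records if start <= r[key] <= end]


def mean_of(records: list[dict], field: str) -> Optional[float]:
    """Average a numeric field over the selected records; None if nothing is selected."""
    values = [r[field] for r in records]
    return sum(values) / len(values) if values else None
```

Moving the brush amounts to calling `filter_by_time` with a new range and recomputing summaries such as `mean_of(selection, "temperature")` for the other views.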

DOI Visualization

• Visually represents DOIs associated with data points

• Clicking a bubble fetches and displays the metadata for that DOI
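
The slide does not say where the DOI metadata comes from. One plausible approach, shown here only as an assumption, is DOI content negotiation: requesting `https://doi.org/<DOI>` with the header `Accept: application/vnd.citationstyles.csl+json` returns citation metadata as CSL JSON for DOIs registered with agencies that support it, such as Crossref and DataCite. The helper names are hypothetical, and `summarize` handles fields defensively because records differ between registration agencies.

```python
import json
import urllib.parse
import urllib.request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def doi_request(doi: str) -> urllib.request.Request:
    """Build a content-negotiation request for one DOI's metadata as CSL JSON."""
    url = "https://doi.org/" + urllib.parse.quote(doi.strip())
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_doi_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the metadata (requires network access)."""
    with urllib.request.urlopen(doi_request(doi), timeout=timeout) as resp:
        return json.load(resp)


def summarize(meta: dict) -> str:
    """Pick a few CSL JSON fields to show when a bubble is clicked."""
    title = meta.get("title") or "(untitled)"
    container = meta.get("container-title") or ""
    if isinstance(container, list):  # some records give a list of titles
        container = container[0] if container else ""
    date_parts = (meta.get("issued") or {}).get("date-parts") or [[]]
    year = date_parts[0][0] if date_parts[0] else "n.d."
    return f"{title}. {container} ({year})" if container else f"{title} ({year})"
```

In a click handler, `summarize(fetch_doi_metadata(doi))` would produce the text for the detail panel.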

Bubble Chart

• Visually represents the environmental data associated with each sample

• Bubble size corresponds to organism count
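
When bubble size encodes a count, the count is usually mapped to the bubble's area rather than its radius, because readers compare circles by area. The radius then grows with the square root of the count, which is what a square-root scale in D3.js provides. The Python sketch below shows the arithmetic; `bubble_radius` and the 40-pixel maximum are illustrative assumptions, not values from the original chart.

```python
import math


def bubble_radius(count: float, max_count: float, max_radius: float = 40.0) -> float:
    """Radius such that bubble area is proportional to count.

    The largest count gets `max_radius`; a count of zero gets radius 0.
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    if count < 0:
        raise ValueError("counts cannot be negative")
    return max_radius * math.sqrt(count / max_count)
```

A sample with a quarter of the maximum organism count gets half the maximum radius, so its circle covers a quarter of the area.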

RDF Conversion

Converting MiMarks-compliant datasets to RDF involves two steps:

1. Construct an ontology or reuse an existing one.

2. Convert the dataset into RDF triple instances using CSV-to-RDF conversion tools.

csv2rdf4lod is an open-source tool that can be used to convert the data in a CSV file into RDF.

Spatio-temporal features of the MiMarks, VAMPS and CoDL datasets

The following tools and visualization types could be used to visualize the MiMarks, VAMPS and CoDL datasets:

• Planetary.js, an open-source JavaScript library for interactive globes, could represent the spatial features interactively

• Motion charts could show change over time, with the quantity of interest encoded as the size of each bubble

• A calendar-based representation of values is another option if the data are continuous over time
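
A calendar view needs each observation placed in a grid cell. A minimal sketch, assuming daily observations given as (date, value) pairs: each day is keyed by ISO year, ISO week and weekday using Python's `date.isocalendar()`, and repeated measurements on the same day are averaged. The function name and the averaging rule are assumptions, not something the slides specify.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable


def calendar_cells(observations: Iterable[tuple[date, float]]) -> dict[tuple[int, int, int], float]:
    """Average values per day, keyed by (ISO year, ISO week, ISO weekday).

    In a calendar layout the week number picks the column and the weekday
    (1 = Monday ... 7 = Sunday) picks the row; the value sets the cell colour.
    """
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for day, value in observations:
        iso_year, iso_week, iso_weekday = day.isocalendar()
        key = (iso_year, iso_week, iso_weekday)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```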

Links to Visualizations

• Timeline crossfiltering visualization: http://dco.tw.rpi.edu/viz/timeline/index.html

• DOI visualization: http://dco.tw.rpi.edu/viz/doiVis/index.html

• Scatterplot visualization for Dataset 1: http://dco.tw.rpi.edu/viz/scatterPlot/demo/demo.html

• Bubble chart visualization: http://dco.tw.rpi.edu/viz/Bubblechart/bubble_dataset2/index.html

• Scatterplot visualization for Dataset 2: http://dco.tw.rpi.edu/viz/scatterplot_dataset2/demo/demo.html
