dat 5-minute lightning talk

Posted on 14-Jul-2015

Dat: version and share your data

Karissa McKelvey

Software Developer and Project Manager and Science Evangelist and Designer (I wear a lot of hats)

U.S. Open Data

@karissamck

karissa $ ~

Dat is a nonprofit.

Reproducible Research

“A rule of thumb … is that half of published research cannot be replicated”

How do we replicate research today?

How do we collaborate on data analysis today?

How do we collaborate today?

????????

How do we replicate research today?

me@home $ dat push

me@campus $ dat pull

you@work $ dat clone

dat workflow

• import

• version

• publish

• replicate
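To make the four steps concrete, here is a toy, in-memory model in Python. It is a sketch under stated assumptions, not dat's implementation or API: `ToyRepo`, `import_rows`, `commit`, `clone` and `pull` are hypothetical names, history is assumed to be linear with no conflicts, and "publish" is reduced to handing one in-memory object to another instead of serving it over a network.

```python
import copy
import hashlib
import json


class ToyRepo:
    """Hypothetical in-memory model of a versioned dataset (not dat's implementation)."""

    def __init__(self):
        self.rows = {}  # working state: key value -> row dict
        self.log = []   # history: list of (version_id, snapshot), oldest first

    def import_rows(self, rows, key):
        # import: add or replace rows, indexed by the value in the `key` column
        for row in rows:
            self.rows[row[key]] = dict(row)

    def commit(self):
        # version: snapshot the rows; the id hashes the parent id plus the content,
        # so each version id also depends on the history before it
        parent = self.log[-1][0] if self.log else ""
        snapshot = copy.deepcopy(self.rows)
        payload = parent + json.dumps(snapshot, sort_keys=True)
        version_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:7]
        self.log.append((version_id, snapshot))
        return version_id

    def clone(self):
        # replicate: start empty and pull the whole history
        other = ToyRepo()
        other.pull(self)
        return other

    def pull(self, remote):
        # replicate: append commits we have not seen yet, then check out the latest
        # snapshot (this discards uncommitted local changes in the toy model)
        known = {vid for vid, _ in self.log}
        for vid, snapshot in remote.log:
            if vid not in known:
                self.log.append((vid, copy.deepcopy(snapshot)))
        if self.log:
            self.rows = copy.deepcopy(self.log[-1][1])


home = ToyRepo()
home.import_rows([{"city": "Oakland", "state": "CA"}], key="city")
v1 = home.commit()

work = home.clone()  # "publish" here is simply sharing the `home` object

home.import_rows([{"city": "Bloomington", "state": "IN"}], key="city")
v2 = home.commit()

work.pull(home)
print([vid for vid, _ in work.log] == [v1, v2])  # True
print(sorted(work.rows))  # ['Bloomington', 'Oakland']
```

Hashing the parent id together with the content mirrors the general idea of a commit chain, which is why `v1` and `v2` differ even though `v2` contains `v1`'s rows.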

[Diagram: you import your .csv data]

$ dat init

$ dat add dataset cities

$ dat add rows cities cities.csv

$ dat add files cities city_model.gz
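A minimal sketch of what adding rows from a CSV file involves, using Python's standard `csv` module. The slides do not say how dat identifies individual rows, so the `key` parameter (here a hypothetical `city` column), the helper name `load_rows`, and the duplicate-key check are assumptions for illustration.

```python
import csv
import io


def load_rows(csv_text, key):
    """Parse CSV text into {key value: row}; a hypothetical stand-in for 'add rows'."""
    rows = {}
    # The header is line 1, so the first record is line 2
    # (assumes no quoted fields that span several lines).
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        k = row[key]
        if k in rows:
            # The key column must be unique, otherwise later rows would
            # silently replace earlier ones.
            raise ValueError(f"duplicate key {k!r} on line {line_no}")
        rows[k] = row
    return rows


sample = "city,state\nOakland,CA\nBloomington,IN\n"
cities = load_rows(sample, key="city")
print(sorted(cities))  # ['Bloomington', 'Oakland']
```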

[Diagram: your imported .csv data and the address http://my-data.bids.edu]

$ dat listen

[Diagram: .csv data imported, then published at http://my-data.bids.edu]

$ dat clone

[Diagram: .csv data imported and published at http://my-data.bids.edu, then cloned by you]

Versioning

$ dat add files cities us_cities_viz.png
This will override us_cities_viz.png at c2342. OK?

$ dat add rows cities updated_data.csv
This will update 3,434,245 rows. OK?

$ dat push
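A confirmation prompt like "This will update N rows. OK?" implies comparing the incoming rows with the current ones before writing. The sketch below shows one way to do that comparison in Python; `plan_update` and `confirm_message` are hypothetical helpers, rows are assumed to be keyed dicts, and counting both new and modified rows as "updated" is an assumption, not dat's documented behavior.

```python
def plan_update(current, incoming):
    """Classify incoming rows against current rows; both are {key: row} dicts.

    Hypothetical helper that mirrors the idea behind a confirmation prompt;
    it is not dat's actual diff logic.
    """
    plan = {"added": [], "changed": [], "unchanged": []}
    for key, row in incoming.items():
        if key not in current:
            plan["added"].append(key)
        elif current[key] != row:
            plan["changed"].append(key)
        else:
            plan["unchanged"].append(key)
    return plan


def confirm_message(plan):
    # Assumption: both new and modified rows count as "updated".
    n = len(plan["added"]) + len(plan["changed"])
    return f"This will update {n:,} rows. OK?"  # {:,} adds thousands separators


current = {
    "Oakland": {"city": "Oakland", "state": "CA"},
    "Bloomington": {"city": "Bloomington", "state": "IN"},
}
incoming = {
    "Oakland": {"city": "Oakland", "state": "CA"},               # identical
    "Bloomington": {"city": "Bloomington", "state": "Indiana"},  # modified
    "Berkeley": {"city": "Berkeley", "state": "CA"},             # new
}
plan = plan_update(current, incoming)
print(plan)
print(confirm_message(plan))  # This will update 2 rows. OK?
```

Showing the count before writing lets the user abort a large, possibly accidental change; unchanged rows are reported separately so they are not counted.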

[Diagram: .csv data published at http://my-data.bids.edu, cloned by you, and pushed to http://my-data.indiana.edu]

Interoperability in Python and R

[Diagram: .csv, .png, .R, and .py files]

Ecosystem

[Diagram: the same .csv, .png, .R, and .py files]

Datscript

• Goal: manipulate datasets with scripting

• Supported keywords: run, pipe, map, reduce, fork, keyword

• Bash-like

• Platform-independent

• Uses Node.js streams (fast!)

Top: Datscript “pipe” command. Bottom: equivalent command in Bash.

Datscript: pipeline example

Datscript: example commands

• background: executes a command but doesn’t wait for it to finish

• map: pipes the first argument into the rest of the arguments

• run: a serial command (executes the command and waits for it to finish)
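The slides describe `run`, `pipe`, `background` and `map` only in words. The Python sketch below approximates those semantics with the standard `subprocess` module; it is an analogy, not Datscript, which according to the slides is built on Node.js streams. Each "command" is a tiny Python snippet run by the current interpreter so the demo works on any platform. The reading of `map` (send the first command's output to each remaining command) is an assumption based on its one-line description, the helper names are hypothetical (`map_` avoids shadowing Python's built-in `map`), and `fork` and `reduce` are not modeled.

```python
import subprocess
import sys

PY = sys.executable  # the current interpreter, so the demo is platform-independent


def cmd(code):
    """Build an argv that runs a small Python snippet (stands in for any command)."""
    return [PY, "-c", code]


def run(argv):
    # serial: start the command, wait for it to finish, return its stdout
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def background(argv):
    # start the command without waiting; the caller can collect it later
    return subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)


def pipe(*argvs):
    # connect each command's stdout to the next command's stdin, like `a | b | c`
    procs = []
    upstream = None
    for argv in argvs:
        p = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE, text=True)
        if upstream is not None:
            upstream.close()  # the child now owns this end of the pipe
        upstream = p.stdout
        procs.append(p)
    output = procs[-1].communicate()[0]
    for p in procs:
        if p.wait() != 0:
            raise subprocess.CalledProcessError(p.returncode, p.args)
    return output


def map_(source, *targets):
    # assumed semantics: feed the first command's output into each remaining command
    data = run(source)
    return [
        subprocess.run(t, input=data, capture_output=True, text=True, check=True).stdout
        for t in targets
    ]


emit = cmd("print('dat')")
upper = cmd("import sys; sys.stdout.write(sys.stdin.read().upper())")
count = cmd("import sys; print(len(sys.stdin.read().split()))")

print(repr(run(emit)))             # 'dat\n'
print(repr(pipe(emit, upper)))     # 'DAT\n'
print(map_(emit, upper, count))    # ['DAT\n', '1\n']
job = background(emit)             # returns immediately
print(repr(job.communicate()[0]))  # 'dat\n'
```

`pipe` streams data between processes as it is produced, while `map_` buffers the source output once and replays it to each target; a stream-based tool would avoid that buffering.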

Karissa McKelvey - @karissamck

Melanie Cebula - @melaniecebula

http://dat-data.com

