data engine
DESCRIPTION
Slides to accompany Patrick McSweeney's winning pitch in the Open Repositories 2012 DevCSI Developer Challenge. More information about this entry can be found at http://devcsi.ukoln.ac.uk/or2012-developer-challenge-data-engine
TRANSCRIPT
![Page 1: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/1.jpg)
DataEngine
By Patrick McSweeney
![Page 2: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/2.jpg)
Dave Mills
● PhD electrical engineering
● Number of experiments per month: 10-15
● Raw data: 1 GB
● Processed data: 5-10 MB
● Processed with MATLAB
● Raw data when zipped: 450 MB
Dave, Patrick
![Page 3: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/3.jpg)
State of the onion
Researchers are using many different methods to collect or generate data, from sensors and CCDs to supercomputers and particle colliders. When the data finally shows up in your computer, what do you do with all this information that is now in your digital shoebox? People are continually seeking me out and saying, “Help! I’ve got all this data. What am I supposed to do with it? My Excel spreadsheets are getting out of hand!”
The suggestion that I have been making is that we now have terrible data management tools for most of the science disciplines. Commercial organizations like Walmart can afford to build their own data management software, but in science we do not have that luxury. At present, we have hardly any data visualization and analysis tools. Some research communities use MATLAB, for example, but the funding agencies in the U.S. and elsewhere need to do a lot more to foster the building of tools to make scientists more productive. When you go and look at what scientists are doing, day in and day out, in terms of data analysis, it is truly dreadful. And I suspect that many of you are in the same state that I am in, where essentially the only tools I have at my disposal are MATLAB and Excel!
![Page 4: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/4.jpg)
State of the onion
![Page 5: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/5.jpg)
Data imported
![Page 6: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/6.jpg)
Data provenance
![Page 7: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/7.jpg)
Data manipulation
![Page 8: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/8.jpg)
Choose visualisation
![Page 9: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/9.jpg)
Save visualisation
![Page 10: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/10.jpg)
Lots of possibilities
![Page 11: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/11.jpg)
Take home
● An important step on the road to data science
● Make the repository a tool
● Get the data at the point of creation
● Repeatable experiments
![Page 12: Data Engine](https://reader033.vdocument.in/reader033/viewer/2022042715/5597136f1a28ab2f108b4726/html5/thumbnails/12.jpg)
The outlook is good