Towards Lensfield
DESCRIPTION
Lensfield is a desktop and filesystem-based tool designed as a “personal data management assistant” for the scientist. It combines distributed version control (DVCS), software transactional memory (STM) and linked open data (LOD) publishing to create a novel data management, processing and publication tool. The application “just looks after” these technologies for the scientist, providing simple interfaces for typical uses. It is built with Clojure and includes macros that define steps in a common workflow. Functions and Java libraries provide facilities for automatic processing of data, which is ultimately published as RDF in a web application. The progress of data processing is tracked by a fine-grained data structure that can be serialized to disk, so that largely automated processes can include manual steps and programmatic interrupts and then resume seamlessly. Flexibility in operation and minimizing barriers to adoption are major design features.
TRANSCRIPT
Towards Lensfield: data management, processing and
semantic publication for vernacular e-science
Nick Day, Jim Downing, Lezan Hawizy, Nico Adams and Peter Murray-Rust
Unilever Centre for Molecular Science Informatics, University of Cambridge
This presentation: CC-By-SA Jim Downing
Linked Data
CC-By-SA-NC jmelchio
CC Images from Flickr
Selling Linked Data
Make it transparent
Make it easy
CC-By mrslogic
Selling Linked Data
Citations
Selling Linked Data
• Visualizations
• Data management
• Automation
Demo
http://code.google.com/p/lensfield/
Lensfield Principles
• Make it easier to do the right thing
• Vernacular
• KISS and Embrace constraints
Constraints
• Work on the desktop without infrastructure installation
• Processing tasks could be anything and aren’t predictable
Re-use
Jumbo-Converters
• Library of chemistry file format converters, semantifiers and enhancers
• Part of the CML Java libraries
• http://sourceforge.net/projects/cml/
Version Control
• Mercurial
• Excellent support for experimentation
• Backup to remote machine
• P2P sharing
• Track script changes with data
• Automatically ignore deterministic intermediates
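The last bullet can be sketched as follows. This is a minimal illustration in Python, not Lensfield's own code (which is Clojure); the function name `hgignore_for` and the example patterns are hypothetical. It assumes the tool already knows which file patterns are deterministic outputs, and writes them in Mercurial's `.hgignore` glob syntax so that only sources and scripts are tracked.

```python
def hgignore_for(derived_patterns):
    """Return .hgignore text that ignores deterministic intermediate files.

    `derived_patterns` is an iterable of glob patterns for files that can
    always be regenerated from tracked sources (hypothetical examples below).
    """
    lines = ["syntax: glob"]
    # Sort and de-duplicate so the generated file is itself deterministic.
    lines.extend(sorted(set(derived_patterns)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical patterns: derived CML and RDF files.
    print(hgignore_for(["*.derived.cml", "*.rdf"]), end="")
```

Anything matched by these patterns can be rebuilt from the tracked inputs, so leaving it out of the repository loses no information.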
Build metaphor
• Describing state transitions rather than process is better for provenance tracking
• Alternative to graphical programming languages / workflow packages
• Hard problems are re-use and comprehension
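A minimal sketch of the build metaphor, in Python for illustration only (Lensfield itself is written in Clojure, and its build descriptions are not shown in these slides). The names `is_stale` and `build` and the rule format are assumptions. Each rule declares a target, the sources it derives from, and an action; whether work is needed is inferred from file state rather than scripted as a sequence of steps.

```python
import os


def is_stale(target, sources):
    """A target needs rebuilding if it is missing or older than any source."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def build(rules):
    """Bring every stale target up to date.

    `rules` is a list of (target, sources, action) tuples, ordered so that
    a rule's sources are produced by earlier rules.
    Returns the list of targets that were rebuilt.
    """
    rebuilt = []
    for target, sources, action in rules:
        if is_stale(target, sources):
            action(target, sources)
            rebuilt.append(target)
    return rebuilt
```

Because each rule records which inputs a target was derived from, the rule set doubles as a provenance record: running `build` a second time with unchanged inputs rebuilds nothing.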
Clojure
• Strong on concurrency
• Functional
• Software Transactional Memory
• Lisp
• Snapshots, pause and resume, continuations
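The "snapshots, pause and resume" idea can be illustrated with a simple checkpoint file. This Python sketch does not use Clojure's STM or continuations; it swaps in a plain JSON file for the serialized progress structure mentioned in the description. The function names and the step format are hypothetical.

```python
import json
import os


def load_progress(path):
    """Read the set of completed step names, or start with an empty set."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()


def run(steps, path):
    """Run steps in order, saving progress to `path` after each one.

    `steps` is a list of (name, function) pairs. A function returns False
    to request a pause (for example, a manual step is still pending).
    Calling run() again with the same path resumes where it stopped.
    Returns the names of the steps executed during this call.
    """
    done = load_progress(path)
    executed = []
    for name, fn in steps:
        if name in done:
            continue  # completed in an earlier run
        if fn() is False:
            break  # pause here; progress so far is already on disk
        done.add(name)
        executed.append(name)
        with open(path, "w") as f:
            json.dump(sorted(done), f)
    return executed
```

A manual step can be modelled as a function that returns False until the scientist has done their part; the automated steps after it run on the next invocation.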
Future Development
• Templated Parameter Sweeps & sensitivity analysis
• Design of Experiments
• Multicore performance testing
• Grid processing
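As a rough idea of what a templated parameter sweep could look like, the Python sketch below fills a text template once per combination of parameter values. It is an assumption about the general technique, not a description of a planned Lensfield feature; the function name `sweep` and the placeholder names are hypothetical.

```python
from itertools import product
from string import Template


def sweep(template, parameters):
    """Yield one filled-in input text per combination of parameter values.

    `template` uses $name placeholders; `parameters` maps each name to a
    list of values. Names are sorted so the output order is reproducible.
    """
    names = sorted(parameters)
    for values in product(*(parameters[n] for n in names)):
        yield Template(template).substitute(dict(zip(names, values)))


if __name__ == "__main__":
    for text in sweep("T=$temp P=$pressure", {"temp": [300, 400], "pressure": [1, 2]}):
        print(text)
```

Each generated text could then serve as the input to one processing run, which is what makes the sweep a natural fit for multicore or grid execution.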
Users
• CLARION project
• Embargo management and publication of Electronic Lab Notebook data.
• OREChem
• Distributed chemistry eScience using Linked Data.
• Computational chemical engineering
Users
You?
... to use Lensfield!
http://code.google.com/p/lensfield/
CC-By-NC ilonameagher
Thanks
Colleagues
Funds
Collaboration and Inspiration
Nick Day, John Aspden, Lezan Hawizy, Peter Murray-Rust
Nico Adams (Dept of Genetics, Cambridge), Jerry Winter (Unilever), Noel Ruddock (Unilever), Markus Kraft, Weerapong Phadungsukanan (Chemical Engineering, Cambridge)