Genome Content Management: A Tale of Small RNA
William [email protected]
@whsqwghlm
eSI workshop on data flows in NGS
Eagle: An Open Source Business
• Consultancy/advice
• Training
• Support
• Installation/Integration
• Customization
• Outsourced management
[Diagram labels: Business; Open Community (e.g. Academia); Service Company; Service; Collaboration]
The Sequencing Cliff
$1000 genome ≈ $0.01 per Mbase (30x coverage)
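As a rough check on the figure above, the cost per megabase is the genome price divided by the total number of bases sequenced. The sketch below assumes a human genome of roughly 3,100 Mbase; that size is an assumption and is not stated on the slide.

```python
# Rough check of the "$1000 genome" cost per sequenced megabase.
# Assumption (not from the slide): a human genome of roughly 3,100 Mbase.
GENOME_SIZE_MBASE = 3_100


def cost_per_mbase(genome_price_usd: float, coverage: float,
                   genome_size_mbase: float = GENOME_SIZE_MBASE) -> float:
    """Return the cost in USD of one sequenced megabase."""
    if coverage <= 0 or genome_size_mbase <= 0:
        raise ValueError("coverage and genome size must be positive")
    # Total sequenced bases = genome size multiplied by depth of coverage.
    sequenced_mbase = genome_size_mbase * coverage
    return genome_price_usd / sequenced_mbase


price = cost_per_mbase(genome_price_usd=1000, coverage=30)
print(f"${price:.3f} per Mbase")
```

Under that assumption the result is about one cent per megabase, in line with the slide.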
Bioinformatics Crash Landing?
What needs to change?
The following must increase:
1. Number/efficiency of bioinformaticians,
2. Hardware scalability,
3. Software quality.
Survey – Attitudes to OS Software
[Bar chart (%): each attribute rated Irrelevant, Useful, or Important. Attributes: Stability/reliability; Scientific validation; Computational efficiency; Easy to install/maintain; Visual representation; Security; Integration; Ease of use; Availability of training; Commercial support]
Technical attributes: WIN
Usability attributes: LOSE
Source - http://eaglegenomics.com/survey
BUT…CAN THIS APPROACH SCALE?
Bioinformaticians like to:
• Develop their own solutions,
• Using open-source software,
• That’s stable, reliable, and published.
Bioinformaticians don’t like to:
• Develop user-friendly, supported software,
• Or train others to use it.
Is this the Answer?
“Genome Content Management is the set of processes and technologies that support the creating, managing, and reporting of genomic data.”
[Diagram labels: Create; Manage; Report]
[Diagram labels: Create; Manage; Report; Extend; Share; Reuse]
Timeline: Bespoke → Common Models → Content Management Systems
Genome Content Management Systems (G-CMS)
[Diagram axes: Workflow Oriented vs. Database Oriented; Open Source vs. Proprietary]
Ensembl as a GCMS
[Diagram labels]
Databases: Comparative Genomics; Functional Genomics; Variation; Assembly/Genes
Activities: Data Integration; Data Reporting; Data Analysis; Data Querying; Data QC
API
Auto-Scaling Ensembl
[Diagram labels: repeated units of Databases, Activities, and API; eHive; a pool of Activities]
miRNA Prediction
Ensembl eHive-based miRNA prediction from small RNA-Seq
1. Align small RNA reads to genome, store expressed regions
2. Test regions for stable secondary structure, store transcripts
3. Add further metrics to each transcript, e.g.:
• MiRDeep, Friedländer et al. 2008
• MiPred, Jiang et al. 2007
• Drosha Site Finder, Helvik et al. 2007
• RNAFold/RNAeval, Gruber et al. 2008
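The first step on this slide, aligning reads and storing expressed regions, can be pictured as merging overlapping alignments into intervals. The sketch below is illustrative only and is not the pipeline's actual code: the `Region` type, the `expressed_regions` function, the `min_reads` threshold, and the coordinates are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Region(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    reads: int  # number of alignments merged into this region


def expressed_regions(alignments: Iterable[Tuple[int, int]],
                      min_reads: int = 1) -> List[Region]:
    """Merge overlapping (start, end) alignments from one strand of one sequence.

    Hypothetical helper: a region grows while the next alignment, in sorted
    order, overlaps it. Regions with fewer than min_reads reads are dropped.
    """
    regions: List[Region] = []
    for start, end in sorted(alignments):
        if regions and start < regions[-1].end:
            last = regions[-1]
            regions[-1] = Region(last.start, max(last.end, end), last.reads + 1)
        else:
            regions.append(Region(start, end, 1))
    return [r for r in regions if r.reads >= min_reads]


# Made-up alignment coordinates, for illustration only.
reads = [(0, 32), (1, 33), (8, 30), (47, 69), (52, 72), (200, 222)]
for region in expressed_regions(reads, min_reads=2):
    print(region)
```

In a real run each stored region would then go on to the secondary-structure test in the next step.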
Example Prediction: G.ga Chr. 19
gga-mir-142: CACAGTACACTCATCCATAAAGTAGGAAACACTACACCCTGCAGTGCTGTTTAGTAGTGCTTTCTACTTTATGGGTGACTGCACTGTC
gga-miR-142-3p/gga-miR-142-5p: CCATAAAGTAGGAAACACTACA GTAGTGCTTTCTACTTTATGGG
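A quick way to see where the two known mature sequences sit inside the gga-mir-142 hairpin is a plain substring search. The sketch below copies the sequences from the slide. The keys `mature_a` and `mature_b` are neutral placeholders, because the slide does not state which sequence carries which of the two labels, and `locate` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# Sequences as printed on the slide.
HAIRPIN = ("CACAGTACACTCATCCATAAAGTAGGAAACACTACACCCTGCAGTGCTGTTT"
           "AGTAGTGCTTTCTACTTTATGGGTGACTGCACTGTC")
MATURE = {
    "mature_a": "CCATAAAGTAGGAAACACTACA",
    "mature_b": "GTAGTGCTTTCTACTTTATGGG",
}


def locate(hairpin: str, mature: str) -> Optional[Tuple[int, int]]:
    """Return the 0-based half-open span of the first exact match, or None."""
    start = hairpin.find(mature)
    if start == -1:
        return None
    return start, start + len(mature)


spans: Dict[str, Optional[Tuple[int, int]]] = {
    name: locate(HAIRPIN, seq) for name, seq in MATURE.items()
}
for name, span in spans.items():
    print(name, span)
```

Exact matching is enough here because the mature sequences are printed as substrings of the hairpin. Real reads with mismatches would need an aligner.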
IAH_G0249505300          .((.((.((((((((((((((((((.((((((.....(((....)))...)))))).)))))))))))))))))).)).))
IAH_G0249505300          AGTACACTCATCCATAAAGTAGGAAACACTACACCCTGCAGTGCTGTTTAGTAGTGCTTTCTACTTTATGGGTGACTGCAC
KN-1523_BP25_NR000026458 AGTACACTCATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000050389 .GTACACTCATCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000037148 ..TACACTCATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000055284 .......TCATCCATAAAGTAGGAAACACT...................................................
KN-1523_BP25_NR000055535 ........CATCCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000026404 ........CATCCATAAAGTAGGAAACACT...................................................
KN-1523_BP25_NR000044105 ........CATCCATAAAGTAGGAAACACT...................................................
KN-1523_BP25_NR000031787 ........CATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000031957 .........ATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000022166 .........ATCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000045530 .........ATCCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000046154 .........ATCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000037676 ..........TCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000032163 ..........TCCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000065952 ..........TCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000064893 ..........TCCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000060177 ..........TCCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000054812 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000058317 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000007107 ...........CCATAAAGTAGGAAACACTA..................................................
KN-1523_BP25_NR000055367 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000016790 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000041436 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000034949 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000008209 ...........CCATAAAGTAGGAAACAC....................................................
KN-1523_BP25_NR000060347 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000002912 ...........CCATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000006332 ...........CCATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000038759 ............CATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000049956 ............CATAAAGTAGGAAACACTAC.................................................
KN-1523_BP25_NR000048198 .............ATAAAGTAGGAAACACTACA................................................
KN-1523_BP25_NR000042788 ...............................................TTAGTAGTGCTTTCTACTTTAT............
KN-1523_BP25_NR000051592 .................................................AGTAGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000064475 .................................................AGTAGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000065559 ..................................................GTAGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000031530 ....................................................AGTGCTTTCTACTTTATG...........
KN-1523_BP25_NR000049786 ....................................................AGTGCTTTCTACTTTATGGG.........
KN-1523_BP25_NR000000111 .....................................................GTGCTTTCTACTTTATGGG.........
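The bracket string that opens the alignment above is in dot-bracket notation, as written by folding tools such as RNAfold: matching parentheses mark paired bases and dots mark unpaired ones. The sketch below recovers the base pairs from such a string with a stack. `base_pairs` is a hypothetical helper, and the short structure used to exercise it is a toy example, not the one from the slide.

```python
from typing import List, Tuple


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return sorted 0-based (i, j) index pairs for a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            # The most recently opened base pairs with this closing base.
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Toy hairpin: a four-pair stem, a three-base loop, one unpaired tail base.
toy = "((((...))))."
pairs = base_pairs(toy)
print(len(pairs), "base pairs:", pairs)
```

A long, mostly uninterrupted run of nested pairs like this is the stem-loop shape that the "stable secondary structure" test looks for.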
[Legend: Known miRNA; Predicted miRNA; Drosha site]
Conclusions
• This talk:
• Started with a business model,
• Ended with a sequence annotation.
• The two are linked by content management:
• Reuse
• Extend
• Share
Acknowledgements
• Mick Watson
- Institute of Animal Health
• Nick James
- Eagle Genomics Ltd.
• Madhu Donepudi
- Eagle Genomics Ltd.
Bioinformatics in the age of the $1000 Genome?
http://eaglegenomics.com/survey