The ELIXIR-UK Industry Survey by Gabriella Rustici
Industry Engagement sector update Gabriella Rustici
Bioinformatics Training Facility, School of the Biological Sciences
Activities so far
Appointed an Industry Engagement advisory committee
Confirmed members:
• Claus Bendtsen, AstraZeneca
• Mark Forster, Syngenta
• Samiul Hasan, GSK
• Wendy Filsell, Unilever
• William Spooner, Eagle Genomics
• Audrey Kauffmann, Novartis
To be confirmed:
• Justin Powell, Takeda (not replied)
Activities so far
Circulated two surveys to:
• help us understand the bioinformatics-related training needs of industry and
• consequently ensure that suitable training activities are developed and honed to target such needs.
One survey targeted bioinformaticians, the other wet-lab scientists.
Survey results
[Bar chart; labels: Bioinformaticians, Wet lab; Large company, Small-to-medium enterprise; axis 0 to 90]
[Bar chart: respondents' companies; axis 0 to 80]
Biogen Idec
Bioindustry Park Silvano Fumero
Databiology
DNAdigest.org
DNAnexus
Dupont
EMD Serono (Merck Serono)
Euformatics Oy
Genentech
Ina Harrow Consulting
Instem Scientific
LGC
Life Technologies - Thermo Fisher
Lundbeck
MedImmune
Novo Nordisk
Omixon Biocomputing LTD
Redoxis AB
Roche
Astellas Pharma Inc.
Bayer Healthcare
Eli Lilly & Company
Heptares
OP
Pfizer Inc.
UCB
Unilever
AstraZeneca
Bayer
Bayer Pharma AG
NIBR
Sanofi
Eagle Genomics
Illumina
GlaxoSmithKline
Novartis
Discipline of interest
[Bar chart by group (Bioinformaticians, Wet lab); axis 0 to 60]
Virology
Toxicology
Systems biology
Proteomics
Plant Sciences
Oncology
Neurobiology
Molecular Biology
Microbiology
Medicine
Infectious diseases
Immunology
Genomics/epigenomics
Epidemiology
Drug development
Computer Science
Computational chemistry
Chemistry
Cell biology
Biomedical Sciences
Bioinformatics
Biochemistry/Biophysics
Bioanalytics
What databases do they use in their work?
[Bar chart by group (bioinformaticians, wet lab); axis 0 to 60]
Systems databases (e.g. BioModels, Reactome, KEGG, ...)
Structures databases (e.g. PDBe)
Protein databases (e.g. UniProt, Pfam, IntAct, ...)
Ontology resources (e.g. Gene Ontology, ...)
No, I do not use databases
Literature services (e.g. PubMed, ...)
Gene expression databases (e.g. ArrayExpress, Gene Expression Omnibus, ...)
DNA & RNA databases (e.g. Ensembl, 1000 Genomes, UCSC, ...)
Chemical biology databases (e.g. ChEMBL, ...)
Bioinformaticians: Software tools/Data mining software
Data mining software
[Bar chart; axis 0 to 10]
cBioPortal
Custom
Databiology
Genedata
IGV
Matlab
Omicsoft
Pathway Studio
PostgreSQL
SQLite
SVM
Spotfire
Arraystudio
Expressionist
Weka
KNIME
Linguamatics
R/Bioconductor
Software tools
[Bar chart; axis 0 to 20]
Affymetrix
Ensembl
Oracle
Plink
SAS
Eclipse
Spotfire
Cytoscape
R/Bioconductor
NGS tools
Wet lab scientists: analysis software
[Bar chart; axis 0 to 70]
Workflow tools (e.g. Galaxy)
No, I do not use any software
Gene set enrichment testing tools (e.g. DAVID)
Next-generation sequencing read alignment and assembly programs (e.g. BWA, Bowtie, ...)
Pathway & network analysis tools (e.g. Cytoscape, Biocarta, Ingenuity, ...)
Data analysis environments (e.g. R/Bioconductor, Matlab, ...)
Sequence alignment, similarity & homology tools (e.g. Blast, Clustal, ...)
Microsoft software (e.g. Excel)
Wet lab scientists and statistics
How confident are you with statistics?
[Pie chart; segments: 6%, 29%, 59%, 6%]
Very confident
Confident
Not so confident
I am not even sure of what statistics I need to know
Do you collaborate with a bioinformatician/statistician?
[Pie chart; segments: 34.0%, 1.5%, 32.4%, 32.4%]
No, I do not have any support. I am responsible for analyzing the data that I generate.
No, the data analysis is carried out by someone else. I just receive a file with the results.
Yes, occasionally I interact with a bioinformatician/statistician at my Institute, particularly when I get stuck and I don’t know how to proceed.
Yes, I have a bioinformatician in the group that helps me to design experiments and also provides support for the data analysis.
Programming experience/languages
Bioinformaticians - Programming languages
[Bar chart; axis 0 to 20]
Python
R/Bioconductor
Perl
Java
C++
Matlab
Ruby
JavaScript
Unix bash
HTML
MySQL
PL/SQL
Scala
SPARQL
SQL
Wet lab – programming experience
[Pie chart; segments: 26.5%, 73.5%]
Yes
No
Bioinformaticians: What competencies are crucial?
[Bar chart; axis 0% to 90%]
Using and building ontologies
Using and applying standards
Version control tools
Retrieving and manipulating data from public repositories
Working with high-performance computing or cloud-based solutions
Modeling and warehousing of biological data
Programming
Integrating public and private data-sets
Use of scripting languages
Ability to use statistical analysis software packages
Data mining of large biological data-sets
Wet lab: what expertise would you like to acquire?
[Bar chart; axis 0.0% to 80.0%]
If other, please specify
Data publishing skills (e.g. how do I publish my results?)
Scientific knowledge (e.g. how should I design my experiment to obtain meaningful results?)
Data manipulation skills (e.g. what software is more appropriate to analyze my data? How does a specific software work?)
Statistical knowledge (e.g. what statistics do I need to know to be able to analyze my data?)
Data visualization skills (e.g. what does my data look like? How do I interpret and present my data?)
Bioinformaticians: Which bioinformatics training would you most value in relation to your work?
[Bar chart; axis 0% to 90%]
Basic computing skills
Data analysis skills
Programming skills
Statistical methodologies
Use of data standards in curation and/or data integration practices
Bioinformaticians: What topics would you like to see covered in future training activities?
[Bar chart; axis 0 to 40]
Programming
HPC
Cloud solutions
Text mining
Data visualization
Drug development/discovery
New resources/latest technologies
Network analysis
Workflows
Statistics
Data integration
NGS analysis
What training format do they prefer?
[Bar chart by group (Bioinformaticians, Wet lab); axis 0.0% to 70.0%]
Face-to-face training courses on site
Face-to-face training courses off site
Online/eLearning
Face-to-face combined with online
Summary
• Statistics is a major weakness, regardless of the user group: the topic is inadequately covered in undergraduate curricula and is often taught theoretically rather than practically
• High priorities for bioinformaticians are analysis of high-throughput data (primarily with popular data mining software), knowledge of programming languages (Python, R, Perl), data integration and network analysis
• Basic data manipulation, visualization and statistics are fundamental to wet-lab scientists, who lack confidence in using statistical software that requires scripting
• Face-to-face is always popular but a lot of basic training can be delivered online
Various considerations
• Collate all Industry use cases already available to define key competencies in Industry and disseminate these to all other sectors
• Utilize this information to prioritize key training areas
• Hold the first Advisory committee meeting within the next six months
• Collaborations with other ELIXIR nodes:
• ELIXIR-NL will use the same surveys to assess the training needs of industry, share the results and collate more information; the plan is to do this in collaboration with all ELIXIR nodes
• Engage Industries beyond pharma?