Training and life as a postdoc (Felix Klein)
TRANSCRIPT
Bioinformatics Career Day, 24 May 2012
Felix Klein
Felix Klein, 24.05.2012
Background
• physics diploma, University of Heidelberg
• diploma thesis in radiation dosimetry at DKFZ
• measurements at HIT (Heidelberg Ion-Beam Therapy Center)
Why bioinformatics?
• interdisciplinary
• had already programmed in R
• had already worked on data analysis
Progress in science is driven by technology
Chromatin loops
Investigation of chromatin 3D structure
• role of chromatin 3D structure in gene regulation
• 4C to investigate detailed interactions of cis-regulatory modules (CRMs)
• global chromatin interactome using Hi-C
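For readers new to Hi-C: its core data structure is a contact matrix that counts how often two genomic regions were captured together. The sketch below is minimal and rests on a few assumptions: read pairs have already been mapped to 0-based positions on a single chromosome, and the bin size is fixed and hypothetical. Real pipelines also filter and normalize, and none of that is shown here. The function name `contact_matrix` is likewise hypothetical. Very roughly, a 4C profile from one viewpoint corresponds to a single row of such a matrix.

```python
from typing import Iterable, List, Tuple

def contact_matrix(pairs: Iterable[Tuple[int, int]],
                   chrom_length: int, bin_size: int) -> List[List[int]]:
    """Bin Hi-C read pairs (pos1, pos2) on one chromosome into a symmetric
    contact matrix; positions are 0-based base-pair coordinates."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        if not (0 <= pos1 < chrom_length and 0 <= pos2 < chrom_length):
            continue  # ignore pairs that fall outside the chromosome
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix

if __name__ == "__main__":
    # Hypothetical read pairs on a 50 kb chromosome, binned at 10 kb.
    pairs = [(1_000, 45_000), (2_500, 48_000), (12_000, 13_000), (30_000, 5_000)]
    for row in contact_matrix(pairs, chrom_length=50_000, bin_size=10_000):
        print(row)
```

With these toy pairs, the first row is [0, 0, 0, 1, 2]. In other words, the 0–10 kb bin contacts the 30–40 kb bin once and the 40–50 kb bin twice.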
Automated analysis of microscopy-based RNAi screens
[Figure: analysis pipeline with the steps imaging, segmentation, features extraction, classification and summary, leading from the source image via the calibrated image, the segmentation mask with object labels and the object features to a phenotypic profile over the classes aft, apt, neg, int, pos.]
Object features extracted from the segmentation mask (first 16 objects):
           g.x        g.y  g.s g.p     g.pdm
 [1,] 123.1391   3.288660  194  67  9.241719
 [2,] 206.7460   9.442248  961 153 20.513190
 [3,] 502.9589   7.616438  219  60  8.286918
 [4,]  20.1919  22.358418 1568 157 22.219461
 [5,] 344.7959  45.501992 2259 233 35.158966
 [6,] 188.2611  50.451863 2711 249 28.732680
 [7,] 269.7996  46.404036 2131 180 26.419631
 [8,] 106.6127  58.364243 1348 143 21.662879
 [9,] 218.5582  77.299007 1913 215 25.724580
[10,]  19.1766  81.840147 1908 209 26.303760
[11,]   6.3558  62.017647  340  68 10.314127
[12,]  58.9873  86.034128 2139 214 27.463158
[13,] 245.1087  94.387405 1048 123 18.280901
[14,] 411.2741 109.198678 2572 225 28.660816
[15,] 167.8151 107.966014 1942 160 24.671533
[16,] 281.7084 121.609892 2871 209 31.577270
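The pipeline goes from segmentation to feature extraction, then classification, then a phenotypic profile. The pure-Python sketch below makes that chain more concrete. It assumes a label mask in which 0 is background and 1..n are objects. It also assumes that the classifier's per-object calls are already available. The keys mimic the g.* columns on this slide (g.x, g.y, g.s, g.p, g.pdm). However, the definitions used here are assumptions, not the exact definitions of the software used in the talk:
- centroid;
- pixel count;
- boundary-pixel count as perimeter;
- mean centroid-to-boundary distance.

The function names `object_features` and `phenotypic_profile` are hypothetical. The class names are taken from the slide as-is.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def object_features(labels: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Simple per-object shape features from a label mask (0 = background).

    Keys mimic the slide's g.* columns, but these definitions are
    simplified assumptions, not those of the software used in the talk.
    """
    height, width = len(labels), len(labels[0])
    pixels: Dict[int, List[Tuple[int, int]]] = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x] != 0:
                pixels.setdefault(labels[y][x], []).append((x, y))

    features = {}
    for lab, pts in pixels.items():
        size = len(pts)
        cx = sum(x for x, _ in pts) / size  # centroid, column coordinate
        cy = sum(y for _, y in pts) / size  # centroid, row coordinate
        # Boundary pixel: an object pixel with at least one 4-neighbour
        # that lies outside the image or belongs to another label.
        boundary = []
        for x, y in pts:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < width and 0 <= ny < height) or labels[ny][nx] != lab:
                    boundary.append((x, y))
                    break
        features[lab] = {
            "g.x": cx,
            "g.y": cy,
            "g.s": size,           # area in pixels
            "g.p": len(boundary),  # perimeter as boundary-pixel count
            "g.pdm": sum(math.hypot(x - cx, y - cy) for x, y in boundary) / len(boundary),
        }
    return features

def phenotypic_profile(classes: Sequence[str], class_names: Sequence[str]) -> Dict[str, float]:
    """Fraction of classified objects falling into each class."""
    counts = Counter(classes)
    total = len(classes)
    return {name: (counts[name] / total if total else 0.0) for name in class_names}

if __name__ == "__main__":
    mask = [[1, 1, 0, 0],
            [1, 1, 0, 2],
            [0, 0, 0, 2]]
    print(object_features(mask))
    # Per-object calls are assumed to come from a separately trained classifier.
    print(phenotypic_profile(["int", "int", "apt", "pos"], ["aft", "apt", "neg", "int", "pos"]))
```

In a real screen, the profile for each well or RNAi reagent is what gets compared across knockdowns. The per-object features are the classifier's input.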
What was important for me?
• a bioinformatics group whose members have diverse backgrounds
• a PI with a track record of training bioinformaticians
• a well-established bioinformatics group
What might be interesting for you
• turn data into biology
• interaction with people from biology groups
• communication skills!!!
• workload divides mainly into:
  • programming (50 %)
  • reports, meetings, emails
Acknowledgements
Wolfgang Huber
Simon Anders, Joseph Barry, Bernd Fischer, Julian Gehring, Aleksandra Pekowska, Paul Theodor Pyl, Alejandro Reyes, Maria Secrier
Collaborators:
Michael Boutros, Christian Volz
Eileen Furlong, Yad Ghavi Helm
Data production rates
LHC: 1.8 GB/s at peak capacity, i.e. when its four main experiments (ATLAS, ALICE, CMS and LHCb) are all taking data.
These experiments will take roughly a decade to complete, and each of them is expected to produce over 1 PB of data per year.
One Illumina HiSeq: up to 600 Gb per run, i.e. ~600 GB per 10 days ≈ 18 TB/year (not including derived data, e.g. BAM files)
One Digital Embryo (2008): 3.5 TB (2048 x 2048 x 370 x 1226)
EMBL-EBI: data storage capacity of 14 PB in September 2011
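A few lines of arithmetic can cross-check two of the figures on this slide. The sketch below works in decimal units and makes two assumptions:
- It treats 600 Gb (gigabases) per run as roughly 600 GB of raw data, as the slide does.
- It assumes 2 bytes (16 bits) per Digital Embryo voxel, which the slide does not state.

The variable and function names are illustrative only.

```python
import math

# --- Illumina HiSeq: yearly raw output -----------------------------------
GB_PER_RUN = 600    # slide: up to 600 Gb per run, taken as ~600 GB of data
DAYS_PER_RUN = 10   # slide: one run takes about 10 days

def tb_per_year(runs_per_year: float, gb_per_run: float = GB_PER_RUN) -> float:
    """Raw data per year in decimal TB (1 TB = 1000 GB)."""
    return runs_per_year * gb_per_run / 1000

continuous_runs = 365 / DAYS_PER_RUN     # back-to-back runs, all year
runs_for_18_tb = 18 * 1000 / GB_PER_RUN  # runs implied by 18 TB/year

# --- Digital Embryo: size from the listed dimensions ---------------------
DIMS = (2048, 2048, 370, 1226)  # as on the slide; axis meanings not stated
BYTES_PER_VOXEL = 2             # assumption: 16-bit intensities

voxels = math.prod(DIMS)
size_bytes = voxels * BYTES_PER_VOXEL

if __name__ == "__main__":
    print(f"HiSeq, continuous operation: {tb_per_year(continuous_runs):.1f} TB/year")  # 21.9
    print(f"Runs per year for 18 TB: {runs_for_18_tb:.0f}")                               # 30
    print(f"Digital Embryo: {voxels:,} voxels")
    print(f"  {size_bytes / 1e12:.2f} TB decimal, {size_bytes / 2**40:.2f} TiB binary")   # 3.81, 3.46
```

Under these assumptions, back-to-back 10-day HiSeq runs would give about 21.9 TB/year, so the quoted 18 TB/year corresponds to about 30 runs. The embryo dimensions give about 3.81 × 10^12 bytes (3.46 TiB), which matches the quoted 3.5 TB if binary terabytes are meant.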