r/qtl2: progress & plans · 2018. 2. 15. · 14 51.48 395.25 m 1 - sb bb 13 60.94 275.63 m 1 -...
TRANSCRIPT
R/qtl2: progress & plans
Karl Broman
Biostatistics & Medical Informatics, UW–Madison
kbroman.orggithub.com/kbroman
@kwbromanSlides: bit.ly/Memphis2016
R/qtl
0
5000
10000
15000
20000
25000
30000
35000
●
●
●●
●
● ●●●
●
●●●
●
●
●
●●
●●●
●●
●●●●● ● ● ● ● ●
● ●●●●●●
●● ● ● ● ●
Year
Line
s of
cod
e
●
●● ● ●
● ●●
●●
●●●●
● ● ●●
●●●
●●
●● ●●● ● ● ● ● ●
● ●●●●●●●● ● ● ● ●
●●
● ● ● ● ●● ●●
●●●●
●●
●●●●●
●●●●
●●● ● ● ● ● ● ● ●●●●●●●● ● ● ● ●
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
idea svn git
R
C
doc
2
Intercross
P1 P2
F1 F1
F2
3
QTL mapping
0
2
4
6
8
Chromosome
LOD
sco
re
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X
BB BR RR
0.8
0.9
1.0
1.1
4
Multi-parent populations
combine mix fix
Valdar et al., Genetics 172:1783, 2006
6
Challenges: diagnostics
kbroman.wordpress.com/2012/04/25/microarrays-suck7
Challenges: scale of results
genotypes phenotypes
8
Challenges: scale of results
genotypes phenotypes
results
8
Challenges: organizing, automating
genotypes phenotypes
9
Challenges: organizing, automating
genotypes phenotypes
9
Challenges: organizing, automating
genotypes phenotypes
9
Challenges: organizing, automating
genotypes phenotypes
9
Challenges: organizing, automating
genotypes phenotypes
9
Challenges: organizing, automating
genotypes phenotypes
9
Challenges: organizing, automating
genotypes phenotypes
9
Challenges: metadata
What the heck is FAD_NAD SI 8.3_3.3G?
10
What was the question again?
11
The ridiculome
12
Multivariate phenotypes
13
Multivariate phenotypes
13
Composite phenotypes
Shaffer et al. (2013) J Dent Res 92:32-3714
R/qtl
0
5000
10000
15000
20000
25000
30000
35000
●
●
●●
●
● ●●●
●
●●●
●
●
●
●●
●●●
●●
●●●●● ● ● ● ● ●
● ●●●●●●
●● ● ● ● ●
Year
Line
s of
cod
e
●
●● ● ●
● ●●
●●
●●●●
● ● ●●
●●●
●●
●● ●●● ● ● ● ● ●
● ●●●●●●●● ● ● ● ●
●●
● ● ● ● ●● ●●
●●●●
●●
●●●●●
●●●●
●●● ● ● ● ● ● ● ●●●●●●●● ● ● ● ●
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
idea svn git
R
C
doc
15
Good things
▶ HMM code▶ basic user interface▶ comprehensive▶ diagnostics and data visualization▶ quite flexible
16
Good things▶ HMM code▶ basic user interface▶ comprehensive▶ diagnostics and data visualization▶ quite flexible
16
Bad things
17
Stupidest code ever
n <- ncol(data)temp <- rep(FALSE ,n)for(i in 1:n) {
temp[i] <- all(data[2,1:i]=="")if(!temp[i]) break
}if(!any(temp)) stop("...")n.phe <- max((1:n)[temp])
18
Input file
BBBBSBSBBB1m260.4547.1215
BBSB−−−1m395.2551.4814
BBSB−−−1m275.6360.9413
SBSBSBBBSB1m133.0538.6412
SBSB−−−1m184.1265.6311
SBSBBBBBBB1m154.8859.4710
SSSS−−−1m191.5459.269
SBSB−−−1m181.0565.318
SSSSBBSBSB1m126.1378.067
SBSBSBSBBB1m131.91586
BBBB−−−1m178.5888.335
SBSBSBSBBB1m153.1661.924
48.138.3110.451.427.33
221112
D2Mit75D2Mit379D1Mit17D1Mit80D1Mit18pgmsexspleenliver1
IHGFEDCBA
19
Open source meanseveryone can see my stupid mistakes
Version control meanseveryone can see every stupid mistake I’ve evermade
20
Open source meanseveryone can see my stupid mistakes
Version control meanseveryone can see every stupid mistake I’ve evermade
20
Baroque data structures
attr(mycross$geno[["X"]]$probs, "map")
21
R/qtl2
▶ High-density genotypes▶ High-dimensional phenotypes▶ Multi-parent populations▶ Linear mixed models
22
R/qtl2: Let’s not make the same mistakes
▶ C++ and Rcpp▶ Simpler documentation▶ Unit tests▶ A single “switch” for cross type
▶ Split into multiple packages▶ Yet another data input format▶ Flatter data structures, but still complex
23
R/qtl2: Let’s not make the same mistakes
▶ C++ and Rcpp▶ Simpler documentation▶ Unit tests▶ A single “switch” for cross type▶ Split into multiple packages▶ Yet another data input format▶ Flatter data structures, but still complex
23
Slides: bit.ly/Memphis2016
kbroman.org
github.com/kbroman
@kwbroman
24