UCI ICS IGB SISL
Scientific Applications of Machine Learning
Eric Mjolsness
Scientific Inference Systems Laboratory
Donald Bren School of Information and Computer Sciences, and
Institute for Genomics and Bioinformatics
University of California, Irvine
Scientific Imagery Applications
NGC 7331 - http://photojournal.jpl.nasa.gov/catalog/PIA06322
Arabidopsis SAM - Meyerowitz Lab
Some Basic Machine Learning Distinctions
• Supervised vs. unsupervised learning
  – Supervised, e.g. classification and regression
    • Feature selection
    • Regression for phenomenological model fitting, e.g. gene regulation networks (GRNs)
  – Unsupervised, e.g. clustering; may be a preprocessor
• Generative vs. kernel methods
  – Generative (statistical inference) models
  – Kernel methods, e.g. Support Vector Machines
• Vector vs. relationship data
  – Vector data: preprocessed image features log I, x, …
  – Images, time series, shifted spectra - semigroup actions
  – Sparse graph/relationship data - permutation actions
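To make "permutation actions" on sparse graph data concrete, here is a minimal sketch in Python. The graph, the relabeling `perm`, and the helper names are hypothetical choices for illustration, not part of the talk. Relabeling the nodes changes the stored edge set but leaves permutation-invariant features, such as the sorted degree list, unchanged.

```python
def permute_graph(edges, perm):
    """Relabel an undirected graph without self-loops: node u becomes perm[u]."""
    return {frozenset((perm[u], perm[v])) for u, v in edges}


def degree_multiset(edges, n_nodes):
    """Sorted node degrees: a feature that no relabeling can change."""
    degree = [0] * n_nodes
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return sorted(degree)


# A 4-node path graph 0-1-2-3, stored sparsely as a set of edges.
edges = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
perm = [2, 0, 3, 1]  # hypothetical relabeling: 0->2, 1->0, 2->3, 3->1

relabeled = permute_graph(edges, perm)
same_degrees = degree_multiset(edges, 4) == degree_multiset(relabeled, 4)
```

A learning method for relationship data should give the same answer on `edges` and `relabeled`, which is why invariant features or explicit correspondence variables are needed.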
Correspondence Problems
• Extended sources - map morphologies
  – Similar to biological imaging problems
  – Fewer sources but many pixels
• Moving or changing point sources
  – E.g. Ida and Dactyl / JPL MLS
• Dense point sources with instrument noise, e.g. globular clusters (radial density function)
• Techniques:
  – Soft permutations, geometric transformations via optimization & continuation
  – Embedding inside a graph clustering (optimization) algorithm
  – Multiscale acceleration of optimization
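The first technique above can be sketched as follows. This is a minimal illustration, not the method used in the talk: it assumes two equal-size 2-D point sets related by an unknown permutation and a pure translation, uses Sinkhorn row/column normalization to keep the match matrix close to doubly stochastic (a "soft permutation"), and raises an inverse temperature `beta` as the continuation parameter. All names, the `betas` schedule, and the test points are assumptions.

```python
import math


def sinkhorn(weights, sweeps=50):
    """Alternately normalize rows and columns toward a doubly stochastic matrix."""
    n = len(weights)
    m = [row[:] for row in weights]
    for _ in range(sweeps):
        m = [[v / sum(row) for v in row] for row in m]  # rows sum to 1
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]  # columns sum to 1
    return m


def soft_match(points_a, points_b, betas=(0.5, 2.0, 8.0, 32.0)):
    """Alternate soft assignment and translation estimation while raising beta."""
    n = len(points_a)
    tx, ty = 0.0, 0.0
    soft = None
    for beta in betas:  # continuation: fuzzy matches first, nearly hard matches last
        weights = []
        for ax, ay in points_a:
            costs = [(ax + tx - bx) ** 2 + (ay + ty - by) ** 2 for bx, by in points_b]
            low = min(costs)  # subtracting the row minimum avoids underflow
            weights.append([math.exp(-beta * (c - low)) for c in costs])
        soft = sinkhorn(weights)
        # Least-squares translation given the current soft assignment.
        tx = sum(soft[i][j] * (points_b[j][0] - points_a[i][0])
                 for i in range(n) for j in range(n)) / n
        ty = sum(soft[i][j] * (points_b[j][1] - points_a[i][1])
                 for i in range(n) for j in range(n)) / n
    return soft, (tx, ty)


points_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
# Hypothetical observation: points_a shuffled and shifted by (0.3, -0.2).
points_b = [(0.3, 1.8), (0.3, -0.2), (3.3, 0.8), (1.3, -0.2)]

soft, shift = soft_match(points_a, points_b)
matches = [max(range(len(row)), key=row.__getitem__) for row in soft]
```

For a pure translation the update reduces to a difference of centroids, so the continuation mainly sharpens the assignment. Richer transformations (rotation, scale, warps) would make the alternation between matching and fitting do real work.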
Mixture Models
• Mixture of Gaussians, t-distributions, …
  – Can do outlier detection
• Mixture of factor analyzers
• Mixture of time series models
• * Problem-specific generative models
  – Can formulate with a Stochastic Parameterized Grammar
  – Clustering graphs
Frey et al. 1998
Utsugi and Kumagai 2000
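As a small illustration of the first bullet, the sketch below fits a two-component 1-D mixture of Gaussians with EM and flags points whose log density under the fitted model is low. It is a minimal sketch under stated assumptions: the data, the initialization at the data extremes, and the threshold of -10 are all hypothetical, and real data would need more careful initialization and model selection.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_density(x, weights, means, variances):
    """Log mixture density, computed with log-sum-exp for numerical stability."""
    terms = [math.log(w) + log_gauss(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_two_gaussians(data, iterations=50, var_floor=1e-6):
    """EM for a two-component mixture, initialized at the data extremes."""
    n = len(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    weights, means, variances = [0.5, 0.5], [min(data), max(data)], [var_all, var_all]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = [math.log(w) + log_gauss(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(var_floor,
                               sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk)
    return weights, means, variances


data = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4, -0.3, 0.2,
        4.6, 4.9, 5.0, 5.2, 5.3, 4.8, 5.1, 4.7]
weights, means, variances = fit_two_gaussians(data)


def is_outlier(x, threshold=-10.0):
    """Flag x when the fitted mixture assigns it very low log density."""
    return log_density(x, weights, means, variances) < threshold
```

Points between or far beyond the two clusters, for example `is_outlier(2.5)` or `is_outlier(20.0)`, are flagged, while points near a cluster center are not.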
More Detailed Clustering Grammars
• Clusters generate data
• Priors on cluster centers & variances
• Iterative through levels in a hierarchy
• Recursive through hierarchy
[Figure: clustering grammar diagram. Dataset → Cluster(1, y, σ), Cluster(2, y, σ), Cluster(3, y, σ) → Datum′((1,1), x(1,1)), Datum′((1,2), x(1,2)), …, Datum′((3,n), x(3,n)) → Datum(x1), Datum(x2), …, Datum(xN). Probability labels: Pr(σ), Pr(y), Pr((x − y)/σ), 1/N!]
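The generative reading of this grammar can be sketched as a sampler: draw each cluster's center and spread from priors, draw data around each center, then discard the cluster labels with a uniformly random permutation (each of the N! orderings equally likely). This is a hedged sketch only. The Gaussian prior on centers, the log-normal prior on σ, and all function and parameter names are assumptions chosen for illustration.

```python
import math
import random


def sample_dataset(n_clusters=3, points_per_cluster=4, seed=0):
    """Sample labeled data from clusters, then forget the labels by shuffling."""
    rng = random.Random(seed)
    labeled = []
    for c in range(1, n_clusters + 1):
        # Assumed priors: Gaussian on the center y, log-normal on the spread sigma.
        center = (rng.gauss(0.0, 10.0), rng.gauss(0.0, 10.0))
        sigma = math.exp(rng.gauss(0.0, 0.5))
        for n in range(1, points_per_cluster + 1):
            # (x - y) / sigma is standard normal in each coordinate.
            x = (center[0] + sigma * rng.gauss(0.0, 1.0),
                 center[1] + sigma * rng.gauss(0.0, 1.0))
            labeled.append(((c, n), x))
    # Uniform random permutation: the observer sees x1..xN without cluster labels.
    rng.shuffle(labeled)
    data = [x for _, x in labeled]
    hidden_labels = [label for label, _ in labeled]
    return data, hidden_labels


data, hidden_labels = sample_dataset()
```

Clustering is then the inverse problem: recover `hidden_labels`, the centers, and the spreads from `data` alone.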
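Rock Field Grammar
grammar rockfield() {
  start → {deposit(a, y_a, c_a, p_a) | a ∈ A}, distractors
    $\sum_a c_a^2 / 2\sigma_0^2 + \sum_a y_a^2 / 2\sigma_0^2 + \sum_a (p_a - \hat{p})^2 / 2\sigma_1^2 + \sum_a \log^2(\hat{\sigma}_a / \overset{\frown}{\sigma})$
  deposit(a, c_a, p_a) → {patch(a, b, x_ab, c_a, p_ab) | a ∈ A, b ∈ B_a}
    $\sum_a (y_{ab} - y_a)^2 / 2\hat{\sigma}_a^2$
    $p_{ab} \sim f(p_a, x_{ab} - x_a)$
  patch(a, b, x_ab, c_a, p_a) → {rock(x_abc, c_abc, s_abc) | a ∈ A, b ∈ B_a, c ∈ C_ab}
    $\sum_{ab} (c_{abc} - c_{ab})^2 / 2\sigma_4^2 + \sum_{ab} (y_{abc} - y_{ab})^2 / 2\sigma_5^2$
    $s_{abc} \sim \mathrm{sizedistr}(p_a)$
  distractors → {rock(x_00d, c_00d, s_00d) | d ∈ D}
    $\sum_{ab} c_{00d}^2 / 2\sigma_0^2 + \sum_{ab} x_{00d}^2 / 2\sigma_3^2$
    $s_{00d} \sim \mathrm{sizedistr}(\hat{p})$
  {rock(x_abc, c_abc, s_abc) | a ∈ A′, b ∈ B′_a, c ∈ C_ab} → {visible_rock(x_i, c_i, s_i) | i ∈ I}
    $\{P_{i,abc}\} \sim \mathrm{uniform\_permutations}(), \quad \sum_{i,abc} P_{i,abc} = n$
    $x_i = \sum_{abc} P_{i,abc}\, x_{abc}$
}
⇒ MFT
$E = \sum_a c_a^2 / 2\sigma_0^2 + \sum_a y_a^2 / 2\sigma_0^2 + \sum_a (p_a - \hat{p})^2 / 2\sigma_1^2 + \sum_a \log^2(\hat{\sigma}_a / \overset{\frown}{\sigma}) + \sum_a (y_{ab} - y_a)^2 / 2\hat{\sigma}_a^2 + \sum_{iab} P_{iab} \left[ (c_i - c_a)^2 / 2\sigma_4^2 + (x_i - y_{ab})^2 / 2\sigma_5^2 \right]$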
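To show how the correspondence variables enter the mean-field (MFT) energy, the sketch below evaluates only its last term: a sum over visible rocks i and patches (a, b) of P_iab times the squared color misfit over 2σ4² plus the squared position misfit over 2σ5². This is one reading of the slide's formula, not a definitive implementation. The data layout (dictionaries keyed by `(a, b)`), the function name, and the numbers are hypothetical.

```python
def match_energy(P, rocks, patches, deposit_color, sigma4, sigma5):
    """Assignment-weighted misfit between visible rocks and patches."""
    energy = 0.0
    for i, (x_i, c_i) in enumerate(rocks):
        for (a, b), y_ab in patches.items():
            color_term = (c_i - deposit_color[a]) ** 2 / (2 * sigma4 ** 2)
            dist2 = (x_i[0] - y_ab[0]) ** 2 + (x_i[1] - y_ab[1]) ** 2
            position_term = dist2 / (2 * sigma5 ** 2)
            # P[i] maps a patch key (a, b) to the soft assignment weight P_iab.
            energy += P[i].get((a, b), 0.0) * (color_term + position_term)
    return energy


# One visible rock at position (1, 0) with color 0.5 (hypothetical values).
rocks = [((1.0, 0.0), 0.5)]
# Two patches of deposit 0, keyed by (a, b), with patch centers y_ab.
patches = {(0, 0): (0.0, 0.0), (0, 1): (1.0, 0.0)}
deposit_color = {0: 0.3}

hard = [{(0, 0): 1.0}]               # rock assigned entirely to patch (0, 0)
soft = [{(0, 0): 0.5, (0, 1): 0.5}]  # rock shared equally between both patches

e_hard = match_energy(hard, rocks, patches, deposit_color, sigma4=0.1, sigma5=1.0)
e_soft = match_energy(soft, rocks, patches, deposit_color, sigma4=0.1, sigma5=1.0)
```

Moving assignment weight toward the closer patch lowers the energy, which is the quantity an optimizer over P and the patch parameters would trade off against the prior terms.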
Transcriptional Gene Regulation Networks
• Gene Regulation Network (GRN) model
[Figure labels: T, v, extracellular communication]
Drosophila eve stripe expression in model (right) and data (left). Green: eve expression, red: kni expression. From [Reinitz and Sharp, Mech. of Devel., 49:133-158, 1995]. [Mjolsness et al., J. Theor. Biol. 152:429-453, 1991]
Gene Regulation + Signal Transduction Network
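[Figure: cell diagram. Labels: transcriptional regulation targets, receptors, ligands, cell, nucleus, T]
$\frac{d v_a}{dt}(t) = \frac{1}{\tau_a} \left[ g(u_a + h_a) - \lambda_a v_a \right],$
where
$u_a(t) = \sum_b T_{ab} v_b(t) + \sum_{I \in \mathrm{Nbrs}} \sum_b {}^{I}\hat{T}_{ab}\, v_b^{I}(t) + \sum_{I \in \mathrm{Nbrs}} \sum_b \sum_c {}^{I}\tilde{T}^{1}_{ac} \tilde{T}^{2}_{cb}\, v_c(t)\, v_b^{I}(t)$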
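The single-cell part of this model, dv_a/dt = (1/τ_a)[g(u_a + h_a) − λ_a v_a] with u_a = Σ_b T_ab v_b, can be integrated numerically. The sketch below is a minimal forward-Euler simulation under stated assumptions: the slide does not specify g, so a logistic sigmoid is assumed; the neighbor-cell (signaling) terms are omitted; and the two-gene mutual-repression parameters are hypothetical.

```python
import math


def sigmoid(u):
    """Assumed saturating response g; the slide leaves g unspecified."""
    return 1.0 / (1.0 + math.exp(-u))


def simulate_grn(T, h, lam, tau, v0, dt=0.01, steps=5000):
    """Forward-Euler integration of the single-cell GRN equations."""
    v = list(v0)
    n = len(v)
    for _ in range(steps):
        # u_a: summed regulatory input to gene a from all gene products b.
        u = [sum(T[a][b] * v[b] for b in range(n)) for a in range(n)]
        dv = [(sigmoid(u[a] + h[a]) - lam[a] * v[a]) / tau[a] for a in range(n)]
        v = [v[a] + dt * dv[a] for a in range(n)]
    return v


# Hypothetical two-gene circuit in which each gene represses the other.
T = [[0.0, -6.0],
     [-6.0, 0.0]]
h = [3.0, 3.0]      # bias terms h_a
lam = [1.0, 1.0]    # decay rates lambda_a
tau = [1.0, 1.0]    # time constants tau_a
v0 = [0.6, 0.4]     # gene 0 starts slightly ahead

final = simulate_grn(T, h, lam, tau, v0)
residual = [abs(sigmoid(sum(T[a][b] * final[b] for b in range(2)) + h[a]) - lam[a] * final[a])
            for a in range(2)]
```

With these parameters the small initial advantage of gene 0 is amplified until gene 0 is on and gene 1 is off. Fitting T, h, λ, and τ to expression data is the regression problem referred to earlier in the talk.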
3-tier architecture
Database Access
Model Translation
Sigmoid Pathway Representation/Storage Database
Cellerator Simulation/Inference Engine
MENU
Interactive Graphic Model (SVG/Applet)
Graphic Output
OJB API
JLink API
PROPERTY
SOAP: Web Service
XML (Object), Image, via HTTP
Possible software support
• Machine learning (open source/academic)
  – CompClust (CIT/JPL):
    • Scripting/GUI dichotomy data point;
    • dataset views
  – WEKA data mining
  – Intel: PNL Probabilistic Networks Library
  – Future: stochastic grammar modeler
    • + autogeneration (as in Cellerator)
• Image processing, data environments
  – Matlab, IDL, Mathematica, Khoros/VisiQuest, …
  – NIHImage/ImageJ, …
Metadata in Systems Biology
• SBML
• Sigmoid UML
[Figure labels: CLV3, CLV1, WUS, ?]
Fletcher et al., Science 283, 1999; Brand et al., Science 289:617-619, 2000
SAM growth imagery
[Figure labels: PIN1, cell walls]