weekly report start learning gpu
Post on 14-Jan-2016
Weekly Report: Start learning GPU
Ph.D. Student: Leo Lee
Supervisor: Dr. Xiaowen Chu
Date: Sep. 11, 2009
Outline
Protein identification and pFind
GPU and data mining
Research Plan
Protein identification and pFind
Background
Identify flow
Challenges
Could GPU be used?
The Human Genome Project: China contributed 1%
Same gene, different protein
Human Plasma Proteome Project, USA
Human Disease Glycomics/Proteome Initiative (HGPI), Japan
Human Proteome Program: China in charge of the liver
Characteristics of the proteome
Mass Spectrometry Based Protein Identification
Flow: Mixed proteins -> Digest -> Mixed peptides -> LC-MS/MS -> Data -> Analyze -> Peptide sequence -> Merge -> Protein sequence
Example protein: >ipi|IPI00243451|IPI00243451.6 MDQHQHLNKTAESASSEKKKTRRCNGFKMFLAALSFSYIAKALGGIIMKISITQIERRFD…
Example peptides: TAESASSEKMFLAALSFSYIAK…
Tandem MS
Figure: tandem mass spectrum; x-axis m/z (600 to 1400), y-axis Relative Abundance (0 to 100). Scan header: 19-21-08 FT 893 MS2 9 avg #1 RT: 0.63 AV: 1 NL: 1.04E4 T: FTMS + p NSI Full ms2 893.60@30.00 [500.00-1600.00]. Labeled peaks include 928.6396 and 929.9735.
Web search engine vs. protein identification SE
Figure: pFind drawn as a search engine ("Go pFind"). The query is a spectrum (m/z axis 200 to 1200), the searched collection is the sequence database (…KFDTGIPDGFAGFFGHYAQGGITFRH EWTRJQIDF…), and the results are scored peptides (TAESA, MFLAALS, …FSYIAK).
Peptide index, sorted by mass:
400.15 EVDG
400.15 AAEE
400.15 PSTD
…
698.48 SVKKKK
699.78 TLKHLK
699.78 WDRDL
……
Query window: lower bound of mass 699.70, upper bound of mass 699.90
Query results:
699.78 TLKHLK
699.78 WDRDL
699.82 ELDGER
...
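The mass-window lookup on this slide can be sketched as a binary search over a mass-sorted peptide index. This is only an illustration, not pFind's implementation: the entries are the ones shown on the slide, `query_mass_window` is a hypothetical helper name, and ELDGER is added to the index because it appears in the slide's query results.

```python
from bisect import bisect_left, bisect_right

# Peptide index sorted by mass; entries copied from the slide.
INDEX = [
    (400.15, "EVDG"),
    (400.15, "AAEE"),
    (400.15, "PSTD"),
    (698.48, "SVKKKK"),
    (699.78, "TLKHLK"),
    (699.78, "WDRDL"),
    (699.82, "ELDGER"),
]
MASSES = [mass for mass, _ in INDEX]


def query_mass_window(lower, upper):
    """Return all (mass, peptide) entries with lower <= mass <= upper."""
    start = bisect_left(MASSES, lower)   # first entry with mass >= lower
    stop = bisect_right(MASSES, upper)   # one past the last entry with mass <= upper
    return INDEX[start:stop]


if __name__ == "__main__":
    for mass, peptide in query_mass_window(699.70, 699.90):
        print(f"{mass:.2f} {peptide}")
```

Because the index is sorted, each query costs two binary searches plus the size of the answer, which is why the peptide list is kept ordered by mass.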
Protein identification SE
Protein sequence database:
>IQPSKANMETEPDQ…
>DEAVPPPALQLQFN…
>RQRAILKVMNTIGGE……
Digest: the protein database is digested into peptides, listed by mass:
400 EVDG
400 AAEE
400 PSTD
698 SVKKKK
699 TLKHLK
699 WDRDL
……
Matching: the MS spectra are matched against the peptides.
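The "Digest" step can be sketched as an in-silico cleavage of each database protein. The slides do not name the enzyme; the rule below (cut after every K or R, as trypsin does, ignoring any proline exception) is an assumption, although it agrees with the example peptides TAESASSEK and MFLAALSFSYIAK shown earlier. `digest` and `min_length` are hypothetical names, and the input is only the visible prefix of the truncated example protein.

```python
def digest(protein, cleave_after="KR", min_length=1):
    """Split a protein sequence after every residue listed in cleave_after."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in cleave_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return [p for p in peptides if len(p) >= min_length]


# Visible prefix of the example protein from the slides (the original is truncated).
PREFIX = "MDQHQHLNKTAESASSEKKKTRRCNGFKMFLAALSFSYIAKALGGIIMKISITQIERRFD"

if __name__ == "__main__":
    for peptide in digest(PREFIX, min_length=6):
        print(peptide)
```

A real search engine would also compute each peptide's mass and allow missed cleavages; both are left out to keep the sketch short.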
Challenges of PISE
Figure: the PISE pipeline (MS; Protein database -> Digest -> Peptide -> Matching), with example proteins >IQPSKANMETEPDQ…, >DEAVPPPALQLQFN…, >RQRAILKVMNTIGGE… and peptides EVDG, AAEE, PSTD, SVKKKK, TLKHLK, WDRDL, ……
Generation speed keeps increasing
Proteins increase exponentially
PTMs lead to a huge number of peptides
E.g. phosphorylation
Amino acids S, T and Y (HPO3, 80 Da)
- May or may not happen at each site
- 2^5 = 32 possibilities for the five S/T/Y sites (each may carry PO3) of EMSVPSCQYILSATNR
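The combinatorial growth can be checked directly: EMSVPSCQYILSATNR has five S/T/Y residues, each either phosphorylated or not, which gives 2^5 = 32 forms. The sketch below enumerates them, using the slide's rounded 80 Da per HPO3 group; `phospho_forms` is a hypothetical helper name.

```python
from itertools import combinations

PHOSPHO_SITES = "STY"   # residues named on the slide
PHOSPHO_SHIFT = 80.0    # Da per HPO3 group, the slide's rounded value


def phospho_forms(peptide):
    """Yield (modified_positions, mass_shift) for every phosphorylation form."""
    sites = [i for i, aa in enumerate(peptide) if aa in PHOSPHO_SITES]
    for k in range(len(sites) + 1):
        for chosen in combinations(sites, k):
            yield chosen, k * PHOSPHO_SHIFT


forms = list(phospho_forms("EMSVPSCQYILSATNR"))
print(len(forms))  # 32
```

Every modified form is an extra candidate peptide to match, so a search that allows this one PTM multiplies the work for this peptide by 32.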
Identification of PTM
Peptide:
400 EVDG
400 AAEE
400 PSTD
631 EMSVPS
699 TLKHLK
699 WDRDL
……
Protein:
>IQPSKANMETEPDQ…
>DEAVPPPALQLQFN…
>RQRAILKVMNTIGGE……
Could GPU be used?
http://bioinformatics.oxfordjournals.org/cgi/content/full/25/15/1937
Protein identification on GPU
One thread per MS spectrum
One thread per score
One thread per “query”: V1 Match V2
Seems worth thinking about further!
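The "one thread per spectrum" mapping can be sketched on the CPU by writing the per-thread work as a function of a thread id and calling it once per id. Everything here is an assumption made for illustration: the shared-peak count is a stand-in scoring function (not pFind's scorer), the peak lists are made-up numbers, and `launch_threads` is a sequential substitute for a real GPU launch.

```python
def shared_peak_score(spectrum, theoretical, tol=0.5):
    """Count observed peaks that lie within tol of some theoretical peak."""
    return sum(
        any(abs(peak - t) <= tol for t in theoretical)
        for peak in spectrum
    )


def score_kernel(thread_id, spectra, candidates, out):
    """Work of one 'thread': score spectrum thread_id against its candidate."""
    out[thread_id] = shared_peak_score(spectra[thread_id], candidates[thread_id])


def launch_threads(kernel, n_threads, *args):
    """Sequential stand-in for a GPU launch: run the kernel once per thread id."""
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


# Hypothetical peak lists, one spectrum and one candidate per thread.
spectra = [[100.0, 200.0, 300.0], [150.0, 250.0]]
candidates = [[100.2, 300.4, 500.0], [151.0, 400.0]]
scores = [0] * len(spectra)
launch_threads(score_kernel, len(spectra), spectra, candidates, scores)
print(scores)
```

Each call writes only to its own slot of `scores` and reads shared inputs, so the iterations are independent; that independence is what would let a GPU run them as separate threads.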
Google hit counts, 2009.09.11
Query                          Results
CPU                            133,000,000
GPU                            13,800,000
GPGPU                          621,000
CUDA                           6,040,000
Genome GPU                     45,600
Proteomic GPU                  7,830
Protein GPU                    85,300
Protein identification GPU     3,450
Data mining on GPU             77,700
GPU and data mining
Characteristics of GPU: GPU VS CPU
CUDA
Data mining on GPU
Figure: GFLOPS over time, Jan 2003 to Jul 2007 (y-axis 0 to 600 GFLOPS). GPUs: NV30, NV35, NV40, G70, G70-512, G71, GeForce 8800 GTX, Quadro FX 5600, Tesla C870. CPUs: 3.0 GHz Pentium 4, 3.0 GHz Core 2 Duo, 3.0 GHz Core 2 Quad.
Source: based on slide 7 of S. Green, “GPU Physics,” SIGGRAPH 2007 GPGPU Course. http://www.gpgpu.org/s2007/slides/15-GPGPU-physics.pdf
GPU VS CPU
Design philosophies are different.
The GPU is specialized for compute-intensive, massively data-parallel computation (exactly what graphics rendering is about).
So, more transistors can be devoted to data processing rather than data caching and flow control.
The fast-growing video game industry exerts strong economic pressure for constant innovation
Figure: CPU vs. GPU chip layout, with blocks labeled Control, Cache, ALU, and DRAM.
What is the GPU Good at?
The GPU is good at data-parallel processing
- The same computation executed on many data elements in parallel: low control flow overhead
with high single-precision (SP) floating-point arithmetic intensity
- Many calculations per memory access
- Currently also need high floating point to integer ratio
High floating-point arithmetic intensity and many data elements mean that memory access latency can be hidden with calculations instead of big data caches. Still need to avoid bandwidth saturation!
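"Many calculations per memory access" can be made concrete as arithmetic intensity, the number of floating-point operations per byte moved. The sketch below compares two textbook cases that are not from the slides: SAXPY (`y[i] = a * x[i] + y[i]`) and a naive n x n matrix product. It assumes 4-byte single-precision words and the ideal case where each matrix is read or written exactly once.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations per byte of memory traffic."""
    return flops / bytes_moved


def saxpy_intensity(n, word=4):
    # y[i] = a * x[i] + y[i]: 2 flops per element; read x, read y, write y.
    return arithmetic_intensity(2 * n, 3 * n * word)


def matmul_intensity(n, word=4):
    # Naive n x n product: 2 * n**3 flops; ideal traffic reads A and B once
    # and writes C once, i.e. 3 * n * n words.
    return arithmetic_intensity(2 * n**3, 3 * n * n * word)


if __name__ == "__main__":
    print(f"SAXPY:         {saxpy_intensity(1_000_000):.3f} flops/byte")
    print(f"matmul n=1024: {matmul_intensity(1024):.1f} flops/byte")
```

SAXPY stays at 1/6 flop per byte regardless of size, so it is limited by memory bandwidth, while the matrix product's intensity grows as n/6 and leaves room to hide memory latency behind arithmetic.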
CUDA: no more shader functions.
CUDA: integrated CPU+GPU application C program
Serial or modestly parallel C code executes on CPU
Highly parallel SPMD kernel C code executes on GPU
CPU Serial Code
GPU Parallel Kernel (Grid 0): KernelA<<< nBlk, nTid >>>(args);
. . .
CPU Serial Code
GPU Parallel Kernel (Grid 1): KernelB<<< nBlk, nTid >>>(args);
. . .
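The launch syntax `Kernel<<< nBlk, nTid >>>(args)` starts nBlk blocks of nTid threads, and every thread runs the same kernel body on its own index. The sketch below imitates that SPMD indexing sequentially on the CPU; it is not CUDA code. `launch_grid` and `vec_add_kernel` are hypothetical names, and the global index formula mirrors the usual `blockIdx.x * blockDim.x + threadIdx.x` of a one-dimensional CUDA kernel.

```python
def launch_grid(kernel, n_blk, n_tid, *args):
    """Sequential stand-in for kernel<<<n_blk, n_tid>>>(args)."""
    for block_idx in range(n_blk):
        for thread_idx in range(n_tid):
            kernel(block_idx, n_tid, thread_idx, *args)


def vec_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """One thread adds one element, like the body of an SPMD kernel."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(out):                        # guard the partial last block
        out[i] = a[i] + b[i]


n = 10
n_tid = 4
n_blk = (n + n_tid - 1) // n_tid            # round up so every element is covered
a = list(range(n))
b = [10 * x for x in range(n)]
out = [0] * n
launch_grid(vec_add_kernel, n_blk, n_tid, a, b, out)
print(out)
```

With n = 10 and 4 threads per block, 3 blocks are launched and the last two threads of the final block do nothing, which is why the bounds check is needed.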
CUDA
Basic
Memory
Threads
Application performance
Data mining on GPU
K-means
K-nn
Apriori
SVM
K-means on GPU
A team at University of Virginia, led by Professor Skadron
HKUST and MSRA: GPUMiner
HP Labs
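For reference, the k-means loop these groups accelerate has two steps per iteration. The sketch below is a plain CPU version written for this summary; it is not the code of any of the teams listed, nor the k-means program mentioned in the research plan. `assign`, `update`, and `kmeans` are hypothetical names and the points are made-up.

```python
def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    def dist2(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # keep the centroid of an empty cluster
            continue
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members)
                  for d in range(len(old)))
        )
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assign/update rounds."""
    labels = assign(points, centroids)
    for _ in range(iterations):
        centroids = update(points, labels, centroids)
        labels = assign(points, centroids)
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

The assignment step handles every point independently, so it maps naturally to one GPU thread per point; the update step is a reduction over the points of each cluster and needs more care on a GPU.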
Experiments: GPUMiner
Experiments: HP Labs (HPL)
Data mining on GPU
The speed-up depends heavily on the implementation:
Data transfer
Memory
CPU-GPU cooperation
Research Plan
Keep reading related papers: GPU, data mining
Development
Read our k-means program
Try to speed it up
Try protein identification on GPU
Time schedule
Courses: Thu. 6:30-9:30pm, data mining
TA: Tue. 11:30am-12:20pm, Network security; Fri. 9:30-11:30am, Network security
Thank you for listening