Posted on 22-Dec-2015
Parallel C3M
Aylin Tokuç, Erkan Okuyan, Özlem Gür
Outline
• Basics of Parallel computing
• Sequential C3M
• Parallel C3M
Parallel Computation
Decomposition: the process of dividing a computation into smaller parts.
Task: a programmer-defined unit of computation into which the main computation is subdivided by decomposition.
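As a small illustration of these two terms, the Python sketch below (not part of the original slides) decomposes one computation, the column sums of the sample document-term matrix D used later in this talk, into tasks that each handle a block of rows. The helper names and the block size of 2 are assumptions made for the example; threads are used only to show the structure, since CPython threads will not speed up CPU-bound work like this.

```python
from concurrent.futures import ThreadPoolExecutor

# Sample document-term matrix (5 documents x 6 terms).
D = [
    [0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1],
]

def decompose(matrix, block_size):
    """Decomposition: split the rows into blocks; each block is one task."""
    return [matrix[i:i + block_size] for i in range(0, len(matrix), block_size)]

def column_sums_task(rows):
    """One task (hypothetical helper): column sums over its block of rows."""
    sums = [0] * len(rows[0])
    for row in rows:
        for k, value in enumerate(row):
            sums[k] += value
    return sums

tasks = decompose(D, block_size=2)
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    partials = list(pool.map(column_sums_task, tasks))

# Combine the per-task results into the answer of the whole computation.
total = [sum(col) for col in zip(*partials)]
print(total)  # [3, 2, 2, 2, 1, 4]
```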
Parallel Computation: Primary Considerations
• Load Balancing
• Minimizing Communication
• Task Dependency Optimization
Parallel Computation: Load Balancing
Parallel Computation: Minimizing Communication
Parallel Computation: Task Dependency Optimization
C3M Algorithm
1- Determine the cluster seeds of the database.
2- If d_i is not a cluster seed, find the cluster seed (if any) that maximally covers d_i and assign d_i to that seed's cluster.
3- If there remain unclustered documents, group them into a ragbag cluster.
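Steps 2 and 3 can be sketched in Python as follows, assuming step 1 has already produced the seeds and their seed powers. In the usual C3M formulation, "d_i is covered by seed s" is measured by the cover coefficient c_is, and ties between seeds go to the one with the larger seed power. The function name, the 0-based document numbering, the chosen seeds and the seed-power values are illustrative assumptions, not taken from the slides.

```python
def assign_to_seeds(C, seeds, seed_power):
    """Steps 2-3 of C3M, given the cover-coefficient matrix C and the seeds.

    A non-seed document i joins the seed s with the largest c_is (the seed
    that covers it most); ties go to the seed with the larger seed power.
    Documents covered by no seed (all c_is == 0) form the ragbag cluster.
    """
    seed_set = set(seeds)
    clusters = {s: [s] for s in seeds}
    ragbag = []
    for i in range(len(C)):
        if i in seed_set:
            continue
        best = max(C[i][s] for s in seeds)
        if best == 0:
            ragbag.append(i)
            continue
        candidates = [s for s in seeds if C[i][s] == best]
        winner = max(candidates, key=lambda s: seed_power[s])
        clusters[winner].append(i)
    return clusters, ragbag

# Rounded C from the sample-matrix slide; documents are numbered from 0 here.
C = [
    [.375, 0, .125, .375, .125],
    [0, .417, .417, 0, .167],
    [.083, .278, .361, .083, .194],
    [.188, 0, .063, .563, .188],
    [.083, .111, .194, .250, .361],
]
seeds = [3, 2]                     # assumed result of step 1
seed_power = {3: 0.211, 2: 0.164}  # illustrative values, used only for ties
clusters, ragbag = assign_to_seeds(C, seeds, seed_power)
print(clusters, ragbag)  # {3: [3, 0, 4], 2: [2, 1]} []
```

With d4 and d3 (indices 3 and 2) as seeds, d1 and d5 join d4, d2 joins d3, and no document falls into the ragbag.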
C3M Formulas
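The formulas on this slide did not survive extraction. As a reference point, these are the standard sequential C3M definitions (Can and Ozkarahan) for an m × n document-term matrix D whose rows are documents and whose columns are terms; the seed-power formula and the relation c_ii = δ_i that appear on later slides use this notation.

$$\alpha_i = \Big(\sum_{j=1}^{n} d_{ij}\Big)^{-1}, \qquad \beta_k = \Big(\sum_{i=1}^{m} d_{ik}\Big)^{-1}$$
$$c_{ij} = \alpha_i \sum_{k=1}^{n} d_{ik}\,\beta_k\,d_{jk}, \qquad \delta_i = c_{ii}, \qquad \psi_i = 1 - \delta_i$$
$$\delta'_j = \beta_j \sum_{i=1}^{m} d_{ij}^{2}\,\alpha_i, \qquad \psi'_j = 1 - \delta'_j$$
$$n_c = \sum_{i=1}^{m} \delta_i, \qquad P_i = \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij}\,\delta'_j\,\psi'_j$$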
C3M – Sample Matrices
D (5 documents × 6 terms):
0 0 0 1 0 1
1 1 0 0 0 0
1 1 0 0 0 1
0 0 1 1 1 1
1 0 1 0 0 1
C (5 × 5 cover-coefficient matrix of D):
.375  0     .125  .375  .125
0     .417  .417  0     .167
.083  .278  .361  .083  .194
.188  0     .063  .563  .188
.083  .111  .194  .250  .361
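The sketch below computes C from D with the cover-coefficient definition c_ij = α_i Σ_k d_ik β_k d_jk, where α_i is the reciprocal of row i's sum and β_k the reciprocal of column k's sum. For example, d1 contains terms 4 and 6, so c_11 = α_1(β_4 + β_6) = ½(½ + ¼) = 0.375. Run on the sample D, the sketch reproduces the C matrix above to three decimals; the function name is an assumption.

```python
def cover_coefficients(D):
    """Cover-coefficient matrix C of a document-term matrix D (m x n).

    c_ij = alpha_i * sum_k d_ik * beta_k * d_jk, where alpha_i is the reciprocal
    of row i's sum and beta_k the reciprocal of column k's sum.
    Assumes every row and every column has a non-zero sum.
    """
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]
    return [
        [alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))
         for j in range(m)]
        for i in range(m)
    ]

D = [
    [0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1],
]
C = cover_coefficients(D)
for row in C:
    print(" ".join(f"{c:.3f}" for c in row))
```

Each row of C sums to 1, and the diagonal gives δ_i. Python formats exact ties such as 0.0625 and 0.5625 with round-half-even, so a few printed values end in 2 where the slide shows 3.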
Parallel C3M – Distribution
Distribute rows among processors
Load balancing by cyclic block distribution
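A block-cyclic assignment of rows can be written as a one-line ownership rule: cut the rows into blocks and deal the blocks to processors round-robin. The slides do not give the block size or the indexing, so `block_size`, the helper names and the numbers in the example are assumptions.

```python
def block_cyclic_owner(row, block_size, num_procs):
    """Processor that owns `row`: rows are cut into blocks of `block_size`
    and the blocks are dealt to the processors round-robin."""
    return (row // block_size) % num_procs

def local_rows(rank, num_rows, block_size, num_procs):
    """Global indices of the rows (documents) stored on processor `rank`."""
    return [r for r in range(num_rows)
            if block_cyclic_owner(r, block_size, num_procs) == rank]

# 10 documents, blocks of 2 rows, 3 processors.
for rank in range(3):
    print(rank, local_rows(rank, 10, 2, 3))
# 0 [0, 1, 6, 7]
# 1 [2, 3, 8, 9]
# 2 [4, 5]
```

Compared with one contiguous block per processor, dealing small blocks round-robin tends to even out the work when the cost of a row (its number of non-zeros) varies systematically across the collection.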
Local Calculations
All processors calculate α, partial β and P_i.
The current method for a weighted matrix is too costly: it needs column vectors, but the matrix is partitioned row-wise.
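This sketch shows the local step on a row-wise partition. Here "partial β" is read as each processor's partial column sums, which must be added across processors before β_k = 1 / (column sum k) can be formed; that reading is an assumption, as are the helper name and the split. For brevity the example splits the sample D contiguously rather than block-cyclically.

```python
def local_alpha_and_partial_beta(local_D):
    """Per-processor step on a row-wise partition of D.

    alpha_i = 1 / (sum of row i) is complete, because a processor owns whole rows.
    Column sums are only partial: other processors hold the other rows, so the
    final beta_k = 1 / (total column sum k) needs an exchange.
    """
    alpha = [1.0 / sum(row) for row in local_D]
    partial_col_sums = [sum(col) for col in zip(*local_D)]
    return alpha, partial_col_sums

# Processor 0 owns documents 0-2 of the sample D, processor 1 owns 3-4.
p0 = [[0, 0, 0, 1, 0, 1], [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1]]
p1 = [[0, 0, 1, 1, 1, 1], [1, 0, 1, 0, 0, 1]]
alpha0, part0 = local_alpha_and_partial_beta(p0)
alpha1, part1 = local_alpha_and_partial_beta(p1)
print(part0, part1)  # [2, 2, 0, 1, 0, 2] [1, 0, 2, 1, 1, 2]
```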
Seed Powers P_i
• The seed power P_i should be small for a document whose terms appear in too many or too few documents.
• The seed power P_i should be larger for a document whose terms appear in a moderate number of documents.
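A sequential sketch of the seed power, assuming the standard C3M definitions: δ_i = c_ii, ψ_i = 1 - δ_i, and δ'_j, ψ'_j the corresponding coefficients of term j computed over the columns of D. The function name is an assumption. Note that δ'_j needs a whole column of D, which a processor holding only some rows does not have locally.

```python
def seed_powers(D):
    """Sequential C3M seed powers for a document-term matrix D (m x n):
    P_i = delta_i * psi_i * sum_j d_ij * delta'_j * psi'_j.
    Assumes no empty rows or columns.
    """
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]
    beta = [1.0 / sum(D[i][j] for i in range(m)) for j in range(n)]
    # delta_i = c_ii: how much document i is covered by itself.
    delta = [alpha[i] * sum(D[i][k] ** 2 * beta[k] for k in range(n))
             for i in range(m)]
    # delta'_j: the same quantity for term j; it needs the whole column j.
    delta_t = [beta[j] * sum(D[i][j] ** 2 * alpha[i] for i in range(m))
               for j in range(n)]
    return [
        delta[i] * (1 - delta[i])
        * sum(D[i][j] * delta_t[j] * (1 - delta_t[j]) for j in range(n))
        for i in range(m)
    ]

D = [
    [0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1],
]
P = seed_powers(D)
ranking = sorted(range(len(P)), key=lambda i: P[i], reverse=True)
print(ranking)  # [3, 2, 4, 1, 0]
```

On the sample D, document d4 (index 3) has the largest seed power (about 0.211), followed by d3 (index 2, about 0.164).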
Minimize Communication - Proposed Heuristic
Proposed heuristic: a min(1, ·) expression over a sum for k = 1, …, m, based on the number of non-zeros (the exact formula is not legible in the source).
$$P_i = \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij}\,\delta'_j\,\psi'_j$$
All processors calculate α, partial β and β’
Effectiveness of Heuristic
• A MATLAB script was written to evaluate the effectiveness of the proposed heuristic.
• Correlation coefficient = 0.95
Communication between Processors
• Partial β and β’ vectors are exchanged between processors to calculate the final β and β’ vectors.
• Then, all processors calculate c_ii = δ_i.
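The exchange on this slide is a sum-reduction: each processor contributes its partial column sums, every processor receives the totals (what an MPI Allreduce with a SUM operation would provide), and each then computes δ_i = c_ii for its own rows, which needs only the local row and the global β. The sketch below simulates this in a single Python process; the helper names and the two-processor split are assumptions, and β’ is left out because its heuristic formula is not reproduced here.

```python
def combine_partials(partials):
    """Element-wise sum of per-processor partial column sums
    (the result every processor would receive from an allreduce)."""
    return [sum(values) for values in zip(*partials)]

def local_deltas(local_D, beta):
    """delta_i = c_ii = alpha_i * sum_k d_ik^2 * beta_k for the local rows only."""
    return [
        (1.0 / sum(row)) * sum(d * d * b for d, b in zip(row, beta))
        for row in local_D
    ]

# Sample D split over two hypothetical processors (rows 0-2 and rows 3-4).
proc_rows = [
    [[0, 0, 0, 1, 0, 1], [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1]],
    [[0, 0, 1, 1, 1, 1], [1, 0, 1, 0, 0, 1]],
]
partials = [[sum(col) for col in zip(*rows)] for rows in proc_rows]
col_sums = combine_partials(partials)
beta = [1.0 / s for s in col_sums]
deltas = [local_deltas(rows, beta) for rows in proc_rows]
print(col_sums)  # [3, 2, 2, 2, 1, 4]
```

The resulting δ values match the diagonal of the sample C matrix.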
Number of Clusters
• Processors exchange their local δ values.
• All processors calculate n_c.
$$n_c = \sum_{i=1}^{m} c_{ii} = \sum_{i=1}^{m} \delta_i$$
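Once the local δ values are shared, n_c is their total. A minimal sketch, using the diagonal of the sample C split across two hypothetical processors; the function name is an assumption.

```python
def number_of_clusters(local_deltas_per_proc):
    """n_c = sum_i c_ii = sum_i delta_i over all documents, after each
    processor has shared its local delta values."""
    return sum(sum(local) for local in local_deltas_per_proc)

# Diagonal of the sample C: processor 0 holds d1-d3, processor 1 holds d4-d5.
nc = number_of_clusters([[0.375, 5 / 12, 13 / 36], [0.5625, 13 / 36]])
print(round(nc, 3))  # 2.076
```

So the sample collection would get about two clusters. Because only the total is needed for n_c itself, reducing one partial sum per processor would also suffice for this step.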
Cluster-head Selection
• Calculate the seed power of the local documents.
• Exchange the largest n_c local seed powers.
• Find the n_c largest seed powers among all P_i; the corresponding documents are the cluster heads.
$$P_i = \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij}\,\delta'_j\,\psi'_j$$
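Exchanging only the n_c largest local seed powers is enough to find the global top n_c: any document in the global top n_c must also be in the top n_c of the processor that holds it. The sketch below uses `heapq.nlargest` for both the local step and the merge; the (seed power, document id) pairs are rounded seed powers for the sample D under the standard C3M formula, and the two-processor split and helper names are assumptions.

```python
import heapq

def local_top(local_powers, nc):
    """Each processor keeps only its nc largest (seed power, doc id) pairs."""
    return heapq.nlargest(nc, local_powers)

def global_cluster_heads(exchanged, nc):
    """Merge the exchanged candidate lists and keep the overall nc largest."""
    merged = [pair for candidates in exchanged for pair in candidates]
    return [doc for _, doc in heapq.nlargest(nc, merged)]

# (seed power, global document id) pairs held by two processors.
proc0 = [(0.1085, 0), (0.1168, 1), (0.1637, 2)]
proc1 = [(0.2110, 3), (0.1553, 4)]
nc = 2
heads = global_cluster_heads([local_top(p, nc) for p in (proc0, proc1)], nc)
print(heads)  # [3, 2]
```

Each processor sends at most n_c pairs instead of all of its seed powers, which is what keeps this step's communication small.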
Clustering Non-seed Documents
• Exchange seed documents.
• Cluster the non-seed documents in each processor, as in sequential C3M.
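Assigning a local non-seed document d_i needs c_is = α_i Σ_k d_ik β_k d_sk for every seed s, that is, the document's own row, the global β, and the rows of the seed documents; this is why the seeds are exchanged first. A sketch under those assumptions, with hypothetical helper names and the sample D's documents 2 and 3 assumed as seeds; unlike sequential C3M, ties between seeds are broken arbitrarily here rather than by seed power.

```python
def cover(row_i, row_s, beta):
    """c_is = alpha_i * sum_k d_ik * beta_k * d_sk: how much document i is
    covered by seed s. Needs only row i, row s and the global beta."""
    return (1.0 / sum(row_i)) * sum(a * b * c for a, b, c in zip(row_i, beta, row_s))

def cluster_local(local_docs, seed_docs, beta):
    """local_docs and seed_docs map global doc id -> row of D.
    Returns doc id -> seed id, or None for the ragbag cluster."""
    assignment = {}
    for doc, row in local_docs.items():
        if doc in seed_docs:
            assignment[doc] = doc
            continue
        scores = {s: cover(row, srow, beta) for s, srow in seed_docs.items()}
        best = max(scores, key=scores.get)
        assignment[doc] = best if scores[best] > 0 else None
    return assignment

beta = [1 / 3, 1 / 2, 1 / 2, 1 / 2, 1.0, 1 / 4]              # global beta
seed_docs = {2: [1, 1, 0, 0, 0, 1], 3: [0, 0, 1, 1, 1, 1]}  # exchanged seed rows
proc0 = {0: [0, 0, 0, 1, 0, 1], 1: [1, 1, 0, 0, 0, 0], 2: [1, 1, 0, 0, 0, 1]}
proc1 = {3: [0, 0, 1, 1, 1, 1], 4: [1, 0, 1, 0, 0, 1]}
result0 = cluster_local(proc0, seed_docs, beta)
result1 = cluster_local(proc1, seed_docs, beta)
print(result0)  # {0: 3, 1: 2, 2: 2}
print(result1)  # {3: 3, 4: 3}
```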
Future Work
• Term-based clustering
• Overlapping clusters
C3M Summary
• Load balancing with cyclic block distribution
• Communication minimized by a new heuristic
• Task dependency minimized with block distribution and the heuristic
$$P_i = \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij}\,\delta'_j\,\psi'_j$$
References
• F. Can, E. A. Ozkarahan. Concepts and the effectiveness of the cover-coefficient-based clustering methodology.
• E. C. Jensen, S. M. Beitzel, A. J. Pilotto, N. Goharian, O. Frieder. Parallelizing the Buckshot Algorithm for Efficient Document Clustering.
• A. S. Ruocco, O. Frieder. Clustering and Classification of Large Document Bases in a Parallel Environment.
• I. S. Dhillon, J. Fan, Y. Guan. Efficient Clustering of Very Large Document Collections.
Questions?
The End
Thank you for your patience