Experiments with Distributed Training of Neural Networks on the Grid
Maciej Malawski (1), Marian Bubak (1,2), Elżbieta Richter-Wąs (3,4), Grzegorz Sala (3,5), Tadeusz Szymocha (3)

(1) Institute of Computer Science AGH, Mickiewicza 30, 30-059 Kraków, Poland
(2) Academic Computer Centre CYFRONET, Nawojki 11, 30-950 Kraków, Poland
(3) Institute of Nuclear Physics, Polish Academy of Sciences, Kraków, Poland
(4) Institute of Physics, Jagiellonian University, Kraków, Poland
(5) Faculty of Physics and Applied Computer Science AGH, Kraków, Poland

{bubak,malawski}@agh.edu.pl, [email protected], [email protected], [email protected]
Testbed for our experiments: EGEE project
• Virtual Organization for Central Europe
• Grid sites: CYFRONET Kraków, PSNC Poznań, KFKI Budapest, CESNET Prague, TU Košice
• Support for MPI applications
Why neural networks
• Once trained, they are efficient and accurate
• Applicable for classification and prediction
• Proven in a wide range of applications
Challenges
• Neural network training is a highly compute-intensive task – may need High Performance Computing
• Finding an optimal configuration may be time-consuming: many experiments with various parameters – may need High Throughput Computing
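The High Throughput side of the problem can be pictured as a parameter sweep: each combination of network settings becomes one independent training job. The sketch below is a minimal illustration (the parameter names and value grids are hypothetical, not taken from the experiments):

```python
from itertools import product

# Hypothetical parameter grid: every combination of settings
# becomes one independent training job that can run on the Grid.
hidden_neurons = [5, 10, 20]
learning_rates = [0.01, 0.1]
momenta = [0.0, 0.9]

jobs = [
    {"hidden": h, "lr": lr, "momentum": m}
    for h, lr, m in product(hidden_neurons, learning_rates, momenta)
]
# 3 * 2 * 2 = 12 independent training jobs to run concurrently
```

Because the jobs are mutually independent, they can be submitted to many Grid sites at once, which is exactly where High Throughput Computing pays off.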
Solution: The Grid
• Distributing the computation over a cluster of machines can significantly reduce the computation time.
• Utilizing resources (multiple clusters) available on the Grid makes this task less time-consuming for the researcher.
Target application: High Energy Physics
• Discrimination between signal and background events coming from the particle detector (simulation)
• ROOT and Athena as the basic data analysis tools
Observation
Training of neural networks on the Grid requires many repeated tasks:
• job preparation,
• submission,
• status monitoring,
• gathering of results.
Performing them manually is time-consuming for the researcher → tools that automate such tasks can facilitate the whole process considerably.
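For illustration, a single training job on the EGEE testbed would be described by a JDL (Job Description Language) file along the lines of the sketch below. The attribute names follow the EGEE/gLite JDL conventions; the script and file names are purely illustrative, not taken from the experiments:

```
Executable    = "train_network.sh";
Arguments     = "--hidden 10 --epochs 500";
JobType       = "MPICH";
NodeNumber    = 4;
StdOutput     = "train.out";
StdError      = "train.err";
InputSandbox  = {"train_network.sh", "training_data.root"};
OutputSandbox = {"train.out", "train.err", "weights.dat"};
```

Preparing, submitting, and tracking many such descriptions by hand is exactly the repetitive work that the tools are meant to automate.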
Our goals
• Develop tools facilitating the usage of the Grid for multiple classification experiments
• Investigate and validate algorithms for distributed neural network training
• Allow seamless integration with data analysis tools such as ROOT
Backpropagation algorithm (master-worker scheme)
[Diagram: The Master reads the training data and the network details, then distributes the data to the nodes. Each node (Node 1, Node 2, ..., Node i) calculates the error and updates the weights. The loop repeats until the stopping criterion is met.]
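The master-worker loop above can be sketched in a few lines of Python. This is a minimal simulated sketch, not the actual implementation: the worker nodes run sequentially in one process (the real experiments used MPI across Grid sites), the "network" is a single sigmoid neuron, and the data are a synthetic toy set:

```python
import numpy as np

# Master reads the training data and the network details.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))          # toy training patterns
y = (X.sum(axis=1) > 0).astype(float)  # toy binary targets
w = np.zeros(4)                        # weights of a single sigmoid neuron

def node_gradient(x_shard, y_shard, w):
    """Each node calculates the error gradient on its local data shard."""
    pred = 1.0 / (1.0 + np.exp(-x_shard @ w))        # sigmoid output
    return x_shard.T @ (pred - y_shard) / len(y_shard)

# Master distributes the data to the nodes (3 simulated nodes here).
shards = np.array_split(np.arange(len(X)), 3)

lr = 0.5
for epoch in range(200):               # "repeat until stopping criterion is met"
    grads = [node_gradient(X[idx], y[idx], w) for idx in shards]
    w -= lr * np.mean(grads, axis=0)   # master averages gradients, updates weights

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(float)
accuracy = float((pred == y).mean())
```

In a real MPI deployment the shard distribution would be a scatter and the gradient averaging an all-reduce; the loop structure stays the same.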
Levenberg-Marquardt algorithm (master-worker scheme)
[Diagram: The Master reads the training data and the network details, then distributes the data to the nodes. Each node (Node 1, Node 2, ..., Node i) computes its Jacobian and Hessian contributions; the Master updates the weights. The loop repeats until the stopping criterion is met.]
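The distributed Levenberg-Marquardt step can also be sketched in simulated form. In this minimal sketch (not the actual implementation) each node computes its local Jacobian and the resulting J^T J (approximate Hessian) and J^T e contributions; the master sums them and solves the damped system to update the weights. The nodes run sequentially, and the model is a toy noiseless linear fit so the example stays short:

```python
import numpy as np

# Master reads the training data; toy model: y = 2*x - 0.5 (noiseless).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(90, 1))
y = 2.0 * X[:, 0] - 0.5
w = np.zeros(2)                        # parameters [slope, intercept]

def node_contribution(x_shard, y_shard, w):
    """Each node computes its local Jacobian -> J^T J and J^T e parts."""
    J = np.column_stack([x_shard, np.ones(len(x_shard))])  # d(residual)/dw
    e = (J @ w) - y_shard                                  # residuals
    return J.T @ J, J.T @ e

# Master distributes the data to the nodes (3 simulated nodes here).
shards = np.array_split(np.arange(len(X)), 3)

lam = 1e-3                             # damping factor (illustrative value)
for step in range(20):                 # "repeat until stopping criterion is met"
    H = np.zeros((2, 2))
    g = np.zeros(2)
    for idx in shards:                 # gather per-node Jacobian/Hessian parts
        Hi, gi = node_contribution(X[idx], y[idx], w)
        H += Hi
        g += gi
    # Master solves the damped normal equations and updates the weights.
    w -= np.linalg.solve(H + lam * np.eye(2), g)
```

The key point of the scheme is that only the small J^T J matrices and J^T e vectors travel to the master, not the raw data, which keeps the per-iteration communication independent of the training-set size.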
Grid job submission
[Diagram: The User calls Submit(); the job is dispatched to the Grid, which distributes the work across Cluster A, Cluster B, and Cluster C.]