Experiments with Distributed Training of Neural Networks on the Grid


Page 1: Experiments with Distributed Training of Neural Networks on the Grid

Maciej Malawski¹, Marian Bubak¹,², Elżbieta Richter-Wąs³,⁴, Grzegorz Sala³,⁵, Tadeusz Szymocha³

¹ Institute of Computer Science AGH, Mickiewicza 30, 30-059 Kraków, Poland
² Academic Computer Centre CYFRONET, Nawojki 11, 30-950 Kraków, Poland
³ Institute of Nuclear Physics, Polish Academy of Sciences, Kraków, Poland
⁴ Institute of Physics, Jagiellonian University, Kraków, Poland
⁵ Faculty of Physics and Applied Computer Science AGH, Kraków, Poland

{bubak,malawski}@agh.edu.pl, [email protected], [email protected], [email protected]

Testbed for our experiments: the EGEE project
• Virtual Organization for Central Europe
• Grid sites: CYFRONET Kraków, PSNC Poznań, KFKI Budapest, CESNET Prague, TU Kosice
• Support for MPI applications

Why neural networks
• Once trained, they are efficient and accurate
• Applicable to classification and prediction
• Proven in a wide range of applications

Challenges
• Neural network training is a highly compute-intensive task – it may need High Performance Computing
• Finding the optimal configuration may be time-consuming: many experiments with various parameters – this may need High Throughput Computing

Solution: the Grid
• Distributing the computation over a cluster of machines can significantly reduce computation time.
• Utilizing the resources (multiple clusters) available on the Grid can make this task less time-consuming for the researcher.

Target application
• High Energy Physics
• Discrimination between signal and background events coming from the particle detector (simulation)
• ROOT and Athena as the basic data analysis tools

Observation
Training neural networks on the Grid requires many repeated tasks:
• job preparation,
• submission,
• monitoring of status,
• gathering of results.
Performing them manually is time-consuming for the researcher → tools that automate such tasks can facilitate the whole process considerably (a sketch of such an automation loop follows below).
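A minimal sketch of what such automation could look like, assuming the EDG command-line tools deployed on EGEE (edg-job-submit and friends); the JDL fields, file names and the train.sh training script are illustrative, not our actual tool:

#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

int main() {
    // One training job per parameter configuration to explore,
    // here the hidden-layer size (values are illustrative).
    const std::vector<int> hiddenSizes = {5, 10, 20, 40};
    for (const int h : hiddenSizes) {
        const std::string name = "train_h" + std::to_string(h);

        // Job preparation: write a JDL description for this configuration.
        std::ofstream jdl(name + ".jdl");
        jdl << "Executable    = \"train.sh\";\n"
            << "Arguments     = \"--hidden " << h << "\";\n"
            << "StdOutput     = \"std.out\";\n"
            << "StdError      = \"std.err\";\n"
            << "InputSandbox  = {\"train.sh\", \"data.root\"};\n"
            << "OutputSandbox = {\"std.out\", \"std.err\", \"net.root\"};\n";
        jdl.close();

        // Submission: -o appends the returned job ID to a file, so the
        // jobs can later be polled with edg-job-status and their output
        // sandboxes fetched with edg-job-get-output once they are Done.
        const std::string cmd = "edg-job-submit -o jobids.txt " + name + ".jdl";
        std::system(cmd.c_str());
    }
    return 0;
}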

Our Goals
• Develop tools that facilitate using the Grid for multiple classification experiments
• Investigate and validate algorithms for distributed neural network training
• Allow seamless integration with data analysis tools such as ROOT (see the ROOT sketch below)
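As an illustration of the kind of ROOT integration we aim for, a short macro using ROOT's TMultiLayerPerceptron class can train a signal/background classifier directly on a ROOT tree; the file, tree and branch names and the network layout below are hypothetical:

#include <cstdio>
#include "TFile.h"
#include "TTree.h"
#include "TMultiLayerPerceptron.h"

void trainNet() {
    // Events with input branches "pt", "eta" and a "type" branch
    // that is 1 for signal and 0 for background (all hypothetical).
    TFile f("events.root");
    TTree* tree = (TTree*)f.Get("events");

    // Two inputs, one hidden layer of 8 neurons, "type" as the target;
    // even entries are used for training, odd entries for testing.
    TMultiLayerPerceptron net("pt,eta:8:type", tree,
                              "Entry$%2==0", "Entry$%2==1");
    net.Train(200, "text,update=10");

    // Once trained, the network scores new events directly:
    Double_t params[2] = {42.0, 1.1};   // hypothetical pt and eta values
    std::printf("NN output = %f\n", net.Evaluate(0, params));
}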

[Figure: Distributed backpropagation algorithm. The master reads the training data and the network details and distributes the data to the nodes; each node (Node 1, Node 2, …, Node i) calculates the error and updates the weights; this repeats until the stopping criterion is met.]
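Below is a minimal sketch of the data-parallel scheme in this diagram, written against MPI (which the testbed sites support). A toy linear neuron stands in for the real network and the synthetic data are purely illustrative: each rank holds a shard of the training set, computes a local gradient, and MPI_Allreduce sums the gradients (and the error driving the stopping criterion) before every weight update, so all replicas stay in step.

#include <mpi.h>
#include <cmath>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nW = 3;                  // toy "network": one linear neuron
    std::vector<double> w(nW, 0.0);    // weights, identical on every rank
    const double lr = 0.05;            // learning rate

    // Hypothetical local shard; the real master would distribute
    // detector events to the nodes instead.
    const int nLocal = 200;
    std::vector<double> xs(nLocal * nW), ys(nLocal);
    for (int i = 0; i < nLocal; ++i) {
        for (int j = 0; j < nW; ++j)
            xs[i * nW + j] = std::sin(0.1 * (rank * nLocal + i) * (j + 1));
        ys[i] = xs[i * nW] + 2 * xs[i * nW + 1] - xs[i * nW + 2];
    }

    for (int epoch = 0; epoch < 100; ++epoch) {
        // Each node: forward pass and gradient of the squared error
        // on its own shard (a real network would backpropagate here).
        std::vector<double> grad(nW, 0.0);
        double localErr = 0.0;
        for (int i = 0; i < nLocal; ++i) {
            double out = 0.0;
            for (int j = 0; j < nW; ++j) out += w[j] * xs[i * nW + j];
            const double diff = out - ys[i];
            localErr += diff * diff;
            for (int j = 0; j < nW; ++j) grad[j] += diff * xs[i * nW + j];
        }

        // Sum the gradients and the error over all nodes.
        MPI_Allreduce(MPI_IN_PLACE, grad.data(), nW,
                      MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double globalErr = 0.0;
        MPI_Allreduce(&localErr, &globalErr, 1,
                      MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        // Identical update on every rank; stop when the error is small.
        for (int j = 0; j < nW; ++j) w[j] -= lr * grad[j] / (nLocal * size);
        if (globalErr < 1e-8) break;
    }

    if (rank == 0)
        std::printf("weights: %.3f %.3f %.3f\n", w[0], w[1], w[2]);
    MPI_Finalize();
    return 0;
}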

[Figure: Distributed Levenberg-Marquardt algorithm. The master reads the training data and the network details and distributes the data to the nodes; each node (Node 1, Node 2, …, Node i) computes its Jacobian and Hessian contributions; the master updates the weights; this repeats until the stopping criterion is met.]
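The Levenberg-Marquardt variant exchanges Jacobian/Hessian contributions instead of plain gradients. The sketch below reuses the toy linear model of the previous example: each node accumulates its shard's contribution to H ≈ JᵀJ and g = Jᵀe, MPI_Reduce sums them at the master, which solves the damped system (H + λI)Δw = g and broadcasts the updated weights; the solver and data layout are illustrative only.

#include <mpi.h>
#include <cmath>
#include <cstdio>
#include <vector>

// Tiny Gaussian-elimination solver for the (n x n) damped normal
// equations; adequate for the few dozen weights of a small network.
static std::vector<double> solve(std::vector<double> A,
                                 std::vector<double> b, int n) {
    for (int c = 0; c < n; ++c) {
        int p = c;                               // partial pivoting
        for (int r = c + 1; r < n; ++r)
            if (std::fabs(A[r * n + c]) > std::fabs(A[p * n + c])) p = r;
        for (int j = 0; j < n; ++j) std::swap(A[c * n + j], A[p * n + j]);
        std::swap(b[c], b[p]);
        for (int r = c + 1; r < n; ++r) {
            const double f = A[r * n + c] / A[c * n + c];
            for (int j = c; j < n; ++j) A[r * n + j] -= f * A[c * n + j];
            b[r] -= f * b[c];
        }
    }
    std::vector<double> x(n);
    for (int c = n - 1; c >= 0; --c) {
        double s = b[c];
        for (int j = c + 1; j < n; ++j) s -= A[c * n + j] * x[j];
        x[c] = s / A[c * n + c];
    }
    return x;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nW = 3;
    std::vector<double> w(nW, 0.0);
    const double lambda = 1e-3;        // damping factor

    // Same hypothetical shard as in the backpropagation sketch.
    const int nLocal = 200;
    std::vector<double> xs(nLocal * nW), ys(nLocal);
    for (int i = 0; i < nLocal; ++i) {
        for (int j = 0; j < nW; ++j)
            xs[i * nW + j] = std::sin(0.1 * (rank * nLocal + i) * (j + 1));
        ys[i] = xs[i * nW] + 2 * xs[i * nW + 1] - xs[i * nW + 2];
    }

    for (int iter = 0; iter < 20; ++iter) {
        // Each node: local contributions to H = J^T J and g = J^T e.
        // For this linear model the Jacobian row of sample i is x_i.
        std::vector<double> H(nW * nW, 0.0), g(nW, 0.0);
        for (int i = 0; i < nLocal; ++i) {
            double out = 0.0;
            for (int j = 0; j < nW; ++j) out += w[j] * xs[i * nW + j];
            const double e = ys[i] - out;
            for (int r = 0; r < nW; ++r) {
                g[r] += xs[i * nW + r] * e;
                for (int c = 0; c < nW; ++c)
                    H[r * nW + c] += xs[i * nW + r] * xs[i * nW + c];
            }
        }

        // The master sums the per-node H and g ...
        std::vector<double> Hs(nW * nW, 0.0), gs(nW, 0.0);
        MPI_Reduce(H.data(), Hs.data(), nW * nW,
                   MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        MPI_Reduce(g.data(), gs.data(), nW,
                   MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        // ... solves (H + lambda*I) dw = g and updates the weights ...
        if (rank == 0) {
            for (int j = 0; j < nW; ++j) Hs[j * nW + j] += lambda;
            const std::vector<double> dw = solve(Hs, gs, nW);
            for (int j = 0; j < nW; ++j) w[j] += dw[j];
        }
        // ... then redistributes them so the next pass starts in sync.
        MPI_Bcast(w.data(), nW, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    if (rank == 0)
        std::printf("weights: %.3f %.3f %.3f\n", w[0], w[1], w[2]);
    MPI_Finalize();
    return 0;
}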

[Figure: Job submission on the Grid. The user calls Submit() and the Grid dispatches the job to one of the available clusters (Cluster A, Cluster B, Cluster C).]