Tutorial for PARK data fitting

Paul KIENZLE, Wenwu CHEN and Ziwen FU

Reflectometry Group

Objective: Distributed Computing Environment

[Diagram. Labels: User/Client; Service Server (master node, management); Cluster; Working Nodes (working server).]

Prerequisite

Python: version >= 2.4

Windows: cygwin

Client: wxPython (version >= 2.6), matplotlib

Most services may need numpy

Setup of park

• Download the source code:
  – Source code: svn co svn://[email protected]/park
  – Package for Unix/Linux: park-0.2.0.tar.gz or park-0.2.0.tar.bz2
  – Package for Windows: park-0.2.0.zip

• Edit the cluster config file:
  – park/config/hosts

• Start the service server:
  – park/servers/mapServer.py

• Start the client:
  – park/client/AppJob.py

• Provide services:
  – park/services

Setup of park in Unix/Linux

• Download park-0.2.0.tar.gz or park-0.2.0.tar.bz2 from http://danse.us

• Unzip the file:
    tar -xvzf park-0.2.0.tar.gz

• Make the installation:
    cd park-0.2.0
    make install
  or
    setup.py install --install-purelib=home_directory_of_park

The command make install is equivalent to setup.py install --install-purelib=~. It installs park in the directory ~/park.

Setup of park in Windows

• Download park-0.2.0.zip or park-0.2.0.tar.bz2 from http://danse.us

• Unzip the file:
    unzip park-0.2.0.zip

• Make the installation in an MS-DOS window:
    cd park-0.2.0
    setup.py install

It installs park in the directory ~/Lib/site-packages/park.

Edit the config file

The server makes use of park/config/hosts to configure the working nodes.

Example of park/config/hosts:

    #
    # hosts configure file for park
    # example for compufans.ncnr.nist.gov cluster:
    # 4 nodes, each node with 2 cpus
    #
    # the format is similar to that of /etc/hosts:
    # ip_address full_name alias_name[:port:number_of_cpus]
    #127.0.0.1 localhost.localdomain localhost:5300:2
    #172.16.255.251 n4.ncnr.nist.gov n4:6500:2
    #172.16.255.252 n3.ncnr.nist.gov n3:6300:2
    #172.16.255.253 n2.ncnr.nist.gov n2:6200:2
    #172.16.255.254 n1.ncnr.nist.gov n1:6100:2
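The hosts format described above (ip_address full_name alias_name[:port:number_of_cpus]) can be read with a few lines of Python. The following is only an illustrative sketch in Python 3, not PARK's own parser: the function name parse_hosts, the default values, and the assumption that active entries are the lines that do not start with # are all hypothetical.

```python
def parse_hosts(text, default_port=None, default_cpus=1):
    """Parse hosts-style text into a list of node dictionaries.

    Hypothetical helper: lines starting with '#' and blank lines are skipped;
    the ':port:number_of_cpus' suffix on the alias is treated as optional.
    """
    nodes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("expected 3 fields, got: %r" % raw)
        ip_address, full_name, alias_spec = fields
        parts = alias_spec.split(":")
        nodes.append({
            "ip": ip_address,
            "name": full_name,
            "alias": parts[0],
            "port": int(parts[1]) if len(parts) > 1 else default_port,
            "cpus": int(parts[2]) if len(parts) > 2 else default_cpus,
        })
    return nodes


# Sample text modeled on the slide, with two entries left uncommented.
SAMPLE_HOSTS = """\
# ip_address full_name alias_name[:port:number_of_cpus]
127.0.0.1 localhost.localdomain localhost:5300:2
172.16.255.254 n1.ncnr.nist.gov n1:6100:2
"""

hosts = parse_hosts(SAMPLE_HOSTS)
total_cpus = sum(node["cpus"] for node in hosts)
```

With the sample text, hosts holds two nodes and total_cpus counts the CPUs available across them.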

Start the server

The server is park/servers/mapServer.py:

    cd park/servers
    python mapServer.py

Or, under cygwin in Windows:

    cd Lib/site-packages/park/servers
    python mapServer.py

The full command is:

    python mapServer.py –port port –host host_name –log log_file_name

Start the server

• Make sure that Python and its environment are set up correctly.
• Make sure that RSH, defined in park/servers/environ.py, is set to the remote shell command for a cluster with multiple working nodes.
• Make sure that this remote shell command can start the remote command without a password.
• Make sure that the services are executable files.

Common errors:

• [Errno 2] No such file or directory: '~/park/config/hosts'
  – the hosts config file is missing.
• ERROR (111, 'Connection refused')
  – the working server did not start.
  – make sure that the port is not in use.
• ERROR (xxx, 'port is used')
  – wait a while before restarting the server.
  – make sure that the port is not in use.
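One way to check that a port is not in use before starting a server is to try binding to it. This is a minimal Python 3 sketch using only the standard socket module; the helper name port_is_free is hypothetical and is not part of PARK.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        # Typically "address already in use": another process holds the port.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    # 5400 is the default client port mentioned in this tutorial.
    print("port 5400 free:", port_is_free(5400))
```

The check is only a snapshot: another process can still take the port between the check and the server start.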

Stop the server

Shut down the service server with Ctrl-C or the kill command. Use kill without the -9 option, so that the working server programs are stopped as well. Otherwise the working servers continue to run even after the service server has been killed.
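The reason plain kill works while kill -9 does not is that SIGTERM and Ctrl-C (SIGINT) can be caught by a handler that cleans up child processes, whereas SIGKILL cannot be caught at all. The Python 3 sketch below illustrates that general pattern; it is not PARK's actual shutdown code, and the names start_worker, stop_workers, and install_handlers are hypothetical.

```python
import signal
import subprocess
import sys

workers = []  # child processes started by this (hypothetical) server


def start_worker(cmd):
    """Start a child process and remember it for shutdown."""
    proc = subprocess.Popen(cmd)
    workers.append(proc)
    return proc


def stop_workers():
    """Ask every child that is still running to terminate, then wait for it."""
    for proc in workers:
        if proc.poll() is None:
            proc.terminate()
    for proc in workers:
        proc.wait()


def handle_signal(signum, frame):
    """Runs on SIGTERM or SIGINT; never runs on SIGKILL (kill -9)."""
    stop_workers()
    sys.exit(0)


def install_handlers():
    signal.signal(signal.SIGTERM, handle_signal)
    signal.signal(signal.SIGINT, handle_signal)
```

A server would call install_handlers() once at start-up, before launching its workers with start_worker().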

Start the client

• Enter ~/park/client
• Run the client application:
    $ python AppJob.py
• Connect to the server:
  – server > server | port (default port is 5400)
  – click the connect button to connect to the server.
• Prepare and submit the service request:
  – shell > load: load an XML service request, which is shown in the upper text field
  – click the submit button to submit the service request
  – messages related to the service request are shown in the lower text field.
• View the service results:
  – view: view the results.
• Three types of data can be viewed: experimental data (with error bars), simulation data, and chi square. The experimental and simulation data show only the best results, and the chi square plot shows the improvement of chisq during the fit. Under the panel is a toolbar, which can be used to zoom in/out, save the figure, and change the figure's properties (property button).
• Shut down the client:
  – server > disconnect, then close the window
  – or close the window directly.

Map-reduce parallel pattern

• Map: the master node assigns working unit [i] to working node [j]:
  – map(fn, input[i]) = output[i], computed on working node j
• Reduce: the master node collects the messages from each working node, performs the reduce function, and sends the result to the user:
  – reduce(gn, output[0], …, output[n]) => sent to the user client

[Diagram: Service Server (mapping) → Working Nodes → Service Server (reducing)]
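The pattern above can be sketched on a single machine in Python 3, with a thread pool standing in for the working nodes. This is only an illustration of map followed by reduce, not PARK's scheduler; map_reduce and the toy chisq function are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def map_reduce(fn, gn, inputs, n_workers=4):
    """Apply fn to every input in parallel, then reduce all outputs with gn."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map keeps the outputs in the same order as the inputs
        outputs = list(pool.map(fn, inputs))
    return gn(outputs)


def chisq(x):
    """Toy objective: squared distance from 3.0."""
    return (x - 3.0) ** 2


# Map: evaluate chisq for each trial value. Reduce: keep the smallest one.
best = map_reduce(chisq, min, [1.0, 2.0, 3.5, 5.0])
```

Here gn receives the whole list of outputs at once, which matches the reduce(gn, output[0], …, output[n]) form on the slide.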

Service request

<?xml version='1.0' encoding='UTF-8'?>
<session version='2.0.1' type='7' user='wwchen' email='[email protected]' priority='0'>
  <group name='group1'>
    <dataSet>
    </dataSet>
    <reduce classname='Chisq'/>
    <task cmd='longwinstr.py'>
      <bufsize value='3000'/>
      <home value='/home/wwchen/dansesrc/park/services/tester'/>
      <cwd value='/home/wwchen/dansesrc/park/servers/tester'/>
    </task>
    <joblist name='job1' priority='4' cnt='4'>
      <input count='24'>
      </input>
    </joblist>
  </group>
</session>

In this request, <reduce> selects the reduce function, <task> selects the map function, and <input> supplies the inputs.
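To see how the parts of a service request fit together, the request can be inspected with xml.dom.minidom, the same module the map example in this tutorial uses. The Python 3 sketch below is illustrative only: summarize_request is a hypothetical helper, and REQUEST is a shortened version of the request above (XML declaration, user details, and some task settings omitted).

```python
from xml.dom import minidom

# Shortened request modeled on the slide.
REQUEST = """<session version='2.0.1' type='7' priority='0'>
  <group name='group1'>
    <reduce classname='Chisq'/>
    <task cmd='longwinstr.py'>
      <bufsize value='3000'/>
    </task>
    <joblist name='job1' priority='4' cnt='4'>
      <input count='24'></input>
    </joblist>
  </group>
</session>"""


def summarize_request(xml_text):
    """Return one dict per <group>: reduce class, map command, input counts."""
    session = minidom.parseString(xml_text).documentElement
    summary = []
    for group in session.getElementsByTagName("group"):
        reduce_el = group.getElementsByTagName("reduce")[0]
        task_el = group.getElementsByTagName("task")[0]
        inputs = group.getElementsByTagName("input")
        summary.append({
            "group": group.getAttribute("name"),
            "reduce": reduce_el.getAttribute("classname"),
            "map": task_el.getAttribute("cmd"),
            "counts": [int(el.getAttribute("count")) for el in inputs],
        })
    return summary


groups = summarize_request(REQUEST)
```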

Software Infrastructure of PARK for data fitting

[Diagram. Components: User Interface; Service Server; Services; Working Nodes. Roles: Scientist; View Developer; Reduce Service Developer; Model Developer. Activities: data view; data presentation; data reduction; data simulation.]

Reduce function

The class inherits from park/services/reduce/reduce.Reduce.

    class Reduce:
        """ A base class as the reduce function. """

        def __init__(self):
            """ constructor. """
            self.archive = None
            self.msgqueue = None

        def setArchive(self, archive):
            """ set the archive to store data """
            self.archive = archive

        def setMsgQueue(self, msgqueue):
            """ set the message queue. """
            self.msgqueue = msgqueue

        def __call__(self, msg):
            """ called by the PARK to process the reply from the working node. """
            pass

An example of a Reduce function

park/services/reduce/Chisq.Chisq:

    class Chisq(Reduce):
        """ A class to handle the chisq for data fitting. """

        def __init__(self):
            """ constructor. """
            Reduce.__init__(self)
            self.chisq = None

        def __call__(self, reply):
            # store every reply in the archive, keyed by group id and job id
            keys = {}
            keys['gid'] = reply.gid
            keys['jid'] = reply.id
            self.archive.put(keys, str(reply))

            if hasattr(reply, 'chisq'):
                # value reported by the working node (assumed to be reply.chisq)
                chisqval = reply.chisq
                # keep track of the smallest chisq seen so far
                if self.chisq is None:
                    self.chisq = chisqval
                elif chisqval < self.chisq:
                    self.chisq = chisqval
                # XML_HEADER is defined elsewhere in PARK (not shown here)
                self.msgqueue.putMsg(reply.gid,
                    '%s<reply gid="%s" update="%s" chisq="%s"/>'
                    % (XML_HEADER, str(reply.gid), str(reply.id), str(chisqval)))
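The reduce protocol (setArchive, setMsgQueue, then one __call__ per reply) can be tried out without a running PARK server. The Python 3 sketch below uses in-memory stand-ins; ListArchive, ListMsgQueue, Reply, and BestChisq are hypothetical names that only mimic the methods used on the slides. As a design choice for this sketch, a message is queued only when chisq improves.

```python
class ListArchive:
    """Stand-in archive: keeps (keys, value) pairs in a list."""
    def __init__(self):
        self.records = []

    def put(self, keys, value):
        self.records.append((dict(keys), value))


class ListMsgQueue:
    """Stand-in message queue: keeps (gid, message) pairs in a list."""
    def __init__(self):
        self.messages = []

    def putMsg(self, gid, msg):
        self.messages.append((gid, msg))


class Reply:
    """Stand-in for a working-node reply; chisq is optional."""
    def __init__(self, gid, id, chisq=None):
        self.gid = gid
        self.id = id
        if chisq is not None:
            self.chisq = chisq

    def __str__(self):
        return "Reply(gid=%s, id=%s)" % (self.gid, self.id)


class BestChisq:
    """Reduce-style callable that tracks the smallest chisq seen so far."""
    def __init__(self):
        self.archive = None
        self.msgqueue = None
        self.chisq = None

    def setArchive(self, archive):
        self.archive = archive

    def setMsgQueue(self, msgqueue):
        self.msgqueue = msgqueue

    def __call__(self, reply):
        self.archive.put({"gid": reply.gid, "jid": reply.id}, str(reply))
        if hasattr(reply, "chisq"):
            if self.chisq is None or reply.chisq < self.chisq:
                self.chisq = reply.chisq
                self.msgqueue.putMsg(
                    reply.gid,
                    '<reply gid="%s" update="%s" chisq="%s"/>'
                    % (reply.gid, reply.id, reply.chisq))


reducer = BestChisq()
reducer.setArchive(ListArchive())
reducer.setMsgQueue(ListMsgQueue())
for job_id, value in enumerate([5.0, 7.0, 3.0]):
    reducer(Reply("group1", job_id, value))
```

After the loop, every reply has been archived, and the queue holds one message per improvement.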

map function

1. The pure Python function.
   - Runs as a thread in PARK.
   - Poor scalability on SMP (due to Python's multithreading implementation).
   - Only works for pure Python functions.
   Format: output_string function_name(input_string)

2. The executable program.
   - Runs as a separate process in PARK.
   - Excellent scalability on SMP.
   - Works for any executable program.
   - Needs more memory and has a longer start-up time.
   Reads its input from standard input and writes the results to standard output.
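The two forms can be compared side by side in a small Python 3 sketch: a string-in, string-out function, and an executable that is fed the input on standard input and whose standard output is captured. This only illustrates the two calling conventions, not how PARK launches them; upper_map, run_executable_map, and CHILD_CMD are hypothetical names.

```python
import subprocess
import sys


def upper_map(input_string):
    """Form 1: a pure Python map function, string in and string out."""
    return input_string.upper()


def run_executable_map(cmd, input_string):
    """Form 2: run an executable, pass the input on stdin, return its stdout."""
    result = subprocess.run(
        cmd, input=input_string, capture_output=True, text=True, check=True)
    return result.stdout


# A tiny stand-in executable: the Python interpreter running a one-liner
# that copies stdin to stdout in upper case.
CHILD_CMD = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]

from_function = upper_map("abc")
from_process = run_executable_map(CHILD_CMD, "abc")
```

Both calls produce the same string; the second one pays for starting a new process, which is the memory and start-up cost noted above.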

An example of a map function

park/services/tester/longwinstr.py:

    if __name__ == '__main__':
        try:
            longwin()
        except:
            sys.stderr.write('Exception:%s' % (sys.exc_info()[1]))

An example of a map function

    import sys
    import math
    from xml.dom import minidom

    def longwin():
        print 'call longwin'
        # the job input arrives on standard input as an XML element
        s0 = sys.stdin.read()
        node = minidom.parseString(s0).childNodes[0]
        t = int(node.getAttribute('count'))
        if t > 25:
            count = t
        else:
            count = 2**t

        print ' Start work with iteration number: ', t
        # busy loop that stands in for real work
        cnt = 0
        while (cnt < count):
            a = math.sqrt(2.0)
            cnt += 1
        print ' finish work: cnt=', cnt
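The slide code uses Python 2 print statements. For readers on Python 3, the same logic might be written as below, with the input and output streams passed in so that the function can be exercised without a real standard input. This port is an assumption-laden sketch, not the file shipped with PARK; the stream parameters and the return value are additions made for illustration.

```python
import io
import math
from xml.dom import minidom


def longwin(stream_in, stream_out):
    """Read an <input count='N'/> element, spin for a while, report progress."""
    request = stream_in.read()
    node = minidom.parseString(request).documentElement
    t = int(node.getAttribute("count"))
    # Same rule as the slide: small values are treated as a power of two.
    count = t if t > 25 else 2 ** t

    stream_out.write("start work with iteration number: %d\n" % t)
    cnt = 0
    while cnt < count:
        math.sqrt(2.0)  # placeholder for real work
        cnt += 1
    stream_out.write("finish work: cnt=%d\n" % cnt)
    return cnt


# In-memory streams stand in for sys.stdin and sys.stdout.
log = io.StringIO()
iterations = longwin(io.StringIO("<input count='4'/>"), log)
```

To run it as a PARK-style executable, the entry point would call longwin(sys.stdin, sys.stdout).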

Fully Distributed Services?

[Diagram. Labels: User Client; Service Register; Cluster Management; Service Management; Job Queue; Message Queue; Data Fetching; Archive; Logging; Task Management; Services; Shared Files.]

Pull or put?

[Diagram. Labels: Job Server; Working Server; Message Server.]

1. The job server sends jobs to the working server, and the working server sends results to the message server.
2. The job server sends jobs to the working server, and the message server retrieves results from the working server.
3. The working server retrieves jobs from the job server and sends results to the message server.
4. The working server retrieves jobs from the job server, and the message server retrieves results from the working server.
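Option 3 (the working server pulls jobs and puts results) can be sketched in Python 3 with two in-process queues standing in for the job server and the message server. This is a toy model of the data flow only, with no networking; working_server and run_jobs are hypothetical names.

```python
import queue
import threading


def working_server(job_server, message_server, fn):
    """Pull jobs until a None sentinel arrives; put each result on the message server."""
    while True:
        job = job_server.get()                      # pull from the job server
        if job is None:
            break
        job_id, payload = job
        message_server.put((job_id, fn(payload)))   # put to the message server


def run_jobs(jobs, fn, n_workers=2):
    job_server = queue.Queue()
    message_server = queue.Queue()
    for job in enumerate(jobs):
        job_server.put(job)
    for _ in range(n_workers):
        job_server.put(None)  # one stop sentinel per working server

    threads = [
        threading.Thread(target=working_server,
                         args=(job_server, message_server, fn))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # All workers have finished, so the message queue is complete.
    results = {}
    while not message_server.empty():
        job_id, output = message_server.get()
        results[job_id] = output
    return [results[i] for i in range(len(jobs))]


squares = run_jobs([1, 2, 3, 4], lambda x: x * x)
```

Because each working server asks for the next job only when it is idle, faster nodes naturally take more jobs; results may arrive out of order, so they are keyed by job id.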

Security: authentication and authorization

[Diagram. Labels: Security Server; Job Server; Working Server; Message Server.]

Data Transfer

1. Provide a data center server for the cluster, which retrieves data from the remote data server and stores it for access by the local working nodes. This is necessary for diskless nodes in the cluster.

2. Provide a reference to the remote data (similar to a URL), and let each working node access the data individually.


UI/Visualization

MVC model

Traits-UI

2D/3D

Multi-tier of PARK

[Diagram. Labels: Service Server; Working Server; Reduce Server; Data Server; Client; Server. Legend: explicit direct connection; implicit direct connection; possible connection.]

All components act as both server and client.
