Grid Computing: Harnessing Underutilized Resources

UNCW Department of Chemistry & Biochemistry Seminar
September 24, 2004

Ned H. Martin

Outline

Definition of Grid computing

A brief history of computing

Growth of computing power

Rationale for Grid computing

How a Grid works

Examples of Grid projects

Grid computing in NC

Limitations of Grid computing

UNCW Grid initiative: GridNexus

What’s next?

Definition of Grid Computing

Grid computing is a form of distributed computing that involves the coordinated and controlled sharing of diverse computing, application, data, storage, or network resources across dynamic and geographically dispersed multi-institutional virtual organizations.

A user of Grid computing does not need to have the data and the software on the same computer, and neither must be on the user’s home (login) computer.

Grid Computing

The term Grid computing suggests a computing paradigm similar to an electric power grid - a variety of resources contribute power into a shared "pool" for many consumers to access on an as-needed basis.

Background of Grid Computing

The idea of Grid computing resulted from the confluence of three developments:
– The proliferation of largely unused computing resources (especially desktop computers)
– Their greatly increased CPU speed in recent years
– The widespread availability of fast, universal network connections (the Internet).

Brief History of Computing

1943: "I think there is a world market for maybe 5 computers." -Thomas Watson, chairman of IBM

1947: Testudo: The very first computer in the Netherlands; the relay-based machine was 5 m long. Adding took 30 s and multiplication 45 s.

Brief History of Computing

1949: "Computers in the future may weigh no more than 1.5 tons." -Popular Mechanics, forecasting the relentless march of science

1957: "I have traveled the length and breadth of this country and talked with the best people, and I can assure you that data processing is a fad that won't last out the year." -The business book editor for Prentice Hall.

Brief History of Computing

1977: "There is no reason anyone would want a computer in their home." -Ken Olsen, president, chairman and founder of Digital Equipment Corp.

1980: "DOS addresses only 1 Megabyte of RAM because we cannot imagine any applications needing more." -Microsoft on the development of DOS.

1981: "640k ought to be enough for anybody." -Bill Gates

Brief History of Computing

1979: Intel's 16-bit 8086 (introduced in 1978) was too expensive, so a version with an 8-bit external data bus, the 8088, was developed; IBM chose it for the first IBM PC. Available clock frequencies went up to 10 MHz, and the instruction set had about 300 operations. At introduction the fastest processor was the 8 MHz version, which achieved 0.8 MIPS (0.8 x 10^6 instructions per second) and contained 29,000 transistors.

Brief History of Computing

1982: Intel 80286 released. It supported clock frequencies of up to 20 MHz. At introduction the fastest version ran at 12.5 MHz, achieved 2.7 MIPS and contained 134,000 transistors.

1985: Intel 80386 DX released. It supported clock frequencies of up to 33 MHz. At the date of release the fastest version ran at 20 MHz and achieved 6.0 MIPS. It contained 275,000 transistors.

Brief History of Computing

1989: Intel 80486 DX released. It contained the equivalent of about 1.2 million transistors. At the time of release the fastest version ran at 25 MHz and achieved up to 20 MIPS. Later versions had clock speeds up to 100 MHz.

1993: Intel Pentium released. At that time it was only available in 60 & 66 MHz versions, which achieved up to 100 MIPS, with over 3.1 million transistors.

Brief History of Computing

1995: Pentium Pro released. At introduction it achieved a clock speed of up to 200 MHz. It achieved 440 MIPS and contained 5.5 million transistors - this was nearly 2400 times as many as the first microprocessor in 1971 - and was capable of 70,000 times as many instructions per second.

2004: Pentium 4 chips available with clock speeds of up to 3.6 GHz, providing 11,356 MIPS and containing 125,000,000 transistors.

2005: 500,000,000 transistors!!!

Growth of Computing Power

[Chart: MIPS (vertical axis, 0 to 14,000) versus year (horizontal axis, 1960 to 2010), with a second series labeled "ts/104" in the legend; the year 2004 is marked.]

Rationale for Grid Computing

The proliferation of largely unused computing resources (especially desktop computers, of which 152 million were sold in 2003).

Their greatly increased CPU speed in recent years (now >3 GHz).

The widespread availability of fast, universal network connections (the Internet).

Rationale for Grid Computing

High performance computers (formerly called supercomputers) are very expensive to buy and maintain.

Much of the recent enhancement of computing power has come through the application of multiple CPUs to a problem (e.g., NCSC had a 720-processor IBM parallel computer).

Many computing tasks relegated to these (especially massively parallel) computers could be performed by a “divide and conquer” strategy using many more, although slower, processors, as are available on a Grid.

How a Grid Works

The term "grid computing" suggests a computing paradigm similar to an electric power grid - a variety of resources contribute power into a shared "pool" for many consumers to access on an as-needed basis.

Ideally the user does not know or care where the computing operation is being performed; the process is invisible to the user.

Middleware handles security, authentication, authorization, resource selection and routing of input and output seamlessly.

Examples of Grid Projects

SETI@home

DNet (distributed.net)

GRID.ORG (anti-cancer ligand screening)

IBM Smallpox cure

Entropia.org

CERN

Grid Projects: SETI@home

SETI@home
– A large-scale search through data gathered by radio telescopes in Puerto Rico for evidence of extraterrestrial life
– Involved more than 3 million computers averaging about 14 teraflops, or 14 trillion floating point operations per second
– Utilized over 500,000 years of processing time in the past year and a half.

Grid Projects: DNet

DNet (distributed.net)
– Began in 1997 as the first general-purpose distributed computing network on the Internet
– Highly successful in bringing individuals together to complete cryptographic challenges via a distributed environment.
– Equivalent to more than 160,000 Pentium II 266 MHz computers working 24 hours a day, 7 days a week, 365 days a year!
– The core distributed.net development team joined United Devices in 2000.

Grid Projects: GRID.ORG

The United Devices Cancer Research Project (GRID.ORG) will advance research to uncover new cancer drugs through the combination of chemistry, computers, and specialized software.

The research centers on proteins that have been determined to be a possible target for cancer therapy. Through a process called "virtual screening", LigandFit docking software by Accelrys identifies molecules that interact with these proteins, and determines which ones have a high likelihood of being developed into a drug.

In the first year and a half, over 3.5 million drug candidates were screened using over a million personal computers.

Grid Projects: Smallpox Cure

Smallpox cure
– To help find a cure for smallpox, IBM and a group of partners harnessed the processing power of 2 million idle PCs. They then screened 35 million drug compounds against smallpox proteins to find the most effective cure.

Grid Projects: Entropia

In 1997, Entropia applied idle computers worldwide to problems of scientific interest. In just two years, this network grew to encompass 30,000 computers with an aggregate speed of over one teraflops. Among its several scientific achievements is the identification of the largest known prime number.

Grid Projects: CERN

CERN
– By 2005, detectors at the Large Hadron Collider at CERN, the European Laboratory for Particle Physics, will produce several petabytes of data per year - a million times the storage capacity of a desktop computer
– Just the basic data analysis requires 20 teraflops of computing power (the fastest supercomputer produces 3 teraflops).
– More sophisticated analyses will need orders of magnitude more computing power

Grid Computing in NC

NCBioGrid (www.ncbiogrid.org/), an outgrowth of the High Performance Computing and Data Storage Focus Group of the NC Genomics and Bioinformatics Consortium

NC Computing Grid – now includes 7 universities plus MCNC; UNCW will be joining soon

UNCW Grid – started as a grid for UNCW bioinformatics/genomics research, expanded now into chemistry and business applications.

Limitations of Grid Computing

Currently, although efforts are being made to standardize protocols (e.g., Globus toolkit and Avaki), interacting with Grid services remains a complex process.

Most of the existing applications that access Grid services require the user to type cumbersome commands, often using a command-line interface.

Creating new clients and services requires programming in a language such as C or Java and using a host of libraries for interacting with Open Grid Services Infrastructure, Grid Security Infrastructure, Web Services Description Language and other standards.

Limitations of Grid Computing

These tools and techniques are useful to a select group of computing specialists; however, the only way to make Grid resources accessible to a wide range of users is to provide a relatively simple graphical user interface (GUI).

The UNCW Grid project proposes to develop a Graphical Grid User Interface that is easy to use and can access a wide range of applications.

Our hope is to create an interface to Grid computing that accomplishes what Internet browsers (Netscape and Internet Explorer) did to open up the WWW.

UNCW Grid Initiative: GridNexus

This initiative grew in part out of a need for HPC resources following the closure of the NCSC in June 2003, coupled with the availability of faculty with software programming expertise and others with computing applications that could benefit from use of a Grid.

The UNC-OP funded UNCW’s proposal for $557,634 over two years to develop Grid portals (GUI middleware to allow users to access software on computers on a Grid).

UNCW Grid Initiative: GridNexus

The UNCW Grid Computing Project is a two-year collaborative project among a multi-discipline, multi-investigator core research team at UNCW and several discipline-focused researchers at partner institutions: NCSU, WCU, NCCU, ECU, and CFCC. The research areas and institutional interests of this project are:
– Advanced Grid Software Development (UNCW)
– Computational Chemistry (UNCW and ECU)
– Bioinformatics (UNCW, NCSU, and NCCU)
– Combinatorics (UNCW)
– Business Computing (UNCW and NCCU)
– Education and Training (UNCW, WCU, CFCC)

This project proposes to develop a Grid interface that is easy to use and may be used by a wide range of applications and users. We have developed an innovative graphical user interface (GUI) for grid applications. In particular, we introduced a new scripting language (JXPL) designed for web-based services and a GUI for creating scripts, and have demonstrated the use of these tools with grid services.

UNCW Grid Initiative: GridNexus

UNCW’s initiative is unique in that it involves undergraduate students as the main players in the development of the Grid portal (GUI).

Undergraduate computer science students are partnered with faculty and students in application areas (chemistry, biology, business) to develop graphical front-ends to access services (programs) on computers on the Grid.

Grid portals are being developed for the two computational chemistry programs (Gaussian 03 and DMol) most often used in research by our faculty and students.

Resources of UNCW Grid

Beowulf cluster – 16 PIII processors in Computer Sciences Department

Fire and FireDev servers plus disk storage devices

PQS Quantum Cube – 8 CPU cluster with PQS and Gaussian 03 computational chemistry software, plus TCP-Linda environment.

An 8 processor IBM blade cluster with 0.5 TB disk storage will be added soon.

Other computers may be added, including the possibility of using all computing lab computers, or possibly even all faculty/staff computers (when not in use).

Remote Computing before Grid

Now, to submit a quantum chemistry calculation to a remote computer, e.g., at NCSU, one must:

1. Telnet to the remote computer and log in (separate login and password for each user account and for each computer)
2. FTP the input data file from the local computer to the remote machine (requires login, password)
3. Create and edit an input file for the job (using vi or another text editor)
4. Create a .job file, edit it if necessary
5. Select a queue based on # CPUs and time required; submit the .job file
6. Check progress of the calculation periodically: telnet to the remote machine; look for a file that indicates completion of the job.
7. FTP the output file to the local computer
8. Open the output file in a text editor, examine numerical data
9. Open the output file in a commercial program on the local computer to visualize the structure

Remote Computing on a Grid

In the future, using Grid middleware to submit a quantum chemistry calculation to a remote computer at NCSU:

1. Log in to the Grid (single user login and password to access ANY Grid resource)
2. Select a data file and job parameters from pull-down menus; click to submit (the .input and .job files are created automatically by Grid middleware, and the job is submitted automatically to an appropriate available computer)
3. Upon completion of the computation, the output file is automatically sent to the local computer to visualize the structure (which can also be automated).

Development of a Grid Portal

The objective is to make accessing HPC resources (wherever they may be located) easy for scientists who are not computer savvy.

Most computation involves doing various mathematical operations on a dataset.

A GUI approach is employed, in which the user, after a single login that checks authentication and authorization, can create a ‘workflow’ of functions/operations graphically by connecting boxes dragged from a series of lists of options, then applying that series of steps to a dataset.

Such a ‘workflow’ can be saved for subsequent application to another dataset.

Development of a Grid Portal

Job submission: Ideally in a grid, the grid middleware should select the ‘best’ resource – those computers that are available, capable, and have the software needed to handle the job. The user need not select – nor know – where the computation is taking place. In fact, the job may even be passed from one computer to another for various aspects of the calculation.

The output is returned to the user’s workstation or account, rather than the user having to access and download the output file from a remote computer.

UNCW’s Grid Portal: GridNexus

3 main application types: genomics/bioinformatics, business and chemistry

Chemistry resources on UNCW Grid:
– PQS Quantum Cube – 8 CPU cluster with PQS and Gaussian 03 computational chemistry software and TCP-Linda
– Beowulf Cluster – 16 CPU cluster with Gaussian 03 computational chemistry software and TCP-Linda
– Soon to be added: IBM blade server with 8 or 16 CPUs; Gaussian 03 will be installed on it.
– Java script for file transformation, e.g., to convert a HyperChem file into a Gaussian 03 input file

Quantum Chemistry Portal

A GUI is under development to allow a user to select the following from pull-down menus within ‘boxes’ that are linked into a ‘workflow’:
– Data input file
– Transform to another file type if necessary
– Level of calculation: HF, DFT, MP2, etc.
– Basis set: 6-31G(d,p), 6-311++G(2d,p), etc.
– Number of processors needed
– CPU time requested
– Keywords: opt, nmr, freq, pop=npa, etc.
– Charge and multiplicity

Design of UNCW Grid GUI

Select from pull-down menus in categories:
– Data sets (Windows Explorer-like file browser)
– File Type Transformer
– Level of Theory
– Basis Set
– CPU Time
– # Processors
– Keywords
– Chg. & Multiplicity
– Submit
– Visualize

Level of Theory pull-down options: HF, MP2, DFT

Design of UNCW Grid GUI

Functions can be grouped into sets called “workflows” for repetitive operations:

Data sets (Windows Explorer-like file browser)

File Type Transformer

Level of Theory | Basis Set

CPU Time | # Processors

Keywords | Charge & Multiplicity

Submit | Visualize

Design of UNCW Grid GUI

Preferences among choices can be saved as part of the workflow:

Data sets (Windows Explorer-like file browser)

File Type Transformer

Level of Theory: HF | Basis Set: 6-31G(d)

CPU Time: 4000 | # Processors: 4

Keywords: NMR | Charge & Multiplicity: 0,1

Submit | Visualize

Design of UNCW Grid GUI

The result is a much simpler process for the user:

Select data, Transform it

Calculate, Visualize

Design of UNCW Grid GUI

Multiple repeatedly used sets of commands (‘workflows’) can be saved.

A user’s preferences within a workflow (e.g., level of theory, basis set, # processors, CPU time requested, keywords, charge and multiplicity) could also be saved (future design feature).

In the future a user may need only to specify a data set (file) and link it to a pre-set ‘workflow’ to initiate a calculation!
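The idea of a saved, pre-set workflow can be sketched as a small data structure that is serialized and reloaded. This Python sketch stores presets as JSON purely for illustration; GridNexus itself uses JXPL scripts written in XML, whose syntax is not shown here. The class `WorkflowPreset`, its field names, and the helper functions are all hypothetical; the default values are the ones shown on the preferences slide.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class WorkflowPreset:
    """User preferences saved with a workflow (hypothetical structure)."""
    name: str
    level_of_theory: str = "HF"
    basis_set: str = "6-31G(d)"
    processors: int = 4
    cpu_time: int = 4000          # units are not stated in the slides
    keywords: list = field(default_factory=lambda: ["nmr"])
    charge: int = 0
    multiplicity: int = 1


def preset_to_json(preset):
    """Serialize a preset so it can be stored alongside the workflow."""
    return json.dumps(asdict(preset), indent=2)


def preset_from_json(text):
    """Rebuild a preset from its stored JSON form."""
    return WorkflowPreset(**json.loads(text))


if __name__ == "__main__":
    saved = preset_to_json(WorkflowPreset(name="nmr-shielding"))
    print(preset_from_json(saved))
```

With a preset stored this way, the only per-job input left for the user is the data file itself.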

Chemistry Portal

Initially, the portal will operate under Linux.

Next, it will be ported to operate under Windows.

Eventually, computations will be submitted online through web browsers.

This could be accomplished from any device (e.g., PC, laptop, or even a cell phone) that can access the Internet.

JXPL Language

UNCW Mathematics faculty member Dr. Jeff Brown, with help from Computer Science faculty member Dr. Clayton Ferner and recent graduate Mike Wood, developed a new Java-based programming language called JXPL.

JXPL is the language used in the GridNexus project, and is a language commonly used with web services and grid services.

The advantages of JXPL include:

– It is readily extensible

– Interfaces easily with (LISP-like) data structures in the GUI

– JXPL scripts are written in XML, a commonly used language

What’s Next?

More “filters” to transform data need to be developed and tested.

Fancier graphics may be added to the GUIs.

More computational nodes will be added to the Grid. The eventual goal is to include all NC institutions of higher learning.

Extend Grid to include more software applications.

Extend Grid services to other disciplines.

Include industry and businesses as users and developers.

References:

http://people.uncw.edu/vetterr/grid/proposal/UNC-OP_Grid_Project%20Overview.htm

http://www.ox.compsoc.net/~swhite/history/

http://www.grid.org

http://www.gridcomputingplanet.com/

http://www.globus.org/research/papers/anatomy.pdf

http://www.ibm.com/grid

http://www.globus.org

http://www.usatlas.bnl.gov/computing/grid/

Acknowledgments

UNC-OP for funding the UNCW Grid Initiative Proposal:

“Fostering Undergraduate Research Partnerships through a Graphical User Environment for the North Carolina Computing Grid,” Dr. Ron Vetter, PI

– Co-PIs: Dr. Rebecca S. Boston, NCSU; Dr. Anthony Wilkinson, WCU; Dr. Marilyn McClelland, NCCU; Dr. Libero Bartolotti, ECU; Ms. Judy Porter, CFCC.

– UNCW Participants: Computer Science: Dr. Ron Vetter, Dr. Clayton Ferner, Dr. David Berman, and Dr. Tom Hudson. Information Technology Systems: Dr. Bob Tyndall and Mr. Bobby Miller. Mathematics and Statistics: Dr. Jeff Brown. Chemistry and Biochemistry: Dr. Ned H. Martin. Biological Sciences: Dr. Ann Stapleton. Information Systems and Operations Management: Dr. Tom Janicki.

– UNCW Computer Science students working on the Chemistry portal: Tristan Carland, Jerry Martin, Andrew Martin