Florida Tech Grid Cluster

Upload: ahanu

Post on 23-Mar-2016


DESCRIPTION

Florida Tech Grid Cluster. P. Ford², X. Fave¹, M. Hohlmann¹ - High Energy Physics Group; ¹Department of Physics and Space Sciences, ²Department of Electrical & Computer Engineering. History: original conception in 2004 with FIT ACITC grant. - PowerPoint PPT Presentation

TRANSCRIPT

  • Florida Tech Grid Cluster
    P. Ford², X. Fave¹, M. Hohlmann¹ - High Energy Physics Group
    ¹Department of Physics and Space Sciences
    ²Department of Electrical & Computer Engineering

  • History
    Original conception in 2004 with FIT ACITC grant.
    2007 - Received over 30 more low-end systems from UF. Basic cluster software operational.
    2008 - Purchased high-end servers and designed new cluster. Established cluster on the Open Science Grid.
    2009 - Upgraded and added systems. Registered as a CMS Tier 3 site.

  • Current Status
    OS: Rocks V (CentOS 5.0)
    Job Manager: Condor 7.2.0
    Grid Middleware: OSG 1.2, Berkeley Storage Manager (BeStMan) 2.2.1.2.i7.p3, Physics Experiment Data Exports (PhEDEx) 3.2.0
    Contributed over 400,000 wall hours to the CMS experiment; over 1.3M wall hours total.
    Fully compliant on OSG Resource Service Validation (RSV) and CMS Site Availability Monitoring (SAM) tests.

  • System Architecture (diagram: Compute Element (CE), Storage Element (SE), nas-0-0, and compute-1-X / compute-2-X node racks)

  • Hardware
    CE/Frontend: 8 Intel Xeon E5410, 16GB RAM, RAID5
    NAS0: 4 CPUs, 8GB RAM, 9.6TB RAID6 array
    SE: 8 CPUs, 64GB RAM, 1TB RAID5
    20 compute nodes: 8 CPUs & 16GB RAM each; 160 total batch slots
    Gigabit networking, Cisco Express at core
    2x 208V 5kVA UPS for nodes, 1x 120V 3kVA UPS for critical systems

  • Hardware (photo: Olin Physical Science High Bay)

  • Rocks OS
    Huge software package for clusters (e.g. 411, dev tools, Apache, autofs, Ganglia).
    Allows customization through Rolls and appliances; config stored in MySQL.
    Customizable appliances auto-install nodes and run post-install scripts.

  • Storage
    Set up XFS on the NAS partition, mounted on all machines.
    NAS stores all user and grid data, streamed over NFS.
    Storage Element is the gateway for Grid storage on the NAS array.
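    As a sketch of how such an NFS layout is typically wired up (the paths, hostnames, and mount options below are illustrative assumptions, not the site's actual configuration):

    ```
    # /etc/exports on nas-0-0 (illustrative path and options)
    /export/data  *.local(rw,async,no_root_squash)

    # corresponding mount on each node (fstab syntax); 64kB rsize/wsize
    # matches the NFS block size discussed under Network Performance
    nas-0-0:/export/data  /data  nfs  rsize=65536,wsize=65536,hard,intr  0 0
    ```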

  • Condor Batch Job Manager
    Batch job system that distributes workflow jobs to compute nodes.
    Distributed computing, NOT parallel.
    Users submit jobs to a queue and the system finds places to process them.
    Great for Grid computing; the most-used batch system in OSG/CMS.
    Supports Universes: Vanilla, Standard, Grid, ...

    Master: manages all daemons.
    Negotiator: matchmaker between idle jobs and pool nodes.
    Collector: directory service for all daemons; daemons send periodic ClassAd updates.
    Startd: runs on each execute node.
    Schedd: runs on a submit host and creates a shadow process on the host; allows manipulation of the job queue.
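    For flavor, a minimal vanilla-universe submit file might look like the following (file and job names are illustrative; this is a sketch, not the site's actual workload):

    ```
    # hello.sub - minimal vanilla-universe Condor submit description
    universe   = vanilla
    executable = /bin/hostname
    output     = hello.$(Cluster).out
    error      = hello.$(Cluster).err
    log        = hello.log
    queue 1
    ```

    Submitted with `condor_submit hello.sub`; `condor_q` then shows the job in the queue while the negotiator matches it to an idle startd.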

  • Typical Condor setup

  • Condor Priority
    User priority managed by a half-life algorithm with configurable parameters.
    System does not preempt running jobs.
    Resource claim is freed as soon as a job is finished.
    Enforces fair use AND allows vanilla jobs to finish. Optimized for Grid computing.
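    The half-life priority decay and the no-preemption behavior correspond to standard Condor configuration knobs; a hedged sketch (the values shown are illustrative, not the cluster's actual settings):

    ```
    # condor_config excerpt (illustrative values)
    PRIORITY_HALFLIFE       = 86400   # user priority decays with a one-day half-life
    PREEMPTION_REQUIREMENTS = False   # never evict a running job
    CLAIM_WORKLIFE          = 0       # free the resource claim after each job finishes
    ```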

  • Grid Middleware (diagram; source: OSG TWiki documentation)

  • OSG Middleware
    OSG middleware is installed and updated via the Virtual Data Toolkit (VDT).
    Site configuration was complex before the 1.0 release; simpler now.
    Provides the Globus framework & security via a Certificate Authority.
    Low maintenance: Resource Service Validation (RSV) provides a snapshot of the site.
    Grid User Management System (GUMS) maps grid certificates to local users.

  • BeStMan Storage
    Berkeley Storage Manager: the SE runs the basic gateway configuration - a short config, but hard to get working.
    Not nearly as difficult as dCache; BeStMan is a good replacement for small to medium sites.
    Allows grid users to transfer data to and from designated storage via an LFN, e.g. srm://uscms1-se.fltech-grid3.fit.edu:8443/srm/v2/server?SFN=/bestman/BeStMan/cms...
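    Transfers against an SRM endpoint like this are usually driven with a command-line SRM client; a hedged sketch using srmcp (the local file name and SFN path are illustrative, not the real dataset layout):

    ```shell
    # Copy a local file to the Storage Element over SRM v2
    srmcp -2 file:///home/user/data.root \
        "srm://uscms1-se.fltech-grid3.fit.edu:8443/srm/v2/server?SFN=/bestman/data.root"
    ```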

  • WLCG
    Large Hadron Collider - expected 15PB/year; the Compact Muon Solenoid detector will be a large part of this.
    Worldwide LHC Computing Grid (WLCG) handles the data and interfaces with sites in OSG, EGEE (European), etc.
    Tier 0 - CERN; Tier 1 - Fermilab; closest Tier 2 - UFlorida.
    Tier 3 - us! Not officially part of the CMS computing group (i.e. no funding), but very important for dataset storage and analysis.

  • T2/T3 sites in the US (map of T2 and T3 sites; source: https://cmsweb.cern.ch/sitedb/sitelist/)

  • Cumulative Hours for FIT on OSG

  • Local Usage Trends
    Over 400,000 cumulative hours for CMS
    Over 900,000 cumulative hours by local users
    Total of 1.3 million CPU hours utilized

  • Tier-3 Sites
    Not yet completely defined. Consensus: T3 sites give scientists a framework for collaboration (via transfer of datasets) and also provide compute resources.
    Regular testing by RSV and Site Availability Monitoring (SAM) tests; OSG publishes site info to CMS.
    FIT is one of the largest Tier 3 sites.

  • RSV & SAM Results

  • PhEDEx
    Physics Experiment Data Exports: the final milestone for our site.
    Physics datasets can be downloaded from other sites or exported to them.
    All relevant datasets are catalogued in the CMS Data Bookkeeping System (DBS), which tracks the locations of datasets on the grid.
    A central web interface handles dataset copy/deletion requests.

  • Demo
    http://myosg.grid.iu.edu
    http://uscms1.fltech-grid3.fit.edu
    https://cmsweb.cern.ch/dbs_discovery/aSearch?caseSensitive=on&userMode=user&sortOrder=desc&sortName=&grid=0&method=dbsapi&dbsInst=cms_dbs_ph_analysis_02&userInput=find+dataset+where+site+like+*FLTECH*+and+dataset.status+like+VALID*

  • CMS Remote Analysis Builder (CRAB)
    Universal method for experimental data processing.
    Automates the analysis workflow, i.e. status tracking and resubmissions.
    Datasets can be exported to the Data Discovery Page.
    Used extensively in our local muon tomography simulations.
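    A CRAB job of that era was steered by a crab.cfg file; a minimal sketch (the dataset path, config file name, and job splitting below are illustrative assumptions):

    ```
    [CRAB]
    jobtype   = cmssw
    scheduler = glite

    [CMSSW]
    datasetpath            = /SomePrimaryDataset/SomeProcessedDataset/RECO   # illustrative
    pset                   = analysis_cfg.py
    total_number_of_events = 10000
    events_per_job         = 1000

    [USER]
    return_data = 0
    copy_data   = 1
    ```

    The workflow is then driven with `crab -create -submit` and monitored with `crab -status`.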

  • Network Performance
    Changed to a default 64kB block size across NFS.
    RAID array change to fix write-caching.
    Increased kernel memory allocation for TCP.
    Improvements in both network and grid transfer rates.
    dd copy tests across the network:
    Reading improved from 2.24 to 2.26 GB/s.
    Writing improved from 7.56 to 81.78 MB/s.
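    The dd comparison can be reproduced with a simple write-then-read pass; a sketch (the temp path and the much smaller file size than the real benchmark are illustrative):

    ```shell
    # Write 64 MB in 64 kB blocks (fsync so the write actually hits disk),
    # then read it back; dd reports elapsed time and throughput on stderr.
    dd if=/dev/zero of=/tmp/ddtest.bin bs=64k count=1000 conv=fsync 2> /tmp/dd_write.log
    dd if=/tmp/ddtest.bin of=/dev/null bs=64k 2> /tmp/dd_read.log
    tail -n 1 /tmp/dd_write.log   # last line holds the MB/s figure
    rm -f /tmp/ddtest.bin
    ```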

  • Iperf on the frontend (before and after) and dd on the frontend (before and after); results tabulated below.

    dd on the frontend, after tuning (64k block size; last row is the column average):

    WRITE (s)   WRITE (MB/s)   READ (s)   READ (GB/s)
    12.7        84.4           0.42       2.5
    13.1        81.7           0.45       2.3
    13.1        81.5           0.48       2.2
    14          76.5           0.53       2.3
    12.6        84.8           0.46       2.3
    13.1        81.78          0.468      2.26

    dd on the frontend, before tuning (64k block size; last row is the column average):

    WRITE (s)   WRITE (MB/s)   READ (s)   READ (GB/s)
    102.9       10.4           0.45       2.4
    94.5        11.4           0.49       2.2
    288.5       3.7            0.49       2.2
    244.89      4.4            0.49       2.2
    135         7.9            0.49       2.2
    173.15      7.56           0.482      2.24

    Iperf on the frontend, before tuning (last row is the column average):

    TCP: S (Mbits/s)   TCP: C (Mbits/s)   UDP jitter (ms)   lost   UDP (Mbits/s)
    753                754                0.11              0      1.05
    912                913                0.022             0      1.05
    896                897                0.034             0      1.05
    891                892                0.393             0      1.05
    888                889                1.751             0      1.05
    868                869                0.462             0      1.05

    Iperf on the frontend, after tuning (last row is the column average):

    TCP: S (Mbits/s)   TCP: C (Mbits/s)   UDP jitter (ms)   lost   UDP (Mbits/s)
    941                942                0.048             0      1.05
    939                940                0.025             0      1.05
    935                937                0.022             0      1.05
    930                931                0.023             0      1.05
    941                942                0.025             0      1.05
    937.2              938.4              0.0286            0      1.05
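    The iperf measurements shown above come from standard client/server runs; a sketch of the invocations (the hostname is illustrative):

    ```shell
    # On the server side (e.g. the frontend):
    iperf -s

    # On a client node - TCP throughput (server/client rates -> the TCP: S / TCP: C columns):
    iperf -c uscms1.fltech-grid3.fit.edu

    # UDP mode reports jitter and datagram loss at the offered rate:
    iperf -c uscms1.fltech-grid3.fit.edu -u -b 1M
    ```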

    Our cluster has seen a lot of growth since it came online. The general trends we have observed show a cumulative total of 400,000 hours from the CMS project. Locally, over 900,000 hours have been logged by users within our own department. This adds up to a total of 1.3 million CPU hours, and that is only since the new cluster came online about a year and a half ago. Gratia is a tool provided by OSG to keep track of the statistics of grid sites; we can use it to display much of this information graphically.

    CRAB is a Python program developed by CMS, in use since spring 2004, and it has become the universal method of experimental data processing for CMS. Its advantage is that it distributes and parallelizes local CMS data analysis over the Grid. It facilitates the submission and running of jobs because it doesn't require any knowledge of which sites host which programs; once a job is submitted, CRAB finds a site with the necessary components. More specifically, the user doesn't have to know anything about WLCG, gLite, or the OSG middleware involved. We use it extensively for all of these advantages. It can be used standalone or with a server; the commands are exactly the same in both cases, but a CRAB server is recommended when over 100 jobs are being run because it automates resubmission, status caching, and output retrieval. Many of our jobs use CRAB servers, but some also run standalone. The Data Discovery Page hosts a myriad of datasets available to any CRAB user. Additionally, after generating your own data, you can easily make it available to the scientific community by uploading it with PhEDEx to this database. We are currently hosting a dataset on DBS and working on uploading more.

    Internationally, CRAB has been used for the Physics Technical Design Report, in the analysis of reconstructed event samples generated during the Computing Software and Analysis Challenges, and in the preliminary cosmic-ray data taking. More information is available at https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrab

    Last fall we received a lot of complaints about the performance of our cluster. I began running diagnostic tests to see what speeds we were getting and what we could do to improve them. We ended up changing the default block size to 64kB and also had to replace the battery pack on the RAID array, which broke during our testing. Then we repeated all of the measurements we had taken for comparison. Using dd we saw only a small change in read speed, from 2.24 to 2.26 GB/s, but our write speed increased to more than ten times as fast, from 7.56 to 81.78 MB/s. Iperf gives a more theoretical bandwidth of what we should be capable of, and we saw improvements there as well: using TCP we saw a leap from 868 to 937.2 Mbits/s, although with UDP we saw no change at all. Our jitter also decreased, from 0.462 to 0.0286 ms, which was a good sign, since it means the average time difference between packets is low and they are arriving at a steady rate. These are just a few excerpts from our data; the values below the lines in the tables are the averages for each column.