Status of the Production and Nagios news
ALICE TF Meeting, 29/07/2010
Status of the production
• Since yesterday (28/07/2010) ALICE has run out of MC production
  – Raw data reconstruction: currently running at CERN (LHC10e). Activity decreased during the week
  – Analysis trains: ongoing
  – User analysis: ongoing
  – MC production: finished for the moment. No new MC requests in the pipeline
Job profile this week
Decrease due to the stop of the MC production
Job profile per user
Production clearly dominated by the MC jobs this week
As usual, significant user analysis activity this week as well
Raw data transfers and production
Low raw data transfer activity this week: 1.3 TB of raw data transferred (compatible with the raw data taking regime this week)
Around 25 TB of raw data recorded in CASTOR@CERN
Status of the sites
• T1 sites
  – CNAF: the site has been running a very low number of ALICE jobs for more than a week
    • A GPFS migration caused this problem
    • The number of jobs is still low today although the operation is finished
    • The number of jobs should increase in the next hours
  – RAL
    • ALICE is running over its assigned resources
    • The site proposed capping the number of ALICE jobs at 1250. This is about 25% of the farm and around 10 times ALICE's current fairshare allocation (ALICE's current usage is about 65%)
    • This is necessary because the recent high volume of ALICE work caused CMS to run a high-priority workload elsewhere
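The figures quoted for RAL can be cross-checked with a little arithmetic. The sketch below is a back-of-the-envelope estimate only: it assumes that all percentages refer to job slots on the same farm and that "usage" means the share of farm slots occupied by ALICE jobs, neither of which is stated on the slide. The function and key names are illustrative.

```python
def implied_farm_figures(cap_jobs, cap_fraction, cap_over_fairshare, usage_fraction):
    """Derive rough job-slot numbers from the percentages quoted for RAL."""
    farm_slots = cap_jobs / cap_fraction            # cap is about 25% of the farm
    fairshare_jobs = cap_jobs / cap_over_fairshare  # cap is about 10x the fairshare
    return {
        "farm_slots": farm_slots,
        "fairshare_jobs": fairshare_jobs,
        "fairshare_fraction": fairshare_jobs / farm_slots,
        # Assumption: "usage" is a fraction of the whole farm.
        "current_jobs": usage_fraction * farm_slots,
    }


figures = implied_farm_figures(
    cap_jobs=1250, cap_fraction=0.25, cap_over_fairshare=10, usage_fraction=0.65
)
for name, value in figures.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the farm has roughly 5000 job slots, the fairshare corresponds to about 125 jobs (2.5% of the farm), and a 65% usage would mean about 3250 ALICE jobs, well above the proposed cap of 1250.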
Status of the sites
• T2 sites
  – Subatech will be down from tomorrow, Friday, at 16:00 GMT+2 until Monday morning, for electrical maintenance
    • In addition, some French sites had cooling problems, already solved
  – Grenoble: the external network will be down on Saturday, July 31st, from 5:30 am until 6:00 pm
  – Poznan: SE failed during the week, already solved
  – IPNL: CREAM1.6 migration completed
  – Torino: CREAM1.6 migration completed
  – Madrid: SE failing today. Migration activities ongoing. The CREAM system has already been migrated to CREAM1.6
  – Trujillo: out of production for a long time; in addition, the SE is failing
  – LBL: SE failing today
  – Small activities at some Russian sites (new host certificates for the VOBOXes)
Pending issues
• Issue reported last week:
  – Large number of zombies or extremely long jobs running at the sites (over 46h)
    • Declared pathological jobs, which should be killed
    • Sites were encouraged either to kill those jobs or to decrease the CPU time limit of the ALICE queues to 24h
  – No news on this during this week
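The criterion above (jobs running for more than 46h are treated as pathological) can be sketched as a simple filter. This is only an illustration: the job records, job IDs, and function name are hypothetical, the sketch uses elapsed wall-clock time as the measure (an assumption), and it does not use the API of AliEn or of any batch system.

```python
from datetime import datetime, timedelta

# Threshold quoted in the report; a site could pass timedelta(hours=24) instead.
PATHOLOGICAL_AFTER = timedelta(hours=46)


def find_pathological_jobs(jobs, now, threshold=PATHOLOGICAL_AFTER):
    """Return the IDs of jobs that have been running longer than the threshold.

    jobs: iterable of (job_id, start_time) tuples (hypothetical layout).
    """
    return [job_id for job_id, started in jobs if now - started > threshold]


now = datetime(2010, 7, 29, 12, 0)
jobs = [
    ("job-a", now - timedelta(hours=50)),
    ("job-b", now - timedelta(hours=10)),
    ("job-c", now - timedelta(hours=46, minutes=1)),
]
print(find_pathological_jobs(jobs, now))  # ['job-a', 'job-c']
```

Lowering the queue limit to 24h has the same effect without manual intervention: the batch system itself ends any job that exceeds the limit.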
Quattor recipe for the CREAM-CE migration
• Thanks to Jerome for these instructions
  – Available at:
  – http://alien2.cern.ch/index.php?option=com_content&view=article&id=46&Itemid=103
Status of Nagios
• SAM will be switched off in September
ALL VOBOXES MUST BE PINGABLE AND ACCESSIBLE FROM samnag014.cern.ch