
Future Grid Report April 2 2012

Geoffrey Fox

Introduction

This report is the sixty-fourth for the project and now continues with status of each team/committee and the collaborating sites.

Summary

Operations and Change Management Committee

Operations Committee conference call on Tue Mar 27th (details below). User Survey I remains active. New FutureGrid Project Challenge announced. Partner invoice processing on-going.

Software Team (includes Performance and Systems Management Teams) (FG-1423)

USC finished the Pegasus on FutureGrid tutorial. A new Inca test was written based on the ViNe tutorial and deployed on Sierra and Foxtrot. A first draft of the messaging service design document has been completed to interface monitoring data to experiment management. Based on feedback from a networking user, we will deploy additional perfSONAR services and are pricing out 10G capabilities for the perfSONAR machines. Continued improvement of the Portal theme/look & feel; migrated the portal front page to the panels page manager for better flexibility. The IUKB indexing module finished testing; final requirements discovered during testing are being implemented now and should be ready to release in the next biweekly. We moved the image management code to github, added kernel support, and introduced a trusted user feature that allows us to bypass LDAP authentication. The PAPI team made progress on the Grid Benchmark Challenge (GBC), a version of the HPCC benchmark suite that is suitable for virtualized environments such as FutureGrid.

Hardware and Network Team

The GPU cloud cluster, delta, is in initial testing; all nodes have OS installs and the system is integrated into the India scheduler. Early user use is to start this week. RHEL 6 or CentOS is being widely tested at IU, TACC, UC, and SDSC for deployment; Sierra will be upgraded first in preparation for an upgrade of India. Network issues at TACC were fully resolved with new monitoring scripts. A new version of Nimbus may fix the MAC address conflicts. Staggered maintenance days for Nimbus-enabled clusters are being explored for higher availability of resources for the project as a whole. Shared storage for Nimbus VMs is required to enable that functionality and is currently being discussed.

Training, Education and Outreach Team (includes user support)

TEOS team activities have focused on preliminary analysis of initial results coming from the FutureGrid survey, work on the Project Challenge, improvements in the FG portal, and outreach through social media and news.


Knowledgebase Team

Modest progress – see details below.

Site Reports

University of Virginia

No report received.

University of Southern California Information Sciences

USC finished the Pegasus on FutureGrid tutorial.

University of Texas at Austin/Texas Advanced Computing Center

Upgraded Nimbus to version 2.9 on Alamo. Users of Nimbus on Alamo can now create a virtual cluster. Presented FutureGrid to TACC.

University of Chicago/Argonne National Labs

Work focused on the development and adaptation of multi-cloud capabilities and support activities.

University of Florida

UF collaborated with UCSD to deploy Inca tests for the current ViNe deployment on Foxtrot and Sierra. Testing of ViNe version 2 has been performed, and improvements in the ViNe management code have been identified with focus on the May 1st project deadline. Fortes chaired the operations committee, and Figueiredo chaired the TEOS team.

San Diego Supercomputer Center at University of California San Diego

UCSD wrote and deployed a new Inca test for ViNe, worked with TACC to finish a design document for a messaging service for monitoring data, and coordinated activities to increase capabilities of FutureGrid’s perfSONAR deployment.

University of Tennessee Knoxville

(FG-1462) We made progress on the Grid Benchmark Challenge (GBC), a version of the HPCC benchmark suite that is suitable for virtualized environments such as FutureGrid.


Detailed Descriptions

Operations and Change Management Committee

Operations Committee Chair: Jose Fortes
Change Control Board Chair: Gary Miksik, Project Manager

Operations Committee conference call on Tue Mar 27th.

o Approved the use of “polls” in the FutureGrid portal to solicit feedback on specific questions of interest to the FutureGrid team. Polls do not take the place of formal surveys.

o Approved policy of “disabling” an existing FutureGrid account when a request has been made to “delete” the account. In order to maintain a proper audit trail, disabling an account will not physically delete the account. All email notifications to the user will be stopped. This action will be taken either on request or if system abuse is determined.

o Agreed that certain types of email notifications to end users should have an “unsubscribe” (i.e. opt-out) feature. These would include outage notifications, news, and user forums.

o Began discussion of a new “alumni” role in FutureGrid and what it would mean to have such a role.

User Survey I remains active and open. Reminder notices were sent out. No formal close date has been set yet. See the Outreach report for more details.

FutureGrid Project Challenge.

The FPC is open to both students and non-students and provides an opportunity for participants to showcase their skills and knowledge by utilizing FutureGrid computing resources in the execution of applications. Eight awards are envisioned, four for students and four for non-students, to be based on the following criteria:

o Interoperability
o Scalability
o Contribution to Education
o Research (innovation, quality of papers, new software, algorithms, insightful performance measurements, etc.)

The award will be based on FutureGrid work and its analysis or on ideas and software developed with major use of FutureGrid. The first place award for students is a trip to SC12 (up to $1,000), where a demonstration will be provided. All other awards are $200 cash. We encourage minority students to enter the Challenge, as one of the student awards is reserved for a "top minority" project. The deadline for submitting FPC project requests is May 15, 2012. Awards will be announced on June 15, 2012.


Financials

Note: NSF spending authority is thru December 31, 2011.

Partner Invoice Processing to Date

        IU PO #  Spend Auth   Y1 PTD   Y2 PTD   Y3 PTD  Total PTD  Remaining  Thru
UC      760159      831,845  213,235  343,809  129,292    686,337    145,508  Feb-12
UCSD    750476      598,541  192,295  183,855   95,566    471,716    126,825  Feb-12
UF      750484      434,055   83,298  110,891   46,731    240,920    193,135  Jan-12
USC     740614      450,000  154,771  115,407  151,290    421,468     28,532  Feb-12
UT      734307      920,269  547,509  159,987   61,435    768,931    151,338  Jan-12
UV      740593      257,598   53,488  103,878   72,709    230,075     27,523  Feb-12
UTK    1014379      177,084      N/A   28,271   34,300     62,571    114,513  Jan-12
Total             3,669,392                              2,882,018    787,374

Software Team
Lead: Gregor von Laszewski

ADMINISTRATION (FG-907 - IU Gregor von Laszewski) Several former project members were removed from jira and the mediawiki server.

A meeting with the systems team took place in which we agreed to also use jira to coordinate the systems related tasks. Gregor von Laszewski offered to help make this transition easy. This follows the request by Geoffrey Fox that all major activities be tracked with jira.

In order to coordinate activities for the May 1st deliverables a new wiki page to collect tasks that we will monitor via jira has been created at

https://wiki.futuregrid.org/index.php/Sw:Deliverables-2012-05-01

Reporting of Unicore and Genesis has been delegated to Vanamala Venkataswamy from UVA.

Defining Activities (FG-1223 - Gregor von Laszewski)

All Jira tasks previously assigned to and watched by the former systems manager were reassigned to Gregor von Laszewski, Koji Tanaka, Sharif Islam, and Barbara O'Leary. Several tasks on this list have also been closed, as they had not previously been reported as closed.

Over the last 14 days the following activities took place in jira:

Updated: 269 issues (includes closed and resolved tasks)
Closed: 57 issues
Resolved: 29 issues
Created: 45 issues


Improving development infrastructure (FG-1204 - Gregor von Laszewski)

FG-1204 - Activity: Redeployment of the management services

To enable SSO via Crowd, we will develop a command line tool that adds the groups jira-users and jira-developers to a user within LDAP.
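As an illustration only, a minimal sketch of what such a tool might emit: LDIF modify records adding a portal uid to the two groups. The group DNs, base DN, and memberUid attribute here are assumptions for illustration, not FutureGrid's actual LDAP layout.

```python
# Hypothetical sketch: emit LDIF that adds a portal user to the
# jira-users and jira-developers groups. The DNs and the memberUid
# attribute are assumed, not taken from the real FG LDAP schema.

GROUPS = ["jira-users", "jira-developers"]
BASE_DN = "ou=groups,dc=futuregrid,dc=org"  # assumed base DN

def group_add_ldif(uid, groups=GROUPS, base_dn=BASE_DN):
    """Return an LDIF string adding `uid` as memberUid of each group."""
    entries = []
    for group in groups:
        entries.append(
            "dn: cn={0},{1}\n"
            "changetype: modify\n"
            "add: memberUid\n"
            "memberUid: {2}\n".format(group, base_dn, uid)
        )
    return "\n".join(entries)

if __name__ == "__main__":
    # The generated LDIF could then be applied with ldapmodify.
    print(group_add_ldif("gvonlasz"))
```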

FG-992 - mediawiki: new wiki server

Due to the availability of jira and the new mediawiki this is no longer a blocker.

Although we did not update to the newest mediawiki version, our current version is sufficient for our needs, so we will close this activity. We will create two new activities that deal with this update as well as the integration of mediawiki via Crowd; however, these are at this time of lower priority. Until we work on these tasks, we will remove the mediawiki activity from the reported items.

HPC SERVICES

Unicore (FG-927) (Vanamala Venkataswamy UVA)

We have delegated the reporting of Unicore updates to Vanamala Venkataswamy.

Genesis (FG-933) (Vanamala Venkataswamy UVA)

Vanamala Venkataswamy went through a tutorial with Gregor von Laszewski on how to use jira and mediawiki for the reporting of Genesis II related tasks for FG.

Virtualized Globus (FG-1158 - IU Andrew Younge)

No significant update has been reported.

HPC Globus (FG-1235 - TACC Warren Smith)

We are waiting for the LDAP schema to be updated.

EXPERIMENT MANAGEMENT

Experiment Management (FG-518 Warren Smith, Jens Vöckler)

Integration of Pegasus into Experiment Management (FG-412 - ISI Jens Vöckler)

USC finished the Pegasus on FutureGrid tutorial, integrating feedback from IU. With integration help from IU, Pegasus is now visible on the FutureGrid portal and has an entry in the FutureGrid user manual at https://portal.futuregrid.org/manual/pegasus. USC finished supplying Virtual Machines for Nimbus, Eucalyptus and OpenStack on all FG IaaS resources, as documented in the user manual. The tutorial is available from http://pegasus.isi.edu/futuregrid/tutorials/ at this point.

With the approaching departure of Jens Vöckler, USC is working on handing over responsibilities to Mats Rynge and Karan Vahi.

Experiment management with support of Experiment Harness (FG-906 - TACC Warren Smith)

Please see site report from TACC.

Image Management (FG-899 - Creation, Repository, Provisioning - IU Javier Diaz)

1. Image generator

The image generation creates an image file with a fixed size. The size can be too small if a user requests a large number of packages. We tried to tackle this problem by creating the image in a directory instead of using the image file. As a result, we do not have any space problem, as the image file is created after all packages are installed. However, we found that this solution was not uniformly applicable due to problems with the package managers of some OSs, such as those based on Debian: aptitude (the Debian package manager) cannot install packages into a directory mounted via NFS.

Therefore, we decided to create an optional parameter in the command line interface to specify the size of the image file.
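A rough sketch of the sizing logic this implies (the helper names are hypothetical; the real image generator's interface may differ): use the optional size parameter when given, otherwise measure the staged directory after package installation and add headroom.

```python
# Illustrative sketch only, not the actual FG image generator:
# choosing the image-file size, with an optional user override
# standing in for the new command line parameter described above.
import os

def staged_size_bytes(stage_dir):
    """Total size of all files under the staging directory."""
    total = 0
    for root, _dirs, files in os.walk(stage_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total

def image_size_bytes(stage_dir, requested_mb=None, headroom=1.2):
    """Use the size override when given; otherwise size the image
    from the staged content plus 20% headroom."""
    if requested_mb is not None:
        return requested_mb * 1024 * 1024
    return int(staged_size_bytes(stage_dir) * headroom)
```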

2. Image Registration

We have improved the image management tools to support different kernels. We maintain a list of available kernels for each of the supported infrastructures. The new functionality allows users to select the kernel they want for their images. Moreover, users will be able to request a specific kernel, which we will then make available to them.

3. Trusted users

We have added a new feature that allows us to define a list of “trusted” users that do not need to authenticate against LDAP. For each such user we need to include a list of IP addresses from which the user will connect. This feature is intended for system daemons, like Inca, that need to use our software in non-interactive mode.
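In outline, the check amounts to the following minimal sketch (not the actual FG image-management code; the table entries and function names are hypothetical):

```python
# Hypothetical sketch of the "trusted user" check: a table mapping a
# user to the source IPs it may connect from, consulted before
# falling back to LDAP authentication.

TRUSTED_USERS = {
    "inca": ["129.114.0.10", "129.114.0.11"],  # example daemon account
}

def is_trusted(user, source_ip, table=TRUSTED_USERS):
    """True if this user may skip LDAP when connecting from source_ip."""
    return source_ip in table.get(user, [])

def authenticate(user, source_ip, ldap_check):
    """Bypass LDAP for trusted (user, IP) pairs; otherwise delegate
    to the supplied LDAP check function."""
    if is_trusted(user, source_ip):
        return True
    return ldap_check(user)
```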


We have restructured our code to create a final package. Moreover, we have made it compatible with the Python easy_install tools to easily distribute and install the software.

Finally, we have moved the code from Subversion to Github.

ACCOUNTING

Accounting for HPC (FG-1081 - IU Allen Streib)

We are now working on an fg-manage script that can conveniently add groups to our LDAP server. This is a precondition for our work on accounts.

Accounting for Clouds (FG-1301 - IU Hyungro Lee)

A meeting took place between Hyungro Lee and Gregor von Laszewski to discuss the newest coding efforts by both. The code now contains a command line shell developed by Gregor; Hyungro further developed the PHP framework. We decided that the code developed as part of the command line shell could be reused in the PHP framework in order to simplify programming. We tested the speed of the command line tool and its interaction with the database and found that the code returns results in 0.027 seconds, which is ample speed for interactive use. It also shows that the framework is now more suitable for easy software development, as lengthy parsing of log entries as part of the program development is avoided.

The entire set of Eucalyptus ccPrintInstance records is now available in the database (uploaded by Hyungro).
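The kind of timing check described above can be sketched as follows, with an in-memory SQLite table standing in for the accounting database (the schema and query are illustrative, not the project's actual tables):

```python
# Illustrative timing sketch: measure how quickly a per-user usage
# query returns against a populated table. The instances schema is
# an assumption, not the real FG accounting schema.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (owner TEXT, instance_id TEXT)")
conn.executemany(
    "INSERT INTO instances VALUES (?, ?)",
    [("user%d" % (i % 50), "i-%06d" % i) for i in range(10000)],
)
conn.commit()

start = time.time()
rows = conn.execute(
    "SELECT owner, COUNT(*) FROM instances GROUP BY owner"
).fetchall()
elapsed = time.time() - start

print("query returned %d rows in %.3f seconds" % (len(rows), elapsed))
```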

FG SUPPORT SOFTWARE AND FG CLOUD SERVICES

Nimbus (FG-842 - John Bresnahan)

Over the last week, UC spent most of its time testing an auto-scaling service for FutureGrid and creating a web interface for FutureGrid clouds. This has the potential to dramatically lower the barrier to entry for new FG users as well as be a very interesting work framework. UC has been working with "swift" on a demo for this software.

Eucalyptus (FG-1429 - IU Sharif Islam)

Version 2.0: After the end of the B534 class assignment, Eucalyptus load is back to normal. As a team, we discussed adding more memory. We also got some helpful suggestions from the Euca support team.

Version 3.0 (FG-1202):


We installed a Eucalyptus test cluster on fg-gravel. We are currently in touch with Euca support to figure out some network setting details.

Configuration details will be documented here: https://wiki.futuregrid.org/index.php/Euca3

OpenStack (FG-1203 IU - Sharif Islam)

I installed Swift using 4 nodes (1 proxy node and 3 storage nodes) and then set it up as Glance's backend storage. I faced several issues making Swift work with Glance (which did not happen on gravel), but web searches helped me find solutions.

One thing to note is that the current storage nodes each have only one 500G disk, which is too small for storage nodes. If we can get nodes with bigger disks, I can set those up as storage nodes, and we can use the current storage nodes as nova compute nodes.

Inca (FG-877 - Shava Smallen, UCSD)

In the last week, we worked to write and deploy a new Inca test to verify ViNe capabilities between Sierra and Foxtrot. The test is based on the FutureGrid tutorial “Connecting VMs in private networks via ViNe overlay”. Using the VM image “centos-5.5-x64-vine.gz”, a VM with a public IP address is started on Sierra and a VM with a private IP address is started on Foxtrot. Once the VMs are up, the test verifies that the Sierra VM can connect to the Foxtrot VM and then shuts both down. This test currently runs once per day and the results are available on the Inca Cloud page at http://inca.futuregrid.org:8080/inca/HTML/rest/Cloud/FG_CLOUD.

ViNe: (FG-140 - UF Renato F. Mauricio T. Jose Fortes)

ViNe development activities focused on testing ViNe version 2 (ViNe with management enhancements). While the basic functionality of ViNe2 was verified, it was identified that the current ViNe management implementation supports only simple scenarios of overlay network deployment and management (e.g., overlay topologies in which most of the load concentrates in a single node; difficulty supporting the currently deployed ViNe on sierra and foxtrot; limited autoconfiguration capability). It has been decided that the ViNe management code will be reworked in order to enable the full capability of the ViNe infrastructure. The development will target delivery by May 1st.

WEB SITE AND SUPPORT

Portal and Web Site – (Matthew Hanlon (TACC), IU Fugang Wang, Gregor von Laszewski)


We have devised a mechanism for how to deal with early users. There are some issues that have yet to be worked out. Our plan is to create portal accounts for all early users.

FG-1311: Updated several modules and removed obsolete/unnecessary modules.

FG-1513: Migrated the portal front page to the panels page manager for better flexibility.

FG-1514: IUKB module completed and tested. Initial feedback is to not update on a cron schedule and to make local versions editable by appropriate roles. We don't want to use cron, because this would overwrite any local editing. We need to add manual "Update All Documents" (admin privilege) and "Update this document" (editor privilege) actions. Also, we need to add rules so that when a KB node is edited or commented on, an email notification of the event is sent to the appropriate IUKB parties.

FG-1426

KnowledgeBase (FG-1222 - Jonathan Bolte IU)

Gregor von Laszewski has changed the description of the task to include the discussions that took place at various software meetings in regards to this task. We also reassigned the task to those who identified a pathway for developing a solution to the information separation between IUKB and the drupal portal.

The description has been changed to: "Matthew Hanlon is developing a component that can render IUKB entries within the portal. One of the problems with managing KB entries outside Drupal's provided mechanism for managing knowledge bases is that the search function of drupal does not return pointers to the IUKB entries. Matthew is developing a workaround for this problem so that IUKB entries also show up in the searches."

Matthew Hanlon pushed the module update to webdev.futuregrid.org that includes the syncing/indexing of KB documents into drupal nodes. You can now search for KB documents directly from the "global search" on the portal (the search box at the top-right of every page). Relevant KB documents will appear in the default "Content" search results. There is also a specific "Knowledge Base" search which searches only KB nodes.

Some minor changes will take place over the next period. We anticipate that this activity will be closed soon.

PERFORMANCE (UCSD Shava Smallen)

Vampir (FG-955 - Thomas Williams)


After discussions with the Vampir team about how to collect usage data, it was suggested to change the shell alias for modules to log the loaded module to syslog rather than add directives to the modules file. The next step will be to prototype this on our VM monitoring server.

PAPI (FG-957 - Piotr Luszczek (UTK))

We have successfully reproduced the results from our proto-GBC effort that we presented last year at EuroPar. This most recent effort used twice as many cores. The problems we see are very similar to what was happening last time: time inside the guest does not correspond to wall clock time, timer inversion, and inconsistent behavior. In the most recent tests, we went one step further, following a sensible approach that was suggested by our reviewers. Namely, we used VMware ESX, which should (potentially) have much less overhead. But the old problems persisted. At this point we are considering our choices regarding tools for discovering what is going on. One path forward is to use the latest beta-quality release from VMware that would allow us to use hardware counters inside the guest OS. But the timeline on receiving the software bits is uncertain at best, so we are putting this approach on hold for the present. Instead, we will be looking into using a VMware ESX add-on called vmkperf, which allows counting at the ESX console level and gives PAPI-like accuracy (the gold standard in performance tools). Other options include LTTng and some form of profiling at the guest OS level if the VMware tools do not yield satisfactory answers.

Performance: Interface monitoring data to the Experiment Harness (FG-1098 - Shava Smallen SDSC)

In the last few weeks, we completed the first draft of the FutureGrid messaging service at https://wiki.futuregrid.org/index.php/Docs/Performance/Messaging. The messaging service will be used to publish monitoring data, providing a single point of access for applications to subscribe to the system event messages from different monitoring tools (e.g., Inca, Ganglia, perfSONAR, etc.). In particular, we finished defining the format of routing keys so that users will be able to easily filter and parse information they are interested in. We will continue to add to this document as we work to integrate the various monitoring tools.
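To illustrate how subscribers could filter on such routing keys, here is a small AMQP-style topic matcher ('*' matches one dot-separated word, '#' matches zero or more). The example keys are hypothetical; the actual key format is whatever the design document above defines.

```python
# Sketch of routing-key filtering for monitoring messages, following
# AMQP topic-exchange wildcard conventions. The tool.site.resource
# key layout used in the examples is assumed, not FutureGrid's.

def topic_match(pattern, key):
    """AMQP-style topic match on dot-separated routing keys."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb zero or more remaining words.
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if p[0] in ("*", k[0]):
            return match(p[1:], k[1:])
        return False
    return match(pattern.split("."), key.split("."))
```

A subscriber interested in all Inca cloud results might bind with a pattern like "inca.*.cloud", while "perfsonar.#" would receive every perfSONAR message regardless of depth.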

FG-1094 - Performance: Help coordinate setup of perfSONAR

In the last few weeks, we worked with each of the sites to gather information needed to price out adding 10G capability to the perfSONAR machines and are working out implementation details. The GRNOC priced out a perfSONAR measurement machine for IU and we are waiting for the order to be submitted. Also, we spoke with our main networking user and made plans (FG-1506) to deploy a low impact/high frequency network measurement tool (OWAMP or pingER) that can be used with the current higher impact/low frequency BWCTL measurements to determine network performance. Similarly, we will look into deploying Periscope (FG-1507), which provides enhanced interfaces to perfSONAR data.


Hardware and Network Team
Lead: David Hancock

Networking

All FutureGrid network milestones are complete and networking is in a fully operational state.

A planned network outage on April 3rd will relocate IU’s FG connection in Chicago; 4 hours of downtime are expected.

Compute & Storage Systems

IU Data Capacitor Storage
o Access to the Data Capacitor has not been a critical issue with the expansion of FG internal storage. The requirement will be dropped unless a user need arises.

IU iDataPlex (india)
o RHEL6 testing on 4 nodes is progressing well with no user issues yet.
o OpenStack Diablo test cluster installed.
o RHEL updates planned for next maintenance.
o System operational for production users.

IU Cray (xray)
o No issues during this period.
o New software release available (danub, SLES 11 based).
o Two newest IU admins scheduled to take a Cray administration course this summer.
o System operational for production HPC users.

IU HP (bravo)
o Swift test implementation is installed on bravo; nodes in use for NID testing currently.
o 50% of the system is being used for testing with the network impairment device; HDFS is available on the remaining systems.
o RHEL updates planned for next maintenance.
o System operational for production users.

IU GPU System (delta)
o 1 node available for testing; the remainder have OS installs, and cloud tools are being installed.
o System management switches will be delivered the first week of April.
o System integrated into the India scheduler.
o System operational for early users.

SDSC iDataPlex (sierra)
o Upgrade of two nodes to RHEL6 has been done to prep for the upgrade of Sierra & India.
o RHEL updates planned for next maintenance.
o System operational for production Eucalyptus, Nimbus, and HPC users.

UC iDataPlex (hotel)
o Deployment plan for Genesis II in progress, waiting on feedback from the Genesis team at UVA.
o RHEL updates planned for next maintenance.
o System operational for production Nimbus and HPC users.

UF iDataPlex (foxtrot)
o No issues during this period.
o System operational for production Nimbus users.

Dell system at TACC (alamo)
o Planning for CentOS upgrade. The upgrade should be ready prior to the May maintenance day; the remaining issue is upgrading the persistent daemon node for Genesis II.
o 5 nodes are provisioned for XSEDE TIS testing with SGE & Torque using the same headnode.
o System operational for production Nimbus and HPC users.

All system outages are posted at https://portal.futuregrid.org/outages_all

Training, Education and Outreach Team (includes user support)
Lead: Renato Figueiredo

The TEOS team has discussed summary survey results from a partial set of responses and tried to assess areas of improvement to drive possible requirements for the May 1st deadline. Barbara has followed up with the 6 respondents who indicated their project was stalled or abandoned; this has resulted in one user's problem being resolved. Survey responses calling for improved documentation, a streamlined menu, more effective organization of information, and a clearer statement of the FG mission and goals are among the areas we are working to address on the FG Portal side of things. We have received 91 surveys as of 3/30; Barbara sent a follow-up email reminder Monday morning (3/26), and a final reminder is set to send on Wednesday afternoon (4/4).

The FutureGrid Project Challenge has been announced. Barbara posted it to the FG Portal, Twitter, and Facebook, and an email notice was sent via the FG Portal user forums. We reached out to SoIC (they tweeted about the challenge), Women in Computing, and a few other groups to help get the word out. We plan to continue this kind of outreach going forward.

Improvements in the FG Portal:

Tutorials: Added links to three Pegasus tutorials and one HPSS tutorial on the tutorials page, and posted a news announcement about this in the FG portal and in our social media outlets. Based on discussions in the TEOS call, Barbara is working with Sonali on tutorial improvement, focusing on creating "What You Need" information to put at the top of each tutorial, including other tutorial dependencies, to increase people's preparedness before plunging into a tutorial, so they'll have a better chance of successfully completing it.

Barbara is coordinating the improvement of several pages: Support and Getting Started Pages, Hardware and Status info, and Networking.


The FG Portal redesign has continued; the TEOS team meeting included Matt Hanlon and Carrie Arnold (new web developer at TACC) with plans for May 1st enhancements. Major issues being addressed at this time:

Project Pages Revamp: a much more visually appealing and content-rich approach to the individual project pages and to the Project List page. This will include a FEATURED PROJECT slot to help us highlight user success stories. We are also working on making it easier for project leaders to add project members.

Knowledgebase Team
Lead: Jonathan Bolte, Chuck Aikman

Active document repository increase by = 0
Documents modified = 1
Portal document edits = 4
Current total 'live' FG KB documents: 105
Target as of 7/1/2012: 175
Difference between current and target: 71
Number of KB documents that must be completed per week between now and 7/1/2012 to hit target: (71/13) ≈ 5
Current number of documents in draft status within KB workflow: 36

Tickets
Lead: Sharif Islam

38 tickets created
16 tickets resolved

Currently:
54 tickets total
24 new tickets
27 open tickets
3 stalled tickets
18 unowned tickets

Site Reports

University of Virginia
Lead: Andrew Grimshaw

No report received.

University of Southern California Information Sciences
Lead: Ewa Deelman

USC continued to participate in the conference calls for FG Software, FG OCMC, and FG AHM.


USC finished the Pegasus on FutureGrid tutorial, integrating feedback from IU. With integration help from IU, Pegasus is now visible on the FutureGrid portal and has an entry in the FutureGrid user manual at https://portal.futuregrid.org/manual/pegasus.

USC finished supplying Virtual Machines for Nimbus, Eucalyptus and OpenStack on all FG IaaS resources, as documented in the user manual. The tutorial is available from http://pegasus.isi.edu/futuregrid/tutorials/ at this point.

With the approaching departure of Jens Vöckler, USC is working on handing over responsibilities to Mats Rynge and Karan Vahi.

University of Texas at Austin/Texas Advanced Computing Center
Lead: Warren Smith

Dell cluster:
- Continued problems with duplicate MAC addresses on the Nimbus partition
o Upgraded to Nimbus 2.9 to either fix this problem or have current Nimbus software to debug with
o No problems in the few days since the upgrade
- Addressed a problem where scheduling of the HPC partition was falling back from Moab to Torque
- Investigating issues with the job epilogue script
o The epilogue script for a job is killing processes belonging to other jobs running on the same node
o For now, the scheduling policy has been changed so that there is at most one job per node
o Investigating methods the epilogue script can use to kill only the processes associated with the job it runs for
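One common way to scope an epilogue's cleanup is to target only the processes in the job's own session rather than every process owned by the user. The sketch below (hypothetical helper, not TACC's actual script) shows the session-scanning half of that idea on Linux by reading /proc; a real epilogue would then signal only those PIDs after filtering by job owner.

```python
import os

def pids_in_session(target_sid):
    """Hypothetical helper: list PIDs whose session id equals target_sid.

    Reads /proc/<pid>/stat (Linux-specific). A Torque epilogue could use
    this to kill only processes belonging to the finishing job's session,
    instead of killing all processes owned by the job's user.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # The comm field may contain spaces or parentheses, so split the
        # line after the last ')'; the fields are then:
        # state, ppid, pgrp, session, ...
        fields = stat.rsplit(")", 1)[1].split()
        if int(fields[3]) == target_sid:
            pids.append(int(entry))
    return pids

# Example: list the processes in our own session (dry run, nothing is killed).
print(pids_in_session(os.getsid(0)))
```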

- Continued to work on the CentOS 6.2 upgrade
o LDAP configured on the new master node
o Patched IB card firmware to work with CentOS 6.2
o Installed more applications

Experiment harness: Investigated the boto Python library to access EC2 cloud interfaces

o Plan to use the boto library to get information about provisioned virtual machines from OpenStack and other IaaS infrastructures deployed on FutureGrid

Continued to modify the TeraGrid/XSEDE glue2 software for use on FutureGrid
o Adding support so that this software can provide resource information in JSON
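As a rough illustration of that JSON support (the record layout and field names below are hypothetical, not the glue2 software's actual schema), a GLUE2-style compute resource description might be rendered like this:

```python
import json

# Hypothetical GLUE2-style resource record; attribute names are
# illustrative only, not the actual glue2 schema.
record = {
    "Name": "alamo.futuregrid.org",
    "TotalSlots": 1536,
    "UsedSlots": 1040,
    "Queues": [
        {"Name": "batch", "RunningJobs": 42, "WaitingJobs": 7},
    ],
}

# Render the record as JSON, e.g. for consumption by the portal or
# the experiment harness.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)
```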

FutureGrid user portal: Added an alumni field to indicate former members of a project


Updated several Drupal modules and disabled several other unused modules in the production portal

Changed the type of Drupal page used for the home page to improve flexibility and maintainability

Participated in planning of portal work to be completed by May 1

Outreach: Created a virtual cluster for use by a research group at the University of Texas at San Antonio. The cluster is automatically configured when started by Nimbus and consists of a head node that acts as an NFS and Torque server and one or more compute nodes that act as NFS and Torque clients. OpenMPI is available in the cluster.
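A minimal sketch of the kind of contextualization such a cluster needs (hostnames, paths, and addresses below are hypothetical; this is not the actual Nimbus context data): the head node exports a shared filesystem and lists the compute nodes in Torque, while each compute node mounts the export and points its pbs_mom at the head.

```text
# Head node, /etc/exports -- export home directories to the cluster subnet
/home  10.0.0.0/24(rw,sync,no_root_squash)

# Head node, Torque server_priv/nodes -- one line per compute node
compute-0  np=8
compute-1  np=8

# Compute nodes, /etc/fstab -- mount the shared filesystem from the head
head:/home  /home  nfs  defaults  0 0

# Compute nodes, mom_priv/config -- report to the head node's pbs_server
$pbsserver  head
```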

Presented FutureGrid to TACC to encourage additional FutureGrid usage and to gather suggestions for how we can improve FutureGrid
o Several additional groups from TACC are interested in using FutureGrid
o Significant interest in OpenStack
o Suggested that FutureGrid presentations be made in departmental seminar series at universities (e.g., the Department of Computer Science at the University of Texas)

University of Chicago/Argonne National Labs
Lead: Kate Keahey

Work focused on the development and adaptation of multi-cloud capabilities and on support activities. Specifically, we focused on the following:
- Further work on designing, prototyping, and evaluating multi-cloud infrastructure for FG
- One-on-one support of two application groups using Nimbus on FG (ATLAS and social network modeling)
- Further work on improving and graphing utilization

Hotel: Diagnosed a failed hard drive on c48, replaced it, and rebuilt the node. Updated Ganglia to 3.1 and restored Nagios DB integration so John can resume collecting availability data from it.

University of Florida
Lead: Jose Fortes

The UF team collaborated with UCSD to deploy FG Inca tests of the current ViNe deployment on sierra and foxtrot. The test verifies that the procedure described in the simple ViNe tutorial (available at https://portal.futuregrid.org/contrib/simple-vine-tutorial) works correctly. The test runs once a day and results are summarized at http://inca.futuregrid.org:8080/inca/HTML/rest/Cloud/FG_CLOUD (currently shown in the last row).

ViNe activities focused on testing ViNe version 2 (ViNe with management enhancements). While the basic functionality of ViNe2 was verified, limitations in the management interface were identified: the current ViNe management implementation supports only simple overlay network deployment and management scenarios (e.g., overlay topologies in which most of the load concentrates in a single node), has difficulty supporting the currently deployed ViNe on sierra and foxtrot, and offers limited auto-configuration capability. The ViNe management code will be revamped to enable the full capability of the ViNe infrastructure, with a target delivery milestone of May 1st, aligning with the major internal FutureGrid deliverable deadline.

San Diego Supercomputer Center at University of California San Diego
Lead: Shava Smallen

In the past few weeks, UCSD located a bad disk in one of Sierra’s storage servers and will replace it during the Tuesday maintenance period. We also reviewed the ViNe tutorial and wrote and deployed a new Inca test for ViNe that runs once per day. We continue to work with TACC and finished the first draft of a design document to integrate performance monitoring with the experiment management component by publishing system monitoring events from Inca, Ganglia, etc. to a standard messaging service. UCSD also worked with the GRNOC and our networking user Martin Swany to identify additional services to be deployed for perfSONAR, and collected the information needed to price out adding 10G capabilities to the perfSONAR machines. All activities are described further in the software section of this report. UCSD continues to lead the performance group activities and integrated the 3-4 month plans into Jira tasks as required by the Software group. UCSD led a group call on March 21st and attended the Software and All Hands calls.
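As a rough sketch of the kind of message such a design implies (the field names are hypothetical, not taken from the draft design document), a monitoring event might be a small JSON envelope carrying the producing system, the resource, the metric, and a timestamp:

```python
import json
import time

def make_event(source, resource, metric, value, timestamp=None):
    """Build a hypothetical monitoring-event envelope for a messaging service.

    The field names are illustrative; the actual schema would be the one
    defined in the design document.
    """
    return json.dumps({
        "source": source,        # e.g. "inca" or "ganglia"
        "resource": resource,    # e.g. "sierra"
        "metric": metric,        # e.g. "load_one"
        "value": value,
        "timestamp": timestamp if timestamp is not None else time.time(),
    })

# Example: a Ganglia load reading from sierra, with a fixed timestamp.
event = make_event("ganglia", "sierra", "load_one", 0.42, timestamp=1333324800)
print(event)
```

A consumer subscribed to the messaging service would parse each envelope with `json.loads` and route it on the `source` and `metric` fields.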

University of Tennessee Knoxville
Lead: Jack Dongarra

(FG-957) We have successfully reproduced the results from our proto-GBC effort that we presented last year at EuroPar. This most recent effort used twice as many cores. The problems we see are very similar to what happened last time: time inside the guest does not correspond to wall-clock time, timer inversion, and inconsistent behavior. In the most recent tests we went one step further, following a sensible approach suggested by our reviewers: we used VMware ESX, which should (potentially) have much less overhead. But the old problems persisted. At this point we are weighing our choices of tools for discovering what is going on. One path forward is to use the latest beta-quality release from VMware, which would allow us to use hardware counters inside the guest OS. But the timeline for receiving the software bits is uncertain at best, so we are putting this approach on hold for the present. Instead, we will look into the VMware ESX add-on VMKperf, which allows counting at the ESX console level and gives PAPI-like accuracy (the gold standard in performance tools). Other options include LTTng and some form of profiling at the guest OS level if the VMware tools do not yield satisfactory answers.
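The timer-inversion symptom described above can be probed directly. The following illustrative snippet (not the GBC benchmark code) samples a clock repeatedly and counts backward steps; Python's `time.monotonic` is guaranteed non-decreasing on the host, so inside a misbehaving guest the interesting source to probe would be the wall clock (`time.time`):

```python
import time

def count_inversions(clock, samples=10000):
    """Count backward steps in successive readings of `clock`.

    On a well-behaved host this is 0 for a monotonic clock; inside a
    guest VM with broken timekeeping, a wall-clock source may show a
    positive count. Illustrative probe only.
    """
    inversions = 0
    prev = clock()
    for _ in range(samples):
        now = clock()
        if now < prev:
            inversions += 1
        prev = now
    return inversions

# Probe both the wall clock and the monotonic clock.
print(count_inversions(time.time), count_inversions(time.monotonic))
```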
