Advanced Computing and Information Systems laboratory
Managing large research groups:
thoughts from the ACIS experience
José Fortes
Outline
• Vision and Mission of ACIS laboratory
• ACIS numbers at a glance
• Examples of different types of ACIS projects
• Comments on management goals and practices
  • The good, the bad and the ugly
• Key points
ACIS vision and mission
• Vision: to advance the science of IT systems engineering by inventing, analyzing, prototyping and deploying innovative cyberinfrastructure for eScience
• Mission: fundamental and applied research on systems that integrate computing and information processing:
  • Cloud Computing: using virtualization technologies for computing platforms, file systems, applications as services, networks and I/O systems
  • Cyberinfrastructure for e-science and e-health: for research on biodiversity informatics, brain-machine interfaces, coastal and ocean modeling, genetics and atomic-scale friction
  • Autonomic Computing and Software-defined Systems: as relevant to data centers, real-time systems, virtual networking and other topics pursued by the Center for Cloud and Autonomic Computing
  • Computer Architecture: architectural support for virtualization, reliable computing and green computing
  • Peer-to-peer Computing and Software Defined Networking: self-organizing virtual networks, and structured and unstructured query systems
Some numbers associated with ACIS
• Founded in 2001
• 2 faculty members (+2)
• 2 research professors
• 4 IT experts
• 1 administrative assistant
• 10-20 PhD students
• 3 PhD graduates/year
• Long-term visitors from Japan, Korea, China, France
• 15+ papers/year
• 1+ keynote speech/year
• Chair 1+ major meeting/year
• Expenditures: $1.5M/year (avg.)
ACIS Funding sources
• Army Research Office
• BellSouth/AT&T Foundation
• DARPA
• Intel Corporation
• IBM
• National Aeronautics and Space Administration
• National Oceanic and Atmospheric Administration
• National Science Foundation (CISE, OCI, ENG, BIO)
• Northrop-Grumman
• Office of Naval Research
• Semiconductor Research Corporation
• Southeastern Universities Research Association
• Citrix, Microsoft, Merrill-Lynch, Samsung…
eScience Cyberinfrastructure (iDigBio)
Integrated Digitized Biocollections (iDigBio) is the US resource for digitized information about natural history collections.
Software-defined Systems
• Fault-tolerant MapReduce
[Figure: architecture of a fault-tolerant MapReduce system. Layers: MapReduce application, MapReduce framework, system software, infrastructure. Labels, in the order they appear: monitoring module (Ganglia-based monitoring, node health script); planning module (scaling heuristic on the master, anomaly detection on the slaves); analysis module (heartbeat processing using Ganglia metric modules, precursor detection using the Hadoop node health script); execution module (resource scaling, blacklisting); prediction models (master); cost models (master).]
• Virtual Networking
[Figure: ViNe virtual network connecting sites UC, UF and PU, with a download server and a virtual cluster.]
  1. ViNe-enable sites
  2. Configure ViNe VRs
  3. Instantiate BLAST VMs
  4. Contextualize
     a. Retrieve VM information
     b. ViNe-enable VMs
     c. Configure Hadoop
• Multicloud Hadoop-based BLAST
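The fault-tolerant MapReduce figure above names monitoring, analysis, planning and execution modules, which suggests a monitor-analyze-plan-execute control loop. The following is only a minimal sketch of that general pattern, not the ACIS implementation: the `NodeSample` record, the 30-second heartbeat timeout, the `min_nodes` target and the `replacement-N` node names are all hypothetical, and no Ganglia or Hadoop API is called.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeSample:
    """One monitoring sample for a worker node (hypothetical record)."""
    node: str
    seconds_since_heartbeat: float
    health_ok: bool  # stand-in for the result of a node health script


def analyze(samples: List[NodeSample], heartbeat_timeout: float = 30.0) -> List[str]:
    """Analysis step: flag nodes with a stale heartbeat or a failed health check."""
    return [
        s.node
        for s in samples
        if not s.health_ok or s.seconds_since_heartbeat > heartbeat_timeout
    ]


def plan(suspects: List[str], active: List[str], min_nodes: int) -> Dict[str, object]:
    """Planning step: blacklist suspect nodes and size the scale-up request."""
    blacklist = sorted(set(suspects) & set(active))
    remaining = len(active) - len(blacklist)
    return {"blacklist": blacklist, "scale_up": max(0, min_nodes - remaining)}


def execute(decision: Dict[str, object], active: List[str]) -> List[str]:
    """Execution step: drop blacklisted nodes and add placeholder replacements."""
    kept = [n for n in active if n not in decision["blacklist"]]
    added = [f"replacement-{i}" for i in range(decision["scale_up"])]
    return kept + added


if __name__ == "__main__":
    # Monitoring step: in a real system these samples would come from the
    # monitoring module; here they are hard-coded for illustration.
    samples = [
        NodeSample("n1", 2.0, True),
        NodeSample("n2", 45.0, True),   # stale heartbeat
        NodeSample("n3", 1.0, False),   # failed health check
    ]
    active = ["n1", "n2", "n3"]
    decision = plan(analyze(samples), active, min_nodes=3)
    print(decision)
    print(execute(decision, active))
```

Keeping each step a pure function makes the loop easy to test in isolation; a real deployment would replace the hard-coded samples with live metrics and the placeholder node names with actual provisioning calls.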
PRAGMA: Enabling the Long Tail of Team Science
ACIS FACILITIES
• State-of-the-art computing, storage and networking facilities
• Unique environment for experimental research and design of distributed systems that use virtualization software developed by commercial and open-source projects
• ~200 servers, ~1250 cores, ~4.8TB memory, ~260TB storage
• FutureGrid cluster: IBM iDataPlex connected to Florida Lambda Rail
• NUMAcloud: up to 64 cores and 512MB of memory in a single image
• Autonomic Testbed: autonomics for datacenter management
• VM and cloud: rich set of VMMs and cloud software
• Storage: centralized (IBM DS4800) and cloud-based (OpenStack)
Management goals and practices
• Goals
  • To succeed in mission
  • Solid reputation, high-quality, highly-cited work
  • Consistent ability to attract funding (sustainability)
  • Excellent IT research infrastructure/facilities
  • Growing research capacity and culture
• Practices
  • Make expectations clear to all; then trust, measure and verify
  • Pursue multiple sources of funding
  • Invest in facilities: hardware, software and space
  • Simplify, automate, create routines to eliminate overhead
The good
• Have best practices
  • Put them in writing
  • Known when joining the lab
  • Create wiki
  • Everyone to contribute
• Weekly research reports
  • Key questions: what, why, discoveries, show-stoppers
  • Research notes, lab results, papers read, drafts
  • Save everything in e-forum
The bad (and what to do about it)
• Everyone is different
  • Keep expectations the same for everyone, but provide flexibility on how to achieve them (schedule, style, topic…)
• You manage the research team you have, not the research team you would like to have
  • Adapt assignments to talents/capabilities
  • Enable team to acquire needed skills
  • Recruit wisely
  • Enable senior members to help junior members grow
The ugly (and how to live with it)
• People leave
  • Graduation, employment, change of plans…
  • Overlap stays of departing members and new researchers
• Some people and/or projects will fail
  • Prepare for failure recovery and learn from failure
• Funding will vary
  • Have “rainy day” funds (overhead return, gift monies…)
• Funding comes with work and overhead
  • Zero-sum game: there is a limit to what you can do
• Other “stuff” will happen (social, personal, political…)
  • Have everyone focus on doing good work …
Key points (noteworthy common sense)
1. Clear vision and mission (known and understood by everyone)
2. Clear metrics (measuring quality and quantity)
3. Best practices (documented and known from day one)
4. Routines and automation (simplify, minimize overhead, admin assistance if possible)
5. Train leaders (and give them ownership of lab initiatives)
6. Good resources, facilities and working environment
7. Diversify funding portfolio
8. Align the stars (maintain solid reputation)
9. Understand limitations and learn from mistakes
   • Insufficient number of faculty members
   • Too large a dependency on soft money
   • No think-time/slack to engage in large efforts or new initiatives
   • Institutional inertia in adopting new operational models
10. Do periodic SWOT analysis