![Page 1: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/1.jpg)
Leveraging HPCC Systems with Virtual Computing Lab
Vincent W. Freeh
Department of Computer Science
North Carolina State University
![Page 2: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/2.jpg)
Projects
Data Intensive Curriculum
Data
• Data at scale
• Storage management
• Data warehousing
• Data format
• Encryption, compression
• Meta-data, provenance
Distributed computing
• HPCC
• Hadoop
• NoSQL DBs
• Hive, Pig, ZooKeeper
• BOINC/REST/AWS+
Algorithms
• MR algorithm design
• Graph algorithms
Knowledge from information
• IR (information retrieval)
• Analytics
• Inverted index
• Text processing
• Clustering and classification
![Page 3: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/3.jpg)
Virtual Computing Lab
• Cloud infrastructure
  – Authentication
  – Privileges
• Highly flexible
  – Time limits
  – Concurrent reservations
  – Block allocations
• Images
  – User creation
  – Bare metal or virtual machine
  – Lab machine
  – Cluster environments
![Page 4: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/4.jpg)
History of VCL
• Begun in 2004 at NCSU
  – College of Engineering
  – Office of Information Technology
• Source donated to the Apache Software Foundation in 2008
  – Top-level Apache project
• Worldwide
  – More than 40 installations
![Page 5: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/5.jpg)
NCSU VCL Statistics
• Total reservations: > 1.4M
• Total hours: > 10M
• Unique images: > 3,000
Chart: Max concurrent reservations/day
![Page 6: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/6.jpg)
HPCC on VCL
• Project: Create HPCC image on VCL
• Why
  – No setup to use HPCC
  – Experience with HPCC cluster
• Goals
  – Standalone HPCC image
  – HPCC cluster
  – Not for production (yet)
![Page 7: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/7.jpg)
Standalone image
• To use HPCC
  – Install VMware or equivalent
  – Download VM image from HPCC Systems
    • Which one?
  – Create VM instance
  – Cross fingers
• With VCL
  – Create reservation
  – Log in
![Page 8: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/8.jpg)
VCL Timeline
new request
allocate VM/node
provision
start
initialize
notify user
![Page 13: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/13.jpg)
Issues
• Authentication
• Persistent storage
![Page 14: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/14.jpg)
Authentication
• SSH
  – Instance is “owned” by user who created reservation
  – Can ssh into image using campus ID and password
• ECL Watch
  – Web page
  – Needs to be password protected
![Page 15: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/15.jpg)
Authentication
• Two methods
  – LDAP
    • Not working (at this time)
    • Need to authenticate with campus LDAP server
  – .htaccess
![Page 16: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/16.jpg)
.htaccess
• Create random password
• Create .htaccess file
• (Re)start ECL Watch
• Email password to user
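The first two steps above can be sketched in Python. This is only an illustration, not the script used in the project: it assumes the password file holds Apache htpasswd-style `user:hash` entries (the slides do not say), the user name `student1` is hypothetical, and the unsalted `{SHA}` scheme is chosen only because it needs no third-party library. Restarting ECL Watch and emailing the password are site-specific and are left out.

```python
import base64
import hashlib
import secrets
import string


def make_password(length: int = 16) -> str:
    """Return a random password drawn from letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def htpasswd_line(user: str, password: str) -> str:
    """Build one htpasswd-style entry using the {SHA} scheme.

    Assumption: the web server accepts Apache's {SHA} format
    (base64 of the raw SHA-1 digest). A real deployment would
    likely prefer a salted scheme such as bcrypt.
    """
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{user}:{{SHA}}{encoded}"


if __name__ == "__main__":
    user = "student1"  # hypothetical; would be the reservation owner
    password = make_password()
    # The entry would be written to the password file; the plain
    # password would then be emailed to the user.
    print(htpasswd_line(user, password))
```

Generating a fresh password per reservation means no credential has to be baked into the generic image.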
![Page 19: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/19.jpg)
Persistent storage
• NCSU
  – AFS storage
  – Limited
• VCL image
  – Mounts AFS as remote disk
  – Spray and despray from/to AFS
  – Done manually
![Page 20: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/20.jpg)
Persistent storage issues
• AFS too small
• Multiple datasets
• Sharing
• Specific to NCSU
![Page 21: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/21.jpg)
HPCC Cluster Image
• Use VCL cluster environment
  – Parent-child
  – Any number
  – /etc/cluster
• HPCC cluster configuration
  – Cluster configurations vary
  – Many parameters and options
  – Complex
![Page 22: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/22.jpg)
Configuration
• Web page GUI
• Good for novice
• Good for persistent
![Page 23: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/23.jpg)
Cluster configuration
• environment.xml
  – Specifies configuration
  – Easy to get wrong
  – Command line tool
• Idea
  – Create several cluster VCL images
  – Dynamically create environment.xml on each node in image
  – Start HPCC services
![Page 24: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/24.jpg)
VCL Hooks
• Hook
  – Routine invoked by instance of image
  – Provides for dynamic configuration
  – Many hooks, at various points in the boot timeline
![Page 25: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/25.jpg)
Example: default user
• Image is generic
• Instance has specific user and access to user’s storage
• Hooks
  – Create user
  – Mount remote filesystem
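The idea of a generic image specialized by hooks can be illustrated with a small Python sketch. This is not VCL's actual hook interface. The `HookRegistry` class and the two hook functions are hypothetical, the stage names are borrowed from the VCL timeline slide, and the hooks only record what they would do rather than create users or mount filesystems.

```python
from typing import Callable, Dict, List

# Stage names come from the timeline slide; which stages really
# expose hooks in VCL is an assumption.
STAGES = ["provision", "start", "initialize", "notify user"]

Hook = Callable[[dict], None]


class HookRegistry:
    """Hypothetical registry that runs hooks at named boot stages."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = {s: [] for s in STAGES}

    def register(self, stage: str, hook: Hook) -> None:
        if stage not in self._hooks:
            raise ValueError(f"unknown stage: {stage}")
        self._hooks[stage].append(hook)

    def run(self, stage: str, context: dict) -> None:
        # Hooks run in registration order and share one context.
        for hook in self._hooks[stage]:
            hook(context)


def create_user(context: dict) -> None:
    # Placeholder: a real hook would add the account on the instance.
    context["log"].append(f"create user {context['user']}")


def mount_storage(context: dict) -> None:
    # Placeholder: a real hook would mount the user's remote filesystem.
    context["log"].append(f"mount storage for {context['user']}")


if __name__ == "__main__":
    registry = HookRegistry()
    registry.register("initialize", create_user)
    registry.register("initialize", mount_storage)
    context = {"user": "student1", "log": []}  # hypothetical user
    registry.run("initialize", context)
    print(context["log"])
```

The image itself stays generic; everything user-specific arrives through the context passed to the hooks at boot time.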
![Page 26: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/26.jpg)
Cluster Configuration
• Create environment.xml
  – Need node info for all nodes in cluster
  – Need cluster type (e.g., thor-only, thor+roxie)
  – Execute command line tool
• Set up ssh keys
• Start HPCC services
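A minimal Python sketch of the first step, under several assumptions: the node list is taken to have one `role= address` pair per line (a hypothetical format, since the slides do not show the file's contents), the parent is treated as the single support node, and in a thor+roxie cluster the children are split roughly evenly. The slides do not name the command line tool; the sketch assumes HPCC's `envgen` and the flags shown, which should be checked against the tool's help output for the installed version. The command is only built, not executed.

```python
from typing import Dict, List


def parse_nodes(text: str) -> Dict[str, List[str]]:
    """Parse 'role= address' lines (assumed format) into role lists."""
    nodes: Dict[str, List[str]] = {"parent": [], "child": []}
    for line in text.splitlines():
        if "=" not in line:
            continue
        role, address = (part.strip() for part in line.split("=", 1))
        if role in nodes and address:
            nodes[role].append(address)
    return nodes


def plan_roles(nodes: Dict[str, List[str]], cluster_type: str) -> Dict[str, int]:
    """Decide how many nodes serve each role for the cluster type."""
    children = len(nodes["child"])
    if cluster_type == "thor-only":
        thor, roxie = children, 0
    elif cluster_type == "thor+roxie":
        # Assumption: split children roughly evenly, thor gets the extra.
        roxie = children // 2
        thor = children - roxie
    else:
        raise ValueError(f"unknown cluster type: {cluster_type}")
    return {"support": len(nodes["parent"]), "thor": thor, "roxie": roxie}


def generator_args(plan: Dict[str, int], ip_file: str, env_file: str) -> List[str]:
    """Build the argument list for the generator (tool and flags assumed)."""
    return [
        "envgen",
        "-env", env_file,
        "-ipfile", ip_file,
        "-supportnodes", str(plan["support"]),
        "-thornodes", str(plan["thor"]),
        "-roxienodes", str(plan["roxie"]),
    ]


if __name__ == "__main__":
    sample = "parent= 10.0.0.1\nchild= 10.0.0.2\nchild= 10.0.0.3\n"
    plan = plan_roles(parse_nodes(sample), "thor-only")
    print(generator_args(plan, "ips.txt", "environment.xml"))
```

Because every node sees the same node list, each node can run the same planning logic and arrive at the same configuration.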
![Page 27: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/27.jpg)
Issues
• Passwordless ssh
  – Share keys during load
  – VCL blocks general ssh
• Persistent storage
  – An even bigger problem
• Cluster configurations
  – Create a VCL image for each configuration
  – Essentially infinitely many possible configurations
  – What are the primary clusters?
![Page 28: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/28.jpg)
Teaching
• HPCC is a vehicle
  – Use HPCC to teach concepts
• What can be taught?
  – Applications (use ECL)
  – Distributed systems (evaluation)
  – System design (configuration)
  – Performance (identify bottlenecks)
![Page 29: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/29.jpg)
Summary
• HPCC on VCL
  – Standalone prototype
  – Cluster prototype
• Issues
  – LDAP
  – Persistent storage
  – SSH
![Page 30: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/30.jpg)
RESEARCH
![Page 31: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/31.jpg)
Extending ECL with Natural Language Processing (NLP)
• GATE: open source NLP system
• Java
• Pipeline of processing resources
• Add ECL routines to create and execute pipelines
![Page 32: HPCC Systems Engineering Summit Presentation - Leveraging HPCC Systems with VCL (Virtual Computing Lab)](https://reader030.vdocument.in/reader030/viewer/2022032419/55a2bfb51a28abfe3e8b4684/html5/thumbnails/32.jpg)
Elastic HPCC
• Elastic changes procurement (from capital to operating)
• Must effectively add or remove nodes
• Must efficiently access any data from any node