||Scientific IT Services (SIS)
Michal Okoniewski, Samuel Fux, Scientific IT Services, ETH Zurich
24.11.2016 1
Bioinformatics and cluster support by Scientific IT Services of ETH
Using the CLC Genomics Server with the EULER cluster
||Scientific IT Services (SIS) 24.11.2016 2
SIS @ ETH Zurich: An Integrative Approach
Consulting & Training
High-Performance Computing
Scientific Software and Data Management
Research Informatics
||Scientific IT Services (SIS) 24.11.2016 3
Bio/Medical/Social Sciences: Data Avalanche ...
Growing a quantitative branch
Computing know-how greatly varying
Availability of “big data”
Much larger data volumes
Increasing data complexity
More complex workflows
Large collaborations
More groups and sites
More people
Longer projects
||Scientific IT Services (SIS) 24.11.2016 4
… and Data Analysis / Management
Data Management
Provenance Tracking
Lifecycle Mgt.
Automation
Processing
Analysis
Data Integration
Visualization
Sharing
||Scientific IT Services (SIS) 24.11.2016 5
Triaging in the «Jungle» of Computing Options
||Scientific IT Services (SIS) 24.11.2016 6
Consulting and Training
Courses:
Best practices in scientific programming
Various Python courses
Introduction to Apache Spark for large scale data processing
Workshop on next-generation sequencing analysis using HPC
Usage of portals (e.g. proteomics data analysis)
Introduction to “electronic lab notebook”
Data management plans for research proposals
(sustainability, reproducible research)
Procurement of large scale computational infrastructures
||Scientific IT Services (SIS) 7
EULER cluster. Euler I (right) & II (left)
© 2015 Olivier Byrde
24.11.2016
||Scientific IT Services (SIS)
EULER stands for
Erweiterbarer, Umweltfreundlicher, Leistungsfähiger ETH Rechner
It is the 5th central (shared) cluster of ETH
1999–2007 Asgard ➔ decommissioned
2004–2008 Hreidar ➔ integrated into Brutus
2005–2008 Gonzales ➔ integrated into Brutus
2007–2016 Brutus
2014–2018+ Euler
It benefits from the 16 years of experience gained with
those previous large clusters
8
What is EULER?
24.11.2016
||Scientific IT Services (SIS)
Like its predecessors, Euler has been financed (for the most
part) by its users
Since 2014 over 70 (!) research groups from almost all departments of
ETH have invested in Euler
These so-called “shareholders” receive a share of the cluster’s
resources (processors, memory, storage) proportional to their
investment
The (small) share of Euler financed by IT Services is open to all
members of ETH
The only requirement is a valid NETHZ account
These “guest users” can use limited resources
If someone needs more computing power, he/she can invest in the
cluster and become a shareholder at any time
9
Shareholder model
24.11.2016
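The proportional rule of the shareholder model can be illustrated with a small calculation. This is only a sketch: the group names, the investment amounts and the pool size are hypothetical, and the function `allocate_shares` is not part of any Euler tool. It simply divides a resource pool in proportion to each group's investment.

```python
from fractions import Fraction


def allocate_shares(investments, total_cores):
    """Split total_cores among groups in proportion to their investment."""
    total = sum(investments.values())
    return {
        group: Fraction(amount, total) * total_cores
        for group, amount in investments.items()
    }


# Hypothetical investments (arbitrary units) and a hypothetical pool size.
investments = {"group_a": 50, "group_b": 30, "group_c": 20}
shares = allocate_shares(investments, total_cores=1000)

for group, cores in shares.items():
    print(f"{group}: {float(cores):.0f} cores")
```

The same rule would apply to memory and storage; exact fractions are used here so that the shares always add up to the full pool.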
||Scientific IT Services (SIS) 10
Shareholders by department (January 2016)
BSSE 5%
CHAB 14%
ERDW 25%
GESS 4%
MATH 8%
MATL 1%
MAVT 10%
USYS 14%
Other 4%
Public 11%
Cloud 3%
Admin 1%
24.11.2016
||Scientific IT Services (SIS)
Euler I (2014) 448 x HP BL460c Gen8 (352 x 64 GB, 32 x 128 GB, 64 x 256 GB)
Each node contains 2 x 12-core Intel Xeon E5-2697v2 @ 2.7 GHz
Euler II (2015) 768 x HP BL460c Gen9 (736 x 64 GB, 32 x 512 GB)
Each node contains 2 x 12-core Intel Xeon E5-2680v3 @ 2.5 GHz
Plus 4 very large memory nodes (4 x 3072 GB)
Euler III (2016) Over 1200 compute nodes with faster CPUs (3.0-3.5 GHz)
To be delivered in November 2016, in production in January 2017
High-speed networks 10-Gigabit Ethernet (Cisco) for file access
56-Gigabit InfiniBand (Mellanox) for inter-node communication
11
Hardware generations
24.11.2016
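As a cross-check of the node counts above, the totals for Euler I and Euler II can be derived from the per-node figures on the slide. The sketch below uses only those numbers. The four very large memory nodes are left out because their core count is not stated, and Euler III is left out because only an approximate node count is given.

```python
# Each regular node has 2 sockets with 12 cores each (from the slide).
CORES_PER_NODE = 2 * 12

# (number of nodes, memory per node in GB), as listed on the slide.
EULER_I = [(352, 64), (32, 128), (64, 256)]
EULER_II = [(736, 64), (32, 512)]


def totals(config):
    """Return (nodes, cores, memory in GB) for one hardware generation."""
    nodes = sum(count for count, _ in config)
    cores = nodes * CORES_PER_NODE
    memory_gb = sum(count * gb for count, gb in config)
    return nodes, cores, memory_gb


for name, config in [("Euler I", EULER_I), ("Euler II", EULER_II)]:
    nodes, cores, memory_gb = totals(config)
    print(f"{name}: {nodes} nodes, {cores} cores, {memory_gb} GB RAM")
```

The memory breakdowns add up to the stated node counts (448 and 768), giving 10752 cores and 43008 GB for Euler I, and 18432 cores and 63488 GB for Euler II.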
||Scientific IT Services (SIS) 12
Performance growth
[Figure: peak performance in TF (axis from 0 to 1200) by year, 2010 to 2016, for Brutus and Euler]
24.11.2016
||Scientific IT Services (SIS)
The only requirement to use Euler is a valid NETHZ account
No need to fill out an account request form
Immediate access using your NETHZ credentials
You can log in right now using your NETHZ account
ssh [email protected]
Your Euler account is created automatically upon first login
It becomes active once you have accepted the cluster’s usage rules
Euler uses the NETHZ database to identify shareholders and guest users, and sets privileges and priorities automatically
13
Who can use Euler
24.11.2016
||Scientific IT Services (SIS) 24.11.2016 14
Adding server plugin in your workbench
Start the Workbench as administrator
In the Workbench, go to Help => Plugins
||Scientific IT Services (SIS) 24.11.2016 15
Adding server plugin
Help => Plugins
||Scientific IT Services (SIS)
File => CLC Server Login
24.11.2016 16
Connect to CLC server
||Scientific IT Services (SIS)
Login with your CLC username and password
24.11.2016 17
Connect to CLC server
||Scientific IT Services (SIS) 24.11.2016 18
You have your own space for data on EULER
||Scientific IT Services (SIS)
Import
“Workbench” to get your local file
Choose destination on EULER
24.11.2016 19
Importing the data into the EULER space
||Scientific IT Services (SIS)
Local file
24.11.2016 20
Importing the data into the EULER space
||Scientific IT Services (SIS)
Choose a local file
24.11.2016 21
Importing the data into the EULER space
||Scientific IT Services (SIS)
Choose a destination on Euler
24.11.2016 22
Importing the data into the EULER space
||Scientific IT Services (SIS)
Choosing the right cluster queue for the compute jobs
“Grid” option
24.11.2016 23
Data processing on EULER
||Scientific IT Services (SIS)
Choosing the right cluster queue for the compute jobs
“Grid” option
24.11.2016 24
Data processing on EULER
||Scientific IT Services (SIS)
Multiple BLAST
Next-gen sequencing alignment to a genome
De-novo assembly
Alignment to contigs
….
Please choose the parallel queues only if you are running
one of these tasks
24.11.2016 25
Typical tasks that can use multiple cores
||Scientific IT Services (SIS)
Import the paired reads into the Workbench
24.11.2016 26
De-novo assembly example
||Scientific IT Services (SIS)
Choose the de-novo assembly option…
24.11.2016 27
De-novo assembly example
||Scientific IT Services (SIS)
… and Grid option with large memory
24.11.2016 28
De-novo assembly example
||Scientific IT Services (SIS)
Choose the input files…
24.11.2016 29
De-novo assembly example
||Scientific IT Services (SIS)
… and parameters
24.11.2016 30
De-novo assembly example
||Scientific IT Services (SIS)
Wait until the process is executed (surprisingly fast…)
24.11.2016 31
De-novo assembly example
||Scientific IT Services (SIS)
Check the created assembly
24.11.2016 32
De-novo assembly example
||Scientific IT Services (SIS)
It is enough to use Grid => single_core
Select the files
24.11.2016 33
Exporting the (bigger) data from EULER
||Scientific IT Services (SIS)
Be careful with “custom file name” when exporting multiple files: use
wildcards, e.g. {1}.{2} for filename and extension
24.11.2016 34
Exporting the (bigger) data from EULER
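Why the wildcards matter for multiple files can be shown with a small simulation. This is not the CLC implementation. It only assumes, as the slide states, that {1} stands for the file name and {2} for the extension. The helper `apply_pattern` and the example file names are hypothetical.

```python
from pathlib import Path


def apply_pattern(pattern, filename):
    """Fill {1} with the file name (no extension) and {2} with the extension."""
    path = Path(filename)
    name = path.stem
    extension = path.suffix.lstrip(".")
    return pattern.replace("{1}", name).replace("{2}", extension)


# Hypothetical export of two result files.
files = ["sample_A.fa", "sample_B.fa"]

with_wildcards = [apply_pattern("{1}.{2}", f) for f in files]
fixed_name = [apply_pattern("export.fa", f) for f in files]

print(with_wildcards)
print(fixed_name)
```

With the wildcard pattern every exported file keeps a distinct name, whereas a fixed custom name maps all files to the same target name.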
||Scientific IT Services (SIS)
The destination can be one of the admins’ folders
Talk to the admin
To get the files
To get your own export folder on EULER
24.11.2016 35
Exporting the (bigger) data from EULER
||Scientific IT Services (SIS)
Scientific IT Services and ID representatives in DBIOL are
glad to help you with your bioinformatics needs
CLC support
Help with parallelizing big computing tasks on EULER
Help with the EULER command line software stack
Help with data co-analysis
Time slots for small projects (up to 3 person-days of work) available
upon request and discussion
“Code clinic” for your programs/scripts
24.11.2016 36
Summary – support modes, Scientific IT Services
||Scientific IT Services (SIS)
https://sis.id.ethz.ch
https://scientific.ethz.ch
24.11.2016 37
Summary – support modes, Scientific IT Services
||Scientific IT Services (SIS)
Wiki
https://scicomp.ethz.ch
https://scicomp.ethz.ch/wiki/Getting_started_with_clusters
Ticket system
http://tinyurl.com/cluster-support (NETHZ authentication)
Please do not send questions to individual members of the team
Person-to-person
Contact us to set up an appointment at your place
Visit us at Weinbergstrasse 11, WEC, D floor (please call first)
38
Contact / Getting help - EULER
24.11.2016
||Scientific IT Services (SIS)
Samuel Fux – application specialist
EULER software stack
CLC server administration
Michal Okoniewski – bioinformatician
bioinformatics support
co-analysis projects
code clinic
24.11.2016 39
Responsible for bioinformatic software in SIS
Thank you for your attention!
Questions/comments?
24.11.2016 40