analytics & ai software suite - cray · large-scale analytics, machine learning and deep...

4
Analytics & AI Software Suite

Upload: others

Post on 20-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Analytics & AI Software Suite - Cray · Large-scale analytics, machine learning and deep learning have become a significant workload on HPC systems. Research shows the percentage

Analytics & AI Software Suite

Page 2: Analytics & AI Software Suite - Cray · Large-scale analytics, machine learning and deep learning have become a significant workload on HPC systems. Research shows the percentage

Large-scale analytics, machine learning and deep learning have become a significant workload on HPC systems. Research shows the percentage of total system utilization devoted to each is climbing — and it shows no signs of stopping. Data science isn’t an option — it’s an imperative. And it’s an imperative that requires the right technology.

Data-intensive approaches demand data-intensive systems capable of handling datasets from terabytes to exabytes and at speeds from teraflops to petaflops. Scale matters, and commodity clusters can neither support the magnitude of data nor the critical research dependent on that data. You need analytics at supercomputer scale to take advantage of growing datasets.

Your solution is the Cray® Urika®-XC analytics and AI software suite. The Urika-XC suite brings big data analytics, artificial intelligence (AI) and large-scale graph processing to the massively scalable Cray® XC™ series supercomputer. Combined with the Urika-XC soft-ware suite, an already-powerful supercomputer for simulations becomes an equally powerful system for big data analytics and AI.

Complete solution for simulation, analytics and AIFrom an operational perspective, you need technology that yields results and maximizes your investment. From a scientific perspective, you need transformative insights and the time to explore them. The Cray Urika-XC suite of software capabilities meets both needs. Optimized to run seamlessly on XC systems, the Urika-XC suite of applications and tools extends core supercomputing capabilities, enabling converged simulation and analysis workloads that fully leverage the system’s massive scale.

Tools that deliver richer insights The Urika-XC software suite brings together modern tools and languages to power richer insights and reduce the need for already scarce talent. It consists of two core components:

• Open Source Analytics. This collection of popular frameworks, languages and tools enables analytics, machine learning and deep learning at scale.

• Cray Graph Engine. This unique in-memory semantic graph database, implemented using HPC technology, handles the largest and most demanding use cases requiring graph discover and pattern matching.

Drive deep learning to scale You can capitalize on the XC series supercomputer’s scale and performance for your deep learning research with the Cray-developed CrayPE Machine Learning plugin (CrayPE ML). Exclusive to the Urika-XC software suite, this plugin — delivered in conjunction with the industry-leading machine intelligence framework TensorFlow™ — leverages the Aries™ interconnect to give you scaling to more than 500 nodes on both CPUs and GPUs. Additionally, the CrayPE ML plugin eliminates time-consuming administrative tasks for you. For example, it automatically defines the nodes to use, removing the burden of determining how many to use and where to put them. Overall, the CrayPE ML plugin makes it easy to perform deep learning training on the XC series supercomputer.

How the Urika-XC Suite Delivers Better TCO and More Discovery

Improved TCO

● Provide your users with a complete portfolio of simulation and data-intensive processing capabilities

● Eliminate the costs of buying and administrating a purpose-built analytics machine

● Run analytics, AI and simulation workloads all on the same system

More Discovery

● Make breakthrough discoveries with powerful analytics and deep learning tools

● Get faster time-to-insight — scale analytics environments to 256 nodes and distributed deep learning to 500 nodes

● Capitalize on next-gen developer languages and tools such as Apache® Spark™ and Python ecosystems

Page 3: Analytics & AI Software Suite - Cray · Large-scale analytics, machine learning and deep learning have become a significant workload on HPC systems. Research shows the percentage

Open Source Analytics The Cray Open Source Analytics (OSA) package gives you open source frameworks, programming environments and distributed deep learning — all complemented by Cray expertise and support. We’ve integrated popular open-source packages into ready-to-run containers designed to take advantage of the performance and scale of the XC series systems. And because we provide direct support for upgrades and bug fixes, it eliminates the need to continually integrate updated open source packages.

With the OSA package comes frameworks that can be used to dynamically allocate distributed clusters of nodes for large-scale work:

Apache Spark fast and general-purpose engine for large-scale data processing with an extensive set of libraries and packages for analytics and machine learning uses

Distributed Dask and Python for open data science analytics

Jupyter Notebook for interactive data science

Distributed Deep Learning for artificial intelligence

• Google TensorFlow machine intelligence framework delivered as a pre-tested and integrated application for easy deployment and faster time to accuracy

• CrayPE Machine Learning plugin for CPU and GPU scaling and simplified ease of use

OSA capabilities are delivered in a Shifter container for fast deployment. Frameworks are optimized to run in conjunction with existing system workload managers and you can easily add updates and new features.

Cray Graph Engine The Cray Graph Engine (CGE) is a graph analytics platform optimized for large-scale pattern matching and graph discovery applications. Leveraging the Aries network and software infrastructure, this in-memory semantic graph database enables real-time analytics on the largest and most complex datasets, using graph data structures and problems. It features highly optimized support for inference, deep graph analysis and pattern-based queries. CGE is based on W3C industry standards and implemented using HPC technologies. It has been tested against a 400 billion collection of tuples and shows a nearly 100x performance speed-up over Spark-based GraphX.

• Java, Scala, Python and R

• MLlib, Graph-X, Spark SQL and Spark Streaming

• Maven project management

• Sbt (Scala build tool)

• BigDL

• Anaconda Python-based open source data science tools and package management for out-of-the-box data science

• Distributed Dask for dynamically allocating a distributed memory cluster for Python-based analytic processing

• Live, shareable documents for data preparation, code development, modeling simulation and machine learning

AI SupportThe Urika-XC suite supports artificial intelligence in several ways:

● Data preparation. Spark- and Python-based tools import, transform and prepare data for use by deep learning frameworks.

● Model training. For training within the Apache Spark environment, use MLlib or BigDL. For training that uses CPU or GPU nodes, use TensorFlow with the CrayPE ML plugin for faster time to accuracy.

● Deep learning. TensorFlow enhanced with the CrayPE ML plugin and Apache Spark-based BigDL deep learning frameworks.

Page 4: Analytics & AI Software Suite - Cray · Large-scale analytics, machine learning and deep learning have become a significant workload on HPC systems. Research shows the percentage

© 2018 Cray Inc. All rights reserved. Specifications are subject to change without notice. Cray, the Cray logo and Cray Urika-XC are registered trademarks, and Cray XC is a trademark of Cray Inc. All other trademarks mentioned herein are the properties of their respective owners. 20180621ES

Cray Inc. 901 Fifth Avenue, Suite 1000 Seattle, WA 98164 Tel: 206.701.2000 Fax: 206.701.2500 www.cray.com