Secrets for Successful Deep Learning Projects



For any machine learning project, we start with what we're trying to learn. This means there's a corpus of data which may directly or indirectly contain insight we want to use. Usually this insight is second-order information, meaning it requires more than simple retrieval of records from fields in the data set or simple operations on flat data structures. Generally, machine learning is most effective in situations where the number of dimensions required to derive insight goes beyond what humans can reasonably handle on their own, whether because of data volume, breadth, or complexity. Once the effort required to repeatedly derive insight from a dataset becomes troublesome and time consuming, machine learning shines.

The key to characterizing a problem that machine learning can help us understand lies in how we frame and go about solving it. Anything requiring deductive reasoning; joins, unions, and intersections of large multivariate data; or repetitive analysis of large or continuous data sets is a great candidate. There are no magic bullets and no such thing as a "general-purpose machine learning" algorithm (at this time).

DEEP LEARNING IS GREAT: BUT IT MAY NOT YET BE READY FOR US…

We’ve heard it all. Computers are able to work longer, don’t get tired, don’t make errors, and continue to do the same thing over and over without complaint. Machine learning is revolutionizing every industry on the planet.

But a problem persists: how do you go from an idea in your head that might have merit to getting a computer to work on it for you? How do you move from "well, that might work" to insight that helps solve your problem? This article aims to make that process easier.


With no general-purpose algorithm available, we must rely on subject matter experts and computer scientists working in concert toward a methodology that applies to our particular problem. We can, however, codify what a general-purpose machine learning workflow might look like to help us understand the breadth of the project.

Getting started, we spend a lot of time thinking about the problem we're solving and the wider workflow that addresses it. No amount of algorithmic efficiency or framework acceleration will, on its own, make a difference to the success or failure of the project without solving data ingress and insight egress. This is probably the biggest mistake made by new machine learning projects, and the one which results in the most frustration and lost return on investment. The project is effectively useless without understanding where the data originates, handling it properly as it moves and transforms through the learning system, and then pushing results into systems designed to handle or react to insight. Determining ROI for the project must consider all these aspects of the workflow. Luckily, there are many reliable open source engines for building data-manipulation pipelines. We'll illustrate one of these workflow systems (the Apache Spark ecosystem) in action as we follow along with Cray's recent research in Nowcasting.
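To make the pipeline idea more concrete, the sketch below shows what a minimal ingress-transform-egress skeleton can look like in PySpark. The paths, column names, and aggregations are hypothetical placeholders for illustration, not the actual Cray workflow.

```python
# Minimal PySpark sketch of a data-ingress -> transform -> egress pipeline.
# Paths, schema, and transformations are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-transform-egress").getOrCreate()

# Ingress: read raw observations from wherever they land (format and path assumed).
raw = spark.read.parquet("s3://example-bucket/raw-observations/")

# Transform: drop incomplete records and bucket observations by hour.
clean = (
    raw.dropna(subset=["station_id", "measurement", "observed_at"])
       .withColumn("hour", F.date_trunc("hour", F.col("observed_at")))
)

# A simple aggregation standing in for feature preparation.
features = clean.groupBy("station_id", "hour").agg(
    F.avg("measurement").alias("mean_measurement"),
    F.max("measurement").alias("peak_measurement"),
)

# Egress: write the prepared features where the training step can pick them up.
features.write.mode("overwrite").parquet("s3://example-bucket/training-features/")
```

The useful habit here is structural: every stage (read, clean, derive, write) is an explicit step that can be inspected, timed, and swapped out as the project grows.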

Standard weather forecasting techniques use an approach known as numerical weather prediction (NWP). NWP uses observations of the Earth (e.g. from satellites, planes, radars and ground stations) to build a representation of the state of the atmosphere, which is then projected forward in time using a physics-based model of the atmosphere. This is a computationally intensive process requiring dedicated high performance computing resources, with the first forecasts typically available about an hour after the start of the process.

Diagram 1: Generalized Machine Learning Workflow


Nowcasting is used to fill the gap between current observations and the availability of the first NWP-derived forecasts. This is particularly important for localized severe weather events such as very heavy precipitation. Recently, engineers at Cray investigated precipitation Nowcasting using deep learning models trained on historical data.

In Diagram 2, historical NEXRAD radar data, supplied by the National Oceanic and Atmospheric Administration (NOAA) in a public repository, was used as the training set to create the Nowcasting model. NEXRAD data is not in a format readily consumable by any of the common learning frameworks, so the team leveraged Python extensively, both to experiment and to move the data through the system. The training of the deep learning model appears relatively late in the overall pipeline, though this is a bit misleading in this case: the end goal of the project was to investigate deep learning-based approaches for Nowcasting, not to carry those results further. In a production deployment, the insight from the trained models would go on to inform reporting, emergency management, transportation, and other reactive systems that consume the forecasts.

Diagram 2: Nowcasting Workflow

Data collection, transformation, sampling, and training make up the basis for all learning pipelines. Despite the focus on machine learning and the insight gained from it, a significant amount of Cray's effort was spent moving, massaging, and manipulating the data before finally feeding it into the systems that learn from it. This is not because the machine learning algorithms and models are unable to handle the data we feed in, but simply because the data often is not in the format those particular models need. In the Nowcasting example, data is converted from a radial data structure to a Cartesian grid format suitable for ingestion by the learning models. This is where the interaction between subject matter experts and software developers is crucial: it ensures that data manipulation preserves the important content while streamlining efficiencies. Many of these changes are outside the scope of weather prediction itself, and are more concerned with data formatting, normalization, serialization, and marshaling. For every efficiency gained through model tuning and data refinement, we pay a penalty of inefficiency in copy time, storage, retrieval, augmentation, or manipulation.
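The whitepaper does not show the team's conversion code, but one way to perform this radial-to-Cartesian regridding in Python is with the open source Py-ART library. The sketch below is our own illustration, assuming a single NEXRAD Level-II archive file and an arbitrary grid size; it is not the pipeline Cray used.

```python
# Sketch: regrid NEXRAD Level-II radial data onto a Cartesian grid with Py-ART.
# The input file name, grid shape, and spatial extent are illustrative choices only.
import numpy as np
import pyart

# Read one NEXRAD Level-II volume scan (radial/polar coordinates).
radar = pyart.io.read_nexrad_archive("KTLX20130520_201643_V06.gz")

# Map the radar sweeps onto a regular Cartesian grid
# (1 vertical level x 256 x 256 horizontal points, +/- 150 km around the radar).
grid = pyart.map.grid_from_radars(
    radar,
    grid_shape=(1, 256, 256),
    grid_limits=((500.0, 500.0),           # single height level at 500 m
                 (-150000.0, 150000.0),    # y extent in meters
                 (-150000.0, 150000.0)),   # x extent in meters
    fields=["reflectivity"],
)

# Extract the gridded reflectivity as a NumPy array ready for a learning model.
frame = np.ma.filled(grid.fields["reflectivity"]["data"][0], fill_value=0.0)
print(frame.shape)  # (256, 256)
```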

This means that for the foreseeable future it is absolutely necessary to pair a computer scientist or seasoned software developer with subject matter experts. Subject matter experts properly frame problems and explain the insight they need to solve those problems and make a difference to the organization. Software developers translate problem parameters and solution outcomes into effective algorithms and models, and handle data marshaling. Currently there are no subject matter-specific machine learning frameworks available; Cray developed its own Nowcasting models by applying modern neural network research to historical data. All specialized machine learning frameworks and ecosystems, whether proprietary or open source, derive from a mathematics and computer science background regardless of the content analyzed within them. While this makes for great efficiencies in software design, the vast majority of work in any actual data pipeline remains outside the scope of the machine learning software and is left as an exercise for the user. Fair or unfair, this is the current state of machine learning.

Should you embark on a machine learning journey? The answer is a resounding yes, but a yes complicated by additional requirements surrounding the data ecosystem and workflow. Start with the problem. Figure out what you don't know. Get a rough idea of where the data that might contain the answer (or at least point you toward one) lives, and what format it is in. Bringing source data and the proper format together gets you a long way toward your first successful machine learning effort.

Treading that first road will be demanding. Don't lose heart. Deep learning is one of the most active areas of contemporary computer science and software development. Open source communities and private enterprise are working toward a common set of frameworks and tools with which users can assemble a functional machine learning system. Based on this trajectory, the world will see a general-purpose accelerated framework within the next few years. Such a framework would automatically discover and use purpose-built training and inference processors like Google's TPU platform, local or remote GPGPUs, or FPGAs, all for the general case. This would put control of the learning process into the hands of subject matter experts alone, decreasing time to market and dramatically reducing the manpower required to bring a project to light.

What's our recommendation? Simply put: jump into the pool now and swim with the pioneers a little while longer. Leverage the accelerators on hand to determine whether your deep learning problems can be solved valuably with techniques from the machine learning disciplines. To get started, realize that almost everyone has a perfectly serviceable GPU right in their laptop today, and if you're not already learning from your data, those laptop GPUs will absolutely amaze. Align your subject matter experts with software development teams, and then target the hardware acceleration needed to make your platform successful. A wide variety of acceleration technologies are available, running the gamut of budgets and techniques. There is no need to make an a priori decision about any particular hardware acceleration until you understand the basics of how your problem "learns best." Stay loose, as they say. And phone a friend: if you're stuck, there are plenty of pioneers out there who will help you.
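Before committing to any particular accelerator, it takes only a few lines to confirm what your current machine can already see. The probe below uses PyTorch purely as an example; TensorFlow and other frameworks offer equivalent checks.

```python
# Quick check of which accelerators a framework can see (PyTorch used as an example).
import torch

if torch.cuda.is_available():
    print("CUDA GPUs visible:", torch.cuda.device_count())
    print("Device 0:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU detected; training will fall back to the CPU.")
```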


DEFINITIONS

Machine Learning

Machine learning has been around for decades. Philosophically, there are two main camps around teaching computers to learn new things; it boils down to whether or not humans are directing the learning process. Note that in the examples below, many of the algorithms can be used in a variety of methodologies; we present them in the camp with which they tend to be associated most often in the wider literature. Generally speaking, most machine learning workflows use a combination of supervised and unsupervised learning methods to derive insight from the data set. Much of the information in this section comes from https://en.wikipedia.org/wiki/Deep_learning.

Supervised Learning

Supervised learning algorithms contend that we, as intelligent creatures, understand a particular process and chain of events, and therefore should impart our understanding and decision-making methods to computers. Generally, then, subject matter experts teach computers the same way we would teach small children: an expert develops a model of behavior in circumstances common to the problem we're learning to solve, and the computer then steps through millions of example behaviors while the supervisor points out when the computer makes a mistake and adjusts the model or training data (or both). Learning for computer systems resembles the way we train people on repetitive tasks, and often the causal relationships elucidated in the problem set are the same ones a subject matter expert would outline when explaining the process to a student. Some of the more popular supervised learning algorithms include decision trees, random forests, naïve Bayes classifiers, nearest neighbors, and support vector machines.
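As a concrete (if toy) illustration of the supervised pattern described above, the sketch below trains one of the listed algorithms, a random forest, on a labeled scikit-learn sample dataset and measures how well it generalizes. The dataset and parameters are illustrative only.

```python
# Minimal supervised-learning sketch: train a random forest on labeled examples,
# then check how well it generalizes to examples it has never seen.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)          # features and expert-supplied labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)                          # the "supervised" step: learn from labels

predictions = model.predict(X_test)
print("held-out accuracy:", accuracy_score(y_test, predictions))
```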

Unsupervised Learning

Unsupervised learning algorithms allow computers to come up with relationships and insights about processed information without regard to how a subject matter expert might go about gaining that information. This is most popular when we are exploring data sets for relationships we do not know, or only anticipate, and which might make a difference. It can sometimes be difficult for subject matter experts and laypeople to rely on unsupervised learning because there is no means by which the computer can "show its work". For example, a system might learn a relationship in the data, and while that relationship may be known to be relevant to the problem we're solving, the question of why it exists can be answered only through the mathematics of the algorithm sorting the data. While unsupervised learning allows for unique and novel approaches to understanding data, some of which can be very efficient and clean-cut, it struggles to explain complex dependent relationships and is therefore not usually relied upon in situations where strict adherence to causality or provenance is required. Some of the more popular unsupervised learning algorithms include principal component analysis, the method of moments, and a variety of clustering algorithms.
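By contrast, the following sketch applies two of the unsupervised algorithms named above, principal component analysis and k-means clustering, to a toy dataset with the labels deliberately withheld; the computer finds structure, but it is up to us to interpret it.

```python
# Minimal unsupervised sketch: reduce dimensionality with PCA, then cluster.
# No labels are provided; the algorithm groups points purely by their structure.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)        # labels deliberately ignored

# Project the 4-D measurements down to 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

# Group the projected points into 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_2d)

print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```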

Deep Learning

Deep learning is difficult to define because the field is in constant flux and new developments occur almost daily. Although it draws on both supervised and unsupervised methods of learning, there are some commonalities we can elucidate here. Generally, the notion of deeper understanding comes from layered relationships between data entities: we tend to take one layer of algorithmic results and feed it into another, "deeper" layer (which may or may not use the same or similar algorithms). Repeating this process helps develop a more "connected" or "hierarchical" result set with multiple levels of abstraction. Though not synonymous, the vernacular usage of deep learning often refers to one of its more common methodologies, the Deep Neural Network.

Deep Neural Network

Artificial neural networks (ANNs) aim to mimic the biology of learning by mirroring the behavior of animal brains. Deep Neural Networks (DNNs) are specific classes of ANNs that use multiple layers of abstraction both to partition large datasets (e.g. image recognition) and to develop preferred pathways (e.g. language recognition) which create strongly connected outputs. Popular with many of the emerging uses for machine learning, DNNs are involved in autonomous vehicles, speech and language recognition, image and video processing, population behavior prediction, genetic modeling, and other areas. The layers of processing in DNNs, especially those where inputs are fed from previous layers, are often called "convolutions", and you will sometimes hear the term "Convolutional Deep Neural Networks". Generally, these refer to a DNN tuned to generate, from multiple passes of inputs and outputs through abstractions, a structure that presents a hierarchical representation of the dataset.
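To make the layering idea concrete, here is a small convolutional network sketched in PyTorch (the framework choice is ours; the article does not say what the Cray team used). Each convolutional layer consumes the previous layer's output, building progressively more abstract representations.

```python
# Sketch of a small convolutional deep neural network: each layer consumes the
# previous layer's output, building progressively more abstract representations.
# Input/output sizes are arbitrary illustrative choices.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # first layer of abstraction
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # deeper layer, fed by the first
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

# One forward pass on a batch of 4 single-channel 64x64 "images".
model = SmallConvNet()
scores = model(torch.randn(4, 1, 64, 64))
print(scores.shape)  # torch.Size([4, 10])
```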

NEXRAD

In the context of this article, NEXRAD is the data generated by the "Next Generation Radar" observation stations and the format in which that data is presented to the public by the National Oceanic and Atmospheric Administration. For more information, see https://en.wikipedia.org/wiki/NEXRAD.


ABOUT CRAY

Global supercomputing leader Cray builds innovative systems and solutions enabling scientists, researchers, and engineers in academia, government, and industry to meet existing and future simulation, advanced analytics, and AI/deep learning challenges. Leveraging 40 years of experience in developing and servicing the world's most advanced supercomputers, Cray offers a comprehensive portfolio of high-performance computing, storage, and data analytics solutions delivering unrivaled performance, efficiency, and scalability. Cray's industry-leading technologies are available in configurations to meet every budget and need. Whatever your research question, Cray makes it easy to take advantage of high-performance computing advancements.

For more information on Cray solutions and systems, please visit www.cray.com

Copyright © 2017 Tabor Communications, Inc. All Rights Reserved.

Produced by Tabor Custom Publishing in Conjunction with EnterpriseTech
