
Page 1

Page 2

Agenda

NIAM – NASA Information Architecture and Data Management

NASA New Roles with Big Data

Big Data/Big Think

Big Data Tools & Technologies

NASA Examples

NESC Data

» Frangible Joint Data

» Business Data

Page 3

NIAM

NIAM – NASA Information Architecture and Data Management

» Establish effective information standards, digital strategies, and data management across the Agency

» This OCIO program is composed of five primary elements: Architecture and Standards, Management and Governance, Policy and Legal, User and Data Services, and Sustainment and Outreach

» The intent of the Information Architecture and Data Management Program is to provide capabilities that enable the Agency to efficiently acquire or generate, find and access, use and reuse, share and exchange, manage and govern, and store and retire information. The Agency Information Architecture defines a consistent information lifecycle context that overlays all Agency activity, whether it is engineering, science, operations, or institutional support.


Tech & Innovation Division – What We Do:

» Data Science: the collection, management, and analysis of data in order to produce information and drive decision making.
• Data Strategy, Data Governance, Data Lifecycle Management, Data Analytics Lab, Data Fellows Program, Data Stewards

» Open Innovation: the development of new innovation frameworks and techniques, and the development and delivery of machine-readable instructions to access, arrange, and apply data.
• Agency Open Data Mgmt, Digital Strategy Reporting, Space Apps Challenge, Innovation Incubator, Data Innovation Pipeline, Women in Data
• Open.NASA, Github/NASA, Data.NASA, Code.NASA, API.NASA

» Digital Integration: the delivery of enhanced digital capability to support data interoperability and accessibility to enable data insights and discovery.
• Data inventory/Registry, Tagging/discoverability, Data usability/APIs, Computing/Coding, Mission-focused tech applications

» Tech Infusion: research, prototype, and assess the creation, modification, and usage of processes or tools to solve a problem and meet stakeholder needs.
• Data Centric Architecture, Internet of Things, Software as a Service, Virtual Desktop Infra, Collaboration (Secure Fileshare, NASATube)

» Information Architecture: the development and use of models, policies, rules, or standards that govern how information is collected, stored, arranged, integrated, and put to use in data systems and in organizations.
• Standards and interoperability, IA Reference Architecture, Data Mining, Data Contract Language, Mission Support Tech Consulting

» Big Data: NIAM.NASA.Gov

Page 4

NASA New Roles with Big Data

Data Scientist: NASA has several, including one in the Office of the Chief Information Officer

Big Data Administration

Chief Data Officer/Chief Knowledge Officer: becoming very popular in the Federal Government and in industry

Big Data Stewards

Data Evangelists

Labs:

» Data Analytics Lab

» Internet of Things Lab

Page 5

More on Big Data Working Group

Over 60 subject matter experts from the NASA Centers and Mission Directorates

Heavy focus on sharing of latest projects

» techniques/tools used

» avoid duplication

» share licenses when appropriate

» new applications to problems

Best practices, upcoming symposiums, conferences, meetings

Tool demos, commercial or homegrown

Page 6

Big Data History

Late 1980s: IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse," while Bill Inmon wrote Building the Data Warehouse in 1992 (wiki.com)

1997: The term "Big Data" was coined by two Ames Research Center employees, Michael Cox and David Ellsworth, because the data they were processing was too big to run on their desktop computers

July 1999: Predictive analysis changes business as usual (Winshuttle.com)

February 2001: Doug Laney, an analyst with the Meta Group, publishes a research note titled "3D Data Management: Controlling Data Volume, Velocity, and Variety." This becomes known as the "3Vs" (Forbes.com)

2006: Hadoop is created to handle the explosion of web data (Gigaom.com)

June 2008: The term "Big Data" began gaining popularity, predicted to "create revolutionary breakthroughs" (Cra.org)

Page 7

What Is Big Data?

Definition

Big Data means lots of things all rolled into one phrase

• No single standard definition…

• Some think it's data too big to fit in Excel!

• Some think it's data too large to fit on one node

Gartner has defined Big Data as high-volume, high-velocity, high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making

Well-known definition: Big Data is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and extract value and knowledge from it

Page 8

What is Big Data?

5 V’s

Volume

• Huge data size

• Terabytes to Zettabytes of data

• Data coming from records, archives, transactions, tables and files

Velocity

• Streaming data, high speed of data flow

• Speed at which data is generated, processed, and moved, in real time or near real time

Variety

• Various data sources

• Data in many formats: structured, unstructured, text, multimedia

Veracity

• Trustworthiness, authenticity, reputation, accountability of the source, and accuracy of the data

Value

• The most important function of Big Data is turning data into information

Page 9

What is Big Data?

to NASA

Big Data describes the situation in which the collection, storage, and analysis of data exceeds the capability and capacity of conventional processing, storage, and methods

Necessitates new architectural approaches

NASA views Big Data challenges from the point of view of the data life cycle:

• Generation, Transport, Stewardship, Processing, Analysis, Access, Dissemination

Requirements for new technologies lie in areas such as:

• Data management

• Artificial intelligence

• Instrumentation

• Scalable cyber infrastructure

• Data storage and access methods

• Computational and collaboration systems

• Visualization

• Analytics algorithms

Page 10

Why Big Data?

What People Do With It


The predominant usage of Big Data analytics is to find relationships between various unconnected datasets and databases (a minimal sketch follows below)

» Finding patterns, "virtual schemas," which may be temporal/transient – primarily from web logs

» As opposed to relational databases (RDBMSs), which have pre-defined relationships between all of their data through the use of a schema
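As a rough illustration of this contrast, the hedged sketch below joins two datasets that were never designed to be related (an invented web-access log and an invented asset inventory), inferring an ad-hoc key at analysis time rather than relying on a schema defined up front. All file, column, and host names are illustrative assumptions, not part of the original slides.

```python
import pandas as pd

# Hypothetical, unconnected datasets: a web access log and an asset inventory.
web_log = pd.DataFrame({
    "client_host": ["lab42.example.nasa.gov", "ops7.example.nasa.gov"],
    "url": ["/telemetry/eva", "/reports/frangible-joint"],
    "bytes": [48213, 1024],
})
assets = pd.DataFrame({
    "hostname": ["lab42.example.nasa.gov", "ops7.example.nasa.gov"],
    "center": ["ARC", "LaRC"],
})

# Ad-hoc "virtual schema": relate the two tables on a key discovered at
# analysis time (the hostname), not on a foreign key defined in advance.
joined = web_log.merge(assets, left_on="client_host", right_on="hostname")
print(joined.groupby("center")["bytes"].sum())
```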

Page 11

Why Big Data?

Growth of Big Data is possible because of:

• Increased low-cost storage and processing power

• Availability of different types of data

• Distributed computing

• Virtualization techniques

• Increased sources/devices spewing data (e.g. Internet of Things)

• Increased ability to capture data (e.g. Hubble Space Telescope)

Data contains information of great business value. If we can extract the insights, we can make far better decisions

All major companies are developing Big Data strategies to reduce cost, substantially improve the time required to perform computing tasks, and offer new products/services

Businesses and Agencies that don't take Big Data seriously will be left behind

Page 12


Big Data Tools & Technologies

Big Data Landscape

Page 13

NASA Examples

Page 14

Langley Research Center’s

Deep Content Analytics and Deep Q&A

Deep Content Analytics: using Watson Content Analytics Technology

The goal is to provide the community with natural language processing technologies that will quickly make sense of internal/global knowledge by identifying trends and experts, aiding in discovery, and finding answers to questions with evidence

Key Collections Analyzed:

• Space Radiation
• Human-Machine Teaming
• Aerospace Vehicle Design
• NASA Lessons Learned
• Carbon Nanotubes
• Uncertainty Quantification
• Autonomous Flight
• Select NASA Technical Reports
• Model Based Engineering
• Space Mission Analysis

Deep Q&A: using the Watson Discovery Advisor platform

bigdata.larc.nasa.gov

Page 15


Knowledge Assistants Using Watson Content Analytics

Key Capabilities:

• Digest and analyze thousands of articles without reading them
• Rapidly identify groupings, trends, connections, and experts
• Explore technology gaps that could be leveraged
• Identify cross-domain leverage and research

Research areas shown: Autonomous Flight Research, Carbon Nanotubes Research, Space Radiation Research

• Analysis of 130,000 articles from a 20-year time span
• Analysis of 4,000 articles integrating scholarly and web content
• Analysis of 1,000 articles of NASA research from the Human Research Program
• Human-Machine Teaming, Uncertainty Quantification, and Vehicle Design being worked

Notes:

• Successfully demonstrated value and developed robust expertise and capability
• Buying licensed content is a challenge; working with NASA content, open content, and individual researchers' collections

Page 16

NASA Earth eXchange (NEX)

Big Data Experiment


500+ members, 2.3 PB of storage

http://nex.nasa.gov

Page 17

Distributed Active Archive Centers (DAACs)

Archives, processes, documents, and distributes data for all of NASA's Earth Science Division missions through 12 science-discipline-focused DAACs


Stores Earth Observing System (EOS) data

Feb 2017: the Global Imagery Browse Services (GIBS) team announced the availability of 80+ new layers

One example: Langley Research Center's Atmospheric Science Data Center (ASDC) supports more than 40 projects and has more than 1,800 archived data sets

Page 18

NASA’S Earth Observing System (EOS)

Earth Science mission data are processed by Science Investigator-led Processing Systems designed for missions and measurements

Stewardship of Earth Science data is conducted by Distributed Active Archive Centers that provide knowledgeable curation and science-discipline-based support

NASA develops tools for users to obtain needed data/information while minimizing burden associated with unwanted data

NASA engages with multiple US and international partners to facilitate use of data by broadest possible community with minimal effort and maximal consistency with other data sources

Open Data Policy (since 1994) - Data are openly available to all and free of charge except where governed by international agreements

Page 19

Earth Science Data and Information Policy

NASA commits to the full and open sharing of Earth science data obtained from NASA Earth observing satellites, sub-orbital platforms and field campaigns with all users as soon as such data become available.

There will be no period of exclusive access to NASA Earth science data. Following a post-launch checkout period, all data will be made available to the user community. Any variation in access will result solely from user capability, equipment, and connectivity.

NASA will make available all NASA-generated standard products along with the source code for algorithm software, coefficients, and ancillary data used to generate these products.

All NASA Earth science missions, projects, and grants and cooperative agreements shall include data management plans to facilitate the implementation of these data principles.

NASA will enforce a principle of non-discriminatory data access so that all users will be treated equally. For data products supplied from an international partner or another agency, NASA will restrict access only to the extent required by the appropriate Memorandum of Understanding (MOU).

http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/

Page 20

Example of Data Stored in the DAACs

Image captured on 29 January 2017 by the VIIRS instrument, on board the joint NASA/NOAA SNPP satellite.

Page 21

NASA Examples

Counting trees across the United States

Mission Impact: First-ever high-resolution forest cover dataset. Reduces uncertainties in forest carbon estimation in support of the NASA Carbon Monitoring System.

NEX production pipeline example: In this high-resolution image, the NEX team used a combination of machine-learning and computer-vision algorithms to identify urban tree cover over San Francisco and the surrounding area using data from the National Agriculture Imagery Program (NAIP) aerial imagery project. This map serves as a critical input for biomass estimations and a number of eco-physiological models.

POCs: Saikat Basu, Louisiana State University; Sangram Ganguly, NASA Ames Research Center

Application of machine learning to tree cover/land cover classification from very high-resolution data (1 m from NAIP), using a combination of feature extraction, normalization, and a Deep Belief Network (a minimal stand-in sketch follows below)

Input size: 70 TB per epoch

Feature vectors: 3 PB
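The slide names a pipeline of feature extraction, normalization, and a Deep Belief Network over NAIP imagery. As a minimal, hedged stand-in (not the NEX team's actual code), the sketch below mirrors the same pipeline shape on tiny synthetic image patches, using scikit-learn's StandardScaler for normalization and an MLPClassifier in place of the Deep Belief Network; all names, sizes, and labels are illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for NAIP patches: 200 patches of 8x8 pixels x 4 bands (R, G, B, NIR),
# flattened into feature vectors. Real inputs would be ~1 m resolution aerial imagery.
patches = rng.random((200, 8 * 8 * 4)).astype(np.float32)
labels = rng.integers(0, 2, size=200)  # 1 = tree cover, 0 = other land cover

# Normalization + classifier mirror the normalization + Deep Belief Network stages
# described on the slide; the MLP is a simple stand-in for the DBN.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0),
)
model.fit(patches[:150], labels[:150])
print("held-out accuracy:", model.score(patches[150:], labels[150:]))
```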

Page 22

Small Data Turns Big

What problem or problems are we solving?

» Structured Data (Financials)

• Labor cost

• Financial cost

• Duration vs cost

• Labor count vs cost

• External Labor vs duration

» Unstructured & Structured Data (Technical)

• Assessment similarities by technical data

• Findings

• Observations

• Recommendations

• Lessons Learned

Financial with Technical Engineering Data

Who needs the problem solved?

» Management

» Cost analysts

» Engineers

Page 23

Extract, Transform, and Load (ETL)

NESC assessments supply both financial data (structured) and technical data (unstructured). Both are transformed and integrated, with text analytics applied to the unstructured technical content, so that financial and technical data can be shown together (a minimal ETL sketch follows below).
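A hedged, minimal sketch of the ETL flow described above, assuming hypothetical inputs (an assessments_financial.csv export and a folder of per-assessment findings text files) and loading into a local SQLite database. None of these file names, field names, or schemas come from the slides.

```python
import csv
import glob
import sqlite3
from pathlib import Path

# Hypothetical inputs, standing in for NESC assessment exports.
FINANCIAL_CSV = "assessments_financial.csv"   # structured: assessment_id, labor_cost, duration_months
FINDINGS_DIR = "findings_text"                # unstructured: one .txt file per assessment

conn = sqlite3.connect("nesc_integrated.db")
conn.execute("CREATE TABLE IF NOT EXISTS financial (assessment_id TEXT, labor_cost REAL, duration_months REAL)")
conn.execute("CREATE TABLE IF NOT EXISTS findings (assessment_id TEXT, finding_text TEXT)")

# Extract + transform the structured financial data (normalize field names, cast types).
with open(FINANCIAL_CSV, newline="") as f:
    for row in csv.DictReader(f):
        conn.execute(
            "INSERT INTO financial VALUES (?, ?, ?)",
            (row["assessment_id"].strip(), float(row["labor_cost"]), float(row["duration_months"])),
        )

# Extract the unstructured technical findings; keep the assessment id from the file name.
for path in glob.glob(str(Path(FINDINGS_DIR) / "*.txt")):
    conn.execute("INSERT INTO findings VALUES (?, ?)", (Path(path).stem, Path(path).read_text()))

conn.commit()

# The load step pays off: financial and technical data can now be queried together.
for row in conn.execute(
    "SELECT f.assessment_id, f.labor_cost, length(t.finding_text) "
    "FROM financial f JOIN findings t USING (assessment_id)"
):
    print(row)
```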

Page 24

Small Data Turns Into Big Data

About 1,000 assessments of structured and unstructured content (assessment findings, observations, recommendations, and lessons learned), once transformed with textual analytics, become more than 1,000,000 rows of data (see the sketch below).
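A hedged illustration of how textual analytics multiplies row counts: tokenizing each finding into one row per (document, term) pair turns a handful of records into many more rows. The sample findings are invented for the example, not taken from NESC data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented stand-ins for assessment findings/observations/recommendations.
findings = [
    "Frangible joint test data showed higher than expected shock loads.",
    "Recommend standardizing units of measure across all test records.",
    "Observation: labor cost grows faster than assessment duration.",
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(findings)
terms = vectorizer.get_feature_names_out()

# One row per (document, term) occurrence -- the "small data turns big" step.
rows = []
for doc_idx, term_idx in zip(*matrix.nonzero()):
    rows.append((int(doc_idx), terms[term_idx], int(matrix[doc_idx, term_idx])))

print(f"{len(findings)} findings expanded to {len(rows)} term rows")
```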

Page 25

Visual Analytics

Page 26

Frangible Joint Task

Page 27

Frangible Joint Task

The data complexity and output format required some basic data-to-database translation to improve quality control

» Standardize units of measure and data field names, and make data corrections/updates as required (a minimal sketch follows below)

» The Database team systematically reviewed and coordinated specific terminology with the FJ Assessment Test team

» The Database team and the Test teams worked closely, and this collaboration led to a uniform process for building the database
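A hedged sketch of the kind of unit and field-name standardization described above; the field names, unit conversions, and sample record are illustrative assumptions, not values from the frangible joint database.

```python
# Map source field names to standardized database column names (illustrative).
FIELD_MAP = {
    "PeakPressure_psi": "peak_pressure_kpa",
    "Temp_F": "temperature_c",
    "TestDate": "test_date",
}

# Unit conversions applied while translating data into the database.
CONVERSIONS = {
    "peak_pressure_kpa": lambda psi: psi * 6.894757,       # psi -> kPa
    "temperature_c": lambda f: (f - 32.0) * 5.0 / 9.0,     # degF -> degC
}

def standardize(record: dict) -> dict:
    """Rename fields and convert units for one raw test record."""
    out = {}
    for src_name, value in record.items():
        name = FIELD_MAP.get(src_name, src_name.lower())
        out[name] = CONVERSIONS[name](value) if name in CONVERSIONS else value
    return out

raw = {"PeakPressure_psi": 1450.0, "Temp_F": 72.5, "TestDate": "2016-10-07"}
print(standardize(raw))
```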

Page 28

Frangible Joint Task

Visual Basic was used to create application macros to extract and prepare the data, lessening process time and improving repeatability.

The users were provided an interface to query the database.

The data can be exported in various formats to a spreadsheet or other analysis tools.

The Database team also provided specific reporting formats, per user requirements.

» There were some limitations on how much of the 113,000,000 records could be downloaded at one time because of user computer and/or software limitations. For example, an Excel worksheet can hold only about 1,000,000 rows (1,048,576), and 32-bit installations are further constrained by available memory, so large extracts must be split into chunks (a chunked-export sketch follows below).
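A hedged sketch of exporting a very large table in spreadsheet-sized chunks, assuming the frangible joint data already sits in a local SQLite table named fj_records (a hypothetical name); each output CSV stays under Excel's 1,048,576-row worksheet limit.

```python
import csv
import sqlite3

EXCEL_ROW_LIMIT = 1_048_576            # rows per worksheet
CHUNK = EXCEL_ROW_LIMIT - 1            # leave one row for the header

conn = sqlite3.connect("frangible_joint.db")        # hypothetical database file
cursor = conn.execute("SELECT * FROM fj_records")    # hypothetical table name
headers = [d[0] for d in cursor.description]

part = 0
while True:
    rows = cursor.fetchmany(CHUNK)
    if not rows:
        break
    part += 1
    with open(f"fj_export_part{part:03d}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)
    print(f"wrote part {part} with {len(rows)} rows")
```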

Page 29

Frangible Joint Task, Main Report Page

Page 30

Frangible Joint Task, Report Example

Page 31

Backup

Page 32

Introduction to Big Data

(aka: Big Data 101)

John D. Sprague

Acting Associate CIO,

Technology & Innovation Division

10/07/2016


Page 34

Big Data Return On Investment

A few early results:

» Visa sped up crunching 36 TB of data from one month to 13 minutes with Hadoop in the cloud

» Google's self-learning voice recognition via probability calculations reduced errors by up to 25 percent

» Nestle saves $1B annually by analyzing big data

» Best Buy found that 7% of its customers accounted for 43% of sales and reorganized its stores around them

» Wal-Mart organizes in real time from 2.5 PB databases

» JPL sped up processing of data from two weeks to four hours by using the cloud

Page 35

Big Data Tools & Technologies

Big Data Landscape

Page 36

NASA Open Gov

Increase Agency transparency and accountability to external stakeholders

Enable citizen participation in NASA missions (prizes and challenges, citizen science)

Improve internal NASA collaboration and innovation


Encourage partnerships that can create economic opportunity

Institutionalize Open Gov philosophies and practices at NASA

Located: https://open.nasa.gov/open-gov/

Page 37

NASA's Pleiades Supercomputer

State-of-the-art technology for meeting the agency's supercomputing requirements, enabling scientists and engineers to conduct mission modeling and simulation.

5.95 Pflop/s LINPACK rating (#13 in the world, November 2016).

NASA's Interface Region Imaging Spectrograph (IRIS) spacecraft flying above the solar surface at 6,000 miles, showing only light emitted by plasma at a temperature of about 35,000 degrees Fahrenheit.

https://www.nas.nasa.gov/hecc/resources/pleiades.html

Predictive Modeling for NASA's Entry, Descent, and Landing (EDL) Missions.

Image Credit: Mats Carlsson, University of Oslo

Page 38

Unmanned Missions

Hubble Space Telescope, deployed in April 1990

First major optical telescope to be placed in space

More than 1.3 M observations, 14K scientific papers

The Hubble archive contains more than 120 terabytes, and Hubble science data processing generates about 10 terabytes of new archive data per year


James Webb Space Telescope (JWST)

» Space-based observatory, optimized for infrared wavelengths, which will complement and extend the discoveries of Hubble. Launches in 2018

» 6 times larger in area than Hubble

Page 39

Moon Laser

(Lunar Laser Communication Demonstration)

Pulsed laser beam set record in data transmission

» Transmitted data over 239,000 miles between the Moon and Earth at a record-breaking download rate of 622 Mbps

Developed by MIT's Lincoln Laboratory; hosted on LADEE

Prep for Laser Communications Relay Demonstration (LCRD)

» Transmits data 10 to 100 times faster than the fastest radio-frequency systems and uses significantly less mass/power

» Being built at NASA's Goddard Space Flight Center

» First time NASA has contracted to fly a payload on an American-manufactured commercial communications satellite, launching in mid-2019

Page 40

Page 41

NASA Examples – Extra Vehicular Activity (EVA) Data Integration

Just completed Phase 2

State-of-the-art platform using these tools:

• Core platform based on the ELK Stack
• Prelert for anomaly detection
• Hosted in the AWS Government Cloud
• Various databases, including RethinkDB, Neo4j, Amazon RDS, and PostgreSQL
• APIs deployed as microservices using Docker containers
• Amazon S3 for document storage
• Integrated with the jBPM workflow engine
• UI framework: HTML5, Angular, Bootstrap, Materialize CSS
• 2D and 3D models for browsing

Access to all EVA data – search, browse, navigate, and analyze (a minimal search sketch follows below)
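Since the platform's core is the ELK Stack, the hedged sketch below shows how a client might search indexed EVA records through Elasticsearch's standard REST _search endpoint. The host, index name, and field names are hypothetical, not taken from the actual system.

```python
import json
import urllib.request

# Hypothetical Elasticsearch endpoint and index for EVA records.
ES_URL = "http://localhost:9200/eva-records/_search"

query = {
    "query": {"match": {"text": "suit water intrusion"}},  # full-text match on a hypothetical field
    "size": 5,
}

req = urllib.request.Request(
    ES_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    hits = json.load(resp)["hits"]["hits"]

for hit in hits:
    print(hit["_score"], hit["_source"].get("title", "<no title>"))
```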

Page 42

Vehicle Exploration Medical System

Exploration Medical Capabilities (ExMC) Data Analytics

Ground-Based & Vehicle Data Architectures:

• Clinical Operational Needs
• Research Data Capture
• Long-Term Health Information
• Crew Medical Officer
• Crew Medical Support
• Flight Surgeon/BME
• External Consults

Real-time data processing & analytics for the crew

Mirrored, delayed data presentation for situational awareness/support

Technologies (a minimal streaming sketch follows below):

• Apache Spark
• Kafka for messaging
• D3 visualization
• cFS (Core Flight System)
• TimeSeries and RocksDB databases
• Machine learning, speech recognition, and natural language processing
• Processing sensor data, including bio and environmental sensors
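The slide lists Apache Spark and Kafka; as a hedged sketch of how crew sensor telemetry might flow through Spark Structured Streaming from a Kafka topic, see below. The topic name, broker address, and message schema are assumptions for illustration, and running it requires the spark-sql-kafka connector package.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

# Requires the spark-sql-kafka-0-10 connector on the Spark classpath.
spark = SparkSession.builder.appName("exmc-sensor-stream").getOrCreate()

# Assumed JSON payload for a single biosensor/environmental reading.
schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("value", DoubleType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")   # assumed broker
    .option("subscribe", "crew-biosensors")                 # assumed topic
    .load()
)

readings = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), schema).alias("r"))
    .select("r.*")
)

# One-minute rolling averages per sensor, standing in for the real-time analytics layer.
averages = readings.groupBy(col("sensor_id"), window(col("ts"), "1 minute")).avg("value")

query = averages.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```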

Page 43

Other Resources

Get involved

• Join the NASA Big Data Working Group

– Meets monthly, on the 2nd Thursday of each month

– Notes are captured on the http://niam.nasa.gov/big-data-2/ website (only accessible from the NASA corporate network or VPN)

• Big Data Face to Face Work Sessions (1 or 2 a year, rotating Centers)

Training

• MIT's Tackling the Challenges of Big Data online course is considered the gold standard

• In the works: an advanced Big Data SATERN course

Apache

• Various Open Source Big Data tools by Apache

Massive Open Online Courses (MOOCs)

• Coursera, Udacity, Udemy, Big Data University, and edX
