thermal emission spectrometer


IMPROVED DATA ANALYSIS TOOLS FOR THE THERMAL EMISSION SPECTROMETER

K. Rodriguez, J. Laura, R. Fergason, R. Bogle

WHAT IS THIS ABOUT?

What is TES

Why is TES important

Challenges in ingestion and modeling

• setup costs

• ingestion costs

• query times

Collected ~206 Million Spectra in its lifetime

Highest spectral resolution for infrared data

Ideal for analysis at global scales

519 Fields across ~206 million records

~200GB footprint

Nine tables, normalized

Two tools already exist: Vanilla and TES Data Tool.

One is command-line based, the other is a web form. They get the job done, but we can always aim for improvement.

Increase productivity by reducing friction through more precise tools.

First Goal: Enable distributed analytics for the TES data set

Focus on spatio-temporal analysis. Bin data at different solar longitude (Ls) steps and lon/lat per pixel.
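The binning idea can be sketched in a few lines of plain Python. This is only an illustration, not the project's code: the function names, the tuple layout `(ls, lon, lat, value)`, and the default steps (10 Ls per bin, 6 degrees per pixel) are assumptions chosen for the example.

```python
import math
from collections import defaultdict


def bin_key(ls, lon, lat, ls_step=10.0, deg_per_pixel=6.0):
    """Map one observation to (ls_bin, col, row) indices (hypothetical layout)."""
    ls_bin = int(math.floor((ls % 360.0) / ls_step))
    col = int(math.floor((lon % 360.0) / deg_per_pixel))
    row = int(math.floor((lat + 90.0) / deg_per_pixel))
    # Clamp lat = +90 into the last row instead of opening a new one.
    max_row = int(math.ceil(180.0 / deg_per_pixel)) - 1
    return ls_bin, col, min(row, max_row)


def bin_mean(observations, ls_step=10.0, deg_per_pixel=6.0):
    """Average `value` per (ls_bin, col, row) over (ls, lon, lat, value) tuples."""
    sums = defaultdict(lambda: [0.0, 0])
    for ls, lon, lat, value in observations:
        acc = sums[bin_key(ls, lon, lat, ls_step, deg_per_pixel)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Changing `ls_step` or `deg_per_pixel` re-bins the same observations at a different temporal or spatial resolution, which is the knob the slide describes.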

The Databases.

MONGODB

Popular document-based database. Has built-in distributed support.

POSTGRESQL

Traditional RDBMS store. Uses PostGIS for supporting spatial queries.

CASSANDRA

Popular columnar & key/value store. Works best for sparse data access.
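As a rough illustration of the kind of spatial query PostGIS supports, the sketch below builds a parameterized bounding-box query. The table `tes_observations` and the columns `id`, `thermal_inertia`, and `geom` are hypothetical, and the query assumes the geometry column carries no SRID (otherwise an SRID would need to be passed as a fifth argument to `ST_MakeEnvelope`). The function only builds the SQL; running it would require a database driver such as psycopg2.

```python
def bbox_query(lon_min, lat_min, lon_max, lat_max):
    """Return (sql, params) selecting rows whose geometry overlaps a lon/lat box.

    Table and column names are hypothetical placeholders.
    """
    sql = (
        "SELECT id, thermal_inertia "
        "FROM tes_observations "
        # `&&` is the PostGIS bounding-box overlap operator.
        "WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s)"
    )
    params = (lon_min, lat_min, lon_max, lat_max)
    return sql, params
```

Keeping the values in `params` rather than formatting them into the string lets the driver handle quoting.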

The Data Center.

APACHE SPARK

Distributed computing engine for large scale data volumes. Supports most databases.

DC/OS

DataCenter Operating System. Allows for the containerization of services like Spark over a pool of resources.

DC/OS Running with Cassandra and PostgreSQL

Progress so far…

Ingestion code is publicly available via the Python library plio on GitHub.

Reads in the original binaries as Pandas data frames. Scripts for loading files into Postgres and MongoDB will also be available soon.

Maps Created.

Mars Thermal Inertia from MongoDB. Year 25 at 6 lat/lon per pixel during Solar Longitudes (Ls) 170-220 at 10 Ls per step.
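A map like this could plausibly come from a MongoDB aggregation that filters on year and Ls, then groups by Ls step and pixel. The sketch below is not the authors' query: the field names (`year`, `ls`, `lon`, `lat`, `thermal_inertia`) are hypothetical, and "6 lat/lon per pixel" is read here as 6 degrees per pixel. It only builds the pipeline as a Python list, which could then be passed to a driver's `aggregate` call.

```python
def ls_steps(ls_min=170, ls_max=220, ls_step=10):
    """List the [start, end) Ls windows covered by the map series."""
    return [(start, start + ls_step) for start in range(ls_min, ls_max, ls_step)]


def thermal_inertia_pipeline(year=25, ls_min=170, ls_max=220,
                             ls_step=10, deg_per_pixel=6):
    """Build an aggregation pipeline (hypothetical field names throughout)."""
    return [
        # Keep only the requested year and Ls range.
        {"$match": {"year": year, "ls": {"$gte": ls_min, "$lt": ls_max}}},
        # Group by Ls step and by pixel column/row, averaging thermal inertia.
        {"$group": {
            "_id": {
                "ls_bin": {"$floor": {"$divide": ["$ls", ls_step]}},
                "col": {"$floor": {"$divide": ["$lon", deg_per_pixel]}},
                "row": {"$floor": {"$divide": [{"$add": ["$lat", 90]},
                                                 deg_per_pixel]}},
            },
            "mean_ti": {"$avg": "$thermal_inertia"},
            "count": {"$sum": 1},
        }},
    ]
```

With the defaults, `ls_steps()` yields five windows, one per map in the series.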

Future Work.

Maps are nice… What about analytics?

DATA MINING

Start to apply spatio-temporal data mining techniques to look for outliers in the data (Anomaly Detection).
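The slides do not say which anomaly detection technique is planned. As one simple, generic possibility, the sketch below flags outliers within a single spatio-temporal bin using the modified z-score (median and median absolute deviation). The function name and the 3.5 threshold are assumptions for illustration.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # No spread around the median: the score is undefined, flag nothing.
        return []
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - median) / mad) > threshold
    ]
```

Applied per bin, this would mark individual measurements that sit far from the bulk of their neighbours in space and season.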

MORE DISTRIBUTION

Complete the distributed Cassandra & MongoDB deployments. Make Spark distributed.
