ZettaVox: Content Mining and Analysis Across Heterogeneous Compute Clouds
Hadoop Summit 2010 - Application Track
Mark Davis, Kitenga, Inc.
Session Agenda
The Company
The Problem
The Solution
Demo
Kitenga
Kitenga [1][2]: (Maori) A view or perception
› 2004-present
› CTO: Mark Davis, InXight Software (Business Objects/SAP), Microsoft, Defense R&D
› CEO: Anil Uberoi, Lucid Imagination, Amdocs, Sun
[1] also a region in Uganda
[2] also a bed-and-breakfast in Clevedon, Auckland
Solutions for Information Overload
2953 Bunker Hill Lane, Santa Clara, CA
Support
PredictionLogic, Inc.
The Never-Ending Problem
Multimedia Data: Video, Imagery, Audio, Sensor Streams, Biometric data, 3D
Text: Email, Web pages, Tweets, Posts
Enterprise Data: Enterprise data, CDRs, Financial records, Access logs
Solving the Problem is Hard
Content mining analysts
Machine learning specialists
Information retrieval specialists
Software Engineers
Expensive and hard to find
Parallel Supercomputers
Racked clusters
Systems management
Enterprise storage solutions
Gigabit switches
Power management
Text analytics
Ontologies
Database reporting tools
ETL tools
Business intelligence
Open source components
Convert raw data into actionable intelligence
Defense Intelligence
Situation Reports
Geotagged Imagery
ZettaVox
Named Entity Extraction
Image tagging
Video analytics
Linkage Analysis
Network Visualization
Search
Hadoop, GPUs, HDFS, HBase, Solr
Improve Force Effectiveness
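One way to picture how named entity extraction and linkage analysis might be spread over Hadoop is as a map step that emits entities per report and a reduce step that groups them. The Python below is a minimal sketch under stated assumptions, not ZettaVox's implementation: the capitalized-word regex is only a toy stand-in for a real entity extractor, and `map_entities`, `reduce_entities`, `linked_entities`, and the sample reports are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Toy stand-in for a real named-entity extractor (assumption):
# any run of capitalized words counts as an "entity".
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def map_entities(doc_id, text):
    """Map step: emit (entity, doc_id) pairs for one document."""
    for match in ENTITY_PATTERN.finditer(text):
        yield match.group(0), doc_id


def reduce_entities(pairs):
    """Reduce step: collect the sorted document ids seen for each entity."""
    index = defaultdict(set)
    for entity, doc_id in pairs:
        index[entity].add(doc_id)
    return {entity: sorted(ids) for entity, ids in index.items()}


def linked_entities(index):
    """Linkage step: two entities are linked if they share a document."""
    links = set()
    for a, b in combinations(sorted(index), 2):
        if set(index[a]) & set(index[b]):
            links.add((a, b))
    return links


# Hypothetical situation reports, for illustration only.
reports = {
    "r1": "Convoy seen near Santa Clara by Mark Davis",
    "r2": "Mark Davis filed the report",
}
pairs = [p for doc_id, text in reports.items() for p in map_entities(doc_id, text)]
index = reduce_entities(pairs)
links = linked_entities(index)
```

The toy pattern also picks up sentence-initial words such as "Convoy", which is one reason a production pipeline would swap in a trained extractor at the map step while keeping the same grouping logic.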
Increase speed of drug discovery
Pharmaceutical R&D
Patents
Genetic Sequence Data
Journal Articles
ZettaVox
Biological Named Entity Extraction
Author Name Extraction and Normalization
Linkage Analysis
Timelines
Faceted Search
Hadoop, HDFS, HBase, GPUs, Solr
Faster Discovery
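Author name normalization can be illustrated by collapsing the variants of a name found in patents and journal articles onto a single key. The sketch below is an assumption about what such a step might look like, not the method ZettaVox uses: `normalize_author` and `group_author_variants` are hypothetical helpers, and the "surname, first initial" key is a deliberately crude rule.

```python
from collections import defaultdict


def normalize_author(name):
    """Return a 'surname, initial' key for a raw author string."""
    # Drop periods and collapse runs of whitespace.
    cleaned = " ".join(name.replace(".", " ").split())
    if "," in cleaned:
        # "Davis, Mark" style: the surname comes first.
        surname, _, given = cleaned.partition(",")
    else:
        # "Mark Davis" style: the surname is the last token.
        parts = cleaned.split(" ")
        surname, given = parts[-1], " ".join(parts[:-1])
    surname = surname.strip().lower()
    given = given.strip()
    return f"{surname}, {given[0].lower()}" if given else surname


def group_author_variants(names):
    """Group raw author strings that share a normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_author(name)].append(name)
    return dict(groups)


variants = ["Mark Davis", "Davis, Mark", "M. Davis", "DAVIS, M.", "Anil Uberoi"]
groups = group_author_variants(variants)
```

Keying on the first initial merges "Mark Davis" with "M. Davis", but it would also merge two different authors who share a surname and initial, so a real system would need further evidence such as affiliation or co-authors.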
ZettaVox
Compose analysis workflows using out-of-the-box components
Interact with HDFS/Hadoop through a Rich Internet Application
Monitor system progress
Visualize and analyze results
Batch mode via XML and JSON
Heterogeneous compute resources
Heterogeneous Compute Clouds
42 U ≈ 84-168 cores
2 PCIe slots
15 multiprocessors
480 cores
$0.13-$0.35/Gflop
Amazon AWS
Rackspace Mosso
Private Cloud
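The figures on this slide can be checked with a little arithmetic. The sketch below rests on assumptions the slide does not state: that the 42 U rack is filled with 1 U servers of 2 to 4 cores each (which reproduces the 84-168 range), and that the 480 GPU cores are spread evenly over the 15 multiprocessors. The function names and the 1,000 Gflop figure used in the usage note are hypothetical.

```python
RACK_UNITS = 42  # from the slide: a 42 U rack


def rack_cores(cores_per_unit):
    """CPU cores in a full rack, assuming one server per rack unit."""
    return RACK_UNITS * cores_per_unit


# Assumption: 2 to 4 cores per 1 U server.
cpu_low, cpu_high = rack_cores(2), rack_cores(4)

GPU_MULTIPROCESSORS = 15  # from the slide
GPU_CORES = 480           # from the slide
# Assumption: cores are divided evenly across multiprocessors.
cores_per_multiprocessor = GPU_CORES // GPU_MULTIPROCESSORS


def cost_range_usd(gflops, low=0.13, high=0.35):
    """Dollar range for a given throughput at the slide's $/Gflop figures."""
    return gflops * low, gflops * high
```

Under these assumptions a full rack holds between `cpu_low` and `cpu_high` CPU cores, and `cost_range_usd(1000)` gives the price band for 1,000 Gflops at the quoted rates.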
Author Analysis Solutions
Interact with HDFS
Monitor Analysis Jobs
Use and Visualize Results
ZettaVox
Current Approach
Slow analytics
Methods don’t scale
Expensive hardware
Expensive software
Capital investment
Expertise investment
ZettaVox
Internet-scale cloud and cluster-based content mining
Hadoop with GPU support
Scalable
SaaS
Out-of-the-box expertise
Rich user experience
Questions?
Mark Davis [email protected]