Using Apache Solr for Images as Big Data: Presented by Kerry Koitzsch, Wipro Technologies

OCTOBER 11-14, 2016 BOSTON, MA





Using Apache Solr for Images as Big Data: A Case Study

Kerry Koitzsch

Architect, Wipro Technologies


Overview of this Presentation

•  This quick overview of one of our current projects describes why Lucene and Solr are key parts of our ongoing research, development, and client support activities.

•  The presentation highlights areas of research that involve Solr technologies in the “images as big data” arena: an automated microscope slide application prototype, as well as other kinds of data analysis and visualization. The use case described relies heavily on Lucene, Solr, and related “helper libraries” to provide data storage capabilities for the software toolkit, the “Image as Big Data Toolkit” (IABDT).

•  Throughout the presentation we discuss how the flexibility, high performance, and ability to “play well with” other components make Lucene/Solr an essential part of the application described here.


Use Case Overview: How Solr Technologies Relate To:

§ ‘Old School’ statistical displays

§ Web-based data visualization

§ ‘Glue Ware’

§ A crime statistic visualization

§ An image as big data visualization


Types of Data Visualization

Statistical displays: ‘old school’ histogram, pie chart, and time series

Tabular displays: stylized table-based visualization with search, etc.

Notebook-based visualization

Map-based displays with geolocation

Images with overlays

Constructing data visualizers with Lucene | Solr components


“Old School” Statistical Visualization

Histograms, line charts, pie charts and time series displays.

Notebook technologies and built-in visualization capabilities (such as Elasticsearch with Kibana, or Apache Mahout visualization) may be used with Cassandra data and with Lucene/Solr.

A standard ETL approach may be used as part of the data pipeline, and intelligent search can be provided by Lucene/Solr.
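To make the ‘old school’ histogram path concrete, here is a minimal Python sketch that turns a Solr field-facet result into a text bar chart. It assumes Solr’s default JSON layout for field facets, in which facet_counts.facet_fields holds a flat list alternating term and count; the field name primary_type and the counts in the example response are hypothetical.

```python
from typing import List, Sequence, Tuple


def facet_pairs(flat: Sequence) -> List[Tuple[str, int]]:
    """Turn Solr's flat facet list [term, count, term, count, ...] into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def text_histogram(pairs: List[Tuple[str, int]], width: int = 20) -> List[str]:
    """Render (label, count) pairs as text bars scaled so the largest count spans `width`."""
    if not pairs:
        return []
    peak = max(count for _, count in pairs)
    label_width = max(len(label) for label, _ in pairs)
    lines = []
    for label, count in pairs:
        bar = "#" * (round(width * count / peak) if peak else 0)
        lines.append(f"{label.ljust(label_width)} | {bar} {count}")
    return lines


if __name__ == "__main__":
    # Hypothetical response fragment shaped like Solr's default (json.nl=flat) facet output.
    response = {
        "facet_counts": {
            "facet_fields": {"primary_type": ["NARCOTICS", 3, "BATTERY", 1, "BURGLARY", 1]}
        }
    }
    flat = response["facet_counts"]["facet_fields"]["primary_type"]
    for line in text_histogram(facet_pairs(flat), width=6):
        print(line)
```

The same (term, count) pairs could just as easily be handed to a plotting library for a pie chart; the text rendering simply keeps the sketch dependency-free.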


“Old School” Statistical Visualization: Standard Plots and Charts


“Old School” Visualization of Classifier Results


“Old School” Statistical Visualization: Standard Time Series Plots


Tabular Display Visualization: Hive Notebook


Graph Visualization

§ Leveraging graph databases and graph visualization toolkits with Lucene/Solr-centric systems

§ Giraph, Neo4j, OrientDB, and other graph databases in combination with a Lucene/Solr-centric technology stack

§ For example, Chicago crime data in CSV format:

ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
9955810,HY144797,02/08/2015 11:43:40 PM,081XX S COLES AVE,1811,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,STREET,true,false,0422,004,7,46,18,1198273,1851626,2015,02/15/2015 12:43:39 PM,41.747693646,-87.549035389,"(41.747693646, -87.549035389)"
9955861,HY144838,02/08/2015 11:41:42 PM,118XX S STATE ST,0486,BATTERY,DOMESTIC BATTERY SIMPLE,APARTMENT,true,true,0522,005,34,53,08B,1178335,1826581,2015,02/15/2015 12:43:39 PM,41.679442289,-87.622850758,"(41.679442289, -87.622850758)"
9955801,HY144779,02/08/2015 11:30:22 PM,002XX S LARAMIE AVE,2026,NARCOTICS,POSS: PCP,SIDEWALK,true,false,1522,015,29,25,18,1141717,1898581,2015,02/15/2015 12:43:39 PM,41.87777333,-87.755117993,"(41.87777333, -87.755117993)"
9956197,HY144787,02/08/2015 11:30:23 PM,006XX E 67TH ST,1811,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,STREET,true,false,0321,,6,42,18,,,2015,02/15/2015 12:43:39 PM,,,
9955846,HY144829,02/08/2015 11:30:58 PM,0000X S MAYFIELD AVE,0610,BURGLARY,FORCIBLE ENTRY,APARTMENT,false,false,1513,015,29,25,05,1137239,1899372,2015,02/15/2015 12:4
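As a sketch of how such CSV records might feed a graph database, the Python below parses two of the sample rows with the standard csv module and derives node and edge lists. The graph model (Crime, CrimeType and Beat nodes; OF_TYPE and IN_BEAT relationships) is a hypothetical choice for illustration; loading the result into Neo4j, OrientDB or another store would use that product’s own import tooling, which is not shown.

```python
import csv
import io
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[str, str]          # (label, key), e.g. ("Beat", "0422")
Edge = Tuple[Node, str, Node]   # (source, relationship, target)

# Two records copied from the Chicago crime CSV sample (all 22 columns).
SAMPLE_CSV = (
    "ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,"
    "Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,"
    "Y Coordinate,Year,Updated On,Latitude,Longitude,Location\n"
    "9955810,HY144797,02/08/2015 11:43:40 PM,081XX S COLES AVE,1811,NARCOTICS,"
    "POSS: CANNABIS 30GMS OR LESS,STREET,true,false,0422,004,7,46,18,1198273,1851626,"
    "2015,02/15/2015 12:43:39 PM,41.747693646,-87.549035389,\"(41.747693646, -87.549035389)\"\n"
    "9955861,HY144838,02/08/2015 11:41:42 PM,118XX S STATE ST,0486,BATTERY,"
    "DOMESTIC BATTERY SIMPLE,APARTMENT,true,true,0522,005,34,53,08B,1178335,1826581,"
    "2015,02/15/2015 12:43:39 PM,41.679442289,-87.622850758,\"(41.679442289, -87.622850758)\"\n"
)


def crime_graph(rows: Iterable[Dict[str, str]]) -> Tuple[Set[Node], List[Edge]]:
    """Build a hypothetical crime -> type / crime -> beat graph from parsed CSV rows."""
    nodes: Set[Node] = set()
    edges: List[Edge] = []
    for row in rows:
        crime = ("Crime", row["ID"])
        crime_type = ("CrimeType", row["Primary Type"])
        nodes.update([crime, crime_type])
        edges.append((crime, "OF_TYPE", crime_type))
        if row["Beat"]:  # some records in the data set have empty fields
            beat = ("Beat", row["Beat"])
            nodes.add(beat)
            edges.append((crime, "IN_BEAT", beat))
    return nodes, edges


if __name__ == "__main__":
    # csv.DictReader handles the quoted "Location" column, which itself contains a comma.
    nodes, edges = crime_graph(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for source, relationship, target in edges:
        print(source, relationship, target)
```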


Graph Visualization in Neo4j

Graph Visualization Example I: Neo4j (Separate Nodes)


Graph Visualization Example: Simple UIs and Hierarchies

Graph Visualization Example II: GoJS Visualization


Notebook-Based Visualization

Jupyter or Zeppelin notebook technologies may be used to display Solr-based information and analytics results

These notebook technologies can be used as the display component in a data-pipeline-oriented processing architecture

Solr works well as one element of such a data pipeline

Spring, Spring Data, and Apache Tika may be used as data pipeline components

Simpler data pipelines may be evolved into Complex Event Processors (CEPs)
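One way a simple pipeline grows toward complex event processing is by adding stateful rules over the event stream. The following minimal Python sketch shows a hypothetical sliding-window rule (not the API of any particular CEP engine) that fires when a key, such as a police beat, sees at least a threshold number of events within a time window.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable


class SlidingWindowRule:
    """Fire when `threshold` or more events for the same key fall within `window_seconds`.

    Assumes events for each key arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: Dict[Hashable, Deque[float]] = defaultdict(deque)

    def observe(self, key: Hashable, timestamp: float) -> bool:
        """Record one event and report whether the rule now fires for this key."""
        window = self._events[key]
        window.append(timestamp)
        # Drop events that have slid out of the time window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


if __name__ == "__main__":
    # Hypothetical rule: three or more events in the same beat within one hour.
    rule = SlidingWindowRule(window_seconds=3600, threshold=3)
    for ts in (0, 600, 1200, 5000):
        print(ts, rule.observe("0422", ts))
```

A production CEP engine would add out-of-order event handling, a rule language and persistent state; this sketch only shows the core windowing idea.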


Notebook Visualization: Architecture and Strategy

§ A relatively simple data pipeline system may be built using a Zeppelin notebook to visualize the output results

§ Geolocation data may be visualized as in the following example

Technology components: Hadoop, HBase, NGData Lily, Solr, Lucene, Solandra, Katta, Cassandra, ELK Stack, Kafka, Apache Spark, Mesos, Akka


Notebook-Based Visualization: Example: Solr-Zeppelin-Cassandra


Map / Geolocation Visualization

Crime data can easily be imported into Solr

The data may be manipulated and pushed into Elasticsearch or Solr or back to Cassandra

Elasticsearch data can be visualized using Kibana; because Elasticsearch is also built on Lucene, it can be searched with Lucene query syntax, compatibly with Solr and the other modules

Logstash, Flume, or any of the many other import frameworks may be used to import data from “log file analysis” type applications; Apache Tika is especially useful as a support library
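A minimal Python sketch of the import step: one CSV crime record is mapped to a Solr document and wrapped in a JSON POST to the collection’s /update handler. The collection name crimes, the Solr field names, and the assumption that location is a spatial field accepting "lat,lon" strings are all hypothetical and must match your own schema; 8983 is Solr’s default port. The request is only built here, not sent.

```python
import json
import urllib.request
from datetime import datetime
from typing import Dict, List


def to_solr_doc(row: Dict[str, str]) -> Dict[str, object]:
    """Map one Chicago crime CSV record to a Solr document (field names are hypothetical)."""
    # The CSV uses US-style 12-hour timestamps; Solr dates are ISO-8601 with a trailing Z.
    # The source timestamps are local Chicago time and are treated here as if they were UTC.
    when = datetime.strptime(row["Date"], "%m/%d/%Y %I:%M:%S %p")
    doc: Dict[str, object] = {
        "id": row["ID"],
        "case_number": row["Case Number"],
        "primary_type": row["Primary Type"],
        "description": row["Description"],
        "arrest": row["Arrest"] == "true",
        "date": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Assumed spatial field taking "lat,lon"; rows without coordinates simply omit it.
    if row.get("Latitude") and row.get("Longitude"):
        doc["location"] = f'{row["Latitude"]},{row["Longitude"]}'
    return doc


def build_update_request(base_url: str, collection: str, docs: List[dict]) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST of documents to the collection's /update handler."""
    url = f"{base_url}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


if __name__ == "__main__":
    row = {
        "ID": "9955810", "Case Number": "HY144797", "Date": "02/08/2015 11:43:40 PM",
        "Primary Type": "NARCOTICS", "Description": "POSS: CANNABIS 30GMS OR LESS",
        "Arrest": "true", "Latitude": "41.747693646", "Longitude": "-87.549035389",
    }
    req = build_update_request("http://localhost:8983/solr", "crimes", [to_solr_doc(row)])
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it to a running Solr instance.
```

Committing on every request keeps the example simple; bulk loads would normally batch many documents per request and commit less often.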


Map / Geolocation Data: Crime Data in Solr

§ The technology stack includes the ELK Stack plus Cassandra plus Lucene/Solr/Hadoop

§ CSV crime data files may serve as the original data source

§ Solr can process JSON-based data with associated geolocation data, and is especially powerful when combined with Apache Tika
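Once location values are indexed, a geospatial filter can be expressed with Solr’s geofilt query parser and the results sorted with the geodist() function. The Python sketch below only builds the /select URL; the collection name crimes and the spatial field location are hypothetical, and the radius is given in kilometres.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def geofilt_url(base_url: str, collection: str, lat: float, lon: float,
                radius_km: float, query: str = "*:*", field: str = "location",
                rows: int = 10) -> str:
    """Build a Solr /select URL that keeps documents within radius_km of (lat, lon), nearest first."""
    params = {
        "q": query,
        "fq": "{!geofilt}",        # spatial filter; reads sfield, pt and d from the request
        "sfield": field,           # hypothetical spatial field holding "lat,lon" values
        "pt": f"{lat},{lon}",      # centre point of the search
        "d": radius_km,            # radius in kilometres
        "sort": "geodist() asc",   # nearest documents first (also uses sfield and pt)
        "rows": rows,
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    url = geofilt_url("http://localhost:8983/solr", "crimes",
                      41.747693646, -87.549035389, 2, query="primary_type:NARCOTICS")
    print(url)
    print(parse_qs(urlsplit(url).query))
```

Passing sfield and pt as top-level parameters, rather than inside the filter’s local params, lets the geofilt filter and the geodist() sort share the same point and field.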


Map / Geolocation: Crime Data in Kibana

§ The technology stack includes the ELK Stack plus Cassandra plus Lucene/Solr/Hadoop

§ CSV crime data files may serve as the original data source

§ Kibana can process JSON-based data with associated geolocation data, as can Lucene/Solr/Tika


Map | Geolocation Visualization: Data to Image


“Image as Big Data” Visualization

A data pipeline with images as a data source

Feature extraction can identify features of interest and write them to Cassandra as feature descriptors, using Lucene/Solr for intelligent search capability

Deep learning and machine learning can enhance the processing pipeline
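As a deliberately simple stand-in for the toolkit’s feature extraction (which the deck does not detail), the Python sketch below computes a normalised grey-level histogram for an 8-bit image and packages it as a descriptor record. The record layout and identifiers such as slide-001 are hypothetical, and the actual write to Cassandra and indexing in Solr are not shown.

```python
from typing import Dict, List, Sequence


def intensity_histogram(image: Sequence[Sequence[int]], bins: int = 8) -> List[float]:
    """Normalised grey-level histogram of an 8-bit image given as rows of pixel values (0-255)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            counts[value * bins // 256] += 1  # map 0..255 onto 0..bins-1
            total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


def feature_document(image_id: str, image: Sequence[Sequence[int]], bins: int = 8) -> Dict[str, object]:
    """Package a descriptor as a record that could be stored in Cassandra and indexed in Solr."""
    return {
        "id": image_id,
        "descriptor_type": "intensity_histogram",
        "bins": bins,
        "descriptor": intensity_histogram(image, bins),
    }


if __name__ == "__main__":
    tiny_image = [[0, 0], [0, 255]]  # hypothetical 2x2 grey-level image
    print(feature_document("slide-001", tiny_image, bins=2))
```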


Image as Big Data Analysis (Poggio’s MIT Vision Machine)

Original Images

Color Analyzers, Texture Analyzers, Edge Detectors, Motion Analyzers, Stereo Image Analyzers

Discontinuity Map Generation (Including Line & Continuous Process)

Cooperating Recognition Process

Analysis Result Repository
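The edge detectors in this architecture feed discontinuity map generation. As a much simpler illustration than the cooperating processes of the original vision machine, the Python sketch below marks a pixel as a discontinuity when its central-difference gradient magnitude exceeds a threshold; the threshold value is an assumption to be tuned per image type.

```python
import math
from typing import List, Sequence


def discontinuity_map(image: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Mark pixels whose central-difference gradient magnitude exceeds `threshold` (1 = edge)."""
    height = len(image)
    width = len(image[0]) if height else 0
    edges = [[0] * width for _ in range(height)]
    # Border pixels lack a full neighbourhood and are left as 0.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            if math.hypot(gx, gy) > threshold:
                edges[y][x] = 1
    return edges


if __name__ == "__main__":
    # A vertical step edge between columns 1 and 2.
    step = [[0, 0, 100, 100, 100] for _ in range(5)]
    for row in discontinuity_map(step, threshold=25):
        print(row)
```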


Intelligent Search with a Lucene/Solr-Centric Architecture


Image “As Big Data” Analytics Visualization: Linear Features


Automated Microscopy: The Original Components


Feature Extraction: Original Electron Microscope Image


Feature Extraction: Image to Data: Ellipses
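One standard way to turn a segmented blob into ellipse parameters is to match its image moments: the ellipse shares the blob’s centroid and second-order central moments. The deck does not say which fitting method the toolkit uses, so treat this Python sketch as an illustration of the moments approach only; least-squares fitting of contour points is a common alternative.

```python
import math
from typing import Dict, Sequence


def fit_ellipse(mask: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Ellipse with the same area-normalised second moments as the foreground (non-zero) pixels."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        raise ValueError("mask has no foreground pixels")
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Central second moments (covariance of the pixel coordinates).
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the covariance matrix are the variances along the principal axes.
    half_trace = (mu20 + mu02) / 2
    spread = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    major_var = half_trace + spread
    minor_var = max(half_trace - spread, 0.0)  # guard against tiny negative rounding error
    return {
        "cx": cx,
        "cy": cy,
        # For a filled ellipse, the variance along an axis equals (semi-axis ** 2) / 4.
        "semi_major": 2 * math.sqrt(major_var),
        "semi_minor": 2 * math.sqrt(minor_var),
        # Angle of the major axis from the +x (column) axis in radians; image y grows downward.
        "theta": 0.5 * math.atan2(2 * mu11, mu20 - mu02),
    }


if __name__ == "__main__":
    blob = [
        [0, 1, 1, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 1, 1, 1, 0],
    ]
    print(fit_ellipse(blob))
```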


Feature Extraction: Image to Data: Contours
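A minimal Python sketch of the first step toward contours: collecting the foreground pixels of a binary mask that touch the background through a 4-connected neighbour. This yields an unordered boundary set rather than a traced, ordered contour; an ordered tracer such as OpenCV’s findContours would be the usual choice in practice.

```python
from typing import Sequence, Set, Tuple


def boundary_pixels(mask: Sequence[Sequence[int]]) -> Set[Tuple[int, int]]:
    """Return (row, col) of foreground pixels that touch the background or the image border."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    boundary: Set[Tuple[int, int]] = set()
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connected neighbours
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < height and 0 <= nc < width)
                if outside or not mask[nr][nc]:
                    boundary.add((r, c))
                    break
    return boundary


if __name__ == "__main__":
    square = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(sorted(boundary_pixels(square)))
```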


“Image as Big Data” Visualization: Optical Microscope Hardware


Microscope Control Software, with Data Ingestion


“Image as Big Data” Visualization: Solr Search: Metadata
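Metadata search usually combines a free-text query with filter queries on individual fields. The Python sketch below builds such parameters and backslash-escapes characters that the standard query parser treats as syntax (roughly the set SolrJ’s ClientUtils.escapeQueryChars handles). The field names stain_s and slide_id_s are hypothetical examples of microscope slide metadata.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlencode

# Characters that the standard Lucene/Solr query parser treats as syntax (plus space).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/; ')


def escape_term(value: str) -> str:
    """Backslash-escape query-syntax characters so a metadata value is matched literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


def metadata_query(filters: Dict[str, str], text: str = "*:*", rows: int = 20) -> str:
    """Build /select query parameters with one filter query (fq) per metadata field."""
    params: List[Tuple[str, str]] = [("q", text), ("rows", str(rows))]
    for field, value in sorted(filters.items()):
        params.append(("fq", f"{field}:{escape_term(value)}"))
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical metadata fields for microscope slide images.
    query = metadata_query({"stain_s": "H&E", "slide_id_s": "S-01/A"})
    print(query)
    print(parse_qs(query))
```

Using separate fq parameters, rather than folding every condition into q, lets Solr cache each metadata filter independently of the scored query.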


“Image as Big Data” Visualization: Microscopy UI


Another View of the Data Pipeline

 

Image and Metadata Input Sources (or “smart sensors”)

Multi-sensor Fusion Software Engine

Short-Term Computation Result Repository

Long-Term Result Data Repository

Feature Extraction and Model Builder

Global System Controller
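A minimal Python sketch of the hand-off between the short-term and long-term repositories: fused results accumulate in a bounded buffer that is flushed in batches to long-term storage. The class name and the batch-size flush policy are hypothetical; in the architecture above the long-term store would be something like Cassandra, HDFS or Solr rather than an in-memory list.

```python
from collections import deque
from typing import Any, Deque, List


class ResultRepositories:
    """Hypothetical two-tier store: a bounded short-term buffer flushed into long-term storage."""

    def __init__(self, flush_size: int) -> None:
        if flush_size < 1:
            raise ValueError("flush_size must be at least 1")
        self.flush_size = flush_size
        self.short_term: Deque[Any] = deque()
        self.long_term: List[Any] = []  # stands in for Cassandra/HDFS/Solr in a real system

    def add(self, result: Any) -> None:
        """Buffer one fused result, flushing automatically once the batch is full."""
        self.short_term.append(result)
        if len(self.short_term) >= self.flush_size:
            self.flush()

    def flush(self) -> int:
        """Move everything buffered so far into long-term storage; return how many moved."""
        moved = len(self.short_term)
        self.long_term.extend(self.short_term)
        self.short_term.clear()
        return moved


if __name__ == "__main__":
    repos = ResultRepositories(flush_size=3)
    for i in range(7):
        repos.add({"frame": i})
    print(len(repos.long_term), "in long-term,", len(repos.short_term), "still buffered")
```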


Conclusions and Future Work

A use case was described in which we use a Lucene/Solr-centric technology stack to provide an intelligent search component

Flat files, HDFS files, CSV data, data streams, and other data sources may be used, including microscope images of many different formats, resolutions, and metadata content

“Images as big data” is a viable strategy for building image processing applications with Lucene/Solr as an intelligent search component, because of Lucene/Solr’s flexibility and ability to play well with other components

Deep learning, machine learning, data mining, and hybrid techniques can be used to develop Lucene/Solr-centric analytics applications with “intelligent search” capabilities

Your Questions?
