Post on 21-Dec-2015
Dagstuhl Scalable Visual Analytics
University of Illinois
Role of Mashups, Cloud Computing, and Parallelism for Visual Analytics
Loretta Auvil
SW Silos
We continue to build silos. Why?
• I’m only creating a prototype for my paper…
• I want to have control…
• I want to write my own code…
• I can do it faster…
• I’m not funded to integrate with…
Images from Google Search
From Silos to Mashups
Definition: A mashup is a web page or application that uses and combines data, presentation, or functionality from two or more sources to create new services.
Why do we want this?
• Enable our services in many applications and on a variety of devices (laptop, high-res display wall, iPad, iPhone, or others)
• Sharing and reuse are good things
• Reach communities with our tools and their data!
What can we do to change this?
• We can think about and create data-driven solutions so that they can be mashed up with other tools.
• We can build web services that can be deployed or accessed.
• We can create APIs to be used.
How can we do this?
Mashup Framework
[Figure: Mashup framework architecture diagram. Recoverable labels: Virtualization Infrastructure; Meandre Infrastructure; Meandre Data-Intensive Flows; Component Repository; Component Discovery; Components (Data, Analytics, Visualization); Developer Tools; Meandre Workbench; Apps, Services, Plugins, Web Apps; Repositories; Data, Analysis, Components, Flows; User Interfaces; Computational Resources; Visualizations]
Scientific Workflows
• Kepler
• Triana
• BPEL
• Ptolemy II
• Taverna
• Trident
• Meandre
• VisTrails
David De Roure slide (slightly modified)
Meandre for Mashups
Major Capabilities
• Dataflow execution
• Semantic technology (using RDF for storing metadata)
• Web-oriented
• Supports publishing services for data, analytics, and visualization
• Modular components
• Encapsulation and execution mechanism
• Promotes reuse, sharing, and collaboration
• Cloud-friendly infrastructure
Note (for Tom): Trading off some performance for reuse, flexibility, and modular components, with the option to parallelize components to improve performance.
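The dataflow idea behind these capabilities can be sketched in a few lines of Python. This is not Meandre's API: `Component`, `Flow`, `connect`, and `run` are hypothetical names chosen for illustration only. Each component fires when data arrives for it and pushes its result to whatever is connected downstream, which is what makes components reusable across flows.

```python
from collections import deque
from typing import Any, Callable, Dict, List


class Component:
    """A named, self-contained processing step (hypothetical; not the Meandre API)."""

    def __init__(self, name: str, fn: Callable[[Any], Any]) -> None:
        self.name = name
        self.fn = fn


class Flow:
    """Connects components and fires each one when data arrives for it."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}
        self.edges: Dict[str, List[str]] = {}

    def add(self, component: Component) -> None:
        self.components[component.name] = component
        self.edges.setdefault(component.name, [])

    def connect(self, source: str, target: str) -> None:
        # Output of `source` becomes input of `target`.
        self.edges[source].append(target)

    def run(self, start: str, items: List[Any]) -> List[Any]:
        """Push items into `start`; collect outputs of components with no successors."""
        results: List[Any] = []
        queue = deque((start, item) for item in items)
        while queue:
            name, data = queue.popleft()
            out = self.components[name].fn(data)
            targets = self.edges[name]
            if not targets:
                results.append(out)
            for target in targets:
                queue.append((target, out))
        return results


# Two tiny components wired into a flow: split text into words, then count them.
flow = Flow()
flow.add(Component("tokenize", str.split))
flow.add(Component("count", len))
flow.connect("tokenize", "count")
word_counts = flow.run("tokenize", ["visual analytics at scale", "mashups"])
```

Because each component only sees its input and produces an output, the same `tokenize` step could be reused in another flow, for example one feeding a tag cloud instead of a counter.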
Components
Analytics
• Unsupervised Learning: Clustering; Frequent Pattern Analysis (Rule Association)
• Supervised Learning: Naïve Bayesian; Support Vector Machines (Weka); Decision Trees (C4.5)
• Optimization Approaches: Genetic Algorithm
• Text Analysis (POS, Entity Extraction): OpenNLP; Stanford NER
Visualization
• Geographic (Google Maps)
• Temporal (Simile)
• Network Graphs – Link Nodes and Arcs (Protovis)
• Parallel Coordinates (Protovis)
• Stacked Area Chart (Flare)
• Tag Cloud Maker
• Decision Tree (Applet D2K)
• Naïve Bayes (Applet D2K)
• Rule Association (Applet)
• Dendrogram (GWT)
Example: Zotero and SEASR
Meandre Services from Firefox Plugin
• Readability Analysis
• Tag Cloud Analysis
• Date Entity to Simile Timeline
• Network Analysis
• Automatic Summarization
• Location Entity to Google Map
An Ideological Metaphor & Definition
Cloud Metaphor: The term cloud is used as a metaphor for the Internet, based on how it is depicted in computer network diagrams, and is an abstraction for the complex infrastructure it conceals.
Cloud Computing – Definition
• The first academic use of this term appears to define it as a computing paradigm where the boundaries of computing will be determined by economic rationale rather than technical limits.
• Cloud computing is a paradigm of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
http://en.wikipedia.org/wiki/Cloud_computing
Cloud Computing
How can we leverage these computation environments?
Known issues:
• Cloud mechanics have a steep learning curve
• Data movement to the cloud
• Security
Next-generation data-intensive applications will:
• Use cloud computing technologies and conduits
• Require adaptation of programming paradigms
• Leverage a flexible and modular architecture
• Promote processing and resources at scale
• Use distributed data flow designs to allow processing to be co-located with data sources and enable transparent scalability
Meandre in the Clouds
Meandre
• Data-intensive execution engine
• Component-based programming architecture
• Orchestrate cloud deployments
• Leverage cloud conduits
NCSA Virtual Machines & Enterprise Cloud
• VMware, Xen, & Eucalyptus
• ElasticFox & AMS Web Application
Components for Amazon & Eucalyptus
Components can be created to:
• List images
• Launch/terminate instances
• Transfer data or programs to running instances
• Trigger process computation
• Monitor processes and/or persistent services
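As a rough illustration of how such components might fit together, the sketch below models the lifecycle (list images, launch, trigger computation, monitor, terminate) against an in-memory stand-in for a cloud service. Everything here is an assumption for illustration: `InMemoryCloud`, `run_job`, and the image name `emi-analytics` are hypothetical and do not come from Meandre, Amazon, or Eucalyptus. A real component would call the provider's API in place of the stand-in.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class InMemoryCloud:
    """Stand-in for an EC2-style service; all names are hypothetical."""

    images: List[str]
    instances: Dict[str, str] = field(default_factory=dict)  # instance id -> state
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def list_images(self) -> List[str]:
        return list(self.images)

    def launch(self, image: str) -> str:
        if image not in self.images:
            raise ValueError(f"unknown image: {image}")
        instance_id = f"i-{next(self._ids):04d}"
        self.instances[instance_id] = "running"
        return instance_id

    def terminate(self, instance_id: str) -> None:
        self.instances[instance_id] = "terminated"

    def status(self, instance_id: str) -> str:
        # "Monitor" step: report the current state of an instance.
        return self.instances[instance_id]


def run_job(cloud: InMemoryCloud, image: str,
            job: Callable[[str], str]) -> Tuple[str, str]:
    """Launch an instance, trigger the computation, and always terminate afterwards."""
    instance_id = cloud.launch(image)
    try:
        result = job(instance_id)  # "trigger process computation"
    finally:
        cloud.terminate(instance_id)
    return instance_id, result


cloud = InMemoryCloud(images=["emi-analytics"])
instance_id, result = run_job(cloud, cloud.list_images()[0],
                              lambda iid: f"done on {iid}")
```

Wrapping each operation as its own small step keeps the lifecycle composable: the same launch and terminate steps can surround any job, and the `finally` clause ensures instances are not left running if the job fails.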
Parallelism
Writing parallel code can be hard, and debugging it even harder…
But we need it because our data sets are growing…
• And software tools can help
• And hardware is also available
MapReduce model
• A powerful abstraction (software framework) developed by Google to support distributed computing on large data sets on clusters of computers
• Hadoop is an open-source implementation
GPUs
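The MapReduce abstraction itself is small enough to sketch in a single process. The Python below is only an illustration of the model (map, group by key, reduce), not Hadoop's or Google's implementation; `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical names, and a real framework would distribute the map and reduce calls across a cluster.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_reduce(records: Iterable[Any],
               mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
               reducer: Callable[[Any, List[Any]], Any]) -> Dict[Any, Any]:
    """Run map, shuffle (group by key), and reduce sequentially in one process."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase emits (key, value) pairs
            groups[key].append(value)      # shuffle phase groups values by key
    # reduce phase folds each key's values into a single result
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]


def sum_reducer(key: str, values: List[int]) -> int:
    """Add up the counts emitted for one word."""
    return sum(values)


counts = map_reduce(["the cloud", "the data in the cloud"],
                    word_mapper, sum_reducer)
```

The mapper and reducer never share state, which is the property that lets a framework run many copies of each on different machines.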
Meandre for Parallelism
• Implemented a scripting language (ZigZag)
• Implemented MapReduce in Meandre
• Automatic parallelization for stateless components
Adding the operator [+4] or [+4!] to a component instance, as in the script below, would result in a directed graph with parallel copies of that component.
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+4!]
print( object:pt.string )
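A rough Python analogue of this kind of parallelization is sketched below. It assumes that `[+4]` requests four parallel copies of a stateless component and that the `!` variant additionally keeps outputs in input order; the slide does not spell this out, so treat both as assumptions. `passthrough` and `run_parallel` are hypothetical names, and threads stand in for Meandre's parallel component instances.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List


def passthrough(s: str) -> str:
    """Stateless stand-in for the `pass` component: output depends only on input."""
    return s


def run_parallel(fn: Callable[[str], str], items: List[str],
                 copies: int = 4, ordered: bool = True) -> List[str]:
    """Apply `fn` to every item using `copies` workers.

    ordered=True keeps results in input order (assumed analogue of [+4!]);
    ordered=False returns results as they complete (assumed analogue of [+4]).
    """
    with ThreadPoolExecutor(max_workers=copies) as pool:
        if ordered:
            return list(pool.map(fn, items))
        futures = [pool.submit(fn, item) for item in items]
        return [future.result() for future in as_completed(futures)]


pushed = [f"msg-{i}" for i in range(8)]
ordered_out = run_parallel(passthrough, pushed, copies=4, ordered=True)
unordered_out = run_parallel(passthrough, pushed, copies=4, ordered=False)
```

This only works safely because the component is stateless: any copy can process any item. For CPU-bound components in Python, a process pool would be the more realistic choice than threads.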
Scaling Genetic Algorithms in Meandre
Intel 2.8 GHz quad-core, 4 GB RAM. Average of 20 runs.
And With Hadoop
60 dual quad-core Xeons with 8 GB RAM, Gigabit Ethernet.
Resource exhaustion