the datascientists workplace of the future, ibm developerdays 2014, vienna by romeo kienzler
© 2013 IBM Corporation1
The Data Scientists Workplace of the Future - Workshop SwissRE, 11.6.14
Romeo Kienzler
IBM Center of Excellence for Data Science, Cognitive Systems and BigData (a joint venture between IBM Research Zurich and IBM Innovation Center DACH)
Source: http://www.kdnuggets.com/2012/04/data-science-history.jpg
The Data Scientists Workplace of the Future -
* * C R E D I T S * *
Romeo Kienzler
IBM Innovation Center
● Parts of these slides have been copied from and/or revised by
  ● Dr. Anand Ranganathan, IBM Watson Research Lab
  ● Dr. Stefan Mück, IBM BigData Leader Europe
  ● Dr. Berthold Rheinwald, IBM Almaden Research Lab
  ● Dr. Diego Kuonen, Statoo Consulting
  ● Dr. Abdel Labbi, IBM Zurich Research Lab
  ● Brandon MacKenzie, IBM Software Group
What is DataScience?
Source: Statoo.com http://slidesha.re/1kmNiX0
DataScience at present
● Tools (http://blog.revolutionanalytics.com/2014/01/in-data-scientist-survey-r-is-the-most-used-tool-other-than-databases.html)
  ● SQL (42%)
  ● R (33%)
  ● Python (26%)
  ● Excel (25%)
  ● Java, Ruby, C++ (17%)
  ● SPSS, SAS (9%)
● Limitations (Single Node usage)
  ● Main Memory
  ● CPU <> Main Memory Bandwidth
  ● CPU
  ● Storage <> Main Memory Bandwidth (either Single node or SAN)
DataScience at present - Demo
● Assume a 1 TB file on a hard drive
● Split into 16 files (x00 ... x15, without breaking lines)
  ● split -d -n l/16 output.json
● Distribute on 4 nodes (4 files per node)
  ● for i in `seq 0 15`; do scp x`printf %02d $i` id@node$((i % 4 + 1)):~/; done
● Perform the calculation in parallel
  ● for n in `seq 1 4`; do ssh id@node$n "cat x?? | awk -F: '{print \$6}' | grep -i samsung | grep breathtaking | wc -l" & done > result; wait
● Merge the result (add up the per-node counts)
  ● awk '{s += $1} END {print s}' result
Source: http://sergeytihon.wordpress.com/2013/03/20/the-data-science-venn-diagram/
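The manual split / distribute / count / merge steps of this demo follow the same pattern that MapReduce automates. A minimal single-machine sketch in Python, assuming colon-separated records whose sixth field holds the text (as in the awk call); the sample records and the helper names are made up for illustration, and threads stand in for the remote nodes:

```python
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines, n_chunks):
    """Deal lines round-robin into n_chunks lists (stand-in for `split`)."""
    chunks = [[] for _ in range(n_chunks)]
    for i, line in enumerate(lines):
        chunks[i % n_chunks].append(line)
    return chunks


def count_matches(chunk):
    """Per-node work: field 6 contains 'samsung' (any case) and 'breathtaking'."""
    hits = 0
    for line in chunk:
        fields = line.split(":")
        if len(fields) < 6:
            continue  # record too short, nothing to inspect
        text = fields[5]
        if "samsung" in text.lower() and "breathtaking" in text:
            hits += 1
    return hits


def parallel_count(lines, n_chunks=4):
    """Split, count each chunk concurrently, then merge by summing."""
    chunks = split_lines(lines, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partial_counts = list(pool.map(count_matches, chunks))
    return sum(partial_counts)


# Hypothetical sample records, not taken from the original data set.
records = [
    "a:b:c:d:e:Samsung display is breathtaking",
    "a:b:c:d:e:samsung battery is fine",
    "a:b:c:d:e:breathtaking view",
    "a:b:c:d:e:The SAMSUNG camera is breathtaking",
    "short:line",
]
print(parallel_count(records))
```

The merge step is a plain sum because counting is associative, which is what makes the per-node results combinable in any order.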
What is BIG data?
Big Data
Hadoop
What is BIG data?
Business Intelligence
Data Warehouse
BigData == Hadoop?
Hadoop BigData
Hadoop
What is beyond “Data Warehouse”?
Data Lake
Data Warehouse
First “BigData” UseCase?
● Google Index
  ● 40 x 10^9 = 40.000.000.000 => 40 billion pages indexed
  ● Will break the 100 PB barrier soon
  ● Derived from MapReduce
  ● Now “Caffeine”, based on “Percolator”
    ● Incremental vs. batch
    ● In-Memory vs. disk
Map-Reduce → Hadoop → BigInsights
BigData UseCases
● CERN LHC
  ● 25 petabytes per year
● Facebook
  ● Hive Datawarehouse
  ● 300 PB, growing 600 TB / d
  ● > 100 k servers
● Genomics
● Enterprises
  ● Data center analytics (Logfiles, OS/NW monitors, ...)
  ● Predictive Maintenance, Cybersecurity
  ● Social Media Analytics
  ● DWH offload
  ● Call Detail Record (CDR) data preservation
    http://www.balthasar-glaettli.ch/vorratsdaten/
Why is Big Data important?
BigData Analytics
Source: http://www.strategy-at-risk.com/2008/01/01/what-we-do/
BigData Analytics – Predictive Analytics
"sometimes it's not who has the best algorithm that wins; it's who has the most data."
(C) Google Inc.
The Unreasonable Effectiveness of Data¹
¹http://www.csee.wvu.edu/~gidoretto/courses/2011-fall-cp/reading/TheUnreasonable%20EffectivenessofData_IEEE_IS2009.pdf
No Sampling => Work with full dataset => No p-Value/z-Scores anymore
We need Data Parallelism
Aggregated Bandwidth between CPU, Main Memory and Hard Drive
1 TB (at 10 GByte/s)
- 1 Node - 100 sec
- 10 Nodes - 10 sec
- 100 Nodes - 1 sec
- 1000 Nodes - 100 msec
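The scan times on this slide follow from dividing the data size by the aggregated bandwidth. A small sketch of that arithmetic, assuming perfectly linear scaling, decimal units (1 TB = 10^12 bytes) and 10 GByte/s per node; the function name is made up for illustration:

```python
def scan_time_seconds(data_bytes, nodes, bandwidth_bytes_per_s=10e9):
    """Time to read data_bytes once when every node contributes its full bandwidth."""
    return data_bytes / (nodes * bandwidth_bytes_per_s)


ONE_TB = 1e12  # decimal terabyte, an assumption

for nodes in (1, 10, 100, 1000):
    print(f"{nodes:>5} nodes: {scan_time_seconds(ONE_TB, nodes):g} s")
```

Real clusters fall short of this ideal because of skew, network transfer and coordination overhead, so the figures are upper bounds on the achievable speed-up.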
Fault Tolerance / Commodity Hardware
AMD Turion II Neo N40L (2x 1,5GHz / 2MB / 15W), 8 GB RAM,
3TB SEAGATE Barracuda 7200.14
< CHF 500
100 K => 200 X (2, 4, 3) => 400 Cores, 1,6 TB RAM, 200 TB HD
MTBF ~ 365 d > 1,5 d
Source: http://www.cloudcomputingpatterns.org/Watchdog
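One way to read the sizing and MTBF lines: a budget of 100 K at under CHF 500 per node buys about 200 nodes, and if each node fails roughly once a year, some node in the cluster fails about every 365 / 200 ≈ 1.8 days. A sketch of that estimate, assuming 2 cores and 8 GB RAM per node (from the hardware line) and independent failures at a constant rate; the function and parameter names are illustrative:

```python
def cluster_sizing(budget_chf=100_000, node_price_chf=500,
                   cores_per_node=2, ram_gb_per_node=8,
                   node_mtbf_days=365):
    """Rough cluster totals and the expected time between node failures."""
    nodes = budget_chf // node_price_chf
    return {
        "nodes": nodes,
        "cores": nodes * cores_per_node,
        "ram_tb": nodes * ram_gb_per_node / 1000,
        # With independent failures, failure rates add up across nodes.
        "cluster_mtbf_days": node_mtbf_days / nodes,
    }


sizing = cluster_sizing()
for name, value in sizing.items():
    print(name, value)
```

With failures expected every couple of days, the framework rather than the operator has to handle them, which is why fault tolerance is built into the software layer.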
NoSQL Databases
Column Store
– Hadoop / HBASE
– Cassandra
– Amazon Simple DB
JSON / Document Store
– MongoDB
– CouchDB
Key / Value Store
– Amazon DynamoDB
– Voldemort
Graph DBs
– DB2 SPARQL Extension
– Neo4J
MP RDBMS
– DB2 DPF, DB2 pureScale, PureData for Operational Analytics
– Oracle RAC
– Greenplum
http://nosql-database.org/ > 150
CAP Theorem / Brewer's Theorem¹
It is impossible for a distributed computer system to simultaneously guarantee all 3 properties
– Consistency (all nodes see the same data at the same time)
– Availability (every request receives a response about whether it succeeded or failed)
– Partition tolerance (the system continues to operate despite failure of part of the system)
What about ACID?
– Atomicity
– Consistency
– Isolation
– Durability
BASE, the new ACID
– Basically Available
– Soft state
– Eventual consistency
  • Monotonic Read Consistency
  • Monotonic Write Consistency
  • Read Your Own Writes
What role is the cloud playing here?
“Elastic” Scale-Out
Source: http://www.cloudcomputingpatterns.org/Continuously_Changing_Workload
“Elastic” Scale-Out of CPU Cores, Storage, Memory
“Elastic” Scale-Out
linear
Source: http://www.cloudcomputingpatterns.org/Elastic_Platform
How do Databases Scale-Out?
Shared Disk Architectures
How do Databases Scale-Out?
Shared Nothing Architectures
Hadoop?
Shared Nothing Architecture?
Shared Disk Architecture?
Data Science on Hadoop
SQL (42%)
R (33%)
Python (26%)
Excel (25%)
Java, Ruby, C++ (17%)
SPSS, SAS (9%)
Data Science Hadoop
Large Scale Data Ingestion
● Traditionally
  ● Crawl to local file system (e.g. wget http://www.heise.de/newsticker/)
  ● Export RDBMS data to CSV (local file system)
  ● Batched FTP server uploads
  ● Then: Copy to HDFS
● BigInsights
  ● Use one of the built-in importers
  ● Imports directly into HDFS
  ● Use Eclipse tooling to deploy custom importers easily
Large Scale Data Ingestion (ETL on M/R)
● Modern ETL (Extract, Transform, Load) tools support Hadoop as
  ● Source, Sink (HDFS)
  ● Engine (MapReduce)
  ● Example: InfoSphere DataStage
Real-Time / In-Memory Data Ingestion
● If volume can be reduced dramatically during first processing steps
  ● Feature Extraction of
    ● Video
    ● Audio
    ● Semistructured Text (e.g. Logfiles)
    ● Structured Text
  ● Filtering
  ● Compression
● Recommendation: Usage of Streaming Engines
  ● IBM InfoSphere Streams
  ● Twitter Storm (now Apache incubator)
  ● Apache Spark Streaming
SQL on Hadoop
● IBM BigSQL (ANSI 92 compliant)
● HIVE (SQL dialect)
● Cloudera Impala
● Lingual
● ...
SQL Hadoop
BigSQL V3.0 – ANSI SQL 92 compliant
IBM BigInsights v3.0, with Big SQL 3.0, is the only Hadoop distribution to successfully run ALL 99 TPC-DS queries and ALL 22 TPC-H queries without modification.
Source: http://www.ibmbigdatahub.com/blog/big-deal-about-infosphere-biginsights-v30-big-sql
BigSQL V3.0 – Architecture
BigSQL V3.0 – Demo (small)
● 32 GB Data, ~650.000.000 rows (small, Innovation Center Zurich)
● 3 TB Data, ~60.937.500.000 rows (middle, Innovation Center Zurich)
● 0.7 PB Data, ~1.421875×10¹³ rows (large, Innovation Center Hursley)
BigSQL V3.0 – Demo (small)
CREATE EXTERNAL TABLE trace (
    hour integer, employeeid integer,
    departmentid integer, clientid integer,
    date string, timestamp string)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/biadmin/32Gtest';
select count(hour), hour from trace group by hour order by hour;
-- This command runs on 32 GB / ~650.000.000 rows in HDFS
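For readers less familiar with SQL, the GROUP BY query above counts rows per value of the first column. A single-machine Python sketch of the same aggregation over comma-delimited lines; the sample rows and the function name are invented for illustration and the column layout is assumed to match the trace table:

```python
import csv
from collections import Counter


def count_by_hour(lines):
    """Return (hour, row_count) pairs sorted by hour, like GROUP BY ... ORDER BY."""
    counts = Counter()
    for row in csv.reader(lines):
        if not row or row[0] == "":
            continue  # count(hour) ignores rows without an hour value
        counts[int(row[0])] += 1
    return sorted(counts.items())


# Hypothetical rows: hour, employeeid, departmentid, clientid, date, timestamp
sample = [
    "8,17,3,42,2014-06-11,08:15:00",
    "7,12,3,40,2014-06-11,07:59:10",
    "7,19,5,41,2014-06-11,07:05:33",
]
for hour, n in count_by_hour(sample):
    print(n, hour)
```

On a cluster the per-hour counters would be computed per data block and then summed, which is what the SQL engine does on the user's behalf.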
R on Hadoop
● IBM BigR (based on the SystemML Almaden Research project)
● RHadoop
● RHIPE
● ...
“R” Hadoop
BigR (based on SystemML)
Example: Gaussian Non-negative Matrix Factorization
package gnmf;

import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class MatrixGNMF {
    public static void main(String[] args) throws IOException, URISyntaxException {
        if(args.length < 10) {
            System.out.println("missing parameters");
            System.out.println("expected parameters: [directory of v] [directory of w] [directory of h] " +
                "[k] [num mappers] [num reducers] [replication] [working directory] " +
                "[final directory of w] [final directory of h]");
            System.exit(1);
        }
        String vDir = args[0];
        String wDir = args[1];
        String hDir = args[2];
        int k = Integer.parseInt(args[3]);
        int numMappers = Integer.parseInt(args[4]);
        int numReducers = Integer.parseInt(args[5]);
        int replication = Integer.parseInt(args[6]);
        String outputDir = args[7];
        String wFinalDir = args[8];
        String hFinalDir = args[9];
        JobConf mainJob = new JobConf(MatrixGNMF.class);
        String vDirectory;
        String wDirectory;
        String hDirectory;
        FileSystem.get(mainJob).delete(new Path(outputDir));
        vDirectory = vDir;
        hDirectory = hDir;
        wDirectory = wDir;
        String workingDirectory;
        String resultDirectoryX;
        String resultDirectoryY;
        long start = System.currentTimeMillis();
        System.gc();
        System.out.println("starting calculation");
        System.out.print("calculating X = WT * V... ");
        workingDirectory = UpdateWHStep1.runJob(numMappers, numReducers, replication,
            UpdateWHStep1.UPDATE_TYPE_H, vDirectory, wDirectory, outputDir, k);
        resultDirectoryX = UpdateWHStep2.runJob(numMappers, numReducers, replication,
            workingDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");
        System.out.print("calculating Y = WT * W * H... ");
        workingDirectory = UpdateWHStep3.runJob(numMappers, numReducers, replication,
            wDirectory, outputDir);
        resultDirectoryY = UpdateWHStep4.runJob(numMappers, replication, workingDirectory,
            UpdateWHStep4.UPDATE_TYPE_H, hDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");
        System.out.print("calculating H = H .* X ./ Y... ");
        workingDirectory = UpdateWHStep5.runJob(numMappers, numReducers, replication,
            hDirectory, resultDirectoryX, resultDirectoryY, hFinalDir, k);
        System.out.println("done");
        FileSystem.get(mainJob).delete(new Path(resultDirectoryX));
        FileSystem.get(mainJob).delete(new Path(resultDirectoryY));
        System.out.print("storing back H... ");
        FileSystem.get(mainJob).delete(new Path(hDirectory));
        hDirectory = workingDirectory;
        System.out.println("done");
        System.out.print("calculating X = V * HT... ");
        workingDirectory = UpdateWHStep1.runJob(numMappers, numReducers, replication,
            UpdateWHStep1.UPDATE_TYPE_W, vDirectory, hDirectory, outputDir, k);
        resultDirectoryX = UpdateWHStep2.runJob(numMappers, numReducers, replication,
            workingDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");
        System.out.print("calculating Y = W * H * HT... ");
        workingDirectory = UpdateWHStep3.runJob(numMappers, numReducers, replication,
            hDirectory, outputDir);
        resultDirectoryY = UpdateWHStep4.runJob(numMappers, replication, workingDirectory,
            UpdateWHStep4.UPDATE_TYPE_W, wDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");
        System.out.print("calculating W = W .* X ./ Y... ");
        workingDirectory = UpdateWHStep5.runJob(numMappers, numReducers, replication,
            wDirectory, resultDirectoryX, resultDirectoryY, wFinalDir, k);
        System.out.println("done");
        FileSystem.get(mainJob).delete(new Path(resultDirectoryX));
        FileSystem.get(mainJob).delete(new Path(resultDirectoryY));
        System.out.print("storing back W... ");
        FileSystem.get(mainJob).delete(new Path(wDirectory));
        wDirectory = workingDirectory;
        System.out.println("done");
        long requiredTime = System.currentTimeMillis() - start;
        long requiredTimeMilliseconds = requiredTime % 1000;
        requiredTime -= requiredTimeMilliseconds;
        requiredTime /= 1000;
        long requiredTimeSeconds = requiredTime % 60;
        requiredTime -= requiredTimeSeconds;
        requiredTime /= 60;
        long requiredTimeMinutes = requiredTime % 60;
        requiredTime -= requiredTimeMinutes;
        requiredTime /= 60;
        long requiredTimeHours = requiredTime;
    }
}
package gnmf;

import gnmf.io.MatrixObject;
import gnmf.io.MatrixVector;
import gnmf.io.TaggedIndex;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class UpdateWHStep2 {
    static class UpdateWHStep2Mapper extends MapReduceBase
            implements Mapper<TaggedIndex, MatrixVector, TaggedIndex, MatrixVector> {
        @Override
        public void map(TaggedIndex key, MatrixVector value,
                OutputCollector<TaggedIndex, MatrixVector> out, Reporter reporter)
                throws IOException {
            out.collect(key, value);
        }
    }

    static class UpdateWHStep2Reducer extends MapReduceBase
            implements Reducer<TaggedIndex, MatrixVector, TaggedIndex, MatrixObject> {
        @Override
        public void reduce(TaggedIndex key, Iterator<MatrixVector> values,
                OutputCollector<TaggedIndex, MatrixObject> out, Reporter reporter)
                throws IOException {
            MatrixVector result = null;
            while(values.hasNext()) {
                MatrixVector current = values.next();
                if(result == null) {
                    result = current.getCopy();
                } else {
                    result.addVector(current);
                }
            }
            if(result != null) {
                out.collect(new TaggedIndex(key.getIndex(), TaggedIndex.TYPE_VECTOR_X),
                    new MatrixObject(result));
            }
        }
    }

    public static String runJob(int numMappers, int numReducers, int replication,
            String inputDir, String outputDir) throws IOException {
        String workingDirectory = outputDir + System.currentTimeMillis() + "-UpdateWHStep2/";

        JobConf job = new JobConf(UpdateWHStep2.class);
        job.setJobName("MatrixGNMFUpdateWHStep2");
        job.setInputFormat(SequenceFileInputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(inputDir));
        job.setOutputFormat(SequenceFileOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(workingDirectory));
        job.setNumMapTasks(numMappers);
        job.setMapperClass(UpdateWHStep2Mapper.class);
        job.setMapOutputKeyClass(TaggedIndex.class);
        job.setMapOutputValueClass(MatrixVector.class);
        job.setNumReduceTasks(numReducers);
        job.setReducerClass(UpdateWHStep2Reducer.class);
        job.setOutputKeyClass(TaggedIndex.class);
        job.setOutputValueClass(MatrixObject.class);
        JobClient.runJob(job);
        return workingDirectory;
    }
}
package gnmf;

import gnmf.io.MatrixCell;
import gnmf.io.MatrixFormats;
import gnmf.io.MatrixObject;
import gnmf.io.MatrixVector;
import gnmf.io.TaggedIndex;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class UpdateWHStep1 {
    public static final int UPDATE_TYPE_H = 0;
    public static final int UPDATE_TYPE_W = 1;

    static class UpdateWHStep1Mapper extends MapReduceBase
            implements Mapper<TaggedIndex, MatrixObject, TaggedIndex, MatrixObject> {
        private int updateType;

        @Override
        public void map(TaggedIndex key, MatrixObject value,
                OutputCollector<TaggedIndex, MatrixObject> out, Reporter reporter)
                throws IOException {
            if(updateType == UPDATE_TYPE_W && key.getType() == TaggedIndex.TYPE_CELL) {
                MatrixCell current = (MatrixCell) value.getObject();
                out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_CELL),
                    new MatrixObject(new MatrixCell(key.getIndex(), current.getValue())));
            } else {
                out.collect(key, value);
            }
        }

        @Override
        public void configure(JobConf job) {
            updateType = job.getInt("gnmf.updateType", 0);
        }
    }

    static class UpdateWHStep1Reducer extends MapReduceBase
            implements Reducer<TaggedIndex, MatrixObject, TaggedIndex, MatrixVector> {
        private double[] baseVector = null;
        private int vectorSizeK;

        @Override
        public void reduce(TaggedIndex key, Iterator<MatrixObject> values,
                OutputCollector<TaggedIndex, MatrixVector> out, Reporter reporter)
                throws IOException {
            if(key.getType() == TaggedIndex.TYPE_VECTOR) {
                if(!values.hasNext())
                    throw new RuntimeException("expected vector");
                MatrixFormats current = values.next().getObject();
                if(!(current instanceof MatrixVector))
                    throw new RuntimeException("expected vector");
                baseVector = ((MatrixVector) current).getValues();
            } else {
                while(values.hasNext()) {
                    MatrixCell current = (MatrixCell) values.next().getObject();
                    if(baseVector == null) {
                        out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_VECTOR),
                            new MatrixVector(vectorSizeK));
                    } else {
                        if(baseVector.length == 0)
                            throw new RuntimeException("base vector is corrupted");
                        MatrixVector resultingVector = new MatrixVector(baseVector);
                        resultingVector.multiplyWithScalar(current.getValue());
                        if(resultingVector.getValues().length == 0)
                            throw new RuntimeException("multiplying with scalar failed");
                        out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_VECTOR),
                            resultingVector);
                    }
                }
                baseVector = null;
            }
        }

        @Override
        public void configure(JobConf job) {
            vectorSizeK = job.getInt("dml.matrix.gnmf.k", 0);
            if(vectorSizeK == 0)
                throw new RuntimeException("invalid k specified");
        }
    }

    public static String runJob(int numMappers, int numReducers, int replication,
            int updateType, String matrixInputDir, String whInputDir, String outputDir,
            int k) throws IOException {
Java Implementation (>1500 lines of code)
Equivalent SystemML Implementation (10 lines of code)
Experimenting with multiple variants!
W = W*max(V%*%t(H) - alphaW JW, 0)/(W%*%H%*%t(H))
H = H*max(t(W)%*%V - alphaH JH, 0)/(t(W)%*%W%*%H)
W = W*((S*V)%*%t(H))/((S*(W%*%H))%*%t(H))
H = H*(t(W)%*%(S*V))/(t(W)%*%(S*(W%*%H)))
W = W*(V/(W%*%H) %*% t(H))/(E%*%t(H))
H = H*(t(W)%*%(V/(W%*%H)))/(t(W)%*%E)
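The progress messages in the Java driver describe the basic multiplicative updates H = H .* (WT V) ./ (WT W H) followed by W = W .* (V HT) ./ (W H HT). A minimal single-machine sketch of exactly these two updates in plain Python, to show how little logic is involved once the MapReduce plumbing is removed. The helper names, the small epsilon in the denominator, the sample matrix and the random initialisation are assumptions made for this illustration; this is not SystemML or BigR code.

```python
import random


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def multiply_divide(target, numer, denom, eps=1e-9):
    """Element-wise target * numer / denom; eps guards against division by zero."""
    return [[t * n / (d + eps) for t, n, d in zip(rt, rn, rd)]
            for rt, rn, rd in zip(target, numer, denom)]


def gnmf_step(v, w, h):
    """One round of the multiplicative updates: first H, then W."""
    wt = transpose(w)
    h = multiply_divide(h, matmul(wt, v), matmul(matmul(wt, w), h))
    ht = transpose(h)
    w = multiply_divide(w, matmul(v, ht), matmul(w, matmul(h, ht)))
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


rng = random.Random(0)
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [3.0, 2.0, 5.0]]
k = 2  # number of latent factors
w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in v]
h = [[rng.uniform(0.1, 1.0) for _ in v[0]] for _ in range(k)]

err_before = frobenius_error(v, w, h)
for _ in range(50):
    w, h = gnmf_step(v, w, h)
err_after = frobenius_error(v, w, h)
print(err_before, err_after)
```

Because every factor in the update is non-negative, W and H stay non-negative without any explicit projection step.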
BigR (based on SystemML)
SystemML compiles hybrid runtime plans ranging from in-memory, single machine (CP) to large-scale, cluster (MR) compute
● Challenge
  ● Guaranteed hard memory constraints (budget of JVM size)
  ● for arbitrary complex ML programs
● Key Technical Innovations
  ● CP & MR Runtime: Single machine & MR operations, integrated runtime
  ● Caching: Reuse and eviction of in-memory objects
  ● Cost Model: Accurate time and worst-case memory estimates
  ● Optimizer: Cost-based runtime plan generation
  ● Dyn. Recompiler: Re-optimization for initial unknowns
[Figure: Hybrid Plans. Run time vs. data size for CP, CP/MR and MR plans; gradually exploit MR parallelism. High performance computing for small data sizes, scalable computing for large data sizes.]
BigR Architecture
[Figure: BigR architecture showing R Clients, IBM R Packages, Embedded R Execution, the SystemML Statistics Engine and Data Sources, with steps numbered 1 to 3]
Pull data (summaries) to R client
Or, push R functions right on the data
BigR Demo (small)
● 32 GB Data, ~650.000.000 rows (small, Innovation Center Zurich)
● 3 TB Data, ~60.937.500.000 rows (middle, Innovation Center Zurich)
● 0.7 PB Data, ~1.421875×10¹³ rows (large, Innovation Center Hursley)
BigR Demo (small)
library(bigr)
bigr.connect(host="bigdata", port=7052, database="default",
             user="biadmin", password="xxx")
is.bigr.connected()
tbr <- bigr.frame(dataSource="DEL",
                  coltypes=c("numeric","numeric","numeric","numeric","character","character"),
                  dataPath="/user/biadmin/32Gtest", delimiter=",",
                  header=FALSE, useMapReduce=TRUE)
h <- bigr.histogram.stats(tbr$V1, nbins=24)
![Page 52: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/52.jpg)
BigR Demo (small)
  class bins   counts centroids
1   ALL    0 18289280  1.583333
2   ALL    1    15360  2.750000
3   ALL    2    55040  3.916667
4   ALL    3   189440  5.083333
5   ALL    4   579840  6.250000
6   ALL    5  5292160  7.416667
7   ALL    6  8074880  8.583333
8   ALL    7 15653120  9.750000
...
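The printed centroids are consistent with the midpoints of 24 equal-width bins over the range 1 to 29 (bin width 28/24); that range is inferred from the output and is not stated on the slide. The following base-R sketch computes the same kind of table for an in-memory vector. It is an illustration only, not Big R's implementation, and `bin_stats` is a hypothetical helper.

```r
# Equal-width binning: returns bin index, count, and bin midpoint (centroid).
bin_stats <- function(x, nbins) {
  lo <- min(x)
  hi <- max(x)
  stopifnot(hi > lo)                      # a zero-width range cannot be binned
  width <- (hi - lo) / nbins
  # Zero-based bin index; the maximum value is assigned to the last bin
  idx <- pmin(floor((x - lo) / width), nbins - 1)
  data.frame(
    bins      = 0:(nbins - 1),
    counts    = tabulate(idx + 1, nbins = nbins),
    centroids = lo + (0:(nbins - 1) + 0.5) * width
  )
}

# Small made-up vector spanning the inferred range 1 to 29
x <- c(1, 2.5, 7, 9.9, 29)
h <- bin_stats(x, nbins = 24)
head(h, 8)
```

With the range 1 to 29, the first centroid is 1 + 0.5 × 28/24 ≈ 1.583333, matching the first row of the slide output.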
![Page 53: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/53.jpg)
BigR Demo (small)
![Page 54: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/54.jpg)
BigR Demo (small)
jpeg('hist.jpg')
# This command runs on 32 GB / ~650.000.000 rows in HDFS
bigr.histogram(tbr$V1, nbins=24)
dev.off()
![Page 55: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/55.jpg)
BigR Demo (small)
Sampling, Resampling, Bootstrapping
vs. Whole Dataset Processing
What is your experience?
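As a starting point for that discussion, here is a minimal base-R sketch of the sampling side: it estimates a mean from a sample with a bootstrap confidence interval and compares it with the whole-data mean. The data are synthetic and all names and sizes are assumptions made for illustration.

```r
set.seed(42)

# Synthetic stand-in for the "whole dataset"
population <- rnorm(100000, mean = 10, sd = 2)
whole_mean <- mean(population)

# Work on a sample of 1000 values instead of the full data
smp <- sample(population, 1000)

# Bootstrap: resample the sample with replacement and record each mean
boot_means <- replicate(500, mean(sample(smp, replace = TRUE)))

# 95% percentile interval for the mean, based on the sample alone
ci <- quantile(boot_means, c(0.025, 0.975))
```

Comparing `ci` with `whole_mean` shows how much precision is given up by not processing the whole dataset.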
![Page 56: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/56.jpg)
Python on Hadoop
![Page 57: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/57.jpg)
SPSS on Hadoop
![Page 58: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/58.jpg)
SPSS on Hadoop
![Page 59: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/59.jpg)
BigSheets Demo (small)
● 32 GB Data, ~650.000.000 rows (small, Innovation Center Zurich)
● 3 TB Data, ~60.937.500.000 rows (middle, Innovation Center Zurich)
● 0.7 PB Data, ~1.421875×10¹³ rows (large, Innovation Center Hursley)
![Page 60: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/60.jpg)
BigSheets Demo (small)
![Page 61: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/61.jpg)
BigSheets Demo (small)
This command runs on 32 GB / ~650.000.000 rows in HDFS
![Page 62: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/62.jpg)
BigSheets Demo (small)
![Page 63: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/63.jpg)
Text Extraction (SystemT, AQL)
![Page 64: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/64.jpg)
Text Extraction (SystemT, AQL)
![Page 65: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/65.jpg)
If this is not enough? → BigData AppStore
![Page 66: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/66.jpg)
BigData AppStore, Eclipse Tooling
● Write your apps in
  ● Java (MapReduce)
  ● PigLatin, Jaql
  ● BigSQL/Hive/BigR
● Deploy it to BigInsights via Eclipse
● Automatically
  ● Schedule
  ● Update
    ● HDFS files
    ● BigSQL tables
    ● BigSheets collections
![Page 67: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/67.jpg)
Questions?
http://www.ibm.com/software/data/bigdata/
Twitter: @RomeoKienzler, @IBMEcosystem_DE, @IBM_ISV_Alps
![Page 68: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/68.jpg)
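DFT/Audio Analytics (as promised)
library(tuneR)

# whitenoisesine.wav (by its name, white noise plus a sine tone):
# plot the squared real part of the FFT of the left channel
a <- readWave("whitenoisesine.wav")
f <- fft(a@left)
jpeg('rplot_wnsine.jpg')
plot(Re(f)^2)
dev.off()

# whitenoise.wav: same plot, for comparison
a <- readWave("whitenoise.wav")
f <- fft(a@left)
jpeg('rplot_wn.jpg')
plot(Re(f)^2)
dev.off()

# Convert the left-channel samples to a bigr.vector and to a plain list
a <- readWave("whitenoisesine.wav")
brv <- as.bigr.vector(a@left)
al <- as.list(a@left)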
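The WAV files used above are not part of this transcript, so the following base-R sketch recreates the experiment with a synthetic signal: a sine tone buried in white noise, whose frequency is recovered from the spectrum. The sampling rate, tone frequency, and noise level are assumptions chosen for illustration. The sketch uses `Mod(f)^2`, the usual power spectrum, rather than the `Re(f)^2` plotted on the slide.

```r
set.seed(1)

fs <- 8000                          # assumed sampling rate in Hz
n  <- 8000                          # one second of samples, so bins are 1 Hz apart
time_s <- (0:(n - 1)) / fs

# 440 Hz sine tone plus white (Gaussian) noise
signal <- sin(2 * pi * 440 * time_s) + rnorm(n, sd = 0.5)

# Power spectrum; only the first half is unique for a real-valued signal
power <- Mod(fft(signal))^2
half  <- 1:(n / 2)
freqs <- (half - 1) * fs / n        # frequency of each bin in Hz

# The strongest bin should sit at the tone's frequency
peak_hz <- freqs[which.max(power[half])]
```

Calling `plot(freqs, power[half], type = "l")` shows the tone as a single sharp peak above the flat noise floor.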
![Page 69: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/69.jpg)
Backup Slides
![Page 70: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/70.jpg)
![Page 71: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/71.jpg)
![Page 72: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/72.jpg)
![Page 73: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/73.jpg)
![Page 74: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/74.jpg)
![Page 75: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/75.jpg)
![Page 76: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/76.jpg)
![Page 77: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/77.jpg)
![Page 78: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/78.jpg)
![Page 79: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/79.jpg)
![Page 80: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/80.jpg)
![Page 81: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/81.jpg)
![Page 82: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/82.jpg)
![Page 83: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/83.jpg)
![Page 84: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/84.jpg)
Map-Reduce
Source: http://www.cloudcomputingpatterns.org/Map_Reduce
![Page 85: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/85.jpg)
![Page 86: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/86.jpg)
![Page 87: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/87.jpg)
![Page 88: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/88.jpg)
![Page 89: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/89.jpg)
![Page 90: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/90.jpg)
![Page 91: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/91.jpg)
![Page 92: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/92.jpg)
![Page 93: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/93.jpg)
![Page 94: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/94.jpg)
![Page 95: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/95.jpg)
![Page 96: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/96.jpg)
![Page 97: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/97.jpg)
![Page 98: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/98.jpg)
![Page 99: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/99.jpg)
![Page 100: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/100.jpg)
![Page 101: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/101.jpg)
![Page 102: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/102.jpg)
![Page 103: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/103.jpg)
![Page 104: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/104.jpg)
![Page 105: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/105.jpg)
![Page 106: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/106.jpg)
![Page 107: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/107.jpg)
![Page 108: The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler](https://reader033.vdocument.in/reader033/viewer/2022051819/54c6a8034a7959cc268b45ec/html5/thumbnails/108.jpg)