Esri UC2013 . Technical Workshop .
Technical Workshop
2013 Esri International User Conference July 8–12, 2013 | San Diego, California
Big Data: Using ArcGIS with Apache Hadoop
David Kaiser
Erik Hoel
Offering 1330
In this technical workshop
• This presentation is for anyone who uses ArcGIS and is interested in analyzing large amounts of data
• We will cover:
  - Big Data overview
  - The Hadoop platform
  - How Esri’s GIS Tools for Hadoop enables developers to process spatial data on Hadoop
  - How ArcGIS can leverage these custom Hadoop applications for GIS analysis
Big Data Overview
Big Data
• Within ArcGIS, Geoprocessing was enhanced at 10.1 SP1 to support 64-bit address spaces
  - This is sufficient to handle traditional large GIS datasets
• However, this solution may run into problems when confronted with datasets of the size colloquially referred to as Big Data
  - Internet-scale datasets
Age of Data Ubiquity
• Data is now central to our existence – both for corporations and individuals
• Nimble, thin, data-centric apps exploiting massive data sets generated by both enterprises and consumers
• Hardware era: 20 – 30 years
• Software era: 20 – 30 years
• Data era: ?
"The Internet has caused a Cambrian explosion of new life forms, new applications, new data-centric APIs, … literally thousands and thousands of new APIs are born every month." - Mike Hoskins
Sensor data challenge
• Sensors are continuously observing our world
  - With increases in networking technology, we are moving away from pure remote sensing and towards direct sensing, where every sensor reports its own data
• One researcher says: “We are instrumenting the universe”, referring to what is often called The Internet of Things
Internet of Things
• Your old refrigerator was a dumb refrigerator
• Your new refrigerator is a smart refrigerator
  - It’s a digital asset aware of itself
  - It records time, temperatures, vibrations
  - It records the electricity it’s consuming
  - It’s connected to the Internet as an IP device
• More realistically, commercial jets or valuable equipment
• By continuously observing things, we obtain a massive amount of valuable information
IoT - uniquely identifiable objects and their virtual representations in an Internet-like structure; the term was proposed by Kevin Ashton (MIT) in 1999
Volume of data – the rise of data anarchy
• If all 7 billion people on Earth joined Twitter and tweeted continually for one century, they would generate one zettabyte of data, i.e. a billion terabytes (Hadhazy, 2010)
• Almost double that amount, 1.8 zettabytes (1.8 × 10^21 bytes), was generated globally in 2011 (and 2.8 ZB in 2012)
• Rising 40-50% per year
• Estimated over 40 ZB in 2020
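The growth figures on this slide can be cross-checked with simple compounding. The sketch below is only an illustration: it assumes the 2.8 ZB figure for 2012 as the starting point and a constant annual growth rate, and the function name `project_volume_zb` is made up for this example.

```python
def project_volume_zb(start_zb: float, annual_growth: float, years: int) -> float:
    """Compound a yearly data volume forward at a constant growth rate."""
    return start_zb * (1.0 + annual_growth) ** years


# 2012 -> 2020 is eight years of growth, starting from the slide's 2.8 ZB.
low_estimate = project_volume_zb(2.8, 0.40, 8)   # 40% per year
high_estimate = project_volume_zb(2.8, 0.50, 8)  # 50% per year

print(f"2020 volume at 40%/year: {low_estimate:.1f} ZB")
print(f"2020 volume at 50%/year: {high_estimate:.1f} ZB")
```

Under these assumptions, the 40% rate alone is enough to pass 40 ZB by 2020, which agrees with the slide's estimate.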
Example
• For every hour that it runs, a commercial jet aircraft can create 20 gigabytes of operations information
• For a single journey across the Atlantic Ocean, a two-engine jet can create over 125 GB of data
• Multiply that by the more than 100,000 flights flown each day, and you get an understanding of the enormous amount of data that exists
  - E.g., 5-10 petabytes/day; 2-4 exabytes/year
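The daily and yearly totals can be sanity-checked with a little arithmetic. This is a rough sketch, assuming decimal units (1 PB = 10^15 bytes), exactly 100,000 flights per day, and a 365-day year; the helper names are illustrative only.

```python
GB = 10**9
PB = 10**15
EB = 10**18


def implied_gb_per_flight(daily_pb: float, flights_per_day: int) -> float:
    """Average data per flight implied by a daily total."""
    return daily_pb * PB / flights_per_day / GB


def yearly_eb(daily_pb: float, days: int = 365) -> float:
    """Yearly total in exabytes for a given daily total in petabytes."""
    return daily_pb * PB * days / EB


for daily in (5, 10):
    print(
        f"{daily} PB/day -> {implied_gb_per_flight(daily, 100_000):.0f} GB per flight, "
        f"{yearly_eb(daily):.2f} EB/year"
    )
```

Under these assumptions, 5-10 PB/day corresponds to an average of 50-100 GB per flight, and to roughly 2-4 EB per year.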
All this data has business value
• Smart Sensors
  - Electrical meters
• GPS Telemetry
  - Vehicle tracking, smartphone data collectors
• Internet Services
  - E-Commerce transactions, social media
• Monitoring Sensors
  - Heavy equipment, aircraft, etc.
Value when analyzing data at mass scale
• As observations increase in frequency
  - Each individual observation is worth less
  - …as the set of all observations becomes more valuable
• One single metric from the jet aircraft is much less useful than the analysis of that metric against the same metric from every known flight of that aircraft over time
• Big Data is the accumulation of this data and the analytical processes that use it for business value
What is Hadoop?
Racks of Hadoop Servers inside the Facebook.com Data Center
The need for distributed systems
• As data volumes have grown, individual computing systems have not scaled equivalently, e.g.,
  - Single-node databases
  - Single computers
• Big data has led to distributed systems:
  - Data is stored across a number of servers
  - Processing the data takes place on the server where the data is stored
Legacy system architecture
[Diagram: Disks and Processors connected through a Network Bottleneck]
Distributed system architecture
[Diagram: Processing Elements]
Until Now…
• Google implemented their enterprise on a distributed network of many nodes, fusing storage and processing into each node
• Hadoop is an open source implementation of the framework that Google has built their business around for many years
Hadoop
• Open-source data processing framework
• Provides scalable, distributed system for both storage and computation of data
  - It supports the running of applications on large clusters of commodity hardware
  - Hadoop was derived from Google's MapReduce and Google File System (GFS) scientific papers
• Platform consists of the kernel, HDFS, and MapReduce
  - Other applications have been built on Hadoop; this includes Hive, HBase, etc.
MapReduce
• A programming model for processing large data sets with a parallel distributed algorithm on a cluster
• A MapReduce program is composed of:
  - A Map() procedure that performs filtering and comparing, and
  - A Reduce() procedure that performs a summary operation
• The MapReduce System marshals the distributed servers, runs the various tasks in parallel, manages all communications and data transfers, and provides for redundancy and fault tolerance
• MapReduce libraries have been written in many languages; Hadoop is a popular open source implementation
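To make the programming model concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and uses no Hadoop API: `run_mapreduce`, the two user functions, the region names, and the sample readings are all invented for illustration. The map step filters records and emits key/value pairs, and the reduce step summarizes the values collected for each key.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy, sequential stand-in for a MapReduce framework."""
    groups = defaultdict(list)
    for record in records:
        # map_fn may emit zero or more (key, value) pairs per record.
        for key, value in map_fn(record):
            groups[key].append(value)
    # One reduce call per key, with keys visited in sorted order.
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


def map_strong_readings(record):
    """Map: filter out weak readings, key the rest by region."""
    region, magnitude = record
    if magnitude >= 5.5:
        yield region, magnitude


def reduce_max(key, values):
    """Reduce: summarize all values for one key as their maximum."""
    return max(values)


# Hypothetical sample data: (region, magnitude) pairs.
readings = [("north", 5.2), ("north", 6.1), ("south", 5.7), ("south", 5.1), ("east", 5.3)]
result = run_mapreduce(readings, map_strong_readings, reduce_max)
print(result)
```

A real framework would run the map calls on many servers at once and move the grouped data across the network; this sketch only shows the contract between the two user-written functions.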
High Level MapReduce Walk-Through
• An instance of a MapReduce program is a job
• The job accepts arguments for data input and outputs
• The job combines
  - Functions from the MapReduce framework
    - “Splitting” large inputs into smaller pieces
    - Reading inputs and writing outputs
  - Functions that are written by the application developer
    - Map function: maps input values to keys
    - Reduce function: reduces the many values for a key to one
MapReduce Job
[Diagram, built up one step per slide: data at hdfs://path/to/input → Split → map() ×4 → Combine → Shuffle / Sort → reduce() ×2 → part 1, part 2 at hdfs://path/to/output]
• The MapReduce framework splits the data space
• Many map() functions are run in parallel
• The framework runs combine to join map() outputs
• The framework performs a shuffle and sort on all data
• The reduce() functions work against the sorted data
• Reducers each write a part of the results to a file
• When the job has finished, ArcGIS can retrieve the results
MapReduce – Polygon Count Example
States: Washington (WA), Alaska (AK), Oregon (OR)
Splits (record 1 of each) → Map output
  AK AK OR AK WA → AK 1, AK 1, OR 1, AK 1, WA 1
  WA OR AK WA → WA 1, OR 1, AK 1, WA 1
  WA OR OR OR → WA 1, OR 1, OR 1, OR 1
Shuffle / Sort → Reduce output
  WA 1, WA 1, WA 1 and AK 1, AK 1, AK 1, AK 1 → WA 3, AK 4
  OR 1, OR 1, OR 1, OR 1, OR 1 → OR 5
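The stages of a counting job like this one (split, map, combine, shuffle/sort, reduce, one output part per reducer) can be sketched in plain Python. This is a sequential illustration rather than Hadoop code. The three splits are sample data made up for the sketch, not a transcription of the slide, and `run_job`, `partition`, and the use of CRC32 to assign keys to reducers are assumptions chosen to keep the example deterministic.

```python
import zlib
from collections import Counter, defaultdict


def map_record(state):
    """Map: emit (state, 1) for each record."""
    return (state, 1)


def combine(pairs):
    """Combine: pre-sum the map output of one split before the shuffle."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Pick the reducer for a key (CRC32 keeps this stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2):
    """Run all stages sequentially; return one result dict per reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # a real cluster would map the splits in parallel
        mapped = [map_record(record) for record in split]
        for key, value in combine(mapped):
            buckets[partition(key, num_reducers)][key].append(value)
    # Shuffle/sort: each reducer sees its keys in sorted order, then sums.
    return [
        {key: sum(values) for key, values in sorted(bucket.items())}
        for bucket in buckets
    ]


# Illustrative splits of state codes.
splits = [
    ["AK", "AK", "OR", "AK", "WA"],
    ["WA", "OR", "AK", "WA"],
    ["OR", "OR", "OR"],
]
parts = run_job(splits)
for index, part in enumerate(parts, start=1):
    print(f"part {index}: {part}")
```

Each dictionary in `parts` plays the role of one reducer's output file; merging them gives the complete per-state counts.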
Yahoo! Hadoop Cluster in 2011
• ~10,000 servers in a number of clusters
• Largest cluster is 1,600 nodes
• Nearly 1 Petabyte of user data
• Yahoo! runs nearly 10,000 research jobs per month
• We didn’t bring 10,000 servers to the user conference…
• … but we do have a small Hadoop cluster we can use for demos
GIS Tools for Hadoop
Credit: Fuqing Zhang and Yonghui Weng, Pennsylvania State University; Frank Marks, NOAA; Gregory P. Johnson, Romy Schneider, John Cazes, Karl Schulz, Bill Barth, The University of Texas at Austin
GIS Tools for Hadoop
• Hadoop is a data processing system that is designed to store and process large amounts of data
• The most common Hadoop data processing task is to reduce a large amount of data to a smaller, more manageable amount of data
• The GIS Tools for Hadoop provide query functions and API methods that enable Hadoop application developers to perform this data reduction process on spatial data
Data Reduction Patterns
• Need to reduce large volumes of data into manageable datasets that can be processed in the ArcGIS Platform
  - Filtering
  - Grouping
    - Simple “binning” against grid cells
    - Aggregation into known spatial areas
  - More complex patterns if desired
GIS Tools for Hadoop
• Developer API to support data reduction workflows
  - Released in March 2013 at Esri Developer Summit
• Spatial reduction with full geometry capabilities
  - Relational operators (touch, disjoint, …)
  - Topological operators (buffer, union, intersection, …)
  - Accessible via
    - SQL Expressions
    - Java
• Small set of GP Tools to migrate data and invoke jobs
GIS Tools for Hadoop (samples, tools)
  - Tools and samples using the open source resources that solve specific problems
Spatial Framework for Hadoop (hive: spatial-sdk-hive.jar; json: spatial-sdk-json.jar)
  - Spatial type functions for Hive, a SQL Engine on Hadoop
  - JSON helper utilities
Geoprocessing Tools for Hadoop (HadoopTools.pyt)
  - Geoprocessing tools that…
    - Copy to/from Hadoop
    - Convert to/from JSON
    - Invoke Hadoop jobs
Geometry API Java (esri-geometry-api.jar)
  - Java geometry library for spatial data processing
MapReduce Demos
DEMONSTRATION
Earthquake Measurement Data
1964/03/28 06:42:57.00,57.98,-151.6,33.0,5.2,ML,0
1964/03/28 06:43:54.40,58.26,-151.25,4.0,6.1,ML,0
1964/03/28 06:50:52.00,56.9,-151.7,33.0,5.1,ML,0
1964/03/28 06:53:35.90,58.79,-149.54,20.0,5.7,ML,0
1964/03/28 07:09:07.60,59.79,-148.1,13.0,5.3,ML,0
1964/03/28 07:10:22.00,58.83,-149.29,17.0,6.2,ML,0
1964/03/28 07:16:16.70,58.0,-150.7,33.0,5.3,ML,0
1964/03/28 07:24:24.60,59.62,-148.7,20.0,5.1,ML,0
Fields labelled on the slide: Coordinates, Depth, Magnitude
• Data format is CSV, already stored in an HDFS file
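A record in this format can be parsed with a few lines of Python. The sketch below rests on several assumptions that the slide does not state: the first coordinate is read as latitude and the second as longitude (inferred from the values), the sixth field is treated as a magnitude type, and the last field is kept as an uninterpreted flag. The `Quake` type and `parse_quake` function are illustrative names.

```python
from datetime import datetime
from typing import NamedTuple


class Quake(NamedTuple):
    time: datetime
    latitude: float      # assumption: first coordinate is latitude
    longitude: float     # assumption: second coordinate is longitude
    depth: float         # unit not stated on the slide
    magnitude: float
    magnitude_type: str  # assumption: "ML" is a magnitude type code
    flag: str            # meaning not stated on the slide; kept as text


def parse_quake(line: str) -> Quake:
    """Parse one comma-separated earthquake record."""
    ts, lat, lon, depth, mag, mag_type, flag = line.strip().split(",")
    return Quake(
        time=datetime.strptime(ts, "%Y/%m/%d %H:%M:%S.%f"),
        latitude=float(lat),
        longitude=float(lon),
        depth=float(depth),
        magnitude=float(mag),
        magnitude_type=mag_type,
        flag=flag,
    )


sample = "1964/03/28 06:42:57.00,57.98,-151.6,33.0,5.2,ML,0"
quake = parse_quake(sample)
print(quake)
```

In a MapReduce job, parsing of this kind would typically happen at the start of the map function, before any spatial test is applied to the coordinates.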
Grouping by Polygons
DEMONSTRATION
In this demo, we use ArcGIS to provide the polygonal boundaries of the U.S. States, and the Hadoop application aggregates earthquake data by the state boundaries
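The idea behind grouping by polygons can be sketched in plain Python with a ray-casting point-in-polygon test. This is not the demo's code and does not use the Esri Geometry API: the two rectangular zones are hypothetical stand-ins for real state boundaries, the points are the (longitude, latitude) pairs from the earthquake sample (assuming latitude comes first in the file), and all function names are illustrative.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def count_by_polygon(points, polygons):
    """Map each point to the first polygon containing it, then count per polygon."""
    counts = Counter()
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break
    return counts


# Hypothetical zones as (longitude, latitude) rings; NOT real state boundaries.
ZONES = {
    "zone_a": [(-152.0, 56.0), (-150.0, 56.0), (-150.0, 59.0), (-152.0, 59.0)],
    "zone_b": [(-150.0, 58.0), (-148.0, 58.0), (-148.0, 60.0), (-150.0, 60.0)],
}

# (longitude, latitude) pairs taken from the sample records.
QUAKES = [
    (-151.6, 57.98), (-151.25, 58.26), (-151.7, 56.9), (-149.54, 58.79),
    (-148.1, 59.79), (-149.29, 58.83), (-150.7, 58.0), (-148.7, 59.62),
]

zone_counts = count_by_polygon(QUAKES, ZONES)
print(dict(zone_counts))
```

In MapReduce terms, the containment test belongs in the map function (emitting the polygon name as the key) and the counting belongs in the reduce function.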
Grouping into Grid Cells
DEMONSTRATION
In this demo, we compute arbitrary grid cells such as a 1km grid, and aggregate earthquakes by these cells
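Binning against a grid reduces to computing a cell index from each coordinate and counting per index. The sketch below is a simplification and not the demo's code: it bins in degrees instead of kilometres (a 1km grid would first need projected coordinates), uses a 1-degree cell size chosen only for illustration, and takes its points from the earthquake sample as (longitude, latitude) pairs.

```python
import math
from collections import Counter


def cell_for(lon, lat, cell_size):
    """Return the (column, row) index of the grid cell containing the point."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def bin_points(points, cell_size):
    """Count points per grid cell; the cell index plays the role of the key."""
    return Counter(cell_for(lon, lat, cell_size) for lon, lat in points)


# (longitude, latitude) pairs taken from the sample records.
QUAKES = [
    (-151.6, 57.98), (-151.25, 58.26), (-151.7, 56.9), (-149.54, 58.79),
    (-148.1, 59.79), (-149.29, 58.83), (-150.7, 58.0), (-148.7, 59.62),
]

cell_counts = bin_points(QUAKES, cell_size=1.0)
for cell, count in sorted(cell_counts.items()):
    print(cell, count)
```

Because the cell index is computed from the point alone, no polygon lookup is needed, which is why grid binning is the simpler of the two grouping patterns.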
The Future
• GIS Tools for Hadoop is a developer story
  - Our initial foray into Big Data
• Esri will continue forward in this domain
  - User stories and technology will be the focus
• Swing by the island to chat further
• Please fill out the evaluations: offering 1330
Questions?