Geolocation analysis using HiveQL

Geolocation Data Analysis for Safe Residence using HiveQL

TEAM: PRIYANKA KALE, PRIYAL MISTRY, HITESH JAGTAP

GUIDE: DR. JONGWOOK WOO

24th Annual Student Symposium, CSULA, 26th February 2016

Uploaded by priyanka-kale on 16 April 2017.


Page 1: Geolocation analysis using HiveQL



Table of Contents

1. Introduction

2. Big Data

3. Flowchart

4. Specifications

5. Implementation

6. Visualization

7. GitHub

8. Business Perspective

9. References


Introduction

Goal: To determine whether a location is safe by analyzing a large crime dataset (1.3 GB) for the city of Chicago, IL, covering 2001 to the present (November 2015).

This is a study of a real dataset provided by the government of the United States of America, using Big Data analytics and related tools.

Query output is visualized using different graphs and maps for better interpretation.


Big Data

Volume

Complexity

Variety

Variability


Flowchart

Download Dataset

Upload data into HDFS

Trigger Hive Queries

Result Tables

Output visualization
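The slides run these steps through Hue and Beeswax. As a rough command-line equivalent, the "Upload data into HDFS" and "Trigger Hive Queries" steps could be scripted as below. This is a minimal sketch, not the project's tooling: the file name `crimes.csv`, the HDFS directory `/user/hive/crime_data` and the script `queries.hql` are hypothetical, and the download step is left out because the slides do not give the dataset URL. `hdfs dfs -mkdir -p`, `hdfs dfs -put -f` and `hive -f` are standard Hadoop and Hive CLI commands.

```python
import subprocess
from typing import List


def pipeline_commands(local_csv: str, hdfs_dir: str, hql_script: str) -> List[List[str]]:
    """Commands for the 'Upload data into HDFS' and 'Trigger Hive Queries' steps."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],           # create the target directory
        ["hdfs", "dfs", "-put", "-f", local_csv, hdfs_dir],  # copy the CSV, overwriting an old copy
        ["hive", "-f", hql_script],                          # run the HiveQL statements in the script
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    # Hypothetical names: substitute the real file, HDFS directory and script.
    commands = pipeline_commands("crimes.csv", "/user/hive/crime_data", "queries.hql")
    run_pipeline(commands, dry_run=True)
```

Keeping `dry_run=True` as the default lets the command list be reviewed before anything touches the cluster.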


Specifications

• Hortonworks Sandbox on Microsoft Azure:

1. Linux system

2. Number of nodes: 4

3. 8 cores

4. Size: 14 GB


Implementation

Hue is a web application for browsing HDFS and for working with Hive and Cloudera Impala queries and MapReduce jobs.


Creating tables in HCatalog:
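Before defining the crime table, it is worth checking how a raw CSV row splits into columns. The sketch below uses a hypothetical, simplified header and a made-up row, and assumes the `Block` field holds the block-level address that the queries call `address` (for example `008XX N MICHIGAN AVE`). It shows that a quoted field containing a comma, such as a latitude/longitude pair, produces an extra column under a naive comma split but not under a CSV-aware parser. Hive's `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` splits naively; `org.apache.hadoop.hive.serde2.OpenCSVSerde` is one way to honour quotes, at the cost of reading every column as a string.

```python
import csv
import io
from typing import List

# Hypothetical, simplified header and a made-up data row (not from the real dataset).
SAMPLE = (
    "ID,Block,IUCR,Primary Type,Location Description,Location\n"
    '1001,008XX N MICHIGAN AVE,0820,THEFT,STREET,"(41.89, -87.62)"\n'
)


def naive_split(line: str) -> List[str]:
    """Split on every comma, ignoring quotes (like a plain comma-delimited row format)."""
    return line.rstrip("\n").split(",")


def csv_split(text: str) -> List[List[str]]:
    """Parse with the csv module, which keeps double-quoted fields intact."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    header_line, data_line = SAMPLE.splitlines()
    print(len(naive_split(header_line)), len(naive_split(data_line)))  # header vs. data field count
    rows = csv_split(SAMPLE)
    print([len(r) for r in rows])
    print(dict(zip(rows[0], rows[1]))["Location"])
```

With the naive split the header has 6 fields but the data row has 7, because the quoted Location value is cut in two; the CSV parser returns 6 fields for both lines and keeps `(41.89, -87.62)` as one value.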


Hive and Beeswax

Hive is an infrastructure built on top of Hadoop for data summarization, querying and analysis.

Beeswax is an application for running Hive queries.


Processing in Beeswax:


Queries and Visualization

Total number and rank of each crime type:

select primary_type, count(iucr), rank() over (ORDER BY count(iucr) desc)
from crime
group by primary_type
limit 100;
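The ranking query combines a per-type count with `rank()`, which gives tied counts the same rank and then skips ahead (1, 1, 3, ...). A plain-Python sketch of the same semantics on made-up `(primary_type, iucr)` pairs follows; like `count(iucr)`, it does not count rows whose `iucr` is NULL (`None` here). Ties are broken alphabetically only to make the output deterministic; SQL does not guarantee an order among tied rows.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def rank_crime_types(
    rows: Iterable[Tuple[str, Optional[str]]],
) -> List[Tuple[str, int, int]]:
    """Return (primary_type, count of non-NULL iucr, rank), highest count first."""
    counts: Counter = Counter()
    for primary_type, iucr in rows:
        counts[primary_type] += 0      # keep the group even if every iucr is NULL
        if iucr is not None:           # count(iucr) ignores NULLs
            counts[primary_type] += 1

    # ORDER BY count DESC; the alphabetical tie-break is only for deterministic output
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    ranked: List[Tuple[str, int, int]] = []
    for position, (primary_type, total) in enumerate(ordered, start=1):
        # rank(): equal counts share a rank; the next distinct count takes its row position
        if ranked and ranked[-1][1] == total:
            rank = ranked[-1][2]
        else:
            rank = position
        ranked.append((primary_type, total, rank))
    return ranked


if __name__ == "__main__":
    sample = [  # made-up rows, not taken from the real dataset
        ("THEFT", "0820"), ("THEFT", "0810"),
        ("BATTERY", "0486"), ("BATTERY", "0460"),
        ("NARCOTICS", "2027"),
        ("ARSON", None),
    ]
    for row in rank_crime_types(sample):
        print(row)
```

With these sample rows, BATTERY and THEFT both get rank 1, NARCOTICS gets rank 3 (rank 2 is skipped), and ARSON, whose only row has a NULL `iucr`, appears with a count of 0 at rank 4.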


Number of crimes by location type for a given area:

select location_description, count(iucr)
from crime
where address = '008XX N MICHIGAN AVE'
group by location_description
limit 100;

[Chart of the query result: Total crimes per location type]
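To try the location-type query locally, Python's built-in `sqlite3` module can stand in for Hive on a few made-up rows. This is a small-scale sketch, not the project's setup: the three-column `crime` table and its rows are hypothetical, the address is passed as a query parameter instead of a literal, and an `ORDER BY` is added so the output order is deterministic (the original query has none).

```python
import sqlite3
from typing import List, Tuple


def crimes_by_location_type(conn: sqlite3.Connection, address: str) -> List[Tuple[str, int]]:
    """Count crimes per location_description for one address, most frequent first."""
    return conn.execute(
        """
        SELECT location_description, count(iucr)
        FROM crime
        WHERE address = ?
        GROUP BY location_description
        ORDER BY count(iucr) DESC, location_description
        LIMIT 100
        """,
        (address,),
    ).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for the Hive crime table, holding made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crime (address TEXT, location_description TEXT, iucr TEXT)")
    conn.executemany(
        "INSERT INTO crime VALUES (?, ?, ?)",
        [
            ("008XX N MICHIGAN AVE", "STREET", "0820"),
            ("008XX N MICHIGAN AVE", "STREET", "0486"),
            ("008XX N MICHIGAN AVE", "DEPARTMENT STORE", "0860"),
            ("008XX N MICHIGAN AVE", "SIDEWALK", None),  # NULL iucr: group kept, count is 0
            ("001XX W EXAMPLE ST", "STREET", "0820"),    # other address: filtered out by WHERE
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(crimes_by_location_type(conn, "008XX N MICHIGAN AVE"))
```

For the sample data this returns `STREET` with 2, `DEPARTMENT STORE` with 1 and `SIDEWALK` with 0, since `count(iucr)` skips the NULL value while the group itself still appears.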


Final outcome of the analysis:

CREATE TABLE UnsafeArea
row format delimited fields terminated by ','
STORED AS RCFile
AS select address, count(iucr) AS total_crimes, rank() over (ORDER BY count(iucr) desc) AS rank
from crime
GROUP BY address;
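The `UnsafeArea` table can then answer the project's question for a single address by looking up its rank. The sketch below recreates that statement in SQLite (version 3.25 or later, which added window functions) on made-up rows; SQLite has no RCFile or row-format clauses, so those are dropped. The `top_n` cutoff that turns a rank into an "unsafe" label is a hypothetical rule for illustration only; the slides do not state where that line is drawn.

```python
import sqlite3
from typing import Optional, Tuple


def build_unsafe_area(conn: sqlite3.Connection) -> None:
    """Recreate the UnsafeArea CTAS; Hive storage clauses are omitted for SQLite."""
    conn.execute(
        """
        CREATE TABLE UnsafeArea AS
        SELECT address,
               count(iucr) AS total_crimes,
               rank() OVER (ORDER BY count(iucr) DESC) AS rank
        FROM crime
        GROUP BY address
        """
    )


def lookup(conn: sqlite3.Connection, address: str) -> Optional[Tuple[int, int]]:
    """(total_crimes, rank) for an address, or None if it never appears in the data."""
    return conn.execute(
        "SELECT total_crimes, rank FROM UnsafeArea WHERE address = ?", (address,)
    ).fetchone()


def is_unsafe(conn: sqlite3.Connection, address: str, top_n: int = 1) -> bool:
    """Hypothetical rule: an address counts as 'unsafe' if it ranks within the top_n."""
    found = lookup(conn, address)
    return found is not None and found[1] <= top_n


def demo_connection() -> sqlite3.Connection:
    """In-memory crime table with made-up rows, plus the derived UnsafeArea table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crime (address TEXT, iucr TEXT)")
    conn.executemany(
        "INSERT INTO crime VALUES (?, ?)",
        [
            ("008XX N MICHIGAN AVE", "0820"),
            ("008XX N MICHIGAN AVE", "0486"),
            ("008XX N MICHIGAN AVE", "0860"),
            ("001XX W EXAMPLE ST", "0820"),
            ("001XX W EXAMPLE ST", "0810"),
            ("002XX S SAMPLE AVE", "0486"),
        ],
    )
    build_unsafe_area(conn)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(lookup(conn, "008XX N MICHIGAN AVE"))
    print(is_unsafe(conn, "002XX S SAMPLE AVE", top_n=2))
```

Here `008XX N MICHIGAN AVE` has 3 crimes and rank 1, while `002XX S SAMPLE AVE` ranks 3rd and so is not flagged under a top-2 cutoff. An address that never appears in the data returns `None`, which this rule treats as not unsafe.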


GitHub

URL: https://github.com/priya708/Project-520


Business Perspective

Better advertising

Predictive policing for police departments: the future of law enforcement?

• Reducing Random Gunfire

• Connecting Burglaries and Code Violations


References

https://catalog.data.gov

https://cwiki.apache.org/confluence/display/Hive/Tutorial

https://hortonworks.com/tutorials


THANK YOU