‘BigExcel’: A Web-Based Framework for Exploring Big Data in Social Sciences
Asif Saleem, Blesson Varghese and Adam Barker
University of St Andrews, UK



B. Varghese - Big Humanities 2014

Agenda
- Introduction
- Challenges
- Framework
- Demo
- Feasibility Study
- Conclusions

Introduction
- Transformative change in the data analysis landscape
  o Traditionally: spreadsheet-like applications
  o Now: big data tools
- Big data technologies are maturing
  o Cloud computing: infrastructure support
  o Hadoop and Hive: programming paradigm
- These technologies are sometimes difficult to use, even for computer scientists
  o Set-up, programming, adapting to the hardware infrastructure, etc.

Challenges
- Limited accessibility of big data tools
  o Gap between the technology and the end user
  o In-depth knowledge of the tools is required to use them
  o Knowledge of the hardware and excellent programming skills are required
- Lack of exploratory tools for big data
  o Users need to perform quick analyses without undertaking large programming tasks
- Lack of lightweight big data tools
  o Full-fledged, comprehensive tools are available, but they require professional training

BigExcel Framework
- Three-tier framework:
  o User Interaction Layer: a data browser built using RichFaces; connects to the next layer using RESTful Web Services
  o Query Management Layer: constructs queries for Hive; manages the data; stores the logic for analytical operations in MapReduce
  o Infrastructure Management Layer: connects to the cloud using the Amazon Web Services SDK

BigExcel v1.0 Demo

Feasibility Study
- Based on Yahoo Sandbox datasets
  o Predicting market trends
  o News-related n-grams
- Example
  o The user clicks on the browser
  o The clicks are converted to queries:
    SELECT TRANSFORM(date, time, buzz_score) USING hourly_analysis FROM Yahoo_Buzz_Scores WHERE product=EBOOKS AND date >= AND date
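The Feasibility Study example says that browser clicks are turned into Hive queries by the Query Management Layer. What follows is a minimal Python sketch of that step, not BigExcel's actual code. It assumes a click has already been reduced to a selection of table, columns, streaming script and filters. The `Selection` class and the `build_query`, `quote_ident` and `quote_literal` functions are hypothetical names. Because the date bounds on the slide are cut off, the dates in the usage example are made up. The sketch quotes string values and backtick-quotes identifiers, because `date` can be a reserved word depending on the Hive version, so its output differs cosmetically from the query shown on the slide. It only builds the query string and does not submit it to Hive.

```python
"""Hypothetical sketch of the Query Management Layer step that turns a
data-browser selection into a HiveQL TRANSFORM query (not BigExcel's API)."""
import re
from dataclasses import dataclass, field

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_ident(name: str) -> str:
    # Accept only plain identifiers, then backtick-quote them so that column
    # names such as `date` stay valid even where they are reserved words.
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def quote_literal(value: str) -> str:
    # Hive string literals are single-quoted and use backslash escapes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


@dataclass
class Selection:
    """Hypothetical summary of what the user picked in the data browser."""
    table: str
    columns: list[str]
    script: str  # streaming script passed to USING
    equals: dict[str, str] = field(default_factory=dict)
    ranges: dict[str, tuple[str, str]] = field(default_factory=dict)  # inclusive bounds


def build_query(sel: Selection) -> str:
    cols = ", ".join(quote_ident(c) for c in sel.columns)
    conditions = [f"{quote_ident(k)} = {quote_literal(v)}" for k, v in sel.equals.items()]
    for col, (low, high) in sel.ranges.items():
        conditions.append(f"{quote_ident(col)} >= {quote_literal(low)}")
        conditions.append(f"{quote_ident(col)} <= {quote_literal(high)}")
    query = (f"SELECT TRANSFORM({cols}) USING {quote_literal(sel.script)} "
             f"FROM {quote_ident(sel.table)}")
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


if __name__ == "__main__":
    example = Selection(
        table="Yahoo_Buzz_Scores",
        columns=["date", "time", "buzz_score"],
        script="hourly_analysis",
        equals={"product": "EBOOKS"},
        ranges={"date": ("2009-01-01", "2009-01-31")},  # made-up bounds
    )
    print(build_query(example))
```

Keeping query construction behind a small function like this lets the browser layer send only a structured selection, for example as JSON over a REST call. Identifiers are validated against a whitelist pattern rather than escaped, which keeps the sketch simple at the cost of rejecting unusual column names.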
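The example query on the Feasibility Study slide hands its rows to a streaming script named `hourly_analysis`, but the slide does not show that script. In Hive, `SELECT TRANSFORM ... USING` pipes each selected row to the script's standard input as tab-separated text, with NULLs written as `\N`, and reads tab-separated rows back from its standard output. The Python sketch below is a hypothetical stand-in built on that contract. It assumes, without support from the source, that the script averages `buzz_score` per date and hour and that `time` values look like `HH:MM` or `HH:MM:SS`. The `hourly_means` function name is also hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for the `hourly_analysis` streaming script:
reads (date, time, buzz_score) rows from stdin, writes per-hour means."""
import sys
from collections import defaultdict
from typing import Iterable, Iterator


def hourly_means(lines: Iterable[str]) -> Iterator[str]:
    # (date, hour) -> [sum of scores, number of rows]
    totals: dict[tuple[str, str], list[float]] = defaultdict(lambda: [0.0, 0])
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or "\\N" in fields:
            continue  # skip malformed rows and Hive NULLs (\N)
        date, time, score = fields
        hour = time.split(":", 1)[0]  # assumes times like HH:MM or HH:MM:SS
        try:
            value = float(score)
        except ValueError:
            continue  # skip non-numeric scores
        acc = totals[(date, hour)]
        acc[0] += value
        acc[1] += 1
    # Emit tab-separated rows, which Hive splits back into columns.
    for (date, hour), (total, count) in sorted(totals.items()):
        yield f"{date}\t{hour}\t{total / count:.3f}"


if __name__ == "__main__":
    for row in hourly_means(sys.stdin):
        print(row)
```

In a map-only `SELECT TRANSFORM` query, each task runs its own copy of the script on its own input split, so these averages are per split. A global per-hour figure would require routing all rows for a given date and hour to a single script instance, for example with Hive's MAP/REDUCE syntax and `CLUSTER BY`, or a separate aggregation step.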