Big Data Insights with Red Hat JBoss Data Virtualization

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

Upload: kenneth-peeples

Post on 22-Jan-2015


Category:

Technology



DESCRIPTION

You’re hearing a lot about big data these days. Big data and the technologies that store and process it, like Hadoop, aren’t just new data silos. You might be looking to integrate big data with existing enterprise information systems to gain a better understanding of your business and take informed action. During this session, we’ll demonstrate how Red Hat JBoss Data Virtualization can integrate with Hadoop through Hive and give users easy access to data. You’ll learn how Red Hat JBoss Data Virtualization helps you integrate your existing and growing data infrastructure, integrates big data with your existing enterprise data infrastructure, and lets non-technical users access big data result sets. We’ll also provide typical use cases and examples, and a demonstration of integrating Hadoop sentiment analysis with sales data.

TRANSCRIPT

1. GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION
  • Syed Rasheed, Solution Manager, Red Hat Corp.
  • Kenny Peeples, Technical Manager, Red Hat Corp.
  • Kimberly Palko, Product Manager, Red Hat Corp.

2. AGENDA
  • Demystifying Big Data
  • Data Virtualization: Making Big Data Available to Everyone
  • Red Hat Big Data Strategy and Platform
  • Real World Customer Example using Red Hat Big Data Platform
  • Demo
  • Roadmap
  • Q&A

3. DO WE AGREE ON WHAT BIG DATA IS?

4. Source: http://blogs.ifsworld.com/2013/02/how-will-big-data-influence-your-finance-team/

5. IT'S ALL ABOUT GAINING BUSINESS INSIGHTS
  • Improve product development
  • Optimize business processes
  • Improve customer care
  • Improve customer lifetime value
  • Personalize products
  • Competitive intelligence

6. INFORMATION AND AGILITY GAP
  • Over 70% of BI project effort lies in data integration, finding and identifying source data
  • Only 28% of users have any meaningful data access

7. DATA CHALLENGES GETTING BIGGER FOR USERS
  • NoSQL, Hive, MapReduce, HDFS, Pig, Jaql, Flume, Storm, HBase

8. RED HAT'S BIG DATA STRATEGY
  • Reduce the information gap by cost-effectively making ALL data easily consumable for analytics
  • [Diagram: Data Analytics; Data to Actionable Information Cycle]

9. BIG DATA FOR EVERYONE

10. EASY ACCESS TO BIG DATA
  • [Diagram labels: BI Reports & Analytics; Analytical Reporting Tool; Data Virtualization Server; Hadoop (Hive, MapReduce, HDFS); Big Data]
  1. The reporting tool accesses the data virtualization server via a rich SQL dialect
  2. The data virtualization server translates the rich SQL dialect to HiveQL
  3. Hive translates SQL to MapReduce
  4. MapReduce runs the MR job on big data

11. TURN FRAGMENTED DATA INTO ACTIONABLE INFORMATION
  • Consume: BI Reports & Analytics; Mobile Applications; SOA Applications & Portals; ESB, ETL
  • Compose: Unified Virtual Database / Common Data Model; Data Transformations
  • Connect: Native Data Connectivity
  • Standard based Data Provisioning: JDBC, ODBC, REST, SOAP, OData
  • Design Tools, Dashboard, Optimization, Caching, Security, Metadata
  • Sources (Siloed & Complex): Hadoop; NoSQL; Cloud Apps; Data Warehouse & Databases; Mainframe; XML, CSV & Excel Files; Enterprise Apps
  • Virtualize, Transform, Federate
  • Easy, Real-time Information Access

12. BENEFITS OF DATA VIRTUALIZATION ON BIG DATA
  • Enterprise democratization of big data
  • Any reporting or analytical tool can be used
  • Easy access to big data
  • Seamless integration of big data and existing data assets
  • Sharing of integration specifications
  • Collaborative development on big data
  • Fine-grained security of big data
  • Faster time-to-market for reports on big data

13. CONVERGENCE OF FOUR DATA TRENDS

14. COMPREHENSIVE MIDDLEWARE PLATFORM: CAPTURE, PROCESS AND INTEGRATE BIG DATA VOLUME, VELOCITY, VARIETY
  • Hadoop
  • Data Integration: JBoss Data Virtualization
  • In-memory Cache: JBoss Data Grid
  • BI Analytics (historical, operational, predictive)
  • SOA Composite Applications
  • Messaging and Event Processing: JBoss A-MQ and JBoss BRMS
  • Structured Data, Streaming Data, Semi-Structured Data
  • Red Hat Storage
  • Red Hat Enterprise Linux & Virtualization
  • Capture & Process; Integrate & Analyze

15. RED HAT BIG DATA PLATFORM

16. EXAMPLES: RED HAT BIG DATA PLATFORM IN THE REAL WORLD

17. BIG DATA IN THE UTILITIES
  • Objective: Combine data from smart meters on homes with data from electricity generation and transmission, and make it available to power providers.
  • Problem: The original smart grid project looked only at reading information from the meters on houses; this data now needs to be combined with generation and transmission data in a cost-effective way. The data points are all over the place: sensors on the lines, in the field, in homes, etc. The information must be accessible to multiple power providers through a common interface.
  • Solution: Use Messaging to collect data from a variety of sources and route it to a CEP for initial filtering. Process with Hadoop map/reduce and BRMS, then distribute the data to Data Virtualization to be combined with other sources and consumed with BI tools, and/or to JDG for in-memory data caching, and/or send it to archive.

18. SMART GRID
  • Segments: Transmission; Generation; Consumer; Regulatory; Users
  • Tiers: Element Connection Tier; Data Adaptation & Routing Tier; Normalized Data Tier; Data Tier; API Exposure & Portal Tier
  • Components: Collector Sensors; Collector SCADA; Collector Meter; Local Data Store; Adaptor Rules; Sensor Adaptor; Routing Function; Normalization / MapReduce; PM Regional Translator / Scheduler; Offline Storage; Data Virtualization; Cache; Authentication; Presentation; REST Exposure; Compose
  • Other labels: PM Data Schedule; PM Data Reports; Rules Creation / Updates; PM Admin; NoSQL-Cassandra

19. RETAIL CUSTOMER USE CASE: GAIN BETTER INSIGHT FOR INTELLIGENT INVENTORY MANAGEMENT
  • Objective: Right merchandise, at the right time and price.
  • Problem: Cannot utilize social data and sentiment analysis with their inventory and purchase management system.
  • Solution: Leverage JBoss Data Virtualization to mash up sentiment analysis data with inventory and purchasing system data. Leverage BRMS to optimize pricing and stocking decisions.
  • Consume: Analytical Apps; JBoss BRMS (Data Driven Decision Management)
  • Compose: JBoss Data Virtualization
  • Connect: Hive (Sentiment Analysis); Inventory Databases; Purchase Mgmt Application

20. DEMOS: LUCIDWORKS, JBOSS DATA VIRTUALIZATION AND RED HAT STORAGE

21. ABOUT LUCIDWORKS
  • Employs 40% of the committers for Lucene/Solr
  • Makes 50% - 70% of the enhancements to each release of Lucene/Solr
  • Only company to offer Open Source and Open Core Search Solutions

22. LUCENE/SOLR: ENABLING BETTER, DATA-DRIVEN DECISIONS

23. LUCIDWORKS DEMONSTRATION
  • LucidWorks/Solr provides full text search and statistics
  • Data Virtualization provides the data through the Teiid JDBC driver and pulls the data from Hive/Hadoop, a CSV file and an XML file
  • Red Hat Storage provides the Enterprise Data Repository

24. DEMONSTRATION ARCHITECTURE

25. DEMOS: HORTONWORKS AND JBOSS DATA VIRTUALIZATION

26. ABOUT HORTONWORKS
  • Founded in 2011 by 24 engineers from the original Yahoo! Hadoop development and operations team
  • Hortonworks drives innovation in the open, exclusively via the Apache Software Foundation process
  • Hortonworks is responsible for around 50% of core code base advances to Apache Hadoop

27. HORTONWORKS DATA PLATFORM 2 SANDBOX
  • Enterprise Ready YARN, the Hadoop Operating System
  • Stinger Phase 2: Interactive SQL Queries at Petabyte Scale
  • Reliable NoSQL in Hadoop with HBase
  • Technical Specs:

    Component          Version
    Apache Hadoop      2.2.0
    Apache Hive        0.12.0
    Apache HCatalog    0.12.0
    Apache HBase       0.96.0
    Apache ZooKeeper   3.4.5
    Apache Pig         0.12.0
    Apache Sqoop       1.4.4
    Apache Flume       1.4.0
    Apache Oozie       4.0.0
    Apache Ambari      1.4.1
    Apache Mahout      0.8.0
    Hue                2.3.0

28. HORTONWORKS DEMONSTRATION
  • Objective: Secure data according to role, using row level security and column masking.
  • Problem: Cannot hide region data such as patient data from region specific users.
  • Solution: Leverage JBoss Data Virtualization to provide row level security and masking of columns.
  • Consume: DV Dashboard to analyze the aggregated data by user role
  • Compose: JBoss Data Virtualization
  • Connect: Hive
  • SOURCE 1: Hive/Hadoop in the HDP contains US Region Data
  • SOURCE 2: Hive/Hadoop in the HDP contains EU Region Data

29. HORTONWORKS DEMONSTRATION
  • Objective: Determine if sentiment data from the first week of the Iron Man 3 movie is a predictor of sales.
  • Problem: Cannot utilize social data and sentiment analysis with the sales management system.
  • Solution: Leverage JBoss Data Virtualization to mash up sentiment analysis data with ticket and merchandise sales data on MySQL into a single view of the data.
  • Consume: Excel Power View and DV Dashboard to analyze the aggregated data
  • Compose: JBoss Data Virtualization
  • Connect: Hive
  • SOURCE 1: Hive/Hadoop contains Twitter data including sentiment
  • SOURCE 2: MySQL data that includes ticket and merchandise sales

30. DEMONSTRATION SYSTEM REQUIREMENTS
  • JDK: Oracle JDK 1.6 or 1.7, or OpenJDK 1.6 or 1.7
  • JBoss Data Virtualization v6 Beta: http://jboss.org/products/datavirt.html
  • JBoss Developer Studio: http://jboss.org/products
  • JBoss Integration Stack Tools (Teiid): https://devstudio.jboss.com/updates/7.0-development/integration-stack/
  • Slides, Code and References for demo: https://github.com/DataVirtualizationByExample/Mashup-with-Hive-and-MySQL
  • Hortonworks Data Platform (a VM for testing Hive/Hadoop): http://hortonworks.com/products/hdp-2/#install
  • Red Hat Storage: http://www.redhat.com/products/storage-server/

31. JBOSS DATA VIRTUALIZATION PRODUCT ROADMAP AND BIG DATA

32. WHAT'S COMING: JBOSS DATA VIRTUALIZATION 6.1

33. BENEFITS OF DATA VIRTUALIZATION ON BIG DATA
  • Enterprise democratization of big data
  • Any reporting or analytical tool can be used
  • Easy access to big data
  • Seamless integration of big data and existing data assets
  • Sharing of integration specifications
  • Collaborative development on big data
  • Fine-grained security of big data
  • Faster time-to-market for reports on big data

34. WHY RED HAT FOR BIG DATA?
  • Transform ALL data into actionable information
  • Cost Effective, Comprehensive Platform
  • Community based Innovation
  • Enterprise Class Software and Support
  • [Diagram: Data Analytics; Data to Actionable Information Cycle]

35. THANK YOU
  • Q & A
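
Editor's sketch for slide 29: the "single view of the data" idea can be illustrated with a small, self-contained Python example. This is not JBoss Data Virtualization or Teiid code. It assumes Python's built-in sqlite3 module as a stand-in: two attached in-memory databases play the roles of the Hive and MySQL sources, and a temporary view plays the role of the virtual database view. The schema names (hive_src, mysql_src), table and column names, and sample rows are all hypothetical and do not come from the demo.

```python
import sqlite3


def build_demo_connection():
    """Create two stand-in sources and one view that joins them."""
    conn = sqlite3.connect(":memory:")

    # Each attached in-memory database stands in for one physical source.
    conn.execute("ATTACH DATABASE ':memory:' AS hive_src")   # plays the Hive role
    conn.execute("ATTACH DATABASE ':memory:' AS mysql_src")  # plays the MySQL role

    # Hypothetical schemas; the real demo's tables are not shown in the slides.
    conn.execute("CREATE TABLE hive_src.tweet_sentiment (day TEXT, score INTEGER)")
    conn.execute("CREATE TABLE mysql_src.ticket_sales (day TEXT, tickets INTEGER)")

    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO hive_src.tweet_sentiment VALUES (?, ?)",
        [("day1", 2), ("day1", 0), ("day2", 2), ("day2", 1)],
    )
    conn.executemany(
        "INSERT INTO mysql_src.ticket_sales VALUES (?, ?)",
        [("day1", 100), ("day2", 150)],
    )

    # A TEMP view may reference tables in several attached databases,
    # so consumers query one name and never address the sources directly.
    conn.execute(
        """
        CREATE TEMP VIEW sentiment_vs_sales AS
        SELECT s.day AS day,
               s.tickets AS tickets,
               AVG(t.score) AS avg_sentiment
        FROM mysql_src.ticket_sales AS s
        JOIN hive_src.tweet_sentiment AS t ON t.day = s.day
        GROUP BY s.day, s.tickets
        """
    )
    return conn


def query_single_view(conn):
    """Read the combined data the way a reporting tool would: one plain SELECT."""
    cursor = conn.execute(
        "SELECT day, tickets, avg_sentiment FROM sentiment_vs_sales ORDER BY day"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in query_single_view(build_demo_connection()):
        print(row)
```

In the deck's terms, the attached databases correspond to "Connect", the view to "Compose", and the final SELECT to "Consume". A real deployment would instead reach the virtual database through one of the provisioning interfaces listed on slide 11 (JDBC, ODBC, REST, SOAP, OData).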