using big data technologies to enable social media analytics- impetus white paper

Upload: impetus

Post on 05-Apr-2018

  • 7/31/2019 Using Big Data Technologies to Enable Social Media Analytics- Impetus White Paper

    1/12

    Using Big Data technologies to

    enable social media analytics

    WHITE PAPER

    Abstract

    In this white paper, Impetus talks about the need for

    building Big Data technologies based social analytics

    platform for better business insight. The paper also

    focuses on why social media analytics is important in today's world and how 3-D data sources (that is,

    internal, external and social data) can be utilized to

    build a data warehouse based on Big Data

    technologies.

    In this white paper, Impetus also shares its

    recommended solution, and how Big Data

    technologies can be used to optimize costs and handle

    exponential increases in data over time.

    Impetus Technologies Inc.

    www.impetus.com

    February 2012

    Table of Contents

    Introduction

    The benefits of Social Analytics

    Data sources that facilitate Social Media Analytics

    Technical tenets of Social Media Analytics

    Using Big Data technologies to enable Social Media Analytics

    Building a Big Data warehouse

    A step-by-step approach to creating the Big Data EDW

    The Impetus solution

      The iLaDaP high level architecture

    Summary

    Introduction

    Social Media Analytics is a discipline that helps organizations measure, assess

    and explain the performance of their social media initiatives.

    There are four stages of analyzing social media data, including the following:

    Step 1: collecting the data. This facilitates the compiling of reports and statistics

    that are to be shared with the management or the internal and external

    stakeholders.

    Step 2: measuring the data. This helps in Sentiment Analysis and gauging which

    products are well received in the marketplace.

    Step 3: analysis. Here, data is presented in a visual and interactive manner to

    the management, as well as the sales and marketing teams to provide better

    insights.

    Step 4: innovation. Based on the insights and analysis, there is a move towards

    innovation, where organizations determine the new products and ideas they are

    going to pursue, as a response to customer requirements. Innovation also helps

    unearth the cross sell or up sell opportunities that were not visible before.

    Social Analytics opens up a host of new opportunities and perspectives.

    Category-wise analysis of customer data for instance, enables their

    demographic profiling and helps determine their usage patterns. Similarly, with

    Feature analysis, it is possible to figure out which forums, platforms or sources

    of data are more active as compared to others.

    Product Growth Analysis, which focuses on the data generated for a specific

    product, helps understand the response of users to that product. There is also a Recommendation Engine, which helps zero in on what is missing or lacking in a

    product range.

    Finally, Social Analytics enables Third Party Analysis, which is purely focused on

    what the public social media platforms, such as Twitter, Facebook, MySpace,

    etc. have to say about the product.

    The benefits of Social Analytics

    Social Analytics is an outcome-based approach and one which creates visible

    Return on Investment (RoI).

    It helps organizations retain customers by addressing their concerns upfront, rather than being slaves to processes. The results of the analytics help organizations retain brand preference in a fickle consumer world.

    It improves customer service and brings down the cost of operations.

    It enables organizations to add new customers by understanding and addressing their requirements.

    Social Analytics helps companies keep an eye on their competition. With easy access to social media data, it is simple to track and counter the moves of competitors.

    It helps companies remain proactive. The turnaround time for gathering customer feedback is reduced drastically. Moreover, the reactions of customers and their subsequent actions can be predicted more accurately, enabling organizations to take appropriate measures.

    Social Media Analytics effectively converges on-site, social media and third party

    data to extract useful information. Considering these factors, and the fact that it

    enables enterprises to leverage the colossal data that is continuously generated

    through social media interactions, Social Media Analytics should be made an

    integral part of the marketing and research strategies of enterprises.

    Data sources that facilitate Social Media Analytics

    Data sources include internal data, such as the purchase history of customers,

    their transactions, and profiles in the enterprise database. It also encompasses

    website traffic analysis, covering internal CSR logs, customer queries,

    automated agent discussions, complaints and resolutions, and employee

    insights.

    Data sources can also be the social activities and profile updates of customers on public social media platforms such as Twitter, Facebook, MySpace, LinkedIn,

    etc.

    External data sources can additionally be used, and customers analyzed by

    factoring in industry sources of information and market research reports.

    Technical tenets of Social Media Analytics

    Here's a look at what Social Media Analytics entails and enables:

    Clustering: Clustering is about capturing and analyzing various comments,

    demands, and questions that customers share with like-minded friends and

    groups, over social media platforms. It helps identify the appropriate response

    and behavioral anomalies.

    Classification: Having captured data on the activities of customers and their

    comments, it is possible to perform natural language processing on it to evolve

    patterns. These patterns can then be categorized and understood for

    appropriate responses. Organizations can use Classification to address the

    concerns of customers and approach them with products and offerings that

    really meet their needs.

    Sequential classification: This enables organizations to identify the subsequent

    steps and actions that customers might take, based on their recent experiences.
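
    As a rough illustration of sequential classification, the sketch below counts which action tends to follow which, then predicts the most frequent follow-up. It is a minimal first-order transition model, not the method used in the paper; the action names and sessions are hypothetical sample data.

```python
from collections import Counter, defaultdict


def train_transitions(sessions):
    """Count how often each action is directly followed by each other action."""
    transitions = defaultdict(Counter)
    for actions in sessions:
        for current, following in zip(actions, actions[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, last_action):
    """Return the most frequent follow-up action, or None if none was observed."""
    followers = transitions.get(last_action)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical customer action sequences (illustrative data only).
sessions = [
    ["complaint", "support_call", "refund"],
    ["complaint", "support_call", "positive_review"],
    ["complaint", "support_call", "refund"],
    ["purchase", "positive_review"],
]
model = train_transitions(sessions)
print(predict_next(model, "support_call"))  # prints "refund"
```

    A production system would likely use richer features than the last action alone; the point here is only that recent experiences can be turned into a prediction of the next step.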

    Using Big Data technologies to enable Social Media Analytics

    One of the biggest challenges that organizations face with their social media

    data is its sheer volume.

    Existing Enterprise Data Warehouse (EDW) environments, designed decades

    ago, simply lack the ability to capture and process social media data within a

    reasonable time. Moreover, these traditional EDWs have limited capabilities

    when it comes to analyzing the behavioral data of users. They cannot help

    companies manage the complex, unstructured data generated by social media

    interactions, nor can they handle multimedia data.

    Using Big Data technologies is their best bet in this scenario. Big Data

    technologies can help organizations handle large volumes of complex,

    unstructured data from social sources, of the order of terabytes and petabytes,

    gain insights into customers and trends, store images and videos, and save

    hundreds of thousands of dollars per terabyte per year.

    Take the instance of a Big Data Social Analytics Platform which has to deal with

    information from various data sources such as Social Media sites and web 2.0

    enabled websites. The Platform can also pull historical bulk data lying around in

    existing systems using appropriate connectors.

    The connectors enable the conversion of the data from all kinds of data sources

    into a Hadoop-based data warehouse. After collecting this data, Apache

    Mahout, a scalable machine learning and data mining solution, can be used to

    categorize the data and store it in accordance with the categories for later use.

    It is also possible to run Map-Reduce jobs that use the Natural Language Toolkit

    (NLTK) to perform natural language processing of the comments and feedback

    from the social data sources.
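
    The Map-Reduce pattern mentioned above can be sketched on a single machine as follows. This is only an illustration of the map, shuffle and reduce phases: a simple regular-expression tokenizer stands in for NLTK, nothing here runs on Hadoop, and the sample comments are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a stand-in for an NLTK tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def map_phase(comment):
    """Map: emit a (token, 1) pair for every token in one comment."""
    return [(token, 1) for token in tokenize(comment)]


def shuffle(pairs):
    """Shuffle: group the emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one token."""
    return key, sum(values)


def run_job(comments):
    """Run map, shuffle and reduce in sequence over a list of comments."""
    pairs = [pair for comment in comments for pair in map_phase(comment)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical comments (illustrative data only).
comments = [
    "Love the gift card offer",
    "The free offer is great",
    "Gift card arrived late",
]
counts = run_job(comments)
print(counts["offer"], counts["gift"])  # prints "2 2"
```

    On a real cluster the map and reduce functions would run in parallel across nodes, with the framework handling the shuffle between them.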

    The aptly massaged and categorized data can then be used to draw graphs, and

    analyze market sentiment about a product. The data can be used for MIS and to

    compile regulatory reports that need to be produced on a regular basis, with

    Sqoop moving the prepared data into the databases that reporting tools read.

    Since the Big Data Social Analytics Platform is powered by Hadoop, it can scale linearly

    to thousands of nodes using commodity hardware. This spells a significant cost advantage for organizations in the long run.

    Since it is important for businesses to track down, and take advantage of

    opportunities quickly, this platform can enable them to react to the events as

    they happen.

    Building a Big Data warehouse

    In order to build a Big Data warehouse that extracts data from the sources

    discussed earlier and draws pertinent insights from it, organizations must begin

    by collecting social media data from various public social media platforms. The

    historical master data and transactional data about customers can be taken

    from existing systems. Sqoop can come in handy for pulling this data out of the

    relational database systems that are already in place.

    Text         User        Location   Source

    Gift card    TweetUser   USA, NY    Twitter

    Free offer   FaceUser    USA, GA    Facebook

    For natural language processing, NLTK is a good open source option.

    Data preparation/Mashups can be accomplished by running Map-Reduce jobs

    over the collected data and massaging it.

    Apache Mahout's k-means algorithm can be used for clustering, while its Naive

    Bayes algorithm can be used for classification and sentiment analysis, using the comments and tweets from social media data sources to identify patterns.

    The item-based similarity algorithm of Mahout can be used for collaborative

    filtering and recommendations. When the data is ready for analytical reporting

    and deep mining, Hive or Pig can be used.
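
    To make the clustering step concrete, here is a minimal single-machine k-means sketch. It is not Mahout's distributed implementation; it only shows the assign-then-recompute loop at the heart of the algorithm. The two-dimensional points are hypothetical numeric features (in practice, comments would first be turned into feature vectors), and `math.dist` assumes Python 3.8 or later.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [points[i] for i in range(k)]  # deterministic start: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # A centroid with no assigned points keeps its previous position.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical feature vectors (illustrative data only).
points = [(1.0, 1.0), (9.0, 8.0), (1.5, 2.0), (8.0, 9.0), (1.0, 0.5), (9.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(clusters[0])
print(clusters[1])
```

    Starting from the first k points keeps the example reproducible; real implementations usually pick initial centroids randomly or with a seeding heuristic.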

    A step-by-step approach to creating the Big Data EDW

    Step 1: The first step is to create and run training data through Mahout to help

    it understand how to classify social data feeds. Next, the feeds have to be

    collected from public social media platforms. This can be accomplished by

    performing keyword based searches and streaming in the result sets on a

    continuous basis. It is possible now to search on the basis of a brand name,

    product make and model, category, industry terminology, product segment,

    special offers and marketing buzzwords, using the various APIs offered by social

    media platforms. This classified data can then be dumped into an HBase-based

    data warehouse constantly and continuously.

    The data from existing systems can also be imported into the HBase-based Big

    Data warehouse. Online content can be crawled and dumped into the HBase

    database. Connectors are available for classification of online pages. Lucene

    and Solr are very suitable for this purpose.
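
    The keyword-based collection in Step 1 can be pictured as a filter over an incoming stream of posts. The sketch below assumes posts arrive as dictionaries with the Text, User, Location and Source fields from the sample table earlier; the third record and the keyword list are hypothetical, and no real social media API is called.

```python
def matches_keywords(text, keywords):
    """Return True if any tracked keyword appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def filter_stream(posts, keywords):
    """Lazily yield only the posts that mention a tracked keyword."""
    for post in posts:
        if matches_keywords(post["text"], keywords):
            yield post


# Illustrative records; the last one is invented to show a non-matching post.
posts = [
    {"text": "Gift card", "user": "TweetUser", "location": "USA, NY", "source": "Twitter"},
    {"text": "Free offer", "user": "FaceUser", "location": "USA, GA", "source": "Facebook"},
    {"text": "Nice weather today", "user": "OtherUser", "location": "USA, CA", "source": "Twitter"},
]
tracked = ["gift card", "free offer"]
matched = list(filter_stream(posts, tracked))
print([post["user"] for post in matched])  # prints "['TweetUser', 'FaceUser']"
```

    Because `filter_stream` is a generator, it can sit in front of a continuous feed and pass matching posts on to storage as they arrive.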

    Step 2: At this stage, quantitative analytics can be performed on the collected

    data. It is possible to draw comparisons between "total tweets" and "our

    product-specific tweets". This is accomplished by using Mahout algorithms over

    a Hadoop cluster. Organizations can also publish a daily trend watch. This may

    contain the total number of comments about the products of their

    competitors, versus the total number of comments about their own products.

    With customers increasingly using devices for connecting to social media

    platforms, it is now possible to perform location-based trend analysis.
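
    The quantitative comparisons described in Step 2 amount to simple counting once posts are collected. The following sketch computes a small trend watch: total posts, posts mentioning our products, posts mentioning competitors, and our mentions by location. The brand terms "acme" and "globex" and the sample posts are hypothetical; a real deployment would run the equivalent aggregation over the cluster.

```python
from collections import Counter


def trend_watch(posts, our_terms, competitor_terms):
    """Count total posts, posts mentioning us, and posts mentioning competitors."""

    def mentions(text, terms):
        # Terms are expected in lowercase; the text is lowercased before matching.
        lowered = text.lower()
        return any(term in lowered for term in terms)

    summary = Counter(total=len(posts))
    by_location = Counter()
    for post in posts:
        if mentions(post["text"], our_terms):
            summary["ours"] += 1
            by_location[post["location"]] += 1
        if mentions(post["text"], competitor_terms):
            summary["competitors"] += 1
    return summary, by_location


# Hypothetical posts (illustrative data only).
posts = [
    {"text": "Acme phone is great", "location": "USA, NY"},
    {"text": "Thinking about the Globex tablet", "location": "USA, GA"},
    {"text": "Acme support was slow", "location": "USA, NY"},
    {"text": "Lunch time", "location": "USA, GA"},
]
summary, by_location = trend_watch(posts, ["acme"], ["globex"])
print(summary["total"], summary["ours"], summary["competitors"])  # prints "4 2 1"
```

    The per-location counter is the starting point for the location-based trend analysis mentioned above.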

    Classification and clustering are performed using Mahout/NLTK processed

    data. Organizations can run the training data through Mahout/NLTK to build

    trained models. After that, it is possible to run the

    tweets and feeds from other social media platforms through the trained models, and

    have the tweets and comments classified. This provides a clear picture of the

    sentiments prevailing in the marketplace for the products of organizations as

    well as their competitors.
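
    The train-then-classify flow can be sketched with a small multinomial Naive Bayes classifier written from scratch. This is an illustration of the technique rather than Mahout's or NLTK's implementation; the training sentences and labels are hypothetical, and add-one (Laplace) smoothing is an assumption made here so that unseen words do not zero out a score.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_texts):
    """Collect per-label document counts, per-label word counts and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Pick the label with the highest log-probability, using add-one smoothing."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training data (illustrative only).
training = [
    ("love this phone great battery", "positive"),
    ("great camera love it", "positive"),
    ("terrible battery hate it", "negative"),
    ("hate the slow screen terrible", "negative"),
]
model = train(training)
print(classify(model, "great phone love"))       # prints "positive"
print(classify(model, "terrible slow battery"))  # prints "negative"
```

    With only four training sentences the model is a toy; the same two-step shape (build a model from labeled data, then score new tweets against it) is what matters.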

    Companies can come up with recommendations by running the data through

    Mahout. These recommendations can then be factored into future product

    design and rollouts.

    Step 3: This step is about using customer data to recommend new and related products. Once companies have data from their existing systems as well as

    social sources, they can prepare the mock customer data for Social ID mapping

    and run Item or User based recommendations on this data using Mahout.
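
    Item-based recommendation can be illustrated with cosine similarity between items, computed from which customers rated them. The sketch below is a simplified single-machine version, not Mahout's recommender; the customers, products and ratings are hypothetical mock data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors keyed by user."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings, user, top_n=2):
    """Score unseen items by their similarity to the items the user already rated."""
    by_item = {}
    for rater, items in ratings.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[rater] = rating
    seen = ratings[user]
    scores = {}
    for candidate in by_item:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(by_item[candidate], by_item[item]) * rating
            for item, rating in seen.items()
        )
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    # Drop items with no similarity to anything the user has rated.
    return [item for item in ranked if scores[item] > 0][:top_n]


# Hypothetical mock customer ratings (illustrative data only).
ratings = {
    "alice": {"phone": 5, "case": 4},
    "bob": {"phone": 4, "case": 5, "charger": 4},
    "carol": {"tablet": 5, "stylus": 4},
    "dave": {"phone": 5},
}
print(recommend(ratings, "dave"))  # prints "['case', 'charger']"
```

    A user-based variant would compare customers to each other instead of items; the text leaves either choice open.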

    At this stage, it is possible to produce Analytical Reports on data generated by

    Mahout. This can be accomplished by generating reports using a traditional

    Reporting product or framework. The nicely sliced and diced reporting data can

    be dumped into a MySQL database or some other SQL database, with the help

    of Sqoop. This SQL database can be used to meet the regular downstream

    reporting requirements of organizations. This will enable them to use their

    existing investments in reporting tools as well as provide the drill down reports

    for use by the management and Sales and Marketing departments.
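
    Once the aggregated results sit in a SQL database, existing reporting tools can query them directly. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for MySQL; the Sqoop export itself is not shown, and the table name, column names and figures are hypothetical.

```python
import sqlite3


def load_report_rows(connection, rows):
    """Create the reporting table if needed and load aggregated sentiment rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sentiment_report ("
        "product TEXT, sentiment TEXT, mentions INTEGER)"
    )
    connection.executemany(
        "INSERT INTO sentiment_report (product, sentiment, mentions) VALUES (?, ?, ?)",
        rows,
    )
    connection.commit()


def drill_down(connection, product):
    """Return (sentiment, total mentions) for one product, largest count first."""
    cursor = connection.execute(
        "SELECT sentiment, SUM(mentions) FROM sentiment_report "
        "WHERE product = ? GROUP BY sentiment ORDER BY SUM(mentions) DESC",
        (product,),
    )
    return cursor.fetchall()


# In-memory database and made-up figures (illustrative only).
connection = sqlite3.connect(":memory:")
rows = [
    ("phone", "positive", 120),
    ("phone", "negative", 45),
    ("tablet", "positive", 30),
    ("phone", "positive", 10),
]
load_report_rows(connection, rows)
report = drill_down(connection, "phone")
print(report)  # prints "[('positive', 130), ('negative', 45)]"
```

    Keeping the reporting layer in plain SQL is what lets organizations reuse the reporting tools they already own.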

    Alongside social media, this Big Data Media Analytics platform can be used to

    address other large data analytics requirements. The platform can give

    companies a head start in putting together the pieces of their Big Data strategy

    and provide them with an asymmetric advantage over competition.

    Summary

    Traditional Enterprise Data Warehouses do not have the ability to keep up with

    rapidly increasing social media data. The need of the hour is to effectively

    strategize and build a Big Data Analytics Platform to manage, store and derive

    insights from this digital data.

    Any single vendor technology may not be sufficient to undertake this task, and it

    is recommended that organizations go for Open Source options to build a Social

    Media Analytics Platform using Big Data technologies. The fact is that the

    success of a Big Data platform depends entirely on the tools that are used.

    Organizations, therefore, need to use discretion and select the most appropriate

    tools from the available options. Companies can also re-use existing EDW

    investments for their Big Data Analytics Platform.

    Disclaimers

    The information contained in this document is the proprietary and exclusive property of Impetus Technologies Inc. except as otherwise indicated. No part of

    this document, in whole or in part, may be reproduced, stored, transmitted, or used for design purposes without the prior written permission of Impetus

    Technologies Inc.

    About Impetus

    Impetus Technologies offers Product Engineering and Technology R&D services for software product development.

    With ongoing investments in research and application of emerging technology areas, innovative business models, and

    an agile approach, we partner with our client base comprising large scale ISVs and technology innovators to deliver

    cutting-edge software products. Our expertise spans the domains of Big Data, SaaS, Cloud Computing, Mobility

    Solutions, Test Engineering, Performance Engineering, and Social Media among others.

    Impetus Technologies, Inc.

    5300 Stevens Creek Boulevard, Suite 450, San Jose, CA 95129, USA

    Tel: 408.252.7111 | Email: [email protected]

    Regional Development Centers - INDIA: New Delhi, Bangalore, Indore, Hyderabad

    To know more visit: www.impetus.com