talend studio.docx

Upload: divya-gupta

Post on 02-Jun-2018

    TALEND STUDIO

    US India IM Technology Updates

    Deloitte Page 1

    TALEND STUDIO

    January 2014

    Author: US India IM Technology Updates

    Deloitte Consulting LLP

    Table of Contents

    1. Introduction to Talend Studio
    1.1 What is Talend Studio?
    1.2 The Open Source Approach
    1.3 Talend Open Studio and Its Products
    2. Big Data
    2.1 How does Big Data fit into Talend Studio?
    2.2 Features/Benefits
    2.3 Unique Challenges
    2.4 Surveys on Big Data Benefits and Challenges
    2.5 Talend and Big Data
    3. Data Integration
    3.1 Introduction
    3.2 Product Description
    3.3 Features
    3.4 Benefits
    4. Data Quality
    4.1 Introduction
    4.2 Features
    4.3 Working Principles of Data Quality
    4.4 Benefits
    4.5 Root Causes of Data Quality Problems
    5. Master Data Management (MDM)
    5.1 Introduction
    5.2 Talend MDM Functional Architecture
    5.3 Features
    5.4 Advantages
    6. Conclusion

    1. Introduction to Talend Studio

    1.1 What is Talend Studio?

    Talend provides data, application and business process integration solutions that enable

    organizations to effectively leverage all of their information assets. Talend unites integration

    projects and technologies to accelerate the time-to-value for the business.

    Talend's flexible architecture easily adapts to future IT platforms, including big data environments.

    Talend's unified solutions include Data Integration, Data Quality, Master Data Management,

    Enterprise Service Bus and Business Process Management. Talend's platform is built around a

    common set of easy-to-use tools implemented across all products to maximize the skills of

    integration teams.

    Talend offers a flexible open source based platform, unlike traditional vendors offering closed

    and disjointed solutions, supported by a predictable and scalable value-based subscription

    model.

    Talend Open Studio for Data Integration can be leveraged by an organization for:

    • synchronization or replication of databases
    • right-time or batch exchanges of data
    • ETL (Extract/Transform/Load) for analytics
    • data migration
    • complex data transformation and loading
    • data quality exercises
    • big data

    1.2 The Open Source Approach

    Fueled by an open source approach and freely available downloads, Talend's continually

    expanding base of adopters fosters the growth of a vigorous community that benefits users

    of the open source versions and commercial customers alike.

    By publishing the code of its core modules under the GNU General Public License and the Apache

    License, Talend offers the community the flexibility to modify and extend source code to meet

    their specific business needs. This enables community members to create their own components and

    share them with the rest of the community, making the products more versatile and more

    scalable for different uses and projects.

    Talend also receives the voluntary help of thousands of community members for testing,

    translation and improvements, resulting in faster and more cost-effective product

    development as well as an innovation advantage over traditional alternatives.

    Talend's team members are contributors to key open source projects, and the company is a

    sponsor or member of several open source foundations and consortiums, including the Apache

    Software Foundation and the Eclipse Foundation.

    1.3 Talend Open Studio and Its Products

    Talend Open Studio is a powerful and versatile set of open source products for developing,

    testing, deploying and administering data management and application integration projects.

    The products discussed in the following sections are:

    1. Big Data

    2. Data Integration

    3. Data Quality

    4. MDM

    2. Big Data

    2.1 How does Big Data fit into Talend Studio?

    Talend provides an easy-to-use graphical environment that allows developers to visually map

    big data sources and targets without the need to learn and write complicated code. Running

    100% natively on Hadoop, Talend Big Data provides massive scalability. Once a big data

    connection is configured, the underlying code is automatically generated and can be deployed

    remotely as a job that runs natively on your big data cluster - HDFS, Pig, HCatalog, HBase,

    Sqoop or Hive.

    Talend's big data components have been tested and certified to work with leading big data

    Hadoop distributions, including Amazon EMR, Cloudera, IBM PureData, Hortonworks, MapR,

    Pivotal Greenplum, Pivotal HD, and SAP HANA. Talend provides out-of-the-box support for

    big data platforms from the leading appliance vendors including

    Greenplum/Pivotal, Netezza, Teradata, and Vertica.

    Talend provides two big data integration solutions to address all needs: Talend Open Studio for

    Big Data is a free, open source development tool and Talend Platform for Big Data adds data

    quality, advanced deployment and management functions across the enterprise.

    2.2 Features/Benefits

    1. Integrating Disparate Data Sources

    One of the most prevalent business pressures driving Big Data investment is having too

    many data silos. In many companies, important business data is spread out

    over a large number of locations, from databases to file stores to collaborative web portals to

    multiple versions of enterprise applications like ERP or CRM systems. Big Data companies had

    on average 20 unique internal data sources that stored data necessary for operations or

    analysis.

    Given the complexity inherent in accessing data from so many sources, an important piece of a

    Big Data foundation is the ability to easily move data from one source to another. Three

    quarters (74%) of Leaders in Big Data had ETL tools (extract, transform, load) to move their

    data, and they were 1.6 times more likely than Followers to be able to integrate data in real

    time.

    2. Processing Data at High Speed with In-memory

    The inability to deliver information as quickly as business users need it is the most common

    business pressure faced. Efficiently delivering data and analysis on business events as they

    occur, and making informed decisions on this information, are marks of an agile, data-driven

    organization. The demand for data is growing, and business users are asking for it faster than

    ever. Forty-seven percent (47%) of Big Data organizations need insight within an hour of a

    business event occurring, and more than a third (35%) need it in near real time.

    One of the most effective technologies for processing data at high speed is in-memory

    computing. These solutions load the target data directly into the random access memory (RAM)

    of a server or desktop, very close to the processor itself. This eliminates the need to connect to

    a storage array or disk, locate the desired information, and transfer it over a network to the

    server doing the processing. Without these potential bottlenecks, the full power of the

    processors can be directly used to access and manipulate the desired information.

    There are several in-memory solutions aimed almost exclusively at very large global

    enterprises. However, the basic concept behind the technology can translate from advanced

    next-generation servers to machines as simple as a standard notebook. Given the steadily

    falling cost of RAM, most commodity servers have dozens or hundreds of gigabytes of

    memory available, opening the door for low-cost solutions.

    3. Handling the Unstructured Data Problem

    One of the major strengths of Big Data initiatives is the ability to collect, manage, and analyze

    not just structured data from relational databases, but also unstructured or semi-structured data

    from documents, emails, social media feeds, images, video, and rich media. A surprising

    amount of business data resides in these unstructured formats.

    There are a number of different database management systems being designed to leverage

    unstructured information. On a broad level, most are referred to as NoSQL databases,

    commonly interpreted as "not only SQL". This departure from traditional

    relational database management systems (RDBMS) gives more flexibility to the data formats

    being stored, managed, and accessed, and lets these systems serve as a primary repository of

    data to feed into analytic platforms.

    One of the most exciting developments in the world of unstructured data analysis is Apache

    Hadoop, an open-source framework that couples the flexibility of storing and managing

    unstructured data with high-powered processing capabilities. Based on Google's

    MapReduce programming model for breaking large, complex problems into small bite-sized

    chunks, Hadoop can utilize the processing power of clusters of ordinary servers to tackle tasks

    far larger than any single machine could accomplish. Given the fact that the code is open

    source, and can run on commodity hardware, it offers a powerful, cost-effective Big Data option.
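
    The MapReduce idea described above can be sketched in a few lines of Python. This is a minimal, single-process illustration of the map, shuffle and reduce steps, assuming a simple word-count problem; it is not Hadoop code, and the function names and sample documents are hypothetical. On a real cluster, each step would be distributed across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # Each document is mapped independently, which is what makes the
    # problem easy to split into small chunks across many servers.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data big clusters", "data on clusters"])
print(counts)
```

    Because every document is mapped on its own and every key is reduced on its own, both steps can in principle run in parallel, which is the property the paragraph above attributes to Hadoop.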

    4. Comprehensive corporate data analysis

    The benefits of taking the Big Data approach are not limited to better and more

    comprehensive corporate data analysis that takes in the totality of enterprise data, not just a

    fraction of it. Timeliness of querying could also be radically improved. Complex queries that in

    the past took hours to schedule and execute could in future be set up and run in seconds.

    5. Decision making, planning and execution

    There is also a real pay-off that can extend right across the enterprise. Just as big

    data can enhance decision making at the corporate level by integrating crucial unstructured data

    into BI and other enterprise data systems, it also promises to enhance decision making,

    planning and execution at the departmental and workgroup levels.

    6. Highlights errors and misinformation on the fly

    At the local level, most project management, planning and delivery is based on unstructured

    data files. These files often contain errors, questionable assumptions and misinformation. Big

    data systems can highlight and spot-check such errors and misinformation on the fly, greatly

    improving the timeliness and efficacy of local programs.

    2.3 Unique Challenges

    1. Limited Big Data Resources

    The majority of architects and developers who understand big data are working for the original

    creators of big data technologies, companies such as Facebook, Google, and Yahoo, to name a few.

    Others are employed by the numerous startups in this space, such as Hortonworks, Cloudera and

    MapR. The technology is still complex to learn, which restricts the rate at which new big

    data resources become available.

    2. Poor Data Quality + Big Data = Big Problems

    Bad data quality can have a big impact on effectiveness. Inconsistent or invalid data could have

    an exponential impact on analysis in the big data world. As analysis on big data grows, so too

    will the need for validation, standardization, enrichment and resolution of data. Even

    identification of linkages can be considered a data quality issue that needs to be resolved for big

    data.

    3. Setting up the system

    Setting up and running big data systems can pose significant skills and knowledge challenges

    as big data presents a very different paradigm to conventional enterprise relational database

    systems. Programming skills required for big data are also often quite different, centering

    around languages such as Perl and statistical analysis languages such as R

    and ECL. Expertise in areas such as multi-tiered SQL constructions is replaced by a need to

    understand efficient regex constructions and Levenshtein algorithms.
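
    To give a sense of the skills mentioned above, the following is a minimal Python sketch of the Levenshtein edit distance, which counts the single-character insertions, deletions and substitutions needed to turn one string into another. It uses the standard two-row dynamic programming formulation; the function name and the sample strings are chosen here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from the first i characters of a to ""
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("Jon Smith", "John Smith"))
```

    A small distance between two values, such as the two spellings of a name above, is one common signal that records may refer to the same real-world entity.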

    2.4 Surveys on Big Data Benefits and Challenges

    What would you identify as potentially the main benefits of integrating disparate data via

    big data file systems?

    What do you see as the challenges to implementing big data in your organization?

    2.5 Talend and Big Data

    Talend's open source approach and flexible integration platform for big data enable users to

    easily connect and analyze data from different systems to help drive and improve business

    performance. Talend's big data capabilities integrate with today's big data market leaders such

    as Cloudera, Hortonworks, Google, EMC/Greenplum, MapR, Netezza, Teradata and Vertica,

    positioning Talend as a leader in the management of big data. Talend's goal is to democratize

    the big data market just as it has with data integration, data quality, master data management,

    application integration and business process management.

    Talend offers three big data products:

    1. Talend Open Studio for Big Data

    2. Talend Enterprise Big Data

    3. Talend Platform for Big Data

    Talend Open Studio for Big Data

    Talend Open Studio for Big Data is a free open source development tool that packages our big

    data components for Hadoop, HBase, Hive, HCatalog, Oozie, Sqoop and Pig with our base

    Talend Open Studio for Data Integration. It was released into the community under the Apache

    license. It also allows you to bridge the old with the new as it includes hundreds of components

    for existing systems like SAP, Oracle, DB2, Teradata and many others.

    Talend Enterprise Big Data

    Talend Enterprise Big Data extends the Talend Open Studio for Big Data product with

    professional-grade technical support and enterprise-class features. An organization will upgrade

    to this version to take advantage of advanced collaboration, monitoring and project

    management features.

    Talend Platform for Big Data

    The Talend Platform for Big Data addresses the challenges of big data integration, data quality

    and big data governance, simplifying the loading, extraction and processing of large and diverse

    data sets so you can make more informed and timely decisions. Data quality components allow

    you to do big data profiling, cleansing and matching using a massively parallel environment

    such as Hadoop. Advanced clustering features allow you to integrate at any scale.

    Delivered on top of the Talend unified platform, Talend Platform for Big Data improves

    productivity across data management domains by sharing a common code repository and

    tooling for scheduling, metadata management, data processing and service enablement.

    3. Data Integration

    3.1 Introduction

    Talend Open Studio for Data Integration is an open source data integration product

    developed by Talend and designed to combine, convert and update data in various locations

    across a business.

    It was launched in October 2006 under its previous name, Talend Open Studio, and is

    distributed under GPLv2. It had been downloaded over 1 million times as of January 2008. The

    product totaled 20 million downloads and over 3,500 clients as of January 2012.

    Talend also provides Talend Enterprise Data Integration, a commercial extension to Talend

    Open Studio for Data Integration with additional features, technical support and IP

    indemnification.

    The product provides an extensible, highly efficient, open source set of tools to access,

    transform and integrate data from any business system in real time or batch to meet both

    operational and analytical data integration needs. With 450+ connectors, it integrates almost

    any data source. The broad range of use cases addressed includes massive scale integration

    (big data/NoSQL), ETL for business intelligence and data warehousing, data synchronization,

    data migration, data sharing, and data services.

    3.2 Product Description

    Talend Open Studio for Data Integration operates as a code generator. It produces data-

    transformation scripts and underlying programs in Java. Its GUI gives access to a metadata

    repository and to a graphical designer. The metadata repository contains the definitions and

    configuration for each job - but not the actual data being transformed or moved. The information

    in the metadata repository is used by all of the components of Talend Open Studio for Data

    Integration.

    The product is based on Eclipse RCP. Most of its contributors work for commercial open-source

    vendor Talend.

    Users design individual jobs from a set of over 400 graphical components for

    transformation, connectivity, or other operations. The jobs created can be executed from within

    the studio or as standalone scripts.
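
    As an illustration only, the kind of standalone program that a simple job corresponds to can be sketched as three chained steps: an input component, a transformation component and an output component. The sketch below is written in Python for brevity, whereas the Studio itself generates Java, and it is not the code Talend produces. The sample CSV content, the customer table and all function names are assumptions made for this example.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a delimited input file.
RAW_CSV = "id,name,country\n1, alice ,in\n2,Bob,us\n"


def extract(text):
    """Input component: read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation component: trim names and upper-case country codes."""
    return [
        (int(row["id"]), row["name"].strip().title(), row["country"].strip().upper())
        for row in rows
    ]


def load(rows, connection):
    """Output component: write the transformed rows to a target table."""
    connection.execute("CREATE TABLE customer (id INTEGER, name TEXT, country TEXT)")
    connection.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    connection.commit()


# The "job" is simply the three components wired together in order.
connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
loaded = connection.execute(
    "SELECT id, name, country FROM customer ORDER BY id"
).fetchall()
print(loaded)
```

    Note that the job definition (which components run, in which order, with which settings) is separate from the data flowing through it, which mirrors the distinction drawn above between the metadata repository and the data being transformed.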

    3.3 Features

    1. A Comprehensive Solution

    Talend provides a Business Modeler, a visual tool for designing the business logic of an

    application, and a Job Designer, a visual tool for functional diagramming that lays out data

    development and flow sequencing using components and connectors. It also provides a

    Metadata Manager, for storing and managing all project metadata, including contextual data

    such as database connection details and file paths.

    2. Broad Connectivity to All Systems

    Talend connects natively to databases, packaged applications (ERP, CRM, etc.),

    SaaS and Cloud applications, mainframes, files, Web services, data warehouses, data marts,

    and OLAP applications. It offers built-in advanced components for ETL including string

    manipulators, Slowly Changing Dimensions, automatic lookup handling and bulk loading. Direct

    integration is provided with data quality, data matching, MDM and related functions. Talend

    connects to popular cloud apps including Salesforce.com and SugarCRM.

    3. Teamwork and Collaboration

    The shared repository consolidates all project information and enterprise metadata in a

    centralized repository. This repository is shared by all stakeholders: business users, job

    developers, and IT operations staff. Developers can easily version jobs, with the ability to roll

    back to a prior version.

    4. Advanced Management and Monitoring

    Talend includes powerful testing, debugging, management and tuning

    features with real-time tracking of data execution statistics and an advanced trace mode. The

    product incorporates tools for managing the simplest jobs to the most complex ones, from single

    jobs to thousands of jobs. Processes can be deployed across enterprise and grid systems as

    data services using the export tool.

    Talend Open Studio for Data Integration is a free, open source development tool; Talend Enterprise Data Integration adds teamwork

    and management functions; Talend Platform for Data Management adds data quality and

    clustering. Below is the comparison of all three:

    3.4 Benefits

    1. Optimized Time and Cost

    Talend's solutions are 50% to 80% cheaper than equivalent proprietary solutions offered in the

    market. They are also less expensive to deploy, maintain, and support. In addition, they

    facilitate faster development and production as compared to proprietary tools and hand-coding.

    2. Functionality, Performance, Reliability

    Far from the stereotypes concerning the lack of professionalism surrounding open source,

    Talend's solutions are real business tools, offering a level of functionality equal to that of

    proprietary vendors. And being open source doesn't mean that a solution was developed by

    volunteers in their spare time. Talend has its own R&D teams and, as discussed above, its

    solutions are enriched by contributions from the community. Although Talend controls the

    product roadmap, the company continually listens to the opinion and needs of its community

    and its customers to help effect change. Talend is committed to providing powerful, reliable

    solutions that attract many users.

    3. Universality, Versatility for all Projects

    Talend's data integration solutions are not limited to standard ETL (Extract-Transform-Load)

    functionality for Business Intelligence, but can also be used for operational data integration

    projects; typically these are still done manually, but can benefit from a data integration

    environment.

    4. The Broadest Connectivity

    With more than 400 connectors, Talend's solutions provide virtually unlimited connectivity to

    enterprise systems: databases, software packages, mainframes, files, Web Services, etc. No

    other solution available on the market today offers so many connectors.

    5. Enterprise-Grade Support

    Contrary to popular belief, commercial open source vendors provide real support services,

    similar in type and quality to those offered by proprietary vendors. This is the case with Talend,

    whose services are designed to facilitate teamwork and increase productivity. These services

    are delivered by Talend experts, or by Talend-certified partners, and offer the same guarantees as

    the largest proprietary vendors: service level agreements (SLAs), guaranteed response times,

    etc.

    6. The Strength of a Community

    Where there is open source, there is also a community. Open source users benefit from

    the strength of this community, both in terms of support and product development.

    7. Stable & Predictable Pricing Model

    Proprietary vendors charge a "data tax" that increases the cost of processing additional

    data: adding servers, data sources/targets, or even transitioning to multi-core CPUs requires

    the purchase of additional licenses. Thus, infrastructure costs are not predictable and

    companies can't determine when they will reach their limits. With Talend, the cost of the solution

    is based on the number of developers of data integration processes. You can access new data

    as needed. For instance, when setting up a new application or acquiring a new business

    (operations that are sometimes hard to predict in advance), you don't need to buy additional licenses.

    Moreover, if a company is moving from development mode to maintenance mode, it doesn't

    have to keep all its licenses.

    8. Fast Learning Curve

    Talend tools are user-friendly and very easy to handle. The graphical user interface is intuitive

    and doesn't require formal training. Talend's Job Designer provides both a graphical and a

    functional view of the actual integration processes using a graphical palette of components and

    connectors (the Component Library). Integration processes are built by simply dragging and

    dropping components and connectors onto the workspace, drawing connections and

    relationships between them, and setting their properties (most properties are inherited from the

    metadata). Talend's Business Modeler leverages a top-down approach, allowing line-of-business

    stakeholders to get involved in the design of the integration processes and to monitor

    development progress.

    9. No Barrier to Adoption

    Implementing Talend Open Studio is quick and easy: just download the latest version from

    Talend's website and install it.

    The product is free, which means that you don't need to justify it to management or start a

    formal procurement process before solving your integration issues. You won't spend any time

    on administration tasks and won't need any vendor face time. You can use the product in an

    unlimited mode and, of course, keep it as long as you like.

    10. Market-Ready Products, Not Evaluation Versions

    Talend Open Studio is a complete product comprising many features and a wide range of

    connectors. Talend's flagship product, it is the most open, innovative and powerful data

    integration solution available today. Talend Open Studio isn't a lightweight product or trialware.

    It contains all the features required for building powerful data integration processes, and is freely

    downloadable and usable under the GPL v2 license.

    Talend Integration Suite is an enhanced version of Talend Open Studio, providing additional,

    enterprise-level functionality (collaboration, automated deployment, load balancing, monitoring)

    for enterprise-grade projects. The solution includes high-level technical support to respond to

    corporate issues and legal guarantees of intellectual property protection (IP indemnification).

    4. Data Quality

    4.1 Introduction

    Talend Open Studio for Data Quality is open source software that helps companies to

    assess the quality of data contained in their databases and business applications, and to decide

    which actions must be taken to correct erroneous or incomplete data.

    Talend Open Studio for Data Quality was launched in June 2008 under its previous name,

    Talend Open Profiler, and is distributed under LGPL. Talend also provides Talend Enterprise

    Data Quality, integrated in Talend Enterprise Data Integration, which is a commercial extension

    to Talend Open Studio for Data Quality with additional features.

    This data profiling tool allows business users to define a set of indicators for each data

    element that needs to be analyzed or monitored. It produces sophisticated reports and graphs

    that let users analyze the level of quality of the data.

    The following profiling needs are addressed by Talend Open Studio for Data Quality:

    A. Metadata discovery, which identifies the structure of the databases that need to be

    analyzed.

    B. Statistics definition, which defines the statistics and metrics that need to be measured on

    each data item.

    C. Results and graphs, which make it easy to view the results and assess the level

    of quality of the data.
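
    The statistics step listed above can be illustrated with a small Python sketch that computes a few common profiling measures for a single column. The choice of measures (row count, null count, distinct count, duplicate count, most common value), the function name and the sample values are assumptions made for this example and do not represent the product's own set of statistics.

```python
from collections import Counter


def profile_column(values):
    """Compute basic profiling statistics for one column of values."""
    # Treat None and blank strings as missing values.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    frequencies = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(frequencies),
        # Every occurrence of a value beyond its first counts as a duplicate.
        "duplicate_count": sum(c - 1 for c in frequencies.values() if c > 1),
        "most_common": frequencies.most_common(1)[0][0] if frequencies else None,
    }


emails = ["a@example.com", "b@example.com", None, "a@example.com", ""]
report = profile_column(emails)
print(report)
```

    A report like this is the raw material for the results and graphs step: a high null count or duplicate count on a column points to where correction effort should be focused.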

    4.2 Features

    1. A Complete Solution

    Talend provides a consummate data quality solution with built-in data connectivity, profiling,

    cleansing, matching and monitoring to address all your data quality and data governance needs.

    Data quality capabilities can be scaled to handle anything from flat text files to enterprise data to

    Hadoop. Talend is able to leverage the best capabilities of the platform to provide data quality seamlessly across many data types and over any data volume.

    2. Data Profiling

    Data profiling is all about understanding data completely, and making sure it conforms to

    company and industry standards. With Talend, users can profile and analyze data, then create

    and share web-based reports on the quality of the data. With this information you can build team


    alignment on the use of data and highlight areas for improvement. Talend provides pre-defined

    tests to ensure data quality is fit-for-use within your enterprise application, or you can define

    your own.

    3. Data Standardization and Enrichment

    The secret behind Talend's data standardization and enrichment capabilities is built-in data

    integration and powerful parsing technology. Use the integrated parsing technology to assign

    structure to data that has none. Then achieve data quality enhancement and enrichment by

    using free reference data. Talend provides ways to integrate most external reference data

    sources for postal validation, business identification and credit score information, to name just a

    few.
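
    As a rough illustration of assigning structure to unstructured data, the sketch below parses a free-text address into named fields and standardizes the state code. The pattern `ADDRESS_PATTERN` and the function `parse_address` are hypothetical and cover only one simple US-style layout; they do not represent Talend's parsing technology.

```python
import re

# Hypothetical pattern for one US-style layout: "street, city, ST 12345".
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street>[^,]+?)\s*,\s*(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})\s*$"
)

def parse_address(raw):
    """Split a free-text address into named fields, or return None if it does not fit."""
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        return None
    fields = match.groupdict()
    fields["state"] = fields["state"].upper()  # standardize the state code
    return fields

print(parse_address("123 Main St ,  Springfield, il 62704"))
print(parse_address("no structure here"))
```

    Records that fail to parse return None, so they can be routed to manual review rather than silently loaded.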

    4. Data Matching and Survivorship

    Talend provides a variety of data matching solutions that move the editing of overly complex, green-screen match rules to real-world business users. Users can configure matching within the Talend user environment instead of heavily editing rules files and using the multiple GUIs that are associated with most data quality tools. Create what-if analyses when modifying matching techniques, with charts and graphs for key matching metrics.
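
    The idea of a what-if analysis on matching rules can be sketched with Python's standard `difflib` module. This is only an approximation of the concept: the similarity measure, the thresholds and the sample names are assumptions, and Talend's own matching algorithms are not shown here.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Return a 0..1 similarity score, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def find_matches(names, threshold):
    """Return index pairs whose similarity is at or above the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(names), 2)
        if similarity(a, b) >= threshold
    ]

# Illustrative sample data, not taken from the source document.
customers = ["Acme Corporation", "ACME Corp.", "Globex Inc", "Acme Corporation "]

# What-if analysis: see how the candidate matches change with the threshold.
for threshold in (0.6, 0.8, 0.95):
    print(threshold, find_matches(customers, threshold))
```

    Lowering the threshold surfaces more probable duplicates at the cost of more false positives, which is the trade-off a what-if view is meant to expose.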

    Talend incorporates data quality into several products including Talend Open Studio for Data

    Quality and Talend Platform for Data Management.


    4.3 Working Principles of Data Quality

    Profiling data using the studio involves the following steps:

    1. Connect to a data source, such as a database, a Master Data Management (MDM) server, or delimited or Excel files, in order to access the tables and columns on which you want to define and execute analyses.

    2. Define any of the available data quality analyses, including database content analysis, column analysis, table analysis, redundancy analysis, correlation analysis, etc. These analyses carry out data profiling processes that define the content, structure and quality of highly intricate data structures.

    3. Generate reports from the different analyses and store them in a database. These reports allow you to compare current and historical statistics to determine whether the data has improved or degraded.

    4. Access different analytical tools that allow you to explore and monitor the reports generated in the studio.
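
    The four steps above can be walked through in a small, self-contained sketch. It uses an in-memory SQLite database in place of a real data source, and the table `dq_report` and all function names are hypothetical; the studio performs these steps through its graphical interface rather than through code like this.

```python
import sqlite3

def run_column_analysis(conn, table, column):
    """Step 2: compute row, null and distinct counts for one column."""
    # Table and column names are trusted here; never interpolate untrusted input.
    row = conn.execute(
        f"SELECT COUNT(*), COUNT(*) - COUNT({column}), COUNT(DISTINCT {column}) "
        f"FROM {table}"
    ).fetchone()
    return {"rows": row[0], "nulls": row[1], "distinct": row[2]}

def store_report(conn, table, column, stats):
    """Step 3: persist the analysis result so later runs can be compared."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dq_report ("
        "run_id INTEGER PRIMARY KEY AUTOINCREMENT, tbl TEXT, col TEXT, "
        "row_count INTEGER, null_count INTEGER, distinct_count INTEGER)"
    )
    conn.execute(
        "INSERT INTO dq_report (tbl, col, row_count, null_count, distinct_count) "
        "VALUES (?, ?, ?, ?, ?)",
        (table, column, stats["rows"], stats["nulls"], stats["distinct"]),
    )

def null_rate_history(conn, table, column):
    """Step 4: list the null rate of each stored run, oldest first."""
    rows = conn.execute(
        "SELECT null_count, row_count FROM dq_report "
        "WHERE tbl = ? AND col = ? ORDER BY run_id",
        (table, column),
    ).fetchall()
    return [nulls / total if total else 0.0 for nulls, total in rows]

# Step 1: connect to a data source (an in-memory database stands in for a real one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
)
store_report(conn, "customer", "email", run_column_analysis(conn, "customer", "email"))

# Simulate a cleanup, then profile again to see whether quality improved.
conn.execute("UPDATE customer SET email = 'b@example.com' WHERE id = 2")
store_report(conn, "customer", "email", run_column_analysis(conn, "customer", "email"))

print(null_rate_history(conn, "customer", "email"))
```

    Storing each run as a row is what makes the historical comparison in step 3 possible: a falling null rate across runs indicates improvement, a rising one indicates degradation.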

    4.4 Benefits

    1. Improves performance and provides a Big Data alternative for data quality.

    2. Improves data quality, which directly impacts business analysis and decision making within an organization.

    3. Gets developers up to speed quickly and teaches new techniques that can be applied directly to real-world projects.

    4. Reduces testing time and increases accuracy, improving overall data quality and analysis.

    5. Better data means better execution, which means greater cost effectiveness when it comes to leveraging the CRM system.

    4.5 Root Causes of Data Quality Problems

    Data quality problems are easy to recognize. These problems can undermine your organization's ability to work efficiently and comply with government regulations. The specific technical problems include missing data, misfielded attributes, duplicate records and broken data models, to name just a few.

    But rather than merely patching up bad data, the best strategy for fighting data quality issues is to understand the root causes and put new processes in place to prevent them.


    7. Defining Data Quality

    More and more organizations recognize the need for data quality, but there are different ways to clean data and improve data quality. You can:

    Write some code and cleanse manually

    Handle data quality within the source application

    Buy tools to cleanse data

    However, consider what happens when you have two or more of these types of data quality processes adjusting the data. This can introduce anomalies into the data.


    5. Master Data Management (MDM)

    5.1 Introduction

    Mastering data has been a goal for as long as there have been disparate, heterogeneous data

    sources. Until today, access to the necessary tools to realize this goal has been cost-prohibitive.

    Additionally, homegrown development of a proper master data solution often proves too

    complex and difficult to evolve and maintain. Master Data Management (MDM) has proven to be

    extremely valuable, but only as an esoteric solution restricted to elite or large organizations with

    huge resources. Open source MDM reduces implementation complexity, time-to-value, and

    cost. In fact, open source in any market helps organizations overcome these obstacles and

    realize their goals.

    Talend MDM provides the technology to create a unified view of information and manage that master view over time. Talend simplifies MDM with a flexible and open approach to master data

    projects, and combines complete functionality for data integration, data quality, data profiling,

    data mastering, and data governance. Talend MDM provides collaborative workflow enabling

    teams to build and enforce data governance policies. It provides a system of record and

    ensures that master data stays clean and is made available to those who need it.

    Talend's MDM solutions extend the core Talend competencies of integration and quality with

    functions that rationalize, master, and perform data stewardship. In fact, Talend's approach

    to MDM incorporates these core competencies into the functional definition of Master Data

    Management. This is a natural extension of an already successful product offering.

    High quality master data is extremely important for enterprise business processes and analytics.

    However, data resides in disparate systems across an organization, is rarely in a standard

    format and is often found in varying quality levels. Open source is a positive force that can

    galvanize the MDM community to strengthen the definition of its function and purpose while

    extending the reach and availability of a solution. Users now have the freedom to master their

    data.

    5.2 Talend MDM Functional Architecture

    Talend MDM architecture can be broken down into functional blocks that enable interaction

    between users and the MDM Hub and their corresponding IT needs.


    The main functional blocks are:

    1. The Integration block, where data integration can be carried out regardless of process complexity and data volumes.

    2. The Profiling block, used for Data Quality, where data sources are profiled and cleansed before being loaded into the MDM Hub.

    3. The MDM model, where the master data entities of the organization are defined and managed.

    4. The MDM hub, where the data associated with the data models and user roles are stored in XML format.

    Talend Studio tightly couples the above four blocks to provide processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing data throughout the organization.
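
    To make the idea of a hub that stores master data in XML format more concrete, the following sketch serializes one record to XML and reads it back. The entity name `Customer`, its fields and both function names are hypothetical; the actual schema used by the Talend MDM Hub is not described in this document.

```python
import xml.etree.ElementTree as ET

def to_master_xml(entity, record):
    """Serialize one master record as an XML element named after its entity."""
    root = ET.Element(entity)
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_master_xml(xml_text):
    """Read a stored XML record back into an entity name and a field dictionary."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}

# Illustrative record, not taken from the source document.
stored = to_master_xml("Customer", {"Id": 42, "Name": "Acme Corporation"})
print(stored)
print(from_master_xml(stored))
```

    Note that values come back as text after the round trip, so a real model would also carry type information for each field.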

    5.3 Features

    1. Master Any Domain

    Talend provides the scalability and flexibility to model and master any domain. You can start by mastering a single domain and incrementally extend it to include other domains, all within a single deployment. You can define advanced business rules, validations, access rights and registry lookups directly on the model. The model sits at the center of the MDM solution and drives all communications with closed-loop quality, workflow and end-to-end integration functions.


    2. Powerful Business User Interface

    Talend's GUI gives business users a rich utility to search and manage master data.

    Standard composite views are provided to gain a 360-degree view of any mastered entity or to

    investigate a hierarchy. Business-oriented views can be specified with the Smart View custom form designer.

    3. Workflow and Business Processes

    Talend allows you to define and track master data through a series of steps using tasks. This

    workflow utilizes an intuitive graphical interface and displays a graphical trail of process steps

    for contextual history. Every create, read, update or delete of master data is evaluated and

    initiates an event to look for a duplicate, enrich data, synchronize a back-end system, send an

    email or even kick off a custom workflow.
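
    The event-driven behaviour described above can be sketched as a small in-memory store that fires an event after each operation. The class `MasterDataStore`, its methods and the handler messages are all hypothetical and only illustrate the pattern; they are not Talend MDM APIs.

```python
from collections import defaultdict

class MasterDataStore:
    """Tiny in-memory store that fires an event for each create, read, update or delete."""

    def __init__(self):
        self.records = {}
        self.handlers = defaultdict(list)

    def on(self, operation, handler):
        """Register a handler to run after the given operation."""
        self.handlers[operation].append(handler)

    def _emit(self, operation, key, record):
        for handler in self.handlers[operation]:
            handler(key, record)

    def create(self, key, record):
        self.records[key] = record
        self._emit("create", key, record)

    def read(self, key):
        record = self.records[key]
        self._emit("read", key, record)
        return record

    def update(self, key, changes):
        self.records[key].update(changes)
        self._emit("update", key, self.records[key])

    def delete(self, key):
        self._emit("delete", key, self.records.pop(key))

audit_log = []
store = MasterDataStore()
store.on("create", lambda key, rec: audit_log.append(f"check duplicates for {key}"))
store.on("update", lambda key, rec: audit_log.append(f"sync back-end for {key}"))

store.create("C1", {"name": "Acme"})
store.update("C1", {"name": "Acme Corporation"})
print(audit_log)
```

    Handlers here only append to a list; in practice they would call a matching routine, an enrichment service or a workflow engine.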

    4. Data Quality Built-in

    Talend's MDM provides comprehensive data quality features. These cover data

    profiling, data standardization, parsing and next-generation data matching that provides a

    superior alternative to overly complex processes used by legacy vendors. Sophisticated

    matching algorithms are provided to help find duplicates and probable duplicates. Stewardship

    and survivorship components define common business rules to apply to sets of duplicate

    records and automate the creation of a single master.
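
    A survivorship rule can be illustrated with a short sketch that merges a set of duplicates into one master record. The rule used here (keep the non-empty value from the most recently updated record) is an assumption chosen for the example, as are the function name `survive` and the sample records.

```python
def survive(duplicates):
    """Merge duplicate records into one master record.

    Rule used here (an assumption): for each field, keep the non-empty value
    from the most recently updated record.
    """
    # ISO-formatted dates sort correctly as plain strings.
    newest_first = sorted(duplicates, key=lambda r: r["updated"], reverse=True)
    master = {}
    for record in newest_first:
        for field, value in record.items():
            if field not in master and value not in (None, ""):
                master[field] = value
    return master

# Illustrative duplicates, not taken from the source document.
duplicates = [
    {"name": "Acme Corp", "phone": "555-0100", "email": "", "updated": "2013-04-01"},
    {"name": "Acme Corporation", "phone": None, "email": "info@acme.example",
     "updated": "2013-11-15"},
]
print(survive(duplicates))
```

    The newer record wins for the name and email, while the phone number survives from the older record because the newer one has none.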

    5. Manage Permissions, Users and Groups

    Role-based access controls are applied to every concept in Talend MDM - from read/write

    access to master data attributes to workflow and user permissions.
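
    Attribute-level, role-based access control can be sketched as a lookup from roles to permitted actions. The roles, entities and attributes in `ROLE_PERMISSIONS` are hypothetical, and the sketch does not reflect how Talend MDM stores or evaluates its permissions.

```python
# Hypothetical roles; each maps (entity, attribute) to the actions it allows.
ROLE_PERMISSIONS = {
    "steward": {
        ("Customer", "name"): {"read", "write"},
        ("Customer", "credit_limit"): {"read", "write"},
    },
    "viewer": {
        ("Customer", "name"): {"read"},
    },
}

def is_allowed(roles, entity, attribute, action):
    """Return True if any of the user's roles grants the action on the attribute."""
    return any(
        action in ROLE_PERMISSIONS.get(role, {}).get((entity, attribute), set())
        for role in roles
    )

print(is_allowed(["viewer"], "Customer", "name", "read"))
print(is_allowed(["viewer"], "Customer", "credit_limit", "read"))
print(is_allowed(["viewer", "steward"], "Customer", "credit_limit", "write"))
```

    Anything not explicitly granted is denied, which is the usual safe default for this kind of check.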

    Talend provides two MDM editions to address organizational needs: Talend Open Studio for

    MDM is a free, open source development tool, and Talend Platform for MDM adds advanced

    deployment and management functions.


    5.4 Advantages

    1. The new UI allows users to easily find and navigate through master data and increases adoption of master data applications.

    2. Customization of the web UI improves adoption in the business audience for MDM. Custom forms allow for a consistent look and feel for an organization.

    3. Find master data in seconds. No other MDM solution provides this function.

    4. Provide a business user with a single view of a master entity to gather complete insight.

    5. Enable the business to create and manage large sets of master data.

    6. Enable a team of developers across all MDM functional categories to share and collaborate on an MDM project.

    7. Manage simple and complex hierarchies in an easy to use and intuitive interface. The Talend platform features a user-friendly interface specifically designed to promote sharing and collaboration.

    8. Talend's open source business model gives OEM partners the best of both worlds. Leverage the mindshare of more than 750,000 developers and all that they represent: over 500 connectors, including sophisticated technologies such as Hadoop, SAP, Salesforce.com and more; rigorous quality assurance; useful, user-requested features; and user forums teeming with expertise.


    6. Conclusion

    Talend has progressively built best-of-breed solutions for all integration needs while working on a common unified platform. Talend's products are powerful and versatile open source solutions for Big Data integration that address the needs of data analysts by providing them with a graphical tool that abstracts the underlying complexities of big data technologies and dramatically improves the efficiency of job design.

    With data spread out across locations, from databases to file stores to multiple versions of enterprise applications, it helps integrate disparate data sources.

    Because it is an open source solution, it helps process data at high speed at a significantly lower cost. Talend's solutions are 50% to 80% cheaper than equivalent proprietary solutions offered on the market.

    It improves data quality, which directly impacts business analysis and decision making within an organization.

    As we step into the Big Data era, an open source solution with numerous advantages is bound

    to succeed. It is very likely that the big data landscape will see more innovations before a

    consensus emerges on the right technology architecture.