Data Profiling Using Informatica

Upload: chandan-deo

Post on 02-Jun-2018



Data Profiling Using Informatica: Data Explorer

Data profiling is a technique used to examine data for purposes such as determining its accuracy and completeness. The process examines a data source, such as a database, to uncover erroneous areas in the data organization. Deploying this technique improves data quality.

Data profiling is the process of examining the data available in a data source and collecting statistics and information about that data. These statistics help assess how the data can be used and the quality of its metadata. The method is widely used in enterprise data warehousing.

Data profiling clarifies the structure, relationships, content, and derivation rules of data, which aids in understanding anomalies within the data and its metadata. Data profiling uses different kinds of descriptive statistics, including mean, minimum, maximum, percentile, and frequency, as well as aggregates such as count and sum. Additional metadata obtained during profiling includes data type, length, discrete values, uniqueness, and abstract type recognition.
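As an illustration of these statistics and metadata (outside of Informatica Data Explorer itself), here is a minimal pandas sketch; the products.csv input file is a hypothetical example, not part of the original material.

```python
# Minimal column-profiling sketch with pandas (illustrative only;
# "products.csv" is a hypothetical input file).
import pandas as pd

df = pd.read_csv("products.csv")

profile = {}
for col in df.columns:
    s = df[col]
    stats = {
        "data_type": str(s.dtype),            # inferred data type
        "count": int(s.count()),              # non-null count
        "distinct_values": int(s.nunique()),  # discrete values
        "uniqueness": round(s.nunique() / max(len(s), 1), 3),
    }
    if pd.api.types.is_numeric_dtype(s):
        # aggregates: min, max, mean, percentiles, sum
        stats.update({
            "min": s.min(),
            "max": s.max(),
            "mean": s.mean(),
            "percentile_25": s.quantile(0.25),
            "percentile_75": s.quantile(0.75),
            "sum": s.sum(),
        })
    else:
        # length statistics and value frequency for text columns
        lengths = s.dropna().astype(str).str.len()
        stats["min_length"] = int(lengths.min()) if not lengths.empty else None
        stats["max_length"] = int(lengths.max()) if not lengths.empty else None
        stats["top_value_frequency"] = s.value_counts().head(3).to_dict()
    profile[col] = stats

for col, stats in profile.items():
    print(col, stats)
```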

Typical uses of data profiling include:

1. Find out whether existing data can easily be used for other purposes.

2. Improve the ability to search the data by tagging it with keywords, descriptions, or assigning it to a category.

3. Give metrics on data quality, including whether the data conforms to particular standards or patterns.

4. Assess the risk involved in integrating data for new applications, including the challenges of joins.

5. Assess whether metadata accurately describes the actual values in the source database.

6. Understand data challenges early in any data-intensive project, so that late project surprises are avoided; finding data problems late in the project can lead to delays and cost overruns.

7. Have an enterprise view of all data, for uses such as master data management, where key data is needed, or data governance, for improving data quality.

Data profiling is an analysis of the candidate data sources for a data warehouse to clarify the structure, content, relationships, and derivation rules of the data.

Data profiling utilizes different kinds of descriptive statistics such as minimum, maximum, mean, mode, percentile, standard deviation, frequency, and variation, as well as other aggregates such as count and sum.

Additional metadata information obtained during data profiling could be data type, length, discrete values, uniqueness, occurrence of null values, typical string patterns, and abstract type recognition.[2][4][5] The metadata can then be used to discover problems such as illegal values, misspellings, missing values, varying value representation, and duplicates.
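To make this problem-discovery step concrete, the following pandas sketch flags missing values, duplicate rows, illegal values, and strings that break a simple pattern. The column names (status, email), the allowed value set, and the email regular expression are assumed examples, not taken from the text above.

```python
# Illustrative data-quality checks with pandas; the column names
# ("status", "email") and the rules applied to them are hypothetical.
import pandas as pd

df = pd.read_csv("products.csv")  # hypothetical input

# 1. Missing values per column
null_counts = df.isna().sum()
print("Null values per column:\n", null_counts[null_counts > 0])

# 2. Exact duplicate rows
print("Duplicate rows:", int(df.duplicated().sum()))

# 3. Illegal values / varying value representation (domain check)
if "status" in df.columns:
    allowed = {"active", "inactive"}
    status = df["status"].dropna().astype(str).str.lower()
    print("Rows with illegal status values:", int((~status.isin(allowed)).sum()))

# 4. Typical string patterns (simple email shape check)
if "email" in df.columns:
    emails = df["email"].dropna().astype(str)
    bad_email = emails[~emails.str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")]
    print("Values not matching the email pattern:", len(bad_email))
```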


http://www.datamartist.com/ [Data Profiling, transformation, visualization, and migration tool, free to use for 30 days.]

    Data Profiling using SQL Server 2008

    Configuring the profiling tool

Start whichever Visual Studio environment you have and create a new Integration Services project. Then:

1. From the SSIS Toolbox, drag a Data Profiling Task onto the design surface and double-click it to configure.

2. Profiling results are stored as an XML file, so specify the name and location of the file: click in the blank box next to Destination and an arrow will appear; click the arrow and, in the resulting box, specify a path and filename (including the .xml suffix).

3. Click OK, then click the Quick Profile button.

4. Click the New button next to ADO.NET Connection.

5. In the box that appears, specify the SQL Server instance and database hosting the data to be profiled (I'm using the AdventureWorksLT2012 database), then click OK.

6. Use the Table or View drop-down box to choose the data to be profiled (I'm using SalesLT.Product).
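As a quick programmatic cross-check alongside the Data Profiling Task (not a replacement for it), the sketch below pulls the same SalesLT.Product table into pandas over pyodbc. The driver name, server, and Windows-authentication settings are assumptions about a local AdventureWorksLT2012 setup; adjust them for your environment.

```python
# Hedged sketch: ad-hoc profiling of SalesLT.Product with pyodbc + pandas.
# The connection details (localhost, Windows authentication, ODBC Driver 17)
# are assumptions about a local setup, not part of the original walkthrough.
import pyodbc
import pandas as pd

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=localhost;"
    "Database=AdventureWorksLT2012;"
    "Trusted_Connection=yes;"
)

df = pd.read_sql("SELECT * FROM SalesLT.Product", conn)

# Rough equivalents of the column profiles the Data Profiling Task produces
print(df.dtypes)                      # column data types
print(df.isna().mean().round(3))      # null ratio per column
print(df.nunique())                   # distinct value counts
print(df.describe(include="all").T)   # min/max/mean/length-style statistics

conn.close()
```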


Data governance is a quality control discipline for assessing, managing, using, improving, monitoring, maintaining, and protecting organizational information.

Six Steps to Data Governance Success

With more than a billion people connected online today, we are at the dawn of a data explosion, and it is becoming increasingly difficult to manage and control the terabytes of data residing within different parts of the organization. Many companies use the fortress method: a big, thick perimeter wall to keep out the bad guys. But this method can be problematic, since not all data has the same value, not all risks are outside the perimeter, and not all controls can effectively prevent fraud. The fortress model of data security creates a one-size-fits-all approach, allowing organizations to overprotect low-quality data and underprotect high-value information like customer account details or employee Social Security numbers, regardless of business context or use.

Step 1: Get a governor and the right people in place to govern

The first step in any successful data-governance program is identifying an individual within the organization who carries the delegated authority of the CEO and making that person accountable to make things happen. There is no substitute for strong leadership.

Data governance is a political challenge that requires building consensus among many diverse stakeholders. Political leadership within the organization is therefore a priority. Once established, the governor can create a governing council composed of organizational stakeholders to formulate stewardship policies and report progress to the CEO and board of directors.

    Step 2: Survey your situation

Once you have the leadership team in place, it needs to survey the territory and inventory current practices across many diverse domains. The teams need to see across the stovepipes, and an enterprise data-governance assessment methodology is imperative for this task. It helps benchmark where the organization's data-governance program is today and delivers a road map to determine where it will be tomorrow.

    Step 3: Develop a data-governance strategy

After the data-governance assessment, the governance council should create a vision of where it wants the company's data-governance practices to be in the next few years. The council should then work backward, creating realistic milestones and project plans to fill relevant gaps, establishing key performance indicators to track progress, and delivering annual reports to the CEO and the board to validate results.

Step 4: Calculate the value of your data

If companies don't know what their data is worth, they can't enhance, protect, or measure its value to the bottom line. Data isn't a normal commodity. It's like water out of a tap: vital to life, yet so often taken for granted. But you can't calculate the value of something if you don't know its price.


If you want to calculate the value of your data, build an internal marketplace for data based on user entitlements and the utility of IT services. When everyone in an organization is paying for IT services and data directly, the value of data is part of the business P&L.

    Step 5: Calculate the probability of risk

Knowing how data has been used and abused in the past is an indicator of how it might be compromised and disclosed in the future. Every organization has causes, events, and losses that are lost in stovepipes, hierarchies, and business reports. This data is already available and unused by most organizations. Collecting it, relating its meaning, and studying loss trends over time can help any organization transform risk management into a fact-based, business-intelligence method for analyzing past events, forecasting future losses, and changing current policy requirements to improve your mitigation strategies.

    Step 6: Monitor the efficacy of your controls

Data governance is largely about organizational behavior. Organizations change every day, and therefore their data, its value, and its risk also shift rapidly. Unfortunately, most organizations assess themselves only once a year. If a business isn't able to change organizational controls to meet demands on a daily or weekly basis, it isn't governing change.

In business, master data management (MDM) comprises the processes, governance, policies, standards, and tools that consistently define and manage the critical data of an organization to provide a single point of reference.[1]

The data that is mastered may include:

reference data - the business objects for transactions, and the dimensions for analysis

analytical data - supports decision making[2][3]
