tibco clarity examples - tibco product documentation · 2015-08-20 · tibco clarity - enterprise...

TIBCO ® Clarity Examples Software Release 2.2 August 2015 Two-Second Advantage ®


Page 1: TIBCO Clarity Examples - TIBCO Product Documentation · 2015-08-20 · TIBCO Clarity - Enterprise Edition Installation The enterprise edition is available only for premium subscribers

TIBCO® Clarity Examples

Software Release 2.2

August 2015

Two-Second Advantage®


Important Information

SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO SOFTWARE IS SOLELY TO ENABLE THE FUNCTIONALITY (OR PROVIDE LIMITED ADD-ON FUNCTIONALITY) OF THE LICENSED TIBCO SOFTWARE. THE EMBEDDED OR BUNDLED SOFTWARE IS NOT LICENSED TO BE USED OR ACCESSED BY ANY OTHER TIBCO SOFTWARE OR FOR ANY OTHER PURPOSE.

USE OF TIBCO SOFTWARE AND THIS DOCUMENT IS SUBJECT TO THE TERMS AND CONDITIONS OF A LICENSE AGREEMENT FOUND IN EITHER A SEPARATELY EXECUTED SOFTWARE LICENSE AGREEMENT, OR, IF THERE IS NO SUCH SEPARATE AGREEMENT, THE CLICKWRAP END USER LICENSE AGREEMENT WHICH IS DISPLAYED DURING DOWNLOAD OR INSTALLATION OF THE SOFTWARE (AND WHICH IS DUPLICATED IN THE LICENSE FILE) OR IF THERE IS NO SUCH SOFTWARE LICENSE AGREEMENT OR CLICKWRAP END USER LICENSE AGREEMENT, THE LICENSE(S) LOCATED IN THE “LICENSE” FILE(S) OF THE SOFTWARE. USE OF THIS DOCUMENT IS SUBJECT TO THOSE TERMS AND CONDITIONS, AND YOUR USE HEREOF SHALL CONSTITUTE ACCEPTANCE OF AND AN AGREEMENT TO BE BOUND BY THE SAME.

This document contains confidential information that is subject to U.S. and international copyright laws and treaties. No part of this document may be reproduced in any form without the written authorization of TIBCO Software Inc.

TIBCO, Two-Second Advantage, TIBCO Clarity, TIBCO ActiveSpaces, TIBCO Cloud Marketplace, TIBCO GeoAnalytics Builder, TIBCO MDM, TIBCO Patterns, TIBCO Spotfire, and TIBCO Vault are either registered trademarks or trademarks of TIBCO Software Inc. in the United States and/or other countries.

Enterprise Java Beans (EJB), Java Platform Enterprise Edition (Java EE), Java 2 Platform Enterprise Edition (J2EE), and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle Corporation in the U.S. and other countries.

All other product and company names and marks mentioned in this document are the property of their respective owners and are mentioned for identification purposes only.

THIS SOFTWARE MAY BE AVAILABLE ON MULTIPLE OPERATING SYSTEMS. HOWEVER, NOT ALL OPERATING SYSTEM PLATFORMS FOR A SPECIFIC SOFTWARE VERSION ARE RELEASED AT THE SAME TIME. SEE THE README FILE FOR THE AVAILABILITY OF THIS SOFTWARE VERSION ON A SPECIFIC OPERATING SYSTEM PLATFORM.

THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.

THIS DOCUMENT COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN NEW EDITIONS OF THIS DOCUMENT. TIBCO SOFTWARE INC. MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS DOCUMENT AT ANY TIME.

THE CONTENTS OF THIS DOCUMENT MAY BE MODIFIED AND/OR QUALIFIED, DIRECTLY OR INDIRECTLY, BY OTHER DOCUMENTATION WHICH ACCOMPANIES THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO ANY RELEASE NOTES AND "READ ME" FILES.

Copyright © 2013-2015 TIBCO Software Inc. ALL RIGHTS RESERVED.

TIBCO Software Inc. Confidential Information


Contents

TIBCO Documentation and Support Services
Sample Datasets Overview
Working with the Sample-customers Dataset
  Creating a Dataset and a Project
    Uploading Data from a Local File
    Uploading Data from a Cloud Storage
    Mapping Data
    Creating a Project
      Cloning a Project
  Analyzing Data
    Profiling Data
      Performing Row Analysis
      Performing Column Analysis
    Faceting Data
      Applying Pattern Facet to SSN
      Applying Pattern Facet to ZIP
    Checking Dependency
    Charting Data
  Validating Data
  Transforming Data
    Transforming Data Based on Pattern Facet Results
    Transforming Date Formats
      Removing Columns
      Splitting a Column
    Transforming Data Using a Look-up Table
      Creating a Look-up Table
      Using the Look-up Table
  Cleansing Address
  Deduplicating Data
Working with the Sample-patients Dataset
  Analyzing Data
    Profiling Data
      Performing Row Analysis
      Performing Column Analysis
    Faceting Data
      Faceting to Check Duplications
      Faceting to Check Invalid Values
      Faceting to Check Date Formats
      Faceting to Check Data Integrity
    Checking Dependency
    Charting Data
      Creating a Bar Chart to Check Duplicates
      Creating a Bar Chart to Check Invalid Data
      Creating a Pie Chart to Visualize Data Count
  Validating Data
    Validating Project Data
    Profiling Validating Results
    Exporting Validating Results
  Transforming Data
    Transforming Digit Numbers
    Transforming Date Formats
  Cleansing Data
    Trimming White Spaces
    Removing Empty Rows
    Removing Invalid Data
    Modifying Invalid Data
  Delivering Data
    Creating a Dataset and a Project
    Cleansing Project Data
    Exporting Project Data


TIBCO Documentation and Support Services

Documentation for this and other TIBCO products is available on the TIBCO Documentation site:

https://docs.tibco.com

Documentation on the TIBCO Documentation site is updated more frequently than any documentation that might be included with the product. To ensure that you are accessing the latest available help topics, please visit https://docs.tibco.com.

Product-Specific Documentation

Documentation for TIBCO products is not bundled with the software. Instead, it is available on the TIBCO Documentation site. To directly access documentation for this product, double-click the following file:

TIBCO_HOME/release_notes/TIB_clarity-dt_version_docinfo.html

where TIBCO_HOME is the top-level directory in which TIBCO products are installed. On Windows, the default TIBCO_HOME is C:\tibco. On UNIX systems, the default TIBCO_HOME is /opt/tibco.

The following documents for this product can be found on the TIBCO Documentation site:

● TIBCO Clarity User's Guide

● TIBCO Clarity Examples

● TIBCO Clarity Release Notes

● TIBCO Clarity - Enterprise Edition Installation

The enterprise edition is available only for premium subscribers.

The following documents provide additional information and can be found on the TIBCO Documentation site:

● TIBCO ActiveSpaces documentation

● TIBCO GeoAnalytics Builder documentation

● TIBCO MDM documentation

● TIBCO Patterns documentation

● TIBCO Spotfire documentation

● TIBCO Vault documentation

How to Contact TIBCO Support

For comments or problems with this manual or the software it addresses, contact TIBCO Support:

● For an overview of TIBCO Support, and information about getting started with TIBCO Support, visit this site:

http://www.tibco.com/services/support

● If you already have a valid maintenance or support contract, visit this site:

https://support.tibco.com

Entry to this site requires a user name and password. If you do not have a user name, you can request one.


How to Join TIBCOmmunity

TIBCOmmunity is an online destination for TIBCO customers, partners, and resident experts. It is a place to share and access the collective experience of the TIBCO community. TIBCOmmunity offers forums, blogs, and access to a variety of resources. To register, go to the following web address:

https://www.tibcommunity.com


Sample Datasets Overview

Three sample datasets are provided to help you try out and get started with TIBCO® Clarity.

After launching TIBCO Clarity, you are presented with the following three sample datasets on the home page. The first two datasets are used to show how you can analyze, validate, transform, cleanse, and deliver data using a variety of methods, from simple to complex data operations.

Sample-customers

This sample dataset contains a set of artificial customer data. By default, a project named project 1 is created from this dataset. For more information, see Working with the Sample-customers Dataset.

Sample-patients

This sample dataset contains a set of artificial patient data. By default, a project named project 1 is created from this dataset. For more information, see Working with the Sample-patients Dataset.

Sample-students_records

This sample dataset contains a set of artificial student data. By default, a project named project 1 is created from this dataset.

You can download the sample files from http://clarity.cloud.tibco.com/console/download/Samples.zip. The compressed file contains three CSV files: customers-1.csv, customers-2.csv, and patients.csv. The Sample-customers dataset contains the data uploaded from the customers-1.csv and customers-2.csv files, and the Sample-patients dataset contains the data uploaded from the patients.csv file.
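Before uploading, it can help to confirm that a downloaded copy of Samples.zip holds the three CSV files named above. The following Python sketch is only an illustration and is not part of TIBCO Clarity. The function name missing_sample_files is hypothetical, and the sketch assumes the files may sit either at the top level of the archive or inside a folder.

```python
import zipfile

# File names taken from the text above; the archive layout itself is an assumption.
EXPECTED_FILES = {"customers-1.csv", "customers-2.csv", "patients.csv"}


def missing_sample_files(zip_path):
    """Return the expected CSV names that are not present in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Compare base names only, in case the files sit inside a folder.
        present = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return sorted(EXPECTED_FILES - present)
```

An empty list means that all three files were found.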


Working with the Sample-customers Dataset

Starting from the very beginning, the Sample-customers dataset is used to show how you can load, analyze, validate, transform, and cleanse your data using various methods.

The Sample-customers dataset contains a set of artificial customer data. By default, a project named project 1 is created from this dataset.

Suppose you are an administrator at a widget manufacturer named TWIDGCO, Inc. The company has experienced unprecedented growth over the last decade, but a recent report revealed inefficiencies and lost opportunities due to inconsistent customer data across all brands. The management decided to roll all brands and their respective customer data into the main Customer Master of TWIDGCO.

Now you are facing the challenge of consolidating a massive amount of customer data from multiple data sources and in a variety of formats. With TIBCO Clarity, you can upload data from various data sources and streamline your data into the best possible shape.

This example dataset is used to show how you can meet the challenge with TIBCO Clarity:

● Creating a Dataset and a Project

Create a dataset to load data from different data sources, and create a project out of the dataset to sample the data.

● Analyzing Data

Profile data, facet data, check data dependency, and chart data.

● Validating Data

Validate data by data types.

● Transforming Data

Transform data into a uniform data format.

● Cleansing Address

Cleanse the address data.

● Deduplicating Data

Check duplicates.

If you are familiar with creating a dataset and creating a project, you can skip Creating a Dataset and a Project.

Creating a Dataset and a Project

The example shows how to create a dataset by uploading data from two sample data files, and how to create a project from this dataset to sample data.

The customers-1.csv and customers-2.csv files are used. Ensure that you have extracted the sample files downloaded from http://clarity.cloud.tibco.com/console/download/Samples.zip, and uploaded the customers-2.csv file to Dropbox.

A project can be created while creating a dataset, or by cloning an existing project. Complete the following tasks to create a dataset and a project:

● Uploading Data from a Local File

● Uploading Data from a Cloud Storage

● Mapping Data

● Creating a Project


Uploading Data from a Local File

Upload the customers-1.csv file from your local computer to TIBCO Clarity.

Procedure

1. On the home page, click Create dataset.

2. On the "Get data from" page, click My computer.

3. In the "File upload" dialog, click Choose file to locate the customers-1.csv file.

4. On the "Parse file" page, keep the default settings, and then click Next.

5. Rename the dataset to Sample-customers.

What to do next

Continue to upload the customers-2.csv file from Dropbox, as described in Uploading Data from a Cloud Storage.

Uploading Data from a Cloud Storage

After uploading data from the local customers-1.csv file, continue to upload the customers-2.csv file from Dropbox to TIBCO Clarity.

Procedure

1. On the "Get data from" page, click Cloud Storage.

2. Next to Dropbox, click Sign in.

3. Enter your login credentials to log on to Dropbox.

4. Select the customers-2.csv file.

5. Click Next to confirm the file uploading.

6. On the "Parse file" page, keep the default settings, and then click Next.

Result

Now you have created a dataset named Sample-customers with two sample data files.

What to do next

Mapping Data

After creating the dataset, click Done adding. You are then brought to the "Map data" page.

Mapping Data

After uploading data, use the mapping function to consolidate data from multiple data sources into a unified dataset.

The following two mapping methods are available:

Auto Mapping

Automatically map your data from various data sources. TIBCO Clarity finds identical column titles in the dataset and groups them under the same column name. Click Auto map to enable auto mapping. TIBCO Clarity sorts out the rest, as shown in the following figure:


Manual Mapping

Use manual mapping to select the columns to be mapped.

For example, to map the FirstName column:

Drag FirstName from customers-1.csv to the mapping area, and then drag FirstName from customers-2.csv to the same place where you dropped FirstName from customers-1.csv. A superset column named FirstName is displayed in the Group your data area. Continue to map the other columns one by one.
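The grouping that auto mapping is described as performing, collecting identical column titles from several sources under one column name, can be sketched in Python as follows. This is an illustration only, not TIBCO Clarity's implementation. The function name auto_map and the example headers are hypothetical, and exact, case-sensitive title matching is an assumption.

```python
from collections import defaultdict


def auto_map(headers_by_source):
    """Group identical column titles across sources.

    headers_by_source maps a source name to its list of column titles.
    Returns {column title: [sources that contain it]}, in first-seen order.
    """
    groups = defaultdict(list)
    for source, headers in headers_by_source.items():
        for title in headers:
            groups[title].append(source)
    return dict(groups)


# Hypothetical headers; the real sample files may use different columns.
mapping = auto_map({
    "customers-1.csv": ["FirstName", "LastName", "ZIP"],
    "customers-2.csv": ["FirstName", "LastName", "Phone"],
})
```

In this sketch, a title that appears in both sources (such as FirstName) ends up with two entries, while a title found in only one source keeps a single entry and would be a candidate for manual mapping.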

Creating a Project

After mapping data, create a project while sampling data on the "Sample data" page.

Procedure

1. Click the customers-1.csv tab.

2. Click Load 100 % of rows.

3. Click the customers-2.csv tab.

4. Click Load 100 % of rows.

5. Click Done.

Result

A project named project 1 is created and you are brought to the project data page.

Cloning a Project

You can also create a project by cloning a project.

To create a project with 30 rows by cloning project 1:


Procedure

1. On the home page, expand the Sample-customers dataset.

2. Move your cursor over project 1, and then click Clone.

3. On the Clone page, rename the project to project 2.

4. Click the customers-1.csv tab:

   a) Click Load rows from.

   b) In the row fields, enter 1 and 30 to load rows from 1 to 30.

5. Click the customers-2.csv tab:

   a) Click Load rows from.

   b) In the row fields, enter 1 and 30 to load rows from 1 to 30.

6. Click Done.

Result

A project named project 2 is created and you are brought to the project data page.
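The effect of loading rows 1 to 30 from a source file can be approximated outside the product with a short Python sketch. It is an illustration under assumptions: the function name load_row_range is hypothetical, the first line of the file is taken to be a header, and row numbers are treated as 1-based and inclusive. The in-memory data is a stand-in, not the real customers-1.csv content.

```python
import csv
import io
from itertools import islice


def load_row_range(csv_file, first, last):
    """Return the header and data rows first..last (1-based, inclusive)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = list(islice(reader, first - 1, last))
    return header, rows


# Tiny in-memory stand-in for customers-1.csv (hypothetical content).
sample = io.StringIO(
    "ID,FirstName\n" + "".join(f"{i},Name{i}\n" for i in range(1, 51))
)
header, rows = load_row_range(sample, 1, 30)
```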

Analyzing Data

You can analyze data by profiling data, faceting data, checking data dependency, and charting data.

Profiling Data

Use the profiling function to analyze data based on rows and columns.

The following two analysis methods are used:

● Performing Row Analysis

● Performing Column Analysis

Performing Row Analysis

Perform row analysis to analyze the sample data based on rows.

Procedure

1. On the home page, click project 1 to go to the project data page.

2. On the toolbar, click Profile.

   The "Profiling analysis" page is displayed, and a row analysis report is generated by default.

3. View the row analysis report.

   The report shows that project 1 has 0 empty rows, with 192 as the maximum row size compared to the average row size of 166.62. Click the maximum column size, 16, to display the 490 matching rows with the maximum column size (16 columns) on the data page. Click the minimum column size, 11, to display the one matching row with the minimum column size (11 columns) on the data page.
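The kind of figures a row analysis report contains can be sketched in Python as follows. The definitions here are assumptions made for illustration, since the text does not define them: an empty row is a row with no non-blank cell, row size is the total number of characters in the row, and column size is the number of non-blank cells. The function name row_analysis and the sample rows are hypothetical.

```python
def row_analysis(rows):
    """Summarize rows given as lists of cell strings (rows must be non-empty).

    Assumptions: an empty row has no non-blank cell, row size is the total
    number of characters in the row, and column size is the number of
    non-blank cells.
    """
    sizes = [sum(len(cell) for cell in row) for row in rows]
    column_sizes = [sum(1 for cell in row if cell.strip()) for row in rows]
    return {
        "empty_rows": sum(1 for n in column_sizes if n == 0),
        "max_row_size": max(sizes),
        "average_row_size": sum(sizes) / len(sizes),
        "max_column_size": max(column_sizes),
        "min_column_size": min(column_sizes),
    }


# Hypothetical rows, not taken from the sample dataset.
report = row_analysis([
    ["1", "Ann", "94304"],
    ["2", "", "10001"],
    ["", "", ""],
])
```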


Performing Column Analysis

Perform column analysis to analyze the sample data based on columns.

Procedure

1. On the home page, click project 1 to go to the project data page.

2. On the toolbar, click Profile.

   The "Profiling analysis" page is displayed.

3. From the Analysis type list, select Column analysis.

   A column analysis report is generated.

4. View the column analysis report.

   The report shows the following two aspects of the columns:

● Numeric columns: Provides mathematical results for Empties, Nulls, Uniqueness, Unique count, Min, Max, Mean, Sum, Standard deviation, and Quartile.

● String columns: Provides column counts for Empties, Nulls, Uniqueness, Unique count, Min length, Max length, and Mean length.
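A few of the column statistics listed above can be illustrated with a short Python sketch. It covers only a subset (Empties, Unique count, Min, Max, Mean, Sum, and the length statistics) and is not how TIBCO Clarity computes its report. The function name column_profile is hypothetical, and treating a column as numeric only when every non-blank cell parses as a number is an assumption.

```python
import statistics


def column_profile(values):
    """Profile one column given as a list of cell strings."""
    filled = [v for v in values if v.strip()]
    profile = {
        "empties": len(values) - len(filled),
        "unique_count": len(set(filled)),
    }
    try:
        numbers = [float(v) for v in filled]
    except ValueError:
        numbers = None  # at least one non-numeric cell: treat as a string column
    if numbers:
        profile.update({
            "kind": "numeric",
            "min": min(numbers),
            "max": max(numbers),
            "mean": statistics.mean(numbers),
            "sum": sum(numbers),
        })
    else:
        lengths = [len(v) for v in filled]
        profile.update({
            "kind": "string",
            "min_length": min(lengths, default=0),
            "max_length": max(lengths, default=0),
            "mean_length": statistics.mean(lengths) if lengths else 0,
        })
    return profile
```

For example, column_profile(["10", "20", "", "20"]) is profiled as a numeric column with one empty cell, while column_profile(["Ann", "Bo", ""]) is profiled as a string column.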

Faceting Data

Use the pattern facet function to filter out inconsistent data formats that exist in your project data.

The following two examples show how to apply pattern facet on a column:

● Applying Pattern Facet to SSN

● Applying Pattern Facet to ZIP

Applying Pattern Facet to SSN

Apply pattern facet to the SSN column to filter out inconsistent data formats of social security number.

Procedure

1. From the SSN column menu, click Facet > Text pattern facet.

2. In the SSN facet panel, click count.

   Most of the values in the SSN column are in the 999-99-9999 format.
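The idea behind a text pattern facet can be sketched in Python: reduce each value to its shape and count how many values share each shape. The patterns shown in this guide (999-99-9999, 99999) suggest that digits are represented as 9. How the product represents letters is not stated, so this sketch leaves every non-digit character unchanged. The function names and the sample values are hypothetical.

```python
import re
from collections import Counter


def text_pattern(value):
    """Replace every digit with 9 so values with the same shape compare equal."""
    return re.sub(r"\d", "9", value)


def pattern_facet(values):
    """Count values per pattern, most common first."""
    return Counter(text_pattern(v) for v in values).most_common()


# Hypothetical SSN values, not taken from the sample dataset.
facet = pattern_facet(["123-45-6789", "987-65-4321", "123456789", "12-345-6789"])
```

Sorting by count puts the dominant format first, so the rarer patterns that follow are the candidates for inconsistent formatting.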


Applying Pattern Facet to ZIP

Apply pattern facet to the ZIP column to filter out inconsistent data formats of zip code.

Procedure

1. From the ZIP column menu, click Facet > Text pattern facet.

2. In the ZIP facet panel, click count.

   Most of the values in the ZIP column are in the 99999 format.

For details about how to transform the invalid zip values, see Transforming Data Based on Pattern Facet Results.

Checking Dependency

Use the dependency checking function to check the relationships between columns.

This example shows how you can check dependency between the SSN column and other columns.

Procedure

1. On the toolbar, click Data > Check dependency.

2. In the "Column dependency" dialog, drag FirstName, LastName, and Phone from the Column Name area to the Key column(s) area.

3. Drag SSN from the Column Name area to the Value column area.

   Dependency analysis tests whether one key or a group of keys can uniquely determine the column in the Value field.


4. Click Analyze Dependency.

Result

The IDs of the rows in which the key columns do not uniquely determine the value column are displayed.
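The test that dependency analysis is described as performing, whether a group of key columns uniquely determines a value column, can be sketched in Python as follows. This is an illustration, not the product's algorithm. The function name dependency_violations, the "ID" field name, and the sample rows are hypothetical.

```python
from collections import defaultdict


def dependency_violations(rows, key_columns, value_column):
    """Return IDs of rows whose key maps to more than one distinct value.

    rows is a list of dicts; each dict has an "ID" entry (an assumed name).
    """
    values_by_key = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        values_by_key[key].add(row[value_column])
    return [
        row["ID"]
        for row in rows
        if len(values_by_key[tuple(row[c] for c in key_columns)]) > 1
    ]


# Hypothetical rows: two records share a key but carry different SSNs.
rows = [
    {"ID": 1, "FirstName": "Ann", "LastName": "Lee", "Phone": "555-0100", "SSN": "111-11-1111"},
    {"ID": 2, "FirstName": "Ann", "LastName": "Lee", "Phone": "555-0100", "SSN": "222-22-2222"},
    {"ID": 3, "FirstName": "Bob", "LastName": "Ray", "Phone": "555-0101", "SSN": "333-33-3333"},
]
violations = dependency_violations(rows, ["FirstName", "LastName", "Phone"], "SSN")
```

Rows 1 and 2 are reported because the same FirstName, LastName, and Phone combination leads to two different SSN values.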

Charting Data

Use the charting function to visualize data.

For example, create a bar chart for project 1 in the Sample-customers dataset to present the distribution of gender values. To create a bar chart for the Gender column:

Procedure

1. On the toolbar, click Chart.

2. In the Chart type area, click bar, and then configure the chart setting:

   a) From the X axis list, select Gender.

   b) From the Y axis list, select Row Count.

   c) Click Create chart.

   The gender distribution is displayed, as shown in the following figure:

3. View the chart.

   The chart shows that there are 6 different types of gender values: (blank), F, FM, M, ML, and X. You can easily find blank and invalid gender values.
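The data behind such a bar chart is a row count per distinct value. The Python sketch below renders that count as a simple text chart. It is an illustration only: the function name bar_chart and the counts are hypothetical, and only the six labels come from the text above.

```python
from collections import Counter


def bar_chart(values, width=20):
    """Return text lines with one bar per distinct value, sorted by label."""
    counts = Counter(v if v else "(blank)" for v in values)
    longest = max(counts.values())
    lines = []
    for label in sorted(counts):
        # Scale each bar to the largest count; always draw at least one mark.
        bar = "#" * max(1, round(counts[label] * width / longest))
        lines.append(f"{label:<8}{bar} {counts[label]}")
    return lines


# Hypothetical counts; only the six labels come from the text above.
genders = ["F"] * 10 + ["M"] * 8 + ["FM", "ML", "X", ""]
chart = bar_chart(genders)
```

Printing the lines with print("\n".join(chart)) shows short bars for the blank and unexpected values next to the long bars for F and M.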


Validating Data

Use the data validation function to filter out invalid data.

Procedure

1. On the toolbar, click Validate.

2. Next to Data types and constraints, click Auto suggest.

3. In the "Configure data type" dialog, keep the default settings, and then click Next.

4. On the "Data type result" page, use the data types suggested by TIBCO Clarity, and then click Next.

5. Configure the constraints for the Gender column data type:
a) From the third field, select Valid list.
b) Click Click to add/edit valid list.
c) In the Enter a valid value field, enter M, and then click Add.
d) In the Enter a valid value field, enter F, and then click Add.
e) Click Save.

6. Click Save changes to start validating data.
The validating results are displayed on the data page. The rows that contain invalid values are marked with an invalid icon.

7. Click an invalid icon to view the details.
For example, click the invalid icon before row 4 to see the message: String value doesn't match expression. Click the invalid icon before row 17 to see the message: String value is not valid.

Transforming Data

After analyzing and validating the data, you can transform data into a consistent data format.

The following transforming methods are used in this guide:

● Transforming Data Based on Pattern Facet Results
● Transforming Date Formats
● Transforming Data Using a Look-up Table

Transforming Data Based on Pattern Facet Results

After applying a pattern facet to a column, you can transform the data into a consistent format.

Prerequisites

Applying Pattern Facet to ZIP

After applying pattern facet to the ZIP column, you can see that 49 rows are in the 9999 format. This format is incorrect. To transform these values to the correct 99999 format:

Procedure

1. In the ZIP facet panel, click 9999.
All the rows with data in the 9999 format are displayed.

2. From the ZIP column menu, click Edit cells > Transform.
The Custom Text Transform on Column ZIP dialog is displayed.


3. In the Expression field, enter "0" + value, and then click OK.
All the data in the 9999 format is transformed into the 99999 format.
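The effect of the "0" + value expression on the faceted rows can be sketched in Python. This is an illustration only: the function name fix_zip and the sample values are hypothetical, and the explicit four-digit check stands in for the 9999 facet selection that limits the transform in TIBCO Clarity.

```python
def fix_zip(value):
    """Prepend "0" to four-digit values; leave everything else unchanged.

    The four-digit test plays the role of the 9999 facet selection.
    """
    if len(value) == 4 and value.isdigit():
        return "0" + value
    return value


# Hypothetical sample values.
zips = ["2134", "94304", "8540"]
fixed = [fix_zip(z) for z in zips]
```

Values that already have five digits pass through unchanged, which mirrors the fact that the transform is applied only to the rows selected by the facet.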

Transforming Date Formats

Use the date format transforming function to transform inconsistent date formats.

This example shows how to transform the date format of the DOB column data into the dd/MM/yyyy format.

Procedure

1. From the DOB column menu, click Edit column > Transform date format.

2. In the "Transform date format based on column DOB" dialog, configure the following fields:
a) Optional: In the New column name field, enter a name for the new column.
The default value is Copy_DOB. In this example, the default value is used.
b) From the New format list, select dd/MM/yyyy (06/06/2000).
c) Keep the default values for the rest of the options. Click OK.

Result

A new column named Copy_DOB is created, and all dates are in the dd/MM/yyyy format.
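A comparable normalization can be sketched in Python. This is a minimal illustration, not how TIBCO Clarity performs the transform. The list of input formats is an assumption, because this guide does not list the formats that occur in the DOB column, and the function name to_day_month_year is hypothetical.

```python
from datetime import datetime

# Assumed input formats; the real DOB column may contain others.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m-%d-%Y", "%Y/%m/%d"]


def to_day_month_year(text):
    """Return the date as dd/MM/yyyy, or None if no assumed format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue  # try the next candidate format
    return None
```

Returning None for unparseable values keeps them visible for manual review instead of silently guessing a date.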

Removing Columns

After transforming a column with inconsistent date formats, a new column is created. You can remove the original column and rename the new column.

Prerequisites

Transforming Date Formats

This example shows how to remove the original DOB column after the transformation and rename the new Copy_DOB column.

Procedure

1. From the DOB column menu, click Edit column > Remove this column.

2. From the Copy_DOB column menu, click Edit column > Rename this column.

3. In the "Rename column" dialog, enter DOB, and then click OK.


Splitting a Column

After transforming a column with inconsistent date formats, you can split it into several columns.

Prerequisites

Transforming Date Formats

This example shows how to split the transformed DOB column to create a month column.

Procedure

1. From the DOB column menu, click Edit column > Split into several columns.

2. In the Separator field, enter /, and then click OK.
Three columns, DOB 1, DOB 2, and DOB 3, are created to hold the day, month, and year data.

3. Remove the DOB 1 and DOB 3 columns:
a) From the DOB 1 column menu, click Edit column > Remove this column.
b) From the DOB 3 column menu, click Edit column > Remove this column.

4. Rename the DOB 2 column. Double-click the column name DOB 2, and then enter Month.

Transforming Data Using a Look-up Table

Use a look-up table to define a data format for transforming.

Create a look-up table and then use it for transforming:

● Creating a Look-up Table
● Using the Look-up Table

Creating a Look-up Table

Create a look-up table to define the data format for transforming.

To create a look-up table for transforming the Month column:

Procedure

1. Click youraccount/Settings > Look-up tables.

2. Move your mouse pointer over Look-up table n, click Rename, and then enter Month to rename the look-up table.

3. From the Look-up table source list, click Manual input to manually add keys and values to the look-up table.

4. Enter the key/value pairs for the months, for example, 1/January, and then click Add to list. Continue to add key/value pairs for all months. Click Save.

5. Click Close.
A look-up table for the Month column is created.

What to do next

Using the Look-up Table


Using the Look-up Table

After creating a look-up table named Month, you can use it to replace every numerical month value with text.

Prerequisites

Creating a Look-up Table

To transform the Month data from numerical values into string values using the Month look-up table:

Procedure

1. From the Month column menu, click Edit cells > Transform.
The Custom Text Transform on Column Month dialog is displayed.

2. Click the Lookup tab, and then click Hint to associate with the Month table name.

Optionally, enter value.tableLookup("Month") in the Expression field.

3. Click OK.
All the month values are displayed as strings.
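Conceptually, the look-up table behaves like a key/value map. The following Python sketch illustrates that idea only. It does not show how value.tableLookup is implemented; the names MONTH_LOOKUP and lookup_month are hypothetical, and leaving unmatched values unchanged is an assumption.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Keys "1" to "12" mirror the 1/January style pairs entered in the table.
MONTH_LOOKUP = {str(number): name for number, name in enumerate(MONTH_NAMES, start=1)}


def lookup_month(value, table=MONTH_LOOKUP):
    """Return the month name for a key; unmatched values are kept (assumption)."""
    return table.get(value, value)
```

With this map, a key such as "1" resolves to January, and a value without a matching key is returned as entered.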

Cleansing Address

Use the address cleansing function to check the address data.

Procedure

1. On the toolbar, click Address.

2. In the "TIBCO GeoAnalytics configuration" dialog, enter your user name and key to access the Majorana service, and then click Save.

This step is required only when you are using the enterprise edition.

3. On the "Address cleansing" page, move the slider to change the similarity threshold to 80%.

4. Drag the State and City columns from the Source Columns area to the middle area.

5. Drag the State and City columns from the Destination Columns area to the middle area.

6. Click Run to start checking.

Result

Two columns, addr_city and addr_state, are added, displaying the automatically cleansed address. You can delete the original City and State columns.

Deduplicating Data

Use the dedup (deduplication) function to check duplicate records.

For the enterprise edition, you have to configure a connection to the TIBCO® Patterns server before using the dedup function.

You can either create a switchable group of columns or select the columns that you want to check for duplicates. This example shows how to create a switchable group for deduplicating data.


Procedure

1. On the toolbar, click Dedup.

2. From the menu next to Column name, click Create a switchable group, and then select the FirstName and LastName check boxes. Click elsewhere.
A switchable group named FirstNameLastName is created.

3. Select the SSN check box.

4. Ensure that the weight values for the FirstNameLastName and SSN columns are 1, and then click Run.

Result

The duplicate rows are marked with an icon. Three new columns, dedup_isLead, dedup_group, and dedup_rowIndex, are added. The following table lists the details of the dedup results:

Column Name      Data Type   Hint
dedup_isLead     Boolean     true: This row is the first found row in the group.
                             false: This row is not the first found row in the group.
dedup_group      Integer     0: This row is a unique row.
                             >0: This row is in a duplicated group.
dedup_rowIndex   Integer     The value is the original row index.
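To make the meaning of the three result columns concrete, the following Python sketch assigns them with a deliberately simplified rule. It is not the matching that TIBCO Clarity performs: it uses exact matching on trimmed, lowercased values and does not model switchable groups, weights, or similarity. The function name mark_duplicates, the zero-based row index, and marking unique rows as lead rows are all assumptions.

```python
def mark_duplicates(rows, key_columns):
    """Attach dedup_isLead, dedup_group, and dedup_rowIndex to each row.

    Simplification: rows are duplicates only if their normalized key
    values are identical. Unique rows get group 0 and are treated as
    lead rows (assumption).
    """
    indexes_by_key = {}
    for index, row in enumerate(rows):
        key = tuple(row[c].strip().lower() for c in key_columns)
        indexes_by_key.setdefault(key, []).append(index)

    marks = [None] * len(rows)
    next_group = 1
    for indexes in indexes_by_key.values():
        duplicated = len(indexes) > 1
        group_id = next_group if duplicated else 0
        if duplicated:
            next_group += 1
        for position, index in enumerate(indexes):
            marks[index] = {
                "dedup_isLead": position == 0,
                "dedup_group": group_id,
                "dedup_rowIndex": index,
            }
    return marks


# Hypothetical sample data: the first two rows differ only in case and spacing.
rows = [
    {"FirstName": "Ann", "LastName": "Lee", "SSN": "111"},
    {"FirstName": "ann ", "LastName": "Lee", "SSN": "111"},
    {"FirstName": "Bob", "LastName": "Ray", "SSN": "333"},
]
marks = mark_duplicates(rows, ["FirstName", "LastName", "SSN"])
```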


Working with the Sample-patients Dataset

The Sample-patients dataset is used to show how you can analyze, validate, transform, cleanse, and deliver data using various methods.

The Sample-patients dataset contains a set of artificial patient data. By default, a project named project 1 is created from this dataset. On the home page, click the project name to go to the project data page. The patient data is stored in the following columns:

● PATNO: Patient number.
● GENDER: Gender.
● VISIT: Date of visit.
● HR: Heart rate.
● SBP: Systolic blood pressure.
● DBP: Diastolic blood pressure.
● DX: Diagnosis code.
● AE: Adverse event.

This example dataset is used to show how you can analyze, validate, transform, cleanse, and deliver data:

● Analyzing Data
Profile data, facet data, check data dependency, and chart data.

● Validating Data
Validate data by data types, and analyze and export validating results.

● Transforming Data
Transform data into a uniform data format.

● Cleansing Data
Trim white space, remove empty rows, and remove or modify invalid data.

● Delivering Data
A complete procedure: load, analyze, transform, cleanse, and deliver data.

Analyzing Data

You can analyze data by profiling data, faceting data, checking data dependency, and charting data.

Profiling Data

Use the profiling function to analyze data based on rows and columns.

The following two analysis methods are used:

● Performing Row Analysis
● Performing Column Analysis


Performing Row Analysis

Perform row analysis to analyze the sample data.

Procedure

1. On the home page, click project 1 to go to the project data page.

2. On the toolbar, click Profile.
The "Profiling analysis" page is displayed, and a row analysis report is generated by default.

3. View the row analysis report.
The report shows that project 1 has 1 empty row, with 27 as the maximum row size compared to the average row size of 26.13. Click the maximum column size, 8, to display the 29 matching rows with the maximum column size (8 columns) on the data page. Click the minimum column size, 1, to display the one matching row with no content on the data page.

Performing Column Analysis

Perform column analysis to analyze the sample data.

Procedure

1. On the home page, click project 1 to go to the project data page.

2. On the toolbar, click Profile.
The "Profiling analysis" page is displayed.

3. From the Analysis type list, select Column analysis.
A column analysis report is generated.

4. View the column analysis report.

Faceting Data

Use the data faceting function to check duplicates, invalid values, data formats, and data integrity.

Facet data to analyze the sample data:

● Faceting to Check Duplications
● Faceting to Check Invalid Values
● Faceting to Check Date Formats
● Faceting to Check Data Integrity

Faceting to Check Duplications

Use the data faceting function to check duplicates in a column.

This example shows how to use the faceting function to check duplicates in the PATNO column.

Procedure

1. From the PATNO column menu, click Facet > Text Facet.

2. In the PATNO facet panel, click count.
In the PATNO column, the values 002, 003, and 006 have duplicates.
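A text facet sorted by count is essentially a frequency table of the column values. The following Python sketch illustrates that idea with hypothetical sample values. The function names are assumptions, and the snippet is not part of TIBCO Clarity.

```python
from collections import Counter


def text_facet(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


def duplicated_values(values):
    """Return the sorted values that occur more than once."""
    return sorted(v for v, count in Counter(values).items() if count > 1)


# Hypothetical PATNO values.
patnos = ["001", "002", "002", "003", "003", "004", "006", "006"]
duplicates = duplicated_values(patnos)
```

Any value with a count greater than one is a duplicate candidate, which is what sorting the facet by count brings to the top of the panel.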


Faceting to Check Invalid Values

Use the data faceting function to check invalid values in a column.

This example shows how to use the faceting function to check invalid values in the GENDER column.

Procedure

1. From the GENDER column menu, click Facet > Text Facet.

2. In the GENDER facet panel, click count.
The GENDER column has invalid values: 2, X, and (blank).

Faceting to Check Date Formats

Use the data faceting function to check date formats.

This operation is based on the validation rules defined in Validating Data.

This example shows how to use the faceting function to check the formats of dates in the VISIT column.

Procedure

1. From the VISIT column menu, click Facet > Text Pattern Facet.

2. In the VISIT facet panel, click count.
The facet results show that most of the dates are in the MM/dd/yyyy format.


To modify the dates that do not conform to this format, see Transforming Date Formats.
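The pattern notation used by the text pattern facet, such as 99999, 99/99/99, or AAAAAAAA, can be reproduced with a small Python sketch. This is an illustration under assumptions rather than Clarity's implementation: digits are mapped to 9 and letters to A, and mapping lowercase letters to A as well is a guess. The function names are hypothetical.

```python
import re
from collections import Counter


def text_pattern(value):
    """Replace each digit with 9 and each ASCII letter with A."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def pattern_facet(values):
    """Return (pattern, count) pairs, most frequent first."""
    return Counter(text_pattern(v) for v in values).most_common()
```

Grouping values by pattern makes format outliers easy to spot, because a date with a two-digit year or a text placeholder falls into its own small group.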

Faceting to Check Data Integrity

Use the data faceting function to check the integrity of your data.

This example shows how to use the faceting function to check the integrity of the HR column.

Procedure

1. Click the data type icon before the HR column header, and change its data type to Integer.

2. From the HR column menu, click Facet > Numeric facet.
A bar chart is displayed in the HR facet panel. The heart rates of patients consist of numeric, non-numeric, blank, and error data. The heart rate ranges from 10 to 910 beats a minute.

3. Move the range handles to restrict the range of data displayed.

If the heart rate of a patient is beyond the reasonable range, for example, from 10 to 200 beats a minute, the data is invalid. For information about how to remove or modify the invalid data, see Removing Invalid Data and Modifying Invalid Data.

Checking Dependency

Use the dependency checking function to check the relationships between columns.

To check dependency between the PATNO and GENDER columns:

Procedure

1. On the toolbar, click Data > Check dependency.

2. In the "Column dependency" dialog, drag PATNO from the Column Name area to the Key column(s) area.

3. Drag GENDER from the Column Name area to the Value column area.
Dependency analysis tests whether one key or a group of keys can uniquely determine the column in the Value field.


4. Click Analyze Dependency.

Result

The IDs of the rows that are not uniquely determined by the key columns are displayed.

Charting Data

Use the charting function to visualize data, including duplicate and invalid data.

You can create a variety of charts, including line, bar, pie, line bar, and scatter. Create a chart to view data based on your need:

● Creating a Bar Chart to Check Duplicates
● Creating a Bar Chart to Check Invalid Data
● Creating a Pie Chart to Visualize Data Count

Creating a Bar Chart to Check Duplicates

Create a bar chart to check duplicate data.

To create a bar chart to check duplicate data in the PATNO column:

Procedure

1. On the toolbar, click Chart.

2. In the Chart type area, click bar.

3. Keep the default chart setting. Click Create chart.

Result

A bar chart is created. The chart shows that PATNO 002, 003, and 006 have duplicates.


Creating a Bar Chart to Check Invalid Data

Create a bar chart to check invalid data.

To create a bar chart to check invalid data in the GENDER column:

Procedure

1. On the toolbar, click Chart.

2. In the Chart type area, click bar, and then configure the chart setting:
a) From the X axis list, select GENDER.
b) From the Y axis list, select Row Count.
c) From the Group by list, select GENDER.
d) Click Create chart.
A bar chart is created.
e) At the top of the chart, click stacked.

Result

A bar chart is created. The chart shows that the GENDER column has invalid values: (blank), 2, and X.

Creating a Pie Chart to Visualize Data Count

Create a pie chart to visualize the number of rows for every data value.

To create a pie chart to visualize the number of rows for different values in the GENDER column:

Procedure

1. On the toolbar, click Chart.

2. In the Chart type area, click pie, and then configure the chart setting:
a) From the Color by list, select GENDER.
b) From the Size by list, select Row Count.
c) Click Create chart.

Result

A pie chart is created. The chart shows that the number of female patients and the number of male patients are almost the same. There is also a small portion of invalid values in the GENDER column.


Validating Data

Use the data validation function to filter out invalid data. After validation, you can analyze and export the validating results.

A complete validation procedure includes:

● Validating Project Data
● Profiling Validating Results
● Exporting Validating Results

Validating Project Data

Use the data validation function to filter out invalid data.

Procedure

1. On the toolbar, click Validate.

2. Next to Data types and constraints, click Auto suggest.

3. In the "Configure data type" dialog, keep the default settings, and then click Next.

4. On the "Data type result" page, use the data types suggested by TIBCO Clarity, and then click Next.

5. Configure the constraints for the PATNO column data type:
a) Change the data type from Integer to String.
b) In the string/regular expression field, enter the regular expression ^\d\d\d$
c) Next to the Allows null check box, click Save to save the data type.
d) In the "Save as data type" dialog, enter a name for the new custom type, and then click Add.

6. Define custom data types for each column as shown in the following table:

Column Name   Description                Variable Type   Constraint        Clarity Constraint    Null Allowed?
PATNO         Patient number             String          Numerals          Whole: ^\d\d\d$       Yes
GENDER        Gender                     String          ’M’ or ’F’        Valid values: M, F    Yes
VISIT         Date of visit              Date            Like 12/31/2013   MM/dd/yyyy            Yes
HR            Heart rate                 Integer         40 to 120         40 to 120             Yes
SBP           Systolic blood pressure    Integer         80 to 200         80 to 200             Yes
DBP           Diastolic blood pressure   Integer         60 to 120         60 to 120             Yes
DX            Diagnosis code             String          1 to 3 digits     Length: 3             Yes
AE            Adverse event              Boolean         None              None                  Yes

See the following figure for validation rules configured for each column:

7. Click Save changes to start validating data.

Result

The validating results are displayed on the data page. The rows that contain invalid values are marked with an icon. Click an icon to view the details.
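Several of the constraints in the table can be expressed as simple predicates. The following Python sketch is an illustration, not TIBCO Clarity's validation engine. The names RULES and invalid_columns are hypothetical, empty values are accepted because every column allows null, and the DX and AE columns are left out for brevity. Note that Python's date parsing also accepts months and days without a leading zero, which may be more lenient than the MM/dd/yyyy rule in Clarity.

```python
import re
from datetime import datetime


def _is_date(text):
    """True if text parses as month/day/four-digit year."""
    try:
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        return False


def _in_range(low, high):
    """Build a check for integers within [low, high]."""
    def check(text):
        try:
            return low <= int(text) <= high
        except ValueError:
            return False
    return check


# One predicate per column, following the constraint table above.
RULES = {
    "PATNO": lambda v: re.fullmatch(r"\d\d\d", v) is not None,
    "GENDER": lambda v: v in {"M", "F"},
    "VISIT": _is_date,
    "HR": _in_range(40, 120),
    "SBP": _in_range(80, 200),
    "DBP": _in_range(60, 120),
}


def invalid_columns(row):
    """Return the names of columns whose non-empty value breaks its rule."""
    return [
        name
        for name, is_valid in RULES.items()
        if row.get(name, "") != "" and not is_valid(row[name])
    ]
```

A row with no returned column names passes all of the sketched rules; any returned name corresponds to a cell that would need review.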

Profiling Validating Results

After validating data, you can profile the validating results.

Prerequisites

Validating Project Data

In this example, column analysis is used to profile the validating results. You can analyze a specific column by using a quartile diagram or a normal distribution diagram.


Procedure

1. On the toolbar, click Profile.
The "Profiling analysis" page is displayed.

2. From the Analysis type list, select Column analysis.
The column analysis report of the validating results is generated.

3. Analyze a specific column by using a quartile diagram or a normal distribution diagram.

● Using a quartile diagram:

a) Click any value in the following columns: 1st quartile, Median, or 3rd quartile. For example, click the 1st quartile value of DBP.
A quartile diagram is generated.

b) Click Close to exit.

● Using a normal distribution diagram:

a) Click the Std deviation value for DBP.
A normal distribution diagram is generated.


b) Click Close to exit.

Exporting Validating Results

After validating and analyzing the sample data, you can separate valid rows from invalid rows, delete the invalid rows, and export the valid rows.

Prerequisites

Validating Project Data

Procedure

1. On the toolbar, click Data > Facet > Facet by validation.

2. In the "Validation error" facet panel, click true.

3. On the toolbar, click Data > Edit rows > Remove all validated error rows.
All the invalid rows are removed.

4. In the "Validation error" facet panel, click false.

5. Move your mouse pointer over false, and then click include.

6. On the toolbar, click Export project > To file > Custom exporter.
The "Custom tabular exporter" dialog is displayed.

7. In the Other formats area, click Excel, and then click download.
The valid rows are exported to an Excel file.

Transforming Data

Use the data transforming function to transform column data to a specified format.

The following two examples show how to transform digit numbers and date formats:

● Transforming Digit Numbers
● Transforming Date Formats

Transforming Digit Numbers

Use the data transforming function to transform the digit numbers of column values.

The number of digits that a column value can have is defined. By default, no symbols are used to represent a value that does not exist. For the DX column, the defined number of digits is 3, whereas the number of digits currently displayed for the DX column is 1. You can transform the values of the DX column by using a regular expression. This example shows how to use hyphens (-) to replace the spaces that have no value.


Procedure

1. From the DX column menu, click Edit cells > Transform.
The Custom Text Transform on Column DX dialog is displayed.

2. In the Expression field, enter value.replace(/ /,"-").

3. Move the cursor in the Expression field.

4. Press Tab and double-click Row from the displayed list. Click OK.

Result

All values in the DX column are displayed in three digits. Spaces are replaced by hyphens. One hyphen represents one digit.
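The replacement that this transform performs can be mirrored in Python for illustration. The function name fill_spaces and the sample values are hypothetical, and the sketch assumes that every space in a value is replaced, as the Result above describes.

```python
def fill_spaces(value):
    """Replace every space character with a hyphen."""
    return value.replace(" ", "-")


# Hypothetical DX values padded with spaces to three characters.
codes = ["1  ", "12 ", "123"]
filled = [fill_spaces(code) for code in codes]
```

Values that already use all three positions are unchanged, because they contain no spaces to replace.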

Transforming Date Formats

Use the data transforming function to transform the formats of a date column.

To transform the dates in the VISIT column to the MM/dd/yyyy format:

Procedure

1. From the VISIT column menu, click Edit column > Transform date format.

2. In the New column name field, enter a name for the new column.

3. Keep the default settings, and then click OK.

Result

All the dates in the VISIT column are transformed to the MM/dd/yyyy format.

Cleansing Data

To cleanse your data, you can trim white spaces, remove empty rows, and remove and modify invalid values.

Use one of the following cleansing methods to cleanse the sample data:

● Trimming White Spaces
● Removing Empty Rows
● Removing Invalid Data
● Modifying Invalid Data


Trimming White Spaces

Trim white spaces to save space for your dataset.

Trim the white spaces at the beginning and end of strings so that otherwise identical values do not differ by white space only. Such differences can be tricky to detect. Trimming makes your data consistent and reduces the size of your dataset.

To trim all the leading and trailing white spaces in a data table in one go:

On the toolbar, click Data > Edit columns > Trim all the leading and trailing white spaces.

Removing Empty Rows

Remove empty rows to save space for your dataset.

To remove all empty rows in a data table:

On the toolbar, click Data > Edit rows > Remove all empty rows.
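Trimming white space and removing empty rows can be sketched together in Python. This is an illustration only. The function names and sample table are hypothetical, and treating a row as empty when every cell is an empty string is an assumption about what counts as an empty row.

```python
def trim_cells(rows):
    """Strip leading and trailing white space from every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def drop_empty_rows(rows):
    """Keep only rows that contain at least one non-empty cell."""
    return [row for row in rows if any(cell != "" for cell in row)]


# Hypothetical sample table; the second row contains only white space.
table = [[" 001 ", "M"], ["  ", ""], ["002", " F"]]
cleaned = drop_empty_rows(trim_cells(table))
```

Trimming first matters in this sketch: a row that holds only spaces becomes recognizably empty after trimming and is then removed.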

Removing Invalid Data

Use the data faceting function to filter out invalid data, and then remove the invalid data.

Suppose the normal heart rate of patients ranges from 10 to 200 beats a minute, and any value beyond this range is considered invalid. This example shows how to remove the values within the range of 200 - 910 in the HR column:

Procedure

1. Click the data type icon before the HR column header, and change its data type to Integer.

2. From the HR column menu, click Facet > Numeric facet.
A bar chart is displayed in the HR facet panel. The heart rates of patients consist of 27 numeric values, 3 non-numeric values, and 1 blank cell. The heart rate ranges from 10 to 910 beats a minute.

3. Move the range handles to restrict the range to 200.00 - 910.00.
All the rows with values ranging from 200 to 910 are displayed on the data page.

4. Clear the Non-numeric check box.

5. Clear the Blank check box.

6. From the HR column menu, click Edit cells > Common transforms > Blank out cells.

7. In the HR facet panel, click Reset.

Result

All values within the range of 200 - 910 in the HR column are removed.
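The combined effect of the numeric facet selection and Blank out cells can be sketched in Python. This is an illustration under assumptions, not Clarity's implementation: the function name blank_values_in_range and the sample values are hypothetical, and both range boundaries are treated as inclusive.

```python
def blank_values_in_range(values, low=200, high=910):
    """Blank numeric values within [low, high]; keep all other cells.

    Non-numeric and blank cells are left untouched, mirroring the
    cleared Non-numeric and Blank check boxes in the procedure.
    """
    cleaned = []
    for value in values:
        try:
            number = int(value)
        except ValueError:
            cleaned.append(value)  # not a number: leave as is
            continue
        cleaned.append("" if low <= number <= high else value)
    return cleaned


# Hypothetical HR values.
heart_rates = ["88", "210", "N/A", "", "910", "10"]
result = blank_values_in_range(heart_rates)
```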


Modifying Invalid Data

Use the data faceting function to filter out invalid data, and then modify the invalid data.

This example shows how to modify the numeric values within the range of 200 - 910, non-numeric values, and blank values in the HR column.

Procedure

1. Click the data type icon before the HR column header, and change its data type to Integer.

2. From the HR column menu, click Facet > Numeric facet.
A bar chart is displayed in the HR facet panel. The heart rates of patients consist of 27 numeric values, 3 non-numeric values, and one blank cell. The heart rate ranges from 10 to 910 beats a minute.

3. Move the range handles to restrict the range to 200.00 - 910.00.
The rows with values within the range of 200 - 910, and the rows with non-numeric and blank HR values, are displayed on the data page.

4. Move your mouse pointer over an invalid value. Click Edit.

5. In the "Edit this cell" dialog, enter a new value for the invalid heart rate, and then click OK.
Continue to modify other invalid heart rate values.

Delivering Data

A complete procedure of delivering data involves loading data, analyzing data, transforming data, cleansing data, and exporting data.

The following tasks show how you can deliver data with TIBCO Clarity:

● Creating a Dataset and a Project
● Cleansing Project Data
● Exporting Project Data

Creating a Dataset and a Project

Upload data from a ZIP file to create a new dataset and a project.

Prerequisites

Compress the patients.csv file and rename the compressed file patients.zip.

Procedure

1. On the home page, click Create dataset.

2. On the "Get data from" page, click My computer.


3. In the "File upload" dialog, click Choose file to locate the patients.zip file.

4. On the "Parse file" page, keep the default settings, and then click Next.

5. Rename the dataset to Patients. Click Done adding.

6. On the "Map data" page, click Auto map, and then click Next.

7. On the "Sample data" page, rename the project to patients, and then click Done.

Result

A project named patients is created and you are brought to the project data page.

What to do next

Cleansing Project Data

Cleansing Project Data

Combine a variety of functions such as data faceting and transforming to cleanse project data.

Prerequisites

Creating a Dataset and a Project

The created patients project is used to show how to cleanse data.

Procedure

1. Go to the patients project data page.

2. From the PATNO column menu, click Facet > Text Facet.

3. In the PATNO facet panel, click count.
Patients 2, 3, and 6 have duplicates.

4. Remove the duplicate rows or modify the duplicate value:
a) Click 2.
Two rows of patient 2 with identical information are displayed.
b) Flag either row.
c) On the toolbar, click Data > Edit rows > Remove all flagged rows.
Continue to remove other duplicate rows. In this example, PATNO 3 and PATNO 6 have duplicate values, not duplicate rows, so either remove a duplicate row or modify one duplicate patient number.

5. From the GENDER column menu, click Facet > Text Facet.
The GENDER column contains three types of invalid values: 2, X, and (blank).

6. Remove the invalid value 2:
a) Click 2.
One matching row is displayed.
b) From the GENDER column menu, click Edit cells > Common transforms > Blank out cells to remove this value.

7. From the VISIT column menu, click Facet > Text pattern facet.
Some dates are not in the MM/dd/yyyy format.

8. Transform the invalid date formats to the MM/dd/yyyy format:
a) Click 99/99/99, and then update the value of VISIT, for example, 10/12/1998.
b) Click AAAAAAAA, and then blank out the value of VISIT.


9. Check the HR column data:
a) Click the data type icon before the HR column header, and change its data type to Integer.
b) From the HR column menu, click Facet > Numeric facet.
c) Clear the Numeric check box to display results of non-numeric and blank cells only.
Six matching rows are displayed.
d) From the HR column menu, click Edit cells > Common transforms > Blank out cells.
e) Flag all blank rows.
f) Select the Numeric check box.
g) Move the range handles to the ranges of 10 - 40 and 120 - 910 respectively.
h) Flag all the rows that fall into these two ranges.
i) Click Reset to return to the project data page.
Apply the same transformation rules for the HR column to the SBP, DBP, and DX columns.

When you finish the transformation, all the rows with invalid and incomplete patient values are flagged.

10. From the AE column menu, click Facet > Text Facet:
a) In the AE facet panel, click (blank), and then flag the rows with blank AE value.
b) Click Reset to return to the project data page.

11. On the toolbar, click Data > Edit rows > Remove all flagged rows.

12. On the toolbar, click Data > Edit rows > Remove all validated errors rows.

Result

The patients project data is cleansed.

What to do next

Exporting Project Data

Exporting Project Data

After cleansing data, you can export the project to share with others or reuse in other contexts.

Prerequisites

Cleansing Project Data

You can export project data to various destinations. For more information, see TIBCO Clarity User’s Guide. This example shows how you can export cleansed data to a database.

Procedure

1. On the toolbar, click Export project > To database.
The "Export to database" dialog is displayed.

2. From the JDBC driver list, select a JDBC driver.

3. In the Database URL field, specify the URL to connect to the database.

4. Enter your user name and password that are used to access the database.

5. In the Login timeouts field, specify a timeout interval (in seconds).

6. Click Connect. When the connection is successful, click Next.

7. In the "Export to database" dialog, click Create new table, and then enter a name for the table.

8. Click Finish to export your data to the database.
