how to load data more quickly and accurately into oracle's life sciences data hub

How to Load Data More Quickly and Accurately into Oracle Life Sciences Data Hub


ABOUT PERFICIENT

Perficient is a leading information technology consulting firm serving clients throughout North America.

We help clients implement digital experience, business optimization, and industry solutions that cultivate and captivate customers, drive efficiency and productivity, integrate business processes, reduce costs, and create a more agile enterprise.


PERFICIENT PROFILE

Founded in 1997

Public, NASDAQ: PRFT

2014 revenue $456.7 million

Major market locations: Allentown, Atlanta, Ann Arbor, Boston, Charlotte, Chattanooga, Chicago, Cincinnati, Columbus, Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis, Lafayette, Milwaukee, Minneapolis, New York City, Northern California, Oxford (UK), Southern California, St. Louis, Toronto

Global delivery centers in China and India

>2,600 colleagues

Dedicated solution practices

~90% repeat business rate

Alliance partnerships with major technology vendors

Multiple vendor/industry technology and growth awards


OUR SOLUTIONS PORTFOLIO

Business Solutions: Business Process Management, Customer Relationship Management, Enterprise Performance Management, Enterprise Information Solutions, Enterprise Resource Planning, Experience Design, Portal / Collaboration, Content Management, Information Management, Mobile

50+ Partners

Clinical / Healthcare IT: Safety / PV, Clinical Data Management, Electronic Data Capture, Medical Coding, Data Warehousing, Data Analytics, Clinical Trial Management, Precision Medicine

Services: Consulting, Implementation, Integration, Migration, Upgrade, Managed Services, Private Cloud Hosting, Validation, Study Setup, Project Management, Application Development, Software Licensing, Application Support, Staff Augmentation, Training


WELCOME & INTRODUCTION

Extensive clinical trial software implementation experience

• 20 years of experience in the life sciences industry

• Extensive experience with Oracle’s clinical data warehousing, analytics, and precision medicine applications

• Expertise in improving and standardizing business processes to support best practices and the ever-changing regulatory requirements

Kathryn Hanson, Solutions Architect, Life Sciences

Perficient


WHAT’S TRENDING IN TECHNOLOGY?

• Big Data

  – How do we acquire data from other sources?

  – How do we manage high-volume data?

• Data analytics

  – What conclusions can we draw from the raw data?

• Data privacy and security

  – How do we control who has access to our data?


WHICH ISSUES ARE WE FACING?

The pharmaceutical industry has many of these same technology issues:

• How do we acquire data from external sources?

• How do we manage high-volume data?

• How can we present that data for analysis?

• How can we secure our data against unauthorized access?

How can we acquire and manage the data we receive from many different sources?


WHAT WOULD WE LIKE IN A SOLUTION?

Hands-off and automated

• After the initial setup, only routine monitoring is needed

Flexible

• Adapts as data changes over time

• Handles multiple file types

• Can start other jobs as needed when the load is complete

Reliable and secure

Efficient and performs well on high-volume data

Simple to implement


THE SOLUTION: AUTOMATED FILE LOAD

[Diagram: a data file passes through numbered steps 1 to 4 involving the Secure Staging Area, the File Load Utility, the Warehouse (Study: Staged data, Transformed data, Analysis programs), and Quality Assurance.]


WHAT DO I NEED TO GET STARTED?

• A repository to receive and manage the clinical data (in this presentation, that’s the Oracle Life Sciences Warehouse)

• Resources to set up and monitor the system

• Secure directories to receive and process data files

• Utility software to process the files and load the data into the repository

• Scheduling software to control when, where, and how jobs run

• A way to register new data sources to the utility


HOW DO I BEGIN LOADING DATA?

• Work with the vendor to:

  – Understand the file format, data structures, file naming conventions, etc.

  – Provide secure access to the download area

  – Receive a sample data file

• Register the new data source in the utility

• Set up the storage areas in the repository

• Test the new data source to verify it loads correctly into the repository

• Complete any other setup needed so authorized users can access the data (for transformations, visualizations, etc.)


SETTING UP THE DIRECTORIES

<root directory>

+ stagedir: The data file is dropped into this watched directory

+ processdir: The pre-processed files are moved here for final processing and loading into the warehouse

+ rejectdir: The data file is moved here if the file load fails

+ successdir: The data file is moved here when the job finishes successfully

+ scripts: Utility software is stored in this directory
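The directory flow above can be sketched in Python. This is a minimal illustration, not the utility's actual code: the subdirectory names come from the slide, while the functions `ensure_layout`, `move_to`, and `route_after_load` are hypothetical, and a real deployment would also need the secure permissions the slides call for.

```python
import shutil
from pathlib import Path

# Subdirectory names from the slide; the root path itself is site-specific.
SUBDIRS = ("stagedir", "processdir", "rejectdir", "successdir", "scripts")


def ensure_layout(root: Path) -> dict[str, Path]:
    """Create the subdirectories under root if missing and return their paths."""
    paths = {name: root / name for name in SUBDIRS}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


def move_to(data_file: Path, target_dir: Path) -> Path:
    """Move a data file into target_dir and return its new location."""
    destination = target_dir / data_file.name
    shutil.move(str(data_file), str(destination))
    return destination


def route_after_load(data_file: Path, paths: dict[str, Path], succeeded: bool) -> Path:
    """File a processed data file under successdir or rejectdir."""
    target = paths["successdir"] if succeeded else paths["rejectdir"]
    return move_to(data_file, target)
```

In this sketch a file would travel stagedir → processdir → successdir (or rejectdir), so each directory always reflects where a file is in its life cycle.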


SETTING UP STUDY REGISTRATION

The first 3 attributes identify the study and data type

These 3 attributes tell the utility where to store the data in the repository

There are many options that control how the data should be loaded
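A registration entry can be pictured as a small record keyed by its identifying attributes. The sketch below assumes those attributes are the study, data type, and environment (the same parts that appear in the data file name) and that the storage attributes are a domain, application area, and work area; all field names, and the `Registration` and `Registry` classes, are hypothetical stand-ins for the utility's own registration table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    # Identifying attributes (hypothetical names): which study and data type.
    study: str
    data_type: str
    environment: str
    # Storage attributes (hypothetical names): where the data lands in the repository.
    domain: str
    app_area: str
    work_area: str
    # Two example load options; the real utility offers many more.
    allow_metadata_changes: str = "N"  # "Y" accepts compatible metadata changes
    notify: tuple[str, ...] = ()       # who receives notifications


class Registry:
    """In-memory stand-in for the utility's study registration table."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str, str], Registration] = {}

    def register(self, reg: Registration) -> None:
        # Key on the identifying attributes only.
        self._entries[(reg.study, reg.data_type, reg.environment)] = reg

    def lookup(self, study: str, data_type: str, environment: str) -> Registration:
        key = (study, data_type, environment)
        if key not in self._entries:
            raise KeyError(f"no registration for {key}")
        return self._entries[key]
```

Keying the lookup on the identifying attributes means that a data file whose name yields the same three values finds its registration, and with it the storage location and load options.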


NAMING CONVENTIONS FOR THE DATA FILE

File naming conventions ensure that the utility can identify the registered study:

CDISC01 – The study name

FULL – The type of data that will be loaded

DEV – The environment: development, test, or production data

201509211010 – A unique date and time stamp
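A minimal sketch of how a utility could check this naming convention, assuming (hypothetically) that the four parts are joined with underscores, that the file has an extension such as `.zip`, that the environment is one of `DEV`, `TEST`, or `PROD`, and that the stamp is `YYYYMMDDHHMM`. Under those assumptions, a name like `CDISC01_FULL_DEV_201509211010.zip` parses into the four parts listed above.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePath

# Assumptions (hypothetical): parts are joined with "_", the environment is one
# of the values below, and the stamp is a 12-digit YYYYMMDDHHMM value.
SEPARATOR = "_"
ENVIRONMENTS = {"DEV", "TEST", "PROD"}
STAMP_FORMAT = "%Y%m%d%H%M"


@dataclass(frozen=True)
class DataFileName:
    study: str
    data_type: str
    environment: str
    stamp: datetime


def parse_data_file_name(filename: str) -> DataFileName:
    """Split a data file name into study, data type, environment, and stamp."""
    stem = PurePath(filename).stem  # drop the extension, e.g. ".zip"
    parts = stem.split(SEPARATOR)
    if len(parts) != 4:
        raise ValueError(f"expected 4 name parts, got {len(parts)}: {filename!r}")
    study, data_type, environment, stamp = parts
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment {environment!r} in {filename!r}")
    if len(stamp) != 12 or not stamp.isdigit():
        raise ValueError(f"stamp must be 12 digits (YYYYMMDDHHMM): {stamp!r}")
    # strptime also rejects impossible dates and times, such as month 13.
    return DataFileName(study, data_type, environment,
                        datetime.strptime(stamp, STAMP_FORMAT))
```

Because this sketch splits on underscores, a study name that itself contains an underscore would need a different separator or a stricter pattern.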


ADDING OTHER PROCESSING OPTIONS

The utility lets you specify how you want to handle the data:

• Running another job after the data loads

• Handling blinded data

• Sending out notifications

• Processing large files

• Managing changes in data structures

• Identifying file formats for text files
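For the last option, one common way to identify a delimited text format is to inspect a sample of the file. The sketch below uses Python's standard `csv.Sniffer`, which guesses the delimiter and whether a header row is present using heuristics; the candidate delimiter list and the `detect_text_format` function are assumptions for illustration, and the utility itself may rely on explicit per-study settings instead.

```python
import csv

# Assumption: delimiters a vendor might plausibly use for text datasets.
CANDIDATE_DELIMITERS = ",|\t;"


def detect_text_format(sample: str) -> tuple[str, bool]:
    """Guess the delimiter of a text dataset and whether it has a header row."""
    sniffer = csv.Sniffer()
    try:
        dialect = sniffer.sniff(sample, delimiters=CANDIDATE_DELIMITERS)
    except csv.Error as exc:
        # No candidate delimiter appears consistently across the sample lines.
        raise ValueError("could not determine the text file format") from exc
    return dialect.delimiter, sniffer.has_header(sample)
```

Sniffing is a heuristic, so a sample of several representative lines gives better results than a single row, and an explicit registration setting should win when one exists.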


SETTING UP THE REPOSITORY

The data will be loaded into the work area under which you registered the study.

[Diagram: Warehouse > Study > Staged data, Transformed data, Analysis programs]


WHAT THE UTILITY DOES

Your vendor has uploaded a data file; now the utility…

1. Detects the file and runs a set of preprocessing checks

2. Extracts all the datasets (text files, etc.)

3. Extracts the metadata for each dataset

4. Verifies the metadata for each dataset. If the dataset has been loaded before, either:

  • The new metadata must match that of the previous load, or

  • The study must allow compatible metadata updates
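Steps 2 and 3 can be sketched as follows, assuming the vendor's data file is a `.zip` archive of comma-delimited `.csv` or `.txt` datasets, each with a header row, and treating the column names as each dataset's metadata. The function name and the naming rule (dataset name = member file name without its extension, uppercased) are hypothetical; real metadata would also include data types and lengths.

```python
import csv
import io
import zipfile
from typing import BinaryIO, Union


def extract_dataset_metadata(source: Union[str, BinaryIO]) -> dict[str, list[str]]:
    """Return {dataset name: column names} for each text dataset in the archive."""
    metadata: dict[str, list[str]] = {}
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            # Skip folders and anything that is not a text dataset.
            if info.is_dir() or not info.filename.lower().endswith((".csv", ".txt")):
                continue
            base = info.filename.rsplit("/", 1)[-1]  # drop any folder prefix
            name = base.rsplit(".", 1)[0].upper()     # drop the extension
            with io.TextIOWrapper(archive.open(info), encoding="utf-8", newline="") as text:
                header = next(csv.reader(text), [])   # first row holds the column names
            metadata[name] = [column.strip().upper() for column in header]
    return metadata
```

Reading only the header row keeps this step cheap even for large datasets; the full rows are left for the load sets.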


If everything checks out, the utility continues by…

5. Creating a load set for each dataset in the data file (if one doesn’t already exist)

6. Updating the repository metadata, if required

7. Starting each of the load sets

8. Monitoring the running jobs for errors

9. Sending notifications to users, as required, when all the jobs are done


THE RESULTS

For efficiency, the utility processes all the datasets in the file in parallel…


…and when all the jobs are done the data is loaded and available in the repository.
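Running the load sets in parallel, monitoring them for errors, and preparing a notification (steps 7 to 9) can be sketched with Python's `concurrent.futures`. Here `start_load_set` is a hypothetical callable standing in for whatever submits and waits on one load set in the repository; the sketch assumes it raises an exception when that load fails.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_load_sets(datasets: list[str], start_load_set: Callable[[str], None],
                  max_workers: int = 4) -> dict[str, str]:
    """Run one load per dataset in parallel; return {dataset: "OK" or error text}."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(start_load_set, name): name for name in datasets}
        for future in as_completed(futures):  # yields each job as it finishes
            name = futures[future]
            try:
                future.result()  # re-raises any exception from the job
                results[name] = "OK"
            except Exception as exc:  # record the failure, keep monitoring the rest
                results[name] = f"FAILED: {exc}"
    return results


def summarize(results: dict[str, str]) -> str:
    """Build a one-line notification once every job is done."""
    failed = sorted(name for name, status in results.items() if status != "OK")
    if not failed:
        return f"All {len(results)} load set(s) completed successfully."
    return f"{len(failed)} of {len(results)} load sets failed: {', '.join(failed)}"
```

Collecting every result before notifying, rather than stopping at the first failure, matches the slide's description of sending notifications once all the jobs are done.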


WHAT HAPPENS IF THE METADATA CHANGES?

• One of the options you can choose is whether or not to allow changes to a table’s metadata

• If that flag is “Y”, the utility will accept and process compatible changes

• For example, suppose you need to add two new columns to the table…


The table in the repository now has those two additional columns.
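The compatibility rule can be sketched as a comparison of column definitions. This assumes a simple, hypothetical definition of "compatible": every existing column keeps its name and data type, and the only difference is newly added columns. The `check_metadata` function and the column types in the usage note are illustrative; the utility's actual rules may accept other changes.

```python
from typing import Optional


def check_metadata(previous: Optional[dict[str, str]], new: dict[str, str],
                   allow_changes: str) -> list[str]:
    """Return the columns to add (all of them on a first load).

    Raises ValueError when the load must stop. Metadata is modeled as
    {column name: data type}; allow_changes is the study's "Y"/"N" flag.
    """
    if previous is None:
        return list(new)  # first load: every column is new
    if new == previous:
        return []  # metadata matches the previous load exactly
    # Any existing column that disappeared or changed type is incompatible.
    removed_or_changed = [col for col, typ in previous.items() if new.get(col) != typ]
    if removed_or_changed:
        raise ValueError(f"incompatible change to columns: {removed_or_changed}")
    if allow_changes != "Y":
        raise ValueError("metadata changed but the study does not allow updates")
    return [col for col in new if col not in previous]
```

With the flag set to "Y", adding `HEIGHT` and `WEIGHT` to a table defined as `USUBJID`, `AGE` returns `['HEIGHT', 'WEIGHT']`; with the flag set to "N", the same change raises `ValueError`.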


DOES THE UTILITY MEET OUR GOALS?

Automated and hands-off

Flexible

Efficient

Simple to implement


QUESTIONS

Type your question into the chat box


FOLLOW US ONLINE

• Perficient.com/SocialMedia

• Facebook.com/Perficient

• Twitter.com/Perficient_LS

• Blogs.perficient.com/LifeSciences


THANK YOU
