How to Load Data More Quickly and Accurately into Oracle's Life Sciences Data Hub
TRANSCRIPT
How to Load Data More Quickly and Accurately
into Oracle Life Sciences Data Hub
2
ABOUT PERFICIENT
Perficient is a leading information
technology consulting firm serving
clients throughout North America.
We help clients implement digital experience, business
optimization, and industry solutions that cultivate and captivate
customers, drive efficiency and productivity, integrate business
processes, reduce costs, and create a more agile enterprise.
3
PERFICIENT PROFILE
Founded in 1997
Public, NASDAQ: PRFT
2014 revenue $456.7 million
Major market locations:
Allentown, Atlanta, Ann Arbor, Boston, Charlotte,
Chattanooga, Chicago, Cincinnati, Columbus,
Dallas, Denver, Detroit, Fairfax, Houston,
Indianapolis, Lafayette, Milwaukee, Minneapolis,
New York City, Northern California, Oxford (UK),
Southern California, St. Louis, Toronto
Global delivery centers in China and India
>2,600 colleagues
Dedicated solution practices
~90% repeat business rate
Alliance partnerships with major technology vendors
Multiple vendor/industry technology and growth awards
4
OUR SOLUTIONS PORTFOLIO
BUSINESS SOLUTIONS
Business Process Management
Customer Relationship Management
Enterprise Performance Management
Enterprise Information Solutions
Enterprise Resource Planning
Experience Design
Portal / Collaboration
Content Management
Information Management
Mobile
CLINICAL / HEALTHCARE IT
Safety / PV
Clinical Data Management
Electronic Data Capture
Medical Coding
Data Warehousing
Data Analytics
Clinical Trial Management
Precision Medicine
SERVICES
Consulting
Implementation
Integration
Migration
Upgrade
Managed Services
Private Cloud Hosting
Validation
Study Setup
Project Management
Application Development
Software Licensing
Application Support
Staff Augmentation
Training
50+ PARTNERS
5
WELCOME & INTRODUCTION
Extensive clinical trial software implementation experience
• 20 years of experience in the life sciences industry
• Extensive experience with Oracle’s clinical data warehousing, analytics, and
precision medicine applications
• Expertise in improving and standardizing business processes to support best
practices and the ever-changing regulatory requirements
Kathryn Hanson, Solutions Architect, Life Sciences
Perficient
6
WHAT’S TRENDING IN TECHNOLOGY?
• Big Data
• How do we acquire data from other sources?
• How do we manage high volume data?
• Data analytics
• What conclusions can we draw from the raw data?
• Data privacy and security
• How do we control who has access to our data?
7
WHICH ISSUES ARE WE FACING?
The pharmaceutical industry has many of these same technology issues:
• How do we acquire data from external sources?
• How do we manage high volume data?
• How can we present that data for analysis?
• How can we secure our data against unauthorized access?
How can we acquire and manage the data we
receive from many different sources?
8
WHAT WOULD WE LIKE IN A SOLUTION?
Hands off and automated
• After the initial setup, only routine monitoring is needed
Flexible
• Adapts as data changes over time
• Handles multiple file types
• Can start other jobs as needed when the load is complete
Reliable and secure
Efficient and performs well on high-volume data
Simple to implement
9
THE SOLUTION: AUTOMATED FILE LOAD
[Diagram with four numbered steps. Labels: Data file; Secure Staging Area; File Load Utility; Quality Assurance; Warehouse; Study; Staged data; Transformed data; Analysis programs]
10
WHAT DO I NEED TO GET STARTED?
• A repository to receive and manage the clinical data (in this presentation, that’s the Oracle Life Sciences Warehouse)
• Resources to set up and monitor the system
• Secure directories to receive and process data files
• Utility software to process the files and load the data into the repository
• Scheduling software to control when, where, and how jobs run
• A way to register new data sources to the utility
11
HOW DO I BEGIN LOADING DATA?
• Work with the vendor to
• Understand the file format, data structures, file naming conventions, etc.
• Provide secure access to the download area
• Receive a sample data file
• Register the new data source in the utility
• Set up the storage areas in the repository
• Test the new data source to verify it loads correctly into the repository
• Complete any other setup needed so authorized users can access the data (for transformations, visualizations, etc.)
12
SETTING UP THE DIRECTORIES
<root directory>
    stagedir     The data file is dropped into this watched directory
    processdir   The pre-processed files are moved here for final processing and loading into the warehouse
    rejectdir    The data file is moved here if the file load fails
    scripts      Utility software is stored in this directory
    successdir   The data file is moved here when the job finishes successfully
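As a minimal sketch of the directory layout described on this slide, the Python below creates the five directories under a root and moves a finished file to successdir or rejectdir. The directory names come from the slide; the helper names `create_layout` and `route_file` are hypothetical and are not part of the utility itself, whose actual behavior may differ.

```python
from pathlib import Path
import shutil

# Directory names as listed on the slide.
SUBDIRS = ("stagedir", "processdir", "rejectdir", "scripts", "successdir")


def create_layout(root: Path) -> dict[str, Path]:
    """Create the five working directories under root (hypothetical helper)."""
    paths = {name: root / name for name in SUBDIRS}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


def route_file(data_file: Path, layout: dict[str, Path], loaded_ok: bool) -> Path:
    """Move a processed file to successdir or rejectdir, depending on the outcome."""
    target_dir = layout["successdir"] if loaded_ok else layout["rejectdir"]
    destination = target_dir / data_file.name
    shutil.move(str(data_file), str(destination))
    return destination
```

In practice the root would sit on a secured file system, and access permissions would be set separately; this sketch only covers the folder structure and the success/failure routing.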
13
SETTING UP STUDY REGISTRATION
• The first 3 attributes identify the study and data type
• A further 3 attributes tell the utility where to store the data in the repository
• There are many options that control how the data should be loaded
14
NAMING CONVENTIONS FOR THE DATA FILE
File naming conventions ensure that the utility can identify
the registered study
CDISC01 – The study name
FULL – The type of data that will be loaded
DEV – Is this development, test, or production data?
201509211010 – A unique date and time stamp
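To illustrate how a name built from these four parts could be matched to a registered study, here is a small Python sketch. The slide does not show the delimiter, the file extension, or the exact timestamp layout, so the underscore separator, the optional extension, and the `YYYYMMDDHHMM` format are all assumptions, as are the names `DataFileName` and `parse_data_file_name`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataFileName:
    study: str          # e.g. CDISC01
    data_type: str      # e.g. FULL
    environment: str    # e.g. DEV
    stamp: datetime     # parsed from e.g. 201509211010


def parse_data_file_name(file_name: str) -> DataFileName:
    """Split a file name into its four parts.

    Assumes an underscore delimiter, an optional extension, and a
    YYYYMMDDHHMM time stamp; none of these is confirmed by the slide.
    """
    stem = file_name.rsplit(".", 1)[0] if "." in file_name else file_name
    parts = stem.split("_")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 underscore-separated parts, got {len(parts)}: {file_name!r}"
        )
    study, data_type, environment, raw_stamp = parts
    stamp = datetime.strptime(raw_stamp, "%Y%m%d%H%M")
    return DataFileName(study, data_type, environment, stamp)
```

A file that does not follow the convention raises `ValueError`, which is the point at which a loader would typically reject it rather than guess the study.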
15
ADDING OTHER PROCESSING OPTIONS
The utility lets you specify how you want to handle the data:
• Running another job after the data loads
• Handling blinded data
• Sending out notifications
• Processing large files
• Managing changes in data structures
• Identifying file formats for text files
16
SETTING UP THE REPOSITORY
The data will be loaded into the work area under which you registered the study.
[Diagram labels: Warehouse; Study; Staged data; Transformed data; Analysis programs]
17
WHAT THE UTILITY DOES
Your vendor has uploaded a data file; now the utility…
1. Detects the file and runs a set of preprocessing checks
2. Extracts all the datasets (text files, etc.)
3. Extracts the metadata for each dataset
4. Verifies the metadata for each dataset. If the dataset has been loaded before, either
• The new metadata must match that in the previous load
OR
• The study allows compatible metadata updates
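The verification rule in step 4 can be sketched in Python as follows. This is only an illustration: metadata is modelled as a mapping from column name to data type, and "compatible" is assumed to mean that every existing column is unchanged and the only differences are added columns. The utility's real definition of a compatible update is not given on the slide, and the function name `metadata_check` is hypothetical.

```python
from typing import Optional


def metadata_check(previous: Optional[dict[str, str]],
                   incoming: dict[str, str],
                   allow_updates: bool) -> bool:
    """Return True if the incoming dataset metadata may be loaded.

    previous is None when the dataset has never been loaded before.
    """
    if previous is None:
        return True   # first load: nothing to compare against
    if incoming == previous:
        return True   # metadata matches the previous load
    if not allow_updates:
        return False  # study does not allow metadata changes
    # Assumed compatibility rule: all previous columns still present
    # with the same type; extra columns are accepted.
    return all(incoming.get(col) == dtype for col, dtype in previous.items())
```

Under this assumed rule, adding columns passes when updates are allowed, while dropping a column or changing its type fails in every case.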
18
WHAT THE UTILITY DOES
If everything checks out, the utility continues by…
5. Creating a load set for each dataset in the data file (if one doesn’t already exist)
6. Updating the repository metadata, if required
7. Starting each of the load sets
8. Monitoring the running jobs for errors
9. Sending notifications to users, as required, when all the jobs are done
19
THE RESULTS
For efficiency, the utility processes all the datasets in the file in parallel…
20
THE RESULTS
…and when all the jobs are done the data is loaded and available in the repository.
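The parallel pattern described on these two slides (start one job per dataset, watch for errors, finish when all jobs are done) might look like the Python sketch below. It uses the standard library's thread pool as a stand-in; how the utility actually schedules its load sets is not shown in the slides, and `load_dataset` and `load_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_dataset(name: str) -> str:
    """Stand-in for one load job; a real job would write to the repository."""
    if not name:
        raise ValueError("dataset name is empty")
    return f"{name}: loaded"


def load_all(dataset_names: list[str]) -> tuple[dict[str, str], dict[str, Exception]]:
    """Run one load job per dataset in parallel and collect results and errors."""
    results: dict[str, str] = {}
    errors: dict[str, Exception] = {}
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(load_dataset, name): name for name in dataset_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure; the other jobs continue to run.
                errors[name] = exc
    return results, errors
```

Collecting errors per dataset, rather than stopping at the first failure, mirrors the idea of monitoring all running jobs and sending a notification only once every job has finished.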
21
WHAT HAPPENS IF THE METADATA CHANGES?
• One of the options you can choose is whether or not to allow changes to a table’s metadata
• If that flag is “Y”, the utility will accept and process compatible changes
• For example, suppose you need to add two new columns to the table…
22
WHAT HAPPENS IF THE METADATA CHANGES?
The table in the repository now has those two additional columns:
23
DOES THE UTILITY MEET OUR GOALS?
Automated and hands off
Flexible
Efficient
Simple to implement
24
QUESTIONS
Type your question into the
chat box
25
FOLLOW US ONLINE
• Perficient.com/SocialMedia
• Facebook.com/Perficient
• Twitter.com/Perficient_LS
• Blogs.perficient.com/LifeSciences
26
THANK YOU