CDISC Italian User Group 2007

Analysis Data Model (ADaM)

Annamaria Muraro, Helsinn Healthcare

Data flow in clinical studies

• Raw datasets (= SDTM)
  – Data from a clinical trial
  – Source: CRF
• Analysis datasets (= ADaM data)
  – Datasets used in the analysis, restructured and containing additional information (derived variables, flags, etc.)
  – Source: raw datasets
• Two sets of data, each with a specific purpose

FDA requirements

Analysis Data Model: General Considerations

Document: http://www.cdisc.org/models/adam/V2.0/index.html

• Analysis Data Model Version 2.0 (November 2006)
  – Key principles for analysis datasets
  – Conventions for standard analysis variables
  – A model for the subject-level analysis dataset
  – Metadata for analysis datasets:
    • Analysis dataset metadata
    • Analysis variable metadata
    • Analysis results metadata
  – Analysis datasets are discussed within the context of electronic submissions to the FDA, but “the same principles and standards will apply, regardless of the purpose of the analysis datasets”

Key Principles for Analysis Dataset Creation

Analysis datasets should:
• facilitate clear and unambiguous communication of the content, source and quality of the datasets supporting the statistical analyses
• be usable by currently available tools (SAS XPT)
• be “analysis-ready”, or “one statistical procedure away”; redundancy may be acceptable
• be well documented: metadata and other documentation should provide a clear description of the analytic results, including the statistical methods, transformations, assumptions, derivations and imputations performed
• include the optimum number of datasets
• guarantee traceability

SDTM and ADaM

• SDTM
  – Source data
  – Vertical structure
  – No redundancy
  – Character variables
  – Each domain is specific to itself
  – Dates are ISO 8601 character strings
  – Two-character dataset names
  – Data transfer
  – Interoperability
• ADaM
  – Derived data
  – Structure is not necessarily vertical
  – Redundancy is needed for easy analysis
  – Numeric variables
  – Combines variables across multiple domains
  – Dates are stored as numeric values (e.g. SAS dates) to allow manipulation
  – Dataset name: ADXXXX
  – Analytic and graphical analysis
  – Clear communication of the statistical analysis and related decisions

BOTH ARE NEEDED FOR FDA REVIEW!!
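
The difference in date handling can be illustrated with a minimal sketch. Python is used here only for illustration (the presentation itself refers to SAS). It converts a complete ISO 8601 date, as stored in an SDTM --DTC variable, into a SAS-style numeric date, i.e. the number of days since 1 January 1960. The function name and the decision to leave partial dates unconverted are assumptions, not part of the ADaM document.

```python
from datetime import date
from typing import Optional

# SAS stores a date as the number of days since 1 January 1960.
SAS_EPOCH = date(1960, 1, 1)


def iso_to_sas_date(dtc: Optional[str]) -> Optional[int]:
    """Convert an SDTM --DTC value (ISO 8601 text) to a numeric SAS-style date.

    Only complete dates (YYYY-MM-DD, optionally followed by a time part) are
    converted. Partial dates such as "2006-11" return None, because imputing
    them is a separate decision that must be documented.
    """
    if not dtc or len(dtc) < 10:
        return None
    # Keep the date part and drop any time component, e.g. "T08:30".
    return (date.fromisoformat(dtc[:10]) - SAS_EPOCH).days


print(iso_to_sas_date("2000-01-01"))  # 14610
print(iso_to_sas_date("2006-11"))     # None
```

Once dates are numeric, durations and study days become simple subtraction, which is the "manipulation" the ADaM column refers to.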

Analysis Dataset Variables

• Analysis dataset variables should be compliant with SDTM standards:
  – Maintain SDTM variable attributes (if the identical variable also exists in an SDTM dataset)
  – Follow naming conventions for datasets and variables consistent with the SDTM conventions, where feasible
• Analysis variables to be included:
  – Identifiers
  – Analysis population indicators
  – Analysis date variables
  – Analysis study day variables
  – Visit time variables
  – Numeric code variables
  – Analysis treatment variables
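
Analysis study day variables are a typical derived numeric variable. The sketch below, in Python for illustration only, follows the SDTM --DY convention, where there is no day 0: the reference date is day 1 and the day before it is day -1. Which date serves as the reference (for example, first dose) is study-specific and is an assumption here, as is the function name.

```python
from datetime import date


def study_day(event_date: date, ref_date: date) -> int:
    """Study day of event_date relative to ref_date.

    Uses the SDTM --DY convention: there is no day 0, so ref_date itself
    is day 1 and the day before it is day -1.
    """
    delta = (event_date - ref_date).days
    return delta + 1 if delta >= 0 else delta


first_dose = date(2007, 3, 1)  # hypothetical reference date
print(study_day(date(2007, 3, 1), first_dose))   # 1
print(study_day(date(2007, 2, 28), first_dose))  # -1
```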

Analysis Dataset Variables: Analysis Population

• Analysis datasets should include analysis population flags at whatever level (e.g. subject, visit or measurement) is necessary to clearly describe the population used for any analysis
• Variables used to identify specific populations:
  – FULLSET, SAFETY, PPROT
• Population flags may be required at visit level:
  – FULLV, SAFV
• Population flags may be present in the SDTM (supplemental qualifier domain)

Analysis Dataset Variables: Numeric Code Variables

• When a numeric version of a categorical variable is required for statistical purposes, append an ‘N’ to the SDTM variable name (e.g. SEX → SEXN)
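
A minimal Python sketch of the naming rule: each categorical variable with a code list gets a companion variable whose name is the SDTM name plus ‘N’. The code lists reuse the SEXN and RACEN values from the vital signs dataset example in this presentation; the function name, the case-insensitive lookup and leaving unmapped values missing are assumptions.

```python
from typing import Optional

# Hypothetical code lists; values taken from the vital signs dataset example
# in this presentation (1=Male, 2=Female; 1=White ... 9=Other).
CODE_LISTS = {
    "SEX": {"M": 1, "F": 2},
    "RACE": {"WHITE": 1, "BLACK": 2, "HISPANIC": 3, "ASIAN": 4, "OTHER": 9},
}


def add_numeric_codes(record: dict) -> dict:
    """Return a copy of record with an 'N'-suffixed numeric variable
    added for each categorical variable that has a code list."""
    out = dict(record)
    for var, codes in CODE_LISTS.items():
        value: Optional[str] = record.get(var)
        # Unmapped or missing values (e.g. SEX = "U") stay missing (None).
        out[var + "N"] = codes.get(value.upper()) if value else None
    return out


print(add_numeric_codes({"SEX": "M", "RACE": "Asian"}))
# {'SEX': 'M', 'RACE': 'Asian', 'SEXN': 1, 'RACEN': 4}
```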

Analysis Dataset Variables: Analysis Treatment Variables

• Treatment variables are required in all analysis datasets:
  – Planned treatment (TRTP character, TRTPN numeric)
  – Actual treatment (TRTA character, TRTAN numeric)
• If an analysis is performed on the actual treatment instead of the planned treatment, actual treatment variables are required in addition to the planned treatment variables
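
The rule above can be sketched in Python as follows, for illustration only: planned treatment variables are always added, actual treatment variables only when needed, and an analysis by actual treatment fails loudly if TRTA is absent. The treatment code list matches the sample ADSL in this presentation (PLACEBO = 0, DRUG A = 1); the function names and the example subject are hypothetical.

```python
from typing import Optional

# Hypothetical treatment code list, matching the sample ADSL in this
# presentation (PLACEBO = 0, DRUG A = 1).
TRT_CODES = {"PLACEBO": 0, "DRUG A": 1}


def add_treatment_vars(record: dict, planned: str, actual: Optional[str] = None) -> dict:
    """Add TRTP/TRTPN and, when an actual treatment is given, TRTA/TRTAN."""
    out = dict(record)
    out["TRTP"], out["TRTPN"] = planned, TRT_CODES[planned]
    if actual is not None:
        out["TRTA"], out["TRTAN"] = actual, TRT_CODES[actual]
    return out


def treatment_group(record: dict, by_actual: bool = False) -> str:
    """Return the treatment that drives the analysis grouping."""
    if by_actual:
        if "TRTA" not in record:
            raise ValueError("analysis by actual treatment requires TRTA/TRTAN")
        return record["TRTA"]
    return record["TRTP"]


# Hypothetical subject randomized to DRUG A who actually received PLACEBO.
subj = add_treatment_vars({"USUBJID": "0001-1"}, "DRUG A", "PLACEBO")
print(treatment_group(subj), treatment_group(subj, by_actual=True))  # DRUG A PLACEBO
```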

Subject-Level Dataset (ADSL)

• One record per subject
• All the variables describing the analysis population
• Demographic data (age, sex, race, other relevant factors)
• Baseline characteristics
• Disease factors
• Treatment code/group
• Factors that could affect response to therapy
• Other relevant variables (smoking, alcohol intake, ...)
• Population flags
• Data included in the subject-level analysis dataset can be used as the source for data in other analysis datasets (derive variables only once!)
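
"Derive variables only once" means other analysis datasets copy subject-level values from ADSL instead of recomputing them. A minimal Python sketch, for illustration only: the selection of variables to carry over and the function name are hypothetical, and the sample values are those of subject 0001-1 in the ADSL example.

```python
# Hypothetical selection of ADSL variables to carry into other datasets.
ADSL_KEEP = ("AGE", "SEX", "RACE", "TRTP", "TRTPN", "SAFETY")


def merge_adsl(records: list, adsl: list, keep=ADSL_KEEP) -> list:
    """Copy subject-level variables from ADSL onto each record, by USUBJID.

    Values are copied, never re-derived, so every analysis dataset
    carries exactly the values stored in ADSL.
    """
    by_subject = {s["USUBJID"]: s for s in adsl}
    if len(by_subject) != len(adsl):
        raise ValueError("ADSL must contain one record per subject")
    merged = []
    for rec in records:
        subject = by_subject[rec["USUBJID"]]  # KeyError: subject missing from ADSL
        merged.append({**rec, **{var: subject[var] for var in keep}})
    return merged


adsl = [{"USUBJID": "0001-1", "AGE": 30, "SEX": "F", "RACE": "WHITE",
         "TRTP": "DRUG A", "TRTPN": 1, "SAFETY": "Y"}]
ae = [{"USUBJID": "0001-1", "AESEQ": 1}]
print(merge_adsl(ae, adsl)[0]["TRTP"])  # DRUG A
```

Failing on a missing subject or a duplicated ADSL record, rather than silently skipping it, keeps the traceability problem visible.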

ADSL, Example

SAMPLE DATASET FOR ADSL
Obs | Studyid | USUBJID | SAFETY | ITT | PPROT | COMPLT | DSREAS | AGE | AGEGRP
1 | XX0001 | 0001-1 | Y | Y | Y | Y | | 30 | 21-35
2 | XX0001 | 0001-2 | Y | Y | N | N | ADVERSE EVENT | 38 | 36-50

SAMPLE DATASET FOR ADSL (continued)
Obs | AGEGRPN | SEX | RACE | RACEN | TRTP | TRTPN | HEIGHTBL | WEIGHTBL | BMIBL
1 | 2 | F | WHITE | 1 | DRUG A | 1 | 170 | 63.5 | 21.97
2 | 3 | M | ASIAN | 4 | PLACEBO | 0 | 183 | 86.2 | 25.74

• Dataset named “ADxxxxxx”
• SDTM variable with no changes
• ADaM treatment variable
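
The presentation does not state how BMIBL is derived, but the two sample rows are consistent with the usual definition, weight (kg) divided by the square of height (m), rounded to two decimals. A Python sketch under that assumption, for illustration only:

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) / height (m) squared, rounded to 2 decimals."""
    height_m = height_cm / 100
    return round(weight_kg / height_m ** 2, 2)


# The two sample ADSL subjects
print(bmi(63.5, 170))  # 21.97
print(bmi(86.2, 183))  # 25.74
```

Deriving BMIBL once in ADSL, from HEIGHTBL and WEIGHTBL, means every other dataset that needs it copies the same value.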

Vital Signs Analysis Dataset: horizontal structure

Variable Name | Variable Label | Type | Controlled Terms or Format | Source
STUDYID | Study Identifier | Char | $15. | VS.STUDYID
USUBJID | Unique Subject Identifier | Char | $30. | VS.USUBJID
SUBJID | Subject Identifier for the Study | Char | $5. | ADSL.SUBJID
SITEID | Study Site Identifier | Char | $5. | ADSL.SITEID
VSBLFL | Baseline Flag | Char | Y or Null | VS.VSBLFL (where VS.VSTESTCD in ('DIABP' 'SYSBP' 'HR'))
VISITNUM | Visit Number | Num | 3. | VS.VISITNUM
VISIT | Visit Name | Char | $100. | VS.VISIT
WGT_BASE | Body Weight Baseline Measurement | Num | 5.1 | VS.VSSTRESN (where VSTESTCD = 'WEIGHT' and VS.VSBLFL = 'Y')
WGT_VAL | Body Weight Visit Measurement | Num | 5.1 | VS.VSSTRESN (where VSTESTCD = 'WEIGHT')
WGT_CHG | Body Weight Change from Baseline | Num | 5.1 | ADVS.WGT_VAL - ADVS.WGT_BASE
HR_BASE | Heart Rate (beats/minute) Baseline | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'HR' and VS.VSBLFL = 'Y')
HR_VAL | Heart Rate (beats/minute) Visit | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'HR')
HR_CHG | Heart Rate (beats/minute) Change | Num | 3. | ADVS.HR_VAL - ADVS.HR_BASE
SBP_BASE | Systolic Blood Pressure (mmHg) Baseline | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'SYSBP' and VS.VSBLFL = 'Y')
SBP_VAL | Systolic Blood Pressure (mmHg) Visit | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'SYSBP')
SBP_CHG | Systolic Blood Pressure (mmHg) Change | Num | 3. | ADVS.SBP_VAL - ADVS.SBP_BASE
...
AGE | Age in AGEU at Reference Date/Time | Num | 3. | ADSL.AGE
AGEU | Age Units | Char | years | ADSL.AGEU
SEX | Sex | Char | F, M, U | ADSL.SEX
SEXN | Sex Numeric | Num | 1=Male, 2=Female | ADSL.SEXN
RACE | Race | Char | White, Black, Hispanic, Asian, Other | ADSL.RACE
RACEN | Race Numeric | Num | 1=White, 2=Black, 3=Hispanic, 4=Asian, 9=Other | ADSL.RACEN
...
TRTP | Planned Treatment Group | Char | | ADSL.TRTP
TRTPN | Planned Treatment Group Numeric Code | Num | | ADSL.TRTPN
TRTA | Actual Treatment Group | Char | | ADSL.TRTA
TRTAN | Actual Treatment Group Numeric Code | Num | | ADSL.TRTAN
SAFETY | Safety Set | Char | Y, N | ADSL.SAFETY
FULLSET | Full Analysis Set | Char | Y, N | ADSL.FULLSET
PPROT | Per-Protocol Set | Char | Y, N | ADSL.PPROT

• VS SDTM is the source of the vital signs variables
• ADSL is the source of the demographic variables, treatment variables and analysis population flags
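
The horizontal structure can be produced from vertical VS records with a pivot plus a baseline lookup. The Python sketch below is for illustration only: the mapping from VSTESTCD to the WGT_/HR_/SBP_ prefixes follows the specification above, while the function name and the record layout (dicts keyed by SDTM variable names) are assumptions. If a test appears twice at the same visit, the last record wins; a real program would need an explicit rule for that.

```python
from collections import defaultdict

# Mapping from VSTESTCD to the ADVS variable prefixes in the specification.
PREFIX = {"WEIGHT": "WGT", "HR": "HR", "SYSBP": "SBP"}


def build_advs(vs_records: list) -> list:
    """Reshape vertical SDTM VS records into one horizontal record per
    subject and visit, with visit value, baseline and change from baseline."""
    baseline = {}             # (USUBJID, prefix) -> baseline value (VSBLFL = 'Y')
    rows = defaultdict(dict)  # (USUBJID, VISITNUM) -> ADVS record
    for r in vs_records:
        prefix = PREFIX.get(r["VSTESTCD"])
        if prefix is None:
            continue  # test not included in this dataset
        row = rows[(r["USUBJID"], r["VISITNUM"])]
        row["USUBJID"], row["VISITNUM"] = r["USUBJID"], r["VISITNUM"]
        row[prefix + "_VAL"] = r["VSSTRESN"]
        if r.get("VSBLFL") == "Y":
            baseline[(r["USUBJID"], prefix)] = r["VSSTRESN"]
    for (usubjid, _visit), row in rows.items():
        for prefix in PREFIX.values():
            val = row.get(prefix + "_VAL")
            base = baseline.get((usubjid, prefix))
            row[prefix + "_BASE"] = base
            # Change from baseline is missing if either value is missing.
            row[prefix + "_CHG"] = None if val is None or base is None else val - base
    return [rows[key] for key in sorted(rows)]
```

Demographic, treatment and population variables would then be copied from ADSL, as the Source column specifies.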

Adverse Events Analysis Dataset

Variable Name | Variable Label | Type
STUDYID | Study Identifier | Char
USUBJID | Unique Subject Identifier | Char
SUBJID | Subject Identifier for the Study | Char
SITEID | Study Site Identifier | Char
AESEQ | Sequence Number | Num
AETERM | Reported Term for the Adverse Event | Char
AEDECOD | Dictionary-Derived Term | Char
AEBODSYS | Body System or Organ Class | Char
AESEV | Severity/Intensity | Char
AESEVN | Severity/Intensity Numeric | Num
AESER | Serious Event | Char
AEACN | Action Taken with Study Treatment | Char
AEREL | Causality | Char
AERELN | Causality Numeric | Num
AEOUT | Outcome of Adverse Event | Char
AEOUTN | Outcome of Adverse Event Numeric | Num
...
AESTDT | Start Date of Adverse Event Numeric | Num
AESTDY | Study Day of Onset of Event | Num
...
AERELAT | Event Related to Study Drug | Char
AEDUR | Duration of Adverse Event (days) | Num
AEPRE | Pre-Treatment Adverse Event | Char
AETRTEM | Treatment Emergent Adverse Event | Char
AEPOST | Post-Treatment Adverse Event | Char
HEIGHTBL | Baseline Height (cm) | Num
WEIGHTBL | Baseline Body Weight (kg) | Num
AGE | Age in AGEU at Reference Date/Time | Num
AGEU | Age Units | Char
SEX | Sex | Char
SEXN | Sex Numeric | Num
RACE | Race | Char
RACEN | Race Numeric | Num
RACEOTH | Specify Other Race | Char
...
TRTP | Planned Treatment Group | Char
TRTPN | Planned Treatment Group Numeric Code | Num
TRTA | Actual Treatment Group | Char
TRTAN | Actual Treatment Group Numeric Code | Num
SAFETY | Safety Set | Char

• Keep variables from AE SDTM
• Add numeric variables
• Add derived variables
• Add flags for treatment-emergent AEs
• Add demographic variables from ADSL
• Add treatment variables from ADSL
• Add population flag from ADSL
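
The presentation lists the AEPRE, AETRTEM and AEPOST flags but not the rule behind them. A common approach compares the AE start date with the treatment period; the Python sketch below assumes exactly that, for illustration only. The treatment start/end arguments, the optional lag window after the last dose and the "Y"/"N" values are all assumptions; the real definition belongs in the SAP and often also considers worsening of pre-existing events, which this sketch ignores.

```python
from datetime import date, timedelta

FLAGS = ("AEPRE", "AETRTEM", "AEPOST")


def ae_period_flags(aestdt: date, trtsdt: date, trtedt: date, lag_days: int = 0) -> dict:
    """Classify an AE as pre-treatment, treatment-emergent or post-treatment
    by comparing its start date with a (hypothetical) treatment period.

    trtsdt/trtedt are first/last treatment dates; lag_days extends the
    treatment-emergent window after the last dose. Both are assumptions.
    """
    if aestdt < trtsdt:
        period = "AEPRE"
    elif aestdt <= trtedt + timedelta(days=lag_days):
        period = "AETRTEM"
    else:
        period = "AEPOST"
    # Exactly one flag is "Y"; the others are "N".
    return {flag: "Y" if flag == period else "N" for flag in FLAGS}


print(ae_period_flags(date(2007, 2, 15), date(2007, 1, 10), date(2007, 2, 10), lag_days=7))
# {'AEPRE': 'N', 'AETRTEM': 'Y', 'AEPOST': 'N'}
```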

Analysis Dataset Documentation

• Provides the link between the general description of the analysis (as found in the study protocol and SAP) and the source data
• The source of each analysis dataset should be clearly documented, allowing the reviewer to trace data items back to their source
• Documentation includes:
  – Analysis dataset metadata
  – Analysis variable metadata
  – Analysis results metadata
  – Other (SAS programs and/or other written documentation)

Analysis Dataset Metadata

• Should contain:
  – Dataset name, dataset description, structure, purpose, keys, location, documentation
• Link to detailed documentation

Analysis Variable Metadata: ADSL (example from CDISC guideline) / 1

• Describes each variable in the analysis dataset
• Provides details about where the variable came from in the source data, or how the variable was derived

ADSL / 2

ADSL / 3

Analysis Results Metadata

• Describes the major attributes of each important analysis result

Analysis Name | Description | Reason | Dataset | Documentation
Table 5.1: Demographic data - full analysis set | Summary of demographic data for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX
Table 5.2: Demographic data - per-protocol set | Summary of demographic data for per-protocol set | Analysis pre-specified in SAP | ADSL, select records with PPROT=Y | SAP Section XX
Table 5.3: Demographic data - safety set | Summary of demographic data for safety set | Analysis pre-specified in SAP | ADSL, select records with SAFETY=Y | SAP Section XX
Table 5.4: Demographic data by country - full analysis set | Summary of demographic data by country for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX
Table 5.5: Demographic data by gender - full analysis set | Summary of demographic data by gender for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX

• Analysis name: a unique identifier for the analysis
• Reason: reason for performing the analysis (pre-specified, exploratory, regulatory request)
• Dataset: name of the datasets / subset used in the analysis
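
When results metadata are kept in a structured form, the Dataset column can drive record selection directly, so the table program and the documentation cannot drift apart. The Python sketch below is illustrative only: the class and function names are hypothetical, and storing the selection as a single (variable, value) pair is a simplification of the free-text criteria used in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisResult:
    name: str           # unique identifier, e.g. a table number
    description: str
    reason: str         # pre-specified, exploratory, regulatory request
    dataset: str
    selection: tuple    # simplified selection rule: (variable, value)
    documentation: str


RESULTS = [
    AnalysisResult("Table 5.1", "Summary of demographic data for full analysis set",
                   "Analysis pre-specified in SAP", "ADSL", ("FULLSET", "Y"), "SAP Section XX"),
    AnalysisResult("Table 5.2", "Summary of demographic data for per-protocol set",
                   "Analysis pre-specified in SAP", "ADSL", ("PPROT", "Y"), "SAP Section XX"),
    AnalysisResult("Table 5.3", "Summary of demographic data for safety set",
                   "Analysis pre-specified in SAP", "ADSL", ("SAFETY", "Y"), "SAP Section XX"),
]


def select_records(result: AnalysisResult, datasets: dict) -> list:
    """Apply the dataset and subset recorded in the results metadata."""
    var, value = result.selection
    return [r for r in datasets[result.dataset] if r.get(var) == value]


# Hypothetical flags for the two sample subjects.
adsl = [{"USUBJID": "0001-1", "FULLSET": "Y", "PPROT": "Y", "SAFETY": "Y"},
        {"USUBJID": "0001-2", "FULLSET": "Y", "PPROT": "N", "SAFETY": "Y"}]
print([r["USUBJID"] for r in select_records(RESULTS[1], {"ADSL": adsl})])  # ['0001-1']
```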

Select a strategy for ADaM implementation
http://www.lexjansen.com/pharmasug/2005/fdacompliance/fc03.pdf

• Parallel method
• Linear method
• Hybrid method
• Other approaches

[Figure: data flow from CDMS to SDTM and ADaM for each method; the hybrid method passes through a draft SDTM]

Implementation issues, Helsinn experience

• Key aspects discussed during implementation:
  – Vertical vs horizontal structure
  – Analysis-ready datasets and redundancy
  – Clear link between SDTM and ADaM (AE → ADAE, VS → ADVS etc.): traceability
• Datasets
  – Subject level: fully compliant with CDISC ADaM
  – Defined a generation sequence
  – One analysis dataset for each SDTM dataset (ADAE, ADIE, ADMH, ADPE, ADEX, ADCM, ADLB etc.)
  – More than one dataset when needed (example: EG, ADEG for par and findings)
  – Keep the vertical structure when possible (just add variables)
  – Efficacy datasets: study specific, no specifications
  – Additional datasets needed for the analysis may be created (example: to store totals/denominators used in the summaries)
• Variables
  – Variables in SDTM SUPPQUAL merged back into the original domain (e.g. race, other)
  – Common set of variables in each dataset (age, gender, race, stratification variables, planned/actual treatment)
  – Analysis population flag added to each dataset
  – Numeric variables added as needed for the analysis (dates, numeric versions of categorical variables)
  – Dataset-specific variables added (analysis day, TE, change from baseline etc.)
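
Merging SUPPQUAL back into its parent domain turns each supplemental record (one QNAM/QVAL pair) into an ordinary variable on the parent record. A minimal Python sketch for subject-level qualifiers such as those attached to DM, where IDVAR is blank; record-level qualifiers would additionally have to match IDVAR/IDVARVAL, which this sketch deliberately skips. The function name and the sample subjects and values are hypothetical.

```python
def merge_supp(parent: list, supp: list) -> list:
    """Merge subject-level SUPPQUAL records back into the parent domain.

    Each SUPP record carries one extra variable: QNAM is its name and QVAL
    its value. Only records with a blank IDVAR (subject-level) are used.
    """
    extra = {}
    for q in supp:
        if q.get("IDVAR"):
            continue  # record-level qualifier: needs IDVAR/IDVARVAL matching, not handled here
        extra.setdefault(q["USUBJID"], {})[q["QNAM"]] = q["QVAL"]
    return [{**rec, **extra.get(rec["USUBJID"], {})} for rec in parent]


# Hypothetical DM and SUPPDM records.
dm = [{"USUBJID": "0001-3", "RACE": "OTHER"}, {"USUBJID": "0001-4", "RACE": "ASIAN"}]
suppdm = [{"RDOMAIN": "DM", "USUBJID": "0001-3", "IDVAR": "", "IDVARVAL": "",
           "QNAM": "RACEOTH", "QVAL": "example text"}]
print(merge_supp(dm, suppdm))
```

After the merge, RACEOTH sits next to RACE, which is how it appears in the adverse events analysis dataset specification.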

Benefits (even if you are not working on a submission)

• Minimized programming effort
• Reduced risk of programming errors
• Less validation effort
• Reuse of programs
• Less time needed to create analysis datasets (more time can be spent on the analysis)
• Integrated analyses made easier

ADaM Work in Progress

• Develop Implementation Guide
  – The ADaM team is working on an implementation guide that will build on the considerations discussed in the Analysis Data Model Version 2.0.
  – This implementation guide will outline specific standards and recommendations for the structure and content of analysis datasets.
  – It will contain a library of examples of analysis datasets supporting specific statistical methodology used in clinical trials, such as:
    - Change from baseline
    - Categorical analysis
    - Time to event
    - Adverse events
• Develop Training Course
• Cross-team activities, including:
  – SDS/ADaM Pilot project
  – DEFINE.XML and analysis data
  – Trial Design Model for 2-3 frequently used trial designs
  – Controlled terminology to be used for analysis data

Questions

Analysis Dataset Creation Documentation – back-up

• Descriptions for each dataset:
  – the source datasets
  – processing steps
  – scientific decisions pertaining to creation
• Clearly distinguish:
  – derivations and decision rules specified a priori
  – decisions that were data-driven
• Key issues:
  – documentation of derived variables: algorithms
  – handling of missing data
  – data-item-specific derivations, i.e. a change to a data value for a specific observation
• Analysis dataset creation programs may be used as documentation

Standardized Process for Analysis Dataset Creation – back-up

• ADSL should be created before other ADaM datasets
• Derivations should be performed only once (more efficient, and reduces the risk of discrepancies)
• Define the dataset creation order (depending on existing relationships between ADaM datasets)
• Some SDTM variables may not be needed in ADaM
• The list of ADaM datasets may be shorter than the SDTM list (no SUPPQUAL datasets; efficacy data may be combined in one dataset)

There is still a lot of freedom in the possible set-up of the ADaM structure. Define a standard approach!
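
Once the dependencies between ADaM datasets are recorded, the creation order can be derived mechanically as a topological sort. A minimal Python sketch (Python 3.9+, standard-library `graphlib`); apart from ADSL having no ADaM inputs and therefore coming first, the dataset names and dependencies below are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependencies: each ADaM dataset maps to the ADaM datasets it
# reads. ADSL reads only SDTM data, so it has no ADaM predecessors.
DEPENDS_ON = {
    "ADSL": set(),
    "ADAE": {"ADSL"},
    "ADVS": {"ADSL"},
    "ADEX": {"ADSL"},
    "ADEFF": {"ADSL", "ADEX"},  # hypothetical efficacy dataset
}

# static_order() lists every dataset after all of its predecessors and
# raises graphlib.CycleError if the dependencies are circular.
creation_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(creation_order)
```

Keeping the dependency table under version control also documents the "generation sequence" mentioned in the Helsinn experience.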

Analysis Results Metadata – back-up

• Describes the major attributes of each important analysis result
• Links statistical results to:
  – analysis datasets and programs used to generate the analysis
  – metadata describing the analysis
  – reason for performing the analysis
• Should contain:
  – ANALYSIS NAME: a unique identifier for this analysis. May include a table number or other sponsor-specific reference.
  – DESCRIPTION: a text description documenting the analysis performed.
  – REASON: the reason for performing this analysis. Examples include Pre-specified, Exploratory and Regulatory Request.
  – DATASET: the name of the analysis dataset used for this analysis. The column may also include specific selection criteria (e.g. where SAFETY=‘Y’).
  – DOCUMENTATION: information about how the analysis was performed (text description, link to another document or the analysis generation program).