analysis data model (adam) - digital infuzion, inc.cdiscportal.digitalinfuzion.com/cdisc user...
2
Data flow in clinical studies
• Raw Datasets (= SDTM)
  – Data from a clinical trial
  – Source: CRF
• Analysis Datasets (= ADaM data)
  – Datasets used in the analysis, restructured and containing additional information (derived variables, flags, etc.)
  – Source: raw datasets
• Two sets of data
• Each with a specific purpose
4
Analysis Data Model: General Considerations
Document: http://www.cdisc.org/models/adam/V2.0/index.html
• Analysis Data Model Version 2.0 (November 2006)
  – key principles for analysis datasets
  – conventions for standard analysis variables
  – provides a model for subject-level analysis dataset
  – Metadata for Analysis Datasets
    • Analysis dataset metadata
    • Analysis variable metadata
    • Analysis results metadata
  – Analysis datasets are discussed within the context of electronic submissions to the FDA, but “the same principles and standards will apply, regardless of the purpose of the analysis datasets”
5
Key Principles for Analysis Datasets Creation
Analysis datasets should:
• facilitate clear and unambiguous communication of the content, source and quality of the datasets supporting the statistical analyses
• be usable by currently available tools (SAS XPT)
• be “Analysis-ready” or “One Statistical Procedure Away”
• redundancy may be acceptable
• be well documented: metadata and other documentation should provide a clear description of the analytic results, including statistical method, transformations, assumptions, derivations and imputations performed
• include the optimum number of datasets
• guarantee traceability
6
SDTM and ADaM
• SDTM
  – Source data
  – Vertical
  – No redundancy
  – Character variables
  – Each domain is specific to itself
  – Dates are ISO8601 character strings
  – Two chars for dataset name
  – Data transfer
  – Interoperability
• ADaM
  – Derived data
  – Structure may not necessarily be vertical
  – Redundancy is needed for easy analysis
  – Numeric variables
  – Combines variables across multiple domains
  – Dates are formatted as numeric (e.g. SAS dates) to allow manipulation
  – Dataset Name: ADXXXX
  – Analytic & graphical analysis
  – Clear communication of statistical analysis and related decisions
BOTH ARE NEEDED FOR FDA REVIEW !!
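The date contrast above (ISO8601 character strings in SDTM, numeric dates in ADaM) can be illustrated with a minimal Python sketch. It relies on the fact that a SAS date value is the number of days since 1 January 1960. The function names are hypothetical, and the handling of partial dates (returning None rather than imputing) is an assumption, since the slides do not give an imputation rule.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this date


def iso_to_sas_date(iso_text):
    """Convert an ISO 8601 date or datetime string to a SAS date number.

    Partial dates such as '2006-11' return None: they would need an
    imputation rule first (assumption: none is applied here).
    """
    if iso_text is None or len(iso_text) < 10:
        return None
    return (date.fromisoformat(iso_text[:10]) - SAS_EPOCH).days


def sas_date_to_iso(sas_date):
    """Convert a SAS date number back to a YYYY-MM-DD string."""
    return date.fromordinal(SAS_EPOCH.toordinal() + sas_date).isoformat()


print(iso_to_sas_date("2000-01-01"))
print(sas_date_to_iso(0))
```

Once the date is numeric, differences such as durations or study days become plain subtraction, which is the "allow manipulation" point made on the slide.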
7
Analysis Dataset Variables
• Analysis dataset variables should be compliant with SDTM standards
  – Maintain SDTM variable attributes (if the identical variable also exists in an SDTM dataset)
  – Follow naming conventions for datasets and variables consistent with the SDTM conventions, where feasible
• Analysis variables to be included
  – Identifiers
  – Analysis Population Indicators
  – Analysis Date Variables
  – Analysis Study Day Variables
  – Visit time Variables
  – Numeric Code Variables
  – Analysis Treatment Variables
8
Analysis Dataset Variables
Analysis Population
• Analysis datasets should include analysis population flags at whatever level (e.g. subject, visit or measurement) is necessary to clearly describe the population set used for any analysis
• Variables used to identify specific populations
  – FULLSET, SAFETY, PPROT
• Population flags may be required at visit level
  – FULLV, SAFV
• Population flags may be present in the SDTM (supplemental domain)
9
Analysis Dataset Variables
Numeric Code Variables
• When a numeric version of a categorical variable is required for statistical purposes: append an ‘N’ to the SDTM variable name
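As a minimal sketch of this naming rule, the Python function below adds a numeric companion variable by appending 'N' to the variable name. The code list 1=Male, 2=Female is the one shown for SEXN in the vital signs example of this deck; the function name `add_numeric_code` and the choice to leave unmapped values (such as 'U') as None are assumptions made for illustration.

```python
# Code list taken from the vital signs example in this deck: 1=Male, 2=Female
SEX_CODES = {"M": 1, "F": 2}


def add_numeric_code(record, variable, codes):
    """Return a copy of record with <variable>N holding the numeric code.

    Values without an entry in the code list become None (assumption).
    """
    out = dict(record)
    out[variable + "N"] = codes.get(record.get(variable))
    return out


print(add_numeric_code({"USUBJID": "0001-1", "SEX": "F"}, "SEX", SEX_CODES))
```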
10
Analysis Dataset Variables
Analysis Treatment Variables
• Treatment variables are required to be present in all analysis datasets
  – Planned Treatment (TRTP char, TRTPN numeric)
  – Actual Treatment (TRTA char, TRTAN numeric)
• If an analysis is performed on the actual treatment instead of the planned treatment, actual treatment variables are required in addition to the planned treatment variables
11
Subject-Level Dataset (ADSL)
• One record per subject
• All the variables for describing the analysis population
  – Demographic data (age, sex, race, other relevant factors)
  – Baseline characteristics
  – Disease factors
  – Treatment code/group
  – Factors that could affect response to therapy
  – Other relevant variables (smoking, alcohol intake, ...)
  – Population flags
• Data included in the subject-level analysis dataset can be used as source for data used in other analysis datasets (derive variables only once!)
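The "derive variables only once" idea can be sketched as a merge: subject-level variables are derived in ADSL and then copied onto the records of another analysis dataset by USUBJID. This is an illustrative Python sketch, not a prescribed implementation; the function name `merge_adsl` is hypothetical, and the sample values come from the ADSL example in this deck.

```python
def merge_adsl(records, adsl, keep):
    """Copy the ADSL variables named in keep onto each record, matched on USUBJID."""
    by_subject = {row["USUBJID"]: row for row in adsl}
    merged = []
    for rec in records:
        # A KeyError here means the subject is missing from ADSL.
        subject = by_subject[rec["USUBJID"]]
        merged.append({**rec, **{name: subject[name] for name in keep}})
    return merged


adsl = [
    {"USUBJID": "0001-1", "TRTP": "DRUG A", "TRTPN": 1, "SAFETY": "Y"},
    {"USUBJID": "0001-2", "TRTP": "PLACEBO", "TRTPN": 0, "SAFETY": "Y"},
]
ae = [{"USUBJID": "0001-2", "AESEQ": 1}]
print(merge_adsl(ae, adsl, ["TRTP", "TRTPN", "SAFETY"]))
```

Because every dataset takes TRTP, TRTPN and the population flags from the same ADSL rows, the values cannot drift apart between datasets.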
12
ADSL, Example
SAMPLE DATASET FOR ADSL
Obs | Studyid | USUBJID | SAFETY | ITT | PPROT | COMPLT | DSREAS        | AGE | AGEGRP
1   | XX0001  | 0001-1  | Y      | Y   | Y     | Y      |               | 30  | 21-35
2   | XX0001  | 0001-2  | Y      | Y   | N     | N      | ADVERSE EVENT | 38  | 36-50
SAMPLE DATASET FOR ADSL (continued)
Obs | AGEGRPN | SEX | RACE  | RACEN | TRTP    | TRTPN | HEIGHTBL | WEIGHTBL | BMIBL
1   | 2       | F   | WHITE | 1     | DRUG A  | 1     | 170      | 63.5     | 21.97
2   | 3       | M   | ASIAN | 4     | PLACEBO | 0     | 183      | 86.2     | 25.74
Callouts on the slide: Dataset named “ADxxxxxx”; SDTM variable with no changes; ADaM Treatment Variable
13
Vital Signs Analysis Dataset: horizontal structure
Variable Name | Variable Label | Type | Controlled Terms or Format | Source
STUDYID | Study Identifier | Char | $15. | VS.STUDYID
USUBJID | Unique Subject Identifier | Char | $30. | VS.USUBJID
SUBJID | Subject Identifier for the Study | Char | $5. | ADSL.SUBJID
SITEID | Study Site Identifier | Char | $5. | ADSL.SITEID
VSBLFL | Baseline Flag | Char | Y or Null | VS.VSBLFL (where VS.VSTESTCD in ('DIABP' 'SYSBP' 'HR'))
VISITNUM | Visit Number | Num | 3. | VS.VISITNUM
VISIT | Visit Name | Char | $100. | VS.VISIT
WGT_BASE | Body Weight Baseline Measurement | Num | 5.1 | VS.VSSTRESN (where VSTESTCD = 'WEIGHT' and VS.VSBSFL='Y')
WGT_VAL | Body Weight Visit Measurement | Num | 5.1 | VS.VSSTRESN (where VSTESTCD = 'WEIGHT')
WGT_CHG | Body Weight Change from Baseline | Num | 5.1 | ADVS.WGT_VAL - ADVS.WGT_BASE
HR_BASE | Heart Rate (beats/minute) Baseline | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'HR' and VS.VSBSFL='Y')
HR_VAL | Heart Rate (beats/minute) Visit | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'HR')
HR_CHG | Heart Rate (beats/minute) Change | Num | 3. | ADVS.HR_VAL - ADVS.HR_BASE
SBP_BASE | Systolic Blood Pressure (mmHg) Baseline | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'SYSBP' and VS.VSBSFL='Y')
SBP_VAL | Systolic Blood Pressure (mmHg) Visit | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'SYSBP')
SBP_CHG | Systolic Blood Pressure (mmHg) Change | Num | 3. | ADVS.SBP_VAL - ADVS.SBP_BASE
.......
AGE | Age in AGEU at Reference Date/Time | Num | 3. | ADSL.AGE
AGEU | Age Units | Char | years | ADSL.AGEU
SEX | Sex | Char | F, M, U | ADSL.SEX
SEXN | Sex Numeric | Num | 1=Male, 2=Female | ADSL.SEXN
RACE | Race | Char | White, Black, Hispanic, Asian, Other | ADSL.RACE
RACEN | Race Numeric | Num | 1=White, 2=Black, 3=Hispanic, 4=Asian, 9=Other | ADSL.RACEN
...........
TRTP | Planned Treatment Group | Char | | ADSL.TRTP
TRTPN | Planned Treatment Group Numeric Code | Num | | ADSL.TRTPN
TRTA | Actual Treatment Group | Char | | ADSL.TRTA
TRTAN | Actual Treatment Group Numeric Code | Num | | ADSL.TRTAN
SAFETY | Safety Set | Char | Y, N | ADSL.SAFETY
FULLSET | Full Analysis Set | Char | Y, N | ADSL.FULLSET
PPROT | Per-Protocol Set | Char | Y, N | ADSL.PPROT
Callouts on the slide:
• VS SDTM is the source (identifiers, visit and measurement variables)
• ADSL is the source (demographic variables, treatment variables, analysis population)
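The derivations in the Source column (baseline value, visit value, change from baseline) can be sketched in Python as a pivot from the vertical VS domain to one row per subject and visit. This is a sketch under assumptions: the baseline record is identified with the VSBLFL flag listed in the table (the where-clauses on the slide write it as VSBSFL), the function name `build_advs` and the prefix mapping are hypothetical, and rounding to one decimal simply mirrors the 5.1 format shown for weight.

```python
def build_advs(vs_records, prefixes):
    """Pivot vertical VS records into one row per subject and visit.

    prefixes maps VSTESTCD to the ADVS column prefix, e.g. {"WEIGHT": "WGT"}.
    """
    # First pass: collect the baseline value per subject and test.
    baseline = {}
    for rec in vs_records:
        if rec.get("VSBLFL") == "Y" and rec["VSTESTCD"] in prefixes:
            baseline[(rec["USUBJID"], rec["VSTESTCD"])] = rec["VSSTRESN"]

    # Second pass: fill <prefix>_BASE, <prefix>_VAL and <prefix>_CHG per visit.
    rows = {}
    for rec in vs_records:
        prefix = prefixes.get(rec["VSTESTCD"])
        if prefix is None:
            continue  # test not requested for this analysis dataset
        key = (rec["USUBJID"], rec["VISITNUM"])
        row = rows.setdefault(key, {"USUBJID": key[0], "VISITNUM": key[1]})
        base = baseline.get((rec["USUBJID"], rec["VSTESTCD"]))
        value = rec["VSSTRESN"]
        row[prefix + "_BASE"] = base
        row[prefix + "_VAL"] = value
        if base is None or value is None:
            row[prefix + "_CHG"] = None
        else:
            row[prefix + "_CHG"] = round(value - base, 1)
    return [rows[key] for key in sorted(rows)]


vs = [
    {"USUBJID": "0001-1", "VISITNUM": 1, "VSTESTCD": "WEIGHT", "VSSTRESN": 63.5, "VSBLFL": "Y"},
    {"USUBJID": "0001-1", "VISITNUM": 2, "VSTESTCD": "WEIGHT", "VSSTRESN": 64.0},
    {"USUBJID": "0001-1", "VISITNUM": 1, "VSTESTCD": "HR", "VSSTRESN": 70, "VSBLFL": "Y"},
    {"USUBJID": "0001-1", "VISITNUM": 2, "VSTESTCD": "HR", "VSSTRESN": 76},
]
for row in build_advs(vs, {"WEIGHT": "WGT", "HR": "HR"}):
    print(row)
```

The result is redundant by design (the baseline is repeated on every visit row), which is what makes the dataset "one statistical procedure away" for a change-from-baseline analysis.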
14
Adverse Events Analysis Dataset
Variable Name  Variable Label  Type
STUDYID Study Identifier Char
USUBJID Unique Subject Identifier Char
SUBJID Subject Identifier for the Study Char
SITEID Study Site Identifier Char
AESEQ Sequence Number Num
AETERM Reported Term for the Adverse Event Char
AEDECOD Dictionary-Derived Term Char
AEBODSYS Body System or Organ Class Char
AESEV Severity/Intensity Char
AESEVN Severity/Intensity Numeric Num
AESER Serious Event Char
AEACN Action Taken with Study Treatment Char
AEREL Causality Char
AERELN Causality Numeric Num
AEOUT Outcome of Adverse Event Char
AEOUTN Outcome of Adverse Event Numeric Num
.....
AESTDT Start Date of Adverse Event Numeric Num
AESTDY Study Day of Onset of Event Num
....
AERELAT Event Related to Study Drug Char
AEDUR Duration of Adverse Event (days) Num
Callouts on the slide:
• Keep variables from AE SDTM
• Add numeric variables
• Add derived variables
• Add flags for Treatment Emergent AE
Adverse Events Analysis Dataset (continued)
Variable Name  Variable Label  Type
AEPRE Pre-Treatment Adverse Event Char
AETRTEM Treatment Emergent Adverse Event Char
AEPOST Post-Treatment Adverse Event Char
HEIGHTBL Baseline Height (cm) Num
WEIGHTBL Baseline Body Weight (kg) Num
AGE Age in AGEU at Reference Date/Time Num
AGEU Age Units Char
SEX Sex Char
SEXN Sex Numeric Num
RACE Race Char
RACEN Race Numeric Num
RACEOTH Specify Other Race Char
.....
TRTP Planned Treatment Group Char
TRTPN Planned Treatment Group Numeric Code Num
TRTA Actual Treatment Group Char
TRTAN Actual Treatment Group Numeric Code Num
SAFETY Safety Set Char
Callouts on the slide:
• Add demographic variables from ADSL
• Add treatment variables from ADSL
• Add population flag from ADSL
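A hedged Python sketch of how the derived AE variables listed above (AESTDT, AESTDY, AEDUR, AEPRE, AETRTEM, AEPOST) might be computed. The slides do not define the algorithms, so everything here is an assumption to be replaced by the study's SAP: the study day follows the SDTM convention (reference date is day 1, no day 0), duration is taken as end minus start plus one, and the three flags are defined purely by comparing the onset date with hypothetical first and last dose dates. AESTDTC and AEENDTC are the SDTM character date variables and are assumed to hold complete dates.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)


def study_day(event, reference):
    """Study day relative to the reference date: day 1 is the reference, no day 0."""
    delta = (event - reference).days
    return delta + 1 if delta >= 0 else delta


def derive_ae(ae, first_dose, last_dose):
    """Return a copy of an AE record with numeric and derived variables added.

    Flag definitions are illustrative assumptions, not taken from the slides.
    """
    start = date.fromisoformat(ae["AESTDTC"])
    end = date.fromisoformat(ae["AEENDTC"]) if ae.get("AEENDTC") else None
    out = dict(ae)
    out["AESTDT"] = (start - SAS_EPOCH).days  # numeric (SAS-style) start date
    out["AESTDY"] = study_day(start, first_dose)
    out["AEDUR"] = None if end is None else (end - start).days + 1
    out["AEPRE"] = "Y" if start < first_dose else "N"
    out["AETRTEM"] = "Y" if first_dose <= start <= last_dose else "N"
    out["AEPOST"] = "Y" if start > last_dose else "N"
    return out


record = {"USUBJID": "0001-2", "AESTDTC": "2007-01-12", "AEENDTC": "2007-01-14"}
print(derive_ae(record, date(2007, 1, 10), date(2007, 2, 10)))
```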
15
Analysis Dataset Documentation
• Provide the link between the general description of the analysis (as found in the study protocol, SAP) and the source data
• The source of the analysis dataset should be clearly documented, allowing the reviewer to trace back data items to their source
• Documentation includes:
  – Analysis dataset metadata
  – Analysis variable metadata
  – Analysis results metadata
  – Other (SAS programs and/or other written documentation)
16
Analysis Dataset Metadata
• Should contain:
  – Dataset name, Dataset description, Structure, Purpose, Keys, Location, Documentation
Callout on the slide: Link to detailed documentation
17
Analysis Variable Metadata
ADSL (example from CDISC guideline) / 1
• describes each variable in the analysis dataset
• provides details about where the variable came from in the source data or how the variable was derived
20
Analysis Results Metadata
• Describes the major attributes of each important analysis result
Analysis name | Description | Reason | Dataset | Documentation
Table 5.1: Demographic data - full analysis set | Summary of demographic data for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX
Table 5.2: Demographic data - per-protocol set | Summary of demographic data for per-protocol set | Analysis pre-specified in SAP | ADSL, select records with PPROT=Y | SAP Section XX
Table 5.3: Demographic data - safety set | Summary of demographic data for safety set | Analysis pre-specified in SAP | ADSL, select records with SAFETY=Y | SAP Section XX
Table 5.4: Demographic data by country - full analysis set | Summary of demographic data by country for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX
Table 5.5: Demographic data by gender - full analysis set | Summary of demographic data by gender for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX
Callouts on the slide:
  – Analysis name: a unique identifier for the analysis
  – Reason: reason for performing the analysis (pre-specified, exploratory, regulatory request)
  – Dataset: name of the datasets / subset used in the analysis
21
Select a strategy for ADaM implementation
http://www.lexjansen.com/pharmasug/2005/fdacompliance/fc03.pdf
• Parallel method
• Linear method
• Hybrid method
• Other approaches
[Diagrams: data flow between CDMS, SDTM (or Draft SDTM) and ADaM for each method]
22
Implementation issues, Helsinn experience
• Key aspects discussed during implementation:
  – Vertical vs horizontal structure
  – Analysis ready and redundancy
  – Clear link between SDTM and ADaM (AE → ADAE, VS → ADVS etc.): traceability
• Datasets
  – Subject level: fully compliant with CDISC ADaM
  – Defined a generation sequence
  – One analysis dataset for each SDTM dataset (ADAE, ADIE, ADMH, ADPE, ADEX, ADCM, ADLB etc.)
  – More than one dataset when needed (example EG, ADEG for par and findings)
  – Keep the vertical structure when possible (just add variables)
  – Efficacy datasets: study specific, no specifications
  – Additional datasets needed for the analysis may be created (example: to store totals/denominators to be used in the summaries)
• Variables
  – Variables in SDTM SUPPQUAL merged back to the original domain (e.g. Race, other)
  – Common set of variables in each dataset (age, gender, race, stratification variables, treatment planned/actual)
  – Analysis population flag: added to each dataset
  – Numeric variables: added as needed for the analysis (dates, numeric version of categorical variables)
  – Add dataset-specific variables (analysis day, TE, change from baseline etc.)
23
Benefits (even if you are not working on a submission)
• Minimized programming effort
• Reduced risk of programming error
• Less validation effort
• Reuse of programs
• Reduced time needed for analysis dataset creation (we can spend more time on analysis)
• Integrated analysis made easier
24
ADaM Work in Progress
• Develop Implementation Guide
  – The ADaM team is working on an implementation guide that will build on the considerations discussed in the Analysis Data Model Version 2.0.
  – This implementation guide will outline specific standards and recommendations for the structure and content of analysis datasets.
  – It will contain a library of examples of analysis datasets that support specific statistical methodology used within clinical trials, such as:
    - Change from Baseline
    - Categorical Analysis
    - Time to Event
    - Adverse Events
• Develop Training Course
• Cross-team activities including:
  – SDS/ADaM Pilot project
  – DEFINE.XML and analysis data
  – Trial Design Model for 2-3 frequently used trial designs
  – Controlled terminology to be used for analysis data
26
Analysis Dataset Creation documentation – back-up
• Descriptions for each dataset:
  – the source datasets
  – processing steps
  – scientific decisions pertaining to creation
• Clearly distinguish:
  – derivations & decision rules specified a priori
  – decisions that were data-driven
• Key issues:
  – derived variables documentation: algorithms
  – handling of missing data
  – data item specific derivations, i.e. a change to a data value for a specific observation
• Analysis dataset creation programs may be used as documentation
27
Standardized process for analysis dataset creation – back-up
• ADSL should be created before other ADaM datasets
• Derivations should be performed only once (more efficient and reduces the risk of discrepancies)
• Define the dataset creation order (depending on existing relationships between ADaM datasets)
• Some SDTM variables may not be needed in the ADaM datasets
• The list of ADaM datasets may be shorter than the SDTM list (no SUPPQUAL datasets, efficacy data may be combined in one dataset)
There is still a lot of freedom in the possible set-up of the ADaM structure. Define a standard approach!
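One way to make the creation order explicit is to record which datasets each ADaM dataset reads from and let a topological sort produce the sequence. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+). The dependency map is purely illustrative: ADTTE is a hypothetical dataset name, and the relationships shown are assumptions rather than anything stated in the slides.

```python
from graphlib import TopologicalSorter


def creation_order(dependencies):
    """Return dataset names so that every dataset comes after its sources.

    dependencies maps each dataset to the set of datasets it reads from.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical relationships, for illustration only
dependencies = {
    "ADSL": set(),
    "ADAE": {"ADSL"},
    "ADVS": {"ADSL"},
    "ADTTE": {"ADSL", "ADAE"},
}
print(creation_order(dependencies))
```

With ADSL declared as a source of every other dataset, it always comes first, which matches the first bullet of the slide.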
28
Analysis Results Metadata – back-up
• Describes the major attributes of each important analysis result
• Links statistical results to
  – analysis datasets and programs used to generate the analysis
  – metadata describing the analysis
  – reason for performing the analysis
• Should contain
  – ANALYSIS NAME: A unique identifier for this analysis. May include a table number or other sponsor-specific reference.
  – DESCRIPTION: A text description documenting the analysis performed.
  – REASON: The reason for performing this analysis. Examples may include Pre-specified, Exploratory, and Regulatory Request.
  – DATASET: The name of the analysis dataset used for this analysis. The column may also include specific selection criteria (e.g. where SAFETY=‘Y’).
  – DOCUMENTATION: Information about how the analysis was performed (text description, link to another document or the analysis generation program).
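The five fields above map naturally onto a small record type. The Python sketch below is one possible representation, not a CDISC-defined structure: the class name `AnalysisResult`, the `where` attribute (a structured stand-in for the selection criteria that the slides keep as text in the DATASET column) and the helper `select_records` are all hypothetical. The example entry reuses the Table 5.2 row of the results metadata example, and the two ADSL rows reuse values from the ADSL sample.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisResult:
    name: str            # ANALYSIS NAME: unique identifier, e.g. a table number
    description: str     # DESCRIPTION of the analysis performed
    reason: str          # REASON, e.g. "Pre-specified", "Exploratory"
    dataset: str         # DATASET used for the analysis
    where: dict = field(default_factory=dict)  # selection criteria (assumed structure)
    documentation: str = ""                    # DOCUMENTATION reference


def select_records(records, result):
    """Keep the records that satisfy every selection criterion of the analysis."""
    return [
        rec for rec in records
        if all(rec.get(var) == value for var, value in result.where.items())
    ]


table_5_2 = AnalysisResult(
    name="Table 5.2",
    description="Summary of demographic data for per-protocol set",
    reason="Pre-specified",
    dataset="ADSL",
    where={"PPROT": "Y"},
    documentation="SAP Section XX",
)
adsl = [
    {"USUBJID": "0001-1", "SAFETY": "Y", "PPROT": "Y"},
    {"USUBJID": "0001-2", "SAFETY": "Y", "PPROT": "N"},
]
print([rec["USUBJID"] for rec in select_records(adsl, table_5_2)])
```

Keeping the selection criteria next to the analysis name is what lets a reviewer go from a table number to the exact subset of the analysis dataset behind it.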