IS03: An Introduction to SDTM – Part II
Jennie Mc Guirk
SDTM Framework
1. Where should the data go?
2. What type of information should it contain?
3. What is the minimum information needed?
SDTM Framework: Re-cap
Data Class: General Observation, Special Purpose, Relationship, Trial Design
Variable Role: Identifier, Topic, Timing, Qualifier
Core Variables: Required, Expected, Permissible
SDTM Small print
• Naming Conventions
• Subject Identifiers
• Sequence Variable
• Relationships and Linking
• Controlled Terminology
• Date Formats
• Reference Start Date & Study Days
• Handling Text
• Original & Standard Results
• Missing & Multiple Values
• Timing and Timepoints
• Splitting Domains
Naming Conventions - Datasets
• Two-letter code, with exceptions:
– Split domains
– SUPP and RELREC
• SDTM IG Appendix C2 lists 30 reserved SDTM codes:
– Events (5): AE, CE, MH, DS, DV
– Findings (12): EG, IE, LB, PC, PE, PP, QS, VS, DA, MB, MS, SC
– Findings About (1): FA
– Interventions (3): CM, EX, SU
– Trial Design (5): TA, TE, TI, TS, TV
– Special Purpose (4): CO, DM, SE, SV
• 1 reserved code relating to analysis datasets (AD)
• The SDTM IG also reserves codes beginning with X, Y and Z (X-, Y-, Z-) for sponsor-defined (custom) domains
Naming Conventions - Variables
• 8-character limit for variable names (40-character limit for labels)
• The IG defines variable names per dataset class
• ‘--’ stands for the two-letter domain code, e.g. --STAT becomes LBSTAT in the LB domain
• Fragment names (Appendix D)
– Guideline for SUPP QNAMs and TESTCDs
Subject Identifiers (USUBJID & SUBJID)
• ‘Subject’ should be used where applicable
– consistent with the recommendation in FDA guidance
– refers generically to both patients and healthy volunteers
• USUBJID – Unique Subject Identifier across all studies
– must be unique for each trial participant (subject) across all trials in the submission
– no two (or more) subjects, across all trials in the submission, may have the same USUBJID
– the same person who participates in multiple clinical trials (when this is known) must be assigned the same USUBJID value in all trials
• SUBJID – Subject Identifier for the Study
USUBJID: uniquely identifies a subject across trials
SUBJID: uniquely identifies a subject within a trial
Sequence Variable (--SEQ)
• The Sequence Number (--SEQ)
– uniquely identifies a record for a given USUBJID within a domain
– required in all domains (except DM)
– conventions for values are sponsor-defined
– values may or may not be sequential, depending on data processes and sources
– necessary to link observations between domains, such as
• linking parent and supplemental qualifier observations
• relating records together (RELREC, CO)
Example: subject 1234 has 6 conmeds, numbered sequentially 1 through 6, so that each observation is uniquely identified for that USUBJID.
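As a minimal sketch of the numbering idea (not a prescribed method, since the conventions for --SEQ values are sponsor-defined), the Python below numbers records 1 through n within each USUBJID. The function name `assign_seq` and the sample conmed records are hypothetical.

```python
from collections import defaultdict

def assign_seq(records, seq_var="CMSEQ"):
    """Number records 1..n within each USUBJID, in the order given."""
    counters = defaultdict(int)
    numbered = []
    for rec in records:
        counters[rec["USUBJID"]] += 1
        # Copy the record so the input list is left untouched
        numbered.append({**rec, seq_var: counters[rec["USUBJID"]]})
    return numbered

# Hypothetical conmed records for two subjects
cm = [
    {"USUBJID": "STUDY1-1234", "CMTRT": "ASPIRIN"},
    {"USUBJID": "STUDY1-1234", "CMTRT": "IBUPROFEN"},
    {"USUBJID": "STUDY1-5678", "CMTRT": "PARACETAMOL"},
]
cm_numbered = assign_seq(cm)
```

Numbering restarts for each subject, so the pair (USUBJID, CMSEQ) identifies one record within the domain.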
Relationships and Linking
• --GRPID: relationships within a domain
• RELREC, CO: relationships across domains
• SUPP: non-standard questions
Relationships and Linking: --GRPID
Example: subject 1234 has 6 conmeds.
CMGRPID represents a relationship between observations within a domain:
– CMSEQ = 1, 2, 3 are related (Combination Therapy 1)
– CMSEQ = 4, 5, 6 are related (Combination Therapy 2)
Relationships and Linking: RELREC
Relationships across domains, e.g. a related Adverse Event and Disposition Event:
• RELID is the relationship identifier
• IDVAR and IDVARVAL identify the observations that are related
• RDOMAIN identifies the related domains
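To make the RELREC variables concrete, here is a small sketch that builds one RELREC row per related record. The helper `make_relrec`, the subject identifier and the sequence numbers are hypothetical, and only the variables named above plus USUBJID are shown; a real RELREC dataset carries further variables.

```python
def make_relrec(usubjid, relid, members):
    """Build one RELREC row per related record.

    members: list of (domain, idvar, idvarval) tuples.
    """
    return [
        {
            "USUBJID": usubjid,
            "RDOMAIN": domain,          # domain of the related record
            "IDVAR": idvar,             # variable that identifies the record
            "IDVARVAL": str(idvarval),  # its value, stored as text
            "RELID": relid,             # same value on every related row
        }
        for domain, idvar, idvarval in members
    ]

# Hypothetical link between an adverse event and a disposition event
relrec = make_relrec("STUDY1-1234", "1", [("AE", "AESEQ", 3), ("DS", "DSSEQ", 1)])
```

Rows that share a USUBJID and RELID belong to the same relationship.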
Relationships and Linking: CO
Relationships across domains:
• RDOMAIN identifies the related domains
• IDVAR and IDVARVAL identify the observations that are related
Relationships and Linking: SUPPs
The LB family: a parent dataset and its child
• Parent = LB
• Child = SUPPLB
• Link via LBSEQ
Relationship to non-standard questions
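The parent/child link can be sketched as a simple lookup: each SUPPLB row points back to its LB parent through IDVAR = LBSEQ and IDVARVAL. The function `attach_supp`, the sample records and the qualifier name LBCLSIG are illustrative assumptions, not taken from the slides.

```python
def attach_supp(parent, supp, idvar="LBSEQ"):
    """Return parent records with matching supplemental qualifiers added."""
    lookup = {}
    for q in supp:
        if q["IDVAR"] != idvar:
            continue
        key = (q["USUBJID"], q["IDVARVAL"])
        lookup.setdefault(key, {})[q["QNAM"]] = q["QVAL"]
    merged = []
    for rec in parent:
        # IDVARVAL is text, so compare against the text form of --SEQ
        key = (rec["USUBJID"], str(rec[idvar]))
        merged.append({**rec, **lookup.get(key, {})})
    return merged

# Hypothetical parent (LB) and child (SUPPLB) records
lb = [
    {"USUBJID": "STUDY1-1234", "LBSEQ": 1, "LBTESTCD": "GLUC"},
    {"USUBJID": "STUDY1-1234", "LBSEQ": 2, "LBTESTCD": "ALB"},
]
supplb = [
    {"USUBJID": "STUDY1-1234", "RDOMAIN": "LB", "IDVAR": "LBSEQ",
     "IDVARVAL": "1", "QNAM": "LBCLSIG", "QVAL": "Y"},
]
lb_full = attach_supp(lb, supplb)
```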
Controlled Terminology
• Certain variables are ‘controlled terms’
• Values come from a pre-defined list
• Represented in one of four ways in the IG
Date Formats
• Dates in SDTM are represented in ISO 8601 format
• Dates in SDTM are character variables, which allows partial dates
ISO 8601 format: YYYY-MM-DDThh:mm:ss
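Because the value is character, a partial date can simply stop at the last known component. The sketch below illustrates this for right-truncated values only; the function name `iso8601_partial` is hypothetical, and dates with a missing component in the middle are not handled.

```python
def iso8601_partial(year, month=None, day=None, hour=None, minute=None, second=None):
    """Build an ISO 8601 string, stopping at the first missing component."""
    date_parts = [f"{year:04d}"]
    for value in (month, day):
        if value is None:
            # Partial date: return what is known so far
            return "-".join(date_parts)
        date_parts.append(f"{value:02d}")
    date_text = "-".join(date_parts)
    time_parts = []
    for value in (hour, minute, second):
        if value is None:
            break
        time_parts.append(f"{value:02d}")
    if not time_parts:
        return date_text
    return date_text + "T" + ":".join(time_parts)
```

For example, a known year and month give `2013-10`, while a complete date and time give `2013-10-14T09:30:00`.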
Reference Start Date (RFSTDTC)
• Subject Reference Start Date (RFSTDTC)
– designated as Study Day 1
– usually the day the subject was first exposed to study drug
– the preceding date is designated as Study Day -1
– there is no Study Day 0
Timeline: … Day -n … Day -2, Day -1, Day 1 (RFSTDTC), Day 2 … Day n …
Study Day (--STDY)
• Sequential days relative to a reference point
• All Study Day values are integers
• Calculate Study Day:
– if --DTC is on or after RFSTDTC
• --DY = (date portion of --DTC) - (date portion of RFSTDTC) + 1
– if --DTC precedes RFSTDTC
• --DY = (date portion of --DTC) - (date portion of RFSTDTC)
Study Day (--STDY)
Example with RFSTDTC = 14OCT2013:
12OCT2013 = Day -2
13OCT2013 = Day -1
14OCT2013 = Day 1
15OCT2013 = Day 2
16OCT2013 = Day 3
No Study Day 0: the count goes straight from Day -1 to Day 1.
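The study day rule can be sketched in a few lines of Python. This is an illustration of the formula rather than a validated implementation; the function names are hypothetical, and returning no value for partial dates is an assumption made here for simplicity. The usage note assumes 14OCT2013 as RFSTDTC, as the dates in the example suggest.

```python
from datetime import date

def date_part(dtc):
    """Return the date portion of an ISO 8601 value, or None if it is partial."""
    if not dtc or len(dtc) < 10:
        return None
    return date.fromisoformat(dtc[:10])

def study_day(dtc, rfstdtc):
    """Apply the --DY rule: add 1 on or after RFSTDTC, nothing before it."""
    event, ref = date_part(dtc), date_part(rfstdtc)
    if event is None or ref is None:
        return None  # assumption: no study day for partial dates
    diff = (event - ref).days
    return diff + 1 if diff >= 0 else diff
```

With RFSTDTC = `2013-10-14`, the value `2013-10-13` gives -1 and `2013-10-15` gives 2, so 0 is never produced.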
Handling Text
• Casing
– Upper case (recommended)
– Exceptions include
• Comments / free text
• --TEST in Findings domains
• External dictionary text (e.g. MedDRA)
• Unit symbols (e.g. mg/dL)
• Free Text
– General comments: free text collected on a dedicated CRF page and/or related to one or more SDTM domains is stored in the CO (Comments) domain
– ‘Specify’ values: free-text responses to specific questions, for
• Result qualifier variables
• Non-result qualifier variables
• Topic variables
Handling Specify Text
• Result qualifier variables
• Non-result qualifier variables
• Topic variables
Remember: the maximum length for any variable in SDTM is 200 characters ($200)
Original & Standard Results
• --ORRES
– original result in a Findings domain (e.g. LB)
– an Expected variable, so it should be populated
– with the exception of
1. --STAT = ‘NOT DONE’ (Status variable)
2. --DRVFL = ‘Y’ (Result is derived)
• When --ORRES is populated
1. --STRESC (standard character result) must be populated
2. --STRESN (standard numeric result) should be populated when the result is numeric
• --STRESC is derived by converting the values in --ORRES to values in standard units
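A rough sketch of this derivation follows. The conversion table, the function `standardize` and the choice of centimetres as the standard unit are assumptions for illustration only; a real study would take its conversions from the sponsor's standards.

```python
# Hypothetical conversion table: original unit -> (standard unit, factor)
CONVERSIONS = {"in": ("cm", 2.54), "cm": ("cm", 1.0)}

def standardize(orres, orresu):
    """Derive (--STRESC, --STRESN, --STRESU) from --ORRES and --ORRESU."""
    try:
        value = float(orres)
    except (TypeError, ValueError):
        # Non-numeric result: carry the text over, leave --STRESN null
        return orres, None, None
    std_unit, factor = CONVERSIONS[orresu]
    stresn = round(value * factor, 2)
    # --STRESC holds the same result as text
    return f"{stresn:g}", stresn, std_unit
```

A height collected as 70 in would be stored with --STRESN of about 177.8 and --STRESU of cm, while a text result such as NEGATIVE is copied to --STRESC with --STRESN left null.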
Missing Values
• Missing values should be represented by nulls.
– Note: this is a change from previous versions of the SDTMIG, which allowed sponsors to define their own conventions for missing values.
• When groups of tests are not performed
• An individual missing --TESTCD will have --STAT = NOT DONE
Variable | Value | Example
--TESTCD | --ALL | LBALL
--TEST | Name of module | Labs Data
--CAT | Name of group of tests | Urinalysis
--ORRES | Null |
--STAT | NOT DONE | NOT DONE
--REASND | If collected | Not collected
Multiple Values
Type | Example | Action | Result
Intervention topic variable | TYLENOL AND BENADRYL | Split | CMTRT = TYLENOL; CMTRT = BENADRYL
Event topic variable | HEADACHE AND NAUSEA | Split | AETERM = HEADACHE; AETERM = NAUSEA
Findings result variable | ATRIAL FIBRILLATION AND ATRIAL FLUTTER | Split | EGORRES = ATRIAL FIBRILLATION; EGORRES = ATRIAL FLUTTER
Non-result qualifier | AE LOCATION, check all that apply | MULTIPLE with SUPP | AELOC = MULTIPLE; SUPPAE.QNAM = AELOC1, 2, … n
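The 'Split' action can be pictured as turning one collected record into several. The sketch below assumes the values are joined by the literal text " AND ", which is an assumption for illustration; real data usually needs manual review before splitting. The function name `split_topic` is hypothetical.

```python
def split_topic(record, topic_var, separator=" AND "):
    """Return one record per value found in the topic variable."""
    values = [v.strip() for v in record[topic_var].split(separator)]
    # Each new record keeps the other variables of the original
    return [{**record, topic_var: v} for v in values if v]

# Hypothetical conmed record holding two treatments
cm_record = {"USUBJID": "STUDY1-1234", "CMTRT": "TYLENOL AND BENADRYL"}
split_cm = split_topic(cm_record, "CMTRT")
```

Each resulting record then needs its own --SEQ value.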
Timing & Timepoint Variables
• --STRF
– used to identify the start of an observation relative to the sponsor-defined reference period
• --ENRF
– used to identify the end of an observation relative to the sponsor-defined reference period
• Reference period: RFSTDTC to RFENDTC
• Values: BEFORE, DURING, AFTER, DURING/AFTER, U (for unknown)
Timeline: BEFORE precedes RFSTDTC, DURING lies between RFSTDTC and RFENDTC, AFTER follows RFENDTC, and DURING/AFTER covers anything from RFSTDTC onwards.
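Purely to illustrate what the values mean, the sketch below classifies a complete date against the reference period. Treating the RFSTDTC and RFENDTC dates themselves as DURING, and returning U for partial dates, are assumptions made here; the function name is hypothetical.

```python
def relative_to_reference(dtc, rfstdtc, rfendtc):
    """Classify a complete ISO 8601 date against the reference period."""
    if not dtc or len(dtc) < 10:
        return "U"  # assumption: partial or missing dates are unknown
    day = dtc[:10]  # ISO 8601 dates sort correctly as text
    if day < rfstdtc[:10]:
        return "BEFORE"
    if day > rfendtc[:10]:
        return "AFTER"
    return "DURING"  # assumption: boundary dates count as DURING
```

DURING/AFTER is not produced here; it applies when all that is known is that the observation did not fall before the reference period.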
Timing & Timepoint Variables
• When to use --STRF and --ENRF?
1. When the CRF collects relative timing (such as before, during or after the study) in lieu of a date
2. Some sponsors may wish to derive --STRF and --ENRF for analysis or reporting purposes even when dates are collected.
*Sponsors are cautioned not to use --STRF and --ENRF for both (1) and (2), as it will blur the distinction between collected and derived values within the domain.
**Sponsors wishing to derive these values for reporting purposes are instead encouraged to use supplemental variables or analysis datasets for this derived data.
Timing & Timepoint Variables
• Represent timing information relative to a specific time point
– --STRTPT (Start Relative to Reference Time Point)
– --STTPT (Start Reference Time Point)
– --ENRTPT (End Relative to Reference Time Point)
– --ENTPT (End Reference Time Point)
Timeline: an observation is BEFORE, COINCIDENT with, or AFTER the time point given in --STTPT (e.g. date of withdrawal).
Timing & Timepoint Variables
Variable | Reference start | Reference end | Values
--STRF | RFSTDTC | RFENDTC | BEFORE, DURING, DURING/AFTER, AFTER, U
--ENRF | RFSTDTC | RFENDTC | BEFORE, DURING, DURING/AFTER, AFTER, U
--STRTPT | --STTPT | | BEFORE, COINCIDENT, AFTER, U
--ENRTPT | --ENTPT | | BEFORE, COINCIDENT, AFTER, ONGOING, U
Splitting Domains
• Why split domains?
– Size restrictions: the dataset exceeds size limitations
– Ease of use: store topically related observations together
• Considerations when splitting domains
– Split by category (--CAT), e.g. LBCAT = HEMATOLOGY, CHEMISTRY
– Split dataset names can be up to four characters in length, e.g. LBHM, LBCH
– The value of the DOMAIN variable is consistent across the separate datasets, e.g. DOMAIN = ‘LB’ in all
– Variables have the same attributes across the split datasets
– Permissible variables included in one split dataset need not be included in all split datasets
– --SEQ must be unique within USUBJID for all records across all the split datasets, and relate in the same way to SUPPs, CO, RELREC
Splitting Domains
In short, if you append the split datasets together, they should look and work exactly as if you had never split them!
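One way to test that idea is to append the split datasets and check the two rules that most often break: a consistent DOMAIN value and a --SEQ that stays unique within USUBJID. The function `check_split_domains` and the sample LBHM and LBCH records are hypothetical.

```python
def check_split_domains(*datasets, domain="LB", seq_var="LBSEQ"):
    """Append split datasets and report DOMAIN or --SEQ problems."""
    combined = [rec for ds in datasets for rec in ds]
    problems = []
    seen = set()
    for rec in combined:
        if rec["DOMAIN"] != domain:
            problems.append(f"unexpected DOMAIN {rec['DOMAIN']!r}")
        key = (rec["USUBJID"], rec[seq_var])
        if key in seen:
            problems.append(f"duplicate {seq_var} {key}")
        seen.add(key)
    return combined, problems

# Hypothetical split datasets: hematology and chemistry
lbhm = [{"DOMAIN": "LB", "USUBJID": "STUDY1-1234", "LBSEQ": 1, "LBCAT": "HEMATOLOGY"}]
lbch = [{"DOMAIN": "LB", "USUBJID": "STUDY1-1234", "LBSEQ": 2, "LBCAT": "CHEMISTRY"}]
lb_all, issues = check_split_domains(lbhm, lbch)
```

An empty list of problems means the appended data behaves like a single LB dataset with respect to these two checks.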
What we covered
• Naming Conventions
• Subject Identifiers
• Sequence Variable
• Relationships and Linking
• Controlled Terminology
• Date Formats
• Reference Start Date & Study Days
• Handling Text
• Original & Standard Results
• Missing & Multiple Values
• Timing and Timepoints
• Splitting Domains