
IS03: An Introduction to SDTM – Part II

Jennie Mc Guirk

SDTM Framework

1. Where should the data go?

2. What type of information should it contain?

3. What is the minimum information needed?

SDTM Framework: Re-cap

•  Data Class: General Observation, Special Purpose, Relationship, Trial Design
•  Variable Role: Identifier, Topic, Timing, Qualifier
•  Core Variables: Required, Expected, Permissible

SDTM Small print

•  Naming Conventions
•  Subject Identifiers
•  Sequence Variable
•  Relationships and Linking
•  Controlled Terminology
•  Date Formats
•  Reference Start Date & Study Days
•  Handling Text
•  Original & Standard Results
•  Missing & Multiple Values
•  Timing and Timepoints
•  Splitting Domains

Naming Conventions - Datasets

•  2-letter code, with exceptions:
–  Split domains
–  SUPP & RELREC
•  SDTM IG Appendix C2: 30 SDTM reserved codes
–  Events (5): AE, CE, MH, DS, DV
–  Findings (12): EG, IE, LB, PC, PE, PP, QS, VS, DA, MB, MS, SC
–  Findings About (1): FA
–  Interventions (3): CM, EX, SU
–  Trial Design (5): TA, TE, TI, TS, TV
–  Special Purpose (4): CO, DM, SE, SV
•  1 reserved code relating to analysis datasets (AD)
•  SDTM IG has also reserved the codes X-, Y-, and Z- for sponsor-defined (custom) domains

Naming Conventions - Variables

•  8-character limit for variable names (40-character limit for labels)
•  IG defines variable names per dataset class, where ‘--’ indicates the two-letter domain code
•  Fragment names (Appendix D)
–  Guideline for SUPP QNAMs and TESTCDs
Example: in the LB domain, --STAT becomes LBSTAT
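The naming rules on this slide can be checked mechanically. The sketch below is illustrative only: the helper names `domain_variable` and `check_variable` are hypothetical, and the character rule (upper-case letters, digits and underscores, starting with a letter) is an assumption that the slide does not state.

```python
import re

MAX_NAME_LEN = 8    # variable name limit from the slide
MAX_LABEL_LEN = 40  # variable label limit from the slide


def domain_variable(domain: str, template: str) -> str:
    """Replace the leading '--' placeholder with a two-letter domain code."""
    if not template.startswith("--"):
        return template
    return domain + template[2:]


def check_variable(name: str, label: str) -> list:
    """Return a list of naming problems; an empty list means no problem was found."""
    problems = []
    if len(name) > MAX_NAME_LEN:
        problems.append(f"name '{name}' is longer than {MAX_NAME_LEN} characters")
    # Assumed character rule, not taken from the slide.
    if not re.fullmatch(r"[A-Z][A-Z0-9_]*", name):
        problems.append(f"name '{name}' contains unexpected characters")
    if len(label) > MAX_LABEL_LEN:
        problems.append(f"label for '{name}' is longer than {MAX_LABEL_LEN} characters")
    return problems


print(domain_variable("LB", "--STAT"))                # LBSTAT
print(check_variable("LBSTAT", "Completion Status"))  # []
```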

Subject Identifiers (USUBJID & SUBJID)

•  ‘Subject’ should be used where applicable
–  consistent with the recommendation in FDA guidance
–  generically refers to both patients and healthy volunteers
•  USUBJID
–  Unique Subject Identifier across all studies
–  must be unique for each trial participant (subject) across all trials in the submission
–  no two (or more) subjects, across all trials in the submission, may have the same USUBJID
–  the same person who participates in multiple clinical trials (when this is known) must be assigned the same USUBJID value in all trials
•  SUBJID
–  Subject Identifier for the Study
In summary:
•  USUBJID uniquely identifies a subject across trials
•  SUBJID uniquely identifies a subject within a trial

Sequence Variable (--SEQ)

•  The Sequence Number (--SEQ)
–  uniquely identifies a record for a given USUBJID within a domain
–  required in all domains (except DM)
–  Conventions for values are sponsor-defined
–  Values may or may not be sequential depending on data processes and sources
–  Necessary to link observations between domains, such as
   •  Linking parent and supplemental qualifier observations
   •  Relating records together (RELREC, CO)
Example: Subject 1234 has 6 conmeds, with sequence numbers 1 through 6 uniquely identifying each observation for that USUBJID
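One way to produce such values is a running counter per subject. This is only a sketch of one sponsor-defined convention (the slide notes that values need not be sequential). The function name `assign_seq` and the placeholder medication names are hypothetical.

```python
from collections import defaultdict


def assign_seq(records, seq_var="CMSEQ"):
    """Number records 1..n within each USUBJID, in the order they arrive."""
    counters = defaultdict(int)
    numbered = []
    for rec in records:
        counters[rec["USUBJID"]] += 1
        numbered.append({**rec, seq_var: counters[rec["USUBJID"]]})
    return numbered


# Six conmed records for subject 1234 (placeholder treatment names).
cm = assign_seq([{"USUBJID": "1234", "CMTRT": f"MED {i}"} for i in range(1, 7)])
print([rec["CMSEQ"] for rec in cm])  # [1, 2, 3, 4, 5, 6]
```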

Relationships and Linking

•  --GRPID: relationships within a domain
•  RELREC, CO: relationships across domains
•  SUPP: non-standard questions

Relationships and Linking: --GRPID

Example: Subject 1234, 6 conmeds
CMGRPID represents a relationship between observations within a domain.
•  CMSEQ = 1, 2, 3 are related (Combination Therapy 1)
•  CMSEQ = 4, 5, 6 are related (Combination Therapy 2)

Relationships and Linking: RELREC

Example: a related Adverse Event and Disposition Event (relationship across domains)
•  RELID is the relationship identifier
•  IDVAR & IDVARVAL identify the observations that are related
•  RDOMAIN identifies the related domains

Relationships and Linking: CO

Comments linked to records in other domains (relationship across domains)
•  RDOMAIN identifies the related domain
•  IDVAR & IDVARVAL identify the observations that are related

Relationships and Linking: SUPPs

Example: the LB family (relationship to non-standard questions)
•  Parent = LB
•  Child = SUPPLB
•  Link via LBSEQ
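The parent/child link can be illustrated by attaching SUPPLB qualifiers back onto their LB parent records. This is a minimal sketch, not a validated implementation. `merge_supp` is a hypothetical helper, the qualifier name `LBREVIEW` is invented for illustration, and it assumes every supplemental record links through `IDVAR = LBSEQ` with `IDVARVAL` stored as text.

```python
def merge_supp(parent, supp, seq_var="LBSEQ"):
    """Attach each QNAM/QVAL pair to the parent record with the matching USUBJID and --SEQ."""
    # IDVARVAL is character, so the parent sequence number is compared as text.
    index = {(rec["USUBJID"], str(rec[seq_var])): dict(rec) for rec in parent}
    for qual in supp:
        if qual["IDVAR"] != seq_var:
            continue  # other linking variables are out of scope for this sketch
        key = (qual["USUBJID"], qual["IDVARVAL"])
        if key in index:
            index[key][qual["QNAM"]] = qual["QVAL"]
    return list(index.values())


lb = [
    {"USUBJID": "1234", "LBSEQ": 1, "LBTESTCD": "GLUC"},
    {"USUBJID": "1234", "LBSEQ": 2, "LBTESTCD": "ALT"},
]
supplb = [
    {"RDOMAIN": "LB", "USUBJID": "1234", "IDVAR": "LBSEQ",
     "IDVARVAL": "2", "QNAM": "LBREVIEW", "QVAL": "Y"},
]
merged = merge_supp(lb, supplb)
print(merged[1]["LBREVIEW"])  # Y
```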

Controlled Terminology

•  Certain variables are ‘controlled terms’
•  Values from a pre-defined list
•  Represented 1 of 4 ways in the IG

Date Formats

•  Dates in SDTM are represented in ISO 8601 format
•  Dates in SDTM are character, enabling partial dates
ISO 8601 format: YYYY-MM-DDThh:mm:ss
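Because the value is character, a partial date can simply stop at the last known component. The sketch below shows that idea under a simplifying assumption: missing components are only ever at the right-hand end (a known day with an unknown month, for instance, is not handled). `to_dtc` is a hypothetical helper name.

```python
def to_dtc(year=None, month=None, day=None, hour=None, minute=None, second=None):
    """Build an ISO 8601 character date/time, truncated at the first missing component."""
    date_parts = []
    for value, width in ((year, 4), (month, 2), (day, 2)):
        if value is None:
            break
        date_parts.append(f"{value:0{width}d}")
    result = "-".join(date_parts)
    # A time portion is only added when the date portion is complete.
    if len(date_parts) == 3:
        time_parts = []
        for value in (hour, minute, second):
            if value is None:
                break
            time_parts.append(f"{value:02d}")
        if time_parts:
            result += "T" + ":".join(time_parts)
    return result


print(to_dtc(2013, 10, 14, 9, 30, 0))  # 2013-10-14T09:30:00
print(to_dtc(2013, 10))                # 2013-10
```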

Reference Start Date (RFSTDTC)

•  Subject Reference Start Date (RFSTDTC)
–  designated as Study Day 1
–  usually relates to the day the subject was first exposed to study drug
–  the preceding date is designated as Study Day -1
–  there is no Study Day 0
Timeline: Day -n … Day -2, Day -1, Day 1 (RFSTDTC), Day 2 … Day n

Study Day (--DY)

•  sequential days relative to a reference point
•  all Study Day values are integers
•  Calculate Study Day:
–  if --DTC is on or after RFSTDTC: --DY = (date portion of --DTC) - (date portion of RFSTDTC) + 1
–  if --DTC precedes RFSTDTC: --DY = (date portion of --DTC) - (date portion of RFSTDTC)
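The two-branch rule above translates directly into code. This is a minimal sketch: `study_day` is a hypothetical helper, and returning `None` for a partial date is an assumption about how incomplete values might be treated, not something stated on the slide.

```python
from datetime import date


def study_day(dtc, rfstdtc):
    """Return --DY for an ISO 8601 --DTC relative to RFSTDTC, or None if either date is partial."""
    if len(dtc) < 10 or len(rfstdtc) < 10:
        return None  # assumed handling: no study day without a complete date
    observed = date.fromisoformat(dtc[:10])       # date portion of --DTC
    reference = date.fromisoformat(rfstdtc[:10])  # date portion of RFSTDTC
    delta = (observed - reference).days
    # Add 1 on or after the reference date so that there is no Study Day 0.
    return delta + 1 if delta >= 0 else delta


for dtc in ("2013-10-13", "2013-10-14", "2013-10-15T08:00"):
    print(dtc, study_day(dtc, "2013-10-14"))
# 2013-10-13 -1
# 2013-10-14 1
# 2013-10-15T08:00 2
```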

Study Day (--DY)

Example with RFSTDTC = 14OCT2013:
•  12OCT2013: Day -2
•  13OCT2013: Day -1
•  14OCT2013: Day 1 (RFSTDTC)
•  15OCT2013: Day 2
•  16OCT2013: Day 3
No Study Day 0!!

Handling Text

•  Casing
–  Upper case (recommended)
–  Exceptions include
   •  Comments / Free text
   •  --TEST in Findings domains
   •  External dictionary text (e.g. MedDRA)
   •  Unit symbols (e.g. mg/dL)
•  Free Text
–  General Comments: free text collected on a dedicated CRF page and/or related to one or more SDTM domains is stored within the CO (Comments) domain
–  ‘Specify’ values (free text responses to specific questions) for
   •  Result qualifier variables
   •  Non-result qualifier variables
   •  Topic variables

Handling Specify Text

•  Result qualifier variables
•  Non-result qualifier variables
•  Topic variables
Remember: the length limit for all character variables in SDTM is 200 characters ($200)

Original & Standard Results

•  --ORRES
–  original result in a Findings domain (e.g. LB)
–  Expected variable, should be populated
–  With the exception of
   1.  --STAT = ‘NOT DONE’ (Status variable)
   2.  --DRVFL = ‘Y’ (Result is derived)
•  When --ORRES is populated
   1.  --STRESC (std character result) must be populated
   2.  --STRESN (std numeric result) should be populated when the result is numeric
•  --STRESC is derived by conversion of values in --ORRES to values with standard units
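These population rules lend themselves to a simple record-level check. The sketch below is illustrative rather than a full conformance check: `check_result` is a hypothetical helper, and treating a result as numeric whenever `float()` accepts it is an assumption.

```python
def _is_number(text):
    """Assumed test for a numeric result: anything float() can parse."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def check_result(rec, prefix="LB"):
    """Return a list of issues with --ORRES / --STRESC / --STRESN for one Findings record."""
    orres = rec.get(prefix + "ORRES")
    stresc = rec.get(prefix + "STRESC")
    stresn = rec.get(prefix + "STRESN")
    not_done = rec.get(prefix + "STAT") == "NOT DONE"
    derived = rec.get(prefix + "DRVFL") == "Y"
    issues = []
    if not orres:
        # A null original result is acceptable only for the two exceptions.
        if not (not_done or derived):
            issues.append(prefix + "ORRES is null without NOT DONE status or derived flag")
        return issues
    if not stresc:
        issues.append(prefix + "STRESC is null although " + prefix + "ORRES is populated")
    elif _is_number(stresc) and stresn is None:
        issues.append(prefix + "STRESN is null for a numeric result")
    return issues


print(check_result({"LBORRES": "5.4", "LBSTRESC": "5.4", "LBSTRESN": 5.4}))  # []
print(check_result({"LBSTAT": "NOT DONE"}))                                  # []
```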

Missing Values

•  Missing values should be represented by nulls.
–  Note: This is a change from previous versions of the SDTMIG, which allowed sponsors to define their own conventions for missing values.
•  An individual test that was not performed has --STAT = NOT DONE for its --TESTCD
•  When a group of tests is not performed, the record is populated as follows:
–  --TESTCD: --ALL (e.g. LBALL)
–  --TEST: name of module (e.g. Labs Data)
–  --CAT: name of group of tests (e.g. Urinalysis)
–  --ORRES: null
–  --STAT: NOT DONE
–  --REASND: populated if collected (e.g. Not collected)

Multiple Values

•  Intervention topic variable, e.g. TYLENOL AND BENADRYL
–  Action: split
–  Result: CMTRT = TYLENOL, CMTRT = BENADRYL
•  Event topic variable, e.g. HEADACHE AND NAUSEA
–  Action: split
–  Result: AETERM = HEADACHE, AETERM = NAUSEA
•  Findings result variable, e.g. ATRIAL FIBRILLATION AND ATRIAL FLUTTER
–  Action: split
–  Result: EGORRES = ATRIAL FIBRILLATION, EGORRES = ATRIAL FLUTTER
•  Non-result qualifier, e.g. AE LOCATION (check all that apply)
–  Action: MULTIPLE with SUPP
–  Result: AELOC = MULTIPLE, SUPPAE.QNAM = AELOC1, 2, …n
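The "split" action for topic variables can be sketched as below. This is only an illustration: `split_topic` is a hypothetical helper, and splitting on the literal text " AND " is an assumption. In practice a collected term may legitimately contain "AND", so such splits usually need review.

```python
def split_topic(record, topic_var, separator=" AND "):
    """Return one copy of the record per value found in the topic variable."""
    values = [value.strip() for value in record[topic_var].split(separator)]
    return [{**record, topic_var: value} for value in values if value]


rows = split_topic({"USUBJID": "1234", "CMTRT": "TYLENOL AND BENADRYL"}, "CMTRT")
print([row["CMTRT"] for row in rows])  # ['TYLENOL', 'BENADRYL']
```

Sequence numbers (--SEQ) would need to be assigned after the split so that each resulting record stays unique within the subject.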

Timing & Timepoint Variables

•  --STRF
–  used to identify the start of an observation relative to the sponsor-defined reference period
•  --ENRF
–  used to identify the end of an observation relative to the sponsor-defined reference period
•  Reference period: RFSTDTC to RFENDTC
•  Values: BEFORE, DURING, AFTER, DURING/AFTER, U (for unknown)
Timeline: BEFORE precedes RFSTDTC; DURING falls between RFSTDTC and RFENDTC; AFTER follows RFENDTC; DURING/AFTER covers both DURING and AFTER


•  When to use --STRF and --ENRF?
1.  When the CRF collects relative timing information in lieu of a date
2.  Some sponsors may wish to derive --STRF and --ENRF for analysis or reporting purposes even when dates are collected.
*Sponsors are cautioned not to use --STRF & --ENRF for both (1) and (2), as it will blur the distinction between collected and derived values within the domain.
**Sponsors wishing to derive for reporting purposes are instead encouraged to use supplemental variables or analysis datasets for this derived data.
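For case (2), a derivation from collected dates might look like the sketch below. It rests on several assumptions that are not stated on the slides: dates on the reference period boundaries count as DURING, an incomplete date yields U, and DURING/AFTER is never derived. `relative_to_reference_period` is a hypothetical helper, and per the caution above its output would belong in a supplemental variable or analysis dataset.

```python
def relative_to_reference_period(dtc, rfstdtc, rfendtc):
    """Classify an ISO 8601 date against the reference period RFSTDTC to RFENDTC."""
    if min(len(dtc), len(rfstdtc), len(rfendtc)) < 10:
        return "U"  # assumed handling of partial dates
    # Complete ISO 8601 dates sort correctly as plain strings.
    day, start, end = dtc[:10], rfstdtc[:10], rfendtc[:10]
    if day < start:
        return "BEFORE"
    if day > end:
        return "AFTER"
    return "DURING"


print(relative_to_reference_period("2013-10-10", "2013-10-14", "2013-11-14"))  # BEFORE
print(relative_to_reference_period("2013-10-20", "2013-10-14", "2013-11-14"))  # DURING
```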


•  Represent timing information relative to a specific time point
–  --STRTPT (Start Relative to Reference Time Point)
–  --STTPT (Start Reference Time Point)
–  --ENRTPT (End Relative to Reference Time Point)
–  --ENTPT (End Reference Time Point)
Timeline: relative to --STTPT (e.g. date of withdrawal), an observation is BEFORE, COINCIDENT or AFTER


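Summary of variables, references and values:
•  --STRF: reference start RFSTDTC, reference end RFENDTC; values BEFORE, DURING, DURING/AFTER, AFTER, U
•  --ENRF: reference start RFSTDTC, reference end RFENDTC; values BEFORE, DURING, DURING/AFTER, AFTER, U
•  --STRTPT: reference --STTPT; values BEFORE, COINCIDENT, AFTER, U
•  --ENRTPT: reference --ENTPT; values BEFORE, COINCIDENT, AFTER, ONGOING, U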

Splitting Domains

•  Why split domains?
–  Size restrictions: exceeds limitations
–  Ease of use: store topically related observations together
•  Considerations when splitting domains
–  Split by category (--CAT), e.g. LBCAT = HEMATOLOGY, CHEMISTRY
–  Split dataset names can be up to four characters in length, e.g. LBHM, LBCH
–  Value of the DOMAIN variable is consistent across the separate datasets, e.g. DOMAIN = ‘LB’ in all
–  Variables have the same attributes across the split domains
–  Permissible variables included in one split dataset need not be included in all split datasets
–  --SEQ must be unique within USUBJID for all records across all the split datasets, and relate in the same way to SUPPs, CO, RELREC

Splitting Domains

In short, if you append the split domains together, they should look and work exactly as if you had never split them!
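That "append and compare" idea can be turned into a quick check. The sketch below is a hedged illustration: the helper names are hypothetical and the LBHM/LBCH records are invented sample data. It appends the split datasets and looks for repeated USUBJID/--SEQ pairs and for inconsistent DOMAIN values.

```python
from collections import Counter


def duplicate_seq_keys(split_datasets, seq_var="LBSEQ"):
    """Return (USUBJID, --SEQ) pairs that occur more than once across the appended splits."""
    combined = [rec for dataset in split_datasets for rec in dataset]
    counts = Counter((rec["USUBJID"], rec[seq_var]) for rec in combined)
    return sorted(key for key, n in counts.items() if n > 1)


def domain_values(split_datasets):
    """Return the set of DOMAIN values found across the splits (ideally a single value)."""
    return {rec["DOMAIN"] for dataset in split_datasets for rec in dataset}


lbhm = [{"DOMAIN": "LB", "USUBJID": "1234", "LBSEQ": 1},
        {"DOMAIN": "LB", "USUBJID": "1234", "LBSEQ": 2}]
lbch = [{"DOMAIN": "LB", "USUBJID": "1234", "LBSEQ": 3}]

print(duplicate_seq_keys([lbhm, lbch]))  # []
print(domain_values([lbhm, lbch]))       # {'LB'}
```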


What we covered

•  Naming Conventions
•  Subject Identifiers
•  Sequence Variable
•  Relationships and Linking
•  Controlled Terminology
•  Date Formats
•  Reference Start Date & Study Days
•  Handling Text
•  Original & Standard Results
•  Missing & Multiple Values
•  Timing and Timepoints
•  Splitting Domains
