Gregory Steffens
Associate Director, Programming
Novartis
Row-Level Metadata
Why Do We Need Row-Level Metadata?
| Presentation Title | Presenter Name | Date | Subject | Business Use Only2
If we know why, we will know how to design it and when to use it
A requirement for describing tall-thin data sets in studies and in data standards
• Storing data in --TESTCD --ORRES kinds of data sets requires more than the simple metadata that can describe data sets and variables
• These data sets have several variables that the simple metadata cannot describe, including ORRES, ORRESU, STRESN, STRESC, STRESU, STRESPOS, etc.
• In the simpler world these test results and attributes would be stored in short-wide data sets in variables like HEIGHT, HEIGHT_UNIT, WEIGHT, WEIGHT_UNIT, SYSBP, SYSBP_UNIT, SYSBP_POS
• Storing these test results in ORRES kinds of variables does not mean we need less metadata; fewer variables does not mean less metadata. ORRES contains many virtual variables that we need to describe just as if they were in a simple short-wide data set.
A prerequisite for software to transform data for reporting purposes
An Example of a Short-Wide Data Set
A variable for each result and result unit

| USUBJID | HEIGHT | HEIGHTU | WEIGHT | WEIGHTU | BMI | BMIU | SEX |
|---|---|---|---|---|---|---|---|
| 1 | 74 | IN | 190 | LBS | 24.39 | KG/M**2 | MALE |
Some of the Metadata to Describe Short-Wide Data
A simple description of the attributes of these variables
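
| USUBJID | HEIGHT | HEIGHTU | WEIGHT | WEIGHTU | BMI | BMIU | SEX |
|---|---|---|---|---|---|---|---|
| 1 | 74 | IN | 190 | LBS | 24.39 | KG/M**2 | MALE |

| TABLE | COLUMN | CTYPE | CLENGTH | CLABEL | CFORMAT | CDERIVATION |
|---|---|---|---|---|---|---|
| VS | USUBJID | C | 15 | Subject ID | USUBID | |
| VS | HEIGHT | N | 8 | Subj Height | 2.0 | |
| VS | HEIGHTU | C | 10 | Height Unit | HTU | |
| VS | WEIGHT | N | 8 | Subj Weight | 3.0 | |
| VS | WEIGHTU | C | 12 | Weight Unit | WTU | |
| VS | BMI | N | 8 | Subj BMI | 5.2 | BMIFORMULA |
| VS | BMIU | C | 7 | BMI Unit | BMIU | BMIUNIT |
| VS | SEX | C | 6 | Subj Gender | SEX | |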
Same Values in a Tall-Thin Data Set
Results now all in 1 variable and units in 1 other variable

| USUBJID | VSTESTCD | VSORRES | VSORRESU | SEX |
|---|---|---|---|---|
| 1 | HEIGHT | 74 | IN | MALE |
| 1 | WEIGHT | 190 | LBS | MALE |
| 1 | BMI | 24.39 | KG/M**2 | MALE |
Some Metadata to Describe the Tall-Thin Data Set
Row-level metadata must define all the attributes of a variable but for a subset of the rows defined by each unique value of xxTESTCD
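
| USUBJID | VSTESTCD | VSORRES | VSORRESU | SEX |
|---|---|---|---|---|
| 1 | HEIGHT | 74 | IN | MALE |
| 1 | WEIGHT | 190 | LBS | MALE |
| 1 | BMI | 24.39 | KG/M**2 | MALE |

| TABLE | COLUMN | PARAM | PARAMREL | CTYPE | CLENGTH | CLABEL | CFORMAT | CDERIVATION |
|---|---|---|---|---|---|---|---|---|
| VS | VSTESTCD | HEIGHT | VSORRES | N | 8 | Subj Height | 2.0 | |
| VS | VSTESTCD | HEIGHT | VSORRESU | C | 10 | Height Unit | HTU | |
| VS | VSTESTCD | WEIGHT | VSORRES | N | 8 | Subj Weight | 3.0 | |
| VS | VSTESTCD | WEIGHT | VSORRESU | C | 12 | Weight Unit | WTU | |
| VS | VSTESTCD | BMI | VSORRES | N | 8 | Subj BMI | 5.2 | BMIFORMULA |
| VS | VSTESTCD | BMI | VSORRESU | C | 7 | BMI Unit | BMIU | BMIUNIT |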
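
To picture how software might consume this kind of table, here is a minimal Python sketch. It assumes the row-level metadata has been loaded as a list of dictionaries keyed by the column names on the slide. The function name `virtual_variable_attrs` is hypothetical, and the placement of the CFORMAT and CDERIVATION values is read from the slide layout rather than confirmed by the source.

```python
# Row-level metadata from the slide, one tuple per virtual variable.
FIELDS = ("TABLE", "COLUMN", "PARAM", "PARAMREL",
          "CTYPE", "CLENGTH", "CLABEL", "CFORMAT", "CDERIVATION")
ROWS = [
    ("VS", "VSTESTCD", "HEIGHT", "VSORRES",  "N", 8,  "Subj Height", "2.0",  ""),
    ("VS", "VSTESTCD", "HEIGHT", "VSORRESU", "C", 10, "Height Unit", "HTU",  ""),
    ("VS", "VSTESTCD", "WEIGHT", "VSORRES",  "N", 8,  "Subj Weight", "3.0",  ""),
    ("VS", "VSTESTCD", "WEIGHT", "VSORRESU", "C", 12, "Weight Unit", "WTU",  ""),
    ("VS", "VSTESTCD", "BMI",    "VSORRES",  "N", 8,  "Subj BMI",    "5.2",  "BMIFORMULA"),
    ("VS", "VSTESTCD", "BMI",    "VSORRESU", "C", 7,  "BMI Unit",    "BMIU", "BMIUNIT"),
]
COLUMNS_PARAM = [dict(zip(FIELDS, row)) for row in ROWS]


def virtual_variable_attrs(metadata, table, column, param, paramrel):
    """Return the attributes of one virtual variable.

    Example: the attributes of VSORRES on the rows where VSTESTCD = 'HEIGHT'.
    (Hypothetical helper, not part of any tool named in the slides.)
    """
    matches = [
        row for row in metadata
        if (row["TABLE"], row["COLUMN"], row["PARAM"], row["PARAMREL"])
        == (table, column, param, paramrel)
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one metadata row for "
            f"{table}.{paramrel} where {column}={param}, found {len(matches)}"
        )
    return matches[0]


height = virtual_variable_attrs(COLUMNS_PARAM, "VS", "VSTESTCD", "HEIGHT", "VSORRES")
print(height["CLABEL"], height["CTYPE"], height["CFORMAT"])
```

The lookup key is the full combination of TABLE, COLUMN, PARAM and PARAMREL, which mirrors the idea that each virtual variable needs its own complete attribute row.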
Categories of Variables in Tall-Thin Data Sets
Metadata must fully describe all the attributes of all the categories

| USUBJID | VSTESTCD | VSORRES | VSORRESU | SEX |
|---|---|---|---|---|
| 1 | HEIGHT | 74 | IN | MALE |
| 1 | WEIGHT | 190 | LBS | MALE |
| 1 | BMI | 24.39 | KG/M**2 | MALE |

| VARIABLE TYPE | EXAMPLES |
|---|---|
| PRIMARY KEYS | USUBJID, VSTESTCD |
| PARAMETER VARIABLE NAME | VSTESTCD (the last pkey) |
| PARAMETER VARIABLE VALUES | HEIGHT, WEIGHT, BMI |
| PARAMETER-RELATED | VSORRES, VSORRESU, VSSTRESN, VSSTRESC, VSSTRESU, VSPOS, VSLOC |
| PARAMETER-NONRELATED | SEX |
What row-level metadata is NOT!
Not meant to define other relationships in study metadata
NOT a list of values; a ValueList is not simply a list of values
Row-level metadata is not designed to define all the other relationships between study variables
It is designed as metadata, i.e. to describe the ItemDef attributes of virtual variables. That is, to describe the attributes of parameter-related variables for each value of --TESTCD
It should not be used for non-metadata purposes
• NOT to define the height unit of measure as being inches in the USA but centimeters in the EU
• NOT to look for males with positive pregnancy test results
• NOT to define all the edit checks. Edit checks can be data driven, but NOT by row-level metadata, which is inadequate for this task because it only enables single-domain where conjuncts
Problem Solved
Metadata and a pair of macros enable easy transformation of data
Transforming data between short-wide and tall-thin data sets is now a very simple macro call
%dt_wide2thin(data=vitals,out=vs,mdlib=md)
%dt_thin2wide(data=vs,out=vitals,mdlib=md)
Neither the tall-thin nor the short-wide data structure is ideal for every use: summary tables, listings, deriving new parameter results from multiple parameter results, comparing parameter results, etc.
Tall-thin is better for storage and summary tables
Short-wide is better for listings, deriving, and comparing
Define file and data transparency achieved
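
The slides do not show how `%dt_wide2thin` and `%dt_thin2wide` work internally. As an illustration of the general idea only, the following Python sketch drives both directions of the transformation from a small metadata table. `PARAM_MAP`, its `WIDENAME` attribute, and the function names `wide2thin` and `thin2wide` are all assumptions made for this example; how the real macros link a virtual variable to its short-wide variable name is not stated in the source.

```python
# Hypothetical metadata: which short-wide variable holds each virtual variable.
# WIDENAME is an assumed attribute, not one shown on the slides.
PARAM_MAP = [
    {"PARAM": "HEIGHT", "PARAMREL": "VSORRES",  "WIDENAME": "HEIGHT"},
    {"PARAM": "HEIGHT", "PARAMREL": "VSORRESU", "WIDENAME": "HEIGHTU"},
    {"PARAM": "WEIGHT", "PARAMREL": "VSORRES",  "WIDENAME": "WEIGHT"},
    {"PARAM": "WEIGHT", "PARAMREL": "VSORRESU", "WIDENAME": "WEIGHTU"},
    {"PARAM": "BMI",    "PARAMREL": "VSORRES",  "WIDENAME": "BMI"},
    {"PARAM": "BMI",    "PARAMREL": "VSORRESU", "WIDENAME": "BMIU"},
]
PARAM_VAR = "VSTESTCD"   # parameter variable name
KEYS = ["USUBJID"]       # primary keys other than the parameter variable
NONRELATED = ["SEX"]     # parameter-nonrelated variables


def wide2thin(wide_rows):
    """Emit one tall-thin row per input row and parameter value."""
    params = list(dict.fromkeys(m["PARAM"] for m in PARAM_MAP))  # metadata order
    thin_rows = []
    for wide in wide_rows:
        for param in params:
            thin = {k: wide[k] for k in KEYS}
            thin[PARAM_VAR] = param
            for m in PARAM_MAP:
                if m["PARAM"] == param:
                    thin[m["PARAMREL"]] = wide[m["WIDENAME"]]
            thin.update({v: wide[v] for v in NONRELATED})
            thin_rows.append(thin)
    return thin_rows


def thin2wide(thin_rows):
    """Emit one short-wide row per unique combination of the non-parameter keys."""
    wide_by_key = {}
    for thin in thin_rows:
        key = tuple(thin[k] for k in KEYS)
        wide = wide_by_key.setdefault(key, {k: thin[k] for k in KEYS})
        for m in PARAM_MAP:
            if m["PARAM"] == thin[PARAM_VAR]:
                wide[m["WIDENAME"]] = thin[m["PARAMREL"]]
        wide.update({v: thin[v] for v in NONRELATED})
    return list(wide_by_key.values())


vitals = [{"USUBJID": "1", "HEIGHT": 74, "HEIGHTU": "IN", "WEIGHT": 190,
           "WEIGHTU": "LBS", "BMI": 24.39, "BMIU": "KG/M**2", "SEX": "MALE"}]
vs = wide2thin(vitals)
assert thin2wide(vs) == vitals  # the round trip recovers the original row
```

The sketch keeps each result in its original Python type. A real VSORRES column has a single storage type, so a fuller version would also need type conversion driven by CTYPE, which is omitted here.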
Variable Categories Described
Primary Keys
• Defined in the COLUMNS metadata set
Parameter Name
• A special primary key that defines the kind of result for the current row
• Defined in the COLUMNS_PARAM metadata set
Parameter Value
Parameter-related
• Each non-key variable whose attributes differ across rows but are the same within the subset of rows defined by one value of the parameter variable xxTESTCD. These are “virtual variables”.
• Defined in the COLUMNS_PARAM metadata set
Parameter-nonrelated
• Each non-key variable whose attributes do not differ across rows and are not dependent on the parameter variable
• Defined in the COLUMNS metadata set
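
The categories above can be derived mechanically from the two metadata sets. The Python sketch below is one way to do that, under stated assumptions: the primary-key list is passed in directly (the slides do not show how keys are flagged in COLUMNS), the COLUMNS_PARAM rows are reduced to the three fields needed here, and the function name `categorize` is hypothetical. Because the parameter name is itself a primary key, the sketch reports it under the more specific category.

```python
def categorize(variables, primary_keys, columns_param):
    """Assign each variable of a tall-thin data set to one category."""
    # The parameter variable is whatever COLUMNS_PARAM names in COLUMN.
    param_names = {row["COLUMN"] for row in columns_param}
    # Parameter-related variables are those listed in PARAMREL.
    param_related = {row["PARAMREL"] for row in columns_param}

    categories = {}
    for var in variables:
        if var in param_names:
            categories[var] = "parameter name"
        elif var in primary_keys:
            categories[var] = "primary key"
        elif var in param_related:
            categories[var] = "parameter-related"
        else:
            categories[var] = "parameter-nonrelated"
    return categories


# Reduced COLUMNS_PARAM rows for the VS example (other attributes omitted).
columns_param = [
    {"COLUMN": "VSTESTCD", "PARAM": param, "PARAMREL": paramrel}
    for param in ("HEIGHT", "WEIGHT", "BMI")
    for paramrel in ("VSORRES", "VSORRESU")
]

result = categorize(
    variables=["USUBJID", "VSTESTCD", "VSORRES", "VSORRESU", "SEX"],
    primary_keys=["USUBJID", "VSTESTCD"],
    columns_param=columns_param,
)
for name, category in result.items():
    print(f"{name}: {category}")
```

The parameter values (HEIGHT, WEIGHT, BMI) are not variables, so they do not appear in the result; they are the distinct PARAM entries in the metadata.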
Columns_param Metadata Set
The list of attributes in columns_param is identical to the list in columns. That is to say, everything you need to describe about a short-wide column must also be described about the tall-thin parameter-related column.
Storing the study data in tall-thin data sets does not reduce the amount of metadata definition that is required
In data set TABLE, when variable COLUMN equals the value PARAM, the attributes of variable PARAMREL are those described in that row of the columns_param metadata set
There are many more variable attributes than the example shows; the list was subsetted to fit on a slide