Wednesday, January 11, 2012

Other strange things in define.xml

Although some people will protest, I still maintain that the SDTM standard was written with SAS XPT in mind. The 8-character, 40-character and 200-character limitations in SDTM do have a source: the ancient SAS XPT format.
Another major problem of SAS XPT is that it essentially describes two-dimensional tables, similar to tables in relational databases. But even if the SDTM is a blueprint for databases, database specialists will still find a lot of strange things in the specification and implementation guide.

"The world is not flat" has been preached by Armando Oliva of the FDA, who has stated that the FDA, too, would like to move to a multi-dimensional model for SDTM submissions. Unfortunately, what they are proposing is a set of HL7-v3 messages, without really knowing what they are talking about.

Multi-dimensional models for SDTM would make life (and CDISC end-to-end) considerably easier, and would allow us to get rid of many of the strange and illogical constructs in define.xml.

Let us, for example, have a look at the pair VSTESTCD and VSTEST.
According to the SDTM standard, VSTEST is a "synonym" qualifier of VSTESTCD (the standard speaks about "equivalent terms for a --TESTCD"). So VSTEST is NOT an attribute of the SDTM record; it should be an attribute of VSTESTCD.
But how is this made visible in SDTM datasets and in define.xml?
It isn't.

Even worse, both VSTESTCD and VSTEST have controlled terminology, i.e. there is an associated CodeList in define.xml for each of them. Let's have a look:

Here is the codelist for VSTESTCD:


and here is the one for VSTEST:



We see that, in both codelists, define.xml uses CodedValue = Decoded Value.
But how do we now know that "BMI" corresponds to "Body Mass Index"?
These are related one-to-one, aren't they?
Maybe we know, but there is no way a machine can work this out.
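To make the problem concrete, here is a minimal Python sketch of what a machine sees. The two XML fragments are illustrative reconstructions in ODM CodeList style, not the original listings from this post; the OIDs, the HEIGHT entry and the omission of the ODM namespace are assumptions made to keep the example short.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments only: OIDs and the HEIGHT entry are assumptions,
# and the ODM namespace is left out for brevity.
VSTESTCD_CODELIST = """
<CodeList OID="CL.VSTESTCD" Name="VSTESTCD" DataType="text">
  <CodeListItem CodedValue="BMI"><Decode><TranslatedText>BMI</TranslatedText></Decode></CodeListItem>
  <CodeListItem CodedValue="HEIGHT"><Decode><TranslatedText>HEIGHT</TranslatedText></Decode></CodeListItem>
</CodeList>
"""

VSTEST_CODELIST = """
<CodeList OID="CL.VSTEST" Name="VSTEST" DataType="text">
  <CodeListItem CodedValue="Body Mass Index"><Decode><TranslatedText>Body Mass Index</TranslatedText></Decode></CodeListItem>
  <CodeListItem CodedValue="Height"><Decode><TranslatedText>Height</TranslatedText></Decode></CodeListItem>
</CodeList>
"""


def read_codelist(xml_text):
    """Return {CodedValue: decode text} for a single CodeList element."""
    root = ET.fromstring(xml_text.strip())
    return {
        item.get("CodedValue"): item.findtext("Decode/TranslatedText")
        for item in root.findall("CodeListItem")
    }


codes = read_codelist(VSTESTCD_CODELIST)
names = read_codelist(VSTEST_CODELIST)

# Every decode simply repeats its own coded value, so it carries no extra information.
decode_repeats_code = all(code == decode for code, decode in {**codes, **names}.items())

# No test name occurs anywhere in the test code list: the two lists are unconnected.
linked_names = [name for name in names if name in codes or name in codes.values()]

print(decode_repeats_code)  # True
print(linked_names)         # []
```

The only way to pair "BMI" with "Body Mass Index" from these two lists would be to rely on item order or on outside knowledge, and neither is something the file itself states.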

So, what's wrong?
The reason for all this is the flatness of the SDTM, due to the choice of SAS XPT as a transport format.

For me, VSTEST (i.e. the test name) is just a "display variable" for VSTESTCD, i.e. it is not really necessary, and when using XML, one could just display it when needed, e.g. as a tooltip in the HTML that is generated by the stylesheet.
I will soon write a separate blog post about how this can be done and what it could look like.

So, ideally, in the define.xml there should NOT be a variable VSTEST, only a VSTESTCD, and the ItemDef for VSTESTCD should look like:


Note the use of SDSVarName to keep the SDS (SDTM) variable name, and the correct use of the "Name" attribute containing the test name (description), so that we do not need VSTEST anymore.
Here is the associated codelist:


It clearly shows that "Adipose Tissue" is the vital sign test name for the vital sign test code "BODYFAT", a relation that cannot be derived from the current SDTM constructs.
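As a rough illustration of what this buys us, the Python sketch below reads a codelist of this shape and looks up a test name by its test code. The XML fragment is a hypothetical stand-in in ODM CodeList style, not the listing from this post: only the BODYFAT / "Adipose Tissue" pair is taken from the text above, while the OID, the Name, the BMI entry and the missing ODM namespace are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical codelist: the test code is the CodedValue, the test name is its Decode.
# Only BODYFAT / "Adipose Tissue" comes from the post; the rest is assumed.
MERGED_CODELIST = """
<CodeList OID="CL.VSTESTCD" Name="Vital Signs Test Code" DataType="text">
  <CodeListItem CodedValue="BODYFAT"><Decode><TranslatedText>Adipose Tissue</TranslatedText></Decode></CodeListItem>
  <CodeListItem CodedValue="BMI"><Decode><TranslatedText>Body Mass Index</TranslatedText></Decode></CodeListItem>
</CodeList>
"""


def names_by_code(xml_text):
    """Return {test code: test name} read from a single CodeList element."""
    root = ET.fromstring(xml_text.strip())
    return {
        item.get("CodedValue"): item.findtext("Decode/TranslatedText")
        for item in root.findall("CodeListItem")
    }


test_names = names_by_code(MERGED_CODELIST)

print(test_names["BODYFAT"])  # Adipose Tissue
print(test_names["BMI"])      # Body Mass Index
```

With the name stored as the decode of the code, software no longer has to guess the pairing; a stylesheet could do the same lookup to show the name as a tooltip.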

Next time, we will see how this can be further extended for units of measurement (--ORRESU, --STRESU) and valuelists.
