Tuesday, May 3, 2016

Ask SHARE - SDTM validation rules in XQuery

This weekend, after returning from the European CDISC Interchange (where I gave a presentation titled "Less is more - A Visionary View of the Future of CDISC Standards"), I continued my work on the implementation of the SDTM validation rules in the open and vendor-neutral XQuery language (also see earlier postings here and here).
This time, I worked on a rule that is not so easy. It is the PMDA rule CT2001: "Variable must be populated with terms from its CDISC controlled terminology codelist. New terms cannot be added into non-extensible codelists".
This looks like an easy one on first sight, but it isn't. How does a machine-executable rule know whether a codelist is extensible or not? Looking into an Excel worksheet is not the best way (also as Excel is not a vendor-neutral standard) and cumbersome to program (if possible at all). So we do need something better.

So we developed the human-readable, machine-executable rule (it can be found here) using the following algorithm:

  • the define.xml is taken and iteration is performed over each dataset (ItemGroupDef)
  • within the dataset definition, an iteration is performed over all the defined variables (ItemRef/ItemDef) and it is looked whether there is a codelist is attached to the variable
  • if a codelist is attached, the NCI code of the codelist is taken. A web service request "Get whether a CDISC-CT Codelist is extensible or not" is triggered which returns whether the codelist is extensible or not. Only the codelists that are not extensible are retained. This leads to a list of non-extensible codelists for each dataset
  • the next step could be that each of the (non-extensible) codelist is inspected for whether it has some "EnumeratedItem" or "CodeListItem" elements that have the flag 'def:ExtendedValue="Yes"'. This is however not "bullet proof" as sponsors may have added terms and forgot to add the flag. 
  • the step could also have been to use the web service "Get whether a coded value is a valid value for the given codelist" to query whether each of the values in the codelist is really a value of that codelist as published by CDISC. This relies on that the values in the dataset itself for the given variable are all present in the codelist (enforced by another rule). The XQuery implementation can be found here.
  • we choose however to inspect each value in the dataset of the given variable which has a non-extensible codelist for whether it is an allowed value for that codelist by using the web service "Get whether a coded value is a valid value for the given codelist". If the answer from the web service is "false", an error is returned in XML format (allowing reuse in other applications). The XQuery implementation can be found here.
For each of the XQuery implementations, you can inspect them either using NotePad or NotePad++, or a modern XML editor like Altova XML Spy, oXygen-XML or EditiX.

A partial view is below:


What does this have to do with SHARE?

Everything!

All of the above mentioned RESTful web services (available at www.xml4pharmaserver.com/WebServices/index.html) are based on SHARE content. The latter has been implemented as a relational database (it could however also have been a native XML database) and a pretty large number of RESTful web services has been build around it.

In future, the SHARE API will deliver such web services directly from the SHARE repository. So our own web services are only a "test bed" for finding out what is possible with SHARE.

So in future, forget about "black box" validation tools - simply "ask SHARE"!

No comments:

Post a Comment