Thursday, January 5, 2017

Generating Define-XML: the Pinnacle21 roundtrip test

In my previous post, I presented our new "Define.xml Designer" software, which implements all "best practices for generating define.xml" and also makes it possible to generate extremely good define.xml files for legacy studies for which the SAS-XPT files already exist, but no define.xml exists yet.

It looks as if many people are still using the "Pinnacle21 Community Define.xml Generator", probably because it is free and uses Excel as its input. The price for that, however, is that there is no user manual, no support, and no graphical user interface. As there is neither a manual nor a GUI, the originators advise users to load an existing define.xml into the tool, generate the Excel worksheet from it, adapt the worksheet for the current study, and then generate the new define.xml from the worksheet with the tool. This usually results in a number of "trial-and-error" cycles, each time changing the worksheet and trying again, until the desired define.xml is obtained. However, when one knows the basic principles of XML (my students at the university learn these in less than 3 hours), I presume that adapting the define.xml using an XML editor is considerably faster (and one understands what one does!).

A good test for such software is always to do a "round trip", i.e. taking a correct file, loading it into the tool, and then exporting it again. In the case of the Pinnacle21 Define.xml Generator, this means loading an existing define.xml, exporting it to an Excel worksheet, and then generating a new define.xml from that worksheet, without having made any changes to it.
Ideally, the result should be that source define.xml and newly generated define.xml are 100% identical. No information should be lost, and no new information should have been added somehow. Existing information should not have been changed either.

Round-tripping is a typical quality test for software, so we ran it on the Pinnacle21 software (v.2.2.0) using the sample SDTM define.xml 2.0 file that comes with the standards distribution.
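
A comparison of this kind can also be automated. The following is a minimal Python sketch, not the procedure we used for the test: the function name `roundtrip_identical` and the miniature documents are invented for illustration, and it assumes both files fit in memory as strings. It compares the canonical (C14N) form of both documents, so that attribute order, quoting style and indentation do not count as differences.

```python
import xml.etree.ElementTree as ET


def roundtrip_identical(source_xml: str, regenerated_xml: str) -> bool:
    """Return True when both documents are equal after XML canonicalization."""
    # canonicalize() (Python 3.8+) sorts attributes and normalizes quoting and
    # empty-element syntax; strip_text=True ignores indentation whitespace.
    return (ET.canonicalize(source_xml, strip_text=True)
            == ET.canonicalize(regenerated_xml, strip_text=True))


# Invented miniature documents, NOT the CDISC sample define.xml.
source = '<ODM FileOID="f1" Originator="ACME"><Study OID="cdisc01"/></ODM>'
reformatted = ("<ODM Originator='ACME' FileOID='f1'>\n"
               "  <Study OID='cdisc01'></Study>\n"
               "</ODM>")
attribute_lost = '<ODM FileOID="f1"><Study OID="cdisc01"/></ODM>'

print(roundtrip_identical(source, reformatted))     # only the layout differs
print(roundtrip_identical(source, attribute_lost))  # Originator is missing
```

A plain yes/no answer is of course only the first step; the findings below required looking at what exactly differs.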

What are the results?


Let us first check whether any information was lost in the roundtrip. This is what we found:
  • we found that the "Originator" attribute on the "ODM" element disappears, as well as the "SourceSystem" and "SourceSystemVersion" attributes. These contain important information about who (organization) and what system generated the define.xml. As there is no manual, we could also not find out how one can reintroduce this important information using the tool.
  • we also found that the "label" of many of the variables had disappeared (the "Description" element under the "ItemDef" element). This is the case when the variable is a "valuelist" variable. Inspection of the worksheet generated by the tool revealed that there is indeed no "Label" column in the "ValueLevel" tab. Maybe one should add one there manually, but as there is no user manual, there is no way we can find out. This also means that a define.xml file generated this way (without labels for valuelevel variables) is not only essentially invalid, but also not very usable for reviewers, as they cannot find out what the valuelist variable is about.
  • additionally, all "SASFormatName" attributes disappeared. Now, "SASFormatName" is an optional attribute, but it can be valuable to have it in the define.xml when the define.xml of one study is used as a template for the define.xml of a subsequent (similar) study (reuse).
The Pinnacle21 tool removes some of the important attributes on the ODM element (colored red)
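
Losses like the ones listed above can be detected with a few lines of code. The sketch below is only an illustration under stated assumptions: the helper names (`lost_root_attributes`, `itemdefs_without_description`) are hypothetical, the two XML snippets are invented miniatures rather than the real sample file, and the elements are assumed to be in the ODM 1.3 namespace.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"  # ODM 1.3 namespace (assumed)


def lost_root_attributes(source_root, regenerated_root):
    """Attribute names present on the source root element but gone afterwards."""
    return sorted(set(source_root.attrib) - set(regenerated_root.attrib))


def itemdefs_without_description(root):
    """OIDs of ItemDef elements that have no Description child, i.e. no label."""
    return [item.get("OID")
            for item in root.iter(f"{{{ODM_NS}}}ItemDef")
            if item.find(f"{{{ODM_NS}}}Description") is None]


# Invented miniature documents, NOT the CDISC sample define.xml.
SOURCE = f"""<ODM xmlns="{ODM_NS}" FileOID="f1" Originator="ACME"
     SourceSystem="DemoSystem" SourceSystemVersion="1.0">
  <Study OID="cdisc01"><MetaDataVersion OID="MDV.1" Name="Demo">
    <ItemDef OID="IT.VS.VSORRES.TEMP" Name="TEMP" DataType="float">
      <Description><TranslatedText>Temperature</TranslatedText></Description>
    </ItemDef>
  </MetaDataVersion></Study>
</ODM>"""

REGENERATED = f"""<ODM xmlns="{ODM_NS}" FileOID="f1">
  <Study OID="cdisc01"><MetaDataVersion OID="MDV.1" Name="Demo">
    <ItemDef OID="IT.VS.VSORRES.TEMP" Name="TEMP" DataType="float"/>
  </MetaDataVersion></Study>
</ODM>"""

source_root = ET.fromstring(SOURCE)
regenerated_root = ET.fromstring(REGENERATED)

print(lost_root_attributes(source_root, regenerated_root))
print(itemdefs_without_description(regenerated_root))
```

Running the second check on the source file as well shows whether a missing label is a result of the round trip or was already missing in the original.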




Let us now check whether any information was added (silently) that was not in our original define.xml at all. 
  • Rather surprisingly, we found that a number of variable definitions were automatically added, although they were not in the original define.xml. When a variable is defined once originally (e.g. STUDYID, USUBJID) and referenced many times (i.e. by each dataset), the Pinnacle21 tool refuses this kind of "reuse" and creates separate variable definitions for STUDYID and USUBJID, a new one for each dataset. So, in our original define.xml we had only 1 definition of STUDYID (with OID "IT.STUDYID"), whereas in the newly generated define.xml we have over 30 of them (with OIDs "IT.TA.STUDYID", "IT.TE.STUDYID", "IT.DM.STUDYID", etc.). The same applies to USUBJID: instead of having a single definition of USUBJID, we suddenly have over 30.
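
Such silently multiplied definitions show up when counting the "ItemDef" elements per variable name before and after the round trip. The following is a rough sketch with hypothetical helper names and invented miniature data (only three of the STUDYID copies are shown); it assumes the ODM 1.3 namespace and that the "Name" attribute carries the variable name.

```python
from collections import Counter
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"  # ODM 1.3 namespace (assumed)


def itemdef_counts(root):
    """Number of ItemDef elements per variable name (Name attribute)."""
    return Counter(item.get("Name") for item in root.iter(f"{{{ODM_NS}}}ItemDef"))


def multiplied_definitions(source_root, regenerated_root):
    """Variable names with more definitions after the round trip: name -> (before, after)."""
    before = itemdef_counts(source_root)
    after = itemdef_counts(regenerated_root)
    return {name: (before[name], after[name])
            for name in sorted(after) if after[name] > before[name]}


# Invented miniature documents, NOT the CDISC sample define.xml.
SOURCE = f"""<ODM xmlns="{ODM_NS}">
  <ItemDef OID="IT.STUDYID" Name="STUDYID" DataType="text"/>
  <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer"/>
</ODM>"""

REGENERATED = f"""<ODM xmlns="{ODM_NS}">
  <ItemDef OID="IT.TA.STUDYID" Name="STUDYID" DataType="text"/>
  <ItemDef OID="IT.TE.STUDYID" Name="STUDYID" DataType="text"/>
  <ItemDef OID="IT.DM.STUDYID" Name="STUDYID" DataType="text"/>
  <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer"/>
</ODM>"""

result = multiplied_definitions(ET.fromstring(SOURCE), ET.fromstring(REGENERATED))
print(result)
```

Note that a variable name can legitimately have more than one definition in a define.xml, which is why the sketch reports only names whose count increased.
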
Did the tool change any information from our original define.xml file?
We found the following:
  • All OIDs (the identifiers) were altered, except for most of the ones of the valuelists (but not all of them) and of the codelists. It looks as if in many cases the tool assigns the OIDs itself, without the possibility for the user to have any influence on this. As the OIDs are arbitrary, this is not a disaster, but it again means that one cannot use one define.xml as a template for a next one, especially when one has company-standardized OIDs for SDTM or SEND or ADaM variables.
The Pinnacle21 tool changes all the OIDs in the define.xml (or reassigns them)
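
Which OIDs vanished and which ones appeared can be listed per element type with a simple set comparison. This is again only a sketch: the helper names are hypothetical, the data is an invented miniature, and the ODM 1.3 namespace is assumed.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"  # ODM 1.3 namespace (assumed)


def oids_by_element(root, local_names=("ItemDef", "CodeList")):
    """Set of OID values per element type."""
    return {name: {el.get("OID") for el in root.iter(f"{{{ODM_NS}}}{name}")}
            for name in local_names}


def oid_changes(source_root, regenerated_root):
    """Per element type: OIDs that vanished and OIDs that newly appeared."""
    before = oids_by_element(source_root)
    after = oids_by_element(regenerated_root)
    return {name: {"removed": sorted(before[name] - after[name]),
                   "added": sorted(after[name] - before[name])}
            for name in before}


# Invented miniature documents, NOT the CDISC sample define.xml.
SOURCE = f"""<ODM xmlns="{ODM_NS}">
  <ItemDef OID="IT.STUDYID" Name="STUDYID" DataType="text"/>
  <CodeList OID="CL.SEX" Name="Sex" DataType="text"/>
</ODM>"""

REGENERATED = f"""<ODM xmlns="{ODM_NS}">
  <ItemDef OID="IT.DM.STUDYID" Name="STUDYID" DataType="text"/>
  <CodeList OID="CL.SEX" Name="Sex" DataType="text"/>
</ODM>"""

changes = oid_changes(ET.fromstring(SOURCE), ET.fromstring(REGENERATED))
print(changes)
```

With company-standardized OIDs, the "removed" lists show directly which of the standardized identifiers did not survive the round trip.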


We were shocked by the finding that the tool also alters the "Study OID" without any notice. In the original define.xml its value is "cdisc01"; in the newly created define.xml it is "CDISC01.SDTM-IG.3.1.2". We again suspect that the user cannot influence the assignment of the "Study OID". The same applies to the OID and Name attributes of the "MetaDataVersion" element and the contents of its "Description" element: all these were changed by the tool without any notice.



OIDs of "Study" and "MetaDataVersion" have been altered, as well as "MetaDataVersion Name" and the "MetaDataVersion Description"
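
The study-level identifiers can be compared in the same way. In the sketch below, the two "Study OID" values are the ones we observed; the "MetaDataVersion" OIDs and names are invented placeholders, the helper names are hypothetical, and the ODM 1.3 namespace is assumed.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"  # ODM 1.3 namespace (assumed)


def study_identifiers(root):
    """Collect the study-level identifiers of a define.xml document."""
    study = root.find(f"{{{ODM_NS}}}Study")
    mdv = study.find(f"{{{ODM_NS}}}MetaDataVersion")
    return {"Study OID": study.get("OID"),
            "MetaDataVersion OID": mdv.get("OID"),
            "MetaDataVersion Name": mdv.get("Name")}


def changed_identifiers(source_root, regenerated_root):
    """Identifiers whose value differs: key -> (before, after)."""
    before = study_identifiers(source_root)
    after = study_identifiers(regenerated_root)
    return {key: (before[key], after[key])
            for key in before if before[key] != after[key]}


# Miniature documents; the MetaDataVersion OID and Name values are placeholders.
SOURCE = f"""<ODM xmlns="{ODM_NS}">
  <Study OID="cdisc01"><MetaDataVersion OID="MDV.A" Name="Name A"/></Study>
</ODM>"""

REGENERATED = f"""<ODM xmlns="{ODM_NS}">
  <Study OID="CDISC01.SDTM-IG.3.1.2"><MetaDataVersion OID="MDV.B" Name="Name A"/></Study>
</ODM>"""

changed = changed_identifiers(ET.fromstring(SOURCE), ET.fromstring(REGENERATED))
print(changed)
```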





You might now ask yourself how our own "XML4Pharma Define.xml Designer" scores in the "roundtrip test". Well, you can easily find out by requesting a trial version of the software and performing the roundtrip test yourself. This will also allow you to discover how user-friendly this new software is.


Conclusion: the Pinnacle21 "Define-XML Generator" does a pretty good job in generating a (prototype) define.xml starting from an Excel worksheet. The "round trip test" however shows that the user does not have any influence at all on how the OIDs are generated. Worse is that the labels for the "ValueList" variables are missing. Maybe this can be circumvented by adding an extra "Label" column in the worksheet for them, but as there is no user manual, there is no way to find out.
This means that the generated define.xml still requires manual editing (best done with an XML editor; there are some free ones). This raises the question of whether taking an existing define.xml and using an XML editor to adapt it for a new study isn't the faster way, with the additional advantage that one knows what one is doing.
There are considerably better define.xml generation tools on the market, with nice GUIs and wizards (including our own "Define.xml Designer"). These are not free, but their cost is very reasonable: only a fraction of what, for example, the "Pinnacle21 Enterprise Edition" costs.