HASUG Meeting Notes: November 2011 (define.xml)

The 4th quarter 2011 HASUG meeting took place at Bristol-Myers Squibb in Wallingford, CT, on November 10th. Speakers included John Adams of Boehringer Ingelheim and David Kelly from Customer Intelligence at SAS Institute.

John Adams’s presentation, “Creating a define.xml file for ADaM and SDTM,” addressed a current issue within the pharmaceutical industry. CDISC (Clinical Data Interchange Standards Consortium) is moving to standardize the electronic submission of pharmaceutical studies to the FDA, in the interest of making the review process more efficient and consequently decreasing the time it takes for a new drug to reach the market. A define.xml file contains all the metadata needed to guide the reviewer through the electronic FDA submission. While software is readily available for creating this file for SDTM submissions, only limited support exists for ADaM-compatible define.xml files. Adams’s presentation described how his organization addresses this problem.

Adams began with a short tutorial on XML schemas and style sheets before describing the process for creating ADaM-compatible define.xml files and discussing the methodology for capturing metadata. The XML tutorial, which was very well done, included a visual representation of basic XML structure, showing how root elements, child elements, attributes, and values are organized hierarchically. He also contrasted HTML with XML (a fixed set of standard tags in HTML vs. custom tags defined by a schema in XML) and described the requirement that the define.xml file be “well-formed XML” (as opposed to an XML fragment), listing the basics of well-formed XML as follows: an XML declaration, a unique root element, start and end tags, proper nesting of case-sensitive elements, quoted attribute values, and use of entities for special characters (&, <, >, etc.). Finally, he defined the two files that accompany define.xml: the schema, an .xsd file which defines the file structure (elements, attributes, order and number of child elements, data types, default values) and is used to validate the data, and the style sheet, an .xsl file which defines the layout for rendering the data (table of contents, tables, links) and is used to transform the XML into an HTML file that a browser can recognize and display.
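To make the well-formedness rules above concrete, here is a minimal sketch in Python (not the SAS tooling Adams used). The element and attribute names (study, dataset, variable) are made up for illustration and are not taken from the CDISC schema; the parser is Python's standard xml.etree.ElementTree module, which checks well-formedness only and does not validate against an .xsd.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: XML declaration, one root element, properly nested
# children, quoted attribute values, and an entity (&amp;) for a literal "&".
WELL_FORMED = """<?xml version="1.0" encoding="UTF-8"?>
<study name="Example">
  <dataset name="ADSL">
    <variable name="TRT01P" label="Planned Treatment &amp; Dose"/>
  </dataset>
</study>"""

# Same content with a raw ampersand and an end tag in the wrong case.
BROKEN = """<study name="Example">
  <dataset name="ADSL">
    <variable name="TRT01P" label="Planned Treatment & Dose"/>
  </Dataset>
</study>"""


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a complete XML document."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


# Walk the hierarchy: root element -> child elements -> attribute value.
root = ET.fromstring(WELL_FORMED.encode("utf-8"))
variable = root.find("dataset/variable")
label = variable.get("label")  # the parser turns &amp; back into "&"

print(is_well_formed(WELL_FORMED), is_well_formed(BROKEN))
print(root.tag, variable.get("name"), label)
```

The second document fails on the unescaped ampersand alone; the mismatched tag case would also be rejected, since XML element names are case-sensitive.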

Next, Adams described the general CDISC schema, zeroing in on some of the more important elements, and provided a list of available software tools for developing XML files along with some of the challenges associated with each: CDISC software, SAS Clinical Standards Toolkit (in Base SAS), SAS XML Mapper (a Java-based GUI which is helpful for translating XML files to SAS data sets but not vice versa), and the SAS XML engine (Base SAS). He described the process of capturing metadata in Excel to use as input for the SAS programs which output the define.xml file, highlighting the newer Excel LIBNAME engine in SAS 9. Example syntax: LIBNAME WrkBk EXCEL 'My Workbook.xls' VER=2002; (see this SUGI paper for more details: http://www2.sas.com/proceedings/sugi31/024-31.pdf, or refer to my previous post on 3 Ways to Export Your Data to Excel for other ways to use the Excel LIBNAME). He also shared a SAS macro using the tranwrd() function to replace special characters such as “&” and “<”, which must be represented in the XML document as “&amp;” and “&lt;”. Also of note: Adams recommended Oxygen Editor for debugging the XML code and making sure the file displays properly in Internet Explorer. This was a very interesting discussion of how he and others at Boehringer successfully adapted the CDISC schema and style sheets to produce an ADaM-compatible define.xml file; even for a non-pharmaceutical audience, his discussion of basic XML structure and the SAS tools used to solve this business problem could prove useful.
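The special-character replacement can be sketched as follows. This is a Python analogue written for illustration, not Adams's SAS macro: the function name xml_escape and the sample label are hypothetical, and the chained str.replace() calls stand in for the tranwrd() calls described in the talk. The one detail that matters in any language is the order: the ampersand has to be replaced first, otherwise the ampersands introduced by the other replacements would be escaped a second time.

```python
from xml.sax.saxutils import escape


def xml_escape(value: str) -> str:
    """Replace XML special characters with their entities."""
    # "&" must go first so that the "&" in "&lt;" and "&gt;"
    # is not itself turned into "&amp;".
    return (
        value.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


# Hypothetical variable label as it might be typed into a metadata spreadsheet.
label = "Age Group (<65 & >=65)"
escaped = xml_escape(label)
print(escaped)

# Python's standard library ships the same transformation.
print(escaped == escape(label))
```

In practice the standard-library escape() is the safer choice in Python; the hand-written version is shown only because it mirrors the replace-one-character-at-a-time approach of the macro.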

Jessica Hampton

Analytics Professional
