This technique is for XML Schemas and plain old XML data, not DFDL-described data. See Combining DFDL Schemas Together for the DFDL variation. DFDL requires slightly different techniques, and people often need both if they are processing both XML and DFDL-described data together in the same system. 

Q: I have multiple XML messages (dozens to hundreds of different kinds), each with a different element name, namespace, and XML schema. I want to read them from a file and validate them.

A: This is much easier than you think. You don't need to touch the XML, or try to figure out which schema should be used with each kind of incoming XML message. You just create a combined XML schema from all the separate ones, then let the validating parser do the work.

Caveat: The schemas being combined must have distinct target namespaces. There is no way to combine two schemas if they have conflicting element, type, or group definitions that are in the same namespace. 
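
Before combining a large set of schemas, it can help to check this caveat mechanically. The following is a minimal sketch, not part of the original technique, that uses only the Python standard library. The function names and the sample file dup.xsd are hypothetical. It only reports schemas that share a target namespace so they can be reviewed by hand; it does not check individual element, type, or group definitions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def target_namespace(xsd_text):
    """Return the targetNamespace of a schema document, or None if absent."""
    root = ET.fromstring(xsd_text)
    if root.tag != f"{{{XSD_NS}}}schema":
        raise ValueError("not an XML Schema document")
    return root.get("targetNamespace")


def find_shared_namespaces(schemas):
    """Return {namespace: [file names]} for namespaces used by 2+ schemas.

    `schemas` maps a file name to the text of that schema.
    """
    by_ns = defaultdict(list)
    for name, text in schemas.items():
        by_ns[target_namespace(text)].append(name)
    return {ns: sorted(names) for ns, names in by_ns.items() if len(names) > 1}


# Hypothetical inputs: dup.xsd reuses the namespace of a.xsd.
TEMPLATE = '<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="{}"/>'
example = {
    "a.xsd": TEMPLATE.format("urn:a"),
    "b.xsd": TEMPLATE.format("urn:b"),
    "dup.xsd": TEMPLATE.format("urn:a"),
}
print(find_shared_namespaces(example))  # {'urn:a': ['a.xsd', 'dup.xsd']}
```

An empty result means every schema has its own target namespace, which is the situation this technique assumes.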

Caveat: Test with the XML schema validators you need. The technique here is standard XML Schema practice, but there are limitations and flaws in many XML schema validators.

Many cyber security applications that use XML schema validation use some mixture of Xerces C, Xerces J, libxml2 (which is used by the xmllint command line tool), and possibly others.

Let's use 3 as the number of different messages. The concept scales up to hundreds. 

Assume there are 3 XML Schema files named a.xsd, b.xsd, and c.xsd, one for each of 3 different XML message types.

Each defines a message element and a target namespace for that message. 

In the examples below, the namespace URIs are just trivial ones like "urn:a". They can be any namespace URIs; the examples use short ones for brevity. Your schemas are likely to have meaningful namespace URIs and more meaningful prefixes than the single letters used here.

a.xsd
<schema    
 xmlns="http://www.w3.org/2001/XMLSchema"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:a="urn:a"
 targetNamespace="urn:a">
  
<element name="a_msg" type="a:a_msg_type"/>
 
... plus type definitions
</schema>


b.xsd
<schema    
 xmlns="http://www.w3.org/2001/XMLSchema"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:b="urn:b"
 targetNamespace="urn:b">
  
<element name="b_msg" type="b:b_msg_type"/>
 
... plus type definitions
</schema>


c.xsd
<schema    
 xmlns="http://www.w3.org/2001/XMLSchema"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:c="urn:c"
 targetNamespace="urn:c">
 
<element name="c_msg" type="c:c_msg_type"/>

... plus type definitions
</schema>


Now we can "glue" these together into a single combined schema:

combined_abc.xsd
<schema
  xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:abc="urn:abc"
  targetNamespace="urn:abc">

<import namespace="urn:a" schemaLocation="a.xsd"/>
<import namespace="urn:b" schemaLocation="b.xsd"/>
<import namespace="urn:c" schemaLocation="c.xsd"/>

<!--
  Notice that there is nothing else in this file.
  Just a bunch of import statements. 
  
  All the element declarations of possible "root"
  elements of XML documents are from elements defined
  in these imported schema files. 

  There could be dozens or hundreds of import statements like this.
  -->
</schema>
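
With dozens or hundreds of schemas, writing the import statements by hand gets tedious. As a rough sketch (not something the original technique prescribes), the combined schema can be generated by reading each schema's targetNamespace. The function name build_combined_schema and the default "urn:abc" namespace are assumptions made for illustration, and the schemaLocation values are assumed to be paths relative to where the combined schema will be stored.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def build_combined_schema(schemas, combined_ns="urn:abc"):
    """Return the text of a schema that only imports the given schemas.

    `schemas` maps a schemaLocation (for example "a.xsd") to the schema text.
    """
    lines = [
        "<schema",
        f'  xmlns="{XSD_NS}"',
        f"  targetNamespace={quoteattr(combined_ns)}>",
        "",
    ]
    seen = {}  # target namespace -> location that first used it
    for location in sorted(schemas):
        ns = ET.fromstring(schemas[location]).get("targetNamespace")
        if ns is None:
            raise ValueError(f"{location} has no targetNamespace")
        if ns in seen:
            raise ValueError(f"{location} and {seen[ns]} both use {ns}")
        seen[ns] = location
        # quoteattr adds the surrounding quotes and escapes special characters.
        lines.append(
            f"<import namespace={quoteattr(ns)} schemaLocation={quoteattr(location)}/>"
        )
    lines += ["", "</schema>"]
    return "\n".join(lines) + "\n"


# Minimal stand-ins for a.xsd, b.xsd, and c.xsd.
STUB = '<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="{}"/>'
stubs = {name + ".xsd": STUB.format("urn:" + name) for name in ("a", "b", "c")}
print(build_combined_schema(stubs))
```

For the three stub schemas this prints a schema with the same three import statements as combined_abc.xsd above. Refusing to continue when two schemas share a namespace is a deliberately conservative choice that follows the first caveat.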

You now have one single schema that can be used to validate ANY of your incoming messages.  

An incoming XML instance document can have any of these root elements:

<a:a_msg xmlns:a="urn:a">....contents of a_msg type </a:a_msg>

or

<b:b_msg xmlns:b="urn:b">....contents of b_msg type </b:b_msg>

or

<c:c_msg xmlns:c="urn:c">....contents of c_msg type </c:c_msg>

The validating XML parser finds the right element declaration, which will ultimately come from one of the imported schema files.
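
To make that lookup concrete, here is a simplified illustration of the idea, using only the Python standard library. It is not how any particular validator is implemented, and it performs no validation at all: it only shows that a root element's namespace and local name are enough to identify which imported schema declares it. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def index_global_elements(schemas):
    """Map (target namespace, element name) to the schema file declaring it.

    `schemas` maps a file name to the text of that schema.
    """
    index = {}
    for location, text in schemas.items():
        root = ET.fromstring(text)
        ns = root.get("targetNamespace")
        # Direct children of <schema> are the global element declarations.
        for decl in root.findall(f"{{{XSD_NS}}}element"):
            index[(ns, decl.get("name"))] = location
    return index


def declaring_schema(instance_text, index):
    """Return the schema file that declares the instance's root element."""
    root = ET.fromstring(instance_text)
    # ElementTree spells a namespaced tag as "{namespace}local".
    if root.tag.startswith("{"):
        ns, local = root.tag[1:].split("}", 1)
    else:
        ns, local = None, root.tag
    return index.get((ns, local))


# Minimal stand-ins for a.xsd and b.xsd with one global element each.
SCHEMA = (
    '<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:{0}">'
    '<element name="{0}_msg"/></schema>'
)
index = index_global_elements({"a.xsd": SCHEMA.format("a"), "b.xsd": SCHEMA.format("b")})
print(declaring_schema('<b:b_msg xmlns:b="urn:b"/>', index))  # b.xsd
```

Because the lookup key includes the namespace, messages from different schemas cannot be confused with one another as long as their target namespaces are distinct.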

