XmlDomDataContext
The class XmlDomDataContext
is no longer recommended, except for exploratory use. It will load the complete DOM of an XML file into memory and model tables based on each element that contains sub-elements or attributes with value-types in it. This makes it a good vehicle for visualizing or demonstrating the capability to read XML files, but unfortunately it also makes it very content-dependent and different XML files may not get the same schema model even if they share the same XSD.
XmlSaxDataContext
The XmlSaxDataContext
is the newer and much improved DataContext implementation for XML files.
Since XML files are hierarical and MetaModel tables are tabular, you need to do some mapping. MetaModel provides a mapping model that is XPath based, with a few slight modifications.
Assume we have the following XML document:
<?xml version="1.0" encoding="UTF-8"?> <root> <organization type="governmental"> <name>Company A</name> <employees> <employee> <name>John Doe</name> <gender>M</gender> </employee> <employee> <name>Jane Doe</name> <gender>F</gender> </employee> </employees> </organization> <organization type="company"> <name>Company B</name> <employees> <employee> <name>Peter</name> <gender>M</gender> </employee> <employee> <name>Bob</name> <gender>M</gender> </employee> </employees> </organization> </root>
Now imagine that you want to have a table of employee names and gender information, and another table with company name and type information. We define our DataContext and those tables like this:
XmlSaxTableDef employeeTableDef = new XmlSaxTableDef( "/root/organization/employees/employee", new String[] { "/root/organization/employees/employee/name", "/root/organization/employees/employee/gender" } ); XmlSaxTableDef organizationTableDef = new XmlSaxTableDef( "/root/organization", new String[] { "/root/organization/name", "/root/organization@type" } ); DataContext dc = new XmlSaxDataContext(new File("my_file.xml"), employeeTableDef, organizationTableDef);
As you see, we simply provide XPath expressions to 1) define the record scope and 2) define paths of individual values (or rather - the column definitions). If you query those tables, you will get datasets like these:
Table: /employee
row_id | /name | /gender |
---|---|---|
0 | John Doe | M |
1 | Jane Doe | F |
2 | Peter | M |
3 | Bob | M |
Table: /organization
row_id | /name | @type |
---|---|---|
0 | Company A | governmental |
1 | Company B | company |
This is nice, but you might be thinking: How can I then join these tables? There doesn't seem to be any cross-reference value that we can join or perform lookups by.
To solve this issue, MetaModel provides a modification for XPath, the index(...) function. Say we want to add the organization's id to the employee table (as a foreign key). To archieve that, we will need this modified employee table definition (notice the third value XPath expression):
XmlSaxTableDef employeeTableDef = new XmlSaxTableDef( "/root/organization/employees/employee", new String[] { "/root/organization/employees/employee/name", "/root/organization/employees/employee/gender", "index(/root/organization)" } );
Now if you query the employees table, this will be your result:
row_id | /name | /gender | index(/root/organization) |
---|---|---|---|
0 | John Doe | M | 0 |
1 | Jane Doe | F | 0 |
2 | Peter | M | 1 |
3 | Bob | M | 1 |
Moving on, you will be able to define both joins and lookups using this foreign key. For example:
Column fk = employeeTable.getColumnByName("index(/root/organization)"); Column empName = employeeTable.getColumnByName("/name"); Column orgId = organizationTable.getColumnByName("row_id"); Column orgName = organizationTable.getColumnByName("/name"); Query q = dc.query().from(employeeTable) .innerJoin(organizationTable).on(fk, orgId) .select(empName).as("employee") .select(orgName).as("company").toQuery(); DataSet ds = dc.executeQuery(q);
The contents of this queried dataset will now be:
employee | company |
---|---|
John Doe | Company A |
Jane Doe | Company A |
Peter | Company B |
Bob | Company B |