A prior version of the proposal can be found here.
Introduction
Some functionality is either unsupported or too difficult to express in the DFDL expression language. While such functionality could be added to the Daffodil codebase, some of it applies only to a narrow class of data and would not see widespread use. Adding it to the codebase also leaves the burden of implementation on the Daffodil developers. We would like Daffodil to be able to register and execute external, user-defined functions (UDFs) in DFDL expressions.
See DAFFODIL-2186 for additional information.
Use Cases/Examples
A trivial use case would be adding a 'replace' function that is callable from a DFDL expression. In the DFDL schema, a call might look like the following, where transformedElem will contain "Hello_World" if someElement resolves to "Hello World".
    xmlns:sdf="urn:example:com:ext:udfunction:stringfunctions"
    ...
    <xs:element name="transformedElem" ...
      dfdl:inputValueCalc="{ sdf:replace(../someElement, ' ', '_') }"/>
The function class would look something like the following:
    @UserDefinedFunctionIdentification(
      name = "replace",
      namespaceURI = "urn:example:com:ext:udfunction:stringfunctions"
    )
    public class Replace implements UserDefinedFunction {
      public String evaluate(String orig, String pre, String post) {
        // implementation...
      }
    }
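As a fuller illustration of the class above, here is a minimal sketch with the body of evaluate filled in. The UserDefinedFunction interface and UserDefinedFunctionIdentification annotation declared here are simplified local stand-ins, assumed only so the example compiles without Daffodil on the classpath; a real UDF would import the Daffodil-provided types. The ReplaceDemo class is hypothetical and exists only to exercise the function.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-ins so the sketch compiles without Daffodil on the classpath.
// A real UDF would import the Daffodil-provided types instead.
interface UserDefinedFunction extends Serializable {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UserDefinedFunctionIdentification {
    String name();
    String namespaceURI();
}

@UserDefinedFunctionIdentification(
    name = "replace",
    namespaceURI = "urn:example:com:ext:udfunction:stringfunctions")
class Replace implements UserDefinedFunction {
    // Replaces every literal (non-regex) occurrence of pre with post.
    public String evaluate(String orig, String pre, String post) {
        return orig.replace(pre, post);
    }
}

// Hypothetical driver, not part of the proposal.
class ReplaceDemo {
    public static void main(String[] args) {
        // Mirrors sdf:replace(../someElement, ' ', '_') from the schema.
        System.out.println(new Replace().evaluate("Hello World", " ", "_")); // prints Hello_World
    }
}
```

String.replace is used rather than replaceAll so that the arguments are treated as plain text and not as regular expressions, which seems the least surprising behaviour for a schema author.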
Another use case would be implementing the normalization of elevation above Mean-Sea-Level (MSL) to Height-Above-Ellipsoid (HAE) for Link16F1 data. In the DFDL schema, a call might look like the following, where the function returns the result of the conversion.
    xmlns="http://extOther.UDFunction.ElevationConversions.com"
    ...
    dfdl:outputValueCalc="{ convert_to_hae(../lat, ../lon, ../msl) }"
The UDF class would look something like the following:
    @UserDefinedFunctionIdentification(
      name = "convert_to_hae",
      namespaceURI = "http://extOther.UDFunction.ElevationConversions.com"
    )
    public class MSLConversions implements UserDefinedFunction {
      public double evaluate(double latitude, double longitude, double msl) {
        // implementation...
      }
    }
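The conversion is commonly expressed as HAE = MSL + N, where N is the geoid undulation (the height of the geoid above the ellipsoid) at the given position. The sketch below assumes a hypothetical GeoidModel lookup that an implementer would back with a published geoid grid; that interface, its method name and the constructor are illustrative only and not part of Daffodil. The Daffodil annotation and interface are left off so the sketch compiles on its own.

```java
// Hypothetical lookup of the geoid undulation N, in metres, at a position.
// A real implementation would interpolate a published geoid grid.
interface GeoidModel {
    double undulationMeters(double latitudeDeg, double longitudeDeg);
}

// The Daffodil annotation and interface are omitted so that this sketch
// compiles on its own; a real UDF class would carry both.
class MSLConversions {
    private final GeoidModel geoid;

    MSLConversions(GeoidModel geoid) {
        this.geoid = geoid;
    }

    // HAE = MSL + N, with all heights in metres.
    public double evaluate(double latitude, double longitude, double msl) {
        return msl + geoid.undulationMeters(latitude, longitude);
    }
}

// Hypothetical driver, not part of the proposal.
class MSLConversionsDemo {
    public static void main(String[] args) {
        // -30.0 is a made-up undulation used only for this demo.
        MSLConversions udf = new MSLConversions((lat, lon) -> -30.0);
        System.out.println(udf.evaluate(10.0, 20.0, 100.0)); // prints 70.0
    }
}
```

Because this class needs a geoid model at construction time, it is an example of a UDF that could not be created through a no-argument constructor and would need the provider to build it explicitly.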
Requirements
- The UDF will be defined in a stand-alone library outside of the Daffodil codebase
- The UDF must be accessible to and callable from the Daffodil code
- Daffodil must be able to process and pass the return value from the UDF back to the Schema
- Support for UDFs in the DFDL schema must be language-agnostic: nothing in the schema may be specific to Java, Scala or Daffodil
Proposed Solution
The Daffodil solution will use a combination of Java's ServiceLoader and Reflection APIs.
Daffodil Provided Classes
Daffodil will provide a UserDefinedFunction interface, a UserDefinedFunctionProvider abstract class, a UserDefinedFunctionIdentification annotation class, and two exception classes: UserDefinedFunctionFatalException and UserDefinedFunctionProcessingError.
Each UDF must implement the UserDefinedFunction interface. This marks it as a UDF to Daffodil and gives it some properties such as Serializability.
The UserDefinedFunctionProvider class will have an abstract function that returns an array of the classes of all the UDFs the provider is aware of. It will also provide a default function that instantiates UDF classes into objects. This default function can only be used for classes with no-argument constructors and must be overridden for any other kind of UDF class. Each provider supplied must extend the UserDefinedFunctionProvider class.
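The default initializer described above could be sketched roughly as follows. This is an assumption about its shape, not Daffodil's actual code: the types are simplified local stand-ins, and Upper, DemoProvider and ProviderDemo are hypothetical classes used only to exercise the sketch. The method name and arguments follow the provider sample elsewhere in this proposal.

```java
import java.io.Serializable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified local stand-ins for the Daffodil-provided types.
interface UserDefinedFunction extends Serializable {}

@Retention(RetentionPolicy.RUNTIME)
@interface UserDefinedFunctionIdentification {
    String name();
    String namespaceURI();
}

abstract class UserDefinedFunctionProvider {
    abstract Class<?>[] getUserDefinedFunctionClasses();

    // Default initializer: find the class whose annotation matches the
    // requested namespace and name, then call its no-argument constructor.
    UserDefinedFunction createUserDefinedFunction(String namespaceURI, String name)
            throws ReflectiveOperationException {
        for (Class<?> c : getUserDefinedFunctionClasses()) {
            UserDefinedFunctionIdentification id =
                c.getAnnotation(UserDefinedFunctionIdentification.class);
            if (id != null && id.name().equals(name)
                    && id.namespaceURI().equals(namespaceURI)) {
                return (UserDefinedFunction) c.getDeclaredConstructor().newInstance();
            }
        }
        throw new IllegalArgumentException("No UDF class for " + namespaceURI + ":" + name);
    }
}

// Hypothetical UDF and provider used only to exercise the sketch.
@UserDefinedFunctionIdentification(name = "upper", namespaceURI = "urn:example:demo")
class Upper implements UserDefinedFunction {
    public String evaluate(String s) {
        return s.toUpperCase();
    }
}

class DemoProvider extends UserDefinedFunctionProvider {
    @Override
    Class<?>[] getUserDefinedFunctionClasses() {
        return new Class<?>[] { Upper.class };
    }
}

class ProviderDemo {
    public static void main(String[] args) throws ReflectiveOperationException {
        UserDefinedFunction udf =
            new DemoProvider().createUserDefinedFunction("urn:example:demo", "upper");
        System.out.println(udf.getClass().getSimpleName()); // prints Upper
    }
}
```

A class without a no-argument constructor would make getDeclaredConstructor() throw NoSuchMethodException, which is why providers of stateful UDFs need to override this function.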
The UserDefinedFunctionIdentification annotation class must be applied and properly initialized for each UDF class. It provides name and namespaceURI elements that will be used to call the function from the schema.
The UserDefinedFunctionProcessingError exception can be thrown when an implementer wishes to signal a recoverable error that will induce backtracking. The UserDefinedFunctionFatalException exception can be thrown to halt processing altogether and abort Daffodil.
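To make the distinction concrete, here is a small sketch of a UDF that uses both exceptions. The two exception classes are local stand-ins, and declaring them as checked exceptions with a single message argument is an assumption. The Ratio class and its scale setting are hypothetical.

```java
// Local stand-ins for the Daffodil-provided exception classes. Whether the
// real classes are checked, and which constructors they offer, is assumed.
class UserDefinedFunctionProcessingError extends Exception {
    UserDefinedFunctionProcessingError(String message) {
        super(message);
    }
}

class UserDefinedFunctionFatalException extends Exception {
    UserDefinedFunctionFatalException(String message) {
        super(message);
    }
}

// Hypothetical UDF; the Daffodil annotation and interface are omitted.
class Ratio {
    private final Integer scale; // hypothetical setting; null means misconfigured

    Ratio(Integer scale) {
        this.scale = scale;
    }

    public int evaluate(int numerator, int denominator)
            throws UserDefinedFunctionProcessingError, UserDefinedFunctionFatalException {
        if (scale == null) {
            // Nothing in the data can fix a broken installation: abort.
            throw new UserDefinedFunctionFatalException("scale is not configured");
        }
        if (denominator == 0) {
            // Bad data: let Daffodil backtrack and try another branch.
            throw new UserDefinedFunctionProcessingError("denominator is zero");
        }
        return scale * numerator / denominator;
    }
}
```

The rule of thumb illustrated here is that problems caused by the data being processed are processing errors, while problems that no alternative parse could fix are fatal.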
UDF Implementation
The implementer will be expected to implement at least two classes: a provider class and at least one UDF class.
The provider class will extend the Daffodil-provided UserDefinedFunctionProvider class. It will contain a function that returns an array of the classes of all its UDFs, and optionally a lookup function (for UDFs whose constructors take arguments, such as classes that need state). This class will act as a traditional service provider as described in the ServiceLoader API, and must have an entry, with its fully qualified name, in the project's META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider file. That file may be located in the root of the project or in the resources directory, as long as it is accessible on the classpath. Through that file, this class will be made visible to the ServiceLoader API and the UDF object can be obtained by Daffodil. A sample is provided below.
    public class myUDFunctionProvider extends UserDefinedFunctionProvider {
      StateObject someState = someValue;

      @Override
      public Class<?>[] getUserDefinedFunctionClasses() {
        return new Class<?>[] { someUDFClassA.class, someUDFClassB.class };
      }

      @Override
      public UserDefinedFunction createUserDefinedFunction(String namespaceURI, String name)
          throws IllegalArgumentException, SecurityException, ReflectiveOperationException {
        UserDefinedFunction fcObject = null;
        String udfid = namespaceURI + ":" + name;
        // UDFs that need state are constructed explicitly;
        // everything else falls back to the default no-argument path.
        if (udfid.equals("urn:some:udf:needing:state:some_udf_a")) {
          fcObject = new someUDFClassA(someState);
        } else {
          fcObject = super.createUserDefinedFunction(namespaceURI, name);
        }
        return fcObject;
      }
    }
The UDF classes will contain the functionality of the UDF, embodied in an evaluate method. Because the parameter types and the return type of the evaluate function depend on the functionality, and Daffodil only needs to find the method by name, we will not provide an abstract function for it. Each class that the implementer wishes to expose as a function must implement the UserDefinedFunction interface, contain an evaluate function, and carry the UserDefinedFunctionIdentification annotation. See the Use Cases/Examples section for a sample UDF class.
Daffodil Service Loader
Daffodil will use the ServiceLoader API to poll for UDF Provider classes and return the desired function class on request.
Daffodil will have an internal object that uses the ServiceLoader iterator to aggregate and validate all the provider classes and the UDF classes they provide. This object will do the aggregation and validation at compile time, and will only initialize a UDF object and look up its method if an attempt is made to call the UDF. Any providers or UDFs that fail validation at compile time will be dropped. Any attempt to call a dropped UDF from the schema will result in an SDE.
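The aggregation and validation step could look roughly like the sketch below. It is an illustration under stated assumptions, not Daffodil's implementation: the types are simplified local stand-ins, UdfValidator and its method names are hypothetical, and the checks are inferred from the requirements in this proposal (interface, annotation with non-empty fields, exactly one evaluate method). For brevity the sketch stops at the first misconfigured provider entry, whereas the design above drops that provider and carries on.

```java
import java.io.Serializable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

// Simplified local stand-ins for the Daffodil-provided types.
interface UserDefinedFunction extends Serializable {}

@Retention(RetentionPolicy.RUNTIME)
@interface UserDefinedFunctionIdentification {
    String name();
    String namespaceURI();
}

abstract class UserDefinedFunctionProvider {
    abstract Class<?>[] getUserDefinedFunctionClasses();
}

class UdfValidator {
    // Returns the reasons a class is unusable; an empty list means it validates.
    static List<String> validate(Class<?> c) {
        List<String> problems = new ArrayList<>();
        if (!UserDefinedFunction.class.isAssignableFrom(c)) {
            problems.add("does not implement UserDefinedFunction");
        }
        UserDefinedFunctionIdentification id =
            c.getAnnotation(UserDefinedFunctionIdentification.class);
        if (id == null) {
            problems.add("missing UserDefinedFunctionIdentification annotation");
        } else if (id.name().trim().isEmpty() || id.namespaceURI().trim().isEmpty()) {
            problems.add("annotation has an empty name or namespaceURI");
        }
        int evaluateCount = 0;
        for (Method m : c.getMethods()) {
            if (m.getName().equals("evaluate")) {
                evaluateCount++;
            }
        }
        if (evaluateCount == 0) {
            problems.add("no evaluate method");
        } else if (evaluateCount > 1) {
            problems.add("overloaded evaluate method");
        }
        return problems;
    }

    // Gathers every UDF class that validates from the providers on the classpath.
    static List<Class<?>> loadValidUdfClasses() {
        List<Class<?>> valid = new ArrayList<>();
        try {
            for (UserDefinedFunctionProvider p
                    : ServiceLoader.load(UserDefinedFunctionProvider.class)) {
                Class<?>[] classes = p.getUserDefinedFunctionClasses();
                if (classes == null) {
                    continue; // a provider with nothing to offer is dropped
                }
                for (Class<?> c : classes) {
                    if (validate(c).isEmpty()) {
                        valid.add(c);
                    }
                }
            }
        } catch (ServiceConfigurationError e) {
            // A misconfigured provider entry; this sketch simply stops here.
            System.err.println("Provider could not be loaded: " + e.getMessage());
        }
        return valid;
    }
}

// Hypothetical classes used only to exercise the validator.
@UserDefinedFunctionIdentification(name = "upper", namespaceURI = "urn:example:demo")
class Upper implements UserDefinedFunction {
    public String evaluate(String s) {
        return s.toUpperCase();
    }
}

class NotAUdf {}
```

With no matching META-INF/services entry on the classpath, ServiceLoader yields no providers and loadValidUdfClasses returns an empty list.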
Daffodil DFDL Processing
Acquiring the UDF
The internal object referenced above will be instantiated only if a function call from the schema is not recognized as one of the functions Daffodil already supports. We will call this object's lookup function to find the UDF based on its name and namespace. If it finds the UDF, it will return a case class containing the UDF object, the evaluate method, and the NodeInfo.Kind of each parameter and of the return value. All of these are needed to call the UDF at runtime. If the UDF is not found, we'll throw an SDE.
    val udfCallingInfo =
      UserDefinedFunctionService.lookupUserDefinedFunctionCallingInfo(namespace, fName)
    val UserDefinedFunctionService.UserDefinedFunctionCallingInfo(udf, ei) = udfCallingInfo.get
    val UserDefinedFunctionService.EvaluateMethodInfo(
      evaluateMethod, evaluateParamTypes, evaluateReturnType) = ei
Calling the UDF
Within the DFDL expression processing code, Daffodil will define 2 case classes, a UserDefinedFunctionCallExpr and a UserDefinedFunctionCall. UserDefinedFunctionCallExpr will extend Daffodil's FunctionCallBase, and override inherentType, targetTypeForSubexpression and compiledDPath. It will call UserDefinedFunctionCall as follows.
    UserDefinedFunctionCallExpr(functionQNameString, functionQName, args,
      evaluateParamTypes, evaluateReturnType,
      UserDefinedFunctionCall(_, _, udf, evaluateMethod))

    case class UserDefinedFunctionCallExpr(
      nameAsParsed: String,
      fnQName: RefQName,
      args: List[Expression],
      argTypes: List[NodeInfo.Kind],
      resultType: NodeInfo.Kind,
      constructor: (String, List[CompiledDPath]) => RecipeOp)
      extends FunctionCallBase(nameAsParsed, fnQName, args) {

      override lazy val inherentType = resultType

      lazy val argToArgType = {
        checkArgCount(argTypes.length)
        (args zip argTypes).toMap
      }

      override def targetTypeForSubexpression(childExpr: Expression): NodeInfo.Kind = {
        argToArgType.get(childExpr) match {
          case Some(tt) => tt
          case None => Assert.invariantFailed("subexpression isn't of the expected type.")
        }
      }

      override lazy val compiledDPath = {
        checkArgCount(argTypes.length)
        val recipes = args.map { _.compiledDPath }
        val res = new CompiledDPath(constructor(nameAsParsed, recipes) +: conversions)
        res
      }
    }
UserDefinedFunctionCall will override computeValue to call the evaluate method through its invoke method. It will catch any exceptions and treat them either as a processing error or as a fatal error/abort. Errors calling the method (such as reflection errors or IllegalArgumentException) and UserDefinedFunctionProcessingError are treated as processing errors. Any other error is treated as a fatal error/abort.
    case class UserDefinedFunctionCall(
      functionQNameString: String,
      recipes: List[CompiledDPath],
      userDefinedFunction: UserDefinedFunction,
      evaluateFxn: UserDefinedFunctionMethod)
      extends FNArgsList(recipes) {

      override def computeValue(values: List[Any], dstate: DState) = {
        val jValues = values.map { _.asInstanceOf[Object] }
        try {
          val res = evaluateFxn.method.invoke(userDefinedFunction, jValues: _*)
          res
        } catch {
          // The UDF itself threw: unwrap the cause and classify it
          case e: InvocationTargetException => {
            val targetException = e.getTargetException
            targetException match {
              case te: UserDefinedFunctionProcessingError =>
                throw new UserDefinedFunctionProcessingErrorException(
                  s"User Defined Function '$functionQNameString'",
                  Maybe(dstate.compileInfo.schemaFileLocation),
                  dstate.contextLocation, Maybe(te), Maybe.Nope)
              case te: Exception =>
                throw new UserDefinedFunctionFatalErrorException(
                  s"User Defined Function '$functionQNameString' Error",
                  te, userDefinedFunction.getClass.getName)
            }
          }
          // The call itself failed: treat as a processing error
          case e @ (_: IllegalArgumentException | _: NullPointerException |
            _: ReflectiveOperationException) =>
            throw new UserDefinedFunctionProcessingErrorException(
              s"User Defined Function '$functionQNameString'",
              Maybe(dstate.compileInfo.schemaFileLocation),
              dstate.contextLocation, Maybe(e), Maybe.Nope)
          case e: ExceptionInInitializerError =>
            throw new UserDefinedFunctionFatalErrorException(
              s"User Defined Function '$functionQNameString' Error",
              e, userDefinedFunction.getClass.getName)
        }
      }
    }
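The same classification can be shown in a few lines of plain Java reflection, which may help UDF authors predict how an exception will be treated. This is a simplified sketch, not the Daffodil code: UdfInvoker, Outcome and Picky are hypothetical names, the exception class is a local stand-in, and anything thrown by the UDF other than a processing error is lumped together as fatal.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Local stand-in for the Daffodil-provided exception class.
class UserDefinedFunctionProcessingError extends Exception {
    UserDefinedFunctionProcessingError(String message) {
        super(message);
    }
}

class UdfInvoker {
    enum Outcome { OK, PROCESSING_ERROR, FATAL }

    // Returns the first public method named evaluate, or null if there is none.
    static Method findEvaluate(Class<?> c) {
        for (Method m : c.getMethods()) {
            if (m.getName().equals("evaluate")) {
                return m;
            }
        }
        return null;
    }

    // Invokes evaluate reflectively and reports how a failure would be treated.
    static Outcome classify(Object udf, Method evaluate, Object... args) {
        try {
            evaluate.invoke(udf, args);
            return Outcome.OK;
        } catch (InvocationTargetException e) {
            // The UDF itself threw; unwrap to see what it was.
            Throwable cause = e.getCause();
            return (cause instanceof UserDefinedFunctionProcessingError)
                ? Outcome.PROCESSING_ERROR
                : Outcome.FATAL;
        } catch (IllegalArgumentException | NullPointerException
                | ReflectiveOperationException e) {
            // The call could not be made, for example wrong argument types.
            return Outcome.PROCESSING_ERROR;
        }
    }
}

// Hypothetical UDF used only to exercise the sketch.
class Picky {
    public int evaluate(int x) throws UserDefinedFunctionProcessingError {
        if (x < 0) {
            throw new UserDefinedFunctionProcessingError("negative input");
        }
        if (x == 0) {
            throw new IllegalStateException("unexpected state");
        }
        return x * 2;
    }
}
```

Calling classify with a positive argument yields OK, a negative argument yields PROCESSING_ERROR, zero yields FATAL, and a String argument yields PROCESSING_ERROR because invoke rejects the mismatched type with an IllegalArgumentException.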
Diagnostics
We intend to supply the user with at least the following errors/warnings:
- Warning: Any ignored/dropped User Defined Function or User Defined Function Providers
- Error: Errors loading User Defined Function Providers or initializing User Defined Functions
- Info: User Defined Function Loaded
- SDE: No User Defined Function class with the specified name/namespace found
Testing
ID | Description | Test Data |
---|---|---|
1 | Tests when there are no providers found by the ServiceLoader API due to missing or empty META-INF file | No META-INF file on classpath |
2 | Tests when there is an error thrown from ServiceLoader API | META-INF file contains class that doesn’t exist |
3 | Tests when UDF Provider has no function classes | UDFP whose getUDF func returns null |
4 | Tests when UDF Provider has empty function class | UDFP whose getUDF func returns empty array of classes |
5 | Tests when function classes don’t implement UserDefinedFunction interface | UDF with function class that doesn’t implement UserDefinedFunction interface |
6 | Tests when function classes don’t have annotations | UDF with function class that doesn’t have UserDefinedFunctionIdentification annotation |
7 | Tests when function classes have empty/invalid annotation fields | UDF with function class that has annotation function with empty fields |
8 | Tests when function classes have no evaluate function | UDF with function class that doesn’t have a method called evaluate |
9 | Tests when function can’t be found | Function call from schema with no matching UDF loaded |
10 | Tests when function class has an overloaded evaluate function | UDF with overloaded evaluate function |
11 | Tests when the number of arguments is incorrect | Function call from schema with incorrect arg number |
12 | Tests when argument types are incorrect | Function call from schema with incorrect arg type |
13 | Tests when argument types are unsupported | Function call from schema with unsupported type (such as Array of String) |
14 | Tests when return type is unsupported | UDF with unsupported return type such as Array of Arrays |
15 | Tests UDF with no args | UDF with no params |
16 | Tests UDF with no return type | UDF with void return type |
17 | Tests UDF with primitive int params and returns | UDF with primitive params and return |
18 | Tests UDF with primitive byte params and returns | UDF with primitive params and return |
19 | Tests UDF with primitive byte array params and returns | UDF with primitive params and return |
20 | Tests UDF with primitive short params and returns | UDF with primitive params and return |
21 | Tests UDF with primitive long params and returns | UDF with primitive params and return |
22 | Tests UDF with primitive double params and returns | UDF with primitive params and return |
23 | Tests UDF with primitive float params and returns | UDF with primitive params and return |
24 | Tests UDF with primitive boolean params and returns | UDF with primitive params and return |
25 | Tests UDF with Boxed Integer params and returns | UDF with boxed params and return |
26 | Tests UDF with Boxed Byte params and returns | UDF with boxed params and return |
27 | Tests UDF with Boxed Short params and returns | UDF with boxed params and return |
28 | Tests UDF with Boxed Long params and returns | UDF with boxed params and return |
29 | Tests UDF with Boxed Double params and returns | UDF with boxed params and return |
30 | Tests UDF with Boxed Float params and returns | UDF with boxed params and return |
31 | Tests UDF with Boxed Boolean params and returns | UDF with boxed params and return |
32 | Tests UDF with Java Big Integer params and returns | UDF with specified params and returns |
33 | Tests UDF with Java Big Decimal params and returns | UDF with specified params and returns |
34 | Tests UDF with String params and returns | UDF with specified params and returns |
35 | Tests when no UDFs called, and no UDFs available to be loaded | No UDFs on classpath, no UDF in schema |
36 | Tests when UDFs called, but no UDFs loaded | No UDFs on classpath, UDF in schema |
37 | Tests when UDF called with default namespace | Default namespace set to UDF namespaceURI; UDF calls with no prefix |
38 | Tests when exceptions thrown during loading UDFP | UDFP classes throws exception in class |
39 | Tests when exceptions thrown during loading UDFP’s UDF classes | UDFP throws exception in getUDFs function |
40 | Tests when exceptions thrown during loading UDF | UDF throws exception in class |
41 | Tests when custom exceptions thrown during evaluating (FatalError) | UDF throws exception in evaluate function |
42 | Tests when UDFProcessingError thrown during evaluating (ProcessingError) | UDF throws UDFProcessingError in evaluate function |
43 | Tests when UDF initializer returns object of wrong type | UDFP’s initialization function creates UDF object of different type |
Pull Requests
https://github.com/apache/incubator-daffodil/pull/273 - Initial Proposal
https://github.com/apache/incubator-daffodil/pull/279 - Final Product