A prior version of the proposal can be found here.

Introduction

Some functionality is either unsupported or too difficult to express in the DFDL expression language. While the desired functionality could be added to the codebase, some of it may apply to only a small dataset and would not see widespread use. Adding it to the codebase also leaves the burden of implementation on the Daffodil developers. We would like Daffodil to be able to register and execute external, user defined functions (UDFs) in DFDL expressions.

See DAFFODIL-2186 for additional information.

Use Cases/Examples

A trivial use case would be adding a 'replace' function that is callable from a DFDL expression. In the DFDL schema, a call might look like the following, where transformedElem will contain "Hello_World" if someElement resolves to "Hello World":

xmlns:sdf="urn:example:com:ext:udfunction:stringfunctions"
...
<xs:element name="transformedElem" ... dfdl:inputValueCalc="{ sdf:replace(../someElement, ' ', '_') }"/>

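The function class would look something like the following:

import org.apache.daffodil.udf.UserDefinedFunction;
import org.apache.daffodil.udf.UserDefinedFunctionIdentification;

@UserDefinedFunctionIdentification(
	name = "replace",
	namespaceURI = "urn:example:com:ext:udfunction:stringfunctions"
)
public class Replace implements UserDefinedFunction {
	// Replaces every occurrence of pre in orig with post
	public String evaluate(String orig, String pre, String post) {
		return orig.replace(pre, post);
	}
}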


Another use case would be implementing the normalization of elevation above Mean-Sea-Level (MSL) to Height-Above-Ellipsoid (HAE) for Link16F1 data. In the DFDL schema, a call might look like the following, where the function returns the result of the conversion:

xmlns="http://extOther.UDFunction.ElevationConversions.com"
...
dfdl:outputValueCalc="{ convert_to_hae(../lat, ../lon, ../msl) }"

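The UDF class would look something like the following:

import org.apache.daffodil.udf.UserDefinedFunction;
import org.apache.daffodil.udf.UserDefinedFunctionIdentification;

@UserDefinedFunctionIdentification(
	name = "convert_to_hae",
	namespaceURI = "http://extOther.UDFunction.ElevationConversions.com"
)
public class MSLConversions implements UserDefinedFunction {
	// Converts an MSL elevation at the given position to HAE
	public double evaluate(double latitude, double longitude, double msl) {
		// implementation...
	}
}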

Requirements

  1. The UDF will be defined in a stand-alone library outside of the Daffodil codebase
  2. The UDF must be accessible to and callable from the Daffodil code
  3. Daffodil must be able to process and pass the return value from the UDF back to the Schema
  4. The support of UDFs in the DFDL Schema must be language agnostic and not Java, Scala or Daffodil specific

Proposed Solution

The Daffodil solution will use a combination of Java's ServiceLoader and Reflection APIs.

Daffodil Provided Classes

Daffodil will provide a UserDefinedFunction interface, a UserDefinedFunctionProvider abstract class, a UserDefinedFunctionIdentification annotation class, and two exception classes: UserDefinedFunctionFatalException and UserDefinedFunctionProcessingError.

Each UDF must implement the UserDefinedFunction interface. This marks it as a UDF to Daffodil and gives it some properties such as Serializability.

The UserDefinedFunctionProvider class will have an abstract function that returns an array of classes representing all the UDFs the provider is aware of. It will also provide a default function that instantiates the UDF classes into objects. This default function can only be used for classes with no-argument constructors, and must be overridden for other types of UDF classes. The UserDefinedFunctionProvider class must be extended by each provider class supplied.

The UserDefinedFunctionIdentification annotation class must be applied and properly initialized for each UDF class. It provides name and namespaceURI elements that will be used to call the function from the schema.

The UserDefinedFunctionProcessingError exception can be thrown when an implementer wishes to signal a recoverable error that will induce backtracking. The UserDefinedFunctionFatalException exception can be thrown to halt processing altogether and abort Daffodil.
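
To make the relationship between these pieces concrete, here is a minimal sketch of how the provided types and the default creation function could fit together. Everything in it is a stand-in written for illustration, not Daffodil's actual source: the types are redeclared locally so the example compiles on its own, the default function is assumed to match on the annotation's namespaceURI and name, and StringFunctionsProvider is a hypothetical provider.

```java
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in marker interface; extending Serializable gives UDFs serializability.
interface UserDefinedFunction extends Serializable {}

// Stand-in annotation; RUNTIME retention is needed for reflection to see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UserDefinedFunctionIdentification {
    String name();
    String namespaceURI();
}

// Stand-in provider base class with an assumed default creation function.
abstract class UserDefinedFunctionProvider {
    public abstract Class<?>[] getUserDefinedFunctionClasses();

    // Default path: only works for UDF classes with a no-argument constructor.
    public UserDefinedFunction createUserDefinedFunction(String namespaceURI, String name)
            throws ReflectiveOperationException {
        for (Class<?> c : getUserDefinedFunctionClasses()) {
            UserDefinedFunctionIdentification id =
                c.getAnnotation(UserDefinedFunctionIdentification.class);
            if (id != null && id.namespaceURI().equals(namespaceURI) && id.name().equals(name)) {
                return (UserDefinedFunction) c.getDeclaredConstructor().newInstance();
            }
        }
        throw new ClassNotFoundException("No UDF for " + namespaceURI + ":" + name);
    }
}

@UserDefinedFunctionIdentification(
    name = "replace",
    namespaceURI = "urn:example:com:ext:udfunction:stringfunctions")
class Replace implements UserDefinedFunction {
    public String evaluate(String orig, String pre, String post) {
        return orig.replace(pre, post);
    }
}

// Hypothetical provider that relies entirely on the default creation function.
class StringFunctionsProvider extends UserDefinedFunctionProvider {
    @Override
    public Class<?>[] getUserDefinedFunctionClasses() {
        return new Class<?>[] { Replace.class };
    }
}
```

A provider whose UDFs need constructor arguments would override createUserDefinedFunction and fall back to the default for everything else.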

UDF Implementation

The implementer will be expected to implement at least two classes: a provider class and at least one UDF class.

The provider class will be an implementation of the Daffodil provided UserDefinedFunctionProvider class. It will contain a function that returns an array of classes of all its UDFs, and an optional override of the creation function (for UDFs with argument constructors, such as classes that need state). This class will act as a traditional service provider as explained in the ServiceLoader API, and must have an entry, with its fully qualified name, in the project's META-INF/services/org.apache.daffodil.udf.UserDefinedFunctionProvider file. That file may be located in either the root of the project or the resources directory, as long as it's accessible on the classpath. Through that file, this class will be made visible to the ServiceLoader API and the UDF object can be obtained by Daffodil. A sample is provided below.

public class myUDFunctionProvider extends UserDefinedFunctionProvider {
	// State handed to UDFs that need it (StateObject and someValue are placeholders)
	StateObject someState = someValue;

	@Override
	public Class<?>[] getUserDefinedFunctionClasses() {
		return new Class<?>[] { someUDFClassA.class, someUDFClassB.class };
	}

	@Override
	public UserDefinedFunction createUserDefinedFunction(String namespaceURI, String name)
			throws IllegalArgumentException, SecurityException, ReflectiveOperationException {
		UserDefinedFunction fcObject = null;
		String udfid = namespaceURI + ":" + name;

		if (udfid.equals("urn:some:udf:needing:state:some_udf_a")) {
			// This UDF has an argument constructor, so it is constructed explicitly
			fcObject = new someUDFClassA(someState);
		} else {
			// All other UDFs use the default no-argument construction
			fcObject = super.createUserDefinedFunction(namespaceURI, name);
		}
		return fcObject;
	}
}


The UDF classes will contain the functionality of the UDF, embodied in an evaluate method. Because the parameter types and the return type of the evaluate function depend on the functionality, and we really only care about the name, we will not provide an abstract function for it. Each function that the implementer wishes to expose must implement the UserDefinedFunction interface, contain an evaluate function, and have the UserDefinedFunctionIdentification annotation applied to the class. See the Use Cases/Examples section above for sample UDF classes.

Daffodil Service Loader

Daffodil will use the ServiceLoader API to poll for UDF Provider classes and return the desired function class on request.
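
As an illustration of the discovery step, the sketch below iterates a ServiceLoader and keeps going when one provider entry fails to load. It is an assumption about how such a loop could be written rather than Daffodil's implementation: UserDefinedFunctionProvider is redeclared locally as a stand-in, and ProviderDiscovery and the warnings list are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

// Stand-in for the Daffodil provided provider base class.
abstract class UserDefinedFunctionProvider {
    public abstract Class<?>[] getUserDefinedFunctionClasses();
}

class ProviderDiscovery {
    // Returns every provider that loads cleanly; failures are recorded as warnings.
    static List<UserDefinedFunctionProvider> discover(List<String> warnings) {
        List<UserDefinedFunctionProvider> found = new ArrayList<>();
        // ServiceLoader reads the META-INF/services/<service type name> files on the classpath.
        Iterator<UserDefinedFunctionProvider> it =
            ServiceLoader.load(UserDefinedFunctionProvider.class).iterator();
        while (true) {
            try {
                if (!it.hasNext()) {
                    break;
                }
                found.add(it.next());
            } catch (ServiceConfigurationError e) {
                // Raised for entries naming a missing class or one that cannot be
                // instantiated; recovery on the next call is best effort.
                warnings.add("Dropping provider: " + e.getMessage());
            }
        }
        return found;
    }
}
```

With no service file on the classpath the loader simply yields no providers, which corresponds to the first case in the test table.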

Daffodil will have an internal object that uses the ServiceLoader iterator to aggregate and validate all the provider classes and the UDF classes they provide. This object will do the aggregation and validation at compile time, and will only initialize a UDF object and look up its method if an attempt is made to call the UDF. Any providers or UDFs that fail validation at compile time will be dropped. Any attempt to call a dropped UDF from the schema will result in an SDE.
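
The validation of a single UDF class can be sketched with plain reflection. The checks below mirror the requirements stated in this proposal (interface, annotation, evaluate method); rejecting an overloaded evaluate is an assumption based on the overloaded-evaluate test case, and UdfValidator, GoodUdf and BadUdf are hypothetical names. The Daffodil types are redeclared as stand-ins so the example is self-contained.

```java
import java.io.Serializable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the Daffodil provided types.
interface UserDefinedFunction extends Serializable {}

@Retention(RetentionPolicy.RUNTIME)
@interface UserDefinedFunctionIdentification {
    String name();
    String namespaceURI();
}

class UdfValidator {
    // Returns the reasons a class would be dropped; an empty list means it validates.
    static List<String> problems(Class<?> c) {
        List<String> out = new ArrayList<>();
        if (!UserDefinedFunction.class.isAssignableFrom(c)) {
            out.add("does not implement UserDefinedFunction");
        }
        UserDefinedFunctionIdentification id =
            c.getAnnotation(UserDefinedFunctionIdentification.class);
        if (id == null) {
            out.add("missing UserDefinedFunctionIdentification annotation");
        } else if (id.name().trim().isEmpty() || id.namespaceURI().trim().isEmpty()) {
            out.add("annotation has an empty name or namespaceURI");
        }
        // Count public methods named evaluate, including inherited ones.
        int evaluateCount = 0;
        for (Method m : c.getMethods()) {
            if (m.getName().equals("evaluate")) {
                evaluateCount++;
            }
        }
        if (evaluateCount == 0) {
            out.add("no evaluate method");
        } else if (evaluateCount > 1) {
            out.add("overloaded evaluate method");
        }
        return out;
    }
}

@UserDefinedFunctionIdentification(
    name = "replace",
    namespaceURI = "urn:example:com:ext:udfunction:stringfunctions")
class GoodUdf implements UserDefinedFunction {
    public String evaluate(String orig, String pre, String post) {
        return orig.replace(pre, post);
    }
}

// Fails three checks: no interface, no annotation, overloaded evaluate.
class BadUdf {
    public int evaluate(int a) { return a; }
    public int evaluate(int a, int b) { return a + b; }
}
```

Collecting all problems instead of stopping at the first makes it possible to report every reason a class was dropped in a single warning.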

Daffodil DFDL Processing

Acquiring the UDF

The internal object referenced above will be instantiated only if a function call from the schema is not recognized as one of the functions Daffodil already supports. We will call this object's lookup function to find the UDF based on the name and namespace. If it finds the UDF, it will return a case class containing the UDF, the evaluate method, and the NodeInfo.Kind of its parameter types and return type. These are all needed to call the UDF at runtime. If the UDF is not found, we'll throw an SDE.

val udfCallingInfo = UserDefinedFunctionService.lookupUserDefinedFunctionCallingInfo(namespace, fName)

val UserDefinedFunctionService.UserDefinedFunctionCallingInfo(udf, ei) = udfCallingInfo.get
val UserDefinedFunctionService.EvaluateMethodInfo(evaluateMethod, evaluateParamTypes, evaluateReturnType) = ei

Calling the UDF

Within the DFDL expression processing code, Daffodil will define two case classes: UserDefinedFunctionCallExpr and UserDefinedFunctionCall. UserDefinedFunctionCallExpr will extend Daffodil's FunctionCallBase and override inherentType, targetTypeForSubexpression and compiledDPath. It will be constructed with UserDefinedFunctionCall as follows.

UserDefinedFunctionCallExpr(functionQNameString, functionQName, args, evaluateParamTypes, evaluateReturnType, UserDefinedFunctionCall(_, _, udf, evaluateMethod))

case class UserDefinedFunctionCallExpr(
  nameAsParsed: String,
  fnQName: RefQName,
  args: List[Expression],
  argTypes: List[NodeInfo.Kind],
  resultType: NodeInfo.Kind,
  constructor: (String, List[CompiledDPath]) => RecipeOp)
  extends FunctionCallBase(nameAsParsed, fnQName, args) {

  override lazy val inherentType = resultType

  lazy val argToArgType = {
    checkArgCount(argTypes.length)
    (args zip argTypes).toMap
  }

  override def targetTypeForSubexpression(childExpr: Expression): NodeInfo.Kind = {
    argToArgType.get(childExpr) match {
      case Some(tt) => tt
      case None => Assert.invariantFailed("subexpression isn't of the expected type.")
    }
  }

  override lazy val compiledDPath = {
    checkArgCount(argTypes.length)
    val recipes = args.map { _.compiledDPath }
    val res = new CompiledDPath(constructor(nameAsParsed, recipes) +: conversions)
    res
  }
}


UserDefinedFunctionCall will override computeValue to call the evaluateFxn using its invoke method. It will catch any exceptions and treat them either as a processing error or as a fatal error/abort. Errors calling the method (such as reflection errors or IllegalArgumentException) and UserDefinedFunctionProcessingError are treated as processing errors. Any other error is treated as a fatal error/abort.


case class UserDefinedFunctionCall(
  functionQNameString: String,
  recipes: List[CompiledDPath],
  userDefinedFunction: UserDefinedFunction,
  evaluateFxn: UserDefinedFunctionMethod)
  extends FNArgsList(recipes) {

  override def computeValue(values: List[Any], dstate: DState) = {
    val jValues = values.map { _.asInstanceOf[Object] }
    try {
      val res = evaluateFxn.method.invoke(userDefinedFunction, jValues: _*)
      res
    } catch {
      case e: InvocationTargetException => {
        val targetException = e.getTargetException
        targetException match {
          case te: UserDefinedFunctionProcessingError =>
            throw new UserDefinedFunctionProcessingErrorException(
              s"User Defined Function '$functionQNameString'",
              Maybe(dstate.compileInfo.schemaFileLocation), dstate.contextLocation, Maybe(te), Maybe.Nope)
          case te: Exception =>
            throw new UserDefinedFunctionFatalErrorException(
              s"User Defined Function '$functionQNameString' Error",
              te, userDefinedFunction.getClass.getName)
        }
      }
      case e @ (_: IllegalArgumentException | _: NullPointerException | _: ReflectiveOperationException) =>
        throw new UserDefinedFunctionProcessingErrorException(
          s"User Defined Function '$functionQNameString'",
          Maybe(dstate.compileInfo.schemaFileLocation), dstate.contextLocation, Maybe(e), Maybe.Nope)
      case e: ExceptionInInitializerError =>
        throw new UserDefinedFunctionFatalErrorException(
          s"User Defined Function '$functionQNameString' Error",
          e, userDefinedFunction.getClass.getName)
    }
  }
}
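
The same classification can be seen in isolation in the Java sketch below, which invokes an evaluate method reflectively and reports which kind of error resulted. It is a simplified illustration, not the Daffodil code: UserDefinedFunctionProcessingError is redeclared as a stand-in (making it a checked exception is an assumption), and UdfInvoker, Outcome, Flaky and UdfInvokerDemo are hypothetical names. Unlike the Scala above, it treats every other cause of an InvocationTargetException as fatal.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Stand-in for the exception a UDF throws to request backtracking.
class UserDefinedFunctionProcessingError extends Exception {
    UserDefinedFunctionProcessingError(String msg) { super(msg); }
}

class UdfInvoker {
    enum Outcome { OK, PROCESSING_ERROR, FATAL }

    static Outcome classify(Object udf, Method evaluate, Object... args) {
        try {
            evaluate.invoke(udf, args);
            return Outcome.OK;
        } catch (InvocationTargetException e) {
            // The UDF itself threw; only the dedicated exception is recoverable.
            return (e.getCause() instanceof UserDefinedFunctionProcessingError)
                ? Outcome.PROCESSING_ERROR
                : Outcome.FATAL;
        } catch (IllegalArgumentException | NullPointerException | ReflectiveOperationException e) {
            // The call could not be made, for example wrong argument count or type.
            return Outcome.PROCESSING_ERROR;
        }
    }
}

// Hypothetical UDF that fails in different ways depending on its input.
class Flaky {
    public int evaluate(int x) throws UserDefinedFunctionProcessingError {
        if (x < 0) throw new UserDefinedFunctionProcessingError("negative input");
        if (x == 0) throw new IllegalStateException("unexpected state");
        return x;
    }
}

class UdfInvokerDemo {
    static UdfInvoker.Outcome run(Object... args) {
        try {
            Method m = Flaky.class.getMethod("evaluate", int.class);
            return UdfInvoker.classify(new Flaky(), m, args);
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }
}
```

For example, UdfInvokerDemo.run(-1) yields PROCESSING_ERROR because the UDF asked for backtracking, while UdfInvokerDemo.run(0) yields FATAL because the UDF threw an unrelated exception.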

Diagnostics

We intend to supply the user with at least the following errors/warnings:

  • Warning: Any ignored/dropped User Defined Function or User Defined Function Providers
  • Error: Errors loading User Defined Function Providers or initializing User Defined Functions
  • Info: User Defined Function Loaded
  • SDE: No User Defined function class with specified name/namespace found

Testing

ID | Description | Test Data
1 | Tests when there are no providers found by the ServiceLoader API due to missing or empty META-INF file | No META-INF file on classpath
2 | Tests when there is an error thrown from ServiceLoader API | META-INF file contains class that doesn’t exist
3 | Tests when UDF Provider has no function classes | UDFP whose getUDF func returns null
4 | Tests when UDF Provider has empty function class | UDFP whose getUDF func returns empty array of classes
5 | Tests when function classes don’t implement UserDefinedFunction interface | UDF with function class that doesn’t implement UserDefinedFunction interface
6 | Tests when function classes don’t have annotations | UDF with function class that doesn’t have UserDefinedFunctionIdentification annotation
7 | Tests when function classes have empty/invalid annotation fields | UDF with function class that has annotation function with empty fields
8 | Tests when function classes have no evaluate function | UDF with function class that doesn’t have a method called evaluate
9 | Tests when function can’t be found | Function call from schema with no matching UDF loaded
10 | Tests when function class has overloaded evaluate function | UDF with overloaded evaluate function
11 | Tests when number of arguments incorrect | Function call from schema with incorrect arg number
12 | Tests when argument types incorrect | Function call from schema with incorrect arg type
13 | Tests when argument types unsupported | Function call from schema with unsupported type (such as Array of String)
14 | Tests when return type unsupported | UDF with unsupported return type such as Array of Arrays
15 | Tests UDF with no args | UDF with no params
16 | Tests UDF with no return type | UDF with void return type
17 | Tests UDF with primitive int params and returns | UDF with primitive params and return
18 | Tests UDF with primitive byte params and returns | UDF with primitive params and return
19 | Tests UDF with primitive byte array params and returns | UDF with primitive params and return
20 | Tests UDF with primitive short params and returns | UDF with primitive params and return
21 | Tests UDF with primitive long params and returns | UDF with primitive params and return
22 | Tests UDF with primitive double params and returns | UDF with primitive params and return
23 | Tests UDF with primitive float params and returns | UDF with primitive params and return
24 | Tests UDF with primitive boolean params and returns | UDF with primitive params and return
25 | Tests UDF with Boxed Integer params and returns | UDF with boxed params and return
26 | Tests UDF with Boxed Byte params and returns | UDF with boxed params and return
27 | Tests UDF with Boxed Short params and returns | UDF with boxed params and return
28 | Tests UDF with Boxed Long params and returns | UDF with boxed params and return
29 | Tests UDF with Boxed Double params and returns | UDF with boxed params and return
30 | Tests UDF with Boxed Float params and returns | UDF with boxed params and return
31 | Tests UDF with Boxed Boolean params and returns | UDF with boxed params and return
32 | Tests UDF with Java Big Integer params and returns | UDF with specified params and returns
33 | Tests UDF with Java Big Decimal params and returns | UDF with specified params and returns
34 | Tests UDF with String params and returns | UDF with specified params and returns
35 | Tests when no UDFs called, and no UDFs available to be loaded | No UDFs on classpath, no UDF in schema
36 | Tests when UDFs called, but no UDFs loaded | No UDFs on classpath, UDF in schema
37 | Tests when UDF called with default namespace | Default namespace set to UDF namespaceURI; UDF calls with no prefix
38 | Tests when exceptions thrown during loading UDFP | UDFP class throws exception in class
39 | Tests when exceptions thrown during loading UDFP’s UDF classes | UDFP throws exception in getUDFs function
40 | Tests when exceptions thrown during loading UDF | UDF throws exception in class
41 | Tests when custom exceptions thrown during evaluating (FatalError) | UDF throws exception in evaluate function
42 | Tests when UDFProcessingError thrown during evaluating (ProcessingError) | UDF throws UDFProcessingError in evaluate function
43 | Tests when UDF initializer returns object of wrong type | UDFP’s initialization function creates UDF object of different type

Pull Requests

https://github.com/apache/incubator-daffodil/pull/273 - Initial Proposal

https://github.com/apache/incubator-daffodil/pull/279 - Final Product
