MXNet can integrate with many different kinds of backend libraries, including TVM, MKLDNN, TensorRT, Intel nGraph and more. These backend in general support a limited number of operators, and thus running computation in a model usually involves in interaction between backend-supported operators and MXNet operators.

These backend libraries share some common requirements:

  • TVM , MKLDNN and nGraph uses customized data formats. Interaction between these backends with MXNet requires data format conversion.
  • TVM, MKLDNN, TensorRT and nGraph fuses operators.

Integration with these backends should happen in the granularity of subgraphs instead of in the granularity of operators. To fuse operators, it's obvious that we need to divide a graph into subgraphs so that the operators in a subgraph can be fused into a single operator. To handle customized data formats, we should partition a computation graph into subgraphs as well. Each subgraph contains only TVM, MKLDNN or ngraph operators. In this way, MXNet converts data formats only when entering such a subgraph and the operators inside a subgraph handle format conversion themselves if necessary. This makes interaction of TVM and MKLDNN with MXNet much easier. Neither the MXNet executor nor the MXNet operators need to deal with customized data formats.

Even though invoking these libraries from MXNet requires similar steps, the partitioning rule and the subgraph execution of these backends can be different. As such, we define the following interface for backends to customize graph partitioning and subgraph execution inside an operator.

class SubgraphProperty {
public:
// the criteria of selecting the subgraph nodes.
virtual SubgraphSelectorPtr CreateSubgraphSelector() const = 0;
// create an nnvm node for a given subgraph. Here users can customize how to
// execute the operators in the subgraph.
virtual nnvm::NodePtr CreateSubgraphNode(const nnvm::Graph &g) const = 0;
};

Step 1: graph partition
Graph partitioning is to traverse a computation graph and group operators into subgraphs based on certain rules. There already exists an TVM fuse pass in NNVM, which groups operators into subgraphs based on certain rules (e.g., convolution followed by element-wise operations). This graph partitioner is TVM-specific. It doesn't work for other backends. For example, MKLDNN operator fusion requires a partitioner that finds subgraphs with specific patterns (e.g., convolution followed by batchnorm, followed by activation, etc).

Regardless of the diverse partitioning requirements, we assume all graph partitioning shares the following requirements:

  • all nodes in a subgraph should be connected via either incoming links or outgoing links or both.
  • a node can't belong to two or more subgraphs.

Given these assumptions, our graph partitioning algorithm traverses from every node in a graph with breadth-first search and explore their neighbor nodes with rules provided by users. The interface below allows users to define the selection rules. Given this interface, users can determine which edges to follow to generate a subgraph. Starting from every node, the graph partitioning algorithm creates a new subgraph selector and tries to grow a subgraph. Two criteria need to meet before a node is selected as a starting node:

  • the node hasn't been selected as part of a subgraph;
  • Select is called on the node and returns true.

From this starting node, SelectInput and SelectOutput are called to determine if a neighbor node (that hasn't been selected by another subgraph before) should be selected as a candidate for the subgraph. The search continues when a new node is selected as a candidate, and terminates when no more nodes are found. When the process of searching for candidate nodes ends, all of the candidate nodes will passed to Filter to finalize the subgraph. The filter step gives the user the last opportunity to drop out some of the candidate nodes (by default, Filter returns all nodes as the subgraph nodes). The selector is stateful. It can change its own state when seeing a new node.

class SubgraphSelector {
public:
// Select a starting node for a subgraph.
virtual bool Select(const nnvm::Node &n) = 0;
// The two methods below select neighbors in the incoming or outgoing links as candidate nodes of a subgraph.
virtual bool SelectInput(const nnvm::Node &curr_node, const nnvm::Node &new_node) = 0;
virtual bool SelectOutput(const nnvm::Node &curr_node, const nnvm::Node &new_node) = 0;
// After collecting all candidate nodes for a subgraph, users can use this method to prune some of the nodes
// to finalize the subgraph.
virtual std::vector<nnvm::Node*> Filter(const std::vector<nnvm::Node*>& candidates);
};

We provide a default selector called ContainOpSelector, which extracts a subgraph with operators supported by a backend. This selector or its variant is commonly needed by most of the backends.

When a subgraph is found, the partitioning algorithm invokes SubgraphProperty::CreateSubgraphNode to create a new node for the subgraph and connects the new node back to the original graph to replace the subgraph. The subgraph passed to CreateSubgraphNode contains shape/dtype/storage information of the input nodes. This is important because backends, such as TVM and TensorRT, need these information to compile and select the best kernel. CreateSubgraphNode allows any customization on the subgraph and the node, which gives users many opportunities to optimize the subgraph. For example, some backends, such as TVM and MKLDNN, can perform another level of graph partitioning in CreateSubgraphNode for operator fusion.

Each backend that uses the subgraph mechanism for integration needs to register a subgraph property with a macro MXNET_REGISTER_SUBGRAPH_PROPERTY. For example, MKLDNN can register its subgraph property as below. The first argument of this macro is the backend name and MKLDNNSubgraphProperty is a class name of the MKLDNN subgraph property.

MXNET_REGISTER_SUBGRAPH_PROPERTY("mkldnn", MKLDNNSubgraphProperty);

The subgraph properties registered by MXNET_REGISTER_SUBGRAPH_PROPERTY is used for graph partitioning in bind/simple_bind for Symbol executor and in CachedOp for Gluon hybridize. We only need to partition the forward graph. The backward graph will be generated accordingly. Currently, our design only supports graph partitioning for one backend. The backend is determined by an environment variable MXNET_SUBGRAPH_BACKEND. In the next step, we'll support graph partitioning for multiple backends.

Step 2: subgraph operator (function call)
Although there are two levels of graph partitioning, we only need to handle one level of subgraphs in the executor because the subgraphs in the second level are fused into operators. We can execute these subgraphs inside special operators, which is specific to the backend.

  • TVM execution operator: loads a subgraph from a TVM compiled binary, a graph JSON file and weight arrays, and executes the subgraph composed of fused operators. We can first use the TVM executor to execute the subgraph, but in the future we should use the MXNet executor because MXNet executes operators in multiple threads, which is useful for task parallelism. The operator needs to convert all output NDArrays of the subgraph to the default format.
  • MKLDNN execution operator: gets a subgraph from the first step and runs operators in the MXNet executors. Like TVM operator, this operator also needs to convert all output NDArrays of the subgraph to the default format.
  • TensorRT has its engine for executing the optimized subgraph.
  • nGraph execution operators: it's similar to TensorRT.

To customize the subgraph execution, a backend needs to provide their own operator implementation and attach the operator to the subgraph node when SubgraphProperty::CreateSubgraphNode is called. The subgraph operator should be a stateful operator and contain a computation graph. We provide a default subgraph operator implementation (“_subgraph_op”) that executes operators with MXNet Executor.

For fast inference in TVM and MKLDNN, the subgraph operators need to maintain a copy of weight arrays (similar to the closure of a function). In this way, we can convert the data format of the weight arrays and cache the array inside the subgraph operator to avoid any redundant format conversion. The original weight arrays will still be part of the inputs of the subgraph operator. Even though the weight arrays are normally not modified, we still need to handle this case correctly. One solution is to maintain a version number for the var of an NDArray, which is increased by one whenever the NDArray is modified in the execution engine. We can use the version number to determine the weight arrays have been modified whenever the subgraph operator is invoked.

The benefit of invoking a subgraph inside an operator
Introducing a subgraph operator for TVM and MKLDNN may sound like unnecessary complexity. It actually significantly reduces the complexity of the integration. By using the subgraph operator, we can completely isolate TVM operators and MKLDNN operators from MXNet operators as well as the default MXNet memory planning. Inside the subgraph operators, we don't need to deal with data format conversion and can use a completely different memory plan for the subgraph.


  • No labels

4 Comments

  1. Is the SubgraphProperty name good? I would think SubgraphSelectorFactory or something like that would be clearer.

    The semantic / naming of filter is a bit confusing. In functional programming filter always get a function passed to select which elements to filter. (In this case is Select?) Why not just operator()() since the class already suggest subgraph selection?   or finalize?


    I would change:

    Select → StartingNode ...   

    new_node → neighbour

    Filter → 

      virtual std::vector<nnvm::Node*> operator()(const std::vector<nnvm::Node*>& candidates);


  2. Thanks for your comments.

    I would prefer to use SubgraphProperty for two reasons. 1) SubgraphProperty works as more than SubgraphSelectorFactory. It provides additional methods such as CreateSubgraphNode. In the future, we may add more methods. 2) SubgraphProperty is similar to OperatorProperty, so it kind of follows the naming convention.

    The function of the filter method is exactly what you described: to selector which elements to filter. we can also name it with finalize, but I think filter is more intuitive.

  3. This looks great Da.  Very happy we're working on this.  I'm wondering what the plan is for merging this into mainline?  Are we developing it in a branch that get's a big review when it's ready?  What would be the first steps in porting for example TRT support to the new API?

  4. Kellen Sunderland the subgraph API has been merged to MXNet. https://github.com/apache/incubator-mxnet/pull/12157

    The MKLDNN team is using this subgraph API to improve performance. You can use their code as an example.