VXQuery could use a new method of executing queries and retrieving their results. One option is to add a RESTful API. The API would let application developers use the system and let us create a web-based interface for VXQuery. In addition, the current CLI access method could be rewritten to use the same API. The new API would increase the number of options a user or developer has when interacting with VXQuery.

The big question is: what is the API? This page defines the API and any work surrounding it. An issue has been created in JIRA, with sub-tasks to track the individual work items (VXQUERY-180).

End Users of the API

  • VXQuery's CLI
    • save-file VAL : File path to save the query result
    • Local only parameters
      • available-processors N : Number of available processors (default: the number of processors Java reports as available)
      • local-node-controllers N : Number of local node controllers (default 1)
    • Cluster only parameters
      • client-net-ip-address VAL : IP Address of the ClusterController
      • client-net-port N : Port of the ClusterController (default 1098)
  • Application developers (new)
  • Web-based query interface (new)

Possible Tools or Libraries

Secondary Optionals

API functions

/query (POST)

Sends a query to be executed with all the same parameters as the current CLI. The system sends back the job id and the result id.

See SwaggerIO Configuration for details on parameters and response values.
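
As a rough illustration of how a client might build this request, the sketch below form-encodes the statement and any optional parameters using only Python's standard library. The function name build_query_request, the base_url argument, and the port 8080 in the usage lines are assumptions made for illustration; this page leaves the port unspecified. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(base_url, statement, **params):
    """Build (but do not send) a POST request for the proposed /query endpoint."""
    # Form-encode the statement and optional parameters, as an HTML form would.
    fields = {"statement": statement}
    fields.update(params)
    body = urlencode(fields).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical host and port; the real port is not defined on this page.
request = build_query_request(
    "http://localhost:8080",
    "for $x in doc('books.xml')/bookstore/book/title return $x",
    metrics="true",
)
print(request.get_method(), request.full_url)
```

Encoding the body this way avoids problems with characters such as $ and & inside the XQuery statement. Passing the request to urllib.request.urlopen would send it once a server exists.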

Example 1 (Basic request):

Query
$ curl -v http://localhost:????/query -X POST \
-d "statement=for \$x in doc('books.xml')/bookstore/book/title return \$x"
Response
< HTTP/1.1 200 OK
<response>
  <requestId>6e0143ed-4657-4c9d-a184-703c930c7401</requestId>
  <status>success</status>
  <resultId>1</resultId>
  <resultUrl>http://localhost:????/query/result/1</resultUrl>
</response>

Example 2 (Basic request with additional parameters):

Query
$ curl -v http://localhost:????/query -X POST \
-d "statement=for \$x in doc('books.xml')/bookstore/book/title return \$x&frameSize=128000&showOptimizedExpressionTree=true&metrics=true"
Response
< HTTP/1.1 200 OK
<response>
  <requestId>6e0143ed-4657-4c9d-a184-703c930c7401</requestId>
  <status>success</status>
  <resultId>2</resultId>
  <resultUrl>http://localhost:????/query/result/2</resultUrl>
  <metrics>
     <elapsedTime>134</elapsedTime>
     <compileTime>110</compileTime>
  </metrics>
  <optimizedExpressionTree>
DISTRIBUTE-RESULT( $$17 )
UNNEST( $$17:child($$13, "title") )
UNNEST( $$13:child($$7, "book") )
UNNEST( $$7:child($$2, "bookstore") )
ASSIGN( $$2:doc("books.xml") )
EMPTY-TUPLE-SOURCE
  </optimizedExpressionTree>
</response>

Example 3 (Bad request):

Query
$ curl -v http://localhost:????/query -X POST \
-d "statement=for \$x in doc('ooks.xml')/bookstore/book/title return \$x"
Response
< HTTP/1.1 400 Bad Request
<response>
  <requestId>6e0143ed-4657-4c9d-a184-703c930c7401</requestId>
  <status>fatal</status>
  <error>
    <code>400</code>
    <message>Bad Request</message>
    <stackTrace>...</stackTrace>
  </error>
  <queryError>
    <code>FODC0002</code>
    <message>Error retrieving resource.</message>
  </queryError>

</response>

The VXQuery SystemException should be parsed into the queryError tag, with the code and the message separated into their own elements. The stackTrace tag may include the Java stack trace of the error.
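
The mapping from an exception to the error response could look roughly like the sketch below. It assumes the exception text has the layout "CODE: message" (for example "FODC0002: Error retrieving resource."), which is an assumption and not something this page specifies. The function name build_error_response is likewise hypothetical, and a real implementation would live in the Java server rather than in Python.

```python
import re
import xml.etree.ElementTree as ET

# Assumed layout "<code>: <message>", e.g. "FODC0002: Error retrieving resource."
_ERROR_PATTERN = re.compile(r"^\s*([A-Z]{4}\d{4}):\s*(.*)$", re.DOTALL)


def build_error_response(request_id, exception_text, stack_trace=None):
    """Return the XML error response for a failed query as a string."""
    response = ET.Element("response")
    ET.SubElement(response, "requestId").text = request_id
    ET.SubElement(response, "status").text = "fatal"

    # HTTP-level error, mirroring the Bad Request example on this page.
    error = ET.SubElement(response, "error")
    ET.SubElement(error, "code").text = "400"
    ET.SubElement(error, "message").text = "Bad Request"
    if stack_trace is not None:
        ET.SubElement(error, "stackTrace").text = stack_trace

    # Query-level error: split the exception text into code and message.
    query_error = ET.SubElement(response, "queryError")
    match = _ERROR_PATTERN.match(exception_text)
    if match:
        ET.SubElement(query_error, "code").text = match.group(1)
        ET.SubElement(query_error, "message").text = match.group(2).strip()
    else:
        # Unknown layout: keep the whole text as the message.
        ET.SubElement(query_error, "message").text = exception_text.strip()
    return ET.tostring(response, encoding="unicode")
```

If the text does not match the assumed layout, the sketch keeps it whole in the message element instead of guessing at a code.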

 

/query/result/{result-id} (GET)

Requests the query result for the given result id. The system may respond that the job is not done, in which case the result is not ready yet.

See SwaggerIO Configuration for details on parameters and response values.
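
Because the result may not be ready on the first request, a client would likely poll. The sketch below shows one possible loop. It assumes that a status other than "success" or "fatal" means the job is still running; this page does not define the not-ready response, so that rule and the function name wait_for_result are both hypothetical. The fetch argument is any callable that returns the response body as an XML string, which keeps the sketch independent of a particular HTTP library.

```python
import time
import xml.etree.ElementTree as ET


def wait_for_result(fetch, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() until the response status is 'success' or 'fatal'."""
    for attempt in range(attempts):
        root = ET.fromstring(fetch())
        status = root.findtext("status")
        if status in ("success", "fatal"):
            return root
        # Assumption: any other status means the job is still running.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError("result not ready after %d attempts" % attempts)
```

A fixed delay is the simplest choice. A client facing long-running queries might prefer a growing delay between attempts.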

Example 1 (Basic request):

Query
$ curl -v http://localhost:????/query/result/1 -X GET
Response
< HTTP/1.1 200 OK
<response>
  <requestId>6e0143ed-4657-4c9d-a184-703c930c7401</requestId>
  <status>success</status>
  <result>
    <title lang="en">Everyday Italian</title>
    <title lang="en">Harry Potter</title>
    <title lang="en">XQuery Kick Start</title>
    <title lang="en">Learning XML</title>
  </result>
</response>

Example 2 (Basic request with additional parameters):

Query
$ curl -v -G http://localhost:????/query/result/2 \
-d "metrics=true"
Response
< HTTP/1.1 200 OK
<response>
  <requestId>6e0143ed-4657-4c9d-a184-703c930c7401</requestId>
  <status>success</status>
  <result>
    <title lang="en">Everyday Italian</title>
    <title lang="en">Harry Potter</title>
    <title lang="en">XQuery Kick Start</title>
    <title lang="en">Learning XML</title>
  </result>
  <metrics>
     <elapsedTime>123</elapsedTime>
  </metrics>
</response>
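
On the client side, a response like the ones above could be unpacked as in the following sketch. The function name parse_result_response and the shape of the returned dictionary are assumptions for illustration. The sketch only handles result items that are XML elements and metrics that are whole numbers, as in the examples on this page.

```python
import xml.etree.ElementTree as ET


def parse_result_response(body):
    """Extract status, result items, and metrics from a result response."""
    root = ET.fromstring(body)

    # Serialize each child of <result> back to a string, dropping layout whitespace.
    result = root.find("result")
    items = []
    if result is not None:
        items = [ET.tostring(child, encoding="unicode").strip() for child in result]

    # Metrics are optional; the examples on this page use integer values.
    metrics = root.find("metrics")
    timings = {}
    if metrics is not None:
        timings = {entry.tag: int(entry.text) for entry in metrics}

    return {
        "requestId": root.findtext("requestId"),
        "status": root.findtext("status"),
        "items": items,
        "metrics": timings,
    }
```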

AsterixDB Code References

  • org.apache.asterix.api.http.servlet.APIServlet.java
  • org.apache.asterix.api.http.servlet.RESTAPIServlet.java
  • org.apache.asterix.hyracks.bootstrap.CCApplicationEntryPoint.java
  • https://atom.io/packages/language-jsoniq

 


10 Comments

  1. It seems that a number of the options on /query (e.g. client-net-ip-address) shouldn't be part of an HTTP interface.

    1. Good point.

      The current CLI will start a cluster and run the query even if a cluster has not been defined. Should this feature be included in the CLI based on the RESTful API, and if so, how?

      1. I think that some process has to serve the HTTP API and that that process would need to do that.

        If we want to have a completely smooth startup experience, I think that we could make the "client" create the server process/thread as well, but I'm not sure that that's a good idea ...

  2. For the result API I think that it would be good to have one URI to identify the result and to embed URIs to the following result chunks in the first response.

     

    1. This method would require materializing the whole result before sending it to the client. A next chunk method could avoid materialization.

      1. I think that there are 2 points here:

        1. If every chunk just points to the next one, you only need to map the identifier to the query state and no materialization is necessary.
        2. Keeping a query "alive" waiting for the client to request results might be worse than materializing the results on the server. Clients tend to be slow, and while the query is alive all the resources for the query remain allocated. So I think that we'd want to aim for a model similar to AsterixDB's, where the query is finished to free up the cluster and the results are materialized on the NCs.

         

  3. For the result API we should decide if the job-id/result-id (do we really need both?) is passed in the URI or as a parameter (currently I prefer the URI, but I know that there are different opinions on this subject :) ).

  4. I updated the results. We have parameters to request different intermediate query steps back as part of the result. For example, the query, abstract syntax tree, initial logical plan, optimized logical plan, the Hyracks job, and timing information (compilation and query timing). How should these results be sent to the client? (In the future this could include a graphic representing the DAG for plans or jobs.)

    1. I think that we should wrap everything in a (XML) response message (including error messages).

  5. The page has been updated with examples, and the details of the API have been moved to the SwaggerIO Configuration page. Please post any feedback on the proposed alpha VXQuery API. The goal of this API is to lay the groundwork for getting a working implementation with the least amount of work. I think this definition would be a good starting point.