
Chunked File Upload

Status: DRAFT
Created: 20. January 2013
Author: shgupta
JIRA: SLING-2707
References: http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
Updated: -

Use Cases:

1. Large file upload: with high-speed internet connections, the advent of the cloud, and HD content going mainstream, Sling supports uploading large files (> 2 GB).
2. Fault-tolerant uploads: Sling can resume an upload from the point of failure, so the client does not have to restart the complete upload.

Approach

Sling provides an extension to SlingPostServlet that accepts file chunks according to a specified protocol. The Sling client slices the file into chunks and uploads them to the server one after another. Each chunk carries an "Offset" attribute that identifies its position in the complete file. When SlingPostServlet receives the last chunk, it stitches all chunks into a single file and saves it at the final destination.
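The client side of this approach can be sketched as follows. This is a minimal Python sketch, not Sling client code: `iter_chunks` and `upload_serially` are hypothetical helpers, the chunk size is an arbitrary client choice, and `send_chunk` stands in for whatever actually issues the POST request. Reading the file incrementally keeps memory use bounded even for files larger than 2 GB.

```python
import io
from typing import BinaryIO, Callable, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs read sequentially from a binary stream.

    The offset is the position of the chunk's first byte in the complete
    file, i.e. the value the client sends as the chunk's "Offset".
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of file
            return
        yield offset, chunk
        offset += len(chunk)  # use the real length in case read() returned fewer bytes


def upload_serially(stream: BinaryIO, chunk_size: int,
                    send_chunk: Callable[[int, bytes], None]) -> None:
    """Send chunks one at a time, in file order (hypothetical client loop)."""
    for offset, chunk in iter_chunks(stream, chunk_size):
        send_chunk(offset, chunk)  # assumed to block until the server accepts the chunk


if __name__ == "__main__":
    sent = []
    upload_serially(io.BytesIO(b"abcdefghij"), 4, lambda o, c: sent.append((o, c)))
    print(sent)  # [(0, b'abcd'), (4, b'efgh'), (8, b'ij')]
```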

If an upload fails, Sling lets the client query the last chunk uploaded before the failure, and the client resumes the upload from that point.

Content Model

Chunks are stored under the actual path, using the sling:chunkMixin mixin node type. The content model for storing chunks is defined as follows:

Content model to store chunk

A typical nt:file node during a chunked upload looks like this:

Typical nt:file node under chunked upload

Protocol Specification

Upload chunk using POST

Chunk upload request

The client sends a POST request to the parent path to upload each binary chunk of the file. The Sling client passes $filename@Offset and $filename@Length as multipart request parameters. "Offset" is the offset of the chunk's data within the complete file. "Length" is an optional parameter giving the length of the complete file. If "Length" is known, Sling automatically determines whether a request carries the last chunk and, if so, stitches all chunks together and stores the result at the final destination.
[request]

First/Intermediate chunk upload request

[response]
The response shows that the chunk is stored under the actual path, using the sling:chunkMixin node type.

Chunk upload response
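A chunk request with the parameters described above could be assembled as follows. This sketch uses only the Python standard library; it assumes the binary part itself is named `$filename`, so that the `@Offset` and `@Length` suffixes attach to it, and all helper names are hypothetical. `is_last_chunk` shows the check Sling can make when `Length` is present.

```python
import uuid
from typing import Dict, Optional, Tuple


def chunk_fields(filename: str, offset: int, total_length: Optional[int]) -> Dict[str, str]:
    """Form fields describing one chunk; $filename@Length is optional."""
    fields = {f"{filename}@Offset": str(offset)}
    if total_length is not None:
        fields[f"{filename}@Length"] = str(total_length)
    return fields


def encode_chunk_request(filename: str, fields: Dict[str, str], chunk: bytes,
                         boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode the fields plus the binary chunk as multipart/form-data.

    Returns (body, content_type). Assumes `filename` contains no quotes or
    line breaks, since it is written into the part headers verbatim. A random
    uuid4 boundary makes a clash with the chunk's bytes vanishingly unlikely.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8"))
    # Assumption: the binary part is named after the file itself.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{filename}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8"))
    parts.append(chunk)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def is_last_chunk(offset: int, chunk_length: int, total_length: int) -> bool:
    """With a known total length, a chunk is last when it ends exactly at the end of the file."""
    return offset + chunk_length == total_length
```

The returned body and content type could then be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`, with `url` pointing at the parent path.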

Chunk upload in the streaming use case

In the streaming use case, the file's length is not known in advance. The Sling client must set "$filename@Completed" to true to indicate that it has reached the end of the file and that the current chunk request is the last one.
[request]

Last chunk upload request
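In the streaming case the client only learns that a chunk is the last one when the stream runs out. One way to handle this, sketched below under the assumption that the source is a readable binary stream, is to read one chunk ahead. `iter_stream_chunks` and `streaming_fields` are hypothetical helpers, and the file name in the demo is an arbitrary example.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_stream_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes, bool]]:
    """Yield (offset, chunk, completed) for a stream of unknown length.

    Reading one chunk ahead reveals whether the current chunk is the last,
    so `completed` is True only for the final chunk. An empty stream yields
    nothing.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    current = stream.read(chunk_size)
    while current:
        following = stream.read(chunk_size)  # lookahead; b"" means end of stream
        yield offset, current, not following
        offset += len(current)
        current = following


def streaming_fields(filename: str, offset: int, completed: bool) -> Dict[str, str]:
    """Form fields for a streamed chunk; $filename@Completed marks the last one."""
    fields = {f"{filename}@Offset": str(offset)}
    if completed:
        fields[f"{filename}@Completed"] = "true"
    return fields


if __name__ == "__main__":
    for offset, chunk, completed in iter_stream_chunks(io.BytesIO(b"abcdefghij"), 4):
        print(streaming_fields("live.ts", offset, completed))
```

The cost of the lookahead is holding at most two chunks in memory at a time.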

[response]
The response shows that the uploaded chunks were merged at the final destination and that the chunk upload metadata was deleted.

Last chunk upload response
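On the server, stitching amounts to concatenating the stored chunks in offset order. The sketch below is an in-memory Python model, not Sling's implementation: a real servlet would stream chunk binaries from the repository rather than hold them in memory, the deletion of chunk metadata is not modeled, and `stitch_chunks` is a hypothetical name. It refuses to merge if the chunks leave a gap or overlap, or if the result does not match an expected total length.

```python
from typing import Iterable, Optional, Tuple


def stitch_chunks(chunks: Iterable[Tuple[int, bytes]],
                  expected_length: Optional[int] = None) -> bytes:
    """Merge (offset, data) chunks into one binary.

    Chunks are processed in offset order and must be contiguous: each one
    has to start exactly where the previous one ended.
    """
    merged = bytearray()
    for offset, data in sorted(chunks, key=lambda chunk: chunk[0]):
        if offset != len(merged):
            raise ValueError(f"chunk offset {offset} doesn't match expected offset {len(merged)}")
        merged += data
    if expected_length is not None and len(merged) != expected_length:
        raise ValueError(f"merged length {len(merged)} != expected length {expected_length}")
    return bytes(merged)
```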

Query Sling about the interrupted chunk upload status

The client sends a GET request to the upload path to retrieve the chunk upload status.
[request]

Query interrupted chunk upload request

[response]
The sling:chunksLength property indicates the cumulative length of all chunks stored in Sling. The Sling client should resume the upload from the offset given by sling:chunksLength.

Query interrupted chunk upload response
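Resuming therefore means reading sling:chunksLength from the status response and continuing from that byte offset. The sketch below assumes the status response is a JSON object with a top-level `sling:chunksLength` property (the actual response body is not reproduced on this page) and treats a missing property as "start from zero"; the helper names are hypothetical.

```python
import io
import json
from typing import BinaryIO, Iterator, Tuple


def resume_offset(status_json: str) -> int:
    """Byte offset to resume from, taken from sling:chunksLength."""
    status = json.loads(status_json)
    # Assumption: no sling:chunksLength means no chunks are stored yet.
    return int(status.get("sling:chunksLength", 0))


def resume_chunks(stream: BinaryIO, status_json: str,
                  chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield the (offset, chunk) pairs that still need to be uploaded."""
    offset = resume_offset(status_json)
    stream.seek(offset)  # skip the bytes the server already has
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield offset, chunk
        offset += len(chunk)


if __name__ == "__main__":
    data = io.BytesIO(b"abcdefghij")
    print(list(resume_chunks(data, '{"sling:chunksLength": 8}', 4)))  # [(8, b'ij')]
```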

Abort chunk upload

To abort a chunk upload, the Sling client passes the ":operation=delete" request parameter along with ":applyToChunks=true".

Abort incomplete chunk upload request
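An abort request carries only the two parameters described above. The sketch below builds, but does not send, such a request with Python's `urllib.request`; the URL in the demo is a hypothetical example, and posting to the file's own path is an assumption, since the page does not show the request.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_abort_request(file_url: str) -> Request:
    """POST :operation=delete with :applyToChunks=true to the file's URL (assumed target)."""
    body = urlencode({":operation": "delete", ":applyToChunks": "true"}).encode("ascii")
    return Request(file_url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


if __name__ == "__main__":
    # Hypothetical Sling instance and path; nothing is sent here.
    req = build_abort_request("http://localhost:8080/content/uploads/movie.mp4")
    print(req.get_method(), req.data)
```

Passing the request to `urllib.request.urlopen` would send it; authentication is left out of the sketch.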

[response]
For a new resumable upload, Sling removes the nt:file node; for a chunk upload on an existing nt:file node, it removes only the individual chunks.

Abort incomplete chunk upload response
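The two abort outcomes can be modeled in memory as below. This is an illustrative Python model, not Sling code: `FileNode`, its `created_by_upload` flag, and `abort_chunk_upload` are hypothetical, and the flag stands in for however Sling distinguishes a node created by the resumable upload from a pre-existing nt:file node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileNode:
    """Hypothetical stand-in for an nt:file node carrying chunk data."""
    path: str
    created_by_upload: bool  # True if the node exists only because of this upload
    chunks: List[bytes] = field(default_factory=list)


def abort_chunk_upload(repository: Dict[str, FileNode], path: str) -> None:
    """Model :operation=delete with :applyToChunks=true on the node at `path`."""
    node = repository[path]
    if node.created_by_upload:
        # New resumable upload: remove the whole nt:file node.
        del repository[path]
    else:
        # Existing nt:file node: remove only the uploaded chunks.
        node.chunks.clear()
```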

Error Scenarios

Start concurrent chunk upload

If the Sling client starts a new upload while a chunk upload is already in progress at the same path, Sling responds with 500 Internal Server Error and the error message "Chunk upload already in progress at {path}".

Start concurrent chunk upload

[response]

Chunk upload already in progress

Start resumable upload from non-zero offset

If the Sling client starts an upload from a non-zero offset, Sling responds with 500 Internal Server Error and the error message "no chunk upload found at {path}", wrapped in a javax.jcr.RepositoryException.

Start chunk upload request from non-zero offset

[response]

No chunk upload found

Noncontinuous chunk upload

If the Sling client sends a noncontinuous chunk upload request, that is, one whose offset is not where the previously stored chunks end, Sling responds with 500 Internal Server Error and the error message "Chunk's offset {actual offset} doesn't match expected offset {expected offset}", wrapped in a javax.jcr.RepositoryException.

Noncontinuous chunk upload request

[response]

Offset mismatch error
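The three error scenarios amount to checking each incoming chunk's offset against the upload already stored at the target path. The Python sketch below models that check in memory. `check_chunk` and the `uploads` mapping (path to sling:chunksLength) are hypothetical, treating offset 0 as "start a new upload" is an assumption, and returning the message instead of raising stands in for Sling's 500 response with the wrapped javax.jcr.RepositoryException.

```python
from typing import Dict, Optional


def check_chunk(uploads: Dict[str, int], path: str, offset: int) -> Optional[str]:
    """Return the error message for this chunk, or None if it is accepted.

    `uploads` maps the path of each in-progress chunk upload to its
    sling:chunksLength, i.e. the number of bytes stored so far.
    """
    in_progress = path in uploads
    if offset == 0:
        # Offset 0 starts a new upload, which must not collide with one in progress.
        if in_progress:
            return f"Chunk upload already in progress at {path}"
        return None
    if not in_progress:
        # A non-zero offset only makes sense when continuing an existing upload.
        return f"no chunk upload found at {path}"
    expected = uploads[path]
    if offset != expected:
        # Chunks must be contiguous: the next one starts where the stored data ends.
        return f"Chunk's offset {offset} doesn't match expected offset {expected}"
    return None
```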

1 Comment

  1. Some comments:

    • I second Julian Reschke's proposal to return 201/Created from the POST requests. The Location header should be set to the URL used to test for chunked upload, except for the final POST, which of course returns the URL to the actual binary uploaded.
    • Request Extension for POST: Yes, we need a request extension to properly assign selectors. I suggest using res instead of html. We also use this extension in the Default GET Servlet to request streaming the result in case we need an extension.
    • Chunk numbers: Using chunk numbering as proposed implies that there is a predefined size for each chunk. The respective specification is missing in this proposal. Alternatively, instead of using chunk numbers you could use size ranges. For example, the discovery request returns the size of the contiguous data already uploaded successfully. The POST requests indicate the file offset (the number of bytes sent is equal to the Content-Length header).
    • If the check request has to have an extension, this should be JSON to reflect the actual data format expected in the response. This URL should (see above) be used as the Location header on the 201/Created responses to the POST requests.