Chunked File Upload
Status: DRAFT
Created: 20. January 2013
Author: shgupta
JIRA: SLING-2707
References: - http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
Updated: -
Use Cases:
1. Large file upload - With high-speed internet connections, the advent of the cloud, and HD content going mainstream, Sling supports uploading large files (> 2GB).
2. Fault-tolerant uploads - Sling can resume an upload from the point of failure, so the client does not need to restart the complete upload process.
Approach
Sling provides an extension to the SlingPostServlet that accepts file chunks according to a specified protocol. The Sling client slices the file into chunks and uploads them to the server serially. Each chunk has an "Offset" attribute that identifies its position in the complete file. Upon receiving the last chunk, the SlingPostServlet stitches all chunks into a single file and saves it to the final destination.
In case of an upload failure, Sling supports querying the last chunk uploaded before the failure point, and the client resumes the chunk upload from there.
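A minimal client-side sketch of the slicing step, assuming the client reads the file through a seekable binary stream. The function name iter_chunks and the chunk size are illustrative, not part of any Sling API. The optional start argument covers the resume case: the client seeks to the offset reported by the server and continues from there.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int,
                start: int = 0) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) for consecutive chunks of a seekable stream.

    offset is the value a client would send as $filename@Offset; start lets
    a client resume from an offset reported by the server (hypothetical helper).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    stream.seek(start)
    offset = start
    while True:
        data = stream.read(chunk_size)
        if not data:  # end of file
            break
        yield offset, data
        # Advance by the bytes actually read, not by chunk_size.
        offset += len(data)


if __name__ == "__main__":
    for off, chunk in iter_chunks(io.BytesIO(b"x" * 1000), 400):
        print(off, len(chunk))  # 0 400, then 400 400, then 800 200
```

Advancing by len(data) keeps offsets correct even if a read returns fewer bytes than requested.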
Content Model
Chunks are stored below the actual file path, as child nodes of a node that carries the sling:chunkMixin mixin node type. The content model for storing chunks is defined as follows:
// node type to store a chunk
// sling:offset: offset of the chunk in the file
// jcr:data: binary data of the chunk
[sling:chunkNode] > nt:hierarchyNode
  primaryitem jcr:data
  - sling:offset (long) mandatory
  - jcr:data (binary) mandatory
//-----------------------------------------------------------------------------
// Mixin node type to identify that a node has chunks
// sling:fileLength: length of the complete file
// sling:chunksLength: cumulative length of all uploaded chunks
[sling:chunkMixin]
  mixin
  - sling:fileLength (long)
  - sling:chunksLength (long)
  + * (sling:chunkNode) multiple
A typical nt:file node during a chunked upload looks like this:
/content/dam/folder/catalog.pdf [nt:file]
  + jcr:content [nt:resource] [sling:chunkMixin]
    - jcr:data = empty until completed
    - sling:fileLength = 982145 // ($filename@Length from client)
    - sling:chunksLength = 30000 // cumulative length of all uploaded chunks
    + chunk_0_9999 [sling:chunkNode]
      - sling:offset = 0
      - jcr:data [binary data]
    + chunk_10000_19999 [sling:chunkNode]
    + ....
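The bookkeeping in this content model can be sketched as below. This is a hypothetical in-memory model (ChunkedResource and chunk_node_name are illustrative names, not Sling classes). It assumes chunk nodes are named after the inclusive byte range they cover, as with chunk_400_799 in the response ChangeLogs on this page, and it derives sling:chunksLength as the sum of the stored chunk sizes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def chunk_node_name(offset: int, size: int) -> str:
    """Name a chunk node after its inclusive byte range (assumed convention)."""
    return f"chunk_{offset}_{offset + size - 1}"


@dataclass
class ChunkedResource:
    """Hypothetical stand-in for a jcr:content node carrying sling:chunkMixin."""
    file_length: Optional[int] = None  # sling:fileLength, if $filename@Length was sent
    # Child chunk nodes: name -> (sling:offset, jcr:data)
    chunks: Dict[str, Tuple[int, bytes]] = field(default_factory=dict)

    @property
    def chunks_length(self) -> int:
        """sling:chunksLength: cumulative length of all stored chunks."""
        return sum(len(data) for _, data in self.chunks.values())

    def add_chunk(self, offset: int, data: bytes) -> str:
        """Store one chunk node and return its name."""
        name = chunk_node_name(offset, len(data))
        self.chunks[name] = (offset, data)
        return name
```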
Protocol Specification
Upload chunk using POST
Chunk upload request
The client uses the POST method on the parent path to upload each binary chunk of the file. The Sling client passes $filename@Offset and $filename@Length as multipart request parameters. "Offset" indicates the chunk's data offset in the complete file. "Length" is optional and indicates the length of the complete file. If the "Length" parameter is provided, Sling automatically determines whether the request carries the last chunk and, if so, stitches all chunks together and stores the result at the final destination.
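The completion rule described above can be expressed as a small predicate. This is a sketch, not the servlet's code: is_last_chunk is a hypothetical name, completed stands for the $filename@Completed flag used in the streaming case, and rejecting a chunk that runs past the declared length is an assumption this spec does not state.

```python
from typing import Optional


def is_last_chunk(offset: int, chunk_size: int,
                  file_length: Optional[int] = None,
                  completed: bool = False) -> bool:
    """Return True if this chunk request should trigger stitching.

    file_length is $filename@Length (optional); completed is the
    $filename@Completed flag, consulted only when the length is unknown.
    """
    if file_length is None:
        return completed
    end = offset + chunk_size
    # Assumption: a chunk that overruns the declared length is an error.
    if end > file_length:
        raise ValueError(f"chunk ends at {end}, past file length {file_length}")
    return end == file_length
```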
[request]
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1000
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

400
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary

$binary-data
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe--
[response]
The response shows that the chunk is stored under the actual file path, as a chunk node below the jcr:content node that carries the sling:chunkMixin node type.
HTTP/1.1 200 OK
Connection: Keep-Alive
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 14:42:22 GMT
Transfer-Encoding: chunked

<html>
<head>
<title>Content modified /content/dam/folder</title>
</head>
<body>
<h1>Content modified /content/dam/folder</h1>
<table>
<tbody>
<tr> <td>Status</td> <td><div id="Status">200</div></td> </tr>
<tr> <td>Message</td> <td><div id="Message">OK</div></td> </tr>
<tr> <td>Location</td> <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td> </tr>
<tr> <td>Parent Location</td> <td><a href="/" id="ParentLocation">/</a></td> </tr>
<tr> <td>Path</td> <td><div id="Path">/content/dam/folder</div></td> </tr>
<tr> <td>Referer</td> <td><a href="" id="Referer"></a></td> </tr>
<tr> <td>ChangeLog</td> <td><div id="ChangeLog"><pre>modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:lastModified");<br/>
modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:mimeType");<br/>
created("/content/dam/folder/catalog.pdf/jcr:content/chunk_400_799");<br/>
modified("/content/dam/folder/catalog.pdf/jcr:content/chunk_400_799/jcr:data");<br/>
modified("/content/dam/folder/catalog.pdf/jcr:content/chunk_400_799/sling:offset");<br/>
modified("/content/dam/folder/catalog.pdf/jcr:content/sling:chunksLength");<br/>
</pre></div></td> </tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
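For clients that do not use Apache HttpClient, the multipart body of a chunk request can be built by hand. The sketch below uses only the Python standard library; build_chunk_request is a hypothetical helper, and the field order (length or completion flag, then offset, then the file part) simply mirrors the request above rather than a documented requirement. The Content-Transfer-Encoding part headers from the example are omitted, since RFC 7578 discourages that header in multipart/form-data.

```python
import uuid
from typing import Dict, List, Optional, Tuple


def build_chunk_request(filename: str, offset: int, data: bytes,
                        file_length: Optional[int] = None,
                        completed: bool = False,
                        content_type: str = "application/octet-stream",
                        boundary: Optional[str] = None
                        ) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for one chunk POST to the parent folder."""
    boundary = boundary or uuid.uuid4().hex
    parts: List[bytes] = []

    def add_part(disposition: str, part_type: str, payload: bytes) -> None:
        head = (f"--{boundary}\r\n"
                f"Content-Disposition: form-data; {disposition}\r\n"
                f"Content-Type: {part_type}\r\n\r\n")
        parts.append(head.encode("utf-8") + payload + b"\r\n")

    text = "text/plain; charset=UTF-8"
    if file_length is not None:  # $filename@Length (optional)
        add_part(f'name="{filename}@Length"', text, str(file_length).encode())
    if completed:  # $filename@Completed, streaming case only
        add_part(f'name="{filename}@Completed"', text, b"true")
    add_part(f'name="{filename}@Offset"', text, str(offset).encode())
    # The chunk bytes themselves, sent as a file part named after the file.
    add_part(f'name="{filename}"; filename="{filename}"', content_type, data)

    body = b"".join(parts) + f"--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return headers, body
```

The returned pair can be sent with urllib.request.Request(url, data=body, headers=headers, method="POST") to the parent folder, for example http://localhost:4502/content/dam/folder, adding the Authorization header from the example.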
Chunk Upload in streaming use case
In the streaming use case, the file's length is not known in advance. Instead, the Sling client must set "$filename@Completed" to true to indicate that it has reached the end of the file and that the current chunk request is the last one.
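When the length is unknown, the client only learns that a chunk is the last one when the next read returns nothing. A minimal sketch, assuming a blocking binary stream (iter_stream_chunks is a hypothetical name): it reads one chunk ahead and marks the final chunk, which is the request that should carry $filename@Completed=true. An empty stream yields no chunks at all, so a client would need a separate rule for zero-byte files; this spec does not cover that case.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_stream_chunks(stream: BinaryIO,
                       chunk_size: int) -> Iterator[Tuple[int, bytes, bool]]:
    """Yield (offset, data, completed) for a stream of unknown length.

    completed is True only for the final chunk, i.e. the request that
    should carry $filename@Completed=true.
    """
    offset = 0
    current = stream.read(chunk_size)
    while current:
        # Read one chunk ahead: an empty read means current is the last chunk.
        following = stream.read(chunk_size)
        yield offset, current, not following
        offset += len(current)
        current = following


if __name__ == "__main__":
    for off, data, done in iter_stream_chunks(io.BytesIO(b"y" * 1000), 400):
        print(off, len(data), done)  # 0 400 False, 400 400 False, 800 200 True
```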
[request]
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Content-Disposition: form-data; name="catalog.pdf@Completed"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

true
--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

400
--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary

$binary_data
--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt--
[response]
The response shows that the uploaded chunks were merged into jcr:data at the final destination and that the chunk upload metadata (the chunk nodes and sling:chunksLength) was deleted.
HTTP/1.1 200 OK
Connection: Keep-Alive
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 15:52:16 GMT
Transfer-Encoding: chunked

<html>
<head>
<title>Content modified /content/dam/folder</title>
</head>
<body>
<h1>Content modified /content/dam/folder</h1>
<table>
<tbody>
<tr> <td>Status</td> <td><div id="Status">200</div></td> </tr>
<tr> <td>Message</td> <td><div id="Message">OK</div></td> </tr>
<tr> <td>Location</td> <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td> </tr>
<tr> <td>Parent Location</td> <td><a href="/" id="ParentLocation">/</a></td> </tr>
<tr> <td>Path</td> <td><div id="Path">/content/dam/folder</div></td> </tr>
<tr> <td>Referer</td> <td><a href="" id="Referer"></a></td> </tr>
<tr> <td>ChangeLog</td> <td><div id="ChangeLog"><pre>modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:lastModified");<br/>
modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:mimeType");<br/>
modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:data");<br/>
deleted("/content/dam/folder/catalog.pdf/jcr:content/chunk_0_199");<br/>
deleted("/content/dam/folder/catalog.pdf/jcr:content/chunk_200_399");<br/>
deleted("/content/dam/folder/catalog.pdf/jcr:content/sling:chunksLength");<br/>
</pre></div></td> </tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
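The stitching step on the server can be sketched as follows. It is an illustration in Python, not the SlingPostServlet implementation: stitch_chunks is a hypothetical name, each stored chunk is modelled as its sling:offset plus its data, and the gap and length checks are assumptions that mirror the offset rules under Error Scenarios.

```python
from typing import Dict, Optional, Tuple


def stitch_chunks(chunks: Dict[str, Tuple[int, bytes]],
                  expected_length: Optional[int] = None) -> bytes:
    """Concatenate stored chunks (name -> (sling:offset, data)) in offset order.

    Raises ValueError on a gap or overlap, or if the result does not match
    expected_length (e.g. sling:fileLength) when one is given.
    """
    ordered = sorted(chunks.values(), key=lambda chunk: chunk[0])
    out = bytearray()
    for offset, data in ordered:
        # Each chunk must start exactly where the previous one ended.
        if offset != len(out):
            raise ValueError(f"chunk at offset {offset}, expected {len(out)}")
        out += data
    if expected_length is not None and len(out) != expected_length:
        raise ValueError(f"stitched {len(out)} bytes, expected {expected_length}")
    return bytes(out)
```

Ordering by sling:offset rather than by node name matters: as strings, chunk_1000_1199 sorts before chunk_200_399.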
Query Sling about the interrupted chunk upload status
The client sends a GET request for the file's path to retrieve the chunk upload status.
[request]
GET /content/dam/folder/catalog.pdf.3.json HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502
[response]
The sling:chunksLength property indicates the cumulative length of all chunks stored in Sling. The Sling client should resume the upload at the offset given by the value of sling:chunksLength.
{
  "jcr:createdBy": "admin",
  "jcr:created": "Mon May 06 2013 21:31:22 GMT+0530",
  "jcr:primaryType": "nt:file",
  "jcr:content": {
    "jcr:lastModifiedBy": "admin",
    "jcr:uuid": "845e9cee-f963-4f72-b115-fa021859c809",
    ":jcr:data": 0,
    "jcr:mixinTypes": ["sling:chunkMixin"],
    "sling:chunksLength": 200,
    "jcr:mimeType": "application/pdf",
    "jcr:lastModified": "Mon May 06 2013 21:31:22 GMT+0530",
    "jcr:primaryType": "nt:resource",
    "sling:fileLength": 1700,
    "chunk_0_199": {
      "jcr:createdBy": "admin",
      ":jcr:data": 200,
      "sling:offset": 0,
      "jcr:created": "Mon May 06 2013 21:31:22 GMT+0530",
      "jcr:primaryType": "sling:chunkNode"
    }
  }
}
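A client can turn this status response into a resume offset. The sketch below assumes the .3.json rendering shown above and treats the absence of the sling:chunkMixin mixin as "no chunk upload in progress"; resume_offset is a hypothetical helper name.

```python
import json
from typing import Optional


def resume_offset(status_json: str) -> Optional[int]:
    """Return the offset for the next chunk, or None if no chunk upload is in progress."""
    node = json.loads(status_json)
    content = node.get("jcr:content", {})
    if "sling:chunkMixin" not in content.get("jcr:mixinTypes", []):
        return None
    # sling:chunksLength is the cumulative length of the stored chunks,
    # which is the offset the next chunk must start at.
    return int(content.get("sling:chunksLength", 0))
```

The returned value is what the client would send as $filename@Offset in its next chunk request.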
Abort chunk upload
To abort a chunk upload, the Sling client passes the ":operation=delete" request parameter along with ":applyToChunks=true".
[request]
POST /content/dam/folder/catalog.pdf HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Content-Length: 403
Content-Type: multipart/form-data; boundary=dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL
Content-Disposition: form-data; name=":applyToChunks"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

true
--dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL
Content-Disposition: form-data; name=":operation"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

delete
--dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL--
[response]
For a new resumable upload, Sling removes the whole nt:file node; for a chunk upload on an existing nt:file node, it removes only the individual chunks.
HTTP/1.1 200 OK
Connection: Keep-Alive
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:09:58 GMT
Transfer-Encoding: chunked

<html>
<head>
<title>Content modified /content/dam/folder/catalog.pdf</title>
</head>
<body>
<h1>Content modified /content/dam/folder/catalog.pdf</h1>
<table>
<tbody>
<tr> <td>Status</td> <td><div id="Status">200</div></td> </tr>
<tr> <td>Message</td> <td><div id="Message">OK</div></td> </tr>
<tr> <td>Location</td> <td><a href="/content/dam/folder/catalog.pdf" id="Location">/content/dam/folder/catalog.pdf</a></td> </tr>
<tr> <td>Parent Location</td> <td><a href="/content/dam/folder" id="ParentLocation">/content/dam/folder</a></td> </tr>
<tr> <td>Path</td> <td><div id="Path">/content/dam/folder/catalog.pdf</div></td> </tr>
<tr> <td>Referer</td> <td><a href="" id="Referer"></a></td> </tr>
<tr> <td>ChangeLog</td> <td><div id="ChangeLog"><pre>deleted("/content/dam/folder/catalog.pdf");<br/>
</pre></div></td> </tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder/catalog.pdf">Modified Resource</a></p>
<p><a href="/content/dam/folder">Parent of Modified Resource</a></p>
</body>
</html>
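The abort request can also be built with the Python standard library. This sketch sends the two parameters URL-encoded instead of as multipart/form-data; that the Sling POST servlet accepts either encoding for this operation is an assumption here. build_abort_request is a hypothetical name, and the request is only constructed, not sent.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_abort_request(base_url: str, file_path: str,
                        user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST that aborts the chunk upload at file_path."""
    body = urlencode({":operation": "delete",
                      ":applyToChunks": "true"}).encode("ascii")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url + file_path,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it; note that urlencode percent-encodes the leading colons as %3A, which the server decodes back to :operation and :applyToChunks.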
Error Scenarios
Start concurrent chunk upload
If the Sling client starts a new upload (offset 0) on a path where a chunk upload is already in progress, Sling responds with 500 Internal Server Error and the error message "Chunk upload already in progress at {path}", wrapped in a javax.jcr.RepositoryException.
[request]
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1000
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

0
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary

$binary_data
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie--
[response]
HTTP/1.1 500 Internal Server Error
Connection: Close
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:09:58 GMT
Transfer-Encoding: chunked

<html>
<head>
<title>Error while processing /content/dam/folder</title>
</head>
<body>
<h1>Error while processing /content/dam/folder</h1>
<table>
<tbody>
<tr> <td>Status</td> <td><div id="Status">500</div></td> </tr>
<tr> <td>Message</td> <td><div id="Message">javax.jcr.RepositoryException: Chunk upload already in progress at {/content/dam/folder/catalog.pdf}</div></td> </tr>
<tr> <td>Location</td> <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td> </tr>
<tr> <td>Parent Location</td> <td><a href="/" id="ParentLocation">/</a></td> </tr>
<tr> <td>Path</td> <td><div id="Path">/content/dam/folder</div></td> </tr>
<tr> <td>Referer</td> <td><a href="" id="Referer"></a></td> </tr>
<tr> <td>ChangeLog</td> <td><div id="ChangeLog"><pre></pre></div></td> </tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
Start resumable upload from non-zero offset
If the Sling client starts an upload at a non-zero offset and no chunk upload is in progress at that path, Sling responds with 500 Internal Server Error and the error message "no chunk upload found at {path}", wrapped in a javax.jcr.RepositoryException.
[request]
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1700
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

200
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary

$binary-data
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62--
[response]
HTTP/1.1 500 Internal Server Error
Connection: Close
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:22:55 GMT
Transfer-Encoding: chunked

<html>
<head>
<title>Error while processing /content/dam/folder</title>
</head>
<body>
<h1>Error while processing /content/dam/folder</h1>
<table>
<tbody>
<tr> <td>Status</td> <td><div id="Status">500</div></td> </tr>
<tr> <td>Message</td> <td><div id="Message">javax.jcr.RepositoryException: no chunk upload found at {/content/dam/folder/catalog.pdf}</div></td> </tr>
<tr> <td>Location</td> <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td> </tr>
<tr> <td>Parent Location</td> <td><a href="/" id="ParentLocation">/</a></td> </tr>
<tr> <td>Path</td> <td><div id="Path">/content/dam/folder</div></td> </tr>
<tr> <td>Referer</td> <td><a href="" id="Referer"></a></td> </tr>
<tr> <td>ChangeLog</td> <td><div id="ChangeLog"><pre></pre></div></td> </tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
Noncontiguous chunk upload
If the Sling client sends a chunk whose offset does not continue the chunks already uploaded (that is, it differs from the stored sling:chunksLength), Sling responds with 500 Internal Server Error and the error message "Chunk's offset {actual offset} doesn't match expected offset {expected offset}", wrapped in a javax.jcr.RepositoryException.
[request]
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1000
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

200
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary

$binary_data
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6--
[response]
HTTP/1.1 500 Internal Server Error
Connection: Close
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:09:58 GMT
Transfer-Encoding: chunked

<html>
<head>
<title>Error while processing /content/dam/folder</title>
</head>
<body>
<h1>Error while processing /content/dam/folder</h1>
<table>
<tbody>
<tr> <td>Status</td> <td><div id="Status">500</div></td> </tr>
<tr> <td>Message</td> <td><div id="Message">javax.jcr.RepositoryException: Chunk's offset {200} doesn't match expected offset {600}</div></td> </tr>
<tr> <td>Location</td> <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td> </tr>
<tr> <td>Parent Location</td> <td><a href="/" id="ParentLocation">/</a></td> </tr>
<tr> <td>Path</td> <td><div id="Path">/content/dam/folder</div></td> </tr>
<tr> <td>Referer</td> <td><a href="" id="Referer"></a></td> </tr>
<tr> <td>ChangeLog</td> <td><div id="ChangeLog"><pre></pre></div></td> </tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
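The three error scenarios amount to one offset check performed before a chunk is stored. The sketch below is illustrative: validate_chunk_offset and ChunkUploadError are hypothetical names (ChunkUploadError stands in for the wrapped javax.jcr.RepositoryException), and chunks_length stands for the stored sling:chunksLength, or None when the node has no chunk upload in progress.

```python
from typing import Optional


class ChunkUploadError(Exception):
    """Hypothetical stand-in for the wrapped javax.jcr.RepositoryException."""


def validate_chunk_offset(path: str, offset: int,
                          chunks_length: Optional[int]) -> None:
    """Raise ChunkUploadError if a chunk at offset may not be stored at path."""
    if chunks_length is None:
        # No upload in progress: only a fresh upload at offset 0 is allowed.
        if offset != 0:
            raise ChunkUploadError(f"no chunk upload found at {{{path}}}")
        return
    if offset == 0:
        # A new upload would clash with the one already in progress.
        raise ChunkUploadError(f"Chunk upload already in progress at {{{path}}}")
    if offset != chunks_length:
        # Chunks must be contiguous: the next one starts at sling:chunksLength.
        raise ChunkUploadError(
            f"Chunk's offset {{{offset}}} doesn't match expected offset {{{chunks_length}}}")
```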
Comments
Felix Meschberger: Some comments:
res instead of html. We also use this extension in the Default GET Servlet to request streaming the result in case we need an extension.