Chunked File Upload
Status: DRAFT
Created: 20. January 2013
Author: shgupta
JIRA: SLING-2707
References: - http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
Updated: -
Use Cases:
1. Large file upload - With high-speed internet connections, the advent of cloud storage, and HD content going mainstream, Sling supports uploading large files (> 2GB).
2. Fault-tolerant uploads - Sling provides the capability to resume an upload from the point of failure, so the client does not have to restart the complete upload process.
Approach
Sling provides an extension to SlingPostServlet that accepts file chunks in accordance with a specified protocol. The Sling client slices the file into chunks and uploads them serially to the server. Each chunk has an "Offset" attribute that identifies the chunk's position in the complete file. Upon receiving the last chunk, SlingPostServlet stitches all chunks into a single file and saves it to the final destination.
If an upload fails, Sling supports querying how much of the file was uploaded before the failure. The client then resumes the chunk upload from that point.
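The client-side slicing step described above can be sketched as follows. This is only an illustration and not part of Sling: the function name `iter_chunks` and the 4-byte chunk size in the usage line are assumptions chosen for the example.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) pairs that cover the stream in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # end of file reached
        yield offset, data
        # Advance by the bytes actually read, so short reads keep offsets exact.
        offset += len(data)


# Hypothetical usage: a 10-byte "file" cut into chunks of at most 4 bytes.
chunks = list(iter_chunks(io.BytesIO(b"0123456789"), 4))
```

Each yielded offset is the value a client would send as the chunk's "Offset" attribute.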
Content Model
Chunks are stored under the actual target path, on a node that carries the sling:chunkMixin mixin node type. The content model for storing chunks is defined as follows:
Code Block |
---|
title | Content model to store chunk |
---|
|
// node type to store chunk
// offset: offset of chunk in file
// jcr:data: binary of chunk
[sling:chunk] > nt:hierarchyNode
primaryitem jcr:data
- sling:offset (long) mandatory
- jcr:data (binary) mandatory
//-----------------------------------------------------------------------------
// Mixin node type to identify that a node has chunks
// sling:fileLength : length of complete file
// sling:chunksLength : cumulative length of all uploaded chunks
[sling:chunks]
mixin
- sling:fileLength (long)
- sling:chunksLength (long)
+ * (sling:chunkNode) multiple
|
A typical nt:file node during a chunked upload looks like this:
Code Block |
---|
title | Typical nt:file node under chunked upload |
---|
|
/content/dam/folder/catalog.pdf [nt:file]
+ jcr:content [nt:resource] [sling:chunkMixin]
- jcr:data = empty until completed
- sling:fileLength = 982145 // (filename@Length from client)
- sling:chunksLength = 30000 //cumulative length of all uploaded chunks
+ chunk_0-9999 [sling:chunkNode]
- sling:offset = 0
- jcr:data [binary data]
+ chunk_10000-19999 [sling:chunkNode]
+ ....
|
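To make the role of sling:offset in this content model concrete, here is a minimal in-memory sketch of stitching chunks back into one binary. It is not Sling's implementation, which operates on JCR nodes and binaries. The function name `stitch` and the use of `ValueError` are assumptions for the example.

```python
from typing import Iterable, Tuple


def stitch(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Merge (offset, data) chunks into one binary, ordered by offset."""
    merged = bytearray()
    for offset, data in sorted(chunks, key=lambda chunk: chunk[0]):
        # Every chunk must start exactly where the previous one ended.
        if offset != len(merged):
            raise ValueError(
                f"gap or overlap: chunk offset {offset}, expected {len(merged)}")
        merged.extend(data)
    return bytes(merged)
```

In this model, the length of the merged buffer after each step plays the role of sling:chunksLength.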
Protocol Specification
Upload chunk using POST
Chunk upload request
The client sends a POST request to the parent path to upload a binary chunk of the file. The Sling client passes $filename@Offset and $filename@Length as multipart request parameters. "Offset" indicates the chunk's data offset in the complete file. "Length" is an optional parameter that indicates the length of the complete file. If "Length" is supplied, Sling automatically determines whether the request carries the last chunk and, if so, stitches all chunks together and stores the result at the final destination.
[request]
Code Block |
---|
title | First/Intermediate chunk upload request |
---|
|
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
1000
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
400
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary
$binary-data
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe--
|
[response]
The response shows that the chunk is stored under the actual path, on the node that carries the sling:chunkMixin node type.
Code Block |
---|
title | Chunk upload response |
---|
|
HTTP/1.1 200 OK
Connection: Keep-Alive
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 14:42:22 GMT
Transfer-Encoding: chunked
<html>
<head>
<title>Content modified /content/dam/folder</title>
</head>
<body>
<h1>Content modified /content/dam/folder</h1>
<table>
<tbody>
<tr>
<td>Status</td>
<td><div id="Status">200</div></td>
</tr>
<tr>
<td>Message</td>
<td><div id="Message">OK</div></td>
</tr>
<tr>
<td>Location</td>
<td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
</tr>
<tr>
<td>Parent Location</td>
<td><a href="/" id="ParentLocation">/</a></td>
</tr>
<tr>
<td>Path</td>
<td><div id="Path">/content/dam/folder</div></td>
</tr>
<tr>
<td>Referer</td>
<td><a href="" id="Referer"></a></td>
</tr>
<tr>
<td>ChangeLog</td>
<td><div id="ChangeLog"><pre>modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:lastModified");<br/>modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:mimeType");
<br/>created("/content/dam/folder/catalog.pdf/jcr:content/chunk_400_799");<br/>modified("/content/dam/folder/catalog.pdf/jcr:content/chunk_400_799/jcr:data");<br/>
modified("/content/dam/folder/catalog.pdf/jcr:content/chunk_400_799/sling:offset");<br/>modified("/content/dam/folder/catalog.pdf/jcr:content/sling:chunksLength");<br/></pre></div></td>
</tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
|
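The text parameters of a chunk upload request can be assembled as in the sketch below. The helper names `chunk_fields` and `is_last_chunk` are hypothetical. The last-chunk check (offset plus chunk length equals "Length") is an assumed reading of how the final chunk is detected, not a statement about Sling's code. The binary part itself is sent as a separate multipart part named after the file.

```python
from typing import Dict, Optional


def chunk_fields(filename: str, offset: int,
                 total_length: Optional[int] = None) -> Dict[str, str]:
    """Return the text parameters that accompany one chunk's binary part."""
    fields = {f"{filename}@Offset": str(offset)}
    if total_length is not None:
        # "Length" is optional; omit it when the file size is unknown.
        fields[f"{filename}@Length"] = str(total_length)
    return fields


def is_last_chunk(offset: int, chunk_length: int, total_length: int) -> bool:
    """True when this chunk ends exactly at the end of the complete file."""
    return offset + chunk_length == total_length
```

With the values from the request above (offset 400, length 1000), a 400-byte chunk is not the last one; a 200-byte chunk at offset 800 would be.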
Chunk Upload in streaming use case
In the streaming use case, the file's length is not known in advance. The Sling client must set "fileName@Completed" to true to indicate that it has reached the end of the file and that the current chunk request is the last one.
[request]
Code Block |
---|
title | Last chunk upload request |
---|
|
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502
--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Content-Disposition: form-data; name="catalog.pdf@Completed"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
true
--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
400
--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary
$binary_data
--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt--
|
[response]
The response shows that the uploaded chunks are merged at the final destination and that the chunk upload metadata is deleted.
Code Block |
---|
title | Last chunk upload response |
---|
|
HTTP/1.1 200 OK
Connection: Keep-Alive
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 15:52:16 GMT
Transfer-Encoding: chunked
<html>
<head>
<title>Content modified /content/dam/folder</title>
</head>
<body>
<h1>Content modified /content/dam/folder</h1>
<table>
<tbody>
<tr>
<td>Status</td>
<td><div id="Status">200</div></td>
</tr>
<tr>
<td>Message</td>
<td><div id="Message">OK</div></td>
</tr>
<tr>
<td>Location</td>
<td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
</tr>
<tr>
<td>Parent Location</td>
<td><a href="/" id="ParentLocation">/</a></td>
</tr>
<tr>
<td>Path</td>
<td><div id="Path">/content/dam/folder</div></td>
</tr>
<tr>
<td>Referer</td>
<td><a href="" id="Referer"></a></td>
</tr>
<tr>
<td>ChangeLog</td>
<td><div id="ChangeLog"><pre>modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:lastModified");<br/>modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:mimeType");
<br/>modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:data");<br/>deleted("/content/dam/folder/catalog.pdf/jcr:content/chunk_0_199");
<br/>deleted("/content/dam/folder/catalog.pdf/jcr:content/chunk_200_399");<br/>deleted("/content/dam/folder/catalog.pdf/jcr:content/sling:chunksLength");
<br/></pre></div></td>
</tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
|
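For the streaming case, a client has to notice the end of the stream before it sends the final chunk, so that it can attach the "Completed" parameter. One way to do this is to read one chunk ahead, as in the sketch below. The function name `iter_stream_chunks` is an assumption, and the sketch only prepares the parameters; it does not send anything.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_stream_chunks(stream: BinaryIO, filename: str, chunk_size: int
                       ) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (text parameters, data) per chunk; the last one carries @Completed."""
    offset = 0
    current = stream.read(chunk_size)
    while current:
        following = stream.read(chunk_size)  # look ahead to detect end of stream
        fields = {f"{filename}@Offset": str(offset)}
        if not following:
            fields[f"{filename}@Completed"] = "true"
        yield fields, current
        offset += len(current)
        current = following


# Hypothetical usage with a 10-byte stream and 4-byte chunks.
requests = list(iter_stream_chunks(io.BytesIO(b"abcdefghij"), "catalog.pdf", 4))
```

An empty stream yields no requests in this sketch; how a client should handle that case is left open here.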
Query Sling about the interrupted chunk upload status
The client sends a GET request on the file being uploaded to retrieve the chunk upload status.
[request]
Code Block |
---|
title | Query interrupted chunk upload request |
---|
|
GET /content/dam/folder/catalog.pdf.3.json HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502
|
[response]
The sling:chunksLength property indicates the cumulative length of all chunks stored in Sling. The Sling client should resume the upload from the offset given by the value of sling:chunksLength.
Code Block |
---|
title | Query interrupted chunk upload response |
---|
|
{"jcr:createdBy":"admin","jcr:created":"Mon May 06 2013 21:31:22 GMT+0530","jcr:primaryType":"nt:file","jcr:content":{"jcr:lastModifiedBy":"admin",
"jcr:uuid":"845e9cee-f963-4f72-b115-fa021859c809",":jcr:data":0,"jcr:mixinTypes":["sling:chunkMixin"],"sling:chunksLength":200,"jcr:mimeType":"application/pdf",
"jcr:lastModified":"Mon May 06 2013 21:31:22 GMT+0530", "jcr:primaryType":"nt:resource","sling:fileLength":1700,"chunk_0_199":{"jcr:createdBy":"admin",":jcr:data":200,"sling:offset":0,"jcr:created":"Mon May 06 2013 21:31:22 GMT+0530","jcr:primaryType":"sling:chunkNode"}}
|
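A client could derive the resume offset from this status response roughly as follows. The function name `resume_offset` is hypothetical, and returning 0 when sling:chunksLength is absent is an assumption of this sketch, not documented behaviour. The sample string is a trimmed version of the response above.

```python
import json


def resume_offset(status_json: str) -> int:
    """Return the offset to resume from, or 0 if no chunk length is reported."""
    content = json.loads(status_json).get("jcr:content", {})
    return int(content.get("sling:chunksLength", 0))


# Trimmed sample based on the response shown above.
sample = ('{"jcr:primaryType":"nt:file","jcr:content":'
          '{"sling:chunksLength":200,"sling:fileLength":1700}}')
offset = resume_offset(sample)
```

The returned value is what the client would send as the "Offset" of its next chunk.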
Abort chunk upload
To abort a chunk upload, the Sling client passes the ":operation=delete" request parameter along with ":applyToChunks=true".
Code Block |
---|
title | Abort incomplete chunk upload request |
---|
|
POST /content/dam/folder/catalog.pdf HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Content-Length: 403
Content-Type: multipart/form-data; boundary=dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502
--dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL
Content-Disposition: form-data; name=":applyToChunks"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
true
--dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL
Content-Disposition: form-data; name=":operation"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
delete
--dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL--
|
[response]
Sling removes the nt:file node for a new resumable upload, and removes the individual chunks on an existing nt:file node.
Code Block |
---|
title | Abort incomplete chunk upload response |
---|
|
HTTP/1.1 200 OK
Connection: Keep-Alive
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:09:58 GMT
Transfer-Encoding: chunked
<html>
<head>
<title>Content modified /content/dam/folder/catalog.pdf</title>
</head>
<body>
<h1>Content modified /content/dam/folder/catalog.pdf</h1>
<table>
<tbody>
<tr>
<td>Status</td>
<td><div id="Status">200</div></td>
</tr>
<tr>
<td>Message</td>
<td><div id="Message">OK</div></td>
</tr>
<tr>
<td>Location</td>
<td><a href="/content/dam/folder/catalog.pdf" id="Location">/content/dam/folder/catalog.pdf</a></td>
</tr>
<tr>
<td>Parent Location</td>
<td><a href="/content/dam/folder" id="ParentLocation">/content/dam/folder</a></td>
</tr>
<tr>
<td>Path</td>
<td><div id="Path">/content/dam/folder/catalog.pdf</div></td>
</tr>
<tr>
<td>Referer</td>
<td><a href="" id="Referer"></a></td>
</tr>
<tr>
<td>ChangeLog</td>
<td><div id="ChangeLog"><pre>deleted("/content/dam/folder/catalog.pdf");<br/></pre></div></td>
</tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder/catalog.pdf">Modified Resource</a></p>
<p><a href="/content/dam/folder">Parent of Modified Resource</a></p>
</body>
</html>
|
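The abort request can be prepared with the Python standard library as sketched below. Two assumptions are made here: the parameters are sent form-encoded rather than as multipart parts as in the trace above, and authentication is omitted. The function name `abort_request` is hypothetical, and the request object is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def abort_request(base_url: str, file_path: str) -> Request:
    """Build (but do not send) a POST that aborts the chunk upload at file_path."""
    body = urlencode({":operation": "delete", ":applyToChunks": "true"}).encode("ascii")
    return Request(
        base_url + file_path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Host and path taken from the example above.
req = abort_request("http://localhost:4502", "/content/dam/folder/catalog.pdf")
```

Note that the request targets the file path itself, not the parent folder used for chunk uploads.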
Error Scenarios
Start concurrent chunk upload
If the Sling client starts a new upload while a chunk upload is already in progress at the same path, Sling sends a 500 Internal Server Error with the error message "Chunk upload already in progress at {path}".
Code Block |
---|
title | Start concurrent chunk upload |
---|
|
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
1000
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
0
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary
$binary_data
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie--
|
[response]
Code Block |
---|
title | Chunk upload already in progress |
---|
|
HTTP/1.1 500 Internal Server Error
Connection: Close
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:09:58 GMT
Transfer-Encoding: chunked
<html>
<head>
<title>Error while processing /content/dam/folder</title>
</head>
<body>
<h1>Error while processing /content/dam/folder</h1>
<table>
<tbody>
<tr>
<td>Status</td>
<td><div id="Status">500</div></td>
</tr>
<tr>
<td>Message</td>
<td><div id="Message">javax.jcr.RepositoryException: Chunk upload already in progress at {/content/dam/folder/catalog.pdf}</div></td>
</tr>
<tr>
<td>Location</td>
<td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
</tr>
<tr>
<td>Parent Location</td>
<td><a href="/" id="ParentLocation">/</a></td>
</tr>
<tr>
<td>Path</td>
<td><div id="Path">/content/dam/folder</div></td>
</tr>
<tr>
<td>Referer</td>
<td><a href="" id="Referer"></a></td>
</tr>
<tr>
<td>ChangeLog</td>
<td><div id="ChangeLog"><pre></pre></div></td>
</tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
|
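The HTML responses in this document carry the outcome in `<div id="Status">` and `<div id="Message">`. A client could extract both values with the standard-library HTML parser, as in this sketch. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class SlingResponseParser(HTMLParser):
    """Collect the text of <div id="Status"> and <div id="Message">."""

    def __init__(self) -> None:
        super().__init__()
        self.values: Dict[str, str] = {}
        self._current: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if tag == "div" and element_id in ("Status", "Message"):
            self._current = element_id
            self.values[element_id] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.values[self._current] += data

    def handle_endtag(self, tag):
        if tag == "div":
            self._current = None


def parse_post_response(html: str) -> Dict[str, str]:
    """Return the Status and Message values found in the response body."""
    parser = SlingResponseParser()
    parser.feed(html)
    parser.close()
    return parser.values
```

A client can then branch on the "Status" value and log or display the "Message" text.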
Start resumable upload from non-zero offset
If the Sling client starts an upload from a non-zero offset, Sling sends a 500 Internal Server Error with the error message "no chunk upload found at {path}", wrapped in a javax.jcr.RepositoryException.
Code Block |
---|
title | Start chunk upload request from non-zero offset |
---|
|
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
1700
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
200
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary
$binary-data
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62--
|
[response]
Code Block |
---|
title | No chunk upload found |
---|
|
HTTP/1.1 500 Internal Server Error
Connection: Close
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:22:55 GMT
Transfer-Encoding: chunked
<html>
<head>
<title>Error while processing /content/dam/folder</title>
</head>
<body>
<h1>Error while processing /content/dam/folder</h1>
<table>
<tbody>
<tr>
<td>Status</td>
<td><div id="Status">500</div></td>
</tr>
<tr>
<td>Message</td>
<td><div id="Message">javax.jcr.RepositoryException: no chunk upload found at {/content/dam/folder/catalog.pdf}</div></td>
</tr>
<tr>
<td>Location</td>
<td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
</tr>
<tr>
<td>Parent Location</td>
<td><a href="/" id="ParentLocation">/</a></td>
</tr>
<tr>
<td>Path</td>
<td><div id="Path">/content/dam/folder</div></td>
</tr>
<tr>
<td>Referer</td>
<td><a href="" id="Referer"></a></td>
</tr>
<tr>
<td>ChangeLog</td>
<td><div id="ChangeLog"><pre></pre></div></td>
</tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
|
Noncontinuous chunk upload
If the Sling client sends a noncontinuous chunk upload request, Sling sends a 500 Internal Server Error with the error message "Chunk's offset {actual offset} doesn't match expected offset {expected offset}", wrapped in a javax.jcr.RepositoryException.
Code Block |
---|
title | Noncontinuous chunk upload request |
---|
|
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
1000
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
200
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary
$binary_data
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6--
|
[response]
Code Block |
---|
title | Offset mismatch error |
---|
|
HTTP/1.1 500 Internal Server Error
Connection: Close
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:09:58 GMT
Transfer-Encoding: chunked
<html>
<head>
<title>Error while processing /content/dam/folder</title>
</head>
<body>
<h1>Error while processing /content/dam/folder</h1>
<table>
<tbody>
<tr>
<td>Status</td>
<td><div id="Status">500</div></td>
</tr>
<tr>
<td>Message</td>
<td><div id="Message">javax.jcr.RepositoryException: Chunk's offset {200} doesn't match expected offset {600}</div></td>
</tr>
<tr>
<td>Location</td>
<td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
</tr>
<tr>
<td>Parent Location</td>
<td><a href="/" id="ParentLocation">/</a></td>
</tr>
<tr>
<td>Path</td>
<td><div id="Path">/content/dam/folder</div></td>
</tr>
<tr>
<td>Referer</td>
<td><a href="" id="Referer"></a></td>
</tr>
<tr>
<td>ChangeLog</td>
<td><div id="ChangeLog"><pre></pre></div></td>
</tr>
</tbody>
</table>
<p><a href="">Go Back</a></p>
<p><a href="/content/dam/folder">Modified Resource</a></p>
<p><a href="/">Parent of Modified Resource</a></p>
</body>
</html>
|
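The three error scenarios above amount to a small set of offset rules. The sketch below models them in plain Python to make the conditions explicit. It is an interpretation of the described behaviour, not Sling's implementation: the function name `check_offset` is hypothetical, and `ValueError` stands in for javax.jcr.RepositoryException.

```python
from typing import Optional


def check_offset(offset: int, uploaded_length: Optional[int], path: str) -> None:
    """Raise ValueError when a chunk offset breaks the rules described above.

    uploaded_length is the cumulative length already stored (sling:chunksLength),
    or None when no chunk upload exists at path.
    """
    if uploaded_length is None:
        # A new upload must begin at offset 0.
        if offset != 0:
            raise ValueError(f"no chunk upload found at {{{path}}}")
        return
    if offset == 0:
        # Offset 0 on an existing chunk upload means a second, concurrent start.
        raise ValueError(f"Chunk upload already in progress at {{{path}}}")
    if offset != uploaded_length:
        # Chunks must arrive back to back, with no gaps or overlaps.
        raise ValueError(
            f"Chunk's offset {{{offset}}} doesn't match expected offset {{{uploaded_length}}}")
```

A client can apply the same rules before sending a chunk, using the sling:chunksLength value from the status query, and so avoid a round trip that would end in a 500 response.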