Chunked File Upload

Status: DRAFT
Created: 20. January 2013
Author: shgupta
JIRA: SLING-2707
References: - http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
Updated: -

Use Cases:

1. Large file upload - With high-speed internet connections, the advent of cloud storage, and HD content going mainstream, Sling should support uploading large files (> 2 GB).
2. Fault-tolerant uploads - Sling provides the capability to resume an upload from the point of failure, so the client does not need to restart the complete upload.

Approach

Sling provides an extension to SlingPostServlet that accepts file chunks in accordance with a specified protocol. The Sling client slices the file into chunks and uploads them to the server serially. Each chunk has an "Offset" attribute that identifies the chunk's position in the complete file. Upon receiving the last chunk, SlingPostServlet stitches all chunks into a single file and saves it to the final destination.

If an upload fails, Sling lets the client query how much data was stored before the failure. The client then resumes the chunk upload from that point.
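
The client-side slicing step can be sketched in Python as follows. This is a minimal illustration rather than actual Sling client code: the function name `slice_chunks` and the fixed `chunk_size` are assumptions made for the example.

```python
from typing import Iterator, Tuple


def slice_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs covering data in order.

    offset is the position of the chunk's first byte in the complete file,
    i.e. the value a client would send as the "Offset" attribute.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]


# Example: a 1000-byte file cut into 400-byte chunks.
chunks = list(slice_chunks(b"x" * 1000, 400))
offsets = [offset for offset, _ in chunks]   # [0, 400, 800]
sizes = [len(chunk) for _, chunk in chunks]  # [400, 400, 200]
```

Because the chunks are uploaded serially, each offset equals the total number of bytes sent before it.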

Content Model

Chunks are stored under the actual target path, on a node marked with the sling:chunkMixin mixin node type. The content model for storing chunks is defined as follows:

Code Block
titleContent model to store chunk
// node type to store chunk
// offset: offset of chunk in file
// jcr:data: binary of chunk
[sling:chunk] > nt:hierarchyNode
  primaryitem jcr:data
  - sling:offset  (long) mandatory
  - jcr:data (binary) mandatory
 
//-----------------------------------------------------------------------------
// Mixin node type to identify that a node has chunks
// sling:fileLength : length of complete file
// sling:chunksLength : cumulative length of all uploaded chunks
[sling:chunks]
  mixin
  - sling:fileLength (long)
  - sling:chunksLength (long)
  + * (sling:chunkNode) multiple

A typical nt:file node during a chunked upload looks like this:

Code Block
titleTypical nt:file node under chunked upload
/content/dam/folder/catalog.pdf [nt:file]
                                + jcr:content [nt:resource] [sling:chunkMixin]
                                    - jcr:data = empty until completed
                                    - sling:fileLength = 982145 // (filename@Length from client)
                                    - sling:chunksLength = 30000 // cumulative length of all uploaded chunks
                                    + chunk_0_9999 [sling:chunkNode]
                                            - sling:offset = 0
                                            - jcr:data [binary data]
                                    + chunk_10000_19999 [sling:chunkNode]
                                    + ....
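
The chunk node names appear to encode the inclusive byte range that each chunk covers; the sample responses in this document use names such as chunk_0_199 and chunk_400_799. A small sketch of that naming scheme, inferred from those samples rather than taken from the Sling source (the helper name `chunk_node_name` is hypothetical):

```python
def chunk_node_name(offset: int, length: int) -> str:
    """Name a chunk node after the inclusive byte range it covers.

    Assumption: the name is chunk_<first byte>_<last byte>, as seen in the
    sample change logs; this is not confirmed against the implementation.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"chunk_{offset}_{offset + length - 1}"


first = chunk_node_name(0, 200)     # "chunk_0_199"
second = chunk_node_name(400, 400)  # "chunk_400_799"
```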

Protocol Specification

Upload chunk using POST

Chunk upload request

The client sends a POST request to the parent path to upload a binary chunk of the file. The Sling client passes $filename@Offset and $filename@Length as multipart request parameters. "Offset" indicates the chunk's data offset in the complete file. "Length" is an optional parameter that indicates the length of the complete file. If "Length" is supplied, Sling automatically determines whether the request carries the last chunk and, if so, stitches all chunks together and stores the result at the final destination.
[request]

Code Block
titleFirst/Intermediate chunk upload request
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1000
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

400
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary
$binary-data
--CbZDcL_DxJIVQqSG1WkYaIoLWqT3FGYCVe--
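
A request body like the one above could be assembled as in the following Python sketch. It only builds the multipart payload and sends nothing; the helper name `build_chunk_body`, the generic `application/octet-stream` part type, and the `demo` boundary are assumptions for illustration.

```python
import uuid
from typing import Optional, Tuple


def build_chunk_body(filename: str, chunk: bytes, offset: int,
                     total_length: Optional[int] = None,
                     boundary: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (content_type, body) for one chunk upload request."""
    boundary = boundary or uuid.uuid4().hex
    parts = []

    def text_field(name: str, value: str) -> None:
        # One plain-text form field, e.g. catalog.pdf@Offset = 400.
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "Content-Type: text/plain; charset=UTF-8\r\n\r\n"
            f"{value}\r\n"
        ).encode("utf-8"))

    if total_length is not None:  # "Length" is optional in the protocol
        text_field(f"{filename}@Length", str(total_length))
    text_field(f"{filename}@Offset", str(offset))

    # The binary part carrying the chunk data itself.
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{filename}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts.append(header + chunk + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


content_type, body = build_chunk_body(
    "catalog.pdf", b"\x00" * 400, offset=400, total_length=1000, boundary="demo")
```

The returned `content_type` would be sent as the Content-Type header of the POST to the parent path.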

[response]
The response shows that the chunk is stored under the actual path, on the node carrying the sling:chunkMixin node type.

Code Block
titleChunk upload response
HTTP/1.1 200 OK
Connection: Keep-Alive
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 14:42:22 GMT
Transfer-Encoding: chunked

<html>
<head>
    <title>Content modified /content/dam/folder</title>
</head>
    <body>
    <h1>Content modified /content/dam/folder</h1>
    <table>
        <tbody>
            <tr>
                <td>Status</td>
                <td><div id="Status">200</div></td>
            </tr>
            <tr>
                <td>Message</td>
                <td><div id="Message">OK</div></td>
            </tr>
            <tr>
                <td>Location</td>
                <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
            </tr>
            <tr>
                <td>Parent Location</td>
                <td><a href="/" id="ParentLocation">/</a></td>
            </tr>
            <tr>
                <td>Path</td>
                <td><div id="Path">/content/dam/folder</div></td>
            </tr>
            <tr>
                <td>Referer</td>
                <td><a href="" id="Referer"></a></td>
            </tr>
            <tr>
                <td>ChangeLog</td>
                <td><div id="ChangeLog">&lt;pre&gt;modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:lastModified");&lt;br/&gt;modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:mimeType");
&lt;br/&gt;created("/content/dam/folder/catalog.pdf/jcr:content/chunk_400_799");&lt;br/&gt;modified("/content/dam/folder/catalog.pdf/jcr:content/chunk_400_799/jcr:data");&lt;br/&gt;
modified("/content/dam/folder/catalog.pdf/jcr:content/chunk_400_799/sling:offset");&lt;br/&gt;modified("/content/dam/folder/catalog.pdf/jcr:content/sling:chunksLength");&lt;br/&gt;&lt;/pre&gt;</div></td>
            </tr>
        </tbody>
    </table>
    <p><a href="">Go Back</a></p>
    <p><a href="/content/dam/folder">Modified Resource</a></p>
    <p><a href="/">Parent of Modified Resource</a></p>
    </body>
</html>
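
A client that receives this HTML response can read the outcome from the elements with the ids Status and Message. The following Python sketch uses only the standard library; the class and function names are hypothetical, and it assumes the response keeps the id attributes shown above.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class PostResponseParser(HTMLParser):
    """Collect the text of elements whose id is listed in WANTED."""

    WANTED = ("Status", "Message", "Path")

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}
        self._current: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id in self.WANTED:
            self._current = element_id
            self.fields[element_id] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # The wanted elements contain only text, so any end tag closes them.
        self._current = None


def parse_post_response(html_text: str) -> Dict[str, str]:
    parser = PostResponseParser()
    parser.feed(html_text)
    parser.close()
    return parser.fields


sample = ('<tr><td>Status</td><td><div id="Status">200</div></td></tr>'
          '<tr><td>Message</td><td><div id="Message">OK</div></td></tr>')
fields = parse_post_response(sample)
```

For the sample fragment, `fields` holds the status code and message as strings, which the client can check before sending the next chunk.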

Chunk Upload in streaming use case

In the streaming use case, the file's length is not known in advance. The Sling client must set "$filename@Completed" to true to indicate that it has reached the end of the file and that the current chunk request is the last one.
[request]

Code Block
titleLast chunk upload request
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Content-Disposition: form-data; name="catalog.pdf@Completed"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

true
--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

400
--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary
$binary_data

--lMaKIb2KPscWvPV8B0fULKkKayVtcxugD8Lt--
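
When the total length is unknown, the client has to detect the end of the stream itself before it can set the Completed flag. One way to do this is to read one chunk ahead, as in the following Python sketch. The function name `stream_chunks` is hypothetical, and the sketch assumes a stream whose `read(n)` returns fewer than `n` bytes only at the end of the data.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def stream_chunks(stream: BinaryIO,
                  chunk_size: int) -> Iterator[Tuple[int, bytes, bool]]:
    """Yield (offset, chunk, completed) from a stream of unknown length.

    One chunk is read ahead so that the final chunk can be flagged with
    completed=True, which corresponds to sending $filename@Completed = true.
    """
    offset = 0
    current = stream.read(chunk_size)
    while current:
        following = stream.read(chunk_size)
        yield offset, current, not following
        offset += len(current)
        current = following


result = [(offset, len(chunk), completed)
          for offset, chunk, completed
          in stream_chunks(io.BytesIO(b"a" * 500), 200)]
# result == [(0, 200, False), (200, 200, False), (400, 100, True)]
```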

[response]
The response shows that the uploaded chunks were merged at the final destination and that the chunk upload metadata was deleted.

Code Block
titleLast chunk upload response
HTTP/1.1 200 OK
Connection: Keep-Alive
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 15:52:16 GMT
Transfer-Encoding: chunked

<html>
<head>
    <title>Content modified /content/dam/folder</title>
</head>
    <body>
    <h1>Content modified /content/dam/folder</h1>
    <table>
        <tbody>
            <tr>
                <td>Status</td>
                <td><div id="Status">200</div></td>
            </tr>
            <tr>
                <td>Message</td>
                <td><div id="Message">OK</div></td>
            </tr>
            <tr>
                <td>Location</td>
                <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
            </tr>
            <tr>
                <td>Parent Location</td>
                <td><a href="/" id="ParentLocation">/</a></td>
            </tr>
            <tr>
                <td>Path</td>
                <td><div id="Path">/content/dam/folder</div></td>
            </tr>
            <tr>
                <td>Referer</td>
                <td><a href="" id="Referer"></a></td>
            </tr>
            <tr>
                <td>ChangeLog</td>
                <td><div id="ChangeLog">&lt;pre&gt;modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:lastModified");&lt;br/&gt;modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:mimeType");
&lt;br/&gt;modified("/content/dam/folder/catalog.pdf/jcr:content/jcr:data");&lt;br/&gt;deleted("/content/dam/folder/catalog.pdf/jcr:content/chunk_0_199");
&lt;br/&gt;deleted("/content/dam/folder/catalog.pdf/jcr:content/chunk_200_399");&lt;br/&gt;deleted("/content/dam/folder/catalog.pdf/jcr:content/sling:chunksLength");
&lt;br/&gt;&lt;/pre&gt;</div></td>
            </tr>
        </tbody>
    </table>
    <p><a href="">Go Back</a></p>
    <p><a href="/content/dam/folder">Modified Resource</a></p>
    <p><a href="/">Parent of Modified Resource</a></p>
    </body>
</html>

Query Sling about the interrupted chunk upload status

The client sends a GET request on the upload path to retrieve the chunk upload status.
[request]

Code Block
titleQuery interrupted chunk upload request
GET /content/dam/folder/catalog.pdf.3.json HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

[response]
The sling:chunksLength property indicates the cumulative length of all chunks stored in Sling. The Sling client should resume the upload from the offset given by the value of sling:chunksLength.

Code Block
titleQuery interrupted chunk upload response
{"jcr:createdBy":"admin","jcr:created":"Mon May 06 2013 21:31:22 GMT+0530","jcr:primaryType":"nt:file","jcr:content":{"jcr:lastModifiedBy":"admin",
"jcr:uuid":"845e9cee-f963-4f72-b115-fa021859c809",":jcr:data":0,"jcr:mixinTypes":["sling:chunkMixin"],"sling:chunksLength":200,"jcr:mimeType":"application/pdf",
"jcr:lastModified":"Mon May 06 2013 21:31:22 GMT+0530", "jcr:primaryType":"nt:resource","sling:fileLength":1700,"chunk_0_199":{"jcr:createdBy":"admin",":jcr:data":200,"sling:offset":0,"jcr:created":"Mon May 06 2013 21:31:22 GMT+0530","jcr:primaryType":"sling:chunkNode"}}}

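Extracting the resume offset from the status response might look like the following Python sketch. The function name `resume_offset` is hypothetical, and treating a missing sling:chunksLength property as "start from offset 0" is an assumption, not documented behavior.

```python
import json


def resume_offset(status_json: str) -> int:
    """Return the offset at which the client should resume uploading.

    Reads sling:chunksLength from the jcr:content node. Assumption: if the
    property is absent, no partial upload is recorded and the result is 0.
    """
    status = json.loads(status_json)
    content = status.get("jcr:content", {})
    return int(content.get("sling:chunksLength", 0))


# Trimmed-down version of the status response for illustration.
sample = ('{"jcr:primaryType": "nt:file", "jcr:content": '
          '{"sling:chunksLength": 200, "sling:fileLength": 1700}}')
offset = resume_offset(sample)  # 200
```

The next chunk request would then carry this value as $filename@Offset.
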
Abort chunk upload

To abort a chunk upload, the Sling client passes the ":operation=delete" request parameter along with ":applyToChunks=true".

Code Block
titleAbort incomplete chunk upload request
POST /content/dam/folder/catalog.pdf HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Content-Length: 403
Content-Type: multipart/form-data; boundary=dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL
Content-Disposition: form-data; name=":applyToChunks"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

true
--dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL
Content-Disposition: form-data; name=":operation"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

delete
--dDzF5u2n-HJu5tudkdVpFucFsmqcVV-CONtRqlL--
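
The abort request can also be prepared programmatically. The Python sketch below builds, but does not send, such a request with the standard library. It is an illustration under assumptions: the helper name `build_abort_request` is hypothetical, the parameters are sent URL-encoded rather than as multipart form data as in the capture above, and authentication is omitted.

```python
import urllib.parse
import urllib.request


def build_abort_request(base_url: str, file_path: str) -> urllib.request.Request:
    """Build a POST that asks Sling to delete the chunks at file_path."""
    data = urllib.parse.urlencode({
        ":operation": "delete",
        ":applyToChunks": "true",
    }).encode("ascii")
    return urllib.request.Request(base_url + file_path, data=data, method="POST")


request = build_abort_request(
    "http://localhost:4502", "/content/dam/folder/catalog.pdf")
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out here because it needs a running server and credentials.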

[response]
Sling removes the nt:file node if the upload was a new resumable upload; if the nt:file node already existed, only the individual chunks are removed.

Code Block
titleAbort incomplete chunk upload response
HTTP/1.1 200 OK
Connection: Keep-Alive
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:09:58 GMT
Transfer-Encoding: chunked

<html>
<head>
    <title>Content modified /content/dam/folder/catalog.pdf</title>
</head>
    <body>
    <h1>Content modified /content/dam/folder/catalog.pdf</h1>
    <table>
        <tbody>
            <tr>
                <td>Status</td>
                <td><div id="Status">200</div></td>
            </tr>
            <tr>
                <td>Message</td>
                <td><div id="Message">OK</div></td>
            </tr>
            <tr>
                <td>Location</td>
                <td><a href="/content/dam/folder/catalog.pdf" id="Location">/content/dam/folder/catalog.pdf</a></td>
            </tr>
            <tr>
                <td>Parent Location</td>
                <td><a href="/content/dam/folder" id="ParentLocation">/content/dam/folder</a></td>
            </tr>
            <tr>
                <td>Path</td>
                <td><div id="Path">/content/dam/folder/catalog.pdf</div></td>
            </tr>
            <tr>
                <td>Referer</td>
                <td><a href="" id="Referer"></a></td>
            </tr>
            <tr>
                <td>ChangeLog</td>
                <td><div id="ChangeLog">&lt;pre&gt;deleted("/content/dam/folder/catalog.pdf");&lt;br/&gt;&lt;/pre&gt;</div></td>
            </tr>
        </tbody>
    </table>
    <p><a href="">Go Back</a></p>
    <p><a href="/content/dam/folder/catalog.pdf">Modified Resource</a></p>
    <p><a href="/content/dam/folder">Parent of Modified Resource</a></p>
    </body>
</html>

Error Scenarios

Start concurrent chunk upload

If the Sling client starts a new upload while a chunk upload is already in progress at the same path, Sling responds with a 500 Internal Server Error and the error message "Chunk upload already in progress at {path}".

Code Block
titleStart concurrent chunk upload
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1000
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

0
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary
$binary_data
--WR64qwKjZHY7i8CXduKaVyT6hxsIyBjAie--

[response]

Code Block
titleChunk upload already in progress
HTTP/1.1 500 Internal Server Error
Connection: Close
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:09:58 GMT
Transfer-Encoding: chunked

<html>
<head>
    <title>Error while processing /content/dam/folder</title>
</head>
    <body>
    <h1>Error while processing /content/dam/folder</h1>
    <table>
        <tbody>
            <tr>
                <td>Status</td>
                <td><div id="Status">500</div></td>
            </tr>
            <tr>
                <td>Message</td>
                <td><div id="Message">javax.jcr.RepositoryException: Chunk upload already in progress at {/content/dam/folder/catalog.pdf}</div></td>
            </tr>
            <tr>
                <td>Location</td>
                <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
            </tr>
            <tr>
                <td>Parent Location</td>
                <td><a href="/" id="ParentLocation">/</a></td>
            </tr>
            <tr>
                <td>Path</td>
                <td><div id="Path">/content/dam/folder</div></td>
            </tr>
            <tr>
                <td>Referer</td>
                <td><a href="" id="Referer"></a></td>
            </tr>
            <tr>
                <td>ChangeLog</td>
                <td><div id="ChangeLog">&lt;pre&gt;&lt;/pre&gt;</div></td>
            </tr>
        </tbody>
    </table>
    <p><a href="">Go Back</a></p>
    <p><a href="/content/dam/folder">Modified Resource</a></p>
    <p><a href="/">Parent of Modified Resource</a></p>
    </body>
</html>

Start resumable upload from non-zero offset

If the Sling client starts an upload from a non-zero offset, Sling responds with a 500 Internal Server Error and the error message "no chunk upload found at {path}", wrapped in a javax.jcr.RepositoryException.

Code Block
titleStart chunk upload request from non-zero offset
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1700
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

200
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary

$binary-data
--4SC3O7Wgs4nrN8yqNaH1TNfQRxPK62--

[response]

Code Block
titleNo chunk upload found
HTTP/1.1 500 Internal Server Error
Connection: Close
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:22:55 GMT
Transfer-Encoding: chunked

<html>
<head>
    <title>Error while processing /content/dam/folder</title>
</head>
    <body>
    <h1>Error while processing /content/dam/folder</h1>
    <table>
        <tbody>
            <tr>
                <td>Status</td>
                <td><div id="Status">500</div></td>
            </tr>
            <tr>
                <td>Message</td>
                <td><div id="Message">javax.jcr.RepositoryException: no chunk upload found at {/content/dam/folder/catalog.pdf}</div></td>
            </tr>
            <tr>
                <td>Location</td>
                <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
            </tr>
            <tr>
                <td>Parent Location</td>
                <td><a href="/" id="ParentLocation">/</a></td>
            </tr>
            <tr>
                <td>Path</td>
                <td><div id="Path">/content/dam/folder</div></td>
            </tr>
            <tr>
                <td>Referer</td>
                <td><a href="" id="Referer"></a></td>
            </tr>
            <tr>
                <td>ChangeLog</td>
                <td><div id="ChangeLog">&lt;pre&gt;&lt;/pre&gt;</div></td>
            </tr>
        </tbody>
    </table>
    <p><a href="">Go Back</a></p>
    <p><a href="/content/dam/folder">Modified Resource</a></p>
    <p><a href="/">Parent of Modified Resource</a></p>
    </body>
</html>

Upload noncontinuous chunk

If the Sling client sends a noncontinuous chunk upload request, Sling responds with a 500 Internal Server Error and the error message "Chunk's offset {actual offset} doesn't match expected offset {expected offset}", wrapped in a javax.jcr.RepositoryException.

Code Block
titleNoncontinuous chunk upload request
POST /content/dam/folder HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
Transfer-Encoding: chunked
Content-Type: multipart/form-data; boundary=i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.1 (java 1.5)
Host: localhost:4502

--i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Content-Disposition: form-data; name="catalog.pdf@Length"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1000
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Content-Disposition: form-data; name="catalog.pdf@Offset"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

200
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6
Content-Disposition: form-data; name="catalog.pdf"; filename="catalog.pdf"
Content-Type: application/pdf
Content-Transfer-Encoding: binary

$binary_data
--i3nkScb8nmEmcC87H-LOXKXPO5cutm6--

[response]

Code Block
titleOffset mismatch error
HTTP/1.1 500 Internal Server Error
Connection: Close
Server: Day-Servlet-Engine/4.1.42
Content-Type: text/html;charset=UTF-8
Date: Mon, 06 May 2013 16:09:58 GMT
Transfer-Encoding: chunked

<html>
<head>
    <title>Error while processing /content/dam/folder</title>
</head>
    <body>
    <h1>Error while processing /content/dam/folder</h1>
    <table>
        <tbody>
            <tr>
                <td>Status</td>
                <td><div id="Status">500</div></td>
            </tr>
            <tr>
                <td>Message</td>
                <td><div id="Message">javax.jcr.RepositoryException: Chunk's offset {200} doesn't match expected offset {600}</div></td>
            </tr>
            <tr>
                <td>Location</td>
                <td><a href="/content/dam/folder" id="Location">/content/dam/folder</a></td>
            </tr>
            <tr>
                <td>Parent Location</td>
                <td><a href="/" id="ParentLocation">/</a></td>
            </tr>
            <tr>
                <td>Path</td>
                <td><div id="Path">/content/dam/folder</div></td>
            </tr>
            <tr>
                <td>Referer</td>
                <td><a href="" id="Referer"></a></td>
            </tr>
            <tr>
                <td>ChangeLog</td>
                <td><div id="ChangeLog">&lt;pre&gt;&lt;/pre&gt;</div></td>
            </tr>
        </tbody>
    </table>
    <p><a href="">Go Back</a></p>
    <p><a href="/content/dam/folder">Modified Resource</a></p>
    <p><a href="/">Parent of Modified Resource</a></p>
    </body>
</html>
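
The three error scenarios above can be read as one offset check performed before a chunk is accepted. The Python sketch below restates that reading; it is not the SlingPostServlet implementation. The names `ChunkUploadError` and `check_offset` are hypothetical, and the error messages are copied from the sample responses.

```python
from typing import Optional


class ChunkUploadError(Exception):
    """Stand-in for the javax.jcr.RepositoryException in the responses."""


def check_offset(path: str, offset: int, chunks_length: Optional[int]) -> None:
    """Raise ChunkUploadError if a chunk with this offset must be rejected.

    chunks_length is the stored sling:chunksLength, or None when no chunk
    upload is in progress at path.
    """
    if chunks_length is None:
        # Nothing stored yet: only a first chunk (offset 0) can start an upload.
        if offset != 0:
            raise ChunkUploadError(f"no chunk upload found at {{{path}}}")
        return
    if offset == 0:
        # A first chunk while an upload is in progress is a concurrent start.
        raise ChunkUploadError(f"Chunk upload already in progress at {{{path}}}")
    if offset != chunks_length:
        # Chunks must arrive in order, without gaps or overlaps.
        raise ChunkUploadError(
            f"Chunk's offset {{{offset}}} doesn't match "
            f"expected offset {{{chunks_length}}}")


try:
    check_offset("/content/dam/folder/catalog.pdf", 200, 600)
except ChunkUploadError as error:
    message = str(error)
```

A client can apply the same check locally before sending a chunk to avoid a round trip that would end in a 500 response.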