Large Files Without the Trials Aaron VanDerlip and Sally Kleinfeldt Plone Symposium East 2010 Thursday, June 3, 2010

DESCRIPTION

Sally Kleinfeldt and Aaron VanDerlip describe ore.bigfile, a minimalist solution to the problem of uploading, downloading, and versioning very large files in Plone.

TRANSCRIPT

Page 1: Large Files without the Trials

Large Files Without the Trials

Aaron VanDerlip and Sally Kleinfeldt

Plone Symposium East 2010

Thursday, June 3, 2010

Page 2: Large Files without the Trials

Acknowledgments

• Bioneers provides environmental education and social connectivity through conferences, radio and TV, books, and online materials

• Bioneers engaged Jazkarta to build a Plone-based file asset server to help them organize, capture, and store multimedia and textual content, with files as large as 5 GB.

Page 3: Large Files without the Trials

Acknowledgments

• Aaron VanDerlip - Project Manager

• Kapil Thangavelu - Developer

Page 4: Large Files without the Trials

What is a Big File?

• Anything that makes you wait...

Page 5: Large Files without the Trials

Plone Problems with Big Files

1. Uploading/Downloading

2. Versioning

Page 6: Large Files without the Trials

Uploading Big Files

• Both the user and a Zope thread are waiting for the file transfer

Page 7: Large Files without the Trials

Page 8: Large Files without the Trials

Uploading Big Files

• Browser encodes the file in multipart MIME format

• Zope must undo this encoding

• CPU and memory intensive, and SLOW

• Zope thread is blocked during this process
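To make the encoding step concrete, here is a minimal sketch of a multipart round trip using only Python's standard `email` package. It is not Zope's request parser; the function names and the boundary string are made up for illustration. Note that the whole body is held in memory while it is parsed, which is the kind of cost the slide describes.

```python
from email.parser import BytesParser


def build_multipart(field, filename, data, boundary="sketchboundary"):
    """Wrap raw file bytes the way a browser form upload would (one part only)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail


def parse_multipart(body, boundary="sketchboundary"):
    """Undo the encoding: return (filename, file bytes) of the first part."""
    # The email parser needs the Content-Type header that HTTP carries separately.
    prefix = (
        f"Content-Type: multipart/form-data; boundary={boundary}\r\n\r\n"
    ).encode("ascii")
    message = BytesParser().parsebytes(prefix + body)
    part = message.get_payload()[0]
    return part.get_filename(), part.get_payload(decode=True)
```

The server has to scan the entire body for boundary lines before it can hand the file to the application, so the work grows with the size of the upload.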

Page 9: Large Files without the Trials

Downloading Big Files

• ...the same thing happens in reverse

Page 10: Large Files without the Trials

Learning from Rails

• Get file encoding/decoding and read/write operations out of Plone

• Web servers are really good at this: Apache, Nginx, and Lighttpd

• Our implementation uses Apache

• Apache file streaming is fast and threads are cheap

Page 11: Large Files without the Trials

Learning from Rails

• Uploads: Apache plus mod_porter http://therailsway.com/tags/porter

• Downloads: Apache plus mod_xsendfile http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/

• ...and of course ZODB Blob storage

Page 12: Large Files without the Trials

Mod Porter

• Parses the multipart MIME data

• Writes the file to disk

• Changes the Request to contain a pointer to the temp file on disk

• All done efficiently in C code inside your Apache process
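Because the application now receives a path instead of file contents, it is reasonable to check that the path really points into the upload directory before touching it. The sketch below is an assumption about how such a check could look, not part of Mod Porter or ore.bigfile. How the path arrives in the request is not shown on the slides, so `temp_path` is simply an argument here, and `resolve_upload` is a hypothetical helper name.

```python
import os


def resolve_upload(temp_path, upload_dir):
    """Return the real path of temp_path if it lies inside upload_dir.

    Raises ValueError otherwise. Symlinks and ".." segments are resolved
    first so they cannot be used to escape the upload directory.
    """
    real_dir = os.path.realpath(upload_dir)
    real_path = os.path.realpath(temp_path)
    inside = os.path.commonpath([real_dir, real_path]) == real_dir
    if not inside or real_path == real_dir:
        raise ValueError("path is outside the upload directory: %r" % temp_path)
    return real_path
```

In a deployment like the one on the slides, `upload_dir` would be the directory configured as `PorterDir`.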

Page 13: Large Files without the Trials

Mod Porter

Page 14: Large Files without the Trials

Apache Config for Mod Porter

LoadModule apreq_module /usr/lib/apache2/modules/mod_apreq2.so

LoadModule porter_module /usr/lib/apache2/modules/mod_porter.so

# libapreq2 has a default read limit of 64 MB; set it higher

APREQ2_ReadLimit 2G

...

Porter On

# Files below this size will not be handled by mod_porter

PorterMinSize 14M

# Where the uploaded files are stored

PorterDir /mnt/uploads-Apache

Page 15: Large Files without the Trials

X-Sendfile

• HTTP header

• Set an X-Sendfile header on your response, with the path of the file as its value

• Apache does the rest

Page 16: Large Files without the Trials

Apache Config for X-Sendfile

LoadModule xsendfile_module /usr/lib/apache2/modules/mod_xsendfile.so

...

EnableSendfile On

XSendFile on

# Config to send file resources directly from blob storage

XSendFilePath /mnt/bioneers/var/blobstorage

Page 17: Large Files without the Trials

Using X-Sendfile from Python

def download(self, response, file_path):
    # Apache (mod_xsendfile) sees this header and streams the file itself
    response.setHeader("X-Sendfile", file_path)
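The same idea can be tried outside Zope. The sketch below is a plain WSGI application, not the ore.bigfile code: it returns an empty body and leaves the transfer to the web server. `make_sendfile_app` is a hypothetical name, and the example assumes the path lies under a directory allowed by `XSendFilePath`.

```python
def make_sendfile_app(file_path):
    """Build a WSGI app that asks the front-end server to send file_path."""

    def app(environ, start_response):
        headers = [
            # mod_xsendfile replaces the response body with this file.
            ("X-Sendfile", file_path),
            ("Content-Type", "application/octet-stream"),
        ]
        start_response("200 OK", headers)
        # The application itself sends no file data at all.
        return [b""]

    return app
```

The application thread is free again as soon as the headers are written, regardless of how large the file is.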

Page 18: Large Files without the Trials

Blob Storage

• Uploads

• Blob.consumeFile moves file from Apache’s temp area to blob storage (ZODB/blob.py)

• Uses os.rename, so the file contents never pass through Plone

• Downloads

• Served directly from blob storage
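The upload half of this can be illustrated with a small sketch of the rename step. This is not the ZODB `Blob.consumeFile` implementation, only the underlying idea; `consume_file` and its arguments are hypothetical. It assumes the temp directory and the storage directory are on the same filesystem, since `os.rename` generally fails across filesystems.

```python
import os


def consume_file(temp_path, storage_dir, name):
    """Move an already-uploaded temp file into storage without copying it."""
    os.makedirs(storage_dir, exist_ok=True)
    target = os.path.join(storage_dir, name)
    # On one filesystem a rename only updates directory entries, so the
    # cost does not depend on the size of the file.
    os.rename(temp_path, target)
    return target
```

After the call the temp file no longer exists under its old name, and no file data has been read into the Python process.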

Page 19: Large Files without the Trials

Upload Process

Page 20: Large Files without the Trials

What About Really Really Big Files?

• Use FTP

• Supports continuation and batching

• Handles files too large for browser limits

• Content editors use FTP to transfer files to an upload directory

Page 21: Large Files without the Trials

UI

Page 22: Large Files without the Trials

Uploading with FTP

Page 23: Large Files without the Trials

ore.bigfile

• Minimally intrusive, works with the grain of Plone

• Provides Big File content type

• IFrontendFileServer interface defines two methods that provide web server support for upload and download

• Apache and Nginx implementations provided
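A two-method interface with one implementation per web server might look roughly like the sketch below. The slides do not give the method names or signatures of `IFrontendFileServer`, so everything here is an assumption: the class and method names, the `upload_path` form key, and the `/protected/` prefix are hypothetical, and `abc` stands in for the Zope interface machinery. The `X-Sendfile` and `X-Accel-Redirect` header names are the ones Apache's mod_xsendfile and Nginx actually use.

```python
import os
import posixpath
from abc import ABC, abstractmethod


class FrontendFileServer(ABC):
    """Hypothetical stand-in for a web-server-specific upload/download helper."""

    @abstractmethod
    def uploaded_path(self, form):
        """Return the on-disk path of a file the web server already saved."""

    @abstractmethod
    def download_headers(self, file_path):
        """Return response headers that make the web server send file_path."""


class ApacheFileServer(FrontendFileServer):
    def uploaded_path(self, form):
        return form["upload_path"]  # hypothetical form key

    def download_headers(self, file_path):
        return {"X-Sendfile": file_path}


class NginxFileServer(FrontendFileServer):
    def __init__(self, storage_root, internal_prefix="/protected/"):
        self.storage_root = storage_root
        self.internal_prefix = internal_prefix

    def uploaded_path(self, form):
        return form["upload_path"]  # hypothetical form key

    def download_headers(self, file_path):
        # Nginx expects an internal URI here, not a filesystem path.
        rel = os.path.relpath(file_path, self.storage_root)
        uri = posixpath.join(self.internal_prefix, *rel.split(os.sep))
        return {"X-Accel-Redirect": uri}
```

With this split, the content type only talks to the interface, and switching web servers means swapping one small class.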

Page 24: Large Files without the Trials

ore.bigfile Limitations

• Upload directory is hardcoded

• Errors are possible when Mod Porter intercepts very large image uploads

Page 25: Large Files without the Trials

Versioning Big Files

Page 26: Large Files without the Trials

Solution

• Bypass CMFEditions - no file size limitation

• Create a new version only when file changes (not metadata)

• Allow old versions to be purged

• Version information stored on Big File object using annotations
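The versioning rules above can be sketched in a few lines. This is not the ore.bigfile code: a plain dict stands in for the object's annotations, `VERSIONS_KEY` and the function names are hypothetical, and detecting a change by comparing SHA-256 digests is an assumption, since the slides do not say how a file change is recognized.

```python
import hashlib

VERSIONS_KEY = "bigfile.versions"  # hypothetical annotation key


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so it is never loaded into memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(annotations, path):
    """Append a version entry only if the file differs from the latest one."""
    versions = annotations.setdefault(VERSIONS_KEY, [])
    digest = file_digest(path)
    if versions and versions[-1]["digest"] == digest:
        return False  # same file (e.g. a metadata-only edit): no new version
    versions.append({"digest": digest, "path": path})
    return True


def purge_versions(annotations, keep=1):
    """Drop all but the newest `keep` versions and return the removed entries."""
    versions = annotations.get(VERSIONS_KEY, [])
    removed = versions[:-keep] if keep else versions[:]
    del versions[:len(removed)]
    return removed
```

Hashing a multi-gigabyte file takes time, so a real implementation might prefer to create a version whenever a new file is uploaded rather than comparing contents.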

Page 27: Large Files without the Trials

UI

Page 28: Large Files without the Trials

Conclusion

• ore.bigfile solves the Big File problem for a particular use case; it is not feature complete

• It does so by taking advantage of mature web server technology

• The code is minimally intrusive

• It provides a strategy for implementation we can learn from as we improve Plone’s Big File story
