Ajax File Uploads with JavaScript's File API

Developers have been using Ajax techniques for years to create dynamic web forms, but handling file uploads using Ajax was always problematic. The crux of the problem was security – it's not a good idea to allow arbitrary code to access any file it wants on a user's system, so JavaScript was intentionally restricted in how it could interact with things like file input elements. Uploading a file with JavaScript was essentially a standard form submission that targeted a hidden iframe. It felt dirty but it got the job done.

The W3C began work on standardizing a File API for JavaScript sometime between 2006 and 2009 and we're now at the point with browser support where developers can take advantage of it. Developers supporting web apps on IE8 and 9 still need to use iframes, but those of us targeting newer browsers can finally take a pure JavaScript approach to file uploads. And as more users migrate from IE8/9, the iframe approach will eventually be left in the dustbin.

The interesting things defined by the W3C's File API are:

  • Blob – an object that represents a sequence of bytes and can be consumed by FileReader. Its size property gives the length of the sequence in bytes, and its type property is a lower-case MIME-type string if such information is available (otherwise an empty string).
  • File – an object that extends Blob and offers additional properties to make the file's metadata available. Its name property holds the filename (no path information) and lastModifiedDate holds a Date object instance set to when the file was last modified (later revisions of the specification replace it with lastModified, a timestamp in milliseconds).
  • FileReader – an object that reads the byte sequence of a Blob or File object.
  • FileList – an array-like list of File objects. A file input element exposes one through its files property.
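
As a rough illustration of the properties listed above, the sketch below collects the name, size, and type of every File in a FileList. The helper names describeFile and describeFileList are made up for this example; only the name, size, type, and length properties come from the API itself.

```javascript
// Hypothetical helper: summarize the metadata exposed by a File object.
function describeFile(file) {
    return {
        name: file.name,                // filename only, no path information
        size: file.size,                // length of the byte sequence
        type: file.type || "(unknown)"  // type is "" when the MIME type is unknown
    };
}

// Hypothetical helper: a FileList is array-like (indexed entries plus a
// length property), so a plain index loop is enough to walk it.
function describeFileList(fileList) {
    var summaries = [];
    for (var i = 0; i < fileList.length; i++) {
        summaries.push(describeFile(fileList[i]));
    }
    return summaries;
}
```

In a browser this could be called as describeFileList(document.getElementById("fileInput").files), assuming an input with that id exists on the page.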

The API is designed so that byte sequences are loaded asynchronously by default. This makes sense since there are several things that can cause the read process to take a while to complete: it might be a large file, the file might be on a mounted network share, etc. Reading files asynchronously ensures the main execution thread is free and the browser doesn't lock up.

So what does a basic upload look like using the API? At a high level, the steps are:

  1. Provide a file input for the user.
  2. When the user sets a file, retrieve its File object from the input's files property.
  3. Create a FileReader instance and register a callback for its onload event. This callback will have access to the read data.
  4. Initiate the read process with the FileReader methods readAsText() or readAsDataURL().

I like to use readAsDataURL() to initiate the read process, especially for binary files like images and PDFs, since the data will be base64 encoded. The ASCII URI string can then be safely sent to the server just like any other string.

I also recommend using POST for the HTTP method. Yes, the encoded contents are a data URI, which can be sent as a GET parameter, but doing so increases the risk of getting an HTTP 414 (URI Too Long) error because of the resulting size of the request. Base64 encodes binary content as safe ASCII, which increases the data's size by roughly 33% (to about 133% of the original).
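
To put numbers on that overhead: base64 emits four characters for every three bytes of input and pads the final group. The sketch below estimates the encoded size, assuming the data URI uses the usual "data:<type>;base64," prefix. The function names are invented for this example, and URL-encoding the string for a request body can add further bytes on top.

```javascript
// Number of base64 characters produced for byteLength bytes of input:
// every 3 bytes become 4 characters, with the last group padded by "=".
function base64Length(byteLength) {
    return 4 * Math.ceil(byteLength / 3);
}

// Estimated length of a base64 data URI for a file of the given size,
// assuming the prefix has the form "data:<mimeType>;base64,".
function dataUriLength(byteLength, mimeType) {
    return "data:".length + mimeType.length + ";base64,".length +
        base64Length(byteLength);
}
```

For example, base64Length(3000000) gives 4000000, so a 3 MB image becomes roughly 4 MB of ASCII before any request encoding is applied.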

<form>
 <input id="fileInput" type="file" />
</form>

<script>
document.getElementById("fileInput").onchange = function () {
    // retrieve File from input (undefined if the selection was cleared)
    var file = this.files[0];
    if (!file) {
        return;
    }

    // set FileReader's onload event
    var reader = new FileReader();
    reader.onload = function () {
        // the result of the read is available through the FileReader's
        // result property when the callback is executed
        var fileContent = this.result;

        // send fileContent to server via Ajax request
        // ...
    };
    // initiate reading
    reader.readAsDataURL(file);
};
</script>
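
The Ajax request itself is left open in the example above. One possible way to send the data URI is sketched below, under some assumptions: the endpoint /upload.php and the field name dataUri are placeholders rather than anything prescribed by the File API, and the server is assumed to read a standard URL-encoded POST body.

```javascript
// Hypothetical endpoint and field name; adjust both to match your server.
var UPLOAD_URL = "/upload.php";
var FIELD_NAME = "dataUri";

// Build an application/x-www-form-urlencoded body. encodeURIComponent()
// escapes "+", "/" and "=", so the base64 text is not altered when the
// server decodes the form (an unescaped "+" would be read as a space).
function buildPostBody(fieldName, dataUri) {
    return encodeURIComponent(fieldName) + "=" + encodeURIComponent(dataUri);
}

// POST the data URI; callback receives (error, responseText).
function sendDataUri(dataUri, callback) {
    var xhr = new XMLHttpRequest();
    xhr.open("POST", UPLOAD_URL);
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    xhr.onload = function () {
        if (xhr.status >= 200 && xhr.status < 300) {
            callback(null, xhr.responseText);
        } else {
            callback(new Error("Upload failed with HTTP " + xhr.status));
        }
    };
    xhr.onerror = function () {
        callback(new Error("Network error during upload"));
    };
    xhr.send(buildPostBody(FIELD_NAME, dataUri));
}
```

Inside the FileReader's onload callback this would be invoked as sendDataUri(fileContent, function (err, response) { ... }).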

Handling the upload once it reaches the server is different than working with traditional file uploads in PHP since the file comes into the system as “normal” user input. That is, you won't be using the $_FILES superglobal or functions like move_uploaded_file(). Instead the content will be available straight from $_POST.

The data URI format is defined by RFC 2397 and looks like the following:

data:[<mediatype>][;base64],<data>

You're free to use existing libraries to parse the URI or to parse it yourself. The media type is optional. If present, the value is a MIME type string. If it's missing, the default value text/plain;charset=US-ASCII should be assumed. If ;base64 is present then the data is base64 encoded.

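<?php
// $dataUri holds the posted data URI, read from the relevant $_POST field

// parse out file data, rejecting input without a comma separator
$parts = explode(',', $dataUri, 2);
if (count($parts) !== 2) {
    header('HTTP/1.0 400 Bad Request');
    exit;
}
list($front, $data) = $parts;
if (stristr($front, ';base64') !== false) {
    $data = base64_decode($data);
}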

// test whether the file is a valid image
try {
    $image = new \Imagick();
    $image->readImageBlob($data);
}
catch (\ImagickException $e) {
    header('HTTP/1.0 400 Bad Request');
    exit;
}

// do something with $image
// ...
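
The same parsing rules can also be sketched in JavaScript, which may be handy for inspecting a data URI on the client before it is sent. This is a minimal illustration of the RFC 2397 layout described above, not a full validator, and the function name parseDataUri is invented for the example.

```javascript
// Hypothetical helper: split a data URI into its media type, a flag saying
// whether the payload is base64 encoded, and the (still encoded) payload.
function parseDataUri(uri) {
    if (uri.slice(0, 5).toLowerCase() !== "data:") {
        throw new Error("Not a data URI");
    }
    var comma = uri.indexOf(",");
    if (comma === -1) {
        throw new Error("Data URI has no comma separator");
    }
    // everything between "data:" and the first comma is the header
    var header = uri.slice(5, comma);
    var isBase64 = /;base64$/i.test(header);
    // strip the 7-character ";base64" suffix to leave the media type
    var mediaType = isBase64 ? header.slice(0, -7) : header;
    return {
        // RFC 2397 default when the media type is omitted
        mediaType: mediaType || "text/plain;charset=US-ASCII",
        isBase64: isBase64,
        data: uri.slice(comma + 1)
    };
}
```

Only the first comma is treated as the separator, mirroring the limit of 2 passed to explode() in the PHP version.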

Posting a file as a data URI protects you from some of the security vulnerabilities that are typically inherent when dealing with files. Data URIs don't account for filenames, for instance, so you're safe from directory traversal attacks by maliciously named files. Still, you should treat the URI as you would any other piece of user-supplied data. Your application will obviously dictate how you filter and validate the file.

A secondary concern is the possibility of a malicious person using large file posts as a vector for a denial of service attack. The traditional upload approaches must mitigate this risk, and an Ajax approach must do so as well. Make certain you review the memory_limit and post_max_size entries in your php.ini, and keep in mind the tradeoff between size and ASCII-safety when using base64 encoding.

This isn't the first post on the Internet to deal with Ajax file uploads or JavaScript's File API, but many of them provide little beyond code samples. Hopefully I've remedied the situation by providing a succinct overview of the API's important objects/interfaces and discussing how receiving the file is different using this approach. If there's something I've neglected, feel free to leave a comment!

Comments

  1. This is interesting... Curious though if you're using HTML5's multiple attribute, is there much to change within the FileReader portion of the Javascript?

  2. The files property is an array-like FileList regardless of the number of files selected (notice I index it with [0] even though I'm not using the multiple attribute); you can reuse the FileReader to process additional files however you please.
