I'm Andrew Hoefling, and I work for FileOnQ as a Lead Software Engineer building mobile technologies for the Government, Financial, and First Responder sectors using Xamarin.

 

Uploading Large Files to Azure Blob Storage in C#


Azure Blob Storage is a great tool for storing any type of file for easy access in your app. The APIs make it easy to upload and download files of any type and integrate with many popular languages and frameworks. If the SDK isn't supported, you can always fall back to the RESTful endpoints (which I wouldn't recommend unless you absolutely have to).

The SDK is great for small files, but there are extra steps for handling large files, both on the client side and in how you use the SDK. Let's go over the basics and how to extend them to support large files.

.NET Library (v11)

UPDATE - 10/16/2020

It was pointed out to me that this post uses v11 and not v12. When I originally wrote this blog post I was using the NuGet package Microsoft.Azure.Storage.Blob, which has since been deprecated and moved to Azure.Storage.Blobs. The latest and greatest version is v12, which is available in the NuGet package referenced below. This article focuses on v11, and the implementation details may differ in v12.

This article will be using the v11 library to upload files to Azure Blob Storage. It can be confusing to find the correct documentation and NuGet reference, so we are going to use the links below:

Assumptions

This article makes a couple of assumptions, as it focuses on a particular problem with uploading large files to Azure Blob Storage:

  • Azure Blob Storage Exists
  • Connection String
  • Stream of the file to upload
    • byte[]
    • File Stream
    • Buffered Stream
    • etc.

If you need help getting started, look at the documentation link above, as it is the quick start guide. Once you have the basic upload/download working, come back here to follow along for large file uploads.

Basic Upload

Let's define a basic upload class to help with our implementation. This class will encapsulate our connection to Azure Blob Storage, using the connection string generated by your storage resource. Let's start by creating a constructor that takes the connection string and container name and builds a proper instance of CloudBlobContainer.

using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;

public class UploadManager
{
    CloudBlobContainer _container;

    public UploadManager(string connectionString, string containerName)
    {
        // parse the connection string and get a reference to the target container
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        _container = account.CreateCloudBlobClient().GetContainerReference(containerName);
    }
}

Add a new method that will upload any Stream object. This method leverages the upload API from the SDK and attempts to upload the file regardless of size. We will update it in just a bit to support large files.

public class UploadManager
{
    CloudBlobContainer _container;

    public UploadManager(string connectionString, string containerName)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        _container = account.CreateCloudBlobClient().GetContainerReference(containerName);
    }

    public Task UploadStreamAsync(Stream stream, string name)
    {
        // get a reference to the block blob and hand the whole stream to the SDK
        CloudBlockBlob blob = _container.GetBlockBlobReference(name);
        return blob.UploadFromStreamAsync(stream);
    }
}
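
Calling the basic version is straightforward. Here is a quick sketch of what a consumer might look like; the connection string, container name, file path, and blob name are placeholder values for illustration, not something defined in this article:

public static async Task UploadSmallFileExampleAsync()
{
    // placeholder values for illustration only
    string connectionString = "<your-storage-connection-string>";
    string containerName = "uploads";

    var manager = new UploadManager(connectionString, containerName);

    // open a local file and hand the stream to the upload helper
    using (FileStream stream = File.OpenRead(@"C:\temp\report.pdf"))
    {
        await manager.UploadStreamAsync(stream, "report.pdf");
    }
}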

This code will work, but once your file reaches a certain size you won't be able to upload it using this API. At that point you will need to make some changes to get it to work.

Large File Upload

Once you start working with large files, a simple upload will not work. The way you solve this problem is by splitting the file into smaller chunks. If you are using a Stream like we are, the code just reads a set number of bytes at a time and processes them until the whole stream has been consumed.

Logic

  1. Determine the size of the file chunk you want to process at a time
  2. Read that many bytes from the stream into a buffer
  3. Create a block ID for the chunk (see the sketch after this list)
  4. Upload the buffer to Azure Blob Storage
  5. Repeat until the whole stream has been read
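
Step 3 is worth a closer look before diving into the full implementation. The Put Block API expects each block ID to be Base64 encoded, and all block IDs used within a given blob must be the same length, which is why the code below zero-pads the block number before encoding (the seven-digit width is just the convention this article uses):

// zero-pad the block number so every block ID has the same length,
// then Base64 encode it, which is the format the Put Block API expects
int blockNumber = 42;
string blockId = $"{blockNumber:0000000}";                                   // "0000042"
string base64BlockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(blockId));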

Modify Upload Method Signature

Let's update our upload method signature to include a size, which allows the consumer to define how many bytes to read on each iteration. Using a default value for size will prevent any breaking changes as we update the signature.

public Task UploadStreamAsync(Stream stream, string name, int size = 8000000)
{
    // Code omitted
}

I chose a chunk size of 8,000,000 bytes, which is an arbitrary number I made up. I don't have a specific recommendation for chunk size, but you can calculate an optimal value for yourself.
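
If you would rather calculate a value than pick one, the main constraint to keep in mind is that a block blob can contain at most 50,000 committed blocks, so the chunk size has to be at least the file size divided by 50,000. A rough sketch, using an example file size that is not from this article:

// a block blob can hold at most 50,000 committed blocks, so the chunk size
// must be large enough that the entire file fits within that many blocks
const int maxBlocks = 50000;
long fileSize = 20L * 1024 * 1024 * 1024;                    // example: a 20 GB file

// smallest chunk size that still fits the file into 50,000 blocks (ceiling division)
long minChunkSize = (fileSize + maxBlocks - 1) / maxBlocks;

// use the larger of the article's 8,000,000 byte default and the calculated minimum
long chunkSize = Math.Max(8000000, minChunkSize);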

Implement File Chunking

Our solution uses a technique called file chunking, which breaks the large file into smaller chunks for each upload. Azure Blob Storage understands how to stitch everything back together using unique block IDs. We will need to manage the following items:

  • Block IDs - for tracking each upload
  • Buffer - a byte[] of the current data to upload

public class UploadManager
{
    CloudBlobContainer _container;

    public UploadManager(string connectionString, string containerName)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        _container = account.CreateCloudBlobClient().GetContainerReference(containerName);
    }

    public async Task UploadStreamAsync(Stream stream, string name, int size = 8000000)
    {
        CloudBlockBlob blob = _container.GetBlockBlobReference(name);

        // local variable to track the current number of bytes read into the buffer
        int bytesRead;

        // track the current block number as the code iterates through the file
        int blockNumber = 0;

        // create a list to track block IDs, it will be needed after the loop
        List<string> blockList = new List<string>();

        do {
            // increment the block number by 1 each iteration
            blockNumber++;

            // set the block ID as a zero-padded string and convert it to Base64, which is the required format
            string blockId = $"{blockNumber:0000000}";
            string base64BlockId = Convert.ToBase64String(Encoding.UTF8.GetBytes(blockId));

            // create the buffer and retrieve the next chunk
            byte[] buffer = new byte[size];
            bytesRead = await stream.ReadAsync(buffer, 0, size);

            // if the previous chunk ended exactly at the end of the stream, there is nothing left to upload
            if (bytesRead == 0)
            {
                break;
            }

            // upload the buffered chunk to Azure
            await blob.PutBlockAsync(base64BlockId, new MemoryStream(buffer, 0, bytesRead), null);

            // add the current block ID to our list
            blockList.Add(base64BlockId);

            // while bytesRead == size there is more data left to read and process
        } while (bytesRead == size);

        // commit the block list to Azure, which tells the service how to stitch the chunks together
        await blob.PutBlockListAsync(blockList);

        // make sure to dispose the stream once you are done
        stream.Dispose();
    }
}
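
To tie it together, a consumer of the chunked upload might look like the sketch below; the connection string, container name, file path, blob name, and chunk size are placeholders rather than values from this article. Note that UploadStreamAsync disposes the stream itself, so the caller does not wrap it in a using block here:

public static async Task UploadLargeFileExampleAsync()
{
    // placeholder values for illustration only
    string connectionString = "<your-storage-connection-string>";
    string containerName = "uploads";

    var manager = new UploadManager(connectionString, containerName);

    // open the large file; UploadStreamAsync reads it in 8,000,000 byte chunks
    // and disposes the stream when the upload completes
    FileStream stream = File.OpenRead(@"C:\temp\large-video.mp4");
    await manager.UploadStreamAsync(stream, "large-video.mp4", size: 8000000);
}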

 

Conclusion

This should get you started on uploading large files to Azure Blob Storage. The chunking solution breaks everything down into small pieces that are easy to upload. It requires a few additional steps and a little more code to maintain, but with that extra work you can now upload very large files to your resource without issue.

 

-Happy Coding


Tags

.NET, C#, Azure, Azure Blob Storage, Stream, Buffer