Azure Blob Storage Part 4: Uploading Large Blobs – Simple Talk

Posted: February 28, 2022 at 8:05 pm

In the previous article in this series, I showed you how to use the Storage Client Library to do many of the operations needed to manage files in blob storage, such as upload, download, copy, delete, list, and rename. The CloudBlockBlob.UploadFile method works fine, but it can be tuned for special cases such as very slow internet access.

When I worked for a startup, one of the things our desktop product did was upload a bunch of images and an MP3 file to Azure blob storage. The MP3 could be as large as 20 MB. Many of our customers lived in areas with broadband upload speeds of 1.0 Mbps on a good day. When we tested UploadFile on a 20 MB file with minimal broadband speed, we found the upload would time out and eventually fail. It just couldn't send up enough bytes and get a handshake back quickly enough to be successful.

In order to make our product work for all customers, we changed the upload to send the file up in blocks. The customer could set the block size. If the customer had pretty good internet speed (5 Mbps or higher), they might set the block size as high as 1 MB. If they had pretty bad internet speed (1 Mbps or lower), they could set the block size as low as 256 KB. This is small enough for a block to be uploaded and the handshake completed, and then it could start on the next block.

In this article, I'm going to discuss two ways to upload a file in blocks. One way is to use the parameters that can be changed when calling the UploadFile method on the CloudBlockBlob object. The other way is to programmatically break the file into blocks, upload them one by one, and then ask Azure to reassemble them.

Let's start with using the built-in functions for uploading a file. I messed around with this a bit back in 2010-2011, but the properties as used back then are obsolete and/or have been moved to different objects of the Storage Client Library since then. Bing-ing SingleBlobUploadThresholdInBytes only returned 8 articles. (Think about that. What have you searched for lately that only returned 8 results?) Most of the articles were from 2010-2011; the others were from MSDN, which offered a useful explanation like this: "This is the threshold in bytes for a single blob upload." Wow, incredibly helpful.

I managed to track down someone on the Azure Storage team at Microsoft to help me understand this, so at the time of this writing, I think only three people in the world know how to use this correctly: me, the guy at Microsoft who owns it, and one of the other Azure MVPs. So after you read this, you will be part of a very elite group.

There are three properties directly involved: SingleBlobUploadThresholdInBytes, StreamWriteSizeInBytes, and ParallelOperationThreadCount.

SingleBlobUploadThresholdInBytes

This is the threshold in bytes for a single blob upload. (Haha! Kidding!) This setting determines whether the blob will be uploaded in one shot (Put Blob) or multiple requests (Put Block). It does not determine the block size. It basically says: if the file is smaller than this size, upload it as one block; if the file is larger than this value, break it into blocks and upload them.

The minimum value for this is 1 MB (1024 * 1024), which means you cannot use it to chunk files that are smaller than 1 MB. ParallelOperationThreadCount must be equal to 1 (more on that below). Also, this works with the Upload* APIs (such as UploadFile) but not with blob streams. If you use OpenWrite to get a stream and write to it, it will always be uploaded behind the scenes using Put Block calls.

This property is found in the BlobRequestOptions class. To use it, create a BlobRequestOptions object and then assign it to the CloudBlobClients DefaultRequestOptions property.

StreamWriteSizeInBytes

This sets the size of the blocks to use when the upload is broken into blocks (Put Block requests) because the file is larger than the value of SingleBlobUploadThresholdInBytes.

By default, this is 4MB (4 * 1024 * 1024).
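As a quick sanity check on these sizes: the number of Put Block requests an upload generates is simply the file size divided by the block size, rounded up. A small example with made-up sizes (the numbers here are illustrative, not from the SDK):

```csharp
using System;

// How many Put Block requests will an upload generate?
// It's the file size divided by the block size, rounded up.
long fileSize = 20L * 1024 * 1024; // a 20 MB MP3, like the one in the story above
long blockSize = 256 * 1024;       // 256 KB blocks for a slow connection
long blockCount = (fileSize + blockSize - 1) / blockSize;
Console.WriteLine(blockCount);     // 80
```

With the default 4 MB block size, the same file would go up in only 5 requests, but each request would be far more likely to time out on a slow connection.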

This is a property on the CloudBlockBlob object or CloudPageBlob object, whichever you are using. You can use it when streaming files up to Azure as well (like when you're using UploadFromStream instead of UploadFile).

ParallelOperationThreadCount

This specifies how many parallel Put Block or Put Page operations should be pending at a time.

If this is set to anything but 1, SingleBlobUploadThresholdInBytes will be ignored. After all, if you ask for the file to be sent up in multiple threads, there's no way to do that but to send it up in blocks, right?

This is a property of the BlobRequestOptions object.

So for example, if you set SingleBlobUploadThresholdInBytes to 1 MB, StreamWriteSizeInBytes to 256 KB, and ParallelOperationThreadCount to 1, and then call blob.UploadFile: if the file is less than 1 MB, it will be uploaded with one Put Blob request. If the file is larger than 1 MB, it will be split into 256 KB blocks and the blocks sent up as multiple requests.

You might also consider changing the default retry policy. If you're chunking the file because you think the client will have problems uploading it over a poor internet connection, you might want to set it to retry only once, or not at all. Otherwise it may time out, wait X seconds, time out again, and so on, when it will never succeed. For this reason, I'm only having it retry once in the code below.

Uploading a file using the .NET Storage SDK

TimeSpan backOffPeriod = TimeSpan.FromSeconds(2);
int retryCount = 1;
BlobRequestOptions bro = new BlobRequestOptions()
{
    SingleBlobUploadThresholdInBytes = 1024 * 1024, // 1 MB, the minimum
    ParallelOperationThreadCount = 1,
    RetryPolicy = new ExponentialRetry(backOffPeriod, retryCount),
};
CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(ConnectionString);
CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();
cloudBlobClient.DefaultRequestOptions = bro;
CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference(ContainerName);
CloudBlockBlob blob = cloudBlobContainer.GetBlockBlobReference(Path.GetFileName(fileName));
blob.StreamWriteSizeInBytes = 256 * 1024; // 256 KB
blob.UploadFromFile(fileName, FileMode.Open);

In the code above, you can see that I create a BlobRequestOptions object and assign the values of SingleBlobUploadThresholdInBytes, ParallelOperationThreadCount, and RetryPolicy. After instantiating the CloudBlobClient, I set its DefaultRequestOptions to my BlobRequestOptions object. After getting a reference to the blob, I set the StreamWriteSizeInBytes. Then I upload the file.

If I turn Fiddler on and use the code above to upload a 5 MB file, I see multiple requests, one for each block. These calls are made consecutively because they are all running in a single thread (ParallelOperationThreadCount = 1).

Figure 1: Fiddler View

And if I look at any one line, I can see the size of the request. For all but the last two, the block size is the same as StreamWriteSizeInBytes; the last two send out the remainder of the file.

Figure 2: Fiddler Details

If you can set a couple of properties and upload a file in blocks easily, why would you want to do it programmatically? The case that immediately comes to mind is when you have files that are smaller than 1 MB and you want to send them up in 256 KB blocks. The minimum value for SingleBlobUploadThresholdInBytes is 1 MB, so you cannot use the method above.

Another case is if you want to let the user pause the upload process, then come back later and restart it. I'll talk about this after the code for uploading a file in blocks.

To programmatically upload a file in blocks, you first open a file stream for the file. Then you repeatedly read a block of the file, set a block ID, calculate the MD5 hash of the block, and write the block to blob storage. Keep a list of the block IDs as you go. When you're done, you call PutBlockList and pass it the list of block IDs. Azure will put the blocks together in the order specified in the list and then commit them. If you get the block list out of order, or you don't put all of the blocks before committing the list, your file will be corrupted.

The block IDs must all be the same size for all of the blocks, or your upload/commit will fail. I usually just number them from 1 to whatever, using a block ID that is formatted to a 7-character string. So for 1, I'll get 0000001. Note that block IDs have to be Base64 strings.
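A small helper along these lines (the method name is my own invention) produces fixed-length, Base64-encoded block IDs:

```csharp
using System;
using System.Text;

// Format the block number as a 7-character string ("0000001" for block 1),
// then Base64-encode it; every ID comes out the same length.
static string GetBlockId(int blockNumber) =>
    Convert.ToBase64String(Encoding.UTF8.GetBytes(blockNumber.ToString("0000000")));

Console.WriteLine(GetBlockId(1)); // MDAwMDAwMQ==
```

Seven digits allows up to 9,999,999 blocks, far more than you will ever need for one blob.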

Here's the code for uploading a file in blocks. I've put comments in to explain what's going on.

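A minimal sketch of the block-by-block upload just described, using the same classic Storage Client Library as the example above. The connection string, container name, and file name below are placeholders for your own values:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Placeholders: substitute your own storage account, container, and file.
string connectionString = "UseDevelopmentStorage=true";
string containerName = "uploads";
string fileName = @"C:\temp\song.mp3";

int blockSize = 256 * 1024; // 256 KB blocks
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
CloudBlobContainer container =
    account.CreateCloudBlobClient().GetContainerReference(containerName);
CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(fileName));

List<string> blockIds = new List<string>();
using (FileStream fileStream = File.OpenRead(fileName))
using (MD5 md5 = MD5.Create())
{
    byte[] buffer = new byte[blockSize];
    int blockNumber = 0;
    int bytesRead;
    while ((bytesRead = fileStream.Read(buffer, 0, blockSize)) > 0)
    {
        blockNumber++;
        // All block IDs must be Base64 strings of the same length.
        string blockId = Convert.ToBase64String(
            Encoding.UTF8.GetBytes(blockNumber.ToString("0000000")));
        // MD5 hash of just this block, so Azure can verify it arrived intact.
        string blockHash = Convert.ToBase64String(md5.ComputeHash(buffer, 0, bytesRead));

        // Upload this one block; retries (per the request options) apply per block.
        using (MemoryStream blockData = new MemoryStream(buffer, 0, bytesRead))
        {
            blob.PutBlock(blockId, blockData, blockHash);
        }
        blockIds.Add(blockId);
    }
}
// Commit the blocks in order. Until this call, the blob is not readable.
blob.PutBlockList(blockIds);
```

Until PutBlockList is called, the uploaded blocks sit in an uncommitted state, which is what makes the pause-and-resume scenario mentioned above possible: you can stop mid-file and later put only the blocks that are still missing before committing the full list.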

