Reducing Consumed Request Units in DocumentDb with C# .NET

PUBLISHED ON JAN 31, 2017 — .NET, AZURE, DATABASES

I’ve previously covered getting up and running with DocumentDb. Since then, one of the things I’ve been looking at is how to reduce the number of request units each write operation to the database consumes.

This post assumes you’ve got the code from that post, as we’ll be building on it here.

First, it’s good to understand what throughput you’ve set your collection to have, and what this means in terms of performance as you scale.

At the time of writing, Azure lets you configure a single-partition collection to have a throughput of between 400 and 10,000 request units (RUs) per second.

Every operation you perform on DocumentDb will consume some of these request units, the amount dependent on the size and complexity of what you’re doing.

Once you hit the limit of your throughput, DocumentDb will throttle requests to get you back under the limit. When this happens, you’ll receive an HTTP 429 status response, 'RequestRateTooLarge'.

In this case, you need to read the x-ms-retry-after-ms response header, which gives the time in milliseconds to wait before attempting the request again.
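
If you’re calling the REST API directly rather than going through the SDK, reading that header might look something like the sketch below. The ThrottleHeaders class, the GetRetryDelay method and the fallback parameter are my own names for illustration and are not part of any SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Net.Http;

public static class ThrottleHeaders
{
    // Returns how long to wait before retrying, or null if the response
    // was not throttled (i.e. its status code is not 429).
    // `fallback` is an assumed default used when the header is missing or unreadable.
    public static TimeSpan? GetRetryDelay(HttpResponseMessage response, TimeSpan? fallback = null)
    {
        if ((int)response.StatusCode != 429)
        {
            return null;
        }

        IEnumerable<string> values;
        double milliseconds;
        if (response.Headers.TryGetValues("x-ms-retry-after-ms", out values)
            && double.TryParse(values.FirstOrDefault(), NumberStyles.Float,
                               CultureInfo.InvariantCulture, out milliseconds))
        {
            return TimeSpan.FromMilliseconds(milliseconds);
        }

        // Header absent or malformed: fall back to the caller's default (1 second if none given).
        return fallback ?? TimeSpan.FromSeconds(1);
    }
}
```

A caller would check the result and, if it isn’t null, wait that long (for example with Task.Delay) before sending the request again.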

However, if you’re using the .NET client SDK then this will be handled for you most of the time, as the SDK will implicitly catch the response, respect the header, and retry the request.
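
Should the SDK’s own retries run out, the exception surfaces to your code, and you may want a small wrapper of your own. This is only a minimal sketch: ThrottledException is a hypothetical stand-in so the example compiles without the SDK, and ThrottleRetry and maxAttempts are names I’ve made up. With the real client you would instead catch DocumentClientException, check that its StatusCode is 429, and wait for its RetryAfter value.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a throttling (HTTP 429) error carrying the suggested wait time.
public class ThrottledException : Exception
{
    public TimeSpan RetryAfter { get; private set; }

    public ThrottledException(TimeSpan retryAfter)
        : base("Request rate too large (429)")
    {
        RetryAfter = retryAfter;
    }
}

public static class ThrottleRetry
{
    // Runs `operation`, and if it is throttled waits for the suggested delay and tries again.
    // After `maxAttempts` attempts the last ThrottledException is allowed to propagate.
    public static async Task<T> ExecuteAsync<T>(Func<Task<T>> operation, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (ThrottledException ex) when (attempt < maxAttempts)
            {
                // Respect the server's hint rather than retrying immediately.
                await Task.Delay(ex.RetryAfter);
            }
        }
    }
}
```

The write call would then be passed in as a lambda, for example ThrottleRetry.ExecuteAsync(() => SomeWriteAsync()), where SomeWriteAsync is whatever method performs your write.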

One thing to be wary of is setting your throughput too high - Azure will charge you for RESERVED throughput. That means whatever you set the collection to is what you get charged for, whether you use it or not. I’ll cover scaling the database up in a future post, but unless you’re sure you’ll need it, I’d set the throughput to the minimum, especially if it’s just for testing.

Now, in order to find out if any changes we make will actually work, we’ll need to first find out how many request units we’re currently consuming.

For the purposes of this demo, I’m going to be writing the following class to the database:

public class CustomerEntry
{
    public Customer Customer { get; set; }

    public DateTime TimeStamp { get; set; }

    public Guid id { get; set; }
}

public class Customer
{
    public string FirstName { get; set; }

    public string LastName { get; set; }

    public string Email { get; set; }

    public string Password { get; set; }

    public bool IsVerified { get; set; }

    public string HomePhone { get; set; }

    public string MobilePhone { get; set; }

    public Address RegisteredAddress { get; set; }

    public Address BillingAddress { get; set; }

    public Account AccountDetails { get; set; }

    public string CompanyCode { get; set; }

    public string CompanyName { get; set; }
}

public class Address
{
    public string BuildingNumberName { get; set; }

    public string AddressLine1 { get; set; }

    public string AddressLine2 { get; set; }

    public string AddressLine3 { get; set; }

    public string Town { get; set; }

    public string Region { get; set; }

    public string PostCode { get; set; }

    public string Country { get; set; }
}

public class Account
{
    public string AccountCode { get; set; }

    public string AccountName { get; set; }

    public string AccountType { get; set; }

    public string PaymentMethod { get; set; }

    public string Currency { get; set; }

    public bool AllowFreeDelivery { get; set; }

    public string TimeZone { get; set; }
}

This is just so we have something that might be vaguely more realistic than me just dumping a load of randomly named properties into an object.

So what we’ll be writing to the database is a CustomerEntry object. I like to wrap my objects alongside the id that DocumentDb expects, and a TimeStamp for any audit or debug issues.

Now we’ve done this, we’ll take the method from my previous post, and modify it slightly so we can output to the console how many request units the operation consumed:

ResourceResponse<Document> documentResponse = await _client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(_databaseName, _collectionName),
                                        documentObject);
Console.WriteLine(String.Format("Request units consumed: {0}", documentResponse.RequestCharge));

The parameter documentObject will be an instance of CustomerEntry. So if you now run the console app, you’ll see the number of request units output to the window - I’m seeing 19.62 units consumed.

So now we want to try and reduce this amount, which we’re going to do by changing our indexing policy for the DocumentCollection.

In DocumentDb, the indexing policy determines which properties of a document are queryable. By default, the entire document is indexed.

We’re going to change this to index only the one or two properties that I might be interested in querying on, ignoring the rest. This might not be suitable for your application, so think carefully about what you want to index based on your needs.

To do this, I’m going to delete the existing DocumentCollection, and update the method to create a DocumentCollection to apply the new indexing policy.

We’re going to set the ExcludedPaths to ignore the entire Customer object, and then in our IncludedPaths we’ll add the specific property we want. In this case, I’m assuming I’ll only be interested in the Email on the Customer (as that would be my unique property).

So with these changes, our method will now look like this:

private static async void CreateDocumentCollectionIfNotExists()
{
    try
    {
        if (!_client.CreateDocumentCollectionQuery(UriFactory.CreateDatabaseUri(_databaseName))
                .Where(coll => coll.Id == _collectionName).ToArray()
                .Any())
        {
            DocumentCollection collectionInfo = new DocumentCollection();
            collectionInfo.Id = _collectionName;
            collectionInfo.IndexingPolicy.Automatic = true;
            collectionInfo.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath
            {
                Path = "/Customer/*"
            });
            collectionInfo.IndexingPolicy.IncludedPaths.Add(new IncludedPath
            {
                Path = "/*",
                Indexes = new System.Collections.ObjectModel.Collection<Index>
                {
                    new RangeIndex(DataType.String) { Precision = 20 }
                }
            });
            collectionInfo.IndexingPolicy.IncludedPaths.Add(new IncludedPath
            {
                Path = "/Customer/Email/?",
                Indexes = new System.Collections.ObjectModel.Collection<Index>
                {
                    new RangeIndex(DataType.String) { Precision = 20 }
                }
            });

            ResourceResponse<DocumentCollection> collectionResponse = await _client.CreateDocumentCollectionAsync(UriFactory.CreateDatabaseUri(_databaseName),
                                                            collectionInfo,
                                                            new RequestOptions { OfferThroughput = 400 });
            Console.WriteLine(String.Format("Collection request units consumed: {0}", collectionResponse.RequestCharge));
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}

So here we’re first querying to see if the Document Collection exists, as discussed in my previous post. Then we set up our Document Collection, setting the Indexing Policy to automatic.

Now, we need to configure our excluded and included paths. We set the Excluded Path to exclude the entire Customer object, and set an Included Path of /* to include everything else. Where paths overlap, the more specific one takes precedence, so this leaves us with the id and TimeStamp properties being indexed, which will be useful for future queries.

Next, we set another Included Path, this time specifying the Email property on the Customer object. So now, our TimeStamp, id, and Customer.Email properties are all indexed, and as such are all queryable.
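
To make that precedence concrete, here’s a toy model of how the three paths combine. It is not how DocumentDb implements indexing; it simply assumes that the longest matching path wins, and IndexPathModel, Covers and IsIndexed are names I’ve invented for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexPathModel
{
    // "/a/?" covers exactly the value at /a; "/a/*" covers everything beneath /a.
    private static bool Covers(string pattern, string propertyPath)
    {
        if (pattern.EndsWith("/?", StringComparison.Ordinal))
        {
            return propertyPath == pattern.Substring(0, pattern.Length - 2);
        }
        if (pattern.EndsWith("/*", StringComparison.Ordinal))
        {
            // Keep the trailing slash so "/Customer/*" does not match "/CustomerX".
            string prefix = pattern.Substring(0, pattern.Length - 1);
            return propertyPath.StartsWith(prefix, StringComparison.Ordinal);
        }
        return false;
    }

    // Assumption: the longest (most specific) matching path decides the outcome.
    public static bool IsIndexed(string propertyPath,
                                 IEnumerable<string> includedPaths,
                                 IEnumerable<string> excludedPaths)
    {
        var best = includedPaths.Select(p => new { Pattern = p, Included = true })
            .Concat(excludedPaths.Select(p => new { Pattern = p, Included = false }))
            .Where(rule => Covers(rule.Pattern, propertyPath))
            .OrderByDescending(rule => rule.Pattern.Length)
            .FirstOrDefault();

        return best != null && best.Included;
    }
}
```

With included paths of "/*" and "/Customer/Email/?" and an excluded path of "/Customer/*", this model reports /id, /TimeStamp and /Customer/Email as indexed, and anything else under /Customer (such as /Customer/LastName) as not indexed.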

The real test is the difference in the consumed request units per write operation - remember we were previously consuming 19.62 RUs when writing.

Ensure the collection has been deleted in the Azure Portal, then run the code to create the collection again, and write an object to the collection. When doing so, I now see 7.24 RUs being consumed.

We’ve more than halved the amount we consume!
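
To put that in terms of throughput, here’s a rough bit of arithmetic. It assumes the collection is doing nothing but these writes, and RequestUnitMath and its methods are just illustrative names.

```csharp
using System;

public static class RequestUnitMath
{
    // How many writes per second fit inside the provisioned throughput,
    // assuming every write costs the same and nothing else uses the collection.
    public static double WritesPerSecond(double provisionedRusPerSecond, double chargePerWrite)
    {
        if (chargePerWrite <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(chargePerWrite));
        }
        return provisionedRusPerSecond / chargePerWrite;
    }

    // Percentage reduction in the per-write charge.
    public static double PercentSaved(double chargeBefore, double chargeAfter)
    {
        return (chargeBefore - chargeAfter) / chargeBefore * 100.0;
    }
}
```

With the figures from this post, WritesPerSecond(400, 19.62) comes out at roughly 20 writes per second and WritesPerSecond(400, 7.24) at roughly 55, while PercentSaved(19.62, 7.24) is about 63%.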

This is just one way of improving performance with DocumentDb - I’ll continue to learn more and will share what I find.

As always, if you have any improvements or suggestions, feel free to let me know!
