Transaction Engine Costs

July 06, 2020 | Glynn Bird | TxE Pricing

Cloudant on Transaction Engine (Cloudant TxE) is the latest incarnation of the Cloudant JSON document store running as a service in the IBM Cloud. It differs from the “classic” Cloudant product, as a glance at the feature comparison chart will show, but it also has a different pricing model.

Cloudant TxE’s pricing structure is closely aligned with the performance characteristics of the database. In other words, database operations that are cheap (in terms of the number of read/write units they consume) are also relatively easy for the database to execute, so they are performant and scalable.

Photo by Agê Barros on Unsplash

This document explores the implicit incentives that the pricing model presents to the developer. If you follow these incentives, you’ll get fast, scalable performance at the lowest price.

How does pricing work on Cloudant TxE?

Pricing is fully explained here, but in a nutshell, a Cloudant TxE plan comes with a number of read units and write units that are provisioned for your use every second. The number of read/write units you provision is determined by how much you pay and can change up and down over time, either by altering the position of the slider in the IBM Cloud dashboard or via an API call:

Cloudant capacity

Each Cloudant operation consumes a different number of read/write units depending on how complex it is, so it’s in your interest to achieve your application’s goals while consuming as few units as possible.

This chart shows the cost of common Cloudant operations, with a colour-coded guide to how the price is calculated:

Cloudant TxE pricing

Let’s unpack this diagram and draw out the incentives that Cloudant has baked into the product and which best practices you are being driven towards.

Bulk over piecemeal API calls

Notice that every Cloudant operation expends one read/write unit to “open the transaction” with the underlying key/value store (indicated by a red square on the diagram). This means all of the API calls that write, update, delete or fetch a single document cost two units each: one to open the transaction and the other to perform the database operation. The bulk APIs are cheaper because a database transaction is opened once and is able to service several reads/writes.
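To make the saving concrete, here is a minimal sketch of the arithmetic. It assumes a bulk call costs one unit to open the transaction plus one unit per document, and it ignores the extra cost of index writes and the per-request document limit. The function names are made up for illustration.

```javascript
// Units for n single-document calls: every call opens its own transaction.
function piecemealUnits(n) {
  return n * 2; // 1 unit to open the transaction + 1 unit for the operation
}

// Units for one bulk call covering n documents.
// Assumption: 1 unit to open the transaction + 1 unit per document.
function bulkUnits(n) {
  return n === 0 ? 0 : 1 + n;
}

console.log(piecemealUnits(100)); // 200
console.log(bulkUnits(100));      // 101
```

Under these assumptions, batching 100 writes into one request roughly halves the units consumed.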

Indexing is important for Cloudant Query

Using the POST /db/_find endpoint to query a database becomes very expensive if the query is not backed by a supporting secondary index. The query isn’t charged on the number of documents returned but on the number of documents scanned to get the answer. If a query has to churn through hundreds of “cancelled” orders before finding the “completed” orders it needs, then the query is more expensive than it need be and may exhaust your provisioned read allocation.

The best practice is to create indexes on the fields your query searches on. A query that exactly aligns with a secondary index will only consume one read unit per returned document. Creating the right index for your data takes skill: read some advice on index design and optimisation here.
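As a rough illustration of the “cancelled” versus “completed” orders case, the sketch below shows a possible index definition and query body, plus a simple cost estimate. The database name `orders`, the index name `by-status` and the `findReadUnits` helper are hypothetical. The cost model (one unit to open the transaction plus one unit per document scanned) is an assumption drawn from the description above, not an official formula.

```javascript
// Hypothetical index on the "status" field: a JSON body for POST /orders/_index.
const indexDefinition = {
  index: { fields: ["status"] },
  name: "by-status",
  type: "json"
};

// A JSON body for POST /orders/_find that the index above can serve.
const query = { selector: { status: "completed" } };

// Assumed cost model: 1 unit to open the transaction + 1 unit per document scanned.
function findReadUnits(documentsScanned) {
  return 1 + documentsScanned;
}

const cancelled = 500;
const completed = 20;
console.log(findReadUnits(cancelled + completed)); // 521 (no index: every order is scanned)
console.log(findReadUnits(completed));             // 21  (index on status: only matches are read)
```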

Deleting databases is the cleanest way to purge old unwanted data

Deleting a single document with DELETE /db/id?rev=<rev> costs 2 write units - it’s cheaper per document to use POST /db/_bulk_docs to do multiple deletions - but the most efficient way to purge many documents in bulk is to delete entire databases.
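For the bulk-deletion route, a sketch like the following builds the request body for POST /db/_bulk_docs from a list of documents, marking each one with `_deleted: true`. The helper name `buildBulkDeleteBody` is made up for illustration, and the caller is assumed to already know each document's current `_rev`.

```javascript
// Build a _bulk_docs body that deletes every document in the list.
// Each entry needs the document's _id and its current _rev.
function buildBulkDeleteBody(docs) {
  return {
    docs: docs.map(({ _id, _rev }) => ({ _id, _rev, _deleted: true }))
  };
}

const body = buildBulkDeleteBody([
  { _id: "order1", _rev: "1-abc", total: 10 },
  { _id: "order2", _rev: "3-def", total: 25 }
]);
console.log(JSON.stringify(body));
```

Only `_id`, `_rev` and `_deleted` are sent; the rest of each document body is dropped because a deletion does not need it.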

The “time-boxed databases” approach detailed here has merit in all flavours of Cloudant, allowing older data to be cleanly archived and deleted with full recovery of disk space.
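One way to picture time-boxing is to derive the database name from the date of the data, then drop whole databases once they age out. The sketch below assumes one database per calendar month and a made-up naming scheme of `prefix_YYYY_MM`; both helper functions are hypothetical.

```javascript
// Name of the monthly database that a record dated `date` belongs in.
function monthlyDatabaseName(prefix, date) {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${prefix}_${year}_${month}`;
}

// Databases older than the oldest one we want to keep.
// Works because zero-padded names with a shared prefix sort chronologically.
function expiredDatabases(names, oldestToKeep) {
  return names.filter((name) => name < oldestToKeep);
}

console.log(monthlyDatabaseName("orders", new Date("2020-07-06T00:00:00Z"))); // orders_2020_07
console.log(expiredDatabases(
  ["orders_2020_04", "orders_2020_05", "orders_2020_06", "orders_2020_07"],
  "orders_2020_06"
)); // [ 'orders_2020_04', 'orders_2020_05' ]
```

Each expired name can then be removed with a single DELETE /db call instead of deleting its documents one by one.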

The fewer indexes the better

Writes and bulk writes are charged not only on the number of documents written but also on the number of Cloudant Query secondary indexes defined, as each document write will need to be processed and written out into each index. The more indexes you have, the more each write costs, so the fewer indexes the better. There is a technique for using a handful of indexes that service many use-cases that is worth exploring.
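The effect of index count on write cost can be sketched as follows. The exact per-index charge is not stated here, so this assumes one extra write unit per document per index on top of one unit to open the transaction and one unit per document. Treat the numbers as illustrative only.

```javascript
// Estimated write units for one bulk write.
// Assumption: 1 unit to open the transaction, then for each document
// 1 unit for the document itself + 1 unit per secondary index.
function bulkWriteUnits(docCount, indexCount) {
  return 1 + docCount * (1 + indexCount);
}

console.log(bulkWriteUnits(100, 0)); // 101
console.log(bulkWriteUnits(100, 5)); // 601
```

Under this assumed model, five indexes make the same batch of writes roughly six times as expensive.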

Using the _id field as a free index

Storing data in the _id field rather than using Cloudant’s auto-generated ids can give you a “free” index for range querying e.g:
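A minimal sketch of one such scheme, assuming each document's _id is built as `userId:date` so that all of a user's documents sit next to each other in the primary index. The id format and helper names are made up for illustration. The `startkey`/`endkey` pair is the kind of range you might pass to GET /db/_all_docs, using the common `\ufff0` high-sentinel convention.

```javascript
// Build a document _id that sorts by user, then by date.
function makeId(userId, isoDate) {
  return `${userId}:${isoDate}`;
}

// Range of _id values covering every document belonging to one user.
function idRangeForUser(userId) {
  return { startkey: `${userId}:`, endkey: `${userId}:\ufff0` };
}

// In-memory stand-in for a range read on the primary index.
function idsInRange(ids, { startkey, endkey }) {
  return ids.filter((id) => id >= startkey && id <= endkey).sort();
}

const ids = [
  makeId("user43", "2020-07-01"),
  makeId("user42", "2020-07-06"),
  makeId("user42", "2020-06-30"),
  makeId("user4", "2020-07-02")
];
console.log(idsInRange(ids, idRangeForUser("user42")));
// [ 'user42:2020-06-30', 'user42:2020-07-06' ]
```

Because the range is answered from the primary index, no secondary index has to be defined or maintained for this access pattern.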

Using ?include_docs=true with MapReduce adds expense

Fetching keys and values from a MapReduce view is cheap (one read unit to open the transaction and one read unit per one hundred key/values). Adding ?include_docs=true to fetch the associated document bodies adds the expense of one read unit per document fetched.
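The figures above translate into a simple estimate. This sketch assumes the per-hundred charge is rounded up to the next whole unit, which is not confirmed here; the function name is hypothetical.

```javascript
// Estimated read units for one view query returning `rowCount` rows.
function viewReadUnits(rowCount, includeDocs) {
  const openTransaction = 1;
  const keyValues = Math.ceil(rowCount / 100); // assumption: partial hundreds round up
  const documents = includeDocs ? rowCount : 0; // 1 unit per document body fetched
  return openTransaction + keyValues + documents;
}

console.log(viewReadUnits(500, false)); // 6
console.log(viewReadUnits(500, true));  // 506
```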

Projecting data into a MapReduce view

The very cheapest thing that Cloudant TxE does is reading key/values from a MapReduce view. This incentivises you to project data into the index’s value to avoid having to use ?include_docs=true, e.g.:

function (doc) {
  // create an index keyed on userId/date whose value is a part of the document as an object
  // (doc.date is an assumed field name for the date part of the key)
  emit([doc.userId, doc.date], { status: doc.status, description: doc.description })
}

Fetching data from a view like this is fast, cheap and scalable.
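To see what such a view contains, the sketch below runs a map function of this shape over a few sample documents in memory. It is only a stand-in for the real indexer: `emit` is passed in as a parameter rather than being a global, the field name `date` and the sample data are assumptions, and `buildView` and `compareKeys` are hypothetical helpers.

```javascript
// Map function: key on [userId, date], project only the fields we need.
function map(doc, emit) {
  emit([doc.userId, doc.date], { status: doc.status, description: doc.description });
}

// Compare two array keys element by element.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Build the sorted list of rows that the view would hold.
function buildView(docs) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((x, y) => compareKeys(x.key, y.key));
}

const rows = buildView([
  { _id: "a", userId: "bob", date: "2020-07-02", status: "completed", description: "Book", notes: "long text" },
  { _id: "b", userId: "alice", date: "2020-07-05", status: "cancelled", description: "Pen", notes: "long text" },
  { _id: "c", userId: "alice", date: "2020-07-01", status: "completed", description: "Lamp", notes: "long text" }
]);
console.log(rows.map((r) => r.key.join("/")));
// [ 'alice/2020-07-01', 'alice/2020-07-05', 'bob/2020-07-02' ]
```

Each row already carries `status` and `description`, so a reader of the view never needs the full document body (the bulky `notes` field is left behind).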

Bulk operations have a limit of 2000 documents

Cloudant TxE returns a maximum of 2000 documents or view rows at a time. Use the page_size parameter and the bookmark parameter to page through a result set and only ask for the documents needed. Read about pagination in Cloudant TxE here.
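The paging loop can be sketched as below. The response shape (`rows` plus a `bookmark` that is absent on the last page) is an assumption made for illustration, and `fetchPage` is a hypothetical stand-in for whatever function makes the HTTP request. Here it is backed by an in-memory array so the loop can be run without a database.

```javascript
// Page through a result set, passing each page's bookmark to the next request.
function fetchAll(fetchPage, pageSize) {
  const all = [];
  let bookmark;
  do {
    const page = fetchPage({ page_size: pageSize, bookmark });
    all.push(...page.rows);
    bookmark = page.bookmark; // assumed to be undefined once the last page is reached
  } while (bookmark);
  return all;
}

// In-memory stand-in for the server: the bookmark is just the next offset.
function makeStubFetchPage(rows) {
  let calls = 0;
  const fetchPage = ({ page_size, bookmark }) => {
    calls++;
    const start = bookmark ? Number(bookmark) : 0;
    const end = start + page_size;
    return {
      rows: rows.slice(start, end),
      bookmark: end < rows.length ? String(end) : undefined
    };
  };
  fetchPage.callCount = () => calls;
  return fetchPage;
}

const stub = makeStubFetchPage(Array.from({ length: 5000 }, (_, i) => ({ id: `doc${i}` })));
const everything = fetchAll(stub, 2000);
console.log(everything.length, stub.callCount()); // 5000 3
```

In a real application it is usually better to stop as soon as you have the rows you need rather than draining the whole result set, since every page consumes read units.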