Design Docs For Life

August 16, 2019 | Glynn Bird | Design Docs Query

A Design Document is a special Cloudant document whose _id field begins with _design/ e.g. _design/search. It stores meta data about a secondary index or indexes: the name of the index, which fields are to be indexed etc. Design documents are created in one of two ways:

  1. You create and update them manually as you would for any normal Cloudant document. This method is required for MapReduce views, Cloudant Search and Geospatial indexes.
  2. They are created for you when you instruct the database to create a Cloudant Query index using the POST /db/_index endpoint. Such design documents are not intended for you to maintain - they merely record state for the Cloudant Query service.

As a side note, a Design Document can also contain definitions for List, Show, Update and Filter functions, none of which we’ll look at in this post.

design docs

Photo by Kelly Sikkema on Unsplash

View building

When a Design Document is created, whether by hand or via Cloudant Query, Cloudant asynchronously begins to build the index. Indexing requires Cloudant to trawl through each document in the database in turn to build the index - in fact, as a Cloudant database is split into many (sixteen by default) pieces called shards the index builds independently on each shard copy. If the database document count is significant, it may take some time to finish the indexing process.

You can query a view before the index has finished building, but you will not get a response until indexing is complete. You can check on the progress of an indexing job by querying the _active_tasks endpoint.

Design Document Tips

Cloudant Query indexes can be more efficient than MapReduce

The MapReduce engine requires Cloudant to execute your JavaScript map function using a JavaScript engine where as Cloudant Query work stays within the Erlang space of Cloudant’s core. As a result, Cloudant Queries are more efficient to build than their MapReduce equivalents.

Store data in a index-friendly way

Cloudant Query doesn’t give you the opportunity to modify the data prior to indexing (as can be achieved in a JavaScript map function) so it is important to store data in the form that is suitable for querying. With only a handful of indexed fields, many of your application access patterns can be queried optimally.

Partial Indexes are smaller

If you only need to index a sub-set of a database with Cloudant Query then a partial index can make the index smaller and more efficient by applying a filter selector at indexing time. A similar effect is achieved with an if statement in a MapReduce or Search index:

function(doc) {
  // we only want totals of paid orders
  if (doc.type === 'order' && doc.status === 'paid') {
  
    // build in index of order value by date, but only for paid-for orders
    emit(doc.date, doc.total)
  }
}

A small index requires fewer computing resources to manage so should be faster to query.

Use meaningful names

After using Cloudant for a while, you might find yourself staring at an index definition and wondering what it is for, who created it and whether it is still needed. Having a good naming convention is important in most software projects, and this applies to design document management too. Here are some ideas:

Partitioned indexes are faster

If your data can be moulded into a Partitioned Database model, then queries that are directed to a single partition will execute faster and be charged less per query than their global equivalents.

Specify use_index

When querying using Cloudant Query, supply use_index when you know which index you intend Cloudant to use to answer your query.

Use the “explain” API

Learn how to use the explain API which describes which index would be used to answer an individual query.

Automated testing for JavaScript

JavaScript map functions are just code and are as susceptible to bugs and typos as any other source code. It’s worth running your map functions through an automated test suite, to ensure that your code is emitting the correct keys and values in all circumstances. Here’s an example test suite for a map function:

const map = require('./designdocs/global.js').views.byDeviceId

let emittedValues = []

const emit = function(key, value) {
  emittedValues.push({key: key, value: value})
}

const index = function(key, value) {
  emittedValues.push({key: key, value: value})
}

const reset = function() {
  emittedValues = []
}

test('Ensure nothing is emitted for person documents', function() {
  const doc = {
    _id: '123',
    _rev: '1-2324',
    type: 'person',
    x: 1
  }
  reset()
  map(doc)
  expect(emittedValues.length).toEqual(0)
}

test('Ensure order date and value is emitted for paid order documents', function() {
  const doc = {
    _id: '123',
    _rev: '1-2324',
    type: 'order',
    date: '2018-02-42',
    total: 56.22,
    status: 'paid'
  }
  reset()
  map(doc)
  expect(emittedValues).toEqual(['2018-02-42', 56.22])
}

The above code uses the Jest testing framework and exercises our map function to ensure that it behaves correctly with the right stimulus. Dummy “emit”/”index” functions collect the emitted keys and values where they can be checked by your test code.

Note: Automated testing can help catch syntax errors and logical edge cases, but exactly reproduce how Cloudant’s JavaScript engine will execute your map functions, your test suite would have to use the same version of SpiderMonkey JavaScript engine.

Lifecycle management for Design Documents

Understanding the lifecycle of indexes is important when it comes to your application’s configuration management.

  1. CREATE - When a Design Document that contains index definitions is first saved, the process of building the index begins. Each copy of each shard of the database builds the index independently and in parallel. The index is not usable until all the processes are complete.
  2. UPDATE - When a Design Document that alters one or more MapReduce indexes is updated, all of the MapReduce indexes in the same document are invalidated and become unusable until they are rebuilt, The same rule does not apply to Cloudant Search indexes - only changed indexes are invalidated.
  3. DELETE - When a Design Document is deleted, the indexes are invalidated and the disk space occupied by its indexes will be recovered in due course.

There are some subtleties too:

The above semantics can used to ensure an orderly transition from one version of an index to another.

Design Document change control

With a large data set and significant index build times, care must be taken when deploying new versions of design documents if the the views are to be available at all times. Taking advantage of the subtle behaviour of design documents and their indexes, the following algorithm can be implemented to avoid downtime for the indexes defined by design document “ddoc”:

  1. Copy old design document to ddoc_OLD.
  2. Save new design document to ddoc_NEW.
  3. Trigger the view/search to make sure it builds.
  4. Poll the view to see if it has finished building.
  5. Copy ddoc_NEW to the ddoc.
  6. Delete ddoc_NEW
  7. Delete ddoc_OLD

This relatively convoluted series of steps has been implemented into a command-line tool couchmigrate which performs the design document dance on your behalf.