Partitioned Databases - Data Migration

March 05, 2019 | Glynn Bird | Partitioned Databases Migration

This is the third part of a series of posts on Partitioned Databases in Cloudant. Part One, Part Two and Part Four may also be of interest.

Cloudant’s new Partitioned Databases feature allows a Cloudant database to be organised into partitions (blocks of data guaranteed to reside on the same database shard) by specifying a two-part _id field consisting of the partition key and document id, e.g.

{
  "_id": "US:52412",
  "name": "Boston",
  "state": "Massachusetts",
  "country": "US"
}

A partitioned database allows queries limited to a single partition - such queries can be performed much more efficiently than whole-database (global) queries.
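For example, a query scoped to a single partition uses the _partition path segment - a sketch, assuming $URL holds your account URL and a cities database partitioned by country:

```shell
# List only the documents in the "US" partition; the request touches a
# single shard instead of scanning the whole database.
curl "$URL/cities/_partition/US/_all_docs?include_docs=true"
```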


Photo by Timothy Muza on Unsplash

Migrating to a partitioned database 🔗

Migrating existing data to a partitioned database requires creating a new database with the partitioned=true flag:

curl -X PUT "$URL/cities2?partitioned=true"
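As a quick check, the database’s metadata (curl "$URL/cities2") reports whether it is partitioned. A sketch using jq on an abbreviated sample of that response (the sample body here is an assumption of the response shape):

```shell
# Abbreviated sample of the GET /cities2 metadata for a partitioned database.
meta='{"db_name":"cities2","props":{"partitioned":true}}'

# In practice: curl "$URL/cities2" | jq '.props.partitioned'
printf '%s\n' "$meta" | jq '.props.partitioned'
# → true
```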

The new database will need to be populated with a copy of the original data, but with the new partitionid:documentid format.

i.e. we need to transform documents of this form:

{
  "_id": "52412",
  "name": "Boston",
  "state": "Massachusetts",
  "country": "US"
}

into this form:

{
  "_id": "US:52412",
  "name": "Boston",
  "state": "Massachusetts",
  "country": "US"
}

to ensure that each city is placed in a per-country partition.
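This reshaping can be tried locally with jq on a single document (the sample values match the Boston example; jq must be installed):

```shell
# Prefix the _id with the country code and a colon, leaving other fields alone.
echo '{"_id":"52412","name":"Boston","state":"Massachusetts","country":"US"}' \
  | jq -c '._id = .country + ":" + ._id'
# → {"_id":"US:52412","name":"Boston","state":"Massachusetts","country":"US"}
```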

Cloudant’s Replication API allows data to be copied or synced from a source database to a target database. Filters can be used to decide whether a document should be replicated or not, but replication doesn’t allow you to transform the data as it is replicated.
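For instance, a replication job created via the _replicator database can carry a selector to choose which documents to copy - but the documents arrive verbatim, with no opportunity to rewrite _id. A sketch, assuming $URL and the database names used in this post:

```shell
# Replicate only the US documents; they are copied unchanged - the selector
# filters, it does not transform.
curl -X POST "$URL/_replicator" \
     -H 'Content-Type: application/json' \
     -d '{
       "source": "'"$URL"'/cities",
       "target": "'"$URL"'/cities2",
       "selector": { "country": "US" }
     }'
```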

There is a neat trick that allows data to be moved from one database to another, while modifying the _id field (or any other field for that matter) in the process. To do this we are going to use two command-line tools:

  1. couchbackup - allows CouchDB/Cloudant data to be backed up and stored as text files on your machine. It also comes with a couchrestore tool to restore that data back to the database, or to a new empty database.
  2. jq - a JSON processor used to format and modify JSON data structures.

Our process is this:

  1. Back up the cities database.
  2. Transform each document, prefixing its _id with the country code and a colon.
  3. Restore the transformed data to the new cities2 database.

The three actions can be achieved in a single command:

couchbackup --db cities | jq -c '. | map(._id = .country + ":" + ._id)' | couchrestore --db cities2

Let’s break that down:

couchbackup --db cities simply initiates a backup of the “cities” database. The data is output in batches of several hundred documents, with one batch per line, e.g.

[{"_id":"52412","name":"Boston","state":"Massachusetts","country":"US"}, ...]
The jq line jq -c '. | map(._id = .country + ":" + ._id)' means:

  - -c - output compact JSON, one array per line.
  - . - take the whole input, which in this case is an array of documents.
  - map(...) - apply the enclosed expression to every element of the array.
  - ._id = .country + ":" + ._id - replace each document’s _id with its country code, a colon, and the original _id.

The result is a transformed array of cities:

[{"_id":"US:52412","name":"Boston","state":"Massachusetts","country":"US"}, ...]
Piping this data to couchrestore populates the new database.
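The pipeline can also be run in separate steps with intermediate files - couchbackup --db cities > cities-backup.txt to create the backup, then jq, then couchrestore --db cities2 < cities-partitioned.txt to load the result. A sketch of the transformation step in isolation, with the backup file simulated by a hand-written one-document batch (file names are assumptions):

```shell
# Simulate one line of couchbackup output: a batch (JSON array) of documents.
printf '[{"_id":"52412","name":"Boston","state":"Massachusetts","country":"US"}]\n' > cities-backup.txt

# Rewrite each _id to the country:id form - the same filter as the one-liner.
jq -c '. | map(._id = .country + ":" + ._id)' cities-backup.txt > cities-partitioned.txt

cat cities-partitioned.txt
# → [{"_id":"US:52412","name":"Boston","state":"Massachusetts","country":"US"}]
```

Inspecting the intermediate file before running couchrestore is a useful sanity check that every document gained the expected partition key.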

Other considerations 🔗

Further information 🔗