Replication Efficiency Improvements

Feb 8, 2023 | Nick Vatamaniuc


Cloudant’s replication is a rock-solid protocol that allows a database’s changes to be easily synced to a different database. This feature is used widely to create multi-region Cloudant topologies, allowing dependent applications to survive a regional Cloud outage.

Cloudant has recently published a number of improvements that make replication even better than before - in our internal benchmarks we have seen replication speeds 3x those of the previous version. Some of these features are switched off by default, but may become the default behaviour in future releases.

In this blog post we’ll explore what has changed and how the new optional features can be switched on in your replications.


Photo by Vince Fleming on Unsplash

Before we get to that, it is instructive to understand how Cloudant’s replication works.

How does Replication work?

Replication consists of three actors:

The source. The Cloudant database containing the data that is to be written to the target.
The target. The Cloudant database where data from the source is to be written.
The mediator. The Cloudant instance that performs the administration of the replication process. This can be either the source or target instance, or in some cases, an entirely different Cloudant instance.

Replication data logically flows in one direction only: from the source to the target - changes that occur on the source are carefully grafted on to data that exists in the target database. (If two-way “sync” is required, then two replications are needed - one for data flowing from A->B, the other for B->A.)

Replications are started by writing a document to the _replicator database on the mediator (remember that the mediator may also be either the source or target Cloudant service). In this case we’re replicating a database a on one Cloudant instance to database b on another:

{
  "_id": "a_to_b",
  "source": "https://myfirstaccount.cloudant.com/a",
  "target": "https://mysecondaccount.cloudant.com/b"
}

Note that this simplified _replicator document omits any authentication credentials which would be necessary for the replication to proceed. The replication document has a number of optional configuration parameters but source and target are the key mandatory options.
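As a rough illustration of how credentials might be supplied, the sketch below builds the same replication document in Python, writing each endpoint in object form with an `auth` section. The `replication_doc` helper and the placeholder username and password are assumptions made for this example, and the `basic` auth shape follows the Apache CouchDB replicator documentation; check the Cloudant documentation for the authentication method that suits your account.

```python
import json


def replication_doc(doc_id, source_url, target_url, source_auth=None, target_auth=None):
    """Build a _replicator document as a plain dict (hypothetical helper).

    When an auth dict is supplied, the endpoint is written in object form
    ({"url": ..., "auth": ...}); otherwise the bare URL string is used.
    """
    def endpoint(url, auth):
        return {"url": url, "auth": auth} if auth else url

    return {
        "_id": doc_id,
        "source": endpoint(source_url, source_auth),
        "target": endpoint(target_url, target_auth),
    }


# Placeholder credentials: replace with real values before use.
basic = {"basic": {"username": "example-user", "password": "example-password"}}

doc = replication_doc(
    "a_to_b",
    "https://myfirstaccount.cloudant.com/a",
    "https://mysecondaccount.cloudant.com/b",
    source_auth=basic,
    target_auth=basic,
)
print(json.dumps(doc, indent=2))
```

The printed JSON is what would be written to the mediator's _replicator database; the sketch itself makes no network calls.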

Once the document is written to the Cloudant mediator instance’s _replicator database, that Cloudant service will begin a series of repeating steps:

1. A batch of changes is read from the source instance. This uses the Cloudant Changes Feed API call which provides a list of document revisions that have occurred since the last checkpoint (see step 5).
2. The target database is then queried to see if it already has the changes from step 1. This uses batched calls to the Cloudant _revs_diff API call, which given a list of document revisions will reply with a list of revisions the target doesn’t have. This is an optimisation to avoid having to send data to the target that it already has.
3. The revisions that step 2 identified as missing from the target are fetched from the source using the Cloudant Document Fetch API call - one invocation for each document revision required.
4. Batches of document revisions from step 3 are written to the target database using the Cloudant Bulk Write API. The document revisions written could be freshly inserted documents, updates to existing documents, conflicted documents or document deletions.
5. The state of the progress of the replication job is written to the source and target databases as local “checkpoint” documents which allow a stopped replication to resume from where it left off.

Steps 1 through 5 are repeated until there are no more changes for “one-shot” replications, or forever for “continuous” replications.
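The loop above can be sketched with in-memory dictionaries standing in for the two databases. This is a toy model written for this post's five steps, not the replicator's actual code: the `replicate` function, the `(seq, doc_id, rev)` change tuples and the single checkpoint dict are all assumptions made for illustration, and the real replicator stores checkpoints on both the source and the target.

```python
def replicate(source, target, checkpoint, batch_size=2):
    """One-shot replication between two in-memory "databases".

    source["changes"] is an ordered list of (seq, doc_id, rev) tuples and
    source["docs"] / target["docs"] map (doc_id, rev) -> document body.
    """
    while True:
        # Step 1: read a batch of changes after the last checkpoint.
        since = checkpoint["seq"]
        batch = [c for c in source["changes"] if c[0] > since][:batch_size]
        if not batch:
            return checkpoint  # no more changes: a one-shot replication ends

        # Step 2: ask the target which revisions it lacks (_revs_diff).
        missing = [(d, r) for _, d, r in batch if (d, r) not in target["docs"]]

        # Step 3: fetch each missing revision from the source.
        fetched = {key: source["docs"][key] for key in missing}

        # Step 4: write the fetched revisions to the target in bulk.
        target["docs"].update(fetched)

        # Step 5: record progress so a stopped job can resume.
        checkpoint["seq"] = batch[-1][0]


source = {
    "changes": [(1, "x", "1-a"), (2, "y", "1-b"), (3, "x", "2-c")],
    "docs": {("x", "1-a"): {"n": 1}, ("y", "1-b"): {"n": 2}, ("x", "2-c"): {"n": 3}},
}
target = {"docs": {("x", "1-a"): {"n": 1}}}  # target already holds one revision
checkpoint = replicate(source, target, {"seq": 0})
print(checkpoint["seq"], sorted(target["docs"]))
```

Because the target already holds revision 1-a of document x, step 2 filters it out of the first batch and only the other two revisions are fetched and written.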

Read more about Cloudant Replication in our documentation.

Optimisation 1: Skipping `_revs_diff`

As discussed in the previous section, the _revs_diff step is there to prevent data that already exists on the target being re-sent from the source. But what if the target is empty, as it would be at the start of a fresh replication to a new database? The _revs_diff step would be a waste of time, eating up valuable database read operations from the target.

Cloudant now intelligently decides whether to perform the _revs_diff step, learning from the responses to previous _revs_diff requests. If it detects that the target seems to need most of the revisions being sent, then it will send them without bothering with the _revs_diff step in most cases.

No user action is required to unlock this optimisation - Cloudant will automatically use fewer _revs_diff requests when the target appears to have few of the changes that the source is sending. As a result, replications to empty or sparse targets will proceed more quickly.
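To give a feel for how such a decision could be made, here is a deliberately simple heuristic. It is not Cloudant's actual algorithm, which this post does not spell out: the `RevsDiffDecider` class, the 0.9 threshold and the five-batch probe interval are all invented for illustration.

```python
class RevsDiffDecider:
    """Hypothetical heuristic: skip _revs_diff while the target keeps
    reporting that it is missing nearly everything it is asked about."""

    def __init__(self, skip_threshold=0.9, recheck_every=5):
        self.skip_threshold = skip_threshold  # assumed cut-off
        self.recheck_every = recheck_every    # assumed probe interval
        self.missing_ratio = 0.0              # from the most recent _revs_diff
        self.batches_since_check = 0

    def should_check(self):
        """Return True if the next batch should call _revs_diff."""
        if self.missing_ratio < self.skip_threshold:
            return True
        # Even when skipping, probe occasionally in case the target changed.
        return self.batches_since_check >= self.recheck_every

    def record_check(self, asked, missing):
        """Record the outcome of a _revs_diff call."""
        self.missing_ratio = missing / asked if asked else 0.0
        self.batches_since_check = 0

    def record_skip(self):
        """Record that a batch was sent without calling _revs_diff."""
        self.batches_since_check += 1


decider = RevsDiffDecider()
decisions = []
for _ in range(8):
    check = decider.should_check()
    decisions.append(check)
    if check:
        decider.record_check(asked=100, missing=100)  # simulates an empty target
    else:
        decider.record_skip()
print(decisions)
```

With a simulated empty target, the first batch performs the check, the next five skip it, and the seventh probes again in case the target has since received data from elsewhere.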

Optimisation 2: Make `_revs_diff` faster

Even though some replications may use fewer _revs_diff API calls, the API call itself has been tweaked so that, if called, it runs much faster than before.

No user action is required to unlock this optimisation - it just goes faster!

Optimisation 3: Using `_bulk_get`

Instead of using GET /<sourcedb>/<docid> to fetch each document revision for Step 3, the Cloudant Replicator can be configured to use the POST /<sourcedb>/_bulk_get endpoint, which fetches batches of documents in bulk.

Using fewer bulk requests in place of many more individual document fetches means that the source Cloudant instance is doing less work, so replications can proceed more quickly.
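To make the difference concrete, the sketch below groups a list of missing revisions into _bulk_get request bodies, using the `{"docs": [{"id": ..., "rev": ...}]}` shape that the endpoint accepts. The `bulk_get_requests` helper and its `batch_size` of 2 are assumptions chosen to keep the example small; the replicator picks its own batch sizes, and nothing here is sent over the network.

```python
import json


def bulk_get_requests(source_db_url, missing_revs, batch_size=2):
    """Group (doc_id, rev) pairs into POST /<sourcedb>/_bulk_get requests.

    Returns a list of (url, body) pairs; nothing is sent over the network.
    """
    url = source_db_url.rstrip("/") + "/_bulk_get"
    requests = []
    for start in range(0, len(missing_revs), batch_size):
        chunk = missing_revs[start:start + batch_size]
        body = {"docs": [{"id": doc_id, "rev": rev} for doc_id, rev in chunk]}
        requests.append((url, body))
    return requests


missing = [("x", "2-c"), ("y", "1-b"), ("z", "1-d")]
reqs = bulk_get_requests("https://myfirstaccount.cloudant.com/a", missing)
for url, body in reqs:
    print("POST", url, json.dumps(body))
```

Three missing revisions become two POST requests here, where the per-document approach would need three GET requests.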

This feature can be enabled by adding "use_bulk_get": true to your replication document, for example:

{
  "_id": "a_to_b",
  "source": "https://myfirstaccount.cloudant.com/a",
  "target": "https://mysecondaccount.cloudant.com/b",
  "use_bulk_get": true
}

A word of warning on this feature: although it uses the same number of Cloudant “reads” as fetching the documents individually, the bulk API consumes those reads in a single second - so read consumption may be more “peaky”. Take care that your source Cloudant service has enough capacity to avoid exhausting the read allocation required by other API clients.

Optimisation 4: Make `_bulk_get` faster

The _bulk_get API call has been made more efficient so that bulk fetches of documents put less strain on the Cloudant service.

No user action is required to unlock this optimisation - it just goes faster!

Optimisation 5: Winning revisions only

Sometimes replication is used to repair a source database that contains conflicted documents. The source database can be replicated to a new target but only the winning revisions are retained (leaving behind any conflicts).

This is achieved with the "winning_revs_only": true flag:

{
  "_id": "a_to_b",
  "source": "https://myfirstaccount.cloudant.com/a",
  "target": "https://mysecondaccount.cloudant.com/b",
  "use_bulk_get": true,
  "winning_revs_only": true
}

See Repairing A Database With Conflicts

Note that the winning_revs_only flag should only be used for one-off replications for the purposes of conflict repair. It is not suitable for general-purpose replication tasks.
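For readers wondering what “winning” means, the sketch below follows the rule described in the Apache CouchDB documentation as we understand it: non-deleted revisions beat deleted ones, then the higher generation number wins, then the revision hash that sorts highest. The `winning_rev` function and the sample revisions are made up for illustration, so treat this as an approximation rather than the database's exact implementation.

```python
def winning_rev(leaf_revs):
    """Pick a winner from a list of {"rev": "N-hash", "deleted": bool} leaves."""
    def sort_key(leaf):
        generation, _, rev_hash = leaf["rev"].partition("-")
        # Tuples compare left to right: live before deleted, then
        # higher generation, then the hash that sorts highest.
        return (not leaf.get("deleted", False), int(generation), rev_hash)

    return max(leaf_revs, key=sort_key)["rev"]


# Made-up conflicting leaf revisions of a single document.
leaves = [
    {"rev": "2-abc", "deleted": False},
    {"rev": "2-def", "deleted": False},
    {"rev": "3-zzz", "deleted": True},
]
winner = winning_rev(leaves)
print(winner)
```

In this example the deleted revision loses despite its higher generation, and the two live revisions are separated by their hashes. A winning_revs_only replication would carry only that one revision of the document across to the target.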

Note that Cloudant is built on Apache CouchDB and the features described in this blog post were published in Apache CouchDB 3.3.
