Throttling Replication

Feb 1, 2024 | Glynn Bird

Replication

Throttling replication🔗

Cloudant’s replication protocol allows a database’s data to be copied to another database in the same Cloudant instance, or to a database in a different Cloudant instance, perhaps in another IBM Cloud region. It is commonly used to keep two copies of the same data in sync across regions, for applications that need higher availability than a single cloud region can provide.

A Cloudant replication job is a one-way operation copying documents from the source to the target and consumes:

1 read unit from the source Cloudant instance
1 write unit from the target Cloudant instance

for every document that needs to be copied from source to target. Other API calls (such as _changes and _revs_diff) are used to work out whether a document revision needs to be copied, but these are not charged.

As such, a replication’s speed can be limited by the plan capacity of the source Cloudant instance (for reads) and the plan capacity of the target Cloudant instance (for writes).
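The arithmetic above can be sketched as a rough lower bound on how long a replication takes when capacity is the only constraint. This is a back-of-the-envelope illustration, not a Cloudant API: the function name and the capacity figures in the example are hypothetical, and it assumes the whole of each instance's capacity is available to the replication.

```python
import math


def min_replication_seconds(docs_to_copy, source_reads_per_sec, target_writes_per_sec):
    """Rough lower bound, in seconds, on the time to copy docs_to_copy documents."""
    if source_reads_per_sec <= 0 or target_writes_per_sec <= 0:
        raise ValueError("capacities must be positive")
    # Each copied document costs 1 read on the source and 1 write on the
    # target, so the slower of the two sides is the bottleneck.
    bottleneck = min(source_reads_per_sec, target_writes_per_sec)
    return math.ceil(docs_to_copy / bottleneck)


# Hypothetical figures: 1,000,000 documents, a source allowing 100 reads/sec
# and a target allowing 50 writes/sec.
print(min_replication_seconds(1_000_000, 100, 50))  # 20000
```

In this example the target's write capacity is the limit, so the copy cannot finish in under about five and a half hours, however the replication job is configured.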

Image: a valve. Photo by Igal Ness on Unsplash.

In most applications it is not desirable for replication traffic to exhaust the read or write capacity at either end of a replication job, so this blog post shows how replication jobs can be tuned to proceed at a slower rate, leaving some Cloudant capacity free for operational API calls.

How to set up replication🔗

A replication job is created by adding a document into a Cloudant instance’s _replicator database.

{
  "_id": "uswest_to_useast_products",
  "source": {
    "url": "https://mysourceinstance.cloudant.com/mysourcedatabase",
    "auth": {
      "iam": {
        "api_key": "MY_IAM_API_KEY"
      }
    }
  },
  "target": {
    "url": "https://mytargetinstance.cloudant.com/mytargetedatabase",
    "auth": {
      "iam": {
        "api_key": "MY_IAM_API_KEY"
      }
    }
  },
  "continuous": true
}

Note:

The replication job is given a meaningful _id attribute which will help us identify this replication job in the future.
The source object contains a url attribute which contains the hostname of the source instance and the source database’s name.
The target object contains a url attribute which contains the hostname of the target instance and the target database’s name.
Both source and target objects contain an IAM API key that will be used to authenticate with each Cloudant instance.
The continuous flag keeps the replication running, even after the last source change is processed. We’ll be adding other parameters at this level to tweak the replication’s speed.
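As a minimal sketch, the same replication document can be assembled in Python and wrapped in an HTTP request that writes it to the _replicator database. The helper names, the instance URL and the bearer-token header are assumptions made for illustration (how you authenticate the request itself depends on your setup), and the request is only constructed here, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_replication_doc(doc_id, source_url, target_url,
                          source_key, target_key, continuous=True):
    """Return a replication document shaped like the JSON example above."""
    def endpoint(url, api_key):
        return {"url": url, "auth": {"iam": {"api_key": api_key}}}

    return {
        "_id": doc_id,
        "source": endpoint(source_url, source_key),
        "target": endpoint(target_url, target_key),
        "continuous": continuous,
    }


def build_put_request(instance_url, doc, bearer_token):
    """Build (but do not send) a PUT of the document into _replicator.

    Assumption: the caller already holds a token accepted by the instance.
    """
    url = f"{instance_url}/_replicator/{quote(doc['_id'], safe='')}"
    return Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {bearer_token}"},
    )


doc = build_replication_doc(
    "uswest_to_useast_products",
    "https://mysourceinstance.cloudant.com/mysourcedatabase",
    "https://mytargetinstance.cloudant.com/mytargetedatabase",
    "MY_IAM_API_KEY", "MY_IAM_API_KEY",
)
# Placeholder instance URL and token, for illustration only.
req = build_put_request("https://mysourceinstance.cloudant.com", doc, "MY_TOKEN")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; keeping the build step separate makes the document easy to inspect before anything is written.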

Monitoring a replication job🔗

To see how our replication job is proceeding, we can use the “retrieve a replication scheduler document” API:

GET /_scheduler/docs/_replicator/uswest_to_useast_products
{
  "database": "_replicator",
  "doc_id": "uswest_to_useast_products",
  "id": "63fa8857a6bc494f9474525d2293faf1+continuous",
  "source": "https://mysourceinstance.cloudant.com/mysourcedatabase/",
  "target": "https://mytargetinstance.cloudant.com/mytargetedatabase/",
  "state": "running",
  "info": {
    "revisions_checked": 1854,
    "missing_revisions_found": 1854,
    "docs_read": 453,
    "docs_written": 33,
    "changes_pending": 9649,
    "doc_write_failures": 0,
    "bulk_get_docs": 0,
    "bulk_get_attempts": 0,
    "checkpointed_source_seq": "4-g1AAAATk",
    "source_seq": "354-g1AAAAfjeJy",
    "through_seq": "4-g1AAAATkeJzLYW"
  },
  "error_count": 0,
  "last_updated": "2024-01-29T11:32:21Z",
  "start_time": "2024-01-29T11:32:21Z",
  "source_proxy": null,
  "target_proxy": null
}

There’s a lot to unpack here. But here are the main points:

We fetch the replication scheduler document using the _id of the document we created earlier.
info.docs_read & info.docs_written show how many documents have been read from the source and written to the target respectively.
info.error would contain an error message if something went wrong.
state shows the status of the replication. A continuous replication would be expected to be “running” forever, unless a fatal error occurred.
Full API reference is here, including all the possible values of state.
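The fields called out above can be pulled into a compact summary. This is a small sketch that works on an already-parsed response body; the function name is hypothetical, and treating a missing or null info object as empty is an assumption rather than documented behaviour.

```python
def summarise_scheduler_doc(doc):
    """Pick out the fields of a scheduler document discussed above."""
    # Assumption: info may be absent or null, so fall back to an empty dict.
    info = doc.get("info") or {}
    return {
        "state": doc.get("state"),
        "docs_read": info.get("docs_read", 0),
        "docs_written": info.get("docs_written", 0),
        "changes_pending": info.get("changes_pending") or 0,
        "doc_write_failures": info.get("doc_write_failures", 0),
        "error": info.get("error"),  # None when nothing went wrong
    }


# Abbreviated copy of the example response shown above.
sample = {
    "doc_id": "uswest_to_useast_products",
    "state": "running",
    "info": {
        "docs_read": 453,
        "docs_written": 33,
        "changes_pending": 9649,
        "doc_write_failures": 0,
    },
}

summary = summarise_scheduler_doc(sample)
print(summary["state"], summary["docs_written"], summary["changes_pending"])
# running 33 9649
```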

Controlling the speed of a replication job🔗

The default configuration for a replication job is to use 20 parallel HTTP connections and 4 worker processes. For small document sizes, this can result in around 3,000 documents per second being fetched from the source and written to the target.

To make a replication job proceed more slowly than that, we can set the http_connections and worker_processes parameters in the _replicator document:

{
  "_id": "uswest_to_useast_products",
  "source": {
    "url": "https://mysourceinstance.cloudant.com/mysourcedatabase",
    "auth": {
      "iam": {
        "api_key": "MY_IAM_API_KEY"
      }
    }
  },
  "target": {
    "url": "https://mytargetinstance.cloudant.com/mytargetedatabase",
    "auth": {
      "iam": {
        "api_key": "MY_IAM_API_KEY"
      }
    }
  },
  "continuous": true,
  "http_connections": 1,
  "worker_processes": 1
}

This table shows how combinations of http_connections and worker_processes translate into the number of documents transferred per second between a source database of small documents and a new, empty target database.

http_connections	worker_processes	Approximate reads per second on source database
2	1	200
6	2	1000
12	3	2000
20	4	3000 (This value is the default.)
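One way to use the table is as a lookup: given the read rate you are prepared to give to replication, pick the fastest row that stays within it. The sketch below hard-codes the approximate figures from the table, so it is only as accurate as they are for your data. The function name is hypothetical, and falling back to 1 connection and 1 worker for budgets below the slowest row is an assumption borrowed from the example document above, for which the table gives no rate.

```python
# (http_connections, worker_processes, approximate reads per second),
# copied from the table above.
THROTTLE_TABLE = [
    (2, 1, 200),
    (6, 2, 1000),
    (12, 3, 2000),
    (20, 4, 3000),
]


def settings_for_budget(max_docs_per_sec):
    """Return the fastest tabulated settings that fit within the budget."""
    candidates = [row for row in THROTTLE_TABLE if row[2] <= max_docs_per_sec]
    if not candidates:
        # Below the slowest tabulated row: use the 1/1 setting from the
        # example document. Its actual rate is not given in the table.
        return {"http_connections": 1, "worker_processes": 1}
    http_connections, worker_processes, _ = max(candidates, key=lambda row: row[2])
    return {"http_connections": http_connections,
            "worker_processes": worker_processes}


print(settings_for_budget(1500))
# {'http_connections': 6, 'worker_processes': 2}
```

The returned dictionary can be merged into the replication document alongside the source, target and continuous attributes.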

Note: there are other flags and attributes that can be added to the _replicator document. See this blog post.

In practice, it’s best to try a replication configuration and measure how fast it is progressing and how much of your Cloudant capacity is being consumed using the provisioned throughput capacity consumption API. The actual consumption rate will depend on factors including:

How different the source and target databases are and therefore how many document revisions need to be copied over.
How big the documents are.
How many document conflicts are present.
How many document attachments need to be copied.
If a replication filter is provided, how many source documents are discarded by the filter.
Whether the replication hits the provisioned read capacity of the source Cloudant instance or the provisioned write capacity of the target Cloudant instance.
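A simple way to measure progress is to fetch the scheduler document twice and compare the docs_read and docs_written counters. The sketch below works on two already-fetched info objects; the function names, the ten-second gap and the later sample's values are hypothetical, chosen only to illustrate the calculation.

```python
def replication_rate(earlier_info, later_info, elapsed_seconds):
    """Documents read and written per second between two scheduler samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    reads = later_info["docs_read"] - earlier_info["docs_read"]
    writes = later_info["docs_written"] - earlier_info["docs_written"]
    return {
        "reads_per_sec": reads / elapsed_seconds,
        "writes_per_sec": writes / elapsed_seconds,
    }


def capacity_share(rate_per_sec, provisioned_per_sec):
    """Fraction of a provisioned capacity that a measured rate represents."""
    return rate_per_sec / provisioned_per_sec


# First sample uses the counters from the example response earlier in the
# post; the second sample, taken 10 seconds later, is made up.
first = {"docs_read": 453, "docs_written": 33}
second = {"docs_read": 2453, "docs_written": 2033}

rate = replication_rate(first, second, 10)
print(rate)  # {'reads_per_sec': 200.0, 'writes_per_sec': 200.0}
print(capacity_share(rate["reads_per_sec"], 1000))  # 0.2
```

With a hypothetical provisioned capacity of 1,000 reads per second, this replication would be using about a fifth of the source's read budget.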

Further reading🔗

API reference for replication documents: https://cloud.ibm.com/apidocs/cloudant#putreplicationdocument
API reference for reading a replication scheduler document: https://cloud.ibm.com/apidocs/cloudant#getschedulerdocument
Speeding up replication blog post