Repairing a database with conflicts

November 26, 2020 | Glynn Bird | Conflicts Replication

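Cloudant conflicts occur when disconnected replicas of a database are updated in different ways at the same time. The replicas could be, for example, separate databases kept in sync by replication, or the copies of a document held on different nodes of a single Cloudant cluster during an in-region write.

Note that Cloudant on Transaction Engine does not generate conflicts for in-region writes, which eliminates the last of these cases.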

Cloudant conflicts can be resolved by deleting the unwanted conflicted revisions and optionally writing a new revision. The choice of conflict resolution algorithm varies between applications; merging the bodies of the conflicting documents is one example. However, there are consequences to having a database with conflicted documents: Cloudant retains information for each branch of the revision tree, so the wider the tree, the more work is required to navigate it during update, retrieval and indexing operations.

In short, highly conflicted documents are a performance drag even if your application has diligently resolved conflicts as they arise.
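As an illustration of the simplest resolution policy (keep the current winner and delete every losing revision), the bash sketch below fetches a document with conflicts=true and deletes the revisions listed in its _conflicts array in a single _bulk_docs request. The function names are hypothetical; COUCH_URL is assumed to hold the account URL with credentials, jq is assumed to be installed, and document IDs are assumed not to contain double quotes. An application that merges bodies would also write a new, merged revision.

```bash
# Sketch: resolve conflicts by keeping the current winning revision and
# deleting every losing revision. Function names are hypothetical.
# Assumptions: COUCH_URL holds the account URL with credentials, jq is
# installed, and document IDs contain no double quotes.

# Print a _bulk_docs body that marks each given revision of one document
# as deleted. Usage: deletion_body DOC_ID REV [REV...]
deletion_body() {
  local id="$1" sep="" rev
  shift
  printf '{"docs":['
  for rev in "$@"; do
    printf '%s{"_id":"%s","_rev":"%s","_deleted":true}' "$sep" "$id" "$rev"
    sep=","
  done
  printf ']}\n'
}

# Delete the losing revisions of one document in database DB.
# Usage: resolve_keep_winner DB DOC_ID
resolve_keep_winner() {
  local db="$1" id="$2" enc revs
  # URL-encode the ID so characters such as "/" survive in the path
  enc=$(jq -rn --arg id "$id" '$id | @uri')
  # conflicts=true adds a _conflicts array listing the losing revisions
  mapfile -t revs < <(curl -sf "$COUCH_URL/$db/$enc?conflicts=true" \
    | jq -r '(._conflicts // [])[]')
  if (( ${#revs[@]} == 0 )); then
    return 0  # nothing to resolve
  fi
  deletion_body "$id" "${revs[@]}" \
    | curl -sf -X POST "$COUCH_URL/$db/_bulk_docs" \
        -H 'Content-Type: application/json' --data-binary @-
}
```

For example, `resolve_keep_winner mydb doc1` would tidy up one document in a hypothetical database named mydb. The deleted branches still remain in the revision tree, which is the performance problem described above.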


Photo by jeshoots.com on Unsplash

This post describes two ways in which a database containing problematic documents can be “repaired” so that they no longer hurt performance. Both involve creating a new copy of the data in a new, freshly created database.

Repairing via replication

We can replicate data from the source database to a new empty database, with a replication filter that omits conflicted documents. A selector object is added to the replication document to set up the filter:

"selector": {
  "_conflicts": {
    "$exists": false
  }
}

The above selector reads as “allow documents that don’t have a _conflicts attribute to be replicated”.

The replication document in the _replicator database would look something like this:

{
  "source": "https://a.cloudant.com/source",
  "target": "https://a.cloudant.com/target",
  "selector": {
    "_conflicts": {
      "$exists": false
    }
  }
}
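One way to set this up from the command line is sketched below in bash, in the same style as the curl commands used later in this post. It assumes COUCH_URL holds the account URL including credentials and that the source and target URLs handed to the replicator can authenticate on their own (for example, with embedded credentials); the function names are hypothetical.

```bash
# Sketch: create an empty target database and start a replication that
# skips conflicted documents. Function names are hypothetical.
# Assumption: COUCH_URL holds the account URL with credentials.

# Print a _replicator document for the given source and target URLs.
replication_doc() {
  local source_url="$1" target_url="$2"
  printf '{"source":"%s","target":"%s","selector":{"_conflicts":{"$exists":false}}}\n' \
    "$source_url" "$target_url"
}

start_repair_replication() {
  # create the new, empty target database first
  curl -sf -X PUT "$COUCH_URL/target" || return 1
  # saving the document in _replicator starts the replication
  replication_doc "$COUCH_URL/source" "$COUCH_URL/target" \
    | curl -sf -X POST "$COUCH_URL/_replicator" \
        -H 'Content-Type: application/json' --data-binary @-
}
```

Building the document with a small helper keeps the selector identical to the JSON shown above while letting the database URLs vary.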

Replication makes light work of copying documents without conflicts, but if you also need the winning revisions of the source database's conflicted documents, you will have to write a separate script. It would fetch the winning revision of each conflicted document and write it to the target database using the _bulk_docs API with new_edits=false, which retains the original document's revision token.
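Such a script might look like the following bash sketch. It reads conflicted document IDs on standard input, one per line, fetches the winning revision of each with revs=true so that its revision history comes along, and writes it with new_edits set to false in the _bulk_docs body. The function names are hypothetical; COUCH_URL is assumed to hold the account URL with credentials, jq is assumed to be installed (it is used only to URL-encode IDs), and documents are assumed to have no attachments, since a plain GET returns only attachment stubs.

```bash
# Sketch: copy the winning revision of each conflicted document from the
# "source" database to the "target" database, keeping its revision token.
# Function names are hypothetical. Assumptions: COUCH_URL holds the account
# URL with credentials, jq is installed, documents have no attachments, and
# document IDs arrive on stdin, one per line.

# Wrap one JSON document (read from stdin) in a _bulk_docs body with
# new_edits=false, so the target stores the existing _rev as-is.
bulk_docs_body() {
  printf '{"new_edits":false,"docs":[%s]}\n' "$(cat)"
}

copy_winning_revisions() {
  local id enc doc
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    # URL-encode the ID so characters such as "/" survive in the path
    enc=$(jq -rn --arg id "$id" '$id | @uri')
    # A plain GET returns the winning revision; revs=true adds _revisions
    # so the history leading to that revision is written too
    if ! doc=$(curl -sf "$COUCH_URL/source/$enc?revs=true"); then
      echo "failed to fetch $id" >&2
      continue
    fi
    printf '%s' "$doc" | bulk_docs_body \
      | curl -sf -X POST "$COUCH_URL/target/_bulk_docs" \
          -H 'Content-Type: application/json' --data-binary @- >/dev/null \
      || echo "failed to write $id" >&2
  done
}
```

For example, `printf '%s\n' doc1 doc2 | copy_winning_revisions` would copy two documents. Fetching one document per request is simple but slow; for many thousands of conflicted documents, batching would be worth considering.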

To help identify the conflicted documents, you can create a view in the source database.
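A minimal sketch of such a view follows, again in bash. Its map function emits one row for every document that carries a _conflicts array, which Cloudant and CouchDB make available to map functions. The design document name _design/conflicts, the view name conflicted and the function names are hypothetical; COUCH_URL is assumed to hold the account URL with credentials, and jq is assumed to be installed.

```bash
# Sketch: index conflicted documents in the source database and list their
# IDs. Names are hypothetical. Assumptions: COUCH_URL holds the account URL
# with credentials and jq is installed.

# Print the design document. The map function emits one row per document
# that currently has a _conflicts array, with the losing revisions as value.
conflicts_design_doc() {
  cat <<'EOF'
{
  "views": {
    "conflicted": {
      "map": "function (doc) { if (doc._conflicts) { emit(doc._id, doc._conflicts); } }"
    }
  }
}
EOF
}

create_conflicts_view() {
  conflicts_design_doc \
    | curl -sf -X PUT "$COUCH_URL/source/_design/conflicts" \
        -H 'Content-Type: application/json' --data-binary @-
}

# Print the ID of each conflicted document, one per line.
list_conflicted_ids() {
  curl -sf "$COUCH_URL/source/_design/conflicts/_view/conflicted" \
    | jq -r '.rows[].id'
}
```

Running `create_conflicts_view` once and then `list_conflicted_ids` prints the IDs. For very large numbers of conflicted documents, the view query would need to be paged.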

Repairing by backup

Another method is to back up the affected database to a file and restore it to a new, freshly created database. The couchbackup tool has a “shallow” mode, which copies only the winning revision of each document it finds. Restoring only the winning revisions to the target eliminates the conflict history.

The procedure is as follows:

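# couchbackup and couchrestore read the account URL, with credentials,
# from the COUCH_URL environment variable
set -e

# back up the source database using shallow mode (winning revisions only)
couchbackup --db source --mode shallow > backup.txt

# create a new, empty target database (-f makes curl fail on HTTP errors,
# e.g. if the database already exists) and restore into it
curl -f -X PUT "$COUCH_URL/target"
couchrestore --db target < backup.txt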

Note: couchbackup does not back up attachments, so this method may not be suitable for databases that use them.