Stale, update and stable
tl;dr If you are using stale=ok in queries to Cloudant or CouchDB 2.x, you
most likely want to be using update=false instead. If you are using
stale=update_after, use update=lazy instead.
This question has come up a few times, so here’s a reference to what the situation is with these parameters to query requests in Cloudant and CouchDB 2.x.
CouchDB originally used stale=ok on the query string to specify that you were
okay with receiving out-of-date results. By default, CouchDB lazily updates
indexes upon querying them rather than when JSON data is changed or added. If up
to date results are not strictly required, using stale=ok provides a latency
improvement for queries as the request does not have to wait for indexes to be
updated before returning results. This is particularly useful for databases with
a high write rate.
As an aside, Cloudant automatically enqueues indexes for update when primary data changes, so this problem isn’t so acute. However, in the face of high update rate bursts, it’s still possible for indexing to fall behind so a delay may occur.
When using a single node, as in CouchDB 1.x, this parameter behaved as you’d
expect. However, when clustering was added to CouchDB, a second meaning was
added to stale=ok: also use the same set of shard replicas to retrieve the
results.
Recall that Cloudant and CouchDB 2.x store three copies of each shard and
by default will use the shard replica that starts returning results fastest for
a query request. This latter fact helps even out load across the cluster.
Heavily loaded nodes will likely respond more slowly and so won’t be picked to
answer a given query. When using stale=ok, the database will instead always use
the same shard replicas for every request to that index. The use of the same
replica to answer queries has two effects:
- Using stale=ok could drive load unevenly across the nodes in your database
  cluster because certain shard replicas would always be used for the queries
  to the index that specify stale=ok. This means a set of nodes could receive
  outsized numbers of requests.
- If one of the replicas was hosted on a heavily loaded node in the cluster,
  this would slow down all queries to that index using stale=ok. This is
  compounded by the tendency of stale=ok to drive imbalanced load.
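To make the load effect concrete, here is a toy simulation, not a model of the real clustering code. It assumes three hypothetical nodes (node-a, node-b, node-c), each holding one replica of a single shard, and random per-request latencies. With pinned=False the fastest replica wins each request, as in the default behaviour described above; with pinned=True the same replica is always used, as with stale=ok.

```python
import random

# Hypothetical node names; each node holds one replica of the same shard.
NODES = ["node-a", "node-b", "node-c"]


def simulate(requests, pinned, seed=0):
    """Count how many requests each node ends up serving."""
    rng = random.Random(seed)
    counts = {node: 0 for node in NODES}
    for _ in range(requests):
        if pinned:
            # Pinned behaviour: always use the same replica.
            chosen = NODES[0]
        else:
            # Default behaviour: the replica that answers fastest wins.
            latency = {node: rng.random() for node in NODES}
            chosen = min(latency, key=latency.get)
        counts[chosen] += 1
    return counts


if __name__ == "__main__":
    print("fastest replica wins:", simulate(300, pinned=False))
    print("pinned replica:      ", simulate(300, pinned=True))
```

In the pinned run every request lands on node-a, while the default run spreads requests across all three nodes.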
The end result is that using stale=ok can, counter-intuitively, cause queries
to become slower. Worse, they may become unavailable during cluster split-brain
scenarios because of the forced use of a certain set of replicas. Given that
most people use stale=ok to improve performance, this wasn’t a great state
to be in.
As stale=ok’s existing behaviour needed to be maintained for backwards
compatibility, the fix for this problem was to introduce two new query string
parameters which set each of the two stale=ok behaviours independently:
- update=true/false/lazy: controls whether the index should be up to date
  before the query is executed.
  - true: the index will be updated first.
  - false: the index will not be updated.
  - lazy: the index will not be updated before the query, but enqueued for
    update after the query is completed.
- stable=true/false: controls whether the same set of shard replicas is used
  for every request.
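As a rough sketch of how these parameters end up on a request, the helper below builds a view query URL using only the Python standard library. The function name view_url and the host, database, design document and view names in the usage line are made up for illustration. The defaults of update=true and stable=false are my reading of the default behaviour described above, so check them against your server's documentation.

```python
from urllib.parse import urlencode

VALID_UPDATE = ("true", "false", "lazy")


def view_url(base, db, ddoc, view, update="true", stable=False):
    """Build a view query URL carrying explicit update/stable parameters.

    view_url is a hypothetical helper, not part of any client library.
    """
    if update not in VALID_UPDATE:
        raise ValueError(f"update must be one of {VALID_UPDATE}")
    params = {"update": update, "stable": "true" if stable else "false"}
    return f"{base}/{db}/_design/{ddoc}/_view/{view}?{urlencode(params)}"


if __name__ == "__main__":
    # Accept possibly out-of-date results, without pinning shard replicas.
    print(view_url("https://example.invalid", "orders", "reports", "by_day",
                   update="false"))
```

Setting the two parameters separately is the point: you can skip the index update without also asking for pinned replicas.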
The main use of stable=true is that queries are more likely to appear to “go
forward in time”: each shard replica may update its indexes in a different
order, so switching replicas between requests can make results appear to move
backwards. However, this isn’t guaranteed even with stable=true, so the
availability and performance trade-offs are likely not worth it.
The end result is that virtually all applications using stale=ok should move
to instead use update=false.
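If you have existing query strings to convert, the mapping can be sketched as below. migrate_query is a hypothetical helper written for this post, not part of any client library. The keep_stable option adds stable=true for the rare case where you want to keep the replica-pinning behaviour; it assumes stale=update_after pins replicas in the same way stale=ok does.

```python
from urllib.parse import parse_qsl, urlencode

# Mapping taken from the tl;dr at the top of this post.
STALE_TO_UPDATE = {"ok": "false", "update_after": "lazy"}


def migrate_query(query, keep_stable=False):
    """Rewrite stale=... in a query string to the equivalent update=..."""
    migrated = []
    for key, value in parse_qsl(query, keep_blank_values=True):
        if key == "stale" and value in STALE_TO_UPDATE:
            migrated.append(("update", STALE_TO_UPDATE[value]))
            if keep_stable:
                # Opt back in to replica pinning (usually not worth it).
                migrated.append(("stable", "true"))
        else:
            # Leave every other parameter as it was.
            migrated.append((key, value))
    return urlencode(migrated)


if __name__ == "__main__":
    print(migrate_query("limit=10&stale=ok"))
    print(migrate_query("stale=update_after&descending=true"))
```

Note that the query string is re-encoded on the way out, so values containing quotes or other special characters come back percent-encoded.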