New Reducers
As of June 2025, Cloudant now supports some new MapReduce reducers. In this blog post we’ll discuss what each of these reducers does and useful applications for each.
Photo by Javier Quiroga on Unsplash
Before we get to the new reducers, let’s just refresh what we know about MapReduce views and the existing reducers.
MapReduce - roll-your-own indexes🔗
Cloudant MapReduce allows secondary indexes to be generated from the primary document data by using user-supplied JavaScript “map” functions which are executed on every document in the database - any data “emitted”, by the map function forms the key/value pairs stored in the secondary index structure.
The map function’s emit
call has two parameters
- The key that forms the sort order of the index. This is usually a string, number or array of strings or numbers.
- The value that is stored against the key in the primary index and can be any JavaScript data. This value is returned when using the MapReduce index for selection.
Some map function examples:
// create an index on email
function (doc) {
// create an index keyed on the document's email address
// with nothing (null) as the index's "value"
emit(doc.email.toLowerCase(), null)
}
// create a view keyed on the year/month/day with the order's total
// as the index's value, but only for 'delivered' orders
function (doc) {
// only delivered orders are included...
if (doc.status === 'delivered') {
// parse the document's ISO-8601 date into a JavaScript Date object
const d = new Date(doc.date)
// create an array key ordered by Year, Month, Day e.g. [2025, 4, 25]
// the key contains the order basket total and tax cost
emit([ d.getFullYear(), d.getMonth() + 1, d.getDate() ], [doc.total, doc.tax])
}
}
// create a view on an IoT application's deviceid and timestamp with the
// full document emitted as the index's key.
function(doc) {
emit([doc.deviceid, doc.timestamp], doc)
}
The above examples demonstrate some useful features of MapReduce:
- Creating simple indexes for selection, based on data that is within the document such as an index on a document’s email address.
- Tidying up data using JavaScript prior to indexing, such as lowercasing the email address.
- Creating partial indexes based on logic, such as the second map function’s “if” statement to only included “delivered” orders.
- Using arrays as keys to create a keyspace which can be used for grouping by purchase year, year/month or year/month/day.
- Emitting numerical values which can summed or used as the basis of statistical calculations. Arrays of numbers or objects with numeric keys can also be aggregated into totals and statistics as well as single numbers.
- Emitting the entire document as a value, to avoid having to use
?include_docs
to fetch the document body at query-time. This makes for a bigger index, but faster queries.
At their simplest, MapReduce views are used for selection only - finding view rows by an exact matching key or keys(e.g key="bob@aol.com"
or keys=["fred@gmail.com","rita@apple.com"]
) or by performing a range query on the keyspace (e.g. startkey=["2024","06"]&endkey=["2025","01"]
). The view returns the matching keys and the accompanying “value” from the index, or with include_docs=true
the addition of the whole document that was used to generate the view row.
Note that adding
?include_docs=true
makes for a more computationally expensive query, as each matching row needs to fetch the document body too.
If a reducer was supplied with the index definition, it can be used to aggregate the resultant view rows, instead of returning all of the matching rows.
Note: a view with a reducer can also be used for selection only - simply switch off query-time reduction with
?reduce=false
.
The original reducers are:
_count
- counts view rows._sum
- sums numeric view values, including arrays of numbers or objects with numeric attribute values._stats
- produces sums, counts and “sum of squares” totals, suitable for creating mean, variance and standard deviation calculations on numeric view values._approx_count_distinct
- estimates a count of the distinct keys in the view. Note that unlike count/sum/stats, this reducer acts on the keys, not the values of the view.
Adding grouping options (group=true
or group_level=<number>
) unlocks grouping of the count/sum/stats data by unique keys. This is especially useful with array-based keys such as [2025, 4, 14]
:
// ?group_level=2 with a _count reducer returns counts of view rows
// grouped by year/month and ordered by year/month.
{"rows":[
{"key":[2024,1],"value":683},
{"key":[2024,2],"value":635},
{"key":[2024,3],"value":592},
{"key":[2024,4],"value":620},
{"key":[2024,5],"value":642},
{"key":[2024,6],"value":652},
{"key":[2024,7],"value":679},
{"key":[2024,8],"value":620},
{"key":[2024,9],"value":666},
{"key":[2024,10],"value":721},
{"key":[2024,11],"value":608},
{"key":[2024,12],"value":649},
{"key":[2025,1],"value":623},
{"key":[2025,2],"value":611},
{"key":[2025,3],"value":721},
{"key":[2025,4],"value":278}
]}
The new reducers🔗
The new reducers add additional query-time aggregation functionality.
_top_n/_bottom_n🔗
The _top_n
and _bottom_n
reducers return the highest “n” values from a view or grouping within the view, where n is a number between 1 and 100.
Imagine an online game where each game run’s score is recorded in a Cloudant database, with documents of this form:
{
"type": "score",
"user": "bsmith",
"game": "Bellicose_Badgers",
"date": "2025-04-14T13:11:01.000Z",
"score": 8752
}
We want the top ten scores a user has scored for a given game, or for all the games that they’ve played. First, we create a view keyed, on game
& user
we would use a map function like this:
function (doc) {
// if this is a score document
if (doc.type === 'score') {
// create an index with:
// key - an array of [game, user]
// value - the points scored in that game
emit([doc.game, doc.user], doc.score)
}
}
and we use the _top_10
reducer, which returns the 10 highest values from the index for the given grouping.
Note: any reducer name from
_top_1
to_top_100
is valid e.g._top_25
.
When we query the view without any parameters, we get the top 10 scores for any game and any user:
{"rows":[
{"key":null,"value":[9968,9966,9952,9928,9902,9894,9870,9864,9863,9849]}
]}
By employing MapReduce’s grouping feature, we can group by game
to get the top 10 scores for each game:
// ?group_level=1
{"rows":[
{"key":["Bellicose_Badgers"],"value":[9968,9952,9863,9849,9800,9778,9771,9763,9746,9700]},
{"key":["Cunning_Capybaras"],"value":[9928,9902,9870,9704,9601,9544,9479,9470,9445,9434]},
{"key":["Friendly_Fruitbats"],"value":[9864,9847,9818,9810,9787,9740,9734,9721,9667,9643]},
{"key":["Hungry_Hedgehogs"],"value":[9966,9894,9747,9720,9651,9615,9605,9596,9585,9567]}
]}
or group by game
and user
to get the top 10 scores for each game/user combination:
{"rows":[
{"key":["Bellicose_Badgers","ajones"],"value":[9952,9863,9678,9644,9526,9516,9034,8795,8711,8676]},
{"key":["Bellicose_Badgers","bsmith"],"value":[9778,9771,9682,9630,9460,9196,8973,8861,8857,8093]},
{"key":["Bellicose_Badgers","cjackson"],"value":[9968,9495,9109,8553,8534,8514,8341,8244,8038,7822]},
{"key":["Bellicose_Badgers","dparkinson"],"value":[9800,9746,9700,9694,9625,9375,9297,9285,9204,9100]},
{"key":["Bellicose_Badgers","estanwick"],"value":[9849,9763,9543,8833,8435,8387,8287,8154,8028,8009]},
{"key":["Bellicose_Badgers","fwilkes"],"value":[9690,9612,9339,9192,9027,8478,8428,8337,7452,7414]},
{"key":["Cunning_Capybaras","ajones"],"value":[9704,9311,9307,9175,9073,8848,8751,8465,8464,8128]},
{"key":["Cunning_Capybaras","bsmith"],"value":[9870,9445,9434,9226,8519,8269,8263,7339,7266,7126]},
{"key":["Cunning_Capybaras","cjackson"],"value":[9928,9902,9544,9349,9003,8370,8328,8057,7933,7415]},
{"key":["Cunning_Capybaras","dparkinson"],"value":[9601,9479,9470,9253,8720,8701,8579,8572,8490,8488]},
{"key":["Cunning_Capybaras","estanwick"],"value":[8963,8916,8643,8489,8264,8173,7790,7722,7628,7381]},
{"key":["Cunning_Capybaras","fwilkes"],"value":[9328,9258,9129,9121,9099,9031,8899,8647,8421,8334]},
{"key":["Friendly_Fruitbats","ajones"],"value":[9864,9847,9818,9734,9643,9527,9488,9302,8771,8592]},
{"key":["Friendly_Fruitbats","bsmith"],"value":[9721,9570,9240,9221,9126,8781,8259,8193,7658,6894]},
{"key":["Friendly_Fruitbats","cjackson"],"value":[9740,9217,9154,8984,8584,7975,7761,7666,6799,6724]},
{"key":["Friendly_Fruitbats","dparkinson"],"value":[9787,9239,8844,8681,8676,8286,8231,8093,7550,7460]},
{"key":["Friendly_Fruitbats","estanwick"],"value":[9810,9586,9579,9429,8956,8762,8426,8408,8349,8246]},
{"key":["Friendly_Fruitbats","fwilkes"],"value":[9667,9454,9256,8582,8330,8277,8089,8064,7950,7936]},
{"key":["Hungry_Hedgehogs","ajones"],"value":[9615,9605,9567,9129,9121,9038,8984,8613,8013,7963]},
{"key":["Hungry_Hedgehogs","bsmith"],"value":[9894,9544,9293,9285,9251,9082,9016,9012,8882,8524]},
{"key":["Hungry_Hedgehogs","cjackson"],"value":[9720,9484,9172,8703,8275,8072,7605,7540,7455,7272]},
{"key":["Hungry_Hedgehogs","dparkinson"],"value":[9585,9378,9230,8798,8721,8689,8599,7885,7772,7524]},
{"key":["Hungry_Hedgehogs","estanwick"],"value":[9966,9747,9651,9399,9086,8765,8479,8292,8265,7530]},
{"key":["Hungry_Hedgehogs","fwilkes"],"value":[9596,9450,8825,8798,6772,6644,6640,6381,6332,5767]}
]}
If we know the game/user combination we want, we can supply a key
parameter:
// ?key=["Cunning_Capybaras","cjackson"]
{"rows":[
{"key":null,"value":[9928,9902,9544,9349,9003,8370,8328,8057,7933,7415]}
]}
Of course, the reducer can be switched off at query-time to use the view for selection:
// ?key=["Cunning_Capybaras","cjackson"]&reduce=false
{"total_rows":1000,"offset":357,"rows":[
{"id":"4913b1dccc7b8fa4720d99f706001099","key":["Cunning_Capybaras","cjackson"],"value":3980},
{"id":"4913b1dccc7b8fa4720d99f70601653d","key":["Cunning_Capybaras","cjackson"],"value":6920},
{"id":"4913b1dccc7b8fa4720d99f706016698","key":["Cunning_Capybaras","cjackson"],"value":5839}
...
]}
The _bottom_n
reducer works just the same but returns the lowest values for the given grouping. Like _top_n
, the value of n
can be any number between 1 and 100.
_first/_last🔗
The _first
and _last
reducers return one value from a view with the lowest or highest keys respectively, for the given grouping. This is best demonstrated by example. Let’s say we have IoT data arriving from a number of remote devices, each with their own unique deviceid
, where each reading stores documents like this:
{
"_id":"93117567370d41d091b8dd160a3adf3f",
"deviceid":"A00", // the unique id of the IoT device
"reading":13.8986, // the reading from the IoT device
"timestamp":"2025-03-26T04:44:08.917Z" // when the reading was taken
}
We can create a MapReduce view with a map function that orders its keys by deviceid
and timestamp
and stores the whole document as the view’s “value”:
function (doc) {
emit([doc.deviceid, doc.timestamp], doc)
}
When paired with the _last
reducer, we can use the resultant view to find the latest reading from each unique deviceid (“latest” because we are using the _last
reducer and the data is ordered by timestamp
for each deviceid
, so the latest data sorts last):
// ?group_level=1
{"rows":[
{"key":["A00"],"value":{"_id":"93117567370d41d091b8dd160a3adf3f","_rev":"1-bc05e93e592d5a5a18e240240b581a55","deviceid":"A00","reading":13.8986,"timestamp":"2025-03-26T04:44:08.917Z"}},
{"key":["A01"],"value":{"_id":"c9f53ac9e4a8444487ed0eaa11dc1c78","_rev":"1-1fcf121c73db03e49bac4c1981518b19","deviceid":"A01","reading":59.8453,"timestamp":"2025-04-01T01:52:34.254Z"}},
{"key":["A02"],"value":{"_id":"577b7108a8a1458f9a17194ed1da398a","_rev":"1-71d345a773ab31f35ceb998f3c107c41","deviceid":"A02","reading":2.6208,"timestamp":"2025-03-31T00:22:17.175Z"}},
{"key":["A03"],"value":{"_id":"150839b1d363427496a4f4e2917b8b1d","_rev":"1-860a2ed5f1f48aa642495dfb21dff3ce","deviceid":"A03","reading":55.8677,"timestamp":"2025-03-22T10:15:57.890Z"}},
{"key":["A04"],"value":{"_id":"d3317bdc1f7b4466ae6bf7a30ca9e328","_rev":"1-a58762532f6980ca5b06a8d01d113814","deviceid":"A04","reading":44.1822,"timestamp":"2025-03-23T10:56:10.639Z"}},
{"key":["A05"],"value":{"_id":"946754d3762f44e297ca20d13bbceb5e","_rev":"1-41fa2bb3ef8782c90b28ee44628abeab","deviceid":"A05","reading":13.2874,"timestamp":"2025-03-27T13:59:04.723Z"}},
{"key":["A06"],"value":{"_id":"bbcd8a5c0ae948baae713a9fcb5262d5","_rev":"1-66883211ee20a0772374672aa175dc50","deviceid":"A06","reading":7.9525,"timestamp":"2025-04-01T15:13:04.305Z"}},
{"key":["A07"],"value":{"_id":"a600caccdb82400698b158ecebfaa6f2","_rev":"1-68975a8d7133a13c6cc8ae34d28ea1c6","deviceid":"A07","reading":89.4818,"timestamp":"2025-03-07T02:12:59.934Z"}},
{"key":["A08"],"value":{"_id":"be20bae911db4da685f891fd01e07d3a","_rev":"1-d69b559627cf01e80fe85ea2f54d2f4b","deviceid":"A08","reading":97.6739,"timestamp":"2025-03-29T13:46:31.689Z"}},
{"key":["A09"],"value":{"_id":"96526e80f2ff48e89e9e42aa47abae24","_rev":"1-85e56901f7113d7d6ff8ba573c535eb3","deviceid":"A09","reading":26.1597,"timestamp":"2025-03-18T03:22:19.848Z"}}
]}
Before the release of the _last
reducer, to get the the above result set would have required us to run one query per device to get the answer.
Note:
- We need a key with at least two parts: the first is the item we intend to group by and the second determines the sort order within the group.
- In this case, we emit the entire document into the view’s value so that we get all of the IoT reading’s data back at query time.
The _first
reducer works in the same way but returns the value of the lowest sorting key in the group: the earliest reading for each device, in our case.
For reference, the table below shows which readings are considered “first” and “last” for a group_level=1
query on our data set:
key | value | First reading (group_level=1) | Last reading (group_level=1) |
---|---|---|---|
[“A00”,“2025-01-01T10:00:00.000Z”] | {“reading”: 65,“timestamp”:2025-01-01T10:00:00.000Z",“deviceid”:“A00”} | x | |
[“A00”,“2025-01-01T10:01:00.000Z”] | {“reading”: 64,“timestamp”:2025-01-01T10:01:00.000Z",“deviceid”:“A00”} | ||
[“A00”,“2025-01-01T10:02:00.000Z”] | {“reading”: 59,“timestamp”:2025-01-01T10:02:00.000Z",“deviceid”:“A00”} | x | |
[“A01”,“2025-01-01T10:00:00.000Z”] | {“reading”: 12,“timestamp”:2025-01-01T10:00:00.000Z",“deviceid”:“A01”} | x | |
[“A01”,“2025-01-01T10:01:00.000Z”] | {“reading”: 15,“timestamp”:2025-01-01T10:01:00.000Z",“deviceid”:“A01”} | ||
[“A01”,“2025-01-01T10:02:00.000Z”] | {“reading”: 19,“timestamp”:2025-01-01T10:02:00.000Z",“deviceid”:“A01”} | x | |
[“A02”,“2025-01-01T10:00:00.000Z”] | {“reading”: 55,“timestamp”:2025-01-01T10:00:00.000Z",“deviceid”:“A02”} | x | |
[“A02”,“2025-01-01T10:01:00.000Z”] | {“reading”: 54,“timestamp”:2025-01-01T10:01:00.000Z",“deviceid”:“A02”} | ||
[“A02”,“2025-01-01T10:02:00.000Z”] | {“reading”: 56,“timestamp”:2025-01-01T10:02:00.000Z",“deviceid”:“A02”} | x |
SQL equivalents🔗
Here are the SQL equivalents of each reducer , for a view of this form, where the key is made up of [year, month, day]
with a single numeric value
:
year | month | day | value |
---|---|---|---|
2020 | 10 | 14 | 12345 |
2020 | 10 | 15 | 23456 |
2020 | 10 | 16 | 34567 |
2021 | 2 | 21 | 912345 |
2021 | 2 | 22 | 923456 |
2021 | 2 | 23 | 934567 |
The examples below assume a ?group_level=2
grouping.
_count🔗
SELECT year, month, COUNT() as count
FROM view
GROUP BY year, month
ORDER BY year, month, day;
year | month | count |
---|---|---|
2020 | 10 | 3 |
2021 | 2 | 3 |
_sum🔗
SELECT year, month, SUM(value) as sum
FROM view
GROUP BY year, month
ORDER BY year, month, day;
year | month | sum |
---|---|---|
2020 | 10 | 70368 |
2021 | 2 | 2770368 |
_stats🔗
SELECT year, month, COUNT() as count, SUM(value) as sum, MIN(value) as min, MAX(value) as max, SUM(value*value) as sumsqr FROM view
GROUP BY year, month
ORDER BY year, month, day;
year | month | count | sum | min | max | sumsqr |
---|---|---|---|---|---|---|
2020 | 10 | 3 | 70368 | 12345 | 34567 | 1897460450 |
2021 | 2 | 3 | 2770368 | 912345 | 934567 | 2558559860450 |
_approx_count_distinct🔗
SELECT year, month, COUNT() as count
FROM view
GROUP BY year, month
ORDER BY year, month
year | month | count |
---|---|---|
2020 | 10 | 3 |
2021 | 2 | 3 |
Note: remember
_approx_count_distinct
works on the keys, and this is an exact count distinct instead of an approximation.
This reducer would likely be used to find the approximate distinct count of all keys, so without any grouping, which would be a query similar to:
SELECT COUNT(DISTINCT(CONCAT(year,month,day))) as count
FROM view;
count |
---|
6 |
_top_2🔗
SELECT
year,
month,
(SELECT GROUP_CONCAT(v) FROM (SELECT value AS v FROM view WHERE year=months.year AND month=months.month ORDER BY value DESC LIMIT 2)) as top_2
FROM view AS months
GROUP BY year, month
ORDER BY year,month;
year | month | top_2 |
---|---|---|
2020 | 10 | 34567,23456 |
2021 | 2 | 934567,923456 |
Note: the innermost sub-query gets the top 2 values for a single year/month. These 2 values are “group concat’ed” into a list in the outer sub-query which is fed from a grouped list of years and months.
_bottom_2🔗
SELECT
year,
month,
(SELECT GROUP_CONCAT(v) FROM (SELECT value AS v FROM view WHERE year=months.year AND month=months.month ORDER BY value ASC LIMIT 2)) as bottom_2
FROM view AS months
GROUP BY year, month
ORDER BY year,month;
year | month | bottom_2 |
---|---|---|
2020 | 10 | 12345,23456 |
2021 | 2 | 912345,923456 |
Note: nearly identical to the
_top_2
query, but withASC
instead ofDESC
.
_first🔗
SELECT
year,
month,
(SELECT value AS v FROM view WHERE year=months.year AND month=months.month ORDER BY day limit 1) as first
FROM view as months
GROUP BY year, month
ORDER BY year,month;
year | month | first |
---|---|---|
2020 | 10 | 12345 |
2021 | 2 | 912345 |
Note: the sub-query gets the first value (ordered by day) for each year/month.
_last🔗
SELECT
year,
month,
(SELECT value AS v FROM view WHERE year=months.year AND month=months.month ORDER BY day DESC limit 1) as last
FROM view as months
GROUP BY year, month
ORDER BY year,month;
year | month | last |
---|---|---|
2020 | 10 | 34567 |
2021 | 2 | 934567 |
Note: Same as
_first
but with the order of the sub-query changed toDESC
.