Cloudant Fundamentals 10/10
It’s been an emotional journey through Cloudant’s fundamentals but we’re nearly at the end. In this final post, we’ll discuss data aggregation: counting, summing and statistics.
Cloudant Query which we used for part 8 and part 9 of this series does not have the ability to perform aggregations, only selection. i.e. you can do SELECT * FROM mydb WHERE actor='Al Pacino'
but not SELECT COUNT(*) FROM mydb WHERE actor='Al Pacino'
.
Photo by Sven Scheuermeier on Unsplash
MapReduceπ
One option is to write your own MapReduce view. A view is defined as a JavaScript function. Cloudant takes each document in the database, passing it to this function in turn and then storing the emitted key/value pairs in an index on disk.
To calculate counts of documents by actor we would create a map function like this:
function(doc) {
emit(doc.actor, null)
}
This function paired with the _count
reducer produces counts of each value of actor. The other built-in reducers _stats
and sum
can be used calculate statistics on the second field emitted by your map function:
function(doc) {
emit([doc.year, doc.month], doc.orderValue)
}
The above example allows a hierarchical key of year
and month
to be used to group the aggregation of the orderValue
field - ideal for reports and dashboards.
Facet countsπ
Another way of calculating simple count aggregations is using the faceting feature of Cloudant Search. Like MapReduce, Cloudant Search indexes are configured in JavaScript, this time with an index
function instead of emit
:
function(doc) {
index("category", doc.category, {"facet": true});
index("price", doc.price, {"facet": true});
}
The above example indexes two fields (category
and price
) from our document and instructs both to be “faceted”. This means that values can be counted at query time by supplying a counts
parameter:
...?q=*:*&counts=["category"]
{
"total_rows":100000,
"bookmark":"g...",
"rows":[...],
"counts":{
"category":{
"action": 55,
"comedy": 98,
"documentary": 3
}
}
}
See the documentation for further examples of facet counts and range faceting.
Aggregation with cloudant-quickstartπ
The simplest way to do aggregation is using the cloudant-quickstart library.
Counts are performed without defining the index yourself:
// count documents by year and month
db.count(['year','month]),then(console.log)
Simply specify the field or array of fields you want to group by and the library will create the appropriate MapReduce view for you.
The same applies to stats
and sum
aggregations
// get sum of sales grouped by year/month/day
db.sum('sales', ['year', 'month', 'day'])
// get stats on weather by state/city
db.stats(['temperature', 'airPressure', 'windSpeed'], 'state')
Conclusionπ
In this Cloudant Fundamentals series we’ve touched on schema design, unique ids, revision tokens, CRUD operations, querying and aggregation. But there’s a lot more.
Thereβs a project called PouchDB that might strike your fancy. You also can search our Medium publication for more articles on Cloudant and CouchDB. And of course, you can reach the wider open source community on the Apache CouchDB chat channels or get involved via the CouchDB mailing lists. Iβll see you on there!