Announcing Cloudant Search

Tim Anglade

July 14, 2011

I’ve always strongly felt that using NOSQL wasn’t so much a choice as a necessity; that most successful NOSQL deployments start with the intimate knowledge that your set of requirements (from speed & availability to operational considerations and budget) cannot be met with a relational database, coupled with a deep understanding of the tradeoffs you are making. Among those, perhaps no tradeoff has been felt more deeply by NOSQL users worldwide than the eponymous loss of a natural, instantaneous way of accessing your data through a structured query language. We all came up with our own remedies; more often than not, that substitute was based on MapReduce: Google’s novel, elegant way of explicitly parallelizing computation over distributed, unstructured data. But as the joke goes, it’s always been a non-starter for the more novice users out there, and where suits & ties are involved.

CouchDB Views (as our brand of MapReduce is called) come with additional concerns, as they are pre-computed and written to disk. While this is fine, and actually extremely useful, for the use-cases and small scales a lot of Apache CouchDB deployments reside at (single instances working off a limited dataset), this behavior is somewhere North of nagging and South of suicidal for the data sizes & use-cases most Cloudant customers have to deal with. Part of the promise of our industry is, or should be anyway, to make your life & business easier, no matter how much data you have. And so, while CouchDB Views have been, and will undoubtedly remain, an essential tool to index, filter & transform your data once you know what to do with it; and while their various weaknesses (explicitly parallelized syntax, lengthy computation, heavy disk usage) are also the source of their most meaningful strengths (distributed processing, high performance on repeated queries, persistent transformations), we at Cloudant saw a clear opportunity to offer a novel, complementary way to interact with your data.

A way that would allow you to interact with your data instantaneously; a way that wouldn’t force you to mess around with MapReduce jobs or complex languages; a way that would not require you to set up a third-party solution that is financially or operationally expensive.

We call this way Cloudant Search. And today, we’re proud to announce its immediate availability, as a public beta.

Want to easily find all the documents that contain the word “bieber”? This is the Cloudant Search query you have to write:

bieber

Want to find all the records that have “my world” in their title? Just write:

title:"my world"

How about only finding all the artists who were born in Canada in February & March 1994?

type:artist country:canada dob<date>:[1994-02-01 TO 1994-03-31]

How about people who are fans of either Justin Bieber or Justin Timberlake and who live in JBiebz’ hometown in Ontario? Wait, what’s it called, Stratwood? Stratburg? Strat-something. Oh, I’ll just search it:

type:person fan-of:(justin (bieber OR timberlake)) city:strat* state:ontario

And that’s just scratching the surface; you can see more query examples in our knowledge base. Of course, all those queries are also served instantly, and new records you put in your database will be indexed in real time.
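Queries like the ones above contain spaces, quotes, brackets and angle brackets, so they need percent-encoding before they can travel in a URL. Here is a minimal sketch of that step, assuming the query is sent as a `q` parameter to an HTTP endpoint. The endpoint URL, the `_search` path and the `q` parameter name are placeholders made up for illustration, not taken from the Cloudant documentation; check the docs for the real ones.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: the host, database name and "_search" path are
# placeholders for illustration only, not the documented Cloudant API.
EXAMPLE_ENDPOINT = "https://example.cloudant.com/mydb/_search"


def build_search_url(query: str, endpoint: str = EXAMPLE_ENDPOINT) -> str:
    """Return a URL carrying `query` in a percent-encoded `q` parameter.

    The parameter name `q` is an assumption made for this sketch.
    """
    return f"{endpoint}?{urlencode({'q': query})}"


def extract_query(url: str) -> str:
    """Decode the `q` parameter back out of a URL built by build_search_url."""
    return parse_qs(urlsplit(url).query)["q"][0]


if __name__ == "__main__":
    query = "type:artist country:canada dob<date>:[1994-02-01 TO 1994-03-31]"
    url = build_search_url(query)
    print(url)
    # Decoding the URL gives back the original query text unchanged.
    assert extract_query(url) == query
```

Encoding the whole query as one parameter value keeps the search syntax itself (colons, ranges, quoted phrases) intact on the other end.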

But Cloudant Search goes beyond ease of access for developers, analysts & end-users.

For the IT & DevOps teams out there, and for our own sake (since, as maintainers of the hosting service cloudant.com, we have the distinction of being the only NOSQL company tasked with operating its own product around the clock), we worked tirelessly to make it capable of tackling terabytes and easy to maintain under load. It is baked directly into the Cloudant codebase and leverages existing, proven technology, such as the Lucene indexers, CouchDB’s view indexing logic and, of course, Cloudant’s own Dynamo-inspired distribution algorithms. Compared to external search & analytics solutions such as Hadoop, SAS, Solr, Sphinx or ElasticSearch, this integration & simplicity effectively translates to more flexibility to address ever-changing use-cases, drastically reduced operational complexity and, at the end of the day, a much lower total cost of ownership for start-ups and established businesses alike.

Finally, for the power-users out there, we’ve made Cloudant Search extremely extensible. As our documentation details, you can write your own indexing algorithms, leverage Lucene tokenizers or even customize the Cloudant Search indexer itself with a bit of Java code.

Now, I’d be engaging in doublespeak if I didn’t make it clear that Search is still bound by some of the tradeoffs that epitomize our industry. Like some of our esteemed colleagues, we strive to make our users & customers aware of the adjustments that, yes, define our weaknesses but, more importantly, are also the source of our strengths. Among those, I should mention that Cloudant Search will require an initial, one-time indexing, and this single index will itself take up disk space. To limit resource consumption and self-induced podiatric harm, wildcard (“som*thing”) or suffix (“*thing”) searches are not allowed (only prefix searches like “some*”). And yes, some of its more advanced features, like the dynamic sorting of results, are still memory-bound and, as such, should be used carefully.
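To make the wildcard rule concrete, here is a small client-side sketch that classifies a single search term by where its `*` sits and accepts only the forms described above. It illustrates the rule as stated in this post and is not Cloudant Search’s actual query parser; the function names and the "internal" label are made up for the example.

```python
def wildcard_kind(term: str) -> str:
    """Classify a single search term by the position of its '*' wildcard."""
    if "*" not in term:
        return "none"        # plain term, e.g. bieber
    if term.startswith("*"):
        return "suffix"      # e.g. *thing
    if term.index("*") == len(term) - 1:
        return "prefix"      # exactly one '*', at the very end, e.g. some*
    return "internal"        # '*' somewhere in the middle, e.g. som*thing


def is_allowed(term: str) -> bool:
    """Return True for plain terms and prefix searches only."""
    return wildcard_kind(term) in ("none", "prefix")


if __name__ == "__main__":
    for term in ("bieber", "some*", "som*thing", "*thing"):
        print(term, wildcard_kind(term), is_allowed(term))
```

A check like this could be run before sending a query, so that a disallowed pattern is caught early instead of being rejected by the server.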

There are no silver bullets in this industry. That is something that the Cloudant team has kept in mind through the months we spent developing Cloudant Search. But more than an admission of eventual defeat, we’ve always taken it as a challenge to innovate at the pain-points. To explore the areas beyond the boundaries of our existing trade-offs. To find tasteful solutions to gritty problems. Today, we’re happy to bring you the fruits of that labor, share our response to this problem, and in a way, pass the challenge on to you: we can’t wait to see what you will do with it. We’re confident that Cloudant Search will open new doors, from user-facing analytics dashboards, to efficient Object-Document Mappers, to killer apps we haven’t even dared to think of.

So go ahead and read more about the technology, peruse the docs, activate Search on a database through your cloudant.com dashboard or contact us to see how this technology can be deployed in your datacenter.
