Developer Preview: Cloudant Search for CouchDB

David Hardtke

January 11, 2011

CouchDB is a compelling choice for database-backed web apps and analytics applications. Cloudant’s hosted CouchDB service adds scaling, redundancy, and fault-tolerance to CouchDB. Today we are releasing a new feature for developers to try out: full text indexing and search.

There are several products that implement search for CouchDB as external services (couchdb-lucene and ElasticSearch). Like these, our full text search uses Lucene query syntax and is based on Lucene. For Cloudant search, however, the search application is directly integrated into the Cloudant CouchDB database. This has several distinct advantages over other Lucene-based search products:

  • Real-time Search. Search results reflect the state of the database at the moment it is searched, with little to no latency on document additions, changes, or deletions.
  • Indexing and Search using all valid JSON types. In Lucene, the search terms are required to be strings. This is especially problematic for date range and numerical range searches. With Cloudant search, the tokens can be any valid JSON type (string, number, object, array, boolean, null).
  • Fault-tolerance and redundancy. The search indices are stored on the same nodes as their parent documents.
  • Little to no setup and configuration. For basic search, the procedure is: upload documents, enable search, then search the documents using the REST interface.
  • Scalability. Cloudant Core is designed to be a horizontally scalable document store, and Cloudant Search is a horizontally scalable search product.
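
To illustrate why typed tokens matter for range searches, here is a minimal Python sketch. It is not Cloudant's implementation; the `range_search` helper and the `price_index` data are hypothetical names invented for this example. When numbers are indexed as strings they sort lexicographically ("9" sorts after "10"), so a range scan needs padding or encoding tricks. When keys keep their numeric type, a sorted index can answer a range query with a simple binary search:

```python
from bisect import bisect_left, bisect_right


def range_search(sorted_entries, low, high):
    """Return doc ids whose numeric key satisfies low <= key <= high.

    sorted_entries is a list of (key, doc_id) tuples sorted by key.
    This helper is hypothetical and only illustrates the idea.
    """
    keys = [key for key, _ in sorted_entries]
    start = bisect_left(keys, low)    # first position with key >= low
    end = bisect_right(keys, high)    # first position with key > high
    return [doc_id for _, doc_id in sorted_entries[start:end]]


# Hypothetical index mapping a numeric "price" token to a document id.
price_index = sorted([
    (19.99, "book"),
    (5.0, "pen"),
    (250.0, "phone"),
    (100, "lamp"),
])

in_range = range_search(price_index, 10, 100)
print(in_range)
```

The same approach would apply to dates stored as numbers (for example, epoch timestamps), assuming the index preserves numeric ordering.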

Under the hood, our search consists of two parts. The first part is the creation of a reverse index (often called an inverted index), or posting list, for your documents. This reverse index maps terms to the documents in which they appear. We have implemented document indexing as a MapReduce view using couchjava, our Java language view server for CouchDB. This allows for usage of custom application-specific Lucene analyzers written in Java. In fact, your existing Lucene analyzers can be imported directly into Cloudant. If you prefer, you can write your document indexer in JavaScript as long as you write a MapReduce view using the correct format (in a later blog post, we’ll show how our search application can be used for ad-hoc queries of your CouchDB database).
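
As a rough picture of what a reverse index looks like, the following Python sketch builds one with a map step that emits (term, document id) pairs and a grouping step that collects them into posting lists. This is only an illustration of the concept: the function names, the `text` field, and the whitespace tokenizer are assumptions made for the example, and it does not reproduce the couchjava view format or a Lucene analyzer.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit a (term, doc_id) pair for each token in the document.

    Tokenization here is a naive lowercase whitespace split, standing in
    for a real analyzer.
    """
    for token in doc["text"].lower().split():
        yield token, doc["_id"]


def build_reverse_index(docs):
    """Group emitted pairs by term to form posting lists (term -> doc ids)."""
    postings = defaultdict(set)
    for doc in docs:
        for term, doc_id in map_doc(doc):
            postings[term].add(doc_id)
    # Sort each posting list so results are deterministic.
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical sample documents.
docs = [
    {"_id": "a", "text": "CouchDB search"},
    {"_id": "b", "text": "Lucene search syntax"},
]

index = build_reverse_index(docs)
print(index["search"])
```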

The second part of Cloudant search is a RESTful API that takes your search query and performs the necessary boolean logic to return the most relevant documents.
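
The boolean logic mentioned above can be pictured as set operations on posting lists. The sketch below is a simplified Python illustration, not the Cloudant search API: `search_and`, `search_or`, and the sample `index` are hypothetical, and relevance ranking is left out entirely.

```python
def search_and(index, terms):
    """Return ids of documents that contain every term (boolean AND)."""
    posting_sets = [set(index.get(term, ())) for term in terms]
    if not posting_sets:
        return []
    return sorted(set.intersection(*posting_sets))


def search_or(index, terms):
    """Return ids of documents that contain at least one term (boolean OR)."""
    result = set()
    for term in terms:
        result |= set(index.get(term, ()))
    return sorted(result)


# Hypothetical reverse index: term -> list of document ids.
index = {
    "couchdb": ["a"],
    "lucene": ["b"],
    "search": ["a", "b"],
}

both = search_and(index, ["couchdb", "search"])
either = search_or(index, ["couchdb", "lucene"])
print(both, either)
```

A production search engine would also score and order the matching documents; this example only shows how AND and OR reduce to intersection and union.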

What is Cloudant search good for? Here are some applications that are easy (and cheap) to build using Cloudant full text indexing and search:

  • Searchable Real Time Catalog. If you are starting the next eBay, build it using Cloudant. We combine full text search and numerical range queries.
  • Document Discovery by the Hour. If you have a collection of documents that you wish to search through but don’t need to archive, you can upload them to Cloudant, enable search, run your queries, and delete the database.

Currently, the beta version of Cloudant search is available to all hosted customers with paid accounts (including users of the Cloudant Heroku Addon). The version of the search product we are rolling out today is designed for smaller databases (less than a few gigabytes). An enhanced version that uses distributed search for larger databases will be available soon. The feature is free, but the search index counts against your data storage quota and the search queries count against your query quotas.

A technical description of the search API can be found on the Search API page. Instructions for making your databases searchable can be found on the Search Indexing page. Please submit your questions and suggestions to our discussion forum. If your application requires advanced indexing or search capabilities, please contact us for further information or for a quote on a custom solution.

Over the next few weeks, we’ll be posting tutorials on various ways the search can be used for web applications and ad-hoc analytics. Please stay tuned.
