Cloudant's BigCouch is open-source

The Cloudant team is pleased to make available its 'BigCouch' software project as open source software under the Apache 2.0 license. We have been the beneficiary of countless open source projects while constructing this system, so it is only fitting that we share our efforts in hopes that people will not only find utility, but also assist us in making it better.  

It has taken us a while to get to this point. The version we are opening has benefitted from Cloudant operating our systems in production for our customers for almost a year. The recent (and heavy) refactoring effort has encompassed our learning over this time, and we believe it is now time to share this system with the world. We will now be focusing our efforts on documenting and ease of use, as is the case with most newly opened projects.

What does it do? Think of BigCouch as a set of Erlang/OTP applications that allow you to create a cluster of CouchDBs that is distributed across many nodes/servers.  Instead of one big honking CouchDB, the result is an elastic data store which is fully CouchDB API-compliant.  The top is a picture of the standalone CouchDB setup that is common today.  When you download BigCouch and build with 'make dev' (See the README.md file), the result is the bottom picture, a three-node development cluster.

The clustering layer is most closely modeled after Amazon's Dynamo, with consistent hashing, replication, and quorum for read/write operations. CouchDB view indexing occurs in parallel on each partition, and can achieve impressive speedups as compared to standalone serial indexing.

Also contained within BigCouch is a tailor-made RPC server application called 'rexi.'  Rexi better fits the needs of this distributed data store by dropping some unneeded overhead in rex, the RPC server that ships with Erlang/OTP. Rexi is optimized for the case when you need to spawn a bunch of remote processes.  Cast messages are sent from the origin to the remote rexi server, and local processes are spawned from there, which is vastly more efficient than spawning remote processes from the origin.

So, let us know what you think. We will be hanging around Freenode IRC in the #cloudant channel.  Pull the code, build it, install, and begin playing. Document, enhance, give back, benchmark, blog, tweet, ...and enjoy.

Github Project Page

Cloudant Developer Resources

 

Dynamo and CouchDB Clusters

For a while, CouchDB was described as a "distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API." The story about CouchDB's 'distributed' description has always involved its multi-master replication.  In this sense, it is not truly a horizontally scalable database, as noted here.  With the availability of Cloudant's new hosted service, a new option has entered the scene.  Our clustering is similar to Voldemort, Cassandra, or Riak, as it implements a version of Amazon's Dynamo.  We'd like to tell you a bit about how it works.

Clustering in a ring

Cloudant makes use of partitions for sharding data.  Picture a ring of partitions that each are responsible for a range of the data.  The partitions, sometimes referred to as virtual nodes or vnodes, are placed on separate physical nodes as the primary means of distribution.  Partitioning databases is nothing new, and with Dynamo systems, a keyspace is usually employed.  We use CRC32, so from 0 to 2^32.  We've seen MD5, fnv, and SHA-1 used, and previous versions of our software used SHA-1.  When creating a database, users may provide Q, a constant that determines how many partitions the database will be spread across.  Let's check out an example:

 curl -X PUT http://username.cloudant.com/dbname?q=10 

For a Q of 10, there will be 10 partitions put across all the nodes in the cluster.  Each partition will be responsible for a range of 2^32/Q keys.

Any node can handle a request

Each request to the cluster will most likely go through a round-robin load balancer, because each of the nodes knows how to find the data in O(1) lookup time.  To do this, the document id is hashed, and the resulting hash is compared against the database's partition map, as we discussed above.  If you're referring to the shiny picture on this page, you will see that our example hashes 'doc_id' to the E partition.  In reality, this could be something like hashing a value to c34b and that falls between c000 and cfff hex values, so we call that range the E partition.  The partition map for that database knows which node or nodes have copies of the document, and requests are routed via distributed Erlang messages to those nodes.

Quorum system (N, R, W)

The durability and consistency settings of Dynamo systems are governed by quorum constants.   As you can see from the diagram, N=3.  This means that three replica copies of 'doc_id' document will be stored in the cluster.  When the 'write' arrives at Node 1 and the consistent hashing algorithm tells us the doc will live on the E partition, the system attempts three asynchronous writes, one to each of the nodes that house the E partition replicas.  You can be reasonably certain of the writes succeeding, because the request will wait for W (in this case, W=2) responses.  When two successful responses return to Node 1, the response of 201 Created is returned to the client.  The same thing occurs on 'reads.'  When a GET request arrives at a node, N 'read' requests go out to the nodes which house that document's partition.  If R=2, then the system waits for two agreed-upon values to be returned, and then replies to the client with a 200 OK and the document's contents.

You can set the defaults of N, R, and W for your cluster, but you may also override them at request time.  N may be overridden at database creation time, just like Q above.  R and W may be overridden for each request individually.  A higher W value means more durable writes, where lower W value means more write throughput.  High R values mean consistent values returned, while low R values mean first read wins, and higher read throughput.

Masterless (no SPOF)

Server failure is handled transparently.  The load balancer will not route requests to a downed node.  With consistent hashing, the other nodes are capable of serving requests.  Also, when a node is down, other partitions will provide the data that is located on the downed node.  If N=3, W=2, R=2 as in our example, a node may be down but read and write quorums are still satisfied.  When the node returns, operations return to normal.  The node is placed back into service by the load balancer, and a 'hinted-handoff-like' system is used to replicate the writes that the downed node may have missed.  The other nodes holding data for the partitions of the downed node replicate the updates continuously after the node returns.

Horizontally Scalable

When the cluster gets closer to capacity, or could use more nodes to crunch data in a more parallel fashion, extra nodes may be added to the cluster.  When this happens, partitions are moved from existing nodes to the new nodes.  The system has the flexibility to move partitions on a database level, and can split large partitions if required.  This is an evolving portion of the code base, and we are attempting to do partition merging and splitting all while the database remains online, i.e. other partitions serve data while the work is being done.

Above, there was mention of partition merging.  This may be required as the cluster shrinks.  Or, it is also possible to re-partition an over-partitioned database into fewer shards.  The flexibility exists to shrink a cluster by removing nodes, just as it may be grown, providing a truly elastic data store.

Transparent to the application

All of these features -- distributed, horizontally scalable, durable, consistent -- happen with little or no change required in applications that have been written for CouchDB.  A cluster looks just like a stand-alone CouchDB, and API compliance has been our goal from the beginning.  Granted, there are a few extra options like overriding quorum constant defaults and there are a few vagaries, like views always performing rereduce due to the views being distributed.  But on the whole, the extras in Cloudant are transparent to the application.

Feel free to comment away, email us (info@cloudant.com), or join us on Freenode IRC in the #cloudant room.  We'd love to hear from you.

Using Cloudant’s Heroku Add-on

Heroku has recently jumped with both feet into the NoSQL world, providing add-ons for many different NoSQL data stores to complement it's easy ruby platform.  We're very excited to be able to offer CouchDB 1.0 to Heroku users via the Cloudant add-on.  Cloudant is a distributed CouchDB service, with the added benifits of data redundancy, clustering, and automatic maintenance (view builds and compaciton).  You can use it just like you would standalone CouchDB but you have the advantage of your data being distributed over multiple machines.  The following is a (very short) guide to help you get started with Cloudant and Heroku.  It is by no means comprehensive.  If you have quesitons, feel free to drop us a line at support@cloudant.com or find us at the #cloudant irc channel on freenode.  You can read more about the add-on at the Heroku Blog.

Local setup

Since CouchDB and Cloudant have the same REST API you can use them the same way. You just need a http client to interface with the database and a JSON parser.

In the simplest case you can just use the rest-client and json gems:

 sudo gem install rest-client json 

If you don’t already have CouchDB on your local machine:

OSX:

 brew install couchdb 

Ubuntu:

 sudo apt-get install couchdb 

Further reading:

Using from Sinatra

Installing the Cloudant add-on gives you an environmental variable ‘CLOUDANT_URL’. This contains your API URL and credentials and is how you access your database(s) at Cloudant:

heroku config --long CLOUDANT_URL
https://USER:PASSWORD@APP.heroku.cloudant.com

When you first install the add-on you need to create a database:

Using cURL:

 curl -X PUT https://USER:PASSWORD@APP.heroku.cloudant.com/DATABASENAME 

Using rest-client:

 RestClient.put("https://USER:PASSWORD@APP.heroku.cloudant.com/DATABASENAME", "") 

You could also do this in your code by doing something like:

begin
    RestClient.put("#{ENV['CLOUDANT_URL']}/SOMEDATABASE", "")
rescue
    puts "database already created"
end

The following code will allow you to GET a document from the “music” database and then print it to the page:

DB = "#{ENV['CLOUDANT_URL']}/music"
get "/doc/:doc" do
    doc = RestClient.get("#{DB}/#{params[:doc]}")
    @result = JSON.parse(doc)
    haml :doc_id
end

The view would look something like:

%h1 A Doc from CouchDB!
- @result.each do |k,v|
    %b=k
    %em=v
    %br

To request this page by pointing your browser to:

Deploying to Heroku

To use Cloudant on Heroku, install the Cloudant add-on:

heroku addons:add cloudant:basic

Cloudant's Own to Speak at SurgeCon.

As Cloudant's ops guy and "utility infielder" I (@williamsjoe) will be speaking at the Surge conference in September. Surge is a conference presented by OmitTI that focuses on infrastructure and web operations. As such I will be discussing how we have built Cloudant's database platform on top of Amazon's EC2. The conference will include keynotes from the always excellent John Allspaw (of Flickr and Etsy fame) and Theo Schlossnagle (OmniTI).

 

Cloudant and SETI: Crowdsourcing the search for E.T.

Going to OSCON?  SETI needs your help! 

Cloudant is teaming up with the SETI Institute to help engage citizen-scientists in the search for intelligent life elswhere in the universe.  With some help from Amazon, we've built a 'croudsourced data analytics platform' where ordinary citizens can search and analyze data from the Allen telescope array.  Are you a citizen-scientist?  Do you know about signal processing?  Are  you interested in helping in the search?  Come find out how!

Thursday, July 22nd at OSCON, our own Dr. Dave Hardtke will be demonstrating SETICloud.  He'll be describing the Cloudant platform and how we're bringing the wisdom of the crowds to SETI's search.

Time: 12:20 - 1:20 pm

Place: Room E143/144

SETI is hacking on CouchDB!  Come find out how you can help.

And don't miss Dr. Jill Tarter's keynote talk on "Open SEITQuest" Thursday morning at OSCON:

http://www.oscon.com/oscon2010/public/schedule/detail/13425

Learning to Relax

 

Over the weekend I traveled to Chicago to give a talk on 'CouchDB for Beginners' at the WindyCityDB conference.  You can read the slides above.  The conference exceeded my expectations and many of the talks, both long and short, were stellar.  Big thanks to Ray Hightower and all the people who helped put WindyCityDB together.  I very much appreciate the opportunity to give this talk and I hope everyone enjoyed it. 

Cloudant Nationwide Tour

It turns out that next week (the week of June 21st) is a big week for tech conferences.  A bunch of the Cloudant people will be hitting up various conferences and giving a couple talks.  If anyone is interested in meeting up over beers to talk Cloudant, CouchDB, NoSQL, Startups, etc. get in touch (info at cloudant dot com is a good place to start.)  Here are the tour dates:

  • Structure Conference.  June 23 - 24. San Francisco, CA:   Both Adam Kocoloski (CTO) and David Hardtke (Search) will be milling about this conference, whose tagline is "Put Cloud Computing To Work."  On the 23rd in the afternoon, Cloudant will be taking part in the Launchpad! event.  Adam will be working the crowd, drumming up some love for Cloudant, giving a short pitch about our service.  Check it out.
  • Velocity Conference.  June 22 - 24. Santa Clara, CA:  No Cloudant talk at this conference but Joe Williams (Operations and Utility Infeilding) will be in attendence.  No one I know gets more pumped up about server uptime, configuration management, and monitoring than Joe.  If you're the same way, say hi.
  • Momentum Summit.  June 23. Cambridge, MA: This one day conference focuses on how to turn a startup into big business.  CEO Alan Hoffman (that's me) will be in attendence.  Looking forward to hearing from Boston startup gurus like Steve Kaufer and Scott Griffith.  I'm always eager to talk with other local startup types so come find me.
  • Windy City DB.  June 26.  Chicago, IL:  I'm lucky enough to be heading back to my hometown to attend and speak at this one day event on the south side of Chicago.  My talk is entitiled "Learning to Relax: CouchDB for Beginners" and it is about, uh, CouchDB for beginners.  It's my first long-format NoSQL talk so if you're in attendance, please no heckling.  I'm happy to meet up after the conference; if you're looking for me on Friday afternoon, I'll be at U.S. Cellular Field watching the White Sox.

Just Opensourced: Gaff and Deckard

Today we released two open source projects that have been in use internally at Cloudant for some time now, Gaff and Deckard.

All of our infrastructure is in the cloud and as such we need a way for disperate systems to all request resources, this is where Gaff comes in. Gaff is a pubsub daemon for asynchronously talking to cloud APIs using AMQP. Currently it supports a subset of the Dynect (DNS), Slicehost and EC2 APIs and uses geemus' awesome fog Ruby library. The basic workflow for Gaff is to send JSON-RPC formated messages to an AMQP exchange with a routing key corresponding to the API you are talking to, you could be sending these messages from a web application or another service.  Each message gets routed to an API specific queue and is picked up by Gaff and turned into the appropriate API call, starting, stopping, modifying your servers on EC2 or elsewhere.

We have a lot of CouchDB instances to keep tabs on to do this we wrote Deckard. Deckard is a HTTP check monitoring system based on CouchDB. Yo dawg! What better than to monitor CouchDB with CouchDB (and some Ruby)? Deckard supports basic HTTP content checks, email alerts, SMS alerts (via email) for on-call rotations, basic maintenance scheduling, replication latency alerts (between two Couches) and even has EC2 Elastic IP support for failover between two EC2 instances. Best of all since it's based on Couch you get an API for free, just PUT a doc in the HTTP checks database and you get a new HTTP check the next time Deckard runs.

Checkout these and my other projects on GitHub and follow Cloudant and myself on Twitter.

 

Ode to a Utility Infielder

Today marks the one year anniversary of Joe Williams (@williamsjoe) becoming a member of Team Cloudant.  Those of us who are married will recall that one year is the paper anniversary so it's fitting that Joe will be receiving paper (in the form of vested stock options) from Cloudant today.  I thought I would take a moment on this occasion to thank Joe for his work over the past year.

We first hired Joe on a contract basis to help us build an internal infrastructure tool we call The Deployer.  This is the tool that allows us to quickly provision and configure resources from a number of different cloud providers.  We can, with the push of one button, 'inflate' an X-node Cloudant database cluster, and have it configured and available in a matter of minutes, all because of Joe's ongoing efforts.

Since joining full-time, Joe has acted as our 'utility infielder.'  All the problems in the operations and infrastructure realm fell to Joe.  He built and mantains our monitoring system, our logging system, and our build system.  That alarm system we have where we get a text message in the middle of the night when something goes wrong, Joe did that (thanks Joe!).  Joe, margarita in hand, has pushed hot code updates to live systems from a Mexican karaoke bar.

Needless to say Joe has been and continues to be a staggering presence at Cloudant.  We feel privileged to work with him on a daily basis.  I only hope I'm able to write a similar post on his 5 year anniversary.  Plus, the man has excellent mutton chops.

Thanks Joe!  Here's to the big things (and fewer alarms) coming in the next year.

NoSQL Live From Boston

We're very excited to be joinin up with our friends at 10gen to sponsor the NoSQL Live conference right here in Boston, our backyard.  The official press release can be found here.

The event will take place on March 11th at the John Hancock hotel.  A number of us from Cloudant will be at the conference in both official (panel discussions, lightning talks) and unofficial (milling about, drinking) capacities.  If you are in the New England area or plan on coming to the New England area, you should register and attend the conference.  You can register here.

From the 10gen events site

About:

A one-size-fits-all approach to databases no longer applies. Relational databases have worked well - and will continue to - for highly transactional systems. But today's web applications require enormous scalability and performance. This has spurred the growth of a new class of databases that trade off some of the features of relational databases to offer high performance, ease of programming, high availability & the ability to scale in cloud environments. They are collectively called NoSQL or non-relational databases.

NoSQL Live from Boston is a full-day interactive conference that brings together the thinking in this space. Picking up where other NoSQL events have left off, NoSQL Live goes beyond understanding the basics of these databases to how they are used in production systems. It features panel discussions on use cases, lightning talks, networking session, and a NoSQL Lab where attendees can get a practical view of programming with NoSQL databases.

Interested in presenting or sponsorship opportunities? Contact meghan@10gen.com