Post-Mortem on Overnight Downtime
After investigating last night's problems, we believe we have found the cause. In the database cluster, we were testing a new version of dynamic code loading in a distributed Erlang environment. The test was meant to improve performance for customers, but we learned that it introduced a single point of failure in the database. We have since reverted the system to eliminate that problem, and with today's update the issue is resolved.
Again, we apologize to everyone who was affected. We appreciate your patience as we work to improve Cloudant's performance and service.
Posted by Alan Hoffman