A Quick Overview of a Database for the Cloud: CouchDB

Need a database in your cloud? Check out CouchDB.

What is CouchDB?

CouchDB is an Apache project, and it is not a relational database. It seems that cloud computing has spawned, or at least made popular, a new breed of database. Rather than the hierarchical, network or relational databases of yore*, we have a new paradigm: key/value pairs. You name a field and assign it a value.

*I left object databases and XML databases off my databases-of-yore list as they never really caught on.

SimpleDB is another key/value database that you may have heard of or used. SimpleDB is provided as part of Amazon Web Services (AWS).

What does CouchDB Offer?

CouchDB is accessible via JSON over HTTP (I like JSON better than XML for tasks like these), and it uses JavaScript as a query language. CouchDB is document-aware: you create a new document and store related data within that document. There is no schema; documents are the important classification of your data.
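To make the JSON-plus-JavaScript idea concrete, here is a minimal Python sketch that builds (but does not send) the HTTP requests for storing a document and a JavaScript view. The server address, the "articles" database, the document IDs and the helper name build_put_document are all made up for illustration, so treat this as a sketch under those assumptions rather than a recipe.

```python
import json
import urllib.request

# Assumed for illustration: a local CouchDB server on its default port.
BASE_URL = "http://localhost:5984"


def build_put_document(db, doc_id, doc):
    """Build (but do not send) an HTTP PUT request that stores one JSON document.

    Hypothetical helper; `db` and `doc_id` are whatever names you choose.
    """
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# A document is just a set of named fields; there is no schema to declare first.
post = {"type": "post", "title": "Hello CouchDB", "tags": ["cloud", "database"]}
request = build_put_document("articles", "hello-couchdb", post)

# A query ("view") is a JavaScript map function stored as a string
# inside a design document.
design_doc = {
    "views": {
        "by_tag": {
            "map": "function (doc) { (doc.tags || []).forEach("
                   "function (t) { emit(t, doc.title); }); }"
        }
    }
}
design_request = build_put_document("articles", "_design/posts", design_doc)
```

Passing either request to urllib.request.urlopen would actually send it, assuming a server is listening at that address and the database already exists.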

The really important thing is that CouchDB is highly distributed. It’s this feature that makes it desirable in situations where a relational database does not scale well. According to the Apache CouchDB Documentation:

CouchDB is a peer based distributed database system. Any number of CouchDB hosts (servers and offline-clients) can have independent “replica copies” of the same database, where applications have full database interactivity (query, add, edit, delete). When back online or on a schedule, database changes are replicated bi-directionally.

CouchDB has built-in conflict detection and management and the replication process is incremental and fast, copying only documents and individual fields changed since the previous replication. Most applications require no special planning to take advantage of distributed updates and replication.

Distributed from the ground up. Sweet.
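For a feel of what bi-directional replication looks like from the client side, here is a small Python sketch that builds two replication requests, one in each direction. It assumes replication is triggered by POSTing a source and target to the server's _replicate resource; the host names, the "notes" database and both helper functions are hypothetical, and nothing is sent over the network.

```python
import json
import urllib.request


def build_replication_request(server, source, target):
    """Build (but do not send) a POST asking `server` to copy `source` into `target`."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_bidirectional_sync(server, local_db, remote_db):
    """Pair two one-way replications so changes flow in both directions."""
    return [
        build_replication_request(server, local_db, remote_db),
        build_replication_request(server, remote_db, local_db),
    ]


# Hypothetical names: a local "notes" database and a copy on another host.
sync_requests = build_bidirectional_sync(
    "http://localhost:5984", "notes", "http://example.org:5984/notes"
)
```

Each request only describes one direction, which is why the sync helper returns a pair.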

An important difference between CouchDB and SimpleDB is that CouchDB is ACID. Rather than using logs for consistency, CouchDB uses redundant sets of data (much like Vertica). CouchDB, like the other key/value databases, is “eventually consistent”. That means it takes time for changes to reach every replica. CouchDB also uses MVCC, so readers never block writers and readers always see a consistent data set.
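The MVCC idea is easier to see in a toy. The Python sketch below is an in-memory model, not CouchDB's storage engine: every write must name the revision it is based on, a stale revision is rejected, and a reader keeps the snapshot it already read. ToyStore, Conflict and the integer revision numbers are all simplifications invented for this example.

```python
class Conflict(Exception):
    """Raised when a writer's revision is stale."""


class ToyStore:
    """In-memory toy model of revision-checked writes (an illustration only)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, fields)

    def get(self, doc_id):
        rev, fields = self._docs[doc_id]
        # Hand back a copy so later writes never change what a reader holds.
        return rev, dict(fields)

    def put(self, doc_id, fields, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        # Store a fresh copy as the new version instead of mutating in place.
        self._docs[doc_id] = (new_rev, dict(fields))
        return new_rev


store = ToyStore()
rev1 = store.put("cart", {"items": 1})            # first write, no prior revision
seen_rev, snapshot = store.get("cart")            # a reader takes a snapshot
rev2 = store.put("cart", {"items": 2}, rev=rev1)  # a writer moves the doc forward

try:
    store.put("cart", {"items": 99}, rev=seen_rev)  # based on the old revision
    stale_write_rejected = False
except Conflict:
    stale_write_rejected = True
```

The reader's snapshot still shows one item after the second write, and the late writer gets a conflict instead of silently overwriting newer data.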

CouchDB is written in Erlang. That’s a downside to me, in that Erlang is not a very common language. If you need a patch in a hurry, it may be difficult to find someone qualified to write it. CouchDB was originally written in C++, but the author chose to redo it in Erlang for scalability reasons. Hmmm.

That’s the short story on CouchDB. I plan to write more about actually using CouchDB in the near future.

