Cloud Computing Client Login

Archive for Development

NoSQL Ecosystem

// November 9th, 2009 // 51 Comments » // Development

By Jonathan Ellis, Systems Architect

Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years.  Collectively, these alternatives have become known as “NoSQL databases.”

The fundamental problem is that relational databases cannot handle many modern workloads.  There are three specific problem areas: scaling out to data sets like Digg’s (3 TB for green badges) or Facebook’s (50 TB for inbox search) or eBay’s (2 PB overall), per-server performance, and rigid schema design.

Businesses, including The Rackspace Cloud, need to find new ways to store and scale large amounts of data. I recently wrote a post on  Cassandra, a non-relational database we have committed resources to. There are other non-relational databases being worked on and collectively, we call this the “NoSQL movement.”

The “NoSQL” term was actually coined by a fellow Racker, Eric Evans when Johan Oskarsson of Last.fm wanted to organize an event to discuss open source distributed databases. The name and concept both caught on.

Some people object to the NoSQL term because it sounds like we’re defining ourselves based on what we aren’t doing rather than what we are. That’s true, to a degree, but the term is still valuable because when a relational database is the only tool you know, every problem looks like a thumb.  NoSQL is making people aware that there are other options out there. But we’re not anti-relational-database for when that really is the best tool for the job; it’s “Not Only SQL,” rather than “No SQL at all.”

One real concern with the NoSQL name is that it’s such a big tent that there is room for very different designs.  If this is not made clear when discussing the various products, it results in confusion.  So I’d like to suggest three axes along which to think about the many database options: scalability, data and query model, and persistence design.

I have chosen 10 NoSQL databases as examples.  This is not an exhaustive list, but the concepts discussed are crucial for evaluating others as well.

Scalability

Scaling reads is easy with replication, so when we’re talking about scaling in this context, we mean scaling writes by automatically partitioning data across multiple machines.  We call systems that do this “distributed databases.”  These include Cassandra, HBase, Riak, Scalaris, Voldemort, and more.  If your write volume or data size is more than one machine can handle then these are your only options if you don’t want to manage partitioning manually.  (You don’t.)

There are two things to look for in a distributed database: 1) support for multiple datacenters and 2) the ability to add new machines to a live cluster transparently to your applications.

Non-distributed NoSQL databases include CouchDB, MongoDB, Neo4j, Redis, and Tokyo Cabinet.  These can serve as persistence layers for distributed systems; MongoDB provides limited support for sharding, as does a separate Lounge project for CouchDB, and Tokyo Cabinet can be used as a Voldemort storage engine.

Data and Query Model

There is a lot of variety in the data models and query APIs in NoSQL databases.

(Respective Links: Thrift, map/reduce views, Thrift, Cursor, Graph, Collection, Nested hashes, get/put, get/put, get/put)

Some highlights:

The columnfamily model shared by Cassandra and HBase is inspired by the one described by Google’s Bigtable paper, section 2.  (Cassandra drops historical versions, and adds supercolumns.) In both systems, you have rows and columns like you are used to seeing, but the rows are sparse: each row can have as many or as few columns as desired, and columns do not need to be defined ahead of time.

The Key/value model is the simplest and easiest to implement but inefficient when you are only interested in querying or updating part of a value.  It’s also difficult to implement more sophisticated structures on top of distributed key/value.

Document databases are essentially the next level of Key/value, allowing nested values associated with each key.  Document databases support querying those more efficiently than simply returning the entire blob each time.

Neo4J has a really unique data model, storing objects and relationships as nodes and edges in a graph.  For queries that fit this model (e.g., hierarchical data) they can be 1000s of times faster than alternatives.

Scalaris is unique in offering distributed transactions across multiple keys.  (Discussing the trade-offs between consistency and availability is beyond the scope of this post, but that is another aspect to keep in mind when evaluating distributed systems.)

Persistence Design

By persistence design I mean, “how is data stored internally?”

The persistence model tells us a lot about what kind of workloads these databases will be good at.

In-memory databases are very, very fast (Redis achieves over 100,000 operations per second on a single machine), but cannot work with data sets that exceed available RAM.  Durability (retaining data even if a server crashes or loses power) can also be a problem; the amount of data you can expect to lose between flushes (copying the data to disk) is potentially large.  Scalaris, the other in-memory database on our list, tackles the durability problem with replication, but since it does not support multiple data centers your data will be still be vulnerable to things like power failures.

Memtables and SSTables buffer writes in memory (a “memtable”) after writing to an append-only commit log for durability.  When enough writes have been accepted, the memtable is sorted and written to disk all at once as a “sstable.”  This provides close to in-memory performance since no seeks are involved, while avoiding the durability problems of purely in-memory approaches.  (This is described in more detail in sections 5.3 and 5.4 of the previously-referenced Bigtable paper, as well as in The log-structured merge-tree.)

B-Trees have been used in databases since practically the beginning of time.  They provide robust indexing support, but performance is poor on rotational disks (which are still by far the most cost-effective) because of the multiple seeks involved in reading or writing anything.

An interesting variant is CouchDB’s append-only B-Trees, which avoids the overhead of seeks at the cost of limiting CouchDB to one write at a time.

Conclusion

The NoSQL movement has exploded in 2009 as an increasing number of businesses wrestle with large data volumes.  The Rackspace Cloud is pleased to have played an early role in the NoSQL movement, and continues to commit resources to Cassandra and support events like NoSQL East.

NoSQL conference announcements and related discussion can be found on the Google discussion group.

How do you put a Database in the Cloud?

// October 28th, 2009 // 3 Comments » // Development

By Jonathan Bryce, Founder

There’s a lot of buzz in the cloud world around Amazon’s new Relational Database Service. With this move, Amazon inches up one level from pure infrastructure to also owning the operating system and base server software (you can’t SSH into an RDS EC2 instance). More interesting than the announcement itself is the discussion it’s generated, a frequent question being, “Is it really cloud?”

  Techcrunch provides coverage and an intelligent discussion in the comments.

Rightscale makes the point that RDS instances are basically MySQL appliances, at the core just EC2 instances running MySQL. This is a capability RightScale has offered for years on top of the same infrastructure. The RDS instances then have some valuable, automated services layered on top to back up and scale the resources available to that EC2 instance. This is similar to the value added services Rightscale has offered as well as similar to the snapshot backups and in-place scaling Cloud Servers offers for all server types. A side note is that this is obviously a step that will be worrisome for some of the Amazon partners who are building businesses on top of Amazon’s infrastructure services.

Database != Cloud?

Back to the original question: How do you do databases in the cloud? This is a question we’ve been consumed with for years. We are are running thousands and thousands of applications, most of which are back-ended by a MySQL or Microsoft SQL Server database.  At Rackspace, we have a few basic philosophies that influence how we approach our product offerings from managed hosting to cloud to email.

First, we want to give users a variety of options that start very low in the stack and go all the way up to software services. Customers can pick where in the stack they want to work to match their needs for customization, ease of use, and required technical skill.

Second, we want to try to make the transition from one type of service to another as smooth as possible. I applaud Amazon for implementing this in a way that preserves the standard MySQL protocol. We’ve taken the same approach with our Cloud Sites database capabilities.

The third goal, though, is the hardest: smooth scalability. One of the primary promises of the cloud is elasticity. For something like our web application servers, we run custom versions of web server software that allow us to reach a level of scale that will meet practically any need. Relational databases, though, are much more difficult to scale infinitely. Amazon’s approach is to give their users the building blocks and ask them to handle the scaling. We’ve taken a different tack, trying to handle scaling as seamlessly as possible.

Along the way, we’ve learned many lessons about scaling, especially databases. The first lesson is that there’s still a limit to how far you can stretch today’s relational database software. We’ve been able to create a MySQL offering for Cloud Sites that has elastically handled massive volume, but we’ve also reached upper bounds and had to help a small number of customers deploy in other configurations. You can throw bigger, beefier hardware at it, but eventually you can’t go any farther vertically. You can scale horizontally using projects like mysql-proxy to load-balance queries, but again, you will run into problems like maintaining consistency across all your nodes. For the vast majority of database usage out there, these problems never appear on the horizon and the work we’ve already done on MySQL is elastic enough. For those cases where the database needs to do more, we’ve been working with two interesting new technologies.

Drizzle

Drizzle is a project that is one step removed from MySQL. It’s primarily worked on by developers who also worked on MySQL, with the goal being to modularize every component of MySQL and build a scalable, cloud-friendly version of the world’s most popular RDBMS. Drizzle is still in early stages of development, but it shows a lot of promise. Bringing real horizontal scaling to MySQL, while maintaining compatibility and relational database functionality will be a huge step forward. And if it won’t require a complete rework of the decades of development time that has been spent on RDBMS-backed applications, that is a big bonus.

Cassandra

Beyond Drizzle, we are actively contributing to and working on the Cassandra distributed database system. Cassandra goes beyond trying to scale a traditional relational database. Cassandra removes many of those traditional concepts and places a priority on scaling. When you are dealing with billions of writes and terabytes of data, you’ve moved into a realm of technology needs that requires you to adopt some new concepts. Truly web-scaled applications, like Digg, have reached this point and started to make the shift. The possibility of reaching hundreds of millions of users worldwide with online applications creates scale problems like never before. We see distributed databases as a key component to solving the next wave of scaling problems and that’s why we are investing heavily into it. If you put in a little bit of extra development effort upfront, they offer the potential for truly elastic, cloud database services.

If the idea of creating infinitely scalable database technology consumes your thoughts, send us your resume – jobs@rackspacecloud.com. We are always looking to add skilled engineers and developers, and will provide an opportunity to work on some of the largest infrastructure systems in the world, handling billions of transactions every month.

Related Posts: The Cassandra Project

The Cassandra Project

// September 23rd, 2009 // 7 Comments » // Development

By Jonathan Ellis, Systems Architect

You may have heard about the Cassandra distributed database in recent articles or conferences. I’d like to explain what advantages Cassandra offers over traditional relational databases like MySQL or Oracle and why Rackspace has committed resources to the Cassandra project.

The Cassandra project was started by Facebook in 2007 to scale their internal applications, particularly Inbox Search. Earlier this year, they released it to the Apache incubator where other people from the community could become involved and start contributing. This allowed  the project to move forward in a direction that is more general to the public than just to Facebook’s needs.

In March, I became the first outside committer to this Apache Incubator project. Eric Evans from Rackspace and Jun Rao from IBM Research soon followed, and we recently added Chris Goffinet from Digg. The community has grown from 5 people in the IRC channel in December to  over 60.

Distributed vs. Relational Databases

Traditional relational databases are 30 years old, are well understood and have a huge ecosystem of tools around them.  For that reason, it’s a compelling option when building your application. Postgres, MySQL, and Oracle are all relational databases modeling a schema on entities and relations between those entities. That’s a good, powerful programming model with interesting theoretical properties. But companies with large amounts of data have already gone past what you can reasonably fit on a single machine, even on high-end hardware, and it’s provably impossible to keep the traditional relational model, in particular the ACID properties, while scaling across multiple machines. Even if you’re willing to give up availability, scaling reads (via caching and replication) is difficult with relational databases, and scaling writes by partitioning is either very expensive, very painful from an application programming and operations standpoint, or both.

Cassandra is taking the approach that, given that you’re going to have to give up some parts of the relational model to scale, let’s start over and rethink things. Let’s add things like transparent replication and failover, built-in partitioning and load balancing, multiple data center support, and the ability to add capacity without ever disturbing applications running against the database.

Rackspace’s Involvement

The original Facebook team has been busy elsewhere, so the community has had to step up and take the initiative in moving Cassandra forward.  Cassandra is open source and I don’t want to downplay others’ contributions, including those from IBM Research, Digg, and Twitter as well as other companies and individuals, but I’m proud that Rackspace’s support has been instrumental in adding many important new features, fixing bugs, and getting out new releases.

Here are 3 reasons why Rackspace has committed resources:

1-    As stated in previous posts by Erik Carlin, we are committed to an Open Cloud. With Amazon’s Simple DB or Google App Engine’s datastore, you’re locked in. Cassandra presents an open alternative: you can write against Cassandra and deploy anywhere.  That’s important.

2-    We have a suite of Cloud products that are productized beyond just the raw Cloud Servers. Cassandra is interesting to us because we can use it under the hood to improve Cloud Sites and Cloud Files. And people are already starting to ask, “When can I just go to Rackspace and deploy a preconfigured Cassandra cluster?” It’s still early, but that’s definitely something we’re looking at.

3-    Rackspace itself has a ton of data that we generate from our switches and routers and the rest of our infrastructure. Right now we are getting by with traditional monitoring and logging technologies, searching those logs and so forth. Cassandra will help us a lot with that as our volumes continue to increase. Our Mail & Apps products are also very interested in using Cassandra to store mail messages and other data.

Finally, I want to emphasize Cassandra is not a magic bullet. You can’t just take your SQL app and put it on Cassandra and expect it to work.  It’s a different programming model and instead of modeling as entities and relationships and just adding indexes to get performance, you need to think at a more basic level: “What information do I need to retrieve from each query?” and model your Cassandra schema accordingly.  It’s a different way of thinking and does require new code to be written. It’s very much for people that have a lot data that doesn’t fit on a single machine and are feeling the pain from traditional approaches to scaling that.

We plan to write some other posts in the future detailing what a switch might look like for some sample applications.

Coding in the Cloud – Rule 6 – HTTP Includes

// September 22nd, 2009 // 1 Comment » // Development

By Adrian Otto

This continues my series on Rules for Coding in the Cloud. These are rules I’ve developed after watching applications encounter problems at scale when deployed on Cloud Sites.

Rule 6:  Never use HTTP include. Let me explain.

How does a HTTP include work?

You tell your PHP application, “I want to include a file.” For the file name, you supply a URL, which the server must download.  A client makes a connection to a PHP web server, the PHP web server runs an application, the application opens a file, and the file type is a URL. The server makes contact with another server, downloads this URL and puts the output into the PHP script.

Why is this a problem?

This results in not only a huge security problem, but also a performance problem. And now you’re faced with a potential outcome that could be disastrous—an infinite loop in an elastic server environment. You can accidentally create an HTTP include which includes something from your own site, which includes something from your site, which includes something from your site, and… well, you get the idea. If you do that, you’ll get a single client connection, which will open a connection to itself, over and over, until you have 50,000 of them running in parallel. The last connection will then hit the limit that you’re allowed to create and the entire thing will roll all the way back. You’ll get a failure, and the whole application will proceed as if it never happened.  Unfortunately, you will not be aware of this issue until you receive your bill with an outrageous amount of compute cycle usage. The cloud had to do huge amounts of work that you couldn’t even see!  That’s really the scary part about this scenario because the site looks like it’s working just fine. When you browse through your site, it comes up relatively quickly because that just scales through the entire system.  Meanwhile, The Rackspace Cloud is receiving alerts. You may not even know that your site has done the equivalent of 50,000 hits for every single hit.

In addition, you may also inadvertently involve someone else’s site. If you have two interdependent sites, the two may end up fighting back and forth, creating a massive loop.  And because the server is making the HTTP connection, the browser is completely unaware of it, so the browser’s anti-loop code won’t prevent it.  There’s no way to break the loop because there’s no way to see where it starts.

There is more than one way to do an HTTP include. One of them actually allows you to include PHP code from a remote URL and execute it as part of the local application. This feature (gaping hole) in PHP is actually disabled on Cloud Sites. What does work is using an fopen() call where the argument is a URL. This allows you to read data from that file handle and process it (potentially just printing it out to the browser). Try not to be tempted to eval() any of that output.

This may strike you as familiar advice. I mentioned a similar subject in Rule 4 – Avoid External Dependencies and included a code example of how to download content from a remote site on demand, cache a local copy, and provide non-blocking access to that data. The reason why this is a separate rule is I’ve seen it broken repeatedly, but not as an external dependency. It’s a risk of a circular internal (or external) dependency. People find reasons to HTTP include content from their own site but please try not to! What seems like an innocent include eventually leads to the infinite loop situation described above.

Bottom line: Never use HTTP include.

Click here to learn more about cloud computing.

The Work of Open Cloud

// September 21st, 2009 // 2 Comments » // Community, Development, Events

erik-carlin By Erik Carlin, Senior Architect

I just returned from Philadelphia where the DMTF Open Cloud Standards Incubator group had a three-day face-to-face meeting on cloud computing standardization.  In addition to Rackspace, other companies present included:  CA, Cisco, Citrix, Hitachi, HP, IBM, Intel, Microsoft, Sungard, and VMware.

The purpose of the incubator group is essentially to get the ball rolling – to frame the problem and lay a foundation upon which specific cloud standards can be developed via new or existing DMTF working groups (e.g. OVF) and other SDOs – a new TLA (three letter acronym) I learned this week – which stands for Standards Development Organization.  Specifically, this includes things like defining a cloud taxonomy, laying out use cases, and identifying specific areas ripe for standardization.  The work of open cloud is an arduous one and I appreciate the energy and commitment of the group.  Three full days of debating terms, concepts, technology, use cases, etc. can be exhausting!  Nevertheless, it is necessary and is the means to an end of specific cloud standards that will add real value for customers.

Rackspace has been involved in a number of early conversations and meetings over the past 18 months around cloud standards.  It has been frustrating because there has been little to no progress (albeit the efforts were always well intentioned).  It’s exciting now to see more formality around cloud standards development (from the DMTF and others) as well as a coalescing of various standardization efforts (e.g. http://cloud-standards.org).  We had a number of discussions about the work other groups are doing so there is both awareness and intentionality with regard to collaboration.

Rackspace has never believed in “lock-in.”  We want to earn your business through Fanatical Support. Even in our traditional hosting business, where the single tenant nature and ability to uniquely customize infrastructure necessitates a contract, we have the Fanatical Support Promise which lets you out if you feel we haven’t lived up to our promises.  If you believe you are best served by going elsewhere, you should have the freedom to move.  And the cloud should be no different.  While the cloud doesn’t require a contract, there are APIs, image formats, etc. that, in the absence of standards, will be proprietary and hinder portability.  That shouldn’t be the case and solid, well-received cloud standards are the key to avoiding cloud lock-in.

We can’t predict where cloud standards will go, but we are committed to participating in the process and helping create a world of open clouds.  Without them, we will never realize the full potential of the cloud.

Have a question? Send me an email.