Archive for Development

The Cassandra Project

// September 23rd, 2009 // 7 Comments » // Development

By Jonathan Ellis, Systems Architect

You may have heard about the Cassandra distributed database in recent articles or conferences. I’d like to explain what advantages Cassandra offers over traditional relational databases like MySQL or Oracle and why Rackspace has committed resources to the Cassandra project.

The Cassandra project was started by Facebook in 2007 to scale their internal applications, particularly Inbox Search. Earlier this year, they released it to the Apache Incubator, where people from the wider community could get involved and start contributing. This allowed the project to move in a more general direction rather than serving only Facebook’s needs.

In March, I became the first outside committer to this Apache Incubator project. Eric Evans from Rackspace and Jun Rao from IBM Research soon followed, and we recently added Chris Goffinet from Digg. The community has grown from 5 people in the IRC channel in December to over 60.

Distributed vs. Relational Databases

Traditional relational databases are 30 years old, are well understood, and have a huge ecosystem of tools around them.  For that reason, they’re a compelling option when building your application. Postgres, MySQL, and Oracle are all relational databases that model a schema as entities and the relations between those entities. That’s a good, powerful programming model with interesting theoretical properties. But companies with large amounts of data have already gone past what you can reasonably fit on a single machine, even on high-end hardware, and it’s provably impossible to keep the traditional relational model, in particular the ACID properties, while scaling across multiple machines. Even if you’re willing to give up availability, scaling reads (via caching and replication) is difficult with relational databases, and scaling writes by partitioning is either very expensive, very painful from an application programming and operations standpoint, or both.

Cassandra is taking the approach that, given that you’re going to have to give up some parts of the relational model to scale, let’s start over and rethink things. Let’s add things like transparent replication and failover, built-in partitioning and load balancing, multiple data center support, and the ability to add capacity without ever disturbing applications running against the database.
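To make "built-in partitioning" and "transparent replication" a little more concrete, here is a minimal sketch of one common technique, a consistent-hash ring. It is a toy illustration, not Cassandra's actual code: the Ring class, the node names, and the replica count are all made up for this example.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: each key is stored on `replicas` consecutive nodes."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self._ring = sorted((_token(n), n) for n in nodes)

    def add_node(self, node):
        """Add capacity by placing a new node at its own position on the ring."""
        bisect.insort(self._ring, (_token(node), node))

    def nodes_for(self, key):
        """Return the nodes responsible for `key`, walking clockwise from its token."""
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_right(tokens, _token(key)) % len(self._ring)
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"], replicas=2)
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.nodes_for(k)[0] for k in keys}

ring.add_node("node-d")  # add capacity while the "application" keeps using the ring
after = {k: ring.nodes_for(k)[0] for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys changed primary node")
```

The point of the design is that adding a node only takes over keys from its neighbor on the ring; every key that moves ends up on the new node, and the rest stay where they were.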

Rackspace’s Involvement

The original Facebook team has been busy elsewhere, so the community has had to step up and take the initiative in moving Cassandra forward.  Cassandra is open source and I don’t want to downplay others’ contributions, including those from IBM Research, Digg, and Twitter as well as other companies and individuals, but I’m proud that Rackspace’s support has been instrumental in adding many important new features, fixing bugs, and getting out new releases.

Here are 3 reasons why Rackspace has committed resources:

1-    As stated in previous posts by Erik Carlin, we are committed to an Open Cloud. With Amazon’s SimpleDB or Google App Engine’s datastore, you’re locked in. Cassandra presents an open alternative: you can write against Cassandra and deploy anywhere.  That’s important.

2-    We have a suite of Cloud products that are productized beyond just the raw Cloud Servers. Cassandra is interesting to us because we can use it under the hood to improve Cloud Sites and Cloud Files. And people are already starting to ask, “When can I just go to Rackspace and deploy a preconfigured Cassandra cluster?” It’s still early, but that’s definitely something we’re looking at.

3-    Rackspace itself has a ton of data that we generate from our switches and routers and the rest of our infrastructure. Right now we are getting by with traditional monitoring and logging technologies, searching those logs and so forth. Cassandra will help us a lot with that as our volumes continue to increase. Our Mail & Apps products are also very interested in using Cassandra to store mail messages and other data.

Finally, I want to emphasize that Cassandra is not a magic bullet. You can’t just take your SQL app, put it on Cassandra, and expect it to work.  It’s a different programming model: instead of modeling entities and relationships and then adding indexes to get performance, you need to think at a more basic level, “What information do I need to retrieve from each query?”, and model your Cassandra schema accordingly.  It’s a different way of thinking and does require new code to be written. It’s very much for people who have a lot of data that doesn’t fit on a single machine and who are feeling the pain from traditional approaches to scaling that.
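As a rough illustration of what "model around the query" means, here is a small sketch using plain Python dictionaries rather than any Cassandra client. The messages, the inbox_index structure, and the store and search helpers are all hypothetical; the only point is that the data is laid out so the one query we care about becomes a single keyed lookup.

```python
from collections import defaultdict

# Hypothetical messages; in a relational design these would be rows in a table
# that you index and join at query time.
messages = [
    {"id": 1, "user": "alice", "body": "lunch on friday"},
    {"id": 2, "user": "alice", "body": "friday deploy plan"},
    {"id": 3, "user": "bob", "body": "deploy went fine"},
]

# Query to serve: "which of this user's messages contain this word?"
# The structure is keyed by user first, then by word, so the answer is one lookup.
inbox_index = defaultdict(lambda: defaultdict(list))


def store(message):
    """Write path: do the work up front by filing the message under every word."""
    for word in set(message["body"].split()):
        inbox_index[message["user"]][word].append(message["id"])


def search(user, word):
    """Read path: a single keyed lookup, no join or scan."""
    return sorted(inbox_index.get(user, {}).get(word, []))


for m in messages:
    store(m)

print(search("alice", "friday"))
print(search("bob", "friday"))
```

The trade-off is that each new kind of query tends to need its own structure, written at insert time, which is why an existing SQL application cannot simply be dropped onto this model.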

We plan to write some other posts in the future detailing what a switch might look like for some sample applications.

Coding in the Cloud – Rule 6 – HTTP Includes

// September 22nd, 2009 // 1 Comment » // Development

By Adrian Otto

This continues my series on Rules for Coding in the Cloud. These are rules I’ve developed after watching applications encounter problems at scale when deployed on Cloud Sites.

Rule 6:  Never use HTTP include. Let me explain.

How does an HTTP include work?

You tell your PHP application, “I want to include a file.” For the file name, you supply a URL, which the server must download.  A client makes a connection to a PHP web server, the PHP web server runs an application, the application opens a file, and that “file” is actually a URL. The server contacts another server, downloads the URL, and puts the output into the PHP script.

Why is this a problem?

HTTP includes create not only a huge security problem but also a performance problem. Worse, they expose you to a potentially disastrous outcome: an infinite loop in an elastic server environment. You can accidentally create an HTTP include which includes something from your own site, which includes something from your site, which includes something from your site, and… well, you get the idea. If you do that, a single client connection will open a connection back to your own site, over and over, until you have 50,000 of them running in parallel. The last connection will then hit the limit on connections you’re allowed to create, and the entire thing will roll all the way back. You’ll get a failure, and the whole application will proceed as if it never happened.

Unfortunately, you will not be aware of this issue until you receive your bill with an outrageous amount of compute cycle usage. The cloud had to do huge amounts of work that you couldn’t even see!  That’s the really scary part of this scenario: the site looks like it’s working just fine. When you browse through your site, it comes up relatively quickly because the load simply scales out across the entire system.  Meanwhile, The Rackspace Cloud is receiving alerts. You may not even know that your site has done the equivalent of 50,000 hits for every single hit.

In addition, you may also inadvertently involve someone else’s site. If you have two interdependent sites, the two may end up fighting back and forth, creating a massive loop.  And because the server is making the HTTP connection, the browser is completely unaware of it, so the browser’s anti-loop code won’t prevent it.  There’s no way to break the loop because there’s no way to see where it starts.
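The amplification described above can be modeled in a few lines. This is a simplified simulation in Python, not PHP and not how Cloud Sites is implemented: the INCLUDES map, the URLs, and the server_requests function are invented for illustration, and the 50,000 figure is simply the connection limit quoted above.

```python
# Hypothetical site map: page URL -> URLs it pulls in with a server-side HTTP include.
INCLUDES = {
    "http://a.example/index.php": ["http://a.example/header.php"],
    "http://a.example/header.php": ["http://a.example/index.php"],  # circular
    "http://a.example/about.php": [],
}


def server_requests(url, connection_limit):
    """Count the requests the servers handle for ONE browser hit on `url`.

    Each include opens a new connection while its parent stays open, so the
    chain depth equals the number of simultaneous connections. When the depth
    reaches `connection_limit`, further includes fail and the chain unwinds.
    """
    handled = 0
    stack = [(url, 1)]  # (page, depth of the connection chain)
    while stack:
        page, depth = stack.pop()
        handled += 1
        if depth >= connection_limit:
            continue  # limit reached: this page's includes fail
        for child in INCLUDES.get(page, []):
            stack.append((child, depth + 1))
    return handled


print(server_requests("http://a.example/about.php", connection_limit=50_000))
print(server_requests("http://a.example/index.php", connection_limit=50_000))
```

A page with no includes costs one request, while the circular pair costs as many requests as the limit allows. Pointing header.php at a second site that includes the first one back would produce the same count, split across both sites.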

There is more than one way to do an HTTP include. One of them actually allows you to include PHP code from a remote URL and execute it as part of the local application. This feature (gaping hole) in PHP is actually disabled on Cloud Sites. What does work is using an fopen() call where the argument is a URL. This allows you to read data from that file handle and process it (potentially just printing it out to the browser). Try not to be tempted to eval() any of that output.
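For comparison, here is a hedged sketch of the "read it as data, never execute it" approach, written in Python rather than PHP. The fetch_as_text helper, its timeout and size cap, and the fake_opener stand-in are assumptions made for this example; escaping the output is one possible choice that suits untrusted text, not a requirement from the rule itself.

```python
import html
import io
import urllib.request


def fetch_as_text(url, opener=urllib.request.urlopen, timeout=5.0, max_bytes=64 * 1024):
    """Read a remote resource and return it as escaped text, never as code."""
    with opener(url, timeout=timeout) as response:
        raw = response.read(max_bytes)  # cap how much we are willing to accept
    text = raw.decode("utf-8", errors="replace")
    return html.escape(text)  # safe to print into a page; nothing is executed


def fake_opener(url, timeout):
    """Stand-in for the network so the example runs offline."""
    return io.BytesIO(b"<?php echo 'surprise'; ?> hello")


print(fetch_as_text("http://remote.example/snippet", opener=fake_opener))
```

Whatever the remote server returns, including PHP source, comes back as inert text. The timeout and byte cap also limit how long a slow or oversized response can tie up the request.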

This may strike you as familiar advice. I mentioned a similar subject in Rule 4 – Avoid External Dependencies and included a code example of how to download content from a remote site on demand, cache a local copy, and provide non-blocking access to that data. This is a separate rule because I’ve seen it broken repeatedly, and not only as an external dependency: the risk here is a circular internal (or external) dependency. People find reasons to HTTP include content from their own site, but please try not to! What seems like an innocent include eventually leads to the infinite loop situation described above.
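If remote fetches cannot be removed right away, one cheap safeguard is to refuse any URL that points back at your own site. The sketch below is in Python and purely illustrative: OWN_HOSTS, is_self_include, and checked_url are hypothetical names, and the hostnames are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical: the hostnames this application answers to.
OWN_HOSTS = {"www.example.com", "example.com"}


def is_self_include(url, own_hosts=OWN_HOSTS):
    """True if `url` points back at one of our own hostnames."""
    host = urlsplit(url).hostname  # lowercased; None for relative URLs
    return host is not None and host in own_hosts


def checked_url(url):
    """Return `url` if it is safe to fetch server-side, else raise."""
    if is_self_include(url):
        raise ValueError(f"refusing to HTTP-include our own site: {url}")
    return url


print(is_self_include("http://WWW.Example.com/footer.php"))
print(is_self_include("http://other.example.org/feed"))
```

A check like this only catches the direct case. A loop that passes through a second site is invisible to it, which is another reason the rule is to avoid HTTP includes altogether.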

Bottom line: Never use HTTP include.

The Work of Open Cloud

// September 21st, 2009 // 2 Comments » // Community, Development, Events

By Erik Carlin, Senior Architect

I just returned from Philadelphia where the DMTF Open Cloud Standards Incubator group had a three-day face-to-face meeting on cloud computing standardization.  In addition to Rackspace, other companies present included:  CA, Cisco, Citrix, Hitachi, HP, IBM, Intel, Microsoft, Sungard, and VMware.

The purpose of the incubator group is essentially to get the ball rolling: to frame the problem and lay a foundation upon which specific cloud standards can be developed via new or existing DMTF working groups (e.g. OVF) and other SDOs. (SDO, a new three-letter acronym I learned this week, stands for Standards Development Organization.)  Specifically, this includes things like defining a cloud taxonomy, laying out use cases, and identifying specific areas ripe for standardization.  The work of open cloud is arduous, and I appreciate the energy and commitment of the group.  Three full days of debating terms, concepts, technology, use cases, etc. can be exhausting!  Nevertheless, it is necessary, and it is the means to the end of specific cloud standards that will add real value for customers.

Rackspace has been involved in a number of early conversations and meetings over the past 18 months around cloud standards.  It has been frustrating because there has been little to no progress (although the efforts were always well intentioned).  It’s exciting now to see more formality around cloud standards development (from the DMTF and others) as well as a coalescing of various standardization efforts (e.g. http://cloud-standards.org).  We had a number of discussions about the work other groups are doing, so there is both awareness of it and a clear intent to collaborate.

Rackspace has never believed in “lock-in.”  We want to earn your business through Fanatical Support. Even in our traditional hosting business, where the single-tenant nature and the ability to uniquely customize infrastructure necessitate a contract, we have the Fanatical Support Promise, which lets you out if you feel we haven’t lived up to our promises.  If you believe you are best served by going elsewhere, you should have the freedom to move.  And the cloud should be no different.  While the cloud doesn’t require a contract, there are APIs, image formats, etc. that, in the absence of standards, will be proprietary and hinder portability.  That shouldn’t be the case, and solid, well-received cloud standards are the key to avoiding cloud lock-in.

We can’t predict where cloud standards will go, but we are committed to participating in the process and helping create a world of open clouds.  Without them, we will never realize the full potential of the cloud.

Have a question? Send me an email.

The Rackspace Cloud Servers iPhone App Now Available

// September 1st, 2009 // 4 Comments » // Cloud Servers, Community, Development

Since the release of our APIs, developers have been hard at work on cool new applications. One of them is the Cloud Servers iPhone App, developed by Michael Mayo.

Since Michael first mentioned his work on the iPhone app on his blog, overhrd.com, on July 4th, many people have been asking us, “When and where will it be released?” It has finally been approved and is now available in the iPhone app store from your iPhone, or you can get it in the iTunes store.

This Cloud Servers application, available on iPhone and iPod Touch, allows you to easily and quickly administer your Cloud Servers on the go, wherever you are. While it’s not a complete replacement for our browser-based control panel or APIs, it does allow for some powerful remote administration capabilities.

Here is a list of features:

•    List all the Cloud Servers on your account
•    View details about each Cloud Server
•    Rename Cloud Servers
•    Resize Cloud Servers
•    Perform Hard Reboots
•    Perform Soft Reboots
•    Find Cloud Servers by Shared IP Group
•    Create new Cloud Servers (including from any existing backups)

We spoke with Michael about his experience working with The Rackspace Cloud team and developing with our API:

“It’s been a great experience to collaborate with the Rackspace Cloud team on the development of the Cloud Servers API. Working together with the Rackspace team enabled me to develop my application much more quickly. I’m very pleased with the open feedback process as it makes my work a lot easier.”

Just Launched: The Rackspace Cloud Tools Website

// August 25th, 2009 // Comments Off // Announcements, Community, Development

At The Rackspace Cloud, we believe that a key component to the growth of an open cloud is a strong cloud ecosystem.  Step one in this effort was the release of our APIs – and we didn’t build these alone. We asked our partners what they wanted and their feedback served as a key factor in driving this effort forward. We made these available as open source so anyone can incorporate them directly into their products.

Step two has been the ongoing joint innovation and collaboration with our partners.  To ensure we build and provide all of the solutions required for customers to develop, deploy or manage their environments in the cloud, we rely heavily on our partners.  We’ve been actively working with our partners to develop the tools and applications that our customers and the market need today.  These partner solutions will make it easier for customers to utilize the cloud when it makes sense, and accelerate the pace of innovation in the marketplace.

Today, The Rackspace Cloud is taking another big step forward in an effort to encourage openness, innovation and collaboration in the cloud. We’re proud to announce the launch of Cloud Tools (tools.rackspacecloud.com), an online site that brings together tools, applications and services created for The Rackspace Cloud by our partners and the community of developers.

A little bit about Cloud Tools

•    It includes offerings from Featured Partners with whom we’ve worked extensively to define customer needs and build offerings to serve them.
•    It provides a venue for the Community to list the projects and solutions they’re building.
•    It’s one destination for browsing and researching tools that extend the functionality of The Rackspace Cloud.
•    It offers a way to easily raise awareness about our partners’ tools and applications, driving more customers to their products. And by driving more customers to their products, we hope to encourage more innovation from the ecosystem.

Openness and collaboration will continue to define our engagement with our partners and the development community.  Should you wish to speak with the team, please send an email to cloudpartners@rackspacecloud.com or feel free to email me directly at jim.curry@rackspace.com.

Jim Curry
Vice President, Corporate Development
Twitter:  @jimcurry