
Archive for September 23rd, 2009

The Cassandra Project

// September 23rd, 2009 // Development

By Jonathan Ellis, Systems Architect

You may have heard about the Cassandra distributed database in recent articles or conferences. I’d like to explain what advantages Cassandra offers over traditional relational databases like MySQL or Oracle and why Rackspace has committed resources to the Cassandra project.

The Cassandra project was started by Facebook in 2007 to scale their internal applications, particularly Inbox Search. Earlier this year, they contributed it to the Apache Incubator, where other people from the community could get involved and start contributing. This has allowed the project to move in a direction that serves the wider public rather than just Facebook’s needs.

In March, I became the first outside committer to this Apache Incubator project. Eric Evans from Rackspace and Jun Rao from IBM Research soon followed, and we recently added Chris Goffinet from Digg. The community has grown from 5 people in the IRC channel in December to over 60.

Distributed vs. Relational Databases

Traditional relational databases are 30 years old, are well understood, and have a huge ecosystem of tools around them. For that reason, they’re a compelling option when building your application. Postgres, MySQL, and Oracle are all relational databases that model a schema as entities and the relations between those entities. That’s a good, powerful programming model with interesting theoretical properties. But companies with large amounts of data have already gone past what you can reasonably fit on a single machine, even on high-end hardware, and it’s provably impossible to keep the traditional relational model, in particular the ACID properties, while scaling across multiple machines. Even if you’re willing to give up availability, scaling reads (via caching and replication) is difficult with relational databases, and scaling writes by partitioning is very expensive, very painful from an application programming and operations standpoint, or both.
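To make the pain of hand-rolled write partitioning concrete, here is a minimal sketch. It assumes a hypothetical application-level scheme that picks a database server with `hash(key) % N`. The function names and keys are illustrative, not taken from any real product. The code counts how many keys change servers when a three-server cluster grows to four.

```python
import hashlib


def stable_hash(key: str) -> int:
    # MD5 gives a hash that is stable across processes; Python's built-in
    # hash() for str is randomized per process, so it can't route data.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def shard_for(key: str, num_servers: int) -> int:
    # Naive (hypothetical) application-level partitioning: hash mod server count.
    return stable_hash(key) % num_servers


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    # Fraction of keys whose server changes when the cluster is resized.
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    print(f"{fraction_moved(keys, 3, 4):.1%} of keys move going from 3 to 4 servers")
```

With a uniform hash, a key stays put only when h mod 3 equals h mod 4. That holds for 3 of every 12 residues, so roughly 75% of the data has to be migrated just to add one server.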

Cassandra’s approach is this: since you’re going to have to give up some parts of the relational model to scale anyway, start over and rethink things. That means adding transparent replication and failover, built-in partitioning and load balancing, multiple data center support, and the ability to add capacity without ever disturbing applications running against the database.
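Built-in partitioning and adding capacity without disturbing applications are commonly achieved with a consistent-hashing ring. In broad strokes, Cassandra places each node at a token on a ring and assigns a key to the first node at or after the key’s token, and its simple replication strategy puts the extra replicas on the following nodes around the ring. The sketch below is a simplified illustration of that general technique, not Cassandra’s code. The MD5-based token, the node names, and the replication factor of 3 are assumptions chosen for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # MD5-based token (an illustrative assumption, in the spirit of a random partitioner).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Sorted (token, node) pairs: each node's position on the ring is its token.
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        # A new node takes over only part of its successor's range;
        # every other key keeps its owner.
        bisect.insort(self._ring, (token(node), node))

    def replicas(self, key):
        # Walk clockwise from the key's token. The first node is the primary
        # owner, and the next distinct nodes hold the additional replicas.
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_left(tokens, token(key)) % len(self._ring)
        count = min(self.rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    ring = Ring(["node-a", "node-b", "node-c"])
    before = {k: ring.replicas(k)[0] for k in keys}
    ring.add_node("node-d")
    after = {k: ring.replicas(k)[0] for k in keys}
    moved = [k for k in keys if before[k] != after[k]]
    # Every key that changed owner now belongs to the new node.
    print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Compare this with the modulo scheme. Adding a node here never shuffles keys between the existing nodes. Only the slice of the ring that the new node claims is moved, which is what lets capacity grow without touching the applications. Real deployments pick tokens deliberately to balance load rather than hashing node names.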

Rackspace’s Involvement

The original Facebook team has been busy elsewhere, so the community has had to step up and take the initiative in moving Cassandra forward. Cassandra is open source, and I don’t want to downplay others’ contributions, including those from IBM Research, Digg, and Twitter as well as other companies and individuals, but I’m proud that Rackspace’s support has been instrumental in adding many important new features, fixing bugs, and getting new releases out.

Here are three reasons Rackspace has committed resources:

1- As stated in previous posts by Erik Carlin, we are committed to an Open Cloud. With Amazon SimpleDB or Google App Engine’s datastore, you’re locked in. Cassandra presents an open alternative: you can write against Cassandra and deploy anywhere. That’s important.

2- We have a suite of Cloud products that goes beyond the raw Cloud Servers. Cassandra interests us because we can use it under the hood to improve Cloud Sites and Cloud Files. And people are already starting to ask, “When can I just go to Rackspace and deploy a preconfigured Cassandra cluster?” It’s still early, but that’s definitely something we’re looking at.

3- Rackspace itself has a ton of data that we generate from our switches, routers, and the rest of our infrastructure. Right now we are getting by with traditional monitoring and logging technologies, searching those logs and so forth. Cassandra will help us a lot with that as our volumes continue to increase. The teams behind our Mail & Apps products are also very interested in using Cassandra to store mail messages and other data.

Finally, I want to emphasize that Cassandra is not a magic bullet. You can’t just take your SQL app, put it on Cassandra, and expect it to work. It’s a different programming model. Instead of modeling entities and relationships and adding indexes to get performance, you need to think at a more basic level: “What information do I need to retrieve from each query?” Then model your Cassandra schema accordingly. It’s a different way of thinking and does require new code to be written. It’s very much for people who have a lot of data that doesn’t fit on a single machine and are feeling the pain of traditional approaches to scaling it.
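As a small illustration of modeling around queries, suppose the query is “show a user’s most recent messages,” an assumption chosen to echo the Inbox Search use case. A relational design would typically keep messages and recipients in separate tables and join them at read time. The sketch below instead uses plain Python dictionaries to mimic a column-family layout: one row per user, with columns named by timestamp and holding message IDs. The query then becomes a single row read. All names are hypothetical, and this is not Cassandra’s client API.

```python
from collections import defaultdict

# Hypothetical "column family": row key -> {column name -> value}.
# The row key is the user; column names are timestamps, so all of a
# user's message references live together in one row.
inbox_by_user: dict[str, dict[int, str]] = defaultdict(dict)


def add_message(user: str, timestamp: int, message_id: str) -> None:
    # Write path: denormalize up front by storing the message ID
    # directly in the recipient's row.
    inbox_by_user[user][timestamp] = message_id


def recent_messages(user: str, limit: int) -> list[str]:
    # Read path: one row lookup, then the newest `limit` columns.
    # (Cassandra keeps a row's columns sorted; this sketch sorts at read time.)
    row = inbox_by_user.get(user, {})
    return [row[ts] for ts in sorted(row, reverse=True)[:limit]]


if __name__ == "__main__":
    add_message("alice", 100, "msg-1")
    add_message("alice", 300, "msg-3")
    add_message("alice", 200, "msg-2")
    print(recent_messages("alice", 2))  # ['msg-3', 'msg-2']
```

The design choice is to pay at write time, by copying data into the shape each query needs, so that reads never require a join across machines.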

We plan to write future posts detailing what a switch to Cassandra might look like for some sample applications.

TechCrunch50 Winner RedBeacon on The Rackspace Cloud

// September 23rd, 2009 // Community

Angela Bartels, Cloud Maven

When you need a service done, maybe house painting, landscaping, or even plumbing, it can be time-consuming to find the best candidate for the job. You may search the Yellow Pages, email friends and family asking for a referral, or just Google it. Even then, you have to evaluate each contender to find the best value for the price you want to pay. You’re the one paying, so why should it take so much time and effort on your part?

RedBeacon, a Rackspace Cloud customer (on Slicehost), was designed to solve this problem. You go to their website and enter the service you need. They notify the providers in your area they believe can serve you best and invite them to submit a price quote for your job. You then receive descriptions and price quotes from the providers interested in doing the work. Choose the best one and simply book an appointment. The legwork is no longer on you.

It’s no surprise that these guys won TechCrunch50 this year. After Robert Scoble, a TechCrunch50 panelist, interviewed them for a Building43 video, he wrote:

“Just interviewed @redbeacon – I am even more impressed with their thinking now (TC50 winner). I was wrong to not pick them.”

I had the opportunity to catch up with Aaron Lee, co-founder of RedBeacon, to find out more about the origin of RedBeacon, their experience at TechCrunch50 and why they chose cloud computing.

As former Google product employees, RedBeacon’s founders noticed that finding local service providers was difficult and that, unfortunately, existing technology hadn’t been leveraged to make consumers’ lives better and easier. Fulfilling that need is essentially how RedBeacon came to be.

TechCrunch50 was the perfect platform for RedBeacon to launch their product, get feedback from experts, and network. Lee says:

“For any startup, TechCrunch50 is definitely a must attend event. It’s a great way for a startup to gain exposure and recognition. We received great advice and feedback from TechCrunch, panel judges, and the audience. The guidance and presentation feedback from Jason and Michael was just phenomenal. They also provide a low cost way for young startups to showcase their ideas to industry veterans and VCs.”

Currently, RedBeacon is signing up service providers (for free) to build a solid foundation. Once they seed the database, they will launch to consumers in early October. They will only make money (a 10% commission) when the service provider makes money. Their next steps will focus on exciting new features such as leveraging the social graph and building partnerships with merchants, associations, and publishers.

As a startup, RedBeacon wanted a cloud computing provider that would let them easily turn servers on and off as needed. They chose Slicehost for that reason, but most importantly for its support and control panel:

“Slicehost has kickass 24/7 technical support – it’s like having a master system admin on your team and only for a fraction of the cost. The UI is slick and intuitive. Bringing up and down servers is a breeze. We spun up a dozen servers easily to handle the load during TechCrunch50. Overall I am very happy with Slicehost and would definitely recommend it to others.”

Congrats RedBeacon!