Archive for September, 2009

The Rackspace Cloud Welcomes Harper Reed, Nepholologist

// September 30th, 2009 // 2 Comments » // Community

Emil Sayegh, General Manager, The Rackspace Cloud

You may have read on Silicon Angle yesterday that Harper Reed, former CTO of Threadless, is joining The Rackspace Cloud as our very own Nepholologist. What is a Nepholologist? Feel free to look it up. ;) He’ll be working on cool technology for our customers, evangelizing The Rackspace Cloud, and serving as an all-around technical expert to help customers get on board The Rackspace Cloud. I am personally very excited to have him join the Rackspace family, and on behalf of the entire team here, I would like to give him a very warm welcome.

Harper brings a wealth of experience in technology and business development as one of the first employees at the trendy and highly successful startup Threadless (a long-time and much-loved Rackspace customer). He accomplished great things at Threadless (selling something like 5M shirts during his time there) and is ready to take on new challenges. As a matter of fact, this is how we met him. He was an amazing customer of both our Cloud and Dedicated businesses, always helping us get better by providing critical feedback. We loved his passion and I’d say he loved our passion as well. Harper is joining us because he is eager to put what he has learned over the years to use by helping startups solve their computing problems:

“I’m excited about companies that are creating products which are simple and easy to consume. A lot of iPhone developers are doing great addressing this challenge – creating a simple app that solves a simple problem that I could explain to my mom over the phone. I want to help the next generation of companies reach the scale of Twitter, Facebook, and of course Threadless. That’s cool.”

Harper has told me, and the many Rackers he speaks with frequently here, that joining the cloud computing movement, and specifically The Rackspace Cloud, is an exciting opportunity for him. He is fascinated with cloud computing and how it’s democratizing hosting. Harper has also always been a great culture fit with Rackspace – he’s fanatical about what he does, and so are we. With his experience in infrastructure and his passion for helping customers, Harper is going to be a great advisor to both startups and established companies on how to leverage the cloud for their businesses.

“Making hosting and server technology cheap and accessible will hopefully get more people building better products and launching them without losing their shirts.   I’m into that!”

We’re excited to have him join our cause and can’t wait to see all the cool things he will bring to the table – and yes, they will be cool. He also plays YoYo professionally. Check it out.

The Cassandra Project

// September 23rd, 2009 // 7 Comments » // Development

By Jonathan Ellis, Systems Architect

You may have heard about the Cassandra distributed database in recent articles or conferences. I’d like to explain what advantages Cassandra offers over traditional relational databases like MySQL or Oracle and why Rackspace has committed resources to the Cassandra project.

The Cassandra project was started by Facebook in 2007 to scale their internal applications, particularly Inbox Search. Earlier this year, they released it to the Apache Incubator, where other people from the community could become involved and start contributing. This allowed the project to move in a direction that serves general use rather than just Facebook’s needs.

In March, I became the first outside committer to this Apache Incubator project. Eric Evans from Rackspace and Jun Rao from IBM Research soon followed, and we recently added Chris Goffinet from Digg. The community has grown from 5 people in the IRC channel in December to  over 60.

Distributed vs. Relational Databases

Traditional relational databases are 30 years old, are well understood, and have a huge ecosystem of tools around them. For that reason, they are a compelling option when building your application. Postgres, MySQL, and Oracle are all relational databases that model a schema as entities and relations between those entities. That’s a good, powerful programming model with interesting theoretical properties. But companies with large amounts of data have already gone past what you can reasonably fit on a single machine, even on high-end hardware, and it’s provably impossible to keep the traditional relational model, in particular the ACID properties, while scaling across multiple machines. Even if you’re willing to give up availability, scaling reads (via caching and replication) is difficult with relational databases, and scaling writes by partitioning is either very expensive, very painful from an application programming and operations standpoint, or both.

Cassandra is taking the approach that, given that you’re going to have to give up some parts of the relational model to scale, let’s start over and rethink things. Let’s add things like transparent replication and failover, built-in partitioning and load balancing, multiple data center support, and the ability to add capacity without ever disturbing applications running against the database.
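To make "built-in partitioning" and "adding capacity without disturbing applications" a little more concrete, here is a minimal Python sketch of a consistent-hash ring, a general technique for spreading keys across machines. It is a toy under stated assumptions and not Cassandra's actual code: the `Ring` class, the node names, the use of MD5, and the replica count are all made up for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring: a key belongs to the first node at or after its token."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self.tokens = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        # New capacity is just one more token on the ring.
        bisect.insort(self.tokens, (token(node), node))

    def nodes_for(self, key):
        """Return the owner of `key` followed by the next nodes clockwise (its replicas)."""
        start = bisect.bisect_left(self.tokens, (token(key), ""))
        count = min(self.replicas, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


# Hypothetical three-node cluster and 1,000 hypothetical keys.
ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.nodes_for(k)[0] for k in keys}

ring.add_node("node-d")
after = {k: ring.nodes_for(k)[0] for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys changed owner after adding node-d")
```

The point of the design is that every key that changes owner moves to the new node; keys on the other machines stay put, so applications reading and writing them are not affected.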

Rackspace’s Involvement

The original Facebook team has been busy elsewhere, so the community has had to step up and take the initiative in moving Cassandra forward.  Cassandra is open source and I don’t want to downplay others’ contributions, including those from IBM Research, Digg, and Twitter as well as other companies and individuals, but I’m proud that Rackspace’s support has been instrumental in adding many important new features, fixing bugs, and getting out new releases.

Here are 3 reasons why Rackspace has committed resources:

1-    As stated in previous posts by Erik Carlin, we are committed to an Open Cloud. With Amazon’s SimpleDB or Google App Engine’s datastore, you’re locked in. Cassandra presents an open alternative: you can write against Cassandra and deploy anywhere.  That’s important.

2-    We have a suite of Cloud products that are productized beyond just the raw Cloud Servers. Cassandra is interesting to us because we can use it under the hood to improve Cloud Sites and Cloud Files. And people are already starting to ask, “When can I just go to Rackspace and deploy a preconfigured Cassandra cluster?” It’s still early, but that’s definitely something we’re looking at.

3-    Rackspace itself has a ton of data that we generate from our switches and routers and the rest of our infrastructure. Right now we are getting by with traditional monitoring and logging technologies, searching those logs and so forth. Cassandra will help us a lot with that as our volumes continue to increase. Our Mail & Apps products are also very interested in using Cassandra to store mail messages and other data.

Finally, I want to emphasize that Cassandra is not a magic bullet. You can’t just take your SQL app, put it on Cassandra, and expect it to work. It’s a different programming model: instead of modeling entities and relationships and then adding indexes to get performance, you need to think at a more basic level (“What information do I need to retrieve from each query?”) and model your Cassandra schema accordingly. It’s a different way of thinking and does require new code to be written. It’s very much for people who have a lot of data that doesn’t fit on a single machine and are feeling the pain from traditional approaches to scaling that.
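As a rough illustration of "model your schema around the query," here is a small Python sketch. It uses plain dictionaries as a stand-in for a row-key / column layout and does not use any Cassandra API; the inbox example, the field names, and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Relational habit: one normalized "messages" table, with an index added later
# so that "WHERE recipient = ? ORDER BY sent_at DESC" performs well.
messages = [
    {"id": 1, "recipient": "alice", "sent_at": 100, "subject": "Welcome"},
    {"id": 2, "recipient": "bob", "sent_at": 105, "subject": "Invoice"},
    {"id": 3, "recipient": "alice", "sent_at": 110, "subject": "Re: Welcome"},
]

# Query-first habit: start from the question "what is in this user's inbox,
# newest first?" and store rows that are already shaped like the answer.
# Row key = recipient; column name = (sent_at, id); column value = subject.
inbox_by_user = defaultdict(dict)


def store_message(msg):
    """Write the message under the row of the user who will later read it."""
    inbox_by_user[msg["recipient"]][(msg["sent_at"], msg["id"])] = msg["subject"]


def read_inbox(user, limit=10):
    """Answer the inbox query from a single row, newest message first."""
    row = inbox_by_user.get(user, {})
    newest_first = sorted(row.items(), reverse=True)[:limit]
    return [subject for _, subject in newest_first]


for m in messages:
    store_message(m)

print(read_inbox("alice"))  # ['Re: Welcome', 'Welcome']
```

The trade-off is that a second question (say, "all messages with a given subject") would need its own row layout, written at the same time, rather than another index on the same table.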

We plan to write some other posts in the future detailing what a switch might look like for some sample applications.

Winner of TechCrunch50 RedBeacon on The Rackspace Cloud

// September 23rd, 2009 // 4 Comments » // Community

Angela Bartels, Cloud Maven

When you need a service done – maybe house painting, landscaping, or even plumbing – it can be time consuming to go out and find the best candidate for your job. You may go through your Yellow Pages, send an email to friends and family asking for a referral, or even just Google it. Even then, you have to evaluate each contender: what’s the best value for the price you want to pay? You are the one paying, so why should it require so much time and effort on your part?

RedBeacon, a Rackspace Cloud customer (on Slicehost), was designed to solve this problem. You go to their website and enter the service you need; they notify the providers in your area that they believe can serve you best and invite them to submit a price quote for your job. You receive a list of interested service providers, with descriptions and price quotes. Choose the best one and simply book an appointment with them. The workload is not on you anymore.

It’s no surprise that these guys were the winners of TechCrunch50 this year. When Robert Scoble, also a TechCrunch50 panelist, interviewed them for a Building43 video, he said:

“Just interviewed @redbeacon – I am even more impressed with their thinking now (TC50 winner). I was wrong to not pick them.”

I had the opportunity to catch up with Aaron Lee, co-founder of RedBeacon, to find out more about the origin of RedBeacon, their experience at TechCrunch50 and why they chose cloud computing.

As ex-Google product employees, the founders of RedBeacon noticed that finding local service providers was difficult and that existing technology had not been leveraged to make the lives of consumers better and easier. The idea was to fulfill this need, and that’s essentially how RedBeacon came to be.

TechCrunch50 was the perfect platform for RedBeacon to launch their product, receive feedback from the experts, and network. Lee says:

“For any startup, TechCrunch50 is definitely a must attend event. It’s a great way for a startup to gain exposure and recognition. We received great advice and feedback from TechCrunch, panel judges, and the audience. The guidance and presentation feedback from Jason and Michael was just phenomenal. They also provide a low cost way for young startups to showcase their ideas to industry veterans and VCs.”

Currently, RedBeacon is signing up service providers (it’s free) to establish a good foundation. Once they seed the database, they will launch to consumers in early October. They will only make money (a 10% commission) when the service provider makes money. Their next steps will focus on deploying exciting new features, such as leveraging the social graph and partnerships with merchants, associations, and publishers.

As a startup venture, RedBeacon wanted a cloud computing provider where they could easily turn servers off and on when needed. They chose Slicehost for this very reason, but most importantly for the support and the control panel:

“Slicehost has kickass 24/7 technical support – it’s like having a master system admin on your team and only for a fraction of the cost. The UI is slick and intuitive. Bringing up and down servers is a breeze. We spun up a dozen servers easily to handle the load during TechCrunch50. Overall I am very happy with Slicehost and would definitely recommend it to others.”

Congrats RedBeacon!

Coding in the Cloud – Rule 6 – HTTP Includes

// September 22nd, 2009 // 1 Comment » // Development

By Adrian Otto

This continues my series on Rules for Coding in the Cloud. These are rules I’ve developed after watching applications encounter problems at scale when deployed on Cloud Sites.

Rule 6:  Never use HTTP include. Let me explain.

How does an HTTP include work?

You tell your PHP application, “I want to include a file.” For the file name, you supply a URL, which the server must download.  A client makes a connection to a PHP web server, the PHP web server runs an application, the application opens a file, and the file type is a URL. The server makes contact with another server, downloads this URL and puts the output into the PHP script.

Why is this a problem?

This results in not only a huge security problem, but also a performance problem. And now you’re faced with a potential outcome that could be disastrous—an infinite loop in an elastic server environment. You can accidentally create an HTTP include which includes something from your own site, which includes something from your site, which includes something from your site, and… well, you get the idea. If you do that, you’ll get a single client connection, which will open a connection to itself, over and over, until you have 50,000 of them running in parallel. The last connection will then hit the limit that you’re allowed to create and the entire thing will roll all the way back. You’ll get a failure, and the whole application will proceed as if it never happened.  Unfortunately, you will not be aware of this issue until you receive your bill with an outrageous amount of compute cycle usage. The cloud had to do huge amounts of work that you couldn’t even see!  That’s really the scary part about this scenario because the site looks like it’s working just fine. When you browse through your site, it comes up relatively quickly because that just scales through the entire system.  Meanwhile, The Rackspace Cloud is receiving alerts. You may not even know that your site has done the equivalent of 50,000 hits for every single hit.

In addition, you may also inadvertently involve someone else’s site. If you have two interdependent sites, the two may end up fighting back and forth, creating a massive loop.  And because the server is making the HTTP connection, the browser is completely unaware of it, so the browser’s anti-loop code won’t prevent it.  There’s no way to break the loop because there’s no way to see where it starts.
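The loop is easier to see in a small simulation. The Python sketch below is a toy model, not how Cloud Sites works internally: the `INCLUDES` map, the page names, and the limit of 50 (standing in for the 50,000 above) are all assumptions for illustration. Each include costs one server-side request, and the visitor's browser only ever makes the first one.

```python
# Hypothetical site map: each page lists the URLs it pulls in server-side.
INCLUDES = {
    "/index": ["/header"],
    "/header": ["/index"],  # accidental cycle back to the page that included it
}


class ConnectionLimitReached(Exception):
    """Raised when the chain of nested includes exceeds the allowed connections."""


def render(url, limit, stats, depth=1):
    """Serve `url`, opening one nested server-side connection per include."""
    if depth > limit:
        raise ConnectionLimitReached(url)
    stats["requests"] += 1
    for included in INCLUDES.get(url, []):
        render(included, limit, stats, depth + 1)
    return "<html>...</html>"


def serve(url, limit=50):
    """Handle one browser hit; return (succeeded, server-side requests it caused)."""
    stats = {"requests": 0}
    try:
        render(url, limit, stats)
        ok = True
    except ConnectionLimitReached:
        ok = False  # the whole chain unwinds; the work was still done
    return ok, stats["requests"]


print(serve("/index"))  # (False, 50): one hit became 50 requests
print(serve("/about"))  # (True, 1): a page with no includes costs one request
```

Nothing in the visitor's single request reveals the other 49; they only show up in the compute usage.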

There is more than one way to do an HTTP include. One of them actually allows you to include PHP code from a remote URL and execute it as part of the local application. This feature (gaping hole) in PHP is actually disabled on Cloud Sites. What does work is using an fopen() call where the argument is a URL. This allows you to read data from that file handle and process it (potentially just printing it out to the browser). Try not to be tempted to eval() any of that output.
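For comparison, here is what "read it as data, never execute it" might look like outside PHP. This is a hedged Python sketch, not code from Cloud Sites or from Rule 4: the `OWN_HOSTS` set, the function names, and the timeout and size limits are assumptions. It reads a remote URL into a string and refuses any URL that points back at the application's own hosts, which is the case that produces the loop described above.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical: the host names this application itself is served from.
OWN_HOSTS = {"www.example.com", "example.com"}


def is_self_include(url, own_hosts=OWN_HOSTS):
    """Return True when `url` points back at one of this site's own hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in own_hosts


def fetch_as_data(url, timeout=5, max_bytes=65536):
    """Download remote content as text. The result is data only: never eval/exec it."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("only http(s) URLs are fetched")
    if is_self_include(url):
        raise ValueError("refusing to include a URL from this site")
    with urlopen(url, timeout=timeout) as response:
        body = response.read(max_bytes)  # cap how much is read
    return body.decode("utf-8", errors="replace")
```

Content that lives on your own site is better read from the local filesystem or a shared function than fetched over HTTP.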

This may strike you as familiar advice. I mentioned a similar subject in Rule 4 – Avoid External Dependencies and included a code example of how to download content from a remote site on demand, cache a local copy, and provide non-blocking access to that data. The reason why this is a separate rule is I’ve seen it broken repeatedly, but not as an external dependency. It’s a risk of a circular internal (or external) dependency. People find reasons to HTTP include content from their own site but please try not to! What seems like an innocent include eventually leads to the infinite loop situation described above.

Bottom line: Never use HTTP include.


The Work of Open Cloud

// September 21st, 2009 // 2 Comments » // Community, Development, Events

By Erik Carlin, Senior Architect

I just returned from Philadelphia where the DMTF Open Cloud Standards Incubator group had a three-day face-to-face meeting on cloud computing standardization.  In addition to Rackspace, other companies present included:  CA, Cisco, Citrix, Hitachi, HP, IBM, Intel, Microsoft, Sungard, and VMware.

The purpose of the incubator group is essentially to get the ball rolling: to frame the problem and lay a foundation upon which specific cloud standards can be developed via new or existing DMTF working groups (e.g. OVF) and other SDOs. (SDO, which stands for Standards Development Organization, is a new TLA, or three-letter acronym, that I learned this week.) Specifically, this includes things like defining a cloud taxonomy, laying out use cases, and identifying specific areas ripe for standardization. The work of open cloud is arduous, and I appreciate the energy and commitment of the group. Three full days of debating terms, concepts, technology, use cases, etc. can be exhausting! Nevertheless, it is necessary, and it is the means to an end: specific cloud standards that will add real value for customers.

Rackspace has been involved in a number of early conversations and meetings over the past 18 months around cloud standards.  It has been frustrating because there has been little to no progress (albeit the efforts were always well intentioned).  It’s exciting now to see more formality around cloud standards development (from the DMTF and others) as well as a coalescing of various standardization efforts (e.g. http://cloud-standards.org).  We had a number of discussions about the work other groups are doing so there is both awareness and intentionality with regard to collaboration.

Rackspace has never believed in “lock-in.” We want to earn your business through Fanatical Support. Even in our traditional hosting business, where the single-tenant nature and the ability to uniquely customize infrastructure necessitate a contract, we have the Fanatical Support Promise, which lets you out if you feel we haven’t lived up to our promises. If you believe you are best served by going elsewhere, you should have the freedom to move. And the cloud should be no different. While the cloud doesn’t require a contract, there are APIs, image formats, etc. that, in the absence of standards, will be proprietary and hinder portability. That shouldn’t be the case, and solid, well-received cloud standards are the key to avoiding cloud lock-in.

We can’t predict where cloud standards will go, but we are committed to participating in the process and helping create a world of open clouds.  Without them, we will never realize the full potential of the cloud.

Have a question? Send me an email.