Bret Piatt

Tag: cloud computing

How to tell the difference between “cloud” and “virtualization”

by Bret Piatt on Feb.07, 2010, under Technology

Many people seem to think “cloud” is just off-premise “virtualization”.  Cloud comes in a few flavors, and I’ll argue that you can have “private cloud” hosted either off-premise in a provider’s facility or in your own.  The fundamental difference between cloud and virtualization is that the goal of cloud is to automate provisioning (this applies to IaaS, PaaS, and SaaS), while the goal of virtualization is to optimize resource utilization.  You can (and many providers do) use virtualization as the basis for building a cloud, but it is not required.

In the Reductive Labs presentation from OpsCamp, slide 3 illustrates the primary benefit of cloud.  Cloud helps companies even when their minimum unit of work is larger than a single host machine, a case where virtualization only adds overhead.  The difference between “cloud” and “grid computing” or HPC is that grid/HPC systems process jobs in batches rather than serving interactive applications.  You can build a compute grid on top of a cloud, but not vice versa.

Other folks are saying “private clouds can’t exist because you can’t have rapid elasticity and pay for what you use”.  A small company may not be able to have a private cloud, but a large enterprise with many business units certainly can.  An IT infrastructure BU can provide the other organizations in the company with everything a cloud requires.

For public cloud to succeed, providers need to deliver all three.

Depending on current utilization across an enterprise’s infrastructure, it may be able to defer spending for a number of years by moving to a fully cloud-enabled business.  Right now many departments cling to servers they don’t need because they’re afraid that if they release them they’ll never get them back.  With cloud removing that fear, resource hoarding ends and many enterprises will see a significant increase in available computing power.
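Why ending hoarding frees capacity can be sketched with a toy model. It assumes each department today holds enough servers for its own peak, while a shared internal cloud only has to cover the peak of combined demand, and that the departments' peaks don't all coincide. The department names and demand numbers are hypothetical.

```python
def hoarded_capacity(demands: dict[str, list[int]]) -> int:
    """Servers needed when each department keeps enough for its own peak."""
    return sum(max(series) for series in demands.values())


def pooled_capacity(demands: dict[str, list[int]]) -> int:
    """Servers needed when one shared pool covers the peak of combined demand."""
    combined = [sum(period) for period in zip(*demands.values())]
    return max(combined)


# Hypothetical server demand per department over four periods.
demands = {
    "finance": [10, 4, 4, 12],
    "marketing": [3, 9, 5, 3],
    "web": [6, 6, 14, 5],
}

hoarded = hoarded_capacity(demands)
pooled = pooled_capacity(demands)
print(f"hoarded: {hoarded}, pooled: {pooled}, freed: {hoarded - pooled}")
```

If every department peaked at the same moment the two figures would match, so the real savings depend on how staggered demand actually is.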

Over the long term, if the public computing clouds continue to grow, increase their transparency, and optimize their delivery models, it will no longer make financial sense for enterprises to build their own infrastructure.  Public cloud providers will need to prove over the next decade that they can deliver on all three corners of the “impossible triangle”.


Public clouds and their features, followed by the future of cloud computing hardware

by bretpiatt on Dec.20, 2009, under Technology

I’m going to break this post into two sections.  The first will discuss public clouds and their features, using advanced networking as an example.  The second will look at the future of cloud computing hardware, both networking and computing.

Public Clouds and Feature Selection

A discussion started on Twitter today after Werner Vogels (@Werner) tweeted a link to a blog post by James Hamilton about the future of networking, entitled “Networking: The Last Bastion of Mainframe Computing”.  Christopher Hoff hasn’t been thrilled (understatement of 2009) with the networking features provided by cloud computing platforms, both public and private.  Unless I misunderstood his tweet, he’d love to hear public cloud providers commit to a flexible, API-driven networking layer using technology such as OpenFlow.

I tossed back a question asking, “Are customers willing to pay for complex network customization in a cloud? If so, what percentage of them? Thoughts?” and he replied, “In terms of paying for parity in what I can do in even a basic enterprise today? No thanks. That’s on you as a provider in long term”.  I threw this question out because herein lies the problem: public clouds will only end up with the features that a broad market will pay for or that a small market will pay a very significant premium for.  The reason is that when a cloud adds a core feature, it adds it everywhere.  This leads providers to invest only in new features that enough of their customers are interested in to offset the cost of deployment and still yield a satisfactory return on capital.
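The investment logic above amounts to a break-even test. This is a minimal sketch, assuming a single one-time deployment cost, a flat monthly premium per paying customer, and a simple return hurdle that ignores the time value of money; the class, function, and dollar figures are hypothetical and not drawn from any provider's actual numbers.

```python
from dataclasses import dataclass


@dataclass
class FeatureCase:
    """Hypothetical inputs for rolling one core feature out across a whole cloud."""
    deployment_cost: float    # one-time cost to deploy the feature everywhere
    paying_customers: int     # customers who will pay for the feature
    monthly_premium: float    # extra revenue per paying customer per month
    horizon_months: int       # period over which the investment must pay off
    required_return: float    # hurdle, e.g. 0.25 = cost plus 25%


def worth_building(case: FeatureCase) -> bool:
    """True if revenue over the horizon covers cost plus the required return."""
    revenue = case.paying_customers * case.monthly_premium * case.horizon_months
    hurdle = case.deployment_cost * (1 + case.required_return)
    return revenue >= hurdle


broad_market = FeatureCase(2_000_000, 6_000, 20.0, 24, 0.25)
small_market = FeatureCase(2_000_000, 100, 20.0, 24, 0.25)
small_market_big_premium = FeatureCase(2_000_000, 100, 1_200.0, 24, 0.25)

for case in (broad_market, small_market, small_market_big_premium):
    print(worth_building(case))
```

The same feature clears the hurdle either with thousands of customers paying a little or with a handful paying a lot, which mirrors the "broad market or very significant premium" point.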

Today at Rackspace, customers that want advanced networking configurations are directed to our Private Cloud platform (I say “our” because I’m employed by Rackspace; the opinions expressed here, however, are mine alone).  They can then create security zones, use IPS/IDS, and enable enhanced DDoS defense services, all behind dedicated firewalls and load balancers.  The private cloud environment can have bridged network segments that connect to a public Rackspace Cloud Servers(tm) configuration for workloads that do not require advanced networking.  The current addressable market interested in both public cloud as a primary platform and advanced networking is small.  The early adopter group of start-ups and SMBs typically either doesn’t need advanced networking or isn’t willing to pay for it, and the enterprises that are willing generally aren’t first movers on new technology.

As the public cloud market matures, the addressable market will grow and you’ll start to see public cloud providers adding advanced networking capabilities, though the cloud definition of “advanced” won’t ever be truly “cutting edge” on a mass market cloud.  I expect niche clouds to emerge that cater to specific application use cases, with advanced features for their target customers.  Early examples of this are Force.com and the OpSource Cloud.

The Future of Cloud Computing Hardware

I’m now going to loop back to James’s post that kicked this whole thing off, where he compared the current network device situation to mainframes and other vertically scaled, centralized systems.  He asserted that we’ll see a commoditization of the networking layer similar to what we’ve seen in the storage layer through technologies like RAID and in servers through x86.  RAID and x86 have been successful because they are multi-purpose, capable of serving a broad range of applications well with proper configuration.

Networking gear is very different because its workloads are uniform, and for a uniform workload an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array) tailored to that specific type of workload will enable better performance per dollar.  The second core difference between the server/storage markets and networking is that once you step into “carrier/cloud class” networking equipment, only a few hundred potential customers exist, and markets with fewer, stronger customers tend to be more consolidated.  Networking gear has also been “cloud like” for over a decade now.  Let’s look at the NIST requirements for a cloud:

On-demand self-service – This requirement is for a cloud-to-user relationship.  I’ll translate this to a network-cloud-to-network-engineer relationship.  For network engineers, all carrier class networking gear supports SNMP, along with other programmable configuration methods through management systems with APIs such as the Cisco Configuration Engine.

Rapid elasticity – This dates back to frame relay, where the concept of a CIR (Committed Information Rate) was introduced.  The space has continually evolved, from QoS on ATM up through today’s advanced dynamic algorithmic traffic routing over IP/MPLS networks.

Resource pooling – Doing this for computing is new outside of the HPC market, while telecommunication networks have been multi-tenant since the third phone was hooked up over 100 years ago.

Measured Service – Networking has been doing this for years as well, down to the minute or byte of data instead of the hour or GB (the smallest unit of measure any public cloud compute or storage platform bills in).

Broad network access – Service provider IP networks are the ultimate in heterogeneous access through standards based communication.  They support connectivity over a number of layer 1 physical mediums using quite a few layer 2 communication protocols.
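The billing-granularity point under Measured Service can be made concrete with round-up metering. This is a minimal sketch, assuming each metered quantity is rounded up to the next whole billing unit (a common scheme, though providers differ); the usage figures are made up.

```python
import math


def billed_units(usage: float, unit: float) -> int:
    """Whole billing units charged when usage is rounded up to the next unit."""
    return math.ceil(usage / unit)


def overbilled(usage: float, unit: float) -> float:
    """Usage paid for but not consumed because of rounding up."""
    return billed_units(usage, unit) * unit - usage


# 61 minutes of compute, metered per minute vs. per hour.
print(billed_units(61, 1), "minutes;", billed_units(61, 60), "hours")
print(overbilled(61, 1), "vs", overbilled(61, 60), "idle minutes paid for")

# 1.5 MB of transfer, metered per byte vs. per (decimal) GB.
print(billed_units(1_500_000, 1), "bytes;", billed_units(1_500_000, 10**9), "GB")
```

With per-hour metering, one minute past the hour buys a second full hour; per-minute or per-byte metering leaves almost nothing paid for and unused.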

Cloud computing may actually end up bringing the server market closer to the current networking market than vice versa.  An IBM Z-series is capable of running Linux instances very efficiently.  It also supports I/O virtualization for both networking and storage with granular controls, features we still don’t have at the same quality level from x86 virtualization solutions.  The Oracle Exadata V2 is another example; it supports 1 million I/O operations per second for non-sequential workloads on databases up to 140TB in size.  How many commodity x86 servers does it take to match either of those configurations, and how do they compare in capex and TCO (Total Cost of Ownership) to the IBM or Oracle specialized platforms?  Even specialized x86 platforms are being developed and deployed by a number of players; some examples are the Cisco UCS, SGI Ice Cube, and the Sun Modular Datacenter.  These platforms are all designed to optimize spend for virtualization/cloud computing workloads, and while they may be made up of x86 sub-components, they are designed to function as a complete “mainframe” functional unit.
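The "how many commodity servers" question is simple arithmetic once you pick per-server figures. This is a minimal sketch, assuming hypothetical per-server random IOPS and usable capacity, a chosen number of data copies, and perfectly linear scaling (optimistic for a distributed database); only the 1 million IOPS and 140TB targets come from the text above.

```python
import math


def servers_needed(target_iops: int, target_tb: float,
                   iops_per_server: int, tb_per_server: float,
                   copies: int = 1) -> int:
    """Servers required to meet BOTH an IOPS target and a capacity target.

    Assumes IOPS scale linearly with server count (no coordination overhead).
    """
    for_iops = math.ceil(target_iops / iops_per_server)
    for_capacity = math.ceil(target_tb * copies / tb_per_server)
    return max(for_iops, for_capacity)


# Targets from the Exadata V2 figures; per-server numbers are hypothetical.
print(servers_needed(1_000_000, 140, iops_per_server=5_000, tb_per_server=4, copies=2))
```

Multiplying the count by per-server purchase, power, space, and administration costs gives the capex/TCO side of the comparison.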

Conclusions

We’re still very early in the technology transition to a full utility style computing grid.  As the transition progresses we’ll see more use cases served by a broader range of features.  For the small verticals with complex configuration needs and a low willingness to pay a premium we’ll see niche providers.

Networking hardware has been cloud like for more than a decade, and a few major players dominate the market because of the small number of strong buyers.  Technologies such as OpenFlow, in combination with Moore’s law, have the potential to disrupt the market, but this isn’t a guarantee.  Nor is it guaranteed that today’s approach of building clouds from massive numbers of commodity x86 systems is the future; specialized computing platforms have the potential to deliver better unit economics, and in a commodity business it will come down to the financials in the end.


Availability is a fundamental design concept

by bretpiatt on Oct.03, 2009, under Technology

Earlier today a conversation on Twitter with Christopher Hoff (@Beaker), James Watters (@wattersjames), George Reese (@georgereese), Benjamin Black (@benjaminblack), and Shlomo Swidler (@ShlomoSwidler) discussed how many people seem to assume that because clouds can scale and rapidly provision servers, they’re always available, and that availability therefore no longer has to be a fundamental design concept.  It kicked off with @Beaker’s tweet about BitBucket, “Cloudifornication: 20+ hour outage due to EC2/EBS on BitBucket http://bit.ly/A8vCy”.  BitBucket ran into a problem with EC2/EBS that made their site unavailable for 20+ hours (I’m linking to the comments discussing it on Hacker News, since the main BitBucket page is back to normal now and no longer carries the explanation).  [UPDATE: Adding BitBucket blog post on the outage.]

The purpose of this post isn’t to analyze the BitBucket situation; it is to help people understand how to design an available architecture while still keeping it efficient in terms of expense.  Given an unlimited (or nearly unlimited) budget, most IT architects will be able to build a “bullet proof” configuration.  Most of us don’t function in that world, though, so compromises are made.  Here I hope to outline how you can compromise effectively by thinking about availability early and often in the design process.  The design recommendations I’m going to outline are general in nature and, depending on your specific business and operational model, may not fit.  I enjoy discussing specific use cases and designs, so if you’d like analysis directly related to your situation, comment on the post and let’s discuss it.

With that disclaimer, here goes: a step-by-step guide to building a web application that will be available “almost all the time”… [Second disclaimer: I work for Rackspace Hosting, we have a cloud (The Rackspace Cloud), and the recommendations here are my opinions, not those of my employer.]

1. Start with DNS. This is overlooked quite a bit and is the easiest thing you can do to ensure availability.  Get a reliable DNS provider that hosts their DNS servers in multiple data centers, each with multiple peering arrangements, and that documents their BGP convergence times.  This DNS provider should let you set the TTL (time to live) on your A records to 5 minutes or less (some will let you go as low as 1 minute).  Now you have the ability to redirect www.yoursite.com to a new IP address in 1-5 minutes.  While this may not let you recover your site completely, in the worst case you can have a simplified version of your site up and running “somewhere” within 5 minutes.  Being able to give your customers a “We’re experiencing issues” message with a phone number or other information is invaluable.  When customers believe you are working on recovering your site and/or have things under control, they’re willing to trust you much more than if they get a 404 or 503 error page in their browser; if they are a new visitor rather than a customer, a 404 most likely means they never come back.
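The 1-5 minute figure covers the cached DNS answer expiring; the time users actually spend hitting the dead site also includes noticing the outage and changing the record. This is a minimal sketch of that time budget, assuming a health check that polls at a fixed interval and a manual or scripted record change of known duration; all parameter values are hypothetical.

```python
def worst_case_failover_seconds(check_interval_s: int, failures_to_trigger: int,
                                update_s: int, ttl_s: int) -> int:
    """Rough upper bound on how long users keep reaching the failed IP.

    detection: consecutive failed health checks before you (or a script) act
    update:    time to change the A record at the DNS provider
    ttl:       resolvers may keep serving the old answer for up to one TTL
    Resolvers that ignore or clamp low TTLs can take longer than this.
    """
    detection = check_interval_s * failures_to_trigger
    return detection + update_s + ttl_s


# Hypothetical setup: check every 30 s, act after 3 failures,
# 60 s to push the change, and a 5-minute TTL.
total = worst_case_failover_seconds(30, 3, 60, 300)
print(f"{total} s (~{total / 60:.1f} min)")
```

Dropping the TTL to 1 minute in this example cuts the bound from 450 s to 210 s, at which point detection and the record change dominate.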

2. Design your application with portability in mind.  Using a technology only available from a single provider may sound like a good idea, but it locks you into that provider.  We all believe our hosting provider will be in business forever, but 5 years ago we all thought we’d never see GM go bankrupt or Lehman Brothers cease to exist.  Cloud computing makes portability much easier to test and implement than it used to be.  Part of going from idea to launch should include deploying your application to a minimum of two providers, so that if something does happen to your provider you’ll be able to continue to run your business.  I don’t recommend trying to run your application on multiple providers, as it’ll generally add expense you shouldn’t need; however, I do recommend having your code and data with multiple providers.  This requirement means you should try to avoid customizing at the OS/kernel/filesystem level; those are the main items I see causing difficulty in portability.  Next, if you want a hosting provider to support your application infrastructure stack (e.g. the HTTP server [Apache, IIS, etc], database server [Oracle, MySQL, MS SQL, Postgres, etc]), pick standard versions or plan on hiring staff to support your customizations.  While a single provider may agree to support your (or their) modifications, others probably won’t.  If your provider has their own special versions of the application platform, they may be trying to lock you in.  Beware!
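One common way to keep provider-specific technology out of your application is to code against your own small interface and put each provider behind an implementation of it. This is a minimal sketch, assuming object-style storage; the class names are hypothetical, LocalDiskStore stands in for whatever storage each provider offers, and a backend for a specific provider's API would be another subclass. ReplicatedStore illustrates keeping data with more than one provider while the application itself runs on only one.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BlobStore(ABC):
    """Provider-neutral storage interface that application code depends on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalDiskStore(BlobStore):
    """Stores objects as plain files (flat keys only); runs on any provider's VM."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class ReplicatedStore(BlobStore):
    """Writes every object to all backends, e.g. one per hosting provider."""

    def __init__(self, *backends: BlobStore) -> None:
        self.backends = backends

    def put(self, key: str, data: bytes) -> None:
        for backend in self.backends:
            backend.put(key, data)

    def get(self, key: str) -> bytes:
        # Fall through to the next backend if one is missing the object.
        for backend in self.backends:
            try:
                return backend.get(key)
            except FileNotFoundError:
                continue
        raise KeyError(key)


# Two temporary directories stand in for storage at two different providers.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    store = ReplicatedStore(LocalDiskStore(Path(a)), LocalDiskStore(Path(b)))
    store.put("orders.json", b'{"id": 1}')
    print(store.get("orders.json"), (Path(b) / "orders.json").exists())
```

Because application code only ever sees BlobStore, moving to a new provider means writing one new subclass rather than touching every call site.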

3. Spend some time on BCP/DR (Business Continuity Planning/Disaster Recovery).  You’ve spent months (or years) going from idea to application; if you spend a day or two you’ll have a fair BCP/DR plan, and if you have somebody with a background in this you can have a good plan in the same time.  After putting the plan together, TEST IT!  I’ve helped a number of businesses put together a plan, and after we’re done they check the box, put it in a filing cabinet, and pray they never have to get it out.  That mindset is like a football team having a “2 minute drill” playbook but never practicing the plays, hoping they’ll never need to use them.  When it comes down to having to do it, if you haven’t practiced, how well do you expect it to go with the added stress of an outage?  “But Bret, I can’t test it, we can’t take our site offline for a test!”  You don’t have to go all the way to taking your main infrastructure offline (see #1, DNS).  You can bring up the replacement site without ever impacting your real site by modifying the DNS on your test machines (either point them to a BCP test DNS server or modify their local hosts files).
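The hosts-file approach amounts to adding override lines on each test machine. This is a minimal sketch, assuming the test machines consult their local hosts file (/etc/hosts on Linux and macOS, C:\Windows\System32\drivers\etc\hosts on Windows) before DNS, which is the usual default; the function name, IP address, and hostnames are placeholders.

```python
import ipaddress


def hosts_override(dr_ip: str, hostnames: list[str]) -> str:
    """Return hosts-file lines pointing each hostname at the DR site's IP.

    Only machines whose hosts file contains these lines resolve the names to
    the DR site; public DNS, and therefore real users, are unaffected.
    """
    ipaddress.ip_address(dr_ip)  # raises ValueError on a malformed address
    return "".join(f"{dr_ip}\t{name}\n" for name in hostnames)


# 203.0.113.10 is from a documentation-only range, standing in for the DR site.
print(hosts_override("203.0.113.10", ["www.yoursite.com", "yoursite.com"]), end="")
```

Removing the lines after the test returns the machines to normal DNS resolution.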

4. Back up your data, back up your data, back up your data.  Customers will deal with service outages; they won’t put up with you losing their data.  You probably use Time Capsule, Jungle Disk, Mozy, Dropbox, or any number of other personal backup programs for your personal files, so if your house burned down you’d still have all of your own files.  What would happen to your web site if the data center your servers are in burned to the ground?  Is the data gone?  If it isn’t gone, how long will it take you to restore?  Is that timeframe acceptable to you and your users?  Two concepts to familiarize yourself with are RPO (recovery point objective) and RTO (recovery time objective).  RPO is how much data you will lose: if you do a daily backup you have a 24 hour RPO, while if you run a transaction-replicated database (such as Oracle with Data Guard) with the databases in separate geographic locations, your RPO may be under a second.  RTO is how long recovery takes: if you’re restoring from a backup medium like tape, you’ll be able to recover ~10-40GB/hr (depending on the tape technology and compression ratio of the backup), so a 400GB database gives you an RTO of 10+ hours, even if cloud computing lets you instantly bring up a new database server to put the data on.  With a live database in a second geographic location, your RTO is also potentially under a second for the data itself, since there is nothing to restore (this doesn’t mean your whole site is automatically back online in that same time).  I won’t go into detail here, since we’re talking about availability and not integrity, but having a multi-geographic replicated database doesn’t ensure integrity; you still need snapshots, transaction logs, or another way to go back to various points in time if you end up with bad or erased data (see my favorite XKCD, “Exploits of a Mom”).
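The tape-restore part of RTO is data size divided by restore throughput. This is a minimal sketch using the 400GB database and the ~10-40GB/hr tape range from the paragraph above; the function name is hypothetical.

```python
def tape_restore_rto_hours(data_gb: float, restore_gb_per_hour: float) -> float:
    """Hours just to copy data back from backup media.

    Excludes provisioning, verification, and bringing the application back
    online, so the real RTO is longer than this.
    """
    return data_gb / restore_gb_per_hour


# The 400GB database from the text, at both ends of the ~10-40GB/hr range.
for rate in (40, 10):
    print(f"{rate} GB/hr -> {tape_restore_rto_hours(400, rate):.0f} h to restore")
```

The fast end gives the 10 hours in the text; the slow end stretches the same restore to 40 hours, which is why the "+" in "10+" matters.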

So now that we’ve taken all of this into account — what do we do?  My recommendations…

1. Make a “gold build” of each of the server types in your application and understand how long it takes to bring the necessary quantity of each server type online at various providers.  Cloud makes this much easier; in the dedicated world, provisioning a new environment typically takes days.
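Knowing how long it takes to bring an environment online starts with an inventory of gold builds per server type. This is a minimal sketch, assuming every server takes the same time to provision from its gold build and a provider lets you provision a fixed number in parallel; the function name, the inventory, the 15-minute cloud figure, and the 3-day dedicated figure are all hypothetical.

```python
import math


def hours_to_environment(inventory: dict[str, int], hours_per_server: float,
                         max_parallel: int) -> float:
    """Hours to bring up every server from gold builds, in parallel batches.

    Assumes every server takes the same time and batches run back to back.
    """
    total_servers = sum(inventory.values())
    batches = math.ceil(total_servers / max_parallel)
    return batches * hours_per_server


# Hypothetical environment and provisioning figures.
inventory = {"web": 6, "app": 4, "db": 2}
print("cloud:", hours_to_environment(inventory, hours_per_server=0.25, max_parallel=5), "h")
print("dedicated:", hours_to_environment(inventory, hours_per_server=72, max_parallel=12), "h")
```

Measuring real per-provider figures during a DR test, then plugging them in, turns this into a number you can put in the BCP/DR plan.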

2. If your business relies on a fully functional web site as a primary revenue stream, have a live database at a secondary location with the ability to launch web and app servers to bring your environment online quickly in the event of a primary provider failure.  If you can continue to serve your customers via phone and/or e-mail, have a static version of your web site running that you can switch to using DNS in the event of a primary provider issue.

3. Keep your source code in multiple locations, with multiple employees able to deploy the site in the event of an issue.  I’m a huge fan of collaborative code repositories like GitHub and Beanstalk, but if your code is only on one of them and it’s down (or in a maintenance window) when you need that code to bring up a backup environment, you’re stuck.  It costs next to nothing to keep that code in multiple places.
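Keeping code in several places can be as simple as pushing the same repository to more than one host. This is a minimal sketch, assuming Git and that each remote URL points at an existing repository you are allowed to overwrite; the helper functions are hypothetical wrappers around the standard `git push --mirror` command, and the URLs are placeholders.

```python
import subprocess


def mirror_commands(repo_path: str, remote_urls: list[str]) -> list[list[str]]:
    """Build one `git push --mirror` command per remote.

    --mirror pushes all branches and tags and also deletes refs on the
    remote that no longer exist locally, so each remote becomes an exact copy.
    """
    return [["git", "-C", repo_path, "push", "--mirror", url] for url in remote_urls]


def mirror_everywhere(repo_path: str, remote_urls: list[str]) -> list[str]:
    """Push to every remote; return the URLs whose push failed."""
    failed = []
    for url, cmd in zip(remote_urls, mirror_commands(repo_path, remote_urls)):
        if subprocess.run(cmd).returncode != 0:
            failed.append(url)
    return failed


# Hypothetical remotes at two different hosting services (printed, not run).
for cmd in mirror_commands(".", ["git@host-a.example:team/site.git",
                                 "git@host-b.example:team/site.git"]):
    print(" ".join(cmd))
```

mirror_everywhere keeps going when one host is unreachable, so an outage at one repository host doesn't stop the others from being updated.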

I realize that nowhere in this post do I mention HA (high availability) or the things people generally think of when they hear HA.  Having redundant switches, firewalls, routers, and servers all in a single location (what people generally think of when they hear HA) will ensure that location stays online, and you should certainly be doing that, but it puts all of your eggs in that basket if you aren’t looking at HA beyond the single infrastructure.  Now that I’ve mentioned it, if you want to learn more about HA design in a single location, the Internet is full of good information on the topic.
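The "eggs in one basket" point can be quantified with the standard formula for redundant components, A_total = 1 - (1 - A)^n. This is a minimal sketch, assuming each site has the same availability and sites fail independently; correlated failures (a shared provider, a shared code bug) and the time it takes to fail over (such as the DNS TTL) make the real number lower. The 99.9% figure is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(site_availability: float, sites: int) -> float:
    """Probability at least one site is up, assuming sites fail independently."""
    return 1 - (1 - site_availability) ** sites


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year with no site available."""
    return (1 - availability) * HOURS_PER_YEAR


for n in (1, 2):
    a = combined_availability(0.999, n)
    print(f"{n} site(s): {a:.6f} available, ~{downtime_hours_per_year(a):.3f} h down/yr")
```

Under these assumptions a second site cuts expected downtime from roughly 8.8 hours a year to well under a minute, which is why HA beyond a single location matters more than extra redundancy inside it.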

I’ve also focused the discussion on architectures relevant to “most folks”.  If you’re Facebook, eBay, or Google (the search engine), you don’t want to rely on DNS alone to deal with outages at a specific location.  You’ll want to pair DNS with GLB (global load balancing) and BGP so you can have near real-time re-routing of users and potentially even sessions.  My availability recommendations certainly aren’t free to implement, but they also don’t double your expenses.  Adding 5-25% to your hosting expense can significantly increase your availability (and decrease your RPO/RTO).

I should also note that I didn’t really cover systems management or monitoring here.  Both are key to having an available environment but aren’t directly tied to designing an available architecture.  You’ll need proper systems management tools and policies (or you’ll cause outages yourself), and you’ll need monitoring so you know when to implement your BCP/DR plan.


Cloud Computing, “For Everyone, Not Everything”

by Bret Piatt on Jun.07, 2009, under Business, Technology

Cloud computing is a broad term that covers Internet-based services that provide SaaS (Software as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service).  SaaS services are the most commonly used cloud solutions; web-based e-mail is the prime example.  The most widely used PaaS offering is probably WordPress.com, unless you consider customizing your Facebook profile a very restricted PaaS.  IaaS is the newest of the cloud services; the best-known example is Amazon Web Services, which includes EC2 (cloud servers) and S3 (cloud storage).

Until Hotmail launched in 1996, we pretty much all had an e-mail client on our own systems and potentially had to run our own mail server if we didn’t want a mailbox tied to our college or ISP; now almost all of us use one of any number of SaaS e-mail services.  Many of these services, such as Rackspace E-mail or Google Apps Enterprise, now include the full feature set businesses expect.

Before cloud-based services, if you wanted a website you had to run your own server; that changed when GeoCities launched in late 1995.  Now PaaS providers ranging from GoDaddy (for low price) to Mosso (for horizontal scale) provide very capable platforms for deploying a website without your own server.

Now IaaS providers like Amazon, Terremark, and Rackspace are eliminating the need to always deploy and manage dedicated configurations for complex applications.  Before these types of IaaS offerings, companies like Twitter would end up with their own datacenters and dedicated infrastructure, and load testing services from companies like SOASTA would have been cost prohibitive to offer.

So what about the title, “For everyone, not everything”?  It sounds like cloud can do everything now, doesn’t it?  In a broad sense, yes, it can do a bit of everything, but specific use cases in every service type aren’t a fit for cloud.  In the e-mail world, if you want to do offline messaging on an airplane, you want a mail client.  At the platform service level, perhaps your application runs 10x faster if you can customize a couple of libraries, or it just doesn’t work at all without those changes.  The infrastructure offerings force you to re-architect for horizontal rather than vertical scale to use them effectively.

Many other use cases aren’t a fit for the cloud yet.  Take video rendering as an example: it is much less expensive to buy a video card capable of performing the rendering locally than it is to stream the rendered video over a network as 30 JPGs per second.  Another example is a retail POS system: at least some of the functionality needs to be in the store, because you don’t want to stop selling things if network connectivity is lost.  Many other reasonable examples abound.
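Back-of-the-envelope bandwidth shows why shipping rendered frames over a network is costly. This is a minimal sketch of the naive "30 JPGs per second" case from the paragraph above, assuming a hypothetical 200 KB per frame; real video codecs compress across frames and need far less, so this models only the approach the paragraph describes.

```python
def naive_stream_mbps(frames_per_second: int, kb_per_frame: float) -> float:
    """Megabits per second to send every rendered frame as a separate image."""
    return frames_per_second * kb_per_frame * 1000 * 8 / 1_000_000


def gb_per_hour(mbps: float) -> float:
    """Decimal gigabytes transferred in one hour at the given rate."""
    return mbps * 3600 / 8 / 1000


# Hypothetical 200 KB per JPEG frame at 30 frames per second.
rate = naive_stream_mbps(30, 200)
print(f"{rate:.0f} Mbit/s, {gb_per_hour(rate):.1f} GB per hour")
```

Sustaining tens of megabits per second per viewer, every hour, quickly costs more than a one-time video card purchase.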

Will cloud ever be the answer for all computing needs?  I doubt it, but over time it will be used to solve more problems because a centrally managed pool of resources provides greater efficiency and flexibility.  A good analogy is utility power: we use it almost exclusively now, but for a few use cases we still need generators.  Cloud will succeed, and it will be adopted for a wider set of use cases over time as it addresses those use cases better than previous-generation solutions.


Cloud Computing forces IT “Evolve or Perish”

by Bret Piatt on Mar.21, 2009, under Technology

When I started this blog I thought I’d be talking about technology on a regular basis, and so far I haven’t.  This post is still somewhat business related, but it is also very tech heavy.  I intend to explain the tech-focused pieces at a level that an average “nerd” gets but the average adult can still read.

Earlier today I spent an hour watching one of the Rackspace founders deliver a training video intended for new hires in 1999.  In the video they go through the complexity of ensuring hardware works properly together, that the OS is installed properly, and that DNS is configured properly. Now just 10 years later much of this is significantly simplified.  When is the last time you spent time dealing with an “IRQ conflict” or “checking jumper settings” (hardware related troubleshooting that is automagic today)?

Now, as we move to cloud computing with pre-defined virtual machine images, the “OS is installed properly” piece is going away.  Projects like TurnKey Linux will lead to one-click application stacks on top of an OS.  Much of the IT community has built careers on performing these tasks.  Now, instead of an application developer needing a system administrator to “build the server”, they go to a web-based control panel, pick the system type they want, click “create”, and the server is spawned.

It isn’t that the system administrator career is being completely eliminated; rather, instead of every company needing its own system administrators, in the future the computing providers will need them, and general businesses will only need an IT staff that works on their specific business applications.  Businesses won’t need many of the other “building block” level IT roles either: networking, desktop support, and storage/backup administrators.

Many in the IT industry think I’m taking things a bit far when we have this discussion.  I don’t believe it’ll happen overnight, but during the next 10-20 years it will.  Looking back, nobody today has a “typing pool” to type up handwritten notes, a “courier” to deliver a message across town in a hurry, or a “research” department to look up basic information we all now have access to through a search engine in a matter of seconds.

This is where “evolve or perish” comes in.  If you’re within 10 years of retirement and focused on the building blocks, you may want to consider a job at an infrastructure company, or risk the business you work for now eliminating your position in a transition to cloud computing.  If you’re at the start of your career and focused on those building blocks, you need to be the best and brightest in your field so you can obtain one of the service provider jobs in a much smaller market going forward.  Your other option is to evolve and move further up the application stack.  This could mean learning how to properly architect an application to make the most cost-effective use of utility-priced OS clouds, or it could mean going all the way up the stack to interface design.

This isn’t all doom and gloom.  Evolution and automation like this increase productivity allowing us to focus on moving forward more rapidly.  If you enjoy your IT industry job start asking your employer what you can learn above and beyond the building blocks to help out.  While you may not need to today it is much better to be ahead of the game rather than waiting around for a layoff to start learning in panic mode.
