It would be nice, but cloud doesn’t have to be “interoperable”
Disclosure: If you’re coming straight here you may not know that I work for Rackspace Hosting and that I’ve been involved with OpenStack since the inception of the project. The opinions on this blog are my personal ones, not those of my employer.
This post is an assessment, a thought. I don’t really explore the meaning or outcomes completely. I may do that in a future rambling… on to the thought…
For the first decade of networking or more we had many competing technologies that didn’t interoperate: SNA, IPX/SPX, TCP/IP, AppleTalk, DECnet, NetBEUI, and more. The lack of a consistent and unified standard didn’t stop networking from succeeding any more than it will stop cloud computing. Cloud is a fundamental shift that dramatically increases productivity, just like networking did. Businesses love increases in productivity and will adopt anything that yields one, often the first option presented to them, and they’ll run it for a refresh cycle before switching to the “interoperable” platform. OK, so I’m a networking geek, but this isn’t the only analogy that holds true…
We have many programming languages: compiled, interpreted, functional, object oriented, all with major differences.
We have many types of processors from low energy mobile chips to super fast server chips all with different instruction sets.
We have a variety of operating systems all with a loyal following and a vastly different set of capabilities.
I believe cloud could see wider and more rapid adoption if interoperability is figured out but looking back at history, and even history specifically in the technology world, we have many successful markets without true interoperability as a fundamental capability.
Over time most of these markets have achieved the guise of interoperability through consolidation, and it looks like cloud computing is headed the same way. Networking is predominantly IP; programming is C/C++/C# for OS/infrastructure, Java for enterprise applications, and PHP for web applications; processors are x86 in desktops and servers, and ARM in mobile devices; operating systems are generally Windows for consumer and SMB/departmental large business IT, and Linux for web and larger business core IT.
With the pace of innovation and the foundation laid down by previous generational shifts, the cloud market will grow and reach a critical mass of market share much more rapidly, because the technology companies involved know the path to follow. Microprocessors, operating systems, and networking took many decades. Java swept through the enterprise software development market in a decade, as did PHP across the web. The cloud market really started to emerge around the start of this decade, and by the current look of things, by the middle of it we’ll have a clear picture of interoperability for clouds.
DevOps != sneaky, reckless, or process-averse hooligans
Operations has one primary mission above all else… uptime. This often gets them labeled by the rest of the organization with names such as “the brick wall”, “organization of ‘no’”, and a host of others I won’t use on a PG blog. With automation coming to IT, operations teams are being asked to ensure uptime while allowing a more rapid evolution of the environment. The days of an 18-month release cycle are over. So how exactly do we accomplish this without wrecking things? Proper testing, visibility, and a scaled-down staging environment.

Almost all application deployments are iterations and evolutions of existing applications. They also aren’t a single function system so changes to one application can impact the performance or availability of others — this is why operations has become the organization of “no”. To prevent bad things from happening first you have to understand what is happening. You need to have monitoring that can show network, operating system, platform, and application performance.
Now that we can see what is going on with production, we need to create a small-scale version of production to use for staging. When creating this scaled-down version of the environment you’ll need some understanding of how increasing size and load impact the system. If all of your algorithms are O(x), then it is easy to scale down and predict what the system will do in a larger environment. This won’t be the case though. Some things will be O(log(x)), where the cost per unit of load is higher at a small scale than it will be once scaled up, and others will be O(k·x) with a large constant k, or even O(x^k), where resource consumption grows as a linear multiple or polynomially as you grow.
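To make the scaling point concrete, here is a minimal sketch of projecting a staging measurement up to production size under different growth assumptions. The model names, the quadratic exponent, and the sample numbers are all hypothetical choices for illustration; a real system mixes several growth classes and needs measurements at more than one size.

```python
import math

# Hypothetical growth models (illustrative assumptions, not measured behavior).
GROWTH_MODELS = {
    "logarithmic": lambda n: math.log2(n),   # O(log(x)); needs n > 1
    "linear": lambda n: float(n),            # O(x)
    "quadratic": lambda n: float(n) ** 2,    # O(x^2), one example of polynomial growth
}


def project_usage(measured, staging_size, production_size, model):
    """Scale a resource measurement taken in staging up to production size.

    Assumes usage is proportional to the chosen growth function, so the
    projection is measured * f(production_size) / f(staging_size).
    """
    growth = GROWTH_MODELS[model]
    return measured * growth(production_size) / growth(staging_size)


# Example: 10 CPU-hours measured in a 100-host staging environment,
# projected to a 1,000-host production environment.
for name in GROWTH_MODELS:
    print(name, round(project_usage(10.0, 100, 1000, name), 1))
```

A tenfold increase in size gives very different answers depending on the assumed growth class, which is why you need to know which class each component falls into before trusting a scaled-down staging environment.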
So then where does DevOps come in?
It isn’t just a crazy idea by the developers to try and find a way around “no”. If implemented properly it will build a strong cooperative effort between development and operations and get rid of the “Us vs. Them” mentality. It will also eliminate the duplication of effort where a development team writes unit, integration, and system tests while operations writes an entirely different set of tests in their monitoring and management systems.
This divide also creates problems where new releases get delayed as operations finds issues in staging because they weren’t involved in test case development early on. Many of the tools used for continuous integration and automated development testing also make great operational monitoring systems. This allows the operations team to write unified tests with development that are used throughout the process, so everyone is working from the same goals. No more meetings of “Why didn’t you catch this in development?” and “It works fine in development, you screwed up the installation.”
The next benefit you’ll get from a DevOps culture is that tasks in other areas will be automated, decreasing error rates and increasing delivery speed. Today developers use distributed source repositories with version control to deploy applications onto development and testing systems. Operations then packages those applications into an installer, which introduces another step in the process that needs additional testing and new tooling. DVCS platforms have authentication and tracking built in: all of the audit controls an operations department wants.
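As a rough illustration of a unified test, the sketch below defines one check and runs it two ways: as a development test with a stubbed measurement, and as an operations probe. Everything here is hypothetical (the check name, the 500 ms limit, the `fetch_latency_ms` callable); it only shows the shape of the idea, not any particular tool.

```python
from typing import Callable, NamedTuple


class CheckResult(NamedTuple):
    name: str
    ok: bool
    detail: str


def check_login_latency(fetch_latency_ms: Callable[[], float],
                        limit_ms: float = 500.0) -> CheckResult:
    """Shared check: the measured latency must not exceed the agreed limit.

    fetch_latency_ms is supplied by the caller, so development can pass a
    stub and operations can pass a real measurement.
    """
    latency = fetch_latency_ms()
    ok = latency <= limit_ms
    return CheckResult("login_latency", ok,
                       f"{latency:.0f} ms (limit {limit_ms:.0f} ms)")


def test_login_latency_within_limit():
    """Development side: run the check in CI against a stubbed value."""
    result = check_login_latency(lambda: 120.0)
    assert result.ok, result.detail


def run_probe(fetch_latency_ms: Callable[[], float]) -> str:
    """Operations side: run the same check as a monitoring probe."""
    result = check_login_latency(fetch_latency_ms)
    status = "OK" if result.ok else "ALERT"
    return f"{status} {result.name}: {result.detail}"


# A stand-in for a real production measurement.
print(run_probe(lambda: 750.0))
```

Because both sides call the same function, a change to the limit or the logic shows up in development tests and production monitoring at the same time.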
I know I’ve glossed over the details in this post. I plan on putting together some detailed future posts covering examples of how I’ll be implementing each section using open source tools to help with work my team is doing on OpenStack.
Why OpenStack matters to me
I’d like to start off with an apology to everyone I didn’t email back or call back over the past 9 months, and to anyone whose life I made less interesting by disappearing from Twitter and from sharing my thoughts on this blog. I’ll be out, alive, and available again now that OpenStack is a reality.
Life is about priorities, and hopefully at some point in your life you have had, or will have, an opportunity to work on something that can really make an impact. At Rackspace we are a Strengths-based organization. My top 5 are Learner, Achiever, Competition, Analytical, and Focus. I’ll use my strengths as a way to explain the past ~9 months.
When we started exploring the strategy around this, all of us had lots to learn. We’d all used open source software. Some of us on the team had contributed to projects, but we all knew we had a lot to learn if we were going to get this right. The great thing about open source is that the full history of all of it is on the Internet. You can go back and read mailing list archives, you can find out who contributed to a project, who led it, and who had influence, and you can reach out to those people and they’re often happy to talk about it. This is very different from trying to do research on businesses, where information is hard to find: no corporation will share a full mailing list archive that covers the history of its decision making (heck, most don’t even have one). The openness and ability to learn about things easily was a huge motivator for me.
So began the Learner->Analytical->Focus->Achiever “death spiral”, or rather the “death” of my learning anything not involved in this project. The good news is that those 4 strengths together mean I really enjoy learning about new complex systems and figuring out the best way to navigate them; the bad news is that the Focus->Achiever half may let me chase Alice all the way down the rabbit hole to Wonderland. Sometimes this is counterproductive, where a “good enough” decision could have been made with less analysis, but in this case I’m really happy about it. When forming an open source community you have a lot of choices to make, all of them with different benefits and drawbacks, and whether something is perceived as a benefit or a drawback depends on the perspective of the individual or group.
Forming this community is important enough to go all the way down the rabbit hole because thousands of people will become part of it, and each potential member of the community is worth more than an hour of my time. This gives me a good segue to talk about scale. If you’re only going to use a piece of software once to solve a single need, then you should make it just good enough to get the job done: you should optimize for min(time coding + time for code to run [where you have to pay attention to it]). The opposite end of the spectrum is a project like Linux (or like OpenStack will be; I dream big!) that runs on millions of machines 24/7 all around the globe. If you can make an operation one minute faster on something that runs on a million machines, you save almost 2 years’ worth of system time. With that same idea, we spent all the time we could making sure we got the community started the right way, because every hour we spent will be multiplied by each of you that joins it.
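The arithmetic behind that trade-off is easy to check. The sketch below works through both ends of the spectrum; the coding and run times for the “quick” and “optimized” versions are made-up numbers for illustration, and the function names are my own.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes, ignoring leap years


def system_years_saved(minutes_saved_per_machine, machines):
    """Total system time saved across all machines, expressed in years."""
    return minutes_saved_per_machine * machines / MINUTES_PER_YEAR


def total_time(coding_minutes, run_minutes, runs):
    """The quantity to minimize: time spent coding plus time spent running."""
    return coding_minutes + run_minutes * runs


# One minute saved on each of a million machines.
print(round(system_years_saved(1, 1_000_000), 2))  # 1.9

# Hypothetical versions of the same tool:
# quick = 30 minutes to write, 10 minutes per run
# optimized = 300 minutes to write, 1 minute per run
for runs in (1, 1_000_000):
    quick = total_time(30, 10, runs)
    optimized = total_time(300, 1, runs)
    print(runs, "quick" if quick < optimized else "optimized")
```

For a single run the quick version wins easily; at a million runs the up-front investment pays for itself many times over, which is the same reasoning we applied to the community.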
So now here is where my Competition kicks in. I don’t want to make just an average community and then go watch reruns of “Everybody Loves Raymond” on local TV (Ray, hopefully you aren’t offended; you shouldn’t be, yours was just the first show in rerun syndication that popped into my head!). I want to make the best community ever. The problem is… the bar is really high. It isn’t like I said, “I want to make the biggest ball of rainbow yarn a person with a 9 letter long name made on a Tuesday afternoon”. I want to make the best open source community around a distribution of projects out there, and a lot of people have done an excellent job at this. So to do this we’ve learned as much as we could from past projects to lay the proper foundation. With that, let me lay out the “4 opens” (I’d like to credit Rick Clark on our team for summarizing these thoughts in a concise and clear manner we can all hopefully understand)…
Open Source: We are committed to creating truly open source software that is usable and scalable. Truly open source software is not feature or performance limited and is not crippled. We will utilize the Apache Software License 2.0 making the code freely available to all. [Personal commentary: What this means is "we accept patches", the project won't block a feature contribution because it competes with a commercial feature a community member has. This doesn't mean all of those commercial entities have to contribute all of their code -- it just means they aren't guaranteed exclusivity.]
Open Design: Every 6 months the development community will hold a design summit to gather requirements and write specifications for the upcoming release. [Personal commentary: The design summits have been great (so far we've had 2) for getting people aligned and for really getting the complicated items solved. An example of this is the large object support for Object Storage: members of the community had a number of different implementation ideas, and through discussion we've come up with a great way to do it.]
Open Development: We will maintain a publicly available source code repository through the entire development process. This will be hosted on Launchpad, the same community used by 100s of projects including the Ubuntu Linux distribution. [Personal commentary: Getting code and designs out in the open as early as possible in the process allows everyone to benefit from the power of a community in the biggest way possible. This also makes finding and fixing big problems much easier as each patch can be tracked and its individual impact measured.]
Open Community: Our core goal is to produce a healthy, vibrant development and user community. Most decisions will be made using a lazy consensus model. All processes will be documented, open and transparent. [Personal commentary: Everyone should have a seat at the table at a level that corresponds to the effort and contributions they're putting into the project. With all of the decision making done in IRC meetings (with transcripts) and over mailing lists members of the community can see "how the sausage was made" rather than just the end result of the decision -- this is really important to build and maintain trust.]
We’re off to a fun and exciting start. Looking at the stats from this week I’m amazed at the amount of contribution we’re seeing from such a large group of developers (stats for the week of 12/3 to 12/9):
- OpenStack Compute (NOVA) Data
  - 17 Active Reviews
  - 97 Active Branches – owned by 34 people & 4 teams
  - 472 commits by 26 people in last month
- OpenStack Object Storage (SWIFT) Data
  - 5 Active Reviews
  - 41 Active Branches – owned by 19 people & 2 teams
  - 184 commits by 15 people in last month
This shows me that what we’re doing is working, and given the time to continue to grow and bloom, OpenStack Compute can help IT make the move to automation the same way manufacturing has over the past 50 years. Yes, I’m saying IT isn’t automated right now. IT automates other tasks inside the Enterprise, but they haven’t really automated many of their own tasks (this probably deserves a full post of its own).
Object Storage is potentially even more important than the automation. This is a topic I’ve been presenting on frequently because I’m very passionate about it (see the Strengths above), as it allows us to see an order of magnitude increase in efficiency over the TCO of “the average storage solution”. It doesn’t serve every storage use case, but the use case it does serve is growing rapidly, and over the next decade it’ll be clear to everyone that their largest storage platform (in terms of GB stored) will be object based.
I expect we’ll see additional projects become part of OpenStack over the next year, but as a community we should keep the bar high for what counts as a major project. Both Compute and Object Storage are providing software for ubiquitous problems that are growing in importance to everyone. Some items that clear the bar for me (these are critical issues to all users and operators of clouds a decade from now):
“Networking as a Service”: This should be abstracted from the end-point computing service, as it can be utilized by all projects and can provide connection points to other inter-cloud and non-cloud services. Here we can define routing, switching, and filtering network devices, and we can automate their integration with other cloud services.
“Inter-cloud Services”: As different clouds become available with varied services, we need an automated way to discover and catalog them, the same way routing protocols advertise network availability so we can have a loosely coupled global network (you may be familiar with it: the Internet). OpenStack is a great place to define a reference implementation of the directory and advertising capabilities, as all interested parties can have a seat at the table to contribute their needs.
Some items I’m on the fence about (the reason I’m on the fence isn’t that they aren’t extremely important to some implementations, it is that they aren’t important to all implementations):
“Host Provisioning Automation”: For service providers that are constantly growing and re-provisioning assets, automating these tasks is critical. For an SMB that is going to build a 2-6 cabinet cloud solution once, this isn’t nearly as important.
“Security & Compliance Services”: Everyone wants “some level” of security, but what that level is and how many resources get dedicated to providing it vary widely.
“Network Block Storage Services”: As the performance and size of local storage continue to increase, the need for network block storage decreases. I’m still a big believer in the benefits here for many use cases; it just doesn’t apply to every use case.
I really believe that in 2011 our community has a chance to deliver “the promise of cloud” to the masses through the efforts and commercial implementations created by the members of our community. As exciting as getting things off the ground in 2010 was, I’m even more excited about the future to come.
