Author Archive
Advertising isn’t the only business model for websites
by bretpiatt on Mar.07, 2010, under Business, Technology
A post by Ken Fisher at Ars Technica stirred up quite the hornet’s nest. Brian Carper replied that, “Advertising is devastating to my well-being”. Rob Sayre chimed in on the Mozilla Blog about, “Why Ad Blockers Work”. All three of these were picked up by Hacker News and became some of the most commented threads of the week.
I’m not going to rehash anything said in those posts — I’m instead going to look at the different business models in the print and broadcast media markets and ask the Internet site operators why they aren’t trying to monetize in those ways?
In print media publications exist that are 100% advertising supported. You’ll find them in the magazine racks by the exit of your local supermarket or in between the exterior door and interior door of a coffee shop like Denny’s. These publications have marginal quality content — not good enough I’d be willing to pay for it but good enough that if I want something to read while I eat my Grand Slam I might pick it up and thumb through it. If you operate a website and you try to support it 100% through advertising you’re telling me, “My content is marginal so I only believe I can monetize it through advertising because you wouldn’t be willing to pay me for it.”
Moving to broadcast media, the days of 100% advertising supported content are nearly gone. As of this study from December, 2008, nearly 90% of US households receive their television through a subscription based service. We’ve seen a decade or more of whining from the major networks that they can’t continue to provide the quality we’re used to while viewership continues to decline. None of the networks provide 24×7 original content; after 11:00PM on most you get 6 hours of infomercials until the early morning news shows. The whining by website operators that users block their ads sounds a lot like the major networks crying the same thing about DVRs (a DVR is the functional equivalent of an ad blocker in your browser as long as you skip the commercials with it) and/or about the fact we have more selection now due to competition from companies with other models.
Most content today is published under a hybrid model of pay for content (either through one time purchase or a subscription) plus advertising revenue. This model is used by magazines, newspapers, and cable TV channels. Because they have a hybrid model they can produce content that doesn’t require as large an audience to generate a profit. Ars Technica comes close to using this model on the Internet, except that when you subscribe there all they do is stop showing ads, so they aren’t getting the model right. I pay a monthly subscriber fee to TNT or ESPN and they still show me advertising. If you’re going to have a subscription service on a website, give the users access to premium content; don’t just turn off ads. I’ll pay for premium content and I won’t pay to have ads turned off when I can turn them off for free with an ad blocker.
The final model is 100% pay for content with no advertising. In the print business this applies to very few publications, mostly academic journals. With broadcast media many “premium channels” exist such as HBO, Showtime, Cinemax, and Starz that generate all of their revenue from pay for content. Ars Technica is jumping from the 100% advertising model to the 100% pay for content model but they’re giving away the exact same content. Many HBO subscribers would be willing to watch their favorite series with commercials for free each month instead of paying the $10 subscription fee, but HBO doesn’t give you that choice: it is subscribe or don’t get access. For you to be successful with this model you have to have premium quality content that attracts enough paying subscribers to bring in more revenue than it costs you to produce.
Most of the Internet today is running in the first business model and because of that you get “weekly circular” quality content surrounded by tons of flashy advertising. Very few websites have been able to successfully use a hybrid model. The NY Times and WSJ are a couple of examples. I’m not certain if their web divisions are profitable or not. If they aren’t, that has less to do with any inability to run a hybrid model web property than with the fact that they are still mostly print based companies carrying costs a pure Internet business would not have.
We’re still very early in the days of media moving to the Internet. Based on some 2009 estimates Internet advertising amounted to ~$21B whereas newspapers still brought in ~$31B, television at ~$36B, and magazines at ~$16B — these numbers are just advertising revenue, purchase/subscription numbers not included. As revenue continues to shift to Internet publishing formats you’ll see all models emerge and as a publisher you’ll need to figure out which category you want to be in. If you don’t view your content as “local circular” quality then perhaps you should start looking at a new business model today.
Public clouds and their features, followed by the future of cloud computing hardware
by bretpiatt on Dec.20, 2009, under Technology
I’m going to break this post up into two sections. The first will discuss public clouds and their features, using advanced networking as an example. The second will look at the future of cloud computing hardware, both networking and computing.
Public Clouds and Feature Selection
A discussion started on Twitter today after Werner Vogels (@Werner) tweeted about the future of networking through a blog post by James Hamilton entitled, “Networking: The Last Bastion of Mainframe Computing”. Christopher Hoff hasn’t been thrilled (understatement of 2009) with the networking features provided by cloud computing platforms both public and private. Unless I misunderstood his tweet he’d love to hear public cloud providers commit to a flexible API driven networking layer using technology such as OpenFlow.
I tossed back a question asking, “Are customers willing to pay for complex network customization in a cloud? If so, what percentage of them? Thoughts?” and he replied, “In terms of paying for parity in what I can do in even a basic enterprise today? No thanks. That’s on you as a provider in long term”. I threw this question out because herein lies the problem… Public clouds will only end up with the features that a broad market will pay for or a small market will pay a very significant premium for. The reason behind this is that when a cloud adds a core feature, it adds it everywhere. This leads providers to only invest in new features that enough of their customers are interested in to offset the cost of deployment and still yield a satisfactory return on capital.
Today at Rackspace customers that want advanced networking configurations are directed to our Private Cloud platform (I say our because I’m employed by Rackspace; the opinions expressed here however are mine alone). They can then create security zones, use IPS/IDS, and enable enhanced DDoS defense services all behind dedicated firewalls and load balancers. The private cloud environment can have bridged network segments that connect to a public Rackspace Cloud Servers(tm) configuration for workloads that do not require advanced networking. The current addressable market interested in both public cloud as a primary platform and advanced networking is small. The early adopter group of start-ups and SMBs typically doesn’t need or isn’t willing to pay for advanced networking, and the enterprises that are willing generally aren’t first movers on new technology.
As the public cloud market matures the addressable market will grow and you’ll start to see public cloud providers adding advanced networking capabilities though the cloud definition of “advanced” won’t ever be truly “cutting edge” on a mass market cloud. I expect we’ll see niche clouds emerge that will cater to specific application use cases that will have advanced features for their target customer. Early examples of this are Force.com or the OpSource Cloud.
The Future of Cloud Computing Hardware
I’m now going to loop back to James’s post that kicked this whole thing off, where he compared the current network device situation to the mainframe and vertically scaled centralized systems. He asserted that we’ll see a commoditization of the networking layer similar to what we’ve seen in the storage layer through technologies like RAID and in servers through x86. The reason RAID and x86 have been successful is that they are multi-purpose, with the capability to serve a broad range of applications well with proper configuration.
Networking gear is very different because the workloads are all uniform, and when you have a uniform workload an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array) that is tailored to that specific type of workload will enable better performance per dollar. The second core difference between the server/storage markets and networking is that once you step into “carrier/cloud class” networking equipment only a few hundred potential customers exist, and markets with fewer, stronger customers tend to be more consolidated. Networking gear has also been “cloud like” for over a decade now. Let’s look at the NIST requirements for a cloud:
On-demand self-service - This requirement is for a cloud to user relationship. I’ll translate this to a network cloud to network engineer relationship. For them, all carrier class networking gear supports SNMP along with other potential programmable configuration methods through management systems with APIs such as the Cisco Configuration Engine [PDF].
Rapid elasticity - This dates back to frame-relay, where the concept of a CIR (Committed Information Rate) was introduced. The space has continually evolved, with QoS being introduced on ATM up through the advanced dynamic algorithmic traffic routing used today over IP/MPLS networks.
Resource pooling - Doing this for computing is new outside of the HPC market. Telecommunication networks have been multi-tenant since the point the 3rd phone was hooked up over 100 years ago.
Measured service - Networking has been doing this for years as well, down to the minute or byte of data instead of the hour or GB (the smallest unit of measure any public cloud compute or storage platform bills in).
Broad network access - Service provider IP networks are the ultimate in heterogeneous access through standards based communication. They support connectivity over a number of layer 1 physical mediums using quite a few layer 2 communication protocols.
Cloud computing may actually end up bringing the server market closer to the current networking market than vice versa. An IBM Z-series is capable of very efficiently running Linux instances. It also supports I/O virtualization for both networking and storage with granular controls, features we still don’t have at the same quality level from x86 virtualization solutions. The Oracle Exadata V2 is another example; it supports 1 million I/O per second for non-sequential workloads on databases up to 140TB in size. How many commodity x86 servers does it take to match either of those configurations, and how do they compare in capex and TCO (Total Cost of Ownership) to the IBM or Oracle specialized platforms? We even see specialized x86 platforms being developed and deployed by a number of players. Some examples are the Cisco UCS, SGI Ice Cube, and the Sun Modular Datacenter. These platforms are all designed to optimize spend for virtualization/cloud computing workloads, and while they may be made up of x86 sub-components they are designed to function as a complete “mainframe” functional unit.
Conclusions
We’re still very early in the technology transition to a full utility style computing grid. As the transition progresses we’ll see more use cases served by a broader range of features. For the small verticals with complex configuration needs and a low willingness to pay a premium we’ll see niche providers.
Networking hardware has been cloud like for more than a decade and a few major players dominate the market because of the small number of strong buyers. Technologies such as OpenFlow in combination with Moore’s law have the potential to disrupt the market but this isn’t a guarantee. The current clouds being built using a massive number of commodity x86 systems are also not guaranteed to be the future: specialized computing platforms have the potential to deliver better unit economics, and in a commodity business it will come down to the financials in the end.
Every developer should learn the OSI model
by bretpiatt on Dec.18, 2009, under Technology
The OSI model is a great way to learn layered design, where components can be refactored or replaced without a complete system redesign. Layering also allows a project to be broken up among separate teams in the future, as each team will have a clear understanding of its upstream and downstream requirements. Beyond being able to divide a project up, you also gain the ability for a new hire to jump in and really start contributing.

The OSI model visualized
This doesn’t mean you should “use the OSI model” in each project; it means you should use the principles behind it when designing the project. Let’s apply the OSI model concepts to a basic web application.
Application: Your web front-end that users of the site see. This should talk to a clear presentation layer API to generate any dynamic content.
Presentation: This generates the dynamic content of the site, handles encoding / decoding of data formats. You should use a standard interface to connect to your data storage (ODBC/JDBC, OS/file system abstracted file I/O).
Session: This layer should be handled by your application server (e.g. Apache, Jetty, Tomcat). It can handle communicating with the networking layer of your operating system.
Layers 1-4: Most web applications don’t redesign anything here. If you’re writing an infrastructure application you may need to consider segmenting at these levels.
We’ve now gone through a single purpose, single module web application architecture. When you add a second service/module to your application ensure that communication occurs at the proper layers. Having an application layer service of module A talking directly to a session layer service of module B may sound efficient but you’ll quickly end up weaving a web that will cause long term problems down the road. All communications between modules should occur at the same layer, i.e. A:5 to B:5 to pass session data to another service.
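As a rough illustration of the layering above, here is a minimal Python sketch. The class names, the in-memory data store, and the /users/<id> URL scheme are all hypothetical stand-ins chosen for the example, not part of any framework. The point is only that each layer calls the layer directly beneath it through a small interface.

```python
import html


class DataStore:
    """Hypothetical storage layer; a real one would sit behind ODBC/JDBC or file I/O."""

    def __init__(self):
        self._rows = {1: {"name": "Ada"}}

    def fetch(self, user_id):
        # Returns the row for user_id, or None when it does not exist.
        return self._rows.get(user_id)


class PresentationLayer:
    """Generates dynamic content and handles encoding of the data it returns."""

    def __init__(self, store):
        self._store = store

    def render_user(self, user_id):
        row = self._store.fetch(user_id)
        if row is None:
            return "<p>Unknown user</p>"
        # Escape the value so stored data cannot inject markup into the page.
        return "<p>Hello, {}</p>".format(html.escape(row["name"]))


class ApplicationLayer:
    """Front end users see; it only knows the presentation layer's API."""

    def __init__(self, presentation):
        self._presentation = presentation

    def handle_request(self, path):
        # Hypothetical URL scheme: /users/<numeric id>
        parts = path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "users" and parts[1].isdigit():
            return self._presentation.render_user(int(parts[1]))
        return "<p>Not found</p>"


app = ApplicationLayer(PresentationLayer(DataStore()))
print(app.handle_request("/users/1"))
```

Because ApplicationLayer never touches DataStore directly, the storage class can be replaced (say, by one backed by a real database) without changing the front end, which is the refactoring benefit the layered approach is after.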
I’d like to write more on this topic with examples so I’m going to cut it short tonight with a plan to continue in a series on this that includes an example application.
Availability is a fundamental design concept
by bretpiatt on Oct.03, 2009, under Technology
Earlier today a conversation on Twitter with Christopher Hoff (@Beaker), James Watters (@wattersjames), George Reese (@georgereese), Benjamin Black (@benjaminblack), and Shlomo Swidler (@ShlomoSwidler) discussed how many people seem to assume that because clouds can scale and rapidly provision servers they’re always available, and that because of this availability doesn’t have to be a fundamental design concept anymore. It kicked off with @Beaker’s tweet about BitBucket, “Cloudifornication: 20+ hour outage due to EC2/EBS on BitBucket http://bit.ly/A8vCy”. BitBucket ran into a problem with EC2/EBS that made their site unavailable for 20+ hours (I’m linking to the comments discussing it on Hacker News since the main BitBucket page is back to normal and no longer carries the explanation now that the problem is fixed). [UPDATE: Adding BitBucket blog post on the outage.]
The purpose of this post isn’t to analyze the BitBucket situation; it is to help people understand how to design an available architecture while still keeping it efficient in terms of expense. Given an unlimited budget (or nearly unlimited) most IT architects will be able to build a “bullet proof” configuration. Most of us don’t function in that world though, so compromises are made. Here I hope to outline how you can compromise effectively by thinking about availability early and often in the design process. The design recommendations I’m going to outline are general in nature and, depending on your specific business and operational model, may not fit. I enjoy discussing specific use cases and designs, so if you’d like analysis directly related to your situation comment on the post and let’s discuss it.
With that disclaimer here goes…a step by step guide to building a web application that will be available “almost all the time”… [Second disclaimer, I work for Rackspace Hosting, we have a cloud (The Rackspace Cloud), the recommendations here are my opinions, not those of my employer.]
1. Start with DNS. This is overlooked quite a bit and is the easiest thing you can do to ensure availability. Get a reliable DNS provider that hosts their DNS servers in multiple data centers that each have multiple peering arrangements, with documentation on their BGP convergence times. This DNS provider should let you set the TTL (time to live) on your A records to 5 minutes or less (some will let you go as low as 1 minute). Now you have the ability to redirect www.yoursite.com to a new IP address in 1-5 minutes. While this may not let you recover your site completely, in the worst case you can have a simplified version of your site up and running “somewhere” in 5 minutes. Being able to give your customers a “We’re experiencing issues” message with a phone number or other information is invaluable. When customers believe you are working on recovering your site and/or have things under control they’re willing to trust you much more than if they get a 404 or 503 error page in their browser. If they are a new visitor and not a customer, a 404 most likely means they never come back.
2. Design your application with portability in mind. Using a technology only available from a single provider may sound like a good idea but it locks you into that provider. While we all believe our hosting provider will be in business forever, 5 years ago we all thought we’d never see GM go bankrupt or Lehman Brothers cease to exist. Cloud computing makes this much easier to test and implement than it used to be. Part of going from idea to launch should include deploying your application to a minimum of two providers to ensure that if something does happen to your provider you’ll be able to continue to run your business. I don’t recommend trying to run your application on multiple providers as it’ll generally add expense you shouldn’t need; however I do recommend having your code and data with multiple providers. This requirement means you should try to avoid customizing at the OS/kernel/filesystem level. Those are the main items I see causing difficulty in portability. Next, if you want a hosting provider to support your application infrastructure stack (i.e. the HTTP server [Apache, IIS, etc], database server [Oracle, MySQL, MS SQL, Postgres, etc]) pick standard versions or plan on hiring staff to support your customizations. While a single provider may agree to support your (or their) modifications, others probably won’t. If your provider has their own special versions of the application platform they may be trying to lock you in. Beware!
3. Spend some time on BCP/DR (Business Continuity Planning/Disaster Recovery). You’ve spent months (or years) going from idea to application. If you spend a day or two you’ll have a fair BCP/DR plan; if you have somebody with a background in this you can have a good plan in a day or two. After putting the plan together, TEST IT! I’ve helped a number of businesses put together a plan and after we’re done they check the box, put it in a filing cabinet and then pray they never have to get it out. That mindset is like a football team having a “2 minute drill” playbook but never practicing the plays, hoping that they’ll never need to use it. When it comes down to having to do it, if you haven’t practiced, how well do you expect it to go with the added stress of an outage? “But Bret, I can’t test it, we can’t take our site offline for a test!” You don’t have to go all the way to taking your main infrastructure offline (see #1 DNS). You can bring up the replacement site without ever impacting your real site by modifying the DNS on your test machines (either point them to a BCP system test DNS server or modify the local host files).


4. Backup your data, backup your data, backup your data. Customers will deal with service outages. They won’t put up with you losing their data. You use Time Capsule, Jungle Disk, Mozy, Dropbox, or any number of other personal backup programs for your personal files. If your house burned down you’d still have all of your own stuff. What would happen to your web site if the data center your servers are in burned to the ground? Is the data gone? If it isn’t gone, how long will it take you to restore? Is that timeframe acceptable to you and your users? A couple of concepts to familiarize yourself with are RPO (recovery point objective) and RTO (recovery time objective). RPO means how much data will be lost: if you do a daily backup you have a 24 hour RPO, and if you run a transaction replicated database (such as Oracle with Data Guard) with the databases in separate geographic locations your RPO may be under a second. RTO means how long it takes to recover. If you’re restoring from a backup medium like tape you’ll be able to recover ~10-40GB/hr (depending on the tape technology and compression ratio of the backup), so if you have a 400GB database you have an RTO of 10+ hours, even if with cloud computing you can instantly have a new database server available to put the data on. With a live database in a second geographic location your RTO is also potentially under a second (for the data only, since there is nothing to restore; this doesn’t mean your whole site is automatically online in that same time). I won’t go into detail here since we’re talking availability and not integrity, but having a multi-geographic location replicated database doesn’t ensure integrity. You still need snapshots or transaction logs or another way to go back to various points in time if you end up with bad or erased data (see my favorite XKCD, “Exploits of a Mom”).
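The RPO and RTO figures in step 4 are simple arithmetic, sketched here in Python. The function names are made up for illustration, and the model assumes a steady restore rate with no setup or verification time, which a real restore would add.

```python
def restore_hours(data_gb, throughput_gb_per_hr):
    """Hours needed to restore data_gb at a sustained restore rate."""
    if throughput_gb_per_hr <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_hr


def worst_case_rpo_hours(backup_interval_hours):
    """With periodic backups, the most you can lose is one full interval of data."""
    return backup_interval_hours


database_gb = 400  # database size used in step 4

# Tape restore rates from step 4: roughly 10 to 40 GB per hour.
fastest = restore_hours(database_gb, 40)
slowest = restore_hours(database_gb, 10)

print(f"RTO for the restore alone: {fastest:.0f} to {slowest:.0f} hours")
print(f"RPO with daily backups: up to {worst_case_rpo_hours(24)} hours of data")
```

At the slow end of the tape range the restore alone takes 40 hours, which is why the 10+ hour figure should be read as a floor rather than an estimate.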
So now that we’ve taken all of this into account — what do we do? My recommendations…
1. Make a “gold build” of each of the server types in your application and understand how long it takes you to have your necessary quantity of each server type online at various providers — cloud makes this much easier, in the dedicated world you’re looking at days typically to provision a new environment.
2. If your business relies on a fully functional web site as a primary revenue stream have a live database at a secondary location with the ability to launch web and app servers to bring your environment online quickly in the event of a primary provider failure. If you can continue to service your customers via phone and/or e-mail have a static version of your web site running that you can switch to using DNS in the event of a primary provider issue.
3. Keep your source code in multiple locations with the ability for multiple employees to deploy the site in the event of an issue. I’m a huge fan of collaborative code repositories like GitHub and Beanstalk, but if your code is only on one of them and they’re down (or in a maintenance window) when you need that code to bring up a backup environment you’re stuck. It costs next to nothing to keep that code in multiple places.
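The DNS switch in recommendation 2 is only as fast as the TTL from step 1 allows. Here is a rough Python sketch of that arithmetic. It assumes resolvers honor the TTL (not all do), and the detection and change times are made-up figures for illustration, not measurements.

```python
def worst_case_switch_minutes(ttl_minutes, detect_minutes=0.0, change_minutes=0.0):
    """Upper bound on how long visitors may keep reaching the failed site.

    A resolver that cached the old A record just before the change keeps
    serving it for up to a full TTL, so the TTL is added on top of the time
    taken to notice the outage and to update the record.
    """
    if min(ttl_minutes, detect_minutes, change_minutes) < 0:
        raise ValueError("durations must not be negative")
    return detect_minutes + change_minutes + ttl_minutes


# TTLs of 1 and 5 minutes come from step 1; 60 is included for contrast.
# The 2 minute detection and 1 minute change times are hypothetical.
for ttl in (1, 5, 60):
    total = worst_case_switch_minutes(ttl, detect_minutes=2, change_minutes=1)
    print(f"TTL {ttl:>2} min: up to {total} min until every visitor sees the new IP")
```

The comparison with a 60 minute TTL shows why setting the TTL low before an outage matters: lowering it during one does not help visitors whose resolvers already cached the old record.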
I understand that nowhere in this post do I mention HA (high availability) nor do I mention things people generally think of when they hear HA. Having redundant switches, firewalls, routers, and servers all in a single location (what people generally think of when they hear HA) will ensure that location stays online and you should certainly be doing that but it puts all of your eggs into that basket if you aren’t looking at HA beyond the single infrastructure. Now that I’ve mentioned it if you want to learn more about HA design in a single location the Internet is full of good information on the topic.
I’ve also focused the discussion on architectures relevant to “most folks”. If you’re Facebook, eBay, or Google (the search engine) you don’t want to rely on DNS to deal with outages at a specific location. You’ll want to pair DNS with GLB (global load balancing) and BGP so you can have near real-time re-routing of users and potentially even sessions. My availability recommendations certainly aren’t free to implement but they also don’t double your expenses. It is very possible to add between 5-25% to your hosting expense to significantly increase your availability (and decrease your RPO/RTO).
I’m going to also note that I didn’t mention systems management or monitoring here really. Those are both key items to understand to have an available environment but aren’t directly tied to designing an available architecture. You’ll need to have proper systems management tools and policies (or you’ll cause outages yourself) and you’ll need monitoring so you know when to implement your BCP/DR plan.

