Farm Development

The Promise of the Cloud

As web developers we are faced with this problem: how do we scale up our code to handle high traffic? A lot of time and engineering goes into it: time to simulate the traffic we expect, add servers to our cluster, cache heavy database access, and so on, all in anticipation of the load. Time is precious. This time could be spent making our web product more useful and creating interesting content. No one really congratulates you when a website works; they expect it to work.

When Google App Engine was released, the pitch was: "Run your web apps on Google's infrastructure. Easy to build, easy to maintain, easy to scale." As a web developer I was excited by this because it sounded like I could spend my time on the important thing: innovation! I started running some internal apps for an online radio station (CHIRP Radio) because the price was right (free) and we knew that eventually we'd have a lot of data, so infinite scalability was appealing. The apps do not get heavy traffic but they are used nearly every second of the day by a live DJ in the studio, since the station broadcasts 21 hours a day.

After one year of running these apps, here's the reality of what Google App Engine offers to web developers: a volatile environment that's capable of high traffic and lots of data storage but one which requires custom code.

Nothing has been published about the hardware that each app runs on, but I have noticed from the logs that an app instance will typically start up and serve about 10 requests before another instance starts up somewhere else. Within each request, there are a lot of limits on what an app can do. If it uses too much CPU it might die or time out. The biggest killer is app startup time, which is limited just like any other request (IMO, the limits should be more lax for app startup). Then there is the datastore. Even after its growing pains, there are still random datastore timeouts in our app at least once a day.

The Google App Engine status page has some checks that sample latency across the entire system. When they reach a threshold, an error is reported and acknowledged by Google. Occasionally there are refunds for paid accounts if the errors are substantial. However, any latency that happens in your app might not be enough to tip over the sampling threshold of the status monitor. When your app times out, the result is a 500 page for your users. This instability is unpredictable and thus hard to plan for and develop against. A page you built might run fine for a couple days but then it might time out the next day.
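One common way to soften intermittent timeouts like the datastore ones described above is to retry the failing call a few times with a short backoff before giving up. The following is only a minimal sketch of that idea, not code from the CHIRP apps: `TransientTimeout` and `with_retries` are hypothetical names, and `TransientTimeout` stands in for whatever timeout exception the real datastore API raises.

```python
import random
import time


class TransientTimeout(Exception):
    """Hypothetical stand-in for a datastore timeout error."""


def with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientTimeout with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientTimeout:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller decide what to show
            # Wait base_delay, then 2x, 4x, ... plus a little random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Retries spend part of the request's own time budget, so the attempt count and delays have to stay small enough to fit inside the request deadline.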

But it is possible to write better code to work with this volatile system. So what does that mean for web developers? We are back to spending time on optimization. This is not the cloud I was hoping for. I was hoping that Google App Engine would take on all the responsibility of making the servers scale if I were to deliver fully tested, working code. If someone could solve this problem of scalability then it would be a huge benefit to developers. It would allow us to spend our time dreaming up and implementing websites. Well, maybe App Engine will solve this problem over time.

Even with the need to micro-optimize, Google App Engine is still useful. The optimization is much different than typical scaling. Instead of expanding a server or sharding a database you just have to make the code more compact so that it uses very little CPU, caches datastore queries as much as possible, etc. This will, admittedly, take less time and effort than scaling up with a fully self-hosted infrastructure would.
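To make the "cache datastore queries" part concrete, here is a rough sketch of a read-through cache with expiry. It is an illustration under assumptions rather than App Engine code: `ReadThroughCache` and `get_or_compute` are hypothetical names, and the in-process dict stands in for a shared cache such as Memcache.

```python
import time


class ReadThroughCache:
    """Tiny in-process cache with expiry; a stand-in for a memcache client."""

    def __init__(self, ttl_seconds=60, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()  # the expensive datastore query would go here
        self._entries[key] = (now + self._ttl, value)
        return value
```

A request handler would wrap each expensive query in `get_or_compute`, so repeated requests within the expiry window never touch the datastore. The trade-off is that readers may see data up to `ttl_seconds` old.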

Ultimately, I hope that future cloud providers understand the limited benefits of App Engine's volatile system. Actually, the volatility makes App Engine feel more like shared hosting, which never worked well.

  • Re: The Promise of the Cloud

    Hi,

    Yes, App Engine has its limitations, but they make you think about how to optimize your apps accordingly. And this is useful for any hosting, not just App Engine or other cloud hosting.

    The approach is simply different. It makes you pay more attention to optimizing your app in the first place (at the code level) rather than trying to fix scaling issues with hardware.

    Like you said:

    > Instead of expanding a server or sharding a database
    > you just have to make the code more compact so that
    > it uses very little cpu, caches all datastore queries
    > as much as possible, etc. This will, admittedly, take
    > less time and effort than scaling up with a fully
    > self-hosted infrastructure would.

  • Re: The Promise of the Cloud

    It would be nice to just write code against a local SDK and once it worked locally to have it work in a cloud of servers. It doesn't seem too hard to accomplish this. Actually, if the SDK simulated the harsh runtime environment and the runtime environment's restrictions were predictable then it would be really easy to write and test code.

  • Re: The Promise of the Cloud

    Thanks for the thoughtful post, Kumar. The App Engine team is very aware of the issues you raise, and we are constantly improving the platform to address them.

    Some of the issues you describe have been addressed in our 1.4.0 release from December 2, 2010. Have you tried it? http://googleappengine.blogspot.com/2010/12/happy-holidays-from-app-engine-team-140.html

    Always On instances and Warm Up Requests should address the app startup time, which seems to be your biggest problem: please give them a try and let me know if they help. App Engine 1.4.0 also raises many of the limits from previous versions: 10 minutes for task queue tasks instead of 30 seconds, and increased API call size limits for the UrlFetch, Image, Memcache and Mail APIs. A combination of longer-running task queues and Memcache should help you prepare the data you need during requests.

    Also, announced yesterday, the High Replication Datastore for App Engine http://googleappengine.blogspot.com/2011/01/announcing-high-replication-datastore.html gives you better Datastore availability at the cost of increased latency for writes and changes in consistency guarantees in the API (there is no way around Brewer's CAP theorem:-)

    Google App Engine is a rapidly evolving platform and we try to fix the main pain points that developers have with each release. I hope that the new features in 1.4.0 and the High Replication Datastore will help address some of your concerns with the platform. Thank you very much for the detailed feedback, this is exactly what we need to hear in order to make the platform better.

    Patrick Chanezon - Google Developer Relations Manager - Cloud

  • Re: The Promise of the Cloud

    Patrick, thanks for the detailed comment. I knew about some of these changes but not all of them. Checking out your links...

  • Re: The Promise of the Cloud

    The whole scalability thing is IMO a distraction. GAE is really all about offering a platform for developers who don't want to be system administrators. Before GAE, if you built something with Rails, or on Django, or whatever, you were going to spend a lot of time doing stuff like configuring Apache, installing databases, setting up some virtual or dedicated host, etc.

    GAE takes away that whole headache. Most of the people that I talk to that are really big fans also happen to be people that do not enjoy that system administration stuff at all.

  • Re: The Promise of the Cloud

    @Matt Agreed. Not having to maintain security patches and tweak out servers and load balancers is HUGE. I didn't mean to downplay that in the post; I do want GAE to be a little easier to develop code for though.

    @Patrick Whoa! I just started using the App warmup feature and this is exactly something we needed.

  • Re: The Promise of the Cloud

    Kumar, glad to hear App warmup works for you, keep sending us feedback!

