Velocity 2011 retrospective

This was my third Velocity conference, and it’s been great to see it grow. Smoothly running a conference with roughly 2,000 registered attendees (well over 50% more than in 2010) is itself a testament to the importance of operations.

Velocity 2011 had many highlights, and below you’ll find mine. I’ll confess to some biases up front: I’m more interested in operations than development, so if you think mobile and web performance are underrepresented here, you’re probably right. There was plenty of excellent content in those tracks, so please add your thoughts about those and other sessions in the comments area.

Blame and resilience engineering

I was particularly impressed by John Allspaw’s session on “Advanced Post-mortem Fu.” To summarize it in a sentence, it was about getting beyond blame. The job of a post-mortem analysis isn’t to assign blame for failure, but to recognize that failure happens and to plan for conditions under which failure will be less likely in the future. This isn’t just an operational issue; moving beyond blame is a broader cultural imperative. Cultural historians have made much of the transition from shame culture, where if you failed you were “shamed” and obliged to leave the community, to guilt culture, where shame is internalized (the world of Hawthorne’s “Scarlet Letter”). Now, we’re clearly in a “blame culture,” where it’s always someone else’s fault and nothing is ever over until the proper people have been blamed (and, more than likely, sued). That’s not a way forward, for web ops any more than for finance or medicine. John presented some new ways of thinking about failure, studying it, and making it less likely without assigning blame. There is no single root cause; many factors contribute to both success and failure, and you won’t understand either until you take the whole system into account. Once you’ve done that, you can work on ways to improve operations to make failure less likely.

John’s talk raised the idea of “resilience engineering,” an important theme that’s emerging from the Velocity culture. Resilience engineering isn’t just about making things that work; anyone can do that. It’s about designing systems that stay running in the face of problems. Along similar lines, Justin Sheehy’s talk was specifically about resilience in the design of Riak. It was fascinating to see how to design a distributed database so that any node could suddenly disappear with no loss of data. Erlang, which Riak uses, encourages developers to partition tasks into small pieces that are free to crash, running under a supervisor that restarts failed tasks.
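The "free to crash, restarted by a supervisor" idea can be sketched in a few lines. This is only an illustration in Python, not Erlang and not how Riak is implemented; the names `supervise` and `make_flaky_task` and the `max_restarts` limit are hypothetical, made up for this example.

```python
from typing import Callable, Tuple


def supervise(task: Callable[[], str], max_restarts: int = 3) -> Tuple[str, int]:
    """Run a small task, restarting it if it crashes.

    Returns the task's result and the number of restarts that were needed.
    Gives up and re-raises once max_restarts (an assumed policy) is exceeded.
    """
    restarts = 0
    while True:
        try:
            return task(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1


def make_flaky_task(failures_before_success: int) -> Callable[[], str]:
    """Build a hypothetical task that crashes a fixed number of times, then succeeds."""
    remaining = failures_before_success

    def task() -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise RuntimeError("simulated crash")
        return "done"

    return task


if __name__ == "__main__":
    result, restarts = supervise(make_flaky_task(2))
    print(result, restarts)
```

The task itself contains no error handling; the recovery policy lives entirely in the supervisor, which is the separation the paragraph above describes.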

Bryan Cantrill’s excellent presentation on instrumenting the real-time web using Node.js and DTrace would win my vote for the best technical presentation of the conference, but it was most notable for his rant on Greenland and Antarctica’s plot to take over the world. While the rant was funny, it’s important not to forget the real message: DTrace is an underused but extremely flexible tool that can tell you exactly what is going on inside an application. It’s more complex than other profiling tools I’ve seen, but in return for complexity, it lets you specify exactly what you want to know, and delivers results without requiring special compilation or even affecting the application’s performance.

Data and performance

John Rauser’s workshop on statistics (“Decisions in the Face of Uncertainty”), together with his keynote (“Look at your Data”), was another highlight. The workshop was an excellent introduction to working with data, an increasingly important tool for anyone interested in performance. But the keynote took it a step further, going beyond the statistics and looking at the actual raw data, spread across the living room floor. That was a powerful reminder that summary statistics are not always the last word in data: the actual data, the individual entries in your server logs, may hold the clues to your performance problem.
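A small sketch of why the raw entries matter. The latency values below are invented for illustration (they are not from the keynote), and the 1,000 ms threshold is an arbitrary assumption; the point is only that a summary number can hide a distinct cluster of slow requests.

```python
import statistics

# Hypothetical response times in milliseconds: 95 fast requests, 5 very slow ones.
latencies_ms = [100] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)      # pulled upward by the slow cluster
median_ms = statistics.median(latencies_ms)  # reports only the typical request

# Looking at individual entries reveals what neither summary shows:
# a separate group of requests that are 20x slower than the rest.
slow_requests = [ms for ms in latencies_ms if ms > 1000]  # assumed threshold

if __name__ == "__main__":
    print("mean:", mean_ms)
    print("median:", median_ms)
    print("slow requests:", len(slow_requests), "of", len(latencies_ms))
```

Neither the mean nor the median corresponds to any request that actually occurred in a way that explains the problem; only the individual entries show that 5% of requests behave completely differently.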

Velocity observations and overarching themes

There were many other memorable moments at Velocity (Steve Souders’ belly dance wasn’t one of them). I was amazed that Sean Power managed to do an Ignite Karaoke (a short impromptu presentation against a set of slides he didn’t see in advance) that wasn’t just funny, but actually almost made sense.

I could continue, but I would end up listing every session I attended; my only regret is that I couldn’t attend more. Video for the conference keynotes is available online, so you can catch up on some of what you missed. The post-conference online access pass provides video for all the sessions for which presenters gave us permission.

The excellent sessions aren’t the only news from Velocity. The Velocity Women’s Networking Meetup had more than double the previous years’ attendance; the group photo (right) has more people than I can count. The job board was stuffed to the gills. The exhibit hall dwarfed 2010’s — I’d guess we had three times as many exhibitors — and there were doughnuts! But more than the individual sessions, the exhibits, the food, or the parties, I’ll remember the overarching themes of cultural change; the many resources available for studying and improving performance; and most of all, the incredible people I met, all of whom contributed to making this conference a success.

We’ll see you at the upcoming Velocity Europe in November and Velocity China in December, and next year at Velocity 2012 in California.
