Velocity 2011 retrospective

Resilience engineering and data's role in performance are key trends in web ops.

This was my third Velocity conference, and it’s been great to see it
grow. Smoothly running a conference with roughly 2,000 registered attendees
(well over 50% more than 2010) is itself a testament to the
importance of operations.

Velocity 2011 had many highlights, and below you’ll find mine. I’ll confess to
some biases up front: I’m more interested in operations than
development, so if you think

are underrepresented here, you’re probably right.
There was plenty of excellent content in those tracks, so please add your thoughts about those and other sessions in the comments area.

Blame and resilience engineering

I was particularly impressed by John Allspaw’s session on
“Advanced Post-mortem Fu.” To summarize it in a sentence, it was about getting
beyond blame. The job of a post-mortem analysis isn’t to assign blame
for failure, but to realize that failure happens and to plan the
conditions under which failure will be less likely in the future.
This isn’t just an operational issue; moving beyond blame is a
broader cultural imperative. Cultural historians have made much of
the transition from shame culture — where if you failed you were
“shamed” and obliged to leave the community — to guilt culture, where
shame is internalized (the world of Hawthorne’s
“Scarlet Letter”). Now, we’re clearly in a “blame culture,” where it’s always someone else’s
fault and nothing is ever over until the proper people have been
blamed (and, more than likely, sued). That’s not a way forward, for
web ops any more than finance or medicine. John presented
some new ways for thinking about failure, studying it, and
making it less likely without assigning blame. There is no single
root cause; many factors contribute to both success and failure, and
you won’t understand either until you take the whole system into
account. Once you’ve done that, you can work on ways to improve
operations to make failure less likely.

John’s talk raised the idea of “resilience engineering,” an important
theme that’s emerging from the Velocity culture. Resilience
engineering isn’t just about making things that work; anyone can do
that. It’s about designing systems that stay running in the face of
problems. Along similar lines, Justin Sheehy’s talk was specifically about
resilience in the design of Riak. It was fascinating to
see how to design a distributed database so that any node could
suddenly disappear with no loss of data. Erlang, which Riak uses, encourages developers
to partition tasks into small pieces that are free to crash,
running under a supervisor that restarts failed tasks.
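
To make the "free to crash, restarted by a supervisor" idea concrete, here is a minimal sketch in Python rather than Erlang. It is not how Riak or Erlang/OTP implements supervision; the names `supervise`, `make_flaky_task`, and `max_restarts` are hypothetical and exist only for illustration.

```python
from typing import Callable, Tuple


def supervise(task: Callable[[], str], max_restarts: int = 3) -> Tuple[str, int]:
    """Run task, restarting it after a crash, up to max_restarts times.

    Returns the task's result and the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            return task(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and let the caller decide what to do
            restarts += 1


def make_flaky_task(failures_before_success: int) -> Callable[[], str]:
    """Build a hypothetical task that crashes a fixed number of times."""
    state = {"calls": 0}

    def task() -> str:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("simulated crash")
        return "done"

    return task


result, restarts = supervise(make_flaky_task(2))
print(result, restarts)
```

The task itself contains no error handling; recovery lives entirely in the supervisor, which is the division of labor the paragraph above describes.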

Bryan Cantrill’s excellent presentation on instrumenting the real-time web
using Node.js and DTrace would win my vote for the best
technical presentation of the conference, but it was most notable for his
rant on Greenland and Antarctica’s plot to take over the world.
While the rant was funny, it’s
important not to forget the real message: DTrace is an underused but
extremely flexible tool that can tell you exactly what is going on
inside an application. It’s more complex than other profiling tools
I’ve seen, but in return for complexity, it lets you specify exactly
what you want to know, and delivers results without requiring special
compilation or even affecting the application’s performance.

Data and performance

John Rauser’s workshop on statistics
(“Decisions in the Face of Uncertainty”), together with his keynote (“Look
at your Data”), was another highlight. The workshop was an
excellent introduction to working with data, an increasingly important
tool for anyone interested in performance. But the keynote took it a
step further, going beyond the statistics and looking at the actual
raw data, spread across the living room floor. That was a powerful
reminder that summary statistics are not always the last word in data:
the actual data, the individual entries in your server logs, may hold
the clues to your performance problem.
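
A small sketch of why the raw entries matter, using made-up response times (these numbers are hypothetical and do not come from the talk): a single slow request drags the mean far from the typical case, the median hides it entirely, and only the individual entries show what happened.

```python
from statistics import mean, median

# Hypothetical response times in milliseconds, as might be pulled from a server log.
latencies_ms = [102, 98, 110, 95, 105, 101, 99, 4800, 97, 103]

avg = mean(latencies_ms)    # pulled far above a typical request by one entry
mid = median(latencies_ms)  # close to a typical request, but hides the slow one

# Going back to the individual entries exposes the request worth investigating.
slow = [x for x in latencies_ms if x > 10 * mid]

print(avg, mid, slow)
```

Neither summary number alone would point you at the 4800 ms request; scanning the entries does.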

Velocity observations and overarching themes

There were many other memorable moments at Velocity (Steve Souders’ belly dance wasn’t one of them). I was amazed that Sean Power managed to do an Ignite
Karaoke (a short impromptu presentation against a set of slides he
didn’t see in advance) that wasn’t just funny, but actually almost made sense.

I could continue, but I would end up listing every session I
attended; my only regret is that I couldn’t attend more.
Video for the conference keynotes is available online, so you can catch up on
some of what you missed. The post-conference online access pass
provides video for all the sessions for
which presenters gave us permission.

The excellent sessions aren’t the only news from Velocity. The
Velocity Women’s Networking Meetup had more than double the previous year’s
attendance; the group photo (right) has more people than I can count. The
job board was stuffed to the gills. The exhibit hall dwarfed
2010’s (I’d guess we had three times as many exhibitors), and there were doughnuts! But more
than the individual sessions, the exhibits, the food, or the parties,
I’ll remember the overarching themes of cultural change; the many resources
available for studying and improving performance; and most of all, the
incredible people I met, all of whom contributed to making this
conference a success.

We’ll see you at the upcoming Velocity Europe in November and Velocity China in December, and next year at Velocity 2012 in California.



  • Ras Fred

    Bad link – “Advanced Post-mortem Fu.” Perhaps “Advanced Postmortem Fu and Human Error 101”?

  • Mac Slocum

    @Ras: Thanks for the catch and for digging up the correct link. That’s much appreciated. The post has been updated.