Feedback is Different

Doing away with heuristic constants

In our last post, we pointed out that feedback is different from common algorithmic thinking. In this post, we discuss these differences in more detail.

Typical algorithms tend to be deterministic and are grounded in the assumption that all possible outcomes can be enumerated: “Take the middle element in the array. If it is less than or equal to the pivot, do this; otherwise, do that.” Control systems built with this mindset tend to be rule-based and heuristic: “Every day at 10am, spin up 15 more servers, then take them down again at 4pm.”

Such systems tend to suffer from two problems, both of which stem from the same underlying cause: the controller is fixed (deterministic) and does not take the actual state of the system into account.

The first problem is the occurrence (and, usually, proliferation) of heuristic constants, where “heuristic” is really a euphemism for “ad hoc”. Why do we spin up 15 servers? Why not 16? And why at 10am and not 10 minutes earlier? Truth be told, because that worked and seemed like a good idea at the time. (As the business grows and changes, such fixed configuration parameters can become a serious legacy burden.)

The other problem is that, over time, the rules become ever more complex, usually in response to fire drills. Fifteen additional servers were good enough for a while, but then the Christmas season hit, and we really needed 35. And then somebody noticed that we do not need to spin up as many servers on a weekend. On the other hand, in the middle of the week, traffic begins to grow at 8:30, so let’s spin up 5 servers at 8:30, 5 more at 9:00, and the last 5 at 10:00. That should do it… (Except when…)
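To make the proliferation of constants concrete, here is a minimal sketch of what such a rule set might look like once it is written down as code. It is hypothetical: the function name `scheduled_servers`, the baseline `BASE_SERVERS`, the `christmas_season` flag, and the weekend rule (5 extra servers from 10am) are illustrative assumptions; only the 8:30/9:00/10:00 steps, the 4pm shutdown, and the 35 Christmas servers come from the scenario described above.

```python
from datetime import datetime

BASE_SERVERS = 10  # hypothetical baseline fleet size


def scheduled_servers(now: datetime, christmas_season: bool = False) -> int:
    """Server count prescribed by fixed, hand-written rules (a feedforward controller).

    Nothing here looks at actual traffic or response time.
    """
    t = now.hour * 60 + now.minute      # minutes since midnight
    is_weekend = now.weekday() >= 5     # Monday == 0, so 5 and 6 are Saturday and Sunday
    if t >= 16 * 60:                    # "take them down again at 4pm"
        return BASE_SERVERS
    if christmas_season and t >= 10 * 60:
        return BASE_SERVERS + 35        # "we really needed 35"
    if is_weekend:                      # hypothetical weekend rule
        return BASE_SERVERS + (5 if t >= 10 * 60 else 0)
    if t >= 10 * 60:
        return BASE_SERVERS + 15        # the last 5 at 10:00
    if t >= 9 * 60:
        return BASE_SERVERS + 10        # 5 more at 9:00
    if t >= 8 * 60 + 30:
        return BASE_SERVERS + 5         # the first 5 at 8:30
    return BASE_SERVERS


print(scheduled_servers(datetime(2024, 3, 6, 9, 15)))  # a Wednesday → 20
```

Every branch encodes a date, a time, or a server count that someone picked by hand, and nothing in the function ever checks how the system is actually performing.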

The ultimate problem is that the “real world” is messier and less predictable than a classical data structure, and hence a feedforward approach that prescribes in advance which step to take and when is bound to fail. Robust systems must take the actual state of the system into account and respond to it appropriately.

That’s what a feedback loop does. It constantly monitors some quality-of-service metric, and when it notices a deviation from the desired behavior, it applies a correction to drive the system back towards the target. This simplifies the rule engine tremendously, because everything can be summarized in a single rule: “If the response time goes up, spin up more servers, and vice versa.” Notice how this does away with the “heuristic” constants completely, and at the same time works under all conditions: daytime, nighttime, weekends, you name it. The controller will do whatever it takes to keep the system responsive.
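As a minimal sketch of that single rule, assuming a toy model in which response time is simply the load divided by the number of servers, the loop below adds a server whenever the measured response time exceeds the target and removes one whenever it falls below. The model, the `target` and `tolerance` values, the one-server step size, and the function names are hypothetical choices for illustration, not a recommended design.

```python
def response_time(load: float, servers: int) -> float:
    """Hypothetical toy model: response time grows with the load per server."""
    return load / servers


def feedback_step(servers: int, measured: float, target: float, tolerance: float) -> int:
    """One iteration of the rule: too slow -> add a server, too fast -> remove one."""
    if measured > target + tolerance:
        return servers + 1
    if measured < target - tolerance and servers > 1:
        return servers - 1
    return servers  # inside the dead band: leave the system alone


def simulate(loads, servers=1, target=100.0, tolerance=20.0):
    """Run the loop once per load sample and record the server count after each step."""
    history = []
    for load in loads:
        measured = response_time(load, servers)                        # observe the actual state
        servers = feedback_step(servers, measured, target, tolerance)  # correct it
        history.append(servers)
    return history


# Moderate traffic, then a surge, then a quiet period (hypothetical numbers).
loads = [1000] * 30 + [2000] * 30 + [500] * 30
history = simulate(loads)
print(history[29], history[59], history[89])  # → 9 17 6
```

The same loop handles all three load levels without any reference to the clock: the server count settles at 9, 17, and 6 respectively. The remaining parameters describe the goal (a target response time, with a small dead band to avoid flapping) rather than a schedule.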

Designing such systems requires a shift in thinking, because we must let go of the prescriptive mindset that wants to enumerate all possible outcomes and prescribe exactly what to do in each case. Instead, we must think about how the system behaves and design the corrective action that will drive it in the desired direction.

It also takes real nerve to design and commission such a system: feedback control only works with the hands off the wheel! It can be very difficult to relinquish explicit, deterministic control of a real-world system – until one sees it work, like magic, in practice.

And finally, it takes skill and knowledge to build such a system. Naive approaches are likely to fail, so that one ends up sticking with the devil one knows and adding yet more exceptions to the rule engine. The primary challenge is to “tune” the control loop so that it responds quickly enough to changes in the environment (such as changes in traffic intensity) without driving the system towards instability. Instability would manifest itself as control actions that are too large: adding hundreds of servers one moment, only to take them all down again as soon as the response time begins to drop.
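The tuning trade-off shows up even in a toy setting. The sketch below assumes a proportional correction (the change in server count is a fixed multiple, `gain`, of the response-time error), treats the server count as a continuous number, and reuses the hypothetical model response time = load / servers. The function name, the gain values, and the load and target figures are illustrative assumptions chosen only to show the two regimes.

```python
def simulate_proportional(gain: float, load: float = 1000.0, target: float = 100.0,
                          servers: float = 8.0, steps: int = 50):
    """Proportional correction on a toy plant; returns the server count after each step."""
    history = []
    for _ in range(steps):
        measured = load / servers          # hypothetical plant: response time = load / servers
        error = measured - target          # positive means too slow, so add capacity
        servers = max(1.0, servers + gain * error)
        history.append(servers)
    return history


gentle = simulate_proportional(gain=0.02)      # small corrections
aggressive = simulate_proportional(gain=0.3)   # large corrections
print(round(gentle[-1], 3))  # → 10.0
print([round(s, 1) for s in aggressive[:3]])
```

With the small gain, the server count climbs from 8 towards the equilibrium of 10 without ever overshooting. With the large gain, the first correction jumps to 15.5 servers, the next drops below 5, and the one after that exceeds 36: the kind of oscillation described above, in miniature.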

But before we can talk about controller tuning, we first need to understand what a typical feedback controller looks like. This will be the topic of our next post. Stay tuned!
