The End Of Agile Estimation: Lies, Damn Lies, And Burndown Charts

by simbo1905

In my last couple of posts, I gave an example of a successful agile team that didn’t do estimates, then made the case that estimation is aligned with “wagile” ways of working. This post makes the case that estimation is fundamentally flawed when viewed as a measurement activity.

A common approach to agile estimation is to have a team play planning poker to come up with a story point estimate for a given story. What is a story point? It is literally whatever the team agrees it is. The idea is that a team, over time, converges on some reasonably consistent usage. In parallel, the team also converges on some real velocity expressed in the same terms. The result, supposedly, is that we can then reliably predict that the stories we take from a prioritised backlog will fit into the next sprint. Once upon a time, I bought into this fairy tale enough to write a mobile-optimised planning poker app (which is still on GitHub).
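To make the mechanics concrete, the converge-on-velocity loop can be sketched in a few lines of Python. This is purely illustrative: the function names, the three-sprint window, and all the numbers are my assumptions, not taken from any real tool. Notice that every input is itself a negotiated guess.

```python
# Illustrative sketch of velocity-based sprint planning.
# All names and numbers are hypothetical.

def sprint_velocity(completed_points_history, window=3):
    """Rolling average of story points 'completed' in recent sprints."""
    recent = completed_points_history[-window:]
    return sum(recent) / len(recent)

def plan_sprint(backlog, velocity):
    """Take stories from a priority-ordered backlog until velocity is used up."""
    planned, capacity = [], velocity
    for name, points in backlog:
        if points <= capacity:
            planned.append(name)
            capacity -= points
    return planned

history = [21, 19, 23]  # points claimed as completed in the last 3 sprints
backlog = [("login", 8), ("search", 13), ("export", 5), ("audit", 3)]

v = sprint_velocity(history)    # 21.0
print(plan_sprint(backlog, v))  # ['login', 'search']
```

The arithmetic is trivially precise; the problem this post argues is that both the history and the backlog sizes are negotiated fictions, so the precision is illusory.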

In the last post, we pointed out that if we are aiming for predictability, folks might unconsciously overestimate stories. They can then either speed up or slow down to complete the work within the estimated time. That implies that the work rate can vary to ensure that stories will fit into a sprint. We don’t want that, as it reduces the total amount of productive work done for users. The following quote points out this effect:

“Software project estimation often deteriorates into a political game. Software developers give optimistic estimates to get gold stars. Business stakeholders, burned before by failed projects, forcefully demand all features on an arbitrary schedule. The final schedule is determined by horse-trading rather than engineering. Both sides have legitimate needs but end up in a lose-lose situation because it isn’t politically safe to communicate these needs.” Excerpt From The Tao of Microservices by Richard Rodger

We can, and should, negotiate the amount of work to be done on a story. Agile aims to get real-world feedback and to iterate. Failing faster is better, and the shorter the feedback cycle, the better. So we should aggressively make each story the absolute minimum and iterate on it in later sprints. More iterations mean more total work done, but that’s okay: it optimises user outcomes, and that is more important than optimising the work.

Another problem with estimation is that there are constraints on which stories can be worked on in parallel. Ideally, stories should be independent so that they can be developed and released independently. Yet that is often impossible in practice. Shared code may need to be enhanced, and we need to avoid people stepping on each other’s toes. There may also be scheduling interdependencies based on the availability of collaborators. So to keep the team fully utilised we might flex the scope of a group of tickets. This can arbitrarily affect how many iterations we perform on any given story, or how much work is done on it in any given sprint.

So the work rate, the amount of work, and the number of iterations on a story are all variable and negotiable. What exactly are you then measuring when you estimate stories using planning poker? If two developers guess two different sizes for a story, are they thinking about the same scope? If you say “yes, as the acceptance criteria are fixed”, then that’s wagile (aka water-scrum-fall). Upfront design and fixed acceptance criteria don’t honour feedback and iteration as the top priority, so it isn’t true agile. Upfront design in a two-week sprint is still waterfall, just a very short waterfall. We should measure success with outcome-oriented assessments rather than activity-oriented measurements. “Velocity” is actually a hangover from waterfall: it is all about managing activities, not outcomes.

In software development, the code is the only design that counts. The deployment of the code is the implementation. The phrase “put the code into production” is not a metaphor; it is a description of what really happens. The deployed code produces the user outcomes, based on how the developer designed it. The only meaningful acceptance criteria must be based on measuring end-user outcomes with the deployed code. We want feedback, and we aim to be agile in acting upon it. The true “definition of done” is when there is no further end-user feedback that gets prioritised above any other work.

The inherent flexibility in the work to be done on a story implies that the decision to take a group of stories into a sprint isn’t about accurately estimating their size. It is simply the team agreeing that they can flex the scope of a set of stories and sequence them to fit into the sprint. Why not put the planning poker cards down and have a direct discussion about scope and interdependencies?

The fact is that in agile the time and people are fixed and the scope is plastic. A steady velocity only shows that a team is skilled either at flexing and scheduling work to keep itself at high utilisation, or at padding estimates and varying its work rate to look predictable. Likewise, a straight line on a burndown chart is likely only showing that the team is optimising for that wrong-headed key performance indicator. How can you tell the difference? Measure and optimise for user satisfaction and real agility in changing direction. Stop trying to measure or optimise team activities using wagile techniques like planning poker.
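The self-referential nature of story points can be shown with a toy calculation (all numbers are hypothetical, my own illustration): if a team uniformly pads its estimates, velocity scales by the same factor, and the sprint-to-sprint variation relative to velocity is unchanged. The burndown looks exactly as steady while saying nothing about delivered value.

```python
# Hypothetical numbers: story points are whatever the team agrees they are,
# so uniformly padding estimates leaves apparent "predictability" untouched.

def velocity(points_per_sprint):
    """Average estimated points completed per sprint."""
    return sum(points_per_sprint) / len(points_per_sprint)

def spread(points_per_sprint):
    """Sprint-to-sprint variation, as a fraction of velocity."""
    v = velocity(points_per_sprint)
    return (max(points_per_sprint) - min(points_per_sprint)) / v

honest = [18, 22, 20]             # one team's completed estimates
padded = [p * 2 for p in honest]  # same work, every estimate doubled

print(velocity(honest), velocity(padded))  # 20.0 40.0
print(spread(honest) == spread(padded))    # True: equally "predictable"
```

The unit cancels out, which is exactly why a steady velocity or a straight burndown line cannot, on its own, tell you anything about outcomes.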