Our first approach at improving the predictability of sprints
This article sat in a virtual drawer for months. At first I shared it only with our team; then I wanted to publish it here, but decided to wait until the overall approach had proven itself with a second team.
Shortly before Christmas we ran that second experiment, and it proved successful, so it was finally time to open the drawer and publish…
On Wednesday we had the opportunity to make our first measurement of the relation between T-Shirt sizes and person-working-days. It took us 5 sprints (10 weeks in total) to reach this state. What’s the fuss all about? Why did it take so long? What did it give us?
What did it give us?
We were able to considerably improve our sprint predictability. We started from a state where our initial estimates led to sprints overloaded to roughly 150% of our capacity.
This caused considerable tension between the development team and the business team, because the promised sprint goals were not being delivered.
Over the course of roughly seven sprints our team reached a stable working state, easing the tension with cooperating teams and making planned deliveries predictable.
What’s the fuss all about?
The Agile methodology tries to bridge the world of prototyping (hello, developers!), where the uncertainty of “what will be done when” is large, with the world of production (hello, managers!), where clear predictions are everyday life and people ask “by when will it be ready?”.
Since prototyping deals with a considerable amount of unknowns, the software developers are often simply unable to predict when they will be ready to ship something.
Bugs may appear, additional difficulties may arise, unforeseen interdependencies may complicate the task at hand.
Predicting how many working days will be necessary to complete a task resembles divination.
Nevertheless, humans have one very handy and genuinely puzzling trick up their sleeve: they can still estimate the complexity of the tasks at hand quite well. One such method is based on ‘T-Shirt sizes’: tasks are grouped into ‘very complicated, complex, regular, easy, very easy’ classes, labeled XL, L, M, S, XS respectively.
Knowing that humans can pretty well estimate the complexity of tasks one could say “if only there was a way to translate these into person days…” and actually, there is!
How to translate from T-Shirt sizes to PDs?
To be fair: by now some of you might be wondering “why would you waste FIVE sprints to measure this? Just take the Story-Point Fibonacci scale with 1 SP for a simple task that takes about half a day and you’re done…”.
That is actually the basis of our first hypothesis, namely: “the effort for complexity classes doesn’t have to follow the Fibonacci scale”. If it doesn’t, it contributes to the estimation error.
We wanted to be sure, so we measured this for our team.
If you remember middle school and solving simple systems of linear equations (yes, these do come in handy after all!), then this is exactly one of those cases.
A + B = 5
A - B = 3
can be solved by adding row 2 to row 1 to eliminate B, giving
2A = 8,
which finally leads to the solution
A = 4, B = 1.
The same logic can be applied to a slightly more complicated set of equations for XL, L, M, S, XS.
If you remember the conditions under which such a system is solvable: you need at least 5 equations (one per unknown), and the determinant of the coefficient matrix must be non-zero (this I will not explain, stay calm). This is why we needed to collect data over 5 sprints to verify this hypothesis.
Our set of equations was:
L + 2M + S = 79
XL - L + 2M - S = 65
2L + S = 64
M + 3S + XS = 65
L + M + S - XS = 53
The final solution was (see here):
XL = 63, L = 51/2, M = 81/4, S = 13, XS = 23/4,
or in more eye-friendly version:
XL = 63, L = 25.50, M = 20.25, S = 13, XS = 5.75
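The system above can also be solved mechanically. A minimal sketch using NumPy (variable order XL, L, M, S, XS; the numbers are the ones from our sprints):

```python
import numpy as np

# Coefficient matrix: one row per sprint equation,
# columns in the order XL, L, M, S, XS.
A = np.array([
    [0,  1, 2,  1,  0],   # L + 2M + S       = 79
    [1, -1, 2, -1,  0],   # XL - L + 2M - S  = 65
    [0,  2, 0,  1,  0],   # 2L + S           = 64
    [0,  0, 1,  3,  1],   # M + 3S + XS      = 65
    [0,  1, 1,  1, -1],   # L + M + S - XS   = 53
])
b = np.array([79, 65, 64, 65, 53])

# A unique solution exists only if the matrix is non-singular.
assert np.linalg.det(A) != 0

xl, l, m, s, xs = np.linalg.solve(A, b)
print(xl, l, m, s, xs)  # 63.0 25.5 20.25 13.0 5.75
```

Substituting the result back into any of the five equations confirms it, e.g. 2·25.5 + 13 = 64.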
How did we estimate?
During each sprint planning, we sat down and ‘played’ sprint poker with the entire team to estimate the complexity of the tasks at hand. Why we did this is best illustrated by this short video on how to count the number of gumballs in a vending machine. As absurd as it may seem, this has a very profound basis in science, even though it starts off with no more than ‘gut feeling’.
What does each equation correspond to?
So in our case, sprint #1 delivered a total of one large, two medium, and one small complexity story, costing the team 79 person-days of effort, leading to the equation:
L + 2M + S = 79
Where did the negative terms come from?
You have very likely noticed the -L and -S terms in equation #2 and may be wondering “what on Earth is a NEGATIVE large story?”. These came from measuring progress and repeating the estimations.
Estimations are not always right, nor does reality allow us to deliver everything as we would wish; sometimes we deliver only part of a story we committed to.
If we began a sprint with an XL-sized story and ended the sprint with an L-sized leftover, we took it into account using a term like (XL - L).
Why did we repeat the estimations?
Since some a-priori estimates may have been wrong, we repeated the estimation process at the end of each sprint to correct them with all the knowledge gained along the way. This, for instance, led to re-labeling some initially L-sized stories to their nightmarish and fully deserved XL size.
Armed with this a-posteriori knowledge and updated estimates we were able to refine our set of equations to give us more accurate results.
What do the numbers mean?
The left-hand side of the equations obviously corresponds to book-keeping the progress of sprint deliverables, so we need to take a look at the right-hand side.
It book-keeps the team’s effort and availability: time when team members were unavailable for any reason (sickness, vacations) or were investigating bug tickets (these are part of sprint planning, but cannot be estimated like stories, since they quite often amount to a mass of experimenting).
These were estimated with an accuracy of up to 1 person-day (later abbreviated as PD) and introduced into the equations as follows:
Say that during a given sprint one PD was spent investigating a bug ticket and the overall team availability was 80 PDs. The equation would then look like
L + 2M + S + 1 = 80, which simplifies to
L + 2M + S = 79.
How to build the equations?
The left side contains the sum of complexities over all story tickets tackled in the sprint, plus the sum of PDs spent on all time-boxed tickets (spikes, bugs, etc.):
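The equation itself appeared as an image in the original article; reconstructed from the description that follows, it reads:

$$\sum_i \left( S_i^{\mathrm{start}} - S_i^{\mathrm{end}} \right) + \sum_j T_j = P$$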
The S terms denote the i-th story’s complexity as estimated at the sprint start and end respectively; in short, how much progress was achieved for a given story during the sprint.
The T term denotes the amount of PDs spent on time-boxed items.
The right side contains terms P corresponding to the total amount of person-days that the team was able to provide.
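Book-keeping like this is easy to mechanize. Here is a minimal sketch (the function name and data shapes are my own illustration, not from our spreadsheets) that builds one equation from a sprint’s records:

```python
SIZES = ["XL", "L", "M", "S", "XS"]

def sprint_equation(stories, timeboxed_pd, availability_pd):
    """Build one linear equation (coefficients, right-hand side) per sprint.

    stories: list of (size_at_start, size_at_end) pairs;
             size_at_end is None when the story was fully delivered.
    timeboxed_pd: PDs spent on time-boxed items (spikes, bugs, ...).
    availability_pd: total PDs the team could provide.
    """
    coeff = {size: 0 for size in SIZES}
    for start, end in stories:
        coeff[start] += 1       # complexity taken on at sprint start ...
        if end is not None:
            coeff[end] -= 1     # ... minus the leftover, e.g. (XL - L)
    # Time-boxed PDs are known numbers, so they move to the right-hand side.
    return coeff, availability_pd - timeboxed_pd

# Sprint #1: one L, two M and one S fully delivered, 1 PD spent on a bug,
# 80 PDs of availability  ->  L + 2M + S = 79
coeff, rhs = sprint_equation(
    [("L", None), ("M", None), ("M", None), ("S", None)],
    timeboxed_pd=1, availability_pd=80)
print(coeff, rhs)  # {'XL': 0, 'L': 1, 'M': 2, 'S': 1, 'XS': 0} 79
```

Collecting one such (coefficients, rhs) pair per sprint yields exactly the system of five equations shown above.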
The assumptions pitfall…
The classical Agile planning very often relies on assumptions like:
Story complexity-sizes arithmetic follows the Fibonacci sequence: 1,2,3,5,8
to account for emerging complexity when small tasks are being integrated into bigger stories and epics. Translating to:
XS + S = M, S + M = L, M + L = XL.
A story with complexity XS takes about a day, S two days, consequently M is half a work week, L a full week, and XL nearly two weeks.
However, as also happens quite often, these assumptions may be like a broken clock that shows the correct time twice a day: sometimes right, but with no guarantee they will work for every team.
For the assumptions to work, the team using the methodology has to accidentally have a gut feeling for complexity that really matches them in the first place.
When this is not the case, the team wanders into a trap of trying to do arithmetic on string representations of numbers.
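That trap can be illustrated with two lines of Python (my illustration, not from the article): “adding” labels merely concatenates them instead of accumulating effort:

```python
# "Adding" string labels concatenates them instead of adding effort:
print("1" + "2")    # prints 12, not 3
print("XS" + "S")   # prints XSS, certainly not M
```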
…solved by measuring
Pitfall 1: The Fibonacci scale of complexity.
To avoid this pitfall, our team has chosen to measure and verify:
Do our estimates really align with the assumption of the Fibonacci scale?
and found out that the answer is: No.
As you can easily see, our result of 63, 25.5, 20.25, 13, 5.75 is nowhere near the correspondingly scaled Fibonacci sequence of 51, 32, 19, 13, 6.
So if we had decided to use the Fibonacci scale to start with (the three lowest complexity sizes do fit the sequence pretty well), the large and extra-large stories would have contributed huge errors to our estimations.
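To make the comparison concrete, here are the two sequences side by side (the numbers are taken from the text above; the differences are my own arithmetic):

```python
# Measured effort per size vs. the scaled Fibonacci sequence (XL..XS)
measured  = [63.0, 25.5, 20.25, 13.0, 5.75]
fibonacci = [51.0, 32.0, 19.0,  13.0, 6.0]

diffs = [m - f for m, f in zip(measured, fibonacci)]
print(diffs)  # [12.0, -6.5, 1.25, 0.0, -0.25]
```

The three smallest sizes differ by at most 1.25 PD, while L and XL miss by 6.5 and 12 PDs, which is exactly where large estimation errors would accumulate.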
An additional take-away from this result was a rule of thumb: don’t take XL stories into a sprint; split them up instead.
Pitfall 2: Assume an XS story takes one PD.
Another pitfall we avoided by verifying an assumption:
Does an extra-small task take one PD on average?
Here too, the answer was: No.
In our measurements, for our team it turned out to be about 6 PDs.
Just imagine the consequence of taking this assumption as an axiom…
This gigantic difference between the assumed and real values would only have made our planning diverge more and more. Now, having gained this insight, we can adjust our planning and refine the results.
We can plan more accurately!
That does not mean the work is done and we can use these numbers forevermore. The planning accuracy can still be improved, and that remains our aim.
Luckily, we do not need to invest another 10 weeks to improve: every single sprint should now increase the accuracy of our predictions.
One needs to keep in mind that there are quite a few intricacies involved:
- The team learns to estimate better. This should improve our results, but it also means the equations describe a dynamic, changing system.
- The team itself evolves: people join, people leave, people are affected by each other and by what happens in their lives. The equations greatly simplify what we do.
- The team’s surroundings evolve: we do our best to increase the quality of the system. The longer we work on it, the longer we work together as a team and the better we cooperate with other teams, the higher our efficiency should be.
It is quite easy to imagine that the team’s initial reaction to the outcomes was skeptical. After all, almost every developer has worked with Agile already, so “this must be wrong; it’s just different from what people usually take for granted without ever checking”…
It took us two more sprints to build up the team’s confidence that the results were correct. In the following sprints “the maths” settled in and sprint predictability improved.
The initial version of this article, with our spreadsheets used for estimations, was given as a proof-of-concept to a second team.
With the second team, we were able to improve the predictability even faster.
The results will be described in a separate article (which you can now find here).