Managing Risk and Uncertainty in Agile
Okay… let’s set a little context here. In my last post we talked about two different types of projects. The ones that are knowable and the ones that aren’t knowable. Projects where it makes sense to estimate and projects that are more like R&D investments where we are spending money to learn and discover. Today, I want to talk more about the first kind. The ones where we do have some idea of what we are building and the technical challenges that might be involved.
Getting clarity on what we are going to build and how we are going to build it isn’t easy. This is especially true when we have multiple competing stakeholders, no clear way to resolve priority conflicts, more to do than we can possibly get done, and the technology, while understandable, certainly isn’t trivial. In the face of this kind of ambiguity, I think that many of us have thrown up our hands and concluded that software isn’t estimateable and everything should be treated like R&D.
I think this is a mistake. If we keep pushing to treat everything like R&D, without understanding the delivery context we’re working within, the whole agile movement risks losing credibility with our executives. If you remember from our earlier conversations, most companies have predictive-convergent business models. We may want them to be adaptive-emergent, but they aren’t there yet. We can talk about how to move these folks, but for now we have to figure out how to commit.
Back in the day when I was just learning about agile, and immersed in the early thinking of Kent Beck, Alistair Cockburn, Mary Poppendieck, and Ken Schwaber… I came across this idea that the team had to commit to a goal or an increment of the product at the end of the iteration. There were some preconditions of course, the story had to be clear and understandable, we needed to have access to an onsite customer, and the team had to have everything it needed to deliver.
If the story wasn’t clear and understandable, there was this idea of a spike. I’ve always understood a spike to be some snippet of product we’d build to go learn something about what we wanted to do or how we were going to do it. This idea has expanded some over the past few years to include any work we bring into the sprint to do discovery or learn something we didn’t know before. It’s basically an investment the product owner makes to get clarity into the backlog.
Like most of you guys I’m sure, my planning approach was heavily influenced by Mike Cohn’s Agile Estimating and Planning book. Mike turned me on to Bill Wake’s INVEST model and I’ve used that as a tool for understanding and teaching good user stories ever since. The longer I use the INVEST model the more profound I think it is, but as widely accepted as this idea seems to be, I think there are a few under-appreciated letters in the model. The one most relevant to this discussion is the ‘E’.
E is for Estimateable
The idea behind INVEST is that these six attributes of a user story set the preconditions the Product Owner must meet before the story can be brought to the team. If they are not INVEST ready, they get deferred and we schedule a spike. The idea behind the ‘E’ specifically is that the user story must be well enough understood by the team that they know how to build it. The team has to be able to break down the user story enough to put detailed estimates on the tasks and be willing to commit.
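Here is a minimal sketch of that gate, just to make the rule concrete. The Story class, the checks dictionary, and the triage function are names I made up for illustration, and the yes/no checklist is an assumption about how a team might record its judgment, not something the INVEST model prescribes.

```python
from dataclasses import dataclass

# The six INVEST attributes from Bill Wake's model.
INVEST = ("independent", "negotiable", "valuable", "estimable", "small", "testable")


@dataclass
class Story:
    title: str
    # Hypothetical checklist: attribute name -> has the team agreed it is met?
    checks: dict


def triage(story):
    """Return ('commit', []) when every INVEST check passes.

    Otherwise return ('spike', missing) so the story is deferred and the
    missing attributes tell us what the spike needs to learn.
    """
    missing = [attr for attr in INVEST if not story.checks.get(attr, False)]
    if missing:
        return ("spike", missing)
    return ("commit", [])


# Example stories (invented for illustration).
ready = Story("Reset password by email", {attr: True for attr in INVEST})
fuzzy = Story(
    "Integrate with billing vendor",
    {**{attr: True for attr in INVEST}, "estimable": False},
)

print(triage(ready))
print(triage(fuzzy))
```

The point of returning the missing attributes is that a spike should have a question to answer. A story that fails only the 'E' needs a different kind of investigation than one that is simply too big.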
Remember when I said that almost every dysfunction on an agile team tracks back to the backlog? Well, here is the problem… most teams are accepting user stories into sprints that are not INVEST ready and are certainly not estimateable. If the team doesn’t have enough information to make a commitment they shouldn’t make a commitment. So… do we conclude from this that software isn’t estimateable or do we conclude that we collectively do a poor job of backlog grooming?
Because most of us work in dysfunctional organizations that don’t manage a roadmap, don’t stick to their vision, thrash around at the executive level, and generally can’t make up their mind… many of us have come to the conclusion that creating a backlog is waste. Why spend all that time writing up user stories when things are going to constantly change? This is indeed a problem, but giving up on planning is not the answer. Making up your backlog as you go isn’t the answer either.
When we make up the backlog as we go, everything the team encounters is going to be new to them. Everything they do is going to require a spike. The whole notion of Sprint Planning and Release Planning is predicated on the notion that we generally know the size of the backlog, we learn the velocity of the team, and based on those two variables we can begin to predict when we’ll be able to get done. If everything requires a spike, the backlog is unstable and indeterminate.
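The arithmetic behind that prediction is simple enough to sketch. The function name and the numbers below (a 240 point backlog, a velocity somewhere between 25 and 35 points per sprint) are made up for illustration, and the sketch assumes the backlog size stays put, which is exactly the assumption that breaks when everything needs a spike.

```python
import math


def sprints_remaining(backlog_points, velocity):
    """Sprints needed to burn down a fixed backlog at a steady velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity)


backlog = 240  # assumed size of the release backlog, in story points

# Forecast as a range, using a pessimistic, typical, and optimistic velocity.
for label, velocity in (("low", 25), ("typical", 30), ("high", 35)):
    print(label, sprints_remaining(backlog, velocity))
```

Giving the answer as a range rather than a single date is a design choice on my part. Velocity is something we learn over a few sprints, so a low and high bound is a more honest thing to hand to a stakeholder than one number.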
For every user story we attempt to build, we create at least one spike, and probably two to three more user stories. Even if we have a stable velocity, the scope of the release is increasing faster than we can burn down the backlog. We never end up with a stable view of what’s happening on the project. The team feels like it’s thrashing, the product owner feels like they’re thrashing, and the organization gets frustrated because the product isn’t getting out the door.
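A rough simulation shows why a stable velocity doesn’t save us here. This is a toy model under assumptions I’m making up: every point of work the team finishes uncovers some fraction of a point of new work, and the backlog size, velocity, and discovery ratios are illustrative numbers, not data from a real project.

```python
def sprints_to_finish(backlog, velocity, discovered_per_point_done, max_sprints=50):
    """Count sprints until the backlog is empty, or return None if it never empties.

    Assumption: finishing `done` points uncovers
    int(done * discovered_per_point_done) points of new work.
    """
    for sprint in range(1, max_sprints + 1):
        done = min(velocity, backlog)
        discovered = int(done * discovered_per_point_done)
        backlog = backlog - done + discovered
        if backlog <= 0:
            return sprint
    return None  # still not done after max_sprints


# Same backlog and velocity, different amounts of discovery along the way.
for ratio in (0.0, 0.5, 1.0, 2.0):
    print(ratio, sprints_to_finish(240, 30, ratio))
```

With no discovery the release lands when the simple forecast says it will. Once the team discovers at least as much work as it finishes (a ratio of 1.0 or more in this model), the function returns None because the backlog never empties, no matter how steady the velocity is.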
Dealing with Backlog Uncertainty
To solve this problem… to get better at making and meeting commitments… even at just the sprint level… we have to plan a sprint or two ahead of the increment we intend to commit. That means if we want to get better at nailing sprint commitments, we need to have the backlog properly groomed BEFORE the sprint planning meeting. If there is spike work to be done, that needs to be identified and done BEFORE the sprint where the user story is going to be built.
The implication here is that the team (in any given sprint) is going to have some capacity allocated toward the work of the sprint and some capacity reserved for preparing for the upcoming sprints. This could be as simple as having a meeting or two every sprint to help the PO groom the backlog, look ahead on the backlog, ask questions, and give guidance. Sometimes it’s more ad-hoc and sometimes it’s more formal. Some teams track this work, some just allow it to lower velocity.
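For teams that do want to track it, the split can be as simple as the sketch below. The function name is hypothetical, and the 15% look-ahead reservation is purely an assumption for the example; I’m not recommending a specific number here.

```python
def split_capacity(team_capacity, lookahead_fraction):
    """Split a sprint's capacity between committed work and look-ahead work.

    team_capacity: whatever unit the team plans in (points or hours).
    lookahead_fraction: assumed share reserved for grooming and spikes
    that prepare upcoming sprints.
    """
    if not 0 <= lookahead_fraction < 1:
        raise ValueError("lookahead_fraction must be in [0, 1)")
    lookahead = round(team_capacity * lookahead_fraction)
    return {"sprint_work": team_capacity - lookahead, "lookahead": lookahead}


# Illustrative numbers only: 40 units of capacity, 15% held back for look-ahead.
print(split_capacity(40, 0.15))
```

A team that just lets the grooming work lower its velocity is doing the same thing implicitly. Making the split explicit only helps if someone actually needs to see where the capacity went.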
Either way, allowing some slice of capacity for preparing for the upcoming sprints is an essential element to begin stabilizing delivery, agree? If you are with me so far… I’d suggest that stabilizing a release follows the same rules we just applied for stabilizing sprints. Don’t commit to anything in a release that the team doesn’t understand or know how to build. If there is a ton of risk and uncertainty coming out of release planning, Scrum isn’t necessarily going to help us deliver reliably against the release objectives.
So how do we get this level of clarity on the release level backlog? If we apply the same concept at the release level we applied at the sprint level, we’d have to suggest that the team has to do spike work BEFORE they get to the release planning event. That means that we need to have some idea of what’s in the upcoming release, and the unknowns associated with that release, BEFORE the current release is even done. If we are doing 3 month releases, I’m looking for high-level planning somewhere between 3 and 6 months out.
That’s NOT Agile!?
And here is what I’ll inevitably get when I make this point with some folks. Hey Mike… that’s not agile!? That sounds like Waterfall. You’re suggesting that I have to plan 3 to 6 months out, at the detailed user story level, in order to stabilize delivery and make and meet commitments at the release level? Just to be clear, that is EXACTLY what I am saying. And that raises the question… what is agile? What exactly does it mean to be agile at the corporate level?
In the context of a predictive-convergent company, one that is doing projects that are not R&D, where it is reasonable to understand requirements, and the technology is not totally unknowable… yes, it is perfectly reasonable and advisable to start looking at your backlog 3 to 6 months in advance. Maybe not at the finest level of granularity, but we need enough understanding of what we are going to build, and how we are going to build it, to sufficiently groom the backlog and mitigate risk.
Corporate agility comes not from the practices of Scrum, or from making your backlog up as you go, but from creating the ability to change your mind as you learn new information. If I plan ahead, sure I may create some waste and incur some carrying cost due to a longer backlog, but because I am building complete features in short time-boxes, and working toward potentially shippable code every two weeks, I make it easier to change direction. Some level of forward planning is the cost of making and meeting commitments.
I’d go so far as to say that in a larger organization, changing your mind every two weeks is the equivalent of thrashing.
Most companies need to be able to change their mind every quarter, some only every six months. Most do not need the ability to change their mind every day. In this case, if we can stabilize the roadmap, get some basic governance, apply some Lean/Kanban based program and portfolio management, and disciplined release management… we can get a stable, well-groomed, risk-adjusted backlog. And it is possible to estimate and make and meet commitments.
We do it with our clients all the time.
Building My House With Agile?
And this is why I told you the story about building my house. I think the metaphor holds well in companies that are trying to deliver in the predictive-convergent problem space.
1. We routinely recommend a 12-18 month roadmap level plan. Initiatives are broken into 2-3 month increments that can ideally be delivered within a single release. The roadmap isn’t just about business goals, it also has architectural guidance and maybe even high level UX. It’s enough to understand what we know and what we don’t know and provide budgetary estimates that should be close enough to reality such that they establish reasonable constraints. This is analogous to the time I spent with the builder coming up with the architectural design, feature list, and budgetary estimates for my home.
2. We routinely recommend a 3-6 month feature level plan. This helps us get clarity around what we can actually build in the current release and what’s coming in the next release so we can start grooming the backlog and mitigating risk. For me, there is no hard rule for a feature level breakdown. Of course I’d like the features to be smaller than the roadmap level initiatives, but they are going to be much bigger than user stories. I like to allow them to span sprints, so somewhere around 2-4 weeks seems right to me. Having this view is analogous to a construction schedule that lays out the foundations, walls, roof, landscaping, etc. and puts the key dates on a calendar. We still haven’t made all the fine grained decisions, but the project is starting to come into focus.
3. We routinely recommend a 3 month rolling backlog of fine grained user stories. I’m not saying that the user stories have to be 100% sprint ready, but they should be risk mitigated and pretty darn small. Definitely smaller than a sprint and confined to a single team. They should have a clear definition of done and some acceptance criteria, but as we learn, we may split them up more, hopefully finding ways to leave stuff out and focus on just the minimally marketable part of the requirement. This is analogous in my house building example to picking carpet color, paint color, the exact kind of hardwood and tile, and the placement of the bushes in the front yard.
Context, Context, Context
If we are building stuff that is unknowable… sure let’s not plan or estimate or commit to anything. But let’s also be clear with our stakeholders what they are investing in. They are putting significant dollars at risk hoping to find a solution to a problem that may or may not exist. That is a perfectly valid way to spend money and allocate investment as long as we are all in agreement that is what we are doing. If the stakeholders think they’ve got a guarantee, that is a problem.
If we are building stuff that is knowable… we need to have a plan for road-mapping the product, progressively breaking things down into smaller and smaller pieces, mitigating risk, planning forward, establishing velocity, collaborating to converge on desired outcomes, making tradeoffs, communicating progress and reporting status. It’s not that things will never change, and maybe even our high level planning is off, or maybe we see a risk we didn’t anticipate, but at least we have a baseline to communicate how what we learned may impact the project.
If we are building stuff that is knowable… but we don’t have the organizational structure, governance, metrics, discipline, prioritization, or whatever… and it just FEELS like the work is unknowable… putting these projects in the unknowable category diminishes our credibility. Executives know this stuff is knowable. You’d be better off calling out the stuff that is broken in the organization and working with the organization to fix it. Scrum calls these organizational impediments. Remove them.
In my opinion, regardless of whether you are a consultant or an internal employee, people appreciate a thoughtful consideration of their particular context and a willingness to adapt your point of view to help them solve the problems they are really trying to solve. We started this thread with the notion that many in the agile community are solving the wrong problem. Even if it’s the right problem, it’s not the one that executives are trying to solve.
There is so much goodness coming out of the agile community right now. So much forward thinking and so many new ideas. Unfortunately we also see a ton of dogmatism and a profound lack of understanding about the real problems executives are trying to solve. I think we have had, and continue to have, the opportunity as a community to make a profound impact on the way companies operate and in the lives of the people working in them.
The early adopter ship has long since sailed. We are probably on the tail end of the early majority. That leaves us with the late majority and laggards just coming to the party. Maybe that’s what’s driving some of my consternation here. Maybe it’s time we start figuring out how to talk to the folks coming late to the party. Maybe it’s time to focus on how to help these folks adopt agile. For these folks it’s less about defining the end state and more about learning how to get there.
NOTE: This series has been an interesting exploration for me. We have been evolving our transformation model over the past year or so, and writing about this stuff has given me a level of clarity and understanding I didn’t have before. I’m actually talking about stuff differently than I was even a few weeks ago. I made some connections that I hadn’t made previously.
This post ended up being a good setup for going back and reevaluating some of my first few posts and putting them in a slightly different context. I think what I’ll do is recap some of the earlier stuff in the new context and then build on the house stuff, planning and risk stuff, to start talking about governance and road mapping. The next few weeks are going to be a little crazy, so wish me luck actually finding time to write. Thanks for reading.
Check out the previous post: Understanding Risk in Your Project Portfolio
It touches a number of questions a CEO asked me yesterday about planning and commitment in agile.
I am glad to see my answers were close to what you propose here, and I have learned new things to think about.
Just a small remark:
I was told ‘backlog grooming’ has become ‘backlog refinement’ because of a negative meaning of the word.
This series of 14 articles (if I counted right) is the best stuff I have read on scaling agile for a long time. Thanks a lot for sharing your thoughts and experience.
I believe this article can be summarized as:
1. Have a definition of READY, and only put READY items into a Sprint Backlog.
2. Refine the Product Backlog regularly, not during Sprint planning meetings. (People really do that?)
And since the first sprint can’t begin without any READY PBIs, the Product Backlog has to be refined first until there’s enough work for a sprint to begin (preferably more than one).