Agile Metrics: Agile Health Metrics for Predictability

LeadingAgile uses Agile Metrics to demonstrate the results of our process improvement efforts and to identify areas that need further improvement. We have many internal documents describing our approach that we share with our clients, but to my surprise, it seems that we have never blogged about it. Here is a high-level view of the agile metrics we often start with.

When deciding what to measure, the place to start is with a goal. First, ask yourself what outcomes you are after: your goals. Then consider what is needed to meet those goals. Finally, ask what agile metrics indicate whether you have what you need. You may recognize this as the Goal-Question-Metric approach.

Our clients tend to care about predictability, early ROI, improved quality, or lower cost. Predictability seems to be paramount. They want teams to get good at making and keeping promises, consistently delivering working, tested, remediated code at the end of each sprint. A team that is not predictable isn’t “bad” – it just isn’t predictable. Without stable, predictable teams, we can’t have stable, predictable programs, particularly when there are multiple dependencies between teams.

This post focuses on agile metrics for predictability. The goal, then, is:

Teams can plan, coordinate, and deliver predictably enough to make a release-level commitment.

Here’s how we break that down. Does the team:

  • Deliver the functionality it intended each sprint?
  • Frequently deliver working, tested, remediated code?
  • Have everything it needs each sprint to perform the work?
  • Have confidence it will deliver the functionality expected for the release?
  • Have an established, stable velocity?

We answer these questions with the following agile metrics:

Agile Metrics: Story and Point Completion Ratio

  • Number of Committed Stories Delivered / Number of Committed Stories
  • Number of Committed Points Delivered / Number of Committed Points

This metric helps teams become predictable in their estimating and sprint planning. It encourages smaller stories and more effort in getting work ready prior to the sprint. We like to see delivered points and stories be within 10% of the commitment.
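
For illustration, here’s a minimal sketch of the calculation in Python; the function name and sample numbers are hypothetical:

  def completion_ratios(committed_stories, delivered_stories,
                        committed_points, delivered_points):
      """Return (story completion ratio, point completion ratio) for one sprint."""
      return (delivered_stories / committed_stories,
              delivered_points / committed_points)

  # Example: 9 of 10 committed stories and 23 of 25 committed points delivered.
  story_ratio, point_ratio = completion_ratios(10, 9, 25, 23)
  print(f"Stories: {story_ratio:.0%}, Points: {point_ratio:.0%}")  # Stories: 90%, Points: 92%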

Agile Metrics: Velocity and Throughput Variation

  • Recent Velocity / Average Velocity
  • Recent Throughput / Average Throughput

This metric helps teams become stable in their performance. It encourages managing risks and dependencies ahead of the sprints and not overcommitting within the sprint. We like to see recent velocity be within 20% of average. We also want to see a reduction in the standard deviation of the velocity over time.
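
A similar sketch for the variation check; treating the last three sprints as “recent” is an arbitrary choice, and the velocities are hypothetical:

  from statistics import mean, stdev

  velocities = [18, 22, 20, 19, 24, 21]  # points per sprint

  variation = mean(velocities[-3:]) / mean(velocities)   # recent / average
  print(f"Recent/Average velocity: {variation:.2f}")     # want within 0.8-1.2
  print(f"Velocity std dev: {stdev(velocities):.1f}")    # want this shrinking over time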

Agile Metrics: Lead Time

  • WIP to Throughput Ratio

Building a large inventory of untested code typically increases the costs and time associated with fixing defects. This, in turn, increases the costs and challenges associated with version control, dependency management, and the delivery of working, tested, remediated code. Our objective is to improve lead time and to deliver frequently. There should not be more than four weeks’ worth of throughput active in a team from Ready to Delivered. Less is better. We like to see two weeks or less.
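
Here’s a rough sketch of that check in Python, with hypothetical numbers:

  wip = 12          # stories between Ready and Delivered
  throughput = 5    # stories delivered per week, on average

  weeks_of_wip = wip / throughput
  print(f"{weeks_of_wip:.1f} weeks of WIP")  # 2.4 weeks

  # Thresholds from this post: no more than four weeks; two or less preferred.
  if weeks_of_wip > 4:
      print("Too much untested inventory in process")
  elif weeks_of_wip > 2:
      print("Acceptable, but look for ways to shorten lead time")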

Agile Metrics: Team-member Availability Ratio

  • Headcount available / Headcount expected

We need an indication when planned team members aren’t available. Stability is critical for teams to be able to make and keep release commitments. When people are pulled across multiple teams – or are not available as planned – it is unlikely that the team will be able to deliver predictably. We like to see this be within 10% of the plan.

Agile Metrics: Release Confidence

Use the team’s insight and record of performance to evaluate its confidence that the release objectives can be achieved. This metric is useful for planning and commitment purposes. Release Confidence is a consensus vote where 1 is no confidence and 5 is very confident. If a team has heavy dependencies, it should include a vote from the Agile Project Manager of the team handling the dependencies. If the team is missing a skill or a role is unfilled, the team should take into account the likely impact on release success. Support this metric with a release burn-up.
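
As a rough illustration, here’s one way to tally such a vote in Python; treating the consensus as the lowest vote is a simplification of what is really a facilitated team discussion:

  def release_confidence(team_votes, dependency_apm_vote=None):
      """Votes use the 1-5 scale above; the lowest vote drives the conversation."""
      votes = list(team_votes)
      if dependency_apm_vote is not None:  # team has heavy dependencies
          votes.append(dependency_apm_vote)
      return min(votes)

  confidence = release_confidence([4, 5, 3], dependency_apm_vote=2)
  print(f"Release confidence: {confidence}/5")  # 2/5: revisit scope or dependencies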

Other Agile Metrics

That’s just a taste of the agile metrics we use for predictability. We also use quality indicators like build frequency, broken builds, code coverage, defect rates, and technical debt. Likewise, for Product Owners, we are interested in things like major initiatives, features remaining, features released, length of the release cycle, and more. And for value, we are interested in things like time to value.

Used responsibly, agile metrics provide insight across the organization into its ability to meet expectations. They help establish a shared understanding of each team’s capabilities and guide improvement efforts.

Comments

  1. Dave Speck

    Can you provide an example of how WIP and Throughput Ratio are calculated? Thanks.

  2. Andrew Fuqua

    Hey Dave. Thanks for the question. Metrics sound simple on paper, but putting them into practice requires some thought. And you must change them over time.

    We’re clearly talking about Little’s law here, so the simple answer to your question may be:
    Lead Time = WIP / Throughput
    Lead Time = 2 stories / 5 stories per week = .4 weeks
    Lead Time = 20 story points / 20 story points per sprint = 1 sprint

    For an iterative process, Throughput is average Velocity (for the unit of iteration time), and WIP is the number of Points in the Stories not yet Delivered. You could alternatively just use a count of stories: on average, how many stories do we deliver every n weeks, and how many stories have we started but are not yet delivered?
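
    In Python, a minimal sketch of that arithmetic:

    def lead_time(wip, throughput):
        """Little's law: lead time = WIP / throughput (units must match)."""
        return wip / throughput

    print(lead_time(2, 5))    # 2 stories / 5 stories per week   = 0.4 weeks
    print(lead_time(20, 20))  # 20 points / 20 points per sprint = 1.0 sprints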

    Your specific situation affects how you use this metric, and remember that we’re measuring to get information so that we can make wise improvements — not for rewards or as some absolute goal. I’ll give some examples…

    Suppose we have a well-oiled Scrum team of 8 people: max WIP during their sprints of 4 stories started but not done, no carry-over to the next sprint, 16 stories in the sprint backlog, and an average velocity of 16. This team doesn’t need this metric to inform them about their in-sprint WIP.

    With a very mature team that is able to deploy to production frequently, you may be more interested in the amount of un-deployed work (with un-deployed work as your WIP, and deployment as your definition of Delivered). If my throughput of deployed stories is 5 per week, but I have 30 stories in process (that is, undelivered) and the trend is stable or worsening, then maybe I should investigate how to decrease WIP, increase throughput, and shorten lead time.

    However, consider a struggling team in which test can’t keep up with development; perhaps QA can do 10 stories/week but dev can do 11. At the end of week 1 they have 1 untested story and a throughput of 10 stories/week (fully tested): a little WIP hanging out compared to their throughput. But by the end of week 10 that carryover has grown to 10 stories, and their throughput (through QA) is still 10. The trend was bad from the start, and I could see it from week 2. That’s why I always look at the trends of these metrics, not just the absolute number. If week 10 is the end of the release and we have 10 stories not tested, we could catch up pretty quickly if we stop developing for a bit, increase swarming, change how we test, or whatever. But if I’ve got 4 weeks’ worth of un-tested stuff, then I’ve got a much bigger problem.
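
    Here’s a tiny simulation of that trend, using the hypothetical rates above:

    dev_rate, qa_rate = 11, 10   # dev finishes 11 stories/week, QA tests 10
    untested = 0
    for week in range(1, 11):
        untested += dev_rate - qa_rate  # carryover grows by one story each week
        print(f"Week {week}: throughput {qa_rate}/wk, {untested} untested")
    # By week 10 there are 10 untested stories, and the bad trend was
    # already visible at week 2.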

    In that situation, my definition of Delivered may just be tested stories, even if they’re not yet packaged into a release or deployed into production. Once that team solves the current problem and starts deploying to production frequently, they’ll need to change this metric.

    I see the same kind of dev-outrunning-QA situation in iterative teams as well. When doing the math for an iterative team, you need to know whether the team is counting the un-tested stories in their velocity. They /shouldn’t/, but what they /should do/ is irrelevant if you are going to re-use their measurement. What they /actually count/ matters.

    Does that help?

  3. Dave Speck

    Yes, thank you.

  4. Paul Boos (@paul_boos)

    There are some important metrics being left out of this set; these metrics are only as good as the actual reliability of the software being produced. The only metric related to reliability mentioned above is the consensus vote, which is subjective. While this is somewhat useful, I would hope a team is using objective measures to base it on, and if so, those could be used rather than a subjective team consensus.

    Test coverage and the number of tests passing within that coverage (along with the criticality of each test) would take a team a step in that direction. Understanding cyclomatic complexity would provide a useful signal of possible future problems.

    I’d hypothesize that without these (and probably others; this is only about 5 minutes of thought) any predictability metric would be meaningless. In fact, I imagine one could find some ratio of features being produced to some aggregate of these code-health metrics that could be monitored like an EKG; the periodicity could be watched to determine when to release.

    Paul

  5. Andrew Fuqua

    Hi Paul. Thanks so much for reading and commenting. I’m honored!

    By reliability, I assume you mean, for example, that the software does what it is supposed to do without corrupting data or crashing; that it is of sufficient quality. In my second-to-last paragraph I state the need for quality metrics.

    Regarding your premise that “these metrics are only as good as the actual reliability of the software being produced”: If my team cannot meet their sprint commitments because they don’t have a stable team, or they get interrupted, or they are accepting ambiguous work into the sprint, or they are writing poor-quality (unreliable) software, then the team isn’t going to be predictable and the project isn’t going to be predictable. These metrics will help us see that. The metrics in this post are about project predictability. That’s their value, particularly in a context in which teams aren’t yet where they should be. If you have well-running agile organizations with predictable teams, then, no, these metrics wouldn’t be useful. They are incredibly useful where you don’t have that.

    I was surprised that you said the release confidence consensus vote is related to (software) reliability. When I introduce this metric to an organization, I explain that it’s about their confidence in their ability to predictably meet their release date or project commitments. Some of the objective measures that inform this are the story completion ratio (can the team meet its sprint commitments), team-member availability (are people getting moved around), the velocity variation, the velocity graph, and the release and sprint burndown charts. That informs our gut. Other things inform our gut as well; for example, I might think one of our critical teammates is a flight risk. We read those objective charts, interpret them, consider other things we know, suspect, or feel, and make a call: are we going to make it? Your gut will tell you yes/likely or not likely. That’s why we include the objective measures and ask the subjective question.

    Perhaps I missed your point, Paul?

    andrew

  6. Paul Boos (@paul_boos)

    I see I misread that and took the word commitment as something being asked when preparing to release, not something that sounds like release planning. So you can attribute that to misunderstanding (and not enough caffeine).

    My main point is that I think you can do away with these project-predictability metrics if you can tie them to quality metrics and let the quality metrics decide when you release, which won’t be effective unless you are working really small. The larger the batch size, the more need for predictability metrics, and the more decoupled any predictability metrics become from the quality metrics.

    Cheers,
    Paul

  7. Andrew Fuqua

    Ah, yes, I totally get the misunderstanding now. Right… by release commitment confidence, I’m talking about the planned release that may be a couple of months away. And yes, I’m talking about large organizations that are just beginning their transition to more adaptive planning approaches, that still have largish releases and don’t yet have the ability to frequently test and release in small batches.

    Thanks again for the discussion, Paul.

  8. Stephen Wu

    With all due respect, Number of Committed Points Delivered / Number of Committed Points within 10% is a dangerous metric. Many authors have written against this practice, for many reasons: the metric encourages dysfunctional behavior (e.g. doing work in “shadow”, or over-estimating to be on the safe side); it becomes more of a management tool than a team’s tool for self-improvement (one of the reasons for the burgeoning revolt against Agile as better for managers than for developers); it focuses the goal on showing good numbers instead of on the sprint goal; and 10% is an arbitrary number (I’ve seen 15% and 20% too). I would challenge the notion that a team that consistently comes within 10% is more effective than a team that comes within 20% of its prediction/commitment.

  9. Dan

    Good article, but I am in the camp that agile and release predictability is a myth, even a lie. It only gives executives or others in leadership something to latch onto, real or not. In reality, they should focus on product quality and the happiness of the customer. They say they do, but only after the fact. I am personally against any metric.
