Here are some useful things for agile software development teams to measure in order to understand how effective they are (and how well their current project or initiative is tracking).
When it comes to specific measures pertinent to delivering high-quality working software that is valuable to both customer and business (which, at the end of the day, should be the goal of any software development team, agile or otherwise), it is useful to differentiate between “process” metrics and “output” metrics.
Process metrics are those which the team is directly in control of, i.e. they form part of the team’s process (how they work), rather than an outcome of their process. In other words, a team’s process, and its corresponding metrics, can be used as leading indicators of delivery outcomes.
Here are some examples:
Frequency of commits
How often production code is checked into source control
Frequency of deploys
How often production code is put on a shared server
If production code (i.e. fully tested, integrated, working software) is not committed and deployed frequently, there is a higher risk of conflicts, bugs and deployment issues.
Frequency of releases
How often features are made available to users
If the evolving product is not put in real users’ hands frequently, there is a higher risk of building the wrong thing due to reduced feedback, big changes/surprises for users and lots of big bang rework.
Definition of Ready
Process steps, checks and outcomes required before a work item is deemed “ready for development”
If there is a loose (or no) agreement on when a story is ready to be worked on by the development team, there is a higher risk that there is not a shared understanding with the customer about what should be built, and thus the wrong thing might be built, or crucial business rules/acceptance criteria missed.
Definition of Done
Process steps, checks and outcomes required before implementation of a work item is deemed “done”, representing a shared team standard and understanding of what is deemed to be “production quality”
If there is a loose (or no) agreement on the steps in a team’s process through to when a story is considered “done”, there is a higher risk that the scope of the story will be wrong, or expand during development, or crucial quality checkpoints (such as code review, refactoring or documentation) will be missed.
Backlog
Number and scale of work items requiring delivery on a team’s backlog
If the backlog is treated as a queue rather than a list of options, a large backlog is a leading indicator of a slow delivery rate (lead time) for new ideas and overall objectives.
WIP (work in progress/process)
The number and scale of work items which have been started (in some sense) and are thus deemed “in progress”
An increase in WIP is a leading indicator for a possible slowdown in cycle time and/or reduction in quality (if the WIP puts the team over their sustainable capacity). High WIP causes costly context switching and poor flow. See Little’s Law.
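The relationship between WIP and cycle time can be sketched with Little’s Law (average cycle time = average WIP / average throughput). This is a minimal illustration, not a forecasting tool: the function name and the 5-day working week are assumptions made for the example, and the law only holds for averages over a reasonably stable process.

```python
def average_cycle_time_days(wip: float, throughput_per_week: float,
                            working_days_per_week: int = 5) -> float:
    """Little's Law: average cycle time = average WIP / average throughput.

    The 5-day working week is an assumption used to convert weeks to days.
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    weeks = wip / throughput_per_week
    return weeks * working_days_per_week


# Same team, same throughput of 4 stories per week, two different WIP levels.
print(average_cycle_time_days(wip=4, throughput_per_week=4))   # 5.0
print(average_cycle_time_days(wip=12, throughput_per_week=4))  # 15.0
```

With throughput unchanged, tripling WIP triples the average time each story spends in progress.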
TDD (test driven development)
An Extreme Programming code design and writing practice where a deliberately failing micro-test is written before the production code which is needed to make the test pass, followed by refactoring to ensure the design is clean
If developers are not doing TDD, and thus not writing tests before production code, there is a risk that not all production code will have tests, and thus there is a higher risk of developers unwittingly committing breaking changes and thus defects escaping to customers.
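As a rough illustration of the test-first rhythm described above, here is one tiny red/green/refactor cycle. The function `weekly_throughput` and its test are hypothetical names invented for this sketch; a real team would normally use a test runner rather than calling the test by hand.

```python
# Step 1 (red): write the micro-test first. Running it at this point would
# fail, because weekly_throughput does not exist yet.
def test_throughput_is_stories_done_per_week():
    assert weekly_throughput(stories_done=12, weeks_elapsed=4) == 3.0


# Step 2 (green): write just enough production code to make the test pass.
def weekly_throughput(stories_done: int, weeks_elapsed: int) -> float:
    return stories_done / weeks_elapsed


# Step 3 (refactor): tidy the design while the test stays green, then re-run it.
test_throughput_is_stories_done_per_week()
```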
BDD (behaviour driven development) / ATDD (acceptance test driven development)
A development approach where programmers, testers, analysts and designers collaborate directly with customers to build a shared understanding of their needs, and then drive the implementation of a solution by starting with one or more customer-agreed “acceptance tests”
If the team is not writing executable specifications collaboratively with business/customer reps, there is a higher risk that the delivered solution will not do what the customer wants or needs it to do.
Output metrics are those which are outputs, or outcomes, of the team’s process, i.e. the team cannot directly control them, only try to influence them in a desired direction via changes to their process.
As such, output metrics are lagging indicators, which is one of the reasons why releasing working software to customers frequently is so important (it reduces the lag, giving the team a far better understanding of whether they are delivering the right things, and building them in the right way).
Customer usage and satisfaction
Arguably the most important indicator of a development team’s success is that customers frequently use the delivered product or service and derive value from it (i.e. it meets their needs). So teams need to talk to customers regularly, and measure usage and satisfaction, in order to understand whether they are serving their customers well.
Defects found (by customers or internal stakeholders) on “done” software, i.e. those which have “escaped” the team’s current iteration or sprint cycle
A large number of defects, particularly high severity ones found by/impacting customers, often points to poor quality, as well as reduced capacity of the team to deliver value.
Measuring the total number of defects, along with defect rates (e.g. how many new ones are being raised, how much time the team is spending on fixing them, etc.) helps the team to understand if there is a general problem with quality in the system, and whether the problem is improving.
Critical production incidents
Incidents such as severity 1 defects and production outages that cause expedited team action to rectify the problem
Firefighting activity, like defects, often points to poor quality, as well as reduced capacity of the team to deliver value.
Teams should seek to understand the amount of capacity spent on these incidents, and address the root causes of the fires so they can prevent them happening again (reliability / resilience), or reduce the impact if they do happen again (recoverability).
Throughput
Number of stories delivered to “done” per week on average
Throughput is a useful measure for the team to understand how well deliverables are flowing through their process into production, and (in conjunction with variance) to forecast (and share with stakeholders) possible release outcomes.
Throughput (stories per week) = Stories “Done” / Weeks elapsed
It should be borne in mind that throughput should represent the delivery of actual business value as closely as possible. Until working capabilities are in the customer’s hands, any throughput measure is only a proxy for real throughput.
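The throughput formula above can be sketched as follows. The function name, the use of calendar dates and the sample data are all assumptions for illustration; any tool that records when a story reached “done” would serve the same purpose.

```python
from datetime import date


def throughput_per_week(done_dates: list[date], start: date, end: date) -> float:
    """Stories reaching "done" per week over the inclusive period start..end."""
    weeks_elapsed = ((end - start).days + 1) / 7
    done_in_period = sum(1 for d in done_dates if start <= d <= end)
    return done_in_period / weeks_elapsed


# Hypothetical "done" dates for six stories over a four-week period.
done = [date(2024, 1, 3), date(2024, 1, 5), date(2024, 1, 10),
        date(2024, 1, 17), date(2024, 1, 18), date(2024, 1, 26)]
print(throughput_per_week(done, date(2024, 1, 1), date(2024, 1, 28)))  # 1.5
```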
Throughput variance
Volatility of weekly throughput values around the mean (measured as standard deviation)
Throughput variance is a useful measure for the team to understand if:
It also makes uncertainty transparent, and reminds folks that forecasting is about identifying multiple possible outcomes, not only one.
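One possible way to turn throughput history into a range of outcomes rather than a single date is sketched below. Using the mean plus or minus one sample standard deviation as the “optimistic” and “pessimistic” rates is an assumption made for this example, not a recommendation from a particular method, and the function names are hypothetical.

```python
from statistics import mean, stdev


def throughput_spread(weekly_throughput: list[int]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of weekly throughput."""
    return mean(weekly_throughput), stdev(weekly_throughput)


def weeks_to_finish(remaining: int, weekly_throughput: list[int]) -> dict[str, float]:
    """Forecast several possible outcomes for delivering `remaining` stories."""
    avg, sd = throughput_spread(weekly_throughput)
    forecast = {
        "optimistic": remaining / (avg + sd),
        "likely": remaining / avg,
    }
    # Only report a pessimistic figure when the slow rate is still positive.
    if avg - sd > 0:
        forecast["pessimistic"] = remaining / (avg - sd)
    return forecast


history = [3, 5, 4, 6, 2]  # hypothetical stories "done" in each of five weeks
for outcome, weeks in weeks_to_finish(20, history).items():
    print(f"{outcome}: {weeks:.1f} weeks")
```

The wider the gap between the optimistic and pessimistic figures, the more volatile the team’s throughput has been.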
Cycle time
Elapsed time for a story to move from one part of the process to another, typically measuring the time in progress with the team
Cycle time is helpful for understanding how long a story typically takes to deliver from start to finish.
It is useful to measure real cycle time per story (easily done using sticky dots at daily stand-up) or calculate averages using WIP and throughput (Little’s Law).
Unlike effort, cycle time incorporates wait times, making it handy for finding unnecessary delays in the process.
Cycle time (days) = “Done” date - Start date
Average cycle time (days) = WIP / Throughput (stories per day)
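Measuring real cycle time per story only needs a start date and a “done” date for each story. The sketch below uses hypothetical dates and counts calendar days, so weekends are included as wait time; a team may prefer to count working days instead.

```python
from datetime import date


def cycle_time_days(start: date, done: date) -> int:
    """Cycle time (days) = "Done" date - Start date, in calendar days."""
    return (done - start).days


# Hypothetical (start, done) dates for three stories.
stories = [
    (date(2024, 3, 4), date(2024, 3, 7)),
    (date(2024, 3, 5), date(2024, 3, 12)),
    (date(2024, 3, 11), date(2024, 3, 13)),
]
cycle_times = [cycle_time_days(start, done) for start, done in stories]
print(cycle_times)                          # [3, 7, 2]
print(sum(cycle_times) / len(cycle_times))  # 4.0
```

Looking at the individual values as well as the average helps to spot stories that sat waiting far longer than the rest.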
Lead time
Elapsed time from when a story is added to the backlog to when it is released to users
Slow lead time means that the overall speed to market (concept to cash) is slow, even if the team’s delivery rate (cycle time) is fast. This is typically caused by long queues (backlogs).
Average lead time (days) = (WIP + Queue) / Throughput (stories per day)
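To see how the queue dominates lead time, here is a small sketch of the calculation. It assumes throughput is recorded in stories per week and converts to days using a 5-day working week; the function name and the numbers are illustrative only.

```python
def average_lead_time_days(wip: int, queue: int, throughput_per_week: float,
                           working_days_per_week: int = 5) -> float:
    """Average lead time = (WIP + queue) / throughput, converted to days."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return (wip + queue) / throughput_per_week * working_days_per_week


# A team with 5 stories in progress delivering 4 stories per week.
print(average_lead_time_days(wip=5, queue=0, throughput_per_week=4))   # 6.25
print(average_lead_time_days(wip=5, queue=35, throughput_per_week=4))  # 50.0
```

The team’s own delivery rate is identical in both cases; the difference in lead time comes entirely from the 35 stories waiting in the backlog.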
Takt time
The average elapsed time between stories being delivered (deployed or released)
If the time between stories being delivered is high or volatile, it is an indicator that the team has poor flow.
Takt time = Date/time story delivered - Date/time previous story delivered
Average takt time (days) = Number of days (sprint length) / Stories “Done”
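The per-story takt time formula above can be computed from delivery timestamps as in this sketch. The timestamps are hypothetical and the function name is an assumption for the example.

```python
from datetime import datetime


def takt_times_days(delivered: list[datetime]) -> list[float]:
    """Gaps, in days, between each delivery and the previous one."""
    ordered = sorted(delivered)
    return [(later - earlier).total_seconds() / 86400
            for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical delivery timestamps for four stories.
deliveries = [
    datetime(2024, 3, 4, 9, 0),
    datetime(2024, 3, 5, 9, 0),
    datetime(2024, 3, 8, 9, 0),
    datetime(2024, 3, 8, 21, 0),
]
gaps = takt_times_days(deliveries)
print(gaps)                   # [1.0, 3.0, 0.5]
print(sum(gaps) / len(gaps))  # 1.5
```

An average of 1.5 days hides the fact that the gaps range from half a day to three days, which is the kind of volatility that points to poor flow.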
Required behaviours of a feature that were incorrectly specified or are missing
Missing or defective requirements cause disruption, re-work and delay in development. Measure how often stories have acceptance criteria changed, or added to, during development.
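A simple way to track this is the proportion of stories whose acceptance criteria changed after development started. The sketch below assumes each story record carries a count of such changes; the field name `criteria_changes` and the sample data are hypothetical.

```python
def criteria_change_rate(stories: list[dict]) -> float:
    """Fraction of stories whose acceptance criteria changed during development."""
    if not stories:
        return 0.0
    changed = sum(1 for story in stories if story["criteria_changes"] > 0)
    return changed / len(stories)


# Hypothetical stories from one iteration.
iteration = [
    {"id": "A", "criteria_changes": 0},
    {"id": "B", "criteria_changes": 2},
    {"id": "C", "criteria_changes": 0},
    {"id": "D", "criteria_changes": 0},
]
print(criteria_change_rate(iteration))  # 0.25
```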
What are some other helpful metrics you have used in your agile software development teams?
If you have any questions about any of the concepts in this article, or need a hand with your service design, lean/agile product development or agile transformation endeavours, please reach out directly to me or my company Hypothesis.