When I define what a healthy Software Engineering Culture of Quality looks like, I think of a team that cares about both the external quality (the value we bring to the customer) and the internal quality (the value we bring to other developers) of the product being built today, weighed against how long that product will be used in the future. Taking all of these facets into account, teams with a great Culture of Quality ensure that decisions, discussions, and actions are made appropriately for their current situation.

One of the ways we can build a Culture of Quality is by measuring what matters. I’ve identified 3 areas that can be summarized in a Quality Score Card for Software Engineering Teams.

Externally Reported Defects

Within this category, I’ve found that there are typically 2 types of defects reported externally (by our customers). The first type is issues caused by recently released code (within the last 3 months). The second type is issues that have been in the system for longer than 3 months, potentially ever since the software was created. I treat each of these issue types differently.

Externally Reported Recently Released

Recently released issues are linked back to the team that worked on the feature, then planned and prioritized along with that team’s current deliverables, ideally being fixed in the next scheduled release. In this scenario a delivery team broke the boy scout rule, leaving the software/functionality in a worse state than it started in.

Externally Reported Backlog

Defects that have been in the system for longer than 3 months are sent to the product backlog as areas of the system to be prioritized by our customer success team and our product team. A Severity and Priority matrix can be used to discuss how badly our customers are affected (severity) and how upset they are, or how willing they are to accept the defect (priority). These defects are important, but it’s possible that many of them weren’t caused by your current team if you’re working with software that is over 5 years old.

Externally Reported Created VS Resolved

To see how your team is trending, the first metric I like to track is Created defects vs Resolved defects. This allows me to answer the questions: How many new defects are being reported externally by our customers, and how are we responding to those reports? Are we making progress towards reducing the backlog, or is it growing? This data can also be segmented to view only externally reported recently released items, and how many of those are being resolved by the delivery team in a timely manner. I typically keep a 30-day view and a 90-day view to see any prevailing trends.
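As a rough sketch of the created vs resolved comparison, assume defects have been exported from your tracker as records with a created date and an optional resolved date. The `Defect` type, its field names, and the sample data below are hypothetical and not tied to any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Defect:
    # Hypothetical record shape; map your tracker's export onto it.
    key: str
    created: date
    resolved: Optional[date] = None  # None means the defect is still open


def created_vs_resolved(defects, today, window_days):
    """Count defects created and resolved within the trailing window."""
    start = today - timedelta(days=window_days)
    created = sum(1 for d in defects if start < d.created <= today)
    resolved = sum(
        1 for d in defects if d.resolved is not None and start < d.resolved <= today
    )
    # A positive net value means the backlog grew during the window.
    return {"created": created, "resolved": resolved, "net": created - resolved}


defects = [
    Defect("D-1", date(2023, 11, 5), date(2024, 3, 20)),
    Defect("D-2", date(2024, 3, 1), date(2024, 3, 15)),
    Defect("D-3", date(2024, 3, 25)),
    Defect("D-4", date(2024, 4, 2)),
]
today = date(2024, 4, 10)
last_30 = created_vs_resolved(defects, today, 30)
last_90 = created_vs_resolved(defects, today, 90)
```

Comparing the `net` value across the two windows is one way to tell whether a bad month is a blip or part of a longer trend.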

Total Open Count Externally Reported Defects

It’s also important to track an overall count of Externally Reported Defects that are not resolved. This gives you a pulse on the backlog and prompts discussion and investigation around recently released regressions. It can also be useful to segment this data by severity, priority, and responsible delivery team.
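The segmentation can be as simple as a few counters over the open defects. This is a minimal sketch with made-up records; the keys, severities, and team names are placeholders.

```python
from collections import Counter

# Hypothetical export of unresolved externally reported defects.
open_defects = [
    {"key": "D-3", "severity": "high", "team": "checkout"},
    {"key": "D-4", "severity": "low", "team": "checkout"},
    {"key": "D-7", "severity": "high", "team": "search"},
]

total_open = len(open_defects)
by_severity = Counter(d["severity"] for d in open_defects)
by_team = Counter(d["team"] for d in open_defects)
```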

Test Automation Metrics

Tracking certain metrics for Automation in Testing efforts can be very helpful to show off the value of the work being done in this area.

Issues Discovered by Automated Checks

One simple lagging metric that’s worth tracking is issues discovered by automated checks. Any issue that an automated check uncovers should be documented, so you can quickly see the value, or lack of value, your efforts in building automated checks are providing. I tend to track this metric month over month to see overall trends. If zero issues are discovered by automation in a given month, that is not necessarily a bad thing.
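A month over month view only needs the date each issue was found. The sketch below uses invented dates and fills in months with zero finds explicitly, so a quiet month still shows up in the trend.

```python
from collections import Counter
from datetime import date

# Hypothetical dates on which an automated check uncovered an issue.
found_by_automation = [date(2024, 1, 9), date(2024, 1, 23), date(2024, 3, 4)]

per_month = Counter(d.strftime("%Y-%m") for d in found_by_automation)

# List the months being reported so that months with no finds appear as 0.
months = ["2024-01", "2024-02", "2024-03"]
trend = [(m, per_month.get(m, 0)) for m in months]
```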

Functional and Unit Automated Check Counts / Coverage

There are ways to determine which areas to focus on when building up automated checks. I will not cover those here, but I will cover the counts or coverage of checks in place. This metric is subjective, because the coverage you want is something you have to define. The best way I’ve found to track it is to create a Feature Map of your system, with every action/link on a page broken out so that the entire system is mapped. From this list, decide whether an automated check is necessary for each area of the system. If so, mark it as something you intend to add an automated check for (todo). If any sort of automated check already touches that feature, mark it as automated (yes). If it doesn’t have an automated check and you don’t plan to add one in the near future, mark it as (no). One caveat to this method is that things marked as yes may not have the appropriate level of coverage. If that is the case, feel free to mark things that have coverage but need more as todo. This gives you a way to measure the coverage of your automated checks, along with a backlog of items that can be planned and tracked via a burn down chart.
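To make the Feature Map idea concrete, here is a small sketch that turns the yes/todo/no marks into a coverage percentage and a backlog. The feature names are invented, and leaving the "no" items out of the denominator is an assumption on my part; you may prefer to count them.

```python
from collections import Counter

# Hypothetical feature map: feature -> automation status ("yes", "todo", "no").
feature_map = {
    "login/submit": "yes",
    "login/forgot-password": "todo",
    "cart/add-item": "yes",
    "cart/remove-item": "todo",
    "profile/change-avatar": "no",
}

counts = Counter(feature_map.values())

# Assumption: features marked "no" are deliberately out of scope,
# so they are excluded from the coverage denominator.
in_scope = counts["yes"] + counts["todo"]
coverage_pct = 100 * counts["yes"] / in_scope if in_scope else 0.0

# The "todo" items form the backlog that can be planned and burned down.
automation_backlog = sorted(f for f, status in feature_map.items() if status == "todo")
```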

The simpler way to track this is by total count, broken down by area of automated checks. This again gives you some information about how many checks are in place, but the number says nothing about the quality of the checks.

Most unit test frameworks have built-in code coverage metrics, so getting a coverage % is a lot easier. I tend to use this and set a baseline that the team works towards staying above with each commit.

Release Quality Metrics

The final category of metrics can be useful in gauging how many defects are found in production by your customers due to a recent release, and how quickly the team responsible responds.

Externally Reported Change Fail Percentage

This metric should give us a good indication of how often we break functionality for our customers due to a release. Some may not find it useful to track every single externally reported defect, and may include only defects above a certain severity or priority. This is where you can make the metric work for your team. Gather the data and focus on what you want to improve. I tend to track this metric month over month. I first gather any externally reported recently released defects that have been reported by customers and verified as actual defects. During the initial research and triage, an attempt is made to identify the root cause and link the ticket via the custom field we have added in Jira, ‘Related to Fix Version’. Then I gather every change (user story, bug fix) that was deployed during a release in the month I’m measuring. From there I follow the formula below to calculate the Change Fail Rate Percentage.

  • X Changes released to production (any tickets linked to the release)
  • Y Issues reported against those releases
  • (Y / X) * 100 = Change Fail Rate Percentage

Once you have a few months of established data I recommend setting an aggressive target change fail rate. For the project I’m currently working on, my target is 2%.
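The formula is simple enough to script against a monthly export. In this sketch the counts are invented, and treating a month with no releases as 0% is my own assumption rather than part of the formula.

```python
def change_fail_rate(changes_released, issues_reported):
    """(Y / X) * 100, where X is changes released and Y is issues reported against them."""
    if changes_released == 0:
        return 0.0  # Assumption: a month with no releases counts as 0%.
    return issues_reported * 100 / changes_released


TARGET_PCT = 2.0  # the 2% target mentioned above

# Hypothetical month: 120 changes released, 3 verified defects linked to those releases.
rate = change_fail_rate(120, 3)
within_target = rate <= TARGET_PCT
```

With these made-up numbers the month comes in at 2.5%, which misses a 2% target by a single defect. That is a useful reminder of how sensitive the percentage is when the number of changes is small.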

Mean Time to Repair (MTTR)

Mean time to repair helps us track how quickly we react to externally reported recently released defects. To measure this, we identify defects above a certain severity or priority that are important for us to respond to quickly. For example:

  • Defect1: opened 4/1, resolved 4/5 = 5 days
  • Defect2: opened 4/2, resolved 4/5 = 4 days
  • Defect3: opened 4/6, resolved 4/6 = 0 days (resolved same day)
  • Defect4: opened 4/19, still open on the current day, 4/22 (not counted)
  • MTTR = 9 days in progress / 3 resolved issues = 3 days

Using this metric can help show the team how quickly they are responding to defects they have released recently. This is a lagging metric, because it doesn’t track defects that aren’t resolved yet. Mean time to repair can be measured in many different ways, but this is how I tend to measure it when dealing with software delivery teams.
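The worked example can be reproduced in a few lines. This sketch assumes the days-to-resolve value for each defect has already been worked out by whatever counting convention your tracker uses, with `None` standing in for a defect that is still open. The function name is my own.

```python
def mean_time_to_repair(days_to_resolve):
    """Average days to resolve, ignoring defects that are still open (None)."""
    resolved = [d for d in days_to_resolve if d is not None]
    if not resolved:
        return None  # nothing resolved yet, so there is no MTTR to report
    return sum(resolved) / len(resolved)


# The four defects from the example above; Defect4 is still open.
days = {"Defect1": 5, "Defect2": 4, "Defect3": 0, "Defect4": None}
mttr = mean_time_to_repair(days.values())
```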

Lead Time for Hotfix

Another useful metric, though less about release quality, is how long it takes for a piece of code to get from test complete to a production environment. This assumes a follow-up hotfix release to resolve a released defect. This value should include any CI/CD pipelines (build, deploy, test) along with any release processes needed to get the fix to a production environment. It should not include the time to write or test the code, as those times will never be consistent due to the varying complexity of the changes. The release process, however, should be something that is consistent.
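One way to check that consistency is to record two timestamps per hotfix and look at both the typical lead time and the spread. The timestamps below are invented, and using the median and the max-minus-min spread is just one reasonable choice of summary.

```python
from datetime import datetime
from statistics import median

# Hypothetical hotfixes: (test complete, live in production) timestamps.
hotfixes = [
    (datetime(2024, 4, 5, 14, 0), datetime(2024, 4, 5, 16, 30)),
    (datetime(2024, 4, 12, 9, 0), datetime(2024, 4, 12, 10, 0)),
    (datetime(2024, 4, 20, 11, 0), datetime(2024, 4, 20, 15, 0)),
]

lead_times_hours = [
    (deployed - test_complete).total_seconds() / 3600
    for test_complete, deployed in hotfixes
]

typical = median(lead_times_hours)
# A large spread suggests the release process itself is not yet consistent.
spread = max(lead_times_hours) - min(lead_times_hours)
```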

Deployment Frequency

For this metric I like to track 2 things: how many scheduled deployments we have per month, and how many deployments we actually had. The first may be less important for a team working on decoupled systems and already practicing continuous deployment, but many teams still working on monolithic applications use the release train model, with scheduled releases. Tracking these can give insight into how many changes get deployed per deployment, and therefore into batch size.
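A minimal sketch of the arithmetic, using invented numbers for a single month. Reading the gap between actual and scheduled deployments as unplanned releases is an assumption that only holds if every scheduled release actually shipped.

```python
# Hypothetical month of releases.
scheduled_deployments = 2
actual_deployments = 5  # scheduled releases plus any unplanned hotfix deployments
changes_deployed = 40   # user stories and bug fixes shipped across those deployments

# Assumes every scheduled release shipped, so the difference is unplanned work.
unscheduled_deployments = actual_deployments - scheduled_deployments

# Average number of changes per deployment, a rough proxy for batch size.
average_batch_size = changes_deployed / actual_deployments
```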

Conclusion

There you have it! My list of Quality Metrics that Matter. From each of these metrics I’ve found it useful to create a quality score card that gets shared with the team and the leadership team. Being able to see positive or negative progress should help drive better decisions when it comes to considering quality. Should we speed up and allow our externally reported defects to fill the backlog? Should we slow down and spend more time testing, releasing with less risk, etc.? Should we hire another test engineer to help test or add additional checks to the automation framework? Are our automated checks even useful? I see we have 10 recently released defects and we don’t have any recent issues discovered by automation. These metrics will give useful insight into your team’s work.

Do you have any useful metrics that I haven’t covered here? I would love to hear about them. Please share them in a comment below!