TLDR: Integration tests often provide the most bang for the buck, but unlike unit tests, their benefits are hard to quantify. Linking the success of integration tests to user stories can provide a framework to think about integration testing success.
Kent C. Dodds’s testing trophy provides a great way to think about software QA practice. Unlike the earlier test pyramid, which focuses on the speed of tests and the stability of the product, the trophy focuses on delivering customer value, and that’s probably the index by which product teams should be measured. The testing trophy effectively makes the novel case that integration tests (not unit tests) provide the most value, and that most tests in a codebase should be integration tests, even at the expense of unit tests.
Now, this topic is more nuanced, and Kent’s later articulation is probably more correct:
But it’s undeniable that for the vast majority of web applications written now, integration tests (& not unit tests as the earlier test pyramid would suggest) are the most valuable.
However, there is a problem: unit tests are easily measured by automated tools that output code coverage. While it’s a very basic measure of code quality, and getting to 100% code coverage is not desirable, teams often aspire to code coverage in the high 70s or 80s. It’s a good metric to aim for, and a nice, clean way to measure proactive QA success.
How do you measure integration tests? Tools like Jest with react-testing-library, and newer ones like Playwright, let you test components in isolation and can therefore generate coverage-style measures, but I would argue that these are not the right measures to use when we think from the integration testing perspective.
In the vein of Kent’s tweet above, the way you measure integration testing should resemble the way your software is used, not how it’s made.
User-Story Driven Measurement
Efforts such as BDD and the Gherkin syntax already bring much of the thought of user-stories to testing, and what follows is just a logical extension:
This is a better measure because most product teams already have a user story library. If they don’t, then it’s easy enough for product owners or even engineering managers to reverse-engineer user stories from a working product or a design spec.
Well-written user stories reveal not only the persona of the user, but also their intent (or JTBD, their “job to be done”). This provides a lot of context to write integration tests around. As an example, a user story that adds a 1-click checkout link on a product page will naturally emphasize convenience, and a good engineer can then convert that into integration tests that measure performance regressions too.
Product owners often have an innate grasp of what are “critical” user journeys, so it’s easy to prioritize which integration tests to write. And similarly, engineering managers often know which product areas are the most brittle and a source of bugs (possibly due to underlying tech debt), and they can prioritize important user journeys in those areas.
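To make that prioritization concrete, here is a minimal sketch of how a team might order its story library before writing tests. The `UserStory` shape and its fields (`businessCritical`, `bugsLastQuarter`) are hypothetical names I made up for illustration; they don't come from any particular tool, and the bug count is only one possible proxy for brittleness.

```typescript
// Hypothetical record for one user story; all field names are assumptions.
interface UserStory {
  id: string;
  businessCritical: boolean; // the product owner's judgment
  bugsLastQuarter: number;   // the EM's rough proxy for brittleness
}

// Business-critical stories come first; within each group,
// the stories with the most recent bugs come first.
function prioritizeStories(stories: UserStory[]): UserStory[] {
  return stories.slice().sort((a, b) => {
    if (a.businessCritical !== b.businessCritical) {
      return a.businessCritical ? -1 : 1;
    }
    return b.bugsLastQuarter - a.bugsLastQuarter;
  });
}

const backlog: UserStory[] = [
  { id: "edit-profile", businessCritical: false, bugsLastQuarter: 5 },
  { id: "one-click-checkout", businessCritical: true, bugsLastQuarter: 1 },
  { id: "login", businessCritical: true, bugsLastQuarter: 3 },
];

const testOrder = prioritizeStories(backlog).map((s) => s.id);
console.log(testOrder);
```

The function copies the list before sorting, so the original story library stays in whatever order product keeps it in.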
And finally, one of the hardest bits of code coverage is understanding which % number is good enough. The answers to this Stack Overflow question are clear indicators that perhaps the question is wrong: most answers are heuristic or experience-based, and while everything should be interpreted based on context, it’s nice to have a deeper understanding. When you link testing quality to user behavior, then you have the right instincts: if you prioritize and cover all user stories that are business critical, you have good-enough coverage. And if you prioritize and cover all product areas that are brittle, and you continuously work on improving the failure rates of such tests, you are working on reducing your technical debt.
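As a rough sketch of what such a number could look like, assume story coverage means the share of in-scope user stories that have at least one passing integration test. That definition, the `StoryStatus` shape, and the `criticalOnly` switch are my assumptions for illustration, not an established standard.

```typescript
// Hypothetical status record per user story; field names are assumptions.
interface StoryStatus {
  id: string;
  businessCritical: boolean;
  hasPassingIntegrationTest: boolean;
}

interface StoryCoverage {
  covered: number;
  total: number;
  percent: number; // rounded to the nearest whole percent
}

// Share of stories with at least one passing integration test.
// By default only business-critical stories are in scope.
function storyCoverage(
  stories: StoryStatus[],
  criticalOnly: boolean = true
): StoryCoverage {
  const inScope = criticalOnly
    ? stories.filter((s) => s.businessCritical)
    : stories;
  const covered = inScope.filter((s) => s.hasPassingIntegrationTest).length;
  const total = inScope.length;
  // With nothing in scope there is nothing to measure, so report 0.
  const percent = total === 0 ? 0 : Math.round((covered / total) * 100);
  return { covered, total, percent };
}

const library: StoryStatus[] = [
  { id: "login", businessCritical: true, hasPassingIntegrationTest: true },
  { id: "one-click-checkout", businessCritical: true, hasPassingIntegrationTest: true },
  { id: "refund", businessCritical: true, hasPassingIntegrationTest: false },
  { id: "edit-profile", businessCritical: false, hasPassingIntegrationTest: true },
];

const critical = storyCoverage(library);
const overall = storyCoverage(library, false);
console.log(critical, overall);
```

Unlike a line-coverage percentage, the denominator here is something product and engineering agreed on, so reaching 100% of critical stories is a meaningful target rather than an arbitrary one.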
If you don’t have committed engineering managers, or your product owners lack adequate business context, it’s hard to tell which parts of a user story are most relevant for tests. This matters most when junior engineers are assigned to write integration tests with just a design spec as input. Engineers often have trouble interpreting design specs and getting to the most valuable bits, so some collaboration with product or a senior engineering manager is essential to write quality tests. I’ve tried using Gherkin syntax in the past to promote this collaboration between engineers and product, but it did more harm than good: people often contest the details of vocabulary, and discussions derail into the vagaries of the syntax. What is important is for the person writing the tests to have a good understanding of what matters from a product and user perspective. What I’ve found works best as a start is a meeting between product folks and engineers (ideally with an EM refereeing) where product folks walk through the design or the product, explaining the ideal customer journey and the business objectives. Good engineers often pick up “what’s important” very quickly, and then they are driven to write tests that emulate customer behavior. High-performing or async teams can often replace this meeting with verbose written descriptions of user journeys, and by thinking about integration tests early, even while they develop the feature.
Another problem is figuring out when coverage is good enough. Even within a user story, it’s possible to write hundreds of tests, each covering an edge case a customer might encounter. While unit tests often encourage writing such tests (in the name of covering every code path), I consider this bad practice for integration tests. Remember: the test pyramid still holds, and writing tests for all sorts of edge cases for every user story will make your test suite horribly slow. It’s better, then, to focus on what is important from the user’s perspective and write tests for just that. Here are a couple of rules of thumb I use:
- For a new project that has zero story coverage, write tests only for the “happy path”. Ignore all error paths unless product folks agree that the flow is business critical.
- As the project matures, and you start receiving bug reports, add error paths for the most commonly reported errors that you fix in production. This is to ensure that regressions don’t occur.
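The two rules above can be sketched as a small planning function. Everything here is hypothetical: the `StoryPlanInput` shape, the `plannedPaths` name, and especially the `minReports` threshold, which is just one way to pin down “most commonly reported”.

```typescript
// An error path that caused bug reports you fixed in production.
interface ReportedError {
  path: string;
  fixedReports: number;
}

// Hypothetical planning input for one user story.
interface StoryPlanInput {
  businessCritical: boolean;       // agreed with product folks
  knownErrorPaths: string[];       // error paths identified up front
  reportedErrors: ReportedError[]; // empty for a brand-new project
}

// Decide which paths of a story deserve an integration test.
function plannedPaths(story: StoryPlanInput, minReports: number = 2): string[] {
  const paths: string[] = ["happy path"];
  const add = (p: string): void => {
    if (paths.indexOf(p) === -1) paths.push(p);
  };
  // Rule 1: start with the happy path; error paths only for critical flows.
  if (story.businessCritical) {
    story.knownErrorPaths.forEach((p) => add(p));
  }
  // Rule 2: add regression tests for errors reported often enough.
  story.reportedErrors
    .filter((e) => e.fixedReports >= minReports)
    .forEach((e) => add(e.path));
  return paths;
}

const newStory = plannedPaths({
  businessCritical: false,
  knownErrorPaths: ["network offline"],
  reportedErrors: [],
});

const checkout = plannedPaths({
  businessCritical: true,
  knownErrorPaths: ["card declined"],
  reportedErrors: [
    { path: "session timeout", fixedReports: 3 },
    { path: "label typo", fixedReports: 1 },
  ],
});

console.log(newStory, checkout);
```

For a new, non-critical story this yields only the happy path, which keeps the suite small until real bug reports say otherwise.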
So that’s it! I hope you can employ story coverage in your own product organization. Let me know how it goes.