Do you have a huge pool of defects, bugs, and fixes waiting to be worked on? So much that you find it impossible to balance it with other project work?
I’ve heard three versions of the same story about how teams fixed this problem over the last year. I’m always cautious of over-generalization, but once our sample size gets to three it’s time to share.
The first part of the story goes like this: after releasing super-cool whiz-bang 4.0, the team is given a lot of fixes to make. For a while, everybody works on them, but management has other ideas: it’s time to work on super-cool whiz-bang 5.0, guys! Surely you can do this other stuff in the background.
So the “other stuff” is taken up by one or two guys, or it’s farmed off to India, or a couple of guys come in on the weekends for a bit to try to tidy it all up. There are a lot of plans for how it’s all taken care of.
Only it doesn’t work the way it was planned.
Instead, the load becomes bigger than the guys can handle. Not only is stuff from 4.0 on the list, there is still stuff from 3.0, 2.0, and 1.0 that hasn’t been fixed. The list gets huge.
Acting logically, you set up a triage. Things come into the list. You take a bit of time to estimate how big each one is. Is it 1 day? 2 weeks? 3 months? Once you know roughly how big items are, the business guys can make decisions about priorities. So the next thing is to set up some kind of field for priorities, maybe low, medium, and high. Maybe forced ranking. Of course you use some kind of online system. After all, this is the 21st century! All important things must happen online somewhere. If it’s not online, it’s not important.
But wait a dang minute, here. Some things are mission-critical fixes, like when production tanks. You can’t keep working on fixing a report when the whole thing is broken. So you make a rule that when production tanks, everybody stops work and fixes it. Then they go back to whatever they were working on.
After a bit of time, you notice that not only is the list huge, it just keeps growing bigger and bigger — much faster than you could ever hope to catch up. You’ve just entered Defect Hell, where a big dump-truck comes by once a day and piles an endless stream of crap on top of your head.
This is bad for everybody. For customers, things are broken and never get fixed. For other people in the company, they start losing faith in the ability of the team to fix problems quickly. For the team itself, they’re on a demoralizing death-march without end.
Bad. Bad. Bad. And if you’ve worked in a big organization at all, you’ve seen it.
In most places, the response to this problem is to simply keep over-complicating the plumbing. Perhaps you add another field to your list, like “severity”, or “impact”, or “will cause loss of customers”. Maybe you create little automated email systems, flashing lights, or loud sirens. The hidden message is obvious: you dumb defect fixers need us to help kick you in the butt so that you’ll work faster.
That never works, so then, sadly, organizations will invent completely new ways of fixing the problem. Instead of complicating the existing system, another system will grow up alongside the first in order to “really once and for all fix the defect problem”. Perhaps it has red flags, or little stuffed animals that explode. The safe money says it’s something that looks very serious and has a lot of complexity underneath. Maybe there’s a class or a little gold star you get to wear.
The problem here is outhouse process engineering, or the idea that a few smart guys can sit around with a team of super-users and pinpoint the exact nature of the defect-tracking system to a degree necessary in order to design a one-size-fits-all solution.
Yes, Six-Sigma Guys, I’m looking at you.
But you are by no means alone. Not at all. All of us in technology believe we can over-generalize and then over-apply those generalizations.
And we are most always wrong when we do it. Technology development, no matter how much we’d like it to be, just ain’t manufacturing. It’s R&D.
Here’s how the second part of the story is turning out for some folks:
Take a look at your data. How long does it take to fix a defect on average? For our story let’s say the answer is 25 days.
Throw out all the other complexity. We know on average it takes 25 days to fix any defect. Yes, it will vary by the individual case, but the variance is not important right now. Addressing it causes more problems than it solves.
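The one number you do need is easy to get. A minimal sketch of computing it, using a handful of hypothetical defect records (the dates and the 25-day result are made up for illustration):

```python
from datetime import date

# Hypothetical records of fixed defects: (opened, closed) dates.
fixed = [
    (date(2011, 1, 3), date(2011, 1, 28)),
    (date(2011, 1, 10), date(2011, 2, 14)),
    (date(2011, 2, 1), date(2011, 2, 16)),
]

# Lead time per defect, in days, from open to close.
lead_times = [(closed - opened).days for opened, closed in fixed]

# The single number the whole system runs on.
average = sum(lead_times) / len(lead_times)
print(round(average))  # prints 25 for this sample data
```

Nothing fancier than an average is needed; the variance, as noted above, is deliberately ignored.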
Second, eliminate any sort of up-front sizing, prioritizing, or tool-based data nightmares. Simply make a list. Once again, the overhead creates more problems than it solves.
Third, create a kanban board with some steps you think might be common for all defects. On one side might be “start”, on the other side “test”. Don’t worry if it’s perfect, just throw it out there. If it doesn’t work for you, change it. Immediately. It’s your board, not anybody else’s. It doesn’t exist as a rule anywhere.
Fourth, pick a number that represents your maximum capacity, i.e., the amount of stuff you can work on at any one time. Let’s say you’ve got a couple of people working on things and everybody decides that number is 3. But it could be 7. Just like the layout of your board, don’t spend a second over-engineering it. Pick something and then adapt. It’s important to get buy-in from your customers on this, but the critical thing is that you have to pick some number that everybody agrees on. This number represents work saturation — when you have 3 defects on the board, you’re loaded. You’re not allowed to add more on the left until one comes off the other end.
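The pull rule described above is simple enough to sketch in a few lines. This is a toy model, not any particular tool; the class and defect names are made up:

```python
from collections import deque

class KanbanBoard:
    """Toy WIP-limited defect board: an ordered backlog the business
    maintains, and an in-progress set capped at `wip_limit`."""

    def __init__(self, wip_limit=3):
        self.wip_limit = wip_limit
        self.backlog = deque()   # business decides the order here
        self.in_progress = set()

    def submit(self, defect):
        self.backlog.append(defect)

    def pull(self):
        """Pull the top backlog item, but only if under the WIP limit."""
        if len(self.in_progress) >= self.wip_limit or not self.backlog:
            return None          # saturated: finish something first
        defect = self.backlog.popleft()
        self.in_progress.add(defect)
        return defect

    def finish(self, defect):
        self.in_progress.discard(defect)

board = KanbanBoard(wip_limit=3)
for d in ["D-101", "D-102", "D-103", "D-104"]:
    board.submit(d)
for _ in range(4):
    board.pull()                 # the fourth pull is refused
print(sorted(board.in_progress))  # ['D-101', 'D-102', 'D-103']
```

The only enforcement mechanism is the one guard clause in `pull`: nothing new starts until something finishes.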
At this point, you have a simple question for people submitting things to the list: what would you like to have done in 25 days? They can flip for it, arm-wrestle, conduct decision-facilitation meetings — it doesn’t matter. From the team’s standpoint, just let them know. The business must self-organize to work through this.
There are a lot of objections to this setup, mostly because it ditches a lot of complexity that we just “know” is important.

What about things that have to happen immediately? Make sure they get pulled next.

What about things that finish early? The way kanban works, once something is finished, the next item at the top is pulled.

What about people trying to submit new feature requests disguised as bugs? That’s a common worry, but as it turns out it only happens a very small percentage of the time, maybe 2 or 3 percent of total submissions. When the team finds one, let them escalate if you like. It’s not a big enough deal to add complexity to the system.

Why are we giving up the step of sizing the effort before it goes on the list? Because teams that look at the data are finding they spend between 25 and 40 percent of their time simply sizing defects that may never get fixed.

What about things that stay on the list and nobody ever puts them in the top three? Things on the list over six months get booted. If they’re important, somebody will put them back on the list. If not, they won’t.
Once all the objections are satisfied, what I’m hearing is that this is good for everybody. For the teams, it gives them some amount of sanity: they know what they are working on and have freedom to adapt and improve how they solve problems.
For the business partners, they start having to have real conversations about how much they can do and what needs to be done next, instead of just tagging everything “urgent” and throwing it in an electronic bucket somewhere. It also cuts a lot of the cruft out. The question of “what do we want done in 25 days” is a solid question with real impact. Checking little boxes on a form somewhere is just an exercise in frustration and futility.
Also, the business starts to notice that things arrive as scheduled. Many things are done in much less than 25 days, but somehow even the outliers get pulled in under the limit. The business starts trusting the team more.
And, after a few months, as the team relaxes into a flow and starts self-optimizing, the 25-day lead time starts decreasing. It’s 22 days, then 18 days. As the lead time decreases, velocity through the system increases.
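That last relationship is just Little’s Law (throughput = WIP / lead time): with the WIP cap held fixed, a shorter lead time means more defects flow through. A quick worked example, assuming the story’s WIP limit of 3 and a 30-day month:

```python
WIP = 3  # the agreed work-saturation limit from the story

def throughput_per_month(wip, lead_time_days, days_per_month=30):
    """Little's Law rearranged: throughput = WIP / lead time."""
    return wip * days_per_month / lead_time_days

# As the average lead time falls, monthly throughput rises.
for lead_time in (25, 22, 18):
    rate = throughput_per_month(WIP, lead_time)
    print(f"{lead_time} days -> {rate:.1f} defects/month")
# 25 days -> 3.6 defects/month
# 22 days -> 4.1 defects/month
# 18 days -> 5.0 defects/month
```

The team never has to manage throughput directly; it falls out of the WIP limit and the improving lead time.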
In all three stories I’ve heard, the huge messes took perhaps a year or two to completely fix. You don’t clean up a problem that took years to accumulate in a week. But even within a few months, it was clear to everybody that things were on the right track. Every one of these guys was extremely happy — to the point of recommending kanban for just about everything in the world.
Which, of course, is the problem. The same “disease” that caused folks to create cumbersome tracking and allocation systems in an effort to help defect teams can also cause successful defect teams to create systems that may or may not work for other folks. Beware over-generalization and premature optimization.
So no, there’s no magic potion, but as they say in the news business, a pattern of three makes a trend, which makes it newsworthy. I am putting this in my arsenal of suggestions for any type of team that has a situation that looks like this, including program-level defect teams, where this defect hell thing can drive everybody crazy.
Whatever its actual value for your particular situation, it’s a nice tool to have sitting in the tool-chest.
If you've read this far and you're interested in Agile, you should take my No-frills Agile Tune-up Email Course, and follow me on Twitter.