They've tried to add support tickets to the sprint as they come in, but that changes the sprint scope and renders the "velocity" metrics useless.
They've tried to exclude support tickets from the sprint and keep the scope very small. Some weeks it works; other weeks they overdeliver by a lot (because there were fewer support tickets and they pulled in a lot of tickets from the backlog).
I'm thinking Scrum is useless for that team but management insists all engineering teams must use scrum.
Do you have any advice for managing tasks in a team that has support/adhoc and project tickets?
What to use then? Kanban, with little to no deadlines (estimates are ok but be careful as it's very easy for an "estimate" to turn into a "this work is late because we thought it would be done by X.")
If there is any one thing teams new to Agile need to learn, it is that Scrum is just one choice among many. And nowhere near as universally useful as people seem to believe.
This also helps avoid constant interruptions and context switching for the whole team, since those are now handled by the designated person of the week.
- All support tickets go into a Kanban board separate from the scrum board
- Whoever is primary oncall works exclusively in this queue and is excluded from the sprint for that week. Depending on volume, you can add a second goalie.
- If the support queue is empty, the oncall works on lower priority debt or other OE.
- Oncall changes weekly for us on Monday
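A weekly rotation like this can be computed from the date, so nobody has to remember whose week it is. A minimal sketch, assuming a fixed roster list and an arbitrary starting Monday; `roster`, `epoch` and both function names are hypothetical, not taken from any particular on-call tool:

```python
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day` (the handoff day)."""
    return day - timedelta(days=day.weekday())


def oncall_for(day: date, roster: list[str], epoch: date) -> str:
    """Return the primary on-call for the week containing `day`.

    `epoch` is a Monday on which roster[0] started a shift (hypothetical
    convention); the rotation advances one person per week and wraps around.
    """
    if epoch.weekday() != 0:
        raise ValueError("epoch must be a Monday")
    if not roster:
        raise ValueError("roster is empty")
    weeks_elapsed = (week_start(day) - epoch).days // 7
    # Python's % always yields a non-negative index, so dates before
    # the epoch also map onto the roster correctly.
    return roster[weeks_elapsed % len(roster)]


roster = ["ana", "ben", "cai"]
print(oncall_for(date(2024, 1, 10), roster, epoch=date(2024, 1, 1)))  # ben
```

Everything that is not a support ticket stays on the sprint board; only the person this returns works the Kanban queue that week.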
As a manager, I love it because it's predictable for planning. The team loves it because they're not constantly jerked around.
When I ran SRE at Netflix, our general trick was one person is on support for the week. They were the primary on-call, they handled any emergency support. Everyone else was primarily building, or at least doing longer-term follow up work from support tickets.
In major emergencies of course everyone would drop what they were doing and help, or if a support ticket came through for a tool someone else built, it was fine to ask them for help. But at least the mindset was "jedberg is on call this week, so don't expect him to be building anything this week".
We also had good management support so that when we went to review our quarterly goals, and 75% of them were red, they trusted us to know that it was because we had a lot of support load that month. Everyone who depended on our tools knew that they would be ready when they are ready.
This is because the pathway to excellence of one leads to mediocrity in the other. Sprint tickets require an internal orientation, focused work, and blocked-out time. Support tickets require customer orientation, fast response time, and high availability.
If you require your developers to be responsive to customers' needs, they will never have time to do sprint work. If you require them to have excellence in sprint work, they will not be responsive to customers.
There is no way to "time box" these matters, i.e. devote a certain number of hours to this and a certain number of hours to that, because excellence cannot be time boxed. Excellence requires you to do whatever is necessary to achieve it, and these two types of excellence cancel each other out.
OK, this is a problem because why? Some weeks it works, and other weeks they accomplish more than expected... and this is a problem why??
Like, what's the actual problem here, if good progress is being made on features and bugfixes?
My guess is it's a problem if management is trying to use "scrum" to treat developers like sweatshop workers who they wring the most possible production out of. How can we know what the most possible production is, if they have a different amount of time to work on it every week?!?
We could come up with different arithmetic ways to solve this. Assign "points" to support tickets too (even after the fact, based on how much time they actually took?) to include them in your "velocity". Or keep track of how much time is being spent on support, to "normalize" the velocity of the other stuff accordingly (a different way of doing the same thing).
But personally I do not really want to help companies become better at treating developers like sweatshop workers to wring maximal productivity out of. I'd rather realign to "agile" as it was originally intended: to empower developers to bring value, not to treat them as commodified interchangeable widgets to exploit and burn out.
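For the second option, the normalization is a single ratio. A minimal sketch, assuming the team logs support hours per sprint and that feature points scale linearly with the hours left after support (a simplification); the function name is hypothetical:

```python
def normalized_velocity(points_done: float, support_hours: float,
                        total_hours: float) -> float:
    """Scale a sprint's feature points to a zero-support-load equivalent.

    Assumes points accrue linearly with the hours not spent on support.
    """
    feature_hours = total_hours - support_hours
    if feature_hours <= 0:
        raise ValueError("no feature time left to normalize against")
    return points_done * total_hours / feature_hours


# A heavy-support sprint and a light-support sprint, both with 240 team-hours.
print(normalized_velocity(30, support_hours=80, total_hours=240))  # 45.0
print(normalized_velocity(40, support_hours=20, total_hours=240))
```

Raw velocity says the second sprint was a third better; normalized, the two land within a couple of points of each other.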
but, I mean, exploitive companies gonna company I guess.
> I'm thinking Scrum is useless for that team but management insists all engineering teams must use scrum.
Which they also insist means relying on those "velocity" metrics? Can you do "scrum" without "velocity" at all?
All 'support'-created tickets go into the backlog then; the earliest they can possibly be addressed is the next sprint. (Assuming you want the rule to change / it to be someone else's problem / a proposed solution.)
As a bit of an aside, I'm curious about your role and the org structure: you seem to have oversight over SRE as well as other teams' working processes, but you also have 'management' imposing roughly what they look like from above?
Now of course, since this is HN, somebody's going to sneer divisively at everything I just said and tell me it's not possible (despite the fact that I've done this repeatedly, in different organizations, in a developer and coaching capacity for almost two decades now). Here's my preemptive caveat/STFU for detractors: the above method only works if you, as developers, have full control & ownership over your application code and data, and do your own deployments or are partnered strongly with an OPS team that gives you full monitoring & Read Only access. If your team is working in an environment where you don't actually get ownership over your code and data, this won't work. If management, architects, or egotistical prima donna staff/senior developers "won't let you" pair/mob, do TDD, do trunk based development, or do proper CI/CD, this won't work.
P.S. If you're in an environment where "this won't work" - QUIT! Life's too short to put up with being expected to build software with one hand tied behind your back. These things are often easier to do in medium to small sized companies. These things are often easier to do on greenfield (or at least recent) projects.
I wrote a blog post on the subject a while back: https://blog.rstankov.com/bug-duty-process/
I have never worked anywhere, including using scrum, where I could avoid incoming issues for 2 weeks at a time.
At the end of the day, better systems, better code and less technical debt mean less support. Issues with previous work might indicate that past velocity was inflated by releasing low-quality work.
There are long-term maintenance exceptions that happen to all codebases and have nothing to do with quality, like supporting new platforms or unpredictable platform changes; those can go in as backlog tasks.
The real question is what you are using "velocity" for? Is it to help get a feel for work you can complete, or is it a metric to be judged by management to evaluate you?
* Estimated time to complete each project task (ideally estimated by the person who is going to do the job; then add 20%)
* Status of each project task (from 0% to 100%)
* Time devoted to "support interruptions" each week. Doesn't need to be super detailed - something like "week 4: 5 man-days on support tickets x1, y4 and z6"
That way at the very least you are equipped to answer the inevitable questions "why isn't feature C done?". The answer would be something like "we still need around 2 more weeks of full-time work in order to complete that task, but given the current rate of interruptions we will realistically have it ready in one month".
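The arithmetic behind that answer falls straight out of the weekly support log. A minimal sketch, assuming the remaining estimate and the log are both in person-days and that past interruption rates predict future ones; the function name is hypothetical:

```python
import math


def forecast_weeks(remaining_days: float, team_days_per_week: float,
                   support_days_log: list[float]) -> int:
    """Estimate calendar weeks needed to finish `remaining_days` of work.

    The average of the weekly support log is treated as the expected
    interruption load, and the result is rounded up to whole weeks.
    """
    if not support_days_log:
        raise ValueError("need at least one week of support history")
    avg_support = sum(support_days_log) / len(support_days_log)
    available = team_days_per_week - avg_support
    if available <= 0:
        raise ValueError("support load consumes the whole team")
    return math.ceil(remaining_days / available)


# One developer (5 days/week), 2 weeks of full-time work left,
# and roughly half of each week lost to support tickets.
print(forecast_weeks(10, 5, [2, 3, 2.5]))  # 4
```

Four weeks is the "realistically about one month" from the example; the 20% padding from the first bullet can be applied to `remaining_days` before calling it.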
Scrum doesn't survive "reality testing" in the context of actual s/w development projects.
Management (at least in your case) doesn't care about reality.
The usual solution is to participate in "software process theater" where you have the meetings and maintain the project plans for scrum, but actually use an alternative reality-driven management process to run your team.
Not perfect, but it's how software development has been done in at least 1/2 the places I've worked for the past 30 years. Before Scrum there were other crazy things, and there will be more crazy things to come. It's just the nature of humans + the endeavor of developing software.
Scrum is an accounting tool. Apply all the shenanigans that one might find in financial reports to it.
At my first job, we just edited estimates after the fact to hit the velocity target management wanted.
You could always point out that the Agile Manifesto specifically says "Individuals and interactions over processes and tools," and therefore you aren't doing real capital-A-Agile if teams aren't empowered to use the processes that work best for them. (It probably wouldn't do you any good, but it might feel good to say)
Why does management insist all engineering teams use scrum, though? I hope they're not trying to compare different teams' story points, because that would imply that management has a gross misunderstanding of how story points are supposed to work.
We do it similarly. We generally reserve 20% of our time for production support and 80% for new development. If we have fewer production issues that week, we get more new development done. If we have more production issues that week, we get less new development done.
We have one generic production issue story where we put small things and we'll create a fully fleshed out story for larger production issues.
If we have a trend of having too many production issues then we raise that as an issue itself and get to the bottom of what is causing it.
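The "trend of too many production issues" check can be made explicit by comparing each sprint's support share against the 20% reservation. A minimal sketch, assuming support share is tracked per sprint as a fraction of team time; the three-sprint streak threshold and the function name are hypothetical choices, not part of the comment:

```python
def sustained_overrun(support_share: list[float], reserved: float = 0.20,
                      streak: int = 3) -> bool:
    """Return True if support exceeded its reserved share of team time
    in at least `streak` consecutive sprints."""
    run = 0
    for share in support_share:
        # Extend the current run on an overrun, otherwise reset it.
        run = run + 1 if share > reserved else 0
        if run >= streak:
            return True
    return False


print(sustained_overrun([0.15, 0.25, 0.30, 0.22]))  # True
print(sustained_overrun([0.25, 0.10, 0.30, 0.35]))  # False
```

A single bad sprint is absorbed by the 80/20 split; only a run of them is worth raising as an issue of its own.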
Just do the most valuable thing first. Ship. Repeat.
> management insists all engineering teams must use scrum.
They aren't doing scrum. At the very least, just "do scrum" externally and internally do kanban. Management is clearly dumb enough to believe it.
Orrrr... you assign points to the support work as well, and your team hits its 50 points every week or else gets hit over the head by management about why it wasn't on track.
My priorities were inverted: customer escalations first, and if you have time left over, then go work on bug backlogs that no one else wants to address. Everyone else sprinted while I bumped along at my own pace -- subject to escalation priorities, of course.
Those were dream jobs for me, and the rest of the teams seemed to appreciate having someone -- anyone, just not them! -- dedicated to the support role.
Is it just my imagination that this kind of (I would say "enlightened") management/organization is rare in the industry at large? Or do lots of dev teams do this sort of thing? And where can I find them? :-)
From a PM perspective, that also lets management pre-allocate capacity for the sprint. '2 people on prod support, and 3 people on development/project work' is a decision that can be made and pointed back to. And being allocated to 'jumping on things' means you don't have to justify your productivity/velocity/ticket count beyond responsiveness and issue cycle time.
DevOps/SRE are actually System Admins (always have been, always will be), and they are NOT engineers. They are operations. You cannot run Operations using scrum/agile.
We run scrum with a 2-week sprint cadence and mix new dev with support/maintenance. The allocation differs based on season and events. Any support tickets we do don't count towards velocity and are mixed in with story point tasks and other maintenance/debt tasks. We track velocity as points delivered to a client, which helps with estimating delivery of products/features to clients. This velocity is an average, which is where the numbers work in your favor, since "all estimates are wrong."
In a scenario where velocity is constantly decreased by support/maintenance/bugs, that tells a story about the quality of previous work, impatient management, or a lack of discipline on the team. If you can't go a sprint without having to put out fires, that's indicative of a system that should be addressed at a larger scale than continually applying band-aids; otherwise you'll constantly be bitten.
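Velocity measured this way (client-delivered points only, averaged) is easy to compute from a ticket list. A minimal sketch, assuming each ticket carries a point value plus `is_support` and `delivered` flags; the `Ticket` type, its fields and the three-sprint window are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    points: int
    is_support: bool
    delivered: bool  # shipped to the client within the sprint


def sprint_velocity(tickets: list[Ticket]) -> int:
    """Points delivered to the client in one sprint; support work is excluded."""
    return sum(t.points for t in tickets if t.delivered and not t.is_support)


def average_velocity(sprints: list[list[Ticket]], window: int = 3) -> float:
    """Mean velocity over the last `window` sprints, for delivery estimates."""
    recent = [sprint_velocity(s) for s in sprints[-window:]]
    if not recent:
        raise ValueError("no sprint history")
    return sum(recent) / len(recent)


history = [
    [Ticket(5, False, True), Ticket(3, True, True)],    # support points ignored
    [Ticket(8, False, True), Ticket(2, False, False)],  # undelivered ignored
    [Ticket(6, False, True)],
    [Ticket(10, False, True), Ticket(1, True, True)],
]
print(average_velocity(history))  # 8.0
```

Because support tickets never enter the sum, a support-heavy sprint shows up as a dip in the average rather than as extra points.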
In my mind, your choices are: change your cadence/strategy, change your values/philosophy, change management style, change your budget/velocity, invest heavily now to replace troubled systems, invest in man power to put out fires, or simply stay the course with a shift in perspective.
Only you can choose.
I find it helpful to rank support tasks by:
* Likelihood. Does this issue impact 10%, 25%, 50%, or 100% of all users?
* Pain. Where does this issue rank on a scale of 1 (minor nuisance) to 4 (product usage is impossible)?
Adding categories about the "type" of work is also helpful. This lets you stay current on "crash" support issues while also giving you the ability to save lesser-priority tasks for dedicated "localization sprints."
Check out this excellent article for more examples: https://lostgarden.home.blog/2008/05/20/improving-bug-triage...
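Combining the two ranks into one number makes the ordering mechanical. A minimal sketch, assuming a plain product of likelihood and pain as the score (a simplification, not necessarily the formula from the linked article) and a set of issue types held back for dedicated sprints; `Issue`, `score` and `triage` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    likelihood: float  # share of users affected: 0.10, 0.25, 0.50 or 1.0
    pain: int          # 1 (minor nuisance) .. 4 (product usage is impossible)
    kind: str          # issue type, e.g. "crash" or "localization"


def score(issue: Issue) -> float:
    """Reach times severity; a higher score means fix sooner."""
    return issue.likelihood * issue.pain


def triage(issues: list[Issue],
           batched_kinds: set[str]) -> tuple[list[Issue], list[Issue]]:
    """Split issues into a ranked "fix now" list and a list deferred
    to a dedicated sprint for their type."""
    now = sorted((i for i in issues if i.kind not in batched_kinds),
                 key=score, reverse=True)
    later = [i for i in issues if i.kind in batched_kinds]
    return now, later


issues = [
    Issue("login crash", 0.50, 4, "crash"),
    Issue("French typo", 1.00, 1, "localization"),
    Issue("slow export", 0.10, 2, "performance"),
    Issue("checkout crash", 1.00, 4, "crash"),
]
now, later = triage(issues, batched_kinds={"localization"})
print([i.title for i in now])    # ['checkout crash', 'login crash', 'slow export']
print([i.title for i in later])  # ['French typo']
```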
For some DevOps teams, I've found that "support ticket" is equivalent to "hold someone's hand while doing ____." This happens because DevOps people tend to be jack-of-all-trades and know their way around a wide variety of systems. If you're swamped by these kinds of tasks, start collecting metrics about the types of tasks coming in and build a self-service knowledge-base. Another common way to deal with this situation is dedicate 1 person to "triage" each sprint cycle. This person's workload is expected to be 100% support/routing.
That said, you have to be brutal during triage. Have a bucket of time set aside to handle support. That time is used for operational issues that are important but not more important than any product work you are doing. On a healthy system, most issues can usually wait to be fixed, due to either low severity or a low number of customers impacted (or both).
For very high-urgency issues, you drop everything. The sprint doesn't matter anymore; you are keeping the lights on. If your team is constantly dropping everything and blowing out sprints (or whatever duration measure you use), you need to look into why. Most likely, there is a quality problem somewhere that should be your top priority for sprint work. If you can't prioritize fixing that sort of thing, you need to find someone who will listen and explain how much money is being lost spinning plates because of something that could be fixed at the root.
If you still can't get it prioritized, find another company. You'll burn out and quit at some point anyway ;)
"Sprints" work okay for product work. I use them a lot less rigidly than most people do with success. On-call activities should be recorded in a kanban-style project, these things are made for rapid reprioritization. Whoever works that board should be dedicated to only that board; making members of a team work in multiple project management paradigms at the same time usually has bad and confusing outcomes. Lastly, the "support" tickets can be varied. If they're requests that are gated by your team, I'd add them to the on-call kanban board. If they're just question/answer I wouldn't bother tracking them, you'll more or less end up recreating ITIL.
Some additional context would be to dedicate people to these boards for periods of time. Have a rotation that incorporates everyone doing a set amount of on-call on a recurring schedule. Build up a handoff procedure for the kanban board since context will be fresh every week due to the nature of the work.
Fixating on metrics doesn't work. Managing humans requires a human touch. I am saying this despite having created a tool for managing metrics. My opinion is that metrics should be used as a starting point to dig deeper into your process rather than as an end goal. In this case, if you are consistently getting more support tasks than usual, that may indicate some other problem. Maybe the documentation is unclear, in which case you need to improve the documentation rather than manage the metrics; or it could be a new product launch, which invariably leads to more support tickets until it stabilizes.
This works fine for small teams, but for larger ones (and those with large, older codebases), the ramp-up time to become an effective support dev becomes a drag because it can be months between support stints. We don't all have eidetic memories, and we forget the tricks we use to diagnose production issues.
To counteract that, we've recently put a developer on a 6-month rotation and support them with a rotating backup. This allows the primary support developer not only to stay productive fixing issues but also to surface intelligence about the problem areas in our app (i.e., what always needs support) and build tools to make resolution easier or, better yet, convert it to self-serve.
I infer from your comment that you're a smaller team so perhaps you won't need to put someone on a long stint, but you might want to consider doing so anyway in order to have a resource pave the proverbial cow-paths and make it easier for the team in the future.
Even for normal dev teams with customer-facing products (and therefore a nonzero support load), I have found a need to carve out some time for support work from the Scrum team's velocity.
As others have suggested, having a dedicated engineer as primary support contact works well. You can figure out what the 95th %ile support burden is and remove that from the velocity. If you have a quiet week of support, the expectation is you spend the support budget on tools and docs, maybe improve automation on some existing scripts, knock out lower-priority incident remediation WIBNIs, etc.
If you have a >95%ile bad week, then the support engineer does less sprint work or the rest of the team pitches in as needed. But most of the time support doesn’t impact your velocity with this approach.
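Reserving the 95th-percentile load is a small calculation over past weeks. A minimal sketch, assuming a history of weekly support hours and using the nearest-rank percentile definition (one of several common definitions, which give slightly different numbers on small samples); the function names are hypothetical:

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0..100) of the given values."""
    if not values:
        raise ValueError("no history")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def plan_week(team_hours: float, support_history: list[float],
              pct: float = 95) -> tuple[float, float]:
    """Return (support budget, hours left for sprint work).

    The budget covers the support load of all but the worst ~5% of weeks;
    in a quiet week the unused budget goes to tools, docs and automation.
    """
    budget = percentile(support_history, pct)
    return budget, team_hours - budget


weekly_support_hours = [6, 9, 4, 12, 7, 30, 8, 5, 10, 11,
                        6, 9, 14, 7, 8, 5, 9, 13, 6, 40]
print(plan_week(200, weekly_support_hours))  # (30, 170)
```

Only the 40-hour week in this made-up history exceeds the budget; that is the ">95%ile bad week" where the rest of the team pitches in.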
Here is our yardstick: https://gitlab.com/dreamer-labs/tsc/service-maturity-model
This shares the burden of the interrupt work. It's still interrupt work, and it still slows down someone's velocity, but because it punishes the service designers for their poor design decisions, those get better over time and the overall quantity of bugs goes down, not up.
Example:
Your team has 6 members
Your management would like to allocate 15% of the workload to support
Every sprint, 1 team member (rotating) does only support tickets
You keep track of the support days of every member to equalize the workload (support is no fun)
Because working alone is not ideal, you could instead put 2 members on support every other sprint, but keep in mind that this means NO support tickets are solved for 1 sprint
This is the best solution I could find being forced to practise SCRUM - even if it does not match the rules exactly.
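Tracking support days to equalize the load can be reduced to a "fewest days so far" rule. A minimal sketch, assuming 10 working days per sprint and a dict of support days per member; the names, the alphabetical tie-breaking and the sprint length are hypothetical:

```python
def next_on_support(support_days: dict[str, int], count: int = 1) -> list[str]:
    """Pick the `count` members with the fewest support days so far.

    Ties are broken by name so the choice is deterministic.
    """
    ranked = sorted(support_days, key=lambda name: (support_days[name], name))
    return ranked[:count]


def record_sprint(support_days: dict[str, int], chosen: list[str],
                  days_per_sprint: int = 10) -> None:
    """Credit each chosen member with a full sprint of support days."""
    for name in chosen:
        support_days[name] += days_per_sprint


team = {name: 0 for name in ["ana", "ben", "cai", "dev", "eli", "fay"]}
for _ in range(6):
    record_sprint(team, next_on_support(team))

print(team)  # after six sprints, every member has 10 support days
```

One person out of six per sprint is about 17% of capacity, close to the 15% target. For the two-person variant, call `next_on_support(team, count=2)` every other sprint instead.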
One thing to consider… support tanking your velocity is a feature, not a bug.
Sounds like you need investment in support tooling etc that can lessen that burden. Velocity will improve when support is less burdensome.
This is obviously not one-size-fits-all advice, but it worked well for a previous org I was at. If the sprint work will have the side effect of fixing support issues, or support is so intense that it requires a majority of the team, you HAVE to fix that first, regardless of sprint/scrum/kanban, etc. If velocity was super high for a few sprints but regressions/bugs were introduced, they could impact future velocity, so those high-velocity sprints weren't as productive as you thought. It's never just one number.
The org I worked at with the absolute worst support/sprint structure also had the only codebase I considered "unsalvageable". They refused to do anything to improve existing processes and spent half of every team's time on the same support issues in an endless loop. They never had the measurements needed to actually figure out where to improve clients' experiences.
Rotate who the "fireman" is on the team, they are out of Sprint and just fix. Spreads internal knowledge around too.
We should only have scrum tickets tied to us that require sprint-specific work from a timing perspective. Otherwise, it's retroactive "tracking" points later on or preallocating points ahead of time for "support work" which is absolutely pointless.
If that many tickets are coming in and disrupting everything, I’d be using that to prioritize addressing the causes so the disruptions stop.
If you’ve ever read The Phoenix Project, this is well illustrated when they talk about prioritizing preventative measures for “unplanned work” and they map out in detail why it’s such a huge problem.
If both are falling behind, you need to slow down the rate at which you're demanding features be shipped, loosen the SLOs, or hire more programmers.
On my side, working in a small team of 5, we just leave one person out of the sprint to work on support requests, bug fixes and if time allow improvements. Large handling of technical debt (ex: transitioning from VueJS 2 to 3, good luck expressing this as a "product increment" in a sprint goal and good luck if you don't tackle it ... ) are part of sprint work.
Over time, I realized that Scrum is a framework, not a fixed set of rules, and as per Agile, the goal is to maximize business value. So if we are swamped by support requests and our systems are not operating properly, it's the role of the person on "support duty" to warn the rest of the team and ask for help. Sure, it may impact some velocity metrics and we may not hit our sprint goal that week, but why would that matter? There is no point writing more code and building more features if what we have is not working properly and not satisfying users.
The main issue I have seen with this system is that the developer on support duty may not have the right skills to perform all the work that is needed. Scrum relies on the assumption that anyone can do anything, and that's definitely a flaw when working on a larger codebase with a mix of frontend, backend and devops work and more junior developers. It does force us to train everybody and learn to delegate so that everybody can be exposed to all parts of the system, but this takes many months or years.
We also learnt to define smaller Scrum goals. It's better to achieve a small goal and then decide what to do next than to always feel like we are running behind. The sequence of "two weeks" sprints is often seen as a "fixed" schedule with deadlines to hold at all costs, but that's stupid. The core idea of Agile is to follow an incremental approach along the path of maximum business value delivery, and to periodically reflect on how things are going. If the amount of support requests and operational issues is such that there are no resources left for new feature development, then it's time to prioritize tackling some technical debt and improving monitoring and automation to prevent the most common issues.
If it is implemented, then it'll be heavily gamed, or you have people working ridiculous hours.
Kanban makes more sense.
... Without adding additional staff of course.
Sometimes the "scrum" is imposed to try to order and manage things as the support burden adds more and more disruption, in a vain search for a magic bullet.
They were in effect excluded from the sprint during that week.
It was also a company who was trying to shoehorn everything into SCRUM because a consultant told them.
We rotated support engineer among our team members. When they're on support duty, we just assign them half as many sprint points. If they have a light support week, they take on extra work. A nice bonus for our team.
just use a kanban-board with a backlog ...
just my 0.02€