Why can't anyone tell me what story points equate to?

Question

Contract after contract, engagement after engagement: why can't a single product owner, scrum master, agile coach, anyone, tell me what story points equate to?
Time? Effort? How many times I have to push my code? Cakes Available Per Day (mmmm, cakes)?
And what's with the whole Fibonacci numbers? 1, 3, 5, 8, 13, ..., etc. What does that even mean? Why even do that?
No one can tell me. I know someone can, but it's never been the people who are managing this stuff.
What's with the concept of trying to make story points mean "Effort"? Why even bother?
Why can't story points simply mean: time. How much time a task will take. How much time I think I'll need to apply assuming the best case.
I think X will take N time based on experience, environmental factors, common causes of time loss (meetings, coffee breaks, eating cake (mmmm cake), ...)... if you're asking ME to do the time.
Why does that then need to be converted into "effort"? Or Fibonacci numbers?
"Task X will take 7 gold stars and a glitter bomb to complete."
"Good work, Mike! That's a great estimate. Now get to work!"
Please help :(

AnimalMuppet · Accepted Answer

The point is to have a rough gut-level size to put on things, so that you can do so quickly. "Well, task X will be about the same as task Y was, and Y was 2 story points, so X is probably 2."
As others have said, they are convertible to time. But what's the conversion factor? You don't know until you measure it. "The team is doing an average of 22 story points per two-week sprint over the last three sprints." OK, now you know what a story point equates to - for that team, working on that project, with the same people doing the estimates. That doesn't mean anything for the same team on a different project, or a different team, or different people doing the estimates. They are not portable whatsoever. But within the context of the project, you can estimate reasonably well how much you can get done in the next several sprints.
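To make the "measure the conversion factor" step concrete, here is a minimal sketch in Python. The sprint totals and the 90-point backlog are made-up numbers chosen to match the 22-point average in the example, and `forecast_sprints` is a hypothetical helper, not something from any tool.

```python
import math
from statistics import mean

def forecast_sprints(completed_points, backlog_points):
    """Return (velocity, sprints_needed) from recent per-sprint totals."""
    # Velocity is only meaningful for this team, project, and set of estimators.
    velocity = mean(completed_points)
    # Round up: a partially used sprint is still a sprint on the calendar.
    return velocity, math.ceil(backlog_points / velocity)

# Hypothetical data: the last three two-week sprints.
recent_sprints = [20, 22, 24]
velocity, sprints_needed = forecast_sprints(recent_sprints, backlog_points=90)
print(f"velocity: {velocity} points/sprint, backlog needs ~{sprints_needed} sprints")
```

The forecast is only as good as the assumption that the next sprints look like the last three.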

quantified · Answer

They are a proxy for time and nothing else. An intentionally imprecise and noisy proxy, and worth keeping that way. Their purpose is to be a notional size that can be compared with the rough size of a “sprint”. A sprint might be denominated as points or as time, but since they’re typically time, the point is forced to be roughly time too. Otherwise the unit of measurement doesn’t make sense: what else would you sum up and compare to a duration except durations?
Also: effort-time, not elapsed time. You can’t usefully predict blockers, and when you’re blocked you can usually work on something else, maintaining effort on overall progress.
“Effort” and “complexity” only translate into “time spent doing something useful” and “it’s not all figured out, need time to figure out and probably will take more work than something known simple”.
If a point was “effort”, you can work half as hard and take twice as long, or twice as hard and take half as long. Implying a wide range of throttle on your participants. Estimating is always done with a particular throttle setting in mind.
“Complexity” on its own doesn’t make sense. Entering the contents of the Harry Potter series into a Markdown file is pretty simple therefore 1 point but may take a sprint.
I do believe in the value of estimating in points but also in getting granular to identify all the time drivers. Consistency there and constant review of accomplished work to estimated work in a story makes future estimating much more reliable as well as makes progress in a sprint much more transparent.

dyeje · Answer

It literally does not matter. As long as you're consistent with whatever methodology you choose, it'll work as a forecasting tool.
Fibonacci because humans find log scales more intuitive.
Estimating anything but the simplest tasks by time is a fool's errand. Too many unknowns.

dragonwriter · Answer

> Contract after contract, engagement after engagement: why can't a single product owner, scrum master, agile coach, anyone, tell me what story points equate to?
Used properly (which not everyone who uses them does, and you can be sure someone isn't if they can't explain this) they are a measure of relative effort that should be linear to working time (what you would log against the work item) to complete, i.e., a 2 sp item should take about 2/3 the staff time of a 3 which should take 3/5 the staff time of a 5.
> And what's with the whole Fibonacci numbers? 1, 3, 5, 8, 13, ..., etc.
> Why can't story points simply mean: time.
Because empirically (and the research on this goes back to the 1940s, IIRC) people, even domain experts, are very, very bad at estimating absolute time of intellectual work (and this gets worse, not better, when you break things down into finer-grained tasks and try to roll them up), but aggregating broad-bucket relative-time estimates arrived at by team discussion and scaling by empirical measures of actual velocity works much better.
Fibonacci is to provide the broad buckets.
Unfortunately, it's all become cargo cult detached from rationale.
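A rough sketch of the "broad buckets, then scale by measured velocity" idea, in Python. The bucket list, the tie-breaking rule, and the 5 hours-per-point figure are all assumptions for illustration; the real factor only comes from a team's own logged time.

```python
# Fibonacci-style buckets; assumed scale, use whatever your team uses.
BUCKETS = [1, 2, 3, 5, 8, 13]

def to_bucket(raw_size):
    """Snap a rough relative size to the nearest bucket.

    Ties go to the smaller bucket (an arbitrary choice for this sketch).
    """
    return min(BUCKETS, key=lambda b: (abs(b - raw_size), b))

def forecast_hours(points, hours_per_point):
    """Scale points linearly by an empirically measured factor."""
    return points * hours_per_point

# Hypothetical measurement: 110 staff hours logged against 22 completed points.
hours_per_point = 110 / 22

print(to_bucket(9))                        # no arguing over 8 vs 9
print(forecast_hours(2, hours_per_point))  # a 2 ...
print(forecast_hours(3, hours_per_point))  # ... is about 2/3 of a 3
```

Because the scaling is linear, the 2/3 and 3/5 ratios hold no matter what the measured factor turns out to be.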

uberman · Answer

Story points are level of effort. The Fibonacci allocation helps you estimate this level of effort without getting bogged down in the details of is it 8 or 9 points. There is a clear distinction between each step.
The why bother is to estimate your team's ability to ultimately deliver a product on time and on budget.
In some sense, story points do mean time, but remember if you assign an 8 task to me, I might be able to do that in a week, whereas you might be able to do 13 points in a week so story points are not strictly a measure of time but also of capability and hence "level of effort".
If you don't understand it then you should stop your task master/project manager and make them explain it to you. Otherwise, you will not be able to effectively participate in estimation.

karlkfi · Answer

It doesn't actually matter what story points stand for, as long as they converge towards a metric that can be used for consistent predictions.
In fact, if you pin them to something concrete, you lose the ability to adjust the point value to account for less-concrete variability.
They're almost always "pointless" in the beginning, no better than a Super Wild Ass Guess (SWAG). But if you use them relatively consistently over time, have relatively consistent work, and account for team size changes, they can become valuable tools for predicting completion time of complex/complicated tasks/project.

throwaway4233 · Answer

Here is how I assumed story points would relate to the task, but this takes into consideration 2 facts
1. The team is at least 60-70% aware of what needs to be built, be it a new feature or changes to an existing one. This does not imply every single member should have the same awareness, but more would be more helpful.
2. If the team is not well equipped then it would require them to research about the task at hand and gain a rough idea before estimating. If you do not have access to this time, then buy it.
Coming to the story point with respect to task complexity aspect, here is how I see it:
1. 1 => minor code changes, such as consolidating migration files, css changes that do not involve a parent component or moving config keys for a new feature to a store. Tasks with 1 SP are usually sub tasks based on what I have seen.
2. 2 => minor code changes that involve QA testing the change specifically. This might also include a change in assertion for a test case or adding a new test case to cover this scenario.
3. 3 => Code changes that involve QA testing as well as adding new test cases since this will involve adding new behaviour to a user flow.
4. 5 => Same as for SP(3) but has code changes that involve refactoring or might end up touching dependent files or APIs. Could also be related to adding a new user flow. Might scale to an 8 if the QA feels that testing from their side might be more complex.
5. 8 => Tasks that involve adding a new user flow(s) or refactoring existing flow(s). Any task that involves the QA going `hmmm` during estimation.
6. 13 => Any task 13 or over needs to be split up. One could identify such tasks by checking if the acceptance criteria is similar to a `Give Yourself Goosebumps` novel.
I have not included SP with respect to research tasks that would aid a team in understanding how to estimate a new feature because even I throw a 5 or 8 at it.
P.S. This is how I saw it, but was never able to fully have the team use it, hence I cannot vouch how effective this might be in the long run

WolfOliver · Answer

Whenever I propose to use a time based estimation they say I don't understand. I tried and I still don't understand :D
But the idea is that when you use a time based estimation you do add a buffer so that you can be sure that you finish the task in the estimated time.
In order to prevent the "buffering", we say we do not estimate the time but the complexity of a task.
That's the theory. In practice: You have a story which is e.g. 3 points complex and you can not really say how long it will take you to finish it. It could be a simple task like painting the wall. The complexity does not depend on the size of the wall so it does not matter if I have to paint a 5 meter wall or a 20 meter wall it still will be 3 points?
Now how is this supposed to help anybody???

happy_path · Answer

Theoretically, story points are like a measure of time that is used to compare user stories. My understanding is that for some developers they are a relative measure of complexity, useful to compare USs.
Now, for me they are the ideal time slot. An ideal hour, spent by an ideal software developer (not a 10x engineer). I stored all measures of actual time spent for each US and multiplied them by a factor for each developer depending on his/her experience. I know that's not the best way to do it, but it served me to create some kind of dataset and make some regressions about future USs based on the data.
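One way such a dataset could be used, as a minimal Python sketch. Everything here is an assumption for illustration: the records are invented, the direction of the per-developer factor is a guess, and the fit is a plain least-squares line through the origin rather than whatever regression was actually run.

```python
def normalize(actual_hours, developer_factor):
    """Convert a developer's logged hours to 'ideal developer' hours.

    Assumed convention: factor < 1 for someone slower than the ideal
    developer, factor > 1 for someone faster.
    """
    return actual_hours * developer_factor

def fit_hours_per_point(samples):
    """Least-squares slope through the origin: hours ~= k * points."""
    numerator = sum(points * hours for points, hours in samples)
    denominator = sum(points * points for points, _ in samples)
    return numerator / denominator

# Hypothetical records: (story points, logged hours, developer factor).
records = [(1, 2, 1.0), (2, 5, 0.8), (3, 6, 1.0), (5, 8, 1.25)]
samples = [(points, normalize(hours, factor)) for points, hours, factor in records]

k = fit_hours_per_point(samples)
print(f"about {k:.1f} normalized hours per point; an 8 would be ~{8 * k:.0f} hours")
```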

v1l · Answer

I think a better heuristic when asking developers for an estimate is to ask for a lower bound and a black swan upper bound. Most developers have a better sense for this than an exact time or story point estimate. Alas no product tools allow you to do this today.
As a dev, I'm so much more confident of saying .. umm this bug will take at least a couple hours to investigate/fix but could balloon up to 2 days to re-factor and fix depending on what I find.
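For what it's worth, range estimates are easy to carry around yourself even without tool support. A small Python sketch, where `RangeEstimate` is a hypothetical type, the second task is invented, and "2 days" is assumed to mean 16 working hours:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RangeEstimate:
    low_hours: float   # "at least this long"
    high_hours: float  # black swan upper bound

    def __add__(self, other):
        # Adding the bounds directly is deliberately pessimistic:
        # it assumes every task can hit its worst case at once.
        return RangeEstimate(self.low_hours + other.low_hours,
                             self.high_hours + other.high_hours)

def total(estimates):
    """Sum a list of range estimates into one overall range."""
    result = RangeEstimate(0, 0)
    for estimate in estimates:
        result = result + estimate
    return result

bug_fix = RangeEstimate(low_hours=2, high_hours=16)    # "a couple hours" to "2 days"
other_task = RangeEstimate(low_hours=4, high_hours=8)  # made-up second task
overall = total([bug_fix, other_task])
print(f"{overall.low_hours} to {overall.high_hours} hours")
```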

muzani · Answer

My house rule is my estimate is in half-hours, but the time it takes to be done is in hours. So an 8 is a full day, but I expect to be sitting in front of the computer for about 4 hours.
Fibonacci is good because it means that your estimate is increasingly inaccurate as it goes higher.
Anything higher than 3 should be broken down. I normally use a 0 or 0.5 as well for smaller tasks - typos, creating a dialog box, etc. 13 or above isn't a real estimate and should not be taken seriously.

tdeck · Answer

Because we have no satisfactory way to measure software output, and thus have no way to measure productivity or consistently size tasks. Exercises like story point estimation are like medieval medicine - occasionally they work, but more often it's just about making the patient feel they're taking control of their illness.

slipwalker · Answer

I always think of story points as "function points ( as in IFPUG[0] ) for lazy people". It should measure complexity[1] of a task. The time dimension will come later, after you have some perspective on your team's technical skills ( "oh, on average they can produce N magical-points over each day/week/month", what Joel calls 'evidence-based scheduling'[2] ).
[0] https://www.codeproject.com/Articles/8151/Using-function-poi...
[1] https://ciandt.com/us/en-us/complexitypoints
[2] https://www.joelonsoftware.com/2007/10/26/evidence-based-sch...
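A loose Python sketch in the spirit of the evidence-based scheduling idea in [2], as I understand it: keep a history of estimate versus actual, then replay future estimates against randomly drawn historical ratios to get a spread of outcomes instead of a single number. The history, the estimates, and the function names are all invented for illustration.

```python
import random

def simulate_totals(estimates, history, rounds=1000, seed=0):
    """Monte Carlo over past accuracy; returns sorted simulated totals.

    history holds (estimated, actual) pairs for finished tasks.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    # Ratio > 1 means the task finished faster than estimated.
    velocities = [estimated / actual for estimated, actual in history]
    totals = []
    for _ in range(rounds):
        # Each remaining task gets a randomly drawn historical ratio.
        totals.append(sum(e / rng.choice(velocities) for e in estimates))
    return sorted(totals)

def percentile(sorted_values, fraction):
    """Simple index-based percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]

# Hypothetical history and remaining work, in whatever unit the team uses.
history = [(4, 5), (8, 8), (2, 4), (6, 5)]
remaining = [4, 8, 16]

totals = simulate_totals(remaining, history)
print("50% chance of finishing within:", round(percentile(totals, 0.5), 1))
print("90% chance of finishing within:", round(percentile(totals, 0.9), 1))
```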

hackflip · Answer

It's a made up scale that is intentionally imprecise to communicate rough estimates.
These imprecise measurements may be way off on individual units of work, but in aggregate, can average out to fairly consistent measurements within a team.
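A quick Python simulation of that averaging effect. The assumptions are loud ones: task sizes are drawn from a made-up scale, and each estimate is off by up to 50% in either direction with no systematic bias. A consistent bias would not cancel out this way, which is what measuring velocity is for.

```python
import random

def simulate(n_tasks=200, noise=0.5, seed=42):
    """Return (mean per-task relative error, relative error of the total)."""
    rng = random.Random(seed)
    actual = [rng.choice([1, 2, 3, 5, 8]) for _ in range(n_tasks)]
    # Assumed error model: unbiased, uniform within +/- noise of the true size.
    estimated = [a * (1 + rng.uniform(-noise, noise)) for a in actual]
    per_task_error = sum(abs(e - a) / a for e, a in zip(estimated, actual)) / n_tasks
    aggregate_error = abs(sum(estimated) - sum(actual)) / sum(actual)
    return per_task_error, aggregate_error

per_task_error, aggregate_error = simulate()
print(f"typical single-task error: {per_task_error:.0%}")
print(f"error on the summed total: {aggregate_error:.0%}")
```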

souprock · Answer

If they were meaningful, dishonesty and self-delusion would creep in. They are thus abstract, and you use a conversion factor to get time. The conversion factor could experience inflation or deflation.

dexwiz · Answer

I've used T-shirt sizes: S, M, L, XL, and that seems to be a good gauge. Anything beyond XL needs to be broken up into multiple stories. This maps to 1, 3, 5, 8, etc or whatever your tool uses. Numbers are easier to sum than sizes, so that is what gets used in management software.
Agile tries to balance feedback to management with the inherent uncertainty in software development. Most managers are happy to hear something like "This is two big things, and 5 little things." Any analysis beyond that is an exercise in futility. They understand a small thing may take a while for a junior engineer but be a footnote in the day of a senior engineer. So time doesn't really make sense. Lines of code and checkins are both arbitrary metrics that don't necessarily reflect the work put in. I have spent days or weeks on issues that resulted in single line changes, and cranked out hundreds of lines of code in an hour. What else can you measure? Turns out there aren't very many good metrics for software engineering or other knowledge work. All you are left with is a "feeling" of "effort." This is really a judgement call that you can only make by combining personal and team experience. If you are jumping from contract to contract you likely don't have the time to calibrate to a specific team's feeling.
Any metric is subject to being gamed, at which point it's useless. By not saying "X of Y equates to Z story points" you avoid creating a gameable metric. If points are tied to a concrete value, then people will adjust for that value to try to get promotions/raises/notoriety instead of delivering a quality product. Knowing points are arbitrary makes them less appealing to game.
I struggled with the same question early in my career. As I've gone through more sprints I have taken the Whose Line Is It Anyway approach that Agile is "[a] show where everything's made up and the points don't matter". It's not completely made up, but your job is to develop software, not assign points to stories, so focus on your job and assign points based on what feels right. If you find you are often wrong when assigning points, adjust accordingly.
In the end, points are only really used for a "big picture" for management. They understand an individual story's points aren't exact. But if the team wildly swings on points sprint to sprint, then something is up. Either they are not consistently finishing work, not correctly evaluating work, or something else. So think of them less as an absolute measure, and more as a relative measurement of output sprint to sprint. That is why each team points their own work, and teams shouldn't be compared to each other based only on point value.
This is all predicated on management understanding Agile/Scrum. If a manager is too focused on points, try to direct them back to concrete Todo/Done lists, as that is what really matters. Sometimes they will try to manage via points, and this will fail everyone in the long term.

rajacombinator · Answer

It’s all a bunch of Agile Culting designed to generate overhead and keep the manager class employed.

markus_zhang · Answer

We use T-shirt size. Small is usually one sprint, medium two and large four.

schmookeeg · Answer

I used to ask this to the annoyance of everyone too. I stopped awhile ago and have learned to embrace the suck of it.
I am convinced that 'story points' have been made vague and hand-wavy, because stakeholders would take an engineer's time quote at face value. Then the quote of 'two weeks' gets passed up the chain. Then some marketer or sales person makes promises or media buys based on this quote, and then when the date slips, and the engineer was asked why, they got a jargon-laden shovel-load that was of no use to anybody.
Heads rolled shortly after. Story points were born to protect the other middle-management wastrels from a similar fate.
So now 'features' are only reported in 'release notes' that happen after a sprint. 'when is it done?' can be answered with 'this sprint or next, most likely, as long as our velocity holds during this epic' -- a rich and blameless pile of nonsense that anyone with firing authority cannot possibly act upon.
Nobody asks engineers when things will be ready ever again.
Engineers can read YC during their workday as long as that magical 'Team Velocity' stays high enough every two weeks. You can even play Dyson Sphere Program on days 1-13 and sneak a heroic bender of pull requests across the line the night before sprint end, saving the sprint, champion to all.
All forecasting can and will be expertly sandbagged by the engineering team. Nobody can accuse them of collusion because they were independent votes using scrum poker or some other web toy. We all said that one-word text change was a 2 point ticket, boy howdy, so it surely must be. We're the ones doing the work after all, and 2 points doesn't mean squat anyway in a temporal sense, so what is there to even question or argue about? FINE, TWO. next!
My advice, OP, just let it go. Embrace the nonsense of it all. Let those 'stakeholders' justify their existence and have their plausible excuses that preserve their jobs. We can work on side projects or new steam releases in peace for most of our corporate existence.
Also, some bonus tips I learned for voting:
1 is never the right answer, or everyone will need to discuss consolidation potential with other tickets, which never works, but can kill hours with pointless discussion and debate -- time far in excess of the work itself.
2 is okay, unless everyone else went 3, then you're just being a cocky prat.
3 is the default vote for all things big and small.
5 is okay, but risks elaborate "let's talk it out and pre-engineer the thing while virtue-signalling that I'm a very thought-provoking team member" sub-sessions. These are to be avoided unless you want to pre-engineer and virtue-signal the thing. Quicker and easier to vote 3 and just ask to take that ticket if you're so interested.
8 is right out, or everyone will need to discuss where and how to break the thing apart and usually you earn a second grooming meeting.
13 is a super hilarious sarcastic vote for that one-word text update, but you can only use it once, mister or miss funnylaffs. Then we roll our eyes because of the re-vote it causes.
The game is only won by getting out of grooming/sprint planning meeting as quickly as possible. That's now your entire productivity goal at Big Dumb Agile Company. Enjoy. :)
- Mike also.
