In terms of tracking it for the future, what suggestions do people have? I can backlog it as a bug, but it's not going to be easily searchable. A dev could pick it up, but without a method to reproduce, it's not easily fixed in a sprint either.
And it's hard to know whether it ever gets fixed in the future, either!
But it's also a crash, and I personally hate not documenting them, no matter how rare. But I'd like a better way to manage it.
- remove file/line#
- omit the bottom (top) frame, which can be different between environments
- convert certain constructs to a common format (e.g., "unknown module" (clang) to "???" (valgrind))
- translate "func@@GLIBC_version" => "func"
This works well enough in practice for our purposes (identifying regressions and suppressing specific reports from valgrind/Address Sanitizer).
We also maintain an xref between the MD5 hash and the full stack trace.
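For concreteness, here is a minimal Python sketch of that kind of normalization and hashing. The frame format, the regexes, and the choice to treat the last list element as the environment-dependent frame are all assumptions; adapt them to whatever your tooling (valgrind, ASan, your compiler) actually prints:

```python
import hashlib
import re

def normalize_frame(frame: str) -> str:
    """Normalize one textual frame so equivalent frames hash identically."""
    frame = re.sub(r"\S+:\d+", "", frame)            # remove file/line# (e.g. "foo.cpp:123")
    frame = frame.replace("unknown module", "???")   # clang-style wording -> valgrind-style
    frame = re.sub(r"@@GLIBC[\w.]*", "", frame)      # "func@@GLIBC_2.17" -> "func"
    return frame.strip()

def fingerprint(frames: list[str]) -> str:
    """MD5 over the normalized frames, dropping the last frame, which is
    assumed here to be the one that differs between environments."""
    kept = [normalize_frame(f) for f in frames[:-1]]
    return hashlib.md5("\n".join(kept).encode()).hexdigest()

# xref from the hash back to one full, unmodified stack trace
xref: dict[str, list[str]] = {}

def record(frames: list[str]) -> str:
    digest = fingerprint(frames)
    xref.setdefault(digest, frames)
    return digest
```

`record()` returns the digest you'd attach to the report; the `xref` dict here stands in for whatever persistent store you actually keep the hash-to-trace mapping in.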
I send errors to Sentry.io when the error contains a novel stacktrace for the user and the user hasn't disabled error reporting. I also send recent log messages and some other info, like the OS and hardware architecture (so I can reproduce it on my end). [1]
PhotoStructure uses a SHA of the stacktrace to discriminate between different errors. This certainly can group different problems together, but in practice those problems are related.
Only sending novel stacktraces keeps a single user from clogging up my Sentry dashboard, and from wasting my users' bandwidth. PhotoStructure imports huge libraries, and before I added this squelching, a single user could send tens of thousands of reports (when the "error" turned out to be an ignorable warning caused by their camera writing metadata that was malformed but still parseable).
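This isn't PhotoStructure's actual code, but the client-side squelching could look roughly like this sketch; the state-file path and `send_to_sentry` are placeholders:

```python
import hashlib
import json
from pathlib import Path

SEEN_PATH = Path.home() / ".myapp" / "reported-traces.json"   # hypothetical state file

def send_to_sentry(stacktrace: str, context: dict) -> None:
    # Placeholder: the real app would call the Sentry SDK here
    # (e.g. sentry_sdk.capture_exception) with the extra context attached.
    print("would report:", context, stacktrace.splitlines()[0])

def is_novel(stacktrace: str) -> bool:
    """True if this install hasn't already reported an error with this stacktrace."""
    digest = hashlib.sha256(stacktrace.encode()).hexdigest()
    seen = set(json.loads(SEEN_PATH.read_text())) if SEEN_PATH.exists() else set()
    if digest in seen:
        return False
    seen.add(digest)
    SEEN_PATH.parent.mkdir(parents=True, exist_ok=True)
    SEEN_PATH.write_text(json.dumps(sorted(seen)))
    return True

def maybe_report(stacktrace: str, context: dict, reporting_enabled: bool) -> None:
    """Only send when the user has opted in and the trace is novel for this install."""
    if reporting_enabled and is_novel(stacktrace):
        send_to_sentry(stacktrace, context)
```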
If you're building a SaaS, and you own the hardware your software is running on, just send all errors to Sentry.
Sentry does a good job in helping me triage new errors, marking when errors crop back up, and highlighting which build seems to have introduced a novel error.
Keep in mind that the stacktrace may not be relevant if that section of code or the upstream code is modified. I use automatic upgrading on all platforms to keep things consistent.
Highly imperfect of course, and it created separate entries for some exceptions that included random numbers in their message. But it did put pressure on people to clean them up.
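One way to keep such messages from fanning out into separate entries is to mask the volatile numbers before hashing. This is just a guess at a fix, not what that system actually did:

```python
import hashlib
import re

def exception_fingerprint(exc_type: str, message: str) -> str:
    """Hash the exception type plus a digit-masked message, so e.g.
    "timeout after 5012 ms" and "timeout after 4987 ms" collapse together."""
    masked = re.sub(r"\d+", "#", message)
    return hashlib.md5(f"{exc_type}:{masked}".encode()).hexdigest()
```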
> I can backlog it as a bug, but it's not going to be easily searchable.
In my experience, I've marked it as a bug, commented with the stack trace, and marked it as U. Then when it arises again, hopefully someone searches for part of the stack trace and gets lucky, or, more often than not, I (or others) will hear of the crash and relay the bug info. The bug is updated with any new info and life continues until it crashes again... Not perfect by any means. I'd love to hear how others deal with this.
You can email me if this interests you (email is in my bio)