It's a git repo and everyone can pull/push directly to master.
I skim the commits once a week.
It's basically plain text files. But the md extension triggers some nice eye candy in vim and other browsers.
I think we will keep this structure forever. Maybe we will (additionally) serve the files over HTTP at some point. Maybe we'll even add edit / search / push functions over HTTP, but for now I have not planned that.
I have seen CMS come and go. And I'm tired of it. Text files are forever.
Postmortems are important to drive home what went wrong, but newcomers won’t read them all.
That’s why you need people with experience and people with tenure.
What companies sometimes do: they encode lessons into rules, which tend to survive turnover. But that comes with its own set of problems, where you end up with a lot of rules whose reason is forgotten.
An example from a previous company (not the one I'm at now): I used to work for a startup building a mobile phone network (technically, a Mobile Virtual Network Operator, MVNO, if you care about those distinctions - the point is that phone calls went through our infrastructure). In the process of changing someone's account (porting in a phone number, changing a plan, or something like that), it was very easy for the different IDs from different systems (phone number, phone's serial number, SIM's serial number, billing system ID number) to get out of sync.
So, we could have documented how to avoid this, and the document would have sat there with no one reading it. Instead we created a nightly job that went through all our accounts and verified that ID numbers were "as expected." It would output to a slack channel with whatever breakages occurred for us to look into the next day. This program also served as documentation - I could look at it to understand what IDs should match to which systems.
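For what it's worth, a checker like that can be pretty small. Here's a rough sketch of the idea in Python - the table names, schema and the Slack webhook URL are all invented for illustration, not what we actually ran:

    # Hypothetical nightly consistency check: schema, query and webhook are made up.
    import sqlite3        # stand-in for the real provisioning/billing databases
    import requests

    SLACK_WEBHOOK = "https://hooks.slack.com/services/EXAMPLE/EXAMPLE/EXAMPLE"

    def find_mismatched_accounts(conn):
        """Accounts whose phone number disagrees between the SIM and billing tables."""
        return conn.execute("""
            SELECT a.account_id
            FROM accounts a
            LEFT JOIN sims s    ON s.account_id = a.account_id
            LEFT JOIN billing b ON b.account_id = a.account_id
            WHERE s.msisdn IS NULL
               OR b.msisdn IS NULL
               OR s.msisdn <> a.phone_number
               OR b.msisdn <> a.phone_number
        """).fetchall()

    def main():
        conn = sqlite3.connect("accounts.db")
        broken = find_mismatched_accounts(conn)
        if broken:
            ids = ", ".join(str(row[0]) for row in broken)
            requests.post(SLACK_WEBHOOK, json={"text": f"ID mismatch on accounts: {ids}"})

    if __name__ == "__main__":
        main()

The documentation side effect falls out for free: the query itself records which IDs are supposed to match across which systems.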
My current employer follows the same sort of learn-by-building-mechanisms approach, but at a much larger scale.
February 2nd: the day I wiped out a production database because the Ansible playbooks had hardcoded settings. Since then we use settings repositories and confirmation dialogs when playbooks run against production.
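For anyone wondering what that gate looks like, here is a minimal sketch as a Python wrapper around ansible-playbook. The inventory naming convention and the prompt wording are assumptions, and Ansible's built-in vars_prompt can serve the same purpose:

    # Hypothetical confirmation gate before running a playbook against production.
    import subprocess
    import sys

    def run_playbook(playbook: str, inventory: str) -> None:
        # Assumption: production inventories have "prod" in their file name.
        if "prod" in inventory:
            answer = input(f"About to run {playbook} against {inventory}. Type 'yes' to continue: ")
            if answer.strip().lower() != "yes":
                sys.exit("Aborted.")
        subprocess.run(["ansible-playbook", "-i", inventory, playbook], check=True)

    if __name__ == "__main__":
        run_playbook(sys.argv[1], sys.argv[2])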
September 2nd: the day we realised we were unable to restore old backups because the media paths were not tied to the data, so when we moved all the data between servers we lost all users' images forever. Since then images are prefixed with the ID of the DB record they belong to; later we added S3 metadata for extra fields like user_id, object_id, company_id, etc., so we keep the URLs clean.
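To illustrate the convention (bucket and field names are made up; this is just the shape of it):

    # Hypothetical upload helper: the key is prefixed with the DB record id so a
    # restore can always map an object back to its row; extra context lives in
    # S3 object metadata instead of the URL.
    import boto3

    s3 = boto3.client("s3")

    def upload_image(record_id: int, filename: str, data: bytes,
                     user_id: int, company_id: int) -> str:
        key = f"{record_id}/{filename}"
        s3.put_object(
            Bucket="example-images",
            Key=key,
            Body=data,
            Metadata={
                "user_id": str(user_id),
                "company_id": str(company_id),
            },
        )
        return key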
September 10th. Inbox carnival: we had a small hack that added users to BCC to send the newsletter. Over time users started receiving each email 2 times, then 3, then 4, then 10, then 2 again. It was a threading issue where the BCC variable was effectively global in certain cases and got appended to instead of being rebuilt each time ... 2 full weeks into that. Python 3 and type annotations were the way to fix it.
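For readers who haven't been bitten by this one, a minimal reconstruction of that class of bug (the real code was different; this only illustrates the shared-list problem):

    # Toy reconstruction of the duplicate-BCC bug: a module-level list shared
    # across sends/threads, appended to instead of rebuilt each time.
    bcc: list[str] = []                       # shared mutable state

    def send_newsletter_buggy(subscribers: list[str]) -> list[str]:
        bcc.extend(subscribers)               # bug: recipients accumulate across calls
        return list(bcc)

    def send_newsletter_fixed(subscribers: list[str]) -> list[str]:
        local_bcc: list[str] = list(subscribers)   # fresh list per send
        return local_bcc

    print(len(send_newsletter_buggy(["a@x.com"])))   # 1
    print(len(send_newsletter_buggy(["a@x.com"])))   # 2 - the same address twice
    print(len(send_newsletter_fixed(["a@x.com"])))   # 1, every time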
Sharing knowledge isn't just a matter of tooling, but a matter of principle. Because we don't want the knowledge we share to just float out there as a "lesson" - we want people to use the lessons and act differently - what we're actually talking about here is governance (i.e. law!). This might sound heavy. It's not. It's just a change in orientation from "I'm passively sharing this lesson we learned" to "this lesson we learned changes the way we act."
There are 3 things to do to shift from "lessons learned" to working agreements:
1. Capture knowledge in the pattern "when this happens, our team will act this way"
2. Have a workflow for formally adopting a working agreement – could be a majority vote, consensus vote, etc.
3. Keep that knowledge someplace the team can browse, search, and update (e.g. Confluence, Notion, Google Drive, etc.)
If you do this, something magical happens: you'll begin to evolve your knowledge over time.
Have a working agreement that didn't quite cover a corner case? Update it! Have a working agreement that was too restrictive? Nuke it!
It's no smaller a shift in magnitude than when humanity switched from oral tradition to the written word. And guess what? The written word works much better when you're operating remotely.
Our remote team has been operating this way for nearly 5 years at Parabol. It's a common pattern that at the end of every retrospective we have a new working agreement we'd like to adopt. We've even come up with a Slack-based async workflow for adopting them: https://www.parabol.co/blog/async-decision-making-slack
One of the reasons why some people become valuable in long-tenure positions is because of the lessons learned. At a certain point, no one is going to read through every page in the wiki / archive / man pages / whatever is popular this year.
That's where onboarding and process come in: Management needs to make sure that lessons drive improving the process, that newcomers are onboarded with lessons from the past, and that everyone continues to follow the processes.
Now, jokes aside, in my company, the new owners decided they didn't like the people we were outsourcing with, and decided to replace them with their own outsourcing center. Now everyone's re-learning lessons that are probably tracked in our various wikis, repositories, etc. But the newcomers want to run things their own way.
That's why a few long-tenured people are important.
Generally, I don't think efforts to accumulate institutional knowledge on a website bear fruit - no one really wants to update the website, both because it's thankless and because of access time. It is much faster to tap the institutional knowledge in management by sending an email than by paging through the results of a search. For written institutional knowledge to have real value, the access time has to be small, which means someone has to take real care in curating the knowledge so it's easily accessed. Finally, we have the Brian problem. Brian was the person most likely to update our internal websites - unfortunately Brian wasn't very good and had some poor ideas regarding lessons learned - by adding them to the websites, his bad ideas were passed on to younger team members who didn't know better.
It felt good to know that stuff was neatly documented somewhere, but since no one ever knew where that was, it was of little value and few ever read it. People still tapped on shoulders and repeated the same mistakes.
It baffles me that an established company like Atlassian can’t get something as fundamental as search right. I can’t even find the content I myself created at times.
We have since switched to Nuclino (https://www.nuclino.com) and so far are having a better experience. It's not as feature-packed, but the basics work as expected and it's a lot more user-friendly.
Re-establishing a proper documentation culture in the team is still a challenge, but that’s not something a tool can solve.
Wikis - we found wikis are too heavyweight and formal to be used consistently for recording learnings.
Slack - in our experience, Slack makes capturing learnings easy but organizing and keeping track of learnings difficult.
Our goal with BB is to make recording a learning as convenient as writing a Slack message AND to make organizing and keeping track of these learnings similarly easy.
You can write bytes directly in Bytebase or save them from Slack.
Would love any feedback or ideas. Email me (cara@bytebase.io) to get access to the closed beta with HN in the subject line.
1. After every system failure, we email the entire org a postmortem Google Doc describing what went wrong, why, and what we are doing to prevent it from happening again. Postmortems also live in their own JIRA project.
2. We are diligently linking our issues into a hypertext mesh: what is related to what, what was blocked by what, what was decomposed from what. We are using milestones and epics.
3. All the commits in all the codebases are linked to issues. There is a rule on the server side that forbids pushing unlinked commits. There is a single exception for firefighting code changes, when there is no time to write a ticket first, but those commits are marked with a special sign, and the author must create an issue and link it to the commit once the problem is solved. (A sketch of such a server-side check appears after this list.)
4. Documentation and API specs live in the same or adjacent repos and change according to the same rules.
5. So, every line of code is linked to the corresponding issue via its commit, and then to other issues and commits in other repos via the hypertext mesh. When your code is clean, it is mostly self-documenting and becomes knowledge itself.
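Regarding point 3, here's a rough sketch of what such a server-side check can look like as a pre-receive hook in Python. The issue-key pattern and the HOTFIX escape hatch are invented for illustration:

    #!/usr/bin/env python3
    # Hypothetical pre-receive hook: reject pushes containing commits whose
    # messages reference no issue key (e.g. PROJ-123), unless marked as a
    # firefighting commit.
    import re
    import subprocess
    import sys

    ISSUE_RE = re.compile(r"\b[A-Z]+-\d+\b")   # assumed issue-key format
    ESCAPE_HATCH = "HOTFIX"                    # assumed marker for firefighting commits

    def check_range(old: str, new: str) -> int:
        range_spec = new if set(old) == {"0"} else f"{old}..{new}"
        revs = subprocess.run(["git", "rev-list", range_spec],
                              capture_output=True, text=True, check=True).stdout.split()
        for rev in revs:
            msg = subprocess.run(["git", "log", "-1", "--format=%B", rev],
                                 capture_output=True, text=True, check=True).stdout
            if not ISSUE_RE.search(msg) and ESCAPE_HATCH not in msg:
                print(f"rejected: commit {rev[:10]} has no linked issue")
                return 1
        return 0

    if __name__ == "__main__":
        # a pre-receive hook reads "<old-sha> <new-sha> <ref>" lines on stdin
        status = 0
        for line in sys.stdin:
            old, new, _ref = line.split()
            if set(new) == {"0"}:              # branch deletion, nothing to check
                continue
            status |= check_range(old, new)
        sys.exit(status)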
If you can't be bothered to put together documentation as you build the software (sometimes I can't be bothered), you should at least make sure to document as you troubleshoot later so you don't keep making the same mistake. We store these as "Flight Rules" for our application (or error signatures, etc - whatever you want to call them) which provides the team a single location to start their search when things go belly up.
That way, when you run your post-mortem (you run these, right?) you have a place to store the error notes which eventually builds up into a really useful document.
Lastly, I'd say having a team norm that when one person does something the others should also be able to test it (and therefore have the right instructions to do so) is a good one for continuity.
EDIT: Couple of other things that have worked:
- Checked-in ".dot" GraphViz context diagrams alongside your repos are nice and easy to update.
- Creating decision documents for a quick run-through of options with your technical team is a great way to run an effective process while also creating a searchable artifact for later, which is great for context / lessons learned.
The basic idea is to document decisions with a specific structure and keep them close to the code. The thing is, any time we can answer "why", it's a form of decision that can be documented somehow. Since it's close to the code, any search made while coding will also land on those decisions if the same terms are used.
There are several tools to help with that as presented here [2] and here [3].
[1] https://www.thoughtworks.com/radar/techniques/lightweight-ar...
[2] https://adr.github.io/
[3] https://github.com/joelparkerhenderson/architecture_decision...
https://risk-engineering.org/learning-incidents-accidents/
1. For large issues visible to customers an incident report is shared inside the company. These are written for general consumption and so lack any technically interesting aspects (they're "dumbed-down" a lot).
2. Technical "lessons learned" are curated in a Sphinx based documentation website that I started but which is starting to see more and more contributions from other tech heads in the company.
We used to have a wiki but it ossified after years of no contributions. Personally I didn't like the MoinMoin wiki engine that much but this is just personal taste of course. I started setting up the Sphinx site to encourage knowledge retention despite turnover - I kept explaining the same things again and again. Now I just share a URL when such questions come up :-).
1. General business lessons, which companies generally don't summarise or track.
2. DevOps outage post-mortems, which competent companies generally have some sort of process around.
I've never seen a rigorous post-mortem culture in tech outside of DevOps/SRE.
I guess there are a few reasons. One is probably that the DevOps/SRE space is very amenable to encoding lessons learned in scripts of various kinds, so it's actually useful to do a post-mortem exercise because the outcomes are very small, very concrete and will be somewhat actionable. Things like "errors in parsing this file shouldn't cause the server to blow up" are easily corrected and a process (unit test) put in place to formally encode that institutional knowledge.
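As a toy example of what that encoding looks like (the parser and names are invented; the point is just that the post-mortem outcome lives in the test suite rather than a wiki):

    # A "lesson learned" encoded as a test: malformed config must raise a clean
    # error instead of taking the server down. Parser and names are hypothetical.
    import pytest

    class ConfigError(Exception):
        """Raised instead of letting parse errors propagate and kill the server."""

    def load_config(text: str) -> dict:
        try:
            return dict(line.split("=", 1) for line in text.splitlines() if line)
        except ValueError as exc:
            raise ConfigError(str(exc)) from exc

    def test_malformed_config_raises_config_error_not_crash():
        with pytest.raises(ConfigError):
            load_config("this line has no equals sign")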
In regular software development there's way less reflection. This is partly because the tooling is much less home grown and malleable. Lessons are learned and they are encoded, but it happens slowly and through the mechanism of library and language design. It's generally not something you do within a single company but rather, it's an emergent consensus across the whole industry. Additionally, this is harder because lessons learned are often ambiguous or subjective. For instance I learned the lesson, many years ago, that dynamic typing leads to more mistakes than static typing. But you see many programmers still who prefer dynamically typed languages and dispute this sort of conclusion.
In the business world there's virtually never any kind of "lessons learned" repository or process. At most you get something like a formalised interview process, but even then, those are usually baked into a company from day one or never adopted at all. I've heard of very few large companies that adopted a more rigorous approach to hiring than the one they previously used. It does happen but it's rare.
At the executive/CEO level lessons learned get recorded in the form of strategy talks given at fancy conferences, if at all. Often abstracted or vague to the point of uselessness, any insight that is present gets forgotten immediately by the audience who are mostly there because it's easier than doing real work. These lessons learned are things like "innovation is key to the customer experience", which is a genuine learning in a sense (usually from observing the wreckage of firms that went up against a competent tech company). But it's not really useful in the sense of being actionable by normal employees.
The things I have learnt are:
- companies don't learn, people do
- having an internal wiki/kb helps a lot IF it's structured/indexed well enough that you can actually find information
- in an ideal world, no project should be considered done if documentation is not written
It's all about people. People learn and can recall lessons learned.
Otherwise you rely on the good faith and will of the next person to actually go through all the documentation that has been left by the people before. This person might not have the will, or they might not have the time.
The negative side of integration is the person with knowledge integrates it in their practice and ‘forgets’ it is new knowledge for everyone else, thus not propagating the new lesson learned.
Since the Telegram bot's messages are processed by my coordinator website, almost every message sent by any user to the bot is saved in the site's database. There are 2 types of options, public and private. A public message is saved in the DB and shared with all of the bot's users, with or without further feedback from other bot users. A private message is saved in the DB and visible only to the specified users.
I think it's very versatile for reports, lessons learned, etc.
Unfortunately, folks don't really read the docs (and I've learned from this thread that we're not alone ;)
Been thinking about this problem and thought to embed something like a quiz in the docs to make them interactive, yet still static - something like howtographql.com. Yet to try that approach, though.
I don't have a better answer yet other than making it a personal point of pride that my docs are always up to date and well-organized.
They did make an effort to standardize everything, but nobody seems to care.
https://github.com/pragmatismo-io/pragmatismo-io-framework/t... (Currently available in Portuguese)
They have pretty good procedures for keeping track of lessons learned. The book (https://landing.google.com/sre/sre-book/toc/) goes into some detail.
The dev team overall, across the globe: I find retrospectives after a sprint cycle really good actually; they're a good place to call out where improvements can be made too.
On a personal level: When my mess up/mistake causes grief for someone else, I make damn sure I learn from that.
Otherwise, it's mostly Confluence now, but no specific page of lessons learned, instead, those lessons are dotted around in individual documentation pages.
Non-written education is probably the most effective way to communicate and maintain important information. Written material leaves it up to the authors' ability to know where and how to communicate...
Of course nobody takes any notice.
And if there are significant failures in policy, you can just put a note after the revised policy that says "we tried X, it didn't work because of Y."
Is there any intersection between "keeping track of lessons learned" and "agile methodologies" or are the two completely unrelated/orthogonal ?
+ for being OSS
- anything written down and not enforced is almost the same as nothing.
IMO if the full answer to a question doesn't exist in a single brain, you'll have a hard time reconstructing what really happened without a Challenger-level investigation.
companies only solve problems temporarily
I'm leaving soon.