How do you maintain personal annotations for code you don't control?

Question

I spend significant time reading and understanding codebases that I don't control (open source libraries, internal legacy systems, etc.). As I build understanding, I need to document my insights, gotchas, and mental models - but these notes are purely personal and shouldn't be part of the actual codebase.

My challenges:

1. These annotations need to be tightly coupled with specific locations in the source code (particular functions, variables, or even specific lines)
2. The underlying code changes regularly (new versions, updates from maintainers) which can break the connection between my notes and the code
3. My notes are private - they include half-formed thoughts, questions, and sometimes critical observations that wouldn't be appropriate as public comments
4. I want to preserve this knowledge across different machines and working environments

I've tried various approaches:

- Local IDE bookmarks (lost between sessions)
- Separate markdown files (hard to maintain precise code references)
- Private forks with comments (becomes unmaintainable as source evolves)

I'm curious how others solve this problem. Do you have a systematic approach for maintaining personal annotations on code that's not under your control? How do you handle the challenge of the code evolving while keeping your notes relevant?

Would especially love to hear from people working with large codebases or those who regularly need to dive deep into external dependencies.

thunderbong · Accepted Answer

I recently came across a VS Code extension that does pretty much what you're looking for -
> Out-of-Code Insights is a Visual Studio Code extension that allows you to add annotations, notes, and comments without modifying your source files.
https://marketplace.visualstudio.com/items?itemName=JacquesG...
GitHub:
https://github.com/JacquesGariepy/out-of-code-insights/

JoshTriplett · Answer

This is the first time I've ever heard of someone keeping private source-line-attached notes in a codebase. I work with very large codebases, but if I discover things about the codebase that required spelunking, I generally turn them into comments or documentation.
Of the requirements that you've laid out, I'd suggest that you need to either relax requirement 2 or 3:
If you relax requirement 2, you could keep your notes in a private fork.
If you relax requirement 3, and make your notes suitable for public consumption, you could submit your notes as comments and make the codebase easier for everyone to understand. (Or, at least, you could submit some of your comments, making the remainder easier to maintain privately.)

IshKebab · Answer

The closest I've come to doing something like this is commenting poorly-commented code, and keeping my in-progress comments in a branch that I regularly rebase.
You said that becomes unmaintainable as the source evolves, but that's surely a fundamental property of keeping notes on changing code? You have to do work keeping your private comments up to date with any method.

diggan · Answer

If you're working within git, maybe `git notes` fits your use case? You can basically attach notes to various Git objects, without changing the objects themselves.
https://git-scm.com/docs/git-notes

rini17 · Answer

Leo editor lets you keep its outline, which combines your annotations and external files, in sync with those files.
Obviously it isn't bulletproof and needs maintenance when it can't merge external changes automatically.
https://leo-editor.github.io/leo-editor/

williamstein · Answer

Coincidentally, I'm in the middle [1] of building something for https://CoCalc.com that is exactly what you're describing. For collaborative document editing (e.g., google drive and overleaf) it's a common feature, but for code editors it isn't. CoCalc is both. Anyway, nothing to see yet, but you might want to check with us in a month. After thinking about this problem a lot recently, I think it's critical to store the comment locations with all versions of the file, so you don't lose comment locations, or at least maximize the information you have available to locate comments when they get lost.
[1] https://github.com/sagemathinc/cocalc/pull/8071
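
One way to picture "locating comments when they get lost" is to remap a stored line number from the version it was written against to a newer version by diffing the two. The sketch below is only an illustration using Python's standard `difflib`; it is not how CoCalc does it, and `remap_line` and the sample contents are hypothetical.

```python
import difflib
from typing import List, Optional


def remap_line(old: List[str], new: List[str], old_line: int) -> Optional[int]:
    """Map a 1-based line number in `old` to its 1-based position in `new`.

    Returns None when the line was changed or removed, which is the case
    where a note has to be re-attached by hand.
    """
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    idx = old_line - 1
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only lines inside an unchanged ("equal") block can be mapped safely.
        if tag == "equal" and i1 <= idx < i2:
            return j1 + (idx - i1) + 1
    return None


# Hypothetical file contents: the version a note was written against,
# and a later version with one line inserted and one line edited.
old_version = ["a", "b", "c"]
new_version = ["x", "a", "b2", "c"]

moved = remap_line(old_version, new_version, 1)    # line "a" shifted down
edited = remap_line(old_version, new_version, 2)   # line "b" was rewritten
```

Keeping the old version around is what makes this possible: without it there is nothing to diff against, so the stored line number is all that is left.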

zffr · Answer

I maintain a branch with my comments inline.
If the underlying code changes, I just update my comments.

layer8 · Answer

> Separate markdown files (hard to maintain precise code references)
That shouldn’t be difficult. Most code repository systems support links to exact line numbers in specific commits, for example like [0]. Even in the event that the links stop working, you can still identify the commit hash, file name and line number from the URL.
[0] https://github.com/curl/curl/blob/3b057d4b7a7e6b811245fd0312...
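
As a rough illustration of recovering the commit hash, file name and line number from such a link, here is a small Python sketch. It assumes the common GitHub `blob/<commit>/<path>#L<start>-L<end>` layout; the `CodeRef` type, the `parse_permalink` helper, and the sample URL are all made up for the example (the curl link above is truncated, so it is not used).

```python
import re
from typing import NamedTuple, Optional


class CodeRef(NamedTuple):
    owner: str
    repo: str
    commit: str
    path: str
    start_line: int
    end_line: int


# Assumed permalink shape:
# https://github.com/<owner>/<repo>/blob/<commit>/<path>#L<start>[-L<end>]
_PERMALINK = re.compile(
    r"https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/blob/"
    r"(?P<commit>[0-9a-f]{7,40})/(?P<path>[^#]+)"
    r"#L(?P<start>\d+)(?:-L(?P<end>\d+))?$"
)


def parse_permalink(url: str) -> Optional[CodeRef]:
    """Split a commit-anchored link into its parts, or return None if it does not match."""
    m = _PERMALINK.match(url)
    if m is None:
        return None
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start
    return CodeRef(m["owner"], m["repo"], m["commit"], m["path"], start, end)


# Hypothetical URL, for illustration only.
ref = parse_permalink(
    "https://github.com/example/project/blob/0123abc/src/main.c#L10-L12"
)
```

With the parts in hand, a note stays usable even if the hosting site goes away: the commit, path and line range are enough to find the code in any local clone.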

dqv · Answer

I don't anymore and when I did the code didn't change much. But I haven't seen anyone mentioning processing the AST. Some things would break between changes, but if the language the code uses has a good AST traversal library, you could assign your notes to parts of the tree rather than source code locations, falling back to source code locations when that fails. It would still need manual maintenance, but would at least be less fragile than using solely line locations.
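
A minimal sketch of that idea, assuming the annotated code is Python and using the standard `ast` module: a note is keyed by a dotted name such as `Cache.get`, and the stored line number is only used when the name can no longer be found. The helper names and the sample source are hypothetical.

```python
import ast
from typing import Optional, Tuple

_DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def find_definition(source: str, dotted_name: str) -> Optional[int]:
    """Return the 1-based line of a class/function given a name like 'Cache.get'."""
    body = ast.parse(source).body
    node = None
    for part in dotted_name.split("."):
        # Look for the next name component among the current scope's definitions.
        node = next((n for n in body if isinstance(n, _DEFS) and n.name == part), None)
        if node is None:
            return None
        body = node.body
    return node.lineno if node is not None else None


def locate_note(source: str, dotted_name: str, fallback_line: int) -> Tuple[int, bool]:
    """Prefer the tree location; fall back to the stored line number.

    The boolean says whether the AST lookup succeeded, so a stale
    fallback can be flagged for manual review.
    """
    line = find_definition(source, dotted_name)
    if line is not None:
        return line, True
    return fallback_line, False


# Hypothetical before/after versions of an upstream file.
OLD = "class Cache:\n    def get(self, key):\n        return None\n"
NEW = (
    "import os\n\n\nclass Cache:\n    limit = 10\n\n"
    "    def get(self, key):\n        return None\n"
)

before = locate_note(OLD, "Cache.get", 2)
after = locate_note(NEW, "Cache.get", 2)
missing = locate_note(NEW, "Cache.put", 2)
```

The note follows `Cache.get` as lines are added above it, but a rename upstream still breaks the link, which is the manual maintenance mentioned above.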

dragon96 · Answer

I keep comments committed in a separate branch.
The lack of syncing doesn't bother me, because the purpose of taking notes always falls into one of these categories:
1. I read the code to get an idea of how something works. The code is there to make examples/variable names concrete, but I don't need to know the exact implementation.
If the notes need to sit in the code, usually that's because the answer spans multiple methods (eg "what does an e2e request look like?"). A set of comments on outdated code is always good enough for me.
Otherwise, a lot of times the answer can be summarized in one line (eg "where is the state tracked?" -> in FooBarClass). These can go into personal notes.
2. I need to know the implementation and it is complex and hard to follow.
If I need to know the implementation, either it is because I'm actively working on it, or I need to make [complex idea] more concrete in my head.
If it's the former, usually I'll have memorized it by the time I read through it.
If it's the latter, by the end of it I'll have gotten the main idea and it's fine to forget the implementation details.

terminalbraid · Answer

I do my absolute best to write code that does not require many or any comments or annotations because of the pain points described. I assume you're not referring to things like documenting "infrastructure" or "overall design" or "how to get started" as they don't change much and I just put those in a readme in the repo. For the nuts and bolts itself, this involves
Carefully naming variables and classes in obvious and consistent ways. I will spend time refactoring code so that it is named consistently and behaves as named.
Very small functions and classes (but not smaller than they need to be). This lets me use more named functions which gives me more description. It also typically gives me a nice hierarchy of how things occur, so whatever main "driver" function I have is pretty declarative and light on logic. It avoids big "god" functions or classes which tend to get cluttered and are often the hardest to break down or read.
Enforce obvious and established patterns. These again go in names, but if I'm using CQRS, then I'll have lots of CQRS, handler, registrar, etc in the names. If I have a factory it has Factory in the name. When you see these you know what and how things are organized.
Related to the above, no "clever" code and no inconsistent code. I'll write more "inefficient" code if it's not a bottleneck rather than something tight which was a premature optimization. If it's not normal for the established patterns, but could be forged into something consistent, I do the latter.
Lots and lots of tests. Tests describe behavior which tends to be pretty immutable OR if I have a requirement on behavior change, the test will fail at some point and needs to be reconsidered so gets my renaming attention. That last part is very important. Most testing frameworks let you add plain language names/failure conditions, so if the behavior has changed the test starts going red and it doesn't let you forget about it. Those often become my documentation/annotations.
I will use comments when I've written something that needs to be structured outside of the above. These tend to be rare and typically pretty dense "black box" places, like when I've implemented a numerical or other very specific algorithm. As such they don't tend to get touched very often and I will write unit tests to make sure behavior is enforced.

dpats · Answer

I used codestream with two of my previous teams and absolutely loved it. I don't remember if you can keep annotations private but I see plenty of value in allowing the rest of the team to see what questions/notes you have. In any case, I believe they open sourced the whole thing so you could see how they handled code changes.

simonw · Answer

GitHub issue comments. You can link to code in GitHub that's anchored to a specific commit. If it's in the same repo GitHub will inline the code into the issue for you. For separate repos I sometimes link and then manually copy in the code block myself.

karmakaze · Answer

I use Sublime Text and put my notes in a file. I don't use file/line references but rather name the thing I'm noting (e.g. class/method/variable). Other times I'll use a commit and a literal string as a (nearly) unique reference.
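
A small Python sketch of the "commit plus literal string" style of reference: each note stores a string expected to occur exactly once in the file, and a lookup reports whether it is still unique, has become ambiguous, or has disappeared. The `Note` type, the helper names, the commit value and the sample source are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    anchor: str  # literal string expected to be (nearly) unique in the file
    commit: str  # commit at which the note was written (made-up value below)
    text: str


def find_anchor(source: str, anchor: str) -> List[int]:
    """Return the 1-based line numbers on which the literal string occurs."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if anchor in line]


def describe(note: Note, source: str) -> str:
    """Say where the note currently points, or why it cannot be placed."""
    hits = find_anchor(source, note.anchor)
    if not hits:
        return f"anchor missing since {note.commit}: {note.text}"
    if len(hits) > 1:
        return f"anchor ambiguous at lines {hits}: {note.text}"
    return f"line {hits[0]}: {note.text}"


# Hypothetical source file and note.
SOURCE = (
    "def connect(host):\n"
    "    timeout = DEFAULT_TIMEOUT * 2\n"
    "    return open_socket(host, timeout)\n"
)
note = Note(anchor="DEFAULT_TIMEOUT * 2", commit="abc1234", text="why doubled?")
status = describe(note, SOURCE)
```

When the anchor goes missing, the recorded commit tells you which revision to check out to see what the note originally referred to.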

frou_dh · Answer

> Private forks with comments (becomes unmaintainable as source evolves)
If swdev-grade merging tools are not sufficient to get it done then that's probably a bad sign for your requirements being possible to be met

coldtea · Answer

> 1. These annotations need to be tightly coupled with specific locations in the source code (particular functions, variables, or even specific lines)
> 2. The underlying code changes regularly (new versions, updates from maintainers) which can break the connection between my notes and the code
Maybe depend on more loosely coupled notes?
You say they "need", but realistically they don't really need "to be tightly coupled with specific locations in the source code", that's just a nice to have.

ronald_petty · Answer

I would like to see (better) solutions not only for source code, but general web-pages and applications. For example, bookmarks in a browser are ok, but it would be a lot better if you could easily annotate and later reference / rank / prioritize. A browser is a pretty good proxy to the world's knowledge including source code. It would be nice if they would level up in these regards.
There are tools for aspects of all these areas, but it still feels unsolved (easy, feature-full).

smitelli · Answer

I'd try to drop requirement 3. Any insights made could be beneficial to somebody else working on the code (especially in closed-source environments only touched by people employed by your organization).
Re: critical tone, instead of saying "this is a useless garbage fire" maybe something like "it is not yet apparent how this interacts with blah blah." There's always a way to phrase it where it'll plant the seeds of how you want the reader to feel about it without being overt.
My 2c, anyway.

jupenur · Answer

The weAudit VSCode extension [1] works pretty well. It's designed for security work, but there's no reason why you couldn't use it for general note-keeping.
[1] https://blog.trailofbits.com/2024/03/19/read-code-like-a-pro...

hprotagonist · Answer

https://github.com/nobiot/org-remark
handy, if you're in the emacs ecosystem.

ZedZark · Answer

Maybe a combination of private fork with comments and separate markdown files with notes (maybe in the same private fork).
Consider using special "symbols" in comments like "MYDOCS_XXX" that you search for in your modified version of the code base, and refer to in other places. These will survive renames of function names etc by the upstream authors.
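
A rough Python sketch of the search step: scan the annotated sources for `MYDOCS_`-style markers and build an index from marker to file and line, so separate notes can refer to a marker instead of a location. The `index_markers` helper, the marker name `MYDOCS_RETRY`, and the file contents are hypothetical; files are passed in as a dict to keep the example self-contained.

```python
import re
from typing import Dict, Tuple

# Assumed marker convention: "MYDOCS_" followed by word characters.
MARKER = re.compile(r"\bMYDOCS_\w+")


def index_markers(files: Dict[str, str]) -> Dict[str, Tuple[str, int]]:
    """Map each marker to the (path, 1-based line) where it was last seen."""
    index: Dict[str, Tuple[str, int]] = {}
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for marker in MARKER.findall(line):
                index[marker] = (path, lineno)
    return index


# Hypothetical fork contents before and after an upstream rename.
before = {"net/client.py": "def fetch(url):\n    # MYDOCS_RETRY\n    return _get(url)\n"}
after = {"net/client.py": "def download(url):\n\n    # MYDOCS_RETRY\n    return _get(url)\n"}

old_index = index_markers(before)
new_index = index_markers(after)
```

The function was renamed and the comment moved down a line, yet a note that refers to `MYDOCS_RETRY` still resolves, as long as the marker comment survives the merge.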

JohnBooty · Answer

A lot of times I just take notes in a personal fork, but that's imperfect for all of the obvious reasons.
I also take notes in my notes app. This obviously is imperfect too, but the codebases I work on aren't typically churning so much that these notes become out of date too quickly.

anthonymelchett · Answer

Check out https://cartograph.app/

1oooqooq · Answer

You describe a nightmare, for which the only solution is to keep one single commit with all the comments on a branch.
Update and rebase the branch. Solve conflicts if they changed code around the comments. With anything else you will just be delaying this exact same chore and possibly making it impossible down the road.

kentich · Answer

I make a mind map in FreeMind with method/property names and even pieces of code as nodes.

billsix · Answer

I teach a class on computer graphics, where I want to embed my working source code into my web based explanations, so perhaps the following could help you

I have my source code in one directory, and in another I use Sphinx to make the documentation. In the documentation, I reference certain sections of code, which you can do by line number, or you can do by some pattern to begin and end.

Since I control all my source code, I put in comments with certain flags for regions of code.

I can then reference said section of code as follows

  .. literalinclude:: ../../src/demo06/demo.py
     :language: python
     :start-after: doc-region-begin define uniform scale
     :end-before: doc-region-end define uniform scale
     :linenos:
     :lineno-match:
     :caption: src/demo06/demo.py

https://github.com/billsix/modelviewprojection/blob/master/b...

The generated book is here https://billsix.github.io/modelviewprojection/

For your purposes, using a third party's code, I would make a new git repository and copy the current state of their code in. I would then annotate the sections that I want to with comments, and then generate the documentation using Sphinx, referencing your annotations of their code.

nunez · Answer

I use a reMarkable to write thoughts as they happen
