My challenges:
1. These annotations need to be tightly coupled with specific locations in the source code (particular functions, variables, or even specific lines)
2. The underlying code changes regularly (new versions, updates from maintainers) which can break the connection between my notes and the code
3. My notes are private - they include half-formed thoughts, questions, and sometimes critical observations that wouldn't be appropriate as public comments
4. I want to preserve this knowledge across different machines and working environments
I've tried various approaches:
- Local IDE bookmarks (lost between sessions)
- Separate markdown files (hard to maintain precise code references)
- Private forks with comments (becomes unmaintainable as source evolves)
I'm curious how others solve this problem. Do you have a systematic approach for maintaining personal annotations on code that's not under your control? How do you handle the challenge of the code evolving while keeping your notes relevant?
Would especially love to hear from people working with large codebases or those who regularly need to dive deep into external dependencies.
> Out-of-Code Insights is a Visual Studio Code extension that allows you to add annotations, notes, and comments without modifying your source files.
https://marketplace.visualstudio.com/items?itemName=JacquesG...
GitHub:
Of the requirements that you've laid out, I'd suggest that you need to relax either requirement 2 or requirement 3:
If you relax requirement 2, you could keep your notes in a private fork.
If you relax requirement 3, and make your notes suitable for public consumption, you could submit your notes as comments and make the codebase easier for everyone to understand. (Or, at least, you could submit some of your comments, making the remainder easier to maintain privately.)
You said that becomes unmaintainable as the source evolves, but that's surely a fundamental property of keeping notes on changing code? You have to do work keeping your private comments up to date with any method.
Obviously it isn't bulletproof and needs maintenance when it can't merge external changes automatically.
If the underlying code changes, I just update my comments.
That shouldn’t be difficult. Most code repository systems support links to exact line numbers in specific commits, for example [0]. Even if the links stop working, you can still recover the commit hash, file name and line number from the URL.
[0] https://github.com/curl/curl/blob/3b057d4b7a7e6b811245fd0312...
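If you go the permalink route, a note can store the anchor as structured data and rebuild the URL when needed. This is a minimal Python sketch that assumes GitHub's `blob/<commit>/<path>#L<start>-L<end>` permalink layout. `CodeAnchor`, `parse_permalink` and `make_permalink` are hypothetical names, and the URLs in the usage are made up:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed GitHub permalink layout, e.g.
#   https://github.com/OWNER/REPO/blob/COMMIT/path/to/file.c#L10-L20
# Only hex commit IDs are accepted, so links that follow a moving branch are rejected.
_PERMALINK = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/blob/"
    r"(?P<commit>[0-9a-f]{7,40})/(?P<path>[^#]+)"
    r"(?:#L(?P<start>\d+)(?:-L(?P<end>\d+))?)?$"
)


@dataclass(frozen=True)
class CodeAnchor:
    owner: str
    repo: str
    commit: str
    path: str
    start: Optional[int]  # first annotated line, or None for the whole file
    end: Optional[int]    # last annotated line (equal to start for a single line)


def parse_permalink(url: str) -> CodeAnchor:
    """Split a commit-pinned GitHub blob URL into its parts."""
    m = _PERMALINK.match(url)
    if m is None:
        raise ValueError(f"not a commit-pinned GitHub blob URL: {url}")
    start = int(m["start"]) if m["start"] else None
    end = int(m["end"]) if m["end"] else start
    return CodeAnchor(m["owner"], m["repo"], m["commit"], m["path"], start, end)


def make_permalink(a: CodeAnchor) -> str:
    """Rebuild the URL from a stored anchor."""
    url = f"https://github.com/{a.owner}/{a.repo}/blob/{a.commit}/{a.path}"
    if a.start is not None:
        url += f"#L{a.start}"
        if a.end is not None and a.end != a.start:
            url += f"-L{a.end}"
    return url
```

Refusing branch names is deliberate here. A link that points at a branch drifts as the code changes, which is exactly the problem the notes are trying to avoid.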
The lack of syncing doesn't bother me, because the purpose of taking notes always falls into one of these categories:
1. I read the code to get an idea of how something works. The code is there to make examples/variable names concrete, but I don't need to know the exact implementation.
If the notes need to sit in the code, usually that's because the answer spans multiple methods (eg "what does an e2e request look like?"). A set of comments on outdated code is always good enough for me.
Otherwise, a lot of times the answer can be summarized in one line (eg "where is the state tracked?" -> in FooBarClass). These can go into personal notes.
2. I need to know the implementation and it is complex and hard to follow.
If I need to know the implementation, either it is because I'm actively working on it, or I need to make [complex idea] more concrete in my head.
If it's the former, usually I'll have memorized it by the time I read through it.
If it's the latter, by the end of it I'll have gotten the main idea and it's fine to forget the implementation details.
Carefully naming variables and classes in obvious and consistent ways. I will spend time refactoring code so that it is named consistently and behaves as named.
Very small functions and classes (but not smaller than they need to be). This lets me use more named functions which gives me more description. It also typically gives me a nice hierarchy of how things occur, so whatever main "driver" function I have is pretty declarative and light on logic. It avoids big "god" functions or classes which tend to get cluttered and are often the hardest to break down or read.
Enforce obvious and established patterns. These again go in names: if I'm using CQRS, then I'll have lots of CQRS, handler, registrar, etc. in the names. If I have a factory, it has Factory in the name. When you see these, you know what things are and how they are organized.
Related to the above: no "clever" code and no inconsistent code. I'll write more "inefficient" code, as long as it isn't a bottleneck, rather than something tight that would be a premature optimization. If something doesn't fit the established patterns but could be reshaped into something consistent, I reshape it.
Lots and lots of tests. Tests describe behavior, which tends to be pretty immutable. If a requirement does change that behavior, some test will fail at some point and has to be reconsidered, so it gets my renaming attention. That last part is very important. Most testing frameworks let you add plain-language names and failure messages, so if the behavior has changed, the test goes red and doesn't let you forget about it. Those tests often become my documentation/annotations.
I will use comments when I've written something that can't be made clear through the above. These tend to be rare and typically sit in pretty dense "black box" places, like when I've implemented a numerical or other very specific algorithm. As such, they don't tend to get touched very often, and I write unit tests to make sure their behavior is enforced.
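Here is a small Python `unittest` sketch of tests doubling as annotations. `normalize_tag` is a hypothetical function standing in for real code. The descriptive test names and `msg` strings carry the documentation:

```python
import unittest


def normalize_tag(raw: str) -> str:
    """Hypothetical code under test: trims surrounding whitespace and lower-cases a tag."""
    return raw.strip().lower()


class NormalizeTagBehaviour(unittest.TestCase):
    # Each test name is a plain-language statement of expected behaviour, and the
    # msg argument explains why it matters if the assertion ever fails.

    def test_surrounding_whitespace_is_removed(self) -> None:
        self.assertEqual(normalize_tag("  rust "), "rust",
                         msg="tags are trimmed before storage")

    def test_tags_differing_only_in_case_are_equal(self) -> None:
        self.assertEqual(normalize_tag("Rust"), normalize_tag("rust"),
                         msg="tag lookup is case-insensitive")
```

Run it with `python -m unittest <module>`. If the behaviour of `normalize_tag` changes, the name of the failing test states which assumption broke.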
If software-development-grade merge tools aren't sufficient to get it done, that's probably a bad sign that your requirements can be met at all.
> 2. The underlying code changes regularly (new versions, updates from maintainers) which can break the connection between my notes and the code
Maybe depend on more loosely coupled notes?
You say they "need", but realistically they don't really need "to be tightly coupled with specific locations in the source code", that's just a nice to have.
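One way to loosen the coupling is to key each note by a qualified definition name (e.g. `Parser.parse`) instead of a line number. You then re-resolve the line after every upstream update with the standard-library `ast` module. This sketch assumes the annotated code is Python. `index_definitions` and `resolve_notes` are hypothetical helpers, not an existing tool:

```python
import ast
from typing import Dict, Optional


def index_definitions(source: str) -> Dict[str, int]:
    """Map dotted definition names (e.g. 'Parser.parse') to their current line numbers."""
    index: Dict[str, int] = {}

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = prefix + child.name
                index[name] = child.lineno
                visit(child, name + ".")
            else:
                # Descend into if/try/with blocks etc. without changing the prefix.
                visit(child, prefix)

    visit(ast.parse(source), "")
    return index


def resolve_notes(notes: Dict[str, str], source: str) -> Dict[str, Optional[int]]:
    """Return the current line of each annotated definition, or None if it is gone."""
    index = index_definitions(source)
    return {name: index.get(name) for name in notes}
```

A note that resolves to `None` marks a definition that was renamed or removed upstream. Those are the notes that need a human look after an update.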
There are tools for aspects of all these areas, but it still feels unsolved (nothing is both easy to use and feature-full).
Re: critical tone, instead of saying “this is a useless garbage fire” maybe something like “it is not yet apparent how this interacts with blah blah.” There’s always a way to phrase it where it’ll plant the seeds of how you want the reader to feel about it without being overt.
My 2c, anyway.
[1] https://blog.trailofbits.com/2024/03/19/read-code-like-a-pro...
handy, if you’re in the emacs ecosystem.
Consider using special "symbols" in comments, like "MYDOCS_XXX", that you search for in your modified version of the code base and refer to from other places. These will survive the upstream authors renaming functions etc.
I also take notes in my notes app. This obviously is imperfect too, but the codebases I work on aren't typically churning so much that these notes become out of date too quickly.
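The `MYDOCS_XXX` marker approach is easy to script. This Python sketch walks a source tree and records where each marker currently lives, so other notes can cite a marker name rather than a line number. The `MYDOCS_<UPPERCASE_NAME>` pattern and the default file suffixes are assumptions chosen for illustration:

```python
import os
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed marker convention: MYDOCS_ followed by upper-case letters, digits or underscores.
MARKER = re.compile(r"\bMYDOCS_[A-Z0-9_]+\b")

Location = Tuple[str, int]  # (file path, 1-based line number)


def markers_in_text(text: str, path: str) -> List[Tuple[str, str, int]]:
    """Return (marker, path, line) for every marker occurrence in one file's text."""
    return [
        (match.group(), path, lineno)
        for lineno, line in enumerate(text.splitlines(), start=1)
        for match in MARKER.finditer(line)
    ]


def index_markers(root: str, suffixes: Tuple[str, ...] = (".py", ".c", ".h")) -> Dict[str, List[Location]]:
    """Map each marker to every place it currently appears under root."""
    index: Dict[str, List[Location]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as f:
                for marker, _path, lineno in markers_in_text(f.read(), path):
                    index[marker].append((path, lineno))
    return dict(index)
```

Re-running `index_markers` after pulling upstream changes gives a fresh marker-to-location map. Any marker your notes mention that is missing from the map was lost in a merge.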
Update and rebase the branch. Resolve conflicts where they changed code around the comments. With anything else, you will just be delaying this exact same chore and possibly making it impossible down the road.
I have my source code in one directory, and in another I use Sphinx to build the documentation. In the documentation, I reference certain sections of code, which you can do either by line number or by patterns that mark the beginning and end of a section.
Since I control all my source code, I put in comments with certain flags for regions of code.
I can then reference that section of code as follows:
.. literalinclude:: ../../src/demo06/demo.py
   :language: python
   :start-after: doc-region-begin define uniform scale
   :end-before: doc-region-end define uniform scale
   :linenos:
   :lineno-match:
   :caption: src/demo06/demo.py
https://github.com/billsix/modelviewprojection/blob/master/b...
The generated book is here: https://billsix.github.io/modelviewprojection/
For your purposes, with a third party's code, I would make a new git repository and copy the current state of their code into it. I would then annotate the sections I care about with comments, and generate the documentation with Sphinx, referencing your annotations of their code.
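After copying a new upstream snapshot into that repository, it is worth checking that every region marker survived before rebuilding the docs. This rough Python sketch assumes the `doc-region-begin <name>` / `doc-region-end <name>` comment convention from the `literalinclude` example. It does not reproduce Sphinx's exact `start-after`/`end-before` matching. `extract_regions` and `missing_regions` are hypothetical helpers:

```python
import re
from typing import Dict, List

# Assumed marker convention, matching the literalinclude example:
#   # doc-region-begin <name>   ...   # doc-region-end <name>
BEGIN = re.compile(r"doc-region-begin\s+(?P<name>.+?)\s*$")
END = re.compile(r"doc-region-end\s+(?P<name>.+?)\s*$")


def extract_regions(source: str) -> Dict[str, str]:
    """Return the text strictly between each begin/end pair, keyed by region name."""
    regions: Dict[str, str] = {}
    open_regions: Dict[str, List[str]] = {}
    for line in source.splitlines(keepends=True):
        end = END.search(line)
        if end:
            name = end["name"]
            if name not in open_regions:
                raise ValueError(f"region end without begin: {name!r}")
            regions[name] = "".join(open_regions.pop(name))
        # Any region still open (including enclosing ones) includes this line.
        for body in open_regions.values():
            body.append(line)
        begin = BEGIN.search(line)
        if begin:
            open_regions[begin["name"]] = []
    if open_regions:
        raise ValueError(f"unterminated regions: {sorted(open_regions)}")
    return regions


def missing_regions(source: str, expected: List[str]) -> List[str]:
    """List the expected region names that no longer exist in source."""
    found = extract_regions(source)
    return [name for name in expected if name not in found]
```

The expected names would come from the `:start-after:` lines in your `.rst` files. An empty result from `missing_regions` means every referenced region is still intact.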