With some of our code, the designers are either gone or sometimes unavailable, and it can be tricky to see how all the pieces fit together. A good IDE makes the job a little easier (finding references, ctrl+click to go to declarations, etc.), but it'd be nice to have a diagram or something for visualization.
So, is it a good idea to try documenting the code design through some sort of visualization? If so, using UML or something else? I suppose there might be tools for doing this automatically in some languages? Otherwise I think if it was valuable enough, it could be something we make sure to review and update along with code changes.
Any thoughts would be appreciated!
I don't know if an automated visualization system is possible, but you'll have to understand the whole thing before doing so. Pen and paper was the most expedient solution for me at the time.
I'd say the big problem in visualizing big systems is that you can't usefully do it in one graph. For instance, I worked on a system that had 2000+ database tables; a diagram showing all of them would take up a long wall. (This can be useful, but it is a big commitment.)
A useful tool is going to let you make meaningful diagrams that show the subset of entities that are part of a story. I went to an art show of Mark Lombardi's works
https://en.wikipedia.org/wiki/Mark_Lombardi
who (before he was murdered) drew elaborate diagrams of conspiracies. One thing they showed was drafts that he made in the progress of creating his visualizations and he would sometimes make 40 or more of them. He would start out with a "hairball" that was disorganized and gradually figure out how to lay the diagram out in a way that made the meaning obvious.
At the moment, codeatlas is only a static gallery, but we're currently about 1-2 weekends away from releasing a Github action that deploys this diagram on github pages for your own repos - if you're interested, feel free to watch this repo: https://github.com/codeatlasHQ/codebase-visualizer-action
For some reason, tools related to program understanding are not widely adopted by IDEs.
A while ago I used a tool for Java that was based on the Object-Oriented Metrics[0] book by Michele Lanza. But, that tool was discontinued and it doesn't exist anymore[1].
If you are interested in that topic, take a look at Moose[2] and dig a little into the research papers. (Honestly, I tried Moose a few times, but I wasn't very comfortable with it.)
For TypeScript projects, the TS compiler API is extremely powerful and easy to use. You can use that to extract information and analyze the code relationships (Graphviz is your friend here :) ).
[0]: https://link.springer.com/book/10.1007/3-540-39538-5
[1]: https://web.archive.org/web/20150428173717/http://www.intooi...
[2]: https://moosetechnology.org/
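The extract-then-graph idea is not specific to TypeScript. Here is a minimal sketch of the same approach in Python, using the standard `ast` module instead of the TS compiler API (so this is a swapped-in analog, not the TypeScript workflow itself). The names `call_edges`, `to_dot`, and `SAMPLE` are made up for illustration, and it only catches plain-name calls inside top-level functions.

```python
import ast


def call_edges(source):
    """Return (caller, callee) pairs for plain-name calls inside top-level functions."""
    tree = ast.parse(source)
    edges = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                # Only calls like foo(); method calls such as obj.foo() are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.add((node.name, inner.func.id))
    return edges


def to_dot(edges):
    """Render the edges as Graphviz DOT text."""
    lines = ["digraph calls {"]
    for caller, callee in sorted(edges):
        lines.append(f'    "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample code to analyze.
SAMPLE = '''
def load():
    return parse(read())

def parse(text):
    return text.split()

def read():
    return "a b"
'''

if __name__ == "__main__":
    print(to_dot(call_edges(SAMPLE)))
```

Saving the printed text to a file and running it through Graphviz's `dot` should give a small call graph; anything dynamic (callbacks, methods, imports) would need more work.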
You could map the way programs fit into machines, and the networks between them. This would be the topology.
You can map the way services call upon one another with requests. This is the service graph.
You can map how systems interact over events or shared resources. You could say this is the logical graph.
The problem happens when you try and graph them all at once. It's the same as trying to draw a real map, with all the services, bus routes, railways, shops and administrative regions superimposed on one image. It's very busy.
So I use separate maps.
Tools are another matter. Personally I use Mermaid for graphs. I also have my own tools that create SVG visualisations using DAGre. This can be helpful for interactive visualisations where you can click into different nodes and explore more detail.
My system uses CloudFormation templates and our in house deployment DSLs to figure out the "topology", then let the users see the different superimposed "graphs" as they see fit
I use https://mermaid-js.github.io/mermaid/#/ for the diagram itself because Github natively supports it in markdown files, so you can revision control the diagram. I managed to get reasonably close to the C4 diagrams minus a few features that mermaid does not support.
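Since Mermaid diagrams are just text, they can also be generated from a list of relationships. A rough sketch, assuming you already have (source, label, target) triples from somewhere; the function name `mermaid_flowchart` and the example nodes are hypothetical, and this emits a plain flowchart rather than anything C4-specific.

```python
import re


def mermaid_flowchart(edges):
    """Render (source, label, target) triples as Mermaid flowchart text."""

    def node_id(name):
        # Mermaid node ids should not contain spaces or punctuation.
        return re.sub(r"\W+", "_", name)

    lines = ["graph TD"]
    for src, label, dst in edges:
        lines.append(
            f'    {node_id(src)}["{src}"] -->|{label}| {node_id(dst)}["{dst}"]'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(mermaid_flowchart([
        ("Web App", "queries", "Database"),
        ("Web App", "publishes", "Event Bus"),
    ]))
```

The output can be pasted into a fenced `mermaid` block in a markdown file, so the generator script and the diagram can both live in the repo.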
If that can't be done, there are some interesting things you can try. A lot of the suggestions in the thread are "top down" methods; you can get a lot of value out of "bottom up" visualizations too. Things like:
- Histograms of which lines / functions get called the most, or spent the most time in
- Which lines / functions / files get changed the most in the git history
- CPU flamegraphs
- Plain old print-debugging
In over-architected systems it can be difficult to figure out where the real "meat" of the code is, as opposed to the endless layers of configuration and wrappers and interfaces and indirection. UML diagrams may not help, or even be deceiving, but a stack trace never is.
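The "which files get changed the most" item is cheap to try. A minimal sketch, assuming the input is the text printed by `git log --name-only --pretty=format:` (one path per line, blank lines between commits); `churn` and `SAMPLE_LOG` are illustrative names, and the sample paths are invented.

```python
from collections import Counter


def churn(log_text):
    """Count how often each path appears in `git log --name-only --pretty=format:` output."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


# Hypothetical log output covering three commits.
SAMPLE_LOG = """
src/api.py
src/db.py

src/api.py

README.md
src/api.py
"""

if __name__ == "__main__":
    for path, count in churn(SAMPLE_LOG).most_common(3):
        print(f"{count:4d}  {path}")
```

On a real repo you could capture the log with `subprocess.run([...], capture_output=True, text=True).stdout` and feed that in. Renames and deleted files will skew the counts, so treat the result as a hint about where the "meat" is, not a measurement.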
However, teams usually do not follow any hard rules like UML. A box with text, a connecting line, and a grouping border can all mean anything, and the subtleties of their actual implementation are still trapped in the lines of code and minds of architects.
I would definitely recommend drawing diagrams of systems you create or work with. If there is not one at work, and you're struggling to grasp the big picture, start making one yourself. Start with the pieces you know and put in ambiguous boxes for the pieces you know exist but don't know what they do. I do this even for solo side projects when they have more than one physically-separated component. This will help you catch bad logic and inefficiencies early, and find the best implementations for new features down the road.
I have used draw.io (now diagrams.net), LucidChart, Microsoft Visio, and they all get the job done. I'll recommend the first one as it's the most open.
What I generally do, is start with what I call a "napkin sketch" (which can be UML-ish)[1], and try to avoid writing down too much stuff, in order to reduce "concrete galoshes"[2].
I try to use tools like Doxygen and Jazzy, to document the code, in an inline fashion[3]. Doxygen will generate a UML "Lite" diagram[4].
[0] https://littlegreenviper.com/miscellany/swiftwater/the-curio...
[1] https://littlegreenviper.com/miscellany/forensic-design-docu...
[2] https://littlegreenviper.com/miscellany/concrete-galoshes/
[3] https://littlegreenviper.com/miscellany/leaving-a-legacy/
So I usually find myself in the same position of using jump-to-definition and trying to get a handle on things that way.
From there, I'll sketch a simple diagram on paper to aid my own understanding of whatever piece of the system I'm working on. Personally, I'd avoid software diagramming tools at this stage. A simple hand-drawn diagram can be extremely helpful and doesn't take long to create.
After a while, I'll often start sketching out a more comprehensive system diagram with the database(s), processes, etc. At this point, I'd ask management about working on this because it can be a time consuming process that involves talking to a lot of people.
In my experience, this has usually been an uphill battle and somewhere between difficult and impossible to take on as an individual developer, but I think it's worth a try. Taking this kind of initiative could lead to career advancement, but I've often found that there's a lot of resistance and it's not seen as a priority.
It might seem a bit cynical, but as a junior or even intermediate developer, I'd also advise caution against stepping on anyone's toes. I'm not sure why, but some people tend to get defensive about this kind of thing.
If you're looking to understand its properties and how it behaves, I would look into something more robust like Alloy 6 [0], which has a great visualization system for inspecting models. However, if you're looking for a class diagram tool, then it's outside your wheelhouse.
There's also something I've played around with a bit but haven't used seriously: Moose [1]. It's basically an IDE for doing analysis of code. The trick is writing good parsers.
Yes, if it helps you understand how it works and how the pieces fit together.
No, if the previous is not all that useful for you (different types of learners), or you need to spend significant amounts of time doing it manually, especially given that code could change.
If you can, look into any tool that might allow you to get visualizations in an automated manner.
For example, JetBrains IDEs have a few different graph visualizations for dependencies and inheritance etc.: https://www.jetbrains.com/help/idea/2022.1/tests-in-ide.html...
There also used to be SourceTrail, though sadly the project is now retired: https://github.com/CoatiSoftware/Sourcetrail
For databases, you can also use external tools like DbVis: https://www.dbvis.com/features/
There are also a few tools here and there for visualizing networks or how container deployments look, but those are pretty situational/specific for each platform/setup.
https://github.com/whyboris/TypeScript-Call-Graph
Works for _functions_ not classes. I'm unsure how useful this tool is, but I suspect it might be helpful in some codebases.
To do this, I used a number of generally proprietary tools, where part of the project was getting the company to buy these tools and fix their practices so that perhaps they would avoid the problem in the future.
Anyway, my point is there are open source and proprietary tools out there in the world that will instrument an application and, generally based on usage patterns, build out various visualizations of the application. While these are often sold or marketed as tools for operational triage, I always argued that the best practice was for developers to use these tools in order to both understand the performance characteristics of their work and look for unexpected interactions.
These tools will generally build out a topology of how various components interact and will often do other types of visualizations that show class- and method-level chains of invocation. The good ones show, end to end across distributed systems, every bit of code that gets called from the time a user attempts some kind of action.
The benefit of this approach over a UML diagram or something similar is that it shows how the system was actually built and is working, rather than what a developer intended to be built. The larger the system and the more years / developers involved, the greater the delta.
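To make the "observe what actually runs" idea concrete, here is a toy, single-process sketch using Python's standard `sys.setprofile` hook. It is nowhere near what the commercial instrumentation tools do (no distributed tracing, no timing), and `record_calls` plus the `checkout`/`price`/`charge`/`audit` functions are invented for the example.

```python
import sys
from collections import Counter


def record_calls(func, *args, **kwargs):
    """Run func and count (caller, callee) pairs for the Python-level calls it triggers."""
    edges = Counter()

    def profiler(frame, event, arg):
        # "call" fires for Python functions; C-level calls arrive as "c_call" and are ignored.
        if event == "call" and frame.f_back is not None:
            edges[(frame.f_back.f_code.co_name, frame.f_code.co_name)] += 1

    sys.setprofile(profiler)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return edges


# Hypothetical application code to observe.
def checkout():
    price()
    price()
    charge()


def price():
    pass


def charge():
    audit()


def audit():
    pass


if __name__ == "__main__":
    for (caller, callee), count in sorted(record_calls(checkout).items()):
        print(f"{caller} -> {callee}  x{count}")
```

The resulting edge counts are exactly the kind of data a layout tool can turn into a picture of how the code behaved on that run, as opposed to how it was meant to behave.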
The divio system is a good place to start, IMO, when it comes to organization: https://documentation.divio.com/
So, I would treat a visualization used in an explanation-style document very differently from a reference guide. One is intended to illustrate a concept quickly, another is intended to be precise. I don't think you'll see a single "visual system" ever take over, largely because documentation can have very, very different goals for the reader.
Visualizations in reference guides are (unfortunately) rare. I happen to like the approach taken by project reactor, embedding visuals into java reference docs: https://projectreactor.io/docs/core/release/api/
One of the hardest planning parts was “finding the gap” (where was the least unnatural way to place the split). To approach this, I pulled every function’s dependency tree (from code analysis) and all of our relational database entity dependencies (from DBMS telemetry) and all of our functions calling RDBMS code (also code analysis) into a large graph, which I then manipulated with a mix of hand-written tools and Gephi to create logical clusters that were both near-neighbors and business direction aligned (each business was going to have a particular focus, so not all divisions were equivalent).
I don’t recall exactly how long I spent, but it was in the month and a half range to get to a workable initial plan that we could start burning down.
That’s not live tooling to explore a system and is more archeological in nature, but may give you some ideas of how to auto-generate some terrain data.
It's called MTREES and it's a free epub on Apple or Google.
It produces diagrams like these: https://www3.svjatoslav.eu/projects/sixth-3d/graphs/
The advantage of this is that the diagram can be automatically updated from the latest code. Class discovery and visual layout are automatic.
The suggestion it made was to look across module boundaries or microservice boundaries, while making a particular request. So I go to update a Project with a new Pipeline, and I find out that the project service talks to the pipeline service which then has to create resources for which it talks to various resource services, all of that.
Since there aren't great digital tools for doing this, for visualizing these things, it was suggested in that talk that one of the more outside the box things you could do is to just take cardboard and modeling clay and strings and cut out some shapes to represent the different services, let each string be an RPC, and the actual diagram you would imagine being alive in time, like little beads flying across these strings, to indicate requests going out and then responses coming back... But the key was less to witness the time but more to get a timeless sense of connection, “oh, it turns out ResourceService is very hairy in these diagrams, it is kind of the central hub for this microservice cluster.” and for that it was helpful that the visualization had a certain physicality to it, it had weight and structure and engaged more senses than just the visual...
I've actually thought about just digitizing these ideas, even though it misses half the point LOL. I think that could be some wonderful documentation and diagrams in our onboarding for new folks.
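A very rough digital stand-in for the "which cardboard shape has the most strings attached" observation: count distinct peers per service from a list of observed RPCs. This is only a sketch; `connection_counts` is a made-up helper, and the Storage/Compute/Network service names are hypothetical additions to the ones mentioned above.

```python
from collections import Counter


def connection_counts(rpcs):
    """Count distinct peers per service from (caller, callee) pairs."""
    peers = {}
    for caller, callee in rpcs:
        peers.setdefault(caller, set()).add(callee)
        peers.setdefault(callee, set()).add(caller)
    return Counter({name: len(others) for name, others in peers.items()})


# Hypothetical RPCs seen while handling one "update project" request.
RPCS = [
    ("ProjectService", "PipelineService"),
    ("PipelineService", "ResourceService"),
    ("ResourceService", "StorageService"),
    ("ResourceService", "ComputeService"),
    ("ResourceService", "NetworkService"),
]

if __name__ == "__main__":
    for name, degree in connection_counts(RPCS).most_common():
        print(f"{degree}  {name}")
```

It obviously loses the physicality that was the point of the talk, but it does surface the hub quickly.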
And a jpg with an iPhone to digitize the sketch.
Documentation is a practice not a tool.
A shitty process can be improved.
Without process "the perfect tool" just sits collecting dust.
Or a person wants to start documenting, so they shop for documentation tools.
Now they have two problems.
Good luck.
Welcome to the industry! sobs in time constraints
> So, is it a good idea to try documenting the code design through some sort of visualization?
Yes, I think it's a good idea, though I'm afraid I don't have much advice on how to accomplish that.
The problem I've always had is that there are so many ways to cut up a system. Some people want higher-level architecture diagrams that show how all the various systems fit together. Some people want infra diagrams showing what VMs, DBs, cloud resources, etc., are all wired to what else. Some people want sequence diagrams detailing RPC/API calls between systems/components.
Invariably, for whatever documentation does exist, the person wanting documentation wants the diagram that doesn't exist.
I've tried PlantUML, but it is complete garbage when it comes to emitting useful diagnostics. Paired with a language that seems to be nothing but special case after special case, and the result is basically unusable.
In school, I was taught to draw UML for classes within a program. I have never seen that IRL. I think the difference is the time required to comprehend the application.
There is however a growing movement of being able to visualize how a system is functioning at various levels. XState/Statecharts are a good example (https://xstate.js.org/viz/). Another example in the Ops space would be https://github.com/spekt8/spekt8 for K8S. I work at Grafana and we're more or less trying to expose these things in ways that make sense. Our bread and butter is timeseries data but we're adding more in that regard (it's possible to build node graphs from running systems).
Rolling your own visualization is useful and not too hard. It drives you to frame your questions more narrowly (call-hierarchy/sequence diagram, data flow, sub-systems?).
The fast-path for me is to generate the relation doublets/triplets (x --R--> y) and then let yFiles/yEd (https://www.yworks.com) lay it out hierarchically after converting to their tgf "trivial graph format". yEd GUI is free; the yFiles layout library is worth 10X the price if you're displaying lots of graphs.
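For anyone who hasn't seen TGF: it is a list of `id label` node lines, a `#` separator, then `from to label` edge lines. A minimal sketch of the triplet-to-TGF step, with `to_tgf` and the sample relations being hypothetical names chosen for illustration:

```python
def to_tgf(triples):
    """Render (source, relation, target) triples in Trivial Graph Format."""
    ids = {}
    for src, _, dst in triples:
        for name in (src, dst):
            # Assign ids 1, 2, 3, ... in order of first appearance.
            ids.setdefault(name, len(ids) + 1)
    node_lines = [f"{node_id} {name}" for name, node_id in ids.items()]
    edge_lines = [f"{ids[src]} {ids[dst]} {rel}" for src, rel, dst in triples]
    return "\n".join(node_lines + ["#"] + edge_lines)


if __name__ == "__main__":
    print(to_tgf([
        ("orders", "calls", "billing"),
        ("billing", "reads", "ledger_db"),
    ]))
```

Write the output to a `.tgf` file, open it in yEd, and apply a hierarchical layout from there.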
These types of diagrams are either not specific enough to answer any useful questions or so dense and insane that no one can read them.
If you have a business requirement with a decision tree it is probably a good fit for this type of documentation - write it down in DOT or asciiflow or something
To describe messages and related state transitions representing the function of a system over time, numbered arrows atop a block diagram can work well (essentially one visualization of a graph), but for more complex multi-component interfaces with a strong requirement for ordered messaging, message sequence charts are well received. https://www.mcternan.me.uk/mscgen/
There are two models of reality that I find to be the most useful ones, especially when writing programs. The first is functions, and the second is sequences of states. - Leslie Lamport
.. via https://github.com/globalcitizen/taoup - see also https://en.wikipedia.org/wiki/State_machine - and you won't regret learning http://graphviz.org/
As others have mentioned, to effectively communicate a system you must limit the context to particular layers of abstraction, and C4 is a good approach to doing just that. There's also a C4 plugin for PlantUML[1].
But don't forget that, as with all visualizations, audience and purpose are key.
Consider whether you are addressing short-term needs (eg identifying inefficiencies, modeling for a client pitch) or long-term needs (eg knowledge retention, managing complexity). If your audience's needs are short-term, you can certainly get by with much simpler tools (eg Inkscape, excalidraw/draw.io/etc, picture of a whiteboard, doodles on a napkin).
Also consider whether or not you actually have a problem better served by bottom-up (ie generated) visualizations (eg ERDs for database schema refactoring, heatmaps for profiling, GraphViz for debugging DAGs).
And, tangentially related, visualizing concurrency: https://divan.dev/posts/go_concurrency_visualize/
For individual systems I think a good inversion of control/dependency injection system can give you a good overview of the connections between components.
If I’m stepping into a giant ball of spaghetti code I generally set the debugger at line 1 and start stepping, and draw a lot of boxes and lines as I go… then throw those drawings away when I’m done. They’re really only helpful while you’re producing them (like taking notes as you read a textbook), hopefully in that situation you improve the code as you go so this approach isn’t as necessary!
I’ve recently realized that UML is no longer something new developers are aware of. It’s a useful tool in reducing your time for solving problems since you won’t have to write the code as you work through them.
I'm an intensely visual person, but have never found a visual programming system which scales well --- the problem is, past a certain level of complexity one has to use modules, which then devolves the visual representation down to just a bunch named blocks.
That said, I'm using BlockSCAD:
https://www.blockscad3d.com/community/projects/1421975
to work up designs which I'm then putting into other tools.
Looking at GraphSCAD:
and there's also Ryven and pythonocc which I managed to get installed:
https://github.com/Tanneguydv/Pythonocc-nodes-for-Ryven
but I'd really like to see a tool for this sort of thing which made G-code.
A good tool needs to be automatically generated from code which makes me think it needs to be integrated in a framework. Or the code needs to be written in a way that is easily visualizable in the first place.
Code comments seem to be the only way to add structure while ensuring the diagrams are kept up to date.
I think there is room for future frameworks that are built with visualization and its necessary tooling as a first-class priority. When have you ever seen a framework claim it is easy to visualize and understand? Existing software is too much of a mess.
Also https://sequencediagram.org/ allows you to describe them in text and creates a diagram for you.
It's what I have been using.
So you need to break it down into multiple documents that cover different use cases and include only the components for a small set of use cases.
Then use the C4 model to break up the docs.
https://github.com/hammeiam/saddle-data-graph/blob/master/Sa... (scroll down for images)
I also do some sketchnoting (using pen & pencil) when I design a system for the first time. I wish I could easily convert those sketches to PlantUML.
[1]: https://brainfck.org/#PlantUML [2]: https://skyksit.com/programming/uml/plantuml-samples/
If you can, try to put together 2 different diagrams presenting different views. For example, sequence diagrams, model diagrams, etc. each tell you something different. Having a few different perspectives on the same system will give you a richer understanding.
Not specifically related to visualization, but, also, if you spend a lot of time trying to understand a piece of code, once you understand it, document it (e.g. in a function or method comment). Over time, this makes a huge difference and will help you and others out when you revisit the code later.
As a junior dev, you shouldn’t have to be worried by such things. This should be handled at the senior/principal level.
If you aren’t getting use case diagrams at the very least, you don’t have a clear definition of done.
Sometimes I wish I could project my IDE onto the wall behind my monitors. Programming in VR would probably achieve the same thing, but I'm not ready to move into that world yet.
Also the C4 Model seems cool but I don't currently use it
I would never use UML.
It's interesting because it attempts to capture domain knowledge isolated from a programming-centric view/metaphor, which makes it challenging to discuss complex systems with staff other than programmers.
There are a lot of great tools that can reverse engineer code -> UML and databases to ER or UML. (and forward engineer UML or ER -> code or database)
I find them quite helpful.
If someone has taken care to keep up UML diagrams, and I can grab the blessed ones with more context that is great.
SoftVis3D (https://softvis3d.com/): where a "‘code city’ view provides a visualization for the hierarchical structure of the project".
Code Park: A New 3D Code Visualization Tool (2017) (https://arxiv.org/pdf/1708.02174.pdf), a “novel tool for visualizing codebases in a 3D game-like environment” with code represented as “code rooms” with code on the walls.
Code Structure Visualization Using 3D-Flythrough (2016) (https://opus-htw-aalen.bsz-bw.de/frontdoor/deliver/index/doc...), with spatial metaphors and first-person exploration of code.
Primitive (https://primitive.io/), a VR collaboration startup with a Matrix-looking “Immersive Development Environment” with “new tools for visually analyzing software in 3D”.
AppMap (https://appland.com/docs/how-to-use-appmap-diagrams.html), an automated code analysis tool that includes dependency maps and trace views.
plurid (https://github.com/plurid/plurid), a framework for visualizing and debugging code in a 3D explorable structure.
fsn (file manager) (https://en.wikipedia.org/wiki/Fsn_(file_manager)), an experimental application to view a file system in 3D (featured in Jurassic Park).
However, there are very few automated polyglot methods. So, often, I must trace the data paths manually.
a metacomputer of sorts; maybe epicomputer would be a better name?
i always found uml and other object style diagrams to be kind of obfuscating and encouraging of overcomplicated oop designs.
Keep them small, if you need more than 7ish boxes, your model is too detailed
The standard data-flow diagrams are too low level to be useful nowadays, but it's trivial to remap their symbols onto the high-level components our modern distributed systems use. But still, for most people they will just show a database, an application transformation, and the user's output, so they are only useful when you have something different from that to show.
Entity-relationship diagrams were technically included in UML, but the entire world disagrees, so you are better off thinking about them as a separate class. Those are in wide use.
Message diagrams (included in UML as sequence diagrams) are very useful for designing and analyzing protocols, but the UML one is a bad fit for that use due to its serial appearance. If you look at distributed systems papers, you will see a version with oblique arrows instead of horizontal ones that doesn't imply two-sided links, serial communication, or even a total order on the messages.
Finally, state machines are a really useful architecture pattern, and state transition diagrams capture them very well.
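One reason state machines diagram so well is that the transition table is the design. A minimal sketch, assuming a hypothetical document-review workflow; `TRANSITIONS`, `step`, and `to_dot` are illustrative names, and the same table drives both the behavior and the Graphviz output.

```python
# Hypothetical workflow: (current state, event) -> next state.
TRANSITIONS = {
    ("draft", "submit"): "review",
    ("review", "approve"): "published",
    ("review", "reject"): "draft",
}


def step(state, event):
    """Return the next state, or raise if the event is not allowed in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def to_dot(transitions):
    """Render the transition table as a Graphviz DOT state diagram."""
    lines = ["digraph states {"]
    for (src, event), dst in transitions.items():
        lines.append(f'    {src} -> {dst} [label="{event}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(step("draft", "submit"))
    print(to_dot(TRANSITIONS))
```

Because the diagram is generated from the table the code actually uses, it cannot drift out of date the way a hand-drawn one can.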
Every other diagram that I've seen is either completely ad hoc or would be improved by replacing it with text. I really miss some "central architecture diagram" that shows the important structures in your code and how they interact (UML has structure as class diagrams, those suck), but I have never seen any good implementation of this one.
I have a feeling there might be more UML in parts of the enterprise world, or in places with "architects" that have forgotten (/never learned) how to code. But my guess is the above is what the majority of "UML" usage in the industry looks like.
As for your question about documentation: The best advice I can give is to be mindful that documentation tends to get outdated faster, the further away it lives from the code that it documents. That's why "self-documenting" code is so valuable, since the compiler will often refuse to compile it if you don't update it. Thus I would always advise to use the "closest" documentation form that is suitable, in roughly this order:
- The code itself
- The tests
- Comments in the code
- Comments at the start of files
- Readme.md
- Other files in repo
- Wiki that lives alongside the repo (e.g. GitHub)
- External wiki (e.g. Confluence)
In your specific case, it sounds like you need to document the architecture/design in a way that needs to be understood before someone would even know which file to look in for further documentation. In that case, I would first attempt to describe the architecture in words in Readme.md. E.g. something like a list of the most important classes/functions/datatypes and a paragraph or two about how each relates to the others.
If something like that isn't helpful enough, and you want to make some UML, consider using something like asciiflow.com or textik.com to make ascii diagrams to put in the readme. If the info doesn't fit that format, consider making diagrams in images that can be shown inline in the readme - ideally if you do this, use a format where you can also check in the "source" of the image (e.g. a graphviz DOT-file) to make it easier to update. And then finally, if all else fails, you can either check in a PDF, or use a wiki. But as mentioned above - prefer the "closer to the code" solution whenever possible.
For these things you know there are systematic ways of finding them - for mine it was a C/C++ Project so:
1. Find all executables via build output, or in the running system. For now you're largely going to ignore the details of what the code is doing. You just want to know what is actually "running" at runtime :).
2. Figure out where the entry points to those executables are, like a main. These are usually easy to search for or discover by convention.
3. Find out what threads it spawns.
4. Start a simple diagram with a box at the top named for the executable, and branch down to one box for each thread. Manually trace control flow for each thread, adding boxes at points you think are noteworthy logical units. E.g. often threads will have some kind of main loop they sit in, which is a key element for understanding what that thread is doing.
5. Continue for nested threads and worker (short lived but not ephemeral) threads.
Once you complete this, you should have an abstract block diagram that gives a decent map of "What code is running in the system". And just through the process of naming and looking over, maybe a rough idea of what the various pieces of software are doing and possibly how they relate.
You can then repeat this for the other basics in a similar fashion - data and communication interfaces. It's good to emphasize staying at a first-principles kind of abstract mindset. You know there are a finite number of ways a process or thread can communicate or create side-effects outside of itself. If you literally just find all of them (not the details of what is happening over those interfaces), usually it ends up being quite few, and all of a sudden the complexity becomes less intimidating. You have a little box that does some manipulation of data via logic and state, and it goes in one pipe and out the other.
I should point out that how difficult all this is largely depends on those "coding practices" that get droned on about for benefiting maintainability but so often get tossed. For example, say your system uses message IDs as part of an IPC mechanism. If the code followed good practice, using some kind of constant definitions shared from a single place, you can now do things like search for that message identifier and find all places it's sent/received. If some code used its own re-definition of the same ID, or hardcoded just the raw numerical value, this becomes nearly impossible.
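When the constants are shared properly, that search is a few lines of scripting. A sketch under those assumptions: `find_identifier` and `demo` are made-up helpers, `MSG_ID_SHUTDOWN` is a hypothetical message ID, and the throwaway files exist only to show the whole-word matching.

```python
import re
import tempfile
from pathlib import Path


def find_identifier(root, identifier, suffixes=(".c", ".h", ".cpp", ".hpp")):
    """Yield (path, line_number, line) for whole-word matches of identifier under root."""
    pattern = re.compile(rf"\b{re.escape(identifier)}\b")
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(errors="replace")
            for number, line in enumerate(text.splitlines(), 1):
                if pattern.search(line):
                    yield path, number, line.strip()


def demo():
    """Build a tiny throwaway source tree and search it for one message ID."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "sender.c").write_text(
            "send(MSG_ID_SHUTDOWN);\nsend(MSG_ID_SHUTDOWN_ACK);\n"
        )
        Path(root, "receiver.c").write_text("case MSG_ID_SHUTDOWN: stop(); break;\n")
        Path(root, "notes.txt").write_text("MSG_ID_SHUTDOWN is documented here\n")
        return [(p.name, n) for p, n, _ in find_identifier(root, "MSG_ID_SHUTDOWN")]


if __name__ == "__main__":
    for name, number in demo():
        print(f"{name}:{number}")
```

Each hit is a candidate send/receive site to add to the interface diagram. As noted, none of this helps once someone has hardcoded the raw number.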
Also you'll need multiple diagrams. You won't be able to clearly show a complete "code execution" diagram at the same time as an interface relationship diagram or shared data sources diagram. The complexity of it will not help, it will just be more overwhelming complexity.
For visualizing system designs, you should take a look at STAMP, e.g., via “Engineering A Safer World” + the resources at mit.edu/psas + on YouTube.
(Multiple tools, both commercial and libre, exist and are being developed to make these diagrams, although for what it’s worth, I mostly hear about people making them using draw.io, Google Drawings, on physical paper/whiteboards, or occasionally with specialized tooling.
I have also recently published a project in this area, https://github.com/mstone/depict, which I believe is well on its way toward addressing some unmet needs here.)
For visualizing protocols, things like sequence diagrams, data flow diagrams, DRAKON flow charts, value stream maps, and occasional more specialized objects like CPSA “cryptographic protocol shapes” / strand space skeletons are where I start depending on the flavor of what’s needed.
For visualizing the design of implementations themselves, I have not yet seen anything that I feel obliged to recommend; rather, here, I suggest investing in adding illustrations to your existing documentation in whatever way is easiest for you to use to clarify whatever subtleties you need to clarify for your audience.
(Here I tend to look at things like ASCII-art, SQLite’s railroad diagrams (now made with pikchr, AIUI), and sequence diagrams, as mentioned by other commenters, as helpful examples to start with.)
Finally, for implementing debugging/profiling/operational illustrations, there is such a rich set of examples to turn to (ranging from the very specialized, such as custom process model video rendering pipelines in robotics, to TensorBoard for TensorFlow, to general-purpose tools like browser performance debugging suites, flame charts, or Go’s built-in profile graphing tools) that rather than learn any particular such tool, I’d instead suggest trying to get comfortable with the building blocks underlying these systems, which include contemporary GUI/web apps, custom drawing and animation tools like SVG, pretty printers, and Grammar-of-Graphics systems like vega-lite.
(Note: although it may seem superficially extraneous to your question, the reason I also suggest thinking about debugging visualizations in this context is because IMO, to work, they ~necessarily encode a visual model of the design of your implementation since it is the design of the implementation that provides the vocabulary and relationships that have to be understood and navigated in order to successfully debug/optimize/monitor any given running instance of whatever system you are building.)