HACKER Q&A
📣 lovehatesoft

Visualizing software designs, especially of large systems (if at all)?


I learned about UML in a course, but have never used it or seen it in practice as a junior developer. Sometimes I'll see a flowchart, but that's not too common. Is this the same in other companies?

With some of our code, the designers are either gone or sometimes unavailable, and it can be tricky to see how all the pieces fit together. A good IDE makes the job a little easier (finding references, ctrl+click to go to declarations, etc.), but it'd be nice to have a diagram or something for visualization.

So, is it a good idea to try documenting the code design through some sort of visualization? If so, using UML or something else? I suppose there might be tools for doing this automatically in some languages? Otherwise I think if it was valuable enough, it could be something we make sure to review and update along with code changes.

Any thoughts would be appreciated!


  👤 jaylaal Accepted Answer ✓
For a small but complicated project I got thrown into a while ago, the only way for me to understand it was to print out all the source directly, vertically tape together the pages for a single file, and then lay them all out on a huge table. Then I took multicolored markers and started physically drawing out the call chains. I then understood the system, and also found an enraging bug: the system widely used the variables "blah_name" and "blah_id", including in many functions' parameters. Except, in one case, blah_id was passed in as blah_name and thenceforth became known as blah_name.

I don't know if an automated visualization system is possible, but you'll have to understand the whole thing before doing so. Pen and paper was the most expedient solution for me at the time.


👤 PaulHoule
This tool is good for making simple UML diagrams and even lets you do it with simplified syntax

https://plantuml.com/

I'd say the big problem in visualizing big systems is that you can't usefully do it in one graph. For instance, I worked on a system that had 2000+ database tables; a diagram of that which shows everything is going to take up a long wall. (This can be useful, but it is a big commitment.)

A useful tool is going to let you make meaningful diagrams that show the subset of entities that are part of a story. I went to an art show of Mark Lombardi's works

https://en.wikipedia.org/wiki/Mark_Lombardi

who (before he was murdered) drew elaborate diagrams of conspiracies. One thing they showed was drafts that he made in the progress of creating his visualizations and he would sometimes make 40 or more of them. He would start out with a "hairball" that was disorganized and gradually figure out how to lay the diagram out in a way that made the meaning obvious.
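
To make the "subset of entities that are part of a story" idea concrete, here is a minimal sketch that filters a relation list down to one story and prints it as PlantUML text. The entity names and the `story_relations` / `to_plantuml` helpers are made up for illustration; only the `@startuml` / `@enduml` markers, the `class` keyword and the arrow syntax come from PlantUML itself.

```python
def story_relations(relations, story):
    """Keep only relations whose two ends both belong to the story's entities."""
    return [(s, arrow, t) for s, arrow, t in relations if s in story and t in story]


def to_plantuml(relations):
    """Render (source, arrow, target) relations as a PlantUML class diagram."""
    names = []
    for source, _, target in relations:
        for name in (source, target):
            if name not in names:
                names.append(name)
    lines = ["@startuml"]
    lines += [f"class {name}" for name in names]
    lines += [f"{s} {arrow} {t}" for s, arrow, t in relations]
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical schema: a handful of tables standing in for the 2000+.
ALL_RELATIONS = [
    ("Customer", "-->", "Order"),
    ("Order", "*--", "Invoice"),
    ("Employee", "-->", "Payroll"),
    ("Order", "-->", "Shipment"),
]

# One "story": how a customer ends up with an invoice.
billing_story = {"Customer", "Order", "Invoice"}
diagram = to_plantuml(story_relations(ALL_RELATIONS, billing_story))
print(diagram)
```

Pasting the printed text into PlantUML should give a three-box diagram instead of the whole wall.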


👤 Weidenwalker
My friend and I have been working on https://www.codeatlas.dev in our spare time, which is a tool that creates pretty (2D!) visualisations of codebases, while providing additional insights via overlays (e.g. commit density, programming language). For example here's the Kubernetes codebase visualised using codeatlas: https://www.codeatlas.dev/repo/kubernetes/kubernetes.

At the moment, codeatlas is only a static gallery, but we're currently about 1-2 weekends away from releasing a Github action that deploys this diagram on github pages for your own repos - if you're interested, feel free to watch this repo: https://github.com/codeatlasHQ/codebase-visualizer-action


👤 diegof79
What you are looking for is called "Program Understanding". If you Google for it you'll find a bunch of research papers on the topic.

For some reason, tools related to program understanding are not widely adopted by IDEs.

A while ago I used a tool for Java that was based on the Object-Oriented Metrics[0] book by Michele Lanza. But, that tool was discontinued and it doesn't exist anymore[1].

If you are interested in that topic, take a look at Moose[2], and dig a little bit into the research papers. (Honestly, I tried Moose a few times, but I wasn't very comfortable with it.)

For TypeScript projects, the TS compiler API is extremely powerful and easy to use. You can use that to extract information and analyze the code relationships (Graphviz is your friend here :) ).

[0]: https://link.springer.com/book/10.1007/3-540-39538-5

[1]: https://web.archive.org/web/20150428173717/http://www.intooi...

[2]: https://moosetechnology.org/
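
The "extract relationships, then hand them to Graphviz" workflow can be sketched in a few lines. This is not the TypeScript compiler API; it is the same idea using Python's standard `ast` module, and it only catches calls made by plain name inside top-level functions. The `call_edges` and `to_dot` helpers and the sample source are assumptions for illustration.

```python
import ast


def call_edges(source):
    """Return sorted (caller, callee) pairs for plain-name calls in top-level functions."""
    tree = ast.parse(source)
    edges = set()
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            for inner in ast.walk(node):
                # Method calls like obj.read() have an ast.Attribute here and are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.add((node.name, inner.func.id))
    return sorted(edges)


def to_dot(edges):
    """Render the edges as a Graphviz DOT digraph."""
    lines = ["digraph calls {"]
    lines += [f'    "{caller}" -> "{callee}";' for caller, callee in edges]
    lines.append("}")
    return "\n".join(lines)


# Made-up module to analyze.
SAMPLE = '''
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()
'''

edges = call_edges(SAMPLE)
print(to_dot(edges))
```

The printed text can be saved to a file and rendered with `dot -Tsvg`.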


👤 jbreckmckye
The challenge is that there are different ways of "mapping" software.

You could map the way programs fit into machines, and the networks between them. This would be the topology.

You can map the way services call upon one another with requests. This is the service graph.

You can map how systems interact over events or shared resources. You could say this is the logical graph.

The problem happens when you try and graph them all at once. It's the same as trying to draw a real map, with all the services, bus routes, railways, shops and administrative regions superimposed on one image. It's very busy.

So I use separate maps.

Tools are another matter. Personally I use Mermaid for graphs. I also have my own tools that create SVG visualisations using DAGre. This can be helpful for interactive visualisations where you can click into different nodes and explore more detail.

My system uses CloudFormation templates and our in house deployment DSLs to figure out the "topology", then let the users see the different superimposed "graphs" as they see fit
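
A rough sketch of the "separate maps" idea: keep one edge list tagged by kind, and render each kind as its own Mermaid flowchart rather than superimposing them. The service names, the three kinds, and the `mermaid_map` helper are hypothetical; this is not how the CloudFormation-based system above works, just the smallest version of the same principle.

```python
def mermaid_map(edges, kind):
    """Render only the edges of one kind as a Mermaid flowchart."""
    lines = ["flowchart LR"]
    for source, target, edge_kind in edges:
        if edge_kind == kind:
            lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical system: the same services seen through three different maps.
EDGES = [
    ("web", "host_a", "topology"),
    ("orders", "host_b", "topology"),
    ("web", "orders", "request"),
    ("orders", "billing", "request"),
    ("orders", "order_events", "event"),
    ("order_events", "billing", "event"),
]

for kind in ("topology", "request", "event"):
    print(f"%% {kind} map")  # %% starts a Mermaid comment
    print(mermaid_map(EDGES, kind))
```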


👤 aetherlord
I'm a fan of https://www.ilograph.com/. I've only used it for a few small things, but the author has good samples, including a diagram of ilograph itself - https://app.ilograph.com/demo.ilograph.Ilograph/Request.

👤 gbuk2013
I like using https://c4model.com/ - the Level 2 diagram is particularly useful.

I use https://mermaid-js.github.io/mermaid/#/ for the diagram itself because Github natively supports it in markdown files, so you can revision control the diagram. I managed to get reasonably close to the C4 diagrams minus a few features that mermaid does not support.


👤 _dain_
No automated tool will come close to having a 5 minute conversation with the main designer and having him draw you a diagram freehand on the back of a napkin. This is a social and organizational and communication problem, not technical.

If that can't be done, there are some interesting things you can try. A lot of the suggestions in the thread are "top down" methods; you can get a lot of value out of "bottom up" visualizations too. Things like:

- Histograms of which lines / functions get called the most, or spent the most time in

- Which lines / functions / files get changed the most in the git history

- CPU flamegraphs

- Plain old print-debugging

In over-architected systems it can be difficult to figure out where the real "meat" of the code is, as opposed to the endless layers of configuration and wrappers and interfaces and indirection. UML diagrams may not help, or even be deceiving, but a stack trace never is.
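
As one example of a "bottom up" view, the "which files get changed the most" item can be approximated by counting paths in `git log --name-only --pretty=format:` output. A minimal sketch follows; the `churn` and `read_git_log` helper names and the sample log are made up, and renames are not followed.

```python
import subprocess
from collections import Counter


def read_git_log(repo="."):
    """Return the file names touched by every commit, one per line."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def churn(log_text):
    """Count how often each path appears in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:  # blank lines separate commits
            counts[path] += 1
    return counts


# Made-up log covering three commits, so the sketch runs without a repository.
SAMPLE_LOG = "src/app.py\nsrc/db.py\n\nsrc/app.py\n\nREADME.md\nsrc/app.py\n"
for path, changes in churn(SAMPLE_LOG).most_common(3):
    print(f"{changes:4d}  {path}")
```

On a real checkout, `churn(read_git_log("path/to/repo")).most_common(20)` gives a first guess at where the "meat" is.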


👤 nwsm
All three companies I have been at make heavy use of architecture diagrams, though their rules and prevalence vary from team to team. In fact I can't imagine a high functioning software organization taking a project from requirements gathering to production support without having ever made an architecture diagram.

However, teams usually do not follow any hard rules like UML. A box with text, a connecting line, and a grouping border can all mean anything, and the subtleties of their actual implementation are still trapped in the lines of code and minds of architects.

I would definitely recommend drawing diagrams of systems you create or work with. If there is not one at work, and you're struggling to grasp the big picture, start making one yourself. Start with the pieces you know and put in ambiguous boxes for the pieces you know exist but don't know what they do. I do this even for solo side projects when they have more than one physically-separated component. This will help you catch bad logic and inefficiencies early, and find the best implementations for new features down the road.

I have used draw.io (now diagrams.net), LucidChart, Microsoft Visio, and they all get the job done. I'll recommend the first one as it's the most open.


👤 ChrisMarshallNY
I use what I call "UML Lite" (UML, without all the frou-frou). It's handy for illustrating points[0].

What I generally do, is start with what I call a "napkin sketch" (which can be UML-ish)[1], and try to avoid writing down too much stuff, in order to reduce "concrete galoshes"[2].

I try to use tools like Doxygen and Jazzy, to document the code, in an inline fashion[3]. Doxygen will generate a UML "Lite" diagram[4].

[0] https://littlegreenviper.com/miscellany/swiftwater/the-curio...

[1] https://littlegreenviper.com/miscellany/forensic-design-docu...

[2] https://littlegreenviper.com/miscellany/concrete-galoshes/

[3] https://littlegreenviper.com/miscellany/leaving-a-legacy/

[4] https://doxygen.nl/manual/diagrams.html


👤 saila
I've worked at a variety of places and have never seen any kind of system or architecture diagram going in (should probably ask about this in the interview stage). Further, it often seems that no one understands the overall system well, in some cases even when there's an architect.

So I usually find myself in the same position of using jump-to-definition and trying to get a handle on things that way.

From there, I'll sketch a simple diagram on paper to aid my own understanding of whatever piece of the system I'm working on. Personally, I'd avoid software diagramming tools at this stage. A simple hand drawn diagram can be extremely helpful and doesn't take long to create.

After a while, I'll often start sketching out a more comprehensive system diagram with the database(s), processes, etc. At this point, I'd ask management about working on this because it can be a time consuming process that involves talking to a lot of people.

In my experience, this has usually been an uphill battle and somewhere between difficult and impossible to take on as an individual developer, but I think it's worth a try. Taking this kind of initiative could lead to career advancement, but I've often found that there's a lot of resistance and it's not seen as a priority.

It might seem a bit cynical, but as a junior or even intermediate developer, I'd also advise caution against stepping on anyone's toes. I'm not sure why, but some people tend to get defensive about this kind of thing.


👤 agentultra
Depends on what you want to understand about the system.

If you're looking to understand its properties and how it behaves, I would look into something more robust like Alloy 6 [0], which has a great visualization system for inspecting models. However, if you're looking for a class diagram tool then it's outside your wheelhouse.

There's also something I've played around with a bit but haven't used seriously: Moose [1]. It's basically an IDE for doing analysis of code. The trick is writing good parsers.

[0] https://alloytools.org/alloy6.html

[1] https://moosetechnology.org/


👤 KronisLV
> So, is it a good idea to try documenting the code design through some sort of visualization?

Yes, if it helps you understand how it works and how the pieces fit together.

No, if the previous is not all that useful for you (different types of learners), or you need to spend significant amounts of time doing it manually, especially given that code could change.

If you can, look into any tool that might allow you to get visualizations in an automated manner.

For example, JetBrains IDEs have a few different graph visualizations for dependencies and inheritance etc.: https://www.jetbrains.com/help/idea/2022.1/tests-in-ide.html...

There also used to be SourceTrail, though sadly the project is now retired: https://github.com/CoatiSoftware/Sourcetrail

For databases, you can also use external tools like DbVis: https://www.dbvis.com/features/

There are also a few tools here and there for visualizing networks or how container deployments look, but those are pretty situational/specific for each platform/setup.


👤 yboris
Specifically for TypeScript I created a CLI to visualize the call graph

https://github.com/whyboris/TypeScript-Call-Graph

Works for _functions_ not classes. I'm unsure how useful this tool is, but I suspect it might be helpful in some codebases.


👤 coward123
I spent a number of years as a traveling consultant fixing problems... I guess it was aligned with what the cool kids call Site Reliability Engineering these days. Some company would have a crisis and I'd go in and fix it. So basically I had a matter of a few hours to learn as much as I could about some huge system that had probably been built over many years by a whole lot of people.

To do this, I used a number of generally proprietary tools, where part of the project was getting the company to buy these tools and fix their practices so that perhaps they would avoid the problem in the future.

Anyway, my point is there are open source and proprietary tools out there in the world that will instrument an application and, generally based on usage patterns, will build out various visualizations of the application. While these are often sold or marketed as tools for operational triage, I always argued that the best practice was for developers to use these tools in order to both understand the performance characteristics of their work as well as look for unexpected interactions.

These tools will generally build out a topology of how various components interact and will often do other types of visualizations that show the class and method level chain of invocation. The good ones show, end to end across distributed systems, every bit of code that gets called from the time a user attempts some kind of action.

The benefit of this approach over a UML diagram or something similar is that it shows how the system was actually built and is working, rather than what a developer intended to be built. The larger the system and the more years / developers involved, the greater the delta.


👤 mr_tristan
Visualizations are just one aspect of documentation, so I would recommend looking into how you organize your documentation, and build visualizations to support that. This is, after all, what I think you are really going for: just better written explanatory documentation, with useful organization.

The divio system is a good place to start, IMO, when it comes to organization: https://documentation.divio.com/

So, I would treat a visualization used in an explanation-style document very differently from a reference guide. One is intended to illustrate a concept quickly, another is intended to be precise. I don't think you'll see a single "visual system" ever take over, largely because documentation can have very, very different goals for the reader.

Visualizations in reference guides are (unfortunately) rare. I happen to like the approach taken by project reactor, embedding visuals into java reference docs: https://projectreactor.io/docs/core/release/api/


👤 sokoloff
I led a project ages ago to divide a massively tangled code base into two parts to facilitate a separation of tech organizations into two (related) companies.

One of the hardest planning parts was “finding the gap” (where was the least unnatural way to place the split). To approach this, I pulled every function’s dependency tree (from code analysis) and all of our relational database entity dependencies (from DBMS telemetry) and all of our functions calling RDBMS code (also code analysis) into a large graph, which I then manipulated with a mix of hand-written tools and Gephi to create logical clusters that were both near-neighbors and business direction aligned (each business was going to have a particular focus, so not all divisions were equivalent).

I don’t recall exactly how long I spent, but it was in the month and a half range to get to a workable initial plan that we could start burning down.

That’s not live tooling to explore a system and is more archeological in nature, but may give you some ideas of how to auto-generate some terrain data.
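
To give a flavour of "finding the gap": once functions and tables are in one dependency graph, a candidate split can be scored by how many dependency edges it would cut. This is only a toy sketch of that scoring step, not the Gephi-based clustering described above; the function names, table names, and the `crossing_edges` helper are all made up.

```python
def crossing_edges(edges, side_of):
    """Return the dependency edges whose two ends land on different sides."""
    return [(a, b) for a, b in edges if side_of[a] != side_of[b]]


# Hypothetical function -> table dependencies.
EDGES = [
    ("checkout", "orders_table"),
    ("checkout", "payments_table"),
    ("refund", "payments_table"),
    ("catalog_search", "products_table"),
    ("recommend", "products_table"),
    ("recommend", "orders_table"),
]

# Candidate split A: commerce vs. discovery.
SPLIT_A = {
    "checkout": "A", "refund": "A", "orders_table": "A", "payments_table": "A",
    "catalog_search": "B", "recommend": "B", "products_table": "B",
}
# Candidate split B: same, but the payments table goes to the other company.
SPLIT_B = dict(SPLIT_A, payments_table="B")

for name, split in (("A", SPLIT_A), ("B", SPLIT_B)):
    cut = crossing_edges(EDGES, split)
    print(f"split {name}: {len(cut)} edges to untangle -> {cut}")
```

Each edge that crosses the split is a piece of work to burn down, so fewer is better, subject to the business alignment mentioned above.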


👤 rdubs333
I have been working on a natural design system for visually thinking about technology. Maybe it will help you.

It's called MTREES and it's a free epub on Apple or Google.

https://mtrees.io


👤 svjatoslav
I wrote a tool to generate diagrams from Java code: https://www3.svjatoslav.eu/projects/javainspect/

It produces diagrams like these: https://www3.svjatoslav.eu/projects/sixth-3d/graphs/

The advantage of this is that the diagram can be automatically updated from the latest code. Class discovery and visual layout are automatic.


👤 crdrost
A nice talk at kubecon last year spoke about visualizing protocols and more complex multi-party exchanges.

The suggestion it made was to look across module boundaries or microservice boundaries, while making a particular request. So I go to update a Project with a new Pipeline, and I find out that the project service talks to the pipeline service which then has to create resources for which it talks to various resource services, all of that.

Since there aren't great digital tools for doing this, for visualizing these things, it was suggested in that talk that one of the more outside the box things you could do is to just take cardboard and modeling clay and strings and cut out some shapes to represent the different services, let each string be an RPC, and the actual diagram you would imagine being alive in time, like little beads flying across these strings, to indicate requests going out and then responses coming back... But the key was less to witness the time but more to get a timeless sense of connection, “oh, it turns out ResourceService is very hairy in these diagrams, it is kind of the central hub for this microservice cluster.” and for that it was helpful that the visualization had a certain physicality to it, it had weight and structure and engaged more senses than just the visual...

I've actually thought about just digitizing these ideas, even though it misses half the point LOL. I think that could be some wonderful documentation and diagrams in our onboarding for new folks.
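
A first step toward digitizing the strings-and-cardboard model might be simply counting how many strings touch each service, which is enough to spot the "hairy" hub. A minimal sketch, assuming the RPCs for one request have already been recorded as (caller, callee) pairs; the three resource service names and the `connection_counts` helper are invented for the example.

```python
from collections import Counter


def connection_counts(rpcs):
    """Count how many RPC edges touch each service, as caller or callee."""
    counts = Counter()
    for caller, callee in rpcs:
        counts[caller] += 1
        counts[callee] += 1
    return counts


# Hypothetical trace of "update a Project with a new Pipeline".
RPCS = [
    ("ProjectService", "PipelineService"),
    ("PipelineService", "ResourceService"),
    ("ResourceService", "StorageService"),
    ("ResourceService", "ComputeService"),
    ("ResourceService", "NetworkService"),
]

for service, strings in connection_counts(RPCS).most_common():
    print(f"{strings} strings: {service}")
```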


👤 brudgers
The simplest thing that might work is a pen and a notebook.

And a jpg with an iPhone to digitize the sketch.

Documentation is a practice not a tool.

A shitty process can be improved.

Without process "the perfect tool" just sits collecting dust.

Or a person wants to start documenting, so they shop for documentation tools.

Now they have two problems.

Good luck.


👤 deathanatos
> I learned about UML in a course, but have never used it or seen it in practice as a junior developer. Sometimes I'll see a flowchart, but that's not too common. Is this the same in other companies?

Welcome to the industry! sobs in time constraints

> So, is it a good idea to try documenting the code design through some sort of visualization?

Yes, I think it's a good idea, though I'm afraid I don't have much advice on how to accomplish that.

The problem I've always had is that there are so many ways to cut up a system. Some people want higher-level architecture diagrams that show how all the various systems fit together. Some people want infra diagrams showing what VMs, DBs, cloud resources, etc., are all wired to what else. Some people want sequence diagrams detailing RPC/API calls between systems/components.

Invariably, for whatever documentation does exist, the person wanting documentation wants the diagram that doesn't exist.

I've tried PlantUML, but it is complete garbage when it comes to emitting useful diagnostics. Paired with a language that seems to be nothing but special case after special case, and the result is basically unusable.


👤 flurly
I'm a big fan of Terrastruct (https://terrastruct.com/). They focus on building great software for visualizing complex software architecture. Their secret sauce is this idea of attention where they allow you to zoom in and out so you can get the 10 foot view or 10,000 foot view, whatever you find most useful.

👤 kulikalov
Any system, however complex, can be designed as a bunch of "I/O devices" with abstraction layers. The trick is to keep each abstraction layer simple enough so that you can hold it in your mind (or fit it on a piece of paper). With this approach any kind of visualisation tool would work just fine - from a piece of paper to tools like Ilograph. I personally prefer excalidraw.

👤 raxor53
Why are all these comments suggesting new tools instead of answering your question? From what I've seen, high documentation only comes into higher level abstractions such as individual services or architecture design.

In school, I was taught to draw UML for classes within a program. I have never seen that IRL. I think the difference is the time required to comprehend the application.


👤 CitizenKane
In my experience it tends not to be done because there's an inevitable drift between the system as it actually operates and the relevant visualizations and diagrams. I have seen them used as a starting off point at times, although that's also been infrequent.

There is however a growing movement of being able to visualize how a system is functioning at various levels. XState/Statecharts are a good example (https://xstate.js.org/viz/). Another example in the Ops space would be https://github.com/spekt8/spekt8 for K8S. I work at Grafana and we're more or less trying to expose these things in ways that make sense. Our bread and butter is timeseries data but we're adding more in that regard (it's possible to build node graphs from running systems).


👤 w10-1
For OO structure, https://structure101.com is fantastic at global views, especially for refactoring, where it can help you plan how to transition from hairball to e.g., directed graph. It has a 30-day trial to show you it's worth the price.

Rolling your own visualization is useful and not too hard. It drives you to frame your questions more narrowly (call-hierarchy/sequence diagram, data flow, sub-systems?).

The fast-path for me is to generate the relation doublets/triplets (x --R--> y) and then let yFiles/yEd (https://www.yworks.com) lay it out hierarchically after converting to their tgf "trivial graph format". yEd GUI is free; the yFiles layout library is worth 10X the price if you're displaying lots of graphs.
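
For anyone who wants to try the triplet-to-TGF step, here is a small sketch. TGF is just numbered node lines, a `#` separator, then `from to label` edge lines; the `to_tgf` helper and the class names in the sample are assumptions for illustration.

```python
def to_tgf(triples):
    """Convert (source, relation, target) triples to Trivial Graph Format text."""
    ids = {}
    for source, _, target in triples:
        for name in (source, target):
            # Number nodes 1, 2, 3... in order of first appearance.
            ids.setdefault(name, len(ids) + 1)
    node_lines = [f"{node_id} {name}" for name, node_id in ids.items()]
    edge_lines = [f"{ids[s]} {ids[t]} {rel}" for s, rel, t in triples]
    return "\n".join(node_lines + ["#"] + edge_lines)


# Made-up relations of the form x --R--> y.
TRIPLES = [
    ("OrderService", "calls", "PaymentClient"),
    ("PaymentClient", "calls", "HttpTransport"),
    ("OrderService", "reads", "OrderRepository"),
]

tgf = to_tgf(TRIPLES)
print(tgf)
```

Saving the output as a `.tgf` file and opening it in yEd leaves the layout to the tool.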


👤 liveoneggs
The time of UML came and went very quickly. No one misses it.

These types of diagrams are either not specific enough to answer any useful questions or so dense and insane that no one can read them.

If you have a business requirement with a decision tree it is probably a good fit for this type of documentation - write it down in DOT or asciiflow or something
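
As an illustration of writing a decision tree down in DOT, here is a sketch that turns a nested Python structure into DOT text. The refund policy, the tuple/dict encoding, and the `tree_to_dot` helper are all invented for the example; only the DOT syntax itself is Graphviz's.

```python
def tree_to_dot(tree):
    """Render a nested (question, {answer: subtree_or_outcome}) tree as DOT."""
    lines = ["digraph decision {"]
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        if isinstance(node, str):  # a leaf is just the outcome text
            lines.append(f'    {node_id} [label="{node}", shape=box];')
            return node_id
        question, branches = node
        lines.append(f'    {node_id} [label="{question}", shape=diamond];')
        for answer, child in branches.items():
            child_id = visit(child)
            lines.append(f'    {node_id} -> {child_id} [label="{answer}"];')
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical business rule.
REFUND_POLICY = (
    "Within 30 days?",
    {
        "yes": ("Item unopened?", {"yes": "Full refund", "no": "Store credit"}),
        "no": "No refund",
    },
)

dot = tree_to_dot(REFUND_POLICY)
print(dot)
```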


👤 contingencies
Interfaces. A well designed system will present simple interfaces with clear delineations of responsibility. All elements will be carefully named to minimize the potential for confusion or misuse. In such a system, diagrams become not only easy but semantically meaningful. Programmers use this to manage complexity, and in business domains laypeople should usually also be able to comprehend the chosen abstractions.

To describe messages and related state transitions representing the function of a system over time, numbered arrows atop a block diagram can work well (essentially one visualization of a graph), but for more complex multi-component interfaces with a strong requirement for ordered messaging, message sequence charts are well received. https://www.mcternan.me.uk/mscgen/

There are two models of reality that I find to be the most useful ones, especially when writing programs. The first is functions, and the second is sequences of states. - Leslie Lamport

.. via https://github.com/globalcitizen/taoup - see also https://en.wikipedia.org/wiki/State_machine - and you won't regret learning http://graphviz.org/


👤 oedo
If you're interested in diagramming system architecture in a top-down fashion, I'll pile on another recommendation to check out the C4 model[0].

As others have mentioned, to effectively communicate a system you must limit the context to particular layers of abstraction, and C4 is a good approach to doing just that. There's also a C4 plugin for PlantUML[1].

But don't forget that, as with all visualizations, audience and purpose are key.

Consider whether you are addressing short-term needs (eg identifying inefficiencies, modeling for a client pitch) or long-term needs (eg knowledge retention, managing complexity). If your audience's needs are short-term, you can certainly get by with much simpler tools (eg Inkscape, excalidraw/draw.io/etc, picture of a whiteboard, doodles on a napkin).

Also consider whether or not you actually have a problem better served by bottom-up (ie generated) visualizations (eg ERDs for database schema refactoring, heatmaps for profiling, GraphViz for debugging DAGs).

[0] https://c4model.com/

[1] https://github.com/plantuml-stdlib/C4-PlantUML


👤 divan
Some thoughts and an attempt to visualize Go code in a meaningful way: https://divan.dev/posts/visual_programming_go/

And, tangentially related, visualizing concurrency: https://divan.dev/posts/go_concurrency_visualize/


👤 er_d0s
I’ve tried c4/plantuml (and loved the concept) but have never been able to make it stick, it always seems to get out of date quickly and having a feeling of “I’m not sure if this is up to date” is worse than no documentation at all (because you need to learn the documentation, then the behaviour). Honestly the fastest way I’ve found to understand large distributed systems is by observing the interactions between them, schema detection in service bus type architecture and tools like X-ray (in aws) or open tracing give you a good picture of “this thing produces or consumes this type of message”

For individual systems I think a good inversion of control/dependency injection system can give you a good overview of the connections between components.

If I’m stepping into a giant ball of spaghetti code I generally set the debugger at line 1 and start stepping, and draw a lot of boxes and lines as I go… then throw those drawings away when I’m done. They’re really only helpful while you’re producing them (like taking notes as you read a textbook), hopefully in that situation you improve the code as you go so this approach isn’t as necessary!


👤 edpichler
UML is not a documentation tool, it's a communication tool. It is a standardized way to communicate. It removes ambiguity. UML is the best way to manage, communicate and handle large amounts of complexity. Most of the free UML tools are very basic and do not have traceability features (we can see many in the comments), and this is really a no-go if you want to understand large systems.

👤 jimmytehbanana
I’ve done the print and trace for bizarre bugs while trying to understand the code. I’ve also used UML to work through designing something, or even changing the existing design. UML can be super helpful in working out programming problems without having to spend time writing the code. Your design problems can be worked out quickly with UML. Once you get your design work done and settle on an implementation, the individual class diagrams help determine your order of operations and what you may need to do for the refactoring (which, again, you can plan out in UML). Executing the plan becomes a matter of implementing your UML designs.

I’ve recently realized that UML is no longer something new developers are aware of. It’s a useful tool in reducing your time for solving problems since you won’t have to write the code as you work through them.


👤 WillAdams
"What does an algorithm look like?"

I'm an intensely visual person, but have never found a visual programming system which scales well --- the problem is, past a certain level of complexity one has to use modules, which then devolves the visual representation down to just a bunch of named blocks.

That said, I'm using BlockSCAD:

https://www.blockscad3d.com/community/projects/1421975

to work up designs which I'm then putting into other tools.

Looking at GraphSCAD:

http://graphscad.blogspot.com

and there's also Ryven and pythonocc which I managed to get installed:

https://ryven.org

https://github.com/Tanneguydv/Pythonocc-nodes-for-Ryven

but I'd really like to see a tool for this sort of thing which made G-code.


👤 cwoolfe
I remember having the same problem earlier in my career. I inherited a code base that was a million lines of code and 15 years old. Things I tried: printed out the nastiest 2000 line function on a giant piece of paper, logged every function entry. Things that worked well at the time: reading filtered logs of actual code execution, tracing through code to see how it works. What I would do if I could talk to me 10 years ago: Don't worry about the big picture. Trace through the code path relevant to your task and make sure you understand whatever inputs/outputs there are on that path. If there's something that makes 0 sense, ask a senior dev. After tracing through enough paths, you'll start to develop a mental map, in the same way you develop a mental map of a geographical location, after visiting about five locations in the same area. You get to know the main roads, highways, etc. You have a knowledge of the relevant paths; you don't always need a map of the whole city.
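
For what "logged every function entry" can look like without editing each function, here is a sketch using Python's `sys.setprofile` hook. It is an illustration in Python, not what was done on that code base; the `trace_calls` helper and the tiny sample functions are made up.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions entered, in order)."""
    entered = []

    def profiler(frame, event, arg):
        # "call" fires on entry to Python functions; C functions report "c_call".
        if event == "call":
            entered.append(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always switch the hook off again
    return result, entered


# Made-up code path to trace.
def to_number(field):
    return int(field)


def parse(text):
    return text.split(",")


def total(fields):
    result = 0
    for field in fields:
        result += to_number(field)
    return result


def handle(text):
    return total(parse(text))


result, entered = trace_calls(handle, "1,2")
print(result, " -> ".join(entered))
```

Filtering `entered` down to the modules relevant to the task gives the kind of filtered execution log described above.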

👤 vaughan
Seeing as there are so many tools and no one is using them, it makes me think that they are just too hard to maintain separately from the system itself. Keeping a readme up to date is hard enough.

A good tool needs to be automatically generated from code which makes me think it needs to be integrated in a framework. Or the code needs to be written in a way that is easily visualizable in the first place.

Code comments seem to be the only way to add structure while ensuring the diagrams are kept up to date.

I think there is room for future frameworks that are built with visualization and its necessary tooling as a first class priority. When have you ever seen a framework claim it is easy to visualize and understand? Existing software is too much of a mess.


👤 masterofmisc
Sequence Diagrams!! They are awesome. You get to see all the entities involved in the architecture and how they interact with each other.

Also https://sequencediagram.org/ allows you to describe them in text and creates a diagram for you.

It's what I have been using.


👤 mtoddsmith
The most important rule is that you can’t include everything in a single doc as it quickly gets too big and complicated.

So you need to break it down into multiple documents that cover different use cases and include only the components for a small set of use cases.

Then use the C4 model to break up the docs.


👤 hammeiam
I took a stab at writing my own UML-type diagram in Python using networkx and rendered with dot. Since it's a real network instead of just pictures, I can slice it however I want (eg "show me all dependencies of page X" or "show me all nodes of type 'state'"). It's still a work in progress, but it's helped me think through some things.

https://github.com/hammeiam/saddle-data-graph/blob/master/Sa... (scroll down for images)
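
The two example slices are easy to picture in code. This sketch is not taken from the linked repository and deliberately uses plain dicts and lists instead of networkx so it runs with no installs; the node names, types, and both helper functions are hypothetical.

```python
def dependencies_of(edges, start):
    """Return every node reachable from start by following dependency edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for source, target in edges:
            if source == node and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def nodes_of_type(node_types, wanted):
    """Return the sorted names of all nodes tagged with the wanted type."""
    return sorted(name for name, kind in node_types.items() if kind == wanted)


# Hypothetical front-end graph: pages depend on state, state depends on queries.
NODE_TYPES = {
    "SwapPage": "page",
    "PoolPage": "page",
    "poolBalances": "state",
    "tokenPrices": "state",
    "fetchPool": "query",
}
EDGES = [
    ("SwapPage", "poolBalances"),
    ("SwapPage", "tokenPrices"),
    ("poolBalances", "fetchPool"),
    ("PoolPage", "poolBalances"),
]

print(sorted(dependencies_of(EDGES, "SwapPage")))  # all dependencies of page X
print(nodes_of_type(NODE_TYPES, "state"))          # all nodes of type 'state'
```

The resulting node set can then be passed to whatever renders the picture, so each question gets its own small diagram.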


👤 cyneox
I mostly use PlantUML and C4 Model (by Simon Brown). I have my own collection of PlantUML related resources [1]. I've also recently found these examples [2] which are really nice.

I also do some sketchnoting (using pen & pencil) when I design a system for the first time. I wish I could easily convert those sketches to PlantUML.

[1]: https://brainfck.org/#PlantUML

[2]: https://skyksit.com/programming/uml/plantuml-samples/


👤 TreyS
Different types of diagrams highlight different aspects of a system.

If you can, try to put together 2 different diagrams presenting different views. For example, sequence diagrams, model diagrams, etc. each tell you something different. Having a few different perspectives on the same system will give you a richer understanding.

Not specifically related to visualization, but, also, if you spend a lot of time trying to understand a piece of code, once you understand it, document it (e.g. in a function or method comment). Over time, this makes a huge difference and will help you and others out when you revisit the code later.


👤 gabereiser
1000% yes it’s a good idea.

As a junior dev, you shouldn’t have to be worried by such things. This should be handled at the senior/principal level.

If you aren’t getting use case diagrams at the very least, you don’t have a clear definition of done.


👤 sno6
I think a _part_ of the problem is due to being confined to the dimensions of your screen. If I'm reading code in one file which relies on context from another file, I can bring them up side by side in my editor, but the value in doing that diminishes with the more files / context I need in order to understand something.

Sometimes I wish I could project my IDE onto the wall behind my monitors. Programming in VR would probably achieve the same thing, but I'm not ready to move into that world yet.


👤 djbusby
We like using PlantUML and it embeds in Asciidoc nicely.

Also the C4 Model seems cool but I don't currently use it

https://c4model.com/


👤 candiddevmike
If you can get OpenTelemetry/Jaeger/etc integrated with the codebase, you'll be able to visualize requests really easily, especially if you sample everything (in dev).

👤 fasteddie31003
I'm a visual learner. I wanted a way to see how documentation of complex systems looked too. I'm working on https://gainknowhow.com/software-companies.html. It is basically like a mind map for documentation. The high-level documentation, like core values, is on top, connected through learning paths to learn the complete context of low-level skills.

👤 theocodesitall
I recently started using https://onemodel.app and it really makes diagramming super fast and easy.

👤 Aeolun
I never make diagrams for entire codebases. I do make diagrams for specific logical flows (still not exactly mirroring the code, just the logic behind it).

I would never use UML.


👤 rramadass
Visualization of a code base (using anything) is absolutely a must for System comprehension. You need to look at both the Static structures/call graphs (use Doxygen, any UML tool, CFlow/Codeviz etc.) and Dynamic (i.e. Runtime) call graphs (use a profiler call graph, Debug Tracing within the App, etc.) for a code base. Print out relevant diagrams/graphs and liberally add notes as required.
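
As a small illustration of the static side, here is a sketch that builds a call graph from source text with Python's standard `ast` module and prints it as Graphviz DOT. It is a stand-in for the tools named above, not a replacement: it only sees direct calls by plain name inside top-level functions, so method calls and dynamic dispatch are missed. The analysed `SOURCE` snippet is invented.

```python
import ast

# Invented code to analyse; in practice this would be read from a file.
SOURCE = '''
def load():
    return [1, 2, 3]

def transform(x):
    return x * 2

def pipeline():
    return [transform(x) for x in load()]
'''


def static_call_graph(source):
    """Map each top-level function to the sorted plain names it calls."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
            graph[node.name] = sorted(callees)
    return graph


def to_dot(graph):
    """Emit one DOT edge per caller/callee pair."""
    lines = ["digraph calls {"]
    for caller, callees in graph.items():
        for callee in callees:
            lines.append(f"  {caller} -> {callee};")
    lines.append("}")
    return "\n".join(lines)


print(to_dot(static_call_graph(SOURCE)))
```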

👤 joe8756438
One technique that improved my process a lot, particularly for system-level problems, is swim lane diagrams. I worked with someone that created them for _everything_, and at first it irritated the crap out of me because for some things I felt it took longer to write the diagram than to write the code. But, when things are complicated it is so helpful to see a process overview with minimal path ambiguity.

👤 golf_mike
> ...documenting the code design...

I would say that code is an implementation of (part of) a design, and both implementation and design should be documented appropriately. If it's not present, make as much text and as many visualizations as you need to understand it. As you are a developer, start with the code. I prefer C4 over UML for the separation of contexts. Good and fun exercise!

👤 evo_9
I would take a look at Wardley Mapping: https://learnwardleymapping.com/

It's interesting because it attempts to capture domain knowledge isolated from the programming-centric view/metaphor that makes it challenging to discuss complex systems with staff other than programmers.


👤 ThinkBeat
I am fond of UML diagrams for the programming part and UML or ER for the database schema.

There are a lot of great tools that can reverse engineer code -> UML and databases to ER or UML. (and forward engineer UML or ER -> code or database)

I find them quite helpful.

If someone has taken care to keep up UML diagrams, and I can grab the blessed ones with more context that is great.


👤 qxxx
I am a php developer and sometimes if I work on a complex project I use xdebug + profiler + cachegrind to generate flow diagrams. It shows not only how often a method was called and how much memory it used but also where the call came from and what happened next. It helps sometimes to understand the code better. I use it mostly for debugging.
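
The same "who called what, and how often" view can be approximated in other languages. Below is a minimal Python sketch using the standard `sys.setprofile` hook; it is not xdebug or cachegrind, it records call counts only (no memory or timing), and the traced functions are invented for the example.

```python
import sys
from collections import Counter


def trace_calls(func, *args, **kwargs):
    """Run func and count (caller, callee) pairs for Python-level calls."""
    edges = Counter()

    def profiler(frame, event, arg):
        # "call" fires for Python functions; C functions raise "c_call" and are skipped.
        if event == "call":
            caller = frame.f_back.f_code.co_name if frame.f_back else "<root>"
            edges[(caller, frame.f_code.co_name)] += 1

    sys.setprofile(profiler)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return edges


# Invented workload to trace.
def load():
    return [1, 2, 3]


def transform(x):
    return x * 2


def pipeline():
    results = []
    for item in load():
        results.append(transform(item))
    return results


for (caller, callee), count in trace_calls(pipeline).most_common():
    print(f"{caller} -> {callee}: {count}")
```

Each printed line is an edge of the runtime call graph with its hit count, which is enough to feed into a DOT file or just read directly while debugging.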

👤 qrian
Recently stumbled upon [C4 Model](https://www.youtube.com/watch?v=x2-rSnhpw0g), which seemed very promising. It builds on the failure of adoption of UML, and aims to be practical and communicative.

👤 douchescript
I like to use code coverage tools to understand how things work together. It's especially useful with unit tests. You can see every line that was hit all over the codebase for a particular function or API call, with a heat map of line counts. Makes it easy to know what’s important and what’s cruft.

👤 silasb
Not a large system, but we started using https://backstage.io to help visualize our software. Additionally, we're hoping to leverage backstage plugins to provide tooling around ownership for our components.

👤 kbrkbr
For what it’s worth: I have good experiences with OPM [0], but not at very large scale. [0] https://en.m.wikipedia.org/wiki/Object_Process_Methodology

👤 otras
I gave this a bit of thought earlier this year (https://alexanderell.is/posts/visualizing-code/) and, with the help of HN commenters, collected a small list of ways people are working to help with code visualization. I don't think most of them are production ready (some are just research papers), but you may find them interesting all the same.

SoftVis3D (https://softvis3d.com/): where a "‘code city’ view provides a visualization for the hierarchical structure of the project".

Code Park: A New 3D Code Visualization Tool (2017) (https://arxiv.org/pdf/1708.02174.pdf), a “novel tool for visualizing codebases in a 3D game-like environment” with code represented as “code rooms” with code on the walls.

Code Structure Visualization Using 3D-Flythrough (2016) (https://opus-htw-aalen.bsz-bw.de/frontdoor/deliver/index/doc...), with spatial metaphors and first-person exploration of code.

Primitive (https://primitive.io/), a VR collaboration startup with a Matrix-looking “Immersive Development Environment” with “new tools for visually analyzing software in 3D”.

AppMap (https://appland.com/docs/how-to-use-appmap-diagrams.html), an automated code analysis tool that includes dependency maps and trace views.

plurid (https://github.com/plurid/plurid), a framework for visualizing and debugging code in a 3D explorable structure.

fsn (file manager) (https://en.wikipedia.org/wiki/Fsn_(file_manager)), an experimental application to view a file system in 3D (featured in Jurassic Park).


👤 scothalverson
I have found success with Scitools' Understand (https://en.wikipedia.org/wiki/Understand_(software))

👤 spotlesstofu
I love the open source diagrams.net (previously draw.io). I save the files as editable SVG so I can include them directly in markdown READMEs. I use diagrams both to help me design and to show during demos

👤 ultra_nick
I've found dataflow analysis to be more helpful than other methods when designing and debugging systems.

However, there are very few automated polyglot methods. So, often, I must trace the data paths manually.


👤 bsedlm
I want to make a computer runtime which doesn't execute the program but instead allows for a programmer to understand what's going on.

a metacomputer of sorts; maybe epicomputer would be a better name?


👤 a-dub
for existing codebases, doxygen will produce static call graphs for c/c++ and java codebases using dot graphs that are hyperlinked back and forth to lxr style code listings (you do typically have to turn this on as it's expensive to compute and is turned off by default.)

i always found uml and other object style diagrams to be kind of obfuscating and encouraging of overcomplicated oop designs.


👤 skadamat
Glamorous Toolkit is doing some interesting work here:

https://gtoolkit.com/


👤 8note
I typically see data flow, data model, and sequence diagrams.

Keep them small: if you need more than 7ish boxes, your model is too detailed.


👤 marcosdumay
In my experience, software engineering has only ever created 4 types of standardized diagrams that are useful: entity-relationship, data-flow, message and state-transition. Most share the feature that they communicate data, instead of code.

The standard data-flow diagrams are too low level to be useful nowadays, but it's trivial to remap their symbols onto the high-level components our modern distributed systems use. But still, for most people they will just show a database, an application transformation, and the user's output, so they are only useful when you have something different from that to show.

Entity-relationship diagrams were technically included in UML, but the entire world disagrees, so you are better off thinking about them as a separate class. Those are in wide use.

Message diagrams (included in UML as sequence diagrams) are very useful for designing and analyzing protocols, but the UML one is a bad fit for that use due to its serial appearance. If you look at distributed systems papers, you will see a version with oblique arrows instead of horizontal ones that doesn't imply two-sided links, serial communication or even a total order on the messages.

Finally, state machines are a really useful architecture pattern, and state-transition diagrams capture them very well.
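
To make that last point concrete, here is a small sketch, assuming a hypothetical order lifecycle, where one transition table both drives the state machine and generates its diagram as Graphviz DOT. The states and events are invented.

```python
# Hypothetical order lifecycle: (state, event) -> next state.
TRANSITIONS = {
    ("created", "pay"): "paid",
    ("created", "cancel"): "cancelled",
    ("paid", "ship"): "shipped",
    ("paid", "refund"): "cancelled",
    ("shipped", "deliver"): "delivered",
}


def step(state, event):
    """Return the next state, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def to_dot(transitions):
    """Emit one labelled DOT edge per transition."""
    lines = ["digraph order {", "  rankdir=LR;"]
    for (src, event), dst in transitions.items():
        lines.append(f'  {src} -> {dst} [label="{event}"];')
    lines.append("}")
    return "\n".join(lines)


print(step("created", "pay"))
print(to_dot(TRANSITIONS))
```

Since the diagram is generated from the same table the code executes, the picture cannot drift away from the behaviour.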

Every other diagram that I've seen is either completely ad hoc or would be improved by replacing it with text. I really miss some "central architecture diagram" that shows the important structures in your code and how they interact (UML has structure as class diagrams, those suck), but I have never seen any good implementation of this one.


👤 m12k
In 15 years in the industry, the only UML I have seen outside of a classroom has been made ad-hoc, on a whiteboard during a discussion. There would usually be no strict adherence to whether a square or "blob" was a class, an object, a user, a database or a concept, nor whether a line between these with an arrow at the end meant "inherits from", "knows about", "has an instance of", "sends data to", "calls a function on", "contacts with a network request", "is transformed into" or something else - all these details would usually just be cleared up from the context, or with text next to the lines. There was thus also usually no strict distinction between class diagram, sequence diagram, flowchart or other diagram - the diagram would just be whatever it needed to be in a given area in order to convey the information being discussed. In short it was highly informal, and used as a means of communication or brainstorming, rather than documentation.

I have a feeling there might be more UML in parts of the enterprise world, or in places with "architects" that have forgotten (/never learned) how to code. But my guess is the above is what the majority of "UML" usage in the industry looks like.

As for your question about documentation: The best advice I can give is to be mindful that documentation tends to get outdated faster, the further away it lives from the code that it documents. That's why "self-documenting" code is so valuable, since the compiler will often refuse to compile it if you don't update it. Thus I would always advise to use the "closest" documentation form that is suitable, in roughly this order:

- The code itself

- The tests

- Comments in the code

- Comments at the start of files

- Readme.md

- Other files in repo

- Wiki that lives alongside the repo (e.g. GitHub)

- External wiki (e.g. Confluence)

In your specific case, it sounds like you need to document the architecture/design in a way that needs to be understood before someone would even know which file to look in for further documentation. In that case, I would first attempt to describe the architecture in words in Readme.md. E.g. something like a list of the most important classes/functions/datatypes and a paragraph or two about how each relates to the others.

If something like that isn't helpful enough, and you want to make some UML, consider using something like asciiflow.com or textik.com to make ASCII diagrams to put in the readme. If the info doesn't fit that format, consider making diagrams in images that can be shown inline in the readme - ideally if you do this, use a format where you can also check in the "source" of the image (e.g. a graphviz DOT-file) to make it easier to update. And then finally, if all else fails, you can either check in a PDF, or use a wiki. But as mentioned above - prefer the "closer to the code" solution whenever possible.


👤 theincredulousk
I had a systematic way of doing this for reverse engineering a large undocumented system. First you get first principles in mind - there is control flow (execution), data (files, databases), and communication interfaces (IPC, network, etc.). You don't need a sophisticated modeling tool - anything Visio-esque (Gliffy, draw.io, etc.) will do.

For these things you know there are systematic ways of finding them - for mine it was a C/C++ Project so:

1. Find all executables via build output, or in the running system. For now you're largely going to ignore the details of what the code is doing. You just want to know what is actually "running" at runtime :).

2. Figure out where the entry points to those executables are, like a main. These are usually easy to search for or discover by convention.

3. Find out what threads it spawns.

4. Start a simple diagram with a box at the top named for the executable, and branch down to one box for each thread. Manually trace control flow for each thread, adding boxes at points you think are noteworthy logical units. E.g. often threads will have some kind of main loop they sit in, which is a key element for understanding what that thread is doing.

5. Continue for nested threads and worker (short lived but not ephemeral) threads.

Once you complete this, you should have an abstract block diagram that gives a decent map of "What code is running in the system". And just through the process of naming and looking over, maybe a rough idea of what the various pieces of software are doing and possibly how they relate.
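
If a drawing tool feels like overkill at first, the findings from steps 1 to 5 can start life as plain data and be printed as an outline. This is only a sketch of that bookkeeping: the executable, thread, and logical-unit names below are all invented.

```python
# Hypothetical findings: executable -> thread -> noteworthy logical units.
SYSTEM = {
    "sensor_daemon": {
        "main thread": ["parse config", "main loop: poll sockets"],
        "reader thread": ["main loop: read serial port", "push to queue"],
    },
    "ui_server": {
        "main thread": ["main loop: accept connections"],
    },
}


def outline(system):
    """Render the executable/thread/unit hierarchy as an indented outline."""
    lines = []
    for exe, threads in system.items():
        lines.append(exe)
        for thread, units in threads.items():
            lines.append(f"    {thread}")
            for unit in units:
                lines.append(f"        {unit}")
    return "\n".join(lines)


print(outline(SYSTEM))
```

The same dictionary can later be turned into boxes and arrows once the shape of the system has settled.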

You can then repeat this for the other basics in a similar fashion - data and communication interfaces. It's good to emphasize staying at a first-principles kind of abstract mindset. You know there are a finite number of ways a process or thread can communicate or create side-effects outside of itself. If you literally just find all of them (not the details of what is happening over those interfaces), usually it ends up being quite few, and all of a sudden the complexity becomes less intimidating. You have a little box that does some manipulation of data via logic and state, and it goes in one pipe and out the other.

I should point out that how difficult all this is largely derives from those "coding practices" that get droned on about for benefiting maintainability, but so often get tossed. For example, say your system uses message IDs as part of an IPC mechanism. If the code followed good practice, using some kind of constant definitions shared from a single place, you can now do things like search for that message identifier and find all places it's sent/received. If some code used its own re-definition of the same ID, or hardcoded just the raw numerical value, this becomes nearly impossible.
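
A minimal sketch of the "shared from a single place" practice, written in Python rather than C/C++ for brevity. The message names, numeric values, and one-byte framing are assumptions made up for the example.

```python
from enum import IntEnum


class MessageId(IntEnum):
    """Single shared definition of IPC message IDs (hypothetical values)."""
    HEARTBEAT = 0x01
    SENSOR_READING = 0x10
    SHUTDOWN = 0xFF


def encode(msg_id: MessageId, payload: bytes) -> bytes:
    """Sender side: the first byte is the message ID, the rest is the payload."""
    return bytes([msg_id]) + payload


def decode(frame: bytes):
    """Receiver side: recover the named ID instead of comparing raw numbers."""
    return MessageId(frame[0]), frame[1:]


def handle(frame: bytes) -> str:
    msg_id, payload = decode(frame)
    if msg_id is MessageId.SENSOR_READING:
        return f"reading: {payload.hex()}"
    if msg_id is MessageId.SHUTDOWN:
        return "shutting down"
    return f"ignored {msg_id.name}"


print(handle(encode(MessageId.SENSOR_READING, b"\x2a")))
```

Searching the codebase for `MessageId.SENSOR_READING` now finds every sender and receiver, whereas a hardcoded `0x10` scattered around would be nearly impossible to tell apart from any other 16.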

Also you'll need multiple diagrams. You won't be able to clearly show a complete "code execution" diagram at the same time as an interface relationship diagram or shared data sources diagram. The complexity of it will not help, it will just be more overwhelming complexity.


👤 anoncept
You might find it helpful to distinguish between visualizing the design of the system being implemented by your software, visualizing protocols being implemented by your software, visualizing the design of your software implementation itself, and visualizing important implementation details at runtime, e.g. for debugging, profiling, and operations.

For visualizing system designs, you should take a look at STAMP, e.g., via “Engineering A Safer World” + the resources at mit.edu/psas + on YouTube.

(Multiple tools, both commercial and libre, exist and are being developed to make these diagrams, although for what it’s worth, I mostly hear about people making them using draw.io, Google Drawings, on physical paper/whiteboards, or occasionally with specialized tooling.

I have also recently published a project in this area, https://github.com/mstone/depict, which I believe is well on its way toward addressing some unmet needs here.)

For visualizing protocols, things like sequence diagrams, data flow diagrams, DRAKON flow charts, value stream maps, and occasional more specialized objects like CPSA “cryptographic protocol shapes” / strand space skeletons are where I start depending on the flavor of what’s needed.

For visualizing the design of implementations themselves, I have not yet seen anything that I feel obliged to recommend; rather, here, I suggest investing in adding illustrations to your existing documentation in whatever way is easiest for you to use to clarify whatever subtleties you need to clarify for your audience.

(Here I tend to look at things like ASCII-art, SQLite’s railroad diagrams (now made with pikchr, AIUI), and sequence diagrams, as mentioned by other commenters, as helpful examples to start with.)

Finally, for implementing debugging/profiling/operational illustrations, there is such a rich set of examples to turn to (from the very specialized, such as custom process model video rendering pipelines in robotics, to TensorBoard for TensorFlow, to general-purpose tools like browser performance debugging suites, flame charts, or Go’s built-in profile graphing tools) that rather than learn any particular such tools, I’d instead suggest trying to get comfortable with the building blocks underlying these systems, which include contemporary GUI/web apps, custom drawing and animation tools like SVG, pretty printers, and Grammar-of-Graphics systems like vega-lite.

(Note: although it may seem superficially extraneous to your question, the reason I also suggest thinking about debugging visualizations in this context is because IMO, to work, they ~necessarily encode a visual model of the design of your implementation since it is the design of the implementation that provides the vocabulary and relationships that have to be understood and navigated in order to successfully debug/optimize/monitor any given running instance of whatever system you are building.)