1. Quickly learn the domain and context of the application
2. After adding a feature, we should be aware of whether we broke anything (assume you work with code that doesn't have test cases); it helps even to search for test cases
3. Find similar code and ensure you are improving quality of the overall similar code (not just fixing current bug)
4. Understand how application behaves when there are production issues.
Most often in my career I deal with a large inherited codebase, and we often need to search for similar code or for usages of a certain variable, function, class, or module. With a statically typed language, the IDE/compiler helps to a certain extent. But we have to deal with different languages, and sometimes developers copy/paste for various reasons. Searching/grepping code and its usages seems to be very useful for all of these reasons.
As a developer, what are all the ways you search source code before working on a feature or fixing a bug? Do you use any CLI tools other than grep? I have used OpenGrok, but sometimes the instance is not maintained by me or by other developers.
Below are my steps.
1. Read the relevant code and learn the domain keywords and variable names (including class/method/function names)
2. Use the bitbucket/GitHub/git search
3. Use grep
4. Use git grep
Still, a few times I end up missing things. This seems like a very valuable skill to have (especially CLI-based search). Do you have any tips/tools for other developers?
Septum is neighborhood based (context-based) search, so you can find contiguous groups of lines which contain specific things, but exclude other things. It's also interactive so you can add/remove filters as needed. This makes it useful for those cases where terms change based on their context so you can exclude terms related to the contexts you don't want to keep. It reads .septum/config which contains its normal commands to load directories and settings, so you can have different configs per project you're working on.
- Sourcetrail (GUI/Linux/Windows, closed-net-capable, archived) - https://github.com/CoatiSoftware/Sourcetrail
- SourceInsight (repo/web-server/closed-net) https://www.sourceinsight.com/
- OpenGrok (web server plug-in/Java/closed-net) https://oracle.github.io/opengrok/
- CLion (GUI-based IDE) - By IntelliJ/JetBrains - https://www.jetbrains.com/clion/
- SourceGraph (Web-based) https://about.sourcegraph.com (thanks, gravypod)
- Codesee.io (GitHub/web-based) - https://www.codesee.io/privacy-and-security
For free as in beer, I prefer OpenGrok so I can get more than JetBrains
https://codesearchguide.org/story/google
https://codesearchguide.org/story/facebook
https://codesearchguide.org/story/brave
https://codesearchguide.org/story/chromium-android
https://codesearchguide.org/story/linux
https://codesearchguide.org/story/yelp
https://codesearchguide.org/story/stripe
The Google one in particular has a great breakdown of how they use code search by use case (examples, exploration, etc.).
And here are a bunch of known code search tools: https://codesearchguide.org/tools
(Disclaimer: I am the Sourcegraph CEO and our core product is code search.)
Honourable mentions to cscope and ctags. They work for me since most of my $dayjob involves me mucking around with C++.
All tools get invoked from within Vim. (Which _also_ works reasonably well in Windows Terminal).
At the root level I maintain two scripts:
clone.sh
update.sh
clone.sh has one git clone --recursive .. line per repo.
When I run low on disk space I sometimes delete larger repos.
The clone script allows me to easily re-clone everything in this case.
update.sh is similar but pulls all repos.
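The two scripts could be folded into one; the following is a rough Python sketch of that idea, not the scripts described above. The `REPOS` mapping, the example URL, and the function names are hypothetical. It clones a repo when its directory is missing and pulls it otherwise.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Hypothetical repo list: directory name -> clone URL.
REPOS: Dict[str, str] = {
    "service-a": "https://example.com/org/service-a.git",
}


def sync_commands(root: Path, repos: Dict[str, str]) -> List[List[str]]:
    """Build one git command per repo: clone if missing, pull otherwise."""
    commands = []
    for name, url in sorted(repos.items()):
        target = root / name
        if (target / ".git").exists():
            commands.append(["git", "-C", str(target), "pull"])
        else:
            commands.append(["git", "clone", "--recursive", url, str(target)])
    return commands


def sync(root: Path, repos: Dict[str, str]) -> None:
    """Run the planned commands, stopping at the first failure."""
    for cmd in sync_commands(root, repos):
        subprocess.run(cmd, check=True)
```

Keeping the command planning separate from the execution makes it easy to print the plan first, which helps when deciding which large repos to delete and re-clone later.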
For global search across all branches I do:
git grep <pattern> $(git rev-list --all)
(When I forget the line I look it up on Stack Overflow [1].)
This is especially useful since I work a lot with Bitbucket and to the best of my knowledge you can only search the default branches there.
When I know the branch, but want to search across all history I use git pickaxe, aka
git log -S ...
All of this is not very sophisticated and takes a lot of disk space but it works pretty well for me.
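On repos with a long history, passing every revision on one command line can exceed the shell's argument limit. Below is a small Python sketch of the same all-revisions search that feeds the revisions to `git grep` in batches. The function names and the batch size of 500 are my own assumptions, not something from the Stack Overflow answer.

```python
import subprocess
from typing import Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def grep_all_revisions(pattern: str, repo: str = ".", batch_size: int = 500) -> List[str]:
    """Run `git grep` for `pattern` against every commit reachable from any ref."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    matches: List[str] = []
    for chunk in batches(revs, batch_size):
        # git grep exits with status 1 when nothing matches, so no check=True here.
        result = subprocess.run(
            ["git", "-C", repo, "grep", "-n", "-e", pattern, *chunk],
            capture_output=True, text=True,
        )
        matches.extend(result.stdout.splitlines())
    return matches
```

Each returned line starts with the revision the match was found in, so the same hit usually shows up once per commit that contains it.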
It's surprising to me how much effort is not being put into whole-org code search. Most projects focus solely on single-repo search. If you need to make breaking changes or find examples and you don't even know where to look, single-repo search isn't so useful.
For example:
    def foo(args):
        trace_func("foo", args)
        # rest of the function
(NOTE: There is call tracing logic built into some languages but it doesn't always work for some complex code bases; try it before you write your own.)
If the code creates HTML elements, my script adds attributes to the HTML element to link back to the source code location so I can look at the HTML and figure out where the elements were created. If the HTML is built using templates, then I add HTML comments to the template so I can tell where they are used in the final page.
Then I test the app and look at the traces to figure how it works.
At first I trace everything but once I get to know the code I add the tracing to the areas that matter. I don't check in this code.
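For Python specifically, the inserted call could also be applied as a decorator so the function body stays untouched. This is only a sketch of one way to do it, not the script described above; the body of `trace_func`, its output format, and the `traced` decorator are assumptions.

```python
import functools
import sys


def trace_func(name, args):
    """Print one line per call to stderr; the format is made up for this sketch."""
    print(f"TRACE {name} args={args!r}", file=sys.stderr)


def traced(func):
    """Decorator form of the inserted trace call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace_func(func.__name__, args)
        return func(*args, **kwargs)
    return wrapper


@traced
def foo(x, y):
    # rest of the function
    return x + y
```

Calling `foo(1, 2)` then writes a trace line to stderr before the function runs. Because the decorator is a single line per function, it is easy to strip out again before committing.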
What if any frameworks and libraries is it using? Try to identify particularly core frameworks that tend to dictate the whole workflow of the application. Many frameworks have standards of file organization and system architecture that can help you get a handle on what goes where. They may not always have been used properly, but it's a start at least. It might even help to set up a small learning project in that framework just to get to know it better. There may also be libraries in use that influence a lot of how the application does whatever it does.
Trace control flows of the application. How does it start? Do any other processes get started in addition to the main application? Learn how to do the workflow you need to modify, or the closest one to it if you're making a new one. Trace how the command to do X first gets into the application (API call? GUI button press? Some kind of messaging system trigger?), and try to follow the code to see what it does and how it does it.
Trace data flows. Where does the application store critical data, and how does that data actually get picked up from there, transformed, and eventually used, to present to the user or get transformed and handed off to some other system or whatever?
Text search of the codebase can be useful. In strongly-typed languages, often IDE tools are better at jumping straight to the code of the actual method being called though. In less typed languages, text search might be better. Or if whoever wrote the thing did a bunch of dynamic trickery, you may need to resort to running the code, in a unit test if it actually exists, or in your test environment, and attaching a debugger or adding a bunch of log statements.
It's always helpful to understand the business logic of what the application is actually trying to do, and the perspective of developers more experienced with it, if any such people are actually available.
Usually you need to do all of the above to actually develop expertise in a new codebase. Sometimes you have to not be afraid to just jump in and try doing stuff, even if it might not be the best way.
How good you are at reading code, working out how smaller parts fit into a larger system, and understanding the context and domain is a large part of how good you are as a developer.
Not-so-good developers, when they struggle with this, often start blaming the system and the people who have worked on it.
Not so good developers may however be able to handle smaller systems (in particular ones written in their favorite tech stack), and this experience leads them to erroneously believe they are good.
One tool I haven't been able to find that I feel would be super helpful in the IDE is to show where code is covered in tests, like contexts when using python's `coverage`. Does anything like this exist? The benefits are two-fold: they help show me how the methods are supposed to be used, and also guide me on how and where I should test my fix or feature.
- grep
- ag - Same as grep, but faster!
- find - when looking for a file by name
- helm (an epic Emacs package which does interactive search)
Used to work at a Windows shop and we used Entrian in Visual Studio. That was pretty good, but closed source and a pain to set up.
https://github.com/hound-search/hound#hound
It would be great if someone integrated this with tree-sitter plus something to make the search semantics a bit smarter about usages of X:
https://www.etsy.com/codeascraft/announcing-hound-a-lightnin...
Screenshots:
https://jaxenter.com/hound-go-react-code-search-engine-15008...
Another trick I use for Java: javap all the Enums out of the compiled artifacts; these indicate weird things like "modes" that you can use to start asking questions relevant to the domain. Like "why are there four ways to reprice an invoice" or finding the "types" of fees or w/e in a billing system. (assuming enum classes are used)
(1) breakpoint debugging, finding the connection between program start and various features
(2) Doxygen to generate a dependency graph
(3) create JSON performance profiles by manually instrumenting functions, and navigate the traces using Google Chrome about://tracing or similar tools.
(4) trace and look at the data input and output, using a hex editor or over the network using Wireshark
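As a sketch of item (3), the snippet below manually instruments blocks of Python code and writes a JSON file in the Trace Event Format that Chrome's tracing viewer loads. The helper names `span` and `dump_trace` are hypothetical; only the event fields (`name`, `ph`, `ts`, `dur`, `pid`, `tid`, with times in microseconds) come from the format itself.

```python
import json
import os
import threading
import time
from contextlib import contextmanager

_events = []


@contextmanager
def span(name):
    """Record one 'complete' event (ph == "X") around the wrapped block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        _events.append({
            "name": name,
            "ph": "X",
            "ts": start * 1e6,            # start time in microseconds
            "dur": (end - start) * 1e6,   # duration in microseconds
            "pid": os.getpid(),
            "tid": threading.get_ident(),
        })


def dump_trace(path):
    """Write all recorded events as a JSON object with a traceEvents array."""
    with open(path, "w") as fh:
        json.dump({"traceEvents": _events}, fh)
```

Wrapping suspicious regions in `with span("load_config"):` and loading the dumped file in the tracing viewer shows nested calls as stacked bars on a timeline.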
1. Search with advanced tools or scripts that you wrote to find concrete answers in the code.
2. Draw a graph of the knowledge you have and the steps, and understand how this knowledge may help to resolve the issue.
3. Go to a reviewer with the plan.
4. If the expert decides your plan will not work, repeat from step 1.
5. Otherwise, implement the fix for the issue.
1. Try to find out which framework, architecture, and design patterns are used, and get a handle on them
2. Library dependencies
3. Database structure
4. Pick up your favourite editor (be it vim or emacs or vs code or any) in which you have mastery
5. Search for the various entry points, like routes, the start activity, or the main function, and try to step through the code (possibly with debug tools open)
Then use Vim to read the concatenation and (regexp) search.
also: GraphViz is a great tool and CLI friendly
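A minimal sketch of how that can fit into a CLI workflow: collect "X depends on Y" pairs however you like (grep output, import statements) and emit them as DOT text. The `to_dot` helper and the module names in the usage note are made up for illustration.

```python
from typing import Dict, Iterable


def to_dot(edges: Dict[str, Iterable[str]], name: str = "deps") -> str:
    """Render a mapping of node -> dependencies as a Graphviz digraph."""
    lines = [f"digraph {name} {{"]
    for src in sorted(edges):
        for dst in sorted(edges[src]):
            lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)
```

Writing the result of `to_dot({"billing": ["invoice", "db"], "invoice": ["db"]})` to deps.dot and running `dot -Tsvg deps.dot -o deps.svg` produces a picture of the graph.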