How would you approach the process of open-sourcing a proprietary codebase, especially regarding things such as ensuring that no secrets are sitting somewhere in the history?
I'd be tempted to publish without history but I feel like a lot of important context will be lost.
Making something available publicly is a far cry from actually having an open source project, in my opinion. Good luck in your endeavors, it's a lot of work!
What do you want to achieve by open sourcing? Are you trying to grow your username or gain trust, or is it just throwing code over the fence so users can carry on while you focus elsewhere? Do you hope for contributions, or do you want to continue driving the project?
From there you can derive the community management you have to do. The more involvement you want from externals, the more you have to invest in community management, and the quicker and more thoroughly you have to respond.
Then be aware of all the legal things. When using libraries: is anything LGPL- or GPL-licensed combined with something incompatible, etc.?
And then for secrets, squashing the history is indeed best. It is a bit annoying in the beginning, but you might have code comments or commit messages referring to customers, or experiments with libraries under an unacceptable license, or whatever else in there. Limiting the review to the recent state is a lot simpler.
a) Make sure it's actually all your code, not code you copied from some other closed source project, or that a contractor gave you 3 years ago. b) Remove the questionable comments about other employees and their managers. c) If you're removing history, you are removing some of the rationale for why things are the way they are, and that does make it harder for people in the future to change things.
1) Turn the full contents of HEAD into a space-separated list of tokens considered "okay".
2) Dump out the full history into a space-separated list of tokens, filter out everything in the "okay" list, and list what's left.
You might want to set up some sort of incremental regex filter thing to chew through the list efficiently.
But if you implicitly trust HEAD as "incontrovertibly okay" this might filter out a lot of tokens for you.
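A minimal sketch of those two steps in Python, in case it helps. The tokenizer (runs of 4+ word characters), the function names, and the use of `git archive` and `git log --all -p` to get the text are all my assumptions, not anything prescribed above. Run it from inside the repository.

```python
import io
import re
import subprocess
import tarfile

# Assumption: a "token" is any run of 4+ letters, digits or underscores.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]{4,}")

def tokens(text):
    """Split text into a set of candidate tokens."""
    return set(TOKEN_RE.findall(text))

def suspicious_tokens(head_text, history_text):
    """Tokens that occur in the history but not in the trusted HEAD snapshot."""
    return sorted(tokens(history_text) - tokens(head_text))

def read_head_text():
    """Concatenate every file in the HEAD tree (via `git archive`)."""
    data = subprocess.run(
        ["git", "archive", "--format=tar", "HEAD"],
        check=True, capture_output=True,
    ).stdout
    chunks = []
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        for member in tar:
            if member.isfile():
                raw = tar.extractfile(member).read()
                chunks.append(raw.decode("utf-8", errors="replace"))
    return "\n".join(chunks)

def read_history_text():
    """Commit messages plus diffs for every commit on every ref."""
    return subprocess.run(
        ["git", "log", "--all", "-p", "--no-color"],
        check=True, capture_output=True,
        encoding="utf-8", errors="replace",
    ).stdout

def report():
    """Print every token that exists only in the history, one per line."""
    for token in suspicious_tokens(read_head_text(), read_history_text()):
        print(token)
```

Calling `report()` prints the leftovers for manual review. The list will still be long on an old repository, which is where the incremental regex filtering comes in.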
In addition to the good advice here, you might want to check for anything potentially embarrassing, such as offensive language in comments, identifiers or commit comments. Some "tech bros" can be remarkably dumb about that stuff. Of course if you developed all this yourself, no problem.
Edit for clarification: You can delete the history for the open-source version and publish it with a fresh history. And keep the original history internally in case it is ever needed.
Do not publish the history. Fresh start for open source.
Also, it gives you the chance to run a secret scan on it and remove any swearing and embarrassing comments (I haven't seen a codebase without any yet).
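For a rough idea of what such a scan can look like, here is a small Python sketch. The pattern set and the word list are illustrative assumptions only and far from complete. A dedicated secret scanner will catch much more, so treat this as a first pass.

```python
import re

# Illustrative patterns only; extend them for your own codebase.
PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded-credential": re.compile(
        r"(?i)\b(password|passwd|secret|api_key|token)\b\s*[:=]\s*\S+"
    ),
    # Hypothetical word list; replace it with whatever you would not want public.
    "unwanted-word": re.compile(r"(?i)\b(wtf|stupid|idiot)\b"),
}

def scan(text):
    """Return (line_number, label) for every pattern hit in the text."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, label))
    return findings
```

Feed it the contents of each file in the snapshot you plan to publish, and review every hit by hand, since patterns like these produce false positives.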
I wrote the library because the issue it solves in MinIO's Python client was marked as "won't fix" and it has been useful for many people (we put it on PyPI before adding it to GitHub), and I was glad a few days ago to see that MinIO added something very similar to their Python client (they added a Python wrapper just like bmc).