Really appreciate hearing about your experience.
A lot of this has to do with terrible "DevOps" practices the org had previously. No version control, all devs coded with vim over ssh directly in production (same username+password for everyone), people just sort of patched bugs randomly whenever they felt like it, dozens of different out of date versions of the same dependencies were imported from random directories with cryptic names like "deps2", etc. Honestly, it's a testament to the versatility of php that the house of cards was able to stand for so long.
Moving to python (mostly so we wouldn't be tempted to copy any of the legacy code directly), starting from the ground up, and being deliberate about having "adequate" if not "best" practices was a huge improvement and totally worth it. Once auth, permissions, and infrastructure were squared away (probs 6 months to reverse engineer the existing home-made rbac system and figure out how to simplify it and migrate hundreds of users!), we went one feature/page/sub-application at a time through the php codebase and either rebuilt it, eliminated it, or merged it with some redundant similar feature elsewhere.
Politically, it was really hard to convince leadership this was worth the time since the end result was functionally identical to the start (with some minor UX improvements). It was worth it though. We went from dedicating something like 60% of capacity to random support and bug fix nonsense, to less than 10%.
Basically, it can pay off but for any non-trivial system it will be expensive and politically difficult to justify for a long time. In dire straits it can save your product, but definitely don't be tempted to swap languages because of tiny performance upsides, marketing hype, or boredom.
It took me about 2 years. I ended up leaving for another job as I was getting started migrating stuff into the new system. Someone else ended up taking over and finishing the job.
I should probably mention I went the rewrite route. I hadn't even considered the option to port it; I think I would have rejected it as infeasible, and to be fair, given my juniority I don't think I had the skills or know-how to port such a thing. But at the end of the day it led me to realize that porting it probably would have been easier and less of a pain. When you rewrite, you have to re-invent every little detail. Every little hack, every little or big workaround that the original system wasn't flexible enough to handle (or the original programmer didn't have time to do correctly). The bigger the system, the more these details pile up and keep. getting. in. the. way. of. finishing.
At the end of the day it wasn't that satisfying. I modernized something that needed occasional maintenance, and I didn't really improve performance that much, not in any way that mattered to the bottom line. The project did a good job of giving me experience with programming, and then I took that experience and went elsewhere lol. But that's on management and overall I think they were happy with my efforts.
You're right to approach it with due caution. Best of luck!
but consider reading it:
https://www.joelonsoftware.com/2000/04/06/things-you-should-...
Always remember: Legacy code is legendary code because it pays your bills. Honor it and you'll be fine.
The received wisdom of yore is that it's easier to port code from one language to another than to rewrite from scratch. However, I suspect in practice most people who do that will wind up rewriting in part and that's where the pain happens, screwing up all the undocumented edge cases.
In this day and age it generally makes more sense to keep the codebase in the original language and use virtualization as required to keep it running without the human overhead of porting.
The first thing I did before porting a feature was write a unit test on either side (because of course the Matlab side came with the most arbitrary test coverage you've ever seen). Then it was off to mexing Julia through the terrible C FFI in order to pretend to do validation. In practice, this was not as helpful as it could have been, since I couldn't figure out how to make Julia and Matlab agree on a floating-point environment, and Julia back then had some well-known correctness problems with its math library (which have been fixed by now, afaik).
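For what it's worth, the cross-checking idea itself is simple even when the FFI plumbing isn't. A minimal Python sketch of tolerance-based validation between two implementations (`old_impl` and `new_impl` are hypothetical stand-ins, not anything from the actual project):

```python
import math

def old_impl(x):
    # stand-in for the legacy implementation's output,
    # with a tiny perturbation to mimic floating-point drift
    return x * x + 1e-12

def new_impl(x):
    # stand-in for the ported implementation's output
    return x * x

def outputs_agree(xs, rel_tol=1e-9, abs_tol=1e-12):
    """Compare both implementations with an explicit tolerance
    instead of demanding bit-for-bit equality, since two runtimes
    may not share a floating-point environment."""
    return all(
        math.isclose(old_impl(x), new_impl(x), rel_tol=rel_tol, abs_tol=abs_tol)
        for x in xs
    )

print(outputs_agree([0.0, 1.5, 2.0, 1e6]))
```

Picking the tolerances deliberately (and writing them down) at least makes the disagreement between environments explicit instead of a mystery.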
Other than that, the whole experience was pretty chaotic and terrible, would not advise doing again.
Unless the existing system has terrible architecture and technical debt, or it's written in some language where you can no longer find developers, be very wary of the "do it right this time" rewrite.
Be doubly wary if no long-term staff are still around with broad and deep business knowledge.
* Profile the code, clean up the hot paths in the native language.
* If it makes sense, implement the hot paths as a C extension
* Restructure your application to handle concurrency/parallelism. Lots of options to try here; thread pools, external task queues, break your application into multiple services.
* Re-write one of the smaller services you spun out in the language of your choice.
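To make the first bullet concrete, here's a minimal sketch of profiling a hot path with Python's built-in profiler (`slow_sum` is a made-up stand-in for your real hot function):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # hypothetical hot path: naive accumulation in pure Python
    total = 0
    for i in range(n):
        total += i * i
    return total

def profile_hot_paths():
    """Profile before rewriting anything: the top few entries tell
    you which spots are actually worth a C extension or restructure."""
    pr = cProfile.Profile()
    pr.enable()
    slow_sum(100_000)
    pr.disable()
    out = io.StringIO()
    pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()

print(profile_hot_paths())
```

If the report shows one or two functions dominating, a targeted fix in the native language usually beats a cross-language rewrite on cost.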
Edit: formatting
UI stuff tends to be a big bang migration since we don't want users to need two portals. We do select a small group of users to pilot the new portal.
Backend or midtier stuff tends to go piecemeal since it's easier to swap out an API, or start calling new APIs while the old monolith is used for the rest of the calls.
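The piecemeal backend swap described above is essentially the strangler-fig pattern: route migrated endpoints to the new service and let the monolith keep the rest. A toy Python sketch (all names invented for illustration):

```python
# Hypothetical strangler-fig router: endpoints move to the new
# service one at a time while the old monolith handles the rest.
def legacy_handler(path):
    return f"legacy:{path}"

def new_handler(path):
    return f"new:{path}"

MIGRATED = {"/orders", "/invoices"}  # grows as endpoints are ported

def route(path):
    handler = new_handler if path in MIGRATED else legacy_handler
    return handler(path)

print(route("/orders"))   # served by the new API
print(route("/reports"))  # still served by the monolith
```

The nice property is that `MIGRATED` doubles as a progress tracker, and rolling back a bad endpoint is just removing it from the set.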
Either way, you need a plan for what to migrate and how to migrate it. Generally aim for parity with the legacy system before adding new functionality.
First we tried to rewrite (refactor) the project to make the code clearer, but it turned into pure copy-paste without any real rewriting.
So we chose to migrate to PHP, which was mature enough at the time (PHP 5 had already appeared, but we picked 4), and in about a year the testers said it worked.
This is an ERP-like platform on a LAMP stack, and fortunately it used Perl 3 practices and a very old framework, so performance mostly depends on database speed.
Most important was to build a complete enough test suite (we mostly reused the tests from the Perl version), and then just run the classic write-test-fix loop...
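That write-test-fix loop amounts to treating the old test suite's expected outputs as golden data for the port. A minimal Python sketch (the case data and `ported_render` are made up for illustration, not from the actual project):

```python
# Reuse the old suite's inputs/expected outputs as golden data.
old_cases = [
    ("invoice-1", "TOTAL: 100"),
    ("invoice-2", "TOTAL: 250"),
]

def ported_render(case_id):
    # stand-in for the ported implementation under test
    return {"invoice-1": "TOTAL: 100", "invoice-2": "TOTAL: 250"}[case_id]

def run_parity_suite(cases):
    """Return the IDs of cases where the port disagrees with the
    legacy expected output; empty means parity."""
    return [cid for cid, expected in cases if ported_render(cid) != expected]

print(run_parity_suite(old_cases))
```

Each iteration of the loop is then just: port a bit more, rerun, and work the failure list down to empty.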
Easy peasy...only troubles with smart pointers and memory leaks, but that's bread and butter for C++ development
The first was a C to C++ conversion of an open-source project. Primarily for future maintainability and quality gains. This was on the smaller side, and it took a few weeks only. It had the advantage of being possible to convert piecemeal. Once I got the C code building with the C++ compiler, the code could be rapidly refactored to use C++ classes, smart pointers etc very easily, and the testsuite could be run every step of the way, so the conversion was not very risky (I found and fixed some small logic bugs the C compiler hadn't picked up as the code became more type safe). Straightforward and trouble-free.
The second was a big conversion of a large decade+-old Java codebase to C++. This was an academic open-source codebase, where the consortium maintaining the codebase got a grant to provide a C++ equivalent to the Java version. This took around 18 months for the core part of the codebase, including major updates to a complex code generator from a large data model, and I worked on it for about 6.5 years in total to varying degrees over that time.
My experience of the second was not as good as it might have been. The sheer scale of the work was, in retrospect, too much for one person. The nature of academic grants and the project goals were a bit in conflict with providing a good quality solution within the desired timeframe. One of the problems here was that due to the difference in the languages, the work could not be done incrementally--the system wouldn't be functional until the vast majority of the work was done. The other was the requirement to do a direct 1:1 port of the Java interfaces and classes; this made the porting more difficult and created interfaces with Java-isms which made it less pleasant to use from the perspective of a C++ developer. We did get a working solution in the end, which was almost completely compatible with the original implementation, but it came with some compromises.
Before you start doing any work, I'd suggest spending a decent amount of time investigating and planning. It will reap dividends, and will let you properly understand the costs and tradeoffs of the various approaches you could take. Based on my previous work, I'd suggest asking these questions:
- Do you need to port everything? Ignore the code and go back to the requirements. Which functionality is strictly necessary to meet those requirements? Which interfaces and implementations are needed by the users, and which are internal details which could be implemented differently or dropped entirely? In my case, some of the difficulties were due to a lack of detailed written-down product requirements, which made answering these questions impossible, and we ended up porting the whole implementation effectively line-for-line to ensure 100% compatibility with identical behaviour.
- Can you access code written in the old language from the new language? Is it possible to wrap the old code so you can port bits incrementally? Or even leave whole parts untouched?
- Make sure you've looked at the totality of what needs porting, not forgetting unit and integration tests. Also factor in needed third-party libraries; are there equivalents for your new language, or will you have to port them over as well? Look at all transitive dependencies, you might have some nasty surprises which will ruin your estimates.
- Look at alternatives to porting. Can you do automated code translation?
- Look at where language differences can have significant impacts upon the APIs. For example, Java GC vs C++ where I had to approximate it with smart pointers, because the whole codebase was written with the assumption of GC in mind. Likewise Java enum classes. In retrospect, if I'd had the flexibility to do so, I could have replaced ~90% of the Java code with an in-memory SQLite database and the end user would have been none the wiser. It would have eliminated a lot of the porting costs at a stroke.
- What is the true cost:benefit of doing the port? Maybe the cost (once you've properly planned and estimated it) is too great.
- What risks are you introducing by doing the port? Not just bugs and behaviour changes, but risks to the company if it can't be delivered on schedule with the available resources.
That's just to start. Overall, just be sure this is the right path before committing to it, because it can be very costly.
Due to C being a statically typed language, I found the C code often more readable than the dynamically typed Perl. Perl has the advantage of more readable string manipulation and automatic memory management.
You can see the Perl code here (single file): https://git.savannah.gnu.org/cgit/texinfo.git/tree/tp/Texinf...
and the C code here (directory): https://git.savannah.gnu.org/cgit/texinfo.git/tree/tp/Texinf...
As I was only doing this in my spare time, it took several years in total as I recall, although the project can be said to be a success now as the results are being used in the program by default and have made it much faster.
The existing test suite was crucial for testing that the new code had the same results. When developing the new code, I kept a reference file containing the old code, and added comments with the corresponding old-code line numbers to the new code.
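Checking "same results" against a reference implementation is mostly about making mismatches visible. A small Python sketch using `difflib` (the captured outputs here are hypothetical):

```python
import difflib

# Hypothetical captured outputs for one test case, old vs. new code.
old_output = "Chapter 1\n  Section 1.1\n"
new_output = "Chapter 1\n  Section 1.1\n"

def same_results(a, b):
    """Return (ok, diff): a unified diff shows exactly where the
    port diverges from the reference implementation."""
    diff = list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))
    return (len(diff) == 0, diff)

ok, diff = same_results(old_output, new_output)
print(ok)
```

Diffing whole outputs rather than asserting equality means a failure immediately points at the divergent line instead of just saying "mismatch".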
You have to decide what approach to take and whether it is worth it. Maintaining two parallel sets of code in different languages and trying to maintain compatibility clearly has its costs, as well as benefits in terms of increased scrutiny of the code. However, I do feel that a lot of time can be spent on issues that aren't very important in trying to get exact compatibility for use cases that are quite unlikely.
Before this rewrite we had also achieved a very significant speed-up (about 30%) by rewriting a smaller part of the program in C, just the plaintext paragraph formatter.
Character encoding issues are a huge time sink - it could be as much as 50% of the work.
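A lot of that sink comes from mixing raw bytes and decoded text; normalizing once at the input boundary avoids a whole class of it. A sketch in Python (an assumption about the failure mode, not the project's actual code):

```python
def read_text(raw: bytes, declared: str = "utf-8") -> str:
    """Decode once at the input boundary; fall back to latin-1,
    which never raises, rather than crashing mid-run on the
    inevitable mislabeled legacy file."""
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        return raw.decode("latin-1")

print(read_text("café".encode("utf-8")))
print(read_text(b"\xe9t\xe9"))  # latin-1 bytes despite the utf-8 declaration
```

Doing all internal work on decoded text and only re-encoding at output keeps the encoding decisions in two places instead of scattered everywhere.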
Before I got involved with the project, in 2010 there had been a complete rewrite of the makeinfo program from C to Perl. (The main developer of Texinfo did a talk in 2011 - https://www.gnu.org/ghm/2011/paris/#sec-2-4.) The main downside of this was that it made the program much slower (about 50 times as slow, unacceptable for some users). It also did not appear to attract many more contributors to the code. The upsides were better structuring of the code, allowing more functionality to be added in terms of supported output formats (although this is only happening now), better test coverage and treatment of different input cases.
Team just 'finished' a migration of a 10 year old codebase from Java -> Kotlin (ended up with 100k SLOC). Part-time this took around 1.5 years, which was faster than I expected. Seeing benefits already but it'll take years to see if it was worth the opportunity cost. Also in the middle of a second migration of some aspects of the codebase to Rust/Svelte. Massive and permanent immediate benefits when this goes through.
- Have well-defined rationale, scope, and buy-in.
- Determine how you'll approach bad code that nobody's touched for a long time
- Testing guideline: ensure you 'could' test the code in the 'before' (typically about 15% coverage). If you can't test the 'after', you've probably picked the wrong language
- CI is table-stakes.
- Define the level of interop between old and new code that you want.
- If you can avoid a 'big bang', do so. The burden of a 'big test' beforehand and 'everything is now unstable' afterwards isn't an easy situation to solve (still working on this). Nobody wants to be in the position of 'you did nothing and everything's worse'
- Ensure you keep git blame working between old and new; keeping git bisect working too is the ideal
- Set up linting immediately for the new project
- Optional: I went down the route of adding AST-based linting for a couple of common errors/issues. Tempted to add some more. Didn't get a massive return from this, but on balance it was worthwhile; this may be different for other projects.
- Convert and refactor separately (use your judgment on small refactors). When converting, add language-based annotations or standardised comments to areas which need love (tests, important for refactoring, or bad style). You don't want to be in a situation where you've both introduced bugs from the conversion, then introduced bugs from a refactoring in the same commit. Only add 'issue tracker tasks' for major opportunities or structural fixes.
- With the above annotations, you have a quantitative metric of 'first we convert to 100%', then 'get the annotation count to 0'. It keeps focus, and gives microtasks to handle.
- Having 'porting helper functions' is useful: you're going to join strings 100 times, have a helper that works the same way and has similar syntax to the source language. You can use your IDE to inline that function to something more idiomatic when you're done. Type aliases are awesome.
- Avoid lava layers[0]. After a conversion is 'complete', you should be consistent in your use of programming language(s). Aim to split the project if this isn't the case. Using only one toolchain also gives a compile-time benefit.
- Consider isolating toolchains within dependencies: I split out a Java -> Rust interop library into a separate dependency: this massively helped onboard new developers as they only needed one download to get the 'core' of the project running.
- You spent X years getting to a stable process with your current code, accept that it'll take time and effort to get to the same level with your new language
- Automate the boring parts (depending on the size of your project): this can both be more fun, and save time with the conversion
[0] http://mikehadlow.blogspot.com/2014/12/the-lava-layer-anti-p...
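The 'porting helper functions' bullet above might look like this in practice; a Python sketch assuming a Java source language (`string_join` and `JavaString` are invented for illustration):

```python
from typing import List

# A "porting helper": a function matching the source language's
# semantics and argument order (here, Java-style String.join) so
# converted code reads like the original. Inline it via the IDE
# into idiomatic calls once the conversion is done.
JavaString = str  # type alias keeps ported signatures looking familiar

def string_join(delimiter: JavaString, elements: List[JavaString]) -> JavaString:
    # same argument order as Java's String.join(delimiter, elements)
    return delimiter.join(elements)

print(string_join(", ", ["a", "b", "c"]))
```

The payoff is that mechanical conversion stays mechanical (fewer argument-order bugs), and the cleanup to idiomatic code becomes a separate, safe, tool-assisted pass.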
When I was a King and a Mason -- a Master proven and skilled --
I cleared me ground for a Palace such as a King should build.
I decreed and dug down to my levels. Presently, under the silt,
I came on the wreck of a Palace such as a King had built.
There was no worth in the fashion -- there was no wit in the plan --
Hither and thither, aimless, the ruined footings ran --
Masonry, brute, mishandled, but carven on every stone:
"After me cometh a Builder. Tell him, I too have known."
Swift to my use in my trenches, where my well-planned ground-works grew,
I tumbled his quoins and his ashlars, and cut and reset them anew.
Lime I milled of his marbles; burned it, slacked it, and spread;
Taking and leaving at pleasure the gifts of the humble dead.
Yet I despised not nor gloried; yet, as we wrenched them apart,
I read in the razed foundations the heart of that builder's heart.
As he had risen and pleaded, so did I understand
The form of the dream he had followed in the face of the thing he had planned.
* * * * *
When I was a King and a Mason -- in the open noon of my pride,
They sent me a Word from the Darkness. They whispered and called me aside.
They said -- "The end is forbidden." They said -- "Thy use is fulfilled.
"Thy Palace shall stand as that other's -- the spoil of a King who shall build."
I called my men from my trenches, my quarries, my wharves, and my sheers.
All I had wrought I abandoned to the faith of the faithless years.
Only I cut on the timber -- only I carved on the stone:
"After me cometh a Builder. Tell him, I too have known!"