HACKER Q&A
📣 xwowsersx

Have you ever migrated a large/old project to a different language?


I'm curious if anyone has had the experience of porting over a large, existing project to a different language (either because of performance or other reasons). If so, how did that go? It seems like it could be a rather risky venture and one that a person ought to embark on with some due caution. Any general ideas or suggestions for how to go about this? Is it advisable to gradually move over bits of the existing system (like the parts that need better performance, assuming that's the reason for the switch)?

Really appreciate hearing about your experience.


  👤 guenthert Accepted Answer ✓
I wasn't involved, but was hired shortly afterwards as sysadmin and witnessed the aftermath. Around the bubble-burst time a medium-sized website's code base was converted from Perl (out of fashion already) to Java (the hot new thing everyone was talking about). Thing was just, there weren't all that many expert Java programmers around and the remaining expert Perl programmers became utterly frustrated and disengaged. That was the first and only time that I encountered colleagues high on drugs at work (and could be certain of it). It was an ugly mess. The company was eventually acquired by its largest customer and withered for a few more years before finally being dissolved.

👤 academia_hack
Helped lead a team that moved a huge decade-old internal amalgamation of php applications to Django around 2014. It was a nightmare that took almost 2 years, but the end result was orders of magnitude more maintainable and stable.

A lot of this has to do with terrible "DevOps" practices the org had previously. No version control, all devs coded with vim over ssh directly in production (same username+password for everyone), people just sort of patched bugs randomly whenever they felt like it, dozens of different out of date versions of the same dependencies were imported from random directories with cryptic names like "deps2", etc. Honestly, it's a testament to the versatility of php that the house of cards was able to stand for so long.

Moving to Python (mostly so we wouldn't be tempted to copy any of the legacy code directly), starting from the ground up, and being deliberate about having "adequate" if not "best" practices was a huge improvement and totally worth it. Once auth, permissions, and infrastructure were squared away (probs 6 months to reverse engineer the existing home-made RBAC system and figure out how to simplify it and migrate hundreds of users!), we went one feature/page/sub-application at a time through the PHP codebase and either rebuilt it, eliminated it, or merged it with some redundant similar feature elsewhere.

Politically, it was really hard to convince leadership this was worth the time since the end result was functionally identical to the start (with some minor UX improvements). It was worth it though. We went from dedicating something like 60% of capacity to random support and bug fix nonsense, to less than 10%.

Basically, it can pay off but for any non-trivial system it will be expensive and politically difficult to justify for a long time. In dire straits it can save your product, but definitely don't be tempted to swap languages because of tiny performance upsides, marketing hype, or boredom.


👤 actinium226
I did this once as one of my first big projects. I was tasked with re-implementing a system that had been written in FORTRAN and some legacy databases and bringing it up to date (to C++ and a SQL database). There were also a couple extra features that they wanted.

It took me about 2 years. I ended up leaving for another job as I was getting started migrating stuff into the new system. Someone else ended up taking over and finishing the job.

I should probably mention I went the rewrite route. I hadn't even considered the option to port it, I think I would have rejected it as infeasible due to my juniorness, and to be fair given my juniorness I don't think I had the skills or know-how to port such a thing. But at the end of the day it led me to realize that porting it probably would have been easier and less of a pain. When you rewrite, you have to re-invent every little detail. Every little hack, every little or big workaround that the original system wasn't flexible enough to handle (or the original programmer didn't have time to do correctly). The bigger the system, the more these details pile up and keep. getting. in. the. way. of. finishing.

At the end of the day it wasn't that satisfying. I modernized something that needed occasional maintenance, and I didn't really improve performance that much, not in any way that mattered to the bottom line. The project did a good job of giving me experience with programming, and then I took that experience and went elsewhere lol. But that's on management and overall I think they were happy with my efforts.

You're right to approach it with due caution. Best of luck!


👤 tester756
I'm not saying this is 100% right

but consider reading it:

https://www.joelonsoftware.com/2000/04/06/things-you-should-...


👤 SeriousM
Oh yes, some years ago I got the opportunity to migrate a VB.NET codebase to C#. The average reader might think "it's the dotnet ecosystem, so what" but please read on. The code had been maintained by a single guy over the last 10 years for a niche market. It spanned everything from WinForms code to Silverlight, ASP.NET and background workers, with the business code right below the buttons and no sense for coding styles. The applications didn't share a common codebase, yet "some" parts were kind of the same (copied over at some point in time) but had evolved in different directions. The first thing I did was search for a way to convert the VB.NET code into C#. 90% of this huge codebase was converted to compiling code automatically, 10% was handmade. With compiling code in hand, I started forming a shared base, introducing abstractions to support the different derivatives of the services, introducing dependency injection (> 70% was static code), introducing unit testing, and introducing the command pattern to share business code throughout the codebase. Over the span of two years we migrated bit by bit to the new architecture while running our business. Starting with framework 4.6 and single-server hosted services, we're now on dotnet 6 and Kubernetes in Azure and deploy with Azure DevOps pipelines. The benefits of this migration are huge: we're now able to deliver features and fixes in a reasonable time and we can sleep well.

Always remember: Legacy code is legendary code because it pays your bills. Honor it and you'll be fine.


👤 contingencies
Only abortively.

The received wisdom of yore is that it's easier to port code from one language to another than to rewrite from scratch. However, I suspect in practice most people who do that will wind up rewriting in part and that's where the pain happens, screwing up all the undocumented edge cases.

In this day and age it generally makes more sense to keep the codebase in the original language and use virtualization as required to keep it running without the human overhead of porting.


👤 davidhyde
Porting to a new language is not necessarily a risky thing. If your existing Python system communicates with a standard database then you’re in luck. You can probably get away with running the old system and the new one in parallel. Port bits over piecemeal. I would recommend using a managed language like C# because it is a great all-rounder. It is very modern and productive and you will find developers easily. This means the software is cheaper and safer to develop. C# is extremely fast these days and works on Mac, Windows and Linux equally well. Async and channels are easy and a no-brainer to use. Web APIs are straightforward to build out. It also scales well when you throw loads of developers and man-years at it. I would avoid microservices if your Python app doesn’t use them. Building a monolith is cheaper and lower latency, and you spend more of your time actually building business logic instead of serialising messages and chasing multi-server dependency chain error nightmares. A 32 core server can do an insane amount of work. Especially when compared to running something using Python.
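A minimal sketch of the "run old and new in parallel" idea, not taken from the comment above: the old implementation stays authoritative and the new one is only compared against it. Every name here (`shadow`, `legacy_total`, `ported_total`) is hypothetical, and in a real cross-language port `new_impl` would presumably be a call into the new service rather than a local function.

```python
import logging
from typing import Any, Callable, Sequence

log = logging.getLogger("migration")


def shadow(old_impl: Callable[..., Any], new_impl: Callable[..., Any]) -> Callable[..., Any]:
    """Return a callable that serves the old result and logs any disagreement."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        expected = old_impl(*args, **kwargs)  # the old system stays authoritative
        try:
            actual = new_impl(*args, **kwargs)
            if actual != expected:
                log.warning("mismatch for %r: old=%r new=%r", args, expected, actual)
        except Exception:
            # A crash in the port is logged and never reaches the caller.
            log.exception("new implementation raised for %r", args)
        return expected

    return wrapper


# Hypothetical stand-ins for the old and the ported code path.
def legacy_total(prices: Sequence[float]) -> float:
    return round(sum(prices), 2)


def ported_total(prices: Sequence[float]) -> float:
    return round(sum(prices), 2)


total = shadow(legacy_total, ported_total)
print(total([1.0, 2.0]))
```

Once the mismatch log stays quiet for long enough, the wrapper can be swapped for a direct call to the new implementation.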

👤 thebooktocome
I had the misfortune of porting about 100kloc of Matlab (aggregated by a bunch of non-coding types) into Julia (this was back in the v0.5 days) and then two years later porting the whole ~75kloc of Julia (plus a couple mods made in the intervening time) back into Matlab.

The first thing I did before porting a feature was write a unit test on either side (because of course the Matlab side came with the most arbitrary test coverage you've ever seen). Then it was off to mexing Julia through the terrible C FFI in order to pretend to do validation. (In practice, this was not as helpful as it could have been, since I couldn't figure out how to make Julia and Matlab agree on a floating-point environment and Julia back then had some well-known correctness problems with its math library (that have been fixed by now, afaik)).
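For what it's worth, a rough sketch of the kind of cross-implementation check described above, assuming both sides can dump their numeric results as flat lists of floats. The function name and the tolerances are made up for illustration; the right tolerances depend on the numerics involved.

```python
import math
from typing import Optional, Sequence


def first_mismatch(reference: Sequence[float], candidate: Sequence[float],
                   rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> Optional[int]:
    """Return the index of the first element that differs beyond tolerance, or None."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    for i, (r, c) in enumerate(zip(reference, candidate)):
        if math.isnan(r) and math.isnan(c):
            continue  # treat NaN on both sides as agreement
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    return None
```

Comparing with a tolerance rather than exact equality is the design choice here: two runtimes that disagree on the floating-point environment will rarely agree bit for bit.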

Other than that, the whole experience was pretty chaotic and terrible, would not advise doing again.


👤 nikau
I've joined rewrite projects towards the end a few times to try and steer them back on course, and the projects followed identical patterns - rewrite in shiny latest language/architecture and get 80% of the way there, then get bogged down with a bunch of missed edge cases and obscure business logic that the existing application also learnt the hard way.

Unless the existing system has terrible architecture and technical debt, or it's written in some language where you can no longer find developers, be very wary of the "do it right this time" rewrite.

Be doubly wary if no long-term staff are still around with broad and deep business knowledge.


👤 piotr_bulinski
I was part of a successful rewrite of a legacy enterprise SaaS from Groovy to Golang (moved from manually managed EC2 instances to Google Cloud AppEngine standard). The project took a couple of months and began with writing an extensive test suite using Karate. Due to the complexity of the software, I can’t see it succeeding without those tests. There are also plenty of benefits to having an extensive test suite going forward. Another reason for success IMO was good project planning with milestones and a tight time limit, so engineering didn’t overdo it, but made it “good enough” in the provided time frame.

👤 mattpallissard
While a complete re-write can be fun, in my experience it's usually not the best course of action. In decreasing order of most bang for your buck (your mileage may vary)

    * Profile the code, clean up the hot paths in the native language.
    * If it makes sense, implement as a c-extension
    * Restructure your application to handle concurrency/parallelism.  Lots of options to try here; thread pools, external task queues, break your application into multiple services.
    * Re-write one of the smaller services you spun out in the language of your choice.
Edit: formatting
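To make the first bullet concrete, here is a small sketch using Python's standard `cProfile` and `pstats` modules, on the assumption that the existing code is Python (the comment does not say). `profile_call` and `slow_sum` are hypothetical names; `slow_sum` just stands in for a real request handler or batch job.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the hot call paths show up first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Hypothetical workload, for illustration only.
def slow_sum(n):
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 100_000)
print(report)
```

The report shows where the time actually goes, which is worth knowing before deciding that a rewrite in another language is the fix.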

👤 giantg2
Yeah, we've done this a couple times at work.

UI stuff tends to be a big bang migration since we don't want users to need two portals. We do select a small group of users to pilot the new portal.

Backend or midtier stuff tends to go piecemeal since it's easier to swap out an API, or start calling new APIs while the old monolith is used for the rest of the calls.

Either way, you need a plan of what to migrate and how to migrate it. Generally aim for parity with the legacy system before adding new functionality.
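A bare-bones sketch of the piecemeal backend approach, under the assumption that calls can be routed by endpoint name. `MigrationRouter`, `legacy_monolith` and `new_orders` are hypothetical; a real setup would more likely do this at the gateway or load balancer.

```python
class MigrationRouter:
    """Send a call to the new handler if its endpoint was migrated, else to the legacy system."""

    def __init__(self, legacy):
        self._legacy = legacy    # callable(endpoint, request) -> response
        self._migrated = {}      # endpoint -> callable(request) -> response

    def migrate(self, endpoint, handler):
        """Register a new handler; from now on it replaces the legacy path."""
        self._migrated[endpoint] = handler

    def handle(self, endpoint, request):
        handler = self._migrated.get(endpoint)
        if handler is not None:
            return handler(request)
        return self._legacy(endpoint, request)


# Hypothetical handlers, for illustration only.
def legacy_monolith(endpoint, request):
    return {"served_by": "legacy", "endpoint": endpoint}


def new_orders(request):
    return {"served_by": "new", "endpoint": "/orders"}


router = MigrationRouter(legacy_monolith)
router.migrate("/orders", new_orders)
print(router.handle("/orders", {}))
print(router.handle("/invoices", {}))
```

Each migrated endpoint is one more `migrate` call, and rolling back is a matter of removing the registration.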


👤 wizofaus
Depends on your definition of "large" and "migrated", but yes, a number of times. In one case it was a straight rewrite, maintaining the same functionality (from JS into C#). I'd been anticipating it would take at least 2 or 3 weeks initially (based on how long it took the original JS code to be written, and the level of functionality), but it ended up taking less than a week. It was done wholesale, because it was basically a microservice where all the APIs depended on the same DB backend, and the DB conversion (from DynamoDB to SQL Server) was the first bit I did.

I had been planning for a long time to do the same for a much larger Java backend project. I'd been going through in my head how to do it piecemeal but the problem was it wasn't just the use of Java I was trying to get away from (though that was the primary motivator - too many things about the tomcat+spring+hibernate stack annoyed me and were a serious drag on productivity) but some bad architectural decisions as well, including the amount of business logic spread throughout various layers etc. Still, if I'd been given the green light (I left that role before I had the chance) I reckon I could've knocked it off in a few months. The main challenge was that I would essentially have to maintain the same API for existing mobile clients, and there was a lot wrong with that API (very inconsistent use of IDs for a start, including integer DB identity field IDs that I'd hoped to get rid of). I'd even seriously thought about fixing the API first within the Java version before converting to C#/.NET.

👤 simne
I was involved in the migration of about 500k LOC of Perl code to PHP.

First we tried to rewrite (refactor) the project to make the code clearer, but it turned into plain copy-paste, without any real rewriting.

So we chose to migrate to PHP, which was mature enough at that time (PHP 5 had already appeared, but we chose 4), and in about a year the testers said it worked.

This is an ERP-like platform on a LAMP stack, and fortunately it used Perl 3 practices and a very old framework, so performance mostly depends on database speed.

The most important thing was to build a complete enough test suite (mostly reusing tests from the Perl version), and then just do the classic write-test-fix loop...


👤 metatranca
One time I had to migrate around 20 huge pipelines written using Pentaho to Apache Airflow (Python). The key was to run Pentaho inside Docker for testing: this was crucial because the developer was able to test the new DAG against the Pentaho pipeline locally. It was difficult but it was worth it. Later we started having issues with Airflow but that is another story.

👤 leros
I worked at a company that was rewriting their 10 year old C++ app to C#. Now they're rewriting their C# app to React/Typescript. Their main motivation both times was to make hiring easier.

👤 pestatije
Java to C++. Not a migration but making the library available in C++

Easy peasy...only troubles with smart pointers and memory leaks, but that's bread and butter for C++ development


👤 kashkhan
Have done several migrations from PHP to Java and iOS apps from Objective-C to Swift etc.

👤 rleigh
I ported a couple of projects.

The first was a C to C++ conversion of an open-source project. Primarily for future maintainability and quality gains. This was on the smaller side, and it took a few weeks only. It had the advantage of being possible to convert piecemeal. Once I got the C code building with the C++ compiler, the code could be rapidly refactored to use C++ classes, smart pointers etc very easily, and the testsuite could be run every step of the way, so the conversion was not very risky (I found and fixed some small logic bugs the C compiler hadn't picked up as the code became more type safe). Straightforward and trouble-free.

The second was a big conversion of a large decade+-old Java codebase to C++. This was an academic open-source codebase, where the consortium maintaining the codebase got a grant to provide a C++ equivalent to the Java version. This took around 18 months for the core part of the codebase, including major updates to a complex code generator from a large data model, and I worked on it for about 6.5 years in total to varying degrees over that time.

My experience of the second was not as good as it might have been. The sheer scale of the work was, in retrospect, too much for one person. The nature of academic grants and the project goals were a bit in conflict with providing a good quality solution within the desired timeframe. One of the problems here was that due to the difference in the languages, the work could not be done incrementally--the system wouldn't be functional until the vast majority of the work was done. The other was the requirement to do a direct 1:1 port of the Java interfaces and classes; this made the porting more difficult and created interfaces with Java-isms which made it less pleasant to use from the perspective of a C++ developer. We did get a working solution in the end, which was almost completely compatible with the original implementation, but it came with some compromises.

Before you start doing any work, I'd suggest spending a decent amount of time investigating and planning. It will reap dividends, and will let you properly understand the costs and tradeoffs of the various approaches you could take. Based on my previous work, I'd suggest asking these questions:

- Do you need to port everything? Ignore the code and go back to the requirements. Which functionality is strictly necessary to meet those requirements? Which interfaces and implementations are needed by the users, and which are internal details which could be implemented differently or dropped entirely? In my case, some of the difficulties were due to a lack of detailed written-down product requirements, which made answering these questions impossible, and we ended up porting the whole implementation effectively line-for-line to ensure 100% compatibility with identical behaviour.

- Can you access code written in the old language from the new language? Is it possible to wrap the old code so you can port bits incrementally? Or even leave whole parts untouched?

- Make sure you've looked at the totality of what needs porting, not forgetting unit and integration tests. Also factor in needed third-party libraries; are there equivalents for your new language, or will you have to port them over as well? Look at all transitive dependencies, you might have some nasty surprises which will ruin your estimates.

- Look at alternatives to porting. Can you do automated code translation?

- Look at where language differences can have significant impacts upon the APIs. For example, Java GC vs C++ where I had to approximate it with smart pointers, because the whole codebase was written with the assumption of GC in mind. Likewise Java enum classes. In retrospect, if I'd had the flexibility to do so, I could have replaced ~90% of the Java code with an in-memory SQLite database and the end user would have been none the wiser. It would have eliminated a lot of the porting costs at a stroke.

- What is the true cost:benefit of doing the port? Maybe the cost (once you've properly planned and estimated it) is too great.

- What risks are you introducing by doing the port? Not just bugs and behaviour changes, but risks to the company if it can't be delivered on schedule with the available resources.

That's just to start. Overall, just be sure this is the right path before committing to it, because it can be very costly.


👤 captainpicard
I rewrote some of the Perl code in Texinfo, the GNU documentation system, to C, because the code was unacceptably slow. It was not a migration as we kept the old code as a back-up. It led to reform of the existing Perl code to get a structure that was more easily implementable in C. Data structures have to be more clearly defined in C and attention has to be paid to which parts of structures are shared or duplicated.

Due to C being a typed language, I found the C code often more readable than the untyped Perl. Perl has the advantage of more readable string manipulation and automatic memory management.

You can see the Perl code here (single file): https://git.savannah.gnu.org/cgit/texinfo.git/tree/tp/Texinf...

and the C code here (directory): https://git.savannah.gnu.org/cgit/texinfo.git/tree/tp/Texinf...

As I was only doing this in my spare time, it took several years in total as I recall, although the project can be said to be a success now as the results are being used in the program by default and have made it much faster.

The existing test suite was crucial for testing that the new code had the same results. When developing the new code I kept a reference file with the old code in it using comments with line numbers in the new code.
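As an illustration only (this is not how the Texinfo test suite works), a same-results check between an old and a new implementation can be sketched with `difflib` from the Python standard library. `diff_outputs` and `check_port` are hypothetical names, and both implementations are assumed to be callable as functions that return text.

```python
import difflib


def diff_outputs(old_output: str, new_output: str) -> str:
    """Return a unified diff between two outputs; an empty string means identical."""
    lines = difflib.unified_diff(
        old_output.splitlines(keepends=True),
        new_output.splitlines(keepends=True),
        fromfile="old",
        tofile="new",
    )
    return "".join(lines)


def check_port(old_impl, new_impl, inputs):
    """Run both implementations on every input and collect a diff per failing input."""
    failures = {}
    for item in inputs:  # inputs must be hashable, since they are used as dict keys
        diff = diff_outputs(old_impl(item), new_impl(item))
        if diff:
            failures[item] = diff
    return failures
```

An empty result from `check_port` means the port reproduced the old output on every input tried; anything else points straight at the inputs and lines that differ.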

You have to decide what approach to take and whether it is worth it. Maintaining two parallel sets of code in different languages and trying to maintain compatibility clearly has its costs, as well as benefits in terms of increased scrutiny of the code. However, I do feel that a lot of time can be spent on issues that aren't very important in trying to get exact compatibility for use cases that are quite unlikely.

Before this rewrite we had also achieved a very significant speed-up (about 30%) by rewriting a smaller part of the program in C, just the plaintext paragraph formatter.

Character encoding issues are a huge time sink - it could be as much as 50% of the work.

Before I got involved with the project, in 2010 there had been a complete rewrite of the makeinfo program from C to Perl. (The main developer of Texinfo did a talk in 2011 - https://www.gnu.org/ghm/2011/paris/#sec-2-4.) The main downside of this was that it made the program much slower (about 50 times as slow, unacceptable for some users). It also did not appear to attract many more contributors to the code. The upsides were better structuring of the code, allowing more functionality to be added in terms of supported output formats (although this is only happening now), better test coverage and treatment of different input cases.


👤 david_allison
I've done a few (started my career doing VB6 -> .NET porting). Working Effectively with Legacy Code was the first book I read. Email's in my profile

Team just 'finished' a migration of a 10 year old codebase from Java -> Kotlin (ended up with 100k SLOC). Part-time this took around 1.5 years, which was faster than I expected. Seeing benefits already but it'll take years to see if it was worth the opportunity cost. Also in the middle of a second migration of some aspects of the codebase to Rust/Svelte. Massive and permanent immediate benefits when this goes through.

- Have well-defined rationale, scope, and buy-in.

- Determine how you'll approach bad code that nobody's touched for a long time

- Testing guideline: ensure you 'could' test the code in the 'before' (typically about 15% coverage). If you can't test the 'after', you've probably picked the wrong language

- CI is table-stakes.

- Define the level of interop between old and new code that you want.

- If you can avoid a 'big bang', do so. The burden of a 'big test' beforehand and 'everything is now unstable' afterwards isn't an easy situation to solve (still working on this). Nobody wants to be in the position of 'you did nothing and everything's worse'

- Ensure you keep git blame working between old and new. git bisect is an ideal

- Set up linting immediately for the new project

- Optional: I went down the route of adding AST-based linting for a couple common errors/issues. Tempted to add some more. Didn't get a massive return from this but on balance it was worthwhile, this may be different for other projects.

- Convert and refactor separately (use your judgment on small refactors). When converting, add language-based annotations or standardised comments to areas which need love (tests, important for refactoring, or bad style). You don't want to be in a situation where you've both introduced bugs from the conversion, then introduced bugs from a refactoring in the same commit. Only add 'issue tracker tasks' for major opportunities or structural fixes.

- With the above annotations, you have a quantitative metric of 'first we convert to 100%', then 'get the annotation count to 0'. It keeps focus, and gives microtasks to handle.

- Having 'porting helper functions' is useful: you're going to join strings 100 times, have a helper that works the same way and has similar syntax to the source language. You can use your IDE to inline that function to something more idiomatic when you're done. Type aliases are awesome.

- Avoid lava layers[0]. After a conversion is 'complete', you should be consistent in your use of programming language(s). Aim to split the project if this isn't the case. Also gives a compile time benefit to only use one toolchain

- Consider isolating toolchains within dependencies: I split out a Java -> Rust interop library into a separate dependency: this massively helped onboard new developers as they only needed one download to get the 'core' of the project running.

- You spent X years getting to a stable process with your current code, accept that it'll take time and effort to get to the same level with your new language

- Automate the boring parts (depending on the size of your project): this can both be more fun, and save time with the conversion
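The annotation-count metric from the list above could be tracked with something as small as the sketch below. The marker convention `PORT(<kind>)` is invented for this example, as are the function names; any consistent comment or annotation format would do.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical marker convention: "PORT(<kind>)" in a comment, e.g. "# PORT(needs-test)".
MARKER = re.compile(r"PORT\(([a-z-]+)\)")


def count_markers(text: str) -> Counter:
    """Count markers by kind in a single piece of source text."""
    return Counter(MARKER.findall(text))


def count_markers_in_tree(root: str, pattern: str = "*.py") -> Counter:
    """Sum marker counts over every file under root that matches pattern."""
    total = Counter()
    for path in Path(root).rglob(pattern):
        total += count_markers(path.read_text(encoding="utf-8", errors="replace"))
    return total
```

Running `count_markers_in_tree` in CI and charting the totals gives the "get the annotation count to 0" number without anyone maintaining it by hand.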

[0] http://mikehadlow.blogspot.com/2014/12/the-lava-layer-anti-p...
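A tiny sketch of the "porting helper functions" and type alias points from the list above, assuming a port from Java into Python purely for illustration (the migrations described were Java to Kotlin and Rust). `string_join`, `UserId` and `describe` are hypothetical names.

```python
from typing import Iterable

# Type alias that keeps a name from the (hypothetical) source codebase alive during the port.
UserId = int


def string_join(delimiter: str, elements: Iterable[object]) -> str:
    """Porting helper with the argument order of Java's String.join(delimiter, elements).

    Ported call sites can stay close to the original line; once the port is
    done, each call can be inlined to the idiomatic delimiter.join(...).
    """
    return delimiter.join(str(e) for e in elements)


def describe(user_ids: Iterable[UserId]) -> str:
    # Reads almost like the original call site would have.
    return string_join(", ", user_ids)
```

Keeping the helper's shape close to the source language means a reviewer can compare old and new call sites line by line.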

  When I was a King and a Mason -- a Master proven and skilled --
  I cleared me ground for a Palace such as a King should build.
  I decreed and dug down to my levels. Presently, under the silt,
  I came on the wreck of a Palace such as a King had built.
  
  There was no worth in the fashion -- there was no wit in the plan --
  Hither and thither, aimless, the ruined footings ran --
  Masonry, brute, mishandled, but carven on every stone:
  "After me cometh a Builder. Tell him, I too have known."
  
  Swift to my use in my trenches, where my well-planned ground-works grew,
  I tumbled his quoins and his ashlars, and cut and reset them anew.
  Lime I milled of his marbles; burned it, slacked it, and spread;
  Taking and leaving at pleasure the gifts of the humble dead.
  
  Yet I despised not nor gloried; yet, as we wrenched them apart,
  I read in the razed foundations the heart of that builder's heart.
  As he had risen and pleaded, so did I understand
  The form of the dream he had followed in the face of the thing he had planned.
  
              *   *   *   *   *
              
  When I was a King and a Mason -- in the open noon of my pride,
  They sent me a Word from the Darkness. They whispered and called me aside.
  They said -- "The end is forbidden." They said -- "Thy use is fulfilled.
  "Thy Palace shall stand as that other's -- the spoil of a King who shall build."
  
  I called my men from my trenches, my quarries, my wharves, and my sheers.
  All I had wrought I abandoned to the faith of the faithless years.
  Only I cut on the timber -- only I carved on the stone:
  "After me cometh a Builder. Tell him, I too have known!"