- this code generates more than 20 million dollars a year of revenue
- it runs on PHP
- it has been developed for 12 years directly on production with no source control (hello index-new_2021-test-john_v2.php)
- it doesn't use composer or any dependency management. It's all require_once.
- it doesn't use any framework
- the routing is managed exclusively as rewrites in nginx (the nginx config is around 10,000 lines)
- no code has ever been deleted. Things are just added. I gather the reason for that is because it was developed on production directly and deleting things is too risky.
- the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.
- JS and CSS are the same. Multiple versions of jQuery fighting each other depending on which page you are on, or even on the same page.
- no MVC pattern of course, or whatever pattern. No templating library. It's PHP 2003 style.
- In many places I see controller-like files making curl requests to the site's own REST API (via the domain name, not localhost), doing OAuth authorizations, etc... just to get the menu items or a list of products...
- no caching (there is memcached, but it's only used for sessions...)
- team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
- productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.
This business unit has a pretty aggressive roadmap, as management and HQ have no real understanding of these blockers. And post COVID, budget is really tight.
I know a full rewrite is necessary, but how to balance it?
But before you rewrite one line of code - get some testing in place. Or, a lot of testing. If you have end-to-end tests that run through every feature that is currently used by your customer base, then you have a baseline to safely make changes. You can delete code as long as the tests pass. You can change code as long as the tests pass.
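To make this concrete, here's a minimal sketch of the kind of end-to-end smoke test you could start with. The routes and marker strings are hypothetical placeholders, as is the `example.com` base URL — the point is just: hit every customer-facing page and check that something recognizable comes back.

```python
# Minimal end-to-end smoke-test harness: fetch each route and check that
# a known marker string appears in the response body.
# ROUTES below is a made-up example -- list the pages customers actually use.
from urllib.request import urlopen

ROUTES = {
    "/": "Welcome",             # homepage should mention "Welcome"
    "/products": "Add to cart",
    "/account/login": "Password",
}

def check_routes(fetch, routes):
    """Return a list of (path, reason) failures; an empty list means green."""
    failures = []
    for path, marker in routes.items():
        try:
            body = fetch(path)
        except Exception as exc:
            failures.append((path, f"request failed: {exc}"))
            continue
        if marker not in body:
            failures.append((path, f"missing expected text {marker!r}"))
    return failures

def http_fetch(path, base="https://example.com"):
    # Placeholder base URL -- point this at the real production host.
    with urlopen(base + path, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Usage: for path, reason in check_routes(http_fetch, ROUTES): print("FAIL", path, reason)
```

Even a dumb harness like this, run before and after every change, gives you the "tests still pass" safety net long before you have proper unit tests.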
Once you are at that point, start picking off pieces to modernize and improve.
Also, respect the team. Maybe they aren't doing what you would, but they are keeping this beast alive, and probably have invaluable knowledge of how to do so. Don't come in pushing for change... come in embracing that this beast of a codebase makes 20 million a year. So talk about how the team can improve it, and modernize their skills at the same time.
Because if you walk in saying, "This all sucks, and so do you, let's throw it out", do you really have to wonder why you are hitting resistance?
From a business perspective, nothing is broken. In fact, they laid a golden goose.
> team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
My mistake, they didn't lay a golden goose--they built a money printer. The ROI here is insane.
> productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.
But you just told me they built a $20M revenue product with 3 bozos. That sounds unbelievably productive.
> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers
You should consider quitting your job.
As far as the business is concerned, there are no problems... because well... they have a money printer, and your team seems not to care enough to advocate for change. Business people don't give a damn about code quality. They give a damn about value. If 2003 style PHP code does that, so be it. Forget a rewrite, why waste time and effort doing simple refactoring? To them, even that has negative financial value.
From their perspective, you're not being paid to make code easy to work with, you're being paid to ship product in a rat's nest. Maybe you could make a business case for why it's valuable to use source control, dependency management, a framework, routing outside of nginx, and so on... but it doesn't sound like any of that mattered on the road to $20M a year, so it will be very difficult to convince them otherwise, especially if your teammates resist.
This, again, is why you should consider leaving.
Some developers don't mind spaghetti, cowboy coding. You do. Don't subject yourself to a work environment and work style that's incompatible with you, especially when your teammates don't care either. I guarantee you will hate your job.
First, get everything in source control!
Next, make it possible to spin the service up locally, pointing at the production DB.
Then, get the DB running locally.
Then get another server and get CD (continuous deployment) to that server, including creating the DB, schema, and sample data.
Then add tests, run them on each PR, then code review, then auto-deploy to the new server.
This should stop the bleeding… no more index-new_2021-test-john_v2.php
Add tests and start deleting code.
Spin up a production server, load balance to it. When confident it works, blow away the old one and redeploy to it. Use the new server for blue/green deployments.
Write more tests for pages, clean up more code.
Pick a framework and use it for new pages, rewrite old pages only when major functionality changes. Don’t worry about multiple jquery versions on a page, lack of mvc, lack of framework, unless overhauling that page.
Second: Doing a full rewrite with a junior team is not going to end well. They’ll just make other mistakes in the rewritten app, and then you’ll be back where you started.
You need to gradually introduce better engineering practices, while at the same time keeping the project up and running (i.e. meeting business needs). I’d start with introducing revision control (git), then some static testing (phpstan, eslint), then some CI to run the test automatically, then unit/integration tests (phpunit), etc. These things should be introduced one at a time and over a timespan of months probably.
I’d also have a sort of long-term technical vision to strive toward, like “we are going to move away from our home-written framework towards Laravel”, or “we are moving towards building the client with React Native”, or whatever you think is a good end outcome.
You also need to shield the team from upper management and let them just focus on the engineering stuff. This means you need to understand the business side, and advocate for your team and product in the rest of the organization.
You have a lot of work ahead of you. Be communicative and strive towards letting people and business grow. I can see you focus a lot on the technical aspects. Try to not let that consume too much of your attention, but try to shift towards business and people instead.
The problem with this plan is corporate politics. Say that OP takes on this challenge. He makes a plan and carefully and patiently executes it. Say that in six months he's already fixed 30% of the problem, and by doing so he meaningfully improved the team's productivity.
The executives are happy. The disaster was averted, and now they can ask for more features and get them more quickly, which they do.
Congratulations, OP. You are now the team lead of a mediocre software project. You want to continue fixing the code beyond the 30%? Management will be happy for you to take it as a personal project. After all, you probably don't have anything to do on the weekend anyway.
You could stand strong and refuse to improve the infrastructure until the company explicitly prioritizes it. But then why would that job be better than just taking a random position in a FAANG company? The code quality will be better and so will the pay.
Those seem like low-hanging fruit that are unlikely to affect prod.
You should also probably spend a decent amount of time convincing management of the situation. If they're oblivious that's never going to go well.
I agree a full rewrite is a mistake and you have to instead fix bite-sized chunks. It also will help to do that if you start to invest in tooling, a deploy story, and eventually tests (I'm assuming there are none). If I was making 20 million off some code I'd sure as heck prioritize testing stuff (at least laying the groundwork).
It's probably also worth determining how risk-tolerant the product is; you could probably move faster cleaning up if it's something that can accept risk. If it's super critical, I'd seriously prioritize setting up regression testing in some form first.
1. Commit the entire production codebase to git and push it to a host (GitHub would be easiest here)
2. Set up a cron that runs once every ten minutes and commits ALL changes (with a dummy commit message) and pushes the result
Now you have a repo that's capturing changes. If someone messes up you have a chance to recover. You can also keep track of what changes are being applied using the commit log.
You can put this in place without anyone having to change their current processes.
Obviously you should aim to get them to use git properly, with proper commit messages - and eventually with production deploys happening from your git repository rather than people editing files in production!
But you can get a lot of value straight away from using this trick.
It's basically a form of git scraping: https://simonwillison.net/2020/Oct/9/git-scraping/
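The snapshot job described above can be sketched in a few lines. The paths and schedule are placeholders, and it assumes git is installed on the production box and the webroot has been `git init`-ed once:

```python
# Periodic "git scraping" snapshot: commit whatever changed in the webroot,
# no questions asked. Run it from cron, e.g.:
#   */10 * * * * /usr/bin/python3 /opt/snapshot.py
# The repo path is hypothetical -- point it at the real docroot.
import subprocess
from datetime import datetime, timezone

def snapshot(repo_dir):
    """Stage and commit all changes; return True if a commit was made."""
    def git(*args):
        return subprocess.run(["git", "-C", repo_dir, *args],
                              capture_output=True, text=True)
    git("add", "-A")
    # `git diff --cached --quiet` exits 0 when nothing is staged.
    if git("diff", "--cached", "--quiet").returncode == 0:
        return False
    msg = "auto-snapshot " + datetime.now(timezone.utc).isoformat()
    git("commit", "-m", msg)
    git("push")  # push failures are ignored here; the next run retries
    return True
```

Because it never touches the working tree beyond staging, developers editing live files won't notice it running.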
But I would start by choosing how and whether to fix up the crown jewels, the database.
You say that instead of adding columns, the team has been adding new tables. With such behaviours, it's possible your database is such a steaming pile of crap that you'll be unable to move at any pace at all until you fix the database. Certainly if management want e.g. reporting tools added, you'd be much better to fix the database first. On the other hand, if the new functionality doesn't require significant database interaction (maybe you're just tarting up the front end and adding some eye candy) then maybe you can leave it be. Unlikely, I would imagine.
Do not, however, just leave the database as a steaming pile of crap and at the same time start writing a whole lot of new code against it. Every shitty database design decision made over the previous years will echo down and make its ugly way into your nice new code. You will be better off in the long run to normalise and rationalise the DB first.
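For the specific "new table instead of new column" pattern, the fix is usually mechanical: add the column, backfill it from the bolt-on table, drop the bolt-on. A sketch with made-up table names (`products`, `products_extra`), shown here against SQLite for illustration — on a real MySQL/Postgres install you'd do the backfill in batches:

```python
# Fold a bolted-on 1:1 "extra columns" table back into its parent table.
# Table and column names are hypothetical examples.
import sqlite3

def fold_extra_table(conn):
    conn.executescript("""
        ALTER TABLE products ADD COLUMN weight_grams INTEGER;
        UPDATE products
           SET weight_grams = (SELECT e.weight_grams
                                 FROM products_extra e
                                WHERE e.product_id = products.id);
        DROP TABLE products_extra;
    """)
```

Do one table at a time, behind a tested deploy, and every query that previously needed the join gets simpler and faster.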
Some of these things are terrible choices, but some are just weird choices that aren't necessarily terrible, or are a minor inconvenience at most.
E.g. no source control - obviously that is terrible. But it's also trivial to rectify. You could have fixed that in less time than it took to write this post.
OTOH, "it runs on PHP" - I know PHP ain't cool anymore, but sheesh, not being cool has no bearing on how maintainable something is.
> "it doesn't use composer or any dependency management. It's all require_once."
A weird choice, and one that certainly a bit messy, but hardly the end of the world in and of itself.
>it doesn't use any framework
So?
What really matters is whether it's a mess of spaghetti code. You can do that with or without a framework.
> no caching ( but there is memcached but only used for sessions ...)
Is performance unacceptable? If not, then that sounds like the right choice (premature optimization)...
> the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.
Not ideal... but also pretty minor.
Anyways, my point is that what you're describing is definitely not ideal, but on the scale of legacy nightmares it seems not that bad.
If you stay, you need to manage your relationship with the management team. This involves the usual reporting, lunches etc. You need to set up some sort of metrics immediately. Just quarterly might be sufficient. Nobody is going to care about bug fix counts; your metrics should be around features.
Testing and version control are a good place to start. But you are going to need to get them started there and you will pretty much need to instill good discipline. You will be herding cats for quite a while. If you can't get these two items going well in 3 months then abort and leave. You don't want to stick around for when the money printer stops working and nobody can figure out why.
We also introduced git as well as dev and staging tiers and some agile methodologies. Definitely do some of that first!
Now, as management and customers are happy, the backend can be refactored step by step. Here, more test coverage might come in handy.
So, I'd recommend to be a bit picky about where to create value. You can restructure the whole database and that'll be good for maintenance (and most likely performance) but management & customers won't literally "see" much. Ask the people with the money for their preferences, excite them to get more runway. Regarding "backend stuff": Think like a Microservice architect and identify components that are least strongly coupled and have a big (performance) impact. Work on those when management is happy and you've got plenty of budget.
Your job is to create value and reduce risk. Not to create something that's technically awesome ;)
2. Slowly start extracting code and making small functions. Document like crazy in the code as you learn. Keep the single file or close to it, and don't worry about frameworks yet.
3. Introduce unit tests with each new function if you can.
After all that is done, make a plan for next steps (framework, practices, replace tech etc).
Along the way, take the jr backend engineer under your wing, explain everything, and ensure they are a strong ally.
Call me crazy, but that project sounds like fun.
We did a complete rewrite into a Django application, it took 2 years and untold political pain but was absolutely the correct choice. The legacy code was beyond saving and everyone on the team agreed with this assessment - meaning our political battles were only outward facing.
In order to get support, we started very small with it as a "20% project" for some of our engineers. After level-setting auth, CI/CD, and infrastructure stuff, we began with one commonly used piece of functionality and redirected the legacy PHP page to the new Python-based page. Every sprint, in addition to all the firefighting we were doing, we'd make another stealth replacement of a legacy feature with its updated alternative.
Eventually we had enough evidence that the replacements were good (users impressed with responsiveness, upgraded UI stuff like replacing default buttons with bootstrap, etc.) that we got a blessing to make this a larger project. As the project succeeded piecemeal, we built more momentum and more wins until we had decent senior leadership backing.
Advocating for this change was basically the full-time job of our non-technical team members for 2 straight years. We had good engineers quit, got into deeply frustrating fights with basically every department in the company, and had a rough go of it. In the end, though, it did work out very well. Huge reduction in cost and complexity, ability to support really impactful stuff for the business with agility, and a ton of fulfilling dev experience for our engineers too.
All this is to say, I understand where everyone warning you not to do a rewrite is coming from. It's a deeply painful experience and not one to be embraced lightly. Your immediate leadership needs to genuinely believe in the effort and be willing to expend significant political capital on it. Your team also needs to be 100% on board.
If you can't make this happen and you're not working on a business which does immense social good and needs your support as a matter of charity, you should quit and go somewhere more comfortable.
1) A rewrite from scratch is almost always a bad idea, especially if the business side is doing just fine. By the way, when you want to sell a rewrite, you don't sell a rewrite, you sell an investment in a new product (with a new team) and a migration path; it's a different mindset, and you have to show business value in the new product (still ends up failing most of the time, but it has a better chance of getting approved).
2) You never ever try to change people (or yourself) directly. It's doomed to failure. You change the environment, then the environment changes the people (if the changes are slow and inertia is working for you, otherwise people just leave).
Since probably it would be too hard to change the environment by yourself and given that your team seems fine with the status quo, my advice it to just manage things as they are while you look for another job. Otherwise my bet is that your life will be miserable.
This isn't going to come off nicely, but your assumption that it needs a full rewrite, is in my eyes a bigger problem than the current mess itself.
The "very junior" devs who are "resistant" to change are potentially like that in your view for a reason. Given the cluster they deal with, I suspect the resistance is more that they do things the XYZ way because that's the way they know how to get it done without it taking even more time.
What it sounds like to me is that this business could utilize someone at the table who can understand the past, current, and future business - and can tie those requirements in with the current environment, with perhaps some "modernizing" mixed in there.
So uh, good luck. You're going to be the one everyone hates.
I'd just quit in your shoes, to be completely honest. Your desire for a solid foundation will never be seen as anything but a roadblock to an organization that just wants more floors added to the house with reckless abandon for safety.
Any security gained by improvements you champion will go unnoticed. You will be blamed when the inevitable downtime happens from molding a mountain of shit into less of a mountain of shit.
You are going to lose this fight. Please just quit and go work for a software engineering organization, you seem to have taken a job at a sausage factory for some reason. I'd also try to learn from that...
Good luck.
In my view, as long as management believes this, a fix is not possible at all.
You should forget about improving the code but see your job as a kind of consultancy thing where you teach management about what they have and the consequences of that are.
And probably look for a new job. If you are completely successful in teaching management, it may be worth working on this, but it'd probably need to be renegotiated as if it were a new job.
My thinking is that you can essentially plug a thousand probes into this frankenstein monster and start to learn the true shape and surface area of it without needing to step through the mess of the code. Then the code might make more sense, or at least a clearer path forward as to what a new architecture needs to look like could appear.
Static analyzers might also be helpful, since the full featured ones tend to provide gui tools or outputs of things like call graphs or dependency chains. That can be useful in learning the 'true' surface area of the app, too.
20 mil a year is no joke, so use that to your advantage. It sounds like this has been stretched so thin that at this point it is a huge disaster/liability waiting to happen, so I would try and leverage some of that cash to use whatever paid/advanced tooling might be necessary to help here. Old PHP apps are a security nightmare waiting to happen, particularly as the world has moved on to higher and higher TLS levels.
From what you've mentioned, it sounds like every change that isn't additive is viewed as too risky. So at this point, before trying to make big shifts, some work should be done to de-risk the situation as much as possible. Granted, you probably can't stop work and introduce a bunch of new practices and patterns, but you need to start reducing the risk to unleash the team to make necessary changes.
For example, introducing version control should be a slam dunk. Start using a database migration facility for all database changes. Create a release schedule that requires features to be stabilized by a certain window for deployment. Create some really, really simple Selenium tests by just browser recording yourself using the app.
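The "database migration facility" doesn't need to be fancy to be useful. A bare-bones runner — numbered SQL files applied exactly once, each recorded in a bookkeeping table — is enough to make schema changes repeatable. This sketch uses SQLite and a made-up directory layout purely for illustration:

```python
# Bare-bones migration runner: apply numbered .sql files exactly once,
# recording each applied file in a schema_migrations table.
# The directory layout (001_foo.sql, 002_bar.sql, ...) is an assumed convention.
import sqlite3
from pathlib import Path

def migrate(conn, migrations_dir):
    conn.execute("""CREATE TABLE IF NOT EXISTS schema_migrations
                    (filename TEXT PRIMARY KEY)""")
    applied = {row[0] for row in
               conn.execute("SELECT filename FROM schema_migrations")}
    for path in sorted(Path(migrations_dir).glob("*.sql")):
        if path.name in applied:
            continue  # this migration already ran; skip it
        conn.executescript(path.read_text())
        conn.execute("INSERT INTO schema_migrations VALUES (?)", (path.name,))
        conn.commit()
```

Once every schema change goes through files like these, "what does production's schema look like?" stops being archaeology.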
Once you can start making changes more confidently, then you can start unwinding some of the bad choices moving forward. Resist the urge to start "making a good foundation for the future" by trying to rewrite core parts of the system immediately and instead start thinking in terms of forward progress oriented changes. Need to add a feature? Make sure to write that feature properly with good practices and make only the necessary changes to other parts of the system. I realize that's probably going to be painful, but eventually you will accrete enough of these small changes that you can string them together with a little more work into larger scale changes under the hood.
These things are rarely easy, especially in established legacy systems. But if this is the revenue engine for your company, you'll need to move conservatively but decisively or risk making the situation worse. Good luck!
I sort of think... if you have to ask this here you might be in the wrong job? Was this a job that seemed like something else then became this? This sounds like a job for an experienced VP Engineering. It is a tough order. Wouldn't know how to do it myself. Lots of technical challenges, people challenges, growth challenges, and managing up and down.
The resistance to change is something you need to get to the bottom of. People are naturally resistant to change if they are comfortable, and we've all been through 'crappy' changes before at companies and been burned.
The solution might be to get them to state the problems and get them to suggest solutions. You are acting more like a facilitator than an architect or a boss. If one of them suggests using SVN or Git because they are pissed off their changes got lost last week, then it was their idea. No need to sell it.
This assumes the team feels like a unit. If the 3 are individualistic, then that should be sorted first. E.g. if Frank thinks it is a problem but no one else does, and they can't agree amongst themselves, then the idea is not sold yet.
Once you know more about what your team think the problems are and add in a pinch of your own intuitions you might be able to formulate confidently the problems, so you can manage their expectations.
figure out what you want to fix first, and then fix that. then go to the next thing. but keep in mind - "management and HQ has no real understanding", and as far as they are concerned, what they have works.
if this doesn't sound like something you want to do, then find a new job. you are effectively the property manager for a run-down rental property. you aren't going to convince the owners to tear it down and build a new set of condos.
A full rewrite of a functional 12-year-old application? Yeah, you're going to waste years and deliver something that is functionally worse than what you have. It took 12 years to build; it would realistically take years to rebuild. Fixing this will take years and honestly some serious skill.
What you want to do is build something in front of your mudball application. For the most part your application will be working. It's just a mudball.
Step 0. Make management and HQ understand the state of the application. To do this I would make a presentation explaining and showing best practices from various project docs and then show what you have. Without this step, everything else is pointless.
If they don't understand how bad it is, you will fail. Failure is the only option.
If the team is not willing to change and you're not able to force change then you're going to fail.
So once you have the ability to implement changes.
Step 1. Add version control.
Step 2. Add a deployment process to stop developing in production.
Step 3. Standardise the development env.
If you have views and not intermingled php & html:
Step 4. Start a new frontend and create endpoints that reuse the original code to return json for all the variables.
If not:
Step 4. Add views. Copy all the html into another file and then make a note of the variables. Step 5. Start a new frontend and create endpoints that reuse the original code to return json for all the variables.
... Carry on moving things over to the new frontend until everything is in the frontend.
Probably a year later.
Step 6. When adding new functionality you can either rewrite that section, do a decorator approach, or edit the original functionality.
That's without fixing the database mess or infra mess.
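The page-by-page cutover described in steps 4-6 is the classic strangler pattern, and the core of it is just a routing decision in front of both apps. A sketch of that decision (the prefixes are hypothetical; in practice this logic would likely live in the nginx config that already does the routing):

```python
# Strangler-pattern routing: a front proxy consults this table to send each
# request either to the new app or to the legacy PHP pool.
# The migrated prefixes below are made-up examples.
MIGRATED_PREFIXES = ["/products", "/account/login"]

def pick_backend(path):
    """Return 'new' for migrated pages, 'legacy' for everything else."""
    for prefix in MIGRATED_PREFIXES:
        if path == prefix or path.startswith(prefix + "/"):
            return "new"
    return "legacy"
```

Every time a page is rebuilt on the new frontend, its prefix moves into the migrated list; when the legacy list of pages is empty, the old app can be retired.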
https://www.amazon.com/Working-Effectively-Legacy-Michael-Fe....
Fully apprising management of the situation in a way they can understand may also reap long-term dividends.
That gives you an annual maintenance cost which will include, say "every 2 years something goes badly wrong with the flargle blargle, and costs $10,000 to fix", or "every 3 days we have to clear out the wurble gurble to stop it all crashing".
Finally, you put together the same thing but for a re-written version, or even with some basic improvements as others have suggested, and hopefully you see a lower total cost of maintenance.
At that point, you can weigh up the cost of either a rewrite or incremental improvements in actual dollars.
This should be the thing that starts every conversation. Because IT WORKS for the intended purpose.
Someone else said it. Put everything in source control first.
And just fix things that directly impact that 20 Million dollars a year.
Example, fix speed issues. Fix any javascript issues. Fix anything that will get you to 21 million dollars a year.
Then if you want, you can put together a small sub-team that would be responsible to transitioning certain pages into a framework. But don't rewrite the whole thing.
Unless you have power at the executive level, or are brought in as an expensive consultant to make big changes, you are wasting your time.
I would tell you to stick around and shovel shit just to take cash home but from your post it doesn’t sound like you are happy there to begin with.
What you are seeing here is a symptom of leadership not valuing engineer so trying to improve this requires a culture change from the top which is highly unlikely from where you stand.
If the pay is really good, you might consider sticking with it for a bit and then move on. However if you feel like it will push you towards burnout, abandon ship ASAP.
My younger self would have stayed and tried to be the unsung hero but now that I’m older I chuckle at that foolishness.
Don’t be the silent hero.
I worked for a company early in my career that sold a $1500 piece of software and had revenue of $15 million. When I was there, the head count was 70. Ten years later the head count is two - one engineer and one person to take the orders. And revenue was still a couple million. A classic "rot-in-place" situation.
In these types of situations, the problems are social and possibly political and rarely technical, even though the technical problems are the symptoms that present themselves so readily.
First of all, don't do a rewrite! Your team most likely does not know what they need to know to be able to perform a clean rewrite. You are new and still probably don't know the whole picture and all the knowledge that is in the application in one way or another. If you start a rewrite, productivity will plummet and you will have to keep choosing whether to put resources on the new or on the old, and the old will always win. I have seen this play out many times: the rewrite keeps getting starved of resources until it gets abandoned.
Refactoring is better, because you can balance allocating resources to refactoring as you go, while also bringing improvements that make BAU development more efficient.
Do not make the mistake of forgetting about "the business". They probably are already irritated by the project and will be on the lookout for any further missteps from you. You might think you have good credit with them because they just hired you, but that might simply not be the case. Their fuse is probably short. You need to keep them happy.
At first, prioritise changes that improve developer productivity. This is how you will create bandwidth necessary for further improvements. This means improving development process, improving ability to debug problems, improving parts of the applications that are modified most frequently for new features.
Second, make sure to prove the team is able to deliver the features the business wants. The business probably doesn't care about the state of the application but they do care that you deliver features. This is how you will create the credit of trust with them that will allow you to make any larger changes.
Do make sure to hire at least one other person who knows what they are doing (and knows what they are getting into).
My thoughts:
* Get the code in source control straight away
* Get the infrastructure stable and up to date if it's not
* Get CI pipelines set up. As part of this, make sure the code is running through a static analyser. This will give you a backlog of things to work on.
* Organize an external penetration test to be carried out
* Investigate updating and/or consolidating the software libraries used (jQuery etc.)
* Choose a page/feature to update on its own. Bring it up to date.
At this point, you should be in a much better state and you will have learned a lot.
You have to have a conversation with the people responsible for this shit, including (and specially) stakeholders, make them aware of the problem, and get them on board with respect to the possible solution. This step is essential before even bothering to do fucking anything.
Most importantly, make it clear that while you are there to help, this is their responsibility, and they have to become a part of the solution by making amends. If they're not willing to own their responsibility and collaborate, get the fuck out of that tech debt mill or it will ruin your life.
If you want to try to redo everything alone in silence, you will have to work infinitely hard, and in the end, three things can happen:
a) you fail, and then the organization gets rid of you. the most likely outcome.
b) you succeed, but now "you know too much", you have dirt on a lot of people that fucked up and become the Comrade Legasov from Chernobyl that becomes the target of important people from the Soviet communist party. They will get rid of you once the problem is gone because now you have no value to them.
c) in the best case scenario, you succeed, but no one will congratulate you, because that would mean a problem existed in the first place, and since no one is willing to assume any responsibility for their contributions to the problem, no one will say fucking anything. all your contributions will be for nothing. and if you insist that a problem existed, you'll go to outcome b). Otherwise, they will go back to their old ways and create the next fucking mess for you to solve.
Personally, I would get the fuck out. It is clear that nobody there was committed to do the right thing, starting from the hiring process. It is either highly unprepared people, extreme normalization of deviance, or some highly idiotic leadership obsessed with the short-term. Whatever it is, that team is rotten and needs an amputation. If I stayed, I would start by laying off the entire team and then rehiring everyone on a 3-month test period where they will have to completely change their attitude towards development.
Get some type of CI/devops thing going so you can deploy to a temporary test environment whenever you want. This applies to the data too so that means getting backups working. Don't forget email notifications and stuff like that.
Next comes some manner of automated testing. Nothing too flash, just try to cover as much of the codebase as possible so you can know if something has broken.
Go over the codebase looking for dramatic security problems. I bet there's some "stringified" SQL in there. Any hard coded passwords? Plaintext API calls?
And now everything else. You're going to be busy.
1. Convince the business team that these team members might leave and put the $20mn revenue at risk, and that there is no way to make them learn and do things properly. Then take a separate budget, hire a new separate team, do a full rewrite of the backend, and plug the app and the new website into it. It would be a 1-2 year project with a high chance of failing outright (a big-bang release: stressful, with a large chance of you getting fired, but once done you can let the old team go and hand the business a completely new setup and team) or of failing partially (a large part of the traffic moves to the new system but some parts remain, making the whole transition slow, painful, complex, and never-ending).
2. Add strong, senior PHP people to the existing team. Ask the new senior members not to fight with the juniors but to train them; they will listen, since the seniors will simply know more. Slowly add version control, staging and dev environments, a PHP framework for new code, caching, a CI/CD pipeline, an automated test suite built by an external agency, etc. This is low risk, because the business team sees immediate benefits and speedups. Rewrite the portions of code that are too rusty and remove code that is no longer required. This could take 5-6 years to complete, giving you ample job security while achieving results in a stable manner.
The goal is to slowly build up a parallel application which will seamlessly inherit an increasing number of tasks from the legacy system.
What I would start with, is building a compatibility layer. For example: the new code base should be able to make use of the old application's sessions. This way, you could rewrite a single page on the new system and add a reverse proxy one page at a time. Eventually, every page will be served by the new application and you can retire the old.
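As a sketch, the page-at-a-time routing could look something like this in nginx. The upstream name and port are assumptions, not from the original setup:

```nginx
# Hypothetical: one already-rewritten page is proxied to the new app;
# everything else keeps hitting the legacy PHP code untouched.
upstream new_app {
    server 127.0.0.1:8080;   # the new application, reading the old sessions
}

server {
    listen 80;

    # ported pages get an exact-match location, one at a time
    location = /account {
        proxy_pass http://new_app;
        proxy_set_header Host $host;
    }

    # ...the existing 10,000 lines of legacy rewrites stay below...
}
```

Each time a page is ported, one more `location` block moves above the legacy rules.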
I would stick with the language but pick up something more capable, e.g. Laravel. This makes it easy to copy over legacy code as needed.
Godspeed.
Whatever else you do, I hope you and the organization figure out how to celebrate that those three people are generating 20 million dollars of revenue (or at least keeping part of the machinery that does that running).
"I know a full rewrite is necessary, but how to balance it?"
Well, maybe...
How much code is it? How much traffic does it receive?
----
I would be looking at https://martinfowler.com/bliki/BranchByAbstraction.html or https://martinfowler.com/bliki/StranglerFigApplication.html
and
> team is 3 people
and
> post COVID, budget is really tight
Why? All technical details aside, if this can't be addressed I wouldn't even bother trying unless I owned stock.
One tip — don't complain to management about how "awful the codebase is" or "how you need to start over" (100% agree this is usually a terrible idea). Managers have been hearing this over their entire career as a technical manager (over time they can lose empathy being out of the weeds). It becomes an overused trope and management will start to see you as being problem-oriented.
I'm not saying don't surface the issues; management absolutely should have an accurate understanding. Instead, try to balance the good with the bad (and there will always be some good). Don't catastrophize; approach it as a manageable problem with quantified risk, e.g. responding to an estimate with "typically this is a straightforward problem to solve, but I've explored this area of the codebase and there are some challenges we'll need to overcome. The estimate will be larger and less precise than we want to see, and we'll benefit from prototyping/research/spikes to reduce the risk of introducing serious bugs and to arrive at a more accurate estimate."
You'll build trust by consistently delivering on the expectations you set around concrete features/tasks (including the negative expectations); management will then reach the conclusion themselves and will trust your assessment of any new project. Plus, management will ultimately see you as an incredible asset who helps bridge the gap between the technical black box and their purview.
- git: more productivity and more control over the codebase and each team member's responsibilities. Don't change the structure of the code. If it's a monorepo, leave it as it is. Just create simple branches, like prod and dev. Consider putting the nginx configuration into the repository as well (since it's part of the application).
- Documentation via comments: here you should improve the team's culture a little; new code should be documented, at least with comments.
- Test environment: now that you have a dev branch, you can push all the code to this new test environment and test things without worries. If possible, start writing per-environment configuration where it's needed.
- CI/CD: now that everything is traceable through git, you can write a routine to deploy every branch to its place. Some self-hosted tools to consider: Jenkins or Drone.io are great and require almost no maintenance (no need to hire a devops person to work on this).
- Database: you have a test environment and CI/CD, so now you can TEST (what great news) your database migrations. In PHP, I remember Phinx as a way to start writing migrations for this application.
- Automated tests: I think unit testing could be considered when adding new code. Old code, just leave as it is.
If you apply even 3 things from this list, I think at that point you will see that a rewrite might not be necessary after all.
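The git step from that list really can be this small; a sketch of the very first import, warts and all (the directory, files, and committer identity here are stand-ins):

```shell
#!/bin/sh
# First import of the live codebase into git, exactly as found on prod.
# mktemp stands in for the real docroot (e.g. /var/www/html).
set -e
src=$(mktemp -d)
touch "$src/index.php" "$src/index-new_2021-test-john_v2.php"

cd "$src"
git init -q
printf '%s\n' '*.log' 'sessions/' > .gitignore     # keep runtime junk out
git add -A
git -c user.email=ops@example.com -c user.name=ops \
    commit -qm "initial import, unmodified"
git rev-list --count HEAD    # → 1
```

Nothing is fixed or deleted at this point; the only goal is that every future change has a diff.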
Not saying this is your only option, but I am saying: if the tech work is hopeless, the culture is unreasonable, and it's not gonna change until two or three people go through it and are honest in exit interviews, you have to make an honest assessment of your goals. Last I checked, the company decided to hire someone for 2x what I worked for, and that person put "open to new roles" on their LinkedIn a few weeks ago...
Or does this software "facilitate" $20 million of revenue, instead of generating it single-handedly?
What if we're talking about a car sales website that 'generates' $20 million in revenue by selling 500 $40k cars?
leave it the fuck alone
You speak of "resistance to change", from juniors? You are the change. You get to set the agenda, not them. Unless you don't, in which case you can't fix anything. But legitimacy comes not just from authority, but also from rigor. Anything you truly dictate needs to be 100% based in evidence and fact. This means letting go of implied a priori assumptions such as "PHP is bad" and "we must use a framework". The only real constraint is to keep the gravy train rolling.
So what exactly is your role, the thing you were hired for? If it's to manage, manage. If it's anything else, the best you can do is lead by example. But one way or another, you'll have to let go of some things.
I would: 1. Get it in source control without "fixing anything". 2. Get a clone of the prod server up and running, plus a clone of the db. 3. Put in something to log all of the request/response pairs. 4. Take snapshots of the database at several points in time and note where they occur in the log history from number 3.
You now have the raw material to make test cases that verify the system works as it did before, bug for bug, when you refactor. If the same set of requests creates the same overall db changes and response messages, you "pass tests".
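The replay-and-diff idea can be sketched in a few lines of shell. Here a function stands in for curl-ing the app on a staging clone, and the paths and response bodies are invented fixtures:

```shell
#!/bin/sh
# Golden-master replay (sketch): re-issue recorded requests and diff each
# response against what production returned at capture time.
# handle() is a stand-in for: curl -s "$STAGING$path"
handle() {
  case "$1" in
    /products) echo "ok: 3 products" ;;
    /menu)     echo "ok: 5 items" ;;
    *)         echo "404" ;;
  esac
}

golden=$(mktemp -d)                 # responses captured from prod
echo "ok: 3 products" > "$golden/products"
echo "ok: 5 items"    > "$golden/menu"

status=pass
for path in /products /menu; do
  handle "$path" | diff -q - "$golden/$(basename "$path")" >/dev/null \
    || { echo "MISMATCH $path"; status=fail; }
done
echo "replay: $status"
```

After a refactor, a `MISMATCH` line tells you exactly which page changed behavior.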
First thing to refactor is stochastic code. Make it consistent even if it’s a little slower so you can test.
Once you can refactor, you can do anything. Including a full rewrite but in steps that don’t break it.
If you try to rewrite it from scratch it will probably just never be deployable. But you can rewrite it safely in chunks with the above.
I don’t believe this is a problem that can be solved with people skills alone. It requires senior technical expertise.
You are probably better off leaving. You'll have to solve a culture problem and a technology problem at the same time. Each step of the process will be an uphill battle and even if you do succeed no one will notice since the app will look the same. The appetite for change will only materialize once market share and revenues start to drop, at which point it will likely be too late.
You'll need to be the CEO or trusted executive to effect this kind of change. Trying to do this from a middle management or dev position won't work and will come at enormous personal cost.
The best solution - for me - ended up dropping them as a client. There was zero interest in change from both developers and management (no matter how senior).
We parted ways and I wished them good luck.
Occasionally I wonder what happened to the application containing 50,000 procedural PHP files. Yes, 50k. And no source control or off-server backup.
Same for the DB - instrument your queries, figure out what your most important queries are.
There are some things it should be easy to sell to both the team and to management. First, adding git into the mix. Tell them it's like backing up your work forever. You can roll most changes back to the beginning of the repo, easily. I say most changes because rolling back the code won't roll back changes to the database.
Likewise, creating a preprod environment means you can make sure new stuff doesn't bring down the system before you roll it out. Yes, it will cost a bit more but having that extra assurance and the ability to do a little experiment is considered worth it by almost every other team on Earth.
If you can get those two things in place, you can make it policy that nothing is done directly on production because the risk is too high.
Then you can tackle refactoring code, a little at a time.
Focus hard on training the team. If they are as junior as you say, they need to learn good habits before their ability to ever work as professionals is destroyed. Don't explain it to them that way. Smile and tell them you just want to help them develop their careers, which should be pretty close to the truth.
Above all, keep your resume up to date and your ear to the ground. It sounds like you may burn out before all the work is complete. Have an exit strategy, just in case.
Good luck!
2. You say you don’t manage the team. I guess you have some kind of ‘tech lead’ role. I think to get things to change, you’re going to need buy in from management and the team. If the budget is tight it will be harder to say ‘we need to invest in fixing all this stuff instead of whatever it is that actually makes money’. Whatever you do must have a good business case. It sounds like there needs to be better communication about the state of things with whoever in the business unit came up with the aggressive roadmap.
Perhaps a roadmap like this would work:
- First, set up source control and separate prod from however people are developing things. Hopefully this will reduce trivial outages from people eg making a syntax error when editing prod. I think this will be a difficult fight with the team and management may not understand what you’re doing. You’ll likely need to be ready to be the person who answers everyone’s git questions and un-fucks their local repos. You’ll probably also want some metrics or something to show that you are reducing trivial errors.
- I think some intermediate stages might involve people still developing in prod but having source control there and committing changes; then developing locally with a short feedback loop from pushing to running on prod (you won't get buy-in if you make the development process slower or more inconvenient for the team); then you can hopefully add some trivial tests like PHP syntax checks, and then slowly build up a local dev environment that is separate from prod, plus more tests. At some point you could e.g. use branches and perhaps some kind of code-review process (you can't be the only person responsible for code review, to be clear).
- You're going to want a way to delete old code. Partly you will be able to find unreachable code and delete it, but you'll also likely want a way to easily instrument a function to see if it is ever used in prod over e.g. a week or two.
- Eventually, improving the dev environment enough may already have led to some necessary refactors, and you'll have enough tests that the defect rate will have decreased. At some point you'll hopefully be confident enough to make bigger refactors or deletions and wean people further off messing with prod. For example, moving some routing bit-by-bit out of nginx, or perhaps using some lightweight framework.
- You should also get the team involved in making some smaller refactors, and they should definitely be involved in adding tests.
One thing you could do if you haven't been asked to fix these things is to "provoke" management into asking you to fix these things. You could talk to your boss and ask them what they don't like about the current setup. They might answer that the velocity is too slow, that the software is too unreliable, has too many bugs, or they might answer that everything's fine they just want you do implement their new features. Be careful not to lead management here, you want to find out what they actually want, not persuade them to want something (that won't work, it won't be a real desire). If they do want you to change something, you can argue for some of the suggestions in this thread (e.g. introduce VCS) where you can clearly draw an argument from one of the desires e.g. problem "releases are too risky", solution "if we use VCS we have old versions and can roll back".
Basically you've been hired to do a job. If your job is to fix all this stuff, fair enough. But if you haven't been asked to do this (and you can't provoke them to ask you) then it's simply not your job, and you have to accept the situation or find a new job.
First thing would be to use source control and get some sort of code review/release process in place.
Contrary to other suggestions of "then write tests for everything", I think that's bad advice. It's far more likely that you'll pigeonhole yourself and your team into complicated and unhelpful tests, particularly with dead code and trying to enumerate all the features that aren't documented. Three things you could do in a short amount of time to radically increase the code quality:
- Lint all the code (php-cs-fixer is a good tool, rector can also help)
- At least start dependency management (with composer), even if it's empty.
- Introduce static analysis into the code review process (phpstan/psalm, in a CI preferably). Baseline suppression of existing errors are easy to generate.
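The baseline trick means existing errors are recorded once and only new code is held to the standard. A minimal phpstan.neon sketch; the level and paths are placeholders to adjust:

```neon
# phpstan.neon — run `vendor/bin/phpstan analyse --generate-baseline` once,
# then CI fails only on errors introduced after the baseline was taken.
includes:
    - phpstan-baseline.neon

parameters:
    level: 0          # start at the loosest level, ratchet up over time
    paths:
        - .
```

From there, raising `level` one notch at a time turns the analyzer into a gradual cleanup roadmap.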
Then, personally, I would try to aggressively purge dead code, which is easier said than done. Tombs (https://github.com/krakjoe/tombs) is a little awkward but can be helpful, especially if production is all there is. It requires PHP 7.1, and I'm assuming you're below that, but the good news is that every route is in nginx; you can upgrade PHP piecemeal.
Again, handling tech debt sounds like it will be nigh impossible at this company, but modern PHP is really enjoyable and I hope you're able to experience it.
* Create a git repo from the code as it exists
* If the other team is still doing things live, create a workflow that copies the code from the prod server into git as-is nightly, so you have visibility into changes. Here's an opportunity to see what the team gets stuck on or frustrated with, and you can build some lines of communication and, most importantly, some trust. You can suggest fixes and maybe even develop the leadership role you need.
* Get a staging instance up and running. If I had to guess why the team does things live, maybe the project is a huge pain to get sample data for. If that's the case, figure out the schemas and build a sample-data creation tool. Share it with the team and demonstrate how they can make changes without risking breaking production (and, for goodwill: it helps prevent them from having to work evenings, weekends, and vacations because prod goes down!)
* PHP isn’t so bad! Wordpress runs a huge chunk of the web with PHP!
* tailwind might be a cool way to slowly improve CSS - it can drop into a project better than other css frameworks IMO
* Pitch your way of fixing this to management while quoting the cost of a rebuild from different agencies. Throw in the cost of Accenture to rebuild or whatever to scare management a little. You are the most cost effective fix for now and they need to know that.
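The nightly prod-to-git snapshot from the first point can be sketched as a small cron-driven script. Both directories here are mktemp stand-ins for the real docroot and a dedicated snapshot repo:

```shell
#!/bin/sh
# Nightly snapshot: mirror whatever changed on prod into git, so live
# edits at least become visible history. Paths and identity are made up.
set -e
prod=$(mktemp -d)    # stand-in for the live docroot
repo=$(mktemp -d)    # stand-in for the snapshot repository
git -C "$repo" init -q

echo '<?php echo "hi";' > "$prod/index.php"   # someone edited prod today

# the part the cron entry would run each night:
cp -R "$prod/." "$repo/"
git -C "$repo" add -A
git -C "$repo" -c user.email=cron@example.com -c user.name=cron \
    commit -qm "nightly snapshot $(date +%F)" || true  # no-op if unchanged

git -C "$repo" rev-list --count HEAD    # → 1
```

Reading those nightly diffs is also how you learn what the team actually touches, which is the trust-building opportunity mentioned above.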
- team is 3 junior people
- productivity is abysmal
- budget is tight
- resistance to change is huge
- aggressive roadmap
- management and HQ have no real understanding
I have never walked away from a technical challenge, but I've exited from management clusterfucks and have never regretted it. These people will block you, blame you for anything you break during the refactor, but give you no thanks if you fix it (because they don't even understand the scale of what you're trying to fix).
From the HQ perspective they make a lot of money with very few developers and all seems to be going well, with no problems at all. Judging by the spreadsheets this looks great!
Your task is now to explain to them the risks involved with proceeding forward. You can also present them a plan to mitigate that risk without interrupting ongoing operations too much and slap some money figure on it — ideally you present them three options where one of those is doing nothing. Be aware that the decision on this is not yours, it is theirs. Your task is to tell them everything relevant for that decision. You can also tell them that your professional opinion is that this is something that should have been done years ago, and the fact that this hasn't exploded in their faces yet was pure luck. But again, it is their decision.
How you lay it out depends on you, but there have been many tips already. Version control might be the first thing. Maybe you can present it as: one day a week goes towards maintenance or something.
As an aside this helps to cover your own behind if nothing is done and everything goes south in a year. Then you can point to that extensive risk analysis you presented them with and tell them you told them so.
I'll share some less technical thoughts that I think hold true regardless of the approach taken (rewrite or not).
My experience with changes like this is that you need to be as transparent as possible to both parties (the devs, and your execs). This means consistent comms around goals, achievements, and crucially the challenges preventing the first two.
With any team, you are not going to win much by implying that their work sucks or the thing they have built is broken. While they might know it, a third party is just not going to get a good reception with that mentality. It will be important for the devs to understand why the change is needed from a business perspective (e.g. time to market is too slow to remain competitive, changing regs, etc.). The intent here is to focus the devs on what the hope is for the future as opposed to shitting over the thing they have poured their blood, sweat, and tears into.
With the execs, they need to understand just how bad of a shape things are in so they give you and the team the space they need to make a significant enough change that isn't just going to revert to the same mess as before. If you're dealing with tech background execs it might be a bit of a simpler set of convos. But if not, then you are going to have to illustrate for them how bad things are. One way I've done this is to first get an idea of what the execs want as the final state of the team / codebase / product (e.g. time to market is < 4 wks) and then draw them a picture/flowchart of what it takes to get that thing done in the current state. Could use some form of value stream map to do this as it combines players, states, activities, and also timelines.
I suggested a full rewrite and got fired in 3 weeks (actually it was a subcontractor role). I had considered myself really good at presentations and at persuading executives to understand what I am doing and what I will be doing, but this situation was too much for me to take on. They didn't like the unrealistic 3-month roadmap to rewrite the whole thing, which from their point of view delivered nothing while still requiring them to pay the whole team (even though I was the only one). So I told them we were gradually improving it, and did the full rewrite underground on my own. It consumed ~13 hours every day, but I was happy and enjoying the birth of the product. Finally, after 10 weeks, I gave in to my own limits and their frustration.
Regarding your problem: I totally suggest dumping your codebase into a git repo first of all, adding some Cypress/Playwright tests to carefully probe the major functionality, building CI for these, and starting to gradually remove the old version files. After that, forget how messy it was and what you thought at first; treat this beast as a perfect engineering gift (like the Linux kernel), then start making small changes and adapt yourself to it. Guide the team to follow your methodologies for treating the code, and tell the executive team that the legacy codebase looks great, but is too complex to move as quickly as a brand-new startup project would.
This code is making $20mio, so something must be going well. Don't forget that a codebase like this covers all the history and knowledge.
So first make sure that you appreciate the work of the current team. As you write "resistance to change is huge" I would bet that the team doesn't feel like you're trying to understand them.
It actually reminds me of a client for whom I wrote an order system in PHP that made $15mio annually. As the client and I didn't get along anymore, he went looking for someone to replace me, and found a new CTO who came in with "everything's shitty, nothing works, we need to redo everything". Obviously the client finally saw his chance to get rid of me, only to ask me one month later to come back, as they had fired the new CTO. Seems like something was working all along :)
The worst thing you can do is come in with that attitude and expect the team to be on board. You will only alienate yourself. Try to understand why things were done the way they were (never architected, but put together piece by piece over time). Make them feel heard, and pace yourself with any changes.
What you need to do is full rewrite. You need business owners backing you up on this intention. From your description they don’t understand the scale of the problem. So that’s a dead end.
Once they understand that they would have to halt all new development for a few years and drastically increase the development budget in the meantime, you can start thinking about how to proceed. But they will not.
Then check for the most basic security issues like the database being accessible from the outside, SQL injection, etc.
Then set up monitoring. It's quite possible the thing is falling over from time to time without people knowing.
So let's stick to advice that is universal to all roles and I think most people who have been in similar situations would agree with. First, let's be clear about one thing: This situation isn't the least bit unusual. From the facts above it doesn't look very bad. The team is small, and you can all gather in the same room and communicate. The fact that there is no framework and no patterns in place is good given the circumstances, awful codebases based on ancient frameworks and legacy patterns are generally an order of magnitude more work to understand.
Second, be humble towards the team and the problem. After such a long time, there are bound to be details that you don't know, and you have to find out about them sooner rather than later. People may seem resistant to change, but understand their angle and work with them. They likely want their codebase to improve too, even if they see other problems as more pressing. It all depends on what your role is, and whether you intend to help out with the actual work or not. But again, this is a small team with a shared goal.
Third, start with the lowest hanging fruit. Personal opinions come into play here, but I probably would look at operational issues early. Get monitoring in place. Test backups (yes, really). Some key metrics, both application wise (on some key processes such as login or payments) and operational (memory, open files, sockets). Learn about version control and start using it. Get proper test environments in place (including databases and mocked external integrations).
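"Test backups (yes, really)" can start as a script that refuses to trust a dump it can't read back. Everything here is illustrative: the fake dump stands in for real mysqldump output, and the file names are made up:

```shell
#!/bin/sh
# Verify a (fake) database dump is intact and non-trivial before
# calling the backup good.
set -e
tmp=$(mktemp -d)
printf 'INSERT INTO orders VALUES (1);\nINSERT INTO orders VALUES (2);\n' \
    > "$tmp/dump.sql"                       # stand-in for mysqldump output
gzip -c "$tmp/dump.sql" > "$tmp/dump.sql.gz"

gzip -t "$tmp/dump.sql.gz"                  # archive is not corrupt
rows=$(gzip -cd "$tmp/dump.sql.gz" | grep -c '^INSERT')
[ "$rows" -ge 1 ] || { echo "backup looks empty"; exit 1; }
echo "backup ok: $rows INSERT statements"
```

The real test, of course, is periodically restoring the dump into a scratch database and comparing row counts, but even this much catches truncated or empty backups.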
Good luck! Things are probably not as bad as you think. This type of work is really quite rewarding, because results are quickly very visible to everyone.
Second: the worst code you've ever seen is capable of pulling in $20 mil per year. Would the best have done more? There is something to be said for success, and it really makes me wonder what 'good code' is really supposed to look like.
Granted a lot of what you're describing sounds terrifying.
If you want to deal with it first thing you should do is stand up a testing server and backup the code. Get some E2E tests in place to keep track of expected behavior. All of this can be done without removing a line of code and you can do it yourself while the team goes about their merry business. This is where I would start.
But if I could hijack: I'm curious about the HN opinion on a variant. What if the product never launched? It's 10,000 files of spaghetti and dreams that just can't work well enough to put in front of customers.
I was brought in on such a project and the very kind business owner was under the impression they were close to launch because of all the features he’d seen demoed in isolation. But it was like a bridge built of gum and two-by-fours, spanning a massive gully but with a 10 foot gap in the middle, and nowhere near the strength to fill that last span.
1. Start adding logging all throughout, wherever changes are being made. That can quickly build up insight into what's happening where and gain confidence into what can be deleted safely. You want the meeting where you can show that an entire file is completely unused and has never once been called for months. It surely exists. Find it. Then say you won't delete it, you'll just comment it out.
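Finding that provably-unused file can be as simple as cross-referencing the access log with the files on disk. The log lines and file names below are fixtures; on the real box you would feed months of nginx logs and the output of `find . -name '*.php'`:

```shell
#!/bin/sh
# List PHP entry points that exist on disk but were never requested.
set -e
tmp=$(mktemp -d)
cat > "$tmp/access.log" <<'EOF'
1.2.3.4 - - [10/Oct/2023:10:00:00 +0000] "GET /index.php HTTP/1.1" 200 512
5.6.7.8 - - [10/Oct/2023:10:00:05 +0000] "GET /cart.php?id=2 HTTP/1.1" 200 128
EOF
printf '%s\n' index.php cart.php index-new_2021-test-john_v2.php \
    | LC_ALL=C sort > "$tmp/on_disk"

# paths actually requested, query strings stripped
awk '{print $7}' "$tmp/access.log" | sed 's/?.*//; s|^/||' \
    | LC_ALL=C sort -u > "$tmp/requested"

# on disk but never requested: candidates for commenting out, not deleting
LC_ALL=C comm -23 "$tmp/on_disk" "$tmp/requested" | tee "$tmp/unused"
```

This only sees top-level requests, so files reached via `require_once` need the logging described above; but for a routing scheme that lives entirely in nginx, it covers the entry points.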
2. As you make changes, start doing things twice: one in the way that patches the code as directly as you can manage, the other a stub into a possible design pattern. You don't want to force the pattern into production as soon as you think it works, instead you wait until the code hits a certain evolutionary state where you can "harvest" it easily. Think "architecture as a feature-flag". If it turns out your design can't work, nothing bad happens, you just delete those stubs and give it another go.
3. I would not actually worry about the state of the tooling otherwise. Backups for recovering from the catastrophic, yes. Getting the team on git, not as important. Adding standardized tooling is comforting to you because you're parachuting in. It adds more moving parts for the other devs. That's true even when they benefit from it: if the expected cost of wielding a tool wrongly is high enough to cause immediate danger, you can't proceed down that road - in woodworking that means lost fingers, in software it means lost data. You have to expect to wind down the mess in a low-impact, possibly home-grown way. There are always alternatives in software. And there are likewise always ways of causing fires to fight.
This job most likely isn't going to lead towards using anything new and hot. But if you go in with an attitude of seeing what you can make of the climate as it is, it will teach you things you never knew about maintenance.
Ex: if you tell your team "drop everything you're doing and follow my best practices", it won't be accepted, and the business will ask why you're wasting time. Instead, if you tell your team "we need to improve these calls making a cURL request to its own domain, because this is a performance/security issue that might make us lose those 20 million", then you might have a chance of changing the culture over time by accumulating smaller wins. Keep doing this for every specific point of possible improvement, backing it with a business justification.
Sorry, but you can forget about this. If you are not in a managerial position and do not have support from management, then no matter what you try to do to make things better, it could backfire on you and even lead to reprimands or, in the absolute worst case, getting you fired.
That's why the realistic old geezers around here would recommend people who are in a situation like this to please look around and try to find something better.
If you are in a situation where you actually can make decisions and have management support (or can acquire it), then it's a whole different story, of course.
Very gradual, well-tested evolutions is the way to go. If it were me I would add a LOT of unit and integration tests before I changed anything. I would also formalize the expected behaviour, schemas, APIs, etc.
You’ve inherited the Ship of Theseus. Believe it or not, this is actually a huge boon for you. 18 months from now your managers will look back and say, “wow this is the same ship?! I want you on my team wherever I end up.”
The best approach is to:
- Assess the situation
- Create a task list
- Decide what needs immediate attention
- Create a timeline for it all
- Get feedback from the team
- Add the business roadmap to your list
- Work out a timeline with upper management
- Define your project with realistic times
- Execute and manage the project
It took 12 years to get to this point so don't expect to change it overnight.
BTW, this type of team and codebase is not out of the ordinary. Companies start to program with the idea that eventually the problems will be fixed yet it never happens. Upper management does not care because all they care about is reducing cost and getting the results they need. You're dealing with the results.
I inherited something similar 12 years ago, also cobbled together PHP, also no separation of code and rendering - making any sort of progress was painful.
As others have said there are a myriad of ways to extend code like this, encapsulating the old with a better facade. Splitting some pieces off - but it needs to be approached as a piecemeal project that takes a decent amount of time, but can be done in parallel with shipping new features.
No, re-write over time. There's an extremely high chance there is complexity you do not understand yet.
> - it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )
First immediate win: start using source control. Initially people can operate the same way they have been, just through git. Slowly but surely, clean up the old files and show people how they are not lost and how it cleans up the code. Then switch to more advanced code-management practices, like a master branch vs working branches, code reviews, etc.
> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )
Make sure this is definitely checked into git. Ideally you look to simplify this somewhat, you don't really want to be so heavily tied to the server.
> - the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.
A migration to a better database setup takes time. As long as there are no fires, treat it as a black box until you have time to fix it up. Just double check their backup strategy.
> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
It sounds like you are new to their team. You need to win hearts and minds. One small thing at a time.
> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.
Explain to them that the code is like an old house. It has had lots of investment over the years and has generated a lot of profit. The problem is that, over the years, the foundations have crumbled, and although the walls look nice, the paint covers serious cracks. While you could continue to use it as it is, one day it will simply fall down, unless time is invested today to maintain it.
They will then say "well, what needs to be done?". And you need quite a concise and well thought out way to respond to that question.
Hey, I've done this. Everyone saying "just rewrite each part" isn't really helpful.
You first need to fix up the obvious brokenness: turn on error logging and warnings within FPM, then fix absolute-path issues, then fix any containerization issues (deps, etc.) and containerize it, then roll out some sort of linter and formatter.
At that point you have a CI system with standardized formatting and linting. Now you can slowly part things out, or do a full rewrite, because you can finally read the code and make changes locally.
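The "turn on error logging and warnings within FPM" step is a small config change; the pool path and PHP version below are hypothetical:

```ini
; Hypothetical pool file, e.g. /etc/php/8.1/fpm/pool.d/www.conf
php_admin_flag[log_errors]     = on
php_admin_value[error_log]     = /var/log/php-fpm/www-error.log
php_admin_flag[display_errors] = off   ; never render errors to visitors

; And in php.ini: report everything, so long-silenced warnings surface
; in the log instead of staying invisible.
; error_reporting = E_ALL
```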
Is there documentation, requirements or user stories available for the existing features? Is it B2B or B2C? If it's B2B it becomes a lot easier to do customer survey of what is actually used and could help you remove half of the 12 year legacy.
Apart from the lack of source control, the rest of the issues, while being far from best practices, honestly don't sound extremely bad. Lack of framework or DI is not an antipattern in itself, even if it of course can be. Productivity of 3 juniors, split across one stack each, doing both operations and feature development on such a big application is going to be small even if using better practices. If revenue really is 20M and this code is critical, it sounds like you are understaffed.
Skipping the SCM, deployment and process improvements, as others already gave good suggestions, and assuming you need to keep the existing code: one thing that has not been mentioned is static analysis. If the majority of the rat's nest is in PHP, one thing you should do is add static type checking. This has zero effect on production and makes the code infinitely easier to navigate. It will expose how much of the code is dead, how much is shared, what depends on what, etc. From there, refactoring will be a lot easier and safer. As others suggested, you obviously need tests around it as well.
And it's (still) free!
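For PHP, PHPStan (or Psalm) is the usual way to get this. A minimal starting config, with hypothetical source paths, lets you begin at the loosest level and ratchet it up over time:

```neon
# phpstan.neon
parameters:
    level: 0          # loosest level; raise it as the code gets cleaner
    paths:
        - public      # hypothetical source directories
        - includes
```

Run it with `vendor/bin/phpstan analyse`; the `--generate-baseline` option freezes the existing errors so CI only fails on newly introduced ones, which is exactly what you want on a 12-year-old codebase.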
It's almost always better to do small replacements. Peel the onion so-to-speak. Refactor from within first to make a migration plan away from the crufty tech possible.
First and foremost: make a plan and sell it to the devs. If you don't get buy-in from them, nothing will change.
Good luck.
If you do manage to get them to give you some points of the revenue from the project, then start with introducing source control and follow by building up a testing process and integration environment. I’d probably use tcpdump to capture a week’s worth of live traffic and replay it with a suitable speed up to replicate testing in production in your integration environment. That should give you serviceable integration tests. To start by writing unit tests will be pointless because it sounds like there are no discrete units to test.
From there you’ll want to apply some kind of strangler pattern where you incrementally replace pieces of the system. Doing that will require some refactoring to start separating concerns. Again don’t try to do it all at once and don’t try to make it perfect. Then you can start introducing unit tests.
Then there’s the database, which is a full size job in its own right.
And who knows what other unpleasant surprises await, but bank on them being there.
- Is anything broken after all? Yes, there are annoyances and risks, but in the greater scheme of things, everything seems to work. Is fixing really necessary, or would you just feel better afterwards?
- What does the "aggressive roadmap" look like? Build another product? Double or triple the revenue from this product? I think this helps/defines how to handle the situation.
- Your job as middle level management (at least that's how I understand it, being in that position myself) is to shield your teams from direct hits with piles of shit, while getting them running to evade the stuff by themselves at some point. Seems like your team already did great things in building the product, now help them get better, one small step at a time. I think they can see the benefits in things like using Git but probably you need to help them make some room to learn it without fearing that upper level management thinks they are lazy and not doing anything...
- Leaving the company: maybe that's a viable option, too. You can't save them all. And if you feel overwhelmed by the task and see no way forward, you should leave. That's not about being weak, it's about protecting yourself from an unmanageable task.
This is the key point. Why is there resistance to change if everything is as bad as you say? How do things look from the perspective of the developers?
There is also a certain disconnect in what you are describing. On one hand you describe the developers as "junior", productivity as abysmal, and getting anything done as impossible. On the other hand, the code seems to be highly successful from a business perspective, generating millions in revenue. Something is missing in your analysis.
Set up two nginx servers. One that's your usual to load Laravel and the other to the legacy nginx server that acts as routing to the legacy application. I would even recommend using OpenResty to help delegate if you need something intelligent.
I would strongly discourage a JS framework that would add complexity when you need to keep things focused. The front-end would need to be recreated in Laravel and brought back over in a clean fashion.
Set up CI and ensure all the code that goes over to Laravel is near 100% tested. It might also be useful to set up a visual regression tool such as Percy to ensure everything moves over nicely. Push for SMACSS and BEM to keep things consistent. Or just make new styling for the new pages to surprise the users.
Rewrites are a trap, though, and can be painful. Keep a balance between features entering Laravel and the bug fixes entering the legacy app. I would recommend RabbitMQ to communicate between them.
That’ll be 200$ lol
- A large fraction of features are unused. Have internal analytics that will tell you which features/code paths are used and which are safe to delete/ignore. It's much easier to migrate xx% of features than to reach 1:1 parity.
- Lack of tests is a huge pain. Makes incremental migration near impossible. Find a workaround for it before jumping to migration (forcing huge code coverage increase for already submitted code never worked for me in the past)
- See if some parts can be proxied. Put proxies in place and migrate features behind it (in one past project, the logic was split between stored procedures in Oracle DB, backend code and js code -- which made it possible to proxy stored procedures and break the migration in milestones)
- Hackathons are a great tool for exploring options, uncovering blockers and dedicating a large chunk of focused time. Make it clear that the result is experimental, not something that must be merged to main. A nice way of introducing frameworks, VCS, etc. without high friction.
The rest depends on management support, the team's aptitude, the intake of feature requests and bugs, the difficulty of maintenance, etc. You are the best judge of how to approach those.
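Since all routing already goes through nginx, the usage analytics suggested in the first point can start as plain access-log counting. A small helper, assuming the default combined log format, where the request path is the 7th whitespace-separated field:

```shell
# Rank request paths by hit count; paths that never show up across a few
# months of logs are candidates for deletion or for skipping in a migration.
route_counts() {
  awk '{ split($7, parts, "?"); print parts[1] }' "$1" \
    | sort | uniq -c | sort -rn
}

# Usage: route_counts /var/log/nginx/access.log | head -20
```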
After you've got a working time window for getting things right, prepare a workflow that should take half the time you've negotiated, as it will probably take twice the time anticipated. (If you've negotiated 3 months for fixing the mess, assume you have only 1.5 months or even 1 month, and prepare 1 month's worth of work.)
Then I think the very first thing should be moving to Git (or another VCS), setting up development/staging environments, and using CI/CD.
After making 100% sure the environments are separated, start writing tests. Perhaps not hundreds or thousands at this stage, but at least ones that catch the critical/big failures.
After that, start moving to a dependency manager, resolving the multiple-version conflicts in the process.
Then find the most repeated parts of the code and start refactoring them.
As you have more time you can start organizing code more and more.
It sucks but it's not something that can't be fixed.
Finally, given the work environment before you came, it might be a good idea to block pushes to the master/production branch and only accept changes through PRs that require all tests to pass, to prevent breaking anything in production.
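On a bare git server you can enforce this with a pre-receive hook, no forge required. A sketch of the check, with the protected branch assumed to be master:

```shell
# Core of a pre-receive hook: git feeds it "old-sha new-sha refname"
# lines on stdin, and a non-zero exit rejects the whole push.
deny_master() {
  while read -r old new ref; do
    if [ "$ref" = "refs/heads/master" ]; then
      echo "Direct pushes to master are blocked; merge via a reviewed PR." >&2
      return 1
    fi
  done
  return 0
}

# In the real hooks/pre-receive file:  deny_master || exit 1
```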
To make allies of senior management, you need metrics. You need to show, concretely, how current operations put revenue at risk and make the incremental investment necessary for their roadmap items prohibitive. If you can swing a penetration test, they'll probably find plenty on a stack like this. Then you have a security justification. If not, get the best monitoring stack you can. Demonstrate reliability and performance issues. (As well as reliability and performance improvements.)
From there... I'll say the #1 tool I've used in situations like this is Fastly. VCL is way more flexible than your 10k line nginx rewrite file (I've been there, too). And the edge caching will paper over minor outages. Rollbacks are easy. Rebuild your site piece by piece and stitch it all together with a reverse proxy at the edge.
Advice: propose a "canary" portion of the site to rebuild, and make it the lowest revenue / highest complexity thing you can. Once you stabilize the cash cow, getting the buy-in to finish the job and deprecate the old code base will be tough.
I'd also advocate for adding 1 incremental engineer to your team. Make it a senior dev and interview specifically for people who have done this sort of thing before. Your team needs a hands-on mentor in the trenches with them.
Best of luck. It isn't easy, but it's rewarding.
The nice thing is you can start with the current codebase and add these in; it will make the rewrite a lot easier, since your capabilities/feature-configs are already extracted.
Example:
If your product/code serves multiple customers, you should never have:
if ($customerId == 123 || $customerId == 999) {
    // do things this way
} else {
    // do things the other way
}
instead always aim for feature-options (config?)
if ($config['feature_a']) {
    // do things this way
} else {
    // do things the other way
}
If this seems unrelated to your codebase or product, you just need to dig deeper; it's usually there in some form or another.
PS: If you think the above is "obvious", you have probably not seen an old enough (or bad enough?) codebase. Few coders start out with the bad case; coding to an instance/customer comes from the "quick fixes" that accumulate over the years.
- Start using migrations. (build migration file from current DB)
- Start using CI/CD. (Run migrations, pull/push PHP files, add new nginx routes and reload nginx)
- Start using docker for dev env.
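Tools like Phinx or Laravel's migrator do the migrations point properly, but the core idea is small enough to sketch: numbered SQL files applied exactly once, with a ledger of what already ran. Everything below (directory names, the apply command) is hypothetical:

```shell
# Apply migrations/NNN_*.sql in order, recording each applied file in a
# ledger so re-running is safe. $3 is whatever executes SQL against your
# database, e.g. "mysql appdb" -- a placeholder here.
run_migrations() {
  dir="$1"; ledger="$2"; apply="$3"
  touch "$ledger"
  for f in "$dir"/*.sql; do
    [ -e "$f" ] || continue                  # no migrations at all
    name=$(basename "$f")
    grep -qx "$name" "$ledger" && continue   # already applied, skip
    $apply < "$f" || return 1                # stop on the first failure
    echo "$name" >> "$ledger"
    echo "applied $name"
  done
}

# Usage: run_migrations migrations .migrations-applied "mysql appdb"
```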
Then I'd focus on the application itself, and that will probably take some days/weeks of work if the routes are complicated or the PHP version is very old (a 12-year-old PHP 5.2 or 5.3 codebase should not be too much work):
- Upgrade the codebase to PHP 8.1. (Rector might be useful here, but PHP code is generally not hard to update.)
- Consider doing routing using a PHP entry file instead of nginx.
Then new features can follow whatever pattern you want, and old code/DB-schema can be upgraded as you go.
Many of your points are non-issues: PHP is fine. You don't have to use a framework. You only need caching if you need it. PHP itself does not necessarily need a templating language.
The fact that there are a bunch of route entries in nginx suggests that there at least is some form of pattern, for something.
A full all-at-once rewrite would probably break a lot of things at once, so I would just do the low hanging fruit first and modernize the codebase as it's being worked on.
I saw this a few times in the past. There is no universal recipe, if that is what you are looking for. Get development and staging environments and make them use Git; that's a start. See what the plan for that software is: maybe the company does not want (you) to waste time and money on it. If they want to do something, discuss and align on that.
In the end, if it works it brings value. If you want to rewrite it, it will bring some value and some cost: which is bigger and what is the priority, a rewrite or new features?
One more thing you can do is show the developers how to do some things in a better way, like using Composer or cleaning up versions and dependencies. But take it easy, and present it to them in a way they will buy into and do themselves, not because you told them so. Make them better and they will make the product better.
When you've got a clean base, the team will be moving quicker, be more skilled with what they already are learning and listen to you. Then you can consider the structural changes.
Pure, clean 2003 php into a new format is way easier than spaghetti nightmare into total re-write.
https://www.penguinrandomhouse.com/books/667571/kill-it-with...
I'm guessing this is a medical billing system of some sort, lol
2. Find out the "real version" of the sql schema.
3. Make some method of running this code + nginx config locally.
4. Add a test framework which simulates real traffic you see on the app and make sure the DB contains the right thing.
5. Make a staging environment which mirrors traffic from prod and run changes there for ~1 week and manually audit things to make sure it looks right. (You'll only do those until you feel safe)
Now you can feel safe changing things! You can tackle problems as they come in. Focus 10% of the devs' time on new features. Focus 90% on reducing tech debt.
Lots of dead code? Tackle that.
Package management hard? Migrate to composer.
Don't do everything up front. Just make sure you have a way to change the code, test those changes, then push them to prod/staging.
I wonder what you proposed, how you proposed it, and to whom?
If it's to the business unit I'd go with stuff like "If we make a mistake and it brings the site down it hurts income, so we should have source control and dependency management, automate deployment…" etc. They think their ideas will make more money than yours and they won't be reasonable about things they don't understand. Everyone understands big screw ups and websites that are down.
Once you have that, you can kill two birds with one stone by documenting all the APIs using integration tests. Use the same fear-of-destroying-income argument.
Once you know the APIs you can chop things into pieces and improve code and put boundaries around tasks. You can start to cache things because you know what the API behind it expects. Then you can build new APIs with adapters behind the cache and slowly introduce them.
You can build the stuff the business unit wants.
If you can't excite your developers with the possibility to design and build new APIs like that, then:
a) you need to brush up on your "soft" skills
b) you need to move on or ask for more money/perks
I agree with the sentiment that a full rewrite is a waste of time. The team needs to learn better practices, together, or any rewrite will fall into the same pattern. We've had great success doing side-by-side upgrades (from AngularJS to React as an example):
- All new features (screens) built in React (the newest stack)
- Run them in the same path, so it looks like a single app
- Each sprint has additional upgrade work to keep porting over to React
- Use the customer and usage analytics to refactor screens, flows and functions while rewriting
Refactoring over time is by far the least risky, and is where you should start. And the start of that is understanding the scenarios and getting tests in place. At some point, you'll know the refactoring is working, or you'll know a rewrite is needed.
But that is just the technical side. Most of your risk is not there.
As others have mentioned, you need to get your new bosses on board and aware of what the situation really is in terms they understand (specific business risks, specific business opportunities) and make sure they have your back. You will be the first to go if they are taken by surprise. They need to understand the jeopardy to business that already exists, and that while the team has reached a point of relative stability, it is perilous, and some risks will need to be taken to get to a point of actual stability.
The other main risk is the team itself. What do they value? Is it in line with where you know things need to go? If they walk, who will maintain the beast?
$20m/year is pretty impressive for that kind of spaghetti-code/tech amalgamation. It would certainly be a fun project for your more junior developers to dig into it and understand the actively used features. Which raises my next question: what exactly is wrong with your 3-person development team? Are you expecting only the 3 of them to make major changes, let alone a full rewrite, on such a project?
The way I see it is that you only have enough development resources to make minor changes or features that fit in the project's current spaghetti framework. Is that what management wants? If they want some big new features your only option is to find path of least resistance to implement them, especially if your budget is tight. Basically, add more hack-fixes and continue feeding the monstrous legacy. Unless you get more people, more budget you don't really have a choice of doing things "the proper way".
It takes time, but the outcome is a fully tested version of the already production-tested software, and there's no need to maintain two versions.
Then start thinking about replicating deployment of the application. At the start it can be a script that compresses and extracts the files to the production environment. This will help you build similar environments, or other, more experimental development environments.
Once you have a flow of the application state management and deployment under control, you can start building on top of it.
The most valuable work would be to build a separate test suite that documents the most mission critical parts of code or application.
Only after this would I try to reason about changes to the application. The great part is that you have the nginx configuration as an abstraction layer. From there you dissect the application into smaller pieces and replace it one redirection call at a time.
If the application has an expected lifetime of over 2 years, then these changes will pay for themselves in faster development cycles and better maintainability of the codebase for the people working on it. This can be a selling point to management for the roadmap or for recruitment.
Good luck.
It got this way exactly because management doesn’t see the point or the problem. The fix isn’t technical (not yet), it’s cultural and strategic first which isn’t something you have control over.
Also, the "without managing them directly" is interesting. Are you a peer of the existing three team members?
Focus and think of any other improvement you could do.
It sounds like management doesn’t think there is an actual problem to solve, so I wouldn’t necessarily pick refactoring or rewrite as the hill to die on.
If you go the refactoring route, i have little advice:
0. Clean up the database, it will immediately impact performance and make management happy
1. Find vertical (feature-wise) or horizontal (layer-wise) architectural boundaries and split the code base into module, separated libraries. This will be an ongoing process for a long while. Do it by touching as little code as possible - this is pure scaffolding, actual refactoring comes later.
2. Stick with PHP, at least until results from #1 aren’t good enough.
3. Use testing as a tool to pressure management, it works a surprisingly large number of times
4. Rewrite one feature/page at a time, once results from #1 indicate a good candidate. It might be a good idea to introduce a new language at this point, or even some form of micro services (if it makes sense).
You personally will gain no knowledge there, just that your codebase is hell.
You can try to convince management to create a new-generation implementation. Not a rewrite: new software that can fulfill customer needs better, compete better, and is safer and easier to extend in the future.
One thing you can do immediately, though, is set up modern practices: SCM, code review, CI, tests (most of the code might not be unit-testable in this state, but some tests at least). This way you can see what others do when they add or fix something and learn from it (SCM, reviews), make changes knowing you did not break the whole thing (tests), and have CI to at least ensure the tests run and everything works, gluing it all together.
Good luck
It would be incredibly unlikely to convince management to stop the roadmap for a full rewrite unless you can really give some solid evidence and numbers to show the rewrite costs less than the effort needed to get new functionality added reliably into those parts of the system with issues. For a large system that would be basically impossible. If not able to pause the roadmap, trying to continue development on new features and making sure the new code base is kept synchronized will just be a nightmare.
Like many others have said, the most likely strategy that will get a successful outcome would be to:
- Get some automated testing for key business flows in place. These act as documentation and contracts for the basic business functionality that guarantees that revenue. These then act as safety net for when refactoring is taking place.
- Do targeted refactoring, either as part of a 20% tech-debt-reduction budget you work into your roadmap planning, and/or factored into new-feature estimates (fix things while you are in there changing something)
- Get the basic structure and processes in place early as those will likely be possible to set up without a big, or at least minimal, impact to production (source control, branch management, PR process, coding standards, CI, deployment process)
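The first bullet can start embarrassingly small: a script that hits the revenue-critical URLs and checks status codes. The routes below are invented; the point is the shape, not the list:

```shell
# Check that a path returns the status code it should; $BASE is whichever
# environment you point the script at (staging, a prod mirror, ...).
check() {
  path="$1"; want="$2"
  got=$(curl -s -o /dev/null -w '%{http_code}' "$BASE$path")
  if [ "$got" = "$want" ]; then
    echo "ok   $path"
  else
    echo "FAIL $path: got $got, expected $want" >&2
    return 1
  fi
}

# Example run (routes are hypothetical):
#   BASE=https://staging.example.com
#   check /            200
#   check /products    200
#   check /no-such-url 404
```

It isn't a test suite, but wired into CI it already turns "did we just break checkout?" from a production surprise into a red build.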
It will take time to get through the whole source code, but you would be seeing incremental improvements over time at least. Plus, you can at least still manage to continue with the roadmap with adjusted expectations a bit more easily.
I have gone through a few different projects where it was either a full rewrite with new features only going into the new code base, full rewrite being kept synchronized with existing codebase receiving updates, and the incremental rewrite and the incremental generally will make the most sense.
How could the budget possibly be tight if this thing makes $20M a year?
Even at a 5% R&D budget you should be able to hire at-least a couple more devs.
Do you mean to say the whole company makes $20M? If not, what other costs are associated with producing this revenue?
> aggressive roadmap
> budget is really tight
Leave. If you care about the space, start a competitor.
This is pretty apparent since they seem to be earning 20 million dollars with a software managed by three junior engineers.
My advice to the OP - if you value good software engineering, this is not the organization you should be working for. Because no matter what you do, your effort will not be appreciated and you'll be replaced with a junior developer as soon as the management deems it necessary.
That is your core problem. If you are not directly managing then how can you bring about any changes?
If HQ management can't see the problems you see, then you are unlikely to receive any support for the changes you are contemplating.
Your number one problem is politics not technology.
You should also understand the audience. Who are the users of the app? It sounds like that the app does not need high reliability or availability, or any of the stuff that's required for typical mass market web apps. Understanding this might give you some room to improvise.
Sounds like someone needs to push back against management first and foremost. Without this understanding the only thing you'll succeed in doing is denting that $20m revenue stream with very little appreciable benefit and the higher-ups will understand even less what you're up against.
Get that message right first and small doors may open to better budget. Then approach as others have said, piece by piece, or as Martin Fowler describes as a StranglerFigApp (https://martinfowler.com/bliki/StranglerFigApplication.html)
The key is though, you don’t rewrite the code, you rewrite the app. Figure out what the functional pieces of the app and what it’s supposed to do. Don’t use any ActiveRecord style ORMs, so Laravel is out. If the app is that bad then SQL database is probably a huge mess. If it had to be PHP, use Symfony and Doctrine.
Build an MVC version of the application.
If there was any sort of structure to the application then the refactor not rewrite approach would be correct but if it’s anything like what I think it is, it’s a fucking mess. Refactoring will just make a bigger mess.
If you can get away with refactoring pieces at a time into symfony components until you can eventually have an MVC framework then do it but likely that would be a much bigger task.
It could be much worse. You could break something and cost the company money.
Deploy the code into a staging environment (make a copy of prod). Kubernetes might be useful to try to package the application in a replicable manner. Then get the tests running on CI.
When the tests cover literally everything the app can do, and everything (tests/deployment) are running on CI, changing the app becomes very easy.
Your junior coders have no doubt been yelled at many times for attempting changes and failing. When they begin to understand that change without breakage is possible, their confidence will increase, and they will become better coders.
Resist the urge to change the application at all until you have tests.
1.) Leave this mess behind you and quit - and miss an opportunity to learn a lot about code, yourself, teamwork and solving real world problems
2.) Work together with your team and solve problems, that probably will improve your skills more than anything in your future
I recommend you to give 2.) at least 6 months, before your quit.
What I would recommend:
- Create a git repository (I would not init it on the production server, but copy the code over to your machine, init, experiment a bit, and if you found a reliable way, repeat this process on the server)
- For the first weeks, continue developing on the server with one main branch, but at least push it to a central repository, so that you have a kind of VCS
- Set up a dev system that points to a cloned (maybe stripped-down) prod database, where you can test things
- Add composer in dev and see, if you manage to migrate this to production
- As you said, you already have an API, that is called via curl. That might be the way out of your mess. Create a new API namespace / directory in the old code base, that is fully under version control, uses composer and as little of the OLD mess of code as possible (you won't get out of this with a full rewrite). Write unit tests, wherever possible.
- I recommend to use jsonrpc in your situation, because it is more flexible than CRUD / REST, but this is up to you
- Get SonarQube up and running for the new API and manage your code quality improvement
- New features go to the new API, if possible
- Start to move old features to the new API, create branches and deploy only ONE folder from dev to prod: the api directory
- The database mess is a problem, that you should not solve too early...
This should take roughly a year. Have fun ;)
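For the "add Composer in dev" and new-API steps above, the starting composer.json for the new directory could be as small as this (package name and namespace invented for the example):

```json
{
    "name": "legacy/api",
    "require": {
        "php": ">=8.1"
    },
    "autoload": {
        "psr-4": {
            "Api\\": "src/"
        }
    }
}
```

`composer install` generates vendor/autoload.php; a single `require __DIR__ . '/vendor/autoload.php';` in the API entry point then replaces the chain of require_once calls for everything in the new namespace.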
There is simply no solution to this problem, so you better leave and go to work where things are actually handled by professional engineers and not some non-dev shitty manager. Simple as.
Rewrites are almost never the answer unless you wrote the previous version. Sure, to most of us here the code you’re describing might look like garbage, but it works and certainly a ton of wisdom has been embedded into it that will be difficult to replicate and understand unless you dive into what exists now and try to work with it on its terms for a little while.
I did a major rewrite early in my career based on something someone else built, and it was a total disaster for a while. I thought I knew better as an outsider looking in, and sure, eventually we did improve things, but a lot of my choices were not best practices, but some form of fashion.
Writing tests is great, but how do you even write good tests for spaghetti code like this and have faith in them? Answer: you can’t. But you can instrument your spaghetti code so that you have a fighting chance of seeing what’s wrong when stuff breaks.
After a year or so of instrumentation, small bug fixes, and fixing the absurdly stupid stuff, you’ll grok that spaghetti mess well enough and have enough political capital to be able to start refactoring great whacks of it. The strangler fig pattern mentioned earlier smells like the right approach, but you won’t really know until you’ve really grilled the codebase.
I've led rewrites in worse circumstances (a larger codebase split into 30 microservices, 15 people across 3 teams, making just 2M per year!) and I don't think you can do it with your current team. In the above example we downsized to 1 team of 4 people and then rewrote to 2 right-sized services.
The new team was all new people (introduced gradually), while we shifted out the previous employees to other areas of the business.
The bottom line you have to use with management is you need a more senior team. Hiring seniors is pretty hard nowadays and it doesn't sound like you can offer much of an environment.
Get a good agency for 1M / year and let them work with your team to understand the ins and out and then replace them.
Practically: cut the bleeding, get the current team at least using version control and working with a CI environment. That will be a lot of effort (been there before with a similar .Net product but much better team).
Then you're going to need significant resources to re-build on a modern architecture. I would simply go with releasing another product if that's at all possible. You clearly have some market and channel to sell into.
Just beware: this sounds like a problem which will take 3-5 years to solve and whose chance of success is dependent on organisational buy-in. So you need to ask yourself if you're willing to commit to that. If not, quit early.
Start using Git
Start doing code reviews, for newer code
Only refactor as needed. Don’t rewrite, it will likely end in disaster (we tried and failed)
Start deleting dead code. If you’re paranoid, comment it out for a few releases, before deleting
It is all about ROI - for example, removing inline CSS might be good practice, but does it really matter that much in your codebase? Maybe there are better things to do.
Even when refactoring, try to do it in stages. For example, simply splitting a large file into two or more files, without changing the code too much might be a good start.
For any new code that is being written have strict code reviews and rules in place, so past problems aren’t repeated
Your comment about productivity of the dev team is a red flag for me. They’re charged with containing this 20m revenue engine, it probably stresses them out big time. This is not the time to count feature development. When you’re treading water you don’t punish the survivors of the titanic for not also doing laps while they wait to be rescued.
Given you’ve made no comment as to expanding the team, I can only assume the business owner wants to make more money without investing in this product. There’s no magical advice that will unfuck the executive level if that’s the case.
Two reasons for this: (1) You haven't inherited anything; the business owns the code, not you as an individual. You and the tech team need to work together to make sure the code keeps generating revenue, and possibly more. No one is owning other people or teams. (2) The code is generating $20m annual revenue. That's pretty cool and not bad at all!
I'd follow the following steps:
1. Start by defining responsibility areas: input, code, output (business value). Any codebase can be modelled in this way. Once you have explicitly defined input and output of your code base, you know what your degrees of freedom are as long as you don't mess with input or output of your application. Also a good way to get to know the stakeholder landscape.
2. Introduce version control, move everything to Git. Git enables a nice way-of-working that is recognized industry-wide. Team work is everything.
3. Start writing tests. Preferably E2E tests that will be stable for a long time to come. In all cases, don't disturb the revenue flow with your changes. This will help you to make changes without having angry coworkers in your mailbox when your change caused existing functionality to break.
4. Fix the low-hanging fruit first. Define a list of maximum 5 issues that can easily be fixed in isolation and will improve the code base. Be sure everyone understands why and how things are done. This will boost team ownership.
5. Improve the codebase step-by-step. Be sure for every improvement to explain why it is important in terms of business value. If you can't explain it to yourself, maybe you are just fixing this for esthetics and it's not really important at all.
And finally, don't go for a full rewrite. Rewrites always seem easy, until you remember that you forgot to take into account all the edge cases the original code base did take into account and it's not as simple as you've thought after all. Instead move parts of the code to a new codebase and migrate slowly from v1 to v2.
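Step 3 doesn't need a test framework to get started. A minimal curl smoke test is already enough to give you the "you can change code as long as the tests pass" baseline; everything here is hypothetical (`BASE_URL` and `routes.txt` are placeholders for your own staging host and a route list scraped from the nginx config):

```shell
# smoke.sh: fail loudly if any known route stops returning HTTP 200.
# BASE_URL and routes.txt are placeholders for your own environment.
BASE_URL="${BASE_URL:-https://staging.example.com}"
[ -f routes.txt ] || : > routes.txt   # demo only: an empty list so this runs anywhere

failures=0
while read -r route; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL$route")
  if [ "$code" != "200" ]; then
    echo "FAIL $route -> $code"
    failures=$((failures + 1))
  fi
done < routes.txt

[ "$failures" -eq 0 ] && echo "all routes OK"
```

Wire that into CI so it runs on every deploy. It won't catch everything, but it turns "did we just break checkout?" into a one-command question.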
Listen to the more experienced people in the thread. They have good advice. Probably ignore the people who were on one lucky project that worked out with a risky full rewrite.
But the business' ambitious but naive plan is not viable, and it's your job to communicate why, and to figure out how a less ambitious series of slower goals could be achieved. If I were in this position as an IC, I'd literally just refuse to shoulder the stress of naively agreed-upon deadlines, because it wouldn't be feasible unless I risked burnout for a probably not-enough salary.
I would start by properly cleaning up the NGINX config.
I feel solving that will provide a basis for rewriting the other parts of the codebase.
Find new servers, back everything up onto them, make the changes there (including tests), and move the successful ones to production.
At least from a technical perspective, the key to making this manageable is not a re-write, that’s probably the worst approach especially if you have little to no buy-in from above. From a business perspective, a re-write provides little to no benefit and will only be a large cost and time sink, so you will never get buy-in on that anyways.
The key here is slow, progressive improvement. For example, get it in source control, that’s a relatively simple task and provides an endless amount of benefit. The next step which is a bit more complicated, is get a way to run this in a local development environment.
Getting a local environment for this type of situation can certainly be tough, and you have to be prepared to accept what may be considered a “non-optimal” solution for it. Does your code have a bunch of hard coded credentials and URLs that you would accidentally hit 3rd party services or databases from local and cause problems? The answer to that is NOT to try and extract all those things, because that will take a ton of time and you have no test environment. Instead cut the container off from all internet access and add-in a proxy container to it and give it a container it can proxy through for outbound, then you can explicitly control what it can reach and what it can’t, now you can progressively fix the hard coded problems.
Basically the key is to accept that shooting for “ideal” here is a mistake, and you have to sneak the small progressive improvements alongside meeting the business goals that have been set for the team.
In my experience, if you can sneak some simpler but very impactful changes in, then demonstrate how those help deliver on things, it will be easier to get buy-in. If you can point to being able to deliver a feature weeks ahead of previous estimates and attribute it to, say, having a sane deployment strategy, or a local dev environment, the advantages become clearer from a business perspective. If you say “we need time to fix this” but have no data or concrete examples of how this helps the business, you won’t get buy-in.
If you are not managing them directly and they don't want to do those kind of things because it sounds hard or foreign, then you can't really do anything about it.
You have inherited working code generating revenue, but in a state which makes it hard to develop new features and manage productively.
As you say that the roadmap is aggressive and management has no understanding of the situation, you have already established what you have to do: explain to management what makes development difficult (avoid statements like "this is the worst" and focus on what needs to be done to establish best practices, and identify where the quick wins to gain development velocity are; more expertise and less judgement is always a good idea). Then you propose a realistic roadmap and start making the changes that need to be done.
Reading this my first thought was, "I hope you're getting well paid. I would triple my fees going in to this scenario."
Then you come to "HQ has no real understanding ... budget is really tight."
Life's too short. If you can do this job at all you can do it for someone who doesn't have their head up their fundament. Failure seems inevitable, but you don't have to be the captain of that sinking ship. Let it fail without you. I mean, this isn't the sole company keeping alive the small home town you grew up in? This isn't your family business that's been handed down for generations?
There's really only one way to help improve a codebase / development process in a situation like this: one small incremental step after another, for a very very very long time. If you don't think you can enjoy that and have the patience to stay with the problem for a few years, consider looking for another job.
You have to convince them that not only is this situation a drag reducing their future revenue, as they cannot develop it further with any speed, but it can also come crashing down catastrophically at any point in time.
It also sounds like the current team is not up for it; you need more people and a dedicated project.
If you can't change the culture and get your boss(es) on board, then you will fail.
Right now, the business is likely "mostly happy" with things the way they are. They're getting their changes made (but not as quickly as they'd like). Their costs are low (3 junior devs, with just their laptops and a production server). Convince them that unless the changes you want are made, their business will become stagnant. Use phrases like "invest for future growth" and "protect the business' current investment in the product"
Each question you have above should be solved in an order that makes sense.
Full rewrites do not make sense if you'd have to put the project on hold. You have to make a greenfield space within the mud.
I recently did this. I inherited an Angular 1/Java project, and someone had already hired my team: 6 React/Node devs. They were JS devs, but not Angular. We just started embedding React in the Angular routes; the product team also wanted a new design, so we had two themes, old and new. At a certain point we were 80% there and made a push for the final 20%. It took 1.5 years to rewrite a FE e-commerce app.
Second: making a change without tests is like walking in the dark without a flashlight. Having tests is a very important thing.
Read "Working Effectively With Legacy Code" by Michael Feathers, one of the best books I've read that can really help in situations like this. In summary, it boils down to having tests to aid the changes you need to make.
At least half of the stuff you listed will probably never change. Congrats! Being the senior person means becoming comfortable with people making objectively worse decisions than you would, and putting the structure and architecture in place so that it still works anyway. As a bonus, most of those “objectively worse” decisions can be really good and better suited for the team than your decisions would have been ;).
That's it.
A concurrent small migration to the new system without changing all the system at once.
Why does it work? Because new systems often fail to encapsulate all the complexity; also, running two systems duplicates your workload, until you decide to drop the new system because of the first problem.
Finally, get stats from nginx and figure out which routes aren't used in a month; try disabling some and see how many dead routes you can find and clean up.
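A sketch of that route audit, assuming the default "combined" access-log format (the log sample below is fabricated; in practice you'd point this at a month or more of /var/log/nginx/access.log):

```shell
# Fabricated sample of an nginx access log in the standard "combined" format.
cat > access.log <<'EOF'
1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /products HTTP/1.1" 200 512 "-" "ua"
1.2.3.4 - - [10/Oct/2023:13:55:37 +0000] "GET /products HTTP/1.1" 200 512 "-" "ua"
5.6.7.8 - - [10/Oct/2023:13:55:38 +0000] "GET /checkout-v2 HTTP/1.1" 200 512 "-" "ua"
EOF

# Hit count per path (query strings stripped), least-used first.
# In the combined format, the request path is field 7.
awk '{split($7, p, "?"); hits[p[1]]++} END {for (r in hits) print hits[r], r}' access.log | sort -n
# Expected:
# 1 /checkout-v2
# 2 /products
```

Anything in the nginx config that never shows up in this list over a month is a candidate for disabling, behind a tag or backup so it's trivial to re-enable.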
There's no 20m/yr three-junior-dev team; it's just scene-setting for asking "what if there were a bad code base that was making money, how would you bring it up to spec?"
This community is great at offering advice and telling people how to do things the "better" way.
Posted on the weekend too so people that are having downtime on Sunday have enough time to reply. Sorry guys, if you don't get contacted, you haven't passed the first tech interview.
I was brought on board to 'modernize' a similar application. Almost a year later we haven't modernized anything... Despite a lot of promises from mgmt up front, they have now gone into 'if it's not broke, don't fix it' mode.
- Source Control
- CI/CD process
- Lock down production so there's no access
- Kill off dead code
- Start organizing and refactoring
etc...
Edit: A lot of people have already said the above. But I want to add:
Just because code sucks and is messy, obscure, has no structure, or breaks everything we learn as developers that defines 'good code' or 'good coding practices'... does not really mean it's bad if it's generating the business money.
It can often be quite fun to work on, because everything is a win: performance, cost reduction, easier maintenance, etc.
Took a couple of years to recover mentally from it. First off: make sure whoever is in charge understands how screwed they are. Hiring and retaining staff is a complete nightmare.
Get someone between your team and management.
Learn to say no to everything. Better yet, make that the go-between's job. Do not allow management access to any IT staff; they will destroy morale.
Support and slow cleanup are the only work done for a year or more. No new work.
Make sure the people who had a part in those decisions are gone. Otherwise you're wasting your time.
End of day: decide if you're up for this.
Code can be fixed, but people sometimes can't be. You need to break down the "resistance to change" somehow. Trying to convince people can burn a lot of time and effort on its own. If you can't easily convince them, and you can't overrule them to dictate the direction, don't even bother.
You need people and you need budget. The business doesn't understand bad code, but you should find a way to make them feel fear. They have been drinking poison for years without feeling ill. Make them understand how easily the house of cards could come crashing down.
I took the e2e approach, since making any change has that huge domino effect of breaking everything else. I think it's really important to set up a proper build/e2e CI pipeline with instant Slack reports and run it on every commit; from that point you can just add specs to fully cover it, and then it can be released nightly without a fuss.
WHERE IS THE MONEY, LEBOWSKI?!!?!
Seriously, WHERE IS THE MONEY GOING? I'm all about keeping a tight small team, but where is the money going? Even paying for a manager to help them move in a direction and address business risk would be worth the investment.
If you really have that level of revenue, and only 3 devs, then you need to be looking at the risk of losing one of them.
All the tech debt is irrelevant, your focus should be on mitigating risk due to attrition/burnout/mistakes.
That being said, just getting a sane deployment process would be helpful.
- Pair program to teach people you work with that there is another way; they may simply not know any better.
- Make any code you touch better; new / old, it doesn't make any difference. Do it right.
- Important: NEVER, EVER COMPROMISE ON THIS!!! Seriously, you skip one time and it can all be downhill after that (sayeth the voice of regretted experience).
Run. The problem here is empathically not on the technical side.
1. Stop cribbing.
2. Start using version control/git; build a Test/UAT environment.
3. Upskill your team; as you mentioned, your current team must also have inherited the code from someone else.
4. Try tools like dead/junk-code finders, lint, etc.
5. Try other refactoring tools and techniques.
6. Most important: try to gain trust, and re-read 1.
A) Is the current system stable? [Understood, it is messy!] If it is stable, there are ways and means to build/design/architect a parallel future roadmap without adding more mess.
> Resistance to change is huge.
These 2 quotes tell me they haven't yet recognized the grave danger and pain of their complexity. They will eventually, but for now neither management nor the team seem open to the radical change which they desperately need. Eventually collapse will come, but for now it's a no-win situation for you. Unless the money is insanely good and worth the stress, best path is to get the heck out.
What industry and type of business is this?
You should quit, there is no solution to that.
I second most comments against the "full rewrite" here:
- source control it
- get a local environment if you can
- write tests before changing/deleting something
Adding tests can be hard at first. The book "Working Effectively With Legacy Code" by Michael Feathers contains useful techniques and examples.
Be wary of Chesterton's fence: "reforms should not be made until the reasoning behind the existing state of affairs is understood". Don't fix or remove something you don't understand.
Or just quit that job, it might not be worth staying there.
Resistance means it is a hostage situation: https://neilonsoftware.com/difficult-people-on-software-proj...
It's very serious and coders do this for job security. Don't accept this BS.
Rewriting is the wrong approach.
2. Start planting seeds with upper management explaining that kicking the can down the road wrt code quality is like skipping oil changes in your car. They may not like change but they’ll have a broken car if they don’t start taking small steps now.
3. Study domain driven design and software architecture, primarily loose coupling and reducing live dependencies. You’re about to become phenomenal at software architecture. Codescene.io may help.
1. Add a staging environment.
2. Add CI/CD.
3. Add tests. Start with easy ones (new features) to get the team used to writing them. Then cover the important flows. If your team doesn’t have the bandwidth, hire a contractor. Ex: https://avantsoft.com.br
4. Choose the part of the code that warrants being refactored first. Balance easiness with importance.
5. Define a better structure/architecture for it.
6. Refactor.
7. Repeat from 4 as many times as needed.
Also, consider micro-services for new features; they may be an alternative to a full rewrite.
The team is another issue. That's where you need to make an immediate impact. But give them each the benefit of the doubt. Start by speaking with them individually. Then as a team. Establish a relationship(s). And then nudge by nudge make changes, changes to culture, workflow, coding standards, etc.
The way to fix things involving people is through something called leadership. That means you need to double down on your soft skills and you need the explicit support of management. If you hope a framework will do this for you then you are just as broken as that you wished were fixed.
Train your team, set high standards, and focus on automation (not tools, not frameworks). This is a tremendous amount of work outside of product. If you aren’t willing to invest the necessary extra effort you don’t seem to care that it’s fixed.
Quit.
Find a job where management has half a clue and is reasonable.
> - it runs on PHP
> - it doesn't use composer or any dependency management. It's all require_once.
Great --- explicit dependencies are better than magic. Personally, I'm a fan of require rather than require_once, because of some history, but require_once is mostly fine.
> - it doesn't use any framework
> - no MVC pattern of course, or whatever pattern. No templating library. It's PHP 2003 style.
This is the proper way to run PHP. Can you imagine if they used frameworks? It'd be a slow mess, with about 70 different frameworks. At least this is likely a bare metal, fast mess.
> - this code generates more than 20 million dollars a year of revenue
> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
So you've got 3 junior people managing 20M of revenue.
> - productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.
> I have to find a strategy to fix this development team without managing them directly.
> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.
HQ doesn't understand the process and can't even budget a manager, because apparently it's not your job to manage them. I'd bet their requirements are unclear and poorly communicated too.
> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )
Great, the routing is one place!
> - no caching ( but there is memcached but only used for sessions ...)
Do you actually need caching? You didn't say anything about the performance, so I'm guessing not.
> - In many places I see controllers like files making curl requests to its own rest API (via domain name, not localhost) doing oauth authorizations, etc... Just to get the menu items or list of products...
Curl to the same server is a bad pattern, yeah; localhost vs domain name doesn't make it better or worse. Figure out how to make those a call to a backend service, maybe? Are you also saying this is running on a single machine? (I think you are, but you didn't mention it.)
> - it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )
Ok, check in what you have, and make a deployment procedure that doesn't suck, and set things up so you have to use the deployment procedure.
> - no code has ever been deleted. Things are just added . I gather the reason for that is because it was developed on production directly and deleting things is too risky.
If you can, run profiling on the production site to see what code appears to be dead code, and run down the list.
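If wiring up a real profiler is too invasive at first, a cruder pass is to diff the PHP files on disk against the set production is known to execute. Everything below is fabricated for the demo; in reality the "seen" list would come from access logs plus something like `get_included_files()` dumped via `auto_prepend_file`, since many files are only ever include'd rather than requested directly:

```shell
# Fabricated tree and "seen in production" list, just for the demo.
mkdir -p app && cd app
touch index.php products.php index-new_2021-test-john_v2.php
printf '%s\n' ./index.php ./products.php | sort > seen.txt

# Files on disk that production was never seen executing: deletion candidates.
find . -name '*.php' | sort > on_disk.txt
comm -23 on_disk.txt seen.txt
# -> ./index-new_2021-test-john_v2.php
```

Files that never show up in the "seen" list for a month go on the deletion-candidate list; they still need a human look (cron jobs, yearly reports) before anything is actually removed.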
> - the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.
Depending on the size and volume of the database and the operational requirements, this is kind of what you need to do. Do you have anyone with operational database experience who could help them consolidate tables, if that's what's really required? Is the database a bottleneck? You didn't say that, you just said you didn't like it. There's ways to add columns and migrate data, but it requires either downtime or a flexible replication system and some know-how. Consolidating the tables without at least write downtime is going to be a lot more challenging than if they had the opportunity to add columns at the right time... of course, sometimes having tables with a join is the right thing to do anyway.
Is there budget for a staging system, complete with enough database instances to test a data migration and time to do it? Maybe focus on developing a plan for future column additions rather than trying to clean up the current mess.
> - JS and CSS is the same. Multiple versions of jQuery fighting each other depending on which page you are or even on the same page.
jQuery is pretty compatible right? You can make a list of all the pages and all the versions and maybe make time to test updating the pages with the oldest versions to newer versions, etc. Again, a staging system would help with testing. Developing a testing plan and running the tests is something that doesn't require much from the three overworked developers, but could be offloaded to a manager.
> fix this development team without managing them directly
This is the worrying part. If you're not their manager, or at least the technical lead dev, it's a lost cause, because you need to lay out a plan and have complete buy-in from management.
There's almost no realistic salary that makes working on (I presume) PHP 5 and this codebase forever worth it, given the effect on your future career prospects.
Ensure the beast is monitored, starting with the basics: CPU, disk space and so on.
Then everything goes into version control. Then changes can no longer be done in production; you need CI/CD. Just build it one step at a time.
Do not aim for perfection; just concentrate on having a framework (mentality) of continuous improvement.
You've been given the opportunity to test all your skills on a thing that "works" (makes money); you just need to find the metrics for where the money comes from and how to maximise it.
Pareto principle can be of help when making decisions.
Second task is to come up with a plan for your refactor. Break it down with time estimates, etc.
Doing this will put a safety net in place enabling rollbacks, introduce the team to version control, and give you a beachhead for automated testing in the pipeline.
Definitely DO NOT do a rewrite.
2) depending on the size of your db, you may want to just go with a shared dev db.
So now you can fix and enhance things in dev
3) Add in a modern web framework. Depends on your app, but I would go with something like Symfony: same language, and it can integrate old stuff you don’t want to rewrite yet.
4) Slowly and steadily migrate your routes to the new framework based on the new requirements
Last point is key, it is very likely to miss crucial logic hidden in existing code.
Consider the opportunity cost of cleaning up this mess. Consider the years of your life spent. The impact to your career. The stress.
In my opinion, unless the compensation is legendary OR this is something you feel very strongly about taking on, you might consider taking a different and more fulfilling role.
1. Get it into version control. 2. Modularise the code; this will help in understanding the structure. 3. Add dependency management. 4. Improve the code deployment process: CI/CD etc.
> - it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )
First step would be to get that into source control.
> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )
This might be a benefit actually. I'd just start a new application and route to the new code one-by-one using the Strangler approach.
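A sketch of what one strangled route could look like inside that existing config (the upstream name, port and route are made up): migrated routes get proxied to the new application, and everything else keeps hitting the legacy PHP untouched.

```nginx
# Hypothetical: the new application listens on its own port.
upstream new_app {
    server 127.0.0.1:8080;
}

server {
    # ... the existing 10,000 lines of legacy rewrites stay as they are ...

    # One migrated route; add a block like this per ported feature.
    location /products {
        proxy_pass http://new_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

The nice property is that each migration is a one-block config change, and rolling back a misbehaving route is just deleting the block again.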
That’s a very profitable business off 3 junior devs, so there is money for more (and more senior) people.
The junior devs can’t possibly like working like this; it will be through necessity and fear that they push back on you. Ask them what they think could be done to improve things and start there. Remove the fear of change.
If they are just being protective and won’t accommodate change then replace the most influential one with someone more senior once the team can cope with the loss.
A strategy you can use is to incorporate any refactoring into the estimates for "new feature" development, with the idea being that if you have to touch a part of the codebase, it gets refactored.
In this case, since there's no framework, I suggest having a framework gradually take over the functionality of the monolith. The fact that all the routes are in nginx will actually help you here, because you can just redirect a route to the new framework once its functionality has been refactored and ported.
Do not refactor the database, as interoperability between the legacy project and the new project can fail; migrations should still be executed in the new project.
What I do suggest is to get development, staging, pre-production and production environments going, because you will have to write a lot of pure Selenium tests to validate that you didn't break important features and that you did correctly recreate/support the expected functionality.
You can run these validation tests against a pre-production environment with a copy of production. This also gives you feedback if your migrations worked.
On the team, that's the hard part. If they walk out on you, you will lose all context of how this thing worked.
As precaution, get them to record a lot of video walkthroughs of the code as documentation and keep them on maintaining the old project while you educate them on how to work in the new system. The video walkthroughs will be around forever and is a good training base for new senior devs you bring in.
Last, make sure you have good analytics (amplitude for example) so you know which features are actually used. Features that nobody uses can just be deleted.
Over time, you will have ported all the functionality that mattered to the new project and feature development in the new project will go much faster (balancing out the time lost refactoring).
A business making 20 million/year should be able to afford a proper dev-team though, what are they doing with all that money?
You should be able to get budget for a team of 5 seniors and leave the juniors on maintenance of the old system.
Get the team functioning well, then improve the code.
For example, not having version control was already unacceptable 12 years ago. Someone on the team must be strongly opposed to it. Find why. If no-one is against it, just set it up yourself. If it's management, and you know it's not going to change, find some other management to work for.
Rinse and repeat for all the low-hanging fruit.
After that... Good luck.
However the lack of budget and that the management has an 'aggressive roadmap' says that the management team is toxic, ill-informed, negligent, and ignorant.
Your mental health takes priority. Get the fuck away from that tyre fire of a company.
I mean, if you see this as a fantastic opportunity to grow or whatever, then fine, have at it.
However, you’re going to be fighting a two-front battle, both against the devs and against management, for widely different reasons. It’s going to take a toll on you.
Ask yourself if you really want to spend the next few years doing work you probably won’t see any recognition for.
Without the authority to make changes, this will be very hard to do, given the scope of changes required. Soft skills and influence work up to a point, but given your remarks about resistance to change this is a big challenge.
You need to ask for the proper remit and authority, or decline and move onto another project or job.
If you're looking for reading resources I found "Working Effectively with Legacy Code" by Michael Feathers to be very useful helping me build a plan. Yes it's an older book but that helped me appreciate this is not a new problem.
The team will be able to try out how good programming can be and perhaps support you more. From there you should gradually move the old features in the new system. Even if you were to never fully complete the refactoring the situation would be much better.
If the answer is no, pick a common MVC framework like Django or Rails or Adonis, generate the models from a copy of your database, and make a minimal proof of concept.
This will go a lot longer way than just complaining about how bad everything is, and how everything needs a rewrite.
If the code generates $20M of revenue, then it is very successful code. It might be ugly, but clearly something works right. You say "the mess is just too huge to be able to build anything"; nevertheless, these three juniors have managed to build something with great business value. Most likely they are more productive, as measured in revenue per unit of development effort, than most of the experts giving you advice in this comment section. The worst code is code which doesn't work or doesn't fulfill its purpose, regardless of how many patterns and best practices it implements.
The dirty secret in software development is most advice and "best practices" have no empirical basis. If "bad" code is highly successful, is it really bad? If theory does not match reality, is it reality that is wrong?
So before you try to change everything, you should eat a bit of humble-pie and try to understand how the code became successful in the first place. Otherwise you very easily throw the baby out with the bathwater.
For example:
> it doesn't use composer or any dependency management. It's all require_once
I'm not familiar with PHP patterns, but I would venture a guess that this "require_once" pattern is also the simplest? If you talk to real seasoned experts, they will harp on "keep it simple", while complex patterns are often being pushed by sophomores and consultants.
> no code has ever been deleted. Things are just added . I gather the reason for that is because it was developed on production directly and deleting things is too risky.
Perhaps, but this is actually reminiscent of the open/closed principle, part of the SOLID framework, which at least at one point was considered best practice: improve code by adding and extending, not by rewriting working code already in use.
> no MVC pattern of course, or whatever pattern.
Great! Patterns are an antipattern. Or, slightly less flippant: patterns are not a sign of quality or a goal in themselves. Patterns are solutions to problems, so they're only appropriate if you have the problem in the first place.
Bottom line: You might learn a lot from working on this project.
> Resistance to change is huge.
I can understand that, if they have built something highly successful, and now you waltz in and declare that they are doing everything wrong because they are not using enough patterns.
You are right about source control though.
Even getting that process to stick properly ("Step 1") will be a challenge, never mind resolving the other 10 complaints in OP's list.
Start building a wiki and get knowledge from your team - they built everything in the first place. Embrace what they know and go from there.
Have you heard of Swimm for knowledge base/wikis/documentation?
> this code generates more than 20 million dollars a year of revenue
Budget is probably not as tight as you think
Your job isn't to fix the technical mess, but rather not kill the product. As an owner I wouldn't care how fast features can be released if my revenue started to drop.
Keep in mind that there may be passwords / keys in the spaghetti etc...
iykyk...
I would just add -- embrace the challenge. It actually sounds like a fun problem. After many years in tech, I've learned that I'd rather work on improving a pile of shit codebase that produces a lot of value than a pristine perfect codebase that does not.
This is bad.
> It's all require_once.
> it doesn't use any framework
This is not necessarily bad.
> no code has ever been deleted
This is bad
> Multiple versions of jQuery
This is bad
> a full rewrite is necessary, but how to balance it?
You never need to fully rewrite something. You can always take an incremental approach. If the code and 3 people generate 20 million dollars of revenue (on their own? or with massive sales support? what is the cost of goods?) then it's got to be doing something right.
I'd start with the source control. Just check everything in, so it's easy to go back. Do that on the server they develop on, even. (But have a script that pushes the checked-in code to offsite.)
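A minimal sketch of that first "just check everything in" step, rehearsed on scratch directories (the real paths would be the document root and a bare repo on a backup host; everything below is a placeholder):

```shell
# $APP stands in for the live document root (e.g. /var/www/html);
# a scratch dir is used here so the commands can be rehearsed safely.
APP=$(mktemp -d)
echo '<?php echo "hello"; ?>' > "$APP/index.php"

cd "$APP"
git init -q
git config user.email dev@example.com
git config user.name  dev
git add -A
git commit -q -m "baseline: production code exactly as it runs today"

# Push every commit to an offsite mirror so history survives the server.
# In practice this would be an ssh URL; here it's a local bare repo.
OFFSITE=$(mktemp -d)/app.git
git init -q --bare "$OFFSITE"
git remote add offsite "$OFFSITE"
git push -q offsite HEAD:main
```

From then on, the "script that pushes offsite" is just `git push offsite` run from cron or a post-commit hook.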
Second, make it possible to spin up a second instance of the same application, in some automated fashion, out of bare source control. This may mean dumping schemas and checking them in, and probably figuring out what data in the database are "necessary configuration" versus "user payload data."
Then, you can initiate integration testing on top of the second cluster. You can also turn this into some kind of local sandbox development setup.
Once that is done, you may be able to change the code quicker, because you can do and test it locally, and perhaps have some acceptance tests on top of the mudball. At that point, you can switch over to doing development locally and deploying (and, ideally, having the ability to un-deploy.)
After that, starting to clean up should at least be possible with less risk, because you can test it in isolation. You can then start pulling on threads in the code, such as standardizing library versions, detecting and deleting unused code, putting like modules together, and so on.
You don't need fancy tools for managing this, shell scripts and command-line git are probably plenty enough. Resist the temptation to spend six months engineering the build system of the future!
Of course a lot will depend on details, but from your brief description, this sounds like the path forward -- focus on making it possible and safe and cheap to iterate, and then you can get on with actual iteration. Don't waste time on big rewrites; instead do things incrementally. Don't believe that any one tool will save the day, because it won't.
Get the business to buy into "fixing" this before doing anything. Convince them to hire more, sounds like the current team is already swamped.
If you don't get business buy in, it may be the wrong place for you.
You need to introduce things bit by bit to convince the team. Start with version control.
Any experience you gain from improving this situation won’t benefit you in a future job change. The team will resent you for rocking the boat and implying their code sucks. Management won’t care and will fight anything that puts revenue at risk (rightfully so).
index-new_2022-test-whattodochange_v1.php
I did not get this: if it is three people who are juniors, how do they resist any changes?
Since it is only three, could you get approval to hire someone senior and start untangling?
As such, a framework in PHP is counterintuitive and will only slow things down more.
>> I know a full rewrite is necessary
Rewrite it in rust! /s
You’re most likely focussing on the wrong thing here. The tech doesn’t matter. It’s a business, this bit matters:
>> this code generates more than 20 million dollars a year of revenue
You need to be able to quantify which lines of code you're going to change to increase that $20M number to something higher, or at the very least, increase the amount of it that the business gets to keep rather than burn on costs.
This might sound like a hard problem at first glance but it’s really not.
>> This business unit has a pretty aggressive roadmap
This is a positive. To be clear the worst case is an apathetic business unit. This is huge, you’re already ahead. People want things from you so you’re free to exchange what they want for what you need. Think of other business units as part of your workforce, what can they do to help you?
>> management and HQ has no real understanding of these blockers
Yeah that’s the way it is and it’s totally ok, management doesn’t fully appreciate the blockers impacting the HR unit or plant maintenance or purchasing or customer service or etc etc but they DO NEED to know from you the problems you can see that they care about.
That means issues about how code quality is problematic are out of scope, but informing management that your team is going to continue to be slow for now is in scope.
Issues about developing in production are out. "Our working practice is unsafe and we have a high risk of breaking the revenue stream unexpectedly over the coming weeks and months": that is in scope for being communicated. At the same time, take them through the high level of your mitigation plan. Use neon lights to point out the levers they can pull for you, e.g. "we need SaaS product X at a cost of $$$ for the next year to help us deliver Y $$$ in return."
For every strategic piece of work you line up, be clear on how many $$$ it’s going to unlock.
Be clear on how you personally can fail here. Transparency and doing what you say you will go a long way.
Practice saying no.
You’re an unknown quantity to them so get ahead of that. For example, make it so you’re always first to tell the other units when the product has broken for a customer, rather than customer service telling you about a support ticket that just came in.
Yeah, we hate that. On the one hand, it's impossible to build off a shaky foundation. On the other hand, software quality rarely correlates with revenue. That's why we call it work?
Rewrites are super dangerous, doubly so if the team is junior: they would need to rebuild all the features, migrate, and develop the skills they lack now; otherwise you end up with the same mess in a new framework, and so on.
But first, I'd take apart some assumptions:
- this code generates more than 20 million dollars a year of revenue
This is GOOD! This means the project is important. There WILL be budget for this.
You need to find out two things:
1. What is the PROFIT margin?
2. HOW does this generate revenue?
If you can increase profits (or promise to) by either making it easier to generate more revenue (onboarding woes of new customers / sales UI, etc) or a bigger margin, you'll be golden.
- it runs on PHP
This is not necessarily bad; check your code wars at the entrance.
- it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )
THIS is one of the things that need to be remedied - source control NOW. Get a professional course on git for all devs, add some nice dinner for teambuilding.
- it doesn't use composer or any dependency management. It's all require_once.
This needs to be addressed.
- it doesn't use any framework
This needs to be addressed.
- the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )
This needs to be addressed!!!
- no code has ever been deleted. Things are just added . I gather the reason for that is because it was developed on production directly and deleting things is too risky.
Source control should take care of some. The rest is on you sitting with the product manager to find out WHAT is the core, what needs to be removed, what features need to be kept.
(a bunch of horrible code smells that ALL need to be addressed.)
- team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
OK... if the team is junior, you need to lead.
And WTF is an iOS / Android developer doing in your team?
- productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.
This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers.
You need to sit with the business and make them understand the blockers, the challenges, the timelines, AND the opportunities that your solutions will bring.
And post COVID, budget is really tight.
No it is not. See above.
Or at least, have the team listen.
Create a monorepo
Design and setup new architecture to use going forward
Allocate x%-time to write tests and port old stuff over time (continuous weekly process)
With good reason, I might say :).
Looks like the bow wave here has swamped the boat.
There have been some great replies here, and I agree step one is version control.
You have some directly measurable consequences of the underlying issues, as well as some obvious risks that are generally being ignored. Start with those:
1. Productivity is abysmal. Measure the time to implement a feature to get a feel for how long things actually take. How long does it take a feature from being requested by management to being released?
2. Unstated, but I'm guessing that release quality / safety is generally low. (due to lack of testing / staging / devops / source control). Measure this by looking at whatever system of bugs you can get (even if that's just show me all the emails about the system in the last year).
3. An aggressive roadmap. You're going to have to find some balance and negotiate this. If you happen to find a way to make the software better, but don't deliver any value to the business, you've failed. Learning this developer failure mode the hard way kinda sucks as it's usually terminal.
4. Resistance to change is huge. The team have so far been successful in delivering the software, and their best alternative to changing what they're doing for something else might just be to quit and do that something else somewhere else. What incentive do they have instead to change what they're doing here? This likely involves spending time and money on up-skilling. You've identified a bunch of areas that could be useful, now you've gotta work out how to make that change. E.g. actual time to attend paid courses during work hours on how and why to use git. You mentioned budget issues, but it's worth considering this old homily:
> CFO: "What happens if we spend money training our people and then they leave?"
> CEO: "What happens if we don't and they stay?"
5. You can see a bunch of risks, and the team knows them too. Right now, the team probably mitigates them informally with practices learnt from experience. (E.g. the add a new table with a join approach). Because the risks are adequately mitigated in their minds, there really isn't a problem. You're the problem for not seeing their mitigations. That said, by taking the approach of getting the team involved in risk planning, you may see them reevaluate those approaches and come to some opinions about what they need (i.e. source control, tests, devops, etc.)
6. Your people problem is such that you're going to have to convince the existing team to accept that they made mistakes. However you do that you're asking the team to reevaluate their output as a success and instead accept that they are failing. This might be the hardest part of any of this. To do so is going to take untangling the team's identity from their output. If you don't have the soft skills to do this, you'll need a mentor or stakeholder that can help you develop these. You will fail if you don't accept this.
7. Lastly, you're fighting against one of Einstein's quotes "We cannot solve our problems with the same thinking we used when we created them". Are you sure you can fix the problems created by the team, using only the members of the team? Unless you can change their thinking significantly, or add more people with different thinking (yourself and one more developer), then you're bound to fail.
I'd echo a bunch of jeremymcanally's comments below [1]
On the technical sides:
1. Buy each developer a copy of "Working Effectively with Legacy Code" by Michael Feathers [2]. Book club a couple of chapters a week. Allocate actual work time to read it and discuss. Buy them lunch if you have to. The ROI of $100 of food a week and several hours of study would be huge. Follow this up with "Release It!" by Michael Nygard [3].
2. Don't rewrite, use the strangler fig pattern [4] to rewrite in place. Others in this post have referred to this as Ship of Theseus, which is similar (but different enough). Spend some time finding some good youtube videos / other materials that go a bit deeper on this approach.
3. In the very short term, try to limit the amount of big changes you're bringing at once. Perhaps the most important thing to tackle is how each page hits the DB (i.e. stand up an API or service layer). If you try to change too many things at once, you end up with too many moving pieces. Once the impact of the first thing is really bedded in and accepted, you've earned enough trust to do more.
4. Stop looking at the symptoms as bad, instead always talk in terms of impact. By doing this you ensure that you're not jumping to a solution before examining whether the issue is as big as it seems, and you acknowledge that each suboptimal technology choice has real business level effect. E.g.:
- Lack of dependency management isn't bad, the problems it causes are the real issue (spaghetti code, highly coupled implementations, etc.). The business values predictability in implementation estimates.
- Lack of source control isn't bad, not being able to understand why a change was made is the real problem. The business values delivering the correct implementation to production.
- Lack of automated testing isn't bad, but spending time on non-repeatable tasks is a problem. The business values delivering bug free software in a reasonable time.
- Lack of caching isn't a problem, but users having to wait 30 seconds for some result might be (or might not if it's something done infrequently). The business values its users time as satisfied users sell more product.
[1]: https://news.ycombinator.com/item?id=32883823
[2]: https://www.oreilly.com/library/view/working-effectively-wit...
[3]: https://pragprog.com/titles/mnee2/release-it-second-edition/
[4]: https://martinfowler.com/bliki/StranglerFigApplication.html
Then start in on the code. Start by writing some basic tests (you'll probably have to do this as a series of curl commands because it's unlikely the interfaces are clean enough to do it any other way). You'll need the tests to make sure everything else you do doesn't break major functionality.
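Those curl-based checks can start as a shell script of one line per customer-visible page. A sketch, with a throwaway local server standing in for production so nothing real is touched (the URL, paths, and expected codes are all placeholders):

```shell
# Throwaway docroot + local static server as a stand-in for the real site.
DOCROOT=$(mktemp -d)
echo '<html>home</html>' > "$DOCROOT/index.html"
( cd "$DOCROOT" && exec python3 -m http.server 8099 ) >/dev/null 2>&1 &
SRV=$!
BASE=http://127.0.0.1:8099
# Wait until the server answers (up to ~5s).
for i in 1 2 3 4 5; do curl -s "$BASE/" >/dev/null && break; sleep 1; done

check () {  # check <path> <expected-http-status>
  got=$(curl -s -o /dev/null -w '%{http_code}' "$BASE$1")
  if [ "$got" = "$2" ]; then echo "ok $1"; else echo "FAIL $1 (got $got, want $2)"; fi
}

RESULTS=$(check / 200; check /missing.php 404)
echo "$RESULTS"
kill "$SRV"
```

Against the real site you'd set BASE to the production URL and add a `check` line per feature; any FAIL after a deploy means roll back first, investigate second.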
Then do the easy stuff first. Fix the parts that curl itself and make it a real API call. Fix the dependency management. Compress the NGInX file by eliminating whatever rewrites you can by adding routing into the code. Test often, deploy often.
Enable tracing to figure out what code can be safely deleted. See if you can find old versions sitting around and do diffs.
Replace all the code that accesses the data store with a data access layer. Once you've done that, you can bring up a new data store with a proper schema. Make the data access layer write to the new data store and do queries by joining the old and new as necessary. If possible have the data access layer write any data it reads from the old data store into the new one after it serves the request, and read first from the new data store. Log how often you have to read from the old data store. In theory this will go down over time. Once there isn't a lot of reads from the old data store, write a program that runs in the background migrating the remaining data.
Most likely you can do all of that without anyone really noticing, other than teaching them a new way to write code by doing a checkin instead of in production. Also you'll have to teach them to use the data access layer instead of directly going to the data store.
After you've done all that, don't try and rewrite the code. Spin up a new service that does some small part of the code, and build it properly with frameworks and libraries and dependency management and whatever else makes sense. Change the main code to call your service, then delete the code in the main service and replace with a comment of where to find the new code. Maybe if no one else is working on that service they won't notice. Make sure new functionality goes in the new service with all the dependency management and such.
Keep doing that with small parts of the code by either adding into the new service or spinning up new micro services, whichever way you think is best. Ideally do this in the order of how often each function is called (you still have tracing on right?). Eventually most of the important stuff will be moved, and then you can decide if you want to bother moving the rest.
Hopefully by then you'll have a much better velocity on the most important stuff.
I'd start by small incremental changes. A big change will be resisted.
Deployments first, separate environment next etc
- You need to get an understanding of why things are the way they are. Team of 3 people seems small. Is the team always in firefighting mode due to business constantly dropping things in their lap?
- Do not attempt a full rewrite. Here be dragons & krakens.
- One of the first things to do is to get your code into source control before you do anything else. That gives you insight into how often the code changes and in what way it changes.
- The routing, templating, caching, curl requests, dependency management issues all stem from the no-framework issue.
- You are going to face varying levels of resistance. Part of that is going to be from the business side of things.
My suggestions:
- You need to get management to understand the problems and on board with reform as soon as possible. Avoid framing the issues as technical problems. Explain the potential risks to the bottom line resulting from business continuity failure or regulatory/compliance failure (esp. if your industry is health/finance/insurance). If management is not on board, your reforms are very likely going to be dead in the water. Might be best to cut your losses.
- Get your code as-is into git asap.
- You will need more hands. At the very least, you need a senior who can help hammer things into a structured pattern that the juniors can follow.
- Carrot is going to be much more effective for convincing your devs to adapt to new changes. Understand their pain points and make sure to frame things as not questioning their competence. The understanding needs to be that their time is valuable and should be spent on things that deliver the most value to them and to the business.
- The business unit needs to rework their aggressive roadmap. I suspect there's an element of "we always have delays in releasing, so we need to keep the pressure up on developers to keep momentum up". You need some kind of process in place for managing roadmaps (we're currently working our way towards scrum across the business; it's difficult, but persistence even in failure is important).
- We've attempted rewrites of one of our products. It took much longer than we planned (currently still in progress). What we're currently doing is using Laravel as a front end to the legacy apps (Laravel receives the request and passes it on to the legacy app in the same request). It is working well so far and has the advantage of allowing us to use Laravel's tools (query builder, Eloquent, views, etc.) in the legacy app. Then we can progressively modernize the legacy functionality and move it fully into Laravel.
Also, remember to breathe and take a break now and then. Wishing you good luck. If you want to talk more or just vent, hit me up at voltageek [at] gmail.com.
First and foremost, always remember that you and your team are there to support that revenue stream. At the moment the junior developers have done that, but it sounds like they are at an inflection point and need help moving on.
Either one of two cases exists. The current state of software development is holding the business back from growing, or the business is near its limit, but the software is still a possible source of expense, reducing profit, either through excess maintenance or potential for failures.
In either case, your job is not to fix, it’s to lead and help.
First listen to each one of the developers in detail. Find out what they think are the problems. What difficulties they have on a day to day basis. Then teach them.
Perhaps they complain about losing work, or problems merging code. Teach them source control. Perhaps they really fear production changes causing outages. Teach them how to use a staging stack.
If the business is making revenue and sustainable, then you’ve got time and space. And always remember, that revenue is your goal along with your teams productivity. Your goal is not your own happiness with the stack.
If you stick with this company, the opportunity for personal and professional growth is incredible. You’ll learn skills you’ll use for the rest of your career.
So stick with it. And just remember, everything you know about how to run software development is the end goal. It’s where you want those developers to be at the end of the journey. But always listen to them first, and help them by teaching them how to help themselves.
Do a swot analysis with the team. Make them answer why it takes days to do simple changes. Make them answer how they'd recover prod if the disks died.
Block access to prod. The team has to code on Dev and upload their artifact to cicd.
They'll hate the change but it's policy and it's enforced. What are they going to do?
Block artifact upload to deployment. They have to merge a branch instead. Be extremely available to help them learn the SCM tool.
They'll hate the change but policy, etc.
Set up a work tracker that lets you link bugs to features. Populate it with historic data. Triage it extensively. Show the team how each bug comes from an earlier change. Show the team git bisect. (You'll need a test server at this point.)
Set them a target: average time per feature or issue. You'll abolish this metric once it's attained for the first time. In the meantime, it's hard to game the metric, because the codebase is fucked.
Wait, and see if they come up with anything on their own - dinner is cooked when it starts making interesting thoughts.
If they fail to work it out, you'll need to coach them. Give them little breadcrumbs.
You want them to understand:
- slow delivery == poor business outcomes
- bugs == poor business outcomes
- git helps with bugs
- cicd lets you write code
- testing reduces (delivery time + bugfix time)
Only when the team understands this can they do the work of fixing the app. (IMO that's a total rewrite, but you're not short of advice ITT.)
I don't have silver bullets for you, but hopefully you can benefit from my experiences.
> - this code generates more than 20 million dollars a year of revenue
Priority 0: don't fuck this up. Proceed cautiously, with intention. Focus on observability before you make changes. Get some sort of datadog type product, or run something in house.
Start building the culture of understanding risk, mitigating risk by having monitors. Get the other developers on a pager duty rotation, work to get them personally invested in operational excellence.
Get management on board with investing time in it: it's risk mitigation for their business. Get any incidents in front of them. Explain how and why it happened, what lead up to it, and things you're considering doing to remediate. Track how much time winds up getting spent there, and use that as an argument to proactively fix things. Most management will understand that if you're getting randomized, you're not being able to make progress on any single issue.
Work on getting a docker compose setup going so you can easily create a dev environment that looks exactly like production.
Use that to start creating black box tests. Consider things like selenium or postman. Your goal is to test as though you were your user and have no clue about the internals of the program. You do this so that when you make changes, you're not having to update tests as well. Write the tests first. Think in terms of TDD given/when/then. As you add new code, write unit (and integration) tests. Don't try to unit test existing code unless it's very simple.
> it runs on PHP
I feel the pain. Part of the engineering challenge here is accepting unfortunate initial conditions. Your goal is to raise the bar to sustainable.
> it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )
Priority 1: get this in source control. If you can't get the other devs immediately onboard, copy what's in production to your local machine and start a git repository. If you need to copy down files after they've changed them in production, that's ok, just start building the repository and tracking some history.
After rsyncing changes down, you'll be able to diff with your latest checkout to see what changes have been made.
This is another mentoring opportunity. Show the other devs how using git is making your life easier. Show them how it's helping you manage the risk that they're afraid of.
Ideally, get a gitlab or github account for it, and start getting a CI pipeline going. Proceed slowly here and make sure you build consent from everyone. Maybe start with a private gitlab account and again, show the other devs how it's saving you time.
The first iterations may just be starting a Dockerfile to recreate the production environment.
> - it doesn't use composer or any dependency management. It's all require_once.
> - it doesn't use any framework
The silver lining here is that it means you don't have any external dependencies :) My biggest concerns here would be:
- is it using pdo/mysqli, or is it on the legacy mysql extension?
- is it using parameterized queries, or are you going to need to audit for SQL injections?
Chip away at this over time. It's not urgent. I'm sure many HN heads may explode at that thought -- but until you've triaged everything, everything seems urgent. You've got unmet basic prerequisites here. Say you start fixing sql injections before observability -- how do you know you haven't accidentally broken some page?
> - no caching ( but there is memcached but only used for sessions ...)
Nothing to fix! wonderful! You can figure out a good caching strategy after everything else is under control
> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )
Having it centralized is actually a bit of a blessing. It means you're not having to scour the application for where it's being routed.
Start collecting nginx access logs, and getting metrics on what the top K endpoints are. Focus on those. Configure it to have a slow request log as well as an error log.
Do yourself a favor and setup the access log to use tabs to delimit fields. It'll make awking it, or pulling it into a database for querying much easier.
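With tab-delimited fields, the top-K analysis is a one-liner. A sketch; the field layout below is an assumption (it matches a hypothetical `log_format` of `$remote_addr\t$request_method\t$uri\t$status\t$request_time`), so adjust the `$3` to whichever column holds the URI in your config:

```shell
# Fake a few tab-delimited access-log lines to demonstrate on.
LOG=$(mktemp)
printf '10.0.0.1\tGET\t/products\t200\t0.120\n' >> "$LOG"
printf '10.0.0.2\tGET\t/products\t200\t0.090\n' >> "$LOG"
printf '10.0.0.1\tGET\t/cart\t200\t0.310\n'     >> "$LOG"

# Requests per endpoint, busiest first -- focus effort on the top K.
awk -F'\t' '{ hits[$3]++ } END { for (u in hits) print hits[u], u }' "$LOG" \
  | sort -rn | head -10
```

The same pipeline with `$5` summed instead of counted gives you total time spent per endpoint, which is often a better prioritization signal than raw hit count.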
> - no code has ever been deleted. Things are just added . I gather the reason for that is because it was developed on production directly and deleting things is too risky.
The silver lining is that the unused code is inert. This is another "chip away with time" type task. start finding paths that haven't had requests in N months. Use analysis tools to show that something isn't ever called. When someone starts with the "well, what if...", remind them that it's in the repository, and isn't gone forever. It's just a revert away.
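A first pass at that "no requests in N months" list can be as dumb as diffing files-on-disk against the access log. A sketch on scratch stand-ins; note this only catches directly-requested files, so anything that is only `include()`d still needs runtime tracing before being declared dead:

```shell
# Scratch site tree and access log standing in for the real ones.
SITE=$(mktemp -d); LOG=$(mktemp)
touch "$SITE/index.php" "$SITE/checkout.php" "$SITE/index-new_2021-test-john_v2.php"
printf 'GET /index.php 200\nGET /checkout.php 200\n' > "$LOG"

cd "$SITE"
# Every .php file never mentioned in the log window is a deletion candidate.
DEAD=$(for f in *.php; do grep -qF "/$f" "$LOG" || echo "$f"; done)
echo "$DEAD"
```

Candidates go on a list for review, not straight to `rm`; and since everything is in the repository by now, deletion really is just a revert away.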
A big theme here is fear. You need to start instilling confidence and resiliency in the team.
> - the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.
This isn't the worst thing in the world. It's also not urgent. Start putting together ERD diagrams, get the schema in source control, get a docker image going so that you can easily stand up a test database in a known state, nuke it, and start over.
> - JS and CSS is the same. Multiple versions of jQuery fighting each other depending on which page you are or even on the same page.
Slowly work on normalizing the jquery version. Identify all the different versions used, where they are, and make a list. Chip away at the list.
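Building that list can be a single grep over the tree. A sketch, again on a scratch tree standing in for the repo root (the version-string pattern is an assumption; widen it if jQuery is also loaded from CDNs or bundles):

```shell
# Scratch tree with a few pages referencing different jQuery versions.
SRC=$(mktemp -d)
echo '<script src="/js/jquery-1.4.2.min.js"></script>' > "$SRC/a.php"
echo '<script src="/js/jquery-3.6.0.min.js"></script>' > "$SRC/b.php"
echo '<script src="/js/jquery-1.4.2.min.js"></script>' > "$SRC/c.php"

# Count how many files reference each version, most common first.
grep -rhoE 'jquery-[0-9][0-9.]*[0-9]' "$SRC" | sort | uniq -c | sort -rn
```

The most common version is usually the cheapest target to standardize on first; everything else becomes a finite checklist.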
> no MVC pattern of course, or whatever pattern. No templating library. It's PHP 2003 style.
Not the end of the world -- this is pretty low on the priority list. Both are luxuries, and you're in Sparta. Start identifying the domain models, define POD classes for them, start moving the CRUD functions near by. The crud functions can just take the POD classes and a database connection.
> In many places I see controllers like files making curl requests to its own rest API (via domain name, not localhost) doing oauth authorizations, etc... Just to get the menu items or list of products...
Same as with the domain model and jQuery: make a list, chip away over time. Be sure the curl calls have timeouts. Slowly replace the self-HTTP-requests with library calls. Explain that if you only have N request workers, and all N requests are making subrequests, there won't be any workers available to serve those subrequests, and they'll fail.
> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
This is a bit of a social problem. 4 people can be very effective though, if you're all working together well. Get to the root of why they're resistant and fix that. Are they just set in their ways? Afraid?
Work with them to rank their top 3 challenges, and work through what solutions may be.
> - productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.
> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.
Measure, so management gets some visibility. Push back on work if you don't understand it. Be clear about what a "definition of ready" and "definition of done" is.
Don't stop the world to fix things. Consider having one person working on a fixup project while everyone else gives them cover by taking on the management ask.
> I know a full rewrite is necessary, but how to balance it?
I can't emphasize it enough, do not rewrite -- resist the urge, if you can't, find a different job. One of the challenges here is integrating with respect to time.
If you have no observability, and no tests, how are you supposed to even show that your rewrite behaves correctly? And if your team is so afraid of breaking something to the point of never deleting code, how do you expect them to handle deleting all the code?
That's about all I've got in me. I hope you're able to implement some meaningful change. Take it one day at a time, and just try to make it better than it was the day before. Good luck :)
There were thousands and thousands of business rules no one knew why they were there and if they were still relevant. I remember one fondly. If product=="kettle" and color=="blue" and volume=="1l" then volume=1.5l... This rule like many others would run on the millions of product lines they would import daily. And the cutest thing in the system was that if any single exception happened during a batch run... the whole run would fail. And every run would take close to 15 hours (sometimes more).
Not going into details ... But they couldn't afford the run going over 24 hours... And every day they were inching closer.
Similar to OP they extensively used EAV + "detail tables" to be able to add "things" to the database.
The web application itself was similar but less of a time-bomb. It was using some proprietary search engine that was responsible for structuring much of the interaction (a lot of it was drill-down in categories).
Any change on the system had to happen live with no downtime. Every minute of downtime was $1,000 in lost revenue.
The assumptions we had were:
1. At some point the system will catastrophically fail, so 100% of the revenue will be lost for a long time.
2. Even if it were possible to rewrite the system to the same specs (which it wasn't, because no one knew what the system actually did), such a rewrite would probably be delivered after the catastrophe.
The approach we used was to:
1. Instrument the code: see what was used and what wasn't. We set some thresholds, and we explained to the stakeholders that they were potentially going to be losing revenue/functionality. And we started NoOping PHP files like crazy. Remember, whatever they did, the worst thing they could do is raise
2. Transform all batch jobs to async workers (we initially kept the logic the same); together with #1 this allowed us to group things by frequency.
3. Rewrite the most frequent jobs in a different language (we chose Ruby) to make sure no debt could be carried over. NoOp the old code.
4. Proxy all HTTP traffic and group coherent things together with front controllers that actually had 4 layers: "unclean external", whatever mess we got from the outside; "clean internal", which was the new implementation; "clean external"; and "unclean internal", which would do whatever hacks were needed to recreate side effects that were actually necessary. The simple mandate was that whenever someone made any change to frontend code they needed to move the implementation to "clean external".
5. Port over the most crucial, structuring parts to Ruby as independent services (not really micro-services, just reasonably well-structured chunks that were sufficiently self-contained). If I remember correctly this was something of the size of "User" and "Catalog browser"; the other things stayed as PHP scripts.
6. And with savagery, any time we got the usage levels of anything low enough, we'd NoOp it.
Around a year in, there was still a huge mess of PHP around, but most of it was no longer performing any critical business functions. Most of the traffic was going through the new clean interfaces that had unit tests, documentation, etc. I think that 100% of the "write path" was ported over to Ruby. A lot of reports (all of them?) and some pages were still in PHP.
I don't think anyone ever noticed all the functionality that went away. We had time to replace the search engine with Elasticsearch. It wasn't clean by any means, but it was sturdy enough not to have catastrophes.
The company was bought by some corp around that time... and they transitioned the whole thing to a SaaS solution. I was no longer involved by then, so I only heard about it quite a while later. But we bought them that extra year or more.
So, as far as recommendations go:
1. Instrument the code (blackfire.io!).
2. Find the bang for the buck and some reasonable layer separation, and do it chunk by chunk.
3. Don't try to reproduce everything you have; go for the major use-cases.
4. Communicate clearly that this is coming with functionality loss.
5. Be emotionally ready for this being a long, long journey.
tl;dr - Don't rewrite, focus on the biggest pain points first and work your way down. Build a framework in which the junior devs can work on new stuff while you untangle the big ball of spaghetti - they'll think they're doing the big fun stuff and feel like they've won, while you'll be able to be heads-down making things better in the long run. If there's any analytics, you can use that to justify some big changes if you can show that inefficiencies (like poor DB performance and cache usage) affect revenue.
I find it a little shocking that 3 junior engineers can’t be convinced to learn/try something new that might look good on their resume or make their lives easier.
Life is limited, do you want to spend 5 years of it here?
Sorry what? What position are you in here? If you have no authority here then you are in a very precarious situation and you should figure that out first.
The website was terrible: mixed encodings that messed everything up, all the routing in htaccess pointing to hundreds of separate files, PHP 5.4, no version control, and pretty much everything else that could be wrong.
The company had a hot potato that was generating quite a few million pounds. They actually hadn't had a programmer for a year, but fortunately, because of the pandemic, the old one decided to come back.
I had quite a lot of trust from the owners, as I had already developed two other side projects with them, in crazy times. They knew they had to invest in technology or die, and they said they wanted to do that.
We had to start from scratch; there was nothing salvageable there, and files were 100k lines long with no indication of which ones were actually run anywhere. I wanted to replace it piece by piece, but because of the DB structure and encoding, I really couldn't find a way. Whatever we ended up with wouldn't have been a decent thing; it would just have been a Frankenstein that would then have to be rewritten again (although a drastically better one).
I told the owners how long it takes to develop a similar project, and said that this was not an estimate, as no one could know how long it would take (they had tried to redevelop it twice before and hadn't managed to).
The project struggled with a lot of issues. The other developer couldn't really contribute anything, even though I tried to pull him into the new code. He ended up taking over the product owner job on my advice, as he was actually really useful to the company, just not as a coder. We couldn't find anyone to hire, even though we paid pretty well and allowed people from anywhere in the world. We found two developers who were pretty skilled but seriously didn't do anything. I often deal with that, but in such a small project it just kills any productivity, including mine.
We managed to publish the project with a large delay. It ended up being super rushed, but we generally managed to sort out most of the issues quite quickly.
We failed on one thing though: SEO. Even though the site improved on all the stats in webmaster tools, reindexing still hadn't kicked in five months later. We hired an SEO agency, but frankly they didn't help at all. The issue had nothing to do with the new site; we were simply in Google's bad graces because of the previous site, and Google just ignored our new links while removing the old ones. I knew that would be the case with the new site, but the benefits should drastically outweigh the cons.
At this stage the company literally refused to pay me my shares and some money they owed me (they were broke, waiting for a new round of funding, so I had given them a bit of leeway). I had to stop working, and I am suing them now.
The moral of the story is:
- Everything will take way longer than you expect.
You cannot divide and conquer such a large project. Any amount of planning beyond the basics will just be a waste of time, as no one will be aware of all the features, some features will be lacking, and some are just plain stupid. Any scope will change a thousand times. Rewriting partially is way better, but it wasn't possible in my case.
- Business will say they want to invest the money, but they don't understand tech.
All the time, the proposed solution to too-slow progress was hiring more devs. Man-hours are almost never the solution; time is the far more important investment. There also has to be contingency, as something will go wrong. In my case the SEO issue will be solved, but it might take 3, 6, or 12 months, during which time the business will have to lose some sales.
- Communication with business is very hard.
At the same time, you need to explain that stuff will go wrong (unless you have an unlimited amount of time and resources) and will be delayed, while also asking them to invest money in it. Frankly, that is what I failed at the most. What I would make very clear now is that those issues were caused by a lack of investment over the years.
- Only go into it with a good team.
I managed to build a very good team in the end, but it took a lot of bad apples to get there, which wasted a lot of my time. People had the skills, but half of them tried to rewrite every piece of code without developing anything useful, and the other half just did nothing for weeks.
My view on it is: unless the business understands the need for change, has the money and time to do it, and you have a good team, don't do it. It seems like you are 0 for 4.
Some businesses cannot be saved. It's their fault they didn't invest any money over the years, and if you want to do tech, you need tech people in management or on the board.
1) You said you can't manage this team directly. Is it your responsibility to make this team successful? I know it's annoying to see a team with horrible code and who refuse to change. But is your manager expecting you personally to fix this? If not, just leave it.
2) Even if it's your responsibility, is this where you want to spend your time? As a leader you have limited time, energy and political capital. You need to decide strategically where to spend that time to have the best impact on your company and to achieve your personal career goals. The fact that you can't manage them directly makes me think that they're not your only job. If it's just one area of your responsibilities, I'd consider letting this team continue to fail and focus on other areas where you can make some wins.
3) Is how the business views this team wrong? They're making a lot of revenue with a very cheap team who seem to be very focussed on delivering results. Yes I know, it's annoying. They're doing everything wrong and their code is unimaginably dirty. But... They're making money, getting results and neither they nor the business see any problem. So again... should you just let it be?
4) Ok, so if you're absolutely committed that this code base has to be fixed... maybe you should just find a different job? Either in the same company or in a different company.
5) Ok, so it's your problem, you want to solve it and you're unwilling to leave. What do you do?
Well, anyone can make a list of ways to make the code better. Because this team has been doing everything perfectly wrong, it's not hard to find ways to improve: source control, automated testing, CI/CD, modern libraries, SOLID, clean architecture, etc, etc.
You can't quietly make the changes, because the team doesn't agree with you. And even if they did, this hot mess is way past the point of small fixes. You need to put in some solid work to fix it.
So you need buy in from management. You either need to deliver less while you improve the code base or spend more money on building a larger team. But since they see no problem, getting their buy in won't be easy.
Try to find allies, make a pitch, frame the problem in business terms so they understand. Focus on security risks and reputational risks. And don't give up. You may not convince them today, but if you make a pitch, they will remember in 6 months time, when this team is still floundering. They will remember that you were the person who had the answers. And then, they may come back and give you the time and resources you need to clean up the code base.
So in conclusion. If it's not your problem, ignore it. If you have other teams to manage that aren't a mess, focus on them and let this one fail. If you're going to be responsible for this pending disaster, quit. If you absolutely insist on making a change, start with getting buy in from management. Then incrementally work down the technical debt.
What I did was form a mental plan for how to get the org to a more sensible state - namely, having the application run on a framework, within a container, with tests, deploying from CI into an auto-scaling cluster of container hosts, configurable via environment variables. That was difficult, as the seniors all had reservations against frameworks, tests, and containers. So I went slowly, introducing things one by one, as they made sense:
* I started by rewriting core code as modules, in particular the database wrapper. They had cooked up an OOP abomination of a mysqli wrapper instead of just moving to PDO. So I wrote a proper PDO wrapper that exposed a compatibility layer for the old method calls and provided some cool "new" stuff like prepared statements. Modules like this could be installed from a private composer registry, which helped justify the need for composer.
* Instead of going for Symfony, I created a very thin framework layer from a few Symfony components on top of Slim. This didn't feel as "magic" as the bigger options would have, and didn't scare the devs away.
* To build up trust, I added an nginx in front of the old and the new application, which used version-controlled configuration to selectively route only a few endpoints to the new app. This went well.
* Now that we had proper entry points, we could introduce middleware, centralised and env-based config, and more. In the old app, we reused code from the new one to access the configuration. Dirty, but it worked. More and more code was moved over.
* I started writing a few tests for core functionality, which gave confidence that all this was really working fine. I wasn't able to make the other devs as enthusiastic about testing as I would have liked back then, though.
* Testing showed the need for dependency injection, so I introduced PHP-DI, which has the most elegant dependency injection mechanisms I know of. The senior devs actually surprised me here, as they accepted this without resistance and even appreciated the ability to inject instances into their code.
* Deployments would require uploading lots of files now, so I introduced Buddy CI, which is probably the most friendly CI server. It would simply copy everything from the repository to the servers, which was a large step forward, considering the seniors suddenly couldn't just upload fixes anymore.
* With the deployments in place, I introduced development and production branches, and let the team discover the need for fix and feature branches by itself.
* To avoid having to run both apps and nginx manually, I added container configuration and docker compose to spin up the stack with a single command. This convinced everyone.
* From there on, I added production-ready containers and set up Kubernetes on Google Cloud (this is something I wouldn't do at most places, but it made sense at this particular org). We deployed copies of the app into the cluster and set up a load balancer to gradually move requests over.
* One by one, we migrated services to the cluster, until practically all workloads were running as containers. The images were built by the CI, which would also run tests if available, push the images, and initiate the rolling update.
* At this point, things were very flexible, so I could add delicacies like dynamically deployed feature-branch previews, runtime secrets, and more.
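The version-controlled nginx routing described above (a classic strangler-fig setup) might look roughly like this; the upstream names and ports are hypothetical:

```nginx
# Route only the endpoints that have been ported to the new app;
# everything else keeps hitting the legacy PHP stack.
upstream legacy_app { server 127.0.0.1:8080; }
upstream new_app    { server 127.0.0.1:9000; }

server {
    listen 80;

    # Migrated endpoints go to the new application.
    location /api/users    { proxy_pass http://new_app; }
    location /api/catalog  { proxy_pass http://new_app; }

    # Everything else stays on the old code until it is ported or NoOp'd.
    location / { proxy_pass http://legacy_app; }
}
```

Because the config lives in version control, each newly migrated endpoint is a small, reviewable, and revertable change.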
All in all, we went from 80+ bare-metal servers (some of them not even used anymore) to a 12-node GKE cluster. Instead of manually updating individual files, we got CI deployments from production branches. Secrets in the code were gradually replaced with environment variables, which were moved from source-controlled .env files to cluster secrets. Devs got confidence in their code thanks to tests, feature branches, and local execution. From a custom "framework", we moved to commonly known idioms, paving the way for a migration to a full framework.
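The move from source-controlled .env files to cluster secrets can be sketched as Kubernetes manifests; every name, key, and image reference below is hypothetical:

```yaml
# A DB password stored as a cluster secret instead of a committed .env file.
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  DB_PASSWORD: "change-me"
---
# The application container reads it back as an environment variable,
# so the app code stays identical to the .env-based setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app
spec:
  replicas: 2
  selector:
    matchLabels: { app: legacy-app }
  template:
    metadata:
      labels: { app: legacy-app }
    spec:
      containers:
        - name: app
          image: registry.example.com/legacy-app:latest
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: DB_PASSWORD
```

The point of this shape is that the secret value never enters the repository, while the code keeps reading plain environment variables.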
What I didn’t manage was introducing database migrations, disciplined testing, and real secret management.
I hope this helps you, if only to draw inspiration to get started _somewhere_. Best of luck!
In parallel is a review of the disaster recovery plan... do a full test restore of code + data from scratch!
I would then encourage an evaluation to get the lay of the land. If my intuition is correct, there are high priority problems in production that no one is aware of, well beyond the tech debt.
Start by setting up centralized error logging as quickly as possible, from simple 404/500 and database-timeout reporting (is there any low-hanging fruit here, like redirecting URLs or speeding up the DB with indexes?) to more deeply entangled server-side error reporting. ELMAH was an eye-opener when first dropped into an existing cowboy-style ASP.NET app; I don't know if something similar exists for PHP for free, but you could learn a ton just trialing a commercial APM solution (same for DB optimization tools).
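Even before an APM trial, the plain nginx access log can surface the 404/500 hot spots. A sketch assuming the default combined log format, with an inline sample standing in for the real log (file names are hypothetical):

```shell
#!/bin/sh
# Count 4xx/5xx responses per URL from an nginx access log.
# In practice, point this at /var/log/nginx/access.log instead of the sample.
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /index.php HTTP/1.1" 200 512
1.2.3.4 - - [10/Oct/2023:13:55:37 +0000] "GET /old/report.php HTTP/1.1" 500 0
1.2.3.4 - - [10/Oct/2023:13:55:38 +0000] "GET /old/report.php HTTP/1.1" 500 0
1.2.3.4 - - [10/Oct/2023:13:55:39 +0000] "GET /missing.php HTTP/1.1" 404 0
EOF

# In the combined format, $7 is the request path and $9 the status code.
awk '$9 >= 400 { count[$9 " " $7]++ } END { for (k in count) print count[k], k }' \
    /tmp/sample_access.log | sort -rn
```

The most frequent status/path pairs at the top of the output are usually the cheapest fires to put out first.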
Then once the fires are identified and maybe even a few are out, analyze available metadata to determine the highest-traffic areas of the application. This combines client-side analytics, server-side logs, and database query profiling, and guides where issues should be fixed and tech debt should be paid down first. You can get down to "is this button clicked" if you need to, but "is this page/database table ever accessed" is helpful when getting started. (It's often nice to separate customers from employees here if you can, such as by IP if working from an office.)
Do you have the option of pursuing hardware upgrades to improve performance? (Is this on-prem?) You might want to dig into the details of the existing configuration, especially if the database hasn't been configured correctly. Which databases are on which drives, how are available IOPS allocated, can you upgrade RAM or SSDs? One big item here: if you are nearing any limits on disk space or IOPS, that might mean downtime if not addressed quickly.
In the cloud you have opportunity to find resources that are not being used anymore and other ways to cut costs. Here again you can trial commercial solutions for quick wins.
Finally, implement some type of ongoing monitoring to catch anything that happens rarely but may be absolutely critical. This might be best done through an automated scan of logs for new URLs and database queries. After a year to 18 months, you should have a good picture of which portions are completely dead (and can be excised instead of fixed). You can start cutting things out much sooner than that, but don't be surprised if a show-stopping emergency comes up at the end of the fiscal year, etc.!
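The "automated scan of logs for new URLs" can start as a simple set difference between paths already catalogued and paths just seen. A sketch with hypothetical file names and inline sample data standing in for real log extraction:

```shell
#!/bin/sh
# Paths already catalogued during the initial traffic analysis.
printf '%s\n' /index.php /products.php | sort > /tmp/known_paths.txt

# Paths extracted from today's log (normally: awk '{print $7}' access.log).
printf '%s\n' /index.php /products.php /yearly_report.php | sort -u > /tmp/seen_paths.txt

# comm -13 prints lines only in the second file: URLs never seen before.
comm -13 /tmp/known_paths.txt /tmp/seen_paths.txt > /tmp/new_paths.txt
cat /tmp/new_paths.txt
```

Anything that shows up here after months of silence (a yearly report, a fiscal-year-end page) is exactly the rare-but-critical functionality you don't want to excise by accident.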
These are all easily justifiable actions to take as someone hired to get things headed in the right direction, and can earn the political capital necessary to begin pursuing all of the other recommendations in this thread for managing technical debt.
Edit: there is one mention in the thread of prioritizing restructuring the DB; that sounds best, but also tough.
Step -2 is what you are doing now, OP, getting informed about the best way to go about this.
Step -1 is forming the battle plan of what you're going to change and in what order of importance.
Step 0 is communicating your plan to all stakeholders (owners, managers, devs, whoever) so they have an idea what is coming down the pipe. Here is where you assure them that you see this as a long process of continual improvement. Even though your end goal is to get to full VCS/CI/CD/DB Migrations/Monitoring, you're not trying to get there TODAY.
Step 1 is getting the codebase into a VCS. Get it in VCS with simonw's plan elsewhere in this thread. It doesn't have to be git if the team has another tool they want to put in place, but git is a decent default if you have no other preferences.
Step 2, for me, would be to make sure I had DB backups happening on a nightly basis. And, at least once, I'd want to verify that I could restore a nightly backup to a DB server somewhere (anywhere! Cloud/Laptop/On-prem)
Step 3, again, for me, would be to create an automatically-updated "dev" server. Basically create a complementary cronjob to simonw's auto-committer. This cronjob will simply clone the repo down to a brand new "dev" server. So changes will go: requirement -> developer's head -> production code change -> autocommit to github -> autoclone main branch to dev server.
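A minimal sketch of the autocommit half of that pipeline, run here against a throwaway local repo (the paths, commit-message format, and push/pull steps are assumptions; simonw's actual plan may differ in details):

```shell
#!/bin/sh
# Demonstrate the autocommit cron job against a throwaway local repo.
set -e
REPO=/tmp/autocommit_demo
rm -rf "$REPO" && mkdir -p "$REPO" && cd "$REPO"
git init -q
git config user.email "autocommit@example.com"
git config user.name "autocommit"

# --- roughly what the cron job would run on the production webroot ---
echo "<?php echo 'hello'; ?>" > index.php
git add -A
# Commit only if something actually changed, with a timestamped message.
git diff --cached --quiet || git commit -qm "autocommit $(date -u +%Y-%m-%dT%H:%M:%SZ)"
# In production you would also push: git push origin main
# ...and on the dev server, the complementary cron job runs: git pull --ff-only
# ---------------------------------------------------------------------

git log --oneline
```

Because the commit step is a no-op when nothing changed, the cron job can run every few minutes without polluting history.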
Chances are nobody has any idea how to spin up the website on a new server. That's fine! Take this opportunity to document, in a `README.md` in your autocommitting codebase on the production server, the steps it takes to get the dev server running. Include as much detail as you can tolerate while still making progress. Don't worry about having a complete ansible playbook or anything. Just create a markdown list of steps you take as you take them. Things like `install PHP version X.Y via apt` or `modify DB firewall to allow dev server IP`.
Now you have 2 servers that are running identical code that can be modified independently of each other. Congratulations, you've reached dev-prod parity[1]!
Note that all of these changes can be done without impacting the production website or feature velocity or anyone's current workflow. This is the best way to introduce a team to the benefits of modern development practices. Don't foist your worldview upon them haphazardly. Start giving them capabilities they didn't have before, or taking away entire categories of problems they currently have, and let the desire build naturally.
There are a number of things you mentioned that I would recommend NOT changing, or at least, not until you're well down the road of having straightened this mess out. From your list:
> it runs on PHP
The important part here is that it _runs_ on anything at all.
> it doesn't use any framework
This can come much, much later, if it's ever really needed.
> no code has ever been deleted.
As you make dev improvements, one day folks will wake up and realize that they're confident deleting code in ways they didn't used to be.
> no caching
Caching is a solution of last resort. If the current site is fast enough to do the job without caching, then don't worry about it.
1. Complete a risk assessment. List all the security, business, availability, liability, productivity, and other risks and prioritize them. Estimate the real world impact and probability of the risks, describe examples from the real world.
2. Estimate the work to mitigate each risk. Estimate multiple mitigation options (people are more likely to agree to the least bad of multiple options).
3. Negotiate with leadership to begin solving the highest risk, lowest effort issues.
But before you begin all that, focus on the psychology of leadership. Change is scary, and from their perspective, unnecessary. The way you describe each risk and its mitigation will determine whether it is seen as a threat or an exciting opportunity. You will want allies to advocate for you.
If all of that seems like too much work, then you should probably either quit, or just try to make small performance improvements to put on your resume.
No, seriously, some projects like this are lost causes. The company wants to just get maximal return on minimal effort. A rewrite is going to be a sunk cost with no return.
Basically, your job is to limp it along if you can't prove that a rewrite will make them more money.
If you don't like that answer, you might as well look elsewhere.