How much of the 'crisis of reproducibility' is caused by code that wouldn't pass the smell test, let alone a real code review?
If the code can't be reproduced, 7 times out of 10 it's because the author doesn't want you to be able to reproduce it so they can keep researching it themselves; 1 time out of 10 it's because of an error; the remaining 2 are fraud.
It's also important to NOT require that code be shared in some fields. I know of several companies that review papers for prominent journals in their field. They take results from papers they review, put them in the software they sell, and then reject the papers so competitors have no idea the new, useful thing exists. It's not illegal to do this, by the way; in fact, with blind reviewers there's no reason not to, as far as typical businesses are concerned.
The crisis of reproducibility is worst in the biological sciences and eases as you move toward the purer sciences. That's its own problem; I won't go too far chasing it down here... 9 times out of 10, scientific reporting is flawed not because "oh silly scientist, you made an error" but because of the social factors driving it. People just won't admit it, because they'd lose their positions if they did...
In practice, I'm not so sure. I have yet to meet a developer who reviewed another developer's project and said:
> Definitely, whoever wrote this code is 100% not braindead.
Academic code is written for a purpose. I'm afraid that if code review became mandatory, fewer papers would be submitted, for fear of negative criticism over a memory leak that would only exhaust memory if the program ran for three weeks (even though it finishes in four minutes).
On the other hand, the reviewer would most probably come from the same scientific field as the submitter, with a similar level of coding knowledge. I'm not sure they would be of any significant help regarding code quality; they would write the same kind of code.
With that said, I would require making the code part of the paper, regardless of how garbage it is.
I did not reject papers on that basis; I just asked the authors to add the hidden details back in, and they always did.
But it is being tried. This journal, https://mpc.zib.de/, does code reviews. I am not affiliated with them, but from what I gather the code-review experiment is considered a net positive. I have submitted papers there, and my experience was positive as well: the code reviews were not super deep, but they were constructive, and the reviewers seemed knowledgeable and impartial.
They do occasionally accept closed-source submissions, in which case they ask for binaries and ask the "code" reviewer to reproduce and check the numerical experiments. (It helps that they are in a field where checking the validity of results is often vastly easier, in terms of computational complexity, than generating them.)
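As a toy sketch of that asymmetry (my own illustration, not the journal's actual procedure; a dense linear system stands in for the much richer optimization problems involved), producing a result is far more expensive than independently rechecking it:

```python
# Toy illustration of the verification/solution asymmetry (my example, not MPC's process):
# producing x for Ax = b costs roughly O(n^3) via a factorization,
# while checking a claimed x needs only one O(n^2) matrix-vector product.
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# "Generating" the result: the expensive part a submitter's code would do.
x_claimed = np.linalg.solve(A, b)

# "Checking" the result: the cheap part a reviewer can redo independently.
residual = np.linalg.norm(A @ x_claimed - b) / np.linalg.norm(b)
print(f"relative residual: {residual:.2e}")  # small residual => the claimed x checks out
```

A reviewer who only has binaries can still do the cheap half of this independently, which is presumably what makes the closed-source arrangement workable.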
Besides, the journal is fully open access (free to submit, free to read) and very well regarded in its (admittedly niche) subfield.
My experience with Nature, Science, and Cell, both as a submitter and as a reviewer, is that the problems tend to be in the way inference is done. I don't think people lie with their code. They lie with the way they do inference, but the inference tends to be faithfully implemented as code.
For example, the null hypothesis tends to be false almost all the time, so it becomes a matter of accumulating enough data to get a small enough p-value to call the result "significant". The solution: encourage designs with a continuous explanatory variable, emphasize effect sizes, use partial pooling / shrinkage, and so on. And perhaps hire technical editors who have the right to veto flawed methods.
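A minimal simulation sketch of that point (my own toy example, assuming only numpy and scipy; the effect size and sample sizes are made up): with a tiny but nonzero true effect, the p-value can be driven arbitrarily low just by collecting more data, while the effect size stays negligible.

```python
# Toy illustration (hypothetical numbers, not from any paper in the thread):
# with a tiny but nonzero true effect, p-values shrink toward zero as n grows,
# while the effect size (Cohen's d) stays negligible.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.02  # tiny difference in means, in units of one standard deviation

for n in (100, 10_000, 1_000_000):
    control = rng.normal(0.0, 1.0, size=n)
    treated = rng.normal(true_effect, 1.0, size=n)

    _, p_value = stats.ttest_ind(treated, control)

    # Cohen's d: mean difference divided by the pooled standard deviation
    pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
    cohens_d = (treated.mean() - control.mean()) / pooled_sd

    print(f"n={n:>9,}  p={p_value:.3g}  d={cohens_d:+.3f}")
```

At n = 100 nothing is detectable; by n = 1,000,000 the p-value is vanishingly small, yet d stays around 0.02, an effect nobody would care about in practice. Reporting the effect size (and an interval around it) rather than the bare p-value is what changes the conclusion a reader draws.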
More generally, who's doing the reviewing, and do they have the skillset needed (and how are they being compensated)? Additionally, are they limited to just the code provided, or are they going to review any libraries (who covers the cost of any purchases?) or external code?