You could then evaluate the results by running TeXmacs with one ported method at a time to see if it seems to be computing the same thing as the original C++ method.
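A minimal sketch of what such a per-method check could look like, as a shadow wrapper around the original call site. All names here (`upcase`, `upcase_original`, `upcase_ported`) are hypothetical stand-ins, not actual TeXmacs code, and the ported version might in practice be reached through some FFI bridge:

```cpp
// Differential-testing sketch (hypothetical names, not real TeXmacs code):
// route one method at a time through the ported implementation and compare
// its output against the original C++ code on the same input.
#include <cctype>
#include <iostream>
#include <string>

// Stand-in for the original C++ method still living in TeXmacs.
std::string upcase_original (const std::string& s) {
  std::string r = s;
  for (char& c : r)
    c = static_cast<char> (std::toupper (static_cast<unsigned char> (c)));
  return r;
}

// Stand-in for the ported implementation (e.g. called over an FFI bridge).
std::string upcase_ported (const std::string& s) {
  std::string r = s;
  for (char& c : r)
    c = static_cast<char> (std::toupper (static_cast<unsigned char> (c)));
  return r;
}

// Shadow wrapper installed at the original call site: runs both versions,
// reports any divergence, and keeps returning the trusted original result.
std::string upcase (const std::string& s) {
  std::string expected = upcase_original (s);
  std::string actual   = upcase_ported (s);
  if (expected != actual)
    std::cerr << "port mismatch on \"" << s << "\": "
              << expected << " != " << actual << "\n";
  return expected;
}

int main () {
  std::cout << upcase ("texmacs") << "\n";  // prints TEXMACS, no mismatch
  return 0;
}
```

Once a method's wrapper stops reporting mismatches across normal use, the original could be dropped and the ported version promoted.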
So this AI benchmark would serve two purposes: testing AIs on their ability to port apps in general while also porting a particular app.
Just imagine the two extremes:
- every feature is translated very well into the other language, except the main function, and hence the program does not start
- no function or feature is properly translated, but the program starts.
Which one gets the better mark? In my eyes, it would be the second one: at least it starts in the end.
So a benchmark must consist of many different comparable tasks, not of a single (OSS) program like TeXmacs. If I've understood everything properly...
This approach does not seem manageable.