HACKER Q&A
📣 amichail

Why not create AI benchmarks to port open source to other languages?


For example, the benchmark could ask the AIs to port 1000 methods in the TeXmacs C++ source to another language such as Rust.

You could then evaluate the results by running TeXmacs with one ported method swapped in at a time to see whether it computes the same thing as the original C++ method.

So this AI benchmark would serve two purposes: testing AIs on their general ability to port apps, while also getting a particular app ported.
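To make the evaluation concrete, here is a minimal sketch (in Rust, the example target language) of what checking a single ported method could look like. All function names and inputs are invented for illustration; in a real harness the "original" would be the C++ symbol linked from the TeXmacs build, and the inputs would be captured from an actual TeXmacs session.

    // Stand-in for the original C++ method; in the real benchmark this
    // would be the TeXmacs symbol linked via extern "C". Stubbed in Rust
    // here so the sketch runs on its own.
    fn original_round_decimals(x: f64, digits: i32) -> f64 {
        let factor = 10f64.powi(digits);
        (x * factor).round() / factor
    }

    // The AI-produced Rust port under test (hypothetical).
    fn ported_round_decimals(x: f64, digits: i32) -> f64 {
        let factor = 10f64.powi(digits);
        (x * factor).round() / factor
    }

    fn main() {
        // Sample inputs; a real harness would record these from a live run.
        let inputs = [(3.14159, 2), (-0.005, 2), (12345.678, 1)];
        for &(x, digits) in inputs.iter() {
            let expected = original_round_decimals(x, digits);
            let got = ported_round_decimals(x, digits);
            // Tolerant float comparison: results may differ in the last bits.
            assert!((expected - got).abs() < 1e-9, "mismatch on ({x}, {digits})");
        }
        println!("ported method agrees with the original on all sampled inputs");
    }

Repeating this per method would give a per-method pass/fail signal rather than a single whole-app outcome.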


  👤 theGeatZhopa Accepted Answer ✓
Generally, that's a good approach for showing the capabilities of such a system. But I don't think it's well suited as a comparison benchmark, because it just yields a 1 if TeXmacs (or some function of it) runs, and a 0 if it doesn't.

Just imagine the two extremes:

- every feature is translated very well into the other language, but not the main function, so the program doesn't start

- no function or feature is properly translated, but the program starts.

Which one gets the better mark? In my eyes, it would be the second one: at least it starts in the end.

So a benchmark must consist of many different, comparable tasks, not a single (OSS) program like TeXmacs. If I've understood everything properly...
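As a minimal sketch of what such graded, per-task scoring could look like (all names invented for illustration): each ported method is its own pass/fail task, and the benchmark score is the passing fraction, so a port that nails most methods but misses the main function still earns most of its credit.

    // One record per ported method (hypothetical results).
    struct MethodResult {
        name: &'static str,
        passed: bool,
    }

    // Score = fraction of methods whose differential tests passed.
    fn score(results: &[MethodResult]) -> f64 {
        let passed = results.iter().filter(|r| r.passed).count();
        passed as f64 / results.len() as f64
    }

    fn main() {
        let results = [
            MethodResult { name: "round_decimals", passed: true },
            MethodResult { name: "parse_length", passed: false },
        ];
        println!("benchmark score: {:.2}", score(&results)); // prints 0.50
    }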

The single-program, run-or-not approach doesn't seem manageable.


👤 pancsta
There is already a benchmark, much simpler than the one you described, that AI still hasn't passed: http://swebench.com/