HACKER Q&A
📣 warrenm

Why are there no traditional language compilers that target the JVM?


Or maybe there are, and my Google-fu is weak.

"Traditional" compiled languages like C, C++, FORTRAN, etc. ought to be targetable to a portable runtime like the JVM, since it defines its own instruction set (Java bytecode).

Scala and other languages target the JVM as their runtime environment.

Why not older/traditionally-compiled languages?


  👤 jfengel Accepted Answer ✓
I wrote a book on that almost a quarter-century ago (Programming for the Java Virtual Machine). It includes toy compilers for both Lisp and Prolog. It can definitely be done.

I don't know if it's worth it to port existing code to the JVM that way. There are so many pitfalls in porting that it's often easier to rewrite it. The older the code is, the more likely it is to have something specific to the compiler/platform/library, and you'll spend more effort debugging that than you would by using it as a reference for a clean-sheet implementation in a modern language.

I picked Lisp and Prolog because those are languages with a very different style. Compared to them, C/C++/Fortran are all kinda the same language. Since then functional programming has taken on a lot more prominence, though logic-style programming still hasn't attracted much attention. (The book actually came out of a project to use logic programming as a database query language, and I still feel like that's better than where we are now.)

It blows my mind that the book is still in print, and dribbles out a few copies a year. It does need a good rewrite, since so many of the things I talked about are now practical rather than theoretical. The JVM has some new features for supporting non-Java languages, and there are now standard libraries for manipulating bytecode. (I had to roll my own.)


👤 jonahbenton
Because there is no sensible mapping from, like, C source to the JVM. C expresses semantics that are not available on the JVM, like most casts, and the JVM is built around concepts that don't exist in C or Fortran, like objects and method dispatch. They do exist to some extent in C++ but the semantics, memory model, etc, are completely different.

Someone could make up a mapping but it would be a numerological exercise, a flight of fancy forever lacking utility.

And anyway, there is a practical integration between the C/Fortran world and the JVM world. The JVM can call into compiled C and Fortran code, and for those who like suffering, JVM structures can be reached from C source.
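A minimal sketch of the cast mismatch, assuming a hypothetical C-to-bytecode compiler: value casts such as `(unsigned char)x` or unsigned division have no bytecode instruction of their own and must be lowered to masks or `java.lang` helper calls, while pointer casts such as `(struct foo *)p` have no equivalent at all. The class and method names are illustrative; `Integer.toUnsignedLong`, `Integer.divideUnsigned` and `Float.floatToRawIntBits` are standard JDK methods.

```java
// Hypothetical lowering of a few C casts for a C-to-JVM compiler.
// The JVM has no unsigned integer types and no pointer reinterpretation,
// so these become masks or calls to standard java.lang helpers.
final class CCasts {
    // C: (unsigned char) x: keep the low 8 bits, zero-extended
    static int toUnsignedChar(int x) {
        return x & 0xFF;
    }

    // C: (unsigned long long)(unsigned int) x: zero-extend 32 to 64 bits
    static long zeroExtend(int x) {
        return Integer.toUnsignedLong(x);
    }

    // C: x / y on unsigned ints; the idiv instruction is signed only
    static int unsignedDiv(int x, int y) {
        return Integer.divideUnsigned(x, y);
    }

    // C: *(uint32_t *)&f: reread the float's bit pattern as an integer
    static int floatBits(float f) {
        return Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        System.out.println(toUnsignedChar(-1));                   // 255
        System.out.println(zeroExtend(-1));                       // 4294967295
        System.out.println(unsignedDiv(-2, 2));                   // 2147483647
        System.out.println(Integer.toHexString(floatBits(1.0f))); // 3f800000
    }
}
```

The last method only covers the common float-to-integer type-punning idiom; anything that reinterprets a pointer to an arbitrary struct needs an emulated memory model instead.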


👤 jimwhite
There are four COBOL implementations for the JVM listed here, along with Prolog, Common Lisp, four versions of Scheme (don't forget Clojure), Smalltalk, two versions of Pascal, Go, Rust, three of JavaScript, Simula (the original OO language), and Basic (among others):

https://en.wikipedia.org/wiki/List_of_JVM_languages

GCC-Bridge, a C/Fortran compiler targeting the Java Virtual Machine (JVM) that makes it possible for Renjin to run R packages that include "native" C and Fortran code

https://www.renjin.org/blog/2016-01-31-introducing-gcc-bridg...

Earlier attempts at FORTRAN struggled because of performance issues, but there were some successes:

JLAPACK – Compiling LAPACK FORTRAN to Java https://www.hindawi.com/journals/sp/1999/179617/

https://www.semanticscholar.org/paper/Automatic-translation-...


👤 alexl97
There's also GraalVM if you want to target the JVM for multiple languages (e.g. you can use https://www.graalvm.org/latest/reference-manual/llvm/ for C/C++/Fortran).

👤 amelius
Because of pointers.

C and C++ work with a large pointer-addressable heap, the JVM works with objects.

You could emulate a heap in the JVM, though. You wouldn't get the advantage of the JVM's automatic garbage collection, but at least you'd get some form of portability (don't expect to run the JVM inside the JVM this way).
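One way to read the "emulate a heap" suggestion, as a minimal sketch: a single `ByteBuffer` stands in for C's address space, pointers are `int` offsets into it, and `malloc` is a bump allocator. `EmulatedHeap` and its methods are hypothetical names; there is no `free`, and nothing stops one allocation from overrunning the next, which is the C behaviour being emulated.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// A C-style flat heap emulated inside one Java byte buffer.
// "Pointers" are plain int offsets, so pointer arithmetic is int arithmetic.
// Hypothetical sketch: bump allocator only, no free(), no per-object bounds.
final class EmulatedHeap {
    private final ByteBuffer mem;
    private int top = 8; // offset 0 is reserved so that 0 can act as NULL

    EmulatedHeap(int size) {
        mem = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    }

    // malloc: hand out the next 8-byte-aligned block, or 0 (NULL) when full
    int malloc(int bytes) {
        int p = top;
        int next = (p + bytes + 7) & ~7;
        if (next > mem.capacity()) return 0;
        top = next;
        return p;
    }

    int loadInt(int p)                { return mem.getInt(p); }
    void storeInt(int p, int v)       { mem.putInt(p, v); }
    double loadDouble(int p)          { return mem.getDouble(p); }
    void storeDouble(int p, double v) { mem.putDouble(p, v); }

    public static void main(String[] args) {
        EmulatedHeap heap = new EmulatedHeap(1024);
        // C: int *a = malloc(4 * sizeof(int)); for (i = 0; i < 4; i++) a[i] = i * i;
        int a = heap.malloc(4 * Integer.BYTES);
        for (int i = 0; i < 4; i++) {
            heap.storeInt(a + i * Integer.BYTES, i * i); // a[i] as byte-offset math
        }
        System.out.println(heap.loadInt(a + 3 * Integer.BYTES)); // 9
        // The same bytes can be reread as another type, as a C pointer cast would.
        int d = heap.malloc(Double.BYTES);
        heap.storeDouble(d, 1.0);
        System.out.println(Integer.toHexString(heap.loadInt(d + 4))); // 3ff00000
    }
}
```

Every load and store becomes a method call plus a bounds check on the whole buffer, which is part of why this approach tends to be slow, and the garbage collector never sees the "objects" living inside the buffer.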


👤 haspok
What would be the point? For example, what would you gain by having a C compiler that compiled to bytecode? It would be amazingly low level (C is just an assembler with macros) yet amazingly non-performant. And of course you'd need to port libc et al to do anything non-trivial.

Don't forget that the JVM already has ways to communicate with native programs (JNI originally, but now a new FFI is in the works), so if you depended on any native code you could still use it.


👤 billythemaniam
In addition to other answers in this thread, those languages came first. Java's main motivation initially was to fix all the memory bugs that come from manually managing memory in C/C++ and to give a pure OOP development experience. Finally, for the use cases that C/C++ are still good for, you typically don't want automatic memory management or the JVM's performance overhead, so there isn't any incentive.

👤 millerm
You can literally compile C/C++ (using clang)/Rust to LLVM bitcode and run those using GraalVM: https://www.graalvm.org/latest/reference-manual/llvm/Compili...

👤 lboasso
The Oberon programming language is 37 years old. Since it is a memory-safe language, a compiler for the JVM can be written (with some workarounds); for example, see the self-hosting compiler oberonc [0].

[0] https://github.com/lboasso/oberonc


👤 mike_hearn
Here is the More Than You Ever Wanted To Know answer whilst I wait for a build to go green.

Firstly, for the last few years when you talk about this topic you have to distinguish between the "old world" JVMs which can only be given bytecode as input, and GraalVM, which is a superset of those JVMs and enables language implementations to use the JVM's features whilst bypassing the bytecode layer entirely.

Java bytecode has some problems that make it unsuitable for C-like languages. Most obviously, you can't do pointer arithmetic or arbitrary casts, which are fundamental requirements for real C code. This doesn't mean it can't be done (these are all Turing-complete machines, after all), but it means there's no point, because performance would be very poor due to all the workarounds. For C++ the gap gets wider because Java has its own ideas about how objects work that aren't the same as those in C++ (e.g. C++ has multiple inheritance of classes and Java doesn't), so you can't implement C++ classes as Java classes.

These sorts of problems affect any language that is semantically too far away from Java, like scripting languages. JVM bytecode is fairly well designed and reasonably general, and the invokedynamic bytecode was put in there to make scripting languages easier to implement. But it's a high-level bytecode, so the semantic mismatch is still there.
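The `invokedynamic` instruction can't be written directly in Java source, but the `java.lang.invoke` objects it links to can be driven by hand. A rough sketch, assuming a toy dynamic language whose `+` starts out bound to integer addition and is later relinked to string concatenation: `MutableCallSite`, `MethodHandles.lookup().findStatic` and `dynamicInvoker` are real JDK APIs, while `IndyDemo`, `addInts`, `concat`, `target` and `call` are made up for illustration. In a real implementation a bootstrap method would return the call site and the language runtime would decide when to relink it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Sketch: the call-site machinery behind invokedynamic, driven by hand.
final class IndyDemo {
    // Every dynamic "+" in the toy language has this erased signature.
    static final MethodType BINARY =
            MethodType.methodType(Object.class, Object.class, Object.class);

    static Object addInts(Object a, Object b) { return (Integer) a + (Integer) b; }

    static Object concat(Object a, Object b) { return String.valueOf(a) + b; }

    // Look up one of the static implementations above as a method handle.
    static MethodHandle target(String name) {
        try {
            return MethodHandles.lookup().findStatic(IndyDemo.class, name, BINARY);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invoke through the call site; wraps Throwable so callers need no throws clause.
    static Object call(MethodHandle invoker, Object a, Object b) {
        try {
            return invoker.invoke(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        // One call site, standing in for one invokedynamic instruction.
        MutableCallSite site = new MutableCallSite(target("addInts"));
        MethodHandle invoker = site.dynamicInvoker();
        System.out.println(call(invoker, 1, 2));     // 3
        // The runtime relinks the same site, e.g. after seeing strings.
        site.setTarget(target("concat"));
        System.out.println(call(invoker, "a", "b")); // ab
    }
}
```

Because the invoker always follows the site's current target, already-linked callers pick up the new behaviour without being regenerated.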

For a long time this problem seemed inherent to the design space. Any VM bytecode language you can design will end up encoding some assumptions about language design into it; if it doesn't, then you've just got assembly language, and we already have those. In effect, trying to create a universal bytecode is like trying to create a universal programming language.

But the JVM guys didn't give up. Sun/Oracle Labs spent many years on research and the result is Truffle. Instead of asking people to encode their language into a universal ISA so the JVM can understand it, Truffle is an API. It comes as part of GraalVM. You write an interpreter for your language using this API and compile that to JVM bytecode instead (it doesn't have to be written in Java but it does have to be a language that can produce bytecode). This interpreter is then fused with the code of your target language at runtime and fed through a very advanced optimizing compiler (Graal) which generates machine code for your language. This code is then "installed" into the running JVM using another API called JVMCI, and then has access to all the normal JVM services like the garbage collectors, profiling, deoptimization, observability, standard libraries, OS abstractions, ability to call to/from bytecode world and so on.

With Truffle you don't think about the low-level details of all that. You just write an interpreter. You do have to learn the API: your generated machine code will be relatively slow until you start using the API to annotate your interpreter and optimize it for common cases. But the API is pretty comprehensive and offers a lot of functionality, like language interop, debugging, tracing, hot swapping ...
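To give a feel for "you just write an interpreter", here is a plain-Java AST interpreter in the style Truffle expects: one node class per language construct, each with an `execute` method. This is deliberately not the Truffle API (no Truffle classes or annotations are used), and the rule that `+` on non-integers concatenates is invented for the example. The `sawNonInt` flag loosely imitates the kind of per-node specialization state that Truffle lets the Graal compiler exploit.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of a Truffle-style AST interpreter (not the Truffle API).
final class MiniInterp {
    public static void main(String[] args) {
        Expr e = new Add(new Var("x"), new Lit(2)); // the guest program: x + 2
        Map<String, Object> env = new HashMap<>();
        env.put("x", 40);
        System.out.println(e.execute(env)); // 42
        env.put("x", "n=");
        System.out.println(e.execute(env)); // n=2
    }
}

// Every construct of the guest language is a node that knows how to run itself.
abstract class Expr {
    abstract Object execute(Map<String, Object> env);
}

final class Lit extends Expr {
    private final Object value;
    Lit(Object value) { this.value = value; }
    @Override Object execute(Map<String, Object> env) { return value; }
}

final class Var extends Expr {
    private final String name;
    Var(String name) { this.name = name; }
    @Override Object execute(Map<String, Object> env) { return env.get(name); }
}

final class Add extends Expr {
    private final Expr left;
    private final Expr right;
    // Specialization state: stays false while only ints have been seen, so a
    // compiler could emit just the int path (and deoptimize if this flips).
    private boolean sawNonInt = false;

    Add(Expr left, Expr right) { this.left = left; this.right = right; }

    @Override
    Object execute(Map<String, Object> env) {
        Object l = left.execute(env);
        Object r = right.execute(env);
        if (!sawNonInt) {
            if (l instanceof Integer && r instanceof Integer) {
                return (Integer) l + (Integer) r; // specialized int-only path
            }
            sawNonInt = true; // generalize once and never go back
        }
        return genericAdd(l, r);
    }

    // Generic path for the invented rule: add ints, otherwise concatenate.
    private static Object genericAdd(Object l, Object r) {
        if (l instanceof Integer && r instanceof Integer) {
            return (Integer) l + (Integer) r;
        }
        return String.valueOf(l) + r;
    }
}
```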

The first languages implemented with Truffle were scripting languages. But the technique is general. It's not scripting or JVM specific and there's no specific reason the language your interpreter reads has to be textual. So they implemented an interpreter for LLVM bitcode. Now you can compile C/C++/FORTRAN/Rust using LLVM, and then execute that on the JVM. Performance is good, not quite as fast as natively compiled with GCC but in the general area. It's a fairly specialized thing to do and today is mostly used for running Python/Ruby extensions. However, because the whole thing is virtualized you can do some mind-bending tricks with it, for example, you can eliminate all the memory errors in the software run this way and then sandbox it without using kernel sandboxing. So it has some potentially big security benefits.
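A sketch of the general idea behind eliminating memory errors, assuming pointers are represented as (allocation, offset) pairs rather than raw addresses, so every dereference can be checked against the allocation it came from. `SafePtr` is a hypothetical class and this is not how GraalVM's LLVM runtime is actually written; it only shows why an out-of-bounds write can become a catchable exception instead of silent corruption.

```java
// Sketch of a "managed pointer": (allocation, offset) instead of a raw address.
// Illustrative only; not GraalVM's actual implementation.
final class SafePtr {
    private final int[] block; // the allocation this pointer points into
    private final int offset;  // element offset within that allocation

    private SafePtr(int[] block, int offset) {
        this.block = block;
        this.offset = offset;
    }

    // C: int *p = malloc(n * sizeof(int));
    static SafePtr mallocInts(int n) {
        return new SafePtr(new int[n], 0);
    }

    // C: p + k. Arithmetic may point anywhere, as in C...
    SafePtr plus(int k) {
        return new SafePtr(block, offset + k);
    }

    // ...but dereferencing outside the allocation is caught.
    int load() {
        check();
        return block[offset];
    }

    void store(int v) {
        check();
        block[offset] = v;
    }

    private void check() {
        if (offset < 0 || offset >= block.length) {
            throw new IndexOutOfBoundsException("offset " + offset
                    + " is outside a " + block.length + "-element allocation");
        }
    }

    public static void main(String[] args) {
        SafePtr p = mallocInts(4);
        p.plus(3).store(7);
        System.out.println(p.plus(3).load()); // 7
        try {
            p.plus(4).store(1); // a buffer overflow in C
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```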

And that's how you can run C or any other language on the JVM without killing performance.


👤 s17tnet
Haxe also targets the JVM [0], but it is not a traditional language.

[0] https://haxe.org/manual/target-jvm-getting-started.html


👤 JohnFen
I don't understand what the use case for doing that would be. One of the advantages of traditionally compiled languages is that you don't need a runtime component for the resulting binary.

👤 Traubenfuchs
There is the GraalVM Python Runtime, Renjin GCC-Bridge (for C, C++, R)...

I feel like all of this kind of exists but it's quite esoteric "non-standard stuff" and not necessarily something sane people want in production.

https://github.com/bedatadriven/renjin/tree/master/tools/gcc...

https://www.graalvm.org/python/


👤 h0l0cube
Things that are central to C bust the memory model of the JVM (e.g., pointer arithmetic). I did come across a C-to-JVM project, which I used to try to port something slightly complex, and it failed to work, though it may have been sufficient for the author's purposes of accelerating R projects that had dependencies on C libraries.

https://www.renjin.org/blog/2016-01-31-introducing-gcc-bridg...


👤 wsherman
Free Pascal (fpc) has had a JVM target since 2011: https://wiki.freepascal.org/FPC_JVM

👤 bjourne
There are many, but most never gained traction. Compiling C to JVM bytecode is obviously possible, but the resulting code would not run efficiently due to the semantic mismatch. For example, implementing C's call-by-value semantics for structs in bytecode would not be efficient. Neither would pointer arithmetic.
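A minimal sketch of the struct call-by-value mismatch, assuming a hypothetical compiler that maps a C `struct` to a Java class: the JVM passes object references by value, so a faithful translation has to insert a copy (a fresh heap allocation) wherever C copies a struct. `ByValue`, `Point`, `copy` and `nudge` are illustrative names. HotSpot's escape analysis can sometimes remove such allocations, but a compiler cannot count on it.

```java
// Hypothetical translation of a C struct passed by value.
final class ByValue {
    // C: void nudge(struct Point p) { p.x += 1; }  (changes only a local copy)
    static void nudge(Point p) {
        p.x += 1;
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        nudge(a);                // naive translation shares the object: wrong
        System.out.println(a.x); // 2, whereas C would leave the caller's x at 1
        Point b = new Point(1, 2);
        nudge(b.copy());         // faithful translation: copy at every call site
        System.out.println(b.x); // 1, matching C
    }
}

// C: struct Point { int x, y; };  modelled as a heap-allocated object
final class Point {
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Emulates C struct copy; this is a new allocation every time
    Point copy() {
        return new Point(x, y);
    }
}
```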

👤 exabrial
Your assumption is incorrect; it does exist. GraalVM actually supports some of these. And long before GraalVM, there have been multiple commercial offerings to run FORTRAN on the JVM, but they've gone in and out of existence.

Most of the time if you're writing C, C++, or FORTRAN, your primary goal is performance, not memory safety or portability. So executing on the JVM wouldn't buy you much.


👤 kosolam
Just wanted to note that maybe “traditional” isn’t the most accurate description. They are used not because of a tradition but rather due to mountains of legacy code. And, eventually everything boils down to specific requirements and tasks that need to be accomplished, and the question should be what is the best tool for the given job, given a set of circumstances and constraints…

👤 jbn
I will just leave this here: http://nestedvm.ibex.org/

👤 ilyt
Because portability is overrated compared to the performance/RAM usage hit. Someone stubborn enough could figure it out, but why?

👤 simne
This is economically ineffective.

You could write a backend for most compilers to target the JVM (instead of native machine code), but the result would be extremely slow.

Going the opposite way is much more fruitful: for example, by compiling Java to native machine code you will get about 10 times faster execution.


👤 bedatadriven
I have written one (gcc-bridge). Fortran is actually a really good fit for JVM bytecode. C/C++ is doable, but only with some unpleasant trade-offs that others have mentioned here.
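A sketch of why Fortran maps comparatively well: FORTRAN 77 has no pointer arithmetic, and passing `A(K)` to a subroutine just means "this array, starting at element K". Translating each array argument into a Java array plus an `int` offset covers that case; this is roughly the approach f2j-style translators take, though the exact conventions here are an assumption. The DAXPY port below is a simplified, hypothetical translation (positive strides only), and `colMajor` is a helper invented here for Fortran's 1-based, column-major indexing.

```java
import java.util.Arrays;

// Fortran: SUBROUTINE DAXPY(N, DA, DX, INCX, DY, INCY), computing DY := DA*DX + DY.
// Each Fortran array argument becomes a Java array plus an int offset.
final class Daxpy {
    static void daxpy(int n, double da,
                      double[] dx, int dxOffset, int incx,
                      double[] dy, int dyOffset, int incy) {
        // Positive strides only, to keep the sketch short.
        for (int i = 0; i < n; i++) {
            dy[dyOffset + i * incy] += da * dx[dxOffset + i * incx];
        }
    }

    // 0-based flat index of A(i, j) (1-based) in a column-major array with leading dimension lda.
    static int colMajor(int i, int j, int lda) {
        return (i - 1) + (j - 1) * lda;
    }

    public static void main(String[] args) {
        int lda = 3;                     // a 3x2 matrix stored column-major
        double[] a = {1, 2, 3, 4, 5, 6}; // column 1 = 1,2,3; column 2 = 4,5,6
        double[] y = new double[3];
        // Fortran: CALL DAXPY(3, 2.0D0, A(1,2), 1, Y, 1), i.e. y := 2 * column 2 + y
        daxpy(3, 2.0, a, colMajor(1, 2, lda), 1, y, 0, 1);
        System.out.println(Arrays.toString(y)); // [8.0, 10.0, 12.0]
    }
}
```

Since the offset is just an extra `int`, the JIT can treat these loops like ordinary Java array code, which helps explain why Fortran sits more comfortably on the JVM than C does.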

👤 AndrewDucker
Sure, compile to WASM and then use https://github.com/cretz/asmble to convert to JVM bytecode.

👤 simne
Plus, in addition to the slowness, there are troubles with the JVM license. Better to use the Apache-licensed LLVM, or something like it (BSD/MIT licenses).

👤 Alifatisk
TruffleRuby and JRuby target the JVM as well.

👤 bigbillheck
There are a whole bunch of reasons, but like so many other things, one of them involves RMS: https://gcc.gnu.org/legacy-ml/gcc/2001-02/msg00895.html