HACKER Q&A

Recommendation for general purpose JIT compiler


Are there any open-source "general purpose" JIT compilers out there? "General purpose" in the sense that it is not tied to any particular programming language (unlike V8 and LuaJIT). At a high level, I want to feed in IR and get out a function pointer to call.

My use case is I have a DSL with a custom parser and interpreter. The DSL is essentially a programming language and is proving too slow (in terms of latency). The bottleneck is in the interpreter. I want to replace the interpreter with a JIT without having to deal with assembly code generation myself.

Preferably in Rust and/or Rust bindings. Preferably lightweight (small object code footprint). Preferably cross-arch (x86, arm, arm64).


  👤 eklitzke Accepted Answer ✓
This is exactly the API presented by LLVM ORC and gccjit: you feed in their IR/bitcode and they return to you a C function pointer.

https://llvm.org/docs/ORCv2.html

https://gcc.gnu.org/onlinedocs/jit/


👤 chaosite
There is Graal's Truffle [0]

You'll have to rewrite your parser/interpreter in Truffle, but you get everything else "for free".

Not in Rust. I wouldn't call it at all lightweight. It is cross-arch in the sense that the Graal JVM is cross-arch, which may or may not be sufficient for your purposes.

[0] https://www.graalvm.org/22.0/graalvm-as-a-platform/language-...


👤 boulos
As folks have mentioned, LLVM's JIT produces great code and is (relatively) easy to use.

However, LLVM is extremely heavyweight. Which "latency" did you mean? Are you going to run these functions 1M times, so that the quality of the generated code is paramount (and you can afford really long compile time) or do you care more about "I hit enter and get the answer"? You can tune LLVM (disable almost all passes, use fast instruction selection) but it's really not focused on millisecond-ish compilation.

There are a lot of "simple JIT" libraries for the latter case (you just want to feed in a simple IR and get machine code out, do an okay job at register allocation, but nothing fancy). None of them has "won" and most only have C bindings (to my knowledge).
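To make the "feed in a simple IR" idea concrete, here is a hedged Rust sketch of the kind of minimal register-based IR such libraries typically accept. The names and opcodes are hypothetical; a real JIT library would lower these ops to machine code, whereas this sketch just interprets them to show the semantics:

```rust
// Hypothetical minimal register-based IR of the sort a "simple JIT"
// library might accept. A real backend would lower these ops to
// machine code; here we interpret them only to illustrate the shape.
#[derive(Clone, Copy)]
enum Op {
    LoadConst(usize, i64),    // r[dst] = imm
    Add(usize, usize, usize), // r[dst] = r[a] + r[b]
    Mul(usize, usize, usize), // r[dst] = r[a] * r[b]
    Ret(usize),               // return r[src]
}

fn eval(code: &[Op], regs: &mut [i64]) -> i64 {
    for op in code {
        match *op {
            Op::LoadConst(d, k) => regs[d] = k,
            Op::Add(d, a, b) => regs[d] = regs[a] + regs[b],
            Op::Mul(d, a, b) => regs[d] = regs[a] * regs[b],
            Op::Ret(s) => return regs[s],
        }
    }
    0
}

fn main() {
    // Compute (2 + 3) * 4.
    let code = [
        Op::LoadConst(0, 2),
        Op::LoadConst(1, 3),
        Op::Add(2, 0, 1),
        Op::LoadConst(3, 4),
        Op::Mul(4, 2, 3),
        Op::Ret(4),
    ];
    let mut regs = [0i64; 8];
    println!("{}", eval(&code, &mut regs)); // prints 20
}
```

The JIT library's job is then exactly the part this sketch skips: turning that op list into native code and handing back a callable function pointer.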


👤 MaxBarraclough
GNU lightning is a simple and portable JIT library written in C, it might be a good fit. Its engine is fast and minimal, and does not perform much in the way of optimisations for you.

(For what it's worth I'm a very minor contributor.)

* https://en.wikipedia.org/wiki/GNU_lightning

* https://www.gnu.org/software/lightning/

If you want a more sophisticated JIT engine, others have already mentioned libgccjit and LLVM (heavyweight compiler solutions), as well as Cranelift and Mir (more lightweight).

Of these, only Cranelift is written in Rust.


👤 ogogmad
Haven't used the RPython toolchain, but it's worth looking into. You write an interpreter in a restricted subset of Python2 called RPython and have your interpreter report where a loop in the guest language starts and ends. The interpreter is then effectively transformed into a JIT compiler. The underlying principle is called meta-tracing.

[edit: Changed "host" to "guest"; Python to Python2]

I have a question to the experts though: The principle of meta-tracing suggests you might be able to write your guest language in Python. Is that currently possible with RPython/PyPy?


👤 stormbrew
I would personally try cranelift for this. Its goals are around producing executable code quickly rather than optimally, which is pretty much the opposite of llvm. There are lots of things out there that use llvm for jit, but even its jit library layer is pretty heavy and you’ll probably spend more time generating code than you’d like.

That said, cranelift is still experimental.


👤 tlb
LLVM (in ORC mode) is very powerful. I wrote a JIT compiler for a DSL with it. It takes a fair bit of poring over the IR manual to figure out basic things. The optimization is as good as for C, including automatic unrolling of loops to generate SIMD instructions.

A very useful feature is that you can write C or C++ and run it through LLVM to see the IR it generates, and adapt it to your needs. You can even do it in Godbolt.

If your generated code is crunching over large amounts of data, an alternative to a JIT is to make the interpreter implicitly parallel. So each interpreter dispatch operation does N (say, 16) parallel operations, effectively cutting interpreter overhead by N. It works if there isn't data-dependent branching, which is often true for numerical operations.
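The batched-dispatch idea above can be sketched as follows; this is an illustrative Rust toy (the opcode names are made up), showing how one dispatch per opcode can process N lanes at once so the interpretation overhead is amortized across the batch:

```rust
// Sketch of an "implicitly parallel" interpreter: each dispatched
// opcode operates on a batch of N values, so the per-element
// interpretation overhead is divided by N.
const N: usize = 16;

enum BatchOp {
    AddScalar(f64),
    MulScalar(f64),
}

// One dispatch per op processes a whole batch of N lanes.
fn run(ops: &[BatchOp], lanes: &mut [f64; N]) {
    for op in ops {
        match op {
            BatchOp::AddScalar(k) => lanes.iter_mut().for_each(|x| *x += *k),
            BatchOp::MulScalar(k) => lanes.iter_mut().for_each(|x| *x *= *k),
        }
    }
}

fn main() {
    let mut lanes = [1.0f64; N];
    // (1 + 2) * 3 applied to all 16 lanes with only two dispatches.
    run(&[BatchOp::AddScalar(2.0), BatchOp::MulScalar(3.0)], &mut lanes);
    println!("{}", lanes[0]); // prints 9
}
```

As tlb notes, this only works cleanly when there is no data-dependent branching between lanes, which is often the case for numerical kernels.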


👤 hawski
The usual recommendations have been given. Now for a more touristic approach: what I would like to use myself, given the excuse and time. All of these options are mostly written in C:

- QBE [1] - small compiler backend with nice IL

- DynASM [2] - IIUC LuaJIT's backend; it can be and is used by other languages

- uBPF [3] - a userspace eBPF VM. Depending on your DSL, the eBPF toolchain could fit your use case, but this would probably be the biggest excursion. There is a basic assembler written in Python.

[1] https://c9x.me/compile/

[2] https://luajit.org/dynasm.html

[3] https://github.com/iovisor/ubpf


👤 kasperni
Might not be exactly what you are looking for but take a look at GraalVM's Truffle [1].

[1] https://www.graalvm.org/22.0/graalvm-as-a-platform/language-...


👤 PaulHoule
You could emit JVM bytecode from some language that isn't Java, but then you get the garbage collector and the rest of the runtime, which you may or may not want.

👤 stevekemp
LibJIT is a library that provides generic Just-In-Time compiler functionality independent of any particular bytecode, language, or runtime:

https://www.gnu.org/software/libjit/

I've used that in the past to speed up a toy interpreter, but of course it is in C, rather than Rust.

There is at least one binding for it in rust:

https://github.com/MonliH/jit-sys

Finally here's a good introduction with several approaches for JIT:

https://eli.thegreenplace.net/tag/code-generation


👤 brrrrrm
This might seem like a weird suggestion, but you could probably use WebAssembly. The spec is incredibly simple and portable. It'd allow you to use V8, cranelift, wasmer, wasm-micro-runtime and a whole host of other JITs/runtimes.
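To give a feel for how simple the binary format is, here is a Rust sketch that hand-assembles the canonical minimal wasm module: a single function `(i32, i32) -> i32` that adds its arguments. Any of the runtimes mentioned above could JIT-compile these 32 bytes:

```rust
// Hand-assembling a minimal WebAssembly module: one function
// (i32, i32) -> i32 that adds its arguments.
fn add_module() -> Vec<u8> {
    let mut m = Vec::new();
    m.extend(b"\0asm");      // magic number
    m.extend(&[1, 0, 0, 0]); // binary format version 1
    // Type section (id 1): one functype, (i32, i32) -> i32.
    m.extend(&[0x01, 0x07, 0x01, 0x60, 0x02, 0x7F, 0x7F, 0x01, 0x7F]);
    // Function section (id 3): one function, using type index 0.
    m.extend(&[0x03, 0x02, 0x01, 0x00]);
    // Code section (id 10): local.get 0; local.get 1; i32.add; end.
    m.extend(&[0x0A, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6A, 0x0B]);
    m
}

fn main() {
    let m = add_module();
    assert_eq!(m[0..4], *b"\0asm");
    println!("{} bytes", m.len()); // 32 bytes
}
```

In practice you would emit wasm from your DSL's compiler rather than by hand, but the point stands: the target format is small enough to generate directly, and the runtime handles codegen for every architecture it supports.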

👤 pizlonator
WebKit’s B3 compiler is easy to extract and not at all specific to any language.

I’ve heard good things about cranelift and I believe it’s sort of meant to fulfill the same role as B3. Might be worth checking out.

Most likely though, you should start by writing a template jit before you try to optimize. WebKit’s “assembler” and “jit” directories will show you how and you can probably extract most of the relevant code as it’s not WebKit specific. In particular the cross platform machine code gen.

Lastly I would advise against trying to reuse a C compiler backend like llvm unless your language is very close to C.


👤 isaacimagine
Maybe take a look at MiniVM[0]? It was on HN a couple months ago[1], and has gotten a lot better since then (e.g. JIT).

[0]: https://github.com/fastvm/minivm

[1]: https://news.ycombinator.com/item?id=29850562


👤 Rochus
If your DSL is statically typed then I recommend that you have a look at the Mono CLR; it's compatible with the ECMA-335 standard and the IR (CIL) is well documented, even with secondary literature.

If your DSL is dynamically typed, I recommend LuaJIT; the bytecode is lean and documented (though not as well as CIL). LuaJIT also works well with statically typed languages, but Mono is faster in that case. Even though it was originally built for Lua, any compiler can generate LuaJIT bytecode.

Both approaches are lean (Mono about 8 MB, LuaJIT about 1 MB, much leaner and less complex than e.g. LLVM), general purpose, available on many platforms (especially the ones you're mentioning) and work well (see e.g. https://github.com/rochus-keller/Oberon/ and https://github.com/rochus-keller/Som/).


👤 mamcx
You say the interpreter is slow; maybe consider speeding it up:

https://blog.cloudflare.com/building-fast-interpreters-in-ru...

https://ndmitchell.com/downloads/slides-cheaply_writing_a_fa...

P.S.: A simple trick I apply in mine is to inline the looping for equivalents of folds/maps/filters like `[1, 2, 3] + 1`. You can do that calculation directly inside Rust, and even eliminate all interpretation if you allow for specialization on the AST, i.e.: Ast.Map(Fn()->Ast, Vec, i32)
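The specialization trick described above might look like the following Rust sketch (the node names are illustrative, not from any real codebase): a whole-vector operation like `[1, 2, 3] + 1` becomes a single specialized AST node whose evaluation is one native loop, with no per-element interpreter dispatch:

```rust
// Sketch of specializing a whole-vector op into one AST node:
// `[1, 2, 3] + 1` evaluates with a single native Rust loop
// instead of interpreting an addition per element.
enum Ast {
    IntVec(Vec<i64>),
    // Specialized node: add a scalar across a whole vector in one step.
    AddScalar(Box<Ast>, i64),
}

fn eval(ast: &Ast) -> Vec<i64> {
    match ast {
        Ast::IntVec(v) => v.clone(),
        // The loop below runs natively; the interpreter dispatches once.
        Ast::AddScalar(inner, k) => {
            eval(inner).into_iter().map(|x| x + k).collect()
        }
    }
}

fn main() {
    let ast = Ast::AddScalar(Box::new(Ast::IntVec(vec![1, 2, 3])), 1);
    println!("{:?}", eval(&ast)); // [2, 3, 4]
}
```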


👤 tekknolagi
Not mentioned yet: QBE (https://c9x.me/compile/)

Although I might recommend interpreter optimizations before you go straight to machine code. While writing a just-in-time compiler for your DSL will remove interpretation overhead in software and in hardware, you will probably want to have more type information so that you can generate better code. Check out my PL resources page, which has multiple sections on runtime optimization: https://bernsteinbear.com/pl-resources/

Happy to chat, if you like.
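One interpreter optimization in the spirit of the advice above (my wording, not tekknolagi's) is "closure compilation": walk the AST once and build a tree of nested closures, so repeated calls pay no tree-walking or match-dispatch cost. The names below are illustrative:

```rust
// "Closure compilation": compile the AST into nested closures once,
// so evaluating it many times skips tree walking and opcode dispatch.
// A middle ground between a tree-walking interpreter and a real JIT.
enum Expr {
    Const(f64),
    Var, // a single input variable, for brevity
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

type Compiled = Box<dyn Fn(f64) -> f64>;

fn compile(e: &Expr) -> Compiled {
    match e {
        Expr::Const(k) => {
            let k = *k;
            Box::new(move |_| k)
        }
        Expr::Var => Box::new(|x| x),
        Expr::Add(a, b) => {
            let (a, b) = (compile(a), compile(b));
            Box::new(move |x| a(x) + b(x))
        }
        Expr::Mul(a, b) => {
            let (a, b) = (compile(a), compile(b));
            Box::new(move |x| a(x) * b(x))
        }
    }
}

fn main() {
    // x * x + 2
    let e = Expr::Add(
        Box::new(Expr::Mul(Box::new(Expr::Var), Box::new(Expr::Var))),
        Box::new(Expr::Const(2.0)),
    );
    let f = compile(&e); // compile once...
    println!("{}", f(3.0)); // ...then call many times; prints 11
}
```

This keeps the implementation in safe Rust and often buys a worthwhile speedup before reaching for actual machine-code generation.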


👤 pie_flavor
An object-oriented model may not be what you are going for, but the .NET runtime is general-purpose in that sense: you can compile a wide variety of languages to CIL, and it's got loads of features (unlike, say, the JVM, which contains no opcodes Java doesn't use). And then you can benefit from the huge standard library as well.


👤 evacchi
You may want to look at cranelift

👤 eatonphil
Is there a way to write v8 bytecode directly and to execute v8 directly on it?

👤 SemanticStrengh
The state of the art is GraalVM/Sulong, and it enables polyglotism. See e.g. https://github.com/graalvm/simplelanguage

👤 xfer
I would suggest generating WebAssembly instead and using wasmtime.