HACKER Q&A
📣 mjbale116

How are you using LLMs for traversing decompiler output?


I need to reverse engineer a binary built years ago, and I have zero experience with cpp, so I think it would be a good experiment to get an LLM to help me in any way it can.


  👤 carom Accepted Answer ✓
Binary Ninja has an AI integration called Sidekick. It has a free trial, but I'm not sure it can be used in the free web version. [1]

In my experience, off-the-shelf LLMs (e.g. ChatGPT) do a pretty poor job with assembly; they cannot reason well about the stack or stack frames.

I think your job will be the same with or without AI: figuring out the data structures and data types a function operates on, and naming variables.
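To make "figuring out the data structures" concrete: Ghidra-style pseudocode often shows struct field accesses as raw pointer arithmetic, e.g. *(int *)(param_1 + 8). Below is a minimal sketch, assuming that exact textual pattern and a single base pointer, that collects the offsets and access types and prints a padded struct skeleton you can then name by hand (or with an LLM). The helper name struct_skeleton, the type-size table and the field_0x.. naming are illustrative assumptions, not any tool's API; nested structs, arrays and pointer-to-pointer casts are ignored.

```python
import re

# Matches Ghidra-style accesses such as *(int *)(param_1 + 0x10).
ACCESS = re.compile(r"\*\((\w+) \*\)\((\w+) \+ (0x[0-9a-fA-F]+|\d+)\)")

# Assumed sizes for an LP64 (x86-64 Linux) target; unknown types default to 8.
SIZES = {
    "char": 1, "byte": 1, "undefined1": 1,
    "short": 2, "ushort": 2, "undefined2": 2,
    "int": 4, "uint": 4, "float": 4, "undefined4": 4,
    "long": 8, "ulong": 8, "longlong": 8, "double": 8, "undefined8": 8,
}

def struct_skeleton(pseudocode: str, base: str = "param_1") -> str:
    """Build a C struct skeleton from field accesses made through `base`."""
    fields = {}
    for ctype, ptr, off in ACCESS.findall(pseudocode):
        if ptr == base:
            fields[int(off, 0)] = ctype  # a later access at the same offset wins
    lines = ["struct unknown {"]
    cursor = 0
    for off in sorted(fields):
        ctype = fields[off]
        if off > cursor:  # fill the gap before this field with padding bytes
            lines.append(f"    char pad_0x{cursor:x}[{off - cursor}];")
        lines.append(f"    {ctype} field_0x{off:x};")
        cursor = max(cursor, off + SIZES.get(ctype, 8))
    lines.append("};")
    return "\n".join(lines)

sample = """
iVar1 = *(int *)(param_1 + 8);
*(undefined8 *)(param_1 + 0x10) = 0;
cVar2 = *(char *)(param_1 + 0x18);
"""
print(struct_skeleton(sample))
```

For the sample above it emits 8 bytes of padding before field_0x8 and 4 bytes before field_0x10; the real work of deciding what those fields mean is still yours.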

What are you reverse engineering for? For example, getting a fully compilable decompilation has different goals than finding vulnerabilities or patching a bug.

1. https://sidekick.binary.ninja/


👤 JosephRedfern
These guys are building foundational models for this purpose: https://reveng.ai/. The results are quite compelling, and they have plugins for your favourite reverse engineering tools.

👤 netsec_burn
I made a site to use LLMs to help me with reverse engineering. The output is surprisingly readable, even with C++ classes. Let me know any feedback you might have: https://decompiler.zeroday.engineering/

👤 __alexander
Do you have experience reverse engineering? If not, LLMs are not going to help much. LLMs are useful for aiding the analysis but they don’t do the analysis.

👤 lumb63
It has nothing to do with LLMs, but Ghidra is a wonderful tool.

👤 Dwedit
Have you tried Ghidra yet? If you still have your debug symbols, then it can do a really good job.

👤 flashgordon
Interesting. Wouldn't this actually be a deterministic problem based on graph analysis? I'd have thought LLMs would be more effective taking the output of some graph recognizer and then identifying what those higher-level constructs map to.
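The deterministic half of that idea is standard compiler theory: in a control-flow graph (CFG), an edge whose target dominates its source is a back edge, and each back edge marks a natural loop that a decompiler can render as while/for. A minimal sketch, assuming a tiny CFG given as a dict of hypothetical basic-block names, computes dominators with the classic iterative data-flow algorithm; real decompilers layer a lot more structuring on top of this.

```python
def dominators(cfg, entry):
    """Iterative data-flow: dom(n) = {n} | intersection of dom(p) over preds p."""
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:  # shrink sets until nothing changes (fixed point)
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def find_back_edges(cfg, entry):
    """Return sorted (src, header) pairs where header dominates src."""
    dom = dominators(cfg, entry)
    return sorted((src, dst) for src, succs in cfg.items()
                  for dst in succs if dst in dom[src])

# A while-loop shape: entry -> cond -> body -> cond, cond -> exit.
cfg = {"entry": ["cond"], "cond": ["body", "exit"], "body": ["cond"], "exit": []}
for src, header in find_back_edges(cfg, "entry"):
    print(f"back edge {src} -> {header}: loop header is {header}")
```

An LLM could then be handed the recognized constructs (loop headers, bodies) rather than raw assembly, which is roughly the split suggested above.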

👤 rgovostes
The LLM4Decompile project (https://github.com/albertan017/LLM4Decompile) provides some open models for binary to C decompilation and Ghidra pseudocode refinement, along with some training sets.

RevEng.ai, linked a few times already, discusses their approach here: https://blog.reveng.ai/training-an-llm-to-decompile-assembly...


👤 mahaloz
I like using it for library function comments, variable name recovery, and sometimes types. The comments are usually hit or miss, but I find the variable names to be a bit better than auto-generated ones. I implement most of this in my decompiler plugin: https://github.com/mahaloz/DAILA; check it out if you are interested :).
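One low-risk pattern for the variable-naming use case: ask the model only for an old-name to new-name mapping (e.g. as JSON) and apply it mechanically, so the LLM never rewrites the code itself. A minimal sketch, assuming Ghidra-style auto names such as param_1 and iVar1 and a hypothetical JSON reply format; this is not how DAILA works internally, just an illustration of the idea.

```python
import json
import re

IDENT = re.compile(r"[A-Za-z_]\w*\Z")  # a valid C identifier, whole string

def apply_renames(pseudocode: str, llm_reply: str) -> str:
    """Apply an {old_name: new_name} JSON mapping using whole-word matches."""
    mapping = {old: new for old, new in json.loads(llm_reply).items()
               if IDENT.match(old) and IDENT.match(new)}  # drop bogus names
    if not mapping:
        return pseudocode
    # \b boundaries keep param_1 from matching inside param_10; a single
    # regex pass means a new name is never renamed a second time.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], pseudocode)

code = "int FUN_00101169(int param_1)\n{\n  int iVar1;\n  iVar1 = param_1 * 2;\n  return iVar1;\n}"
print(apply_renames(code, '{"param_1": "count", "iVar1": "doubled"}'))
```

It does not check whether a suggested name collides with an existing identifier; a real plugin should.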

👤 stackghost
The Advent of Cyber side quest this year needed some Ghidra and I found Pickman's Model was pretty good at helping me craft a heap exploit from a decompilation.

👤 jkstill
I've only played with this, but it was impressive.

https://ghidra-sre.org/


👤 userbinator
Unfortunately LLMs are not good at precision and details, which is exactly what you need for the sort of analysis you're trying to do.

👤 apatheticonion
Inspired by the work out there that reverse engineers game engines, I've always wanted to try my hand at reverse engineering to contribute to the world of game preservation.

Is it actually legal to decompile a game engine from executables/dll files, write new sources by making sense of the output and rewriting it such that it can be compiled targeting modern APIs?

I feel like that must be illegal


👤 feznyng
You could use the LLM to help you write utility scripts for whatever disassembler you’re using e.g. python for IDA. That might work better than feeding it raw assembly.

Game RE communities also have all sorts of neat utilities for decompiling large cpp binaries. Skyrim’s community is pretty active with Ghidra/IDA.

Guessing you’re not lucky enough to have a PDB?



👤 sitkack
Do you know the compiler and what the source possibly looks like? I found LLMs are pretty good at recovering code from binaries, they need help though.

If you are able to run the program and collect traces, that will help a ton.


👤 svilen_dobrev
cpp? that's a preprocessor. u mean c++?

LLM won't help you much if u can't understand what it's talking about.

The manual way, given an ELF (Linux executable format) file somexe:

$ strings somexe

$ objdump -d somexe

$ objdump -s -j .rodata somexe

then look+ponder over the results.

and/or run Ghidra (through its GUI) over it, which may help somewhat but not 100%.
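For anyone wondering what strings does under the hood, here is a rough Python equivalent: scan the raw bytes for runs of printable ASCII at least four bytes long (the GNU default minimum). It is a sketch under simplifying assumptions: it ignores wide (UTF-16) strings and section boundaries, and prints hex offsets similar to strings -t x.

```python
import re
import sys

def ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for runs of printable ASCII (plus tab) >= min_len."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for m in pattern.finditer(data):
        yield m.start(), m.group().decode("ascii")

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:  # e.g. python strings.py somexe
        for off, text in ascii_strings(f.read()):
            print(f"{off:8x} {text}")
```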

Keep in mind that objdump and Ghidra show operands in opposite order for the same code: objdump defaults to AT&T syntax (mov source,dest), while Ghidra's listing uses Intel-style order (mov dest,source). Running objdump -d -M intel switches objdump to Intel syntax.
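To make the operand-order difference concrete, here is a toy converter for single AT&T lines as objdump prints them. It is a minimal sketch, assuming simple register, immediate and disp(base,index,scale) memory operands: it strips the % and $ sigils, rewrites memory operands into bracket form and reverses the operand order. It does not drop size suffixes (movl stays movl) or add Intel's DWORD PTR annotations, so for real work just use objdump -M intel.

```python
import re

# disp(base,index,scale), e.g. -0x14(%rbp) or 0x8(%rbp,%rax,4)
MEM = re.compile(r"(-?(?:0x[0-9a-fA-F]+|\d+))?\((%\w+)?(?:,(%\w+)(?:,(\d))?)?\)")

def split_operands(ops: str):
    """Split on commas that are not inside parentheses."""
    parts, depth, cur = [], 0, ""
    for ch in ops:
        depth += (ch == "(") - (ch == ")")
        if ch == "," and depth == 0:
            parts.append(cur.strip())
            cur = ""
        else:
            cur += ch
    if cur.strip():
        parts.append(cur.strip())
    return parts

def operand_to_intel(op: str) -> str:
    m = MEM.fullmatch(op)
    if not m:
        return op.lstrip("%$")  # register or immediate: drop the sigil
    disp, base, index, scale = m.groups()
    terms = [base.lstrip("%")] if base else []
    if index:
        terms.append(index.lstrip("%") + ("*" + scale if scale else ""))
    expr = "+".join(terms)
    if disp:
        expr += ("+" if expr and not disp.startswith("-") else "") + disp
    return "[" + expr + "]"

def att_to_intel(line: str) -> str:
    mnemonic, _, ops = line.strip().partition(" ")
    operands = [operand_to_intel(o) for o in split_operands(ops.strip())]
    return mnemonic + (" " + ", ".join(reversed(operands)) if operands else "")

print(att_to_intel("mov    -0x14(%rbp),%eax"))
```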

no idea on (recent) windoze front. IDA ?


👤 u53rn4m3
RevEng.AI have their own foundational AI models for decompilation with English language summaries.

👤 seba_dos1
Good luck. If that's how you're approaching it, you're going to need it.

👤 ianhawes
Highly recommend it. I reversed an app with o1 Pro Mode and the analysis of the obfuscated C# code matched up accurately with what I eventually discovered by manually reversing.