https://mitp-content-server.mit.edu/books/content/sectbyfn/b...
Video lectures for the course are available as well:
https://ocw.mit.edu/courses/6-001-structure-and-interpretati...
MIT keeps a lot of course materials available online. For example, I like to go through syllabi for text ideas.
https://people.csail.mit.edu/feser/pld-s23/
https://ocw.mit.edu/courses/6-035-computer-language-engineer...
Let's Build a Compiler
https://compilers.iecc.com/crenshaw/
Introduction to Compilers and Language Design
https://www3.nd.edu/~dthain/compilerbook/
Dragon book is where I started, but it's getting long in the tooth. Still a classic
https://www.amazon.com/Compilers-Principles-Techniques-Tools...
Most resources are going to be focused almost entirely on parsing, which is almost all you need if you're just going to create a toy language that doesn't aspire to be performant and have no interest in optimization, code generation, linking, etc. The difficulty of parsing depends almost entirely on the syntax of your language though.
A pre- or post-fix language is far simpler to parse than an infix language, to the point of being trivial and invalidating 90%+ of most books out there, so if you want to focus on low-level stuff and don't care about the syntax as much, choose one of those. If you are interested in lower-level stuff but want a more conventional syntax, use a parser generator (lex/yacc, ANTLR, tree sitter, etc).
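To make that concrete, here's a hypothetical Python sketch: a complete parser for prefix notation fits in a few lines, because each operator simply consumes the next two subexpressions and operator precedence never comes up.

```python
# Minimal prefix-notation parser (sketch). Parsing "+ 1 * 2 3"
# needs no precedence or associativity handling at all.

def parse_prefix(tokens):
    """Recursively parse one expression from an iterator of tokens."""
    tok = next(tokens)
    if tok in ("+", "-", "*", "/"):
        # An operator is followed by exactly two subexpressions.
        return (tok, parse_prefix(tokens), parse_prefix(tokens))
    return int(tok)

tree = parse_prefix(iter("+ 1 * 2 3".split()))
# tree == ("+", 1, ("*", 2, 3))
```

An infix parser for the same language would already need precedence climbing or a grammar with stratified rules, which is where most textbook parsing chapters spend their pages.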
Compilers and interpreters aren't much different in terms of difficulty or complexity, especially if you're concerned about optimizing performance, but an interpreter is much quicker to reach a minimum level of functionality, which can be very good for keeping your interest up in the early portion of a project. The difference in complexity between a "transpiler"-style simple compiler and a simple interpreter is minimal, though.
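To illustrate how close the two are, here's a hypothetical Python sketch: a tree-walking interpreter and a "transpiler"-style compiler walk the same AST, one evaluating it directly and one emitting equivalent source text.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def interpret(node):
    """Tree-walking interpreter: evaluate the AST directly."""
    if isinstance(node, tuple):
        op, lhs, rhs = node
        return OPS[op](interpret(lhs), interpret(rhs))
    return node

def transpile(node):
    """'Transpiler'-style compiler: emit equivalent source text instead."""
    if isinstance(node, tuple):
        op, lhs, rhs = node
        return f"({transpile(lhs)} {op} {transpile(rhs)})"
    return str(node)

ast = ("+", 1, ("*", 2, 3))
interpret(ast)   # -> 7
transpile(ast)   # -> "(1 + (2 * 3))"
```

The structure of the two functions is identical; only the "back end" of each case differs, which is why moving from one to the other later is not a big leap.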
For target, choose what you know and are interested in. If you want to target hardware and do code generation or optimization, choose a RISC or less complex ISA at first unless you're already familiar with x86 - the principles are the same but the details are much less finicky.
Some things that are much smaller than a language (easy études — KISS):
- a desk calculator
- a template system
- a source-to-source transpiler for a single feature
- a simple cpu emulator
- a metacircular evaluator
- an esolang
- a build system
- a regex engine
EDIT: * according to c-t.d; please use this list to aid in writing a language, not a language-learning post.
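Of these, the desk calculator is a nice first étude because it exercises the whole pipeline in miniature: lexing, recursive-descent parsing with precedence, and evaluation. A hypothetical Python sketch:

```python
import re

def calc(src):
    """Tiny desk calculator: integers, + - * /, parentheses, precedence."""
    tokens = re.findall(r"\d+|[-+*/()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def atom():                      # number or parenthesized expression
        if peek() == "(":
            take()
            val = expr()
            take()                   # consume the closing ")"
            return val
        return int(take())

    def term():                      # handles * and / (binds tighter)
        val = atom()
        while peek() in ("*", "/"):
            val = val * atom() if take() == "*" else val / atom()
        return val

    def expr():                      # handles + and -
        val = term()
        while peek() in ("+", "-"):
            val = val + term() if take() == "+" else val - term()
        return val

    return expr()

calc("2 + 3 * (4 - 1)")  # -> 11
```

Wrap `calc` in a read-loop and you have a usable desk calculator; swap the arithmetic for AST construction and you have the front half of a language.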
sadly, a lot of language-related libraries are in C/C++, so if using those sounds like a pain to you, just ignore them for now.
get something simple working soon, such as an interpreter for Forth [0] written in any language you already know.
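for illustration, here's what a hypothetical first cut at a Forth-style interpreter can look like in Python. because the syntax is just whitespace-separated words, there's essentially no parsing to do:

```python
def forth(src, stack=None):
    """Minimal Forth-style interpreter (sketch): numbers push
    themselves, all other words operate on the stack."""
    stack = stack if stack is not None else []
    words = {
        "+": lambda s: s.append(s.pop() + s.pop()),
        "*": lambda s: s.append(s.pop() * s.pop()),
        "dup": lambda s: s.append(s[-1]),
        "drop": lambda s: s.pop(),
    }
    for word in src.split():
        if word in words:
            words[word](stack)
        else:
            stack.append(int(word))
    return stack

forth("2 3 + 4 *")  # -> [20]
forth("5 dup *")    # -> [25]
```

from here you can add user-defined words, conditionals, and loops one at a time, which is exactly the kind of incremental progress that keeps a project alive.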
do the parsing later, there is so much more to learn about compilers and language runtimes!
when it's finally time for parsing, i recommend parser combinators [1]. they are pretty easy to implement yourself once you have understood the concept, and are very flexible.
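a hypothetical minimal sketch in Python, just to show the idea: a parser is a function from input to (value, rest) or None on failure, and combinators build bigger parsers out of smaller ones.

```python
# Minimal parser combinators (sketch). A parser is a function:
#   input string -> (parsed value, remaining input), or None on failure.

def char(c):
    """Parser matching a single literal character."""
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def many1(p):
    """One or more repetitions of parser p."""
    def parse(s):
        out = []
        while (r := p(s)) is not None:
            out.append(r[0])
            s = r[1]
        return (out, s) if out else None
    return parse

def choice(*ps):
    """First parser that succeeds wins."""
    def parse(s):
        for p in ps:
            if (r := p(s)) is not None:
                return r
        return None
    return parse

digit = choice(*[char(d) for d in "0123456789"])
number = many1(digit)

number("42abc")  # -> (['4', '2'], 'abc')
number("abc")    # -> None
```

with a `sequence` and a `map` combinator added, the code reads almost one-to-one like the grammar it implements.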
do not forget about proper error messages with line/col information.
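one cheap way to do this (hypothetical sketch in Python): attach (line, col) to every token during lexing, so every later phase can report "unexpected foo at 2:5" instead of just "syntax error".

```python
import re

def tokenize(src):
    """Attach (line, col) to each whitespace-separated token."""
    tokens = []
    for lineno, text in enumerate(src.splitlines(), start=1):
        for m in re.finditer(r"\S+", text):
            # Columns are 1-based, like most editors display them.
            tokens.append((m.group(), lineno, m.start() + 1))
    return tokens

tokenize("let x\n  = 1")
# -> [('let', 1, 1), ('x', 1, 5), ('=', 2, 3), ('1', 2, 5)]
```

the positions cost almost nothing to carry along, but are painful to retrofit once the AST types are fixed, so add them from day one.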
any kind of performance optimisation is strictly forbidden until you know what you are doing.
[0] https://en.m.wikipedia.org/wiki/Forth_(programming_language)
[1] https://www.theorangeduck.com/page/you-could-have-invented-p...
2. Write a parser.
3. Write an interpreter. Do not bother with a compiler yet. Writing an interpreter is a challenge and the final proof of your MVP.
4. Refactor and optimize your prior logic.
5. Write documentation. Write with extreme empathy because if the level of effort remains too high nobody will look at this. Identify current issues and shortcomings. Identify next steps.
6. Socialize the work product.
7. Now, write a compiler.
---
I have thought about writing a ridiculously scaled-down minimal JavaScript-like language with TypeScript-like type annotations and named procedures.
parser combinators also look interesting, like another commenter said. i haven't fully understood them yet, but have just started looking into them. it seems like describing the grammar in code using parser combinators can closely parallel the EBNF grammar for one's language.
i also saw this one recently: