https://sysprog21.github.io/lkmpg/
https://blog.sourcerer.io/writing-a-simple-linux-kernel-modu...
If you want to "go low" in the way hardware works, you could try and write an interrupt handler on an embedded device.
If you want to "go low" in how optimizations work in application development, you could try and implement microbenchmarks and look at flamegraphs.
I've recently enjoyed Game Engine Architecture, mostly because of its interesting mix of low-level techniques used to solve problems a normal application wouldn't be required to fix.
Game development in general is a case study in tuning your sense of when to use high-level programming techniques and when to drop into low-level optimization to solve local problems.
Some simple things you can do:
* Get yourself a suitable embedded development system - I would recommend anything ESP32-ish that suits your fancy, such as a LilyGO or Watchy ESP32-based watch, or a PineTime if that's more up your alley - and then write some little apps for it.
* Get to know Godbolt with a great deal of intimacy, just as a general approach to understanding what is going on.
* Invest a little workbench time in some of the various embedded frameworks out there - PlatformIO, FreeRTOS, etc. - and, very important: learn the tooling and methodology techniques that these frameworks manifest.
* Invest some workbench time in the RETRO Computing Scene. Seriously, you can still learn extremely valuable principles of tooling and methodology from an 8-bit retro system from the 80's. Get your favourite platform, get all its tools onboard, engage with its community - you will learn a lot of things that are still entirely relevant, in spite of the changes over the decades.
* Get into the F/OSS tooling/methodology flow - find software projects that are interesting to you, find their repositories, learn to clone and build and test locally, and so on. There are so many fantastic projects out there for which low-level skills can be developed and fostered. Get onboard with something that interests you.
Good luck!
The GPU race is getting really hot, and there is a lot of work being done to squeeze out every ounce of performance, especially for LLM training and inference.
One resource I would recommend is “Programming massively parallel processors” [1]
I am also learning it as a hobby project and uploading my notes here [2].
[1] https://shop.elsevier.com/books/programming-massively-parall...
[1] https://www.nand2tetris.org/
* Lower levels include transistor logic, analog electronics, electromagnetism, chemistry and the equilibrium equation (how transistors work), and quantum mechanics (how atoms and chemistry work).
1. The best source of low-level information on things like operating systems (writing your own) etc is https://wiki.osdev.org/Expanded_Main_Page
2. Compiler related low-level should include a read through Crafting Interpreters (https://craftinginterpreters.com/), even if all you're going to do is create compiled languages.
3. Hardware type low-level (where you build your own logic) is a long and ultimately non-rewarding path. I would suggest https://eater.net/8bit/
All those links are only starting points; use them to find a direction in which to head.
[EDIT: I also recommend writing an eBPF module for Linux - easier than writing a kernel module, with just as many low-level hooks as you might need].
Also, learn Rust.
This comes to mind: https://www.morling.dev/blog/one-billion-row-challenge/
Read how others have done it. Here's an example in Java that goes relatively low-level to squeeze out performance: https://questdb.io/blog/billion-row-challenge-step-by-step/
https://wiki.osdev.org/Main_Page
It will give you a much more holistic view of computers-as-hardware and their low-level intricacies, which are in my opinion more useful and more foundational than just being good at optimising a hot loop.
I did it in my teenage years, and it's my first true and only love. Now almost 20 years later I'm back at it, this time with all the accumulated experience in software engineering. There is nothing quite like it. Any basic, trite design (i.e. the usual POSIX clone) will teach you a great deal about the entire stack.
I learned assembly so I could disassemble and understand programs.
I learned C so I could use all the libraries and frameworks that people had made, and later C++, Objective-C, C#, Java, Python and other derivatives.
I wanted to manipulate images, speech and video, and using a high-level programming language was too inefficient, so I continued using C.
I learned FPGAs, again because I needed efficiency: the things I wanted to do, like controlling robots, did not work at all otherwise (they moved so sluggishly).
I love learning things, but that alone was never enough for me to learn something deeply; running into real problems was.
He discusses exactly what you're describing (L1/2/3 cache hit rates, their performance implications, how compiler optimizations can fool us into thinking we have a good hit rate, etc).
Also take a look into Intel VTune and Processor Tracing to understand how performance counters like Instructions per Cycle are calculated.
Here's a doc you can dig around with https://source.android.com/docs/core
You get a high level overview, then it explains how everything connects right down to the hardware. It's open source too, so you can go in there and poke around.
If you want something that can make money, I'd say look at camera and Bluetooth, because these are the things that need the most customization. Neural network API could see a lot of use in the future too.
But there's plenty of fascinating stuff, like how it renders fonts, how it handles hearing aids, and so on.
Edit: TIL Android has a category of 'rich haptics', where it gives tiny haptic feedback when you swipe your finger across a surface or to the beat of music. Very few app devs know this, so it's not well integrated into apps.
https://wiki.osdev.org/ is a good source for getting a foothold in OS development.
Not only will it show you how C/C++/Rust, etc... language statements map to CPU instructions, but it can also show you how CPUs execute those instructions! There are advanced views that show the various pipeline stages, execution ports, etc...
E.g.: https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename...
The right-most tab should show you the CPU execution pipeline.
And the hardest things are in the peripherals.
Any modern microcontroller will give you the opportunity to learn a lot of useful things about peripheral devices. You can start with any really good modern 8-bit micro, like Microchip's 2nd generation of ATtiny, so you'll have a lot of very powerful, interesting smart peripherals in your hands: a hardware event system, small programmable logic, different timers, a good ADC, etc.
The only rational additional consideration here is that your target platform should be popular, well documented and supported by the manufacturer.
Then it will be time for some Cortex-M0 device with DMA.
Then you'll decide where to go further :)
Learn how it works, try adding a new instruction or implementing an extension.
Then write your own OS.
https://bootstrappable.org/ https://github.com/fosslinux/live-bootstrap/ https://bootstrapping.miraheze.org/wiki/Stage0
Start with this - https://bottomupcs.com/
Then do this - https://www.youtube.com/playlist?list=PLhy9gU5W1fvUND_5mdpbN...
and finally this https://diveintosystems.org/book/introduction.html
Learn everything there is to learn about the Tillitis TKey. It's the most open (both software and hardware) USB security token there is. It is FPGA-based and contains a tiny RISC-V core.
Full disclosure: I'm involved in the project.
Learn how to use a profiler like Linux's perf, VTune, or Apple's Instruments - which means learning to interpret their results to optimise your code.