I'm looking for hardware with as many modern features as possible (multiple cores, USB, PCI, etc.), with datasheets and open specs for every component on the board, and that a single person can reasonably grok.
That cuts out all modern x86 and ARM systems (you need a lifetime just to program a single GPU driver), and RISC-V boards still depend on binary blobs and underdocumented auxiliary chips.
Without going as far down as a Z80 microcomputer, what is the most powerful and programmable computer I can plug a keyboard and monitor into, and start hacking?
I've written from bare metal: keys, display, networking, with hard realtime stuff (response times in millionths of a second). It's not actually that bad.
If this is for enrichment, you may as well compromise and let system firmware set up a framebuffer for you. Then you won't need to have much in the way of a driver. Personally, I'm pretty happy with VGA text mode, but that has significant limitations (only 256 characters, support is disappearing rapidly and is inconsistent already). Serial console works pretty well too.
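To make the "only 256 characters" limit concrete, here is a rough sketch of how a VGA text mode cell is laid out, modeled in Python rather than on real hardware. The standard color text buffer sits at physical address 0xB8000 and is 80x25 cells of 16 bits each; the `screen` bytearray and the `cell`/`put` helpers below are just my stand-ins for illustration, not anything from a real firmware or library.

```python
import struct

VGA_TEXT_BASE = 0xB8000   # physical address of the color text buffer (not used here)
COLS, ROWS = 80, 25       # standard 80x25 text mode

def cell(ch, fg=0x7, bg=0x0):
    """Pack one text cell: low byte = character code, high byte = attribute."""
    code = ord(ch)
    if code > 0xFF:
        # One byte per character: this is where the 256-glyph limit comes from.
        raise ValueError("text mode can only index 256 glyphs")
    attribute = ((bg & 0xF) << 4) | (fg & 0xF)
    return code | (attribute << 8)

def put(buf, row, col, ch, fg=0x7, bg=0x0):
    """Write a cell into a buffer laid out like the text-mode framebuffer."""
    offset = 2 * (row * COLS + col)
    struct.pack_into("<H", buf, offset, cell(ch, fg, bg))

# A plain bytearray stands in for the memory at VGA_TEXT_BASE.
screen = bytearray(COLS * ROWS * 2)
put(screen, 0, 0, "A", fg=0xF, bg=0x1)   # bright white on blue
first_cell = struct.unpack_from("<H", screen, 0)[0]
```

On real hardware the same two bytes would be stored at `VGA_TEXT_BASE + offset`; the character byte indexes the adapter's font, so anything outside ASCII depends on the loaded code page.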
If you're using prebuilt processors rather than fabbing your own, you can compartmentalize around using someone else's firmware to initialize the hardware. One day, maybe, replace coreboot/u-boot, but you don't need to do that to start. Plenty of valuable knowledge of low level stuff to gather without starting by doing everything.
IMHO, x86 is a good place to start if you want a lot of modernity. There's actually tons of official documentation on the processors and the basic peripherals (many of which are integrated into modern processors), and there's also tons of contributed documentation, tutorials, and examples out there. Yes, there's a trail of destruction in the way of legacy bits and bobs, but you can draw a line in the sand and say you only support processors with local APICs, a stable TSC, etc., and skip a lot of the legacy, or at least do the bare minimum of legacy needed to enable the modern versions.
osdev.org is a good resource to help you get started on that adventure.
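For what it's worth, the "line in the sand" can be a handful of CPUID bit tests at boot. The sketch below only decodes register values you would get from executing CPUID yourself (leaf 1 and leaf 0x80000007); the function names and the sample register values are made up for illustration, and a real kernel would do this in C or assembly.

```python
def decode_features(leaf1_ecx, leaf1_edx, leaf80000007_edx):
    """Decode a few CPUID feature bits from raw register values."""
    return {
        "tsc": bool(leaf1_edx & (1 << 4)),                   # leaf 1, EDX bit 4
        "local_apic": bool(leaf1_edx & (1 << 9)),            # leaf 1, EDX bit 9
        "x2apic": bool(leaf1_ecx & (1 << 21)),               # leaf 1, ECX bit 21
        "invariant_tsc": bool(leaf80000007_edx & (1 << 8)),  # leaf 0x80000007, EDX bit 8
    }

def meets_baseline(features):
    """Hypothetical baseline: refuse to boot without a local APIC and invariant TSC."""
    return features["local_apic"] and features["invariant_tsc"]

# Made-up register values, not read from any real CPU.
modern = decode_features(leaf1_ecx=1 << 21,
                         leaf1_edx=(1 << 9) | (1 << 4),
                         leaf80000007_edx=1 << 8)
legacy = decode_features(leaf1_ecx=0, leaf1_edx=1 << 4, leaf80000007_edx=0)
```

Anything that fails `meets_baseline` just gets a "not supported" message, which is what lets you skip the PIC/PIT-era code paths.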
It uses the open source CV32E40P RISC-V core (in-order 4-stage RISC-V RV32IMFCXpulp)
> It’s probably the most “open-source hardware” board we’ve covered so far, since not only the hardware design files and SDK are open-source, but also the MCU core used in the CORE-V MCU. [1]
[0] https://www.openhwgroup.org/core-v-devkits/
[1] https://www.cnx-software.com/2023/08/04/core-v-mcu-devkit-fe...
As you progress, so will the RISC-V environment and your choices, and the cost of SBCs will continue to come down. Buying the most powerful SBC now is expensive overkill. When working at the low level you won't be programming PCI and USB straightaway.
With your stated goals, your growing RISC-V expertise will become increasingly valuable.
https://www.raptorcs.com/content/TL2DS1/intro.html
Openness might be impacted by add-on hardware. For servers, more powerful OpenPOWER systems are available from IBM.
Ok... lemme take you down a rabbit hole.... imagine an FPGA, but with no routing hardware whatsoever... just a vast sea of LUT (look-up table) base cells, each with 4 bits of input and 4 bits of output, one to and from each of its neighbors. A latch on the inputs is clocked like a chessboard... all of the white, then all of the black... this slows down processing, but completely removes timing issues.
Properly programmed, a large fraction of all the LUTs could be doing computation on every clock cycle. Imagine a 1024x1024 grid of these.... you could throw inputs in one side of the array, and get outputs every clock cycle, as everything is pipelined.
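A toy model of the chessboard clocking, to show why it sidesteps timing issues: a cell's neighbors are always the other color, so updating all cells of one color in place never reads a value written in the same phase. This is only a sketch under my own assumptions (one bit to and from each of the N/E/S/W neighbors, a 16-entry LUT per cell, off-grid inputs supplied through a dict); the `LutGrid` class and its method names are invented for illustration.

```python
N, E, S, W = 0, 1, 2, 3   # bit positions within a 4-bit input or output word
# direction -> (row offset, col offset, which output bit of that neighbor faces us)
NEIGHBORS = {N: (-1, 0, S), E: (0, 1, W), S: (1, 0, N), W: (0, -1, E)}

class LutGrid:
    def __init__(self, rows, cols, lut):
        self.rows, self.cols = rows, cols
        # Every cell gets its own copy of a 16-entry table of 4-bit outputs.
        self.lut = [[list(lut) for _ in range(cols)] for _ in range(rows)]
        self.out = [[0] * cols for _ in range(rows)]

    def inputs(self, r, c, external):
        """Collect the 4 input bits for cell (r, c) from its neighbors' outputs."""
        word = 0
        for d, (dr, dc, facing) in NEIGHBORS.items():
            nr, nc = r + dr, c + dc
            if 0 <= nr < self.rows and 0 <= nc < self.cols:
                bit = (self.out[nr][nc] >> facing) & 1
            else:
                bit = external.get((r, c, d), 0)   # input from outside the array
            word |= bit << d
        return word

    def step(self, color, external=None):
        """Clock one phase: only cells with (row + col) % 2 == color update."""
        external = external or {}
        for r in range(self.rows):
            for c in range(self.cols):
                if (r + c) % 2 == color:
                    self.out[r][c] = self.lut[r][c][self.inputs(r, c, external)]

# Example program: every cell is a "wire" copying its west input to its east output.
wire = [((i >> W) & 1) << E for i in range(16)]
grid = LutGrid(1, 4, wire)
feed = {(0, 0, W): 1}          # drive a 1 into the west edge of cell (0, 0)
for phase in range(4):         # white, black, white, black
    grid.step(phase % 2, feed)
east_bit = (grid.out[0][3] >> E) & 1
```

In this model the bit advances one cell per phase, so it reaches the east edge of the 4-cell row after four phases; a fresh input can enter behind it on each white phase, which is the pipelining described above.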
Programming.... no coherent ideas about that... I have some, involving working backwards from the output.
Let's assume 1024x1024, 100 MHz clocking, and an FP16 unit taking 24x24 cells... you could have 42x42 computes on each and every clock cycle, about 175 billion FP16 ops/second... for something that takes up the same space as 64 megabytes of static RAM. Bear in mind, that's being very conservative, without optimizations.
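Spelling out that arithmetic, using only the numbers given above and assuming each FP16 unit completes one operation per clock once the pipeline is full (the variable names are mine):

```python
GRID_SIDE = 1024          # cells per side of the array
FP16_SIDE = 24            # cells per side of one FP16 unit
CLOCK_HZ = 100_000_000    # 100 MHz

units_per_side = GRID_SIDE // FP16_SIDE   # whole units that fit along one edge
units = units_per_side ** 2               # units in the full array
ops_per_second = units * CLOCK_HZ         # one op per unit per clock (assumed)

print(units_per_side, units, ops_per_second)
```

That comes to 42 units per side, 1764 units, and 176.4 billion ops/second, which is where the "about 175 billion" figure comes from.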
At present, I can simulate that 1024x1024 grid at about 37 Hz on my desktop.[1]
I know it’s a fairly popular resource here - has anyone extended the content to go all the way with an FPGA implementation?