- the MIT Operating Systems Engineering OCW course [^1]
- specifically, Lecture #2 [^2] which describes the bootloader for 'xv6', a reimplementation of Unix v6, which does a great job showing what a solid OS with all the basics looks like
[^1]: https://ocw.mit.edu/courses/6-828-operating-system-engineeri...
[^2]: https://ocw.mit.edu/courses/6-828-operating-system-engineeri...
- old x86 PCs boot in BIOS mode: https://opensource.com/article/17/2/linux-boot-and-startup
With BIOS, your PC initializes hardware using firmware that contains drivers for reading disk drives. BIOS loads the first sector (MBR) of the selected boot drive to a fixed memory location and jumps to it. Everything else is up to the bootloader (BIOS still provides routines for reading the disk, interacting with the keyboard/mouse, and text or (S)VGA video output). GRUB2 is saved to the disk in two stages: stage 1 is under 512 bytes and is only used to load stage 2, which is commonly saved in the empty space before the first partition. GRUB stage 2 then loads its own drivers for filesystems, searches partitions for kernels and ramdisks, and so on. Once you select (or manually type the path to) a kernel and init ram disk, GRUB loads them into memory and jumps to the kernel (the whole BIOS path starts in 16-bit real mode).
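To make the MBR layout concrete, here's a rough Python sketch of what that first sector contains and what the BIOS checks: the 0x55AA signature at offset 510, and four 16-byte partition entries starting at offset 446. The partition values below are made up:

```python
import struct

def parse_mbr(sector: bytes):
    """Parse a 512-byte MBR: 446 bytes of boot code, then four 16-byte
    partition entries, then the 0x55AA boot signature."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("no 0x55AA signature -- BIOS would not boot this")
    partitions = []
    for i in range(4):
        off = 446 + 16 * i
        boot_flag, ptype = sector[off], sector[off + 4]
        lba_start, n_sectors = struct.unpack_from("<II", sector, off + 8)
        if ptype != 0:  # type 0 means the slot is empty
            partitions.append((boot_flag == 0x80, ptype, lba_start, n_sectors))
    return partitions

# A synthetic MBR: one bootable Linux-type (0x83) partition at LBA 2048.
sector = bytearray(512)
sector[446] = 0x80                                   # active/bootable flag
sector[450] = 0x83                                   # partition type: Linux
struct.pack_into("<II", sector, 454, 2048, 204800)   # start LBA, sector count
sector[510:512] = b"\x55\xaa"                        # boot signature
print(parse_mbr(bytes(sector)))  # [(True, 131, 2048, 204800)]
```

This is only the partition-table view; the 446 bytes of stage-1 boot code that precede it are machine code, not something you'd normally parse.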
- new x86 PCs and some ARM use UEFI: https://hackaday.com/2021/11/30/whats-the-deal-with-uefi/
UEFI works similarly, but is cross-platform (can work on x86, x86_64, itanium, 32bit and 64bit ARM, possibly on RISC-V). UEFI doesn't look for boot drives but for EFI applications (bootloaders) on a special EFI System Partition (uses FAT32 filesystem per spec, but can be any filesystem your UEFI firmware has a driver for). The firmware then loads the configured UEFI application and runs it (it also provides services for input devices, disks, networking, display). GRUB does the same as above.
The only difference is that if you are using a distribution with a signed bootloader (using Microsoft's UEFI CA), it will probably also check whether the kernel and 3rd-party modules are signed with the distribution's or your (owner's) signing keys.
- other ARM devices and most MIPS and RISC-V boards commonly use U-Boot with a Device Tree (since without UEFI or ACPI, there has to be a way to inform the kernel of basic non-PnP hardware and how to communicate with it).
Your computer and all of its hardware are built on standards. For the most part, they are adhered to by hardware and BIOS manufacturers. GRUB et al are just abstracting those away because they're obtuse and not very ergonomic to work with from day to day.
But modern real hardware first boots a Management Engine (or BMC, or T2, or similar), which prepares things and starts subsystems like hardware controllers, before releasing the CPU where the "OS" finally boots, blissfully unaware that it is running on an abstraction.
Watch https://www.youtube.com/watch?v=36myc8wQhLo from about 10:00 to about 20:00 for a decent super high level overview.
These two are also good for a deeper understanding.
https://i.blackhat.com/USA-19/Thursday/us-19-Davidov-Inside-...
https://i.blackhat.com/USA-19/Thursday/us-19-Krstic-Behind-T...
They both cover UEFI, even though the focus is on T2.
I know that's well beyond what you were asking for, but the boot process goes well beyond GRUB.
https://wiki.archlinux.org/title/Arch_boot_process
P.S I use Arch btw. ;)
https://docs.freebsd.org/en/books/handbook/boot/
The text is not edited yet, but as I had the benefit of learning about it with fresh eyes, it should be very approachable (and hopefully accurate enough).
It explains the boot process from CPU initialization to the end of the Linux kernel initialization.
And since no one's mentioned it, let me say that TLDP is a comprehensive, understandable, and authoritative source on all things Linux.
That said, here are two documents from TLDP.
1. [Boot process, Init and shutdown] https://tldp.org/LDP/intro-linux/html/sect_04_02.html
2. [The boot process in closer look] https://tldp.org/LDP/sag/html/boot-process.html
https://en.wikipedia.org/wiki/GNU_GRUB
Incidentally I discovered you can learn a fair amount about the boot process by making bad edits to your /etc/fstab file on a Linux system :) e.g.
https://unix.stackexchange.com/questions/107828/how-is-etc-f...
Also, this central page on different bootloaders for a variety of systems is kind of interesting; there are many different approaches.
UNIX and Linux System Administration Handbook, 5th Edition, Chapter 2: Booting and System Management Daemons
"Code" by Charles Petzold, 2nd Edition, Chapter Twenty-Six: The Operating System
The first one is totally fine to read stand-alone. "Code" makes more sense to read cover to cover, though the chapter on its own isn't totally useless.
https://github.com/nu11secur1ty/All-Stages-of-Linux-Booting-...
https://github.com/0xAX/linux-insides/blob/master/Booting/li...
Dig in on each of their home/official site e.g: https://www.gnu.org/software/grub/manual/grub/grub.pdf
Most say Gentoo or Funtoo (its improvement) excels in docs among them.
You'll learn everything you wanted about booting and much more.
Booting process of Windows NT since Vista: https://en.wikipedia.org/wiki/Booting_process_of_Windows_NT_...
UEFI > Secure Booting, Boot Stages: https://en.wikipedia.org/wiki/UEFI#Boot_stages
The EFI system partition is conventionally mounted at /boot/efi on a Linux system. There's a signed "shim" loader that launches GRUB, which in turn loads the initrd (a "RAM drive") into memory, mounts it as the initial root filesystem /, and JMP-launches the kernel. That initial root is pivot_root'd away after the copy of /sbin/init (systemd) mounts the actual root fs and launches all the services described by the systemd unit files, in an order given by a topological sort of their dependency edges: https://en.wikipedia.org/wiki/EFI_system_partition
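The "topological sort given their dependency edges" part is easy to demonstrate with Python's stdlib graphlib. The unit names and edges below are hypothetical, and systemd's real job engine is far more involved, but the ordering idea is the same:

```python
from graphlib import TopologicalSorter

# Hypothetical unit dependency edges: each unit lists the units it must
# start after (think After=/Requires= in unit files).
deps = {
    "local-fs.target": [],
    "network.target": ["local-fs.target"],
    "sshd.service": ["network.target"],
    "multi-user.target": ["sshd.service", "network.target"],
}

# static_order() yields dependencies before dependents -- the start order.
order = list(TopologicalSorter(deps).static_order())
print(order)
# ['local-fs.target', 'network.target', 'sshd.service', 'multi-user.target']
```

A cycle in the edges raises graphlib.CycleError, which is roughly what systemd reports as an ordering cycle.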
Runlevels: https://en.wikipedia.org/wiki/Runlevel
Runlevel 5 is runlevel 3 (multi-user with networking) + GUI. On a GNOME system, GDM is the GUI process that is launched; GDM starts the user's GNOME session upon successful login. `systemctl restart gdm` restarts the GDM "greeter" login screen, which basically runs the equivalent of startx after `bash --login`. systemd maps the numbered runlevels to groups of unit files to launch:
telinit 6 # reboot
telinit 3 # kill -15 GDM and all logged in *GUI* sessions
You can pass a runlevel number as a kernel parameter by editing the GRUB menu item by pressing 'e' if there's not a GRUB password set; just the number '3' will cause the machine to skip starting the login greeter (which may be what's necessary to troubleshoot GPU issues). The word 'rescue' as a kernel parameter launches single-user mode, and may be what is necessary to rescue a system failing to boot. You may be able to `telinit 5` from the rescue runlevel, or it may be best to reboot.
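The mapping from numbered runlevels to systemd targets can be written down as a small table; this Python sketch mirrors the runlevelN.target alias units that systemd ships:

```python
# systemd ships runlevelN.target alias units; this table mirrors them.
RUNLEVEL_TARGETS = {
    0: "poweroff.target",
    1: "rescue.target",
    2: "multi-user.target",
    3: "multi-user.target",
    4: "multi-user.target",
    5: "graphical.target",
    6: "reboot.target",
}

def target_for(runlevel: int) -> str:
    """Translate an old-style runlevel number to its systemd target."""
    return RUNLEVEL_TARGETS[runlevel]

print(target_for(5))  # graphical.target
print(target_for(3))  # multi-user.target
```

So `telinit 3` and `systemctl isolate multi-user.target` end up meaning the same thing on a systemd machine.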
[0]: https://www.joe-bergeron.com/posts/Writing%20a%20Tiny%20x86%...
> This is a discussion of what happens when a CPU chip starts. It may be thought of as what happens when a whole computer starts, since the CPU is the center of the computer and the place where the action begins.
Problems with bootloaders are generally specific to the system they're on and how the drive(s) are set up.
https://www.linux.org/threads/linux-bootloaders.8943/
https://www.tecmint.com/best-linux-boot-loaders/
I think even old versions of Scott Mueller's "Upgrading and Repairing PCs" might be of use to you. Digital versions can be found at archive.org.
This post[1] has an animated image that's pretty helpful: https://embeddedinn.xyz/images/posts/rvLinuxQemuBoot/bootAni...
It's of course specific to one platform, but the visualization of building up structures in memory with the right offsets/order and jumping in is pretty typical.
[1] https://embeddedinn.xyz/articles/tutorial/RISCV-Uncovering-t...
At each of those stages, the bootloader images, the kernel image, the device trees, and the disk / filesystem are verified ("measured") for integrity, eventually anchored in a hardware root of trust.
If you're a visual learner:
How TI Beagleboard boots (33m): https://www.youtube.com/watch?v=DV5S_ZSdK0s / https://ghostarchive.org/varchive/DV5S_ZSdK0s
How ARM Cortex M boots (9m): https://www.youtube.com/watch?v=3brOzLJmeek / https://ghostarchive.org/varchive/3brOzLJmeek
Bootloader 101 (embedded devices; 38m): https://www.youtube.com/watch?v=UvFG76qM6co / https://ghostarchive.org/varchive/UvFG76qM6co
Measured Boot (39m): https://www.youtube.com/watch?v=EzSkU3Oecuw / https://ghostarchive.org/varchive/EzSkU3Oecuw
Coreboot (38m): https://www.youtube.com/watch?v=iffTJ1vPCSo / https://ghostarchive.org/varchive/iffTJ1vPCSo
U-Boot (1h 25m): https://www.youtube.com/watch?v=INWghYZH3hI
UEFI for U-Boot (38m): https://www.youtube.com/watch?v=VnsF3uRZzNk / https://ghostarchive.org/varchive/VnsF3uRZzNk
Device Trees (43m): https://www.youtube.com/watch?v=Nz6aBffv-Ek / https://ghostarchive.org/varchive/Nz6aBffv-Ek
ARM Trust Zone (46m): https://www.youtube.com/watch?v=q32BEMMxmfw
I found it invaluable when debugging some issues a few years back. Try unpacking the initramfs and reading the scripts contained in there, too: https://wiki.archlinux.org/title/Arch_boot_process#initramfs
https://www.gnu.org/software/grub/manual/grub/grub.html
and/or in the Linux From Scratch book:
http://www.bitsavers.org/components/intel/80386/231499-001_8...
Modern PCs are a little more complicated. As a user, it seems to me that maybe 20% of that complexity is necessary, such as configuring the DRAM (which definitely has to be done before running any user code). The rest is all accidental complexity and compatibility with various legacy mistakes.
Most links here are very useful, describing booting from a firmware perspective, a CPU perspective, a kernel perspective, etc., but it really depends on what level of detail and what level of knowledge you want to dig into.
In general, this is what happens when you plug in the power and push the power on button:
1. When power is applied, the hardware (so no software or firmware!) makes sure that your power supply is healthy and uses stand-by power (which is always on) to check if you are pushing the button to turn the system on
2. Once you push the button, the circuit uses its stand-by power to turn on main power, which in turn does some hardware voltage and timing checks of its own, while also 'holding' the main processor and some other components in an endless reset state
3. If everything checks out (the clock runs and the power is good), it releases the infinite reset and the processor is allowed to run
Those three steps are very generic, and highly dependent on standards like ATX, x86, and whatever component and mainboard vendors tack on to them. Diving deeper into this would take you into the area of digital electronics, power management, timers, microcontrollers, etc. On one hand it might not matter what the electrical details are; on the other hand, it is very much a part of 'booting' a computer. Primers on how this works haven't been shared so far, but I could dig some up. Onwards:
4. Only at this time does software or firmware even come into play. When a CPU wakes up and comes out of reset, there is one pre-defined thing it was designed to do, specific to how the CPU was built; usually it has a special location in memory where it always begins to look for things to do: the reset vector. Decades ago, this was very simple: it would just expect you to connect a memory device like a BIOS memory chip to the CPU on specific pins, so that when the CPU comes out of reset it always starts at the top of the memory contained within that chip and just does what it tells the CPU to do. In a way, this could be referred to as IPL: initial program load, something that mainframes used to do (and in a way still do). You have to tell the processor what to do; otherwise, if you give it power, it just sits there doing nothing. But you can't tell it what to do if there is no program running to tell it to do anything, hence the reset vector, a hard-wired first step that it always performs. All you need to do is make sure that you have a chip at the location it expects which contains some CPU instructions.
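That hard-wired first step is concrete enough to compute. On the original 8086, the registers after reset were CS=0xFFFF and IP=0x0000, which lands 16 bytes below the top of the 1 MB real-mode address space; a quick Python check of the segment:offset arithmetic:

```python
def real_mode_addr(segment: int, offset: int) -> int:
    """20-bit physical address exactly as the 8086 forms it."""
    return ((segment << 4) + offset) & 0xFFFFF

# 8086 reset: CS=0xFFFF, IP=0x0000 -> physical 0xFFFF0, the reset vector,
# 16 bytes below the top of the 1 MB space -- just enough room for a far
# jump into the ROM that the board designer wired up there.
print(hex(real_mode_addr(0xFFFF, 0x0000)))  # 0xffff0
# The same byte is reachable via other segment:offset pairs (aliasing):
print(hex(real_mode_addr(0xF000, 0xFFF0)))  # 0xffff0
```

(Modern x86 parts instead come out of reset at 0xFFFFFFF0, just below 4 GB, but the "fixed address with a jump into ROM" idea is the same.)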
5. Once the CPU has started executing whatever it found at the reset vector, it is actually doing stuff. In most cases, it starts running BIOS code. Various references to information as to how this works are specified in the comments, but also on Intel's website, on the UEFI site, Coreboot site etc. If you were to dive much deeper into this, you would get into Firmware Support Packages, BringUp code, Boot ROMs, Management Engines, SMM etc. Keep in mind that all of this is specific to x86 PCs, if you were to check out x86 servers, ARM systems, PowerPC systems, RISC-V systems, they all do similar things, but slightly different.
6. At some point during this booting phase, the code the CPU found in the firmware chip (be it BIOS or UEFI) will have configured the DRAM so it can use the actual memory in the system, and perhaps started up the PCIe bus and some other systems like USB, PS/2, etc. This is an important moment, because until now the facilities you could use were extremely limited: there was no memory, no display, no keyboard. So after some of those are initialised, the primitive code gets switched out for some more advanced code which performs tasks you might be familiar with: POST codes appear, devices start showing some activity, and the system starts powering up things like video output and keyboard input. At this point the system is also 'advanced' to the point where it can load your settings, such as your preferred boot device.
7. This is the phase where it gets closer to actually using the computer, and most of the events that happened until now might collectively take 1 second to complete. A lot of stuff happens in very little time! Next, once the system has started itself to the point where it can communicate with you, and with peripherals like storage devices, it can start thinking about what to do next. In the firmware settings you can often specify the boot behaviour, for example you might want to have the system boot from a specific disk or file on a disk, or you might want to boot from a network device, or maybe you want to boot and get dropped into a shell. All of those things are essentially the same from the firmware's perspective.
The first two of the four steps described are often referred to as 'early firmware', because the things it can do are very limited and serve mostly to prepare the system for more advanced firmware. Some of the steps are 'secret' and are delivered by the CPU manufacturer (like Intel). As a firmware developer you would get a bunch of binary data from them, and some instructions on where you need to let it take control of the entire system for a bit, and hope that it gives control back to you. The x86 FSP and Coreboot teams deal a lot with this 'first steps to make the computer work' stuff, and are good starting points if you wanted to dive into that. Other resources are limited; companies like Phoenix, Insyde, and AMI keep their software secret, and even if they would tell you about it, you'd have to pay a lot of money and sign an NDA. Other resources like the EDK and UEFI spec have been shared in other comments.
8. Let's assume you want to run an operating system; you'd often find that the system has a bunch of files on a disk, using a filesystem. BIOS and UEFI are generally not very good at reading filesystems. The BIOS didn't really read much at all, and UEFI mostly does FAT32, with optional drivers for other filesystems. If you're on ext4 (Linux) or NTFS (Windows), that's no good, because you can't really assume that your firmware can read that at all. What's more: the operating systems are likely using kernels that need advanced features to run, and a few of those need to be set up beforehand. This is where you'd use a boot loader: a program designed to bridge the gap between a firmware that can't do enough and a kernel that wants too much.
9. A bootloader is essentially just a program, so it can be loaded into memory, and started. This is a task the firmware can perform: BIOS would use a very rigid approach: read the first sector of the disk you picked, and hope it contains a boot loader. Not very flexible, and also very limited (one bootloader per disk for example). UEFI is a bit smarter: as long as you have a disk with a partition table of the GPT variant, and it contains a partition of the ESP type with a FAT32 filesystem, it can read directories and files all day long. So if you tell UEFI "run from disk 1 using file /efi/boot/banana.efi", it will go to the disk, go through the efi and boot directories, and load and execute the banana.efi program. It doesn't even really 'boot' anything, since an EFI program can be anything (it doesn't have to be a bootloader).
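The "run from disk 1 using file /efi/boot/banana.efi" step is essentially a path walk. Here's a toy Python model, with nested dicts standing in for FAT32 directories; all the names are invented except the conventional fallback path:

```python
# Toy ESP: nested dicts stand in for FAT32 directories, bytes for files.
esp = {
    "EFI": {
        "BOOT": {"BOOTX64.EFI": b"fallback bootloader image"},
        "banana": {"banana.efi": b"some EFI application"},
    }
}

def load_efi_app(fs: dict, path: str) -> bytes:
    """Walk a backslash-separated EFI path the way firmware would."""
    node = fs
    for part in path.strip("\\").split("\\"):
        node = node[part]  # a KeyError here ~ the firmware's "file not found"
    return node

print(load_efi_app(esp, "\\EFI\\banana\\banana.efi"))  # b'some EFI application'
```

When no boot entry is configured, firmware typically falls back to \EFI\BOOT\BOOTX64.EFI (on x86_64), which is how USB install sticks boot without registering anything.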
10. At this stage the bootloader of choice has started, and the firmware can mostly be ignored. The bootloader is likely smart enough to load its own settings, for example which operating systems it knows about and on which disks they can be found. It also prepares things for the operating system, like which devices are known ahead of time, and where in the computer's memory the kernel is going to be loaded.
The three steps above are mainly about control handoff and boot discovery. There are numerous standards and references on this, like the Multiboot specification, but most of those (including GRUB) have been commented on already. This is also the phase in which the computer does more or less what you see and use on a daily basis. Since there are many open source elements available, it is also much more open to exploration, and less 'hidden away' in proprietary secret documents.
Nearly all of this follows a relatively simple pattern: start small and primitive, do a dedicated task, then start the next phase, which is slightly bigger and slightly more advanced. Almost all steps in this pattern were built from manual processes; the farther back you go in time with older and more primitive computers, the more steps you'd actually be doing yourself. This can be pretty helpful to figure out why those processes or phases exist in the first place. A "manual" startup on a very old machine: https://youtu.be/PwftXqJu8hs?t=244 (edit: I made a mistake, this is not the bootup, but it does include the line "powered on, because there is nothing to boot up" -- the machine essentially powers up and does nothing, no reset vector, heck, not even an actual reset! I'll try to find the reference to the actual bootup)
A bootloader is what loads your OS. Alternatively, it can load another bootloader: for example, GRUB chainloading the Windows bootloader. Even if a bootloader supports both BIOS and UEFI, it must install a matching configuration, because the UEFI boot process is structured very differently from the BIOS boot process. The bootloader also sets Linux kernel parameters.
UEFI replaces BIOS. A UEFI will probably emulate the BIOS boot methodology with "legacy mode". These are basically the OS that loads your bootloader. One of the two lives on your motherboard and configures your hardware.
GPT replaces MBR. They are both partitioning schemes: they organize the areas in a storage media where a filesystem can exist. MBR has an overcomplicated partitioning scheme, and a few reserved sectors in the front that tell the BIOS where to find a bootloader. GPT has a simple partitioning scheme, and a special partition for the bootloader. BIOS does not support GPT. UEFI probably supports MBR with its legacy mode.
That special GPT bootloader partition is called the "EFI System Partition" (ESP). A partition is flagged as the ESP in the GPT partition table itself. It is expected that the ESP will be formatted with the FAT32 filesystem.
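For a sense of what GPT looks like on disk, here's a simplified Python sketch of the GPT header check: the "EFI PART" signature at the start of LBA 1, and the header's CRC32 computed with its own CRC field zeroed. The synthetic header below is minimal and not a complete, valid GPT:

```python
import struct
import zlib

def check_gpt_header(lba1: bytes) -> bool:
    """Simplified GPT header check: signature plus header CRC32."""
    if lba1[0:8] != b"EFI PART":
        return False
    header_size = struct.unpack_from("<I", lba1, 12)[0]
    stored_crc = struct.unpack_from("<I", lba1, 16)[0]
    # The CRC covers the header with its own CRC field zeroed out.
    zeroed = lba1[:16] + b"\x00" * 4 + lba1[20:header_size]
    return (zlib.crc32(zeroed) & 0xFFFFFFFF) == stored_crc

# A minimal synthetic header (92 bytes, mostly zero -- not a full GPT).
hdr = bytearray(92)
hdr[0:8] = b"EFI PART"
struct.pack_into("<I", hdr, 8, 0x00010000)  # revision 1.0
struct.pack_into("<I", hdr, 12, 92)         # header size
struct.pack_into("<I", hdr, 16, zlib.crc32(bytes(hdr)) & 0xFFFFFFFF)
print(check_gpt_header(bytes(hdr)))  # True
```

A real header also carries the disk GUID, the LBA of the partition entry array, and a CRC over those entries (where the ESP flag, a partition type GUID, actually lives), plus a backup copy at the last LBA of the disk.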
When using BIOS, Windows is infamous for overwriting the MBR to point to its bootloader.
When using UEFI, Windows install media autogenerates the ESP partition alongside multiple backup partitions: if there is an existing ESP partition on any present internal storage media, the windows installer will simply use that one, even if there isn't enough space. If there are multiple ESP partitions present, you cannot tell the windows installer which one to install to.
Since the MBR does most of the work, a BIOS only needs to decide which disk to boot. Usually it saves the order of disks that it will try. Some older BIOSes don't support booting USB media, but you can potentially work around this by loading another bootloader from a floppy/CD and chainloading.
UEFI saves a list of boot entries. It can autogenerate them by scanning storage media, like USB install disks. An OS can save or edit these entries directly, so long as it has been booted with permission (some laptop motherboards overcomplicate this). The Linux kernel can be compiled with a minimal EFIstub bootloader. In this case, the kernel flags are saved in the UEFI boot entry on the motherboard.
The boot sequence looks like this:
1. The BIOS or UEFI initializes hardware, and lets the user interrupt to edit BIOS/UEFI settings.
2a. UEFI has a saved list of boot entries. The default entry is loaded, or one is chosen from the list in the UEFI settings. The boot entry loads a bootloader, a Linux kernel EFIstub, or some arbitrary UEFI program like memtest86.
2b. BIOS must search all present disks for a bootloader. This is expected to be found in the first few sectors of the MBR partition table. The saved order of potentially bootable storage media is followed: the first bootloader found is loaded.
3. The bootloader does what it is configured to do. GRUB will usually present a list of entries configured by the OS it was installed with. The bootloader loads an OS.
There are some more complexities:
Libreboot/Coreboot is both a firmware (like UEFI) and a bootloader. It's made to skip as much hardware initialization as possible and get to the OS fast.
Apple implements their own proprietary "EFI", which is like UEFI, but less compatible, and has a very minimal pre-OS UI. It (at my most recent attempts nearly a decade ago) refuses to boot USB media. The bootloader rEFInd is pretty necessary if you want multiple OS installs.
The world of ARM is different: no UEFI or BIOS. Usually whatever bootloader Android uses, but maybe [libre/core]boot.
I had the same questions as you around a year ago. As other commenters mentioned, modern hardware is incredibly complex and needs to go through many, many steps before it can boot a kernel. Modern CPUs have a bunch of legacy requirements that still exist to keep backward compatibility with older hardware.
One good example of these legacy requirements is the A20 line on x86 [1]. A20 can be thought of as a boolean gate that determines whether the CPU can address more than 1MB of RAM. The gate appeared with the Intel 286 (in the IBM PC/AT) because its predecessor, the 8086, could only address 1MB, and some software relied on addresses wrapping around at that limit. The 286 could address 16MB, so the A20 gate was added to control whether memory past 1MB is reachable. The A20 mechanism survived into x86_64 CPUs, so it needs to be enabled before the entire RAM address space can be used.
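The wraparound behavior behind A20 is just address arithmetic; a small Python model (not real hardware, just the math the gate controls):

```python
def phys_addr(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address; with the A20 gate closed, bit 20 is forced to 0,
    reproducing the 8086's wraparound at 1 MB."""
    addr = (segment << 4) + offset
    return addr if a20_enabled else addr & 0xFFFFF

# 0xFFFF:0x0010 is exactly 1 MB: wraps to 0 without A20, reachable with it.
print(hex(phys_addr(0xFFFF, 0x0010, a20_enabled=False)))  # 0x0
print(hex(phys_addr(0xFFFF, 0x0010, a20_enabled=True)))   # 0x100000
```

That wraparound is what old real-mode software depended on, and it is exactly the behavior a bootloader breaks (on purpose) when it enables A20.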
There are a TON of legacy requirements like the A20 line on modern CPUs (especially x86_64). It can make it very difficult to figure out what is going on in the boot process. For me, I found it much easier to start by learning how simple pieces of hardware work before moving on to x86. The boot process for Raspberry Pi's RP2040 microcontroller is explained in its datasheet [2], and I found it to be a great resource for figuring out which hardware needs to be initialized and what that actually means.
Just like x86, microcontrollers have many tasks they need to complete before loading the main program. But these steps are much less ambiguous and are described very well in the RP2040 datasheet. It lists every step that it goes through before the main program starts (including stuff like initializing the clocks to specified speeds). After I felt like I thoroughly understood how the RP2040 boots, it became much easier to understand why the bootloader needs to complete certain tasks.
Not sure if you are familiar with assembly, but you can also checkout part of the RP2040's bootloader here [3]. There are a few different bootloaders for the RP2040, the one I linked is the one referenced in the datasheet. Its purpose is to load the main program from a specified location in the flash memory chip.
Remember that the boot process on x86 processors has been changing for around 40 years, so expect it to take a while before you feel more comfortable with the terms used in bootloading and hardware. I've been learning this stuff for around a year and still feel like I only understand a fraction of the x86 boot process. But I find it so fascinating that I can't help but want to learn more of its complexity.
I love talking about hardware and the boot process, so feel free to let me know if you have any more questions! :)
[1] https://en.wikipedia.org/wiki/A20_line
[2] https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.p...
[3] https://github.com/rp-rs/rp2040-boot2/blob/main/src/boot2_w2...
https://thestarman.pcministry.com/asm/mbr/STDMBR.htm
Note that EFI/UEFI came along much later in time than the MBR...
The MBR boot process is also called "Legacy Boot" -- and is emulated on (U)EFI -- although it may not show as an option on some (U)EFI BIOSes if the option is turned off...
Related: https://en.wikipedia.org/wiki/Master_boot_record
You might also wish to check out some emulators, most notably Bochs (https://bochs.sourceforge.io/) and QEMU (https://www.qemu.org/) because they simulate the boot process, and if you're in their debuggers, you should be able to inspect that process step by step -- but also more generally emulators for other machines/platforms/architectures (https://en.wikipedia.org/wiki/List_of_computer_system_emulat...) because in general, most of those emulators should realistically simulate the given machine/platform/architecture's boot process...
The basic theory of booting is that when a system starts, it contains a little bit of persistent memory (BIOS ROM, EPROM?) that contains a little bit of code, which is just enough code to load the data of the first block/sector of the hard disk (or other persistent storage boot device) to a specific address in memory as code -- and jump to it.
This data on the first block/sector -- is a small bit of machine code -- which although fairly stupid -- knows enough about the system to load the next N blocks/sectors of the hard disk (or other storage device) -- again into memory at a specific address -- and then jump (transfer control) to it.
This pattern may repeat several times, for example, GRUB's first bootsector then loads an intermediate length program (a "chainloader") from the next N contiguous blocks/sectors of the storage device, this chainloader knows more about the hardware and filesystems than the initial boot block did, and then it proceeds to do a yet longer/more complex load of the main Operating System into memory.
During the final load, the main Operating System might be on discontiguous blocks/sectors and those blocks/sectors may be part of a filesystem. But that doesn't have to be the case.
But whatever the case, the final load (or perhaps "bootstrap phase") is usually more complex than reading continuous blocks/sectors into memory and subsequently jumping to its start address in memory -- but not always...
Another way to think about it is that a 512 byte program (the bootsector) is loaded that then loads a 32K (let's say) program (the chainloader on N contiguous blocks/sectors) which then loads the multi-megabyte (or multi-gigabyte!) OS from multiple files from multiple file systems from multiple discontinuous blocks/sectors on the physical device (let's say, for example...)
Simple Program (loads) -> More Complex Program (loads) -> Most Complex Program (OS)...
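That chain can be modeled in a few lines; the names and sizes below are invented, but the "each stage loads a bigger one" invariant is the whole theory:

```python
# Toy model of staged booting: each stage only knows how to fetch the
# next, strictly larger stage, until the "OS" itself is in memory.
stages = [
    ("boot sector", 512),
    ("chainloader", 32 * 1024),
    ("operating system", 64 * 1024 * 1024),
]

loaded = []
for name, size in stages:
    if loaded:
        assert size > loaded[-1][1], "each stage outgrows the last"
    loaded.append((name, size))
    print(f"loaded {name} ({size} bytes), jumping to it")

print("running:", loaded[-1][0])
```

Same shape whether it's BIOS → MBR → GRUB → kernel, or boot ROM → SPL → U-Boot → kernel on an embedded board.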
Anyway, there's your basic theory...
Good luck!