Building Linux from scratch: From zero to sh

My day job is writing embedded software, so I do a decent amount of Linux work. However, since the team that I work on was created long before I joined, there’s already a set of tools that builds our Linux image. When I want to test a Linux change I run one build command and out pops a full Linux image. But if, by some tragic accident, all our build code disappeared tomorrow, how would I go about building a Linux image myself? How, exactly, do you go from a Linux source tree and some userspace code to a bootable binary? I wasn’t sure, so I decided to find out!

Goals

Before I start, I need to define the final product I’m looking for. I want a Linux kernel that I compiled, running with a device tree I compiled, booting off a file system I made, into a shell. No wifi, no GUI, just a terminal screen that I can type in. Basically, the minimum viable product of a Linux distro.

Now, if you go look up how to build a Linux image, you’re going to come across two major tools: Buildroot and Yocto. And while I assume that these tools are very powerful, I already don’t know how to build a Linux image. And learning how to build a Linux image at the same time as I learn a tool that builds the Linux image seems a bit much. So, for this exercise, I’m going to be doing everything by hand (and by hand I mean using make plus whatever other miscellaneous utilities I need). But no pre-packaged Linux building system for me! _[1]

There are only really three components that I need to build to boot into a shell.

A Linux kernel
A device tree for my target device
A file system for my image to boot

So with all this in mind, onto step 1!

Building a kernel

Before I start building anything, I need to decide which hardware I’m going to be building for. I already have a Raspberry Pi 4 at home, so I decided to go with that (this also means that I could boot the image on real hardware in the future if I desired).

The Raspberry Pi foundation has a page on how to build a Linux kernel, so I started there. _[2]

The documentation gives this command for building the 64-bit kernel

make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Image modules dtbs

and this one for building the 32-bit kernel

make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- zImage modules dtbs

I was a little confused to see two different targets for the kernel; do the 64-bit and 32-bit kernels not use the same target?

After a little googling it turns out that this can be answered via the help target. Running

make ARCH=arm64 help

gives these build targets

Architecture-specific targets (arm64):
* Image.gz      - Compressed kernel image (arch/arm64/boot/Image.gz)
  Image         - Uncompressed kernel image (arch/arm64/boot/Image)

and setting ARCH=arm gives these build targets

Architecture-specific targets (arm):
* zImage        - Compressed kernel image (arch/arm/boot/zImage)
  Image         - Uncompressed kernel image (arch/arm/boot/Image)
* xipImage      - XIP kernel image, if configured (arch/arm/boot/xipImage)
  uImage        - U-Boot wrapped zImage
  bootpImage    - Combined zImage and initial RAM disk
                  (supply initrd image via make variable INITRD=<path>)

So it seems that zImage and Image.gz are both compressed kernel images, just with different target names, depending on the architecture. Since I’m interested in 64-bit Linux, I’ll follow the guide and run the following commands. _[3]

KERNEL=kernel8
make -j $(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- bcm2711_defconfig
make -j $(nproc) ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Image dtbs

I get a file named ‘Image’ once Linux has finished building, and running file on it reassures me that I’ve compiled the correct image.

file result/boot/Image
result/boot/Image: Linux kernel ARM64 boot executable Image, little-endian, 4K pages

I also get a device tree which seems valid

file result/dtbs/bcm2711-rpi-4-b.dtb
result/dtbs/bcm2711-rpi-4-b.dtb: Device Tree Blob version 17, size=56108, boot CPU=0, string block size=4872, DT structure block size=51164

Nice, my image built successfully!

So I now have a Linux image and a device tree. But what about a file system? I need somewhere to store my shell (and the other utilities that I might want).

Building a file system

Initramfs

When you boot Linux, you need to tell it where the root file system lives. You can use a file system on a physical device (like a hard drive or SSD), or you can use a RAM-only file system called initramfs. I decided to try the initramfs option. _[4]

After reading through the documentation on initramfs, it seems like what I need to do is pretty minimal. I need to create a gzipped cpio archive that will be extracted into a root file system. That archive needs to contain a script called /init, which will be what Linux runs after it’s unpacked my archive. Since the archive will be unpacked into a file system, it needs to contain all the programs I want access to, such as cp, ls, and, given that I want a shell, sh. But there are a lot of tools that come bundled with a standard Linux system. Am I going to need to build all these from source?

As you may have guessed by my leading question, the answer is no, I don’t need to build these all from source. All I need is to use BusyBox.

BusyBox

To quote the BusyBox website:

BusyBox combines tiny versions of many common UNIX utilities into a single small executable. It provides replacements for most of the utilities you usually find in GNU fileutils, shellutils, etc. The utilities in BusyBox generally have fewer options than their full-featured GNU cousins; however, the options that are included provide the expected functionality and behave very much like their GNU counterparts. BusyBox provides a fairly complete environment for any small or embedded system.

And luckily for me, those “many common” utilities happen to include all the programs I need for my shell to be useful! This means that I don’t need to ship a few hundred different binaries in my initramfs image, I can just ship BusyBox. But how does BusyBox provide all the functionality of the various tools that I want access to?

A brief tangent on `argv[0]`

Normally argv[0] isn’t used for anything; it’s just the name of your program, after all. And why would you ever care about the name of the program you’re running?

This is some C code that just prints out argv[0]

#include <stdio.h>

int main(int argc, char** argv)
{
        printf("%s\n", argv[0]);
}

when I run it I get

~/scratch$ ./a.out
./a.out

which isn’t very interesting. a.out is the name of the program. What else could it print?

However, an interesting thing happens when you create symlinks to a program.

~/scratch$ tree
.
├── a.out
└── foo -> a.out

Now, if I run foo (which is just a.out), I get this

~/scratch$ ./foo
./foo

Well isn’t that interesting. I only have one binary, but I can change argv[0] by invoking the same binary through a symlink. Imagine if I wanted to allow one binary to do multiple things, depending on which symlink it’s invoked through. I could mimic having multiple binaries by checking the value of argv[0] and taking the appropriate action depending on the value. _[5]

name = basename(argv[0])   // "./ls" and "/bin/ls" both become "ls"
if name == "ls":
    call_ls()
else if name == "cp":
    call_cp()
// repeat for all other utilities
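The dispatch above can be tried out without writing any C. What follows is a minimal sketch of a multi-call program written as a shell script, where $0 plays the role of argv[0]. The script name multicall.sh, the applet names hello and bye, and the temporary directory are all made up for the demo; it illustrates the idea and is not taken from BusyBox's actual code.

```shell
#!/bin/sh
# Sketch: one script, several "programs", chosen by the name it was invoked as.
# multicall.sh, hello, and bye are hypothetical names used only for this demo.
set -eu

demo_dir=$(mktemp -d)

# The multi-call program: $0 is the shell equivalent of argv[0].
cat > "$demo_dir/multicall.sh" <<'EOF'
#!/bin/sh
# Strip the directory part so "./hello" and "/tmp/x/hello" both match "hello".
case "$(basename "$0")" in
    hello) echo "hello from the hello applet" ;;
    bye)   echo "goodbye from the bye applet" ;;
    *)     echo "unknown applet: $(basename "$0")" >&2; exit 1 ;;
esac
EOF
chmod +x "$demo_dir/multicall.sh"

# One symlink per applet, all pointing at the same file.
ln -s multicall.sh "$demo_dir/hello"
ln -s multicall.sh "$demo_dir/bye"

# Same file on disk, different behavior depending on the invoked name.
hello_out=$("$demo_dir/hello")
bye_out=$("$demo_dir/bye")
echo "$hello_out"
echo "$bye_out"

rm -rf "$demo_dir"
```

Running multicall.sh directly falls through to the "unknown applet" branch, since its own name matches neither applet.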

This is how BusyBox works; BusyBox has a few hundred programs inside of it, and you symlink each program name to the BusyBox binary. When you invoke BusyBox via the relevant symlink, BusyBox checks argv[0] and calls the appropriate subprogram for you. This means that I can ship one binary - BusyBox - but have access to all the tools that BusyBox contains internally. Listing out the programs BusyBox includes shows almost every program I’ve ever used:

Usage: busybox [function [arguments]...]
   or: busybox --list[-full]
   or: busybox --show SCRIPT
   or: busybox --install [-s] [DIR]
   or: function [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable.  Most people will create a
        link to busybox for each function they wish to use and BusyBox
        will act like whatever it was invoked as.

Currently defined functions:
        [, [[, acpid, add-shell, addgroup, adduser, adjtimex, arch, arp,
        arping, ascii, ash, awk, base32, base64, basename, bc, beep,
        blkdiscard, blkid, blockdev, bootchartd, brctl, bunzip2, bzcat, bzip2,
        cal, cat, chat, chattr, chgrp, chmod, chown, chpasswd, chpst, chroot,
        chrt, chvt, cksum, clear, cmp, comm, conspy, cp, cpio, crc32, crond,
        crontab, cryptpw, cttyhack, cut, date, dc, dd, deallocvt, delgroup,
        deluser, depmod, devmem, df, dhcprelay, diff, dirname, dmesg, dnsd,
        dnsdomainname, dos2unix, dpkg, dpkg-deb, du, dumpkmap, dumpleases,
        echo, ed, egrep, eject, env, envdir, envuidgid, ether-wake, expand,
        expr, factor, fakeidentd, fallocate, false, fatattr, fbset, fbsplash,
        fdflush, fdformat, fdisk, fgconsole, fgrep, find, findfs, flock, fold,
        free, freeramdisk, fsck, fsck.minix, fsfreeze, fstrim, fsync, ftpd,
        ftpget, ftpput, fuser, getopt, getty, grep, groups, gunzip, gzip, halt,
        hd, hdparm, head, hexdump, hexedit, hostid, hostname, httpd, hush,
        hwclock, i2cdetect, i2cdump, i2cget, i2cset, i2ctransfer, id, ifconfig,
        ifdown, ifenslave, ifplugd, ifup, inetd, init, insmod, install, ionice,
        iostat, ip, ipaddr, ipcalc, ipcrm, ipcs, iplink, ipneigh, iproute,
        iprule, iptunnel, kbd_mode, kill, killall, killall5, klogd, last, less,
        link, linux32, linux64, linuxrc, ln, loadfont, loadkmap, logger, login,
        logname, logread, losetup, lpd, lpq, lpr, ls, lsattr, lsmod, lsof,
        lspci, lsscsi, lsusb, lzcat, lzma, lzop, makedevs, makemime, man,
        md5sum, mdev, mesg, microcom, mim, mkdir, mkdosfs, mke2fs, mkfifo,
        mkfs.ext2, mkfs.minix, mkfs.vfat, mknod, mkpasswd, mkswap, mktemp,
        modinfo, modprobe, more, mount, mountpoint, mpstat, mt, mv, nameif,
        nanddump, nandwrite, nbd-client, nc, netstat, nice, nl, nmeter, nohup,
        nologin, nproc, nsenter, nslookup, ntpd, od, openvt, partprobe, passwd,
        paste, patch, pgrep, pidof, ping, ping6, pipe_progress, pivot_root,
        pkill, pmap, popmaildir, poweroff, powertop, printenv, printf, ps,
        pscan, pstree, pwd, pwdx, raidautorun, rdate, rdev, readahead,
        readlink, readprofile, realpath, reboot, reformime, remove-shell,
        renice, reset, resize, resume, rev, rm, rmdir, rmmod, route, rpm,
        rpm2cpio, rtcwake, run-init, run-parts, runlevel, runsv, runsvdir, rx,
        script, scriptreplay, sed, seedrng, sendmail, seq, setarch, setconsole,
        setfattr, setfont, setkeycodes, setlogcons, setpriv, setserial, setsid,
        setuidgid, sh, sha1sum, sha256sum, sha3sum, sha512sum, showkey, shred,
        shuf, slattach, sleep, smemcap, softlimit, sort, split, ssl_client,
        start-stop-daemon, stat, strings, stty, su, sulogin, sum, sv, svc,
        svlogd, svok, swapoff, swapon, switch_root, sync, sysctl, syslogd, tac,
        tail, tar, taskset, tc, tcpsvd, tee, telnet, telnetd, test, tftp,
        tftpd, time, timeout, top, touch, tr, traceroute, traceroute6, tree,
        true, truncate, ts, tsort, tty, ttysize, tunctl, ubiattach, ubidetach,
        ubimkvol, ubirename, ubirmvol, ubirsvol, ubiupdatevol, udhcpc, udhcpc6,
        udhcpd, udpsvd, uevent, umount, uname, unexpand, uniq, unix2dos,
        unlink, unlzma, unshare, unxz, unzip, uptime, users, usleep, uudecode,
        uuencode, vconfig, vi, vlock, volname, w, wall, watch, watchdog, wc,
        wget, which, who, whoami, whois, xargs, xxd, xz, xzcat, yes, zcat,
        zcip

All I need to do to get access to all of these tools is make sure that my /init script sets up the relevant symlinks before starting the shell.

Compiling BusyBox

I don’t plan on packaging a C standard library in my system, so I need to make sure that BusyBox is compiled statically (otherwise BusyBox will search for a non-existent system-wide C standard library).

Luckily this isn’t that hard. To compile BusyBox I first clone the repo and then run

make defconfig

which produces a .config file. When I then open up this .config I see this

#
# Build Options
#
# CONFIG_STATIC is not set

Changing this to

CONFIG_STATIC=y

means that my BusyBox binary will now be linked statically.
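If you would rather not edit .config by hand, the same change can be scripted. This is a sketch using sed; the function name enable_static is my own invention, and it assumes the option appears exactly as shown above, as "# CONFIG_STATIC is not set".

```shell
#!/bin/sh
# Sketch: switch CONFIG_STATIC on in a BusyBox .config without opening an editor.
# enable_static is a hypothetical helper name, not part of BusyBox.
set -eu

enable_static() {
    config_file=$1
    # Rewrite the "not set" comment line into an enabled option.
    # Going through a temporary file avoids relying on sed -i, which
    # behaves differently between sed implementations.
    sed 's/^# CONFIG_STATIC is not set$/CONFIG_STATIC=y/' "$config_file" > "$config_file.tmp"
    mv "$config_file.tmp" "$config_file"
    # Return non-zero if the option did not end up enabled.
    grep -q '^CONFIG_STATIC=y$' "$config_file"
}
```

From the BusyBox source directory, after make defconfig, this would be called as enable_static .config.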

I then build BusyBox using this command

make CROSS_COMPILE=aarch64-unknown-linux-gnu- -j $(nproc)

After building, I can verify that it is indeed a static binary.

file result/busybox
result/busybox: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.10.0, stripped

Creating an initramfs image

Now that I’ve gotten BusyBox, I can actually create my initramfs image, which will contain the following

a /init file (called at startup)
BusyBox
BusyBox symlinks to whatever programs the /init script needs

Putting all that together results in a file system that looks like this

.
├── bin
│   ├── busybox
│   ├── ln -> busybox
│   ├── ls -> busybox
│   └── sh -> busybox
└── init

With this as the init script

#!/bin/sh

for command in $(busybox --list); do
        if [ ! -e "/bin/$command" ]; then
                ln -s busybox "/bin/$command"
        fi
done

exec /bin/sh

This script creates symlinks from the programs BusyBox packages to the BusyBox binary itself. It then invokes /bin/sh, which creates the shell that I’ll interact with.
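For completeness, the directory layout above can be staged with a few commands. This is only a sketch: the helper name stage_rootfs and its three arguments are hypothetical, and the paths you pass in (for example result/busybox for the binary, plus wherever you saved the init script) depend on your own setup.

```shell
#!/bin/sh
# Sketch: lay out the initramfs staging directory described above.
# stage_rootfs and its arguments are hypothetical names for illustration.
set -eu

stage_rootfs() {
    busybox_bin=$1   # path to the statically linked BusyBox binary
    init_script=$2   # path to the init script shown above
    rootfs=$3        # staging directory to create

    mkdir -p "$rootfs/bin"
    cp "$busybox_bin" "$rootfs/bin/busybox"
    chmod 755 "$rootfs/bin/busybox"

    # The same three links as in the tree above; init creates the rest at boot.
    for applet in sh ln ls; do
        ln -sf busybox "$rootfs/bin/$applet"
    done

    # The kernel runs /init, so it must sit at the root and be executable.
    cp "$init_script" "$rootfs/init"
    chmod 755 "$rootfs/init"
}
```

A call such as stage_rootfs result/busybox init rootfs would then leave a rootfs directory ready to be packed into an archive.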

Now that I’ve got the file system set up I need to package it in a way that Linux understands. Luckily the initramfs documentation provides a script for that!

  #!/bin/sh

  # Copyright 2006 Rob Landley <[email protected]> and TimeSys Corporation.
  # Licensed under GPL version 2

  if [ $# -ne 2 ]
  then
    echo "usage: mkinitramfs directory imagename.cpio.gz"
    exit 1
  fi

  if [ -d "$1" ]
  then
    echo "creating $2 from $1"
    (cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
  else
    echo "First argument must be a directory"
    exit 1
  fi

This script takes two arguments: a directory to convert to an initramfs image, and what you want the resulting compressed cpio file to be called. _[6]

Running this script on my file system gives me a compressed cpio archive, which seems to be valid (I called my initramfs file init.cpio)

file init.cpio
init.cpio: gzip compressed data, from Unix, original size modulo 2^32 3069952
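Note that file only reports the outer gzip layer here. To look one level deeper, the sketch below decompresses the image and prints its first six bytes; archives written with cpio -H newc begin with the ASCII string 070701. The helper name cpio_magic is made up for this example.

```shell
#!/bin/sh
# Sketch: peek inside a gzipped initramfs image and print the cpio magic.
# cpio_magic is a hypothetical helper name.
set -eu

cpio_magic() {
    # Decompress to stdout and keep only the first 6 bytes.
    # The "newc" format written by `cpio -H newc` starts with the
    # ASCII string 070701.
    gzip -dc "$1" | dd bs=1 count=6 2>/dev/null
}
```

With the image from this post, the check would look like [ "$(cpio_magic init.cpio)" = 070701 ] && echo "looks like a newc archive".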

Booting the system

I now have all the pieces I need to actually boot the system. I have a kernel, a device tree, and an initramfs file system. Now it’s time to put it all together.

To test out the image, I need real hardware or an emulator. In this case I’m going to use QEMU, which is an emulator that can run a full kernel on a virtual Raspberry Pi, without needing to set up any real hardware. _[7]

I can start the kernel by running

qemu-system-aarch64 \
            -nographic \
            -machine raspi4b \
            -cpu cortex-a72 \
            -m 2G -smp 4 \
            -kernel result/boot/Image \
            -dtb result/dtbs/bcm2711-rpi-4-b.dtb \
            --initrd scratch/init.cpio \
            -serial null \
            -chardev stdio,id=uart1 \
            -serial chardev:uart1 \
            -monitor none

The important points here are

pointing QEMU at the Image file I generated earlier using the -kernel flag
pointing QEMU at the device tree file using the -dtb flag
pointing QEMU at the initramfs image using the --initrd flag
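Since all three of those flags take file paths, a typo in any of them costs a failed launch. A small pre-flight check is one way to catch that early; this is a sketch, with check_boot_inputs being a hypothetical helper name rather than anything QEMU provides.

```shell
#!/bin/sh
# Sketch: check that the boot inputs exist before launching QEMU.
# check_boot_inputs is a hypothetical helper; pass it your own paths.
set -eu

check_boot_inputs() {
    missing=0
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            # Report every missing file, not just the first one.
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the paths used in this post, that would be check_boot_inputs result/boot/Image result/dtbs/bcm2711-rpi-4-b.dtb scratch/init.cpio, run before the QEMU command.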

And after some waiting I see

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[    0.000000] KASLR disabled due to lack of seed
[    0.000000] Machine model: Raspberry Pi 4 Model B
[    0.000000] efi: UEFI not found.
[    0.000000] Reserved memory: created CMA memory pool at 0x000000002c000000, size 64 MiB
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] OF: reserved mem: 0x000000002c000000..0x000000002fffffff (65536 KiB) map reusable linux,cma
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000003bffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x3bdd33c0-0x3bdd5fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x000000003bffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
// ignore most of the output
[    1.410530] of_cfs_init
[    1.413332] of_cfs_init: OK
[    1.416512] clk: Disabling unused clocks
[    1.474364] Freeing unused kernel memory: 4864K
[    1.477876] Run /init as init process
/bin/sh: can't access tty; job control turned off
~ #

I have a shell!! And I can verify that everything is working by doing the time-honored tradition of hello world.

~ # echo hello world
hello world

And I’m done! I have a full - although limited - Linux image that boots!

Future work

At this point I've achieved what I set out to do. I now have a minimal Linux system that I can use to boot into a shell!

From here there's a bunch of different directions that I could go. I could look at integrating a system-wide C standard library, so I could have dynamically linked executables. I could try to get networking set up, so I can ssh into the system. Or I could investigate other init systems, like systemd.

This won't be the last time that I'm writing about Linux, but for now this is a great place to stop.