Static Binaries and Why I Want Them
I am a huge fan of statically linked binaries because of their portability. In theory, and often in practice, you can copy a statically linked binary compiled on one Linux distribution to another and have it work out of the box. Unfortunately, when using GNU libc (glibc), not all functions can be statically linked, which leaves the resulting binaries dependent at runtime on whatever version of glibc the application was compiled with. Most issues revolve around the Name Service Switch (NSS). NSS allows system administrators to reconfigure a system to use external sources for things that would normally be queried from "/etc/passwd", "/etc/shadow", "/etc/hosts", "/etc/group", "/etc/resolv.conf", etc.; think of services provided by things like LDAP, Active Directory and Windows Domain Controllers.
On most single-user systems not associated with or administered by an institution, NSS-specific features are rarely, if ever, used. I have created modified versions of tmux, NGINX and Vim that I run on my personal machines, with patches that remove any NSS-dependent functions so the builds are 100% hermetic with no external glibc dependencies. My current goal is to produce a package I can download, or a repository I can clone, that allows me to set up a working development environment with up-to-date versions of my favorite tools on any Linux distribution, so I set my sights on creating a hermetic, statically linked version of GNU coreutils.
Taming of the coreutils
As of release 8.23 of coreutils, the --enable-single-binary configuration option allows users to create a single binary that contains every tool in the coreutils package. For most people, the reduction in disk utilization is probably the best part of the change. For my purposes, having a single file with a multitude of tools is the best part. I downloaded coreutils 8.25, the most recent release at the time of this writing. An initial run of ./configure LDFLAGS="-static" && make unsurprisingly revealed that some of the applications in coreutils depend on NSS features. When building these applications, GCC emits messages that look like this:
warning: Using ... in statically linked applications requires ... glibc version used for linking
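To turn warnings of this shape into a list of offending functions, the build output can be filtered with standard tools. The sketch below is illustrative and is not the script used for this article: the function name nss_functions is made up, and it assumes the warning wraps the function name in a single quote character on each side.

```shell
#!/bin/sh
# Print the unique function names that trigger glibc's static-linking
# warning, reading build output on stdin.
# Assumption: the name is surrounded by one quote character on each side,
# e.g. "Using 'getpwnam' in statically linked applications ...".
nss_functions() {
    sed -n 's/.*Using .\([A-Za-z_][A-Za-z0-9_]*\). in statically linked applications.*/\1/p' |
        sort -u
}

# Example (run from a configured source tree):
#   make 2>&1 | nss_functions
```

Because the filter reads standard input, it can be pointed at the output of a single-tool build just as easily as at a full build log.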
I wrote a script that built every binary individually and ran make clean between builds. I then grepped the output for messages like the one above to determine exactly which tools use NSS-dependent functions. The resulting list of tools was fairly short:
I don't use most of the tools above on a regular basis for software development; I typically only need things like chown while administering a system, not developing code on it, and even if that weren't the case, I've never run into a situation where I wished I had a more recent version of most of these tools. Ultimately, I at least wanted up-to-date versions of
stat. By inspecting the build output, I discovered the functions causing problems for
getpwuid(3) all invoked in
stat is dependent on
getpwuid(3), both of which are called in the main source file, ./src/stat.c. Among other things, these functions are used to map user and group information back to strings, so I didn't want to simply excise those function calls.
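To make concrete what these lookups do, here is a rough shell illustration of mapping a numeric ID to a name using only the flat files, with no NSS involved. The function names name_for_uid and name_for_gid are made up for this example; the real getpwuid(3) and getgrgid(3) are C functions that return full records, not just names.

```shell
#!/bin/sh
# Look up the login name for a numeric UID in a passwd(5)-format file.
# Usage: name_for_uid UID [FILE]   (FILE defaults to /etc/passwd)
name_for_uid() {
    awk -F: -v uid="$1" '$3 == uid { print $1; exit }' "${2:-/etc/passwd}"
}

# Same idea for groups, using a group(5)-format file.
# Usage: name_for_gid GID [FILE]   (FILE defaults to /etc/group)
name_for_gid() {
    awk -F: -v gid="$1" '$3 == gid { print $1; exit }' "${2:-/etc/group}"
}

# Example:
#   name_for_uid 0    # typically prints "root"
```

A tool such as stat needs exactly this kind of mapping to print an owner name instead of a bare number, which is why dropping the calls outright would cost real functionality.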
There's an implementation of the C standard library named musl. It aims to be "lightweight, fast, simple, free, and strives to be correct in the sense of standards-conformance and safety." Unlike glibc, it does not support NSS. I first tried to build coreutils using musl exclusively, but that did not pan out. After looking through some of the musl source code, I decided to replace any functions from glibc that depend on NSS with their musl counterparts. Most of the code for these functions is found under ./src/passwd in the musl repository:
musl$ egrep -l -R -w 'getgrgid|getgrnam|getpwnam|getpwuid' src/
src/passwd/getpwent.c
src/passwd/getgrent.c
The core implementation for many of these functions lives in some similarly-named but different files. All in all, the files needed from musl are:
I figured this out by reviewing the files in ./src/passwd, with a bit of trial and error, and by using the "-MM" flag for GCC (gcc -MM $C_FILE_HERE) after creating a file that included and successfully used the targeted functions. Before object files could be generated, lines with #include "libc.h" and calls to weak_alias(...) needed to be removed. The include statement pulls in musl's local libc header file, but what's actually needed is the glibc header file, which is already included as part of the coreutils build process. The definition of weak_alias is in the musl libc.h file. Weak aliases are beyond the scope of this document, but the aliases are not required by coreutils, which means the macro isn't needed. I used sed(1) to delete the unwanted lines: sed -i -r '/libc\.h|weak_alias/d' src/passwd/*. After that, I created three C files which would eventually be used to generate object files:
==> getgrent.c <==
#include <stdlib.h>
#include <pwd.h>
#include "src/passwd/getgr_a.c"
#include "src/passwd/getgrent.c"
#include "src/passwd/getgrent_a.c"

==> getpwent.c <==
#include <stdlib.h>
#include <pwd.h>
#include "src/passwd/getpw_a.c"
#include "src/passwd/getpwent.c"
#include "src/passwd/getpwent_a.c"

==> nscd.c <==
#include <pwd.h>
#include "src/passwd/nscd.h"
#include "src/passwd/nscd_query.c"
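Since the sed cleanup edits the musl sources in place, it can be worth previewing what it would remove first. The helper below is a hypothetical convenience, not something from the original workflow: preview_strip is a made-up name, and it uses -E, which GNU sed treats the same as -r.

```shell
#!/bin/sh
# Print FILE with the musl-internal lines filtered out, leaving FILE itself
# untouched. Same expression as the in-place command, minus -i.
preview_strip() {
    sed -E '/libc\.h|weak_alias/d' "$1"
}

# Example: compare the filtered view against the original before committing
# to the in-place edit.
#   preview_strip src/passwd/getpwent.c | diff src/passwd/getpwent.c -
```

Any line shown by diff as removed is one that mentions libc.h or weak_alias, which makes it easy to spot a deletion that would take useful code with it.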
I then ran echo *.c | xargs -n1 gcc -static -pthread -c to generate object files from these source files. With that done, the object files could be added to LDFLAGS. There are two main ways to do this: re-run the configuration script, i.e. ./configure LDFLAGS="-static -pthread "musl/*.o, or edit the Makefile at the root of the coreutils repository if the configuration script was previously run with LDFLAGS="-static". I opted for the former. After I rebuilt the binaries above that still had NSS-related glibc dependencies, only the following binaries were still not hermetic:
Those binaries depend on
getaddrinfo(3). I'm sure porting over the necessary functions from musl would be trivial, but I put that on the back burner since I don't care about having bleeding-edge versions of any of those tools. After that, I updated the configuration to produce a single binary that excluded the tools I didn't care about and re-ran make to create the multi-call binary:
coreutils-8.25$ make clean
...
coreutils-8.25$ ./configure LDFLAGS="-static -pthread "musl/*.o \
> --enable-single-binary=symlinks \
> --enable-single-binary-exceptions=chroot,groups,id,pinky,who
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
...
config.status: creating po/Makefile
coreutils-8.25$ make
...
CC lib/mbslen.o
CC lib/mbsstr.o
CC lib/mbswidth.o
CC lib/mbuiter.o
CC lib/mgetgroups.o
CC lib/mkancesdirs.o
...
coreutils-8.25$ ls -lh src/coreutils
-rwx------ 1 ericpruitt ericpruitt 5.8M Mar 7 20:11 src/coreutils
coreutils-8.25$ ldd src/coreutils
not a dynamic executable (1)
coreutils-8.25$ ./src/coreutils --help
Usage: ./src/coreutils --coreutils-prog=PROGRAM_NAME [PARAMETERS]...
Execute the PROGRAM_NAME built-in program with the given PARAMETERS.
...
With the multi-call binary built, I verified that user and group information could still be queried despite the lack of NSS support:
coreutils-8.25$ (exec -a ls ./src/coreutils -l musl/*.o)
-rw------- 1 ericpruitt ericpruitt 7160 Mar 7 19:41 getgrent.o
-rw------- 1 ericpruitt ericpruitt 6168 Mar 7 19:41 getpwent.o
-rw------- 1 ericpruitt ericpruitt 3248 Mar 7 19:41 nscd.o
That command executed ./src/coreutils with argv[0] set to ls, which makes the multi-call binary act like ls. Running ./src/coreutils --coreutils-prog=ls -l musl/*.o would have achieved the exact same thing.
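The dispatch on the invoked name can be illustrated with a toy shell script that inspects the name it was started under, much as a multi-call binary inspects argv[0]. Everything here is made up for the illustration (the names make_toy_multicall, toy-multicall, hello and bye); coreutils itself implements this in C.

```shell
#!/bin/sh
# Create a toy multi-call script in directory $1, plus two symlinks to it.
make_toy_multicall() {
    dir=$1
    cat > "$dir/toy-multicall" <<'EOF'
#!/bin/sh
# Pick a personality from the name this script was invoked as.
case "$(basename "$0")" in
    hello) echo "hello, $*" ;;
    bye)   echo "bye, $*" ;;
    *)     echo "usage: invoke via a symlink named hello or bye" >&2; exit 2 ;;
esac
EOF
    chmod +x "$dir/toy-multicall"
    ln -s toy-multicall "$dir/hello"
    ln -s toy-multicall "$dir/bye"
}

# Example:
#   d=$(mktemp -d) && make_toy_multicall "$d"
#   "$d/hello" world    # prints "hello, world"
#   "$d/bye" world      # prints "bye, world"
```

This is the same arrangement the --enable-single-binary=symlinks option sets up: one real file, with each tool name pointing at it.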