Several news outlets reported the discovery of a 1970s Fourth Edition Research Unix magnetic tape at the University of Utah in July 2025 and its successful restoration. This is a significant find, because up to now only the Fourth Edition’s manual was thought to have survived. Over the past few days I incorporated the tape’s source code into the Unix History Repository hosted on GitHub (see it here) and studied the code’s composition.
The Fourth Research Edition Unix came out of the famous AT&T Bell Laboratories in November 1973. A significant development it introduced was the rewriting of large parts of the system’s kernel in a high-level language (early C) rather than PDP-11 assembly language. The tape contains a complete system dump, including both the source code and the compiled binaries and kernel. For inclusion in the Unix History Repository, I removed the binaries to match what is normally put under source code version control.
find $dir -name '*.[oa]' | xargs rm
rm -rf $dir/bin $dir/usr/bin $dir/usr/games $dir/lib $dir/dev
rm $dir/etc/{lpd,init,msh,getty,mkfs,mknod,glob,update,umount,mount}
rm $dir/unix
rm $dir/usr/mdec/[tm]boot $dir/usr/sys/conf/mkconf $dir/usr/fort/fc1
rm $dir/usr/c/cvopt $dir/usr/lib/suftab
As with other source code snapshots included in the Unix History Repository, the (synthetic) Git commit timestamps are derived from the file timestamps, while the commit authors are derived from a manually created map file. I updated the existing V4 author map file based on information I had gathered for the preceding and following Unix Research editions. To mark missing details, I explicitly assigned ken,dmr (Ken Thompson and Dennis Ritchie, the system’s main developers) to all source code files for which I lacked author information (this is also the default introduced via a .* regular expression). Two members of the original Bell Labs Unix development team kindly provided me with information to fill in some details, such as the developer of the SNOBOL III interpreter (Ken Thompson) and the implementer of the math library and emulator (Robert H. Morris).
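To illustrate how a catch-all .* entry can act as the default author, here is a minimal sketch. It assumes a hypothetical map format (one regular expression and one author list per line, most specific patterns first, first match wins); the repository's actual map file format and matching order may differ, and the paths are made up.

```shell
# Hypothetical map format (an assumption, not the repository's actual one):
# each line holds an extended regular expression and an author list,
# separated by white space. Specific patterns come first; ".*" comes last.

# Print the authors of the first pattern that matches the whole path.
lookup_authors()
{
  path=$1
  map=$2
  while read -r pattern authors; do
    if printf '%s\n' "$path" | grep -Eq "^($pattern)\$"; then
      printf '%s\n' "$authors"
      return 0
    fi
  done <"$map"
  return 1
}

# Made-up example paths and map entries.
map=$(mktemp)
printf '%s\n' 'toy/sno/.* ken' '.* ken,dmr' >"$map"
lookup_authors toy/sno/sno1.c "$map"   # matches the specific pattern
lookup_authors toy/sys/main.c "$map"   # falls through to the .* default
```

Placing the .* line last means that any file without a more specific entry still receives an author, which makes gaps in the attribution easy to spot later.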
Some have claimed that the tape’s contents are very close to the Fifth Edition rather than to what really was the Fourth Edition. The reason for this claim is that, in contrast to the Unix manual editions (which were formally numbered and give the Unix Research Editions their name), distributed software tapes were mostly a copy of whatever was on the (single) Unix development computer at the time. I set out to examine the differences between the two versions. First, I looked at the base file names included in the two.
normalize()
{
  sed 's|.*/||' | sort -u
}

comm -3 \
  <(git ls-tree -r --name-only Research-V4-Snapshot-Development | normalize) \
  <(git ls-tree -r --name-only Research-V5-Snapshot-Development | normalize)
The above command, which outputs files whose base file name occurs only in one of the two releases, shows only the following files introduced in the Fifth Edition.
c13.c c21.c c2h.c cmp.c ldfps.s
So, the C compiler grew by a few files, and the cmp (compare) utility was written in C.
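For readers unfamiliar with comm -3, the following toy sketch shows how the base-name comparison behaves on two small, made-up path lists. The helper name only_in_one and all paths are hypothetical; temporary files stand in for process substitution so that the sketch also runs under a plain POSIX shell.

```shell
# Strip directories, then sort and de-duplicate the base names.
normalize()
{
  sed 's|.*/||' | sort -u
}

# Print base names that occur in only one of two newline-separated
# path lists. Names unique to the first list start at column one;
# names unique to the second list are preceded by a tab.
only_in_one()
{
  tmp_old=$(mktemp) && tmp_new=$(mktemp) || return 1
  printf '%s\n' "$1" | normalize >"$tmp_old"
  printf '%s\n' "$2" | normalize >"$tmp_new"
  comm -3 "$tmp_old" "$tmp_new"
  rm -f "$tmp_old" "$tmp_new"
}

# Made-up paths; lib/beta.s repeats a base name to show de-duplication.
old_paths='sys/alpha.c
cmd/beta.s
lib/beta.s
cmd/delta.s'
new_paths='sys/alpha.c
cmd/beta.s
cmd/gamma.c'

only_in_one "$old_paths" "$new_paths"
```

Because only base names are compared, a file that merely moved between directories does not show up as a difference, whereas a program rewritten from a .s file into a .c file does.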
To dig deeper, I then ran git blame on each file of the two editions to see which parts of preceding editions they incorporated.
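One way such a per-file blame pass could be scripted is sketched below. This is an assumption about the general approach rather than the script actually used; blame_commits and blame_tally are hypothetical helper names, and the sketch simply counts how many lines of a revision git blame attributes to each originating commit.

```shell
# List the commit hash that git blame assigns to each line of a file.
# $1 is the revision, $2 the file path. In --line-porcelain output every
# source line has a header starting with the full commit hash, followed
# by the original and final line numbers.
blame_commits()
{
  git blame --line-porcelain "$1" -- "$2" |
    grep -E '^[0-9a-f]{40,64} [0-9]+ [0-9]+' |
    cut -d ' ' -f 1
}

# Tally lines per originating commit across all files of revision $1,
# with the commit contributing the most lines listed first.
blame_tally()
{
  git ls-tree -r --name-only "$1" |
    while IFS= read -r file; do
      blame_commits "$1" "$file"
    done |
    sort | uniq -c | sort -rn
}
```

Running blame_tally Research-V5-Snapshot-Development inside a clone of the repository would print one line count per commit; each hash can then be related to an edition with, for example, git log -1 on that hash.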