The macOS LC_COLLATE hunt: Or why does sort order differently on macOS and Linux

Or, why does sort(1) order differently on macOS and Linux?

Zhiming Wang

2020-06-03

Today I noticed something interesting while working with a sorted list of package names: sort(1) orders them differently on macOS and Linux (Ubuntu 20.04). A very simple example, with locale set explicitly:

    (macOS) $ LC_ALL=en_US.UTF-8 sort <<<$'python-dev\npython3-dev'
    python-dev
    python3-dev

    (Linux) $ LC_ALL=en_US.UTF-8 sort <<<$'python-dev\npython3-dev'
    python3-dev
    python-dev

What the hell? Same locale, different order (or technically, collation). This is not even a difference between GNU and BSD userland; coreutils sort on macOS produces the same output as /usr/bin/sort. (Of course, when LC_ALL=C is used, the results are the same, matching the macOS result above, since “-” as 0x2D on the ASCII table comes before “3” as 0x33.) Therefore, the locale itself becomes the prime suspect.
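The LC_ALL=C baseline is easy to reproduce on either platform. A minimal sketch follows; bytewise_sort is a name made up here for illustration, not part of coreutils or of the original investigation.

```shell
# bytewise_sort is a hypothetical helper name used only for this illustration.
# Under the C locale, sort compares raw byte values, so "-" (0x2D) sorts
# before "3" (0x33) on both macOS and Linux.
bytewise_sort() {
    LC_ALL=C sort
}

printf '%s\n' python3-dev python-dev | bytewise_sort
# Prints:
#   python-dev
#   python3-dev
```

Since both names share the prefix "python", the comparison is decided entirely by the first differing byte, which is why the C locale gives the same answer everywhere.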

macOS

LC_COLLATE for any locale on macOS is very easy to find: just look under /usr/share/locale/. Somewhat surprisingly, /usr/share/locale/en_US.UTF-8/LC_COLLATE is a symlink to ../la_LN.US-ASCII/LC_COLLATE. The US-ASCII part is a giveaway for lack of sophistication, while the unfamiliar language code la and unfamiliar country code LN gave me pause. Turns out la is the code for Latin, and LN isn’t really the code for anything (I guess they invented it for the Latin script's sphere of influence?). In fact, if we look a little bit closer, most locales’ LC_COLLATE are symlinked to la_LN dot something (mostly dot US-ASCII), which isn’t very remarkable once we realize it stands for Latin.

A note on tooling: realpath, used in this article's commands, is part of GNU coreutils. In fact, I’ll be liberally using coreutils commands in this article. You can brew install coreutils (make sure you read the caveats).
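A survey of where each locale's LC_COLLATE ends up can be sketched roughly as follows. This is not necessarily the exact command used for the investigation; it assumes a realpath command is on PATH and that the locale directories sit directly under the given root. collate_targets is a hypothetical helper name.

```shell
# collate_targets is a hypothetical helper, not a standard tool.
# For every <root>/<locale>/LC_COLLATE, resolve symlinks with realpath and
# count how many locales end up at each physical file.
# The function body is a subshell so its variables do not leak out.
collate_targets() (
    root="${1:-/usr/share/locale}"
    for f in "$root"/*/LC_COLLATE; do
        # Skip the unexpanded glob pattern if nothing matched.
        [ -e "$f" ] || continue
        realpath "$f"
    done | sort | uniq -c | sort -rn
)

# On macOS this would be run against the system locale directory:
collate_targets /usr/share/locale
```

Each output line is a count followed by a resolved path, most common target first, so a dominant la_LN target would show up at the top. On a system whose locale root has no per-locale LC_COLLATE files, the function simply prints nothing.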
