This post documents how I built a cross-platform Elixir NIF in C to get on-demand, up-to-date disk-usage stats without relying on `os_mon` and its `disksup` service. I had Grok 3 generate the initial C code and Makefile, then iterated through multiple code reviews by Gemini 2.5 Flash and GPT-5 to make it work on Linux, macOS, Windows, and the BSDs (except DragonFlyBSD). Along the way, I ran into typical LLM hiccups that speak volumes about the breathless hyperbole often peddled by LLM vendors, compute providers, and over-enthusiastic consultants, middle managers, and executives on LinkedIn. Nevertheless, the result is a working, cross-platform Elixir package on Hex.pm, plus a real-world case study in where LLMs shine, where they fail, and what "human-in-the-loop" can mean in practice. Spoiler alert: the hype is exactly that; even so, we ended up with working code that is, at the very least, a solid starting point for further improvements by actual general intelligence.

## What set this off?

I was working on my new book, Elixir File Browsing (readers of Northwind Elixir Traders: it's not published yet), which is about developing an Elixir client for the undocumented REST API of File Browser. I had reached the point of developing the `enough_space_to_download?/3` function that "gates" file downloads from the File Browser server based on whether there is enough space to actually store the file on a local path.

## Using Erlang's os_mon

After looking online for an Elixir function that does this, I found the "Check disk space inside elixir" thread on ElixirForum. The guidance therein is to use the `disksup` service of Erlang's `os_mon` OS monitor application. It's possible to use `:disksup.get_disk_info/1` to get the total space, the available space, and the percentage of disk space used, but there are a few caveats:

- You have to add `:os_mon` to the `:extra_applications` in `mix.exs`, so that you have access to `:disksup.get_disk_info/1`.
- The default configuration of `disksup` is to check disk space every 30 minutes. It's configurable, so that's good.
- On Windows, `disksup` checks the space of "all logical drives of type FIXED_DISK", which I'm assuming doesn't include UNCs (e.g., Samba shares), except if you perhaps use the Map Network Drive feature of Windows' Explorer.

Of those caveats, I didn't care much about the third one, though some of my readers or eventual users of the ExFileBrowser library (not yet open-sourced) might be using Windows and might want to download files from a File Browser server to e.g. `\\server\share\path`. I cared more about the first two. Adding an extra application to the runtime dependencies seems like overkill, and having reported values refreshed every 30 minutes (if I don't change the `disk_space_check_interval` configuration parameter of `disksup` to something smaller) seems like too much and too little at the same time. All I want is the equivalent of a "spot check" on demand, for a specific path, taking into account the mount point that this path belongs to. Essentially, I wanted the equivalent of running a `df` command (or whatever its equivalent on Windows is), but without the hassle of having to parse the output of `df`'s different implementations (or of Windows' equivalent). I found it pretty weird that there is no function like `File.df/1`. It would have been very useful.
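For context: on POSIX systems, the numbers that `df` prints come from the `statvfs(3)` call, which is ultimately what such a "spot check" boils down to. Here is a minimal, hypothetical C sketch (my illustration, not code from any package):

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/statvfs.h>

/* Minimal sketch: print df-style numbers for one path via statvfs(3). */
int main(int argc, char *argv[])
{
    struct statvfs vfs;

    if (argc < 2) {
        fprintf(stderr, "usage: %s PATH\n", argv[0]);
        return 1;
    }
    if (statvfs(argv[1], &vfs) != 0) {
        perror("statvfs");
        return 1;
    }

    /* Block counts are in units of f_frsize (fall back to f_bsize if zero). */
    uint64_t frsize = vfs.f_frsize ? vfs.f_frsize : vfs.f_bsize;
    uint64_t total = (uint64_t)vfs.f_blocks * frsize;
    uint64_t free_bytes = (uint64_t)vfs.f_bfree * frsize;  /* free overall */
    uint64_t available = (uint64_t)vfs.f_bavail * frsize;  /* free for mere mortals */

    printf("total: %llu\nused: %llu\nfree: %llu\navailable: %llu\n",
           (unsigned long long)total,
           (unsigned long long)(total - free_bytes),
           (unsigned long long)free_bytes,
           (unsigned long long)available);
    return 0;
}
```

The `f_bfree`/`f_bavail` distinction is exactly the free-versus-available split I wanted: `f_bavail` excludes blocks reserved for privileged users.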
This part was a roadblock to completing the chapter of my book, so I thought… why not write a NIF (Native Implemented Function)?

## Writing a NIF; how hard can it be?

What is a NIF, anyway? Quoth the Erlang documentation on NIFs:

> "A NIF is a function that is implemented in C instead of Erlang. NIFs appear as any other functions to the callers. They belong to a module and are called like any other Erlang functions. The NIFs of a module are compiled and linked into a dynamic loadable, shared library (SO in UNIX, DLL in Windows). The NIF library must be loaded in runtime by the Erlang code of the module."

I hadn't written C since taking the Mobile Robotics class and its accompanying hands-on semester project at the Nelson lab while studying Mechanical Engineering at ETH Zurich. That was back in 2004. I also had (and continue to have) no idea about writing C for different operating systems' internals, and had never looked into implementing a NIF. However, I am regularly using something that relies on NIFs: Exqlite, the SQLite library for Elixir. So I thought, let's see how Exqlite does things. From looking at its GitHub repo I deduced the following:

- The C source code and headers are found in the `c_src` directory of the source tree.
- You need a Makefile, and possibly another one for Windows builds.
- You need to add `elixir_make` as a dependency and define various options for it within the `project/0` function of `mix.exs`.

I then looked at Erlang's documentation of the `erl_nif` functions to understand the basics, along with a few articles online that helped me understand the task of writing and calling NIFs. After that documentation safari, I had to figure out how to write a Makefile. No chance, though. The last time I wrote a Makefile was decades ago, and I couldn't delay work on the book to get up to speed on that topic.

## Can an LLM write a NIF in C, and a Makefile?

I turned to Grok 3 to get things rolling, initially for the C source, and later for the Makefile. I wrote an extensive prompt explaining that we want a NIF that, when provided with a path, will return a map with atom keys and integer values representing bytes for:

- the total size of the filesystem that this path is on,
- the bytes that are used,
- the bytes that are free (equals total minus used), and
- the bytes that are available to the current user (due to permissions).

I don't know if I should be surprised or not, but Grok 3 was capable of spitting out first drafts of what became `disk_space.c` and the Makefile that worked on the first try on Debian 12. Note that the versions of those files on the disk_space GitHub repo are not these first drafts. Perhaps it's not surprising, as Grok 3 must have ingested tons of C code related to `erl_nif`, POSIX functions related to filesystems, and Windows internals. The first draft of the C source used `statfs` on Linux (from `sys/statfs.h`), and functions from `sys/statvfs.h` and `sys/stat.h` on Windows. Since I have no Windows or Apple computers right now, I didn't know whether this would work on anything other than Linux. I asked Grok 3 to revise the C source and the Makefile a few times, in order to:

1. Reduce the likelihood of memory-unsafe code.
2. Increase the likelihood that the code works on Windows and macOS.
3. Increase the likelihood that the code also works on the various BSDs.
Various prompts were used for that, including prompts to help me understand what the C code did, to the small degree that I cared to understand more about this topic. After a few rounds of this, I could see that Grok 3 started "forgetting" improvements that it had suggested earlier, and was even second-guessing its own output. Specifically, it kept running around in circles regarding the dynamic or static allocation of the array for the path string, and how it would ensure a valid UTF-8 string for Windows. It seemed to help when I would open a new chat, give it some context and the latest C source and Makefile of the previous chat, and ask it to review the code on a clean slate, unaffected by previous things it had recommended or warned against when asked to review its own code.

At some point I decided to pit other LLMs against Grok, or rather to have them provide a code review and have Grok continue coding with their guidance. That was because, according to xAI's Terms of Service: "We ask that when using Output, you attribute the Service as having generated the Output, as detailed in our Brand Guidelines." OK, so I would attribute the C code and the Makefile to Grok (I did, check the GitHub repo), but things seemed less clear about code generated by ChatGPT and Gemini. Thus, when using GPT-5 and Gemini 2.5 Flash, I requested a review of the latest C source and Makefile that would not include any new code, but only provide text feedback on what is wrong or could be improved, again with the same goals as above. This was the loop:

1. Copy the C code and the Makefile from the latest Grok chat.
2. Paste the text into a new GPT-5 or Gemini 2.5 Flash chat and ask for a code review without new code (see the prompt below).
3. Copy the recommendations of GPT-5 or Gemini and paste them, together with the C code and the Makefile from step (1), into a new Grok chat.
4. Ask Grok to evaluate the recommendations and adapt the C code and the Makefile accordingly.
5. Take Grok's output, go back to GPT-5 or Gemini, paste it there, and add the simple prompt "Is this better now?"
6. Take step (5)'s feedback and go back to step (1), until it seemed that none of the LLMs had anything of significant importance to revise, add, or remove.

The prompt for step (2) was something like this:

> "Thinking like an expert in multi-platform C code development, review the following C source for an Erlang NIF. Identify potential issues of memory safety, e.g. when allocating arrays for paths etc. Identify improvements to make the code more robust across Linux, the BSDs, macOS, and Windows. Check whether the code uses modern, non-deprecated functions and whether fallbacks need to be provided. Check whether erl_nif functions are used correctly. Identify any sources of unnecessary complexity, or incorrect usage of functions. Following your recommendations on the C source, provide recommendations for the accompanying Makefile. Do not give me any code, just give me text of your review and recommendations. Finally, give me your overall verdict on whether the code and the Makefile are production-ready and sufficiently well-written to expect that this will compile and work as expected across the aforementioned operating systems, without any gotchas."

After around 7 rounds of this back-and-forth between LLMs, this was Gemini's verdict:

> "The provided C code is exemplary for a production-quality Elixir NIF. It is safe, robust, and portable to its stated target platforms. No significant security or stability issues were identified.
> The code demonstrates best practices for NIF development, including careful resource management, detailed error reporting, and proper handling of platform-specific APIs."

LLMs patting themselves on the back. But can you trust the verdict? Not entirely, because copying the same C source and prompt into another Gemini chat delivered generally positive, though different, feedback that hinted at some lingering issues. For example (emphasis mine):

> "The C code is remarkably well-written, demonstrating a strong understanding of cross-platform development, memory management with the Erlang NIF API, and modern C practices. The use of helpers for error handling and path conversion is a clean approach. The use of Dirty Schedulers is a critical and correct choice. However, there are a few subtle memory safety issues and unnecessary complexities, as noted in the detailed review. The code could be simplified by streamlining the strerror_r logic and the get_path_from_term function, and by removing the redundant UTF-8 validation. The Windows-specific winapi_error_to_term has a potential memory leak."

After a couple more rounds I considered these two files done, for now. If you are an expert in what `disk_space.c` does and whether it does so correctly, your Human General Intelligence will be highly appreciated as an issue or directly as a PR.

## Using the NIF in Elixir

The C code compiled locally on Debian 12:

```
$ cd c_src
$ make
cc -I/home/tisaak/.asdf/installs/erlang/27.2.4/usr/include -O2 -Wall -fPIC disk_space.c -shared -o ../priv/disk_space.so
$ file ../priv/disk_space.so
../priv/disk_space.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=62a13e2b84059df337dca3d60dc7ec7d63997408, not stripped
```

To use it and check that it actually does what the LLMs claimed it can do, I needed to draw the rest of the owl in Elixir. That was familiar territory, and also rather simple:

```elixir
defmodule DiskSpace do
  @on_load :load_nifs

  defp load_nifs do
    priv_dir = :code.priv_dir(:disk_space) |> to_string()
    base_name = "disk_space"
    path = Path.join(priv_dir, base_name)
    :erlang.load_nif(to_charlist(path), 0)
  end

  defp stat_fs(_path), do: :erlang.nif_error(:nif_not_loaded)

  def stat(path) when is_bitstring(path), do: path |> stat_fs() |> reshape_error_tuple()

  def stat!(path) do
    case stat(path) do
      {:ok, stats} -> stats
      error -> error
    end
  end

  defp reshape_error_tuple({:error, reason}), do: {:error, %{reason: reason, info: nil}}
  defp reshape_error_tuple({:error, reason, info}), do: {:error, %{reason: reason, info: info}}
  defp reshape_error_tuple({:ok, stats_map} = success) when is_map(stats_map), do: success
end
```

Before I could try it out though, I had to adapt `mix.exs` with options for `elixir_make`. Looking at how Exqlite does it, I cut the options down to this, making the `make` invocation silent:

```elixir
defmodule DiskSpace.MixProject do
  use Mix.Project

  # ...

  def project do
    [
      app: :disk_space,
      # ...
      compilers: [:elixir_make] ++ Mix.compilers(),
      make: "make -s",
      make_targets: ["all"],
      make_clean: ["clean"],
      make_env: %{"MAKEFLAGS" => "-s"},
      make_cwd: "c_src"
      # ...
    ]
  end

  # ...
end
```

With this in place, it was time to see if it works in IEx (`iex -S mix`):

```elixir
iex> DiskSpace.stat("/home")
{:ok,
 %{free: 126205595648, total: 256000393216, used: 129794797568,
   available: 54240219136}}
```

And there you have it. Grok 3, with a lot of human-in-the-loop copy/pasting and guidance from Gemini 2.5 Flash and GPT-5, wrote a NIF that does what it's supposed to.
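If you are curious what the C side of such a NIF boils down to, here is a heavily simplified, hypothetical sketch of the POSIX branch. This is my illustration of the technique, not the actual `disk_space.c` (which adds the Windows branch, richer `{:error, reason, info}` tuples, and careful path handling):

```c
#include <string.h>
#include <sys/statvfs.h>
#include "erl_nif.h"

/* Simplified sketch: wrap statvfs(3) as a NIF returning {:ok, map}. */
static ERL_NIF_TERM stat_fs_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    ErlNifBinary bin;
    char path[4096];

    /* Elixir strings arrive as binaries; copy and NUL-terminate for C. */
    if (argc != 1 || !enif_inspect_binary(env, argv[0], &bin) || bin.size >= sizeof(path))
        return enif_make_badarg(env);
    memcpy(path, bin.data, bin.size);
    path[bin.size] = '\0';

    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0)
        return enif_make_tuple2(env, enif_make_atom(env, "error"),
                                enif_make_atom(env, "statvfs_failed")); /* simplified */

    ErlNifUInt64 frsize = vfs.f_frsize ? vfs.f_frsize : vfs.f_bsize;
    ErlNifUInt64 total = (ErlNifUInt64)vfs.f_blocks * frsize;
    ErlNifUInt64 free_bytes = (ErlNifUInt64)vfs.f_bfree * frsize;

    ERL_NIF_TERM map = enif_make_new_map(env);
    enif_make_map_put(env, map, enif_make_atom(env, "total"),
                      enif_make_uint64(env, total), &map);
    enif_make_map_put(env, map, enif_make_atom(env, "used"),
                      enif_make_uint64(env, total - free_bytes), &map);
    enif_make_map_put(env, map, enif_make_atom(env, "free"),
                      enif_make_uint64(env, free_bytes), &map);
    enif_make_map_put(env, map, enif_make_atom(env, "available"),
                      enif_make_uint64(env, (ErlNifUInt64)vfs.f_bavail * frsize), &map);

    return enif_make_tuple2(env, enif_make_atom(env, "ok"), map);
}

/* Filesystem calls can block (e.g., on NFS), so run on a dirty I/O scheduler. */
static ErlNifFunc nif_funcs[] = {
    {"stat_fs", 1, stat_fs_nif, ERL_NIF_DIRTY_JOB_IO_BOUND}
};

ERL_NIF_INIT(Elixir.DiskSpace, nif_funcs, NULL, NULL, NULL, NULL)
```

The `ERL_NIF_INIT` module name has to match the Elixir module (`Elixir.DiskSpace`), and the exported `{"stat_fs", 1, ...}` entry is what replaces the `stat_fs/1` stub above once `:erlang.load_nif/2` succeeds.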
## Making the owl prettier, and testing it

The rest of the owl was now drawn, so I spent some time adding a bell and a whistle: helper functions and a `humanize/2` function that converts the byte values coming from `stat/1` and `stat!/1` to human-readable strings (kibibytes, kilobytes, etc.), and expanding the arity of `stat/1` and `stat!/1` to 2 by supporting an `opts` keyword list with the `:humanize` boolean option and the `:base` atom option that are used as arguments to `humanize/2`:

```elixir
iex> DiskSpace.stat("/home", humanize: true, base: :binary)
{:ok, %{free: "117.52 GiB", total: "238.42 GiB", used: "120.89 GiB", available: "50.50 GiB"}}

iex> DiskSpace.stat("/home") |> DiskSpace.humanize(:decimal)
{:ok, %{free: "126.19 GB", total: "256 GB", used: "129.81 GB", available: "54.23 GB"}}
```

Trivial but useful, in case you want to show such data on your Phoenix LiveView app. I also added docstrings that include doctests and wrote a few ExUnit tests, trying things out until they all passed:

```
$ mix test
Generated disk_space app
Running ExUnit with seed: 626043, max_cases: 24
...............
Finished in 0.1 seconds (0.1s async, 0.00s sync)
4 doctests, 11 tests, 0 failures
```

## But will it work on other OSs?

I parked the idea of trying it out on the BSDs at the time. More important for what became the DiskSpace package on Hex.pm, and for the readers of my upcoming book, was that the package should also work on macOS (only on Apple Silicon machines) and, ideally, also on Windows. Given that the last Mac I bought is a 2012 MacBook Air 13" that has been running Debian 12 for a while now (with a recent kernel everything works, including the Wi-Fi chipset), and given that I have no Windows installation to try it out on, I decided to look into GitHub Actions. I had never done this before. This too could not become a bottleneck in my book-writing process, so I turned to Grok 3 once again. This time I didn't bother figuring out what exactly I had to do or how GitHub Actions workflows work. I just asked: "What do I need to do so that I can test that my Elixir package with this NIF builds and its tests pass, on Linux (amd64), macOS (arm64), and Windows (amd64)?"

Grok 3 was eager to provide a `.github/workflows/build.yaml` file that turned out to be mostly garbage when I committed it and pushed to the repo. All builds failed, including the one for Linux. Still, I read through it and understood the lay of the land. After that, I started a new bout of iterations, this time only with Grok 3, to get the workflow to complete successfully on Linux (Ubuntu), macOS, and Windows. Most of the time was spent twiddling my thumbs while observing the workflow logs as they were streaming in, because GitHub could not show them after the workflow completed. The Linux build was the first one to complete successfully, but macOS and Windows were still giving me trouble. The main issues ended up being a combination of:

- The step for preparing the build dependencies on Windows (`msys2` etc.).
- That the Makefile contained options to generate a `.dylib` for macOS, where in the end an `.so` was sufficient.
- Elixir and OTP version combinations.

For the first two, a few iterations of providing the build logs to Grok did the trick, at least with regards to seeing that the compilation was taking place on Windows, even though it wasn't completing successfully.
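To give a sense of why the Windows side is the fussy one: instead of `statvfs`, a NIF like this typically has to convert the UTF-8 path coming from the BEAM into UTF-16 and then call `GetDiskFreeSpaceExW`. A hypothetical sketch of that branch (again my illustration, not the package's actual code):

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical sketch of the Windows branch: UTF-8 path in, byte counts out.
 * Returns 0 on success, -1 on failure. */
static int query_disk_space_win(const char *utf8_path,
                                unsigned long long *total,
                                unsigned long long *free_total,
                                unsigned long long *available)
{
    /* Simplification: paths longer than MAX_PATH would need the \\?\ prefix. */
    wchar_t wpath[MAX_PATH];
    ULARGE_INTEGER avail, tot, free_b;

    /* The BEAM hands us UTF-8; the wide-character Win32 API wants UTF-16. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8_path, -1,
                            wpath, MAX_PATH) == 0)
        return -1;

    /* Also accepts UNC names like \\server\share\ (note the trailing backslash). */
    if (!GetDiskFreeSpaceExW(wpath, &avail, &tot, &free_b))
        return -1;

    *total = tot.QuadPart;
    *free_total = free_b.QuadPart;
    *available = avail.QuadPart; /* what the calling user may actually use */
    return 0;
}
#endif
```

Getting this path conversion and its error reporting right is precisely the part where Grok kept going in circles during the review rounds.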
Even though I've long switched to asdf and am daily-driving Elixir 1.18.3 for my projects, and even though we were recently blessed with Debian 13 "trixie", my original aspiration was to get the package working with Debian 12's packages for Elixir (1.14) and Erlang/OTP (25). It was great back in 2022, when I first got into Elixir, that a simple `apt install elixir` was all it took to get started on Debian 12, and I wanted to replicate that simplicity. Unfortunately, it seems that the C code that Grok wrote (with code reviews by its competitors) does not work for:

- Anything below OTP 27 on Windows, which means we're constrained to Elixir 1.17 and above there.
- OTP 25 (the latest supported for Elixir 1.14) on Linux, macOS, and Windows.

This means that, overall, the GitHub Build and Test workflow is restricted to some combinations of Elixir and OTP:

```yaml
strategy:
  matrix:
    os:
      - windows-latest
      - macos-14
      - ubuntu-latest
    elixir_otp:
      - { elixir: "1.18", otp: "27" }
      - { elixir: "1.17", otp: "27" }
      - { elixir: "1.16", otp: "26" }
      - { elixir: "1.15", otp: "26" }
      # - { elixir: "1.14", otp: "25" } # does not work
    exclude:
      - os: windows-latest
        elixir_otp: { elixir: "1.16", otp: "26" }
      - os: windows-latest
        elixir_otp: { elixir: "1.15", otp: "26" }
  fail-fast: false
```

## Can't LLMs fix that?

Apparently not. I tried my iteration loops between the three LLMs for a few hours (more than it deserved) and gave up. The LLMs suggested various fixes, including better detection of Erlang header files, simplifying or getting rid of some functions they considered unnecessary, and even changing linker options in the Makefile, something that didn't seem to have any point at all, with or without the changes to the C source. I don't have the time or interest to truly understand what breaks, but I'm sure that someone who actually knows can get it working, so PRs are welcome.

## Is this really supposed to be the precursor to AGI?

Are you kidding me? Of course not, and that's not coming from someone who is negatively predisposed towards the technology. I've been using ChatGPT since it was first launched in December 2022. I've used it almost every day since then to summarize and reformulate text, to discuss and review ideas, to learn about software architecture, to understand better how to write Elixir-idiomatic code, to figure out how to profile SQL queries with SQLite, and much more. Even the original ChatGPT was amazing, IMO. We've seen immense strides since then, especially with regards to context length and general availability. GPT-4o was an impressive feat, as was 3.5 Turbo before it. Gemini is amazing, especially given its multilingual capabilities, and Grok is (so far) the only one I've bothered to pay for, and I use it daily and extensively for various purposes.

I'm far from an "LLM hater", but I increasingly consider myself a skeptic, especially when it comes to the fairy-tale claims aiming to inflate the valuations of LLM vendors, compute vendors, and AI-peddling con-men... I mean, con-sultants, sorry. It's The Incredible Story of Deft once again. Let's stop kidding ourselves: this cannot be the precursor to the alluded-to, vaguely-defined "AGI". It is impressive, yes, and here it got me from zero to having a NIF in C, a Makefile, and a build.yaml that work, within a few hours… but only after tons of trial and error over failures that you would think something supposedly at the cusp of AGI would (or should) not demonstrate.
Namely, the models would repeatedly undo prior decisions and make the same mistakes as earlier, such as Grok including versions of Elixir and OTP we had already clarified as not working or as unsupported with each other (e.g., Elixir 1.14 and OTP 27), or GPT and Gemini "forgetting" that I had earlier asked them not to address something. Other times, Grok 3 would proudly claim to have incorporated the suggested improvements, and spit out the exact C source or Makefile I had given it to begin with. For a while, Grok (and, unquestioningly, Gemini and GPT too) led me down a rabbit hole in which macOS allegedly absolutely, positively, for sure had to have a `disk_space.dylib` instead of a `disk_space.so`.

Two problems I see again and again, and that I noticed repeatedly during this small project:

1. For things that are actually objectively true or objectively false, you sometimes receive an objectively wrong verdict, delivered with incredible confidence and often sprinkled with flattery. This is especially true of GPT-4o and 5.
2. For things that are not clear-cut, you sometimes receive a response delivered confidently, tending towards one of the possible answers as if it were clearly the better one.

Admittedly, both problems can be largely remediated with better prompting, such as asking for an exploration of options, or asking follow-up questions like "are you sure about that?", to which you sometimes receive a response like "oh no, you are right, I wrongly assumed that …, but …". The other approach I've found to work well is to prompt in a way that gets the LLM to interview you, i.e., to ask questions that, when answered, will enrich its context, instead of you just bombarding it with questions, accepting its answers, and building on them; the latter seems to guide the discussion down paths that strengthen false assumptions. This still doesn't negate the potential downsides of using LLMs like ChatGPT to get things done. Clearly, without any C or OS-internals skills, I am firmly in the "Danger Zone". If I had the time or willingness to actually become proficient in those topics, using LLMs would certainly accelerate my learning, supplementing other sources of information. Then again, we have an Elixir package now that others who know their stuff can improve, if they see the need.

## Grok vs. Gemini vs. GPT

As for the three models pitted against each other, here is my anecdotal, totally subjective opinion:

- Overconfidence and bluster seem to be a common element: way less so in my experience with Grok, way more so with the GPTs, so-so with Gemini.
- Gemini is verbose, formal, almost "professional" in its "tone of voice". It does what it's asked to do. Great multilingual capabilities, which is not relevant for this project, but is for another one I have running, for which I cannot afford to provide customer data in Greek to Gemini due to privacy worries.
- The GPTs are very USA-centric, and that includes the "tone of voice". I don't know if this has changed from 4o to 5, but my impression is that "everything is awesome" (cf. the LEGO Movie song) is almost a theme in the responses. Always look at the bright side, oopsy-daisy we made a mistake but no biggie, we'll fix it, tee-hee, etc.
- Grok 3 is the best of the three, so far, which is why I've been paying for it monthly since February 2025. It demonstrates less bluster and overconfidence, except in a few cases where it hallucinates things, e.g.
about Elixir code: it will sometimes give me nonsensical opinions about what is more idiomatic, for example, or hallucinate functions that don't exist. However, it is great at "opening the funnel" to explore options, and it is surprisingly well-versed in things about Greece that you wouldn't expect it to be.

In any case, color me skeptical and satisfied at the same time. This is perhaps a blind alley on the path towards "AGI" (whatever tf that is supposed to be), but it's a blind alley that delivers great results when used in certain conducive ways. In other words: it is a tool.

## Testing on the BSDs

Now, as for that other tool, DiskSpace: after I got the Linux, macOS, and Windows builds building and passing the test suite on GitHub Actions, I fired up four VMs on a Proxmox VE server, on which I installed four BSDs and the requisite packages for Elixir, Erlang, and for building DiskSpace, and saw whether it works. Here is the situation:

- NetBSD 10.1: works; Erlang/OTP 27, Elixir 1.17.2
- FreeBSD 14.3: works; Erlang/OTP 26, Elixir 1.17.3
- OpenBSD 7.7: works; Erlang/OTP 27, Elixir 1.18.3
- DragonFlyBSD 6.4.2: does not work, because of Erlang/OTP 25 (Elixir 1.16.3)

You can find the full matrix of tested and untested combinations here: Supported Elixir and OTP versions. User feedback is that DiskSpace also works on Elixir 1.18.4 with OTP 28, so that's good.

## The goodies

I won't pretend that this was some kind of major feat of programming, but at least now there is an Elixir package that can give you spot checks on the disk usage of a path, without having to run `os_mon` to get possibly-stale data from the `disksup` service. If I knew enough, I'd even submit a patch to have this feature implemented within the `File` module, as a `File.df/2` function seems like a genuinely useful thing to have around. Nevertheless, the package should do, for now: PRs welcome.