In this post I describe an issue we ran into at work while testing an app that uses Microsoft.Data.Sqlite on Alpine Linux. The app was failing with this error: Unhandled exception. System.DllNotFoundException: Unable to load shared library 'e_sqlite3' or one of its dependencies.
This post is primarily a walkthrough of the steps I took to solve the issue. I describe the problem itself, the environment in which it happened, the things we tried to isolate the issue, the eventual root cause, and how we resolved it.
We ran into the issue while working on the Datadog .NET tracer, in a branch where we were adding provisional support for .NET 10. One of the problems we hit in this branch was that .NET 10 no longer runs on the version of Alpine we were previously using, alpine:3.14.
More generally, we support a wide range of target frameworks and deployment platforms, which means we need to build and test on "old" versions of both distros and target frameworks.
Initial attempts to run .NET 10 on alpine:3.14 failed at runtime with the following error:
```
Failed to load /usr/share/dotnet/host/fxr/10.0.0-preview.5.25277.114/libhostfxr.so, error: Error relocating /usr/share/dotnet/host/fxr/10.0.0-preview.5.25277.114/libhostfxr.so: _ZSt28__throw_bad_array_new_lengthv: symbol not found
The library libhostfxr.so was found, but loading it from /usr/share/dotnet/host/fxr/10.0.0-preview.5.25277.114/libhostfxr.so failed
```
We subsequently confirmed that this is expected: .NET 10 has updated its support matrix, and now requires Alpine 3.17 or higher.
After jumping through various hoops to confirm that we could update the Alpine base image without breaking anything, we updated our build and test images to use alpine:3.17. And that's when we started hitting the issue that is the focus of this post.
As part of our CI tests, we run over a hundred different sample applications across a variety of package versions and target frameworks. After updating to the new version of Alpine, we started seeing failures in one application in particular, Samples.Microsoft.Data.Sqlite. It was failing on only two target frameworks, netcoreapp3.1 and net5.0, with the following error:
```
Unhandled exception. System.DllNotFoundException: Unable to load shared library 'e_sqlite3' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: Error loading shared library libe_sqlite3: No such file or directory
   at SQLitePCL.SQLite3Provider_e_sqlite3.NativeMethods.sqlite3_libversion_number()
   at SQLitePCL.SQLite3Provider_e_sqlite3.SQLitePCL.ISQLite3Provider.sqlite3_libversion_number()
   at SQLitePCL.raw.SetProvider(ISQLite3Provider imp)
   at SQLitePCL.Batteries_V2.Init()
   at SQLitePCL.Batteries.Init()
   at Samples.Microsoft.Data.Sqlite.Program.OpenConnection() in D:\a\_work\1\s\tracer\test\test-applications\integrations\Samples.Microsoft.Data.Sqlite\Program.cs:line 29
   at Samples.Microsoft.Data.Sqlite.Program.Main() in D:\a\_work\1\s\tracer\test\test-applications\integrations\Samples.Microsoft.Data.Sqlite\Program.cs:line 17
   at Samples.Microsoft.Data.Sqlite.Program.<Main>()
```
As you might expect, this app uses the Microsoft.Data.Sqlite package, which transitively references various SQLite packages. The questions were:
Why is the app unable to load libe_sqlite3?
Why does it only affect .NET Core 3.1 and .NET 5?
One of the difficulties with solving the problem was that we were changing multiple variables at once: we updated the base image to alpine:3.17 and we were building with the .NET 10 preview SDK too. Unfortunately, for technical reasons, it wasn't easy for us to separate these two steps entirely, so we took a slightly different approach.
Our main suspicion was that the updated Alpine base image was the problem. There were lots of possible reasons this could be the case, such as missing native dependencies, but before narrowing our focus, we wanted to confirm that the issue really was caused by the Alpine upgrade.
To confirm our suspicions, we took a build of the sample from the master branch's CI and ran it on both alpine:3.14 and alpine:3.17, without the .NET tracer attached, on .NET Core 3.1. The results were:
On alpine:3.14, the app ran without issue.
On alpine:3.17, the app crashed with Unable to load shared library 'e_sqlite3' or one of its dependencies.
OK, so the problem was definitely the new alpine:3.17 image. Now to try to understand why it was a problem.
So at this point, we knew the app itself was correct, and that libe_sqlite3.so existed in the right location, because the same build worked on alpine:3.14. So why couldn't the library be loaded?
I started by running ldd (List Dynamic Dependencies) inside the alpine:3.17 container, passing in the path to the SQLite library. ldd lists the shared libraries that a binary depends on:
```shell
$ ldd ./runtimes/linux-musl-x64/native/libe_sqlite3.so
	/lib/ld-musl-x86_64.so.1 (0x741dab2ae000)
	libc.so => /lib/ld-musl-x86_64.so.1 (0x741dab2ae000)
```
This shows that the SQLite library is linked only against musl's libc, and that this dependency is present, so the problem is unlikely to be a missing dependency.
The next thing I tried was doing what the error message said:
In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable
The LD_DEBUG variable is a feature of the Linux dynamic linker that makes it dump information about what it's doing. This is great for debugging, and there are lots of options you can pass to it. In the example below, I ran ls with the libs option, which displays the library search paths:
```shell
$ LD_DEBUG=libs ls
295: find library=libselinux.so.1 [0]; searching
295:  search cache=/etc/ld.so.cache
295:   trying file=/lib/x86_64-linux-gnu/libselinux.so.1
295:
295: find library=libc.so.6 [0]; searching
295:  search cache=/etc/ld.so.cache
295:   trying file=/lib/x86_64-linux-gnu/libc.so.6
295:
295: find library=libpcre2-8.so.0 [0]; searching
295:  search cache=/etc/ld.so.cache
295:   trying file=/lib/x86_64-linux-gnu/libpcre2-8.so.0
295:
295: calling init: /lib64/ld-linux-x86-64.so.2
295: calling init: /lib/x86_64-linux-gnu/libc.so.6
295: calling init: /lib/x86_64-linux-gnu/libpcre2-8.so.0
295: calling init: /lib/x86_64-linux-gnu/libselinux.so.1
295: initialize program: ls
295: transferring control: ls
295:
```
Unfortunately, running our app with LD_DEBUG=libs dotnet Samples.Microsoft.Data.Sqlite.dll doesn't reveal anything interesting. That's because the libe_sqlite3.so native library isn't loaded as a dynamic dependency, so LD_DEBUG doesn't give us any information about it. Instead, the SQLite library is loaded explicitly by the .NET runtime, so it's the runtime that's failing to load the library.
Another useful tool for debugging linking issues is strace, which logs every system call a process makes, including each file it tries to open.
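As a minimal sketch of how strace could be used for this kind of problem (assuming strace is installed in the container, for example with apk add strace), you can log only the open and openat calls and then filter for e_sqlite3. The sqlite_probes helper below is hypothetical, and it assumes strace's default output format:

```shell
#!/bin/sh
# Hypothetical helper: summarise which paths a process tried when opening
# e_sqlite3, given a log produced by something like:
#   strace -f -e trace=open,openat -o /tmp/trace.log dotnet Samples.Microsoft.Data.Sqlite.dll
# Both open and openat are traced because, depending on the libc and the
# architecture, the loader may use either syscall.
sqlite_probes() {
  awk '
    # Only look at open/openat calls that mention e_sqlite3
    /open(at)?\(/ && /e_sqlite3/ {
      i = index($0, ") = ")
      if (i == 0) next            # skip "<unfinished ...>" lines caused by -f interleaving
      split($0, parts, "\"")      # parts[2] is the first quoted argument: the path
      print substr($0, i + 4) "\t" parts[2]
    }
  ' "$1"
}

# Example usage:
#   sqlite_probes /tmp/trace.log
```

Each output line shows the syscall result (for example -1 ENOENT) followed by the path that was tried, which makes it easy to see exactly where the runtime looked for the library.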
The next thing I tried was explicitly setting LD_LIBRARY_PATH to include the path to libe_sqlite3.so. LD_LIBRARY_PATH is a colon-separated list of additional directories to search for dynamically linked libraries, in addition to the standard locations. As a hack, I set the variable to include the directory containing the SQLite library:
```shell
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/project/runtimes/linux-musl-x64/native/ \
    dotnet Samples.Microsoft.Data.Sqlite.dll
```
And sure enough, it worked! The sample found the libe_sqlite3.so library and ran correctly.
So at this point, we're pretty certain that it's purely a problem with .NET finding the native library when we're running on alpine:3.17 , not an issue with loading the library itself.
So what's going on here?
At this point, my best guess was that .NET Core 3.1 and .NET 5 simply didn't support alpine:3.17, based on the following facts:
Alpine 3.17 wasn't released until 2022
.NET Core 3.1 was released in 2019, and .NET 5 in 2020
What's more, Microsoft added Dockerfiles for alpine:3.17, but only for .NET 6+. They never updated the .NET Core 3.1 images to use alpine:3.17, even though they were still updating 3.1 at the time. So even though .NET Core 3.1 and .NET 5 officially support Alpine 3.13+, something about Alpine 3.17 clearly wasn't working for them.
And that pretty much sums it up: the native-library lookup rules are broken for Alpine 3.17+ on .NET Core 3.1 and .NET 5.
I had actually forgotten that earlier versions of .NET Core used hardcoded lists mapping distro versions like Alpine 3.17 to runtime IDs (RIDs) like linux-musl-x64. If the mapping was missing, .NET didn't know which runtime ID to use, and instead of choosing the correct linux-musl-x64 runtime ID, it would fall back to linux-x64. Choosing the wrong runtime ID is why the app was failing.
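To make the fallback behaviour concrete, here's a deliberately simplified shell model of it. This is not the host's real code: the effective_native_rid function and the KNOWN_ALPINE_RIDS list are hypothetical, and the list is an illustrative subset rather than the real RID graph. The model reads ID and VERSION_ID from an os-release file and builds a distro-specific RID such as alpine.3.17-x64. It maps to linux-musl-x64 only if that RID is "known", and otherwise falls back to linux-x64:

```shell
#!/bin/sh
# Hypothetical, simplified model of how a legacy (.NET Core 3.1 / .NET 5 era)
# host ends up choosing which runtimes/<rid>/native folder to use.
# KNOWN_ALPINE_RIDS is an illustrative subset, NOT the real RID graph.
KNOWN_ALPINE_RIDS="alpine.3.13-x64 alpine.3.14-x64"

# Print the value of key $2 from the os-release file $1, without quotes.
os_release_value() {
  sed -n "s/^$2=//p" "$1" | tr -d '"'
}

# effective_native_rid <os-release file> <arch>
effective_native_rid() {
  id=$(os_release_value "$1" ID)
  version=$(os_release_value "$1" VERSION_ID)
  if [ "$id" = "alpine" ]; then
    # Alpine RIDs use only major.minor, e.g. 3.17.0 -> 3.17
    version=$(echo "$version" | cut -d. -f1,2)
  fi
  rid="$id.$version-$2"
  for known in $KNOWN_ALPINE_RIDS; do
    if [ "$rid" = "$known" ]; then
      # Known Alpine RID: its fallback chain leads to the musl assets
      echo "linux-musl-$2"
      return 0
    fi
  done
  # Unknown RID: fall back to the generic (glibc) linux-<arch> RID
  echo "linux-$2"
}
```

With VERSION_ID=3.14.10 this model prints linux-musl-x64, while with VERSION_ID=3.17.0 it prints linux-x64, mirroring the failure we saw.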
This problem, a missing runtime ID entry causing native library failures on Alpine, actually happened repeatedly.
Eventually, a new fallback was added for alpine that explicitly used linux-musl-x64 artifacts for unknown alpine versions, so this shouldn't be a problem for .NET 7+.
OK, so now we understood why the problem was happening. But how could we fix it?
Luckily, the .NET host lets you set the runtime ID explicitly via an environment variable, DOTNET_RUNTIME_ID. If this variable is set, the host uses it in preference to its usual detection and fallbacks, so even these old runtime versions can work on newer versions of Alpine.
So in this scenario, we can set DOTNET_RUNTIME_ID=linux-musl-x64 and the app runs perfectly:
```shell
$ DOTNET_RUNTIME_ID=linux-musl-x64 dotnet Samples.Microsoft.Data.Sqlite.dll
```
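If the same test script runs on several distros, you may prefer to set the variable only on Alpine. Here's a minimal sketch of one way to do that. set_musl_runtime_id is a hypothetical helper, and it assumes that the presence of /etc/alpine-release identifies an Alpine image and that uname -m reports x86_64 or aarch64:

```shell
#!/bin/sh
# Hypothetical helper: set DOTNET_RUNTIME_ID only when running on Alpine.
# Arguments are optional and exist mainly to make the function testable:
#   $1 = file whose presence marks Alpine (default /etc/alpine-release)
#   $2 = machine name as reported by uname -m (default: current machine)
set_musl_runtime_id() {
  release_file=${1:-/etc/alpine-release}
  machine=${2:-$(uname -m)}
  # Not Alpine: leave the host's own RID detection alone
  [ -f "$release_file" ] || return 0
  case "$machine" in
    x86_64)  arch=x64 ;;
    aarch64) arch=arm64 ;;
    *) echo "unsupported machine: $machine" >&2; return 1 ;;
  esac
  export DOTNET_RUNTIME_ID="linux-musl-$arch"
}

# Example usage in a CI script:
#   set_musl_runtime_id
#   dotnet Samples.Microsoft.Data.Sqlite.dll
```

Checking for /etc/alpine-release keeps the override scoped to Alpine, so glibc-based images continue to use the host's normal RID detection.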
Problem solved!
So the solution was very simple in the end, but I thought it was worth describing the process we went through to narrow it down. And maybe it will help someone who (like me) has forgotten that this was a thing😅