
Writing string.h functions using string instructions in asm x86-64 (2025)

Why This Matters

This article explores how string functions like memcpy are implemented in x86-64 assembly, comparing the standard library implementations with hand-written assembly that uses string and SIMD instructions. Understanding these low-level details helps developers optimize performance-critical code and reason about what the hardware actually does.


Introduction

The C standard library offers a set of functions (declared in the string.h header) to manage null-terminated strings and arrays. These are some of the most used C functions, often implemented as builtins by the C compiler, as they are crucial to the speed of programs.

On the other hand, the x86 architecture contains “string instructions”, aimed at implementing operations on strings at the hardware level. Moreover, the x86 architecture was incrementally enhanced with SIMD instructions over the years, allowing for the processing of multiple bytes of data in a single instruction.
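To make the idea of a string instruction concrete, here is a minimal sketch of a byte copy built on `rep movsb`, written as GCC/Clang extended inline asm inside a C function. The name `copy_rep_movsb` is hypothetical and this is not how glibc implements memcpy; it only illustrates that a single instruction can drive the whole copy. It assumes an x86-64 target with a GCC-compatible compiler and falls back to a plain loop elsewhere.

```c
#include <stddef.h>

/* Hypothetical helper (not glibc's memcpy): copy n bytes from src to dst
   with the x86 string instruction `rep movsb`. */
void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* `rep movsb` copies RCX bytes from [RSI] to [RDI], advancing both
       pointers after each byte. The System V ABI requires the direction
       flag to be clear on function entry, so the copy runs forward. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

The `"+D"`, `"+S"` and `"+c"` constraints pin the three operands to RDI, RSI and RCX and mark them as read and written, since the instruction updates all three. Like memcpy, this sketch assumes the two buffers do not overlap.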

In this article, we’ll inspect the implementation of the string.h functions in the GNU C library for x86 and see how it compares with a pure assembly implementation of these functions using string instructions and SIMD. Along the way, we’ll try to explain the choices made by the GNU developers and help you write better assembly.

Disassembling a call to memcpy

One of the most popular C functions is memcpy. It copies a block of bytes from one array to another; because this operation is so common, its performance is particularly important.
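As an illustration of why the choice of instructions matters for such a copy, the sketch below moves 16 bytes per iteration with SSE2 intrinsics and finishes the remainder one byte at a time. The function name `copy_sse2` is hypothetical, and this is an assumption-laden toy rather than the strategy used by gcc or glibc; it uses unaligned loads and stores so it makes no assumption about buffer alignment, and it degrades to a byte loop when SSE2 is not available.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy n bytes, 16 at a time where possible. */
void *copy_sse2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    /* Main loop: one unaligned 128-bit load and one unaligned store. */
    while (n >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, chunk);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    /* Tail (or the whole copy without SSE2): byte by byte. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

A call such as `copy_sse2(b, a, 1024)` would run the vector loop 64 times and skip the tail entirely, whereas a length of 1030 would leave 6 bytes for the scalar loop.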

There are several ways you can perform this operation using x86 asm. Let’s see how it is implemented by gcc using this simple C program:

#include <stdlib.h> /* EXIT_SUCCESS */
#include <string.h> /* memcpy */

#define BUF_LEN 1024

char a[BUF_LEN];
char b[BUF_LEN];

int main(void)
{
    memcpy(b, a, BUF_LEN);
    return EXIT_SUCCESS;
}

We can inspect the generated asm using godbolt, or compile the code locally using gcc 14.2:

gcc -O1 -g -o string main.c
