March 17, 2026
LZ4 Rust - Devlog #0
First steps into an [un]safe, FFI journey
10 min read
I always wanted to start my own ‘Rewrite it in Rust™️’ project. However, it always felt overwhelming to me, not only because of its length but also because of the difficulty of maintaining an aligned implementation in the long run.
That changed when I attended the 2025 edition of EuroRust. Among dozens of quite interesting talks, the one given by Luca Palmieri got me thinking about starting this project.
Rewrite, Optimize, Repeat
We don’t want a standalone version of the software at the very beginning of the project. We want two parts living together, coexisting, and interoperating with each other. A rewrite from scratch is not the solution; replacing isolated modules until the Rust virus has spread enough to take control of the project is.
This project is not just about writing Rust code. It is about C, ABIs, build systems, interoperability, memory layouts, and optimizations while keeping an eye on possible regressions.
Project choice
Ok, we know what we want to do. The next question is: what is our target? Since the main purpose of this project is learning, we don’t need to find a project that hasn’t been migrated before or a project that really “needs” it. Also, for personal interest, if the project has some algorithmic complexity to make the development fun and full of lessons, that would be nice as well!
We should look for a not-too-large project, manageable for a solo dev during their side-project hours, with a clear, scoped division into modules to make it easier to squeeze ourselves into it. We also want it to be well tested and documented so we do not drift away from the specification.
Given these constraints, and after a quick search, I believe the LZ4 compression
algorithm and its official C implementation are a pretty good candidate. Its
lib/ folder, where the core of the algorithm is located, has just ~7k LoC, which
sounds like quite a feasible task.
~/D/P/l/lib dev ◼ via v17.0.0-clang
❯❯❯ tokei
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Language Files Lines Code Comments Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Autoconf 2 50 40 7 3
C 5 8639 6261 1345 1033
C Header 6 2569 731 1549 289
Makefile 2 324 200 75 49
Markdown 2 262 0 194 68
Visual Studio Pro| 1 182 182 0 0
Visual Studio Sol| 1 25 25 0 0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total 19 12051 7439 3170 1442
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If the project succeeds (success == not getting bored too soon), we could think about migrating the rest of the codebase, which includes other programs and utilities.
Let’s start
First things first: Fork the repo.
Now let’s inspect the surface and see if we can find a good starting point:
~/D/P/lz4 dev
❯❯❯ tree -L 1
.
├── appveyor.yml
├── build
├── CODING_STYLE
├── contrib
├── doc
├── examples
├── INSTALL
├── lib
├── LICENSE
├── Makefile
├── NEWS
├── ossfuzz
├── programs
├── README.md
├── SECURITY.md
└── tests
9 directories, 8 files
- build/: We can safely ignore it. It includes support for several build systems. Since we are going to eventually migrate everything to cargo, we’ll support just make.
- contrib/: Third-party stuff not related to the core of the project.
- examples/: Useful for testing purposes in the future, not now.
- ossfuzz/: Test suite for Google’s OSS-Fuzz project. Also useful in future chapters of our lives.
- programs/: Includes programs that make use of the algorithm, such as the lz4 CLI.
That leaves us with the lib/ folder, where the algorithm lives and where we
should focus for now, not to start coding yet, but to understand the very basics
of how this is built.
Inside, there is a README.md which is quite revealing:
The `/lib` directory contains many files, but depending on project's objectives,
not all of them are required. Limited systems may want to reduce the nb of
source files to include as a way to reduce binary size and dependencies.
Capabilities are added at the "level" granularity, detailed below.
#### Level 1 : Minimal LZ4 build
The minimum required is **`lz4.c`** and **`lz4.h`**, which provides the fast
compression and decompression algorithms. They generate and decode data using
the [LZ4 block format].
...
Understanding the code
As we explore the lz4.h
file, after a few #define and constant declarations, we can find the public
API definitions, separated by sections depending on how fine-grained we want to
interact with this implementation.
This is helpful because it allows us to identify potential entry points for our rewrite.
Specifically, the following signatures are interesting:
/*-************************************
* Simple Functions
**************************************/
/*! LZ4_compress_default() :
* ...
*/
LZ4LIB_API int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity);
/*! LZ4_decompress_safe() :
* ...
*/
LZ4LIB_API int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);
Let’s dive deep into the compression one. Exploring that one will give us the needed context on the types, structures, and helpers that power the compression function:
int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity)
{
return LZ4_compress_fast(src, dst, srcSize, dstCapacity, 1);
}
Alright, that was easy… it just calls another function, LZ4_compress_fast, which
has a very similar signature plus an extra argument, hardcoded to 1. This is what
the function's documentation says about it:
Same as LZ4_compress_default(), but allows selection of “acceleration” factor. The larger the acceleration value, the faster the algorithm, but also the lesser the compression. It’s a trade-off. It can be fine tuned, with each successive value providing roughly +~3% to speed. An acceleration value of “1” is the same as regular LZ4_compress_default()
Also, LZ4_compress_fast() is just another wrapper over
LZ4_compress_fast_extState(). The latter takes one more parameter, an external
state; fast() just decides how to allocate that state and then passes it to
extState().
First target
If we take a look at the implementation of
LZ4_compress_fast_extState,
after some initializations and checks, we get to this line:
if (maxOutputSize >= LZ4_compressBound(inputSize)) {
Here, LZ4_compressBound(inputSize) is a super simple function that we can use
to end this chapter: create our Rust crate, expose a function that behaves the
same, and make the C implementation use our version instead. That would leave us
with a Rust project wired into the main C one, in a really good shape for the
next chapters.
int LZ4_compressBound(int isize) { return LZ4_COMPRESSBOUND(isize); }
...
#define LZ4_COMPRESSBOUND(isize) ((unsigned)(isize) > (unsigned)LZ4_MAX_INPUT_SIZE ? 0 : (isize) + ((isize)/255) + 16)
Note that LZ4_COMPRESSBOUND is a macro that the C compiler expands inline, so we are probably losing some efficiency if we use our custom function instead, since it becomes an externally linked function called at runtime. However, that’s acceptable for now, since outperforming the current implementation is way, way beyond our current scope.
Code integration
Rust crate
Ok, so let’s create a library crate at the root level of the repo:
cargo new --lib lz4-rs
Then, we need to add the following to Cargo.toml to tell Cargo it should treat
our code as a static library, so it emits a liblz4_rs.a that we can link with
the C artifacts.
[package]
name = "lz4-rs"
version = "0.1.0"
edition = "2024"
[lib]
crate-type = ["staticlib"]
Inside lib.rs, we are going to create a function that replicates the behavior
of LZ4_compressBound():
use std::ffi::c_int;
pub const LZ4_MAX_INPUT_SIZE: u32 = 0x7E00_0000;
pub const fn lz4_compress_bound(input_size: c_int) -> c_int {
if (input_size as u32) > LZ4_MAX_INPUT_SIZE {
0
} else {
input_size + (input_size / 255) + 16
}
}
- Include c_int. We need to match whatever int means on the platform that executes the code (it can go from 16 bits to 64).
- LZ4_MAX_INPUT_SIZE constant with the same value as in the C implementation.
- Rust version of the function. Note that we declare it as const, because we want to match the #define from C, which is evaluated at compile time.
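To see what the const qualifier buys us, here is a minimal sketch that repeats the function above and evaluates it in a constant context. BOUND_64K and Scratch64K are hypothetical names introduced only for illustration; they are not part of the crate. The tests also show why the cast to u32 matters: a negative size wraps to a huge unsigned value and is rejected, just like in the C macro.

```rust
use std::ffi::c_int;

pub const LZ4_MAX_INPUT_SIZE: u32 = 0x7E00_0000;

// Same function as above, repeated so this sketch compiles on its own.
pub const fn lz4_compress_bound(input_size: c_int) -> c_int {
    if (input_size as u32) > LZ4_MAX_INPUT_SIZE {
        0
    } else {
        input_size + (input_size / 255) + 16
    }
}

// Hypothetical constant: worst-case compressed size of a 64 KiB input,
// computed by the compiler instead of at run time.
pub const BOUND_64K: c_int = lz4_compress_bound(64 * 1024);

// Hypothetical buffer type: a const fn result can also size an array.
pub type Scratch64K = [u8; BOUND_64K as usize];

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn matches_the_c_formula() {
        assert_eq!(lz4_compress_bound(0), 16);
        assert_eq!(lz4_compress_bound(1000), 1019); // 1000 + 1000/255 + 16
        assert_eq!(BOUND_64K, 65_809); // 65536 + 257 + 16
    }

    #[test]
    fn rejects_out_of_range_sizes() {
        // -1 reinterpreted as u32 is 0xFFFF_FFFF, which exceeds the maximum.
        assert_eq!(lz4_compress_bound(-1), 0);
        assert_eq!(lz4_compress_bound(0x7E00_0001), 0);
    }
}
```

Running cargo test inside the crate would exercise these cases before any C code gets involved.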
Now, we need to add a FFI wrapper to be able to call it from C:
#[unsafe(no_mangle)]
pub unsafe extern "C" fn LZ4_rs_compressBound(isize: c_int) -> c_int {
lz4_compress_bound(isize)
}
- Rust, when compiling, changes symbol names at the binary level. This is used to add meta-information and prevent name collisions. However, we don’t want that to happen here, because otherwise C would not be able to find the function. So with #[unsafe(no_mangle)] we tell the compiler not to change the function name.
- The function must be marked with extern "C" because Rust’s ABI is not stabilized. Parameter ordering, how return values are handled, and who cleans up the stack can change. With that, we tell the compiler we need to stick to the C ABI in order to interoperate properly.
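As a quick sanity check on the Rust side, the wrapper can be called through a C-ABI function pointer, which is roughly how the C caller sees it. This is only a sketch: CompressBoundFn and call_through_c_abi are hypothetical helpers, and the bound function is repeated so the snippet compiles on its own.

```rust
use std::ffi::c_int;

const LZ4_MAX_INPUT_SIZE: u32 = 0x7E00_0000;

// Repeated from the crate so this sketch is self-contained.
const fn lz4_compress_bound(input_size: c_int) -> c_int {
    if (input_size as u32) > LZ4_MAX_INPUT_SIZE {
        0
    } else {
        input_size + (input_size / 255) + 16
    }
}

// The exported wrapper, with the same attributes and signature as above.
#[unsafe(no_mangle)]
#[allow(non_snake_case)]
pub unsafe extern "C" fn LZ4_rs_compressBound(isize: c_int) -> c_int {
    lz4_compress_bound(isize)
}

// Hypothetical alias: the Rust view of the C prototype `int f(int)`.
pub type CompressBoundFn = unsafe extern "C" fn(c_int) -> c_int;

// Hypothetical helper: invoke the wrapper through a C-ABI function pointer.
pub fn call_through_c_abi(f: CompressBoundFn, n: c_int) -> c_int {
    // SAFETY: the wrapper only does integer arithmetic on its argument
    // and touches no memory, so any c_int value is valid input.
    unsafe { f(n) }
}
```

For example, call_through_c_abi(LZ4_rs_compressBound, 1000) evaluates to 1019. If the wrapper's signature ever drifted from the alias, the coercion to CompressBoundFn would stop compiling, which is a cheap guard against mismatches with the C declaration.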
Call it from C
Going back to lz4.c, if we go to the implementation of LZ4_compressBound(), we
can swap the macro call with our Rust version, first declaring the existence of
the symbol:
extern int LZ4_rs_compressBound(int isize);
int LZ4_compressBound(int isize) { return LZ4_rs_compressBound(isize); }
Now, all the references to LZ4_compressBound() across the C implementation
will use our version instead.
There is just one more step: linking.
Linking all together
We do not need to reinvent the whole build. We just need to teach lib/Makefile
how to build the Rust crate and link its output into liblz4.
First, we define where the Rust crate lives and where Cargo will leave the generated static library:
CARGO ?= cargo
RUSTDIR ?= ../lz4-rs
RUST_PROFILE ?= release
RUST_TARGET_DIR := $(RUSTDIR)/target/$(RUST_PROFILE)
RUST_STATICLIB := $(RUST_TARGET_DIR)/liblz4_rs.a
Then we add a rule for building that artifact:
$(RUST_STATICLIB): $(RUSTDIR)/src/lib.rs $(RUSTDIR)/Cargo.toml
$(CARGO) build --manifest-path $(RUSTDIR)/Cargo.toml --lib --profile $(RUST_PROFILE)
And finally, we make the shared liblz4 target depend on that Rust archive and
link it in:
$(LIBLZ4): $(RUST_STATICLIB)
$(LIBLZ4): LDLIBS += $(RUST_STATICLIB)
$(eval $(call c_dynamic_library,$(LIBLZ4),$(OBJFILES),,echo "$(LIBLZ4) created",$(RUST_STATICLIB)))
With that, running make from the project root still feels like regular old
make, but now there is a tiny Rust island hidden inside the build.
However, programs/Makefile and tests/Makefile also compile and link targets
that pull in lz4.c, so once LZ4_compressBound() starts calling into Rust,
they need to link against liblz4_rs.a too. Otherwise the library itself builds
fine, but the CLI and test binaries fail at link time with an unresolved symbol.
So the same RUST_STATICLIB definition and build rule must be added there as
well, and the corresponding targets need LDLIBS += $(RUST_STATICLIB).
Let’s try a make test to see if everything works as expected:
+--------------+------------------------------+--------------+----------------+--------------+--------------+
|Source |Function Benchmarked | Total Seconds| Iterations/sec| ns/Iteration| % of default|
+--------------+------------------------------+--------------+----------------+--------------+--------------+
|Normal Text |LZ4_compress_default() | 0.016784000| 595,805| 1,678| 100.00%|
|Normal Text |LZ4_compress_fast() | 0.016889000| 592,101| 1,688| 100.63%|
|Normal Text |LZ4_compress_fast_extState() | 0.016694000| 599,017| 1,669| 99.46%|
|Normal Text |LZ4_decompress_safe() | 0.003408000| 2,934,272| 340| 20.31%|
|Normal Text |LZ4_decompress_fast() | 0.009859000| 1,014,301| 985| 58.74%|
| | | | | | |
|Compressible |LZ4_compress_default() | 0.002235000| 4,474,272| 223| 100.00%|
|Compressible |LZ4_compress_fast() | 0.002402000| 4,163,197| 240| 107.47%|
|Compressible |LZ4_compress_fast_extState() | 0.002260000| 4,424,778| 226| 101.12%|
|Compressible |LZ4_decompress_safe() | 0.000831000| 12,033,694| 83| 37.18%|
|Compressible |LZ4_decompress_fast() | 0.025111000| 398,231| 2,511| 1,123.53%|
+--------------+------------------------------+--------------+----------------+--------------+--------------+
Well, it definitely runs!
And I think that’s enough for devlog #0: we have not rewritten the compressor, and we have not won any benchmarks, but we do have a very important part done: a Rust crate compiled by Cargo, linked into the C library, and callable from the existing codebase without breaking the build.
From here on, things get more interesting.