DuckDB.NET 1.5.0 Performance: Up to 40% Faster Writes and 22% Fewer Allocations
What’s New in DuckDB.NET 1.5.0
DuckDB.NET 1.5.0 focuses on performance. The codebase has been optimized across multiple layers - from the low-level native interop (LibraryImport migration, SuppressGCTransition) to the ADO.NET provider (reader reuse, appender boxing elimination) and the type converters (decimal rewrite, BigNum conversion). The results show up across every major code path: reading, writing, and type conversion.
Note that the current pre-release (1.5.0-alpha) still uses the DuckDB 1.4.4 native library under the hood - the performance gains come entirely from improvements on the .NET side. The stable DuckDB.NET 1.5.0 release will ship with DuckDB 1.5.0 once it becomes available.
What Changed
LibraryImport Migration
All P/Invoke declarations have been migrated from [DllImport] to the source-generated [LibraryImport]. The runtime no longer needs to generate marshalling stubs at JIT time - the source generator produces them at compile time, eliminating stub overhead on every native call. As part of this migration, string returns now use custom marshallers (DuckDBOwnedStringMarshaller and DuckDBCallerOwnedStringMarshaller) that correctly and transparently handle ownership semantics - whether DuckDB or the caller is responsible for freeing the memory.
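As a minimal sketch of the migration pattern - using the real duckdb_vector_size C API function, but with illustrative class and declaration details (the actual bindings cover the full API surface and wire up the custom string marshallers):

```csharp
using System.Runtime.InteropServices;

// Sketch only: class name and library name are illustrative.
internal static partial class NativeMethodsSketch
{
    // 1.4.4 pattern - the runtime generates a marshalling stub at JIT time:
    // [DllImport("duckdb")]
    // internal static extern ulong duckdb_vector_size();

    // 1.5.0 pattern - the LibraryImport source generator emits the stub
    // at compile time, so there is no JIT-time stub cost per method.
    [LibraryImport("duckdb")]
    internal static partial ulong duckdb_vector_size();
}
```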
SuppressGCTransition on Fast Native Calls
Many DuckDB C API calls are trivially fast - retrieving a vector data pointer, checking validity, getting chunk size. These calls complete in nanoseconds, but the .NET runtime’s GC transition (cooperative → preemptive → cooperative) adds measurable overhead on each call. Adding [SuppressGCTransition] to these methods skips the transition entirely. This is only safe for native functions that execute in under a microsecond, perform no blocking syscalls or I/O, don’t call back into the runtime, don’t throw exceptions, and don’t manipulate locks. The attribute was applied to every DuckDB C API method that meets these criteria - primarily vector data and validity pointer access, chunk size queries, and similar lightweight operations.
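A sketch of the combined attributes, using the real duckdb_vector_get_data C API function (which returns a raw data pointer and completes in nanoseconds); everything beyond the function name is illustrative:

```csharp
using System;
using System.Runtime.InteropServices;

internal static partial class FastNativeMethodsSketch
{
    // Safe for SuppressGCTransition: returns immediately, performs no
    // blocking or I/O, never calls back into the runtime, never throws,
    // and takes no locks.
    [LibraryImport("duckdb")]
    [SuppressGCTransition]
    internal static partial IntPtr duckdb_vector_get_data(IntPtr vector);
}
```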
AggressiveInlining on Hot-Path Methods
[MethodImpl(MethodImplOptions.AggressiveInlining)] was added to frequently called methods in the reader and writer paths - including IsValid(), GetFieldData<T>(), AppendValueInternal<T>(), and type conversion helpers. For example, IsValid() and GetFieldData<T>() are called for every column of every row. At 100,000 rows and 20 columns, that’s 2 million calls to each per query. Inlining these small methods directly into the read/write loops removes the call overhead.
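A managed sketch of the pattern, assuming a hypothetical validity-bitmap reader (DuckDB’s validity mask sets bit i when row i is valid; the real reader works against native memory):

```csharp
using System.Runtime.CompilerServices;

internal sealed class VectorReaderSketch
{
    // Illustrative managed stand-in for the native validity bitmap.
    private readonly ulong[] validityMask;

    public VectorReaderSketch(ulong[] mask) => validityMask = mask;

    // Called once per column per row; inlining removes the per-call
    // overhead inside the read loop. Bit i set means row i is valid.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool IsValid(int row) =>
        (validityMask[row >> 6] & (1UL << (row & 63))) != 0;
}
```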
Appender Boxing Elimination
In 1.4.4, all AppendValue() overloads funneled into a single generic AppendValueInternal<T>(T? value) without a struct constraint. When T is a value type, the nullable wrapper was boxed on every call. In 1.5.0, this is split into struct-constrained and class-constrained overloads - the struct path unwraps Nullable<T> directly via HasValue/Value, eliminating the boxing. With 8 columns per row, six of them value types, that’s 6 fewer heap allocations per row for a typical mixed-type schema.
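The split can be sketched like this (names are illustrative, not the actual DuckDB.NET internals). Because Nullable<T> and T are distinct parameter types, both overloads can coexist, and overload resolution picks the struct path for value types:

```csharp
public sealed class AppenderRowSketch
{
    // Struct path: the nullable wrapper is unwrapped in place,
    // so no Nullable<T> box is ever created.
    public AppenderRowSketch AppendValue<T>(T? value) where T : struct
    {
        if (value.HasValue)
            AppendScalar(value.Value);   // T stays unboxed
        else
            AppendNull();
        return this;
    }

    // Class path: reference types never box, only a null check is needed.
    public AppenderRowSketch AppendValue<T>(T? value) where T : class
    {
        if (value is null) AppendNull();
        else AppendReference(value);
        return this;
    }

    private void AppendScalar<T>(T value) where T : struct { /* write to native vector */ }
    private void AppendReference<T>(T value) where T : class { /* write string/blob */ }
    private void AppendNull() { /* mark row invalid */ }
}
```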
Decimal Conversion Rewrite
The decimal reader was rewritten for all four internal storage paths:
- SmallInt/Integer/BigInt paths: Replaced decimal.Divide(raw, powerOfTen) with the direct constructor new decimal(Math.Abs(raw), 0, 0, raw < 0, scale). This constructs the decimal from its binary components without any arithmetic.
- HugeInt path: Uses BigInteger.DivRem with pre-computed BigInteger[] powers of ten instead of repeated intermediate BigInteger arithmetic.
- Pre-computed lookup tables: Static decimal[] and BigInteger[] arrays for powers of ten (scales 0–28 and 0–38 respectively) are computed once at startup and reused for all conversions.
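As a worked sketch of the 32-bit path (simplified: Math.Abs overflows for int.MinValue, an edge case real code must handle), a DECIMAL(9,2) value stored as the scaled integer -12345 becomes -123.45m with no division at all:

```csharp
using System;

// Builds the decimal directly from sign, 96-bit magnitude (lo/mid/hi),
// and scale - the scaled integer is reinterpreted, not divided.
static decimal FromScaledInt32(int raw, byte scale) =>
    new decimal(Math.Abs(raw), 0, 0, raw < 0, scale);

// FromScaledInt32(-12345, 2) == -123.45m
```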
Vector Reader Reuse Across Chunks
DuckDB returns data in chunks of up to 2,048 rows. In 1.4.4, new VectorDataReader instances were allocated for each chunk. In 1.5.0, readers implement Reset(IntPtr vector) which updates data and validity pointers in place, reusing the same reader objects across all chunks in a result set. Composite readers (struct, list, map, decimal) override Reset to also update their nested child readers.
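A structural sketch of the reuse pattern (type and member names are illustrative; the real composite readers resolve their child vectors through the C API, e.g. duckdb_list_vector_get_child):

```csharp
using System;

internal class VectorDataReaderSketch
{
    public IntPtr Vector { get; private set; }

    // Called once per chunk instead of allocating a new reader: only the
    // native handle changes, the reader object itself survives. The real
    // implementation also re-fetches the data and validity pointers here.
    public virtual void Reset(IntPtr vector) => Vector = vector;
}

internal sealed class ListReaderSketch : VectorDataReaderSketch
{
    public VectorDataReaderSketch ChildReader { get; } = new();

    public override void Reset(IntPtr vector)
    {
        base.Reset(vector);
        // Composite readers also repoint their nested child readers
        // (simplified: the child would get its own child vector handle).
        ChildReader.Reset(vector);
    }
}
```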
BIGNUM O(n) Conversion
The BIGNUM (previously VarInt) to BigInteger conversion was rewritten from an O(n²) digit-by-digit algorithm to a direct O(n) construction using BigInteger’s byte-span constructor. For positive values, the raw bytes are passed directly. For negative values, a byte complement is needed - small payloads (≤128 bytes) use stackalloc for this, larger ones rent from ArrayPool<byte>. The result: 93% faster reads and 98-99% fewer allocations at 10K–100K rows.
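A sketch of the construction, assuming negative magnitudes arrive byte-complemented (the exact BIGNUM wire format is handled inside the library; this shows only the single-pass BigInteger build and the stackalloc/ArrayPool split):

```csharp
using System;
using System.Buffers;
using System.Numerics;

static BigInteger FromBignumBytes(ReadOnlySpan<byte> payload, bool negative)
{
    if (!negative)
        // Positive values: one pass straight over the raw bytes.
        return new BigInteger(payload, isUnsigned: true, isBigEndian: true);

    byte[]? rented = null;
    // Small payloads flip bytes on the stack; larger ones rent a buffer.
    Span<byte> buffer = payload.Length <= 128
        ? stackalloc byte[payload.Length]
        : (rented = ArrayPool<byte>.Shared.Rent(payload.Length)).AsSpan(0, payload.Length);
    try
    {
        for (var i = 0; i < payload.Length; i++)
            buffer[i] = (byte)~payload[i];
        // Assumed encoding: complementing the payload yields the magnitude.
        return -new BigInteger(buffer, isUnsigned: true, isBigEndian: true);
    }
    finally
    {
        if (rented is not null)
            ArrayPool<byte>.Shared.Return(rented);
    }
}
```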
Benchmarks
All benchmarks compare DuckDB.NET.Data.Full 1.4.4 (stable, from NuGet.org) against 1.5.0-alpha.35 (pre-release, from GitHub) using BenchmarkDotNet. Each version is compiled and run independently using WithMsBuildArguments to swap the package version at build time, so both versions run against the same benchmark code with no shared state.
Reader: 17-22% Faster
The reader benchmark creates a table with a mix of column types (INT, VARCHAR, DOUBLE, BOOLEAN, DECIMAL, BIGINT, TIMESTAMP) and reads 100,000 rows using typed getters (GetInt32, GetString, GetDecimal, etc.).
| Method | Job | ColumnCount | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|
| ReadAllColumns | alpha-1.5.0 | 5 | 20.45 ms | -22% | 3.82 MB | -1% |
| ReadAllColumns | stable-1.4.4 | 5 | 26.12 ms | baseline | 3.85 MB | |
| ReadAllColumns | alpha-1.5.0 | 10 | 44.15 ms | -17% | 7.63 MB | -1% |
| ReadAllColumns | stable-1.4.4 | 10 | 53.60 ms | baseline | 7.68 MB | |
| ReadAllColumns | alpha-1.5.0 | 20 | 89.56 ms | -17% | 11.45 MB | -1% |
| ReadAllColumns | stable-1.4.4 | 20 | 107.50 ms | baseline | 11.56 MB | |
Allocations are nearly identical because both versions allocate the same string objects for VARCHAR columns. The speedup comes from LibraryImport, SuppressGCTransition, and AggressiveInlining working together on the hot read path.
Appender: 20-40% Faster, 22% Less Memory
The appender benchmark creates rows with 8 columns (INT, VARCHAR, DOUBLE, BOOLEAN, DECIMAL, TIMESTAMP, BIGINT, VARCHAR) using the CreateRow().AppendValue(...).EndRow() API.
| Method | Job | RowCount | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|
| AppendRows | alpha-1.5.0 | 10,000 | 19.58 ms | -20% | 8.85 MB | -22% |
| AppendRows | stable-1.4.4 | 10,000 | 24.41 ms | baseline | 11.36 MB | |
| AppendRows | alpha-1.5.0 | 100,000 | 86.90 ms | -41% | 89.20 MB | -22% |
| AppendRows | stable-1.4.4 | 100,000 | 147.46 ms | baseline | 114.38 MB | |
| AppendRows | alpha-1.5.0 | 1,000,000 | 803.55 ms | -40% | 899.61 MB | -22% |
| AppendRows | stable-1.4.4 | 1,000,000 | 1,349.50 ms | baseline | 1,151.38 MB | |
The consistent 22% allocation reduction is the boxing elimination at work. At 1 million rows with 8 columns, that’s millions of heap allocations removed. The 40% speed gain comes from boxing elimination, reduced GC pressure from fewer allocations, and AggressiveInlining on the write path.
Summary
| Area | Speedup | Memory Reduction |
|---|---|---|
| Reader (mixed types) | 17-22% | ~1% |
| Appender (8 columns) | 20-40% | 22% |
These improvements require no code changes - just update the DuckDB.NET.Data package to 1.5.0 and the optimizations apply automatically.
In the next post, I’ll cover the API improvements in 1.5.0 - including a simplified API for scalar and table user-defined functions.