DuckDB.NET 1.5.0 Performance: Up to 40% Faster Writes and 22% Fewer Allocations

What’s New in DuckDB.NET 1.5.0

DuckDB.NET 1.5.0 focuses on performance. The codebase has been optimized across multiple layers - from the low-level native interop (LibraryImport migration, SuppressGCTransition) to the ADO.NET provider (reader reuse, appender boxing elimination) and type converters (decimal rewrite, BIGNUM conversion). The results show up across every major code path: reading, writing, and type conversion.

Note that the current pre-release (1.5.0-alpha) still uses the DuckDB 1.4.4 native library under the hood - the performance gains come entirely from improvements on the .NET side. The stable DuckDB.NET 1.5.0 release will ship with DuckDB 1.5.0 once it becomes available.

What Changed

LibraryImport Migration

All P/Invoke declarations have been migrated from [DllImport] to the source-generated [LibraryImport]. The runtime no longer needs to generate marshalling stubs at JIT time - the source generator produces them at compile time, eliminating stub overhead on every native call. As part of this migration, string returns now use custom marshallers (DuckDBOwnedStringMarshaller and DuckDBCallerOwnedStringMarshaller) that correctly and transparently handle ownership semantics - whether DuckDB or the caller is responsible for freeing the memory.
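The before/after shape looks roughly like this. This is a simplified sketch, not the repository's actual code: the managed method name and the marshaller body are illustrative, though duckdb_library_version is a real C API entry point whose returned string DuckDB owns.

```csharp
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.Marshalling;

internal static partial class NativeMethods
{
    // 1.4.4 shape: the runtime builds a marshalling stub at JIT time.
    // [DllImport("duckdb")]
    // public static extern IntPtr duckdb_library_version();

    // 1.5.0 shape: the stub is source-generated at compile time, and the
    // custom marshaller encodes who frees the string (here: DuckDB does,
    // so the marshaller copies the bytes and never frees the pointer).
    [LibraryImport("duckdb", EntryPoint = "duckdb_library_version")]
    [return: MarshalUsing(typeof(DuckDBOwnedStringMarshaller))]
    public static partial string DuckDBLibraryVersion();
}

// Minimal stateless marshaller for DuckDB-owned strings (simplified).
[CustomMarshaller(typeof(string), MarshalMode.ManagedToUnmanagedOut,
                  typeof(DuckDBOwnedStringMarshaller))]
internal static unsafe class DuckDBOwnedStringMarshaller
{
    public static string? ConvertToManaged(byte* unmanaged)
        => Marshal.PtrToStringUTF8((nint)unmanaged);
}
```

Encoding the ownership rule in the marshaller type means no call site can get it wrong: the compiler-generated stub always applies the correct free-or-copy behavior.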

SuppressGCTransition on Fast Native Calls

Many DuckDB C API calls are trivially fast - retrieving a vector data pointer, checking validity, getting chunk size. These calls complete in nanoseconds, but the .NET runtime’s GC transition (cooperative → preemptive → cooperative) adds measurable overhead on each call. Adding [SuppressGCTransition] to these methods skips the transition entirely. This is only safe for native functions that execute in under a microsecond, perform no blocking syscalls or I/O, don’t call back into the runtime, don’t throw exceptions, and don’t manipulate locks. The attribute was applied to every DuckDB C API method that meets these criteria - primarily vector data and validity pointer access, chunk size queries, and similar lightweight operations.
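Concretely, a declaration in this spirit (a sketch - the managed signature is simplified, but duckdb_vector_get_data is the actual C API name):

```csharp
using System.Runtime.InteropServices;

internal static partial class NativeMethods
{
    // duckdb_vector_get_data just returns a pointer the chunk already holds:
    // no I/O, no locks, no callbacks into the runtime - safe to skip the
    // cooperative -> preemptive -> cooperative GC transition.
    [LibraryImport("duckdb", EntryPoint = "duckdb_vector_get_data")]
    [SuppressGCTransition]
    public static partial IntPtr DuckDBVectorGetData(IntPtr vector);
}
```

The safety rules matter: a [SuppressGCTransition] call that blocks would stall the GC for every thread, which is why the attribute was applied only to calls audited against the criteria above.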

AggressiveInlining on Hot-Path Methods

[MethodImpl(MethodImplOptions.AggressiveInlining)] was added to frequently called methods in the reader and writer paths - including IsValid(), GetFieldData<T>(), AppendValueInternal<T>(), and type conversion helpers. For example, IsValid() and GetFieldData<T>() are called for every column of every row. At 100,000 rows and 20 columns, that’s 2 million calls per query. Inlining these small methods directly into the read/write loops removes call overhead.
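To illustrate the kind of method involved, here is a simplified validity check over DuckDB-style 64-bit validity masks - a sketch, not the library's actual implementation:

```csharp
using System.Runtime.CompilerServices;

internal sealed class ValidityMaskSketch
{
    private readonly ulong[] mask;

    public ValidityMaskSketch(ulong[] mask) => this.mask = mask;

    // Called once per column per row, so it is a prime inlining candidate:
    // the body is a handful of instructions, smaller than a call frame.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool IsValid(int row) => (mask[row >> 6] & (1UL << (row & 63))) != 0;
}
```

The JIT usually inlines tiny methods on its own, but the attribute removes the guesswork for methods that sit inside per-row loops.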

Appender Boxing Elimination

In 1.4.4, all AppendValue() overloads funneled into a single generic AppendValueInternal<T>(T? value) without a struct constraint. When T was a value type, the nullable wrapper was boxed on every call. In 1.5.0, this is split into struct-constrained and class-constrained overloads - the struct path unwraps Nullable<T> directly via HasValue/Value, eliminating the boxing. In the benchmark's 8-column schema, 6 columns are value types, so that's 6 fewer heap allocations per row.
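The overload split can be sketched like this (names simplified; the real methods also dispatch on the column's logical type):

```csharp
public sealed class AppenderSketch
{
    public int ValuesAppended;
    public int NullsAppended;

    // Struct-constrained path: Nullable<T> is unwrapped in place, so the
    // value never leaves the generic (unboxed) world.
    public void AppendValue<T>(T? value) where T : struct
    {
        if (value.HasValue) AppendCore(value.Value);
        else AppendNull();
    }

    // Class-constrained path: reference types were never boxed to begin with.
    public void AppendValue<T>(T value) where T : class
    {
        if (value is null) AppendNull();
        else AppendCore(value);
    }

    private void AppendCore<T>(T value) => ValuesAppended++;
    private void AppendNull() => NullsAppended++;
}
```

The two overloads have distinct signatures (Nullable<T> versus T), so the compiler picks the right one at each call site with no runtime type checks.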

Decimal Conversion Rewrite

The decimal reader was rewritten for all four internal storage paths:

  • SmallInt/Integer/BigInt paths: Replaced decimal.Divide(raw, powerOfTen) with the direct constructor new decimal(Math.Abs(raw), 0, 0, raw < 0, scale). This constructs the decimal from its binary components without any arithmetic.
  • HugeInt path: Uses BigInteger.DivRem with pre-computed BigInteger[] powers of ten instead of repeated intermediate BigInteger arithmetic.
  • Pre-computed lookup tables: Static decimal[] and BigInteger[] arrays for powers of ten (scales 0–28 and 0–38 respectively) are computed once at startup and reused for all conversions.
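The SmallInt/Integer path boils down to the constructor trick below - a self-contained illustration, with a hypothetical raw value rather than code from the library:

```csharp
// DECIMAL(9,2) value -123.45 arrives from DuckDB as the scaled integer -12345.
int raw = -12345;
byte scale = 2;

// Old path: decimal.Divide(raw, 100m) - real decimal arithmetic per value.
// New path: assemble the decimal from its binary parts
// (lo, mid, hi, isNegative, scale) - no division at all.
decimal value = new decimal(Math.Abs(raw), 0, 0, raw < 0, scale);
// value == -123.45m
```

Because the scaled integer and the decimal's internal representation share the same mantissa, the conversion is a straight bit-placement rather than an arithmetic operation.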

Vector Reader Reuse Across Chunks

DuckDB returns data in chunks of up to 2,048 rows. In 1.4.4, new VectorDataReader instances were allocated for each chunk. In 1.5.0, readers implement Reset(IntPtr vector) which updates data and validity pointers in place, reusing the same reader objects across all chunks in a result set. Composite readers (struct, list, map, decimal) override Reset to also update their nested child readers.
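The pattern looks roughly like this (a simplified sketch: the real Reset takes the native vector handle and re-derives the pointers from it, and each child gets its own child vector):

```csharp
internal class VectorDataReaderSketch
{
    public IntPtr Data;
    public IntPtr Validity;

    // One reader object per column for the whole result set; each new chunk
    // just repoints it instead of allocating a replacement.
    public virtual void Reset(IntPtr data, IntPtr validity)
    {
        Data = data;
        Validity = validity;
    }
}

internal sealed class StructVectorReaderSketch : VectorDataReaderSketch
{
    private readonly VectorDataReaderSketch[] children;

    public StructVectorReaderSketch(VectorDataReaderSketch[] children)
        => this.children = children;

    // Composite readers cascade the reset into their nested child readers.
    public override void Reset(IntPtr data, IntPtr validity)
    {
        base.Reset(data, validity);
        foreach (var child in children)
            child.Reset(data, validity);
    }
}
```

For a large result set this turns thousands of short-lived reader allocations into a fixed set of long-lived objects, which also keeps them out of the Gen 0 churn.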

BIGNUM O(n) Conversion

The BIGNUM (previously VarInt) to BigInteger conversion was rewritten from an O(n²) digit-by-digit algorithm to a direct O(n) construction using BigInteger's byte-span constructor. For positive values, the raw bytes are passed directly. For negative values, a byte complement is needed - small payloads (≤128 bytes) use stackalloc for this, larger ones rent from ArrayPool<byte>. The result: 93% faster reads and 98-99% fewer allocations at 10K–100K rows.
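A sketch of the construction - the little-endian, bit-complemented-negative convention here is an assumption for illustration; the actual BIGNUM wire-format handling lives in the library:

```csharp
using System.Buffers;
using System.Numerics;

static class BigNumSketch
{
    // payload: little-endian magnitude bytes; negative payloads arrive
    // bit-complemented (illustrative convention, see note above).
    public static BigInteger ToBigInteger(ReadOnlySpan<byte> payload, bool negative)
    {
        if (!negative)
            return new BigInteger(payload, isUnsigned: true, isBigEndian: false);

        // Flip the complemented bytes: stack for small payloads, pooled
        // buffer for large ones - no per-digit BigInteger arithmetic.
        byte[]? rented = payload.Length > 128
            ? ArrayPool<byte>.Shared.Rent(payload.Length)
            : null;
        Span<byte> buffer = rented is null ? stackalloc byte[128] : rented;
        buffer = buffer[..payload.Length];

        for (int i = 0; i < payload.Length; i++)
            buffer[i] = (byte)~payload[i];

        var magnitude = new BigInteger(buffer, isUnsigned: true, isBigEndian: false);
        if (rented is not null)
            ArrayPool<byte>.Shared.Return(rented);
        return -magnitude;
    }
}
```

The whole value is handed to BigInteger's span constructor in one shot, which is what collapses the old O(n²) digit loop to a single O(n) pass.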

Benchmarks

All benchmarks compare DuckDB.NET.Data.Full 1.4.4 (stable, from NuGet.org) against 1.5.0-alpha.35 (pre-release, from GitHub) using BenchmarkDotNet. Each version is compiled and run independently using WithMsBuildArguments to swap the package version at build time, so both versions run against the same benchmark code with no shared state.
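For reference, a two-job setup of this kind can be sketched with BenchmarkDotNet's stock MsBuildArgument. This assumes the benchmark csproj reads the package version from a hypothetical DuckDBNetVersion MSBuild property; the exact helper used for the post's runs may differ.

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

public class VersionComparisonConfig : ManualConfig
{
    public VersionComparisonConfig()
    {
        // Each job rebuilds the benchmark project with a different
        // PackageReference version, so the two runs share no state.
        AddJob(Job.Default
            .WithArguments(new Argument[] { new MsBuildArgument("/p:DuckDBNetVersion=1.4.4") })
            .WithId("stable-1.4.4")
            .AsBaseline());

        AddJob(Job.Default
            .WithArguments(new Argument[] { new MsBuildArgument("/p:DuckDBNetVersion=1.5.0-alpha.35") })
            .WithId("alpha-1.5.0"));
    }
}
```

Because each job is compiled in its own process with its own restored packages, there is no risk of one version's assemblies leaking into the other's run.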

Reader: 17-22% Faster

The reader benchmark creates a table with a mix of column types (INT, VARCHAR, DOUBLE, BOOLEAN, DECIMAL, BIGINT, TIMESTAMP) and reads 100,000 rows using typed getters (GetInt32, GetString, GetDecimal, etc.).

Method          Job           ColumnCount  Mean       Ratio     Allocated  Alloc Ratio
ReadAllColumns  alpha-1.5.0   5            20.45 ms   -22%      3.82 MB    -1%
ReadAllColumns  stable-1.4.4  5            26.12 ms   baseline  3.85 MB
ReadAllColumns  alpha-1.5.0   10           44.15 ms   -17%      7.63 MB    -1%
ReadAllColumns  stable-1.4.4  10           53.60 ms   baseline  7.68 MB
ReadAllColumns  alpha-1.5.0   20           89.56 ms   -17%      11.45 MB   -1%
ReadAllColumns  stable-1.4.4  20           107.50 ms  baseline  11.56 MB

Allocations are nearly identical because both versions allocate the same string objects for VARCHAR columns. The speedup comes from LibraryImport, SuppressGCTransition, and AggressiveInlining working together on the hot read path.

Appender: 20-40% Faster, 22% Less Memory

The appender benchmark creates rows with 8 columns (INT, VARCHAR, DOUBLE, BOOLEAN, DECIMAL, TIMESTAMP, BIGINT, VARCHAR) using the CreateRow().AppendValue(...).EndRow() API.

Method      Job           RowCount   Mean         Ratio     Allocated    Alloc Ratio
AppendRows  alpha-1.5.0   10,000     19.58 ms     -20%      8.85 MB      -22%
AppendRows  stable-1.4.4  10,000     24.41 ms     baseline  11.36 MB
AppendRows  alpha-1.5.0   100,000    86.90 ms     -41%      89.20 MB     -22%
AppendRows  stable-1.4.4  100,000    147.46 ms    baseline  114.38 MB
AppendRows  alpha-1.5.0   1,000,000  803.55 ms    -40%      899.61 MB    -22%
AppendRows  stable-1.4.4  1,000,000  1,349.50 ms  baseline  1,151.38 MB

The consistent 22% allocation reduction is the boxing elimination at work. At 1 million rows with 8 columns, that’s millions of heap allocations removed. The 40% speed gain comes from boxing elimination, reduced GC pressure from fewer allocations, and AggressiveInlining on the write path.

Summary

Area                  Speedup  Memory Reduction
Reader (mixed types)  17-22%   ~1%
Appender (8 columns)  20-40%   22%

These improvements require no code changes - just update the DuckDB.NET.Data package to 1.5.0 and the optimizations apply automatically.

In the next post, I’ll cover the API improvements in 1.5.0 - including a simplified API for scalar and table user-defined functions.

Giorgi Dalakishvili
World-Class Software Engineer
