<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Benchmarks | Giorgi Dalakishvili | Personal Website</title><link>https://www.giorgi.dev/tags/benchmarks/</link><atom:link href="https://www.giorgi.dev/tags/benchmarks/index.xml" rel="self" type="application/rss+xml"/><description>Benchmarks</description><generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>© Giorgi Dalakishvili 2019 - 2026</copyright><lastBuildDate>Tue, 03 Mar 2026 05:00:00 +0400</lastBuildDate><image><url>img/map[gravatar:%!s(bool=false) shape:circle]</url><title>Benchmarks</title><link>https://www.giorgi.dev/tags/benchmarks/</link></image><item><title>DuckDB.NET 1.5.0 Performance: Up to 40% Faster Writes and 22% Fewer Allocations</title><link>https://www.giorgi.dev/database/duckdb-net-1-5-performance/</link><pubDate>Tue, 03 Mar 2026 05:00:00 +0400</pubDate><guid>https://www.giorgi.dev/database/duckdb-net-1-5-performance/</guid><description>&lt;h2 id="whats-new-in-duckdbnet-150">What&amp;rsquo;s New in DuckDB.NET 1.5.0&lt;/h2>
&lt;p>
&lt;a href="https://github.com/Giorgi/DuckDB.NET" target="_blank" rel="noopener">DuckDB.NET&lt;/a> 1.5.0 focuses on performance. The codebase has been optimized across multiple layers - from the low-level native interop (
&lt;a href="https://github.com/Giorgi/DuckDB.NET/commit/370ca9070710784258cde85e43a88144f8200b6f" target="_blank" rel="noopener">LibraryImport migration&lt;/a>,
&lt;a href="https://github.com/Giorgi/DuckDB.NET/commit/377c3a1c76e2117015c0b29b4896b4535ac15cce" target="_blank" rel="noopener">SuppressGCTransition&lt;/a>) to the ADO.NET provider (reader reuse, appender boxing elimination) and type converters (decimal rewrite, BigNum conversion). The results show up across every major code path: reading, writing, and type conversion.&lt;/p>
&lt;p>Note that the current pre-release (1.5.0-alpha) still uses the DuckDB 1.4.4 native library under the hood - the performance gains come entirely from improvements on the .NET side. The stable DuckDB.NET 1.5.0 release will ship with DuckDB 1.5.0 once it becomes available.&lt;/p>
&lt;h2 id="what-changed">What Changed&lt;/h2>
&lt;h3 id="libraryimport-migration">LibraryImport Migration&lt;/h3>
&lt;p>All P/Invoke declarations have been migrated from &lt;code>[DllImport]&lt;/code> to the source-generated &lt;code>[LibraryImport]&lt;/code> attribute. The runtime no longer needs to generate marshalling stubs at JIT time - the source generator produces them at compile time, eliminating stub overhead on every native call. As part of this migration, string returns now use custom marshallers (&lt;code>DuckDBOwnedStringMarshaller&lt;/code> and &lt;code>DuckDBCallerOwnedStringMarshaller&lt;/code>) that correctly and transparently handle ownership semantics - whether DuckDB or the caller is responsible for freeing the memory.&lt;/p>
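&lt;p>As a rough sketch of the migration (the signature shown is illustrative, not the library&amp;rsquo;s exact declaration), a call such as &lt;code>duckdb_data_chunk_get_size&lt;/code> goes from a runtime-marshalled &lt;code>extern&lt;/code> method to a compile-time-generated &lt;code>partial&lt;/code> method:&lt;/p>
&lt;pre>&lt;code class="language-csharp">// Before: the runtime builds a marshalling stub the first time the method is called
[DllImport("duckdb", CallingConvention = CallingConvention.Cdecl)]
internal static extern ulong duckdb_data_chunk_get_size(IntPtr chunk);

// After: the LibraryImport source generator emits the stub at compile time
[LibraryImport("duckdb")]
[UnmanagedCallConv(CallConvs = new[] { typeof(System.Runtime.CompilerServices.CallConvCdecl) })]
internal static partial ulong duckdb_data_chunk_get_size(IntPtr chunk);
&lt;/code>&lt;/pre>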
&lt;h3 id="suppressgctransition-on-fast-native-calls">SuppressGCTransition on Fast Native Calls&lt;/h3>
&lt;p>Many DuckDB C API calls are trivially fast - retrieving a vector data pointer, checking validity, getting chunk size. These calls complete in nanoseconds, but the .NET runtime&amp;rsquo;s GC transition (cooperative → preemptive → cooperative) adds measurable overhead on each call. Adding &lt;code>[SuppressGCTransition]&lt;/code> to these methods skips the transition entirely. This is only safe for native functions that execute in under a microsecond, perform no blocking syscalls or I/O, don&amp;rsquo;t call back into the runtime, don&amp;rsquo;t throw exceptions, and don&amp;rsquo;t manipulate locks. The attribute was applied to every DuckDB C API method that meets these criteria - primarily vector data and validity pointer access, chunk size queries, and similar lightweight operations.&lt;/p>
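&lt;p>As a sketch (the signature is illustrative), opting such a call out of the GC transition is a single attribute on the declaration:&lt;/p>
&lt;pre>&lt;code class="language-csharp">// Safe to suppress: returns a raw pointer immediately, never blocks,
// never calls back into the runtime, and cannot throw
[LibraryImport("duckdb")]
[SuppressGCTransition]
internal static partial IntPtr duckdb_vector_get_data(IntPtr vector);
&lt;/code>&lt;/pre>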
&lt;h3 id="aggressiveinlining-on-hot-path-methods">AggressiveInlining on Hot-Path Methods&lt;/h3>
&lt;p>&lt;code>[MethodImpl(MethodImplOptions.AggressiveInlining)]&lt;/code> was added to frequently called methods in the reader and writer paths - including &lt;code>IsValid()&lt;/code>, &lt;code>GetFieldData&amp;lt;T&amp;gt;()&lt;/code>, &lt;code>AppendValueInternal&amp;lt;T&amp;gt;()&lt;/code>, and type conversion helpers. For example, &lt;code>IsValid()&lt;/code> and &lt;code>GetFieldData&amp;lt;T&amp;gt;()&lt;/code> are called for every column of every row. At 100,000 rows and 20 columns, that&amp;rsquo;s 2 million calls per query. Inlining these small methods directly into the read/write loops removes call overhead.&lt;/p>
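&lt;p>For illustration, a validity check along these lines (simplified from the real implementation) is a prime inlining candidate - a few instructions of bit math that would otherwise pay call overhead millions of times per query:&lt;/p>
&lt;pre>&lt;code class="language-csharp">[MethodImpl(MethodImplOptions.AggressiveInlining)]
private unsafe bool IsValid(ulong row)
{
    // DuckDB validity masks are bitmaps of 64-bit words; a null mask means all rows are valid
    return validityMask == null || (validityMask[row / 64] &amp;amp; (1ul &amp;lt;&amp;lt; (int)(row % 64))) != 0;
}
&lt;/code>&lt;/pre>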
&lt;h3 id="appender-boxing-elimination">Appender Boxing Elimination&lt;/h3>
&lt;p>In 1.4.4, all &lt;code>AppendValue()&lt;/code> overloads funneled into a single generic &lt;code>AppendValueInternal&amp;lt;T&amp;gt;(T? value)&lt;/code> without a &lt;code>struct&lt;/code> constraint. When &lt;code>T&lt;/code> is a value type, the nullable wrapper was boxed on every call. In 1.5.0, this is split into &lt;code>struct&lt;/code>-constrained and &lt;code>class&lt;/code>-constrained overloads - the struct path unwraps &lt;code>Nullable&amp;lt;T&amp;gt;&lt;/code> directly via &lt;code>HasValue&lt;/code>/&lt;code>Value&lt;/code>, eliminating the boxing. With 8 columns per row, that&amp;rsquo;s 6 fewer heap allocations per row for a typical mixed-type schema.&lt;/p>
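&lt;p>A minimal sketch of the split (helper and return type names here are hypothetical): the two overloads differ in parameter type - &lt;code>Nullable&amp;lt;T&amp;gt;&lt;/code> versus &lt;code>T&lt;/code> - so the compiler picks the right one at the call site, and the struct path never touches the heap:&lt;/p>
&lt;pre>&lt;code class="language-csharp">// Struct path: unwrap Nullable&amp;lt;T&amp;gt; directly, no boxing
private DuckDBAppenderRow AppendValueInternal&amp;lt;T&amp;gt;(T? value) where T : struct
    =&amp;gt; value.HasValue ? AppendScalar(value.Value) : AppendNull();

// Class path: reference types never box in the first place
private DuckDBAppenderRow AppendValueInternal&amp;lt;T&amp;gt;(T? value) where T : class
    =&amp;gt; value is null ? AppendNull() : AppendReference(value);
&lt;/code>&lt;/pre>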
&lt;h3 id="decimal-conversion-rewrite">Decimal Conversion Rewrite&lt;/h3>
&lt;p>The decimal reader was rewritten for all four internal storage paths:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>SmallInt/Integer/BigInt paths&lt;/strong>: Replaced &lt;code>decimal.Divide(raw, powerOfTen)&lt;/code> with the direct constructor &lt;code>new decimal(Math.Abs(raw), 0, 0, raw &amp;lt; 0, scale)&lt;/code>. This constructs the decimal from its binary components without any arithmetic.&lt;/li>
&lt;li>&lt;strong>HugeInt path&lt;/strong>: Uses &lt;code>BigInteger.DivRem&lt;/code> with pre-computed &lt;code>BigInteger[]&lt;/code> powers of ten instead of repeated intermediate BigInteger arithmetic.&lt;/li>
&lt;li>&lt;strong>Pre-computed lookup tables&lt;/strong>: Static &lt;code>decimal[]&lt;/code> and &lt;code>BigInteger[]&lt;/code> arrays for powers of ten (scales 0–28 and 0–38 respectively) are computed once at startup and reused for all conversions.&lt;/li>
&lt;/ul>
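&lt;p>The integer-path trick is easy to see in isolation - both lines below produce &lt;code>123.45&lt;/code>, but the second assembles the value from its binary parts instead of dividing:&lt;/p>
&lt;pre>&lt;code class="language-csharp">int raw = 12345;   // DuckDB stores DECIMAL(5,2) 123.45 as the integer 12345
byte scale = 2;

decimal viaDivision    = decimal.Divide(raw, 100m);                         // 1.4.4 approach
decimal viaConstructor = new decimal(Math.Abs(raw), 0, 0, raw &amp;lt; 0, scale); // 1.5.0 approach
&lt;/code>&lt;/pre>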
&lt;h3 id="vector-reader-reuse-across-chunks">Vector Reader Reuse Across Chunks&lt;/h3>
&lt;p>DuckDB returns data in chunks of up to 2,048 rows. In 1.4.4, new &lt;code>VectorDataReader&lt;/code> instances were allocated for each chunk. In 1.5.0, readers implement &lt;code>Reset(IntPtr vector)&lt;/code> which updates data and validity pointers in place, reusing the same reader objects across all chunks in a result set. Composite readers (struct, list, map, decimal) override &lt;code>Reset&lt;/code> to also update their nested child readers.&lt;/p>
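&lt;p>In outline (member and method names are approximations of the real ones), &lt;code>Reset&lt;/code> simply re-points the existing reader at the next chunk&amp;rsquo;s vector:&lt;/p>
&lt;pre>&lt;code class="language-csharp">internal virtual void Reset(IntPtr vector)
{
    // Same reader object, new chunk: just swap the native pointers
    DataPointer = NativeMethods.Vectors.DuckDBVectorGetData(vector);
    ValidityMaskPointer = NativeMethods.Vectors.DuckDBVectorGetValidity(vector);
}
&lt;/code>&lt;/pre>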
&lt;h3 id="bignum-on-conversion">BIGNUM O(n) Conversion&lt;/h3>
&lt;p>The &lt;code>BIGNUM&lt;/code> (previously &lt;code>VarInt&lt;/code>) to &lt;code>BigInteger&lt;/code> conversion was rewritten from an O(n²) digit-by-digit algorithm to a direct O(n) construction using &lt;code>BigInteger&lt;/code>'s byte-span constructor. For positive values, the raw bytes are passed directly. For negative values, a byte complement is needed - small payloads (≤128 bytes) use &lt;code>stackalloc&lt;/code> for this, larger ones rent from &lt;code>ArrayPool&amp;lt;byte&amp;gt;&lt;/code>. The result: &lt;strong>93% faster&lt;/strong> reads and &lt;strong>98-99% fewer allocations&lt;/strong> at 10K–100K rows.&lt;/p>
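&lt;p>The core of the rewrite, sketched under the assumption of a little-endian magnitude payload (the actual wire-format handling in the library is more involved):&lt;/p>
&lt;pre>&lt;code class="language-csharp">static BigInteger FromBignum(ReadOnlySpan&amp;lt;byte&amp;gt; payload, bool negative)
{
    // Positive: a single O(n) pass via the byte-span constructor
    if (!negative)
    {
        return new BigInteger(payload, isUnsigned: true, isBigEndian: false);
    }

    // Negative: complement the bytes first - stackalloc for small payloads,
    // ArrayPool for large ones
    byte[] rented = null;
    Span&amp;lt;byte&amp;gt; buffer = payload.Length &amp;lt;= 128
        ? stackalloc byte[128]
        : (rented = ArrayPool&amp;lt;byte&amp;gt;.Shared.Rent(payload.Length));
    buffer = buffer.Slice(0, payload.Length);

    for (var i = 0; i &amp;lt; payload.Length; i++)
    {
        buffer[i] = (byte)~payload[i];
    }

    var result = -new BigInteger(buffer, isUnsigned: true, isBigEndian: false);
    if (rented != null)
    {
        ArrayPool&amp;lt;byte&amp;gt;.Shared.Return(rented);
    }
    return result;
}
&lt;/code>&lt;/pre>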
&lt;h2 id="benchmarks">Benchmarks&lt;/h2>
&lt;p>All benchmarks compare &lt;strong>DuckDB.NET.Data.Full 1.4.4&lt;/strong> (stable, from NuGet.org) against &lt;strong>1.5.0-alpha.35&lt;/strong> (pre-release, from GitHub) using
&lt;a href="https://github.com/dotnet/BenchmarkDotNet" target="_blank" rel="noopener">BenchmarkDotNet&lt;/a>. Each version is compiled and run independently using &lt;code>WithMsBuildArguments&lt;/code> to swap the package version at build time, so both versions run against the same benchmark code with no shared state.&lt;/p>
&lt;h3 id="reader-17-22-faster">Reader: 17-22% Faster&lt;/h3>
&lt;p>The reader benchmark creates a table with a mix of column types (INT, VARCHAR, DOUBLE, BOOLEAN, DECIMAL, BIGINT, TIMESTAMP) and reads 100,000 rows using typed getters (&lt;code>GetInt32&lt;/code>, &lt;code>GetString&lt;/code>, &lt;code>GetDecimal&lt;/code>, etc.).&lt;/p>
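&lt;p>The read loop is plain ADO.NET - no 1.5.0-specific API is needed (the table and column layout shown are just for illustration):&lt;/p>
&lt;pre>&lt;code class="language-csharp">using var command = connection.CreateCommand();
command.CommandText = "SELECT * FROM benchmark_data";
using var reader = command.ExecuteReader();

while (reader.Read())
{
    var id = reader.GetInt32(0);        // INT
    var name = reader.GetString(1);     // VARCHAR
    var price = reader.GetDecimal(4);   // DECIMAL
    // ... one typed getter per column
}
&lt;/code>&lt;/pre>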
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Job&lt;/th>
&lt;th>ColumnCount&lt;/th>
&lt;th align="right">Mean&lt;/th>
&lt;th align="right">Ratio&lt;/th>
&lt;th align="right">Allocated&lt;/th>
&lt;th align="right">Alloc Ratio&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>ReadAllColumns&lt;/td>
&lt;td>alpha-1.5.0&lt;/td>
&lt;td>5&lt;/td>
&lt;td align="right">20.45 ms&lt;/td>
&lt;td align="right">-22%&lt;/td>
&lt;td align="right">3.82 MB&lt;/td>
&lt;td align="right">-1%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ReadAllColumns&lt;/td>
&lt;td>stable-1.4.4&lt;/td>
&lt;td>5&lt;/td>
&lt;td align="right">26.12 ms&lt;/td>
&lt;td align="right">baseline&lt;/td>
&lt;td align="right">3.85 MB&lt;/td>
&lt;td align="right">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ReadAllColumns&lt;/td>
&lt;td>alpha-1.5.0&lt;/td>
&lt;td>10&lt;/td>
&lt;td align="right">44.15 ms&lt;/td>
&lt;td align="right">-17%&lt;/td>
&lt;td align="right">7.63 MB&lt;/td>
&lt;td align="right">-1%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ReadAllColumns&lt;/td>
&lt;td>stable-1.4.4&lt;/td>
&lt;td>10&lt;/td>
&lt;td align="right">53.60 ms&lt;/td>
&lt;td align="right">baseline&lt;/td>
&lt;td align="right">7.68 MB&lt;/td>
&lt;td align="right">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ReadAllColumns&lt;/td>
&lt;td>alpha-1.5.0&lt;/td>
&lt;td>20&lt;/td>
&lt;td align="right">89.56 ms&lt;/td>
&lt;td align="right">-17%&lt;/td>
&lt;td align="right">11.45 MB&lt;/td>
&lt;td align="right">-1%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>ReadAllColumns&lt;/td>
&lt;td>stable-1.4.4&lt;/td>
&lt;td>20&lt;/td>
&lt;td align="right">107.50 ms&lt;/td>
&lt;td align="right">baseline&lt;/td>
&lt;td align="right">11.56 MB&lt;/td>
&lt;td align="right">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Allocations are nearly identical because both versions allocate the same &lt;code>string&lt;/code> objects for VARCHAR columns. The speedup comes from &lt;code>LibraryImport&lt;/code>, &lt;code>SuppressGCTransition&lt;/code>, and &lt;code>AggressiveInlining&lt;/code> working together on the hot read path.&lt;/p>
&lt;h3 id="appender-20-40-faster-22-less-memory">Appender: 20-40% Faster, 22% Less Memory&lt;/h3>
&lt;p>The appender benchmark creates rows with 8 columns (INT, VARCHAR, DOUBLE, BOOLEAN, DECIMAL, TIMESTAMP, BIGINT, VARCHAR) using the &lt;code>CreateRow().AppendValue(...).EndRow()&lt;/code> API.&lt;/p>
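&lt;p>A condensed version of the benchmark&amp;rsquo;s write loop (table name and row fields are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-csharp">using var appender = connection.CreateAppender("benchmark_data");

foreach (var row in rows)
{
    appender.CreateRow()
        .AppendValue(row.Id)        // INT - struct overload, no boxing in 1.5.0
        .AppendValue(row.Name)      // VARCHAR - class overload
        .AppendValue(row.Price)     // DECIMAL
        .AppendValue(row.CreatedAt) // TIMESTAMP
        .EndRow();
}
&lt;/code>&lt;/pre>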
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Job&lt;/th>
&lt;th>RowCount&lt;/th>
&lt;th align="right">Mean&lt;/th>
&lt;th align="right">Ratio&lt;/th>
&lt;th align="right">Allocated&lt;/th>
&lt;th align="right">Alloc Ratio&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>AppendRows&lt;/td>
&lt;td>alpha-1.5.0&lt;/td>
&lt;td>10,000&lt;/td>
&lt;td align="right">19.58 ms&lt;/td>
&lt;td align="right">-20%&lt;/td>
&lt;td align="right">8.85 MB&lt;/td>
&lt;td align="right">-22%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AppendRows&lt;/td>
&lt;td>stable-1.4.4&lt;/td>
&lt;td>10,000&lt;/td>
&lt;td align="right">24.41 ms&lt;/td>
&lt;td align="right">baseline&lt;/td>
&lt;td align="right">11.36 MB&lt;/td>
&lt;td align="right">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AppendRows&lt;/td>
&lt;td>alpha-1.5.0&lt;/td>
&lt;td>100,000&lt;/td>
&lt;td align="right">86.90 ms&lt;/td>
&lt;td align="right">-41%&lt;/td>
&lt;td align="right">89.20 MB&lt;/td>
&lt;td align="right">-22%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AppendRows&lt;/td>
&lt;td>stable-1.4.4&lt;/td>
&lt;td>100,000&lt;/td>
&lt;td align="right">147.46 ms&lt;/td>
&lt;td align="right">baseline&lt;/td>
&lt;td align="right">114.38 MB&lt;/td>
&lt;td align="right">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AppendRows&lt;/td>
&lt;td>alpha-1.5.0&lt;/td>
&lt;td>1,000,000&lt;/td>
&lt;td align="right">803.55 ms&lt;/td>
&lt;td align="right">-40%&lt;/td>
&lt;td align="right">899.61 MB&lt;/td>
&lt;td align="right">-22%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AppendRows&lt;/td>
&lt;td>stable-1.4.4&lt;/td>
&lt;td>1,000,000&lt;/td>
&lt;td align="right">1,349.50 ms&lt;/td>
&lt;td align="right">baseline&lt;/td>
&lt;td align="right">1151.38 MB&lt;/td>
&lt;td align="right">&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The consistent 22% allocation reduction is the boxing elimination at work. At 1 million rows with 8 columns, that&amp;rsquo;s millions of heap allocations removed. The 40% speed gain comes from boxing elimination, reduced GC pressure from fewer allocations, and &lt;code>AggressiveInlining&lt;/code> on the write path.&lt;/p>
&lt;h2 id="summary">Summary&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Area&lt;/th>
&lt;th>Speedup&lt;/th>
&lt;th>Memory Reduction&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Reader (mixed types)&lt;/td>
&lt;td>17-22%&lt;/td>
&lt;td>~1%&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Appender (8 columns)&lt;/td>
&lt;td>20-40%&lt;/td>
&lt;td>22%&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>These improvements require no code changes - just update the DuckDB.NET.Data package to 1.5.0 and the optimizations apply automatically.&lt;/p>
&lt;p>In the next post, I&amp;rsquo;ll cover the API improvements in 1.5.0 - including a simplified API for scalar and table user-defined functions.&lt;/p></description></item></channel></rss>