DuckDB | Giorgi Dalakishvili | Personal Website

DuckDB.NET 1.5.0: Simplified Scalar Functions, Table Functions, and NULL Handling

Tue, 10 Mar 2026 13:00:00 +0400

In the previous post, I covered the performance improvements in DuckDB.NET 1.5.0. This post covers the API side: a new high-level registration API for scalar and table user-defined functions, named parameter support for table functions, and explicit NULL handling for scalar UDFs.

Scalar Functions: Before and After

DuckDB.NET has supported scalar UDFs since version 1.0. The existing API gives you full control over vectors and row iteration - but even simple functions require boilerplate.

Here’s an is_prime function using the low-level API:

connection.RegisterScalarFunction<int, bool>("is_prime", (readers, writer, rowCount) =>
{
for (ulong index = 0; index < rowCount; index++)
{
var value = readers[0].GetValue<int>(index);
var prime = true;
for (int i = 2; i <= Math.Sqrt(value); i++)
{
if (value % i == 0)
{
prime = false;
break;
}
}
writer.WriteValue(prime, index);
}
});

You have to manually iterate over rows, read values from input vectors by index, and write results to the output vector. This works, but it’s a lot of ceremony for a function that just checks if a number is prime.

High-Level Scalar Functions

The new simplified API lets you register a scalar function with a plain Func<> delegate. The framework handles row iteration and vector access automatically:

connection.RegisterScalarFunction<int, bool>("is_prime", IsPrime);

That’s it. The library wraps your function, iterates over each row in the chunk, reads the input value, calls your function, and writes the result.

The API supports zero to four parameters, plus variable arguments:

// Zero parameters
connection.RegisterScalarFunction("the_answer", () => 42);
// Two parameters
connection.RegisterScalarFunction<long, long, long>("add", (a, b) => a + b);
// Three parameters
connection.RegisterScalarFunction<int, int, int, int>("clamp",
(value, min, max) => Math.Clamp(value, min, max));
// Variable arguments
connection.RegisterScalarFunction("sum_all", (long[] args) => args.Sum());

Variable argument functions receive all inputs as an array. DuckDB allows calling them with any number of arguments:

SELECT sum_all(1, 2, 3); -- 6
SELECT sum_all(10); -- 10
SELECT sum_all(); -- 0

Dynamic Type Support

The simplified API also supports object as an input type, which maps to DuckDB’s ANY type. This lets you write functions that accept any column type:

connection.RegisterScalarFunction<object, string>("net_type_name",
value => value.GetType().Name);

SELECT net_type_name(42); -- 'Int32'
SELECT net_type_name(42::BIGINT); -- 'Int64'
SELECT net_type_name('hello'); -- 'String'
SELECT net_type_name('2024-01-01'::DATE); -- 'DateOnly'

You can combine object with typed parameters. Here’s a .NET format function that accepts any formattable value:

connection.RegisterScalarFunction("format_net",
(object value, string format) => value is IFormattable f
? f.ToString(format, CultureInfo.InvariantCulture)
: value.ToString());

SELECT format_net(255, 'X'); -- 'FF'
SELECT format_net(0.15, 'P'); -- '15.00 %'
SELECT format_net('2024-11-06'::DATE, 'yyyy/MM/dd'); -- '2024/11/06'

NULL Handling for Scalar Functions

By default, DuckDB short-circuits NULLs automatically: if any input to a scalar function is NULL, the result is NULL and your function is never called. This is the correct behavior for most functions, but sometimes you need to handle NULLs explicitly - for example, to implement COALESCE-like logic or to return a default value.

With the high-level API, the opt-in is the type signature itself. Use a nullable parameter type and DuckDB.NET automatically detects it at registration time and configures the function to receive NULLs:

connection.RegisterScalarFunction<int?, string>("describe_val",
x => x.HasValue ? x.Value.ToString() : "nothing");

SELECT describe_val(42); -- '42'
SELECT describe_val(NULL::INT); -- 'nothing'

This works for reference types too - string? (with nullable annotations enabled) opts into null handling, while string keeps the default NULL propagation:

connection.RegisterScalarFunction<string?, string>("echo_or_default",
s => s ?? "was_null");

SELECT echo_or_default(NULL::VARCHAR); -- 'was_null'
SELECT echo_or_default('hello'); -- 'hello'

You can mix nullable and non-nullable parameters. If any parameter is nullable, the function receives NULLs - but non-nullable parameters will throw a clear error if they get one:

connection.RegisterScalarFunction<int?, int, string>("coalesce_add",
(a, b) => a.HasValue ? (a.Value + b).ToString() : b.ToString());

SELECT coalesce_add(NULL::INT, 5); -- '5'
SELECT coalesce_add(10, 5); -- '15'

NULL handling also works with the low-level vector API. In scalar function callbacks, GetValue<T> returns null for NULL rows when T is a nullable type (int?, string, etc.) instead of throwing, so you can write:

connection.RegisterScalarFunction<string, string>("echo_nullable", (readers, writer, rowCount) =>
{
for (ulong i = 0; i < rowCount; i++)
{
var value = readers[0].GetValue<string>(i);
writer.WriteValue(value ?? "was_null", i);
}
}, new() { HandlesNulls = true });

Table Functions: Before and After

The low-level table function API requires you to define columns, return a TableFunction with data, and provide a mapper callback that writes each row to output vectors:

connection.RegisterTableFunction<int>("employees", parameters =>
{
var count = parameters[0].GetValue<int>();
var employees = Enumerable.Range(1, count)
.Select(i => new Employee(i, $"Employee{i}", 50000 + i * 100));
return new TableFunction(
new List<ColumnInfo> { new("id", typeof(int)), new("name", typeof(string)) },
employees);
},
(item, writers, rowIndex) =>
{
var employee = (Employee)item!;
writers[0].WriteValue(employee.Id, rowIndex);
writers[1].WriteValue(employee.Name, rowIndex);
});

The column definitions and mapper are separate, so they can easily get out of sync - add a column to the schema but forget the mapper, or reorder them differently.

High-Level Table Functions

The new API uses a projection expression to define both the columns and the mapper in one place:

connection.RegisterTableFunction("employees",
(int count) => GetEmployees(count),
e => new { e.Id, e.Name });

The first argument is the function name. The second is a data function that receives the SQL parameters and returns an IEnumerable<T>. The third is a projection expression that defines which properties to expose as columns - column names and types are extracted automatically from the expression.

The projection supports anonymous types, object initializers, computed columns, and single properties:

// Computed columns
connection.RegisterTableFunction("ext_computed",
(int count) => GetEmployees(count),
e => new { FullName = "Dr. " + e.Name, DoubleSalary = e.Salary * 2 });
// Object initializer
connection.RegisterTableFunction("ext_init",
(int count) => GetEmployees(count),
e => new EmployeeDto { Id = e.Id, Name = e.Name });

Like the scalar API, table functions support zero to four parameters:

// No parameters
connection.RegisterTableFunction("all_employees",
() => employees.AsEnumerable(),
e => new { e.Id, e.Name });
// Four parameters
connection.RegisterTableFunction("ext_four",
(int start, int count, string prefix, double multiplier) =>
Enumerable.Range(start, count).Select(i => new Employee(i, $"{prefix}{i}", i * multiplier)),
e => new { e.Id, e.Name, e.Salary });

Async data sources work too - just call ToBlockingEnumerable():

connection.RegisterTableFunction("ext_async",
(int count) => FetchEmployeesAsync(count).ToBlockingEnumerable(),
e => new { e.Id, e.Name });

Named Parameters for Table Functions

DuckDB supports named parameters for table functions. DuckDB.NET 1.5.0 exposes this with the [Named] attribute:

connection.RegisterTableFunction("employees",
(int count, [Named] string? prefix) =>
GetEmployees(count).Select(e => e with { Name = (prefix ?? "") + e.Name }),
e => new { e.Id, e.Name });

In SQL, positional parameters come first, and named parameters use the = syntax:

SELECT * FROM employees(3, prefix = 'Dr. ');
-- (1, 'Dr. Employee1'), (2, 'Dr. Employee2'), (3, 'Dr. Employee3')

-- Named parameters are optional - omit them and they're NULL
SELECT * FROM employees(3);
-- (1, 'Employee1'), (2, 'Employee2'), (3, 'Employee3')

You can have multiple named parameters:

connection.RegisterTableFunction("employees",
(int count, [Named] string? prefix, [Named] double? multiplier) =>
GetEmployees(count).Select(e => e with
{
Name = (prefix ?? "") + e.Name,
Salary = e.Salary * (multiplier ?? 1)
}),
e => new { e.Id, e.Name, e.Salary });

-- Provide both
SELECT * FROM employees(2, prefix = 'X', multiplier = 2.0);
-- Provide only one
SELECT * FROM employees(2, prefix = 'Y');

By default, the SQL parameter name matches the C# parameter name. You can override it with a custom name:

connection.RegisterTableFunction("employees",
(int count, [Named("max_rows")] int? limit) => GetEmployees(limit ?? count),
e => new { e.Id, e.Name });

SELECT * FROM employees(10, max_rows = 2);

Named parameters should typically be nullable types (string?, int?) since they’re optional in SQL. If a named parameter is non-nullable and the caller omits it, DuckDB.NET throws a clear error:

Table function 'employees' named parameter 'limit' is NULL, but parameter type 'Int32' is non-nullable.

Summary

DuckDB.NET 1.5.0 adds four improvements to user-defined functions:

High-level scalar functions: Register with a plain Func<> instead of manually iterating vectors. Supports 0–4 parameters, variable arguments, and dynamic object type.
High-level table functions: Define columns and mapping with a single projection expression. Supports 0–4 parameters, computed columns, and async data sources.
NULL handling for scalar functions: Use nullable parameter types (int?, string?) and DuckDB.NET automatically configures the function to receive NULLs instead of short-circuiting.
Named parameters for table functions: The [Named] attribute maps C# parameters to DuckDB’s = syntax for optional, named arguments.

The low-level vector-based APIs remain available for cases where you need direct control over chunk processing.

DuckDB.NET 1.5.0 Performance: Up to 40% Faster Writes and 22% Fewer Allocations

Tue, 03 Mar 2026 05:00:00 +0400

What’s New in DuckDB.NET 1.5.0

DuckDB.NET 1.5.0 focuses on performance. The codebase has been optimized across multiple layers - from the low-level native interop ( LibraryImport migration, SuppressGCTransition) to the ADO.NET provider (reader reuse, appender boxing elimination) and type converters (decimal rewrite, BigNum conversion). The results show up across every major code path: reading, writing, and type conversion.

Note that the current pre-release (1.5.0-alpha) still uses the DuckDB 1.4.4 native library under the hood - the performance gains come entirely from improvements on the .NET side. The stable DuckDB.NET 1.5.0 release will ship with DuckDB 1.5.0 once it becomes available.

What Changed

LibraryImport Migration

All P/Invoke declarations have been migrated from [DllImport] to the source-generated [LibraryImport]. The runtime no longer needs to generate marshalling stubs at JIT time - the source generator produces them at compile time, eliminating stub overhead on every native call. As part of this migration, string returns now use custom marshallers (DuckDBOwnedStringMarshaller and DuckDBCallerOwnedStringMarshaller) that correctly and transparently handle ownership semantics - whether DuckDB or the caller is responsible for freeing the memory.

SuppressGCTransition on Fast Native Calls

Many DuckDB C API calls are trivially fast - retrieving a vector data pointer, checking validity, getting chunk size. These calls complete in nanoseconds, but the .NET runtime’s GC transition (cooperative → preemptive → cooperative) adds measurable overhead on each call. Adding [SuppressGCTransition] to these methods skips the transition entirely. This is only safe for native functions that execute in under a microsecond, perform no blocking syscalls or I/O, don’t call back into the runtime, don’t throw exceptions, and don’t manipulate locks. The attribute was applied to every DuckDB C API method that meets these criteria - primarily vector data and validity pointer access, chunk size queries, and similar lightweight operations.

AggressiveInlining on Hot-Path Methods

[MethodImpl(MethodImplOptions.AggressiveInlining)] was added to frequently called methods in the reader and writer paths - including IsValid(), GetFieldData<T>(), AppendValueInternal<T>(), and type conversion helpers. For example, IsValid() and GetFieldData<T>() are called for every column of every row. At 100,000 rows and 20 columns, that’s 2 million calls per query. Inlining these small methods directly into the read/write loops removes call overhead.

Appender Boxing Elimination

In 1.4.4, all AppendValue() overloads funneled into a single generic AppendValueInternal<T>(T? value) without a struct constraint. When T is a value type, the nullable wrapper was boxed on every call. In 1.5.0, this is split into struct-constrained and class-constrained overloads - the struct path unwraps Nullable<T> directly via HasValue/Value, eliminating the boxing. With 8 columns per row, that’s 6 fewer heap allocations per row for a typical mixed-type schema.

Decimal Conversion Rewrite

The decimal reader was rewritten for all four internal storage paths:

SmallInt/Integer/BigInt paths: Replaced decimal.Divide(raw, powerOfTen) with the direct constructor new decimal(Math.Abs(raw), 0, 0, raw < 0, scale). This constructs the decimal from its binary components without any arithmetic.
HugeInt path: Uses BigInteger.DivRem with pre-computed BigInteger[] powers of ten instead of repeated intermediate BigInteger arithmetic.
Pre-computed lookup tables: Static decimal[] and BigInteger[] arrays for powers of ten (scales 0–28 and 0–38 respectively) are computed once at startup and reused for all conversions.

Vector Reader Reuse Across Chunks

DuckDB returns data in chunks of up to 2,048 rows. In 1.4.4, new VectorDataReader instances were allocated for each chunk. In 1.5.0, readers implement Reset(IntPtr vector) which updates data and validity pointers in place, reusing the same reader objects across all chunks in a result set. Composite readers (struct, list, map, decimal) override Reset to also update their nested child readers.

BIGNUM O(n) Conversion

The BIGNUM (previously VarInt) to BigInteger conversion was rewritten from an O(n²) digit-by-digit algorithm to a direct O(n) construction using BigInteger's byte-span constructor. For positive values, the raw bytes are passed directly. For negative values, a byte complement is needed - small payloads (≤128 bytes) use stackalloc for this, larger ones rent from ArrayPool<byte>. The result: 93% faster reads and 98-99% fewer allocations at 10K–100K rows.

Benchmarks

All benchmarks compare DuckDB.NET.Data.Full 1.4.4 (stable, from NuGet.org) against 1.5.0-alpha.35 (pre-release, from GitHub) using BenchmarkDotNet. Each version is compiled and run independently using WithMsBuildArguments to swap the package version at build time, so both versions run against the same benchmark code with no shared state.

Reader: 17-22% Faster

The reader benchmark creates a table with a mix of column types (INT, VARCHAR, DOUBLE, BOOLEAN, DECIMAL, BIGINT, TIMESTAMP) and reads 100,000 rows using typed getters (GetInt32, GetString, GetDecimal, etc.).

Method	Job	ColumnCount	Mean	Ratio	Allocated	Alloc Ratio
ReadAllColumns	alpha-1.5.0	5	20.45 ms	-22%	3.82 MB	-1%
ReadAllColumns	stable-1.4.4	5	26.12 ms	baseline	3.85 MB

ReadAllColumns	alpha-1.5.0	10	44.15 ms	-17%	7.63 MB	-1%
ReadAllColumns	stable-1.4.4	10	53.60 ms	baseline	7.68 MB

ReadAllColumns	alpha-1.5.0	20	89.56 ms	-17%	11.45 MB	-1%
ReadAllColumns	stable-1.4.4	20	107.50 ms	baseline	11.56 MB

Allocations are nearly identical because both versions allocate the same string objects for VARCHAR columns. The speedup comes from LibraryImport, SuppressGCTransition, and AggressiveInlining working together on the hot read path.

Appender: 20-40% Faster, 22% Less Memory

The appender benchmark creates rows with 8 columns (INT, VARCHAR, DOUBLE, BOOLEAN, DECIMAL, TIMESTAMP, BIGINT, VARCHAR) using the CreateRow().AppendValue(...).EndRow() API.

Method	Job	RowCount	Mean	Ratio	Allocated	Alloc Ratio
AppendRows	alpha-1.5.0	10,000	19.58 ms	-20%	8.85 MB	-22%
AppendRows	stable-1.4.4	10,000	24.41 ms	baseline	11.36 MB

AppendRows	alpha-1.5.0	100,000	86.90 ms	-41%	89.20 MB	-22%
AppendRows	stable-1.4.4	100,000	147.46 ms	baseline	114.38 MB

AppendRows	alpha-1.5.0	1,000,000	803.55 ms	-40%	899.61 MB	-22%
AppendRows	stable-1.4.4	1,000,000	1,349.50 ms	baseline	1151.38 MB

The consistent 22% allocation reduction is the boxing elimination at work. At 1 million rows with 8 columns, that’s millions of heap allocations removed. The 40% speed gain comes from boxing elimination, reduced GC pressure from fewer allocations, and AggressiveInlining on the write path.

Summary

Area	Speedup	Memory Reduction
Reader (mixed types)	17-22%	~1%
Appender (8 columns)	20-40%	22%

These improvements require no code changes - just update the DuckDB.NET.Data package to 1.5.0 and the optimizations apply automatically.

In the next post, I’ll cover the API improvements in 1.5.0 - including a simplified API for scalar and table user-defined functions.