DuckDB.NET 1.5.0: Simplified Scalar Functions, Table Functions, and NULL Handling

In the previous post, I covered the performance improvements in DuckDB.NET 1.5.0. This post covers the API side: a new high-level registration API for scalar and table user-defined functions, named parameter support for table functions, and explicit NULL handling for scalar UDFs.

Scalar Functions: Before and After

DuckDB.NET has supported scalar UDFs since version 1.0. The existing API gives you full control over vectors and row iteration - but even simple functions require boilerplate.

Here’s an is_prime function using the low-level API:

connection.RegisterScalarFunction<int, bool>("is_prime", (readers, writer, rowCount) =>
{
    for (ulong index = 0; index < rowCount; index++)
    {
        var value = readers[0].GetValue<int>(index);
        var prime = true;

        for (int i = 2; i <= Math.Sqrt(value); i++)
        {
            if (value % i == 0)
            {
                prime = false;
                break;
            }
        }

        writer.WriteValue(prime, index);
    }
});

You have to manually iterate over rows, read values from input vectors by index, and write results to the output vector. This works, but it’s a lot of ceremony for a function that just checks if a number is prime.

High-Level Scalar Functions

The new simplified API lets you register a scalar function with a plain Func<> delegate. The framework handles row iteration and vector access automatically:

connection.RegisterScalarFunction<int, bool>("is_prime", IsPrime);

That’s it. The library wraps your function, iterates over each row in the chunk, reads the input value, calls your function, and writes the result.

The API supports zero to four parameters, plus variable arguments:

// Zero parameters
connection.RegisterScalarFunction("the_answer", () => 42);

// Two parameters
connection.RegisterScalarFunction<long, long, long>("add", (a, b) => a + b);

// Three parameters
connection.RegisterScalarFunction<int, int, int, int>("clamp",
    (value, min, max) => Math.Clamp(value, min, max));

// Variable arguments
connection.RegisterScalarFunction("sum_all", (long[] args) => args.Sum());

Variable argument functions receive all inputs as an array. DuckDB allows calling them with any number of arguments:

SELECT sum_all(1, 2, 3);    -- 6
SELECT sum_all(10);          -- 10
SELECT sum_all();            -- 0

Dynamic Type Support

The simplified API also supports object as an input type, which maps to DuckDB’s ANY type. This lets you write functions that accept any column type:

connection.RegisterScalarFunction<object, string>("net_type_name",
    value => value.GetType().Name);
SELECT net_type_name(42);                -- 'Int32'
SELECT net_type_name(42::BIGINT);        -- 'Int64'
SELECT net_type_name('hello');           -- 'String'
SELECT net_type_name('2024-01-01'::DATE); -- 'DateOnly'

You can combine object with typed parameters. Here’s a .NET format function that accepts any formattable value:

connection.RegisterScalarFunction("format_net",
    (object value, string format) => value is IFormattable f
        ? f.ToString(format, CultureInfo.InvariantCulture)
        : value.ToString());
SELECT format_net(255, 'X');                     -- 'FF'
SELECT format_net(0.15, 'P');                    -- '15.00 %'
SELECT format_net('2024-11-06'::DATE, 'yyyy/MM/dd'); -- '2024/11/06'

NULL Handling for Scalar Functions

By default, DuckDB short-circuits NULLs automatically: if any input to a scalar function is NULL, the result is NULL and your function is never called. This is the correct behavior for most functions, but sometimes you need to handle NULLs explicitly - for example, to implement COALESCE-like logic or to return a default value.

With the high-level API, the opt-in is the type signature itself. Use a nullable parameter type and DuckDB.NET automatically detects it at registration time and configures the function to receive NULLs:

connection.RegisterScalarFunction<int?, string>("describe_val",
    x => x.HasValue ? x.Value.ToString() : "nothing");
SELECT describe_val(42);        -- '42'
SELECT describe_val(NULL::INT); -- 'nothing'

This works for reference types too - string? (with nullable annotations enabled) opts into null handling, while string keeps the default NULL propagation:

connection.RegisterScalarFunction<string?, string>("echo_or_default",
    s => s ?? "was_null");
SELECT echo_or_default(NULL::VARCHAR); -- 'was_null'
SELECT echo_or_default('hello');       -- 'hello'

You can mix nullable and non-nullable parameters. If any parameter is nullable, the function receives NULLs - but non-nullable parameters will throw a clear error if they get one:

connection.RegisterScalarFunction<int?, int, string>("coalesce_add",
    (a, b) => a.HasValue ? (a.Value + b).ToString() : b.ToString());
SELECT coalesce_add(NULL::INT, 5); -- '5'
SELECT coalesce_add(10, 5);       -- '15'

NULL handling also works with the low-level vector API. In scalar function callbacks, GetValue<T> returns null for NULL rows when T is a nullable type (int?, string, etc.) instead of throwing, so you can write:

connection.RegisterScalarFunction<string, string>("echo_nullable", (readers, writer, rowCount) =>
{
    for (ulong i = 0; i < rowCount; i++)
    {
        var value = readers[0].GetValue<string>(i);
        writer.WriteValue(value ?? "was_null", i);
    }
}, new() { HandlesNulls = true });

Table Functions: Before and After

The low-level table function API requires you to define columns, return a TableFunction with data, and provide a mapper callback that writes each row to output vectors:

connection.RegisterTableFunction<int>("employees", parameters =>
{
    var count = parameters[0].GetValue<int>();
    var employees = Enumerable.Range(1, count)
        .Select(i => new Employee(i, $"Employee{i}", 50000 + i * 100));

    return new TableFunction(
        new List<ColumnInfo> { new("id", typeof(int)), new("name", typeof(string)) },
        employees);
},
(item, writers, rowIndex) =>
{
    var employee = (Employee)item!;
    writers[0].WriteValue(employee.Id, rowIndex);
    writers[1].WriteValue(employee.Name, rowIndex);
});

The column definitions and mapper are separate, so they can easily get out of sync - add a column to the schema but forget the mapper, or reorder them differently.

High-Level Table Functions

The new API uses a projection expression to define both the columns and the mapper in one place:

connection.RegisterTableFunction("employees",
    (int count) => GetEmployees(count),
    e => new { e.Id, e.Name });

The first argument is the function name. The second is a data function that receives the SQL parameters and returns an IEnumerable<T>. The third is a projection expression that defines which properties to expose as columns - column names and types are extracted automatically from the expression.

The projection supports anonymous types, object initializers, computed columns, and single properties:

// Computed columns
connection.RegisterTableFunction("ext_computed",
    (int count) => GetEmployees(count),
    e => new { FullName = "Dr. " + e.Name, DoubleSalary = e.Salary * 2 });

// Object initializer
connection.RegisterTableFunction("ext_init",
    (int count) => GetEmployees(count),
    e => new EmployeeDto { Id = e.Id, Name = e.Name });

Like the scalar API, table functions support zero to four parameters:

// No parameters
connection.RegisterTableFunction("all_employees",
    () => employees.AsEnumerable(),
    e => new { e.Id, e.Name });

// Four parameters
connection.RegisterTableFunction("ext_four",
    (int start, int count, string prefix, double multiplier) =>
        Enumerable.Range(start, count).Select(i => new Employee(i, $"{prefix}{i}", i * multiplier)),
    e => new { e.Id, e.Name, e.Salary });

Async data sources work too - just call ToBlockingEnumerable():

connection.RegisterTableFunction("ext_async",
    (int count) => FetchEmployeesAsync(count).ToBlockingEnumerable(),
    e => new { e.Id, e.Name });

Named Parameters for Table Functions

DuckDB supports named parameters for table functions. DuckDB.NET 1.5.0 exposes this with the [Named] attribute:

connection.RegisterTableFunction("employees",
    (int count, [Named] string? prefix) =>
        GetEmployees(count).Select(e => e with { Name = (prefix ?? "") + e.Name }),
    e => new { e.Id, e.Name });

In SQL, positional parameters come first, and named parameters use the = syntax:

SELECT * FROM employees(3, prefix = 'Dr. ');
-- (1, 'Dr. Employee1'), (2, 'Dr. Employee2'), (3, 'Dr. Employee3')

-- Named parameters are optional - omit them and they're NULL
SELECT * FROM employees(3);
-- (1, 'Employee1'), (2, 'Employee2'), (3, 'Employee3')

You can have multiple named parameters:

connection.RegisterTableFunction("employees",
    (int count, [Named] string? prefix, [Named] double? multiplier) =>
        GetEmployees(count).Select(e => e with
        {
            Name = (prefix ?? "") + e.Name,
            Salary = e.Salary * (multiplier ?? 1)
        }),
    e => new { e.Id, e.Name, e.Salary });
-- Provide both
SELECT * FROM employees(2, prefix = 'X', multiplier = 2.0);

-- Provide only one
SELECT * FROM employees(2, prefix = 'Y');

By default, the SQL parameter name matches the C# parameter name. You can override it with a custom name:

connection.RegisterTableFunction("employees",
    (int count, [Named("max_rows")] int? limit) => GetEmployees(limit ?? count),
    e => new { e.Id, e.Name });
SELECT * FROM employees(10, max_rows = 2);

Named parameters should typically be nullable types (string?, int?) since they’re optional in SQL. If a named parameter is non-nullable and the caller omits it, DuckDB.NET throws a clear error:

Table function 'employees' named parameter 'limit' is NULL, but parameter type 'Int32' is non-nullable.

Summary

DuckDB.NET 1.5.0 adds four improvements to user-defined functions:

  • High-level scalar functions: Register with a plain Func<> instead of manually iterating vectors. Supports 0–4 parameters, variable arguments, and dynamic object type.
  • High-level table functions: Define columns and mapping with a single projection expression. Supports 0–4 parameters, computed columns, and async data sources.
  • NULL handling for scalar functions: Use nullable parameter types (int?, string?) and DuckDB.NET automatically configures the function to receive NULLs instead of short-circuiting.
  • Named parameters for table functions: The [Named] attribute maps C# parameters to DuckDB’s = syntax for optional, named arguments.

The low-level vector-based APIs remain available for cases where you need direct control over chunk processing.

Avatar
Giorgi Dalakishvili
World-Class Software Engineer

Related