What are User-Defined Functions, and what do they have to do with WebAssembly?

When I first started talking with people in tech about my ideas for Suborbital as a company, the conversations were largely about the utility and future promise of WebAssembly as a technology. Back in late 2020, I had about half a dozen front-runner ideas for how my open source work could be applied. It took many weeks of conversations to narrow in on the one that stood out as the 'oh damn, that's really cool' idea that I could extrapolate into an exciting product: UDFs, or User-Defined Functions.

A UDF is a piece of logic, often expressed as a block of code, that the user or consumer of a piece of software provides in order to modify the behaviour of that software. Let's start with an example in pseudocode:

func filter_tweets(tweets []Tweet) -> []Tweet {
    filtered = []Tweet{}

    for tweet in tweets {
        if (tweet.Contents.Contains("web3") == false) {
            filtered.push(tweet)
        }
    }

    return filtered
}

This is a function that takes in a list of Tweets and returns only those that do not mention web3. This is the kind of logic that seems like it could be set up using the settings UI of an app, but what happens when you want to add more complex logic like sentiment analysis, conditionals, or (god forbid) RegEx? The ability to modify software using... software just makes sense in a lot of situations.
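To make the RegEx case concrete, here is a minimal runnable sketch of the same filter in Go. The Tweet type, its fields, and the pattern are assumptions made for illustration; a real platform would define its own data types and calling convention for UDFs.

```go
package main

import (
	"fmt"
	"regexp"
)

// Tweet is a hypothetical type standing in for whatever the host
// application would pass to a user-defined function.
type Tweet struct {
	Author   string
	Contents string
}

// web3Pattern matches "web3" or "web 3" as a whole word, ignoring case.
// This is the kind of rule that is awkward to express in a settings UI.
var web3Pattern = regexp.MustCompile(`(?i)\bweb\s?3\b`)

// FilterTweets returns the tweets whose contents do not match web3Pattern.
func FilterTweets(tweets []Tweet) []Tweet {
	filtered := make([]Tweet, 0, len(tweets))
	for _, t := range tweets {
		if !web3Pattern.MatchString(t.Contents) {
			filtered = append(filtered, t)
		}
	}
	return filtered
}

func main() {
	tweets := []Tweet{
		{Author: "a", Contents: "Shipping a new Wasm runtime today"},
		{Author: "b", Contents: "Web3 will change everything"},
		{Author: "c", Contents: "web 3 is the future"},
	}
	// Only the tweets that survive the filter are printed.
	for _, t := range FilterTweets(tweets) {
		fmt.Println(t.Author+":", t.Contents)
	}
}
```

The point is less the pattern itself than who gets to write it: the user supplies the rule as code, rather than waiting for the application to grow a setting for it.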

The hard part

The scary part of UDFs is the 'U'... Users running code within a product is a terrifying proposition, especially in the time of cryptojacking, cybersecurity incidents, and vulnerabilities like Log4j and Spectre/Meltdown. While we all want to assume the best from the people using our services, there is always the potential for nefarious activity to cause real harm to software and infrastructure if anyone is allowed to run code on your servers.

This is why UDFs are increasingly being associated with WebAssembly in 2022. Wasm provides an ideal environment in which to run untrusted code. The sandbox that Wasm code is executed in is closed-by-default. This means that any attempt to access the outside world from your function cannot even begin executing unless it has been explicitly allowed by the host it's running on. Contrast this with something like a container, which is open-by-default: additional measures such as network policies need to be put in place to stop malicious activity after an attempt has already begun.

As an example, if I am a malicious developer hoping to gain access to privileged information available on internal APIs, I could try to make HTTP requests to other services on the network in order to exfiltrate some data. In the case of UDFs, my malicious code is running within the application’s own infrastructure, which would normally be quite dangerous. WebAssembly’s sandboxing allows calls out to the host (such as accessing the network) to be filtered, ensuring that when my malicious code tries to access a resource it’s not authorized for, the attempt can be stopped before it even leaves the process.

In addition to the strong host-controlled sandbox, WebAssembly is also memory-safe, using a linear memory space and strict bounds checks to ensure that Wasm code cannot read or write any memory outside of its own runtime instance. This is very important when running multiple untrusted workloads from different sources on the same infrastructure, i.e. multi-tenancy. Allowing code to execute on the same host while guaranteeing memory separation is a big win for security and performance, as it allows more running code to be packed into the available hardware resources.

UDFs today

Out in the world today, you can see several examples of UDFs powered by WebAssembly. Our friends at Redpanda are using Wasm to run message transforms directly on a stream of data passing through the brokers of their streaming platform. TiDB, a distributed SQL database, built a Wasm-based UDF engine to make extending and customizing your database easier. I’d also be remiss if I didn’t mention Suborbital Compute, our platform for adding UDFs to any SaaS application. These are all examples of WebAssembly being used to safely run user logic within a larger application.

As the sophistication of hosted software platforms progresses, UDFs are a natural evolution to maximize the utility of the applications we rely on. Software products that are built with this kind of flexibility in mind will be able to scale to handle the use-cases of a much wider audience without their developers being bogged down in an endless backlog of feature requests. We think WebAssembly is the best way to do this securely, and we’re excited to see how software evolves with these capabilities made simpler.