LEV -- Async I/O Runtime

Overview

This document defines the public contract for the lev module (src/lev/).

LEV is a single-threaded, coroutine-based async I/O runtime built on Linux epoll. It provides cooperative multitasking with timers, UDP sockets, TCP sockets with TLS, cancellation tokens, and structured concurrency combinators.

The runtime is opt-in and scoped: lev.run() creates an event loop for its dynamic extent. All networking code (HTTP, DNS, Redis) requires a lev.run() context.

Architecture: C core (lev.core) is the scheduling authority -- only the event loop resumes coroutines. The Lua wrapper (lev.lua) manages task lifecycle, cancellation, and combinators.

Event loop

lev.run(fn) creates an epoll-based event loop, spawns fn as the main coroutine, and runs until all coroutines complete. Returns the main function's results, or nil, err if it errored.

local lev = require("lev")

local result, err = lev.run(function()
    -- async code here
    return 42
end)
-- result == 42

Nested lev.run() calls are not allowed (raises a Lua error()).

Error protocol

All I/O operations return nil, err_string on failure. Error strings:

| Error | Meaning |
|---|---|
| "timeout" | Operation timed out |
| "cancelled" | Cancel token was triggered |
| "closed" | Peer closed the connection (TCP/TLS) |
| "EAGAIN" | Would block (low-level; wrapped by Lua layer) |
| "EINPROGRESS" | Connect in progress (low-level; wrapped by lev.connect()) |
| "socket is closed" | Operation on a closed socket |
| strerror() | System error string for other errno values |
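
Because failures are plain strings, callers usually branch on the string itself. The following is a minimal sketch, assuming a hypothetical helper named retry_on_timeout and an arbitrary operation op that follows the nil, err convention. The retry policy is illustrative and not part of lev.

```lua
-- Hypothetical helper (not part of lev): retry an operation that follows
-- the `nil, err` convention, but only when it failed with "timeout".
local function retry_on_timeout(op, max_attempts)
    local last_err
    for attempt = 1, max_attempts do
        local result, err = op(attempt)
        if result ~= nil then
            return result
        end
        last_err = err
        -- "cancelled", "closed" and system errors are not retried.
        if err ~= "timeout" then
            break
        end
    end
    return nil, last_err
end
```

Wrapping a call such as sock:recvfrom(4096, 2.0) in op would retry the receive a bounded number of times while still surfacing "cancelled" or "closed" to the caller immediately.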

Spawning tasks

lev.spawn(fn, opts) creates a new coroutine that runs concurrently. Returns a task handle for use with lev.await().

lev.run(function()
    local task = lev.spawn(function()
        lev.sleep(0.1)
        return "done"
    end)

    -- Detached: errors are logged to stderr; the handle is discarded here, so the task is never awaited
    lev.spawn(function()
        handle_request(data)
    end, { detached = true })

    local result = lev.await(task)  -- "done"
end)

Options

| Field | Type | Default | Description |
|---|---|---|---|
| detached | boolean | false | If true, task errors are logged automatically when the task completes; the task handle can still be awaited if retained |
| name | string | nil | Human-readable name for observability (appears in task_tree()/task_dump()) |

Awaiting tasks

lev.await(task) yields the current coroutine until the task finishes. Returns the task's return values on success, or nil, err on error.

local result, err = lev.await(task)

Multiple coroutines can await the same task. If the task has already completed, await returns immediately.

Resource tracking

lev.own(), lev.defer(), and lev.disown() provide per-coroutine resource cleanup. When a task finishes (success or error), registered cleanups run automatically in LIFO order before the task status is updated and awaiters are notified.
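
The ordering guarantee can be illustrated with a plain-Lua model. This is a sketch of the semantics described above, not lev's implementation; run_with_cleanups and its defer argument are hypothetical names, and swallowing cleanup errors is a choice made for this model only.

```lua
-- Model of per-task cleanup (illustrative only; not lev's actual code).
-- `body` receives a `defer` function; registered cleanups run in LIFO
-- order whether `body` returns normally or raises an error.
local function run_with_cleanups(body)
    local cleanups = {}
    local function defer(fn)
        cleanups[#cleanups + 1] = fn
    end

    local ok, result = pcall(body, defer)

    -- Run cleanups newest-first; in this model a failing cleanup does
    -- not prevent the remaining ones from running.
    for i = #cleanups, 1, -1 do
        pcall(cleanups[i])
    end

    if ok then
        return result
    end
    return nil, result
end
```

Running the cleanups before returning mirrors the documented rule that cleanup happens before the task status is updated and awaiters are notified.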

lev.own(resource)

Register a resource (any table with a close() method) for auto-close. Returns the resource for chaining.

lev.spawn(function()
    local sock = lev.own(lev.connect("1.1.1.1", 53, { timeout = 5 }))
    -- sock:close() called automatically when this coroutine exits,
    -- even if an error is thrown
    local data = sock:recv(4096, 5)
end, { detached = true })

lev.defer(fn)

Register an arbitrary cleanup function, like Go's defer.

lev.spawn(function()
    local counter = get_counter()
    counter:increment()
    lev.defer(function() counter:decrement() end)
    -- counter:decrement() runs when this coroutine exits
    do_work()
end)

lev.disown(resource)

Remove a resource from the current task's cleanup list (identity comparison). Use for ownership transfer between coroutines. Returns the resource.

lev.spawn(function()
    local sock = lev.own(lev.connect("1.1.1.1", 53))
    -- Transfer ownership to another task
    lev.disown(sock)
    lev.spawn(function()
        lev.own(sock)
        -- sock is now cleaned up when THIS task exits
    end)
end)

Properties

Timers

lev.sleep(seconds, cancel)

Yields the current coroutine for the given duration (fractional seconds supported). Returns true on normal completion, or nil, "cancelled" if a cancel token fires before the timer expires.

lev.sleep(0.5)  -- sleep 500ms

-- Cancellable sleep
lev.sleep(5.0, token)  -- returns nil, "cancelled" if token fires

lev.now()

Returns the current time from CLOCK_MONOTONIC as a float (seconds with sub-millisecond precision). Available outside lev.run().

local start = lev.now()
do_work()
local elapsed = lev.now() - start

UDP sockets

lev.udp() creates a non-blocking UDP socket integrated with the event loop. Must be called within lev.run().

lev.run(function()
    local sock = lev.udp()
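    -- Options that affect binding must be set before bind()
    sock:setsockopt("reuseaddr", 1)
    sock:bind("0.0.0.0", 5353)

    -- query: request payload prepared by the caller
    sock:sendto(query, "1.1.1.1", 53)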

    -- Yields until data arrives or timeout
    local data, addr, port = sock:recvfrom(4096, 5.0)
    if not data then
        -- addr contains error: "timeout", "cancelled", etc.
    end

    sock:close()
end)

Methods

| Method | Signature | Description |
|---|---|---|
| bind | (addr, port) -> true \| nil, err | Bind to address and port (use port 0 for ephemeral) |
| sendto | (data, addr, port) -> nbytes \| nil, err | Send datagram (non-blocking) |
| recvfrom | (maxlen, timeout_or_opts) -> data, addr, port \| nil, err | Receive datagram; yields on EAGAIN |
| setsockopt | (name, value) -> true \| nil, err | Set socket option |
| close | () | Close socket and deregister from event loop |
| fd | () -> int | Return raw file descriptor number |

recvfrom options

The second argument to recvfrom can be a number (timeout in seconds) or a table:

-- Simple timeout
local data, addr, port = sock:recvfrom(4096, 5.0)

-- Table form with cancel token
local data, addr, port = sock:recvfrom(4096, {
    timeout = 5.0,
    cancel = token,
})

TCP sockets

lev.tcp()

Creates a raw non-blocking TCP socket. Typically used indirectly via lev.connect() or lev.listen().

lev.connect(addr, port, opts)

High-level connect: creates a TCP socket, performs non-blocking connect (with async wait for EINPROGRESS), and optionally upgrades to TLS.

lev.run(function()
    -- Plain TCP
    local sock, err = lev.connect("1.1.1.1", 53, { timeout = 5 })

    -- TCP + TLS
    local tls_sock, err = lev.connect("1.1.1.1", 853, {
        timeout = 5,
        tls = {
            mode = "client",
            server_name = "cloudflare-dns.com",
            verify = false,
        },
    })
end)

Options

| Field | Type | Default | Description |
|---|---|---|---|
| timeout | number | nil | Connection timeout in seconds |
| tls | table | nil | If present, perform TLS handshake after connect (see TLS config) |
| cancel | token | nil | Cancel token; propagated to TLS handshake if tls.cancel is not set |

lev.listen(addr, port, opts)

Creates a bound listening TCP socket. Returns a listener object with an accept method that yields until a client connects.

lev.run(function()
    local listener = lev.listen("0.0.0.0", 8080, { reuseport = true })

    while true do
        local client, addr, port = listener:accept(5.0)
        if client then
            lev.spawn(function()
                local data = client:recv(4096, 10)
                client:send("HTTP/1.0 200 OK\r\n\r\nHello")
                client:close()
            end, { detached = true })
        end
    end
end)

Options

| Field | Type | Default | Description |
|---|---|---|---|
| reuseport | boolean | false | Enable SO_REUSEPORT |
| backlog | number | 128 | Listen backlog |

Listener methods

| Method | Signature | Description |
|---|---|---|
| accept | (timeout_or_opts) -> client, addr, port \| nil, err | Accept a connection; yields until one arrives. Accepts a number (timeout) or { timeout = N, cancel = token } |
| close | () | Close the listening socket |
| fd | () -> int | Return raw file descriptor |

TCP socket methods

| Method | Signature | Description |
|---|---|---|
| send | (data, opts?) -> nbytes \| nil, err | Send all data; handles partial sends and EAGAIN internally. opts: { timeout = N, cancel = token } (default timeout: 30s) |
| recv | (maxlen, timeout) -> data \| nil, err | Receive up to maxlen bytes; yields on EAGAIN |
| recv_exactly | (n, timeout) -> data \| nil, err | Receive exactly n bytes; accumulates partial reads |
| recv_until | (pattern, timeout, max_size) -> data \| nil, err | Receive until pattern matches; returns data including pattern |
| setsockopt | (name, value) -> true \| nil, err | Set socket option |
| getpeername | () -> addr, port | Return remote address and port |
| shutdown | (how) -> true \| nil, err | Half-close: "r", "w", or "rw" |
| starttls | (config) -> true \| nil, err | Upgrade to TLS (see TLS section) |
| close | () | Close socket and deregister from event loop |
| fd | () -> int | Return raw file descriptor |
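
recv_exactly fits length-prefixed protocols well. The sketch below assumes a hypothetical wire format (a 4-byte big-endian length followed by the payload) and a hypothetical helper name, read_frame. It relies only on the recv_exactly(n, timeout) signature listed above; the "frame too large" string is produced by the helper, not by lev.

```lua
-- Hypothetical framing helper: reads one frame consisting of a 4-byte
-- big-endian length followed by that many payload bytes.
-- `sock` is any object exposing recv_exactly(n, timeout) as documented.
local function read_frame(sock, timeout, max_len)
    local header, err = sock:recv_exactly(4, timeout)
    if not header then
        return nil, err
    end

    -- Decode the length by hand so the code also runs on Lua 5.1/LuaJIT.
    local b1, b2, b3, b4 = header:byte(1, 4)
    local len = ((b1 * 256 + b2) * 256 + b3) * 256 + b4

    if max_len and len > max_len then
        return nil, "frame too large"   -- helper-specific error string
    end
    if len == 0 then
        return ""                       -- empty frame: nothing more to read
    end
    return sock:recv_exactly(len, timeout)
end
```

Note that the timeout applies to each recv_exactly call separately, so a frame may take up to twice the timeout to arrive in this sketch.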

TCP socket options

| Name | Description |
|---|---|
| "reuseaddr" | SO_REUSEADDR |
| "reuseport" | SO_REUSEPORT |
| "rcvbuf" | SO_RCVBUF (receive buffer size) |
| "sndbuf" | SO_SNDBUF (send buffer size) |
| "keepalive" | SO_KEEPALIVE |
| "nodelay" | TCP_NODELAY (disable Nagle's algorithm) |

TLS

TLS is integrated directly into TCP sockets via LITLS. After establishing a TCP connection, call sock:starttls(config) to perform a TLS handshake. Once upgraded, send() and recv() transparently use TLS -- no API changes needed for data transfer.

Client TLS

local sock = lev.connect("example.com", 443, {
    timeout = 5,
    tls = {
        mode = "client",
        server_name = "example.com",  -- SNI
        verify = true,                -- certificate verification (default)
        cafile = "/etc/ssl/certs/ca-certificates.crt",
    },
})

Server TLS

local listener = lev.listen("0.0.0.0", 443)
local client = listener:accept(5)
client:starttls({
    mode = "server",
    cert = "/path/to/cert.pem",
    key = "/path/to/key.pem",
    verify = false,  -- don't require client certs (default for server)
})

TLS config fields

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | "client" | "client" or "server" |
| server_name | string | nil | SNI hostname (client mode) |
| verify | boolean | true (client) / false (server) | Certificate verification |
| cafile | string | nil | CA certificate bundle path |
| cert | string | nil | Certificate file path (PEM) -- server identity or client mTLS identity |
| key | string | nil | Private key file path (PEM) -- server key or client mTLS key |
| identity | userdata | nil | Pre-parsed TLS identity (from lev.tls_identity(); server mode alternative to cert/key) |
| hosts | table | nil | SNI host map for server mode (see below) |
| timeout | number | 10 | TLS handshake timeout in seconds |

Server-side SNI

When running a TLS server that serves multiple domains, the hosts field in the starttls config provides per-hostname cert/key pairs. During the TLS handshake, the SNI callback selects the correct certificate based on the client's requested hostname.

client:starttls({
    mode = "server",
    cert = "/default/cert.pem",
    key  = "/default/key.pem",
    hosts = {
        ["example.com"]    = { cert = "/certs/example.com.crt",   key = "/keys/example.com.key" },
        ["*.example.org"]  = { cert = "/certs/_.example.org.crt", key = "/keys/_.example.org.key" },
    },
})

Matching order:

  1. Exact match: hostname matches a key exactly

  2. Wildcard match: *.domain matches any single-label prefix (e.g. *.example.org matches api.example.org but not deep.sub.example.org)

  3. Default: if no match, the default cert/key is used

Each entry in the hosts table must have both cert and key paths pointing to valid PEM files. Invalid entries cause starttls to fail instead of being silently ignored.
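
The matching order can be modelled in a few lines of plain Lua. This is an illustrative sketch of the three rules listed above, not the SNI callback itself; select_host and its default argument are hypothetical names, and the model compares hostnames exactly as given.

```lua
-- Illustrative model of the matching order (not lev's implementation).
-- Returns the matching hosts entry, or `default` when nothing matches.
local function select_host(hosts, hostname, default)
    -- 1. Exact match.
    if hosts[hostname] then
        return hosts[hostname]
    end

    -- 2. Wildcard match: replace only the first label with "*".
    --    "api.example.org" is looked up as "*.example.org", while
    --    "deep.sub.example.org" is looked up as "*.sub.example.org" and
    --    therefore does not match a "*.example.org" entry.
    local rest = hostname:match("^[^.]+%.(.+)$")
    if rest and hosts["*." .. rest] then
        return hosts["*." .. rest]
    end

    -- 3. Fall back to the default cert/key.
    return default
end
```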

TLS-aware error strings

In addition to the standard error protocol, TLS operations may return:

| Error | Meaning |
|---|---|
| "want_read" | TLS needs to read (internal, handled by Lua wrapper) |
| "want_write" | TLS needs to write (internal, handled by Lua wrapper) |
| "closed" | Peer closed the connection |

Cancellation

lev.cancel_token() returns a token for cooperative cancellation. Pass it to blocking operations via the options table.

local token = lev.cancel_token()

lev.spawn(function()
    lev.sleep(5.0)
    token:cancel()  -- interrupt waiters
end)

local data, err = sock:recvfrom(4096, { cancel = token })
-- err == "cancelled" if token fired first

The cancel token sets a cancelled flag that blocking operations check before and after yielding. Cancellation is cooperative -- operations return nil, "cancelled" and the socket remains open for reuse.

Token fields

| Field/Method | Description |
|---|---|
| token.cancelled | boolean -- read-only flag |
| token:cancel() | Set the flag and interrupt registered waiters |
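
Application code can follow the same check-before-and-after pattern in its own loops. The sketch below assumes a hypothetical helper, collect_datagrams, built on the documented recvfrom table form and the token.cancelled flag; the one-second poll interval is arbitrary.

```lua
-- Hypothetical helper: receive datagrams until the token is cancelled
-- or `limit` datagrams have been collected.
-- `sock` is a UDP socket as returned by lev.udp(); `token` comes from
-- lev.cancel_token().
local function collect_datagrams(sock, token, limit)
    local received = {}
    -- Check the flag before every wait.
    while #received < limit and not token.cancelled do
        local data, err = sock:recvfrom(4096, { timeout = 1.0, cancel = token })
        if data then
            received[#received + 1] = data
        elseif err == "cancelled" then
            break                 -- token fired while we were waiting
        elseif err ~= "timeout" then
            return nil, err       -- unexpected failure: report it
        end
        -- On "timeout", loop again and re-check the flag.
    end
    return received
end
```

Because cancellation leaves the socket open, the caller can keep using sock after the helper returns.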

Combinators

lev.race(fns)

Spawns each function concurrently. Returns the first result (success or error) and cancels the remaining tasks. Each function receives a cancel token as its first argument.

local result = lev.race({
    function(token)
        lev.sleep(0.1)
        return "fast"
    end,
    function(token)
        lev.sleep(10.0)
        return "slow"
    end,
})
-- result == "fast"

lev.all(fns)

Spawns each function and waits for all to complete. Returns a list of result tables. If any task errors, cancels the rest and returns nil, err immediately.

local results, err = lev.all({
    function(token) return "a" end,
    function(token) return "b" end,
})
-- results == { {"a"}, {"b"} }

-- Fail-fast on error
local results, err = lev.all({
    function(token) error("boom") end,
    function(token) lev.sleep(10) end,
})
-- results == nil, err matches "boom"
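
Since each entry in the list is itself a table of return values, code that only needs the first return value of each task has one extra step. A minimal sketch, using a hypothetical helper named first_values:

```lua
-- Hypothetical helper: collapse { {"a"}, {"b"} } into { "a", "b" } by
-- keeping only the first return value of each task.
local function first_values(results)
    local out = {}
    for i = 1, #results do
        out[i] = results[i][1]
    end
    return out
end
```

If a task's first return value is nil, the output has a hole at that index, so iterate it by numeric index rather than with ipairs in that case.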

Signal handling

lev.on_signal(signum, handler) registers a callback for signal delivery via signalfd. Must be called within lev.run(). Signal disposition is restored when lev.run() exits.

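lev.run(function()
    local shutdown = false

    lev.on_signal(lev.SIGTERM, function(signo)
        shutdown = true
    end)

    while not shutdown do
        -- serve requests; the body must yield to the loop (accept, recv,
        -- sleep, ...) so that the handler gets a chance to run
    end
end)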

Stopping the loop

lev.stop() sets the loop's running flag to false, causing lev.run() to exit on the next iteration regardless of active coroutines. This is intended for signal handlers that need to enforce a shutdown deadline:

lev.run(function()
    local shutdown = false

    lev.on_signal(lev.SIGTERM, function()
        shutdown = true
        lev.spawn(function()
            lev.sleep(5)   -- grace period for in-flight work
            lev.stop()     -- force exit
        end, { detached = true })
    end)

    while not shutdown do
        -- serve requests
    end
end)

Signal constants

| Constant | Value |
|---|---|
| lev.SIGTERM | 15 |
| lev.SIGINT | 2 |
| lev.SIGHUP | 1 |
| lev.SIGUSR1 | 10 |
| lev.SIGUSR2 | 12 |

C core API

The lev.core module exposes low-level primitives used by the Lua wrapper. These are not part of the public API but are documented for maintainability.

| Function | Description |
|---|---|
| loop_new() | Create epoll fd and allocate loop structures |
| loop_run(loop, on_complete) | Run event loop until all coroutines finish |
| loop_stop(loop) | Signal the loop to stop |
| loop_destroy(loop) | Close epoll, free memory (also __gc) |
| spawn(loop, co) | Add coroutine to ready queue, increment active count |
| yield_to_loop() | Pure lua_yield |
| ready_enqueue(loop, co, nargs) | Add coroutine to ready queue |
| wait_readable(fd, timeout) | Register fd with epoll + optional timer, yield |
| wait_writable(fd, timeout) | Same for write readiness |
| fd_deregister(fd) | Remove fd from epoll, clear wait entry |
| timer_sleep(seconds) | Add timer, yield, resume on expiry |
| timer_register(seconds) | Add timer, return timer_id (no yield; for cancellable sleep) |
| cancel_wait(co, fd, timer_id) | Deregister fd/timer, resume with nil, "cancelled" |
| signal_setup(signum, fn) | Add signal to signalfd, register handler |
| now() | CLOCK_MONOTONIC as float seconds |
| udp_new() | Create non-blocking UDP socket userdata |
| tcp_new() | Create non-blocking TCP socket userdata |
| loop_stats(loop) | Return event loop statistics |
| tls_parse_identity(cert, key, hostname?) | Parse cert+key PEM into reusable TLS identity userdata |

TCP userdata methods (C level)

| Method | Description |
|---|---|
| connect(addr, port) | Non-blocking connect; returns true or nil, "EINPROGRESS" |
| connect_finish() | Check SO_ERROR after writable; returns true or nil, err |
| bind(addr, port) | Bind to address |
| listen(backlog) | Start listening |
| accept() | Non-blocking accept; returns tcp_ud, addr, port or nil, "EAGAIN" |
| send(data) | Non-blocking send (TLS-aware); returns bytes or nil, err |
| recv(maxlen) | Non-blocking recv (TLS-aware); returns data or nil, err |
| setsockopt(name, val) | Set socket option |
| getpeername() | Returns addr, port |
| shutdown(how) | Half-close: "r", "w", "rw" |
| starttls_init(config) | Initialize LITLS TLS context and session |
| starttls_step() | Drive TLS handshake; returns "done", "want_read", "want_write", or nil, err |
| close() | Cleanup TLS + close fd |
| fd() | Return raw fd |

Async subprocess

lev.exec(path, args, opts) spawns an external command as a child process with non-blocking pipe I/O and pidfd-based exit notification. The child's stdin/stdout/stderr are connected via pipes, and the pidfd integrates directly into the epoll event loop for async wait.

lev.run(function()
    local proc = lev.exec("/bin/echo", {"hello"})
    local output = proc:read_all(5)   -- "hello\n"
    local code = proc:wait(5)         -- 0
    proc:close()
end)

Options

| Field | Type | Default | Description |
|---|---|---|---|
| cwd | string | nil | Working directory for child |
| env | table | nil | Array of "KEY=VALUE" strings; replaces child environment (uses execve) |
| stderr_to_stdout | boolean | false | Merge stderr into stdout stream |

Process methods

| Method | Signature | Description |
|---|---|---|
| send | (data, opts) -> nbytes \| nil, err | Write to child stdin; handles partial writes |
| recv | (maxlen, timeout_or_opts) -> data \| nil, err | Read from child stdout; yields on EAGAIN |
| recv_stderr | (maxlen, timeout_or_opts) -> data \| nil, err | Read from child stderr |
| read_all | (timeout_or_opts) -> data \| nil, err | Read all stdout until EOF |
| wait | (timeout_or_opts) -> exit_code, exit_signal \| nil, err | Async wait via pidfd |
| close_stdin | () | Close stdin pipe (sends EOF to child) |
| kill | (signal) -> true \| nil, err | Send signal (default SIGTERM) |
| pid | () -> int | Child PID |
| close | () | Close all fds, reap zombie |

All methods accepting timeout_or_opts support: a number (timeout in seconds), or a table { timeout = N, cancel = token }.
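
A common pattern is to drain stdout, wait for exit, and release the handle on every path. The sketch below is one way to combine the methods above. run_to_completion is a hypothetical helper, and sending SIGTERM when the read or wait fails is a policy chosen for this example rather than documented lev behaviour. It also treats a nil first return value from wait() as a failure.

```lua
-- Hypothetical helper: collect stdout and the exit code of a child.
-- `proc` is a process handle as returned by lev.exec().
local function run_to_completion(proc, timeout)
    local output, err = proc:read_all(timeout)
    local code
    if output then
        -- On success the second value is the exit signal, not an error.
        code, err = proc:wait(timeout)
    end
    if code == nil then
        proc:kill()      -- default signal is SIGTERM
    end
    proc:close()         -- close fds and reap the child on every path
    if code == nil then
        return nil, err
    end
    return output, code
end
```

Inside lev.run(), run_to_completion(lev.exec("/bin/echo", {"hello"}), 5) would be expected to return the child's output and exit code, or nil, err if the read or the wait fails.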

Exec failure detection

If execvp/execve fails (e.g. command not found), lev.exec() returns nil, err immediately. This relies on a close-on-exec (CLOEXEC) error pipe: the child writes errno to the pipe if exec fails, and the parent reads it before returning the userdata. When exec succeeds, the kernel closes the child's end of the pipe and the parent reads EOF instead.

Bidirectional I/O

For commands that both read stdin and produce output, use separate coroutines to prevent deadlocks:

lev.run(function()
    local proc = lev.exec("/bin/cat")
    local writer = lev.spawn(function()
        proc:send(big_data)
        proc:close_stdin()
    end)
    local reader = lev.spawn(function()
        return proc:read_all(10)
    end)
    lev.await(writer)
    local output = lev.await(reader)
    proc:wait(5)
    proc:close()
end)

Signal constants

| Constant | Value |
|---|---|
| lev.SIGKILL | 9 |

(In addition to the existing SIGTERM, SIGINT, SIGHUP, SIGUSR1, SIGUSR2.)

Subprocess C core API

| Function/Method | Description |
|---|---|
| subprocess_spawn(path, argv, opts) | Fork+exec with pipes and pidfd |
| read(maxlen) | Non-blocking read from stdout |
| read_stderr(maxlen) | Non-blocking read from stderr |
| write(data) | Non-blocking write to stdin |
| close_stdin() | Close stdin pipe |
| kill(sig) | Send signal to child |
| waitid_pidfd() | Non-blocking waitid via pidfd |
| pid() / pidfd() | Return child PID / pidfd |
| stdin_fd() / stdout_fd() / stderr_fd() | Return raw fds |
| close() / __gc | Close all fds, reap zombie |

Deadlock detection

The C event loop detects deadlocks: when all coroutines are blocked but no file descriptors, timers, or signal handlers are registered, the loop returns nil, "deadlock: ..." instead of hanging indefinitely.

local ok, err = lev.run(function()
    local never = lev.spawn(function()
        coroutine.yield()  -- yield without registering anything
    end)
    lev.await(never)  -- blocks forever
end)
-- ok == nil, err matches "deadlock"

Signal handlers prevent false positives: a server waiting for SIGTERM with no active I/O is not a deadlock.

Observability

LEV provides introspection APIs for debugging hung servers and monitoring task lifecycle. The primary use case: send SIGUSR1 to a running RECALL or RELIW process to get a full task dump showing what every coroutine is blocked on.

lev.set_logger(logger)

Injects a std.logger instance for structured error reporting. When set, cleanup errors and detached task errors are routed through the logger instead of raw io.stderr:write(). The logger persists across run() calls. Pass nil to revert to stderr.

local std = require("std")
local logger = std.logger.new("info")
lev.set_logger(logger)

lev.task_name(name)

Sets and/or returns the current task's name. With an argument, sets the name and returns it. Without an argument, returns the current name (or nil). No-op outside lev.run().

lev.spawn(function()
    lev.task_name("request-handler")
    -- ...
end)

Tasks can also be named at spawn time via opts.name. The main task is automatically named "main".

lev.stats()

Returns a table with fast loop counters. Returns nil outside lev.run().

| Field | Description |
|---|---|
| active_coros | Number of live coroutines in the event loop |
| registered_fds | Number of file descriptors registered with epoll |
| timer_count | Number of pending timers in the heap |
| ready_count | Number of coroutines in the ready queue |
| task_count | Number of entries in the Lua task registry |

lev.run(function()
    local s = lev.stats()
    print(s.active_coros, s.task_count)
end)
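
For periodic monitoring it can help to flatten the counters into a single log line. A minimal sketch, assuming only the field names documented above; format_stats is a hypothetical helper.

```lua
-- Field order follows the stats() field table.
local STAT_FIELDS = {
    "active_coros", "registered_fds", "timer_count", "ready_count", "task_count",
}

-- Hypothetical helper: render a stats table as "name=value" pairs.
local function format_stats(s)
    local parts = {}
    for _, name in ipairs(STAT_FIELDS) do
        parts[#parts + 1] = name .. "=" .. tostring(s[name])
    end
    return table.concat(parts, " ")
end
```

Inside lev.run(), the result of format_stats(lev.stats()) can be passed to any logger. Outside lev.run(), lev.stats() returns nil, so guard the call.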

lev.task_tree()

Returns a sorted array of task info tables. Returns nil outside lev.run().

Each entry has: id, name, status, detached, parent_id, co_status. Sorted by id for stable output. co_status is from coroutine.status().

lev.run(function()
    lev.spawn(function() lev.sleep(1) end, { name = "worker" })
    for _, t in ipairs(lev.task_tree()) do
        print(t.id, t.name, t.status, t.co_status)
    end
end)
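
Because every entry carries a parent_id, the flat array can be rendered as an indented hierarchy. The sketch below uses a hypothetical helper, render_tree. It assumes root tasks have parent_id == nil, and it also treats tasks whose parent is no longer listed as roots.

```lua
-- Hypothetical helper: turn the flat array from lev.task_tree() into
-- indented lines, nesting each task under its parent.
local function render_tree(entries)
    local by_id, children, roots = {}, {}, {}
    for _, t in ipairs(entries) do
        by_id[t.id] = t
    end
    for _, t in ipairs(entries) do
        if t.parent_id == nil or by_id[t.parent_id] == nil then
            roots[#roots + 1] = t          -- no (known) parent: top level
        else
            local list = children[t.parent_id] or {}
            list[#list + 1] = t
            children[t.parent_id] = list
        end
    end

    local lines = {}
    local function visit(t, depth)
        lines[#lines + 1] = string.rep("  ", depth)
            .. string.format("#%d %s [%s]", t.id, tostring(t.name), tostring(t.status))
        for _, child in ipairs(children[t.id] or {}) do
            visit(child, depth + 1)
        end
    end
    for _, root in ipairs(roots) do
        visit(root, 0)
    end
    return lines
end
```

Since task_tree() is sorted by id, siblings come out in creation order without extra sorting.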

lev.task_dump()

Like task_tree() but adds a traceback field for suspended coroutines, showing the exact yield point (e.g. sleep → timer_sleep). Has meaningful overhead -- use for debugging, not monitoring.

lev.enable_dump_signal(format)

Registers a SIGUSR1 handler that calls task_dump() and outputs the result. Supports "text" (default) and "json" formats. Uses the injected logger if set, otherwise stderr. Safe because LEV signal handlers run via signalfd dispatch, not async-signal context.

lev.run(function()
    lev.enable_dump_signal()  -- or "json"
    lev.spawn(function() lev.sleep(30) end, { name = "sleeper" })
    lev.sleep(30)
end)
-- Then: kill -USR1 <pid>

Text output example:

=== LEV task dump ===
  task #1 "main"  status=pending  co=suspended  detached=false  parent=none
  task #2 "sleeper"  status=pending  co=suspended  detached=false  parent=1
=== end dump ===