
This document defines the public contract for the mneme module (src/mneme/).
MNEME is an embedded key-value database. A MNEME db stores data in a single file, and supports multiple keyspaces, typed values, TTL expiry, sorted sets, lists, native vector search and encryption at rest. Written in C with a Lua API layer.
All operations are synchronous. MNEME does not require lev.run() context,
as it uses blocking file I/O. Database operations are expected to complete in
microseconds (mmap'd reads) to low milliseconds (fsynced writes).
C core (mneme.core) handles the storage engine -- pages,
B-tree, cursors, file I/O, vector distance computation. The Lua layer
(mneme.lua) provides keyspace management, type system, TTL expiry, sorted
set encoding, list management, and the public API.
mneme.open(path, opts) opens or creates a database file. Returns a
database handle or nil, err.
local mneme = require("mneme")
local db, err = mneme.open("/path/to/data.mneme")
if not db then error(err) end
-- Open read-only (no writes)
local db, err = mneme.open("/path/to/data.mneme", { readonly = true })
-- Open without creating (fails if file doesn't exist)
local db, err = mneme.open("/path/to/data.mneme", { create = false })
-- Open with no fsync (for bulk loading)
local db, err = mneme.open("/path/to/data.mneme", { sync = "none" })
| Field | Type | Default | Description |
|---|---|---|---|
readonly | boolean | false | Open for reads only |
create | boolean | true | Create file if it does not exist |
sync | string | "normal" | "normal" (fsync per commit) or "none" (no fsync) |
encryption | table | nil | Encryption at rest credentials |
-- List all keyspace names
local names = db:keyspaces()
-- Drop a keyspace and its data
db:drop_keyspace("old_cache")
-- Get low-level storage info (page size, file size, version)
local info = db:info()
-- Close the database gracefully
db:close()
A database contains named keyspaces. Each keyspace is an independent B-tree with its own key-value namespace.
Keyspace names are UTF-8 strings, 1--255 bytes. Keyspaces are created implicitly on first write.
local ks = db:keyspace("myapp")
Keys are byte strings, max 4000 bytes. Keys are sorted lexicographically within a keyspace.
ks:set("key", "value")
local val = ks:get("key")
ks:del("key")
local exists = ks:exists("key")
When set() receives a Lua value, the storage type is inferred:
| Lua type | MNEME type | Condition |
|---|---|---|
string | Bytes | Always |
number (integer) | Integer | math.floor(n) == n and ` |
number (float) | Float | Has fractional part, or is NaN/Inf |
Explicit type setters override auto-detection:
ks:set_integer("counter", 0)
ks:set_float("temperature", 36.6)
ks:set_bytes("binary", "\x00\x01\x02")
get() decodes values back to native Lua types: strings for Bytes,
numbers for Integer and Float.
Integer and Float values can be atomically incremented or decremented. If the key doesn't exist, it behaves as if it were 0.
local new_val = ks:incr("counter") -- increments by 1
local new_val = ks:incr("counter", 10) -- increments by 10
local new_val = ks:decr("counter") -- decrements by 1
local new_val = ks:decr("counter", 5) -- decrements by 5
Keys can be iterated using prefix scans or lexicographical range iterators.
-- Scan all keys starting with a prefix
for key, value in ks:scan("user:") do
print(key, value)
end
-- Range iteration over a lexicographic span (inclusive)
for key, value in ks:range("A", "C\xff") do
print(key, value)
end
Any key can have a TTL (time-to-live) in seconds. Expired keys return
nil on read (lazy detection). purge_expired() removes expired keys
from disk. Scan and range iterators automatically skip expired keys.
ks:set("session:abc", data, { ttl = 3600 })
local remaining = ks:ttl("session:abc")
-- Remove the TTL constraint, keeping the key indefinitely
ks:persist("session:abc")
-- Reclaim disk space
local count = ks:purge_expired()
Run multiple operations inside a single logical database transaction.
local ok, err = ks:batch(function()
ks:set("user:1:name", "Alice")
ks:incr("user:1:logins", 1)
end)
Lists map a string name to an ordered collection of values. Elements are added to the head (left) or tail (right).
local lst = ks:list("my_queue")
lst:rpush("task1") -- append to end
lst:lpush("task0") -- prepend to beginning
local len = lst:len() -- get list length
local val = lst:lpop() -- remove and return first element ("task0")
local val = lst:rpop() -- remove and return last element ("task1")
local item = lst:index(0) -- get item at index 0 (0-based)
-- Fetch range of elements [start..stop].
-- Stop is inclusive. Both can be negative (offset from end).
-- Returns a Lua array table.
local items = lst:range(0, -1)
Sorted sets map a string name to a set of string members, each associated with a numeric floating-point score. Members are ordered primarily by score, and secondarily by lexicographical order.
local zs = ks:sorted_set("leaderboard")
zs:add(100.5, "player1")
zs:add(200.0, "player2")
zs:batch_incr({"player1"}, 50.0) -- increment player1's score
local score = zs:score("player1") -- 150.5
local rank = zs:rank("player1") -- 0 (0-based ascending rank)
local count = zs:card() -- total count of elements
zs:rem("player1")
-- Range queries return an array of { member = string, score = number }
local results = zs:range_by_rank(0, -1)
local results = zs:range_by_score(100.0, 500.0)
-- Iterators
for member, score in zs:iter() do
-- iterate complete set
end
for member, score in zs:iter({ min = 100, max = 200, reverse = true }) do
-- score bounds are inclusive
end
Vectors are stored as [uint16 dim][float32 × dim] in B-tree values
with type tag 0x06. Dimensions are float32 (not float64) to halve
storage and improve cache utilization during search. A 128-dimensional
vector occupies 514 bytes inline.
vsearch performs brute-force k-nearest-neighbor search in C: scan all
leaf pages, compute distance for each vector entry, maintain a max-heap
of size top_k. For the expected scale (up to ~100K vectors at dim
128--384), this completes in single-digit milliseconds with mmap'd pages
in the OS cache.
-- Similarity search
local results = ks:vsearch(query_vector, {
metric = "cosine", -- "cosine", "dot", "l2"
top_k = 10,
filter = "doc:", -- optional key prefix filter
})
-- { { key = "doc:1", score = 0.98 },
-- { key = "doc:2", score = 0.95 }, ... }
-- Dimension of stored vectors (or nil if empty)
local dim = ks:vdim()
All vectors in a keyspace must have the same dimensionality. vset
returns nil, "dimension mismatch" if the dimension differs from
existing vectors.
| Metric | Computation | Score meaning |
|---|---|---|
cosine | 1 - (a·b / (\|a\| × \|b\|)) | 0 = identical, 2 = opposite |
dot | a·b | Higher = more similar |
l2 | √(Σ(a_i - b_i)²) | 0 = identical |
BM25 ranked keyword search. Each FT keyspace is a dedicated keyspace with an inverted index, document storage, and corpus statistics.
local ft = db:ft_keyspace("my_corpus")
-- Index documents
ft:put("rfc8446", document_text)
ft:put("rfc5681", another_document)
-- Retrieve original text
local text = ft:get("rfc8446")
-- Ranked search (returns by descending BM25 score)
local results = ft:search("congestion window flow control", {
top_k = 10, -- max results (default 10)
k1 = 1.2, -- BM25 k1 parameter (default 1.2)
b = 0.75, -- BM25 b parameter (default 0.75)
})
-- { { key = "rfc5681", score = 5.82 },
-- { key = "rfc8446", score = 4.31 }, ... }
-- Delete a document (removes all index entries)
ft:del("rfc8446")
-- Check existence
local exists = ft:exists("rfc8446")
-- Iterate documents
for doc_id, text in ft:scan("rfc") do ... end
for doc_id, text in ft:scan() do ... end
-- Count documents
local n = ft:count()
local n = ft:count("rfc")
-- Corpus statistics
local info = ft:stats()
-- { doc_count = 9200, unique_terms = 48000,
-- avg_doc_len = 1850.3, total_doc_len = 17022760 }
put() on an existing key is an update: old index entries are removed
and new ones are created atomically. When indexing is active, each stored
document is expected to have matching document metadata, reverse-map, df,
and posting entries. If del() or an indexed put() detects stored
document text without the reverse map, it returns a corruption error
instead of silently changing corpus statistics. During pause_indexing(),
documents may temporarily exist without index entries until
rebuild_index() completes. Deleting such unindexed paused documents
removes stored text/metadata and succeeds without changing corpus stats.
The default analyzer (C) performs:
Byte scan identifying word boundaries (ASCII alphanumeric + UTF-8 continuation bytes form words)
ASCII lowercasing (A-Z to a-z); non-ASCII passthrough
Skip tokens shorter than 2 bytes
Truncate tokens longer than 255 bytes
A custom Lua analyzer overrides the entire pipeline:
local ft = db:ft_keyspace("my_corpus", {
analyzer = function(text)
-- returns array of term strings
return terms
end,
analyzer_name = "english_stemmed",
})
The same analyzer is used for both indexing and querying.
ft:pause_indexing()
for id, text in pairs(corpus) do
ft:put(id, text) -- stores doc text, skips inverted index
end
ft:rebuild_index() -- builds complete index in batched commits
| Field | Type | Default | Description |
|---|---|---|---|
analyzer | function | C default | Custom tokenization function |
analyzer_name | string | "default" | Stored for consistency checking |
k1 | number | 1.2 | BM25 term frequency saturation |
b | number | 0.75 | BM25 document length normalization |
Optional value-level encryption at rest using Ed25519 SSH keys. Values are encrypted with ChaCha20-Poly1305; B-tree keys remain plaintext so prefix scans, range queries, sorted sets, and lists continue to work.
Pass an encryption table to mneme.open(). On first open this
initialises encryption metadata; on subsequent opens it unwraps the
existing Data Encryption Key (DEK).
-- Open with an SSH key file (~/.ssh/id_ed25519, unencrypted keys only)
local db, err = mneme.open("/path/to/data.mneme", {
encryption = { key_file = "~/.ssh/id_ed25519" }
})
-- Open with raw key material
local db, err = mneme.open("/path/to/data.mneme", {
encryption = { seed = seed_32, pubkey = pubkey_32 }
})
Encrypted databases require the key on every open. Without the
encryption option the database opens normally (no encryption).
Encryption is per-keyspace. Pass encrypted = true when obtaining a
keyspace handle:
local secrets = db:keyspace("secrets", { encrypted = true })
local cache = db:keyspace("cache") -- not encrypted
secrets:set("api_key", "sk-abc123")
local key = secrets:get("api_key") -- decrypted transparently
All standard key-value operations work on encrypted keyspaces:
get, set, set_bytes, set_integer, set_float, del,
exists, incr, decr, ttl, persist, scan, range,
count, batch.
Lists in encrypted keyspaces encrypt element values:
local lst = secrets:list("queue")
lst:rpush("sensitive data") -- encrypted on disk
local val = lst:rpop() -- decrypted on read
lst:range() on an encrypted keyspace returns a second value when
decryption failures occur: local items, corrupted = lst:range(0, -1).
corrupted is the count of entries that failed to decrypt (omitted
when zero). Scan and range iterators on encrypted keyspaces silently
skip entries that fail to decrypt.
Sorted sets work transparently (scores and members live in B-tree keys which remain plaintext; internal metadata values are small and not encrypted).
Vector and full-text operations cannot be used on encrypted keyspaces:
secrets:vset("doc:1", vec) -- nil, "encryption not supported for vectors"
secrets:vsearch(query, opts) -- nil, "encryption not supported for vectors"
db:ft_keyspace("corpus", { encrypted = true }) -- nil, "encryption not supported for full-text keyspaces"
Use separate, unencrypted keyspaces for vectors and full-text search.
Multiple Ed25519 keys can be authorized to open the same database. Each authorized key gets its own key slot that wraps the DEK independently.
-- Add a new authorized key -- only their PUBLIC key is needed.
-- The database re-reads your own key_file to unwrap the DEK internally.
local crypto = require("mneme.crypto")
local friend_pk = crypto.load_ssh_pubkey("/path/to/friend_id_ed25519.pub")
db:add_encryption_key(friend_pk)
-- Remove an authorized key (by SHA-256 fingerprint)
db:remove_encryption_key(fingerprint_32)
-- List authorized key fingerprints
local keys = db:list_encryption_keys()
-- { { fingerprint = "\x..." }, { fingerprint = "\x..." } }
The new user's private key never needs to leave their machine. Your own
key file (passed at open() time) is re-read when needed for key
management. The last key slot cannot be removed.
Each encrypted value stores a 12-byte nonce and 16-byte authentication tag alongside the ciphertext -- 28 bytes overhead per value.
Ed25519 SSH key
│ Ed25519 → X25519 curve conversion
▼
X25519 keypair
│ ECDH with per-db ephemeral X25519 key
▼
Shared secret (32 bytes)
│ HKDF-SHA256 Extract + Expand
▼
DEK (Data Encryption Key, 32 bytes)
│ Wrapped per authorized key, stored in keyspace directory
│ HKDF-SHA256 Expand(info = "mneme-ks:" .. keyspace_name)
▼
Per-keyspace sub-key (32 bytes)
│
▼ ChaCha20-Poly1305(sub-key, random nonce, key name as AAD, value)
The DEK is held in mlock'd memory and securely zeroed on garbage
collection. Key slots are stored in the keyspace directory B-tree under
a reserved internal key, committed atomically with all other data.
| Error | Meaning |
|---|---|
"encryption not supported for vectors" | vset/vget/vsearch on encrypted keyspace |
"encryption not supported for full-text keyspaces" | ft_keyspace with encrypted = true |
"encryption requested but database has no encryption key" | encrypted = true on a db opened without encryption |
"encrypted SSH keys not supported" | Password-protected SSH key (use ssh-keygen -p to remove passphrase) |
"no matching key slot for this key" | SSH key not authorized for this database |
"decryption failed (authentication error)" | Tampered or corrupt ciphertext |
-- Database statistics
local stats = db:stats()
-- {
-- page_count = 1024,
-- retired_pages = 0,
-- live_pages = 700,
-- reclaimable_pages = 324,
-- bloat_ratio = 0.46,
-- keyspaces = 3,
-- file_size = 4194304,
-- }
retired_pages is the count tracked in retired metadata
live_pages includes retired-tree metadata pages, because they are meta-reachable live pages
reclaimable_pages counts pages not reachable from the current meta
snapshot. Reader-safe retired pages may be reused by later writer
transactions; pages still protected by active readers become reusable only
after those readers release. db:compact() is still useful to physically
shrink the file and discard accumulated unreachable pages.
-- Keyspace statistics
local ks_stats = ks:stats()
-- {
-- keys = 15000,
-- depth = 3,
-- leaf_pages = 200,
-- branch_pages = 12,
-- overflow_pages = 45,
-- }
-- Compact: rewrite database, reclaiming free pages and removing expired keys.
-- Output path must not exist and must differ from the source path.
db:compact("/path/to/compacted.mneme")
Compaction copies live user data, keyspaces, and internal encryption metadata into a fresh database. Expired user keys are skipped. The destination database starts with an empty retired store.
Single file, page-based. All pages are 4096 bytes.
┌──────────────────────────────┐ Page 0
│ Meta page A │
│ magic: "MNEM" │
│ version (3), page_size │
│ page_count, retired_root │
│ keyspace_root, txn_id │
│ retired_page_count │
│ checksum (FNV-1a) │
├──────────────────────────────┤ Page 1
│ Meta page B (alternating) │
├──────────────────────────────┤ Page 2+
│ Keyspace directory (B-tree) │
│ Retired-batch tree │
│ Data pages │
└──────────────────────────────┘
The current storage format (v3) actively rejects older v1/v2 files.
There is no mixed-format fallback or in-place migration path.
Pages 0 and 1 are meta pages A and B. Commits alternate between them.
The meta page with the higher valid txn_id (and correct checksum) is
the current root. This provides atomic commits without a WAL.
MNEME uses copy-on-write MVCC. All values live in leaf nodes. Branch nodes contain only separator keys and child pointers. Writes never modify existing pages; they allocate new pages and update the root pointer atomically via the meta page.
Each read transaction publishes a snapshot using its own OFD lock fd.
Pages displaced by COW (or removed by drop_keyspace) are recorded in a
persistent retired-batch tree keyed by retire transaction id.
Writers must not reuse pages retired after any still-active reader snapshot.
Writers harvest only reader-safe retired pages into a transaction-local reuse cache
and allocate from that cache before extending the file. Page reuse happens from that local reuse cache, not from page contents themselves.
Same-transaction transient dirty pages are discarded, not persisted as retired.
Readers and writers on the same process/handle are only safe because reader
tracking uses distinct per-reader lock fds.
Because retired pages are harvested lazily, the database file may temporarily contain reclaimable retired pages even when there are no active readers. This is expected.
Commits are sequenced as: update retired metadata → write all dirty pages
→ fdatasync → write the new meta page → fdatasync. The meta page
write is the commit point; a crash before its fdatasync leaves the
previously committed snapshot unchanged.
| Type | Tag | Description |
|---|---|---|
| Meta | 0x00 | Database header (pages 0 and 1 only) |
| Branch | 0x01 | Internal B-tree node (keys + child pointers) |
| Leaf | 0x02 | B-tree leaf (keys + inline values) |
| Overflow | 0x03 | Large value continuation pages |
Single-writer, multiple-reader. Writers acquire flock(LOCK_EX) for
the duration of a transaction. Readers use mmap snapshots and are never
blocked. Multiple processes can open the same .mneme file.
| Tag | Type | Storage |
|---|---|---|
0x01 | Bytes | Raw binary, length-prefixed |
0x02 | Integer | 64-bit signed, little-endian |
0x03 | Float | 64-bit IEEE 754 double, little-endian |
0x04 | Sorted Set | Metadata: cardinality as int64 (guards against type collision) |
0x05 | List | Metadata: head/tail/length as 3× int64 (guards against type collision) |
0x06 | Vector | [uint16 dim][float32 × dim] |
0x07 | FT Meta | Corpus stats, analyzer config |
Values exceeding ~2KB spill to overflow pages. The overflow threshold ensures at least two cells fit per leaf page for B-tree splitting.
All operations return nil, err_string on failure.
Common errors include:
| Error | Meaning |
|---|---|
"not found" | Key does not exist (or expired) |
"type mismatch" | Operation incompatible with stored value type |
"db is read-only" | Write attempted on a read-only handle |
"db is closed" | Operation on a closed database |
"indexing paused" | Search attempted while indexing is paused |
Lilush subsystems automatically use MNEME for local persistent storage:
| Subsystem | MNEME file | Keyspace |
|---|---|---|
| Shell history | ~/.local/share/lilush/shell.mneme | shell, lua, general (per mode) |
| Local secrets | ~/.local/share/lilush/shell.mneme | secrets (encrypted) |
| Learned completions | ~/.local/share/lilush/shell.mneme | completions |
| DNS cache | configurable via mneme_path | cache |
| Wiki databases | ~/.local/share/lilush/wiki/*.mneme | __wiki (manifest), app-defined (content, meta, vectors) |
src/mneme/
mneme.h -- C API, types, meta layout, MVCC helpers
db.c -- Database open/close, mmap, meta pages
readers.c -- Reader OFD-lock registry and writer probes
retire.c -- Retired-batch encoding, harvest, persist, rebuild
txn.c -- Transaction begin/commit/abort, flock
btree.c -- B+ tree insert/delete/search/split
cursor.c -- B-tree cursor iteration
page.c -- Page allocation, retire/discard semantics
vector.c -- Vector distance computations
ft.c -- Full-text analyzer and BM25 search
mneme_lua.c -- Lua bindings (mneme.core), DEK secure userdata
mneme.lua -- Public Lua API
mneme/crypto.lua -- Encryption key management (DEK wrap/unwrap, SSH key loading)