An AI-native programming language

The type doesn't check the model. It shapes what the model can say.

Witchcraft makes inference a built-in operation. You declare the shape of the answer you need, and that shape constrains the model as it generates — so a malformed answer isn't rejected afterwards, it's unreachable. The model itself lives outside your code, swappable by a config file.

“If you deleted the type, would the model's output change?”
same program · same model · same seed
WITH THE TYPE
# urgency must be a number 0–10
divine r: Urgency
  from (msg) using reader
model generates: 7
TYPE DELETED
# no constraint — free generation
divine r
  from (msg) using reader
model generates: "quite high, I'd say…"

The output changes, because the type is fed into generation token by token — it's part of the computation, not validation bolted on after. That's the line between a real primitive and a wrapper with nice syntax — and it's verified against real model weights.
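
To make "fed into generation token by token" concrete, here is a minimal Python sketch of constrained decoding in general. It is not Witchcraft's implementation: the toy vocabulary, the `fake_logits` scorer, and the `allowed` prefix check are hypothetical stand-ins, and the range 0 to 10 is assumed to be inclusive. At each step, tokens that could not extend the output into a valid value are masked out before the next token is picked.

```python
import math

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["quite", " high", "0", "1", "7", "10", "<end>"]

def fake_logits(prefix: str) -> list[float]:
    """Stand-in for a model that prefers prose, then the digit 7."""
    scores = {t: 0.0 for t in VOCAB}
    if prefix == "":
        scores.update({"quite": 3.0, "7": 2.0, "1": 1.0})
    elif prefix == "quite":
        scores[" high"] = 3.0
    else:
        scores["<end>"] = 3.0
    return [scores[t] for t in VOCAB]

def allowed(prefix: str, token: str, lo: int = 0, hi: int = 10) -> bool:
    """True if emitting `token` keeps the output a prefix of some integer in lo..hi."""
    valid = [str(v) for v in range(lo, hi + 1)]
    if token == "<end>":
        return prefix in valid  # may only stop on a complete value
    return any(v.startswith(prefix + token) for v in valid)

def generate(constrain: bool, max_steps: int = 8) -> str:
    out = ""
    for _ in range(max_steps):
        logits = fake_logits(out)
        if constrain:
            # The type takes part in generation: illegal tokens are masked here.
            logits = [l if allowed(out, t) else -math.inf
                      for l, t in zip(logits, VOCAB)]
        token = VOCAB[max(range(len(VOCAB)), key=lambda i: logits[i])]
        if token == "<end>":
            break
        out += token
    return out

print(generate(constrain=True))   # 7
print(generate(constrain=False))  # quite high
```

Same scorer, same greedy pick. Only the mask differs, and the output changes with it, which is the behaviour the deletion test above probes for.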

Why it's different

Most “AI-native” tools wrap a model in a library. Witchcraft puts it in the type system.

Four ideas the compiler actually reasons about — not conventions you hope everyone follows.

Shape guaranteed by construction

Declare the answer as a type — a number in a range, a record, one of a fixed set of choices. The model is constrained to produce exactly that. No “it returned prose when I needed a number” bugs.

The model lives outside your code

Your program names a need, never a model. Which model fills it is set in a config file. Swap a tiny local model for a frontier one and the program gets smarter with zero code changes.

Low confidence can't leak through

Every inference carries a confidence score and must clear a gate before you can act on it. Below the threshold, your fallback runs. Uncertain answers route to a human by construction, not by remembering to check.
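
The gate described above can be pictured with a small Python sketch. This is an illustration only, not Witchcraft's runtime: `Inference`, `gate`, and the `"NeedsHuman"` fallback value are hypothetical names, and confidence is assumed to lie between 0.0 and 1.0.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Inference(Generic[T]):
    value: T
    confidence: float  # assumed to lie in [0.0, 1.0]

def gate(inference: Inference[T], threshold: float, fallback: T) -> T:
    """Return the inferred value only if it clears the threshold, else the fallback."""
    return inference.value if inference.confidence >= threshold else fallback

sure = Inference(value="Angry", confidence=0.92)
unsure = Inference(value="Happy", confidence=0.31)

print(gate(sure, 0.8, "NeedsHuman"))    # Angry
print(gate(unsure, 0.8, "NeedsHuman"))  # NeedsHuman
```

Python cannot stop a caller from reading `unsure.value` directly. The claim above is that Witchcraft closes that hole in the language itself, so the fallback path cannot be skipped by accident.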

Powers are visible and checked

Reaching the network, touching scoped data, escalating — every capability is declared in the source and enforced at compile time. A program that could phone a cloud model has to say so, where you can read it.
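
As a rough picture of what a compile-time capability check involves, the Python sketch below compares what a program declares with what it would need. The capability names and the `check_capabilities` helper are hypothetical and do not reflect Witchcraft's actual capability syntax or compiler.

```python
def check_capabilities(declared: set[str], required: set[str]) -> list[str]:
    """Return one error per capability the program uses but never declared."""
    return [f"undeclared capability: {cap}" for cap in sorted(required - declared)]

# Hypothetical program: it declares local inference but also reaches the network.
declared = {"local_model"}
required = {"local_model", "network"}

errors = check_capabilities(declared, required)
print(errors)  # ['undeclared capability: network']
```

The check runs before any code executes, so a missing declaration is a build failure rather than a surprise at run time.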

Quickstart

From clone to a running program in three steps.

i. Build the toolchain

You need Rust. The default build ships a deterministic offline engine, so you can write and run programs with no model installed.

$ git clone https://github.com/sjwaller/witchcraft.git
$ cd witchcraft && cargo build --release
# → target/release/witch and grimoire

ii. Write a program

Describe the shape you want; ask a model to fill it. Save as mood.witch.

type Reading = { feeling: one_of { Happy, Annoyed, Angry, Worried }, urgency: spark in 0..10 }

oracle reader = summon "MoodReader"

divine r: Reading from ("the site keeps logging me out") using reader
  with confidence >= 0.0 fallback { feeling: Annoyed, urgency: 0 }

print "feeling: ${r.feeling}, urgency: ${r.urgency}/10"

iii. Check it, run it

Type-check without running, then run against the offline engine. Point it at a real model when you're ready.

$ witch check mood.witch # static checks only
$ witch run mood.witch --seed 7
feeling: Annoyed, urgency: 6/10

The payoff

The same program gets smarter. You change a config file, not the code.

Your program never names a model — it names a need. A manifest binds that need to a real engine. Swap the manifest, keep the source.

mood.local.toml — a small local model
[need.MoodReader]
engine   = "small"
locality = "local"

[engine.small]
kind = "llama"
gguf = "./models/qwen2.5-0.5b.gguf"

mood.better.toml — a sharper one, no code change
[need.MoodReader]
engine   = "big"
locality = "local"

[engine.big]
kind = "llama"
gguf = "./models/qwen2.5-7b.gguf"
$ witch run mood.witch --manifest mood.local.toml  # rough read
$ witch run mood.witch --manifest mood.better.toml # sharper read — same source

What it's for

The layer between messy human input and software that has to act on it.

Anywhere a model reads unstructured text and must return a clean, structured decision your code can trust: support routing, inbox triage, alert and log classification, moderation pre-filters, structured extraction from documents.

What it does — and doesn't — promise

  • It guarantees shape, never quality. The answer will always be well-formed and in-type. It will not always be wise — a weak model gives a well-shaped but poor judgement. Choosing a capable model is your job.
  • It's built for bounded answers, not open-ended chat. Decisions, classifications, structured extractions — things with a describable shape.
  • Agents are deliberately bounded. No free-running autonomy; you drive the loop and the model fills in judgements.
  • This is v0.1. The core thesis is proven and verified against real model weights; parts of the language surface are still being built.

Read further

Documentation & downloads.