An AI-native programming language

The type doesn't check the model. It shapes what the model can say.

Witchcraft makes inference a built-in operation. You declare the shape of the answer you need, and that shape constrains the model as it generates — so a malformed answer isn't rejected afterwards, it's unreachable. The model itself lives outside your code, swappable by a config file.

Get started → View on GitHub

“If you deleted the type, would the model's output change?”

same program · same model · same seed

WITH THE TYPE

# urgency must be a number 0–10
divine r: Urgency
  from (msg) using reader

model generates: 7

TYPE DELETED

# no constraint — free generation
divine r
  from (msg) using reader

model generates: "quite high, I'd say…"

The output changes, because the type is fed into generation token by token — it's part of the computation, not validation bolted on after. That's the line between a real primitive and a wrapper with nice syntax — and it's verified against real model weights.
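To make "fed into generation token by token" concrete, here is a toy sketch in Python rather than Witchcraft. It is an illustration under assumptions, not the project's implementation: the random scores stand in for a real model, and VALID, VOCAB, allowed, and generate are hypothetical names. Before each token is chosen, every token that would take the answer out of the type is removed, so an out-of-type answer is never produced in the first place.

```python
import random

# Every in-type answer for a hypothetical Urgency type: "0" through "10".
VALID = {str(n) for n in range(0, 11)}

# Toy vocabulary: digits, a few non-digits, and an end-of-answer marker.
VOCAB = list("0123456789abc ") + ["<end>"]


def allowed(prefix, tok):
    """True if emitting tok after prefix can still lead to an in-type answer."""
    if tok == "<end>":
        return prefix in VALID
    return any(v.startswith(prefix + tok) for v in VALID)


def generate(seed, constrained, max_steps=8):
    """Greedy decoding over stand-in scores, optionally masked by the type."""
    rng = random.Random(seed)
    out = ""
    for _ in range(max_steps):
        # Stand-in for the model's next-token scores (assumption: random).
        scores = {t: rng.random() for t in VOCAB}
        if constrained:
            # The type acts before the choice is made, at every step.
            scores = {t: s for t, s in scores.items() if allowed(out, t)}
        tok = max(scores, key=scores.get)
        if tok == "<end>":
            break
        out += tok
    return out


print("with the type:", repr(generate(7, constrained=True)))
print("type deleted: ", repr(generate(7, constrained=False)))
```

With the mask on, every seed yields a string in VALID. With it off, the same scores are free to pick letters and spaces. Nothing is rejected after the fact in either case, which is the distinction the paragraph above draws.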

Why it's different

Most “AI-native” tools wrap a model in a library. Witchcraft puts it in the type system.

Four ideas the compiler actually reasons about — not conventions you hope everyone follows.

Shape guaranteed by construction

Declare the answer as a type — a number in a range, a record, one of a fixed set of choices. The model is constrained to produce exactly that. No “it returned prose when I needed a number” bugs.

The model lives outside your code

Your program names a need, never a model. Which model fills it is set in a config file. Swap a tiny local model for a frontier one and the program gets smarter with zero code changes.

Low confidence can't leak through

Every inference carries a confidence score and must clear a gate before you can act on it. Below the threshold, your fallback runs. Uncertain answers route to a human by construction, not by remembering to check.
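The gate can be pictured with a short Python sketch. This is an analogy under assumptions, not how Witchcraft implements it: Inference and gate are hypothetical names, and in Witchcraft the check is part of the language rather than a helper you call. The idea shown is that the raw value is only reachable through the gate, so the fallback path cannot be forgotten.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Inference(Generic[T]):
    """A model answer paired with its confidence score (hypothetical type)."""
    value: T
    confidence: float


def gate(inference: Inference[T], threshold: float, fallback: Callable[[], T]) -> T:
    """Return the inferred value only if it clears the threshold."""
    if inference.confidence >= threshold:
        return inference.value
    # Below the threshold the caller's fallback runs instead.
    return fallback()


confident = Inference("Angry", 0.92)
unsure = Inference("Angry", 0.41)

print(gate(confident, 0.7, lambda: "RouteToHuman"))
print(gate(unsure, 0.7, lambda: "RouteToHuman"))
```

The second call never returns the low-confidence value; it returns whatever the fallback produces, which here stands for handing the case to a person.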

Powers are visible and checked

Reaching the network, touching scoped data, escalating — every capability is declared in the source and enforced at compile time. A program that could phone a cloud model has to say so, where you can read it.
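A rough Python sketch of what a declared-capability check might look like. Everything here is assumed for illustration: the capability names, the operation names, and the REQUIRES table are hypothetical and are not Witchcraft's actual capability set or compiler logic. The sketch only shows the general shape of the rule: each operation needs certain capabilities, and a program that uses one without declaring it is rejected before it runs.

```python
# Hypothetical mapping from operations to the capabilities they need.
REQUIRES = {
    "call_cloud_model": {"network"},
    "read_customer_record": {"scoped_data"},
    "page_on_call": {"escalate"},
    "print": set(),
}


def check_capabilities(declared, operations):
    """Return a list of errors; an empty list means the check passes."""
    errors = []
    for op in operations:
        missing = REQUIRES[op] - declared
        if missing:
            errors.append(f"{op} needs undeclared capability: {sorted(missing)}")
    return errors


# A program that declares network access but also touches scoped data.
program_ops = ["print", "call_cloud_model", "read_customer_record"]
for err in check_capabilities({"network"}, program_ops):
    print("error:", err)
```

Because the check looks only at declarations and the operations in the source, it can run without executing the program or loading a model.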

Quickstart

From clone to a running program in three steps.

Build the toolchain

You need a Rust toolchain with cargo. The default build ships a deterministic offline engine, so you can write and run programs with no model installed.

$ git clone https://github.com/sjwaller/witchcraft.git
$ cd witchcraft && cargo build --release
# → target/release/witch and grimoire

Write a program

Describe the shape you want; ask a model to fill it. Save as mood.witch.

type Reading = { feeling: one_of { Happy, Annoyed, Angry, Worried }, urgency: spark in 0..10 }

oracle reader = summon "MoodReader"

divine r: Reading from ("the site keeps logging me out") using reader
with confidence >= 0.0 fallback { feeling: Annoyed, urgency: 0 }

print "feeling: ${r.feeling}, urgency: ${r.urgency}/10"

Check it, run it

Type-check without running, then run against the offline engine. Point it at a real model when you're ready.

$ witch check mood.witch # static checks only
$ witch run mood.witch --seed 7
feeling: Annoyed, urgency: 6/10

The payoff

The same program gets smarter. You change a config file, not the code.

Your program never names a model — it names a need. A manifest binds that need to a real engine. Swap the manifest, keep the source.

mood.local.toml — a small local model

[need.MoodReader]
engine   = "small"
locality = "local"

[engine.small]
kind = "llama"
gguf = "./models/qwen2.5-0.5b.gguf"

mood.better.toml — a sharper one, no code change

[need.MoodReader]
engine   = "big"
locality = "local"

[engine.big]
kind = "llama"
gguf = "./models/qwen2.5-7b.gguf"

$ witch run mood.witch --manifest mood.local.toml # rough read
$ witch run mood.witch --manifest mood.better.toml # sharper read — same source

What it's for

The layer between messy human input and software that has to act on it.

Anywhere a model reads unstructured text and must return a clean, structured decision your code can trust: support routing, inbox triage, alert and log classification, moderation pre-filters, structured extraction from documents.

What it does — and doesn't — promise

It guarantees shape, never quality. The answer will always be well-formed and in-type. It will not always be wise — a weak model gives a well-shaped but poor judgement. Choosing a capable model is your job.
It's built for bounded answers, not open-ended chat. Decisions, classifications, structured extractions — things with a describable shape.
Agents are deliberately bounded. No free-running autonomy; you drive the loop and the model fills in judgements.
This is v0.1. The core thesis is proven and verified against real model weights; some surface is still being built.

Read further

Documentation & downloads.

GETTING STARTED

README

Build, install, run, and compile your first program. The practical front door.

Open →

LEARN THE LANGUAGE

Language Guide

Every construct explained from scratch — types, divine, memory, agents, capabilities.

Read the guide →

DOWNLOADS

Releases

Prebuilt binaries and tagged source. Grab the toolchain for your platform.

Get binaries →

SAMPLE PROGRAMS

Examples

Working .witch programs — the triage flagship, the strict-network demo, and more.

Browse examples →

THE THINKING

The discussion paper

“What would an AI-native programming language actually make primitive?” — the idea behind it.

Read the paper →

HOW IT WAS BUILT

OpenSpec history

The spec-driven design record — every proposal, decision, and guarantee, in the open.

See the specs →