VectorPin

Verifiable integrity for AI embedding stores.

VectorPin is the provenance layer of the ThirdKey trust stack: SchemaPin (tool integrity) → AgentPin (agent identity) → VectorPin (vector store integrity) → Symbiont (runtime).

What VectorPin Does

Vector databases underpin nearly every modern RAG system, yet most are written and read with no integrity checking at all. VectorPin binds each embedding to its source content and to the model that produced it, then verifies that nothing has changed, including covert steganographic modifications that traditional data loss prevention (DLP) tools cannot detect.

- Pinning: sign a compact attestation that commits to the source text (SHA-256 of NFC-normalized UTF-8), the model identifier, the vector itself (SHA-256 of its canonical little-endian bytes), the producer (an Ed25519 key), and an RFC 3339 timestamp.
- Verification: reject any embedding whose vector hash, source, model, signature, or kid does not match. Each failure has its own error code (for example VECTOR_TAMPERED, SOURCE_MISMATCH, or UNKNOWN_KEY) so callers can route them differently.
- Auditing: walk an entire LanceDB, Chroma, Qdrant, or Pinecone collection and report on every record. The audit prints a JSON summary on stdout and exits non-zero on any failure, so it drops into CI or cron jobs unchanged.
- Key rotation: verifier registries hold multiple kid → public_key mappings, each with a (valid_from, valid_until) window. Pins signed before a rotation keep verifying; once a compromised key's window is closed at the compromise instant, anything signed after that instant fails with KEY_EXPIRED.
- Cross-language: the Python, Rust, and TypeScript implementations are byte-for-byte compatible, enforced by shared test vectors in CI.
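The audit contract (check every record, print a JSON summary, exit non-zero on any failure) is what lets the audit drop into CI or cron. The sketch below mimics that contract for a custom script. It is an illustration under assumptions, not the `vectorpin` CLI: the `audit` function, the record shape, the summary fields, and the `verify` callback are all hypothetical.

```python
import json
from typing import Callable, Iterable, Optional

# Hypothetical callback: returns None if a record verifies, else an error code.
VerifyFn = Callable[[dict], Optional[str]]


def audit(records: Iterable[dict], verify: VerifyFn) -> int:
    """Verify every record, print a JSON summary, and return a process exit code."""
    total = 0
    failures = []
    for record in records:
        total += 1
        error = verify(record)
        if error is not None:
            failures.append({"id": record.get("id"), "error": error})
    summary = {"total": total, "failed": len(failures), "failures": failures}
    print(json.dumps(summary))  # machine-readable summary on stdout
    # Any failure yields a non-zero code, so CI or cron marks the run as failed.
    return 1 if failures else 0


# Demo with two fake records; the second one simulates a tampered vector.
records = [{"id": "doc-1", "tampered": False}, {"id": "doc-2", "tampered": True}]
exit_code = audit(records, lambda r: "VECTOR_TAMPERED" if r["tampered"] else None)
print(exit_code)  # 1
```

A wrapper script would end with `sys.exit(audit(records, verify))`, so a cron job or CI step fails whenever any record fails verification.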

Quick Example

```python
from vectorpin import Signer, Verifier

text = "The quick brown fox."

# At ingestion time.
# my_model is a placeholder for your embedding model.
signer = Signer.generate(key_id="prod-2026-05")
embedding = my_model.embed(text)
pin = signer.pin(
    source=text,
    model="text-embedding-3-large",
    vector=embedding,
)
# Store pin.to_json() alongside the embedding in your vector DB metadata.

# At read/audit time: register the signer's public key under its key ID (kid).
verifier = Verifier({"prod-2026-05": signer.public_key_bytes()})
result = verifier.verify(pin, source=text, vector=embedding)
if not result.ok:
    print(f"INTEGRITY FAILURE: {result.error.value} ({result.detail})")
```
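The `Verifier` above takes a plain kid → public key mapping. The key-rotation behavior described earlier adds a validity window per kid, and the sketch below shows one way such a lookup could work. All names (`KeyEntry`, `resolve_key`) are hypothetical, the byte strings stand in for real Ed25519 public keys, and treating a timestamp before `valid_from` as KEY_EXPIRED is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class KeyEntry:
    public_key: bytes
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means the key is still active


def resolve_key(
    registry: Dict[str, KeyEntry], kid: str, signed_at: datetime
) -> Union[bytes, str]:
    """Return the public key for `kid` at `signed_at`, or an error code string."""
    entry = registry.get(kid)
    if entry is None:
        return "UNKNOWN_KEY"
    # Assumption: any timestamp outside [valid_from, valid_until) is rejected.
    if signed_at < entry.valid_from:
        return "KEY_EXPIRED"
    if entry.valid_until is not None and signed_at >= entry.valid_until:
        return "KEY_EXPIRED"
    return entry.public_key


UTC = timezone.utc
registry = {
    # Placeholder bytes stand in for real 32-byte Ed25519 public keys.
    "prod-2026-01": KeyEntry(
        b"old-key",
        valid_from=datetime(2026, 1, 1, tzinfo=UTC),
        valid_until=datetime(2026, 5, 1, tzinfo=UTC),  # for example, the compromise instant
    ),
    "prod-2026-05": KeyEntry(b"new-key", valid_from=datetime(2026, 5, 1, tzinfo=UTC)),
}

# Pin timestamps are RFC 3339; fromisoformat accepts the "+00:00" offset form.
before = datetime.fromisoformat("2026-03-15T12:00:00+00:00")
after = datetime.fromisoformat("2026-06-01T09:30:00+00:00")

print(resolve_key(registry, "prod-2026-01", before))  # b'old-key'
print(resolve_key(registry, "prod-2026-01", after))   # KEY_EXPIRED
print(resolve_key(registry, "staging", before))       # UNKNOWN_KEY
```

Closing the old key's window at the compromise instant is what turns later signatures into KEY_EXPIRED, while pins created before that instant still resolve to the old key.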

Implementations

| Language | Package | Install |
|---|---|---|
| Python | `vectorpin` | `pip install vectorpin` |
| Rust | `vectorpin` | `cargo add vectorpin` |
| TypeScript | `vectorpin` | `npm install vectorpin` |

All three are byte-for-byte compatible: a pin produced by any implementation verifies with the other two. The TypeScript implementation is pure JavaScript, built on @noble/ed25519 and @noble/hashes, so it also runs in Deno, Bun, and edge runtimes.
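Byte-for-byte compatibility hinges on every implementation canonicalizing its inputs identically before hashing. The two content commitments from the Pinning step (SHA-256 of NFC-normalized UTF-8 source text, SHA-256 of the vector's little-endian bytes) can be sketched with the Python standard library. This is a minimal sketch, not the `vectorpin` package's code: `source_hash` and `vector_hash` are hypothetical names, and it assumes float32 elements and hex digests; the Pin Protocol specification defines the authoritative byte layout.

```python
import hashlib
import struct
import unicodedata
from typing import Sequence


def source_hash(text: str) -> str:
    """SHA-256 of the NFC-normalized UTF-8 encoding of the source text."""
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def vector_hash(vector: Sequence[float]) -> str:
    """SHA-256 of the vector packed as little-endian float32 values.

    Assumption: float32 elements and a hex digest; the real canonical
    layout is defined by the Pin Protocol specification.
    """
    # "<" forces little-endian byte order regardless of the host CPU.
    packed = struct.pack(f"<{len(vector)}f", *vector)
    return hashlib.sha256(packed).hexdigest()


# Precomposed "é" (U+00E9) and "e" + combining acute (U+0301) look identical;
# NFC maps both to the same code points, so their source hashes match.
print(source_hash("caf\u00e9") == source_hash("cafe\u0301"))  # True

# float32 1.0 is 0x3F800000, serialized little-endian as 00 00 80 3f.
print(struct.pack("<1f", 1.0).hex())  # 0000803f
```

Without NFC normalization, two visually identical source strings could hash differently across platforms; the explicit `<` prefix does the same job for the vector bytes on big- and little-endian hosts.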

Why this matters

Modern RAG systems convert sensitive content into high-dimensional vectors and store them in databases that don't inspect what gets written, don't verify integrity on read, and treat embeddings as opaque numerical artifacts. Together, these gaps leave a large and largely unmonitored attack surface.

The companion VectorSmuggle research project demonstrates that an attacker with write access to a vector pipeline can hide arbitrary data inside embeddings using noise injection, rotation, scaling, offset perturbations, cross-model fragmentation, and steganographic encoding that survives quantization.

Cryptographic pinning is the kill shot. Every one of these steganographic techniques modifies the vector after the model produces it. If each vector ships with a signed attestation binding it to its source text and producing model, any such modification changes the vector's hash, which then no longer matches the signed attestation, so verification fails.
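The claim that even a tiny post-model modification is caught can be checked with a few lines of standard-library Python. The sketch flips the lowest mantissa bit of one component, the kind of least-significant-bit change a steganographic encoder relies on, and compares SHA-256 digests of the packed bytes. It assumes little-endian float32 packing as the canonical form, and the helper names are hypothetical.

```python
import hashlib
import struct
from typing import List


def to_float32(value: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def vector_hash(vector: List[float]) -> str:
    # Assumed canonical form: little-endian float32, hashed with SHA-256.
    return hashlib.sha256(struct.pack(f"<{len(vector)}f", *vector)).hexdigest()


def flip_lowest_bit(value: float) -> float:
    """Toggle the least significant mantissa bit of a float32 value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", bits ^ 1))[0]


original = [to_float32(x) for x in (0.12, -0.58, 0.33, 0.91)]
tampered = list(original)
tampered[2] = flip_lowest_bit(tampered[2])  # hide one bit of payload

print(abs(tampered[2] - original[2]))  # about 3e-08: one float32 ulp at 0.33
print(vector_hash(original) == vector_hash(tampered))  # False
```

A change of one unit in the last place barely moves similarity scores, which is why this kind of tampering is hard to spot by inspection, yet the digest changes completely and the pinned vector hash no longer matches.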

Documentation

| Guide | Description |
|---|---|
| Getting Started | Install, generate keys, pin and verify embeddings |
| Pin Protocol | Wire format, canonicalization, and verification order |
| CLI Guide | `vectorpin keygen`, `pin`, `verify-pin`, and `audit-*` commands |
| Vector Store Adapters | LanceDB, Chroma, Qdrant, Pinecone integrations |
| Statistical Detectors | Defense-in-depth against ingestion-time poisoning |
| Deployment | Key custody, rotation, and CI integration |
| Security | Threat model and best practices |
| Troubleshooting | Common errors and solutions |
| Specification | Protocol v2 wire-format specification |