Reference
title: "Rewrite producer" description: "Object-tree mutations on a single PDF — OCG flips, metadata patches, color-space swaps, hygiene strips, page lifecycle ops. No content-stream surgery." group: "Reference" order: 10
Rewrite producer
compile_pdf.rewrite applies object-tree mutations to a single PDF
input. The output is a single PDF with the requested mutations
applied and everything else byte-identical to the input — that
"nothing else touched" guarantee is mechanically verified.
What it does
15 in-scope mutations grouped by category:
Structural
- OCG flips — toggle Optional Content Group visibility per layer.
- Page lifecycle ops — insert / delete / reorder / rotate pages.
- Box adjustments — trim, bleed, art, crop, media boxes.
- Page-label surgery — fix
1, 2, i, ii, …numbering.
Hygiene
- Metadata patches — Info dict + XMP. Strip / set / replace.
- Color-space swaps — DeviceRGB → DeviceCMYK pin (or inverse).
- Strip JS — remove
/JavaScriptactions and/JSkeys. - Strip embedded files — remove
/EmbeddedFiles. - Normalize page-tree fan-out — flatten deep trees.
Lifecycle
- PDF/X version pin — declare conformance level.
- Producer / Creator stamping — operator provenance.
What it doesn't do
Out of scope, gated by a hard STOP-gate in the audit:
- Content-stream surgery
- Font subsetting / embedding changes
- Image recompression
- Color reflow
These belong to a future producer (or never — see the design spec for the rationale).
Plan schema
A rewrite plan is a JSON document. Top-level fields:
{
"schema_version": "1.0.0",
"ops": [
{ "op": "ocg_flip", "layer": "Bleed", "visible": false },
{ "op": "metadata_set", "key": "Title", "value": "Job 12345" },
{ "op": "metadata_strip", "keys": ["JS", "JavaScript"] },
{ "op": "page_rotate", "page": 1, "degrees": 90 },
{ "op": "box_set", "page": 2, "box": "TrimBox", "rect_pt": [0, 0, 612, 792] }
]
}
Schema documented at compile-pdf schema rewrite. Validation runs
client-side (CLI) and server-side (POST /v1/rewrite/apply).
Determinism guarantee
Same input + same plan produces byte-identical output (verified by
SHA-256). The cache key composer (src/compile_pdf/cache.py)
includes the canonical plan hash, so re-running an identical request
short-circuits to the cached output.
Codex surface consumed
codex_pdf.CodexDocument— Compile reads the document to validate page-index references in the plan ("you can't delete page 12 of a 10-page PDF").
No re-implementation; the audit script enforces.
Retention-for-training
POST /v1/rewrite/apply honours the X-Compile-Retain-For-Training
header. When truthy and COMPILE_RETAIN_BUCKET is configured, the
call's input/output/result triplet is persisted to S3-compatible
storage with a TTL tag. The decision is reflected on the lineage
record. See operations/retention.md.
Status
Shipped. The mutation engine (pikepdf) + three-layer verify +
POST /v1/rewrite/apply are live; determinism is enforced in CI.