# PROCESS — the exact steps we followed

Every step below actually happened on a Windows 11 machine with Adobe InDesign 2026
(build 21.4.0.74). Failures are recorded with their real root cause and fix. Nothing
here is hypothetical; forward-looking ideas live only in the "Where this could go"
section of the story and are labelled as such.

## 0. Environment
- Confirmed InDesign 2026 installed; found the working COM ProgID.
  - `InDesign.Application.2026` and `InDesign.Application` work.
  - `…2023/2024/2025` → `REGDB_E_CLASSNOTREG`. **(setup wall #1)**
- `win32com.client.gencache.EnsureDispatch` failed with *"Library not registered"*.
  - Fix: run `makepy.GenerateFromTypeLibSpec()` against InDesign's cached
    `Resources for Visual Basic.tlb`, which exposes the named enum constants. **(wall #2)**
- Python 3.11 venv; installed pymupdf, pillow, numpy, scikit-image, opencv, lxml,
  simpleidml, pywin32. Chose 3.11 (not the system 3.14) for mature scientific wheels.
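
The ProgID probing from wall #1 can be sketched as a small helper. This is a minimal illustration, not the project's code: `first_working_progid` and `probe_indesign` are hypothetical names, and the dispatcher is passed in so the logic can be exercised without InDesign installed.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def first_working_progid(
    candidates: Iterable[str],
    dispatch: Callable[[str], object],
) -> Tuple[Optional[str], Dict[str, Exception]]:
    """Return the first ProgID that `dispatch` accepts, plus the errors seen on the way."""
    errors: Dict[str, Exception] = {}
    for progid in candidates:
        try:
            dispatch(progid)
        except Exception as exc:  # pywin32 raises pywintypes.com_error (an Exception subclass)
            errors[progid] = exc
            continue
        return progid, errors
    return None, errors


def probe_indesign() -> Optional[str]:
    """Windows only: needs pywin32 and an installed InDesign (this launches it)."""
    import win32com.client  # imported lazily so the helper above stays portable

    progid, _ = first_working_progid(
        ["InDesign.Application.2026", "InDesign.Application"],
        win32com.client.Dispatch,
    )
    return progid
```

On the machine described above, `probe_indesign()` would be expected to return the first of the two ProgIDs listed as working; unregistered ProgIDs end up in the error dictionary instead of aborting the probe.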

## 1. Source
- Downloaded OpenStax *Concepts of Biology* (CC BY 4.0); picked Chapter 5
  (Photosynthesis), pages 129–133, the most consistently colourful 5-page span
  according to a scan of image area, fills, and text blocks.
- Extracted those pages to `source_5pages.pdf`.
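
The scan itself is not reproduced here, but the span selection can be sketched as a sliding window. The assumptions are loud ones: each page is reduced to a single hypothetical "colourfulness" score, and "most consistent" is read as "the window whose weakest page scores highest". `best_span` is an illustrative name.

```python
from typing import Sequence, Tuple


def best_span(scores: Sequence[float], width: int = 5) -> Tuple[int, int]:
    """Return (first, last) 0-based page indices of the window whose weakest page is strongest."""
    if width <= 0 or width > len(scores):
        raise ValueError("width must be between 1 and len(scores)")
    best_start = 0
    best_key = None
    for start in range(len(scores) - width + 1):
        window = scores[start:start + width]
        # Rank by the worst page first; use the window total only as a tie-break.
        key = (min(window), sum(window))
        if best_key is None or key > best_key:
            best_key = key
            best_start = start
    return best_start, best_start + width - 1
```

Ranking by the minimum rather than the mean keeps one very colourful page from carrying four dull ones, which matches the "consistently" in the selection criterion.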

## 2. Extraction layer (the great equalizer)
- `extract.py` (PyMuPDF) → `layout_spec.json` + extracted PNG assets + verification
  overlays. Result: 173 text runs, 7 images, 9 fills across 5 pages.
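
As a rough sketch of the text half of this layer: PyMuPDF's `page.get_text("dict")` returns nested blocks, lines, and spans, and flattening the spans gives one record per text run. The record fields below are an assumed shape for illustration, not the actual `layout_spec.json` schema, and `runs_from_textdict` is a hypothetical helper.

```python
from typing import Any, Dict, List


def runs_from_textdict(textdict: Dict[str, Any], page_no: int) -> List[Dict[str, Any]]:
    """Flatten a PyMuPDF-style text dict into one record per non-empty span."""
    runs: List[Dict[str, Any]] = []
    for block in textdict.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block, 1 = image block
            continue
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                text = span.get("text", "")
                if not text.strip():
                    continue  # skip whitespace-only spans
                runs.append({
                    "page": page_no,
                    "text": text,
                    "font": span.get("font"),
                    "size": span.get("size"),
                    # PyMuPDF stores span colour as an sRGB integer.
                    "color": "#%06x" % span.get("color", 0),
                    "bbox": [round(v, 2) for v in span["bbox"]],
                })
    return runs
```

With PyMuPDF installed, a call would look like `runs_from_textdict(page.get_text("dict"), page.number)` for each page of `source_5pages.pdf`.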

## 3. Fonts
- The book uses IBM Plex Sans + Mulish (both SIL OFL). Not installed by default →
  InDesign would substitute and the layout would drift. Installed both variable fonts
  per-user; InDesign then exposed every named instance, so substitution count = 0.

## 4. The five generation tracks
### B1 — PyWin32 COM
- Drive the DOM from Python. **Failure 1:** page exported 12× too big because
  `PageWidth = 612` was read as picas (612 × 12 = 7344 pt); fixed by setting
  `MeasurementUnit = idPoints` before any geometry. **Failure 2:** all 7 images failed
  on `Fit(idFitContentToFrame)` (wrong enum group) — fixed with `Fit(idContentToFrame)`.
  Result: composite 90.9, SSIM up to 0.987.
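
The arithmetic behind Failure 1 is easy to check: a pica is 12 points, so a bare `612` interpreted as picas becomes 7344 pt. The helper below is only an illustration of that conversion (`to_points` and the unit table are hypothetical names, not InDesign API).

```python
# Points per unit for the units relevant here (1 pica = 12 pt, 1 inch = 72 pt).
POINTS_PER_UNIT = {"points": 1.0, "picas": 12.0, "inches": 72.0}


def to_points(value: float, unit: str) -> float:
    """Convert a bare number in `unit` to points."""
    return value * POINTS_PER_UNIT[unit]


intended = to_points(612, "points")  # US Letter width, as intended
misread = to_points(612, "picas")    # the same number read as picas
print(intended, misread, misread / intended)
```

This is why the measurement unit has to be pinned before any geometry is written: the numbers passed over COM carry no unit of their own.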
### B2 — ExtendScript
- Same logic via `app.doScript`. **Failure 3:** identical logic scored SSIM 0.79 vs
  0.96 because ExtendScript applied a `CAP_HEIGHT` baseline while PyMuPDF bboxes are
  ascender-derived — fixed with `ASCENT_OFFSET`. Result: identical to B1 (90.9).
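
To see why the first-baseline setting matters, consider where the first baseline lands when a frame's top edge is placed at the top of a source bbox. The sketch below assumes the baseline is inset by a fraction of the font size; the two em values are illustrative placeholders, not measured metrics of IBM Plex Sans or Mulish.

```python
def first_baseline_y(frame_top: float, font_size: float, offset_em: float) -> float:
    """Y position of the first baseline when the frame insets it by `offset_em` ems."""
    return frame_top + font_size * offset_em


# Illustrative metrics only (assumptions, not read from the real fonts).
ASCENT_EM = 0.95
CAP_HEIGHT_EM = 0.70

size = 24.0
top = 100.0
ascent_baseline = first_baseline_y(top, size, ASCENT_EM)
cap_baseline = first_baseline_y(top, size, CAP_HEIGHT_EM)

# With a cap-height offset the text sits higher than the ascender-derived bbox expects.
shift = ascent_baseline - cap_baseline
print(f"vertical shift at {size} pt: {shift:.2f} pt")
```

The shift scales with font size, so large headings move furthest, which is consistent with a noticeable SSIM drop from an otherwise identical build.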
### B3 — UXP
- Ported to the UXP runtime via `doScript(…, idUxpscript)`. **Failure 4:**
  "app is not defined" — UXP needs `require('indesign')`; enums live on the module; no
  return marshaling. Result: byte-identical to B1/B2 (90.9).
### B5 — simpleidml
- Built a tagged IDML template in InDesign (fixed photo + placeholder frames + tag→style
  map), then injected text with `IDMLPackage.import_xml()` in pure Python (no InDesign in
  the fill). **Limit:** title overflowed its placeholder; free-form structure can't be
  reproduced. Composite 59.6 on 1 page — a template tool, not a layout engine.
### B4 — MCP / agentic visual loop
- The suggested repo (`chris-enea/indesign-mcp`) is macOS-only / text-only, so we built
  a COM tool layer and ran a vision-only render→compare→correct loop. **Failure 5:** it
  fixed a distorted photo (switched to fill-proportionally) but composite only moved
  0.601 → 0.608 across 3 rounds — text reflow is the wall. Composite 43.5.
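
The control flow of such a render→compare→correct loop can be sketched generically. The three callables are placeholders for the COM render, the comparison score, and the vision-driven correction step; their names, and the plateau threshold, are assumptions made for this sketch.

```python
from typing import Callable, List


def run_visual_loop(
    render: Callable[[], object],
    score: Callable[[object], float],
    correct: Callable[[object], bool],
    max_rounds: int = 3,
    min_gain: float = 0.01,
) -> List[float]:
    """Render, score, correct, repeat. Returns one score per render."""
    image = render()
    scores = [score(image)]
    for _ in range(max_rounds):
        if not correct(image):  # the corrector found nothing left to change
            break
        image = render()
        scores.append(score(image))
        if scores[-1] - scores[-2] < min_gain:  # plateau: stop spending rounds
            break
    return scores
```

A plateau check like `min_gain` makes the "text reflow is the wall" outcome explicit: when corrections stop moving the score, the loop ends instead of burning further rounds.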

### B6 — block-level / threaded-story (the experiment that regressed)
- Hypothesis: group spans into source blocks, set whole-block text in one frame, and let
  InDesign reflow → more faithful paragraphs. **Result: it regressed**: composite
  **80.7 vs B1's 90.9**, text-IoU **~48 vs ~75**. InDesign's composer re-breaks justified
  prose at different line endings than the source, and block-grouping fused distinct
  elements (title mis-wrapped, the 5.1/5.2/5.3 outline collapsed, "INTRODUCTION" lost its
  bold). Lesson: per-span placement was already right for faithful *reproduction*;
  threaded stories suit *editable* documents, not pixel matches. Documented in the journey
  (section 06) as an honest negative result. Code: `tracks/b6_blocks/build_b6.py`.

### B7 — precision pass (the fix that broke the ceiling)
- The score sheet after the five tracks showed image/colour/judge near-maxed; the only
  real drag was one page. Diagnosis: page 3's “Everyday Connection” callout is a *frame*
  (outer+inner rectangle, even-odd fill) that our extractor had recorded as a single
  solid magenta fill, so the build painted it over the photo and text.
- Fix: keep B1's exact per-span text + image placement, but re-classify every drawing
  into **fill** (header bars/tints), **border** (frame → rendered as a stroke), and
  **rule** (thin bar). Page 3 jumped colour 63→100, SSIM 0.875→0.994.
- **Result: composite 90.9 → 93.3**, mean SSIM 0.959 → 0.985. New best track.
  Documented in the journey (section 07). Code: `tracks/b7_precision/build_b7.py`.
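
The re-classification can be sketched over drawing records shaped like PyMuPDF's `page.get_drawings()` output (keys such as `rect`, `items`, `fill`, `color`, `even_odd`). The heuristics and the thickness threshold below are assumptions for illustration; the rules actually used live in `build_b7.py`.

```python
from typing import Any, Dict


def classify_drawing(d: Dict[str, Any], rule_max_thickness: float = 2.0) -> str:
    """Classify a drawing record as 'border', 'rule', or 'fill'."""
    x0, y0, x1, y1 = d["rect"]
    width, height = abs(x1 - x0), abs(y1 - y0)
    rect_items = [it for it in d.get("items", []) if it[0] == "re"]

    # Outer + inner rectangle with even-odd fill: only the ring between them is painted.
    if d.get("even_odd") and len(rect_items) >= 2:
        return "border"
    # Very thin in one dimension: a rule, whether filled or stroked.
    if min(width, height) <= rule_max_thickness:
        return "rule"
    # Stroked outline with no fill.
    if d.get("fill") is None and d.get("color") is not None:
        return "border"
    return "fill"
```

Checking the even-odd case first is the important part: it is what keeps a hollow frame from being treated as a solid rectangle covering everything inside it.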

### B8 — fit-to-width (a second experiment that regressed; internal)
- Hypothesis: measure each span's rendered width in InDesign and apply a per-span
  horizontal scale so right edges match the source → higher text-IoU.
- **Result: regressed to mean SSIM 0.857.** Horizontal scaling distorts glyph shapes
  (worse pixel match), and the measure-then-resize step left some frames too narrow so
  headings wrapped (“CHAP- / Photosyn-”). Kept internal under `outputs/b8_fitwidth/`.
- Lesson: B7 already places spans at true position with the real font at 100% scale.
  Forcing widths fights the font; the residual text-IoU gap is sub-pixel and not worth
  distorting type to chase. **93.3 is the ceiling with these tools.**

## 5. Scoring (Phase C)
- `score.py`: rasterize source vs output; SSIM + diff heatmap; text-bbox IoU; colour
  ΔE2000; image-placement IoU; plus a Gemini vision judge.
- **Failure 6:** Gemini `2.0-flash` returned HTTP 404 for the key in use — switched to
  `gemini-2.5-flash`, which worked. Judge rated page 1 layout/text/colour/image 10/10.
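
For reference, the bounding-box IoU that underlies the text-bbox and image-placement metrics is intersection area over union area. This is the standard formula as a minimal sketch; how `score.py` matches boxes and aggregates per page is not shown here, and `bbox_iou` is an illustrative name.

```python
from typing import Sequence


def bbox_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Overlap extent on each axis, clamped at zero for disjoint boxes.
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```

IoU penalises width mismatches as well as position errors, so a span placed at exactly the right origin can still score below 1.0 if its rendered width differs from the source.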

## 6. Live capture (this story's footage)
- A COM-Dispatch instance creates a real, showable layout window. An interactively
  launched instance does **not** register in the COM Running Object Table (ROT), so
  `GetActiveObject` fails with `Operation unavailable` and is a dead end; `Dispatch`
  plus showing its window is the route.
- For each track: maximise the window, build element-by-element with redraw on, and
  screenshot the canvas after every placement → clean per-track frame sequences, then
  assembled to MP4 + GIF with ffmpeg.
  - B1: native COM per element. B2/B3: per-element `doScript` (ExtendScript / UXP).
  - B5: open template → run Python fill → open filled result. B4: rounds 0/1/2 rebuilt.
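
The assembly step can be sketched as two ffmpeg invocations built in Python. The frame-name pattern, frame rate, and GIF width are assumptions for illustration, not the values used for the published footage; the flags themselves are standard ffmpeg options.

```python
from typing import List


def mp4_command(frame_pattern: str, out_path: str, fps: int = 10) -> List[str]:
    """ffmpeg argv that turns a numbered PNG sequence into an H.264 MP4."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frame_pattern,
        # libx264 with yuv420p needs even dimensions; pad up by at most one pixel.
        "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        out_path,
    ]


def gif_command(frame_pattern: str, out_path: str, fps: int = 10, width: int = 960) -> List[str]:
    """ffmpeg argv that turns the same sequence into a downscaled GIF."""
    vf = f"fps={fps},scale={width}:-1:flags=lanczos"
    return ["ffmpeg", "-y", "-framerate", str(fps), "-i", frame_pattern, "-vf", vf, out_path]
```

Each list can be passed to `subprocess.run(cmd, check=True)` with a pattern such as `frame_%04d.png`; building the argv as a list avoids shell quoting problems with paths that contain spaces.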

## 7. This page
- `index.html` is hand-authored (light editorial design modelled on the Blender
  journey), embedding only the real media and measured numbers above.