OOXML Manipulation
The OOXML editing primitives in ppt_craft provide a low-level, direct manipulation layer for PowerPoint files, bypassing high-level abstractions to ensure byte-equal round-trips and full feature parity with desktop PowerPoint 1. This package operates by loading the .pptx as a mutable in-memory ZIP archive, enumerating parts via [Content_Types].xml to avoid hardcoded path assumptions, and applying surgical XML mutations using lxml 2. It handles slide geometry, text content, native chart insertion, and theme properties while enforcing strict validation against geometry bounds and placeholder residue 1.
Package Loading and Part Enumeration
The core entry point is load(), which reads a .pptx file into a PptxPackage dataclass containing two dictionaries: parts (mapping zip paths to raw bytes) and zinfos (preserving original ZipInfo metadata like compression level and modification time) 2. This structure allows for replace() operations and to_bytes() repackaging that maintains byte-equal fidelity.
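The load and repackage cycle described above can be sketched with the standard library alone. This is a minimal illustration, not the package's own code: `load_bytes` and the free function `to_bytes` are hypothetical stand-ins for load() and PptxPackage.to_bytes(), and the sketch only demonstrates that part contents, order, and compression type survive the trip. Whether the output is byte-identical also depends on the compressor producing the same stream.

```python
import dataclasses
import io
import zipfile


@dataclasses.dataclass
class PptxPackage:
    parts: dict[str, bytes]              # zip path -> raw bytes
    zinfos: dict[str, zipfile.ZipInfo]   # zip path -> original member metadata


def load_bytes(data: bytes) -> PptxPackage:
    """Read every zip member into memory, keeping its ZipInfo alongside."""
    parts: dict[str, bytes] = {}
    zinfos: dict[str, zipfile.ZipInfo] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            parts[info.filename] = zf.read(info.filename)
            zinfos[info.filename] = info
    return PptxPackage(parts, zinfos)


def to_bytes(pkg: PptxPackage) -> bytes:
    """Write parts back in their original order, reusing the stored ZipInfo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in pkg.parts.items():
            # Passing a ZipInfo keeps the member's compression type and timestamp.
            zf.writestr(pkg.zinfos[name], payload)
    return buf.getvalue()
```

Replacing a part is then a plain dictionary assignment, for example `pkg.parts["ppt/presentation.xml"] = new_bytes`, followed by `to_bytes(pkg)`.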
Crucially, the system never assumes standard part names like theme1.xml or slideMaster1.xml. Instead, enumerate_parts() parses [Content_Types].xml to dynamically discover all themes, masters, layouts, slides, and charts. This is vital for multi-master decks where multiple theme files exist.
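As a rough sketch of that discovery step, the snippet below groups part paths by content type from the `<Override>` entries of [Content_Types].xml. It uses the standard library's ElementTree rather than lxml so it runs without extra dependencies, and `parts_by_content_type` is a hypothetical helper name, not the signature of enumerate_parts().

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
CT_THEME = "application/vnd.openxmlformats-officedocument.theme+xml"
CT_SLIDE = "application/vnd.openxmlformats-officedocument.presentationml.slide+xml"


def parts_by_content_type(content_types_xml: bytes) -> dict[str, list[str]]:
    """Group zip paths by the ContentType declared on each <Override>."""
    root = ET.fromstring(content_types_xml)
    found: dict[str, list[str]] = {}
    for override in root.iter(f"{{{CT_NS}}}Override"):
        # PartName is package-absolute ("/ppt/..."); zip paths have no leading slash.
        zip_path = override.get("PartName", "").lstrip("/")
        found.setdefault(override.get("ContentType", ""), []).append(zip_path)
    return found


# A two-theme deck: both theme parts are discovered, whatever they are called.
SAMPLE = (
    f'<Types xmlns="{CT_NS}">'
    f'<Override PartName="/ppt/theme/theme1.xml" ContentType="{CT_THEME}"/>'
    f'<Override PartName="/ppt/theme/theme2.xml" ContentType="{CT_THEME}"/>'
    f'<Override PartName="/ppt/slides/slide1.xml" ContentType="{CT_SLIDE}"/>'
    "</Types>"
).encode()
index = parts_by_content_type(SAMPLE)
```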
Slide geometry is not assumed; slide_size_emu() explicitly reads <p:sldSz> from ppt/presentation.xml to determine canvas dimensions in EMUs, falling back to widescreen defaults only if the element is missing.
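A minimal sketch of that lookup, assuming the 16:9 default of 12192000 x 6858000 EMU as the fallback. `read_slide_size_emu` is a hypothetical name that takes the raw bytes of ppt/presentation.xml, and it uses the standard library's ElementTree in place of lxml.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
WIDESCREEN_EMU = (12192000, 6858000)  # 13.333 in x 7.5 in at 914400 EMU per inch


def read_slide_size_emu(presentation_xml: bytes) -> tuple[int, int]:
    """Return (cx, cy) from <p:sldSz>, or the widescreen default if absent."""
    root = ET.fromstring(presentation_xml)
    sld_sz = root.find(f"{{{P_NS}}}sldSz")
    if sld_sz is None:
        return WIDESCREEN_EMU
    return int(sld_sz.get("cx")), int(sld_sz.get("cy"))


# A 4:3 deck reports its own canvas; a deck without <p:sldSz> falls back.
four_by_three = f'<p:presentation xmlns:p="{P_NS}"><p:sldSz cx="9144000" cy="6858000"/></p:presentation>'
no_size = f'<p:presentation xmlns:p="{P_NS}"/>'
```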
Surgical Slide and Shape Editing
Slide editing is handled in slide_edit.py, which operates directly on lxml roots in-place 3. Shapes are located by their id attribute via find_shape_by_id(), searching <p:sp>, <p:graphicFrame>, and <p:pic> elements.
Text manipulation is granular. apply_paragraphs() replaces every paragraph in a shape's text body: it keeps <a:bodyPr> and <a:lstStyle>, removes the existing <a:p> elements, and builds new runs with the requested formatting (bold, italic, size, color) using _build_run_xml(). For targeted updates, set_run_text() modifies a specific <a:t> element in place, addressed by paragraph and run index.
Geometry manipulation is achieved via move_shape(), which sets <a:off> (position) and <a:ext> (size) attributes within the shape’s <a:xfrm> element. shape_bbox_emu() retrieves the current bounding box (x, y, cx, cy) for layout calculations.
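A small sketch of the geometry helpers, written against the standard library's ElementTree rather than lxml. The function bodies are an assumption about how such helpers could work, not the module's actual code. Note that the sketch looks for <a:xfrm>, which is where <p:sp> and <p:pic> keep their transform; a <p:graphicFrame> stores it in <p:xfrm> and would need its own branch.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"


def _A(tag: str) -> str:
    return f"{{{A_NS}}}{tag}"


def _find_xfrm(shape: ET.Element) -> ET.Element:
    xfrm = shape.find(f".//{_A('xfrm')}")
    if xfrm is None:
        raise ValueError("shape has no <a:xfrm>")
    return xfrm


def move_shape(shape: ET.Element, *, x: int, y: int, cx: int, cy: int) -> None:
    """Set position (<a:off>) and size (<a:ext>) in EMUs."""
    xfrm = _find_xfrm(shape)
    off, ext = xfrm.find(_A("off")), xfrm.find(_A("ext"))
    off.set("x", str(x))
    off.set("y", str(y))
    ext.set("cx", str(cx))
    ext.set("cy", str(cy))


def shape_bbox_emu(shape: ET.Element) -> tuple[int, int, int, int]:
    """Return (x, y, cx, cy) in EMUs."""
    xfrm = _find_xfrm(shape)
    off, ext = xfrm.find(_A("off")), xfrm.find(_A("ext"))
    return int(off.get("x")), int(off.get("y")), int(ext.get("cx")), int(ext.get("cy"))
```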
Native Chart Insertion
Chart handling in charts.py delegates to python-pptx rather than hand-rolling XML 4. Hand-rolled chart XML on its own lacks the embedded Excel workbook and the c:externalData link, and without them the “Edit Data” command in desktop PowerPoint is greyed out.
add_chart_to_slide() maps compact chart types (e.g., “bar”, “scatter”) to python-pptx enums and inserts the chart at specified EMU coordinates. It supports scatter plots using XyChartData and other types using CategoryChartData.
After insertion, _recolour_series() performs a post-pass on the chart’s XML (<c:ser> elements) to apply theme palette colors directly to <a:solidFill> elements, bypassing python-pptx’s high-level formatting which may lose style for certain chart types.
Theme and Font Modification
Theme editing in theme_edit.py ensures that palette and font changes are applied to all theme parts in the deck, not just theme1.xml 5. This addresses a previous failure mode where multi-master decks were partially updated.
set_palette() iterates through all theme parts found by enumerate_parts(), locating <a:clrScheme> and rewriting slot values (e.g., “dk1”, “accent1”) to the provided hex palette. It normalizes existing color definitions to <a:srgbClr>.
set_fonts() similarly iterates through theme parts to update <a:fontScheme>, modifying <a:latin> typeface attributes for both major (heading) and minor (body) fonts.
Validation and Geometry Checks
The validate.py module provides geometry and placeholder validation without requiring a LibreOffice round-trip 6. It uses the deck-specific slide size from slide_size_emu() as the boundary for all checks.
validate_pptx() iterates through all slides and shapes, checking for:
- Placeholder Residue: Text matching patterns like “click to add” or “lorem ipsum”.
- Out-of-Bounds Shapes: Shapes whose bounding box extends beyond the slide canvas dimensions.
- Shape Overlaps: Shapes intersecting by more than 5% of the smaller shape’s area.
It returns a list of ValidationIssue objects detailing the code, slide ID, shape ID, message, and fix hints.
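The three checks reduce to simple predicates over text and (x, y, cx, cy) bounding boxes in EMUs. The sketch below is illustrative only: the helper names are hypothetical, and the placeholder regex is an assumption built from the two example phrases above, not the module's actual pattern.

```python
import re

_OVERLAP_THRESHOLD = 0.05  # flag if intersection > 5% of the smaller shape's area
_PLACEHOLDER_RE = re.compile(r"click to add|lorem ipsum", re.IGNORECASE)  # assumed pattern

BBox = tuple[int, int, int, int]  # x, y, cx, cy in EMUs


def has_placeholder_residue(text: str) -> bool:
    return _PLACEHOLDER_RE.search(text) is not None


def out_of_bounds(bbox: BBox, slide_w: int, slide_h: int) -> bool:
    """True if any edge of the box falls outside the slide canvas."""
    x, y, cx, cy = bbox
    return x < 0 or y < 0 or x + cx > slide_w or y + cy > slide_h


def overlap_ratio(a: BBox, b: BBox) -> float:
    """Intersection area divided by the area of the smaller box."""
    ax, ay, acx, acy = a
    bx, by, bcx, bcy = b
    w = min(ax + acx, bx + bcx) - max(ax, bx)
    h = min(ay + acy, by + bcy) - max(ay, by)
    if w <= 0 or h <= 0:
        return 0.0  # boxes are disjoint or only touch
    smaller = min(acx * acy, bcx * bcy)
    return (w * h) / smaller if smaller else 0.0


def overlaps(a: BBox, b: BBox) -> bool:
    return overlap_ratio(a, b) > _OVERLAP_THRESHOLD
```

Dividing by the smaller area means a small label sitting entirely inside a large panel scores 1.0, so containment is flagged as well as partial overlap.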
"""OOXML editing primitives (unpack / slide / theme / master / charts / validate).
See plan §P3 - OOXML editing core. This sub-package is the heart of the
"Claude edits OOXML directly" parity path.
"""
"""PPTX zip ↔ XML-bytes round trip + relationship-graph enumeration.
Every theme/master/layout edit must walk the **actual** part list from
`[Content_Types].xml` and the `_rels` graph - never assume `theme1.xml`
or `slideMaster1.xml` (per Codex iteration 2 P1 finding).
"""
from __future__ import annotations
import dataclasses
import io
import zipfile
from collections.abc import Iterable
from pathlib import Path
from lxml import etree
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Content-Type strings for the parts we care about.
CT_THEME = "application/vnd.openxmlformats-officedocument.theme+xml"
CT_MASTER = "application/vnd.openxmlformats-officedocument.presentationml.slideMaster+xml"
CT_LAYOUT = "application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml"
CT_SLIDE = "application/vnd.openxmlformats-officedocument.presentationml.slide+xml"
CT_CHART = "application/vnd.openxmlformats-officedocument.drawingml.chart+xml"
@dataclasses.dataclass
class PptxPackage:
    """Mutable in-memory view of a .pptx zip.

    Use `.parts[zip_path]` for direct byte access; mutate then call
    `.to_bytes()` to repackage. Every part keeps its zip metadata
    (deflate level, mtime, etc.) so byte-equal round-trips are possible.
    """

    parts: dict[str, bytes]
    zinfos: dict[str, zipfile.ZipInfo]
"""Surgical mutators on `ppt/slides/slideN.xml`.
Operates on lxml roots in-place; callers wrap with `unpack.write_xml(pkg, path, root)`.
"""
from __future__ import annotations
from collections.abc import Iterable
from lxml import etree
from ppt_craft.schema import Paragraph, Run
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
def _A(tag: str) -> str:
    return f"{{{A_NS}}}{tag}"


def _P(tag: str) -> str:
    return f"{{{P_NS}}}{tag}"


def find_shape_by_id(slide_root, shape_id: str):
    """Locate a `<p:sp>` (or `<p:graphicFrame>`/`<p:pic>`) by `id` attr."""
    for sp in slide_root.iter(_P("sp"), _P("graphicFrame"), _P("pic")):
        nv = sp.find(f".//{_P('cNvPr')}")
        if nv is not None and nv.get("id") == shape_id:
            return sp
    return None


def get_shape_text(shape) -> str:
    """Concatenate every `<a:t>` text inside the shape's text body."""
    if shape is None:
        return ""
    parts: list[str] = []
    for t in shape.iter(_A("t")):
"""Native PowerPoint chart insertion via python-pptx.
Why python-pptx and not bespoke lxml: python-pptx generates the FULL
chart package - `chartN.xml`, chart rels, `[Content_Types].xml` entries,
**plus** the embedded `ppt/embeddings/Microsoft_Excel_Worksheet.xlsx`
workbook + `c:externalData` link required for `Edit Data` parity in
desktop PowerPoint. Hand-rolling the chart XML alone produces a chart
where right-click → Edit Data is greyed out (Codex iter-2 P1 finding).
This module wraps python-pptx's `slide.shapes.add_chart` with our schema's
`Chart` dataclass + a small lxml post-pass that recolours series strokes
and fills from the theme palette.
"""
from __future__ import annotations
from collections.abc import Iterable
from lxml import etree
from pptx.chart.data import CategoryChartData, XyChartData
from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Emu
from ppt_craft.schema import Chart, ChartType
# Map our compact chart names to python-pptx enums. We pick clustered
# variants for bar/line; PowerPoint shows them with the standard ribbon UI.
_TYPE_MAP: dict[ChartType, XL_CHART_TYPE] = {
    "bar": XL_CHART_TYPE.BAR_CLUSTERED,
    "bar_stacked": XL_CHART_TYPE.BAR_STACKED,
    "line": XL_CHART_TYPE.LINE,
    "pie": XL_CHART_TYPE.PIE,
    "doughnut": XL_CHART_TYPE.DOUGHNUT,
    "scatter": XL_CHART_TYPE.XY_SCATTER,
    "area": XL_CHART_TYPE.AREA,
}


def add_chart_to_slide(
    slide, *, spec: Chart, x_emu: int, y_emu: int, cx_emu: int, cy_emu: int, palette: Iterable[str] | None = None
"""Theme palette + font rewrites - applied to EVERY ppt/theme/*.xml part.
Multi-master decks have multiple theme parts (theme1.xml, theme2.xml, …).
The previous "edit theme1.xml" shortcut from Codex iter-2 P1 silently
missed the other masters; we now walk the part index from
[Content_Types].xml and rewrite every one.
"""
from __future__ import annotations
from collections.abc import Mapping
from lxml import etree
from ppt_craft.ooxml.unpack import PptxPackage, enumerate_parts, iter_xml, write_xml
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
def _A(tag: str) -> str:
    return f"{{{A_NS}}}{tag}"


def set_palette(pkg: PptxPackage, palette: Mapping[str, str]) -> int:
    """Rewrite `<a:clrScheme>` slots across every theme part.

    `palette` keys are slot names ("dk1", "lt1", "accent1"…), values are
    "RRGGBB" hex (no #). Returns count of theme parts updated.
    """
    idx = enumerate_parts(pkg)
    touched = 0
    for path, root in iter_xml(pkg, idx.themes):
        clr_scheme = root.find(f".//{_A('themeElements')}/{_A('clrScheme')}")
        if clr_scheme is None:
            continue
        for slot, hex_rgb in palette.items():
            slot_elem = clr_scheme.find(_A(slot))
            if slot_elem is None:
                continue
            # Slots may carry <a:srgbClr> OR <a:sysClr>; normalise to srgbClr.
"""Geometry / contrast / placeholder validators (lxml only, no LO round-trip).
Per-deck `<p:sldSz>` is read once and used as the bounds - we never assume
the 4:3 default (Codex iter-3 advisory).
"""
from __future__ import annotations
import dataclasses
import re
from pathlib import Path
from lxml import etree
from ppt_craft.ooxml.unpack import enumerate_parts, load, slide_size_emu
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
def _A(tag: str) -> str:
    return f"{{{A_NS}}}{tag}"


def _P(tag: str) -> str:
    return f"{{{P_NS}}}{tag}"


@dataclasses.dataclass
class ValidationIssue:
    code: str
    slide_id: str
    shape_id: str | None
    message: str
    fix_hint: str | None = None
    severity: str = "major"  # info | minor | major | blocker
_OVERLAP_THRESHOLD = 0.05 # flag if intersection > 5% of smaller shape area
# Loose heuristic - no word boundaries because PowerPoint placeholders