Data Models and Schema
The core data structures for pptcraft are defined in src/ppt_craft/schema.py as a set of Pydantic-free dataclasses that serve as the canonical intermediate representation between the Qwen LLM and the PowerPoint renderer [1]. The schema is hierarchical: a Deck contains a list of Slide objects, each of which may contain Paragraph text, a Visual element (such as a chart, table, or image), and speaker notes. Validation and deserialization are handled by explicit coercion functions that parse raw dictionaries into these typed objects. They enforce constraints such as the hex color format and the shape of the slide dimensions, and raise SchemaError (a subclass of ValueError) on invalid input [2].
Core Data Structures
The hierarchy begins with the Deck class, which holds global presentation metadata including the theme (e.g., “corporate”, “dark-tech”), title, audience, and locale [1]. It also maintains two lists of slides: slides for the active presentation and archived for removed slides. The Deck also carries a slide_size_emu tuple (defaulting to 16:9 widescreen, 12,192,000 × 6,858,000 EMU) so that widescreen, 4:3, and custom decks are each validated against their actual canvas rather than an assumed size.
Each Slide is identified by a stable UUID string and specifies a layout from a predefined set of literals (e.g., “title”, “two_column”, “chart”). The content of a slide is composed of a title, a list of Paragraph objects, an optional Visual element, and optional speaker_notes.
Paragraph objects consist of a list of Run objects, which define the atomic text styling (bold, italic, size, color). The Visual class is a union-like structure that holds exactly one type of visual element: Chart, TableSpec, IconGrid, or an image_path string. Chart data includes categories, series with float values, and a title, while TableSpec contains headers and rows of strings.
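The hierarchy described above can be sketched with plain dataclasses. This is a minimal illustration under stated assumptions, not the contents of schema.py: field names follow the prose where it names them (theme, slides, archived, slide_size_emu, layout, speaker_notes, and so on), while the defaults and the names id, body, visual, kind, chart, and table are assumptions. IconGrid is omitted because its fields are not described.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Run:
    text: str
    bold: bool = False
    italic: bool = False
    size_pt: Optional[float] = None
    color_hex: Optional[str] = None  # "#RRGGBB" when set


@dataclass
class Paragraph:
    runs: list[Run] = field(default_factory=list)
    bullet: bool = False
    indent: int = 0


@dataclass
class ChartSeries:
    name: str
    values: list[float] = field(default_factory=list)


@dataclass
class Chart:
    type: str
    categories: list[str] = field(default_factory=list)
    series: list[ChartSeries] = field(default_factory=list)
    title: Optional[str] = None


@dataclass
class TableSpec:
    headers: list[str] = field(default_factory=list)
    rows: list[list[str]] = field(default_factory=list)


@dataclass
class Visual:
    kind: str  # assumed discriminator; exactly one payload below is set
    chart: Optional[Chart] = None
    table: Optional[TableSpec] = None
    image_path: Optional[str] = None


@dataclass
class Slide:
    layout: str
    title: str = ""
    body: list[Paragraph] = field(default_factory=list)  # assumed name
    visual: Optional[Visual] = None
    speaker_notes: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Deck:
    theme: str
    title: str = ""
    audience: str = ""
    locale: str = "en"  # assumed default
    slides: list[Slide] = field(default_factory=list)
    archived: list[Slide] = field(default_factory=list)
    slide_size_emu: tuple[int, int] = (12_192_000, 6_858_000)


# Usage: a one-slide deck with a single bold run.
deck = Deck(
    theme="corporate",
    slides=[Slide(layout="title", title="Hi",
                  body=[Paragraph(runs=[Run(text="Welcome", bold=True)])])],
)
```

Using default_factory for every list field keeps each instance's lists independent, which matters when slides are moved between slides and archived.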
Validation and Serialization Logic
The schema uses explicit coercion functions to transform raw JSON/dictionaries into the typed dataclasses. The entry point is deck_from_dict, which validates the presence of the theme and the structure of slide_size_emu (requiring a list or tuple of two integers) [2]. It then iterates over the slides and archived lists, calling _coerce_slide for each entry.
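The slide_size_emu check might look roughly like the following. This is a sketch only: the helper name coerce_slide_size, the fallback to the default when the key is absent, and the rejection of booleans are assumptions, not behavior confirmed by the source.

```python
class SchemaError(ValueError):
    """Raised when a raw value cannot be coerced."""


DEFAULT_SLIDE_SIZE_EMU = (12_192_000, 6_858_000)  # 16:9 widescreen


def coerce_slide_size(raw):
    """Hypothetical helper: accept a list/tuple of two ints, return a tuple."""
    if raw is None:
        # Assumption: a missing size falls back to the widescreen default.
        return DEFAULT_SLIDE_SIZE_EMU
    if not isinstance(raw, (list, tuple)) or len(raw) != 2:
        raise SchemaError(f"slide_size_emu must be [width, height]: {raw!r}")
    for value in raw:
        # bool is a subclass of int, so it is rejected explicitly here.
        if isinstance(value, bool) or not isinstance(value, int):
            raise SchemaError(f"slide_size_emu entries must be int: {raw!r}")
    return (raw[0], raw[1])
```

Accepting a list as well as a tuple matters because JSON has no tuple type: a size that was a tuple before serialization comes back from json.loads as a list.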
_coerce_slide ensures the layout key is present and generates a new slide ID if one is not provided. It delegates to _coerce_paragraph for the body and _coerce_visual for the visual element. _coerce_visual checks the kind field and conditionally coerces nested structures like Chart or TableSpec if they exist in the input dictionary.
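The layout check and ID handling described for _coerce_slide can be illustrated as follows. The helper name coerce_slide_header and the "id" key are hypothetical; the sketch only shows the two behaviors the text names (a required layout, and an ID generated when none is supplied).

```python
import uuid


class SchemaError(ValueError):
    """Raised when a raw dictionary cannot be coerced."""


def coerce_slide_header(d: dict) -> dict:
    """Hypothetical helper: validate 'layout' and settle the slide ID."""
    if "layout" not in d:
        raise SchemaError(f"slide missing 'layout': {d!r}")
    # Reuse the caller's ID when present so later edits can keep targeting
    # the same slide; otherwise mint a fresh UUID string.
    slide_id = d.get("id") or str(uuid.uuid4())
    return {"id": slide_id, "layout": d["layout"]}
```

Preserving a supplied ID is what keeps the identifier stable across a serialize/parse round trip; only slides arriving without one receive a new UUID.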
Text runs are validated by _coerce_run, which raises a SchemaError if the text field is missing or is not a string, or if color_hex is provided but does not match the ^#[0-9A-Fa-f]{6}$ regex pattern (that is, #RRGGBB).
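To make the color rule concrete, here is the same pattern applied to a few inputs. The wrapper is_hex_color is a hypothetical name introduced for illustration; only the regular expression itself comes from the schema.

```python
import re

# Same pattern as the schema: '#' followed by exactly six hex digits.
_HEX_RE = re.compile(r"^#[0-9A-Fa-f]{6}$")


def is_hex_color(value) -> bool:
    """Hypothetical helper: True only for strings shaped like #RRGGBB."""
    return isinstance(value, str) and _HEX_RE.match(value) is not None


accepted = [c for c in ("#1A2b3C", "#FFF", "1A2B3C", "#GGGGGG") if is_hex_color(c)]
# Only the first candidate survives: the others are too short, lack the
# leading '#', or contain non-hex characters.
```

One detail worth knowing: in Python, $ also matches just before a trailing newline, so a stricter check would use re.fullmatch or end the pattern with \Z.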
Serialization is handled by deck_to_dict, which uses dataclasses.asdict to convert the object graph back into a dictionary, and deck_to_json, which serializes this dictionary to a JSON string with optional indentation. The deck_from_json function provides a convenience wrapper to parse JSON strings directly into Deck objects.
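The round trip through dataclasses.asdict and json can be sketched on a reduced model. MiniDeck, MiniSlide, to_json, and from_json are stand-ins invented for this example; they are not the real Deck, deck_to_json, or deck_from_json.

```python
import dataclasses
import json
from dataclasses import dataclass, field


@dataclass
class MiniSlide:
    layout: str
    title: str = ""


@dataclass
class MiniDeck:
    theme: str
    slides: list[MiniSlide] = field(default_factory=list)
    slide_size_emu: tuple[int, int] = (12_192_000, 6_858_000)


def to_json(deck: MiniDeck, indent=None) -> str:
    # asdict recurses into nested dataclasses, lists, and tuples.
    return json.dumps(dataclasses.asdict(deck), indent=indent)


def from_json(text: str) -> MiniDeck:
    raw = json.loads(text)
    return MiniDeck(
        theme=raw["theme"],
        slides=[MiniSlide(**s) for s in raw.get("slides", [])],
        # JSON has no tuple type, so the size arrives as a list.
        slide_size_emu=tuple(raw["slide_size_emu"]),
    )


deck = MiniDeck(theme="minimal", slides=[MiniSlide("title", "Hello")])
restored = from_json(to_json(deck, indent=2))
```

Because dataclasses compare field by field, restored == deck holds here only because from_json converts the size back into a tuple.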
"""Deck/Slide JSON schema - the canonical intermediate representation.
Qwen emits this; render.py materialises it into a `.pptx`.
Pinpoint edits patch a single Slide entry, leaving others untouched.
"""
from __future__ import annotations
import dataclasses
import json
import re
import uuid
from typing import Any, Literal
# 16:9 widescreen is the modern PowerPoint default. We carry slide_size
# through the deck so widescreen / 4:3 / custom decks all validate against
# their actual canvas (per Codex P1 advisory: never assume 9144000×6858000).
DEFAULT_SLIDE_SIZE_EMU: tuple[int, int] = (12_192_000, 6_858_000) # 13.333" × 7.5"

LayoutName = Literal[
    "title",
    "title_content",
    "two_column",
    "chart",
    "image_text",
    "table",
    "divider",
    "closing",
]

ThemeName = Literal["corporate", "dark-tech", "minimal"]

ChartType = Literal[
    "bar",
    "bar_stacked",
    "line",
    "pie",
    "doughnut",
    "scatter",
    "area",
]


def new_deck_id() -> str:
    return f"deck-{uuid.uuid4().hex[:12]}"


_HEX_RE = re.compile(r"^#[0-9A-Fa-f]{6}$")


class SchemaError(ValueError):
    """Raised when JSON cannot be coerced into a Deck/Slide."""


def _coerce_run(d: dict) -> Run:
    if "text" not in d or not isinstance(d["text"], str):
        raise SchemaError(f"run missing 'text': {d!r}")
    color = d.get("color_hex")
    if color is not None and not _HEX_RE.match(color):
        raise SchemaError(f"color_hex must be #RRGGBB: {color!r}")
    return Run(
        text=d["text"],
        bold=bool(d.get("bold", False)),
        italic=bool(d.get("italic", False)),
        size_pt=d.get("size_pt"),
        color_hex=color,
    )


def _coerce_paragraph(d: dict) -> Paragraph:
    runs = [_coerce_run(r) for r in d.get("runs", [])]
    return Paragraph(runs=runs, bullet=bool(d.get("bullet", False)), indent=int(d.get("indent", 0)))


def _coerce_chart(d: dict) -> Chart:
    return Chart(
        type=d["type"],
        categories=list(d.get("categories", [])),
        series=[ChartSeries(name=s["name"], values=list(s.get("values", []))) for s in d.get("series", [])],
        title=d.get("title"),
    )