Code Generation Engine
The PowerShell module generator is a deterministic, offline Python engine that synthesizes the Glean module from vendored OpenAPI specifications and Speakeasy overlays. It processes REST API definitions to produce individual PowerShell cmdlet files, a module manifest, and a machine-readable contract file for testing. The system handles complex naming disambiguation, parameter mapping, and pagination logic to ensure the generated code adheres to PowerShell best practices while maintaining strict uniqueness invariants for function names and aliases.
Input Processing and Overlay Application
The generator begins by loading the two vendored OpenAPI specifications from the specs directory: the client API (client_rest.yaml, whose operations live under /rest/api/v1) and the indexing API (indexing.yaml, under /api/index/v1) 1. It then parses the Speakeasy overlay files in the overlays directory to apply custom naming and grouping rules. From each overlay it extracts the entries that override an operation's name or assign it to a group. Overlay target paths do not consistently include the API's base path, so the engine strips that prefix before matching. An entry applies only when its HTTP method matches and its stripped path exactly equals the operation's path 2.
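The prefix stripping and exact matching can be sketched as follows. This is a minimal sketch, not the generator's code. It assumes overlay targets use the JSONPath form `$["paths"]["<path>"]["<method>"]` that the generator's `_TARGET_RE` parses, and that a target may or may not carry the API's base path. `strip_base` and `parse_target` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Same pattern as the generator's _TARGET_RE: captures the path and HTTP method from an
# overlay target such as $["paths"]["/rest/api/v1/search"]["post"].
TARGET_RE = re.compile(r'\["paths"\]\["(?P<path>[^"]+)"\]\["(?P<method>[^"]+)"\]')

# Base paths taken from the generator's SPECS table.
BASES = {"client": "/rest/api/v1", "indexing": "/api/index/v1"}


def strip_base(api: str, path: str) -> str:
    """Hypothetical helper: remove the API's base path if the overlay target includes it."""
    base = BASES[api]
    if path == base or path.startswith(base + "/"):
        return path[len(base):] or "/"
    return path  # target was already relative to the base


def parse_target(api: str, target: str) -> dict | None:
    """Turn an overlay target into an entry shaped like the generator's overlay entries."""
    m = TARGET_RE.search(target)
    if m is None:
        return None  # not a paths/<path>/<method> target
    return {
        "api": api,
        "stripped": strip_base(api, m.group("path")),
        "method": m.group("method").lower(),
    }


def match_overlay(entries: list[dict], api: str, op_path: str, method: str) -> dict:
    """Exact match on api, method and stripped path (mirrors the generator's match_overlay)."""
    for e in entries:
        if e["api"] == api and e["method"] == method and e["stripped"] == op_path:
            return e
    return {}


# An overlay entry that renames POST /search, written with the base path included.
entries = [dict(parse_target("client", '$["paths"]["/rest/api/v1/search"]["post"]'), name="query")]
hit = match_overlay(entries, "client", "/search", "post")
```

Matching on exact equality after stripping, rather than on a path suffix, keeps `/search` from accidentally matching an operation such as `/search/{id}`.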
Alongside the naming and grouping entries, the generator identifies operations marked for removal by remove: true actions in the overlays. These removals are stored as a set of (api, stripped_path, method) tuples, and any operation whose key appears in the set is skipped during collection, which drops deprecated or unwanted endpoints. Because both overrides and removals are applied at generation time, authors can customize the API surface without modifying the vendored OpenAPI specs.
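A sketch of building the removal set and filtering with it. It assumes the overlay has already been parsed into a dict (for example with `yaml.safe_load`) and follows the OpenAPI Overlay layout: an `actions` list whose items carry a `target` and an optional `remove` flag. `collect_removals` and `surviving_ops` are hypothetical names. As a design choice, the sketch treats only targets that end at the HTTP method as whole-operation removals, so removing a nested node such as one parameter does not drop the operation.

```python
from __future__ import annotations

import re

TARGET_RE = re.compile(r'\["paths"\]\["(?P<path>[^"]+)"\]\["(?P<method>[^"]+)"\]')
BASES = {"client": "/rest/api/v1", "indexing": "/api/index/v1"}
HTTP_METHODS = ("get", "post", "put", "patch", "delete")


def collect_removals(api: str, overlay: dict) -> set[tuple[str, str, str]]:
    """Return {(api, stripped_path, method)} for each `remove: true` action on an operation."""
    removals: set[tuple[str, str, str]] = set()
    for action in overlay.get("actions") or []:
        if action.get("remove") is not True:
            continue
        target = action.get("target", "")
        m = TARGET_RE.search(target)
        # Skip non-operation targets and removals of nested nodes (e.g. one parameter).
        if m is None or target[m.end():].strip():
            continue
        path = m.group("path")
        base = BASES[api]
        if path.startswith(base + "/"):
            path = path[len(base):]
        removals.add((api, path, m.group("method").lower()))
    return removals


def surviving_ops(api: str, paths: dict, removals: set) -> list[tuple[str, str]]:
    """List (path, method) pairs from an OpenAPI `paths` object that were not removed."""
    return [
        (path, hm)
        for path, item in paths.items()
        for hm in HTTP_METHODS
        if hm in item and (api, path, hm) not in removals
    ]


overlay = {
    "actions": [
        {"target": '$["paths"]["/rest/api/v1/old"]["get"]', "remove": True},
        {"target": '$["paths"]["/rest/api/v1/search"]["post"]["parameters"]', "remove": True},
    ]
}
removals = collect_removals("client", overlay)
ops = surviving_ops("client", {"/old": {"get": {}}, "/search": {"post": {}}}, removals)
```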
Operation Collection and Parameter Resolution
Once inputs are processed, the engine iterates over every path in the loaded specifications. For each HTTP method the path defines (GET, POST, PUT, PATCH, or DELETE), it skips the operation if its (api, path, method) key is in the removal set. Parameters are resolved by combining operation-level and path-item-level parameters and deduplicating on (name, in). Operation-level parameters are listed first and the first occurrence wins, so an operation-level definition overrides a path-item-level one with the same name and location, as OpenAPI 3.0 specifies 3.
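The precedence rule is easiest to see on plain dicts. This sketch mirrors the dedup loop in the generator's `resolve_parameters` but assumes the parameters are already dereferenced, so it skips the `deref` call. `merge_parameters` and the parameter names are illustrative.

```python
from __future__ import annotations


def merge_parameters(op_params: list[dict], path_params: list[dict]) -> list[dict]:
    """Operation-level params first, then path-item params; the first (name, in) wins."""
    seen = set()
    out = []
    for p in op_params + path_params:
        key = (p.get("name"), p.get("in"))
        if key in seen:
            continue  # already have this (name, in); the earlier, operation-level one wins
        seen.add(key)
        out.append(p)
    return out


path_level = [
    {"name": "datasource", "in": "query", "required": False},
    {"name": "X-Request-Id", "in": "header", "required": False},
]
op_level = [{"name": "datasource", "in": "query", "required": True}]
merged = merge_parameters(op_level, path_level)
```

Keying on the pair rather than the name alone matters: a path parameter `id` and a query parameter `id` are distinct in OpenAPI and both survive.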
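The engine then determines the body schema for each operation and classifies the request body as a JSON object, raw content, or multipart data. This classification drives both the PowerShell parameter types generated for the body and the request-handling logic. A body counts as a JSON object only when its media type is application/json and the resolved schema declares type: object or has a properties map, with no oneOf or anyOf at the top level. When resolving $ref pointers within a schema, the system flattens allOf compositions but preserves oneOf and anyOf structures as opaque types 1.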
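A sketch of body classification under stated assumptions. Only the object test copies the generator's `is_object` expression. The media-type preference (JSON first, otherwise the first listed), the allOf merge strategy (union of `properties` and `required`), and the choice to treat every other body as `raw` are assumptions. `flatten_all_of` and `classify_body` are hypothetical names.

```python
from __future__ import annotations


def resolve_ref(spec: dict, ref: str) -> dict:
    """Follow a local JSON pointer such as '#/components/schemas/Doc'."""
    node = spec
    for part in ref[2:].split("/"):
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def flatten_all_of(spec: dict, schema: dict) -> dict:
    """Merge allOf members (plus sibling keys) into one object schema.

    oneOf/anyOf are returned untouched so callers can treat them as opaque.
    No cycle guard: recursive allOf chains are assumed not to occur.
    """
    if "$ref" in schema:
        schema = resolve_ref(spec, schema["$ref"])
    if "allOf" not in schema:
        return schema
    merged: dict = {"type": "object", "properties": {}, "required": []}
    siblings = {k: v for k, v in schema.items() if k != "allOf"}
    for member in [*schema["allOf"], siblings]:
        member = flatten_all_of(spec, member)
        merged["properties"].update(member.get("properties", {}))
        for name in member.get("required", []):
            if name not in merged["required"]:
                merged["required"].append(name)
    return merged


def classify_body(spec: dict, op: dict):
    """Return (kind, media, schema, required), or None when the operation has no body."""
    body = op.get("requestBody")
    if not body:
        return None
    if "$ref" in body:
        body = resolve_ref(spec, body["$ref"])
    content = body.get("content") or {}
    if not content:
        return None
    # Assumption: prefer JSON when several media types are offered.
    media = "application/json" if "application/json" in content else next(iter(content))
    schema = flatten_all_of(spec, content[media].get("schema") or {})
    required = bool(body.get("required", False))
    if media.startswith("multipart/"):
        kind = "multipart"
    elif (
        media == "application/json"
        and (schema.get("type") == "object" or "properties" in schema)
        and not any(k in schema for k in ("oneOf", "anyOf"))
    ):
        kind = "object"  # same test as the generator's is_object expression
    else:
        kind = "raw"
    return kind, media, schema, required


spec = {"components": {"schemas": {
    "Base": {"type": "object", "properties": {"id": {"type": "string"}}, "required": ["id"]},
    "Doc": {"allOf": [{"$ref": "#/components/schemas/Base"},
                      {"properties": {"title": {"type": "string"}}}]},
    "Either": {"oneOf": [{"type": "string"}, {"type": "object"}]},
}}}
op_obj = {"requestBody": {"required": True, "content": {
    "application/json": {"schema": {"$ref": "#/components/schemas/Doc"}}}}}
op_one = {"requestBody": {"content": {
    "application/json": {"schema": {"$ref": "#/components/schemas/Either"}}}}}
op_mp = {"requestBody": {"content": {"multipart/form-data": {"schema": {"type": "object"}}}}}
```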
Naming and Disambiguation
A core component of the generator is the naming engine, which maps Speakeasy method names and HTTP verbs to approved PowerShell Verb-Noun pairs. It uses a predefined mapping from method names to verbs (e.g., “create” to “New”, “get” to “Get”) and falls back to a mapping keyed on the HTTP verb when no method rule exists 4. The noun comes from the last segment of the operation's group (search in client.search, for example), singularized with rules that cover irregular plurals and standard suffixes such as “ies” and “ses”.
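A minimal sketch of verb selection and noun derivation. `METHOD_VERB` here is a subset of the generator's table. The HTTP-verb fallback table, the irregular-plural table, and the exact suffix rules are assumptions, because the generator's versions are not shown.

```python
from __future__ import annotations

# Subset of the generator's METHOD_VERB table.
METHOD_VERB = {"create": "New", "delete": "Remove", "update": "Set", "get": "Get",
               "retrieve": "Get", "list": "Get", "search": "Search", "query": "Search"}
# Assumed HTTP-verb fallback; the generator's actual fallback table is not shown.
HTTP_VERB = {"get": "Get", "post": "Invoke", "put": "Set", "patch": "Set", "delete": "Remove"}
# Hypothetical irregular plurals that the suffix rules below would get wrong.
IRREGULAR = {"people": "person", "indices": "index"}


def pick_verb(method: str, http_method: str) -> str:
    """Method-name rule first; HTTP verb only when no method rule exists."""
    return METHOD_VERB.get(method.lower()) or HTTP_VERB[http_method.lower()]


def singularize(word: str) -> str:
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    if lower.endswith("ies") and len(lower) > 3:
        return word[:-3] + "y"          # policies -> policy
    if lower.endswith(("sses", "uses")):
        return word[:-2]                # classes -> class, statuses -> status
    if lower.endswith("s") and not lower.endswith("ss"):
        return word[:-1]                # collections -> collection
    return word


def noun_from_group(group: str) -> str:
    """Singularize the last dotted segment of a group such as 'client.collections'."""
    noun = singularize(group.rsplit(".", 1)[-1])
    return noun[:1].upper() + noun[1:]


def verb_noun(group: str, method: str, http_method: str) -> str:
    return f"{pick_verb(method, http_method)}-{noun_from_group(group)}"
```

Suffix rules are heuristics (a word like "series" would come out wrong), which is why an explicit irregular table takes precedence over them.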
To guarantee uniqueness, the generator detects collisions where multiple operations would produce the same Verb-Noun pair. It resolves a collision by appending a descriptive suffix derived from the method name, or a numeric index when that is still not unique, and raises an error on any collision it cannot resolve deterministically. The primary function name is assigned according to a priority order of methods, while aliases are generated as fully qualified names (e.g., Glean.client.search.query) to provide alternative invocation paths.
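The collision handling can be sketched as below. The priority list, the PascalCase method suffix, and raising `ValueError` are assumptions. The alias format follows the `Glean.client.search.query` example. `assign_names` is a hypothetical stand-in, not the generator's `build_names`.

```python
from __future__ import annotations

# Hypothetical priority: earlier methods win the unsuffixed Verb-Noun name.
METHOD_PRIORITY = ["list", "get", "create", "update", "delete"]


def _rank(method: str) -> int:
    return METHOD_PRIORITY.index(method) if method in METHOD_PRIORITY else len(METHOD_PRIORITY)


def _pascal(method: str) -> str:
    return method[:1].upper() + method[1:]


def assign_names(ops: list[dict]) -> dict[str, dict]:
    """ops need 'id', 'group', 'method', 'verb', 'noun'; returns id -> {function, alias}."""
    buckets: dict[str, list[dict]] = {}
    for op in ops:
        buckets.setdefault(f"{op['verb']}-{op['noun']}", []).append(op)

    taken = set(buckets)  # every bucket's bare name is reserved for that bucket's winner
    names: dict[str, dict] = {}
    for base in sorted(buckets):  # sorted iteration keeps the result deterministic
        ranked = sorted(buckets[base], key=lambda o: (_rank(o["method"]), o["method"], o["id"]))
        names[ranked[0]["id"]] = {"function": base}
        for op in ranked[1:]:
            candidate = base + _pascal(op["method"])
            n = 2
            while candidate in taken:  # numeric index only if the suffix still collides
                candidate = f"{base}{_pascal(op['method'])}{n}"
                n += 1
            taken.add(candidate)
            names[op["id"]] = {"function": candidate}

    aliases: set[str] = set()
    for op in ops:
        alias = f"Glean.{op['group']}.{op['method']}"
        if alias in aliases:
            raise ValueError(f"unresolvable alias collision: {alias}")
        aliases.add(alias)
        names[op["id"]]["alias"] = alias
    return names


ops = [
    {"id": "a", "group": "client.collections", "method": "list", "verb": "Get", "noun": "Collection"},
    {"id": "b", "group": "client.collections", "method": "retrieve", "verb": "Get", "noun": "Collection"},
    {"id": "c", "group": "client.collections", "method": "create", "verb": "New", "noun": "Collection"},
    {"id": "d", "group": "client.items", "method": "retrieve", "verb": "Get", "noun": "Collection"},
]
names = assign_names(ops)
```

Reserving every bucket's bare name before assigning suffixed names prevents a suffixed name from stealing another bucket's primary name.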
Template Rendering and Output Generation
The generator uses Jinja2 templates to render PowerShell code. For each operation, it builds a context dictionary containing the resolved parameters, body schema details, pagination metadata, and idempotency flags (the generator's IDEMPOTENT_HTTP set covers GET, PUT, and DELETE). Parameter help text and style attributes are rendered as PowerShell literals, and a runnable example is synthesized from the mandatory parameters 5.
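A minimal sketch of literal rendering and example synthesis. This `ps_literal` follows its documented contract (str/bool/None) using PowerShell's quoting rules. In a single-quoted string nothing is expanded, and a quote character is escaped by doubling it. PowerShell also accepts the typographic single quotes U+2018 to U+201B as quote characters, so those are doubled too. `SAMPLE_BY_TYPE`, `synthesize_example`, and the `ps_type` strings are hypothetical.

```python
from __future__ import annotations

# Characters PowerShell accepts as single quotes; inside a single-quoted string,
# any of them is escaped by doubling it.
_PS_SINGLE_QUOTES = ("'", "\u2018", "\u2019", "\u201a", "\u201b")


def ps_literal(value) -> str:
    """Render a Python str/bool/None as a PowerShell literal."""
    if value is None:
        return "$null"
    if isinstance(value, bool):
        return "$true" if value else "$false"
    if isinstance(value, str):
        for q in _PS_SINGLE_QUOTES:
            value = value.replace(q, q + q)
        return f"'{value}'"  # single quotes: no $variable or backtick-escape expansion
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


# Hypothetical placeholder value per parameter type for the synthesized example.
SAMPLE_BY_TYPE = {"string": ps_literal("example"), "int": "1"}


def synthesize_example(function: str, params: list[dict]) -> str:
    """One-line usage example that passes only the mandatory parameters."""
    parts = [function]
    for p in params:
        if p["mandatory"]:
            sample = SAMPLE_BY_TYPE.get(p["ps_type"], ps_literal("<value>"))
            parts.append(f"-{p['ps_name']} {sample}")
    return " ".join(parts)


example = synthesize_example("Get-Collection", [
    {"ps_name": "Id", "mandatory": True, "ps_type": "int"},
    {"ps_name": "Query", "mandatory": False, "ps_type": "string"},
])
```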
The engine writes individual .ps1 files for each cmdlet into Public/Client or Public/Indexing subdirectories, depending on the API type. It also generates a module manifest (Glean.psd1) listing all exported functions and aliases, and an aliases.json file for runtime alias registration. Finally, a contract.json file is produced, containing a machine-readable summary of all operations, their parameters, and metadata, which is used by the test suite to verify the generated code 6.
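To make the determinism claim concrete, here is a sketch of writing the manifest, aliases.json, and contract.json from already-named operations. Several details are assumptions: the entry shape, the top-level `operations` key in contract.json, the `RootModule` file name, the sample operation IDs, and writing all three files to one directory (the generator writes the manifest under Glean/ and the contract under generator/). `RootModule`, `ModuleVersion`, `GUID`, `FunctionsToExport`, and `AliasesToExport` are standard PowerShell module-manifest keys.

```python
from __future__ import annotations

import json
import os
import tempfile


def ps_array(items: list[str]) -> str:
    """PowerShell array literal of single-quoted names (names contain no quotes)."""
    return "@(" + ", ".join(f"'{i}'" for i in items) + ")"


def write_outputs(out_dir: str, entries: list[dict], version: str, guid: str) -> None:
    """entries: [{'function': str, 'aliases': [str], 'contract': dict}, ...]"""
    functions = sorted(e["function"] for e in entries)
    alias_map = {a: e["function"] for e in entries for a in e["aliases"]}
    manifest = "\n".join([
        "@{",
        "    RootModule = 'Glean.psm1'",  # assumed root module file name
        f"    ModuleVersion = '{version}'",
        f"    GUID = '{guid}'",
        f"    FunctionsToExport = {ps_array(functions)}",
        f"    AliasesToExport = {ps_array(sorted(alias_map))}",
        "}",
        "",
    ])
    with open(os.path.join(out_dir, "Glean.psd1"), "w", encoding="utf-8") as fh:
        fh.write(manifest)
    # Sorted keys and fixed indentation keep reruns byte-identical.
    contract = {"operations": sorted((e["contract"] for e in entries), key=lambda c: c["operationId"])}
    for name, payload in (("aliases.json", alias_map), ("contract.json", contract)):
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, sort_keys=True)
            fh.write("\n")


entries = [
    {"function": "New-Collection", "aliases": ["Glean.client.collections.create"],
     "contract": {"operationId": "createcollection", "httpMethod": "POST"}},
    {"function": "Get-Collection", "aliases": ["Glean.client.collections.list"],
     "contract": {"operationId": "listcollections", "httpMethod": "POST"}},
]
with tempfile.TemporaryDirectory() as tmp:
    write_outputs(tmp, entries, "1.0.0", "00000000-0000-0000-0000-000000000000")
    with open(os.path.join(tmp, "Glean.psd1"), encoding="utf-8") as fh:
        manifest_text = fh.read()
    with open(os.path.join(tmp, "contract.json"), encoding="utf-8") as fh:
        contract_doc = json.load(fh)
```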
```python
"""Spec loading, $ref resolution, schema composition, and OpenAPI->PowerShell type mapping."""
from __future__ import annotations

import os

import yaml

HTTP_METHODS = ("get", "post", "put", "patch", "delete")

# Each published spec and the server sub-path its operations live under.
SPECS = {
    "client": {"file": "client_rest.yaml", "base": "/rest/api/v1"},
    "indexing": {"file": "indexing.yaml", "base": "/api/index/v1"},
}


def specs_dir() -> str:
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), "specs")


def load_yaml(path: str) -> dict:
    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def load_spec(api: str) -> dict:
    return load_yaml(os.path.join(specs_dir(), SPECS[api]["file"]))


def resolve_ref(spec: dict, ref: str) -> dict:
    """Resolve a local '#/...' JSON pointer against the spec document."""
    assert ref.startswith("#/"), f"non-local $ref unsupported: {ref}"
    node = spec
    for part in ref[2:].split("/"):
        # RFC 6901 unescaping: "~1" -> "/" first, then "~0" -> "~".
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[part]
    return node


def deref(spec: dict, node: dict, _seen=None):
    """Return node with a top-level $ref resolved (one hop, cycle-guarded)."""
    _seen = _seen or set()
    # ... (rest of deref not shown)
```
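The excerpt stops inside `deref`. One plausible completion consistent with its docstring is below: it resolves only the top-level `$ref`, follows `$ref`-to-`$ref` chains, and raises on a cycle. This is a hypothetical sketch, not the generator's code. It uses an explicit `None` check so that a caller-supplied empty set is shared rather than silently replaced, as `_seen or set()` would do.

```python
from __future__ import annotations


def resolve_ref(spec: dict, ref: str) -> dict:
    """Resolve a local '#/...' JSON pointer (RFC 6901 unescaping of ~1 and ~0)."""
    assert ref.startswith("#/"), f"non-local $ref unsupported: {ref}"
    node = spec
    for part in ref[2:].split("/"):
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def deref(spec: dict, node: dict, _seen: set | None = None):
    """Resolve only the top-level $ref of node, following $ref-to-$ref chains.

    Nested schemas are left alone; a chain that loops back raises ValueError.
    """
    if _seen is None:
        _seen = set()
    if not isinstance(node, dict) or "$ref" not in node:
        return node
    ref = node["$ref"]
    if ref in _seen:
        raise ValueError(f"$ref cycle through {ref}")
    _seen.add(ref)
    return deref(spec, resolve_ref(spec, ref), _seen)


spec = {"components": {"schemas": {
    "Alias": {"$ref": "#/components/schemas/Doc"},
    "Doc": {"type": "object", "properties": {"id": {"type": "string"}}},
}}}
doc = deref(spec, {"$ref": "#/components/schemas/Alias"})
```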
```python
#!/usr/bin/env python3
"""Generate the Glean PowerShell module from the vendored OpenAPI specs + Speakeasy overlays.

Outputs (all under ../Glean and ./):
  - Glean/Public/Client/*.ps1, Glean/Public/Indexing/*.ps1 (one cmdlet per operation)
  - Glean/Glean.psd1 (manifest: exports + aliases)
  - generator/contract.json (machine-readable contract for tests)

Deterministic and offline. Re-run any time the vendored specs change.
"""
from __future__ import annotations

import glob
import json
import os
import re
import sys

import yaml
from jinja2 import Environment, FileSystemLoader

HERE = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, HERE)

import schema as S  # noqa: E402
import naming  # noqa: E402
import params as P  # noqa: E402
import pagination as PG  # noqa: E402

MODULE_DIR = os.path.normpath(os.path.join(HERE, "..", "Glean"))
PUBLIC = os.path.join(MODULE_DIR, "Public")
MODULE_VERSION = "1.0.0"
MODULE_GUID = "8f2a1c4e-9b30-4d67-bf12-7a5e6c0d3e91"  # stable across regenerations
IDEMPOTENT_HTTP = {"get", "put", "delete"}
# Extracts the path and HTTP method from an overlay target like $["paths"]["/search"]["post"].
_TARGET_RE = re.compile(r'\["paths"\]\["(?P<path>[^"]+)"\]\["(?P<method>[^"]+)"\]')
```
```python
def ps_literal(value) -> str:
    """Render a Python str/bool/None as a PowerShell literal."""
    if value is None:
        return "$null"
    # ... (rest of ps_literal not shown)


# ... (excerpt resumes inside the request-body classification code)
    is_object = (
        media == "application/json"
        and isinstance(resolved, dict)
        and (resolved.get("type") == "object" or "properties" in resolved)
        and not any(k in resolved for k in ("oneOf", "anyOf"))
    )
    return (media, resolved, is_object, required)


def resolve_parameters(spec: dict, op: dict, path_item: dict):
    """Return a flat list of resolved parameter dicts (op-level + path-item-level)."""
    out = []
    # OpenAPI 3.0: an operation-level parameter overrides a path-item-level parameter with the
    # same (name, in). Operation params come first so the first-wins dedup keeps the override.
    raw = (op.get("parameters") or []) + (path_item.get("parameters") or [])
    seen = set()
    for p in raw:
        p = deref(spec, p)
        key = (p.get("name"), p.get("in"))
        if key in seen:
            continue
        seen.add(key)
        out.append(p)
    return out
```
```python
"""Map Speakeasy (group, method) + HTTP verb to a unique approved PowerShell Verb-Noun.

Uniqueness of function names AND aliases is a hard invariant: build_names() raises on any
collision it cannot deterministically resolve.
"""
from __future__ import annotations

import re

# Speakeasy method name -> approved PowerShell verb. HTTP verb is a fallback signal.
METHOD_VERB = {
    "create": "New",
    "add": "Add",
    "delete": "Remove",
    "remove": "Remove",
    "update": "Set",
    "edit": "Set",
    "patch": "Set",
    "upsert": "Set",
    "get": "Get",
    "retrieve": "Get",
    "list": "Get",
    "search": "Search",
    "query": "Search",
    "autocomplete": "Find",
    "recommendations": "Find",
    "upload": "Send",
    "index": "Add",
    "bulkindex": "Add",
    "process": "Submit",
    "report": "Submit",
    "feedback": "Submit",
    "run": "Invoke",
    "execute": "Invoke",
    "cancel": "Stop",
    "stop": "Stop",
    "rotate": "Update",
    "check": "Test",
    "verify": "Test",
    "count": "Measure",
    "summarize": "Get",
    # ...
}
```
```python
def match_overlay(entries: list[dict], api: str, op_path: str, method: str) -> dict:
    """Overlay whose api+method match and whose base-stripped path EXACTLY equals op_path."""
    for e in entries:
        if e["api"] == api and e["method"] == method and e["stripped"] == op_path:
            return e
    return {}


def _fallback_method(path: str, hm: str) -> str:
    """Derive a readable method name from the last non-templated path segment."""
    segs = [s for s in path.strip("/").split("/") if s and not s.startswith("{")]
    return segs[-1] if segs else hm


def collect_ops(overlay_entries: list[dict], removals: set | None = None) -> list[dict]:
    removals = removals or set()
    ops = []
    for api, meta in S.SPECS.items():
        spec = S.load_spec(api)
        for path, item in spec["paths"].items():
            for hm in S.HTTP_METHODS:
                if hm not in item:
                    continue
                if (api, path, hm) in removals:
                    continue  # overlay marked this operation `remove: true`
                op = item[hm]
                ov = match_overlay(overlay_entries, api, path, hm)
                tag = (op.get("tags") or ["Default"])[0]
                group = ov.get("group") or f"{api}.{re.sub(r'[^A-Za-z0-9]+', '', tag).lower()}"
                method = ov.get("name") or op.get("operationId") or _fallback_method(path, hm)
                operation_id = op.get("operationId") or f"{api}:{hm}:{path}"
                ops.append({
                    "api": api,
                    "spec": spec,
                    "path": path,
                    "http_method": hm,
                    "http_method_upper": hm.upper(),
                    "operationId": operation_id,
                    "group": group,
                    "method": method,
                    # ... (rest of collect_ops not shown)


# ... (excerpt resumes inside the contract-entry builder; its opening is not shown)
        "httpMethod": op["http_method_upper"],
        "path": op["path"],
        "group": op["group"],
        "method": op["method"],
        "shouldProcess": op["should_process"],
        "idempotent": idempotent,
        "hasObjectBody": has_object_body,
        "hasRawBody": has_raw_body,
        "multipart": multipart,
        "mediaType": media,
        "bodyRequired": body_required,
        "paginate": ctx["paginate"] is not None,
        "pagination": pag,
        "parameters": [
            {
                "psName": p["ps_name"],
                "wire": p["wire"],
                "location": p["location"],
                "mandatory": p["mandatory"],
                "paramSet": p["param_set"],
                "psType": p["ps_type"],
                "style": p.get("style"),
                "explode": p.get("explode"),
            }
            for p in ps_params
        ],
    }
```
```python
def clean_public():
    # Ensure Public/Client and Public/Indexing exist and delete .ps1 files from a previous run.
    for sub in ("Client", "Indexing"):
        d = os.path.join(PUBLIC, sub)
        os.makedirs(d, exist_ok=True)
        for f in glob.glob(os.path.join(d, "*.ps1")):
            os.remove(f)


def main():
    # autoescape is intentionally off: this renders PowerShell source, not HTML, so HTML
    # escaping would corrupt the generated .ps1 output.
    # ... (rest of main not shown)
```