
Orchestrator and Deployment Management

The Orchestrator and Deployment Management subsystem provides the interface for interacting with the Cloudify orchestrator’s REST API surface (/api/v3.1/*). This layer handles the specific authentication contract required by the orchestrator, manages the lifecycle of blueprints and deployments, and executes workflows. It abstracts away the complexities of async operations, such as blueprint uploads and environment creation, by implementing polling mechanisms with appropriate timeouts to handle long-running cold-start scenarios.

The orchestrator client (orch_client.py) serves as the single source of truth for the authentication contract. Unlike the inventory client, which may use different endpoints, the orchestrator’s session service base64-decodes the raw Authorization header and fails on the space after Bearer. Consequently, every v3.1 request must carry Cookie: session.id=<jwt> instead of an Authorization header 1.
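A minimal sketch of that rule follows. The helper name orchestrator_headers is hypothetical and is not the actual orch_client.py API. The token is moved out of any Authorization header and into the session.id cookie:

```python
from typing import Dict, Optional


def orchestrator_headers(jwt: str, extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build headers for an /api/v3.1/* call (hypothetical helper).

    The JWT travels as the session.id cookie. Any Authorization header is
    dropped, because the session service would try to base64-decode it and
    fail on the space after "Bearer".
    """
    headers = {k: v for k, v in (extra or {}).items() if k.lower() != "authorization"}
    headers["Cookie"] = f"session.id={jwt}"
    return headers


# Example: a header set copied from another client loses its Authorization entry.
print(orchestrator_headers("header.payload.signature",
                           {"Authorization": "Bearer header.payload.signature",
                            "Accept": "application/json"}))
```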

The attach_to_orchestrator function resolves authentication via two paths:

  1. Browser Session: If a Session object is provided, it attaches the portal’s session.id cookie to the client. For portal-scoped JWTs, the system also extracts the csrf claim from the JWT and injects it into the CSRF-Token header for all state-changing calls (PUT/POST/PATCH/DELETE) to pass the anti-CSRF gate.
  2. Service Account: If an admin (BootstrapResult) is provided, the system mints a Bearer JWT via /api/v1/oidc/token and writes it into the cookie jar as session.id. Service-account tokens are exempt from the CSRF gate because they have a different scope claim.

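The open_orchestrator_client context manager yields an authenticated httpx.Client with a default timeout of 60 seconds, well above httpx’s own 5-second default. This higher per-request timeout exists for cold starts: the first install of a blueprint on a fresh pod provisions an ansible-core venv and galaxy collections, which can take 8-15 minutes end-to-end. A wait of that length is covered by the polling timeouts described below rather than by a single request.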


The blueprint engine (blueprint/client.py) provides a minimal set of functions to drive the standard “upload, deploy, run install, watch the result” loop. It enforces several load-bearing contracts discovered through probing:

  • Multipart Upload: Uploading a blueprint requires a multipart form with exactly two fields: params (a JSON form field containing application_file_name and visibility) and blueprint_archive (the file) 2. Other field names result in 400 errors, and non-multipart requests result in 422 errors 3.
  • Tar Structure: The blueprint source directory is tarred and gzipped, rooted at the source directory’s name. Cloudify requires a single top-level directory; flat tars are rejected with “main blueprint file not found” 2.
  • Async Upload: The upload returns immediately with state: uploading. The client then polls /api/v3.1/blueprints/<id> until the state becomes uploaded or the upload fails.
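The following sketch shows the archive layout and the params field using the standard library's tarfile module. build_blueprint_archive and upload_form are hypothetical helpers. The visibility value "tenant" in the usage line is an assumed example.

```python
import io
import json
import tarfile
from pathlib import Path
from typing import Dict, Union


def build_blueprint_archive(src_dir: Union[str, Path]) -> bytes:
    """Gzip-tar src_dir so every member sits under one top-level directory."""
    src = Path(src_dir).resolve()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # arcname=src.name roots the archive at the directory's own name
        # (e.g. my_bp/blueprint.yaml), avoiding the flat layout that fails
        # with "main blueprint file not found".
        tar.add(str(src), arcname=src.name)
    return buf.getvalue()


def upload_form(application_file_name: str, visibility: str) -> Dict[str, str]:
    """The non-file multipart field; the archive itself goes in blueprint_archive."""
    params = {"application_file_name": application_file_name, "visibility": visibility}
    return {"params": json.dumps(params)}


# "tenant" is an assumed example visibility value.
form = upload_form("blueprint.yaml", visibility="tenant")
print(form)
```

With httpx, the two pieces would typically be sent together as data=form plus files={"blueprint_archive": ("bp.tar.gz", archive_bytes, "application/gzip")}. httpx then encodes both fields into a single multipart body.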

The upload_blueprint function handles the tar creation, multipart request, and polling for the uploaded state. The list_blueprints and delete_blueprint functions provide read and delete capabilities, with delete_blueprint supporting a force flag to remove blueprints even if deployments exist 4.

Deployment creation and workflow execution are also asynchronous processes that require careful polling.

  • Deployment Creation: Creating a deployment triggers create_deployment_environment automatically, and an execute(install) call fails with HTTP 409 while that environment creation is still in flight. To avoid this, the create_deployment function polls the most recent create_deployment_environment execution until it terminates 2.
  • Workflow Execution: The execute_workflow function POSTs to /api/v3.1/executions to start a workflow. The wait_execution function then polls the execution status until it reaches terminated, failed, or cancelled states. The default timeout for execution is 1800 seconds to accommodate long cold-starts.
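The polling used by create_deployment and wait_execution follows a common pattern, sketched generically below. The name wait_for_status, the 5-second interval, and the injected fetch_status callable are assumptions. Only the terminal states and the 1800-second default come from the text above.

```python
import time
from typing import Callable

# Terminal execution states named in the text.
TERMINAL = {"terminated", "failed", "cancelled"}


class ExecutionTimeout(TimeoutError):
    """Raised when an execution is still running at the deadline."""


def wait_for_status(
    fetch_status: Callable[[], str],
    timeout: float = 1800.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until it returns a terminal state, then return that state."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise ExecutionTimeout(f"still {status!r} after {timeout}s")
        sleep(interval)
```

Injecting sleep and clock keeps the loop testable without a live orchestrator. Because the function returns any terminal state, callers must still treat failed and cancelled as errors. Pointed at the create_deployment_environment execution, the same loop keeps a following execute(install) from hitting the 409.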

The latest_execution function queries the most recent execution for a deployment, optionally filtered by workflow ID, which powers the dap blueprint logs command 5.
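One way to pick the latest execution is sketched below. It assumes the execution list has already been fetched and that each record carries workflow_id and created_at fields in a single ISO-8601 format; this page does not confirm either field. pick_latest_execution is a hypothetical name.

```python
from typing import Any, Dict, List, Optional


def pick_latest_execution(
    executions: List[Dict[str, Any]], workflow_id: Optional[str] = None
) -> Optional[Dict[str, Any]]:
    """Return the newest execution, optionally restricted to one workflow."""
    candidates = [
        e for e in executions if workflow_id is None or e.get("workflow_id") == workflow_id
    ]
    # Timestamps in one fixed ISO-8601 format sort correctly as plain strings.
    return max(candidates, key=lambda e: e["created_at"], default=None)
```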

Runtime properties may contain secrets. The orchestrator’s sensitive_keys mechanism masks values in execution logs but NOT in the GET /api/v3.1/node-instances response. To prevent stray secrets from appearing in operator scrollback, the redact_runtime_properties function walks the object and replaces values under keys containing hints like “password”, “secret”, “token”, “credential”, “private_key”, “api_key”, or “auth” with <redacted> 2.
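A minimal sketch of the recursive walk follows. redact is a hypothetical stand-in for redact_runtime_properties. The case-insensitive substring match is an assumption about how the hints are applied.

```python
from typing import Any

SECRET_HINTS = ("password", "secret", "token", "credential", "private_key", "api_key", "auth")
REDACTED = "<redacted>"


def _is_secret_key(key: Any) -> bool:
    """True if a string key contains any hint, ignoring case."""
    return isinstance(key, str) and any(h in key.lower() for h in SECRET_HINTS)


def redact(obj: Any) -> Any:
    """Return a copy of obj with values under secret-looking keys replaced.

    A matching key hides its whole value, including nested dicts or lists.
    """
    if isinstance(obj, dict):
        return {k: REDACTED if _is_secret_key(k) else redact(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj
```

Substring matching errs on the side of hiding too much. A key such as author contains auth and would also be masked, which is usually an acceptable trade-off for operator-facing output.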

The fetch_runtime_properties function retrieves node instances for a deployment, which can then be passed through redact_runtime_properties before display 6.