Greentic deployment · control plane

The "Server Next" plan, in plain words

Greentic deploys digital workers. Until now, the record of what is deployed where lived as files on one operator's machine. "Server Next" turns that into a real, shared, safe control plane — a server many operators and automations talk to over the network. Here's why that matters, what already works, and what's left.

Why this is necessary

The short version: a file on a laptop is fine for a demo, not for an enterprise.

The analogy. Today's local store is like a spreadsheet saved on one person's laptop. It works until a second person needs to edit it, until someone overwrites a change by accident, until you need to know who changed what and when, or until the file gets corrupted and there's no undo. The "Server Next" plan replaces that laptop spreadsheet with a shared database that has logins, an edit history, locking, and a restore button.
[Figure: two panels. BEFORE (local only): an operator keeps deployment state as files on one machine, with no sharing and no undo. AFTER (shared store server): Operator A and an automation reach the store server over HTTP; the server is the one source of truth and provides locking, audit, retry-safe writes, access control, and restore.]
From "state lives on one machine" to "state lives in one server everyone safely shares".

What a real control plane has to guarantee

Each of these is a way the laptop-spreadsheet model fails — and a thing the server must get right.

① One source of truth

Reachable over the network, not a file only one person can see.

② No silent clobbering

Two operators editing at once can't overwrite each other — writes are checked against a version tag (optimistic locking).
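As a rough illustration of version-tag checking, here is a minimal in-memory sketch. The class and method names (VersionedStore, VersionConflict, read, write) are hypothetical and are not taken from the Greentic code; the real server persists to a database, not a dict.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when a write carries a stale version tag."""


@dataclass
class VersionedStore:
    # Maps key -> (version_tag, value). Hypothetical stand-in for the real store.
    _rows: dict = field(default_factory=dict)

    def read(self, key):
        """Return (version, value); version 0 means the key does not exist yet."""
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        """Apply the write only if the caller saw the latest version."""
        current_version, _ = self._rows.get(key, (0, None))
        if expected_version != current_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, store is at v{current_version}"
            )
        new_version = current_version + 1
        self._rows[key] = (new_version, value)
        return new_version


store = VersionedStore()
v1 = store.write("env/prod", 0, {"revision": "r1"})

# Operators A and B both read version v1, then both try to write.
store.write("env/prod", v1, {"revision": "r2"})      # A succeeds
try:
    store.write("env/prod", v1, {"revision": "r3"})  # B is stale and is rejected
except VersionConflict as err:
    print("rejected:", err)
```

The second writer has to re-read and retry against the new version, so neither change is lost silently.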

③ No accidental double-apply

If a network glitch makes a client retry "deploy", it must not deploy twice — the server replays the first answer instead.
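One common way to get this behaviour is to have the client attach a unique key to each logical request and have the server store its first answer under that key. The sketch below assumes that approach; the names (ReplayingServer, idempotency_key) are illustrative and do not describe the actual wire contract.

```python
class ReplayingServer:
    """Toy server that replays the first response for a repeated request key."""

    def __init__(self):
        self.deploy_count = 0   # how many deploys really ran
        self._responses = {}    # idempotency key -> first response

    def deploy(self, idempotency_key, revision):
        # A retry of a request we already handled: replay, do not re-run.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.deploy_count += 1
        response = {"deployed": revision, "sequence": self.deploy_count}
        self._responses[idempotency_key] = response
        return response


server = ReplayingServer()
first = server.deploy("req-123", "r2")
retry = server.deploy("req-123", "r2")   # client retried after a timeout
print(first == retry, server.deploy_count)
```

The retry gets the same answer as the original call, and the deploy itself runs once.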

④ A permanent record

Every change — and every rejected attempt — is written to an audit log for compliance.

⑤ Access control

Who may change what, decided per actor (a person or an automation) + environment + action, and locked shut by default.
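A default-deny check over (actor, environment, action) can be sketched as below, with every decision, including denials, appended to an audit list as guarantee ④ requires. The Grant and AccessPolicy names and the example actors are hypothetical; this is not the server's actual policy model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    actor: str
    environment: str
    action: str


class AccessPolicy:
    """Allow only what is explicitly granted; everything else is denied."""

    def __init__(self, grants):
        self._grants = set(grants)
        self.audit_log = []   # hypothetical in-memory audit trail

    def check(self, actor, environment, action):
        allowed = Grant(actor, environment, action) in self._grants
        # Record the decision either way, so rejected attempts leave a trace.
        self.audit_log.append(
            {"actor": actor, "environment": environment,
             "action": action, "allowed": allowed}
        )
        return allowed


policy = AccessPolicy([Grant("alice", "staging", "apply")])
print(policy.check("alice", "staging", "apply"))   # explicitly granted
print(policy.check("alice", "prod", "apply"))      # no grant, so denied
print(len(policy.audit_log))                       # both attempts were recorded
```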

⑥ Recover & detect corruption

Snapshots you can restore, and integrity checks that catch silent data rot before it spreads.
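A simple form of integrity checking is to store a hash next to each record and recompute it on load. The sketch below assumes SHA-256 over a canonical JSON encoding; the function names and record shape are illustrative only, and the real store's hash scheme may differ.

```python
import hashlib
import json


class CorruptRecord(Exception):
    """Raised when a stored payload no longer matches its recorded hash."""


def seal(record):
    """Serialize a record deterministically and attach its SHA-256 digest."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"payload": payload, "sha256": digest}


def load(sealed):
    """Verify the digest before trusting the payload."""
    digest = hashlib.sha256(sealed["payload"].encode("utf-8")).hexdigest()
    if digest != sealed["sha256"]:
        raise CorruptRecord("stored hash does not match payload")
    return json.loads(sealed["payload"])


sealed = seal({"environment": "prod", "revision": "r2"})
print(load(sealed))

# Simulate silent data rot: one character of the payload changes on disk.
sealed["payload"] = sealed["payload"].replace("r2", "r9")
try:
    load(sealed)
except CorruptRecord as err:
    print("detected:", err)
```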

The roadmap: five phases (S1–S5)

The plan sequenced the work so each phase builds on the one before it. Below is the plan and where each phase actually stands today.

Legend: Shipped · Mostly shipped (a piece remains) · Future / not started
S1

Pick the storage backend

Re-sequenced

The plan said "finish the Postgres backend first." In practice the team shipped a simpler single-writer SQLite server first, because it was quicker to ship and works today, and parked Postgres as a prototype to revisit when scale or high availability demands it. This was a deliberate change of order, not a skipped step.

S2

The HTTP store server

Shipped

A real server (greentic-operator-store-server) now exposes the deployment contract over HTTP. Version-tag locking (guarantee ②) and retry-safe replay (③) are built in. --store-url works end-to-end: an operator can apply an environment to a remote server, and even dry-run it first to confirm it would converge.

S3

Access control (RBAC)

Static tokens shipped

Decisions are made per actor + environment + action, fail-closed, and rejected attempts are still audited (guarantees ④ & ⑤). Today it uses static bearer tokens with coarse roles (admin / operator / read-only). The future piece is stronger identity — OIDC or mutual-TLS — which can replace this module without changing anything downstream.
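The "replace this module without changing anything downstream" idea can be pictured as an identity interface with a static-token implementation behind it. In this sketch the three role names come from the text above; the action names, the IdentityProvider protocol, and the token values are assumptions made for illustration.

```python
from typing import Optional, Protocol, Tuple

# Coarse roles from the text; the action names are hypothetical.
ROLE_ACTIONS = {
    "admin": {"read", "apply", "manage"},
    "operator": {"read", "apply"},
    "read-only": {"read"},
}


class IdentityProvider(Protocol):
    def resolve(self, credential: str) -> Optional[Tuple[str, str]]:
        """Return (actor, role), or None if the credential is unknown."""


class StaticTokenProvider:
    """Today's model: a fixed table of bearer tokens."""

    def __init__(self, tokens):
        self._tokens = dict(tokens)

    def resolve(self, credential):
        return self._tokens.get(credential)


def authorize(provider: IdentityProvider, credential: str, action: str) -> bool:
    """Downstream code depends only on the provider interface."""
    identity = provider.resolve(credential)
    if identity is None:
        return False                      # fail closed on unknown credentials
    _actor, role = identity
    return action in ROLE_ACTIONS.get(role, set())


provider = StaticTokenProvider({"tok-ops": ("bob", "operator")})
print(authorize(provider, "tok-ops", "apply"))
print(authorize(provider, "tok-ops", "manage"))
print(authorize(provider, "bogus", "read"))
```

An OIDC or mutual-TLS provider would implement the same resolve method, leaving authorize and everything after it unchanged.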

S4

The remaining deploy actions

Shipped

The "stage a new revision → warm it up → drain → archive" lifecycle is wired through the server, with a clean split between the pure decision logic and the database write. These were the actions left unimplemented when the plan was written.
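The split between pure decision logic and the database write might look like the following sketch. The four actions come from the lifecycle above; the state names, the transition table, and the dict standing in for the database are assumptions, not the server's actual schema.

```python
# (current_state, action) -> next_state. State names are hypothetical.
TRANSITIONS = {
    (None, "stage"): "staged",
    ("staged", "warm"): "warm",
    ("warm", "drain"): "draining",
    ("draining", "archive"): "archived",
}


def decide(current_state, action):
    """Pure decision: no I/O, so it can be tested without a database."""
    key = (current_state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot {action!r} a revision in state {current_state!r}")
    return TRANSITIONS[key]


def apply_action(db, revision_id, action):
    """Impure shell: read state, ask the pure function, then write."""
    next_state = decide(db.get(revision_id), action)
    db[revision_id] = next_state
    return next_state


db = {}   # stand-in for the store
for action in ("stage", "warm", "drain", "archive"):
    print(action, "->", apply_action(db, "rev-7", action))
```

Keeping decide free of side effects means an illegal transition is rejected before anything is written.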

S5

Operational hardening

Mostly shipped

This is the gate for "production-ready", and most of it is done: the retry-log is bounded, the audit log has opt-in retention that records what it trimmed, corruption detection verifies hashes on load, and backup & restore exists with tests. Remaining: a full backup→restore acceptance drill and high-availability (automatic failover to a standby).
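Two of the hardening items, a bounded retry log and audit retention that records what it trimmed, can be sketched as follows. The eviction rule (oldest first), the keep_last parameter, and the watermark fields are all assumptions chosen for illustration; the source does not specify them.

```python
from collections import OrderedDict


class BoundedReplayLog:
    """Retry-replay records with a hard cap; the oldest entries are evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def remember(self, key, response):
        self._entries[key] = response
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # drop the oldest record

    def recall(self, key):
        return self._entries.get(key)


def trim_audit(entries, keep_last=None):
    """Opt-in retention: keep_last=None keeps everything.

    Returns (kept_entries, watermark). The watermark notes how much was
    trimmed and up to which sequence number, so the gap stays explainable.
    """
    if keep_last is None or len(entries) <= keep_last:
        return list(entries), None
    cut = len(entries) - keep_last
    trimmed, kept = entries[:cut], entries[cut:]
    watermark = {
        "trimmed_through_seq": trimmed[-1]["seq"],
        "trimmed_count": len(trimmed),
    }
    return kept, watermark


audit = [{"seq": n, "event": "apply"} for n in range(1, 6)]
kept, watermark = trim_audit(audit, keep_last=2)
print([e["seq"] for e in kept], watermark)
```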

What's already shipped

A lot of this was "finishing", not greenfield — the wire contract and the client existed before the server did.

Foundations (pre-existing)

  • The wire contract — version tags, retry-replay records, access-request and corruption-hash types.
  • The HTTP client and --store-url routing on the operator CLI.
  • A Postgres prototype backend (parked).

Built out this cycle

  • The SQLite store server — full schema, every read/write verb.
  • Optimistic locking (no silent overwrites) + retry-safe replay.
  • Audit log + opt-in retention with a watermark.
  • Static-token RBAC, fail-closed, denials audited.
  • Revision lifecycle: stage · warm · drain · archive.
  • Backup & restore + integrity / corruption checks.

What's left

Item | What it adds | Status
Backup→restore drill | A single end-to-end acceptance test proving that taking a backup and restoring it reproduces the exact live state ("active generation"). The mechanism exists and is unit-tested; this is the production sign-off proof. | Next
Identity beyond static tokens | OIDC or mutual-TLS so actors are real verified identities, not shared bearer tokens. Designed-for: it can swap in without touching downstream code. | Future
High availability | A standby server that takes over automatically without losing any committed change. Needs a design decision (shared storage vs. replication) first. | Future
Postgres backend | Finish the parked Postgres path for when one SQLite writer is no longer enough. Reopen when scale/HA forces it. | Parked
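To give a feel for what the backup→restore drill has to prove, here is a minimal sketch using Python's standard sqlite3 module and its online backup API. The deployments table, its columns, and the fingerprint function are invented for the example; the real drill would run against the store server's own schema and backup mechanism.

```python
import hashlib
import sqlite3


def state_fingerprint(conn):
    """Hash every row of the (hypothetical) deployments table in a stable order."""
    digest = hashlib.sha256()
    rows = conn.execute(
        "SELECT environment, revision, version FROM deployments ORDER BY environment"
    )
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def backup_restore_drill():
    """Return True if a restored copy reproduces the live state exactly."""
    live = sqlite3.connect(":memory:")
    live.execute(
        "CREATE TABLE deployments ("
        "environment TEXT PRIMARY KEY, revision TEXT, version INTEGER)"
    )
    live.executemany(
        "INSERT INTO deployments VALUES (?, ?, ?)",
        [("prod", "r7", 3), ("staging", "r8", 1)],
    )
    live.commit()

    backup = sqlite3.connect(":memory:")
    live.backup(backup)        # take the backup

    restored = sqlite3.connect(":memory:")
    backup.backup(restored)    # restore it into a fresh database

    same = state_fingerprint(live) == state_fingerprint(restored)
    for conn in (live, backup, restored):
        conn.close()
    return same


print(backup_restore_drill())
```

Comparing a fingerprint of the whole state, not a row count, is what makes the check a proof of exact reproduction.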

Where it stands, in one line

The control plane is working end-to-end: a shared server with locking, retry-safety, audit, access control, the full deploy lifecycle, and recoverable backups. The remaining work is the production sign-off drill and the enterprise-grade extras — verified identity and automatic failover.

Greentic deployment · "Server Next" (Part 2 of the plan). Status reflects the greentic-operator-store-server work through the latest develop branch. Phases S2 and S4 shipped; S3 shipped at the static-token level; S5 mostly shipped (drill + HA remain); S1 re-sequenced to SQLite-first with Postgres parked.