Sprint Demo · June 1–12, 2026

TechOps

This sprint

Two themes

Safer continuous deployment

Better post-deploy test guidance
Ship-it enablement, on one standard path across the fleet

Datadog cost control

Shorter live-index retention
Rehydratable archives, so nothing is lost

Safer continuous deployment

Resilient post-deploy testing for Ship It

Clearer guidance and a real bar: ship-it-enabled apps cover every critical path with automated post-deploy tests.

The standard

Improved guidance for writing post-deploy tests
Coverage requirement: resilient automated tests across every critical path for ship-it-enabled apps
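
One way to picture the coverage requirement is a simple set difference between an app's critical paths and the paths its post-deploy tests exercise. The sketch below is illustrative only: the function name, the path names, and the idea of tagging each test with the paths it covers are assumptions, not the team's actual tooling.

```python
from typing import Dict, Iterable, List


def uncovered_paths(
    critical_paths: Iterable[str],
    tests: Dict[str, Iterable[str]],
) -> List[str]:
    """Return critical paths that no post-deploy test claims to cover.

    `tests` maps a test name to the critical paths it exercises
    (a hypothetical tagging scheme, assumed for this sketch).
    """
    covered = set()
    for paths in tests.values():
        covered.update(paths)
    return sorted(set(critical_paths) - covered)


# Hypothetical app: three critical paths, two tests.
missing = uncovered_paths(
    ["checkout", "login", "search"],
    {"test_login": ["login"], "test_checkout_flow": ["checkout", "login"]},
)
print(missing)  # an empty list would mean the bar is met
```

Under this model an app meets the bar only when the returned list is empty.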

Tooling

Upgraded skills that automatically create post-deploy tests
And validate them — so the safety net is real, not just present

Why it matters

Ship It is only as safe as its tests. Now every critical path is covered before a merge can reach prod.

Continuous deployment

A preview environment on every PR — automatically

Preview deploys were always self-serve. Now they come up on their own for every PR and run post-deploy tests automatically — across all shipping repos.

What changed

Automatic, not manual: every PR gets a live preview without anyone kicking it off
Tests run on their own: post-deploy tests execute against each preview automatically
Now fleet-wide: apps like canopy, ui-react, and experiment-router already had it; now every shipping repo does

Why it matters

Faster review cycles: reviewers and PMs open a link to the real running change, already exercised by tests
Lower cloud cost: previews share one ALB instead of each standing up its own
More confidence: every PR is tested before a human even looks

Continuous deployment

A deploy isn't done until it's actually live

Hardened the shared deployment workflows every service already uses — tighter test integration and real readiness gating.

What we hardened

Readiness gating: a preview environment isn't treated as deployed until its DNS name and ALB are actually available
Consistent test harness: the service base URL is passed to the post-deploy tests the same way every time
Right-timed notification: the deploy Slack message now fires when the service is live, not while you're still waiting on an ALB
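
The three points above can be sketched as one small gate: poll until the preview is reachable, only then notify, and hand the same base URL to the tests. This is a minimal illustration, not the shared workflow itself. The names `http_probe`, `wait_until_live`, and `deploy_preview`, the callback parameters, and the timeout and interval defaults are all assumptions.

```python
import socket
import time
import urllib.request
from typing import Callable
from urllib.parse import urlparse


def http_probe(base_url: str, timeout_s: float = 3.0) -> bool:
    """True once the DNS name resolves and the URL answers without an HTTP error."""
    host = urlparse(base_url).hostname
    if host is None:
        return False
    try:
        socket.getaddrinfo(host, None)  # DNS name exists
        with urllib.request.urlopen(base_url, timeout=timeout_s):
            return True  # urlopen raises on 4xx/5xx, so reaching here means success
    except (OSError, ValueError):  # URLError and gaierror are OSError subclasses
        return False


def wait_until_live(
    is_live: Callable[[], bool],
    timeout_s: float = 300.0,   # assumed default
    interval_s: float = 5.0,    # assumed default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_live() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_live():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def deploy_preview(
    base_url: str,
    is_live: Callable[[], bool],
    run_tests: Callable[[str], bool],
    notify: Callable[[str], None],
    **wait_kwargs,
) -> bool:
    """Gate the notification and the tests on real readiness."""
    if not wait_until_live(is_live, **wait_kwargs):
        notify(f"Preview {base_url} did not become reachable")
        return False
    notify(f"Preview is live: {base_url}")  # fires only once the service answers
    return run_tests(base_url)              # tests always receive the same base URL
```

In real use `is_live` could be `lambda: http_probe(base_url)`. Injecting the probe, sleep, and clock keeps the gate easy to test without a network.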

Why it matters

More resilient: tests run against a real, reachable URL — fewer false failures
Faster feedback: "it's live" means it's live; no waiting after the ping
Inherited by all: a fix to the shared path protects every service at once

Observability · Cost

Better Datadog spend efficiency

Shorter live-index retention with rehydratable S3 archives, and a contract that matches how we actually scale.

Spend efficiency

Live-index retention trimmed (15 → 7 days), done non-destructively
All logs archive to S3 — cheap long-tail storage, rehydrate on demand
Result: fewer overages and more indexing headroom for the service logs teams actually use
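
As a rough mental model of the new retention split, a log's age decides where it can be read from: the live index for the most recent 7 days, the S3 archive (after a rehydration) for anything older. The sketch below only illustrates that rule. The function name and return labels are hypothetical, and it does not call any Datadog API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

LIVE_RETENTION = timedelta(days=7)  # from the slide: trimmed from 15 to 7 days


def log_source(event_time: datetime, now: Optional[datetime] = None) -> str:
    """Say where a log of the given age would be found under the new retention."""
    now = now or datetime.now(timezone.utc)
    age = now - event_time
    if age <= LIVE_RETENTION:
        return "live-index"
    return "s3-archive"  # still available, but needs a rehydration first


now = datetime(2026, 6, 12, tzinfo=timezone.utc)
print(log_source(now - timedelta(days=3), now))
print(log_source(now - timedelta(days=10), now))
```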

Contract renegotiation

Hourly accounting for APM and infrastructure hosts — fits our continuous auto-scaling instead of paying for peak
More Browser Synthetics committed, for automated testing
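
To see why hourly accounting suits an auto-scaling fleet, compare two simplified billing models over one day: paying for the peak host count in every hour versus paying for the hosts actually running each hour. The host counts below are invented for illustration, and neither function reproduces Datadog's actual billing formula; they only show the shape of the difference.

```python
from typing import List


def hourly_host_hours(hourly_hosts: List[int]) -> int:
    """Host-hours when each hour is billed at its own host count."""
    return sum(hourly_hosts)


def peak_host_hours(hourly_hosts: List[int]) -> int:
    """Host-hours when every hour is billed at the day's peak host count."""
    return max(hourly_hosts) * len(hourly_hosts)


# Hypothetical day: 16 quiet hours at 10 hosts, 8 busy hours at 40 hosts.
day = [10] * 16 + [40] * 8

print(hourly_host_hours(day))  # 16*10 + 8*40 = 480
print(peak_host_hours(day))    # 40 * 24 = 960
```

In this made-up profile the hourly model bills half the host-hours of the peak model. The gap narrows as load flattens and widens as scaling gets spikier.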

Net

Lower Datadog spend and room to grow service logging, without losing any history.

Data platform · In progress

Supply & routes databases → managed DocumentDB

We don't normally present work before it's finished — but this one's worth an early look, because it touches other teams that depend on these data stores.

A cut-over with no downtime

Dual-write — the writers send every change to both the old MongoDB and the new DocumentDB at once, keeping them in lockstep
Reader cut-over — each dependent service moves one at a time: canary a single pod, validate its results against MongoDB, then switch the rest
Cleanup — once every reader is happy, the writers drop MongoDB and we decommission it
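
The dual-write and canary-validation steps above can be sketched with two in-memory dictionaries standing in for MongoDB and DocumentDB. This is a toy model under stated assumptions: `DualWriter` and `canary_mismatches` are hypothetical names, no real database driver is used, and partial-failure handling (one write succeeding while the other fails) is deliberately left out.

```python
from typing import Any, Dict, Iterable, List

Store = Dict[str, Dict[str, Any]]  # stand-in for a collection keyed by document id


class DualWriter:
    """Send every change to both the old and the new store."""

    def __init__(self, old: Store, new: Store) -> None:
        self.old = old
        self.new = new

    def upsert(self, key: str, doc: Dict[str, Any]) -> None:
        # Separate copies so the two stores never alias the same object.
        self.old[key] = dict(doc)
        self.new[key] = dict(doc)

    def delete(self, key: str) -> None:
        self.old.pop(key, None)
        self.new.pop(key, None)


def canary_mismatches(old: Store, new: Store, keys: Iterable[str]) -> List[str]:
    """Keys whose documents differ between the stores (the canary's check)."""
    return [k for k in keys if old.get(k) != new.get(k)]


mongo: Store = {}
docdb: Store = {}
writer = DualWriter(mongo, docdb)
writer.upsert("route-1", {"origin": "A", "dest": "B"})
writer.upsert("route-2", {"origin": "B", "dest": "C"})

# A canary reader samples keys and compares before the rest of the pods switch.
print(canary_mismatches(mongo, docdb, ["route-1", "route-2"]))
```

An empty list from the canary is the signal to move the remaining pods. A non-empty list names the documents to investigate before any more traffic shifts.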

Why it matters

No big-bang switch: services move independently, and a problem with one can't stall the others
Validated before it counts: every reader is checked against the current database before it carries real traffic
End state: managed and secured — lower operational risk for two core data stores

Status

In flight: currently dual-writing; reader cut-over proceeds as the dependent teams have capacity.