Sprint Demo · June 1–12, 2026
TechOps
This sprint
Two themes
Safer continuous deployment
- Better post-deploy test guidance
- Ship-it enablement, on one standard path across the fleet
Datadog cost control
- Shorter live-index retention
- Rehydratable archives, so nothing is lost
Safer continuous deployment
Resilient post-deploy testing for Ship It
Clearer guidance and a real bar: ship-it-enabled apps cover every critical path with automated post-deploy tests.
The standard
- Improved guidance for writing post-deploy tests
- Coverage requirement: resilient automated tests across every critical path for ship-it-enabled apps
Tooling
- Upgraded skills that automatically create post-deploy tests and validate them, so the safety net is real, not just present
Why it matters
Ship It is only as safe as its tests — now every critical path is covered before a merge can reach prod.
Continuous deployment
A preview environment on every PR — automatically
Preview deploys were always self-serve. Now they come up on their own for every PR and run post-deploy tests automatically — across all shipping repos.
What changed
- Automatic, not manual: every PR gets a live preview without anyone kicking it off
- Tests run on their own: post-deploy tests execute against each preview automatically
- Now fleet-wide: apps like canopy, ui-react, and experiment-router already had it — now it's all shipping repos
Why it matters
- Faster review cycles: reviewers and PMs open a link to the real running change, already exercised by tests
- Lower cloud cost: previews share one ALB instead of each standing up its own
- More confidence: every PR is tested before a human even looks
Continuous deployment
A deploy isn't done until it's actually live
Hardened the shared deployment workflows every service already uses — tighter test integration and real readiness gating.
What we hardened
- Readiness gating: a preview environment isn't treated as deployed until its DNS name and ALB are actually available
- Consistent test harness: the service base URL is passed to the post-deploy tests the same way every time
- Right-timed notification: the deploy Slack message now fires when the service is live, not while you're still waiting on an ALB
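Sketch: the readiness gate above could look roughly like the Python below. This is a minimal illustration, not the actual shared workflow code. The function names, the timeout values, and the idea of probing the base URL over HTTP are all assumptions made for the example.

```python
import socket
import time
import urllib.error
import urllib.request
from typing import Callable
from urllib.parse import urlparse


def dns_resolves(hostname: str) -> bool:
    """Return True once the hostname resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except socket.gaierror:
        return False


def http_ready(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, and timeouts.
        return False


def wait_until_live(
    base_url: str,
    deadline_s: float = 300.0,   # hypothetical overall budget
    interval_s: float = 5.0,     # hypothetical polling interval
    resolve: Callable[[str], bool] = dns_resolves,
    probe: Callable[[str], bool] = http_ready,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until DNS resolves AND the service answers, or the deadline passes.

    Only a True result should count as "deployed": that is the point at
    which tests would start and the Slack message would be sent.
    """
    host = urlparse(base_url).hostname or ""
    stop_at = clock() + deadline_s
    while True:
        if resolve(host) and probe(base_url):
            return True
        if clock() >= stop_at:
            return False
        sleep(interval_s)
```

In a workflow along these lines, the same base_url that passed the gate would then be handed to the post-deploy tests (for example through a single environment variable, which is an assumption here), and the notification would fire only after wait_until_live returns True.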
Why it matters
- More resilient: tests run against a real, reachable URL — fewer false failures
- Faster feedback: "it's live" means it's live; no waiting after the ping
- Inherited by all: a fix to the shared path protects every service at once
Observability · Cost
Better Datadog spend efficiency
Shorter live-index retention with rehydratable S3 archives, and a contract that matches how we actually scale.
Spend efficiency
- Live-index retention trimmed (15 → 7 days), done non-destructively
- All logs archive to S3 — cheap long-tail storage, rehydrate on demand
- Result: fewer overages and more indexing headroom for the service logs teams actually use
Contract renegotiation
- Hourly accounting for APM and infrastructure hosts — fits our continuous auto-scaling instead of paying for peak
- Larger Browser Synthetics commitment, for automated testing
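Sketch: why hourly accounting suits an auto-scaling fleet. The Python below is a simplified model for intuition only, not Datadog's actual billing formula, and the host counts are hypothetical, not our real usage.

```python
from typing import Sequence


def host_hours_hourly(hourly_hosts: Sequence[int]) -> int:
    """Host-hours when each hour is counted at its own host count."""
    return sum(hourly_hosts)


def host_hours_at_peak(hourly_hosts: Sequence[int]) -> int:
    """Host-hours when every hour is counted at the peak host count."""
    return max(hourly_hosts, default=0) * len(hourly_hosts)


# Hypothetical day: 4 hosts for 16 quiet hours, 10 hosts for 8 busy hours.
day = [4] * 16 + [10] * 8

hourly = host_hours_hourly(day)   # 4*16 + 10*8 = 144 host-hours
peak = host_hours_at_peak(day)    # 10*24 = 240 host-hours
```

The wider the gap between the quiet hours and the busy hours, the more an hourly model saves relative to a peak-based one.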
Net
Lower Datadog spend and room to grow service logging — without losing any history.
Data platform · In progress
Supply & routes databases → managed DocumentDB
We don't normally present work before it's finished — but this one's worth an early look, because it touches other teams that depend on these data stores.
A cut-over with no downtime
- Dual-write — the writers send every change to both the old MongoDB and the new DocumentDB at once, keeping them in lockstep
- Reader cut-over — each dependent service moves one at a time: canary a single pod, validate its results against MongoDB, then switch the rest
- Cleanup — once every reader is happy, the writers drop MongoDB and we decommission it
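Sketch: the dual-write and canary-validation steps above, reduced to their core logic in Python. The in-memory stores stand in for MongoDB and DocumentDB, and every class and function name here is hypothetical. Real writers would use database drivers and would need to handle a failure of the second write, which this sketch leaves out.

```python
from typing import Any, Dict, Iterable, List, Optional, Protocol


class Store(Protocol):
    def put(self, key: str, doc: Dict[str, Any]) -> None: ...
    def get(self, key: str) -> Optional[Dict[str, Any]]: ...


class InMemoryStore:
    """Stand-in for a real database client (assumption for the example)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None


class DualWriter:
    """Send every change to both the old and the new store."""

    def __init__(self, old: Store, new: Store) -> None:
        self.old = old
        self.new = new

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        self.old.put(key, doc)  # existing store is written first
        self.new.put(key, doc)  # new store receives the same change


def find_mismatches(keys: Iterable[str], old: Store, new: Store) -> List[str]:
    """Canary check: return the keys whose documents differ between stores."""
    return [key for key in keys if old.get(key) != new.get(key)]
```

A canary reader would run a check like find_mismatches over a sample of its real queries, and the rest of that service's pods would switch only once the list comes back empty.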
Why it matters
- No big-bang switch: services move independently, and a problem with one can't stall the others
- Validated before it counts: every reader is checked against the current database before it carries real traffic
- End state: managed and secured — lower operational risk for two core data stores
Status
In flight — currently dual-writing; reader cut-over proceeds as the dependent teams have capacity.