Sprint Demo · June 1–12, 2026

TechOps

This sprint

Two themes

Safer continuous deployment

  • Better post-deploy test guidance
  • Ship-it enablement, on one standard path across the fleet

Datadog cost control

  • Shorter live-index retention
  • Rehydratable archives, so nothing is lost

Safer continuous deployment

Resilient post-deploy testing for Ship It

Clearer guidance and a real bar: ship-it-enabled apps cover every critical path with automated post-deploy tests.

The standard

  • Improved guidance for writing post-deploy tests
  • Coverage requirement: resilient automated tests across every critical path for ship-it-enabled apps

Tooling

  • Upgraded skills that automatically create post-deploy tests
  • And validate them — so the safety net is real, not just present

Why it matters

Ship It is only as safe as its tests. Now every critical path is covered before a merge can reach prod.

Continuous deployment

A preview environment on every PR — automatically

Preview deploys were always self-serve. Now they come up on their own for every PR and run post-deploy tests automatically — across all shipping repos.

What changed

  • Automatic, not manual: every PR gets a live preview without anyone kicking it off
  • Tests run on their own: post-deploy tests execute against each preview automatically
  • Now fleet-wide: apps like canopy, ui-react, and experiment-router already had it — now it's all shipping repos

Why it matters

  • Faster review cycles: reviewers and PMs open a link to the real running change, already exercised by tests
  • Lower cloud cost: previews share one ALB instead of each standing up its own
  • More confidence: every PR is tested before a human even looks

Continuous deployment

A deploy isn't done until it's actually live

Hardened the shared deployment workflows every service already uses — tighter test integration and real readiness gating.

What we hardened

  • Readiness gating: a preview environment isn't treated as deployed until its DNS name and ALB are actually available
  • Consistent test harness: the service base URL is passed to the post-deploy tests the same way every time
  • Right-timed notification: the deploy Slack message now fires when the service is live, not while you're still waiting on an ALB
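
The readiness gate above can be sketched roughly as follows. This is a minimal illustration, not the actual shared workflow: the function names, the timeout and interval defaults, and the idea of an HTTP health URL are all assumptions made for the example. It polls until every check passes (DNS resolves, then the endpoint answers) and only then reports the environment as ready, which is the point at which tests and the Slack notification would fire.

```python
import socket
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable


def dns_resolves(hostname: str) -> bool:
    """Return True once the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, 443)) > 0
    except socket.gaierror:
        return False


def http_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    checks: Iterable[Callable[[], bool]],
    timeout_s: float = 600.0,   # hypothetical default
    interval_s: float = 10.0,   # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every check passes, or give up when the timeout expires."""
    checks = list(checks)
    deadline = clock() + timeout_s
    while True:
        if all(check() for check in checks):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass something like `[lambda: dns_resolves(host), lambda: http_healthy(base_url)]`, where `host` and `base_url` are placeholders for the preview's DNS name and service URL, and hand the same `base_url` to the post-deploy tests only after `wait_until_ready` returns True.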

Why it matters

  • More resilient: tests run against a real, reachable URL — fewer false failures
  • Faster feedback: "it's live" means it's live; no waiting after the ping
  • Inherited by all: a fix to the shared path protects every service at once

Observability · Cost

Better Datadog spend efficiency

Shorter live-index retention with rehydratable S3 archives, and a contract that matches how we actually scale.

Spend efficiency

  • Live-index retention trimmed (15 → 7 days), done non-destructively
  • All logs archive to S3 — cheap long-tail storage, rehydrate on demand
  • Result: fewer overages and more indexing headroom for the service logs teams actually use

Contract renegotiation

  • Hourly accounting for APM and infrastructure hosts — fits our continuous auto-scaling instead of paying for peak
  • More Browser Synthetics committed, for automated testing

Net: lower Datadog spend and room to grow service logging, without losing any history.
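
To show why hourly accounting suits an auto-scaling fleet, here is a simplified comparison. It is not Datadog's actual billing formula, and the host counts are invented purely for illustration: one model charges every hour at the day's peak host count, the other charges each hour for the hosts actually running.

```python
from typing import List


def peak_billed_host_hours(hourly_hosts: List[int]) -> int:
    """Every hour is charged at the highest host count seen in the period."""
    return max(hourly_hosts) * len(hourly_hosts)


def hourly_billed_host_hours(hourly_hosts: List[int]) -> int:
    """Each hour is charged for the hosts that were actually running."""
    return sum(hourly_hosts)


# Hypothetical 24-hour day: quiet overnight, a midday ramp, a short peak.
sample_day = [4] * 8 + [10] * 4 + [20] * 4 + [6] * 8

print(peak_billed_host_hours(sample_day))    # 480
print(hourly_billed_host_hours(sample_day))  # 200
```

In this made-up day the short peak of 20 hosts dominates the peak-based total, while the hourly total tracks real usage. The wider the gap between peak and typical load, the more hourly accounting helps.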

Data platform · In progress

Supply & routes databases → managed DocumentDB

We don't normally present work before it's finished — but this one's worth an early look, because it touches other teams that depend on these data stores.

A cut-over with no downtime

  • Dual-write: the writers send every change to both the old MongoDB and the new DocumentDB at once, keeping them in lockstep
  • Reader cut-over: each dependent service moves one at a time: canary a single pod, validate its results against MongoDB, then switch the rest
  • Cleanup: once every reader is happy, the writers drop MongoDB and we decommission it
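
The dual-write and validation steps above might look roughly like this. It is a sketch under stated assumptions, not the team's implementation: `InMemoryStore` stands in for both MongoDB and DocumentDB, and `DualWriter` and `mismatched_keys` are hypothetical names introduced for the example.

```python
from typing import Any, Dict, Iterable, List, Optional


class InMemoryStore:
    """Stand-in for a document database (assumption: simple key/document access)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def upsert(self, key: str, doc: Dict[str, Any]) -> None:
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self._docs.get(key)


class DualWriter:
    """Send every change to both the old and the new store."""

    def __init__(self, old: InMemoryStore, new: InMemoryStore) -> None:
        self.old = old
        self.new = new

    def upsert(self, key: str, doc: Dict[str, Any]) -> None:
        # The old store stays the source of truth until readers have moved.
        self.old.upsert(key, doc)
        self.new.upsert(key, doc)


def mismatched_keys(
    keys: Iterable[str], old: InMemoryStore, new: InMemoryStore
) -> List[str]:
    """Canary check: list the keys whose documents differ between the stores."""
    return [k for k in keys if old.get(k) != new.get(k)]


old_db, new_db = InMemoryStore(), InMemoryStore()
writer = DualWriter(old_db, new_db)
writer.upsert("route-1", {"stops": 3})
print(mismatched_keys(["route-1"], old_db, new_db))  # []
```

An empty result from the canary check is the signal to switch the remaining pods. The sketch deliberately leaves out failure handling: if the second write fails, the two stores diverge, so a real writer would need retries or reconciliation.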

Why it matters

  • No big-bang switch: services move independently, and a problem with one can't stall the others
  • Validated before it counts: every reader is checked against the current database before it carries real traffic
  • End state: managed and secured — lower operational risk for two core data stores

Status: in flight. Currently dual-writing; reader cut-over proceeds as the dependent teams have capacity.