Engineering May 2026 · 8 min read

Shipping 15 themes without shipping 15 bugs

How we built a token-based theming system in Flutter that lets us ship new looks without regression-testing our way to burnout.

Sahdeep Singh
Founder · Engineering

When we launched Minesweeper with three themes, the QA process was simple: play through each theme, tap every state, ship. When we hit eight themes, that same process took most of a day. By fifteen, it was clear we needed a system — not more testers.

The problem wasn't that we were lazy. It was that we were doing the same work fifteen times. Every new theme meant touching fifteen files, updating fifteen ColorScheme objects, and hoping nothing drifted. Themes were defined inline, scattered across the codebase. One mistyped digit in a hex value and you'd ship a tile that looked great in the Gold theme and completely wrong in Ice.

The insight: one source, many outputs

The fix was boring and obvious, which is exactly why it took us a while to do it. Instead of defining each theme as a separate ThemeData object, we created a single AppThemeToken type, a plain immutable data class (Dart has no structs) holding only the semantic values that actually matter:

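import 'package:flutter/material.dart';

class AppThemeToken {
  const AppThemeToken({
    required this.name,
    required this.accent, required this.accentMuted,
    required this.surface, required this.surfaceRaised,
    required this.onAccent,
    required this.label, required this.labelMuted,
  });

  final Color accent;        // primary interactive color
  final Color accentMuted;   // secondary / unfocused
  final Color surface;       // card / tile background
  final Color surfaceRaised; // elevated card state
  final Color onAccent;      // text on accent background
  final Color label;         // primary text
  final Color labelMuted;    // secondary text
  final String name;         // also used as the golden file name
}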

Everything downstream — ThemeData, widget-level styling, the status bar tint — is derived from that token. One factory, parameterized once:

ThemeData buildTheme(AppThemeToken token) {
  return ThemeData(
    colorScheme: ColorScheme.dark(
      primary: token.accent,
      secondary: token.accentMuted,
      surface: token.surface,
      onPrimary: token.onAccent,
    ),
    // ... rest derived from token
  );
}
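
The token has fields, such as surfaceRaised and labelMuted, with no obvious ColorScheme slot. One way to carry those through to widgets is Flutter's ThemeExtension. The following is a minimal sketch under that assumption, not the code from the codebase described here: `TokenColors` and `withTokenColors` are hypothetical names introduced for illustration.

```dart
import 'package:flutter/material.dart';

/// Hypothetical extension (not from the article) that carries token colors
/// with no natural ColorScheme slot down to widgets.
class TokenColors extends ThemeExtension<TokenColors> {
  const TokenColors({required this.surfaceRaised, required this.labelMuted});

  final Color surfaceRaised;
  final Color labelMuted;

  @override
  TokenColors copyWith({Color? surfaceRaised, Color? labelMuted}) {
    return TokenColors(
      surfaceRaised: surfaceRaised ?? this.surfaceRaised,
      labelMuted: labelMuted ?? this.labelMuted,
    );
  }

  // Flutter calls this when animating from one theme to another.
  @override
  TokenColors lerp(ThemeExtension<TokenColors>? other, double t) {
    if (other is! TokenColors) return this;
    return TokenColors(
      surfaceRaised: Color.lerp(surfaceRaised, other.surfaceRaised, t)!,
      labelMuted: Color.lerp(labelMuted, other.labelMuted, t)!,
    );
  }
}

/// Returns a copy of [base] with the extension attached.
ThemeData withTokenColors(ThemeData base, TokenColors colors) {
  return base.copyWith(extensions: <ThemeExtension<dynamic>>[colors]);
}
```

A widget would then read the values with `Theme.of(context).extension<TokenColors>()`. That call returns null when the extension was never attached, so call sites need a null check or a fallback.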

Testing: one assertion to rule them all

The real win wasn't less code — it was that we could now write a single golden test that exercised every theme:

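import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

// Also import the files that define AppTheme, buildTheme, and GameBoard.

void main() {
  // Registers one golden test per token, so a new entry in AppTheme.all
  // gets coverage without any change to this file.
  for (final token in AppTheme.all) {
    testWidgets('${token.name} renders correctly', (tester) async {
      await tester.pumpWidget(
        MaterialApp(
          theme: buildTheme(token),
          home: const GameBoard(),
        ),
      );
      // Compares the rendered board with the stored image for this theme.
      await expectLater(
        find.byType(GameBoard),
        matchesGoldenFile('goldens/${token.name}.png'),
      );
    });
  }
}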

Because every theme goes through the same factory, a bug in the factory fails all fifteen goldens at once, which is exactly what you want. A bad value in a single token changes only that theme's golden.

On the first run against the golden files, three themes had subtle contrast issues we'd never noticed in manual testing, such as the gold number on a slightly lighter surface that just barely failed WCAG AA. It's the kind of thing that ships unless you're specifically looking for it.
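
A golden comparison only reports that pixels changed, so a contrast requirement is easier to enforce as an explicit check on the token values. Here is a minimal sketch of the WCAG contrast-ratio formula using Flutter's `Color.computeLuminance()`. The helper names `contrastRatio` and `meetsAa` are hypothetical, and the 4.5:1 threshold is the WCAG AA minimum for normal-size text (large text only needs 3:1).

```dart
import 'package:flutter/material.dart';

/// WCAG AA minimum contrast for normal-size text.
const double wcagAaNormalText = 4.5;

/// WCAG contrast ratio between two colors, from 1.0 (identical luminance)
/// to 21.0 (black on white). Alpha is ignored, so pass opaque colors.
double contrastRatio(Color a, Color b) {
  final la = a.computeLuminance();
  final lb = b.computeLuminance();
  final lighter = la > lb ? la : lb;
  final darker = la > lb ? lb : la;
  return (lighter + 0.05) / (darker + 0.05);
}

/// True when [foreground] on [background] meets AA for normal-size text.
bool meetsAa(Color foreground, Color background) =>
    contrastRatio(foreground, background) >= wcagAaNormalText;
```

In a test this could run once per token, for example asserting `meetsAa(token.label, token.surface)` and `meetsAa(token.onAccent, token.accent)`. Which pairs are worth checking depends on where each color is actually drawn.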

CI: screenshot diffing on every PR

We added the golden tests to our GitHub Actions pipeline. Every PR now generates a screenshot diff for every theme across the four tile states that matter (unrevealed, revealed, mine, flagged). It adds about 40 seconds to CI. Worth every millisecond.
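
The theme-by-state matrix can be sketched as a nested loop that registers one golden test per pair. This is an illustration under assumptions, not the pipeline's actual test file: the `TileState` enum, `goldenPath`, and `defineTileGoldens` are hypothetical names, and the tile widget is passed in as a builder because the real one is not shown here.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

/// The four states named above. The enum itself is a hypothetical stand-in.
enum TileState { unrevealed, revealed, mine, flagged }

/// One golden file per theme/state pair, so a failing diff names the cell.
String goldenPath(String themeName, TileState state) =>
    'goldens/${themeName}_${state.name}.png';

/// Registers themes.length * 4 golden tests. [buildTile] is supplied by the
/// caller and should return the tile widget for the given state.
void defineTileGoldens(
  Map<String, ThemeData> themes,
  Widget Function(TileState state) buildTile,
) {
  for (final entry in themes.entries) {
    for (final state in TileState.values) {
      testWidgets('${entry.key}: ${state.name} tile', (tester) async {
        await tester.pumpWidget(
          MaterialApp(
            theme: entry.value,
            home: Center(
              // The boundary limits the captured image to the tile itself.
              child: RepaintBoundary(
                key: const ValueKey('tile-under-test'),
                child: buildTile(state),
              ),
            ),
          ),
        );
        await expectLater(
          find.byKey(const ValueKey('tile-under-test')),
          matchesGoldenFile(goldenPath(entry.key, state)),
        );
      });
    }
  }
}
```

A test file's `main()` would call `defineTileGoldens` with a map of theme names to ThemeData and a builder for the tile widget. Baseline images are recorded once with `flutter test --update-goldens`, and later runs compare against them.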

The workflow is: new theme? Add a token, one line. The factory handles the rest. The test loop picks it up automatically. If it breaks visually, CI catches it before review.
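
The test loop iterates over `AppTheme.all`, so that list is what makes a new theme get picked up automatically. Its declaration is not shown, so the following is a hedged sketch of what such a registry could look like. The token class is trimmed to three fields so the block compiles on its own, the hex values are placeholders rather than the shipped palettes, and `duplicateNames` is a hypothetical guard added because golden files are keyed by theme name.

```dart
import 'package:flutter/material.dart';

// Trimmed stand-in for AppThemeToken so this sketch compiles on its own.
class AppThemeToken {
  const AppThemeToken({
    required this.name,
    required this.accent,
    required this.surface,
  });

  final String name;
  final Color accent;
  final Color surface;
}

/// Hypothetical registry: the golden test only shows that `AppTheme.all`
/// exists, not how it is declared.
class AppTheme {
  AppTheme._(); // namespace only, never instantiated

  // Placeholder hex values, not the shipped palettes.
  static const gold = AppThemeToken(
    name: 'gold',
    accent: Color(0xFFFFC107),
    surface: Color(0xFF1C1A14),
  );
  static const ice = AppThemeToken(
    name: 'ice',
    accent: Color(0xFF81D4FA),
    surface: Color(0xFF10181C),
  );

  // Adding a theme means declaring its token and appending it here.
  static const List<AppThemeToken> all = [gold, ice];
}

/// Names that appear more than once. Golden files are keyed by name, so a
/// duplicate would make two themes share one baseline image.
Set<String> duplicateNames(Iterable<AppThemeToken> tokens) {
  final seen = <String>{};
  return {
    for (final token in tokens)
      if (!seen.add(token.name)) token.name,
  };
}
```

A one-line test asserting that `duplicateNames(AppTheme.all)` is empty would catch a copy-pasted token whose name was never changed.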

What we'd do differently

We wish we'd built the token system before the second theme, not the ninth. The migration was tedious — not hard, just tedious. Every widget that had been written with a hardcoded Colors.amber had to be found and updated. We used grep and swore a lot.

If you're building a theming system from scratch: define your semantic tokens first, before you have a single theme. The naming is the hard part. Once the names are stable, adding themes becomes filling in a form.

The Minesweeper codebase now has 17 themes. Adding the 18th will take about ten minutes.
