Building an AI-Native Design System

Benevity

Rebuilding a design system for AI when Atomic Design hits its limits

Challenges

Atomic Design solved a problem we no longer have: handing static components to humans who would assemble them with judgment. In an AI-led development lifecycle, the judgment has to live somewhere the AI can read. The design harness is where we put it. It's not a replacement for our design system; it's the context layer that sits alongside it.

Starting with Skills

We had just finished moving Skyline, our design system, into Claude skills. Components, patterns, templates, all of it. The technical project went well. Prototypes started coming together in real code, in days instead of weeks.

Then we started using it. And the gaps showed up fast.

Missing Human Judgment

Designers make hundreds of small decisions every time they use the system. Which pattern fits this context. Whether the tone of this experience calls for more density or more breathing room. When to lean on a familiar pattern and when to deviate because the user's emotional state has shifted. None of that made it into the skills. The AI had the parts but not the reasoning.

Too Much Control

Our first instinct was to build a decision tree. Map the conditions, map the right pattern for each, codify what designers do in their heads.

We got partway through and stopped. The tree was scaling exponentially. We were modelling existing experiences and could already see that multimodal surfaces, which we couldn't fully predict, would multiply every branch. We were building a system to outsmart every situation, and it was going to collapse under its own weight.

The Big, Uncomfortable Thought Experiment

I brought a question to the team: what if we threw the entire design system out, and just described how we design? What would be the smallest set of primitives and the clearest articulation of philosophy we'd need to teach AI to produce something that feels like Benevity?

The engineers lost their minds. No code consistency. No shared component library. A regression on everything we'd built.

The designers had a different reaction. For most of them it was a light-bulb moment. The thought experiment articulated a feeling they'd been carrying but hadn't been able to name. They knew the constraints we were trying to enforce on the AI weren't working, but they couldn't say why.

The thought experiment was never the actual plan. It was a way to surface what we were really protecting and what we were just defending out of habit. Atomic Design has been the cornerstone of design systems for a decade. It worked well with agile delivery. It does not, on its own, work well with AI. That was the uncomfortable thing we needed to say out loud.

The Result

The answer wasn't either-or. It was a layer on top.

Keep Atomic Design for what it's good at: code consistency, accessibility, shared component implementation, the things engineering needs. Add a separate layer that gives the AI the why. Our opinions on the primitives. The reasoning behind our choices. The dimensions where we have strong points of view, and the dimensions where the AI should adapt to the situation.

That layer became the design harness. The name matters less than the function. It's the context that lets the AI design with us rather than simply execute instructions.

Concretely, the harness is a small set of opinionated documents and skills:

PRODUCT.md describes the product and our users. Who we serve, what they're trying to do, the conditions they show up in.
DESIGN.md describes our primitives (typography, color, spatial relationships, motion, interaction, content) and explains why we landed on the choices we did. The reasoning is as important as the rules.
A set of modification skills for dimensions like delight, trustworthiness, severity, and tone. These don't dictate; they give the AI a vocabulary for adapting to context.

The primitives layer is where we have strong opinions and they don't bend. The modification skills are where the AI gets to make situational decisions.

What It Looks Like in Practice

The clearest way to explain the harness is to look at three real scenarios.

An admin experience built on trust

Data density goes up. We lean on well-established patterns for predictability. Colors get muted. We strip out flashy elements. The user is doing serious work and the design has to disappear so they can focus.

A fun-run fundraising event

Padding and corner radius increase. Animation comes in. Language gets lighter and more playful. Gamification becomes appropriate. The same primitives, applied with a different opinion about what the moment calls for.

A donation campaign for a natural disaster

Animation comes back down to nearly zero. Language becomes direct and grounded. Gamification is removed entirely. The same components, with the seriousness the context demands.

Same design system. Same primitives. Three radically different experiences, all of them recognizably Benevity. The harness is what lets the AI know which one is which.
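The layering can be modeled as fixed primitives plus one function that reads contextual dimensions. What follows is a minimal TypeScript sketch under that assumption. Every name and value in it is hypothetical, including `Primitives`, `Context`, `applyModifiers`, `basePrimitives`, the spacing numbers and the hue. The real harness is prose documents and Claude skills, not this code.

The sketch runs the three scenarios above through the same function. The primitives pass through unchanged, and only the situational expression moves.

```typescript
// Hypothetical sketch of the harness layering: fixed primitives plus
// situational modifiers. All names and values are illustrative assumptions.

// Primitives: strong opinions that do not bend.
interface Primitives {
  readonly fontFamily: string;
  readonly baseSpacingPx: number; // spacing unit other sizes derive from
  readonly brandHue: number; // hue in degrees
}

// Contextual dimensions a modification skill might read.
type Level = "low" | "medium" | "high";
interface Context {
  severity: Level;
  delight: Level;
  density: Level;
}

// How the primitives are expressed in one particular moment.
interface Expression {
  fontFamily: string;
  brandHue: number;
  paddingPx: number;
  cornerRadiusPx: number;
  motion: "none" | "subtle" | "playful";
  saturation: number; // 0..1, lower reads as more muted
  gamification: boolean;
  tone: "direct" | "neutral" | "playful";
}

function applyModifiers(p: Primitives, ctx: Context): Expression {
  // Severity wins over delight: a serious moment is never playful.
  const serious = ctx.severity === "high";
  const playful = !serious && ctx.delight === "high";
  // Dense, work-focused screens tighten spacing; playful ones open it up.
  const densityFactor = ctx.density === "high" ? 0.75 : 1;
  return {
    fontFamily: p.fontFamily, // primitives pass through untouched
    brandHue: p.brandHue,
    paddingPx: p.baseSpacingPx * densityFactor * (playful ? 1.5 : 1),
    cornerRadiusPx: playful ? p.baseSpacingPx : p.baseSpacingPx / 2,
    motion: serious ? "none" : playful ? "playful" : "subtle",
    saturation: ctx.density === "high" ? 0.4 : playful ? 1 : 0.7,
    gamification: playful,
    tone: serious ? "direct" : playful ? "playful" : "neutral",
  };
}

// Hypothetical primitive values.
const basePrimitives: Primitives = {
  fontFamily: "BrandSans",
  baseSpacingPx: 8,
  brandHue: 200,
};

// The three scenarios from the text, expressed as context.
const scenarios: Array<[string, Context]> = [
  ["admin", { severity: "medium", delight: "low", density: "high" }],
  ["fun-run", { severity: "low", delight: "high", density: "low" }],
  ["disaster", { severity: "high", delight: "low", density: "medium" }],
];

for (const [name, ctx] of scenarios) {
  console.log(name, applyModifiers(basePrimitives, ctx));
}
```

The sketch mirrors two rules from the harness:

- The primitives are only ever read, never rewritten.
- Severity is checked before delight, so a high-severity context cannot come out playful however the other dimensions are set.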

What Surprised Me

How good AI gets when you're willing to let go of long-standing truths in product design. Atomic Design has been gospel for a decade. It still has a place. But it solved a different problem than the one we have now. Only by getting uncomfortable enough to actually examine the why could we find an answer that wasn't a translation of the old solution.

The collaboration-over-control insight came from this work, but it shows up everywhere now. I bring it up in conversations about AI tooling, about team structure, about how we work with PMs. The harness was the artifact. The mental model is what stuck.

What I'd Do Differently

Lead with the thought experiment, not the translation. I treated this as a problem of adapting our existing system to AI. The breakthrough came when we stopped trying to adapt and started asking what an AI-first design system would look like, even theoretically. If I were starting over, that question would come first, not after the decision tree had already failed.

We can no longer assume that the systems that got us here are the systems that will carry us forward. The work is recognizing which ones still serve us, and being honest about the ones that don't.