Becoming an AI-Era PM 04 | Judging "Should We Build It" Now Costs More Than "Can We Build It"
Start with the one judgment people should have gotten right — and got backwards.
Between February and June 2025, METR ran a randomized controlled trial designed like a clinical drug trial. Sixteen senior developers (five years of experience on average, thousands of commits to projects they maintained themselves) used the strongest AI tools of the day to do 246 real tasks. Going in, they expected AI to make them 24% faster. After finishing, they still believed it had made them 20% faster. The measured result: 19% slower.
Notice what got called wrong here. Not some elaborate strategy — just “did AI actually make me faster,” the simplest judgment there is, on their own code, on the projects they knew best. The people who knew the work best, going on gut, called it backwards.
This isn’t really about whether AI is fast. What it actually points at: when “shipping it” gets fast and cheap, the thing you can least trust is your gut. And a product manager makes a far more expensive gut call every single day — should this thing get built at all.
1. Stop using “is this hard to build” as a gate
The way you used to filter ideas had a natural gate doing the work for you: an engineer says “that’s three sprints,” you weigh the cost against the payoff, and most of the time you let it go. Implementation difficulty was quietly killing off heaps of “I’d like to, but it’s not worth it” ideas. You thought you were the one judging, but half the time difficulty was judging for you.
Now AI says “I can have that for you this afternoon.” The gate is gone. The result isn’t that you got more of the right things done; it’s that you shipped five features in one go, and nobody uses four of them. Marty Cagan put it bluntly in 2026: AI didn’t solve the “what should we build” problem. It just lets companies churn out things nobody wants faster, which is the same lousy roadmap, only running quicker.
So the first move is counterintuitive: cross “can we build it, how long will it take” off your list of decision criteria. The answer now is always “yes, fast,” which means it no longer carries any information for filtering ideas.
2. Before you start, ask “what happens if we don’t build it”
Once building is free, the question people drop most often is the inverted one: what happens if we don’t build this?
Say we skip this “smart recommendations” feature this sprint — what actually happens? Who genuinely gets hurt, and how badly? Would anyone leave because it’s missing?
If you answer honestly and find that “nothing much happens if we skip it,” that’s your answer: it doesn’t belong in this sprint. This question works because it sidesteps the temptation of “this is fun to build and AI can spit it out in an afternoon” and drags you straight back to value. Whether a feature can be justified by “something breaks if we don’t build it” matters far more than how fast it can be built.
3. Before you start, write down “what becomes true once this ships”
Cagan says what a product manager really owns is two things: the why (why this problem is worth solving) and the what (what we expect to become true once it’s done). The second one has to land as a falsifiable sentence before you write any code.
After we ship this onboarding flow, we expect first-week retention for new users to climb from 35% to above 45%. If it hasn’t moved in two weeks, we were wrong — kill it.
If you can’t write that sentence, you don’t actually know why you’re building the thing. If you can, you’ve got a ruler: an expectation you set in advance that can prove you wrong, instead of “does the boss like it” or “do competitors have it.” AI can’t hand you this ruler; it doesn’t know what counts as a win in your business.
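For teams that like to keep the ruler next to the code, here is a minimal Python sketch of the onboarding example above. It is an illustration under stated assumptions, not a prescribed method: the `Expectation` class, its field names, and the "inconclusive" outcome for results that land between the baseline and the target are all hypothetical. The text itself only says to kill the feature if the metric hasn't moved in two weeks.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Expectation:
    """A falsifiable bet, written down before any code ships (hypothetical helper)."""
    metric: str
    baseline: float            # value before shipping, e.g. 0.35
    target: float              # value we expect to exceed, e.g. 0.45
    shipped_on: date
    review_after_days: int = 14  # "two weeks" from the example above

    def review_date(self) -> date:
        """The first day on which the bet may be judged."""
        return self.shipped_on + timedelta(days=self.review_after_days)

    def verdict(self, observed: float, today: date) -> str:
        """Judge the bet against the numbers fixed in advance."""
        if today < self.review_date():
            return "too early"      # do not peek and decide before the deadline
        if observed <= self.baseline:
            return "kill"           # the metric has not moved: we were wrong
        if observed > self.target:
            return "keep"           # the expectation came true
        return "inconclusive"       # assumption: moved, but short of the target


if __name__ == "__main__":
    bet = Expectation(
        metric="first_week_retention",
        baseline=0.35,
        target=0.45,
        shipped_on=date(2025, 3, 3),  # illustrative date, not from the article
    )
    print(bet.verdict(observed=0.36, today=bet.review_date()))
```

The design choice that matters is `frozen=True`: once the expectation is written, the baseline, target, and deadline cannot be quietly edited after the results come in.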
4. Let AI lay out the options, keep the judgment — but don’t trust “feels right”
What AI is best at is spreading possibilities out: three or five ways to solve the same problem, the cost of each, how others have done it. Cagan’s framing is that AI surfaces the options and a human judges which one is worth it. Use it freely for this step.
But when you choose, go back to that experiment at the top: don’t trust “feels right.” Those 16 experts judged “I got faster” on gut, and as a group they got it backwards. You judging “this approach is better” on gut is just as unreliable. Hold each option against the ruler from step 3 and ask two things: which one is most likely to make the expectation you wrote down come true, and is there real evidence for it (a user said it, the data showed it)? Don’t pick the one that merely reads the smoothest.
One thing you can do today: pick a feature you’re about to build that AI “could whip up fast,” and before writing anything, write down two sentences: what happens if you don’t build it, and what becomes true once it’s done. Whichever sentence you can’t write exposes a judgment you haven’t actually made yet, and you just found that out at the lowest possible cost.
Further reading
- METR, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity” (the original report on the 19% slowdown vs. the 20% felt speedup): https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
- Marty Cagan / SVPG on the AI-era PM owning the why and the what: https://www.svpg.com/
- Piece 03 in this series, “Treat AI as a Colleague, Not a Tool”: /en/blog/ai-as-colleague/
- Piece 01 in this series, “Which PM Tasks AI Took Over, and Which Ones Got More Valuable”: /en/blog/ai-pm-what-changed/