2nd Order Thinkers
AI Agents Problems No One Talks About.

Paid episode

The full episode is only available to paid subscribers of 2nd Order Thinkers


Memory like goldfish, goals like broken GPS, and feedback loops that would make your worst coworker look brilliant.

Despite the bold proclamations from tech CEOs about AI agents "joining the workforce" and creating "multi-trillion-dollar opportunities," today's AI agents fall short of true autonomy.

If you missed part 1 of this series, catch up here:

What we actually have are sophisticated chatbots wrapped in fancy code. They follow scripts, not instincts. Feed them the same task twice and you might get completely different answers. And good luck tracing the logic when you try to debug.

The numbers tell the real story (Via this paper: Top of the Class: Benchmarking LLM Agents on Real-World Enterprise), and they're sobering:

  • Best models hit 76% accuracy on enterprise tasks (would you accept a 1-in-4 failure rate?)

  • Under 50% on financial analysis work (imagine explaining that to your CFO)

  • Only 24% task completion in realistic workplace simulations

I've been watching this space closely, and here's my take: AI agents are phenomenal brainstorming partners but terrible decision-makers.

Think of them like that brilliant but wildly unreliable colleague who generates amazing ideas at random in meetings. You'd never trust them to close the deal alone. They're input providers; you should own the output (and the outcome).

Instead of rushing to hand over the keys to AI agents, brilliant leaders like you ask the harder question: Where can these tools solve existing problems without creating chaos?

The revolution may well be coming. But right now, those betting everything on autonomous AI agents are playing with fire.


The “Bright” Future of SaaS and AI Agents

Suppose we see agentic AI as a reasoning framework that can navigate ambiguity while maintaining a coherent goal.

In that case, here is what we can expect in the future:

  • It can break down high-level goals into executable sub-goals without human guidance (or reliance on system prompts), while maintaining goal consistency

  • It understands unspoken requirements and context, or is aware of the context in which a request is received (or even who makes the request and what that implies)

  • It retains not just raw memory, but the ability to evolve its conceptual understanding of domain models and stakeholders

  • It knows how to operate inside boundaries, when to proceed autonomously, and when to seek clarification

  • Multi-context reasoning: it can articulate (and make) decisions based on tradeoffs across technical, operational, business, user, and even ethical dimensions
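To make the "proceed autonomously vs. seek clarification" idea concrete, here is a minimal, entirely hypothetical sketch. None of these classes or names come from a real agent framework; in a real system an LLM would produce the goal breakdown and the confidence estimates, which are hard-coded here just to keep the sketch runnable.

```python
# Hypothetical sketch of an agent that decomposes a goal into
# sub-goals, then decides per sub-goal whether to act on its own
# or escalate to a human. All names and values are illustrative.
from dataclasses import dataclass


@dataclass
class SubGoal:
    description: str
    confidence: float  # agent's self-estimated confidence, 0..1


@dataclass
class Agent:
    # Below this confidence, the agent asks instead of acting.
    clarification_threshold: float = 0.7

    def decompose(self, goal: str) -> list[SubGoal]:
        # A real agent would derive these from the goal via an LLM;
        # hard-coded here so the sketch runs without any model.
        return [
            SubGoal("collect quarterly revenue figures", 0.9),
            SubGoal("reconcile figures against the ledger", 0.55),
        ]

    def run(self, goal: str) -> list[str]:
        actions = []
        for sub in self.decompose(goal):
            if sub.confidence >= self.clarification_threshold:
                actions.append(f"EXECUTE: {sub.description}")
            else:
                actions.append(f"ASK: unsure how to '{sub.description}'")
        return actions


agent = Agent()
for action in agent.run("close the Q3 books"):
    print(action)
```

The design choice worth noting is the explicit threshold: making "when do I stop and ask?" a tunable parameter, rather than an emergent behavior, is one plausible way to keep an agent operating inside boundaries.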

In practice, you'll likely have many of these trained on particular domain knowledge, potentially even down to a full agent model trained on your proprietary knowledge that has never been in the public domain.

I'll come back later to what needs to be true for those agents to exist.

Weekly contract: I dive deep, you click once. I'm starting to think I'm bad at negotiating.


From a Customer Perspective

Let's assume everything I’ve listed above becomes real, and consider what that means for the current crop of SaaS businesses…
