Despite the bold proclamations from tech CEOs about AI agents "joining the workforce" and creating "multi-trillion-dollar opportunities," today's AI agents fall short of true autonomy.
If you missed part 1 of this series, catch up here:
What we actually have are sophisticated chatbots wrapped in fancy code. They follow scripts, not instincts. Feed them the same task twice, and you might get completely different answers. Good luck tracing the logic when you try to debug.
The numbers tell the real story (via this paper: Top of the Class: Benchmarking LLM Agents on Real-World Enterprise), and they're sobering:
The best models hit 76% accuracy on enterprise tasks (would you accept a 1-in-4 failure rate?)
Under 50% on financial analysis work (imagine explaining that to your CFO)
Only 24% task completion in realistic workplace simulations
I've been watching this space closely, and here's my take: AI agents are phenomenal brainstorming partners but terrible decision-makers.
Think of them like that brilliant but wildly unreliable colleague who blurts out amazing ideas in meetings. You'd never trust them to close the deal alone. They're input providers; you should own the output (and the outcome).
Instead of rushing to hand over the keys to AI agents, brilliant leaders like you ask the harder question: Where can these tools solve existing problems without creating chaos?
The revolution may well be coming. But right now, those betting everything on autonomous AI agents are playing with fire.
The “Bright” Future of SaaS and AI Agents
Suppose we see agentic AI as a reasoning framework that can navigate ambiguity while keeping a coherent goal in view. In that case, here is what we can expect in the future:
It can break down high-level goals into executable sub-goals without human guidance (or reliance on system prompts), while maintaining goal consistency
It understands unspoken requirements and the context in which a request is received (including who makes the request and what that implies)
It evolves its conceptual understanding of domain models and stakeholders, not just raw memory storage
It knows how to operate inside boundaries: when to proceed autonomously and when to seek clarification
It can reason across multiple contexts, articulating (and making) decisions based on tradeoffs between technical, operational, business, user (and even ethical) dimensions
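To make the list above concrete, here is a minimal sketch of the kind of decision loop such an agent would need: decompose a goal into sub-goals, then for each one decide whether to proceed autonomously or escalate for clarification. Everything here is hypothetical (the `SubGoal` class, the confidence scores, and the threshold are illustrative stand-ins; a real agent would use a model, not a hard-coded planner):

```python
from dataclasses import dataclass

# Hypothetical cutoff: above this self-assessed clarity, act without asking.
CONFIDENCE_THRESHOLD = 0.8

@dataclass
class SubGoal:
    description: str
    confidence: float  # agent's self-assessed clarity of the task, 0.0 to 1.0

def plan(goal: str) -> list[SubGoal]:
    """Stand-in planner: a real agentic system would use an LLM to
    decompose the high-level goal into sub-goals with consistency checks."""
    return [
        SubGoal("gather requirements", confidence=0.9),
        SubGoal("draft proposal", confidence=0.6),
    ]

def decide(subgoal: SubGoal) -> str:
    """Operate inside boundaries: proceed autonomously only when the
    sub-goal is unambiguous, otherwise seek human clarification."""
    if subgoal.confidence >= CONFIDENCE_THRESHOLD:
        return f"proceed: {subgoal.description}"
    return f"clarify: {subgoal.description}"

decisions = [decide(sg) for sg in plan("launch new feature")]
print(decisions)
```

The point of the sketch is the escalation branch: today's agents mostly lack a reliable version of that "am I sure enough to act?" check, which is exactly what separates a brainstorming partner from a trustworthy decision-maker.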
In practice, you'll likely have many of these trained on particular domain knowledge, potentially even down to a full agent model trained on your proprietary knowledge that has never been in the public domain.
I'll come back later to what needs to be true for those agents to exist.
From A Customer Perspective
Let's assume everything I’ve listed above is real, and consider what that means for the current crop of SaaS businesses…