
The Valuable LLM Apps Excel At Being Wrong

First in my two-part guide: build your LLM bullshit detector. Just as humans thrive by embracing our nature, successful LLM apps leverage hallucinations instead of fighting them.

This is the first in a two-part series where I separate valuable LLM applications from expensive distractions, culminating in why Meta is positioned to dominate the AI landscape.

Ever nodded along to LLM discussions while secretly having no clue what's happening under the hood? Or tried reading technical explanations that left you more confused than before? What's worse, not understanding these models directly impacts your ability to recognize which AI applications are worth a damn.

I'll demystify LLMs without making your brain hurt - imagine that!

Better yet, you'll walk away with a bullshit detector for LLM applications - saving you from expensive AI mistakes others will make.

This matters because the difference between revolutionary AI and expensive gimmicks isn't always obvious until you know what to look for.

Part two will expose why Meta's unique approach to AI could fundamentally reshape their business while others flounder. Follow both for an insider's view of the LLM landscape that you won't find in pitch decks or press releases.

You won't find a clearer analysis to help you get the big picture of the commercial LLM world.

Recap on LLM

Imagine you had a massive hard drive with all the websites that exist. You also had transcriptions of all YouTube videos, all movie scripts, newspapers, and even digital copies of books. All of that organized and labeled and available to you as a giant set of text files.

A naive computer program could count how many times a word was followed by another one across all that text. That would give you a distribution of how likely words B, C, or D are to follow word A. Think of it as a hat (call it "words after A") with all possible words in there. But you've rigged the hat so that the odds of pulling out B, C, or D fall in line with the counts you did earlier. And then you have similar hats for "words after B", "words after C", and so on.

With those hats, you can now play a fun game. Whenever you see word A, you put your hand in "words after A" and pull out a word. Sometimes that'll be B, sometimes C or D. But whatever it was, you just continue pulling from the right hat (words after B, words after C, ... ) until you reach 100 words.

You'd get some funny results, but a lot of it would be garbage. Just think about all the words that could follow "the". It wouldn't take much for your 100 words to unravel into utter nonsense.
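To make the hat game concrete, here's a minimal sketch in Python (the toy corpus is mine, purely for illustration; imagine it scaled up to that entire hard drive):

```python
import random
from collections import defaultdict, Counter

# Toy corpus standing in for "all the text on that hard drive".
corpus = "the cat sat on the mat and the cat saw the bat".split()

# Build one "hat" per word: a count of which words followed it.
hats = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    hats[word][next_word] += 1

def pull_from_hat(word):
    """Draw a next word, weighted by how often it followed `word`."""
    hat = hats[word]
    return random.choices(list(hat), weights=hat.values())[0]

# Play the game: start at "the" and keep pulling words from the right hat.
word, output = "the", ["the"]
for _ in range(9):
    if not hats[word]:  # dead end: this word never had a follower
        break
    word = pull_from_hat(word)
    output.append(word)
print(" ".join(output))
```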

Attention Is All You Need

Then, in 2017, Transformers came and changed the game. Rather than just completing a sequence one next word at a time, they introduced two huge differences.

Attention

The first key difference is called attention.

Or, more simply, the ability to look at all the input words simultaneously and understand how compatible they are with each other. It's how the model distinguishes the bat in "I swing my bat" from the one in "The bat sleeps in a cave".

There are multiple of these attention mechanisms (so-called heads), each focused on different relationships (syntax, semantics, etc.). And as context windows (i.e., how much input a model can take) grow, it's no longer just the words immediately surrounding something that matter, but also references made between words far apart.
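For the mechanically curious, the core of attention fits in a few lines. A minimal numpy sketch (real Transformers first project the input through learned Q/K/V matrices and run many heads in parallel; this toy skips all of that):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every word scores its compatibility
    with every other word, then takes a weighted mix of their values."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise compatibility
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ V

# Four words ("I", "swing", "my", "bat"), each as a random 8-dim vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = attention(x, x, x)  # self-attention: Q, K, V all come from the input
print(out.shape)          # (4, 8): each word vector is now context-aware
```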

Process input in parallel

The second big breakthrough is that the architecture allows input to be processed in parallel.

So, rather than going forward one word at a time, it actually looks at all words at the same time to work out the relationships between them. This has enormous benefits for the processing speed and the amount it can process.

A side-effect of all this, particularly if you feed it enough input text, is that these statistical relationships between words start to represent real concepts. For instance, the statistical relationship between Paris and France is highly similar to that between Madrid and Spain, and both will be closely linked to the term capital.
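You can see that shared-relationship idea in miniature. A sketch with hand-made toy vectors (real embeddings are learned from data and have hundreds of dimensions; these three-dimensional values are mine, purely for illustration):

```python
import numpy as np

# Illustrative 3-dim "embeddings". Dimensions here loosely stand for
# (city-ness, country-ness, capital-ness); real models learn these from data.
vec = {
    "paris":  np.array([0.9, 0.1, 0.8]),
    "france": np.array([0.1, 0.9, 0.8]),
    "madrid": np.array([0.9, 0.1, 0.7]),
    "spain":  np.array([0.1, 0.9, 0.7]),
}

def cosine(a, b):
    """Similarity of direction between two vectors (1.0 = identical)."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The *offset* between a capital and its country is nearly identical
# for both pairs; that shared direction is the "capital" concept.
print(cosine(vec["paris"] - vec["france"], vec["madrid"] - vec["spain"]))
```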

All this is basic training for a language model.

It's really capturing the statistical relationship between relevant words based on loads and loads of examples.

This type of training is slow and expensive.

But you end up with a comprehensive dataset that recognizes relationships between concepts, from the obvious to obscure ones that no human would easily make. And that is a powerful engine.


Reinforcement Learning (Fine-Tuning)

The next step is turning that into a helpful assistant. This is the phase where humans get involved to reinforce the learning and separate good answers from bad. The good or bad here is to "teach" the model the sort of conversations a helpful assistant would (or rather "should", in the opinion of the trainers) have.

If you boil it down to the mechanical side, all this achieves is altering the probabilities from the first training stage to make the output look more like a helpful assistant's response. This has an impact not only on the kind of text the model generates; it can also actively change how the model relates certain concepts. Which, ultimately, is where the AI alignment and training-bias problems come in.

For instance, have you noticed that ChatGPT often ends an answer by asking if it can do more? Or prompts you for a next step even when there really isn't anything else? Or that when it starts an answer, it harks back to certain key terms from your initial question? That's the trained behavior from this stage.
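Mechanically, you can picture the "rewiring" as nothing more than nudging the hats from earlier. A deliberately crude sketch (real RLHF trains a reward model and updates billions of weights; this hand-rolled update rule is purely illustrative):

```python
# The base model's odds for how to open an answer.
base_probs = {"Sure,": 0.2, "No.": 0.5, "Anything else?": 0.3}

def reinforce(probs, word, reward, lr=0.5):
    """Scale one word's probability by the human rating, then renormalize."""
    probs = dict(probs)
    probs[word] *= (1 + lr * reward)  # reward > 0 boosts, < 0 suppresses
    total = sum(probs.values())
    return {w: p / total for w, p in probs.items()}

# Raters like the helpful opening and dislike the curt refusal.
probs = reinforce(base_probs, "Sure,", reward=+1)
probs = reinforce(probs, "No.", reward=-1)
print(probs)  # "Sure," rises, "No." falls: same mechanism, new behavior
```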

Compared to basic training, this type of training is relatively cheap and fast.

That makes sense, since the model isn't really learning anything new but rather "rewiring" itself. While the first step creates what's called a base model, the result of this stage is referred to as an "instruct" model.

In the end, what you're left with is something that will generate the next N words based on some amount of input, according to a fixed statistical representation, and with a complex mechanism to try and stay on topic.

In practice, that leads to more relevant words showing up more regularly.

However, statistics can only capture so much of this and are limited by what a model has been trained on. This means that inevitably, there are pathways in that representation that will make it go off the rails.

You could call that hallucination or genius.

That decision has more to do with whether it was a desired effect or not. But whichever side you're on, it's simply a feature of a statistical system.


Low-to-Zero Value LLM Applications

A "context-aware next word guesser" is, of course, technically very cool and impressive, but is it also valuable?

Given how expensive it is to produce a model from scratch and run it, it stands to reason there won't be many of them. At least not until the price of training and running decreases significantly.

And I'm talking about coming down far below even DeepSeek's reported $5-6M.

A lot of this cost is in power consumption and cooling, or more broadly, data center and hardware efficiency. So, any gains in those areas, as well as code optimization, will eventually filter through to lower training costs. But don't hold your breath. We're a long way away from every firm having its truly individual model.

So for the time being, we're stuck with a couple of engine manufacturers, so to speak. As a petrolhead, I know an engine is only truly exciting once you add a base with wheels, a steering wheel, and some well-shaped bodywork.

So what would that look like for these engines?

Note Takers

One area everyone seems to go after is summarizing meetings. You're probably tired of seeing yet another note taker when you have a Zoom/Teams/Hangout call.

"Note taker" is of course a generous euphemism for summary producer. Which in most cases is a company that sends the literal transcription to someone else's engine with a prompt along the lines of "you're a useful assistant, please summarize this meeting and list any actions points and who owns them".

There isn't much uniqueness or moat in this business.

If anything, looking at the likes of Fireflies.ai, tl;dv.io ... it's hard to see how they are not already commodity tools. I'd chalk their adoption up mainly to the novelty factor. And I suspect they'll eventually all get replaced by a built-in version from whichever meeting tool you're using.

Not to mention that those summaries and action points aren't always as trustworthy as you'd hope.

Now take this from someone who worked at a telco delivering phone calls for sales teams: transcripts and summaries are the new call recordings. They'll just become a tickbox feature that doesn't command a premium in communication software.

Given the reliability (or lack thereof; pesky hallucinations are a feature) of the action points, people are slowly learning not to rely too much on the output.

The ultimate use case here is more likely compliance and ass-covering, which is a niche that relies strictly on accuracy. Not quite the strong suit of LLMs, either.

High-Value LLM Applications

All that said, there is a category of players that I think are genuinely in a position to deliver added value for users through gen AI. Some were around long before generative AI but will benefit from adopting it. Others are born in it, molded by it.

I'm thinking of new kids such as Cursor (powered by Claude in my setup), which allows me (someone who stopped regularly coding 7-8 years ago) to churn out good enough quality code to prototype ideas quickly.

Not that I would trust this code anywhere near production yet. But if you know what a good answer should be, even if you can't exactly articulate it, an LLM can help you get there.

That's true for many use cases in this space right now, including the deep-research style tasks.

I'm also quite hopeful for players like Gong. They have long specialized in the "understanding language" area and now have a whole new toolbox to play with.

The key difference is that these companies provide lasting value to users beyond the novelty of a dancing monkey.

The key characteristic is that they found ways of making the AI disappear in favor of solving an actual problem. And critically, they don't rely on 100% accuracy of the produced output, but rather on a level of "directionally correct" that makes them good enough.

One key to nailing an LLM application is knowing that being broadly right but precisely wrong is okay; better yet, it's a desired feature.

Most of these examples keep the AI relatively constrained to a core function of generating text. Though I'm not blind to the multi-modal evolution of eventually reliably generating images and video. Those will come.

More than anything, though, I am looking for these models to get trained to interact with other tools.

You would likely grab a calculator to work out 483 * 7289. If a model can be trained to recognize a math problem and learn to "operate" a calculator... It changes the problem from "model needs to understand math" to "model needs to recognize a math problem and call for help".

Now replace calculator with any number of systems you have around you, and you see where this goes.

It changes from the model needing to solve problems to first classifying and then coordinating a solution.
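Here's the shape of that idea in a sketch (a real agent would use the model itself as the classifier, e.g. via function-calling; the regex stand-in below just makes the classify-then-dispatch structure visible):

```python
import re

def calculator(expression: str) -> str:
    """The 'tool': plain Python does the math the model shouldn't."""
    a, op, b = re.fullmatch(r"(\d+)\s*([*+/-])\s*(\d+)", expression).groups()
    ops = {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
           "*": lambda x, y: x * y, "/": lambda x, y: x / y}
    return str(ops[op](int(a), int(b)))

def answer(user_input: str) -> str:
    """Classify first, coordinate second: spot a math problem and
    hand it to the calculator instead of 'understanding' math."""
    match = re.search(r"\d+\s*[*+/-]\s*\d+", user_input)
    if match:  # classified as a math problem -> dispatch to the tool
        return calculator(match.group())
    return "(pass to the language model as usual)"

print(answer("What is 483 * 7289?"))  # -> 3520587
```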

You could call that "agentic". Classifying is a pretty well-understood problem in computer science, even though it's far from totally solved; just think about the quality of your Netflix recommendations.

Coordinating different independent tools, on the other hand, is much more difficult. The idea that one agent could talk to another agent and magically work out a data exchange format is ... well, magic.

This finally brings me to Meta and how I could see their use of generative AI evolve.


What You Need to Know Before Meta's AI Play

You've now got the inside track on how LLMs work and what makes them valuable. AI is neither all good nor all evil. I hope this practical understanding helps you separate AI hype from products that offer genuine value.

Remember: the most powerful LLM applications aren't about perfect accuracy but about being "broadly right" while solving real problems where the technology itself disappears into the background.

What's Coming in Part Two?

Meta is sitting on a gold mine, and you'll want to see how they might tap it:

  • How could Meta transform its $160 billion advertising business (97% of revenue) using AI to create truly personalized ads?

  • What happens when the cost to generate an ad just for you drops to nearly zero?

  • Could Meta create a system that perfectly targets you and writes custom copy that speaks directly to your needs?

The numbers are jaw-dropping.

We're talking about the technical feasibility of generating billions of personalized ads daily at a cost of pennies. Potentially less than $x for 1,000 hyper-personalized ads.

Want to know what $x is? Subscribe to find out.

While Google faces existential questions about the future of search, Meta could be positioning itself to capture unprecedented advertising value by combining its vast user data with generative capabilities.

Follow part two to understand not just what's possible but what's probable. I will tell you what signals to watch for as this transformation unfolds. This isn't crystal ball gazing; it's connecting the already visible dots for anyone looking closely enough.
