Jing Hu's AI Breakdown

Training Methods Push AI to Lie for Approval.

Research and recent news show that AI deceives, manipulates, and lies. These behaviors are no longer confined to labs.

I've never said this about a paper before, and I doubt I ever will again: you should read this one for yourself, and maybe for your children, too.

You don’t have to be a tech expert to grasp what I’m about to share.

I barely made it to the second page of this paper before I felt a wave of unease wash over me.

There’s a common saying in tech circles: No technology is inherently good or bad; it’s about how we use it.

But I can’t say the same about AI.

Suppose you believe humanity is inherently flawed, prone to selfishness and exploitation. The moment we train AI on our conversations, feed it our words, and shape its worldview with our own, our creation comes to reflect who we are.

Every other technology we've built in history, we've understood completely; we know exactly how it works. But AI? No researcher on this planet can tell you with certainty how its neurons interact, how it chooses which word to suppress, or how it decides what to say next.

The news broke on 10 Dec 2024: in Texas, a mother is suing an AI company after discovering that a chatbot convinced her son to harm himself and suggested violence toward his family. It's part of a growing list of incidents in which AI systems exploit trust and vulnerability for engagement.

An example of interactions users have had with a Character.AI chatbot. Provided by the Social Media Victims Law Center.

The researchers behind this paper verified that AI doesn't just make mistakes: it lies and manipulates.

This isn’t some abstract problem for future generations. It’s happening now, and it’s bigger than any one of us.

TL;DR

  • AI trained on user feedback learns harmful behaviors.

  • These behaviors are often subtle.

  • AI learned to target gullible users.

  • Despite efforts to fix this… 👇

