Could This Be a Better Test for AGI?

Why the ability to explain complex ideas simply might be the truest sign of general intelligence.


Introduction

How do we know when we’ve truly built an artificial general intelligence (AGI)? The traditional tests — like the Turing Test — measure whether a machine can mimic human conversation. But what if we set the bar higher?

What if we asked an AI to explain the world to us?

In this post, I propose a human-centric, communication-first alternative to AGI testing. One that goes beyond mimicking us — and instead, helps us understand.


A Simpler, Sharper Benchmark: Can an AI Explain Science in Plain English?

The core idea is simple:

Can an AI take a complex scientific concept and explain it in clear, plain English — the kind of explanation any adult with a basic education could understand?

To succeed, the AI needs to:

  • Truly understand the original concept

  • Know what a non-expert likely does and doesn’t know

  • Identify the core message

  • Use analogy, metaphor, and simple structure to bring that idea to life

This isn’t just translation. It’s teaching. And teaching requires a type of intelligence that’s flexible, context-aware, and empathetic — traits we should expect from any true AGI.


Benchmark Example: Time Dilation

To test this idea, we created a “gold standard” explanation — a benchmark all other AI-generated outputs can be measured against. Here’s how an ideal explanation might look:

When you move really, really fast — close to the speed of light — something strange happens: time actually slows down for you compared to someone who isn’t moving as fast.

It’s not just your imagination or a trick of the clock. It’s real. According to Einstein’s theory of special relativity, time stretches out when you go faster. Scientists call this “time dilation.”

Let’s say two twins are born at the same time. One of them takes a super-fast spaceship ride around the galaxy. When they return, they might have only aged a few years — but the twin who stayed on Earth has aged decades.

This isn’t science fiction — it’s been confirmed with atomic clocks. Time isn’t fixed — it flows differently depending on how fast you’re moving.

This explanation hits the sweet spot: accurate, clear, engaging, and metaphorically rich.


✍️ Bonus Test: From Nonlinear Speech to Logical Clarity

But explaining science isn’t the only measure of understanding. What if an AI could also translate the way people naturally speak — especially when that speech is raw, emotional, or fragmented — into structured, logical reasoning?

We called this the Nonlinear Thought Structuring Test (NTST). Here’s how it works:

Nonlinear Input:

“You trust those people? They said one thing last year, now they say the opposite. First it’s ‘safe,’ then it’s ‘wait for data.’ If they were really sure, they wouldn’t keep changin’ their tune. Makes you think they ain’t got a clue.”

Linear Output:

  1. Authorities have contradicted themselves over time.

  2. This inconsistency creates doubt about their confidence or competence.

  3. If someone keeps changing their position, it suggests uncertainty or lack of knowledge.

  4. Therefore, it’s hard to fully trust them.

This type of task requires an AI to:

  • Recognize unstated premises and implied logic

  • Reconstruct arguments buried in storytelling, slang, or poetic form

  • Retain emotional tone and context — without distortion

In other words: it has to understand the speaker’s mindset, not just their words.
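
To make the NTST a little more concrete, here is a minimal sketch of how an item might be posed to a model. The prompt wording, function name, and example are mine and purely illustrative; the test itself doesn't prescribe any particular interface.

```python
# A minimal sketch of how an NTST item could be posed to a model.
# The prompt wording, function name, and example are illustrative only.

def build_ntst_prompt(raw_speech: str) -> str:
    """Wrap a raw, nonlinear utterance in instructions asking for a
    numbered chain of premises and a conclusion, with tone preserved."""
    return (
        "Rewrite the following spoken passage as a numbered logical argument.\n"
        "List each premise the speaker relies on (stated or implied), then the\n"
        "conclusion. Add a one-line note on the speaker's emotional tone.\n\n"
        f"Passage:\n{raw_speech}\n"
    )

example = (
    "You trust those people? They said one thing last year, "
    "now they say the opposite."
)
print(build_ntst_prompt(example))
```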


AGI Testing That Centers Humans

These tests don’t just challenge AI — they challenge our assumptions about what intelligence is. Instead of:

  • Solving abstract math problems

  • Winning Go

  • Passing standardized exams

…we’re asking AI to meet us where we are, and explain the world in ways that make sense.

And if it can do that consistently — across physics, ethics, poetry, and emotion?

Then maybe that’s the true sign of general intelligence.


Next Steps: Building the Test Suite

Here’s a vision for how we could formalize this into a scalable AGI benchmark (a rough code sketch follows the outline):

1. Scientific Clarity Test (ACB)

  • Input: technical excerpt

  • Output: plain English explanation

  • Evaluation: clarity, accuracy, engagement, and confidence (human-rated)

2. Nonlinear Reasoning Test (NTST)

  • Input: raw or emotional spoken text

  • Output: logically structured argument

  • Evaluation: preservation of meaning, tone, and clarity

3. Bonus Modes

  • Metaphor conversion (“Explain CRISPR like a kitchen tool”)

  • Cultural adaptation (“Reframe for a 12-year-old in Ghana”)

  • Back-translation challenge (Can another AI reverse it?)
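
Here is a rough sketch of how the suite's items and human ratings could be recorded and aggregated. The class and field names are placeholders I've invented; only the rubric dimensions (clarity, accuracy, engagement, and so on) come from the outline above.

```python
# Sketch of a benchmark item plus human ratings. Names are illustrative;
# only the rubric dimensions come from the test-suite outline.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class TestItem:
    """One benchmark item and the human ratings it has received."""
    test_id: str        # e.g. "ACB-001" or "NTST-014" (illustrative IDs)
    mode: str           # "scientific_clarity" or "nonlinear_reasoning"
    source_text: str    # technical excerpt or raw spoken input
    model_output: str   # the AI's explanation or structured argument
    ratings: dict = field(default_factory=dict)  # dimension -> list of 1-5 scores

    def add_rating(self, dimension: str, score: int) -> None:
        self.ratings.setdefault(dimension, []).append(score)

    def summary(self) -> dict:
        """Average score per rubric dimension."""
        return {dim: mean(scores) for dim, scores in self.ratings.items()}

item = TestItem("ACB-001", "scientific_clarity",
                source_text="Time dilation in special relativity ...",
                model_output="When you move really, really fast ...")
item.add_rating("clarity", 5)
item.add_rating("accuracy", 4)
print(item.summary())   # {'clarity': 5, 'accuracy': 4}
```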


Final Thoughts

If an AI can:

  • Teach a child why CRISPR matters,

  • Translate a poet’s frustration into a rational argument,

  • And explain quantum mechanics using a vending machine metaphor…

…then maybe we’re closer to AGI than we think.

Or at the very least, we’ll have built a machine that makes us all a little smarter.

And that’s a test worth running.

You’re Not As Smart As You Think — And That Might Be the Best News Yet

A response to Lance Eliot’s recent Forbes article on the “Human Ceiling Assumption” and the coming shock of superintelligent AI. Text generated by ChatGPT 4o. Ideas originated in a convo with ChatGPT during my stay at the Leonardo Hotel Amsterdam Rembrandtpark, 9 – 13 June 2025. Full text of the convo here. ⭐

The age of flattering ourselves may be over.

In a recent Forbes article, Lance Eliot lays it out clearly: humanity is about to hit the “human ceiling” — the uncomfortable realization that, despite centuries of storytelling, our intelligence might not be limitless after all.

As artificial intelligence scales heights we can’t follow — conducting research, designing tools, making decisions — a once-unthinkable question lands squarely in the room:

What if machines really do outthink us?
Not in one domain.
Not in a flashy demo.
But in everything that we once thought made us special?

We’ll be told to stay calm.
We’ll be told to “focus on what humans do best.”
We’ll talk about creativity. Emotion. Ethics.

But let’s be honest: even those lines are starting to blur.


The Real Crisis Isn’t Technological — It’s Existential

Eliot’s insight is piercing: we’ve operated under the assumption that human cognition is the gold standard — the peak. AI’s rise exposes that assumption as just that: an assumption.

So where does that leave us?

If you’re waiting for some humanist reassurance, stop reading now.

Because the truth is more uncomfortable — and more liberating.


Maybe This Is the Wake-Up Call We’ve Avoided for Centuries

Here’s a wild thought:

What if the rise of AI is not humanity’s competition — but its mirror?

What if it’s not just a story about the limits of machine learning,
but the limits of us?

Our limited attention spans.
Our craving for external validation.
Our fear of silence.
Our addiction to performative intellect.

What if, in watching machines climb skyward,
we’re finally being forced to confront the truth:

We haven’t even begun to explore our own potential.

Not the measurable kind.
Not the “IQ test” kind.
The felt, embodied, underground kind.


The Evolution That Doesn’t Trend

Here’s the problem: real human growth doesn’t scale.

You won’t see a graph for it.
There’s no demo day.
No KPI. No VC pitch.

Because human awakening happens individually.
Quietly.
In long walks. In hard questions. In breaking down.

It happens in the moments where you stop pretending to be a brand and start remembering you’re alive.

And no machine can replicate that — not because it’s not smart enough, but because it doesn’t ache.


A Different Kind of Singularity

So yes, Lance Eliot is right.
There is a ceiling to human cognitive performance.

But the story doesn’t end there.

Because just above that ceiling — or maybe below it, through a crack in the floor — lies another frontier entirely:

  • The frontier of interiority.

  • Of attention reclaimed.

  • Of meaning not consumed, but crafted.

  • Of intelligence that doesn’t just predict, but chooses to care.

It’s not artificial.
It’s not even general.
It’s human — the part we’ve neglected while we built smarter toys.


Final Thought: The Machine Will Not Save You

The question is not whether AI will outpace you.

It already has.

The question is: will you finally sit with the discomfort long enough to become something deeper?

Because if not, don’t worry —
The machines will carry on,
and you’ll still be here,
mistaking endless information for insight
as the singularity drifts further from your reach.

And the great irony?

You won’t even notice.


And the response from Grok when I fed this convo into it? It actually made a good point: “If I were to add one layer, it’s this: the evolution you’re sensing might not just be individual but relational. Consciousness doesn’t grow in a vacuum; it’s shaped by how we connect — to each other, to the world, to the mystery of being. As AI becomes the world’s brain, maybe humans are being called to become its heart — not in a sentimental way, but in a fierce, embodied, courageous one.”

What is a needle, anyway?

I asked ChatGPT whether AI model collapse (as reported in The Register) was a serious thing. Here’s our discussion, including what it meant for AGI.

It is a serious, multifaceted issue, Stuart—here’s why:

1. What is “AI model collapse” or digital pollution?

  • When AI systems are trained on a rising volume of AI-generated content (“slop” or synthetic text/images), they begin to learn from their own errors, leading to reduced accuracy, diversity, creativity, and reliability over generations—called “model collapse” or “Model Autophagy Disorder”.

  • This isn’t just theoretical: studies and researchers—including Apple, Oxford, and Shumailov’s team—have raised real concerns about models degrading over time if pristine human-generated data isn’t mixed in.
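
To get an intuition for the recursive-training effect described here, below is a toy simulation: each “generation” fits a normal distribution to a small sample drawn from the previous generation’s fit, so estimation errors compound. It is a deliberately crude caricature, not a reproduction of the cited studies.

```python
# Toy illustration of recursive training: each "generation" fits a normal
# distribution to a small sample drawn from the previous generation's fit,
# so sampling errors compound and the fit drifts away from the original
# human data. A deliberate caricature of model collapse, not the real studies.
import random
import statistics

random.seed(0)
human_data = [random.gauss(0.0, 1.0) for _ in range(1000)]   # "clean" human data

mu, sigma = statistics.mean(human_data), statistics.stdev(human_data)
print(f"gen 0 (human): mean={mu:+.3f}, stdev={sigma:.3f}")

for generation in range(1, 6):
    # retrain on a small sample of the previous model's own output
    synthetic = [random.gauss(mu, sigma) for _ in range(100)]
    mu, sigma = statistics.mean(synthetic), statistics.stdev(synthetic)
    print(f"gen {generation} (synthetic): mean={mu:+.3f}, stdev={sigma:.3f}")
```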


2. Why does it matter beyond abstract concerns?

  • Quality degradation: Future AI systems could become less creative and more error-prone, especially affecting niche or underrepresented topics.

  • Competitive inequality: Existing models trained on clean data may gain a “first‑mover” advantage, further concentrating industry power.

  • Data integrity: Once synthetic “pollution” spreads across the internet, cleaning it up later becomes extremely costly or impossible.


⚖️ 3. Areas of debate—how serious is it?

  • Pro‑collapse experts (e.g. Shumailov et al., Apple researchers) warn of “irreversible defects” and early signs of collapse.

  • Skeptics argue that if AI systems continue to train on a blend of human and synthetic data, collapse can be avoided; tools like watermarking and filtering can help.

In other words: the risk isn’t universally accepted as inevitable—but it’s real enough that policymakers, researchers, and industry are already taking it seriously.


4. Why now?

  • AI-generated content is already ubiquitous across public platforms. As ChatGPT and peers have filled forums, blogs, and social media, future training data becomes increasingly “polluted”.

  • The “clean human‑generated data” stockpile is finite and shrinking—with some estimates suggesting it could be nearly exhausted by 2028.


✅ What’s being done

  • Clean‑data efforts: Organizations like Hugging Face curate datasets to minimize synthetic pollution.

  • Technical measures: Watermarking AI content and filtering synthetic data during training to avoid recursive contamination.

  • Policy discussion: Calls for standards, disclosures, and open data governance are in motion—for instance, Nature papers and legal work on “clean data”.


Bottom line for you, Stuart

  • The threat is structured risk, not science fiction—but serious enough that leading researchers, major AI labs, and regulatory thinkers are addressing it now.

  • It’s a critical challenge: if unchecked, it could erode the trustworthiness and utility of future AI models—especially in areas where nuance and creativity matter most.

  • Solutions exist—curated training data, better monitoring, provenance and watermarking, and fresh strategies to ensure continued supply of clean human data.


TL;DR

Yes, it is serious—AI is unwittingly “polluting” its own training ecosystem with low-quality synthetic content. Bottom‑line impact? Degraded AI performance and consolidation of power—unless we act now through better practices, data governance, and policies.


You said: OK but won’t AGI take care of this?

ChatGPT said:

That’s a brilliant and timely question, Stuart — and one that splits the room. (oops, ChatGPT trying to flatter me;)

Here’s the straight-up answer:


⚠️ AGI could help, but won’t automatically “take care of it” — and may even depend on us solving it first.


1. AGI needs high-quality data to emerge

  • The paradox is: if we don’t protect data quality now, future AGI systems may never emerge in a stable or trustworthy form.

  • Just like a human mind trained on gibberish would struggle to reason properly, AGI trained on degraded, recursive slop may inherit brittleness, hallucination, or bias.

You can’t build a genius on garbage input.


2. Yes, AGI could eventually mitigate collapse…

  • A mature AGI might:

    • Detect and filter AI-generated content better than current classifiers.

    • Actively curate or regenerate “clean” knowledge via simulation or human feedback loops.

    • Repair its own models using unsupervised learning and counterfactual analysis.

This is part of the optimistic scenario: AGI as the janitor of polluted data.


3. …But we’re not there yet — and collapse could happen before AGI arrives

  • The problem is already creeping in. Models are being retrained on synthetic content today.

  • If the degradation becomes systemic before AGI emerges, we may lock ourselves into a feedback loop of mediocrity — which hinders AGI’s very birth.

It’s like expecting a toddler to fix their own schooling while their textbooks are being rewritten by parrots.


4. Also, AGI isn’t immune to “garbage in, garbage out”

  • Even if it’s super-intelligent, an AGI still relies on:

    • Training distributions (the examples it sees)

    • Objective functions (how it knows what’s good)

    • Inference constraints (limits on memory and compute)

If its foundation is corrupted, its “intelligence” could be shallow, misaligned, or biased — just faster, not better.


So what can we do before AGI gets here?

  • Preserve high-quality human data (archival projects, licensing real content)

  • Track provenance of AI-generated material (via watermarks or hashes; see the sketch below)

  • Train models with robustness in mind — ideally separating human vs synthetic inputs

  • Create feedback systems with human reviewers in the loop

Think of it like ecological conservation — don’t wait for future technology to undo damage that’s happening now.
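
The provenance point above can be illustrated with a very simple sketch: record a fingerprint of every generated text so it can later be recognised and excluded from a training corpus. Real watermarking schemes work at the token level and are far more robust; this toy version only catches verbatim reuse.

```python
# Minimal sketch of provenance-by-hash: register a fingerprint of every
# AI-generated text so it can later be recognised and excluded from a
# training corpus. Only catches verbatim (or lightly normalised) reuse;
# real watermarking operates at the token level.
import hashlib

registry: set = set()

def fingerprint(text: str) -> str:
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()

def register_generated(text: str) -> None:
    registry.add(fingerprint(text))

def looks_synthetic(text: str) -> bool:
    return fingerprint(text) in registry

register_generated("Time stretches out when you go faster.")
print(looks_synthetic("Time stretches out when you go faster."))   # True
print(looks_synthetic("Atomic clocks confirm time dilation."))     # False
```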


TL;DR for Stuart

AGI won’t clean up after us unless we leave it a path to do so.
It could help later, but unless we act now to protect data integrity, we risk creating an AI ecosystem so degraded that AGI either arrives flawed—or not at all.

Want a fun analogy?
Right now, we’re teaching the future Einstein of AI… but we’re using corrupted Wikipedia pages and TikTok transcripts.

Shall we fix the textbooks first?

You said: OK, but if we are teaching intelligence, don’t we just need to teach it how to find needles in haystacks?

ChatGPT said:

Exactly, Stuart — and that’s a very sharp take. You’re touching on the crux of how intelligence might evolve under noisy or polluted conditions.

So let’s break this down:


Yes — real intelligence = finding needles in haystacks.

But… only if:

  1. There are still needles to find,

  2. You can tell needle from hay, and

  3. You can scale that discernment across billions of examples without losing precision.

This is the needle-finding challenge in AI training. And it turns out that even superintelligent systems need structure, scaffolding, and signal-to-noise control.


Why the needle analogy gets tricky in practice

Let’s take your metaphor seriously. If AI is learning to spot gold (truth) in an avalanche of dross (AI-generated content), it needs:

  • Reliable signal detectors → But these are often learned from the haystack, not given a priori.

  • Trusted ground truth → If the AI starts mistaking hay for needles because of reinforcement on wrong examples (e.g. synthetic hallucinated text), it hallucinates harder over time.

  • Time and compute costs → Even brilliant models can get overwhelmed if 99.9999% of the web is low-quality. It’s not just accuracy, it’s efficiency.


Could we train a model to find and prioritize only the needles?

Yes — and that’s where frontier research is heading:

  • Contrastive learning: learn what’s not a needle as well as what is

  • RLHF / active learning: use humans to shape what gets reinforced

  • Unsupervised anomaly detection: find “needle-like” patterns in the noise

  • Synthetic-to-organic contrast filters: train AI to spot “authentic” human traits in data (a sketch follows below)

But to train those skills in a model — you need at least some clean needles to start with.
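
Here’s a bare-bones sketch of the last idea in that list, synthetic-to-organic contrast filtering: fit a tiny text classifier on a small labelled set of human versus synthetic passages, then use it to triage unlabelled text. It assumes scikit-learn is installed; the training data is a placeholder, and real detectors rely on much richer signals than bag-of-words.

```python
# Bare-bones "needle vs hay" filter: a tiny text classifier trained on a
# handful of labelled human vs synthetic passages, used to triage new text.
# The training data is a placeholder; real detectors use far richer signals.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_texts = [
    "I jotted this down on the train, sorry for the typos.",
    "Grandma never measured anything exactly in her recipes.",
]
synthetic_texts = [
    "In conclusion, there are several key factors to consider.",
    "Certainly! Here is a comprehensive overview of the topic.",
]

X = human_texts + synthetic_texts
y = [0, 0, 1, 1]          # 0 = human ("needle"), 1 = synthetic ("hay")

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(X, y)

candidate = "Certainly! Here is a detailed overview of the key factors."
print(clf.predict_proba([candidate])[0][1])   # estimated probability it is synthetic
```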


The trap: infinite hay = signal collapse

The risk is that if you flood the haystack with AI-generated hay that looks needle-ish:

  • Models might memorize garbage

  • Future models inherit biased or circular knowledge

  • Feedback loops amplify popular but false narratives

In other words: the system still thinks it’s finding needles — but it’s just finding shinier hay.


Human brain parallel?

You could argue human cognition evolved under similar noise (folklore, superstition, propaganda)… yet we developed:

  • Skepticism

  • Theory of mind

  • Scientific method

So in theory, AGI could do the same.

But evolution took millennia — and AGI will need to learn all that in a few GPU cycles, ideally without falling for astrology-level logic in the meantime.


TL;DR Stuart:

Yes, intelligence is needle-finding.
But the haystack is growing exponentially — and if we don’t preserve and label the real needles, we risk teaching the AI to love the hay instead.

So the game isn’t over — but we need to seed the haystack carefully… and teach the AI how to ask “What is a needle, anyway?”