Could This Be a Better Test for AGI?

Why the ability to explain complex ideas simply might be the truest sign of general intelligence.


Introduction

How do we know when we’ve truly built an artificial general intelligence (AGI)? The traditional tests — like the Turing Test — measure whether a machine can mimic human conversation. But what if we set the bar higher?

What if we asked an AI to explain the world to us?

In this post, I propose a human-centric, communication-first alternative to AGI testing: one that goes beyond mimicking us and instead helps us understand.


A Simpler, Sharper Benchmark: Can an AI Explain Science in Plain English?

The core idea is simple:

Can an AI take a complex scientific concept and explain it in clear, plain English — the kind of explanation any adult with a basic education could understand?

To succeed, the AI needs to:

  • Truly understand the original concept

  • Know what a non-expert likely does and doesn’t know

  • Identify the core message

  • Use analogy, metaphor, and simple structure to bring that idea to life

This isn’t just translation. It’s teaching. And teaching requires a type of intelligence that’s flexible, context-aware, and empathetic — traits we should expect from any true AGI.


Benchmark Example: Time Dilation

To test this idea, we created a “gold standard” explanation — a benchmark all other AI-generated outputs can be measured against. Here’s how an ideal explanation might look:

When you move really, really fast (close to the speed of light), something strange happens: time passes more slowly for you, as measured by someone who isn’t moving as fast.

It’s not just your imagination or a trick of the clock. It’s real. According to Einstein’s theory of special relativity, time stretches out when you go faster. Scientists call this “time dilation.”

Imagine a pair of twins. One of them takes a super-fast spaceship trip to a distant star and back. When the traveling twin returns, they might have aged only a few years, while the twin who stayed on Earth has aged decades.

This isn’t science fiction — it’s been confirmed with atomic clocks. Time isn’t fixed — it flows differently depending on how fast you’re moving.

This explanation hits the sweet spot: accurate, clear, engaging, and built around a single memorable example.
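Because accuracy is one of the grading criteria, it helps to check the twin example against the physics behind it. The worked equation below uses the standard special-relativity time-dilation relation. The cruising speed of 0.995c and the 3-year trip are illustrative numbers, not part of the original example, and the calculation ignores the acceleration and turnaround phases:

```latex
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}, \qquad
\Delta t_{\text{Earth}} = \gamma \, \Delta\tau_{\text{traveler}}

v = 0.995c \;\Rightarrow\; \gamma = \frac{1}{\sqrt{1 - 0.995^2}} = \frac{1}{\sqrt{0.009975}} \approx 10.0

\Delta\tau_{\text{traveler}} = 3\ \text{years} \;\Rightarrow\; \Delta t_{\text{Earth}} \approx 10.0 \times 3\ \text{years} \approx 30\ \text{years}
```

At speeds this close to light, "a few years" for the traveler against "decades" on Earth is consistent with the formula.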


✍️ Bonus Test: From Nonlinear Speech to Logical Clarity

But explaining science isn’t the only measure of understanding. What if an AI could also translate the way people naturally speak — especially when that speech is raw, emotional, or fragmented — into structured, logical reasoning?

We call this the Nonlinear Thought Structuring Test (NTST). Here’s how it works:

Nonlinear Input:

“You trust those people? They said one thing last year, now they say the opposite. First it’s ‘safe,’ then it’s ‘wait for data.’ If they were really sure, they wouldn’t keep changin’ their tune. Makes you think they ain’t got a clue.”

Linear Output:

  1. Authorities have contradicted themselves over time.

  2. If someone keeps changing their position, it suggests uncertainty or lack of knowledge.

  3. So this inconsistency raises doubt about their confidence or competence.

  4. Therefore, it’s hard to fully trust them.

This type of task requires an AI to:

  • Recognize unstated premises and implied logic

  • Reconstruct arguments buried in storytelling, slang, or poetic form

  • Retain emotional tone and context — without distortion

In other words: it has to understand the speaker’s mindset, not just their words.


AGI Testing That Centers Humans

These tests don’t just challenge AI — they challenge our assumptions about what intelligence is. Instead of:

  • Solving abstract math problems

  • Winning at Go

  • Passing standardized exams

…we’re asking AI to meet us where we are, and explain the world in ways that make sense.

And if it can do that consistently — across physics, ethics, poetry, and emotion?

Then maybe that’s the true sign of general intelligence.


Next Steps: Building the Test Suite

Here’s a vision for how we could formalize this into a scalable AGI benchmark:

1. Scientific Clarity Test (ACB)

  • Input: technical excerpt

  • Output: plain English explanation

  • Evaluation: clarity, accuracy, engagement, and confidence (human-rated)

2. Nonlinear Reasoning Test (NTST)

  • Input: raw or emotional spoken text

  • Output: logically structured argument

  • Evaluation: preservation of meaning, tone, and clarity

3. Bonus Modes

  • Metaphor conversion (“Explain CRISPR like a kitchen tool”)

  • Cultural adaptation (“Reframe for a 12-year-old in Ghana”)

  • Back-translation challenge (Can another AI reconstruct the original text from the simplified version?)
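The suite above leaves the scoring mechanics open. As one possible sketch, per-item human ratings could be aggregated as shown below. It assumes each rater scores every criterion on a 1-5 scale. The scale, the `BenchmarkItem` schema, and the `score_item` function are hypothetical and not part of any existing benchmark; only the criterion names come from the lists above.

```python
from dataclasses import dataclass, field
from statistics import fmean

# Criterion names come from the evaluation lines above; the 1-5 scale is an assumption.
CRITERIA = {
    "ACB": ("clarity", "accuracy", "engagement", "confidence"),
    "NTST": ("meaning", "tone", "clarity"),
}
MIN_SCORE, MAX_SCORE = 1, 5


@dataclass
class BenchmarkItem:
    """One hypothetical test case: a prompt, the model's answer, and human ratings."""
    test: str  # "ACB" or "NTST"
    input_text: str
    output_text: str
    # One dict per rater, mapping criterion name -> integer score.
    ratings: list[dict[str, int]] = field(default_factory=list)


def score_item(item: BenchmarkItem) -> dict[str, float]:
    """Average each criterion across raters, plus an unweighted overall mean."""
    criteria = CRITERIA[item.test]  # raises KeyError for an unknown test type
    if not item.ratings:
        raise ValueError("item has no ratings")
    for rating in item.ratings:
        missing = [c for c in criteria if c not in rating]
        if missing:
            raise ValueError(f"rating is missing criteria: {missing}")
        if any(not MIN_SCORE <= rating[c] <= MAX_SCORE for c in criteria):
            raise ValueError(f"scores must be between {MIN_SCORE} and {MAX_SCORE}")
    # Mean score per criterion across all raters.
    scores = {c: fmean(r[c] for r in item.ratings) for c in criteria}
    # Equal-weight average of the per-criterion means.
    scores["overall"] = fmean(scores[c] for c in criteria)
    return scores


# Example: two raters scoring the NTST item from this post.
item = BenchmarkItem(
    test="NTST",
    input_text="You trust those people? They said one thing last year...",
    output_text="1. Authorities have contradicted themselves over time. ...",
    ratings=[
        {"meaning": 5, "tone": 4, "clarity": 5},
        {"meaning": 4, "tone": 4, "clarity": 5},
    ],
)
print(score_item(item))  # → {'meaning': 4.5, 'tone': 4.0, 'clarity': 5.0, 'overall': 4.5}
```

Equal weighting is the simplest choice. A suite that cares more about accuracy than engagement, for example, could swap the overall mean for a weighted one without changing the rest of the structure.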


Final Thoughts

If an AI can:

  • Teach a child why CRISPR matters,

  • Translate a poet’s frustration into a rational argument,

  • And explain quantum mechanics using a vending machine metaphor…

…then maybe we’re closer to AGI than we think.

Or at the very least, we’ll have built a machine that makes us all a little smarter.

And that’s a test worth running.