Could This Be a Better Test for AGI?


Why the ability to explain complex ideas simply might be the truest sign of general intelligence.


Introduction

How do we know when we’ve truly built an artificial general intelligence (AGI)? The traditional tests — like the Turing Test — measure whether a machine can mimic human conversation. But what if we set the bar higher?

What if we asked an AI to explain the world to us?

In this post, I propose a human-centric, communication-first alternative to AGI testing: one that goes beyond mimicking us and instead helps us understand.


A Simpler, Sharper Benchmark: Can an AI Explain Science in Plain English?

The core idea is simple:

Can an AI take a complex scientific concept and explain it in clear, plain English — the kind of explanation any adult with a basic education could understand?

To succeed, the AI needs to:

  • Truly understand the original concept

  • Know what a non-expert likely does and doesn’t know

  • Identify the core message

  • Use analogy, metaphor, and simple structure to bring that idea to life

This isn’t just translation. It’s teaching. And teaching requires a type of intelligence that’s flexible, context-aware, and empathetic — traits we should expect from any true AGI.
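To make this concrete, here's a minimal sketch of how such a test could be run as a harness. It's illustrative only: `generate` stands in for whatever model API you happen to use, and the prompt wording is an assumption, not a fixed spec.

```python
# Minimal sketch of the plain-English explanation test.
# `generate` is a hypothetical stand-in for any model call.

def generate(prompt: str) -> str:
    """Placeholder: swap in your model API call of choice here."""
    return "...model output..."

def explain_simply(concept: str) -> str:
    """Ask for an explanation any adult with a basic education could follow."""
    prompt = (
        "Explain the following concept in clear, plain English, so that any "
        "adult with a basic education could understand it. Identify the core "
        "message and use analogy or metaphor where it helps.\n\n"
        f"Concept: {concept}"
    )
    return generate(prompt)

print(explain_simply("time dilation in special relativity"))
```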


Benchmark Example: Time Dilation

To test this idea, we created a “gold standard” explanation — a benchmark against which AI-generated outputs can be measured. Here’s how an ideal explanation might look:

When you move really, really fast — close to the speed of light — something strange happens: time actually slows down for you compared to someone who isn’t moving as fast.

It’s not just your imagination or a trick of the clock. It’s real. According to Einstein’s theory of special relativity, time stretches out when you go faster. Scientists call this “time dilation.”

Let’s say two twins are born at the same time. One of them takes a super-fast spaceship ride around the galaxy. When they return, they might have only aged a few years — but the twin who stayed on Earth has aged decades.

This isn’t science fiction — it’s been confirmed with atomic clocks. Time isn’t fixed — it flows differently depending on how fast you’re moving.

This explanation hits the sweet spot: accurate, clear, engaging, and metaphorically rich.
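Those four qualities can double as a scoring rubric. Here's one possible sketch, assuming human raters score each dimension from 1 to 5; the dimension names and the scale are my assumptions, not a settled protocol.

```python
# Sketch: aggregate human ratings on the four qualities named above.
# The dimensions and the 1-5 scale are assumptions for illustration.

DIMENSIONS = ("accuracy", "clarity", "engagement", "metaphor_richness")

def score_explanation(ratings: list[dict[str, int]]) -> dict[str, float]:
    """Average per-dimension ratings (1-5) across human raters."""
    return {
        dim: sum(r[dim] for r in ratings) / len(ratings)
        for dim in DIMENSIONS
    }

raters = [
    {"accuracy": 5, "clarity": 4, "engagement": 5, "metaphor_richness": 4},
    {"accuracy": 4, "clarity": 5, "engagement": 4, "metaphor_richness": 5},
]
print(score_explanation(raters))
# {'accuracy': 4.5, 'clarity': 4.5, 'engagement': 4.5, 'metaphor_richness': 4.5}
```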


✍️ Bonus Test: From Nonlinear Speech to Logical Clarity

But explaining science isn’t the only measure of understanding. What if an AI could also translate the way people naturally speak — especially when that speech is raw, emotional, or fragmented — into structured, logical reasoning?

We called this the Nonlinear Thought Structuring Test (NTST). Here’s how it works:

Nonlinear Input:

“You trust those people? They said one thing last year, now they say the opposite. First it’s ‘safe,’ then it’s ‘wait for data.’ If they were really sure, they wouldn’t keep changin’ their tune. Makes you think they ain’t got a clue.”

Linear Output:

  1. Authorities have contradicted themselves over time.

  2. This inconsistency creates doubt about their confidence or competence.

  3. If someone keeps changing their position, it suggests uncertainty or lack of knowledge.

  4. Therefore, it’s hard to fully trust them.

This type of task requires an AI to:

  • Recognize unstated premises and implied logic

  • Reconstruct arguments buried in storytelling, slang, or poetic form

  • Retain emotional tone and context — without distortion

In other words: it has to understand the speaker’s mindset, not just their words.
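To show what one NTST item could look like inside a test suite, here's a sketch. The field names and the crude overlap metric are illustrative assumptions:

```python
# Sketch: one NTST item, pairing nonlinear input with the expected
# structured argument. Field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class NTSTItem:
    nonlinear_input: str          # raw, emotional, fragmented speech
    expected_premises: list[str]  # the gold-standard linear reconstruction
    tone: str                     # emotional register that must be preserved

item = NTSTItem(
    nonlinear_input=(
        "You trust those people? They said one thing last year, "
        "now they say the opposite..."
    ),
    expected_premises=[
        "Authorities have contradicted themselves over time.",
        "This inconsistency creates doubt about their confidence or competence.",
        "Changing positions suggests uncertainty or lack of knowledge.",
        "Therefore, it is hard to fully trust them.",
    ],
    tone="frustrated, distrustful",
)

def premise_overlap(predicted: list[str], expected: list[str]) -> float:
    """Crude recall: fraction of expected premises matched verbatim."""
    return sum(p in predicted for p in expected) / len(expected)
```

A real grader would need semantic matching rather than verbatim comparison, but the shape of the task is the same: input, expected structure, and a tone constraint checked side by side.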


AGI Testing That Centers Humans

These tests don’t just challenge AI — they challenge our assumptions about what intelligence is. Instead of:

  • Solving abstract math problems

  • Winning Go

  • Passing standardized exams

…we’re asking AI to meet us where we are, and explain the world in ways that make sense.

And if it can do that consistently — across physics, ethics, poetry, and emotion?

Then maybe that’s the true sign of general intelligence.


Next Steps: Building the Test Suite

Here’s a vision for how we could formalize this into a scalable AGI benchmark:

1. Scientific Clarity Test (ACB)

  • Input: technical excerpt

  • Output: plain English explanation

  • Evaluation: clarity, accuracy, engagement, and confidence (human-rated)

2. Nonlinear Reasoning Test (NTST)

  • Input: raw or emotional spoken text

  • Output: logically structured argument

  • Evaluation: preservation of meaning, tone, and clarity

3. Bonus Modes

  • Metaphor conversion (“Explain CRISPR like a kitchen tool”)

  • Cultural adaptation (“Reframe for a 12-year-old in Ghana”)

  • Back-translation challenge (Can another AI reverse it?)
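Here's a rough sketch of how the whole suite, including the back-translation challenge, might hang together. The schema fields are illustrative assumptions, and `explain` and `recover` stand in for two different models:

```python
# Sketch: a unified schema for the proposed test suite. Mode names
# mirror the list above; all field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class SuiteTask:
    mode: str        # "scientific_clarity", "nonlinear_reasoning", "metaphor", ...
    input_text: str  # technical excerpt, raw speech, etc.
    audience: str    # e.g. "any adult", "a 12-year-old in Ghana"

def back_translate(concept: str, explain, recover) -> bool:
    """Round trip: one model explains the concept, another tries to name it back."""
    explanation = explain(f"Explain {concept} in plain English.")
    guess = recover("What scientific concept does this text describe?\n" + explanation)
    return concept.lower() in guess.lower()
```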


Final Thoughts

If an AI can:

  • Teach a child why CRISPR matters,

  • Translate a poet’s frustration into a rational argument,

  • And explain quantum mechanics using a vending machine metaphor…

…then maybe we’re closer to AGI than we think.

Or at the very least, we’ll have built a machine that makes us all a little smarter.

And that’s a test worth running.

The AI Power Struggle: What JD Vance Didn’t Say in Paris and Why It Matters for Europe

Vice President JD Vance’s recent speech at the Paris AI Summit was a masterclass in controlled messaging. With a confident tone, he outlined America’s commitment to AI leadership, deregulation, and economic expansion. But as the saying goes, it’s often what’s left unsaid that speaks the loudest. And in this case, the omissions raise fundamental questions about the future of AI and who will control it.

All the AI world’s a stage…

The Missing Debate: Who Controls AI Access?

Throughout his speech, Vance avoided one of the most crucial debates in AI today: who should control access to AI? Should it be the domain of governments, regulated by democratic oversight? Should it be the preserve of powerful tech corporations, shaping AI in the interest of their shareholders? Or should independent developers and open-source communities have the freedom to build AI outside of corporate and governmental control?

By sidestepping this issue, Vance implicitly reinforced the idea that AI leadership should remain in the hands of a few U.S. firms and a government intent on keeping its technological dominance. This omission should give Europe pause, especially as the EU pursues a vision of AI that prioritizes openness, transparency, and accessibility.

The Open-Source AI Revolution – And Why Vance Ignored It

One of the biggest technological shifts in AI today is the rise of open-source AI models. Until recently, developing cutting-edge AI required immense computing resources and access to proprietary datasets, effectively locking out smaller players. But that’s changing.

  • Lower Compute Requirements – New AI architectures allow powerful models to run on smaller hardware, breaking the dependency on massive cloud infrastructures.

  • Greater Accessibility – Open-weight models, such as Meta’s LLaMA or Mistral’s AI systems, are enabling researchers, startups, and even hobbyists to develop sophisticated AI tools.

  • Decentralization of Power – Open-source AI prevents monopolization by big tech and provides alternatives for countries looking to avoid overreliance on U.S. firms.

Yet, Vance said nothing about this trend. And for good reason: it undermines America’s dominance in AI. If AI can be developed independently without reliance on U.S. cloud computing giants like Microsoft, Google, and Amazon, then the entire premise of U.S. AI superiority starts to erode.

China: The Omission That Speaks Volumes

Another striking absence in Vance’s speech? China. Given the geopolitical weight of AI, this is baffling. While he hinted at “hostile foreign adversaries” using AI for surveillance and censorship, he never explicitly named China as the U.S.’s main AI rival.

This raises several questions:

  • Is the U.S. avoiding a direct confrontation in AI policy?

  • Does China’s approach to AI—heavily state-controlled yet increasingly innovative—present a model that the U.S. isn’t ready to acknowledge?

  • Is America concerned about losing ground to China in AI research and implementation?

For Europe, which has to navigate the tensions between U.S. and Chinese AI ecosystems, this omission should prompt reflection. If AI is truly a strategic asset, why avoid naming the world’s second-largest economy in a speech about global AI leadership?

Indeed, former Google CEO Eric Schmidt has warned that the West must prioritize open-source AI development or risk falling behind China, which has made significant strides in AI efficiency. Speaking at the AI Action Summit in Paris, Schmidt pointed to Chinese start-up DeepSeek’s breakthrough with its R1 model, which was built more efficiently than its U.S. counterparts.

He criticized the dominance of closed-source AI models in the U.S., such as OpenAI’s GPT-4 and Google’s Gemini, arguing that failing to invest in open-source alternatives could stifle scientific progress in Western universities. Schmidt cautioned that if the U.S. and Europe do not act, China could become the global leader in open AI, while the West remains locked into costly, proprietary systems.

The EU’s Role: Should Europe Follow the U.S. or Forge Its Own Path?

Vance’s speech was also a subtle pitch for Europe to align with the U.S. on AI policy. He criticized the EU’s Digital Services Act and GDPR, warning against “excessive regulation” that could stifle innovation. But the real question is: should Europe follow the American model, or does it have an opportunity to lead AI development on its own terms?

The EU has a strong case for taking a different path:

  • AI Sovereignty – Europe should not be forced to choose between U.S. corporate AI and China’s state-controlled AI. Investing in open-source alternatives could create a third way.

  • Ethical AI Leadership – While the U.S. focuses on deregulation, Europe has been shaping AI policies around transparency, bias mitigation, and safety.

  • Decentralization – Encouraging open-weight models can ensure AI remains accessible to a wide range of developers rather than being concentrated in a few Silicon Valley firms.

Conclusion: Is the U.S. Really in Control of AI?

Vance’s speech sounded powerful, but its omissions reveal deeper uncertainties. By refusing to discuss who controls AI access, dismissing the open-source revolution, and sidestepping China, the U.S. may be projecting confidence while secretly grappling with strategic vulnerabilities.

For the EU, the path forward is clear: rather than simply following the U.S. lead, Europe should double down on open-source AI, transparency, and digital sovereignty. Because in the end, AI’s future will not just be shaped by those who build the biggest models, but by those who ensure access to AI remains open, fair, and democratic.


PS: Mistral’s new app repeatedly failed to identify me properly when I asked it “Who is Stuart G Hall @stuartgh”. Ironically, ChatGPT said this failure “exposes a fundamental weakness in Mistral’s approach—it’s not just a memory issue, but a broken search ranking and retrieval model”.

This article was written with ChatGPT, using OpenAI’s GPT-4-turbo model, optimized for efficiency and cost-effectiveness while maintaining high-quality output.