Students Scored Higher With an AI Tutor That Asked Questions Instead of Answering Them
A new study found students paired with an AI that asks questions instead of giving answers scored higher on exams. Here's what that means for AI in education.

The AI tutor that worked best refused to give students answers.
Researchers at the University of Wisconsin-La Crosse built a tool called Macro Buddy for their undergraduate macroeconomics class. Its internet access was switched off. Whenever a student asked it a question, it didn't answer — it asked one back.
Students who used Macro Buddy alongside peer discussion scored higher on in-person, closed-book exams than students who studied alone with no AI.
That sounds like a small finding. It isn't.
The problem everyone's arguing about
About 90% of US college students now use generative AI for coursework, per a 2025 survey of 1,100 students across two- and four-year institutions. The debate has mostly been framed as: is this cheating, and how do we stop it?
Brookings spent a year on a different question. Drawing on more than 400 studies and hundreds of interviews with teachers, students, and experts, they published a report in January 2026 that reframed the problem.
Cheating isn't the real issue. Learning is.
"Students can't reason. They can't think. They can't solve problems," one teacher told Brookings researchers. The report called AI the "fast food of education" — convenient and immediately satisfying, cognitively hollow over time. The term for what's happening to students' thinking: "great unwiring."
The OECD reached a similar conclusion in its Digital Education Outlook 2026, released the same month. "Offloading cognitive tasks to general-purpose chatbots creates risks of metacognitive laziness," the report states. More damning: students using general-purpose AI tools tend to produce better assignments — but when AI is removed at exam time, the advantage disappears. Sometimes it reverses.
Better outputs. Less actual learning. The AI was doing the thinking for them.
What Macro Buddy did differently
The key design decision was simple and uncomfortable: it wouldn't give answers.
If a student asked why falling prices might increase consumer spending, Macro Buddy wouldn't explain. It would ask: what happens to your purchasing power when the price of something drops? The student had to work it out.
This approach — Socratic tutoring — is well-established in learning science. Struggling to retrieve and apply knowledge, rather than receiving it passively, is where durable memory forms. Macro Buddy was a structured version of that struggle: available any hour, trained on specific course materials, unable to go beyond them.
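The study doesn't publish Macro Buddy's implementation, so this is a sketch only: the core "never answer, always ask" behavior boils down to mapping each topic to a counter-question, with a generic prompt for effortful retrieval as the fallback. All names here (`socratic_reply`, `COUNTER_QUESTIONS`) are illustrative, not from the paper, and a real tutor would generate questions rather than look them up.

```python
# Illustrative sketch of the question-only tutoring pattern.
# Topic keywords map to Socratic counter-questions; anything else
# gets a generic prompt that pushes the student to retrieve and reason.

COUNTER_QUESTIONS = {
    "deflation": "What happens to your purchasing power when the price of something drops?",
    "inflation": "If your income stays fixed while prices rise, what can you afford next month?",
}

FALLBACK = "What do you already know that might apply here? Walk me through your reasoning."

def socratic_reply(student_question: str) -> str:
    """Return a guiding question, never a direct answer."""
    q = student_question.lower()
    for topic, counter in COUNTER_QUESTIONS.items():
        if topic in q:
            return counter
    return FALLBACK
```

Note that every branch ends in a question: by construction, the tool cannot hand over an answer, which is the design decision the study credits.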
The study enrolled 140 first- and second-year undergraduates across four sections. After an initial shared exam, researchers split them into four conditions: alone without AI, in groups without AI, alone with Macro Buddy, and in groups with Macro Buddy. All exams were in-person, closed-book, no AI.
The group combining Macro Buddy with peer discussion came out ahead. Those students had to articulate their reasoning to the AI, then explain it again to classmates — two rounds of effortful retrieval. By the third exam, the advantage had widened.
Why the combination mattered
The peer discussion piece isn't incidental. Teaching something to another person is one of the most reliable ways to cement understanding. When you explain a concept, gaps in your own knowledge become visible. Macro Buddy surfaced those gaps; peer conversation closed them.
Students who used Macro Buddy alone showed more modest gains. The AI wasn't a miracle in isolation — it worked best as a complement to human interaction, not a replacement for it.
This matters globally. AI tutoring tools are being deployed at scale, often pitched as equalizers for low-resource schools, and AI in education is central to how countries think about closing learning gaps. But Macro Buddy's results suggest design matters as much as availability.
An AI that hands a struggling student the answer trains them to expect answers. An AI that asks another question trains them to think.
The design gap
Most AI tools for students are built for output quality, not learning. ChatGPT, Claude, and their equivalents are optimized to produce useful responses — that's why they're popular for homework, and why they're damaging for cognitive development when used without structure.
The OECD's 2026 report describes what purposeful educational AI looks like: peer-like contribution to collaborative tasks, materials tied to specific course content, feedback tied to learning goals. Macro Buddy fits that description. Most tools don't.
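Macro Buddy's tie to specific course materials isn't published either. As a hypothetical sketch of that constraint, assuming a simple keyword-based scope check (a production system would retrieve against the actual syllabus instead), the tutor can refuse anything outside the course:

```python
# Hypothetical scope guardrail, not Macro Buddy's actual code:
# keep the tutor inside a fixed set of course concepts and
# redirect general-purpose requests back to the material.

COURSE_CONCEPTS = {"gdp", "inflation", "deflation", "aggregate demand", "purchasing power"}

REFUSAL = "That's outside our course materials. Which concept from class might relate?"

def scoped_reply(student_question: str) -> str:
    """Respond with a guiding question for in-course topics; redirect everything else."""
    q = student_question.lower()
    if any(concept in q for concept in COURSE_CONCEPTS):
        return "What does the course definition suggest here? Try stating it in your own words."
    return REFUSAL
```

The point of the sketch is the refusal branch: a tool bounded to course content can't quietly become a homework-writing chatbot.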
Schools are largely responding to AI by banning it or monitoring it. The harder redesign question — what should AI assistance look like during learning, not just at submission time — has barely been addressed.
The research points toward a counterintuitive answer: the AI that makes learning harder in the moment makes it stick. Friction is the feature, not the bug.
What this means for the AI tutoring boom
The e-learning market is projected to grow from $276 billion in 2026 to $462 billion by 2031. A large chunk of that growth is AI tutoring. The Macro Buddy study is a reminder that the question isn't whether AI can tutor — it clearly can. It's whether the people building these tools understand how learning actually works.
An AI that gives students answers on demand will be popular. Usage metrics will look great.
An AI that asks better questions, built around specific course content and integrated with peer interaction — that's harder to build, slower to use, and less immediately satisfying. It's also the one that made exam scores go up.
Sources & Verification
Based on 5 sources from 2 regions
- The Conversation (North America)
- Fortune (North America)
- OECD Digital Education Outlook 2026 (International)
- Brookings Institution (North America)
- Fast Company (North America)