
At the 2025 AIET IN STEM Forum, Yasemin Copur-Gencturk presented a striking contrast to the AI hype cycle: a carefully designed, evidence-based AI system for teacher professional development that actually works, with randomized controlled trial data to prove it.

The Problem: AI Tools That Forget Their Purpose

A Critical Observation

Copur-Gencturk opened by noting that there is no "AI" in her talk title, and deliberately so: everyone is talking about AI, but she wants attention on domain-specific expertise. Tool builders too often forget the most important thing: they need content experts to create meaningful tools.

Design Goals vs. Actual Products

Many AI tools are pitched around a clear design goal, but technical difficulties during development pull the finished product away from that goal. The product doesn't match the promise.

  • Companies state design goals but get "lost" during development
  • The developed product doesn't have the same goals as the intended product
  • Everyone says their product is great, but nobody talks about evidence
  • We need to know: Is it working? Under which conditions?

Original Quote

Yasemin Copur-Gencturk: "When we are trying to achieve this goal, when we are trying to design a product, sometimes you forget what our intention is. Because there are so many technical difficulties you may face when you are trying to create a product. That means that sometimes the product you develop is not aligned with the thing that you are trying to solve."

Why Giving Answers Doesn't Work

The Google Problem

How many times do you search for the same thing on Google? Sometimes a hundred times, and it still doesn't stick. Many AI tools that claim to help you learn are simply giving you answers, and you are not learning from them.

  • Research shows: people don't learn if you just give them the answer
  • Most tutoring systems: You ask a question, they provide an answer
  • The shift: AI should be the one asking questions, not answering them
  • AI should identify areas where learners need improvement and guide them there

Original Quote

Yasemin Copur-Gencturk: "We know from research that people don't learn if you give them the answer. How many times do you search the same thing in Google? Sometimes I search the same thing maybe a hundred times and still it doesn't stay... So many tools claiming to help you learn are giving you the answer and then you are not learning them."

Three Challenges in Teacher Professional Development

Challenge 1: Scale

In-person professional development (PD) requires teachers to be physically present: after school, after driving to a training location. Some teachers simply do not have access to high-quality PD when they need it.

  • Traditional PD requires physical presence at specific times and places
  • Geographic and scheduling barriers limit access
  • Not all teachers can access high-quality opportunities when needed
  • Question: How can we scale while maintaining accessibility?

Challenge 2: Quality Consistency

In-person teaching effectiveness depends on the instructor. If you have a great instructor, you learn well. What if you don't? How can we maintain consistent quality across all teachers?

  • Instructor quality varies dramatically
  • Same program can yield different results with different facilitators
  • Question: How can everyone experience high-quality education?

Challenge 3: Individualized Feedback

Most discussion of personalized feedback focuses on students. But what about helping teachers identify the areas THEY need to improve? Teachers need individualized support too.

  • Personalized feedback for teachers is rarely discussed
  • Teachers need to identify their own growth areas
  • Question: How can we help teachers learn and improve systematically?

Original Quote

Yasemin Copur-Gencturk: "Most of the discussion around personalized feedback focuses on giving feedback to students. But we don't talk about how we can help teachers to learn and how we can help teachers with, you know, identifying the areas they need to improve."

The Solution: A Multi-Agent AI System

The Key Design Principle

The AI does not give answers; the AI asks the questions. It identifies the areas where a teacher needs improvement and offers scaffolding activities rather than explanations. The system divides this work across four agents:

  • Filter: Are you on task?
  • Judge: Which goals are met?
  • Responder: Feedback + new questions
  • Explainer: After 3 attempts
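
To make the roles concrete, here is a minimal Python sketch of the state such a pipeline might pass between agents. The talk confirms the agent roles and the expert-written expectations; the class and field names below are illustrative assumptions, not the team's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Expectation:
    """A learning goal defined in advance by content experts."""
    description: str   # e.g., "Solve the problem correctly"
    met: bool = False  # flipped once the Judge deems it satisfied

@dataclass
class TaskState:
    """One activity's state as it moves through the four agents."""
    prompt: str                       # the task shown to the teacher
    expectations: list[Expectation]   # the expert-defined "labels"
    attempts: int = 0                 # unsuccessful tries so far
```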

Agent 1: Filter

When users try to game the system by typing "none" or irrelevant responses, Filter catches this and says "Stay on task." It's the gatekeeper ensuring meaningful engagement.

  • Analyzes whether response is relevant to the assigned task
  • If irrelevant: blocks and redirects
  • If relevant: green light, passes to Judge

Original Quote

Yasemin Copur-Gencturk: "You know that when you are using a system, sometimes when you try to teach the system, you just put 'none.' I'm sure I'm not the only one who is doing that. If you put random responses, the system says, 'You are not on track. You are not doing what you are supposed to do. Stay on track.'"

Agent 2: Judge

Judge analyzes which learning expectations have been met and which still need work. It maps responses to predefined learning goals set by content experts.

  • Expectations are set in advance by content experts (analogous to labels in machine learning)
  • Example: "Solve the problem correctly" + "Reasoning should be correct"
  • Judge determines which expectations are met, which need more work
  • If all met: move to next activity
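
One way to express the Judge step, assuming each met/unmet decision comes from a model call; `meets_expectation` below is a toy stand-in so the sketch runs.

```python
def meets_expectation(expectation: str, response: str) -> bool:
    """Stand-in for a per-goal model call (e.g., an LLM prompt)."""
    return "multiplicative" in response.lower()  # toy heuristic only

def judge(expectations: list[str], response: str) -> dict[str, bool]:
    """Map a teacher's response onto the expert-defined goals."""
    return {goal: meets_expectation(goal, response) for goal in expectations}

verdict = judge(
    ["Solve the problem correctly", "Reasoning should be correct"],
    "Both quantities grow by the same factor, so the relationship is multiplicative.",
)
print(verdict)  # all True -> next activity; otherwise hand off to Responder
```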
Agent 3: Responder

Responder tells you what you did right, then offers additional hints or scaffolding activities for the areas where you still need help. It doesn't give answers; it guides.

  • Provides positive feedback on what was done correctly
  • Offers hints and scaffolding activities (not answers) for remaining goals
  • May ask teachers to watch a video, create a graph, or use ratio tables
  • Learning by doing, not by being told
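
The Responder's contract, as described, is: acknowledge what was met, scaffold what was not, and never reveal the answer. A sketch under that assumption; the `SCAFFOLDS` table is invented for illustration.

```python
# Hypothetical mapping from unmet goals to scaffolding activities.
SCAFFOLDS = {
    "Solve the problem correctly": "Try organizing the quantities in a ratio table.",
    "Reasoning should be correct": "Watch the short video on multiplicative reasoning, then revise your explanation.",
}

def respond(verdict: dict[str, bool]) -> str:
    """Praise met goals; suggest activities (never answers) for the rest."""
    met = [goal for goal, ok in verdict.items() if ok]
    unmet = [goal for goal, ok in verdict.items() if not ok]
    lines = [f"Nice work on this goal: {goal.lower()}." for goal in met]
    lines += [SCAFFOLDS[goal] for goal in unmet]  # hints, not solutions
    return "\n".join(lines)

print(respond({"Solve the problem correctly": True,
               "Reasoning should be correct": False}))
```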
Agent 4: Explainer (After 3 Attempts)

After three attempts, if teachers are still struggling, Explainer steps in and explains the concept. This prevents frustration while maximizing learning opportunity.

  • Only activates after 3 unsuccessful attempts
  • Explains the concept the teacher is struggling with
  • Prevents endless loops of frustration
  • Balances productive struggle with support
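
Putting the pieces together: the control flow the talk describes amounts to a loop with an escalation threshold. This sketch assumes the functions from the previous sketches are in scope; the loop structure itself is a reconstruction, not the team's published design.

```python
MAX_ATTEMPTS = 3  # the Explainer's escalation threshold

def run_activity(task_prompt, expectations, get_response, explain):
    """Drive one activity: Filter -> Judge -> Responder, Explainer as fallback."""
    attempts = 0
    while True:
        answer = get_response(task_prompt)
        if not filter_on_task(task_prompt, answer):   # Agent 1: Filter
            print("Stay on task.")
            continue                                  # blocked, not counted
        verdict = judge(expectations, answer)         # Agent 2: Judge
        if all(verdict.values()):
            return "advance"                          # all goals met
        attempts += 1
        if attempts >= MAX_ATTEMPTS:                  # Agent 4: Explainer
            print(explain(expectations, verdict))     # now, and only now, explain
            return "advance"
        print(respond(verdict))                       # Agent 3: Responder
```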

The Evidence: Randomized Controlled Trial

The Funding Story

In 2015, NSF reviewers were skeptical: who needs human-less PD, when nobody completes a program without an in-person facilitator? NSF didn't fund it; the Institute of Education Sciences (IES) did. The results proved the skeptics wrong.

The Study Design

Half the teachers were randomly assigned to the AI program; the other half received no intervention. Randomization rules out alternative explanations, so any changes in outcomes can confidently be attributed to the program.

  • Randomized controlled trial (gold standard for evidence)
  • Treatment group: Complete AI-based PD program
  • Control group: No intervention ("Thank you, you don't need to do anything")
  • Measured: Teacher knowledge, teaching practices, student learning
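
For readers unfamiliar with the design, random assignment and the headline comparison are simple to state in code. This is a generic illustration of an RCT, not the study's actual analysis, which would use validated instruments and a richer statistical model.

```python
import random
from statistics import mean

def randomize(teachers: list[str], seed: int = 42) -> tuple[list[str], list[str]]:
    """Split teachers at random into treatment (AI PD) and control halves."""
    pool = list(teachers)
    random.Random(seed).shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (treatment, control)

def effect_estimate(treatment_scores: list[float], control_scores: list[float]) -> float:
    """Unadjusted difference in group means on a post-program outcome."""
    return mean(treatment_scores) - mean(control_scores)
```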
Original Quote

Yasemin Copur-Gencturk: "I randomly assigned teachers, half of the teachers to my program, and half of the teachers, I said, thank you, but you don't need to do anything. So that I can confidently say that after the randomization, if I see any changes in teachers' knowledge, in their teaching practices, in their student learning, I can confidently say that there is something within the PDF that I did to help teachers improve their knowledge and skills."

The Results: Impact at Three Levels

Result 1: Teacher Content Knowledge

Teachers who completed the program improved their content knowledge. This is exciting, but not unique; many programs can claim the same.

  • Statistically significant improvement in math content knowledge
  • Many US programs can make this claim
  • Important baseline, but not the key finding

Result 2: Teaching Practices Changed

This is the tricky part. Control-group teachers used low-level activities that offered little opportunity for deep understanding. Treatment-group teachers created rigorous tasks that elicit student thinking.

  • The program gave NO materials or activities to use in class
  • Control teachers: Low-level math, not hitting expected standards
  • Treatment teachers: Found and created rigorous tasks on their own
  • Treatment teachers could identify additive vs. multiplicative reasoning in student work

Original Quote

Yasemin Copur-Gencturk: "This is the tricky part. We don't see many programs that are showing an impact on teachers... In our program, we didn't give teachers any activities they can use in their classroom... They went out, identified, they used the knowledge and the skills they learned from the system to create more meaningful learning opportunities for their students."

Result 3: Student Learning Improved

Students of treatment-group teachers improved by almost one-fourth of a grade level. This is huge: creating an intervention at the teacher level and showing impact at the student level is extremely rare.

  • Approximately 0.25 grade level improvement in student outcomes
  • "Huge impact" - rare to show student-level effects from teacher-level intervention
  • Published in Computers & Education (AI impact) and the American Educational Research Journal (instructional practices)

Original Quote

Yasemin Copur-Gencturk: "In the United States, there are not many programs that can create an intervention at teacher level and show the impact at the student level... Almost one-fourth of the grade level. This is huge. This is a huge impact and I was so happy because I was so scared that the people would be right and I wouldn't be able to show that the system is working."

The Three Design Principles

Design with Purpose

Keep your original goal aligned with the final product.

AI as Facilitator

Guide, don't tell.

Evidence Matters

If you can't prove it works, you don't know if it works.