At the 2025 AIET IN STEM Forum, Professor Jiliang Tang presented a comprehensive overview of AI's evolution and its implications for educational assessment. This post captures the key insights from that session, structured to allow progressive exploration from summary to detailed analysis to actual forum quotes.
The evolution of AI in education mirrors the evolution of AI itself. As AI changed from rule-based to learning-based to generative, so did its potential applications in education—from rigid tutoring systems to adaptive assessment to transparent, explainable feedback.
The Three Stages of AI Evolution
AI as programmed rules: Experts encode domain knowledge into if-then logic. Rigid, brittle, and limited to known scenarios.
In the early stages of AI, the approach was fundamentally about programming intelligence. Domain experts would analyze a problem, extract rules, and hard-code them into a system.
- How it works: Experts provide domain-specific rules (e.g., "If fever + cough + positive test → diagnose influenza")
- Who could build it: Only domain experts, since they needed to articulate all the rules
- Key limitation: Any situation outside the programmed rules causes failure. No ability to handle out-of-distribution cases.
This is why we call them "expert systems"—they can only be created by experts who provide all the domain-specific rules.
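To make the "programmed rules" idea concrete, here is a minimal sketch of an expert system in Python. The rules and symptom names are illustrative (loosely following the influenza example above), not an actual system from the talk:

```python
# Minimal sketch of an expert system: hand-coded if-then rules.
# Rules and symptom names are illustrative, not a real diagnostic system.

def diagnose(symptoms: set) -> str:
    # Each rule was articulated by a domain expert and hard-coded.
    if {"fever", "cough", "positive_test"} <= symptoms:
        return "influenza"
    if {"sneezing", "runny_nose"} <= symptoms:
        return "common cold"
    # Any input outside the programmed rules falls through: the system
    # has no way to handle out-of-distribution cases.
    return "unknown"

print(diagnose({"fever", "cough", "positive_test"}))  # influenza
print(diagnose({"fatigue"}))                          # unknown
```

The `return "unknown"` branch is the brittleness in miniature: a new situation that no expert anticipated produces no useful answer.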
Jiliang Tang: "What's AI? At any stage, AI, we're trying to, how can we make AI think? How can we make AI work? What we are doing is just to express the program. So we tell a lot of rules. We gather those rules from experts. And then we program it. And then once the situation, the input come in, we just go through those rules... Of course, people, I mean, the users of this expert system think the machine can think. But actually it cannot because all those rules are hard-coded. And if there are some new situations occur, the system will fail."
Instead of programming rules, we show examples. The machine learns to recognize patterns—just like teaching a child what a cat looks like.
The breakthrough of machine learning was a fundamental shift: let the machine learn rules from examples rather than having experts program them directly.
- How it works: Show many labeled examples (cat/not cat), and the machine learns to identify patterns
- Who could build it: Normal users could train models—just provide examples
- Key formula: AI = Algorithm + Data
- Analogy: This mirrors how we teach children—we don't explain rules for identifying cats, we show many examples
Data becomes central. The quality and quantity of labeled examples determine the model's capability.
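The "learn rules from examples" shift can be sketched with a tiny nearest-centroid classifier. The feature names and values are toy placeholders standing in for real image features; the point is that the decision rule is computed from labeled data, never written by hand:

```python
# Minimal sketch of the learning-based paradigm: the machine infers a
# rule from labeled examples instead of being handed one.
# Features (ear_pointiness, whisker_length) are invented toy values.

def train_centroids(examples):
    """Average the feature vectors per label (a tiny nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose learned centroid is closest (squared distance)."""
    return min(centroids, key=lambda lab: sum(
        (a - b) ** 2 for a, b in zip(centroids[lab], features)))

# AI = Algorithm + Data: the same algorithm, different examples, new "rules".
data = [([0.9, 0.8], "cat"), ([0.8, 0.9], "cat"),
        ([0.1, 0.2], "not cat"), ([0.2, 0.1], "not cat")]
model = train_centroids(data)
print(predict(model, [0.85, 0.75]))  # cat
```

Note that no one told the model what makes a cat a cat; with different labeled examples, the identical code would learn a different concept.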
Jiliang Tang: "So instead of giving AI the rules, we just show examples. And we want AI to learn those rules or summarize those rules from examples... Why do we think this can work? Because this is similar to how we teach our kids. For example, we want the kids to learn what is cat. We don't tell all kinds of rules to identify a cat, but we just show a lot of examples to the kids. Okay, this is a dog, this is a cat. So if we show a lot of these kinds of examples, the kids will learn."
From classification to generation. Models don't just categorize—they create sentences, images, code. Emergent abilities appear at scale.
The current era represents another paradigm shift: from classification to generation. Instead of "cat or not cat," AI now creates—sentences, emails, images, code.
Three Training Stages of Large Models:
- Pre-training: Learn from virtually all internet data using unsupervised learning (predicting next word). The model becomes "knowledgeable" but not yet useful.
- Supervised Fine-tuning: Teach specific tasks (e.g., "summarize this article") with labeled examples
- Alignment: Ensure values match human values (e.g., don't teach how to make bombs)
The surprise is emergent abilities—capabilities that appear unexpectedly as model size increases: summarization, translation, programming, image generation.
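The pre-training objective itself is simple to illustrate: predict the next word from the text alone. Here a count-based bigram model on a toy corpus stands in for a large neural network trained on internet-scale data:

```python
# Minimal sketch of the pre-training objective: next-word prediction.
# A count-based bigram model stands in for a large neural network;
# the corpus is a toy placeholder, not real pre-training data.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# "Unsupervised" learning: the training signal is just the text itself --
# each word is the label for the words that precede it.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent continuation seen during training."""
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # cat
```

Supervised fine-tuning and alignment then reshape what this next-word predictor produces; the emergent abilities described above are not captured by any toy model and appear only at scale.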
Jiliang Tang: "So if you can imagine, so the old machine learning, we just do the classification. So cat or not cat. But now we are entering into the generative AI. Generation. Everything is generation. Instead of cat or not cat, we create a sentence, write an email, create a new picture of cat... The pre-training is just like a student. Before you choose a major, you learn the foundations. You learn everything. You are not a PhD in math, PhD in biology. You just learn everything."
Impact on Educational Assessment: Auto-Grading Case Study
Experts create rubrics → Annotators label data → Train model → Predict. Human expertise flows through data, not direct communication.
Traditional auto-grading treated scoring as a classification problem: map input (student response) to output (score category) using labeled training data.
The Pipeline:
- Experts build rubrics
- Annotators (trained with rubrics) label student responses
- Train classification model on labeled data
- Model predicts scores on new responses
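The pipeline above can be sketched end to end. Note where the rubric lives: only in the annotators' heads. The model sees nothing but labeled text. Responses and scores here are invented for illustration:

```python
# Minimal sketch of classification-style auto-grading: expert knowledge
# reaches the model only through labeled data. Examples are invented.

# Steps 1-2: experts write a rubric; annotators apply it, producing labels.
labeled = [("photosynthesis converts light energy", 2),
           ("plants use light", 1),
           ("i dont know", 0)]

# Step 3: "train" a model -- here, word-score associations learned
# purely from the labels, with no access to the rubric itself.
word_scores = {}
for text, score in labeled:
    for w in text.split():
        word_scores.setdefault(w, []).append(score)

def predict_score(response: str) -> int:
    """Step 4: score a new response from the learned associations."""
    seen = [s for w in response.split() if w in word_scores
            for s in word_scores[w]]
    if not seen:  # nothing learned applies: the model fails silently
        return 0
    return round(sum(seen) / len(seen))

print(predict_score("photosynthesis converts light"))  # 2
```

If an expert spots a mistake, there is no lever to pull on `predict_score` directly; the only fix is to relabel `labeled` and retrain, which is exactly the feedback gap described below.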
Two Critical Problems:
- No direct communication: Experts cannot interact directly with the algorithm—everything flows through annotators and labeled data
- Feedback gap: If the model makes mistakes, experts can't directly correct it. They must relabel data and retrain.
Jiliang Tang: "So the problem, actually, so there are two problems. So for example, the first one is, actually, it's very hard for the human to interact with the algorithm, for the human experts to interact with the algorithm. It's because everything, if you check the information flow, everything is through the annotators... If the algorithm has problems, how can we get the feedback from a human, right? We cannot directly get feedback from a human. We must go through, we need to label it again."
Remove the annotation bottleneck. AI directly understands rubrics through language. Experts and AI communicate directly.
With generative AI, we can remove the annotation step entirely. The model directly understands rubrics, and experts can directly communicate feedback.
Three Key Advantages:
- Generalization: Change rubrics, and the model immediately applies to new domains, problems, and disciplines
- Engagability: Human experts can directly instruct the model with domain knowledge—no data labeling required
- Transparency: The reasoning process is visible in natural language
This enables new capabilities: uncertainty quantification (telling teachers when to get involved), human-in-the-loop verification, and direct knowledge injection.
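In the generative setting, the rubric and expert feedback go straight into the model's input as language. A minimal sketch, where `build_grading_prompt` is an illustrative helper and the actual model call is left as a placeholder for whatever API is used:

```python
# Minimal sketch of rubric-driven grading with a generative model:
# the rubric is given to the model directly, with no annotation step.
# Rubric text and expert notes are hypothetical examples.

def build_grading_prompt(rubric: str, expert_notes: str, response: str) -> str:
    return (
        "You are grading a student response.\n"
        f"Rubric:\n{rubric}\n"
        f"Expert guidance:\n{expert_notes}\n"   # direct knowledge injection
        f"Student response:\n{response}\n"
        "Give a score, your reasoning, and a confidence level."
        # Asking for reasoning and confidence supports transparency
        # and uncertainty quantification.
    )

prompt = build_grading_prompt(
    rubric="2 pts: names the energy conversion; 1 pt: partial; 0 pts: none.",
    expert_notes="Accept 'light' or 'solar' as the energy source.",
    response="Plants turn sunlight into chemical energy.",
)
# result = call_llm(prompt)  # placeholder for the actual model API
```

Swapping in a different rubric retargets the grader to a new problem or discipline with no relabeling and no retraining, which is the generalization advantage listed above.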
Jiliang Tang: "Can we just remove the annotation part, right? So can we learn the model, understand, directly understand the gradient [rubric], and then, any feedback, any domain knowledge, we can directly instruct the model to do this... So this is the first one. The second one is engagable. So that means, if the human experts have the feedback, or you have domain knowledge, you're trying to pull this up, you can directly write in a way, machine learning can understand. So previously, machine learning to understand human experts through the labeled data. Now, everything's based on language."
The Central Question: AI's Role in Education
AI in education should assist, not replace. Probability-based systems require human verification. Transparency and collaboration are essential.
A fundamental point that emerged from the forum discussion: AI's role is to assist, not to replace human decision-making.
Three Principles:
- Assistance over replacement: Based on how AI models are trained (probability-based), they always require human verification
- Transparency: Make the reasoning process visible to humans, provide evidence for decisions
- Human-AI partnership: Collaboration where AI enhances human capability rather than substituting for it
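One concrete shape "assist, not replace" can take is routing low-confidence model outputs to a human. A minimal sketch, where the threshold value and record format are illustrative choices, not prescriptions from the talk:

```python
# Minimal sketch of human-in-the-loop triage: because model outputs are
# probabilities, low-confidence scores go to a teacher for verification.
# The 0.9 threshold and record fields are illustrative assumptions.

def triage(predictions, confidence_threshold=0.9):
    """Split model outputs into auto-accepted vs. needs-human-review."""
    accepted, review = [], []
    for item in predictions:
        bucket = review if item["confidence"] < confidence_threshold else accepted
        bucket.append(item)
    return accepted, review

preds = [{"id": 1, "score": 2, "confidence": 0.97},
         {"id": 2, "score": 1, "confidence": 0.62}]
accepted, review = triage(preds)
# id 1 is auto-accepted; id 2 is flagged for the teacher.
```

The AI still does the bulk of the work, but the final decision on uncertain cases stays with the human, matching the verification principle above.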
Just like self-driving cars—even on highways where the technology is reliable, humans must remain engaged. We hold AI to higher standards than humans, forgiving human mistakes more readily.
Jiliang Tang: "So basically, one thing I want to clarify is what's the role of AI in education? I think AI in education is to assist. It's [not] going to replace the decision humans should make, right? So we, there's no, because based on the nature of the way we train the AI model, it's a probability. Always a probability. Always need a human to verify. So that means probably there's no one field we can totally trust without any human intervention or human involvement. So that's why we emphasize the role of AI is to assist. It's not to replace. It's to enhance."
The paradigm shift isn't just technical—it's relational. For the first time, educational researchers and AI systems can communicate in the same language. This opens possibilities for transparent, adaptable, and human-centered AI in assessment.