At the 2025 AIET IN STEM Forum, Professor Jiliang Tang presented a comprehensive overview of AI's evolution and its implications for educational assessment. This post captures the key insights from that session, structured to allow progressive exploration from summary to detailed analysis to actual forum quotes.
The evolution of AI in education mirrors the evolution of AI itself. As AI changed from rule-based to learning-based to generative, so did its potential applications in education—from rigid tutoring systems to adaptive assessment to transparent, explainable feedback.
The Three Stages of AI Evolution
AI as programmed rules: Experts encode domain knowledge into if-then logic. Rigid, brittle, and limited to known scenarios.
In the early stages of AI, the approach was fundamentally about programming intelligence. Domain experts would analyze a problem, extract rules, and hard-code them into a system.
- How it works: Experts provide domain-specific rules (e.g., "If fever + cough + positive test → diagnose influenza")
- Who could build it: Only domain experts, since they needed to articulate all the rules
- Key limitation: Any situation outside the programmed rules causes failure. No ability to handle out-of-distribution cases.
This is why we call them "expert systems"—they can only be created by experts who provide all the domain-specific rules.
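To make the "programmed rules" idea concrete, here is a minimal sketch of an expert system in Python. The rules and symptom names are illustrative (loosely following the influenza example above), not an actual system from the talk:

```python
# Minimal sketch of an expert system: hand-coded if-then rules.
# Rules and symptom names are illustrative, not a real diagnostic system.

def diagnose(symptoms: set) -> str:
    # Each rule was articulated by a domain expert and hard-coded.
    if {"fever", "cough", "positive_test"} <= symptoms:
        return "influenza"
    if {"sneezing", "runny_nose"} <= symptoms:
        return "common cold"
    # Any input outside the programmed rules falls through: the system
    # has no way to handle out-of-distribution cases.
    return "unknown"

print(diagnose({"fever", "cough", "positive_test"}))  # influenza
print(diagnose({"fatigue"}))                          # unknown
```

The `return "unknown"` branch is the brittleness in miniature: a new situation that no expert anticipated produces no useful answer.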
Jiliang Tang: "What's AI? At any stage, AI, we're trying to, how can we make AI think? How can we make AI work? What we are doing is just to express the program. So we tell a lot of rules. We gather those rules from experts. And then we program it. And then once the situation, the input come in, we just go through those rules... Of course, people, I mean, the users of this expert system think the machine can think. But actually it cannot because all those rules are hard-coded. And if there are some new situations occur, the system will fail."
Instead of programming rules, we show examples. The machine learns to recognize patterns—just like teaching a child what a cat looks like.
The breakthrough of machine learning was a fundamental shift: let the machine learn rules from examples rather than having experts program them directly.
- How it works: Show many labeled examples (cat/not cat), and the machine learns to identify patterns
- Who could build it: Normal users could train models—just provide examples
- Key formula: AI = Algorithm + Data
- Analogy: This mirrors how we teach children—we don't explain rules for identifying cats, we show many examples
Data becomes central. The quality and quantity of labeled examples determine the model's capability.
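The "learn rules from examples" shift can be sketched with a tiny nearest-centroid classifier. The feature names and values are toy placeholders standing in for real image features; the point is that the decision rule is computed from labeled data, never written by hand:

```python
# Minimal sketch of the learning-based paradigm: the machine infers a
# rule from labeled examples instead of being handed one.
# Features (ear_pointiness, whisker_length) are invented toy values.

def train_centroids(examples):
    """Average the feature vectors per label (a tiny nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose learned centroid is closest (squared distance)."""
    return min(centroids, key=lambda lab: sum(
        (a - b) ** 2 for a, b in zip(centroids[lab], features)))

# AI = Algorithm + Data: the same algorithm, different examples, new "rules".
data = [([0.9, 0.8], "cat"), ([0.8, 0.9], "cat"),
        ([0.1, 0.2], "not cat"), ([0.2, 0.1], "not cat")]
model = train_centroids(data)
print(predict(model, [0.85, 0.75]))  # cat
```

Note that no one told the model what makes a cat a cat; with different labeled examples, the identical code would learn a different concept.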
Jiliang Tang: "So instead of giving AI the rules, we just show examples. And we want AI to learn those rules or summarize those rules from examples... Why do we think this can work? Because this is similar to how we teach our kids. For example, we want the kids to learn what is cat. We don't tell all kinds of rules to identify a cat, but we just show a lot of examples to the kids. Okay, this is a dog, this is a cat. So if we show a lot of these kinds of examples, the kids will learn."
From classification to generation. Models don't just categorize—they create sentences, images, code. Emergent abilities appear at scale.
The current era represents another paradigm shift: from classification to generation. Instead of "cat or not cat," AI now creates—sentences, emails, images, code.
Three Training Stages of Large Models:
- Pre-training: Learn from virtually all internet data using unsupervised learning (predicting next word). The model becomes "knowledgeable" but not yet useful.
- Supervised Fine-tuning: Teach specific tasks (e.g., "summarize this article") with labeled examples
- Alignment: Ensure values match human values (e.g., don't teach how to make bombs)
The surprise is emergent abilities—capabilities that appear unexpectedly as model size increases: summarization, translation, programming, image generation.
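The pre-training objective itself is simple to illustrate: predict the next word from the text alone. Here a count-based bigram model on a toy corpus stands in for a large neural network trained on internet-scale data:

```python
# Minimal sketch of the pre-training objective: next-word prediction.
# A count-based bigram model stands in for a large neural network;
# the corpus is a toy placeholder, not real pre-training data.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# "Unsupervised" learning: the training signal is just the text itself --
# each word is the label for the words that precede it.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent continuation seen during training."""
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # cat
```

Supervised fine-tuning and alignment then reshape what this next-word predictor produces; the emergent abilities described above are not captured by any toy model and appear only at scale.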
Jiliang Tang: "So if you can imagine, so the old machine learning, we just do the classification. So cat or not cat. But now we are entering into the generative AI. Generation. Everything is generation. Instead of cat or not cat, we create a sentence, write an email, create a new picture of cat... The pre-training is just like a student. Before you choose a major, you learn the foundations. You learn everything. You are not a PhD in math, PhD in biology. You just learn everything."
Impact on Educational Assessment: Auto-Grading Case Study
Experts create rubrics → Annotators label data → Train model → Predict. Human expertise flows through data, not direct communication.
Traditional auto-grading treated scoring as a classification problem: map input (student response) to output (score category) using labeled training data.
The Pipeline:
- Experts build rubrics
- Annotators (trained with rubrics) label student responses
- Train classification model on labeled data
- Model predicts scores on new responses
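The pipeline above can be sketched end to end. Note where the rubric lives: only in the annotators' heads. The model sees nothing but labeled text. Responses and scores here are invented for illustration:

```python
# Minimal sketch of classification-style auto-grading: expert knowledge
# reaches the model only through labeled data. Examples are invented.

# Steps 1-2: experts write a rubric; annotators apply it, producing labels.
labeled = [("photosynthesis converts light energy", 2),
           ("plants use light", 1),
           ("i dont know", 0)]

# Step 3: "train" a model -- here, word-score associations learned
# purely from the labels, with no access to the rubric itself.
word_scores = {}
for text, score in labeled:
    for w in text.split():
        word_scores.setdefault(w, []).append(score)

def predict_score(response: str) -> int:
    """Step 4: score a new response from the learned associations."""
    seen = [s for w in response.split() if w in word_scores
            for s in word_scores[w]]
    if not seen:  # nothing learned applies: the model fails silently
        return 0
    return round(sum(seen) / len(seen))

print(predict_score("photosynthesis converts light"))  # 2
```

If an expert spots a mistake, there is no lever to pull on `predict_score` directly; the only fix is to relabel `labeled` and retrain, which is exactly the feedback gap described below.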
Two Critical Problems:
- No direct communication: Experts cannot interact directly with the algorithm—everything flows through annotators and labeled data
- Feedback gap: If the model makes mistakes, experts can't directly correct it. They must relabel data and retrain.
Jiliang Tang: "So the problem, actually, so there are two problems. So for example, the first one is, actually, it's very hard for the human to interact with the algorithm, for the human experts to interact with the algorithm. It's because everything, if you check the information flow, everything is through the annotators... If the algorithm has problems, how can we get the feedback from a human, right? We cannot directly get feedback from a human. We must go through, we need to label it again."
Remove the annotation bottleneck. AI directly understands rubrics through language. Experts and AI communicate directly.
With generative AI, we can remove the annotation step entirely. The model directly understands rubrics, and experts can directly communicate feedback.
Three Key Advantages:
- Generalization: Change rubrics, and the model immediately applies to new domains, problems, and disciplines
- Engagability: Human experts can directly instruct the model with domain knowledge—no data labeling required
- Transparency: The reasoning process is visible in natural language
This enables new capabilities: uncertainty quantification (telling teachers when to get involved), human-in-the-loop verification, and direct knowledge injection.
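In the generative setting, the rubric and expert feedback go straight into the model's input as language. A minimal sketch, where `build_grading_prompt` is an illustrative helper and the actual model call is left as a placeholder for whatever API is used:

```python
# Minimal sketch of rubric-driven grading with a generative model:
# the rubric is given to the model directly, with no annotation step.
# Rubric text and expert notes are hypothetical examples.

def build_grading_prompt(rubric: str, expert_notes: str, response: str) -> str:
    return (
        "You are grading a student response.\n"
        f"Rubric:\n{rubric}\n"
        f"Expert guidance:\n{expert_notes}\n"   # direct knowledge injection
        f"Student response:\n{response}\n"
        "Give a score, your reasoning, and a confidence level."
        # Asking for reasoning and confidence supports transparency
        # and uncertainty quantification.
    )

prompt = build_grading_prompt(
    rubric="2 pts: names the energy conversion; 1 pt: partial; 0 pts: none.",
    expert_notes="Accept 'light' or 'solar' as the energy source.",
    response="Plants turn sunlight into chemical energy.",
)
# result = call_llm(prompt)  # placeholder for the actual model API
```

Swapping in a different rubric retargets the grader to a new problem or discipline with no relabeling and no retraining, which is the generalization advantage listed above.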
Jiliang Tang: "Can we just remove the annotation part, right? So can we learn the model, understand, directly understand the gradient [rubric], and then, any feedback, any domain knowledge, we can directly instruct the model to do this... So this is the first one. The second one is engagable. So that means, if the human experts have the feedback, or you have domain knowledge, you're trying to pull this up, you can directly write in a way, machine learning can understand. So previously, machine learning to understand human experts through the labeled data. Now, everything's based on language."
The Central Question: AI's Role in Education
AI in education should assist, not replace. Probability-based systems require human verification. Transparency and collaboration are essential.
A fundamental point that emerged from the forum discussion: AI's role is to assist, not to replace human decision-making.
Three Principles:
- Assistance over replacement: Based on how AI models are trained (probability-based), they always require human verification
- Transparency: Make the reasoning process visible to humans, provide evidence for decisions
- Human-AI partnership: Collaboration where AI enhances human capability rather than substituting for it
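One concrete shape "assist, not replace" can take is routing low-confidence model outputs to a human. A minimal sketch, where the threshold value and record format are illustrative choices, not prescriptions from the talk:

```python
# Minimal sketch of human-in-the-loop triage: because model outputs are
# probabilities, low-confidence scores go to a teacher for verification.
# The 0.9 threshold and record fields are illustrative assumptions.

def triage(predictions, confidence_threshold=0.9):
    """Split model outputs into auto-accepted vs. needs-human-review."""
    accepted, review = [], []
    for item in predictions:
        bucket = review if item["confidence"] < confidence_threshold else accepted
        bucket.append(item)
    return accepted, review

preds = [{"id": 1, "score": 2, "confidence": 0.97},
         {"id": 2, "score": 1, "confidence": 0.62}]
accepted, review = triage(preds)
# id 1 is auto-accepted; id 2 is flagged for the teacher.
```

The AI still does the bulk of the work, but the final decision on uncertain cases stays with the human, matching the verification principle above.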
Just like self-driving cars—even on highways where the technology is reliable, humans must remain engaged. We hold AI to higher standards than humans, forgiving human mistakes more readily.
Jiliang Tang: "So basically, one thing I want to clarify is what's the role of AI in education? I think AI in education is to assist. It's [not] going to replace the decision humans should make, right? So we, there's no, because based on the nature of the way we train the AI model, it's a probability. Always a probability. Always need a human to verify. So that means probably there's no one field we can totally trust without any human intervention or human involvement. So that's why we emphasize the role of AI is to assist. It's not to replace. It's to enhance."
The paradigm shift isn't just technical—it's relational. For the first time, educational researchers and AI systems can communicate in the same language. This opens possibilities for transparent, adaptable, and human-centered AI in assessment.