At Rutgers University and as the founding CEO of InqITS, Dr. Janice Gobert has built a platform that fundamentally rethinks how we assess students during science inquiry. With over 20 million investigations completed and 100 million interaction logs from 94 countries, InqITS represents one of the largest AI-driven formative assessment systems in science education.

Key stats: 20M+ investigations · 100M+ interaction logs · 94 countries · 170-day transfer effects
The Problem

If you rely only on what a student writes, you're misassessing them 30 to 60% of the time. We have Billy (the science genius who can't express himself) and Johnny (the social loafer who writes great lab reports). ChatGPT is making this even worse.

The Billy vs Johnny Problem

Billy (False Negative)

Science genius who understands everything but struggles to write it down. Gets underestimated by traditional assessment.

Johnny (False Positive)

Social loafer who catches the classroom buzz and writes excellent reports without actually understanding. Gets overestimated.

The Assessment Crisis in Science Education

Traditional assessment methods—multiple choice, lab reports, summative tests—systematically fail to capture what students actually know and can do. ChatGPT is amplifying this problem.

The crisis in science education assessment runs deep. Multiple-choice tests measure surface knowledge, not authentic practices. Lab reports reflect writing ability, not scientific understanding. And with ChatGPT, students can now generate sophisticated explanations without any actual comprehension.

Why Traditional Methods Fail:

  • Multiple-choice tests: Don't capture rich knowledge or authentic practices
  • High-stakes summative tests: Come too late to inform instruction
  • Hands-on labs: A nightmare for teachers to assess—kids work at different paces, lack equipment
  • Lab reports: "Not a good reflection of what they know"
  • ChatGPT moment: What does it even mean to "know" something when AI can generate the explanation?
Original Quote (Forum Transcript)

Janice Gobert: "If you rely only on what a student writes, you're misassessing them between 30 to 60% of the time because you have Johnny's and you have Billy's. Now, ChatGPT is making this even worse because if a student is just like, 'oh, explain to me how the boiling point stays the same even though I add more water,' right, and then is writing that, the teacher's like, 'oh, gee, Johnny knows what he's talking about.' This is the ChatGPT moment because what does it mean to know? What does it mean to learn?"

How InqITS Works: Real-Time Process Assessment

Every mouse click, every widget change, everything the student does is captured. Patented algorithms score them in real-time on NGSS practices—and drive immediate support.

InqITS fundamentally shifts assessment from products (what students write) to processes (what students do). As students work in virtual labs, the system captures every interaction and applies AI in real-time.

The Technical Architecture:

  • Data Capture: Every mouse click, widget change, everything written—captured as high-fidelity log files
  • Algorithm Suite: Machine learning, knowledge engineering, and NLP algorithms applied in real-time
  • Evidence-Centered Design: Sub-competencies defined a priori using learning science literature
  • Real-time Scaffolding: Rex (the AI tutor) intervenes based on assessment, not on-demand
  • Teacher Dashboard: Alerts and tips sent to teachers in real-time
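The capture-then-score loop described above can be sketched in a few lines. Everything here (the event names, the `AssessmentEngine` class, the toy `controls_variables` detector) is a hypothetical illustration under assumed names, not InqITS's patented algorithms:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class LogEvent:
    student_id: str
    action: str        # e.g. "set_widget", "run_trial", "write_hypothesis"
    payload: dict
    timestamp: float

@dataclass
class AssessmentEngine:
    # Each detector maps a student's full event history to a 0-1 estimate
    # of one sub-competency, so scores update on every interaction.
    detectors: dict[str, Callable[[list[LogEvent]], float]]
    log: dict[str, list[LogEvent]] = field(default_factory=dict)

    def capture(self, event: LogEvent) -> dict[str, float]:
        """Store the event, then re-score every sub-competency in real time."""
        history = self.log.setdefault(event.student_id, [])
        history.append(event)
        return {skill: fn(history) for skill, fn in self.detectors.items()}

def controls_variables(history: list[LogEvent]) -> float:
    """Toy detector: fraction of trials that changed exactly one variable."""
    trials = [e for e in history if e.action == "run_trial"]
    if not trials:
        return 0.0
    good = sum(1 for e in trials if len(e.payload.get("changed_vars", [])) == 1)
    return good / len(trials)

engine = AssessmentEngine(detectors={"control_of_variables": controls_variables})
scores = engine.capture(LogEvent("s1", "run_trial", {"changed_vars": ["mass"]}, 0.0))
```

Real detectors would combine machine-learned, knowledge-engineered, and NLP models; the point is that scoring is driven by logged actions, not by a final written product.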

What Gets Assessed:

  • Forming questions and hypotheses
  • Collecting data with simulations
  • Interpreting data with mathematics
  • Warranting claims with evidence
  • Writing claim-evidence-reasoning statements
Original Quote (Forum Transcript)

Janice Gobert: "You have the student on the left. The student is doing a virtual lab aligned to NGSS and as they're working, our infrastructure is garnering every mouse click, every change to the widget, everything they write, absolutely everything. And it's then applying these patented algorithms—machine-learned algorithms, knowledge engineer algorithms, and natural language processing algorithms—in real time to drive support to the virtual tutor who is then in turn supporting the student on an actionable level on the sub-competencies underlying each practice."

Rex: The AI Tutor That Knows When You're Struggling

Not on-demand help—Rex intervenes when the AI detects struggle. Students don't know what they don't know, so we can't wait for them to ask.

Rex (a dinosaur for middle school, a robot for high school) is fundamentally different from typical AI tutors. It doesn't wait for students to ask for help—because they usually don't know what they need help with.

Why Not On-Demand Help:

  • Students don't know what they don't know: A student won't say "Miss, I don't understand what an independent variable is"
  • Gaming the system: Research shows students drill down to get the bottom hint without learning
  • This is what happens with ChatGPT: Students get answers without understanding

How Rex Intervenes:

Rex provides multiple levels of support based on detected difficulties. For example, if a student confuses independent and dependent variables:

  • Why it matters: If you don't understand independent variables, you can't form a question, can't design an experiment, can't interpret data
  • Real-time intervention: Stop the student before they spend two class periods collecting buggy data
  • Productive struggle: Studies determine how long to let students struggle before intervening
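The difference between on-demand help and detection-driven intervention can be made concrete with a toy policy. The struggle signal, the two-minute window, and the hint ladder below are assumptions for this sketch, not Rex's actual tuned behavior:

```python
def should_intervene(struggle_times: list[float], now: float,
                     window_s: float = 120.0, max_signals: int = 3) -> bool:
    """Step in once detected struggle persists past a productive-struggle
    window, instead of waiting for the student to ask for help."""
    recent = [t for t in struggle_times if now - t <= window_s]
    return len(recent) >= max_signals

def next_hint(level: int) -> str:
    """Graduated hints that start conceptual, so a student cannot drill
    straight down to a bottom-out, cookbook answer."""
    ladder = [
        "Think about which variable you are changing on purpose.",
        "The independent variable is the one you change; check your setup.",
        "Vary only one variable, the independent one, on your next trial.",
    ]
    return ladder[min(level, len(ladder) - 1)]

# Three detected variable confusions within two minutes: the tutor steps in
# even though the student never asked for help.
intervene = should_intervene([10.0, 55.0, 100.0], now=110.0)
```

Because the ladder starts conceptual, a student cannot jump straight to the bottom-out hint, which is exactly the gaming-the-system failure mode described above.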
Original Quote (Forum Transcript)

Janice Gobert: "It's not on-demand help because kids generally don't know what they don't know. The problem is that they're not gonna say, 'hey, miss, I don't understand what an independent variable is.' They don't typically understand that. And furthermore, we know from people who study gaming the system, like Ryan Baker, that what happens is kids will drill down and get the bottom-up hint just to find the specific thing they should do, like a cookbook, and they're not doing any learning, right? This is also what's happening with ChatGPT, I fear."

Teacher Dashboard: Real-Time Alerts and TIPS

Teachers receive actionable alerts as students struggle. TIPS (Teacher Inquiry Practice Supports) are evidence-based scripts derived from recording what effective teachers actually say.

Teachers consistently say InqITS "changed their lives" because of the dashboard. It makes NGSS formative assessment actually feasible in real classrooms.

Alert System:

  • Student-level alerts: "John Marcones is struggling with hypothesizing"
  • Class-level detection: "80% of students struggling with variable identification—stop the class"
  • Slow progress alert: Detects carelessness or frustration (validated log signatures)
  • Adjustable thresholds: Be more forgiving early in the year, stricter before state tests
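A minimal version of the alert logic might look like the following. The skill names, the 0.5 cutoff, and the 80% class-level share are illustrative assumptions, with the cutoff standing in for the adjustable thresholds teachers control:

```python
def student_alerts(scores: dict[str, dict[str, float]],
                   cutoff: float = 0.5) -> list[str]:
    """One alert per student per sub-competency scoring below the cutoff.
    Raise the cutoff before state tests; lower it early in the year."""
    return [f"{student} is struggling with {skill}"
            for student, skills in scores.items()
            for skill, value in skills.items() if value < cutoff]

def class_alert(scores: dict[str, dict[str, float]], skill: str,
                cutoff: float = 0.5, share: float = 0.8) -> bool:
    """True when most of the class (default 80%) is below the cutoff on one
    skill, signalling the teacher to stop and reteach."""
    below = [s for s in scores.values() if s.get(skill, 1.0) < cutoff]
    return bool(scores) and len(below) / len(scores) >= share

roster = {
    "Billy":  {"hypothesizing": 0.9, "variable_identification": 0.3},
    "Johnny": {"hypothesizing": 0.2, "variable_identification": 0.4},
}
alerts = student_alerts(roster)
stop_class = class_alert(roster, "variable_identification")
```

Here both students fall below the cutoff on variable identification, so the class-level alert fires while hypothesizing only produces a per-student alert.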

TIPS - Teacher Inquiry Practice Supports:

Derived from an innovative study: recorded what teachers said to struggling students, then analyzed which discourse led to improvement on the next lab.

  • 4 types of tips: Different tips for different students and situations
  • High-flyer: Give smallest breadcrumb—let them figure it out
  • Struggling student: Give instrumental support, then ask "How is this different from what you did?"
Original Quote (Forum Transcript)

Janice Gobert: "I've spent my whole career building student software and no kids, including my own three, ever told me that I changed their life. But teachers tell us that we change their lives because of our dashboard, which does real-time reports, alerts, and tips. And this I'm really excited about because it really changes boots on the ground, the implementation of NGSS."

"What we did was we examined kids who did better on the next lab and then we coded the discourse of what teachers said to them. And then we turned them into these tips. These are called TIPS—Teacher Inquiry Practice Supports. And there's four kinds of different tips. And the teacher can walk over and decide what tip to use for what kind of kid."

Evidence of Efficacy: 170-Day Transfer Effects

NSF-funded studies show transfer effects persisting 170 days after just one unit with Rex. Spanish-speaking multilingual learners show consistent improvement even on tasks without Rex support.

The efficacy research demonstrates that InqITS doesn't just help in the moment—it produces lasting, transferable learning of scientific practices.

Research Findings:

  • State test improvement: Multiple states show improved science test scores
  • Systematic experimentation: Students become more systematic with hands-on experiments after using InqITS
  • Special education: Rex supports learners with IEPs
  • Mathematics transfer: Students transfer math skills associated with science across topics

The 170-Day Study:

A robust NSF-funded study showed effects from one Rex treatment persisting 170 days later. "We taught them something. We definitely taught them something."

Multilingual Learner Study:

Spanish Rex support led to better learning AND better transfer—even to claim-evidence-reasoning, which doesn't have Rex support yet. "The students who had Rex for the earlier inquiry tasks wrote significantly better claim evidence reasoning statements because they understood the practices better."

Original Quote (Forum Transcript)

Janice Gobert: "We showed after one treatment of Rex, the effects still showed up 170 days later. So we taught them something, right? We definitely taught them something. I just had a doctoral student, Jeremy Lee, finish his dissertation on our Spanish support, and kids who got Spanish Rex versus multilingual learners who did not showed consistent and better learning and better transfer. Furthermore, this is really fascinating—we do not have Rex support for claim evidence reasoning yet but the students who had Rex for the earlier inquiry tasks wrote significantly better claim evidence reasoning statements because they understood the practices better and they understood the content better."

Evidence-Centered Design: The Theoretical Foundation

InqITS is built on rigorous theoretical foundations—evidence-centered design, learning science, and science education literature. Not just "throw machine learning at it."

Unlike systems that simply apply machine learning to student data, InqITS was designed theory-first using Evidence-Centered Design (ECD) to define what competencies mean and how to assess them.

Why Theory Matters:

  • Science inquiry is ill-defined: Post-hoc knowledge engineering won't work
  • Sub-competencies need definition: Different stakeholders have different ideas about what "data interpretation" means
  • Pedagogical meaning: It's not enough to count mouse clicks—data must be actionable

The Development Process:

  1. Collect data from student interactions
  2. Build text replay clips of student behavior
  3. Tag clips for skills (human-defined competencies)
  4. Define features with domain experts
  5. Build and validate detectors using held-out test sets
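The five steps can be sketched end to end with a single-feature detector. The feature (fraction of controlled trials per clip), the threshold rule, and the 70/30 split are stand-ins for illustration, not the real detector suite:

```python
import random

# Steps 1-3: each replayed clip is reduced to one feature (e.g. the
# fraction of controlled trials it contains) plus a human tag for
# whether the skill was demonstrated.
clips = [(i / 100, int(i / 100 > 0.55)) for i in range(100)]

# Step 5a: hold out a test set the detector never trains on.
random.seed(0)
random.shuffle(clips)
train, held_out = clips[:70], clips[70:]

# Steps 4-5b: "fit" the detector by choosing the feature cutoff that
# best separates human-tagged clips on the training split alone.
def fit_threshold(data):
    def acc(t, rows):
        return sum((x > t) == bool(y) for x, y in rows) / len(rows)
    grid = [i / 100 for i in range(101)]
    return max(grid, key=lambda t: acc(t, data))

threshold = fit_threshold(train)
held_out_acc = sum((x > threshold) == bool(y) for x, y in held_out) / len(held_out)
```

The held-out set plays the role step 5 describes: the detector is judged only on clips it never trained on.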

Domain Specificity:

Computer scientists tend to think knowledge is domain-agnostic—learning geography is like learning science. But there are knowledge ontologies: physics emphasizes different things than biology (structure-function-behavior) or earth science (time scales, spatial information).

Original Quote (Forum Transcript)

Janice Gobert: "It's not good enough to say, 'well, Johnny, who was successful, did 4,000 mouse clicks, and hey, Billy, you only did 2,000 mouse clicks, do more mouse clicks,' right? It's what have you done, and how do you turn that into actionable data? So, we did this using a combination of computer science, learning science, science education, literature, coming together in this very rich way. And I think that's really what you need, to design a good, theoretically grounded system that has a deep infrastructure."

Practices-First: Why NGSS Should Focus on Scientific Skills

A student who learns to form testable hypotheses can do so in any content area. Practices are the most transferable, actionable part of NGSS.

NGSS is three-dimensional (practices, content, crosscutting concepts), but Gobert takes a "practices-first" perspective—arguing that practices are the most actionable and transferable component.

Why Practices First:

  • Transferability: A student who can form testable hypotheses in biology can do so in physics
  • Gets them back in the game: Even in unfamiliar content, solid practices let students proceed
  • Transfer is not a bad word: "Knowledge and use" = transfer. They're not different things.

The Independent Variable Example:

Understanding what an independent variable is and what it means in inquiry is "a huge component" of learning progressions. If you don't get this:

  • You can't form a proper question
  • You can't design an unconfounded experiment
  • You can't interpret your data
  • You can't warrant claims
  • You can't write valid CER statements
Original Quote
Forum Transcript

Janice Gobert: "I take a slightly practices-first perspective, because I believe that is the most actionable thing. So, if a student, in fact, is taught how to form a testable hypothesis, when they move to a new content area that they don't know, they're still able to form a testable hypothesis. You know, if they move to a content area that they don't understand, they can collect bona fide data that's not confounded. So, it gets them back in the game. That's why I like this practices-first perspective."

Key Takeaway

Real-time assessment of scientific practices—not products—combined with intelligent tutoring that intervenes at the moment of struggle, produces lasting learning that transfers across content and persists for months. The AI assesses the process, not just the product.
