Publishing process signals: MODERATE (reflects the venue and review process).

GenAI matched human scoring of students’ scientific inquiry understanding

Research area: Pedagogy, Education, Educational Strategies and Epistemologies

What the study found

Generative AI (GenAI), when carefully designed and prompted, was able to score elementary students’ responses about scientific inquiry with high agreement with human raters. The study also found that the AI-generated feedback was generally appropriate and often strong in supporting student agency, although some Korean wording reduced clarity.

Why the authors say this matters

The authors conclude that GenAI may function as a dialogic partner for helping students build epistemic understanding, meaning understanding of how knowledge is developed in scientific inquiry. They also suggest this work offers new pathways for formative assessment and teacher education in supporting students’ understanding of the nature of scientific inquiry.

What the researchers tested

The researchers collected 560 responses from 80 fourth-grade students in Korea using the Korean version of the Views About Scientific Inquiry for Elementary school students (VASI-E). They used prompt engineering strategies in ChatGPT-4o to create GenAI prompts for scoring responses as naive, mixed, or informed, and for generating feedback aligned with established epistemic frameworks.

What worked and what didn't

GenAI scoring showed high agreement with human raters, with overall kappa at 0.825 and item-level kappa values ranging from 0.606 to 0.923. The generated feedback scored 2.75 out of 3 on learner-centered quality, with particular strength in future impact, sensemaking, and student agency; however, some Korean phrasing caused clarity problems.
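For readers unfamiliar with kappa, the sketch below shows how an agreement statistic of this kind can be computed for three-category scores. It assumes unweighted Cohen's kappa; the summary does not say which kappa variant the authors used. The function name `cohen_kappa` and the ten example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of responses")
    n = len(rater_a)

    # Observed agreement: share of responses given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten responses (illustration only, not study data).
human = ["naive", "naive", "mixed", "mixed", "informed",
         "informed", "informed", "naive", "mixed", "informed"]
genai = ["naive", "naive", "mixed", "informed", "informed",
         "informed", "informed", "naive", "mixed", "mixed"]

kappa = cohen_kappa(human, genai)
print(round(kappa, 3))  # 0.697
```

In this made-up example the raters agree on 8 of 10 responses (0.80), while chance agreement from the marginals is 0.34, giving kappa = (0.80 - 0.34) / (1 - 0.34), about 0.697. Kappa is lower than raw agreement because it discounts matches that would be expected by chance.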

What to keep in mind

The abstract does not describe limitations beyond the clarity problems caused by some Korean phrasing. The findings also apply only to fourth-grade students in Korea and to the VASI-E instrument used in this research.

Key points

GenAI scored students’ scientific inquiry responses with high agreement with human raters (κ = 0.825).
Item-level agreement ranged from κ = 0.606 to 0.923 across questions.
AI-generated feedback was generally appropriate, with an average learner-centered score of 2.75 out of 3.
The feedback was especially strong in supporting student agency, future impact, and sensemaking.
Some Korean phrasing reduced the clarity of the generated feedback.

Disclosure

Research title: GenAI matched human scoring of students’ scientific inquiry understanding
Publication date: 2026-03-10
DOI: 10.1007/s10956-026-10298-5

AI provenance: AI provenance information is not available for this post.
