What the study found
Generative AI (GenAI), when carefully designed and prompted, scored elementary students’ responses about scientific inquiry in high agreement with human raters. The AI-generated feedback was generally appropriate and often strong in supporting student agency, although some Korean wording reduced its clarity.
Why the authors say this matters
The authors conclude that GenAI may function as a dialogic partner that helps students build epistemic understanding, that is, an understanding of how knowledge is developed in scientific inquiry. They also suggest the work opens new pathways for formative assessment and teacher education aimed at supporting students’ understanding of the nature of scientific inquiry.
What the researchers tested
The researchers collected 560 responses from 80 fourth-grade students in Korea using the Korean version of the Views About Scientific Inquiry for Elementary school students (VASI-E). They used prompt engineering strategies in ChatGPT-4o to create GenAI prompts for scoring responses as naive, mixed, or informed, and for generating feedback aligned with established epistemic frameworks.
What worked and what didn't
GenAI scoring showed high agreement with human raters: overall kappa was 0.825, and item-level kappa values ranged from 0.606 to 0.923. The generated feedback scored 2.75 out of 3 on learner-centered quality and was particularly strong in future impact, sensemaking, and student agency. However, some Korean phrasing caused clarity problems.
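To make the agreement figures concrete, here is a minimal sketch of how a kappa statistic can be computed for two raters who use the three categories naive, mixed, and informed. It assumes unweighted Cohen's kappa; the summary does not say which kappa variant the study used, and the function name and the sample labels below are hypothetical illustrations, not study data.

```python
from collections import Counter


def cohens_kappa(human, ai):
    """Unweighted Cohen's kappa for two equal-length lists of category labels.

    Assumption: the study may have used a different variant (for example,
    a weighted kappa); this is only an illustration of the general idea.
    """
    if len(human) != len(ai) or not human:
        raise ValueError("both raters must score the same non-empty set of responses")

    n = len(human)
    # Observed agreement: share of responses given the same label by both raters.
    observed = sum(h == a for h, a in zip(human, ai)) / n

    # Chance agreement: for each label, the product of the two raters' marginal rates.
    human_counts = Counter(human)
    ai_counts = Counter(ai)
    labels = set(human) | set(ai)
    expected = sum(human_counts[label] * ai_counts[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten responses (not data from the study).
human_scores = ["naive", "naive", "mixed", "mixed", "informed",
                "informed", "informed", "naive", "mixed", "informed"]
ai_scores = ["naive", "naive", "informed", "mixed", "informed",
             "informed", "informed", "naive", "mixed", "informed"]

print(round(cohens_kappa(human_scores, ai_scores), 3))
```

Kappa corrects raw percent agreement for the agreement two raters would reach by chance, which is why it is lower than the simple share of matching labels.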
What to keep in mind
The abstract does not describe limitations beyond the clarity issues caused by Korean phrasing. The findings are limited to fourth-grade students in Korea and to the VASI-E instrument used in this research.
Key points
- GenAI scored students’ scientific inquiry responses with high agreement with human raters (κ = 0.825).
- Item-level agreement ranged from κ = 0.606 to 0.923 across questions.
- AI-generated feedback was generally appropriate, with an average learner-centered score of 2.75 out of 3.
- The feedback was especially strong in supporting student agency, future impact, and sensemaking.
- Some Korean phrasing reduced the clarity of the generated feedback.
Disclosure
- Research title: GenAI matched human scoring of students’ scientific inquiry understanding
- Publication date: 2026-03-10