Smarter Together: Enhancing Human-AI Collaborative Grading With Teacher-Cognition Multi-Agent LLM Framework

AI Summary of Scholarly Research

This page presents an AI-generated summary of a published research paper. The original authors did not write or review this article. Full details appear in the Disclosure section below.

2026-03-03

Publication Signals show what we were able to verify about where this research was published. Available publication signals for this source were verified. Publication Signals reflect the source’s verifiable credentials, not the quality of the research.

Fewer signals were independently confirmable for this source. That reflects the limits of what’s on record — not a judgment about the research.

✔ Published in indexed journal
✔ No retraction or integrity flags

Overview

This study addresses limitations in automated grading of open-ended short-answer responses, particularly partial-credit attribution, model calibration, and interpretability in resource-constrained educational settings. It introduces the Teacher-Cognition Multi-Agent Grading framework (TC-MAG), which operationalizes teacher decision-making through multiple anchored language model agents. The framework runs five stages in sequence: rubric creation, guideline validation, independent double marking, arbitration, and confidence calibration. It generates an explanation at each stage so that teachers can target their review.
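The marking, arbitration, and confidence stages can be pictured with a minimal sketch. This is not the authors' implementation: the names `grade`, `GradeResult`, `Marker`, and `Arbiter`, the fixed confidence values, and the review threshold are all hypothetical, the agents are plain callables rather than LLM calls, and the rubric creation and guideline validation stages are omitted (the rubric is simply passed in).

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical signatures: each "agent" is just a callable in this sketch.
Marker = Callable[[str, str], int]             # (rubric, answer) -> mark
Arbiter = Callable[[str, str, int, int], int]  # (rubric, answer, mark_a, mark_b) -> mark


@dataclass
class GradeResult:
    mark: int
    confidence: float
    needs_review: bool
    trace: List[str] = field(default_factory=list)  # one explanation per stage


def grade(answer: str, rubric: str, marker_a: Marker, marker_b: Marker,
          arbiter: Arbiter, review_threshold: float = 0.8) -> GradeResult:
    trace = []

    # Stage: independent double marking (neither marker sees the other's mark).
    mark_a = marker_a(rubric, answer)
    mark_b = marker_b(rubric, answer)
    trace.append(f"double marking: A={mark_a}, B={mark_b}")

    # Stage: arbitration, only when the two marks disagree.
    if mark_a == mark_b:
        mark, confidence = mark_a, 0.9  # placeholder confidence, not from the paper
        trace.append("arbitration: skipped, markers agree")
    else:
        mark = arbiter(rubric, answer, mark_a, mark_b)
        confidence = 0.5                # placeholder confidence, not from the paper
        trace.append(f"arbitration: resolved to {mark}")

    # Stage: the confidence score decides whether a teacher should review.
    needs_review = confidence < review_threshold
    trace.append(f"confidence: {confidence:.2f}, needs_review={needs_review}")
    return GradeResult(mark, confidence, needs_review, trace)
```

Keeping one trace entry per stage mirrors the idea of staged explanations: a teacher can see where the two markers diverged and how the disagreement was resolved, instead of reading only a final summary.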

Methods and approach

A motivational preliminary study informed the design of TC-MAG's operational structure. Validation used a dataset of 2,000 primary school student responses to mathematics questions worth 1 to 4 marks, scored against teacher-established gold-standard labels. The multi-agent architecture decomposed grading into discrete modules: rubric generation, compliance checking, two independent markings, conflict resolution via arbitration, and cross-verification with confidence scoring. Quantitative evaluation measured inter-rater reliability using Cohen's kappa for single-mark items and quadratic-weighted kappa for multi-mark items. A mixed-methods teacher study (N=14, mean experience 12.1 years) used structured qualitative analysis to assess explanation format, the effect of confidence scores, and teachers' delegation decisions.
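For readers unfamiliar with the two agreement statistics, the following is a small pure-Python sketch of Cohen's kappa and its quadratic-weighted variant. The function name `cohen_kappa` and the toy marks in the usage note are illustrative assumptions, not material from the paper. Both statistics compare observed disagreement with the disagreement expected by chance; the quadratic weighting penalizes a mark that is off by two more heavily than a mark that is off by one.

```python
def cohen_kappa(rater_a, rater_b, labels, quadratic=False):
    """Cohen's kappa between two raters over an ordered list of labels.

    With quadratic=True, a disagreement between categories i and j costs
    (i - j)^2 / (k - 1)^2; otherwise every disagreement costs 1.
    """
    k = len(labels)
    if k < 2 or len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need at least two labels and equal-length, non-empty ratings")
    index = {label: i for i, label in enumerate(labels)}
    n = len(rater_a)

    # Confusion matrix of counts: rows are rater A, columns are rater B.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted disagreement actually seen
    expected = 0.0  # weighted disagreement expected if raters were independent
    for i in range(k):
        for j in range(k):
            if quadratic:
                weight = (i - j) ** 2 / (k - 1) ** 2
            else:
                weight = 0.0 if i == j else 1.0
            observed += weight * counts[i][j]
            expected += weight * row_totals[i] * col_totals[j] / n
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected
```

As a usage example with made-up marks, `cohen_kappa([0, 1, 1, 0], [0, 1, 0, 0], labels=[0, 1])` scores a single-mark item, and passing `quadratic=True` with `labels=[0, 1, 2, 3, 4]` would suit a multi-mark item under the assumption that marks are integers from 0 to 4.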

Key Findings

TC-MAG achieved κ=0.968 for single-mark items and quadratic-weighted κ=0.936 for multi-mark items, demonstrating deployment-level reliability. Performance exceeded the human teacher baseline by Δκ=+0.063 (p<.001) and outperformed state-of-the-art LLM baselines by at least Δκ=+0.012 (p<.001). The teacher study indicated that explanation format and confidence scores significantly influenced grading delegation decisions. Staged explanations were more diagnostic than summarized formats (positive likelihood ratio 11.5 versus 4.60), suggesting that explanation structure modulates teacher trust and oversight behavior.
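The positive likelihood ratio is the true-positive rate divided by the false-positive rate, so a higher value means a positive signal is more informative. The sketch below computes it from a 2x2 table. This summary does not say how the study defines a positive case, so the reading used in the comments (a teacher flags a grade, and the grade really is wrong) is an assumption, as is the function name.

```python
def positive_likelihood_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """LR+ = sensitivity / (1 - specificity), computed from 2x2 counts.

    Assumed reading for this sketch (not confirmed by the summary):
      tp: teacher flagged a grade that was actually wrong
      fp: teacher flagged a grade that was actually right
      fn: teacher accepted a grade that was actually wrong
      tn: teacher accepted a grade that was actually right
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    if false_positive_rate == 0:
        return float("inf")  # no false alarms: every flag is informative
    return sensitivity / false_positive_rate
```

Under this reading, a ratio of 11.5 would mean a flag is 11.5 times as likely to appear on a wrong grade as on a right one.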

Implications

The TC-MAG framework demonstrates the feasibility of operationalizing pedagogical expertise through multi-agent LLM architectures for automated assessment. Achieving reliability above the human baseline while keeping grades interpretable through staged explanations addresses a critical tension in deployment, where automated systems must preserve teacher agency and oversight. These results support structured multi-agent approaches as intermediate solutions in resource-constrained educational contexts where human grading capacity is limited.

Disclosure

Research title: Smarter Together: Enhancing Human-AI Collaborative Grading With Teacher-Cognition Multi-Agent LLM Framework
Authors: Sanskriti Uma, Surjya Ghosh, Dio Dzaky Achmad Mustaqim
Institutions: Birla Institute of Technology and Science, Pilani, James Cook University Singapore, Universitas Negeri Surabaya
Publication date: 2026-03-03
DOI: https://doi.org/10.1145/3742413.3789130
Disclosure: This post was generated by Claude (Anthropic). The original authors did not write or review this post.
