LEAD Graduate School & Research Network

26.01.2026

AI in the Classroom

Can an Algorithm Recognize High‑Quality Teaching?

For years, teaching quality has been considered a key factor in students’ academic success. But how can the quality of instruction be captured in everyday school practice in a way that provides teachers with timely feedback? Until now, researchers have relied on student surveys, teacher judgments, or classroom observations. These methods are expensive, time‑consuming, and not always reliable. An international study involving Dr. Tim Fütterer and Prof. Dr. Ulrich Trautwein from the Hector Research Institute of Education Sciences and Psychology therefore explores a new approach: Can AI algorithms automatically assess teaching quality?
 

The idea is appealing: Instead of human observers, algorithms would analyze classroom videos and estimate the quality of the teaching they show. “Automated procedures have the potential to make classroom analysis not only more efficient but also more objective,” says lead author Tim Fütterer. The study draws on data from the international TALIS Video Study, which includes video recordings of mathematics lessons in Germany that were evaluated by trained observers. These human ratings served as the reference point for the AI models.

The researchers used multimodal data (video, audio, and transcripts) and employed AI algorithms to predict 18 subdimensions of teaching quality, which map onto the three core dimensions: classroom management, student support, and cognitive activation. The team then examined how reliably the automated assessments matched human judgments, whether experts found them plausible, and whether they predicted student achievement.

AI Excels at Language‑Based Features

Overall, the AI algorithms achieved accuracy levels comparable to human ratings. In 11 of the 18 subdimensions, automated scores were closer to the “true” value than human judgments. Text‑ and audio‑based models performed especially well on aspects such as discourse quality or teacher feedback. Here, the algorithms appeared able to detect subtle linguistic patterns—such as question structures or the depth of explanations—that humans often overlook.
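
One way to read “closer to the ‘true’ value” is as a smaller average error against a reference score. The short Python sketch below illustrates that idea under assumptions that are not taken from the study: the error measure (mean absolute error), the function names, and all numbers are hypothetical and chosen only for illustration. The published paper may define and compute the comparison differently.

```python
from statistics import mean


def mean_absolute_error(scores, reference):
    """Average absolute gap between a rater's scores and the reference scores."""
    if len(scores) != len(reference):
        raise ValueError("scores and reference must have the same length")
    return mean(abs(s - r) for s, r in zip(scores, reference))


def closer_to_reference(ai_scores, human_scores, reference):
    """Return 'ai', 'human', or 'tie', depending on who has the smaller error."""
    ai_error = mean_absolute_error(ai_scores, reference)
    human_error = mean_absolute_error(human_scores, reference)
    if ai_error < human_error:
        return "ai"
    if human_error < ai_error:
        return "human"
    return "tie"


# Invented ratings for four lessons on one subdimension (not study data).
reference = [3.0, 2.5, 3.5, 2.0]      # assumed stand-in for the "true" value
ai_scores = [3.0, 2.5, 3.0, 2.5]      # hypothetical automated scores
human_scores = [2.0, 3.0, 3.0, 3.0]   # hypothetical single-observer scores

print(closer_to_reference(ai_scores, human_scores, reference))
```

Repeating such a comparison for each of the 18 subdimensions and counting how often the automated score wins would give a tally of the kind reported above, again assuming this simplified error measure.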

The plausibility of the AI-generated scores was also tested. An interesting finding: Experts frequently judged the automated assessments to be just as plausible as human ones—and in some cases even more so.

Results were mixed when it came to predicting student achievement. Neither human nor AI ratings consistently showed strong associations with students’ demonstrated subject‑matter performance. AI and human raters also did not always assess the same lesson in the same way; in some areas, the AI rated teaching quality higher than the human observers did. Where systematic links with students’ mathematics performance did appear, however, they appeared only for the AI‑based assessments, not for the human raters’ judgments. Independent experts also found the AI ratings more plausible than the human assessments.

What Does This Mean for Practice?

The study demonstrates that automated analyses are feasible and could complement traditional methods, especially in research contexts where large datasets have so far required considerable resources to evaluate. The next step is to optimize the AI algorithms and provide teachers with a free app that offers timely feedback on their teaching quality. This work is currently being pursued by Tim Fütterer and colleagues in the ETQ‑AI project, with the first test version scheduled for pilot use in classrooms starting in early February. In parallel, the team is addressing several open questions—for example regarding the explainability of AI judgments, the quality of the underlying multimodal data (e.g., audio quality), and the reliability of the “ground truth,” meaning the human ratings on which the models are currently trained.

“Our AI approach is promising, and right now we’re doing everything we can to make it ready for real‑world use so that all teachers can benefit from private, individualized feedback,” emphasizes Tim Fütterer.

Overall, the study marks an important step: It shows that AI‑based procedures are not only theoretically feasible but already match human observers in accuracy in several areas. This opens up new perspectives not only for practice but also for educational research—for example, regarding the feasibility of large‑scale studies or deeper insights into the development of teaching quality.

Publication

Fütterer, T., Hou, R., Bühler, B., Bozkir, E., Bell, C., Kasneci, E., Gerjets, P., & Trautwein, U. (2026). Validating automated assessments of teaching effectiveness using multimodal data. Learning and Instruction, 101, 102264. https://doi.org/10.1016/j.learninstruc.2025.102264

See also:

Fütterer, T., Goldberg, P., Bühler, B., Sikimić, V., Trautwein, U., Gerjets, P., Stürmer, K., & Kasneci, E. (2025). Artificial intelligence in classroom management: A systematic review on educational purposes, technical implementations, and ethical considerations. Computers and Education: Artificial Intelligence, 100483. https://doi.org/10.1016/j.caeai.2025.100483


Media Contact:

Philipp Sigle
presse@lead.uni-tuebingen.de