In my new role as scientific advisor at the French Ministry of Education, I have recently been looking for meta-analyses and RCTs about the impact of AI (and LLMs) in education.
- A recent (preprint, Dec 1) RCT by University of Toronto and Microsoft Research on 1200 participants.
- A UK Department for Education report about LLMs in education
- A meta-analysis about intelligent tutoring systems (ITS)
A recent (preprint, Dec 1) RCT by University of Toronto and Microsoft Research on 1200 participants.
Participants were recruited on Amazon Mechanical Turk and shown SAT-style math questions.
RQ1. When doing practice questions for a math test, how does the type of explanation people receive (answers alone or answers with LLM-generated explanations) affect performance on subsequent test questions?
RQ2. How does the relationship between explanation type and performance change when people i) attempt questions before seeing explanations or ii) see explanations before attempting questions?
LLM-based explanations positively impact learning (relative to seeing only correct answers), regardless of whether participants consulted them before or after attempting practice problems.
An accompanying qualitative analysis revealed that these boosts in performance were indeed due to participants adopting the strategies they were shown, and that exposure to LLM explanations increased the amount people felt they learned and decreased the perceived difficulty of the test problems.
Kumar, Harsh, et al. “Math Education with Large Language Models: Peril or Promise?.” Available at SSRN 4641653 (2023). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4641653
From June 14 to August 23, the UK Department for Education asked teachers how they were using LLMs for education. Last week, it published a very interesting report summarizing the 567 responses it obtained.
More than two-thirds of respondents reported a positive impact. The remaining third reported mixed or negative impact, or noted that it was too early to tell.
Access to GenAI is not a substitute for having a deep reservoir of subject knowledge held in your long-term memory.
GenAI’s ability to quickly generate explanations at varying educational levels could help explain complex concepts to pupils in a way they understand. Struggling students could be quickly identified and supported, while high-performing students could be challenged with more advanced materials.
- freeing up teacher time to focus on teaching
- creating and improving educational resources
- streamlining administrative tasks (emails, summarising meeting minutes, repetitive tasks)
- automating marking and assessment (experimental)
- generating (regular) feedback on students’ work; also, personalized study and revision plans for pupils based on their performance
- lesson and curriculum planning
- live demonstrations (e.g. in food and nutrition lessons, or acting as a scriptwriter for the drama department, where ChatGPT has sparked creativity)
- providing (adaptive) additional educational support (notably for students with special needs)
- enhanced engagement
- improved accessibility and inclusion
- proofreading, editing and improving written content (first drafts)
- supporting coding
- teacher professional development: understanding the latest pedagogical strategies (one respondent suggested creating personalized learning pathways for teachers based on their skills)
- (provided there is an improved access to technology and the Internet)
- over-reliance on GenAI tools (among pupils)
- young people should not access or create harmful or inappropriate content (academic misconduct)
- pupils should understand their personal data is being processed using AI tools
- ensuring pupils’ work is their own
- GenAI tools can produce unreliable or biased information: any content produced requires professional judgement
- (exacerbating the “digital divide”, as some pupils do not have access to devices, or stable Internet)
- Some teachers reported suspected academic malpractice at their institution
- Teachers needed additional time to check for signs of AI use
- Some GenAI outputs were of poor quality, which then took time to correct to a sufficient standard for use
- User knowledge and skills (e.g. awareness of potential applications, or prompt engineering)
- Performance of tools (inaccurate and biased content, “issues such as Americanized spelling” 😂)
- Lack of awareness in the workplace, and skeptical attitudes or fear (some institutions have blocked GenAI tools on devices)
- Data protection adherence: non-compliance with GDPR
- Managing student use: exposure to harmful content
- Access: paywalls to premium versions, institutions banning platforms, accessibility for students with special needs