Despite the controversy, teachers and students should collaborate to maximize the benefits of incorporating ChatGPT into education
OpenAI released ChatGPT in November 2022. The AI chatbot was created to answer users’ questions with human-like responses. Some students, mostly in high school and college, then started using ChatGPT to finish their homework, including their essays.
How Good is ChatGPT?
There are two different Generative Pre-trained Transformers (GPTs): GPT-3.5 and GPT-4. OpenAI tested both of them to observe the gap between the two versions, as GPT-4 is the successor to GPT-3.5. Business Insider reports the tests GPT-4 (with vision) and GPT-3.5 took and the scores they received.
One of the tests was the SAT, which has two parts: the Reading and Writing section and the Math section. In Reading and Writing, GPT-3.5 scored 670 out of 800, placing in the 87th percentile, while GPT-4 scored 710 out of 800, placing in the 93rd. In Math, GPT-3.5 placed in the 70th percentile, while GPT-4 received 700 out of 800, placing in the 89th. Overall, GPT-4 received 1410 out of 1600, substantially higher than the 2021 average SAT score of 1060 reported by the College Board.
On the Graduate Record Examinations (GRE), GPT-3.5 scored in the 63rd percentile on the verbal section, the 25th percentile on the quantitative section and the 54th percentile on the writing section. GPT-4, on the other hand, scored in the 99th percentile on the verbal section, the 80th percentile on the quantitative section and the same 54th percentile on the writing section.
The USA Biology Olympiad (USABO) is a difficult exam for the brightest biology students in the country. GPT-3.5 received 43 out of 150 on the 2020 USABO Semifinal Exam, scoring in the 31st to 33rd percentile, while GPT-4 received 87 out of 150, placing in the 99th to 100th percentile.
In Advanced Placement (AP) examinations, administered by the College Board, high school students take college-level courses and receive a score between one and five, with five being the best. These are the scores that GPT-3.5 received:
5: AP Art History, AP Environmental Science and AP Psychology
4: AP Biology, AP Microeconomics, AP US Government, AP US History and AP World History
3: AP Physics 2 and AP Statistics
2: AP Chemistry, AP English Language and Composition, AP English Literature and Composition and AP Macroeconomics
1: AP Calculus BC
Meanwhile, GPT-4 received the following scores:
5: AP Art History, AP Biology, AP Environmental Science, AP Macroeconomics, AP Microeconomics, AP Psychology, AP Statistics, AP US Government and AP US History
4: AP Calculus BC, AP Chemistry, AP Physics 2 and AP World History
2: AP English Language and Composition and AP English Literature and Composition
Based on these results, GPT-4 performed as well as or better than GPT-3.5 in every single course.
Finally, on the American Mathematics Competitions (AMC) 10 and 12, there were some surprising results. GPT-3.5 received 36 out of 150 on the AMC 10, whereas GPT-4 with vision received 30 out of 150. However, without vision, GPT-4 also scored 36 out of 150. On the AMC 12, GPT-3.5 received 30 out of 150, while GPT-4 with vision received 60 out of 150.
These are only some of the tests and exams that GPT-3.5 and GPT-4 took. To see all of the tests and scores, including the differences when GPT-4 has or doesn’t have vision, view the report OpenAI published with the full results and other information about GPT-3.5 and GPT-4.
Why People Shouldn’t Use ChatGPT to Write Their Essays
According to CNN, some teachers catch their students cheating, or at least using ChatGPT to write their essays. Though students may believe ChatGPT will earn them a good score, the opposite is often true, as teachers can frequently tell when a student has used ChatGPT.
One obvious giveaway is an essay that looks suspiciously perfect. ChatGPT rarely makes errors in grammar, spelling or any other convention. However, excellent grammar is a flawed indicator of a ChatGPT-written essay, as the work could also come from a college student with strong writing skills.
Another important giveaway is the lack of a human voice in the writing. Some college professors say that ChatGPT sounds like a “50-year-old compliance lawyer,” while others say its essays lack vibrancy, an authentic human voice and lived experience.
The last and most obvious giveaway is that ChatGPT cites false research. Its citations may look real, but because ChatGPT cannot search the internet, it often fabricates sources. It may also present inaccurate or outright false information.
Within this context, it is safe to say that even if students get good scores using ChatGPT, they will never learn. If students continuously rely on ChatGPT to write essays, they will never be able to write essays themselves, so when they receive an assignment with different parameters, such as a research paper or a timed essay, they will not be as successful.
The Problem
Despite the giveaways mentioned earlier, some teachers struggle to detect ChatGPT’s writing. For example, the AI might not cite any research at all, leaving teachers nothing to fact-check, and when some information is incorrect, it can be mistaken for ordinary human error or a simple misunderstanding.
The YouTube channel Stay Tuned ran an experiment with the host’s favorite former high school teacher, Topich. Nine high school juniors, all in AP classes, wrote a two-to-three-page essay in response to a foreign-affairs article on Chinese economic coercion. Four of the nine students cheated with ChatGPT: two could not edit the essay ChatGPT wrote and could only refine it with further prompts; one could change the essay, but only a little; and the last could rewrite the essay entirely, though still partly by cheating. Topich then had to judge whether each essay was written by ChatGPT or by a student. Reviewing all nine essays, he was fooled six times, identifying only three of the nine correctly.
The Solution
Despite these difficulties, there are a few ways teachers can check more reliably whether an AI wrote an essay.
According to Stay Tuned, OpenAI has created a program that can detect whether an essay was written by ChatGPT. According to Ben Miles on his YouTube channel, there are also other programs, such as GPTZero, that claim to spot text created by ChatGPT. Researchers have also reported that OpenAI is working on a cryptographic watermark so that ChatGPT-generated text can be identified even after it is copied and pasted.
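OpenAI has not published how its watermark would work, but one widely discussed approach, a statistical “green list” watermark, can be sketched in a few lines. The sketch below is purely illustrative, not OpenAI’s actual design: the function names, the GREEN_FRACTION constant and the hashing scheme are all assumptions. The idea is that the generator deterministically prefers certain words after each word, and a detector recomputes those preferences and counts how often the text matches them.

```python
import hashlib
import random

# Illustrative sketch only; not OpenAI's actual (unpublished) watermark.
# Assumption: the generator marks a deterministic "green list" of words
# after each word and prefers green words. A detector recomputes the
# lists and measures how often the text landed on them.

GREEN_FRACTION = 0.5  # hypothetical share of the vocabulary marked green


def is_green(prev_word: str, word: str) -> bool:
    """Deterministic pseudo-random test: is `word` green after `prev_word`?"""
    seed = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return random.Random(seed).random() < GREEN_FRACTION


def green_score(text: str) -> float:
    """Fraction of words that fall on their green list.

    Ordinary human text should score near GREEN_FRACTION; text from a
    watermarking generator should score noticeably higher.
    """
    words = text.split()
    if len(words) < 2:
        return 0.0
    hits = sum(is_green(p, w) for p, w in zip(words, words[1:]))
    return hits / (len(words) - 1)


if __name__ == "__main__":
    essay = "ChatGPT was released by OpenAI in November 2022"
    print(f"green-word fraction: {green_score(essay):.2f}")
```

A real deployment would work on model tokens rather than words and use a proper statistical test rather than a raw fraction, but the core idea, a secret and recomputable bias that survives copying and pasting, is the same.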
Ben Miles also states that teachers can use ChatGPT as a helper, not a doer. If teachers start using ChatGPT themselves, students will gain a different perspective on the tool and learn to use it differently.
Should students use ChatGPT for essays?
The answer to this question is both “yes” and “no.” If teachers say not to use ChatGPT at all, students should not use the tool, because following their instructor comes first. However, if teachers forbid copying and pasting ChatGPT text but authorize using the tool, students should take advantage of that.
The problem is how students use ChatGPT, not that they use it at all. Students need to learn that ChatGPT was not made to take over their responsibilities; rather, it was made to help students, teachers and everyone else.
Teachers should gradually incorporate AI into their classrooms, educating their students on how to use ChatGPT effectively. To do so, they first need to gain familiarity with ChatGPT themselves, so they can spot essays written entirely by the chatbot.
In conclusion, both teachers and students must act. Teachers should start teaching students how to use ChatGPT and other AI programs well, and students need to continue putting in the necessary effort, utilizing ChatGPT as a tool, not a crutch.