ChatGPT fails IAS Exam
(4 min read)
IAS Exam!
The title of this piece is purposely
sensational, to entice you to read further. Of course, as you know, there is
no such thing as the IAS exam, but most people nevertheless use that name for the
Civil Services Examination (CSE) conducted every year by the UPSC for about 1000
Grade I & II posts.
You may have heard the story, possibly
apocryphal, of the parents of a young man from Bihar who visited the home of a
potential bride for their ladla (darling son). The girl’s father asked, ‘Aapke suputra
ajkal kya kar rahein hain?’ (What is your son doing these days?), and the boy’s father
replied, with much pride, ‘Shriman Dilli mein IAS kar rahein hain’ (He is doing IAS in
Delhi). That was truthful, since the young man had
joined a coaching institute, and had been living in Old Rajinder Nagar in a one-room
shanty with a shared bathroom for the last several years.
ChatGPT passes tough Tests
ChatGPT, OpenAI’s chatbot, has been making
news ever since its public release in November 2022. It has bedazzled academics
by passing several tough Tests including:
· Wharton Business School’s MBA exam (though
with a Grade of B or B-)
· University of Minnesota’s law exam
· US Medical Licensing exam, and
· Google’s interview for a Level 3 engineer for
a coding position.
‘ChatGPT on its way to become a
virtual doctor, lawyer, or business analyst’ is the title of a piece in a
leading business daily.
ChatGPT takes CSE Prelim Exam
It was high time to test the AI tool
for the UPSC Civil Services Examination - claimed by the IAS coaching
institutes, those who wrote it and succeeded and those who didn’t, as the
‘toughest test’ in the world.
(Don’t believe that tall claim. At the
end of this piece is a little Test you may like to take to see how easy it is.
Simple, school level math!)
I ran a snap, sample test for the AI
chatbot.
Well, ChatGPT didn’t write the Civil
Services (Preliminary) Examination 2022 conducted by the UPSC; but I coerced it
into taking a Test. I picked five random questions each from General Studies
Paper I and Paper II, and asked ChatGPT to answer them.
Why prolong the suspense? Here is the
result:
ChatGPT (C) Scores
General Studies Paper I:
· Total Questions – 100, Total Marks – 200, Time
– 2 hours
· Each correct answer has a value of 2 marks
· Each incorrect answer has a penalty of (-)
0.66
· Of the 5 Questions asked, C got 4 correct, and
1 incorrect
· Score: 8 – 0.66 = 7.34
· Total marks – 10, hence score is 73.4%
General Studies Paper II:
· Total Questions – 80, Total Marks – 200, Time –
2 hours
· Each correct answer has a value of 2.5 marks
· Each incorrect answer has a penalty of (-) 0.8
· Of the 5 Questions asked, C got 2 correct, and
3 incorrect
· Score: 5 – 2.4 = 2.6
· Total Marks – 12.5, hence score is 20.8%
Result:
· Cut off is based on GSP I score only. C scored
very high in GSP I, and may be among the highest scorers
· However, minimum qualifying marks for GSP II
is 33%, and hence C failed.
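The arithmetic above can be re-checked with a few lines of Python. This is only a minimal sketch: the function name `score` and its parameters are illustrative choices, not anything official from the UPSC, and the marks and penalties are simply the ones listed above.

```python
def score(correct, incorrect, marks_per_question, penalty):
    """Return (net marks, percentage of the maximum for the questions asked)."""
    asked = correct + incorrect
    net = correct * marks_per_question - incorrect * penalty
    maximum = asked * marks_per_question
    return round(net, 2), round(100 * net / maximum, 1)

# Paper I: 4 correct, 1 incorrect, 2 marks each, penalty 0.66
paper_1 = score(correct=4, incorrect=1, marks_per_question=2, penalty=0.66)
# Paper II: 2 correct, 3 incorrect, 2.5 marks each, penalty 0.8
paper_2 = score(correct=2, incorrect=3, marks_per_question=2.5, penalty=0.8)

print(paper_1)  # (7.34, 73.4)
print(paper_2)  # (2.6, 20.8)

# Paper II is qualifying only: at least 33% is needed
print("Qualifies in Paper II:", paper_2[1] >= 33)  # False
```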
(A painting of ChatGPT robot sweating over a tough question. Painting by Prasanna + DALL-E)
CSE Preliminary 2022- Basic Stats
· Cut-off: GSP I for General category candidates
- 43.5% to 50% in previous years; likely cut-off for CSEP 2022 – 45% to
47.5% (90 to 95 marks in GSP I)
· Number of candidates who applied: 11.52 lakhs
· Number of candidates who appeared in Prelim
exam: About 8 lakhs
· Cleared Preliminary Exam: 13090 (13090/8 lakh=1.64%)
· Cleared CSE Main Exam: 2529 (0.32%)
· Total Vacancies: 861 (0.1%)
· Chance of a candidate getting selected for
Civil Services: 0.1%
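The percentages in this list are each stage's count divided by the roughly 8 lakh candidates who appeared. A small sketch of that calculation, assuming exactly 800,000 appeared (the post only says "about 8 lakhs"); the helper name `share_of_appeared` is made up for illustration:

```python
APPEARED = 800_000  # assumption: "about 8 lakhs" taken as exactly 8,00,000

stages = {
    "Cleared Preliminary Exam": 13_090,
    "Cleared CSE Main Exam": 2_529,
    "Total Vacancies": 861,
}

def share_of_appeared(count, total=APPEARED):
    """Percentage of candidates who appeared, rounded to two decimals."""
    return round(100 * count / total, 2)

for stage, count in stages.items():
    print(f"{stage}: {count} ({share_of_appeared(count)}%)")
```

With these inputs the last figure comes to about 0.11%, which the list rounds to 0.1%.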
Why did ChatGPT fail?*
· No match for average human intelligence.
Someone has observed that AI at present has ‘intelligence’ far below that
of a cockroach, and he may be right.
· No strategy. A smart candidate would quickly
scan the Qs from beginning to end, answer the Qs where she is sure she is
correct, and skip the doubtful ones (the golden rule where an incorrect answer earns a
penalty). In GSP I, 50 correct answers out of 100 Qs, with none incorrect, are enough
to clear the cut-off; in GSP II, 27 correct answers out of 80 Qs earn the minimum
qualifying marks. There is no need to try to solve all Qs. Very few candidates have
the time to solve all Qs and get them right.
· No knowledge of previous years’ Qs and cut-offs,
or of the criteria for the cut-off.
· I have analysed ChatGPT’s incorrect answers,
and have a good idea why it goofed up. Should I publish my research like the
Stanford Professor, or wait for OpenAI to make me an attractive offer? Let me
think over it.
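The thresholds quoted in the "No strategy" point can be checked with a short sketch. It assumes the candidate gets no answer wrong (so no penalty applies) and uses the marks per question and the 90 to 95 likely cut-off range given earlier in this piece; `min_correct` is a hypothetical helper name.

```python
import math

def min_correct(target_marks, marks_per_question):
    """Fewest correct answers that reach target_marks, assuming none incorrect."""
    return math.ceil(target_marks / marks_per_question)

# GSP II: qualifying marks are 33% of 200 = 66, at 2.5 marks per question
gsp2_qualifying = 200 * 33 / 100
print(min_correct(gsp2_qualifying, 2.5))  # 27

# GSP I: upper end of the likely cut-off is 95 marks, at 2 marks per question
print(min_correct(95, 2))  # 48

# 50 correct answers in GSP I give 100 marks, above that upper end
print(50 * 2 >= 95)  # True
```

Every incorrect answer pushes these numbers up, which is why skipping doubtful questions matters.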
Disclaimers:
1. My method of random sampling could be
problematic. A better method may be to pick up a Question from each category
such as Indian Economy, Constitution and governance, Indian history,
Environment, Technology, arithmetic, problem solving, comprehension, etc.
2. A larger sample, say of 20 Qs from each Paper,
may throw up a different result.
3. C may perform better if asked to take the full
test (GSP I -100 Qs, GSP II-80 Qs).
4. I did not tell C about the penalty for an
incorrect answer, and the option to pass a question. However, C sometimes says:
I don’t know about that or I can’t answer that question. In my test, it
answered all the questions readily, and explained the basis of each answer.
5. C is handling millions of Qs every minute,
including the test Qs I threw at it. Of course, it runs on neural
networks hosted on supercomputers, and has reportedly been trained on some 300 billion
words of text, more than a human can read and remember in several lifetimes.
6. Once trained on UPSC Question Papers of
previous years, C’s performance may improve.
Take a Test
You are, no doubt, among the brightest
and the best. To reassure yourself that your grey cells are still robust and
agile, would you like to take a Snap Test of a single question which is not too
difficult to solve? A correct answer earns you 2.5 marks, and an incorrect answer
carries a penalty of (-) 0.8; you have 1.5 mins to solve it, and the option to
answer or give it a pass, for which there is no penalty.
Your TIME (1.5 mins) starts now.
Question**
The digits 1 to 9 are arranged in
three rows in such a way that each row contains three digits, and the number
formed in the second row is twice the number formed in the first row; and the
number formed in the third row is thrice the number formed in the first row.
Repetition of digits is not allowed. If only three of the four digits 2, 3, 7
and 9 are allowed to use in the first row, how many such combinations are
possible to be arranged in the three rows?
(a) 4 (b) 3 (c) 2 (d) 1
I guess you solved it, got it right,
and within 1.5 mins. Congratulations, you scored 2.5 marks.
**A Question from Civil Services
(Preliminary) Examination, 2022, General Studies Paper II, and ChatGPT gave an incorrect answer for this question!
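For readers who would rather let a computer do the work (spoiler ahead), the question can be settled by brute force. The sketch below is one reading of the puzzle: the first row uses three different digits from 2, 3, 7 and 9, and the three rows together must use each digit from 1 to 9 exactly once. The function name `valid_first_rows` is invented for this example.

```python
from itertools import permutations

def valid_first_rows(allowed=(2, 3, 7, 9)):
    """First-row numbers for which row 2 = 2 x row 1, row 3 = 3 x row 1,
    and the three rows together use each digit 1-9 exactly once."""
    solutions = []
    for a, b, c in permutations(allowed, 3):
        first = 100 * a + 10 * b + c
        second, third = 2 * first, 3 * first
        digits = f"{first}{second}{third}"
        # Nine characters covering all of 1-9 means no repeats and no zero
        if len(digits) == 9 and set(digits) == set("123456789"):
            solutions.append(first)
    return sorted(solutions)

rows = valid_first_rows()
print(rows)       # the valid first rows
print(len(rows))  # the number of combinations asked for
```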
Update*
I published this blog on 22/02/23. The
next day, I asked ChatGPT:
Hi, did you fail IAS exam by UPSC,
India?
C replied:
I'm sorry, but I cannot take the IAS
exam or any other exam as I am an AI language model designed to assist with
various tasks such as answering questions, generating text, and providing
information on various topics. I have not taken any exams, nor am I capable of
doing so. My purpose is to provide helpful responses and assist users in their
queries to the best of my ability.
***
Note:
- Author's profile is
at http://amazon.com/author/pkdash
- Books by this author
are available on Amazon.in, Kindle eBook, Flipkart, and Notion Press,
Chennai.
***
This is an excellent experiment with ChatGPT (C); I liked the honesty with which ChatGPT responded to the author's last question.
I tested C on at least one of the research questions I answered in my PhD thesis after a lot of research; C's responses are useful, though superficial, and nowhere close to the detail with which my thesis answers the question.
However, over time, I am sure, C will gather enough experience and take away many people's jobs!