Testing 1, 2, 3

It is mid-term time, and everyone is thinking about tests.

When I was in school, I never gave a thought to how professors come up with tests.  It was just sort of a given that the professors pulled something out of a hat and there was a “perfect” test.  It never struck me that coming up with a test is actually a grueling process that can take many hours.  And, even after all of that work, you can still get it massively wrong.

Since there are many different types of classes, there are many different types of tests.  I guess that I have really written three types of exams aimed at the following students: (1) undergraduates in massive classes; (2) graduate students in pretty high-level classes; and (3) graduate students taking the qualifying exam.  The first one is completely different than the other two, obviously.  The second and third are aimed at the same audience, so they are somewhat similar. The “rules” for the test can vary dramatically, though, so there are differences.

If you are writing a test for 200 students, you would really like to aim the mean of the test at some certain percentage.  I am a pretty generous person, so I tend to aim my mean score at about 85%.  That is a straight B.  Others may aim lower – say at 75%, which is a straight C.  It really takes a lot of practice to hit these numbers.  For my 101 Rocket Science class, I aim to hit 85%, but I consistently get 82% means.  There were a few times in a row in which all of the tests were coming back with means of 82%, which was really fantastic. It makes you think that you are like a master at writing these things.  Then the next semester, I got an average of 78%.  I had to think – is it the class or is it my questions or is it hints that I did/didn’t give them?  What is different?  Well, my tests tend to have something like 50 questions/points on them, which means that each question is worth about 2 percent.  The short answer questions are 3 points, or 6%.  So, if I ask a couple of harder multiple choice questions or a single harder short answer question, it can change the mean pretty significantly.
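To put rough numbers on that last point, here is a small Python sketch of the arithmetic. The 50-point total and the 1-point and 3-point question values come from the description above; the assumption that a harder question costs half the class the points (a drop of 0.5 in the fraction answering correctly) is purely illustrative, and `mean_shift` is a hypothetical helper name.

```python
def mean_shift(points_total, question_points, drop_in_fraction_correct):
    """Drop in the class mean, in percentage points, when the fraction of
    students earning a question's points falls by drop_in_fraction_correct."""
    return 100.0 * question_points / points_total * drop_in_fraction_correct

# Assumption for illustration: each harder question is missed by an
# extra 50% of the class compared with an easier version.
two_hard_mc = 2 * mean_shift(50, 1, 0.5)   # two 1-point multiple choice questions
one_hard_sa = mean_shift(50, 3, 0.5)       # one 3-point short answer question

print(two_hard_mc, one_hard_sa)
```

Under that assumption, two harder multiple choice questions and one harder short answer question together move the mean by about 5 percentage points, which is the same size as the gap between an 82% semester and a 78% semester.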

This last semester, I stressed that I would not ask questions about specific dates on the Rocket Science mid-term, but I would ask about order of things and which year things happened in.  Many of the students heard “no specific dates” and didn’t bother studying any dates at all.  And there were many complaints.

How do you then write an exam to aim it at a certain score?  It is very tricky.  Lately, I have gotten almost no 100% scores on my exams.  This says that things might be a little too hard.  I try to give a bunch of soft-ball questions, but it seems way too easy to give too many of these types of questions.  Then, I struggle with multiple choice questions that are relatively simple.  The idea is that you might put two answers that they should be able to eliminate right away and then they have to really think about it to eliminate the last one. Is that a fair way of doing it?  I am not really sure.  I feel like with these types of questions you are trying to trick the student into answering the wrong thing, which you do not really want to do. You want to test for depth of knowledge – which is incredibly difficult to do on a multiple choice test.

The best solution would be to go with all short answer questions, which take FOREVER to grade.  And I only have a limited amount of resources for grading.  I have also found that the scores on the multiple choice and the short answer questions are highly correlated, meaning that it almost doesn’t matter whether I ask short answer or multiple choice, the students will get roughly the same score no matter what.

Then there are the questions that you completely screw up and the students don’t understand what you are asking.  That sucks.  Having 150 students all coming up to you in the middle of the test and asking the same question – “what does this question mean???” Crap!  For a person who aims at “good enough”, this can be a slight problem.  I have learned the hard way that in making tests, there is no “good enough”.  The test has to be read over multiple times and another person has to look at it too.  Then there will probably only be one or two awkward questions.  It is hard!

Tests for upper level classes are a bit harder on one hand and easier on the other.  One problem that I think most professors have is that they expect a lot more out of upper level students (which they should in some ways).  They tend to ask much harder questions, and those end up being (sometimes) much too hard.  My main problem is judging how many questions the students can answer in a given amount of time.

The first exam that I gave in my grad-level class on ionosphere/thermosphere had 10 questions.  I was able to create an answer key in about an hour, so I figured that two hours would be fine for 10 questions.  What an idiot!  That test was a bit of a disaster.  The next time I taught it, I lowered it to 8 questions.  Then I gave them 8 and had them answer 6. That seems to be the right level.  A good rule of thumb is that the professor should be able to do the test in less than a quarter of the time.  On the 100 level class tests, I can typically do them in under 10 minutes, or about 1/5 of the time available.  On upper level classes, I can get the ideas down in about that time, but need a few more minutes for the math and such.
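The quarter-of-the-time rule of thumb can be written as a one-line check. This is only a sketch: `fits_rule_of_thumb` is a hypothetical name, the one-hour answer key is treated as the professor's solve time, and the 50-minute length of the 100 level exam is inferred from "10 minutes, or about 1/5 of the time available".

```python
def fits_rule_of_thumb(professor_minutes, exam_minutes, fraction=0.25):
    """True if the professor's solve time is under the given fraction
    (a quarter by default) of the time the students get."""
    return professor_minutes < exam_minutes * fraction

# Grad-level exam: about an hour to write the key, two hours allotted.
grad_exam_ok = fits_rule_of_thumb(60, 120)

# 100 level exam: about 10 minutes to solve, assumed 50 minutes allotted.
intro_exam_ok = fits_rule_of_thumb(10, 50)

print(grad_exam_ok, intro_exam_ok)
```

By this check the first grad-level exam would have needed more than four hours, not two, while the 100 level exam fits comfortably.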

I ALWAYS curve my tests, putting the mean at my intended mean.  It is almost impossible to design a “fair” test that gets the mean exactly where you want it.  Therefore, I strongly believe that curving is important.
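There is more than one way to put the mean at an intended value, and nothing above says which one is used. The sketch below assumes the simplest option, a flat additive curve that gives every student the same number of extra points. The function name `curve_to_mean` and the raw scores are made up for illustration.

```python
from statistics import mean

def curve_to_mean(scores, target_mean):
    """Shift every score by the same amount so the class mean equals target_mean.

    Assumption: a flat additive curve. Scores are not capped at 100 here,
    and capping them would pull the mean back below the target.
    """
    shift = target_mean - mean(scores)
    return [s + shift for s in scores]

# Hypothetical raw percentages for a very small class (not real data).
raw_scores = [70.0, 78.0, 86.0]
curved_scores = curve_to_mean(raw_scores, 85.0)

print(curved_scores)
```

A flat shift keeps the spread and the ordering of the scores unchanged, so it only corrects for a test that came out harder or easier than intended.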

I can see the argument that you have an expectation of the students, and the students should live up to that expectation.  Especially in upper level classes where you can’t really get good statistics on the distribution.  If you only have 10 students, some years these students might be quite good, while other years the students might not be stellar. Therefore, should you curve?  It depends on how consistent your tests are.  If you can write a really consistent exam, then curving may not really be needed.  I just have a hard time writing consistent exams that are actually different from year to year.

In summary, tests are like tests for professors.  They really test to see how well you are teaching the class and how well you can probe the students’ knowledge.  I feel like I get an A-/B+ in this endeavor. It would be nice to do better, but I am relatively happy.

(Yet another post on how things can always be improved…..)


About aaronridley

Professor at the University of Michigan, Department of Climate and Space Science and Engineering.