Achieving Fairness in Developing Content for Standardized Language Tests

— Jill Fu, October 4, 2018

Fairness in language testing is an issue of great importance that can directly affect the validity of test scores. Through the following questions and answers, we introduce a few core concepts and their applications, and discuss how language test developers apply these principles in order to achieve fairness in test content.

Q: What is fairness and why are we concerned about it?

A: Let’s consider a few examples: signs at Canadian airports are in a minimum of two languages, and computers are equipped with screen readers to assist visually impaired users. The authors of these materials all had a purpose in mind – to ensure that their content is appropriate, accessible, and fair to the consumer.

As test developers, we bear the responsibility of creating test content that is representative of authentic language-use situations so that we can stand behind our claim that the test is measuring what it is intended to measure – language ability. Just as we cannot measure body temperature with a ruler, we cannot obtain an accurate measure of language ability if we present inauthentic or irrelevant materials to test takers. We also need to ensure that our test content is not distressing for the test taker. If we include contexts or situations that make a test taker uncomfortable, they will not be able to perform as well on the test as they might have.

The adverse effects of these “off-target” materials are what we call construct-irrelevant factors. Eliminating or minimizing construct-irrelevant factors is a serious undertaking and is far more challenging than it might initially seem. The examples given earlier were selected on the assumption that most of us would know them. This is a relatively safe assumption. Also, the stakes in this context are low; even if you are unfamiliar with airport signs in Canada, you have the option of checking this information. However, in developing dialogues and reading passages for a test, we must be very careful about the assumptions we make.

Q: How do we ensure that our assumptions are correct?

A: This is indeed the question! Paragon has researched and developed Fairness Guidelines that we regularly consult when evaluating test content in development. This document is also integral to writers’ and reviewers’ training. However, our work doesn’t end here. While the Guidelines set out clear principles and examples of fairness issues, we still need to critically review and sometimes debate the appropriateness of certain pieces of writing. It is insufficient to have guidelines; the careful application of these guidelines is just as crucial. Consider the following excerpt from a passage about genetic engineering:

This is a piece of scientific writing on a fascinating and fast-advancing field of biology. The narrative is written objectively and does not take a position or side. This is an appropriate text for a biology class or perhaps a philosophy class. However, the topic can generate anxiety and might offend some people’s values. For this reason, it is inappropriate for an English language test, which is not testing principles of biology or philosophy. It would be insensitive to overlook the anxieties that could result from discussion of such a controversial field. Our concern lies with the construct-irrelevant factors that could prevent us from obtaining an accurate test score.

Some other examples that can illustrate our fairness considerations include avoiding brand names, such as Kleenex, so that test takers who are unfamiliar with them will not be at a disadvantage; avoiding discussions of ongoing events so that test takers who are aware of the subsequent development will not be at an advantage; and avoiding highly specific idioms and expressions so that test takers who have yet to learn them will not be at a disadvantage when asked, “Who is the man in a toque ordering a double-double?” When such critical review happens, we consider both the nature of the text and the context in which it is used. We ask ourselves a series of questions and weigh the appropriateness of the content from a language testing perspective.

Q: It sounds fairly limiting then, in terms of what you can produce…

A: It may seem so, but test fairness is not about rejecting reading or listening passages as soon as they contain an unfamiliar word. It is not about postulating that at least one test taker might feel uneasy reading certain phrases or ideas. While these are concerns, it is easy to oversimplify the rules and reject writing that can reasonably and adequately measure a test taker’s comprehension in the context provided. It is important to revisit why we are concerned with being fair – to eliminate construct-irrelevant factors that could adversely affect test performance and, consequently, the test score. If a dialogue explicitly explains that Kleenex means facial tissues, it is then fair to ask what a Kleenex is used for. If an article discusses the complete development of a culturally significant event, it is fair to ask the test takers to order the sequence of its stages. Conversely, it would be unfair to ask a test taker to comment on the Canadian legal system when no relevant information is provided.

Q: That is indeed a lot to consider! Can you give us some examples of your debates?

A: Absolutely. As test developers, the more we apply and improve our principles, the fairer our content will be. Now, perhaps you would like to be the judge for the following two “stumper” test questions. Would you keep or reject these test items?

Speaking prompt:

What is your view on vaccination and immunization? Should schools have mandatory flu shots? Why or why not?

Our Answer

Due to the contentious nature of the topic, it would be better to avoid asking for views on vaccination in a high-stress testing environment.

Listening testlet:

You will hear a conversation between two neighbours, Finn and Sasha. They are discussing plans for days taken off during the Family Day holiday.

Sasha: Hi Finn! Nice new fence you got there.

Finn: Oh thanks! Yeah – I spent all weekend painting it.

Sasha: Looks nice. So, are you taking Family Day off and going on another trip?

Finn: Family Day? Oh, you mean the Monday after next week? Where I’m from, it’s called Heritage Day, and it’s on a Friday!

Sasha: Oh – I guess so! And I believe it’s called Islander Day in Prince Edward Island.

Finn: Is that right? Well, I guess what we all have in common is to take the opportunity to rest and enjoy our family!

Sasha: Very true. Say, since you’ve been on so many road trips with your family, would you recommend somewhere? Alex and the kids always say they want to go to a nice place with a lake and spend the night watching the stars.

Finn: Absolutely. Depending on how far you want to go…

Question 1.

Where is the man from?

  • British Columbia
  • Alberta
  • Yukon
  • Prince Edward Island

Our Answer

Answer: Yukon

Answering this item requires background knowledge that the dialogue does not provide. The item is therefore unfair: it discriminates against test takers who do not know that the holiday is called Heritage Day in Yukon, which is a territory rather than a province.

Jill Fu is a Content Development Specialist at Paragon Testing Enterprises.