Validity - Tests are popular assessment tools used in occupational and educational settings. One of the primary requirements a test must meet is validity, that is, a test must measure what it claims to measure. A driving test, for example, is supposed to measure one’s driving capabilities, and not one’s physical appearance. Similarly, a final exam in a chemistry course needs to include items that measure the subject matter taught.
Assessing validity - Validity may be assessed in several ways, for example by asking subject matter experts whether the test measures what it claims to measure (content validity), or by assessing the correlation between test scores and an external criterion (criterion validity). An example of criterion validity is comparing the accident rate of people who passed a driving test with that of people who did not, on the assumption that those who passed should have fewer accidents.
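As a minimal sketch of the criterion validity check described above, the following Python snippet computes a Pearson correlation between test scores and a criterion measure. The data, the variable names and the helper `pearson_r` are hypothetical and serve only as an illustration; they are not taken from any study cited in this article.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: driving test scores and the number of accidents
# each driver had in the following years (the criterion).
test_scores = [55, 62, 70, 78, 85, 91]
accidents = [4, 3, 3, 2, 1, 0]

r = pearson_r(test_scores, accidents)
print(f"correlation between score and accidents: {r:.2f}")
```

Under these assumed numbers the coefficient is negative, which is the direction a valid driving test would be expected to show: higher scores go with fewer accidents. In practice the strength of the relationship and the sample size matter as well.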
Face validity - A further type of validity, and the one discussed in this article, is face validity, a term that refers not to what a test measures but to what it appears to measure. This type of validity usually concerns the test takers, that is, whether they perceive the test as measuring what it is supposed to measure.
Importance of face validity - How test takers perceive an assessment tool is of great importance, since a test that seems irrelevant or juvenile, for example, may lower the test taker’s motivation or cooperation. In such a case, test validity may be jeopardized.
For example, projective tests are known for their low face validity when used for recruiting purposes. When people are given the task of drawing a person or a tree, or are asked to tell a story based on a vague black and white picture, they often do not understand the connection between the task at hand and possible occupational requirements. Therefore, even a suitable candidate may take offense or not take the test seriously. The resulting assessment may then differ from the one that would have been obtained had the same candidate understood and believed in the significance of the task. Such a situation may lead to an incorrect assessment of suitable candidates, thus harming the test’s validity.
A study conducted in France by Born and Derous (2005) found that explanations given about the assessment process and the use of the assessment tool impacted test takers’ motivation and performance. Test takers who were provided with an explanation about the test they were about to take exerted greater effort and produced behaviors that were more representative of their true potential.
Assessment of face validity - Face validity is usually assessed using a feedback questionnaire or interview given to test takers. A number of questions reveal test takers’ attitudes toward the test, what they believe it measures, and how suitable they consider it as an assessment tool for a specific occupational role. Piloting tests for feedback in this way is often recommended.
Test taker preferences - Research shows that knowledge tests are considered to have high face validity, whereas projective tests are perceived to have low face validity.
In studies conducted simultaneously in Spain and Portugal by Salgado and Moscoso (2004), test takers were asked to rank their preferences for different pre-employment methods. Similar findings were noted in both countries.
The preferred assessment tools were personal interviews, resumes and occupational task samples (tests using actual tasks from the target job). The least preferred methods were integrity tests and graphology assessments. These findings were confirmed by another study also conducted in 2004.
It should be noted, however, that participants in these studies were not actually assessed using these methods, but were only asked to rank them.
Attitudes vary significantly between test takers who have actually been tested and those who are only asked to share their perceptions. This gap is particularly pronounced for integrity tests, which are often expected to trigger resentment.
In fact, studies conducted in the United States confirmed that test takers who had undergone integrity tests or simulations of these tests did not react negatively to such tests.
Midot’s Integritest includes a questionnaire that asks test takers to rate the extent to which they were willing to answer the questions. Approximately 90% expressed a willingness to answer the questions. Likewise, when asked about a general need for integrity tests, 62% answered that they perceived a need for such tests, 12% indicated that they did not know, and only 26% answered that they did not perceive a need for such tests.
The face validity of Midot’s Integritest was piloted on 414 additional test takers, who were asked the following four questions:
1. To what extent will the questionnaire help maintain norms and ethics within the organization? 86% of responses were positive.
2. To what extent will the questionnaire help select honest and trustworthy candidates? 82% of responses were positive.
3. To what extent did you understand the questionnaire requirements? 92% of responses were positive.
4. To what extent were you willing to respond to the questionnaire? 97% of responses were positive.
In general, test takers perceived the test’s face validity to be high on parameters such as these.
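The following Python snippet is a minimal sketch of how feedback from such a pilot could be summarized as a percentage of positive responses per question. It assumes a five-point rating scale on which ratings of 4 or 5 count as positive; the article does not state the scale or cutoff actually used, and the ratings, question labels and the function `percent_positive` are hypothetical.

```python
def percent_positive(ratings, threshold=4):
    """Return the percentage of ratings at or above the threshold.

    Assumes a numeric rating scale on which higher means more favorable.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    positive = sum(1 for rating in ratings if rating >= threshold)
    return 100.0 * positive / len(ratings)


# Hypothetical pilot data: one list of 1-5 ratings per feedback question.
pilot_responses = {
    "helps maintain norms and ethics": [5, 4, 4, 3, 5, 2, 4, 5],
    "helps select trustworthy candidates": [4, 4, 3, 5, 5, 2, 4, 3],
    "understood the requirements": [5, 5, 4, 4, 5, 4, 3, 5],
    "willing to respond": [5, 4, 5, 5, 4, 4, 5, 4],
}

# Percentage of positive responses for each question.
summary = {
    question: percent_positive(ratings)
    for question, ratings in pilot_responses.items()
}

for question, pct in summary.items():
    print(f"{question}: {pct:.0f}% positive")
```

A question with a markedly lower percentage than the others would be a natural place to look first when deciding which parts of a test need a clearer explanation or rewording.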
A 1999 study found differences between the face validity of overt integrity tests (which include questions that either directly or indirectly address topics related to unlawful behavior) and that of personality-based tests, in which integrity is indirectly established on the basis of relevant psychological characteristics. Participants who took both tests judged the face validity of the overt tests to be higher than that of the personality-based tests.
Perceived face validity is affected not only by test type but also by test structure. An updated review of integrity tests conducted by Berry, Sackett & Wiemann (2007) reported a study in which participants rated a five-point multiple-choice test as having higher face validity than a two-point multiple-choice test. The researchers concluded that participants felt they were unable to fully describe their feelings using a two-point scale.
Another factor unrelated to content is test medium: participants who took a computer-based test reacted more favorably to the test than those who took a paper-based version of the same test.
Improving face validity - There are a number of ways to improve face validity as perceived by test takers. One way is to write questions that are as relevant as possible to test takers’ circumstances and culture, and, specifically, to the occupational role for which they are candidates. Another way is to provide explanations about the test and its rationale. Transparency usually helps, although it may at times conflict with the test administrator’s goal of concealing the test’s rationale.
This transparency is particularly important in tests that do not appear to test the skills required for a specific occupational role (for example, a test of written expression for a computer programmer). If the test battery includes tests that have particularly low face validity, it is recommended that candidates at least be informed at the start of testing that not all tests are related to the targeted occupational role, that a wide array of skills is being tested instead, and that they should do their best in each of the different tasks.
In conclusion - When constructing or choosing a test, it is important to consider issues of face validity and to measure the reactions of test takers by piloting the test. If low face validity is noted, consideration should be given to whether the test should be used at all and to the means by which it can be improved.