When doing Functional Capacity Evaluations (FCEs), some clinicians and FCE vendors pull a bit of one testing protocol from here and a bit of another from there, slap them together, and call the result a defensible functional test. While that might get the job done, is it an accurate test? And what makes a test accurate anyway?

Test accuracy is affected by its reliability and validity.

What is reliability? Merriam-Webster defines reliability as the extent to which an experiment, test, or measuring procedure yields the same results on repeated trials.

Test reliability can be determined by answering this important question: Does the test produce consistent results?

Without consistent results, you don’t know which test result is the right one. You can’t have an accurate test that’s inconsistent. There are two types of consistency:

  • If two different clinicians conduct the test on the same person, do they get the same test results? (Inter-rater reliability)
  • If the same clinician tests the same person twice, do they get the same test results? (Intra-rater reliability)
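One common way to put a number on inter-rater agreement is Cohen's kappa, which corrects raw agreement for the agreement two raters would reach by chance. The sketch below is only an illustration: the function name, the two clinicians, and their work-level ratings are all hypothetical and are not drawn from any FCE protocol.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same clients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    n = len(rater_a)
    # Proportion of clients on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical work-level ratings for the same six clients.
clinician_1 = ["light", "medium", "medium", "heavy", "light", "medium"]
clinician_2 = ["light", "medium", "heavy", "heavy", "light", "light"]

kappa = cohens_kappa(clinician_1, clinician_2)
print(f"kappa = {kappa:.2f}")
```

Here the clinicians agree on four of six clients, but part of that agreement would be expected by chance, so kappa comes out lower than the raw agreement rate. The same calculation applies to intra-rater reliability if the two lists hold one clinician's ratings from two sessions.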

Even if the test results are consistent, the test still may not be accurate. If someone consistently adds 2+2 and says that equals 5, the result is not accurate even though it’s consistent. To fully establish accuracy, you need to look at validity.
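The gap between consistent and accurate can be shown with a few numbers. This is a minimal sketch using made-up readings from a hypothetical measuring device checked against a known 50 lb load: the spread of the readings reflects consistency, while the average error against the known value reflects accuracy.

```python
from statistics import mean, stdev

# Hypothetical values: a known reference load and five repeated readings of it.
true_load_lb = 50.0
readings_lb = [55.1, 54.9, 55.0, 55.2, 54.8]

spread = stdev(readings_lb)              # small spread: the readings are consistent
bias = mean(readings_lb) - true_load_lb  # large offset: the readings are not accurate

print(f"spread = {spread:.2f} lb, bias = {bias:.2f} lb")
```

The readings cluster tightly, yet every one of them is about 5 lb too high. Reliability statistics alone would never reveal that offset, which is why validity has to be examined separately.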

What is validity? In general usage, it is the state of being acceptable according to the law, or the quality of being well-grounded, sound, or correct. In an FCE, the validity question becomes: Can we accurately determine the level of work that an individual can perform safely?

For example: If the test result indicates that an individual can lift 50 lbs. without getting injured, is there evidence that this result holds true 1 month, 6 months, 1 year, 2 years down the line?

Confusion between Validity and Sincerity of Effort. Unfortunately, some early commercial providers of FCEs started using the term “validity” to refer to sincerity of effort. Their reports state “this FCE is invalid” or “this FCE is conditionally valid” due to client self-limiting behavior (stopping before a maximum is reached).

Validity is not a measure of the client’s sincerity of effort. Test validity is established through research and does not change with client self-limiting. If the client self-limits on many tasks of the test, then it becomes a test of what the person is willing to do rather than what the person is able to do. Sincerity of Effort will be discussed in a subsequent post.

Why is accuracy important in Functional Capacity Evaluations? In FCEs, we are testing for a brief period of time and test results must be extrapolated or projected to the full work day/work week.


But Not All Validity Is Created Equal.

There are a variety of ways to demonstrate validity and not all forms of validity are equally robust when it comes to defensibility. So, it’s important to understand the types of validity.

Content/Face validity answers the question: Does the test include all the things it should include? Does it measure the appropriate physical demands of work?


Content/Face validity is the weakest form of validity and, by itself, is not enough to establish full validity of the test. But it’s the only validation that most commercial FCE vendors have.

Criterion-related validity refers to the comparison of a newly developed instrument to one that is considered to be the “gold standard” (Portney & Watkins, 1993). A possible gold standard in FCE is the work that the worker is actually performing on the job. However, there are challenges with using actual work status as a gold standard: individuals work at levels above and below their safe maximum for reasons other than their physical abilities. Criterion-related validity can be broken down into two types.

Concurrent validity answers the question: Can the test determine what the patient can do safely (without injury) today?

Predictive validity answers the question: Can the test determine what the patient can do safely (without injury) in the future?

Concurrent and predictive validity are the most robust forms of validity. Studies published in the peer-reviewed literature carry more legal weight than case studies.
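As a rough illustration of what a predictive validity study looks at, the sketch below takes a hypothetical group of workers who were cleared at a given work level and asks what share remained injury-free at several follow-up points. The data, the function name, and the follow-up intervals are assumptions for illustration only; a real study would also need to account for workers lost to follow-up and for injuries unrelated to the tested demands.

```python
def injury_free_rate(months_to_injury, follow_up_months):
    """Share of cleared workers with no recorded injury by the follow-up point.

    Each entry is the month in which an injury was recorded,
    or None if no injury was recorded during follow-up.
    """
    if not months_to_injury:
        raise ValueError("need at least one worker")
    still_ok = sum(
        1 for m in months_to_injury if m is None or m > follow_up_months
    )
    return still_ok / len(months_to_injury)


# Hypothetical follow-up data for ten workers cleared at the same work level.
cleared_workers = [None, None, 3, None, 14, None, None, 8, None, None]

rates = {m: injury_free_rate(cleared_workers, m) for m in (1, 6, 12, 24)}
for months, rate in rates.items():
    print(f"{months:>2} months: {rate:.0%} injury-free")
```

If the share stays high over time, that is evidence the test's recommendations hold up in the future; if it drops quickly, the test is not predicting safe work levels well.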

This brings us to the questions:

Is your Functional Capacity Evaluation reliable and valid?

Why should you care?

Reliability and validity are not merely academic concepts. They are critical to credibility in FCEs and make a significant difference in “real world” practice. Without reliability, the referral source could send the client to a different therapist and get an entirely different result. Without validity, the client, therapist, referral source, and employer do not know if the test results are accurate.

  • Your profession’s national association thinks it’s important. The APTA Standards for Tests and Measurements state that test purveyors should establish reliability and validity, that procedure manuals should document this research, and that clinicians should understand the research behind the tests they use.
  • The US Supreme Court thinks it’s important. In Daubert v. Merrell Dow Pharmaceuticals (1993), the Supreme Court made trial judges the gatekeepers of expert testimony and listed factors for assessing it, including whether the method has been tested, whether it has been peer-reviewed and published, and whether its error rate is known. If FCEs do not meet these standards, they may be considered inadmissible in a legal case.
  • The EEOC and ADAAA regulations think it’s important. In its guidelines for employment testing, the EEOC discusses extensively the importance of reliability and validity research to support testing. http://uniformguidelines.com/uniformguidelines.html#20

At the end of the day, you want to go home feeling like you’ve conducted a fair and objective test that truly determines a person’s physical abilities as they relate to work. An accurate post-injury FCE or Return-to-Work test helps to determine when, or whether, someone can go back to work in their own job or any job. Pre-Hire tests determine whether or not someone gets the job. There’s a lot riding on these assessments. They not only affect employee and employer pocketbooks but also deeply affect the social fabric of the worker’s life. When people can’t or don’t work, their personal lives often unravel. So, you want a test with proven validity: one supported by peer-reviewed, published research.

Don’t accept anything less.

