PEARL Reliability and Validity

The following is a brief summary of important information on the reliability and validity of the PEARL Screener.





Summary of Reliability Results

  • Reliability estimates obtained with several different methods (inter- and intra-rater reliability, fidelity of test administration, parallel forms reliability, and internal consistency reliability) strongly suggest that the PEARL has minimal test error and that examiners can have confidence in its results.
  • Our point-by-point inter- and intra-rater analyses for the Decoding and Language subtests of the PEARL have yielded agreements at or above 90% across multiple examiners with varying degrees of education and experience in administering and scoring standardized tests.
  • Fidelity of administration has consistently been at or above 95% for both subtests.
  • There is very strong evidence to suggest that the pretest and posttest of the Decoding subtest are parallel, with correlation coefficients well above .90. 
  • The pretest and posttest of the Language subtest of the PEARL have yielded strong correlation coefficients at or above .70 across a large number of diverse students.

Reliability refers to how consistently a test can measure something across different conditions and situations. If a test lacks reliability, then the examiner will have limited confidence in the results of that test. It is common to calculate reliability using several methods, including inter- and intra-rater reliability, fidelity of test administration, parallel forms reliability, and internal consistency reliability.

For inter- and intra-rater reliability, it is important that the examiner agrees with other examiners (inter) and with him/herself (intra) most of the time. This type of reliability is especially important for the Language subtest of the PEARL because scoring a student’s language in real time involves a certain amount of subjectivity, despite clear scoring procedures. Inter- and intra-rater reliability at or above 80% is acceptable, and at or above 90% is preferred. Our point-by-point inter- and intra-rater analyses for the Decoding and Language subtests of the PEARL have yielded agreements at or above 90% across multiple examiners with varying degrees of education and experience in administering and scoring standardized tests.
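As a minimal sketch of how point-by-point agreement can be computed, two sets of scores for the same items are compared one item at a time. The function names and the item scores below are hypothetical illustrations and are not taken from the PEARL manual.

```python
def point_by_point_agreement(scores_a, scores_b):
    """Return the percent of items on which two sets of scores match exactly."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both score lists must cover the same items")
    if not scores_a:
        raise ValueError("At least one scored item is required")
    agreements = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * agreements / len(scores_a)


def interpret_agreement(percent):
    """Label agreement using the thresholds stated in the text (80% and 90%)."""
    if percent >= 90:
        return "preferred"
    if percent >= 80:
        return "acceptable"
    return "below acceptable"


# Hypothetical item-level scores from two examiners for the same student.
# For intra-rater reliability, the second list would instead hold the same
# examiner's rescoring of the same administration.
examiner_1 = [2, 1, 0, 2, 2, 1, 3, 0, 1, 2]
examiner_2 = [2, 1, 0, 2, 1, 1, 3, 0, 1, 2]

agreement = point_by_point_agreement(examiner_1, examiner_2)
label = interpret_agreement(agreement)
```

The same calculation applies to fidelity of administration if each list entry records whether an administration step was completed correctly against a checklist.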

Fidelity of administration refers to the extent that examiners administer a test in the same way over time. The results of a test may not be reliable if the test is administered differently each time it is given. This type of reliability is typically measured using a checklist or some other form of observation that measures how closely the examiner adheres to the administration procedures outlined in the test manual. It is often reported as the percent of administration steps completed correctly. For the PEARL, fidelity of administration has consistently been at or above 95% for both subtests.

Parallel-forms reliability refers to the extent to which two forms of a test are similar to each other. Because the PEARL uses a dynamic assessment process, each subtest of the PEARL has a pretest and a posttest that are designed to be parallel in content, length, and complexity. When the pretest and posttest of a dynamic assessment are parallel, gains from pretest to posttest can be attributed to factors unrelated to differences between the two forms. Pearson product-moment correlation coefficients (r) are typically calculated to provide information on the strength and direction of the linear relationship between two forms. Correlation coefficients need to be interpreted differently than simple calculations of percent agreement (e.g., 90%), which is what we used for inter- and intra-rater reliability and fidelity of administration. Positive correlation coefficients range from .01 (negligible) to 1.0 (perfect). Coefficients ranging from .20 to .29 are considered weak, coefficients ranging from .30 to .39 are considered moderate, coefficients ranging from .40 to .69 are considered strong, and coefficients at or above .70 are considered very strong. For the Decoding subtest of the PEARL, the pretest and posttest use the same nonsense word stimulus items (tad, nad, kad, zad), and our analyses have indicated that the order of these nonsense words, which varies from pretest to posttest, has little effect on a student’s ability to decode those words. In other words, there is very strong evidence to suggest that the pretest and posttest of the Decoding subtest are parallel, with correlation coefficients well above .90.

The PEARL Language subtest also includes a pretest and a posttest that are designed to be parallel. The pretest and posttest of the Language subtest require students to retell a story that the examiner models. These model stories were carefully crafted to have similar length, content, and complexity. The pretest and posttest of the Language subtest of the PEARL have yielded strong correlation coefficients at or above .70 across a large number of diverse students.
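The following sketch shows one way to compute a Pearson product-moment correlation between pretest and posttest scores and to label it with the bands described above. The scores are hypothetical, the function names are illustrative, and labeling coefficients below .20 as negligible is an assumption, since the text only names .01 as negligible.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("Need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("Correlation is undefined when one list has no variation")
    return cross / math.sqrt(ss_x * ss_y)


def describe_strength(r):
    """Label a positive coefficient with the bands given in the text."""
    if r >= 0.70:
        return "very strong"
    if r >= 0.40:
        return "strong"
    if r >= 0.30:
        return "moderate"
    if r >= 0.20:
        return "weak"
    return "negligible"  # assumption: everything below .20 is treated as negligible


# Hypothetical pretest and posttest scores for six students
pretest = [0, 1, 2, 3, 4, 4]
posttest = [1, 1, 3, 3, 4, 4]

r = pearson_r(pretest, posttest)
strength = describe_strength(r)
```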

Examining the internal consistency of a test often entails the calculation of Cronbach’s alpha, which represents the internal comparison of test items. Coefficient alpha internal consistency estimates of reliability (Cronbach, 1951) were calculated to examine how well test items within the same PEARL subtests are related to each other. A reliability coefficient of 0.00 indicates the absence of reliability, whereas a coefficient of 1.00 indicates perfect reliability. It is generally recognized that an internal consistency alpha of .80 to .89 is acceptable, and .90 or higher is excellent. Cronbach’s alphas for the pretest words and posttest words of the Decoding subtest were at or above .90 (excellent). Cronbach’s alphas for items in the Language subtest were above .80 (acceptable to excellent).
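For readers who want to see the calculation, the sketch below applies the standard coefficient alpha formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The item scores are hypothetical and the function names are illustrative; this is not the analysis code used for the PEARL.

```python
from statistics import variance


def cronbach_alpha(score_rows):
    """Coefficient alpha for a table with one row per student, one column per item."""
    n_students = len(score_rows)
    n_items = len(score_rows[0])
    if n_students < 2 or n_items < 2:
        raise ValueError("Need at least 2 students and 2 items")
    # Sample variance of each item across students
    item_variances = [
        variance([row[i] for row in score_rows]) for i in range(n_items)
    ]
    # Sample variance of each student's total score
    total_variance = variance([sum(row) for row in score_rows])
    if total_variance == 0:
        raise ValueError("Alpha is undefined when total scores do not vary")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def interpret_alpha(alpha):
    """Label alpha with the ranges given in the text."""
    if alpha >= 0.90:
        return "excellent"
    if alpha >= 0.80:
        return "acceptable"
    return "below acceptable"


# Hypothetical right/wrong scores (1 or 0) for six students on four items
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = cronbach_alpha(item_scores)
alpha_label = interpret_alpha(alpha)
```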





The validity of a test is dependent upon how well the results of the test fulfill the intended purposes of the test and how well those results can be clearly interpreted by the examiner. The validity of a test, then, should always be measured within the context of clearly defined purposes. Evidence of validity has traditionally been organized under the headings of content-description validity, criterion-prediction validity, and construct-identification validity, which correspond to overlapping (rather than alternative) empirical and logical approaches taken during a test’s development and use. The PEARL Technical Manual, available at, contains detailed information related to each of those evidences of validity. An overview of the evidence of validity, organized according to the purposes of the PEARL, is provided below.

PURPOSE 1: The PEARL was designed to efficiently and accurately predict future decoding and oral and written language comprehension difficulty.

Purpose 1 Results:

According to this purpose statement, the results of the PEARL are valid to the extent that they efficiently and accurately predict future word-level reading ability and written and oral language comprehension. Our analyses of the sensitivity and specificity of the PEARL across over 1000 students indicate that the Decoding and Language subtests yield sensitivity and specificity at or above 80%, and that these results can be obtained after a very brief screening process.

This first purpose statement indicates that future decoding and comprehension abilities are being measured. This is a prediction statement, and it is theoretically linked to the dynamic assessment process (test, teach, retest). This purpose statement has to do with classification accuracy. For tests that have the purpose of classifying individuals (e.g., the test is designed to identify language impairment and to identify typically developing language ability), evidence of classification accuracy is of paramount importance. Perhaps the best indicator of classification accuracy is a test’s sensitivity and specificity. Sensitivity refers to how accurately a test identifies students who have a disorder, and specificity refers to how accurately a test identifies students who do not have a disorder. How accurate a test needs to be is open to debate, and it may be reasonable to expect a greater degree of error from a brief screener than from a more comprehensive assessment (because of how the results of a screener will be used). Nevertheless, the test user must determine whether the accuracy of a test is sufficient for his or her needs. A general rule of thumb is to expect a test to have 80% or higher sensitivity and specificity, although it may be acceptable in some situations to sacrifice specificity for the sake of increased sensitivity.
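Sensitivity and specificity, as defined above, can be computed from paired screening results and later outcomes. The sketch below assumes each student has a yes/no screening flag and a yes/no later difficulty status; the data and function name are hypothetical and do not reproduce the PEARL analyses.

```python
def sensitivity_specificity(flagged_by_screener, has_difficulty):
    """Return (sensitivity, specificity) from paired boolean lists."""
    if len(flagged_by_screener) != len(has_difficulty):
        raise ValueError("Both lists must describe the same students")
    true_pos = false_neg = true_neg = false_pos = 0
    for flagged, actual in zip(flagged_by_screener, has_difficulty):
        if actual and flagged:
            true_pos += 1    # student with difficulty, correctly flagged
        elif actual and not flagged:
            false_neg += 1   # student with difficulty, missed by the screener
        elif not actual and not flagged:
            true_neg += 1    # student without difficulty, correctly passed
        else:
            false_pos += 1   # student without difficulty, flagged anyway
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("Need at least one student with and one without difficulty")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical results for ten students: the first five later showed difficulty
later_difficulty = [True, True, True, True, True, False, False, False, False, False]
screener_flag = [True, True, True, True, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(screener_flag, later_difficulty)
meets_rule_of_thumb = sens >= 0.80 and spec >= 0.80
```

Running the same function separately on each subgroup of students is one way to check whether accuracy holds across a diverse sample.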

PURPOSE 2: The PEARL was designed to reduce bias of the information used to predict future reading or language difficulty.

Purpose 2 Results:

According to this purpose statement, the results of the PEARL are valid to the extent that they yield adequate sensitivity and specificity across a diverse group of students. We conducted analyses of the predictive accuracy of the PEARL with different subgroups of culturally and linguistically diverse students, including Hispanic students, Native American students, and students from lower socio-economic backgrounds. Across each subgroup of students, the PEARL yielded sensitivity and specificity at or above 80%, indicating limited assessment bias and excellent predictive validity for a diverse group of students.

This purpose statement suggests that some information derived from other tests can be biased, and that the PEARL was designed to reduce some of that bias. This statement is theoretically connected to the dynamic assessment process that is used in the PEARL, which reduces or eliminates many of the biases found in other static measures (e.g., content bias, linguistic bias). This purpose statement also refers to the use of nonsense words in the Decoding subtest, and the use of relatively complex narrative language in the Language subtest. The use of nonsense words and complex language was designed to establish a scenario where most young students, regardless of cultural or linguistic differences, will have something to learn. A test that has limited cultural or linguistic bias should yield similar results across students who are diverse. Because this purpose of the PEARL is to reduce biased predictive information, strong test sensitivity and specificity across a diverse group of students can be convincing evidence of limited assessment bias.

PURPOSE 3: The PEARL was designed to inform present level of performance of decoding and academically related language ability.

Purpose 3 Results:

According to this purpose statement, the results of the PEARL are valid to the extent that they provide accurate information on a student’s current ability to decode, and understand and use complex language. Analyses indicate that examiners can have confidence in the static data gathered by the assessment, especially when those results are generally interpreted and when there is a focus on positive performance. For example, we have found that students who can decode the nonsense words presented in the pretest of the Decoding Subtest tend to be students who can, in general, decode words, and we have found that students who can retell a narrative with multiple story grammar elements with a complete episode and with subordinating conjunctions tend to be students who can use and understand complex, academic language.

Although the PEARL uses a dynamic assessment process, this purpose statement indicates that the static information derived from the pretest and posttest measures of the dynamic assessment is still important. It implies that how a student is currently performing in decoding and academic language will potentially impact his or her ability to take on new material in those domains. This purpose is founded upon theories of second language acquisition, and upon the idea that students can have adequate language learning abilities, but still not have proficiency in a second language or in academically related language even in their first language. This purpose statement implies that the process of learning a second language or a different dialect can be accelerated, and that once this basic language foundation is established, more complex language can be added. For example, the dynamic results of the PEARL Language subtest are meant to identify students who will or will not have unusual difficulty learning to understand and use complex language. This information, although potentially very important, only describes language learning ability; it does not describe how well a student presently understands and uses the language found in the school setting. Static information on present language performance can be very important because, even in the absence of a language learning disability, students who have limited English language proficiency or cultural or dialectal differences can be at an initial disadvantage in the school setting. The majority of students who do not have language-learning difficulty and who use and understand language that aligns with school expectations will have, at the beginning of their formal education, the language foundation upon which complex, academic language can be constructed.

Students who do not have this initial English language foundation, or sufficient exposure to the academic dialect, are placed in a situation where they must develop basic English language proficiency or learn to use and understand a different dialect with different vocabulary and language complexity. One of the purposes of the PEARL, therefore, is to provide current information on language proficiency so that students can receive additional language instruction that can accelerate the development of a language foundation or dialect upon which complex, academic language can be established. This purpose of the PEARL also suggests that young students who have already begun to master the decoding process and who already have advanced language skills can be identified, and specialized instruction can be provided, accelerating their academic learning process instead of expecting them to wait for others to catch up.

PURPOSE 4: The PEARL was designed to help identify a student’s zone of proximal development and identify specific deficits and strengths related to decoding and language.

Purpose 4 Results:

According to this purpose statement, the results of the PEARL are valid to the extent that they help an interventionist understand how much support a student might need to successfully decode and understand and use complex language, and identify which specific intervention goals would be most appropriate for an individual student. The extent to which this purpose can be met is primarily dependent on the examiner’s attention to what takes place during the dynamic assessment process. Although the PEARL subtests were designed for brief administration and scoring, the opportunity is provided through the pretest, teaching, and posttest phases of the dynamic assessments for the examiner to carefully document how a student approaches and responds to the tasks of decoding and language comprehension and production.

Because the PEARL uses a dynamic assessment, a teaching process is employed. This teaching process can provide the examiner with valuable information on what teaching strategies appear to be effective for a student, and how much support a student needs to learn a new concept or skill. This purpose is founded upon Vygotsky’s concept of the zone of proximal development. The zone of proximal development is the distance between what a student can do independently and what that student can do with help. The more help or scaffolding a student requires, the more effort is required to bring that student to independence. Results of the PEARL can help with intervention planning because information can be obtained about what it will likely take to bring a student to mastery, and information is gathered about where the specific difficulties lie, facilitating specific intervention goals.

PURPOSE 5: The PEARL was designed to identify participants in research that focuses on students’ reading or language development.

Purpose 5 Results:

According to this purpose statement, the results of the PEARL are valid to the extent that they help researchers characterize their participants’ language and reading abilities. The PEARL provides a considerable amount of valid and reliable information by using a dynamic assessment process. This information can be of significant use to researchers.


This purpose is related to inferential statistics. Accurately identifying and describing participants in research is extremely important. It is through the adequate description of research participants that external validity can be inferred. To apply research findings to other students who were not directly involved in a research project requires inference. No two students are identical (not even monozygotic twins), and what works for one student cannot be expected to work with absolute certainty for another student. Nevertheless, inference that an intervention will be effective is strengthened if a student receiving the intervention is more similar to a student who responded positively to the same intervention in a well-controlled research study. One of the purposes of the PEARL is to provide accurate information on a student’s current and future reading and language ability. This information will allow researchers to carefully describe their research participants, create subgroups of participants, and disaggregate data according to participant characteristics.
