How to Harmonize Regional and International Learning Assessments

By Silvia Montoya, Director of the UNESCO Institute for Statistics (UIS), Mmantsetsa Marope, Director of the UNESCO International Bureau of Education (IBE-UNESCO) and Renato Opertti, Senior Programme Specialist of the IBE-UNESCO.

 This blog was also published by the IBE.

As education stakeholders, including governments, assessment initiatives and donors, gather in Madrid for the Fourth Meeting of the Global Alliance to Monitor Learning (GAML), the UNESCO Institute for Statistics and the International Bureau of Education set out strategies to help resolve the technical and political challenges of measuring learning.

Sustainable Development Goal 4 (SDG 4), an inclusive and quality education for all, is a crucial benchmark for global well-being. Its broad ambitions are given tangible force by SDG Target 4.1: by 2030, all girls and boys complete free, equitable and quality primary and secondary education leading to relevant and effective learning outcomes. And the measure of success? Indicator 4.1.1: the percentage of children and youth achieving a minimal level of competency in literacy and numeracy at three points in time, disaggregated by sex: (a) in grades 2/3; (b) at the end of primary; and (c) at the end of lower secondary.
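Written as a formula for one domain (literacy or numeracy), the indicator could be sketched as follows. This is an illustration under assumptions that the indicator definition itself does not state: results come from a sample-based assessment with sampling weights w_i, and the minimum level at each point in time is operationalized as a cut score c_k on that assessment's scale.

```latex
P_{k,s} \;=\; 100 \times
\frac{\sum_{i \in S_{k,s}} w_i \, \mathbf{1}\!\left[\, y_i \ge c_k \,\right]}
     {\sum_{i \in S_{k,s}} w_i}
```

Here k is the point in time (a, b or c), s is sex, S_{k,s} is the assessed sample of children of sex s at point k, y_i is child i's scale score, and the indicator function 1[.] equals 1 when the child reaches the cut score and 0 otherwise. Every quantity that depends on a particular assessment (the scale, the sample and above all c_k) is exactly where the comparability problems discussed below arise.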

A new paper from the UNESCO Institute for Statistics (UIS) and the International Bureau of Education (IBE-UNESCO) examines the technical and political challenges of producing cross-nationally comparable assessment data for indicator 4.1.1 and proposes a set of criteria and strategies to overcome them.

The technical challenges include developing common methodologies that allow the results of existing assessments to be compared, while the political challenges include the need for leaders to agree on a ‘minimum level of competency’. This is not simple: differences in educational proposals and realities, curricular logics and human capital development lead to diverse interpretations of what the ‘minimum level’ should be.

Helping global leaders agree on a common strategy to measure learning

The paper informs the debate on the possibilities and limitations of developing a global assessment strategy for indicator 4.1.1 by comparing different international, regional and foundational skills assessments of literacy and numeracy. In total, the paper reviews 15 assessments, which differ in their target populations, contexts, purposes, conceptual focus, methodologies and procedures. We compared these diverse assessments using a set of criteria covering both their technical and their political dimensions.

Technical dimensions

On the technical dimensions, the starting point was to develop criteria for judging whether it is plausible to compare the percentage of children and youth achieving a minimal level of competency in literacy and numeracy at the three points in time, both over time and across countries. If the task is to identify the proportion of the population achieving a certain level of competency, then we need to focus on how these concepts are measured in different assessments. That means examining three things: the design of the assessments, the standard-setting procedures used to define achievement levels and cut scores, and the statistical procedures used to estimate the distribution of achievement in the population. Differences in any of these three dimensions may significantly change estimates of the proportion of students who achieve the minimum proficiency level over time.
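To see why standard setting matters so much, here is a minimal sketch with made-up data. It computes the weighted percentage of students scoring at or above a cut score, overall and by sex, and compares two hypothetical cut scores for the same ‘minimum level’. The Student record, the sampling weights, the scores and both cut scores are illustrative assumptions, not values from any of the assessments reviewed.

```python
from dataclasses import dataclass


@dataclass
class Student:
    score: float   # scale score on a hypothetical assessment
    weight: float  # hypothetical sampling weight
    sex: str       # "F" or "M"


def pct_at_or_above(students, cut, sex=None):
    """Weighted percentage of students scoring at or above `cut`,
    optionally restricted to one sex."""
    pool = [s for s in students if sex is None or s.sex == sex]
    total = sum(s.weight for s in pool)
    if total == 0:
        raise ValueError("no students in the selected group")
    reached = sum(s.weight for s in pool if s.score >= cut)
    return 100.0 * reached / total


# Hypothetical sample of five students.
sample = [
    Student(420, 1.0, "F"),
    Student(480, 2.0, "M"),
    Student(500, 1.0, "F"),
    Student(510, 1.5, "M"),
    Student(560, 0.5, "F"),
]

# Two candidate cut scores for the same "minimum level" of competency.
for cut in (475, 505):
    print(
        f"cut {cut}: all {pct_at_or_above(sample, cut):.1f}%, "
        f"girls {pct_at_or_above(sample, cut, 'F'):.1f}%, "
        f"boys {pct_at_or_above(sample, cut, 'M'):.1f}%"
    )
```

On this toy sample, moving the cut from 475 to 505 lowers the overall estimate from about 83.3% to 33.3%, even though no student's performance has changed. The same logic applies across assessments: two instruments that set their ‘minimum level’ differently can report very different percentages for similar populations.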

In terms of design, we identified three aspects as critical for measuring progress on indicator 4.1.1: the purpose of the assessments, their target populations and their domains. Regarding purpose, we found that the assessments were developed to meet different needs: to monitor and compare education systems, to diagnose an education system, or to evaluate programmes. Regarding target populations, none of the assessments measures progress at all three points in time required by indicator 4.1.1. And in terms of domains, the content and skills assessed vary widely, reflecting the needs of the institutions that developed the instruments.

Political dimensions

Each country has different expectations of the minimal level of knowledge and skills in literacy and numeracy for its citizens, but the path to SDG 4 requires the international community to define a common global measure. As our paper suggests, this process requires deep reflection on the practical purpose of SDG 4 measurement, both for individual countries, especially those facing the greatest difficulties, and for the global community.

There are three key political challenges. First, there is the question of how well national curricula are represented in the definition of the minimal level and in the items included in the test. Most countries have defined learning objectives through their official curricula and may feel that these are not sufficiently represented in the assessment or in the definition of the minimal level. Second, an assessment shared worldwide could trigger political and social consequences for countries perceived as ‘low achieving’. Third, the validity of a global assessment may be challenged in a context of political pressure.

The paper proposes that these challenges could be mitigated by establishing rigorous procedures for international evaluation, led by a consortium of highly respected institutions, to define a minimal level of competency for assessing SDG 4.

Whatever the challenges to be overcome, there is a pressing need to measure indicator 4.1.1 at a global scale. This will require evaluation projects around the world to commit to finding ways either to link their tests or to create a specific assessment for the purposes of SDG 4. There is also a political task ahead: convincing countries of the validity of any international definition of such a minimum standard.

Four strategies to move forward

The paper sets out four strategies to measure indicator 4.1.1 in the short, medium and long term, summarized in Table 1.

Table 1. Strategies to measure progress towards Sustainable Development Goal 4

Strategy 1 (short term): Use of national assessments to measure SDG 4, with adjustments using international assessments.
Implications:
· High levels of external validity for measuring the minimum level of competency established in the official curriculum.
· Low levels of international comparability.

Strategy 2 (medium term): Equating among international and regional assessments.
Implications:
· Apparently low cost, since existing assessments are used.
· Entails performing one equating for each of the grades to be assessed in indicator 4.1.1 and defining new proficiency levels for each scale.
· Technically questionable from a psychometric and substantive point of view.
· Low levels of external validity for representing the national curriculum.

Strategy 3 (medium or long term): Equating between different international evaluations aimed at similar school grades.
Implications:
· Requires defining anchor items that can be shared across the different evaluations and creating a consortium of assessment projects.
· Comparison is difficult because the domains assessed differ across assessments.
· Psychometrically and substantively more robust.
· Low levels of external validity for representing the national curriculum.

Strategy 4 (long term): Creating a Worldwide Proficiency Assessment on Numeracy and Literacy.
Implications:
· Psychometrically and substantively robust.
· Politically difficult to convince countries to participate in this assessment.
· Requires the participation of technical institutions in the design, implementation and analysis of test results.
· Low levels of external validity for representing the national curriculum.
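Strategies 2 and 3 both rest on equating, that is, placing results from different assessments on a common scale. The paper does not prescribe a particular method, so the following sketch uses one standard IRT linking technique, the mean-sigma method, purely to illustrate the anchor-item design described under Strategy 3. It assumes that both assessments were calibrated with an IRT model and that a set of anchor items appears in both; the item difficulties, the cut score and the function names are hypothetical.

```python
import statistics


def mean_sigma_constants(b_source, b_target):
    """Mean-sigma linking constants (A, B) such that
    theta_target = A * theta_source + B.

    b_source and b_target are difficulty estimates for the SAME anchor
    items, calibrated separately on each assessment's own scale.
    """
    if len(b_source) != len(b_target) or len(b_source) < 2:
        raise ValueError("need at least two anchor items estimated on both scales")
    a = statistics.stdev(b_target) / statistics.stdev(b_source)
    b = statistics.mean(b_target) - a * statistics.mean(b_source)
    return a, b


def to_target_scale(values, a, b):
    """Re-express abilities or cut scores from the source scale on the target scale."""
    return [a * v + b for v in values]


# Hypothetical difficulties of four anchor items administered in both
# assessment X (source scale) and assessment Y (target scale).
anchor_on_x = [-1.0, -0.2, 0.4, 1.2]
anchor_on_y = [-1.5, -0.3, 0.6, 1.8]

a, b = mean_sigma_constants(anchor_on_x, anchor_on_y)

# A hypothetical "minimum level" cut of -0.5 on X, expressed on Y's scale.
cut_on_y = to_target_scale([-0.5], a, b)[0]
print(f"A = {a:.3f}, cut on Y = {cut_on_y:.2f}")
```

With these made-up values, A is 1.5 and B is 0, so a cut of -0.5 on X corresponds to -0.75 on Y. Note that linking of this kind only aligns the scales; it cannot remove the domain differences listed under Strategy 3. If the anchor items do not represent what both assessments measure, the linked results inherit that mismatch, and Strategy 2 would need a separate linking like this for each grade.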

In the medium term, the most technically appropriate way to assess indicator 4.1.1 is to develop a specific instrument with a clear definition of the minimal level of competency. This is essential because external validity may translate into political legitimacy among assessment participants. Whatever the eventual strategy, it will require the support and collaboration of institutions specialized in international evaluation, possibly in the form of a consortium. By bringing key technical institutions to the table, together with regional and international assessment initiatives, the Global Alliance to Monitor Learning can help ensure both the technical quality of the assessment of indicator 4.1.1 and its all-important political legitimacy.

