
Deep learning four decades of human migration


Quantifying global migration

Current methods for estimating global migration rely on relatively simple techniques compared with the advanced computational approaches adopted in recent years for predicting and explaining human migration and mobility. Estimates of global migrant population stocks, by country of birth and country of residence, are derived from official statistics on foreign-born or foreign populations. Values between census years are filled in by simple interpolation, and missing or inconsistent data are imputed using regional averages, demographic assumptions or alignment with changes in the population totals (ref. 22).
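As a rough illustration of the "simple interpolation across census years" mentioned above, the following sketch linearly interpolates a single bilateral stock between two census counts. The function name and the census figures are hypothetical, and linear interpolation is an assumption; the source does not specify the exact interpolation scheme used by the statistical agencies.

```python
def interpolate_stock(census_counts, year):
    """Linearly interpolate a migrant stock for `year`.

    census_counts maps census year -> observed stock (hypothetical data).
    Linear interpolation is an assumption made for illustration only.
    """
    years = sorted(census_counts)
    if not years[0] <= year <= years[-1]:
        raise ValueError("year lies outside the observed census range")
    if year in census_counts:
        return float(census_counts[year])
    # Find the pair of adjacent census years that brackets the target year.
    for y0, y1 in zip(years, years[1:]):
        if y0 < year < y1:
            weight = (year - y0) / (y1 - y0)
            return (1 - weight) * census_counts[y0] + weight * census_counts[y1]


# Hypothetical stock of one birthplace group observed in two census rounds.
example_counts = {1990: 1000, 2000: 2000}
midpoint_stock = interpolate_stock(example_counts, 1995)  # halfway between the two counts
```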

Migration flow data are much more limited than migrant stock data. Countries that publish migration statistics tend to be rich, developed countries with well-developed statistical infrastructure for monitoring population movements. The scale of migration flows between developing countries, and to and from some of the world’s most populous nations, is often unknown. Consequently, to estimate origin-destination migration flows at the global level, indirect methods have been developed based on changes in global migrant stock estimates. A previous work (ref. 9) reviewed and systematically compared these methods, identifying six of them and grouping them into three classes.

The first class comprises two stock-differencing approaches, which treat changes in bilateral migrant stocks between census rounds as flows. Negative differences are either set to zero or interpreted as return migration. The second class is a migration-rate approach, which derives transition rates directly from a single stock table by dividing each off-diagonal stock count by the global foreign-born population. These rates are then scaled by an approximation of the total number of global flows, calculated as the sum of absolute net migration flows.
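The first two classes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reviewed implementations: the function names, the toy stock tables and the net migration figures are all hypothetical, and the "global foreign-born population" is taken here to be the sum of the off-diagonal stock cells.

```python
def stock_difference_flows(stock_start, stock_end, negatives="drop"):
    """Stock differencing: treat changes in bilateral stocks as flows.

    stock[b][i] is the number of people born in b and living in i.
    negatives="drop"    sets negative differences to zero.
    negatives="reverse" reads a fall in the b-born stock in i as return
                        migration from i back to b.
    """
    n = len(stock_start)
    flows = [[0.0] * n for _ in range(n)]
    for b in range(n):
        for i in range(n):
            if b == i:
                continue  # diagonal cells are native-born residents, not migrants
            diff = stock_end[b][i] - stock_start[b][i]
            if diff >= 0:
                flows[b][i] += diff
            elif negatives == "reverse":
                flows[i][b] += -diff
    return flows


def migration_rate_flows(stock, net_migration):
    """Migration-rate approach: scale stock shares by a global flow total.

    Each off-diagonal stock is divided by the global foreign-born population
    and multiplied by the sum of absolute net migration (one value per country).
    """
    n = len(stock)
    foreign_born = sum(stock[b][i] for b in range(n) for i in range(n) if b != i)
    total_flow = sum(abs(x) for x in net_migration)
    return [[0.0 if b == i else stock[b][i] / foreign_born * total_flow
             for i in range(n)] for b in range(n)]


# Hypothetical three-country stock tables at the start and end of a period.
start = [[100, 10, 5], [20, 200, 0], [8, 4, 300]]
end = [[100, 15, 3], [18, 200, 2], [8, 9, 300]]

dropped = stock_difference_flows(start, end, negatives="drop")
reversed_flows = stock_difference_flows(start, end, negatives="reverse")
rate_flows = migration_rate_flows(start, net_migration=[3, -1, -2])
```

In the "reverse" variant, the fall of two people in the stock of country-1-born residents of country 0 is added to the flow from country 0 to country 1, which is how the sketch reads "interpreted as return migration".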

The third class includes three demographic accounting methods, which reconcile changes in birthplace-destination stocks, total population, births and deaths with estimated origin-destination flows. In this framework, adjusted stock tables at the beginning and end of the period are used to define outflow and inflow margins. These margins are then arranged into a three-way array of origin, destination and birthplace flows. Missing values in this array are imputed so that the reconstructed flow table matches the stock-based margin totals. To achieve this, an iterative proportional fitting algorithm, adapted from a past work (ref. 73), is applied to adjust the cell values until the row, column and diagonal constraints are satisfied.

Variants of this framework differ in how they handle inconsistencies between the inflow and outflow margins. These are either absorbed into an open demographic system by introducing a residual category (ref. 12) or resolved in a closed demographic system by scaling the adjusted stock tables to enforce consistency (ref. 13). A further extension combines two imputation strategies within the closed demographic accounting system by weighting alternative treatments of the diagonal cells in the array, which represent non-migrants (ref. 14). The first imputation sets the diagonals to their maximum feasible values, whereas the second applies an independent log-linear fit that relaxes this constraint. The final flow estimates are obtained as a weighted average of the two imputations, with weights calibrated against harmonized European migration flow data. Although each method has trade-offs, the weighted demographic accounting approach produced the estimates most consistent with reported flow statistics in countries with official data (ref. 9).
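To make the fitting step concrete, the sketch below runs classic two-way iterative proportional fitting on a single origin-destination table with a structurally zero diagonal. It is a simplification: the methods described above fit a three-way origin, destination and birthplace array with additional diagonal constraints, which is not reproduced here. The function name, seed table and margin totals are hypothetical.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Two-way iterative proportional fitting (simplified sketch).

    Rescales `seed` alternately by rows and by columns until row sums match
    row_targets (outflows) and column sums match col_targets (inflows).
    Cells that are zero in the seed stay zero throughout.
    """
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each origin's flows to its outflow total.
        for r, target in enumerate(row_targets):
            total = sum(table[r])
            if total > 0:
                table[r] = [v * target / total for v in table[r]]
        # Column step: scale each destination's flows to its inflow total.
        for c, target in enumerate(col_targets):
            total = sum(row[c] for row in table)
            if total > 0:
                for row in table:
                    row[c] *= target / total
        # Columns now match exactly, so convergence is judged on the rows.
        row_error = max(abs(sum(table[r]) - t) for r, t in enumerate(row_targets))
        if row_error < tol:
            break
    return table


# Hypothetical margins for three countries; both sets of totals sum to 60.
seed = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
outflows = [30.0, 20.0, 10.0]
inflows = [15.0, 25.0, 20.0]
fitted = ipf(seed, outflows, inflows)
```

The seed plays the role described in the text: it is the only place where prior or covariate information enters, and the margins then determine how far the cells move from it.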

All applications of these indirect methods are constrained by the temporal spacing of the available migrant stock data, typically five-year intervals, and by errors or inconsistencies in the underlying stock statistics. As a result, the estimated flows inherit the limitations of the stock data, including inaccuracies in imputations by the United Nations Department of Economic and Social Affairs (UN DESA) or other agencies, which can affect both the precision and comparability of global migration flow estimates. Moreover, these methods make use of very limited covariate information: they admit only a single variable, as the seed values of the iterative proportional fitting procedure, and that variable has minimal impact. This further restricts their ability to capture corridor-specific dynamics or explanatory factors.

More recently, direct estimates of global migration flows have been produced using large-scale online data sources (ref. 21), discussed above. The estimates represent a substantial advance over indirect methods, as they are based on observed movements, provide higher temporal resolution, and avoid relying solely on changes in migrant stock data. However, the data cover only a limited number of years, omit several important countries, and will not be updated, restricting their long-term utility.

Recurrent neural network approach to quantifying global migration

Demographic account for global migration estimation

UN DESA provides estimates of global migrant stock $S_{bi}(t)$, that is, the number of people born in country $b$ living in country $i$ at time $t$ (ref. 22). These data are given at five-year intervals from 1990 to 2020, as well as a recent estimate for 2024. The stocks evolve deterministically according to the equation

... continue reading