Data
All land cover and protected area datasets used in the analysis (see Supplementary Information, ‘Data sets’ and Supplementary Tables 1 and 2) are publicly available. I measured vegetation patterns using the Digital Earth Australia (DEA) 1:250,000 Land Cover Map (https://maps.dea.ga.gov.au)27,28,56. The DEA Land Cover Map was created using the global standard of the Food and Agriculture Organization’s (FAO) Land Cover Classification System (LCCS) Taxonomy Version 257,58. The dataset combines quantitative reflectance data from the Landsat satellite and qualitative environmental descriptions to measure Australian vegetation cover and type at a grain of 25 × 25 m over the period 1988–2020. Although annual data are available, for this analysis I focused on the first and last years of the publicly available time series.
The DEA Land Cover Map distinguishes between cultivated and natural terrestrial vegetation, and between terrestrial and aquatic vegetation. I used the natural terrestrial vegetation data because of its higher value than cultivated land for biodiversity conservation. Natural vegetation in the DEA Land Cover Map describes areas that “have all or most of the characteristics of natural or semi-natural herbaceous or woody vegetation (based primarily on floristics, structure, function and dynamics)”27,28,56. In Australia, natural or semi-natural vegetation may include areas that are grazed by domestic livestock to keep grassy areas open and reduce fire risk. Areas classified as naturally vegetated have a greater fraction of photosynthetic or non-photosynthetic vegetation than the bare soil fraction for at least two consecutive months. By contrast, vegetation classified as cultivated terrestrial vegetation occurs in areas “where management practices aimed at cultivation (including for grass production) are actively performed during the year being shown… [they] include crop planting and harvesting, fertilization and ploughing”59.
A few details of the classification scheme are particularly relevant to interpretation of the findings of this analysis. Managed plantations and orchards are difficult to distinguish from natural vegetation and may be misclassified as described by the DEA Land Cover Map authors; and in some locations, in northern Australia in particular, variability in natural vegetation growth and fire management patterns can result in natural woodlands being misclassified as cultivated land. Similarly, in arid and semi-arid regions, natural terrestrial vegetation may transition into the natural surface class (naturally bare ground or rock) during dry periods. Finally, in urban areas, vegetated pixels are classified as natural only if the pixel is at least 30% vegetated. Although some years are singled out by DEA as being more likely to include classification errors (for example, 2010 as an unusually wet year and 2015 as an unusually dry year), the two years included in this analysis, 1988 and 2020, were both relatively ‘normal’59. Although these details are unlikely to affect the overall conclusions drawn from this analysis, they may affect specific instances of findings about spillover effects. Note that to avoid confusion with other numeric information, X is used throughout the manuscript as a prefix to indicate a vegetation land cover class.
Data for potentially confounding variables (that is, variables that describe influences on vegetation cover other than protection) were assembled from a variety of sources. Variables were based initially on the findings of Joppa and Pfaff60 and a list of established predictors described in related publications15,61,62. The 14 biophysical variables for initial consideration included the following (described in more detail in the Supplementary Information and Supplementary Table 2): elevation, slope63, BIO1 (annual mean temperature), BIO5 (maximum temperature in warmest month), BIO6 (minimum temperature in coldest month), BIO7 (temperature annual range), BIO9 (mean temperature of driest quarter), BIO12 (mean annual precipitation), BIO15 (precipitation seasonality (coefficient of variation))64, SoilDES (soil depth in cm), Soil BDW0_5 and soil BDW5_1565,66 (soil bulk density, measured using dry mass per volume, at depths of 0–5 cm and 5–15 cm), AET (actual evapotranspiration)67, and agricultural capability68. This list includes well-established influences on plant growth patterns, the climatic and soil variables identified by Bennett et al.61 as being most correlated with forest biomass, representative measures of the soil–plant interface, and a human perspective (agricultural potential) on the capacity of locations to support agriculture.
To correct for biases due to the placement of protected areas I also included four location-related measures: distance to town of >250,000 people, distance to coast, distance to major road, and distance to any road. I initially included human population density, but replaced this with ‘distance to town of >250,000 people’ because large areas of Australia have such low population densities that using actual human population density provided very little benefit in distinguishing between most locations and resulted a large number of missing data values that hindered statistical analyses. Note that although these variables will also correlate with land clearing processes, they are intended to correct for protected area position rather than to measure land clearing. Thus, the dates of the coverages used to generate these variables need only to align with the relatively slow process of protected area gazettement, rather than with any recent changes in land use or population expansion.
Extraction from these datasets to create attributes for each sampling polygon was undertaken in Google Earth Engine. The number of included covariates was constrained by the need to avoid overfitting in matching and regression analyses by maintaining a working ratio of at least five independent sampling points per predictor variable. I used variance partitioning and then principal components analysis to reduce the number of matching covariates (see Supplementary Information, section 7.4, for full details and related statistics). The first five principal components explained over 83% of the variance in the environmental data. The variance partitioning analysis indicated that these components explained ~18% of the variance in vegetation composition across all sampling polygons in both 1988 and 2020, with additional components explaining just over 1%.
Judging that the inclusion of additional components was not worth either the resulting loss of data from the larger analysis or (alternatively) the risks of overfitting regression models for protected areas where available sample sizes of sampling polygons were small, I ran the matching analysis using the first five principal components to correct for environmental confounding variables. Using principal components also has the added advantages that: (1) the components themselves were normally distributed, although slightly skewed in a few cases; and (2) the components were entirely uncorrelated with each other. Thus, the components fully meet textbook assumptions of regression analysis and related techniques.
Data on the management categories, gazettement dates, and other attributes of protected areas across Australia are provided in the CAPAD 2020 dataset. Some correction of minor errors in this dataset was required prior to use (details in Supplementary Information, section 1).
Data processing
... continue reading