Section 21 Discussion

The results show that we can provide improved estimates of past land-use change using multiple data sources within a Bayesian framework. The main advantage of the approach is that it provides a coherent, generalised framework for combining multiple disparate sources of data. Recalling our original aim of making spatially- and temporally-explicit estimates of land-use change in the UK, using multiple sources of data, we conclude that this aim has largely been met. We have added new data beyond those used in the present inventory method: \(B\) is now jointly determined by the annual national-scale Agricultural Census data as well as the decadal Countryside Survey data, rather than by the latter alone; the spatial pattern and resulting vectors are constrained by the high-resolution information contained in IACS, the holdings-level Agricultural Census data, LCM, LCC and CORINE (deliverable A.1). The data have been assimilated so as to use the information from sources which we believe reliably represent the absolute magnitude of change, together with data sets which provide spatial information. This results in a time series of maps describing our best estimate of land-use change given the available data (deliverable A.1). Furthermore, rather than producing a single time series of maps, we have a set of these which represents the posterior distribution of the data cube \(U\). This distribution quantifies the joint probability distribution of the parameters, and properly propagates the uncertainty from the input data to the final output (deliverable A.2). The resulting data cube \(U\) has been summarised in vector format (deliverable A.1, e.g. Figure 20.13). The code is documented, version-controlled and made available via GitHub (deliverable A.3).

We suggest the work described here represents an advance on the existing GHGI methodology, but there remain a number of improvements and issues to resolve before this might be made operational. We outline some of the more important ones here.

Possibly most importantly, we should attempt to characterise the uncertainty in the different data sources. The Bayesian approach is quite capable of incorporating different uncertainties for different data sources, and indeed at the level of the individual observation, or in more complex ways, such as accounting for changes in observation uncertainty over time (as we suspect happens in IACS). However, doing this in a rigorous and quantitative way can be difficult, and at present we treat all observations as equally reliable. Some improvement on this should at least be possible, and could involve more detailed analysis of the data sets we already have. For example, a thorough comparison of the IACS data with the holdings-level Agricultural Census data would be informative: good agreement would give us some confidence that these reflect reality rather than artefacts of the reporting methodology; poor agreement would necessarily raise doubts. The same comparisons could be made between other data sets, such as ground-based and EO-derived data.
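
As a minimal sketch of how this could be implemented, assuming a Gaussian observation model with a separate error standard deviation for each data source (the names and values below are illustrative placeholders, not calibrated estimates):

```r
## Illustrative only: source-specific observation error in a Gaussian
## log-likelihood. 'obs' is a data frame of observed areas of change with a
## 'source' column; 'sigma_by_source' gives an assumed error standard
## deviation (kha) for each source.
log_likelihood <- function(pred, obs, sigma_by_source) {
  sigma <- sigma_by_source[obs$source]
  sum(dnorm(obs$area, mean = pred, sd = sigma, log = TRUE))
}

sigma_by_source <- c(CS = 20, AgCensus = 10, IACS = 5)  # placeholder values, kha
```

Time-varying uncertainty (e.g. for IACS) could be handled in the same way, by making the error standard deviation a function of year as well as source.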

Most usefully, we should attempt to introduce independent constraints. For example, as discussed earlier, using data on the area and age structure of temporary grassland would constrain our estimates of the area changing between crop and grass each year. Other forestry data sets, such as annual timber harvest volumes, felling license data, and Woodland Grant Scheme planting data could act as constraints in a similar way.
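
As a hypothetical sketch of how such a constraint might enter, suppose the observed area of first-year temporary grassland were taken as an independent estimate of the crop-to-grass transition area in the same year; an extra likelihood term of the following form could then be added (all names and the error value are assumptions for illustration, not part of the existing code):

```r
## Hypothetical extra likelihood term from an independent constraint:
## the observed area of first-year temporary grassland should approximate
## the modelled crop-to-grass transition area in that year.
constraint_loglik <- function(pred_crop_to_grass, obs_tg_first_year, sigma = 5) {
  dnorm(obs_tg_first_year, mean = pred_crop_to_grass, sd = sigma, log = TRUE)
}
```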

The CS data were used in the form in which they are used in the current GHGI - as a constant annual value over the period between surveys. However, this value is not actually constant; we know the sample mean value over the inter-survey period, but not the annual variability. We should change the way this is represented accordingly. This would have the effect of increasing the uncertainty in the values for individual years, and permitting greater year-to-year variation, as seen in AgCensus.
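
A minimal sketch of the change in representation (assuming a Gaussian form; the names are illustrative): rather than comparing the CS value with each year's modelled change, compare it with the mean of the modelled annual changes over the inter-survey period, so that individual years are free to vary:

```r
## 'pred_annual' is the vector of modelled annual changes over the
## inter-survey period; 'cs_mean' and 'cs_sd' are the CS estimate of the
## mean annual change and its uncertainty.
cs_loglik <- function(pred_annual, cs_mean, cs_sd) {
  dnorm(cs_mean, mean = mean(pred_annual), sd = cs_sd, log = TRUE)
}
```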

As discussed in the section on the Agricultural Census, there are some apparent step changes in these data with no known cause. Our approach of smoothing these out where necessary is effective, but subjective. Because the Agricultural Census is the longest-running data set with the widest coverage, and therefore plays an important part, some effort spent in understanding these step changes would be worthwhile. How much smoothing we apply to the data is open to debate, and we should understand the sensitivity of the final output to this.
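
One simple sensitivity check is sketched below, using a running median with several arbitrary window widths on a simulated placeholder series; the real analysis would use the Agricultural Census areas themselves:

```r
## Sketch of a sensitivity test on the degree of smoothing applied to an
## Agricultural Census time series. 'ag_area' here is simulated, not real data.
set.seed(1)
ag_area  <- 100 + cumsum(rnorm(40, 0, 2))                     # placeholder annual areas (kha)
smoothed <- lapply(c(3, 5, 7), function(k) runmed(ag_area, k))
## Re-running the assimilation with each smoothed series and comparing the
## posterior estimates of B would quantify sensitivity to the smoothing.
```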

The present project was carried out within a rather short time frame, and the process of obtaining data, and the permission to use it, can be slow. As a result, at the time of completion, we still lack permission to use some important data sets that we would want to include (notably IACS and the holdings-level data), even though we are already in possession of the data files. Acquiring these permissions requires significant staff time to establish and maintain communication with personnel in the numerous government departments who hold the data, and needs some understanding of the legalities of data sharing. This would be an area to prioritise in any future work.

So far, we have pooled all the data for the UK, and allowed the spatial information in the second part of the procedure to differentiate between the four devolved administrations (DAs) which make up the UK. This approach is certainly more efficient and simpler in terms of programming and data processing. However, it is possible that more accurate results could be achieved by applying both parts of the data assimilation procedure to each of the four DAs separately. This is not essential, and would mean replicating the whole process four times, but it merits consideration. Alternatively, if we apply the GHGI inventory model spatially, this might actually simplify the process, at the expense of larger data volumes.

We have used only two different assumptions about the prior distribution of the \(B\) parameters. Preliminary analysis suggested that the results are not very sensitive to this choice. However, we suggest this is worth further investigation, and a wider range of priors should be examined.
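
For example, a wider comparison might include a vague uniform prior alongside exponential and half-normal alternatives for an element of \(B\); the sketch below simply plots three such candidate densities (the distributions and parameter values are illustrative only, not those used in the analysis):

```r
## Sketch: three candidate prior densities for a single element of B
## (an annual gross change area, kha). Parameters are illustrative.
x <- seq(0, 100, length.out = 500)
priors <- cbind(
  uniform     = dunif(x, 0, 100),
  exponential = dexp(x, rate = 1 / 20),
  half_normal = 2 * dnorm(x, mean = 0, sd = 25)
)
matplot(x, priors, type = "l", lty = 1,
        xlab = "Area of change (kha)", ylab = "Prior density")
legend("topright", legend = colnames(priors), col = 1:3, lty = 1)
```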

A potential issue is the assumption of independence of errors. Wrongly assuming independence when errors are in fact correlated would produce unreasonably small uncertainties in the posterior parameters, although that does not seem to be the case here. However, we need to be careful when selecting data sources that these are truly independent estimates, and that we are not double-counting the same data. A much larger source of uncertainty is probably associated with classification. Depending on the definitions used to delimit land-use classes, and the consistency with which these definitions are applied, quite different areas may be calculated for the same nominal classes. There is a real potential problem in combining data from different sources in that we may not be comparing like with like. Here, we reduce this problem by using a relatively coarse land-use classification, with only six classes. It would become more problematic if we attempted to distinguish more refined classes.
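
The effect of double-counting can be illustrated with a standard result from sampling theory (not a result derived from our own analysis): for \(n\) observations sharing a common error variance \(\sigma^2\) with pairwise error correlation \(\rho\), the variance of their mean is

\[
\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n}\left[1 + (n - 1)\rho\right],
\]

which tends to \(\rho\sigma^2\) rather than zero as \(n\) grows. Treating correlated sources as if they were independent (\(\rho = 0\)) therefore understates the posterior uncertainty.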

Whilst the code has good internal documentation (in comment lines, and text within the Rmarkdown files), a document to act specifically as a user manual is still lacking. We consider most parts of the code to be reasonably robustly tested. However, there is likely to be an artefact arising from the order of iteration in the importance sampling loop described in section 20: the land-use classes are assigned an arbitrary integer 1-6, and this determines the order in which land-use changes are computed. We suspect this has only a minor effect, if any. A simple solution would be to randomise the order of iteration every time, so as to remove any bias. This may be an unnecessary complication, but that should be confirmed.
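
A sketch of the randomisation (variable names are illustrative, not taken from the existing code):

```r
## Randomise the order in which the six land-use classes are processed in each
## pass of the importance sampling loop, to remove any bias from the arbitrary
## 1-6 coding.
n_iterations <- 1000  # placeholder
for (i in seq_len(n_iterations)) {
  for (lu in sample(1:6)) {
    ## ... compute land-use change for class 'lu' as at present ...
  }
}
```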

It is reasonably straightforward to display the uncertainty associated with the final output, the carbon flux from land-use change. This is a time series of a continuous variable, and can be plotted as a line with a shaded band representing 95 % CI limits (Levy et al. 2018). Summarising and visualising the uncertainty in categorical data like land use itself, and its change over time, is much harder. The plots in Figures 20.13, 20.14 and 20.15 can summarise the data, but they do not capture the uncertainty, and some thought needs to be given to this.
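
One possible summary is sketched below, assuming the posterior samples of \(U\) for a given year can be arranged as a cells-by-samples matrix of integer class codes; the matrix here is simulated purely for illustration:

```r
## For each grid cell: the modal land-use class across posterior samples and
## the fraction of samples agreeing with it, as a simple measure of
## categorical uncertainty.
set.seed(1)
U_samples   <- matrix(sample(1:6, 100 * 50, replace = TRUE), nrow = 100)  # placeholder
modal_class <- apply(U_samples, 1, function(x) as.integer(names(which.max(table(x)))))
agreement   <- apply(U_samples, 1, function(x) max(table(x)) / length(x))
## Mapping 'agreement' alongside 'modal_class' would show where the posterior
## land-use assignment is most uncertain.
```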

Our method introduces much larger data volumes than the existing method, and a dependence on some level of high-performance computing (HPC). The ramifications of this, the potential risks, and the benefits need to be weighed up. However, trends in the availability and ease of use of HPC mean that this limitation will diminish over time; low-cost commercial cloud computing services, such as Amazon Web Services and Google Earth Engine, are becoming widely available. The NERC JASMIN facility is freely available and ideal for our purposes at present, but there is no particular dependence on it.

The remit of this project was to provide land-use maps and vectors for the GHGI. Because these data are not yet used in the current LULUCF methodology, no testing of the suitability of our output data has taken place. Before taking this work further, it would be wise to consider how the data will actually be used in a new LULUCF methodology. At a minimum, it should be checked that the vector data meet the needs of the soil carbon, land management and CARBINE models. We have previously demonstrated this with the first of these (Levy et al. 2018), but that used a model of minimal complexity, lacking some of the features of the operational inventory model. The CARBINE model run by Forest Research uses closely related input data, and we should consider how these two activities should relate to one another. Some planning ahead would help focus what is needed in future.

References

Levy, P., M. van Oijen, G. Buys, and S. Tomlinson. 2018. “Estimation of Gross Land-Use Change and Its Uncertainty Using a Bayesian Data Assimilation Approach.” Biogeosciences 15 (5): 1497–1513. https://doi.org/10.5194/bg-15-1497-2018.