I sent a link to my recent post The Hunt For Global Warming: Southern Hemisphere Summary to Professor Richard Muller at Berkeley, drawing attention to the gulf between the Berkeley Earth Surface Temperature (BEST) series for southern hemisphere land and the compilations produced by Roger Andrews and me (Figure 1), in the hope that he or his group might help us understand where the discrepancies lie. He passed this on to Steven Mosher to respond, and we exchanged several emails. Most of this correspondence shall remain confidential, but suffice it to say that Mr Mosher pointed out that they have verified and tested BEST, and since neither Roger nor I had documented verification of our methods, the onus was on us to do so.
In summary, Roger Andrews’ (RA) compilation for Southern Hemisphere land has good geographic cover and uses 369 records. My compilation (EM) to date uses 174 records and is specifically designed to sample low population areas. It nevertheless gave a result very similar to RA’s. Since 1882 BEST runs at about 0.7˚C per century warmer than either RA or EM (Figure 1). RA and EM use GHCN V2 records that are “unadjusted”. A handful of spot checks in Australia shows that these are the same records as unadjusted BEST. BEST is, however, using its own homogenisation algorithm to supposedly correct for non-climatic artefacts. Roger has compared BEST adjusted with unadjusted records in South America and found a large positive bias in the homogenised set (Figure 2).
Figure 1 Comparison of RA, EM and BEST. BEST series based on monthly data downloaded from their site and recalculated to the equivalent metANN employed by GHCN (DJFMAMJJASON). RA series begins in 1882 hence this is used as the start point. All three series adjusted so that 1882 = zero. There is a reasonably high degree of congruity between EM and BEST, i.e. the peaks and troughs rise and fall together. Note how similar the BEST 1976 feature is to EM. But the gradients are totally different. 1880 to 2011 EM = +0.18˚C per century; BEST = +0.91˚C per century.
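For readers who want to reproduce the recalculation: metANN is the December-to-November annual mean, i.e. the average of the four seasonal means DJF, MAM, JJA and SON, with December taken from the preceding year. A minimal sketch in Python, assuming the monthly means are held in a dict keyed by (year, month):

```python
def met_ann(monthly, year):
    """metANN for one year: the mean of the four seasonal means
    DJF, MAM, JJA, SON, where December comes from the preceding
    year. `monthly` maps (year, month) -> monthly mean temperature."""
    keys = [(year - 1, 12)] + [(year, m) for m in range(1, 12)]
    temps = [monthly[k] for k in keys]
    # Seasonal means over consecutive three-month blocks
    seasons = [sum(temps[i:i + 3]) / 3.0 for i in range(0, 12, 3)]
    return sum(seasons) / 4.0
```

Since the four seasons cover twelve equally weighted months, this is numerically the same as a simple mean of the twelve monthly values from December through November.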
In this post I report on three simple tests of the methodology I employed, to see how robust it is. These are: 1) averaging the gradients of 5 individual records and of 9 groups of records and comparing these with the gradient of the average dT determined on the same record stacks; 2) filling blanks with average data for a region and comparing this with unfilled data; and 3) decimating the data by removing 20% and 50% of the records.
Figure 2 A comparison of BEST homogenised and unadjusted records from S America. At the regional / continental level homogenisation is not supposed to introduce bias, but in BEST it evidently does (chart by Roger Andrews, Worst of Best.)
One of the things I have been told is that no one is averaging anomalies. Looking at a chart of dT spaghetti, Roger and I independently reached the conclusion that the easiest way to see or measure the average trend was simply to take the average of the dT stack on the spreadsheet. But does the gradient of the average dT equal the average of the dT gradients measured for each station?
In Figure 3 I show dT for 5 S Hemisphere stations with long records: one from Australia, one from South Africa, two from South America and one from Base Orcadas, just off the Antarctic Peninsula. Three have cooling trends and two warming trends. Base Orcadas in particular has one of the strongest warming trends in the S Hemisphere.
Figure 3 Five long S Hemisphere records are shown together with the average of the dT stack. The gradient of the average dT stack is +0.37˚C per century. This compares with the average of the gradients of each individual record, which works out at +0.34˚C per century.
Individually the gradients through each data series in ˚C per century are as follows:
Alice Springs: +0.73
Punta Arenas: -0.48
Asuncion Aero: -0.14
Base Orcadas: +2.06
Average = +0.34
This compares with a regression run through the mean of the dT stack, which gives +0.37. The difference of 0.03˚C per century is immaterial.
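This near-equality is no accident: the least-squares gradient is a linear function of the data, so for stations covering identical years the gradient of the average series equals the average of the individual gradients exactly. The residual difference here arises only because the five records have different start and end dates. A quick illustration with synthetic, fully overlapping series (invented data, not the station records above):

```python
import random

def slope(ys):
    """Ordinary least-squares gradient of ys against 0..n-1."""
    n = len(ys)
    xm = (n - 1) / 2.0
    ym = sum(ys) / n
    num = sum((x - xm) * (y - ym) for x, y in enumerate(ys))
    den = sum((x - xm) ** 2 for x in range(n))
    return num / den

random.seed(1)
# Five synthetic "stations", all complete over the same 50 years
stations = [[random.uniform(-1, 1) * 0.02 * t + random.gauss(0, 0.5)
             for t in range(50)] for _ in range(5)]

mean_series = [sum(col) / len(col) for col in zip(*stations)]
avg_of_gradients = sum(slope(s) for s in stations) / len(stations)
gradient_of_mean = slope(mean_series)
# Regression is linear in the data, so these agree to rounding error
print(abs(avg_of_gradients - gradient_of_mean) < 1e-9)
```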
Another way to look at this is to average the dT gradients for each of the 9 regions looked at and to compare this with the gradient of the average dT stack for the 9 regions combined (Figure 4), and with that of the dT stack for all 174 records.
Sn Africa 1: +0.12
Sn Africa 2: -0.53
Central Australia: +0.14
New Zealand: +0.15
This compares with the regression run through the average of the dT stack, which gives +0.20, and a regression through the average of all 174 records, which gives +0.18.
Figure 4 The traces of the averages of the 9 regions I have looked at so far are shown, and the 9 averages are averaged. (Note that the two islands of Signy and Kerguelen are treated as one region.) The average of the 9 regional gradients = +0.28˚C per century. The regression through the average of averages is +0.20˚C per century. Averaging the dT stack of all 174 records and running a regression through that produces +0.18˚C per century.
Note that while records within regions tend to be congruous the regions themselves are not. There can be a tendency for one region to be warm one year while another is cool.
This time there is a larger but still small difference of 0.08˚C per century between the average of the 9 dT gradients and a regression through the average of the 9. I suspect this may be linked to the discontinuous data series that affect, in particular, Antarctica and The Islands. In the average of the 9 groups, Antarctica has equal weight to each other group. In the regression through the averaged dT stack its weighting is significantly reduced, since the data only begin in the 1950s. So let’s take a look at the impact of discontinuous data on the averaging process.
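The weighting effect is easy to demonstrate with a toy example: one long, trendless record alongside a short, late-starting, warming record (both invented for illustration). Averaging the two gradients gives the short record full weight, while a regression through the year-by-year mean weights it only over the years it actually reports:

```python
def slope(pairs):
    """Ordinary least-squares gradient through (year, value) pairs."""
    n = len(pairs)
    xm = sum(x for x, _ in pairs) / n
    ym = sum(y for _, y in pairs) / n
    num = sum((x - xm) * (y - ym) for x, y in pairs)
    den = sum((x - xm) ** 2 for x, _ in pairs)
    return num / den

years = range(100)
flat = {y: 0.0 for y in years}                        # long, trendless
late = {y: 0.02 * (y - 50) for y in range(50, 100)}   # short, warming

# Each station counts equally, regardless of record length
avg_of_gradients = (slope(list(flat.items())) +
                    slope(list(late.items()))) / 2

# Mean of whatever stations report each year, then one regression
mean_series = [(y, sum(d[y] for d in (flat, late) if y in d) /
                   sum(1 for d in (flat, late) if y in d))
               for y in years]
gradient_of_mean = slope(mean_series)

print(avg_of_gradients, gradient_of_mean)  # the two no longer agree
```

The short record’s strong trend is diluted in the mean-series regression because it is absent from the first half of the period, exactly the situation of Antarctica in the stack above.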
Averaging Discontinuous Data
Discontinuous data arise from station records starting and stopping at different times. Of my 174 records, only two run from 1880 to 2011: Capetown and Alice Springs. All other records are incomplete. Out of a possible 22,794 annual average records (131*174) there are only 9,636 recordings. The record is only 42% complete. This most certainly creates potential issues prior to 1900, where the record may become biased by the relatively small number of stations. But what of all those blanks and data discontinuities? Might they impart bias to a regression through the average of the dT stack?
To evaluate this I have filled blanks in all records with the average for the stack from each region. To put this in fancy language I have projected a regional mean into areas with no data. In reality I filled in blanks in my spread sheet with regional average values. The number of filled cells is now 21,473 or 94% complete (Figure 8). It is not 100% complete because certain regions do not have data spanning all the way back to 1880, hence there are periods in some records where there is no regional average with which to fill the blanks, for example Antarctica, The Peninsula and Patagonia.
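The infilling step can be sketched as follows, assuming each record is a dict of year → dT and records are grouped by region (the names and layout here are illustrative, not my actual spreadsheet):

```python
def fill_with_regional_mean(regions, years):
    """Fill blank (station, year) cells with the mean dT of stations
    in the same region that do report that year. Cells stay blank
    when no station in the region has data for that year."""
    filled = {}
    for region, stations in regions.items():
        # Regional mean for each year, from whatever stations report
        regional = {}
        for y in years:
            vals = [rec[y] for rec in stations.values() if y in rec]
            if vals:
                regional[y] = sum(vals) / len(vals)
        for name, rec in stations.items():
            filled[name] = {y: (rec[y] if y in rec else regional[y])
                            for y in years if y in regional}
    return filled

# Illustrative two-station region: each station is missing one year
regions = {"Patagonia": {"A": {1900: 1.0, 1902: 3.0},
                         "B": {1900: 2.0, 1901: 2.0}}}
out = fill_with_regional_mean(regions, range(1900, 1903))
# A's blank 1901 takes B's value; B's blank 1902 takes A's value
```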
Figure 5 This chart shows the regression through the sum of dT for the 174 records. The distribution of records is shown in Figure 7.
Figure 6 This chart shows the sum of the dT stack with blank records filled using the average values for each region to fill the blanks. The distribution of this synthesised record stack is shown in Figure 8.
The difference between the empty cell and the filled cell regressions is 0.03˚C per century, effectively zero. The structure of the data has, not surprisingly, been changed. In particular, three high temperature spikes have been amplified. But the gradient is unchanged.
Figure 7 The annual count distribution of the 174 selected records. Only 42% of all possible annual records actually have data.
Figure 8 Filling blank records with regional means produces a synthetic data stack that is now 94% full. It is not 100% full because some regions do not have data in certain time periods with which to average and fill blanks.
The final test is to see how sensitive the regression is to the number of stations included. The first part removes 20% of records by deleting every 5th record, and the second removes 50% of records, i.e. every second record, which is quite a severe test with so few records at the outset.
Removing 1 in 5 records changes the gradient by +0.04˚C per century, which is immaterial (Figure 9). Removing 1 in 2 records changes the gradient by -0.12˚C per century, a somewhat larger change (Figure 10). This latter test leaves only 3 records in the pre-1888 part of the stack which is far from representative, but it still makes little difference.
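The decimation itself is straightforward to reproduce; a sketch, with hypothetical station names standing in for the 174 records:

```python
def drop_every_nth(records, n):
    """Remove every nth record (1-based): n=5 drops 20% of the
    records, n=2 drops 50%."""
    return [r for i, r in enumerate(records, start=1) if i % n != 0]

# Hypothetical station list standing in for the 174 records
stations = ["station_%03d" % i for i in range(1, 175)]
kept_80 = drop_every_nth(stations, 5)   # 140 records remain
kept_50 = drop_every_nth(stations, 2)   # 87 records remain
```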
Figure 9 Deleting every 5th record produces little difference to the gradient of the dT stack.
Figure 10 Deleting every second record produces a small but recognisable change. This is a severe test, especially for the older part of the stack, which is reduced from 8 to 3 records.
In my original post I addressed two methodological issues:
- The normalisation procedure employed to produce the dT stack. Using the station average mean produced the same result as using a fixed base period of 1963 to 1992.
- Area weighting of regions did not significantly modify the results.
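The first point is expected on arithmetic grounds: the two normalisations differ only by a per-station constant, and adding a constant to a series cannot change its gradient. A sketch of both options, assuming a record is a dict of year → annual mean:

```python
def anomalies(record, base=None):
    """dT series: each annual value minus a reference mean.
    With base=None the reference is the station's own full-record
    mean; otherwise base=(first, last) marks a fixed base period,
    e.g. (1963, 1992). Assumes the record covers the base period."""
    if base is None:
        ref_vals = list(record.values())
    else:
        first, last = base
        ref_vals = [t for y, t in record.items() if first <= y <= last]
    ref = sum(ref_vals) / len(ref_vals)
    return {y: t - ref for y, t in record.items()}
```

The two dT versions of any one station differ by a constant offset, so each individual gradient, and hence the gradient of a fully overlapping stack, is unchanged.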
In this post I have applied some simple methodological tests. These tests may not apply to other data sets because the 174 records used here show little variance and are at the outset fairly homogeneous. Put simply, the majority of these records are flat to gently rising (+0.2˚C per century) and no matter how they are sliced and diced that is the trend they should show (Figure 11).
The argument that my 174 selected records are non-representative is true but carries little weight, since RA has a larger data set with more complete geographic cover of the S Hemisphere but produces a similar result. When I get around to expanding my data cover to more populated areas, many of which lie further north, I fully expect to find marginally steeper warming gradients. The discussion then will focus on whether this has to do with greater human working of the land surface or with the greater land area per degree of latitude as one moves northwards.
The question remains how Berkeley (and GISS) manage to produce 0.9˚C warming out of records that appear on average to carry a +0.2˚C signal. Roger Andrews discusses some of the possibilities in Worst of Best. He speculates that leading contenders may be 1) homogeneity adjustments that may remove real climatic signal, 2) weighting of stations based on their match to a regional expectation, stations that do not match an expectation of warming may be de-weighted out of existence and 3) projection of data into areas with no data from up to 1000 km away. This latter possibility, if correct, opens the door to projecting Northern Hemisphere warming into the Southern Hemisphere.
In the interest of openness and transparency the 1.9 GB of Berkeley code is freely available for everyone to run and read. Does anyone out there have a functioning Berkeley Earth platform? Clive?
Figure 11 Summary of the dT gradients through different data sets using different methods for data treatment.