(1/17) A fact-checker told me that Wahl & Ammann 2007 (WA) used the same proxy data as MBH98. Let's see if this is TRUE or FALSE.
This Python script generates most of the figures in this thread (and downloads ~150 MB of data): pastebin.com/06pKtnQh
(2/17) The MBH98 temperature reconstruction and the WA emulation of it span the period AD 1400–1980 and are concatenations of a dozen or so shorter reconstructions, each using a separate network of available proxies. These notes will focus on the earliest interval (AD 1400–1449).
(3/17) Now for the good stuff: linear algebra.
Skip to tweet 6 if you don't like equations.
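P: proxy data matrix (rows = years, columns = proxy records).
P₀: calibration submatrix of P (the rows for the calibration period).
T ≈ UₖΣₖVₖᵀ: rank-k SVD approximation of the observed temperature field (rows = years, columns = grid cells).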
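A minimal NumPy sketch of the low-rank step, assuming T is a years × grid-cells array that has already been centered. The preprocessing and the helper name `low_rank_svd` are assumptions for illustration, not a reproduction of the MBH98 code, and the data are synthetic.

```python
import numpy as np


def low_rank_svd(T, k):
    """Return Uk, Sk, Vk with T ≈ Uk @ Sk @ Vk.T (truncated thin SVD)."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Uk = U[:, :k]          # temporal patterns, shape (n_years, k)
    Sk = np.diag(s[:k])    # k largest singular values, shape (k, k)
    Vk = Vt[:k, :].T       # spatial patterns, shape (n_cells, k)
    return Uk, Sk, Vk


# Synthetic stand-in for an observed field: 79 years x 30 grid cells.
rng = np.random.default_rng(0)
T = rng.standard_normal((79, 30))
T -= T.mean(axis=0)        # center each grid cell

Uk, Sk, Vk = low_rank_svd(T, k=2)
T_approx = Uk @ Sk @ Vk.T
```

Keeping every component reproduces T exactly; a small k keeps only the leading patterns.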
(4/17) The methods description in MBH98 is a little wordy but amounts to this:
Regression model: P₀ = UₖG + Ε
Calibration: Ĝ = (UₖᵀUₖ)⁻¹UₖᵀP₀
Reconstructed Uₖ: Ûₖ = PĜᵀ(ĜĜᵀ)⁻¹
Reconstructed T: T̂ = ÛₖΣₖVₖᵀ
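The four steps translate almost line for line into NumPy. This is a sketch on synthetic data with hypothetical function names, not the MBH98 or WA code. It solves linear systems rather than forming explicit inverses, which gives the same result in exact arithmetic.

```python
import numpy as np


def calibrate(P0, Uk):
    """Ĝ = (UₖᵀUₖ)⁻¹UₖᵀP₀, shape (k, n_proxies)."""
    return np.linalg.solve(Uk.T @ Uk, Uk.T @ P0)


def reconstruct_pcs(P, G):
    """Ûₖ = PĜᵀ(ĜĜᵀ)⁻¹, computed via the transposed system."""
    return np.linalg.solve(G @ G.T, G @ P.T).T


def reconstruct_field(P, P0, Uk, Sk, Vk):
    """T̂ = ÛₖΣₖVₖᵀ."""
    G = calibrate(P0, Uk)
    return reconstruct_pcs(P, G) @ Sk @ Vk.T


# Synthetic example: k = 2 temperature PCs, 10 proxies.
rng = np.random.default_rng(1)
n_cal, n_cells, n_prox, k = 79, 30, 10, 2
U, s, Vt = np.linalg.svd(rng.standard_normal((n_cal, n_cells)), full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k].T

G_true = rng.standard_normal((k, n_prox))
P0 = Uk @ G_true + 0.1 * rng.standard_normal((n_cal, n_prox))  # P₀ = UₖG + Ε
P = np.vstack([rng.standard_normal((50, n_prox)), P0])          # earlier years, then calibration
T_hat = reconstruct_field(P, P0, Uk, Sk, Vk)
```

With noise-free proxies (P₀ = UₖG exactly), calibration recovers G and the reconstruction returns Uₖ unchanged over the calibration period.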
(5/17) So the reconstructed temperature field is T̂ = PĜᵀ(ĜĜᵀ)⁻¹ΣₖVₖᵀ, and the Northern Hemisphere mean (the "hockey stick") is a weighted mean of T̂ over grid cells. The useful thing to note here is that the temperature reconstruction is a linear combination of the proxy records: once calibration fixes Ĝ, everything to the right of P is a constant matrix.
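Collapsing everything to the right of P into one vector gives a single fixed coefficient per proxy. A sketch with random matrices (w stands in for grid-cell area weights such as cosine of latitude, which is an assumption here; `proxy_weights` is a hypothetical helper) confirms that the weighted mean of T̂ equals P @ β.

```python
import numpy as np


def proxy_weights(G, Sk, Vk, w):
    """β with NH_mean = P @ β, where β = Ĝᵀ(ĜĜᵀ)⁻¹ΣₖVₖᵀw and w sums to 1."""
    w = w / w.sum()
    return G.T @ np.linalg.solve(G @ G.T, Sk @ Vk.T @ w)


rng = np.random.default_rng(2)
n_years, n_prox, n_cells, k = 50, 10, 30, 2
P = rng.standard_normal((n_years, n_prox))
G = rng.standard_normal((k, n_prox))
Sk = np.diag([5.0, 2.0])
Vk = np.linalg.qr(rng.standard_normal((n_cells, k)))[0]   # orthonormal spatial patterns
w = rng.uniform(0.2, 1.0, n_cells)                         # hypothetical area weights

# Full route: reconstruct the field, then take the weighted mean over grid cells.
T_hat = P @ G.T @ np.linalg.inv(G @ G.T) @ Sk @ Vk.T
nh_mean = T_hat @ (w / w.sum())

beta = proxy_weights(G, Sk, Vk, w)   # one fixed weight per proxy record
```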
(6/17) WA used proxy data from this archive: meteo.psu.edu/holocene/publi…
If MBH98 used the same data, then the MBH98 reconstruction would be a linear combination of this data. It's easy to see that this is not the case by regressing the reconstruction on the proxy data.
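The test can be sketched as an ordinary least-squares fit of the reconstruction on the proxy columns over 1400–1449, with a constant term in case the two are offset (an assumption about preprocessing). If the residuals are not numerically zero, the reconstruction is not a linear combination of those proxies. The data are synthetic and `regress_on_proxies` is a hypothetical helper, not the pastebin script.

```python
import numpy as np


def regress_on_proxies(y, P):
    """Least-squares fit y ≈ P @ beta + c; returns (beta, c, residuals)."""
    X = np.column_stack([P, np.ones(len(y))])   # constant column absorbs any offset
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:-1], coef[-1], y - X @ coef


def r_squared(y, resid):
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Synthetic test: the target depends on one series that is NOT in the proxy matrix.
rng = np.random.default_rng(4)
P_wa = rng.standard_normal((50, 12))        # 50 years x 12 proxies (stand-in)
outside = rng.standard_normal(50)           # a proxy missing from P_wa
y = P_wa @ rng.standard_normal(12) + outside

beta, c, resid = regress_on_proxies(y, P_wa)
r2 = r_squared(y, resid)
```

The fitted values are the best any linear emulation built from P_wa could achieve; whatever is left in the residuals cannot be reached with that data.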
(7/17) The regression fit shows the closest WA could have come to emulating MBH98 with the data they used. The actual emulation is much less accurate.
(8/17) Just to check that the code works, here's the WA reconstruction regressed on the WA proxy data. It's an exact fit, as expected.
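A sketch of the same sanity check on synthetic data: a target built from the proxy matrix should fit to rounding error. The rank check matters for the coefficient comparisons that follow. The exact-fit coefficients are unique only if the proxy matrix plus the constant column has full column rank, which requires at most 50 columns for the 50 years 1400–1449. The helper name and tolerance are assumptions.

```python
import numpy as np


def exact_fit_check(y, P, rtol=1e-8):
    """Return (is_exact, full_rank, beta, c) for the fit y ≈ P @ beta + c."""
    X = np.column_stack([P, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rel_resid = np.linalg.norm(y - X @ coef) / np.linalg.norm(y - y.mean())
    full_rank = np.linalg.matrix_rank(X) == X.shape[1]
    return rel_resid < rtol, full_rank, coef[:-1], coef[-1]


rng = np.random.default_rng(5)
P = rng.standard_normal((50, 12))
beta_true = rng.standard_normal(12)
y = P @ beta_true + 0.3                 # built from P by construction

is_exact, full_rank, beta, c = exact_fit_check(y, P)
```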
(9/17) To determine what proxy data was really used in MBH98, we need the larger Climategate dataset, hosted here by Dave Burton.
The proxy data used in WA is a subset of the Climategate data.
(10/17) Redoing the regression analysis with all 15 North American ITRDB PCs included yields a match. The regression coefficients reveal that the first six PCs were used in MBH98, whereas WA only used the first two PCs. Conversely, some proxies used in WA were not used in MBH98.
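Reading off which proxies were used amounts to checking which fitted coefficients are non-negligible. A sketch on synthetic data, assuming the fit is exact and the augmented matrix has full column rank (otherwise zero coefficients are not identifiable). The column names such as `NA_PC1` and the relative tolerance are hypothetical.

```python
import numpy as np


def used_columns(y, X, names, rtol=1e-6):
    """Names of columns whose fitted coefficient is non-negligible."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    beta = coef[:-1]
    cutoff = rtol * np.abs(beta).max()   # relative threshold for "zero"
    return [name for name, b in zip(names, beta) if abs(b) > cutoff]


rng = np.random.default_rng(3)
other = rng.standard_normal((50, 8))      # 8 non-PC proxies (stand-ins)
pcs = rng.standard_normal((50, 15))       # 15 North American PCs (stand-ins)
X = np.hstack([other, pcs])
names = [f"proxy{i}" for i in range(8)] + [f"NA_PC{j}" for j in range(1, 16)]

beta_true = np.zeros(23)
beta_true[:5] = rng.uniform(0.5, 1.5, 5)     # five of the other proxies
beta_true[8:14] = rng.uniform(0.5, 1.5, 6)   # first six PCs
y = X @ beta_true

used = used_columns(y, X, names)
```

Here `used` lists proxy0 to proxy4 and NA_PC1 to NA_PC6. The unused columns come out at rounding-error size.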
(11/17) Here is a side-by-side comparison of the MBH98 and WA regression coefficients.
(12/17) MBH98 and WA also standardized the proxy data differently (detrended versus nondetrended standard deviations), so their regression coefficients aren't directly comparable. For completeness, here are comparisons for each standardization.
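Dividing a proxy by a different standard deviation rescales its regression coefficient by the ratio of the two. If z_a = x/s_a and z_b = x/s_b, a coefficient β_a on z_a becomes β_b = β_a·s_b/s_a on z_b. A sketch, assuming "detrended" means removing a least-squares linear trend over the calibration period. The 79-year length and ddof=1 are also assumptions. Detrending can only shrink the standard deviation, so a trending proxy ends up with a larger amplitude under detrended standardization.

```python
import numpy as np


def sd_plain(x):
    return np.std(x, ddof=1)


def sd_detrended(x):
    """Std of the residuals after removing a least-squares linear trend."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return np.std(x - (slope * t + intercept), ddof=1)


rng = np.random.default_rng(6)
x = 0.05 * np.arange(79) + rng.standard_normal(79)   # trending synthetic proxy
xc = x - x.mean()
s_raw, s_det = sd_plain(x), sd_detrended(x)

# Same proxy under the two standardizations.
z_raw, z_det = xc / s_raw, xc / s_det
y = 2.0 * z_raw                                      # coefficient 2 on the raw-SD version

coef_det = np.linalg.lstsq(z_det[:, None], y, rcond=None)[0][0]
```

The fitted coefficient on the detrended-SD version comes out as 2·s_det/s_raw, so coefficients from the two standardizations must be rescaled before they can be compared.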
(13/17) The difference between WA and MBH98 disappears almost completely if the WA proxy data is replaced with the correct Climategate proxy data and the WA code is modified to standardize by detrended standard deviations.
(14/17) WA begin their results section with guesses. Nowhere do they state the real reasons for the discrepancy in 1400–1449: the swapping of four North American PCs (PCs 3–6, including the heavily weighted sixth) for French and Moroccan data, and the different standardizations.
(15/17) By "equal weighting of the proxies" they mean that they didn't use the weights included in the proxy lists. MBH98 didn't either, so it doesn't explain any difference whatsoever.
Not the greatest start to a results section.
(16/17) Wikipedia also gets the PCA stuff wrong in its hockey stick article, which cites Wahl and Ammann's flawed analysis.
(17/17) In conclusion, Wahl and Ammann did use the wrong dataset, and presented made-up results to boot.