Last week Derek Scissors, a think tank analyst at the American Enterprise Institute, published an article in which he referred to an October 2014 study by Credit Suisse that attempts to measure total household wealth by region and by country. Scissors argues that in the interminable debate about whether or not China will overtake the United States as the world’s largest economy, it is widely assumed that there is only one correct way to decide the answer, and that is by comparing the GDPs of the two countries. Some people argue that nominal GDP at the current exchange rate is the appropriate measure, whereas others prefer to use PPP-adjusted GDP, but there is no reason, Scissors points out, to assume that either of these is in fact the appropriate comparison:
There is a debate over which country has the world’s largest economy. One side cites gross domestic product adjusted for purchasing power parity and puts China on top, while various other indicators show the United States ahead. The claims are used to gauge China’s importance, highlight Sino-American competition, and sometimes identify China as a threat.
What is almost never in dispute is that China is rising economically relative to the United States. If China is not ahead yet, it is said, the day is coming when it will be. However, at least one vital indicator casts doubt on that thesis: national wealth. From the beginning of 2008 through the middle of 2014, China may have lost ground to the United States in total wealth.
As Scissors points out, “Credit Suisse put net private American wealth at $42.9 trillion, compared to $4.7 trillion for China: a ratio of more than 9:1”, meaning that the US is 9.1 times wealthier than China. Their GDP ratio, however, is very different. A quick check shows that at the end of 2014 China reported GDP of $9.18 trillion, whereas the US reported $16.77 trillion, so that U.S. GDP is 1.8 times China’s GDP.
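As a quick check on the arithmetic, both ratios can be reproduced from the figures quoted above. This is only a sketch; the variable names are illustrative and the inputs are exactly the numbers cited in the text.

```python
# Figures as quoted in the text, in trillions of U.S. dollars.
us_wealth, china_wealth = 42.9, 4.7   # Credit Suisse net private (household) wealth
us_gdp, china_gdp = 16.77, 9.18       # reported GDP

wealth_ratio = us_wealth / china_wealth  # how many times wealthier the US is
gdp_ratio = us_gdp / china_gdp           # how many times larger U.S. GDP is

print(f"wealth ratio: {wealth_ratio:.1f}, GDP ratio: {gdp_ratio:.1f}")
```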
Pettis, an expert on China’s economy, is professor of finance at Peking University’s Guanghua School of Management, where he specializes in Chinese financial markets.
This might at first seem strange. A country’s GDP is supposed to measure the amount of wealth created during the period measured, and is often thought of as analogous to the earnings generated by a business. I am not sure exactly what the Credit Suisse estimates of total household wealth represent, but if we think of them as being equal (or at least proportional) to the total market value of each economy’s assets and of their ability (combined with the labor of American or Chinese people) to produce goods and services, it seems that every dollar of American income is 5.0 times as valuable as every dollar of Chinese income. To put it in stock market terms, the U.S. P/E multiple is five times the Chinese P/E multiple.
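The stock-market analogy can be written out explicitly. Treating household wealth $W$ as a stand-in for “price” and reported GDP $Y$ as a stand-in for “earnings” (an analogy, not a formal valuation), and using the figures quoted above:

$$
\frac{(P/E)_{US}}{(P/E)_{CN}} = \frac{W_{US}/Y_{US}}{W_{CN}/Y_{CN}} = \frac{W_{US}/W_{CN}}{Y_{US}/Y_{CN}} = \frac{42.9/4.7}{16.77/9.18} \approx \frac{9.13}{1.83} \approx 5.0
$$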
Is that plausible? Yes it is, although I make no claim about the accuracy of this particular ratio. While the United States should certainly trade at a higher “multiple” than China, whether it should trade five times higher, or more, or less, is impossible to prove. Although I don’t find the debate about whether the Chinese economy will overtake that of the United States, and if so, when, especially interesting or even intelligent, I do think the question about the relative economic value of the two countries is interesting because it illuminates quite a lot about both the Chinese economy and how we should be thinking about economic growth.
But before I explain why a higher U.S. “multiple” can easily be justified, let me turn back to the question of GDP. A country’s gross domestic product, or GDP, is ideally supposed to be the aggregate value of all the goods and services produced during the GDP period, including any improvement or deterioration of that country’s capital stock. The OECD defines it, perhaps not very elegantly, as “an aggregate measure of production equal to the sum of the gross values added of all resident institutional units engaged in production (plus any taxes, and minus any subsidies, on products not included in the value of their outputs).”
What good is GDP?
GDP, as we all know, is intended to measure a country’s (or region’s) economic wealth creation during a particular period. But, as we also all know, it doesn’t do this very accurately. Simon Kuznets, who is generally credited with having “invented” GDP in a 1934 report to the U.S. Congress, understood its weaknesses, and he fairly consistently warned about the ways in which GDP can be misused. The problem with GDP is that the standard calculations include many things (some people propose, for example, military expenditures or brokerage fees) that don’t reflect any real change in the ability of the economy to produce goods and services, whereas other things that do reflect such changes are often left out. The most typical examples of the latter are what we call positive or negative externalities. For example, while there may well be positive economic value in the activities of a factory that produces chemicals while dumping its effluent in a nearby river, if we ignore the economic costs associated with polluting the river, which may include lower future returns on farming and fishing, higher future health care costs, and less “pleasure” for future hikers, boaters, and nature lovers, then the “real” economic value of producing the chemicals is likely to be lower than its contribution to reported GDP.
What’s more, for something to be part of GDP it has to be part of the recorded cash economy. Prostitutes certainly provide a highly valued consumer service, and an argument can be made that drug dealers do too, at least in a way analogous to bartenders, but their activity is rarely included in GDP figures (although in some countries economists are starting to do so). Babysitting provided by an agency is part of GDP, but if a neighbour or relative babysits for free, it is not. I also want to mention something that is rarely given enough credit for adding to household consumption, certainly to my consumption, but which I think has enormous value as a consumer service. My life has been transformed, and this is not an exaggeration, by Google’s search function, and I am certain that its contribution to my welfare, and that of the rest of the world, vastly exceeds whatever contribution it is calculated to add to global GDP. Maybe not everyone is as ecstatic as I am about the fact that from my office, home, or even while sitting in a taxi, I can easily access vast amounts of information, references, and data, and so put together in hours something that once would have taken me weeks, but if internet searching were taken away from me, it would impoverish me far more than losing a car or most of my wardrobe.
There is no question that GDP, in other words, does not measure what we usually think it measures, but this doesn’t make GDP a useless number. There are two reasons why it makes sense to invest the time and effort in calculating GDP. First, as long as we constantly remind ourselves of the errors implicit in calculating GDP, and try to correct for them, if only informally, GDP can give us a rough proxy for total value creation. The second reason, probably much more important, is that GDP can be very useful in allowing us to make comparisons between economies, or between different time periods.
In fact this is one of the main uses of GDP, and it can be very accurate, but its usefulness depends crucially on a condition that is very easy to specify and yet is so poorly understood and so often violated by economists that it is frankly a little shocking. The GDP calculation might not capture real value creation with great accuracy, and sometimes this failure can be substantial, but as long as these “failures” are consistent and biased in the same direction, the comparisons are still useful and can be extremely precise and accurate. For example, the errors in the calculation of U.S. GDP in 2013 are probably consistent with the errors in the calculation of U.S. GDP in 2014, so that the ratio of 2014’s GDP to 2013’s GDP, from which we derive the GDP growth rate, is probably extremely close to the real growth in the value of the U.S. economy.
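A minimal numerical sketch of why a consistent bias does not matter for growth rates: if reported GDP overstates true value creation by the same proportion in both years, the bias cancels out of the ratio. The “true” values and the 8% and 10% biases below are hypothetical numbers chosen for illustration, not estimates of actual U.S. measurement errors.

```python
def growth_rate(gdp_prev: float, gdp_curr: float) -> float:
    """Growth rate derived from the ratio of two consecutive years' GDP."""
    return gdp_curr / gdp_prev - 1


# Hypothetical "true" value creation in two consecutive years (arbitrary units).
true_2013, true_2014 = 100.0, 102.5

# Assumption: reported GDP overstates true value creation by the same 8% in both years.
bias = 1.08
true_growth = growth_rate(true_2013, true_2014)
reported_growth = growth_rate(true_2013 * bias, true_2014 * bias)

# If the bias drifts instead (8% in 2013, 10% in 2014), it no longer cancels.
drifting_growth = growth_rate(true_2013 * 1.08, true_2014 * 1.10)

print(f"true {true_growth:.2%}, consistent bias {reported_growth:.2%}, "
      f"drifting bias {drifting_growth:.2%}")
```

The consistent bias leaves the growth rate untouched, while even a small change in the bias from one year to the next shows up directly as spurious growth.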
Similarly the GDPs of Canada and the UK, while also embedding incorrect measures of value creation, probably do so in ways that are fairly consistent with the incorrect measures embedded in the calculation of the U.S. GDP. When I say that at the end of 2014, the U.S. economy was 9.1 times the size of the Canadian economy and 7.4 times the size of the British economy, according to their reported GDPs, I can be reasonably confident that the truth is not too far from that number. With other countries, however, I should be much less confident about how usefully the ratio of reported GDPs represents the ratio of the real value of one country’s creation of goods and services to the other’s.
Let me add that while some people might immediately and intuitively understand why comparing reported GDPs with one set of countries must provide a more accurate description of the relative size of the U.S. economy than comparing them with another set, as I will show later, not every economist understands why this must be the case. The problem is not mainly that different countries classify economic activity in different ways when calculating GDP, although this certainly is the case. The real reason is that economies are systems, as Hyman Minsky so richly and usefully explained, consisting of interlocking balance sheets, and economic activity is mediated through the connections between the various balance sheets, which themselves reflect very different institutional structures.
One of Albert Hirschman’s great insights, whose implications I think are still not fully appreciated by many economists, is that all economic activity, especially rapid growth, creates imbalances within the system, and these imbalances always eventually reverse themselves. The ways in which they do so, however, can vary greatly and are necessarily constrained by the institutions that characterize each economy. In some cases — obvious examples include economies with a very powerful state sector, or economies heavily dependent on the production of one or a few commodities, or economies dominated by other highly concentrated sectors (the alarming increase in banking concentration in the United States, perhaps), or economies in which a very large, underemployed rural population is streaming into urban centers, or economies in which business activity is extremely corrupt or heavily bureaucratized, and so on — these institutions can distort the rebalancing process or hamper it long enough for the country to develop deep imbalances.
These deep imbalances can introduce equally deep systemic biases in the calculation of GDP that undermine the implicit assumption behind all GDP comparisons: although GDP calculations necessarily fail to capture accurately the aggregate value of all the goods and services produced during the GDP period, including improvement or deterioration in capital stock, as long as these “failures” are broadly consistent, and biased in the same direction, comparing GDPs can be a meaningful exercise. But economies with very different institutional structures are likely to have very different sets of biases. I am not sure why economists who easily understand the concept of the “agency problem” (in which different incentive structures lead managers to make decisions that might not be in the best interest of shareholders) have trouble understanding that different institutions can create different sets of biases in the way economic activity correlates with wealth creation, and that this undermines the usefulness of GDP as a tool for comparing economies. The agency problem itself, after all, is an example of one such institution and can cause significant value distortions, especially in an economy dominated by the state sector. Economic agents in countries with artificially high interest rates, to take another example, will treat capital very differently than will economic agents in countries with artificially low interest rates, and so the true economic value of activity involving capital will be reported in very different ways when GDP is calculated.
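The same arithmetic applies across countries. In this sketch, with hypothetical numbers throughout, each economy’s reported GDP carries its own multiplicative bias. When the biases are similar the reported ratio matches the true ratio; when one economy overstates value creation much more than the other (the 25% figure is purely illustrative), the reported ratio is distorted even though each country followed its own rules consistently.

```python
def reported_ratio(true_a: float, true_b: float, bias_a: float, bias_b: float) -> float:
    """Ratio of reported GDPs when each economy's GDP carries its own multiplicative bias."""
    return (true_a * bias_a) / (true_b * bias_b)


# Hypothetical true value creation; the true ratio of A to B is 1.8.
true_a, true_b = 180.0, 100.0

# Similar institutions: both economies overstate value creation by about 5%.
similar = reported_ratio(true_a, true_b, bias_a=1.05, bias_b=1.05)

# Different institutions: B overstates by 25% (say, because losses on bad
# investments are never recognized), A by only 5%. Numbers are illustrative.
different = reported_ratio(true_a, true_b, bias_a=1.05, bias_b=1.25)

print(f"true 1.80, similar biases {similar:.2f}, different biases {different:.2f}")
```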
What can you measure with a broken scale?
Fortunately not all factors that undermine GDP comparisons are quite as intractable. One way GDP comparisons between countries can be distorted is that they are made at current exchange rates, and of course these vary constantly in real terms. Once you adjust for differences in the cost of living, it may turn out that the U.S. economy is actually more, or less, than 9.1 times the Canadian economy or 7.4 times the British economy. There is, however, a way to correct for this, and that is to adjust the British and Canadian GDP numbers on a purchasing power parity (PPP) basis so that price differences caused by fluctuations in the real value of the three countries’ currencies are eliminated. This isn’t necessarily easy, since it would be straightforward only if Canadian, British, and American households divided their purchases among various goods and services in exactly the same proportions, but it is possible to make a reasonable approximation.
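Mechanically, the PPP adjustment simply swaps the conversion rate used to turn local-currency GDP into dollars. The sketch below is a simplification with entirely hypothetical numbers. It assumes a PPP conversion factor expressed, like an exchange rate, as local currency units per dollar; real factors come from international price surveys. Note that it corrects only for price levels and does nothing about any other bias embedded in the local-currency GDP figure.

```python
def gdp_in_dollars(gdp_local: float, lcu_per_usd: float) -> float:
    """Convert local-currency GDP to U.S. dollars at a given conversion rate."""
    return gdp_local / lcu_per_usd


us_gdp = 5000.0        # hypothetical, already in dollars
b_gdp_local = 1000.0   # hypothetical GDP of country B, in local currency units (LCU)
market_rate = 2.0      # hypothetical market exchange rate, LCU per USD
ppp_rate = 1.6         # hypothetical PPP conversion factor, LCU per USD (prices lower in B)

ratio_market = us_gdp / gdp_in_dollars(b_gdp_local, market_rate)
ratio_ppp = us_gdp / gdp_in_dollars(b_gdp_local, ppp_rate)

print(f"U.S./B at market rates: {ratio_market:.1f}, on a PPP basis: {ratio_ppp:.1f}")
```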
It may now sound like I am belaboring the point unnecessarily with my next metaphor, but there is a reason for this, so please bear with me. I want to make an extremely important point, one which I have made before, and while engineers, mathematicians, and bond traders find it annoying that I would even bother making such an obvious point again, economists, and through them journalists, seem to have so much trouble understanding it that I am going to try again to explain.
We often hear that the real way to compare two economies is not on the basis of reported GDP but rather on the basis of PPP-adjusted GDP. This is not true. PPP adjustments are useful in certain contexts, but not in most others, and this essay is not the appropriate place to explain why. At any rate, in the article I cited above, Derek Scissors notes that in the debate over whether China or the United States is the world’s biggest economy, “one side cites gross domestic product adjusted for purchasing power parity and puts China on top.” In another article Scissors explains why he dismisses the PPP-adjusted GDP calculation, and while his reasons are correct, I think he misses the main reason to reject the usefulness of China’s PPP adjustment.
To explain why, we will switch gears altogether and assume that I had a broken scale at home that caused the recorded weight of anyone who used it to be consistently higher than his real weight. This would be annoying, but it wouldn’t make the scale useless. It would still serve two useful purposes. First, and most obviously, if I weigh myself every day, I will get a fairly accurate record of the percentage change in my weight on a daily basis, and although I might not know what I really weigh, if all I care about is how well I am managing my weight, then the broken scale is as good as an accurate one. This is the equivalent of comparing a country’s GDP growth from one period to the next — the actual numbers might not be accurate, but the percent change is.
The second thing I can do is compare my weight with that of my friend, who also uses my inaccurate scale, which perhaps we do every January 1 and publish on my blog. This allows our friends to compare our progress and to make jokes at our expense. The progress, or lack of progress, indicated by the broken scale is real, even if the recorded weight isn’t. If we do it on January 1 of any given year, for example, and he turns out to weigh 10% more than I do on my inaccurate scale, it’s a pretty safe bet that he also weighed about 10% more than I did in reality, and our friends can make fun of him for weighing more than me. This, of course, is analogous to comparing U.S. GDP with that of the UK or Canada. The real numbers may be inaccurate, but the comparisons are valid.
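A small sketch of why the shared broken scale still supports the comparison, using hypothetical weights. If the scale reads high by a fixed proportion, the percentage difference between us is preserved exactly; if it instead adds a fixed amount to everyone, the comparison is only approximately right, which is why it is a “pretty safe bet” rather than a certainty.

```python
def percent_heavier(heavier: float, lighter: float) -> float:
    """How much heavier one reading is than another, in percent."""
    return (heavier / lighter - 1) * 100


# Hypothetical true weights in kg: my friend is exactly 10% heavier than I am.
me_true, friend_true = 70.0, 77.0

# Case 1: the broken scale reads 5% high for everyone.
proportional = percent_heavier(friend_true * 1.05, me_true * 1.05)

# Case 2: the broken scale adds a fixed 2 kg for everyone.
fixed_offset = percent_heavier(friend_true + 2, me_true + 2)

print(f"true 10.0%, proportional error {proportional:.1f}%, fixed offset {fixed_offset:.1f}%")
```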
But what happens if we always weigh ourselves in the morning after getting out of bed, but this year my friend was away from home, and by the time he was able to come to my home to weigh himself it was evening and he had already eaten dinner, so that he was likely to be heavier than he would have been in the morning? In that case the comparison between us will have been distorted, in my favor.
There is however a way to fix the problem. I can ask him to weigh himself every morning and evening over several days, and to average the difference, and then I can use this average to adjust the weight he recorded this past January 1. This adjustment won’t be perfect, but we can all agree that it is a useful adjustment because it gives us a more accurate measure of our weight difference on January 1. This adjustment, of course, is analogous to the PPP adjustment – it isn’t perfect, but it certainly improves the accuracy of the comparison.
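The adjustment described above amounts to estimating the average after-dinner gain and subtracting it from the January 1 reading. The readings below are hypothetical. Because each morning and evening pair comes from the same scale, a fixed error in that scale cancels out of each difference, which is exactly why this adjustment is legitimate when both of us use the same scale.

```python
from statistics import mean

# Hypothetical paired readings (kg) taken on the same scale over five days.
morning = [77.0, 76.8, 77.2, 77.1, 76.9]
evening = [78.4, 78.1, 78.7, 78.5, 78.3]

# Average after-dinner gain. Because both readings come from the same scale,
# a fixed error in that scale cancels out of each difference.
evening_effect = mean([e - m for e, m in zip(evening, morning)])

jan1_evening_reading = 78.6  # hypothetical reading taken on the evening of January 1
adjusted_reading = jan1_evening_reading - evening_effect

print(f"estimated evening effect {evening_effect:.2f} kg, adjusted reading {adjusted_reading:.2f} kg")
```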
But let’s say, for some weird reason, I have a second friend with whom I engage in the same ritual. The problem is that this second friend has his own scale, which is also inaccurate, but it is not inaccurate in the same way mine is. Because we live so far apart, we have never been able to figure out what the difference is, but we just know that the two scales are inaccurate in totally inconsistent ways.
Obviously while this second friend can use his scale to measure how well he is managing his weight, any comparison between his recorded weight and mine is pretty much a useless exercise. We know how he is doing on a year-to-year basis, and we know how I am doing, but if we wanted to find out which year it was that we both weighed exactly the same, we wouldn’t be able to tell.
My second friend, however, isn’t terribly smart. Suppose that on January 1 he also weighed himself in the evening and then went through the same adjustment process as my first friend. It would be absurd if he then published the adjusted number and said that this adjustment made the comparison between us much more accurate. Why? Because the adjustment would be functionally random. If his scale recorded weights higher than mine, or lower than mine by less than half of the adjustment, then he would be right to say that the adjustment improved accuracy, but this would just be a result of chance. If his scale recorded weights lower than mine by more than half of the adjustment, then his adjustment would actually make the comparison between us less accurate.
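Under a simplifying assumption, namely that each scale’s error is a fixed offset, the effect of the second friend’s adjustment can be computed directly. Before the adjustment, the error in the measured difference between us is the unknown gap between our two scales plus the evening effect; after it, only the scale gap remains. Whether that is an improvement depends entirely on a gap nobody can measure. The gaps and the 1.4 kg evening effect below are hypothetical.

```python
def comparison_errors(scale_gap, evening_effect):
    """Absolute error in the measured weight difference (his minus mine), before and
    after subtracting the evening effect.

    scale_gap: how many kg higher his scale reads than mine (negative if it reads lower).
    evening_effect: the after-dinner gain he subtracts as his adjustment, in kg.
    Simplifying assumption: each scale's error is a fixed offset.
    """
    error_before = scale_gap + evening_effect  # evening reading on his own scale
    error_after = scale_gap                    # evening effect removed, scale gap remains
    return abs(error_before), abs(error_after)


EVENING_EFFECT = 1.4  # hypothetical, in kg
for gap in (1.0, -0.5, -1.0, -2.0):  # unknown in practice; these values are illustrative
    before, after = comparison_errors(gap, EVENING_EFFECT)
    verdict = "more accurate" if after < before else "less accurate"
    print(f"his scale {gap:+.1f} kg vs mine: error {before:.1f} -> {after:.1f} kg ({verdict})")
```

In this model the adjustment helps whenever his scale reads higher than mine, or lower by less than half the evening effect (0.7 kg here), and hurts otherwise. Since neither of us knows the gap, any improvement is luck.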
Damned PPP again
If everyone understood that the weight comparisons between me and my second friend are inaccurate, and my second friend went through the adjustment process as a joke, and everyone understood that it was a joke, it wouldn’t matter much. If, however, people heard about the adjustment and took the comparisons seriously because they believed that this adjustment represented a real improvement in describing accurately the differences in our respective weights, then I would probably find the whole thing either annoying or even funnier.
So what am I talking about – is there really anything analogous to such an absurd story? Unfortunately there is. It is the comparison between the U.S. GDP and China’s GDP on a PPP-adjusted basis. When the World Bank announced China’s PPP-adjusted GDP, it turned out that the PPP adjustment was much larger than expected, and it implied that, on a PPP basis, China would overtake the U.S. economically much earlier than expected. I posted a blog entry explaining what I thought was quite obvious: that because China’s GDP was constructed differently than that of the United States, direct comparisons between the two were not terribly useful. Worse than useless, however, it was downright foolish to make PPP adjustments and imply that what was in effect a random change in comparability somehow improved the quality of the comparison.
Some people interpreted this to mean that I was arguing that China was using a different set of rules to compile its GDP, but this is not at all what I meant. My point was only that because these two economies were so different, not least because of the enormous roles the two governments played, especially in the financial system, and most especially in the widespread perception of moral hazard within China, it was inevitable that the many ways in which U.S. GDP was miscalculated would differ significantly from the many ways in which China’s GDP was miscalculated, so that the differences would involve very different biases between reported GDP and the “real” value of goods and services produced. In that case any kind of “adjustment” that did not specifically eliminate all the differences in bias, especially a PPP adjustment, would as likely make comparability worse as it would make it better.
I assumed it was obvious that the institutional differences between the two were so great that their inconsistent biases would render GDP numbers incomparable, but just in case, I mentioned the most glaringly obvious such difference: the very different ways in which the two countries recorded the impact of loans made to projects that did not generate increases in productivity large enough to justify the investment. Because these were far more likely to be written down in the United States than in China, and because most economists agree that the difference is very large in GDP terms, the failure to recognize bad loans in China is by itself more than enough to invalidate any PPP adjustment.
But it turned out to be less obvious than I thought. A few months later a friend of mine sent me a Bloomberg article with the title “Bad Math Makes China’s GDP No. 1”, and I discovered that I was the perpetrator of this bad math. I was a little worried at first, because I am enough of a math geek that I think launching into a discussion about the sheer beauty of probability theory during a dinner with friends makes me a charming conversationalist, and when I try to do economics and smart people around me patiently point out my mistakes, math is usually not the part that I get wrong. Logically speaking, it seemed that there were only two ways the author of the article, Noah Smith, might prove me to be mistaken. One way was to prove that GDP calculations are actually always very accurate measures of real value creation, and the second way was to prove that the conditions of moral hazard within which much Chinese lending occurs nonetheless make Chinese banks as likely as U.S. banks to write down loans they have made into projects whose economic value is less than their cost.
Is an obsession with accuracy unhealthy for the economics establishment?
It turned out that I was being rebuked for a very different way of committing bad math than I had expected. Smith’s spanking, if I understand it correctly, was because he thought I was trying to get economists to stop accounting for GDP in a consistent way. Actually I wasn’t doing any such thing. All I did in my PPP essay, or at least what I thought I did, was to point out that accounting models are attempts to approximate reality according to a consistent set of rules, and sometimes, even usually, they do so reasonably well, but there are times when they distort the picture of reality enough that we should recognize that the model is largely useless, and so we should ignore its implications.
Smith seemed to think I was doing something far more subversive. I may be a little confused about his objections, in part because I assume that my explanation for why we should ignore the whole PPP exercise for China is pretty unremarkable. At any rate he writes:
There are plenty of doubts surrounding the Chinese figures, of course. The latest price survey might be just as inaccurate as the earlier ones. Chinese provincial gross domestic product figures are notoriously overstated by job-seeking officials. And the calculation the IMF uses to adjust for price differences, called purchasing power parity, contains a lot of assumptions — using market exchange rates, the U.S. still has the biggest economy.
So when I clicked on a Quartz article entitled “Nope, China’s economy hasn’t yet surpassed America’s,” I expected to see these concerns highlighted. Instead, what I found was that the usually reliable and perspicacious China-watcher Gwynn Guilford had bought into a dodgy theory being promulgated by the renowned Beijing University professor Michael Pettis.
Pettis’s theory, in a nutshell, is that bad investments shouldn’t be counted in GDP… What Pettis is suggesting is that we change the whole way we measure GDP. He wants us to use the discounted present value of assets — in other words, a guess about the far future — as our GDP measure. In other words, he thinks true GDP ought to be a measure of wealth creation rather than a measure of current production.
…We must resist that urge. If economists start trying to subtract perceived malinvestment from GDP, then estimates of GDP will vary wildly from economist to economist, based on how big each one thinks the bubble is. For example, suppose it’s 2007 and I think most of the houses that are being built will eventually be occupied, but you believe that most of them will stand empty and eventually be demolished. If we do what Pettis recommends and subtract our subjective estimates of the percentage of future unused housing from GDP, then you and I will come up with two different GDP numbers!
The friend who emailed me the article is a mathematician who became interested in economics through finance, and he was a little too delighted with the article because he knows how frustrated I get by the way economists regularly combine clunky mathematical intuition with a reverence for mathematically formulated statements that often exceeds their worth. He also knows that in class I am particularly insistent that every model has implicit underlying assumptions, and that we should not use the model until we have worked them out and found them consistent with the rest of our assumptions (he had taken my arbitrage class many years ago at the Columbia Business School). He understood that I called the PPP adjustment for China useless not because I wanted to tear down the edifice of economics but rather because an implicit assumption fundamental to the PPP model is that, if the adjustment is supposed to improve the quality of the comparison, any biases in China’s reported GDP numbers must be broadly consistent with the biases in the reported U.S. GDP numbers. Here is what my friend said in the email:
This guy says you’re wrong, not because your sacred implicit assumption remained intact [i.e. the assumption that biases must be broadly consistent for GDP to be comparable], and not even because Chinese banks didn’t make bad loans (that’s what I expected him to say). You are wrong because China followed the accounting rules in calculating GDP, and if you start questioning the rules you’ll make it impossible to do economics.
He’s right, you know. If you start running around sacrificing precision just because of an unhealthy obsession with accuracy, no one is going to be able to get their work published. Maybe if China wrote down its bad debt, its GDP number would be totally different. Well great, and if they did, you could have a whole different set of GDP numbers to play with, and everyone would be happy. But they didn’t. So deal with it.
Very funny, but actually I don’t think Smith understood that I wasn’t saying anything quite so heretical as he thought. I wasn’t arguing that because GDP is “wrong”, it is therefore always useless and should be jettisoned altogether. I was only making what should have been an obvious mathematical point, which is that without dismantling the whole structure of economics we can still recognize when certain numbers are useless, and we should understand that the PPP adjustment for China is useless. Smith accepts that there are many reasons to question the PPP adjustment, some of which he notes in the first paragraph.
He thinks these are legitimate reasons, and they certainly are, but they are also either minor or debatable. The most important discrepancy in our ability to compare the reported GDPs of China and the United States, however, forces us to treat the PPP adjustment as a useless exercise, and it is neither minor nor, I would have thought, debatable. Mathematicians, engineers, and bond traders usually roll their eyes at the obviousness of my explanation, and I suspect that some of the regular readers of my blog will comment with eye-rolling avatars, if they exist (or do I mean icons?), but this is why I put together the little story about inaccurate scales, which is not a story about why we should throw all our scales away but rather a story about how they can sometimes be useful and how sometimes they can’t. It may seem silly, but so are the constant references to the implications of China’s PPP adjustment, so one way or the other we’re stuck with silliness.
National P/E ratios
But I started this essay by saying I wanted to discuss a number of reasons that might explain why the U.S. economy would be valued at a higher “multiple” than the Chinese economy, and while the US should indeed have a higher multiple, for reasons I explain below, I am not trying to suggest here that the higher multiple implied by the numbers to which Scissors refers is in fact the right one. There are, of course, two parts to the higher multiple, and these correspond to “price” and to “earnings” in the P/E ratio.
The value placed on current and future growth says a lot about the quality of that growth. It also has important policy implications, especially for reforms.