William Bengen calculated sustainable withdrawal rates (SWR) using historical S&P 500 market returns since 1928, leading to the “4% Rule.” More recently, Robert Shiller published stock market returns data back to 1871 using the S&P Composite Index. In this post, I’ll explore the “probability of ruin” using the more extensive Shiller data.
Bengen found that, assuming a fixed 30-year retirement and annual withdrawals of 4% of the retiree’s portfolio value at retirement, a portfolio would have been depleted in less than 30 years in about 5% of the rolling historical periods. The worst-case historical scenario was someone retiring for 30 years beginning in 1966. Hence, the “4% Rule.”
Six of the 110 periods (5.5%, the historical “probability of ruin”) were depleted in fewer than 30 years. Charts of terminal portfolio value (TPV) typically and reasonably assume a retiree’s portfolio can’t drop below zero, but I continued withdrawals for the full 30 years to show the extent to which they failed. Another way to read this is that the deeper the red column, the sooner the portfolio was depleted.
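The counting behind a figure like 6 of 110 can be sketched in a few lines. This is a minimal sketch, not the exact procedure used for the charts: the function names are hypothetical, it assumes inflation-adjusted annual returns with the withdrawal taken at the start of each year, and the return series is made up for illustration rather than taken from the Shiller data.

```python
from typing import List, Sequence


def terminal_value(returns: Sequence[float], start: float, spend: float) -> float:
    """Balance after withdrawing `spend` at the start of each year.

    The balance is allowed to go negative so the depth of a failure
    stays visible instead of being clipped at zero.
    """
    balance = start
    for r in returns:
        balance = (balance - spend) * (1.0 + r)
    return balance


def rolling_terminal_values(real_returns: Sequence[float], years: int,
                            start: float, rate: float) -> List[float]:
    """Terminal value for every rolling window of `years` annual returns."""
    spend = start * rate  # fixed real spending, set once at retirement
    last_start = len(real_returns) - years
    return [terminal_value(real_returns[i:i + years], start, spend)
            for i in range(last_start + 1)]


def probability_of_ruin(terminal_values: Sequence[float]) -> float:
    """Share of periods whose terminal value ended below zero."""
    failures = sum(1 for v in terminal_values if v < 0)
    return failures / len(terminal_values)


# Made-up returns for illustration only; NOT Shiller data.
toy_returns = [-0.20, -0.10, 0.05, 0.15, 0.10, 0.08, -0.05, 0.12]
tpvs = rolling_terminal_values(toy_returns, years=5, start=1_000_000, rate=0.25)
print([round(v) for v in tpvs], probability_of_ruin(tpvs))
```

Note that `probability_of_ruin` throws away everything about each terminal value except its sign.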
Take a longing glance at those tall columns, the ones with really large terminal portfolio values. Then, compare them to the little stubby blue guys. Both are probability of ruin “successes”.
Probability of ruin assumes that you’ll be happy simply not retiring in one of those red years. You’re either in the 5% of scenarios that start a losing period or the 95% of winners and so as long as your bar turns out blue, you’re good, right?
Not really. Wouldn’t you be at least a little happier with a tall blue bar than a short, stubby blue bar, even though both avoid portfolio depletion? I would. Probability of ruin assumes that you’ll be just as happy successfully funding retirement and leaving a hundred bucks to your heirs as you would be leaving them a million. And, that you’d be as dissatisfied with a portfolio that funds 29 years as with one that only funds 15.
I wouldn’t. If a planner said, “Hey, great news! Your retirement is funded 95% of the time”, my response would be, “That sounds great but how well does it turn out when it is completely funded and how badly when it isn’t?”
(For a better iceberg effect, turn your phone upside down while you view the chart below. If you’re reading this on an iMac or PC, probably better to just use your imagination.)
In Figure 2 below, I increased spending from 4.2% of initial portfolio value to 4.75% which, of course, creates more red bars indicating more depleted portfolios.
Note that the red bars appear in four distinct clusters in both Figures 1 and 2. A “5% probability of ruin” might suggest that ruin appears sporadically about every 20 years (5% of periods). It does not, although that is how sequence risk is most often (incorrectly) modeled.
When I increase spending to 5.5%, the result is even more red bars, as expected, but they’re still all within those four clusters. Ruin isn’t a uniformly-distributed event. Probability of ruin is quite high in certain periods of economic distress but relatively low any other time.
Here’s an analogy. Kentucky averages about 12 snowfall days per year but we don’t predict snowfall in July. It’s more likely to snow in winter in Kentucky and high sequence risk is more likely to deplete a portfolio when spending starts in an “economic winter”. Many models of sequence risk predict snow in July.
In the next chart, Figure 3, the y-axis scale changes from $M to $K so we can better see the near misses. I arbitrarily set the definition of success in this test to include TPVs greater than $150,000 and the definition of failure to include TPVs worse than -$150,000. My reasoning is that, given the margin of error in a 30-year retirement plan, the scenarios in between might have gone either way IRL (in real life, as Millennials say). This is arbitrary, but so is drawing the failure line at precisely zero dollars, and this definition factors in more of the uncertainty of the analysis.
I’m not advocating ignoring these data but simply viewing them in three categories instead of two: probably succeeded, probably failed and too-close-to-call, based on our degree of confidence in the outcomes.
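The three-way split is easy to express in code. This is a minimal sketch assuming the ±$150,000 band described above; the function names and the sample terminal values are hypothetical and chosen only to exercise each category.

```python
from collections import Counter
from typing import Iterable


def classify(tpv: float, band: float = 150_000) -> str:
    """Label a terminal portfolio value using a symmetric uncertainty band."""
    if tpv > band:
        return "probably succeeded"
    if tpv < -band:
        return "probably failed"
    return "too close to call"


def tally(tpvs: Iterable[float], band: float = 150_000) -> Counter:
    """Count how many terminal values fall into each category."""
    return Counter(classify(v, band) for v in tpvs)


# Hypothetical terminal values, not results from the charts.
sample = [2_400_000, 90_000, -40_000, -600_000, 151_000]
print(tally(sample))
```

Setting `band=0` puts the line back at zero dollars, which makes it easy to see how many outcomes change labels when the line moves.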
When you have only a few failures, a few close calls make a large difference in probability of ruin. Portfolios that come up just a little short probably aren’t losers, and a small bequest left to heirs is probably too close to call a winner, as well. Thinking we can predict a 30-year retirement much more accurately than plus or minus a few years is overconfidence.
Why do I question “near misses”? Because they probably would have funded most of the 30 years. Only 6% of men and 13% of women aged 65 live another 30 years and all of those who died sooner would have successfully funded their retirements in these scenarios.
The following chart, Figure 4, brings bear markets (the yellow bars) into the picture.
Retirees are often told that retiring into a bear market is deadly, but bear markets don’t appear to be particularly highly correlated with failing portfolio periods. Robert Shiller doesn’t even consider the 1960’s and 1970’s to be bear markets because they were so gradual. Paint those bars blue and the correlation of bear markets to portfolio ruin is even less obvious.
If portfolio depletion isn’t necessarily caused by bear markets, what does cause it? The EarlyRetirementNow.com website found that the sustainable withdrawal rate is nearly completely explained by portfolio returns for the first five and first ten years of 30-year periods. This explains SWR but not ruin — portfolio depletion is completely explained by sequence risk.
Nonetheless, a chart of SWRs is informative. Figure 5 shows the SWRs that would have depleted a portfolio in precisely 30 years from 1872 to 1982.
This is the view of the iceberg below the surface. Sustainable withdrawal rates that deplete portfolios in precisely 30 years are unpredictable and vary widely from 3.8% to 12.6% historically.
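For a known sequence of returns, the withdrawal rate that empties the portfolio in exactly the last year can be computed directly by working backwards from a zero balance. This is a minimal sketch under my own assumptions (constant real spending, withdrawal at the start of each year, every annual return greater than -100%); `depleting_withdrawal_rate` is a hypothetical name, and the returns in the example are invented.

```python
from typing import Sequence


def depleting_withdrawal_rate(returns: Sequence[float]) -> float:
    """Fraction of the starting balance that, withdrawn at the start of
    every year, leaves exactly zero after the final year's return."""
    # `needed` is the balance required at the start of a year to fund
    # 1 unit of spending in that year and in every remaining year.
    needed = 0.0
    for r in reversed(returns):
        needed = 1.0 + needed / (1.0 + r)
    return 1.0 / needed


# With zero real growth, 30 years of spending costs 1/30 of the portfolio a year.
print(f"{depleting_withdrawal_rate([0.0] * 30):.2%}")

# Invented returns: the same four returns in two different orders.
print(f"{depleting_withdrawal_rate([-0.20, -0.10, 0.15, 0.25]):.2%}")
print(f"{depleting_withdrawal_rate([0.25, 0.15, -0.10, -0.20]):.2%}")
```

The last two lines use identical returns in opposite order, which is one way to see sequence risk: the rate depends on when the bad years arrive, not only on how bad they are.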
There are two potential risks with this strategy. The obvious one is that you might fall into the unlucky 5% (one in twenty) and outlive your savings but an equally important concern is that you would almost always underspend. All of the blue bars above the red line represent underspending. You would have spent 4.2% if you retired in 1950, planning to live 30 years, for example, when you could have spent 11.8%. Of course, you couldn’t have known that in 1950.
Some planners have suggested that sequence risk goes away after 10 years. Alas, it does not. The following chart shows the value of portfolios at the end of the first 10 years for historical data.
Consider two of these scenarios, one with about $3,800,000 remaining after the first 10 years and one with about $340,000. If both are assumed to complete the remaining 20 years of a 30-year retirement and both continue to spend the $42,000 they calculated as sustainable back in year one, the larger portfolio would have survived all rolling 20-year historical periods with continued annual spending of 1.1% (42,000 / 3,800,000), while the smaller portfolio would have failed nearly all of those periods with 12.4% annual spending (42,000 / 340,000).
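The arithmetic is worth spelling out, because the same dollar amount becomes a very different withdrawal rate once the portfolio has moved. A small sketch, with `current_spending_rate` as a hypothetical helper name:

```python
def current_spending_rate(annual_spending: float, portfolio_value: float) -> float:
    """Spending as a fraction of what the portfolio is worth today."""
    return annual_spending / portfolio_value


spending = 42_000  # unchanged since year one
for value in (3_800_000, 340_000):
    print(f"${value:,}: {current_spending_rate(spending, value):.1%}")
# prints "$3,800,000: 1.1%" and "$340,000: 12.4%"
```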
Sequence risk might appear to go away after 10 years from the perspective of the start of a 30-year period but after 10 years much will have changed. Sequence risk will change accordingly and become greater or smaller. We can’t know which.
Portfolio failures are caused by poor market returns early in a series of returns. The low returns can result from a quick, precipitous shock like The Crash of October 1929, from a single terrible year of returns like 1937, or from a long, gradual sideways series of mediocre real returns like 1966 to 1975.
These growth rates are explanatory, not predictive. In these charts we are explaining the past, not predicting the future. We have no idea what the next five years of market returns will bring but we can see that low early returns — sequence risk — are not a good way to start.
To summarize, probability of ruin is an interesting rule of thumb with severe limitations. Sequence risk affects all portfolios from which the retiree periodically spends, but probability of ruin only measures the extreme outcomes, those that result in premature portfolio depletion. It treats all failures alike and all successes alike, ignoring the extent of the success or failure. The thin line separating success from failure is arbitrary.
Models of probability of ruin are not robust. They provide a significantly different answer every time they are run even when nothing changes except the Monte Carlo random number draw.
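The run-to-run variation is easy to reproduce with a toy simulation. This is a minimal sketch and not any planner's actual model: it assumes independent, normally distributed real annual returns, and the mean, standard deviation, and trial count are illustrative assumptions rather than calibrated values.

```python
import random
from typing import List


def simulate_ruin_probability(seed: int, trials: int = 500, years: int = 30,
                              rate: float = 0.04, mean: float = 0.05,
                              stdev: float = 0.18) -> float:
    """Estimate the share of simulated retirements that run out of money."""
    rng = random.Random(seed)  # the seed is the only thing that varies
    failures = 0
    for _ in range(trials):
        balance = 1.0  # portfolio expressed as a multiple of its starting value
        for _ in range(years):
            balance -= rate  # withdraw at the start of the year
            if balance <= 0:
                failures += 1
                break
            # Cap losses at -100% so the balance cannot change sign.
            balance *= 1.0 + max(-1.0, rng.gauss(mean, stdev))
    return failures / trials


estimates: List[float] = [simulate_ruin_probability(seed) for seed in range(5)]
print([f"{e:.1%}" for e in estimates])
```

Every input except the seed is identical across the five runs, yet the estimates will typically differ from one another. More trials narrow the spread but do nothing about the assumptions baked into `mean` and `stdev`.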
Probability of ruin is based on some strange assumptions about human behavior, like assuming we will continue to spend the same amount when ruin becomes apparent or that we don’t care how much wealth we have as long as it’s more than zero. It’s also based on fewer than five unique sequences of 30-year historical returns, a truly small sample.
Put all this together and probability of ruin looks like a very poor metric by which to predict, model, or manage retirement finances.
The market grew about 0% from 1927 to 1931 as shown in the bottom panel, for example, and portfolios with spending beginning in 1927 failed sooner than 30 years with 4.75% spending, as the top chart shows.
Originally posted at http://www.theretirementcafe.com/2018/08/probability-of-ruin-in-pictures.html