Retirement planning: why probability software can be unreliable

As financial services become increasingly automated, retirement spending apps have emerged ostensibly to give a reasonable prediction of whether or how long your nest eggs may last in retirement. But they're a far cry from a crystal ball. 

Samuel Bender, left, laughs with 87-year-old Richard Forsyth, right, while working out in a gym at Laurelmead Cooperative retirement community, in Providence, R.I.

Steven Senne/AP/File

March 30, 2016

As financial services become increasingly automated, retirement-spending apps have emerged that enable you to enter your income needs and portfolio information, ostensibly to get a reasonable prediction of whether or how long your nest eggs may last in retirement.

Many of these apps are on the market — some developed by firms such as Betterment, Vanguard, T. Rowe Price and Schwab, and others sold as subscription services to financial advisors for use with their clients. The problem is that users are led to believe they should make important life decisions with the aid of these apps, even though the underlying probabilities are based on inherently unpredictable outcomes.

In truth, applying probability software to retirement-planning analysis is folly. Even the most sophisticated retirement-planning software used by financial professionals is a far cry from a crystal ball.

The problem with probabilities

The failings of probability-based retirement software, specifically apps that apply so-called Monte Carlo simulation techniques, are reasonably well known in professional circles. One of the first academic papers to raise the issue was a 2006 article by renowned retirement researcher Moshe Milevsky, a professor at York University in Toronto, who noted in his introduction:

“Of course, as most investment advisors have known for years, a retirement number — if it actually exists — is vague and imprecise, as it depends on many economic unknowns, especially future equity market returns. After all, this number must be invested somewhere in order to produce income, and the portfolio return process is inherently random.”

In addition to the unpredictability of future returns, Milevsky goes on to document how “probabilities” produced by popular retirement software applications vary from one app to the next, depending upon the applications’ internal assumptions and design parameters.
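
To see how much the output can depend on internal assumptions, consider a minimal sketch of a parametric Monte Carlo engine. Everything here is an assumption made for illustration: the household inputs, the two pairs of mean return and volatility, the use of normally distributed annual returns, and the function name `success_rate`. It is not the method of any particular app.

```python
import random

def success_rate(mean, stdev, years=25, start=1_000_000,
                 first_withdrawal=50_000, cola=0.03,
                 n_sims=5_000, seed=0):
    """Share of simulated paths in which the portfolio funds every withdrawal.

    Annual returns are drawn from a normal distribution (an assumption);
    withdrawals are taken at the start of each year (also an assumption).
    """
    rng = random.Random(seed)
    survived = 0
    for _ in range(n_sims):
        balance, withdrawal = float(start), float(first_withdrawal)
        for _ in range(years):
            balance -= withdrawal
            if balance <= 0:
                break  # money ran out before the horizon ended
            # Clamp at -100% so a single draw cannot push the balance below zero.
            balance *= 1.0 + max(rng.gauss(mean, stdev), -1.0)
            withdrawal *= 1.0 + cola
        else:
            survived += 1  # loop finished without running out of money
    return survived / n_sims

# Two HYPOTHETICAL assumption sets a designer might choose.
optimistic = success_rate(mean=0.08, stdev=0.15)
cautious = success_rate(mean=0.05, stdev=0.18)
print(f"Optimistic assumptions: {optimistic:.0%} of paths survive")
print(f"Cautious assumptions:   {cautious:.0%} of paths survive")
```

The same household receives two different "probabilities of success" simply because the designer's inputs differ, which is the pattern Milevsky describes.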

Another academic study, published in February, concluded that “the advice provided from a majority of these tools is extremely misleading to households.”

These publications have caused some to question whether retirement-planning software offers any value to consumers at all. So what alternatives are there?

‘Back-testing’ software

Financial advisors who use Monte Carlo simulation software often express their clients’ results in terms of the likelihood of a positive outcome. Instead of attempting to predict “probabilities of success,” perhaps a better way to approach retirement planning is from a glass-half-empty perspective.

What you really need to know is not how you may fare if things go well, but what will happen to you if a 10% possibility of rain turns into a 100% probability of a thunderstorm. You desperately need and want to know, “If things go badly in the investment markets, will I still be OK?”

Traditionally, historical “back-testing” software has been used for this purpose. By entering your retirement profile into a back-testing app, you can test how your portfolio may have fared if you had retired prior to previous economic downturns. While such information is useful and interesting to consumers, back-testing also has significant limitations.

Specifically, past returns are unlikely to be repeated in the exact same sequence again, and it is entirely possible that future returns will be worse than historical experience.

Further, suppose you wanted to test how your portfolio might hold up over a 30-year retirement horizon if you had retired at the end of 1999 (just before the 2000-’02 and 2007-’09 bear markets). Because we are only in 2016, it isn’t possible to play out the analysis over the full 30-year horizon. You can’t back-test the future.
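
A back-test can be sketched in a few lines, which also makes the horizon problem concrete. The return sequence below is a made-up placeholder, not real market data, and the function name `backtest` and the start-of-year withdrawal timing are assumptions for illustration.

```python
def backtest(start_balance, first_withdrawal, cola, annual_returns):
    """Replay one fixed sequence of annual returns and record year-end balances."""
    balance = float(start_balance)
    withdrawal = float(first_withdrawal)
    balances = []
    for r in annual_returns:
        balance = max(balance - withdrawal, 0.0)  # withdraw at start of year
        balance *= 1.0 + r                        # apply that year's return
        balances.append(balance)
        withdrawal *= 1.0 + cola                  # cost-of-living increase
    return balances

# HYPOTHETICAL placeholder returns; substitute a real historical series.
placeholder_returns = [-0.10, -0.10, -0.20, 0.25, 0.10]

horizon = 30
path = backtest(1_000_000, 50_000, 0.03, placeholder_returns)
print(f"Years of data available: {len(path)} of the {horizon} requested")
```

The replay simply stops when the historical record does, so a recent retirement date can never be tested over a full 30 years.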

Bootstrapping technique

One solution to the limitations of back-testing is to apply a simulation technique called bootstrapping. While the simulation engine under the hood of many retirement apps requires the program designer to make assumptions about expected mean rates of return and volatility for various asset classes, bootstrapping requires no such assumptions. Simulations are produced instead by randomly sampling historical returns.

If enough simulations are generated — typically a minimum of 5,000 — the median result may be expected to be roughly in line with historical averages. By considering the range of results below the median, bootstrapping programs may illustrate scenarios showing below-average investment returns, with the value-at-risk statistics (the bottom 1%, 5% and 10% results) representing scenarios that may be as bad as or worse than the historical record.
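
The resampling idea can be sketched as follows. The list of returns is a made-up placeholder standing in for a real historical series, and the function names and the simple index-based percentile rule are assumptions for illustration.

```python
import random

def bootstrap_growth(history, years, n_sims=5_000, seed=0):
    """Growth of $1 over `years`, built by drawing past annual returns
    at random with replacement. Returns the results sorted ascending."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        growth = 1.0
        for _ in range(years):
            growth *= 1.0 + rng.choice(history)  # sample one historical year
        results.append(growth)
    results.sort()
    return results

def percentile(sorted_values, pct):
    """Value at a given percentile of an ascending list (simple index rule)."""
    return sorted_values[int(pct / 100 * (len(sorted_values) - 1))]

# HYPOTHETICAL placeholder returns, not real market data.
placeholder_history = [0.25, 0.12, -0.08, 0.18, -0.20,
                       0.07, 0.15, -0.03, 0.30, 0.10]

results = bootstrap_growth(placeholder_history, years=25)
for pct in (50, 10, 5, 1):
    print(f"{pct:>2}th percentile: $1 grows to ${percentile(results, pct):.2f}")
```

No mean or volatility is supplied anywhere; the only input is the list of past returns, and the lower percentiles show what a poor draw of those years would look like.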

For example, the following table shows the bootstrapping simulation results for a 65-year-old investor with a 25-year retirement horizon, a $1 million initial portfolio value and a 70-to-30 stock-bond retirement allocation. In this example, the investor requires a first-year withdrawal of $50,000 (5% of the initial portfolio) and a 3% annual cost-of-living increase thereafter. He estimates his annual investment expense at 1% and has stated that he expects to withdraw proportionately from each asset class each year and rebalance to maintain his 70-to-30 allocation.

In the table below, the percentages in the left column are simulation percentiles, and the columns on the right indicate how much in savings would remain after 5, 10, 15, 20 and 25 years for each simulation percentile.

Percentile  Year 5      Year 10     Year 15     Year 20     Year 25
80%         $1,212,308  $1,358,150  $1,439,849  $1,513,529  $1,483,135
60%         $1,091,368  $1,127,568  $1,108,806  $1,004,560  $796,054
Median      $1,038,653  $1,040,195  $977,559    $833,761    $535,366
40%         $988,481    $958,058    $864,393    $671,558    $316,435
20%         $886,511    $789,407    $615,265    $329,948    $0
10%         $818,595    $685,467    $466,587    $129,937    $0
5%          $763,903    $601,042    $353,836    $0          $0
1%          $675,021    $472,024    $190,510    $0          $0
Worst       $545,910    $259,541    $0          $0          $0

By focusing on the bottom half of the results and displaying the simulation range in five-year increments over the time period, you can gain a much more tangible sense of whether and how long your savings may last. What’s more, by presenting the data in this format, it is easy to then test how changing factors that are within your control (spending amount, withdrawal strategy, asset allocation, investment expenses) may affect the outcomes.

To be clear, there is absolutely nothing predictive in these simulation results, and the simulation percentiles should not be viewed as probabilities. Instead, the worst results merely represent potential scenarios that may be used to give you a clearer picture of what may happen if things go badly.

While bootstrapping offers a neat way to illustrate these data, it is not without flaws and limitations. In this example, bootstrapping was applied only to historical stock market data from 1970 to 2014. The bond portion of the portfolio was assumed to earn a constant 2% per year, which reasonably reflects the return an investor might earn today on a five-year CD or 10-year Treasury. The decision not to bootstrap historical bond data reflects a limitation seen in most retirement apps: bond yields today are near historic lows. As a result, any Monte Carlo application that generates numbers based on mean historical bond returns, or any bootstrapping simulation that randomly samples historical bond index returns, may produce overly optimistic results.
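
The example's stated inputs can be assembled into a rough sketch. The stock returns below are made-up placeholders (the 1970-to-2014 series is not reproduced here), so the output will not match the article's table. The timing of withdrawals, the treatment of the 1% expense as a deduction from each year's return, and the choice to rank each checkpoint year separately are all assumptions.

```python
import random

STOCK_WEIGHT, BOND_WEIGHT = 0.70, 0.30   # 70-to-30 allocation, rebalanced yearly
BOND_RETURN = 0.02                       # constant bond return from the example
EXPENSE = 0.01                           # annual investment expense
CHECKPOINTS = (5, 10, 15, 20, 25)
PERCENTILES = (80, 60, 50, 40, 20, 10, 5, 1)

def simulate_path(stock_history, rng, years=25, start=1_000_000,
                  first_withdrawal=50_000, cola=0.03):
    """One bootstrapped retirement path; returns the year-end balances."""
    balance, withdrawal = float(start), float(first_withdrawal)
    balances = []
    for _ in range(years):
        balance = max(balance - withdrawal, 0.0)      # assumed: start of year
        blended = (STOCK_WEIGHT * rng.choice(stock_history)
                   + BOND_WEIGHT * BOND_RETURN - EXPENSE)
        balance *= 1.0 + blended
        balances.append(balance)
        withdrawal *= 1.0 + cola
    return balances

def percentile_table(stock_history, n_sims=5_000, seed=0):
    """Balance at each checkpoint year for each simulation percentile."""
    rng = random.Random(seed)
    paths = [simulate_path(stock_history, rng) for _ in range(n_sims)]
    by_year = {y: sorted(p[y - 1] for p in paths) for y in CHECKPOINTS}
    return {pct: [by_year[y][int(pct / 100 * (n_sims - 1))] for y in CHECKPOINTS]
            for pct in PERCENTILES}

# HYPOTHETICAL placeholder stock returns, not real market data.
placeholder_stocks = [0.25, 0.12, -0.08, 0.18, -0.20,
                      0.07, 0.15, -0.03, 0.30, 0.10]

table = percentile_table(placeholder_stocks)
for pct, row in table.items():
    print(f"{pct:>2}%  " + "  ".join(f"${v:>12,.0f}" for v in row))
```

Changing `BOND_RETURN`, the withdrawal amount or the allocation weights and rerunning is one way to see how the factors within an investor's control shift the lower percentiles.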

With any retirement-planning app, the devil is in the details. Consumers and advisors alike would do well to take the time to understand the assumptions and limitations inherent in any retirement-planning application.

John H. Robinson is the owner of Financial Planning Hawaii and a co-founder of Nest Egg Guru, a retirement-planning software application for financial professionals. Learn more about J.R. on NerdWallet’s Ask An Advisor.

This article first appeared in NerdWallet.