How to figure out someone's social security number

July 6, 2009

Even careful people who don’t disclose their social security numbers (SSNs) unless absolutely necessary could have them revealed by computer programs crunching publicly available data. All that’s needed to predict a valuable portion of someone’s nine-digit SSN is that person’s date of birth and state of birth.

That’s the conclusion of two researchers at Carnegie Mellon University in Pittsburgh. Alessandro Acquisti and Ralph Gross say that the government forces Americans to place a “perilous reliance” on SSNs to establish their identities while giving them the “impossible duty” of trying to protect their number.

The researchers found visual and statistical patterns in publicly available SSN data, showing that “a strong correlation exists between dates of birth and all 9 SSN digits.” They were able to develop a prediction algorithm that “exploits” the fact that individuals with similar birth dates who registered in the same state “are likely to share similar SSNs,” the study says.

In some cases, they were able to predict the entire nine-digit SSN on the first attempt. The odds of that happening randomly would be nearly one in a billion, Dr. Acquisti says.

The study, “Predicting Social Security Numbers from Published Data,” is being released online today and will be published in the Proceedings of the National Academy of Sciences.

The formula works best for numbers assigned in recent years and in smaller states. For individuals born after 1988, the researchers were able to predict the first five digits of an SSN on the first try 44 percent of the time. Using birth dates in Vermont from 1995, they were able to predict the first five digits in 90 percent of cases. Nationwide, for birth dates between 1989 and 2003, and using two attempts, they were able to determine the first five digits of an SSN in 61 percent of cases.

Revealing only the last four digits of an SSN in documents, a precaution used by some organizations, provides little protection, the authors say, since the first five digits of an SSN are actually the easiest to predict.
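The arithmetic behind that warning can be sketched in a few lines. This is only an illustration of how many combinations remain once some digits are known, not the researchers’ method. The function name is hypothetical, and the sketch assumes, for simplicity, that every nine-digit combination is possible.

```python
# Illustrative arithmetic only: how many nine-digit combinations remain
# once some digits of a number are already known.
# Simplifying assumption: all 10**9 combinations are treated as possible.

TOTAL_DIGITS = 9


def remaining_candidates(known_digits: int) -> int:
    """Return the number of combinations left when `known_digits` digits are known."""
    if not 0 <= known_digits <= TOTAL_DIGITS:
        raise ValueError("known_digits must be between 0 and 9")
    return 10 ** (TOTAL_DIGITS - known_digits)


nothing_known = remaining_candidates(0)     # the "one in a billion" baseline
last_four_shown = remaining_candidates(4)   # a document that prints only the last four
first_five_known = remaining_candidates(5)  # the first five digits are known

print(f"Nothing known:    {nothing_known:,}")
print(f"Last four shown:  {last_four_shown:,}")
print(f"First five known: {first_five_known:,}")
```

Masking all but the last four digits is meant to leave 100,000 combinations hidden. If the first five digits are the predictable part, as the study reports, that hidden portion is exactly what the masking fails to protect.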

Once an SSN can be narrowed to a range of, say, 10,000 possibilities, a network of computers controlled by a fraudster could easily make enough accurate guesses to fool websites that require a valid SSN. In many cases, only a name, date of birth, and SSN are needed to open a credit card account, Acquisti says.

“When one or two attempts are sufficient to identify a large proportion of issued SSNs’ first five digits, an attacker has incentives to invest resources into harvesting the remaining four from public documents or commercial services,” the authors conclude.

At least 10 million US residents, they estimate, have made their birth dates publicly available or easy to infer in online profiles. These can appear in many places online, including Facebook or other social networking sites.

The problem with SSNs, as other researchers have pointed out, is that they are used for two purposes at once: as a public identifier and as a private password. In essence, they serve as the account name and the password rolled into one.

The Social Security Administration should immediately change its system and begin assigning SSNs that are truly random, Acquisti says.

But that will be of no help to the millions of Americans who already possess “predictable” SSNs. What’s worse, unlike other passwords, SSNs can’t be easily changed or blacklisted.

The study also shows how publicly available data online can be “mined” from various sources and aggregated to reveal new information.

“Maybe no one single piece of that information in itself is personally identifiable, but when you start linking the pieces of information with even a little bit of context, you can with a high degree of probability identify someone personally,” says Helen Nissenbaum, a professor of media, culture, and communication at New York University, who did not work on the study.

The burden now, the authors conclude, is on “industry, academia, and policy makers to think about better and economically efficient ways to protect identities in a world of wired consumers.”