Nicole Wong on how big data could change the way we live

After stints at the White House, Google and Twitter, Wong understands the promise and peril of big data.

Electrical conduits inside a data server room in New York City.

Mark Lennihan/AP

December 8, 2014

Nicole Wong is a founding columnist for Passcode, the Christian Science Monitor’s soon-to-launch section about security and privacy in the digital age.

After working as a senior legal executive at Twitter and Google, Wong came to the White House in June 2013. It was a critical time. Edward Snowden’s leaks about government surveillance had thrust online privacy into the forefront, and in the wake of public backlash, Wong helped to author a highly anticipated report on how the massive collection of personal data online could affect the way people live and work. Here’s what she learned.

Edited excerpts follow.


Passcode: What’s the most important thing you found out putting together the White House report?

NW: When we were talking to people in the early set of meetings and workshops, we’d ask them, "Describe your privacy concern with the collection of big data." When you started to scrape the surface of their responses … their answer really was, "I don’t want people to use the data to make unfair decisions about me, I don’t want them to use data to limit opportunities presented to me or change my choices."

Passcode: So you found people’s concerns with big data aren’t just limited to personal privacy?

NW: Personalization has made the Web really successful. Usually websites, for instance, try to show you ads that are more personalized to you ... presented in the language you speak, personalized to your location, or your interests. We are now going to have data about people where we can personalize all kinds of offers or opportunities: What kinds of loans you should be offered, what kind of educational opportunities should be offered.

In the future, hypothetically speaking, imagine you are a high school student, and you’re looking at college opportunities. It matters a lot what types of opportunities are even presented to you. It makes a lot of sense to personalize it in some ways, but there’s also the possibility of it closing off opportunities. What if you don’t see certain colleges, because whoever made that algorithm didn’t think you were interested in them because of your past academic performance, or what’s on your Facebook page? If everyone on your social network, for instance, went to a certain tier of school, maybe it will presume you only want to see schools in that tier.


That’s what people are worried about: that the algorithm will presume what people want, in a way that closes off opportunities for them. That there are opportunities they will simply never see.
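As a purely illustrative sketch of that worry (the school names, tiers, and filtering rule below are all invented, not drawn from any real recommender), a few lines of Python show how a system that presumes a student’s "tier" from their social network can silently drop schools from view:

    # Hypothetical illustration: a recommender that infers a student's "tier"
    # from their social network and only surfaces colleges in that tier.
    # Every name and number here is invented.

    COLLEGES = {
        "State College A": 2,
        "Regional College B": 3,
        "Selective University C": 1,
    }

    def inferred_tier(friends_tiers):
        # Presume the student's tier from the schools their friends attended.
        return round(sum(friends_tiers) / len(friends_tiers))

    def recommend(friends_tiers):
        tier = inferred_tier(friends_tiers)
        # The filter: schools outside the presumed tier are simply never shown.
        return [name for name, t in COLLEGES.items() if t == tier]

    # A student whose network skews toward tier-2 schools never sees
    # "Selective University C" at all, regardless of their own interest.
    print(recommend([2, 2, 3, 2]))   # -> ['State College A']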


Those questions about fairness and opportunity are not privacy questions. To some degree, these are discrimination questions, but also equity and access questions. I’d like to see the White House, the federal government, and the technologists using this data really bear down on those issues, also from an ethical standpoint: what’s the right way for me to use the data, [and] what is the right public policy outcome from using the data?

While working for Google, Wong testified at a Senate Judiciary Committee hearing on Internet freedom in 2008.
Susan Walsh/AP

Passcode: At what point does it become illegal for a company to give different pricing to one customer than another based on data?

NW: There are not necessarily simple answers to that question. The entire purpose of some big data algorithms is to separate two groups that are dissimilar. Sometimes, the function is to look at two groups and treat them differently.

The question is: What’s the right context for doing that? There are reasons to do differential pricing that are not about illegal discrimination, [for instance] if it’s harder to get certain goods to a part of the country. But when you set different prices for people based on race, sex, national origin — that is not permissible.
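A minimal, hypothetical pricing sketch can make that distinction concrete; the regions, surcharges, and attribute names below are invented for illustration only:

    # Hypothetical sketch: differential pricing keyed to the cost of serving a
    # region, not to who the buyer is. Regions and surcharges are invented.

    BASE_PRICE = 40.00
    SHIPPING_SURCHARGE = {"urban": 0.00, "rural": 6.50, "remote": 14.00}

    # Attributes that must never drive the price, directly or via proxies.
    PROHIBITED_INPUTS = {"race", "sex", "national_origin"}

    def quote(region, inputs_used):
        # Pricing tied to delivery cost is one thing; pricing keyed to who the
        # customer is would not be permissible.
        assert PROHIBITED_INPUTS.isdisjoint(inputs_used), \
            "protected attributes must not feed the pricing rule"
        return BASE_PRICE + SHIPPING_SURCHARGE[region]

    print(quote("remote", {"region", "order_size"}))   # 54.0 -- costlier to reach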

Passcode: That sounds pretty clear. So why is it so complicated?

NW: What if we use a proxy, a set of data factors that amount to someone being African American or Chinese American or a Latino?

For instance, you could use things that are not specifically race, but are proxies for race, like [people’s] names. The more sophisticated ones might be a bunch of interests online that are closely associated with either a race or an ethnicity. You could imagine someone’s social network, for instance, all in Spanish, highly attuned to Univision and Latino student groups or certain musical groups. You can imagine a set of [factors], each of which individually might not necessarily tell you someone’s race, but which, taken as a whole, becomes a proxy for someone’s race.

Is that OK? Even though none of the factors we put in play are in fact about race? That, to me, is the harder part of the question. Whoever’s developing the algorithm, what are their guidelines on putting together the set of factors? And how will we identify a set of factors as improper discrimination against groups of people whom we don’t want to be set apart?
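Here is a toy sketch of the proxy problem, with made-up features and weights: none of the inputs is race, yet the combined score can partition people along ethnic lines almost as cleanly as if race were an input:

    # Hypothetical illustration of a proxy: no individual feature is race,
    # but combined they act like one. The weights and threshold are invented.

    def proxy_score(profile):
        points = 0
        if profile.get("social_network_language") == "es":
            points += 4
        if "Univision" in profile.get("media_interests", []):
            points += 3
        if "Latino student group" in profile.get("memberships", []):
            points += 3
        return points

    def offer(profile):
        # Taken together, the factors behave like a proxy for ethnicity.
        return "standard offer" if proxy_score(profile) < 5 else "different offer"

    profile = {
        "social_network_language": "es",
        "media_interests": ["Univision"],
        "memberships": ["Latino student group"],
    }
    print(proxy_score(profile), offer(profile))   # 10 different offer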

Passcode: Where do you draw the line? If it's OK for Amazon to offer different prices for items, is it OK for an insurance company?

NW: FICO scores assign numerical values to demographics to set insurance rates for individuals based on age and other types of factors. What if we didn’t do that? What if we don’t like those categories, because they are unfair in some way to everyone who’s in that age group or neighborhood? The typical redlining problem.

Instead, what if we attached something to your car, looked at your driving patterns, and based insurance rates on actual driving patterns? What some civil rights folks said was, "That’s really interesting, but [let’s say] because I work two shifts, I drive a lot at night, and because of where I live, there are more break-ins on my car. And this category of people is disproportionately people of color, versus not." Assigning a rate based on these individual factors will still take entire categories of people and put them at higher insurance rates. Is that the outcome we want?
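A back-of-the-envelope sketch, with invented rates and multipliers, shows how an "individual" usage-based premium can still fall hardest on the night-shift driver in a high-break-in neighborhood:

    # Hypothetical usage-based insurance rate: every number here is invented.

    BASE_MONTHLY_RATE = 100.00

    def monthly_premium(night_driving_share, break_ins_per_1000_cars):
        # "Individual" factors -- but both track where you live and what shifts
        # you work, not how carefully you drive.
        night_multiplier = 1.0 + 0.5 * night_driving_share
        theft_multiplier = 1.0 + 0.02 * break_ins_per_1000_cars
        return BASE_MONTHLY_RATE * night_multiplier * theft_multiplier

    day_shift_driver = monthly_premium(0.05, 2)     # about $106.60
    night_shift_driver = monthly_premium(0.60, 25)  # about $195.00
    print(day_shift_driver, night_shift_driver)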

Passcode: Could local law enforcement using predictive analytics to forecast crimes lead to profiling or discrimination?

NW: It does no good to have cops on one side of town if all the crime is in other areas. [Their idea was] "let’s try to use data we know as a police department in order to effectively police and allocate resources accordingly." That strikes me as appropriate and right. The problem is, some of the ways they used the data were … invasive of civil liberties, looking at social networks of people in those areas where a lot of crimes were happening, and calling out folks who are connected to people with high crime recidivism rates. That struck me as needing more discussion before it becomes a policy: How much should police departments be able to harness social networks … how much should they be using that from a predictive standpoint, versus a remedial standpoint?

Passcode: How did your work in government compare to your experience in industry?

NW: I learned that government does not move anything close to the speed a private company does. That reset my expectations on how much could get done or what could get done. In the private sector the founder or engineer will have an idea and say, "Let’s launch this today."

If you think about how Internet companies work, they expect that version one will be put out there, have bugs and flaws, and be quickly followed by version 1.1, 1.2. They are built for that type of agility. The government is not. Healthcare.gov would be an outstanding example. That was just a horrible cycle. You have to figure out how to make the delivery of those digital services more in line with how a product company would deliver them. This concept of fast fail is a very Silicon Valley thing.

Passcode: Why did you take the job?

NW: There is a constellation of reasons. One is this moment in history, with this president who is so engaged in science and technology issues, and this moment in technology — the speed of its delivery, the scale of its development, the globalization and debate about how it should be governed. That was irresistible. On a more personal level, there is the privilege to be able to serve. I am a fourth-generation Chinese-American. My grandparents couldn’t buy a home in the States until the 1950s. They felt very strongly about public service…. I wanted my children to see me model that.

Passcode: And why did you leave?

NW: My family was ready to come back to California. There was no mystery about it: We as a family decided we were going to commit a year to public service. I will miss the people. I made great friends when I was there, friends who I respect for how much they put into serving their government and serving all of us. I will not miss the weather.