(Update: correction below)
Forbes reported last week that the crowdsourced mapping location service Waze is beginning to share bulk location data with government bodies—with Rio de Janeiro since 2013, and soon with the state of Florida. The cycling app Strava is also in talks to begin selling its data to urban planners, and the public-transportation app Moovit is already selling data to multiple cities.
We are not to worry about our privacy, a Waze spokesperson tells us, because the company replaces the names that accompany driving data with an alias.
The problem is, your location history IS your identity.
I use Waze sometimes, and all of my trips either begin or end at my home, so attaching an alias to my data does me little good in terms of privacy. In fact, as I’ve discussed before, it turns out that even relatively rough location information about a person will often identify them uniquely. For example, according to this study, just knowing the zip code (actually census tract, which is basically equivalent) of where you work, and where you live, will uniquely identify 5% of the population, and for half of Americans will place them in a group of 21 people or fewer. If you know the “census blocks” where somebody works and lives (an area roughly the size of a block in a city, but much larger in rural areas), the accuracy is much higher, with at least half the population being uniquely identified.
But of course Waze’s data could be used to get more precise data than that—in many cases, to determine a vehicle’s home address, which pretty much reveals who you are if you’re in a single-family home, and narrows it down pretty well even if you’re in a large apartment building. (Academic papers have been written on inferring home address from location data sets.)
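To make the home-and-work figures above concrete, here is a minimal sketch of how one might measure how identifying a pair of areas is. It is not the cited study's method or data. The function names and the toy records are hypothetical, and each person is assumed to be reduced to a (home area, work area) pair such as two census tracts.

```python
from collections import Counter
from statistics import median


def anonymity_set_sizes(pairs):
    """For each person, count how many people share their (home, work) pair.

    `pairs` is a list with one (home_area, work_area) tuple per person.
    A size of 1 means that person is uniquely identified by the pair.
    """
    counts = Counter(pairs)
    return [counts[p] for p in pairs]


def summarize(pairs):
    """Return (share of people who are unique, median anonymity-set size)."""
    sizes = anonymity_set_sizes(pairs)
    unique_share = sum(1 for s in sizes if s == 1) / len(sizes)
    return unique_share, median(sizes)


# Toy, made-up records: six people across two home areas and two work areas.
people = [("A", "X"), ("A", "X"), ("A", "Y"),
          ("B", "X"), ("B", "X"), ("B", "X")]
unique_share, median_size = summarize(people)
```

In this toy data one person in six has a pair nobody else shares. The median is the statistic behind a claim like "half of the population falls in a group of N or fewer."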
To truly anonymize the data set, these companies would need to do much more. Possibilities might include snipping off the first and last mile of each journey (or whatever distance data scientists find is necessary depending on population density), introducing random changes or “fuzzing” the data, or lowering the resolution of the GPS coordinates by dropping significant digits. These techniques are not without their problems (snipping the trips would make short trips disappear from the database, for example, and fuzzed data is susceptible to statistical cleaning). But these problems are being worked on.
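The three ideas above can be sketched in a few lines. This is an illustration under stated assumptions, not a vetted anonymization scheme and not anything these companies are known to use. A trip is assumed to be a list of (latitude, longitude) points in degrees, and the function names, the one-mile default, the 50-meter noise level, and the two-decimal rounding are all hypothetical choices.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0
METERS_PER_MILE = 1609.344
METERS_PER_DEGREE_LAT = 111_320.0  # rough approximation


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def snip_ends(trip, trim_m=METERS_PER_MILE):
    """Keep only points at least trim_m of path distance from both ends."""
    if len(trip) < 2:
        return []
    cumulative = [0.0]
    for p, q in zip(trip, trip[1:]):
        cumulative.append(cumulative[-1] + haversine_m(p, q))
    total = cumulative[-1]
    return [pt for pt, d in zip(trip, cumulative)
            if trim_m <= d <= total - trim_m]


def fuzz(trip, sigma_m=50.0, rng=random):
    """Add independent Gaussian noise (std. dev. sigma_m meters) to each point."""
    noisy = []
    for lat, lon in trip:
        dlat = rng.gauss(0, sigma_m) / METERS_PER_DEGREE_LAT
        dlon = rng.gauss(0, sigma_m) / (
            METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
        noisy.append((lat + dlat, lon + dlon))
    return noisy


def coarsen(trip, decimals=2):
    """Lower resolution by rounding coordinates (2 decimals is roughly 1 km)."""
    return [(round(lat, decimals), round(lon, decimals)) for lat, lon in trip]
```

The sketch also shows the weaknesses noted above. A trip shorter than two miles comes back from snip_ends empty, so short trips vanish from the data set. Because fuzz adds independent noise to each point, averaging many trips that start at the same place would tend to recover the true location.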
These kinds of data sets may prove truly useful for urban planners, who after all are working to make life better for everyone. Well-planned cities are certainly much more socially valuable than advertisers trying to shave a few cents’ more of efficiency out of ad delivery (though, at least one of the companies, Moovit, is also exploring sale of data to advertisers). We might hope that more robust anonymization techniques could preserve privacy without sacrificing the data’s usefulness to planners—or at least that some usefulness could be saved. But in the end, we do not want to become a society where individuals are constantly tracked, even if we pay some price in efficiency.
Meanwhile, if you use location services and don’t want records of your comings and goings landing in the laps of government officials and who-knows-who-else down the line, one simple solution, where practical and safe, is to keep the app turned off until you’re a mile from your home, and to turn it off a mile before your destination.
Update & correction (7/16)
A representative of Waze (owned by Google), Julie Anne Mossler, sent a response to this post:
Waze only transmits road closures and incident reports to partners (accidents, traffic jams as reported by users). This information is 100% non-identifiable.
Waze does not allow free, unfiltered access to its data; rather, the company creates spreadsheets or opens its API to select partners which only passes along information critical to the partner’s specific issue.
For example, when we share an accident alert, the government does not know any identifying information about the cars involved, just that a Waze user nearby reported there was an incident. Even the Waze reporter is identified only by user name. Waze never collects license plate or similar identification at any time during a consumer’s use of the app. Even if we wanted to share a driver's route, we would not have the ability simply because of how this information is stored. And we have no desire to do so.
We are also in full compliance with Google’s policies regarding data sharing, considered to be some of the most stringent in the world.
This is good news! I don’t see any privacy problem with the sharing of data that Mossler describes. The Forbes article reads:
What may be especially tantalizing for planners is the super-accurate read Waze gets on exactly where drivers are going, by pinging their phones’ GPS once every second. The app can tell how fast a driver is moving and even get a complete record of their driving history, according to Waze spokesperson Julie Mossler. (UPDATE: Since this story was first published Waze has asked to clarify that it separates users’ names and their 30-day driving info. The driving history is categorized under an alias.)
This passively-tracked GPS data “is not something we share,” she adds. Waze, which Google bought last year for $1.3 billion, can turn the data spigots on and off through its application programing interface (API)
I don’t remember reading the line about “not something we share.” Apparently I just missed it and therefore misunderstood what was happening, for which I apologize.
We’re glad to clarify that Google is not offering complete user location data to governments. We hope it will stay that way. These are, as Forbes said, “tantalizing” data sets, which is just one reason I still don’t like Google compiling and retaining my location data itself, even if it doesn’t share it.
Finally, remember that this is not just about Waze. There are a lot of other location apps and services out there, and there will be even more in the future. (I have sent a note to Strava and Moovit asking if they would also like to respond to what I’ve written and will update further as appropriate.)
Update 2 (7/17/14):
Representatives from Strava and Moovit contacted me to confirm that the way that they aggregate user data does not reveal individual location trails. Moovit wrote, "Moovit only shares data about average transit speeds and incident reports about specific lines (overcrowding, which buses have handicapped access)." Strava told me they provide cities with minute-by-minute counts of how many cyclists are on each block, and their directions of travel, but again don't provide whole data trails.