Eight Problems With Police “Threat Scores”
The Washington Post Monday had a piece about the use of “threat scores” by law enforcement in Fresno, California. This story follows release of information about this predictive-policing program obtained through an open-records request by my colleagues at the ACLU of Northern California.
The scores are generated by software called “Beware,” made by a company called Intrado. According to a promotional pamphlet obtained by the NorCal ACLU, the software’s purpose is “searching, sorting and scoring billions of commercial records” about individuals. It scours the internet for social media posts and web site hits and combines them with other information such as public records and “key data elements from commercial providers.” Intrado claims that its product is “based on significant amounts of historical work in mathematical science, decisioning science and link analysis,” and “uses a comprehensive set of patent-pending algorithms that search, sort and score vast amounts of commercial records from the largest and most reputable data mining companies in the industry.”
Intrado boasts that its software can target an address, a person, a caller, or a vehicle. If there’s a disturbance in your neighborhood and you have to call 911, the company would have the police use a product called “Beware Caller” to “create an information brief” about you. Police can also target an area: a product called “Beware Nearby” “searches, sorts and scores potential threats within a specified proximity of a specific location in order to create an information profile about surrounding addresses.” So the police may be generating a score on you under this system not only if you call the police but if one of your neighbors calls the police.
The prospect of a democratic government making unregulated, data-driven judgments about its own citizens outside the protections of the justice system raises some fundamental and profound questions about the relationship between the individual and the state. Citizens in a democratic society need to be able to monitor their government, and make judgments about how it is performing. Is it healthy for the government to begin to do the same to its citizens? At what point does that begin to resemble China’s incipient “citizen scoring” system, which threatens to draw on social media postings and include “political compliance” in its credit-score-like measurements?
The governmental scoring of citizens is an imperative we have seen before, not just in the context of policing (for example, in Chicago’s “Heat List”) but in other security contexts as well. The TSA, for example, pushed hard under President Bush for a program known as CAPPS II, under which the government would have tapped into commercial data sources to perform background checks on the 100 million Americans who fly each year, and build a profile of those individuals in order to determine their “risk” to airline safety. As with this Beware software, it was originally envisioned as giving a red, yellow, or green light to each subject. CAPPS II was highly controversial from the start, and after a battle that lasted approximately five years the government abandoned the concept (though it does threaten to come creeping back).
There are numerous problems with this or any system for generating “threat scores” on citizens:
- Scoring Americans in secret. Like the TSA before it, Intrado says that its methods for generating risk assessments will be secret. This is a cutting-edge technology being used for a novel and highly sensitive purpose. Given the vast uncertainties that surround the making of automated predictive judgments about individuals, especially in a law enforcement context, public transparency is vital so that we as a society can begin to evaluate such approaches. We are a democracy, after all, and the highly fraught value judgments about which uses of “big data,” if any, are appropriate in policing must be made publicly.
- Inaccurate data. We do know that the source data used for such judgments is likely to include many errors and inaccuracies. Anyone who has looked at their credit report knows how frequently those reports get basic facts wrong, confuse different individuals with similar names, etc. The contracts among commercial data brokers and their clients “include few provisions regarding the accuracy of their products,” the FTC has found. For private data companies, accuracy levels beyond a certain point are simply not worth the cost. But the FBI too felt compelled to exempt its primary criminal database from a legal requirement that the agency maintain its records with sufficient accuracy to “assure fairness to the individual” — and damage to people’s lives has been the predictable result. With the Beware software’s scoring formula kept secret, there will be little check against such errors.
- Questionable effectiveness. Without public scrutiny, the public will not know what data sources are used to generate the scores, how reliable that source data is, how the different variables are weighted and interpreted, and how valid the assumptions behind the inclusion and relative weight of each variable are. Those questions are methodologically and sociologically complex, and robust, valid, broadly acceptable answers are unlikely to emerge from the corporate suite of a small company that sells software to police, no matter how much “mathematical science” it brings to the task. Even if the project of rating citizens were acceptable, it could never be done properly without the broad public and expert scrutiny that transparency to “a million eyeballs” brings. Another effectiveness problem comes from the limited ability of keyword-based evaluation systems to understand human communications. Scary-sounding language used in private almost always consists of sarcasm, irony, hyperbole, jokey boasting, quotations of others, references to works of fiction, or other innocuous things. Despite many advances, computers are still far away from understanding human social life with enough sophistication to tease out such contexts.
- Unfairness and bias. Without transparency, a major question about secret risk scores is whether and to what extent they will have intentional or unintentional racial, ethnic, religious, or other biases, or whether they include elements that are just downright unfair (such as guilt-by-association credit ratings that penalize people for shopping at stores where other customers have bad credit). There is nothing magical about taking a lot of data and creating a score; the algorithm by which that is done will do no more than reflect its creators’ understanding of the world and how it works (at least if it is not based on machine learning, which I doubt this system is, and which in any case has problems of its own). Ultimately the danger is that existing societal prejudices and biases will be institutionalized within algorithmic processes, which just hide, harden, and amplify the effects of those biases.
- Potentially dangerous results. The consequences of inaccurate and biased data may be dangerous and even deadly if such data leads police officers, many of whom are already far too prone to use force, to come into an encounter already frightened and predisposed to believe that a subject is dangerous. And officers who do use unnecessary force will inevitably cite the scores as evidence that their actions were subjectively reasonable.
- An unjustified government intrusion. These risk assessments are being built out of two sources of data that we should not want our government to access: citizens’ social media conversations, and the dossiers that the data broker industry is compiling on virtually all Americans. While public social media postings, unlike private online conversations, are not protected by the Fourth Amendment, as a policy matter we do not want our law enforcement trawling through our online conversations. This would largely waste the time of the public officials we are paying to keep us safe, and create chilling effects on our raucous online discourse. We don’t want secret police in America, or their computerized equivalent, circulating among law-abiding citizens as they exercise their constitutional rights, online or off, just to monitor what they are doing. We don’t want Americans to have to pause before they speak to ask, “Will this be misinterpreted by a computer?” Nor should the authorities be buying information, directly or indirectly, from the privacy-invading data broker industry, which builds dossiers on virtually all Americans without their consent. While it does this for commercial reasons, the result is nonetheless comparable to what we’ve seen in totalitarian states. The questionable benefits of these invasions of privacy are not worth the chilling effects and danger of abuse they bring.
- First Amendment questions. Other First Amendment problems stem from the fact that our law enforcement unfortunately has a long history of antagonism toward even peaceful political activists and protesters seeking to make the world a better place—a history that has continued right up to the present. This alone provides ample reason to worry that a ratings system will hurt and chill political activists. The problem is only confirmed by the inclusion (as my colleague Matt Cagle describes) of hashtags such as #Blacklivesmatter, #Mikebrown, #Weorganize, and #wewantjustice on a police social media monitoring list of key words touted as “extremely effective in pro-active policing.” A timid citizen considering tweeting about a political protest could be seriously chilled from expressing himself by the prospect that doing so might make him a “yellow light” in the eyes of the authorities.
- Mission creep. If this system is sold, snuck, or forced into American policing, it will, once entrenched, inevitably expand. First, the data that it draws upon will grow as companies and agencies seek ever more data in a futile quest to improve their inevitably crude assessments of individuals’ risk. Second, the purposes for which it is used may expand as police departments go beyond using the scores for individual police calls to other uses (force deployment decisions, perhaps, and who-knows-what-else). Risk assessments may be created not just on an individual basis for police calls, but on a wholesale basis for entire populations. And of course the scores may be shared with and adopted by other agencies for use in a wide variety of governmental purposes. They may also spread to the private sector, starting perhaps with corporate security forces, which often work very closely with police and might use them for anti-union activities, the vetting of customers, or any other corporate goals. In general the danger is that these assessments, once brought into being, could come to reverberate through individuals’ lives in many ways.
Overall there is a lot more easily accessible data floating around about everybody in today’s society. How should the police make use of all that data? How much should be fed to officers in different situations, and in what form? The data revolution raises complex questions for policing that we as a society are going to have to work through—but any law enforcement use of big data needs to be approached carefully and thoughtfully, and hashed out publicly and democratically. That means total transparency. And the risk scoring of individuals should have no part in it.