New York City has been a Big Data pioneer for decades. In the early 1990s, the city launched the CompStat data-driven policing system, so that, in the words of former NYPD department chief Lou Anemone, officers could stop “just running around answering 911 calls” and start analyzing patterns to prevent crime. Thanks in part to CompStat, major crimes in the city have since fallen by 80 percent. During the Michael Bloomberg mayoral years, the city used data to pinpoint dangerous intersections and driving habits, cutting traffic deaths by nearly a third. Today, thanks to advances in data-storage capacity as well as the ubiquity of smartphones and broadband access, New York has an unprecedented number of facts to analyze and act upon, CompStat-style, across all areas of government—from building inspection to noise reduction. But while it can shine a brighter light on problems and give citizens and government new tools with which to understand them, Big Data can’t solve the problems themselves. For that, we still need old-fashioned political will.
Cities are in the middle of what Daniel Doctoroff, a Bloomberg-era deputy mayor, calls their fourth modern revolution. In the eighteenth century, cities got the first: the steam engine, which made possible the industrial age. In the nineteenth century, electricity gave cities the subways and elevators they needed to fit more people into small spaces. In the twentieth century, the automobile made it easier for residents and workers to leave dense urban areas. And now, in the twenty-first century, cities are getting the “networked revolution” of large-scale data collection, both human and automated, as well as continuous connectivity to transmit and store those data.
And with better transparency laws, much of this Big Data becomes open data: information that everyone can see and use. Consider how one area of data reporting and collection has exploded in just a decade: New Yorkers’ complaints, questions, and observations about their city. Last year, more than 30 million people called or went online to 311, the city’s information and complaint system. That’s nearly four times the city’s population, and four times the number of people who called in 2003, the first year that New York offered the service. Every call or click makes a point of data for the city or for outside observers—who can easily access much of this information—to follow.
During the Bloomberg years, the city began using these data to figure out, CompStat-style, where and when to send officials out—not to prevent crimes but to address other issues. Mike Flowers, chief analytics officer at the data-technology firm Enigma, served during Bloomberg’s third term as New York’s first-ever data-analytics chief. Bloomberg built his fortune in the private sector “de-fragging intelligence,” Flowers says, and the mayor applied the same techniques to government. One basic problem of even highly functional cities, Flowers notes, is “too big a body, too little blanket,” and the traditional answer to that problem, he says, is “let me hire more people.”
Instead, Flowers used data to help him perform a critical task: protecting New Yorkers from firetraps. “We were getting 20,000 complaints a year,” he says, many via 311, “about illegal [apartment] conversions” (that is, landlords or tenants subdividing their properties, creating dangerous conditions for poor tenants). But building inspectors, “treat[ing] the Empire State Building the same as a small apartment building,” regarded all buildings equally, deploying resources unwisely. Working with the fire department and buildings department to analyze historical information, Flowers’s team discovered which buildings were more likely to have critical violations resulting in destruction and death, and made it a priority to inspect those buildings. “What we basically did is reconstruct the inspection experience” to be risk-based, he says. Now, every building gets a risk grade, which inspectors use as a guide. Fire deaths and injuries have never been so low.
Another success during the Bloomberg years: “end-to-end 911 analysis,” that is, integrating different city agencies’ systems to learn how long it takes for ambulances to respond to 911 calls and whether response times could be improved. Previously, city departments had tracked such data only at the points at which their own resources were involved. The fire department, for example, measured how long it took to send out an ambulance after it received a transferred call from the police department, not how long it took for the police department to transfer that call. Yet “both pieces of data are valuable,” says Flowers. The fire department shouldn’t be accountable for a dispatching delay, and vice versa, but seeing all the data, the city can now figure out whether the problem involves dispatching, ambulance availability, or both. Officials can rely on this information in scheduling shifts more efficiently.
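For readers who want to see the idea in miniature, here is a rough Python sketch of what measuring a 911 call end to end might look like. It is not the city’s system: the timestamps are invented, and the field names (`received_by_police`, `transferred_to_fire`, and so on) are assumptions made purely for illustration. The point is only that each agency’s segment can be measured separately and then added up.

```python
from datetime import datetime

# Invented timestamps for a single 911 medical call. The field names are
# illustrative assumptions, not the schema of any real city system.
call = {
    "received_by_police": datetime(2013, 5, 1, 14, 0, 0),
    "transferred_to_fire": datetime(2013, 5, 1, 14, 1, 30),
    "ambulance_dispatched": datetime(2013, 5, 1, 14, 3, 0),
    "ambulance_arrived": datetime(2013, 5, 1, 14, 10, 0),
}


def segment_seconds(call):
    """Split one call's total response time into per-stage segments, in seconds."""
    transfer = (call["transferred_to_fire"] - call["received_by_police"]).total_seconds()
    dispatch = (call["ambulance_dispatched"] - call["transferred_to_fire"]).total_seconds()
    travel = (call["ambulance_arrived"] - call["ambulance_dispatched"]).total_seconds()
    return {
        "transfer": transfer,    # time the police side held the call
        "dispatch": dispatch,    # time the fire side took to send an ambulance
        "travel": travel,        # time the ambulance spent on the road
        "end_to_end": transfer + dispatch + travel,
    }


segments = segment_seconds(call)
for stage, seconds in segments.items():
    print(f"{stage}: {seconds:.0f} seconds")
```

Run over many calls rather than one, the same breakdown would show whether delays tend to pile up in the transfer, the dispatch, or the trip itself.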
Flowers’s experience also shows that data can’t act on their own: they require officials to exert political will. He couldn’t have done his job, he points out, were it not for the third-term Bloomberg administration’s willingness to annoy the fire-department unions, which “sued me over and over” regarding the more efficient inspections, even though these practices didn’t violate collective bargaining agreements. “The algorithmic stuff is really the easy piece,” he says—getting different city agencies to work together to boost efficiency was the heavier lift.
Cascading problems help explain why data alone aren’t the solution to urban woes. When the city used its more efficient analysis procedures to shut down a firetrap, for example, it also made several families newly homeless, creating a concern for another government agency. But, Flowers notes, he always kept in mind that his job was to use data to pursue “the mission of [a government] agency,” not to protect the bureaucracy.
New York’s worst natural disaster in recent memory, Superstorm Sandy, struck during the Bloomberg administration, spurring the city not only to use data more aggressively but also to improve the quality of those data. Noel Hidalgo heads BetaNYC, a civic “hacking” organization that relies on open data to try to help government perform better. Hidalgo notes that after Sandy in 2012, the Bloomberg administration grasped how bad some of the city’s information was: single locations had multiple addresses, for instance, and the buildings department had no idea what the fire department knew about a particular building. “The tragedy of Sandy created an environment where the city recognized the value of data sharing,” says Hidalgo. The Bloomberg administration then made sure that every location in the city has a single, consistent address, now available to the public. “Interagency cooperation on address location” is “fundamental to a city as dense as New York,” Hidalgo says. As data quality and sharing improve, firefighters responding at a particular address could someday receive a warning that police have gone to the home multiple times for domestic-violence incidents, or see that the building’s landlord had violations at other buildings for dividing them into illegal and unsafe apartments, and that the landlord might have been doing the same thing with the burning building.
Half a decade ago, the state-run Metropolitan Transportation Authority started making data feeds available for the public’s use. Private-sector entrepreneurs have drawn on them to create dozens of apps to help New Yorkers do everything from finding out when the next bus is coming to learning about the piece of artwork in a particular train station. The MTA is exploring other applications; it held a “hackathon” in early March so that volunteers vying for $2,000 in prizes could crunch data to help the authority speed up Staten Island bus times. Using open data, citizens have prompted reforms in other transportation areas as well. Ben Wellington, a Pratt professor who runs the I Quant NY blog, learned through open data that just half of the city’s cab fleet included tolls when calculating the suggested tip for a cab ride. Now, the fleet has a consistent policy (in favor of the higher tips).
Open data can be deployed not only to improve government but also to help business owners cope with the government agencies that regulate them. Aileen Gemma Smith, a local entrepreneur, saw that after Sandy, small-business owners wanted to know such things as when a certain street would reopen. She noted that this lack of information, though more acute after a disaster, was chronic even in normal times. Business owners showing up for work would be surprised that the city had closed their street for repairs—meaning lost customers and thus lost revenue. “I went to shopkeepers and said, ‘I’m building this for you, talk to me about what’s important,’ ” she says. The app she launched, called Mind My Business, crawls through hundreds of the city’s data sets to give subscribers practical information: “the MTA is closing the subway stop near your store this weekend,” or “the city is repairing the sidewalk that goes by your shop this week,” or “the previous owner of the restaurant space you own got fined four times for the following violations.” “Data aren’t just for privileged folks doing research,” Smith says. “Open data is how I help the local bodega guy, how I help the diner that’s been there for 25 years.” Mind My Business has 2,000 local subscribers.
Amateurs can mine Big Data to improve the quality of life in the city, too, even if they know nothing about software coding. Paul Vogel, a Prospect Heights resident, got a camera for his bike a few years back because he “had a couple of bad run-ins” with reckless car and truck drivers, “and I’m really bad at remembering license plates.” When he gets home after a ride, he sends pictures of taxis and other for-hire cars whose drivers have violated various laws to 311. “I was surprised that the 311 system for [taxi complaints] is so efficient . . . that I could get someone fined,” he says. Over a year or so, his self-described “hobby” has earned the city about $30,000 in revenue. More important, he may have saved some lives, by deterring drivers he got fined from parking in bike lanes or running red lights again. Vogel tweets his successes. “Putting it out on social media has raised awareness a little bit,” he says. Several other people have contacted him to let him know that they’ve done the same thing.
In 1974, the New York Public Interest Research Group, or NYPIRG, a good-government organization, helped prevail upon the state legislature and Governor Malcolm Wilson to pass the nation’s third freedom-of-information law. The FOIL law, as reporters know it, gave the press and public the right to obtain any government documents unless, among other restrictions, they violated private citizens’ confidentiality. A decade later, good-government groups persuaded the city council to establish the commission on public information and communications, which, in turn, published the city’s first public-data directory in 1993. New Yorkers could now access a list showing every publicly available data set that the city maintained, even if they couldn’t actually see the data without special software and expertise.
Public-information laws have never been perfect. Government agencies sometimes charge onerous fees for documents, and they often interpret their mandate to protect privacy too broadly, forcing petitioners to spend money and time suing for what they should get for free. But the laws gave a generation of civic researchers a valuable new tool. Steven Romalewski, director of CUNY’s mapping service and a long-time veteran of data collection who worked for NYPIRG for 22 years, says that he spent much of his time there “accessing data and putting it to use.”
In the 1990s, Romalewski used information he gleaned via the law to map toxic-waste sites on Long Island, showing the public: “Here’s how close they are to parks, here’s how close they are to schools.” And in the city, during the Rudolph Giuliani years, Romalewski fought to get data on children whom the city had tested for lead-poisoning exposure. The city denied Romalewski’s FOIL requests, but after Michael Bloomberg became mayor in 2002, the city settled the resulting lawsuit, and “we saw the data, tested it by zip code, allocated it to council districts on a map,” and “showed it to council members.” Traffic-safety advocates have similarly been fighting for traffic-crash data and presenting them to the public for years. And some city agencies—the department of city planning, for one—have long had websites featuring sophisticated, accessible data sets on zoning and population. “Information transmission has dramatically changed” in recent years, though, says Romalewski, so that it’s become “almost trivial” to download a data set that once could have taken months to obtain.
During the Bloomberg years, civic-minded citizens got a legal update for the connectivity age: the 2012 open-data law, which requires the city to make certain data sets available on a free online portal. Today, city agencies offer 1,400 data sets to download and examine, with no tools needed besides spreadsheet software—now free, thanks to Google—and some patience. If you’re up late, upset about a barking dog, you can find out within an hour or so how many other people have had the same complaint over the last six years, and where they live—DIY data collection and analysis that were unfathomable even 15 years ago.
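A spreadsheet is enough for that kind of late-night tally, but the same count takes only a few lines of code. The Python sketch below is a minimal illustration under stated assumptions: the four rows are made up, and the column names (`created_date`, `complaint_type`, `descriptor`, `zip`) are stand-ins that may not match the headers in the city’s actual download.

```python
import csv
import io
from collections import Counter

# A tiny, invented stand-in for a 311 export. Both the rows and the column
# names are assumptions for illustration; a real download may differ.
SAMPLE_CSV = """created_date,complaint_type,descriptor,zip
2015-03-01,Noise - Residential,Barking Dog,11238
2015-03-02,Noise - Residential,Loud Music,11238
2015-03-05,Noise - Residential,Barking Dog,10027
2016-01-10,Noise - Residential,Barking Dog,11238
"""


def count_by_zip(csv_text, descriptor):
    """Count complaints matching one descriptor, grouped by zip code."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["zip"] for row in rows if row["descriptor"] == descriptor)


barking = count_by_zip(SAMPLE_CSV, "Barking Dog")
for zip_code, total in barking.most_common():
    print(zip_code, total)
```

To try this on a real export, one would read the downloaded file in place of `SAMPLE_CSV` and adjust the column names to whatever that file uses.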
Those responsible for providing the data, however, often resist their dissemination. “The open-data law was pushed by the [city] council, not the mayor,” says John Kaehny, of the New York City Transparency Working Group. Though Bloomberg supported more open data, the problem persists in government. The issue isn’t cost (the city was already collecting all the data it now releases automatically to the public through its open-data portal) but accountability. “Data can equal embarrassment,” says Kaehny. More data can make more people aware that the city has a problem, whether with slow ambulance-response times or a rising street population.
Another drawback, from government’s perspective, is loss of control. Kaehny notes that city agencies have historically preferred to release their data selectively to “trusted users.” Open-data policies make such data “feeding” harder. If information is power, those who once enjoyed exclusive access but no longer do become less powerful.
The biggest danger in the Big Data and open-data worlds may be complacency. Just because the city releases 1,400 data sets doesn’t mean that we know everything that we need to know. We still need FOIL, as well as journalists and civic activists, to ask for data that the city hasn’t already collected or won’t give out because no journalist or researcher has asked for them. One thing hasn’t changed: information that is free and easy to get is often the only information that the government wants you to see.
At some point, though, Big Data and open data run into an old-fashioned problem: the city knows what the information suggests that it should do, but it won’t do it. For years now, residents of Battery Park City and the rest of the lower Hudson waterfront have suffered from incessant noise from tourist helicopters, says John Dellapontas of Stop the Chop, a residents’ advocacy group. “We have flight data that we got from FOIL,” he notes, referring to the freedom-of-information law by which his group has petitioned the city and state to reveal how many helicopter flights take off from a city-owned helipad. He could show “round trips, one every minute,” flying by residents’ apartment windows. Yet the company that runs helicopter tours used the lack of 311 data to say that residents didn’t mind the flights. “That is true, but meaningless,” Dellapontas says. “We started doing 311, [but] we’d get a form response saying that it’s perfectly legal, which it is.” Instead, Stop the Chop “circumvented the 311 system and had 5,000 members e-mail blast” local politicians. “We basically created a more effective 311 system.” Stop the Chop won the data battle but lost the war. The compromise that the mayor came up with reduced flights to every two minutes, from every one minute, and extended the tour-helicopter company’s lease.
Midtown residents have had similar difficulty in getting the city to rezone land so that developers can’t build 1,000-foot-plus condominium towers that cast large shadows across public spaces. The “sunshine task force” of Manhattan’s Community Board 5, the arm of local government that is supposed to be closest to citizens’ needs, has spent more than two years amassing and analyzing property-records data, zoning rules, and building permits, and working with other nonprofits to show the consequences of poorly conceived construction. The first problem that local residents have run into is that data released by the city are often difficult to understand and analyze. When sharing information on, say, the property-rights transfers necessary to build super-tall towers, the city doesn’t put out a simple weekly list of all such transfers. Residents looking to learn about such activity in their neighborhoods must crawl online through hundreds of individual property records to see what activity occurred at those properties. “The way the information is buried, it becomes unusable,” says Layla Law-Gisiko, chairperson of the task force.
Second, even when residents are able to do the difficult work of presenting incontrovertible, easy-to-understand data, it’s still easy for the city council and the mayor to ignore those data. The sunshine task force and the Municipal Art Society have compiled and publicized maps that show clearly how shadows from planned buildings would affect Central Park, one of the city’s most treasured public spaces. But the city council has responded by promising only to study the issue. Meantime, the towers are going up. “It really makes no sense,” says Law-Gisiko. “It’s really putting our democratic system in jeopardy.”
On the plus side, data’s role in public decision making has become increasingly hard to ignore. Even in New York’s city council, which often makes headlines for greenlighting poorly considered ideas, at least some proposals go down to defeat when they’re not backed by sufficient information. Consider the horse saga. During his first two years in office, Mayor de Blasio fought to rid New York of Central Park’s horse carriages. A horse-free New York was the top priority of some of his major donors. Yet the city’s paucity of data to back the mayor’s case was striking. At a January 2016 hearing, Mindy Tarlow, the mayor’s director of operations, attempted to convince the city council that it was dangerous for horses to work in traffic and for pedicab drivers and horses to coexist in Central Park. Yet she had no data map of crashes between horses and pedicabs and no graphic showing how horse deaths on city streets had increased (if they had even occurred). “We shouldn’t burden an entire industry without any statistics,” Council Member Margaret Chin said. In the end, the horse-carriage drivers prevailed.
In some cases, however, government finds itself outgunned by private-sector firms that manipulate data to circumvent laws meant to protect New Yorkers’ quality of life. Take the example of Airbnb, which allows New Yorkers to turn their apartments into hotel rooms for paying guests. Airbnb’s most lucrative service allows New Yorkers to rent out entire empty apartments to strangers. This business is illegal under state and city law: New Yorkers do not want their apartment buildings to become hotels or youth hostels. Enforcing the law, though, is difficult. The city must rely on 311 callers to report illegal apartment rentals; then it must dispatch enforcers who wait, sometimes for hours, to spot illegal activity. The city expends taxpayer resources to find out something that Airbnb already knows, from its internal databases: which rentals are illegal and which are not. Before releasing select data to reporters in December 2015, Airbnb scrubbed its listings of at least 1,000 illegal rentals, in order to make the data look better to journalists. Airbnb knows where and when its “hosts” are breaking the law; the city does not.
Despite many advances, we still need more data—and better data—to improve New York’s quality of life. Take a common scenario: a resident calls to complain that construction plates over underground work in the street aren’t secured properly. The website says that inspectors found nothing wrong. “You wonder, did they ever inspect it? You don’t know,” says John Kaehny, cochair of the New York City Transparency Working Group, a coalition of government watchdogs. The city should let the public see specific details of government agencies’ responses to such complaints. The city should also allow people to continue a 311 complaint when they think that the city hasn’t resolved it successfully, rather than force them to open new complaints. Doing this involves overcoming two political hurdles. The police department responds to many complaints, but police unions don’t want GPS data tracking cruisers’ movements, and providing such specific data might prove what complainants suspect: that officers and inspectors are often too overwhelmed to respond.
The future of data, though, isn’t more 311 calls, but sensors. This year, a private vendor is erecting free Wi-Fi kiosks around the city to let New Yorkers access high-speed Internet as they walk along the street. The city could work with the vendor to outfit these kiosks with sensors to monitor everything from noise-pollution levels to carbon monoxide, one person familiar with the project noted. High levels of carbon monoxide could direct traffic agents to a congested street with double-parked trucks. The city could also combine human reporting with sensor enforcement, forcing a building site with a long history of noise violations, say, to deploy a permanent sensor at its site. The city could also learn when a garbage can is full and needs emptying, or count how many people walk through a particular intersection at particular times, or count how many people, precisely, get into a subway car—and use that information to provide better services. Sensors could count how many cars and trucks are in Manhattan—and prohibit other vehicles from coming in until some leave.
And New Yorkers, of course, have urban problems that data won’t solve. On many subway lines now, riders can look at “countdown clocks” to see when the next train will arrive—but knowing that the Lexington Avenue line train is coming in two minutes doesn’t change the fact that not everyone waiting on the platform will fit onto that train. New Yorkers don’t need Big Data to tell them that the subways are too crowded. Without better, smarter investment in the assets that our second modern urban revolution—electricity—gave us, we can’t fully capitalize on Big Data’s potential.