As rates of urbanisation continue to increase, society needs to make better use of its scarce urban resources, and big data is one means of achieving this. “Big data” is a universal reality and a reflection of our society. As individuals we can neither avoid contributing to it nor avoid its widespread (and largely unnoticed) use. Big data has shaped our daily lives in ways that could not have been imagined a few decades ago, in areas ranging from traffic management to online purchases, taxes, insurance, finances, and food and housing choices. Its significance and scope will only grow: big data is here to stay.
“Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation” (Gartner IT Glossary, 2017). While the principles of aggregating and using large data sets have existed for centuries, the tsunami of data generated today, along with its wide-ranging use, is unprecedented and has driven the development of new machine learning technologies (Kitchin, 2013a). Around 90% of the world's data in 2013 had been generated over the previous two years (SINTEF, 2013), and New York City alone generates a terabyte of data every day (BSA, 2015). This opens a considerable world of opportunities for city management, but also challenges that we are only now starting to understand, explore and regulate.
The use of big data by municipalities predates the age of Information and Communication Technologies (ICT) as we now know it. New York City, for instance, has been the subject of comprehensive statistical analysis for over a century, as demonstrated by Jacob Riis's book “How the Other Half Lives” (1890), which used detailed cross-statistical analysis to portray the living conditions of the poorest members of society. Early computers were first used to analyse big data in NYC through the New York City-RAND Institute, which in the 1970s used historical fire rates, population densities and building types to determine the efficient allocation of fire department resources (Kolesar, Walker and Hausner, 1975). Such use has grown exponentially since, in both scale and sophistication.
The use of big data in an urban context is often synonymous with the “smart cities” movement: a city cannot be “smart” if it does not leverage the power of data. The role of big data in cities is to facilitate an efficient flow of data and information between citizens, municipalities and private sector stakeholders who interact with one another within an urban environment. Cities are, at their core, collections of complex and fluid social networks supported by urban infrastructure and services. Big data is thus concerned with how society interacts with that built environment and the lessons that can be drawn from it. It allows us to better understand the metabolism of a city, to better match the supply of resources to demand, to detect patterns and predict future requirements, and to manage growth efficiently.
Data needs to be contextualised to become useful, actionable information (Graham, 2012). Modern electronics can analyse large data sets, and therefore enact change within urban systems, considerably faster and with fewer errors than humans can. This creates opportunities both for “hardwired” solutions, where a machine is wired to respond in a set way to a predefined set of circumstances, and for “machine learning”, where the machine adapts and reacts to situations by evolving its own decision-making process.
Hardwired systems allow for straightforward engineering solutions based on a set of simple metrics (free of complexities such as the need to model human behaviour); they are well adapted to many urban challenges and can be regulated relatively easily (Bettencourt, 2014). Municipal transport systems provide an example: an automated system can regulate bus traffic flows in real time by analysing average wait times and the number of people waiting at bus stops, rerouting or releasing more buses as the need arises. The same is true for regulating public utilities, traffic, or even street lamps. Los Angeles, for instance, recently replaced 4,500 miles of streetlights with “intelligent” LEDs that inform municipal authorities when a light is broken (Elejoste et al., 2013) and that self-regulate the intensity of the light they emit according to sunset and ambient conditions. London's parking sensors, which tell users in real time through an app where empty parking spots are and so reduce the time spent searching for a spot by up to 30% (Nawaz, Efstratiou and Mascolo, 2013), exemplify the challenges that hardwired systems can solve. Such systems are, however, limited in their application, and in particular in their ability to integrate into a multi-stakeholder environment.
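The bus example can be sketched as a fixed rule. The Python below is a minimal illustration only, assuming two hypothetical thresholds (a ten-minute average wait and forty waiting passengers) and hypothetical action names; it is not drawn from any real transport system. What makes it "hardwired" is that a given set of inputs always produces the same action, which is also what makes such systems easy to test and regulate.

```python
from dataclasses import dataclass

# Hypothetical thresholds: illustrative values, not taken from any real system.
MAX_AVG_WAIT_MIN = 10.0
MAX_QUEUE = 40


@dataclass
class StopReading:
    stop_id: str
    avg_wait_min: float   # average observed wait time at the stop, in minutes
    people_waiting: int   # current head count at the stop


def dispatch_decision(reading: StopReading) -> str:
    """Fixed, predefined rule: identical inputs always yield the same action."""
    long_wait = reading.avg_wait_min > MAX_AVG_WAIT_MIN
    long_queue = reading.people_waiting > MAX_QUEUE
    if long_wait and long_queue:
        return "release_extra_bus"   # both metrics breached
    if long_wait or long_queue:
        return "reroute_nearby_bus"  # one metric breached
    return "no_action"


readings = [
    StopReading("A", 4.0, 12),
    StopReading("B", 12.5, 18),
    StopReading("C", 15.0, 55),
]
for r in readings:
    print(r.stop_id, dispatch_decision(r))
```

Nothing in this rule learns from past outcomes; a machine learning system would instead adjust the thresholds, or replace them entirely, based on observed results.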
Urban challenges with more complex socio-economic roots require more sophisticated solutions. Crime, education, poverty reduction, affordable housing and sustainability are all fluid and socially complex. It is these socio-economic “wicked problems” within urban areas that big data needs to tackle to be truly effective. This is achieved by integrating multi-dimensional, multi-sourced data into a machine learning environment, resulting in the coordination of multiple agencies over different timescales and scopes. This empowers machines to identify patterns by “linking the data” and giving it meaning. Integrating the cognitive systems of data management with the millions of available data streams could lead to a form of artificial intelligence capable of autonomous decision making (Kitchin, 2013b; Kitchin and Dodge, 2011).
Crime reduction is a recurrent theme in the urban big data argument. Improved reporting and predictive policing are two key areas where big data can have a significant impact. In a recent study it was assumed that, under normal circumstances, 80% of gunshots would be reported to the police; yet after acoustic sensors calibrated to detect gunshots were introduced in a number of US cities, it was found that local residents reported only 10% of gunshots to the police (Schlossberg, 2015). Los Angeles (LA) implemented a predictive policing system in 2014 that identified patterns across more than 13 million incidents recorded since 1940, enabling the LAPD to “place officers at the right time and location to give them the best chance of preventing crime” (Mohler et al., 2015). The LAPD is going further by layering real-time physical tracking of ex-convicts, through face and license plate recognition, onto the predictive policing system, which has led to accusations of privacy violations and of bias against past offenders (Hvistendahl, 2016). Criminal activity in LA has fallen significantly over the past two years, even if causality is not always evident (Thompson, 2016; Hvistendahl, 2016).
While progress is being made in developing such artificial intelligence systems to aid city management, knowing something does not necessarily mean understanding it within its social, economic and political context. A comprehensive understanding of society's ethics and moral values remains elusive, and computers are still not good at applying such values effectively (Kitchin, 2013a). Implicit bias and racism can find their way into, and be reinforced by, human-designed computer algorithms (Kitchin, 2013b). While data itself is apolitical and objective, its ultimate use is neither (Mayer-Schonberger and Cukier, 2013). Big data therefore brings with it the requirement for “big judgement” by human controllers, who may themselves be biased and prejudiced (Shah et al., 2012).
The recent explosion in connected devices, which Cisco estimates will reach 50 billion by 2020, or 6.5 networked devices per person worldwide (Cisco, 2016), leads to the conclusion that big data will only get bigger and that, as technology becomes more pervasive, citizens will (willingly or not) expose ever greater portions of their lives to scrutiny and analysis. Yet, without sufficient oversight and transparency, there are concerns that big data will be used for commercial purposes, will breach privacy, and may not be secure enough to prevent its misuse.
In order to regulate the use of big data, it is important to differentiate between its broad sources (directed, automated and volunteered) and its uses (Kitchin, 2013a). Privacy within volunteered sources (data created and willingly handed over by the generator) is generally covered by the terms and conditions agreements regulating those interactions, even if charges of a lack of transparency and of failure to obtain “informed consent” can be levied (Rubinstein, 2013). Directed data (data often obtained through traditional forms of human-driven surveillance such as CCTV or immigration checks) plays a relatively small role in urban big data, and its use is strictly regulated at the national level. The true potential of big data lies in automated sources, whereby sensors, tracking technologies and the “Internet of Things” generate “continuous data across multiple networks regarding the movement of people and materials” (Batty et al., 2012: 482). Such data is collected by both the private and the public sectors, often without the explicit consent or knowledge of those generating it. In most regulatory frameworks around the world, as long as data is anonymised and adequately secured, public authorities can exploit it for the “greater good”. These prerequisites are still not sufficiently met in a number of jurisdictions (Rubinstein, 2013), so data is compartmentalised and treated in “silos” rather than shared across agencies and departments, which confines its utility to hardwired systems and leaves the full potential of big data untapped.
There have been attempts to centralise all data streams and analytics in large control centres, such as the Centro de Operações Prefeitura do Rio in Rio de Janeiro, Brazil, which centralises the operations of 30 municipal agencies (Singer, 2012). Saudi Arabia likewise relies heavily on big data to co-ordinate multiple agencies and state entities for crowd and disease control during the five days of the Hajj in Mecca (Alaska et al., 2017). Managing such vast amounts of data is technologically complex: millions of data streams need to come together, and they need to be anonymised, secured, organised, allocated and assigned the appropriate digital rights. Increasingly, responsibility for municipal big data management, as with the control centre in Rio, is outsourced to private sector operators such as IBM, which develop the technologies needed to run the algorithms that manage big data (Kitchin, 2013a). This increases the risk of privacy and security breaches while raising concerns over conflicts of interest. The same is true for physical data storage, which is also provided by the private sector. The benefits of big data must be weighed against the negative externalities that an over-connected world may engender (Elmaghraby and Losavio, 2014).
Privacy is deeply emotional: breaches of privacy and excessive surveillance alter behaviour (Jourard, 1972). While individuals rank their desire for privacy highly, society as a whole agrees to trade elements of privacy for the utility and convenience that big data provides. Although the right to privacy is an evolving legal concept in developed countries, it is rarely, if ever, mentioned in the regulatory environments of emerging economies (Krotoszynski, 2016). The US has both national and state-level privacy regulations, which are generally considered weak and were further weakened by the recent congressional repeal of federal internet privacy regulations (Naylor, 2017). In Europe, the General Data Protection Regulation (GDPR), proposed in 2012 and adopted in 2016, promotes individual rights to data and anonymity while placing new accountability measures on organisations that collect or process data (Rubinstein, 2013), but its applicability and enforcement still fall short of providing comprehensive data privacy and security.
The oft-repeated mantra that “if you have nothing to hide, then you shouldn't worry about data privacy” is flawed. Data privacy does not exist to protect criminal intent; it is essential to protect bank details and health conditions, and, on a practical level, personal data can reveal when you are on holiday and your home is empty, or which schools your children attend. Privacy is always essential.
Preserving the anonymity of an individual's location, actions or conversations is best achieved if explicit identifiers (personally identifiable information) are removed from the data and the data is not linked to external information (Shmatikov, 2011). The absence of quasi-identifiers (data which, when correlated with other sources, can lead to an identification) and sensitive attributes (exact data that can reveal a person's identity without further data) is essential to preserve anonymity (Dasgupta and Kosara, 2011), but removing them erodes the usefulness of the data (Brickell and Shmatikov, 2008) and compromises the analysis.
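The trade-off between anonymity and utility can be illustrated with a short Python sketch. It drops explicit identifiers and, rather than removing quasi-identifiers entirely, generalises them (a postcode is cut back to its district and an exact age becomes a ten-year band), a common middle ground. The field names, the sample record and the generalisation rules are all hypothetical assumptions made for illustration.

```python
# Hypothetical schema: field names are illustrative assumptions.
EXPLICIT_IDENTIFIERS = {"name", "phone"}


def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict) -> dict:
    # Remove explicit identifiers outright.
    out = {k: v for k, v in record.items() if k not in EXPLICIT_IDENTIFIERS}
    # Generalise quasi-identifiers: keep only the postcode district and an age band.
    out["postcode"] = record["postcode"].split()[0]
    out["age"] = age_band(record["age"])
    return out


rec = {"name": "A. Resident", "phone": "555-0100",
       "postcode": "AB1 2CD", "age": 37, "trip_count": 14}
print(deidentify(rec))
# {'postcode': 'AB1', 'age': '30-39', 'trip_count': 14}
```

The output can no longer support analysis by exact age or street, which is the kind of loss of utility discussed above; coarser generalisation protects anonymity better but costs more analytical detail.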
Various privacy-preserving data mining (PPDM) techniques are being developed with the aim of enhancing privacy at the source while maintaining the full utility of data sets and promoting their interconnectivity (Baby and Subhash, 2016). K-anonymity models, for example, mask the quasi-identifiers linked to location-based data from aggregators so that each individual is indistinguishable from at least k − 1 others, but given sufficient additional data they remain vulnerable (Zuberi, Lall and Ahmad, 2012). Parallel coordinates visualisation, for instance, provides an interface that can be configured and demonstrated but does not release or aggregate the underlying data unless authorised (Dasgupta and Kosara, 2011). To enhance privacy while minimising the loss of data utility, contributors must be afforded some measure of individual control, and purpose limitations must be imposed on users. As the use of big data becomes more commonplace, new tools will be developed to protect individuals' privacy more effectively, improve storage systems and maximise security.
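To make the k-anonymity idea concrete, the minimal Python sketch below checks whether a table of hypothetical location records is k-anonymous on its quasi-identifiers, and then shows one way such a table can still leak information. If everyone in a group shares the same sensitive value, an attacker who already knows that a person was in cell C1 between 08:00 and 09:00 learns their destination even though the table is 2-anonymous. The records, field names and the homogeneity check are illustrative assumptions, not the method of Zuberi, Lall and Ahmad (2012).

```python
from collections import Counter, defaultdict


def qi_key(row, quasi_identifiers):
    """Tuple of the quasi-identifier values an outsider could link to other data."""
    return tuple(row[q] for q in quasi_identifiers)


def min_group_size(rows, quasi_identifiers):
    """Size of the smallest group of records sharing identical quasi-identifiers."""
    groups = Counter(qi_key(row, quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every record is indistinguishable from at least k - 1 others."""
    return min_group_size(rows, quasi_identifiers) >= k


def homogeneous_groups(rows, quasi_identifiers, sensitive):
    """Groups whose members all share one sensitive value, which k-anonymity does not hide."""
    values = defaultdict(set)
    for row in rows:
        values[qi_key(row, quasi_identifiers)].add(row[sensitive])
    return [key for key, vals in values.items() if len(vals) == 1]


# Hypothetical location records: grid cell and hour band are the quasi-identifiers.
rows = [
    {"cell": "C1", "hour": "08-09", "destination": "hospital"},
    {"cell": "C1", "hour": "08-09", "destination": "hospital"},
    {"cell": "C2", "hour": "08-09", "destination": "office"},
    {"cell": "C2", "hour": "08-09", "destination": "school"},
]
qis = ["cell", "hour"]
print(is_k_anonymous(rows, qis, 2))                  # True
print(homogeneous_groups(rows, qis, "destination"))  # [('C1', '08-09')]
```

This is one illustration of why additional background knowledge can defeat k-anonymity on its own; stronger models add requirements on the diversity of sensitive values within each group.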
Many current systems apply only to data that is volunteered, where the grantor is sufficiently informed to judge the privacy implications of the data being provided. Even as legislation is strengthened to protect the anonymity and privacy of data, it will always lag behind the situation on the ground and will never fully reconcile the requirements of privacy and maximal utility. While maintaining privacy and anonymity is essential, inaction carries an opportunity cost in the development of tools essential to the sustainability of our cities. For instance, the roll-out of smart energy meters to homes in the Netherlands was held up in the courts for over two years because of concerns over human rights infringements (Cuijpers and Koops, 2008).
Data is sourced from the bottom up and used from the top down. Individuals are rarely given the tools and ability to understand and use the data directly, nor are they actively involved in designing and implementing data-driven projects. Improving both privacy and analytical accuracy requires increased participation: a shift from being passive generators and consumers of data to being thoughtful, active participants in the future design of cities. The Detroit Motor City Mapping project, which categorises blighted properties for demolition, ran with 150 community volunteers who identified 380,000 parcels, producing data 22% more accurate than automatically collected data (Motor Mapping, 2016).
Governments and municipalities, such as New York City (Feuer, 2013), are exploring ways of making automated data sets open and more public, so that the process may become more participatory and decentralised, and so that private sector stakeholders such as app developers can contribute faster and more efficient methods of analysis and data visualisation (Koonin and Holland, 2015). Privacy is improved by the sheer scale of these data sets, which limits users' capacity for cross-referencing (Koonin and Holland, 2015).
Beyond the privacy and security concerns that exist even in a well-functioning, democratic and rational system, there are other practical challenges. There are concerns about the validity and accuracy of data collection processes, which can lead to skewed analysis (Croushore, 2011); big data can be, and often is, wrong (e.g. predictions for the 2016 US presidential election). Municipalities often lack the financial capacity to establish complex integrated analysis systems, secure the data they acquire, maintain sophisticated control centres, and develop the human and technological capacity required to centralise and analyse the comprehensive data that cities today have access to. Over-reliance on big data can become a vulnerability for municipalities that focus on data analysis at the expense of the physical and social elements driving city dynamics. Big data systems can also be hacked.
Hacking is a perennial data security challenge and erodes confidence in the sanctity of privacy: over 1.4 billion records were leaked through hacking or mistakes in 2016 alone (Kumar, 2017). Scandals such as the hacking of 33 million Ashley Madison accounts (Baraniuk, 2015), as well as the privacy and security implications of “always on” microphone-enabled devices such as Samsung smart TVs and Amazon's Alexa systems (Gray, 2017), do little to quieten citizens' fears.
Data is information; it is power, and it carries responsibilities. Even if big data is adequately anonymised and serves the public good as intended, questions arise about the trade-off between efficiently run cities and the behavioural impacts of being constantly watched, examined and used (Jourard, 1972). This evokes visions of a dystopian future in which all urban functions are perfectly efficient and well managed, but in which human chaos, disorder and the creativity they may generate are stifled for a presumed greater good. At what point in the big data debate do humans become mere commodities to be controlled, even if for their own good? Singapore is an example of a city that has used big data in its urban planning to great success (Kloeckl et al., 2012), yet for many, Singapore's very efficiency is claustrophobic and restrictive (Rubinstein, 2013). Can the public sector be trusted to use data responsibly and to act free of conflicts of interest or prejudice? Who is the ultimate arbiter of social good? The Soviet Union used early forms of big data to control its citizens more effectively under the guise of the “public good”.
The world is being quantified with or without our consent. This creates the risk of dividing communities into those that are “quantified” and those excluded from the big data revolution, further widening income and representation gaps. In slums and informal settlements, such as Mumbai's Dharavi, there is a desire to participate in data collection, since for communities that have long been ignored it may bring recognition that those areas matter and a better understanding of the issues they face (Catlett and Ghani, 2015).
Successful use of big data will require co-operation between the public and private sectors at all levels of policy making. A balance will need to be struck between securing citizens' privacy and unlocking the potential of big data: while municipalities must use all available resources as efficiently as possible, they are also responsible for protecting the privacy of their citizens. Big data will continue to revolutionise the world. It should be leveraged to its full potential, but it should neither leave cities over-reliant on its analysis nor be developed at the expense of the privacy, security and equality of all citizens. Privacy is an integral part of the “Public Good”.
- Alaska, Y., Aldawas, A., Aljerian, N., Memish, Z. and Suner, S. (2017). The impact of crowd control measures on the occurrence of stampedes during Mass Gatherings: The Hajj experience. Travel Medicine and Infectious Disease, 15, pp.67-70.
- Baby, V. and Subhash, N. (2016). Privacy-Preserving Distributed Data Mining Techniques: A Survey. International Journal of Computer Applications, 143(10), pp.37-41.
- Baraniuk, C. (2015). Ashley Madison: ‘Suicides’ over website hack. BBC News. [online] Available at: http://www.bbc.com/news/technology-34044506 [Accessed 1 Apr. 2017].
- Batty, M., Axhausen, K.W., Giannotti, F., Pozdnoukhov, A., Bazzani, A. and Wachowicz, M. (2012). Smart cities of the future. European Physical Journal Special Topics, 214(1), pp.481-518.
- Bettencourt, L. (2014). The Uses of Big Data in Cities. Big Data, 2(1), pp.12-22.
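- Brickell, J. and Shmatikov, V. (2008). The cost of privacy: destruction of data-mining utility in anonymized data publishing. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.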
- BSA (2015). What’s the big deal with data?. 1st ed. [pdf] BSA the Software Alliance. Available at: http://data.bsa.org/wp-content/uploads/2015/10/bsadatastudy_en.pdf [Accessed 3 Apr. 2017].
- Catlett, C. and Ghani, R. (2015). Big Data for Social Good. Big Data, 3(1), pp.1-2.
- Cisco (2016). White paper: Cisco VNI Forecast and Methodology, 2015-2020. [online] Available at: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/complete-white-paper-c11-481360.html [Accessed 3 Apr. 2017].
- Croushore, D. (2011). Frontiers of Real-Time Data Analysis. Journal of Economic Literature, 49(1), pp.72-100.
- Cuijpers, C. and Koops, B. (2008). Het wetsvoorstel ‘slimme meters’: een privacytoets op basis van art. 8 EVRM [The ‘smart meters’ bill: a privacy assessment based on Art. 8 ECHR]. Tilburg: Universiteit van Tilburg.
- Dasgupta, A. and Kosara, R. (2011). Adaptive Privacy-Preserving Visualization Using Parallel Coordinates. IEEE Transactions on Visualization and Computer Graphics, 17(12), pp.2241-2248.
- Elejoste, P., Angulo, I., Perallos, A., Chertudi, A., Zuazola, I., Moreno, A., Azpilicueta, L., Astrain, J., Falcone, F. and Villadangos, J. (2013). An Easy to Deploy Street Light Control System Based on Wireless Communication and LED Technology. Sensors, 13(5), pp.6492-6523.
- Feuer, A. (2013). The Mayor’s Geek Squad. The New York Times, 23 March. [online] Available at: http://www.nytimes.com/2013/03/24/nyregion/mayor-bloombergs-geek-squad.html?_r=0 [Accessed 9 Apr. 2017].
- Gartner IT Glossary (2017). What Is Big Data? – Gartner IT Glossary – Big Data. [online] Available at: http://www.gartner.com/it-glossary/big-data [Accessed 01 Apr. 2017].
- Graham, M. (2012). Big data and the end of theory? The Guardian, 9 March. London.
- Hvistendahl, M. (2016). Crime Forecasters. Science, 353(6307), pp.1484-1487.
- Jourard, S. (1972). Self-disclosure : an experimental analysis of the transparent self. 1st ed. Ste-Foy, Québec: Editions Saint-Yves.
- Kitchin, R. (2013a). The real-time city? Big data and smart urbanism. GeoJournal, 79(1), pp.1-14.
- Kitchin, R. (2013b). Big data and human geography: Opportunities, challenges and risks. Dialogues in Human Geography, 3(3), pp.262-267.
- Kitchin, R. and Dodge, M. (2011). Code/space: Software and everyday life. Cambridge, MA: MIT Press.
- Kloeckl, K., Senn, O. and Ratti, C. (2012). Enabling the real-time city: LIVE Singapore! Journal of Urban Technology, 19(2), pp.89-112.
- Kolesar, P., Walker, W. and Hausner, J. (1975). Determining the Relation between Fire Engine Travel Times and Travel Distances in New York City. Operations Research, 23(4), pp.614-627.
- Koonin, S. and Holland, M. (2015). Privacy, big data, and the public good. 1st ed. New York, NY: Cambridge University Press, pp.136-152.
- Krotoszynski, R. (2016). Privacy Revisited : A Global Perspective on the Right to Be Left Alone. 1st ed. Oxford University Press.
- Kumar, M. (2017). Database of 1.4 Billion Records leaked from World’s Biggest Spam Networks. [online] The Hacker News. Available at: http://thehackernews.com/2017/03/email-marketing-database.html [Accessed 3 Apr. 2017].
- Mayer-Schonberger, V. and Cukier, K. (2013). Big data: A revolution that will change how we live, work and think. London: John Murray.
- Mohler, G., Short, M., Malinowski, S., Johnson, M., Tita, G., Bertozzi, A. and Brantingham, P. (2015). Randomized Controlled Field Trials of Predictive Policing. Journal of the American Statistical Association, 110(512), pp.1399-1411.
- Nawaz, S., Efstratiou, C. and Mascolo, C. (2013). ParkSense: A Smartphone Based Sensing System For On-Street Parking. 1st ed. Cambridge: Cambridge Papers.
- Naylor, B. (2017). Congress Overturns Internet Privacy Regulation. NPR. [online] Available at: http://www.npr.org/2017/03/28/521831393/congress-overturns-internet-privacy-regulation [Accessed 6 Apr. 2017].
- Riis, J. (1890). How the Other Half Lives: Studies among the Tenements of New York. [online] Available at: http://www.authentichistory.com/1898-1913/2-progressivism/2-riis/index.html [Accessed 6 Apr. 2017].
- Rubinstein, I. (2013). Big Data: The End of Privacy or a New Beginning?. International Data Privacy Law, 3(2), pp.74-87.
- Schlossberg, T. (2015). New York Police Begin Using ShotSpotter System to Detect Gunshots. The New York Times. [online] Available at: https://www.nytimes.com/2015/03/17/nyregion/shotspotter-detection-system-pinpoints-gunshot-locations-and-sends-data-to-the-police.html?_r=0 [Accessed 5 Apr. 2017].
- Shah, S., Horne, A. and Capellá, J. (2012). Good Data Won’t Guarantee Good Decisions. Harvard Business Review, 90(4), pp.23-25.
- Shmatikov, V. (2011). Anonymity is not privacy. Communications of the ACM, 54(12), p.132.
- Singer, N. (2012). Mission control, built for cities: I.B.M. Takes ‘Smarter Cities’ Concept to Rio de Janeiro. The New York Times, 3 March. [online] Available at: http://www.nytimes.com/2012/03/04/business/ibm-takes-smarter-cities-concept-to-rio-de-janeiro.html [Accessed 9 Apr. 2017].
- SINTEF (2013). Big Data, for better or worse: 90% of world’s data generated over last two years. ScienceDaily. Available at: www.sciencedaily.com/releases/2013/05/130522085217.htm [Accessed 6 Apr. 2017].
- Thompson, M. (2016). Analyzing the Efficacy of Predictive Policing in Law Enforcement. Journal of Criminal Anthropology, 98(1), pp. 56 – 100.
- Zuberi, R., Lall, B. and Ahmad, S. (2012). Privacy Protection Through k-anonymity in Location-based Services. IETE Technical Review, 29(3), pp.196-201.