As we venture into the virtual world of the internet, we hand over a great deal of personal data. Our Facebook and other social media profiles create a detailed account of ourselves, a projection of our virtual persona to the world. The internet holds a record of who we are and what we believe. Given how accessible the internet is, and how much data is pumped into the cloud, privacy is increasingly compromised. Retaining personal privacy, that is, keeping certain data from anyone beyond its intended audience, is becoming more and more difficult. For governmental census data in particular, ensuring privacy while remaining accurate has become a balancing act.
The Responsibilities Of The US Census
The United States Census Bureau (USCB), formed in 1902, is officially responsible for gathering data about the American people and the economy. Every ten years the US census is conducted, gathering information about the population, and seats in the US House of Representatives are allocated accordingly.
The census also helps direct funds, resources, and attention towards different sections of society, from local to national levels. A hefty $675 billion of federal funds is allocated every year in America to maintain infrastructure, hospitals, and police forces, distributed according to the data. The census also provides a benchmark for comparison with previous decades, making it possible to analyze trends of progress or regression.
Mission Of The Census Bureau
The Census Bureau’s mission is “to serve as the nation’s leading provider of quality data about its people and economy”. While fulfilling the mission, the bureau must also ensure privacy.
How Is The Census Data Gathered?
Traditionally, census data was gathered by knocking door to door and recording the information in a ledger. Mailed-in forms were used in 1970, but concern was raised about the expense of paper and printing. As computing became more ubiquitous in society, it was used in 2010, saving $1 billion in collection costs.
Furthermore, embracing technology increased the accuracy of the data, which is imperative given how heavily decision-making relies on it. However, the adoption of technology brings with it the need for security measures, both to protect the data and to enforce confidentiality.
New Privacy Methods Required As Computers Are Introduced
The introduction of computers into census collection called for different privacy methods. In 2010, the US census used table suppression and data swapping to protect the privacy of individual data. The Census Bureau has a constitutional mandate under Article 1, Section 2 of the American Constitution, which requires a census of the American population every ten years. Furthermore, the mandate requires that individual information be kept confidential for 72 years after the census, under ‘Title 13’.
Given the increase in computing power over the last decade, a new form of privacy protection was required to ensure confidentiality for the 2020 census. The 2020 US census saw the adoption of ‘differential privacy’, which confronts the risk of computer hacking. John Abowd, the head scientist at the Census Bureau, said the shift to differential privacy “marks a sea change for the way that official statistics are produced and published”.
Counterbalance Of Accuracy And Privacy
Differential privacy adds mathematical noise and uncertainty to the data set in order to balance accuracy against privacy. The two are traded off against each other: higher accuracy can be achieved only by accepting lower privacy, and vice versa. The rationale for injecting ‘noise’ into the data set is to reduce the chance of an individual being identified.
The concern for privacy revolves around the risk of attacks on census data. We live in a world of big data collection, and census data would prove valuable to companies seeking to better target consumers with goods and services. Companies use large volumes of consumer data to learn what their customers prefer and to refine their products so that design aligns with preference. However, this undermines privacy. The tension between consumers’ desire for privacy and the demand for big data is the rationale behind differential privacy.
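The trade-off between accuracy and privacy can be illustrated with a small sketch. It assumes the textbook Laplace mechanism applied to a single count, which is a common teaching example and not necessarily the mechanism the Census Bureau used. The function names, the count of 1000, and the privacy-loss values are all hypothetical, chosen only for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is 1 / epsilon (assumed textbook setting).
    return true_count + laplace_noise(1.0 / epsilon, rng)

def mean_abs_error(true_count, epsilon, trials, rng):
    # Average distance between the released value and the true value.
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials

rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    # Smaller epsilon: more privacy, larger error. Larger epsilon: the reverse.
    print(eps, round(mean_abs_error(1000, eps, 10000, rng), 2))
```

In this sketch the average error shrinks as the privacy-loss parameter grows, which is the counterbalance described above: the released count becomes more accurate only as the privacy protection becomes weaker.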
Method Of Differential Privacy
Differential privacy allows useful data to be obtained without compromising individuals’ privacy. It blends fake data in with the real figures, as opposed to merely ‘removing the name’. 87% of Americans can be identified from the combination of zip code, birthday, and gender alone, even in nominally anonymous records, so merely removing the name is not a good enough mechanism for privacy.
Differential privacy neutralizes attacks by introducing noise, sending real and fake data at an algorithm-determined rate. Because an attacker cannot be certain whether any given answer is real, privacy is retained.
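One classic way to mix real and fake answers at a fixed rate is randomized response, sketched below. This is an illustrative technique and is not claimed to be the Census Bureau's method. The probability of answering truthfully (`p_truth`), the 30% true rate, and the function names are assumptions made for the example.

```python
import random

def randomized_response(truth, p_truth, rng):
    # With probability p_truth report the real answer,
    # otherwise report a fair coin flip (the "fake" answer).
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(reports, p_truth):
    # observed = p_truth * true_rate + (1 - p_truth) * 0.5, solved for true_rate.
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth) * 0.5) / p_truth

rng = random.Random(42)
truths = [i < 3000 for i in range(10000)]   # hypothetical population, 30% "yes"
reports = [randomized_response(t, 0.5, rng) for t in truths]
print(round(estimate_true_rate(reports, 0.5), 3))
```

No single report can be trusted, since any "yes" may be a coin flip, yet the overall rate can still be estimated from the aggregate. With `p_truth = 0.5`, a true "yes" is reported as "yes" with probability 0.75 and a true "no" with probability 0.25, a ratio of 3, which corresponds to a privacy loss of ln 3 (about 1.1).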
Bureau Hacking Into Its Own Data Set?
Using the previous data privacy structure, the bureau successfully hacked into its own data set and identified the race, age, ethnicity, and sex of 52 million Americans, which provided the catalyst for the move to differential privacy. Differential privacy counteracts this through a measure called ‘Epsilon’, which can be set anywhere from zero to infinity.
An Epsilon of zero produces a completely scrambled data set, whereas an infinite value would leave the data completely accurate. Computer models are therefore used to weigh up the risk and determine the Epsilon value to be used. Adjustments, referred to as ‘post-processing alterations’, were also made for erroneous data such as negative numbers and non-integer values.
The US Census Differential Privacy And What It Portrays
The US census’s adoption of differential privacy showed adaptability in light of the growing need for stronger privacy techniques, which is notable because traditional governmental frameworks are generally slow to react to technological developments. However, there is also a counter-argument against the policy.
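The kind of post-processing described above can be sketched in its simplest form: rounding noisy values to whole numbers and replacing negative counts with zero. This is a deliberately naive illustration under assumed inputs, and the Bureau's actual adjustments may be considerably more involved. The function name and sample values are hypothetical.

```python
def post_process(noisy_counts):
    # Round each noisy value to the nearest integer, then clamp
    # negatives to zero, since a population count cannot be negative.
    return [max(0, round(value)) for value in noisy_counts]

# Hypothetical noisy block counts after noise has been added.
print(post_process([12.7, -1.3, 0.4, 3.6]))   # prints [13, 0, 0, 4]
```

Because these adjustments are applied only to the already-noisy output and never look at the original records, they do not weaken the privacy guarantee, although they can introduce small biases of their own.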
Steven Ruggles, a historian at the University of Minnesota, said: “Differential privacy goes above and beyond what is necessary to keep data safe under census law and precedent. This is not the time to impose arbitrary and burdensome new rules that will sharply restrict or eliminate access to the nation’s core data sources.”
Blunt Instrument Of Differential Privacy
He went on to explain: “My central concern about differential privacy is that it’s a blunt instrument. If you want to provide the same level of protection against reidentification that current methods do, you’re going to have to do a lot more damage to the data than is done now.”
On social media profiles, many people share their ideas about politics, their hobbies, and their experiences, alongside general information. The virtual data profiles that we present to the world are rich in data. Census personal data entries, by contrast, are limited to ethnicity, sex, and age, which is far less revealing than what is released on social media. Against that backdrop, it is interesting to consider the extensive measures, differential privacy among them, taken to ensure privacy for census data.