‘How does Big Data affect personal privacy?’ and ‘In what specific ways are privacy and Big Data connected?’ are two of the questions we are exploring in our research on Big Data and privacy. A third question concerns a possible way out: how can we organize privacy in the age of Big Data?
One interesting take on this matter comes from Jeff Jonas, Chief Scientist of the Entity Analytic Solutions group and an IBM Fellow, in a paper called Privacy by Design (written with Ann Cavoukian). He presents an ‘anonymous resolution’ approach that decreases the risk of re-identification, based on seven design principles:
FULL ATTRIBUTION: Every observation (record) must record where it came from and when. There cannot be merge/purge data survivorship processing whereby some observations or fields are discarded.
DATA TETHERING: Adds, changes and deletes occurring in systems of record must be accounted for, in real time, within sub-seconds.
ANALYTICS ON ANONYMIZED DATA: The ability to perform advanced analytics (including some fuzzy matching) over cryptographically altered data means organizations can anonymize more data before information sharing.
TAMPER-RESISTANT AUDIT LOGS: Every user search should be logged in a tamper-resistant manner — even the database administrator should not be able to alter the evidence contained in this audit log.
FALSE NEGATIVE FAVORING METHODS: The capability to more strongly favor false negatives is of critical importance in systems that could be used to affect someone’s civil liberties.
SELF-CORRECTING FALSE POSITIVES: With every new data point presented, prior assertions are re-evaluated to ensure they are still correct, and if no longer correct, these earlier assertions can often be repaired, in real time.
INFORMATION TRANSFER ACCOUNTING: Every secondary transfer of data, whether to a human eyeball or a tertiary system, can be recorded to allow stakeholders (e.g., data custodians or the consumers themselves) to understand how their data is flowing.
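To make the third principle more concrete, here is a minimal sketch of matching over cryptographically altered data. This is not Jonas's actual implementation; it simply illustrates the idea that identifiers can be normalized and one-way hashed (here with a keyed HMAC, using an invented shared key) before sharing, so two parties can detect overlapping records without ever exchanging the raw values:

```python
import hmac
import hashlib

# Hypothetical secret key shared between the cooperating parties; keying the
# hash makes it harder to brute-force hashes from lists of known identifiers.
SECRET_KEY = b"shared-secret-for-demo"

def normalize(value: str) -> str:
    """Canonicalize an identifier so trivial variations still match."""
    return " ".join(value.lower().split())

def anonymize(value: str) -> str:
    """One-way, keyed transformation of an identifier (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, normalize(value).encode(), hashlib.sha256)
    return digest.hexdigest()

# Each party anonymizes its records locally, before any information sharing.
party_a = {anonymize("Alice Smith"), anonymize("Bob  Jones")}
party_b = {anonymize("alice smith"), anonymize("Carol White")}

# Matching happens over the altered data only; raw names are never exchanged.
overlap = party_a & party_b
print(len(overlap))  # 1 -- "Alice Smith" matches despite formatting differences
```

Real systems go further, e.g. hashing several variants of each identifier to support the fuzzy matching Jonas describes, but the privacy property is the same: analysis runs on the anonymized form.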
While this framework reduces the risks rather than completely solving the issue (which is impossible), I think privacy needs to be addressed from the start. Addressing it at the design/architecture stage of systems is a proactive approach. Building privacy-enhancing elements in by design can minimize privacy harm and in some cases prevent it from arising in the first place. How do you feel about organizing privacy in the age of Big Data? And what about organizing privacy by design?