A computational algorithm for handling the special uniques problem

Research output: Contribution to journal › Article › peer-review

Abstract

Many organizations require detailed individual-level information, much of which has been collected under guarantees of confidentiality. However, simple anonymization procedures, such as removing names and addresses, are insufficient to ensure confidentiality. The records belonging to certain individuals have a high probability of being identified because their contents, or attributes, are unusual, and they therefore have the potential to be recognized spontaneously; such records are referred to as special uniques. Consider, for example, a sixteen-year-old widow in a population survey. The confidentiality of a given dataset cannot be assured until all special unique records are identified and either disguised or removed. However, to the knowledge of the authors, no exhaustive automated analysis of this nature has been conducted, owing to the demanding levels of computation and data storage required. This paper introduces a new algorithm that locates 'Risky' records in discrete data in two steps: first, it identifies all unique attribute sets (up to a user-specified maximum size); second, it grades the 'Risk' of each record by considering the number and distribution of unique attribute sets within that record. Empirical tests indicate that the algorithm is highly effective at picking out 'Risky' records from large samples of data.
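
The two steps described in the abstract can be illustrated with a small brute-force sketch. This is not the paper's algorithm, which is designed to cope with the computation and storage demands the abstract mentions. It simply enumerates every attribute combination up to a maximum size, keeps the minimal combinations on which a record is unique, and grades each record. The function names, the toy data, and the 1/k weighting of a unique set of size k are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def minimal_unique_sets(records, max_size):
    """For each record, list the minimal column subsets on which it is unique."""
    n_cols = len(records[0])
    found = [[] for _ in records]
    for size in range(1, max_size + 1):
        for cols in combinations(range(n_cols), size):
            # Count how often each value combination occurs on these columns.
            counts = Counter(tuple(r[c] for c in cols) for r in records)
            for i, r in enumerate(records):
                if counts[tuple(r[c] for c in cols)] != 1:
                    continue
                # Keep the set only if no smaller unique set is contained in it.
                if not any(set(s) <= set(cols) for s in found[i]):
                    found[i].append(cols)
    return found


def risk_scores(records, max_size):
    """Illustrative grading (an assumption): a unique set of size k adds 1 / k."""
    return [sum(1.0 / len(s) for s in sets)
            for sets in minimal_unique_sets(records, max_size)]


# Hypothetical toy data: (age band, marital status, sex).
records = [
    ("16-19", "widowed", "F"),
    ("16-19", "single", "F"),
    ("16-19", "single", "F"),
    ("16-19", "single", "M"),
    ("16-19", "single", "M"),
    ("70+", "widowed", "F"),
    ("70+", "widowed", "F"),
    ("70+", "married", "M"),
    ("70+", "married", "M"),
]
scores = risk_scores(records, max_size=2)
riskiest = max(range(len(records)), key=scores.__getitem__)
```

In this toy dataset no single attribute value is unique, but the first record is the only one combining the youngest age band with widowhood, so it is the only record with a non-zero score. The exhaustive enumeration grows combinatorially with the number of attributes, which is the cost that motivates a dedicated algorithm.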

Bibliographical metadata

Original language: English
Pages (from-to): 493-509
Number of pages: 16
Journal: International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Volume: 10
Issue number: 5
Publication status: Published - Oct 2002

Related information

Impact: Economic impacts, Societal impacts, Legal impacts
