Secure Computation in the Cloud Using MapReduce

UoM administered thesis: Phd

  • Authors:
  • James Dyer


Processing large volumes of data has become increasingly important to businesses and government [218]. One of the popular tools used in processing large sets of data is MapReduce. As new MapReduce applications are developed to be deployed in untrusted environments, such as public clouds, or to process sensitive data or to be deployed across data centres, well understood security measures may be deployed, such as authentication and authorisation or encryption of messages passed between cluster nodes. However, there may be situations where authorised individuals cannot be trusted, such as “rogue” system administrators, or, in the cloud, where the MapReduce cluster nodes have been compromised. Where the input data is sensitive, such as medical data, we require a means to protect this data from exposure. Furthermore, we may need to protect the intermediate data and details of the computation from snoopers to prevent information leakage. To take full advantage of MapReduce cloud computing services, we require a means to process the data securely on such a platform. We designate such a computation, secure computation in the cloud (SCC). SCC should not expose input or output data to any other party, including the cloud service provider. Furthermore, the details of the computation should not allow any other party to deduce its inputs and outputs. Most importantly, we require the computations to be performed in practically reasonable time and space. The ability to perform MapReduce computations on encrypted data would offer a solution to these problems. However, this poses a significant problem in that many encryption schemes transform the original data in such way that meaningful computation on the encrypted data is impossible. Our work aims to provide a practical SSCC system (CryptMR) inspired by CryptDB [272]. Our solution (CryptMR) has detailed several novel cryptographic methods suitable for implementation in a secure computation in the cloud solution using MapReduce. We encrypt integer data using a novel somewhat homomorphic encryption scheme (SHE). This SHE can be made fully homomorphic. We also provide novel order-preserving (OPE) and searchable symmetric encryption (SSE) for the purposes of sorting and searching. Our OPE scheme is, to our knowledge, the first scheme to rely on a computationally hard primitive rather a security model. We have implemented all of these encryption schemes and devised experiments to test their suitability for large-scale distributed computing by integrating them into Hadoop MapReduce (MR) applications. In addition to the work on encryption schemes, we have provided a novel probabilistic method for verifying that mappers and reducers are correctly computing and reporting their results (see chapter 10). This work uses random sampling to minimise the probability that a cheating mapper or reducer can successfully report a false result. We evaluated our “proof-of-concept” implementation in a small scale cloud environment. Our results show that sampling 5 to 10% of intermediate or final data allows us to detect cheating with very strong probability.


Original languageEnglish
Awarding Institution
Award date1 Aug 2018