Privacy, Security, and Repair in Distributed Storage Systems
Abstract
We are living in the age of information where our lives are shaped by information and communication technologies. As a consequence, there is an explosion in the amount of generated data. Distribute storage systems (DSSs) are a storage technology wherein large amounts of data are stored on a network of inexpensive storage nodes in a distributed fashion, in an efficient and inexpensive way. This thesis explores three aspects of DSSs: efficient storage, security, and privacy. We start with the description of DSSs as they are the underlying theme for the majority of the thesis. For such systems, we propose a code construction that performs efficient repair of failed systematic nodes with low repair complexity. We construct such codes by concatenating two different classes of codes. The first code serves to provide the ability to tolerate node failures, while the second allows for efficient and low complexity repair. The proposed codes achieve better repair bandwidth compared to maximum distance separable (MDS) codes, codes constructed using piggybacks, and local reconstruction/Pyramid codes, while they have a better repair complexity compared to MDS, Zigzag, Pyramid codes, and codes constructed using piggybacks. Next, we consider the notion of information-theoretic security in DSSs. To this end, we assume an eavesdropper model that has access to the data stored on a subset of nodes and a subset of the data exchanged during the repair of nodes. We present a secure coding scheme that is secure against this adversary. Furthermore, the proposed scheme ensures an efficient node repair. The security is achieved by appending the data with random symbols, and encoding the resulting sequence using a concatenated coding scheme consisting of a Gabidulin code and a repairable fountain code. We then move toward the problem of private information retrieval (PIR) in DSSs. PIR is the problem of retrieving data from the nodes without revealing the identity of the requested data (or files) to the nodes. A scheme that achieves PIR is referred to as a PIR protocol, and its efficiency is determined by its PIR rate, i.e., the ratio of downloaded data and the requested file size. For the DSS scenario, we present PIR protocols for two cases: first, where some nodes act as spies, which do not collaborate, with a goal of determining what the user is requesting. We refer to this scenario as the noncolluding case, and, second, where these nodes collaborate together in order to determine the request of a user, which we refer to as the colluding case. For the noncolluding case, we show that it is possible to achieve MDS-PIR capacity, i.e., the maximum PIR rate achieved by any protocol when the DSS stores the data using an MDS code, even when the codes used by the DSS are non-MDS. We go on to give a necessary and a sufficient condition based on the structure of the storage code for our PIR protocol to achieve this capacity. We refer to such codes as MDS-PIR capacity-achieving codes. Furthermore, we show that the rates achieved by these codes using the proposed protocol are indeed equal to the maximum PIR rate achieved by these codes under any protocol. Subsequently, we show that cyclic codes, Reed-Muller (RM) codes, and a class of distance-optimal local reconstruction codes are MDS-PIR capacity-achieving codes. For the colluding case, we present a PIR protocol for DSSs that store data using arbitrary linear codes. Following this, as in the noncolluding case, we give a necessary and a sufficient condition based on the structure of the underlying codes that allow the proposed protocol to achieve an upper bound on the PIR rate and present codes that allow this. In particular, we show that if the underlying codes are RM codes, then the protocol achieves the maximum PIR rate. The concepts learned while finding solutions to the privacy issues for DSSs are then applied to distributed caching in wireless networks, which is one of the fundamental techniques that will be implemented in future 5G wireless networks. In the considered distributed caching scenario, the files are encoded using MDS codes and then cached on the small-cell base stations (SBSs), with the goal to reduce the backhaul usage. We introduce the notion of PIR in this setting and present a protocol that achieves PIR. One essential difference between DSSs and distributed caching is that in distributed caching the files to be cached are typically cached using codes with different rates that depend on the file popularity distribution. We assume that the SBSs can collude and as such the protocol is a generalization of the protocol considered for the colluding case in DSSs. As a surprising result, we show that to minimize backhaul usage, uniform content placement, i.e., the files are cached using a single code, is optimal.
Has parts
Paper I: S. Kumar, A. Graell i Amat, I. Andriyanova, F. Brännström, E. Rosnes, Code Constructions for Distributed Storage With Low Repair Bandwidth and Low Repair Complexity,IEEE Transactionson Communications,to appear. arXiv:1801.05989Paper II: S. Kumar, E. Rosnes, A. Graell i Amat, Secure Repairable Fountain Codes, IEEE Communications Letters, vol. 20, no. 8, August 2016. https://doi.org/10.1109/lcomm.2016.2574355
Paper III: S. Kumar, H.-Y. Lin, E. Rosnes, A. Graell i Amat, Achieving Maximum Distance Separable Private Information Retrieval Capacity With Linear Codes, submitted to IEEE Transactionson Information Theory, revised in August 2018. arXiv:1712.03898
Paper IV: H.-Y. Lin, S. Kumar, E. Rosnes, A. Graell i Amat, Asymmetry Helps: Improved Private Information Retrieval Protocols for Distributed Storage, in Proc. IEEE Information Theory Workshop (ITW), Guangzhou, China, November 2018. arXiv:1806.01342
Paper V: S. Kumar, A. Graell i Amat, E. Rosnes, L. Senigagliesi, Private Information Retrieval From a Cellular Network With Caching at the Edge, submitted to IEEE Transactionson Communications. arXiv:1809.00872