Algorithmic Complexity of Clustering and Low-Rank Approximation Problems
Doctoral thesis
Åpne
Permanent lenke
https://hdl.handle.net/11250/2735169Utgivelsesdato
2021-03-29Metadata
Vis full innførselSamlinger
Sammendrag
The two most popular unsupervised learning problems are k-Clustering and Low-Rank Approximation. Consider a set of n datapoints, in the k-Clustering problem, the objective is to partition these points into k clusters and select k centers such that each cluster is represented well by its center. In the Low-Rank Approximation problem, the task is to find an r-dimensional subspace that minimizes the sum of deviations from each point to the subspace. Both problems are of utmost importance for the modern data-driven applications, and both can be thought of as structured representation problems.
In this thesis, we provide a thorough study of the multivariate complexity of k-Clustering and Low-Rank Approximation. We focus on extensions of these problems, such as robust and constrained versions, that reach beyond the well-studied standard setting. The main body of the thesis is divided into three parts. In the first part, we study parameterized complexity of exact algorithms, where the parameter is the total cost of the clustering. Problems that constitute the main focus of this part are k-Clustering in L_p-norm for p ∈ [0, ∞], and Categorical Clustering with Row/Column Outliers. We provide a number of fixed-parameter tractable algorithms based on hypergraph enumeration, and a number of hardness results. The second part can be summarized as employing sampling methods to provide (1 + ε)-approximation in FPT time. We show a space-efficient coreset for the Fair Clustering problem, and an FPT approximation scheme for the problem of clustering points with missing entries, where the number of missing entries in each point is bounded. Finally, in the third part we deal with the Low-Rank Approximation problem, and its robust variant, Low-Rank Approximation with Outliers. For the latter, we employ algebraic geometry methods to provide an n^O(rd) exact algorithm that is nearly tight even for arbitrary-factor approximation, and its dimensionality reduction-based improvements. We also present a PTAS for Low-Rank Approximation of binary matrices in column-sum norm.