Title: Topologically optimal bounding chains for persistent homology and novel persistence modules for both static and dynamic data
Abstract: Topological data analysis is an approach to data analysis that combines statistical methods with classical ideas in algebraic topology in order to extract geometrical and topological information from data. One of the tools that lies at the heart of TDA is \emph{persistent homology}, which was developed in the early 2000s as a tool to discover homological features in (possibly discrete) data sets. It describes the evolution of homology classes through a filtration of topological spaces as a set of intervals, called the \emph{barcode}. The start- and endpoints of these intervals can be interpreted as birth-death events of homology classes. Longer intervals are often thought of as more significant features. In applications, one typically works with a filtration of simplicial complexes constructed from data. The barcode of this filtration is a geometrical signature of the data, from which further conclusions may be drawn. This thesis is composed of three papers concerning different aspects of topological data analysis, and each paper constitutes its own chapter: The first paper, `Topologically optimal bounding chains for persistent homology,' deals with realizations of persistent homology classes by means of bounding chains that are optimal with respect to a topological notion of size. This is a novel approach to representing persistent homology classes, inspired by concepts from low dimensional topology. We study the computational complexity of the problem of finding such optimal bounding chains, which in general, turns out to be NP-hard. We also identify several cases in which optimal bounding chains can be found in polynomial time. In the second paper, `The Christoffel-Darboux kernel for topological data analysis,' we present a novel persistence module for data in $\R^n$, which is based on recent applications of Christoffel-Darboux kernels to data analysis. We show that this persistence module is stable with respect to the Wasserstein distance, which guarantees that -- unlike classical persistence modules -- it is not sensitive to statistical outliers in the data. Furthermore, we show that its persistence diagram can be computed in time linear in the amount of data points, and we analyze its performance on several examples. The chapter concludes with a short postscript, giving a precise error bound for piecewise linear approximations to Lipschitz functions, improving on a bound given in the paper. The final paper is a preliminary version, and has the working title `A 2-parameter persistence module for generalized dynamic metric spaces'. In this paper, we construct and analyze a novel persistence module for dynamic data. This persistence module is similar to the spatio-temporal Rips persistence for dynamic data defined by Kim and Mémoli, but can be defined for a more general notion of dynamic data. We also show that it has more desirable algebraic properties: A priori, both are 3-parameter persistence modules, but in a precise sense, our novel persistence module can be interpreted as a two-parameter persistence module. What is missing from the presentation is a suitable notion of distance for the input data, as well as a corresponding stability result, which is left for future work.