Title: Dictionary Identification—Sparse Matrix-Factorization via $\ell_1$-Minimization
Abstract: This paper treats the problem of learning a dictionary providing sparse representations for a given signal class, via ℓ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> -minimization. The problem can also be seen as factorizing a d × N matrix Y = (y <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> . . . y <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</sub> ), y <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</sub> ∈ ℝ <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">d</sup> of training signals into a d × K dictionary matrix Φ and a K × N coefficient matrix X = (x <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> . . . x <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">N</sub> ), x <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</sub> ∈ ℝ <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">K</sup> , which is sparse. The exact question studied here is when a dictionary coefficient pair (Φ, X) can be recovered as local minimum of a (nonconvex) ℓ <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sub> -criterion with input Y = Φ X. First, for general dictionaries and coefficient matrices, algebraic conditions ensuring local identifiability are derived, which are then specialized to the case when the dictionary is a basis. Finally, assuming a random Bernoulli-Gaussian sparse model on the coefficient matrix, it is shown that sufficiently incoherent bases are locally identifiable with high probability. The perhaps surprising result is that the typically sufficient number of training samples N grows up to a logarithmic factor only linearly with the signal dimension, i.e., N ≈ CK log K, in contrast to previous approaches requiring combinatorially many samples.