Title: On the height of digital trees and related problems
Abstract: This paper studies in a probabilistic framework some topics concerning the way words (strings) can overlap, and relationship of this to the height of digital trees associated with this set of words. A word is defined as a random sequence of (possibly infinite) symbols over a finite alphabet. A key notion of analignment matrix {C ij } is introduced whereC ij is the length of the longest string that is a prefix of theith and thejth word. It is proved that the height of an associated digital tree is simply related to the alignment matrix through some order statistics. In particular, using this observation and proving some inequalities for order statistics, we establish that the height of adigital trie under anindependent model (i.e., all words are statistically independent) is asymptotically equal to 2 logα n wheren is the number of words stored in the trie and α is a parameter of the probabilistic model. This result is generalized in three directions, namely we considerb-tries,Markovian model (i.e., dependency among letters in a word), and adependent model (i.e., dependency among words). In particular, when consecutive letters in a word are Markov dependent (Markovian model), then we demonstrate that the height converges in probability to 2 logθ n where θ is a parameter of the underlying Markov chain. On the other hand, for suffix trees which fall into the dependent model, we show that the height does not exceed 2 logκ n, where κ is a parameter of the probabilistic model. These results find plenty of applications in the analysis of data structures built over digital words.