The string correction factor is the term by which the probability of a word $w$ needs to be multiplied in order to account for character changes or ``errors'' occurring in at most $k$ arbitrary positions in that word. The behavior of this factor, as a function of $k$ and of the word length, has implications for the number of candidates that need to be considered and weighted when looking for subwords of a sequence that present unusually recurrent replicas within some bounded number of mismatches. Specifically, it is seen that over intervals of mono- or bi-tonicity for the correction factor, only some of the candidates need be considered. This reduces the computation and leads to tables of over-represented words that are more compact to represent and inspect. In recent work, expectation and score monotonicity have been established for a number of cases of interest, under {\em i.i.d.} probabilistic assumptions. The present paper reviews the cases of bi-tonic behavior for the correction factor, concentrating on the instance in which the question is still open.
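As a rough illustration of the quantity described above, the following Python sketch computes a correction factor under an i.i.d. model. It rests on an assumption that the abstract does not spell out: a mismatch at position $i$ means any character other than $w_i$, so the factor is the sum, over all sets of at most $k$ positions, of the product of $(1 - p_i)/p_i$ over the chosen positions, where $p_i$ is the probability of the character $w_i$. The names `correction_factor` and `probs` are hypothetical and are not taken from the paper.

```python
def correction_factor(word, probs, k):
    """Factor by which p(word) is multiplied to allow at most k mismatches.

    Assumes an i.i.d. source where probs[c] is the probability of character c,
    and that a mismatch replaces a character by any *different* character.
    """
    # Ratio (1 - p_i) / p_i: relative weight of a mismatch at position i.
    ratios = [(1.0 - probs[c]) / probs[c] for c in word]

    # e[j] accumulates the degree-j elementary symmetric polynomial of the
    # ratios, i.e. the total weight of all choices of exactly j mismatch
    # positions. e[0] = 1 corresponds to the exact word.
    e = [1.0] + [0.0] * k
    for r in ratios:
        # Go downwards so each position is used at most once per subset.
        for j in range(k, 0, -1):
            e[j] += r * e[j - 1]

    # "At most k" mismatches: add up the weights for 0, 1, ..., k mismatches.
    return sum(e)


if __name__ == "__main__":
    uniform = {c: 0.25 for c in "ACGT"}
    for k in range(4):
        print(k, correction_factor("ACG", uniform, k))
```

Under a uniform four-letter alphabet, the word `ACG` with $k = 1$ gives a factor of 10: the word itself plus its nine one-mismatch neighbors, all equally likely. Tabulating the factor for increasing $k$ and word length is one way to observe the kind of monotone or bi-tonic behavior the abstract refers to.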
@InProceedings{apostolico_et_al:DagSemProc.06201.5,
  author =	{Apostolico, Alberto and Pizzi, Cinzia},
  title =	{{On the Monotonicity of the String Correction Factor for Words with Mismatches}},
  booktitle =	{Combinatorial and Algorithmic Foundations of Pattern and Association Discovery},
  pages =	{1--9},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6201},
  editor =	{Rudolf Ahlswede and Alberto Apostolico and Vladimir I. Levenshtein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.06201.5},
  URN =		{urn:nbn:de:0030-drops-7899},
  doi =		{10.4230/DagSemProc.06201.5},
  annote =	{Keywords: Pattern discovery, Motif, Over-represented word, Monotone score, Correction Factor}
}