9 Search Results for "Thompson, Sam M."


Linear Time Subsequence and Supersequence Regex Matching

Authors: Antoine Amarilli, Florin Manea, Tina Ringleb, and Markus L. Schmid

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)


Abstract
It is well-known that checking whether a given string w matches a given regular expression r can be done in quadratic time O(|w|⋅|r|) and that this cannot be improved to a truly subquadratic running time of O((|w|⋅|r|)^{1-ε}) assuming the strong exponential time hypothesis (SETH). We study a different matching paradigm where we ask instead whether w has a subsequence that matches r, and show that regex matching in this sense can be solved in linear time O(|w| + |r|). Further, the same holds if we ask for a supersequence. We show that the quantitative variants where we want to compute a longest or shortest subsequence or supersequence of w that matches r can be solved in O(|w|⋅|r|), i.e., asymptotically no worse than classical regex matching; and we show that O(|w| + |r|) is conditionally not possible for these problems. We also investigate these questions with respect to other natural string relations like the infix, prefix, left-extension or extension relation instead of the subsequence and supersequence relation. We further study the complexity of the universal problem where we ask if all subsequences (or supersequences, infixes, prefixes, left-extensions or extensions) of an input string satisfy a given regular expression.
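To make the subsequence and supersequence matching paradigms concrete, here is a minimal Python sketch. It assumes that the regular expression r has already been compiled into an epsilon-free NFA, given as a dictionary from (state, letter) pairs to successor sets. This encoding and the function names are hypothetical and are not taken from the paper. The sketch is a plain state-set simulation whose running time is proportional to |w| times the size of the NFA, so it is a baseline for the two decision problems and not the paper's linear-time algorithm.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding of an epsilon-free NFA: (state, letter) -> successor states.
Delta = Dict[Tuple[int, str], Set[int]]


def subsequence_match(w: str, delta: Delta, start: int, accepting: Set[int]) -> bool:
    """Does some subsequence of w belong to the NFA's language?

    For every letter of w, each active state may skip the letter (stay put)
    or consume it (follow a transition labelled with that letter).
    """
    active: Set[int] = {start}
    for c in w:
        nxt = set(active)  # skipping c keeps every active state
        for q in active:
            nxt |= delta.get((q, c), set())  # consuming c
        active = nxt
    return bool(active & accepting)


def supersequence_match(w: str, delta: Delta, start: int, accepting: Set[int]) -> bool:
    """Is w a subsequence of some word in the NFA's language?

    Between two letters of w the automaton may read arbitrary extra letters,
    modelled by closing the active set under all transitions.
    """
    # Letter-agnostic successor relation, used to insert extra letters.
    succ: Dict[int, Set[int]] = {}
    for (q, _), targets in delta.items():
        succ.setdefault(q, set()).update(targets)

    def closure(states: Set[int]) -> Set[int]:
        seen, stack = set(states), list(states)
        while stack:
            for r in succ.get(stack.pop(), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    active = closure({start})
    for c in w:
        nxt: Set[int] = set()
        for q in active:
            nxt |= delta.get((q, c), set())  # the letter c itself must be read
        active = closure(nxt)
    return bool(active & accepting)


# NFA for the regular expression ab*c: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2, accepting {2}.
AB_STAR_C: Delta = {(0, "a"): {1}, (1, "b"): {1}, (1, "c"): {2}}

print(subsequence_match("xaybbzc", AB_STAR_C, 0, {2}))  # True (subsequence "abbc")
print(subsequence_match("cba", AB_STAR_C, 0, {2}))      # False
print(supersequence_match("bb", AB_STAR_C, 0, {2}))     # True ("abbc" contains "bb")
print(supersequence_match("ca", AB_STAR_C, 0, {2}))     # False
```

The only difference between the two problems is where the extra freedom sits. Subsequence matching lets the automaton ignore letters of w, while supersequence matching lets it read letters that are not in w.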

Cite as

Antoine Amarilli, Florin Manea, Tina Ringleb, and Markus L. Schmid. Linear Time Subsequence and Supersequence Regex Matching. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{amarilli_et_al:LIPIcs.MFCS.2025.9,
  author =	{Amarilli, Antoine and Manea, Florin and Ringleb, Tina and Schmid, Markus L.},
  title =	{{Linear Time Subsequence and Supersequence Regex Matching}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{9:1--9:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.9},
  URN =		{urn:nbn:de:0030-drops-241162},
  doi =		{10.4230/LIPIcs.MFCS.2025.9},
  annote =	{Keywords: subsequence, supersequence, regular language, regular expression, automata}
}
Mutational Signature Refitting on Sparse Pan-Cancer Data

Authors: Gal Gilad, Teresa M. Przytycka, and Roded Sharan

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
Mutational processes shape cancer genomes, leaving characteristic marks that are termed signatures. The level of activity of each such process, or its signature exposure, provides important information on the disease, improving patient stratification and the prediction of drug response. Thus, there is growing interest in developing refitting methods that decipher those exposures. Previous work in this domain was unsupervised in nature, employing algebraic decomposition and probabilistic inference methods. Here we provide a supervised approach to the problem of signature refitting and show its superiority over current methods. Our method, SuRe, leverages a neural network model to capture correlations between signature exposures in real data. We show that SuRe outperforms previous methods on sparse mutation data from tumor-type-specific data sets, as well as pan-cancer data sets, with an increasing advantage as the data become sparser. We further demonstrate its utility in clinical settings.

Cite as

Gal Gilad, Teresa M. Przytycka, and Roded Sharan. Mutational Signature Refitting on Sparse Pan-Cancer Data. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 11:1-11:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{gilad_et_al:LIPIcs.WABI.2025.11,
  author =	{Gilad, Gal and Przytycka, Teresa M. and Sharan, Roded},
  title =	{{Mutational Signature Refitting on Sparse Pan-Cancer Data}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{11:1--11:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.11},
  URN =		{urn:nbn:de:0030-drops-239374},
  doi =		{10.4230/LIPIcs.WABI.2025.11},
  annote =	{Keywords: mutational signatures, signature refitting, cancer genomics, genomic data analysis, somatic mutations}
}
Media Exposition
Software for the Thompson and Funk Polygonal Geometry (Media Exposition)

Authors: Hridhaan Banerjee, Carmen Isabel Day, Auguste H. Gezalyan, Olga Golovatskaia, Megan Hunleth, Sarah Hwang, Nithin Parepally, Lucy Wang, and David M. Mount

Published in: LIPIcs, Volume 332, 41st International Symposium on Computational Geometry (SoCG 2025)


Abstract
Metric spaces defined within convex polygons, such as the Thompson, Funk, reverse Funk, and Hilbert metrics, are subjects of recent exploration and study in computational geometry. This paper contributes an educational piece of software for understanding these unique geometries while also providing a tool to support their research. We provide dynamic software for manipulating the Funk, reverse Funk, and Thompson balls in convex polygonal domains. Additionally, we provide a visualization program for traversing the Hilbert polygonal geometry.
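All four metrics named in the abstract can be computed from one quantity. The sketch below assumes the standard definitions. The Funk distance is F(p, q) = ln(|px| / |qx|), where x is the point at which the ray from p through q leaves the convex domain. The reverse Funk distance is R(p, q) = F(q, p), the Thompson metric is T = max(F, R), and the Hilbert metric, in the convention with a factor 1/2, is H = (F + R) / 2. The function names and the counter-clockwise vertex convention are assumptions for illustration. They are not the interface of the software described in the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(a: Point, b: Point) -> float:
    return a[0] * b[1] - a[1] * b[0]


def exit_parameter(poly: List[Point], p: Point, q: Point) -> float:
    """Smallest t > 0 with p + t*(q - p) on the boundary of the convex polygon.

    Assumes poly lists its vertices counter-clockwise and that p != q are
    interior points, so the interior lies to the left of every edge.
    """
    d = (q[0] - p[0], q[1] - p[1])
    t_exit = math.inf
    for i, v in enumerate(poly):
        w = poly[(i + 1) % len(poly)]
        e = (w[0] - v[0], w[1] - v[1])
        rate = _cross(e, d)  # negative when the ray moves toward this edge's line
        if rate < 0:
            slack = _cross(e, (p[0] - v[0], p[1] - v[1]))  # positive for interior p
            t_exit = min(t_exit, slack / -rate)
    return t_exit


def funk(poly: List[Point], p: Point, q: Point) -> float:
    # |p x| = t*|d| and |q x| = (t - 1)*|d|, so the ratio is t / (t - 1).
    t = exit_parameter(poly, p, q)
    return math.log(t / (t - 1.0))


def reverse_funk(poly: List[Point], p: Point, q: Point) -> float:
    return funk(poly, q, p)


def thompson(poly: List[Point], p: Point, q: Point) -> float:
    return max(funk(poly, p, q), reverse_funk(poly, p, q))


def hilbert(poly: List[Point], p: Point, q: Point) -> float:
    return 0.5 * (funk(poly, p, q) + reverse_funk(poly, p, q))


square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
p, q = (0.0, 0.0), (0.5, 0.0)
print(funk(square, p, q))          # ln 2: the ray from p through q exits at (1, 0)
print(reverse_funk(square, p, q))  # ln 1.5: the ray from q through p exits at (-1, 0)
print(thompson(square, p, q), hilbert(square, p, q))
```

Swapping p and q exchanges the Funk and reverse Funk values. For this reason, only the Thompson and Hilbert distances are symmetric.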

Cite as

Hridhaan Banerjee, Carmen Isabel Day, Auguste H. Gezalyan, Olga Golovatskaia, Megan Hunleth, Sarah Hwang, Nithin Parepally, Lucy Wang, and David M. Mount. Software for the Thompson and Funk Polygonal Geometry (Media Exposition). In 41st International Symposium on Computational Geometry (SoCG 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 332, pp. 82:1-82:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{banerjee_et_al:LIPIcs.SoCG.2025.82,
  author =	{Banerjee, Hridhaan and Day, Carmen Isabel and Gezalyan, Auguste H. and Golovatskaia, Olga and Hunleth, Megan and Hwang, Sarah and Parepally, Nithin and Wang, Lucy and Mount, David M.},
  title =	{{Software for the Thompson and Funk Polygonal Geometry}},
  booktitle =	{41st International Symposium on Computational Geometry (SoCG 2025)},
  pages =	{82:1--82:6},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-370-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{332},
  editor =	{Aichholzer, Oswin and Wang, Haitao},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2025.82},
  URN =		{urn:nbn:de:0030-drops-232349},
  doi =		{10.4230/LIPIcs.SoCG.2025.82},
  annote =	{Keywords: Thompson metric, Hilbert metric, Funk metric, balls}
}
FC-Datalog as a Framework for Efficient String Querying

Authors: Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)


Abstract
Core spanners are a class of document spanners that capture the core functionality of IBM’s AQL. FC is a logic on strings built around word equations that, when extended with constraints for regular languages, can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures 𝖯, it is also a tractable version of Datalog on strings. This presents an opportunity for optimization. We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog.

Cite as

Owen M. Bell, Joel D. Day, and Dominik D. Freydenberger. FC-Datalog as a Framework for Efficient String Querying. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 29:1-29:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{bell_et_al:LIPIcs.ICDT.2025.29,
  author =	{Bell, Owen M. and Day, Joel D. and Freydenberger, Dominik D.},
  title =	{{FC-Datalog as a Framework for Efficient String Querying}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{29:1--29:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.29},
  URN =		{urn:nbn:de:0030-drops-229708},
  doi =		{10.4230/LIPIcs.ICDT.2025.29},
  annote =	{Keywords: Information extraction, word equations, datalog, document spanners, regex}
}
A Framework for Extraction and Transformation of Documents

Authors: Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt

Published in: LIPIcs, Volume 328, 28th International Conference on Database Theory (ICDT 2025)


Abstract
We present a theoretical framework for the extraction and transformation of text documents as a two-phase process: The first phase uses document spanners to extract information from the input document. The second phase transforms the extracted information into a suitable output. To support several reasonable extract-transform scenarios, we propose for the first phase an extension of document spanners from span-tuples to so-called multispan-tuples, where variables are mapped to sets of spans instead of only single spans. We focus on multispanners described by regex formulas, and we prove that these have the same desirable properties as standard regular spanners. To formalize the second phase, we consider transformations that map every document-tuple pair, where each tuple comes from the (multi)span-relation extracted in the first phase, into a new output document. The specification of the two phases is what we call an extract-transform (ET) program, which covers practically relevant extract-transform tasks. In this paper, our main technical goal is to identify a broad class of ET programs that can be evaluated efficiently. We specifically focus on the scenario of regular ET programs: the extraction phase is given by a regex multispanner and the transformation phase is given by a regular string-to-string function. We show that for any regular ET program, given an input document, we can enumerate all final output documents with output-linear delay after linear preprocessing. As a side effect, we characterize the expressive power of regular ET programs and also show that they have desirable properties, like being closed under composition.

Cite as

Cristian Riveros, Markus L. Schmid, and Nicole Schweikardt. A Framework for Extraction and Transformation of Documents. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 18:1-18:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)



@InProceedings{riveros_et_al:LIPIcs.ICDT.2025.18,
  author =	{Riveros, Cristian and Schmid, Markus L. and Schweikardt, Nicole},
  title =	{{A Framework for Extraction and Transformation of Documents}},
  booktitle =	{28th International Conference on Database Theory (ICDT 2025)},
  pages =	{18:1--18:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-364-5},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{328},
  editor =	{Roy, Sudeepa and Kara, Ahmet},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2025.18},
  URN =		{urn:nbn:de:0030-drops-229593},
  doi =		{10.4230/LIPIcs.ICDT.2025.18},
  annote =	{Keywords: Information extraction, Document spanners, Transducers, Query evaluation}
}
Vision
Knowledge Engineering Using Large Language Models

Authors: Bradley P. Allen, Lise Stork, and Paul Groth

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge


Abstract
Knowledge engineering is a discipline that focuses on the creation and maintenance of processes that generate and apply knowledge. Traditionally, knowledge engineering approaches have focused on knowledge expressed in formal languages. The emergence of large language models and their capabilities to effectively work with natural language, in its broadest sense, raises questions about the foundations and practice of knowledge engineering. Here, we outline the potential role of LLMs in knowledge engineering, identifying two central directions: 1) creating hybrid neuro-symbolic knowledge systems; and 2) enabling knowledge engineering in natural language. Additionally, we formulate key open research questions to tackle these directions.

Cite as

Bradley P. Allen, Lise Stork, and Paul Groth. Knowledge Engineering Using Large Language Models. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 3:1-3:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)



@Article{allen_et_al:TGDK.1.1.3,
  author =	{Allen, Bradley P. and Stork, Lise and Groth, Paul},
  title =	{{Knowledge Engineering Using Large Language Models}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{3:1--3:19},
  ISSN =	{2942-7517},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.3},
  URN =		{urn:nbn:de:0030-drops-194777},
  doi =		{10.4230/TGDK.1.1.3},
  annote =	{Keywords: knowledge engineering, large language models}
}
Splitting Spanner Atoms: A Tool for Acyclic Core Spanners

Authors: Dominik D. Freydenberger and Sam M. Thompson

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)


Abstract
This paper investigates regex CQs with string equalities (SERCQs), a subclass of core spanners. As shown by Freydenberger, Kimelfeld, and Peterfreund (PODS 2018), these queries are intractable, even if restricted to acyclic queries. This previous result defines acyclicity by treating regex formulas as atoms. In contrast to this, we propose an alternative definition by converting SERCQs into FC-CQs (conjunctive queries in FC, a logic that is based on word equations). We introduce a way to decompose word equations of unbounded arity into a conjunction of binary word equations. If the result of the decomposition is acyclic, then evaluation and enumeration of results become tractable. The main result of this work is an algorithm that decides in polynomial time whether an FC-CQ can be decomposed into an acyclic FC-CQ. We also give an efficient conversion from synchronized SERCQs to FC-CQs with regular constraints. As a consequence, tractability results for acyclic relational CQs directly translate to a large class of SERCQs.

Cite as

Dominik D. Freydenberger and Sam M. Thompson. Splitting Spanner Atoms: A Tool for Acyclic Core Spanners. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 10:1-10:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)



@InProceedings{freydenberger_et_al:LIPIcs.ICDT.2022.10,
  author =	{Freydenberger, Dominik D. and Thompson, Sam M.},
  title =	{{Splitting Spanner Atoms: A Tool for Acyclic Core Spanners}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{10:1--10:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.10},
  URN =		{urn:nbn:de:0030-drops-158843},
  doi =		{10.4230/LIPIcs.ICDT.2022.10},
  annote =	{Keywords: Document spanners, information extraction, conjunctive queries}
}
Track B: Automata, Logic, Semantics, and Theory of Programming
The Theory of Concatenation over Finite Models

Authors: Dominik D. Freydenberger and Liat Peterfreund

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)


Abstract
We propose FC, a new logic on words that combines finite model theory with the theory of concatenation (a first-order logic that is based on word equations). Like the theory of concatenation, FC is built around word equations; in contrast to it, its semantics are defined to only allow finite models, by limiting the universe to a word and all its factors. As a consequence of this, FC has many of the desirable properties of FO on finite models, while being far more expressive than FO[<]. Most noteworthy among these desirable properties are sufficient criteria for efficient model checking, and capturing various complexity classes by adding operators for transitive closures or fixed points. Not only does FC allow us to obtain new insights and techniques for expressive power and efficient evaluation of document spanners, but it also provides a general framework for logic on words that also has potential applications in other areas.

Cite as

Dominik D. Freydenberger and Liat Peterfreund. The Theory of Concatenation over Finite Models. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 130:1-130:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{freydenberger_et_al:LIPIcs.ICALP.2021.130,
  author =	{Freydenberger, Dominik D. and Peterfreund, Liat},
  title =	{{The Theory of Concatenation over Finite Models}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{130:1--130:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.130},
  URN =		{urn:nbn:de:0030-drops-141997},
  doi =		{10.4230/LIPIcs.ICALP.2021.130},
  annote =	{Keywords: finite model theory, word equations, descriptive complexity, model checking, document spanners}
}
Dynamic Complexity of Document Spanners

Authors: Dominik D. Freydenberger and Sam M. Thompson

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)


Abstract
The present paper investigates the dynamic complexity of document spanners, a formal framework for information extraction introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (JACM 2015). We first look at the class of regular spanners and prove that any regular spanner can be maintained in the dynamic complexity class DynPROP. This result follows from work done previously on the dynamic complexity of formal languages by Gelade, Marquardt, and Schwentick (TOCL 2012). To investigate core spanners we use SpLog, a concatenation logic that exactly captures core spanners. We show that the dynamic complexity class DynCQ is more expressive than SpLog and therefore can maintain any core spanner. This result is then extended to show that DynFO can maintain any generalized core spanner and that DynFO is more powerful than SpLog with negation.

Cite as

Dominik D. Freydenberger and Sam M. Thompson. Dynamic Complexity of Document Spanners. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 11:1-11:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)



@InProceedings{freydenberger_et_al:LIPIcs.ICDT.2020.11,
  author =	{Freydenberger, Dominik D. and Thompson, Sam M.},
  title =	{{Dynamic Complexity of Document Spanners}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{11:1--11:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.11},
  URN =		{urn:nbn:de:0030-drops-119355},
  doi =		{10.4230/LIPIcs.ICDT.2020.11},
  annote =	{Keywords: Document spanners, information extraction, dynamic complexity, descriptive complexity, word equations}
}