DROPS

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2023.1

A Researcher’s Digest of GQL (Invited Talk)

Authors: Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoč

Published in: LIPIcs, Volume 255, 26th International Conference on Database Theory (ICDT 2023)

Abstract

GQL (Graph Query Language) is being developed as a new ISO standard for graph query languages to play the same role for graph databases as SQL plays for relational. In parallel, an extension of SQL for querying property graphs, SQL/PGQ, is added to the SQL standard; it shares the graph pattern matching functionality with GQL. Both standards (not yet published) are hard-to-understand specifications of hundreds of pages. The goal of this paper is to present a digest of the language that is easy for the research community to understand, and thus to initiate research on these future standards for querying graphs. The paper concentrates on pattern matching features shared by GQL and SQL/PGQ, as well as querying facilities of GQL.

Cite as

Nadime Francis, Amélie Gheerbrant, Paolo Guagliardo, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Liat Peterfreund, Alexandra Rogova, and Domagoj Vrgoč. A Researcher’s Digest of GQL (Invited Talk). In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 1:1-1:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


@InProceedings{francis_et_al:LIPIcs.ICDT.2023.1,
  author =	{Francis, Nadime and Gheerbrant, Am\'{e}lie and Guagliardo, Paolo and Libkin, Leonid and Marsault, Victor and Martens, Wim and Murlak, Filip and Peterfreund, Liat and Rogova, Alexandra and Vrgo\v{c}, Domagoj},
  title =	{{A Researcher's Digest of GQL}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{1:1--1:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.1},
  URN =		{urn:nbn:de:0030-drops-177434},
  doi =		{10.4230/LIPIcs.ICDT.2023.1},
  annote =	{Keywords: GQL, Property Graph, Query Language, Graph Database, Pattern matching, Multi-Graph}
}

Document

DOI: 10.4230/LIPIcs.FUN.2022.23

Playing Guess Who with Your Kids

Authors: Ami Paz and Liat Peterfreund

Published in: LIPIcs, Volume 226, 11th International Conference on Fun with Algorithms (FUN 2022)

Abstract

Guess Who is a two-player search game in which each player chooses a character from a deck of 24 cards, and has to infer the other player’s character by asking yes-no questions. A simple binary search strategy allows the starting player to find the opponent’s character by asking only 5 questions, when the opponent is honest. Real-life observations show that in more realistic scenarios, the game is played against adversaries that do not strictly follow the rules, e.g., kids. Such players might decide to answer all questions at once, answer only some of the questions because they do not know all the answers, and even lie occasionally. We devise strategies for such scenarios using techniques from error-correcting and erasure codes. This connects to a recent line of work on search problems on graphs and trees with unreliable auxiliary information, and could be of independent interest.
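The 5-question bound for an honest opponent can be checked with a small sketch. This is only an illustration of plain halving search, not the paper's strategies for unreliable answers; the function names and the modelling of a question as "is your character in this set?" are assumptions made here.

```python
import math


def guess_character(num_cards, answer_is_in):
    """Identify the hidden card by repeatedly halving the candidate set.

    `answer_is_in(subset)` plays the honest opponent: it returns True
    exactly when the hidden card belongs to `subset`.
    """
    candidates = list(range(num_cards))
    questions = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        questions += 1
        if answer_is_in(set(half)):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0], questions


def worst_case_questions(num_cards):
    """Run the strategy against every possible hidden card."""
    worst = 0
    for secret in range(num_cards):
        found, asked = guess_character(
            num_cards, lambda subset, secret=secret: secret in subset
        )
        assert found == secret
        worst = max(worst, asked)
    return worst


if __name__ == "__main__":
    print(worst_case_questions(24), math.ceil(math.log2(24)))
```

For a 24-card deck the worst case matches the information-theoretic bound of ceil(log2(24)) questions.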

Cite as

Ami Paz and Liat Peterfreund. Playing Guess Who with Your Kids. In 11th International Conference on Fun with Algorithms (FUN 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 226, pp. 23:1-23:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{paz_et_al:LIPIcs.FUN.2022.23,
  author =	{Paz, Ami and Peterfreund, Liat},
  title =	{{Playing Guess Who with Your Kids}},
  booktitle =	{11th International Conference on Fun with Algorithms (FUN 2022)},
  pages =	{23:1--23:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-232-7},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{226},
  editor =	{Fraigniaud, Pierre and Uno, Yushi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FUN.2022.23},
  URN =		{urn:nbn:de:0030-drops-159935},
  doi =		{10.4230/LIPIcs.FUN.2022.23},
  annote =	{Keywords: Guess Who?, Binary Search, Error Correcting Codes, Erasure Codes}
}

Document

Track B: Automata, Logic, Semantics, and Theory of Programming

DOI: 10.4230/LIPIcs.ICALP.2021.130

The Theory of Concatenation over Finite Models

Authors: Dominik D. Freydenberger and Liat Peterfreund

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

Abstract

We propose FC, a new logic on words that combines finite model theory with the theory of concatenation, a first-order logic that is based on word equations. Like the theory of concatenation, FC is built around word equations; in contrast to it, its semantics are defined to only allow finite models, by limiting the universe to a word and all its factors. As a consequence of this, FC has many of the desirable properties of FO on finite models, while being far more expressive than FO[<]. Most noteworthy among these desirable properties are sufficient criteria for efficient model checking, and capturing various complexity classes by adding operators for transitive closures or fixed points. Not only does FC allow us to obtain new insights and techniques for expressive power and efficient evaluation of document spanners, but it also provides a general framework for logic on words with potential applications in other areas.

Cite as

Dominik D. Freydenberger and Liat Peterfreund. The Theory of Concatenation over Finite Models. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 130:1-130:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{freydenberger_et_al:LIPIcs.ICALP.2021.130,
  author =	{Freydenberger, Dominik D. and Peterfreund, Liat},
  title =	{{The Theory of Concatenation over Finite Models}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{130:1--130:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.130},
  URN =		{urn:nbn:de:0030-drops-141997},
  doi =		{10.4230/LIPIcs.ICALP.2021.130},
  annote =	{Keywords: finite model theory, word equations, descriptive complexity, model checking, document spanners}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2021.7

Grammars for Document Spanners

Authors: Liat Peterfreund

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)

Abstract

We propose a new grammar-based language for defining information extractors from documents (text) that is built upon the well-studied framework of document spanners for extracting structured data from text. While previously studied formalisms for document spanners are mainly based on regular expressions, we use an extension of context-free grammars, called extraction grammars, to define the new class of context-free spanners. Extraction grammars are simply context-free grammars extended with variables that capture interval positions of the document, namely spans. While regular expressions are efficient for tokenizing and tagging, context-free grammars are also efficient for capturing structural properties. Indeed, we show that context-free spanners are strictly more expressive than their regular counterparts. We reason about the expressive power of our new class and present a pushdown-automata model that captures it. We show that extraction grammars can be evaluated with polynomial data complexity. Nevertheless, as the degree of the polynomial depends on the query, we present an enumeration algorithm for unambiguous extraction grammars that, after quintic preprocessing, outputs the results sequentially, without repetitions, with a constant delay between every two consecutive ones.

Cite as

Liat Peterfreund. Grammars for Document Spanners. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{peterfreund:LIPIcs.ICDT.2021.7,
  author =	{Peterfreund, Liat},
  title =	{{Grammars for Document Spanners}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.7},
  URN =		{urn:nbn:de:0030-drops-137154},
  doi =		{10.4230/LIPIcs.ICDT.2021.7},
  annote =	{Keywords: Information Extraction, Document Spanners, Context-Free Grammars, Constant-Delay Enumeration, Regular Expressions, Pushdown Automata}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2020.8

Weight Annotation in Information Extraction

Authors: Johannes Doleschal, Benny Kimelfeld, Wim Martens, and Liat Peterfreund

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)

Abstract

The framework of document spanners abstracts the task of information extraction from text as a function that maps every document (a string) into a relation over the document’s spans (intervals identified by their start and end indices). For instance, the regular spanners are the closure under the Relational Algebra (RA) of the regular expressions with capture variables, and the expressive power of the regular spanners is precisely captured by the class of vset-automata, a restricted class of transducers that mark the endpoints of selected spans. In this work, we embark on the investigation of document spanners that can annotate extractions with auxiliary information such as confidence, support, and confidentiality measures. To this end, we adopt the abstraction of provenance semirings by Green et al., where tuples of a relation are annotated with the elements of a commutative semiring, and where the annotation propagates through the (positive) RA operators via the semiring operators. Hence, the proposed spanner extension, referred to as an annotator, maps every string into an annotated relation over the spans. As a specific instantiation, we explore weighted vset-automata that, similarly to weighted automata and transducers, attach semiring elements to transitions. We investigate key aspects of expressiveness, such as the closure under the positive RA, and key aspects of computational complexity, such as the enumeration of annotated answers and their ranked enumeration in the case of numeric semirings. For a number of these problems, fundamental properties of the underlying semiring, such as positivity, are crucial for establishing tractability.
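To make the idea of annotations propagating through positive RA operators concrete, here is a minimal sketch in the spirit of provenance semirings. It is not the paper's annotator model or its weighted vset-automata; the names `Semiring`, `COUNTING`, `BOOLEAN`, `union`, `product`, and `project` are hypothetical, and relations are represented simply as dictionaries from tuples to annotations.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Two standard commutative semirings: natural numbers and Booleans.
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)


def union(K, r, s):
    """Union adds the annotations of tuples present in both relations."""
    out = dict(r)
    for row, ann in s.items():
        out[row] = K.add(out.get(row, K.zero), ann)
    return out


def product(K, r, s):
    """Cartesian product multiplies the annotations of the combined tuples."""
    return {
        row_r + row_s: K.mul(ann_r, ann_s)
        for row_r, ann_r in r.items()
        for row_s, ann_s in s.items()
    }


def project(K, r, columns):
    """Projection adds the annotations of tuples that collapse together."""
    out = {}
    for row, ann in r.items():
        key = tuple(row[c] for c in columns)
        out[key] = K.add(out.get(key, K.zero), ann)
    return out
```

Here a tuple such as `((0, 3),)` stands for a unary relation holding one span; swapping `COUNTING` for `BOOLEAN` recovers ordinary set semantics.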

Cite as

Johannes Doleschal, Benny Kimelfeld, Wim Martens, and Liat Peterfreund. Weight Annotation in Information Extraction. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


@InProceedings{doleschal_et_al:LIPIcs.ICDT.2020.8,
  author =	{Doleschal, Johannes and Kimelfeld, Benny and Martens, Wim and Peterfreund, Liat},
  title =	{{Weight Annotation in Information Extraction}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.8},
  URN =		{urn:nbn:de:0030-drops-119325},
  doi =		{10.4230/LIPIcs.ICDT.2020.8},
  annote =	{Keywords: Information extraction, regular document spanners, weighted automata, provenance semirings, K-relations}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2019.13

Recursive Programs for Document Spanners

Authors: Liat Peterfreund, Balder ten Cate, Ronald Fagin, and Benny Kimelfeld

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)

Abstract

A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A well-studied language for expressing spanners is that of the regular spanners: relational algebra over regex formulas, which are regular expressions with capture variables. Equivalently, the regular spanners are the ones expressible in non-recursive Datalog over regex formulas (which extract relations that constitute the extensional database). This paper explores the expressive power of recursive Datalog over regex formulas. We show that such programs can express precisely the document spanners computable in polynomial time. We compare this expressiveness to known formalisms such as the closure of regex formulas under the relational algebra and string equality. Finally, we extend our study to a recently proposed framework that generalizes both the relational model and the document spanners.
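As a rough illustration of a spanner as a function from a document to a relation of spans, the sketch below uses Python's `re` module with named groups standing in for capture variables. This is an approximation rather than the formal semantics: `re.finditer` returns only non-overlapping leftmost matches, whereas regex formulas consider all matches, and spans here follow Python's 0-based half-open convention. The function name `extract_spans` and the sample pattern are assumptions made for the example.

```python
import re


def extract_spans(pattern, document):
    """Return (variables, rows): one row of (start, end) spans per match.

    Named groups in `pattern` act as capture variables; columns are
    ordered alphabetically by variable name to give a fixed schema.
    """
    regex = re.compile(pattern)
    variables = sorted(regex.groupindex)
    rows = [
        tuple(match.span(var) for var in variables)
        for match in regex.finditer(document)
    ]
    return variables, rows


if __name__ == "__main__":
    doc = "Ann 31, Bob 27"
    schema, relation = extract_spans(r"(?P<name>[A-Z][a-z]+) (?P<age>\d+)", doc)
    print(schema)
    for row in relation:
        print(row, [doc[start:end] for start, end in row])
```

Each row holds intervals into the document rather than the substrings themselves, which is what distinguishes a span relation from an ordinary string relation.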

Cite as

Liat Peterfreund, Balder ten Cate, Ronald Fagin, and Benny Kimelfeld. Recursive Programs for Document Spanners. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{peterfreund_et_al:LIPIcs.ICDT.2019.13,
  author =	{Peterfreund, Liat and ten Cate, Balder and Fagin, Ronald and Kimelfeld, Benny},
  title =	{{Recursive Programs for Document Spanners}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{13:1--13:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcel\'{o}, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.13},
  URN =		{urn:nbn:de:0030-drops-103155},
  doi =		{10.4230/LIPIcs.ICDT.2019.13},
  annote =	{Keywords: Information Extraction, Document Spanners, Polynomial Time, Recursion, Regular Expressions, Datalog}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2017.17

Detecting Ambiguity in Prioritized Database Repairing

Authors: Benny Kimelfeld, Ester Livshits, and Liat Peterfreund

Published in: LIPIcs, Volume 68, 20th International Conference on Database Theory (ICDT 2017)

Abstract

In its traditional definition, a repair of an inconsistent database is a consistent database that differs from the inconsistent one in a "minimal way." Often, repairs are not equally legitimate, as it is desired to prefer one over another; for example, one fact is regarded more reliable than another, or a more recent fact should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs, in the context of denial constraints and subset repairs. There, a priority relation between facts is lifted towards a priority relation between consistent databases, and repairs are restricted to the ones that are optimal in the lifted sense. Three notions of lifting (and optimal repairs) have been proposed: Pareto, global, and completion. In this paper we investigate the complexity of deciding whether the priority relation suffices to clean the database unambiguously, or in other words, whether there is exactly one optimal repair. We show that the different lifting semantics entail highly different complexities. Under Pareto optimality, the problem is coNP-complete, in data complexity, for every set of functional dependencies (FDs), except for the tractable case of (equivalence to) one FD per relation. Under global optimality, one FD per relation is still tractable, but we establish Pi-2-p-completeness for a relation with two FDs. In contrast, under completion optimality the problem is solvable in polynomial time for every set of FDs. In fact, we present a polynomial-time algorithm for arbitrary conflict hypergraphs. We further show that under a general assumption of transitivity, this algorithm solves the problem even for global optimality. The algorithm is extremely simple, but its proof of correctness is quite intricate.

Cite as

Benny Kimelfeld, Ester Livshits, and Liat Peterfreund. Detecting Ambiguity in Prioritized Database Repairing. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 17:1-17:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


@InProceedings{kimelfeld_et_al:LIPIcs.ICDT.2017.17,
  author =	{Kimelfeld, Benny and Livshits, Ester and Peterfreund, Liat},
  title =	{{Detecting Ambiguity in Prioritized Database Repairing}},
  booktitle =	{20th International Conference on Database Theory (ICDT 2017)},
  pages =	{17:1--17:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-024-8},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{68},
  editor =	{Benedikt, Michael and Orsi, Giorgio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2017.17},
  URN =		{urn:nbn:de:0030-drops-70489},
  doi =		{10.4230/LIPIcs.ICDT.2017.17},
  annote =	{Keywords: inconsistent databases, preferred repairs, data cleaning, functional dependencies, conflict hypergraph}
}

Search Results

Documents authored by Peterfreund, Liat
