DROPS

Document

DOI: 10.4230/LIPIcs.AFT.2025.35

On-Chain Decentralized Learning and Cost-Effective Inference for DeFi Attack Mitigation

Authors: Abdulrahman Alhaidari, Balaji Palanisamy, and Prashant Krishnamurthy

Published in: LIPIcs, Volume 354, 7th Conference on Advances in Financial Technologies (AFT 2025)

Abstract

Billions of dollars are lost every year in DeFi platforms by transactions exploiting business logic or accounting vulnerabilities. Existing defenses focus on static code analysis, public mempool screening, attacker contract detection, or trusted off-chain monitors, none of which prevents exploits submitted through private relays or malicious contracts that execute within the same block. We present the first decentralized, fully on-chain learning framework that: (i) performs gas-prohibitive computation on Layer-2 to reduce cost, (ii) propagates verified model updates to Layer-1, and (iii) enables gas-bounded, low-latency inference inside smart contracts. A novel Proof-of-Improvement (PoIm) protocol governs the training process and verifies each decentralized micro update as a self-verifying training transaction. Updates are accepted by PoIm only if they demonstrably improve at least one core metric (e.g., accuracy, F1-score, precision, or recall) on a public benchmark without degrading any of the other core metrics, while adversarial proposals get financially penalized through an adaptable test set for evolving threats. We develop quantization and loop-unrolling techniques that enable inference for logistic regression, SVM, MLPs, CNNs, and gated RNNs (with support for formally verified decision tree inference) within the Ethereum block gas limit, while remaining bit-exact to their off-chain counterparts, formally proven in Z3. We curate 298 unique real-world exploits (2020 - 2025) with 402 exploit transactions across eight EVM chains, collectively responsible for $3.74 B in losses. We demonstrate that on-chain ML governed by PoIm detects previously unseen attacks with over 97% attack detection accuracy and 82.0% F1. A single inference, such as one made via an external call, typically incurs zero cost. Fully on-chain inference consumes 57,603 gas (≈ $0.18) for linear models, 143,647 gas (≈ $0.49) for CNN(F2, K1), and 506,397 gas (≈ $1.77) for CNN(F8, K4) on L1 (e.g., Ethereum). Our results show that practical and continually evolving DeFi defenses can be embedded directly in protocol logic without trusted guardians, and our solution achieves highly cost-effective protection while filling a critical gap between vulnerability scanners and real-time transaction screening.

Cite as

Abdulrahman Alhaidari, Balaji Palanisamy, and Prashant Krishnamurthy. On-Chain Decentralized Learning and Cost-Effective Inference for DeFi Attack Mitigation. In 7th Conference on Advances in Financial Technologies (AFT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 354, pp. 35:1-35:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{alhaidari_et_al:LIPIcs.AFT.2025.35,
  author =	{Alhaidari, Abdulrahman and Palanisamy, Balaji and Krishnamurthy, Prashant},
  title =	{{On-Chain Decentralized Learning and Cost-Effective Inference for DeFi Attack Mitigation}},
  booktitle =	{7th Conference on Advances in Financial Technologies (AFT 2025)},
  pages =	{35:1--35:27},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-400-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{354},
  editor =	{Avarikioti, Zeta and Christin, Nicolas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AFT.2025.35},
  URN =		{urn:nbn:de:0030-drops-247548},
  doi =		{10.4230/LIPIcs.AFT.2025.35},
  annote =	{Keywords: DeFi attacks, on-chain machine learning, decentralized learning, real-time defense}
}

Document

DOI: 10.4230/LIPIcs.AFT.2025.27

Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets

Authors: Oriol Saguillo, Vahid Ghafouri, Lucianna Kiffer, and Guillermo Suarez-Tangil

Published in: LIPIcs, Volume 354, 7th Conference on Advances in Financial Technologies (AFT 2025)

Abstract

Polymarket is a prediction market platform where users can speculate on future events by trading shares tied to specific outcomes, known as conditions. Each market on Polymarket is associated with a set of one or more such conditions. To ensure proper market resolution, the condition set must be exhaustive - collectively accounting for all possible outcomes - and mutually exclusive - only one condition may resolve as true. Thus, the collective prices (probabilities) of all related outcomes (whether in a condition or market) should be $1, representing a combined probability of 1 of any outcome. Despite this design, Polymarket exhibits cases where dependent assets are mispriced, allowing for purchasing (or selling) a certain outcome for less than (or more than) $1, guaranteeing profit. This phenomenon, known as arbitrage, could enable sophisticated participants to exploit such inconsistencies. In this paper, we conduct an empirical arbitrage analysis on Polymarket data to answer three key questions: (Q1) What conditions give rise to arbitrage? (Q2) Does arbitrage actually occur on Polymarket?, and (Q3) Has anyone exploited these opportunities? A major challenge in analyzing arbitrage between related markets lies in the scalability of comparisons across a large number of markets and conditions, with a naive analysis requiring O(2^{n+m}) comparisons. To overcome this, we employ a heuristic-driven reduction strategy based on timeliness, topical similarity, and combinatorial relationships, further validated by expert input. Our study reveals two distinct forms of arbitrage on Polymarket: Market Rebalancing Arbitrage, which occurs within a single market or condition (intra-market), and Combinatorial Arbitrage, which spans across multiple markets (inter-market). We use on-chain historical order book data to analyze when these types of arbitrage opportunities have existed, and when they have been executed by users. We find a realized estimate of 40 million USD of profit extracted across both types of arbitrage during our measurement period.

Cite as

Oriol Saguillo, Vahid Ghafouri, Lucianna Kiffer, and Guillermo Suarez-Tangil. Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets. In 7th Conference on Advances in Financial Technologies (AFT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 354, pp. 27:1-27:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{saguillo_et_al:LIPIcs.AFT.2025.27,
  author =	{Saguillo, Oriol and Ghafouri, Vahid and Kiffer, Lucianna and Suarez-Tangil, Guillermo},
  title =	{{Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets}},
  booktitle =	{7th Conference on Advances in Financial Technologies (AFT 2025)},
  pages =	{27:1--27:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-400-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{354},
  editor =	{Avarikioti, Zeta and Christin, Nicolas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AFT.2025.27},
  URN =		{urn:nbn:de:0030-drops-247468},
  doi =		{10.4230/LIPIcs.AFT.2025.27},
  annote =	{Keywords: Prediction Markets, Maximal Extractable Value, Large Language Models}
}

Document

APPROX

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2025.8

Multipass Linear Sketches for Geometric LP-Type Problems

Authors: N. Efe Çekirge, William Gay, and David P. Woodruff

Published in: LIPIcs, Volume 353, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)

Abstract

LP-type problems such as the Minimum Enclosing Ball (MEB), Linear Support Vector Machine (SVM), Linear Programming (LP), and Semidefinite Programming (SDP) are fundamental combinatorial optimization problems, with many important applications in machine learning applications such as classification, bioinformatics, and noisy learning. We study LP-type problems in several streaming and distributed big data models, giving ε-approximation linear sketching algorithms with a focus on the high accuracy regime with low dimensionality d, that is, when d < (1/ε)^0.999. Our main result is an O(ds) pass algorithm with O(s(√d/ε)^{3d/s}) ⋅ poly(d, log (1/ε)) space complexity in words, for any parameter s ∈ [1, d log (1/ε)], to solve ε-approximate LP-type problems of O(d) combinatorial and VC dimension. Notably, by taking s = d log (1/ε), we achieve space complexity polynomial in d and polylogarithmic in 1/ε, presenting exponential improvements in 1/ε over current algorithms. We complement our results by showing lower bounds of (1/ε)^Ω(d) for any 1-pass algorithm solving the (1 + ε)-approximation MEB and linear SVM problems, further motivating our multi-pass approach.

Cite as

N. Efe Çekirge, William Gay, and David P. Woodruff. Multipass Linear Sketches for Geometric LP-Type Problems. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 353, pp. 8:1-8:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{cekirge_et_al:LIPIcs.APPROX/RANDOM.2025.8,
  author =	{\c{C}ekirge, N. Efe and Gay, William and Woodruff, David P.},
  title =	{{Multipass Linear Sketches for Geometric LP-Type Problems}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2025)},
  pages =	{8:1--8:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-397-3},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{353},
  editor =	{Ene, Alina and Chattopadhyay, Eshan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2025.8},
  URN =		{urn:nbn:de:0030-drops-243741},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2025.8},
  annote =	{Keywords: Streaming, sketching, LP-type problems}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.106

Streaming Maximal Matching with Bounded Deletions

Authors: Sanjeev Khanna, Christian Konrad, and Jacques Dark

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

We initiate the study of the Maximal Matching problem in bounded-deletion graph streams. In this setting, a graph G is revealed as an arbitrary sequence of edge insertions and deletions, where the number of insertions is unrestricted but the number of deletions is guaranteed to be at most K, for some given parameter K. The single-pass streaming space complexity of this problem is known to be Θ(n²) when K is unrestricted, where n is the number of vertices of the input graph. In this work, we present new randomized and deterministic algorithms and matching lower bound results that together give a tight understanding (up to poly-log factors) of how the space complexity of Maximal Matching evolves as a function of the parameter K: The randomized space complexity of this problem is Θ̃(n ⋅ √K), while the deterministic space complexity is Θ̃(n ⋅ K). We further show that if we relax the maximal matching requirement to an α-approximation to Maximum Matching, for any constant α > 2, then the space complexity for both, deterministic and randomized algorithms, strikingly changes to Θ̃(n + K). A key conceptual contribution of our work that underlies all our algorithmic results is the introduction of the hierarchical maximal matching data structure, which computes a hierarchy of L maximal matchings on the substream of edge insertions, for an integer L. This deterministic data structure allows recovering a Maximal Matching even in the presence of up to L-1 edge deletions, which immediately yields an optimal deterministic algorithm with space Õ(n ⋅ K). To reduce the space to Õ(n ⋅ √K), we compute only √K levels of our hierarchical matching data structure and utilize a randomized linear sketch, i.e., our matching repair data structure, to repair any damage due to edge deletions. Using our repair data structure, we show that the level that is least affected by deletions can be repaired back to be globally maximal. The repair data structure is computed independently of the hierarchical maximal matching data structure and stores information for vertices at different scales with a gradually smaller set of vertices storing more and more information about their incident edges. The repair process then makes progress either by rematching a vertex to a previously unmatched vertex, or by strategically matching it to another matched vertex whose current mate is in a better position to find a new mate in that we have stored more information about its incident edges. Our lower bound result for randomized algorithms is obtained by establishing a lower bound for a generalization of the well-known Augmented-Index problem in the one-way two-party communication setting that we refer to as Embedded-Augmented-Index, and then showing that an instance of Embedded-Augmented-Index reduces to computing a maximal matching in bounded-deletion streams. To obtain our lower bound for deterministic algorithms, we utilize a compression argument to show that a deterministic algorithm with space o(n ⋅ K) would yield a scheme to compress a suitable class of graphs below the information-theoretic threshold.

Cite as

Sanjeev Khanna, Christian Konrad, and Jacques Dark. Streaming Maximal Matching with Bounded Deletions. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 106:1-106:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{khanna_et_al:LIPIcs.ICALP.2025.106,
  author =	{Khanna, Sanjeev and Konrad, Christian and Dark, Jacques},
  title =	{{Streaming Maximal Matching with Bounded Deletions}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{106:1--106:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.106},
  URN =		{urn:nbn:de:0030-drops-234834},
  doi =		{10.4230/LIPIcs.ICALP.2025.106},
  annote =	{Keywords: Streaming Algorithms, Maximal Matching, Maximum Matching, Bounded-Deletion Streams}
}

Document

Position

DOI: 10.4230/TGDK.2.1.5

Standardizing Knowledge Engineering Practices with a Reference Architecture

Authors: Bradley P. Allen and Filip Ilievski

Published in: TGDK, Volume 2, Issue 1 (2024): Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge, Volume 2, Issue 1

Abstract

Knowledge engineering is the process of creating and maintaining knowledge-producing systems. Throughout the history of computer science and AI, knowledge engineering workflows have been widely used given the importance of high-quality knowledge for reliable intelligent agents. Meanwhile, the scope of knowledge engineering, as apparent from its target tasks and use cases, has been shifting, together with its paradigms such as expert systems, semantic web, and language modeling. The intended use cases and supported user requirements between these paradigms have not been analyzed globally, as new paradigms often satisfy prior pain points while possibly introducing new ones. The recent abstraction of systemic patterns into a boxology provides an opening for aligning the requirements and use cases of knowledge engineering with the systems, components, and software that can satisfy them best, however, this direction has not been explored to date. This paper proposes a vision of harmonizing the best practices in the field of knowledge engineering by leveraging the software engineering methodology of creating reference architectures. We describe how a reference architecture can be iteratively designed and implemented to associate user needs with recurring systemic patterns, building on top of existing knowledge engineering workflows and boxologies. We provide a six-step roadmap that can enable the development of such an architecture, consisting of scope definition, selection of information sources, architectural analysis, synthesis of an architecture based on the information source analysis, evaluation through instantiation, and, ultimately, instantiation into a concrete software architecture. We provide an initial design and outcome of the definition of architectural scope, selection of information sources, and analysis. As the remaining steps of design, evaluation, and instantiation of the architecture are largely use-case specific, we provide a detailed description of their procedures and point to relevant examples. We expect that following through on this vision will lead to well-grounded reference architectures for knowledge engineering, will advance the ongoing initiatives of organizing the neurosymbolic knowledge engineering space, and will build new links to the software architectures and data science communities.

Cite as

Bradley P. Allen and Filip Ilievski. Standardizing Knowledge Engineering Practices with a Reference Architecture. In Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 1, pp. 5:1-5:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@Article{allen_et_al:TGDK.2.1.5,
  author =	{Allen, Bradley P. and Ilievski, Filip},
  title =	{{Standardizing Knowledge Engineering Practices with a Reference Architecture}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{5:1--5:23},
  ISSN =	{2942-7517},
  year =	{2024},
  volume =	{2},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.2.1.5},
  URN =		{urn:nbn:de:0030-drops-198623},
  doi =		{10.4230/TGDK.2.1.5},
  annote =	{Keywords: knowledge engineering, knowledge graphs, quality attributes, software architectures, sociotechnical systems}
}

Document

Survey

DOI: 10.4230/TGDK.1.1.11

How Does Knowledge Evolve in Open Knowledge Graphs?

Authors: Axel Polleres, Romana Pernisch, Angela Bonifati, Daniele Dell'Aglio, Daniil Dobriy, Stefania Dumbrava, Lorena Etcheverry, Nicolas Ferranti, Katja Hose, Ernesto Jiménez-Ruiz, Matteo Lissandrini, Ansgar Scherp, Riccardo Tommasini, and Johannes Wachs

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

Openly available, collaboratively edited Knowledge Graphs (KGs) are key platforms for the collective management of evolving knowledge. The present work aims t o provide an analysis of the obstacles related to investigating and processing specifically this central aspect of evolution in KGs. To this end, we discuss (i) the dimensions of evolution in KGs, (ii) the observability of evolution in existing, open, collaboratively constructed Knowledge Graphs over time, and (iii) possible metrics to analyse this evolution. We provide an overview of relevant state-of-the-art research, ranging from metrics developed for Knowledge Graphs specifically to potential methods from related fields such as network science. Additionally, we discuss technical approaches - and their current limitations - related to storing, analysing and processing large and evolving KGs in terms of handling typical KG downstream tasks.

Cite as

Axel Polleres, Romana Pernisch, Angela Bonifati, Daniele Dell'Aglio, Daniil Dobriy, Stefania Dumbrava, Lorena Etcheverry, Nicolas Ferranti, Katja Hose, Ernesto Jiménez-Ruiz, Matteo Lissandrini, Ansgar Scherp, Riccardo Tommasini, and Johannes Wachs. How Does Knowledge Evolve in Open Knowledge Graphs?. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 11:1-11:59, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{polleres_et_al:TGDK.1.1.11,
  author =	{Polleres, Axel and Pernisch, Romana and Bonifati, Angela and Dell'Aglio, Daniele and Dobriy, Daniil and Dumbrava, Stefania and Etcheverry, Lorena and Ferranti, Nicolas and Hose, Katja and Jim\'{e}nez-Ruiz, Ernesto and Lissandrini, Matteo and Scherp, Ansgar and Tommasini, Riccardo and Wachs, Johannes},
  title =	{{How Does Knowledge Evolve in Open Knowledge Graphs?}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{11:1--11:59},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.11},
  URN =		{urn:nbn:de:0030-drops-194855},
  doi =		{10.4230/TGDK.1.1.11},
  annote =	{Keywords: KG evolution, temporal KG, versioned KG, dynamic KG}
}

Document

Position

DOI: 10.4230/TGDK.1.1.2

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Authors: Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, and Damien Graux

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

Large Language Models (LLMs) have taken Knowledge Representation - and the world - by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.

Cite as

Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, and Damien Graux. Large Language Models and Knowledge Graphs: Opportunities and Challenges. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 2:1-2:38, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Article{pan_et_al:TGDK.1.1.2,
  author =	{Pan, Jeff Z. and Razniewski, Simon and Kalo, Jan-Christoph and Singhania, Sneha and Chen, Jiaoyan and Dietze, Stefan and Jabeen, Hajira and Omeliyanenko, Janna and Zhang, Wen and Lissandrini, Matteo and Biswas, Russa and de Melo, Gerard and Bonifati, Angela and Vakaj, Edlira and Dragoni, Mauro and Graux, Damien},
  title =	{{Large Language Models and Knowledge Graphs: Opportunities and Challenges}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{2:1--2:38},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.2},
  URN =		{urn:nbn:de:0030-drops-194766},
  doi =		{10.4230/TGDK.1.1.2},
  annote =	{Keywords: Large Language Models, Pre-trained Language Models, Knowledge Graphs, Ontology, Retrieval Augmented Language Models}
}

Document

DOI: 10.4230/LIPIcs.CCC.2019.13

Optimality of Linear Sketching Under Modular Updates

Authors: Kaave Hosseini, Shachar Lovett, and Grigory Yaroslavtsev

Published in: LIPIcs, Volume 137, 34th Computational Complexity Conference (CCC 2019)

Abstract

We study the relation between streaming algorithms and linear sketching algorithms, in the context of binary updates. We show that for inputs in n dimensions, the existence of efficient streaming algorithms which can process Omega(n^2) updates implies efficient linear sketching algorithms with comparable cost. This improves upon the previous work of Li, Nguyen and Woodruff [Yi Li et al., 2014] and Ai, Hu, Li and Woodruff [Yuqing Ai et al., 2016] which required a triple-exponential number of updates to achieve a similar result for updates over integers. We extend our results to updates modulo p for integers p >= 2, and to approximation instead of exact computation.

Cite as

Kaave Hosseini, Shachar Lovett, and Grigory Yaroslavtsev. Optimality of Linear Sketching Under Modular Updates. In 34th Computational Complexity Conference (CCC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 137, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{hosseini_et_al:LIPIcs.CCC.2019.13,
  author =	{Hosseini, Kaave and Lovett, Shachar and Yaroslavtsev, Grigory},
  title =	{{Optimality of Linear Sketching Under Modular Updates}},
  booktitle =	{34th Computational Complexity Conference (CCC 2019)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-116-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{137},
  editor =	{Shpilka, Amir},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2019.13},
  URN =		{urn:nbn:de:0030-drops-108355},
  doi =		{10.4230/LIPIcs.CCC.2019.13},
  annote =	{Keywords: communication complexity, linear sketching, streaming algorithm}
}

Document

DOI: 10.4230/LIPIcs.CCC.2016.20

New Characterizations in Turnstile Streams with Applications

Authors: Yuqing Ai, Wei Hu, Yi Li, and David P. Woodruff

Published in: LIPIcs, Volume 50, 31st Conference on Computational Complexity (CCC 2016)

Abstract

Recently, [Li, Nguyen, Woodruff, STOC 2014] showed any 1-pass constant probability streaming algorithm for computing a relation f on a vector x in {-m, -(m-1), ..., m}^n presented in the turnstile data stream model can be implemented by maintaining a linear sketch Ax mod q, where A is an r times n integer matrix and q = (q_1, ..., q_r) is a vector of positive integers. The space complexity of maintaining Ax mod q, not including the random bits used for sampling A and q, matches the space of the optimal algorithm. We give multiple strengthenings of this reduction, together with new applications. In particular, we show how to remove the following shortcomings of their reduction: 1. The Box Constraint. Their reduction applies only to algorithms that must be correct even if x_{infinity} = max_{i in [n]} |x_i| is allowed to be much larger than m at intermediate points in the stream, provided that x is in {-m, -(m-1), ..., m}^n at the end of the stream. We give a condition under which the optimal algorithm is a linear sketch even if it works only when promised that x is in {-m, -(m-1), ..., m}^n at all points in the stream. Using this, we show the first super-constant Omega(log m) bits lower bound for the problem of maintaining a counter up to an additive epsilon*m error in a turnstile stream, where epsilon is any constant in (0, 1/2). Previous lower bounds are based on communication complexity and are only for relative error approximation; interestingly, we do not know how to prove our result using communication complexity. More generally, we show the first super-constant Omega(log(m)) lower bound for additive approximation of l_p-norms; this bound is tight for p in [1, 2]. 2. Negative Coordinates. Their reduction allows x_i to be negative while processing the stream. We show an equivalence between 1-pass algorithms and linear sketches Ax mod q in dynamic graph streams, or more generally, the strict turnstile model, in which for all i in [n], x_i is nonnegative at all points in the stream. Combined with [Assadi, Khanna, Li, Yaroslavtsev, SODA 2016], this resolves the 1-pass space complexity of approximating the maximum matching in a dynamic graph stream, answering a question in that work. 3. 1-Pass Restriction. Their reduction only applies to 1-pass data stream algorithms in the turnstile model, while there exist algorithms for heavy hitters and for low rank approximation which provably do better with multiple passes. We extend the reduction to algorithms which make any number of passes, showing the optimal algorithm is to choose a new linear sketch at the beginning of each pass, based on the output of previous passes.

Cite as

Yuqing Ai, Wei Hu, Yi Li, and David P. Woodruff. New Characterizations in Turnstile Streams with Applications. In 31st Conference on Computational Complexity (CCC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 50, pp. 20:1-20:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)

Copy BibTex To Clipboard

@InProceedings{ai_et_al:LIPIcs.CCC.2016.20,
  author =	{Ai, Yuqing and Hu, Wei and Li, Yi and Woodruff, David P.},
  title =	{{New Characterizations in Turnstile Streams with Applications}},
  booktitle =	{31st Conference on Computational Complexity (CCC 2016)},
  pages =	{20:1--20:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-008-8},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{50},
  editor =	{Raz, Ran},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2016.20},
  URN =		{urn:nbn:de:0030-drops-58337},
  doi =		{10.4230/LIPIcs.CCC.2016.20},
  annote =	{Keywords: communication complexity, data streams, dynamic graph streams, norm estimation}
}

9 Search Results for "Ai, Yuqing"

On-Chain Decentralized Learning and Cost-Effective Inference for DeFi Attack Mitigation

Abstract

Cite as

Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets

Abstract

Cite as

Multipass Linear Sketches for Geometric LP-Type Problems

Abstract

Cite as

Streaming Maximal Matching with Bounded Deletions

Abstract

Cite as

Standardizing Knowledge Engineering Practices with a Reference Architecture

Abstract

Cite as

How Does Knowledge Evolve in Open Knowledge Graphs?

Abstract

Cite as

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Abstract

Cite as

Optimality of Linear Sketching Under Modular Updates

Abstract

Cite as

New Characterizations in Turnstile Streams with Applications

Abstract

Cite as

Thanks for your feedback!

Could not send message