3 Search Results for "Cattuto, Ciro"

Document

DOI: 10.4230/LIPIcs.WABI.2025.14

Human Readable Compression of GFA Paths Using Grammar-Based Code

Authors: Peter Heringer and Daniel Doerr

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)

Abstract

Pangenome graphs offer a compact and comprehensive representation of genomic diversity, improving tasks such as variant calling, genotyping, and other downstream analyses. Although the underlying graph structures scale sublinearly with the number of haplotypes, the widely used GFA file format suffers from rapidly growing file sizes due to the explicit and repetitive encoding of haplotype paths. In this work, we introduce an extension to the GFA format that enables efficient grammar-based compression of haplotype paths while retaining human readability. In addition, grammar-based encoding provides an efficient in-memory data structure that requires no decompression and, on the contrary, improves the runtime of many computational tasks that involve haplotype comparisons. We present sqz, a method that uses the proposed format extension to encode haplotype paths with byte pair encoding, a grammar-based compression scheme. We evaluate sqz on recent human pangenome graphs from Heumos et al. and the Human Pangenome Reference Consortium (HPRC), comparing it to the existing compressors bgzip, gbz, and sequitur. sqz scales sublinearly with the number of haplotypes in a pangenome graph and consistently achieves higher compression ratios than sequitur, with up to 5 times better compression than bgzip on HPRC graphs and up to 10 times on the graph from Heumos et al. When combined with bgzip, sqz matches or exceeds the compression ratio of gbz across all our datasets. These results demonstrate the potential of our proposed extension of the GFA format in reducing haplotype path redundancy and improving storage efficiency for pangenome graphs.

Cite as

Peter Heringer and Daniel Doerr. Human Readable Compression of GFA Paths Using Grammar-Based Code. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{heringer_et_al:LIPIcs.WABI.2025.14,
  author =	{Heringer, Peter and Doerr, Daniel},
  title =	{{Human Readable Compression of GFA Paths Using Grammar-Based Code}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.14},
  URN =		{urn:nbn:de:0030-drops-239395},
  doi =		{10.4230/LIPIcs.WABI.2025.14},
  annote =	{Keywords: pangenomics, pangenome graphs, compression, grammar-based code, byte pair encoding}
}

Document

Vision

DOI: 10.4230/TGDK.1.1.6

Towards Ordinal Data Science

Authors: Gerd Stumme, Dominik Dürrschnabel, and Tom Hanika

Published in: Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge

Abstract

Order is one of the main instruments to measure the relationship between objects in (empirical) data. However, compared to methods that use numerical properties of objects, the number of ordinal methods developed is rather small. One reason for this is the limited availability of computational resources in the last century that would have been required for ordinal computations. Another reason - particularly important for this line of research - is that order-based methods are often seen as too mathematically rigorous to apply to real-world data. In this paper, we will therefore discuss different means for measuring and ‘calculating’ with ordinal structures - a specific class of directed graphs - and show how to infer knowledge from them. Our aim is to establish Ordinal Data Science as a fundamentally new research agenda. Besides cross-fertilization with other cornerstone machine learning and knowledge representation methods, a broad range of disciplines will benefit from this endeavor, including psychology, sociology, economics, web science, knowledge engineering, and scientometrics.

Cite as

Gerd Stumme, Dominik Dürrschnabel, and Tom Hanika. Towards Ordinal Data Science. In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 6:1-6:39, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@Article{stumme_et_al:TGDK.1.1.6,
  author =	{Stumme, Gerd and D\"{u}rrschnabel, Dominik and Hanika, Tom},
  title =	{{Towards Ordinal Data Science}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{6:1--6:39},
  ISSN =	{2942-7517},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.6},
  URN =		{urn:nbn:de:0030-drops-194801},
  doi =		{10.4230/TGDK.1.1.6},
  annote =	{Keywords: Order relation, data science, relational theory of measurement, metric learning, general algebra, lattices, factorization, approximations and heuristics, factor analysis, visualization, browsing, explainability}
}

Document

DOI: 10.4230/DagSemProc.08391.3

08391 Group Summary – Mining for Social Serendipity

Authors: Alexandre Passant, Ian Mulvany, Peter Mika, Nicolas Maisonneuve, Alexander Löser, Ciro Cattuto, Christian Bizer, Christian Bauckhage, and Harith Alani

Published in: Dagstuhl Seminar Proceedings, Volume 8391, Social Web Communities (2008)

Abstract

A common social problem at an event in which people do not personally know all of the other participants is the natural tendency for cliques to form and for discussions to mainly happen between people who already know each other. This limits the possibility for people to make interesting new acquaintances and acts as a retarding force in the creation of new links in the social web. Encouraging users to socialize with people they don't know by revealing to them hidden surprising links could help to improve the diversity of interactions at an event. The goal of this paper is to propose a method for detecting "surprising" relationships between people attending an event. By "surprising" relationships we mean those relationships that are not known a priori, and that imply shared information not directly related to the local context of the event (location, interests, contacts) at which the meeting takes place. To demonstrate and test our concept we used the Flickr community. We focused on a community of users associated with a social event (a computer science conference) and represented in Flickr by means of a photo pool devoted to the event. We use Flickr metadata (tags) to mine for user similarity not related to the context of the event, as represented in the corresponding Flickr group. For example, we look for two group members who have been in the same highly specific place (identified by means of geo-tagged photos), but are not friends of each other and share no other common interests or social neighborhood.

Cite as

Alexandre Passant, Ian Mulvany, Peter Mika, Nicolas Maisonneuve, Alexander Löser, Ciro Cattuto, Christian Bizer, Christian Bauckhage, and Harith Alani. 08391 Group Summary – Mining for Social Serendipity. In Social Web Communities. Dagstuhl Seminar Proceedings, Volume 8391, pp. 1-11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)

@InProceedings{passant_et_al:DagSemProc.08391.3,
  author =	{Passant, Alexandre and Mulvany, Ian and Mika, Peter and Maisonneuve, Nicolas and L\"{o}ser, Alexander and Cattuto, Ciro and Bizer, Christian and Bauckhage, Christian and Alani, Harith},
  title =	{{08391 Group Summary -- Mining for Social Serendipity}},
  booktitle =	{Social Web Communities},
  pages =	{1--11},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8391},
  editor =	{Alani, Harith and Staab, Steffen and Stumme, Gerd},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.08391.3},
  URN =		{urn:nbn:de:0030-drops-17910},
  doi =		{10.4230/DagSemProc.08391.3},
  annote =	{Keywords: Serendipity, online activity, context, ubiquitous computing}
}
