1 Search Result for "Gay-Crosier, Cyrille"


Document
An Automatic Partitioning of Gutenberg.org Texts

Authors: Davide Picca and Cyrille Gay-Crosier

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
Over the last 10 years, the automatic partitioning of texts has attracted the interest of the community. The automatic identification of parts of texts can provide faster and easier access to textual analysis. We introduce here an exploratory work on multi-part book identification. As a first attempt, we focus on Gutenberg.org, which is one of the projects that has received the largest public support in recent years. The purpose of this article is to present a preliminary system that automatically classifies parts of texts into 35 semantic categories. An accuracy of more than 93% was achieved on the test set. We plan to extend this effort to other repositories in the future.

Cite as

Davide Picca and Cyrille Gay-Crosier. An Automatic Partitioning of Gutenberg.org Texts. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 35:1-35:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{picca_et_al:OASIcs.LDK.2021.35,
  author =	{Picca, Davide and Gay-Crosier, Cyrille},
  title =	{{An Automatic Partitioning of Gutenberg.org Texts}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{35:1--35:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.35},
  URN =		{urn:nbn:de:0030-drops-145714},
  doi =		{10.4230/OASIcs.LDK.2021.35},
  annote =	{Keywords: Digital Humanities, Machine Learning, Corpora}
}

Classification: Computing methodologies → Language resources
