Search Results

Documents authored by Masciari, Elio


Document
Exploiting Structural Similarity For Effective Web Information Extraction

Authors: Elio Masciari, Sergio Flesca, Giuseppe Manco, Luigi Pontieri, and Andrea Pugliese

Published in: Dagstuhl Seminar Proceedings, Volume 5061, Foundations of Semistructured Data (2005)


Abstract
In this paper we propose an architecture that exploit web pages stuctural information for the extraction of relevant information from them. In this architecture, a primary role played by a distance-based classification methodology is devised. Such a methodology is based on an efficient and effective technique for detecting structural similarities among semistructured documents, which significantly differs from standard methods based on graph-matching algorithms. The technique is based on the idea of representing the structure of a document as a time series in which each occurrence of a tag corresponds to a given impulse. By analyzing the frequencies of the corresponding Fourier transform, we can hence state the degree of similarity between documents. Experiments on real data show the effectiveness of the proposed technique.

Cite as

Elio Masciari, Sergio Flesca, Giuseppe Manco, Luigi Pontieri, and Andrea Pugliese. Exploiting Structural Similarity For Effective Web Information Extraction. In Foundations of Semistructured Data. Dagstuhl Seminar Proceedings, Volume 5061, pp. 1-20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005)


Copy BibTex To Clipboard

@InProceedings{masciari_et_al:DagSemProc.05061.4,
  author =	{Masciari, Elio and Flesca, Sergio and Manco, Giuseppe and Pontieri, Luigi and Pugliese, Andrea},
  title =	{{Exploiting Structural Similarity For Effective Web Information Extraction}},
  booktitle =	{Foundations of Semistructured Data},
  pages =	{1--20},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2005},
  volume =	{5061},
  editor =	{Frank Neven and Thomas Schwentick and Dan Suciu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.05061.4},
  URN =		{urn:nbn:de:0030-drops-2301},
  doi =		{10.4230/DagSemProc.05061.4},
  annote =	{Keywords: DFT, Web Document Structural Similarity}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail