Exploiting Structural Similarity For Effective Web Information Extraction

Masciari, Elio; Flesca, Sergio; Manco, Giuseppe; Pontieri, Luigi; Pugliese, Andrea

doi:10.4230/DagSemProc.05061.4


Part of: Volume: Dagstuhl Seminar Proceedings, Volume 5061
Part of: Series: Dagstuhl Seminar Proceedings (DagSemProc)
License: Creative Commons Attribution 4.0 International license
Publication Date: 2005-08-10

DagSemProc.05061.4.pdf

Filesize: 0.52 MB
20 pages

Document Identifiers

DOI: 10.4230/DagSemProc.05061.4
URN: urn:nbn:de:0030-drops-2301

Subject Classification

Keywords

DFT
Web Document Structural Similarity

Abstract

In this paper we propose an architecture that exploits the structural information of web pages to extract relevant information from them.
  In this architecture, a distance-based classification methodology plays a primary role.
  This methodology is based on an efficient and effective technique for detecting structural similarities among semistructured documents,
  which significantly differs from standard methods based on graph-matching algorithms.
  The technique is based on the idea of representing the structure of a document as a time series in which each occurrence
  of a tag corresponds to a given impulse. By analyzing the frequencies of the corresponding Fourier transform, we can then determine
  the degree of similarity between documents.
  Experiments on real data show the effectiveness of the proposed technique.
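
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag encoding (a positive integer code for an opening tag, its negative for the closing tag), the zero-padding to a common length, and the Euclidean distance between magnitude spectra are all assumptions made for illustration, and the function names are hypothetical. Only the Python standard library is used, with a naive O(n^2) discrete Fourier transform.

```python
import cmath
import math
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the sequence of opening and closing tags, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tags = []  # e.g. [("open", "ul"), ("open", "li"), ("close", "li"), ...]

    def handle_starttag(self, tag, attrs):
        self.tags.append(("open", tag))

    def handle_endtag(self, tag):
        self.tags.append(("close", tag))


def tag_sequence(html):
    """Return the document's tags in document order."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def encode(tags, codes):
    """Turn tags into impulses (assumed encoding: +code opens, -code closes)."""
    return [codes[name] if kind == "open" else -codes[name] for kind, name in tags]


def dft_magnitudes(signal, n):
    """Magnitudes of the n-point DFT of the signal, zero-padded to length n."""
    padded = list(signal) + [0.0] * (n - len(signal))
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(padded)))
        for k in range(n)
    ]


def structural_distance(html_a, html_b):
    """Euclidean distance between the magnitude spectra of two tag signals."""
    tags_a, tags_b = tag_sequence(html_a), tag_sequence(html_b)
    # Shared tag vocabulary, so both documents use the same impulse codes.
    names = sorted({name for _, name in tags_a + tags_b})
    codes = {name: i + 1 for i, name in enumerate(names)}
    n = max(len(tags_a), len(tags_b), 1)
    mag_a = dft_magnitudes(encode(tags_a, codes), n)
    mag_b = dft_magnitudes(encode(tags_b, codes), n)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mag_a, mag_b)))


if __name__ == "__main__":
    same = structural_distance("<ul><li>a</li></ul>", "<ul><li>b</li></ul>")
    different = structural_distance(
        "<ul><li>a</li></ul>", "<table><tr><td>a</td></tr></table>"
    )
    print(same, different)
```

In this sketch, two pages with the same tag structure but different text are at distance zero, while pages with different markup are at a positive distance. A distance-based classifier in the spirit of the abstract could then assign a page to the class of its nearest labeled example, although the paper's actual classification scheme is not specified here.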

Cite As

Elio Masciari, Sergio Flesca, Giuseppe Manco, Luigi Pontieri, and Andrea Pugliese. Exploiting Structural Similarity For Effective Web Information Extraction. In Foundations of Semistructured Data. Dagstuhl Seminar Proceedings, Volume 5061, pp. 1-20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2005) https://doi.org/10.4230/DagSemProc.05061.4
