Temporal Support of Regular Expressions in Sequential Pattern Mining

Authors Alejandro Vaisman, Leticia I. Gómez, Bart Kuijpers





Cite As

Alejandro Vaisman, Leticia I. Gómez, and Bart Kuijpers. Temporal Support of Regular Expressions in Sequential Pattern Mining. In Geographic Privacy-Aware Knowledge Discovery and Delivery. Dagstuhl Seminar Proceedings, Volume 8471, pp. 1-15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2009) https://doi.org/10.4230/DagSemProc.08471.4

Abstract

Classic algorithms for sequential pattern discovery return all frequent sequences present in a database. Since, in general, only a few of them are interesting from a user's point of view, languages based on regular expressions (RE) have been proposed to restrict frequent sequences to those that satisfy user-specified constraints.
Although the support of a sequence is computed as the number of data-sequences satisfying a pattern with respect to the total number of data-sequences in the database, once regular expressions come into play, new approaches to the concept of support are needed. For example, users may be interested in computing the support of the RE as a whole, in addition to that of a particular pattern.
As a simple example, the expression $(A|B).C$ is satisfied by sequences like A.C or B.C. Even though the semantics of this RE suggests that both of them are equally interesting to the user, if neither of them reaches the minimum support on its own (although together they do), they would not be retrieved.
Also, when the items are frequently updated, the traditional way of counting support in sequential pattern mining may lead to incorrect (or at least incomplete) conclusions. For example, if we are looking for the support of the sequence A.B, where A and B are two items such that A was created after B, all sequences in the database that were completed before A was created can never produce a match. Therefore, accounting for them would underestimate the support of the sequence A.B.
The problem gets more involved if we are interested in categorical sequential patterns. In light of the above, in this paper we propose to revise the classic notion of support in sequential pattern mining, introducing the concept of temporal support of regular expressions, intuitively defined as the number of sequences satisfying a target pattern, out of the total number of sequences that could possibly have matched that pattern, where the pattern is defined as a RE over complex items (i.e., not only item identifiers, but also attributes and functions).
We present and discuss a theoretical framework for this novel notion of support.
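
The two effects described in the abstract can be illustrated with a small sketch. It is not the framework from the paper: the toy databases, the creation times, and the helper names `support` and `temporal_support` are all hypothetical, items are encoded as single letters, a data-sequence is taken to satisfy a pattern when it contains a contiguous match, and the paper's concatenation operator `.` is written as plain juxtaposition in Python's regex syntax.

```python
import re
from fractions import Fraction

# Hypothetical toy database: (items as a string of single letters, completion time).
DB = [("AC", 1), ("BC", 2), ("DC", 3), ("AD", 4), ("BD", 5)]


def support(regex, db):
    """Classic support: matching data-sequences over ALL data-sequences."""
    hits = sum(1 for items, _ in db if re.search(regex, items))
    return Fraction(hits, len(db))


def temporal_support(regex, db, created, pattern_items):
    """Assumed simplification of temporal support.

    Only data-sequences completed at or after the creation time of the
    newest item in the pattern are counted in the denominator, since
    earlier ones could never have matched.
    """
    newest = max(created[item] for item in pattern_items)
    eligible = [items for items, end_time in db if end_time >= newest]
    if not eligible:
        return Fraction(0)
    hits = sum(1 for items in eligible if re.search(regex, items))
    return Fraction(hits, len(eligible))


# Support of each pattern versus support of the RE (A|B).C as a whole.
MIN_SUPPORT = Fraction(3, 10)
for regex in ("AC", "BC", "(A|B)C"):
    s = support(regex, DB)
    print(regex, s, "frequent" if s >= MIN_SUPPORT else "not frequent")

# Item A is created at time 3, after B; two sequences were completed earlier.
CREATED = {"A": 3, "B": 0, "D": 0}
DB2 = [("BB", 1), ("BD", 2), ("AB", 4), ("DB", 5)]
print("classic A.B: ", support("AB", DB2))
print("temporal A.B:", temporal_support("AB", DB2, CREATED, "AB"))
```

In this toy data, A.C and B.C each fall below the assumed threshold while the RE as a whole exceeds it, and restricting the denominator to sequences completed after A was created raises the support of A.B relative to the classic count.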

Keywords
  • Temporal support
  • sequential pattern mining
