Mining Multi-Sequential Patterns

Sandra de Amo, Arnaud Giacometti, Dominique Laurent, Rosana O. Santos

Previous studies on mining sequential patterns have focused on discovering frequent sequences of sets of items (itemsets) in a
given database of sequences, where each sequence is a list of transactions ordered by transaction time and each transaction is a
set of items. The main purpose of these studies is to predict which sequences of itemsets (items in a retail store, symptoms
of a disease, user interface actions, etc.) are likely to be associated with \textit{one object}
(a client, a patient, a software user, etc., respectively) during a given period of time.

In this work, we generalize these studies by focusing on the problem of discovering multi-sequential patterns,
with the purpose of predicting the behavior of groups of objects related to each other by a given relationship. Here, we
are interested not only in discovering sequences of items that are associated with objects over time, but also in discovering how a
given relationship between objects may influence the shape of these sequences with respect to each other.

A simple example of a multi-sequential pattern is ``10% of groups of clients, each group containing clients who work in the same place, present the
following behavior: a client c1 buys a car X, then, sometime later, his colleague, client c2, also buys the same car X''. In
this example, objects are clients and clients are related to each other by the relationship ``being a working colleague''.
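
To make the example concrete, the following is a minimal sketch of how the support of that single pattern could be counted over groups of colleagues. It is only an illustration under stated assumptions, not the MSP algorithm: the data layout (a group as a mapping from a client to a list of (time, itemset) transactions), the toy data, and the names purchase_times, group_matches and support are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy data: one dict per group of colleagues, mapping a
# client id to that client's list of (time, itemset) transactions.
groups = [
    {"c1": [(1, {"car_X"})], "c2": [(3, {"car_X"})]},
    {"c3": [(2, {"car_X"})], "c4": [(1, {"bike"})]},
    {"c5": [(5, {"car_X"})], "c6": [(2, {"car_X"})], "c7": [(4, {"bike"})]},
    {"c8": [(1, {"bike"})], "c9": [(2, {"bike"})]},
]


def purchase_times(transactions, item):
    """Return every time at which `item` occurs in a client's transactions."""
    return [time for time, itemset in transactions if item in itemset]


def group_matches(group, item):
    """True if one client buys `item` strictly before a different client
    of the same group buys it."""
    times = {client: purchase_times(tx, item) for client, tx in group.items()}
    buyers = [client for client, ts in times.items() if ts]
    # Some purchase by a precedes some purchase by b exactly when the
    # earliest purchase by a is before the latest purchase by b.
    return any(
        min(times[a]) < max(times[b]) for a, b in permutations(buyers, 2)
    )


def support(groups, item):
    """Fraction of groups in which the pattern occurs."""
    if not groups:
        return 0.0
    return sum(group_matches(g, item) for g in groups) / len(groups)


print(support(groups, "car_X"))
```

In this toy data the pattern holds in the first and third groups only, so its support is 2 out of 4 groups. A pattern is then considered frequent when this fraction reaches a user-chosen threshold, such as the 10% quoted in the example.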

The problem of mining multi-sequential patterns poses more challenges than mining classical sequential patterns, from
both a conceptual and an algorithmic point of view.

We introduce a general logical framework that captures all sequential patterns proposed so
far, as well as our multi-sequential patterns. We propose an algorithm, MSP, based on the Apriori idea
used for mining classical sequential patterns. The algorithm is being implemented by Rosana Oliveira dos Santos (until May 2003)
as part of her Master of Science dissertation.