Abstract:
One of the most important tasks in data mining is the discovery of frequent sequential patterns, that is, subsequences that occur in the same temporal order in many sequences of a sequence database. An extension of this task, timed sequential pattern mining, discovers frequent sequences together with the temporal relationships between their elements. Mining such patterns supports a wide range of applications, including recommendation systems in transportation, healthcare, and weather forecasting. While many approaches have been developed to mine sequential patterns and timed sequential patterns, they assume that the database is static. In real-world scenarios, however, databases are dynamic: they evolve in response to user interactions, data updates, and changing requirements. Consequently, efficiently finding the complete set of timed sequential patterns without rescanning the entire database after every update remains a significant research challenge. To fill this gap, this paper proposes a novel algorithm, MINIng Timed Sequential Patterns in DYnamic Sequence Databases (MinitsDays), which discovers timed sequential patterns, capturing the temporal relationships within those patterns, in a dynamic timed sequence database. Extensive theoretical and experimental evaluations on both real-world and synthetic datasets were conducted to assess the performance of MinitsDays. The experimental results demonstrate the effectiveness and advantages of the proposed approach. Additionally, the algorithm exploits multicore CPUs for parallel execution, which significantly enhances its performance.
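To make the task definition concrete, the following Python sketch counts the support of a sequential pattern in a timed sequence database and shows the simplest form of incremental updating. It is a generic illustration under stated assumptions, not the MinitsDays algorithm. It assumes that each database sequence is a list of (timestamp, itemset) pairs sorted by timestamp, that a pattern is an ordered list of itemsets with no timing constraints, and that the database changes only by appending new sequences. The names `contains`, `support`, and `update_support` are hypothetical.

```python
from typing import FrozenSet, List, Optional, Tuple

# Assumed representation: a sequence is a list of (timestamp, itemset) pairs,
# sorted by timestamp; a pattern is an ordered list of itemsets.
TimedSequence = List[Tuple[int, FrozenSet[str]]]
Pattern = List[FrozenSet[str]]


def contains(sequence: TimedSequence, pattern: Pattern) -> bool:
    """True if the pattern's itemsets appear, in order, at strictly increasing timestamps."""
    i = 0  # index of the next pattern itemset to match
    last_time: Optional[int] = None
    for timestamp, itemset in sequence:
        if i == len(pattern):
            break
        # Greedy earliest match: the pattern itemset must be a subset of this element.
        if pattern[i] <= itemset and (last_time is None or timestamp > last_time):
            last_time = timestamp
            i += 1
    return i == len(pattern)


def support(db: List[TimedSequence], pattern: Pattern) -> int:
    """Number of sequences in db that contain the pattern."""
    return sum(1 for seq in db if contains(seq, pattern))


def update_support(old_count: int, new_sequences: List[TimedSequence], pattern: Pattern) -> int:
    """Hypothetical incremental update for appended sequences only:
    scan just the new sequences and add their count to the stored support."""
    return old_count + support(new_sequences, pattern)


# Small illustrative database (invented toy data, not from the paper).
db: List[TimedSequence] = [
    [(1, frozenset({"a"})), (3, frozenset({"b", "c"})), (5, frozenset({"d"}))],
    [(2, frozenset({"a", "b"})), (4, frozenset({"c"}))],
    [(1, frozenset({"b"})), (2, frozenset({"a"}))],
]
new_sequences: List[TimedSequence] = [[(1, frozenset({"a"})), (2, frozenset({"c"}))]]
pattern: Pattern = [frozenset({"a"}), frozenset({"c"})]

old = support(db, pattern)
print(old, update_support(old, new_sequences, pattern))
```

Here the pattern ⟨{a}, {c}⟩ occurs in the first two sequences, so its support is 2, and appending one more matching sequence raises it to 3 without rescanning `db`. This shortcut works only because support is additive over disjoint sets of sequences; modifications to existing sequences, and the temporal relationships that timed patterns additionally capture, require more elaborate bookkeeping than this sketch shows.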