AIM - Another Itemset Miner

Frequent Itemset Mining Implementation

AIM is an algorithm for mining frequent itemsets. Past studies have proposed various algorithms and techniques for improving the efficiency of the mining task. We integrate a combination of these techniques into an algorithm which utilize those techniques dynamically according to the input dataset. The algorithm main features include depth first search with vertical compressed database, diffset, parent equivalence pruning, dynamic reordering and projection. Experimental testing suggests that our algorithm and implementation significantly outperform existing algorithms/ implementations.

To read more about frequent itemset mining see "Fast Algorithms for Mining Association Rules".

Click here for the AIM-F algorithm review presentation (PowerPoint)

The code can be freely used for research purposes. If you use this program in a research paper then a citation to either of the following papers is welcome:

  • Sagi Shporer, AIM2: Improved Implementation of AIM, IEEE ICDM Workshop on Frequent Itemset Mining Implementations (ICDM'04), CEUR Workshop Proceedings, volume 126, Brighton, UK, November 2004. (PDF)
  • Amos Fiat and Sagi Shporer, AIM-F: Another Itemset Miner, IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03), CEUR Workshop Proceedings, volume 90, Melbourne, Florida, USA, 2003 (PDF)

Software

Version 2.01 - 06/12/2004 - Source, Windows Binary

Current source & binaries are stored at GitHub here.

Usage

To compile the software on unix machines run 'make' to complie fim_all executable. On windows use your favorite C++ compiler. The implementation was tested on VC++ 7 & VC++ 7.1

To run the software:
fim_all <InputFile> <MinSupport> [OutputFile]

example:
fim_all chess.dat 2000

Input

The dataset input must use the following ASCII format only:
Every transaction is on a separate line as a list of items separated by white space and ending with a newline.
Every item is a non-negative integer.

Output

After execution, the program prints the number of frequent itemsets for each length between 1 and k separately to the standard output stream, with k being the size of the largest itemset.
If an output file was specified:

All frequent itemsets will be printed to to the file in the following format:
Every line contains a single frequent itemset as a list of items separated by whitespace. At the end of the line, the support (number of transactions) is printed thus: (XXXX).