AIM - Another Itemset Miner
Frequent Itemset Mining Implementation
AIM is an algorithm for mining frequent itemsets. Past
studies have proposed various algorithms and techniques for improving the
efficiency of
the mining task. We integrate a combination of these techniques into an
algorithm which utilize those techniques dynamically according to the input
dataset. The algorithm main features include depth first search with vertical
compressed database, diffset, parent equivalence pruning, dynamic reordering and
projection. Experimental testing suggests that our algorithm and implementation
significantly outperform existing algorithms/
implementations.
To read more about frequent itemset mining see
"Fast Algorithms for
Mining Association Rules".
Click here for the AIM-F algorithm review presentation (PowerPoint)
The code can be freely used for research purposes.
If you use this program in a research paper then a citation to either of the
following papers is welcome:
-
Sagi Shporer, AIM2: Improved Implementation of AIM, IEEE
ICDM Workshop on Frequent Itemset Mining Implementations (ICDM'04), CEUR
Workshop Proceedings, volume 126, Brighton, UK, November 2004. (PDF)
-
Amos Fiat and Sagi Shporer, AIM-F: Another Itemset Miner,
IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03), CEUR
Workshop Proceedings, volume 90, Melbourne, Florida, USA, 2003 (PDF)
Software
Version 2.01 - 06/12/2004 - Source,
Windows Binary
Current source & binaries are stored at GitHub here.
Usage
To compile the software on unix machines run 'make' to complie
fim_all executable. On windows use your favorite C++ compiler. The
implementation was tested on VC++ 7 & VC++ 7.1
To run the software:
fim_all <InputFile> <MinSupport> [OutputFile]
example:
fim_all chess.dat 2000
Input
The dataset input must use the following ASCII format only:
Every transaction is on a separate line as a list of items separated by white
space and ending with a newline.
Every item is a non-negative integer.
Output
After execution, the program prints the number of frequent
itemsets for each length between 1 and k separately to the standard output
stream, with k being the size of the largest itemset.
If an output file was specified:
All frequent itemsets will be printed to to
the file in the following format:
Every line contains a single frequent itemset as a list of items separated by
whitespace. At the end of the line, the support (number of transactions) is
printed thus: (XXXX).
|