The arithmetic coding package is an open-source implementation of a generic arithmetic coder and decoder, along with byte stream models that are subclasses of Java's I/O streams. The implementation of arithmetic coding is based on:
Example statistical models include a uniform distribution, a simple unigram model, and a parametric prediction by partial matching (PPM) model. The PPM model is based on:
Models other than PPM may be integrated for compression.
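To give a rough idea of what a statistical model hands to an arithmetic coder, here is a minimal sketch of an adaptive unigram byte model. It is an illustration only: the class `AdaptiveUnigramSketch` and its methods are hypothetical names, not part of the arithcode API, and the real package's model interface may look quite different. The sketch assumes every byte starts with a count of one (a uniform distribution) and that counts are never rescaled, which a production coder would need to do to keep totals within its arithmetic precision.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration; not a class from the arithcode package. */
final class AdaptiveUnigramSketch {
    private static final int NUM_SYMBOLS = 256;

    private final int[] counts = new int[NUM_SYMBOLS];
    private int total = NUM_SYMBOLS;

    AdaptiveUnigramSketch() {
        // Start from a uniform distribution: every byte has been "seen" once.
        Arrays.fill(counts, 1);
    }

    /** Returns {low, high, total}; the symbol owns the range [low/total, high/total). */
    int[] interval(int symbol) {
        int low = 0;
        for (int i = 0; i < symbol; i++) {
            low += counts[i];
        }
        return new int[] {low, low + counts[symbol], total};
    }

    /** Records one more occurrence of the symbol (no rescaling in this sketch). */
    void increment(int symbol) {
        counts[symbol]++;
        total++;
    }

    /** Sum of -log2(p) over the input: the size an ideal coder would approach. */
    static double idealBits(byte[] data) {
        AdaptiveUnigramSketch model = new AdaptiveUnigramSketch();
        double bits = 0.0;
        for (byte b : data) {
            int symbol = b & 0xFF; // treat the byte as unsigned
            int[] iv = model.interval(symbol);
            double p = (double) (iv[1] - iv[0]) / iv[2];
            bits -= Math.log(p) / Math.log(2.0);
            model.increment(symbol); // the decoder can make the same update
        }
        return bits;
    }

    public static void main(String[] args) {
        byte[] data = "abracadabra abracadabra".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("%d bytes -> about %.1f bits%n", data.length, idealBits(data));
    }
}
```

The coder only ever sees the three integers returned by `interval`, which is why different models can be swapped in behind the same coder. A PPM model would compute those integers from the preceding bytes of context rather than from a single table.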
License: The arithcode package is licensed under the standard Apache 2.0 license.
You only need the jar to run the arithcode package, but you need the source distribution to build and test it.
The source distribution includes JUnit, which is required for testing but not for building or using the jar.
The distribution also includes Witten and Bell's Calgary corpus, which includes a range of different kinds of byte data for testing compression algorithms.
The package includes an Apache Ant build file with targets to build the jar, run the tests, and generate the documentation. Download the source, unpack it, and run any of the following Ant targets:
The original papers cited in the introduction are quite readable. I've also provided a tutorial with further references.
Here are speed and compression results on the Calgary corpus for various PPM context lengths.
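For readers who want to take comparable measurements themselves, the following is a minimal sketch of a timing and compression-rate harness. It deliberately uses the JDK's `DeflaterOutputStream` and `InflaterInputStream` as stand-ins, because they are also compressors exposed as subclasses of Java's I/O streams; it does not use the arithcode classes, and the class name `CompressionRateSketch` and the synthetic input are assumptions made for illustration, not Calgary corpus data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical harness; Deflate stands in for the arithmetic coding streams. */
final class CompressionRateSketch {

    /** Compresses by writing through a stream wrapper, as with any filter stream. */
    static byte[] compress(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray(); // closing the wrapper flushed the final block
    }

    /** Decompresses by reading through the matching input stream wrapper. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Compression rate in bits of output per byte of input. */
    static double bitsPerByte(int originalLength, int compressedLength) {
        return 8.0 * compressedLength / originalLength;
    }

    public static void main(String[] args) {
        // Synthetic, highly repetitive input; substitute real file contents here.
        byte[] input = new byte[100_000];
        String pattern = "abracadabra";
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) pattern.charAt(i % pattern.length());
        }
        long start = System.nanoTime();
        byte[] packed = compress(input);
        double millis = (System.nanoTime() - start) / 1e6;
        System.out.printf("%.3f bits/byte in %.2f ms%n",
                bitsPerByte(input.length, packed.length), millis);
    }
}
```

A single timed run like this is noisy on the JVM; repeating the measurement after a warm-up pass gives steadier numbers.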
And just for laughs, some historical results illustrating the progress of compilers, processors, and memory.
Thanks to the original authors for making their source and corpora available.
Thanks also to Garrick Toubassi for patching a bug in version 1.1; it only arose in fairly extreme boundary conditions, making its diagnosis a very nice piece of debugging indeed.