An Efficient Compressor for XML
(Mirror from AT&T Labs-Research)
News: XMill was implemented in summer 1999 at AT&T Labs Research in New Jersey, USA, and
is no longer being developed by the original developers.
The XMill source code has now moved into a
SourceForge project.
Release 0.8 is downloadable there.
XMill is a new tool for compressing XML data efficiently.
It is based on a regrouping strategy that leverages the effect of
highly efficient compression techniques in compressors such as gzip.
XMill groups XML text strings with respect to their meaning and exploits similarities
between those text strings for compression. Hence, XMill
typically achieves much better compression rates than conventional compressors
such as gzip.
XML files are typically much larger than the same data
represented in some reasonably efficient domain-specific data format.
One of the most intriguing results of XMill is that converting proprietary
data formats into XML can in fact improve compression: the compressed XML
file can be as small as half the size of the compressed original file!
This improvement is achieved at about the same compression speed.
Download XMill version 0.7
Licensing
XMill is essentially free software. Please read the license
carefully: if you do not agree with the terms of the release, do not
download or use this software.
The license allows you to modify XMill and distribute the new
software, but you must agree to certain terms. Among other things, your
license agreement must satisfy certain conditions. To simplify this
procedure for you, the distribution package includes
MINTERMS.txt, a default license agreement
you may use for distributing the modified software.
Credits
Technical Overview
XMill is based on a grouping technique that collects text items with similar
syntax and meaning and compresses them together. A brief overview
of the underlying principles can be found here.
The user manual contains a description and the syntax
of the pre-defined user compressors.
The UPenn technical report MS-CIS-99-26
describes the fundamental principles of XMill in detail.
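The grouping idea described above can be illustrated with a minimal Python sketch. It is not XMill's implementation or file format: the function names group_by_path and compress_containers are hypothetical, zlib (the algorithm behind gzip) stands in for the back-end compressor, and the document structure, attributes, and text that follows child elements are ignored, so the original XML cannot be rebuilt from the output.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_by_path(xml_text):
    """Collect element text into containers keyed by element path.

    Hypothetical helper: text items that share a path (and therefore,
    presumably, a meaning) end up in the same container.
    """
    root = ET.fromstring(xml_text)
    containers = defaultdict(list)

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            containers[path].append(text)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return dict(containers)


def compress_containers(containers):
    """Compress each container separately with zlib (stand-in back end)."""
    return {
        path: zlib.compress("\n".join(values).encode("utf-8"), 9)
        for path, values in containers.items()
    }


SAMPLE = """<books>
  <book><title>Data on the Web</title><year>1999</year></book>
  <book><title>Foundations of Databases</title><year>1995</year></book>
</books>"""

containers = group_by_path(SAMPLE)
compressed = compress_containers(containers)

for path, values in sorted(containers.items()):
    print(path, values, len(compressed[path]), "bytes compressed")
```

On a sample this small, the fixed overhead of each zlib stream dominates, so no size advantage should be expected; the sketch only shows how similar text items end up next to each other before compression.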
Mailing List
If you would like to receive news related to updates and new releases
of XMill, please subscribe to xmill@research.att.com. To do
so, send an email to majordomo@research.att.com
with the body consisting of:
subscribe xmill [<address>].
Experiments
We have tested XMill on several large real-life XML data sources. Some of the
experimental results and comparisons with existing compressors are shown
here. The data sources and their XML representation
are described here.
Copyright © 2004 Hartmut Liefke