Technical Overview

XMill is based on a grouping technique: it groups strings with similar syntax and meaning and compresses them together. This method is a generalization of column-wise compression in relational databases, where the values of one column tend to resemble each other and therefore compress well as a unit. The user can control the grouping by specifying generalized regular expressions.
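The grouping idea can be illustrated with a small Python sketch under simplifying assumptions. It walks an XML document with the standard `xml.etree.ElementTree` module and puts each element's text into a container. The container is chosen by matching the element's tag path against Python regular expressions, which stand in for XMill's generalized regular expressions (whose syntax is not reproduced here). Every container is then compressed as a separate zlib stream. The function names `group_by_path` and `compress_containers` are hypothetical. Unlike XMill, the sketch discards the document structure, attributes, and mixed-content tail text, so it is lossy. It only shows the grouping step: all years end up in one container and all titles in another, much like the values of a database column.

```python
import re
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict
from collections.abc import Sequence


def group_by_path(xml_text: str, rules: Sequence[tuple[str, str]] = ()) -> dict[str, list[str]]:
    """Collect element text into containers (hypothetical helper, not XMill's API).

    Each element's text goes into the container of the first (name, regex)
    rule whose regex fully matches the element's tag path (e.g. '/lib/book/year');
    if no rule matches, the path itself names the container.
    Attributes and mixed-content tail text are ignored in this sketch.
    """
    containers: dict[str, list[str]] = defaultdict(list)

    def container_for(path: str) -> str:
        for name, pattern in rules:
            if re.fullmatch(pattern, path):
                return name
        return path

    def walk(elem: ET.Element, path: str) -> None:
        path = f"{path}/{elem.tag}"
        if elem.text and elem.text.strip():
            containers[container_for(path)].append(elem.text.strip())
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return dict(containers)


def compress_containers(containers: dict[str, list[str]]) -> dict[str, bytes]:
    """Compress every container as its own zlib stream.

    Values are joined with NUL, which cannot occur in XML 1.0 text,
    so the join is unambiguous.
    """
    return {
        name: zlib.compress(b"\x00".join(v.encode("utf-8") for v in values))
        for name, values in containers.items()
    }


if __name__ == "__main__":
    doc = ("<lib><book><title>A</title><year>1999</year></book>"
           "<paper><title>B</title><date>2004</date></paper></lib>")
    print(group_by_path(doc, [("dates", r".*/(year|date)"), ("titles", r".*/title")]))
    # {'titles': ['A', 'B'], 'dates': ['1999', '2004']}
```

Without any rules, each distinct tag path (such as `/lib/book/year` and `/lib/paper/date`) gets its own container; the rules let differently named elements with similar content share one.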

One of the central features of XMill is the semantic compression of text items through semantic compressors. Semantic compressors take advantage of the specific syntactic and semantic properties of the text strings in XML files. The user can either specify pre-defined semantic compressors (such as those for integers or for dictionary encoding) or implement their own. The extensibility of XMill with user-defined semantic compressors is particularly useful for domain-specific compression techniques, such as those for DNA sequences or three-dimensional air-traffic coordinates.
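As an illustration, the Python sketch below shows what a pluggable semantic compressor could look like. The `SemanticCompressor` interface, both subclasses, and the varint helpers are hypothetical, not XMill's actual API. The integer compressor assumes its container holds only canonical non-negative decimal integers and stores them as LEB128-style variable-length binary integers instead of digit strings. The dictionary compressor stores each distinct string once and replaces every occurrence by an index into that table.

```python
from abc import ABC, abstractmethod


def _put_varint(out: bytearray, n: int) -> None:
    """Append n as an unsigned LEB128 varint: 7 bits per byte, high bit = 'more'."""
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return


def _get_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Read one varint starting at pos; return (value, position after it)."""
    n = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        n |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return n, pos
        shift += 7


class SemanticCompressor(ABC):
    """Hypothetical interface: encodes one container of strings to bytes and back."""

    @abstractmethod
    def encode(self, values: list[str]) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> list[str]: ...


class UnsignedIntCompressor(SemanticCompressor):
    """Stores canonical non-negative decimal integers as binary varints."""

    def encode(self, values: list[str]) -> bytes:
        out = bytearray()
        for v in values:
            n = int(v)
            if n < 0 or str(n) != v:  # reject '007', ' 5', '-1': they would not round-trip
                raise ValueError(f"not a canonical unsigned integer: {v!r}")
            _put_varint(out, n)
        return bytes(out)

    def decode(self, data: bytes) -> list[str]:
        values, pos = [], 0
        while pos < len(data):
            n, pos = _get_varint(data, pos)
            values.append(str(n))
        return values


class DictionaryCompressor(SemanticCompressor):
    """Stores each distinct string once; values become varint indices into that table."""

    def encode(self, values: list[str]) -> bytes:
        table = list(dict.fromkeys(values))  # distinct values, first-seen order
        index = {s: i for i, s in enumerate(table)}
        out = bytearray()
        _put_varint(out, len(table))
        for s in table:
            raw = s.encode("utf-8")
            _put_varint(out, len(raw))  # length prefix, then the UTF-8 bytes
            out += raw
        for v in values:
            _put_varint(out, index[v])
        return bytes(out)

    def decode(self, data: bytes) -> list[str]:
        count, pos = _get_varint(data, 0)
        table = []
        for _ in range(count):
            length, pos = _get_varint(data, pos)
            table.append(data[pos:pos + length].decode("utf-8"))
            pos += length
        values = []
        while pos < len(data):
            i, pos = _get_varint(data, pos)
            values.append(table[i])
        return values
```

In this sketch, a domain-specific compressor would simply be one more subclass; for example, DNA sequences over the alphabet A, C, G, T could be packed at two bits per base. Rejecting non-canonical integers such as `007` keeps decoding exact instead of silently turning them into `7`.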

A more detailed technical description of XMill can be found in the technical report MS-CIS-99-26. A description and the syntax of the pre-defined user compressors can be found in the user manual.


Copyright © 2004 Hartmut Liefke