Homepage Alexander Schatten - Software: XML Indexer
Vienna University of Technology, Faculty for Informatics, Institute for Software Technology and Interactive Systems

XML Indexer

Highlights

XMLIndexer is a command-line Java tool. Some highlights:

  • Java based: platform independent.
  • Creates an index as an XML file from arbitrary XML files.
  • Batch conversion of multiple files is possible.
  • The format of the output index is configurable using XSLT.
  • Exclusion files can be defined that list words which must not appear in the index.

Installation

Binary Distribution

Installation is simple: unzip the archive into an arbitrary directory. The jar file contains all necessary classes, including the XML libraries. A Java 2 virtual machine is required, so please install an appropriate version, e.g. the J2RE from Sun Microsystems.

Start XMLIndexer with the following command, run from the directory where you unpacked the xmlindexer.jar file:

java -cp xmlindexer.jar info.schatten.xmlindexer.XmlIndexer config-filename.xml

Source Distribution

The source distribution contains all Java sources and JBuilder project files as well as Apache Ant build files, so you are free to choose your development tools. Unzip the archive into an arbitrary folder.

The source distribution does not contain the XML libraries, so you have to download and install the JDOM libraries as well.

Downloads

XMLIndexer is available in two forms: a binary distribution and a source distribution.

How to Modify Config File

Steps...

To build indices you have to write an XML config file defining all parameters for index generation. The generation is performed in two steps:

  1. XMLIndexer builds the index (in memory) and generates an XML representation of the result.
  2. This result can be written to an XML file (the default output) or transformed with an XSL stylesheet.

Config File: Main Tags

The root tag is XMLINDEXER, so the config file has to start with this tag and end with its closing tag. Inside this root tag there are three main tags:

ENCODING: Define the XML encoding of the output index here.
EXCLUSION: This tag holds the (optional) filenames of exclusion files. These are simple text files listing the words to exclude from the index, one word per line. Two files are included in the distribution, "deutsch.txt" and "english.txt"; they hold the sets of German and English words I use for exclusion.
INDICES: This is the main section. Inside it you define the indices to be generated.
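
As a rough illustration, a config file skeleton with the three main tags might look like the sketch below. The tag names come from the description above; everything else is an assumption: that values are given as element content, the encoding value UTF-8, and the way a filename is placed inside EXCLUSION. The testindex.xml file shipped with the distribution shows the actual syntax.

```xml
<?xml version="1.0"?>
<!-- Sketch only: nesting and value placement are assumptions, check testindex.xml -->
<XMLINDEXER>
  <!-- Encoding of the generated index (value assumed) -->
  <ENCODING>UTF-8</ENCODING>
  <!-- Optional: filename of an exclusion file such as english.txt;
       how several filenames are listed is not shown in this sketch -->
  <EXCLUSION>english.txt</EXCLUSION>
  <!-- One Index definition per index to be generated goes here -->
  <INDICES>
  </INDICES>
</XMLINDEXER>
```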

Config File: Define Index Generation: inside INDICES

Inside the INDICES tag you write the definitions of all indices to be generated. Each index definition is placed inside an Index tag, in which you can define the following parameters:

xmlSourceFilename: Filename of the XML source file from which the index is generated.
indexTag: The tag in the XML source file that is indexed. Text inside this tag, and also inside its child tags, is indexed.
referenceTag: Also a tag in the XML source file. This tag must be a parent tag of indexTag. One attribute of this tag is used as the reference for the index words generated from indexTag.
referenceAttribute: An attribute of the referenceTag. Its value is used as the reference for the index words.
xmlOutputFilename: Filename of the XML output file that holds the index.
xslOutputFilename: All tags above are required; this tag is optional. If this field is left empty, the default output is generated. If you enter the filename of an XSL file here, the default output is transformed into an arbitrary output format using Xalan and your XSLT script.
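
To make the relationship between indexTag, referenceTag and referenceAttribute concrete, here is a small sketch. The source document, its tag names (book, chapter, para), the attribute id and the filenames book.xml and book-index.xml are all hypothetical. Assume a source file like this:

```xml
<!-- Hypothetical source file book.xml -->
<book>
  <chapter id="ch1">
    <para>XSLT transforms XML documents.</para>
  </chapter>
</book>
```

An index definition for it, placed inside INDICES, might then look as follows. Writing the parameters as child elements of Index is an assumption; the testindex.xml example shows the real form.

```xml
<!-- Sketch only: parameter names are from the text, their markup is assumed -->
<Index>
  <xmlSourceFilename>book.xml</xmlSourceFilename>
  <!-- Text inside para (and its children) is indexed -->
  <indexTag>para</indexTag>
  <!-- chapter is a parent of para; its id attribute becomes the reference -->
  <referenceTag>chapter</referenceTag>
  <referenceAttribute>id</referenceAttribute>
  <xmlOutputFilename>book-index.xml</xmlOutputFilename>
  <!-- Left empty: the default output is generated -->
  <xslOutputFilename></xslOutputFilename>
</Index>
```

With this definition, the words found in the para element would be indexed with the reference value ch1, taken from the id attribute of the enclosing chapter.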

The principles are easy: please take a look at the example testindex.xml file and, if you are interested, also at the xsl directory, where you can find an XSLT example of how to customize the output. Simply try it!

Contact

If you have bug reports or suggestions, or if you make significant further developments to XMLIndexer, please send me an email.

 
Last changed: see release info. (c) by Alexander Schatten