Descriptors are typically stored as data entries of type JOEDataType.JOE_PAIR_DATA.
Every descriptor can be accessed by its name. To access the descriptor data
entries efficiently, they are stored in a dictionary keyed by the descriptor name.
Therefore, each descriptor can occur only once in a molecule.
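The uniqueness rule follows directly from the dictionary storage. The minimal sketch below uses a hypothetical DescriptorStore class (not JOELib's data container) built on java.util.LinkedHashMap: lookup goes straight to the entry by name, and adding a second entry with an existing name replaces the first one instead of creating a duplicate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a molecule's descriptor store (not JOELib's API):
 * entries are kept in a map keyed by descriptor name, so a name occurs once.
 */
class DescriptorStore {
    // LinkedHashMap keeps insertion order, e.g. for writing entries back out
    private final Map<String, String> entries = new LinkedHashMap<>();

    /** Adds an entry; returns the value it replaced, or null if the name was new. */
    String put(String name, String value) {
        return entries.put(name, value);
    }

    /** Direct lookup by name, without iterating over all entries. */
    String get(String name) {
        return entries.get(name);
    }

    int size() {
        return entries.size();
    }
}
```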
Example 3-1. Getting descriptor data entries
// assumes mol (the molecule), ps (the stream the SD file is written to)
// and genericData are declared elsewhere
// get an iterator over all data elements,
// including SSSR information and other data
GenericDataIterator gdit = mol.genericDataIterator();
while ( gdit.hasNext() )
{
    // get the next data element
    genericData = gdit.nextGenericData();
    // use only the data elements which contain descriptor
    // or user-defined data
    if ( genericData.getDataType() == JOEDataType.JOE_PAIR_DATA )
    {
        // write this descriptor data as a typical SD file data block:
        // first the header line with the descriptor name
        ps.printf( "> <%s>\n", genericData.getAttribute() );
        JOEPairData pairData = ( JOEPairData ) genericData;
        // then the data in SD format: no line longer than 80 characters,
        // and empty lines inside the data entry replaced by ?
        // or a character of your choice
        ps.println( pairData.toString( IOTypeHolder.instance().getIOType( "SDF" ) ) );
    }
}
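The comments above describe the layout of an SD data block. As a rough, hypothetical sketch (SdDataBlock is not the code behind JOEPairData.toString), one block could be formatted like this, assuming the 80-character limit and placeholder character mentioned in the comments. An empty line must not appear inside the data because, in the SD format, a blank line terminates the data block.

```java
/** Hypothetical helper, not JOELib's implementation: formats one SD data block. */
class SdDataBlock {
    static final int MAX_LINE = 80;

    static String format(String name, String value, char emptyLinePlaceholder) {
        StringBuilder sb = new StringBuilder();
        // header line with the data item name
        sb.append("> <").append(name).append(">\n");
        // split("\n") keeps interior empty lines but drops trailing ones
        for (String line : value.split("\n")) {
            if (line.isEmpty()) {
                // an empty line would end the data block early, so replace it
                sb.append(emptyLinePlaceholder).append('\n');
                continue;
            }
            // break long lines into chunks of at most MAX_LINE characters
            for (int start = 0; start < line.length(); start += MAX_LINE) {
                int end = Math.min(start + MAX_LINE, line.length());
                sb.append(line, start, end).append('\n');
            }
        }
        sb.append('\n'); // blank line terminating the data block
        return sb.toString();
    }
}
```

For example, format("NAME", "a\n\nb", '?') yields the header, the lines a, ? and b, and the terminating blank line.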
Example 3-2. Setting descriptor data entries
// add a user-defined data entry to the molecule
JOEPairData dp = new JOEPairData();
// the name of the data entry is given by the String 'attribute'
dp.setAttribute( attribute );
// the value is typically a String;
// user-defined types must provide fromString and toString methods
dp.setValue( dataEntry.toString() );
mol.addData( dp );
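Because user-defined value types must offer fromString and toString, the two methods should round-trip: parsing the output of toString must give back an equal value. A minimal sketch with a hypothetical Range type; the class name and its space-separated text format are assumptions for illustration only.

```java
/**
 * Hypothetical user-defined value type (not part of JOELib). Any type stored
 * as a descriptor value must convert itself to and from a String.
 */
class Range {
    final double min;
    final double max;

    Range(double min, double max) {
        this.min = min;
        this.max = max;
    }

    /** Parses the format written by toString(), e.g. "0.5 2.0". */
    static Range fromString(String s) {
        String[] parts = s.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected two numbers: " + s);
        }
        return new Range(Double.parseDouble(parts[0]), Double.parseDouble(parts[1]));
    }

    /** Writes both bounds separated by a single space. */
    @Override
    public String toString() {
        return min + " " + max;
    }
}
```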
A big advantage is that you can use descriptors calculated by other programs. If no calculation
routine exists in JOELib, all unknown descriptors (e.g. additional data elements in
SDF files) are handled as Strings.
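The fallback to Strings, and the option of adding your own parser, can be pictured as a table of parsers keyed by descriptor name. DescriptorParsers is a hypothetical sketch, not JOELib's descriptor definition mechanism: names with a registered parser become typed values, every other name keeps its raw String.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical parser registry (not JOELib's API): known descriptor names are
 * converted to typed values, unknown names stay plain Strings.
 */
class DescriptorParsers {
    private final Map<String, Function<String, Object>> parsers = new HashMap<>();

    /** Registers a parser for one descriptor name. */
    void register(String descriptorName, Function<String, Object> parser) {
        parsers.put(descriptorName, parser);
    }

    Object parse(String descriptorName, String raw) {
        Function<String, Object> parser = parsers.get(descriptorName);
        // no parser known for this descriptor: keep the raw String
        return parser == null ? raw : parser.apply(raw);
    }
}
```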
If you know the data type, you can simply define your own data parser/writer. All known
descriptors can be defined in
If you access data elements with mol.getData("DataName"), the data element
will be parsed automatically if its data type is known (e.g. atom or bond properties or matrices).
You can suppress data parsing by using mol.getData("DataName", false),
which can be useful if you do not need all data elements in parsed form (this should be faster).
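Lazy parsing of this kind can be sketched with two maps: one for the raw Strings and one caching parsed values. LazyDataStore, putRaw and parseCount are hypothetical; only the getData(name) / getData(name, false) calling pattern mirrors the text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical lazy data store (not JOELib's API): entries are kept as raw
 * Strings and parsed only on request; parsed values are cached.
 */
class LazyDataStore {
    private final Map<String, String> raw = new HashMap<>();
    private final Map<String, Object> parsed = new HashMap<>();
    private final Map<String, Function<String, Object>> parsers = new HashMap<>();
    int parseCount = 0; // number of parser calls, for illustration only

    /** Stores a raw value; parser may be null if the data type is unknown. */
    void putRaw(String name, String value, Function<String, Object> parser) {
        raw.put(name, value);
        parsed.remove(name); // drop a stale cached value
        if (parser == null) {
            parsers.remove(name);
        } else {
            parsers.put(name, parser);
        }
    }

    Object getData(String name) {
        return getData(name, true);
    }

    Object getData(String name, boolean parse) {
        String value = raw.get(name);
        Function<String, Object> parser = parsers.get(name);
        if (!parse || value == null || parser == null) {
            return value; // unparsed String, or null if absent
        }
        // parse once, then reuse the cached result
        return parsed.computeIfAbsent(name, n -> {
            parseCount++;
            return parser.apply(value);
        });
    }
}
```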
If you have special atom or bond properties, you should always implement the
classes which guarantee that you can access the data elements by the atom or bond index
used in JOELib.
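A property class addressed by atom index can be as small as an array wrapper with bounds checking. AtomDoubleProperty is hypothetical, and its 1-based atom numbering is an assumption; adapt the offset to the index convention of your molecule class.

```java
/**
 * Hypothetical atom property container (not JOELib's API): one double value
 * per atom, addressed by atom index. Assumes 1-based atom indices.
 */
class AtomDoubleProperty {
    private final double[] values;

    AtomDoubleProperty(int atomCount) {
        values = new double[atomCount];
    }

    void set(int atomIndex, double value) {
        values[checked(atomIndex) - 1] = value; // convert to 0-based array slot
    }

    double get(int atomIndex) {
        return values[checked(atomIndex) - 1];
    }

    /** Rejects indices that do not correspond to an atom. */
    private int checked(int atomIndex) {
        if (atomIndex < 1 || atomIndex > values.length) {
            throw new IndexOutOfBoundsException("no atom with index " + atomIndex);
        }
        return atomIndex;
    }
}
```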
All implemented result classes are available at
and contain simple
types like int or double, but also complex types like double arrays or int matrices.
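For complex result types, the String form may differ between file formats. The sketch below uses a hypothetical IOFormat enum in place of JOELib's IOType and a hypothetical DoubleArrayResult class; the separators (whitespace for SDF, semicolons for CSV so the array fits into one comma-separated cell) are assumptions.

```java
import java.util.Arrays;

/** Hypothetical stand-in for JOELib's IOType. */
enum IOFormat { SDF, CSV }

/**
 * Hypothetical double-array result (not JOELib's API) whose String form
 * depends on the target file format.
 */
class DoubleArrayResult {
    double[] values = new double[0];

    void fromString(IOFormat format, String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            values = new double[0];
            return;
        }
        // regex separators: semicolons for CSV, any whitespace for SDF
        String separator = format == IOFormat.CSV ? "\\s*;\\s*" : "\\s+";
        values = Arrays.stream(trimmed.split(separator))
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    String toString(IOFormat format) {
        String separator = format == IOFormat.CSV ? ";" : " ";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(separator);
            sb.append(values[i]);
        }
        return sb.toString();
    }
}
```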
If you want to use these data types in different file formats, you should add the required handling to the
fromString(IOType ioType, String sValue) and
All new descriptors should implement the
and be defined in the
A simple example is the Kier descriptor
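The general shape of a descriptor class can be sketched with a hypothetical interface and, as the example, the first-order Kier shape index kappa1 = A(A-1)^2 / (1P)^2, where A is the number of heavy atoms and 1P the number of paths of length one (bonds between heavy atoms). Neither SimpleDescriptor, MolGraph nor KappaOneSketch is JOELib's API, and it is an assumption that JOELib's Kier descriptor is exactly this index.

```java
/** Hypothetical descriptor contract (not JOELib's actual interface). */
interface SimpleDescriptor<R> {
    String name();

    R calculate(MolGraph mol);
}

/** Hypothetical minimal molecule view: heavy-atom count and heavy-atom bond count. */
final class MolGraph {
    final int heavyAtoms;
    final int heavyBonds;

    MolGraph(int heavyAtoms, int heavyBonds) {
        this.heavyAtoms = heavyAtoms;
        this.heavyBonds = heavyBonds;
    }
}

/** First-order Kier shape index: A (A - 1)^2 / (1P)^2. */
final class KappaOneSketch implements SimpleDescriptor<Double> {
    public String name() {
        return "kappa1_sketch"; // hypothetical descriptor name
    }

    public Double calculate(MolGraph mol) {
        double a = mol.heavyAtoms;
        double p1 = mol.heavyBonds;
        if (p1 == 0) {
            return 0.0; // undefined without bonds; hypothetical convention
        }
        return a * (a - 1) * (a - 1) / (p1 * p1);
    }
}
```

For a linear chain such as n-hexane (A = 6, 1P = 5) the index equals the atom count, 6; for cyclohexane (A = 6, 1P = 6) it is 150/36, about 4.17.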
If you have a group of similar descriptors which use the same initialization and result class,
you can write a wrapper class like
which can very easily be used to create many SMARTS pattern count descriptors, e.g.
to count the number of hydrogen bond donors in a molecule.
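The wrapper idea can be sketched without a SMARTS engine: one class holds the shared initialization (a name and a match rule) and produces the shared result type (an int count). AtomCountDescriptor and ToyAtom are hypothetical, and the atom predicate is only a stand-in for a real SMARTS match.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical toy atom: element symbol and number of attached hydrogens. */
final class ToyAtom {
    final String element;
    final int hydrogens;

    ToyAtom(String element, int hydrogens) {
        this.element = element;
        this.hydrogens = hydrogens;
    }
}

/**
 * Hypothetical wrapper (not JOELib's API): every count descriptor is just a
 * name plus a match rule; a real version would match a SMARTS pattern.
 */
final class AtomCountDescriptor {
    final String name;
    private final Predicate<ToyAtom> rule;

    AtomCountDescriptor(String name, Predicate<ToyAtom> rule) {
        this.name = name;
        this.rule = rule;
    }

    /** Counts the atoms matching the rule. */
    int calculate(List<ToyAtom> atoms) {
        int count = 0;
        for (ToyAtom atom : atoms) {
            if (rule.test(atom)) {
                count++;
            }
        }
        return count;
    }
}
```

A crude donor count could use the rule "N or O carrying at least one hydrogen"; for the heavy atoms of ethanolamine (C, C, OH, NH2) it returns 2. A new count descriptor then only needs a new name and rule, not a new class.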
To remain user- and developer-friendly, you should always produce a simple
set of documentation files (XML, HTML, RTF) in the
The easiest way would be to create an XML DocBook documentation file in the
These files can be easily transformed to HTML, RTF and PDF files.
If you want to structure these descriptor documentation files, you must use <sect1>...</sect1>
or <sect2>...</sect2>, because the <chapter> elements are already used by
the tutorial book. Furthermore, you can use list items, tables, or analogous elements. All these descriptor documentation files are
generated by the Ant build mechanism (calling ant tutorial) and are available as HTML and RTF files in the docs/tutorial/descriptors/documentation directory.