Datasets used for the Evaluation of the Applicability Domain

This page provides access to the datasets used in the article

Nikolas Fechner, Andreas Jahn, Georg Hinselmann and Andreas Zell
Estimation of the Applicability Domain of Kernel-based Machine Learning Models for Virtual Screening
Journal of Cheminformatics, 2010, 2:2
DOI:10.1186/1758-2946-2-2

The SVMs were trained on the following SD files. Some of the files had to be preprocessed and therefore differ from the versions provided by cheminformatics.org. To allow a fair comparison of future research with this paper, we provide exactly the SD files used in the article.
We are very grateful to the original authors of the datasets for granting us permission to make these files publicly available. Feel free to use these data for your own research, but please cite the original publications as well.

Training Sets

Factor Xa

Training set as SD file consisting of 290 benzamidine-type molecules annotated with pKi values in the ki SD tag. The compounds were originally published by

Fabien Fontaine, Manuel Pastor, Ismael Zamora and Ferran Sanz
Anchor-GRIND: Filling the Gap between Standard 3D QSAR and the GRid-INdependent Descriptors
Journal of Medicinal Chemistry, 2005, 48(7), pp 2687-2694
DOI:10.1021/jm049113+

PDGFRβ

Training set as SD file consisting of 79 piperazinylquinazoline analogues annotated with pIC50 values in the Activity SD tag. The compounds were originally published by

Rajarshi Guha and Peter C. Jurs
Development of Linear, Ensemble and Nonlinear Models for the Prediction and Interpretation of the Biological Activity of a Set of PDGFR Inhibitors
Journal of Chemical Information and Computer Sciences, 2004, 44(6), pp 2179-2189
DOI:10.1021/ci049849f

Thrombin

Training set as SD file consisting of 88 benzamidine-type molecules annotated with pKi values in the Activity SD tag. The compounds were originally published by

Markus Böhm, Jörg Stürzebecher and Gerhard Klebe
Three-Dimensional Quantitative Structure-Activity Relationship Analyses Using Comparative Molecular Field Analysis and Comparative Molecular Similarity Indices Analysis To Elucidate Selectivity Differences of Inhibitors Binding to Trypsin, Thrombin, and Factor Xa
Journal of Medicinal Chemistry, 1999, 42(3), pp 458-477
DOI:10.1021/jm981062r
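
The activity annotations described above (the ki tag for Factor Xa, the Activity tag for PDGFRβ and Thrombin) can be read without a cheminformatics toolkit. The following is a minimal sketch in plain Python, assuming the files follow the usual SD layout in which each record's data items come after the "M  END" line and the record is closed by a "$$$$" line. The function names and the two-record SAMPLE text are hypothetical and only serve as illustration; the values in SAMPLE are made up and are not taken from the datasets.

```python
import re
from typing import Dict, List

# Data header lines look like "> <ki>" or ">  <Activity> (1)".
HEADER = re.compile(r"^>[^<]*<([^<>]+)>")


def parse_sd_tags(sd_text: str) -> List[Dict[str, str]]:
    """Return one {tag: value} dict per record terminated by '$$$$'."""
    records: List[Dict[str, str]] = []
    tags: Dict[str, str] = {}
    name = None        # tag whose value lines are currently being read
    in_data = False    # True once the molblock has ended ("M  END")
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            records.append(tags)
            tags, name, in_data = {}, None, False
            continue
        if line.startswith("M  END"):
            in_data = True
            continue
        if not in_data:
            continue   # still inside the molblock, ignore
        match = HEADER.match(line)
        if match:
            name = match.group(1)
            tags[name] = ""
        elif name is not None:
            if line.strip():
                tags[name] = (tags[name] + "\n" + line.strip()).strip()
            else:
                name = None   # a blank line ends the data item
    return records


def activities_from_sd(sd_text: str, tag: str) -> List[float]:
    """Collect the numeric value stored under `tag` for every record."""
    return [float(r[tag]) for r in parse_sd_tags(sd_text) if r.get(tag)]


# Made-up two-record example; NOT data from the published files.
SAMPLE = """mol1

comment
  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ki>
7.5

$$$$
mol2

comment
  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ki>
6.25

$$$$
"""

if __name__ == "__main__":
    print(activities_from_sd(SAMPLE, "ki"))
```

For one of the real files, the text would be read from disk first and the tag name passed as given in the descriptions above, for example "Activity" for the Thrombin set. A trailing record that lacks its closing "$$$$" line is ignored by this sketch.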

Screening Sets

The QSAR models were used to rank the respective data sets from the DUD LIB VS 1.0. This compilation was originally published by

Andreas Jahn, Georg Hinselmann, Nikolas Fechner and Andreas Zell
Optimal Assignment Methods for Ligand-based Virtual Screening
Journal of Cheminformatics, 2009, 1:14
DOI:10.1186/1758-2946-1-14


Contact:

Nikolas Fechner
Andreas Jahn