Jörg Kurt Wegner and Holger Fröhlich and Andreas Zell

Feature Selection for Descriptor Based Classification Models. 1. Theory and GA-SEC Algorithm

J. Chem. Inf. Comput. Sci. 2004, 44, pp. 921-930


Abstract

The paper describes different aspects of classification models based on molecular data sets with the focus on feature selection methods. Especially model quality and avoiding a high variance on unseen data (overfitting) will be discussed with respect to the feature selection problem. We present several standard approaches and modifications of our Genetic Algorithm based on the Shannon Entropy Cliques (GA-SEC) algorithm and the extension for classification problems using boosting.



Bibtex

@Article{wfz04a,
  author   =     "J. K. Wegner and H. Fr{\"{o}}hlich and A. Zell",
  title    =     "{F}eature selection for {D}escriptor based {C}lassification {M}odels. 1. {T}heory and {GA}-{SEC} {A}lgorithm",
  abstract =     "The paper describes different aspects of classification models based on molecular data sets
                  with the focus on feature selection methods. Especially model quality and avoiding a high
                  variance on unseen data (overfitting) will be discussed with respect to the feature
                  selection problem. We present several standard approaches and modifications of our
                  Genetic Algorithm based on the Shannon Entropy Cliques (GA-SEC) algorithm and the
                  extension for classification problems using boosting.",
  journal  =     "J. Chem. Inf. Comput. Sci.",
  volume   =     "44",
  year     =     "2004",
  pages    =     "921--930",
  url      =     "http://dx.doi.org/10.1021/ci0342324",
  doi      =     "10.1021/ci0342324",
  note     =     "",
  contents =     "model quality, feature selection, filter approach, wrapper approach, combinatorial optimization, genetic algorithm, bias-variance-decomposition, R{\'e}nyi entropy, Shannon entropy, Jensen-Shannon entropy, differential Shannon entropy, clique detection, boosting, decision tree, recursive partitioning",
  topics   =     "model quality, feature selection, filter approach, wrapper approach, combinatorial optimization, genetic algorithm, bias-variance-decomposition, R{\'e}nyi entropy, Shannon entropy, Jensen-Shannon entropy, differential Shannon entropy, clique detection, boosting, decision tree, recursive partitioning",
}