I recall reading somewhere that a combination of non-linear sigmoid hidden-unit activation functions and linear output units can be shown to reproduce any function mapping. Does anyone have a reference/cite for this? Cheers, Justin