NEURAL NET TRAINING SPEED USING A MODIFIED ELLIOT SIGMOID
The R2012b Neural Network Toolbox (NN TBX) offers the option of replacing
the TANSIG activation function with the ELLIOT SIGMOID for faster training.
% http://www.mathworks.com/help/nnet/ug/...
% speedandmemoryoptimizations.html
%
% Fast Elliot Sigmoid
%
% Some simple computing hardware might not support the exponential function
% directly, and software implementations can be slow. The Elliot sigmoid
% elliotsig function performs the same role as the symmetric sigmoid tansig
% function, but avoids the exponential function.
%
% Here, the times to execute elliotsig and tansig are compared.
% elliotsig is approximately four times faster on the test system.
[Computation time comparison deleted; see the website.]
% However, while simulation is faster with elliotsig, training is not
% guaranteed to be faster, due to the different shapes of the two transfer
% functions. Here, 10 networks are each trained for tansig and elliotsig,
% but training times vary significantly even on the same problem with the
% same network.
[Training time comparison deleted; see the website.]
The Elliot Sigmoid has the shape
elliotsig(x) = x ./ (1 + abs(x) );
compared to
tansig(x) = 2 ./ (1 + exp(-2*x)) - 1;
The comparison plot shows that the S-shaped elliotsig approaches the
asymptotes [-1, +1] more slowly. Therefore, even though it is computed
faster, it generally takes more epochs (iterations) to converge.
In order to mitigate this effect, I compared tansig with
elliotsigs = (s*x) ./ ( 1 + abs(s*x) );
for
x = 3*randn(1,1e7); s = 1:10;
Using the above equations instead of calling the functions, I found that
the minimum RMSE between tansig and elliotsigs occurred when s = 4. In
particular,
RMSE4 = 0.1051
whereas
RMSE1 = 0.2285
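
The sweep over s can be sketched roughly as follows. This is not the
original script: the seed, the reduced sample size N, and the names
tansigf and elliotsf are my own assumptions, so the printed values will
differ somewhat from the figures quoted above.

```matlab
% Sketch: RMSE between tansig and the scaled Elliot sigmoid for s = 1:10.
% The closed-form expressions are used instead of the toolbox functions.
rng(0);                               % assumed seed, only for repeatability
N = 1e6;                              % assumed sample size (smaller than 1e7)
x = 3*randn(1, N);                    % inputs with standard deviation 3

tansigf  = @(x) 2 ./ (1 + exp(-2*x)) - 1;        % hypothetical helper name
elliotsf = @(x, s) (s*x) ./ (1 + abs(s*x));      % hypothetical helper name

s    = 1:10;
rmse = zeros(size(s));
yt   = tansigf(x);                    % reference output, computed once
for k = 1:numel(s)
    rmse(k) = sqrt(mean((yt - elliotsf(x, s(k))).^2));
end

[rmsemin, kmin] = min(rmse);          % smallest RMSE and its index
fprintf('minimum RMSE = %.4f at s = %d\n', rmsemin, s(kmin));
```

Note that the result depends on the input distribution; the factor 3 in x
is what was used here, and a different input spread may favor a different s.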
For an impressive visual comparison, plot tansig, elliotsig and elliotsig4 on
the same graph.
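
A minimal plotting sketch, assuming a plot range of [-5, 5] and line
styles of my own choosing:

```matlab
% Sketch: tansig, elliotsig and elliotsig4 on one set of axes.
x   = linspace(-5, 5, 1001);          % assumed plot range
yt  = 2 ./ (1 + exp(-2*x)) - 1;       % tansig
ye1 = x ./ (1 + abs(x));              % elliotsig   (s = 1)
ye4 = (4*x) ./ (1 + abs(4*x));        % elliotsig4  (s = 4)

plot(x, yt, 'k-', x, ye1, 'b--', x, ye4, 'r-.');
grid on
xlabel('x'); ylabel('y');
legend('tansig', 'elliotsig', 'elliotsig4', 'Location', 'SouthEast');
```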
A fast implementation of elliotsig4 is
elliotsig4(x) = x ./ (0.25 + abs(x) );
which is just as fast as the original elliotsig.
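
As a rough check, the two forms of elliotsig4 can be compared numerically
and timed against the tansig formula. The helper names f4a and f4b, the
sample size, and the repetition count are assumptions; tic/toc timings
are only indicative and will vary by machine.

```matlab
% Sketch: (4*x)./(1+abs(4*x)) versus x./(0.25+abs(x)), plus a crude timing.
x   = 3*randn(1, 1e6);                     % assumed test inputs
f4a = @(x) (4*x) ./ (1 + abs(4*x));        % scaled form (hypothetical name)
f4b = @(x) x ./ (0.25 + abs(x));           % fast form   (hypothetical name)

maxdiff = max(abs(f4a(x) - f4b(x)));       % should be at roundoff level
fprintf('max |difference| = %g\n', maxdiff);

nrep = 20;                                 % assumed repetition count
tic; for k = 1:nrep, y = 2 ./ (1 + exp(-2*x)) - 1; end; ttansig  = toc;
tic; for k = 1:nrep, y = x ./ (1 + abs(x));        end; telliot  = toc;
tic; for k = 1:nrep, y = x ./ (0.25 + abs(x));     end; telliot4 = toc;
fprintf('tansig %.3f s, elliotsig %.3f s, elliotsig4 %.3f s\n', ...
        ttansig, telliot, telliot4);
```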
To make a fair training comparison, I created a MATLAB-type elliotsig4
function by modifying tansig. The resulting 225 lines of code will mask the
basic difference in computation times of the two formulae.
I will post results later.
Hope this helps.
Greg
