- 5.1 Installation and Test
- 5.2 Getting Started

- 5.3 Advanced Usage
- 5.3.1 Adjusting Parameters
- 5.3.2 Network Design
- 5.3.3 Understanding the Error-value
- 5.3.4 Training and Testing
- 5.3.5 Avoid Over-fitting
- 5.3.6 Adjusting Parameters During Training

- 5.4 Fixed Point Usage

5 User's Guide

The ``Fixed Point Usage'' section 5.4 is only intended for users who need to run the ANN on a computer without a floating point processor, such as e.g. an iPAQ.

5.1 Installation and Test

In order to compile and test the library, go to the `src` directory and type `make runtest`. This will compile the library and run a couple of tests. An example of the output from this run is shown in appendix A.1. The output is quite verbose, but everything should be fine as long as the string ``Test failed'' does not appear in any of the last five lines.

If the test succeeds, the following libraries should be ready for use:

- `libfloatfann.a` - The standard floating point library.
- `libdebugfloatfann.a` - The standard floating point library, with debug output.
- `libdoublefann.a` - The floating point library with double precision floats.
- `libdebugdoublefann.a` - The floating point library with double precision floats and debug output.
- `libfixedfann.a` - The fixed point library.
- `libdebugfixedfann.a` - The fixed point library with debug output.

These libraries can either be used directly from this directory, or installed in another directory such as e.g. `/usr/lib/`.

5.2 Getting Started

There are several reasons why it is usually a good idea to write the training and the execution as two different programs, but the most obvious is the fact that a typical ANN system is only trained once, while it is executed many times.

Four functions are used in this program, and often these are the only four functions you will need when you train an ANN. I will now explain how each of these functions works.

**fann_create** - Creates the ANN, given a connection rate (1 for a fully connected network), a learning rate (0.7 is a reasonable default) and a parameter telling how many layers the network should consist of (including the input and output layer). After this parameter follows one parameter for each layer (starting with the input layer), telling how many neurons there should be in that layer.
**fann_train_on_file** - Trains the ANN for a maximum of `max_epochs` epochs, or until the mean square error is lower than `desired_error`. A status line is written every `epochs_between_reports` epochs.
**fann_save** - Saves the ANN to a file.
**fann_destroy** - Destroys the ANN and deallocates the memory it uses.

The configuration file saved by `fann_save` contains all information needed in order to recreate the network. For more specific information about how it is stored please look in the source code.

Figure 11 shows a simple program which executes a single input on the ANN; the output from this program can be seen in figure 12. The program introduces two new functions which were not used in the training procedure, and it also introduces the `fann_type` type. I will now explain the two functions and the type:

**fann_create_from_file** - Creates the network from a configuration file, which has earlier been saved by the training program in figure 9.
**fann_run** - Executes the input on the ANN and returns the output from the ANN.
**fann_type** - The type used internally by the fann library. This type is `float` when including `floatfann.h`, `double` when including `doublefann.h` and `int` when including `fixedfann.h`. For further info on `fixedfann.h`, see section 5.4.

The six functions and one type described in these two sections are all you will need in order to use the fann library. However, if you would like to exploit its full potential, I suggest that you read the ``Advanced Usage'' section and preferably the rest of this report.

5.3 Advanced Usage

I will describe four different procedures which can help you get more out of the fann library: ``Adjusting Parameters'', ``Network Design'', ``Understanding the Error-value'' and ``Training and Testing''.

5.3.1 Adjusting Parameters

The learning rate, as described in equation 2.11, is one of the most important parameters, but unfortunately it is also a parameter which is hard to find a reasonable default for. I have several times ended up using 0.7, but it is a good idea to test several different learning rates when training a network. The learning rate can be set when creating the network, but it can also be set by the `fann_set_learning_rate(struct fann *ann, float learning_rate)` function.

The initial weights are random values between -0.1 and 0.1. If other weights are preferred, they can be set with the `void fann_randomize_weights(struct fann *ann, fann_type min_weight, fann_type max_weight)` function.

The standard activation function is the sigmoid activation function, but it is also possible to use the threshold activation function. I hope to add more activation functions in the future, but for now these will do. The two activation functions are defined as `FANN_SIGMOID` and `FANN_THRESHOLD` and are chosen by the two functions:

`void fann_set_activation_function_hidden(struct fann *ann, unsigned int activation_function)`

`void fann_set_activation_function_output(struct fann *ann, unsigned int activation_function)`

These two functions set the activation function for the hidden layers and for the output layer. Likewise the steepness parameter used in the sigmoid function can be adjusted by these two functions:

`void fann_set_activation_hidden_steepness(struct fann *ann, fann_type steepness)`

`void fann_set_activation_output_steepness(struct fann *ann, fann_type steepness)`

I have chosen to distinguish between the hidden layers and the output layer to allow more flexibility. This is especially a good idea for users who want discrete output from the network, since they can set the activation function for the output layer to threshold. Please note that it is not possible to train a network while using the threshold activation function, since it is not differentiable. For more information about activation functions, please see section 2.2.1.

5.3.2 Network Design

The number of hidden layers is also important. Generally speaking, if the problem is simple it is often enough to have one or two hidden layers, but as the problems get more complex, so does the need for more layers.

One way of getting a large network which is not too complex is to adjust the `connection_rate` parameter given to `fann_create`. If this parameter is 0.5, the constructed network will have the same number of neurons, but only half as many connections. It is difficult to say which problems this approach is useful for, but if you have a problem which can be solved by a fully connected network, it is a good idea to see whether it still works after removing half of the connections.

5.3.3 Understanding the Error-value

If d is the desired output of an output neuron and y is the actual output of the neuron, the square error is (d - y)^2. If two output neurons exist, then the mean square error for these two neurons is the average of the two square errors.

When training with the `fann_train_on_file` function, an error value is printed. This error value is the mean square error for all the training data, i.e. the average of the square errors over all the training pairs.

5.3.4 Training and Testing

The internals of the `fann_train_on_file` function are shown in a simplified form in figure 13. This piece of code introduces the `void fann_train(struct fann *ann, fann_type *input, fann_type *desired_output)` function, which trains the ANN for one iteration with one pair of inputs and outputs, and also updates the mean square error. The `fann_train_data` structure is also introduced; it is a container for the training data in the file described in figure 10. The structure can be used to train the ANN, but it can also be used to test the ANN with data it has not been trained with.

Figure 14 shows how the mean square error for a test file can be calculated. This piece of code introduces another useful function: `fann_type *fann_test(struct fann *ann, fann_type *input, fann_type *desired_output)`. This function takes an input array and a desired output array as parameters and returns the calculated output. It also updates the mean square error.

The threshold activation function is faster than the sigmoid function, but since it is not possible to train with this function, I will suggest another approach:

While training the ANN you could slightly increase the steepness parameter of the sigmoid function. This would make the sigmoid function steeper and more like the threshold function. After this training session you could set the activation function to the threshold function, and the ANN would still work with this activation function. This approach will not work for all kinds of problems, but I have successfully tested it on the XOR function. The source code for this can be seen in appendix B.2.3.

5.4 Fixed Point Usage

The decimal point returned from the function indicates how many bits are used for the fractional part of the fixed point numbers. If this number is negative, there will most likely be integer overflow when running the library with fixed point numbers, and this should be avoided. Furthermore, if the decimal point is too low (e.g. lower than 5), it is probably not a good idea to use the fixed point version.

Please note that the inputs to networks that are to be used in fixed point should be between -1 and 1.

An example of a program written to support training in both fixed point and floating point numbers is given in appendix B.2.1, `xor_train.c`.

To help with using fixed point numbers, another function is provided: `unsigned int fann_get_decimal_point(struct fann *ann)`, which returns the decimal point. The decimal point is the position dividing the integer and fractional parts of the fixed point number, and it is useful when doing operations on the fixed point inputs and outputs.

For an example of a program written to support both fixed point and floating point numbers please see `xor_test.c` in appendix B.2.2.

Steffen Nissen 2003-11-07