Since the library should be written in ANSI C, the API needs to be function based, but there can still be an object oriented thought behind it.
I will use an ANN structure, which can be allocated by a constructor and deallocated by a destructor. This structure should be given as the first argument to all functions which operate on the ANN, to mimic an object oriented approach.
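As a minimal sketch of this calling convention (the concrete function names and signatures here are illustrative, not a definitive listing of the library's API):

    struct fann;                              /* the ANN structure (opaque)      */

    struct fann *fann_create(/* ... */);      /* constructor: allocates the ANN  */
    void fann_destroy(struct fann *ann);      /* destructor: deallocates the ANN */

    /* every operation takes the ANN as its first argument, like a hidden
       `this' pointer in C++ */
    fann_type *fann_run(struct fann *ann, fann_type *input);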
The ANN should have three different methods of storing the internal weights: float, double and int, where float and double are standard floating point representations and int is the fixed point representation. In order to give the compiler the best possible opportunity to optimize the code, this distinction should be made at compile-time. This will produce several different libraries and require that the person using the library includes a header file which is specific to the chosen method. Although there is this distinction between which header file is included, it should still be easy to write code which could compile with all three header files. For this purpose I have invented a fann_type, which is defined in the three header files as float, double and int respectively.
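In code the idea amounts to something like this (the guard macro names are illustrative):

    #if defined(FIXEDFANN)
    typedef int fann_type;      /* fixed point representation  */
    #elif defined(DOUBLEFANN)
    typedef double fann_type;   /* double precision            */
    #else
    typedef float fann_type;    /* single precision            */
    #endif

Code written against fann_type, such as a declaration fann_type weights[10];, then compiles unchanged against all three headers.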
It should be possible to save the network in standard floating point representation and in fixed point representation (more on this in section 4.4).
Furthermore there should be a structure which can hold training data. This structure should, like the net itself, be loadable from a file. The structure of the file should be fairly simple, making it easy to export a file in this format from another program.
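Purely as an illustration (the concrete format is described later and may differ), a layout in this spirit could state the number of training pairs and the input/output counts on a header line, followed by alternating lines of inputs and desired outputs:

    4 2 1        (4 training pairs, 2 inputs, 1 output)
    0 0          (inputs of the first pair)
    0            (desired output of the first pair)
    0 1
    1
    1 0
    1
    1 1
    0

A format this simple is trivial to generate from another program or a spreadsheet.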
I will leave the rest of the API details to section 5 ``User's Guide''.
The data structures should be structured in three levels: a level containing the whole ANN, a level containing the layers and a level containing the neurons and connections.
With this three level structure, I will suggest two different implementations. The first is centered around the connections and the second is centered around the neurons.
In the connection centered architecture, the connection layers represent all the connections between two layers, while the neurons themselves are only basic fann_type's. The main advantage of this structure is that one simple loop can run through all the connections between two layers. If these connections are allocated in one long array, the array can be processed completely sequentially. Since there are far fewer neurons than there are connections, the neurons do not take up much space. This architecture is shown in figure 6.
This was the first architecture which I implemented, but after testing it against Lightweight Neural Network version 0.3 [van Rossum, 2003], I noticed that this library was more than twice as fast as mine.
The question is now: why was the connection centered architecture not as fast as it should have been? I think the main reason was that the connection, which is the most numerous structure in an ANN, was implemented as a struct with two pointers and a weight. This took up more space than needed, which meant poor cache performance. It also meant that the inner loop constantly had to dereference pointers whose targets the compiler could not know in advance, although they were actually accessing the neurons in sequential order.
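A sketch of roughly what such a connection struct looks like (the field names are mine):

    struct fann_connection
    {
        struct fann_neuron *from_neuron;  /* source neuron in the previous layer */
        struct fann_neuron *to_neuron;    /* destination neuron                  */
        fann_type weight;
    };

On a 32 bit machine with a 4 byte fann_type this is 12 bytes per connection, where the weight alone would only be 4, and every iteration of the inner loop chases a pointer whose target cannot be predicted.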
Figure 8 illustrates how the architecture is connected at the third level. The neurons, the weights and the connections are allocated in three long arrays. A single neuron consists of a neuron value, two pointers and a count of the neurons connected to it in the previous layer. The first pointer points at the position in the weight array which contains the first weight for the connections into the neuron. Likewise, the second pointer points to the neuron's first entry in the connection array. A connection is simply a pointer to the neuron, in the previous layer, which is connected to the neuron in question.
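In C this could look roughly as follows (a sketch based on the description above, not the library's literal declaration):

    struct fann_neuron
    {
        fann_type value;                         /* the neuron value                    */
        fann_type *weights;                      /* first weight in the weight array    */
        struct fann_neuron **connected_neurons;  /* first entry in the connection array */
        unsigned int num_connections;            /* connections from the previous layer */
    };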
This architecture is more cache friendly than the connection centered architecture. When calculating the input to a neuron, the weights, the connections and the input neurons are all processed sequentially. The value of the neuron itself is accumulated in a local variable, which makes for much faster access. Furthermore, the weights and the connections are processed completely sequentially throughout the whole algorithm. When the ANN is fully connected, the array with the connected neurons is redundant, and the algorithms can use this to increase speed.
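The inner loop this allows is very tight; a sketch of summing the input to one neuron (in the fixed point version the multiplication would additionally shift by the decimal point, see section 4.4):

    unsigned int i;
    fann_type sum = 0;

    /* weights and connected neurons are walked strictly sequentially,
       and the running sum lives in a local variable */
    for (i = 0; i < neuron->num_connections; i++)
        sum += neuron->weights[i] * neuron->connected_neurons[i]->value;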
These algorithms are, however, not the only algorithms needed in this library.
The algorithm which creates an ANN with a certain density (the connection rate) has a number of requirements which it must comply with.
The algorithm that I have constructed to do this is illustrated in algorithm 1. Because some connections should always be in place, the actual connection rate may differ from the connection rate parameter given.
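Algorithm 1 itself is not reproduced here, but as a rough sketch of the idea (the connect() helper is hypothetical), one way to guarantee that every neuron receives at least one inbound connection while roughly honouring the requested rate is:

    #include <stdlib.h>

    void sparse_connect(unsigned int num_in, unsigned int num_out,
                        float connection_rate,
                        void (*connect)(unsigned int in, unsigned int out))
    {
        unsigned int in, out, guaranteed;

        for (out = 0; out < num_out; out++)
        {
            /* every neuron keeps at least one inbound connection */
            guaranteed = (unsigned int)(rand() % (int)num_in);
            for (in = 0; in < num_in; in++)
                if (in == guaranteed
                    || (float)rand() / RAND_MAX < connection_rate)
                    connect(in, out);
        }
    }

The guaranteed connections are exactly what can push the actual rate away from the requested one.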
This optimization has not been implemented because the activation function is only calculated once for each neuron and not once for each connection. The cost of the activation function becomes insignificant compared to the cost of the sum function, if fully connected ANNs become large enough.
The activation functions that I have chosen to implement are the threshold function and the sigmoid function.
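In their standard textbook forms these are:

\[
\mathrm{threshold}(x) = \begin{cases} 0 & x < 0 \\ 1 & x \geq 0 \end{cases}
\qquad\qquad
\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}
\]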
The general idea behind all of these suggestions is the fact that you can calculate the maximum value that you will ever get as input to the activation function. This is done by assuming that the input on each connection into a neuron is the worst possible, and then calculating how high a value you could get.
Since every input internal to the network is the output of another activation function, you always know which value is the worst possible. The external inputs, on the other hand, are not controlled in any way. In order to ensure that an integer overflow can never occur, you have to put constraints on the inputs. There are several different kinds of constraints you could put on the inputs, but I think that it would be beneficial to put the same constraints on the inputs as on the outputs, implying that the input should be between zero and one. Actually this constraint can be relaxed to allow inputs between minus one and one.
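With every input $x_i$ confined to $[-1, 1]$, the worst case input to a neuron's activation function is easy to bound by its weights alone:

\[
\Bigl| \sum_i w_i x_i \Bigr| \;\leq\; \sum_i |w_i| \qquad \text{when } |x_i| \leq 1
\]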
I now have full control of all the variables, and I will have to choose an implementation method for ensuring that integer overflow will not occur. There are two questions here which need to be answered. The first is how the decimal point should be handled, but I will come to this in section 4.4.1.
The second is when and how the check should be made. An obvious choice would be to let the fixed point library make the check, as this library needs the fixed point functionality. However, this presents a problem, because the fixed point library is often run on some kind of portable or embedded device, while the floating point library is often run on a standard workstation. This fact suggests that it would be useful to let the floating point library do all the hard work, and simply let the fixed point library read a configuration file saved by the floating point library.
These choices make for a model where the floating point library trains the ANN, then checks for the possibility of integer overflow and saves the ANN in a fixed point version. The fixed point library then reads this configuration file and can begin executing inputs.
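In terms of the API sketched earlier, the workflow could look like this (the save and load function names are hypothetical):

    /* program 1: on the workstation, linked against the floating point library */
    struct fann *ann = fann_create(/* ... */);
    /* ... training ... */
    fann_save_to_fixed(ann, "net.fixed");  /* checks for overflow, chooses the
                                              decimal point and writes the file */

    /* program 2: on the embedded device, linked against the fixed point library */
    struct fann *ann = fann_create_from_file("net.fixed");
    fann_run(ann, input);                  /* pure integer arithmetic from here on */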
There are two ways of determining the position of the decimal point. The first is to set the decimal point at compile time; the second is to set it when saving the ANN from the floating point library.
There are several advantages of setting the decimal point at compile time:
There are however also several advantages of setting the decimal point when saving the ANN to a fixed point configuration file:
Although there are fewer advantages to the latter solution, it is the more general and scalable one, and therefore I will choose to set the decimal point when saving the ANN. The big question is now: where should the decimal point be, and how can you be absolutely sure that an integer overflow will not occur when the decimal point is there?
Before calculating where the decimal point should be, I will define what happens to the number of used bits under certain conditions: When multiplying two integer numbers, you need as many bits to represent the result as were needed to represent both of the multipliers; e.g. if you multiply two 8 bit numbers, you get a 16 bit number [Pendleton, 1993]. When adding or subtracting two signed integers, you need as many bits as used in the larger of the two numbers, plus one. When doing a fixed point division, you must first shift the numerator left by as many bits as the decimal point. When doing a fixed point multiplication, you first do a standard integer multiplication and then shift the result right by as many bits as the decimal point.
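A sketch of the two fixed point primitives in C, assuming a hypothetical DECIMAL_POINT constant giving the number of fractional bits (the macro names are mine):

    #define DECIMAL_POINT 13  /* example position of the decimal point */

    /* multiplication: integer multiply, then shift right by the decimal point;
       the analysis below ensures the intermediate product cannot overflow */
    #define fann_mult(a, b) (((a) * (b)) >> DECIMAL_POINT)

    /* division: shift the numerator left by the decimal point, then divide */
    #define fann_div(a, b)  (((a) << DECIMAL_POINT) / (b))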
Several operations have the possibility of generating integer overflow, when looking at what happens when executing an ANN with equation 2.5.
If $d$ is the number of bits needed for the fractional part of the fixed point number, we can calculate how many bits are needed for each of these operations. To help in these calculations, I will define the function $b(x)$ as the number of bits used to represent the integer part of $x$.
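One concrete way to write this definition (my formulation; the report's exact notation was lost in extraction):

\[
b(x) = \left\lceil \log_2 \bigl( \lfloor |x| \rfloor + 1 \bigr) \right\rceil
\]

so that e.g. $b(0.7) = 0$, $b(1) = 1$ and $b(5) = 3$.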
When calculating the weighted input $w \cdot x$, we do a fixed point multiplication of a weight with a number between zero and one. The number of bits needed in this calculation is calculated in equation 4.1.
When calculating the activation function in fixed point numbers, it is calculated as a stepwise linear function. The dominating calculation in this function is a multiplication between a fixed point number between zero and one and the input to the activation function, from which another fixed point number has been subtracted before the multiplication. The number of bits needed in this calculation is calculated in equation 4.2.
Since the highest possible output of the sum function is higher than the highest possible weight, this operation is the dominating one. This implies that if I can prove that an overflow cannot occur in this operation, I can guarantee that an overflow will never occur.
When saving the ANN in the fixed point format, I can calculate the number of bits used for the integer part of the largest possible value that the activation function can be given as a parameter. From this value, equation 4.3 calculates the position of the decimal point for an $n$ bit signed integer (remembering that one bit is needed for the sign).
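Equation 4.3 itself was lost in extraction, but a back of the envelope bound in its spirit follows from the bit counting rules above. If $x_{\max}$ is the largest possible input to the activation function, the dominating multiplication combines a factor of at most one ($1 + d$ bits) with that input ($b(x_{\max}) + d$ bits), and one bit is reserved for the sign, so the decimal point $d$ must satisfy

\[
(1 + d) + \bigl( b(x_{\max}) + d \bigr) + 1 \;\leq\; n
\quad\Longleftrightarrow\quad
d \;\leq\; \frac{n - 2 - b(x_{\max})}{2}
\]

The exact form used by the library may differ, e.g. by also accounting for the subtraction preceding the multiplication.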
The library is written in ANSI C, which puts some limitations on a programmer like me who normally codes in C++. However, this is not an excuse for writing ugly code, although allocating all variables at the beginning of a function will never be pretty.
I have tried to use comprehensible variable and function names, and I have tried to use only standard C functions, in order to make the library as portable as possible. I have made sure that I did not create any global variables, and that all global functions or macros were named in a way that they would not easily interfere with other libraries (by prefixing them with fann). Furthermore, I have defined some macros which are defined differently in the fixed point version and in the floating point version. These macros help in writing comprehensible code without too many #ifdef's.
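As a sketch of this pattern (the guard and macro names are illustrative):

    #ifdef FIXEDFANN
    /* convert a floating point constant to the fixed point representation */
    #define fann_const(f)  ((fann_type)((f) * (1 << DECIMAL_POINT)))
    #else
    #define fann_const(f)  ((fann_type)(f))
    #endif

Code such as neuron->value = fann_const(0.5); then compiles unchanged in all three versions of the library, without an #ifdef at the point of use.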