# The Boltzmann machine

The neural network discussed in this post, called the Boltzmann machine, is a stochastic and recurrent network. This post contains my exam notes for the course TDT4270 Statistical image analysis and learning and explains the network’s properties, activation and learning algorithm.

### Properties of the Boltzmann machine

• Recurrent network with all nodes connected to each other
• Nodes have binary outputs (either 0,1 or -1,1)
• Weights between the nodes are symmetric $$W_{ij} = W_{ji}$$
• Connection from a node to itself is not allowed
• Nodes are updated asynchronously (i.e. nodes are selected at random)
• The network can have hidden nodes
• Learning can be supervised or unsupervised
• Node activation is stochastic

### Activation algorithm

A node i is chosen at random for updating and the state of the node $$s_i$$ is set to 1 with a probability $$p_i$$:
$$p_i = \frac{1}{1 + e^{-\Delta E_i / T}}$$

where $$\Delta E_i$$ is the energy difference from the i’th unit being on or off. This difference is given by derivation of the energy E with respect to the node’s state and is therefor:
$$\Delta E_i = \sum_j W_{ij}s_j$$
And T is a parameter which simulates the temperature of a physical system, it is used with simulated annealing in the learning process.

### Learning algorithm

The network can be divided into to sets of nodes, a nonempty set of visible nodes V and a set of hidden nodes H. Visible nodes are those which are the interface towards the environment or the outside world. The hidden nodes used only for internal representation or modeling and does not interact with the outside world.

Supervised learning can be achieved by declaring some of the visible nodes as input nodes and some as output nodes. When we have no output nodes among the visible nodes we have unsupervised learning.

The learning algorithm has two different phases. The clamping phase and the free-running phase. In the clamping phase we lock or “clamp” a set of the visible nodes to specific values given by a specific pattern. Nodes that are clamped won’t change their output value and then we sample the values of the hidden nodes using simulated annealing.

In the free-running phase no nodes are locked/clamped and we sample values for all the nodes using simulated annealing.

The change in each weight $$\Delta W_{ij}$$ is set according the difference between the average probability of node i and j being in the on state at the same time in the clamping phase $$p_{ij}$$ and in the free running phase $$p\prime_{ij}$$.

$$\Delta W_{ij} = \epsilon (p_{ij}-p\prime_{ij})$$