This application simulates a group of ants learning where to find food and how to bring it back home. The black squares represent ants out looking for sugar. The white squares are ants carrying sugar back to their nest. The black circle is the anthill, and the white circle is a big pile of sugar. You can click on the applet to move the location of the sugar or the nest. After a short period of time, the ants will form a path to the sugar and back home again. This is very similar to the trail that ants form on a kitchen counter.
The number of ants, their speed, and various learning parameters can be controlled by the sliders on the tabs at the bottom of the applet.
If an ant finds the sugar, it receives a reward. The value of that reward is echoed back along the path that the ant followed to reach the reward. Maps of these shared values are displayed in the panels on the right of the application.
The "No Sugar" map is used by ants that are searching for sugar. Ants trying to get back home use the "Sugar" value map:
The greener a location is, the more valuable the ants think it is. This information is shared by all of the ants. The ants tend to move towards brighter green areas, and in this way, they collectively learn the path that provides the most reward for the least effort.
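To make the mechanism concrete, here is a minimal sketch of how the shared value maps and the "echoed" reward might be implemented. It assumes a square grid, a simple discounted backup, and a learning rate; the grid size, constants, and names (backup_path, greedy_move) are illustrative and not taken from the applet's source.

```python
import numpy as np

GRID = 50      # assumed grid size; the applet's actual dimensions are not specified
GAMMA = 0.95   # assumed discount: how quickly the echoed reward fades with distance

# Two shared value maps, one per mode: searching for sugar vs. carrying it home.
value = {
    "no_sugar": np.zeros((GRID, GRID)),   # used while looking for sugar
    "sugar":    np.zeros((GRID, GRID)),   # used while carrying sugar back to the nest
}

def backup_path(mode, path, reward, lr=0.1):
    """Echo a reward back along the path the ant just followed.

    Cells closer to the goal receive more credit, so the 'green' fades
    with distance from the reward.
    """
    v = value[mode]
    g = reward
    for (x, y) in reversed(path):       # walk the path backwards from the goal
        v[x, y] += lr * (g - v[x, y])   # nudge the stored value toward the return
        g *= GAMMA                      # credit shrinks further from the reward

def greedy_move(mode, x, y):
    """Step toward the neighboring cell the colony currently values most."""
    v = value[mode]
    neighbors = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                 if (dx, dy) != (0, 0)
                 and 0 <= x + dx < GRID and 0 <= y + dy < GRID]
    return max(neighbors, key=lambda p: v[p])
```

Because every ant reads from and writes to the same arrays, whatever one ant learns is immediately visible to the whole colony.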
The more they follow this valuable path, the more reward they receive. This feedback loop reinforces the best areas until the path glows bright green. Once this happens, it is fun to move the ants' goal. Click on the applet to move the sugar source. The value of the old sugar location will quickly fade, and the ants will start to wander off in search of greener pastures. This shows that the learning process is dynamic: the system can respond to changes in the environment.
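Continuing the sketch above, one plausible way to get this fading behavior is that trips through the old location now end with no reward, so each backup pulls the stale values back toward zero (the cells and iteration count below are purely illustrative):

```python
# Hypothetical cells near the old sugar pile. After the sugar moves, trips
# through them find nothing, so backing up a reward of zero erodes the old values.
stale_path = [(10, 10), (10, 11), (10, 12)]
for _ in range(20):
    backup_path("no_sugar", stale_path, reward=0.0)
```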
To encourage efficiency, the ants receive a very slight punishment for every step they take, analogous to the effort a real ant expends to reach its food source. This pushes them to find the most efficient route and to favor closer reward sources over farther ones.
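To see why the step penalty favors shorter routes, compare the total reward for two hypothetical trips to the same pile of sugar. The reward and penalty values below are assumptions for illustration; in the applet they come from the sliders.

```python
SUGAR_REWARD = 1.0   # assumed reward for reaching the sugar
STEP_COST = 0.01     # assumed per-step punishment

def path_return(steps):
    """Total reward for one trip: a single payoff minus the effort of getting there."""
    return SUGAR_REWARD - STEP_COST * steps

print(path_return(20))   # ~0.80: a short, direct route
print(path_return(40))   # ~0.60: a meandering route earns noticeably less
```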
Most of the time, the ants will follow the most valuable (green) path, but occasionally they will make a suboptimal move. These moves are exploratory. If the ants always took the best-known move, they would stick to the first path they found forever. By occasionally leaving the best-known path, they have the opportunity to discover a better one. This is the trade-off between exploiting existing knowledge and exploring new, and possibly better, alternatives.
This is controlled by the "Exploration vs. Exploitation" slider.
If the ants are purely greedy, they will snap to a single path and follow that. If they are purely random, they will wander all over, only finding rewards by accident.
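A common way to implement this kind of slider is an epsilon-greedy rule: with some probability the ant moves randomly (explore), otherwise it takes the greenest neighboring cell (exploit). The sketch below reuses the greedy_move helper and grid from the earlier sketch; whether the applet uses exactly this rule is an assumption.

```python
import random

def choose_move(mode, x, y, epsilon):
    """epsilon near 1.0 -> mostly random wandering; near 0.0 -> purely greedy."""
    if random.random() < epsilon:
        # Explore: take a random neighboring cell, possibly a suboptimal one.
        neighbors = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0)
                     and 0 <= x + dx < GRID and 0 <= y + dy < GRID]
        return random.choice(neighbors)
    # Exploit: follow the brightest green, as in the earlier sketch.
    return greedy_move(mode, x, y)
```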
Out of some very simple rules, a larger behavior emerges. It is interesting to note that the ants will often take one path to the sugar and another path back home. They are staying out of each other's way. If they went back the way they came, they would bump into other ants, which is more expensive, because they would have to make additional moves to get around their neighbors. There is no rule that tells them to avoid these traffic jams; the behavior emerges because it is more efficient given the rules of the system.
This application is an experiment based on reinforcement learning techniques found in this book: Reinforcement Learning. This is not exactly how ants in nature solve this problem, but it is similar. When an ant finds a food source, it lays down a path of pheromones on the way back to its hole. In this way, the ground itself becomes like our value function. The areas sprayed with more pheromones are more valuable, and the ants follow them to the food source.