Modelling Rays for Line of Sight in an Object-Rich World
Vision is our most effective sense for building an internal model of the world. If we don't simulate this faculty properly, AI agents will never understand a world in the same way that we do - their knowledge of a world will be a medley of simplistic assumptions and unthinking nous. But calculating true LOS is a computationally complex operation, and having to perform it constantly for a large number of agents is taxing on any system. Many simulations (such as game worlds) use simplifications (small fixed radii, no vision obstructions) or other significant assumptions to reduce the cost of operating LOS. But those speed gains aren't free - the world of such AI agents is poorly detailed and highly specialised.
Generally speaking, there are two main ways to handle vision. Because vision is symmetric (for our purposes), you either ask the question 'What can I see?' or you ask 'What can see me?'. Methods to answer the first question scale badly with area, because you have to scan outwards in all directions until you hit an (opaque) object. And techniques for solving the second question scale badly with the number of objects, since you need to look back at the player from every object.
I don't like the 'What can see me?' paradigm because inevitably it is selective (which is how it achieves its speed advantage). To agents that have visual information delivered to them through object-object ray testing, interesting objects just magically appear somewhere in their world. These objects could also disappear equally quickly, without providing any the agent with any good reasons for this occurrence. The information isn't provided with contextual information in the same medium - the rest of the agent's data comes from other sources, like pathing (which may also be quite sorcerous in its operation, from an agent's point of view).
'What can I see?' much better describes vision (obviously). The downside is that it's far more expensive than just watching out for interesting objects. But on the other hand, we get a lot more information out of it. Line of sight under this paradigm can provide information about everything in visible range - from the terrain underfoot to the building another agent just walked into. Because opaque areas tend to be solid, we can make use of this information in our pathing code and save calculation time there - the same goes for formulating positional strategy.
Algorithms for answering either question often trade memory efficiency for time efficiency (pre-calculation). But this won't provide a good solution for large amounts of varied visibility conditions, or dynamic visibility conditions. What we will look at here is a very fast and efficient method of modelling rays to resolve the 'What can I see?' query. For worlds with heavy object density, this is generally the best way to look at the problem, and it is also the most natural way to represent vision. For simplicity's sake, the description of the algorithm is focused on a 2D world divided into uniform areas which I will call tiles. I will assume we have pre-built a visibility map which can quickly tell us whether a particular tile represents an opaque area.
The most straight-forward way to check visibility under this paradigm is: For each potentially visible discrete area (tile) (e.g. those within a given sight radius), check visibility to that tile. The basic way to do this is to programmatically 'walk' out to that tile using something like a Bresenham line-drawing algorithm, and see if we encounter any vision-blocking objects (which we will call obstructions) on the way. This is obviously inefficient, because many of our walks are related, yet we share no information between operations.
A somewhat better way is: For each tile at maximum visible range, find out how far we can see in that direction. With this method, we only need to perform a walk for our perimeter tiles, not every tile in the area. Technically, these walks are really rays that are being cast through a medium that only allows for very discrete measurements. Knowing that these rays are all coming from the same source location, it is sufficient for us to attempt one maximum length ray cast in each direction.
But while that is a more efficient approach, due to the discrete nature of our world, rays out to maximum range will look at many of the same tiles as neighboring rays. Using Cartesian coordinates, if we are looking out from the origin (0, 0), up to 25% of our rays are going to walk through the tile at (1, 0). Even if we have a very fast means of determining whether we have processed a tile or not, the amount of overlap scales with our visual radius and world resolution, meaning that either the time or space complexity of the method will always be significantly worse than the ideal O(n).
What we will look at here is an algorithm to calculate true LOS that is highly efficient in both time and space. It will not look at any tile more than once, and it will not look between obstructions that form a continuous barrier from the viewer's perspective. For comparative purposes, on my 1.6GHz laptop the attached Java implementation takes an average of 50-60ms to calculate LOS from an arbitrary point on a 600 x 600 grid containing 5000 randomly placed obstructions.
The basis of the algorithm is a graph search covering all potentially visible positions. Our search nodes represent tile positions, and the children of a search node are the nodes for neighboring positions which are further away from the origin. The animated GIF to the left demonstrates how the search progresses on an empty grid. The blue point shows the position of the origin, and the green edge represents the expanding search perimeter. (If you find the animations distracting, you can toggle them on and off by clicking on them.)
Thus far, this is fairly mundane - a simple flood-fill expands in the same way. The difference is only apparent when we introduce obstructions. Instead of blindly searching behind obstructions, the search nodes of our algorithm will remember obstructions encountered earlier in the expansion, and so we can work out the visibility status of any particular tile on the current perimeter. To the left is an example with one obstruction shown in red, casting a pink shadow. (Note that these GIFs are all generated programmatically from an implementation, so they're accurate).
Assuming we can remember obstructions and keep that information local to the relevant sections of the perimeter, we will have accomplished quite a lot. We never visit a node more than once, and we only need to hold the current perimeter in memory at any one time. But what of my other claim - that we will avoid casting rays into areas behind a continuous barrier? Any combination of obstructions wider than the width of our (discretely represented) rays will block off all rays cast between them. If we can avoid casting those rays, obstructions will actually reduce computational requirements.
To the left shows the algorithm operating in an area with multiple contiguous obstructions (or one large obstruction, depending on how you want to look at it). The gray area left within the arc shows positions we never bother to visit, because we know they are outside of our line of sight. How we know this is by virtue of the way our perimeter is remembering previous obstructions, which we will explain shortly. The net effect is that the complexity of the algorithm in both time and space is directly related to the actual visible area. Since we have to look at visible tiles anyway (to draw them etc.), this is very close to optimal.
How it works
The heart of the algorithm is the way in which we remember obstructions when expanding the perimeter. Recall that to expand, search nodes create new nodes for neighboring positions away from the origin. For instance, a node at position (5,3) will create nodes at (6,3) and (5,4). Of course, (5,4) might have already been spawned from (4,4), in which case we don't create a new one. We call (4,4) and (5,3) the 'input' nodes for (5,4). Nodes inherit obstruction information from their input nodes, as shown below.a. b. c.
Diagram (a) shows our perimeter prior to striking obstructions. None of the search nodes carry any obstruction data, and every position is visible. Diagram (b) shows the perimeter when it encounters obstructions. The search nodes related to the obstruction positions now carry some data - specifically the direction relative to the origin that the obstruction lies on, represented as a vector (we are using the common graphics convention that positive Y is downwards).
Diagram (c) shows how we propagate the obstruction information that was initialised in (b). Note that the two obstructions aren't part of the perimeter in (c), their data is just shown for comparison. The numbers on the left of a square hold the obstruction vector (x component above, y component below). Note that the new nodes have inherited the vectors held by their input nodes (node (3,2) has two inputs carrying vectors, but we'll talk about how to resolve that later). So for now, let's say we propagate obstruction vectors when we spawn a new perimeter.
The numbers on the right represent the offset from the obstruction vector. Obstructions, not surprisingly, start off perfectly on the obstruction vector, for which I use equivalent values. When we populate position (4,1) with data, we note that it is a step in the X direction. Because we stepped in the X direction, we adjust the data to suit: the X value drops by the vector Y, and the Y value increases by the vector Y. Or rather, since we moved in the X direction, our propensity to move further in the X direction is reduced, and our propensity to move in the Y direction is increased (by the amounts needed to follow the ray).
So essentially, nodes carry both the obstruction vector and error data. This is the minimum information we can get away with, because we need to remember the direction of the obstruction, and we need error data because our world is discrete and therefore we can't be guaranteed to move perfectly with the vector. Nodes do not inherit any obstruction data if they are too far away from the vector. A node represents an obscured position if its error data puts it close enough to the obstruction vector.
That is all that is required to handle a single obstruction's line of sight. However, when two rays are close together, we get collisions such as the one at (3,2) in diagram (c). A collision like that one is what we call a 'cut' case, where we have no need to expand a node as we already know everything we need to about that branch of the tree. In this case, both of our inputs (3,1) and (2,2) are close enough to an obstruction vector to be obscure. Which is to say that rays through either of these points hit an obstruction (in this non-special case the points are the obstructions). And what we can logically infer from this is that there is no room for a ray between those obstructions because anything casting shadows on those angles must themselves be at least that close together (from the viewer's perspective).
From this deduction we can choose to ignore the node at (3,2). We simply don't expand it, and allow the rays to either side of it to delineate the borders of the arc. In addition, we can also cut nodes that have an obstruction ray on one side and a cut node as the other input, since we know that we are on the inside edge of an arc. Just like regular obscure nodes, I am marking these cut nodes as pink in the GIFs, since they are still checked by the algorithm, and obscure. No nodes are created for positions with both inputs cut, since they are implicitly within an arc region - these unchecked nodes are colored gray in the GIFs.
There are cases where a ray can 'hijack' another ray by overwriting its data when it is in the process of changing rows or columns. We prevent this by giving way to such changes in our code. Because both rays have the same source, they are always travelling apart, so we never have a case where one ray actually collides with another by reducing its own error in this manner - the ray that gave way will continue through its other output node. If both output nodes get overwritten, then our ray is within an arc described by the overwriting rays, and so we never needed to expand it anyway.
And that is pretty much it for our algorithm. I have deliberately not gone into much detail on how I represent the error values, since these could be stored in a number of ways. The way in which I do this is fairly simple, and requires only incremental addition and subtraction. A sample implementation for all of this is available here . It is in Java, but the basic code is very simple, and only the utility class for displaying results makes use of Java libraries.
Though we have only covered LOS for 2 dimensions here, addition of a third axis is straight-forward. Nodes simply have an extra dimension in their location, obstruction vector, and error data. One might think that the perimeter will get unmanageable in this case, but consider:
1. The ground will cut off roughly half of the sphere.
2. We generally model worlds where objects are in close proximity to the ground (due to gravity). So we can make our search perimeter follow the surface of the world by (for example) only moving upwards when our X or Y directions are blocked. Detached (flying) objects are exceptional in such worlds and we can do separate testing for them.
There are arches, overhangs, and ceilings to consider, but if the world is object-rich in the upwards direction, then there's no harm in expanding the perimeter upwards as well, since the algorithm performs better with more obstructions. Humans rarely do a full scan in all directions in any case.---
Note that the resolution of our visible world is arbitrary. If we want to speed up searches at the expense of accuracy, we can simply reduce the resolution by expanding the area that nodes cover.---
We don't need to do a full visual scan all of the time. It's easy to constrain the scan by falsifying obstructions in directions an agent is not focused on.---
If you want vision to extend to a particular radius, simply cut the search outside that radius with code something like this:
int distance = ((node.xLoc * node.xLoc) + (node.yLoc * node.yLoc));
if(distance > VISION_RADIUS_SQUARED) return;