EVENTS Project IST-1999-21125 (2000-2003).
Keywords: EC V Framework. IST. Digital TV. Computer Vision. Real Time. Virtual Environments for Football. Image Interpolation.


 

Rationale

The real world is three-dimensional, and one of the major challenges in computer vision and computer graphics is the construction and representation of life-like virtual 3D environments within a computer. An ideal 3D environment should offer everything physical reality has to offer: interactive navigation, realistic appearance, an immersive experience, etc. Unfortunately, reducing the 3D world to computer-based environments is no easy task. In fact, for both technical and economic reasons, the construction of realistic 3D scenarios is viable only for a restricted number of applications, where time-consuming effort and/or costly resources must be employed to achieve the objective.

Even when it is economically feasible, current 3D technologies have serious technical limitations in realistically representing the world. Regardless of whether automatic scanning techniques or manual modelling tools are used for 3D model acquisition, the state of the art usually allows only static scenarios: the user is able to navigate interactively in the three-dimensional space, but the contents of the space itself remain invariant over time. Although recent advances have improved the visual fidelity of 3D representations, most existing systems still leave much to be desired in comparison with standard 2D video images.

The situation becomes even more discouraging when we consider wide scenarios, open environments and real-time events. Currently there is no practical way to handle multi-view presentations of these types of applications. The traditional approach followed by broadcast companies for such events consists of using multiple manually operated cameras, manually switched by the TV producer. This falls clearly short of what we might expect from a true 3D device.

However, in many applications true 3D is not strictly required. Although electronic imaging is based on the projection of the 3D world onto the two-dimensional image plane of a camera or similar device, humans are remarkably good at inferring 3D information from moving images. This has led to the success of television and film as media for conveying 3D information using only a 2D display. The same is true for the display of virtual 3D environments by computer, and for the foreseeable future, 2D TV screens will remain the primary means of communicating 3D information to a human observer. The problem with this situation is that a planar projection shows only a single view of the 3D scenario, and typically the viewer has no control over the location of this viewpoint. Even when multiple cameras are on location, the viewer is restricted to images from that discrete set of viewpoints, a serious limitation, especially for the representation of wide scenarios.

The objective of the EVENTS project is to develop new techniques, based on computer vision, for the presentation of multi-view scenarios that let the viewer select and modify the point of view at will while keeping close to video quality standards. We aim to achieve this both for static scenes (frozen in time) and for moving scenes (real-time video sequences). We do not believe that achieving this goal requires building a full, accurate 3D model of the environment. Instead, we take advantage of the familiarity of humans with standard 2D display technology and give the viewer control over the viewpoint. We will achieve this using novel view interpolation algorithms that use minimal 3D information, exploiting geometric constraints as and when required for speed and/or integrity.

The system will be based on a set of TV cameras placed at different positions around the scenario, with the aim of providing the widest coverage under the best possible conditions. Using the images taken simultaneously at a given instant by a number of physical cameras, the system will build interpolated views from any angle or position selected by the user, creating a so-called virtual camera that lets the viewer choose a place in the field.

The result will be a virtual environment where it is feasible, for example, to observe a football match from the most favourable position. You could see a corner kick from inside the goal, a penalty kick from the point of view of the goalkeeper, the finish of an athletics race from the finishing line itself, etc. What about changing your point of view to check whether a foul was legal or not? The possibilities are endless.

Since the user may wish to watch the scenario from a point of view where no physical camera is located, the system will interpolate the desired intermediate image from the images available from nearby cameras, providing a smooth transition from the current camera position to the new virtual position. Of course, the freedom to move will be limited by the coverage provided by the set of cameras in the field: a point of view can be selected only as long as the requested image can be composed by warping images from one or more cameras.
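As a toy illustration of this kind of two-view interpolation, the sketch below (a hypothetical simplification in Python/NumPy, not the project's actual algorithm) morphs between two camera images using a dense correspondence field: each view is warped part-way along the flow and the results are cross-dissolved with weights reflecting the virtual camera's proximity to each physical camera.

```python
import numpy as np

def interpolate_view(img_a, img_b, flow_ab, t):
    """Approximate the view at fraction t between cameras A and B.

    flow_ab holds, for each pixel of A, the (dx, dy) displacement of the
    corresponding pixel in B.  We warp each image part-way along the flow
    (a coarse backward-warp approximation) and cross-dissolve; a real
    system would use a proper inverse flow and sub-pixel sampling.
    """
    h, w = img_a.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]

    # Sample A as if its pixels had moved fraction t towards B.
    xa = np.clip(np.round(xs - t * flow_ab[..., 0]), 0, w - 1).astype(int)
    ya = np.clip(np.round(ys - t * flow_ab[..., 1]), 0, h - 1).astype(int)
    warped_a = img_a[ya, xa]

    # Symmetrically, sample B as if its pixels had moved (1 - t) back.
    xb = np.clip(np.round(xs + (1 - t) * flow_ab[..., 0]), 0, w - 1).astype(int)
    yb = np.clip(np.round(ys + (1 - t) * flow_ab[..., 1]), 0, h - 1).astype(int)
    warped_b = img_b[yb, xb]

    # Blend weight favours the nearer physical camera.
    return (1 - t) * warped_a + t * warped_b
```

With t = 0 the function reproduces camera A's image and with t = 1 camera B's; intermediate values give the smooth transition described above.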

If the point of view is modified continuously by means of a joystick or similar device, the screen will be updated with newly interpolated images at the intermediate positions until the final destination is reached. The user will have the illusion of navigating a pseudo-3D environment built from real images.
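One way to realise this continuous navigation (a hypothetical sketch, not the project's design) is to treat each physical camera as a sample on an arc around the pitch: as the joystick moves the virtual viewpoint, the system picks the two cameras bracketing the requested angle and the blend weight between them, then synthesises each intermediate frame from that pair.

```python
import numpy as np

def bracketing_cameras(camera_angles, virtual_angle):
    """Given sorted physical camera angles (degrees) and a requested
    virtual viewpoint, return the indices of the two flanking cameras
    and the interpolation weight t in [0, 1] between them."""
    angles = np.asarray(camera_angles, dtype=float)
    i = int(np.clip(np.searchsorted(angles, virtual_angle), 1, len(angles) - 1))
    lo, hi = angles[i - 1], angles[i]
    t = float(np.clip((virtual_angle - lo) / (hi - lo), 0.0, 1.0))
    return i - 1, i, t

# Joystick loop sketch: step the viewpoint towards the user's target angle,
# re-rendering an interpolated frame at every intermediate position.
cameras = [0.0, 30.0, 60.0, 90.0]
viewpoint, target, step = 20.0, 45.0, 5.0
while abs(target - viewpoint) > 1e-9:
    viewpoint += np.clip(target - viewpoint, -step, step)
    a, b, t = bracketing_cameras(cameras, viewpoint)
    # here the frame would be synthesised from cameras a and b with weight t
```

At 45 degrees, for instance, the virtual camera sits exactly halfway between the 30-degree and 60-degree physical cameras, so the interpolator would blend them with equal weight.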

Objectives List

The specific objectives of the present proposal can be summarised in the following terms:

  • To develop the computer vision algorithms required to implement the multi-view display system for still images. The main requirements are support for multiple video streams, wide-angle coverage and a specific orientation towards large, open-environment scenarios.
  • To develop the computer vision algorithms required to implement the multi-view display system for moving images. The additional requirement is support for real-time image interpolation with video-like results.
  • To develop the computer platform necessary to support the system proposed for still images. Together with the respective computer vision algorithms, this platform will materialise the Still Images Prototype (SIP). Platform development will include hardware selection and system software.
  • To develop the computer platform necessary to support the system proposed for moving images. Together with the respective computer vision algorithms, this platform will materialise the Moving Images Prototype (MIP). Platform development will include hardware selection and system software.
  • To develop the real-time strategies necessary to implement the system for the MIP.
  • To develop the user interfaces required for both prototypes.
  • To materialise two pilot applications focused on the transmission of football events. The first will evaluate the SIP, while the second will concentrate on the MIP.
  • To design and plan marketing and exploitation activities for the expected developments.
  • To search for additional potential applications of the technology as part of the exploitation activities.

 

Project Outcomes

The project is expected to provide the following results:

  • An Image Interpolation system for wide scenarios based on a new simplified computer vision approach.
  • A prototype suitable for interactive multi-view presentations of still interpolated images.
  • A prototype suitable for interactive real-time interpolation of moving images.

                                                                                                                               


Last Updated 03/09/2002