Virtual Panning
for Lecture Environments

Stitching Component


By Terry Tsen


Introduction


Image stitching has been used extensively to create panoramic views of scenes from individual images. Since the field of view of a typical camera is narrower than the human field of view, stitching images together can provide a view of the environment closer to what a person would see. Image stitching can also present a large object in a single picture when multiple camera shots are needed to photograph the entire object.

The stitching component of the VIRPAN system is responsible for taking multiple recorded videos and stitching them together to create one panoramic video. The videos are recorded by fixed cameras in the lecture room, with each camera viewing a different part of the front of the room. This component aims to remove the need for a high-definition, wide field-of-view camera, which is expensive to purchase, and so offers a cost-effective solution to the problem of automated lecture recording.


Aims


The aim of the stitching component is to stitch two or more videos together to produce a single panoramic video. The panoramic video must have minimal visual artefacts and distortions, especially along the stitch seam, in order to give the impression that the video was recorded with a single camera. Stitch quality is evaluated qualitatively.

With regard to execution time, the processing time of the entire VIRPAN system needs to be under three times the length of the recorded lecture. Since the stitching component takes much of the execution time compared to the other components, it needs to run as fast as possible while still producing good-quality stitched videos.


Component Overview


Since OpenCV is used to create the VIRPAN system, the stitching component uses a pipeline that is similar to the one used by OpenCV.

Figure: Overview of the stitching component.

The base stitching pipeline contains seven stages of processing:

  • Feature detection: Detects features within the input images.
  • Feature matching: Matches features between the input images.
  • Homography estimation: Estimates camera parameters between pairs of matched images.
  • Bundle adjustment: Solves for all camera parameters jointly.
  • Image warping: Warps the images onto a compositing surface.
  • Gain compensation: Normalises the brightness and contrast of all images.
  • Blending: Blends pixels along the stitch seam to minimise the visibility of the seam.
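
To make the warping and blending stages in the list above more concrete, the following is a minimal pure-Python sketch, not the VIRPAN or OpenCV implementation. The function names `apply_homography` and `feather_blend` are hypothetical, and the simple linear (feathered) blend is an assumption chosen for illustration; the actual blender used by the component may differ.

```python
def apply_homography(h, x, y):
    """Map pixel (x, y) through a 3x3 homography given as nested lists."""
    xs = h[0][0] * x + h[0][1] * y + h[0][2]
    ys = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    # Divide by the homogeneous coordinate to get back to pixel coordinates.
    return xs / w, ys / w


def feather_blend(left_px, right_px, x, seam_start, seam_end):
    """Linearly blend two pixel values across the overlap [seam_start, seam_end]."""
    if x <= seam_start:
        return left_px           # only the left image contributes
    if x >= seam_end:
        return right_px          # only the right image contributes
    alpha = (x - seam_start) / (seam_end - seam_start)
    return (1 - alpha) * left_px + alpha * right_px


# A pure translation by 100 pixels in x, as a trivial homography.
shift = [[1, 0, 100], [0, 1, 0], [0, 0, 1]]
warped = apply_homography(shift, 10, 20)
blended = feather_blend(100, 200, 15, 10, 20)
```

In a real pipeline the homography would come from the estimation and bundle adjustment stages, and the blend would be applied per pixel and per colour channel across the overlap region.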

The stitching pipeline above went through several iterations to speed up stitching; these are documented in Terry’s final report. One of the iterations multithreads the stitching pipeline. The execution times for the multithreaded version are documented below.
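
One possible way to multithread such a pipeline is to stitch independent frame pairs in parallel. The sketch below assumes this per-frame scheme purely for illustration; the scheme actually used is described in the final report and may differ. `stitch_frame` is a hypothetical placeholder for the real per-frame work, and `stitch_video` and `workers` are likewise illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def stitch_frame(frame_pair):
    """Placeholder for the real per-frame stitch (warp, compensate, blend)."""
    left, right = frame_pair
    return left + right  # stand-in: joins the two inputs


def stitch_video(left_frames, right_frames, workers=4):
    """Stitch corresponding frames in parallel, keeping the frame order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so frames stay in sequence.
        return list(pool.map(stitch_frame, zip(left_frames, right_frames)))


panorama = stitch_video(["a", "b"], ["c", "d"])
```

Threads only give a speed-up in Python when the per-frame work releases the interpreter lock, as native image-processing code typically does; for pure-Python work a process pool would be the usual alternative.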


Results


Execution time tests were done to determine the ratio between the execution time and the length of the video stitched. The list below shows the ratio for different lengths of stitched video. The specifications of the test computer are as follows: Intel Core i5-3470 CPU @ 3.2 GHz, 4 GB DDR3 133 MHz RAM, 4000 MiB Linux swap space.

  • 60 seconds of stitched video took 241.83 seconds to stitch (ratio: 4.03).
  • 120 seconds of stitched video took 742.75 seconds to stitch (ratio: 6.19).
  • 300 seconds of stitched video took 2740.59 seconds to stitch (ratio: 9.14).
  • 600 seconds of stitched video took 7827.11 seconds to stitch (ratio: 13.05).

More execution times are documented in Terry's final report.

The execution time requirement is a ratio below three. None of the ratios above satisfies this requirement, and the ratio grows as the video gets longer. More testing is required on a much faster computer with more RAM.
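
The comparison against the requirement can be reproduced with a few lines of arithmetic. The sketch below uses the measurements reported in this section; the names `MAX_RATIO`, `measurements` and `time_ratio` are illustrative only.

```python
MAX_RATIO = 3.0  # requirement: execution time under three times the video length

# (video length in seconds, stitching time in seconds), from the results above
measurements = [(60, 241.83), (120, 742.75), (300, 2740.59), (600, 7827.11)]


def time_ratio(video_s, stitch_s):
    """Ratio of stitching time to the length of the stitched video."""
    return stitch_s / video_s


ratios = [round(time_ratio(v, t), 2) for v, t in measurements]
meets_requirement = [r < MAX_RATIO for r in ratios]

print(ratios)             # [4.03, 6.19, 9.14, 13.05]
print(meets_requirement)  # [False, False, False, False]
```

Because the ratio rises with video length, stitching time on the test machine grows faster than linearly with the amount of video processed.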

With regard to stitch quality, the stitch seam is evident when the lecturer crosses it. However, the writing on the blackboard, which is the most important part of a lecture recording, remains legible. Thus, the quality of the stitched videos is deemed reasonably good.

Figure: Result, lecturer cut off at the stitch seam.
Figure: Result, blackboard text is legible.

Since all lecture recordings came from a single lecture venue, more testing in other venues is required to determine how consistent the execution time and the quality of the stitched videos are.