Xcell Journal Online
  Xcell Journal Archives
   
  Writing for Xcell
  Advertising in Xcell
  FREE Subscription
   
  Partner Yellow Pages
  Reference Pages
  Contact Us

    

Home : Documentation : Xcell Journal Online : Article
Implementing DSP Algorithms Using Spartan-3 FPGAs



by Paolo Giacon, Graduate Student, Universita di Verona, Italy
paolo.giacon@students.univr.it
Saul Saggin, Undergraduate Student, Universita di Verona, Italy
saul.saggin@students.univr.it
Giovanni Tommasi, Undergraduate Student, Universita di Verona, Italy
giovanni.tommasi@students.univr.it
Matteo Busti, Graduate Student, Universita di Verona, Italy
matteo.busti@students.univr.it (4/18/05)


This article presents two case studies of FPGA implementations for commonly used image processing algorithms — feature extraction and digital image warping.
article link to PDF
Article PDF 385 KB


Computer vision is a branch of artificial intelligence that focuses on equipping computers with the functions typical of human vision. In this discipline, feature tracking is one of the most important pre-processing tasks for several applications, including structure from motion, image registration, and camera motion retrieval. The feature extraction phase is critical because of its computationally intensive nature.

Digital image warping is a branch of image processing that deals with techniques of geometric spatial transformations. Warping images is an important stage in many applications of image analysis, as well as some common applications of computer vision, such as view synthesis, image mosaicing, and video stabilization in a real-time system.

In this article, we'll present an FPGA implementation of these algorithms.

Feature Extraction Theory
In many computer vision tasks we are interested in finding significant feature points — or more exactly, the corners. These points are important because if we measure the displacement between features in a sequence of images seen by the camera, we can recover information both on the structure of the environment and on the motion of the viewer.

Figure 1 shows a set of feature points extracted from an image captured by a camera.Corner points usually show a significant change of the gradient values along the two directions (x and y). These points are of interest because they can be uniquely matched and tracked over a sequence of images, whereas a point along an edge can be matched with any number of other points on the edge in a second image.

The Feature Extraction Algorithm
The algorithm employed to select good features is inspired by Tomasi and Kanade's method, with the Benedetti and Perona approximation, considering the eigenvalues a and b of the image gradient covariance matrix. The gradient covariance matrix is given by:

where Ix and Iy denote the image gradients in the x and y directions.

Hence we can classify the structure around each pixel observing the eigenvalues of H:

No structure : a = b = 0
Edge : a = 0, b >> 0
Corner : a >> 0, b >> 0
Using the Benedetti and Perona approximation, we can choose the corners without computing the eigenvalues.

We have realized an algorithm that, compared to the original method, doesn't require any floating-point operations. Although this algorithm can be implemented either in hardware or software, by implementing it in FPGA technology we can achieve real-time performance.

Input:

  • 8-bit gray-level image of known size (up to 512 x 512 pixels)
  • The expected number of feature points (wf )
Output:
  • List of selected features (FL). The type of the output is a 3 x N matrix whose:
    • First row contains the degrees of confidence for each feature in the list
    • Second row contains the x-coordinates of the feature points
    • Third row contains the y-coordinates of the feature points
Semantic of the Algorithm
In order to determine if a pixel (i, j) is a feature point (corner), we followed Tomasi and Kanade's method.

First, we calculate the gradient of the image. Hence the 2 x 2 symmetric matrix G = [a b; b c] is computed, whose entries derive from the gradient values in a patch around the pixel (i, j).

If the minimum eigenvalue of G is greater than a threshold, then the pixel (i, j) is a corner point. The minimum eigenvalue is computed using an approximation to avoid the square root operation that is expensive for hardware implementations.

The corner detection algorithm could be summarized as follows:

The image gradient is computed by mean of convolution of the input image with a predefined mask. The size and the values of this mask depend on the image resolution. A typical size of the mask is 7 x 7.

  • For each pixel (i, j) loop:

    where N is the number of pixels in the patch and Ixk and Iyk are the components of the gradient at pixel k inside the patch.
  • Pi,j = (a–t)(c–t)– b2
    where t is a fixed integer parameter.
  • If (Pi,j > 0) and (ai,j > t), then we retain pixels (i,j)
  • Discard any pixel that is not a local maximum of Pi,j
  • End loop
  • Sort, in decreasing order, the feature list FL based on the degree of confidence values and take only the first wf items.
Implementation
With its high-speed embedded multipliers, the Xilinx® SpartanTM-3 architecture meets the cost/performance characteristics required by many computer vision systems that could take advantage of this algorithm.

The implementation is divided into four fundamental tasks:

  1. Data acquisition. Take in two gradient values along the x and y axis and compute for each pixel three coefficients used by the characteristic polynomial. To store and read the gradient values, we use a buffer (implemented using a Spartan-3 block RAM).
  2. Calculation of the characteristic polynomial value. This value is important to sort the features related to the specific pixel. We implemented the multiplications used for the characteristic polynomial calculus employing the embedded multipliers on Spartan-3 devices.
  3. Feature sorting. We store computed feature values in block RAM and sort them step by step by using successive comparisons.
  4. Enforce minimum distance. This is done to keep a minimum distance between features; otherwise we get clusters of features heaped around the most important ones. This is implemented using block RAMs, building a non-detect area around each most important feature where other features will not be selected.
Spartan-3 Theoretical Performance
The algorithm is developed for gray-level images at different resolutions, up to 512 x 512 at 100 frames per second.

The resources estimated by Xilinx System Generator are:

  • 1,576 slices
  • 15 block RAMs
  • 224 LUTs
  • 11 embedded multipliers
The embedded multipliers and extensive memory resources of the Spartan-3 fabric allow for an efficient logic implementation.

Applications of Feature Extraction
Feature extraction is used in the front end for any system employed to solve practical control problems, such as autonomous navigation and systems that could rely on vision to make decisions and provide control. Typical applications include active video surveillance, robotic arms motion, measurement of points and distances, and autonomous guided vehicles.

Image Warping Theory
Digital image warping deals with techniques of geometric spatial transformations.

The pixels in an image are spatially represented by a couple of Cartesian coordinates (x, y). To apply a geometric spatial transformation to the image, it is convenient to switch to homogeneous coordinates, which allow us to express the transformation by a single matrix operation. Usually this is done by adding a third coordinate with value 1 (x, y, 1).

In general, such transformation is represented by a non-singular 3 x 3 matrix H and applied through a matrix-vector multiplication to the pixel homogeneous coordinates:

The matrix H, called homography or collineation, is defined up to a scale factor (it has 8 degrees of freedom). The transformation is linear in projective (or homogeneous) coordinates, but non-linear in Cartesian coordinates.

The formula implies that to obtain Cartesian coordinates of the resulting pixel we have to perform a division, an operation quite onerous in terms of time and area consumption on an FPGA. For this reason, we considered a class of spatial transformations called "affine transformations" that is a particular specialization of homography. This allows us to avoid the division and obtain good observational results:

Affine transformations include several planar transformation classes as rotation, translation, scaling, and all possible combinations of these. We can summarize the affine transformation as every planar transformation where the parallelism is preserved.

Six parameters are required to define an affine transformation.

Image Warping Algorithms
There are two common ways to warp an image:

  • Forward mapping
  • Backward mapping
Using forward mapping, the source image is scanned line by line and the pixels are copied to the resulting image, in the position given by the result of the linear system shown in equation (2). This technique is subject to several problems, the most important being the presence of holes in the final image in the case of significant modification of the image (such as rotation or a scaling by a factor greater than 1) (Figure 2).

The backward mapping approach gives better results.Using the inverse transformation A-1, we scan the final image pixel by pixel and transform the coordinates. The result is a pair of non-integer coordinates in the source image. Using a bilinear interpolation of the four pixel values identified in the source image, we can find a value for the final image pixel (see Figure 3). This technique avoids the problem of holes in the final image, so we adopted it as our solution for the hardware implementation.

Implementation
Software implementations of this algorithm are well-known and widely used in applications where a personal computer or workstation is required. A hardware implementation requires further work to achieve efficiency constraints on an FPGA.

Essentially, the process can be divided in two parts: transformation and interpolation. We implemented the first as a matrix-vector multiplication (2), with four multipliers and four adders. The second is an approximation of the real result of the interpolation: we weighted the four pixel values approximating the results of the transformation with two bits after the binary point. Instead of performing the calculations given by the formula, we used a LUT to obtain the pixel final value, since we divided possible results of the interpolation into a set of discrete values.

Spartan-3 Theoretical Performance
We designed the algorithm using System Generator for DSP, targeting a Spartan-3 device. We generated the HDL code and synthesized it with ISE.design software, obtaining a resource utilization of:

  • 744 slices (1,107 LUTs)
  • 164 SRL16
  • 4 embedded multipliers
The design can process up to 46 fps (frames per second) with 512 x 512 images. Theoretical results show a boundary of 360+ fps in a Spartan-3-based system.

Applications of Image Warping
Image warping is typically used in many common computer vision applications, such as view synthesis, video stabilization, and image mosaicing.

Image mosaicing deals with the composition of sequence (or collection) of images after aligning all of them respective to a common reference frame. These geometrical transformations can be seen as simple relations between coordinate systems.

By applying the appropriate transformations through a warping operation and merging the overlapping regions of a warped image, we can construct a single panoramic image covering the entire visible area of the scene. Image mosaicing provides a powerful way to create detailed threedimensional models and scenes for virtual reality scenarios based on real imagery.It is employed in flight simulators, interactive multi-player games, and medical image systems to construct true scenic panoramas or limited virtual environments.

Conclusion
The challenge is to design efficient, effective, and reliable vision modules with the highest possible reliability.

Ultimodule, a Xilinx XPERTS partner, and the VIPS Laboratory at the Universita di Verona have defined a foundation platform for computer vision using Ultimodule's system-on-module family. The platform provides a stereovision system for real-time extraction of three-dimensional data and a real-time image-processing engine implementing most of the algorithms required when an application relies on vision to make decisions and provide control.

The platform supports applications that require high performance and robust vision analysis, both in qualitative and computational terms (real-time), including active video surveillance, robotic arm motion and control, autonomous vehicle navigation, test and measurement, and hazard detection. The platform provides modules with all required system control logic, memory, and processing hardware, together with the application software. Interconnecting modules allow fast development of a complex architecture.

The platform leverages Xilinx Spartan-3 devices, which are an optimal choice for image processing IP cores because of their flexibility, high performance, and DSPoriented targeting.The Spartan-3 family provides a valid, programmable alternative to ASICs. This characteristic, coupled with its low cost structure, adds considerable value when time to market is crucial.

For more information about feature extraction, you can e-mail the authors at paolo.giacon@students.univr.it or saul. saggin@students.univr.it. For more information about image warping, you can e-mail matteo.busti@students.univr.it or giovanni.tommasi@students.univr.it.

We are grateful for the support from our advisor, Professor Murino, in the Vision, Image Processing, and Sound (VIPS) Laboratory in the Dipartimento di Informatica at the Universita di Verona, and contributions from Marco Monguzzi, Roberto Marzotto, and Alessandro Negrente.

Printable PDF version of this article with graphics. PDF logo (4/18/05) 385 KB

 
职位招聘 本地活动及在线座谈 本地新闻稿 投资者关系 反馈 法律声明 网站地图
© 1994-2008 Xilinx, Inc. All Rights Reserved.