This article reports the work currently
being done in the European Union (EU)
R&D project Design Methodology
and Environment for Dynamic
RECONFigurable FPGA, whose short
name is RECONF2 (see www.cordis.lu/en/home.htm
for more details on EU-funded
research projects). This project aims to make
it easier for designers to change
the configuration of part of an FPGA design
while the circuit is running. Xilinx already
offers this technology, but the lack of a simple
methodology and appropriate tools is a
major limitation to its implementation.
Therefore, the partners of this project
(both academic and industrial) defined a
complete and validated methodology, along
with the required front-end tools, addressing
the complete design flow: dynamic partitioning,
control of the dynamic behavior,
and dynamic verifications. Tools covering
all of the specifics related to dynamic reconfiguration
are fully compatible with the
standard design flow (a clear request from
our industrial partners). Also, neither the
methodology nor the tools are tied to
a particular application domain; they are
thus suitable for any embedded application,
especially real-time ones.
This technique offers many advantages,
including the ability to change
the behavior of a system while it is still running
to adapt it to a changing external
environment.
In this article, two of the project partners,
MBDA France and Deltatec,
demonstrate these benefits through a short
presentation of the methodological flow
and one example.
Adapted Design Flow
The goal of the RECONF2 project is to
build a set of partial bitstreams representing
different features, so as to partially
reconfigure the FPGA with those bitstreams
when needed under the control of
the FPGA itself or through the use of an
external controller. To reduce the difficulty
in managing such a dynamically reconfigured
application and to provide a
reliable implementation, the academic
partners developed a set of tools and
associated methodologies addressing the
following issues:
- Automatic or manual partitioning of a
conventional design
- Specification of the dynamic constraints
- Verification of the dynamic implementation
through dynamic simulations at
major steps of the design flow
- Automatic generation of the configuration
controller core for VHDL or C
implementation
- Dynamic floorplanning management
and guidelines for modular back-end
implementation
The resulting adapted design flow
shown in Figure 1 is based on both standard
and RECONF2-specific CAD tools.
The input of the design flow is a conventional
VHDL static description of the
application. You can also provide multiple
descriptions of a given VHDL entity to
enable dynamic switching between two
architectures sharing the same interfaces
and area on the FPGA.
We have enriched the classical design
flow with three major steps: partitioning of
the design code, verification of the dynamic
behavior, and generation of the configuration
controller.
Partitioning the Application
Based on your knowledge of the design architecture
and of when each sub-module is
used, you can indicate which parts of the
design to load dynamically, and under
which conditions. You can also specify data
management constraints to retain some
internal states of the application after unloading
and reloading the corresponding
dynamic module.
By identifying portions of the design in
the code at the level of an instance, a VHDL process,
or a VHDL assignment, you can make the
dynamic specification flexible and independent
of the application coding style.
The outputs of this partitioning task are:
- A VHDL entity and architecture set
corresponding to an identified dynamic
module and containing the relevant
HDL code.
- A dynamic constraint file (.dcf) that
contains the definition of each module
(in terms of content) and the associated
constraints for loading and unloading
it. You can also specify dynamic
relations between two dynamic modules,
either by making them share the same area
of the FPGA or by declaring them
mutually exclusive in time.
- A VHDL entity and architecture set
corresponding to the static part of the
final implementation. This part
includes all primary design instances
on which no dynamic constraints have
been applied. These instances will
remain permanently inside the FPGA.
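As a rough illustration, the partitioning outputs listed above can be pictured as a small data model. The sketch below is written in Python purely for explanation: the actual .dcf syntax is not given in this article, so the class, the field names, the condition strings, and the module names are all hypothetical. It also assumes that two modules sharing an area should be declared mutually exclusive, which is an inference rather than a stated rule of the tools.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicModule:
    """One dynamic module as a constraint entry might describe it (hypothetical fields)."""
    name: str
    load_when: str      # condition that triggers loading (free-form text here)
    unload_when: str    # condition that triggers unloading
    area: str           # label of the FPGA region the module occupies
    exclusive_with: set = field(default_factory=set)  # modules never loaded together


def area_conflicts(modules):
    """Return pairs of modules that share an area without being declared exclusive."""
    conflicts = []
    for i, a in enumerate(modules):
        for b in modules[i + 1:]:
            same_area = a.area == b.area
            exclusive = b.name in a.exclusive_with or a.name in b.exclusive_with
            if same_area and not exclusive:
                conflicts.append((a.name, b.name))
    return conflicts


# Illustrative modules only; the names and conditions are made up.
wipe = DynamicModule("wipe", "sel == WIPE", "sel != WIPE", "slot0", {"fade"})
fade = DynamicModule("fade", "sel == FADE", "sel != FADE", "slot0", {"wipe"})
blur = DynamicModule("blur", "sel == BLUR", "sel != BLUR", "slot0")

print(area_conflicts([wipe, fade]))        # exclusivity declared, nothing reported
print(area_conflicts([wipe, fade, blur]))  # blur shares slot0 without a declaration
```

A check of this kind would catch an incomplete constraint set before any simulation is run.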
Verifying the Dynamic Implementation
The implementation of such dynamic reconfiguration
mechanisms must be checked
with standard simulation tools. To be able
to do so, we had to adapt the classical verification
flow to verify the dynamic behavior
of the design and the coherence between
the dynamic constraints applied to the design
and the use of the design during simulation (Figure 2).
As a result, you can perform this dynamic
verification with behavioral, post-synthesis,
or post-layout VHDL netlists.
Simply feed a partitioned database into
the post-processing tool, which generates
an equivalent VHDL description of the
dynamic design that you can simulate
under standard static VHDL simulators.
The unloading of each dynamic module is
modeled by a wrapper that isolates the
inputs and outputs of each dynamic module
from the rest of the design according to
relevant dynamic constraints (Figure 3).
When a dynamic module is not present
inside the device, its outputs present X
or Z states to the rest of the design.
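The isolation behavior described above can be mimicked in a few lines. This is only a Python sketch of the idea, not the VHDL wrapper produced by the post-processing tool; the function name and the signal names are invented for the example, and only the X state is modeled.

```python
def isolated_outputs(outputs, loaded):
    """Model an isolation wrapper around one dynamic module.

    outputs maps signal names to bit strings. When the module is loaded the
    values pass through unchanged; otherwise every bit is forced to 'X'
    (unknown), which is what the rest of the design would observe.
    """
    if loaded:
        return dict(outputs)
    return {name: "X" * len(value) for name, value in outputs.items()}


# Hypothetical output signals of a dynamic module.
signals = {"pixel_valid": "1", "pixel_data": "10110010"}

print(isolated_outputs(signals, loaded=True))
print(isolated_outputs(signals, loaded=False))
```

Any static logic that consumes these values while the module is absent will propagate the unknown state, which makes the error visible in a standard simulator.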
The post-processing tool also automatically
generates two VHDL configuration
controller cores:
- The functional configuration controller
(FCC) is used during dynamic behavioral
simulation. The FCC controls
isolation switches by detecting events
inside the application, according to the
.dcf constraints. To assist with the verification
process, the FCC can also issue
different warnings each time a dynamic
module is requested in violation of
exclusivity rules defined in the .dcf.
- The physical configuration controller
(PCC) is a synthesizable version of the
FCC and is mapped as a static part of
the FPGA. As with the FCC, it detects
the loading and unloading conditions
according to the .dcf and manages the
dynamic reconfiguration of the FPGA
by reading bitstreams from storage memories
and rewriting the FPGA's configuration.
The PCC also provides an interface
to monitor the reconfiguration process
for hardware debugging purposes.
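To make the role of the FCC more concrete, the following Python sketch replays a sequence of load and unload requests and reports exclusivity violations. It is a simplified model under stated assumptions, not the generated VHDL core: the event format, the function name, and the module names are hypothetical, and a violating module is still marked as loaded so that later events can be checked.

```python
def run_fcc(events, exclusive_pairs):
    """Replay load/unload requests and collect exclusivity warnings.

    events: list of (action, module) tuples, action being 'load' or 'unload'.
    exclusive_pairs: set of frozensets naming modules that must not coexist.
    Returns the set of modules loaded at the end and the list of warnings.
    """
    loaded = set()
    warnings = []
    for action, module in events:
        if action == "load":
            # Compare the requested module with everything already present.
            for other in sorted(loaded):
                if frozenset((module, other)) in exclusive_pairs:
                    warnings.append(f"{module} requested while {other} is loaded")
            loaded.add(module)
        elif action == "unload":
            loaded.discard(module)
    return loaded, warnings


# Illustrative rule and event trace.
rules = {frozenset(("effect_a", "effect_b"))}
trace = [("load", "effect_a"), ("load", "effect_b"),
         ("unload", "effect_a"), ("load", "mixer")]

final_state, messages = run_fcc(trace, rules)
print(sorted(final_state))
print(messages)
```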
For dynamic behavioral verification,
you can enter an estimation of the bitstream
lengths into the post-processing
tool to take into account reconfiguration
delays. After layout, you can replace these estimates
with accurate values, while a back-annotated
VHDL netlist can replace the VHDL-partitioned
code to obtain accurate VITAL
verifications.
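The link between bitstream length and reconfiguration delay can be approximated with simple arithmetic. The Python sketch below assumes that one bus-wide word is written per configuration clock cycle; the 8-bit width, the 50 MHz clock, and the bitstream lengths are illustrative assumptions, not figures from the project.

```python
def reconfig_delay_ms(bitstream_bits, bus_width_bits=8, clock_hz=50_000_000):
    """Estimate the reconfiguration delay, in ms, for a bitstream of a given length.

    Assumes one bus-wide word is written per configuration clock cycle.
    The default width and clock are placeholders chosen for the example.
    """
    cycles = -(-bitstream_bits // bus_width_bits)  # ceiling division
    return cycles / clock_hz * 1e3


# A pre-layout guess, later replaced by the length measured after layout.
estimated = reconfig_delay_ms(2_000_000)
measured = reconfig_delay_ms(2_400_000)
print(f"estimated: {estimated:.2f} ms, after layout: {measured:.2f} ms")
```

Swapping the estimated length for the post-layout one changes only the input, which mirrors the refinement step described above.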
Placement and Routing
You synthesize the static part of the
design and the VHDL code of each
dynamic module separately to obtain separate
electronic design interchange format
(EDIF) netlists. You can then use the
Xilinx modular back-end flow to place and
route each module and to generate the
associated bitstream, resulting in a typical
floor plan (shown in Figure 4).
In the scope of the RECONF2 project,
the industrial partners extensively tested
these tools and methodologies through
various applications, including video processing,
complex state machines, automatically
adaptive portable equipment, and
fault-tolerant aerospace applications.
An Implementation Example
Figure 5 shows a complete video effects
console architecture using two effects generators
(A and B); their outputs feed a transition
mixer. Channel A feeds the live
output while the operator sets up the second
(Channel B) for a new effect, visible
through the preview output.
When ready, the operator selects a transition
scheme such as wipe or fade and
swaps the live and preview outputs (typically
using a T-bar). Effects generators select
their inputs from several external video
sources or feedback channels implemented
in an SDRAM-based frame store.
The challenging part of this application is
building a RECONF2-based implementation
with uninterrupted outputs.
We designed a dedicated hardware
development platform based on a Virtex-II
XC2V3000 device with a 64-bit PCI
adapter board (see Figure 6), taking into
account the specific constraints of the
Xilinx partial reconfiguration design flow
and providing the required flexibility for an
evaluation environment.
Dynamic Architecture of the Design
Based on Figure 5, we partitioned the
design into three processing modules (sharing
the same footprint), applied in
sequence to every field/frame. Each effects
generator also supports a collection of
effects, possibly changing from frame to
frame, implemented as separate exclusive
modules. The three processing steps are:
- Compute effects A output
- Compute effects B output
- Compute mixer output
This implies saving intermediate and
final results, while reconfiguring the module
for the next operation. An SDRAM
memory pool provides this buffering capability.
Also, the processing must run at
three times the video speed so that total
processing time remains unchanged.
In a reconfigurable design, there is
always a trade-off between the processing
time and reconfiguration time of a dynamic
module. This means that one dynamic
module must process a significant
amount of data before being replaced to
meet the real-time constraints.
In our real-time video application, a
common data unit is one field or frame, to be
processed in 20 or 40 ms respectively, compared with
the ~25 ms needed to configure the full
XC2V3000 device via its SelectMAP interface.
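These figures give a quick upper bound on how much of the device could be rewritten within one data unit. The Python sketch below assumes that reconfiguration time scales linearly with the share of the device being rewritten, which is a simplification, and it ignores the time needed for processing. Only the ~25 ms full-device figure and the 20/40 ms periods come from the text.

```python
def max_device_fraction(slot_ms, full_device_ms=25.0):
    """Largest share of the device that can be rewritten within slot_ms.

    Assumes reconfiguration time grows linearly with the share of the
    device being rewritten, capped at the whole device.
    """
    if slot_ms < 0:
        raise ValueError("slot_ms must be non-negative")
    return min(1.0, slot_ms / full_device_ms)


print(max_device_fraction(20))  # one field period
print(max_device_fraction(40))  # one frame period
```

Under this assumption a full reconfiguration cannot fit inside a single field, so only a partial reconfiguration leaves room for processing.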
Figure 8 shows the architecture used for
dynamically reconfigurable processing,
while Figure 7 shows the corresponding
layout. We instantiated field buffers on the
input and output side. Although the SDI
input/output pixel rate is 13.5 MHz, pixel
processing can run much faster, at 50
MHz, for instance.
Figure 9 shows typical phase alternating
line (PAL)-interlaced video timing.
Without buffering, dynamic module
reconfiguration must occur within the
blanking interval (1.57 ms), while processing
(at 13.5 MHz) fills the entire active
video interval (18.43 ms).
With the field buffers and 50 MHz processing, we obtain timing (as in Figure 9B)
with 16 ms allocated to reconfiguration
and 4 ms for video processing. Two processing
steps can be interleaved (as in
Figure 9C): 6 ms remain available for
dynamic module reconfiguration.
Applying the same reasoning to frame
buffering (2 fields = 40 ms), we double the
available time for reconfiguration (as in
Figure 9D).
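The timing budgets quoted above follow from a short calculation, sketched here in Python. The function assumes that the idle time of each period is split evenly between the processing steps, one reconfiguration per step. The 8 ms processing time per frame is an inference (twice the 4 ms field figure), not a value stated in the text.

```python
def reconfig_slot_ms(period_ms, processing_ms, steps=1):
    """Time left for each reconfiguration when `steps` modules share one period.

    Each step needs one processing burst and one reconfiguration, so the
    idle time of the period is divided evenly between the steps.
    """
    idle = period_ms - steps * processing_ms
    if idle < 0:
        raise ValueError("processing alone exceeds the period")
    return idle / steps


print(reconfig_slot_ms(20, 4))           # field buffering, one step -> 16.0
print(reconfig_slot_ms(20, 4, steps=2))  # two interleaved steps     -> 6.0
print(reconfig_slot_ms(40, 8, steps=2))  # frame buffering, two steps -> 12.0
```

The last two lines show the doubling of the reconfiguration time, from 6 ms to 12 ms, when moving from field to frame buffering.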
The RECONF2 tools and flow will
help investigate these trade-offs.
Figure 10 shows the manual partitioning
tool GUI, with the input VHDL
design in the left window. The partitioned
design appears in the right window. A simple
drag-and-drop assigns chunks of logic
to one particular module. Scheduling constraints
(load/unload and frame) are then
entered for each module.
Our design lends itself by nature to
manual partitioning: one particular effect is
always applied to a full video field/frame.
Each effect is implemented as an independent
dynamic module.
The configuration controller generator
analyzes the partitioned design and its constraint
file to produce:
- An FCC for simulation purposes
- A PCC for implementation in hardware
(VHDL code) or software (C code)
To evaluate as many features of the tools
as possible, we chose to support both configuration
controller schemes and tested
them on our hardware development platform.
Nonetheless, our preferred solution
was a software configuration under the
control of an on-board DSP because of
critical real-time issues; we optimized the
platform accordingly.
The Xilinx Partial Reconfiguration
Design Flow, based on the modular design
available within ISE 6.2 software, is used to
produce one global bitstream for startup
and several partial bitstreams for the different
dynamic modules. See Xilinx application
note XAPP290 for a detailed
description of this flow.
One benefit of the RECONF2
approach for this video console application
is obvious: we can add as many new video
effects (such as video enhancement filters),
fitting in the reserved dynamic module
space without the need for additional
FPGA resources. This effectively increases
the functional density of the FPGA.
A further increase comes from executing
several processing steps (effects A, B, and
transition) within a single video field/frame
duration, as previously explained. This is
very similar to the traditional parallel vs.
serial arithmetic trade-off, and makes a
great deal of sense given the extraordinary
progression of FPGA performance over the
last several years.
The partitioned approach also brings
a less obvious advantage. Simultaneously
supporting all of the functions
results in unneeded complexity that may
adversely impact the design's performance,
and the operating model is also more complex
for the control application. Smaller, dedicated
modules run faster and need fewer
operating parameters, making them more
manageable objects.
In the example presented, we see the
reconfiguration time as a clear limitation.
This time is directly linked to the dynamic
module size and to other FPGA parameters.
We are currently trying to implement
dynamic module caching: two dynamic
module slots are reserved, and one module
is reloaded while the other is processing, and
vice versa. Reconfiguration time is then completely
hidden, at the cost of FPGA space.
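The effect of such a two-slot scheme can be estimated with a simple timing model. The Python sketch below is an assumption-laden illustration, not the scheme under development: it supposes that the next module loads entirely in parallel with the current processing step, and the millisecond values are made up.

```python
def total_time_ms(process_ms, reconfig_ms, slots=1):
    """Total time to run a sequence of modules with one or two slots.

    process_ms[i] and reconfig_ms[i] are the processing and loading times
    of module i. With two slots, module i+1 loads while module i runs.
    """
    if slots == 1:
        # Every load is followed by its processing step, nothing overlaps.
        return sum(p + r for p, r in zip(process_ms, reconfig_ms))
    if slots != 2:
        raise ValueError("only one or two slots are modeled")
    total = reconfig_ms[0]  # the first load cannot be hidden
    for i, p in enumerate(process_ms):
        next_load = reconfig_ms[i + 1] if i + 1 < len(reconfig_ms) else 0
        total += max(p, next_load)  # processing overlaps the next load
    return total


print(total_time_ms([4, 4, 4], [3, 3, 3], slots=1))
print(total_time_ms([4, 4, 4], [3, 3, 3], slots=2))
print(total_time_ms([2, 2], [5, 5], slots=2))
```

In this model the reload is fully hidden only when it takes no longer than the processing step running in the other slot; the third call shows a case where it is not.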
Conclusion
Most of the specific tasks required by partial
dynamic reconfiguration are handled
by a complete methodology and associated
tools. These have been designed to be
fully compatible with the ones used for
classic Xilinx FPGA implementation.
Furthermore, our approach is usable with
any technology compatible with dynamic
re-configuration.
The academic partners developed the
method and tools according to specifications
that take into account industrial constraints,
such as:
- Compatibility with standard tools such
as simulators and synthesizers
- Usability with any technology compatible
with dynamic re-configuration, in
particular with Xilinx technology and
back-end tools
The academic partners have made the
tools and methods available to the industrial
partners, who are currently testing them
for complex circuit design, thus ensuring
ease of use and efficiency.
MBDA France will then be able to take
full advantage of this technology in deeply
embedded on-board computers, characterized
by small volumes and low power dissipation.
Deltatec develops digital imaging products
for multimedia, industrial/medical,
and professional broadcast markets.
Upcoming video applications will require
more and more versatility. High-definition
television (HDTV) applications must tackle
multiple formats (resolution, frame rate,
interlaced/progressive scan) as well as converge
with the computer graphics world.
Simultaneous support for all existing formats/functions may rapidly become a
nightmare and even hamper feasibility
because of cost or performance issues (for
example, an HDTV pixel rate of 75 MHz,
almost a six-fold increase over standard digital
television [SDTV]).
The RECONF2 tools and methodology
circumvent these problems, as only the
required function blocks are loaded at any
particular instant:
- FPGA size (and cost) remains acceptable,
while keeping the same integration level.
- Smaller, less generic, optimized function
modules more easily reach performance
goals.
At this time, the methodology and
tools are accessible to our project partners
and could be extended to third parties,
such as tool suppliers, for distribution
and support, to enable broader
access to this technology. For more information
on the RECONF2 project, visit
www.reconf.org.
Printable PDF version of this article with graphics. (8/1/04) 390 KB