Unlock New Levels of Productivity for Your Design Using ISE Design Suite 12

By: Hitesh Patel

Xilinx® ISE® Design Suite v12 is the production-optimized tool suite for Virtex®-6 and Spartan®-6 FPGAs that delivers innovation in three critical areas of FPGA design: power reduction, productivity, and performance.

This white paper presents an overview of key achievements and advances in each of these three categories, describing their intent and their impact.

Highlights

Power Optimization

Building on a well-known but often under-utilized power-optimizing design methodology called clock gating, ISE Design Suite v12 introduces the first automated, intelligent clock gating technology for FPGA design. With this capability, the tool automatically neutralizes unnecessary logic activity, reducing dynamic power usage by up to 30%. ISE v12 software also introduces fourth-generation partial reconfiguration technology, which, when combined with the design flow in ISE v12 software, provides a simple, intuitive approach to on-the-fly reuse of FPGA resources, creating additional opportunities to reduce power.

Productivity

ISE v12 software breaks new ground in design productivity enhancements with the introduction of design preservation—the ability to partition and lock down the placement and routing of timing-critical portions of a design, thus enabling the designer to achieve and maintain timing repeatability. This new partitioning technology figures prominently in the deployment of partial reconfiguration. ISE v12 software also introduces next-generation Advanced Microcontroller Bus Architecture (AMBA® protocol) IP, interconnect, and tool support. Xilinx has been intimately involved with ARM on the development of the updated version 4 open standard interface specification to enable the development and delivery of plug-and-play IP from Xilinx and third-party ecosystem providers—an advance that can provide the most valuable benefit to design productivity.

Performance

ISE v12 software supports production Spartan-6 and Virtex-6 FPGA devices and IP with fully optimized place-and-route and synthesis algorithms, improving Quality of Results (QoR) and greatly decreasing synthesis and implementation runtimes. An enhancement to SmartXplorer in the ISE v12 software release accelerates timing closure by enabling exploration of design strategies in the synthesis space.

Power Optimization

The vast majority of legacy and IP designs deployed today are fraught with power inefficiencies. Although the value of power reduction techniques, such as clock gating to reduce dynamic power consumption, is well understood, rarely does an engineer have the time to employ them manually. ISE v12 software offers a straightforward solution whereby a user can automatically implement power optimizations downstream (after synthesis) that were overlooked or omitted at the RTL level.

A unique set of algorithms enables ISE v12 software to automatically identify and neutralize unnecessary logic activity, a primary contributor to dynamic power inefficiencies. These algorithms utilize the abundant clock enables (CEs) found in the Virtex-6 and Spartan-6 FPGAs. Each CE is ideally suited for power optimization because it connects to the basic cluster of the Virtex-6 FPGA logic (the slice) and controls a small number of registers (only eight). See Figure 1.

Based on a thorough analysis of the sequential elements in the design, the software detects any unnecessary transitions that do not change the final logic and then creates gating signals to cancel these unnecessary transitions, connecting them to the CE, as shown in Figure 2.

Although the use of clock gating to suppress unnecessary switching in FPGAs is not a new concept, intelligent, fine-grain clock gating is a completely new technology for FPGAs, promising to reduce dynamic power by as much as 30%.
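
To make the idea concrete, the following minimal Python sketch models a registered 2:1 multiplexer. It is an illustration only, not the algorithm used by ISE: the circuit, the names `simulate`, `a_reg`, and `b_reg`, and the toggle-counting metric are assumptions chosen for this example. A gating signal drives each register's CE so that the register captures new data only when that data will actually be selected, and the sketch checks that the outputs are unchanged while the number of register bit toggles does not increase.

```python
import random

def simulate(a_in, b_in, sel_in, gated):
    """Simulate a registered 2:1 mux; return (outputs, register bit toggles).

    Hypothetical circuit: out = a_reg if sel_reg else b_reg, where the data
    registers capture their inputs on every clock unless CE gating is applied.
    """
    a_reg = b_reg = sel_reg = 0
    outputs, toggles = [], 0
    for a, b, sel in zip(a_in, b_in, sel_in):
        outputs.append(a_reg if sel_reg else b_reg)
        # Gating signals: a register is enabled only if the value it captures
        # now will be selected by the mux in the next cycle.
        ce_a = sel if gated else 1
        ce_b = (1 - sel) if gated else 1
        next_a = a if ce_a else a_reg
        next_b = b if ce_b else b_reg
        # Count flipped bits as a rough proxy for switching activity.
        toggles += bin(a_reg ^ next_a).count("1")
        toggles += bin(b_reg ^ next_b).count("1")
        a_reg, b_reg, sel_reg = next_a, next_b, sel
    return outputs, toggles

rng = random.Random(0)
cycles = 1000
a_in = [rng.randrange(256) for _ in range(cycles)]
b_in = [rng.randrange(256) for _ in range(cycles)]
sel_in = [rng.randrange(2) for _ in range(cycles)]

ref_out, ref_toggles = simulate(a_in, b_in, sel_in, gated=False)
gated_out, gated_toggles = simulate(a_in, b_in, sel_in, gated=True)

assert gated_out == ref_out  # functional behavior is unchanged
print(f"toggles without gating: {ref_toggles}")
print(f"toggles with gating:    {gated_toggles}")
```

The toggle count here is only a stand-in for dynamic power; real savings depend on the design and should be measured with the power analysis tools.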

ISE v12 software is the only tool that offers intelligent, automated clock gating optimizations integrated with place-and-route algorithms. These optimizations do not alter the preexisting logic or the processing of the design, nor do they alter clock placement. The additional logic created is separate from the existing logic and adds only a small percentage of LUTs (2% on average) to the original design. Thus, in the vast majority of cases, these optimizations have no effect on timing.

In a typical scenario, an engineer would use Xilinx Power Estimator (XPE) or Xilinx Power Analyzer (XPA) to estimate the dynamic power consumption for a design. When the power budget is exceeded, the engineer can, at the flip of a switch, automatically apply intelligent clock gating optimizations to the design. The designer can then use XPA to determine the extent of the anticipated power savings.
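
The budget check in this scenario amounts to simple arithmetic. The sketch below uses made-up numbers (they are not XPE or XPA output), and the function name `total_power_w` is hypothetical. It treats the 30% figure as the best-case reduction in dynamic power and leaves static power untouched, since clock gating only affects switching activity.

```python
def total_power_w(static_w, dynamic_w, dynamic_reduction=0.0):
    """Total power in watts after scaling only the dynamic component."""
    return static_w + dynamic_w * (1.0 - dynamic_reduction)

# Hypothetical estimates for illustration only.
static_w, dynamic_w, budget_w = 1.2, 3.0, 3.5

before = total_power_w(static_w, dynamic_w)
best_case = total_power_w(static_w, dynamic_w, dynamic_reduction=0.30)

print(f"before gating: {before:.2f} W, within budget: {before <= budget_w}")
print(f"best case:     {best_case:.2f} W, within budget: {best_case <= budget_w}")
```

Because only the dynamic component shrinks, a 30% dynamic reduction yields a smaller percentage reduction in total power, which is why the anticipated savings should be confirmed with XPA.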

See WP370, Intelligent Clock Gating for more details.

Productivity

Design Preservation

An FPGA design comprising complex modules can make timing closure difficult to maintain, even though the design of the modules themselves might remain unchanged. As a result, designers can spend vast amounts of time repeatedly trying to regain timing after minor changes to other portions of the design.

The design preservation flow in ISE v12 software solves this problem by allowing the user to establish one or more partitions that can be locked to a particular placement and routing after timing closure for the partition is achieved. This greatly reduces the number of implementation iterations required during the timing closure phase. Moreover, by using the exact same implementation, design preservation precludes the necessity of performing full verification on unchanged modules.

Establishing an appropriately defined hierarchy in the RTL design phase greatly impacts the success with which design preservation can be employed in a design. The engineer must create partitions that follow the logical hierarchy of an HDL design. Some common rules for creating good hierarchy for partitions include:

- Keep logic that needs to be optimized, implemented, and verified in the same level of hierarchy
- Keep logic that needs to be packed together in the same level of hierarchy
- Register inputs and outputs of modules
- Do not have constants as inputs to partitions
- Do not have unused inputs or outputs in the partition
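
Some of these rules lend themselves to a mechanical check. The following Python sketch is a hypothetical lint pass over a hand-written description of a partition's ports. The `Port` fields and the `check_partition` function are invented for this illustration and do not correspond to any ISE file format or API.

```python
from dataclasses import dataclass

@dataclass
class Port:
    name: str
    direction: str          # "in" or "out"
    registered: bool        # a register sits at the partition boundary
    constant: bool = False  # tied to a constant value
    used: bool = True       # connected to logic on both sides

def check_partition(ports):
    """Return a list of rule violations for one partition's ports."""
    problems = []
    for p in ports:
        if not p.registered:
            problems.append(f"{p.name}: {p.direction}put is not registered")
        if p.direction == "in" and p.constant:
            problems.append(f"{p.name}: constant used as partition input")
        if not p.used:
            problems.append(f"{p.name}: unused port on partition")
    return problems

# Example partition with one clean port and two problematic ones.
ports = [
    Port("data_in", "in", registered=True),
    Port("mode", "in", registered=True, constant=True),
    Port("debug_out", "out", registered=False, used=False),
]
for problem in check_partition(ports):
    print(problem)
```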

Even if the design preservation flow was not taken into consideration during the RTL design phase, designs often use independent cores that can still benefit from this flow.

For information on the use of the design preservation flow to achieve repeatable results, see WP362, Repeatable Results with Design Preservation. For information on maintaining predictable results throughout the design effort, see WP361, Maintaining Repeatable Results.

Partial Reconfiguration

Partial reconfiguration technology allows the dynamic modification of FPGA logic blocks by downloading partial bit files without interrupting the operation of the remaining logic (see Figure 3).

Support for fourth-generation on-the-fly partial reconfiguration technology lets designers fit sophisticated applications into the smallest possible device, reducing the size, cost, and power consumption of a design.

ISE v12 software makes this technology easier than ever to use by providing an intuitive interface and simplified methodology that closely aligns with the familiar standard ISE software design flow. The ISE software partial reconfiguration flow uses the same proven Xilinx tools and techniques for timing closure, design management and floorplanning, and design preservation.

Partial reconfiguration reduces power in a number of ways, the most obvious being the improved efficiency of resource utilization or device selection. Additionally, swapping out power-hungry functions or I/O standards for more power-efficient versions when maximum performance is not needed reduces dynamic power usage. The user can also download blanking bitstreams to reduce toggling activity when the functionality of the reconfigurable region is not required.

Figure 4 provides a high-level view of the use of partial reconfiguration to change the protocol standard for a port in a wired Optical Transport Network (OTN) solution.

This 40G OTN muxponder demonstration system employs partial reconfiguration to enable four independent ports (client channels) with support for OTU2 and OC-192/STM-64 industry standards. The system/user can reconfigure each channel on the fly by loading into the Xilinx FPGA a partial bitstream that instantiates just the chosen port persona needed at that time, rather than instantiating all possible port configurations at once. This approach enables developers to use fewer and smaller devices to implement the FPGA-based 40G OTN muxponder.
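
The resource argument can be sketched in a few lines of Python. The persona names come from the demonstration described above, but the LUT counts and the `ReconfigurablePort` class are assumptions made up for this illustration. The `load` method stands in for downloading a partial bit file and does not model any real configuration interface.

```python
# Hypothetical LUT cost per port persona (illustrative numbers only).
PERSONA_LUTS = {"OTU2": 9000, "OC-192/STM-64": 7000}

class ReconfigurablePort:
    """One reconfigurable region holding a single port persona at a time."""

    def __init__(self, persona):
        self.persona = persona

    def load(self, persona):
        # Stand-in for downloading a partial bit file into this region;
        # the other ports and the static logic are not touched.
        if persona not in PERSONA_LUTS:
            raise ValueError(f"unknown persona: {persona}")
        self.persona = persona

ports = [ReconfigurablePort("OTU2") for _ in range(4)]
ports[2].load("OC-192/STM-64")  # reconfigure one channel on the fly

# Each region must be sized for its largest persona...
region_luts = len(ports) * max(PERSONA_LUTS.values())
# ...whereas a static design would instantiate every persona on every port.
static_luts = len(ports) * sum(PERSONA_LUTS.values())

print([p.persona for p in ports])
print(f"reconfigurable regions: {region_luts} LUTs")
print(f"all personas at once:   {static_luts} LUTs")
```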

Partial reconfiguration is even being used in designs for space. In this field, it is valued primarily for its inherent ability to perform on-orbit “upgrades” (reconfigurations) without interrupting the device operation. Since logic in the static region remains operational, it is possible to reconfigure the device while maintaining a communications and state-of-health link to the node—a critically important feature for extreme remote applications. Since new configurations can be uploaded remotely, the use of partial reconfiguration also drastically reduces the amount of very expensive, radiation-hardened, nonvolatile memory typically required in the system.

**AXI4: The Path to Plug-and-Play IP**

Of all the design productivity advances delivered in ISE v12 software, the one that might offer the broadest reaching value is the introduction of a next-generation AMBA Advanced eXtensible Interface (AXI). The product of a collaborative effort involving ARM and Xilinx, this new IP interface is part of a Xilinx strategic initiative to enable the creation of plug-and-play IP. The new interface provides a framework that satisfies the requirements in all three targeted domains (the Embedded, DSP, and Logic/Connectivity domains). Plug-and-play IP removes the design overhead required to address IP with different interface standards, while enabling the rapid growth of a more robust IP ecosystem offering a continuously broadening catalog of IP.

The new interface standard (AXI4) is built on a high-performance, point-to-point channel architecture that minimizes channel traffic congestion, maximizes data throughput through support of multiple outstanding memory-mapped transactions, and offers a streaming interface that allows efficient data transfer for high-speed serial I/O. Its ability to use register slices to pipeline connections and its support for long burst-based transactions make it possible to achieve higher $F_{MAX}$ and throughput with AXI4.

Overcoming Diversity with Uniformity

The growth of the extensive library of IP available for Xilinx FPGA design has precipitated the creation of a large and potentially confusing assortment of interfaces. At present, there are several different interfaces for embedded design alone, e.g., Local Link, PLBv46, DCR, XCL, FSL, IPIC/IPIF, OCM, LMB, NPI, MIB, PLBv34, OPB, and VFBC. Furthermore, current IP interfaces can be split into two general categories: memory-mapped and streaming. Memory-mapped interfaces (e.g., PLBv46, DCR, and NPI) use addresses to uniquely identify devices or memory regions. Streaming interfaces (e.g., FSL and Local Link) are generally used for video, communications, DSP, and data flow applications. Streaming interfaces are not memory mapped and therefore have no concept of addressing.
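
The distinction between the two categories can be illustrated with a short Python sketch. The address map, device names, and function names below are hypothetical and are not taken from any Xilinx IP. The point is only that a memory-mapped access carries an address that must be decoded to find its target, whereas a stream delivers data in order to a fixed destination.

```python
# Hypothetical address map: (base address, size in bytes, device name).
ADDRESS_MAP = [
    (0x0000_0000, 0x1000_0000, "ddr_memory"),
    (0x4000_0000, 0x0001_0000, "uart"),
    (0x4001_0000, 0x0001_0000, "timer"),
]

def decode(address):
    """Memory-mapped access: the address selects the target device."""
    for base, size, device in ADDRESS_MAP:
        if base <= address < base + size:
            return device, address - base
    raise ValueError(f"no device at address {address:#010x}")

def stream(samples, sink):
    """Streaming access: data flows in order to one fixed sink, no address."""
    for sample in samples:
        sink.append(sample)

print(decode(0x4001_0004))  # ('timer', 4)

received = []
stream([10, 20, 30], received)
print(received)             # [10, 20, 30]
```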

It is not always easy to determine which interface (or interfaces) should be used in a given design, nor is it a trivial matter to get IP with different interfaces to interoperate. In short, IP portability and interoperability are significant issues that have a major impact on design productivity.

In response to this challenge, Xilinx has partnered with ARM to develop a next-generation AXI interface, AXI4, as part of AMBA 4.0. AXI is a royalty-free industry-standard interface that has been designed to be very flexible and to enable IP reuse. It is publicly defined in the AMBA specification, which can be accessed at http://www.arm.com/products/system-ip/amba/amba-open-specifications.php.

The AXI4 specification, one component of AMBA 4.0, is also an open standard and is backward compatible with AMBA 3.0 (AXI3). AXI4 contains new features added to AXI3, such as longer burst lengths, plus a set of additional enhancements to support data streaming and simpler control register/peripheral interfaces, all features that favor use in an FPGA. ARM calls these new additions AXI4-Lite and AXI4-Stream:

- AXI4-Lite adds a lightweight, non-bursting, standard interface for peripheral IP or for control registers within an IP.
- AXI4-Stream adds the capability of a lightweight, high-performance streaming interface. Based on the underlying data transfer protocol used for the Write Data Channels of AXI, this interface transfers data with no concept of address through a simple flow-control protocol.
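
The flow-control protocol is a two-signal handshake: in the AMBA AXI4-Stream specification, a data transfer takes place in a clock cycle only when the master asserts TVALID and the slave asserts TREADY. The Python sketch below simulates that rule cycle by cycle. The function name and the ready pattern are assumptions made for illustration, the master is assumed to always have data available, and sideband signals such as TLAST are omitted.

```python
def stream_transfer(data, ready_pattern):
    """Cycle-level model of an AXI4-Stream style valid/ready handshake.

    data          -- words the master wants to send, in order
    ready_pattern -- slave TREADY value (0 or 1) for each clock cycle
    Returns (received words, number of cycles used).
    """
    received = []
    index = 0
    cycles = 0
    for tready in ready_pattern:
        if index == len(data):
            break                   # master has nothing left to send
        cycles += 1
        tvalid = 1                  # master always has data in this model
        tdata = data[index]         # held stable until it is accepted
        if tvalid and tready:
            received.append(tdata)  # transfer happens on this clock edge
            index += 1
    return received, cycles

words = [0xA, 0xB, 0xC]
# The slave stalls in the second and fourth cycles by deasserting TREADY.
received, cycles = stream_transfer(words, [1, 0, 1, 0, 1, 1])
print(received, cycles)
```

Because no address accompanies the data, back-pressure from the slave is the only thing that slows the stream: three words take five cycles here because of the two stalls.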

AXI4 also improves on the AMBA 3.0 specification: burst transaction support has been increased from a maximum length of 16 data transfers to 256, quality-of-service and user signaling have been added, and the definition of transaction attributes and device types is more complete. A typical system architecture using these components is shown in Figure 5.
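
The effect of the longer burst limit is easy to quantify. The Python sketch below counts how many burst transactions, and therefore how many address phases, are needed to move a block of data at each limit. The helper name and the 4096-transfer block size are arbitrary assumptions for illustration, and alignment and address-boundary restrictions are ignored.

```python
import math

AXI3_MAX_BURST = 16    # data transfers per burst (AMBA 3.0)
AXI4_MAX_BURST = 256   # data transfers per burst (AXI4)

def bursts_needed(total_transfers, max_burst):
    """Number of bursts required to move total_transfers data transfers."""
    return math.ceil(total_transfers / max_burst)

total = 4096  # hypothetical block size, in data transfers
print(bursts_needed(total, AXI3_MAX_BURST))  # 256
print(bursts_needed(total, AXI4_MAX_BURST))  # 16
```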

The additional benefits afforded by adopting AXI as the open standard IP interface are many and substantial:

- AXI provides features that enable multiprocessor operation, power control, and security. Additional features can be added through the use of user-defined sideband signals.
- AXI is thoroughly specified by ARM and already widely understood and accepted by many.
- AXI is an industry-standard interface that supports a viable ecosystem of bus functional models, books, verification IP, and other supporting items. This open industry-standard interface will help to ensure interoperability.
- AXI very effectively separates the interface from the interconnect, returning large benefits in design reuse, design flexibility, and time to market (TTM).
- AXI can unify IP. Although not all IP needs to communicate with other IP and not all Xilinx IP will move to an AXI interface, for those cores that do, a standard interface will be used with a very flexible interconnect to provide interoperability.
- AXI is a single well-documented interface with a rich third-party contribution, making it significantly easier to use than the plethora of existing IP interfaces.
- AXI is scalable because it supports a very broad range of functionality and parameters such as data interface width. It provides a signaling design that easily allows pipelining, width conversion, and asynchronous interfaces. It also supports a wide range of topologies—point-to-point, switched, bus, or hierarchical.

In summary, AXI4 has more advanced protocol features that enable the designer to quickly and easily "tune" a system for timing, area, and/or performance. As an open industry-standard specification, it will improve reuse, interoperability, and overall ease of use.

Figure 5: A Typical System Architecture Using All Three of the AXI4 Interface Types

Embedded Configuration Wizard

Productivity also improved substantially with the introduction of a MicroBlaze™ processor configuration wizard (Figure 6) that dramatically simplifies the optimization of an embedded design for performance, area, or throughput. The new wizard benefits both the novice and the expert user, enabling either to quickly create and/or explore an appropriate setup for a MicroBlaze processor configuration.

**Figure 6: MicroBlaze Processor Configuration Wizard**

Performance

SmartXplorer helps users reach timing closure more quickly by enabling them to run multiple design strategies in parallel. Starting with the 12.1 release, SmartXplorer supports Xilinx Synthesis Technology (XST) as well as the Synopsys Synplify tool. Thus, before running multiple implementation strategies, users can now execute several strategies in synthesis to select the best synthesized netlist for the implementation runs, as illustrated in Figure 7.

The production-level speed specifications produce an average 5% improvement in logic performance (QoR) for Spartan-6 FPGAs. The speed files have also been updated for Virtex-6 FPGAs. In addition, work on removing design implementation congestion has lowered place-and-route times from one or two days to four or five hours.
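
The two-stage exploration described for SmartXplorer can be sketched as follows. This is not SmartXplorer itself and does not call any Xilinx tool: the strategy names, the `run_synthesis` and `run_implementation` stand-ins, and the slack numbers they return are all invented for this illustration. The sketch only shows the control flow of running strategies in parallel, keeping the best synthesized netlist, and then exploring implementation strategies on it.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented worst-slack results in ns (higher is better); a real flow
# would obtain these by running the tools.
SYNTH_SLACK = {"area": -0.8, "balanced": -0.3, "speed": -0.1}
IMPL_GAIN = {"default": 0.0, "extra_effort": 0.2, "timing_driven": 0.15}

def run_synthesis(strategy):
    """Stand-in for one synthesis run; returns (strategy, worst slack)."""
    return strategy, SYNTH_SLACK[strategy]

def run_implementation(job):
    """Stand-in for one implementation run on an already chosen netlist."""
    netlist_slack, strategy = job
    return strategy, round(netlist_slack + IMPL_GAIN[strategy], 3)

def best(results):
    """Pick the (strategy, slack) pair with the largest slack."""
    return max(results, key=lambda r: r[1])

with ThreadPoolExecutor() as pool:
    # Stage 1: run every synthesis strategy in parallel, keep the best netlist.
    synth_name, synth_slack = best(pool.map(run_synthesis, SYNTH_SLACK))
    # Stage 2: run every implementation strategy on that netlist in parallel.
    jobs = [(synth_slack, s) for s in IMPL_GAIN]
    impl_name, final_slack = best(pool.map(run_implementation, jobs))

print(synth_name, impl_name, final_slack)
```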

More Advances in ISE Design Suite 12

Debug with ChipScope Analyzer

The ChipScope™ analyzer supports continuous (otherwise known as “repetitive”) trigger mode. In this mode, the trigger is automatically rearmed after a prior trigger event, capture, upload, and display sequence. This feature also includes an optional automatic data export to prevent losing captured information after each trigger/capture/upload/display sequence. This feature allows the user to visually monitor repetitive events without having to manually arm the trigger each time, providing an easy way to capture multiple sparse events that can occur over a long period of time—for example, over a weekend.

The repetitive trigger run mode can also be used to automatically capture and export data so that external applications (such as the MATLAB® software) can import the data for further processing.
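
The rearm-and-export cycle can be pictured with a small Python model. It is purely illustrative and does not use the ChipScope software or its file formats: the sample stream, the trigger condition, the capture depth, and the function name are assumptions. Each time the trigger condition is seen, a window of samples is captured and appended to an export list, and the trigger is then rearmed automatically.

```python
def repetitive_capture(samples, trigger, depth):
    """Capture `depth` samples at each trigger event, rearming each time."""
    exported = []  # stands in for automatic data export
    i = 0
    while i < len(samples):
        if trigger(samples[i]):
            window = samples[i:i + depth]  # capture starting at the trigger
            exported.append((i, window))   # export before rearming
            i += len(window)               # rearm after the captured window
        else:
            i += 1
    return exported

# Sparse events (values above 100) in an otherwise quiet signal.
signal = [0, 1, 0, 250, 3, 2, 0, 0, 180, 7, 0, 0]
captures = repetitive_capture(signal, lambda s: s > 100, depth=3)
for position, window in captures:
    print(position, window)
```

The exported list plays the role of the data that an external application would later import for further processing.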

Simulation with ISE Simulator (ISim)

ISim is now available for the embedded design flow through Xilinx Platform Studio (XPS) and Project Navigator. The 12.1 version of ISim introduces several productivity-enhancing features, including the facility to automatically detect design memories and list them for viewing and editing in a new memory editor.

Among its unique attributes, the memory editor arms the user with the facility to explore what-if scenarios, employing a graphical method to force a value or pattern on a signal without the need to recompile the design. The user can automatically scale the time unit and precision for time values shown in the memory editor’s waveform viewer by adjusting the zoom factor, or the user can lock in a user-specified choice.

The waveform viewer also provides the means to add signals, dividers, virtual buses, and markers to a waveform configuration, which can then be saved using Tool command language (Tcl) commands. The memory editor also allows the user to navigate directly from the waveform viewer to the HDL source.

ISE v12 Software Roadmap

The ISE v12 software release will occur in three phases. Highlighting the May 2010 release of ISE v12.1 software will be the introduction of intelligent clock gating for Virtex-6 FPGAs and delivery of the design preservation flow to achieve and maintain timing repeatability. The ISE v12.2 software release (Summer 2010) will include intelligent clock gating for Spartan-6 FPGAs and additional improvements in partial reconfiguration. ISE v12.3 software will introduce plug-and-play IP for embedded, DSP, and connectivity design utilizing the new AXI4 protocol.

Summary

In addition to the anticipated performance improvements commensurate with the production release of a Xilinx tool suite, the release of ISE v12 software unveils significant innovations with far-reaching potential. A new power-optimization capability called intelligent clock gating can reduce dynamic power by up to 30%. An innovation called design preservation vastly improves the user’s ability to achieve and maintain timing closure and design repeatability. An intuitive, fourth-generation partial reconfiguration design flow has already begun proving its ability to enable designers to reduce the size, cost, and power of their designs. With the introduction of AXI4, Xilinx has enabled the creation of a vast, ecosystem-supported plug-and-play IP library for Xilinx FPGAs that provides easy access to new and existing IP of both the memory-mapped and data-streaming varieties.

These innovations deliver unparalleled value in the three most important criteria for next-generation FPGA designs: better power efficiency, increased productivity, and higher performance.

Revision History

The following table shows the revision history for this document:

<table>
<thead>
<tr>
<th>Date</th>
<th>Version</th>
<th>Description of Revisions</th>
</tr>
</thead>
<tbody>
<tr>
<td>05/03/10</td>
<td>1.0</td>
<td>Initial Xilinx release.</td>
</tr>
</tbody>
</table>

Notice of Disclaimer

The information disclosed to you hereunder (the “Information”) is provided “AS-IS” with no warranty of any kind, express or implied. Xilinx does not assume any liability arising from your use of the Information. You are responsible for obtaining any rights you may require for your use of this Information. Xilinx reserves the right to make changes, at any time, to the Information without notice and at its sole discretion. Xilinx assumes no obligation to correct any errors contained in the Information or to advise you of any corrections or updates. Xilinx expressly disclaims any liability in connection with technical support or assistance that may be provided to you in connection with the Information. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, REGARDING THE INFORMATION, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT OF THIRD-PARTY RIGHTS.