

## **QDR II SRAM Interface for Virtex-4 Devices**

Author: Derek Curd

## **Summary**

This application note describes the implementation and timing details of a 2-word or 4-word burst Quad Data Rate (QDR II) SRAM interface for Virtex®-4 devices. The synthesizable reference design leverages the unique I/O and clocking capabilities of the Virtex-4 family to achieve high performance levels.

The direct-clocking methodology presented in this solution greatly simplifies the task of read data capture within the FPGA while minimizing the number of resources used. A straightforward user interface is provided to allow simple integration into a complete FPGA design utilizing one or more QDR II interfaces.

### Introduction

QDR SRAM devices were developed in response to the demand for higher bandwidth memories targeted at networking and telecommunications applications. The basic QDR architecture has independent read and write datapaths for simultaneous operation. Both paths use Double Data Rate (DDR) transmission to deliver two words per clock cycle, one word on the rising clock edge and another on the falling edge. The result is that four bus-widths of data (two read and two write) are transferred during each clock period, hence the quad-data-rate name.
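
As a worked example of the quad-data-rate arithmetic, assume the 36-bit bus width and the 250 MHz clock used elsewhere in this application note (Table 1 and Table 2). The symbols $W$, $f_{CLK}$, and $BW$ are introduced here only for illustration:

$$BW_{read} = BW_{write} = 2 \times W \times f_{CLK} = 2 \times 36\ \text{bits} \times 250\ \text{MHz} = 18\ \text{Gb/s}$$

$$BW_{total} = BW_{read} + BW_{write} = 36\ \text{Gb/s}$$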

The QDR and QDR II specifications were defined and developed by the QDR Consortium (Cypress, IDT, NEC, Samsung, and Renesas). The links to memory device vendors in the Additional Resources section provide more information about the QDR specification and QDR memory products.

QDR memory devices are offered in both 2-word burst and 4-word burst architectures. The 2-word burst devices transmit two words per read or write request. A DDR address bus is used to allow Read requests during the first half of the clock period and Write requests during the second half of the clock period. In contrast, 4-word burst devices transmit four words per Read or Write request, and hence only require a Single Data Rate (SDR) address bus to maximize data bandwidth. Read and Write operations must be requested on alternating clock cycles (i.e., non-overlapping), allowing the address bus to be shared.
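
The address bus requirement can be checked with a short calculation. Let $B$ be the burst length in words (a symbol introduced here for illustration). Each DDR data bus moves two words per clock cycle, so one request occupies its data bus for $B/2$ cycles, and keeping both data buses full requires one Read address and one Write address every $B/2$ cycles:

$$N_{addr/clk} = \frac{2}{B/2} = \frac{4}{B}$$

$$B = 2 \Rightarrow N_{addr/clk} = 2\ \text{(DDR address bus)}, \qquad B = 4 \Rightarrow N_{addr/clk} = 1\ \text{(SDR address bus)}$$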

The reference designs discussed in this application note target either a 2-word or a 4-word burst QDR II SRAM device. One of the unique features of the QDR II architecture is the echo-clock (CQ) output that is frequency locked to the device input clock (K) but edge aligned to the data transmitted on the Read path outputs (Q). The CQ clock output is retimed to align with the Q data outputs using a delay-locked loop (DLL) circuit internal to the QDR II memory device. This clock forwarding, or source-synchronous, method of interface allows greater timing margin for the read data capture operation at the far-end device (the Virtex-4 device in this design). It also enables the simple and elegant direct-clocking methodology used in this reference design, discussed in detail in this application note.

Figure 1 is a timing diagram showing concurrent Read/Write operations on a 4-word burst QDR II memory interface. All inputs to the QDR II memory are synchronous to the input clocks (K and  $\overline{K}$ ) and are typically presented to the memory center-aligned with respect to the K and  $\overline{K}$  clock edges. It is important to note that the active-Low Read Control ( $\overline{R}$ ) and Write Control ( $\overline{W}$ ) pins alternate clock cycles to enable a single shared SDR address bus (SA).

© 2004–2008 Xilinx, Inc. XILINX, the Xilinx logo, Virtex, Spartan, ISE, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. All other trademarks are the property of their respective owners.



The Write bus Data In (D) values are transmitted to the memory in DDR mode beginning on the next rising edge of K clock after the Write Control pin is active. The Read bus Data Out (Q) values are transmitted from the memory in DDR mode edge-aligned with the CQ and  $\overline{CQ}$  echo clock outputs. The first word on the Read bus is transmitted on the rising edge of the  $\overline{CQ}$  clock output following the next rising edge of the  $\overline{K}$  input clock.

QDR II memories also have active Low Byte Write ( $\overline{BW}$ ) enable pins to use when selecting specific bytes from the Data In (D) word to write to the memory. These signals are omitted from Figure 1 for clarity.



Figure 1: 4-Word Burst QDR II SRAM Timing Diagram with Concurrent Read and Write Operations

Figure 2 is a timing diagram showing concurrent Read/Write operations on a 2-word burst QDR II memory interface. The DDR address bus allows Read addresses to be presented to the memory during the first half of the clock period and Write addresses to be presented during the second half of the clock period. Thus, the active-Low Read Control ( $\overline{R}$ ) and Write Control ( $\overline{W}$ ) pins can be asserted on the same clock cycle.

The two Write bus Data In (D) values are transmitted to the memory in DDR mode starting on the rising edge of K clock prior to the Write address assertion. The Read bus Data Out (Q) values are transmitted from the memory in DDR mode edge-aligned with the CQ and  $\overline{CQ}$  echo clock outputs. The first word on the Read bus is transmitted on the rising edge of the  $\overline{CQ}$  clock output following the next rising edge of the  $\overline{K}$  input clock.





Figure 2: 2-Word Burst QDR II SRAM Timing Diagram with Concurrent Read and Write Operations



## Design Overview

Figure 3 is a high-level block diagram of the QDR II reference design that shows both the external connections to the QDR II memory device and the internal FPGA fabric interface for initiating Read/Write commands.



Figure 3: QDR II Reference Design Block Diagram

The  $\overline{D_{OFF}}$, C, and  $\overline{C}$  pins of the QDR II device are tied High in Figure 3. This configuration enables the CQ echo clock feature of the QDR II device and is necessary for proper operation of the reference design.

As shown in Figure 4, the QDR II reference design is composed of four main elements:

- User Interface
- Physical Interface
- Read/Write State Machine
- Delay Calibration State Machine





Figure 4: Components of the QDR II Reference Design

The user interface uses a simple protocol based entirely on SDR signals to make Read/Write requests. This module is constructed primarily from FIFO16 primitives and is used to store the address and data values for Read/Write operations before and after execution. More details on the user interface timing protocol are presented in a later section.

The Read/Write state machine is responsible for monitoring the status of the FIFOs within the user interface module, coordinating the flow of data between the user interface and physical interface, and initiating the actual Read/Write commands to the external memory device. It ensures execution of Read/Write operations with minimal latency in a concurrent manner as per the requirements of the QDR II memory specification.

The physical interface is responsible for generating the proper timing relationships and DDR signaling to communicate with the external memory device in a manner that conforms to its command protocol and timing requirements.

The delay calibration state machine is an integral component of the direct-clocking methodology used to achieve maximum performance while greatly simplifying the task of read data capture inside the FPGA. Each input on a Virtex-4 device has a programmable delay element (IDELAY) that can be dynamically adjusted to control the amount of delay on the input path across a window of approximately 5 ns. The delay calibration state machine leverages this unique capability to adjust the timing of the read data returning from the memory device so that it can be synchronized directly to the global FPGA system clock (USER\_CLK0) without any complex local-clocking or data recapture techniques. More details on the direct-clocking methodology are presented below.



Table 1 summarizes the QDR II reference design specifications, including performance goals and device utilization details.

Table 1: QDR II Reference Design Specifications

| Parameters                            |                    |              | Specification/Details                |
|---------------------------------------|--------------------|--------------|--------------------------------------|
| Maximum Frequency (by speed grade)    | -10                |              | 200 MHz                              |
|                                       | -11                |              | 250 MHz                              |
|                                       | -12                |              | 275 MHz                              |
| Device Utilization                    | Slices             |              | 174                                  |
|                                       | GCLK Buffers       |              | 3                                    |
|                                       | FIFO16 (block RAM) |              | 6                                    |
| QDR II SRAM Operation                 |                    |              | 2-word/4-word burst                  |
| Bus Width                             |                    |              | 36-bit Read/36-bit Write             |
| I/O Standard                          |                    |              | HSTL_I_18 (1.8V Signaling)           |
| HDL Language Support                  |                    |              | Verilog/VHDL                         |
| Target Memory Device for Verification | Simulation         | 2-word burst | Cypress CY7C1314BV18 (512K x 36 bit) |
|                                       |                    | 4-word burst | Samsung K7R323684M (1M x 36 bit)     |
|                                       | Hardware           | 2-word burst | Cypress CY7C1314BV18 (512K x 36 bit) |
|                                       |                    | 4-word burst | Samsung K7R323684M (1M x 36 bit)     |

## Implementation Details

The QDR II reference design was implemented to take advantage of the unique capabilities of the Virtex-4 family. Advances in I/O, clocking, and storage element technology enable the high-performance, turnkey operation of this design. The following sections describe the design implementation in further detail.

#### **User Interface**

The user interface module utilizes six FIFO16 blocks to store the address and data values for Read/Write operations. For Write commands, three FIFO16 blocks are used, one to store the Write address (USER\_AD\_WR) and byte write enable (USER\_BW\_n) signals, and two to store the Low (USER\_DWL) and High (USER\_DWH) 36-bit data words to be written to the memory. Read commands also use three FIFO16 blocks, one to store the Read address (USER\_AD\_RD) and two to store the Low (USER\_QRL) and High (USER\_QRH) 36-bit data words returning from the memory as a result of the Read execution.

Figure 5 shows the timing protocol required to issue Read/Write requests to the user interface when using the 4-word burst reference design. As mentioned previously, the interface uses all SDR signals synchronized to the main FPGA design system clock (USER\_CLK0).



Figure 5: 4-Word Burst User Interface Timing Protocol

Write requests are made via an active-Low USER\_W\_n signal during the rising edge of USER\_CLK0. The 18-bit Write address (USER\_AD\_WR) must be presented on this same clock edge. The first and second 36-bit data words to be written to the memory are also presented at this time to the 36-bit USER\_DWL and USER\_DWH input buses, respectively. The third and fourth words of the 4-word burst are presented to USER\_DWL and USER\_DWH, respectively, on the next rising edge of USER\_CLK0.

Read requests are made via an active-Low USER\_R\_n signal during the rising edge of USER\_CLK0. The 18-bit Read address (USER\_AD\_RD) must be presented on this same clock edge. After the execution of the Read command, the 4-word burst values are stored in the Read data FIFOs. An active-Low USER\_QEN\_n signal during the rising edge of USER\_CLK0 retrieves these values and presents them on the 36-bit USER\_QRL and USER\_QRH outputs, with the first and second words presented on the first cycle in which USER\_QEN\_n is held Low and the third and fourth words presented on the next cycle in which USER\_QEN\_n is held Low.

Unlike the QDR II memory itself, the user interface can accept Read and Write requests on the same clock cycle, as shown on the third cycle of Figure 5. The Read/Write state machine manages the interleaving of Read and Write requests to the external memory device, relieving the user interface of this responsibility.

The user interface also includes a number of signals not shown in Figure 5 that indicate the status of the Read/Write FIFOs. An active-High USER\_WR\_FULL output indicates that the Write FIFOs are full. No more Write requests are allowed under this condition until the Write request queue is reduced. Any Write requests made while USER\_WR\_FULL is High are simply ignored. A similar situation applies to the USER\_RD\_FULL signal for Read requests.

An active-High USER\_QR\_EMPTY output indicates that there are no more Read data values stored in the Read data FIFOs. Attempts to read values out on to the USER\_QRL and USER\_QRH buses under this condition are ignored. This condition persists until additional Read commands are executed and the associated data values are stored in the Read data FIFOs.
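
The request and status behavior described above can be summarized with a small behavioral model. This is a minimal Python sketch, not the reference HDL: the class name, method names, and the `depth` parameter are hypothetical, and each Write request is stored as a single queue entry rather than being split across separate address and data FIFOs or across the two data cycles of a 4-word burst.

```python
from collections import deque


class UserInterfaceModel:
    """Behavioral sketch of the user-side queues; not the reference HDL."""

    def __init__(self, depth=512):
        self.depth = depth      # assumed queue depth, for illustration only
        self.wr_fifo = deque()  # queued (address, bw_n, dwl, dwh) Write requests
        self.rd_fifo = deque()  # queued Read addresses
        self.qr_fifo = deque()  # (qrl, qrh) word pairs returned by the memory

    @property
    def user_wr_full(self):
        return len(self.wr_fifo) >= self.depth

    @property
    def user_rd_full(self):
        return len(self.rd_fifo) >= self.depth

    @property
    def user_qr_empty(self):
        return not self.qr_fifo

    def write_request(self, user_w_n, addr, bw_n, dwl, dwh):
        """Sample USER_W_n on a clock edge; return True if the request is queued."""
        if user_w_n != 0 or self.user_wr_full:
            return False        # strobe inactive, or request ignored because full
        self.wr_fifo.append((addr, bw_n, dwl, dwh))
        return True

    def read_request(self, user_r_n, addr):
        """Sample USER_R_n on a clock edge; return True if the request is queued."""
        if user_r_n != 0 or self.user_rd_full:
            return False
        self.rd_fifo.append(addr)
        return True

    def read_data(self, user_qen_n):
        """Sample USER_QEN_n; return (qrl, qrh), or None if nothing is available."""
        if user_qen_n != 0 or self.user_qr_empty:
            return None
        return self.qr_fifo.popleft()
```

In this sketch a request made while the corresponding full flag is set returns `False` and leaves the queue unchanged, mirroring the "simply ignored" behavior described for USER\_WR\_FULL and USER\_RD\_FULL.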

Figure 6 shows the timing protocol required to issue Read/Write requests to the User Interface when using the 2-word burst reference design. Write requests are made via an active-Low USER\_W\_n signal during the rising edge of USER\_CLK0. The 18-bit Write address (USER\_AD\_WR) must be presented on this same clock edge. The first and second 36-bit data words to be written to the memory are also presented at this time to the 36-bit USER\_DWL and USER\_DWH input buses, respectively. The 2-word burst user interface protocol is similar to the 4-word burst case described above in all other respects.



Figure 6: 2-Word Burst User Interface Timing Protocol

### **Read/Write State Machine**

The state diagram for the 4-word burst Read/Write state machine is shown in Figure 7. This state machine is responsible for coordinating the flow of data between the user interface and physical interface. It initiates the Read/Write commands to the external memory device based on the requests stored in the user interface FIFOs.

A USER\_RESET always returns the state machine to the INIT state, where memory operations are suspended until the delay calibration state machine has completed adjusting the delay on the IDELAY blocks for all of the QDR\_Q inputs to center align the Read path data to the FPGA system clock, USER\_CLK0. Completion of the calibration operation is signaled by an active-High DLY\_CAL\_DONE input that transitions the Read/Write state machine to the Idle state to await Read/Write requests from the user interface.

From the Idle state, Write commands take precedence on the presumption that a Write to memory must always occur before there is any valid Read data. When there are no Read or Write requests pending, the state machine loops in the Idle state.

A Write request pending in the user interface FIFOs causes transition to the Write state where a Write command is initiated via the internal WR\_INIT\_n strobe. This strobe pulls the Write address and data values from the FIFO and results in the initiation of the external QDR\_W\_n Write control strobe to the memory device.

Assuming there is a pending Read request, the state machine then transitions to the Read state where the internal RD\_INIT\_n strobe is activated. This strobe pulls the Read address from the FIFOs and launches an external QDR\_R\_n strobe to the memory device. Capture of the return values in the Read data FIFOs also occurs as a result of this process.

The Read/Write state machine continuously monitors the user interface FIFO status signals to determine if there are any pending Read/Write requests. A continuous flow of concurrent Read/Write requests causes the state machine to simply alternate between the Read and Write states, ensuring properly interleaved requests to the external memory. A stream of Write requests results in alternating Idle and Write states, while a stream of Read requests similarly alternates between Idle and Read states.
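
The transitions described in this section can be sketched as a next-state function. This Python model is an illustration derived only from the prose above, so it may omit details of the actual state diagram. The function and argument names are hypothetical, and the `wr_pending` and `rd_pending` flags stand in for the FIFO status signals.

```python
from enum import Enum


class State(Enum):
    INIT = 0
    IDLE = 1
    WRITE = 2
    READ = 3


def next_state(state, dly_cal_done, wr_pending, rd_pending, user_reset=False):
    """Return the next state of the 4-word burst Read/Write state machine."""
    if user_reset:
        return State.INIT                  # reset always returns to INIT
    if state is State.INIT:
        # Memory operations stay suspended until delay calibration completes.
        return State.IDLE if dly_cal_done else State.INIT
    if state is State.IDLE:
        if wr_pending:                     # Writes take precedence from Idle
            return State.WRITE
        return State.READ if rd_pending else State.IDLE
    if state is State.WRITE:
        return State.READ if rd_pending else State.IDLE
    # state is State.READ
    return State.WRITE if wr_pending else State.IDLE


def trace(cycles, wr_pending, rd_pending):
    """List the state names visited when the pending flags are held constant."""
    state, visited = State.INIT, []
    for _ in range(cycles):
        state = next_state(state, True, wr_pending, rd_pending)
        visited.append(state.name)
    return visited
```

Holding both flags active, `trace(5, True, True)` yields IDLE, WRITE, READ, WRITE, READ, which is the alternating pattern described above. Holding only `wr_pending` active yields IDLE, WRITE, IDLE, WRITE, IDLE.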





Figure 7: 4-Word Burst Read/Write State Machine

The state diagram for the 2-word burst Read/Write State Machine is shown in Figure 8. The operation of this state machine is quite similar to the 4-word burst state machine, with the exception that a single READ\_WRITE state manages the Read and Write requests to the memory. All 2-word burst QDR II memory devices allow Read and Write requests to occur on the same clock cycle, allowing these operations to be initiated from the same state.





Figure 8: 2-Word Burst Read/Write State Machine Diagram

## Physical Interface

The Physical Interface of the QDR II reference design generates the actual I/O signaling and timing relationships for communication of Read/Write commands to the external memory device, including the DDR data signals. It provides the necessary timing margins and I/O signaling standards required to meet the overall design performance specifications. All I/O signals for the QDR II design use HSTL Class I signaling (HSTL\_I\_18, as listed in Table 1). This section details each component of the Physical Interface.

## **Clocking Scheme**

The QDR II design makes extensive use of the Input DDR (IDDR) and Output DDR (ODDR) primitives found in all Virtex-4 device I/O blocks. These built-in DDR register functions greatly simplify the task of generating the proper clock, address, data, and control signaling for communication to the QDR II memory device. Both the IDDR and ODDR primitives have various modes of operation to determine how the captured or transmitted DDR data is presented to the FPGA fabric and I/O pins, respectively. More details on the IDDR and ODDR modes of operation are available in Chapter 8, "Advanced SelectIO Logic Resources" of UG070, Virtex-4 FPGA User Guide.

The clocking scheme (Figure 9) in the QDR II design uses the ODDR registers in opposite-edge mode to generate the QDR\_K and QDR\_K\_n clocks for the memory device. This clock forwarding methodology effectively removes the clock-to-out parameter of the FPGA from timing margin considerations, because the clock signals have nearly identical timing in comparison to the QDR II address, data, and control signals. All externally transmitted signals are therefore "matched" with respect to the clock-to-out parameter.





Figure 9: Clock Forwarding Scheme Based on ODDR Register Function

#### **Write Path**

The Write path to the QDR II memory includes the address, data, and control signals necessary to execute a Write operation. The Write address (QDR\_AD\_WR), control strobe (QDR\_W\_n), and byte write enable (QDR\_BW\_n) signals all use SDR formatting. However, the Write data values (QDR\_D) utilize DDR signaling to achieve the required 2-word or 4-word burst within the allotted clock periods.

All of these Write path signals must be presented center-aligned with respect to the QDR\_K and QDR\_K\_n clock edges. For this reason, the output registers for these signals are synchronized to the USER\_CLK270 clock. This signal operates at the same frequency but is 270° (75% of the clock period) out-of-phase with respect to USER\_CLK0. This ensures adequate setup and hold margins for the memory device with respect to the incoming QDR\_K and QDR\_K\_n clock edges.
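
As a worked example, consider a 250 MHz interface ($T_{CLOCK} = 4000$ ps), with the QDR\_K rising edge taken as time zero and ideal, matched clock-to-out delays assumed. The symbol $t_{270}$ is introduced here only for illustration:

$$t_{270} = \frac{270°}{360°} \times T_{CLOCK} = 0.75 \times 4000\ \text{ps} = 3000\ \text{ps}$$

An SDR signal launched at $t_{270}$ is therefore nominally stable for $4000 - 3000 = 1000$ ps before the next QDR\_K rising edge and for 3000 ps after it. The DDR Write data changes on both edges of USER\_CLK270, at 1000 ps and 3000 ps within each period, which is midway between the QDR\_K rising edge at 0 ps and the QDR\_K\_n rising edge at 2000 ps.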

Figure 10 demonstrates the use of USER\_CLK270 and the ODDR registers to generate the DDR signaling required for the QDR\_D Write datapath. The ODDR register is configured in same-edge mode allowing both 36-bit data words (FIFO\_DWL and FIFO\_DWH) to be captured from the FPGA fabric on the same rising edge of USER\_CLK270. The FIFO\_DWL value is transmitted immediately after this rising edge onto the QDR\_D Write data bus, while the FIFO\_DWH value is subsequently transmitted out of the ODDR block on the next falling edge of USER\_CLK270. This process repeats to generate a 4-word Write data burst.
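
The ordering of words produced by same-edge mode can be illustrated with a short behavioral sketch in Python. It models only the word order on the output, not timing, and the function name `oddr_same_edge` is hypothetical.

```python
def oddr_same_edge(pairs):
    """Serialize (low_word, high_word) pairs the way the text describes.

    Both words of a pair are captured on the same rising clock edge. The low
    word is driven after that rising edge and the high word after the
    following falling edge.
    """
    bus = []
    for low_word, high_word in pairs:
        bus.append(("rise", low_word))
        bus.append(("fall", high_word))
    return bus


# A 4-word burst is two consecutive (FIFO_DWL, FIFO_DWH) pairs.
burst = oddr_same_edge([("W0", "W1"), ("W2", "W3")])
```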

The Read/Write address, byte write enables, and Read/Write control strobes are generated in a similar manner using a single flip-flop within the I/O block to create SDR signals synchronized to USER\_CLK270.





Figure 10: Write Datapath Implementation

### **Read Path**

While Read data capture is fundamentally a more challenging operation than Write data transmission, the direct-clocking methodology of the QDR II reference design greatly simplifies this task.

As mentioned previously, each Virtex-4 FPGA input pin has a programmable delay element (IDELAY). The IDELAY element can be dynamically adjusted to control the amount of delay on the input path. Each IDELAY block has 64 tap delays of 75 ps each, allowing the input signal timing to be adjusted across a 5 ns window. Use of the IDELAY blocks also requires instantiating the IDELAYCTRL primitives in the I/O banks using the IDELAY elements. The IDELAYCTRL blocks use a 200 MHz reference clock (±1000 ppm tolerance) to precisely calibrate the IDELAY tap delay value to 75 ps, independent of process, voltage, and temperature variations. The DLY\_CLK\_200 input to the top-level design in Figure 3 serves as the 200 MHz clock input for the QDR II design.
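
For reference, the adjustment range implied by these numbers can be worked out directly, using the highest tap setting of 63 given in the description of the calibration algorithm and, as an example, the 4000 ps clock period of a 250 MHz interface:

$$63 \times 75\ \text{ps} = 4725\ \text{ps}, \qquad \frac{T_{CLOCK}/2}{75\ \text{ps}} = \frac{2000\ \text{ps}}{75\ \text{ps}} \approx 26.7\ \text{taps}$$

At 250 MHz the adjustment range is therefore slightly longer than one full clock period, and consecutive CQ edges are roughly 27 taps apart.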

Figure 11 shows the use of the IDELAY primitives to implement the direct-clocking methodology of Read data capture. As mentioned in the Introduction, this methodology relies on the use of the CQ echo clock from the QDR II memory device. This clock signal is used as a "training" signal to center align the QDR Read data with respect to the FPGA system clock USER\_CLK0.

The CQ clock enters the FPGA through a path identical to that of the QDR\_Q data bus signals: an HSTL input buffer, followed by the IDELAY block, followed by an IDDR register. All IDELAY blocks are configured in variable delay mode, which allows the tap delay setting to be dynamically adjusted. Additionally, all IDELAY blocks are controlled by the same set of signals (dly\_clk, dly\_ce, dly\_inc) from the delay calibration state machine.





Figure 11: Direct Clocking Implementation for Read Path Data Capture


The delay calibration state machine monitors the state of the CQ clock input captured by the IDDR register on the rising edge of USER\_CLK0. An edge detection algorithm finds the location of the rising and falling edges of the CQ clock by varying the IDELAY tap delay setting for this signal. After the edges are found, the tap delay setting is adjusted to center the CQ clock edges around the rising edge of USER\_CLK0. The CQ clock is edge aligned to the QDR\_Q data bus coming from the memory device. When the same tap delay setting is applied to this bus, the USER\_CLK0 signal is automatically centered in the data valid window of the incoming Read data words. In this manner, the Read data values can be captured directly into the FPGA system clock domain without using complex data recapture techniques or the advanced timing analysis typically required when crossing clock boundaries.

Figure 12 shows how the CQ clock and QDR\_Q signals are delayed through the IDELAY blocks with identical tap settings to center align these signals to USER\_CLK0. The qdr\_cq\_delay and qdr\_q\_delay signals represent the waveforms at the inputs to the IDDR registers after passing through the IDELAY elements.





Figure 12: Alignment of QDR\_Q Inputs to USER\_CLK0 Using Tap Delays

Figure 13 diagrams the algorithm used by the delay calibration state machine to determine the proper IDELAY tap settings for the QDR\_Q inputs based on edge detection of the CQ clock. Initially, the tap delay values for both the CQ clock and the QDR\_Q bus signals are set to zero. The state machine first waits for an active CQ clock by looking for 1024 consecutive clock edges. This guarantees a stable clock signal for the edge detection training sequence.

The state machine then waits for the main Read/Write state machine to initiate delay calibration. The value of the CQ calibration input is subsequently captured at the initial tap delay setting. This stored CQ value (1 or 0) establishes a baseline for edge detection comparison. At this point, the state machine begins to increment the tap delay settings for both the CQ clock and the QDR\_Q bus signals. At each new tap setting, the free running CQ clock value captured at the IDDR register is compared against the original stored CQ value to locate the position of one edge of the CQ signal. This process continues until the current CQ value changes to the opposite state of the stored CQ value, indicating detection of an edge. The final tap delay value determined through this process becomes one end point of the delay window.

The location of the next edge of the CQ clock is similarly determined. Initially, the tap delay setting increments by eight from its current point to reliably capture a new stored CQ value away from the CQ transition region. Then the state machine begins to increment the tap delay settings, checking for a change in the state of the captured CQ value with respect to the stored CQ value, indicating another edge detection. This continues until the next edge location is found or the tap delay value reaches the maximum value of 63. This final tap point determines the other end of the delay window.

The final operation of the delay calibration state machine is to decrement the tap delay settings for the CQ and QDR\_Q signals back to the center point of the delay window range, thus precisely centering the USER\_CLK0 signal in the data valid window of the Read data words entering the Virtex-4 device. Because the tap delay settings of the QDR\_Q signals are always being controlled in lockstep with the setting on the CQ input, after the CQ centering algorithm is complete, the QDR\_Q signals are also, by definition, center aligned to the edges of USER\_CLK0.



Figure 13: Edge Detection Algorithm For Centering USER\_CLK0 in a QDR\_Q Data Valid Window
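
The edge detection sequence can be explored with a small simulation. The following Python sketch is a simplified model rather than the reference implementation. It assumes an ideal, jitter-free CQ clock with a 50% duty cycle, and it bounds the first edge search at tap 63, which the text only states for the second search. The function names and the `phase_ps` parameter (the arrival time of a CQ rising edge relative to USER\_CLK0) are hypothetical.

```python
TAP_PS = 75     # IDELAY tap resolution given in the text
MAX_TAP = 63    # highest tap setting given in the text
EDGE_SKIP = 8   # taps skipped after the first edge, given in the text


def sample_cq(tap, period_ps, phase_ps):
    """Value of the delayed CQ clock at the rising edge of USER_CLK0 (t = 0)."""
    # Time elapsed since the most recent rising edge of the delayed CQ clock.
    since_rise = (-(tap * TAP_PS) - phase_ps) % period_ps
    return 1 if since_rise < period_ps / 2 else 0


def find_edge(start_tap, period_ps, phase_ps):
    """Increment the tap setting until the sampled CQ value changes."""
    stored = sample_cq(start_tap, period_ps, phase_ps)
    tap = start_tap
    while tap < MAX_TAP and sample_cq(tap, period_ps, phase_ps) == stored:
        tap += 1
    return tap


def calibrate(period_ps, phase_ps):
    """Return (first_edge_tap, second_edge_tap, center_tap)."""
    first = find_edge(0, period_ps, phase_ps)
    # Skip past the transition region before storing a new reference value.
    second = find_edge(min(first + EDGE_SKIP, MAX_TAP), period_ps, phase_ps)
    center = (first + second) // 2
    return first, second, center
```

With a 4000 ps period and an assumed phase offset of 500 ps, `calibrate(4000, 500)` finds edges at taps 21 and 47 and settles on tap 34. At that setting the delayed CQ edges fall about 1000 ps on either side of the USER\_CLK0 rising edge, to within one tap.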

#### **Read FIFO Strobe Generation**

The write enables for the Read data FIFOs are generated by delaying the internal Read strobe by the appropriate number of clock cycles using an SRL16 shift register function. This synchronizes the write enable to the Read data (Q) returning from the QDR II device. This method is possible because of the direct-clocking methodology in which everything inside the Virtex-4 device is synchronized to USER\_CLK0.

The number of clock cycle delays inserted into the Read strobe path is determined by the value RD\_FIFO\_DELAY found near the top of the qdrII\_mem\_ctrl2.v/.vhd file. By default, this value is set to 2 (binary 0010), which should be appropriate for most systems. However, this value should be adjusted up or down as necessary to align the write enable strobe to the Read data entering the FIFO.
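
The effect of this delay can be illustrated with a short Python sketch of a fixed-length shift register. It is a behavioral illustration only. The function name `delay_strobe` is hypothetical, the strobe is modeled as active-High for readability, and the exact latency of the SRL16 primitive for a given setting is not modeled.

```python
from collections import deque


def delay_strobe(strobe, rd_fifo_delay=2):
    """Delay a per-cycle strobe sequence by rd_fifo_delay clock cycles."""
    if rd_fifo_delay == 0:
        return list(strobe)
    # The deque acts as the shift register, with the oldest value at index 0.
    pipe = deque([0] * rd_fifo_delay, maxlen=rd_fifo_delay)
    delayed = []
    for bit in strobe:
        delayed.append(pipe[0])   # value that entered rd_fifo_delay cycles ago
        pipe.append(bit)          # shift in the new value, dropping the oldest
    return delayed


# A Read strobe in cycle 0 becomes a FIFO write enable in cycle 2.
write_enable = delay_strobe([1, 0, 0, 0, 0], rd_fifo_delay=2)
```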

# Reference Design

The reference design for the QDR II SRAM interface is integrated with the Memory Interface Generator (MIG) tool, which is part of the CORE Generator™ software. The CORE Generator software can be downloaded at:

http://www.xilinx.com/support/download/index.htm



### **Board Design Considerations**

While the Virtex-4 family offers many advanced I/O and clocking related features to greatly simplify memory interface design, attention must still be paid to basic board design criteria for a reliable and high-performance interface.

Specifically, the source synchronous nature of the Read and Write path interfaces requires matched board trace lengths for the interface clock, data, and control signals.

For example, the trace lengths of the QDR II device input signals (QDR\_K, QDR\_K\_n, QDR\_W\_n, QDR\_R\_n, QDR\_SA, QDR\_BW\_n, and QDR\_D) must be well matched to present the control, address, and data lines to the memory device with adequate setup and hold margins. The implementation of the Physical Interface ensures that these signals are center aligned to the QDR\_K and QDR\_K\_n clock edges when leaving the FPGA device outputs. The board traces must ensure that this relationship continues to the memory device inputs.

Similarly, the QDR II device output signals (QDR\_Q, QDR\_CQ) must have well matched trace lengths for the signals to all arrive edge-aligned at the inputs to the Virtex-4 device. This is critical to the implementation of the direct-clocking Read data capture methodology. Any reasonable board design tool can match these traces within an acceptable tolerance with little effort.

## **Timing Analysis**

The QDR II reference design leverages the unique I/O and clocking features of the Virtex-4 device to maximize performance and timing margins, while greatly reducing the need for detailed placement and pinout analysis.

This section presents an example timing analysis for the address/control paths, the Write datapath, and the Read (or capture) datapath.

#### **Address/Control Paths**

As discussed previously, the Read/Write address bus, byte write enable signals, and Read/Write control strobes are all synchronized to the USER\_CLK270 clock. This ensures that these SDR signals have adequate setup and hold margins to the memory device with respect to the incoming QDR\_K and QDR\_K\_n clock edges derived from USER\_CLK0.

Table 2 shows an example timing analysis for these signals based on an interface to a 250 MHz 4-word burst QDR II memory device implemented with a Virtex-4 device, -11 speed grade.

Table 2: Address and Control Signal Timing Analysis

| Parameter                    | Value<br>(ps) | Leading-Edge<br>Uncertainties | Trailing-Edge<br>Uncertainties | Description                                                            |
|------------------------------|---------------|-------------------------------|--------------------------------|------------------------------------------------------------------------|
| T <sub>CLOCK</sub>           | 4000          | -                             | -                              | Clock period at 250 MHz                                                |
| T <sub>CLOCK_SKEW_FPGA</sub> | ± 50          | 50                            | 50                             | Clock skew from Timing Reporter and Circuit Evaluator (TRACE) analysis |
| T <sub>PACKAGE_SKEW</sub>    | ± 30          | 30                            | 30                             | Maximum package skew within bank                                       |
| T <sub>SETUP</sub>           | 500           | 500                           | 0                              | Setup time from memory data sheet                                      |
| T <sub>HOLD</sub>            | 500           | 0                             | 500                            | Hold time from memory data sheet                                       |
| T <sub>PCB_LAYOUT_SKEW</sub> | ± 50          | 50                            | 50                             | Maximum skew between board traces based on estimated match tolerance   |
| T <sub>PHASE_OFFSET_ERROR_DCM</sub> | ± 140         | 140                           | 140                            | Maximum offset between different outputs of Digital Clock Manager (DCM)           |
| T <sub>JITTER</sub>                 | ± 50          | 50                            | 50                             | Jitter component associated with the difference between USER_CLK0 and USER_CLK270 |
| Total Uncertainties                 | -             | 820                           | 820                            |                                                                                   |
| Valid Window                        | 2360          | 820                           | 3180                           | Worst-case window = 2360 ps                                                       |

Figure 14 illustrates the address and control signal timing margins. Because these signals are referenced to USER\_CLK270, there is more trailing edge margin than leading edge margin with respect to the QDR\_K clock edge. This allows the use of fewer global clock buffers and still provides adequate margin on the leading edge.



Figure 14: Address and Control Signal Timing Margins
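The arithmetic behind Table 2 can be reproduced with a short script. The sketch below is illustrative only and is not part of the reference design. The function name `valid_window` and the dictionary layout are assumptions made for this example; the numeric values are taken from Table 2.

```python
# Illustrative recomputation of the Table 2 budget (all values in ps).
# The names and data layout are assumptions for this sketch, not part of
# the reference design.

T_CLOCK = 4000  # clock period at 250 MHz

# parameter: (leading-edge uncertainty, trailing-edge uncertainty)
ADDR_CTRL_UNCERTAINTIES = {
    "T_CLOCK_SKEW_FPGA": (50, 50),
    "T_PACKAGE_SKEW": (30, 30),
    "T_SETUP": (500, 0),
    "T_HOLD": (0, 500),
    "T_PCB_LAYOUT_SKEW": (50, 50),
    "T_PHASE_OFFSET_ERROR_DCM": (140, 140),
    "T_JITTER": (50, 50),
}


def valid_window(period_ps, uncertainties):
    """Return (leading_total, trailing_total, window), all in ps."""
    leading = sum(lead for lead, _ in uncertainties.values())
    trailing = sum(trail for _, trail in uncertainties.values())
    return leading, trailing, period_ps - leading - trailing


leading, trailing, window = valid_window(T_CLOCK, ADDR_CTRL_UNCERTAINTIES)
print(leading, trailing, window)  # 820 820 2360
```

Swapping in the setup, hold, and skew figures for a different memory device or board gives a quick first estimate of the remaining margin before running a full timing analysis.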

## **Write Datapath**

The Write datapath (QDR\_D) is also synchronized to USER\_CLK270. However, the Write data words are transmitted as DDR values, and therefore must have adequate setup and hold margins with respect to both the rising edge of QDR\_K and the rising edge of QDR\_K\_n. Accordingly, the timing analysis for the Write datapath shown in Table 3 incorporates the maximum duty cycle distortion of the memory clocks. This analysis is also for a 250 MHz 4-word burst QDR II memory device and a Virtex-4 device, -11 speed grade.

Table 3: Write Datapath Timing Analysis

| Parameter                | Value<br>(ps) | Leading-Edge<br>Uncertainties | Trailing-Edge<br>Uncertainties | Description                           |
|--------------------------|---------------|-------------------------------|--------------------------------|---------------------------------------|
| T <sub>CLOCK</sub>       | 4000          | -                             | -                              | Clock period at 250 MHz               |
| T <sub>CLOCK_PHASE</sub> | 2000          | -                             | -                              | Clock phase (50% of clock period)     |
| T <sub>DCD</sub>         | 150           | -                             | -                              | Duty cycle distortion of memory clock |
| T <sub>DATA_PERIOD</sub>            | 1850          | -                             | -                              | Total data period, TCLOCK_PHASE - TDCD                                        |
| T <sub>CLOCK_SKEW_FPGA</sub>        | ± 50          | 50                            | 50                             | Clock skew from TRACE analysis                                                |
| T <sub>PACKAGE_SKEW</sub>           | ± 30          | 30                            | 30                             | Maximum package skew within bank                                              |
| T <sub>SETUP</sub>                  | 350           | 350                           | 0                              | Setup time from memory data sheet                                             |
| T <sub>HOLD</sub>                   | 350           | 0                             | 350                            | Hold time from memory data sheet                                              |
| T <sub>PCB_LAYOUT_SKEW</sub>        | ± 50          | 50                            | 50                             | Maximum skew between board traces based on estimated match tolerance          |
| T <sub>PHASE_OFFSET_ERROR_DCM</sub> | ± 140         | 140                           | 140                            | Maximum offset between different outputs of DCM                               |
| T <sub>JITTER</sub>                 | ± 50          | 50                            | 50                             | Jitter component associated with difference between USER_CLK0 and USER_CLK270 |
| Total Uncertainties                 | -             | 670                           | 670                            | Worst-case leading and trailing uncertainties cannot occur simultaneously     |
| Valid Window                        | 510           | 670                           | 1180                           | Worst-case window = 510 ps                                                    |

Figure 15 illustrates the Write datapath timing margins. Only the analysis with respect to QDR\_K is shown. The analysis with respect to QDR\_K\_n is identical.



Figure 15: Write Datapath Timing Margins
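As a worked check of Table 3, the valid window follows from the data period and the summed edge uncertainties. The symbols $T_{LEAD}$ and $T_{TRAIL}$ are shorthand introduced here for the leading-edge and trailing-edge totals; they do not appear in the table. The terms in each sum are listed in table order (clock skew, package skew, setup, hold, PCB layout skew, DCM phase offset error, jitter).

$$
\begin{aligned}
T_{DATA\_PERIOD} &= T_{CLOCK\_PHASE} - T_{DCD} = 2000 - 150 = 1850\ \text{ps} \\
T_{LEAD} &= 50 + 30 + 350 + 0 + 50 + 140 + 50 = 670\ \text{ps} \\
T_{TRAIL} &= 50 + 30 + 0 + 350 + 50 + 140 + 50 = 670\ \text{ps} \\
\text{Valid window} &= T_{DATA\_PERIOD} - T_{LEAD} - T_{TRAIL} = 1850 - 670 - 670 = 510\ \text{ps}
\end{aligned}
$$

Because the data period is shortened by the duty cycle distortion before the uncertainties are subtracted, the Write datapath window is considerably narrower than the address and control window, even though its setup and hold requirements are smaller.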

## Read Datapath (Data Capture)

The Read datapath (QDR\_Q) values are captured directly into the USER\_CLK0 clock domain using the previously described direct-clocking techniques. Thus, the data capture timing analysis must be performed with respect to USER\_CLK0, and consideration must be given to the IDELAY tap delay resolution. In addition, although the CQ echo clock from the memory is used only as a "training" signal for the edge detection algorithm that center-aligns the QDR\_Q bus to USER\_CLK0, the potential skew between the CQ clock and the QDR\_Q bus must also be taken into account. Table 4 presents the read timing analysis for a QDR II device interfaced to a -11 speed grade Virtex-4 FPGA.

Table 4: Read Datapath Timing Analysis

| Parameter                    | Value<br>(ps) | Description                                                                                                                                                                                                                                      |
|------------------------------|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| T <sub>CLOCK</sub>           | 4000          | Clock period at 250 MHz.                                                                                                                                                                                                                         |
| T <sub>CLOCK_PHASE</sub>     | 2000          | Clock phase (50% of clock period).                                                                                                                                                                                                               |
| Memory Uncertainties         |               |                                                                                                                                                                                                                                                  |
| T <sub>MEM_DCD</sub>         | 150           | Duty cycle distortion of receive clock.                                                                                                                                                                                                          |
| T <sub>CQ_TO_Q_SKEW</sub>    | 600           | CQ-to-data skew from memory data sheet.                                                                                                                                                                                                          |
| FPGA Uncertainties           |               |                                                                                                                                                                                                                                                  |
| T <sub>SAMP</sub>            | 500           | This parameter includes the total sampling error of the Virtex-4 FPGA DDR input registers across process, voltage, and temperature (PVT) variations. This includes setup and hold of an IOB register, clock jitter, and 150 ps tap uncertainty. |
| T <sub>CLOCK_SKEW</sub>      | 100           | Clock skew from TRACE analysis.                                                                                                                                                                                                                  |
| T <sub>PACKAGE_SKEW</sub>    | 20            | Maximum package skew within bank.                                                                                                                                                                                                                |
| T <sub>PCB_LAYOUT_SKEW</sub> | 50            | Maximum skew between board traces based on estimated match tolerance.                                                                                                                                                                            |
| IDELAY Tap Jitter            | 480           | Jitter caused by delaying data through the IDELAY. An IDELAYPAT_JIT of 12 ps is used for a worst-case number of taps equivalent to three-fourths of a clock period.                                                                              |
| Data Window                  | 100           |                                                                                                                                                                                                                                                  |
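The Table 4 budget can be recomputed in a few lines. The sketch below is illustrative and not part of the reference design; the variable names are assumptions made for this example. The tap count and average tap delay are not stated in the table. They are inferred here from the 480 ps total, the 12 ps per-tap jitter, and the three-fourths-period assumption given in the description.

```python
# Illustrative recomputation of the Table 4 read budget (all values in ps).
# Variable names are assumptions for this sketch.

T_CLOCK = 4000                # clock period at 250 MHz
T_CLOCK_PHASE = T_CLOCK // 2  # 50% of the clock period

IDELAYPAT_JIT = 12            # per-tap pattern jitter quoted in Table 4
IDELAY_TAP_JITTER = 480       # total IDELAY tap jitter quoted in Table 4

# Inferred, not quoted: taps implied by the two figures above, and the
# average tap delay if those taps span three-fourths of a clock period.
implied_taps = IDELAY_TAP_JITTER // IDELAYPAT_JIT
implied_tap_delay = (3 * T_CLOCK / 4) / implied_taps

READ_UNCERTAINTIES = {
    "T_MEM_DCD": 150,
    "T_CQ_TO_Q_SKEW": 600,
    "T_SAMP": 500,
    "T_CLOCK_SKEW": 100,
    "T_PACKAGE_SKEW": 20,
    "T_PCB_LAYOUT_SKEW": 50,
    "IDELAY_TAP_JITTER": IDELAY_TAP_JITTER,
}

total_uncertainty = sum(READ_UNCERTAINTIES.values())
data_window = T_CLOCK_PHASE - total_uncertainty

print(implied_taps, implied_tap_delay)  # 40 75.0
print(total_uncertainty, data_window)   # 1900 100
```

The largest single contributors are the CQ-to-data skew of the memory and the FPGA sampling error, so those are the first parameters to re-check when targeting a different memory device or speed grade.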

### Conclusion

This application note describes the implementation and timing details of a 2-word or 4-word burst QDR II SRAM interface for Virtex-4 devices. The direct-clocking methodology utilized greatly simplifies the task of read data capture within the FPGA while providing a high-performance, robust, and scalable memory interface solution for current and next-generation QDR II SRAM memory devices.

# Additional Resources

### QDR II SRAM Memory Device Vendors:

- Cypress Semiconductor: http://www.cypress.com/
- Renesas Technology: http://www.renesas.com/
- IDT, Inc.: http://www.idt.com/
- Samsung Semiconductor: http://www.samsung.com/us/
- NEC Corporation: http://www.necel.com/memory/en/index.html



# Revision History

The following table shows the revision history for this document.

| Date     | Version | Revision                                                                                           |
|----------|---------|----------------------------------------------------------------------------------------------------|
| 09/10/04 | 1.0     | Initial Xilinx release.                                                                            |
| 05/12/05 | 2.0     | 1. Delay Calibration State Machine modified to run off of one-quarter rate clock (CLK_DIV4).       |
|          |         | 2. Delay Calibration State Machine algorithm modified to start IDELAY tap count from zero.         |
|          |         | 3. Two methods presented for generating the Read FIFO write enable strobe.                         |
|          |         | 4. ISE design example updated to show implementations of both Read FIFO strobe generation methods. |
|          |         | 5. Timing Analysis section revised with updated timing numbers.                                    |
|          |         | 6. New reference design files released to reflect all listed changes.                              |
| 08/10/05 | 2.1     | 1. 2-word burst memory device documentation added.                                                 |
|          |         | 2. Added Figure 2, Figure 6, and Figure 8.                                                         |
|          |         | 3. 2-word burst reference design files added.                                                      |
| 04/11/06 | 2.2     | Updated link to reference design.                                                                  |
| 09/06/06 | 2.3     | Updated Table 1 and Table 4. Removed Figure 18. Also made typographical edits.                     |
| 07/09/08 | 2.4     | Removed link to QDR Consortium.                                                                    |
|          |         | Removed signals from Figure 3.                                                                     |
|          |         | Removed description of RD_STB_n_out from Design Overview.                                          |
|          |         | Removed signals from Figure 4.                                                                     |
|          |         | Removed text from Read Path.                                                                       |
|          |         | Removed text from Read FIFO Strobe Generation.                                                     |
|          |         | Removed parameter from Read FIFO Strobe Generation.                                                |
|          |         | Removed figure from Read FIFO Strobe Generation.                                                   |
|          |         | Removed text from Reference Design.                                                                |
|          |         | Removed figure from Reference Design.                                                              |
|          |         | Updated link to Reference Design.                                                                  |

# Notice of Disclaimer

Xilinx is disclosing this Application Note to you "AS-IS" with no warranty of any kind. This Application Note is one possible implementation of this feature, application, or standard, and is subject to change without further notice from Xilinx. You are responsible for obtaining any rights you may require in connection with your use or implementation of this Application Note. XILINX MAKES NO REPRESENTATIONS OR WARRANTIES, WHETHER EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, IMPLIED WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL XILINX BE LIABLE FOR ANY LOSS OF DATA, LOST PROFITS, OR FOR ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR INDIRECT DAMAGES ARISING FROM YOUR USE OF THIS APPLICATION NOTE.