Virtex4 High Speed DDR Transceivers Xapp705
Summary
This application note describes dual data rate (DDR) transmitter (Tx) and receiver (Rx)
interfaces in a Virtex-4 FPGA using 17 low-voltage differential signaling (LVDS) pairs (one
clock and 16 data channels). The design is implemented using the ChipSync features. The
accompanying reference design files include an example targeting a Virtex-4 XC4VLX25FF668
device. A UCF file is provided for implementing this design on the Xilinx ML450
development board. See the design characteristics and recommendations summary for further
information on design requirements.
Introduction
A DDR interface is defined as having two data bits for every positive-edge transition of the
clock, that is, two bits per clock period (shown in Figure 1). Thus, if the data rate is
500 Mb/s, the clock frequency is 250 MHz.
[Figure 1: DATA (word_0, word_1, word_2, word_3) shown against CLK]
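The relationship between data rate and clock frequency can be checked with a short calculation. The sketch below is illustrative only; the function name `ddr_clock_mhz` is hypothetical and is not part of the reference design.

```python
def ddr_clock_mhz(data_rate_mbps: float) -> float:
    """Clock frequency in MHz for a DDR link carrying data_rate_mbps per pair."""
    bits_per_clock_period = 2  # DDR: two data bits per clock period
    return data_rate_mbps / bits_per_clock_period


# Example from the text: 500 Mb/s per data channel.
print(ddr_clock_mhz(500))  # 250.0
```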
[Figure 2: Virtex-4 FPGA connected to a device with a DDR interface. Each direction carries
CLK and DATA<15:0>; a reference clock enters the FPGA on REFCLK_P and REFCLK_N.]
© 2005 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc.
All other trademarks are the property of their respective owners.
www.xilinx.com
Virtex-4 Implementation
Figure 3 shows a simplified Virtex-4 DDR transceiver block diagram as found in the reference
design, DDR_LVDS_TX_RX. This module contains IDELAYCTRL, TX_CLOCKS,
TX_CLK_AND_DAT, RX_CLK_AND_DAT, and RST_MACHINE. Details on each module are
described in the following sections.
[Figure 3: Simplified DDR transceiver block diagram. Blocks: TX_CLOCK (input CLKI; outputs
TXCLK, TXCLKDIV = TXCLK/4, GCLKDIV), TX_CLK_AND_DAT (data to OSERDES; outputs CLKOUTP,
CLKOUTN, DATAOUTP, DATAOUTN), RX_CLK_AND_DAT (inputs CLKINP, CLKINN, DATAINP, DATAINN;
data from ISERDES), the design data logic path, IDELAYCTRL, and RST machines for the RXCLK
domain, the TXCLK domain, and IDELAYCTRL.]
TX_CLOCKS Module
The TX_CLOCKS module generates all the clock frequencies needed to perform the transmit
operations using OSERDES. This module generates two clocks: TXCLK and TXCLKDIV.
The reference design uses the DDR clock input (CLKI) to generate TXCLK and TXCLKDIV. The
CLKI input must already be in the global clock network. In this example, the frequency of
TXCLK is four times faster than TXCLKDIV. Connect these two clocks to the CLK and CLKDIV
inputs of the desired OSERDES.
There are two methods to generate TXCLK and TXCLKDIV. Depending on designer
preference, the clocks can be generated using either the DCM or the PMCD. Xilinx recommends
using the PMCD for any divide-by-2, divide-by-4, or divide-by-8 ratio. Using the PMCD saves
DCM resources. Other integer divisions can only be generated using the DCM. The number of
global clock networks required for the transmitter does not differ between the DCM and PMCD
solutions. Figure 4 illustrates the generated clocks.
[Figure 4: Waveforms of CLKI, TXCLK, and TXCLKDIV]
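The 4:1 ratio between TXCLK and TXCLKDIV follows from the 8:1 DDR serialization used in this design: two bits leave the serializer per TXCLK period, so one 8-bit word takes four TXCLK periods. The sketch below shows the arithmetic; the function name and the 500 MHz example value are assumptions chosen for illustration.

```python
def clkdiv_ratio(serialization_width: int, ddr: bool = True) -> int:
    """Number of high-speed clock periods per parallel word."""
    bits_per_clk = 2 if ddr else 1
    if serialization_width % bits_per_clk:
        raise ValueError("width must be a multiple of the bits sent per clock")
    return serialization_width // bits_per_clk


ratio = clkdiv_ratio(8)            # 8:1 DDR serialization
txclk_mhz = 500.0                  # assumed example frequency
txclkdiv_mhz = txclk_mhz / ratio   # parallel-side clock
print(ratio, txclkdiv_mhz)         # 4 125.0
```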
Module Pin Name   I/O Type   Definition
CLKI              Input
RST               Input
TXCLK(1)          Output
TXCLKDIV(1)       Output
TXDCMLOCKED       Output
Notes:
1.
Figure 5 shows a block diagram of the TX_CLOCKS module using the PMCD.
[Figure 5: TX_CLOCKS module using the PMCD. CLKI drives the PMCD CLKA input; CLKA1
(CLKPMCD) provides TXCLK and CLKA1D4 (CLKDIVPMCD) provides TXCLKDIV.]
TX_CLK_AND_DAT Module
The transmitter (TX_CLK_AND_DAT) uses two different types of output modules, OSERDES
for the data channels and ODDR for the clock output. The data channels have instantiation
names with the prefix TX_DAT_OUT_ followed by a two-digit number to denote the bit number.
In this example, each data channel consists of a MASTER/SLAVE pair of OSERDES to
accommodate 8:1 serialization. If the OSERDES is a SLAVE OSERDES, the naming
convention has an "S" after the two digit number. The clock channel has an instantiation name
with the prefix TX_CLK_OUT_ followed by a two-digit number. More of these blocks can be
instantiated. Table 2 contains the module pin description.
Table 2: TX_CLK_AND_DAT Module Pin Definitions
Module Pin Name    I/O Type   Definition
ORST               Input
OCE                Input
TXCLK(1)           Input      DDR Clock
TXCLKDIV(1)        Input
DATA_IN<127:0>     Input
CLKOUTP            Output
CLKOUTN            Output
DATAOUTP<15:0>     Output
DATAOUTN<15:0>     Output
Notes:
1. Both TXCLK and TXCLKDIV must be phase aligned for proper transmitter operation. Xilinx
recommends using the TX_CLOCKS module to generate these two clocks.
There are sixteen pairs of OSERDES blocks in this module to accommodate 128 bits of parallel
data input. Each pair is a MASTER/SLAVE pair, and each pair is set for 8:1 serialization.
Table 3 summarizes the settings applied to all MASTER OSERDES data channels. Table 4
summarizes the settings applied to all SLAVE OSERDES data channels.
Table 3: MASTER OSERDES Data Channel Settings
Parameter Name    Parameter Value
DATA_RATE_OQ      DDR
DATA_WIDTH        8
SERDES_MODE       MASTER

Table 4: SLAVE OSERDES Data Channel Settings
Parameter Name    Parameter Value
DATA_RATE_OQ      DDR
DATA_WIDTH        8
SERDES_MODE       SLAVE
Figure 6 illustrates the OSERDES connections necessary to build an 8:1 serialization
MASTER/SLAVE pair of OSERDES data channels.
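[Figure 6: MASTER/SLAVE pair of OSERDES for 8:1 serialization. Parallel data from the FPGA
fabric: Data1 to Data6 connect to D1 to D6 of the MASTER OSERDES, and Data7 and Data8
connect to D3 and D4 of the SLAVE OSERDES. SHIFTOUT1 and SHIFTOUT2 of the SLAVE connect
to SHIFTIN1 and SHIFTIN2 of the MASTER. The MASTER OQ output carries the serial data to
the external FPGA pin.]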
The ODDR is commonly used to forward the clock from Virtex-4 FPGAs to external devices. Figure 7 shows
the timing waveform of the transmitted data with respect to TXCLK and TXCLKDIV.
[Figure 7: Timing waveform of TXCLK, TXCLKDIV, and the OQ output carrying D1 through D8]
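The transmit timing can be modeled in a few lines. The sketch below is a behavioral illustration, not the reference design: it assumes D1 leaves OQ first (matching the D1 to D8 order in Figure 7) and assumes the first bit of each pair is sent on the rising edge of TXCLK. The function name `serialize_ddr` is hypothetical.

```python
from typing import List, Tuple


def serialize_ddr(word: List[int]) -> List[Tuple[int, str, int]]:
    """Return (txclk_cycle, edge, bit) for each bit of one 8-bit parallel word.

    word[0] is D1 and word[7] is D8. D1 is assumed to be transmitted first.
    """
    if len(word) != 8:
        raise ValueError("expected 8 bits for 8:1 serialization")
    schedule = []
    for i, bit in enumerate(word):
        edge = "rising" if i % 2 == 0 else "falling"  # assumed edge assignment
        schedule.append((i // 2, edge, bit))          # two bits per TXCLK cycle
    return schedule


schedule = serialize_ddr([1, 0, 1, 1, 0, 0, 1, 0])
for cycle, edge, bit in schedule:
    print(f"TXCLK cycle {cycle} ({edge}): {bit}")
```

One word occupies TXCLK cycles 0 through 3, which is one TXCLKDIV period.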
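[Figure 8: TX_CLK_AND_DAT block diagram. TXCLK drives TX_CLK_OUT, which produces
PRECLKOUT and the CLKOUTP/CLKOUTN pair. TXCLK, TXCLKDIV, and DATA_IN[127:0] drive
TX_DAT_OUT_0 through TX_DAT_OUT_15, which produce PREDATOUT(0) through PREDATOUT(15)
and the DATOUTP(0)/DATOUTN(0) through DATOUTP(15)/DATOUTN(15) pairs.]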
RX_CLK_AND_DAT Module
The receiver (RX_CLK_AND_DAT) module has both clock recovery and data recovery blocks.
The clock recovery blocks include:
FIFO16 - A FIFO to move data from the Regional Clock network into the Global Clock
network
[Figure 9: RX_CLK_AND_DAT block diagram. DATAINP/DATAINN(0) through (15) feed sixteen
ISERDES MASTER/SLAVE pairs clocked by RXCLK (BUFIO) and RXCLKDIV (BUFR). The ISERDES data
is written into a FIFO and read out on GCLKDIV as the 128-bit DATA_OUT.]
The functionality of the sub-blocks is discussed in the following sections. Table 5 contains the
module pin descriptions.
Table 5: RX_CLK_AND_DAT Module Pin Definitions
Module Pin Name          I/O Type   Definition
CLKINP                   Input
CLKINN                   Input
DATAINP<15:0>            Input
DATAINN<15:0>            Input
IRDY                     Input
USE_BITSLIP              Input
RST                      Input
IRST                     Input
SCE                      Input
TRAINING_PATTERN<7:0>    Input
LOCKED                   Input
GCLKDIV                  Input
RXCLKDIV                 Output
DATA_OUT<127:0>          Output
DATA_ALIGNED(1)          Output
BUS_ALIGNED(1)           Output
SEND_CLOCK(1)            Output
Notes:
1.
All forwarded clock and data input pins are connected to LVDSEXT_25 input buffers.
Figure 10 illustrates the recovered clock network.
[Figure 10: Recovered clock network. CLKINP/CLKINN drive a BUFIO, which provides RXCLK, and
a BUFR set to divide by 4. These clocks reach the ISERDES blocks (fed by
DATAINP(x)/DATAINN(x)) and the logic in the clock regions on either side of the clock
region borders.]
Table 6: MASTER ISERDES Data Channel Settings
Parameter Name    Parameter Value
BITSLIP_ENABLE    TRUE
DATA_RATE         DDR
DATA_WIDTH        8
INTERFACE_TYPE    NETWORKING
IOBDELAY          IFD
IOBDELAY_TYPE     VARIABLE
IOBDELAY_VALUE
NUM_CE
SERDES_MODE       MASTER

Table 7: SLAVE ISERDES Data Channel Settings
Parameter Name    Parameter Value
BITSLIP_ENABLE    TRUE
DATA_RATE         DDR
DATA_WIDTH        8
INTERFACE_TYPE    NETWORKING
IOBDELAY          IFD
IOBDELAY_TYPE     VARIABLE
IOBDELAY_VALUE
NUM_CE
SERDES_MODE       SLAVE
Figure 11 illustrates the ISERDES connections necessary to build an 8:1 deserialization
MASTER/SLAVE pair of ISERDES data channels.
[Serial data from outside the FPGA enters the MASTER ISERDES. Q1 to Q6 of the MASTER ISERDES
carry Data1 to Data6 into the FPGA fabric, and Q3 and Q4 of the SLAVE ISERDES carry Data7
and Data8. SHIFTOUT1 and SHIFTOUT2 of the MASTER connect to SHIFTIN1 and SHIFTIN2 of the
SLAVE.]
Figure 11: MASTER/SLAVE Pair of ISERDES Data Channels for 8:1 Deserialization
When using ISERDES, the order of the data received into the fabric at every RXCLKDIV cycle is
Q1 to Q6 (last in to first in). For deserialization wider than 6:1, the order of the data
received (last in to first in) is Q1 to Q6 of the MASTER ISERDES followed by Q3 to Q6 of the
SLAVE ISERDES. In this example, because 8:1 deserialization is used, the order of the data
(last in to first in) is Q1 to Q6 of the MASTER ISERDES followed by Q3 and Q4 of the SLAVE
ISERDES.
Figure 12 illustrates the order of data from ISERDES into the FPGA fabric.
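[Figure 12: Order of data from ISERDES into the FPGA fabric. The serial stream DAT carries
D0, D1, ..., D8 against RXCLK and RXCLKDIV. After one RXCLKDIV cycle the outputs hold
Q1 = D7, Q2 = D6, Q3 = D5, Q4 = D4, Q5 = D3, Q6 = D2, Q7 = D1, Q8 = D0.]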
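The bit ordering described above can be captured in a small behavioral model. This is a sketch for checking fabric-side bit assignments, not part of the reference design; the function name is hypothetical, and Q7 and Q8 stand for Q3 and Q4 of the SLAVE ISERDES, as labeled in Figure 12.

```python
from typing import Dict, List


def iserdes_outputs(serial_bits: List[str]) -> Dict[str, str]:
    """Map eight serial bits (index 0 received first) onto Q1..Q8.

    The last bit received appears on Q1 and the first bit received on Q8.
    """
    if len(serial_bits) != 8:
        raise ValueError("expected 8 bits for 8:1 deserialization")
    return {f"Q{i + 1}": serial_bits[7 - i] for i in range(8)}


q = iserdes_outputs(["D0", "D1", "D2", "D3", "D4", "D5", "D6", "D7"])
print(q["Q1"], q["Q8"])  # D7 D0
```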
ISERDES_ALIGNMENT_MACHINE Module
ISERDES_ALIGNMENT_MACHINE optimally centers the recovered clock to the data valid
window of the incoming data using the IDELAY feature of ISERDES. In addition, when needed,
this module uses the BITSLIP feature to reorder data into the desired training pattern.
Table 8 summarizes all the pins available in this module.
Table 8: ISERDES_ALIGNMENT_MACHINE Module Pin Definitions
Module Pin Name          I/O Type   Definition
RXCLKDIV                 Input
RST                      Input
SAMPLED_CLOCK<7:0>       Input
IRDY                     Input
USE_BITSLIP              Input
TRAINING_PATTERN<7:0>    Input
SAP                      Input
RXDATA<7:0>              Input
INC                      Output
ICE                      Output
BITSLIP                  Output
DATA_ALIGNED             Output
SEND_CLOCK               Output
Bus Alignment is a method of data recovery outlined in this application note. When using this
method for data recovery, all data is aligned to the center of the clock. Prior to using this
method, the skew between all incoming data and clock channels must be minimized.
Additionally, the data transition edge is closely aligned to the clock edges of the incoming clock.
This method is useful in applications where the transmitter does not provide a training pattern.
Using the bus alignment method, the receive clock is sampled by a 1:8 DDR SERDES
(MASTER/SLAVE ISERDES). All eight of the MASTER/SLAVE ISERDES outputs are used to
monitor the edge transitions when IDELAY taps are applied to the registered clock input. The
edge transition detection and the number of taps applied determine the data valid window width
and the tap location to center align the data with respect to the clock.
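The edge search can be illustrated with a simplified software model. The sketch below is not the state machine from the reference design: it reduces the eight sampled outputs to a single sampled value per tap, uses a synthetic 64-entry sweep (64 is the IDELAY tap count on Virtex-4, which this document does not state), and the function names are hypothetical.

```python
from typing import List, Tuple


def find_edges(samples: List[int]) -> List[int]:
    """Tap indices at which the sampled value differs from the previous tap."""
    return [t for t in range(1, len(samples)) if samples[t] != samples[t - 1]]


def center_tap(samples: List[int]) -> Tuple[int, int]:
    """Return (window_width_in_taps, center_tap) from the first two edges."""
    edges = find_edges(samples)
    if len(edges) < 2:
        raise ValueError("need two edges inside the tap range")
    first, second = edges[0], edges[1]
    return second - first, (first + second) // 2


# Synthetic sweep: low for taps 0-4, high for taps 5-30, low for taps 31-63.
sweep = [0] * 5 + [1] * 26 + [0] * 33
width, center = center_tap(sweep)
print(find_edges(sweep), width, center)  # [5, 31] 26 18
```

The distance between the two detected edges gives the window width in taps, and the midpoint is the tap setting that places the sampling point in the center of the window.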
Because this method requires sampling a receive clock, a slight change is made to the
recovered clock network connection. Instead of directly connecting the clock input into a
BUFIO, an ISERDES is inserted in between this connection.
The designer must connect the clock into the ISERDES D input. The ISERDES outputs used
are the unregistered output (O) and the registered outputs (Q). The O output is connected to
the BUFIO input. IDELAY is only applied to the Q outputs. Table 9 summarizes the ISERDES
settings.
Table 9: ISERDES Settings
Parameter Name    Parameter Value
BITSLIP_ENABLE    FALSE
DATA_RATE         DDR
DATA_WIDTH        8
INTERFACE_TYPE    NETWORKING
IOBDELAY          IFD
IOBDELAY_TYPE     VARIABLE
IOBDELAY_VALUE
NUM_CE
SERDES_MODE
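[Figure 13: Recovered clock network for bus alignment. The ISERDES O output drives the BUFIO
(RXCLK) and BUFR, which supply the ISERDES CLK and CLKDIV inputs; the ISERDES Q outputs
provide SAMPLED_CLOCK.]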
Figure 14 illustrates the relationship between the receive clock (RXCLK) and the
sampled/delayed clock to show the algorithm.
[Figure 14: RXCLK with Edge 1, Edge 2, and Edge 3 marked, compared against Sampled Clock (1)
through Sampled Clock (4), each showing Edge 2 and Edge 3 at a different delay.]
[Table: State values 00, 01, 10, 11]
When the clock-to-data alignment process is complete, this module moves to the data
reordering portion of the alignment. After the USE_BITSLIP pin is asserted High and
TRAINING_PATTERN is set to the desired 8-bit training pattern, the reordering portion applies
BITSLIP operations until the desired 8-bit training pattern is found. This step also requires
the transmitting device to send the desired pattern.
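The search loop in the reordering step can be sketched as follows. This is a simplified illustration: it models each BITSLIP operation as a one-position rotation of the 8-bit word, which is an assumption and not necessarily the reordering sequence the ISERDES applies in DDR mode. The function names and example values are hypothetical.

```python
def rotate_left(word: int, width: int = 8) -> int:
    """Rotate a width-bit word left by one position."""
    mask = (1 << width) - 1
    return ((word << 1) | (word >> (width - 1))) & mask


def bitslips_needed(received: int, training_pattern: int, width: int = 8) -> int:
    """Count slip operations until the received word matches the pattern."""
    word = received
    for slips in range(width):
        if word == training_pattern:
            return slips
        word = rotate_left(word, width)  # simplified stand-in for one BITSLIP
    raise ValueError("training pattern not found in any bit position")


slips = bitslips_needed(received=0xC2, training_pattern=0x2C)
print(slips)  # 4
```

If no rotation matches, the transmitter is not sending the expected training pattern, which is why the search raises an error instead of looping indefinitely.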
To reduce slice utilization, the logic in the state machine can be reduced by removing the
0101 and 0110 states and the associated control pins generated by these states. In that case,
the BITSLIP pin connections can also be removed from the ISERDES and BITSLIP_ENABLE set to
FALSE.
When both IDELAY and BITSLIP operations are completed, the DATA_ALIGNED bit is
asserted High.
FIFO16 Modules
In this application note, a FIFO is needed to transfer the data recovered from the Regional
Clock domain to the Global Clock domain. By transferring to the Global Clock domain, any logic
required for data processing with the recovered data is not limited to three clock regions. The
logic can be implemented across the FPGA.
Four FIFO16 primitives are instantiated to create four 512 x 36 bit FIFOs. Because the data
deserialized by ISERDES is 128 bits wide, the reference design uses four FIFO16 primitives.
Additional control logic is implemented for the FIFO16 to operate, with the following conditions:
1. Begin writing into FIFO from Regional Clock domain after all ISERDES have finished
alignment process
2. Begin reading data into the Global Clock domain when at least 50 entries are in the FIFO
3. Stop writing data into the FIFO from Regional Clock domain when less than 50 spaces are
available in the FIFO
These conditions can be changed to suit the application. Xilinx recommends a write clock
frequency that is slower than or equal to the read clock frequency. When this condition is
met, a FIFO overflow does not occur.
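The three control conditions listed above can be expressed as a small behavioral model. The sketch below is an illustration under stated assumptions, not the control logic from the reference design: the class and field names are hypothetical, and keeping the read side running once it has started is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FifoControl:
    """Write/read gating for one FIFO, following the three conditions above."""
    depth: int = 512             # entries per FIFO (512 x 36)
    read_start_level: int = 50   # begin reading at this many entries
    write_stop_margin: int = 50  # stop writing below this many free spaces
    entries: int = 0
    reading: bool = False

    def write_enable(self, all_aligned: bool) -> bool:
        # Conditions 1 and 3: write only after alignment has finished and
        # while at least write_stop_margin spaces remain.
        spaces = self.depth - self.entries
        return all_aligned and spaces >= self.write_stop_margin

    def read_enable(self) -> bool:
        # Condition 2: start reading once the fill level is reached.
        # Continuing to read afterwards is an assumption of this sketch.
        if self.entries >= self.read_start_level:
            self.reading = True
        return self.reading and self.entries > 0


ctl = FifoControl(entries=49)
print(ctl.write_enable(all_aligned=False))  # False: alignment not finished
print(ctl.read_enable())                    # False: fewer than 50 entries
ctl.entries = 50
print(ctl.read_enable())                    # True
```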
IDELAYCTRL Module
Because this design uses IDELAY, IDELAYCTRL is needed in order to guarantee proper
operation of IDELAY in the Virtex-4 FPGA. IDELAYCTRL requires the following two conditions
for proper operation.
RST_MACHINE Module
This module is used to create a synchronous reset for all elements in a given clock domain.
This module is also used to create an active High reset pulse of a desired duration. As
an example, IDELAYCTRL requires an active High reset duration of 50 ns.
To initiate the reset pulse, an input clock and a stimulus are used. The reset pulse generated by
the RST_MACHINE module should be connected to all elements in the design that are
clocked by the input clock.
The number of clock cycles for the active High reset is the comparator value of COUNT_VALUE
in the state machine portion of this module. To shorten or lengthen the duration, this
comparator value needs to be changed.
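A starting point for the comparator value can be estimated from the required reset duration and the clock frequency. The sketch below is illustrative: the function name and the 200 MHz example clock are assumptions, and whether COUNT_VALUE equals the cycle count exactly or differs by one depends on how the state machine compares the counter, which should be checked against the reference design.

```python
import math


def reset_count_value(reset_ns: float, clk_mhz: float) -> int:
    """Smallest whole number of clock cycles lasting at least reset_ns."""
    period_ns = 1000.0 / clk_mhz
    return math.ceil(reset_ns / period_ns)


# 50 ns reset duration with an assumed 200 MHz clock (5 ns period).
count = reset_count_value(50, 200)
print(count)  # 10
```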
Table 11 summarizes all the pins available in this module.
Table 11: RST_MACHINE Module Pin Definitions
Module Pin Name   I/O Type   Definition
CLK_generic       Input
RST_stimulus      Input
IRDY              Input
DOMAIN_RST        Output
[Figure 15: DDR_LVDS_AND_LOGIC_TOP block diagram, containing DDR_LVDS_TX_RX, DATA_SOURCE,
CHECKER, CHECKER FIFO, and a DSP48 slice. The checker signals are connected when using the
testbench.]
ISE Implementation
This design is compiled using ISE 6.3i. Files needed for this implementation are:
DDR_LVDS_TX_RX.v
DDR_LVDS_AND_LOGIC_TOP.ucf
Some warnings may occur. The readme.txt file provides further information on these warnings.
Design Summary
Table 12 summarizes the Virtex-4 device utilization on the ML450 development board.
Table 12: DDR LVDS Device Utilization on the ML450 Development Board
Component Name
Device Utilization
169
41
17
17
Number of DSP48
Number of FIFO16
Number of ISERDES
34
Number of OLOGIC
Number of OSERDES
32
Number of Slices
579
Number of BUFG
Number of BUFIO
Number of BUFR
Number of DCM
Number of IDELAYCTRL
Tx pins are grouped as close as possible to minimize skew (both on the board and on the
device).
Rx pins are grouped as close as possible to minimize skew (both on the board and on the
device) and the number of clock regions used.
The reference design requires the device to have a PMCD. For devices without a PMCD
(XC4VLX15, XC4VSX25, XC4VFX12, and XC4VFX20), change the PMCD portion in the
code to use a DCM instead (see the Appendix).
Conclusion
Table 13 summarizes the device utilization of this design (excluding the ML450 development
board design utilization).
Table 13: DDR LVDS Device Utilization in a Virtex-4 Device
Component Name
Device Utilization
IOB
FIFO16
4 for Receiver
ISERDES
OSERDES
OLOGIC
BUFIO
BUFR
IDELAYCTRL
BUFGs
Slices
65
Virtex-4 devices can implement dual data rate, 16-bit LVDS data transmission and reception at
500 MHz. This design can easily be expanded for data buses wider than 16 bits.
Complete Verilog design files for this application note are available on the Xilinx website at:
http://www.xilinx.com/bvdocs/appnotes/xapp705.zip.
Appendix
When using Virtex-4 devices that do not have a PMCD, this appendix outlines a method to use
the DCM to generate TXCLK and TXCLKDIV. TXCLKDIV is generated at the CLKDV
output of the DCM. Because the input clock frequency to the DCM is greater than the DCM
input frequency specification, the CLKIN_DIVIDE_BY_2 attribute of the DCM must be set to TRUE.
Figure 16 shows a block diagram of the TX_CLOCKS module using the DCM. This circuit
achieves a minimized skew between the CLKDV (TXCLKDIV) output and CLKIN (TXCLK) input
of the DCM.
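[Figure 16: TX_CLOCKS module using the DCM. CLKI, from a BUFG external to the TX_CLOCKS
module, provides TXCLK and drives the DCM CLKIN input. CLK0 is fed back to CLKFB, and CLKDV
provides TXCLKDIV.]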
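The divider setting for the DCM solution can be worked out from the 4:1 ratio between TXCLK and TXCLKDIV. The sketch below assumes that the CLKDV output divides the clock after the CLKIN_DIVIDE_BY_2 stage, so the two divisions multiply; this should be confirmed against the DCM documentation before use. The function name is hypothetical.

```python
def dcm_clkdv_divide(total_divide: int, clkin_divide_by_2: bool) -> float:
    """CLKDV divide value needed for a given overall TXCLK/TXCLKDIV ratio."""
    pre_divide = 2 if clkin_divide_by_2 else 1  # input halved before the DCM core
    return total_divide / pre_divide


# TXCLKDIV = TXCLK / 4 with CLKIN_DIVIDE_BY_2 set to TRUE.
clkdv_divide = dcm_clkdv_divide(4, clkin_divide_by_2=True)
print(clkdv_divide)  # 2.0
```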
Revision History
The following table shows the revision history for this document.
Date       Version   Revision
02/17/05   1.0
06/24/05   1.1
12/08/05   1.2       Updated Introduction.