Virtex4 High Speed DDR Transceivers Xapp705


Application Note: Virtex-4 Family

XAPP705 (v1.2) December 08, 2005

Virtex-4 High-Speed Dual Data Rate LVDS Transceiver
Author: Markus Adhiwiyogo

Summary

This application note describes dual data rate (DDR) transmitter (Tx) and receiver (Rx)
interfaces in a Virtex-4 FPGA using 17 low-voltage differential signaling (LVDS) pairs (one
clock and 16 data channels). The design is implemented using the ChipSync features. The
accompanying reference design files include an example targeting a Virtex-4 XC4VLX25FF668 device. A UCF file is provided for implementing this design on the Xilinx ML450
development board. See the design characteristics and recommendations in the Design Summary
for further information on design requirements.

Introduction

A DDR interface transfers two data bits per clock period, one on each clock edge (shown in
Figure 1). Thus, if the data rate is 500 Mb/s, the clock frequency is 250 MHz.
Figure 1: DDR Clock and Data Interface (signals shown: DATA carrying word_0 through word_3, and CLK)
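The relationship between data rate and clock frequency can be checked with a short calculation. The following is a minimal sketch in Python; the function names are hypothetical and are not part of the reference design.

```python
def ddr_clock_mhz(data_rate_mbps: float) -> float:
    """Clock frequency (MHz) of a DDR link for a given per-pin data rate (Mb/s).

    A DDR interface carries two bits per clock period, so the clock
    runs at half the data rate.
    """
    return data_rate_mbps / 2.0


def ddr_data_rate_mbps(clock_mhz: float) -> float:
    """Per-pin data rate (Mb/s) of a DDR link for a given clock frequency (MHz)."""
    return clock_mhz * 2.0


# The example from the text: a 500 Mb/s link uses a 250 MHz clock.
print(ddr_clock_mhz(500.0))       # 250.0
print(ddr_data_rate_mbps(250.0))  # 500.0
```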


Figure 2 illustrates the overall system configuration, showing a full-duplex DDR link between a
Virtex-4 device and another device with a DDR transceiver. The Virtex-4 device requires a
reference clock with either LVDS or LVPECL differential outputs operating at the DDR clock
frequency to generate the transmit clock from the Virtex-4 device. Figure 2 shows a discrete
clock source operating at the DDR frequency.

Figure 2: Typical DDR Link System (a reference clock drives REFCLK_P/REFCLK_N of the Virtex-4 FPGA; CLK and DATA<15:0> run in each direction between the Virtex-4 FPGA and a device with a DDR interface)

© 2005 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc.
All other trademarks are the property of their respective owners.

www.xilinx.com

Virtex-4 Implementation

Figure 3 shows a simplified Virtex-4 DDR transceiver block diagram as found in the reference
design, DDR_LVDS_TX_RX. This module contains IDELAYCTRL, TX_CLOCKS,
TX_CLK_AND_DAT, RX_CLK_AND_DAT, and RST_MACHINE. Details on each module are
described in the following sections.
Figure 3: Simplified Virtex-4 DDR Transceiver Block Diagram (blocks shown: TX_CLOCKS with input CLKI and outputs TXCLK and TXCLKDIV = TXCLK/4; TX_CLK_AND_DAT with outputs CLKOUTP/CLKOUTN and DATAOUTP/DATAOUTN; RX_CLK_AND_DAT with inputs CLKINP/CLKINN, DATAINP/DATAINN, and GCLKDIV; the design data logic path between the data from ISERDES and the data to OSERDES; IDELAYCTRL; and RST machines for the RXCLK domain, the TXCLK domain, and IDELAYCTRL)


Multiple transmitters and receivers can be implemented in the same Virtex-4 FPGA. When
multiple instances are needed, only TX_CLK_AND_DAT and RX_CLK_AND_DAT modules are
replicated, saving valuable global clock resources by not replicating the TX_CLOCKS module.
Sample code and design for the ML450 board are also provided in the reference design file
DDR_LVDS_AND_LOGIC_TOP.

TX_CLOCKS Module
The TX_CLOCKS module is designed to provide/generate all the clock frequencies necessary
to perform the transmit operations using OSERDES. There are two clocks generated by this
module: TXCLK and TXCLKDIV.
The reference design uses the DDR clock input (CLKI) to generate TXCLK and TXCLKDIV. The
CLKI input must already be in the global clock network. In this example, the frequency of
TXCLK is four times faster than TXCLKDIV. Connect these two clocks to the CLK and CLKDIV
inputs of the desired OSERDES.
There are two methods to generate TXCLK and TXCLKDIV. Depending on designer
preference, the clocks can be generated using either the DCM or the PMCD. Xilinx recommends
using the PMCD for any x2, x4, or x8 division, because this saves DCM resources. Other
integer division ratios can only be generated using the DCM. The number of global clock networks
required for the transmitter does not differ between a DCM and a PMCD solution. Figure 4
illustrates the generated clocks.

Figure 4: TX_CLOCKS Module Clock Waveforms (signals shown: CLKI, TXCLK, TXCLKDIV)


When using the CLK0 and CLKDV output of the DCM or the CLKA1 and CLKA1Dx of the
PMCD, the clock and divided clock outputs have minimum skew. It is important to minimize the
skew on these clocks.
In this application note, the reference design uses the PMCD. Information on how to use the
DCM instead is available in the Appendix.
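The resource recommendation and the TXCLK to TXCLKDIV relationship described above can be summarized in a few lines. This is an illustrative Python sketch only; the helper names are hypothetical, and the rule encoded here is simply the one stated in the text (PMCD for x2, x4, or x8 division, DCM for other integer ratios).

```python
PMCD_DIVIDE_RATIOS = (2, 4, 8)  # ratios for which the text recommends the PMCD


def clock_divider_resource(divide: int) -> str:
    """Return the clock resource suggested by the text for an integer divide ratio."""
    if divide < 2:
        raise ValueError("divide ratio must be an integer of 2 or more")
    return "PMCD" if divide in PMCD_DIVIDE_RATIOS else "DCM"


def txclkdiv_mhz(txclk_mhz: float, divide: int = 4) -> float:
    """Frequency of TXCLKDIV; the reference design uses TXCLK divided by four."""
    return txclk_mhz / divide


print(clock_divider_resource(4))  # PMCD
print(clock_divider_resource(3))  # DCM
print(txclkdiv_mhz(250.0))        # 62.5
```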
Table 1: TX_CLOCKS Module Pin Definitions

I/O Type   Module Pin Name   Definition
Input      CLKI              DDR Clock Input
Input      RST               Active High Reset
Output     TXCLK (1)         Generated TXCLK Domain Output
Output     TXCLKDIV (1)      Generated TXCLK divided-by-four domain output
Output     TXDCMLOCKED       DCM Locked Pin

Notes:
1. Both TXCLK and TXCLKDIV must be phase aligned.

Figure 5 shows a block diagram of the TX_CLOCKS module using the PMCD.

Figure 5: TX_CLOCKS Module Using PMCD (CLKI drives the PMCD CLKA input; CLKA1 drives CLKPMCD and TXCLK; CLKA1D4 drives CLKDIVPMCD and TXCLKDIV)

TX_CLK_AND_DAT Module
The transmitter (TX_CLK_AND_DAT) uses two different types of output modules: OSERDES
for the data channels and ODDR for the clock output. The data channels have instantiation
names with the prefix TX_DAT_OUT_ followed by a two-digit number that denotes the bit number.
In this example, each data channel consists of a MASTER/SLAVE pair of OSERDES to
accommodate 8:1 serialization. The instantiation name of a SLAVE OSERDES has an "S"
after the two-digit number. The clock channel has an instantiation name
with the prefix TX_CLK_OUT_ followed by a two-digit number. More of these blocks can be
instantiated. Table 2 contains the module pin descriptions.
Table 2: TX_CLK_AND_DAT Module Pin Definitions

I/O Type   Module Pin Name   Definition
Input      ORST              Active High Reset
Input      OCE               Active High Output Enable
Input      TXCLK (1)         DDR Clock
Input      TXCLKDIV (1)      DDR Clock divided by 4
Input      DATA_IN<127:0>    128-bit Parallel Data Input
Output     CLKOUTP           Differential Transmit Clock Output
           CLKOUTN
Output     DATAOUTP<15:0>    16-bit Differential Data Output
           DATAOUTN<15:0>

Notes:
1. Both TXCLK and TXCLKDIV must be phase aligned for proper transmitter operation. Xilinx recommends
   using the TX_CLOCKS module to generate these two clocks.

There are sixteen pairs of OSERDES blocks in this module to accommodate 128 bits of parallel
data input. Each pair is a MASTER/SLAVE pair. Each OSERDES is set for 8:1 serialization.
Table 3 summarizes the settings applied to all MASTER OSERDES data channels. Table 4
summarizes the settings applied to all SLAVE OSERDES data channels.
Table 3: MASTER OSERDES Data Channel Settings

Parameter Name   Parameter Value
DATA_RATE_OQ     DDR
DATA_WIDTH       8
SERDES_MODE      MASTER

Table 4: SLAVE OSERDES Data Channel Settings

Parameter Name   Parameter Value
DATA_RATE_OQ     DDR
DATA_WIDTH       8
SERDES_MODE      SLAVE
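To illustrate how a 128-bit parallel word relates to sixteen 8:1 data channels, the following Python sketch splits DATA_IN into sixteen 8-bit slices. The bit-to-channel mapping used here (channel n takes bits 8n+7 down to 8n) is an assumption made for illustration; the actual mapping is defined in the reference design files. The function name is hypothetical.

```python
from typing import List


def split_parallel_word(data_in: int, channels: int = 16,
                        bits_per_channel: int = 8) -> List[int]:
    """Split a wide parallel word into one slice per serial data channel.

    Assumed mapping (not confirmed by the application note): channel n
    receives bits [8n+7 : 8n] of the parallel word.
    """
    mask = (1 << bits_per_channel) - 1
    return [(data_in >> (ch * bits_per_channel)) & mask for ch in range(channels)]


# Build a 128-bit word whose n-th byte holds the value n, then split it.
word = int.from_bytes(bytes(range(16)), "little")
slices = split_parallel_word(word)
print(len(slices), slices[0], slices[15])  # 16 0 15
```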

Figure 6 illustrates the OSERDES connections necessary to build an 8:1 serialization
MASTER/SLAVE pair of OSERDES data channels.
Figure 6: MASTER/SLAVE Pair of OSERDES Data Channels for 8:1 Serialization (parallel data from the FPGA fabric: Data1 through Data6 connect to D1 through D6 of the MASTER OSERDES, and Data7 and Data8 connect to D3 and D4 of the SLAVE OSERDES; the SHIFTOUT1 and SHIFTOUT2 pins connect to the SHIFTIN1 and SHIFTIN2 pins between the pair; the MASTER OQ output carries the serial data to the external FPGA pin)


When using OSERDES, the order of the data transmitted at every positive TXCLK edge is from
D1 to D6 (LSB to MSB). For serialization ratios larger than 6:1, the order of the data transmitted
is from D1 to D6 of the MASTER OSERDES followed by D3 to D6 of the SLAVE OSERDES. Because
8:1 serialization is used in this example, the order of the data is from D1 to D6 of the MASTER
OSERDES followed by D3 to D4 of the SLAVE OSERDES. Because 3-state is not used, all
3-state pins (TCE and T1 through T4) are tied to a logic Low. The 3-state attributes are left as
"Don't Care."
The ODDR is used to forward the DDR transmit clock from the Virtex-4 FPGA. This is
implemented by connecting TXCLK to the ODDR clock (C) input pin and connecting the D1 and D2
pins to a logic High and a logic Low, respectively. ODDR and OSERDES are the two blocks
commonly used to forward the clock from Virtex-4 FPGAs to external devices. Figure 7 shows
the timing waveform of the transmitted data with respect to TXCLK and TXCLKDIV.

Figure 7: TX_CLK_AND_DAT Output Waveforms (signals shown: TXCLK, TXCLKDIV, parallel inputs D1 through D8, and the serial output OQ)
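The transmit ordering described above can be modeled in software. The following Python sketch is a behavioral illustration only, not a description of the OSERDES hardware; the function names are hypothetical. It lists the OSERDES input pins in transmit order and serializes a word LSB first, as stated in the text.

```python
def oserdes_pin_order(width: int = 8):
    """Return (OSERDES, pin) tuples in transmit order for a given width.

    Per the text: D1 to D6 of the MASTER are sent first; for widths
    above 6, the SLAVE pins starting at D3 follow.
    """
    if not 1 <= width <= 10:
        raise ValueError("this sketch covers widths from 1 to 10 only")
    master = [("MASTER", "D%d" % i) for i in range(1, min(width, 6) + 1)]
    slave = [("SLAVE", "D%d" % i) for i in range(3, 3 + max(width - 6, 0))]
    return master + slave


def serialize(word: int, width: int = 8):
    """Return the bits of word in transmit order (LSB first)."""
    return [(word >> i) & 1 for i in range(width)]


print(oserdes_pin_order(8)[-2:])  # [('SLAVE', 'D3'), ('SLAVE', 'D4')]
print(serialize(0xA5))            # [1, 0, 1, 0, 0, 1, 0, 1]
```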


All output pins from this module are connected to LVDSEXT_25 output buffers. Figure 8 shows
the block diagram for TX_CLK_AND_DAT module.

Figure 8: TX_CLK_AND_DAT Module Block Diagram (TXCLK drives TX_CLK_OUT, whose PRECLKOUT output drives CLKOUTP/CLKOUTN; TXCLKDIV and DATA_IN[127:0] feed TX_DAT_OUT_0 through TX_DAT_OUT_15, whose PREDATOUT(0) through PREDATOUT(15) outputs drive DATOUTP(0)/DATOUTN(0) through DATOUTP(15)/DATOUTN(15))

RX_CLK_AND_DAT Module
The receiver (RX_CLK_AND_DAT) module has both clock recovery and data recovery blocks.
The clock recovery blocks include:

BUFIO - to access the IOCLK network

BUFR - to access the Regional Clock network

The data recovery blocks include:

ISERDES - SERDES used to deserialize data

ISERDES_ALIGNMENT_MACHINE - Logic to control data recovery in one channel using IDELAY and BITSLIP

FIFO16 - A FIFO to move data from the Regional Clock network into the Global Clock network

Figure 9 shows a simplified block diagram for RX_CLK_AND_DAT module.


Figure 9: RX_CLK_AND_DAT Module Block Diagram (blocks shown: ISERDES MASTER/SLAVE pairs for DATAINP/DATAINN(0) through DATAINP/DATAINN(15); ISERDES_ALIGNMENT_MACHINE; BUFIO producing RXCLK and BUFR producing RXCLKDIV; and a FIFO clocked by GCLKDIV producing the 128-bit DATA_OUT)

The functionality of the sub-blocks is discussed in the following sections. Table 5 contains the
module pin descriptions.
Table 5: RX_CLK_AND_DAT Module Pin Definitions

I/O Type   Module Pin Name          Definition
Input      CLKINP                   Differential Receive Clock Input
           CLKINN
Input      DATAINP<15:0>            16-bit Differential Receive Data Inputs
           DATAINN<15:0>
Input      IRDY                     When logic High, IDELAY is ready
Input      USE_BITSLIP              When logic High, data recovery state machine
                                    performs the BITSLIP operation until the training
                                    pattern in TRAINING_PATTERN is found
Input      RST                      Active High Reset for all logic
Input      IRST                     Active High Reset for all ISERDES
Input      SCE                      Active High Clock Enable
Input      TRAINING_PATTERN<7:0>    8-bit training pattern
Input      LOCKED                   LOCKED signal input
Input      GCLKDIV                  Global Clock Input, frequency near RXCLKDIV
Output     RXCLKDIV                 Received Clock divided by 4
Output     DATA_OUT<127:0>          128-bit Parallel Data Output
Output     DATA_ALIGNED (1)         When logic High, the alignment process for one
                                    data channel is complete
Output     BUS_ALIGNED (1)          When logic High, the alignment process across
                                    all data channels is complete
Output     SEND_CLOCK (1)           When logic High, the alignment machine is
                                    requesting a clock signal at the data input pins

Notes:
1. Used to indicate the state of the ISERDES alignment machine.

All forwarded clock and data input pins are connected to LVDSEXT_25 input buffers.

RX_CLK_AND_DAT Module Clocking Network


In this reference design, the recovered clock must be connected to input buffers at the clock
capable I/Os. After the input buffer, the clock is connected to the BUFIO followed by a BUFR.
The BUFIO allows the recovered clock to access the IOCLK network. The IOCLK clock network
is used for the CLK (fast or DDR clock input) of ISERDES. IOCLK can span up to three adjacent
clock regions.
The BUFR is used to access the regional clock network and perform the clock divide function.
The BUFR is used to provide the CLKDIV (slow or divided DDR clock input) of ISERDES. The
regional clock network can span up to three adjacent clock regions. The BUFR divide function
is set to four to accommodate 1:8 DDR deserialization.
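The BUFR divide value follows from the deserialization width and the data rate mode. The Python sketch below shows the arithmetic; the function name is hypothetical, and the calculation assumes that the divided clock completes one cycle per deserialized word.

```python
def clkdiv_ratio(deserialization_width: int, ddr: bool = True) -> int:
    """Divide ratio between the fast clock (CLK) and the divided clock (CLKDIV).

    In DDR mode two bits are captured per fast-clock period, so a 1:8
    deserializer produces one word every four fast-clock periods.
    """
    bits_per_clock = 2 if ddr else 1
    if deserialization_width % bits_per_clock != 0:
        raise ValueError("width must be a multiple of the bits captured per clock")
    return deserialization_width // bits_per_clock


print(clkdiv_ratio(8))  # 4, matching the BUFR divide value used in this design
```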

Figure 10 illustrates the recovered clock network.

Figure 10: RX_CLK_AND_DAT Clock Network (CLKINP/CLKINN drive a BUFIO, which produces RXCLK, and a BUFR set to divide by 4; both clocks reach the ISERDES blocks for DATAINP(x)/DATAINN(x) and the associated logic across adjacent clock regions, with the clock region borders marked)


ISERDES Block Characteristics


There are sixteen pairs of MASTER/SLAVE ISERDES blocks in this module to accommodate
the 16 serial data inputs. Each ISERDES is set for 1:8 deserialization. Table 6 summarizes the
data channel MASTER ISERDES settings. Table 7 summarizes the data channel SLAVE
ISERDES settings.
Table 6: MASTER ISERDES Settings

Parameter Name    Parameter Value
BITSLIP_ENABLE    TRUE
DATA_RATE         DDR
DATA_WIDTH        8
INTERFACE_TYPE    NETWORKING
IOBDELAY          IFD
IOBDELAY_TYPE     VARIABLE
IOBDELAY_VALUE
NUM_CE
SERDES_MODE       MASTER

Table 7: SLAVE ISERDES Settings

Parameter Name    Parameter Value
BITSLIP_ENABLE    TRUE
DATA_RATE         DDR
DATA_WIDTH        8
INTERFACE_TYPE    NETWORKING
IOBDELAY          IFD
IOBDELAY_TYPE     VARIABLE
IOBDELAY_VALUE
NUM_CE
SERDES_MODE       SLAVE

Figure 11 illustrates the ISERDES connections necessary to build an 8:1 deserialization
MASTER/SLAVE pair of ISERDES data channels.
Figure 11: MASTER/SLAVE Pair of ISERDES Data Channels for 8:1 Deserialization (serial data external to the FPGA enters the MASTER ISERDES; Q1 through Q6 of the MASTER ISERDES provide Data1 through Data6, and Q3 and Q4 of the SLAVE ISERDES provide Data7 and Data8 as parallel data into the FPGA fabric; the SHIFTOUT1 and SHIFTOUT2 pins connect to the SHIFTIN1 and SHIFTIN2 pins between the pair)

When using ISERDES, the order of the data received into the fabric at every RXCLKDIV cycle is
Q1 to Q6 (last in to first in). For deserialization ratios larger than 6:1, the order of the data received
is (last in to first in) Q1 to Q6 of the MASTER ISERDES followed by Q3 to Q6 of the SLAVE
ISERDES. In this example, because 8:1 deserialization is used, the order of the data is (last
in to first in) Q1 to Q6 of the MASTER ISERDES followed by Q3 to Q4 of the SLAVE ISERDES.
Figure 12 illustrates the order of data from ISERDES into the FPGA fabric.
Figure 12: Input and Output Data Relationship of ISERDES (serial input DAT carries D0 through D8 against RXCLK and RXCLKDIV; the parallel outputs are Q1 = D7, Q2 = D6, Q3 = D5, Q4 = D4, Q5 = D3, Q6 = D2, Q7 = D1, Q8 = D0)
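The last-in to first-in ordering shown in Figure 12 can be expressed as a small behavioral model. This Python sketch is illustrative only and the function name is hypothetical. It maps eight serial bits, listed in arrival order, onto the parallel outputs Q1 through Q8, where Q7 and Q8 correspond to Q3 and Q4 of the SLAVE ISERDES.

```python
def deserialize(bits_in_arrival_order):
    """Map serial bits (first received bit first) onto outputs Q1..Qn.

    Q1 holds the last bit received and Qn holds the first bit received,
    matching the relationship shown in Figure 12.
    """
    n = len(bits_in_arrival_order)
    return {"Q%d" % k: bits_in_arrival_order[n - k] for k in range(1, n + 1)}


# Label the incoming bits D0..D7 by index to make the ordering visible.
q = deserialize(["D%d" % i for i in range(8)])
print(q["Q1"], q["Q8"])  # D7 D0
```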


Because IDELAY and BITSLIP features are turned on, the BITSLIP, DLYINC, DLYCE, and
DLYRST are connected to control pins.

ISERDES_ALIGNMENT_MACHINE Module
ISERDES_ALIGNMENT_MACHINE optimally centers the recovered clock to the data valid
window of the incoming data using the IDELAY feature of ISERDES. In addition, when needed,
this module uses the BITSLIP feature to reorder data into the desired training pattern.

Table 8 summarizes all the pins available in this module.
Table 8: ISERDES_ALIGNMENT_MACHINE Module Pin Definitions

I/O Type   Module Pin Name          Definition
Input      RXCLKDIV                 Received clock divided by 4.
Input      RST                      Active High reset for all logic.
Input      SAMPLED_CLOCK<7:0>       The logic values when clock is sampled at
                                    a given number of IDELAY taps.
Input      IRDY                     When logic High, IDELAY is ready.
Input      USE_BITSLIP              When logic High, the data recovery state
                                    machine performs a BITSLIP operation
                                    until the training pattern in
                                    TRAINING_PATTERN is found.
Input      TRAINING_PATTERN<7:0>    8-bit training pattern.
Input      SAP                      When logic High, starts the alignment
                                    process from the beginning.
Input      RXDATA<7:0>              8-bit data recovered from ISERDES.
Output     INC                      Increments/decrements the number of
                                    IDELAY taps used.
Output     ICE                      Enables/disables change in the number of
                                    IDELAY taps used.
Output     BITSLIP                  When logic High, ISERDES uses the
                                    BITSLIP process.
Output     DATA_ALIGNED             When logic High, the alignment process
                                    on the current data channel has been
                                    completed.
Output     SEND_CLOCK               When logic High, the alignment machine is
                                    requesting a clock signal at the data input pins
                                    (used only as alignment module state
                                    indicator).

Bus Alignment is a method of data recovery outlined in this application note. When using this
method for data recovery, all data is aligned to the center of the clock. Prior to using this
method, the skew between all incoming data and clock channels must be minimized.
Additionally, the data transition edge is closely aligned to the clock edges of the incoming clock.
This method is useful in applications where the transmitter does not provide a training pattern.
Using the bus alignment method, the receive clock is sampled by a 1:8 DDR SERDES
(MASTER/SLAVE ISERDES). All eight of the MASTER/SLAVE ISERDES outputs are used to
monitor the edge transitions when IDELAY taps are applied to the registered clock input. The
edge transition detection and the number of taps applied determine the data valid window width
and the tap location to center align the data with respect to the clock.
Because this method requires sampling a receive clock, a slight change is made to the
recovered clock network connection. Instead of directly connecting the clock input into a
BUFIO, an ISERDES is inserted in between this connection.

The designer must connect the clock into the ISERDES D input. The ISERDES outputs used
are the unregistered output (O) and the registered outputs (Q). The O output is connected to
the BUFIO input. IDELAY is only applied to the Q outputs. Table 9 summarizes the ISERDES
settings.
Table 9: ISERDES Settings

Parameter Name    Parameter Value
BITSLIP_ENABLE    FALSE
DATA_RATE         DDR
DATA_WIDTH        8
INTERFACE_TYPE    NETWORKING
IOBDELAY          IFD
IOBDELAY_TYPE     VARIABLE
IOBDELAY_VALUE
NUM_CE
SERDES_MODE       MASTER (for MASTER ISERDES)
                  SLAVE (for SLAVE ISERDES)

The block diagram in Figure 13 shows the clock-to-data recovery scheme.


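Figure 13: Alternate Clock Data Recovery Circuit (the ISERDES O output drives the BUFIO, which produces RXCLK, and the BUFR, which produces RXCLKDIV; RXCLK and RXCLKDIV return to the ISERDES CLK and CLKDIV inputs; the ISERDES Q outputs provide SAMPLED_CLOCK)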


The process to determine the window width and to center align the data follows:
1. Increment IDELAY taps until a 0-to-1 edge transition is found. The first 0-to-1 edge
transition indicates the beginning of a data valid window.
2. Begin counting the number of IDELAY taps.
3. Continue incrementing IDELAY taps until a 1-to-0 edge transition is found. When this
1-to-0 edge is found, the data valid window width has been determined.
4. Decrement the IDELAY taps by half of the data valid window width. This allows the IDELAY
tap at the center of the data valid window width to be used.
5. The IDELAY tap is moved for all data channels to the amount found in step 4.
This alignment scheme assumes minimized skew between all data channels and the clock
channel. The data transition edge of the data channel should be closely aligned to the positive
edge of the incoming clock.
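The five steps above can be sketched as a software model. The Python code below is a minimal illustration, not the state machine in the reference design: `sample` is a hypothetical callback that returns the sampled clock level (0 or 1) for a given IDELAY tap setting, and the default of 64 taps is an assumption about the available tap range.

```python
def center_tap(sample, max_taps=64):
    """Return (tap, width): the centered tap setting and the window width in taps."""
    tap = 0
    prev = sample(tap)

    # Step 1: increment taps until a 0-to-1 edge transition is found.
    while True:
        if tap + 1 >= max_taps:
            raise RuntimeError("no 0-to-1 transition found")
        tap += 1
        cur = sample(tap)
        if prev == 0 and cur == 1:
            break
        prev = cur

    # Steps 2 and 3: count taps until a 1-to-0 edge transition is found.
    width = 0
    while True:
        if tap + 1 >= max_taps:
            raise RuntimeError("no 1-to-0 transition found")
        tap += 1
        width += 1
        if sample(tap) == 0:
            break

    # Step 4: back off by half of the window width to reach its center.
    tap -= width // 2

    # Step 5 (not modeled): apply this tap setting to all data channels.
    return tap, width


# Example: the sampled level is High for taps 5 through 12 (an 8-tap window).
levels = [1, 1, 0, 0, 0] + [1] * 8 + [0] * 3
print(center_tap(lambda t: levels[t], max_taps=len(levels)))  # (9, 8)
```

In this example the window spans taps 5 to 12, so the returned tap of 9 sits near the middle of the window.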

Figure 14 illustrates the relationship between the receive clock (RXCLK) and the
sampled/delayed clock to show the algorithm.
Figure 14: Timing Relationship Between Rx Clock and Sampled Clock (RXCLK with edge 1 marked, and four successively delayed sampled clocks, Sampled Clock(1) through Sampled Clock(4), each with edge 2 and edge 3 marked)


In Figure 14, the RXCLK is initially behind the clock to be sampled (edge 1 comes after edge 2).
The clock to be sampled is incremented by the tap delays until a 0-to-1 transition is found at the
Q outputs of the ISERDES. When this is true, edge 2 comes after edge 1. The clock to be
sampled is continually incremented until another 1-to-0 transition is found. When this is true,
edge 3 comes after edge 1. Finally, edge 1 is placed between edge 2 and edge 3 to center align
the data with respect to the clock.
The current bus alignment module state is indicated by a combination of the DATA_ALIGNED
and SEND_CLOCK pins. Table 10 summarizes the relationship between these two pins and
the alignment process state of this module.
Table 10: DATA_ALIGNED and SEND_CLOCK Relationship with Alignment State

DATA_ALIGNED and SEND_CLOCK Value   State
00                                  Performing word alignment (optional)
01                                  Performing bit alignment
10                                  All alignment processes are completed
11                                  Don't care. Defaults to all alignment processes are completed
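Logic that monitors the alignment machine can decode these two pins with a simple lookup. The Python sketch below mirrors Table 10; it assumes the two-digit value is written with DATA_ALIGNED first and SEND_CLOCK second, and the function name is hypothetical.

```python
ALIGNMENT_STATES = {
    (0, 0): "Performing word alignment (optional)",
    (0, 1): "Performing bit alignment",
    (1, 0): "All alignment processes are completed",
    # Don't care value: defaults to the completed state per Table 10.
    (1, 1): "All alignment processes are completed",
}


def alignment_state(data_aligned: int, send_clock: int) -> str:
    """Decode the alignment state from the DATA_ALIGNED and SEND_CLOCK pin values."""
    return ALIGNMENT_STATES[(data_aligned, send_clock)]


print(alignment_state(0, 1))  # Performing bit alignment
```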

When the clock-to-data alignment process is complete, this module moves to the data
reordering portion of the alignment. After the USE_BITSLIP pin is asserted to a logic High and
TRAINING_PATTERN is set to the desired 8-bit training pattern, the reordering portion uses
the BITSLIP operation until the desired 8-bit training pattern is found. This also requires the
transmitting device to send the desired pattern.
To reduce slice utilization, the logic in the state machine can be reduced by removing the 0101 and
0110 states and the associated control pins generated by these states. The BITSLIP pin
connections can then also be removed from the ISERDES, and BITSLIP_ENABLE set to FALSE.
When both IDELAY and BITSLIP operations are completed, the DATA_ALIGNED bit is
asserted High.
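The data reordering loop can be illustrated with a simplified software model. In the Python sketch below, each BITSLIP operation is modeled as a one-bit rotation of the 8-bit word. This is an assumption made for clarity; the actual shift sequence of the ISERDES BITSLIP operation is defined in the Virtex-4 User Guide and may differ from a simple rotate. The function names are hypothetical.

```python
def rotate_left(word: int, width: int = 8) -> int:
    """Rotate word left by one bit within the given width (simplified BITSLIP model)."""
    mask = (1 << width) - 1
    return ((word << 1) | (word >> (width - 1))) & mask


def bitslips_until_match(rxdata: int, training_pattern: int, width: int = 8):
    """Return the number of modeled BITSLIP operations needed, or None if no match."""
    for slips in range(width):
        if rxdata == training_pattern:
            return slips
        rxdata = rotate_left(rxdata, width)
    return None


print(bitslips_until_match(0b00010110, 0b00101100))  # 1
print(bitslips_until_match(0xFF, 0x00))              # None
```

A result of None corresponds to the case where the transmitting device is not sending the expected training pattern.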


FIFO16 Modules
In this application note, a FIFO is needed to transfer the data recovered from the Regional
Clock domain to the Global Clock domain. By transferring to the Global Clock domain, any logic
required for data processing with the recovered data is not limited to three clock regions. The
logic can be implemented across the FPGA.
Four FIFO16 primitives are instantiated to create four 512 x 36-bit FIFOs. Because the data
deserialized by the ISERDES is 128 bits wide, the reference design uses four FIFO16 primitives.
Additional control logic is implemented for the FIFO16 to operate, with the following conditions:
1. Begin writing into FIFO from Regional Clock domain after all ISERDES have finished
alignment process
2. Begin reading data into the Global Clock domain when at least 50 entries are in the FIFO
3. Stop writing data into the FIFO from Regional Clock domain when less than 50 spaces are
available in the FIFO
These conditions can be changed as needed. Xilinx recommends a write clock frequency
that is slower than or equal to the read clock frequency. When this
clock frequency condition is met, a FIFO overflow does not occur.
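The three FIFO control conditions can be written as two small predicates. This Python sketch is illustrative only; the function and parameter names are hypothetical, the depth of 512 comes from the 512 x 36-bit FIFO organization, and the threshold of 50 comes from conditions 2 and 3.

```python
def write_enable(bus_aligned: bool, entries: int,
                 depth: int = 512, threshold: int = 50) -> bool:
    """Conditions 1 and 3: write only after alignment, and only while at least
    `threshold` spaces remain in the FIFO."""
    spaces = depth - entries
    return bool(bus_aligned) and spaces >= threshold


def read_may_start(entries: int, threshold: int = 50) -> bool:
    """Condition 2: reading begins once at least `threshold` entries are stored."""
    return entries >= threshold


print(write_enable(True, 462))  # True  (50 spaces left)
print(write_enable(True, 463))  # False (49 spaces left)
print(read_may_start(50))       # True
```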

IDELAYCTRL Module
Because this design uses IDELAY, IDELAYCTRL is needed in order to guarantee proper
operation of IDELAY in the Virtex-4 FPGA. IDELAYCTRL requires the following two conditions
for proper operation.

Input reference clock (REFCLK) of 200 MHz

Minimum of 50 ns of active High reset pulse after startup

Additional information on IDELAYCTRL is available in the Virtex-4 User Guide.

RST_MACHINE Module
This module is used to create a synchronous reset for all elements in a given clock domain.
It is also used to create an active High reset pulse of a desired duration. For
example, IDELAYCTRL requires an active High reset pulse of at least 50 ns.
To initiate the reset pulse, an input clock and a stimulus are used. The reset pulse generated by
the RST_MACHINE module should be connected to all elements in the design that are
clocked by the input clock.
The number of clock cycles for the active High reset is set by the comparator value COUNT_VALUE
in the state machine portion of this module. To shorten or lengthen the duration, change this
comparator value.
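A suitable comparator value can be estimated from the clock frequency and the required pulse duration. The Python sketch below shows the arithmetic; the function name is hypothetical, and it assumes that the reset stays asserted for exactly COUNT_VALUE clock cycles, which should be confirmed against the RST_MACHINE source code.

```python
import math


def reset_count_value(clock_mhz: float, min_pulse_ns: float = 50.0) -> int:
    """Smallest whole number of clock cycles that spans at least min_pulse_ns."""
    # cycles = duration / period, where period (ns) = 1000 / frequency (MHz)
    return math.ceil(min_pulse_ns * clock_mhz / 1000.0)


print(reset_count_value(200.0))  # 10 cycles of a 200 MHz clock cover 50 ns
print(reset_count_value(62.5))   # 4 cycles of a 62.5 MHz clock cover 64 ns
```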
Table 11 summarizes all the pins available in this module.
Table 11: RST_MACHINE Module Pin Definitions

I/O Type   Module Pin Name   Definition
Input      CLK_generic       The clock domain in which the reset pulse is needed
Input      RST_stimulus      When active High, DOMAIN_RST is generated
Input      IRDY              IDELAYCTRL ready signal
Output     DOMAIN_RST        Active High reset pulse output. This should be connected to all
                             reset pins of elements clocked by the clock connected to the
                             CLK_generic input pin


DDR_LVDS_AND_LOGIC_TOP (Top-Level Module)


This module uses a Virtex-4 FPGA and the ML450 development board to demonstrate Tx
and Rx loopback. The DDR_LVDS_TX_RX module performs both IDELAY and BITSLIP during the
alignment process. The top-level module also contains a PRBS data generator, a FIFO to store the transmitted data,
and a checker that uses a DSP48 slice to compare the incoming data with the data sent. The UCF file
targeted for the ML450 board is called ddr_lvds_and_logic_top.ucf. The design functions
correctly when both FINAL_BUS_ALIGNED and FINAL_DATA_CHECK are asserted High.
Figure 15 illustrates a simplified block diagram of this module.

Figure 15: DDR_LVDS_AND_LOGIC Simplified Block Diagram (DDR_LVDS_AND_LOGIC_TOP contains DDR_LVDS_TX_RX, DATA_SOURCE, a FIFO, and a CHECKER built with a DSP48 slice; the CHECKER signals are connected when using the testbench)

Simulation for DDR_LVDS_AND_LOGIC


The reference design is simulated using ModelSim SE 5.8b. The simulation testbench is
DDR_LVDS_AND_LOGIC_TOP_TESTBENCH.v, and the script is top.do. To invoke the script,
type Run top.do at the ModelSim command prompt. In this simulation, the design functions
correctly when both FINAL_BUS_ALIGNED and FINAL_DATA_CHECK are asserted High.
Some lines in the .do file must be changed to reflect the working directory or a library location.

ISE Implementation
This design is compiled using ISE 6.3i. Files needed for this implementation are:

DDR_LVDS_TX_RX.v

DDR_LVDS_AND_LOGIC_TOP.v (topmost level file)

DDR_LVDS_AND_LOGIC_TOP.ucf

The UCF file is associated with DDR_LVDS_AND_LOGIC_TOP.v


When using the ML450 development board, select XC4VLX25-11FF668 as the target device.
The following settings must be turned on or turned off:

Synthesize - XST - Equivalent Register Removal (unchecked)

Implement Design - MAP Properties - Trim Unconnected Signals (unchecked)

Some warnings may occur. The readme.txt file provides further information on these warnings.


Design Summary
Table 12 summarizes the Virtex-4 device utilization on the ML450 development board.
Table 12: DDR LVDS Device Utilization on the ML450 Development Board

Component Name                   Device Utilization
Number of External IOB           169
Number of LOCed External IOB     41
Number of External IOBM          17
Number of External IOBS          17
Number of DSP48
Number of FIFO16
Number of ISERDES                34
Number of OLOGIC
Number of OSERDES                32
Number of Slices                 579
Number of BUFG
Number of BUFIO
Number of BUFR
Number of DCM
Number of IDELAYCTRL             16 (IDELAYCTRLs were not LOC constrained)

The Virtex-4 reference design assumes/requires the following design parameters.

Flip-Chip packaged Virtex-4 FPGA

LVDSEXT at the receiver

Current design is an 8:1 Serializer/Deserializer (SERDES)

Tx and Rx pins are placed in either the left or the right I/O column

Tx pins are grouped as close as possible to minimize skew (both on the board and on the
device).

Rx pins are grouped as close as possible to minimize skew (both on the board and on the
device) and the number of clock regions used.

The reference design requires the device to have a PMCD. For devices without a PMCD
(XC4VLX15, XC4VSX25, XC4VFX12, and XC4VFX20), change the PMCD portion in the
code to use a DCM instead (see the Appendix).

Table 13 summarizes the device utilization of this design (excluding the ML450 development
board design utilization).

Table 13: DDR LVDS Device Utilization in a Virtex-4 Device

Component Name   Device Utilization
IOB              17 differential data and clock inputs
                 17 differential data and clock outputs
                 2 differential clock inputs (DDR clock and REFCLK)
FIFO16           4 for Receiver
ISERDES          34 (16 pairs of MASTER/SLAVE ISERDES, one pair of
                 MASTER/SLAVE ISERDES for bus alignment)
OSERDES          32 (16 pairs of MASTER/SLAVE OSERDES)
OLOGIC           1 (for Tx clock forwarding)
BUFIO
BUFR
IDELAYCTRL       16 (IDELAYCTRLs are not location constrained)
BUFGs
Slices           65

Conclusion

Virtex-4 devices can implement dual data rate, 16-bit LVDS data transmission and reception at
500 MHz. This design can easily be expanded for data wider than 16 bits.
Complete Verilog design files for this application note are available on the Xilinx website at:
http://www.xilinx.com/bvdocs/appnotes/xapp705.zip.

Appendix

This appendix outlines a method to use the DCM to generate TXCLK and TXCLKDIV when using
Virtex-4 devices that do not have a PMCD. TXCLKDIV is generated at the CLKDV
output of the DCM. Because the input clock frequency to the DCM is greater than the DCM
input frequency specification, the CLKIN_DIVIDE_BY_2 attribute of the DCM must be set to TRUE.
Figure 16 shows a block diagram of the TX_CLOCKS module using the DCM. This circuit
achieves a minimized skew between the CLKDV (TXCLKDIV) output and CLKIN (TXCLK) input
of the DCM.
Figure 16: TX_CLOCKS Module Using DCM (labels shown: CLKI from a BUFG external to the TX_CLOCKS module, TXCLK, the DCM pins CLKIN, CLKFB, CLK0, and CLKDV, and TXCLKDIV)

Revision History

The following table shows the revision history for this document.
Date       Version   Revision
02/17/05   1.0       Initial Xilinx release.
06/24/05   1.1       Added link to the reference design files and made typographical updates.
12/08/05   1.2       Updated Introduction.
