

2015 10th International Conference on Design & Technology of Integrated Systems in Nanoscale Era


All Digital Phase Interpolator
Andreas Tsimpos1, George Souliotis, Andreas Demartinos, Spiros Vlassis
Department of Physics, University of Patras, Patras, Greece
e-mail(1) [email protected]

Abstract— This paper proposes an all-digital CMOS phase interpolator suitable for high-speed multi-Gigabit serial interfaces. The topology is based on the parallel combination of identical CMOS inverters grouped in eight segments and delivers two programmable orthogonal output phases (I/Q). The phase interpolator is designed in a 65nm CMOS process to be compliant with the MIPI alliance M-PHY standard. Simulation results confirm 5-bit phase resolution with less than 5% worst-case phase step variation, a settling time of less than 2 clock cycles and a power consumption of about 2mW from a 1.2V supply.

Keywords— High-speed integrated circuits, receivers, CMOS mixed-mode circuits, clock/data recovery circuits, phase interpolators.
I. INTRODUCTION

Multi-Gigabit chip-to-chip serial interfaces are important systems, with a strong push to increase data rates, to offer flexibility to support multiple standards, and to implement several channels on the same chip [1]. The high throughput and short synchronization period required by such systems call for very fast burst-mode clock and data recovery (BM-CDR) systems and timing circuits [1]-[5].
A fast phase interpolator (PI) is arguably the most critical block in BM-CDR systems. A PI produces accurate phases, or clock edges, across the full 360° cycle of the internal reference clock, and the BM-CDR places these clock edges in the middle of the eye of the received data bit [6]-[10].

Fig. 1. (a) Phase interpolator block diagram and (b) output phase vs. phase control code n characteristic.
Fig. 1a presents the block diagram of the PI, where clkin are the input phases of the reference clock, clkout are the output clock phases and <phase> is the digital control code. The general concept of a PI unit is to employ two constant phases ψ and φ of the system reference clock and to generate one or more output phases θn that are precisely defined intermediates of ψ and φ [7], [8]. The discrete output phase steps θn can therefore be described by eq. (1):

$$\theta_n = \frac{n}{N}\,\varphi + \frac{N-n}{N}\,\psi, \qquad n = N, N-1, \ldots, 0 \qquad (1)$$

where ψ < φ, N and n are integers, N is the total number of phase steps between ψ and φ, and n defines the specific phase step number. Fig. 1b illustrates eq. (1) for N = 4, in which the output phase takes the values of the reference phases ψ and φ and the additional phase steps θ5, θ6, θ7 between them. The output phases can be further extended by applying the same procedure to the next pair of input phases ψ and x, where x − ψ = ψ − φ.

Many PIs with controllable phase steps are based on the weighted current summation of two adjacent reference phases [7], [11]-[18]. These are known as current-mode or analog phase interpolators and use CML-type circuits [11]-[14]. They are used in very fast circuits when digital cells cannot fully support the required high-speed operation or cannot meet the specifications. Their main disadvantages are poor phase interpolation linearity and a relatively long locking time at a considerable static power consumption, which affect the CDR performance [9].

Instead, digital PIs based on CMOS cells and buffers have been proposed, simplifying the overall design and minimizing the power consumption [18]-[21]. In this type of PI, also known as voltage-mode, phase interpolation non-linearity remains the main issue. The non-linearity can be improved by

978-1-4799-1999-4/15/$31.00 ©2015 IEEE


using common-mode feedback [12] or by controlling the slope or slew rate of the output waveform [23].

Other methods use a combination of digital and analog interpolation [22], and inverse non-linearity compensation has also been proposed [13], [15], [20], [22]; the latter suffers from mismatch effects. A simpler way to improve linearity is to use smaller differences between the adjacent reference phases, such as 45° or 30°. The drawback is that more clock phases must always be available, which is not power efficient in a high-frequency VCO.

In this paper an all-digital, CMOS-inverter-based PI is proposed. The circuit specifications are compliant with the MIPI alliance M-PHY standard, and the circuit is designed to work with a BM-CDR. The MIPI alliance recently (2013) released the M-PHY standard, the most recent serial interface protocol for high-speed serial communication, which defines three separate data rates, or gears, of 1.5/3/6Gbps and targets multiple protocols (UniPro, DigRF, D-PHY) [2]. The performance of the proposed PI topology is optimized for each gear by a suitable selection of capacitors from a capacitor tank. This capacitor selection preserves good phase linearity in all gears required by the M-PHY standard.

In Section II the system overview of a BM-CDR is explained, and in Section III the operation principle of the proposed PI is presented. In Section IV simulation results are provided that confirm the proper operation of the system, in accordance with the theoretical expectations.

II. SYSTEM OVERVIEW

Fig. 2 illustrates the proposed system, which includes the phase interpolator (PI) and the associated phase/power control unit. The PLL/VCO is a conventional analog PLL with a reference clock of 19.6MHz that generates 8 reference phases with an accurate and constant phase difference of 45°. The PLL/VCO therefore produces the phases 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°. A suitable VCO can easily be constructed from a four-stage CMOS differential ring oscillator [24], [25].

The PI includes 8 identical segments (segment 0 to 7), each of which processes two adjacent PLL/VCO phases (e.g. 45°-90°) and provides accurate phase interpolation with 4 discrete phase steps of 11.25° (e.g. 45°, 56.25°, 67.5°, 78.75°). Therefore, 8 segments x 4 phase steps give 32 discrete phase steps of 11.25° that cover a whole clock cycle of 360°.

Two orthogonal outputs, clk-I and clk-Q, are generated, and their phases are internally selected by a phase multiplexer (PH-MUX) through the control bus <ctrl> according to the required phase. <ctrl> is also used to perform the PI power management for power optimization. The 2-bit control bus <gear> modifies the capacitor load tank mentioned above in order to process the data rates of 1.5/3/6Gbps.

The CONTROL UNIT converts the <phase> digital control bus into the appropriate control bus <ctrl>, which activates/deactivates the segments according to the selected phase in order to minimize the power consumption.

III. OPERATION PRINCIPLE

Fig. 3 presents the architecture of the proposed PI and Fig. 5 the circuit topology of each segment. The CONTROL UNIT splits <ctrl> into two control buses, <mux-ctrl> and <ph/en-ctrl>, for phase selection and power management, respectively. The <ph/en-ctrl> bus enables the appropriate group of inverters inside a segment for accurate output phase interpolation and simultaneously deactivates the inverters and segments that do not contribute to phase interpolation, thereby minimizing the power consumption. Consequently, only two segments are powered on (one for clk-I and one for clk-Q) for a desired phase, keeping the power consumption as small as possible.

The PH-MUX not only selects the appropriate segment output but also isolates the outputs of the remaining segments that do not contribute to phase interpolation. The PH-MUX circuit is constructed from simple transmission gates, as shown in Fig. 4.
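As an illustration only, the Python sketch below evaluates eq. (1) for one pair of adjacent 45° VCO phases and then tiles the eight segments to obtain the 32 phases of 11.25°. The encoding of the 5-bit <phase> code used here (segment j = code // 4, step k = code % 4) is a hypothetical assumption, since the text does not specify how the CONTROL UNIT decodes <phase>; the quadrature output is taken as the code offset by 8 steps (90°), wrapped modulo 32.

```python
# Sketch: eq. (1) per segment, and 8 segments x 4 steps = 32 phases.
# ASSUMPTION: the 5-bit <phase> code is split as segment j = code // 4 and
# step k = code % 4; the paper does not specify this encoding.
N_STEPS = 4          # N in eq. (1): steps per pair of adjacent phases
REF_SPACING = 45.0   # degrees between adjacent PLL/VCO phases
N_CODES = 32         # 8 segments x 4 steps


def interpolate(psi, phi, n, N=N_STEPS):
    """Eq. (1): theta_n = (n/N)*phi + ((N-n)/N)*psi, with psi < phi."""
    if not 0 <= n <= N:
        raise ValueError("n must be in 0..N")
    return (n / N) * phi + ((N - n) / N) * psi


def decode_phase(code):
    """Hypothetical decoding of <phase> into (segment j, step k)."""
    if not 0 <= code < N_CODES:
        raise ValueError("code must fit in 5 bits")
    return code // N_STEPS, code % N_STEPS


def output_phase(code):
    """Output phase in degrees for one code, using segment j's pair of phases."""
    j, k = decode_phase(code)
    psi = j * REF_SPACING
    return interpolate(psi, psi + REF_SPACING, k)


def active_segments(code):
    """Segments to power for clk-I and clk-Q (Q is 8 codes = 90 deg after I)."""
    j_i, _ = decode_phase(code)
    j_q, _ = decode_phase((code + 8) % N_CODES)
    return j_i, j_q


if __name__ == "__main__":
    print([interpolate(0.0, 45.0, n) for n in range(N_STEPS + 1)])
    print(active_segments(30))
```

With this assumed encoding, consecutive codes advance the output by exactly one 11.25° step, and for any code only the two segments returned by active_segments would need to be powered, which mirrors the power-management scheme described above.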
Fig. 2. Burst mode CDR loop with multi-segment phase interpolator
Fig. 3. Architecture of phase interpolator

Each segment, which is effectively a 2-bit phase interpolator, generates one of four discrete phases between two adjacent reference phases with a discrete phase step of 11.25°. It is constructed from two groups of controllable CMOS inverters arranged in four identical fingers (x4) with a common output, as shown in Fig. 5. Each inverter group processes one of the adjacent phases, e.g. ψ and φ = ψ + 45°. TABLE I presents the four possible inverter combinations k, where k = 0, 1, 2, 3 corresponds to the phase steps of segment j, j = 0, 1, …, 7 is the segment number and φ = 0°, 45°, …, 315° are the discrete reference clock phases.

The capacitance of the interpolation node is charged/discharged by the current produced by the four-inverter combination. Intuitively, therefore, there are four possible delay combinations, and the delay time td.k for each phase step of a single segment is given by:

$$t_{d.k} = t_{d.0} + k\,\frac{t_{\varphi+45} - t_{\varphi}}{4} \qquad (2)$$

where tφ+45 and tφ are the times at which the reference clock phases Vφ+45 and Vφ, respectively, are applied to the segment, and td.0 is the offset delay time of the output buffers. It should be noted that (a) the pMOS/nMOS aspect ratio of each CMOS inverter can be chosen so that the charging and discharging currents are equal, (b) the output resistance of each inverter is very high compared with the resistance of the PH-MUX transmission gates, (c) the logic threshold voltage of the output inverters equals VDD/2, and (d) Ct = Cp + C is the total capacitance on the interpolation node, where Cp is the parasitic capacitance and C is the capacitive load from the capacitor tank.
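TABLE I can be read as a rule: for step k, four inverters are always enabled, 4 − k of them driven by the earlier reference phase ψ and k by the later phase ψ + 45°, so the later phase contributes a current weight of k/4. The sketch below encodes that reading (our assumption, checked only against the four rows of TABLE I) together with eq. (2); the helper names are hypothetical.

```python
# Sketch of the TABLE I selection pattern and eq. (2) for one segment.
# ASSUMPTION: the four rows of TABLE I follow the rule "k of the four
# fingers on the later phase"; helper names are hypothetical.
FINGERS = 4  # inverters per group (x4)


def sel_bits(k):
    """Return ([sel1..sel4], [sel5..sel8]) for step k (1 = inverter enabled)."""
    if not 0 <= k < FINGERS:
        raise ValueError("k must be 0, 1, 2 or 3")
    early = [1] * (FINGERS - k) + [0] * k   # inv-1..inv-4, driven by clk(psi)
    late = [1] * k + [0] * (FINGERS - k)    # inv-5..inv-8, driven by clk(psi+45)
    return early, late


def late_weight(k):
    """Fraction of the enabled inverters driven by the later phase."""
    early, late = sel_bits(k)
    return sum(late) / (sum(early) + sum(late))


def step_delay(k, t_phi, t_phi45, t_d0=0.0):
    """Eq. (2): t_d.k = t_d.0 + k * (t_phi45 - t_phi) / 4."""
    return t_d0 + k * (t_phi45 - t_phi) / FINGERS


if __name__ == "__main__":
    for k in range(FINGERS):
        print(k, sel_bits(k), late_weight(k))
```

Because the total number of enabled inverters stays at four, the summed drive current is the same for every k and only its split between the two phases changes, which is the intuition behind the equal delay increments of eq. (2).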
Since the time difference tφ+45 − tφ equals Tclk/8 (45° of a 360° cycle), eq. (2) gives the time delay step of a single segment as

$$t_{d.k} = t_{d.0} + k\,\frac{T_{clk}}{32} \qquad (3)$$

Converting eq. (3) into the phase domain shows that a phase step of 11.25° is achieved for each single segment. The in-phase output θj.k.I and the quadrature-phase output θj.k.Q of the entire PI are therefore given by

$$\theta_{j.k.I} = \theta_o + \left[\frac{jK}{2} + k\right] \cdot 11.25^{\circ} \qquad (4)$$

$$\theta_{j.k.Q} = \theta_o + \left[\frac{jK}{2} + k + 8\right] \cdot 11.25^{\circ} \qquad (5)$$

where θo corresponds to td.0. Note that the CONTROL UNIT selects both orthogonal outputs simultaneously.

Fig. 4. Phase mux (PH-MUX) topology
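The sketch below evaluates eqs. (3)-(5) numerically. Two assumptions are ours and not stated in the text: the interpolated clock is taken to run at half the data rate (Tclk = 2 / data rate), which we infer from the step values listed in TABLE II, and the segment offset jK/2 in eqs. (4)-(5) is evaluated as 4j, i.e. four 11.25° steps per 45° segment, which corresponds to reading K as the eight inverters of a segment. θo is set to zero and the Q phase is wrapped modulo 360°.

```python
# Sketch of eqs. (3)-(5). ASSUMPTIONS (not stated in the paper): the clock runs
# at half the data rate, and the segment offset jK/2 evaluates to 4*j steps.
STEP_DEG = 11.25          # phase step of eq. (3) in the phase domain
STEPS_PER_SEGMENT = 4     # 45 deg / 11.25 deg


def clock_period_ps(gear_gbps):
    """Assumed half-rate clock: T_clk = 2 / data rate (in ps)."""
    return 2000.0 / gear_gbps


def delay_step_ps(gear_gbps):
    """Eq. (3): delay step of one segment, T_clk / 32."""
    return clock_period_ps(gear_gbps) / 32.0


def iq_phases(j, k, theta_o=0.0):
    """Eqs. (4)-(5): I and Q output phases in degrees, wrapped to [0, 360)."""
    steps_i = STEPS_PER_SEGMENT * j + k
    theta_i = theta_o + steps_i * STEP_DEG
    theta_q = theta_o + (steps_i + 8) * STEP_DEG   # +8 steps = +90 deg
    return theta_i % 360.0, theta_q % 360.0


if __name__ == "__main__":
    for gear in (1.5, 3.0, 6.0):
        print(f"{gear} Gbps: step = {delay_step_ps(gear):.2f} ps")
    print(iq_phases(7, 3))   # (348.75, 78.75)
```

Under the half-rate assumption the computed steps (about 41.67, 20.83 and 10.42 ps) are close to the simulated 41.65, 20.81 and 10.40 ps reported in TABLE II.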
Fig. 5. Segment topology

TABLE I. SEGMENT'S OUTPUT PHASES VS. INVERTER COMBINATIONS

                 clk ψ          clk ψ+45°
inv              1  2  3  4     5  6  7  8     θ (°)              k
sel1-sel8 (a)    1  1  1  1     0  0  0  0     ψ + θoff + 0       0
                 1  1  1  0     1  0  0  0     ψ + θoff + 11.25   1
                 1  1  0  0     1  1  0  0     ψ + θoff + 22.50   2
                 1  0  0  0     1  1  1  0     ψ + θoff + 33.75   3

a. seli = 1 enables inv-i; seli = 0 disables inv-i.

All inverter outputs in a segment, as well as in the entire interpolator, are connected together to the interpolation node.

The voltage Vk on the charging/discharging node results from the current summation of the two groups of inverters driven by the phases φ and φ+45°, as described above. Vk ramps linearly over time when the capacitance is large enough for the node to act as an integrator. The capacitor C selected from the capacitor tank must therefore be large enough to create suitable integration conditions. On the other hand, a high capacitance may produce too small an amplitude Vp, which would cause the operation to fail. The total capacitance includes all the parasitic capacitances of the transistors and wiring. To make the interpolator capable of operating over the three gears, different capacitive loads are required. The proper value of the capacitor C can be found from

$$\frac{C}{T_{clk}} \approx \frac{K \cdot I_{inv}}{V_{DD}} \qquad (6)$$

where Tclk is the clock period, K is the number of inverters in each group and Iinv is the charging/discharging current of each inverter.
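Rearranged, eq. (6) reads C ≈ K·Iinv·Tclk/VDD. The sketch below does this arithmetic. The inverter current I_INV_UA is a hypothetical placeholder, not a value from this paper, so the printed capacitances are not meant to reproduce TABLE II; the half-rate clock period (Tclk = 2 / data rate) is likewise an assumption inferred from the TABLE II step values.

```python
# Sketch of eq. (6): C / T_clk ~ K * I_inv / V_DD, rearranged as
# C ~ K * I_inv * T_clk / V_DD. I_INV_UA below is a HYPOTHETICAL value.
K = 4             # inverters per group (x4 fingers)
VDD_V = 1.2       # supply voltage used in the paper
I_INV_UA = 300.0  # hypothetical per-inverter current in microamps


def load_cap_pf(t_clk_ps, k=K, i_inv_ua=I_INV_UA, vdd_v=VDD_V):
    """Eq. (6). Units: uA * ps / V = 1e-18 F = 1e-6 pF."""
    return k * i_inv_ua * t_clk_ps / vdd_v * 1e-6


def half_rate_period_ps(gear_gbps):
    """Assumed clock period for a half-rate clock: 2 / data rate, in ps."""
    return 2000.0 / gear_gbps


if __name__ == "__main__":
    for gear in (1.5, 3.0, 6.0):
        c = load_cap_pf(half_rate_period_ps(gear))
        print(f"{gear} Gbps: C ~ {c:.2f} pF")
```

Eq. (6) makes C proportional to Tclk, so doubling the data rate halves the required capacitance. The values in TABLE II (1.45, 0.8 and 0.25 pF) follow this trend only roughly, presumably because eq. (6) is an approximation and the parasitic capacitance Cp also loads the node.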
IV. SIMULATION RESULTS

The proposed phase interpolator has been designed to be compliant with the M-PHY standard specifications and simulated in a 65nm CMOS technology node. The supply voltage was 1.2V and the power consumption was about 2mW for all gears. Post-layout simulations were performed with the Spectre simulator in the Analog Design Environment of the Cadence software platform. The layouts of a single PI segment and of the whole PI are shown in Fig. 6 and Fig. 7, respectively. One segment measures 30.2μm x 11.8μm, and the whole PI measures 121.5μm x 67.8μm including the capacitors.

Fig. 8 shows the output waveforms of a single segment at 6Gbps, in which the adjacent clock phases are 0° and 45°. The same figure presents the three generated intermediate phases of 11.25°, 22.5° and 33.75°. The 32 phases covering the full 360° cycle are depicted in Fig. 9. The phase delays in degrees of all 32 generated phase steps as a function of the control code, for all gears (1.5/3/6Gbps), are presented in Fig. 10 and show high linearity.

Fig. 8. Output waveforms of a single segment with phase steps 0°, 11.25°, 22.5° and 33.75° for 6Gbps.

Fig. 6. Layout of a single segment
Fig. 7. Layout of the entire phase interpolator
Fig. 9. Output waveforms of the whole interpolator with phases from 11.25° to 360° in steps of 11.25° for 6Gbps.
Fig. 10. Phase steps vs. control code for 1.5/3/6Gbps
Fig. 11. Simulated phase error vs. control code for 1.5Gbps with (W) and without (WO) capacitor load.
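The linearity shown in Fig. 10 can be quantified with the per-step error that the paper defines as eq. (7), e = θ − 11.25°. The sketch below applies that metric to a list of output phases and converts a timing error into a percentage of the step; the sample phase list is made up for illustration, while the 0.5 ps error and 10.40 ps step are the 6Gbps figures reported with TABLE II.

```python
# Sketch of the per-step linearity metric of eq. (7), e = theta - 11.25 deg,
# applied to a list of output phases. The sample phases are HYPOTHETICAL.
IDEAL_STEP_DEG = 11.25


def step_errors(phases_deg):
    """Eq. (7) for each consecutive step; wraps the 360 -> 0 transition."""
    errors = []
    for a, b in zip(phases_deg, phases_deg[1:]):
        step = (b - a) % 360.0
        errors.append(step - IDEAL_STEP_DEG)
    return errors


def error_percent(err_ps, step_ps):
    """Express a timing error as a percentage of the nominal step."""
    return 100.0 * err_ps / step_ps


if __name__ == "__main__":
    sample = [0.0, 11.0, 22.75, 33.75, 45.0]   # made-up measured phases
    print(step_errors(sample))
    print(f"{error_percent(0.5, 10.40):.1f} %")
```

For the made-up sample the per-step errors are -0.25°, 0.5°, -0.25° and 0°, and a 0.5 ps error on a 10.40 ps step is about 4.8%, in line with the roughly 5% figure quoted for the worst case.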

Ideally, every step must produce a phase difference of 11.25°. The non-linearity of the PI can be measured by the error e (in degrees) of each phase step θ compared with its ideal value of 11.25°:

$$e = \theta - 11.25^{\circ} \qquad (7)$$

The initial phase error at 1.5Gbps without the capacitive load, which was about 4°, is depicted in Fig. 11. Among the gears, the worst uncompensated linearity error occurs at 1.5Gbps because, without the capacitive load, the ratio Cp/T(1.5Gbps) is much smaller than K·Iinv/VDD. The compensated phase error, which drops below 1° with a capacitive load of 1.45pF, is also illustrated in Fig. 11.

A good metric of the speed of a phase interpolator is the number of clock cycles needed for the duty cycle of the output waveforms to settle around 50%. The duty cycle during transitions between different phase steps is shown in Fig. 12. The solid line depicts the duty cycle simulated without layout parasitics, while the dashed line depicts the post-layout simulation results. The longest duty-cycle settling times occur for the first three phase transitions (0°-180°-270°-315°). The main reason is that these phase changes are large, so the output waveform needs to lag in order to resynchronize with the new phase. In addition, during these transitions different segments have to be enabled or disabled in order to limit the power consumption. The settling time improves as the phase changes decrease, e.g. for the phase transitions 315°-337.5°-348.75°. In the nominal case the duty cycle settles in about 2 clock periods, and PVT corner simulations show that it takes at most six clock periods to recover.

Fig. 12. Duty cycle over delay stepping for 6Gbps.

Simulations were performed over process [SS, FF], VDD [1.08V, 1.32V] and temperature [-10°C, 90°C] corners (PVT corners) in order to verify the circuit stability. The circuit performance is summarized in TABLE II. The power consumption is about 2mW from a 1.2V supply. The maximum non-linearity error over PVT corners and mismatches is about 0.5ps, which corresponds to a 5% error. The phase noise is less than -145dBc/Hz at 1MHz for all gears. The operation can easily be extended to lower gears (<1Gbps) by adjusting the capacitor load while keeping the same performance.

TABLE II.

Parameter                    Mean value (min:max)
VDD (V)                      1.2 (±10%)
Tech. node (nm)              65 [SS, FF]
T (°C)                       27 (-10:90)
Gear (Gbps)                  1.5              3                6
Power (mW)                   1.9 (1.08:3)     2 (1.1:3.3)      2.1 (1.15:3.6)
Step (ps)                    41.65            20.81            10.40
Step error (ps)              3 (0.6:4.5)      1.5 (1.1:1.7)    0.5 (0.3:0.65)
C (pF)                       1.45             0.8              0.25
Duty cycle (%)               50 (46:52)       49 (46:51)       49 (45:52)
Settling time (cycles) (b)   3 (2:5)          3 (2:6)          2 (1.5:4)

b. For worst-case PVT.

Compared with other topologies, the proposed PI shows the smallest phase error and the smallest power dissipation while operating from a 1.2V supply. Only the circuits in [14] and [18] operate at 1V. The topology in [23] requires only a 0.5V supply voltage, but it operates at low frequencies. A PI with a significantly higher operating frequency is proposed in [16], but it dissipates more than 50mW. The proposed PI is therefore accurate and fast, operates at high frequencies, and is suitable for high-speed deserializers.

V. CONCLUSION

A CMOS phase interpolator for high-speed multi-Gigabit serial transceivers compliant with the M-PHY standard is presented. It can operate in all three HS-gears of M-PHY at 1.5, 3 and 6Gbps by controlling the capacitive load on the interpolation node. It generates two programmable orthogonal output phases in 32 equally spaced discrete steps of 11.25° over 360°. The topology is designed in a 65nm CMOS process and achieves less than 5% phase step variation, a settling time of 2 clock cycles and a power consumption of 2mW from a 1.2V supply. The phase noise remains below -145dBc/Hz at 1MHz in all gears.

ACKNOWLEDGEMENTS
The present work was partially supported by the "Intra-university nano-electronics network" of the University of Patras.

REFERENCES

[1] K. Maruko, T. Sugioka, H. Hayashi, Z. Zhou, Y. Tsukuda, Y. Yagishita, H. Konishi, T. Ogata, H. Owa, T. Niki, K. Konda, M. Sato, H. Shiroshita, T. Ogura, T. Aoki, H. Kihara, S. Tanaka, "A 1.296-to-5.184Gb/s Transceiver with 2.4mW/(Gb/s) Burst-mode CDR using Dual-Edge Injection-Locked Oscillator," in (ISSCC) Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, 2010, pp.
[2] MIPI alliance, http://www.mipi.org/.
[3] J. Terada, K. Nishimura, S. Kimura, H. Katsurai, N. Yoshimoto, Y. Ohtomo, "A 10.3125Gb/s Burst-Mode CDR Circuit using a ΔΣ DAC," in (ISSCC) Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, 2008, pp. 226-227.
[4] J. D. Downie, A. B. Ruffin, J. Hurley, "Ultra-low-loss optical fiber enabling purely passive 10 Gb/s PON systems with 100 km length," Optics Express, vol. 17, no. 4, pp. 2392-2399, Feb. 2009.

[5] C. Liang, S.-I. Liu, "A 20/10/5/2.5Gb/s Power-scaling Burst-Mode CDR Circuit Using GVCO/Div2/DFF Tri-mode Cells," in Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, 2008, pp. 224-608.
[6] Abiri, R. Shivnaraine, A. Sheikholeslami, H. Tamura, M. Kibune, "A 1-to-6Gb/s Phase-Interpolator-Based Burst-Mode CDR in 65nm CMOS," in (ISSCC) Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, 2011, pp. 154-155.
[7] M. Horowitz, A. Chan, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung, W. Richardson, T. Thrush, Y. Fujii, "PLL design for a 500 MB/s interface," in (ISSCC) Solid-State Circuits Conference Digest of Technical Papers, Mountain View, CA, 1993, pp. 159-161.
[8] S. Sidiropoulos, M. Horowitz, "A Semidigital Dual Delay-Locked Loop," IEEE J. of Solid-State Circuits, vol. 32, no. 11, pp. 1683-1692, Nov. 1997.
[9] P. K. Hanumolu, G.-Y. Wei, U. K. Moon, "A Wide-Tracking Range Clock and Data Recovery Circuit," IEEE J. of Solid-State Circuits, vol. 43, no. 2, pp. 425-429, Feb. 2008.
[10] J. Lee, B. Kim, "A low-noise fast-lock phase-locked loop with adaptive bandwidth control," IEEE J. of Solid-State Circuits, vol. 35, no. 8, pp. 1137-1145, Aug. 2000.
[11] M. Benyahia, J. B. Moulard, F. Badets, A. Mestassi, T. Finateu, L. Vogt, F. Boissieres, "A digitally controlled 5GHz Analog Phase Interpolator with 10GHz LC PLL," in (DTIS) Proc. Int. Conf. Design & Technology of Integrated Systems in Nanoscale Era, Rabat, 2007, pp. 130-135.
[12] H. Takauchi, H. Tamura, S. Matsubara, M. Kibune, Y. Doi, T. Chiba, H. Anbutsu, H. Yamaguchi, T. Mori, M. Takatsu, K. Gotoh, T. Sakai, T. Yamamura, "A CMOS Multichannel 10-Gb/s Transceiver," IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2094-2100, Dec. 2003.
[13] R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama, H. Siedhoff, "A 10-Gb/s CMOS Clock and Data Recovery Circuit With an Analog Phase Interpolator," IEEE J. of Solid-State Circuits, vol. 40, no. 3, pp. 736-743, March 2005.
[14] S. Hu, C. Jia, K. Huang, C. Zhang, X. Zheng, Z. Wang, "A 10Gbps CDR based on phase interpolator for source synchronous receiver in 65nm CMOS," in (ISCAS) Proc. IEEE Int. Symp. Circuits Syst., Seoul, 2012, pp. 309-312, 20-23.
[15] H. Chung, D.-K. Jeong, W. Kim, "An 128-phase PLL using interpolation technique," J. Semiconductor Technology and Science, vol. 3, no. 4, pp. 181-186, Dec. 2003.
[16] H. Wang, A. Hajimiri, "A Wideband CMOS Linear Digital Phase Rotator," in (CICC) Custom Integrated Circuits Conference, San Jose, CA, 2007, pp. 671-674, 16-19.
[17] L. N. Li, W. P. Cai, "A Phase Interpolator CDR with Low-Voltage CML Circuits," J. of Electronic Science and Technology, vol. 10, no. 4, pp. 341-318, Dec. 2012.
[18] K. H. Cheng, P. K. Tseng, Y. L. Lo, "A Phase Interpolator For Sub-1V And High Frequency For Clock And Data Recovery," in (ICECS) Electronics, Circuits and Systems, Marrakech, 2007, pp. 363-366.
[19] A. Agrawal, A. Liu, P. K. Hanumolu, G. Y. Wei, "An 8×5 Gb/s Parallel Receiver With Collaborative Timing Recovery," J. of Solid-State Circuits, vol. 44, no. 11, pp. 3120-3130, Nov. 2009.
[20] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, M. A. Horowitz, "A portable digital DLL architecture for CMOS interface circuits," in (VLSIC) IEEE Int. Symp. VLSI Circuits, Digest of Technical Papers, Honolulu, HI, USA, 1998, pp. 214-215.
[21] A. Nicholson, J. Jenkins, A. van Schaik, T. J. Hamilton, T. Lehmann, "A 1.2V 2-bit phase interpolator for 65nm CMOS," in (ISCAS) Proc. IEEE Int. Symp. Circuits Syst., Seoul, 2012, pp. 2039-2042.
[22] Y. Jiang, A. Piovaccan, "A compact phase interpolator for 3.1256G Serdes application," in Proc. IEEE Southwest Symp. Mixed-Signal Design, 2003, pp. 249-252.
[23] S. Kumaki, A. H. Johari, T. Matsubara, I. Hayashi, H. Ishikuro, "A 0.5V 6-bit Scalable Phase Interpolator," in (APCCAS) Proc. IEEE Asia Pacific Conference on Circuits and Systems, Kuala Lumpur, 2010, pp. 1019-1022.
[24] S. Min, T. Copani, S. Kiaei, B. Bakkaloglu, "A 90-nm CMOS 5-GHz ring oscillator PLL with delay-discriminator-based active phase-noise cancellation," in (RFIC) Proc. IEEE Int. Symp. Radio Frequency Integrated Circuits, Montreal, Canada, 2012, pp. 173-176.
[25] S. L. J. Gierkink, "Low-Spur, Low-Phase-Noise Clock Multiplier Based on a Combination of PLL and Recirculating DLL With Dual-Pulse Ring Oscillator and Self-Correcting Charge Pump," IEEE J. of Solid-State Circuits, vol. 43, no. 12, pp. 967-2976, Dec. 2008.
