# DESIGN AND REALIZATION OF A 2.4Gbps – 3.2Gbps CLOCK AND DATA RECOVERY CIRCUIT

by

### ZAFER ÖZGÜR GÜRSOY

Submitted to the Graduate School of Engineering and Natural Sciences in partial fulfilment of the requirements for the degree of Master of Science

> Sabancı University January 2003

# DESIGN AND REALIZATION OF A 2.4Gbps – 3.2Gbps CLOCK AND DATA RECOVERY CIRCUIT

### APPROVED BY:

| Name                                | Role               |
|-------------------------------------|--------------------|
| Assoc. Prof. Dr. Yaşar GÜRBÜZ       | Thesis Supervisor  |
| Assistant Prof. Dr. Ayhan BOZKURT   | Thesis Co-Advisor  |
| Prof. Dr. Yusuf LEBLEBİCİ           | Thesis Co-Advisor  |
| Assistant Prof. Dr. Mehmet KESKİNÖZ |                    |
| PhD. Amer ALSHAWA                   |                    |

DATE OF APPROVAL: .....

© Zafer Özgür GÜRSOY 2003 All Rights Reserved

### ABSTRACT

This thesis presents the design, verification, system integration and physical realization of a high-speed monolithic phase-locked loop (PLL) based clock and data recovery (CDR) circuit. The CDR is realized as a two-loop structure consisting of a coarse loop and a fine loop, each of which is capable of processing the incoming low-speed reference clock and high-speed random data. At start-up, the coarse loop provides fast locking to the system frequency with the help of the reference clock. Once the voltage-controlled oscillator (VCO) clock comes close to the system frequency, the LOCK signal is generated and the coarse loop is turned off, while the fine loop is turned on. The fine loop tracks the phase of the generated clock with respect to the data and aligns the VCO clock so that its rising edge falls in the middle of the data eye.

The speed and symmetry of the sub-blocks in the fine loop are extremely important, since all asymmetric charging effects, skew and setup/hold problems in this loop translate into a static phase error at the clock output. The entire circuit architecture is built with a special low-voltage circuit design technique.

All analogue and digital sub-blocks of the CDR architecture presented in this work use differential signalling, which makes the design significantly more complex while ensuring more robust performance. Other important features of this CDR include small area, a single power supply, low power consumption, and the ability to operate at very high data rates, from 2.4 Gbps to 3.2 Gbps. The CDR architecture was realized in a conventional 0.13-µm digital CMOS technology (foundry: UMC), which ensures a lower overall cost and better portability for the design.

The CDR architecture presented in this work is capable of operating at sampling frequencies of up to 3.2 GHz while still achieving robust phase alignment. The entire circuit is designed for a single 1.2 V power supply. The overall power consumption is estimated at 18.6 mW at a 3.2 GHz sampling rate. The overall silicon area of the CDR, including its internal loop filter capacitors, is approximately 0.3 mm<sup>2</sup>.
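As a quick sanity check on these figures, the short Python sketch below derives the bit period and the energy per bit implied by the quoted numbers. It assumes that one bit is sampled per clock cycle (so a 3.2 GHz sampling clock corresponds to 3.2 Gbps) and that the 18.6 mW estimate covers the whole CDR at that rate; the function names are illustrative only and do not come from the thesis.

```python
def unit_interval_ps(bit_rate_gbps: float) -> float:
    """Bit period (unit interval) in picoseconds for a data rate in Gbps."""
    return 1000.0 / bit_rate_gbps


def energy_per_bit_pj(power_mw: float, bit_rate_gbps: float) -> float:
    """Energy per recovered bit in picojoules (mW divided by Gbps gives pJ/bit)."""
    return power_mw / bit_rate_gbps


if __name__ == "__main__":
    # Data-rate range quoted in the abstract (assumed: one bit per clock cycle).
    for rate in (2.4, 3.2):
        ui = unit_interval_ps(rate)
        print(f"{rate} Gbps: unit interval = {ui:.2f} ps, eye centre = {ui / 2:.2f} ps")
    # Power estimate quoted in the abstract, taken at the top data rate.
    print(f"Energy per bit at 3.2 Gbps: {energy_per_bit_pj(18.6, 3.2):.2f} pJ")
```

At 3.2 Gbps the unit interval is 312.5 ps, so placing the rising clock edge in the middle of the data eye corresponds to an offset of roughly 156 ps from the data transition.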

Other researchers have reported PLL-based clock and data recovery circuits with similar operating data rates, architectures and jitter performance. To the best of our knowledge, however, this is the first high-speed CDR designed in a 0.13-µm CMOS technology, and it compares favourably with the others in power consumption and area.

The CDR architecture presented in this thesis is intended as a state-of-the-art clock recovery circuit for high-speed applications such as optical communications or high-bandwidth serial wireline communication. It can be used either as a stand-alone single-chip unit or as an embedded intellectual property (IP) block integrated with other modules on chip.

### ÖZET

This thesis covers the design, verification, system-level integration and physical design of a high-speed, phase-locked loop based clock and data recovery (CDR) circuit. The CDR architecture consists of two separate loops, a coarse loop and a fine loop, each capable of processing the low-speed reference clock signal and the random data at its input. At start-up, the coarse loop locks to the data frequency with the help of the reference clock signal. As soon as the voltage-controlled oscillator (VCO) begins to generate a signal at a frequency close to the data rate, the lock control signal (LOCK) is produced. This control signal disables the coarse loop and enables the fine loop. The fine loop continuously tracks the clock signal so that the rising edge of the clock generated by the VCO falls in the middle of the data-bit eye opening.

Speed and symmetry are extremely important in the design of the sub-blocks that make up the fine loop. Asymmetric charging effects, timing skew and timing errors at the sampling instants that may arise while this loop operates appear at the circuit output as static phase error; for this reason, the entire circuit architecture was designed using special low-voltage circuit design techniques.

All analogue and digital sub-blocks of the CDR architecture treated in this work were designed using differential signalling so that the blocks operate more reliably, even though this makes the circuit design considerably harder. Other important features of this CDR include small chip area, a single power supply, low power consumption, and the ability to operate reliably at very high data transfer rates across the 2.4 Gbps to 3.2 Gbps range. The CDR architecture presented in this thesis was realized in a 0.13 µm digital CMOS technology that is widely used in industry (manufacturer: UMC), in order to achieve a lower overall cost and better design portability.

The designed circuit operates correctly at sampling frequencies of up to 3.2 GHz and meets the targeted phase alignment characteristics at this high sampling frequency. The entire circuit was designed to be powered from a single 1.2 V supply. At a 3.2 GHz sampling rate, the total power consumption is estimated at 18.6 mW. Including the integrated loop filter capacitors, the total silicon area of the CDR is approximately 0.3 mm<sup>2</sup>.

The CDR architecture designed in this thesis is intended for applications that demand very high speed, such as optical communication or high-bandwidth serial wireline communication. The circuit can be used as a stand-alone chip or as an IP (intellectual property) block that can be combined with other modules on a larger chip.

To my parents, and to my daisy.

### ACKNOWLEDGEMENTS

As Claude Bernard said, "Art is I, science is we", which captures the importance of teamwork in a scientific study. In that spirit, I would like to thank the following people and organisations who contributed to my thesis.

First, I would like to thank my thesis advisor Prof. Dr. Yusuf LEBLEBİCİ for his excellent support and assistance, even while he was at EPFL during the writing phase of my thesis. I was truly lucky to have the opportunity to work with an advisor like him.

I am also very lucky to have had the opportunity to work with my thesis supervisor Assoc. Prof. Dr. Yaşar GÜRBÜZ during the last stages of my study. I am thankful to him for his understanding and his helpful, professional approach.

I am grateful to ex-Alcatel Microelectronics (AME) and ST Microelectronics for funding my graduate studies at Sabancı University as part of an industry-university collaboration agreement.

I would also like to thank the current analogue design group members of ST Microelectronics for their technical suggestions as well as the great collegial working atmosphere. Thank you Alper, thank you Erdem, thank you Zeynep, thank you Turan and thank you Aslı.

Last, but by no means least, I am grateful to my family for their patience and encouragement during my education. Finally, I am most grateful to Emel for her endless understanding, patience and love, which made it possible for me to be successful in the end.

# TABLE OF CONTENTS

| Section  | Title                                                              |
|----------|--------------------------------------------------------------------|
| 1.       | INTRODUCTION                                                       |
| 1.1.     | Motivation                                                         |
| 1.2.     | Thesis Organization                                                |
| 2.       | CLOCK AND DATA RECOVERY STRUCTURES IN SERIAL COMMUNICATION SYSTEMS |
| 2.1.     | Introduction                                                       |
|          | Methods of Clock and Data Recovery                                 |
| 2.3.1.   | Disk Drive Clock Recovery                                          |
| 2.3.2.   | Generating High-Speed Digital Clocks On-Chip                       |
| 2.3.3.   | Over-sampled Data Conversion                                       |
| 2.3.4.   | Wireless Communication                                             |
| 2.4.     | Basic Clock and Data Recovery Architectures                        |
| 2.4.1.   | Properties of Non-Return to Zero (NRZ) Data                        |
| 2.4.2.   | Clock Recovery Architectures                                       |
| 3.       | PERFORMANCE MEASURES OF PLL BASED CLOCK AND DATA RECOVERY CIRCUITS |
| 3.1.     | Introduction                                                       |
| 3.2.     | Phase-Locked Loop Fundamentals                                     |
|          | Loop Bandwidth and Damping Factor                                  |
|          | Lock Range (Tracking Range)                                        |
| 3.6.     | Acquisition of Lock                                                |
| 3.6.1.   | Acquisition Time                                                   |
| 3.6.2.   | Aided Acquisition                                                  |
| 3.7.     | Timing Jitter Definitions                                          |
| 3.7.1.   | Deterministic Jitter                                               |
| 3.7.2.   | Random Jitter                                                      |
| 3.8.     | SONET Jitter Specifications                                        |
| 3.8.1.   | SONET Jitter Tolerance                                             |
| 3.8.2.   | SONET Jitter Transfer                                              |
| 3.8.3.   | SONET Jitter Generation                                            |
| 4.       | MODELING AND SIMULATING PLL BASED CLOCK RECOVERY CIRCUIT IN MATLAB |
| 4.1.     | Introduction                                                       |
| 4.2.     | Two-Loop Architecture                                              |
| 4.3.     | Determining Loop Dynamics                                          |
| 4.4.     | Simulink Modelling of Two-Loop Clock and Data Recovery             |
| 4.4.1.   | Coarse Loop Modelling                                              |
| 4.4.2.   | Fine Loop Modelling                                                |
| 4.4.3.   | Two-Loop Clock and Data Recovery Modelling                         |
| 5.       | ARCHITECTURE COMPONENTS: GENERAL TECHNOLOGY REVIEW & COARSE LOOP   |
| 5.1.     | Introduction                                                       |
| 5.2.     | General Considerations                                             |
| 5.2.1.   | Substrate Current Injection                                        |
|          | Differential vs. Single Ended Signalling                           |
|          | Technology and Transistors                                         |
| 5.2.5.   | Case Definitions                                                   |
| 5.3.     | Design of Coarse Loop Components                                   |
| 5.3.1.   | Design of Phase-Frequency Detector                                 |
| 5.3.2.   | Design of Differential Charge Pump                                 |
| 5.3.3.   | Design of Common-Mode Feedback (CMFB) Circuit                      |
| 5.3.4.   | Design of Divide-by-16 Circuit                                     |
| 5.3.5.   | Design of Lock Detector                                            |
| 6.       | ARCHITECTURE COMPONENTS: FINE LOOP & DIFFERENTIAL VCO              |
| 6.1.     | Introduction                                                       |
| 6.2.     | Design of Fine Loop Components                                     |
| 6.2.1.   | Design of Differential Phase Detector                              |
| 6.2.1.1. | Design of Differential Master-Slave Flip-Flop                      |
| 6.2.1.2. | Design of Delay Cell                                               |
| 6.2.2.   | Design of Differential Charge Pump                                 |
| 6.2.3.   | Design of Differential Loop Filter                                 |
| 6.3.     | Design of Differential Voltage Controlled Oscillator (VCO)         |
| 6.3.1.   | Ring Oscillator VCO                                                |
| 6.3.2.   | Construction of the Differential Ring Oscillator                   |
| 6.3.3.   | Design of Differential Delay Stage and Self-Biasing Circuit        |
| 6.3.4.   | Design of VCO Output Buffer                                        |
| 7.       | TOP LEVEL CONSTRUCTION OF THE CIRCUIT AND LAYOUT CONSIDERATIONS    |
| 7.1.     | Introduction                                                       |
| 7.2.     | Top-Level Construction of the Circuit                              |
| 7.3.     | Top-Level Simulations of the Circuit                               |
| 7.4.     | System Level Functionality of Clock and Data Recovery Circuit      |
| 7.5.     | Layout Considerations                                              |
| 7.5.1.   | Layer Sharing                                                      |
| 7.5.2.   | Reliability                                                        |
| 7.5.3.   | Symmetry and Placing                                               |
| 7.5.4.   | Bending on Data Paths                                              |
| 7.5.5.   | Shielding                                                          |
| 7.5.6.   | Dummy Components                                                   |
| 7.6.     | The Layout                                                         |
| 8.       | CONCLUSION                                                         |
| 8.1.     | Future Work                                                        |
| A.       | APPENDIX A: COMPLETE CIRCUIT SCHEMATICS                            |
| B.       | APPENDIX B: COMPLETE MASK LAYOUTS                                  |

## LIST OF FIGURES

| Figure 2.1. Typical fiber optic serial data transmission system                       | 5    |
|---------------------------------------------------------------------------------------|------|
| Figure 2.2. Independent test of jitter due to clock recovery function                 | 7    |
| Figure 2.3. Simplified block diagram of a digital receiver                            | 7    |
| Figure 2.4. Generic clock recovery architecture                                       | 8    |
| Figure 2.5 (a) NRZ data; (b) RZ data; (c) fastest NRZ data with $r_b = 1$ Gbps        | .11  |
| Figure 2.6. Spectrum of NRZ data                                                      | . 12 |
| Figure 2.7. Power spectral density of 622 Mbps data                                   | . 12 |
| Figure 2.8. Edge detection of NRZ data                                                | . 13 |
| Figure 2.9. Edge detection and sampling NRZ data                                      | . 14 |
| Figure 2.10. Phase locked clock recovery circuit.                                     | . 15 |
| Figure 2.11. Response of a three-state PFD to random data                             | . 15 |
| Figure 2.12. Over-sampling clock recovery using variable number of delay elements.    | 17   |
| Figure 2.13. Over-sampling clock recovery using a DLL delay adjusting circuit         | . 18 |
| Figure 3.1. Simplified block diagram of phase-locked loop                             | . 21 |
| Figure 3.2. Small signal AC model of PLL                                              | . 21 |
| Figure 3.3. Simple first-order low-pass filter                                        | . 23 |
| Figure 3.4. Low-pass filter with a higher order pole.                                 | . 23 |
| Figure 3.5. Under-damped response of PLL to a frequency step (a) $\zeta = 0.25$ , (b) | ζ=   |
| 0.707                                                                                 | . 26 |
| Figure 3.6. Variation of parameters during tracking                                   | . 27 |
| Figure 3.7. Gain reduction in PD and VCO                                              | . 28 |
| Figure 3.8. Aided acquisition with a frequency detector                               | . 33 |
| Figure 3.9. Pattern dependent jitter                                                  | . 34 |
| Figure 3.10. Noise on a signal results in random jitter                               | . 35 |
| Figure 3.11 Relationship between RMS noise and RMS random jitter                      | . 37 |
| Figure 3.12. Jitter tolerance curve for a 155Mbps application [9]                     | . 39 |
| Figure 3.13. SONET jitter tolerance curve mask                                        | . 40 |

| Figure 3.14. SONET jitter transfer function mask41                                      |
|-----------------------------------------------------------------------------------------|
| Figure 3.15. Jitter peaking at jitter transfer function                                 |
| Figure 4.1. Simplified block diagram of two-loop clock and data recovery circuit 45     |
| Figure 4.2. Bode diagram of loop filter                                                 |
| Figure 4.3. Third order fine loop, open loop Bode diagram                               |
| Figure 4.4. Third order fine loop, closed loop Bode diagram                             |
| Figure 4.5. Root locus of fine loop                                                     |
| Figure 4.6. Jitter tolerance curve of the clock and data recovery system                |
| Figure 4.7. Third order coarse loop, open loop Bode diagram                             |
| Figure 4.8. Third order coarse loop, closed loop Bode diagram                           |
| Figure 4.9. Root locus of the coarse loop                                               |
| Figure 4.10. Step response of the coarse loop                                           |
| Figure 4.11. <i>Simulink</i> model of the coarse loop                                   |
| Figure 4.12. <i>Simulink</i> model of frequency detector                                |
| Figure 4.13. VCO control voltage variation for coarse loop only while frequency         |
| locking at 3.2 GHz                                                                      |
| Figure 4.14. Reference clock (@ 200 MHz) and divided VCO clock signals with the eye     |
| diagram of VCO clock after frequency lock at 3.2 GHz                                    |
| Figure 4.15. Spectrum of the 3.2 GHz VCO clock after frequency lock                     |
| Figure 4.16. <i>Simulink</i> model of the fine loop                                     |
| Figure 4.17. <i>Simulink</i> model of phase detector                                    |
| Figure 4.18 VCO control voltage variation for fine loop only while phase locking at 3.2 |
| Gbps data                                                                               |
| Figure 4.19. VCO clock (@ 3.2 GHz) and input data signals with the eye diagram of       |
| VCO clock after phase lock at 3.2 Gbps data                                             |
| Figure 4.20. <i>Simulink</i> model of two-loop architecture                             |
| Figure 4.21. VCO control voltage variation and lock signal for top-level clock recovery |
| while recovering 3.2 Gbps data61                                                        |
| Figure 4.22. 3.2 Gbps data in and sampling VCO clock signals                            |
| Figure 4.23. Eye diagram of the VCO clock after phase alignment                         |
| Figure 4.24. Spectrum of the VCO clock after phase alignment at 3.2 Gbps63              |
| Figure 5.1. Examples of substrate current injection. (a) CMOS (b) SCL [11]66            |
| Figure 5.2. Channel modulation coefficient simulation setup                             |
| Figure 5.3. I-V curve of an NMOS with the change of W and L                             |

| Figure 5.4. I-V curve of an NMOS with the change of $V_{GS}$ (W=1.7 $\mu$ m, L=0.12 $\mu$ m) | . 70  |
|----------------------------------------------------------------------------------------------|-------|
| Figure 5.5. I-V curve of an NMOS with the change of $V_{GS}$ (W=4.6µm, L=0.12µm)             | . 72  |
| Figure 5.6. Two cases for phase detector to resolve                                          | .74   |
| Figure 5.7. Phase-frequency detector state diagram and ideal waveforms                       | .75   |
| Figure 5.8. PFD transfer characteristic                                                      | .75   |
| Figure 5.9. Block diagram of dead-zone free PFD                                              | .76   |
| Figure 5.10. <i>Cadence</i> schematic view of the designed PFD circuit                       | .77   |
| Figure 5.11. Cadence schematic view of PFD_zero circuit                                      | .78   |
| Figure 5.12. Spectre simulation result of PFD circuit at 200 MHz                             | . 79  |
| Figure 5.13. Simulated PFD transfer characteristic by Spectre                                | . 80  |
| Figure 5.14. PFD with charge pump                                                            | . 81  |
| Figure 5.15. Charge sharing in charge pump                                                   | . 83  |
| Figure 5.16. Differential CMOS charge pump (B. Razavi)                                       | . 84  |
| Figure 5.17. Differential charge pump used in the coarse loop                                | . 85  |
| Figure 5.18. Pump-down operation of differential charge pump                                 | . 87  |
| Figure 5.19. Differential charge operation during no UP or DN pulses                         | . 88  |
| Figure 5.20. Simulation result of M1 and M10 transistor drain currents during pu             | ımp   |
| down operation                                                                               | . 89  |
| Figure 5.21. Coarse loop differential control signal at the output of the different          | ntial |
| charge pump                                                                                  | . 89  |
| Figure 5.22. Common-mode feedback (CMFB) circuit                                             | .91   |
| Figure 5.23. Differential control signals with CMFB                                          | . 92  |
| Figure 5.24. Divide-by-16 circuit                                                            | . 93  |
| Figure 5.25. Simulation result of divide-by-16 circuit with an input clock frequency         | y of  |
| 3.2 GHz                                                                                      | . 94  |
| Figure 5.26. Conceptual block diagram of lock detector                                       | . 95  |
| Figure 5.27. Digital inverter with hystherisis                                               | . 96  |
| Figure 5.28. Transistor level schematic of lock detector                                     | . 97  |
| Figure 5.29. Transient simulation result of lock detector                                    | . 99  |
| Figure 6.1 Conceptual block diagram of Hogge phase detector                                  | 101   |
| Figure 6.2. Timing diagram of phase detector (clock is centred)                              | 102   |
| Figure 6.3. Timing diagram of phase detector (clock is advanced)                             | 103   |
| Figure 6.4. <i>Cadence</i> schematic view of the phase detector                              | 104   |

| Figure 6.5. (a) True single-phase clock (TSPC) flip-flop stage, (b) latch proposed in      |
|--------------------------------------------------------------------------------------------|
| [23], and (c) latch using source-coupled logic105                                          |
| Figure 6.6. Designed differential master-slave flip-flop                                   |
| Figure 6.7. Spectre simulation result of differential FF result with a centered 3.2 GHz    |
| clock                                                                                      |
| Figure 6.8. Transient simulation result with 8 ps setup time                               |
| Figure 6.9. Transient simulation result with 1 ps hold time                                |
| Figure 6.10. Differential flip-flop simulation result with 10 GHz clock                    |
| Figure 6.11. Schematic view of phase detector delay cell                                   |
| Figure 6.12. Functional core of the differential current mode XOR113                       |
| Figure 6.13. Schematic of fine loop differential charge pump 114                           |
| Figure 6.14. DC-sweep simulation result of fine loop differential charge pump 116          |
| Figure 6.15. Transient simulation result of charge pump outputs with loop filter 116       |
| Figure 6.16. $I_{CPOUT_N}$ and $I_{CPOUT_P}$ current variation while there is 156 ps phase |
| difference between clock and data117                                                       |
| Figure 6.17. a) Fine loop control circuitry transfer curve, b) zoomed transfer curve 118   |
| Figure 6.18. Ideal model for the RC loop filter119                                         |
| Figure 6.19. Loop filter implementation with NMOS devices                                  |
| Figure 6.20. Voltage dependency of MOS capacitance of loop filter                          |
| Figure 6.21. Single-stage inverter with a unity gain feedback                              |
| Figure 6.22. Two-stage inverters with a unity gain feedback                                |
| Figure 6.23. Three-stage inverters with two-poles and with a unity gain feedback 125       |
| Figure 6.24. Three-stage ring oscillator                                                   |
| Figure 6.25. (a) Differential ring oscillator with odd number of stages, (b) differential  |
| ring oscillator with even number of stages                                                 |
| Figure 6.26. (a) Single-ended ring oscillator buffer stage, (b) differential ring 127      |
| Figure 6.27. Current-starved ring oscillator buffer stages                                 |
| Figure 6.28. Delay control with capacitive tuning                                          |
| Figure 6.29. Delay control in differential buffer stages                                   |
| Figure 6.30. a) Interpolating delay stage, b) smallest delay, c) largest delay130          |
| Figure 6.31. Top-level <i>Cadence</i> schematic view of the differential VCO               |
| Figure 6.32. Transfer characteristic of the differential VCO                               |
| Figure 6.33. Implementation of delay interpolating in the differential VCO132              |
| Figure 6.34. Ring oscillator buffer stage                                                  |

| Figure 6.35. DC analyse result of differential buffer                              | 134     |
|------------------------------------------------------------------------------------|---------|
| Figure 6.36. Schematic view of the self-biasing circuit                            | 135     |
| Figure 6.37. Biasing opamp circuit                                                 | 136     |
| Figure 6.38. Differential output range of the self-biasing circuit                 | 137     |
| Figure 6.39. Output buffer chain of the VCO                                        | 139     |
| Figure 6.40. Transient response of the VCO output buffer                           | 139     |
| Figure 6.41. AC response of VCO output buffer                                      | 140     |
| Figure 7.1. Top-level <i>Cadence</i> schematic view of the clock and data recovery | 142     |
| Figure 7.2. Schematic view of power down circuit                                   | 144     |
| Figure 7.3. Coarse loop simulation results (a) UP and DN signals during lock       | (b)     |
| Differential control voltage variation with LOCK signal                            | 145     |
| Figure 7.4. Divided VCO clock and reference clock after frequency lock             | 145     |
| Figure 7.5. Two-loop simulation result at 3.2 Gbps data rate (a) Differential      | control |
| voltage and LOCK signal (b) Aligned data and clock signals                         | 146     |
| Figure 7.6. Two-loop simulation result at 2.5 Gbps data rate (a) Differential      | control |
| voltage and LOCK signal (b) Aligned data and clock signals                         | 147     |
| Figure 7.7. Supply current (power consumption) of the two-loop clock recovery      | 148     |
| Figure 7.8. Block diagram of the SERDES macro                                      | 150     |
| Figure 7.9. Application block diagram N-channel SERDES chip                        | 150     |
| Figure 7.10. Internal block diagram of SERDES receiver                             | 151     |
| Figure 7.11. Crossing of differential lines                                        | 154     |
| Figure 7.12. Two pair of differential lines crossing                               | 154     |
| Figure 7.13. The effect of shield ring and substrate contact technique on the noise path | 156 |
| Figure 7.14. Top-level layout view of CDR circuit                                  | 157     |
| Figure 7.15. Layout view of PFD                                                    | 158     |
| Figure 7.16. Layout view of coarse loop charge pump                                | 158     |
| Figure 7.17. Layout view of CMFB circuit                                           | 159     |
| Figure 7.18. Layout view of the lock detector                                      | 159     |
| Figure 7.19. Layout view of the fine loop phase detector                           | 160     |
| Figure 7.20. Layout view of the fine loop charge pump circuit                      | 161     |
| Figure 7.21. Layout view of the VCO                                                | 161     |
| Figure 8.1. The layout of the top-level CDR test-chip                              | 167     |
| Figure A.1. Schematic of the PFD                                                   | 168     |
| Figure A.2. Schematic of the PFD_zero circuit                                      |         |
| Figure A.3. Schematic of the coarse loop charge pump        |     |
| Figure A.4. Schematic of the lock detector                  |     |
| Figure A.5. Schematic of the CMFB                           |     |
| Figure A.6. Schematic of divide-by-16 circuit               |     |
| Figure A.7. Schematic of the loop filter                    |     |
| Figure A.8. Schematic of the differential flip-flop         |     |
| Figure A.9. Schematic of the phase detector delay component |     |
| Figure A.10. Schematic of current mode differential XOR     |     |
| Figure A.11. Schematic of phase detector                    |     |
| Figure A.12. Schematic of the fine loop charge pump         |     |
| Figure A.13. Schematic of the VCO top-level                 |     |
| Figure A.14. Schematic of the VCO delay cell                |     |
| Figure A.15. Schematic of the VCO delay buffer              |     |
| Figure A.16. Schematic of the VCO self-biasing circuit      | 176 |
| Figure A.17. Schematic of the biasing OPAMP                 |     |
| Figure A.18. Schematic of the VCO output amplifier          |     |
| Figure A.19. Schematic of the power down circuit            |     |
| Figure A.20. Schematic of the top-level CDR                 |     |
| Figure B.1. Mask layout of differential flip-flop           |     |
| Figure B.2. Mask Layout of the power down circuit           |     |
| Figure B.3. Mask layout of the differential XOR             |     |
| Figure B.4. Mask layout of the divide-by-16 circuit         |     |
| Figure B.5. Mask layout of the output buffer                |     |
| Figure B.6. Mask layout of the biasing resistor chain       |     |
| Figure B.7. Mask layout of the VCO delay buffer             |     |
| Figure B.8. Mask layout of the VCO self-biasing circuit     |     |
| Figure B.9. Mask layout of the VCO output amplifier         |     |
| Figure B.10. Mask layout of the VCO                         |     |

# LIST OF TABLES

| Table 3.1. SONET jitter tolerance curve mask table                                | 40    |
|-----------------------------------------------------------------------------------|-------|
| Table 3.2. SONET jitter transfer mask table                                       | 41    |
| Table 4.1. Numerical parameters for loop dynamics                                 | 46    |
| Table 4.2. Performance parameters obtained from fine loop MATLAB calculations.    | 51    |
| Table 4.3. Performance parameters from coarse loop MATLAB calculations            | 54    |
| Table 5.1. Mobility and oxide thickness values for transistors used in the design | 68    |
| Table 5.2. Corner case definitions                                                | 73    |
| Table 5.3. Device geometries of the differential charge pump                      | 85    |
| Table 5.4. Device geometries of CMFB circuit                                      | 91    |
| Table 5.5. Device geometries of the digital inverter with hysteresis              | 97    |
| Table 5.6. Device geometries of lock detector                                     | 98    |
| Table 6.1. Device geometries of differential master-slave flip-flop               | 107   |
| Table 6.2. Device geometries of differential XOR gate                             | 113   |
| Table 6.3. Device geometries of fine loop differential charge pump                | 114   |
| Table 6.4. Device geometries of differential buffer                               | 133   |
| Table 6.5. Device geometries of the opamp circuit                                 | 136   |
| Table 7.1. Truth table of power down circuit                                      | 143   |
| Table 7.2. Power supply and temperature specifications of CDR                     | 151   |
| Table 7.3. AC specifications of the CDR                                           | 152   |
| Table C.1. Performance comparison with reported high-speed CDR's                  | 187   |

### LIST OF SYMBOLS / ABBREVIATIONS

| A                  | Ampere                            |
|--------------------|-----------------------------------|
| f                  | femto                             |
| F                  | farad                             |
| G                  | Giga                              |
| g <sub>m</sub>     | Transconductance                  |
| Hz                 | Hertz                             |
| K                  | Kilo                              |
| K <sub>VCO</sub>   | VCO gain                          |
| K <sub>PD-CP</sub> | Phase detector & charge pump gain |
| L                  | Length of transistor              |
| m                  | milli                             |
| M                  | Mega                              |
| n                  | nano                              |
| p                  | pico                              |
| r <sub>o</sub>     | Output resistance                 |
| S                  | second                            |
| t <sub>ox</sub>    | Oxide thickness                   |
| μ                  | Micro                             |
| Uo                 | Mobility                          |
| V                  | Volt                              |
| W                  | Width of transistor               |
| mW                 | milliwatt                         |
| ζ                  | Damping factor                    |
| ω <sub>n</sub>     | Natural frequency                 |
| α <sub>b</sub>     | Body effect coefficient           |
| γ                  | Body effect constant              |
| λ                  | Channel length modulation         |
| $\lambda_b$         | Body effect coefficient   |
| $\phi_{\mathrm{F}}$ | Fermi potential           |
| Ω                   | Ohm                       |

| PLL      | Phase-Locked Loop                                           |
|----------|-------------------------------------------------------------|
| DLL      | Delay-Locked Loop                                           |
| CDR      | Clock and Data Recovery                                     |
| SERDES   | Serializer / Deserializer                                   |
| NRZ      | Non-return-to-zero                                          |
| PFD      | Phase-Frequency Detector                                    |
| PD       | Phase Detector                                              |
| CP       | Charge Pump                                                 |
| LF       | Loop Filter                                                 |
| SONET    | Synchronous Optical Network                                 |
| CMRR     | Common Mode Rejection Ratio                                 |
| PM       | Phase Margin                                                |
| ECL      | Emitter Coupled Logic                                       |
| ESD      | Electrostatic Discharge                                     |
| Gbps     | Giga bits per second                                        |
| IEEE     | The Institute of Electrical and Electronics Engineers, Inc. |
| I/O      | Input/output                                                |
| LVDS     | Low Voltage Differential Signalling                         |
| MLF      | Multi Lead Frame                                            |
| PCB      | Printed Circuit Board                                       |
| PSRR     | Power Supply Rejection Ratio                                |
| $V_{PP}$ | Volts peak-to-peak                                          |

# 1. INTRODUCTION

### 1.1. Motivation

This thesis deals with understanding and designing the critical components of a clock and data recovery circuit intended for use in a serializer / deserializer (SERDES) structure. The extremely complicated nature of such a system required a focused study, which does not address many of the issues that arise in a comparable commercial product.

The performance of many digital systems today is limited by the interconnection bandwidth between chips, boards, and cabinets. Although the processing performance of a single chip has increased dramatically since the inception of the integrated circuit technology, the communication bandwidth between chips has not enjoyed as much benefit. Most CMOS chips, when communicating off-chip, drive un-terminated lines with full-swing CMOS drivers and use CMOS gates as receivers. Such full-swing CMOS interconnect must ring-up the line, and hence has a bandwidth that is limited by the length of the line rather than the performance of the semiconductor technology. Thus, as VLSI technology scales, the pin bandwidth does not improve with the technology, but rather remains limited by board and cable geometry, making off-chip bandwidth an even more critical bottleneck.

Serial data transmission sends binary bits of information as a series of optical or electrical pulses. However, the transmission channel (coax, radio, fiber) generally distorts the signal in various ways. The clock and data must be recovered from this distorted signal at the receiver side, and the clock must be aligned with the recovered data. Many methods for achieving this functionality have been proposed and implemented since the 1970s. However, the most appropriate approach for clock and data recovery in the gigabit range is a phase-locked loop (PLL) based architecture. Moreover, a two-loop architecture should be preferred, since frequency acquisition is a must at gigabit-range operating frequencies.

The two-loop clock recovery architecture is adopted in this design. This topology helps the overall timing budget by reducing the receiver clock jitter and dithering.

The main purpose of this thesis is to implement a pure CMOS, high-speed, power-efficient clock and data recovery circuit that meets OC-48 jitter specifications. Moreover, the architecture of the circuit is intended to be modular and to light the way towards clock recovery systems with higher data rates, such as 10 Gbps.

### 1.2. Thesis Organization

The goal of this thesis is to review the theory, design and analysis of PLL based clock and data recovery circuits and to complete a detailed design of a 2.4-3.2 Gbps CMOS clock and data recovery circuit.

Chapter 2 gives a brief overview of the role of clock and data recovery circuits in serial communications. This chapter also discusses different clock recovery methods and architectures. The nature and properties of non-return-to-zero (NRZ) data are also covered in Chapter 2.

Chapter 3 defines the performance parameters useful for PLL based clock and data recovery circuits and their relationship to one another. An in-depth discussion is presented on the impact of the input jitter on system performance. SONET jitter specification definitions are also given in Chapter 3.

Chapter 4 covers MATLAB and *Simulink* modelling of the two-loop architecture. Basic functional description of the two-loop architecture in clock recovery systems is also given in this chapter. Loop dynamics of both coarse and fine loop components are determined and s-domain model of the circuit is generated according to determined loop dynamics. *Simulink* model of the CDR is formed and corresponding simulation results are discussed.

Chapter 5 covers the circuit design of all sub-blocks of the coarse loop. The chapter begins with general considerations about circuit design and the process. The coarse loop control circuitry components are then presented. Transistor-level circuit design is discussed in detail with the corresponding *Spectre* simulation results.

In Chapter 6, the fine loop components and the differential VCO architecture are covered. Special design techniques used in the fine loop and the simulation results are also presented in this chapter. The last part of Chapter 6 deals with differential VCO design issues. The design of each sub-block of the VCO is discussed in detail.

Chapter 7 covers top-level construction of the CDR, as well as the top-level simulation results. Special layout techniques used in high-speed circuit design are presented. Layout views of the main blocks are given with their detailed descriptions.

Chapter 8 gives a brief summary of the work that has been performed and comments on future work.

# 2. CLOCK AND DATA RECOVERY STRUCTURES IN SERIAL COMMUNICATION SYSTEMS

### 2.1. Introduction

The rapid increase of real-time audio and video transport over the Internet has led to global demand for high-speed serial data communication networks. To accommodate the required bandwidth, an increasing number of wide-area networks (WANs) and local-area networks (LANs) are converting the transmission medium from copper wire to fiber. This trend motivates research on low-cost, low-power and high-speed integrated receivers. A critical task in such receivers is the recovery of the clock embedded in the non-return-to-zero (NRZ) serial data stream. The recovered clock both removes the jitter and distortion in the data and retimes it for further digital processing.

The continuing scaling of CMOS process technologies enables a higher degree of integration, reducing cost. This fact, combined with the ever-shrinking time to market, indicates that designs based on flexible modules and macro cells have great advantages. In clock recovery applications flexibility means, for example, programmable bit rates requiring a phase-locked loop (PLL) with robust operation over a wide frequency range. Increased integration also implies that the analogue portions of the PLL should have good power supply rejection to achieve low jitter in the presence of large power supply noise caused by the digital circuitry.

Another trend is low-power design using reduced $V_{DD}$. This reduces the headroom available for analogue design, causing integration problems for mixed-mode circuits. Furthermore, in applications where power consumption is a more critical design goal than computing power, $V_T$ is not scaled as aggressively as $V_{DD}$ to avoid leakage current in OFF devices, which worsens the headroom problem. Before addressing the special design techniques and system overviews devised to solve the problems mentioned above, it is essential to understand the role of clock recovery circuits in communication systems. System-level clock recovery architectures are also discussed in this chapter.

### 2.2. Clock and Data Recovery in Serial Data Transmission

High-speed serial digital data communication networks and communication standards are finding increased application in mainstream optical telecommunications. One example is the AT&T Synchronous Optical Network (SONET) standard; another is the emerging Asynchronous Transfer Mode (ATM) protocol. This kind of system is shown conceptually in Figure 2.1. Increasing demand on serial communication systems creates a need for small and easy-to-use fiber optic receivers, key elements of which are the recovery of the clock signal embedded in the non-return-to-zero (NRZ) serial data stream and re-establishing the synchronous timing of the data using the recovered clock as reference.



Figure 2.1. Typical fiber optic serial data transmission system

To reduce interconnection hardware, only the data is transmitted over a single fiber link. At the receiving end of the link, the optical signal is converted to an analogue voltage waveform by a transconductance amplifier. The function of the clock recovery circuit is to process the analogue input voltage  $V_{in}$  and generate the corresponding bit clock RCLK. This recovered clock signal is used as the clock input to a D flip-flop, which samples  $V_{in}$  to develop the output serial data stream.

For this application, the measurement goal is to determine how well the clock recovery function can be performed. The timing diagram in Figure 2.1 shows the ideal case when clock recovery is performed perfectly: There is no phase error (jitter) in the recovered clock, and RCLK samples  $V_{in}$  at the exact centre of the bit period. This results in the minimum achievable bit error rate (BER). Any deviation of RCLK from the ideal will increase BER.

Increased BER is not the only negative effect of jitter in serial data communication systems. In a repeater system, where the recovered clock is also used as a transmit clock for a subsequent data link, phase jitter reduces the number of links that can be cascaded before jitter becomes unacceptably large.

In evaluating the performance of a data link, the end user must be concerned with many other possible influences on BER. Among other factors that can degrade system BER in a fiber optic link are power loss and dispersion in the optical fiber, inadequate optical power input at the transmit end, and noisy optical-to-electronic conversion at the receive end.

To assess its contribution to BER, the clock recovery block can be tested independently, as shown in Figure 2.2. The input is an ideal data waveform; the recovered clock is then compared to the transmit clock using a communications signal analyzer. If there were no jitter, the phase difference between the clocks would be constant (due only to static phase and propagation delay differences). In the presence of jitter, there is a distribution of phase differences. The standard deviation of this distribution is the end user's figure-of-merit for characterizing the jitter performance of the clock recovery block.



Figure 2.2. Independent test of jitter due to clock recovery function
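The figure-of-merit described above can be illustrated numerically. The following is a minimal sketch, assuming that the analyzer delivers paired edge timestamps of the transmit and recovered clocks; the function name `rms_jitter` and the sample values are hypothetical and are not measurements from this work.

```python
import statistics

def rms_jitter(tx_edges, rx_edges):
    """Return (static_offset, rms_jitter) of recovered-clock edges
    relative to transmit-clock edges, in the same unit as the inputs."""
    if len(tx_edges) != len(rx_edges):
        raise ValueError("edge lists must have equal length")
    diffs = [rx - tx for tx, rx in zip(tx_edges, rx_edges)]
    static_offset = statistics.mean(diffs)   # constant phase / propagation delay
    jitter = statistics.pstdev(diffs)        # spread around that constant
    return static_offset, jitter

# Hypothetical edge times in picoseconds (illustrative values only).
tx = [0.0, 400.0, 800.0, 1200.0]
rx = [52.0, 448.0, 852.0, 1248.0]
offset, jitter = rms_jitter(tx, rx)
```

With these illustrative numbers the static offset is 50 ps and the RMS jitter is 2 ps; a jitter-free recovered clock would give a nonzero offset but zero standard deviation.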

### 2.3. Methods of Clock and Data Recovery

In order to regenerate the binary data at the receiving end of the digital transmission system with the fewest bit errors, the received data must be sampled at the optimum instants of time. Since it is usually impractical to transmit the required sampling clock signal separately from the data, timing information is generally derived from the incoming data itself. The extraction of the clock signal from incoming data is called clock recovery, and its general role in digital receivers is illustrated in Figure 2.3.



Figure 2.3. Simplified block diagram of a digital receiver

One method of recovering the bit clock is to apply the nonlinearly processed data waveform to a resonant circuit such as a surface acoustic wave (SAW) filter. Nonlinear processing is required since a non-return-to-zero (NRZ) data waveform has a spectral null at the bit frequency. The disadvantage of this approach is that SAW filters cannot be integrated and are expensive to fabricate.

An alternative and more reliable approach for generating the recovered clock is to use a phase-locked loop (PLL) as shown in Figure 2.4. This has the advantage of being integrable, and thus relatively inexpensive. This thesis will address design techniques for achieving low jitter when a PLL is used for the clock recovery function.



Figure 2.4. Generic clock recovery architecture

Figure 2.4 is a simplified block diagram of a PLL being used for clock recovery. The voltage-controlled oscillator (VCO) generates the recovered clock RCLK. The phase detector compares transitions of RCLK to transitions of  $V_{in}$ , and generates an error signal proportional to the phase difference. The error signal is processed by the loop filter and applied to the VCO to drive the phase difference to zero. Ideally there is no phase error, and RCLK samples  $V_{in}$  at the exact centre of the bit period, giving the minimum bit error rate.

However, due to the non-ideal effects of the clock recovery components, the PLL itself can contribute jitter. Therefore, extra attention must be paid during the design steps so that the system has the minimum possible jitter at its output and thus a better BER performance. Although this work was done with serial data transmission in mind, there are several other applications requiring low jitter performance from PLLs that perform a clock recovery function.

#### 2.3.1. Disk Drive Clock Recovery

Data is usually stored on magnetic media with no reference track to indicate bit boundaries. Therefore, when data is read from the magnetic medium, there is a need to recover a clock signal from the data to determine the bit boundaries. Low jitter is necessary since any increase in jitter increases read errors.

#### 2.3.2. Generating High-Speed Digital Clocks On-Chip

As digital processor and memory chips become capable of operating at clock rates exceeding 100 MHz, the problem of distributing such a high-speed clock throughout a system becomes more difficult. One approach to solving this problem is to distribute a lower frequency clock, and multiply this clock to the higher frequency with an on-chip PLL. Low jitter is necessary since any increase in jitter reduces timing margin for digital signals that rely on the clock.

#### 2.3.3. Over-sampled Data Conversion

A PLL can be used to generate the high-speed clock required for delta-sigma A/D and D/A conversion in digital audio applications. Low jitter is necessary since phase noise on the clock can be aliased into the audio band to produce audible, objectionable artefacts in the reconstructed analogue waveform.

#### 2.3.4. Wireless Communication

A PLL can be used to integrate the local oscillator (LO) function required for signal modulation and demodulation in radio frequency (RF) communication ICs. In this case, frequency-domain performance is important since phase noise on the LO will translate into noise in the signal band after demodulation.

### 2.4. Basic Clock and Data Recovery Architectures

#### 2.4.1. Properties of Non-Return to Zero (NRZ) Data

When the incoming data signal has spectral energy at the clock frequency, a synchronous clock can be obtained simply by passing the incoming data through a band-pass filter, often realized as an LC tank or surface acoustic wave (SAW) device, tuned to the nominal clock frequency. Because of the bandwidth restrictions, however, in most signalling formats the incoming signal has no spectral energy at the clock frequency making it necessary to use the clock recovery process.

Binary data is commonly transmitted in the non-return-to-zero (NRZ) format. As shown in Figure 2.5, in this format each bit has a duration of $T_b$ (the bit period), is equally likely to be ZERO or ONE, and is statistically independent of other bits. The quantity $r_b = 1 / T_b$ is called the "bit rate" and is measured in bit/s. The term "non-return-to-zero" distinguishes this data format from the "return-to-zero" (RZ) format, in which the signal returns to zero between consecutive bits (Figure 2.5). Since, for a given bit rate, RZ data contains more transitions than NRZ data, the latter is preferable where channel or circuit bandwidth is costly.
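The difference in transition density between the two formats can be made concrete with a short sketch. It assumes that each bit period is split into two half-bit slots and that an RZ ONE is high only during the first slot; the helper names and the bit pattern are illustrative.

```python
def nrz_halfbits(bits):
    """NRZ: the level is held for the whole bit period (two half-bit slots)."""
    return [b for b in bits for _ in range(2)]

def rz_halfbits(bits):
    """RZ: a ONE is high for the first half of the bit, then returns to zero."""
    out = []
    for b in bits:
        out.extend([b, 0])
    return out

def count_transitions(levels):
    """Count level changes between consecutive half-bit slots."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a != b)

bits = [1, 1, 0, 1, 0, 0, 1, 1]   # arbitrary example pattern
nrz_t = count_transitions(nrz_halfbits(bits))
rz_t = count_transitions(rz_halfbits(bits))
```

For this pattern the NRZ waveform has 4 transitions while the RZ waveform has 9, which is why RZ signalling demands more bandwidth at the same bit rate.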

NRZ data has two attributes that make the task of clock recovery difficult. First, the data may exhibit long sequences of consecutive ONEs or ZEROs, requiring the clock recovery circuit to remember the bit rate during such an interval. This means that, in the absence of data transitions, the clock recovery circuit should not only continue to produce the clock, but also exhibit negligible drift in the clock frequency.

Second, the spectrum of NRZ data has nulls at frequencies that are integer multiples of the bit rate. For example, if the data rate is 1 Gbps, the spectrum has no energy at 1 GHz. The fastest waveform for a 1 Gbps data stream is shown in Figure 2.5; it is a 500 MHz square wave, with all the even-order harmonics absent. From another point of view, if an NRZ sequence with a rate $r_b$ is multiplied by $A \sin(2\pi m r_b t)$, the result has a zero average for all integers $m$, indicating that the waveform contains no frequency components at $m \times r_b$.



Figure 2.5. (a) NRZ data; (b) RZ data; (c) fastest NRZ data with $r_b = 1$ Gbps
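The zero-average argument can be checked numerically. The sketch below assumes bipolar NRZ levels (+1/-1), time normalized to the bit period, and midpoint sampling; the function name, the seed and the sample counts are illustrative choices rather than part of the original analysis.

```python
import math
import random

def correlate_with_tone(bits, cycles_per_bit, samples_per_bit=64):
    """Average of NRZ(t) * sin(2*pi*f*t) with f = cycles_per_bit * r_b.
    Time is measured in bit periods, so T_b = 1 and r_b = 1."""
    total = 0.0
    for n, bit in enumerate(bits):
        level = 1.0 if bit else -1.0          # bipolar NRZ levels
        for k in range(samples_per_bit):
            t = n + (k + 0.5) / samples_per_bit
            total += level * math.sin(2 * math.pi * cycles_per_bit * t)
    return total / (len(bits) * samples_per_bit)

random.seed(1)                                 # fixed seed, repeatable pattern
random_bits = [random.randint(0, 1) for _ in range(200)]
at_bit_rate = [correlate_with_tone(random_bits, m) for m in (1, 2, 3)]

alternating = [1, 0] * 100                     # fastest NRZ pattern
at_half_rate = correlate_with_tone(alternating, 0.5)
```

The averages at $m = 1, 2, 3$ vanish to within rounding error for any bit pattern, because the tone completes an integer number of cycles inside every bit. The alternating pattern, by contrast, correlates strongly with a tone at half the bit rate (the average approaches $2/\pi$), in line with the 500 MHz square wave mentioned above.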

It is also helpful to know the shape of the NRZ data spectrum. Since the autocorrelation function of a random binary sequence is:

$$R_{x}(t) = \begin{cases} 1 - \dfrac{|t|}{T_{b}}, & |t| < T_{b} \\ 0, & |t| \ge T_{b} \end{cases}$$
(2.1)

The power spectral density equals:

$$P_{x}(\omega) = T_{b} \left[ \frac{\sin(\omega T_{b}/2)}{\omega T_{b}/2} \right]^{2}$$
(2.2)
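As a check on the step from (2.1) to (2.2), the power spectral density can be obtained as the Fourier transform of the triangular autocorrelation. The following worked derivation is a sketch that uses only the symbols defined above, with $\omega$ denoting angular frequency:

$$P_{x}(\omega) = \int_{-T_{b}}^{T_{b}} \left(1 - \frac{|t|}{T_{b}}\right) e^{-j\omega t}\,dt = 2\int_{0}^{T_{b}} \left(1 - \frac{t}{T_{b}}\right)\cos(\omega t)\,dt = \frac{2\left[1 - \cos(\omega T_{b})\right]}{\omega^{2} T_{b}}$$

$$P_{x}(\omega) = \frac{4\sin^{2}(\omega T_{b}/2)}{\omega^{2} T_{b}} = T_{b}\left[\frac{\sin(\omega T_{b}/2)}{\omega T_{b}/2}\right]^{2}$$

The numerator is zero whenever $\omega T_{b}/2$ is a nonzero multiple of $\pi$, that is, at every nonzero integer multiple of the bit rate.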

The power spectral density of NRZ data is plotted in Figure 2.6; this function vanishes at $\omega = 2\pi m / T_b$ for every nonzero integer $m$. In contrast, RZ data has finite power at such frequencies.

Due to the lack of a spectral component at the bit rate of NRZ format, a clock recovery circuit may lock to spurious signals or simply not lock at all. Thus, NRZ data usually undergoes a non-linear operation at the front end of the circuit so as to create a frequency component at  $r_b$ . A common approach is to detect each transition and generate a corresponding pulse (edge detection).



Figure 2.6. Spectrum of NRZ data

Another example of the power spectral density of NRZ data is given in Figure 2.7, which shows 622 Mbps data and the corresponding clock signal captured on an oscilloscope, together with the spectra of those signals. Note that the spectrum of the clock has a peak at 622 MHz, while the spectrum of the 622 Mbps data vanishes at 622 MHz.



Figure 2.7. Power spectral density of 622 Mbps data

#### 2.4.2. Clock Recovery Architectures

As illustrated in Figure 2.8(a), edge detection requires sensing both positive and negative data transitions. In Figure 2.8(b), an XOR gate with delayed input performs this operation, whereas in Figure 2.8(c), a differentiator produces impulses corresponding to each transition, and a squaring circuit or a full wave rectifier converts the negative impulses to positive ones.



Figure 2.8. Edge detection of NRZ data
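The XOR-with-delay scheme can be sketched behaviourally on a sampled waveform. This is only an illustration under assumed values: the waveform is ideal, 8 samples represent one bit period, the delay is half a bit, and `xor_edge_detect` is a hypothetical helper name.

```python
def xor_edge_detect(levels, delay):
    """XOR each sample with the sample `delay` positions earlier.
    The first `delay` outputs compare against the initial level."""
    out = []
    for i, x in enumerate(levels):
        prev = levels[i - delay] if i >= delay else levels[0]
        out.append(x ^ prev)
    return out

samples_per_bit = 8
bits = [0, 1, 1, 0, 1, 0, 0, 0, 1]
nrz = [b for b in bits for _ in range(samples_per_bit)]
pulses = xor_edge_detect(nrz, delay=samples_per_bit // 2)

# Count the pulses by counting their rising edges.
pulse_count = sum(1 for a, b in zip(pulses, pulses[1:]) if a == 0 and b == 1)
```

The example pattern contains five data transitions, and the output carries five pulses, each as wide as the delay, regardless of whether the original transition was rising or falling.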

A third method of edge detection employs a flip-flop operating on both rising and falling edges. In a phase-locked clock recovery circuit, the edge-detected data is multiplied by the output of the VCO; in effect, the data transition impulses sample points on the VCO output. This process can also be performed using a master-slave flip-flop consisting of two D-type latches. The data pulses drive the clock input of the flip-flop while the VCO output is sensed by the D input (Figure 2.9 (a)). Since in this structure the VCO output is sampled on either the rising or the falling edges of the data, the circuit can be modified such that both latches sample the VCO output, but on opposite transitions of the data. As shown in Figure 2.9 (b), the resulting circuit samples the VCO output on every data transition, and therefore this double-edge-triggered flip-flop can perform the edge detection process by itself.



Figure 2.9. Edge detection and sampling NRZ data

From the above observations, it can be noted that clock recovery consists of two basic functions: 1) edge detection; 2) generation of a periodic output that settles to the input data rate but exhibits negligible drift when some data transitions are missing. Illustrated in Figure 2.10 is a conceptual realization of these functions, where a high-Q oscillator is synchronized with the input transitions. The synchronization can be achieved by the phase locking technique.

Figure 2.10 shows how a simple PLL can be used along with edge detection to perform clock recovery. If the input data is periodic, for example an alternating ONE-ZERO pattern with a fundamental frequency of $1 / (2T_b)$, the edge detector simply doubles the frequency, allowing the VCO to lock to $1 / T_b$. If some transitions on the data input are absent, the output of the multiplier is zero and the voltage stored in the low-pass filter (LPF) decays, thereby making the VCO frequency drift. To minimize this effect, the time constant of the LPF must be sufficiently larger than the maximum allowable interval between consecutive transitions, which results in a small loop bandwidth and, hence, a narrow capture range of the PLL.



Figure 2.10. Phase locked clock recovery circuit.
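The trade-off between the LPF time constant and the longest run without transitions can be estimated with a simple model. The sketch below assumes a first-order exponential decay of the stored control voltage; the run length of 72 bits and the 1 % droop limit are assumed example values, not specifications of this design.

```python
import math

def control_voltage_droop(run_length_bits, bit_period, time_constant):
    """Fraction of the stored control voltage lost during a run of
    identical bits, assuming a first-order exponential decay."""
    return 1.0 - math.exp(-run_length_bits * bit_period / time_constant)

def min_time_constant(run_length_bits, bit_period, max_droop):
    """Smallest time constant that keeps the droop below max_droop."""
    return -run_length_bits * bit_period / math.log(1.0 - max_droop)

T_b = 1 / 2.5e9          # bit period at 2.5 Gbps, in seconds
run = 72                 # assumed worst-case run of identical bits
tau = min_time_constant(run, T_b, max_droop=0.01)
droop = control_voltage_droop(run, T_b, tau)
```

Under these assumptions the time constant has to be roughly one hundred times the 28.8 ns transition-free interval, which illustrates why tolerance to missing transitions pushes the loop towards a narrow bandwidth.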

It follows from the above discussion that a PLL used for clock recovery must also employ frequency detection to ensure locking to the input data despite process and temperature variations. This may suggest replacing the multiplier with a three-state phase frequency detector (PFD). However, this circuit produces an incorrect output if either of its input signals exhibits missing transitions. As depicted in Figure 2.11, in the absence of transitions on the main input, the PFD interprets the VCO frequency to be higher than the input frequency, driving the control voltage in such a direction as to correct the apparent difference. This occurs even if the VCO frequency is initially equal to the input data rate. Thus, the choice of the PLL architecture and of the phase and frequency detectors for random binary data requires careful examination of their response when some transitions are absent.



Figure 2.11. Response of a three-state PFD to random data
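The failure mechanism can be reproduced with a behavioural model of the three-state PFD. The sketch below is an idealized state machine, not the transistor-level circuit of this work: time is in arbitrary integer units, and when both inputs have an edge at the same instant the reference edge is assumed to be processed first.

```python
def pfd_net_output(ref_edges, vco_edges, t_end):
    """Integrate the three-state PFD output (+1 = UP, -1 = DN, 0 = idle)
    from time 0 to t_end. Coincident edges are processed reference first."""
    events = sorted([(t, 0) for t in ref_edges] + [(t, 1) for t in vco_edges])
    state, last_t, net = 0, 0, 0
    for t, kind in events:
        net += state * (t - last_t)
        last_t = t
        if kind == 0:
            state = min(state + 1, 1)    # reference edge: towards UP
        else:
            state = max(state - 1, -1)   # VCO edge: towards DN
    net += state * (t_end - last_t)
    return net

period = 10
vco = list(range(0, 100, period))               # one VCO edge per period
periodic_ref = list(vco)                        # same frequency and phase
sparse_ref = [t for t in vco if t not in (30, 40, 50)]   # three edges missing

net_periodic = pfd_net_output(periodic_ref, vco, t_end=100)
net_sparse = pfd_net_output(sparse_ref, vco, t_end=100)
```

With a periodic input of equal frequency and phase the net output is zero. With three missing input edges the detector stays in the DN state for most of the remaining time, so the net output is strongly negative although the VCO frequency has not changed.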

Phase-locked loop type clock recovery circuits have their own advantages and disadvantages. As seen in Figure 2.4, a PLL based clock recovery circuit can generate a free-running clock during long runs of consecutive identical digits (CID). Because of its amenability to monolithic implementation, a PLL is an attractive alternative to tuned-circuit clock recovery. Furthermore, conventional PLLs offer a comparatively wide tuning range. In addition, the possibility of low-cost implementation makes a PLL based architecture a desirable choice for clock recovery circuits. Another advantage of PLL based clock recovery is that it is convenient to implement in CMOS process technology, which is the most widely used technology in VLSI systems.

However, because the desired PLL loop bandwidths are often smaller than the tuning range, frequency acquisition is not guaranteed. As a result, the clock recovery circuit may lock to data sidebands, and there is always a possibility that it locks to power supply noise. Hence, in many applications clock recovery needs a frequency acquisition facility.

Apart from PLL based clock recovery circuits, over-sampling is another technique for synchronization circuits in serial communication channels. This technique makes it possible to select the best data sample among several samples. In order to obtain over-sampled data, the clock or the data itself is passed through a certain number of delay cells. The output of each delay cell corresponds to a clock or data phase and is called a clock or data tap.

Over-sampling clock recovery circuits can be divided into two groups with respect to their delaying mechanism: either the sampling clock or the data itself is delayed through several delay cells. It is preferable to delay the clock signal because of its symmetrical behaviour. As a signal passes through several delay cells, which are usually digital or analogue buffers, a certain amount of distortion, such as duty cycle distortion, common mode distortion or skew in differential signals, is added to the signal. If the signal is symmetrical, as a clock is, then the distribution of the distortion over the signal is also symmetrical. For example, the rise and fall times of a clock pulse will be affected similarly, which reduces the duty cycle distortion of the signal. However, if a random data signal passes through the delay cells, the distribution of the distortion over the data signal becomes random and unpredictable.

On the other hand, since the data signal is slower than the clock signal (at most half the clock frequency), delaying the data makes the circuit design more flexible and easier. Thus, system requirements determine the over-sampling method.

As such systems use delay elements such as buffers, it is a critical design issue to stabilize delay of each cell. There are two main solutions for fixing the cell delay.

The first one is to use variable number of delay elements and to activate proper number of cells according to process and temperature variations. A frequency detector is used to determine the number of cells used for the current conditions. At start up, frequency detector determines the period of the clock signal in terms of the number of the delay cells and activates that amount of the delay cells to cover a bit period. In that way, one bit period is covered between first and last active delay cells. Incoming data is then over-sampled with different phases of clock within one bit time and the best clock and data sample is selected by a multiplexer and given as recovered clock and data signals. Basic structure of the mentioned method is given in Figure 2.12.



Figure 2.12. Over-sampling clock recovery using variable number of delay elements.

The second solution for fixing the cell delay is to use a delay-locked loop (DLL) controller to tune the cell delays. This method does not vary the number of delay cells; the number is determined by the over-sampling ratio, so a system using 16x over-sampling simply uses 16 delay elements. The solution thus fixes the number of delay elements but changes the delay introduced by each cell, which is adapted by the DLL. The DLL generates a variable control voltage according to temperature and process variations. This control voltage is applied to a reference delay line that consists of N delay cells. When the DLL is in lock, one bit period is guaranteed to be divided into N equal delays. The control voltage is also applied to the master delay line that is used to over-sample the data. Thus, it becomes possible to divide a data bit into equal clock phases. A simple block diagram of this structure is given in Figure 2.13.



Figure 2.13. Over-sampling clock recovery using a DLL delay adjusting circuit.

In principle, both methods are functionally identical. Regardless of which method is used, selecting the proper phase of the clock or data is the main design problem. With N-times over-sampled data, N clock phases are available, and one of them has the edge closest to the middle of the data eye. Such systems need well-defined and robust algorithms to select the best phase of the sampling clock. These algorithms are generally realized by a digital back-end that is responsible for storing, processing and filtering the collected samples. After a certain amount of calculation, the proper phase of the clock and data is determined, and the digital controller decides to increase or decrease the phase by one tap. The speed of these calculations is limited by the design of the digital circuitry.
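As a rough illustration of such a phase-selection algorithm, the sketch below votes on the tap position where data transitions occur and picks the tap half a bit period away. It is not the algorithm of any particular design: the function names, the majority-vote rule and the ideal `oversample` helper are assumptions made for this example only.

```python
from collections import Counter

def pick_sampling_tap(samples, n_taps):
    """Return the tap index whose sampling instant is farthest from the data edges.

    samples: flat list of 0/1 values, n_taps consecutive samples per bit period.
    """
    edge_votes = Counter()
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            # A transition was seen just before tap (i mod n_taps).
            edge_votes[i % n_taps] += 1
    if not edge_votes:
        raise ValueError("no transitions observed; cannot locate the data eye")
    edge_tap = edge_votes.most_common(1)[0][0]
    # The eye centre lies half a bit period away from the dominant edge position.
    return (edge_tap + n_taps // 2) % n_taps

def oversample(bits, n_taps, offset):
    """Hypothetical helper: ideal, jitter-free over-sampling with a phase offset."""
    stream = [b for b in bits for _ in range(n_taps)]
    return stream[-offset:] + stream[:-offset] if offset else stream

data = [0, 1, 0, 0, 1, 1, 0, 1]
print(pick_sampling_tap(oversample(data, 8, 3), 8))
```

A practical back-end would filter these votes over many bit periods and move the selected phase by one tap at a time, as described above.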

PLL based clock and data recovery circuits perform synchronization continuously in time, reacting immediately to phase variations of the data signal within a certain margin. Over-sampled clock recovery circuits, in contrast, select the proper phase of the sampling clock discretely from a fixed number of taps. Moreover, the speed at which phase variations can be tracked is directly limited by the operating frequency of the digital controller.

System requirements therefore determine the type of synchronizing method. All-analogue PLL based data and clock recovery circuits are preferred in serial communication channels running above gigabit rates. At lower transmission speeds, over-sampling based clock recovery circuits are also used.

# 3. PERFORMANCE MEASURES OF PLL BASED CLOCK AND DATA RECOVERY CIRCUITS

### **3.1. Introduction**

The design of data and clock recovery circuits for serial communication applications requires a thorough understanding of tradeoffs among the numerous levels of hierarchy. Each option has its merits, and determining which choice fits the desired system best is critical.

At the top level is the decision among the different data and clock recovery architectures. The important performance tradeoffs at the architectural level are the following: phase noise and timing jitter, tuning range, lock time, acquisition range, value of damping factor, loop bandwidth, idle data dependency and jitter tolerance performance.

Once an architecture has been chosen, the individual building blocks also have many design decisions. In addition to affecting top-level metrics, issues such as power consumption and quadrature signal generation also become important. These main performance measures and basic architecture fundamentals are discussed in this chapter.

# 3.2. Phase-Locked Loop Fundamentals

Figure 3.1 is a simplified block diagram of a Phase-Locked Loop (PLL). The components of a PLL generally include a phase detector, charge pump, loop filter, divider and Voltage-Controlled Oscillator (VCO). The basic functionality is as follows: the output frequency from the divider is first compared to the reference frequency by the phase detector. The phase error signal generated by the phase detector is passed to the charge pump, which creates a signal whose magnitude is proportional to the phase error. This signal is then low-pass filtered by the loop filter and used to control the output frequency of the VCO. When the PLL is in the locked condition, the two inputs to the phase detector are in phase (or at a fixed phase offset), and the output frequency is equal to the reference frequency multiplied by the divider ratio, N.

This section gives a brief description of the PLL linearized model.



Figure 3.1. Simplified block diagram of phase-locked loop

There are many variations of PLLs available on the market today, since each of the components in the PLL can be designed in different ways. Digital implementations of PLLs can also be found for some specific applications [1]. Despite these PLL derivatives, understanding the fundamentals is still a good starting point. As the name suggests, a PLL locks the phase of the VCO output to the phase of the reference signal. During the initial transient, the PLL operates in a nonlinear region as the VCO tries to find the correct frequency. As soon as the loop is in the locked condition, the small-signal linearized model can be used.



Figure 3.2. Small signal AC model of PLL

Figure 3.2 is a small-signal AC model of each building block in the PLL. The phase detector compares the phase difference between two inputs, and the charge pump converts the phase difference into a voltage signal.  $K_{PD-CP}$  denotes the composite phase detector and charge pump transfer function in units of volts/radian. The charge pump output is filtered by the low-pass filter, which generates a control voltage for the VCO. The transfer function of the loop filter is  $F_{LF}(s)$ , and the filter output varies the output frequency of the VCO. Because phase is the integral of frequency, the s-domain transfer function of the VCO is  $K_{VCO}/s$ . The divider in the feedback path divides the VCO output frequency by *N* and has a transfer function of 1/N.

In the steady state, the s-domain open loop transfer function of the PLL is

$$G(s) = \frac{K_{PD-CP} \cdot F_{LF}(s) \cdot K_{VCO} / s}{N}$$
(3.1)

yielding the following closed-loop transfer function:

$$H(s) = \frac{\Phi_o(s)}{\Phi_i(s)} = \frac{G(s)}{1 + G(s)} = \frac{\frac{K_{PD-CP} \cdot F_{LF}(s) \cdot K_{VCO} / s}{N}}{1 + \frac{K_{PD-CP} \cdot F_{LF}(s) \cdot K_{VCO} / s}{N}}$$
(3.2)

In its simplest form, a first-order low pass filter is implemented as in Figure 3.3, with

$$F_{LF}(s) = \frac{1}{1 + \frac{s}{\omega_{LF}}} \tag{3.3}$$

where  $\omega_{LF} = 1/(RC)$ . Eq. (3.2) reduces to:

$$H(s) = \frac{\Phi_o(s)}{\Phi_i(s)} = \frac{K_{PD-CP} \cdot K_{VCO} / N}{\frac{s^2}{\omega_{LF}} + s + K_{PD-CP} \cdot K_{VCO} / N} \tag{3.4}$$



Figure 3.3. Simple first-order low-pass filter

It is clear from Eq. (3.4) that the system is of second order, with one pole contributed by the VCO and another by the low-pass filter. The quantity  $K=K_{PD-CP} \cdot K_{VCO}/N$  is called the loop gain and is expressed in rad/s.

The generic phase-locked loop considered thus far is of second order. In principle, low-pass filter can include more poles to have sharper cut-off characteristics, a desirable property in many applications. However, such systems are difficult to stabilize, especially when process and temperature variations are taken into account. On the other hand, in many cases the PLL inevitably has a third pole, for example, if a capacitor is connected in parallel with the LPF output port (Figure 3.4) to suppress high frequency noise components.

Thus, most practical PLLs can be considered as third-order topologies with the third pole being much further from the origin than the other two.



Figure 3.4. Low-pass filter with a higher order pole

# 3.3. Loop Bandwidth and Damping Factor

In order to understand the dynamic behaviour of the PLL, denominator of the Eq. (3.4) can be converted to the familiar form used in control theory:  $s^2+2\zeta\omega_n s+\omega_n^2$ , where  $\zeta$  is the damping factor and  $\omega_n$  is the natural frequency of the system. Thus,

$$H(s) = \frac{\Phi_o(s)}{\Phi_i(s)} = \frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{3.5}$$

where

$$\omega_n = \sqrt{\omega_{LF}K} \tag{3.6}$$

$$\zeta = \frac{1}{2}\sqrt{\frac{\omega_{LF}}{K}} \tag{3.7}$$

 $\omega_n$  is the geometric mean of the –3dB bandwidth of the LPF and the loop gain. In addition, the damping factor is inversely proportional to the square root of the loop gain, an important and often undesirable trade-off.

In second order systems, the damping factor is usually greater than 0.5 and preferably equal to  $\sqrt{2}/2$  so as to provide an optimally flat frequency response. Therefore, K and  $\omega_{LF}$  cannot be chosen independently.
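The coupling between K and  $\omega_{LF}$  can be made concrete with a few lines of Python. The sketch below evaluates Eqs. (3.6) and (3.7) and solves Eq. (3.7) for the loop gain that gives a requested damping factor. The 1 MHz filter corner is an arbitrary value assumed for illustration, and the function names are hypothetical.

```python
import math

def loop_dynamics(k_loop, w_lf):
    """Natural frequency and damping factor from Eqs. (3.6) and (3.7)."""
    w_n = math.sqrt(w_lf * k_loop)
    zeta = 0.5 * math.sqrt(w_lf / k_loop)
    return w_n, zeta

def gain_for_damping(w_lf, zeta):
    """Loop gain K that yields a requested damping factor (Eq. (3.7) solved for K)."""
    return w_lf / (4.0 * zeta ** 2)

w_lf = 2.0 * math.pi * 1e6  # assumed filter corner in rad/s
k = gain_for_damping(w_lf, math.sqrt(2.0) / 2.0)
w_n, zeta = loop_dynamics(k, w_lf)
print(f"K = {k:.3e} rad/s, w_n = {w_n:.3e} rad/s, zeta = {zeta:.3f}")
```

For  $\zeta = \sqrt{2}/2$  the required loop gain is half the filter corner frequency, so fixing one of the two parameters fixes the other.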

The transfer function in Eq. (3.5) is that of a low-pass filter, suggesting that if the input excess phase varies slowly, then the output excess phase follows, and conversely, if the input excess phase varies rapidly, the output excess phase variation will be small. In particular, if  $s \rightarrow 0$ ,  $H(s) \rightarrow 1$ ; static phase shift at the input is transferred to the output unchanged.

An important drawback of the PLL is the direct relationship between  $\omega_{LF}$ ,  $\zeta$  and K given by Eq. (3.7). If the loop gain is increased to reduce the static phase error, then the settling behaviour degrades. The settling behaviour of a PLL directly determines the stability of the feedback loop in the steady state. Stability of the loop must be ensured during operation in order to prevent false locking and to keep the loop tracking continuously within a predetermined margin.

# **3.4.** Lock Time (Settling Time)

The two poles of the closed-loop system are given by

$$s_{1,2} = -\zeta\omega_n \pm \sqrt{(\zeta^2 - 1)\omega_n^2} \tag{3.8}$$

$$s_{1,2} = \left(-\zeta \pm \sqrt{\zeta^2 - 1}\right) \omega_n \tag{3.9}$$

Thus, if  $\zeta > 1$  both poles are real, the system is over-damped, and the transient response contains two exponentials with time constants  $1/|s_1|$  and  $1/|s_2|$ . On the other hand, if  $\zeta < 1$ , the poles are complex and the response to an input frequency step  $\omega_{in} = \Delta \omega \cdot u(t)$  is equal to

$$\omega_{out}(t) = \left\{ 1 - e^{-\zeta\omega_n t} \left[ \cos\left(\omega_n \sqrt{1 - \zeta^2}\, t\right) + \frac{\zeta}{\sqrt{1 - \zeta^2}} \sin\left(\omega_n \sqrt{1 - \zeta^2}\, t\right) \right] \right\} \Delta \omega \cdot u(t) \tag{3.10}$$

$$\omega_{out}(t) = \left[1 - \frac{1}{\sqrt{1 - \zeta^2}} e^{-\zeta\omega_n t} \sin\left(\omega_n \sqrt{1 - \zeta^2}\, t + \theta\right)\right] \Delta \omega \cdot u(t) \tag{3.11}$$

where  $\omega_{out}$  denotes the change in the output frequency and  $\theta = \sin^{-1}(\sqrt{1-\zeta^2})$ . Thus, as shown in Figure 3.5, the step response contains a sinusoidal component with a frequency of  $\omega_n\sqrt{1-\zeta^2}$  that decays with a time constant  $(\zeta \omega_n)^{-1}$ . The system exhibits the same response if a phase step is applied to the input and the output phase is observed.



Figure 3.5. Under-damped response of PLL to a frequency step (a)  $\zeta$  = 0.25, (b)  $\zeta$  = 0.707
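The two damping cases of Figure 3.5 can be reproduced numerically. The following sketch evaluates the normalised step response in both the form of Eq. (3.10) and the form of Eq. (3.11) and estimates the peak overshoot by brute-force sampling. The normalisation to  $\Delta\omega = 1$  and  $\omega_n = 1$ , the function names and the sampling grid are assumptions of this example.

```python
import math

def step_response(t, zeta, w_n=1.0):
    """Normalised under-damped step response, Eq. (3.10), valid for 0 < zeta < 1."""
    root = math.sqrt(1.0 - zeta ** 2)
    w_d = w_n * root  # ringing frequency
    envelope = math.exp(-zeta * w_n * t)
    return 1.0 - envelope * (math.cos(w_d * t) + zeta / root * math.sin(w_d * t))

def step_response_phase_form(t, zeta, w_n=1.0):
    """The same response written with a single phase-shifted sine, Eq. (3.11)."""
    root = math.sqrt(1.0 - zeta ** 2)
    theta = math.asin(root)
    return 1.0 - math.exp(-zeta * w_n * t) / root * math.sin(w_n * root * t + theta)

def peak_overshoot(zeta, w_n=1.0, steps=20000, t_end=40.0):
    """Largest excursion above the final value, found on a uniform time grid."""
    times = (t_end * i / steps for i in range(steps + 1))
    return max(step_response(t, zeta, w_n) for t in times) - 1.0

for zeta in (0.25, 0.707):
    print(f"zeta = {zeta}: overshoot = {100.0 * peak_overshoot(zeta):.1f} %")
```

The lightly damped case rings for many cycles, whereas the response for  $\zeta \approx 0.707$  settles after a single small overshoot.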

The settling speed of a PLL is of great concern in most applications. Eq. (3.11) indicates that the exponential decay determines how fast the output approaches its final value, implying that  $\zeta \omega_n$  must be maximized. Eqs. (3.6) and (3.7) yield

$$\zeta\omega_n = \frac{1}{2}\omega_{LF} \tag{3.12}$$

This result reveals a critical trade off between the settling speed and the ripple on VCO control line: the lower  $\omega_{LF}$ , the greater the suppression of high-frequency components produced by the phase detector but the longer the settling time.

The choice of  $\zeta$  entails other trade-offs as well. First, as  $\omega_{LF}$  is reduced to minimize the ripple on the control voltage, stability degrades. Second, both the phase error and  $\zeta$  decrease as K increases; lowering the phase error inevitably makes the system less stable. In summary, a second order PLL suffers from trade-offs between settling speed, ripple on the control voltage (the quality of the output signal), phase error and stability.

#### 3.5. Lock Range (Tracking Range)

In lock, the input and output frequencies of a PLL are equal, although the phase error may not be zero. The lock or tracking range of a PLL based system is the measure of how far the system can track the input frequency. In this section, the parameters that determine the tracking range of a PLL are explained.

To make the explanation concrete, consider the following two extreme cases: 1) the input frequency varies slowly (static tracking), 2) the input frequency changes abruptly (dynamic tracking). It will be seen that the tracking behaviour of the PLL is distinctly different in the two cases.

In the first case, starting from the VCO free-running frequency, the input frequency varies slowly such that the difference between  $\omega_{in}$  and  $\omega_{out}$  always remains much less than  $\omega_{LF}$ . To allow tracking, the magnitude of the VCO control voltage, and hence the static phase error, must then increase (Figure 3.6).



Figure 3.6. Variation of parameters during tracking

The PLL tracks as long as the three parameters plotted in Figure 3.6 vary monotonically. In other words, the edge of the tracking range is reached at the point where the slope of one of the characteristics falls to zero or changes sign. This can occur only in the phase detector (PD) or the VCO (provided the LPF components are linear). Examples of such behaviour are summarized in Figure 3.7. The VCO frequency typically has a limited range, outside of which its gain drops sharply. In addition, the characteristic of a typical phase detector becomes non-monotonic for a sufficiently large input phase difference, at which point the PLL fails to maintain lock.



Figure 3.7. Gain reduction in PD and VCO

For a multiplier type PD, the gain,  $K_{PD-CP}$ , changes sign if the input phase difference deviates from its centre value by more than 90°. Thus, the VCO output frequency can deviate from its free-running value by no more than [2]

$$\Delta \omega_{tr} = K_{PD-CP} \left( \sin \frac{\pi}{2} \right) K_{VCO} \tag{3.13}$$

Therefore, the static tracking range of a PLL employing a sinusoidal PD is the smaller of K and half of the VCO output frequency range.

In the second case, the input frequency is changed abruptly. The input frequency of a PLL that is initially operating at  $\omega_{in} = \omega_{out} = \omega_{FR}$  is stepped by  $\Delta\omega$ . It is important to determine the maximum value of  $\Delta\omega$  as a tracking performance measure. This value cannot be as large as  $\Delta\omega_{tr}$  in the case of static tracking (Eq. 3.13).

For any frequency step at its input, a PLL loses lock, at least momentarily. The loop requires a number of cycles to re-stabilize itself. During these cycles, the input-output phase difference varies, and the PLL can be considered unlocked. For small  $\Delta \omega$  values, the loop locks quickly, and the transient can be viewed as one of tracking rather than locking.

The key point resulting from the above observation is that the following two situations are similar: 1) a loop initially locked at  $\omega_{FR}$  experiences a large input frequency step,  $\Delta\omega$ ; and 2) a loop initially unlocked and free running ( $\omega_{out} = \omega_{FR}$ ) must lock onto an input frequency given by  $|\omega_{in} - \omega_{FR}| = \Delta\omega$ . In both cases, the loop must acquire lock. Acquisition of lock, which is an important design issue for PLL based data and clock recovery circuits, is discussed in the next section.

#### 3.6. Acquisition of Lock

The second case mentioned above occurs, for example, when the PLL is turned on. If the initial conditions in the LPF are zero, the VCO begins to oscillate at  $\omega_{FR}$ , whereas the input is at a different frequency,  $\omega_{FR} + \Delta \omega$ . The acquisition range (also called the capture range) is the maximum value of  $\Delta \omega$  for which the loop locks.

To understand how a PLL acquires lock, the response is studied from a frequency-domain perspective. Consider a PLL with an input frequency of  $\omega_{in} = \omega_{FR} + \Delta \omega$  and an output frequency of  $\omega_{FR}$ . Because  $\omega_{in} \neq \omega_{out}$ , the average output of the PD is zero and it may seem that the loop cannot be driven toward lock. The important point, however, is that the LPF does not completely suppress the component at  $\omega_{in} - \omega_{out} = \Delta \omega$ . Thus, the VCO control voltage varies at a rate equal to  $\Delta \omega$ , thereby modulating the output frequency:

$$V_{out}(t) = A\cos\left(\omega_{FR}t + K_{VCO}\int A_m \cos(\Delta \omega t)\,dt\right) \tag{3.14}$$

$$V_{out}(t) = A\cos\left[\omega_{FR}t + \frac{K_{VCO}}{\Delta \omega}A_m\sin(\Delta \omega t)\right] \tag{3.15}$$

$$V_{out}(t) \approx A\cos(\omega_{FR}t) - A\frac{K_{VCO}}{\Delta \omega} A_m \sin(\omega_{FR}t) \sin(\Delta \omega t) \tag{3.16}$$

where it is assumed that  $K_{VCO}A_m/\Delta\omega \ll 1$ . As a result, the VCO output exhibits sidebands at  $\omega_{FR} \pm \Delta\omega$  in addition to the main component at  $\omega_{FR}$ . When the PD multiplies the sideband at  $\omega_{FR} + \Delta\omega$  by the input at  $\omega_{in}$ , a DC component appears at the output of the LPF, adjusting the VCO frequency toward lock. The DC component may need to grow over a number of cycles before lock is achieved.
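The appearance of the two sidebands can be seen by expanding Eq. (3.15) explicitly. The short derivation below is a sketch that uses only the small-angle approximation  $\cos(x+\varepsilon) \approx \cos x - \varepsilon \sin x$  and a product-to-sum identity; the shorthand  $\beta = K_{VCO}A_m/\Delta\omega$  is introduced here for brevity and is not a symbol used elsewhere in the text.

$$
\begin{aligned}
V_{out}(t) &= A\cos\left[\omega_{FR}t + \beta\sin(\Delta\omega t)\right], \qquad \beta = \frac{K_{VCO}A_m}{\Delta\omega} \ll 1 \\
&\approx A\cos(\omega_{FR}t) - A\beta\,\sin(\omega_{FR}t)\sin(\Delta\omega t) \\
&= A\cos(\omega_{FR}t) - \frac{A\beta}{2}\cos\left[(\omega_{FR}-\Delta\omega)t\right] + \frac{A\beta}{2}\cos\left[(\omega_{FR}+\Delta\omega)t\right]
\end{aligned}
$$

The last line shows the carrier at  $\omega_{FR}$  and two sidebands of amplitude  $A\beta/2$  at  $\omega_{FR} \pm \Delta\omega$ . The upper sideband coincides with the input frequency, which is why its product with the input contains a DC term.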

The acquisition range depends on how much the LPF passes the component at  $\Delta\omega$  and how strong the feedback DC component is. Acquisition range is a direct function of the loop gain at  $\Delta\omega$ . In other words, because the loop gain of a simple PLL drops as the difference between the input frequency and the VCO frequency increases, the acquisition range cannot be arbitrarily wide.

Acquisition range is a critical parameter because 1) it trades directly with the loop bandwidth. If an application requires small bandwidth (as in the case of clock and data recovery applications), the acquisition range will be proportionally small. 2) It determines the maximum frequency variation at the input or the VCO that can be accommodated. In monolithic implementations, the VCO free-running frequency can vary substantially with temperature and process, thereby requiring a wide acquisition range even if the input frequency is tightly controlled.

Unfortunately, it is difficult to calculate the acquisition range of PLLs analytically. However, a simplified case can be considered, where the LPF output signal can be approximated as [3]

$$V_{LPF}(t) = K_{PD-CP} \left| F_{LF}(j\Delta \omega) \right| \sin(\Delta \omega t) \tag{3.17}$$

This signal modulates the VCO frequency, causing a maximum deviation of

$$\left(\omega_{out} - \omega_{FR}\right)\Big|_{\max} = K_{PD-CP}K_{VCO}\Big|F_{LF}(j\Delta \omega)\Big| \tag{3.18}$$

If this deviation is equal to or greater than  $\Delta \omega$ , then the loop locks without cycle slipping [4]. The acquisition range therefore satisfies

$$\Delta \omega_{acq} = K_{PD-CP} K_{VCO} \left| F_{LF} \left( j \Delta \omega_{acq} \right) \right| \tag{3.19}$$

With  $F_{LF}(s)$  known,  $\Delta \omega_{acq}$  can be calculated from this equation. For a simple low-pass filter:

$$\Delta \omega_{acq} = \left[\frac{\omega_{LF}^2}{2} \left(-1 + \sqrt{1 + \frac{1}{4\zeta^4}}\right)\right]^{\frac{1}{2}} \tag{3.20}$$

which reduces to  $\Delta \omega_{acq} \approx 0.46 \omega_{LF}$  if  $\zeta = \sqrt{2}/2$ . The above derivation actually underestimates the capture range. This is because, as  $\omega_{in}$  is brought closer to  $\omega_{FR}$ , the average frequency of the VCO also departs from  $\omega_{FR}$  and comes closer to  $\omega_{in}$  [5]. Thus, near lock, the difference between  $\omega_{in}$  and  $\omega_{VCO}$  is small and the LPF attenuation predicted by Eq. (3.17) is too large. A more accurate and detailed expression can be found in [6].
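The figure of  $0.46\omega_{LF}$  can be checked numerically without the closed form. The sketch below solves the implicit condition of Eq. (3.19) by bisection for the single-pole filter of Eq. (3.3), for which  $|F_{LF}(j\Delta\omega)| = 1/\sqrt{1 + (\Delta\omega/\omega_{LF})^2}$ . The frequencies are normalised to  $\omega_{LF} = 1$ , the loop gain is written as the single quantity K, and the function name is hypothetical.

```python
import math

def acquisition_range(k_loop, w_lf, iterations=200):
    """Solve dw = K * |F_LF(j dw)| for a single-pole low-pass filter by bisection."""
    def excess(dw):
        # Positive while the achievable frequency deviation still exceeds dw.
        return k_loop / math.sqrt(1.0 + (dw / w_lf) ** 2) - dw

    lo, hi = 0.0, k_loop  # excess(0) > 0 and excess(K) < 0 bracket the root
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w_lf = 1.0                      # normalised filter corner
zeta = math.sqrt(2.0) / 2.0
k = w_lf / (4.0 * zeta ** 2)    # loop gain from Eq. (3.7)
print(f"acquisition range = {acquisition_range(k, w_lf):.3f} * w_LF")
```

Repeating the calculation for other damping factors shows how quickly the estimated capture range shrinks as the loop gain is lowered.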

Most modern phase-locked loop systems incorporate additional means of frequency acquisition to significantly increase the capture range, often removing its dependence on K and  $\omega_{LF}$  and achieving limits equal to those of the VCO.

#### 3.6.1. Acquisition Time

The acquisition and settling times of a PLL are important in many applications. If a PLL is used in a clock and data recovery system that frequently receives idle data (long runs of 1's or 0's), it becomes critical to know how long the system must wait for the clock recovery circuit to achieve adequate phase alignment.

For a simple second order system with  $\zeta < 1$ , the step response is expressed as

$$y(t) = \left[1 - \frac{1}{\sqrt{1 - \zeta^2}} \exp(-\zeta\omega_n t) \cdot \sin\left(\omega_n \sqrt{1 - \zeta^2}\, t + \psi\right)\right] u(t) \tag{3.21}$$

where  $\psi = \sin^{-1} \sqrt{1-\zeta^2}$ , in agreement with Eq. (3.11). Thus the decay time constant is

$$t_{dec} = \frac{1}{\zeta\omega_n} = \frac{2}{\omega_{LF}} \tag{3.22}$$

and the frequency of the ringing equals  $\omega_n\sqrt{1-\zeta^2}$ . For a frequency step at the PLL input, Eq. (3.21) can be used to calculate the time required for the output frequency to settle within a given error band around its final value.

Eq. (3.21) assumes a linear system. In practice, non-linearities in  $K_{PD}$  and  $K_{VCO}$  result in somewhat different settling characteristics, and simulations must be used to predict the lock time accurately. Nonetheless, this equation provides an initial guess that proves useful in early phases of design.
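Such an initial guess can be obtained from the exponential envelope alone. The sketch below bounds the settling error by  $e^{-\zeta\omega_n t}/\sqrt{1-\zeta^2}$  and solves for the time at which this bound falls to a chosen relative tolerance, using  $\zeta\omega_n = \omega_{LF}/2$  from Eq. (3.22). The 1 MHz corner frequency, the 1 % tolerance and the function name are assumptions for illustration, and the result is only a conservative linear-model estimate.

```python
import math

def settling_time(w_lf, zeta, tolerance):
    """Time after which the response envelope stays below `tolerance` (0 < zeta < 1)."""
    if not 0.0 < zeta < 1.0:
        raise ValueError("the under-damped expression needs 0 < zeta < 1")
    decay_rate = 0.5 * w_lf  # zeta * w_n, from Eq. (3.22)
    return math.log(1.0 / (tolerance * math.sqrt(1.0 - zeta ** 2))) / decay_rate

w_lf = 2.0 * math.pi * 1e6  # assumed filter corner in rad/s
t_settle = settling_time(w_lf, 0.707, 0.01)
print(f"settling to 1 % takes about {t_settle * 1e6:.2f} us")
```

Halving  $\omega_{LF}$  doubles this estimate, which is the settling-speed versus ripple trade-off stated earlier in numerical form.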

## 3.6.2. Aided Acquisition

The acquisition behaviour formulated by Eq. (3.20) indicates that the capture range of a simple, optimally stable phase-locked loop is roughly equal to  $0.5\omega_{LF}$ , regardless of the magnitude of K. Since issues such as jitter and side-band suppression impose an upper bound on  $\omega_{LF}$ , the resulting capture range is often inadequate. Therefore, most practical PLL applications employ additional techniques to aid frequency acquisition.

Shown in Figure 3.8 is a conceptual diagram of a PLL with aided frequency acquisition. Here, the system utilizes a frequency detector (FD) and a second low-pass filter, LPF<sub>2</sub>, whose output is added to that of LPF<sub>1</sub>. The FD produces an output having a DC value proportional to and with the same polarity as  $\omega_{in} - \omega_{out}$ . If the difference between  $\omega_{in}$  and  $\omega_{out}$  is large, the PD output has a negligible DC component and the VCO is driven by the DC output of the FD with negative feedback, thereby moving  $\omega_{out}$  toward  $\omega_{in}$ . As  $|\omega_{in} - \omega_{out}|$  drops, the DC output of the FD decreases, and that of the PD increases. Thus the frequency detection loop gradually relinquishes the acquisition to the phase-locked loop, becoming inactive when  $\omega_{in} - \omega_{out} = 0$ .



Figure 3.8. Aided acquisition with a frequency detector

It is important to note that in a frequency detection loop, the loop gain is relatively constant, independent of  $|\omega_{in} - \omega_{out}|$ , whereas in a simple phase-locked loop, the loop gain drops if  $|\omega_{in} - \omega_{out}|$  exceeds  $\omega_{LF}$ . For this reason, aided acquisition using FDs can substantially increase the capture range.

## 3.7. Timing Jitter Definitions

Jitter is defined as the short-term variations of a digital signal's significant instants from their ideal positions in time. Jitter can also be defined as "the deviation from the ideal timing of an event. The reference event is the differential zero crossing for electrical signals and the nominal receiver threshold power level for optical systems" [7].

Two general types of jitter are characterized: deterministic jitter and random jitter. Because each type accumulates differently in channel, they are characterized independently.

#### 3.7.1. Deterministic Jitter

Deterministic jitter is generally bounded in amplitude, non-Gaussian, and expressed in units of time, peak to peak. Examples of deterministic jitter are:

- Duty-cycle distortion, e.g., from asymmetric rise and fall times.
- Inter-symbol interference (ISI), e.g., from channel dispersion or filtering.
- Sinusoidal, e.g., from power supply feedthrough.
- Uncorrelated, e.g., from crosstalk by other signals.

Deterministic jitter can be sub-divided into two categories:

- Pattern dependent jitter (PDJ) or inter-symbol interference, which is caused by a low-frequency roll-off, a high-frequency roll-off, or both.
- Pulse width distortion (PWD), which can be caused by DC offsets, differences in rise and fall times and non-linear behaviour of the receiver channel in the presence of high input powers.

Pattern dependent jitter is caused by the fact that long runs of identical bits contain more power at low frequencies than bit patterns with a high number of transitions per second. If the amplitude response of the receiver channel is not constant with frequency, for example due to a coupling capacitor at low frequencies or due to bandwidth limitations at higher frequencies, then pattern dependent jitter will occur. Figure 3.9 shows a method to estimate the worst-case pattern dependent jitter, which occurs when an infinitely long string of identical bits is followed by a single isolated bit of the opposite value.



Figure 3.9. Pattern dependent jitter

Assuming that the reference level of the comparator is at half the maximum amplitude of the signal, the following formula can be derived for the peak-to-peak pattern dependent jitter:

$$\Delta t_{PDJ,pp} = \frac{1}{2\pi f_{3dB}} \left[ \ln(2) + \ln\left(\frac{1}{2} + \exp\left(-\frac{2\pi f_{3dB}}{B}\right)\right) \right] \tag{3.23}$$

where B is the bit-rate and  $f_{3dB}$  is the receiver channel bandwidth. Because  $B \propto f_{3dB}$  we see that there is a trade-off between sensitivity and the amount of pattern dependent jitter.
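To get a feel for the magnitudes involved, Eq. (3.23) can be evaluated directly. The sketch below does so for a 2.5 Gbps link and a few receiver bandwidths. The bit rate and the bandwidth ratios are illustrative assumptions rather than values from this design, and the function name is hypothetical.

```python
import math

def pdj_peak_to_peak(bit_rate, f_3db):
    """Peak-to-peak pattern dependent jitter in seconds, as given by Eq. (3.23)."""
    tau = 1.0 / (2.0 * math.pi * f_3db)  # time constant of the channel
    return tau * (math.log(2.0)
                  + math.log(0.5 + math.exp(-2.0 * math.pi * f_3db / bit_rate)))

bit_rate = 2.5e9  # assumed bit rate in bits per second
for ratio in (0.5, 0.7, 1.0):
    jitter = pdj_peak_to_peak(bit_rate, ratio * bit_rate)
    print(f"f_3dB = {ratio:.1f} * B: PDJ = {jitter * 1e12:.2f} ps peak to peak")
```

The estimate falls rapidly as the bandwidth approaches the bit rate, while a wider bandwidth admits more noise, which is the trade-off noted above.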

## 3.7.2. Random Jitter

Random jitter is caused by random noise on the signal under investigation, which makes the instants at which the signal crosses a reference level vary randomly in time (Figure 3.10).



Figure 3.10. Noise on a signal results in random jitter

Usually, the noise associated with the received and converted signal is white noise with a Gaussian probability density distribution. Random jitter is assumed to be Gaussian in nature and accumulates from thermal noise sources. Because peak-to-peak measurements take a long time to achieve statistical significance, random jitter is measured as a root mean square (RMS) value. Multiple random-jitter sources add in an RMS fashion, but a peak-to-peak value is needed when adding random jitter to deterministic jitter to get total jitter, peak to peak [7]. Figure 3.11 shows how RMS voltage noise can be translated into RMS random jitter using Eq. (3.24)

$$\sigma_{RJ} = \frac{\sigma_v}{SR} \tag{3.24}$$

where  $\sigma_{RJ}$  is the RMS random jitter,  $\sigma_v$  is the RMS noise of the signal and SR = dv/dt is the slew rate of the receiver output signal when it crosses the comparator reference. Note that this rate is smallest for weak signals and increases with increasing input power.

If the rise time of the received signal is known from 20% to 80%, the slew rate as a function of the input amplitude A can be approximated by:

$$SR = 0.6 \frac{A}{t_r} \propto f_{3dB} \tag{3.25}$$

Assuming white Gaussian noise (which usually dominates the receiver channel) we know that:

$$\sigma_{v} \propto \sqrt{f_{3dB}} \tag{3.26}$$

This leads to Eq. (3.27) showing that there exists a trade-off between sensitivity (which requires a bandwidth just enough to avoid inter-symbol interference) and random jitter performance (which requires a high bandwidth to obtain steep slopes of the recovered bits to minimize the random jitter).

$$\sigma_{RJ} \propto \frac{1}{\sqrt{f_{3dB}}} \tag{3.27}$$



Figure 3.11. Relationship between RMS noise and RMS random jitter

To transform the RMS random jitter to peak-to-peak random jitter, the jitter should be multiplied by 13, assuming a BER  $< 10^{-10}$  [8].
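The multiplier can be related to the Gaussian assumption. The sketch below takes the peak-to-peak value as the width of the interval outside which each tail of a Gaussian distribution holds a probability equal to the target BER. This one-sided tail convention is an assumption of the example (other conventions also account for transition density), and the function name is hypothetical. Only the Python standard library is used.

```python
from statistics import NormalDist

def rj_peak_to_peak_factor(ber):
    """Ratio of peak-to-peak to RMS random jitter for a Gaussian distribution.

    Assumes each tail beyond the peak-to-peak interval has probability `ber`.
    """
    q = -NormalDist().inv_cdf(ber)  # distance from the mean in units of sigma
    return 2.0 * q

for ber in (1e-9, 1e-10, 1e-12):
    print(f"BER = {ber:.0e}: peak to peak = {rj_peak_to_peak_factor(ber):.2f} x RMS")
```

Under this convention a BER of  $10^{-10}$  gives a factor close to the value of 13 quoted above, and stricter BER targets call for a larger multiplier.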

# **3.8. SONET Jitter Specifications**

When used as a regenerator, the clock and data recovery device should meet both the receive jitter tolerance and the transmit jitter generation requirements. The Synchronous Optical Network (SONET) specifications group all jitter related issues under three titles: SONET jitter tolerance, SONET jitter transfer and SONET jitter generation.

SONET jitter specifications set different requirements for the different predetermined data transmission rates, designated OC-N (Optical Carrier level N). These are:

- OC-1: 52 Mbps data transmission.
- OC-3: 155 Mbps data transmission.
- OC-12: 622 Mbps data transmission.
- OC-48: 2.488 Gbps data transmission.
- OC-192: 9.953 Gbps data transmission.

For our application, the OC-48 requirements have been taken into account in the design specifications. The following sections cover the individual SONET jitter specifications in detail.
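The listed rates are integer multiples of the OC-1 base rate of 51.84 Mbps, and the unit interval (UI) used in the jitter specifications is one bit period at the line rate. A small sketch of both relations follows; the function names are chosen for this example only.

```python
OC1_RATE_MBPS = 51.84  # SONET base line rate

def oc_rate_mbps(n):
    """Line rate of OC-N in Mbps."""
    return n * OC1_RATE_MBPS

def unit_interval_ps(n):
    """Duration of one unit interval (one bit period) at OC-N, in picoseconds."""
    return 1e6 / oc_rate_mbps(n)

for n in (1, 3, 12, 48, 192):
    print(f"OC-{n}: {oc_rate_mbps(n):.2f} Mbps, 1 UI = {unit_interval_ps(n):.1f} ps")
```

At OC-48 one unit interval is roughly 402 ps, so a jitter amplitude of 0.15 UI corresponds to about 60 ps.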

# **3.8.1. SONET Jitter Tolerance**

Jitter tolerance is the measure of the ability of the clock recovery device to track a jittered input data signal. Jitter on the input data is best thought of as a phase modulation, and is usually specified in unit intervals (UI). The PLL must provide a clock output from the VCO that tracks this phase modulation in order to accurately retime jittered data.

For the VCO output to have a phase modulation that tracks the input jitter some modulation signal must be generated at the output of the phase detector, this is the error signal, e(s), introduced by the PD. The error transfer function of a PLL,  $H_e(s) = e(s)/\Phi_{in}$  is given in Eq. (3.28).

$$H_{e}(s) = \frac{e(s)}{\Phi_{in}} = 1 - H(s) = 1 - \frac{K_{PD-CP} \cdot K_{VCO} / N}{\frac{s^{2}}{\omega_{LF}} + s + K_{PD-CP} \cdot K_{VCO} / N}$$

$$H_{e}(s) = \frac{s^{2} + \omega_{LF} s}{s^{2} + \omega_{LF} s + \frac{K_{PD-CP} \cdot K_{VCO}\, \omega_{LF}}{N}} \tag{3.28}$$

The modulation output from the phase detector can only be produced by a phase error between the data input and the clock input. Hence the PLL can never track the jittered data perfectly. However, the magnitude of the phase error depends on the gain around the loop. At low jitter frequencies, the loop provides very high gain, and thus very large jitter can be tracked and compensated with small phase errors between the input data and the recovered clock. At frequencies closer to the loop bandwidth, the gain of the loop is much smaller, and thus less input jitter is tolerated. These features are shown for 155 Mbps transmission in Figure 3.12, the theoretical jitter tolerance of a clock recovery phase-locked loop [9]. Note that there are two curves, one for a damping factor of 1 and one for a damping factor of 10.



Figure 3.12. Jitter tolerance curve for a 155Mbps application [9]

If the magnitude of the error signal, e, is greater than the eye opening of the input data signal, then data retiming errors are made. This is the limit of the jitter tolerance. The error transfer function,  $e(s)/\Phi_{in}$  in Eq. (3.28), is a high-pass filter. Thus, the loop is expected to tolerate large amounts of jitter at low frequencies, because the high-pass filter attenuates the resulting phase error. At higher frequencies, the high-pass filter transmits all the input jitter and only small amounts of jitter can be tolerated. The corner frequency of this high-pass filter is the same as the -3dB frequency of the low-pass jitter transfer function, H(s), given in Eq. (3.4). Thus, jitter tolerance can be determined in two different ways: an input signal with an increasing amount of jitter can be applied until errors are detected, or the bandwidth of the jitter transfer function can be measured.
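The high-pass behaviour of the error transfer function can be checked numerically. The sketch below evaluates the magnitude of Eq. (3.28) on the imaginary axis; the loop values `W_LF` and `K_LOOP` are hypothetical and chosen only to illustrate the shape of the curve (with `K_LOOP` standing for $K_{PD-CP} \cdot K_{VCO} / N$).

```python
import math

def error_gain(f_jitter, k_loop, w_lf):
    """Return |H_e(j*2*pi*f)| for Eq. (3.28).

    k_loop stands for K_PD-CP * K_VCO / N and w_lf for W_LF, both in rad/s.
    """
    s = 1j * 2.0 * math.pi * f_jitter
    return abs((s * s + w_lf * s) / (s * s + w_lf * s + k_loop * w_lf))

# Hypothetical values, for illustration only (not taken from the design).
W_LF = 2.0 * math.pi * 1.0e6   # loop filter corner, rad/s
K_LOOP = W_LF / 4.0            # loop gain that gives a damping factor of 1

for f in (1e3, 1e4, 1e5, 1e6, 1e7):
    print(f"{f:>10.0f} Hz   |He| = {error_gain(f, K_LOOP, W_LF):.4f}")
```

The printed gain is small at low jitter frequencies and approaches unity well above the loop bandwidth, which is why the tolerable input jitter falls as the jitter frequency rises.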

In Figure 3.13 and Table 3.1 SONET jitter tolerance specifications are summarized.



Figure 3.13. SONET jitter tolerance curve mask

| OC-N/STS-N Level | f0 (Hz) | f1 (Hz) | f2 (Hz) | f3 (Hz) | f4 (Hz) | A1 $(UI_{pp})$ | A2 $(UI_{pp})$ | A3 $(UI_{pp})$ |
|------------------|---------|---------|---------|---------|---------|----------------|----------------|----------------|
| 1                | 10      | 30      | 300     | 2K      | 20K     | 0.15           | 1.5            | 15             |
| 3                | 10      | 30      | 300     | 6.5K    | 65K     | 0.15           | 1.5            | 15             |
| 12               | 10      | 30      | 300     | 25K     | 250K    | 0.15           | 1.5            | 15             |
| 48               | 10      | 600     | 6000    | 100K    | 1000K   | 0.15           | 1.5            | 15             |

Table 3.1. SONET jitter tolerance curve mask table
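As an illustration, the OC-48 row of Table 3.1 can be turned into a piecewise mask function. This is a sketch under two assumptions that are not spelled out in the table itself: the three amplitude columns are taken as 0.15, 1.5 and 15 $UI_{pp}$ from the smallest to the largest, and the mask is assumed to have the usual shape (flat from f0 to f1, falling at 20 dB/decade from f1 to f2, flat from f2 to f3, falling at 20 dB/decade from f3 to f4, flat above f4).

```python
# OC-48 row of Table 3.1 (frequencies in Hz, amplitudes in UI peak-to-peak).
F0, F1, F2, F3, F4 = 10.0, 600.0, 6000.0, 100e3, 1000e3
A_SMALL, A_MID, A_LARGE = 0.15, 1.5, 15.0

def oc48_tolerance_mask(f_hz):
    """Minimum sinusoidal jitter (UIpp) that must be tolerated at f_hz.

    The piecewise shape is an assumption, see the text above.
    """
    if f_hz < F0:
        raise ValueError("mask is not defined below f0")
    if f_hz <= F1:
        return A_LARGE
    if f_hz < F2:
        return A_LARGE * F1 / f_hz   # -20 dB/decade segment
    if f_hz <= F3:
        return A_MID
    if f_hz < F4:
        return A_MID * F3 / f_hz     # -20 dB/decade segment
    return A_SMALL

for f in (100.0, 1.9e3, 50e3, 500e3, 5e6):
    print(f"{f:>10.0f} Hz  ->  {oc48_tolerance_mask(f):.3f} UIpp")
```

The table values are consistent with this shape: each sloped segment spans exactly one decade in frequency and one decade in amplitude.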

# 3.8.2. SONET Jitter Transfer

Knowing the jitter transfer function is important. SONET specifies that the bandwidth of the jitter transfer function be less than 2 MHz for OC-48 (2.5 Gbps) clock recovery. If this bandwidth is too large, jitter accumulates quickly and causes the bit error rate (BER) to increase. Thus, the requirements on the jitter transfer function conflict: on the one hand, it should be wideband to tolerate large amounts of jitter; on the other hand, it should be narrowband to filter jitter and prevent jitter accumulation.

SONET jitter transfer specifications are summarized in Figure 3.14 and Table 3.2 respectively.



Figure 3.14. SONET jitter transfer function mask

| OC-N/STS-N Level | fc (kHz) | P (dB) |
|------------------|----------|--------|
| 1          | 40    | 0.1  |
| 3          | 130   | 0.1  |
| 12         | 500   | 0.1  |
| 48         | 2000  | 0.1  |

Table 3.2. SONET jitter transfer mask table

SONET jitter transfer specifications also require a maximum gain in the passband of less than 0.1dB. The gain in the jitter transfer function is commonly called jitter peaking, and contributes to the accumulation of jitter. From Figure 3.15 it is seen that the zero in the closed loop transfer function occurs at a lower frequency than the first closed loop pole. This results in jitter peaking that can never be fully eliminated in this loop topology, although it can be reduced to negligible levels by over-damping the loop. However, over-damping has some undesired effects, such as a long acquisition time and impractically large capacitor values in the loop filter. Thus, it is necessary to define an analytical model for jitter peaking in order to determine the trade-offs in the clock recovery design. (Jitter peaking can be fundamentally eliminated only by an architectural change in the PLL [10].)



Figure 3.15. Jitter peaking at jitter transfer function

The jitter peaking can be approximated as the ratio of frequencies of the pole,  $s_{low}$ , and the closed loop zero. For a PLL using the loop filter given in Figure 3.3 jitter peaking definition is given in Eq. (3.29). [9]

$$JP \approx \frac{s_{low}}{s_{zero}} \approx 1 + \frac{1}{C \cdot K_{VCO} \cdot K_{PD-CP} \cdot R^2}$$
(3.29)

It is convenient to express jitter peaking in dB, as shown in Eq. (3.30) [9].

$$JP_{dB} = \frac{8.686}{C \cdot K_{VCO} \cdot K_{PD-CP} \cdot R^2} \ \mathrm{dB}$$
(3.30)
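As a worked example, Eq. (3.30) can be evaluated with the fine loop values listed in Table 4.1 of the next chapter. The sketch assumes that C is the main loop filter capacitor C1 and that the product $K_{VCO} \cdot K_{PD-CP}$ equals the VCO gain in Hz/V times the charge pump current (the factors of $2\pi$ cancel). With these assumptions the result agrees with the calculated jitter peaking reported in Table 4.2.

```python
import math

# Fine loop values from Table 4.1.
K_VCO_HZ_PER_V = 2.65e9   # VCO gain
I_CP_FINE = 30e-6         # fine loop charge pump current, A
R = 240.0                 # loop filter resistor, ohm
C1 = 800e-12              # loop filter capacitor, F (assumed to be "C" in Eq. 3.29)

# Dimensionless product C * K_VCO * K_PD-CP * R^2 (assumed unit convention).
x = C1 * K_VCO_HZ_PER_V * I_CP_FINE * R ** 2

jp_db_approx = 8.686 / x                        # Eq. (3.30)
jp_db_from_ratio = 20.0 * math.log10(1.0 + 1.0 / x)   # Eq. (3.29) expressed in dB

print(f"Eq. (3.30): {jp_db_approx:.2f} dB")
print(f"Eq. (3.29) in dB: {jp_db_from_ratio:.2f} dB")
```

Eq. (3.30) is the small-peaking approximation $\ln(1 + 1/x) \approx 1/x$ of Eq. (3.29), so it overestimates the peaking when the peaking is not small, as is the case here.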

The jitter bandwidth is easily determined by finding the purely imaginary frequency at which the value of the squared modulus of the transfer function of Eq. (3.4) is equal to one half. The result is given in Eq. (3.31).

$$W_{-3dB} = W_n \sqrt{2\zeta^2 + 1 + \sqrt{\left(2\zeta^2 + 1\right)^2 + 1}}$$
(3.31)
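A short numerical check of Eq. (3.31) is given below. It uses the damping factor (0.957) and natural frequency (1.59 MHz) that are listed for the fine loop in Table 4.2 of the next chapter; the function name is illustrative only.

```python
import math

def loop_bandwidth(f_n, zeta):
    """-3 dB bandwidth from Eq. (3.31); same unit as f_n."""
    a = 2.0 * zeta ** 2 + 1.0
    return f_n * math.sqrt(a + math.sqrt(a * a + 1.0))

f_3db = loop_bandwidth(1.59e6, 0.957)
print(f"f_-3dB = {f_3db / 1e6:.2f} MHz")
```

The result is about 3.84 MHz, matching the calculated value in Table 4.2.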

# 3.8.3. SONET Jitter Generation

The jitter generation requirement is a measure of the jitter generated by the clock recovery system when no jitter or wander is applied to the input.

All jitter generation tests are performed with clean input data. In addition, a band-pass filter is used to limit the jitter generation measurements to the jitter frequency range of interest. For OC-48 electrical interfaces, the band-pass filter has a 12 kHz high-pass cut-off frequency with a roll-off of 20dB/decade, and a low-pass cut-off frequency of at least 20 MHz.

After applying such a band-pass filter, the jitter generated by the clock recovery should be less than 0.01  $UI_{RMS}$  and 0.1  $UI_{pp}$ . 0.1  $UI_{pp}$  jitter corresponds to 40ps peak-to-peak jitter amplitude at 2.5 Gbps transmission.
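The conversion between unit intervals and time used above is simple arithmetic, sketched here for both the 2.5 Gbps case quoted in the text and the 3.2 Gbps upper data rate of this design. The helper name is illustrative only.

```python
def ui_to_ps(jitter_ui, bit_rate_bps):
    """Convert a jitter amplitude in unit intervals to picoseconds."""
    return jitter_ui / bit_rate_bps * 1e12

for rate in (2.5e9, 3.2e9):
    pp = ui_to_ps(0.1, rate)     # peak-to-peak limit, 0.1 UIpp
    rms = ui_to_ps(0.01, rate)   # RMS limit, 0.01 UIrms
    print(f"{rate / 1e9:.1f} Gbps: {pp:.2f} ps peak-to-peak, {rms:.3f} ps RMS")
```

At 2.5 Gbps one unit interval is 400 ps, so the limits are about 40 ps peak-to-peak and 4 ps RMS; at 3.2 Gbps the same limits in UI would correspond to about 31 ps and 3.1 ps.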

# 4. MODELING AND SIMULATING PLL BASED CLOCK RECOVERY CIRCUIT IN MATLAB

## 4.1. Introduction

It is very important to perform pre-design analysis using extensive modelling tools in order to obtain crucial design and model parameters, to observe signal flow and to determine system characteristics before starting physical circuit design. It is also important to model whole system in order to have a robust and versatile circuit. It must be understood how circuit impairments affect the over-all operating characteristics in order to achieve optimised circuit behaviour. Therefore, a unified simulation environment that allows the modelling of all important circuit impairments is required if the desired design specifications are to be achieved. A full end-to-end simulation not only facilitates tradeoffs among circuit blocks, but also tradeoffs between circuit performance and system complexity.

Focussing especially on clock and data recovery circuits, basic conceptual models of the circuit and parameters such as -3dB bandwidth, damping factor, acquisition time etc. have been determined using MATLAB. During the advanced steps of the physical circuit design, modelling of the circuit using MATLAB has been performed in parallel.

*Simulink* from The Mathworks has also been used to model the clock and data recovery circuit. *Simulink* is a graphical, block-diagram based simulation tool, which provides an excellent combination of hierarchical abstraction, simplicity and extensibility. This chapter describes the challenges and solutions for simulating analogue non-ideal effects with MATLAB and *Simulink*. In particular, it describes how to model and simulate clock and data recovery circuit impairments in serial data communications.

#### 4.2. Two-Loop Architecture





Figure 4.1. Simplified block diagram of two-loop clock and data recovery circuit

A frequency detector guarantees that the device locks to the proper data frequency, and a phase detector then aligns the clock edges with the input data edges. The first loop for frequency acquisition is called "coarse loop", since it helps VCO clock frequency to reach data frequency by bigger steps. The second loop for phase alignment is called "fine loop", since it shifts the phase of the VCO clock to retime the data with smaller steps. Two loops are necessary because a phase-locked loop alone has a slow and unreliable frequency acquisition range. First, the frequency detector reduces the error in frequency between input data and the VCO. Once the frequency error is sufficiently small, the phase detector takes over and aligns the phase of the clock to match the input data. During this final stage of the operation, the output of the frequency detector is identically zero, and no longer affects the operation of the circuit.

False lock to the sidebands of the data is completely eliminated by this two-loop architecture. Such false lock is a problem found in more ordinary single-loop PLL based clock and data recovery devices. In single loop architectures, an initial frequency acquisition aid is needed because a phase detector alone cannot be relied on to achieve frequency acquisition.

# 4.3. Determining Loop Dynamics

The loop dynamics parameters of both the coarse and fine loops have to be determined before starting physical circuit design. Before determining the loop parameters, the overall transfer functions of both loops are extracted. Both loops use the same loop filter, which is shown in Figure 3.4. The numerical parameters determined for both loops from the MATLAB analysis, with respect to the system requirements, are given in Table 4.1. The values given in Table 4.1 have been used for all the following MATLAB models and calculations.

| Parameter               | Value | Unit  | Expression                      |
|-------------------------|-------|-------|---------------------------------|
| K <sub>VCO</sub>        | 2.65  | GHz/V | Gain of the VCO                 |
| C1                      | 800   | pF    | Loop filter capacitor           |
| C2                      | 24    | pF    | Loop filter capacitor           |
| R                       | 240   | ohm   | Loop filter resistor            |
| I <sub>CP(FINE)</sub>   | 30    | μA    | Fine loop charge pump current   |
| I <sub>CP(COARSE)</sub> | 150   | μA    | Coarse loop charge pump current |
| Divide Ratio            | 16    | -     | Ratio of the divider circuit    |

Table 4.1. Numerical parameters for loop dynamics.

The transfer function,  $F_{LF}(s)$  of the loop filter is given in Eq. (4.1).

$$F_{LF}(s) = \frac{sRC_1 + 1}{s^2 RC_1 C_2 + s(C_1 + C_2)}$$
(4.1)

The Bode diagram in terms of magnitude and phase of loop filter is given in Figure 4.2 with corresponding capacitor and resistor values. Note that the zero  $(1/RC_1)$  and the pole  $((C_1 + C_2) / (RC_1C_2))$  of the transfer function determine the cut-off corners of the filter's frequency response.



Figure 4.2. Bode diagram of loop filter
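The corner frequencies of the loop filter follow directly from Eq. (4.1) and the component values in Table 4.1. The short sketch below evaluates them in Hz; the variable names are illustrative.

```python
import math

# Loop filter components from Table 4.1.
R = 240.0       # ohm
C1 = 800e-12    # F
C2 = 24e-12     # F

# Zero at 1/(R*C1) and pole at (C1 + C2)/(R*C1*C2), converted from rad/s to Hz.
f_zero = 1.0 / (2.0 * math.pi * R * C1)
f_pole = (C1 + C2) / (2.0 * math.pi * R * C1 * C2)

print(f"zero: {f_zero / 1e3:.0f} kHz")
print(f"pole: {f_pole / 1e6:.2f} MHz")
```

The zero falls at about 829 kHz, which is the loop filter cut-off frequency listed in Table 4.2, and the pole falls at about 28.5 MHz.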

Thus, the third order fine-loop open loop transfer function is given as:

$$G_{fine}(s) = \frac{K_{VCO} I_{CP-fine}(sRC_1 + 1)}{s^3 RC_1 C_2 + s^2 (C_1 + C_2)}$$
(4.2)

The corresponding third order fine-loop closed loop transfer function is derived as follows:

$$H_{fine}(s) = \frac{G_{fine}(s)}{1 + G_{fine}(s)} = \frac{K_{VCO} I_{CP-fine}(sRC_1 + 1)}{s^3 RC_1 C_2 + s^2 (C_1 + C_2) + sK_{VCO} I_{CP-fine} RC_1 + K_{VCO} I_{CP-fine}}$$
(4.3)

The corresponding fine loop open and closed loop Bode diagrams are given in Figure 4.3 and Figure 4.4, respectively.



Figure 4.3. Third order fine loop, open loop Bode diagram



Figure 4.4. Third order fine loop, closed loop Bode diagram
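The open-loop response of Eq. (4.2) can also be evaluated without MATLAB. The following sketch finds the unity-gain crossover frequency by bisection and reads the phase margin there. It assumes that $K_{VCO}$ is used in Hz/V and that the charge pump current enters Eq. (4.2) directly, as written; the function names are illustrative and this is not the thesis's own script.

```python
import cmath
import math

# Values from Table 4.1.
K_VCO = 2.65e9                        # Hz/V
I_CP = 30e-6                          # A, fine loop
R, C1, C2 = 240.0, 800e-12, 24e-12    # loop filter

def g_fine(f_hz):
    """Open-loop transfer function of Eq. (4.2) evaluated at s = j*2*pi*f."""
    s = 1j * 2.0 * math.pi * f_hz
    return K_VCO * I_CP * (s * R * C1 + 1.0) / (s ** 3 * R * C1 * C2 + s ** 2 * (C1 + C2))

def crossover_frequency(lo=1e5, hi=1e8):
    """Bisection (on a log scale) for |G| = 1; |G| decreases with frequency."""
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if abs(g_fine(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

f_c = crossover_frequency()
phase_margin = 180.0 + math.degrees(cmath.phase(g_fine(f_c)))
print(f"crossover: {f_c / 1e6:.2f} MHz, phase margin: {phase_margin:.2f} deg")
```

Under these assumptions the crossover is near 3.04 MHz and the phase margin is about 68.6°.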

The stability behaviour of the loop can also be analysed through the root locus of its poles in the complex plane as the parameter  $K_{PD-IC}.K_{VCO}$  varies. With  $K_{PD-IC}.K_{VCO} = 0$ , the loop is open,  $\zeta = \infty$ , and the two poles are given by  $s_1 = -\omega_{LPF}$  and  $s_2 = 0$ . As  $K_{PD-IC}.K_{VCO}$  increases (feedback becomes stronger),  $\zeta$  drops and the two poles, given by  $s_{1,2} = \left[-\zeta \pm \sqrt{\zeta^2 - 1}\right] \omega_n$ , move toward each other on the real axis. For  $\zeta = 1$  ( $K_{PD-IC}.K_{VCO}=\omega_{LPF}/4$ ), the two poles coincide:  $s_1 = s_2 = -\zeta \omega_n = -\omega_{LPF}/2$ . As  $K_{PD-IC}.K_{VCO}$  increases further, the two poles become complex, with a real part equal to  $-\zeta \omega_n = -\omega_{LPF}/2$ , moving in parallel with the j$\omega$  axis.

Root locus of the fine loop obtained by MATLAB calculations is given in Figure 4.5.



Figure 4.5. Root locus of fine loop
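The pole movement described above for the second-order model can be reproduced with a few lines of code. The sketch assumes the characteristic equation $s^2 + \omega_{LPF} s + K \omega_{LPF} = 0$ implied by Eq. (3.28), so that $\omega_n = \sqrt{K \omega_{LPF}}$ and $\zeta = \frac{1}{2}\sqrt{\omega_{LPF}/K}$, where K is the loop gain parameter; frequencies are normalised to $\omega_{LPF} = 1$ for illustration.

```python
import cmath

def second_order_poles(k_loop, w_lpf):
    """Closed-loop poles of s^2 + w_lpf*s + k_loop*w_lpf = 0 (k_loop > 0)."""
    w_n = (k_loop * w_lpf) ** 0.5
    zeta = 0.5 * (w_lpf / k_loop) ** 0.5
    root = cmath.sqrt(zeta * zeta - 1.0)
    return (-zeta + root) * w_n, (-zeta - root) * w_n

W_LPF = 1.0  # normalised loop filter corner
for k in (W_LPF / 16.0, W_LPF / 4.0, W_LPF):
    p1, p2 = second_order_poles(k, W_LPF)
    print(f"K = {k:.4f}: s1 = {p1:.3f}, s2 = {p2:.3f}")
```

For a loop gain below $\omega_{LPF}/4$ both poles are real, at $\omega_{LPF}/4$ they coincide at $-\omega_{LPF}/2$, and above it they form a complex pair whose real part stays at $-\omega_{LPF}/2$.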

The error transfer function of a PLL,  $H_e(s) = e(s)/\Phi_{in}$ , was derived in Eq. (3.28). This function can be used to estimate the corner frequency of the jitter tolerance curve of the clock recovery system. The jitter tolerance function of a PLL based clock recovery system, T(s), is equal to  $1 / H_e(s) = 1 / [1-H(s)]$ .

The T(s) function is given in Eq. (4.4).

$$T(s) = \frac{1}{1 - H(s)} = \frac{s^3 R C_1 C_2 + s^2 (C_1 + C_2) + s R C_1 K_{VCO} I_{CP-fine} + K_{VCO} I_{CP-fine}}{s^3 R C_1 C_2 + s^2 (C_1 + C_2)}$$
(4.4)

The corresponding jitter tolerance curve is also given in Figure 4.6.



Figure 4.6. Jitter tolerance curve of the clock and data recovery system
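The corner of the jitter tolerance curve can be estimated numerically from Eq. (4.4). The sketch below uses the identity $T(s) = 1 + G_{fine}(s)$ and, as an assumption of this example, defines the corner as the frequency at which $|T|$ has fallen to $\sqrt{2}$ (3 dB above the high-frequency floor). Component values are from Table 4.1, with $K_{VCO}$ in Hz/V.

```python
import math

# Values from Table 4.1.
K_VCO = 2.65e9                        # Hz/V
I_CP = 30e-6                          # A, fine loop
R, C1, C2 = 240.0, 800e-12, 24e-12    # loop filter

def tolerance_gain(f_hz):
    """|T(j*2*pi*f)| from Eq. (4.4), computed as |1 + G_fine|."""
    s = 1j * 2.0 * math.pi * f_hz
    g = K_VCO * I_CP * (s * R * C1 + 1.0) / (s ** 3 * R * C1 * C2 + s ** 2 * (C1 + C2))
    return abs(1.0 + g)

def tolerance_corner(lo=1e5, hi=1e7):
    """Bisection (log scale) for |T| = sqrt(2); assumed corner definition."""
    target = math.sqrt(2.0)
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if tolerance_gain(mid) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

f_corner = tolerance_corner()
print(f"jitter tolerance corner: {f_corner / 1e6:.2f} MHz")
```

With this definition the corner comes out close to 2.1 MHz.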

As seen from Figure 4.6, the corner frequency of the jitter tolerance function is around 2 MHz, while the SONET jitter tolerance specification requires a corner frequency of only 1 MHz. Similarly, the SONET jitter transfer requirement calls for a -3dB bandwidth of 2 MHz, while this circuit has an f<sub>-3dB</sub> of about 4 MHz. The trade-off between the jitter tolerance curve and the jitter transfer function was discussed in the previous chapter. Since this circuit will operate up to 3.2 Gbps, and SONET specifications are defined only for 2.488 Gbps, complete SONET compliance is not the main design concern of this study. Instead, the SONET requirements have served as a guide for determining the key parameters of the design.

Since the fine loop performs the phase alignment and tracks the jittered data, its operation has a considerable effect on the performance of the entire clock recovery system. Hence, the performance parameters in Table 4.2 have been obtained from the MATLAB model of the fine loop.

| Parameter         | Calculated Value | MATLAB Graphical Value | Expression                        |
|-------------------|------------------|------------------------|-----------------------------------|
| $f_{LPF}$         | 829 kHz          | 832 kHz                | Loop filter cut-off frequency     |
| ζ                 | 0.957            | 0.968                  | Damping factor of the loop        |
| $\Delta f_{acq}$  | 219 kHz          | 215 kHz                | Acquisition range                 |
| $f_n$             | 1.59 MHz         | 1.66 MHz               | Natural frequency                 |
| f<sub>-3dB</sub>  | 3.84 MHz         | 4.13 MHz               | Jitter (loop) bandwidth           |
| JP                | 2.37 dB          | 1.5 dB                 | Jitter peaking                    |
| JTOL              | None             | 2.08 MHz               | Jitter tolerance corner frequency |
| PM                | 64.34°           | 68.65°                 | Phase margin                      |

Table 4.2. Performance parameters obtained from fine loop MATLAB calculations.
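Two of the calculated entries can be reproduced from Table 4.1 with the textbook second-order charge pump PLL relations. The formulas used here, $\omega_n = \sqrt{K_{VCO} I_{CP} / C_1}$ (with $K_{VCO}$ in Hz/V, so that the $2\pi$ factors cancel) and $\zeta = R C_1 \omega_n / 2$, are an assumption of this sketch, since the corresponding derivation is not repeated in this chapter.

```python
import math

# Fine loop values from Table 4.1.
K_VCO = 2.65e9    # Hz/V
I_CP = 30e-6      # A
R = 240.0         # ohm
C1 = 800e-12      # F

w_n = math.sqrt(K_VCO * I_CP / C1)   # natural frequency, rad/s (C2 neglected)
f_n = w_n / (2.0 * math.pi)          # natural frequency, Hz
zeta = R * C1 * w_n / 2.0            # damping factor

print(f"f_n  = {f_n / 1e6:.2f} MHz")
print(f"zeta = {zeta:.3f}")
```

Both results (about 1.59 MHz and 0.957) agree with the calculated column of Table 4.2.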

The coarse loop has a divide-by-16 circuit for frequency division. This division ratio is embedded in the transfer functions as given below for the open and closed loop definitions accordingly.

$$G_{coarse}(s) = \frac{K_{VCO} I_{CP-coarse}(sRC_1 + 1)}{s^3 RC_1 C_2 + s^2 (C_1 + C_2)} \cdot \frac{1}{N}$$
(4.4)

$$H_{coarse}(s) = \frac{\frac{K_{VCO} \cdot I_{CP-coarse}}{N}(sRC_1 + 1)}{s^3 RC_1 C_2 + s^2 (C_1 + C_2) + \frac{sK_{VCO} \cdot I_{CP-coarse} \cdot RC_1 + K_{VCO} \cdot I_{CP-coarse}}{N}}$$
(4.5)

For N = 16, Figure 4.7 and Figure 4.8 show the open and closed loop Bode diagrams of the coarse loop, respectively.


Figure 4.7. Third order coarse loop, open loop Bode diagram



Figure 4.8. Third order coarse loop, closed loop Bode diagram

The stability behaviour of the coarse loop is also important, because the coarse loop performs the initial lock to the frequency of the data. The usual PLL stability issues apply to the coarse loop of the system. The zero of the closed loop system must be at the lowest frequency of the root locus for a stable and robust frequency lock. The root locus of the coarse loop is given in Figure 4.9.



Figure 4.9. Root locus of the coarse loop

Note that the coarse loop also has a higher-order pole at 26.5 MHz, which comes from the capacitor  $C_2$ , as in the fine loop. The pole at the lower frequency, 1.28 MHz, dominates the AC characteristics of the closed loop.

Another important parameter for the coarse loop, besides stability, is the settling time of the system for a step at its input. Since the coarse loop first locks to the data frequency and is then powered down, stable and fast settling of the VCO control voltage is desired during coarse tuning. As seen from Figure 4.10, the overshoot of the coarse loop step response is 20.9%, while the settling time is around 0.65 µs within a 2% settling margin.



Figure 4.10. Step response of the coarse loop

Since the whole clock and data recovery system is formed from two independent loops, each loop has distinct loop dynamics parameters and performance measures. The most important effect of the coarse loop on the top-level design is its settling behaviour while locking to the frequency of the data signal. The settling time and the shape of the response depend on the parameters shown in Table 4.3. These values have been obtained by manual calculations and then verified with the MATLAB graphical analyses.

| Parameter         | Calculated Value | MATLAB Graphical Value | Expression                 |
|-------------------|------------------|------------------------|----------------------------|
| ζ                 | 0.757            | 0.748                  | Damping factor of the loop |
| $\Delta f_{acq}$  | 336 kHz          | 344 kHz                | Acquisition range          |
| $f_n$             | 1.25 MHz         | 1.28 MHz               | Natural frequency          |
| f<sub>-3dB</sub>  | 2.66 MHz         | 2.77 MHz               | Jitter (loop) bandwidth    |
| JP                | 3.79 dB          | 2.08 dB                | Jitter peaking             |
| PM                | 58.72°           | 63.40°                 | Phase margin               |

Table 4.3. Performance parameters from coarse loop MATLAB calculations

#### 4.4. Simulink Modelling of Two-Loop Clock and Data Recovery

The first problem encountered when trying to simulate gigabit-range analogue blocks with low-speed blocks is the discrepancy in operating frequency. The analogue blocks run at 2-3 GHz, whereas the low speed part of the circuit operates at around 100–200 MHz. The naive approach to the simulation problem is to simply run the simulator at the highest frequency, which would be at the gigabit-range. Although possible, the simulation time would be huge for repetitive simulations such as loop dynamics estimation, which requires a certain number of iterations.

In order to avoid long transistor-level simulation times, *Simulink* models of the coarse loop, the fine loop and the top-level clock recovery circuit have been constructed. Transient analyses with the component parameters obtained from the MATLAB calculations have been performed to observe the loop characteristics.
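To give a flavour of such a behavioural transient analysis, the sketch below integrates a linearised, phase-domain model of the fine loop in the time domain. It is not the *Simulink* model of this work: it replaces the phase detector and charge pump by their average gain $I_{CP}/2\pi$, uses the loop filter of Eq. (4.1) with the Table 4.1 values, and applies a 1 rad input phase step with simple forward-Euler integration. All names are illustrative.

```python
import math

# Values from Table 4.1.
K_VCO = 2.65e9                        # Hz/V
I_CP = 30e-6                          # A, fine loop
R, C1, C2 = 240.0, 800e-12, 24e-12    # loop filter

def phase_step_response(phase_step=1.0, t_end=3e-6, dt=1e-10):
    """Output phase samples (rad) of the linearised loop for an input phase step."""
    v_ctrl = 0.0    # VCO control voltage (voltage across C2), V
    v_c1 = 0.0      # voltage across C1, V
    phi_out = 0.0   # VCO phase deviation, rad
    samples = []
    for _ in range(int(round(t_end / dt))):
        err = phase_step - phi_out
        i_cp = I_CP * err / (2.0 * math.pi)   # average charge pump current
        i_r = (v_ctrl - v_c1) / R             # current in the R-C1 branch
        d_v_ctrl = (i_cp - i_r) / C2
        d_v_c1 = i_r / C1
        d_phi = 2.0 * math.pi * K_VCO * v_ctrl
        v_ctrl += d_v_ctrl * dt
        v_c1 += d_v_c1 * dt
        phi_out += d_phi * dt
        samples.append(phi_out)
    return samples

response = phase_step_response()
overshoot_pct = (max(response) - 1.0) * 100.0
print(f"final phase: {response[-1]:.4f} rad, overshoot: {overshoot_pct:.1f} %")
```

A model of this kind runs in a fraction of a second, which is what makes iterating over loop parameters practical compared with transistor-level simulation.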

#### 4.4.1. Coarse Loop Modelling

The *Simulink* model of the coarse loop and the frequency detector model are shown in Figure 4.11 and Figure 4.12 respectively.



Figure 4.11. Simulink model of the coarse loop

The loop filter of the system has been modelled using the s-domain transfer function block of *Simulink*. The VCO of the coarse loop has been taken from the *Simulink* model libraries, and its gain has been set to 2.65 GHz/V, which is the actual value of the gain in the transistor-based design. As shown in Figure 4.11, the VCO output is divided by 16, while the reference clock is set to the operating VCO clock frequency / 16.



Figure 4.12. Simulink model of frequency detector

After various simulations and several iterations, the final values of the loop parameters have been determined. The following figures have been captured from simulations performed with the final loop parameters at an operating frequency of 3.2 GHz. Figure 4.13 shows the VCO control voltage variation during frequency locking.



Figure 4.13. VCO control voltage variation for coarse loop only while frequency locking at 3.2 GHz

The reference and divided VCO clock signals and the eye diagram of the VCO clock after lock are shown in Figure 4.14.



Figure 4.14. Reference clock (@ 200 MHz) and divided VCO clock signals with the eye diagram of VCO clock after frequency lock at 3.2 GHz

The fast Fourier transform (FFT) of the VCO clock signal after lock is given in Figure 4.15. Note that the spectrum of the clock peaks at 3.2 GHz, which confirms the frequency lock.



Figure 4.15. Spectrum of the 3.2 GHz VCO clock after frequency lock

## 4.4.2. Fine Loop Modelling

The *Simulink* model of the fine loop is constructed similarly to the coarse loop model. The fine loop model and the phase detector model are shown in Figure 4.16 and Figure 4.17, respectively.



Figure 4.16. Simulink model of the fine loop



Figure 4.17. Simulink model of phase detector

Standalone fine loop simulations with 3.2 Gbps input data have been performed in order to observe how phase alignment is carried out. Since those simulations use only the fine loop, without frequency acquisition, an initial frequency has been assigned to the VCO output at start-up: the VCO starts oscillating at 3.18 GHz.

The following figures are the result of those simulations. Figure 4.18 shows the control voltage of the VCO with an initial oscillation frequency of 3.18 GHz. Note that the system first tries to match the VCO frequency to the data rate and then performs phase alignment.



Figure 4.18. VCO control voltage variation for fine loop only while phase locking at 3.2 Gbps data

The data and recovered VCO clock signals and the eye diagram of the VCO clock after phase lock are shown in Figure 4.19. Note that the rising edge of the clock is in the middle of the data eye.



Figure 4.19. VCO clock (@ 3.2 GHz) and input data signals with the eye diagram of VCO clock after phase lock at 3.2 Gbps data

## 4.4.3. Two-Loop Clock and Data Recovery Modelling

The coarse and fine loop models described in the previous sections have been combined with a lock detector to form the two-loop clock and data recovery model. Both loops use the same loop filter, whereas each loop has its own charge pump with different gain and current values.

At start up, the coarse loop provides fast locking to the system frequency with the help of a reference clock. After the VCO clock reaches a proximity of system frequency, lock detector toggles the "lock" signal indicating that it is time for fine loop to take over the control of the phase locking. Fine loop tracks the phase of the generated clock with respect to the data and aligns the VCO clock such that its rising edge is in the middle of data eye.

The *Simulink* model of two-loop clock and data recovery architecture is shown in Figure 4.20 with its lock detector model.



Figure 4.20. Simulink model of two-loop architecture

Two-loop clock recovery top-level simulations have been performed with the design parameters, such as filter component values, charge pump current values, which were obtained from standalone coarse and fine loop analyses.

The following screenshots have been obtained from *Simulink* analyses for a 3.2 Gbps input data rate. Figure 4.21 shows the VCO control voltage variation and the "lock" signal generated by the lock detector. Note that the coarse loop stops operating when the lock signal is raised, and the fine loop takes over. Since frequency locking is a coarse process, the control voltage of the VCO varies more widely during the frequency detection part of the data recovery operation. Phase locking, however, requires more precise and finer adjustment, so the VCO control voltage is much steadier during the phase alignment part.



Figure 4.21. VCO control voltage variation and lock signal for top-level clock recovery while recovering 3.2 Gbps data.

The 3.2 Gbps input data and recovered VCO clock signals, and the eye diagram of the VCO clock after phase lock, are shown in Figure 4.22 and Figure 4.23, respectively. Note that the rising edge of the clock is in the middle of the data eye, and that the peak-to-peak jitter amplitude on the VCO clock is less than the jitter during frequency locking (Figure 4.14).



Figure 4.22. 3.2 Gbps data in and sampling VCO clock signals



Figure 4.23. Eye diagram of the VCO clock after phase alignment

Figure 4.24 shows the spectrum of the VCO clock after phase alignment. Compared with Figure 4.15, the sidebands of the clock after phase alignment are narrower than those of the clock signal after frequency lock.



Figure 4.24. Spectrum of the VCO clock after phase alignment at 3.2 Gbps

The simulations and analyses of the MATLAB and *Simulink* models give reliable and expected results. The next step of the design is to realize the whole system at transistor level, using the component parameters and insights obtained from those analyses. As will be seen in the following chapters, the transistor-level design is based on the MATLAB and *Simulink* models.

This chapter described several concepts that allow full system-level simulation of a synchronization block. In particular, the impairments of PLL based clock and data recovery have been discussed. In the first part of the chapter, the problem of determining the loop dynamics and their parameters has been examined. The second part of the chapter deals with the *Simulink* modelling and simulation of the two-loop clock and data recovery circuit with the parameters obtained from the MATLAB calculations.

A detailed analysis of each sub-block forming the whole system is given in the next chapter.

# 5. ARCHITECTURE COMPONENTS: GENERAL TECHNOLOGY REVIEW & COARSE LOOP

## **5.1. Introduction**

This chapter presents the detailed circuit design issues of the coarse loop of the fully integrated clock and data recovery circuit, targeted for the system presented in Chapter 4. The clock and data recovery circuit was fabricated in the UMC 0.13µm, single-poly, 8-metal digital CMOS process. All transistor-level schematic entry has been done in the *Cadence* design environment, and all simulations have been performed using Analogue Artist and the *Spectre* simulator. The following sections also examine the UMC 0.13µm CMOS technology in terms of its effects on circuit performance.

In addition to the given performance specifications, many of the design decisions have been motivated by several other considerations. These include substrate noise injection and common-mode noise immunity, issues of particular importance for highly integrated high-speed analogue circuits. These two considerations are also discussed in this chapter.

Since the circuit operating frequency reaches up to 3.2 GHz, special attention has been paid to the design of the high-speed parts of the system, and special techniques have been used in those sensitive sub-blocks.

## 5.2. General Considerations

The following subsections deal with substrate noise injection and common-mode noise immunity, and give the reader fundamental information on these issues.

#### 5.2.1. Substrate Current Injection

Substrate current injection is the result of charging and discharging capacitances to the bulk. Examples of these parasitic capacitors include the drain-bulk and source-bulk junction capacitors. An example of the flow of substrate current is illustrated in Figure 5.1. As the voltage across the drain-bulk junction varies, the depletion width is modulated, causing currents to flow into the substrate. The substrate current during injection is given in Eq. (5.1).

$$i_{sub} = C_{db} \frac{dv_{out}}{dt}$$
(5.1)

One method to reduce the substrate currents is to use differential logic styles. For fully differential circuit topologies, such as source-coupled logic (SCL), the substrate current is cancelled to first-order [11].

$$i_{sub} = C_{db1} \frac{dv_{out}}{dt} + C_{db2} \frac{d\overline{v}_{out}}{dt}$$
(5.2)

The amount of cancellation is limited because the junction capacitances are nonlinear and depend upon the bias voltage across it. Pseudo-differential topologies are also a significant improvement over single-ended, rail-to-rail CMOS. To prevent substrate coupling into the sensitive high-speed blocks, differential signalling has been used in this design. Having a low resistance backside contact can also reduce the impact of substrate currents on the high-speed sensitive blocks.
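The first-order cancellation can be illustrated with a few numbers. All values below are hypothetical (junction capacitance, output swing, edge time and a 5% capacitance mismatch standing in for the bias-dependent nonlinearity); they are not extracted from the UMC process. The sketch compares a single-ended node with a differential pair whose outputs move in opposite directions.

```python
def substrate_current(c_db, dv_dt):
    """Injected substrate current of one node, i = C_db * dv/dt (Eq. 5.1)."""
    return c_db * dv_dt

# Hypothetical values, for illustration only.
C_DB = 2e-15       # drain-bulk capacitance, F
SWING = 0.4        # output voltage swing, V
T_EDGE = 50e-12    # transition time, s
MISMATCH = 0.05    # relative capacitance difference between the two outputs

dv_dt = SWING / T_EDGE

i_single = substrate_current(C_DB, dv_dt)
# Differential pair: the complementary output moves with the opposite slope.
i_diff = substrate_current(C_DB, dv_dt) + substrate_current((1.0 - MISMATCH) * C_DB, -dv_dt)

print(f"single-ended: {i_single * 1e6:.1f} uA")
print(f"differential: {i_diff * 1e6:.1f} uA")
```

With these numbers the injected current drops from 16 µA to 0.8 µA, the residue being set entirely by the assumed capacitance mismatch.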



Figure 5.1. Examples of substrate current injection (a) CMOS (b) SCL [11]

#### 5.2.2. Common-mode Noise Immunity

During switching transients, digital circuits produce large spikes in the current drawn from the power supplies. Due to the inductance in the supplies, the supply voltage can easily bounce by a few hundred millivolts. Since many analogue signals are much smaller than this bounce, the circuits must have a high power supply rejection ratio (PSRR). Another source of common-mode noise is capacitive coupling to long interconnect. Therefore, for mixed-signal designs, all circuits require a large amount of common-mode rejection. To achieve this, fully differential topologies are required. Therefore, all circuit blocks (phase detector, charge pump, loop filter, VCO and divider) are implemented as differential or pseudo-differential circuits in this design.

## 5.2.3. Differential vs. Single-Ended Signalling

Differential signalling requires two wires and pins per channel, whereas single-ended signalling requires only one wire and pin per channel. Due to self-induced power supply noise, however, differential signalling usually requires less than twice as many pins compared to single-ended signalling, as explained below. Although less efficient in terms of pin utilization, differential signalling has many advantages which make it more robust and better suited for a large digital system. These are described in more detail below.

- Self-induced power-supply noise: A differential circuit, unlike a single-ended circuit, always draws a constant amount of current from the power supplies, resulting in very little AC power supply current. The stable power supply current draw helps reduce power supply noise due to wire inductance (i.e. Ldi/dt noise). A differential driver always sinks a constant amount of current, greatly reducing the di/dt noise. As technology scales and supply voltage decreases, this advantage will only become more important.

- References: A differential signal serves as its own receiver reference. Unlike the transmitter-generated reference, which is shared among a group of single-ended lines, the differential lines are usually tightly coupled (or even twisted) and easily make many noise sources common mode to the receiver.

- Signal swing: The voltage difference between a 1 and a 0 for differential signalling (the differential swing) is twice the value for single-ended signalling (the single-ended swing). For many drivers whose single-ended swings are limited, differential signalling can provide more noise margin.

In summary, differential signalling creates less noise and has better noise immunity compared to single-ended signalling. Its disadvantage, namely the pin inefficiency, will become less significant as bit rate increases and supply voltage decreases.

## 5.2.4. Technology and Transistors

The technology supplier (UMC) offers two versions of 0.13µm CMOS transistors: a 3.3 V version and a 1.2 V version. The first is used for I/O circuits, and the second generally for the core design. These transistors may be implemented as either high-speed (HS) or low-leakage (LL) devices. This design uses high-speed transistors with high mobility values. It is therefore accepted beforehand that leakage currents will be high, around a few hundred nanoamperes in the worst case.

|              | $u_0 (m^2/Vs)$ | t <sub>ox</sub> (nm) |
|--------------|----------------|----------------------|
| 1.2V HS NMOS | 4.28e-2        | 3.15                 |
| 1.2V HS PMOS | 1.24e-2        | 3.30                 |

Mobility  $(u_0)$  and gate oxide thickness  $(t_{ox})$  values of 1.2V high-speed NMOS and PMOS transistors, which will be used for manual calculations, are given in Table 5.1.

Table 5.1. Mobility and oxide thickness values for transistors used in the design
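As an illustration of the manual calculations mentioned above, the values in Table 5.1 can be converted into a gate-oxide capacitance and a process transconductance. The sketch below assumes a SiO2 relative permittivity of 3.9 and the usual relation $C_{ox} = \varepsilon_{ox}/t_{ox}$; neither is stated in the text, and the function names are illustrative only.

```python
# Sketch: oxide capacitance and mu0*Cox from the Table 5.1 values.
# Assumptions (not from the thesis): eps_r(SiO2) = 3.9, C_ox = eps_ox / t_ox.

EPS0 = 8.854e-12      # vacuum permittivity, F/m
EPS_R_SIO2 = 3.9      # assumed relative permittivity of the gate oxide


def oxide_capacitance(t_ox_nm):
    """Gate-oxide capacitance per unit area in F/m^2."""
    return EPS_R_SIO2 * EPS0 / (t_ox_nm * 1e-9)


def process_transconductance(mu0, t_ox_nm):
    """mu0 * C_ox in A/V^2, with mu0 given in m^2/Vs."""
    return mu0 * oxide_capacitance(t_ox_nm)


kn = process_transconductance(4.28e-2, 3.15)   # 1.2V HS NMOS
kp = process_transconductance(1.24e-2, 3.30)   # 1.2V HS PMOS
print(f"NMOS mu0*Cox = {kn * 1e6:.0f} uA/V^2")
print(f"PMOS mu0*Cox = {kp * 1e6:.0f} uA/V^2")
```

These first-order figures are only a starting point; as the following paragraphs argue, short-channel devices deviate strongly from the simple square-law model.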

Since some of the high-speed and sensitive blocks of the system use current-mode differential signalling, the channel length modulation coefficient $\lambda$ [12] becomes another important design variable. In our case, $\lambda$ is derived by simulation, by connecting transistors in turn to the test setup in Figure 5.2. A parametric DC sweep of the $I_{DS}-V_{DS}$ curves revealed that the channel length has a great effect on $\lambda$. The result of the DC sweep is given in Figure 5.3.



Figure 5.2. Channel modulation coefficient simulation setup

Analytically, the channel length modulation coefficient $\lambda$ is defined as in Eq. (5.3) [13].



$$\lambda = \frac{1}{L} \cdot \sqrt{\frac{2\varepsilon_s}{qN_A}} \cdot \frac{\sqrt{V_{DS} - V_{DS(sat)} + \phi_T} - \sqrt{\phi_T}}{V_{DS}} \tag{5.3}$$

Figure 5.3. I-V curve of an NMOS with the change of W and L

Figure 5.3 shows that the I-V behaviour of the transistor changes dramatically below a channel length of 0.2 µm. In order to reach higher frequencies, the minimum length allowed by the technology, 0.12 µm, is used for the switching transistors throughout the design. Therefore, $V_{DS}$ will have an important effect on the $I_{DS}$ of these transistors. However, transistors in paths that transmit signals by current instead of voltage have channel lengths larger than 0.12 µm in order to make the current independent of $V_{DS}$ variations.

These analyses make it clear that the ordinary $I_{DS}$ expressions for the linear and saturation regions of a transistor (given in Eq. 5.4 and Eq. 5.5 respectively [14]) are no longer reliable for detailed analysis in deep sub-micron technologies. In particular, if the channel length of a transistor is less than 0.2 µm, using those analytical equations could give unrealistic results. Thus, instead of comprehensive mathematical calculations for parameter estimation, simulation-based analyses have been preferred during the rest of the circuit design.

$$I_{DS} = \frac{W}{L} \mu C_{ox} \left[ (V_{GS} - V_T) V_{DS} - \frac{1}{2} V_{DS}^2 \right] (1 + \lambda V_{DS}) \quad \text{while } V_{GS} - V_T \ge V_{DS} \tag{5.4}$$

$$I_{DS} = \frac{1}{2} \frac{W}{L} \mu C_{ox} [V_{GS} - V_T]^2 \frac{(1 + \lambda V_{DS})}{1 + \lambda (V_{GS} - V_T)} \quad \text{while } V_{GS} - V_T \le V_{DS} \tag{5.5}$$

Another analysis has been performed on the 1.2 V high-speed NMOS and PMOS transistors in order to examine the I-V curves of both N-type and P-type MOS devices. In these simulations, the widths and lengths of the transistors are constant while the gate-source voltage is parametrically swept from zero to $V_{DD}$ (1.2 V). The constant channel widths for the N and P devices are $W_n = 1.7\mu m$ and $W_p = 4.6\mu m$ respectively, while $L_n = L_p = 0.12\mu m$.

For the high-speed 1.2 V NMOS and PMOS transistors, the following I-V curves have been obtained (Figure 5.4 and Figure 5.5).







One analytical way to calculate $\lambda$ from these curves is given in (5.6) [12].

$$\frac{I_{DS1}}{I_{DS2}} = \frac{(1+\lambda \cdot V_{DS1})}{(1+\lambda \cdot V_{DS2})} \tag{5.6}$$

Substituting values taken from the curves into (5.6) gives $\lambda$ values ranging from $0.5\,V^{-1}$ to $2\,V^{-1}$, which are quite large. Another way of deriving $\lambda$ uses similar triangles: the slope lines of the curves in the saturation region are extended until they intersect the x-axis. If this intersection point is called $V_X$, then $\lambda$ can be found according to Eq. (5.7). This equation yields $\lambda \approx 1\,V^{-1}$.

$$\lambda = -\frac{1}{V_X} \tag{5.7}$$
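The two graphical estimates of Eq. (5.6) and Eq. (5.7) can be sketched as follows. The sample points are synthetic (generated from $I = I_0(1+\lambda V)$ with $I_0 = 100\,\mu A$ and $\lambda = 1\,V^{-1}$), not values read from the simulated curves, and the function names are illustrative.

```python
# Sketch: two ways of estimating lambda from two points on one I-V curve
# in the saturation region. The sample points are synthetic, not thesis data.


def lambda_from_ratio(i1, v1, i2, v2):
    """Solve I1/I2 = (1 + lam*V1)/(1 + lam*V2) for lam (Eq. 5.6)."""
    return (i1 - i2) / (i2 * v1 - i1 * v2)


def lambda_from_intercept(i1, v1, i2, v2):
    """Extend the line through both points to the x-axis (Eq. 5.7)."""
    slope = (i2 - i1) / (v2 - v1)
    v_x = v1 - i1 / slope          # x-axis intercept, a negative voltage
    return -1.0 / v_x


# Synthetic curve I = I0 * (1 + lam * V) with I0 = 100 uA and lam = 1 /V
p1 = (160e-6, 0.6)   # (I_DS1, V_DS1)
p2 = (220e-6, 1.2)   # (I_DS2, V_DS2)
lam_ratio = lambda_from_ratio(*p1, *p2)
lam_intercept = lambda_from_intercept(*p1, *p2)
print(lam_ratio, lam_intercept)
```

On an ideal straight-line characteristic both methods agree; on simulated short-channel curves they can differ depending on which points are chosen, which is consistent with the range of values reported above.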

The Analog Artist *Spectre* simulator provides numerical outputs that can be used to calculate $\lambda$. A DC simulation has been run, and the DC operating-point values of $g_{DS}$ and $I_{DS}$ have been extracted at $V_{GS}=0.6V$ and $V_{DS}=1.2V$. Assuming that (5.8), and therefore (5.9), is valid, $\lambda$ is again calculated as $\lambda \approx 1$ V<sup>-1</sup>.

$$r_o = \frac{1}{\lambda \cdot I_D} \tag{5.8}$$

$$\lambda = \frac{g_{ds}}{I_D} \tag{5.9}$$
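The operating-point method of Eq. (5.8) and Eq. (5.9) amounts to a single division. The sketch below uses placeholder values for $g_{ds}$ and $I_D$, since the simulated operating-point numbers are not listed in the text.

```python
# Sketch: lambda from DC operating-point values (Eq. 5.8 and Eq. 5.9).
# The g_ds and I_D numbers are placeholders, not values from the thesis.


def lambda_from_op_point(g_ds, i_d):
    """lambda = g_ds / I_D, with g_ds in siemens and I_D in amperes."""
    return g_ds / i_d


def output_resistance(lam, i_d):
    """r_o = 1 / (lambda * I_D), in ohms."""
    return 1.0 / (lam * i_d)


g_ds, i_d = 200e-6, 200e-6            # hypothetical operating point
lam = lambda_from_op_point(g_ds, i_d)
r_o = output_resistance(lam, i_d)
print(lam, r_o)
```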

These large $\lambda$ values show that at sub-micron geometries such as L = 0.12 µm, the general equations are no longer valid due to short-channel effects [13].



Figure 5.5. I-V curve of a PMOS with the change of $V_{GS}$ (W=4.6$\mu$m, L=0.12$\mu$m)

Another concern with the technology is the PMOS-to-NMOS width ratio required for good symmetry in differential and inverter-based circuits. The ratio of the widths should be selected carefully in order to obtain $V_{in} = V_{out} = V_{DD}/2$ and $t_{phl}$ (fall time) $= t_{plh}$ (rise time) at the same time in the symmetry-sensitive blocks. The mobility ratio of the N and P devices is $\mu_{0n} / \mu_{0p} = 4.28 / 1.24 \approx 3.45$. However, if this ratio is used directly, both criteria cannot be satisfied at the same time due to non-linearities in deep sub-micron technology.

The first criterion ($V_{in} = V_{out} = V_{DD}/2$) is satisfied with a ratio of 2.4. With this ratio, $V_{in} = V_{out} \approx 601 \text{mV}$ under typical conditions. The second criterion ($t_{phl} = t_{plh}$) is satisfied when the ratio is 3.4. As a compromise between the two, a $W_p / W_n$ ratio of 2.7 has been chosen. At this point, $V_{in} = V_{out} \approx 610 \text{mV}$.

This ratio changes with the load and with the exact transistor widths; for design and layout simplicity, the ratio has not necessarily been kept constant throughout the thesis.
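The ratio selection above is simple arithmetic and can be sketched as below. The three ratios are taken from the text; the helper function is illustrative. Note that the 1.7 µm / 4.6 µm pair used earlier for the I-V sweeps corresponds to a ratio of about 2.7.

```python
# Sketch: PMOS width for a given NMOS width and Wp/Wn ratio.

RATIO_VM = 2.4      # ratio giving Vin = Vout = VDD/2 (from the text)
RATIO_DELAY = 3.4   # ratio giving tphl = tplh (from the text)
RATIO_CHOSEN = 2.7  # compromise adopted in the design (from the text)


def pmos_width(w_n_um, ratio=RATIO_CHOSEN):
    """PMOS width in micrometres for an NMOS width in micrometres."""
    return ratio * w_n_um


w_p = pmos_width(1.7)
print(f"Wp = {w_p:.2f} um for Wn = 1.7 um")
```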

## 5.2.5. Case Definitions

The proposed design should meet the specifications under all circumstances; that is, the circuit should work properly within certain limits of process, voltage and temperature (PVT) variation. The schematic design has been carried out considering all combinations of these variations. Table 5.2 lists the values for the worst, typical and best cases.

|                                               | Worst Case    | Typical Case      | Best Case     |
|-----------------------------------------------|---------------|-------------------|---------------|
| Process variation<br>(N and P devices)        | slow – slow   | typical - typical | fast - fast   |
| Supply voltage<br>(core voltage – IO voltage) | 1.08V – 2.97V | 1.2V – 3.3V       | 1.32V – 3.63V |
| Operating temperature                         | 120°C         | 27°C              | -40°C         |

Table 5.2. Corner case definitions
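For scripted corner simulations, Table 5.2 can be written down as a small data structure. The sketch below is one possible encoding (the dictionary layout and names are assumptions); it also checks that the supply limits correspond to a ±10% spread around the nominal values.

```python
# Sketch: Table 5.2 as a data structure, with a check of the supply spread.

NOMINAL = {"core": 1.2, "io": 3.3}   # nominal supply voltages in volts

CORNERS = {
    "worst":   {"process": "slow-slow", "core": 1.08, "io": 2.97, "temp_c": 120},
    "typical": {"process": "typical-typical", "core": 1.2, "io": 3.3, "temp_c": 27},
    "best":    {"process": "fast-fast", "core": 1.32, "io": 3.63, "temp_c": -40},
}


def supply_deviation(corner, rail):
    """Relative deviation of a supply rail from its nominal value."""
    return CORNERS[corner][rail] / NOMINAL[rail] - 1.0


for name in CORNERS:
    spread = [round(supply_deviation(name, rail) * 100) for rail in ("core", "io")]
    print(name, spread, "% (core, io)")
```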

## 5.3. Design of Coarse Loop Components:

Under this heading, detailed descriptions and design information for the coarse loop control components can be found. These control components are the phase-frequency detector, differential charge pump, common-mode feedback circuit, frequency divider and lock detector.

## 5.3.1. Design of Phase-Frequency Detector

In the feedback control path, the phase detector converts the phase/frequency difference into an electrical signal that can be processed by later stages. In the loop, the divided output of the VCO can be either leading or lagging the reference clock phase. When the divided VCO output lags the reference, the delay per VCO stage needs to be shortened; conversely, when the divided VCO output leads the reference, the delay needs to be lengthened (see Figure 5.6 [15]). Therefore, the phase detector needs to distinguish not only the absolute phase difference, but also the phase relationship.



Figure 5.6. Two cases for phase detector to resolve

At start-up of the data recovery operation, the coarse loop first needs to lock to the frequency corresponding to the current data rate before phase locking. To achieve this, the control loop should also sense the frequency difference between the divided VCO clock and the reference clock. This capability is also required in order to expand the acquisition range of the whole clock recovery device. A solution that satisfies both requirements is to use a phase-frequency detector (PFD).

Phase-frequency detectors are able to discriminate frequency differences when the loop is not locked. When in lock, they behave like typical phase detectors, outputting a signal linearly dependent upon the phase difference of the inputs. In order for PFDs to detect frequency differences, they must have memory.

The state diagram and ideal waveforms of one phase-frequency detector are given in Figure 5.7 [11]. This PFD contains two state variables, U and D, and four states. All transitions but one correspond to the rising edges of the inputs. On the rising edge of V<sub>1</sub>, the state variable U is set. Similarly, on the rising edge of V<sub>2</sub>, the state variable D is set. When both have occurred, both state variables are reset after a short delay of $\Delta T$.



Figure 5.7. Phase-frequency detector state diagram and ideal waveforms

The phase detector output is the difference between U and D. When the rising edge of $V_1$ leads that of $V_2$, the net phase detector output is positive, forcing the VCO to a higher frequency. Likewise, when the order of the edges is reversed, the phase detector output is negative, slowing down the VCO. The waveforms of Figure 5.7 also show that even if the detector starts in an incorrect state, the phase continues to accumulate until the correct polarity is produced. Once the loop has acquired lock, the duration of the PFD output pulse is equal to the time difference between the input rising edges. Therefore, a linear transfer characteristic is achieved. The transfer characteristic of the PFD is given in Figure 5.8. Note that it is not periodic, and when the phase difference is too large, the PFD enters the frequency discriminator mode, where the polarity of the phase detector output does not change.
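The state machine described above can be sketched as a small behavioural model. This is a simplified illustration, not the implemented circuit: times are plain numbers (picoseconds here), the function names are hypothetical, and edges arriving during the reset delay are not modelled.

```python
# Sketch: behavioural model of a three-state PFD (U/D state variables).
# Simplification: edges arriving during the reset delay are not modelled.


def pfd_pulses(edges_v1, edges_v2, reset_delay=0):
    """Return (up, dn) lists of (start, end) pulse intervals.

    edges_v1, edges_v2: rising-edge times of the two inputs (any unit).
    reset_delay: delay between both outputs being set and the reset.
    """
    events = sorted([(t, 1) for t in edges_v1] + [(t, 2) for t in edges_v2])
    u_start = d_start = None
    up, dn = [], []
    for t, source in events:
        if source == 1 and u_start is None:
            u_start = t                      # rising edge of V1 sets U
        elif source == 2 and d_start is None:
            d_start = t                      # rising edge of V2 sets D
        if u_start is not None and d_start is not None:
            t_reset = t + reset_delay        # both set: reset after the delay
            up.append((u_start, t_reset))
            dn.append((d_start, t_reset))
            u_start = d_start = None
    return up, dn


def net_output(up, dn):
    """Total U-high time minus total D-high time."""
    return sum(e - s for s, e in up) - sum(e - s for s, e in dn)


# Equal frequencies (5000 ps period), V1 leading by 500 ps
up, dn = pfd_pulses([0, 5000, 10000], [500, 5500, 10500], reset_delay=100)
net_locked = net_output(up, dn)

# V1 faster than V2: the net output keeps one polarity
up, dn = pfd_pulses([0, 4000, 8000, 12000, 16000], [1000, 6000, 11000, 16000])
net_freq = net_output(up, dn)
print(net_locked, net_freq)
```

In the first case the net output equals the accumulated 500 ps lead over three cycles; in the second case U stays high for progressively longer intervals, which is the frequency-discriminator behaviour described in the text.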



Figure 5.8. PFD transfer characteristic

A common problem for PFDs is the so-called "dead-zone" problem [16]. The delay that occurs before both state variables are reset is used to avoid it. To illustrate the problem, consider the case when the static phase error is zero and both edges occur simultaneously. Without the delay, the ideal waveform would be an infinitely thin pulse on one of the outputs. Due to the finite rise and fall times of digital circuits, such a narrow pulse cannot fully develop, so the phase detector gain enters a non-linear low-gain region, which is obviously undesirable.

A behavioural block diagram of this dead-zone-free PFD is shown in Figure 5.9. The inputs drive the edge-triggered D flip-flops, which set the corresponding output signals. When both signals are on, the flip-flops are reset after a short delay to avoid the dead-zone problem.



Figure 5.9. Block diagram of dead-zone free PFD

The delay for the reset signal must be long enough to allow full switching of the UP and DOWN pulses and the charge pump switches. The length of the pulse cannot be made arbitrarily long, because it limits the maximum operating frequency of the PFD. Maximum operating frequency of the PFD can be determined by [17]

$$f_{\rm max} = \frac{1}{2\Delta T} \tag{5.10}$$

where  $\Delta T$  is the pulse width.

In this application, a very similar PFD structure has been used in the coarse loop control mechanism. The PFD used in the coarse loop does not need to be very precise, since only frequency lock is of concern. However, the detector must guarantee frequency lock, which means that a dead-zone may exist but must not be so large as to prevent frequency equalization. The *Cadence* schematic view of the PFD used in the design is shown in Figure 5.10.



Figure 5.10. Cadence schematic view of the designed PFD circuit

Note that a NOR gate has been used to allow the system to power down the PFD if needed. In addition to the NOR gate, an inverter has been used in the reset path to provide enough dead-zone margin under different temperature and process conditions.

The PFD in Figure 5.10 can be divided into two main parts. The first part is the ordinary phase-frequency detection core, which is described in detail at the beginning of this chapter. All the logic gates have been taken from the UMC 0.13 µm standard-cell library. This PFD is designed for operating frequencies between 2.4 GHz / 16 = 150 MHz and 3.2 GHz / 16 = 200 MHz. Since the PFD works in the low-speed part of the system, standard-cell library components can be used instead of full-custom cells for reliable operation. The flip-flops used in the PFD core are rising-edge-triggered D-type flip-flops with active-low reset inputs.

The second part of the PFD block is the "PFD\_zero" block, whose inputs are the outputs of the ordinary PFD. As mentioned previously, the phase-frequency detector output is the difference between the UP and DN signals generated by the PFD core. These signals then feed the charge pump input transistors, and the resulting difference current is sourced to or sunk from the loop filter. The PFD\_zero block performs this subtraction inside the PFD, so that only one of the UP and DN output signals toggles. If the duration of UP\_in is larger than that of DN\_in, the PFD\_zero block generates a pulse with a duration of UP\_in – DN\_in at its UP pin while the DN pin stays at logic zero. Conversely, if the duration of DN\_in is larger than that of UP\_in, a pulse with a duration of DN\_in – UP\_in is generated at the DN pin. Hence, harmful current injection during transistor switching in the charge pump is prevented. The *Cadence* schematic view of the PFD\_zero circuit is given in Figure 5.11.



Figure 5.11. Cadence schematic view of PFD\_zero circuit
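At the level of pulse widths, the behaviour of the PFD\_zero block described above reduces to a subtraction with only one active output. The sketch below models just that input-output relation; it says nothing about the gate-level implementation in Figure 5.11, and the function name is illustrative.

```python
# Sketch: pulse-width behaviour of the PFD_zero block (behavioural only).


def pfd_zero(up_in, dn_in):
    """Return (up_out, dn_out) pulse widths; only one output is non-zero."""
    diff = up_in - dn_in
    return (diff, 0) if diff > 0 else (0, -diff)


print(pfd_zero(700, 200))   # UP_in longer: pulse appears on UP only
print(pfd_zero(200, 700))   # DN_in longer: pulse appears on DN only
```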

Since the rest of the system uses differential signalling, the PFD outputs should also be differential. Therefore, at the output of the PFD\_zero circuit, the UP and DN signals are inverted to form the complementary signals UPN and DNN. A delay-matching method has been used for this purpose: since an inverter and a digital buffer have different propagation delays, the time difference is compensated by a transmission gate that is always on. All of the logic gates used in the PFD\_zero block have also been taken from the UMC 0.13 µm standard-cell library. Only the transmission gates and output inverters are manually designed to match the delays on both differential signal paths.

*Spectre* simulation result with 200 MHz reference and 200 MHz divided VCO clock while having a phase difference of 500 ps (typical conditions) is given in Figure 5.12.



Figure 5.12. Spectre simulation result of PFD circuit at 200 MHz

Note that when the divided VCO clock is advanced, only DN and DNN are active, and when the reference clock is advanced, only UP and UPN are active. The duration of the generated UP and DN pulses is around 500 ps, which is the phase difference between the inputs. Current spikes that would arise in the charge pump from the subtraction of the UP and DN signals are eliminated inside the PFD by means of the PFD\_zero circuitry. From the top-level system point of view, this method reduces the static phase error of the overall system.

The transfer characteristic of the designed PFD has been extracted by sweeping the phase of one input over $2\pi$ radians with respect to the other input, while each output pin drives a 10 fF load. Results for best case (BC), worst case (WC) and typical case (TYP) conditions are combined in Figure 5.13.



Figure 5.13. Simulated PFD transfer characteristic by Spectre

Note that the TYP and BC simulations give results quite close to each other; however, the WC simulation results lie considerably further from the origin. In the typical case, the dead-zone of the PFD is around $\pm 200$ ps. This dead-zone value is well within the safety margin of the coarse loop's working principle.

In the worst case, the maximum delay time (dead-zone) is measured as around $\pm 500$ ps, which is required for the charge pump to turn on and off completely. Therefore, from Eq. (5.10), the maximum frequency at which the phase-frequency detector can operate is 500 MHz, which is much larger than the 200 MHz maximum frequency of the reference clock.
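The frequency limit follows directly from Eq. (5.10). In the sketch below, the quoted 500 MHz is reproduced by reading the $\pm 500$ ps worst-case window as a total pulse width of $\Delta T = 1$ ns; this reading is an assumption made here for illustration.

```python
# Sketch: maximum PFD operating frequency from Eq. (5.10).
# Assumption: the +/-500 ps window is taken as a total width of 1 ns.


def pfd_max_frequency(delta_t_s):
    """f_max = 1 / (2 * delta_T), with delta_T in seconds."""
    return 1.0 / (2.0 * delta_t_s)


f_max = pfd_max_frequency(1e-9)
f_ref_max = 200e6                      # highest reference frequency in the text
print(f"f_max = {f_max / 1e6:.0f} MHz, margin = {f_max / f_ref_max:.1f}x")
```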

The current drawn from $V_{DD}$ by the PFD under typical conditions with a 200 MHz reference clock is 125 $\mu$A, which corresponds to a power consumption of 150 $\mu$W with a $V_{DD}$ of 1.2 V.

## 5.3.2. Design of Differential Charge Pump

In the low-pass filter considered in Figure 3.4, the average value of the PFD output is obtained by depositing charge onto a capacitor during each phase comparison and allowing the charge to decay afterwards. In a charge pump, on the other hand, there is negligible decay of charge between phase comparison instants, leading to interesting consequences.

A three-state charge pump can best be studied in conjunction with a three-state phase-frequency detector (Figure 5.14). The charge pump itself consists of two switched current sources driving a capacitor. (It is assumed that $S_1$ and $I_1$ are implemented with PMOS devices and $S_2$ and $I_2$ with NMOS devices.) Note that with a pulse of width T on $Q_A$, $I_1$ deposits a charge equal to $I \cdot T$ on C. Thus, if $\omega_A > \omega_B$, or $\omega_A = \omega_B$ but A leads B, then positive charge accumulates on C steadily, yielding an infinite DC gain for the PFD. Similarly, if pulses appear on $Q_B$, $I_2$ removes charge from C on every phase comparison, driving $V_{out}$ toward $-\infty$. In the third state, with $Q_A = Q_B = 0$, $V_{out}$ remains constant. Since the steady-state gain is infinite, it is more meaningful to define the gain for one comparison instant, which is equal to $IT/(2\pi C)$.
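The charge bookkeeping in this paragraph can be written out as a few one-line functions. The numerical values below (100 µA, 1 ns, 10 pF) are hypothetical and chosen only to give round numbers.

```python
# Sketch: charge deposited per phase comparison and resulting voltage step.
# All numerical values are hypothetical.
import math


def charge_per_comparison(i_a, t_s):
    """Charge I*T deposited on C by a pulse of width T."""
    return i_a * t_s


def voltage_step(i_a, t_s, c_f):
    """Change of V_out per comparison, I*T/C."""
    return charge_per_comparison(i_a, t_s) / c_f


def gain_per_comparison(i_a, t_s, c_f):
    """Gain for one comparison instant, I*T/(2*pi*C), as quoted in the text."""
    return i_a * t_s / (2.0 * math.pi * c_f)


dv = voltage_step(100e-6, 1e-9, 10e-12)
print(f"step per comparison = {dv * 1e3:.1f} mV")
```

Because the charge does not decay between comparisons, repeated pulses of the same polarity add up step by step, which is the source of the infinite DC gain mentioned above.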



Figure 5.14. PFD with charge pump

An important conclusion from the above observations is that, if offsets and mismatches are neglected, a PLL using this structure locks such that the static phase difference between A and B is zero; even a very small phase error would result in an indefinite accumulation of charge on C.

As with the PFD, a dead zone is also an undesirable phenomenon for charge pumps. If the phase between the input and output varies within this zone, the DC output of the charge pump will not change significantly and the loop fails to correct the resulting error. Consequently, a peak-to-peak jitter approximately equal to the width of the dead zone can arise at the output node.

From the above discussion, we also note that the dead zone disappears only if  $Q_A$  and  $Q_B$  can be simultaneously high for a sufficient amount of time. During this period, both  $S_1$  and  $S_2$  are on, allowing the difference between  $I_1$  and  $I_2$  to vary the voltage stored on C. Since  $I_1$  and  $I_2$  typically have a few percent of mismatch, the output voltage varies even if the input phase difference is zero. Thus, a PLL employing this arrangement locks with a finite phase error so as to cancel the net charge deposited by  $I_1$  and  $I_2$  on C. The important point is that the control voltage of the VCO is periodically disturbed, thereby modulating the VCO and introducing sidebands in the output spectrum.

Another error occurs from mismatches between  $S_1$  and  $S_2$  in Figure 5.14. When these switches turn off, their charge injection and feed-through mismatch result in an error step at the output, changing the VCO frequency until the next phase comparison instant. Up to this point,  $I_1$  and  $I_2$  in Figure 5.14 are assumed to be ideal. Since each current source requires a minimum voltage to maintain a relatively constant current, it is important that  $V_{DD} - V_X$  and  $V_Y$  not drop below a certain level. If the extreme values of  $V_{cont}$  violate this condition, the current charging C varies and so does the overall gain [17], influencing the loop dynamic and static behaviour.

Another related effect occurs when  $S_1$  and  $S_2$  are off:  $I_1$  and  $I_2$  pull nodes X and Y to  $V_{DD}$  and ground respectively, causing charge sharing between  $C_X$ ,  $C_Y$  and C when  $S_1$  and  $S_2$  turn on again (Figure 5.15). If  $V_{out} = V_{DD}/2$ ,  $I_1 = I_2$ , and  $C_X = C_Y$ , then  $V_{out}$  is not disturbed, but because  $V_{out}$  determines the VCO frequency, it is generally not equal to  $V_{DD}/2$ , thus experiencing a jump when  $S_1$  and  $S_2$  turn on. This effect is also periodic and introduces sidebands at the output, but it can be suppressed if nodes X and Y are bootstrapped to the voltage stored in the capacitor. [18]
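The size of the charge-sharing jump can be estimated from charge conservation. The sketch below assumes that X and Y have been pulled fully to $V_{DD}$ and ground, that both switches close at the same instant, and that the current-source contributions during redistribution are ignored; the capacitor values are hypothetical.

```python
# Sketch: V_out after charge sharing between C, C_X and C_Y.
# Assumptions: X starts at V_DD, Y at 0 V, ideal switches closing together.


def vout_after_sharing(v_out, v_dd, c, c_x, c_y):
    """Charge conservation: total charge divided by total capacitance."""
    return (c * v_out + c_x * v_dd + c_y * 0.0) / (c + c_x + c_y)


V_DD = 1.2
C, C_X, C_Y = 1e-12, 10e-15, 10e-15     # hypothetical capacitances

for v0 in (0.4, 0.6, 0.8):
    jump = vout_after_sharing(v0, V_DD, C, C_X, C_Y) - v0
    print(f"V_out = {v0:.1f} V -> jump = {jump * 1e3:+.2f} mV")
```

With $C_X = C_Y$, the jump vanishes only at $V_{out} = V_{DD}/2$, in line with the observation above.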



Figure 5.15. Charge sharing in charge pump

As noise immunity demands a differential control voltage for the VCO, the charge pump circuit of Figure 5.14 must be modified to provide a differential output. An additional advantage is that a differential implementation reduces mismatch and charge-sharing problems as well. In a differential charge pump, the UP and DN signals activate only pull-down currents, and the pull-up currents are passive. Thus, when both UP and DN signals are low, a common-mode (CM) feedback circuit must counteract the pull-up currents to maintain a proper level.

Shown in Figure 5.16 is an example where differential pairs $M_1-M_2$ and $M_3-M_4$ are driven by the PFD and the network consisting of $M_5-M_9$ sets the output common-mode level at $V_{GS6}+V_{GS9}$. Because the CM level is temporarily disturbed at each phase comparison instant, it is important that CM transients not lead to differential settling components [19].



Figure 5.16. Differential CMOS charge pump (B. Razavi)

In this application, a differential charge pump structure has been used in the coarse loop control mechanism, since the VCO of the system has a differential control line. The charge pump designed for this application is slightly different from the circuit given in Figure 5.16.

Figure 5.17 shows the functional core of the designed charge pump circuit, and Table 5.3 lists the device geometries of the charge pump core. The complete circuit schematic, including current mirrors and power-down transistors, is given in Appendix A. Note that this charge pump also receives the differential UP and DN signals generated by the PFD. As mentioned in the previous chapter, during frequency lock the PFD generates zero at both its UP and DN outputs. Hence, the differential charge pump outputs should not drop in the absence of UP and DN pulses. For this reason, the charge pump given in Figure 5.17 turns off all the devices connected to the output nodes when both UP and DN pulses are low.



Figure 5.17. Differential charge pump used in the coarse loop

| Device | W (µm) | L (µm) |
|--------|--------|--------|
| M1     | 7      | 0.5    |
| M2     | 7      | 0.5    |
| M3     | 4.34   | 0.5    |
| M4     | 25     | 0.5    |
| M5     | 25     | 0.5    |
| M6     | 4.34   | 0.5    |
| M7     | 7      | 0.5    |
| M8     | 7      | 0.5    |
| M9     | 25     | 0.5    |
| M10    | 25     | 0.5    |

Table 5.3. Device geometries of the differential charge pump

As seen in Table 5.3, the length of all transistors in the charge pump has been selected as 0.5 $\mu$m instead of the minimum length of 0.12 $\mu$m. The transistors of the bias and current mirror circuits of the charge pump also have a length of 0.5 $\mu$m. Since the circuit pumps current up and down through the differential loop filter, transistor matching becomes an important performance parameter for the charge pump. In order to obtain well-matched devices on silicon, larger transistors should be used; using relatively large transistors therefore reduces the device-matching problem between the two symmetric halves of the charge pump. Another advantage of using long-channel transistors is that short-channel effects are reduced through a smaller $\lambda$ coefficient. As seen in Figure 5.3, the I-V curve of an NMOS with a channel length of 0.53 µm is much flatter in the saturation region than the curves of shorter-channel transistors. In this way, the I<sub>D</sub> of the transistor becomes almost independent of its V<sub>DS</sub>, which is a desired characteristic for current mirrors and current-mode circuits.

The supply voltage of the selected process is 1.2 V, which makes it mandatory to use special low-voltage circuit design techniques. The design of the analogue portions of the circuit in particular suffers from limited voltage headroom. A 1.2 V supply does not allow conventional cascode structures; at most three stacked transistors can be used between the supply rails to implement a robust circuit. In the design of the charge pump, this restriction forces us to use only three stacked transistors from $V_{DD}$ to ground. For differential signalling, a common-mode feedback (CMFB) circuit must be used. The CMFB circuit should sense and correct the average (common-mode) level of the differential output. Usual embedded CMFB circuits control the bias voltage of the current source of the charge pump, which requires an additional transistor in the $V_{DD}$-GND path. Since it is not possible to stack four devices in this technology, an alternative CMFB approach is used in the charge pump circuit. This solution is described in the following chapters.

The differential charge pump can be examined in two modes: the pump-up or pump-down mode, and the no-pumping mode. During frequency locking, the coarse loop tries to reach the input reference frequency by pumping charge up or down on the loop filter capacitors in order to tune the VCO frequency. Figure 5.18 shows the pump-down operation of the charge pump. The pump-up operation is very similar to the case given in Figure 5.18; only the current directions, and hence the active devices, change.



Figure 5.18. Pump-down operation of differential charge pump

While the reference clock is leading the VCO clock, the PFD generates UP pulses and the DN output is zero. Thus, transistors M7 and M8 are off, causing M4 and M5 to go into the cut-off region, while M6 sinks the current $I_B$ from $V_{DD}$. At the same time, M1 and M2 go into the saturation region while M3 is off. In this way, a current of $I_B/2$ is pumped down from capacitor $C_P$, and the same amount of charge is pumped up to capacitor $C_N$ through M9 and M10. Hence, the differential control voltage of the VCO is formed symmetrically by pumping $I_B/2$ up and down on the loop filter capacitors.

In the design,  $I_B$  is selected to be 300  $\mu$ A and thus  $I_B/2$  is equal to 150  $\mu$ A. Geometries of input NMOS transistors and current mirror load PMOS transistors are selected such that VP and VN nodes are settled at  $V_{DD}/2$  under DC conditions.

After the coarse loop locks to the frequency of the reference clock, no UP and DN pulses are generated. The UP and DN outputs are at low level, while the complementary signals UPN and DNN are at high level. During the lock state of the coarse loop, the charge pump operates as shown in Figure 5.19.


Figure 5.19. Differential charge pump operation during no UP or DN pulses

Since both UP and DN signals are zero during the lock state, UPN and DNN are at logic one. In this way, all the MOS devices in the charge pump are turned off except M3 and M6, each of which sinks a current equal to $I_B$ from $V_{DD}$. Keeping a current path continuously active even in the lock state may seem to consume power unnecessarily. However, it prevents sudden current jumps, and thus sudden voltage ripples on the control line, when UP and DN pulses are temporarily generated. The geometries of these devices are selected such that the current $I_B$ can flow easily during frequency lock.

In Figure 5.20, drain currents of M1 and M10 transistors are given during pump down operation. The phase difference between 200 MHz reference clock and the 200 MHz divided VCO clock is 1.5 ns in this simulation. Note that, M1 sinks a current almost equal to 150  $\mu$ A from C<sub>P</sub>, while M10 sources 150  $\mu$ A to C<sub>N</sub> capacitor and the pulse width of the pumped current is 1.5 ns as desired.
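The symmetric pump-down action can be illustrated with a first-order model in which each output node is a single capacitor. The current and pulse width are taken from the text; the capacitance per side is hypothetical, and the loop filter resistor and the CMFB circuit are ignored.

```python
# Sketch: first-order model of one differential pump-down pulse.
# C_FILTER is hypothetical; filter resistor and CMFB are not modelled.

I_PUMP = 150e-6      # I_B / 2, from the text
T_PULSE = 1.5e-9     # pulse width in the Figure 5.20 simulation
C_FILTER = 10e-12    # hypothetical capacitance per side


def pump_down(v_p, v_n, i=I_PUMP, t=T_PULSE, c=C_FILTER):
    """Remove charge i*t from C_P and add the same charge to C_N."""
    dv = i * t / c
    return v_p - dv, v_n + dv


v_p, v_n = pump_down(0.6, 0.6)
print(f"charge per pulse = {I_PUMP * T_PULSE * 1e15:.0f} fC")
print(f"V_P = {v_p:.4f} V, V_N = {v_n:.4f} V, CM = {(v_p + v_n) / 2:.4f} V")
```

In this idealized model the common-mode level is unchanged by the pulse, while the differential voltage moves by twice the single-ended step.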



Figure 5.20. Simulation result of M1 and M10 transistor drain currents during pump down operation



Figure 5.21. Coarse loop differential control signal at the output of the differential charge pump

Another transient simulation result for the coarse loop charge pump is given in Figure 5.21. In this simulation, the phase difference between the clocks is 4 ns. The charge pump output drives the differential loop filter. The CPOUT\_P and CPOUT\_N signals form the differential control voltage of the VCO. Note that the output of the charge pump is symmetric about $V_{DD}/2 = 600$mV.

The current drawn from $V_{DD}$ by the differential charge pump under typical conditions with a 200 MHz clock is about 1 mA, which means a power consumption of 1.2 mW with a $V_{DD}$ of 1.2 V.

## 5.3.3. Design of Common-Mode Feedback (CMFB) Circuit

The major drawback of fully differential circuits is the need for common-mode feedback (CMFB), which adds complexity, power and noise to the overall system. The CMFB block sets the DC level and counteracts common-mode variations on the differential lines of the loop filter. The core of the circuit is shown in Figure 5.22. The complete *Cadence* schematic view of the circuit can be found in Appendix A. The circuit consists of N-type and P-type subcircuits. This circuit is a completely symmetric version of the CMFB given in Figure 5.16. The principle of operation is based on adjusting the on-resistances of transistors M5 and M6 in Figure 5.16.

If the common-mode component of  $V_P$  and  $V_N$  goes up, the current in N-type transistors increases, which in turn discharges the two lines similarly and pulls down their voltage level. If the common-mode component of  $V_P$  and  $V_N$  goes down, the P-type transistors counteract accordingly to pull up the voltage levels of the two differential lines. [20]

The operation of the CMFB defines the DC levels on the  $V_P$  and  $V_N$  at  $V_{DD}/2$  and prevents transients from creating steady-state components on parasitic line capacitors. The differential signal ( $V_P - V_N$ ) controls the biasing circuit at the input stage of the VCO. Differential signals larger than normal linear operating range can affect the bias points in the CMFB and cause non-linearities and large transients in the loop. Hence, transistor geometries and the P and N device ratios have been selected carefully. Device geometries of the CMFB circuit are also given in Table 5.4.



Figure 5.22. Common-mode feedback (CMFB) circuit

| Device | W (µm) | L (µm) |
|--------|--------|--------|
| M1     | 73.5   | 0.4    |
| M2     | 4.9    | 0.4    |
| M3     | 73.5   | 0.4    |
| M4     | 20     | 0.12   |
| M5     | 20     | 0.12   |
| M6     | 10     | 0.12   |
| M7     | 10     | 0.12   |
| M8     | 37.5   | 0.4    |
| M9     | 2.5    | 0.4    |
| M10    | 37.5   | 0.4    |

Table 5.4. Device geometries of CMFB circuit

Figure 5.23 shows the $V_P$ and $V_N$ differential signals obtained from the CMFB simulation. In this simulation, the loop filter is connected to the CMFB ports and the circuit is driven with 150 $\mu$A current pulses. The resulting voltage variation over the loop filter capacitors under CMFB control is given in Figure 5.23. Although the initial values of $V_P$ and $V_N$ are not near the common-mode level, it is observed that the CMFB sets the DC level of the differential control signal to the desired level.



Figure 5.23. Differential control signals with CMFB

Total supply current under typical conditions at 200 MHz is 900  $\mu$ A for CMFB circuit. This means a total power consumption of 1.08 mW with a V<sub>DD</sub> of 1.2V.

## 5.3.4. Design of Divide-by-16 Circuit

In the coarse loop, the VCO clock frequency must be divided down to a lower frequency so that the PFD can compare the low-speed reference clock with the generated clock. The divider ratio is 16 in this application. Since the designed clock recovery system can operate at frequencies between 2.5 GHz and 3.2 GHz, the divider output can vary between 156.25 MHz and 200 MHz.

Standard-cell library flip-flops are not suitable for dividing frequencies in the gigahertz range. Instead, full-custom differential flip-flops have been used. The structure of the frequency divider, built with these custom differential flip-flops, is given in Figure 5.24. These high-speed flip-flops are also used in the phase detector of the fine loop. Detailed transistor-level design issues of the flip-flop are given in chapter 6.2.1.1.



Figure 5.24. Divide-by-16 circuit

Since the input stage of the PFD is single-ended, only the positive output of the divider is used by the PFD. Figure 5.25 shows the simulation result for an input VCO clock frequency of 3.2 GHz. Note that the divided clock period is precisely 5 ns, which corresponds to a clock frequency of 200 MHz.

Under typical conditions, the divide-by-16 circuit draws 4.25 mA from $V_{DD}$. This corresponds to a power consumption of 5.1 mW at $V_{DD}$ = 1.2 V.



Figure 5.25. Simulation result of divide-by-16 circuit with an input clock frequency of 3.2 GHz
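As a rough behavioural cross-check of the numbers in this section, the sketch below models the divider as four cascaded divide-by-2 stages and computes the divided frequency and period. The cascade (ripple) structure and the names `divide_by_16` and `divided_clock` are illustrative assumptions; the actual flip-flop arrangement is the one shown in Figure 5.24.

```python
def divide_by_16(n_input_cycles):
    """Return the divider output level after each rising input edge.

    Assumes four cascaded toggle (divide-by-2) stages, which is one common
    way to build a divide-by-16; the real topology is given in Figure 5.24.
    """
    stages = [0, 0, 0, 0]
    output = []
    for _ in range(n_input_cycles):
        # Ripple behaviour: a stage toggles when the previous one falls 1 -> 0.
        for i in range(len(stages)):
            stages[i] ^= 1
            if stages[i] == 1:  # no falling edge, so the ripple stops here
                break
        output.append(stages[-1])
    return output


def divided_clock(f_vco_hz, ratio=16):
    """Output frequency (Hz) and period (s) of an ideal frequency divider."""
    f_out = f_vco_hz / ratio
    return f_out, 1.0 / f_out


if __name__ == "__main__":
    for f_vco in (2.5e9, 3.2e9):
        f_out, period = divided_clock(f_vco)
        print(f"{f_vco / 1e9:.1f} GHz -> {f_out / 1e6:.2f} MHz, "
              f"period {period * 1e9:.2f} ns")
```

The output of the last stage completes one period every 16 input cycles, which matches the 5 ns output period quoted for a 3.2 GHz input.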

# 5.3.5. Design of Lock Detector

Lock detectors in PLL structures are generally used to monitor the status of the phase-locking operation. In two-loop clock and data recovery circuits, the lock detector has two main responsibilities: monitoring the status of the locking process, and enabling or disabling the coarse loop according to the relation between the clock and data signals.

In this application, the coarse loop first compares the reference clock with the divided VCO clock. After the VCO clock comes close to the reference clock frequency, the lock detector pulls the LOCK signal down (active low), and a simple logic circuit turns the coarse loop off and the fine loop on. The lock detector remains active during the whole operation of the clock and data recovery circuit, because the data signal or the phase lock may be lost at any time. A DATA LOSS input signal is raised if the data is lost. In that case, the coarse loop is re-activated and the fine loop is turned off in order to keep the sampling clock continuous. During this operation, the lock detector again starts to monitor the status of frequency locking.

Since the lock detector indicates whether the VCO has reached the operating frequency, it evaluates and processes the UP and DN signals of the PFD. The lock detector filters the UP and DN pulses with a time constant $\tau = RC$, and if the filtered DC level of the UP or DN pulses stays below a certain voltage for a certain amount of time, the LOCK signal is pulled down. The coarse loop is considered locked to the reference clock frequency when the phase difference between the clocks is smaller than 200 ps. The lock detector does not pull LOCK down immediately when it senses pulse widths equal to 200 ps; instead, it waits for a certain amount of time to be sure that frequency lock has been achieved. This time margin before turning off the coarse loop is selected to be around 400 ns, which corresponds to 80 clock cycles of a 200 MHz reference clock.

Conceptual block diagram of the lock detector is given in Figure 5.26.



Figure 5.26. Conceptual block diagram of lock detector

As seen in Figure 5.26, the UP and DN signals coming from the PFD pass through a NOR gate. The output of the NOR gate controls a PMOS switch that steers current from the current sources into or out of the RC low-pass filter. When a pulse is received at the UP or DN input, a current equal to $(N-1) \times I_B$ flows to ground through the RC filter, accumulating charge on capacitor C. Similarly, while both UP and DN inputs are zero, a current equal to $I_B$ is drawn from the RC filter. The average of the received pulse widths determines the amount of charge accumulated on the filter capacitor. If the received pulses are wide and dense, the voltage on the filter capacitor stays above the buffer's threshold level, so the output of the digital buffer remains at logic one. If the received pulses are narrow ($\approx 200$ ps) and sparse, the voltage on the capacitor starts to drop. When the voltage eventually drops below the threshold of the buffer, the LOCK signal is pulled down, indicating that frequency lock is achieved.
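The charge balance described above can be written as a short worked equation. This is an idealized sketch rather than part of the original design analysis: it assumes ideal current switching, a single pulse of width $t_p$ per reference period $T$, and it ignores the series resistor and the clamping of the capacitor voltage at the supply rails. The symbol $\bar{I}_C$, the average current into the filter capacitor, is introduced here only for illustration.

$$\bar{I}_C = \frac{t_p}{T}\,(N-1)\,I_B - \left(1 - \frac{t_p}{T}\right) I_B = I_B\left(\frac{N\,t_p}{T} - 1\right)$$

Under these assumptions the capacitor voltage rises on average when $t_p > T/N$ and falls when $t_p < T/N$, so the current ratio $N$ sets the pulse width at which the detector changes its decision.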

The upper current source supplies N times more current than the lower current source. As a result, the filter capacitor charges faster than it discharges. This gives the lock detector an adequate time margin to determine whether the loop is locked, and prevents premature LOCK signal generation. Another precaution against premature LOCK generation is the use of a digital buffer with hysteresis. When the signal at the buffer input falls from high to low, the threshold of the buffer is lower than $V_{DD}/2$; when the input signal is rising, the threshold is higher than $V_{DD}/2$. The amount of hysteresis is selected to be around 400 mV in the design: the threshold of the buffer is at 800 mV for a rising input and at 400 mV for a falling input. This behaviour makes the circuit insensitive to sudden and unexpected voltage bounces on the filter, which could otherwise cause premature LOCK signal generation. The internal transistor-level schematic of the hysteresis inverter is given in Figure 5.27.
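The effect of the two thresholds can be illustrated with a small behavioural model. This is a minimal sketch, assuming an ideal non-inverting buffer whose output follows the filter voltage as described in the text; the function name `hysteresis_buffer` and the sample values are hypothetical, and only the 800 mV and 400 mV thresholds come from the design.

```python
def hysteresis_buffer(samples_mv, v_rise_mv=800.0, v_fall_mv=400.0, initial=1):
    """Ideal buffer with hysteresis.

    The output switches to 1 only when the input exceeds v_rise_mv and
    switches to 0 only when the input drops below v_fall_mv. Between the
    two thresholds the previous output state is kept.
    """
    state = initial
    out = []
    for v in samples_mv:
        if state == 0 and v > v_rise_mv:
            state = 1
        elif state == 1 and v < v_fall_mv:
            state = 0
        out.append(state)
    return out


if __name__ == "__main__":
    # Hypothetical filter voltage in mV: slow decay, then a bounce, then recovery.
    cap_mv = [1000, 700, 500, 450, 390, 500, 700, 790, 810, 600]
    print(hysteresis_buffer(cap_mv))
```

In this example the bounce back to 500 mV and 700 mV after the output has gone low does not toggle the output again; only crossing 800 mV does.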



Figure 5.27. Digital inverter with hysteresis

| Device | W (µm) | L (µm) |
|--------|--------|--------|
| M1     | 1.8    | 0.12   |
| M2     | 1.8    | 0.12   |
| M3     | 0.6    | 0.12   |
| M4     | 0.6    | 0.12   |
| M5     | 1.5    | 0.12   |
| M6     | 0.5    | 0.12   |

Device geometries of the digital inverter with hysteresis are given in Table 5.5.

Table 5.5. Device geometries of the digital inverter with hysteresis

Transistor level schematic of the lock detector functional part is given in Figure 5.28. Complete *Cadence* schematic view of the circuit with power down and biasing circuits can be found in Appendix A. In Table 5.6, device geometries are also given.



Figure 5.28. Transistor level schematic of lock detector

| Device | W (µm) | L (µm) |
|--------|--------|--------|
| M1     | 1      | 0.5    |
| M2     | 18     | 0.5    |
| M3     | 18     | 0.5    |
| M4     | 2      | 0.12   |
| M5     | 0.5    | 0.5    |
| M6     | 0.5    | 0.5    |
| M7     | 15     | 5      |
| M8     | 1.5    | 0.12   |
| M9     | 0.5    | 0.12   |
| M10    | 3      | 0.12   |
| M11    | 1.2    | 0.12   |
| M12    | 6      | 0.12   |
| M13    | 2.4    | 0.12   |
| M14    | 1.5    | 0.5    |
| M15    | 1.5    | 0.5    |
| M16    | 0.5    | 0.5    |
| M17    | 0.5    | 0.5    |
| M18    | 3      | 0.12   |
| M19    | 3      | 0.12   |
| M20    | 1      | 0.12   |
| M21    | 1      | 0.12   |
| M22    | 90     | 5      |
| M23    | 3      | 0.12   |
| M24    | 2      | 0.12   |

Table 5.6. Device geometries of lock detector

Transistor M22 is used as a secondary filtering capacitor in addition to the primary RC filter. In this way, the pulses are filtered in two steps and a definite lock signal is generated. In UMC CMOS 0.12 µm technology, the unit gate oxide capacitance of a MOS device is $10.79 \text{ fF}/\mu\text{m}^2$. Hence, transistor M7 implements a capacitance of $15 \times 5 \times 10.79 \text{ fF} \approx 0.81 \text{ pF}$, while M22 corresponds to a capacitance of $90 \times 5 \times 10.79 \text{ fF} \approx 4.86 \text{ pF}$. The resistor value of the RC filter is selected to be 300 ohms.
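The capacitance arithmetic above can be reproduced with a few lines of code. This is a minimal sketch that uses only the gate area times the quoted unit capacitance, ignoring overlap and fringe capacitance. The helper names are hypothetical, and pairing the 300 ohm resistor with M7 assumes that M7 is the capacitor of the primary RC filter.

```python
COX_FF_PER_UM2 = 10.79  # unit gate-oxide capacitance quoted in the text


def mos_cap_pf(width_um, length_um, cox_ff_per_um2=COX_FF_PER_UM2):
    """Gate capacitance in pF from area times unit capacitance."""
    return width_um * length_um * cox_ff_per_um2 / 1000.0


def rc_time_constant_ps(r_ohm, c_pf):
    """RC time constant in ps (ohm times pF gives ps)."""
    return r_ohm * c_pf


if __name__ == "__main__":
    c_m7 = mos_cap_pf(15, 5)    # geometry of M7 from Table 5.6
    c_m22 = mos_cap_pf(90, 5)   # geometry of M22 from Table 5.6
    print(f"M7:  {c_m7:.2f} pF")
    print(f"M22: {c_m22:.2f} pF")
    # Assumption: the 300 ohm resistor is paired with M7.
    print(f"tau: {rc_time_constant_ps(300, c_m7):.0f} ps")
```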

Figure 5.29 shows the transient response of the lock detector to the applied input signals. During the first 600 ns of the simulation the UP pulse width is set to 200 ps, and during the last 600 ns it is set to 600 ps. The DN signal is zero during the whole simulation. At start-up, the LOCK signal is pulled up to logic one. After 400 ns, the voltage on the CAP node drops below 400 mV, the falling threshold of the hysteresis inverter. At this point, the LOCK signal is pulled down, indicating that frequency lock is achieved. After 600 ns of simulation time, the UP pulse width becomes 600 ps. Consequently, the voltage on the CAP node starts to increase, and as soon as it reaches 800 mV the rising threshold of the inverter is exceeded. Thus, the inverter toggles and the LOCK signal is pulled up.



Figure 5.29. Transient simulation result of lock detector

The lock detector draws a total current of 300 $\mu$A from the power supply under typical conditions. Thus, its total power consumption is around 360 $\mu$W with a power supply of 1.2 V.

# 6. ARCHITECTURE COMPONENTS: FINE LOOP & DIFFERENTIAL VCO

# **6.1. Introduction**

This chapter presents the detailed circuit design issues of fine loop and the differential voltage controlled oscillator (VCO) of the fully integrated clock and data recovery circuit.

The fine loop control components are the phase detector, the differential charge pump and the loop filter. The VCO components are the delay stage, the self-biasing circuit and the VCO output amplifier.

#### 6.2. Design of Fine Loop Components

# 6.2.1. Design of Differential Phase Detector

Since the fine loop operates up to 3.2 GHz, special attention must be paid during the design of the loop components. One of the important parts of the fine loop is the differential phase detector (PD). Differential phase detector is specially designed to detect phase difference between incoming random data and clock in order to have minimum static phase offset.

The component that plays the key role in the phase detector is the differential flip-flop, whose performance directly affects the overall performance of the fine loop. Another important component of the phase detector is the current-mode XOR circuit. Ordinary logic XOR gates cannot be used in the phase detector because their propagation delay is too long for the high data rate. Since the phase detector should sense phase differences of a few picoseconds, the XOR gate must be able to resolve these differences. Hence, current-mode operation has been preferred because of the high-speed requirements. These sub-blocks of the phase detector are discussed in detail in the following subsections.

The conceptual block diagram of the phase detector [21] is given in Figure 6.1. In the literature, this phase detector is known as the Hogge phase detector, and it is commonly used for high-speed phase detection because of its simplicity and self-correcting characteristic. A modified and improved version of this type of phase detector is used in our design.



Figure 6.1. Conceptual block diagram of Hogge phase detector

As seen in Figure 6.1, the simplest form of the phase detector consists of two flip-flops and two XOR gates. The NRZ data signal is applied to both DFF<sub>1</sub> and XOR<sub>1</sub>. DFF<sub>1</sub> is both the retiming decision circuit for the receiver and a part of the clock recovery circuit. The retimed data from DFF<sub>1</sub> is applied to XOR<sub>1</sub>, XOR<sub>2</sub> and DFF<sub>2</sub>. DFF<sub>1</sub> is clocked with the true CLK and DFF<sub>2</sub> is clocked with the inverted CLK. The REF output from XOR<sub>2</sub> is a fixed-width square pulse, half a clock period wide, for each transition of the retimed data. The ERROR output from XOR<sub>1</sub> is a variable-width pulse for each transition of the data signal, the width of which depends on the position of the clock within the eye opening. When the leading edge of the clock is properly centred, the width of the variable-width pulse is identical to that of the fixed-width pulse (REF), plus or minus the data jitter amplitude.

These two pulses are then passed through a charge pump and a filter to produce the error voltage for controlling the clock of the VCO.

Figure 6.2 is the timing diagram for an arbitrary data signal (containing three logic ones and seven logic zeros) with the clock properly centred within the data bit interval.



Figure 6.2. Timing diagram of phase detector (clock is centred)

Note that the pulse patterns at the ERROR and REF outputs have identical average values, resulting in zero error voltage from the loop filter. In this case phase lock has been achieved and the VCO is pushed neither up nor down.

Figure 6.3 is a similar timing diagram, but with the clock signal advanced relative to the centre of the data bit interval. This causes the logic-one pulses at the ERROR output to become narrower, while those at the REF output remain the same width as in Figure 6.2. As a result, the average value of the pulse pattern at the ERROR output shifts more negative than that at the REF output, which produces a negative correction at the output of the loop filter. Similarly, when the clock is delayed, the logic-one pulses at the ERROR output become wider while those at the REF output again remain the same width, and a positive correction voltage results.

It is not necessary for the data to have an equal number of logic ones and logic zeros over any time interval. Only a sufficient number of transitions should occur in either direction to keep the loop stable.



Figure 6.3. Timing diagram of phase detector (clock is advanced)
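The behaviour shown in the two timing diagrams can be reproduced with a small discrete-time model of the detector in Figure 6.1. This is a minimal sketch, assuming ideal zero-delay flip-flops and XOR gates and time measured in arbitrary steps; the function name `hogge_pulse_widths` and its parameters are hypothetical. A positive `clock_offset` means a delayed clock and a negative one an advanced clock.

```python
def hogge_pulse_widths(bits, steps_per_bit=100, clock_offset=0):
    """Total ERROR and REF high time (in steps) of an ideal Hogge detector.

    bits          : NRZ data, one entry per bit interval
    clock_offset  : shift of the rising clock edge from the bit centre,
                    must satisfy |clock_offset| < steps_per_bit // 2
    """
    half = steps_per_bit // 2
    rise_phase = (half + clock_offset) % steps_per_bit   # DFF1 samples DATA
    fall_phase = (rise_phase + half) % steps_per_bit     # DFF2 samples Q1
    q1 = q2 = bits[0]
    error_steps = ref_steps = 0
    for t in range(len(bits) * steps_per_bit):
        data = bits[t // steps_per_bit]
        phase = t % steps_per_bit
        if phase == rise_phase:
            q1 = data
        if phase == fall_phase:
            q2 = q1
        error_steps += data ^ q1   # XOR1: DATA xor retimed data
        ref_steps += q1 ^ q2       # XOR2: retimed data xor half-cycle delayed copy
    return error_steps, ref_steps


if __name__ == "__main__":
    # Three ones and seven zeros, ending with no transition so all pulses complete.
    pattern = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
    for offset in (-10, 0, 10):
        print(offset, hogge_pulse_widths(pattern, 100, offset))
```

With the clock centred the two totals are equal. An advanced clock shortens only the ERROR pulses and a delayed clock lengthens them, while the REF total stays at half a bit interval per data transition.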

The phase detector gain is a factor that varies with the data activity. As the transition density decreases, the open-loop gain decreases, but the closed-loop bandwidth also decreases, so the effective Q of the circuit increases. The circuit is thus able to accommodate occasional long runs of ones and zeros, which can occur even with scrambled data.

The circuit of Figure 6.1 works well as shown, as long as the propagation delay through DFF<sub>1</sub> is negligibly small relative to one bit interval. Where it is not, the circuit is modified to compensate for the delay through DFF<sub>1</sub> by adding a comparable delay between the originating node and XOR<sub>1</sub>. Since the operating frequency of the designed phase detector extends up to 3.2 GHz, the propagation delays of the logic structures are not negligible compared with the 312.5 ps bit interval.

The complete *Cadence* schematic view of the phase detector is given in Figure 6.4. Note that the propagation delay of the differential flip-flop is compensated by a delay element that has the same structure as the flip-flops used in the phase detector. Dummy loads are also used to equalize the load capacitance of each flip-flop, so that the response times of the blocks are matched as closely as possible.



Figure 6.4. Cadence schematic view of the phase detector

One precautionary note is in order. This circuit is not a phase-frequency detector. Therefore, a coarse loop has been used to perform frequency detection. The transfer characteristic of the phase detector has been extracted together with the fine loop charge pump, so the result is given in the next section.

Sensitivity to the clock duty cycle is important for optimum sampling. When the CDR using the designed PD is simulated with a non-symmetric clock, the loop locks with the rising edges of the clock centred in the data eye. However, the gain of the PD is found to be affected by the clock duty cycle.

The components of the phase detector are discussed in detail in the following subsections.

### 6.2.1.1. Design of Differential Master-Slave Flip-Flop

The most important parts of the phase detector are the sampling flip-flops, since they directly determine many of the performance parameters of the clock and data recovery circuit. Hence, from the flip-flop design point of view, setup and hold times should be as small as possible for a robust phase detector. The circuit should also operate properly up to 3.2 GHz with a power supply of 1.2 V.

There are several high-speed flip-flop structures in literature. Figure 6.5 shows some published latches intended for high-speed operation. Both a conventional CMOS latch and a single-phase latch [22] (Figure 6.5(a)) are too slow for our purposes because they have a large input capacitance due to the parallel connection of PMOS and NMOS gates. Due to its lower mobility and larger threshold voltage (in our case,  $V_{TN}=V_{TP}\approx350$ mV), the PMOS transistor contributes little to the current drive and much to the capacitances, considerably slowing down the circuit. The latch proposed in [23] (Figure 6.5(b)) uses PMOS transistors in the clock path and was found to work only up to 2.8 GHz in our technology. Also, the 25% duty cycle of the output signals is less convenient for phase switching. The source-coupled latch (Figure 6.5(c), e.g., [24]) has a reduced output swing that facilitates high speed, but due to the stacking of many devices it cannot be accommodated in the intended low supply voltage.



Figure 6.5. (a) True single-phase clock (TSPC) flip-flop stage, (b) latch proposed in [23], and (c) latch using source-coupled logic

Using pseudo-NMOS gates enables high-speed operation while providing a large output swing. For comparison, in this technology with a 1.2 V supply, a three-stage CMOS ring oscillator oscillates at 8 GHz, whereas a three-stage pseudo-NMOS ring oscillator oscillates at 14 GHz. This led to the choice of pseudo-NMOS logic despite its high power consumption.

Figure 6.6 shows a pseudo-NMOS D-flip-flop whose outputs are connected back to its inputs to form a divide-by-2 stage. The device geometries are given in Table 6.1.



Figure 6.6. Designed differential master-slave flip-flop

| Device | W (µm) | L (µm) |
|--------|--------|--------|
| M1     | 1.84   | 0.12   |
| M2     | 1.84   | 0.12   |
| M3     | 1.5    | 0.12   |
| M4     | 1.5    | 0.12   |
| M5     | 1.5    | 0.12   |
| M6     | 1.5    | 0.12   |
| M7     | 3.6    | 0.12   |
| M8     | 3.6    | 0.12   |
| M9     | 3.6    | 0.12   |
| M10    | 3.6    | 0.12   |
| M11    | 1.84   | 0.12   |
| M12    | 1.84   | 0.12   |
| M13    | 1.8    | 0.12   |
| M14    | 2      | 0.12   |
| M15    | 2      | 0.12   |
| M16    | 1.8    | 0.12   |
| M17    | 4.32   | 0.12   |
| M18    | 4.32   | 0.12   |
| M19    | 4.32   | 0.12   |
| M20    | 4.32   | 0.12   |

Table 6.1. Device geometries of differential master-slave flip-flop

As seen from Figure 6.6, the differential flip-flop consists of two latches connected together to form a master-slave structure. Each latch consists of a sense pair (M3-M6 in the master and M13-M16 in the slave), a regenerative loop (M4-M5 in the master and M14-M15 in the slave), two pull-up devices (M1-M2 in the master and M11-M12 in the slave) and two pairs of clocking switches (M7-M10 and M8-M9 in the master, and M17-M20 and M18-M19 in the slave).

When CK is low, M7, M10, M18 and M19 are on and the master is in the transparent (hold) mode, while M8, M9, M17 and M20 are off and the slave is in the sense mode. When CK is high, the reverse case occurs. When the master is in sense mode, the M3-M6 pair senses the data signal variations. At the same time, in the slave part of the circuit, the M14-M15 pair latches the previous value of the data. Since the master is in transparent mode when CK is high, the overall circuit is a rising-edge-triggered flip-flop. Note that the circuit uses no stacked or pass transistors. Also, the gate-channel capacitance of the clock input NMOS transistors hardly affects the critical path, because these devices are saturated for almost the entire voltage swing at the data inputs.

Q and QN outputs are buffered in order to increase the drive capability of the master-slave flip-flop. In Figure 6.7, simulation result with an aligned 3.2 GHz clock is given.



Figure 6.7. *Spectre* simulation result of differential FF with a centred 3.2 GHz clock

In Figure 6.8 and Figure 6.9, simulation results of clock with a setup time of 8 ps and clock with a hold time of 1 ps are given, respectively.



Figure 6.8. Transient simulation result with 8 ps setup time



Figure 6.9. Transient simulation result with 1 ps hold time

Simulation results show that the differential flip-flop has a minimum setup time of 8 ps, while its hold time is almost zero because of its master-slave structure. With a 3.2 GHz clock, the circuit draws 1.2 mA from the 1.2 V power supply, which corresponds to 1.44 mW of power consumption under typical conditions. In addition, in order to find the limit of the operating frequency of the flip-flop, another transient simulation has been performed with a 10 GHz clock. The result is given in Figure 6.10. Note that the flip-flop can operate up to 10 GHz while still generating rail-to-rail outputs.



Figure 6.10. Differential flip-flop simulation result with 10 GHz clock

### 6.2.1.2. Design of Delay Cell

As mentioned at the beginning of this chapter, for better performance the flip-flop propagation delay should be equalized at one input of the XOR gate shown in Figure 6.4. The best way to obtain a delay equal to the flip-flop propagation delay is to use a replica of the flip-flop. In this way, delay variations over temperature and process conditions are compensated.

The delay cell of the phase detector is given in Figure 6.11. Note that the circuit is a replica of the latch. Its geometries are the same as those of the slave part of the flip-flop; the CK input is tied to $V_{DD}$ while CKN is tied to ground. Hence, this latch is fixed in transparent mode, so that the received data is passed directly to the output with a propagation delay.



Figure 6.11. Schematic view of phase detector delay cell

### 6.2.1.3. Design of Differential XOR

Another important component of the differential phase detector is the XOR gate, which processes the sampled data. Since the overall structure of the clock recovery circuit is based on differential signalling, the XOR gate has also been designed as a differential circuit.

The main challenge in the design of the XOR gate is to make the minimum phase difference that the circuit can sense at its inputs as small as possible. As shown in Figure 6.4, the outputs of the XOR gate indicate the phase difference between data and clock (the ERROR\_P and ERROR\_N signals). Hence, any phase difference that cannot be sensed appears directly at the clock recovery output as static phase offset. For a clock period of 312.5 ps, the XOR gate should sense phase differences of a few tens of picoseconds. This requirement makes it mandatory to use a current-mode XOR gate instead of a voltage-mode one. In conventional XOR gates, the phase difference between the inputs is transferred to the output as voltage pulses. In this design, in order to gain speed, this operation is performed with current: current pulses at the output represent the phase difference between the two input signals. In this way, the speed of the circuit is considerably increased, and the voltage-to-current conversion, which is generally performed by the charge pump in a conventional PLL, is embedded in the XOR gate and hence in the phase detector. The output current pulses, both error and reference, are then evaluated in the charge pump.

The functional core of the differential current-mode XOR gate is given in Figure 6.12, and the device geometries are given in Table 6.2. The complete *Cadence* schematic view of the circuit, with current mirrors and power-down circuits, can be found in Appendix A. The capacitors $C_1$ and $C_2$ in Figure 6.12 represent the input load of the charge pump.

In the circuit, logic one means sourcing a current equal to $I_B$ into the $C_{1,2}$ capacitors, while logic zero means sinking a current equal to $I_B$ from the $C_{1,2}$ capacitors. Thus, the Q output corresponds to A$\oplus$B, while the QN output is (A$\oplus$B)'.

When A and B are high, and thus AN and BN are low, M1, M3, M5 and M6 are on and the other sensing NMOS devices are off. In the half of the circuit that generates the Q output, $2 \times I_B$ flows through each of M1 and M3, giving a total current of $4 \times I_B$. At the same time, the upper current source supplies $3 \times I_B$. As a result, the difference current $I_B$ is drawn from the capacitor, which corresponds to logic zero. In the complementary part of the circuit, the opposite case occurs. When A is high and B is low, M1 and M2 are on and M3 and M4 are off. Hence, a current of $2 \times I_B$ is shared between M1 and M2, while the current source supplies $3 \times I_B$. As a result, the difference current $I_B$ flows into the capacitor, which corresponds to logic one. In the complementary side of the circuit, the opposite case occurs and logic zero is generated by drawing $I_B$ from the capacitor.
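The current balance described above can be summarized in a short behavioural model. This is a sketch of the arithmetic only, assuming ideal current sources and switches; the function name `xor_output_currents` is hypothetical, and the default bias current is the 30 µA value quoted for $I_B$ in this section. A positive result means current flowing into the load capacitor (logic one).

```python
def xor_output_currents(a, b, i_b=30e-6):
    """Net current (A) delivered to the Q and QN load capacitors.

    Each output node is fed by a 3*I_B source. The pull-down network of the
    Q half sinks 4*I_B when the inputs are equal and 2*I_B when they differ;
    the QN half behaves in the complementary way.
    """
    inputs_equal = (a == b)
    sink_q = 4 * i_b if inputs_equal else 2 * i_b
    sink_qn = 2 * i_b if inputs_equal else 4 * i_b
    i_q = 3 * i_b - sink_q
    i_qn = 3 * i_b - sink_qn
    return i_q, i_qn


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            i_q, i_qn = xor_output_currents(a, b)
            print(a, b, f"I_Q = {i_q * 1e6:+.0f} uA", f"I_QN = {i_qn * 1e6:+.0f} uA")
```

The sign of the Q current follows A XOR B, and the two output currents always have equal magnitude and opposite sign.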



Figure 6.12. Functional core of the differential current mode XOR

| Device | W (µm) | L (µm) |
|--------|--------|--------|
| M1     | 0.41   | 0.12   |
| M2     | 0.41   | 0.12   |
| M3     | 0.41   | 0.12   |
| M4     | 0.41   | 0.12   |
| M5     | 0.41   | 0.12   |
| M6     | 0.41   | 0.12   |
| M7     | 0.41   | 0.12   |
| M8     | 0.41   | 0.12   |

Table 6.2. Device geometries of differential XOR gate

In the design, the $I_B$ current is 30 $\mu$A, and the minimum phase difference that can be sensed by the XOR gate is 12 ps under typical conditions. The differential XOR draws ~550 $\mu$A from the 1.2 V power supply, which corresponds to a power consumption of 0.66 mW.

# 6.2.2. Design of Differential Charge Pump

As described in the previous section, the function of the fine loop charge pump is to evaluate the current pulses generated by the phase detector. The differential charge pump receives current pulses at its inputs and converts them into a net differential charge that drives the differential loop filter.

This charge pump is also fully differential. Its transistor-level schematic is given in Figure 6.13, and the corresponding device geometries are given in Table 6.3.



Figure 6.13. Schematic of fine loop differential charge pump

| Device | W (µm) | L (µm) |
|--------|--------|--------|
| M1     | 7.08   | 0.5    |
| M2     | 7.08   | 0.5    |
| M3     | 7.08   | 0.5    |
| M4     | 7.08   | 0.5    |
| M5     | 7.08   | 0.5    |
| M6     | 7.08   | 0.5    |
| M7     | 2.1    | 0.5    |
| M8     | 2.1    | 0.5    |
| M9     | 2.1    | 0.5    |
| M10    | 2.1    | 0.5    |
| M11    | 2.1    | 0.5    |
| M12    | 2.1    | 0.5    |

Table 6.3. Device geometries of fine loop differential charge pump

Only the device geometries of the first half are given in the table above, since both halves have identical geometries. All device channel lengths are set to $0.5 \ \mu m$ in order to minimize short-channel effects.

This circuit averages the currents coming from the phase detector. The received differential current pulses ($I_{REFP}$, $I_{REFN}$, $I_{ERRORP}$ and $I_{ERRORN}$) are added together according to their polarity and transferred to the differential loop filter. The averaging and adding process is performed according to Eq. 5.11 and Eq. 5.12.

$$I_{CPOUT_P} = \left(I_{ERRORP} - I_{ERRORN}\right) + \left(I_{REFN} - I_{REFP}\right)$$
(5.11)

$$I_{CPOUT_N} = (I_{ERRORN} - I_{ERRORP}) + (I_{REFP} - I_{REFN})$$
(5.12)

Ideally, at any time, the total differential charge pulled down and pulled up on the loop filter capacitors should be equal. Eq. 5.11 and Eq. 5.12 show that $I_{CPOUT_N} + I_{CPOUT_P}$ is zero, which is the desired result. On silicon, however, device mismatches mean that the currents flowing in each branch will not be exactly equal. To minimize these current mismatches, special attention must be paid when drawing the layout: N and P devices should be arranged symmetrically.
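Eq. 5.11 and Eq. 5.12 can be checked numerically with a direct transcription. This is a minimal sketch for ideal, perfectly matched currents; the function name `charge_pump_outputs` and the example values are hypothetical.

```python
def charge_pump_outputs(i_errorp, i_errorn, i_refp, i_refn):
    """Ideal charge pump output currents following Eq. 5.11 and Eq. 5.12."""
    i_cpout_p = (i_errorp - i_errorn) + (i_refn - i_refp)
    i_cpout_n = (i_errorn - i_errorp) + (i_refp - i_refn)
    return i_cpout_p, i_cpout_n


if __name__ == "__main__":
    # Example: an ERROR pulse is active while REF is idle (values in amperes).
    i_p, i_n = charge_pump_outputs(30e-6, -30e-6, -30e-6, 30e-6)
    print(f"I_CPOUT_P = {i_p * 1e6:+.0f} uA, I_CPOUT_N = {i_n * 1e6:+.0f} uA")
    print(f"sum = {(i_p + i_n) * 1e6:+.3f} uA")
```

Because the second expression is the exact negative of the first, the sum is zero for any ideal input combination; a non-zero sum in simulation or on silicon therefore reflects mismatch rather than the intended function.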

Another important issue for the differential circuit is to keep the common mode stable during operation. The common mode of the outputs should be set at $V_{DD}/2$. The simulation result in Figure 6.14 shows the DC analysis of the circuit. Note that the differential outputs are symmetric around 600 mV.



Figure 6.14. DC-sweep simulation result of fine loop differential charge pump



Figure 6.15. Transient simulation result of charge pump outputs with loop filter

To show the symmetry of the circuit in the time domain, another simulation has been performed (Figure 6.15). The result shows the output voltage variation of the charge pump with the loop filter connected and a phase difference of 156 ps between the rising edge of the clock and the middle of the data eye. Note that the differential outputs vary symmetrically around $V_{DD}/2$. Figure 6.16 shows the currents at the differential outputs of the charge pump ($I_{CPOUT_N}$ and $I_{CPOUT_P}$) for the same 156 ps phase difference between clock and data. In the figure, $I_{CPOUT_N} + I_{CPOUT_P}$ is also calculated and plotted. The average value of the total current is found to be 350 nA, which is very close to zero. Also note that the amplitude of the current pulses is around 30 $\mu$A.



Figure 6.16. I<sub>CPOUT\_N</sub> and I<sub>CPOUT\_P</sub> current variation while there is 156 ps phase difference between clock and data

The transfer characteristic of the phase detector and the charge pump together is given in Figure 6.17. The transfer characteristic of the fine loop control circuitry is expressed in $\mu$A/rad or $\mu$A/ps. Figure 6.17 (a) shows the transfer characteristic over one period. Figure 6.17 (b) shows a zoomed view of the transfer curve, used to determine the static phase offset of the fine loop. Note that the static phase offset, where the net output current is zero, is around 12 ps.







Figure 6.17. a) Fine loop control circuitry transfer curve, b) zoomed transfer curve

The charge pump circuit draws approximately 600  $\mu$ A current from 1.2 V power supply under typical conditions, which means 0.72 mW of power consumption.

#### 6.2.3. Design of Differential Loop Filter

The most common PLL loop filter is the simple RC circuit in Figure 6.18. Common design options for the resistor are a poly resistor or the channel resistance of an MOS transistor. For high resistance values, an MOS device is most attractive. However, it has a disadvantage at low $V_{DD}$ if implemented in the straightforward configuration (transmission gate implementation). For a nominal $V_{DD}$ of 1.2 V, the effective resistance of the transmission gate is nearly independent of the VCO control voltage ($V_{ctrl}$). However, the resistance becomes strongly dependent on $V_{ctrl}$ at low $V_{DD}$. For $V_{DD} = V_{TN} + |V_{TP}|$, the resistance goes to infinity for some values of $V_{ctrl}$ [25] [26].



Figure 6.18. Ideal model for the RC loop filter

Since the capacitors in Figure 6.18 are not floating, they can be implemented with NMOS devices, using their gate oxide capacitance. The loop filter implementation using NMOS devices as capacitors is given in Figure 6.19. When the VCO control voltage approaches $V_{TN}$, the MOS device is between inversion and depletion, where its gate oxide capacitance is voltage dependent, as shown in Figure 6.20. By altering the gate and source/drain connections of the NMOS as shown in Figure 6.19, the device operates in accumulation, where the capacitance is less voltage dependent, as shown for $V_{ctrl} > 400 \text{ mV}$ in Figure 6.20. To avoid strong power-supply noise injection, the well must be connected to the same node as the source and drain, as shown in Figure 6.19.



Figure 6.19. Loop filter implementation with NMOS devices



Figure 6.20. Voltage dependency of MOS capacitance of loop filter

This filter has an impedance of

$$F_{LF}(s) = \frac{sRC_1 + 1}{s^2 RC_1 C_2 + s(C_1 + C_2)}$$
(5.13)

In Figure 6.19, the NMOS device dimensions are W = 3660 $\mu$m and L = 20 $\mu$m for M1, and W = 120 $\mu$m and L = 20 $\mu$m for M2. In UMC CMOS 0.13 $\mu$m technology, the unit gate oxide capacitance of a MOS transistor is given as 10.79 fF/$\mu$m<sup>2</sup>. Thus, the M1 capacitor value is 3660 x 20 x 10.79 fF $\approx$ 790 pF, and the M2 capacitor value is 120 x 20 x 10.79 fF $\approx$ 26 pF.

The resistor of the loop filter has been implemented in the non-salicided n-poly layer. Its value is around 400 ohms. Since the circuit is fully differential, the loop filter must also be differential. Hence, two loop filters are used, one on each differential VCO control line.
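With the component values above, the impedance of Eq. 5.13 and its zero and pole frequencies can be evaluated numerically. This is a minimal sketch under two stated assumptions: M1 (about 790 pF) plays the role of $C_1$, the capacitor in series with R, and M2 (about 26 pF) plays the role of $C_2$; the capacitors are treated as ideal and voltage independent. The function names are hypothetical.

```python
import math


def loop_filter_impedance(freq_hz, r_ohm, c1_f, c2_f):
    """Complex impedance of the loop filter, transcribed from Eq. 5.13."""
    s = 2j * math.pi * freq_hz
    return (s * r_ohm * c1_f + 1) / (s**2 * r_ohm * c1_f * c2_f + s * (c1_f + c2_f))


def zero_and_pole_hz(r_ohm, c1_f, c2_f):
    """Zero and non-zero pole frequencies (Hz) of Eq. 5.13."""
    f_zero = 1.0 / (2 * math.pi * r_ohm * c1_f)
    f_pole = (c1_f + c2_f) / (2 * math.pi * r_ohm * c1_f * c2_f)
    return f_zero, f_pole


if __name__ == "__main__":
    R, C1, C2 = 400.0, 790e-12, 26e-12  # assumed mapping: M1 -> C1, M2 -> C2
    f_zero, f_pole = zero_and_pole_hz(R, C1, C2)
    print(f"zero at {f_zero / 1e3:.0f} kHz, pole at {f_pole / 1e6:.1f} MHz")
    for f in (1e4, 1e6, 1e8):
        z = loop_filter_impedance(f, R, C1, C2)
        print(f"|Z({f:.0e} Hz)| = {abs(z):.1f} ohm")
```

Under these assumptions the zero lies at roughly 0.5 MHz and the second pole at roughly 16 MHz; between them the filter impedance is approximately resistive.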

The layouts of the loop filter and of all the circuits described in the preceding sections are given in Appendix B.

# 6.3. Design of Differential Voltage Controlled Oscillator (VCO)

Under this heading, detailed description and design information of differential VCO can be found.

VCO design is one of the most challenging parts of clock and data recovery design, especially if the PLL is required to have a large tuning range (2.4 GHz to 3.2 GHz in our application) and a differential structure. There are many requirements for the VCO in a PLL design, and they conflict with one another. Therefore, special care must be taken in the VCO design. Some of the most important VCO requirements and terms are the following:

Ø Control Voltage: This is the varying voltage, which is applied to the VCO input terminal causing a change in the output frequency. It is sometimes referred to as Modulation Voltage, especially if the input is an AC signal.

Ø Free-Running Frequency:  $\omega_{FR}$  is the output frequency of the VCO for a zero control voltage. It is sometimes referred to as "centre frequency".

Ø Tuning Range: The range of frequencies over which the VCO can operate as a result of the applied control voltage.

Ø Frequency Deviation: This is how far the center frequency will change as a function of the control voltage; usually specified in $\pm$ percentage or parts-per-million (ppm). As the deviation is made larger, other stabilities, such as temperature and aging, will usually degrade.

Ø Linearity: The generally accepted definition of linearity is that specified in MIL-O-55310. It is the ratio between frequency error and total deviation, expressed in percent, where frequency error is the maximum frequency excursion from the best straight line drawn through a plot of output frequency versus control voltage.

Ø Response Slope: The slope of the frequency versus control voltage curve. This is generally referred to as the VCO gain, K<sub>VCO</sub>, or the tuning performance, and is expressed in megahertz per volt (MHz/V) or megaradians per volt-second (Mrad/Vs).

Ø Transfer Function: This denotes the direction of frequency change versus control voltage and sometimes referred to as "slope polarity". A positive transfer function denotes an increase in frequency for an increasing positive control voltage. Conversely, if the frequency decreases with a more positive control voltage, the transfer function is negative.

Ø Phase Stability: The output spectrum of the VCO should approximate as closely as possible the theoretical Dirac impulse of a single sine wave. It is mostly referred to as "spectral purity" and quantified by phase noise, which is expressed in dBc/Hz.

Ø Frequency pushing: It is the dependency of the centre frequency on the power supply voltage expressed in MHz/V.

Ø Frequency pulling: It is the dependency of the centre frequency on the output load impedance.

Ø Tuning Speed: This is the time required for the output frequency to settle to within 90% of its final value with the application of a tuning-voltage step.

Ø Output Amplitude: It is desirable to achieve a large output oscillation amplitude, thus making the waveform less sensitive to noise. The amplitude trades with power dissipation, supply voltage, and even the tuning range. Also, the amplitude may vary across the tuning range, which is an undesirable effect.

Ø Output Characteristics: This defines the output waveform of the VCO. In PLL applications, sine or square-waves are used mostly.

Ø Power Dissipation: As with other analogue circuits, oscillators suffer from trade-offs between speed, power dissipation, and noise.

# 6.3.1. Ring Oscillator VCO

Due to their simplicity and ease of IC integration, ring oscillators are frequently used in PLLs and clock recovery applications. In a ring oscillator, a ring of inverters generates the periodic signal. For single-ended topologies, it is not possible to achieve oscillation with fewer than three stages; moreover, the number of stages must be odd, which introduces a limit on the maximum frequency. It is easy to derive why at least three stages are necessary [27].

Figure 6.21 shows a typical single-stage inverter placed in a unity-gain loop where R and C are the output node resistance and capacitance of the inverter stage, respectively. So, the system has only one pole at:

$$\omega_p = \frac{1}{RC} \tag{5.14}$$

thereby providing a maximum frequency-dependent phase shift of $-90^{\circ}$. Recall from the Barkhausen criterion that a total phase shift of $360^{\circ}$ is necessary to sustain oscillation. Since the inverter exhibits a DC phase shift of $-180^{\circ}$ due to the signal inversion from the input to the output, the maximum total phase shift is $-270^{\circ}$. The loop therefore fails to sustain oscillation growth and is unconditionally stable. The output of the inverter will be biased at the threshold voltage. This suggests that oscillation may occur if the circuit contains multiple stages and multiple poles.


Figure 6.21. Single-stage inverter with a unity gain feedback

If we modify the loop as in Figure 6.22, the loop consists of two inverter stages with unity-gain feedback. Now there are two poles in the feedback signal path, which cause a frequency-dependent phase shift of up to $-180^{\circ}$. However, in this configuration the DC phase shift is 0°, which means that this circuit exhibits positive feedback rather than negative. As a result, instead of oscillating, the circuit simply latches up. That is, if V<sub>1</sub> rises, V<sub>2</sub> falls, allowing V<sub>1</sub> to rise further. This continues until V<sub>1</sub> reaches V<sub>DD</sub> and V<sub>2</sub> drops to ground, a state that will remain indefinitely. The total number of inversions in the loop must be odd so that the circuit does not latch up.



Figure 6.22. Two-stage inverters with a unity gain feedback

An ideal inverter stage, which does not introduce a pole to the system, is added to the loop as in Figure 6.23. As in the previous case, the frequency-dependent phase shift is -180°. However, in this case the DC phase shift is also -180°. The total phase shift reaches -360° only at infinite frequency. So, it is clear that we need at least three inverter stages, all of which contribute significantly to the phase shift, to achieve a ring oscillator oscillating at a finite frequency.

As a result, we arrive at the ring oscillator topology shown in Figure 6.24. If all stages are identical, the total frequency-dependent phase shift at $\omega_p$ (the 3 dB bandwidth of each stage) is -135°, and it approaches -270° as $\omega \to \infty$. Consequently, the frequency-dependent phase shift crosses -180° (or the total phase shift crosses -360°) at a finite frequency where each stage contributes a phase shift of -60°.



Figure 6.23. Three-stage inverters with two-poles and with a unity gain feedback



Figure 6.24. Three-stage ring oscillator

Each stage in a three-stage ring oscillator contributes a frequency-dependent phase shift of  $-60^{\circ}$ . However, they also contribute a DC phase shift of  $-180^{\circ}$ . As a result, the waveform at each node is  $-240^{\circ}$  out of phase with respect to the previous node.
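The $-60^{\circ}$ condition can be turned into an explicit oscillation frequency and a minimum stage gain. The following is a sketch under the assumption that each stage is modelled as a single-pole amplifier with transfer function $-A_0/(1 + s/\omega_p)$; the symbols $A_0$ (low-frequency gain per stage) and $\omega_{osc}$ are introduced here for illustration only.

$$\tan^{-1}\left(\frac{\omega_{osc}}{\omega_p}\right) = 60^{\circ} \quad\Rightarrow\quad \omega_{osc} = \sqrt{3}\,\omega_p$$

$$\left(\frac{A_0}{\sqrt{1 + (\omega_{osc}/\omega_p)^2}}\right)^3 = 1 \quad\Rightarrow\quad A_0 = 2$$

Under this simplified model, the three-stage ring oscillates at $\sqrt{3}\,\omega_p$, provided each stage has a low-frequency gain of at least 2.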

When we analyse the circuit of Figure 6.24 with any simulator, each node settles at the threshold voltage of the inverters if they are identical. To start the oscillation in the simulator, one of the nodes should be initially set to a different voltage. In practice, however, noise disturbs each node voltage, resulting in a growing waveform. If the gain of the stages is sufficient, the circuit exhibits rail-to-rail switching. Assuming N identical stages with a delay of $t_d$ in the ring oscillator, the oscillation frequency is generally:

$$f_o = \frac{1}{2 \cdot N \cdot t_d} \tag{5.15}$$
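To give a feel for the numbers, the relation above can be inverted to find the per-stage delay needed for a target frequency. The sketch below assumes the four-stage ring used later in this chapter and the 2.4 GHz to 3.2 GHz tuning range; the function names are illustrative.

```python
def ring_frequency_hz(n_stages, stage_delay_s):
    """Oscillation frequency of an N-stage ring: f_o = 1 / (2 * N * t_d)."""
    return 1.0 / (2.0 * n_stages * stage_delay_s)


def required_stage_delay_s(n_stages, f_osc_hz):
    """Per-stage delay needed to oscillate at f_osc_hz (inverse of the above)."""
    return 1.0 / (2.0 * n_stages * f_osc_hz)


for f_target in (2.4e9, 3.2e9):
    t_d = required_stage_delay_s(4, f_target)
    print(f"{f_target / 1e9:.1f} GHz -> t_d = {t_d * 1e12:.2f} ps per stage")
```

For four stages, the required stage delay is about 52 ps at 2.4 GHz and about 39 ps at 3.2 GHz, which illustrates how little delay each stage may contribute at these frequencies.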

In contrast to single-ended ring oscillators, differential ring oscillators can be implemented either with odd number or with even number of stages as shown in Figure 6.25.



Figure 6.25. (a) Differential ring oscillator with odd number of stages, (b) differential ring oscillator with even number of stages

Simple gain stages for single-ended and differential topologies are presented in Figure 6.26 (a) and (b), respectively. However, these stages do not have a frequency control input to implement a VCO.



Figure 6.26. (a) Single-ended ring oscillator buffer stage, (b) differential ring oscillator buffer stage.

In order to vary the frequency of oscillation, the single-ended gain stage is modified as in Figure 6.27. Such stages are called "current-starved" inverters, and the ring oscillators employing them are called current-starved ring oscillators [28]. The basic problem with current-starved ring oscillators is that their frequency varies nonlinearly with the control voltage. They also have an extremely limited operating range.



Figure 6.27. Current-starved ring oscillator buffer stages

Another option to tune the output frequency is to vary the capacitance or resistance seen at the output node of each stage [2, 30]. This is presented in Figure 6.28, where a MOS device operates as a voltage-dependent resistor, thereby varying the effective capacitance seen at the output node. However, as in the current-starved oscillators,  $K_{VCO}$  can experience substantial variation, if a wide tuning range is required.



Figure 6.28. Delay control with capacitive tuning

The differential stage of Figure 6.26 (b) can be modified as in Figure 6.29 to achieve frequency control. The problem with the circuit of Figure 6.29 (a) is that the time constant at the output does not directly depend on the tail current, because the output resistance and capacitance are not functions of the tail current. Thus, the tuning range of an oscillator employing this stage is very small. In Figure 6.29 (b), the load devices are biased in the triode region and $V_C$ adjusts their on-resistance. As $V_C$ decreases, the delay of the stage drops because the time constant at the output nodes decreases. However, the small-signal gain also decreases, and as the gain of each stage drops, the circuit eventually fails to oscillate. In Figure 6.29 (c), $V_C$ adjusts the tail current and the small-signal impedance of the load devices varies accordingly, but the voltage gain remains constant. Thus, the circuit seems appropriate for a VCO stage. The problem is that the large-signal output voltage swing still depends on the current.



Figure 6.29. Delay control in differential buffer stages

# 6.3.2. Construction of the Differential Ring Oscillator

In this application, a four-stage differential ring oscillator structure has been used as the VCO. The VCO uses the "interpolation" technique for tuning [30]. As illustrated in Figure 6.30 (a), each stage consists of a fast path and a slow path whose outputs are summed and whose gains are adjusted by $V_{cont}$ in opposite directions. At one extreme of the control voltage, only the fast path is on and the slow path is disabled, yielding the maximum oscillation frequency (Figure 6.30 (b)). Conversely, at the other extreme, only the slow path is on and the fast path is off, providing the minimum oscillation frequency (Figure 6.30 (c)). If $V_{cont}$ lies between the two extremes, each path is partially on and the total delay is a weighted sum of their delays.



Figure 6.30. a) Interpolating delay stage, b) smallest delay, c) largest delay
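The weighted-sum behaviour of the interpolating stage can be illustrated with a first-order model. This is only a sketch: the weight `alpha` (1 for fast path only, 0 for slow path only) is a hypothetical stand-in for the effect of $V_{cont}$, and the fast and slow delays are chosen so that a four-stage ring spans the 2.4 GHz to 3.2 GHz range, using the ring relation $f_o = 1/(2 N t_d)$.

```python
N_STAGES = 4  # four-stage differential ring oscillator, as stated in the text


def stage_delay_for(f_osc_hz, n_stages=N_STAGES):
    """Per-stage delay that gives f_osc_hz in an n-stage ring."""
    return 1.0 / (2.0 * n_stages * f_osc_hz)


T_FAST = stage_delay_for(3.2e9)  # assumed delay with only the fast path on
T_SLOW = stage_delay_for(2.4e9)  # assumed delay with only the slow path on


def interpolated_frequency_hz(alpha, n_stages=N_STAGES):
    """Ring frequency when the stage delay is a weighted sum of both paths.

    alpha is a hypothetical interpolation weight in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    t_d = alpha * T_FAST + (1.0 - alpha) * T_SLOW
    return 1.0 / (2.0 * n_stages * t_d)


for alpha in (0.0, 0.5, 1.0):
    print(f"alpha = {alpha:.1f} -> {interpolated_frequency_hz(alpha) / 1e9:.3f} GHz")
```

Because delays add rather than frequencies, equal weighting in this model gives the harmonic mean of the two extreme frequencies, slightly below 2.8 GHz. The real transfer curve depends on how the path gains vary with the control voltage.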

The top-level *Cadence* schematic view of the differential VCO with its self-bias circuits is given in Figure 6.31. Since the VCO must have a tuning range from 2.4 GHz to 3.2 GHz, special attention has been paid to the design of each standalone sub-block of the VCO.

A differential control path coming from the loop filter controls the VCO frequency. The slow path and fast path of each VCO delay stage receive VCONT\_P and VCONT\_N as frequency control voltages. Also note that the output of the four-stage delay cell chain is buffered in order to obtain a rail-to-rail voltage swing. A voltage divider built from a three-resistor chain is used to obtain the bias voltage for the circuits.



Figure 6.31. Top-level Cadence schematic view of the differential VCO

Simulation results show that the gain of the VCO, $K_{VCO}$, for the designed circuit is 2.65 GHz/V. The transfer curve of the VCO is given in Figure 6.32. The differential control voltage is defined as VCONT\_P – VCONT\_N.




Figure 6.32. Transfer characteristic of the differential VCO
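A linearized view of the transfer characteristic helps relate $K_{VCO}$ to the control voltage needed at the ends of the tuning range. The sketch below assumes a straight-line model around the point quoted later in the text (2.5 GHz at zero differential control voltage); the simulated curve is not exactly linear, so the results are estimates only.

```python
import math

K_VCO_HZ_PER_V = 2.65e9  # simulated VCO gain quoted in the text
F_ZERO_HZ = 2.5e9        # frequency at zero differential control voltage (from the text)


def vco_frequency_hz(v_diff):
    """Straight-line approximation: f = f_zero + K_VCO * (VCONT_P - VCONT_N)."""
    return F_ZERO_HZ + K_VCO_HZ_PER_V * v_diff


def control_voltage_for(f_hz):
    """Differential control voltage needed for f_hz under the same approximation."""
    return (f_hz - F_ZERO_HZ) / K_VCO_HZ_PER_V


# The same gain expressed in rad/(V*s), the unit used in loop-dynamics equations.
K_VCO_RAD_PER_VS = 2.0 * math.pi * K_VCO_HZ_PER_V

for f in (2.4e9, 3.2e9):
    print(f"{f / 1e9:.1f} GHz -> V_diff = {control_voltage_for(f) * 1e3:.1f} mV")
print(f"K_VCO = {K_VCO_RAD_PER_VS:.3e} rad/(V*s)")
```

Under this approximation, the whole 2.4 GHz to 3.2 GHz range is covered by a differential control voltage swing of roughly 0.3 V.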

The VCO is based on differential buffer delay stages with symmetric loads, replica-feedback biasing [31] and an output buffer. In the following sub-chapters, the biasing circuit, the interpolating delay stage and the buffer, which together form the VCO, are discussed.

#### 6.3.3. Design of Differential Delay Stage and Self-Biasing Circuit

Each stage of the VCO consists of a three-buffer structure as given in Figure 6.33, and these three differential buffers are identical except for their control voltages. Every individual buffer receives separate differential control voltages, which are generated by separate biasing circuits. As seen in Figure 6.31, there are three independent biasing circuits for the three interpolating buffers. Two of them each receive one of the differential control voltages and generate the control voltages of the U1 and U2 buffers, while the third biasing circuit generates a fixed bias voltage for U3. Hence, U3 can be regarded as a fixed-delay buffer.



Figure 6.33. Implementation of delay interpolating in the differential VCO

The differential buffer stage, shown in Figure 6.34, contains a source-coupled pair with resistive load elements called symmetric loads. Symmetric loads consist of a diode-connected PMOS device in shunt with an equally sized PMOS device. The control voltage is applied to the gates of the PMOS devices, and a bias voltage is applied to the gate of the NMOS current source to adjust the bias current of the buffer stages. The device geometries of the differential buffer are given in Table 6.4.



Figure 6.34. Ring oscillator buffer stage

| Device | W (µm) | L (µm) |
|--------|--------|--------|
| M1     | 1      | 0.12   |
| M2     | 1      | 0.12   |
| M3     | 1      | 0.12   |
| M4     | 1      | 0.12   |
| M5     | 1      | 0.12   |
| M6     | 1      | 0.12   |
| M7     | 7      | 0.6    |

Table 6.4. Device geometries of differential buffer

The Spectre DC analysis result of the differential buffer is given in Figure 6.35. Note that the differential outputs are symmetrical about $V_{DD}/2$ with respect to the differential inputs.



Figure 6.35. DC analysis result of differential buffer

The VCO design uses a self-biasing technique to generate the necessary bias voltages of the buffer stages. The key idea behind self-biasing is that it allows circuits to choose the operating bias levels in which they function best. By referencing all bias voltages and currents to other generated bias voltages and currents, the operating bias levels are essentially established by the operating frequency. The need for external biasing, which can require special bandgap circuits, is completely avoided.

The biasing technique is shown in Figure 6.36. The $V_{ctrl}$ input of the circuit is one of the differential control pair coming from the differential loop filter. If the biasing circuit receives $V_{ctrl}+$, then the outputs are $V_b+$ and $V_c+$, and these control voltages are used to tune the fast path (U1) of the interpolating three-buffer delay stage. Conversely, if the circuit receives $V_{ctrl}-$, then $V_b-$ and $V_c-$ control the slow path buffer (U2).

The control voltage, $V_{ctrl}$, is applied to the gate of M2. Connecting the gates of the NMOS devices M5 and M6 to $V_{DD}$ turns on both branches of the differential pair. $V_{ctrl}$ is also applied to the negative input of an opamp, and the output of the differential pair (the gate of the diode-connected PMOS device) is connected to the positive input of the opamp. The output of the opamp, $V_b$, is applied to the gates of the NMOS current sources (M7 and M8). The NMOS inputs (M5 and M6) behave like switches; neglecting their $R_{on}$ resistance, they can be treated as short circuits. In this way, the PMOS devices supply a current to the NMOS current sources, which operate as a common-source gain stage from $V_b$ to $V_c$. The inputs of the opamp are swapped in sign because this additional output stage introduces a negative gain.

Consequently, the feedback in the bias circuit forces the output voltage $V_c$ to follow $V_{ctrl}$. In other words, the circuit basically operates as a voltage-controlled current source.



Figure 6.36. Schematic view of the self-biasing circuit

$V_c$ and $V_b$ form the differential control voltage of the VCO buffers. As mentioned above, $V_c$ is derived from $V_{ctrl}$, which comes from the loop filter, and it follows the variations of $V_{ctrl}$. At the same time, $V_b$, the differential complement of $V_c$, is generated by the opamp.

The opamp circuit used in the self-biasing circuit and the corresponding device geometries are given in Figure 6.37 and Table 6.5, respectively.



Figure 6.37. Biasing opamp circuit

| Device | W (µm) | L (µm) |
|--------|--------|--------|
| M1     | 15     | 0.48   |
| M2     | 15     | 0.48   |
| M3     | 2      | 0.48   |
| M4     | 2      | 0.48   |
| M5     | 12     | 0.48   |
| M6     | 18     | 0.48   |
| M7     | 4.8    | 0.48   |
| M8     | 2.4    | 0.48   |
| M9     | 0.48   | 0.48   |

Table 6.5. Device geometries of the opamp circuit

The opamp is implemented with a single-stage differential amplifier. Using more than one stage for the opamp introduces stability problems because the buffer stage also behaves as an output gain stage for the opamp.

The most significant advantage of this biasing technique is that the NMOS current source is dynamically biased by $V_b$ to compensate for drain and substrate voltage variations, achieving the effective performance of a cascode current source without the extra voltage headroom that a cascode requires. The bias current is completely determined by $V_{ctrl}$; thus, if the substrate voltage changes, the bias current remains the same and the opamp adjusts $V_b$ so that the currents (sourced by the PMOS devices and sunk by the NMOS device) are equal. This is also valid for drain voltage variations. However, the bandwidth of the bias generator should be as large as possible to track high-frequency voltage variations that can affect the PLL.

The DC analysis result of the bias circuit is given in Figure 6.38. In this simulation, $V_{ctrl}$ has been swept from 0 to 1.2 V, and the $V_c$ and $V_b$ control voltages have been observed. As seen from the plot, both differential outputs behave symmetrically inside the linear region, and the linear interval of the circuit is around 700 mV. With this biasing circuit, the VCO can be controlled linearly between ~400 mV and ~1.1 V, which allows enough margin for a wide tuning range.



Figure 6.38. Differential output range of the self-biasing circuit

The differential output of the four-stage ring oscillator does not swing rail-to-rail. Instead, the $V_c$ control voltage of the differential buffer determines the lower limit of the output amplitude, while the upper limit is at $V_{DD}$. In order to obtain rail-to-rail operation at the output of the overall VCO, the generated clock should be amplified by means of a buffer chain. The design of the output buffer is discussed in detail in the following sub-chapter.

#### 6.3.4. Design of VCO Output Buffer

The output signal swing of the VCO is between $V_{DD}$ and $V_c$; however, the inputs of the frequency divider and the phase detector must be rail-to-rail to operate properly. Besides, if the output load of the VCO is too high compared to the input load of the buffer stages, the output frequency of the VCO will be lower for the same control voltage, or the VCO may even fail to oscillate. Consequently, the loop operating point departs from midrail, which is assumed to be the optimum point. If the operating point moves far from the optimum point, the loop may fail to lock. The divider is not the only load of a frequency synthesizer; the output is also expected to drive the clock inputs of the digital blocks. As a result, the output signal should be amplified to achieve a rail-to-rail swing and sufficient drive capability without loading the VCO.

Another problem is the output duty cycle. Even though this is not a design specification, it should be close to 50%. To achieve a 50% duty cycle at the output, the loop is generally designed to operate at twice the output frequency, and a divide-by-two circuit is used to obtain the desired output frequency with a 50% duty cycle. However, it is not feasible to design a loop operating at 6.4 GHz just to achieve a 50% duty cycle at the output.

Digital buffering technique has been used in the output amplifier. Output buffer simply contains digital buffer chains and special attention has been paid on sizing each stage of the output buffer. *Cadence* schematic view of the differential buffer chain is given in Figure 6.39. A resistor is used between the input and the output of the first inverter stage to reduce the peaking at the output when the signal switches.

The input stage of the buffer chain is AC coupled by means of a floating capacitor so that it is not affected by common-mode variations at the input. The value of the capacitor is selected to be 100 fF. Furthermore, the input of the first inverter is connected to its output via a 20 kohm resistor. This is done so that the gain of the first stage can be increased to a level sufficient to amplify the input clock signal. The transient responses of the differential input and output signals are given in Figure 6.40. In the simulation, the input of the output buffer has a frequency of 3.2 GHz and an amplitude of 300 mV around a 750 mV DC level. Note that the output signal swings rail-to-rail at 3.2 GHz without any skew.



Figure 6.39. Output buffer chain of the VCO



Figure 6.40. Transient response of the VCO output buffer

Since the input stage of the buffer chain contains a capacitor and a resistor, the AC characteristics of the circuit become important. The buffer chain has band-pass behaviour because of these passive elements. The pass band of the output buffer should cover the operating frequencies of the clock recovery circuit. The AC response of the buffer chain is given in Figure 6.41. Note that frequency components between 2.4 GHz and 3.2 GHz are inside the pass band.



Figure 6.41. AC response of VCO output buffer
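The lower edge of the pass band can be estimated roughly from the coupling capacitor and the feedback resistor. The sketch below is a first-order estimate only: it assumes the Miller approximation, in which a feedback resistor R around an inverting stage of gain magnitude A presents about R/(1 + A) at the input, and it neglects the gate capacitance of the first inverter. The stage gain values are hypothetical, since the text does not quote the gain of the first stage.

```python
import math

C_COUPLING_F = 100e-15  # AC-coupling capacitor from the text
R_FEEDBACK_OHM = 20e3   # feedback resistor around the first inverter, from the text


def highpass_corner_hz(c_f, r_feedback_ohm, stage_gain):
    """First-order high-pass corner of the AC-coupled, resistively fed-back inverter.

    stage_gain is the (assumed) gain magnitude of the first inverter; the
    effective input resistance is taken as R / (1 + gain) (Miller approximation).
    """
    r_in = r_feedback_ohm / (1.0 + stage_gain)
    return 1.0 / (2.0 * math.pi * r_in * c_f)


for gain in (0.0, 2.0, 4.0):  # hypothetical gain values for illustration
    f_c = highpass_corner_hz(C_COUPLING_F, R_FEEDBACK_OHM, gain)
    print(f"gain = {gain:.0f} -> corner = {f_c / 1e6:.0f} MHz")
```

For moderate gain values, this estimate places the corner well below 2.4 GHz, which agrees with the statement that the operating band lies inside the pass band. The simulated response in Figure 6.41 remains the authoritative result.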

The overall VCO circuit with its output buffer draws 6.21 mA from the power supply under typical process and temperature conditions. This corresponds to a power consumption of 7.45 mW with a $V_{DD}$ of 1.2 V.

# 7. TOP LEVEL CONSTRUCTION OF THE CIRCUIT AND LAYOUT CONSIDERATIONS

# 7.1. Introduction

In this chapter, top-level construction, top-level simulations, detailed description of system level functionality issues of the clock and data recovery circuit are given. Layout considerations and techniques are discussed.

#### 7.2. Top-Level Construction of the Circuit

The previous chapters have presented the design of all sub-blocks that are used in the clock and data recovery structure, as well as the complete design of the coarse and fine loops. The operation and the performance characteristics of these essential components were also verified with extensive simulations. It was demonstrated that the electrical performance of the sub-blocks is well within the expected bounds, so that the overall clock recovery architecture can be expected to match the initial specifications. At this point, the final task is to combine the designed components, including both the coarse and fine loops, into the final clock recovery structure.

The *Cadence* top-level schematic view of the clock and data recovery circuit is given in Figure 7.1. Note that there are additional sub-blocks which have not been discussed in the previous chapters, such as "powerdown", "data retiming" and "output buffer". These sub-blocks are briefly discussed in this chapter.



Figure 7.1. Top-level *Cadence* schematic view of the clock and data recovery

Data retiming circuit is used to retime the recovered data with the aligned VCO clock. There is a master slave differential flip-flop inside this block and the outputs of the circuit are the recovered clock and data signals.

The output buffer contains four symmetric digital buffers with high drive capability for the differential clock and data signals. The buffer can drive a 100 fF load at the 3.2 GHz operating frequency.

Despite its simple structure, the power down circuit has a crucial function in the overall system. This combinational circuit processes the power down, lock and data loss signals, and generates the global power down signal as well as separate power down signals for the coarse loop and the fine loop. The truth table of the circuit is given in Table 7.1.

| PD_IN | LOCK | DATA LOSS | PD | PD_FINE | PD_COARSE |
|-------|------|-----------|----|---------|-----------|
| 1     | 1    | 1         | 1  | 1       | 1         |
| 1     | 1    | 0         | 1  | 1       | 1         |
| 1     | 0    | 1         | 1  | 1       | 1         |
| 1     | 0    | 0         | 1  | 1       | 1         |
| 0     | 1    | 1         | 0  | 1       | 0         |
| 0     | 1    | 0         | 0  | 1       | 0         |
| 0     | 0    | 1         | 0  | 1       | 0         |
| 0     | 0    | 0         | 0  | 0       | 1         |

Table 7.1. Truth table of power down circuit
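The truth table can be captured by three Boolean expressions. The sketch below is one possible set of expressions that reproduces Table 7.1 exactly as printed; it is not necessarily the gate-level structure of the actual circuit, and the signal polarities (including that of LOCK) are taken literally from the table.

```python
from itertools import product


def power_down_logic(pd_in, lock, data_loss):
    """Return (PD, PD_FINE, PD_COARSE) for 0/1 inputs, matching Table 7.1 as printed."""
    pd = pd_in
    pd_fine = pd_in | lock | data_loss
    pd_coarse = pd_in | ((1 - lock) & (1 - data_loss))
    return pd, pd_fine, pd_coarse


# Table 7.1 transcribed row by row: (PD_IN, LOCK, DATA LOSS) -> (PD, PD_FINE, PD_COARSE)
TABLE_7_1 = {
    (1, 1, 1): (1, 1, 1),
    (1, 1, 0): (1, 1, 1),
    (1, 0, 1): (1, 1, 1),
    (1, 0, 0): (1, 1, 1),
    (0, 1, 1): (0, 1, 0),
    (0, 1, 0): (0, 1, 0),
    (0, 0, 1): (0, 1, 0),
    (0, 0, 0): (0, 0, 1),
}

# Exhaustive check of the expressions against the transcribed table.
for inputs in product((0, 1), repeat=3):
    assert power_down_logic(*inputs) == TABLE_7_1[inputs]
```

Writing the table as expressions makes two properties easy to see: PD_IN overrides everything, and in the tabulated rows with PD_IN = 0 exactly one of the two loops is powered down at any time.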

The schematic of the power down circuit is given in Figure 7.2. The DATA LOSS signal comes from outside and is generated by the system. When this signal is high, meaning that the data is invalid, phase alignment is no longer meaningful. However, a continuous VCO clock at the data frequency should still be supplied by the clock recovery circuit. Hence, when DATA LOSS is received, PD\_COARSE is pulled down and PD\_FINE is pulled up. In this way, the coarse loop is enabled while the fine loop is disabled.

During normal operation, as discussed in the previous chapters, LOCK signal indicates that frequency lock is achieved. Consequently, pulling up PD\_COARSE disables coarse loop and pulling down PD\_FINE signal enables fine loop.



Figure 7.2. Schematic view of power down circuit

Necessary bias voltages for sub-blocks are generated by means of a resistor chain between  $V_{DD}$  and GND. These voltage values are 400 mV and 500 mV. Also, in order to filter the ripples on the supply voltage a capacitor with a value of 500 pF has been used between  $V_{DD}$  and GND. This capacitor has been implemented by using NMOS device gate oxide capacitance, which has a width of 2.3 mm and a length of 20 µm.

#### 7.3. Top-Level Simulations of the Circuit

After constructing the top-level structure of the two-loop architecture, *Spectre* simulations have been performed to verify that the circuit meets the initial specifications. First, the fine and coarse loops have been simulated separately.

Figure 7.3 (a) and (b) show, respectively, the UP and DN signals during lock and the differential control voltage together with the LOCK signal for the standalone coarse loop. Note that after frequency lock, the UP and DN signals are both zero. The simulation has been performed with a reference clock frequency of 175 MHz, which corresponds to a VCO clock frequency of 2.8 GHz.



Figure 7.3. Coarse loop simulation results (a) UP and DN signals during lock (b) Differential control voltage variation with LOCK signal

In Figure 7.4, the divided VCO clock and the 175 MHz reference clock are shown after lock is achieved. Note that the frequencies of the two clock signals are matched while their phases are not.



Figure 7.4. Divided VCO clock and reference clock after frequency lock

The next step of the design is to perform two-loop simulations. The following simulation has been performed with a reference clock of 200 MHz, which corresponds to a VCO clock of 3.2 GHz during lock.
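The two reference frequencies used in these simulations imply a fixed ratio between the VCO clock and the reference clock. The sketch below infers that ratio from the quoted pairs (175 MHz for 2.8 GHz, 200 MHz for 3.2 GHz); the constant and function names are illustrative.

```python
# Ratio inferred from the quoted pairs: 2800 / 175 = 3200 / 200 = 16.
DIVIDE_RATIO = 16


def vco_frequency_mhz(f_ref_mhz, ratio=DIVIDE_RATIO):
    """VCO frequency at which the divided clock matches the reference."""
    return f_ref_mhz * ratio


def reference_frequency_mhz(f_vco_mhz, ratio=DIVIDE_RATIO):
    """Reference clock needed for a given VCO frequency."""
    return f_vco_mhz / ratio


for f_ref in (175.0, 200.0):
    print(f"f_ref = {f_ref:.0f} MHz -> f_VCO = {vco_frequency_mhz(f_ref):.0f} MHz")
print(f"2400 MHz needs f_ref = {reference_frequency_mhz(2400.0):.0f} MHz")
```

With this ratio, the 2.4 GHz to 3.2 GHz VCO range corresponds to reference clocks from 150 MHz to 200 MHz.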



Figure 7.5. Two-loop simulation result at 3.2 Gbps data rate (a) Differential control voltage and LOCK signal (b) Aligned data and clock signals

In Figure 7.5 (a), the differential control voltage variation and the LOCK signal are given. Under typical conditions, the coarse loop lock time is around 4 $\mu$s. After frequency acquisition by the coarse loop, phase alignment is achieved. The data and clock relationship is given in Figure 7.5 (b). The rising edge of the VCO clock is in the middle of the data eye with a static phase offset of around 10 ps. This phase offset is due to the dead zone of the phase detector and charge pump circuits.
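To put the 10 ps static phase offset in perspective, it can be expressed as a fraction of the bit period. The sketch below assumes a full-rate clock (one clock period per bit, as with the 3.2 GHz clock for 3.2 Gbps data); the function names are illustrative.

```python
def unit_interval_ps(data_rate_gbps):
    """Bit period (unit interval, UI) in picoseconds."""
    return 1000.0 / data_rate_gbps


def offset_in_ui(offset_ps, data_rate_gbps):
    """Static phase offset expressed as a fraction of one UI."""
    return offset_ps / unit_interval_ps(data_rate_gbps)


def offset_in_degrees(offset_ps, data_rate_gbps):
    """The same offset in degrees of the full-rate clock period."""
    return 360.0 * offset_in_ui(offset_ps, data_rate_gbps)


ui = unit_interval_ps(3.2)
print(f"UI = {ui:.1f} ps, ideal sampling point = {ui / 2:.2f} ps after the data edge")
print(f"10 ps offset = {offset_in_ui(10.0, 3.2):.3f} UI "
      f"= {offset_in_degrees(10.0, 3.2):.2f} degrees")
```

At 3.2 Gbps the unit interval is 312.5 ps, so a 10 ps offset moves the sampling edge by only a few percent of the bit period away from the centre of the eye.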

Another two-loop top-level simulation has been performed with 2.5 Gbps data rate. Simulation results, similar to the previous one, are given in Figure 7.6.



Figure 7.6. Two-loop simulation result at 2.5 Gbps data rate (a) Differential control voltage and LOCK signal (b) Aligned data and clock signals

While operating at the 2.5 Gbps data rate, the frequency lock time decreases to 3.56 $\mu$s. According to Figure 6.32, a 2.5 GHz clock is generated when the differential control voltage is zero. This is verified by the simulation plot given above: during lock at 2.5 Gbps, $V_{CTRLP} = V_{CTRLN} = 600$ mV.

To show the power consumption, supply currents of different operating frequencies are plotted in Figure 7.7.



Figure 7.7. Supply current (power consumption) of the two-loop clock recovery

In Figure 7.7 (a), the operating frequency is 2.5 GHz. At this frequency the coarse loop draws 11.84 mA under typical conditions, which equals 14.21 mW with a power supply of 1.2 V. At the same data rate, when the system switches to the fine loop, the average supply current increases to 14.15 mA, which corresponds to a power consumption of 16.98 mW.

On the other hand, in Figure 7.7 (b), the operating frequency is 3.2 GHz. At this maximum frequency, the power consumption of the system reaches its peak value. While the coarse loop is on, the average supply current is 12.47 mA, so the power consumption equals 14.96 mW. When the fine loop is enabled, the average supply current reaches its maximum value of 15.46 mA; thus, the maximum power consumption of the system under typical process and temperature conditions with a power supply voltage of 1.2 V is 18.55 mW.
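The quoted power figures follow directly from the average supply currents and the 1.2 V supply. A minimal sketch that recomputes them is given below; the dictionary layout and names are illustrative, and the currents are the typical-condition values quoted in the text.

```python
VDD_V = 1.2  # nominal supply voltage

# Average supply currents in mA quoted in the text, keyed by (frequency, active loop).
SUPPLY_CURRENT_MA = {
    ("2.5 GHz", "coarse"): 11.84,
    ("2.5 GHz", "fine"): 14.15,
    ("3.2 GHz", "coarse"): 12.47,
    ("3.2 GHz", "fine"): 15.46,
}


def power_mw(current_ma, vdd_v=VDD_V):
    """Power in mW from supply current in mA and supply voltage in V."""
    return current_ma * vdd_v


for (freq, loop), i_ma in SUPPLY_CURRENT_MA.items():
    print(f"{freq}, {loop} loop: {i_ma:.2f} mA -> {power_mw(i_ma):.2f} mW")
```

The same function applied to the specified maximum supply current of 23 mA gives 27.6 mW at the nominal supply, which can be compared with the typical 18.55 mW peak.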

# 7.4. System Level Functionality of Clock and Data Recovery Circuit

This clock and data recovery circuit can be part of a serialization/deserialization (SERDES) macro that can be integrated in any serial link interface chip with a large number of channels (> 64).

The receiver side of the SERDES has an integrated PLL-based clock and data recovery block, an elastic memory and a serial-in-parallel-out (SIPO) block. In some applications, the high-speed system clock arrives at the chip together with the data. The clock and data recovery block retimes the data with the generated write clock. The transition between the two clock domains of the same frequency is accomplished by the elastic memory. After this block, the recovered data is synchronized to the read clock (the system clock). The whole chip operates synchronously with the high-speed system clock or its low-speed derivative. The transmit side involves a parallel-in-serial-out (PISO) block to serialize the parallel data coming from the digital core. The high-speed system clock is transmitted to this side together with the data coming from the digital core. The signalling between the receive and transmit channels is fully differential. The high-speed LVDS IOs are also differential in/out. The block diagram of the SERDES macro is given in Figure 7.8.

A typical application of the SERDES macro is a chip in which N channels are employed (see Figure 7.9). It is assumed in such a system that the external high-speed clock (> 2.4 GHz) or a synchronous low-speed reference clock is transmitted to the chip from the PCB or the backplane. Error-free transfer between the high-frequency (HF) and low-frequency (LF) domains within the chip is achieved by using symmetrical clock trees for both clocks. Some applications may require the input and output channels to be on the same side of the chip. The layouts of the macros should be suitable for such a compact in/out channel layout.



Figure 7.9. Application block diagram N-channel SERDES chip

The SERDES receiver starts its operation by receiving the 2.4-3.2 Gbps input data through fully differential LVDS IOs. Each receiver channel has a dedicated clock recovery cell, which extracts the clock from the incoming data and aligns this clock in phase with the data. The recovered data is then deserialized with the system clock by custom FIFO-SIPO circuitry, and the low-speed parallel data is transmitted to the digital core.

The block diagram of the Receiver (RX) is given in Figure 7.10. It consists of differential in/out LVDS receivers, clock divider block (which can be common to all receiver channels in an application), clock and data recovery block, FIFO and SIPO.



Figure 7.10. Internal block diagram of SERDES receiver

After several analyses, the DC and AC specifications of the clock and data recovery macro have been determined as given in Table 7.2 and Table 7.3, respectively.

| Symbol          | Parameter              | Min  | Nom | Max  | Unit |
|-----------------|------------------------|------|-----|------|------|
| V <sub>DD</sub> | Operating power supply | 1.08 | 1.2 | 1.32 | V    |
| T <sub>J</sub>  | Junction temperature   | 0    | 25  | 125  | °C   |

Table 7.2. Power supply and temperature specifications of CDR

| Parameter                        | Min | Typ  | Max  | Unit |
|----------------------------------|-----|------|------|------|
| Input DATAIN data rate           | 2.4 |      | 3.2  | Gbps |
| Ref CLK accuracy                 |     |      | 1000 | ppm  |
| Input Jitter Tolerance @ 600 Hz  |     | 15   |      | UI   |
| Input Jitter Tolerance @ 6 kHz   |     | 1.5  |      | UI   |
| Input Jitter Tolerance @ 100 kHz |     | 1.5  |      | UI   |
| Input Jitter Tolerance @ 2 MHz   |     | 0.15 |      | UI   |
| Jitter Peaking                   |     |      | 1.5  | dB   |
| Jitter Transfer Bandwidth        |     |      | 4    | MHz  |
| Input DATAIN Duty Cycle *        | 40  |      | 60   | %    |
| LOCK time                        |     |      | 10   | μs   |
| Power Consumption                |     | 15.5 | 23   | mA   |

\* While the input data is an alternating 10101... pattern

Table 7.3. AC specifications of the CDR
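The jitter tolerance entries of Table 7.3 are given in unit intervals (UI), so their absolute value depends on the data rate. The short sketch below converts them to picoseconds at both ends of the supported range. It assumes only that one UI equals one bit period; the function and variable names are illustrative and do not come from the design flow used in this work.

```python
# Convert the jitter-tolerance entries of Table 7.3 from unit intervals (UI)
# to absolute time. Assumption: 1 UI = one bit period = 1 / data rate.

# Jitter modulation frequency in Hz -> tolerance in UI (values from Table 7.3)
JITTER_TOLERANCE_UI = {
    600.0: 15.0,
    6e3: 1.5,
    100e3: 1.5,
    2e6: 0.15,
}


def unit_interval_ps(data_rate_gbps: float) -> float:
    """Return one unit interval (bit period) in picoseconds."""
    return 1e3 / data_rate_gbps


def tolerance_ps(data_rate_gbps: float) -> dict:
    """Return the Table 7.3 jitter tolerance in picoseconds for a given data rate."""
    ui = unit_interval_ps(data_rate_gbps)
    return {freq: tol_ui * ui for freq, tol_ui in JITTER_TOLERANCE_UI.items()}


for rate in (2.4, 3.2):
    print(f"{rate} Gbps: 1 UI = {unit_interval_ps(rate):.1f} ps")
    for freq, tol in tolerance_ps(rate).items():
        print(f"  jitter at {freq:>9.0f} Hz: {tol:8.1f} ps")
```

At 3.2 Gbps one UI is 312.5 ps, so the 0.15 UI entry at 2 MHz corresponds to roughly 47 ps of tolerated jitter.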

### 7.5. Layout Considerations

Since the operating frequency of the circuit reaches up to 3.2 GHz, special attention should be paid in the layout-engineering phase of the design. In most cases, especially on the high-speed paths, special techniques should be used in the layout in order to obtain a robust design. The following sections discuss the main points concerning the layout of the clock recovery circuit.

#### 7.5.1. Layer Sharing

The layout of this design is drawn in a 1P8M process (one poly, eight metal layers). The seventh and eighth metals are reserved for other purposes of the test chip. This leaves only six metal layers for the clock recovery layout. These metal layers are allocated as follows:

- Metal 1: transistor level routing
- Metal 2: cell level routing
- Metal 3: CDR signal routing + power routing
- Metal 4: CDR signal routing + power routing
- Metal 5: Power routing (CDRGND)
- Metal 6: Power routing (CDRVDD)

#### 7.5.2. Reliability

The layout has been drawn according to the UMC design rules [32], [33], [34] and reliability rules, so that it works properly for 10 years at 100°C. The reliability rulebook determines the widths of the metal layers and the numbers of vias according to the current flowing through them. Since the resistance of vias and contacts is not negligible, as many of them as possible are placed in parallel. So that the functionality of the circuit never depends on a single contact or via, they are always used at least in pairs.
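The via-count rule described above can be sketched as a small helper. This is only an illustration: the per-via current limit used in the example is a hypothetical placeholder, not a value from the UMC reliability rules, and the function name is an assumption made for this sketch.

```python
import math


def vias_required(current_ma: float, max_ma_per_via: float, minimum: int = 2) -> int:
    """Number of parallel vias for a connection carrying `current_ma`.

    The count covers the current limit per via and is never below `minimum`,
    so that a single failed via cannot open the connection.
    """
    if current_ma < 0 or max_ma_per_via <= 0:
        raise ValueError("current must be >= 0 and the per-via limit must be > 0")
    return max(minimum, math.ceil(current_ma / max_ma_per_via))


# Hypothetical per-via limit, chosen only for illustration.
HYPOTHETICAL_LIMIT_MA = 0.3

print(vias_required(6.21, HYPOTHETICAL_LIMIT_MA))  # a supply-level current
print(vias_required(0.1, HYPOTHETICAL_LIMIT_MA))   # a small signal current
```

In the actual layout the rule is applied as a lower bound, since as many vias as possible are placed in parallel wherever space allows.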

#### 7.5.3. Symmetry and Placing

Symmetry is one of the most important considerations in a high-speed differential layout design. Both data paths should see exactly the same load capacitance throughout the circuit. Circuit sub-components should be placed symmetrically around a symmetry origin.

Common-centroid placement [35] is not an ideal solution for a circuit working at radio frequencies (RF). In such a placement, differential lines cross over each other many times, which introduces a large coupling capacitance. However, components on different data paths should not be drawn far apart either, since a mismatch on one side of the symmetry axis, or noise coming from one side of the circuit, would then affect only one line of the differential data. There is a trade-off here, and designers should choose the approach that suits their case better.

In the layout of this thesis, single elements such as current sources have been centred or split. Differential components are generally split to both sides of the symmetry origin line. This introduced crossovers on the data lines, which were anticipated.

Generally, differential signals that need to intersect, whether by design or because of routing constraints, are on the same metal layer, e.g. metal1. If one signal is jumped over the other by taking "A" to metal2 and leaving "B" on metal1:

- Data path "A" has additional via resistances on it.
- Data path "A" sees more capacitance to lines on the metal3 layer.
- Data path "B" sees more capacitance to the substrate, since it has more area on metal1.

To keep the symmetry, the better line, that is, the line that sees less resistance and capacitance, is modified to see the same resistance and capacitance as the other line.

This way, both lines are first taken to metal3, and then at the crossing point one path is taken to metal2 while the other is taken to metal4. A graphical representation can be seen in Figure 7.11.



Figure 7.11. Crossing of differential lines

In Figure 7.11, data "A" sees two sets of via2 while data "B" sees two sets of via3. Where the resistances of via2 and via3 differ, additional vias have been added to the data lines to compensate.
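The compensation of via resistance differences can be illustrated with a simple series/parallel model of the two crossing paths. The via resistances below are hypothetical values chosen only to show the idea, not process data, and the helper names are assumptions made for this sketch.

```python
def path_resistance(r_single_ohm: float, n_parallel: int, n_sets: int = 2) -> float:
    """Via resistance of one data path: `n_sets` via sets in series,
    each set made of `n_parallel` identical vias in parallel."""
    return n_sets * r_single_ohm / n_parallel


def best_parallel_count(r_target_ohm: float, r_single_ohm: float,
                        n_sets: int = 2, n_max: int = 16) -> int:
    """Parallel via count whose path resistance is closest to the target."""
    return min(
        range(1, n_max + 1),
        key=lambda n: abs(path_resistance(r_single_ohm, n, n_sets) - r_target_ohm),
    )


# Hypothetical single-via resistances (ohms), for illustration only.
R_VIA2 = 3.0
R_VIA3 = 2.0

# Path "A" crosses through two sets of via2, three vias per set.
r_path_a = path_resistance(R_VIA2, n_parallel=3)

# Choose the via3 count per set on path "B" that best matches path "A".
n_via3 = best_parallel_count(r_path_a, R_VIA3)
print(r_path_a, n_via3, path_resistance(R_VIA3, n_via3))
```

With these placeholder numbers, two via3 per set on path "B" give the same series resistance as three via2 per set on path "A".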

The implementation of this placement is somewhat more involved when four differential lines have to cross, but following the same philosophy, it can be done as in Figure 7.12.



Figure 7.12. Two pairs of differential lines crossing

Lines that have not seen via2 have been transferred to metal2 and back again just to introduce the same number of via2 as on the other data path. The same symmetrical via insertion is applied to the lines that have not seen via4 at the crossing point.

#### 7.5.4. Bending on Data Paths

$90^{\circ}$ bends on the data paths have been avoided, and $45^{\circ}$ turns have been implemented instead. If a data path has a $90^{\circ}$ turning point, the metal is more likely to break at that point due to electromigration. Such a bend may also affect different frequency components of a signal in different ways, and may thus degrade the integrity of the signal.

#### 7.5.5. Shielding

Circuits operating at high frequencies may generate a considerable amount of noise, and analogue circuits are susceptible to it. This CDR circuit will most probably be surrounded by other circuits that also work at gigahertz frequencies. Hence, close attention must be paid to shielding. The layout should be drawn so that the circuit itself does not become a noise source. Therefore, it is not enough to shield the circuits that are susceptible to noise; noise-emitting parts should also be shielded.

In the layout of the CDR, each sub-block has been shielded separately. These shields have been connected to the appropriate power rail, "$V_{DD}$" or "GND", at points that are far from the receiver core circuits. Noise carried by minority and majority carriers has been treated separately, and a shield has been drawn for each: n-well or p-substrate shielding, depending on the case.

Although the substrate and n-well biasing contacts are connected to the "$V_{DD}$" and "GND" nodes, they have not been shorted to the shielding pick-up vias, which are on the same metal and close to them. Instead, they have been connected to the power routing through different sets of vias, and the two are connected to each other only at the highest metal layer. In this way, when a noise component reaches a shield, it does not go directly to the substrate of the circuit; instead, it first goes to the highest metal layer of the related power routing and is carried out of the chip through a few sheet resistances of that metal. A graphical representation can be seen in Figure 7.13. As seen in the figure, a larger share of the noise coming from the substrate is carried out of the chip.





Although each sub-cell is shielded, there is another shield around the borders of the top-level CDR core layout.

#### 7.5.6. Dummy Components

During fabrication, components close to the centre tend to be fabricated better than those at the edges. This is due to etching and shading effects [35]. The use of dummy elements is advised to overcome this problem. In this way, mismatches between components are reduced drastically.

### 7.6. The Layout

The top-level layout of the CDR can be seen in Figure 7.14. The layout can be divided into four main parts: coarse loop control, fine loop control, VCO and loop filter components. Uncommented layout views can be found in Appendix B.



Figure 7.14. Top-level layout view of CDR circuit

The PFD of the coarse loop control circuitry is given in Figure 7.15. Note that both $V_{DD}$ and GND guard rings surround the whole cell for shielding.



Figure 7.15. Layout view of PFD

In Figure 7.16, the coarse loop charge pump is shown. The functional core of the charge pump is drawn symmetrically, since the circuit performance depends strongly on the current symmetry.



Figure 7.16. Layout view of coarse loop charge pump

One of the symmetry-critical components of the CDR is the CMFB block, whose layout is given in Figure 7.17. Since the differential control path passes through the CMFB, the layout of this block is well shielded against noise.



Figure 7.17. Layout view of CMFB circuit

The layout view of the lock detector circuit is given in Figure 7.18. Since the circuit contains two filtering capacitors, a considerable part of the layout area is occupied by these MOS capacitors.



Figure 7.18. Layout view of the lock detector

In Figure 7.19, the phase detector layout of the fine loop is given. Since the phase detector operates in the gigabit range, the supply voltage of the circuit is also filtered internally. The filter capacitors of the sub-block are placed around the cell, between the $V_{DD}$ and GND guard rings.



Figure 7.19. Layout view of the fine loop phase detector

Another high-speed part of the circuit is the differential charge pump of the fine loop. The layout view of the charge pump is given in Figure 7.20. The transistors that carry the differential control voltage charges should be well matched and placed symmetrically. Dummy transistors are also used in the layout to improve the matching between transistors.



Figure 7.20. Layout view of the fine loop charge pump circuit



Figure 7.21. Layout view of the VCO

In Figure 7.21, the top-level layout view of the VCO is given. This layout consists of the VCO self-biasing circuits, the 4-stage ring oscillator and the output buffer. Detailed layout views of each block are given in Appendix B. As in the phase detector, an internal supply voltage filtering technique is used in the layout of the VCO.

The top-level CDR layout has a size of 500 $\mu$m × 600 $\mu$m. The total area of the CDR is around $0.3 \text{ mm}^2$.

#### 8. CONCLUSION

The last decade of the twentieth century saw an explosive growth in the communications industry. People want to be connected all the time using wireless communication devices. In addition, the demand for high-bandwidth communication channels has exploded with the advent of the Internet. Thanks to the high density available on integrated circuits, sophisticated digital modulation schemes can be employed to maximize the capacity of these channels. This has changed the design of wireless and wire-line transceivers.

This thesis has presented the design, verification, system integration and physical realization of a monolithic high-speed clock and data recovery (CDR) circuit with an operating range between 2.4 Gbps and 3.2 Gbps. The circuit has been implemented and fabricated in UMC 0.13 $\mu$m digital CMOS technology.

The architecture of the CDR is based on the phase-locked loop technique. The circuit has been realized as a two-loop structure consisting of coarse and fine loops, which process the incoming low-speed reference clock and the high-speed random data, respectively. Frequency acquisition is performed by the coarse loop. A lock detection mechanism in the coarse loop switches between the coarse and fine loops when the VCO clock frequency reaches the data rate. The entire circuit architecture is built with a fully differential approach, consisting of symmetrical blocks and signal paths organized with special low-voltage design and layout techniques.

To realize the proposed circuit architecture, the dynamics of the two separate loops were determined individually, and the specifications of each block that makes up the loops were defined. At this step of the design, the CDR was modelled and simulated using MATLAB and *Simulink* software. The basic building blocks of the two-loop architecture are: (i) phase and frequency detector for the coarse loop, (ii) charge pump for the coarse loop, (iii) phase detector for the fine loop, (iv) charge pump for the fine loop, (v) differential VCO, and (vi) differential loop filter. These key components were designed to meet the desired specifications, and then the overall physical design of the CDR was constructed.

The basic functional description of the CDR is as follows:

At start-up, the coarse loop provides fast locking to the system frequency with the help of a low-speed reference clock. After the VCO clock comes close to the system frequency, the LOCK signal is pulled LOW (active low) and the coarse loop is turned off, while the fine loop is turned on. The fine loop tracks the phase of the generated clock with respect to the data and aligns the VCO clock such that its rising edge is in the middle of the data eye. The phase detector makes this operation possible. The speed and symmetry of the sub-blocks in the fine loop are extremely important, since all asymmetric charging effects, skew and setup/hold problems in this loop translate into a static phase error at the clock output.

It is desirable not to lose the recovered clock in the absence of an input data stream. When the DATA\_LOSS signal is raised HIGH, the fine loop is turned off and the coarse loop becomes functional again, in order to maintain the frequency of the recovered clock.

The PD\_IN signal is a global power down for both the fine and coarse loops. When it is raised HIGH, the clock recovery stops functioning and the clock/data outputs are stuck at logic '1' level.
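The loop selection described above can be summarized as a small behavioural sketch. This is a simplified model and not the implemented control logic: the priority of PD\_IN over the other signals is an assumption based on its description as a global power down, and the names `Mode`, `active_loop` and `lock_n` are introduced here for illustration only.

```python
from enum import Enum


class Mode(Enum):
    POWER_DOWN = "power_down"  # both loops off, outputs stuck at logic '1'
    COARSE = "coarse"          # frequency acquisition / frequency hold
    FINE = "fine"              # phase tracking on the incoming data


def active_loop(pd_in: bool, data_loss: bool, lock_n: bool) -> Mode:
    """Select the operating mode of the CDR.

    pd_in     -- global power down, active HIGH
    data_loss -- HIGH when the input data stream is absent
    lock_n    -- active-low LOCK: False (LOW) means the coarse loop has locked
    """
    if pd_in:
        # Assumed to override everything else (global power down).
        return Mode.POWER_DOWN
    if data_loss or lock_n:
        # Not yet locked, or data lost: the coarse loop holds the frequency.
        return Mode.COARSE
    return Mode.FINE


print(active_loop(pd_in=False, data_loss=False, lock_n=True))   # start-up
print(active_loop(pd_in=False, data_loss=False, lock_n=False))  # locked, data present
print(active_loop(pd_in=False, data_loss=True, lock_n=False))   # data lost
```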

The phase frequency detector and the charge pump are the main components of the coarse loop; they dictate the overall frequency acquisition performance as well as the acquisition speed of the CDR. Several different topologies were studied, all of which were derived from the same basic structure. Fully differential circuit architectures were chosen as the final design for both the PFD and the charge pump. The worst-case PFD and charge pump transfer curve extraction simulations show that the coarse loop control circuitry has a dead zone of 500 ps, allowing enough margin for the necessary frequency approximation between the reference clock and the VCO clock. The lock time of the frequency loop at 3.2 Gbps data rate is around 4 $\mu$s.

Another important part of the circuit is the fine loop control circuitry, which contains the high-speed phase detector and charge pump. The differential phase detector structure is similar to the one proposed by Hogge. However, the usual Hogge phase detector has been modified in order to improve the phase alignment performance. The matching of the transistors in the fine loop control circuitry has a considerable effect on the overall system performance. Thus, special attention has been paid to the layout of the circuit. The dead zone of the circuit, which translates into a static phase offset at the output of the CDR, is 12 ps under worst-case conditions at 3.2 Gbps data rate.
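To put the 12 ps static phase offset in perspective, it can be expressed as a fraction of the bit period. The sketch below assumes only that one unit interval equals one bit period; the function name is illustrative.

```python
def phase_offset(offset_ps: float, data_rate_gbps: float):
    """Return a static timing offset as (fraction of a UI, degrees of the bit period)."""
    ui_ps = 1e3 / data_rate_gbps  # one unit interval in picoseconds
    fraction = offset_ps / ui_ps
    return fraction, 360.0 * fraction


fraction_ui, degrees = phase_offset(offset_ps=12.0, data_rate_gbps=3.2)
print(f"{fraction_ui:.4f} UI, {degrees:.2f} degrees")
```

At 3.2 Gbps the bit period is 312.5 ps, so 12 ps corresponds to about 3.8% of a UI, or roughly 14 degrees of the bit period.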

The VCO is common to both the coarse and fine loops. Like the rest of the system, it has a differential control voltage and a differential output. A 4-stage ring oscillator architecture is used in the core of the VCO. The delay interpolation used in the delay stages of the VCO gives the overall CDR system a wide linear tuning range. The VCO also uses a self-biasing technique, which completely avoids the need for external biasing such as bandgap circuits. A digital buffer chain amplifies the VCO output in order to obtain a rail-to-rail clock output and a high drive capability. The supply current of the VCO is 6.21 mA, of which the output amplifier alone consumes 4 mA.
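As a rough orientation, the per-stage delay that a 4-stage ring must provide can be estimated from the textbook first-order relation f = 1 / (2·N·t_d) for an N-stage differential ring oscillator. The sketch below applies this relation under the assumption that the VCO runs at full rate (clock frequency equal to the data rate); with delay interpolation, t_d should be read as the effective delay per stage. The function name is illustrative.

```python
def stage_delay_ps(f_osc_ghz: float, n_stages: int = 4) -> float:
    """Effective per-stage delay (ps) of an N-stage differential ring oscillator,
    using the first-order relation f_osc = 1 / (2 * N * t_d)."""
    return 1e3 / (2 * n_stages * f_osc_ghz)


for f_ghz in (2.4, 3.2):
    print(f"{f_ghz} GHz -> {stage_delay_ps(f_ghz):.2f} ps per stage")
```

Under these assumptions, the delay cells have to be tunable between roughly 39 ps and 52 ps per stage to cover the 2.4 GHz to 3.2 GHz range.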

The passive loop filter is also differential. The capacitors of the loop filter have been implemented by using the gate oxide capacitance of NMOS devices. The resistors of the loop filter have been implemented in the unsalicided N-poly layer.

All analogue as well as digital sub-blocks of the CDR architecture presented in this work operate on differential clock and data signals, which come from the LVDS receiver pads. This significantly increases the complexity of the design while ensuring a more robust performance. Another important feature of this CDR is that all the components, including the loop filter, have been integrated. Despite the complete integration of the loop and supply filter capacitors, the CDR occupies $0.3 \text{ mm}^2$, which is a small silicon area in comparison with the CDRs proposed in the literature and industry. The proposed circuit uses a single power supply and has low power consumption (18.6 mW). It can operate at sampling clock rates up to 3.2 GHz and can handle a wide range of input data rates (2.4 Gbps – 3.2 Gbps).
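The quoted power and area figures follow directly from the specification tables. The short check below recomputes them from the typical supply current and nominal supply voltage of Tables 7.2 and 7.3. The upper bound in the second calculation assumes that the maximum current coincides with the maximum supply voltage, which is a pessimistic simplification made for this sketch only.

```python
# Values taken from Table 7.2 and Table 7.3
V_NOM, V_MAX = 1.2, 1.32         # supply voltage in V
I_TYP_MA, I_MAX_MA = 15.5, 23.0  # supply current in mA


def power_mw(voltage_v: float, current_ma: float) -> float:
    """DC power in mW from supply voltage (V) and supply current (mA)."""
    return voltage_v * current_ma


typical_mw = power_mw(V_NOM, I_TYP_MA)
upper_bound_mw = power_mw(V_MAX, I_MAX_MA)  # assumed pessimistic combination

# Layout dimensions from Section 7.6: 500 um x 600 um
area_mm2 = 0.5 * 0.6

print(f"typical power: {typical_mw:.1f} mW")
print(f"upper bound:   {upper_bound_mw:.2f} mW")
print(f"area:          {area_mm2:.2f} mm^2")
```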

The CDR architecture was realized in a conventional 0.13 $\mu$m digital CMOS technology (foundry: UMC), which ensures a lower overall cost and better portability for the design.

Other researchers have reported PLL-based clock and data recovery circuits with similar features in terms of operating data rate, architecture and jitter performance. To the best of our knowledge, this clock recovery is the first high-speed CDR designed in 0.13 $\mu$m CMOS technology, and it compares favourably with the others in power consumption and area. A detailed comparison with reported CDRs is given in Appendix C.

To summarize, the CDR architecture presented in this thesis is intended as a state-of-the-art clock recovery circuit for very high-speed applications such as optical network transmission or high-bandwidth wire-line backplane communication. The circuit meets the jitter tolerance specification of the OC-48 standard. It can be used either as a standalone single-chip unit, or as an embedded IP block that can be integrated with other modules in various SOC (system-on-chip) applications.

#### 8.1. Future Work

The CDR test-chip has been fabricated with several versions of the CDR: a fully functional CDR, an open-loop CDR, a CDR with an external loop filter, a standalone VCO, and a CDR with SIPO, PISO and FIFO as a SERDES macro. The package of the test-chip is a 52-pin MLF (micro-lead-frame), which has a low pin count but is suitable for high-frequency signalling.

The test-chip also contains a one-million-gate low-frequency digital circuit that reproduces the kind of noise the CDR will face in actual operation. The digital core contains counters and registers clocked by a low-frequency clock. When the digital noise generator is enabled, random noise is injected into the substrate of the whole test-chip. The layout of the test-chip can be seen in Figure 8.1. The main blocks are highlighted in the figure.

The test-chip has been fabricated. However, functional measurements and characterization of the CDR could not be completed.

The future work includes testing the CDR test-chip, verifying the simulation results, and extracting the jitter tolerance performance and the jitter transfer function of the CDR design. An LC VCO may be implemented to improve the jitter performance; however, the first step is to verify the noise models and the measurement techniques. In addition to the open points given above, a data loss detector may be designed and embedded in the CDR in order to turn on the coarse loop and turn off the fine loop when idle data is received for a certain time.



Figure 8.1. The layout of the top-level CDR test-chip

# **APPENDIX A: COMPLETE CIRCUIT SCHEMATICS**







Figure A.2. Schematic of the PFD\_zero circuit



Figure A.3. Schematic of the coarse loop charge pump



Figure A.4. Schematic of the lock detector



Figure A.5. Schematic of the CMFB



Figure A.6. Schematic of divide-by-16 circuit



Figure A.7. Schematic of the loop filter



Figure A.8. Schematic of the differential flip-flop



Figure A.9. Schematic of the phase detector delay component



Figure A.10. Schematic of current mode differential XOR



Figure A.11. Schematic of phase detector





Figure A.12. Schematic of the fine loop charge pump



Figure A.13. Schematic of the VCO top-level



Figure A.14. Schematic of the VCO delay cell



Figure A.15. Schematic of the VCO delay buffer



Figure A.16. Schematic of the VCO self-biasing circuit



Figure A.17. Schematic of the biasing OPAMP



Figure A.18. Schematic of the VCO output amplifier



Figure A.19. Schematic of the power down circuit



Figure A.20. Schematic of the top-level CDR

# **APPENDIX B: COMPLETE MASK LAYOUTS**



Figure B.1. Mask layout of differential flip-flop



Figure B.2. Mask layout of the power down circuit



Figure B.3. Mask layout of the differential XOR



Figure B.4. Mask layout of the divide-by-16 circuit



Figure B.5. Mask layout of the output buffer



Figure B.6. Mask layout of the biasing resistor chain



Figure B.7. Mask layout of the VCO delay buffer



Figure B.8. Mask layout of the VCO self-biasing circuit



Figure B.9. Mask layout of the VCO output amplifier



Figure B.10. Mask layout of the VCO

# **APPENDIX C: LITERATURE SURVEY**

| CDR                                                                                  | Technology  | Supply Volt. | Power   | Data Rate | Architecture       | Die Size                |
|--------------------------------------------------------------------------------------|-------------|--------------|---------|-----------|--------------------|-------------------------|
| Our CDR                                                                              | 0.13um CMOS | 1.2 V        | 18.6 mW | 3.2 Gbps  | two-loop PLL-based | 0.3 mm2                 |
| ISSCC 2002, OC-192 Transmitter in Standard 0.18um CMOS                               | 0.18um CMOS | 1.8 V        | xxx     | 10 Gbps   | two-loop PLL-based | 2.3 mm2                 |
| JSSC,May 2001 A 10Gbps CDR Circuit with a Half-Rate Phase Detector                   | 0.18um CMOS | 2.5 V        | 72 mW   | 10 Gbps   | single PLL-based   | 0.99 mm2                |
| ISSCC 2001, Fully-Integrated SONET OC-48 Transceiver in Standard CMOS                | 0.18um CMOS | 1.8 V        | 90 mW   | 2.5 Gbps  | two-loop PLL-based | 3 mm2                   |
| ISSCC 2001, A 10Gbps CDR Circuit with Frequency Detection                            | 0.18um CMOS | 1.8 V        | 91 mW   | 10 Gbps   | two-loop PLL-based | 2.71 mm2                |
| ISSCC 2001, A CDR with a Half-Rate Phase Detector for 2.5 Gbps Optical Communication | 0.25um CMOS | 2.5 V        | 95 mW   | 2.5 Gbps  | single PLL-based   | 0.7 mm2                 |
| ISSCC 2001, 2.5 Gbps 3x Oversampled Transceiver with Robust CDR                      | 0.25um CMOS | 2.5 V        | 269 mW  | 2.5 Gbps  | DLL-based          | 4.9 mm2                 |
| ISSCC 2001, 2.75Gbps CMOS CDR with Broad Capture Range                               | 0.25um CMOS | 2.7 V        | 50 mW   | 2.75 Gbps | two-loop PLL-based | 0.54 mm2                |
| ISSCC 2002, 5 Gbps Jitter Tolerant Variable Interval Oversampling CDR Circuit        | 0.25um CMOS | 2.5 V        | xxx     | 5 Gbps    | DLL-based          | 0.85 mm2                |
| JSSC, March 2001 A CMOS Clock Recovery Circuit for 2.5Gbps NRZ data                  | 0.4um CMOS  | 3.3 V        | 33.5 mW | 2.5 Gbps  | single PLL-based   | 0.32 mm2 (external LPF) |
| ISSCC 1999, A 1Gbps CMOS CDR Circuit                                                 | 0.5um CMOS  | 5 V          | 300 mW  | 1 Gbps    | two-loop PLL-based | xxx                     |
| JSSC, December 1999, A 0.155, 0.622 and 2.5Gbps Automatic bit-rate selecting CDR IC  | Si-bipolar  | 5 V          | 680 mW  | 2.5 Gbps  | single PLL-based   | xxx                     |

Table C.1. Performance comparison with reported high-speed CDRs
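One way to read Table C.1 is to normalize power to data rate. The sketch below computes mW per Gbps for the rows that report both values; rows marked "xxx" for power are omitted. The short labels are introduced here for readability only. Several entries are complete transceivers rather than standalone CDRs, so the ranking is only indicative.

```python
# (label, power in mW, data rate in Gbps), taken from Table C.1.
# Labels are shortened here for readability; rows without a power figure are omitted.
REPORTED = [
    ("Our CDR", 18.6, 3.2),
    ("JSSC May 2001, 10 Gbps half-rate PD", 72.0, 10.0),
    ("ISSCC 2001, SONET OC-48 transceiver", 90.0, 2.5),
    ("ISSCC 2001, 10 Gbps with frequency detection", 91.0, 10.0),
    ("ISSCC 2001, 2.5 Gbps half-rate PD", 95.0, 2.5),
    ("ISSCC 2001, 2.5 Gbps 3x oversampled", 269.0, 2.5),
    ("ISSCC 2001, 2.75 Gbps broad capture range", 50.0, 2.75),
    ("JSSC March 2001, 2.5 Gbps NRZ", 33.5, 2.5),
    ("ISSCC 1999, 1 Gbps", 300.0, 1.0),
    ("JSSC December 1999, Si-bipolar", 680.0, 2.5),
]


def rank_by_mw_per_gbps(entries):
    """Return (label, mW per Gbps) pairs sorted from most to least efficient."""
    ratios = [(label, power / rate) for label, power, rate in entries]
    return sorted(ratios, key=lambda item: item[1])


ranked = rank_by_mw_per_gbps(REPORTED)
for label, ratio in ranked:
    print(f"{ratio:8.2f} mW/Gbps  {label}")
```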

### REFERENCES

- 1. Rohde U. L., *Digital PLL Frequency Synthesizers*, Prentice-Hall, Englewood Cliffs, N.J., 1983.
- 2. Razavi B., *Monolithic Phase-Locked Loops and Clock Recovery Circuits*, IEEE Press, New York, 1996.
- 3. A. B. Grebene, *The monolithic phase-locked loop - a versatile building block*, IEEE Spectrum, vol. 8, pp. 38-49, March 1971.
- 4. R. E. Best, *Phase Locked Loops*, second edition, McGraw Hill, New York, 1993.
- 5. G. S. Moschytz, *Miniaturized RC filters using phase-locked loop*, Bell Syst. Tech. J., vol. 44, pp. 823-870, May 1965.
- 6. W. F. Egan, *Frequency Synthesis by Phase Lock*, Wiley & Sons, New York, 1981.
- 7. *Jitter in digital communication systems, Part 1*, Application note HFAN-4.0.3, Maxim High-Frequency/Fiber Communications Group.
- 8. *Converting between RMS and Peak-to-Peak Jitter at a specified BER*, Application note HFAN-4.0.2, Maxim High-Frequency/Fiber Communications Group.

- 9. L. M. DeVito, *A Versatile Clock Recovery Architecture and Monolithic Implementation*, in *Monolithic Phase-Locked Loops and Clock Recovery Circuits* by B. Razavi, pp. 408.
- 10. R. Cordell, *A 50 MHz phase- and frequency-locked loop*, IEEE J. Solid-State Circuits, vol. SC-14, no. 6, pp. 1003-1009, December 1979.
- 11. C. H. Doan, *Design and Implementation of a Highly-Integrated Low-Power CMOS Frequency Synthesizer for an Indoor Wireless Wideband-CDMA Direct-Conversion Receiver*, Master Thesis, 1999.
- 12. Kang, S. and Y. Leblebici, *CMOS Digital Integrated Circuits: Analysis and Design*, McGraw-Hill, 1994.
- 13. Çilingiroğlu, U., *Systematic Analysis of Bipolar and MOS Transistors*, Artech House, 1993.
- 14. Leblebici, D., *Microelectronics Technology Lecture Notes*, Istanbul Technical University, 2000.
- 15. Chien G., *Low-Noise Local Oscillator Design Techniques using a DLL-based Frequency Multiplier for Wireless Applications*, PhD thesis, 2000.
- 16. W. F. Egan, *Frequency Synthesis by Phase Lock*, John Wiley & Sons, New York, 1981.
- 17. F. M. Gardner, *Charge Pump phase-locked loops*, IEEE Trans. Comm., vol. COM-28, pp. 1849-1858, November 1980.

- 18. M. G. Johnson and E. L. Hudson, *A variable delay line PLL for CPU-coprocessor Synchronization*, IEEE J. Solid-State Circuits, vol. 23, pp. 1218-1223, October 1988.
- 19. Razavi B., *Principles of Data Conversion System Design*, Piscataway, NJ, IEEE Press, 1995.
- 20. Djahanshahi H. and Salama A. S., *Differential CMOS Circuits for 622MHz/933MHz Clock and Data Recovery Applications*, IEEE Journal of Solid-State Circuits, vol. 35, no. 6, June 2000.
- 21. C. R. Hogge, *A Self Correcting Clock Recovery Circuit*, IEEE Journal of Lightwave Technology, vol. LT-3, pp. 1312-1314, December 1985.
- 22. J. Yuan and C. Svensson, *High-speed CMOS circuit technique*, IEEE J. Solid-State Circuits, vol. 24, pp. 62-70, Feb. 1989.
- 23. B. Razavi, et al., *Design of high-speed, low-power frequency dividers and phase locked-loops in deep submicron CMOS*, IEEE J. Solid-State Circuits, vol. 30, pp. 101-109, Feb. 1995.
- 24. J. Craninckx and M. S. J. Steyaert, *A 1.75-GHz/3-V dual-modulus divide-by-128/129 prescaler in 0.7-μm CMOS*, IEEE J. Solid-State Circuits, vol. 31, no. 7, pp. 890-897, July 1996.
- 25. J. Crols and M. Steyaert, *Switched-opamp: An approach to realize full CMOS switched-capacitor circuits at very low power supply voltages*, IEEE J. Solid-State Circuits, vol. 29, pp. 936-942, Aug. 1994.

- 26. A. M. Abo and P. R. Gray, *A 1.5 V, 10-bit, 14 MS/s CMOS pipeline analogue-to-digital converter*, IEEE J. Solid-State Circuits, vol. 34, pp. 599-606, May 1999.
- 27. Razavi, B., *Design of Analog CMOS Integrated Circuits*, McGraw-Hill Companies, Inc., 2000.
- 28. Hajimiri, A. and Lee, T., *The Design of Low Noise Oscillators*, Kluwer Academic Publishers, 1999.
- 29. Johnson, M. and Hudson, E., *A Variable Delay Line PLL for CPU-Coprocessor Synchronization*, IEEE Journal of Solid-State Circuits, vol. 23, pp. 1218-1223, 1988.
- 30. B. Lai and R. C. Walker, *A Monolithic 622Mbps Clock Extraction and Data Retiming Circuit*, ISSCC Dig. Tech. Papers, pp. 144-145, Feb. 1991.
- 31. Maneatis, J., *Low Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques*, IEEE Journal of Solid-State Circuits, vol. 31, no. 11, pp. 1723-1731, 1996.
- 32. *UMC 0.13um 1.2V/3.3V 1P8M Logic High Speed Process Topological Layout Rule (Ver 1.3\_P1)*, UMC, SPEC No: G-03-LOGIC13-1.2V/3.3V-1P8M-HS-TLR, 01.08.2002.
- 33. *UMC 0.13um LOGIC 1.2V/3.3V 1P8M HS Process Electrical Design Rule (Rev. 1.2\_P1)*, UMC, SPEC No: G-02-LOGIC13-1.2V/3.3V-1P8M-HS-EDR, 05.25.2002.

- 34. *UMC 0.13um 1P8M 1.2V/3.3V Logic Process Interconnect Capacitance Model (Rev. 0.2)*, UMC, SPEC No: G-04-LOGIC13-1P8M-INTERCAP, 01.31.2002.
- 35. Ismail, M. and T. Fiez, *Analog Signal and Information Processing*, International Edition, McGraw-Hill, Inc., 1994.
- 36. *9.953 Gbps Integrated Low Power SONET/SDH Transmitter*, BCM8110 Broadcom, 8110-PB02-R-8.27.01.
- 37. *Multirate 1:16 Demux With Clock And Data Recovery*, BCM8131 Broadcom, 8131-PB00-R-3.22.01.
- 38. Kuntman, H., *Analog MOS Tümdevre Tekniği*, İstanbul Teknik Üniversitesi Elektrik-Elektronik Fakültesi Ofset Baskı Atölyesi, 1. Baskı, 1997.
- 39. Gray, P. and R. Meyer, *Analysis and Design of Analog Integrated Circuits*, John Wiley & Sons Inc., 1984.
- 40. Sanchez-Sinencio, E. and A. Andreou, *Low-Voltage/Low-Power Integrated Circuits and Systems, Low-Voltage Mixed-Signal Circuits*, IEEE Inc., 1999.
- 41. EPFL course notes, *PLLs and Clock Recovery*, Lausanne, Switzerland, June 2002.
- 42. R. Walker and C. Stout, *A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection*, ISSCC Digest of Technical Papers, pp. 246-247, Feb. 1997.

- 43. J. D. H. Alexander, *Clock Recovery from Random Binary Signals*, Electronics Letters, vol. 11, pp. 541-542, October 1975.
- 44. J. G. Maneatis and M. A. Horowitz, *Precise delay generation using coupled oscillators*, IEEE J. Solid-State Circuits, vol. 28, pp. 1273-1282, Dec. 1993.
- 45. Razavi B., personal talk, 2002.

### REFERENCES

- 1. Rohde U. L., *Digital PLL Frequency Synthesizers*, Prentice-Hall, Englewood Cliffs, N.J. 1983.
- 2. Razavi B., *Monolithic Phase-Locked Loops and Clock Recovery Circuits*, IEEE Press, New York, 1996.
- 3. A. B. Grebene, *The monolithic phase-locked loop a versatile building block,* IEEE Spectrum, vol. 8, pp.38-49, March 1971.
- 4. R. E. Best, *Phase Locked Loops*, second edition, McGraw Hill, New York, 1993.
- G. S. Moschytz, *Miniaturized RC filters using phase-locked loop*, Bell Syst. Tech.
  J., vol. 44, pp. 823-870, May 1965.
- 6. W. F. Egan, Frequency Synthesis by Phase Lock, Wiley & Sons, New York, 1981.
- Jitter in digital communication systems, Part 1, Application note HFAN-4.0.3 Maxim High-Frequency/Fiber Communications Group.
- 8. *Converting between RMS and Peak-to-Peak Jitter at a specified BER*, Application note HFAN-4.0.2 Maxim High-Frequency/Fiber Communications Group.

- 9. L. M. DeVito, *A Versatile Clock Recovery Architecture and Monolithic Implementation*, in B. Razavi (ed.), *Monolithic Phase-Locked Loops and Clock Recovery Circuits*, pp. 408.
- 10. R. Cordell, *A 50 MHz phase- and frequency-locked loop*, IEEE J. Solid-State Circuits, vol. SC-14, no. 6, pp. 1003-1009, December 1979.
- 11. C. H. Doan, *Design and Implementation of a Highly-Integrated Low-Power CMOS Frequency Synthesizer for an Indoor Wireless Wideband-CDMA Direct-Conversion Receiver*, Master's thesis, 1999.
- 12. Kang, S. and Y. Leblebici, *CMOS Digital Integrated Circuits: Analysis and Design*, McGraw-Hill, 1994.
- 13. Çilingiroğlu, U., *Systematic Analysis of Bipolar and MOS Transistors*, Artech House, 1993.
- 14. Leblebici, D., *Microelectronics Technology Lecture Notes*, Istanbul Technical University, 2000.
- 15. Chien G., Low-Noise Local Oscillator Design Techniques using a DLL-based Frequency Multiplier for Wireless Applications, PhD thesis, 2000.
- 16. W. F. Egan, *Frequency Synthesis by Phase Lock*, John Wiley & Sons, New York, 1981.
- 17. F. M. Gardner, *Charge Pump phase-locked loops*, IEEE Trans. Comm., vol. COM-28, pp. 1849-1858, November 1980.
- 18. M. G. Johnson and E. L. Hudson, *A variable delay line PLL for CPU-coprocessor synchronization*, IEEE J. Solid-State Circuits, vol. 23, pp. 1218-1223, October 1988.
- 19. Razavi B., *Principles of Data Conversion System Design*, Piscataway, NJ, IEEE Press, 1995.
- 20. Djahanshahi H. and Salama A. S., *Differential CMOS Circuits for 622MHz/933MHz Clock and Data Recovery Applications*, IEEE Journal of Solid-State Circuits, vol. 35, no. 6, June 2000.
- 21. C. R. Hogge, *A Self Correcting Clock Recovery Circuit*, IEEE Journal of Lightwave Technology, vol. LT-3, pp. 1312-1314, December 1985.
- 22. J. Yuan and C. Svensson, *High-speed CMOS circuit technique*, IEEE J. Solid-State Circuits, vol. 24, pp. 62-70, Feb. 1989.
- 23. B. Razavi, et al., *Design of high-speed, low-power frequency dividers and phase-locked loops in deep submicron CMOS*, IEEE J. Solid-State Circuits, vol. 30, pp. 101-109, Feb. 1995.
- 24. J. Craninckx and M. S. J. Steyaert, *A 1.75-GHz/3-V dual-modulus divide-by-128/129 prescaler in 0.7-um CMOS*, IEEE J. Solid-State Circuits, vol. 31, no. 7, pp. 890-897, July 1996.
- 25. J. Crols and M. Steyaert, *Switched-opamp: An approach to realize full CMOS switched-capacitor circuits at very low power supply voltages*, IEEE J. Solid-State Circuits, vol. 29, pp. 936-942, Aug. 1994.

- 26. A. M. Abo and P. R. Gray, *A 1.5 V, 10-bit, 14 MS/s CMOS pipeline analog-to-digital converter*, IEEE J. Solid-State Circuits, vol. 34, pp. 599-606, May 1999.
- 27. Razavi, B., *Design of Analog CMOS Integrated Circuits*, McGraw-Hill Companies, Inc., 2000.
- 28. Hajimiri, A. and Lee, T., *The Design of Low Noise Oscillators*, Kluwer Academic Publishers, 1999.
- 29. Johnson, M. and Hudson, E., *A Variable Delay Line PLL for CPU-Coprocessor Synchronization*, IEEE Journal of Solid-State Circuits, vol. 23, pp. 1218-1223, 1988.
- 30. B. Lai and R. C. Walker, *A Monolithic 622Mbps Clock Extraction and Data Retiming Circuit*, ISSCC Dig. Tech. Papers, pp. 144-145, Feb. 1991.
- 31. Maneatis, J., *Low Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques*, IEEE Journal of Solid-State Circuits, vol. 31, no. 11, pp. 1723-1731, 1996.
- 32. *UMC 0.13um 1.2V/3.3V 1P8M Logic High Speed Process Topological Layout Rule (Ver 1.3\_P1)*, UMC, SPEC No: G-03-LOGIC13-1.2V/3.3V-1P8M-HS-TLR, 01.08.2002.
- 33. *UMC 0.13um LOGIC 1.2V/3.3V 1P8M HS Process Electrical Design Rule (Rev. 1.2\_P1)*, UMC, SPEC No: G-02-LOGIC13-1.2V/3.3V-1P8M-HS-EDR, 05.25.2002.

- 34. *UMC 0.13um 1P8M 1.2V/3.3V Logic Process Interconnect Capacitance Model (Rev. 0.2)*, UMC, SPEC No: G-04-LOGIC13-1P8M-INTERCAP, 01.31.2002.
- 35. Ismail, M. and T. Fiez, *Analog Signal and Information Processing*, International Edition, McGraw-Hill, Inc., 1994.
- 36. *9.953 Gbps Integrated Low Power SONET/SDH Transmitter*, BCM8110, Broadcom, 8110-PB02-R-8.27.01.
- 37. *Multirate 1:16 Demux With Clock And Data Recovery*, BCM8131, Broadcom, 8131-PB00-R-3.22.01.
- 38. Kuntman, H., *Analog MOS Tümdevre Tekniği*, İstanbul Teknik Üniversitesi Elektrik-Elektronik Fakültesi Ofset Baskı Atölyesi, 1. Baskı, 1997.
- 39. Gray, P. and R. Meyer, *Analysis and Design of Analog Integrated Circuits*, John Wiley & Sons Inc., 1984.
- 40. Sanchez-Sinencio, E. and A. Andreou, *Low-Voltage/Low-Power Integrated Circuits and Systems: Low-Voltage Mixed-Signal Circuits*, IEEE Inc., 1999.
- 41. EPFL course notes, *PLLs and Clock Recovery*, Lausanne, Switzerland, June 2002.
- 42. R. Walker and C. Stout, *A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection*, ISSCC Digest of Technical Papers, pp. 246-247, Feb. 1997.

- 43. J. D. H. Alexander, *Clock Recovery from Random Binary Signals*, Electronics Letters, vol. 11, pp. 541-542, October 1975.
- 44. J. G. Maneatis and M. A. Horowitz, *Precise delay generation using coupled oscillators*, IEEE J. Solid-State Circuits, vol. 28, pp. 1273-1282, Dec. 1993.
- 45. Razavi B., personal talk, 2002.