Self-timed rings as low-phase noise programmable oscillators
Laurent Fesquet, Abdelkarim Cherkaoui, Oussama Elissati

Laurent Fesquet¹, Abdelkarim Cherkaoui², Oussama Elissati³

¹Univ. Grenoble Alpes & CNRS, TIMA, Grenoble, France. Email: laurent.fesquet@imag.fr
²Hubert Curien Laboratory. Saint-Etienne, France. Email: abdelkarim.cherkaoui@univ-st-etienne.fr
³STRS Laboratory, INPT, Rabat, Morocco. Email: elissati@inpt.ac.ma

Abstract - Self-timed rings are promising for designing high-speed serial links and system clock generators. Their architecture is well suited to digitally controlling their frequency and to adapting their phase noise by design. Unlike conventional inverter ring oscillators, whose oscillation frequency depends only on the number of stages, the frequency of a self-timed ring also depends on its initial state. This feature is essential to make them programmable. Moreover, the phase noise of such ring oscillators is easy to control by design: thanks to their programmability, doubling the number of stages while keeping the same oscillation frequency yields a 3 dB phase noise reduction, at the cost of higher power consumption.

In this paper, we fully describe the method for designing self-timed rings that are programmable and whose phase noise meets given specifications. Test chips have been designed and fabricated in AMS 0.35 µm and STMicroelectronics 65 nm CMOS technologies to verify our models and theoretical claims.

I. INTRODUCTION

Oscillators are basic building blocks in many applications: they are part of PLLs, clock recovery systems and frequency synthesizers. Timing jitter and phase noise are important design considerations in almost every type of communication system. Clock generation has been extensively studied. High-frequency oscillators can be implemented using ring structures, relaxation circuits or LC circuits.

Multiphase oscillators are often required in frequency synthesizers. The design of low-phase-noise, low-jitter oscillators is crucial, especially when a large number of phases is required. Multiphase clock generation has two important requirements: a precise oscillation frequency setting and a fine temporal resolution. High frequencies with high phase resolution are often required in multiphase clocks. Ring architectures composed of chained elements are inherently multiphase and can easily provide multiple clocks within a small die area. Inverter Ring Oscillators (IROs) are often used to generate multiphase clocks due to their simple structure, low cost and good integration in design flows.

We face two main problems with IROs when implementing multiphase clocks. First, their timing resolution is limited by the propagation delay of one ring stage. Second, their frequency drops as the number of stages increases. Since the number of available phases equals the number of stages, the only way to obtain more output phases is to add more stages, which decreases the maximum frequency without improving the time resolution. Consequently, inverter ring oscillators are not suited to applications requiring high-resolution or high-speed multiphase clocks. Many architectural techniques have been proposed to increase the maximum frequency of ring oscillators with multiphase outputs, including sub-feedback loops, multiple-feedback loops [1], skewed delay schemes [2] and output-interpolation methods [3]. However, these techniques require careful calibration to achieve high precision, their resolution is limited, and the added hardware may increase the phase noise.

Today, many studies focus on Self-Timed Ring (STR) oscillators, which are considered a promising solution for clock generation. They have characteristics well suited to managing process variability [4] and offer an appropriate structure to limit the phase noise [5], [6]. Moreover, S. Fairbanks introduced in [7] the idea of using STRs to generate high-resolution timing signals. STRs can also easily be configured to change their frequency simply by controlling their initialization at reset time. A fully programmable/stoppable oscillator based on self-timed rings is presented in [8].

This paper describes a method to design and define STR configurations in order to build programmable multiphase oscillators that meet phase noise and timing resolution specifications. This design method provides two main features: a significant phase noise reduction at the cost of higher power consumption, and a timing resolution that can be set as fine as needed.

The paper is structured as follows. Section II provides the background, definitions and principles of STRs. Section III describes how the temporal behavior of STRs favors low timing jitter, and shows how a 3 dB phase noise reduction can be obtained simply by doubling the number of STR stages while keeping the same frequency and the same resolution. Section IV provides jitter and phase noise measurement results for two fabricated test chips and an FPGA device in order to verify the theoretical assumptions. Finally, Section V concludes the paper.

II. SELF-TIMED RING OSCILLATORS

A. Architecture and behavior

The architecture of an STR is depicted in Fig. 1. It corresponds to the control circuit of an asynchronous micropipeline, as proposed by Sutherland in [9], closed to form a ring of L stages. Each stage is composed of a Muller gate and an inverter. \( D_{ff} \) and \( D_{rr} \) are the forward and reverse static propagation delays of a ring stage, associated with inputs \( F \) and \( R \), respectively.

The micropipeline stages communicate using a two-phase handshake protocol, as described in [9]. Each request and acknowledgment corresponds to the transfer of an event (an electrical transition) between neighboring stages. This is one of the most important features of STRs: their architecture allows several events to propagate simultaneously without colliding, which provides built-in frequency and phase control (by setting the number of propagating events) through a simple reset of the ring.
In fact, the number of propagating events in an STR is set at initialization. The token and bubble concept, derived from the two-phase communication protocol, is generally used to represent the internal state of an STR: stage i contains a token if its output \( C_i \) is not equal to the output \( C_{i+1} \) of stage i+1. Conversely, stage i contains a bubble if its output \( C_i \) is equal to the output \( C_{i+1} \) of stage i+1. With a two-phase protocol, a stage containing a token is processing an event, while a stage containing a bubble is free and ready to process new data. The numbers of tokens and bubbles are denoted \( N_T \) and \( N_B \), respectively. They are set during ring initialization and remain constant, with \( N_T + N_B = L \).
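The token/bubble rule can be applied mechanically to a vector of stage outputs. The sketch below uses hypothetical helper names: it counts tokens and bubbles and builds one possible reset state holding a chosen number of tokens. It also shows why \( N_T \) is always even: going once around the ring must return to the starting output value, so the number of positions where \( C_i \neq C_{i+1} \) is even.

```python
def count_tokens_bubbles(outputs):
    """Return (N_T, N_B) for a ring whose stage outputs are C_0..C_{L-1}.

    Stage i holds a token if C_i != C_{i+1} (indices taken modulo L),
    and a bubble otherwise.
    """
    L = len(outputs)
    n_tokens = sum(outputs[i] != outputs[(i + 1) % L] for i in range(L))
    return n_tokens, L - n_tokens


def initial_state(L, n_tokens):
    """Hypothetical helper: one reset vector with tokens in stages 0..n_tokens-1.

    The token count must be even, since the output value has to flip an
    even number of times over one trip around the ring.
    """
    if n_tokens % 2 or not 0 < n_tokens < L:
        raise ValueError("n_tokens must be even and 0 < n_tokens < L")
    outputs = [0]
    for i in range(1, L):
        # Stage i-1 holds a token exactly when C_i differs from C_{i-1}.
        outputs.append(outputs[-1] ^ 1 if i - 1 < n_tokens else outputs[-1])
    return outputs


state = initial_state(6, 4)
print(state)                        # [0, 1, 0, 1, 0, 0]
print(count_tokens_bubbles(state))  # (4, 2)
```

The same two functions can be used to check any reset vector before it is hard-wired into the ring.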

In practice, the ring is initialized with \( N_T \) events that start propagating during a transient state. Regardless of their initial positions in the structure, the events reach a steady state in which they arrange themselves in one of two ways: either they form a cluster that propagates around the ring (burst oscillation mode), or they spread out around the ring and propagate with constant spacing (evenly-spaced oscillation mode). Both oscillation modes are stable; which one is reached depends on the static parameters of the ring (mainly the ratio of the static delays with respect to the number of initialized tokens).

The evenly-spaced oscillation mode is obtained for a central range of token counts (depending on the ring parameters), while the burst mode is obtained for extreme values (too many tokens or too many bubbles). In the burst oscillation mode, events are not uniformly spaced in time. In our application, we only target the evenly-spaced mode.

B. The evenly-spaced mode locking mechanism

The propagation delay of a Muller gate is a function of the separation time between its two inputs: the smaller this separation time, the longer the propagation delay. This phenomenon, called the Charlie effect, is mainly responsible for the evenly-spaced propagation of events inside an STR. So-called Charlie curves are often used to predict the temporal behavior of such gates [12]. An example of a symmetric Charlie curve is plotted in Fig. 2. It represents the propagation delay of a Muller gate as a function of the separation time between its inputs. Note that charlie(s) represents the propagation delay of the gate including the synchronization time of its inputs. The effective delay of the gate (i.e., measured from the last input event), denoted \( D_{eff} \) in Fig. 2, increases when the separation time decreases. The temporal behavior is also clearly non-linear around \( s=0 \). We show in Section III that this feature is a determining factor in the low-jitter characteristics of STRs.
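To make the shape of Fig. 2 concrete, the sketch below assumes a hyperbolic form often used to approximate symmetric Charlie curves, charlie(s) = D0 + sqrt(D_charlie² + s²), where s is taken as half the separation between the two input events and the output time is measured from their mean. The parameter names and values are illustrative assumptions, not values extracted from Fig. 2 or from the TAL library. Since the last input arrives |s| after the mean, the effective delay is charlie(s) − |s|.

```python
import math

# Hypothetical parameters (ps) of an assumed symmetric hyperbolic Charlie curve.
D0 = 100.0        # delay for widely separated inputs
D_CHARLIE = 50.0  # extra delay when both inputs arrive together


def charlie(s):
    """Delay from the mean of the two input times to the output (half-separation s)."""
    return D0 + math.sqrt(D_CHARLIE ** 2 + s ** 2)


def effective_delay(s):
    """Delay seen from the last input event, which arrives |s| after the mean."""
    return charlie(s) - abs(s)


for s in (0.0, 25.0, 50.0, 100.0, 400.0):
    print(f"s = {s:6.1f} ps  charlie = {charlie(s):6.1f} ps  D_eff = {effective_delay(s):6.1f} ps")
```

In this model \( D_{eff} \) decreases monotonically from D0 + D_CHARLIE at s = 0 toward D0 for large separations, matching the qualitative behavior described for Fig. 2.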

In the STR context, the Charlie effect causes two closely propagating events to push away from each other: a ring stage driven by request and acknowledge signals with a short separation time experiences an increased delay. When many events are constrained in a short ring, this effect acts as a feedback mechanism (depending on the ring occupancy): each event pushes away from its neighbors until the events spread out evenly across the ring, which produces the evenly-spaced propagation mode of an STR.

C. Frequency

The frequency of an STR in the evenly-spaced regime is a function of its occupancy. It increases with the number of events (interpreted as tokens), then starts dropping when the number of free stages becomes too low compared with the number of events to process, with regard to the \( D_{ff}/D_{rr} \) ratio. In this case, the apparent number of propagating events corresponds to the number of bubbles, and the events propagate along the paths of the acknowledge signals. The maximum frequency is achieved when equation (1) is satisfied [4]:

\[
\frac{D_{ff}}{D_{rr}} = \frac{N_T}{N_B} \quad (1)
\]
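A first-order view of this occupancy dependence can be sketched by ignoring the Charlie effect (a simplification of ours, not necessarily the model used in [4] or [10]). Tokens advance one stage per \( D_{ff} \), bubbles move back one stage per \( D_{rr} \), and each stage switches once for every token (or bubble) that passes it, giving \( T = 2L \cdot \max(D_{ff}/N_T,\ D_{rr}/N_B) \). The period is token-limited with few tokens and bubble-limited with few bubbles; it is minimal when \( N_T/N_B = D_{ff}/D_{rr} \), where it reduces to \( 2(D_{ff} + D_{rr}) \). The function name and delay values below are hypothetical.

```python
def str_period(L, n_tokens, d_ff, d_rr):
    """First-order STR period, ignoring the Charlie effect (simplifying assumption).

    Tokens advance one stage per d_ff and bubbles move back one stage per d_rr;
    the ring runs at the pace of whichever population is the bottleneck.
    """
    if not 0 < n_tokens < L:
        raise ValueError("need at least one token and one bubble")
    n_bubbles = L - n_tokens
    return 2 * L * max(d_ff / n_tokens, d_rr / n_bubbles)


# Hypothetical static delays: D_ff = 100 ps, D_rr = 50 ps (optimum N_T/N_B = 2).
L = 24
for n_tokens in range(2, L, 2):  # token counts are even
    T = str_period(L, n_tokens, 100.0, 50.0)
    print(f"N_T={n_tokens:2d}  N_B={L - n_tokens:2d}  T={T:6.1f} ps  F={1e3 / T:5.2f} GHz")

best = min(range(2, L, 2), key=lambda n: str_period(L, n, 100.0, 50.0))
print("fastest configuration:", best, "tokens")
```

In this model, doubling L and \( N_T \) together (e.g. 48 stages with 32 tokens) leaves T unchanged, which is the property exploited in Section III to trade power for phase noise at constant frequency.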

In practice, the oscillation frequency can be approximated using the following formula [10]:

\[
F_{osc-STR} = \frac{1}{2.(R+1)}
\]

Unlike other oscillators composed of chained elements, STRs allow timing resolutions that are fractions of the propagation delay of one ring stage, because they manage the simultaneous propagation of several events in the same structure. In classical oscillators, each stage switches after the previous one, providing signals with a minimal resolution equal to the propagation delay per stage. In STRs, by contrast, the switching instants of different stages do not depend directly on the propagation delay per stage, but rather on the number of initialized events with regard to the \( D_{ff}/D_{rr} \) ratio.

D. Phase distribution

As shown in [8], if the number of stages of an STR is a multiple of the number of events, some stages may exhibit the same absolute phase. However, if the numbers of tokens and bubbles are co-prime, the STR exhibits as many different equidistant phases as there are stages. In this case, with T the oscillation period, the minimal temporal resolution obtained in the ring is:

$$\Delta \phi = \frac{T}{2L} \quad (3)$$

In STRs, the oscillation period does not depend directly on the number of ring stages, but rather on the ring occupancy. This means that it is possible to increase L while keeping the same T in Equation (3). Therefore the phase resolution of a STR can theoretically be set as fine as needed.
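The co-primality condition can be illustrated with an idealized evenly-spaced model (our assumption): every stage switches every T/2, so \( N_T \) evenly spaced events take \( N_T \cdot T/2 \) to travel around the L stages, i.e. \( N_T \cdot T/(2L) \) per stage. Stage j therefore switches \( j \cdot N_T \) units of T/(2L) after stage 0, modulo T/2. The distinct instants are the multiples of \( \gcd(N_T, L) = \gcd(N_T, N_B) \), so the resolution reaches T/(2L) exactly when tokens and bubbles are co-prime. Function names are hypothetical.

```python
from math import gcd


def switching_offsets(L, n_tokens):
    """Distinct switching instants within a half period, in units of T/(2L).

    Idealized evenly-spaced model (assumed): stage j switches
    j * n_tokens * T/(2L) after stage 0, and the pattern repeats every
    T/2, i.e. every L units.
    """
    return sorted({(j * n_tokens) % L for j in range(L)})


def phase_resolution(L, n_tokens, period):
    """Smallest spacing between switching instants anywhere in the ring."""
    return (period / 2) / len(switching_offsets(L, n_tokens))


for L, n_tokens in ((6, 4), (12, 8), (13, 8)):
    n_bubbles = L - n_tokens
    print(f"{n_tokens}T/{n_bubbles}B: co-prime={gcd(n_tokens, n_bubbles) == 1}, "
          f"{len(switching_offsets(L, n_tokens))} distinct instants, "
          f"resolution = T/{round(1 / phase_resolution(L, n_tokens, 1.0))}")
```

Under this model, 4T/2B and 8T/4B give no better resolution than 2T/1B, whereas a co-prime 8T/5B ring reaches T/26.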

III. PHASE NOISE AND JITTER ANALYSIS IN STRS

A. Jitter

Electronic noise manifests itself in a digital signal as short-term variations of its significant instants; this phenomenon is called jitter. In typical digital oscillators, jitter depends on three factors: 1) the level of noise in each macro-cell of the oscillator (e.g. an inverter in an IRO), 2) how this intrinsic noise affects other macro-cells, and 3) the global noise in the circuit (such as power supply noise). The jitter magnitude is typically quantified with statistical measures such as the standard deviation of timing quantities (propagation delay, oscillation period, etc.).

In digital oscillators composed of chained elements, such as IROs, there is a linear dependency between the input time of a stage and its output time. In other words, due to the chained structure, timing variations (caused by noise in each ring stage and by global/environmental fluctuations) add up linearly. In the case of white noise, the variance of the oscillation period equals the sum of the variances of the propagation delays that make up the period (i.e., of the noise level of each macro-cell) [10].
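This additive behavior can be checked with a small Monte Carlo sketch. It assumes independent Gaussian stage delays and treats one IRO period as 2L consecutive stage delays, since every stage of an L-stage IRO switches twice per period; the function name and numeric values are illustrative.

```python
import random
import statistics


def iro_periods(n_stages, mean_delay, sigma, n_periods, seed=1):
    """Monte Carlo periods of an IRO with white, independent stage-delay noise.

    Each period is modeled as 2 * n_stages independent Gaussian propagation
    delays, because every stage switches twice per oscillation period.
    """
    rng = random.Random(seed)
    return [
        sum(rng.gauss(mean_delay, sigma) for _ in range(2 * n_stages))
        for _ in range(n_periods)
    ]


# Hypothetical values: 7 stages, 100 ps mean delay, 1 ps jitter per switching.
periods = iro_periods(n_stages=7, mean_delay=100.0, sigma=1.0, n_periods=20000)
print(f"simulated period variance: {statistics.pvariance(periods):.2f} ps^2")
print(f"sum of delay variances:    {2 * 7 * 1.0 ** 2:.2f} ps^2")
```

The two printed values agree up to sampling error, illustrating that, in a chained structure, every stage contributes its full variance to the period.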

In terms of frequency stability, STRs take advantage of the non-linear timing behavior of their basic stages: as shown in Fig. 2, in the region of the curve where the Charlie effect is strong (where the curve can be almost flat), timing variations of the inputs are strongly attenuated at the output of the gate. This means that, under certain conditions on the input separation times, the intrinsic noise of each stage barely affects the timing of other stages. This enhances the overall timing stability of the clock signal provided by an STR.
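The attenuation can be illustrated with an assumed hyperbolic Charlie model (hypothetical parameters, not the TAL model). Differentiating the output time with respect to each input shows that near s = 0 each input contributes only half of its timing deviation, so independent input jitter σ yields an output jitter of σ/√2. For widely separated inputs, the last input passes almost all of its deviation to the output, as in an inverter chain.

```python
import math

D0, D_CHARLIE = 100.0, 50.0  # hypothetical Charlie-model parameters (ps)


def output_time(t_a, t_b):
    """Muller gate output time under an assumed symmetric hyperbolic Charlie model."""
    s = (t_a - t_b) / 2  # half separation between the two input events
    return (t_a + t_b) / 2 + D0 + math.sqrt(D_CHARLIE ** 2 + s ** 2)


def input_sensitivities(s, h=1e-3):
    """Central-difference derivatives of the output time w.r.t. each input."""
    t_a, t_b = s, -s
    d_a = (output_time(t_a + h, t_b) - output_time(t_a - h, t_b)) / (2 * h)
    d_b = (output_time(t_a, t_b + h) - output_time(t_a, t_b - h)) / (2 * h)
    return d_a, d_b


def output_jitter(s, sigma_in):
    """Output std. dev. for small, independent input jitter sigma_in (linearized)."""
    d_a, d_b = input_sensitivities(s)
    return sigma_in * math.sqrt(d_a ** 2 + d_b ** 2)


for s in (0.0, 50.0, 500.0):
    d_a, d_b = input_sensitivities(s)
    print(f"s={s:5.1f} ps  dt_out/dt_a={d_a:.3f}  dt_out/dt_b={d_b:.3f}  "
          f"sigma_out/sigma_in={output_jitter(s, 1.0):.3f}")
```

In a ring, each stage input is itself the output of such an averaging gate, so the attenuation compounds from stage to stage, which is consistent with the progressive decay reported for the STR in Table I.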

We verify this assumption using a state-of-the-art temporal model of STR stages. The presented simulations are based on the TAL VHDL libraries (used for precise temporal modeling of Muller gates). Jitter is modeled with a software random number generator that applies random variations to the propagation delay of each ring stage, with a customizable standard deviation representing the local noise level of each macro-cell.

In the following experiment, we evaluate how the local noise of one stage affects the other stages in both IRO and STR structures. The structures used are a 32-stage STR with 16 tokens and \( D_{ff} = D_{rr} \) (so that the separation times between the stage inputs maximize the Charlie effect), and a 7-stage IRO. In both structures, only one cell (stage 4) generates timing variations. Table I reports the resulting timing standard deviation observed in stages 1 to 7: timing variations generated in a particular stage are still measurable in all stages of the IRO, while they are progressively attenuated in the neighboring stages of the STR. To take full advantage of this feature, the number of tokens must be selected according to equation (1) in order to maximize the Charlie effect. Clocks built this way are expected to have lower period jitter than clocks built from IROs with the same frequency, which Section IV aims to verify.

TABLE I. : STANDARD DEVIATION OF THE TIMING VARIATIONS IN STAGES 1 TO 7 WHEN ONLY STAGE 4 GENERATES NOISE

<table>
<thead>
<tr>
<th>Stage number</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
</tr>
</thead>
<tbody>
<tr>
<td>STR</td>
<td>0.47</td>
<td>0.59</td>
<td>1.07</td>
<td>2.26</td>
<td>1.07</td>
<td>0.59</td>
<td>0.47</td>
</tr>
<tr>
<td>IRO</td>
<td>2.47</td>
<td>2.47</td>
<td>2.47</td>
<td>2.47</td>
<td>2.47</td>
<td>2.47</td>
<td>2.47</td>
</tr>
</tbody>
</table>

B. Phase noise

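The phase noise can be estimated with Leeson's semi-empirical formula [13]:

$$L(\Delta f) = 10 \times \log_{10} \left( \frac{1}{2} \left[ 1 + \left( \frac{f_0}{2 Q \Delta f} \right)^2 \right] \left[ 1 + \frac{f_c}{\Delta f} \right] \left( \frac{F k T}{P_c} \right) \right) \quad (4)$$

where \( \Delta f \) is the offset from the oscillation frequency \( f_0 \), \( Q \) the loaded quality factor, \( f_c \) the flicker-noise corner frequency, \( F \) the noise factor, \( k \) Boltzmann's constant, \( T \) the absolute temperature and \( P_c \) the signal power.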

Our experiments show that doubling the number of stages improves the phase noise by 3 dB. According to Leeson's formula (4), there are two ways to improve the phase noise: increasing the loaded quality factor \(Q\) or increasing the signal power \(P_c\). The phase noise is inversely proportional to the oscillator power consumption; in other words, doubling the power consumption reduces the phase noise by 3 dB. Oscillators with the same \(N_T/N_B\) ratio have the same output waveform, and thus the same quality factor [13]. Moreover, adding stages while respecting equation (1) and choosing co-prime numbers of tokens and bubbles improves the resolution of the ring and reduces the phase noise at the same time. With these rules, the maximum frequency is kept, the resolution is enhanced and the phase noise is reduced, at the price of increased power consumption.
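The 3 dB figure follows directly from the 1/P_c factor of Leeson's formula. The sketch below implements the usual form of the formula (with the 1/2 factor, the loaded-Q term and a flicker-corner term); the operating point is hypothetical and only the power changes between the two calls, so the difference is 10·log10(2) ≈ 3.01 dB whatever the other parameters are.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def leeson_dbc_per_hz(offset, f0, q, f_corner, noise_factor, temp_k, power_w):
    """Single-sideband phase noise at `offset` (Hz), in dBc/Hz, from Leeson's formula."""
    resonator = 1 + (f0 / (2 * q * offset)) ** 2  # gives the 1/f^2 slope below f0/(2Q)
    flicker = 1 + f_corner / offset               # adds the 1/f^3 slope below f_corner
    thermal = noise_factor * BOLTZMANN * temp_k / power_w
    return 10 * math.log10(0.5 * resonator * flicker * thermal)


# Hypothetical operating point; only the signal power differs between the calls.
params = dict(offset=1e6, f0=3.95e9, q=1.0, f_corner=1e5, noise_factor=2.0, temp_k=300.0)
pn_p = leeson_dbc_per_hz(power_w=0.454e-3, **params)
pn_2p = leeson_dbc_per_hz(power_w=0.908e-3, **params)
print(f"improvement from doubling P_c: {pn_p - pn_2p:.2f} dB")
```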

To confirm the above analysis and evaluate the proposed methodology, we conducted extensive simulations of STR oscillators in STMicroelectronics 65 nm CMOS technology. We used the TAL (TIMA Asynchronous Library) and the STMicroelectronics 65 nm standard libraries for the physical implementation.

Table II presents the performances of STR oscillators with the same \(N_T/N_B\) ratio. These oscillators all oscillate at the same frequency. The phase noise is reduced by 3 dB each time the number of stages is doubled, which confirms our analysis. Of course, an asymptotically zero phase noise cannot be reached with millions of stages, because of the noise floor imposed by high-frequency thermal noise.

<table>
<thead>
<tr>
<th>L</th>
<th>T/B</th>
<th>Freq. (GHz)</th>
<th>Consum. (mW)</th>
<th>PN at 1MHz (dBc/Hz)</th>
<th>PN at 10MHz (dBc/Hz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>3</td>
<td>2T/1B</td>
<td>3.95</td>
<td>0.454</td>
<td>-82.97</td>
<td>-109.07</td>
</tr>
<tr>
<td>6</td>
<td>4T/2B</td>
<td>3.95</td>
<td>0.908</td>
<td>-85.98</td>
<td>-112.08</td>
</tr>
<tr>
<td>12</td>
<td>8T/4B</td>
<td>3.95</td>
<td>1.817</td>
<td>-88.99</td>
<td>-115.09</td>
</tr>
<tr>
<td>24</td>
<td>16T/8B</td>
<td>3.95</td>
<td>3.635</td>
<td>-92</td>
<td>-118.1</td>
</tr>
</tbody>
</table>
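Table II can be checked against the 1/P_c dependence with a few lines: for each doubling of L, the expected improvement is 10·log10 of the power ratio, and the simulated phase noise steps match it to within about 0.01 dB. The values below are copied from Table II.

```python
import math

# (L, power in mW, PN at 1 MHz, PN at 10 MHz) copied from Table II
table_ii = [
    (3, 0.454, -82.97, -109.07),
    (6, 0.908, -85.98, -112.08),
    (12, 1.817, -88.99, -115.09),
    (24, 3.635, -92.00, -118.10),
]

for (l0, p0, a0, b0), (l1, p1, a1, b1) in zip(table_ii, table_ii[1:]):
    expected = 10 * math.log10(p1 / p0)  # Leeson: phase noise scales as 1/P_c
    print(f"L {l0:2d}->{l1:2d}: power x{p1 / p0:.3f}, expected {expected:.2f} dB, "
          f"1 MHz {a0 - a1:.2f} dB, 10 MHz {b0 - b1:.2f} dB")
```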

Table III presents the performances of three multiphase oscillators designed by respecting the rules given above. This table shows that we can increase the resolution and reduce the phase noise by adding stages while keeping a high frequency. Compared with the 9-stage STR, the 41-stage STR improves the resolution by a factor of 4.5 and reduces the phase noise by 7.8 dB, with only a small change in the oscillation frequency.

IV. EXPERIMENTAL RESULTS

To validate our study, two test chips have been designed and fabricated: one in STMicroelectronics 65 nm CMOS technology (for the phase noise measurements) and one in AMS 0.35 µm CMOS technology (for the jitter measurements). Jitter measurements have also been performed on an Altera Cyclone III FPGA.

Figure 3. : Microphotograph and experimental setup

The 65 nm test chip contains several STR configurations, denoted OSC_xT_yB, where x is the number of tokens and y the number of bubbles. Their frequency, current consumption and phase noise are presented in Table IV. Note the phase noise improvement obtained by doubling the number of stages in the ring while keeping the same oscillation frequency. At a 1 MHz offset, the phase noise of OSC_8T_4B is improved by 2.84 dB compared to OSC_4T_2B, and by 7.19 dB compared to OSC_2T_1B. Section III predicts an improvement of 3 dB each time the number of stages is doubled. Here, we obtained an improvement of 4.35 dB from three to six stages and of 2.84 dB from six to twelve stages. These deviations may be due to the precision of the measuring devices or to the output buffers of the oscillators.

TABLE IV. : PERFORMANCES OF OSC_2T_1B, OSC_4T_2B AND OSC_8T_4B OSCILLATORS

<table>
<thead>
<tr>
<th>Oscillator</th>
<th>Freq. (GHz)</th>
<th>Consu. (mA)</th>
<th>PN at 1MHz (dBc/Hz)</th>
<th>PN at 10MHz (dBc/Hz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>OSC_2T_1B</td>
<td>2.00</td>
<td>160</td>
<td>-83.32</td>
<td>-108.95</td>
</tr>
<tr>
<td>OSC_4T_2B</td>
<td>2.00</td>
<td>360</td>
<td>-87.67</td>
<td>-111.53</td>
</tr>
<tr>
<td>OSC_8T_4B</td>
<td>2.00</td>
<td>600</td>
<td>-90.51</td>
<td>-114.73</td>
</tr>
</tbody>
</table>
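The improvements quoted above follow from the 1 MHz column of Table IV; the helper name below is illustrative.

```python
# Phase noise at 1 MHz offset (dBc/Hz), copied from Table IV.
PN_1MHZ = {"OSC_2T_1B": -83.32, "OSC_4T_2B": -87.67, "OSC_8T_4B": -90.51}


def improvement_db(reference, improved):
    """Phase-noise improvement (dB) of `improved` over `reference`."""
    return round(PN_1MHZ[reference] - PN_1MHZ[improved], 2)


print(improvement_db("OSC_2T_1B", "OSC_4T_2B"))  # 4.35
print(improvement_db("OSC_4T_2B", "OSC_8T_4B"))  # 2.84
print(improvement_db("OSC_2T_1B", "OSC_8T_4B"))  # 7.19
```

Over the two doublings, the measured average is about 3.6 dB per doubling, close to the 3 dB predicted in Section III.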

To extend these results, we performed jitter measurements on a test chip (CMOS 350 nm) and on an Altera Cyclone III FPGA. The tested STR configurations are denoted CS_xT_yB for the ASIC test chip and FS_xT_yB for the FPGA test device; they have similar frequencies but different numbers of stages. FI_4 is an IRO configuration composed of one inverter and three delay elements in the Altera Cyclone III, presented for comparison. Oscillation periods and jitter values (standard deviation of the oscillation period) are shown in Table V.

The first three configurations (in CMOS 350 nm) show that the number of stages does not seem to increase the jitter in STRs. Moreover, increasing the number of stages seems to reduce the ratio of the standard deviation to the mean oscillation period. In the FPGA, all the tested STR configurations have lower jitter values than the IRO at similar frequencies. Note that FS_4T_4B has a much higher frequency because it is implemented using local interconnections, which were not feasible for larger rings in the FPGA. As with the ASIC test chip, increasing the number of stages in STRs reduces the jitter-to-period ratio.

TABLE V. : JITTER MEASUREMENTS IN CMOS 350NM AND ALTERA CYCLONE III FPGA

<table>
<thead>
<tr>
<th>Oscillator</th>
<th>Mean Period (ns)</th>
<th>Standard deviation (ps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CS_4T_4B</td>
<td>1.72</td>
<td>4.2</td>
</tr>
<tr>
<td>CS_8T_8B</td>
<td>1.85</td>
<td>4.3</td>
</tr>
<tr>
<td>CS_16T_16B</td>
<td>2.13</td>
<td>3.8</td>
</tr>
<tr>
<td>FS_4T_4B</td>
<td>1.58</td>
<td>3.1</td>
</tr>
<tr>
<td>FS_32T_32B</td>
<td>2.19</td>
<td>2.6</td>
</tr>
<tr>
<td>FS_64T_64B</td>
<td>2.43</td>
<td>2.8</td>
</tr>
<tr>
<td>FI_4</td>
<td>2.03</td>
<td>5.6</td>
</tr>
</tbody>
</table>
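The jitter-to-period ratios discussed above can be computed directly from Table V; the sketch only converts the mean period from ns to ps and divides. The helper name is illustrative.

```python
# (mean period in ns, period standard deviation in ps), copied from Table V
TABLE_V = {
    "CS_4T_4B": (1.72, 4.2),
    "CS_8T_8B": (1.85, 4.3),
    "CS_16T_16B": (2.13, 3.8),
    "FS_4T_4B": (1.58, 3.1),
    "FS_32T_32B": (2.19, 2.6),
    "FS_64T_64B": (2.43, 2.8),
    "FI_4": (2.03, 5.6),
}


def jitter_ratio(name):
    """Period jitter divided by the mean period (dimensionless)."""
    period_ns, sigma_ps = TABLE_V[name]
    return sigma_ps / (period_ns * 1000.0)  # convert the period from ns to ps


for name in TABLE_V:
    print(f"{name:<11} sigma/T = {100 * jitter_ratio(name):.3f} %")
```

For both the ASIC and the FPGA rings, the ratio decreases as stages are added, and the FPGA IRO (FI_4) has the highest ratio of all FPGA configurations.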

V. CONCLUSION

This paper presents self-timed rings as low phase noise oscillators. The configurability of STRs allows a 3 dB phase noise reduction, at the cost of higher power consumption, when the number of stages is doubled while keeping the same oscillation frequency. In addition, the temporal behavior of STRs favors low timing jitter. The structure of STRs also makes sub-gate phase resolutions achievable, and these resolutions can theoretically be set as fine as needed. The simulation results have been confirmed by measurements on our test chips fabricated in ST CMOS 65 nm and AMS 0.35 µm, and on an Altera Cyclone III FPGA.

REFERENCES