Clock Tree Synthesis - Part 1
Author : Nishant Lamani, Physical Design Engineer, SignOff Semiconductors
Clock Tree Synthesis (CTS) is one of the most important stages in PnR. CTS QoR largely determines timing convergence & power. In most ICs the clock network consumes 30-40% of total power, so an efficient clock architecture, clock gating & clock tree implementation help to reduce power.
Sanity checks need to be done before CTS
- Check legality.
- Check power stripes, standard cell rails & also verify PG connections.
- Timing QoR (setup should be under control).
- Timing DRVs.
- High Fanout nets (like scan enable / any static signal).
- Congestion (running CTS on a congested design, or a design with congestion hotspots, can create more congestion & other issues such as noise / IR).
- Remove don’t_use attribute on clock buffers & inverters.
- Check whether all pre-existing cells in clock path are balanced cells (CK* cells).
- Check & qualify the don’t_touch and don’t size attributes on clock components.
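The high fan-out check in the list above can be scripted against a netlist summary. The following is a minimal sketch, assuming a hypothetical dictionary that maps net names to sink counts and an illustrative threshold of 32; neither the data layout nor the threshold comes from a specific tool.

```python
def high_fanout_nets(fanout_by_net, clock_nets, threshold=32):
    """Return non-clock nets whose fanout exceeds the threshold, largest first."""
    offenders = [
        (net, fanout)
        for net, fanout in fanout_by_net.items()
        if fanout > threshold and net not in clock_nets
    ]
    # Sort by fanout so the worst nets are reviewed first.
    return sorted(offenders, key=lambda item: item[1], reverse=True)


# Hypothetical fanout numbers, for illustration only.
fanout_by_net = {"clk": 5000, "scan_en": 4200, "rst_n": 900, "data[0]": 3}
report = high_fanout_nets(fanout_by_net, clock_nets={"clk"})
print(report)  # [('scan_en', 4200), ('rst_n', 900)]
```

Clock nets are skipped on purpose, since they are handled by CTS rather than by high fan-out net synthesis.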
Preparations
- Understand the clock structure of the design & its balancing requirements. This will help in coming up with proper exceptions to build an optimum clock tree.
- Creating non-default rules (check whether shielding is required).
- Setting clock transition, capacitance & fan-out.
- Decide which cells to use for CTS (clock buffer / clock inverter).
- Handle clock dividers & other clock elements properly.
- Come up with exceptions.
- Understand latency (from Full chip point of view) & skew targets.
- Take care of special balancing requirements.
- Understand inter-clock balancing requirements.
Difference between High Fan-out Net Synthesis (HFNS) & Clock Tree Synthesis:
- CTS uses clock buffers and clock inverters with equal rise and fall times, whereas HFNS uses buffers and inverters with relaxed rise and fall times.
- HFNS is used mostly for reset, scan enable and other static signals with high fan-out. There is no stringent requirement for balancing or power reduction.
- Clock tree power is given special attention because the clock is a constantly switching signal. HFNS is mostly performed on static signals, so power needs much less attention.
- NDRs are used for clock tree routing.
Why are buffers/inverters inserted?
- Balance the loads.
- Meet the DRCs (max transition, max capacitance, etc.).
- Minimize the skew.
What is the difference between a clock buffer and a normal buffer?
- Clock buffers have equal rise and fall times, which avoids pulse width violations.
- In clock buffers the beta ratio is adjusted so that rise & fall times are matched. This may make a clock buffer larger than a normal buffer.
- Normal buffers may not have equal rise and fall time.
- Clock buffers are usually designed such that an input signal with 50% duty cycle produces an output with 50% duty cycle.
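To see why matched rise and fall behaviour matters, the duty-cycle point above can be worked through numerically. This is a simplified sketch, assuming a chain of identical non-inverting buffers with fixed rise and fall propagation delays; the function name and the delay values are illustrative, not taken from any library.

```python
def output_duty_cycle(period_ps, input_duty, rise_delay_ps, fall_delay_ps, stages):
    """Duty cycle at the end of a chain of identical non-inverting buffers."""
    high_in = period_ps * input_duty
    # The rising edge leaves the chain after `stages` rise delays,
    # the falling edge after `stages` fall delays.
    rise_arrival = stages * rise_delay_ps
    fall_arrival = high_in + stages * fall_delay_ps
    high_out = fall_arrival - rise_arrival
    return high_out / period_ps


# Matched delays keep a 50% duty cycle through 10 stages.
balanced = output_duty_cycle(1000, 0.5, 50, 50, 10)
# A 10 ps mismatch per stage stretches the high pulse by 100 ps.
skewed = output_duty_cycle(1000, 0.5, 50, 60, 10)
print(balanced, skewed)
```

With a 1000 ps period, the mismatched chain turns a 500 ps high pulse into a 600 ps one, which is how an unbalanced buffer chain can eat into the minimum pulse width of the low phase.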
CTS Goals
- Meet the clock tree DRC.
- Max. Transition.
- Max. Capacitance.
- Max. Fanout.
- Meet the clock tree targets.
- Minimal skew.
- Minimum insertion delay.
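The two timing targets above can be computed directly from per-sink clock arrival times. A minimal sketch, assuming a hypothetical dictionary of sink latencies in picoseconds; here insertion delay is taken as the longest root-to-sink latency and skew as the difference between the longest and shortest.

```python
def clock_tree_metrics(sink_latency_ps):
    """Return (insertion_delay, global_skew) for a set of clock sinks."""
    latencies = list(sink_latency_ps.values())
    insertion_delay = max(latencies)          # longest root-to-sink latency
    skew = insertion_delay - min(latencies)   # spread between slowest and fastest sink
    return insertion_delay, skew


# Hypothetical arrival times, for illustration only.
sink_latency_ps = {"ff1/CK": 410, "ff2/CK": 395, "ff3/CK": 432}
insertion_delay, skew = clock_tree_metrics(sink_latency_ps)
print(insertion_delay, skew)  # 432 37
```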
Clock Tree Reference
By default, each clock tree reference list contains all the clock buffers and clock inverters in the logic library. The clock tree reference list is used for:
- Clock tree synthesis.
- Boundary cell insertions.
- Sizing.
- Delay insertion.
Boundary cell insertions
- When you are working on a block-level design, you might want to preserve the boundary conditions of the block’s clock ports (the boundary clock pins).
- A boundary cell is a fixed buffer that is inserted immediately after the boundary clock pins to preserve the boundary conditions of the clock pin.
- When boundary cell insertion is enabled, a buffer from the clock tree reference list is inserted immediately after each boundary clock pin. For multi-voltage designs, the buffers are inserted at the boundary in the default voltage area.
- After insertion, the boundary cells are fixed for clock tree synthesis; they can’t be moved or sized. In addition, no cells are inserted between a clock pin and its boundary cell.
Fig1: Boundary cell
Delay Insertion
- If a large delay is needed, instead of adding many buffers we can add a single delay cell with the required delay value. The advantages are smaller area and lower power. However, delay cells have high delay variation, so using them in the clock tree is not recommended.
Clock Tree Design Rule Constraints
- Max. Transition.
- The clock transition limit should be neither too tight nor too relaxed.
- If it is too tight, more buffers are needed.
- If it is too relaxed, dynamic power increases, because slow edges raise the short-circuit current in the driven cells.
- Max. Capacitance.
- Max. Fanout.
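The three design rule constraints above amount to a per-net limit check. The sketch below assumes a hypothetical `ClockNet` record and illustrative limits; in a real flow these values would come from the tool's reports and the library or design constraints.

```python
from dataclasses import dataclass


@dataclass
class ClockNet:
    name: str
    transition_ps: float
    capacitance_ff: float
    fanout: int


def drc_violations(nets, max_tran_ps, max_cap_ff, max_fanout):
    """List (net, rule) pairs for every clock net that exceeds a limit."""
    violations = []
    for net in nets:
        if net.transition_ps > max_tran_ps:
            violations.append((net.name, "max_transition"))
        if net.capacitance_ff > max_cap_ff:
            violations.append((net.name, "max_capacitance"))
        if net.fanout > max_fanout:
            violations.append((net.name, "max_fanout"))
    return violations


# Hypothetical nets and limits, for illustration only.
nets = [
    ClockNet("clk_root", transition_ps=60, capacitance_ff=80, fanout=8),
    ClockNet("clk_leaf_12", transition_ps=140, capacitance_ff=95, fanout=40),
]
found = drc_violations(nets, max_tran_ps=100, max_cap_ff=100, max_fanout=32)
print(found)  # [('clk_leaf_12', 'max_transition'), ('clk_leaf_12', 'max_fanout')]
```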
Clock Tree Exceptions
- Non-Stop Pin
- Exclude Pin
- Float Pin
- Stop Pin
- Don’t Touch Subtree
- Don’t Buffer Nets
- Don’t Size Cells
Non-Stop Pin:
- Non-stop pins are pins that the tool traces through, even though they would normally be considered endpoints of the clock tree.
- Examples:
- The clock pins of sequential cells that drive generated clocks are implicit non-stop pins.
- Clock pin of ICG cells.
Fig2: Non-stop pin
Exclude pin:
- Exclude pins are clock tree endpoints that are excluded from clock tree timing calculation and optimization.
- The tool considers exclude pins only in calculation and optimizations for design rule constraints.
- During CTS, the tool isolates exclude pins from the clock tree by inserting a guide buffer before the pin.
- Examples:
- Implicit exclude pins:
- Non-clock input pin of a sequential cell.
- Multiplexer select pin.
- Three-state enable pin.
- Output port.
- Incorrectly defined clock pin (if the pin doesn’t have trigger edge information).
- Cascaded clock.
Fig3: Exclude pin
- In the above figure, the tool never performs skew or insertion delay optimization beyond the exclude pin, but it does perform design rule fixing.
Float Pin:
- Float pins are clock pins that have special insertion delay requirements; balancing is done according to the specified delay (used, for example, in macro modelling).
Fig4: Float pin
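The float pin idea can be illustrated with a small calculation. This is a sketch under the assumption that a macro has a known internal clock delay behind its clock pin, so the tree must reach that pin earlier than it reaches ordinary sinks; the function name and numbers are hypothetical.

```python
def required_tree_latency(target_latency_ps, internal_delay_ps):
    """Latency the clock tree must deliver at the macro's clock pin."""
    if internal_delay_ps > target_latency_ps:
        # The macro alone is already slower than the balancing target.
        raise ValueError("internal delay exceeds the balancing target")
    return target_latency_ps - internal_delay_ps


# Ordinary sinks are balanced at 500 ps; the macro adds 120 ps internally,
# so the tree should arrive at the macro pin at 380 ps.
macro_pin_latency = required_tree_latency(500, 120)
print(macro_pin_latency)  # 380
```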
Stop Pin:
- Stop pins are the endpoints of clock tree that are used for delay balancing.
- During CTS, the tool uses stop pins in calculation & optimization for both DRC and clock tree timing.
- Example:
- Clock sinks are implicit stop pins.
Fig5: Stop pin
The optimization is done only up to the stop pin, as shown in the above figure.
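How non-stop, exclude and stop pins affect clock tree tracing can be sketched as a graph walk. The example assumes a hypothetical fanout map and pin labels; it is an illustration of the definitions above, not the tracing algorithm of any particular tool.

```python
def classify_endpoints(fanout, kind, root):
    """Walk the clock network and collect stop pins and exclude pins."""
    stop_pins, exclude_pins = [], []
    stack = [root]
    while stack:
        pin = stack.pop()
        pin_kind = kind.get(pin, "through")
        if pin_kind == "stop":
            stop_pins.append(pin)       # balanced endpoint, tracing ends here
        elif pin_kind == "exclude":
            exclude_pins.append(pin)    # DRC fixing only, no balancing
        else:
            # Non-stop pins and plain nets are traced through.
            stack.extend(fanout.get(pin, []))
    return sorted(stop_pins), sorted(exclude_pins)


# Hypothetical network: an ICG clock pin is traced through to the flops behind it.
fanout = {
    "clk": ["icg/CK", "ff1/CK", "mux/S"],
    "icg/CK": ["ff2/CK", "ff3/CK"],
}
kind = {
    "icg/CK": "nonstop",
    "ff1/CK": "stop",
    "ff2/CK": "stop",
    "ff3/CK": "stop",
    "mux/S": "exclude",
}
stops, excludes = classify_endpoints(fanout, kind, "clk")
print(stops, excludes)  # ['ff1/CK', 'ff2/CK', 'ff3/CK'] ['mux/S']
```

Only the pins in `stops` would take part in skew and insertion delay balancing.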
Don’t Touch Sub-tree:
- If we want to preserve a portion of an existing clock tree, we set a don’t touch exception on the sub-tree.
Fig6: Don’t touch subtree
- CLK1 is the pre-existing clock and path 1 is optimized with respect to CLK1.
- CLK2 is the new generated clock. Don’t touch sub-tree attribute is set w.r.t C1.
- Example:
- If path1 is 300ps and path2 is 200ps, delay is added to path2 during balancing.
- If path1 is 200ps and path2 is 300ps, delay can’t be added to path1 during balancing because the don’t touch attribute is set on path1, and we get a violation.
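The two cases in this example can be expressed as a small function. This is a sketch that assumes delay can only be added (never removed) and only on the path that is not marked don’t touch; the function name is hypothetical.

```python
def balance_against_frozen(frozen_path_ps, free_path_ps):
    """Return (delay_added_to_free_path, residual_skew).

    The frozen path carries the don't touch attribute, so only the
    free path may receive extra delay.
    """
    if free_path_ps <= frozen_path_ps:
        # The free path is faster: pad it up to the frozen path.
        return frozen_path_ps - free_path_ps, 0
    # The frozen path is faster but cannot be padded: skew remains.
    return 0, free_path_ps - frozen_path_ps


print(balance_against_frozen(300, 200))  # (100, 0)
print(balance_against_frozen(200, 300))  # (0, 100)
```

In the second call the 100 ps of residual skew corresponds to the violation described above.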
Don’t Buffer Net:
- It is used in order to improve the results, by preventing the tool from buffering certain nets.
Note: Don’t buffer nets have higher priority than DRC. CTS does not add buffers on such nets.
- Example:
- If the path is a false path, there is no need to balance it, so set the don’t buffer net attribute.
Don’t Size Cell:
- To prevent sizing of cells on the clock path during CTS and optimization, we must identify the cells as don’t size cells.
Specifying Size-Only Cells:
- During CTS & optimization, size-only cells can only be sized, not moved or split.
- If a size-only cell overlaps an adjacent cell after sizing, it might be moved during the legalization step.
Implementing Clock Tree:
For implementing the clock tree, use the clock-opt command, which performs CTS & incremental physical optimization.
- Synthesize the clock tree:
- Before implementing the clock tree, the tool upsizes & possibly moves the existing clock gates, which improves the quality of results (QoR) and reduces the number of clock tree levels.
- Optimize the clock tree, using the following steps:
- Buffer relocation.
- Buffer sizing.
- Gate relocation.
- Gate sizing.
- Improve skew.
- Delay insertion.
- Perform inter-clock delay balancing
- Balancing is required between flops that are driven by two different clocks.
- The clock groups between which balancing has to be performed need to be specified.
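Inter-clock delay balancing can be pictured as padding the faster clocks in a group up to the slowest one. A minimal sketch, assuming hypothetical per-clock insertion delays and user-specified groups; real tools may also accept explicit offsets between clocks, which this example ignores.

```python
def interclock_offsets(latency_ps, groups):
    """Delay to add at each clock root so clocks in a group have equal latency."""
    offsets = {}
    for group in groups:
        target = max(latency_ps[clk] for clk in group)  # slowest clock sets the target
        for clk in group:
            offsets[clk] = target - latency_ps[clk]
    return offsets


# Hypothetical insertion delays; clk_c is in no group and is left alone.
latency_ps = {"clk_a": 450, "clk_b": 380, "clk_c": 600}
offsets = interclock_offsets(latency_ps, groups=[["clk_a", "clk_b"]])
print(offsets)  # {'clk_a': 0, 'clk_b': 70}
```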
- Perform detail routing of clock nets [NDR rule].
- Apply non default routing (NDR) rules for clock nets.
- Double width.
- Double spacing.
- Shielding
- By default, the tool applies the clock routing rules all the way to the sink pins. It is better to use normal (default) routing rules at the sink pins, because this reduces congestion and makes tapping the clock easier.
Fig7: Non Default Routing
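A first-order estimate shows why double width and double spacing help clock nets. The sketch below assumes wire resistance scales as sheet resistance times length over width, and that coupling capacitance to a neighbour is inversely proportional to spacing (a parallel-plate simplification that ignores fringe effects and the extra ground capacitance of a wider wire). All numbers are illustrative.

```python
def wire_resistance_ohm(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance of a wire segment: sheet resistance times number of squares."""
    return sheet_res_ohm_per_sq * length_um / width_um


def coupling_cap_ff(cap_at_ref_ff, ref_spacing_um, spacing_um):
    """Scale a reference coupling cap, assuming it varies as 1 / spacing."""
    return cap_at_ref_ff * ref_spacing_um / spacing_um


# Default rule: 0.1 um width and spacing. NDR: double width, double spacing.
default_r = wire_resistance_ohm(0.2, 100.0, 0.1)
ndr_r = wire_resistance_ohm(0.2, 100.0, 0.2)
default_cc = coupling_cap_ff(8.0, 0.1, 0.1)
ndr_cc = coupling_cap_ff(8.0, 0.1, 0.2)
print(default_r, ndr_r, default_cc, ndr_cc)
```

Under these assumptions both the resistance and the coupling capacitance are roughly halved, which lowers delay and crosstalk on the clock net at the cost of more routing resources.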
- Perform RC extraction of the clock nets and compute accurate clock arrival time.
- Adjust the I/O timings.
- After implementing the clock tree, the tool can update the input and output delays to reflect the actual clock arrival time.
- Perform power optimization.
- Use a large/Max clock gating fanout during insertion of the ICG cells.
- Merge ICG cells that have the same enable signal.
- Perform power-aware placement of ICG and registers.
- Check and fix any congestion hotspots.
- Optimize the scan chain.
- Fix the placement of the clock tree buffers and inverters.
- Perform placement and timing optimization.
- Check for major hold time violations.
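The ICG merging step in the power optimization list above can be sketched as a grouping problem. This example assumes a hypothetical mapping from each ICG cell to its clock and enable nets; requiring the same clock net in addition to the same enable is an assumption made here for safety, not something stated in the list.

```python
from collections import defaultdict


def icg_merge_groups(icg_pins):
    """Group ICG cells that share both the clock net and the enable net."""
    groups = defaultdict(list)
    for cell, (clock_net, enable_net) in icg_pins.items():
        groups[(clock_net, enable_net)].append(cell)
    # Only groups with more than one cell are merge candidates.
    return {key: sorted(cells) for key, cells in groups.items() if len(cells) > 1}


# Hypothetical ICG cells, for illustration only.
icg_pins = {
    "icg_a": ("clk", "en_core"),
    "icg_b": ("clk", "en_core"),
    "icg_c": ("clk", "en_dma"),
    "icg_d": ("clk2", "en_core"),
}
merge_groups = icg_merge_groups(icg_pins)
print(merge_groups)  # {('clk', 'en_core'): ['icg_a', 'icg_b']}
```

A real flow would also check that the merged cell stays within the clock gating fanout limit before combining the candidates.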