Latency and Skew Reduction Techniques in Physical Design
1. Introduction
In Clock Tree Synthesis (CTS), latency and skew directly impact timing closure and overall chip performance.
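- Clock Latency: The total delay from the clock source to a register's clock pin, also called insertion delay.
- Clock Skew: The difference in clock arrival times between two register clock pins; global skew is the gap between the earliest and latest arrival across all sinks.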
Goal: Reduce latency and skew while ensuring balanced and optimized clock distribution.
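The two definitions above can be made concrete with a small calculation. This is a minimal sketch: the register names and arrival times are made-up values, and the function names are illustrative rather than taken from any tool.

```python
# Hypothetical clock arrival times (ps) at four register clock pins.
arrivals_ps = {"reg_a": 412.0, "reg_b": 398.0, "reg_c": 431.0, "reg_d": 405.0}

def clock_latency(arrivals):
    """Worst-case latency: the latest clock arrival among all sinks."""
    return max(arrivals.values())

def global_skew(arrivals):
    """Global skew: latest arrival minus earliest arrival."""
    return max(arrivals.values()) - min(arrivals.values())

def local_skew(arrivals, launch, capture):
    """Skew seen by one timing path: capture arrival minus launch arrival."""
    return arrivals[capture] - arrivals[launch]

print(clock_latency(arrivals_ps))                  # 431.0
print(global_skew(arrivals_ps))                    # 33.0
print(local_skew(arrivals_ps, "reg_a", "reg_c"))   # 19.0
```

Global skew bounds every local skew, but timing closure depends on the local skew of each launch/capture pair.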
2. Techniques to Reduce Latency and Skew
1️⃣ Use Balanced Clock Tree Topologies
🔹 Why? Unbalanced clock trees cause high skew and timing violations.
🔹 Fix:
✔ Use H-Tree or X-Tree structures for uniform distribution.
✔ Avoid asymmetrical buffering, which makes branch delays unequal and adds skew.
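Command Example (Cadence Innovus - Flexible H-Tree with CCOpt):
create_ccopt_flexible_htree -name clk_htree -pin clk
synthesize_ccopt_flexible_htrees
ccopt_design
(Defines and builds an H-tree trunk from the clk root pin, then runs CTS below it. The tree name and pin are placeholders, and a real flow usually needs further options such as the sink grid and buffer cells; check the option names against your Innovus version.)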
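Why an H-tree balances delay can be seen from its geometry: every sink sits at the same wire distance from the root. The sketch below builds an idealized H-tree recursively and checks that property. The function name and dimensions are illustrative assumptions, and real trees also need matched buffering and loads.

```python
def h_tree_sinks(x, y, arm, levels, dist=0.0, horizontal=True):
    """Return (x, y, path_length) for every leaf of an idealized H-tree.

    Each level splits into two branches of length `arm`, alternating between
    horizontal and vertical; the arm halves after each vertical split.
    """
    if levels == 0:
        return [(x, y, dist)]
    sinks = []
    next_arm = arm if horizontal else arm / 2
    for sign in (-1, 1):
        nx = x + sign * arm if horizontal else x
        ny = y if horizontal else y + sign * arm
        sinks += h_tree_sinks(nx, ny, next_arm, levels - 1,
                              dist + arm, not horizontal)
    return sinks

sinks = h_tree_sinks(0, 0, arm=8, levels=4)
path_lengths = {length for _, _, length in sinks}
print(len(sinks), path_lengths)  # 16 sinks, one shared path length
```

With equal wire length to every sink, any remaining skew comes from load, buffer, and process mismatch rather than from the routing topology.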
2️⃣ Clock Buffer Sizing and Placement Optimization
🔹 Why? Undersized or poorly placed buffers slow clock edges and increase latency.
🔹 Fix:
✔ Use high-drive-strength buffers on long routes to reduce delay.
✔ Place clock buffers symmetrically to balance delays.
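Command Example (Synopsys ICC2 - Allow Strong Clock Buffers in CTS):
set_lib_cell_purpose -include cts [get_lib_cells */CLKBUF_X8]
clock_opt
(Allows CTS to use the listed high-drive clock buffer; CLKBUF_X8 is a placeholder cell name. Oversized buffers add input capacitance and power, so size to the load rather than always picking the largest cell.)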
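The effect of drive strength and buffer count on a long clock route can be estimated with a first-order Elmore delay model. This is a rough sketch, not a signoff model: all resistance, capacitance, and buffer delay values are assumed for illustration, and the function names are hypothetical.

```python
def buffered_wire_delay_ps(length_um, n_stages, r_drv_ohm=200.0, c_in_ff=2.0,
                           t_buf_ps=15.0, r_per_um=2.0, c_per_um=0.2):
    """Elmore delay of a wire split into n_stages equal buffered segments.

    Each stage is a buffer (output resistance r_drv_ohm, intrinsic delay
    t_buf_ps) driving one wire segment loaded by the next buffer input.
    """
    seg_um = length_um / n_stages
    r_wire = r_per_um * seg_um
    c_wire = c_per_um * seg_um
    # ohm * fF = 1e-15 s, i.e. 1e-3 ps
    stage_ohm_ff = (r_drv_ohm * (c_wire + c_in_ff)
                    + r_wire * (0.5 * c_wire + c_in_ff))
    return n_stages * (t_buf_ps + stage_ohm_ff * 1e-3)

def best_stage_count(length_um, max_stages=20, **kwargs):
    """Stage count (1..max_stages) that minimizes the modeled delay."""
    return min(range(1, max_stages + 1),
               key=lambda n: buffered_wire_delay_ps(length_um, n, **kwargs))

one_stage = buffered_wire_delay_ps(2000, 1)
four_stages = buffered_wire_delay_ps(2000, 4)
print(one_stage > four_stages, best_stage_count(2000))
```

In this model the unbuffered wire delay grows with the square of its length, so splitting the route into buffered segments helps until the added buffer delay outweighs the wire savings.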
3️⃣ Clock Shielding to Reduce Crosstalk Noise
🔹 Why? Clock nets are sensitive to noise and delay variations due to crosstalk.
🔹 Fix:
✔ Use ground shielding on critical clock nets.
✔ Route clock signals away from high-switching nets.
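Command Example (Cadence Innovus - Shielded Clock Route Type):
create_route_type -name clk_shielded -top_preferred_layer M5 -bottom_preferred_layer M4 -shield_net VSS
set_ccopt_property route_type clk_shielded -net_type trunk
ccopt_design
(Routes clock trunk nets on M4-M5 with VSS shield wires alongside them to limit noise coupling. The route type name, layers, and ground net name are placeholders for your design.)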
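The benefit of shielding can be illustrated with the Miller factor on coupling capacitance. A neighbor switching in the same direction makes the coupling capacitance look like 0x, a quiet neighbor 1x, and an opposite-switching neighbor 2x. A grounded shield is always quiet. The sketch below uses a lumped RC estimate with assumed values; the function names are illustrative.

```python
def effective_cap_ff(c_ground_ff, c_couple_ff, miller_factor):
    """Capacitance seen by the clock net for a given neighbor activity."""
    return c_ground_ff + miller_factor * c_couple_ff

def delay_window_ps(r_ohm, c_ground_ff, c_couple_ff, miller_factors):
    """(min, max) 50% delay of a lumped RC stage over the given factors."""
    delays = [0.69 * r_ohm * effective_cap_ff(c_ground_ff, c_couple_ff, k) * 1e-3
              for k in miller_factors]
    return min(delays), max(delays)

# Assumed values: 500 ohm driver, 40 fF to ground, 30 fF to neighbors.
unshielded = delay_window_ps(500.0, 40.0, 30.0, miller_factors=(0, 1, 2))
shielded = delay_window_ps(500.0, 40.0, 30.0, miller_factors=(1,))

print(unshielded[1] - unshielded[0])  # delay uncertainty without shields
print(shielded[1] - shielded[0])      # 0.0 with grounded shields
```

Shielding does not remove the capacitance; it removes the uncertainty, which is what shows up as skew and jitter.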
4️⃣ Skew Balancing Using Buffer Insertion
🔹 Why? Uneven buffer distribution creates skew.
🔹 Fix:
✔ Insert buffers at proper locations to balance delay.
✔ Perform skew-aware clock tree synthesis.
Command Example (Synopsys ICC2 - Skew Target for CTS):
set_clock_tree_options -target_skew 0.05
clock_opt
(Sets a 50 ps skew target, assuming library time units of ns, then builds and optimizes the clock tree.)
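Skew balancing by buffer insertion amounts to padding the fast branches until they sit within the skew target of the slowest one. A minimal sketch of that bookkeeping follows; the branch names and delays are made-up, and a real tool would quantize the padding to available buffer delays.

```python
def balance_branches(branch_delays_ps, target_skew_ps=0.0):
    """Delay (ps) to add to each branch so skew is at most target_skew_ps.

    The slowest branch is left alone, so the worst-case latency does not grow.
    """
    slowest = max(branch_delays_ps.values())
    return {name: max(0.0, slowest - delay - target_skew_ps)
            for name, delay in branch_delays_ps.items()}

branches = {"b0": 380.0, "b1": 420.0, "b2": 395.0}
padding = balance_branches(branches, target_skew_ps=10.0)
balanced = {name: branches[name] + padding[name] for name in branches}

print(padding)   # {'b0': 30.0, 'b1': 0.0, 'b2': 15.0}
print(max(balanced.values()) - min(balanced.values()))  # 10.0
```

Padding only up to the target rather than to zero skew saves buffers and power.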
5️⃣ Adding Clock Gating to Reduce Clock Power
🔹 Why? Large clock trees consume significant dynamic power because every buffer and sink toggles on each cycle.
🔹 Fix:
✔ Insert clock gating cells to turn off idle sections of the clock tree.
✔ Use integrated clock gating (ICG) cells, and account for their delay when balancing the tree.
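Command Example (Synopsys Design Compiler - Insert Clock Gating During Synthesis):
set_clock_gating_style -sequential_cell latch -minimum_bitwidth 4
compile_ultra -gate_clock
(Clock gating is normally inserted during logic synthesis rather than in the place-and-route tool; CTS then balances the clock through the ICG cells. The minimum bit width of 4 is an example value.)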
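The power saving from gating follows from the dynamic power relation P = C * V^2 * f for a net that toggles every cycle, scaled by the fraction of cycles the gated subtree is enabled. A rough sketch with assumed capacitance, voltage, and enable duty values:

```python
def clock_power_mw(cap_pf, vdd_v, freq_mhz, enable_duty=1.0):
    """Dynamic power (mW) of clock capacitance toggling every enabled cycle.

    pF * V^2 * MHz gives microwatts, hence the 1e-3 factor.
    enable_duty is the fraction of cycles the gate lets the clock through.
    """
    return cap_pf * vdd_v ** 2 * freq_mhz * enable_duty * 1e-3

# Assumed tree: 50 pF total at 0.9 V and 1 GHz.
ungated = clock_power_mw(50.0, 0.9, 1000.0)

# Same tree with 40 pF behind ICG cells enabled 25% of the time.
gated = (clock_power_mw(10.0, 0.9, 1000.0)
         + clock_power_mw(40.0, 0.9, 1000.0, enable_duty=0.25))

print(round(ungated, 2), round(gated, 2))  # 40.5 16.2
```

The closer to the root a gate sits, the more capacitance it switches off, but the fewer registers share a common enable condition.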
6️⃣ Metal Layer Selection for Lower RC Delay
🔹 Why? Lower metal layers (M1-M3) have higher resistance, increasing clock latency.
🔹 Fix:
✔ Route clock nets on higher metal layers (M6-M9) for lower resistance.
Command Example (Synopsys ICC2 - Restrict Clock Routing to Upper Layers):
set_clock_routing_rules -min_routing_layer M6 -max_routing_layer M7
clock_opt
(Constrains clock nets to M6-M7 to reduce wire resistance. Layer names depend on the technology.)
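The layer choice can be compared with the distributed RC estimate t = 0.38 * r * c * L^2, where r and c are per-unit-length resistance and capacitance. The per-layer values below are assumptions for illustration only; real numbers come from the technology file.

```python
# Assumed per-micron parasitics: (ohm/um, fF/um). Not from any real process.
LAYER_RC = {
    "M2": (4.0, 0.20),   # thin lower metal: high resistance
    "M7": (0.3, 0.22),   # thick upper metal: low resistance
}

def wire_delay_ps(layer, length_um):
    """50% delay of an unbuffered distributed RC wire on the given layer."""
    r_per_um, c_per_um = LAYER_RC[layer]
    # ohm * fF = 1e-3 ps
    return 0.38 * r_per_um * c_per_um * length_um ** 2 * 1e-3

for layer in LAYER_RC:
    print(layer, round(wire_delay_ps(layer, 1000.0), 2))
# M2 304.0
# M7 25.08
```

Because delay scales with the product of r and c, the large drop in resistance on upper layers dominates any small increase in capacitance.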
7️⃣ Post-CTS Clock Tree Optimization (ECO Mode)
🔹 Why? Even after CTS, timing may not be ideal.
🔹 Fix:
✔ Perform ECO adjustments to fix skew and timing violations.
✔ Use local buffering and wire tuning for fine adjustments.
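Command Example (Cadence Innovus - ECO Clock Fixes):
ecoChangeCell -inst clk_buf_12 -cell CLKBUF_X8
ecoAddRepeater -net clk_branch_3 -cell CLKBUF_X4
optDesign -postCTS
(Upsizes one clock buffer, adds a repeater on a slow branch, then reruns post-CTS optimization; repeat until skew meets the target, for example 30 ps. Instance, net, and cell names are placeholders.)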
3. CTS Optimization Flow
Step 1: Choose a balanced clock tree topology (H-Tree, X-Tree, Mesh).
Step 2: Use proper buffer sizing to avoid excessive delays.
Step 3: Shield clock nets from noisy signals.
Step 4: Perform clock skew balancing using buffers.
Step 5: Use high metal layers for routing to minimize resistance.
Step 6: Apply ECO-based fine-tuning after CTS.