1.  Post silicon tuning for digital and mixed-signal circuits

Adaptive body biasing (ABB) and adaptive supply voltage (ASV) tuning are two relatively mature techniques for post silicon tuning. ABB uses the body effect to modulate transistor threshold voltages, thereby controlling leakage and performance. ASV raises the supply voltage (Vdd) for slow (low-leakage) dies and lowers it for fast (high-leakage) dies, which improves overall yield. It relies on the roughly cubic dependence of leakage power on Vdd in CMOS circuits; dynamic power depends on Vdd quadratically. A detailed ASV scheme is discussed in [4].
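As a rough illustration of the Vdd dependence quoted above, the sketch below scales leakage power with the cube of Vdd and dynamic power with its square. The function name, the nominal voltage and the exact exponents are illustrative assumptions, not values taken from [4].

```python
def relative_power(vdd, vdd_nominal=1.0, leak_exp=3.0, dyn_exp=2.0):
    """Return (leakage, dynamic) power relative to operation at nominal Vdd.

    Assumes simple power-law scaling: leakage ~ Vdd^3, dynamic ~ Vdd^2.
    """
    ratio = vdd / vdd_nominal
    return ratio ** leak_exp, ratio ** dyn_exp


# A fast, leaky die operated at 90% of the nominal supply (hypothetical case).
leak, dyn = relative_power(0.9)
print(f"leakage x{leak:.3f}, dynamic x{dyn:.3f}")  # leakage x0.729, dynamic x0.810
```

Under these assumed exponents, a 10% supply reduction removes about 27% of the leakage power, which is why lowering Vdd on fast dies is effective.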

Two main problems arise: how to select the proper adaptive voltage, and how to cluster the devices (gates) into groups at design time according to the adaptive voltage each is likely to be assigned. In post silicon tuning, all devices (gates) in the same cluster are assigned the same voltage.

[1] uses forward body biasing to reduce active-mode leakage in gate-level circuits, where at any given time a large number of gates are not switching but nevertheless consume leakage power. It proposes a fine-grained forward body biasing (FBB) scheme for active-mode leakage power reduction in gate-level circuits without any delay penalty.

[2],[10] propose a new variability-aware method that clusters gates at design time into a handful of carefully chosen independent body bias groups, which are then individually tuned post-silicon for each die. This allows near-optimal performance and power characteristics to be obtained with minimal overhead. For each gate, the method generates the probability distribution of its ideal post-silicon body bias voltage using Monte Carlo (MC) sampling. These distributions and their correlations are then used to drive a statistically aware clustering technique. Furthermore, the physical design constraints are studied to show how the area and wirelength overhead can be significantly limited using the proposed method.
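The design-time clustering and per-die tuning idea in [2],[10] can be sketched as follows. This is a simplified stand-in, not the authors' algorithm: it uses plain k-means on synthetic MC samples, so gates whose ideal bias moves together across dies end up in the same group. The two-region variation model, all function names and all numeric values are assumptions made for illustration.

```python
import random


def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sample_ideal_bias(n_gates, n_dies, seed=1):
    """Synthetic MC data: samples[g][d] is the ideal bias (V) of gate g on die d."""
    rng = random.Random(seed)
    samples = [[0.0] * n_dies for _ in range(n_gates)]
    for d in range(n_dies):
        # Assumed model: two die regions, each with its own correlated shift.
        shift = [rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)]
        for g in range(n_gates):
            region = 0 if g < n_gates // 2 else 1
            samples[g][d] = shift[region] + rng.gauss(0.0, 0.01)
    return samples


def cluster_gates(samples, k, iters=20):
    """Design-time step: k-means with farthest-point initialisation."""
    centers = [list(samples[0])]
    while len(centers) < k:
        far = max(samples, key=lambda v: min(dist2(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(samples)
    for _ in range(iters):
        for i, v in enumerate(samples):
            labels[i] = min(range(k), key=lambda c: dist2(v, centers[c]))
        for c in range(k):
            members = [v for v, l in zip(samples, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def tune_die(ideal_bias_on_die, labels, k):
    """Post-silicon step: one shared bias per cluster (mean of members' ideal bias)."""
    biases = []
    for c in range(k):
        members = [b for b, l in zip(ideal_bias_on_die, labels) if l == c]
        biases.append(sum(members) / len(members) if members else 0.0)
    return biases


samples = sample_ideal_bias(n_gates=8, n_dies=200)
labels = cluster_gates(samples, k=2)
print(labels)
```

Clustering on the full sample vectors, rather than on per-gate means, is what lets correlation between gates influence the grouping in this sketch.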

However, the problem formulation in [2],[10] relies solely on MC sampling to account for process variation, which leaves two open problems. 1) Replacing the MC sampling with statistical static timing analysis (SSTA) could significantly improve accuracy and efficiency, but how to maximize yield when clustering gates by body bias within an SSTA framework is still an open problem in the literature. 2) A related problem is how to remove pessimism in SSTA by accounting for post-silicon tuning: it is well known that adaptive body biasing can significantly improve yield, but this is not considered in timing analysis, so current SSTA schemes give overly pessimistic results.

[3] describes an optimization strategy that unifies design-time gate-level sizing and post-silicon adaptation using adaptive body bias at the chip level. The statistical formulation uses adjustable robust linear programming to derive the optimal policy for assigning body bias once the uncertain variables, such as gate length and threshold voltage, are known. Computational tractability is achieved by restricting the body bias selection policy to be an affine function of the uncertain variables. Although the mathematical derivation contains a significant flaw (and the resulting solution is therefore not correct), the paper points to the possibility of design-time and post-silicon co-optimization.
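To make the affine restriction concrete, the sketch below evaluates a body bias policy of the form offset plus a weighted sum of the measured variations. The coefficients would come out of the design-time optimization; the values, variable ordering and function name here are hypothetical and are not taken from [3].

```python
def affine_body_bias(xi, offset, gains):
    """Body bias as an affine function of the measured variation vector xi.

    xi     : measured deviations, e.g. [gate-length deviation, Vth deviation]
    offset : bias applied when there is no deviation
    gains  : one coefficient per entry of xi (fixed at design time)
    """
    return offset + sum(g * x for g, x in zip(gains, xi))


# Hypothetical policy and one measured die.
bias = affine_body_bias(xi=[0.05, 0.02], offset=0.0, gains=[2.0, -1.5])
print(round(bias, 3))  # 0.07
```

Because the policy is fixed up to its coefficients, applying it after fabrication needs only one evaluation per die rather than a new optimization.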

2.  Post silicon tuning for clock tree

Post silicon tunable (PST) clock trees have become an important design-for-yield (DFY) technique to counter variations in path delay and clock skew in manufactured chips. By inserting PST clock buffers into the clock tree, slack can be redistributed among adjacent timing paths, and timing failures may be corrected through post-silicon clock tuning.

[5] proposes a comprehensive clock scheduling methodology that improves timing and yield through both pre-silicon clock scheduling and post-silicon clock tuning. First, an optimal clock scheduling algorithm is developed to allocate slack to each path according to its timing uncertainty. To balance the skew caused by process variations, programmable delay elements are inserted at the clock inputs of a small set of flip-flops on the timing-critical paths. A delay-fault testing scheme combined with linear programming is used to identify and eliminate timing violations in the manufactured chips.
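The tuning step can be illustrated with the standard setup and hold constraints on clock arrival times, which form a system of difference constraints. The sketch below is not the linear program of [5]: it swaps in Bellman-Ford relaxation, which only finds one feasible set of added clock delays and ignores the range and step size of real delay elements. Function names and all delay values are hypothetical.

```python
def skew_constraints(paths, t_clk, t_setup, t_hold):
    """Turn (launch, capture, d_min, d_max) paths into t[u] - t[v] <= w triples."""
    cons = []
    for launch, capture, d_min, d_max in paths:
        # Setup: t[launch] + d_max + t_setup <= t[capture] + t_clk
        cons.append((launch, capture, t_clk - d_max - t_setup))
        # Hold:  t[launch] + d_min >= t[capture] + t_hold
        cons.append((capture, launch, d_min - t_hold))
    return cons


def solve_skews(n_ffs, cons):
    """Bellman-Ford on the constraint graph; None means no feasible tuning."""
    t = [0] * n_ffs  # equivalent to a virtual source with zero-weight edges
    for _ in range(n_ffs):
        changed = False
        for u, v, w in cons:
            if t[v] + w < t[u]:
                t[u] = t[v] + w
                changed = True
        if not changed:
            base = min(t)
            return [x - base for x in t]  # shift so every added delay is >= 0
    return None  # still relaxing after n passes: negative cycle, infeasible


# Measured path delays in ps (hypothetical); path 0 -> 1 fails setup at zero skew.
paths = [(0, 1, 900, 1100), (1, 2, 500, 700)]
delays = solve_skews(3, skew_constraints(paths, t_clk=1000, t_setup=50, t_hold=30))
print(delays)  # [0, 150, 150]
```

Here the failing path is repaired by delaying the clock at flip-flops 1 and 2 by 150 ps, which borrows slack from the faster downstream path.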

[6] proposes to insert PST clock buffers at both internal and leaf nodes of a clock-tree and uses a bottom-up algorithm to reduce the number of candidate PST clock buffer locations. It then provides two statistical timing-driven optimization algorithms to reduce the hardware cost of a PST clock-tree.

[8] proposes an integrated framework that performs statistical gate sizing in the presence of PST clock-tree buffers, minimizing binning-yield loss (BYL) and tunability cost by determining the tuning range to be provided at each buffer. The simultaneous gate-sizing and PST buffer range determination problem is shown to be a convex stochastic program under longest-path delay constraints and is hence solved optimally. The formulation is further extended into a heuristic that additionally considers shortest-path delay constraints.

Previous works on adaptivity optimization for post-silicon tuning focus on either logic signal tuning or clock signal tuning. [7] proposes the first unified adaptivity optimization over logic and clock signal tuning, which enables designers to save significant resources. In addition, it requires no assumption on variation distributions. The unified optimization is based on a novel linear programming formulation that can be solved efficiently by an advanced robust linear programming technique. Because the problem is discrete in nature, the continuous solution obtained from linear programming is then efficiently discretized. This procedure involves binary-search-accelerated dynamic programming, batch-based optimization, and Latin Hypercube sampling based fast simulation.
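For readers unfamiliar with Latin Hypercube sampling, a minimal generic sketch follows. It is not the simulation flow of [7]; it only shows the stratification idea: each dimension is cut into as many equal-width strata as there are samples, and every stratum is hit exactly once. The function name and parameters are assumptions.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in each dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One point drawn uniformly inside each of n_samples equal-width strata.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)  # decouple the stratum order across dimensions
        columns.append(col)
    return [list(point) for point in zip(*columns)]


points = latin_hypercube(n_samples=10, n_dims=3)
for p in points[:3]:
    print([round(x, 3) for x in p])
```

Each unit-interval coordinate would then be mapped to a process parameter value, for example through the inverse cumulative distribution of that parameter. Compared with plain random sampling, the stratification usually covers the variation space with fewer simulations.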

[9] addresses the deskew problem, in which the clock timing of flip-flops (FFs) is tuned by programmable delay elements (PDEs) inserted into the clock tree. It proposes a novel deskew method that decides the delay values of the elements by measuring the clock timing of a small number of FFs and estimating the clock timing of the remaining FFs from a statistical model. In addition, the proposed method can determine discrete PDE delay values because the rewritten constraints satisfy the condition of total unimodularity.

3. Post silicon tuning of analog circuits

[11] discusses a post-silicon tuning methodology, widely used in analog circuit design, for reducing random mismatch. A novel dynamic programming algorithm is incorporated into a fast Monte Carlo simulation flow for statistical analysis and optimization of the proposed tunable analog circuits (a tunable differential pair and a tunable SC amplifier). The methodology is applied to several commonly used analog circuit blocks. The paper demonstrates that, with post-silicon tuning, device mismatch decreases exponentially as area increases.

4. Miscellaneous (Post silicon tuning for high level synthesis and FPGA)

[12] proposes a module selection algorithm that combines design-time optimization with post silicon tuning (using ABB) to maximize design yield. A variation-aware module selection algorithm based on efficient performance and power yield gradient computation is developed. The post silicon optimization is formulated as an efficient sequential conic program to determine the optimal body bias distribution, which in turn affects design-time module selection.

[13] studies an FPGA architecture with a dual voltage supply wherein the supply voltage for individual CLBs can be assigned after fabrication; this yields a mechanism for fixing chips that fail because of manufactured transistors being slower than designed. The fundamental advance this work makes is that it assigns voltages based on manufactured data rather than designed values. The key contributions of this work are a CAD methodology and a detailed quantitative study using realistic data on the latest process technologies of the impact of post-manufacturing tuning on yield and power for dual-Vdd FPGAs.