A summary of partial TMR (by Yu Hu, 01/22/2009)
Xilinx XTMR User Manual [pdf]
Partial TMR for combinational circuits
- P. K. Samudrala, J. Ramos, and S. Katkoori, Selective triple modular redundancy (STMR) based single-event upset (SEU) tolerant synthesis for FPGAs, IEEE Trans. Nucl. Sci., vol. 51, no. 6, pp. 2957-2969, Oct. 2004.
- Uses signal probabilities to find the SEU-sensitive sub-circuits of a design, applies TMR to those critical sub-circuits only, and maps the selectively triplicated design to an FPGA.
- However, the selective TMR is performed during technology-independent synthesis, so some of the optimization is lost after technology mapping.
- K. Veezhinathan, S. N. Mahammad, V. Muralidaran, V. Narayanan, and V. Chandrasekhar, Reduced triple modular redundancy for tolerating SEUs in SRAM-based FPGAs, presented at the MAPLD Conf., Sep. 2005. [slides]
- Same methodology as Samudrala et al. (2004), but selective TMR is applied directly to the sensitive LUTs.
- Develops a way to calculate the sensitivity of a LUT based on signal probabilities.
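As a rough illustration of how signal probabilities can rank LUT contents, the Python sketch below assumes independent inputs and treats the sensitivity of one LUT configuration bit as the probability that the inputs address that bit. This definition and the names `address_probability`, `lut_sensitivity`, and `output_probability` are assumptions made for illustration; they are not the formulas of either paper, and masking by downstream logic is ignored.

```python
from itertools import product

def address_probability(address, input_probs):
    """Probability that the LUT inputs select `address` (independent inputs assumed).

    address: tuple of 0/1 values, one per LUT input.
    input_probs: probability that each input is 1.
    """
    p = 1.0
    for bit, p_one in zip(address, input_probs):
        p *= p_one if bit else (1.0 - p_one)
    return p

def lut_sensitivity(input_probs):
    """Map each configuration-bit address to the probability that an upset
    in that bit shows up at the LUT output (hypothetical metric)."""
    k = len(input_probs)
    return {addr: address_probability(addr, input_probs)
            for addr in product((0, 1), repeat=k)}

def output_probability(truth_table, input_probs):
    """Signal probability of the LUT output: sum over addresses that store a 1."""
    return sum(address_probability(addr, input_probs)
               for addr, value in truth_table.items() if value)

# Example: a 2-input AND LUT whose inputs are 1 with probability 0.9 and 0.5.
AND2 = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
sensitivity = lut_sensitivity([0.9, 0.5])
p_out = output_probability(AND2, [0.9, 0.5])
```

Under these assumptions the bits at addresses `(1, 0)` and `(1, 1)` are addressed far more often than the other two, so an upset there is more likely to be observed; `p_out` is the signal probability that would be passed on to the next LUT.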
Partial TMR for sequential circuits
- K. Morgan, M. Caffrey, P. Graham, E. Johnson, B. Pratt, and M. Wirthlin, SEU-induced persistent error propagation in FPGAs, IEEE Trans. Nucl. Sci., vol. 52, no. 6, pp. 2438-2445, Dec. 2005.
- Proposes a metric for the criticality of circuit components (gates/registers) in a sequential circuit by classifying the errors they cause as persistent or non-persistent: a persistent error cannot be corrected until a reset is performed; any other error is non-persistent.
- Evaluates experimentally the impact of applying TMR to the persistent-error-critical components.
- B. Pratt, M. Caffrey, J. F. Carroll, P. Graham, K. Morgan, and M. Wirthlin, Fine-Grain SEU Mitigation for FPGAs Using Partial TMR, IEEE Trans. Nucl. Sci., vol. 55, no. 4, Aug. 2008.
- The paper proposes a structural indicator of persistent-error criticality: a component that lies in a cycle (feedback loop) of the circuit has the highest priority; a component that is an input of a cycle has the second-highest priority; a component that is an output of a cycle has the lowest priority.
- A greedy algorithm is presented that maximizes reliability (based on the priority of the components) under a given resource constraint.
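The structural priority and the greedy selection described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: the circuit is a plain dictionary from each component to its fan-out, "input of a cycle" is read as "has a path into a cycle", components unrelated to any cycle are lumped into the lowest priority, and `cost` is a hypothetical per-component resource cost of triplication. The names `reachable`, `classify`, and `greedy_select` are illustrative.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from `start` via one or more edges."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen

def classify(graph):
    """Assign priority 1 (in a cycle), 2 (feeds a cycle), or 3 (everything else)."""
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    reach = {n: reachable(graph, n) for n in nodes}
    in_cycle = {n for n in nodes if n in reach[n]}  # a node that can reach itself
    priority = {}
    for n in nodes:
        if n in in_cycle:
            priority[n] = 1
        elif reach[n] & in_cycle:
            priority[n] = 2
        else:
            priority[n] = 3
    return priority

def greedy_select(priority, cost, budget):
    """Pick components in priority order while the triplication cost fits the budget."""
    chosen = []
    remaining = budget
    for n in sorted(priority, key=lambda n: (priority[n], cost[n], n)):
        if cost[n] <= remaining:
            chosen.append(n)
            remaining -= cost[n]
    return chosen

# Example: "a" and "b" form a feedback loop, "in" feeds it, "out" is driven by it.
circuit = {"in": ["a"], "a": ["b"], "b": ["a", "out"], "out": []}
prio = classify(circuit)
selected = greedy_select(prio, {"in": 1, "a": 2, "b": 2, "out": 1}, budget=5)
```

With a budget of 5 the sketch triplicates the two loop components first and then the loop input, leaving the loop output unprotected. The all-pairs reachability used here is quadratic in circuit size; a strongly-connected-components pass would be the natural replacement for larger netlists.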
TMR and Beyond
- K. S. Morgan, D. L. McMurtrey, B. H. Pratt, and M. J. Wirthlin, A Comparison of TMR With Alternative Fault-Tolerant Design Techniques for FPGAs, IEEE Trans. Nucl. Sci., vol. 54, no. 6, Dec. 2007.
- Besides TMR, this paper evaluates three additional mitigation techniques (quadded logic, state machine encoding, and temporal redundancy) in terms of both area cost and fault tolerance. The results suggest that none of these techniques provides greater reliability than TMR, and that they often require more resources.
Voter placement
- F. Lima Kastensmidt, L. Sterpone, L. Carro, and M. Sonza Reorda, On the Optimal Design of Triple Modular Redundancy Logic for SRAM-based FPGAs, DATE'05.
- To tolerate faults in the interconnect, four different TMR schemes are studied on a FIR filter. The schemes differ in where voters are inserted, including after every triplicated gate (tmr_max), after each triplicated block of gates (tmr_med), and only at the primary outputs (tmr_min). Experimental results show that tmr_med achieves the smallest overhead (area/performance) and the best reliability.
- However, no quantitative or algorithmic solution is presented for optimal voter insertion.
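To make the effect of voter placement concrete, the Python sketch below simulates three redundant copies of a 1-bit pipeline with injected bit flips. It is a toy model under stated assumptions, not the paper's experiment: `run_tmr`, `voter_after`, and `faults` are hypothetical names, voters are assumed fault-free, and voter area and interconnect faults are not modeled, so the sketch says nothing about why tmr_med performed best in the reported results.

```python
def majority(a, b, c):
    """Bitwise majority voter: each output bit follows at least two of the three inputs."""
    return (a & b) | (b & c) | (a & c)

def run_tmr(stages, x, voter_after, faults=frozenset()):
    """Simulate three redundant copies (domains) of a 1-bit pipeline.

    stages: list of functions mapping a bit to a bit.
    voter_after: set of stage indices that are followed by a voter.
    faults: set of (stage_index, domain) pairs whose output bit is flipped.
    A final voter at the primary output is always applied.
    """
    values = [x, x, x]
    for i, stage in enumerate(stages):
        values = [stage(v) for v in values]
        # Inject single-bit upsets at the requested stage outputs.
        values = [v ^ 1 if (i, d) in faults else v for d, v in enumerate(values)]
        if i in voter_after:
            voted = majority(*values)
            values = [voted, voted, voted]
    return majority(*values)

def invert(bit):
    """A 1-bit inverter used as the example logic stage."""
    return bit ^ 1

stages = [invert, invert, invert]
# Two upsets in different domains at different stages.
faults = {(0, 0), (1, 1)}
golden = run_tmr(stages, 0, voter_after=set())
out_tmr_min = run_tmr(stages, 0, voter_after=set(), faults=faults)
out_tmr_max = run_tmr(stages, 0, voter_after={0, 1, 2}, faults=faults)
```

In this example voting only at the primary output lets the two upsets corrupt two domains, so `out_tmr_min` differs from `golden`, while voting after every stage corrects each upset before the next one occurs and `out_tmr_max` matches `golden`.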