Paper Review on Technology Mapping
Commented by Yan
Last update: 05/08/2006

1. FlowMap, "FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs", Jason Cong and Yuzheng Ding, TCAD 1999 (pdf)
The algorithm consists of labeling phase and mapping phase. The cut with optimal depth is generated by formulating the problem as a max flow min cut problem. Area is heuristically minimized by finding the max volume min-cut and further reduced using post-processing techniques.

2. CutMap, "Simultaneous Depth and Area Minimization in LUT-Based FPGA Mapping", J. Cong and Y. Hwang, ISFPGA'95 (pdf)
The algorithm is developed based on FlowMap. In the labeling phase, the optimal depth in LUT-mapped solution is calculated for each node. In the mapping phase, the min-height cut is calculated for nodes on critical path while the min-cost K-feasible is calculated for nodes on non-critical path. A heuristic is presented to speed-up cut enumeration. Compared to FlowMap, number of LUTs (area) is reduced by 15%.

3. Efficient cut generation and pruning, "Cut ranking and pruning: enabling a general and efficient FPGA mapping solution", J. Cong and et al, ISFPGA'99 (pdf)
An efficient cut enumeration method is presented. The method is to recursively generate cuts for a node based on the cuts of its fan-in nodes. This method has been used in various state-of-the-art mappers such as [4][5][6][7].

4. Placement driven mapping, "placement-driven technology mapping for LUT-based FPGAs", J. Lin and et al, ISFPGA'03 (pdf)
Given a placement solution and delay models from placement, another mapping solution with placement is generated. Cut generation is based on the mathod in [3]. Delay is updated using look-up table in placement. The algorithm is in iterative fashion and includes labeling and mapping phases. The placement solution may be invalid and is legalized through local refinement including greedy movement and low temperature simulated annealing method.

5. EMap, power minimization, "On the interaction between power-aware CAD algorithms for FPGAs", S. Wilton and et al, ICCAD'03 (pdf)
The algorithm is developed based on CutMap. Similar to CutMap, the min-height cut is selected for critical nodes but min-cost cut is selected for non-critical nodes. Different from CutMap that only considers area cost, power cost is considered along with area cost as a weighted sum. Nodes with larger activities are absorbed into an LUT. Compared to CutMap, an average of 8% energy saving is achieved. There is virtually no area overhead.

6. DAOMap, "DAOMap, a depth-optimal area optimization mapping algorithm for FPGA designs", D. Chen and J. Cong, ICCAD'04 (pdf)
The mapper is an incremental work based on previous work from the same group (i.e. cutmap[2] and cut enumeration technique [3]). Surprisingly,  one key area estimation cost function is almost the same with that in IMap [7]. Besides of various heuristics presented in this paper, DAOMap and IMap achieves similar area and therefore the area estimation may be dominant in the cost function.

7. IMap, "heuristics for area minimization in LUT-based FPGA technology mapping", V. Manohararajah and et al, IWLS'04, (pdf)
The concept of area flow is presented in a recursive way and is proved to be equal to area in non-duplication mapping. In duplication mapping, area flow is updated iteratively to catch the real area and is used to minimize area. Depth is optimized. Compared to CutMap, area is reduced by 13% on average with LUT size of 4. Cuts are enumerated exhaustively, which work fairly well with LUT size less than 5.

8. Adaptive logic module (ALM), "Improving FPGA performance and area using an adaptive logic module", M. Hutton and et al, FPL'04 (pdf)
This paper more focuses on the structure of ALM instead of mapping algorithm. The combinations of an ALM is shown in the following table.
The mapping algorithm is only briefly introduced in a pretty high level due to the confidential information of Altera. The mapper needs to consider number of ALMs instead of number of LUTs, in other words, the packability of a pair of LUTs into an ALM. Verified with large industry designs, ALM reduces area and delay by 15% and 12%, respectively.