

Linear Regression and Random Forests for Layer Identification in IC Layout Reverse-Engineering

Anonymous Author(s)
Affiliation
Address
email

Abstract

Due to the tedious nature of the work, IC reverse-engineering is a process that is advantageous to automate. Linear prediction and random forests for layer identification are evaluated in this paper, and both approaches are shown to have good performance for this task, with random forests of relatively few (16) trees being able to achieve lower error rates (7-12%) than linear prediction (10-20%).

1 Introduction and motivation

The reverse-engineering of integrated circuits is a process carried out widely by semiconductor manufacturers and researchers, who wish to analyse the layout of an IC and eventually derive the schematic of the circuit it implements. This process begins with depackaging the chip to expose the silicon die, which is then photographed under microscopy. While these steps are not trivial, they can be easily automated and represent a small fraction of the total effort spent in the process; the bulk of the work consists of tracing from the die images the regions which form components such as transistors, and their interconnects. This tracing is tedious work, and any methods which facilitate it are beneficial in increasing the efficiency of the process.

1.1 Basic structure of integrated circuits

Integrated circuits are fabricated starting from a wafer of silicon, upon which various layers are deposited using a photolithographic process, to create the fundamental structures of electronic components such as transistors, resistors, and capacitors. The common types of layers are the bulk (substrate, i.e. the base silicon of the wafer), N-type and P-type diffusion (which occur together on one layer, the diffusion layer, and cannot overlap), polysilicon, and metal. For example, one type of transistor, an N-channel MOSFET, consists of two regions of N-diffusion on one layer forming the source and drain terminals, and another region of polysilicon located between these, forming the gate terminal. The diffusion layer is always the bottommost (excluding the bulk, which can be treated as a "background" for the purposes of this paper), followed by one or more layers of polysilicon, and finally one or more layers of metal. Layers are connected to each other by vertical structures called vias. This basic structure viewed from the top is illustrated in figure 1.

Figure 1: Basic IC structure


The implication of this structure for reverse-engineering is that from an image of the die, it is possible to discern the various layers; while a newer chip may have over a dozen metal (topmost) layers, the very topmost metal layer will be immediately apparent since it is above all the others, and once it has been traced, either the traced layer can be subtracted from the image to reveal the layer below, or the die can be processed physically and chemically to remove the layer and re-imaged. By repeating these steps, the layout of all the layers (and the vias connecting them) can be determined, and this information used to derive the location of components (e.g. transistors) and their interconnections, which provides the information needed to recover the original design.

1.2 Approach and related work

There appears to be little prior research in using machine learning approaches for IC reverse-engineering, with the majority of work focusing on image processing techniques such as pattern recognition[1]. Degate[2] is a tool that works on this principle, and attempts to recognise common logic blocks by approximate matching against a library of known layout patterns. While pattern recognition is useful, the need to match features which are essentially a combination of all the layers limits its flexibility to only those designs which exhibit this regularity. In contrast, a layer-by-layer approach as taken in this paper allows for full generality to any layout.

Figure 2: Training image and label 1 — Nintendo 3193A[3]

In this paper, the focus is on identifying the first (metal) layer, which visually appears as the brightest and uppermost. Figures 2 and 3 show examples of the die images used, along with their labeling images that denote the areas of metal; they are from the Visual6502 Project[4], which also hosts a large number of (unlabeled) die images for various ICs. For this paper, the Nintendo 3193A and RCA 1802E are the two ICs chosen for applying ML to recover the metal layer. As can be seen from these images, the appearance of the metal layer can vary considerably between images, likely necessitating re-training on each chip, but the advantage gained is from being able to use the results of training on a relatively small portion of the image to apply recognition to the rest of it and thus speed up the overall process.

Figure 3: Training image and label 2 — RCA 1802E[5]

2 Linear prediction

2.1 Overview

The first method attempted is linear prediction. For each pixel in the source image (less a 4-pixel border), itself and the surrounding 80 pixels are used as the vector of input features, with the corresponding pixel in the labeling image (0 or 255) as the expected output. Thus, each feature vector corresponds to a 9 × 9-pixel block centred at the pixel to be predicted. This size was chosen as a compromise to keep the computation reasonable (since the number of data points is quadratic with respect to the block width) and hopefully not lose too much accuracy; 9 pixels is slightly wider than the thinnest metal traces in the images above. Two types of features were tried: one uses the 3 RGB (0-255) values of each pixel, while the second uses converted HSB colour space values, both resulting in 244 (243 prediction + 1 expected output) data points per pixel. This feature arrangement precludes the prediction of a 4-pixel border at the edge of the image without extra preprocessing, but it was decided not to implement this preprocessing for two reasons: the values of pixels beyond the edge are not known, and using approximations for them may reduce the accuracy of the prediction; and the whole-chip images this process is aimed at are sufficiently large (in the region of 5000 × 5000 or larger) that a 4-pixel border is of little concern. The calculation of errors and other values below excludes this border.
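As a concrete illustration, the per-pixel feature extraction described above can be sketched as follows. This is a minimal sketch, not the paper's actual program; the function name and the row-major 8-bit RGB image layout are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define BLK    9                 /* 9 x 9 neighbourhood */
#define BORDER (BLK / 2)         /* 4-pixel border excluded from prediction */
#define NFEAT  (BLK * BLK * 3)   /* 243 input features per pixel */

/* Fill feat[0..242] with the RGB values of the 9 x 9 block centred on
   (x, y).  img is a row-major 8-bit RGB image of width w; the caller must
   keep (x, y) inside the 4-pixel border. */
static void extract_features(const uint8_t *img, size_t w,
                             size_t x, size_t y, double feat[NFEAT])
{
    size_t k = 0;
    for (size_t dy = 0; dy < BLK; dy++)
        for (size_t dx = 0; dx < BLK; dx++) {
            const uint8_t *px =
                img + 3 * ((y - BORDER + dy) * w + (x - BORDER + dx));
            feat[k++] = px[0];   /* R */
            feat[k++] = px[1];   /* G */
            feat[k++] = px[2];   /* B */
        }
}
```

One row of the n-by-244 training matrix is then this vector plus the corresponding label pixel.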

2.2 Details

Implementation of this method was done in C, and split into three programs for processing convenience. The first takes as input the training and labeling images, and outputs the n-by-244 matrix of training data. The second reads this matrix and uses the Intel MKL[6] to compute the least-squares regression estimate, while the third uses the output of the second and an input image to produce an output image with each pixel (less the border) corresponding to a linear combination of it and the 80 surrounding pixels, in accordance with the regression coefficients. Values below 0 or greater than 255 were clipped into this range. RGB and HSB colour space versions of the first and third programs were written, and they were trained on the images of figures 2 and 3. The former image is 733 × 353, resulting in a 250125 × 244 matrix, while the latter is 400 × 400 and results in a 153664 × 244 matrix.
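The third program's per-pixel step, the linear combination followed by clipping, amounts to the sketch below; the coefficient layout (243 weights, no intercept, matching the 243-feature combination described in the text) is an assumption.

```c
#include <stdint.h>

/* Predict one output pixel as a linear combination of the 243 block
   features, clipped to the 0-255 range of the labeling image. */
static uint8_t predict_linear(const double feat[243], const double coef[243])
{
    double y = 0.0;
    for (int i = 0; i < 243; i++)
        y += coef[i] * feat[i];
    if (y < 0.0)   y = 0.0;     /* clip below */
    if (y > 255.0) y = 255.0;   /* clip above */
    return (uint8_t)y;
}
```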

On an Intel Core i7 860 @ 4GHz, computation of the regression coefficients required 2-3 minutes of processor time, and prediction using the test images shown also took approximately the same amount of time. 470MB and 290MB respectively were consumed for the computation of these coefficients on figures 2 and 3.

Figure 4: Test image and label 1 — 3193A

After training, prediction was done using both the training image and the test images shown in figures 4 and 5 (using the results of the training image from the same chip). Using the accompanying labeling image for the test images, the error can then be calculated as the percentage of pixels that differ between the label and the prediction result; pixels which are not 0 or 255 can be thresholded, and although it may seem reasonable to assume the 50% point (0-127 vs 128-255) is optimal, a series of comparisons was made to determine the effect of the threshold on the overall error rate and provide evidence in support of that choice.
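The threshold comparison can be done in a single pass by histogramming predicted values against labels, then sweeping all 256 thresholds over the histogram. This is a hypothetical helper illustrating the technique, not the paper's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* For every threshold t in 0..255, compute the fraction of pixels whose
   thresholded prediction (metal iff pred >= t) disagrees with the 0/255
   label image.  One histogram pass makes the sweep O(n + 256*256). */
static void error_vs_threshold(const uint8_t *pred, const uint8_t *label,
                               size_t n, double err[256])
{
    size_t hist[256][2];            /* [predicted value][label is metal] */
    memset(hist, 0, sizeof hist);
    for (size_t i = 0; i < n; i++)
        hist[pred[i]][label[i] ? 1 : 0]++;

    for (int t = 0; t < 256; t++) {
        size_t wrong = 0;
        for (int v = 0; v < 256; v++) {
            int metal = (v >= t);
            wrong += hist[v][metal ? 0 : 1];   /* count disagreements */
        }
        err[t] = (double)wrong / (double)n;
    }
}
```

The index of the minimum entry of err is then the minimum-error threshold for that image.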

Figure 5: Test image and label 2 — 1802E


2.3 Results

Figure 6: Linear prediction on figure 2 result and 50% threshold

The result of training with RGB produces the image shown in figure 6. At a 50% threshold, most of the traces of the metal layer are reproduced adequately, but the error is still non-negligible, causing breaks in some of the traces (false negatives) and the creation of traces where there aren’t any (false positives). Examples of this occur at the lower left and right, where slightly brighter green areas of P-diffusion have been identified as metal by the linear prediction.

Figure 7: Linear prediction on figure 4 result and 50% threshold

Figure 7 shows the output of training on RGB with figure 2 and testing with figure 4; it can be seen that while some regions such as the lower third appear to be predicted with good accuracy, the upper two thirds of the image, where there is more detail in the lower layers, exhibit both false positives (metal detected where there isn’t any) and false negatives (no metal detected where there should be).

Figure 8: Linear prediction on figure 2 result, HSB colour space

Figure 9: Linear prediction on figure 4 result, HSB colour space

The results of using HSB colour space in figures 8 and 9 appear worse than for RGB, with much larger areas of false positives. Although widely used in image processing tasks, HSB performs worse than RGB in this application because the colours being discriminated seem to be more aligned with RGB: polysilicon appears reddish, P-diffusion is green, while N-diffusion is bluish-purple, and the desired metal layer is also slightly greenish, explaining the tendency for the P-diffusion to be confused with the metal, causing the false positives.


Figure 10: Linear prediction on figure 3 result and 50% threshold

Figure 11: Linear prediction on figure 5 result and 50% threshold

Compared to the 3193A images above, the 1802E RGB train and test results in figures 10 and 11 show subjectively worse results, with many false positives arising from the brownish and bluish polysilicon layer. False negatives are also present and, from comparison with the original image, appear mainly at the edges where the metal crosses over poly. One possible explanation is the wider traces in the image (8 pixels vs 6 for the 3193A) and its higher contrast and colour saturation. Although this gives the perception of a sharper image to human vision, the many dark edges it contains disturb linear prediction, since pixels labeled as non-metal are also surrounded by these dark edge pixels.

Figure 12: Linear prediction on figure 3 result, HSB colour space

Figure 13: Linear prediction on figure 5 result, HSB colour space

However, the results for 1802E HSB linear prediction show the opposite trend, with many of the brown polysilicon false positives having disappeared (note how the vertical false-positive strip and square structures on the left half in figure 11 are not present in figure 13). One reason the HSB colour space gives better performance here compared to RGB can be hypothesised from the nature of the data: for the 3193A images, with their relatively low contrast, discrimination is based more heavily on the amount of green, whereas for the 1802E images the metal layer is nearly white, and thus brightness becomes the main dimension of categorisation, matching well with the 3rd component of HSB.


Figure 14: Error vs prediction threshold

The effect of the threshold on the error rate is shown in figure 14. At low thresholds, the number of false positives is high, while the opposite occurs with false negatives when the threshold is too high. Thus the optimal threshold can be expected to lie between these two extremes, which corresponds with the shape of the error curves in figure 14, but the optimum is not exactly in the middle of the range (50%, or 128). Moreover, the threshold of minimum error differs for each specific image: e.g. the 1802E HSB test image gives a minimum error at 86, while the 1802E HSB train image has its minimum at a threshold of 140. With the exception of the 1802E HSB test image at 2%, the difference between the minimum error and the error at a 50% threshold is below 1%, meaning that a fixed threshold of 50% is a reasonable compromise. Overall, the errors at a 50% threshold range from 10-20%, meaning that while linear prediction certainly has its benefits in getting the majority of the layer pattern identified, the output still requires manual inspection and correction of the errors.

3 Random forests

3.1 Overview

The second method under consideration in this paper is random forests. For this method, the same 9 × 9-pixel (243-dimension) feature set was used as with linear prediction, but due to the time needed to generate the forests, it was decided to use only RGB input data. At each node of the tree, n random dimensions of the 243 are chosen, and the threshold among all of them with the highest information gain is used to decide the split. For prediction, the average across all the trees is taken as the output pixel value.
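The prediction step above, averaging the outputs of all trees, can be sketched as follows; the node layout and names are assumptions, not the paper's actual data structures.

```c
#include <stddef.h>

/* A decision-tree node: internal nodes split on one of the 243 feature
   dimensions; leaves store the mean label value of their training points. */
struct node {
    int dim;                  /* feature index, or -1 for a leaf */
    double thresh;            /* go left if feat[dim] < thresh */
    double leaf_value;        /* mean 0-255 label value at a leaf */
    struct node *left, *right;
};

/* Walk one tree down to a leaf. */
static double tree_predict(const struct node *t, const double feat[243])
{
    while (t->dim >= 0)
        t = (feat[t->dim] < t->thresh) ? t->left : t->right;
    return t->leaf_value;
}

/* Forest output is the average of the individual tree outputs. */
static double forest_predict(const struct node *const trees[], int ntrees,
                             const double feat[243])
{
    double sum = 0.0;
    for (int i = 0; i < ntrees; i++)
        sum += tree_predict(trees[i], feat);
    return sum / ntrees;
}
```

The averaged value is then thresholded in the same way as the linear prediction output.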

3.2 Details

Two additional programs were written in C, one to generate a random forest and the other to predict. For generation, an in-place sort-based algorithm was used which does not require copying data at each split, in order to reduce memory usage and increase performance. An incremental updating approach was adopted for the information-gain calculation when scanning through a dimension, decreasing the computation required. Nevertheless, random forests proved to require much more time to generate than the linear prediction coefficients; all of the latter needed less than 5 minutes on an Intel Core i7 860 @ 4GHz, while the same machine required nearly an hour to generate each of the forests with the parameters mentioned below. For the two sets of chip images used, 3193A and 1802E, forests were generated varying t, the number of trees, and n, the number of random dimensions considered at each split. (t, n) pairs were chosen to keep the overall workload relatively constant; they were (1, 243), (2, 120), (4, 60), (8, 30), and (16, 16). Nodes were made leaves if they contained fewer than 256 points or their depth reached 256.


3.3 Results

Figure 15: Error vs prediction threshold – random forests

The quantitative error rates for 16 trees and 16 random dimensions are shown in figure 15 (results for fewer trees and more dimensions are omitted for brevity, but their errors were higher). For random forests, the choice of threshold appears to have a greater effect, with lower thresholds preferred. The cause of this has not been entirely determined, but it may be an artifact of how the random forest’s output is an average of the individual trees, where the decision of any one tree carries a high weight on the overall decision. These results show random forests able to achieve higher accuracy than linear prediction, and thus 2-fold cross-validation was employed to further verify the accuracy of the model, by training on the test data and testing on the training data. The results of this are shown in figure 16.

Figure 16: Error vs prediction threshold – random forests cross validation

Keeping in mind that the CV results are obtained when the forests are generated from the test data and then tested on the training data, the errors appear to be within the same range as before, although the minimum-error thresholds have changed. The errors at a 50% threshold for testing (1802train cv and 3193train cv, since they were trained with the test image) remain at 12% and 8% respectively, comparable to the 12% and 9% obtained in figure 15. From these results, it can be stated that this application of random forests generalises well to similar datasets.

Figures 17 and 18 show examples of predictions made on figures 4 and 5 by training on figures 2 and 3 using 16 trees with 16 random dimensions at each split, and their decisions using thresholds of 112 and 16, respectively. These correspond to errors of 10% and 7%. In particular, it can be seen that the false positives are much reduced compared to linear prediction, and most of the errors that are present are small discontinuities that could easily be removed using an inpainting algorithm[7].

Figure 17: Random forest result on figure 4, and threshold 112

Figure 18: Random forest result on figure 5, and threshold 16

4 Conclusion

Random forests provide superior accuracy to linear prediction for the identification of layers in IC reverse-engineering. Even with a fixed threshold of 50%, linear prediction achieves an 11-20% error rate, while random forests can achieve 9-12%. These error rates mean that the tedious task of manually tracing the layers can be reduced to only verification and small corrections of errors, and the nature of the errors with random forests also lends them to easy correction via inpainting. An interactive system can be envisioned for threshold finding, presenting the user with the decision made at a particular threshold, and allowing her to adjust it while comparing with the die image until the results are as desired. Further processing can also exploit the regular polygonal nature of the IC layout, using e.g. the adherence to a grid pattern to decide that metal is present in a grid cell if a certain fraction of its pixels were decided by the random forest to be metal. In conclusion, the results achieved show a strong benefit to applying random forests to this task.
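The grid-based post-processing suggested above could be sketched as follows; this is hypothetical (not one of the paper's programs), with the cell size and metal fraction left as free parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Snap a per-pixel 0/255 decision image to a layout grid: mark a whole
   grid cell as metal when the fraction of metal pixels inside it reaches
   the given ratio, otherwise clear the cell. */
static void snap_to_grid(const uint8_t *decision, size_t w, size_t h,
                         size_t cell, double ratio, uint8_t *out)
{
    for (size_t gy = 0; gy < h; gy += cell)
        for (size_t gx = 0; gx < w; gx += cell) {
            size_t metal = 0, total = 0;
            for (size_t y = gy; y < gy + cell && y < h; y++)
                for (size_t x = gx; x < gx + cell && x < w; x++) {
                    metal += decision[y * w + x] ? 1 : 0;
                    total++;
                }
            uint8_t v = (metal >= (size_t)(ratio * total)) ? 255 : 0;
            for (size_t y = gy; y < gy + cell && y < h; y++)
                for (size_t x = gx; x < gx + cell && x < w; x++)
                    out[y * w + x] = v;
        }
}
```

The cell size would be chosen to match the layout's trace pitch, which would also remove the small discontinuities noted earlier.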

References

[1] G. Masalskis, R. Navickas; Reverse Engineering of CMOS Integrated Circuits. In Electronics and Electrical Engineering, 2008, vol. 88, pp. 25-28.

[2] M. Schobert; Reverse engineering integrated circuits with degate. http://www.degate.org/

[3] Visual6502 team; Nintendo 3193A. http://visual6502.org/images/pages/Nintendo_3193A.html

[4] Visual6502 team; Visual6502 site. http://visual6502.org/

[5] Visual6502 team; RCA 1802. http://visual6502.org/images/pages/RCA_1802_die_shots.html

[6] Intel Corporation; Intel(R) Math Kernel Library. http://software.intel.com/en-us/intel-mkl

[7] A. Telea; An image inpainting technique based on the fast marching method. In Journal of Graphics Tools, 2004, vol. 9, no. 1, pp. 25-36.
