From IoT devices to datacenters, code size is important, motivating ongoing research in binary reduction. A key technique is the merging of similar functions to reduce code redundancy. Success, however, depends on accurately identifying functions that can be profitably merged. Attempting to merge all function pairs is prohibitively expensive. Current approaches, therefore, employ summaries to estimate similarity. However these summaries often give little information about how well two programs will merge. To make things worse, they rely on exhaustive search across all summaries; impractical for realworld programs.
In this work, we propose a new technique for matching similar functions. We use a hash-based approach that better captures code similarity and, at the same time, significantly reduces the search space by focusing on the most promising candidates. Experimental results show that our similarity metric has a better correlation with merging profitability. This improves the average code size reduction by 6 percentage points, while it reduces the overhead of function merging by 1.8x on average and by as much as 597x for large applications. Faster merging and reduced code size to compile at later stages mean that our approach introduces little to no compile time overhead, while in many cases it makes compilation faster by up to 30%.