Identifying cryptographic functions in binaries has become an essential requirement because of their widespread use in modern software. This capability underpins numerous software security analyses, such as malware analysis and blockchain forensics. Unfortunately, existing methods still struggle to balance analysis accuracy, efficiency, and code coverage, which hampers their practical application. In this paper, we propose BinCrypto, an emulation-based code similarity analysis method on the interval domain, to identify cryptographic functions in binary files. BinCrypto produces accurate results because it relies on behavior-related code features collected during emulation. At the same time, the emulation is performed in a path-insensitive manner, with all emulated values represented as intervals. As a result, BinCrypto analyzes every basic block only once, which makes identification efficient while also achieving complete block coverage. We conduct experiments on nine real-world cryptographic libraries. The results show that BinCrypto achieves an average accuracy of 83.2%, nearly twice that of WheresCrypto, the state-of-the-art method. BinCrypto also successfully completes tasks including statically linked library analysis, cross-library analysis, obfuscated code analysis, and malware analysis, demonstrating its potential for practical applications.
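The idea that interval-valued, path-insensitive emulation lets each basic block be visited only once can be illustrated with a minimal sketch. It is not BinCrypto's implementation: the names `Interval`, `join_states`, and `emulate` are hypothetical, only addition is modeled, the control-flow graph is assumed to be acyclic, and feature collection and similarity matching are omitted. Where paths merge, the incoming register intervals are joined into their hull instead of being explored path by path.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        # Interval addition: [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def join(self, other: "Interval") -> "Interval":
        # Smallest interval covering both operands (the hull)
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))


def join_states(states):
    """Merge predecessor register states into one fresh state.

    Simplification: a register defined on only one path keeps that value.
    """
    merged = {}
    for state in states:
        for reg, val in state.items():
            merged[reg] = merged[reg].join(val) if reg in merged else val
    return merged


def emulate(blocks, preds, entry_state):
    """Visit every basic block exactly once, in topological order.

    blocks: name -> list of (dst, a, b) meaning dst = a + b, where a and b
            are register names or constant Intervals (hypothetical IR).
    preds:  name -> list of predecessor names (CFG assumed acyclic).
    """
    out, visits = {}, []
    for name in TopologicalSorter(preds).static_order():
        incoming = [out[p] for p in preds.get(name, [])]
        state = join_states(incoming) if incoming else dict(entry_state)
        for dst, a, b in blocks[name]:
            va = state[a] if isinstance(a, str) else a
            vb = state[b] if isinstance(b, str) else b
            state[dst] = va + vb
        out[name] = state
        visits.append(name)
    return out, visits


# Diamond CFG: A branches to B and C, which merge at D.
blocks = {
    "A": [("y", "x", Interval(0, 0))],
    "B": [("y", "y", Interval(10, 10))],
    "C": [("y", "y", Interval(20, 20))],
    "D": [("z", "y", Interval(1, 1))],
}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
out, visits = emulate(blocks, preds, {"x": Interval(1, 1)})
print(out["D"]["y"])  # Interval(lo=11, hi=21)
print(out["D"]["z"])  # Interval(lo=12, hi=22)
```

Both branches are covered, yet D is emulated once with `y` summarized as `[11, 21]` rather than twice with two concrete values. A real analyzer would also need a strategy for loops (back edges), which this sketch does not model.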
Journal article