APT CLASS

Attributing a piece of malware to its creator typically requires threat intelligence to attain a sufficient confidence level. Binary attribution increases the level of difficulty as it mostly relies upon the ability to disassemble binaries to gather relevant features and build a fingerprint to identify the author.

To date, most research has focused on source code authorship attribution and on applying similar techniques to benign and malicious binaries. However, this approach gives malicious authors an opportunity to attack authorship attribution models, because of the stark differences between source code and binaries and between benign and malicious authors.

Our survey (joint work with S3Lab) explores the style of threat actors and the adversarial techniques they use to remain anonymous. We examine the adversarial impact on state-of-the-art methods for binary authorship attribution. Through this approach, we identify key findings and explore the open research challenges in identifying authorship style within malicious binaries.

One major challenge is the lack of a ground-truth dataset of malware and authors. To mitigate this issue for the community, we publish alongside this survey a meta-information dataset of 17,513 malware samples labeled with 275 threat actor groups. This is the largest and most diverse dataset to date. Additionally, we identify a further 5,630 malicious samples currently linked to unknown groups.

Access

To request access to the dataset, please complete the following form:

We have already granted access to people from the following institutions (in alphabetical order):
  1. Amadeus IT Group, Spain
  2. Beijing University of Posts and Telecommunications, China
  3. Ben Gurion University, Israel
  4. Bern University of Applied Sciences, Switzerland
  5. BlackTruffle Security
  6. Cybergeeks[.]tech
  7. Delhi Technological University, India
  8. DSO National Laboratories
  9. FortiGuard Labs
  10. Fraunhofer FKIE, Germany
  11. Georgia Tech Research Institute, USA
  12. Global Infotek, Inc, USA
  13. Grammatech, USA
  14. Hacettepe University
  15. Harfanglab, France
  16. Hasso-Plattner-Institut
  17. HRL Laboratories, USA
  18. Illinois Institute of Technology
  19. IMDEA Software Institute
  20. Institute of Information Engineering
  21. International Business Machines (IBM), USA
  22. Indian Institute of Technology Kanpur, India
  23. Information Sciences Institute, University of Southern California, USA
  24. InQuest
  25. Jawaharlal Nehru University
  26. Jinan University, China
  27. Kennesaw State University, USA
  28. Kudu Dynamics, USA
  29. Lancaster University, UK
  30. Mahidol University
  31. Nanjing University of Posts and Telecommunications
  32. Nanyang Technological University - NTU Singapore
  33. National University of Sciences and Technology, Islamabad, Pakistan
  34. National University of Singapore
  35. NATO
  36. Naval Research Laboratory, USA
  37. Norwich University
  38. OpenAnalysis Inc
  39. Osaka Electro-Communication University, Japan
  40. Purdue University
  41. PolySwarm - Malware Intelligence
  42. Recorded Future, USA
  43. Rice University, USA
  44. Ritsumeikan University
  45. Royal Holloway, University of London, UK
  46. Ruhr-Universität Bochum, Germany
  47. Sabancı University, Turkey
  48. Shahid Beheshti University, Iran
  49. SRI International
  50. The MITRE Corporation
  51. TU Wien, Austria
  52. UC Berkeley, USA
  53. University Institute of Information Technology, PMAS, Pakistan
  54. University of Chinese Academy of Sciences, China
  55. University of Colorado
  56. University of Illinois, USA
  57. University of Kent, UK
  58. University of New Brunswick, Canada
  59. University of Saskatchewan, Canada
  60. University of Southern California
  61. Universidad Técnica Particular de Loja
  62. Unknown Cyber Inc
  63. Westphalian University, Germany
  64. Wrexham Glyndwr University
  65. Wright State University
  66. Wuhan University, China
  67. Zeropoint Dynamics, USA
  68. Zetier

Papers

Identifying Authorship in Malicious Binaries: Features, Challenges & Datasets
Jason Gray, Daniele Sgandurra, Lorenzo Cavallaro, Jorge Blasco Alis
CSUR 2024 · ACM Computing Surveys, 2024
@article{Grayetal2024,
author = {Gray, Jason and Sgandurra, Daniele and Cavallaro, Lorenzo and Blasco Alis, Jorge},
title = {Identifying Authorship in Malicious Binaries: Features, Challenges \& Datasets},
journal = {ACM Comput. Surv.},
issue_date = {August 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {56},
number = {8},
month = {apr},
year = {2024},
articleno = {212},
numpages = {36},
url = {https://doi.org/10.1145/3653973},
doi = {10.1145/3653973},
issn = {0360-0300},
}

People