Micropackages and Open Source Trust Scaling
Trust and Auditing
This leads me to what my actual issue with micro-dependencies is: we do not have trust solved. Every once in a while people will bring up how we all would be better off if we PGP signed our Python packages. I think what a lot of people miss in the process is that signatures were never a technical problem but a trust and scaling problem.
I want to give you a practical example of what I mean with this. Say you build a program based on the Flask framework. You pull in a total of 4-5 dependencies for Flask alone which are all signed off my me. The attack vector to get untrusted code into Flask is:
- get a backdoor into a pull request and get it merged
- steal my credentials to PyPI and publish a new release with a backdoor
- put a backdoor into one of my dependencies
All of those attack vectors I cover. I use my own software, monitor what releases are PyPI which is also the only place to install my software from. I 2FA all my logins where possible, I use long randomly generated passwords where I cannot etc. None of my libraries use a dependency I do not trust the developer of. In essence if you use Flask you only need to trust me to not be malicious or idiotic. Generally by vetting me as a person (or maybe at a later point an organization that releases my libraries) you can be reasonably sure that what you install is what you expect and not something dangerous. If you develop large scale Python applications you can do this for all your dependencies and you end up with a reasonably short list. More than that. Because Python's import system is very limited you end up with only one version of each library so when you want to go in detail and sign off on releases you only need to do it once.
Back to Sentry's use of npm. It turns out we have four different versions of the same query string library because of different version pinning by different libraries. Fun.
Those dependencies can easily end up being high value targets because of how few people know about them. juliangruber's "isarray" has 15 stars on github and only two people watch the repository. It's downloaded 18 million times a month. Sentry depends on it 20 times. 14 times it's a pin for 0.0.1, once it's a pin for ^1.0.0 and 5 times for ~1.0.0. Any pin for anything other than a strict version match is a disaster waiting to happen if someone would manage to push out a point release for it by stealing juliangruber's credentials.
Now one could argue that the same problem applies if people hack my account and push out a new Flask release. But I can promise you I will notice a release from one of my ~5 libraries because of a) I monitor those packages, b) other people would notice a release. I doubt people would notice a new isarray release. Yet isarray is not sandboxed and runs with the same rights as the rest of the code you have.
For instance sindresorhus maintains 827 npm packages. Most of which are probably one liners. I have no idea how good his opsec is, but my assumption is that it's significantly harder for him to ensure that all of those are actually his releases than it is for me as I only have to look over a handful.