Managing Software Dependency at Scale

阿新 • Published: 2018-12-29

Introduction

At LinkedIn, we have more than 10,000 separate software codebases, referred to as multiproducts, which represent individual software products developed at LinkedIn. Each multiproduct is made up of various modules, which may have hundreds of transitive dependencies or more. Multiproducts can also be assembled into more complex multiproducts.

This leads to a complex graph of dependencies. Dependency resolution across these graphs is probably the most complex part of the build pipeline that LinkedIn engineers deal with on a daily basis. Improving the dependency resolution and management experience for our engineers is very important to us: it needs to be intuitive, consistent, reliable, and fast. In this article, we present our recent efforts to build a dependency management service at LinkedIn that meets these goals.

First, we need to manage these dependencies at the level of binary artifacts rather than the most recent version of the source code, which allows the sources of each multiproduct to live in their own Source Code Management (SCM) repository. Build systems like Gradle and Maven manage these binary dependencies automatically at build time, which makes it possible to build large-scale projects with complex dependencies. Most build systems simply manage or capture dependencies at the module level; at LinkedIn, however, we also need to manage our dependencies at the multiproduct level, which is our unit of abstraction.
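To make the module versus multiproduct distinction concrete, here is a minimal Python sketch of one way to model it, assuming a simple in-memory graph. The `Module` and `ModuleGraph` names are hypothetical and are not LinkedIn's actual schema. Dependencies are recorded between modules, where build tools create them, and the multiproduct-level view is derived by rolling those edges up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Module:
    product: str  # the multiproduct that owns this module
    name: str     # module name within that multiproduct


@dataclass
class ModuleGraph:
    # Direct module-level edges: module -> modules it depends on.
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_dependency(self, src: Module, dst: Module) -> None:
        self.edges[src].add(dst)

    def product_dependencies(self) -> dict:
        """Derive multiproduct-level edges by rolling up module-level edges."""
        products = defaultdict(set)
        for src, targets in self.edges.items():
            for dst in targets:
                if dst.product != src.product:  # skip edges inside one product
                    products[src.product].add(dst.product)
        return dict(products)


graph = ModuleGraph()
graph.add_dependency(Module("feed", "feed-api"), Module("identity", "identity-client"))
graph.add_dependency(Module("feed", "feed-impl"), Module("feed", "feed-api"))
print(graph.product_dependencies())  # {'feed': {'identity'}}
```

The roll-up discards which module introduced each product-level edge, which is the kind of detail a purely product-level record cannot recover.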

Given this graph of binary dependencies across our products, when we make changes to a product, we need to ensure that we are not breaking any other products that depend on it. We achieve that through what we call dependency testing.
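Dependency testing needs the reverse view of the graph: not what a product depends on, but who depends on it. A minimal sketch, assuming the product-level graph is available as a plain dictionary (the function name and data shape are illustrative assumptions), is a breadth-first walk over inverted edges that collects every direct and transitive consumer of the changed product.

```python
from collections import deque


def consumers_to_test(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Every product that depends on `changed`, directly or transitively."""
    # Invert the edges: dependency -> products that declare it.
    reverse: dict[str, set[str]] = {}
    for product, targets in deps.items():
        for target in targets:
            reverse.setdefault(target, set()).add(product)

    # Breadth-first walk over the inverted edges.
    found: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in reverse.get(current, set()):
            if consumer not in found:
                found.add(consumer)
                queue.append(consumer)
    found.discard(changed)  # only relevant if the graph contains a cycle
    return found


deps = {"feed": {"identity"}, "search": {"feed"}, "ads": {"identity"}, "jobs": set()}
print(sorted(consumers_to_test(deps, "identity")))  # ['ads', 'feed', 'search']
```

Here a change to `identity` selects `feed` and `ads` directly and `search` transitively, while `jobs` is skipped. The same reverse walk also estimates the impact of removing an end-of-life artifact.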

Once we build and successfully test a product and publish its constituent binary artifacts, we then need to understand the dependencies of these published artifacts across all of the products. We need to answer questions that Gradle and Maven cannot answer, like the following:

What is the impact of removing an end-of-life (EOL) artifact from Artifactory?
What dependency tests should we run after a code change is made?
What version of a given module ended up in a deployed war or pex file?
Is anyone using a version of a library that has a critical bug?
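The last two questions become simple lookups once each deployable has a fully resolved graph, meaning exactly one version per module. The sketch below assumes such graphs are stored as a mapping from deployable name to `{module: resolved_version}`; the deployable names, module coordinates and function names are hypothetical.

```python
from typing import Optional

# Hypothetical data: deployable -> {module coordinate: the one resolved version}
RESOLVED = {
    "feed-war": {"com.example:json": "2.1.0", "com.example:http": "4.2.1"},
    "search-pex": {"com.example:json": "2.3.0"},
    "ads-war": {"com.example:http": "4.2.1"},
}


def resolved_version(resolved: dict[str, dict[str, str]],
                     deployable: str, module: str) -> Optional[str]:
    """Which version of `module` ended up in `deployable` (None if absent)."""
    return resolved.get(deployable, {}).get(module)


def affected_deployables(resolved: dict[str, dict[str, str]],
                         module: str, bad_versions: set[str]) -> list[str]:
    """Deployables whose resolved graph pins `module` to a known-bad version."""
    return sorted(name for name, graph in resolved.items()
                  if graph.get(module) in bad_versions)


print(resolved_version(RESOLVED, "search-pex", "com.example:json"))  # 2.3.0
print(affected_deployables(RESOLVED, "com.example:json", {"2.1.0", "2.2.0"}))  # ['feed-war']
```

These lookups are only trustworthy if the graph is resolved: if a graph records several candidate versions for one module, the same query can neither confirm nor rule out exposure.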

The legacy solution

Our previous dependency management solution, known as LinkedIn Dependency Service, was built on an off-the-shelf graph database. It managed dependencies at the product level and was initially adequate. However, as the organization and the complexity of its products grew, we quickly discovered that this solution had several major limitations.

Firstly, managing dependency information at the coarse granularity of the product level, with inadequate data at the module level where dependencies are actually created, caused many problems. Build platforms like Gradle and Maven naturally operate at the module level when resolving version conflicts between dependencies. Therefore, managing dependency data at the product level alongside these tools can lead to an impedance mismatch unless proper steps are taken.

Secondly, in the legacy solution, version conflicts, which build tools resolve at the module level, were not resolved in the recorded product-level dependencies. As a result, multiple potential dependency paths co-existed in the data with no indication of which exact version of a dependency was actually used in a specific situation.
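The sketch below illustrates what resolving those conflicts means, assuming Gradle's default strategy, in which the highest requested version of a module wins (Maven instead picks the nearest declaration in the dependency tree). The substitution-rule format and the purely numeric version parsing are simplifying assumptions, not how Gradle or LinkedIn actually encode them.

```python
from typing import Optional

Coordinate = tuple[str, str]  # (module, version)


def parse_version(version: str) -> tuple[int, ...]:
    # Assumption: purely numeric dotted versions such as "1.10.0".
    return tuple(int(part) for part in version.split("."))


def resolve(requests: list[Coordinate],
            substitutions: Optional[dict[Coordinate, Coordinate]] = None) -> dict[str, str]:
    """Collapse all requested (module, version) pairs to one version per module."""
    substitutions = substitutions or {}
    candidates: dict[str, list[str]] = {}
    for module, version in requests:
        # Apply substitution rules first (hypothetical rule format).
        module, version = substitutions.get((module, version), (module, version))
        candidates.setdefault(module, []).append(version)
    # Conflict resolution modeled on Gradle's default: the highest version wins.
    return {m: max(vs, key=parse_version) for m, vs in candidates.items()}


requests = [
    ("com.example:http", "4.0.0"), ("com.example:http", "4.2.1"),
    ("com.example:json", "1.9.0"), ("com.example:json", "1.10.0"),
]
print(resolve(requests))
# {'com.example:http': '4.2.1', 'com.example:json': '1.10.0'}
```

Without the final `max` step, a graph would carry both `4.0.0` and `4.2.1` of the same module, which is the ambiguity described above. Comparing parsed tuples rather than raw strings matters: as strings, `"1.9.0"` sorts after `"1.10.0"`.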

The module-level information that was recorded served mainly Java, because that was the predominant development ecosystem at LinkedIn. Over time, that changed, and the legacy system was poorly equipped to cope with LinkedIn's evolving language landscape. We need dependency graphs to be generated using the resolution strategy best suited to each development platform.

Our preferred build platform, Gradle, works best with Java. For CocoaPods in iOS development, or for JS libraries managed by Yarn or npm, we need to handle the dependency graphs natively produced by their own resolvers. Furthermore, the dependency graphs in the legacy solution could not distinguish between different classes of dependencies, such as build, test, deployment, and runtime.

As a consequence of these limitations, the tools and other services relying on the legacy service were often overly conservative or simply incorrect, which frustrated developers because of:

False circular-dependency errors
False warnings that end-of-life libraries were in use
An inability to distinguish different classes of dependencies and libraries; for example, test frameworks would be needlessly force-upgraded by the resolution strategy, leading to test failures!
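The post does not spell out where the false cycles came from, but one plausible mechanism follows from the granularity problem above: two products can depend on each other through different modules without any module-level cycle. The sketch below (module names hypothetical, using a `product:module` naming convention assumed for brevity) runs the same three-color depth-first cycle check on a module graph and on its product-level roll-up.

```python
def has_cycle(edges: dict[str, set[str]]) -> bool:
    """Three-color depth-first search: True if the directed graph has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in edges.get(node, set()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge to the current path: a cycle
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


def roll_up(edges: dict[str, set[str]]) -> dict[str, set[str]]:
    """Collapse "product:module" edges into product-level edges."""
    products: dict[str, set[str]] = {}
    for src, targets in edges.items():
        for dst in targets:
            p, q = src.split(":")[0], dst.split(":")[0]
            if p != q:
                products.setdefault(p, set()).add(q)
    return products


# A's module a1 uses B's b1, while B's module b2 uses A's a2.
modules = {"A:a1": {"B:b1"}, "B:b2": {"A:a2"}}
print(has_cycle(modules))           # False: no module-level cycle
print(has_cycle(roll_up(modules)))  # True: A -> B -> A at product level
```

A checker that only sees the product-level graph would report a circular dependency here even though every module can be built in order.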

Enter the new dependency service

As the company grew, the existing solution for managing dependencies across all products was not scaling well at all. We needed the service to reliably and accurately answer questions about product dependencies, direct or transitive. To address this challenge, we went back to the drawing board and built a more robust dependency management service.

The new service is designed to answer questions like the following efficiently and accurately:

What modules does a product produce?
What are all the dependencies of a given module or a product regardless of the type (iOS, Java, Pig Job, etc.)?
What is the entire dependency graph for a given library?

Sometimes we also need to know which modules or products depend on a given external library, and whether a given library dependency is used only during build and test or also at runtime in production. This is especially important when bad or ill-behaved libraries need to be deprecated as soon as possible. The resulting service implementation delivers the following improvements:
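Answering the build/test versus runtime question requires edges tagged with the class of dependency they belong to. A minimal sketch, assuming a four-way classification (build, test, deployment, runtime) modeled on the dependency classes mentioned earlier rather than on real Ivy or Gradle configuration names, with hypothetical module coordinates:

```python
from enum import Enum


class Config(Enum):
    # Simplified classes taken from the post; real Ivy/Gradle configurations differ.
    BUILD = "build"
    TEST = "test"
    DEPLOYMENT = "deployment"
    RUNTIME = "runtime"


# (consumer module, configuration) -> resolved library coordinates
Graph = dict[tuple[str, Config], set[str]]


def usage_classes(graph: Graph, library: str) -> dict[str, set[Config]]:
    """For each consumer, the configurations through which it uses `library`."""
    result: dict[str, set[Config]] = {}
    for (consumer, config), libraries in graph.items():
        if library in libraries:
            result.setdefault(consumer, set()).add(config)
    return result


def runtime_consumers(graph: Graph, library: str) -> list[str]:
    """Consumers that ship `library` to production."""
    return sorted(consumer for consumer, configs in usage_classes(graph, library).items()
                  if Config.RUNTIME in configs)


graph: Graph = {
    ("feed-impl", Config.RUNTIME): {"com.example:json:1.10.0"},
    ("feed-impl", Config.TEST): {"org.example:mock:2.0"},
    ("search-impl", Config.TEST): {"com.example:json:1.10.0"},
}
print(runtime_consumers(graph, "com.example:json:1.10.0"))  # ['feed-impl']
```

In this example, `search-impl` uses the library only for tests, so deprecating it there is likely less urgent than for `feed-impl`, which ships it at runtime.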

Accurate, fine-grained, and fully resolved dependency graphs at the per-module, per-configuration level (configurations based on Apache Ivy). These graphs include toolchain, Java, and container dependencies, which deployment and other tools can leverage.
It captures data using build tools like Gradle, accounting for the version conflict resolution done by the tool and for LinkedIn-specific dependency substitution rules. No more confusing multiple versions of the same library in one dependency graph.
It supports importing dependency graphs regardless of programming language or build tool.
Last but not least, it exposes a well-defined Rest.li API.
