License Scanning and Compliance Programs for FOSS Projects


By Steve Winslow


This article describes the benefits of license scanning and compliance for open source projects, together with recommendations for how to incorporate scanning and compliance into a new or existing project.

Copyright © 2018 The Linux Foundation. All rights reserved.

1. Introduction

For many new open source software project communities, licensing may at first take a back seat to the project’s technical goals, design and architecture considerations, and community involvement. But as a project grows and sees greater adoption, it will eventually encounter questions about license compliance. Getting license compliance right early on in a project can help the project attract contributors and users. Too often projects never reach their full potential because someone looked at the licensing, found issues and moved on to alternatives.

Modern open source projects rarely consist solely of new code written entirely from scratch; they are often built on top of a stack of tens or hundreds of dependencies. They may also start from a template set of sample source code files, or as a fork of another existing project. Each of these original sources may be under its own license, which is often not the same as the license that the new project declares.

When we talk about “license compliance,” a large part of what we mean is ensuring that distributors and users of a project are following the obligations of the applicable licenses for that project and its constituent parts. But being able to follow third-party license obligations requires, as a prerequisite, knowing what licenses apply. “License scanning” refers to the use of software tools and services to help enable this knowledge.

This article describes the benefits of license scanning and compliance for open source projects, together with recommendations for how to incorporate scanning and compliance into a new or existing project. This article does not address specific requirements under different types of licenses (for example, what is required to comply with a copyleft or permissive license). Rather, it addresses how to structure a project so that it, and its downstream consumers, can gain the information needed to be able to meet those requirements.

2. Understanding the Purpose of License Scanning and Compliance

License scanning and compliance are not end goals, in and of themselves. They are processes that can serve other objectives of a project’s developers, users and redistributors. Some of these objectives can include:

• Protecting the project’s developers. The dangers of noncompliance with inbound open source license terms have been frequently exaggerated and misstated over the years. Open source licenses are not “dangerous” or “viral”1. Nonetheless, open source developers should understand that their use of a third-party’s FOSS code is subject to the obligations in that code’s license. Failing to follow those obligations can lead to legal risks (such as infringement claims from a copyright holder), reputational risks (such as a project being perceived as disrespectful of community norms), and project growth risks (such as a project losing contributors and users due to licensing concerns and disputes).

• Assisting downstream compliance efforts. A FOSS project’s contributors might decide that they themselves are not troubled by the risks described above. They should also consider how those risks will be evaluated by their downstream users and packagers. A project which demonstrates that it has appropriate controls over its inbound and outbound licensing, and which can provide accurate license notices and information in a standardized way, will be greatly valued by downstream redistributors who want to comply with license obligations. Taking the time to improve a project’s license compliance can lead to expanded usage and a corresponding growth in contributors.

• Demonstrating project maturity. Similarly, awareness and controls over licensing can be one of several aspects of demonstrating that a project has achieved a degree of maturity and reliability. A project which communicates clear and reliable information about the licensing for all of its constituent parts will be viewed as more “grown up” than a project with a single LICENSE.txt file and nothing else.

1 For an excellent overview of the misuse of the inaccurate term “viral” to describe copyleft FOSS licenses, see Open Source for Business (2015) by Heather Meeker, pp. 8-9.

• Sooner is better than later. An open source project that hopes to grow significantly will likely be faced with license conflicts or questions at some point. Being aware of the potential questions and issues at an early stage may help a project to make better architectural decisions, such as avoiding incorporating dependencies that are subject to incompatible licenses. Avoiding potential conflicts at the design stage is far less disruptive to a project than finding out that you have to remove a now-essential component due to license incompatibilities.

• Good hygiene benefits. Even if a project views licensing concerns as simply a bunch of “legal gobbledygook,” it should be aware of the related benefits that can result from a deeper dive into its licensing. The most tangible of these benefits relate to security. License scanning and investigation often involves analyzing a project’s direct and indirect dependencies, including those incorporated directly into the project’s code as well as those that are pulled in at build time, install time or run time. This accurate understanding of exactly which versions of code are being used is also extremely useful in understanding a project’s security landscape. Answering the question “Is my project vulnerable to exploit ABC?” often depends on being able to answer “Is my project using version X.Y of dependency Z?” For this reason, some tools and services that provide license scans also bundle together various types of security vulnerability information.

Additionally, some license scanning processes can also include resources that assist in evaluating matters like compliance with export controls (e.g., for code implementing cryptographic functions); identification of copyright holders to comply with license obligations; awareness of other useful component information (e.g., flagging dependencies from inactive projects that have not been maintained for years); identifying confidential code that was published accidentally and should be removed; and the like.
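The question raised above, “Is my project using version X.Y of dependency Z?”, becomes mechanical once a project has an accurate dependency inventory. The following is a minimal sketch in Python, assuming a hypothetical inventory that maps dependency names to plain dotted version strings and a hypothetical advisory list. Neither format comes from a particular tool, and real version schemes are often more complicated than this.

```python
def parse_version(text):
    """Turn a dotted version such as '1.4.2' into a comparable tuple (1, 4, 2)."""
    return tuple(int(part) for part in text.split("."))


def affected_dependencies(inventory, advisories):
    """Return (name, version, advisory_id) for each dependency in a vulnerable range.

    inventory:  dict of dependency name -> version string (hypothetical format)
    advisories: list of dicts with 'id', 'name', 'first_bad', 'first_fixed'
    """
    findings = []
    for advisory in advisories:
        version = inventory.get(advisory["name"])
        if version is None:
            continue  # the project does not use this dependency at all
        low = parse_version(advisory["first_bad"])
        high = parse_version(advisory["first_fixed"])
        # Vulnerable when first_bad <= version < first_fixed.
        if low <= parse_version(version) < high:
            findings.append((advisory["name"], version, advisory["id"]))
    return findings


# Illustrative data only; the dependency names and advisory IDs are made up.
inventory = {"libexample": "2.3.1", "otherlib": "0.9.0"}
advisories = [
    {"id": "EXAMPLE-0001", "name": "libexample",
     "first_bad": "2.0.0", "first_fixed": "2.4.0"},
    {"id": "EXAMPLE-0002", "name": "otherlib",
     "first_bad": "0.1.0", "first_fixed": "0.5.0"},
]
print(affected_dependencies(inventory, advisories))
```

The same inventory that answers the security question is the one a license review needs, which is why the two kinds of information are often bundled together.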

3. Determine Goals, Expectations and Leaders

After a project has decided to take steps towards implementing license compliance, and to make license scanning a part of this, it is equally important to set realistic goals for these processes. To make the scanning and compliance efforts meaningful, and to minimize wasted time and frustration, key participants in the project should be aligned on what they hope to achieve and how to achieve it.

Any project that implements license scanning and compliance should aim to make it sustainable. When facing license issues for the first time, it is easy for a project to become overwhelmed by the sheer number of issues and options that arise. This can cause a sense of burnout and discouragement, and result in abandoning license compliance efforts. Instead, a project should go in with the intention to avoid making “perfect” the enemy of “good,” and should place value on getting a little bit better at licensing compliance every month or every year.

The project should set realistic priorities for what types of issues are of more immediate concern, versus those that are stretch goals to work towards over time. For example, regardless of what a project chooses as its primary license, a tangible first goal could be to identify and remove code that is subject to incompatible licenses. A project declared to be under a copyleft license may want to focus first on ensuring that it isn’t including or relying on any binary-only dependencies, which could undermine the letter or spirit of its copyleft license. A project declared to be under a more permissive license might start by ensuring that copyleft components and dependencies are not incorporated into the project, or used in a manner that would require significant parts of the project to be provided only under that copyleft license.

Ultimately, a project may find it necessary to start thinking of license issues as “bugs” in the code. Open source developers have innumerable competing demands on their time. It is easy to write off licensing compliance as an externality that is just a lawyers’ concern. But if a project decides that it values the purposes behind license scanning and compliance, then the project developers may, for example, need to begin treating license incompatibilities between dependencies in the same fashion that they would a technical incompatibility. Accomplishing this will require buy-in from the core developers that license compliance is a worthwhile goal, and that they will take reasonable efforts in their development to help achieve it.

For many projects, it is helpful to identify one or more core developers who will be the primary leaders on licensing matters. This enables the project to grow core competencies in open source licensing without requiring all developers to become proficient at once. A developer who takes on this role can provide significant benefits to that project. In addition, they will develop expertise that makes them incredibly valuable to other communities and companies. Having tangible expertise with complex open source licensing matters, and being able to communicate with (and translate between) legal, technical and business teams on those matters, is a rare skill that can set a developer apart in many contexts.

4. Select Scanning and Reporting Tool(s)

There are a variety of open source and commercial tools and services that fall under the broad umbrella of “open source scanning.” Many of these offerings apply different meanings to the same term, and there are few agreed-upon definitions. In this article, we loosely define a few of these terms as follows:

• license scanning: applying textual analysis techniques to source code files (and sometimes other files such as documentation, object code, data files, etc.), to search for references to specific open source licenses or other indicators of license-relevant information

• code scanning: comparing snippets of source code (and sometimes object code) to a curated database of open source components under known observed licenses

• dependency scanning: identifying which dependencies (including version numbers) are imported at build time or install time, and retrieving license information from a package registry of declared licenses and/or curated database of observed licenses

Each of these types of scanning has different advantages and drawbacks, in terms of time and effort required, depth of review, scope and actionability of results, and typical false positive rates. Different types of scanning will also provide additional adjacent benefits – for example, license scanning may be more useful for identifying copyright holders, whereas code scanning or dependency scanning may provide more useful information for identifying security vulnerabilities. A project might reasonably choose to use tools that perform one or all of these types of scanning. As with anything in open source development, the choice of what to focus on is a matter of tradeoffs between available resources and desired benefits. The key element in choosing a type of scanning is understanding what its results will tell you, and what they won’t.
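To make the first definition concrete, the following is a minimal sketch of license scanning as textual analysis, written in Python. The pattern table is a deliberately tiny, hypothetical one and is not taken from any real scanner; the phrases it looks for are only indicators, which is part of why scan results typically need human review.

```python
import re

# A hypothetical pattern table. Real scanners carry far more patterns
# and still produce matches that a person has to review.
LICENSE_PATTERNS = {
    "Apache-2.0": re.compile(
        r"Licensed under the Apache License,\s+Version 2\.0", re.I),
    "MIT": re.compile(
        r"Permission is hereby granted, free of charge", re.I),
    "GPL (version unclear)": re.compile(
        r"GNU General Public License", re.I),
}


def scan_text(text):
    """Return the sorted labels of every pattern that appears in the text."""
    return sorted(
        label for label, pattern in LICENSE_PATTERNS.items()
        if pattern.search(text)
    )


sample = """\
# Licensed under the Apache License, Version 2.0 (the "License");
# portions adapted from a file that said:
# Permission is hereby granted, free of charge, to any person obtaining a copy
"""
print(scan_text(sample))
```

Note what this kind of result does and does not say: it reports that license-relevant text is present in a file, not where the code came from or which dependencies are pulled in at build time.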


At The Linux Foundation, for many of our projects we have focused on license scanning. We have found that this type of scanning gives our projects insight into the specific notices and licenses within their codebases, just as any recipient of the project’s code could identify these notices without requiring access to a specialized database of other code or components. For some projects, we have also incorporated dependency scanning to assist in identifying licenses for build-time and install-time dependencies. Scanning also does not have to be an activity done after coding. For example, there are tools available today that will scan for licensing at the moment a GitHub pull request is created. Challenges with this approach include that many of the available tools for scanning at the moment of contribution do not cover in their scope the dependencies incorporated at build time or install time; or they may have high false positive and false negative detection rates (and therefore still require a degree of manual review).

We often use a license scanning tool called FOSSology – itself a Linux Foundation open source project – to scan many of our key projects’ codebases on a monthly basis and just prior to a release. FOSSology has a bit of a learning curve and typically requires some manual effort, but it enables a deep look at licenses contained within a project’s code. It contains two primary scanners, one of which is a regular expression engine and the other a keyword and bulk text matching agent. Each is designed to flag for review not only exact matches to license texts, but also keywords and phrases that might be of interest in potentially identifying relevant licenses.

After conducting a scan, an equally important step is to package the scan results into meaningful reports for downstream users and actionable improvements that the project developers can make. As part of reporting license scan results, we recommend using SPDX®, the Software Package Data Exchange. SPDX is a specification for, among other things, associating license scan findings and conclusions with a set of files comprising a code base, in a defined and machine-readable format. A growing number of commercial services and open source tools (including FOSSology) are able to produce and consume SPDX reports. We recommend using one of these tools to develop SPDX documents reflecting your scan results. In addition to various reporting functionality built into scanning programs and services, the SPDX project also provides various tools that can assist in converting SPDX files into reports that are more usable by project developers.
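As a rough illustration of what “associating findings and conclusions with files” can look like, the sketch below writes a small document in the SPDX tag-value style using Python. It is an assumption-laden example rather than a conforming implementation: the field selection follows the SPDX 2.1 tag-value format as we understand it, the tool name, project name and namespace URL are placeholders, and the output has not been validated. In practice, use an SPDX-aware tool to produce and check these documents.

```python
import hashlib


def file_record(name, content, concluded, found_in_file):
    """Build the tag-value lines describing one scanned file."""
    sha1 = hashlib.sha1(content).hexdigest()
    # Replace every character that is not a letter or digit with '-'.
    ident = "".join(ch if ch.isalnum() else "-" for ch in name)
    return [
        f"FileName: ./{name}",
        f"SPDXID: SPDXRef-{ident}",
        f"FileChecksum: SHA1: {sha1}",
        f"LicenseConcluded: {concluded}",       # the reviewer's conclusion
        f"LicenseInfoInFile: {found_in_file}",  # what the scan found in the file
        "FileCopyrightText: NOASSERTION",
    ]


def spdx_document(doc_name, namespace, created, records):
    """Join a document header and the per-file records into one text."""
    lines = [
        "SPDXVersion: SPDX-2.1",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {doc_name}",
        f"DocumentNamespace: {namespace}",
        "Creator: Tool: example-sketch",  # placeholder tool name
        f"Created: {created}",
    ]
    for record in records:
        lines.append("")
        lines.extend(record)
    return "\n".join(lines) + "\n"


records = [
    file_record("src/main.py", b"print('hello')\n", "Apache-2.0", "Apache-2.0"),
]
document = spdx_document(
    "example-project",
    "https://example.org/spdxdocs/example-project-1.0",  # placeholder URL
    "2018-01-01T00:00:00Z",
    records,
)
print(document)
```

The separation between `LicenseInfoInFile` and `LicenseConcluded` mirrors the distinction in the text between what a scan finds and what a reviewer concludes.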

5. Establish a Sustainable Process and Document It

After selecting a scanning tool, the next step is to integrate it into a sustainable process. Essentially, this consists of answering the following questions:

What should our priorities be? This was briefly addressed above, where I described some examples of initial realistic priorities. One framework to prioritize the many possible end goals for license scanning and compliance could look like the following:

• First, focus on just understanding the license composition of the project, by identifying what licenses are present in its codebase and/or dependencies.

• Second, identify which components (if any) have licenses that are potentially incompatible with the project’s declared license – or with other licenses present in components and dependencies. Remediate them by removing the problematic code, rewriting it or replacing it with a different component provided under a more compatible license.

• Third, improve the project’s own compliance efforts by, for example, ensuring that all required attributions, license texts and copyright notices are appropriately provided.

• Fourth, improve downstream users’ compliance efforts by enhancing the project’s public-facing outputs of license information (see the section below).
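The first two priorities above can be sketched in a few lines of Python once scan results are available. The example assumes hypothetical scan results in the form of a mapping from file path to license identifier, and a hypothetical list of licenses the project has decided to accept. Deciding what belongs on that list is a legal judgment that the code cannot make.

```python
from collections import Counter


def license_composition(file_licenses):
    """Count how many files were found under each license (first priority)."""
    return Counter(file_licenses.values())


def flag_for_review(file_licenses, accepted):
    """List files whose license is not on the accepted list (second priority)."""
    return sorted(
        path for path, lic in file_licenses.items() if lic not in accepted
    )


# Hypothetical scan results: file path -> license identifier from a scanner.
scan_results = {
    "src/core.c": "Apache-2.0",
    "src/util.c": "Apache-2.0",
    "third_party/parser.c": "GPL-2.0-only",
    "third_party/hash.c": "MIT",
}
# Hypothetical policy for a project declared under Apache-2.0.
accepted = {"Apache-2.0", "MIT", "BSD-3-Clause"}

print(license_composition(scan_results).most_common())
print(flag_for_review(scan_results, accepted))
```

A flagged file is a prompt for review and remediation, not a verdict; whether a given combination is actually a problem depends on how the component is used.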

How often should we run and review scans? A scanning tool that requires significant amounts of manual effort will very likely be infeasible for use on a daily basis. By contrast, a more automated tool might be able to re-scan on every new commit, but may be more likely to provide false positives and insufficient results – leading to a need for supplementing with periodic deeper manual scans. Projects should select a scanning frequency that is reasonable for the tool(s) used, the developer time available and the project’s goals. For example, scans could be run and reviewed by the lead licensing developer on a regular periodic basis (such as weekly, monthly or quarterly), and reviewed by the broader project leadership prior to significant milestones (such as for a checkpoint to each major new release).

Who decides what actions to take? For larger projects, it is often helpful to determine in advance who from the project’s technical leadership will decide what actions to take in response to significant findings. They may not need to be consulted for every minor change, such as re-inserting missing notices. But it is often helpful to have decision-making oversight resting in the hands of the project leadership, informed by legal counsel where possible. This is particularly relevant for findings where remediation requires substantial development efforts (e.g., re-architecting to remove a key component with an incompatible license), or where downstream licensing concerns arise (e.g., retaining a copyleft component within a permissively-licensed project).

How do we communicate and document the scan findings and decisions? Smaller projects, and those that are more individual- or community-based, may decide to use their project’s standard issue tracker to discuss and determine how to resolve license findings. Although this may be satisfactory for the contributors (and fits with the idea described above of treating license issues as “bugs”), contributors should be conscious of these public discussions leading to the potential for public accusations that “your project is infringing my license.” Larger projects, and those that are more driven by company-based contributors, may prefer to designate a smaller group of developers and counsel who will review license findings and provide guidance in the first instance. This approach may also enable legal counsel to keep a modicum of attorney-client privilege in conversations about license issues (at least with developers at their own companies).

Ideally, after developing a scanning and compliance process, a project will want to record it as part of the project’s “how to contribute” documentation. This can be in as much or as little detail as desired, and it will enable existing and new contributors to the project to more readily participate in the project’s licensing objectives.

6. Provide Public-Facing Outputs

License scanning and compliance are not intended just for the project developers’ own benefit. Just as a project should consider the licensing for its inbound dependencies, it should also be aware that downstream users, redistributors and developers will be taking their own dependencies on it – and will be asking many of the same questions about the project’s licensing composition.

• Provide a top-level LICENSE.txt file. This is perhaps the most common step taken by FOSS projects, even those that don’t give another thought to licensing. Make sure that your project has a top-level LICENSE file containing the text of the project’s declared license. If multiple licenses apply (such as separate licenses for code and documentation), include all of them and explain the difference in your documentation.

For some licenses, note that just including the standard license text may not provide sufficient information to users. For example, a project licensed under a version of the GPL or its related licenses will want to explicitly clarify whether it is provided (a) under only that version of the GPL (GPL-2.0-only), or (b) under that version and any later version (GPL-2.0-or-later).2

And, be sure that you’ve included the correct license text – it is not uncommon to find projects that, for instance, state in a README file that they use one version of a Creative Commons license but then include the text for a different Creative Commons license.

• Always add copyright and license notices to each individual file in the project, wherever possible. Part of what makes open source software so wonderful is the ability to reuse code from one project within a completely different project. However, if the only license information in a project is its top-level LICENSE.txt file, then it is harder to determine what license applies when a source code file gets taken and reused in a different project under a different license. For this reason, many widely used FOSS licenses (for example, various versions of the GPL and LGPL, MPL-2.0, and Apache-2.0, to name just a few) include the notion of a “standard license header” or “source code form license notice,” which spells out the recommended form of notice a developer should add to each file under that license.

2 Richard Stallman recently published an article addressing the need for clarity about “only” vs. “any later version” for the GPL family of licenses, and describing changes to the SPDX License List that help make this choice more explicit. See For Clarity’s Sake, Please Don’t Say “Licensed under GNU GPL 2”!

Keep in mind that it will not always be feasible to add a license notice to every file. In particular, for files such as images, metadata file formats without comments, structured or binary test files, and others, it may not be feasible to add a license notice within the file itself without unreasonable effort. In these cases, it is usually considered reasonable to rely on the top-level LICENSE file.

• Use SPDX short-form identifiers. As an alternative – or in addition – to including the standard license header notices, we also recommend adding SPDX short form identifiers to project files. These are simple, one-line comments that communicate license information in a way that is both human- and machine-readable, by reference to the license IDs in the SPDX License List3. For example, a file containing source code under Apache 2.0 could include the following comment:

# SPDX-License-Identifier: Apache-2.0

[Figure: excerpt of a source file header that combines a Linux Foundation copyright notice and the line “# SPDX-License-Identifier: Apache-2.0” with the standard Apache License 2.0 header text.]

3 The SPDX License List was initially developed as part of the SPDX specification, to standardize references to commonly-used licenses found in open source. The License List has since become an invaluable resource to open source developers and attorneys who are looking to find the specific texts for a wide variety of open source licenses.

A file containing source code under Apache 2.0, but also containing a snippet taken from an MIT-licensed file, could include the following comment:

# SPDX-License-Identifier: (Apache-2.0 AND MIT)

The combination of the specific “SPDX-License-Identifier” tag, together with the standardized license IDs from the SPDX License List, can make it far easier for downstream users to collect relevant license information – potentially as simple as running a grep command.

• Provide SPDX documents from your own scans. If your scanning tool exports SPDX documents, consider making them available to downstream consumers together with releases of your project code. Some downstream users will elect to run their own scans no matter what. But making the results of your own scans available will decrease the need for redundant effort, and will help demonstrate your project’s confidence in the quality of its licensing compliance and controls.

• Consider implementing other practices for communicating license information. In addition to SPDX identifiers and documents, there exist other recommendations for how to improve license information for FOSS projects. In particular, Free Software Foundation Europe has developed the REUSE Initiative, which describes best practices with recommended file and directory structures for delivering license texts and related information.
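As noted above, collecting “SPDX-License-Identifier” tags can be nearly as simple as running a grep command. The following Python sketch does the equivalent, and also reports files that carry no tag at all. It is a simplified illustration under stated assumptions: it reads every file as text, takes the first tag it finds, and keeps everything after the tag up to the end of the line, so a trailing comment closer such as `*/` would need extra handling.

```python
import os
import re
import tempfile

# Capture whatever follows the tag on the same line.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def collect_identifiers(root):
    """Map each file under root to its SPDX expression, or None if untagged."""
    results = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                match = SPDX_TAG.search(handle.read())
            relative = os.path.relpath(path, root)
            results[relative] = match.group(1).strip() if match else None
    return results


# Build a throwaway directory with three illustrative files.
with tempfile.TemporaryDirectory() as root:
    samples = {
        "a.py": "# SPDX-License-Identifier: Apache-2.0\nprint('a')\n",
        "b.py": "# SPDX-License-Identifier: (Apache-2.0 AND MIT)\nprint('b')\n",
        "c.py": "print('no notice yet')\n",
    }
    for filename, text in samples.items():
        with open(os.path.join(root, filename), "w", encoding="utf-8") as out:
            out.write(text)
    found = collect_identifiers(root)

for path in sorted(found):
    print(path, "->", found[path])
```

Files that map to `None` are candidates either for a new notice or, where adding one is not feasible, for reliance on the top-level LICENSE file.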

7. Result: Creating Benefit for the Whole Ecosystem

Implementing a license scanning and compliance process provides benefits beyond just protecting the project’s developers from legal risk. It builds confidence that the project’s downstream consumers can know with greater certainty what requirements they must abide by, in order to use, redistribute and build on top of it. It reduces the need for multiple individuals and organizations to carry out redundant efforts in scanning code for licenses. And it demonstrates that the project’s developers take open source licensing seriously, and will help make it easier for their full related ecosystem of contributors and users to do so as well.

The Linux Foundation, SPDX and Software Package Data Exchange are registered trademarks of The Linux Foundation. Linux is a registered trademark of Linus Torvalds.


About the author

Steve Winslow is Director of Strategic Programs at The Linux Foundation. He runs The Linux Foundation’s license scanning and analysis service, advising projects about licenses identified in their source code and dependencies. Steve is also involved with projects including SPDX, FOSSology and the Community Data License Agreement; manages The Linux Foundation’s trademark program; and assists on other legal matters.

Steve has presented on license scanning and trademark matters at The Linux Foundation’s Legal Summit 2017 and Open Compliance Summit 2017. Previously, Steve was Vice President of Technology Law at Intralinks and an associate at Choate, Hall and Stewart in Boston. Steve graduated from Georgetown University Law Center and majored in computer science at Williams College.


The Linux Foundation promotes, protects and standardizes Linux by providing unified resources and services needed for open source to successfully compete with closed platforms.

To learn more about The Linux Foundation, please visit us at linuxfoundation.org.