Find personally identifiable information (PII) and secrets exposed by your APIs

Mike Solomon | September 4, 2023

Key Takeaways

As companies grow, they tend to create more and more APIs. With each new API, it becomes harder to track how sensitive information flows through the codebase. Despite best intentions, it is easy for an application programming interface (API) to accidentally expose personally identifiable information (PII), classified data, or confidential business information.

In this article, we discuss:

  • What counts as sensitive data at your organization
  • How to search for sensitive data in APIs
  • How to examine the API results

Doing a text-based search for sensitive field names such as ssn or dateOfBirth might help you find where the data is, but more likely you would get a ton of irrelevant results. After all, many parts of the codebase require PII and other sensitive data to function, and just because a class has a variable with this name doesn’t mean it’s exposed via an API. Furthermore, a text-based search would miss cases where classes create or call out to other classes that have sensitive data. And, if that weren’t tricky enough, all of these problems get much worse as you search across thousands of repositories.

Fortunately, Moderne offers a recipe called “Find sensitive API endpoints” that uses rich type information to detect how sensitive data flows throughout your codebase. This recipe catches cases that other searches would miss. For instance, it can detect if an API returns a class (e.g., a PetOwner) that extends another class (e.g., Person) that has a method that returns sensitive data (e.g., homeAddress). It can also detect when an API returns a class that has a method that returns a different class that has a method that returns sensitive data. It can even recursively step through classes to find sensitive data flowing through multiple levels of dependencies.
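To make the inheritance case concrete, here is a minimal sketch in plain Java. The Person and PetOwner classes follow the example above; the InheritedGetters helper and its getterNames method are hypothetical names introduced for illustration. The sketch uses runtime reflection only to show why type information matters. It is not a description of how the recipe itself is implemented.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical domain classes mirroring the example in the text.
class Person {
    private final String homeAddress;

    Person(String homeAddress) {
        this.homeAddress = homeAddress;
    }

    public String getHomeAddress() {
        return homeAddress;
    }
}

// The source of PetOwner never contains the text "homeAddress",
// so a text search of this class would not flag it.
class PetOwner extends Person {
    private final String petName;

    PetOwner(String homeAddress, String petName) {
        super(homeAddress);
        this.petName = petName;
    }

    public String getPetName() {
        return petName;
    }
}

class InheritedGetters {
    // Collects the names of public, zero-argument "get" methods visible on a
    // type, including those inherited from superclasses (but not from Object).
    static Set<String> getterNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getMethods()) {
            if (m.getName().startsWith("get")
                    && m.getParameterCount() == 0
                    && m.getDeclaringClass() != Object.class) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [getHomeAddress, getPetName]
        System.out.println(getterNames(PetOwner.class));
    }
}
```

An API that returns a PetOwner therefore exposes getHomeAddress even though that name appears only in Person.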

Let’s walk through how you can use Moderne and this recipe to feel more confident that your APIs are not unintentionally exposing sensitive data.

Determine what is considered sensitive data to your organization

Before Moderne can help you find sensitive data, you’ll need to come up with a list of what that means at your organization. Maybe pet names and breeds are considered sensitive data at your company. Or maybe you store more traditional sensitive data such as credit card numbers, driver’s licenses, social security numbers, or addresses.

As you develop this list, it’s a good idea to consider the different ways this data might be stored in your code, along with potential vulnerabilities. For instance, if you consider a birthday to be sensitive, you may find that it is stored in a field named dob, dateOfBirth, birthday, birthDate, and so on.

Adding all variants to your list will increase the chances that the recipe finds sensitive data.
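One way to sanity-check a list before running the recipe is to test it against member names you already know exist in your code. The sketch below is a hypothetical helper (SensitiveNames, normalize, and matches are names introduced here). Its normalization rule, which strips a leading "get", drops underscores, and ignores case, is an assumption made for illustration and is not the recipe’s documented matching behavior.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class SensitiveNames {
    // Variants of a single concept (a birthday), taken from the text above.
    static final List<String> BIRTHDAY_VARIANTS =
            List.of("dob", "dateOfBirth", "birthday", "birthDate");

    // Assumed normalization: strip a getter prefix, drop underscores, lowercase.
    static String normalize(String name) {
        String n = name;
        if (n.length() > 3 && n.startsWith("get") && Character.isUpperCase(n.charAt(3))) {
            n = n.substring(3);
        }
        return n.replace("_", "").toLowerCase(Locale.ROOT);
    }

    // True if the member name equals any listed variant after normalization.
    static boolean matches(String memberName, List<String> sensitiveNames) {
        Set<String> normalized = sensitiveNames.stream()
                .map(SensitiveNames::normalize)
                .collect(Collectors.toSet());
        return normalized.contains(normalize(memberName));
    }

    public static void main(String[] args) {
        System.out.println(matches("getBirthDate", BIRTHDAY_VARIANTS));        // prints true
        System.out.println(matches("date_of_birth", BIRTHDAY_VARIANTS));       // prints true
        System.out.println(matches("getDob", List.of("dateOfBirth")));         // prints false
    }
}
```

The last call shows the gap a short list leaves: with only dateOfBirth listed, a getDob method goes unnoticed, because matching here is exact after normalization.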

Once you have a list of all of the sensitive field names, you’re ready to run the recipe. 🚀

Running the recipe to find sensitive data in APIs

Enter the list of field names that you consider sensitive at your organization into the “Find sensitive API endpoints” recipe. Figure 1 shows an example of what that might look like.

Figure 1: An example of sensitive data field names

Next, you’ll need to specify whether you want the recipe to perform a transitive search. Setting Transitive to true (recommended) makes the recipe recursively check through objects for sensitive data. For instance, suppose an Owner object has a getPet method that returns a Pet object, and Pet contains PII or private business data (e.g., birthDate from the example above). With Transitive set to true, the recipe would flag any methods that return an Owner, because an Owner could then return a Pet, which contains sensitive data.

On the other hand, if this field is set to false, then the recipe would only check the Owner class and any classes the Owner class extends (such as a Person class). In this scenario, you wouldn’t get any warnings about the Pet object and its sensitive data being exposed unless an API returned a Pet explicitly.
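The difference between the two settings can be sketched as a small recursive walk over return types. This is an illustration under stated assumptions, not the recipe’s implementation: Owner and Pet follow the example above, while TransitiveScan, its exposes method, and the reflection-based approach are hypothetical and chosen only because they run without any extra libraries.

```java
import java.lang.reflect.Method;
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;

// Hypothetical classes mirroring the Owner/Pet example in the text.
class Pet {
    public LocalDate getBirthDate() {
        return LocalDate.of(2020, 1, 1);
    }
}

class Owner {
    public Pet getPet() {
        return new Pet();
    }
}

class TransitiveScan {
    // Returns true if the type exposes a sensitive getter, either directly
    // (including inherited public methods) or, when transitive is true,
    // through the return types of its zero-argument methods.
    static boolean exposes(Class<?> type, Set<String> sensitiveGetters, boolean transitive) {
        return exposes(type, sensitiveGetters, transitive, new HashSet<>());
    }

    private static boolean exposes(Class<?> type, Set<String> sensitiveGetters,
                                   boolean transitive, Set<Class<?>> visited) {
        if (!visited.add(type)) {
            return false; // already seen: guards against cyclic object graphs
        }
        for (Method m : type.getMethods()) {
            if (m.getParameterCount() != 0 || m.getDeclaringClass() == Object.class) {
                continue;
            }
            if (sensitiveGetters.contains(m.getName())) {
                return true;
            }
            Class<?> returned = m.getReturnType();
            // Only descend into application types, not primitives or JDK classes.
            if (transitive
                    && !returned.isPrimitive()
                    && !returned.getName().startsWith("java.")
                    && exposes(returned, sensitiveGetters, transitive, visited)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> sensitive = Set.of("getBirthDate");
        System.out.println(exposes(Owner.class, sensitive, true));  // prints true
        System.out.println(exposes(Owner.class, sensitive, false)); // prints false
        System.out.println(exposes(Pet.class, sensitive, false));   // prints true
    }
}
```

With the flag off, Owner looks clean because birthDate lives one hop away on Pet; only an endpoint that returns a Pet directly would be flagged.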

When you’re done configuring the recipe, press the Dry Run button to kick it off.

Results in minutes: Examining the APIs returned with Moderne

Moderne will step through every repository in your organization and look for APIs that return sensitive data. If any are found, they will appear at the top of the results page (see Figure 2). Note that in this example, it took Moderne less than a minute to search across 124 repositories and find API endpoints that contained sensitive data. You can try out Moderne on open source repositories, running this recipe and others.

Figure 2: Find sensitive API endpoints results from Moderne

You can click on one of the repositories to find out more information about which APIs exposed what data. Figure 3 shows what this looks like for the spring-petclinic repository (a sample repository).

Figure 3: Examples of endpoints that expose “sensitive” data

You can click on the underlined code to find out more information about what was exposed. For instance, if you click on the Pet API, you will see that it exposes a BirthDate (see Figure 4).

Figure 4: Sensitive data context

If you enabled transitivity, you might also see results that show how the sensitive data flowed throughout your classes (see Figure 5).

Figure 5: Transitive sensitive data context

Using these results, you can quickly determine whether your APIs need to be changed to eliminate exposure of sensitive personal data. You can also feel much more confident that your APIs don’t leak sensitive data than you would after a traditional text search.

Other helpful recipes to check out

If you found this recipe useful, you might also get some benefit out of the find API calls recipe. It will locate all of the places in your code where an outbound call is being made, which can be helpful when auditing your APIs. 

Another neat security recipe is the Java security best practices recipe. It will apply many of the best security practices to your code, such as ensuring temporary files are securely created/deleted and ensuring that XML is being parsed correctly. 

For a list of all the recipes Moderne supports, check out the Moderne recipe marketplace, where you can find over 1000 recipes to help improve or change your code. If you don’t find one that meets your needs, consider helping the community out by writing your own.