
Test Strategy with Feature Flags in LaunchDarkly

02 Jan 2025 | Marta Rydel

In today’s fast-paced software development landscape, delivering innovative features quickly and safely is more important than ever. Feature flags—also known as feature toggles—have revolutionized this process by enabling developers to control feature releases without redeploying code. At the forefront of feature management solutions is LaunchDarkly, a platform designed to scale and simplify feature flag implementation.

While feature flags unlock unprecedented flexibility and control, they also introduce unique testing challenges that teams must address to ensure quality and stability. In this article, we’ll explore the fundamentals of feature flags, how LaunchDarkly streamlines their management, and the key strategies for mastering testing in a feature-flag-driven workflow.

LaunchDarkly: A Comprehensive Feature Management Platform

Feature flags are a software development technique that allows teams to turn certain features on or off at runtime, without changing the code or redeploying the application. This gives developers the ability to control the availability of features dynamically, often used for testing in production, continuous delivery, and A/B testing. They can be used in various stages of development and in different environments, allowing teams to adapt and respond quickly to user needs and technical challenges.

LaunchDarkly is a leading feature flag management platform that helps organizations deliver software features in a controlled, predictable, and scalable way. Key features of LaunchDarkly include flag targeting, percentage-based and progressive rollouts, and experimentation.
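
As an illustration, here is a minimal sketch of evaluating a flag with the LaunchDarkly server-side SDK for Python (ldclient); the SDK key, flag key, and context attributes below are placeholders rather than values from this article:

```python
import ldclient
from ldclient import Context
from ldclient.config import Config

# Initialise the SDK once at application startup ("sdk-key" is a placeholder).
ldclient.set_config(Config("sdk-key"))
client = ldclient.get()

# Build an evaluation context describing the current user; the attributes
# (country, plan) are hypothetical and would be used by targeting rules.
context = (
    Context.builder("user-key-123")
    .set("country", "PL")
    .set("plan", "premium")
    .build()
)

# Evaluate the flag; the last argument is the in-code fallback value served
# if the flag or LaunchDarkly itself is unavailable.
if client.variation("new-checkout-flow", context, False):
    print("Serving the new checkout flow")
else:
    print("Serving the existing checkout flow")
```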

Types of Flags in LaunchDarkly

LaunchDarkly offers several types of flags, each suited to different use cases:

  • Release flags: used for temporary feature rollouts,
  • Kill switch flags: a safety mechanism to quickly disable non-critical functionality or third-party services in case of issues (see the sketch below),
  • Experiment, migration, and custom flags.
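
For instance, a kill switch flag might wrap a non-critical third-party call so the integration can be disabled instantly. A minimal sketch, assuming the LaunchDarkly Python SDK client from the example above; the flag key and helper function are hypothetical:

```python
def get_recommendations(client, context, product_id: str) -> list:
    # Kill switch around a non-critical third-party recommendations service.
    # "recommendations-enabled" is an assumed flag key; when the flag (or its
    # fallback) is off, the feature degrades gracefully instead of failing.
    if not client.variation("recommendations-enabled", context, False):
        return []
    return fetch_recommendations(product_id)

def fetch_recommendations(product_id: str) -> list:
    # Stand-in for the real third-party integration.
    return [f"customers-also-bought-{product_id}"]
```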

Flags can be either temporary or permanent.

  • Temporary Flags are usually created for features that will eventually be removed once they are fully rolled out. They should be deleted when no longer needed.
  • Permanent Flags are used for long-term functionality where the flag is an integral part of the system’s operations, such as enabling certain capabilities or user-targeting features.


Trunk-Based Development and Feature Flags

One of the practices that LaunchDarkly supports is trunk-based development, a version control strategy where developers commit small, frequent changes to the main branch (the “trunk”). This practice enables teams to continuously integrate new features and fixes while minimizing merge conflicts. With LaunchDarkly, developers can keep code hidden behind feature flags, ensuring that new functionality doesn’t go live until it’s ready. This approach allows for more flexibility and faster delivery of new features.
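
In practice this means the new code path is merged to the trunk next to the old one, and the flag decides which path runs. A simplified sketch; the flag key and helper functions are hypothetical:

```python
def checkout(client, context, cart: dict) -> dict:
    # The new flow is already merged to the trunk, but the fallback of False
    # keeps it dark until the team deliberately enables the flag.
    if client.variation("new-checkout-flow", context, False):
        return new_checkout(cart)   # unreleased code path, hidden behind the flag
    return legacy_checkout(cart)    # current behaviour for everyone else

def new_checkout(cart: dict) -> dict:
    return {"flow": "new", "items": cart}

def legacy_checkout(cart: dict) -> dict:
    return {"flow": "legacy", "items": cart}
```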


Testing Challenges with Feature Flags

Testing software that incorporates feature flags presents a unique set of challenges that largely depend on how these flags are being utilized within the application. The dynamic nature of feature flags, which control the activation of certain features at runtime, adds complexity to the testing process. Therefore, careful analysis and thoughtful planning are essential before testing begins.

To illustrate these challenges, let’s consider a development team working on an online shopping platform and how they would approach testing a new feature using feature flags:

  1. Code Push and Initial Testing: The developer pushes the new code to the repository after testing the functionality locally. At this point, the feature flag is turned on but limited only to test environments. This allows for controlled validation of the feature without impacting the entire user base.
  2. Quality Assurance (QA) Testing: A QA engineer then takes over the testing process. They run through the feature, identify any bugs, and develop automated test scripts. If the feature passes testing, with bugs addressed and test scripts validated, it is then moved to the staging environment.
  3. Regression Testing: Once the feature is in staging, it’s critical to perform regression testing. The tests depend heavily on whether the feature flag is currently enabled or disabled. Even if the flag is off, the underlying code must not interfere with existing functionality elsewhere in the system (see the sketch after this list). It’s not just about whether the feature works; it’s about ensuring no unintended side effects or regressions occur in other parts of the application.
  4. Production Deployment: After successful regression testing, the feature is ready for production. However, whether the feature flag is turned on or off in production is often a decision made based on business requirements. The testing team must ensure that the application functions as expected in both scenarios, and that users are properly targeted based on the flag’s state.
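
One practical way to cover both flag states in regression tests is to parametrize the same test over the flag being on and off. The sketch below mocks the LaunchDarkly client so the test controls the flag state deterministically; the function under test and the flag key are hypothetical:

```python
import pytest
from unittest.mock import MagicMock

def checkout_banner(ld_client, context) -> str:
    # Tiny function under test: its output depends on the flag evaluation.
    if ld_client.variation("new-checkout-flow", context, False):
        return "Try our new one-click checkout"
    return "Proceed to checkout"

@pytest.mark.parametrize("flag_enabled,expected", [
    (True, "Try our new one-click checkout"),
    (False, "Proceed to checkout"),
])
def test_checkout_banner_in_both_flag_states(flag_enabled, expected):
    # Mocking variation() pins the flag state, so the test does not depend
    # on the live LaunchDarkly environment and stays deterministic.
    ld_client = MagicMock()
    ld_client.variation.return_value = flag_enabled

    assert checkout_banner(ld_client, context={"key": "qa-user"}) == expected
```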

Avoiding Dependencies Between Features

When planning and developing features that will be managed via flags, it’s important to avoid unnecessary dependencies between features. The more tightly coupled the features are, the more difficult it becomes to control them independently through flags. Ideally, each feature should be developed and tested in isolation, and feature flags should reflect this independence.

When dependencies are unavoidable, it’s critical that the feature flag structure is designed to reflect and manage those dependencies. In LaunchDarkly, this can be achieved by using:

  • Prerequisite flags: flags that must be enabled in order for other flags to work, or
  • Dependent flags: flags that represent a feature or sub-feature within a larger feature.

Proper flag management ensures that, even in the presence of interdependencies, testing remains manageable and does not cause cascading issues when flags are toggled.
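
The dependency can also be made visible in the application code by evaluating the dependent flag only when its parent feature is on. A sketch with hypothetical flag keys and helpers; in LaunchDarkly itself the relationship would additionally be enforced with a prerequisite:

```python
def render_product_page(client, context) -> str:
    # "new-product-page" acts as the parent (prerequisite) flag.
    if not client.variation("new-product-page", context, False):
        return legacy_product_page()

    # "new-product-page-reviews" is a dependent flag for a sub-feature that
    # only makes sense when the parent feature is enabled.
    show_reviews = client.variation("new-product-page-reviews", context, False)
    return new_product_page(with_reviews=show_reviews)

def legacy_product_page() -> str:
    return "legacy product page"

def new_product_page(with_reviews: bool) -> str:
    return "new product page" + (" with reviews" if with_reviews else "")
```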

Defining Proper Test Cases for Feature Flags

Feature flags often result in a variety of possible configurations, especially when dealing with multiple flags and variations. Since systems can have dozens of flags, some of which may serve multiple functions (beyond simple Boolean values), it is practically impossible to test every combination of flag states. However, certain test case strategies can help ensure robust testing without covering every possible combination.

  • Test All Flag Variations When Turned On: This is the simplest and most direct test. When a flag is enabled, ensure the correct feature or functionality is presented to the user. Depending on how many variations exist within the feature (for example, multiple settings for the same flag), the number of test cases will grow accordingly.
  • Test System Behavior When a Flag is Turned Off: The system should not be impacted when a flag is disabled. It’s essential to confirm by regression tests that other features and existing functionalities continue to operate smoothly, unaffected by code that is behind a feature flag.
  • Test the Fallback Value: Every feature flag should have a fallback value defined in the code. This fallback is used when LaunchDarkly is unavailable, such as during network issues or service outages. It is critical to test how the application behaves when LaunchDarkly cannot be reached and the fallback values are used in place of flag-driven configurations. Make sure that all fallback values are compatible with one another and that the application reaches the expected state (see the sketch after this list).
  • Test Cases with Likely Flag Variations: While you may not be able to test every possible combination, it’s important to consider edge cases where flags may conflict or interact in unexpected ways. This includes testing combinations that are likely to occur and may cause interoperability issues between flags.
  • Coordinate Between Teams to Avoid Test Interference: Feature flag testing requires close coordination between developers, testers, and even Quality Assurance Engineers. It’s important to ensure that different testers do not change the same flag settings simultaneously, as this could lead to non-deterministic errors. A lack of coordination can make it difficult to pinpoint the root cause of issues. This is why maintaining independence between features and flag states is so important to the success of testing.
  • Create Test Data That Match Targeting Rules: Many feature flags allow for user-specific targeting, such as serving different feature variations based on location, subscription plan, or other user attributes. It is crucial to create test data that match the targeting rules defined for the feature flag to ensure that users receive the correct version of the app and that the flag behaves as expected across different segments.
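
For the fallback scenario in particular, one option is to run the SDK in offline mode, which never fetches flag data and therefore serves the in-code fallbacks, roughly mimicking an outage. A sketch assuming the Python SDK’s offline configuration; the flag keys and fallback values are hypothetical:

```python
import ldclient
from ldclient import Context
from ldclient.config import Config

def test_application_uses_fallbacks_when_launchdarkly_is_unreachable():
    # offline=True stops the SDK from connecting, so every variation() call
    # returns the fallback value supplied in code.
    ldclient.set_config(Config("fake-sdk-key", offline=True))
    client = ldclient.get()
    context = Context.builder("qa-user").build()

    assert client.variation("new-checkout-flow", context, False) is False
    assert client.variation("recommendations-enabled", context, True) is True

    client.close()
```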

Regression Testing with Feature Flags

Once the functional tests have been completed and the feature is deployed to staging, regression testing begins. During this phase, it’s important to replicate the exact flag configuration that will be used in production. This ensures that testing reflects the real-world state of the application and that users will experience the feature in the same way once it’s live.

Automated Test Suites: Automated testing is especially valuable in this context. These tools can quickly verify that the correct flag variations are applied, making regression testing faster and more reliable. Automated tests can also perform repetitive checks that are crucial for verifying basic user journeys. With these tools, QA teams can more easily validate the system under different flag settings and configurations. This can be achieved by integrating with the LaunchDarkly API to set up the desired feature flag configuration before the tests run (see the sketch below).
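
A sketch of such an integration, assuming LaunchDarkly’s REST API and its semantic patch format for flag updates; the API token variable, project, flag, and environment keys are placeholders for your own configuration:

```python
import os
import requests

LD_API_TOKEN = os.environ["LD_API_TOKEN"]           # a LaunchDarkly API access token
LD_API_BASE = "https://app.launchdarkly.com/api/v2"

def set_flag_state(project_key: str, flag_key: str, environment_key: str, on: bool) -> None:
    """Turn a flag on or off in a given environment before a test run."""
    response = requests.patch(
        f"{LD_API_BASE}/flags/{project_key}/{flag_key}",
        headers={
            "Authorization": LD_API_TOKEN,
            # The semantic patch content type tells LaunchDarkly to interpret
            # the body as instructions rather than a JSON patch.
            "Content-Type": "application/json; domain-model=launchdarkly.semanticpatch",
        },
        json={
            "environmentKey": environment_key,
            "instructions": [{"kind": "turnFlagOn" if on else "turnFlagOff"}],
        },
        timeout=10,
    )
    response.raise_for_status()

# Example: make staging match the intended production configuration
# before the regression suite runs.
# set_flag_state("default", "new-checkout-flow", "staging", on=True)
```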

Verify That Legacy Features Aren’t Broken: Regression testing with feature flags is not just about verifying that the new feature works but also ensuring that no legacy features or functionalities are broken. Turning flags on and off may affect existing features, so it’s essential to check that older functionality remains intact, regardless of the state of the feature flag.

Flag Maintenance and Removal

One of the ongoing responsibilities when using feature flags is ensuring proper flag maintenance. Temporary flags, which are typically used during feature rollouts or experiments, should be removed once they have fulfilled their purpose. This prevents clutter and reduces the risk of introducing bugs due to leftover flag configurations in the code.

Once a flag has been enabled in production and no issues have been identified, it’s important to remove it from the codebase. Flag maintenance should always involve collaboration with the QA team to ensure that no unintended effects occur when the flag is removed. Even if the flag removal is transparent to end users, it can still impact the underlying system. LaunchDarkly can monitor how many evaluations have taken place for each flag, and analysis of flag status (New, Active, Launched, Inactive) allows an owner to decide which flags can be removed.
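
When a temporary flag is retired, the flag evaluation, its fallback, and the losing code path are deleted together. A simplified sketch of what that cleanup might look like, with a hypothetical flag key and helpers:

```python
# Before cleanup, checkout() still evaluated the temporary release flag:
#
#     if client.variation("new-checkout-flow", context, False):
#         return new_checkout(cart)
#     return legacy_checkout(cart)
#
# Once the flag is Launched for everyone and stable, the flag check, its
# fallback, and the legacy path are removed, leaving only the winning code.
def checkout(cart: dict) -> dict:
    return new_checkout(cart)

def new_checkout(cart: dict) -> dict:
    # Placeholder for the new checkout implementation.
    return {"flow": "new", "items": cart}
```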

For more information, book a meeting with a member of our QA team and keep an eye on our blog pages for a forthcoming Cypress and LaunchDarkly article.