Feature Flags

May 12, 2023 • A 10 minute read

What are feature flags?

Imagine that you work on a legacy e-commerce application in a team with 5 other engineers. People are working fast and the codebase is changing daily. You've been assigned to rewrite the product suggestion algorithm in an attempt to improve performance. You know it's going to be a big task and is likely going to take you a good few weeks.

You also know, as a seasoned engineer, that long-lived feature branches are less than favourable. They contradict the principles of CI/CD and you just know that you're going to end up with some messy conflicts very quickly.

Wouldn't it be great if you could implement your changes incrementally, merging small changes frequently, and getting your code into production early on?

But surely you don't want your customers seeing those half-built features? So how do we release code changes to prod without creating an observable change in the software?

Enter: feature flags.

In essence, feature flags allow you to toggle specific features or code paths at runtime without requiring a code change. They are, arguably, a fundamental technique for achieving fully continuous delivery, as they allow incomplete features to be merged into the master branch.

Other benefits to feature flags include:

Testing in production: you can enable features in production for specific users (e.g. engineers or testers) and collect data in a live environment
Risk mitigation: feature flags allow the isolation of new, risky changes. Roll-out of a large refactoring goes wrong? Simply switch off the feature flag and revert to the legacy code that you know is stable

Use cases of feature flags

There are many possible use cases for feature flags. Below are just a few that I've seen during my career. Some of these are controversial, others are ubiquitous. Hopefully, by the end of this article, you'll know which is which.

Trunk based development: allows you to deploy partially-developed features to production which are hidden behind feature flags and therefore out of reach of the general codepath
Canary launches: roll out risky changes to a subset of users so your code can be tested in situ. Once you're confident everything is working as expected, simply roll out the feature to your wider user base by enabling the feature flag for everyone
Circuit breaker: kill features in production quickly without requiring a code change. This allows you to turn off expensive features during a traffic spike, for example
A/B testing: compare the performance of two (or more) different implementations of a feature by enabling a feature flag for one group of customers and disabling it for another
Verbose logging: quickly turn on or off verbose logging for particularly complex features during an incident or investigation, without requiring a code change

Implementation

The implementation of feature flags can be as simple or as complex as you make it, depending on your situation and requirements. You can explore a third party service such as LaunchDarkly, or you could roll your own.

If you're rolling your own solution, there are at least three pieces of functionality you'll need to consider:

A centralised data store which holds the data on which feature flags are enabled/disabled (in a primitive example, this could be a simple in-memory boolean)
A method/service/mechanism that allows you to check if a given feature flag is enabled within your code. This will allow you to wrap the togglable code within a condition
A management system that will allow you and/or your team members to switch the feature flags on and off
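The three pieces above can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not a production design: createFlagStore, isEnabled, enable, disable and describePay are hypothetical names chosen for this example.

```javascript
// Minimal in-memory feature flag store (illustrative only).
// All names here are hypothetical, not taken from any library.
const createFlagStore = (initialFlags = {}) => {
	// 1. Data store: flag name -> boolean
	const flags = new Map(Object.entries(initialFlags));

	return {
		// 2. Check mechanism: unknown flags are treated as "off"
		isEnabled(name) {
			return flags.get(name) === true;
		},
		// 3. Management: switch flags on and off at runtime
		enable(name) {
			flags.set(name, true);
		},
		disable(name) {
			flags.set(name, false);
		},
	};
};

const flagStore = createFlagStore({ enableRateEngine: false });

// The togglable code is wrapped in a condition on the flag
const describePay = (rate, hours) =>
	flagStore.isEnabled("enableRateEngine")
		? `rate engine: ${rate * hours}`
		: "legacy calculation";
```

In a real system the map would more likely sit in a shared store, such as a database or configuration service, so that every instance of the application sees the same values.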

Centralised feature flag decisions

For those who opt to build their own implementation, one piece of advice I'd offer would be to ensure that you centralise all of your feature flag decision logic in a single location.

Imagine a scenario in which you're building an algorithm which calculates a worker's pay for a given shift. We'll call this feature the "rate engine".

You've completed the MVP which includes calculating base pay based on a given rate and hours worked, and now you are ready to roll it out to production with the use of a feature flag which we'll call enableRateEngine.

Meanwhile, your colleague is working on extending this MVP to allow for calculation of overtime rates. The feature is nowhere near complete, but since we're studiously following Trunk Based Development practices, she'll shortly be releasing part of her code to prod. This overtime-related code will be hidden behind a secondary feature flag which we'll call enableRateEngine-overtime.

We've essentially created 'feature flag nesting' here. We don't want to start displaying overtime in the application UI unless both enableRateEngine and enableRateEngine-overtime are enabled.

That's one example of feature flag decision-making logic. But there are other types of logic that could be at play. Perhaps in a multi-tenanted environment we'd want to consider the customerId. Or we might want to automatically disable an expensive feature when a certain alarm is triggered.

Having this logic within your main application code will quickly turn your codebase into a tangled mess.

By maintaining a centralised service which contains all of your decision making, we can easily see which logic is controlling which feature flag. It also means that we have one single place to go to when that logic needs to be changed.

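// All feature flag decisions live here, so calling code never inspects raw flags
const featureFlagDecisionService = (enabledFeatures) => {
	return {
		enableRateEngine() {
			return enabledFeatures.includes("enableRateEngine");
		},

		// Nested flag: overtime is only on when the parent flag is on too
		enableRateEngineOvertime() {
			return enabledFeatures.includes("enableRateEngine")
				&& enabledFeatures.includes("enableRateEngine-overtime");
		}
	};
};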
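The other kinds of logic mentioned earlier, such as tenant-specific decisions, fit the same pattern. The sketch below is one assumption about how that might look: createDecisionService, the context argument and enabledCustomerIds are hypothetical names introduced for this example.

```javascript
// Hypothetical extension of the decision service: decisions can also
// depend on request context, such as which customer is making the request.
const createDecisionService = (enabledFeatures, enabledCustomerIds = []) => {
	const rateEngineOn = (context) =>
		enabledFeatures.includes("enableRateEngine") &&
		enabledCustomerIds.includes(context.customerId);

	return {
		enableRateEngine: rateEngineOn,

		// Nested flag: overtime requires the parent decision to be "on" too
		enableRateEngineOvertime(context) {
			return (
				rateEngineOn(context) &&
				enabledFeatures.includes("enableRateEngine-overtime")
			);
		},
	};
};

const decisions = createDecisionService(
	["enableRateEngine", "enableRateEngine-overtime"],
	["customer-a"]
);
const overtimeForA = decisions.enableRateEngineOvertime({ customerId: "customer-a" }); // true
const overtimeForB = decisions.enableRateEngineOvertime({ customerId: "customer-b" }); // false
```

Calling code only ever asks the service a question. It never needs to know which flags or customers are involved in the answer.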

Feature flags & CI/CD

The example at the start of this article briefly touched upon the usage of feature flags in a CI/CD environment, but I'd like to expand a bit further on the process behind this.

A quick overview on my definitions of CI/CD before we get started:

CI (Continuous Integration): the ongoing integration of code changes made by multiple engineers, sometimes across multiple teams
CD (Continuous Delivery): the automatic deployment (or delivery) of those changes to a prod environment via the usage of pipelines

Most modern engineering teams strive to achieve "full CI/CD" which can, essentially, be measured with a combination of:

The number of times the team releases to prod
How much human intervention those releases require

The goal is to release n times per day, with the only human-initiated action being a click of the "merge" button in the version control repository.

So how do feature flags help us to achieve full CI/CD?

Integration of code changes

Before we can achieve regular, fully automated releases (CD), we need to first achieve CI—the integration of code changes.

Historically, we'd have a release cycle of, say, one release every month. Teams would be working on long-lived feature or release branches and the main branch would remain untouched up until the day of release. And on this day, there would be an influx of pull requests as engineers scrambled to resolve merge conflicts in an effort to get the release branch ready in time.

In a modern CI/CD environment, this doesn't work. The idea is to merge small changes frequently, even if the feature isn't ready. This is the crux of iterative development which, by the way, comes with a boatload of other benefits which are slightly out of scope of this article.

Okay, but if we're releasing to prod on every merge, how do we prevent unfinished features being seen by our customers?

You said it—feature flags!

Roll-out

Alright, so you've been working on a new feature for the past 6 weeks, developing incrementally by merging and deploying frequently, all in the comfort of knowing that your unfinished feature isn't resulting in any observable change to your software.

Finally, you feel that your feature is ready. Acceptance criteria have been met, your integration tests are passing, and the Product Manager has signed off. So how do we get this thing out into the wild?

This roll-out process varies from company to company, but it generally boils down to a simple 3-step recipe:

1. Canary launch

Enable the feature flag for a subset of users, or a single user. Using various monitoring tools and techniques, observe the usage of the feature and judge whether it's successful.
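One way to choose that subset is deterministic percentage bucketing: hash each user ID into a bucket from 0 to 99 and enable the flag for buckets below the roll-out percentage. The sketch below assumes string user IDs and uses a simple FNV-1a style hash; fnv1a, bucketFor and isInCanary are hypothetical names for this example.

```javascript
// FNV-1a style 32-bit hash over UTF-16 code units.
// Not cryptographic; it only needs to spread users evenly across buckets.
const fnv1a = (text) => {
	let hash = 0x811c9dc5; // FNV offset basis
	for (let i = 0; i < text.length; i++) {
		hash ^= text.charCodeAt(i);
		hash = Math.imul(hash, 0x01000193); // multiply by FNV prime, 32-bit
	}
	return hash >>> 0; // convert to an unsigned integer
};

// Including the flag name means each flag gets a different subset of users
const bucketFor = (flagName, userId) => fnv1a(`${flagName}:${userId}`) % 100;

// True when the user falls inside the roll-out percentage (0 to 100)
const isInCanary = (flagName, userId, rolloutPercent) =>
	bucketFor(flagName, userId) < rolloutPercent;
```

Because the bucket depends only on the flag name and user ID, a given user stays in or out of the canary across requests, and raising the percentage only ever adds users.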

2. Full roll-out

After X hours/days/weeks, enable the feature flag for all users. Again, keep an eye on your monitoring tools to ensure that everything is working as expected.

Do not remove the feature flag at this stage. To facilitate a full roll-out, simply invert the feature flag. For example, instead of enableRateEngine, you'll now have something like disableRateEngine.

This turns your feature flag into a circuit-breaker. In the event that something goes catastrophically wrong, you can easily kill the feature without requiring a code change.
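As a rough sketch of what the inversion means in code, the new path becomes the default and the flag exists only to switch it off. The names disabledFeatures, calculateWithRateEngine and calculateLegacy, and the shape of the shift object, are placeholders assumed for this example.

```javascript
// Placeholder implementations, assumed purely for illustration
const calculateWithRateEngine = (shift) => shift.rate * shift.hours;
const calculateLegacy = (shift) => shift.flatPay;

// After full roll-out the rate engine is the default path;
// the inverted flag acts purely as a kill switch.
const calculatePay = (shift, disabledFeatures) => {
	if (disabledFeatures.includes("disableRateEngine")) {
		return calculateLegacy(shift); // circuit breaker tripped
	}
	return calculateWithRateEngine(shift);
};

const shift = { rate: 12, hours: 8, flatPay: 90 };
const normalPay = calculatePay(shift, []); // 96
const fallbackPay = calculatePay(shift, ["disableRateEngine"]); // 90
```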

3. Feature flag removal

After X days/weeks/months of monitoring, clean it up! Remove the feature flag entirely. This is an important, but often overlooked, step of the process. Skip it and you'll be faced with "feature flag proliferation", which is a big problem in its own right.

As you can see, monitoring is your friend at every step of the process here. It's vital to have the instrumentation in place to know if your feature is performing as expected at each stage.

The dangers of feature flags

Up until now, I've been talking about how great feature flags are and how they're vital to achieving full CI/CD.

But, as with anything in life, there is such a thing as too much of a good thing. Therefore, let's talk briefly about the hidden (or not-so-hidden) dangers of feature flags.

Feature flag proliferation

Adding one feature flag splits your codepath in two. Adding five independent feature flags can result in up to 32 combinations of codepaths. Imagine how many possible codepaths you'd have with fifty feature flags, given a variable number of feature flag combinations.

I'm sure you can see the issue here. It's difficult enough to maintain decent test coverage for an application without feature flags, let alone one with thousands of additional codepaths to think about.
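To put numbers on it: if every flag is an independent on/off switch, n flags give up to 2^n combinations. The helper below is just that arithmetic; flagCombinations is a name made up for this example.

```javascript
// Upper bound on flag combinations, assuming n independent boolean flags
const flagCombinations = (n) => 2 ** n;

console.log(flagCombinations(1)); // 2
console.log(flagCombinations(5)); // 32
console.log(flagCombinations(50)); // 1125899906842624
```

In practice many flags never interact, so the number of meaningfully different paths is usually lower, but the upper bound shows how quickly things grow.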

Good engineering teams ensure that all feature flags follow the 1-2-3 lifecycle defined above. Operate under the assumption that all feature flags must ultimately be removed. To help enforce this within your team, consider the following:

Assign a maximum lifespan to every feature flag of, say, one quarter
Create a ticket for feature flag removal within your Epic to ensure it's followed up on before the feature can be regarded as 'complete'
Limit the total number of feature flags permitted within your system at any given time. If the limit is reached, a feature flag must be removed before a new one can be added
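The first and third ideas can be checked automatically, for example in a test that runs in your pipeline. The sketch below assumes each flag is registered with a creation date; MAX_FLAGS, MAX_AGE_DAYS and auditFlags are hypothetical names, and the limits are arbitrary.

```javascript
const MAX_FLAGS = 10; // arbitrary cap on the total number of flags
const MAX_AGE_DAYS = 90; // roughly one quarter
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns a list of human-readable problems; an empty list means the audit passed.
const auditFlags = (flags, now = new Date()) => {
	const problems = [];
	if (flags.length > MAX_FLAGS) {
		problems.push(`Too many flags: ${flags.length} (limit ${MAX_FLAGS})`);
	}
	for (const flag of flags) {
		const ageDays = Math.floor((now - flag.createdAt) / MS_PER_DAY);
		if (ageDays > MAX_AGE_DAYS) {
			problems.push(`${flag.name} is ${ageDays} days old; remove it`);
		}
	}
	return problems;
};

const problems = auditFlags(
	[
		{ name: "enableRateEngine", createdAt: new Date("2023-01-01") },
		{ name: "enableRateEngine-overtime", createdAt: new Date("2023-04-20") },
	],
	new Date("2023-05-12")
);
// problems contains one entry, for the flag created on 2023-01-01
```

Failing the build when the list is non-empty is one way to make the limits hard to ignore.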

Whatever process you settle on, the key thing here is to not let the number of feature flags in your system spin out of control.

Feature flags as configuration

Feature flags are cheap to create, but expensive to maintain.

Customer X wants its users to be able to toggle 'dark mode' but Customer Y doesn't. Because feature flags are cheap to create, it's easy to fall into the trap of using them as a configuration mechanism, as in this example.

I find that nine times out of ten, configuration at this level isn't actually required. Push back on this—if a customer doesn't want to use a feature, they should be able to simply ignore it.

As a rule of thumb, keep feature flags for roll-out purposes only, and don't allow them to become long-lived. Follow these guidelines and you'll avoid the pitfalls of feature flag proliferation.

Conclusion

A vital tool for achieving CI/CD, feature flags allow for great flexibility, but they come with a cost, one which is not easy to observe at first but can spiral out of control fairly quickly. Keep them to roll-out purposes only, and watch out for the warning signs!