Allocating resources for the agent

Last updated 2024-06-15

IMPORTANT

This guide only applies to Next-Gen WAF customers with access to the Next-Gen WAF control panel. If you have access to the Next-Gen WAF product in the Fastly control panel, you can only deploy the Next-Gen WAF with the Edge WAF deployment method.

The Next-Gen WAF agent requires computational resources to function properly. When setting up and testing your deployment, you must allocate adequate CPU, RAM, and file handles to the agent. The exact amount needed for your web application and anticipated traffic flow depends on multiple factors, such as request size and content, application and network latency, and the number and type of rules applied. Generally, try to allocate enough resources for the agent to keep request latency to a minimum.

Average agent resource usage

While the exact computational resources needed for the agent depend on the deployment, you can use aggregate agent resource usage numbers from other users to get an idea of what your deployment will need. The averages provide a good starting point, and the higher percentile values indicate what to anticipate for high-volume deployments.

TIP

For an initial proof of concept, we suggest each agent have access to one full CPU core and at least 500 MB of RAM. Once you have load tested your own traffic and web application usage, you can scale these resources up or down to fit your needs.

Average CPU usage

Most deployed agents do not fully use the CPU resources they have available. It is typically desirable to budget more CPU than the agent may need at baseline traffic levels to allow enough headroom to remain performant during traffic spikes.

The table below shows a snapshot in time of deployed agents' CPU usage, normalized for the number of CPU cores the agent is configured to use.

Percentile	CPU usage
Average	~3%
95th percentile	~20%
99th percentile	~41%
99.9th percentile	~65%
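
To make the normalized percentages easier to reason about, the following minimal sketch converts them into an equivalent number of busy cores. It assumes that "normalized" means a percentage of the total capacity of the cores the agent may use; the function name `cores_in_use` and the 4-core agent are hypothetical and chosen only for illustration.

```python
def cores_in_use(normalized_percent: float, usable_cores: int) -> float:
    """Convert a core-normalized CPU percentage into equivalent busy cores.

    Assumption: the percentage is relative to the combined capacity of
    all cores the agent is configured to use.
    """
    if usable_cores < 1:
        raise ValueError("usable_cores must be at least 1")
    return normalized_percent / 100.0 * usable_cores


# Percentile figures from the table above, applied to a hypothetical
# agent that is allowed to use 4 cores.
for label, percent in [("average", 3), ("95th", 20), ("99th", 41), ("99.9th", 65)]:
    print(f"{label} percentile: {cores_in_use(percent, 4):.2f} cores")
```

Read this way, even the 99.9th percentile leaves part of the agent's configured capacity unused, which is consistent with budgeting headroom for traffic spikes.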

Average RAM usage

The agent's memory footprint typically remains fairly stable during long-term execution. However, Go is a garbage-collected language, so memory may momentarily spike above the typical baseline during traffic spikes. To account for these spikes, we recommend leaving additional memory headroom above the normal observed baseline so that the agent can momentarily burst higher without needing to perform as many garbage collection cycles. Variance in customer configurations, datasets, and rulesets may impact the memory footprint.

The table below shows a snapshot in time of deployed agents' RAM usage.

Percentile	RAM usage
Average	~76 MB
50th percentile	~67 MB
95th percentile	~126 MB
99.9th percentile	~280 MB
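
One way to turn an observed baseline into a memory budget is to add a fixed fraction of headroom on top of it. The sketch below illustrates that arithmetic only; the function name `memory_request_mb`, the 50 percent headroom fraction, and the optional floor are assumptions for illustration, not recommended values.

```python
import math


def memory_request_mb(baseline_mb: float,
                      headroom_fraction: float = 0.5,
                      floor_mb: int = 0) -> int:
    """Return a memory budget in MB: baseline plus headroom, never below floor_mb.

    headroom_fraction is an illustrative assumption (0.5 means 50% extra).
    """
    if baseline_mb < 0 or headroom_fraction < 0:
        raise ValueError("baseline_mb and headroom_fraction must be non-negative")
    budget = math.ceil(baseline_mb * (1 + headroom_fraction))
    return max(floor_mb, budget)


# Using the 95th percentile figure from the table above as the baseline.
print(memory_request_mb(126))                 # baseline plus 50% headroom
print(memory_request_mb(126, floor_mb=500))   # keep a 500 MB starting floor
```

In practice the baseline should come from your own load tests rather than from the aggregate table.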

Average number of usable CPU cores

To prevent CPU starvation for other processes on the system, the agent defaults to using only half of the available CPU cores on the host. You can use the max-procs setting to change this number.
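
As a rough illustration of the default, the sketch below estimates how many cores an agent would use on the current host. The rounding behavior (round down, never below one core) is an assumption made for this example and is not taken from the agent's documentation; `default_usable_cores` is a hypothetical helper name.

```python
import os


def default_usable_cores(host_cores: int) -> int:
    """Estimate the agent's default core count as half of the host's cores.

    Assumption: the result is rounded down and never drops below one core.
    """
    if host_cores < 1:
        raise ValueError("host_cores must be at least 1")
    return max(1, host_cores // 2)


# os.cpu_count() can return None, so fall back to a single core.
host_cores = os.cpu_count() or 1
print(f"Host cores: {host_cores}, estimated default: {default_usable_cores(host_cores)}")
```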

The table below shows a snapshot in time of the number of usable CPU cores provided to deployed agents. Keep in mind the following about the data:

Some agents have modified max-procs configurations above or below the 50% default.
All deployment types are represented, including Kubernetes deployments. Kubernetes deployments have a higher quantity of lower resourced agents, which causes a strong skew toward the low end of the distribution.

Percentile	Number of usable CPU cores
Average	6
50th percentile	2
75th percentile	8
95th percentile	18
99th percentile	48

Calculating agent resource needs

To understand the infrastructure costs the agent will incur, complete the following steps:

1. Prior to deploying the Next-Gen WAF, measure the computational resources used by your web application, especially during peak usage times. Focus on CPU utilization, memory usage, and file handle counts, both for the web application and for the OS in general.
2. Deploy the Next-Gen WAF and register it with a representative set of rules.
3. Verify that traffic is being inspected, tagged, and blocked when applicable.
4. Perform the same tests that you conducted in step 1. If possible, run the tests on the same hosts, with the same resources, at the same time of day, on the same days of the week, and during the same business cycle as before.
5. Compare the results of the tests.

For example, let's say that prior to installing the Next-Gen WAF, you measure 50 percent CPU usage to run your web application. Then, after installing and configuring the Next-Gen WAF alongside your application, you measure 90 percent CPU usage and no adverse latency in your web application. The 40-percentage-point increase approximates the cost of the agent under that load, and it leaves only 10 percent of CPU capacity in reserve. You may want to scale to a larger host system with more CPU cores to have spare capacity for traffic fluctuations. For Kubernetes environments, this may mean adapting your replica count, autoscaling, or resource limit configurations.
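
The comparison in the last step can be sketched as simple arithmetic over the two sets of measurements. In the example below, the CPU figures are the ones from the paragraph above; the memory and file handle values, the `Measurement` type, and the helper names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    cpu_percent: float   # CPU utilization at peak, in percent
    memory_mb: float     # memory in use at peak, in MB
    file_handles: int    # open file handles at peak


def agent_overhead(before: Measurement, after: Measurement) -> Measurement:
    """Return the difference between the two runs, attributed to the agent."""
    return Measurement(
        cpu_percent=after.cpu_percent - before.cpu_percent,
        memory_mb=after.memory_mb - before.memory_mb,
        file_handles=after.file_handles - before.file_handles,
    )


def cpu_headroom(after: Measurement) -> float:
    """Return the CPU capacity, in percent, left over for traffic spikes."""
    return 100.0 - after.cpu_percent


# CPU values come from the example above; memory and file handle
# values are made-up placeholders.
before = Measurement(cpu_percent=50.0, memory_mb=2048.0, file_handles=1200)
after = Measurement(cpu_percent=90.0, memory_mb=2176.0, file_handles=1350)

overhead = agent_overhead(before, after)
print(f"Agent CPU overhead: {overhead.cpu_percent:.0f} percentage points")
print(f"Remaining CPU headroom: {cpu_headroom(after):.0f} percent")
```

The comparison is only meaningful if both runs used the same hosts and comparable traffic, as described in the steps above.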

Hypervisors

Virtual machines run in an environment that shares resources with other virtual machines. Sometimes these groups of VMs are over-subscribed on the hypervisor. The agent is a real-time security solution designed to operate with minimal latency. If the virtual machine has to wait on the hypervisor for CPU cycles, queuing will occur and performance will suffer. In extreme cases, this type of resource starvation can lead to complete failures and application crashes. To avoid this resource scheduling conflict, we recommend running the Next-Gen WAF on a VM that is not over-subscribed or that has resource reservations.
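
On Linux guests, one common indicator that a VM is waiting on the hypervisor is CPU steal time, which the kernel reports as the eighth value on the aggregate `cpu` line of `/proc/stat`. The sketch below computes the steal percentage between two samples of that line. It is an illustration under the assumption of a Linux guest, not a feature of the agent, and the helper names are hypothetical.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into its counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def steal_percent(first: str, second: str) -> float:
    """Return the share of CPU time stolen between two samples, in percent."""
    start, end = parse_cpu_line(first), parse_cpu_line(second)
    deltas = [later - earlier for earlier, later in zip(start, end)]
    if len(deltas) < 8:
        return 0.0  # very old kernels do not report steal time
    # Sum user, nice, system, idle, iowait, irq, softirq, and steal.
    # Guest time is already included in user and nice, so it is skipped.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the aggregate 'cpu' line (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()


# Two illustrative samples; on a real host, call read_cpu_line() twice
# with a pause in between.
sample_a = "cpu 100 0 50 800 10 0 0 40 0 0"
sample_b = "cpu 200 0 100 1500 20 0 0 180 0 0"
print(f"Steal: {steal_percent(sample_a, sample_b):.1f}%")
```

A steal percentage that stays high under load suggests the host is over-subscribed and that a resource reservation may help.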

Kubernetes

Setting resource requests and limits is common practice in Kubernetes. To help place the Next-Gen WAF pods on nodes with sufficient resources, we recommend setting resource requests. However, we generally don’t recommend setting resource limits because a container that exceeds its memory limit is killed and one that reaches its CPU limit is throttled, impacting the level of protection and potentially causing disruptions. To restrict the number of CPU cores the agent is permitted to fully use, use the max-procs setting.
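
As a minimal sketch of this recommendation, the following builds the `resources` section of a container spec with requests and no limits, then prints it as JSON, which Kubernetes accepts for manifests. The request values mirror the proof-of-concept suggestion earlier in this guide; the container name, image reference, and helper name are placeholders, not official values.

```python
import json


def agent_resources(cpu_request: str = "1", memory_request: str = "500M") -> dict:
    """Build a Kubernetes 'resources' section with requests only.

    The 'limits' key is deliberately omitted.
    """
    return {"requests": {"cpu": cpu_request, "memory": memory_request}}


container = {
    "name": "waf-agent",                      # placeholder container name
    "image": "example.invalid/agent:latest",  # placeholder image reference
    "resources": agent_resources(),
}

print(json.dumps(container, indent=2))
```

Adjust the request values after load testing so that the scheduler places agent pods on nodes that can sustain your observed peak usage.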
