Announcing HAProxy 2.9

HAProxy 2.9 is now available.

Watch the webinar HAProxy 2.9 Feature Roundup to learn more about this release.

HAProxy 2.9 further extends HAProxy's performance, flexibility, and observability. 

  • This is the fastest HAProxy yet, with improved performance efficiency in HTTP/2, threads, shared pools, logs, health checks, maps, caching, stick tables, and QUIC. 

  • More flexibility with syslog load balancing, more QUIC options, more SSL/TLS libraries, Linux capabilities, and an experimental new interaction between data center load balancers and edge load balancers we call “Reverse HTTP”.

  • You can customize your HAProxy even more to get exactly the behavior you want, with a new none hash type for custom hash-based load balancing, new functions for working with Proxy Protocol TLV fields, more caching options, more Lua options, and new converters.

  • You can get more visibility into the operation of HAProxy, and get more helpful information for simpler decision-making. This includes new fetch methods, more options for Lua logging, debugging hints, and warnings for inconsistent CPU bindings.

This release came together through the efforts of all the community members who got involved. A release like this requires feature requests, bug reports, forum discussions, code submissions, QA tests, and documentation! In other words, this project is fueled by people like you! If you're interested in joining this vibrant community, you can find it on GitHub, Slack, Discourse, and the HAProxy mailing list.

How to get HAProxy 2.9

You can install HAProxy version 2.9 in any of the following ways:

More performance

HAProxy is the world’s fastest software load balancer, famously achieving 2 million RPS on a single Arm-based AWS Graviton2 Instance. In HAProxy 2.9, we’ve pushed performance efficiency even further, making cost-effective scalability even more achievable.

HTTP

The HTTP/2 implementation uses significantly less memory and is 40-60% more CPU-efficient on large transfers, thanks to faster buffer recycling that significantly increases the probability of performing zero-copy forwarding operations.

Previously, when data was available the receiving mux would allocate a buffer, put the data into it, then pass it along the chain. Some heuristics were in place to avoid copies by trying to only swap buffer pointers, but the data would still require space and cause L3 cache eviction, which degrades performance and increases DRAM bandwidth usage. 

The improvement comes from a change where the receiving side (typically the backend mux) will now request the sending side’s (frontend mux) buffer and receive the data directly into it, respecting flow control and available space. This avoids additional buffering which saves memory and preserves data in CPU caches. 

Currently TCP, HTTP/1, HTTP/2, and HTTP/3 benefit from this. Updates for the cache will come later. Zero-copy forwarding can be disabled using tune.disable-zero-copy-forwarding.
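If you need to rule out zero-copy forwarding while troubleshooting, a minimal sketch of turning it off looks like this. Only the directive name comes from this article; placing it alone in the global section is simply the smallest configuration that shows it.

```haproxy
global
    # Fall back to the previous buffer-copying behavior
    tune.disable-zero-copy-forwarding
```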

Threads

An upgrade of the locking code produced a performance gain of 2-4% on x86 systems and 14-33% on Arm systems. 

Pools

Measures to reduce contention on shared pools improved performance for HTTP/2 and HTTP/3. The HTTP/2 request rate improved from 1.5M to 6.7M requests/second on 80 threads without the shared cache, and from 368k to 4.7M requests/second with the shared cache.

Logs

In the log-forward section, the lock used in the part of the code related to updating the log server's index during sampling has been replaced with atomic operations to reduce bottlenecks and facilitate ~4x faster log production.

Health checks

HAProxy 2.9 improves upon the changes to health checks that were introduced in version 2.7. In version 2.7, health checks were modified to run on a single thread, which on systems with many threads reduced thread contention. However, that version also allowed for the health checks thread to offload its work if it ever became overloaded. 

In version 2.9, the health checks thread will be even more proactive about offloading work if it finds a less busy thread available. This retains the advantages of running health checks on a single thread, but with greater sensitivity to the amount of work that the thread has to do. This comes into play especially in configurations containing many backend servers and many health checks. You will see an improvement in particular when running a large number of TLS health checks, especially when using OpenSSL 3.x where the health checks are extremely expensive.

To fine tune this, the new tune.max-checks-per-thread global directive sets the number of active checks per thread above which the thread will actively try to find a less busy thread to run the check. You can also use spread-checks to slow down health checks on overloaded threads.
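As a rough sketch, the two settings could be combined as follows. The values 20 and 5 are illustrative assumptions, not recommendations; tune them against your own check volume.

```haproxy
global
    # Above 20 active checks, a thread looks for a less busy thread (illustrative value)
    tune.max-checks-per-thread 20
    # Add up to 5% random variation to check intervals (illustrative value)
    spread-checks 5
```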

TLS sessions are stored one per-thread in order to limit locking, but as a side effect, each thread had to perform its own handshake if no connection could be reused. This had potential to cause slow health checks, as the health checks could be responsible for as many handshakes as there are threads. Now a thread will automatically try to find an existing TLS session if it doesn't have one which aids in speeding up health checks convergence and reducing the load on servers when reloading the load balancer frequently.

For external checks, there is now an option to preserve environment variables by appending preserve-env after the external-check global directive so that scripts that depend on them can also work.
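A minimal sketch of an external check that keeps its environment might look like the following. The backend name, script path, and server address are hypothetical.

```haproxy
global
    # Allow external checks and keep existing environment variables for the script
    external-check preserve-env

backend be_app
    option external-check
    # Hypothetical script path; it can now read variables from HAProxy's environment
    external-check command /usr/local/bin/check_app.sh
    server app1 10.0.0.10:8080 check
```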

Maps

The http-request set-map and other set-map directives gained a performance boost. Previously, updating a large map would require a linear search of the verbatim entry that is called a reference. Frequent updates to large maps could cause significant latency and greatly increased CPU usage. 

For HAProxy 2.9, a tree was implemented for the reference, which uses a little more memory but makes the search time logarithmic during updates, so in practice these operations now complete in nearly constant time. A future update will use a more compact tree that will reduce the memory usage. The gain applies to the Lua function core.set_map as well.

Additionally, HAProxy will now avoid performing lookups in maps and ACLs known to be empty, saving CPU time.

Caching

Improvements to the caching storage mechanism to reduce its lock contention on large systems allow it to scale much more linearly with threads.

QUIC

QUIC now uses less memory by allowing for earlier release of resources associated with connections winding down. This improves performance at the termination of a connection when there’s no data left to send. 

Stick tables

The locking around stick tables was carefully refined by disentangling lookups from peer updates, and improving some undesired cache line sharing.

More flexibility

From community discussions, feedback, and user reviews, we see HAProxy used in a huge variety of different ways. We recapped a few of the use cases that users mentioned in recent G2 reviews. With broader load balancing and SSL/TLS support, plus a host of new options, HAProxy 2.9 is more flexible than ever, helping you to solve many more problems with one powerful multi-purpose tool.

Load balancing syslog

Previously, load balancing syslog required the use of sampling with log forwarding. While this provided a primitive form of load balancing for syslog, it was not ideal and could result in lost logs, as there was no way to verify whether the log servers were up. HAProxy now supports true load balancing for syslog traffic by using log backends, and provides health check mechanisms to ensure robust load balancing for logs. 

A log backend is a backend responsible for sending log messages to the log servers. To specify a backend as a log backend, specify mode log. Below, we show how to load balance HAProxy's own logs by setting our global log line to direct log messages to the log backend named mylog-rrb:

global
    log backend@mylog-rrb local0

backend mylog-rrb
    mode log
    balance roundrobin
    server s1 udp@10.0.0.1:514
    server s2 udp@10.0.0.2:514

Combining a log backend with the native logging infrastructure for relaying logs provided by log-forward produces an elegant solution for syslog load balancing. Below we demonstrate load balancing syslog messages coming in through the log-forward section named graylog using the log backend mylog-rrb:

log-forward graylog
    # Listen on Graylog ports
    bind :12201
    dgram-bind :12201
    log backend@mylog-rrb local0

backend mylog-rrb
    mode log
    balance roundrobin
    server log1 udp@10.0.0.1:514
    server log2 udp@10.0.0.2:514

Be sure to set up your remote syslog servers to expect logs with the facility code local0 to match the examples.

List each log server in the log backend that should receive log messages. Log backends support UDP servers by prefixing the server's address with the prefix udp@.

HAProxy 2.9 adds the ability to specify the load balancing algorithm to use for logs with the balance directive. Use this directive as you would use the balance directive for a regular backend, specifying roundrobin, random, log-hash, or sticky as the load balancing algorithm. 

Log backends support common backend and server features, but do not support the HTTP and TCP related features. You can enable health checks for the log servers too. Beware that there is no way to check a UDP syslog service, so you would need to check a TCP port on the server or configure agent checks. TCP is more reliable, and should work well for the majority of users, though its performance is generally lower due to extra copies and queuing costs.

You can tune the buffer size of the implicit ring associated with the log backend using the log-bufsize setting. A larger value for this setting may increase memory usage, but can help to prevent loss of log messages. 
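The sketch below shows one way health checks and a larger ring buffer could be combined on a TCP-based log backend. The backend name, addresses, and the 65536 value are assumptions for illustration, and the example assumes log-bufsize is given per server line.

```haproxy
backend mylog-tcp
    mode log
    balance roundrobin
    # No udp@ prefix: messages go over TCP, so a plain TCP health check applies
    server log1 10.0.0.1:514 check log-bufsize 65536
    server log2 10.0.0.2:514 check log-bufsize 65536
```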

For further updates, follow our progress for Improved log server management on GitHub.

QUIC enhancements

Limited QUIC with a TLS library that does not support QUIC

When built with a TLS library that does not support QUIC, quic bindings are rejected and HAProxy produces an error, but a limited QUIC implementation may be enabled using the new limited-quic setting. 

To use this, you must build HAProxy with USE_QUIC=1 and USE_QUIC_OPENSSL_COMPAT=1. As the name suggests, this is only compatible with OpenSSL. This mode is labeled as limited QUIC and has the following limitation:

  • 0-RTT is not supported in this mode. 

For QUIC support, we recommend using a TLS library with full QUIC support wherever possible, rather than OpenSSL.
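Assuming HAProxy was built with the OpenSSL compatibility layer described above, enabling limited QUIC could look like this sketch. The frontend name and certificate path are hypothetical.

```haproxy
global
    # Accept quic bindings even though the TLS library lacks native QUIC support
    limited-quic

frontend fe_quic
    mode http
    # Hypothetical certificate path
    bind quic4@:443 ssl crt /etc/haproxy/certs/site.pem alpn h3
```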

Listeners

Two settings, the quic-socket argument on a bind line and the global directive tune.quic.socket-owner, allow specifying whether connections will share a listener socket or each connection will allocate its own socket. Previously, only tune.quic.socket-owner was available, which could be applied to all listeners globally, but as of version 2.9 the setting can be applied for each bind line using quic-socket. The latter option is the default since it performs the best, but on systems that don't support all the necessary features in their UDP stack, HAProxy will switch to the shared listener mode. See the Linux capabilities section of this article, as socket-owner connection may require cap_net_bind_service to work as expected.
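As an illustration, the global default and a per-bind override might be combined as below. The frontend name, ports, and certificate path are hypothetical.

```haproxy
global
    # Default for all QUIC listeners: one socket per connection
    tune.quic.socket-owner connection

frontend fe_quic
    mode http
    # Inherits the global setting
    bind quic4@:443 ssl crt /etc/haproxy/certs/site.pem alpn h3
    # Overrides it: connections on this listener share the listener socket
    bind quic4@:8443 ssl crt /etc/haproxy/certs/site.pem alpn h3 quic-socket listener
```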

Performance tuning

Four new performance tuning settings affect QUIC in listener mode by changing the size of the receive and send buffers:

  • tune.rcvbuf.backend

  • tune.rcvbuf.frontend

  • tune.sndbuf.backend

  • tune.sndbuf.frontend
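A sketch of sizing the client-facing buffers follows. The 1 MB values are purely illustrative assumptions; suitable sizes depend on your traffic and kernel limits.

```haproxy
global
    # Client-facing socket buffers, in bytes (illustrative values)
    tune.rcvbuf.frontend 1048576
    tune.sndbuf.frontend 1048576
```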

Maximum concurrent connections

In this version, the global directive maxsslconn, which sets the maximum per-process number of concurrent TLS connections, now applies to QUIC connections as well. QUIC connections are also now counted against maxconn from the moment they are allocated. Previously, they had been counted only after handshake completion. Along with this, the number of concurrent handshakes and connections waiting to be accepted is now limited by the backlog directive. Finally, there is a new timeout client-hs directive, which prevents handshakes from running too long. This timeout is applied for handshakes on TCP, HTTP/1, and HTTP/2. It falls back to timeout client when not set.
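Putting these limits together, a configuration might look like the sketch below. All numeric values, the frontend name, and the certificate path are illustrative assumptions.

```haproxy
global
    maxconn 100000
    # Now also counts QUIC connections
    maxsslconn 50000

defaults
    mode http
    timeout client 30s
    # Cap handshake duration; falls back to "timeout client" when omitted
    timeout client-hs 5s

frontend fe_main
    bind :443 ssl crt /etc/haproxy/certs/site.pem
    bind quic4@:443 ssl crt /etc/haproxy/certs/site.pem alpn h3
    # Also bounds pending QUIC handshakes and not-yet-accepted connections
    backlog 10000
```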

QUIC with AWS-LC TLS library

QUIC now builds with the AWS-LC TLS library, minus a couple of features: 0-RTT, some ciphers based on chacha20 and aes128-ccm, and the client hello callback used to select between RSA and ECDSA certificates.

Signing algorithms and elliptic curves for TLS

Previously, HAProxy 2.8 added the client-sigalgs and sigalgs arguments to the bind line, giving you the ability to restrict which signature algorithms could be used with client certificate authentication and during a TLS handshake, disallowing weak cryptography such as SHA-1. In this release, you can now set these same arguments on server lines in a backend. This lets you restrict signature algorithms when connecting to backend servers, both when establishing a TLS connection and when using client certificates.

You can set allowed algorithms on each server line, set them for an entire backend by using a default-server line, or set them for all servers by using the global directives ssl-default-server-client-sigalgs and ssl-default-server-sigalgs.

Also in this release, the server directive gained the curves argument, which lets you specify which elliptic curves to use when sending the ClientHello message to a backend server during a TLS handshake. You can either set this list per server or set it globally with the ssl-default-server-curves directive. A similar argument already exists for the bind line.
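The following sketch sets global defaults and then narrows them on one server line. The algorithm and curve lists, backend name, addresses, and CA file path are example values, not recommendations.

```haproxy
global
    # Defaults for every server line that uses TLS (example lists)
    ssl-default-server-sigalgs ECDSA+SHA256:RSA+SHA256
    ssl-default-server-curves X25519:P-256

backend be_secure
    # Per-server override of the global defaults
    server app1 10.0.0.10:443 ssl verify required ca-file /etc/haproxy/ca.pem sigalgs ECDSA+SHA256 curves X25519
    # Uses the global defaults
    server app2 10.0.0.11:443 ssl verify required ca-file /etc/haproxy/ca.pem
```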

OpenSSL, WolfSSL, and AWS-LC

OpenSSL, which has long been the TLS library we recommend, has dropped support for version 1.1.1 of its library, leaving only the 3.x version. Because the 3.x version has shown poor performance compared to the older 1.1.1 release, and also because it lacks QUIC protocol support, HAProxy contributors have been working to implement alternatives. 

HAProxy 2.9 adds the AWS-LC library to the list, and the latest version of WolfSSL also provides good compatibility. Check the install guide in GitHub to learn how to build HAProxy with WolfSSL or AWS-LC. The support status of these libraries changes rapidly, and as such, you should always check the wiki article to verify the status of supported TLS libraries. This article is updated regularly to accurately reflect the updated support status.

Linux capabilities

The new global directive setcap allows you to preserve previously set Linux capabilities after the HAProxy process starts up. After startup, HAProxy typically switches to a non-root user and loses these capabilities. The global directives that set which user and group to run as are user and group.

In Linux, capabilities are permissions you can grant directly to a program that it would not otherwise have when running as a non-root user. For example, without running as the root user, a program would not ordinarily be able to bind to an IP address not assigned to the server. By granting it the cap_net_raw capability it can, allowing use cases like transparent proxying.

As a real-world example, in full transparent proxy mode, the source directive's usesrc argument allows HAProxy to connect to servers using the client's IP address as the source address, rather than the load balancer's own. In other words, it sets a source IP address that does not belong to the system itself. This requires elevated privileges or for you to pass the cap_net_raw capability to HAProxy with the setcap global directive.

Multiple capabilities may be specified, such as cap_net_bind_service which allows binding to a privileged port for QUIC.
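As a sketch, keeping both capabilities after dropping root privileges and then using transparent proxying could look like this. The user and group names, backend name, and server address are assumptions.

```haproxy
global
    user haproxy
    group haproxy
    # Keep these capabilities after switching to the non-root user
    setcap cap_net_raw,cap_net_bind_service

backend be_transparent
    # Connect to the server using the client's IP as the source address
    source 0.0.0.0 usesrc clientip
    server app1 10.0.0.10:8080
```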

Reverse HTTP

An experimental feature in HAProxy 2.9 enables something we're calling reverse HTTP. The idea is for a datacenter load balancer to initiate an outbound connection to an edge load balancer, and then the edge load balancer converts that connection to be bidirectional. These bidirectional connections are put into a pool of idle connections. When clients make requests, the edge load balancer sends them over these connections to the datacenter load balancer.

Comparing this to a traditional load balancer setup, where a backend server pool consists of a static list of IP addresses, this approach looks more like dynamic service discovery. The datacenter load balancers register themselves with the edge load balancer by initiating connections to it, which puts them into the server pool.

This feature relies on HTTP/2 over TLS and uses client certificate authentication when registering. Coincidentally, a similar design was published by the IETF HTTP working group only hours after we finished our first design meeting. There are subtle differences but we're exchanging with the draft's authors and intend for our designs to converge; as such, the Reverse HTTP feature must be considered experimental until the draft becomes a standard, so that we can adjust the protocol for better interoperability.

Use cases for reverse HTTP include:

  • a new method of service discovery where HAProxy instances self-register with a public-facing, edge load balancer. This could enable self-service publishing of applications in organizations where teams manage their own apps, but not the load balancing infrastructure.

  • publishing applications running on an internal network behind NAT. These applications, which would be behind a local HAProxy instance, would connect to a load balancer that has a public IP address. Clients would then connect to these applications through the public-facing load balancer by requesting the application's unique FQDN, where FQDNs differentiate connections idling in the server pool.

  • the ability for mobile application developers to test their applications from their smartphone over the internet while their applications run in debug mode on their PC.

  • tech support situations where customers need to be able to download software directly from or upload traces directly to their PC.

More customization

HAProxy gives you the power to customize its operation so that it behaves exactly as you need it to. This approach starts with open source and extends to every corner of the configuration, from traffic shaping to the Lua event framework. In HAProxy 2.9, we’ve given you even more options, providing you even more opportunities to get the unique load balancing experience you need.

Hash-type none

HAProxy 2.9 adds a new hash function called none to the hash-type directive. Before, you had the choice of several hash functions including sdbm, djb2, and others. By using none, you can manually hash the key using a converter and then have HAProxy use the result directly.

Recall that one way to load balance traffic is to generate a hash of some part of the request, for example, the URL path, and associate that hash with a server in the backend. Often used for load balancing cache servers, hash-based load balancing ensures that traffic for a given resource, such as a video, an image, etc., will go to the same server where that resource was cached.

HAProxy 2.6 added flexibility to this approach by providing a generic hash load balancing algorithm as a replacement for the more specific algorithms source, uri, hdr, url_param, and rdp-cookie. The generic algorithm lets you generate a hash from the output of any fetch method. Combining this with the new none function gives you ultimate flexibility in deciding what and how to hash.
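For instance, the sketch below hashes the URL path with the xxh32 converter and tells HAProxy to use that value as-is. The choice of xxh32, the consistent method, the backend name, and the server addresses are all assumptions made for illustration.

```haproxy
backend be_cache
    # Hash the path ourselves with a converter...
    balance hash path,xxh32
    # ...and have HAProxy use the result directly instead of hashing it again
    hash-type consistent none
    server cache1 10.0.0.21:8080
    server cache2 10.0.0.22:8080
```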

Proxy Protocol TLV fields

The Proxy Protocol enables you to preserve a client's IP address when relaying traffic through an HAProxy load balancer. Otherwise the IP address is replaced by the load balancer's own address. It works by including the client's IP address in a header attached to the TCP connection. It is supported by a number of web servers and proxies.

The protocol supports attaching additional information, beyond the IP address, in what are called Type-Length-Value (TLV) fields. For example, AWS Elastic Load Balancer uses TLVs to send the VPC endpoint ID for traffic routed through a VPC endpoint service.

In this version of HAProxy, you can: 

  • set new TLVs by adding the set-proxy-v2-tlv-fmt argument on a server line. 

  • use the fc_pp_tlv fetch method to extract TLVs from the protocol header. 

  • use set-proxy-v2-tlv-fmt to forward TLVs you receive with fc_pp_tlv.
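The three capabilities above can be sketched in one configuration. The TLV ID 0xE1 is a hypothetical custom type, and the proxy names, port, and address are assumptions.

```haproxy
frontend fe_pp
    mode http
    # Accept the Proxy Protocol header from the upstream load balancer
    bind :8080 accept-proxy
    # Extract the TLV for use elsewhere, for example in logs
    http-request set-var(txn.custom_tlv) fc_pp_tlv(0xE1)
    default_backend be_app

backend be_app
    mode http
    # Forward the received TLV unchanged to the next hop
    server app1 10.0.0.10:8080 send-proxy-v2 set-proxy-v2-tlv-fmt(0xE1) %[fc_pp_tlv(0xE1)]
```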

Caching supports Origin header in Vary

This version updates HAProxy's small object cache to support the Origin header in Vary. Previously, the implementation supported varying on Accept-Encoding and Referer only. 

By supporting the Origin header in Vary, you can cache CORS responses. Before this, HAProxy would return cached responses that were missing CORS headers or had CORS headers that didn't match the client, resulting in an error client-side. Users who tried to fix this by adding Vary: Origin would have found that none of those responses got cached, since the HAProxy implementation didn't support Origin.
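A minimal cache setup that honors Vary might look like the sketch below. The cache name, sizes, and server address are illustrative, and it assumes the application sends Vary: Origin on its CORS responses.

```haproxy
cache cors_cache
    # Size in megabytes and max age in seconds (illustrative values)
    total-max-size 64
    max-age 60
    # Take the Vary header into account when storing and looking up responses
    process-vary on

backend be_api
    mode http
    http-request cache-use cors_cache
    http-response cache-store cors_cache
    server api1 10.0.0.30:8080
```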

Lua features

Lua includes several new features:

  • A new function named core.get_var(var_name) returns proc scoped variables. The older txn:get_var(var_name) returns txn scoped variables.

  • The httpclient class now supports a retries field, which sets the number of times to retry a failed request.

  • The httpclient class now supports a timeout.connect field for setting the maximum time to wait for establishing a connection, in milliseconds. It defaults to 5 seconds.

  • The core.register_action function now supports http-after-res.

Fragment references

Per RFC 3986, the HTTP protocol does not permit certain characters in URIs sent to servers. One example of a character that browsers should not send to the server or load balancer as part of the URI is the hash mark #, which indicates a URI fragment. Fragments are parsed and interpreted client-side. It may be the case, however, that a buggy client or server will send such data, and HAProxy offers two directives that will allow these requests: option accept-invalid-http-request and option accept-invalid-http-response. When enabled, these options relax HAProxy’s header name parsing and allow invalid characters to be present in header names and URIs. This means that when option accept-invalid-http-request is enabled in HAProxy 2.9, incoming URIs are allowed to contain fragment references.

These options should never be enabled by default, however, and you should use them only once you've confirmed a specific problem. 

When enabled, requests with erroneous fragment references are captured for later analysis. You can use the show errors Runtime API command to see these requests. 

Converters

Date format converters

Four new converters assist in allowing any date format to be used in logs.

  • ms_ltime

  • ms_utime

  • us_ltime

  • us_utime

ms_ltime and ms_utime work like their ltime and utime counterparts but take input in milliseconds. us_ltime and us_utime work like their ltime and utime counterparts but take input in microseconds. These converters translate integers (containing a date since epoch) into a date format string.
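As a sketch, the millisecond variant can format the accept date, passed in milliseconds, as a UTC timestamp. The header name, frontend name, and format string are illustrative choices.

```haproxy
frontend fe_main
    mode http
    bind :80
    # accept_date(ms) yields milliseconds since epoch; ms_utime formats it in UTC
    http-request set-header X-Accept-Time %[accept_date(ms),ms_utime(%Y-%m-%dT%H:%M:%S)]
```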

Bytes converter

bytes can now take its offset and length from variables as well as literal integers, enabling it to extract contents from more complex protocols such as those that contain TLVs, so you can skip past some TLVs and read others.

JSON query converter

HAProxy now supports retrieving JSON arrays from the request body via the json_query converter. This converter supports the JSON Path syntax, which includes the JSON types string, boolean, number, and now array.
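A minimal sketch of extracting an array follows, assuming a request body such as {"tags": ["a", "b"]}. The field name tags, the variable name, and the frontend name are hypothetical.

```haproxy
frontend fe_api
    mode http
    bind :80
    # Wait for the request body so that req.body is available
    option http-buffer-request
    # Store the JSON array found at $.tags
    http-request set-var(txn.tags) req.body,json_query('$.tags')
```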

More visibility

HAProxy is valued for its transparency, robust logging and stats capabilities. In HAProxy 2.9, we’ve provided more methods to fetch the information you need and post logs from Lua scripts, plus helpful hints and analysis to aid debugging. This improves usability, and makes troubleshooting and planning easier.

Fetch methods for log information

New fetch methods make it easier to access information that had previously been available only within the access logs or had been complex to work with using log-format variables: 

pid

The process ID of the current process which is usually the worker process. 

act_conn

The total number of active, concurrent connections on the process.

bytes_in

The number of bytes uploaded from the client to the load balancer. 

bytes_out

The number of bytes transmitted from the load balancer to the client.

accept_date

The exact date when the connection was received by HAProxy.

request_date

The value for the exact date when HAProxy received the first byte of the HTTP request.

fc.timer.handshake

The time spent accepting the TCP connection and executing handshakes. Equivalent to %Th.

fc.timer.total

The total session duration. Equivalent to %Tt.

req.timer.idle

The idle time before the request. Equivalent to %Ti.

req.timer.hdr

The time spent waiting to get the client's request. Equivalent to %TR.

req.timer.tq

The sum of %Th, %Ti and %TR. Equivalent to %Tq.

req.timer.queue

The time spent queued. Equivalent to %Tw.

bc.timer.connect

The time spent establishing a connection to the backend server. Equivalent to %Tc.

res.timer.data

The time spent transferring the response payload to the client. Equivalent to %Td.

res.timer.hdr

The time spent waiting for the server to send a full response. Equivalent to %Tr.

txn.timer.user

The estimated time as seen by the client. Equivalent to %Tu.

txn.timer.total

The active time for the HTTP request. Equivalent to %Ta.

Originally, this information was available only from within your access logs. You could define a custom log format that included variables with terse names like %Th (connection handshake time) to capture access log values.

Then, in version 2.5, the http-response set-var-fmt directive was added, which let you capture the information in variables for use in other parts of your configuration. For example, below, we store %Tw (time waiting in queue), %Tc (time waiting to connect to the server), and %Tr (time waiting for the server to send the response) as variables and then add them up to get the total response time:

http-response set-var-fmt(txn.queue_time) %Tw
http-response set-var-fmt(txn.connect_time) %Tc
http-response set-var-fmt(txn.response_time) %Tr
http-response set-var(txn.total_response_time) var(txn.queue_time),add(txn.connect_time),add(txn.response_time)

This syntax is still the way to go for performing math operations on variables, since the add converter expects a variable name as an argument, not a fetch method. However, in certain parts of the configuration using a fetch method is more convenient than using a log-format variable, as in the example below where we increase the log level for any request that experienced a slow response:

http-after-response set-log-level err if { txn.timer.user ge 5000 }

ACL fetches

A new fetch allows you to gain insight about defined ACLs: 

  • acl evaluates up to 12 named ACLs and returns a boolean. Provide the ACLs in a list with a comma between each one. You can use the ! operator to invert the result of an ACL.

    This example shows the syntax:

acl(!is_malware,goodguys,auth_ok)
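In context, the fetch could be used as in the sketch below, which stores the combined result in a variable. The three ACL definitions, the list file path, and the variable name are hypothetical.

```haproxy
frontend fe_main
    mode http
    bind :80
    # Hypothetical ACL definitions
    acl is_malware src -f /etc/haproxy/malware.lst
    acl goodguys src 10.0.0.0/8
    acl auth_ok req.hdr(Authorization) -m found
    # True only when is_malware is false and the other two ACLs are true
    http-request set-var(txn.trusted) acl(!is_malware,goodguys,auth_ok)
```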

Layer 7 fetches

Two new fetches at Layer 7, the application layer, provide more information about HTTP contents:  

  • req.cook_names returns the names of all cookies in requests.

  • res.cook_names returns the names of all cookies in responses. 
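As a small sketch, both fetches can be captured into variables for later use in logs or rules. The variable and frontend names are assumptions.

```haproxy
frontend fe_main
    mode http
    bind :80
    # Names of the cookies sent by the client
    http-request set-var(txn.req_cookie_names) req.cook_names
    # Names of the cookies set by the server
    http-response set-var(txn.res_cookie_names) res.cook_names
```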

Layer 4 fetches

Four new fetches at Layer 4, the transport layer closest to the connection, provide more information about connections:

  • ssl_bc_curve retrieves the name of the curve used in the key agreement when the outgoing connection was made over an SSL/TLS transport layer. 

  • ssl_fc_curve does the same for the incoming connection. 

  • cur_client_timeout retrieves the value in milliseconds for the currently configured client timeout.

  • fc_pp_tlv returns the TLV value for a given TLV ID. This fetch may be useful in detecting errors related to duplicate TLV IDs. 
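For example, the connection-level fetches could be surfaced as in this sketch. The header name, variable name, frontend name, and certificate path are illustrative assumptions.

```haproxy
frontend fe_tls
    mode http
    bind :443 ssl crt /etc/haproxy/certs/site.pem
    # Pass the negotiated key-agreement curve of the client connection to the server
    http-request set-header X-TLS-Curve %[ssl_fc_curve]
    # Record the client timeout currently in effect, in milliseconds
    http-request set-var(txn.client_timeout) cur_client_timeout
```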

Lua logging

Two new global directives give you precise control over logging from your Lua scripts. 

  • Setting tune.lua.log.loggers to on enables sending log messages from your Lua script to the loggers you've configured with log lines. Setting it to off disables the logging. The default value is on.

  • Setting tune.lua.log.stderr to on will send log messages to stderr. When set to auto, logging to stderr occurs when tune.lua.log.loggers is set to off or when there's no proxy-level logger assigned, such as for Lua tasks that run in the background. The default value is auto.
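A sketch of the two directives with their default values spelled out explicitly follows. The script path and syslog address are hypothetical.

```haproxy
global
    log 127.0.0.1:514 local0
    # Send Lua log messages to the configured loggers (default: on)
    tune.lua.log.loggers on
    # Use stderr only when no logger applies (default: auto)
    tune.lua.log.stderr auto
    # Hypothetical script path
    lua-load /etc/haproxy/app.lua
```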

In your Lua script, use any of the following functions to write to the log:

  • core.Info("message")

  • core.Warning("message")

  • core.Alert("message")

Debugging hints

Debug mechanisms have been improved to help reduce the number of round trips between users and developers during troubleshooting sessions. To accomplish this, panic dumps may now show better hints and likely causes for certain situations. 

For example, if a thread is waiting on the Lua lock while the lua-load directives are in use, a message regarding trying lua-load-per-thread will be emitted. Another case may be the situation where a watchdog triggers inside Lua, upon which some possible causes will be proposed, including the case where perhaps the script depends on some unsafe external library. 

Tasks and memory profiling now indicate over what period the measures were taken, suspected memory leaks can now be tracked to the line of code that allocated the object, spurious task wakeups can also be tracked back to the line of code that created the timer, stream dumps can be limited to suspicious ones or to old ones, and some extra developer info can be dumped to show operating system or container specificities that are not easy to spot otherwise.

CPU binding warnings

There are two cases where configurations with inconsistent CPU bindings may result in contention and performance degradation:

  1. Thread sets are bound to smaller CPU sets, which causes contention.

  2. Only some (but not all) threads are referenced in the cpu-map directive. This may lead to the unreferenced threads sharing the same CPUs.

A warning will be produced if either situation is present in the configuration.
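By contrast, a consistent binding references every thread and gives each one its own CPU. The sketch below assumes a machine with at least four CPUs numbered 0 to 3.

```haproxy
global
    nbthread 4
    # Bind threads 1-4 of thread group 1 to CPUs 0-3, one CPU per thread
    cpu-map auto:1/1-4 0-3
```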

Conclusion

HAProxy 2.9 is faster, more flexible, and more observable than ever before. 

This has been made possible by a long list of contributors, all providing invaluable work and collaboration. On behalf of the whole HAProxy community, we thank everyone who contributed to HAProxy 2.9.
Ready to upgrade to HAProxy 2.9? Here’s how to get started.
