HAProxy 2.9 is now available.
Watch the webinar HAProxy 2.9 Feature Roundup to learn more about this release.
HAProxy 2.9 further extends HAProxy's performance, flexibility, and observability.
This is the fastest HAProxy yet, with improved performance efficiency in HTTP/2, threads, shared pools, logs, health checks, maps, caching, stick tables, and QUIC.
More flexibility with syslog load balancing, more QUIC options, more SSL/TLS libraries, Linux capabilities, and an experimental new interaction between data center load balancers and edge load balancers we call “Reverse HTTP”.
You can customize your HAProxy even more to get exactly the behavior you want, with a new none hash type for custom hash-based load balancing, new functions for working with Proxy Protocol TLV fields, more caching options, more Lua options, and new converters.
You can get more visibility into the operation of HAProxy, and get more helpful information for simpler decision-making. This includes new fetch methods, more options for Lua logging, debugging hints, and warnings for inconsistent CPU bindings.
This release came together through the efforts of all the community members who got involved. A release like this requires feature requests, bug reports, forum discussions, code submissions, QA tests, and documentation! In other words, this project is fueled by people like you! If you're interested in joining this vibrant community, it can be found on GitHub, Slack, Discourse, and the HAProxy mailing list.
How to get HAProxy 2.9
You can install HAProxy version 2.9 in any of the following ways:
Run it as a Docker container. View the Docker installation instructions.
Compile it from source. View the compilation instructions.
More performance
HAProxy is the world’s fastest software load balancer, famously achieving 2 million RPS on a single Arm-based AWS Graviton2 Instance. In HAProxy 2.9, we’ve pushed performance efficiency even further, making cost-effective scalability even more achievable.
HTTP
The HTTP/2 implementation uses significantly less memory and is up to 40-60% more CPU-efficient on large transfers. This comes from faster buffer recycling, which greatly increases the probability of performing zero-copy forwarding operations.
Previously, when data was available the receiving mux would allocate a buffer, put the data into it, then pass it along the chain. Some heuristics were in place to avoid copies by trying to only swap buffer pointers, but the data would still require space and cause L3 cache eviction, which degrades performance and increases DRAM bandwidth usage.
The improvement comes from a change where the receiving side (typically the backend mux) will now request the sending side’s (frontend mux) buffer and receive the data directly into it, respecting flow control and available space. This avoids additional buffering which saves memory and preserves data in CPU caches.
Currently TCP, HTTP/1, HTTP/2, and HTTP/3 benefit from this. Updates for the cache will come later. Zero-copy forwarding can be disabled using the tune.disable-zero-copy-forwarding global directive.
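A minimal sketch of where this setting goes. It is a global directive that takes no argument, and zero-copy forwarding stays enabled unless you set it. Disabling it is mostly useful when you suspect a problem related to the feature.

```haproxy
global
    # Turn off zero-copy forwarding for all muxes (enabled by default in 2.9)
    tune.disable-zero-copy-forwarding
```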
Threads
An upgrade of the locking code produced a performance gain of 2-4% on x86 systems and 14-33% on Arm systems.
Pools
Measures to reduce contention on shared pools improved performance for HTTP/2 and HTTP/3. The HTTP/2 request rate improved from 1.5M to 6.7M requests/second on 80 threads without the shared cache, and from 368k to 4.7M requests/second with the shared cache.
Logs
In the log-forward section, the lock protecting the log server index updated during sampling has been replaced with atomic operations. This reduces bottlenecks and makes log production roughly 4x faster.
Health checks
HAProxy 2.9 improves upon the changes to health checks that were introduced in version 2.7. In version 2.7, health checks were modified to run on a single thread, which reduced thread contention on systems with many threads. That version also allowed the health checks thread to offload its work if it ever became overloaded.
In version 2.9, the health checks thread will be even more proactive about offloading work if it finds a less busy thread available. This retains the advantages of running health checks on a single thread, but with greater sensitivity to the amount of work that the thread has to do. This comes into play especially in configurations containing many backend servers and many health checks. You will see an improvement in particular when running a large number of TLS health checks, especially when using OpenSSL 3.x where the health checks are extremely expensive.
To fine-tune this, the new tune.max-checks-per-thread global directive sets the number of active checks per thread above which the thread will actively try to find a less busy thread to run the check. You can also use spread-checks to slow down health checks on overloaded threads.
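The two directives might be combined as in the sketch below. The threshold of 500 checks and the 5% jitter are illustrative values, not recommendations, and the server addresses are placeholders. `verify none` only keeps the sketch short. In production you would verify server certificates.

```haproxy
global
    # Above 500 active checks, a thread tries to hand new checks to a less busy thread
    tune.max-checks-per-thread 500
    # Add up to 5% of random jitter to check intervals (accepted range is 0..50)
    spread-checks 5

backend app
    # TLS health checks, the expensive case described above
    server s1 10.0.0.11:443 ssl verify none check inter 2s
    server s2 10.0.0.12:443 ssl verify none check inter 2s
```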
TLS sessions are stored per thread to limit locking. As a side effect, each thread had to perform its own handshake if no connection could be reused. This could slow down health checks, since they could trigger as many handshakes as there are threads. Now, a thread that doesn't have a TLS session will automatically try to reuse an existing one. This speeds up health check convergence and reduces the load on servers when the load balancer is reloaded frequently.
For external checks, there is now an option to preserve environment variables. Append preserve-env to the external-check global directive so that scripts that depend on those variables also work.
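A sketch of an external check that keeps the environment. The script path and server address are hypothetical. The sketch also assumes that external checks still require `insecure-fork-wanted`, because they fork a process.

```haproxy
global
    # External checks fork a process, so forking must be explicitly allowed
    insecure-fork-wanted
    # Enable external checks and keep HAProxy's environment variables for the script
    external-check preserve-env

backend app
    option external-check
    # Hypothetical script that exits 0 when the server is healthy
    external-check command /usr/local/bin/check-app.sh
    server s1 10.0.0.11:8080 check
```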
Maps
The http-request set-map directive and the other set-map directives gained a performance boost. Previously, updating a large map required a linear search for the verbatim entry, which is called a reference. Frequent updates to large maps could cause significant latency and greatly increased CPU usage.
For HAProxy 2.9, a tree was implemented for the reference. It uses a little more memory, but it makes the search time during updates logarithmic rather than linear, so in practice these operations now run in nearly constant time. A future update will use a more compact tree to reduce memory usage. The gain applies to the Lua function core.set_map as well.
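For reference, a typical set-map update looks like the sketch below. It records the most recent client IP seen for each Host header, which on a busy site means frequent updates to a potentially large map. The map file path is hypothetical and is assumed to exist at startup. A backend named `app` is assumed to be defined elsewhere.

```haproxy
frontend www
    mode http
    bind :80
    # Update (or add) the entry keyed by Host with the client's source address
    http-request set-map(/etc/haproxy/last-client.map) %[req.hdr(host)] %[src]
    default_backend app
```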
Additionally, HAProxy will now avoid performing lookups in maps and ACLs known to be empty, saving CPU time.
Caching
Improvements to the caching storage mechanism to reduce its lock contention on large systems allow it to scale much more linearly with threads.
QUIC
QUIC now uses less memory by allowing for earlier release of resources associated with connections winding down. This improves performance at the termination of a connection when there’s no data left to send.
Stick tables
The locking around stick tables was carefully refined by disentangling lookups from peer updates and by reducing some undesired cache line sharing.
More flexibility
From community discussions, feedback, and user reviews, we see HAProxy used in a huge variety of different ways. We recapped a few of the use cases that users mentioned in recent G2 reviews. With broader load balancing and SSL/TLS support, plus a host of new options, HAProxy 2.9 is more flexible than ever, helping you to solve many more problems with one powerful multi-purpose tool.
Load balancing syslog
Previously, load balancing syslog required the use of sampling with log forwarding. While this provided a primitive form of load balancing for syslog, it was not ideal and could result in lost logs, as there was no way to verify whether the log servers were up. HAProxy now supports true load balancing for syslog traffic by using log backends, and provides health check mechanisms to ensure robust load balancing for logs.
A log backend is a backend responsible for sending log messages to the log servers. To make a backend a log backend, set mode log. Below, we show how to load balance HAProxy's own logs by setting our global log line to direct log messages to the log backend named mylog-rrb:
global
    log backend@mylog-rrb local0

backend mylog-rrb
    mode log
    balance roundrobin
    server s1 udp@10.0.0.1:514
    server s2 udp@10.0.0.2:514
Combining a log backend with log-forward, HAProxy's native infrastructure for relaying logs, produces an elegant solution for syslog load balancing. Below, we load balance syslog messages arriving through the log-forward section named graylog using the log backend mylog-rrb:
log-forward graylog
    # Listen on Graylog ports
    bind :12201
    dgram-bind :12201
    log backend@mylog-rrb local0

backend mylog-rrb
    mode log
    balance roundrobin
    server log1 udp@10.0.0.1:514
    server log2 udp@10.0.0.2:514
Be sure to set up your remote syslog servers to expect logs with the facility code local0 to match the examples.
List each log server that should receive log messages in the log backend. Log backends support UDP servers when you prefix the server's address with udp@.
HAProxy 2.9 adds the ability to specify the load balancing algorithm for logs with the balance directive. Use this directive as you would use the balance directive in a regular backend, specifying roundrobin, random, log-hash, or sticky as the load balancing algorithm.
Log backends support common backend and server features, but do not support the HTTP and TCP related features. You can enable health checks for the log servers too. Beware that there is no way to check a UDP syslog service, so you would need to check a TCP port on the server or configure agent checks. TCP is more reliable, and should work well for the majority of users, though its performance is generally lower due to extra copies and queuing costs.
You can tune the buffer size of the implicit ring associated with the log backend using the log-bufsize
setting. A larger value for this setting may increase memory usage, but can help to prevent loss of log messages.
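A sketch that extends the earlier mylog-rrb backend with health checks and a larger ring buffer. It assumes each log server also runs a TCP listener on port 601, a hypothetical choice that gives the check something to connect to. It also assumes log-bufsize is set in the backend section and takes a size in bytes. The 65536 value is illustrative.

```haproxy
backend mylog-rrb
    mode log
    balance roundrobin
    # Larger implicit ring buffer to absorb bursts of log messages
    log-bufsize 65536
    # UDP syslog cannot be health checked, so probe a TCP port on each server
    server log1 udp@10.0.0.1:514 check port 601
    server log2 udp@10.0.0.2:514 check port 601
```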
For further updates, follow our progress for Improved log server management on GitHub.
QUIC enhancements
Limited QUIC with a TLS library that does not support QUIC
When HAProxy is built with a TLS library that does not support QUIC, quic bindings are rejected and HAProxy produces an error. However, you can enable a limited QUIC implementation with the new limited-quic global setting.
To use this, you must build HAProxy with USE_QUIC=1 and USE_QUIC_OPENSSL_COMPAT=1. As the name suggests, this is only compatible with OpenSSL. This mode is labeled limited QUIC and has the following limitation:
0-RTT is not supported in this mode.
For QUIC support, we recommend using a TLS library with full QUIC support wherever possible, rather than OpenSSL.
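A sketch of the limited QUIC setup described above. The build command is shown as a comment, and the `linux-glibc` target is an assumption about your platform. The certificate path and backend name are placeholders. The alt-svc header tells browsers that first connect over TCP that HTTP/3 is available on UDP port 443.

```haproxy
# Build (run in the HAProxy source tree):
#   make TARGET=linux-glibc USE_OPENSSL=1 USE_QUIC=1 USE_QUIC_OPENSSL_COMPAT=1

global
    # Allow quic bind lines when using OpenSSL's QUIC compatibility layer
    limited-quic

frontend www
    mode http
    bind :443 ssl crt /etc/haproxy/site.pem alpn h2,http/1.1
    bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3
    # Advertise HTTP/3 to clients that arrive over TCP
    http-response set-header alt-svc "h3=\":443\";ma=900;"
    default_backend app
```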
Listeners
Two settings, the quic-socket argument on a bind line and the global directive tune.quic.socket-owner, control whether connections share a listener socket or each connection allocates its own socket. Previously, only tune.quic.socket-owner was available, and it applied to all listeners globally. As of version 2.9, you can set this per bind line with quic-socket. Allocating a socket per connection is the default because it performs best. On systems whose UDP stack doesn't support all the necessary features, HAProxy switches to the shared listener mode instead. See the Linux capabilities section of this article, because the connection mode of socket-owner may require cap_net_bind_service to work as expected.
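A sketch that keeps the per-connection default globally and overrides it for a single listener. The certificate path is a placeholder.

```haproxy
global
    # Default for all QUIC listeners: one socket per connection
    tune.quic.socket-owner connection

frontend www
    mode http
    # Override for this listener only: all connections share the listener socket
    bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3 quic-socket listener
```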
Performance tuning
Four new performance tuning settings affect QUIC in listener mode by changing the size of the socket receive and send buffers:
tune.rcvbuf.backend
tune.rcvbuf.frontend
tune.sndbuf.backend
tune.sndbuf.frontend
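These are global directives. The sketch below assumes each one takes a buffer size in bytes. The 1 MB values are purely illustrative, and the kernel may cap what it actually grants.

```haproxy
global
    # Socket receive/send buffer sizes for frontend and backend sockets
    tune.rcvbuf.frontend 1048576
    tune.sndbuf.frontend 1048576
    tune.rcvbuf.backend  1048576
    tune.sndbuf.backend  1048576
```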
Maximum concurrent connections
In this version, the global directive maxsslconn, which sets the maximum per-process number of concurrent TLS connections, now applies to QUIC connections as well. QUIC connections are also now counted in maxconn from the moment they are allocated. Previously, they were counted only after the handshake completed. In addition, the backlog directive now limits the number of concurrent handshakes and connections waiting to be accepted. Finally, a new timeout client-hs directive limits how long a handshake may run. This timeout applies to handshakes on TCP, HTTP/1, and HTTP/2, and falls back to timeout client when not set.
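A sketch of how these limits might fit together. All numbers are illustrative, and the certificate path is a placeholder.

```haproxy
global
    maxconn 20000
    # Now also caps concurrent QUIC connections
    maxsslconn 10000

frontend www
    mode http
    bind quic4@:443 ssl crt /etc/haproxy/site.pem alpn h3
    # Limits in-progress handshakes and connections waiting to be accepted
    backlog 1024
    # Maximum time allowed for a client handshake (falls back to timeout client if unset)
    timeout client-hs 5s
    timeout client 30s
```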
QUIC with AWS-LC TLS library
QUIC now builds with the AWS-LC TLS library, minus a few features: 0-RTT, some ciphers based on chacha20 and aes128-ccm, and the client hello callback used to select between RSA and ECDSA certificates.
Signing algorithms and elliptic curves for TLS
HAProxy 2.8 added the client-sigalgs and sigalgs arguments to the bind line. These let you restrict which signature algorithms can be used with client certificate authentication and during a TLS handshake, which disallows weak cryptography such as SHA-1. In this release, you can set the same arguments on server lines in a backend. This lets you restrict signature algorithms when connecting to backend servers, both when establishing a TLS connection and when using client certificates.
You can set allowed algorithms on each server line, for an entire backend with a default-server line, or for all servers with the global directives ssl-default-server-client-sigalgs and ssl-default-server-sigalgs.
Also in this release, the server directive gained the curves argument. It lets you specify which elliptic curves to offer in the ClientHello message sent to a backend server during a TLS handshake. You can set this list per server or globally with the ssl-default-server-curves directive. A similar argument already exists for the bind line.
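The server-side TLS restrictions described in this section might be combined as in the sketch below. Settings move from global defaults, to backend-wide defaults, to per-server overrides. The algorithm and curve lists use OpenSSL's list syntax. The certificate paths and addresses are placeholders.

```haproxy
global
    # Defaults for every server line
    ssl-default-server-sigalgs ECDSA+SHA256:RSA-PSS+SHA256
    ssl-default-server-curves X25519:P-256

backend app
    # Backend-wide defaults, overriding the global ones
    default-server ssl verify required ca-file /etc/haproxy/ca.pem sigalgs ECDSA+SHA256:RSA-PSS+SHA256
    server s1 10.0.0.11:443
    # Per-server client certificate with restricted algorithms and curves
    server s2 10.0.0.12:443 crt /etc/haproxy/client.pem client-sigalgs ECDSA+SHA256 curves X25519
```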
OpenSSL, WolfSSL, and AWS-LC
OpenSSL, which has long been the TLS library we recommend, has dropped support for version 1.1.1 of its library, leaving only the 3.x version. Because the 3.x version has shown poor performance compared to the older 1.1.1 release, and also because it lacks QUIC protocol support, HAProxy contributors have been working to implement alternatives.
HAProxy 2.9 adds the AWS-LC library to the list, and the latest version of WolfSSL also provides good compatibility. Check the install guide in GitHub to learn how to build HAProxy with WolfSSL or AWS-LC. The support status of these libraries changes rapidly, and as such, you should always check the wiki article to verify the status of supported TLS libraries. This article is updated regularly to accurately reflect the updated support status.
Linux capabilities
The new global directive setcap lets you preserve previously set Linux capabilities after the HAProxy process starts up. After startup, HAProxy typically switches to a non-root user and loses these capabilities. The user and group global directives set which user and group HAProxy runs as.
In Linux, capabilities are permissions you can grant directly to a program that it would not otherwise have when running as a non-root user. For example, a program not running as root would not ordinarily be able to bind to an IP address that is not assigned to the server. Granting it the cap_net_raw capability allows this, which enables use cases like transparent proxying.
As a real-world example, in full transparent proxy mode, the source directive's usesrc argument lets HAProxy connect to servers using the client's IP address as the source address instead of its own. Because that source address does not belong to the system itself, this requires elevated privileges, or passing the cap_net_raw capability to HAProxy with the setcap global directive.
You can specify multiple capabilities, such as cap_net_bind_service, which allows binding to a privileged port for QUIC.
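A sketch of the transparent proxy case described above. HAProxy starts as root, drops to the haproxy user, and keeps the two capabilities. The user name and server address are placeholders. Full transparent proxying also assumes HAProxy was built with TPROXY support and that the kernel routing and firewall rules are set up for it, which is outside this configuration.

```haproxy
global
    user haproxy
    group haproxy
    # Keep these capabilities after switching to the unprivileged user
    setcap cap_net_raw,cap_net_bind_service

backend app
    # Connect to servers using the client's IP address as the source address
    source 0.0.0.0 usesrc clientip
    server s1 10.0.0.11:80
```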
Reverse HTTP
An experimental feature in HAProxy 2.9 enables something we're calling reverse HTTP. The idea is for a datacenter load balancer to initiate an outbound connection to an edge load balancer, and then the edge load balancer converts that connection to be bidirectional. These bidirectional connections are put into a pool of idle connections. When clients make requests, the edge load balancer sends them over these connections to the datacenter load balancer.
Comparing this to a traditional load balancer setup, where a backend server pool consists of a static list of IP addresses, this approach looks more like dynamic service discovery. The datacenter load balancers register themselves with the edge load balancer by initiating connections to it, which puts them into the server pool.
This feature relies on HTTP/2 over TLS and uses client certificate authentication when registering. Coincidentally, the IETF HTTP working group published a similar design only hours after we finished our first design meeting. There are subtle differences, but we're in discussion with the draft's authors and intend for our designs to converge. For that reason, the Reverse HTTP feature must be considered experimental until the draft becomes a standard, so that we can adjust the protocol for better interoperability.
Use cases for reverse HTTP include:
a new method of service discovery where HAProxy instances self-register with a public-facing, edge load balancer. This could enable self-service publishing of applications in organizations where teams manage their own apps, but not the load balancing infrastructure.
publishing applications running on an internal network behind NAT. These applications, which would be behind a local HAProxy instance, would connect to a load balancer that has a public IP address. Clients would then connect to these applications through the public-facing load balancer by requesting the application's unique FQDN, where FQDNs differentiate connections idling in the server pool.
the ability for mobile application developers to test their applications from their smartphone over the internet while their applications run in debug mode on their PC.
tech support situations where customers need to be able to download software directly from or upload traces directly to their PC.
More customization
HAProxy gives you the power to customize its operation so that it behaves exactly as you need it to. This approach starts with open source and extends to every corner of the configuration, from traffic shaping to the Lua event framework. In HAProxy 2.9, we’ve given you even more options, providing you even more opportunities to get the unique load balancing experience you need.
Hash-type none
HAProxy 2.9 adds a new hash function called none to the hash-type directive. Previously, you could choose from several hash functions, including sdbm, djb2, and others. With none, you can hash the key yourself using a converter and have HAProxy use the result directly.
Recall that one way to load balance traffic is to generate a hash of some part of the request, for example the URL path, and associate that hash with a server in the backend. Hash-based load balancing is often used for cache servers, because it ensures that traffic for a given resource, such as a video or an image, goes to the same server where that resource was cached.
HAProxy 2.6 added flexibility to this approach with a generic hash load balancing algorithm that replaces the more specific algorithms source, uri, hdr, url_param, and rdp-cookie. The generic algorithm lets you generate a hash from the output of any fetch method. Combining it with the new none function gives you complete flexibility in deciding what to hash and how.
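A sketch of a cache backend that hashes the URL path itself with the crc32 converter and tells HAProxy not to hash the result again. The choice of crc32 is only an example of a hashing converter. The sketch assumes its integer output is usable as the key under hash-type none. Server addresses are placeholders.

```haproxy
backend cache_servers
    mode http
    # Compute the hash ourselves from the request path...
    balance hash path,crc32
    # ...and use that value as-is with consistent hashing
    hash-type consistent none
    server cache1 10.0.0.21:80
    server cache2 10.0.0.22:80
```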
Proxy Protocol TLV fields
The Proxy Protocol enables you to preserve a client's IP address when relaying traffic through an HAProxy load balancer. Otherwise the IP address is replaced by the load balancer's own address. It works by including the client's IP address in a header attached to the TCP connection. It is supported by a number of web servers and proxies.
The protocol supports attaching additional information, beyond the IP address, in what are called Type-Length-Value (TLV) fields. For example, AWS Elastic Load Balancer uses TLVs to send the VPC endpoint ID for traffic routed through a VPC endpoint service.
In this version of HAProxy, you can:
set new TLVs by adding the set-proxy-v2-tlv-fmt argument on a server line.
use the fc_pp_tlv fetch method to extract TLVs from the protocol header.
use set-proxy-v2-tlv-fmt to forward TLVs you receive with fc_pp_tlv.
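A sketch of forwarding a received TLV to a backend server. It uses type 0xEA, which AWS uses for its custom TLVs such as the VPC endpoint ID. The sketch assumes both the argument and the fetch accept the TLV type in hexadecimal notation, and that a PROXY v2 header must also be sent with send-proxy-v2. Addresses are placeholders.

```haproxy
frontend www
    # Accept PROXY protocol headers from the upstream load balancer
    bind :80 accept-proxy
    default_backend app

backend app
    # Send PROXY v2 and copy the received 0xEA TLV into it
    server s1 10.0.0.11:80 send-proxy-v2 set-proxy-v2-tlv-fmt(0xEA) %[fc_pp_tlv(0xEA)]
```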
Caching supports Origin header in Vary
This version updates HAProxy's small object cache to support the Origin header in Vary. Previously, the implementation supported varying only on Accept-Encoding and Referer.
Supporting the Origin header in Vary lets you cache CORS responses. Before this, HAProxy would return cached responses that were missing CORS headers or had CORS headers that didn't match the client, which caused an error on the client side. Users who tried to fix this by adding Vary: Origin found that none of those responses got cached, because the HAProxy implementation didn't support Origin.
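A sketch of a cache that honors Vary, which is what allows a response carrying Vary: Origin to be stored separately per origin. The cache name, sizes, and server address are illustrative.

```haproxy
cache mycache
    # Total cache size in megabytes
    total-max-size 64
    # Maximum object lifetime in seconds
    max-age 60
    # Honor the Vary response header (now including Origin)
    process-vary on

backend app
    mode http
    http-request cache-use mycache
    http-response cache-store mycache
    server s1 10.0.0.11:80
```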
Lua features
Lua includes several new features:
A new function named core.get_var(var_name) returns proc scoped variables. The older txn:get_var(var_name) returns txn scoped variables.
The httpclient class now supports a retries field, which sets the number of times to retry a failed request.
The httpclient class now supports a timeout.connect field for setting the maximum time to wait for establishing a connection, in milliseconds. It defaults to 5 seconds.
The core.register_action function now supports http-after-res.
Fragment references
Per RFC 3986, the HTTP protocol does not permit certain characters in URIs sent to servers. One example is the hash mark #, which indicates a URI fragment. Browsers should not send it to the server or load balancer as part of the URI, because fragments are parsed and interpreted on the client side. However, a buggy client or server may send such data, and HAProxy offers two directives that allow these requests: option accept-invalid-http-request and option accept-invalid-http-response. When enabled, these options relax HAProxy's header name parsing and allow invalid characters in header names and URIs. In HAProxy 2.9, this also means that incoming URIs may contain fragment references when the option is enabled.
These options should never be enabled by default, however, and you should use them only once you've confirmed a specific problem.
When the option is enabled, requests with erroneous fragment references are captured for later analysis. Use the show errors Runtime API command to see these requests.
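A sketch that enables the option on one frontend and exposes a Runtime API socket for inspecting the captured requests. The socket path and backend name are placeholders. The command at the end is shown as a comment and assumes socat is installed.

```haproxy
global
    stats socket /var/run/haproxy.sock mode 600 level admin

frontend www
    mode http
    bind :80
    # Enable only after confirming a specific client or server problem
    option accept-invalid-http-request
    default_backend app

# Inspect the captured requests:
#   echo "show errors" | socat stdio /var/run/haproxy.sock
```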
Converters
Date format converters
Four new converters make it possible to use any date format in logs:
ms_ltime
ms_utime
us_ltime
us_utime
ms_ltime and ms_utime work like their ltime and utime counterparts but take input in milliseconds. us_ltime and us_utime work like their ltime and utime counterparts but take input in microseconds. These converters translate integers (containing a date since epoch) into a date format string.
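A sketch that formats the current time, in milliseconds, as a local date string and includes it in the access log. It assumes the date fetch accepts a unit argument (`date(0,ms)`) and that ms_ltime takes a strftime-style format. The variable name and log format are illustrative, and a backend named `app` is assumed to exist elsewhere.

```haproxy
frontend www
    mode http
    bind :80
    # Milliseconds since epoch, formatted in local time
    http-request set-var(txn.req_date) date(0,ms),ms_ltime(%Y-%m-%dT%H:%M:%S)
    log-format "%ci:%cp [%[var(txn.req_date)]] %ft %b/%s %ST %B"
    default_backend app
```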
Bytes converter
The bytes converter can now take its offset and length from variables as well as literal integers. This lets it extract content from more complex protocols, such as those that contain TLVs, so you can skip past some TLVs and read others.
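A sketch of passing the offset and length through variables. The sketch assumes the converter accepts variable names such as `txn.off` directly as arguments. Here the variables hold fixed values, but in a real protocol parser they would be computed from earlier fields. The X-Token header is hypothetical.

```haproxy
frontend www
    mode http
    bind :80
    http-request set-var(txn.off) int(4)
    http-request set-var(txn.len) int(8)
    # Extract 8 bytes starting at offset 4 of the hypothetical X-Token header
    http-request set-header X-Token-Part %[req.hdr(x-token),bytes(txn.off,txn.len)]
```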
JSON query converter
HAProxy now supports retrieving JSON arrays from the request body with the json_query converter. The converter uses JSON Path syntax and can now return arrays in addition to strings, booleans, and numbers.
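A sketch that extracts an array from a JSON request body such as `{"roles": ["admin", "dev"]}`. The request body must be buffered before it can be inspected. The field name and header are hypothetical, and the exact string form of the returned array is not shown here.

```haproxy
frontend www
    mode http
    bind :80
    # Wait for the full request body before evaluating rules
    option http-buffer-request
    http-request set-var(txn.roles) req.body,json_query('$.roles')
    http-request set-header X-Roles %[var(txn.roles)]
```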
More visibility
HAProxy is valued for its transparency, robust logging and stats capabilities. In HAProxy 2.9, we’ve provided more methods to fetch the information you need and post logs from Lua scripts, plus helpful hints and analysis to aid debugging. This improves usability, and makes troubleshooting and planning easier.
Fetch methods for log information
New fetch methods make it easier to access information that had previously been available only within the access logs or had been complex to work with using log-format variables:
| The process ID of the current process which is usually the worker process. |
| The total number of active, concurrent connections on the process. |
| The number of bytes uploaded from the client to the load balancer. |
| The number of bytes transmitted from the load balancer to the client. |
| The exact date when the connection was received by HAProxy. |
| The value for the exact date when HAProxy received the first byte of the HTTP request. |
| The time spent accepting the TCP connection and executing handshakes and is equivalent to |
| The total session duration time and is equivalent to |
| The idle time before the request and is equivalent to |
| Time spent waiting to get the client's request and is equivalent to |
| The sum of |
| Time spent queued and is equivalent to |
| Time spent establishing a connection to the backend server and is equivalent to |
| The time spent transferring the response payload to the client and is equivalent to |
| Time spent waiting for the server to send a full response and is equivalent to |
| The estimated time as seen by the client and is equivalent to |
| The active time for the HTTP request and is equivalent to |
Originally, this information was available only from within your access logs. You could define a custom log format that included variables with terse names like %Th (connection handshake time) to capture access log values.
Then, version 2.5 added the http-response set-var-fmt directive, which let you capture the information in variables for use in other parts of your configuration. For example, below, we store %Tw (time waiting in queue), %Tc (time waiting to connect to the server), and %Tr (time waiting for the server to send the response) as variables and then add them up to get the total response time:
http-response set-var-fmt(txn.queue_time) %Tw
http-response set-var-fmt(txn.connect_time) %Tc
http-response set-var-fmt(txn.response_time) %Tr
http-response set-var(txn.total_response_time) var(txn.queue_time),add(txn.connect_time),add(txn.response_time)
This syntax is still the way to go for performing math on variables, because the add converter expects a variable name as its argument, not a fetch method. However, in some parts of the configuration a fetch method is more convenient than a log-format variable. In the example below, we raise the log level for any request that experienced a slow response:
http-after-response set-log-level err if { txn.timer.user ge 5000 }
ACL fetches
A new fetch gives you insight into defined ACLs:
acl evaluates up to 12 named ACLs and returns a boolean. Provide the ACLs as a comma-separated list. You can use the ! operator to invert the result of an ACL.
This example shows the syntax:
acl(!is_malware,goodguys,auth_ok)
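A sketch of the example above in context. It assumes acl() returns true only when every listed ACL (after any ! inversion) is true. The ACL definitions, list file path, and network range are hypothetical.

```haproxy
frontend www
    mode http
    bind :80
    acl is_malware src -f /etc/haproxy/malware-ips.lst
    acl goodguys   src 10.0.0.0/8
    acl auth_ok    req.hdr(authorization) -m found
    # Allow only clients that are not listed as malware, are on the trusted network, and sent credentials
    http-request deny unless { acl(!is_malware,goodguys,auth_ok) }
    default_backend app
```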
Layer 7 fetches
Two new fetches at Layer 7, the application layer, provide more information about HTTP contents:
req.cook_names returns the names of all cookies in requests.
res.cook_names returns the names of all cookies in responses.
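A sketch that records cookie names, but not values, for troubleshooting. The capture length of 200 characters and the variable name are illustrative.

```haproxy
frontend www
    mode http
    bind :80
    # Add the request's cookie names to the captured fields in the access log
    http-request capture req.cook_names len 200
    # Keep the response's cookie names in a variable for later rules or logging
    http-response set-var(txn.res_cookies) res.cook_names
    default_backend app
```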
Layer 4 fetches
Four new fetches at Layer 4, the transport layer, provide more information about connections:
ssl_bc_curve retrieves the name of the curve used in the key agreement when the outgoing connection was made over an SSL/TLS transport layer.
ssl_fc_curve does the same for the incoming connection.
cur_client_timeout retrieves the value, in milliseconds, of the currently configured client timeout.
fc_pp_tlv returns the TLV value for a given TLV ID. This fetch may be useful in detecting errors related to duplicate TLV IDs.
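A sketch that passes two of these values to the backend as request headers, for example for debugging. The header names and certificate path are hypothetical.

```haproxy
frontend www
    mode http
    bind :443 ssl crt /etc/haproxy/site.pem
    # Curve negotiated with the client, and the client timeout in effect (ms)
    http-request set-header X-TLS-Curve %[ssl_fc_curve]
    http-request set-header X-Client-Timeout %[cur_client_timeout]
    default_backend app
```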
Lua logging
Two new global directives give you precise control over logging from your Lua scripts:
Setting tune.lua.log.loggers to on sends log messages from your Lua script to the loggers you've configured with log lines. Setting it to off disables this logging. The default value is on.
Setting tune.lua.log.stderr to on sends log messages to stderr. When it is set to auto, logging to stderr occurs when tune.lua.log.loggers is set to off, or when no proxy-level logger is assigned, such as for Lua tasks that run in the background. The default value is auto.
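A sketch that states both defaults explicitly, with a log server that proxies inherit through `log global`. The log server address and script path are placeholders.

```haproxy
global
    log 127.0.0.1:514 local0
    # Send Lua log messages to the configured loggers...
    tune.lua.log.loggers on
    # ...and fall back to stderr only when no logger applies
    tune.lua.log.stderr auto
    lua-load /etc/haproxy/script.lua

defaults
    log global
```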
In your Lua script, use any of the following functions to write to the log:
core.Info("message")
core.Warning("message")
core.Alert("message")
Debugging hints
Debug mechanisms have been improved to help reduce the number of round trips between users and developers during troubleshooting sessions. To accomplish this, panic dumps may now show better hints and likely causes for certain situations.
For example, if a thread is waiting on the Lua lock while lua-load directives are in use, HAProxy emits a message suggesting lua-load-per-thread. In another case, when a watchdog triggers inside Lua, HAProxy proposes some possible causes, including a script that depends on an unsafe external library.
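For reference, following that hint means swapping the directive in the global section, as sketched below. The script path is a placeholder. With lua-load-per-thread, each thread loads the script into its own Lua state, so the script must not rely on state shared between threads.

```haproxy
global
    # Instead of: lua-load /etc/haproxy/script.lua
    lua-load-per-thread /etc/haproxy/script.lua
```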
Tasks and memory profiling now indicate over what period the measures were taken, suspected memory leaks can now be tracked to the line of code that allocated the object, spurious task wakeups can also be tracked back to the line of code that created the timer, stream dumps can be limited to suspicious ones or to old ones, and some extra developer info can be dumped to show operating system or container specificities that are not easy to spot otherwise.
CPU binding warnings
There are two cases where configurations with inconsistent CPU bindings may result in contention and performance degradation:
Thread sets are bound to smaller CPU sets, which causes contention.
Only some (but not all) threads are referenced in the cpu-map. This may lead to the unreferenced threads using the same CPUs.
A warning will be produced if either situation is present in the configuration.
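A sketch of a consistent binding for 8 threads on CPUs 0 to 7, with an inconsistent variant left commented out. It assumes the `auto:` prefix maps thread N of the set to the Nth CPU, one to one.

```haproxy
global
    nbthread 8
    # Consistent: thread 1 -> CPU 0, thread 2 -> CPU 1, ..., thread 8 -> CPU 7
    cpu-map auto:1/1-8 0-7

    # Inconsistent (would trigger a warning): threads 5-8 are not referenced
    # cpu-map 1/1-4 0-3
```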
Conclusion
HAProxy 2.9 is faster, more flexible, and more observable than ever before.
This has been made possible by a long list of contributors, all providing invaluable work and collaboration. On behalf of the whole HAProxy community, we thank everyone who contributed to HAProxy 2.9.
Ready to upgrade to HAProxy 2.9? Here’s how to get started.