HAProxy for Active Directory LDAPS
One more issue to solve...
We have one domain controller at work that is kinda critical to the entire operation. I mean, it's not like this particular server is the only AD server on the site, but it somehow became the only LDAPS server responsible for authentication for non-AD-enabled services.
Like VPN. Like our ERP.
So when it's down, it's a bit of a problem. We tend to avoid making it go down during operational hours, but nothing is 100% infallible. Plus, between trying to be more hands-off with hosts (I prefer cattle to pets) and more and more Active Directory-specific CVEs coming out, this one machine is being a bit of a pain in the ass.
How to make one service more reliable
Since all the DCs in a site (well, all in the organization) serve the same directory, the thought crosses the mind: swap in a proxy host at the existing IP address, move the critical host to a new IP address, then configure the proxy to talk to whatever host it can connect to.
I've been aware of HAProxy as a means to proxy traffic to multiple backend servers, but we've never really had an application to use it on, until this project.
A better way
Honestly, a better way would be to configure the appliances that are doing the LDAPS lookups to be HA themselves, but we can't all be winners. Some appliances don't support it, other appliances have some weird configuration gotchas...
And in other cases, inter-company office politics make the change too hard.
Getting started, but with ANSIBLE!
I've been trying to get better with Ansible, so today I'm starting my configuration from a blank OS and a working ansible controller node.
First step, inventory.
Adding to inventory is easy enough - just add the host. I ran into a small issue where DNS resolution wasn't working for the host yet (DNS replication delays), but we can fix this by setting ansible_host to the corresponding IP address.
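As a sketch, the inventory entry ends up looking something like this (the hostname and IP address here are made up for illustration):

# inventory.yml - hypothetical hostname and IP, just to show ansible_host
all:
  children:
    haproxy:
      hosts:
        haproxy01:
          # DNS hasn't replicated yet, so point Ansible straight at the IP
          ansible_host: 192.168.10.50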
Second step, getting connected
I wanted to figure out how to get the server managed without fiddling a lot with the host first. Normally I would connect to the host, install my own SSH keys, then connect from ansible, but I think I can get ansible to do this for me.
This wound up setting a bunch of extra variables on the command line:
ansible-playbook -l haproxy all-host-configure.yml --extra-vars "ansible_user=timatlee ansible_password=XXXXX ansible_sudo_pass=XXXXX"
I needed to install sshpass on the controller node, but otherwise - I was good to go.
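For the record, the "install my own SSH keys" step can also be handled by Ansible itself. A minimal sketch, assuming the ansible.posix collection is installed and using a made-up key path:

# bootstrap-keys.yml - hypothetical one-off play to push the controller's public key
---
- name: Install controller SSH key
  hosts: haproxy
  become: true

  tasks:
    - name: Add my public key to the remote user's authorized_keys
      ansible.posix.authorized_key:
        user: timatlee
        key: "{{ lookup('file', '~/.ssh/id_ed25519.pub') }}"
        state: present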
Second + n steps, writing the playbook
I've started structuring my last few playbooks so that they simply call a role, and all the heavy lifting goes into the role. I feel like this makes it easier to pull in dependencies on other roles or collections, set variables at a larger scope, and so on. So really, I wind up with a playbook that reads something like:
# haproxy.yml
---
- name: HAProxy configuration
  hosts: haproxy
  become: true

  roles:
    - haproxy
The hosts value, haproxy, refers to a group within the inventory - the idea being that maybe I want more than one of these hosts.
Writing the role
This took some iteration.
Installing from Debian's repos
Out of the gate, I started by installing haproxy from Debian's repositories. This seemed to work well enough, but I started getting some odd errors connecting to the LDAPS backend, similar to:
[WARNING] (10) : Health check for server ldap/openldap succeeded, reason: Layer7 check passed, code: 0, info: "Success", check duration: 0ms, status: 3/3 UP.
<133>Sep 13 19:39:29 haproxy[10]: Health check for server ldap/openldap succeeded, reason: Layer7 check passed, code: 0, info: "Success", check duration: 0ms, status: 3/3 UP.
[WARNING] (10) : Health check for server ldap/ad-ldap failed, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 0ms, status: 0/2 DOWN.
[WARNING] (10) : Server ldap/ad-ldap is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
<133>Sep 13 19:39:29 haproxy[10]: Health check for server ldap/ad-ldap failed, reason: Layer7 invalid response, info: "Not LDAPv3 protocol", check duration: 0ms, status: 0/2 DOWN.
<129>Sep 13 19:39:29 haproxy[10]: Server ldap/ad-ldap is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
These messages were taken from the GitHub issue at https://github.com/haproxy/haproxy/issues/1390. That led me to a related issue, which made reference to fixes not being backported into version 1.8 of haproxy... and guess what version comes with the Debian apt repositories...
Installing from Docker
Oh yeah, back to Docker. The role's tasks change a bit:
- Install Docker
- Install a docker-compose file
- Bring it all up.
Easy, right?
Turns out.. kinda is. Mostly. Sort of.
Jeff Geerling has a Docker role for Ansible that did most of what I wanted. Docker Compose needed to not be installed by the role so that Python's docker library works. This is an easy flag to set for the role.
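For reference, this is roughly how I pull the role in and flip that flag. The variable name here is my reading of the geerlingguy.docker role's documentation, so double-check it against the version you install:

# requirements.yml
roles:
  - name: geerlingguy.docker

# group_vars/haproxy.yml - skip the standalone docker-compose binary (variable name per the role's docs)
docker_install_compose: false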
Installing pip
Python's pip also needs to be installed so that we can get Python's docker and docker-compose libraries installed. These are required for Ansible's interaction with Docker. I went ahead and got pip from Debian's repositories, but I wonder if there's a more "current" version to get. In any case, installing pip this way solved the problem.. so onwards.
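The tasks for this end up being something like the sketch below (the file path under the role is an assumption; package names may differ on other distributions):

# roles/haproxy/tasks/pip.yml - sketch of the pip + Python library install
- name: Install pip from Debian's repositories
  ansible.builtin.apt:
    name: python3-pip
    state: present

- name: Install the Python libraries Ansible's Docker modules rely on
  ansible.builtin.pip:
    name:
      - docker
      - docker-compose
    state: present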
Configurations
After the software installation, it's just a matter of creating some directories and copying over some config files for docker and haproxy:
- /usr/local/src/docker-compose-haproxy/docker-compose.yml
- /etc/haproxy/haproxy.cfg
- /etc/haproxy/errors/*.http (which I wound up commenting out from the configuration anyways..)
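Roughly, that part of the role looks like this sketch (the task file name and the names of the source files under the role are assumptions on my part):

# roles/haproxy/tasks/config.yml - directories and config files from the list above
- name: Create the docker-compose and haproxy directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
    mode: '0755'
  loop:
    - /usr/local/src/docker-compose-haproxy
    - /etc/haproxy

- name: Copy over the docker-compose and haproxy configuration
  ansible.builtin.copy:
    src: "{{ item.src }}"
    dest: "{{ item.dest }}"
    mode: '0644'
  loop:
    - { src: docker-compose.yml, dest: /usr/local/src/docker-compose-haproxy/docker-compose.yml }
    - { src: haproxy.cfg, dest: /etc/haproxy/haproxy.cfg }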
Docker-compose
The docker-compose file is pretty straightforward, thankfully. Haproxy doesn't seem to need much environment configuration, but there were a few additions:
- haproxy's stats page is enabled, and I'm exposing that on port 8404. I'm leaving it up to the operating system to firewall it (instead of using something like traefik).
- I only care about proxying LDAPS traffic. We've moved away from LDAP, so there's no reason to be proxying port 389.
- I wasn't able to start the container on port 636 without the net.ipv4.ip_unprivileged_port_start=0 line. This is because ports below 1024 are considered privileged, and can normally only be bound by root.
# docker-compose.yml
---
version: '3.3'

services:
  haproxy:
    container_name: haproxy
    restart: always
    image: haproxy
    volumes:
      - '/etc/haproxy:/usr/local/etc/haproxy:ro'
    ports:
      - '636:636'
      - '8404:8404'
    sysctls:
      - net.ipv4.ip_unprivileged_port_start=0
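And the "bring it all up" step from the task list earlier maps to something like this sketch, using the community.docker.docker_compose module (which is the reason the docker-compose Python library needed to be installed):

# roles/haproxy/tasks/up.yml - start the stack defined in docker-compose.yml
- name: Bring up the haproxy container
  community.docker.docker_compose:
    project_src: /usr/local/src/docker-compose-haproxy
    state: present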
Haproxy
This took a bit of work and fiddling around. I needed to have a few things happen here:
- Backend servers should be checked and, if offline, taken out of the rotation
- SSL should not terminate at the proxy; instead it should be passed through to the backend server (and terminate there). This removes the need for haproxy to hold its own certificate, and allows me to continue to use the automatic certificate renewal on a domain controller
- Connections should be prioritized at one physical site, but fail over to our other physical site. These sites are in separate IP address spaces
Some learnings along the way:
- I struggled quite a bit with the SSL piece, and in the end more or less gave up on validating SSL checks - and opted to add the ssl-server-verify none option in the global section. I should revisit this: I know that the traffic between haproxy and my domain controllers is encrypted, but by disabling SSL verification it is open to a MITM attack (though we would be in a BAD place if that happened...)
- Logging needs to come out to stdout so that docker container logs can pick it up.
- I took the SSL bind options, ciphersuites and ciphers directly from the comments in the original haproxy.cfg file (specifically https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/). This also should be revisited, as the Mozilla list seems to be more comprehensive.
- Most of the defaults were left alone, though I did remove the error pages and changed the mode to tcp.
- On that thought, mode tcp is required to pass the LDAPS traffic right through to the backend server. This means giving up some of what HTTP mode provides (headers like X-Forwarded-For and such), but those weren't a concern for this application.
- The frontend stats page needs to be mode http for somewhat obvious reasons. This generates the basic stats page from within haproxy and is enough to see what's going on at a high level. I've left the firewalling of that up to the OS, but some level of authentication should be set up here.
- frontend ldaps-in is nothing too remarkable...
- backend ldaps-servers is where some of the fun begins...
- The default behaviour is for haproxy to round-robin connections between the servers listed, but I want to prioritize the servers "closest" to haproxy - so dc1, dc3 and dc4 are it.
- dc4 doesn't actually exist - that's Google's DNS server - but it provides me an easy way of checking the behaviour when a server is offline or unreachable. This host basically goes DOWN after 2 seconds of haproxy being awake, and isn't used in the connection pool.
- e-dc2 and e-dc3 both have the backup tag on them, and my understanding is that the first backup server gets used when all the non-backup servers are offline. This is desirable, as I shouldn't have all my DCs offline within a site; if we've hit this condition, I'm prioritizing getting the site operational, and the services that depend on that authentication can just.. wait.
- I was using option ldap-check, but Active Directory does not allow anonymous binds by default - and we prefer to leave it that way. Instead, we can emulate a connection to the domain controller that doesn't do anything. The source was found on this gist, but after some travels, I found the original mailing list post here.
# haproxy.cfg

global
    log stdout format raw daemon debug
    daemon
    ssl-server-verify none


    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    # https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    # ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    # ssl-default-bind-options no-sslv3
    ssl-default-bind-options ssl-min-ver TLSv1.2 prefer-client-ciphers
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-ciphers ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES256:ECDH+AES128:!aNULL:!SHA1:!AESCCM

    ssl-default-server-options ssl-min-ver TLSv1.2
    ssl-default-server-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-server-ciphers ECDH+AESGCM:ECDH+CHACHA20:ECDH+AES256:ECDH+AES128:!aNULL:!SHA1:!AESCCM

    tune.ssl.default-dh-param 2048


defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 1s
    timeout client 20s
    timeout server 20s

frontend stats
    mode http
    option httplog
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend ldaps-in
    mode tcp
    option tcplog
    bind *:636
    default_backend ldaps-servers

backend ldaps-servers
    mode tcp

    server dc1 192.168.10.253:636 check
    server dc3 192.168.10.218:636 check
    server dc4 8.8.8.8:636 check
    server e-dc2 192.168.20.213:636 check backup
    server e-dc3 192.168.20.214:636 check backup

    # option ldap-check
    # Below, the LDAP check procedure:
    option tcp-check
    tcp-check connect port 636 ssl
    tcp-check send-binary 300c0201       # LDAP bind request "<ROOT>" simple
    tcp-check send-binary 01             # message ID
    tcp-check send-binary 6007           # protocol Op
    tcp-check send-binary 0201           # bind request
    tcp-check send-binary 03             # LDAP v3
    tcp-check send-binary 04008000       # name, simple authentication
    tcp-check expect binary 0a0100       # bind response + result code: success
    tcp-check send-binary 30050201034200 # unbind request
Done?
So while the whole thing works, there are a few things yet to fix:
- Fixing some variables in docker-compose
- Revisiting the SSL connection verification to the domain controllers
- Expanding (and updating) the SSL configuration to be more current
And of course, testing...