All,
I've recently come across a very subtle bug in the way coova-chilli handles uamanydns NAT requests. It exists in all versions of coova-chilli.
Here is how it originally works:
- When a client uses a DNS server a check is made to see if it is one of the configured servers
- If not AND uamanydns is specified, it performs a DNAT on the server address to the first configured DNS server
- When the DNS response is received, the source address is replaced with the original address
However the implementation has a very serious bug that would cause it to break and behave strangely on a number of occasions (dhcp.c):
DNAT (dhcp_doDNAT):
if (this->anydns &&
iph->daddr != conn->dns1.s_addr &&
iph->daddr != conn->dns2.s_addr) {
conn->dnatdns = iph->daddr;
iph->daddr = conn->dns1.s_addr;
chksum(iph);
}
reverse NAT (dhcp_undoDNAT):
if ((this->anydns ||
iph->saddr == conn->dns1.s_addr ||
iph->saddr == conn->dns2.s_addr) &&
iph->protocol == PKT_IP_PROTO_UDP &&
udph->src == htons(DHCP_DNS)) {
if (this->anydns &&
conn->dnatdns &&
iph->saddr != conn->dns1.s_addr &&
iph->saddr != conn->dns2.s_addr &&
iph->saddr != conn->dnatdns) {
iph->saddr = conn->dnatdns;
chksum(iph);
}
There are two scenarios where this implementation will totally break:
1) If a client uses a DNS server that is not configured on chilli and starts a DNS request, its address will be put in conn->dnatdns and when the response is received, it is unNATed to the address from conn->dnatdns, which is correct. Now assume that this same client has changed DNS server to one of the chilli-configured ones. The implementation has no way of cleaning up conn->dnatdns if the DNS server is valid and will blinding unNAT the response to the old entry in conn->dnatdns.
This case is the one that brought the issue to my attention. It took us a long time to determine it because in most cases where clients do not have advanced anti-virus, the client will accept the DNS response coming from a different server as long as it has all the necessary credentials.
2) If a client uses a primary and secondary DNS servers (or more) and both are different that the ones configured on chilli:
In this case, if the timing is right, the first request from DNS1 will be NATed and its original address is placed in conn->dnatdns. If it takes a bit long to respond, the client may initiate further requests or choose to switch to DNS2 and its address will be placed in conn->dnatdns. Now if one response is received from DNS1 and one response received from DNS2, both will be incorrectly unNATed to source address of DNS2.
I admit the second scenario is very rare and generally won't affect the client because eventually dnatdns will contain the address of DNS2 and should work.
Proposed solution(s)
There are two proposed solutions here and are subject to discussions. The first one is less invasive and will maintain configuration backward compatibility:
a) Change the code in dhcp_doDNAT so that if it receives a DNS request from a valid DNS server it resets conn->dnatdns. This will resolve scenario number (1) but not (2). Here is a quick patch:
if (this->anydns &&
iph->daddr != conn->dns1.s_addr &&
iph->daddr != conn->dns2.s_addr) {
conn->dnatdns = iph->daddr;
iph->daddr = conn->dns1.s_addr;
chksum(iph);
}
+ else {
+ conn->dnatdns = 0;
+ }
if (_options.dnsparanoia) {
b) Modify the way DNS NAT is implemented so that it can handle multiple DNS addresses rather than just one. This will resolve (1) and (2).
c) Change the behavior of uamanydns so that it only allows the DNS requests to go out unNATed and let iptables do that NAT by using user DNAT rules. This will resolve (1) and (2). This obviously have the disadvantage of breaking configuration compatibility.
I think for your 1) you will
I think for your 1) you will only have that problem when the client went from one non-chilli provided DNS server to another non-chilli provided DNS server since undoDNAT is checking for those chilli-provided DNS sources and not changing the source. Regardless, I support the patch given...
great
Thanks. I will apply it to our 1.0.14 installation too.
You are right in your comment. I was looking at 1.0.14 source and pasted code from 1.2.12 which seems to have fixed most of the issue by checking the response source.
Indeed.. the issue was
Indeed.. the issue was pointed out in a previous forum post and fixed in 1.2.2. Though, I still do like your patch as it will help that situation where you move from one non-chilli dns server to another non-chilli assigned dns server. Thanks for your feedback!