From: Chuck Lever Date: Mon, 18 May 2009 15:08:53 +0000 (-0400) Subject: sm-notify: Failed DNS lookups should be retried X-Git-Tag: nfs-utils-1-1-7-rc1~4 X-Git-Url: https://git.decadent.org.uk/gitweb/?a=commitdiff_plain;ds=sidebyside;h=5d253e3e326bfcf0e8a342bca53f1b4db120a7a9;hp=5d253e3e326bfcf0e8a342bca53f1b4db120a7a9;p=nfs-utils.git sm-notify: Failed DNS lookups should be retried Currently, if getaddrinfo(3) fails when trying to resolve a hostname, sm-notify gives up immediately on that host. If sm-notify is started before network service is available on a system, that means it quits without notifying anyone. Or, if DNS service isn't available due to a network partition or because the DNS server crashed, sm-notify will simply remove all of its callback files and exit. Really, sm-notify should try harder. We know that the hostnames passed in to notify_host() have already been vetted by statd, which won't monitor a hostname that it can't resolve. So it's likely that any DNS failure we meet here is a temporary condition. If it isn't, then sm-notify will stop trying to notify that host in 15 minutes anyway. [ The host's file is left in /var/lib/nfs/sm.bak in this case, but sm.bak is not read again until the next time sm-notify runs. ] sm-notify already has retry logic for handling RPC timeouts. We can co-opt that to drive DNS resolution retries. We also add AI_ADDRCONFIG because on systems whose network startup is handled by NetworkManager, there appears to be a bug that causes processes that started calling getaddinfo(3) before the network came up to continue getting EAI_AGAIN even after the network is fully operating. As I understand it, legacy glibc (before AI_ADDRCONFIG was exposed in headers) sets AI_ADDRCONFIG by default, although I haven't checked this. In any event, pre-glibc-2.2 systems probably won't run NetworkManager anyway, so this may not be much of a problem for them. Signed-off-by: Chuck Lever Signed-off-by: Steve Dickson ---