X-Git-Url: https://git.decadent.org.uk/gitweb/?p=nfs-utils.git;a=blobdiff_plain;f=utils%2Fmount%2Fnfs.man;h=87e27e1519615d3663172dc0b29d7aef33514695;hp=be91a252150c37dda50610abcff10c7de70d17c4;hb=9a5293a10551c03b4fb976503dd24da569fcadb3;hpb=4bbd6d624c000f26ab828852ee90a4624df26c49

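The new "Using NFS over UDP on high-speed links" subsection makes two quantitative claims. On Gigabit Ethernet the 16-bit IP ID repeats after about 5 seconds. At 100Mbit/s it repeats well outside the 30 second reassembly window. The Python sketch below reproduces that back-of-the-envelope arithmetic under stated assumptions. It is not a model of kernel behavior. It assumes a single shared IP ID counter, a link saturated with 8 KB NFS-over-UDP datagrams, and 38 bytes of Ethernet framing overhead per frame. The function names are hypothetical.

```python
import math

IP_ID_SPACE = 2 ** 16   # the IPv4 Identification field is 16 bits wide
IP_HEADER = 20          # IPv4 header without options
UDP_HEADER = 8
ETH_OVERHEAD = 38       # assumption: 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap


def fragments_per_datagram(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IP fragments needed to carry one UDP datagram (hypothetical helper)."""
    ip_payload = udp_payload + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


def ip_id_wrap_seconds(link_bps: float, udp_payload: int, mtu: int = 1500) -> float:
    """Seconds until the 16-bit IP ID repeats on a saturated link.

    Assumes every datagram consumes one IP ID from a single shared counter
    and that the link carries nothing but these datagrams.
    """
    frags = fragments_per_datagram(udp_payload, mtu)
    # Each fragment pays its own IP header plus Ethernet framing overhead.
    wire_bytes = udp_payload + UDP_HEADER + frags * (IP_HEADER + ETH_OVERHEAD)
    datagrams_per_second = link_bps / 8 / wire_bytes
    return IP_ID_SPACE / datagrams_per_second


if __name__ == "__main__":
    REASSEMBLY_TIMEOUT = 30  # seconds, the default quoted in the man page text
    for name, bps in (("Gigabit", 1e9), ("100 Mbit/s", 1e8)):
        wrap = ip_id_wrap_seconds(bps, 8192)
        verdict = "inside" if wrap < REASSEMBLY_TIMEOUT else "outside"
        print(f"{name}: IP ID wraps every {wrap:.1f} s "
              f"({verdict} the {REASSEMBLY_TIMEOUT} s reassembly window)")
    print(f"Fragments per 8 KB datagram: {fragments_per_datagram(8192)} at MTU 1500, "
          f"{fragments_per_datagram(8192, 9000)} at MTU 9000")
    print(f"Chance a mis-assembled datagram still passes the UDP checksum: 1 in {IP_ID_SPACE}")
```

Under these assumptions the sketch gives the following results:

- At Gigabit speed the IP ID wraps roughly every 4.5 seconds.
- At 100Mbit/s it wraps about every 45 seconds.
- A 9000-byte MTU carries an 8 KB datagram in a single fragment.

These figures are consistent with the patch text and its jumbo-frame advice.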
diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
index be91a25..87e27e1 100644
--- a/utils/mount/nfs.man
+++ b/utils/mount/nfs.man
@@ -46,11 +46,10 @@ files on this mount point.
 The fifth and sixth fields on each line are not used
 by NFS, thus conventionally each contain the digit zero. For example:
 .P
-.SP
-.NF
-.TA 2.5i +0.75i +0.75i +1.0i
+.nf
+.ta 8n +14n +14n +9n +20n
 	server:path	/mountpoint	fstype	option,option,...	0 0
-.FI
+.fi
 .P
 The server's hostname and export pathname
 are separated by a colon, while
@@ -113,12 +112,16 @@ option may mitigate some of the risks of using the
 option.
 .TP 1.5i
 .BI timeo= n
-The time (in tenths of a second) the NFS client waits for a
-response before it retries an NFS request. If this
-option is not specified, requests are retried every
-60 seconds for NFS over TCP.
-The NFS client does not perform any kind of timeout backoff
-for NFS over TCP.
+The time in deciseconds (tenths of a second) the NFS client waits for a
+response before it retries an NFS request.
+.IP
+For NFS over TCP the default
+.B timeo
+value is 600 (60 seconds).
+The NFS client performs linear backoff: after each retransmission the
+timeout is increased by
+.B timeo
+up to a maximum of 600 seconds.
 .IP
 However, for NFS over UDP, the client uses an adaptive
 algorithm to estimate an appropriate timeout value for frequently used
@@ -369,14 +372,8 @@ Valid security flavors are
 .BR sys ,
 .BR krb5 ,
 .BR krb5i ,
-.BR krb5p ,
-.BR lkey ,
-.BR lkeyi ,
-.BR lkeyp ,
-.BR spkm ,
-.BR spkmi ,
 and
-.BR spkmp .
+.BR krb5p .
 Refer to the SECURITY CONSIDERATIONS section for details.
 .TP 1.5i
 .BR sharecache " / " nosharecache
@@ -503,6 +500,8 @@ Specifying a netid that uses TCP forces all traffic from the
 command and the NFS client to use TCP.
 Specifying a netid that uses UDP forces all traffic types to use UDP.
 .IP
+.B Before using NFS over UDP, refer to the TRANSPORT METHODS section.
+.IP
 If the
 .B proto
 mount option is not specified, the
@@ -517,6 +516,8 @@ The
 option is an alternative to specifying
 .BR proto=udp.
 It is included for compatibility with other operating systems.
+.IP
+.B Before using NFS over UDP, refer to the TRANSPORT METHODS section.
 .TP 1.5i
 .B tcp
 The
@@ -752,8 +753,8 @@ If
 is specified, the client assumes that POSIX locks are local and uses NLM
 sideband protocol to lock files when flock locks are used.
 .IP
-To support legacy flock behavior similar to that of NFS clients < 2.6.12, use
-'local_lock=flock'. This option is required when exporting NFS mounts via
+To support legacy flock behavior similar to that of NFS clients < 2.6.12,
+use 'local_lock=flock'. This option is required when exporting NFS mounts via
 Samba as Samba maps Windows share mode locks as flock. Since NFS clients >
 2.6.12 implement flock by emulating POSIX locks, this will result in
 conflicting locks.
@@ -900,40 +901,40 @@ The following example from an
 file causes the mount command to negotiate
 reasonable defaults for NFS behavior.
 .P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +16n +6n +6n +30n
 	server:/export	/mnt	nfs	defaults	0 0
-.FI
+.fi
 .P
 Here is an example from an /etc/fstab file for an NFS version 2 mount over UDP.
 .P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +16n +6n +6n +30n
 	server:/export	/mnt	nfs	nfsvers=2,proto=udp	0 0
-.FI
+.fi
 .P
 Try this example to mount using NFS version 4 over TCP
 with Kerberos 5 mutual authentication.
 .P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +16n +6n +6n +30n
 	server:/export	/mnt	nfs4	sec=krb5	0 0
-.FI
+.fi
 .P
 This example can be used to mount /usr over NFS.
 .P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +16n +6n +6n +30n
 	server:/export	/usr	nfs	ro,nolock,nocto,actimeo=3600	0 0
-.FI
+.fi
 .P
 This example shows how to mount an NFS server
 using a raw IPv6 link-local address.
 .P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +40n +5n +4n +9n
 	[fe80::215:c5ff:fb3e:e2b1%eth0]:/export	/mnt	nfs	defaults	0 0
-.FI
+.fi
 .SH "TRANSPORT METHODS"
 NFS clients send requests to NFS servers via
 Remote Procedure Calls, or
@@ -1073,6 +1074,83 @@ or
 options are specified more than once on the same mount command line,
 then the value of the rightmost instance of each of these options
 takes effect.
+.SS "Using NFS over UDP on high-speed links"
+Using NFS over UDP on high-speed links such as Gigabit
+.BR "can cause silent data corruption" .
+.P
+The problem can be triggered at high loads and is caused by problems in
+IP fragment reassembly. NFS reads and writes typically transmit UDP packets
+of 4 kilobytes or more, which have to be broken up into several fragments
+in order to be sent over the Ethernet link, which limits packets to 1500
+bytes by default. This process happens at the IP network layer and is
+called fragmentation.
+.P
+In order to identify fragments that belong together, IP assigns a 16-bit
+.I IP ID
+value to each packet; fragments generated from the same UDP packet
+will have the same IP ID. The receiving system will collect these
+fragments and combine them to form the original UDP packet. This process
+is called reassembly. The default timeout for packet reassembly is
+30 seconds; if the network stack does not receive all fragments of
+a given packet within this interval, it assumes the missing fragment(s)
+were lost and discards those it has already received.
+.P
+The problem this creates over high-speed links is that it is possible
+to send more than 65536 packets within 30 seconds. In fact, with
+heavy NFS traffic one can observe that the IP IDs repeat after about
+5 seconds.
+.P
+This has serious effects on reassembly: if one fragment gets lost,
+another fragment
+.I from a different packet
+but with the
+.I same IP ID
+will arrive within the 30 second timeout, and the network stack will
+combine these fragments to form a new packet. Most of the time, network
+layers above IP will detect this mismatched reassembly. In the case
+of UDP, the UDP checksum, which is a 16-bit checksum over the entire
+packet payload, will usually not match, and UDP will discard the
+bad packet.
+.P
+However, the UDP checksum is only 16 bits, so there is a chance of 1 in
+65536 that it will match even if the packet payload is completely
+random (which very often isn't the case). If it does match,
+silent data corruption will occur.
+.P
+This risk should be taken seriously, at least on Gigabit
+Ethernet.
+Network speeds of 100Mbit/s should be considered less
+problematic, because with most traffic patterns IP ID wraparound
+will take much longer than 30 seconds.
+.P
+It is therefore strongly recommended to use
+.BR "NFS over TCP where possible" ,
+since TCP does not perform fragmentation.
+.P
+If you absolutely have to use NFS over UDP over Gigabit Ethernet,
+some steps can be taken to mitigate the problem and reduce the
+probability of corruption:
+.TP +1.5i
+.I Jumbo frames:
+Many Gigabit network cards are capable of transmitting
+frames bigger than the 1500-byte limit of traditional Ethernet, typically
+9000 bytes. Using jumbo frames of 9000 bytes will allow you to run NFS over
+UDP at a page size of 8K without fragmentation. Of course, this is
+only feasible if all involved stations support jumbo frames.
+.IP
+To enable a machine to send jumbo frames on cards that support it,
+it is sufficient to configure the interface for an MTU value of 9000.
+.TP +1.5i
+.I Lower reassembly timeout:
+By lowering this timeout below the time it takes the IP ID counter
+to wrap around, incorrect reassembly of fragments can be prevented
+as well. To do so, simply write the new timeout value (in seconds)
+to the file
+.BR /proc/sys/net/ipv4/ipfrag_time .
+.IP
+A value of 2 seconds will greatly reduce the probability of IP ID clashes on
+a single Gigabit link, while still allowing for a reasonable timeout
+when receiving fragmented traffic from distant peers.
 .SH "DATA AND METADATA COHERENCE"
 Some modern cluster file systems provide
 perfect cache coherence among their clients.
@@ -1413,7 +1491,7 @@ security flavor encrypts every RPC request
 to prevent data exposure during network transit; however,
 expect some performance impact
 when using integrity checking or encryption.
-Similar support for other forms of cryptographic security (such as lipkey and SPKM3)
+Similar support for other forms of cryptographic security
 is also available.
 .P
 The NFS version 4 protocol allows
@@ -1558,10 +1636,10 @@ To ensure that the saved mount options are not erased during a remount,
 specify either the local mount directory, or the server hostname and
 export pathname, but not both, during a remount.  For example,
 .P
-.NF
-.TA 2.5i
+.nf
+.ta 8n
 	mount -o remount,ro /mnt
-.FI
+.fi
 .P
 merges the mount option
 .B ro