.\"@(#)nfs.5"
-.TH NFS 5 "2 November 2007"
+.TH NFS 5 "9 October 2012"
.SH NAME
nfs \- fstab format and options for the
.B nfs
The fifth and sixth fields on each line are not used
by NFS, thus conventionally each contain the digit zero. For example:
.P
-.SP
-.NF
-.TA 2.5i +0.75i +0.75i +1.0i
+.nf
+.ta 8n +14n +14n +9n +20n
server:path /mountpoint fstype option,option,... 0 0
-.FI
+.fi
.P
The server's hostname and export pathname
are separated by a colon, while
option.
.TP 1.5i
.BI timeo= n
-The time (in tenths of a second) the NFS client waits for a
-response before it retries an NFS request. If this
-option is not specified, requests are retried every
-60 seconds for NFS over TCP.
-The NFS client does not perform any kind of timeout backoff
-for NFS over TCP.
+The time in deciseconds (tenths of a second) the NFS client waits for a
+response before it retries an NFS request.
+.IP
+For NFS over TCP the default
+.B timeo
+value is 600 (60 seconds).
+The NFS client performs linear backoff: After each retransmission the
+timeout is increased by
+.BR timeo
+up to the maximum of 600 seconds.
.IP
However, for NFS over UDP, the client uses an adaptive
algorithm to estimate an appropriate timeout value for frequently used
.BR automount (8)
for details).
.TP 1.5i
+.BR rdirplus " / " nordirplus
+Selects whether to use NFS v3 or v4 READDIRPLUS requests.
+If this option is not specified, the NFS client uses READDIRPLUS requests
+on NFS v3 or v4 mounts to read small directories.
+Some applications perform better if the client uses only READDIR requests
+for all directories.
+.TP 1.5i
.BI retry= n
The number of minutes that the
.BR mount (8)
.BR sys ,
.BR krb5 ,
.BR krb5i ,
-.BR krb5p ,
-.BR lkey ,
-.BR lkeyi ,
-.BR lkeyp ,
-.BR spkm ,
-.BR spkmi ,
and
-.BR spkmp .
+.BR krb5p ,
Refer to the SECURITY CONSIDERATIONS section for details.
.TP 1.5i
.BR sharecache " / " nosharecache
.IP
The DATA AND METADATA COHERENCE section contains a
detailed discussion of these trade-offs.
+.TP 1.5i
+.BR fsc " / " nofsc
+Enable/Disables the cache of (read-only) data pages to the local disk
+using the FS-Cache facility. See cachefilesd(8)
+and <kernel_soruce>/Documentation/filesystems/caching
+for detail on how to configure the FS-Cache facility.
+Default value is nofsc.
.SS "Options for NFS versions 2 and 3 only"
Use these options, along with the options in the above subsection,
for NFS versions 2 and 3 only.
.TP 1.5i
.BI proto= netid
-The transport protocol name and protocol family the NFS client uses
-to transmit requests to the NFS server for this mount point.
-If an NFS server has both an IPv4 and an IPv6 address, using a specific
-netid will force the use of IPv4 or IPv6 networking to communicate
-with that server.
-.IP
-If support for TI-RPC is built into the
-.B mount.nfs
-command,
-.I netid
-is a valid netid listed in
-.IR /etc/netconfig .
-The value "rdma" may also be specified.
-If the
-.B mount.nfs
-command does not have TI-RPC support, then
+The
.I netid
-is one of "tcp," "udp," or "rdma," and only IPv4 may be used.
+determines the transport that is used to communicate with the NFS
+server. Available options are
+.BR udp ", " udp6 ", "tcp ", " tcp6 ", and " rdma .
+Those which end in
+.B 6
+use IPv6 addresses and are only available if support for TI-RPC is
+built in. Others use IPv4 addresses.
.IP
Each transport protocol uses different default
.B retrans
command and the NFS client to use TCP.
Specifying a netid that uses UDP forces all traffic types to use UDP.
.IP
+.B Before using NFS over UDP, refer to the TRANSPORT METHODS section.
+.IP
If the
.B proto
mount option is not specified, the
option is an alternative to specifying
.BR proto=udp.
It is included for compatibility with other operating systems.
+.IP
+.B Before using NFS over UDP, refer to the TRANSPORT METHODS section.
.TP 1.5i
.B tcp
The
through a firewall that blocks the rpcbind protocol.
.TP 1.5i
.BI mountproto= netid
-The transport protocol name and protocol family the NFS client uses
+The transport the NFS client uses
to transmit requests to the NFS server's mountd service when performing
this mount request, and when later unmounting this mount point.
.IP
-If support for TI-RPC is built into the
+.I netid
+may be one of
+.BR udp ", and " tcp
+which use IPv4 address or, if TI-RPC is built into the
.B mount.nfs
command,
-.I netid
-is a valid netid listed in
-.IR /etc/netconfig .
-Otherwise,
-.I netid
-is one of "tcp" or "udp," and only IPv4 may be used.
+.BR udp6 ", and " tcp6
+which use IPv6 addresses.
.IP
This option can be used when mounting an NFS server
through a firewall that blocks a particular transport.
if the negotiation causes problems on the client or server.
Refer to the SECURITY CONSIDERATIONS section for more details.
.TP 1.5i
-.BR rdirplus " / " nordirplus
-Selects whether to use NFS version 3 READDIRPLUS requests.
-If this option is not specified, the NFS client uses READDIRPLUS requests
-on NFS version 3 mounts to read small directories.
-Some applications perform better if the client uses only READDIR requests
-for all directories.
-.TP 1.5i
.BR local_lock= mechanism
Specifies whether to use local locking for any or both of the flock and the
POSIX locking mechanisms.
for NFS version 4 and newer.
.TP 1.5i
.BI proto= netid
-The transport protocol name and protocol family the NFS client uses
-to transmit requests to the NFS server for this mount point.
-If an NFS server has both an IPv4 and an IPv6 address, using a specific
-netid will force the use of IPv4 or IPv6 networking to communicate
-with that server.
-.IP
-If support for TI-RPC is built into the
-.B mount.nfs
-command,
-.I netid
-is a valid netid listed in
-.IR /etc/netconfig .
-Otherwise,
+The
.I netid
-is one of "tcp" or "udp," and only IPv4 may be used.
+determines the transport that is used to communicate with the NFS
+server. Supported options are
+.BR tcp ", " tcp6 ", and " rdma .
+.B tcp6
+use IPv6 addresses and is only available if support for TI-RPC is
+built in. Both others use IPv4 addresses.
.IP
All NFS version 4 servers are required to support TCP,
so if this mount option is not specified, the NFS version 4 client
the behavior of this option in more detail.
.TP 1.5i
.BI clientaddr= n.n.n.n
+.TP 1.5i
+.BI clientaddr= n:n: ... :n
Specifies a single IPv4 address (in dotted-quad form),
or a non-link-local IPv6 address,
that the NFS client advertises to allow servers
file causes the mount command to negotiate
reasonable defaults for NFS behavior.
.P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +16n +6n +6n +30n
server:/export /mnt nfs defaults 0 0
-.FI
+.fi
.P
Here is an example from an /etc/fstab file for an NFS version 2 mount over UDP.
.P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +16n +6n +6n +30n
server:/export /mnt nfs nfsvers=2,proto=udp 0 0
-.FI
+.fi
.P
Try this example to mount using NFS version 4 over TCP
with Kerberos 5 mutual authentication.
.P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +16n +6n +6n +30n
server:/export /mnt nfs4 sec=krb5 0 0
-.FI
+.fi
.P
This example can be used to mount /usr over NFS.
.P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +16n +6n +6n +30n
server:/export /usr nfs ro,nolock,nocto,actimeo=3600 0 0
-.FI
+.fi
.P
This example shows how to mount an NFS server
using a raw IPv6 link-local address.
.P
-.NF
-.TA 2.5i +0.7i +0.7i +.7i
+.nf
+.ta 8n +40n +5n +4n +9n
[fe80::215:c5ff:fb3e:e2b1%eth0]:/export /mnt nfs defaults 0 0
-.FI
+.fi
.SH "TRANSPORT METHODS"
NFS clients send requests to NFS servers via
Remote Procedure Calls, or
options are specified more than once on the same mount command line,
then the value of the rightmost instance of each of these options
takes effect.
+.SS "Using NFS over UDP on high-speed links"
+Using NFS over UDP on high-speed links such as Gigabit
+.BR "can cause silent data corruption" .
+.P
+The problem can be triggered at high loads, and is caused by problems in
+IP fragment reassembly. NFS read and writes typically transmit UDP packets
+of 4 Kilobytes or more, which have to be broken up into several fragments
+in order to be sent over the Ethernet link, which limits packets to 1500
+bytes by default. This process happens at the IP network layer and is
+called fragmentation.
+.P
+In order to identify fragments that belong together, IP assigns a 16bit
+.I IP ID
+value to each packet; fragments generated from the same UDP packet
+will have the same IP ID. The receiving system will collect these
+fragments and combine them to form the original UDP packet. This process
+is called reassembly. The default timeout for packet reassembly is
+30 seconds; if the network stack does not receive all fragments of
+a given packet within this interval, it assumes the missing fragment(s)
+got lost and discards those it already received.
+.P
+The problem this creates over high-speed links is that it is possible
+to send more than 65536 packets within 30 seconds. In fact, with
+heavy NFS traffic one can observe that the IP IDs repeat after about
+5 seconds.
+.P
+This has serious effects on reassembly: if one fragment gets lost,
+another fragment
+.I from a different packet
+but with the
+.I same IP ID
+will arrive within the 30 second timeout, and the network stack will
+combine these fragments to form a new packet. Most of the time, network
+layers above IP will detect this mismatched reassembly - in the case
+of UDP, the UDP checksum, which is a 16 bit checksum over the entire
+packet payload, will usually not match, and UDP will discard the
+bad packet.
+.P
+However, the UDP checksum is 16 bit only, so there is a chance of 1 in
+65536 that it will match even if the packet payload is completely
+random (which very often isn't the case). If that is the case,
+silent data corruption will occur.
+.P
+This potential should be taken seriously, at least on Gigabit
+Ethernet.
+Network speeds of 100Mbit/s should be considered less
+problematic, because with most traffic patterns IP ID wrap around
+will take much longer than 30 seconds.
+.P
+It is therefore strongly recommended to use
+.BR "NFS over TCP where possible" ,
+since TCP does not perform fragmentation.
+.P
+If you absolutely have to use NFS over UDP over Gigabit Ethernet,
+some steps can be taken to mitigate the problem and reduce the
+probability of corruption:
+.TP +1.5i
+.I Jumbo frames:
+Many Gigabit network cards are capable of transmitting
+frames bigger than the 1500 byte limit of traditional Ethernet, typically
+9000 bytes. Using jumbo frames of 9000 bytes will allow you to run NFS over
+UDP at a page size of 8K without fragmentation. Of course, this is
+only feasible if all involved stations support jumbo frames.
+.IP
+To enable a machine to send jumbo frames on cards that support it,
+it is sufficient to configure the interface for a MTU value of 9000.
+.TP +1.5i
+.I Lower reassembly timeout:
+By lowering this timeout below the time it takes the IP ID counter
+to wrap around, incorrect reassembly of fragments can be prevented
+as well. To do so, simply write the new timeout value (in seconds)
+to the file
+.BR /proc/sys/net/ipv4/ipfrag_time .
+.IP
+A value of 2 seconds will greatly reduce the probability of IPID clashes on
+a single Gigabit link, while still allowing for a reasonable timeout
+when receiving fragmented traffic from distant peers.
.SH "DATA AND METADATA COHERENCE"
Some modern cluster file systems provide
perfect cache coherence among their clients.
to prevent data exposure during network transit; however,
expect some performance impact
when using integrity checking or encryption.
-Similar support for other forms of cryptographic security (such as lipkey and SPKM3)
+Similar support for other forms of cryptographic security
is also available.
.P
The NFS version 4 protocol allows
specify either the local mount directory, or the server hostname and
export pathname, but not both, during a remount. For example,
.P
-.NF
-.TA 2.5i
+.nf
+.ta 8n
mount -o remount,ro /mnt
-.FI
+.fi
.P
merges the mount option
.B ro