<html xmlns="http://www.w3.org/1999/xhtml">
<head>
-<title>What's new in the Linux kernel - DebConf 2013</title>
+<title>What's new in the Linux kernel - DebConf 2014</title>
<!-- metadata -->
<meta name="generator" content="S5" />
<meta name="version" content="S5 1.1" />
<div id="header">
</div>
<div id="footer">
-<h1>DebConf 2013</h1>
+<h1>DebConf 2014</h1>
<h2>What's new in the Linux kernel</h2>
</div>
<div class="slide">
<h1>What's new in the Linux kernel</h1>
<object data="tux-debian.svg" width="35%" align="right"></object>
+<h2>and what's missing in Debian</h2>
<h3>Ben Hutchings</h3>
</div>
<ul>
<li>
Professional software engineer by day, Debian developer by night
+ (or sometimes the other way round)
</li>
<li>
Regular Linux contributor in both roles since 2008
+ </li>
+ <li>
+ Working on various drivers and kernel code in my day job
+ </li>
+ <li>
+ Debian kernel team member, now doing most of the unstable
+ maintenance aside from ports
+ </li>
+ <li>
+ Maintaining Linux 3.2.<var>y</var> stable update series on
+ kernel.org
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>Linux releases early and often</h1>
+ <ul class="incremental">
+ <li>
+ Linux is released about 5 times a year (plus stable updates
+ every week or two)
<ul>
<li>
- Maintaining a net driver in my day job, plus core networking
- and PCI code as necessary
- </li>
- <li>
- Debian kernel team member, now doing most of the unstable
- maintenance aside from ports
- </li>
- <li>
- Maintaining Linux 3.2.<var>y</var> stable update series on
- kernel.org
+ ...though some features aren't ready to use when they first
+ appear in a release
</li>
</ul>
</li>
+ <li>
+ Since my talk last year, Linus has made 6 releases (3.11-3.16)
+ </li>
+ <li>
+ Good news: we have lots of new kernel features in testing/unstable
+ </li>
+ <li>
+ Bad news: some of them won't really work without new userland
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>Recap of last year's features (1)</h1>
+ <ul class="incremental">
+ <li>
+ Team device driver: userland package (libteam) was uploaded in
+ October
+ </li>
+ <li>
+ Transcendent memory: frontswap, zswap and Xen tmem will be
+ enabled in next kernel upload
+ </li>
+ <li>
+ New KMS drivers: should all work with current Xorg drivers
+ </li>
+ <li>
+ Module signing: still not enabled, but probably will be if we
+ do Secure Boot
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>Recap of last year's features (2)</h1>
+ <ul class="incremental">
+ <li>
+ More support for discard: still not enabled at install time
+ (<a href="https://bugs.debian.org/690977">#690977</a>)
+ </li>
+ <li>
+ More support for containers: XFS was fixed, and user namespaces
+ have been enabled
+ </li>
+ <li>
+ bcache: userland package (bcache-tools) still not quite ready
+ (<a href="https://bugs.debian.org/708132">#708132</a>)
+ </li>
+ <li>
+ ARMv7 multiplatform: d-i works on <em>some</em> platforms but
+ I'm still not sure which. Some progress on GPU drivers, but not
+ in Debian yet.
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>Unnamed temporary files [3.11]</h1>
+ <ul>
+ <li>
+ Open directory with option <tt>O_TMPFILE</tt> to create an
+ unnamed temporary file on that filesystem
+ </li>
+ <li>
+ As with <tt>tmpfile()</tt>, the file disappears on
+ last <tt>close()</tt>
+ </li>
+ <li>
+ File can be linked into the filesystem using
+ <tt>linkat(..., AT_EMPTY_PATH)</tt>, allowing for 'atomic'
+ creation of file with complete contents and metadata
+ </li>
+ <li>
+ Not supported on all filesystem types, so you will usually need
+ a fallback
+ </li>
+ </ul>
+</div>
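The O_TMPFILE steps above can be sketched in C roughly as follows. This is a minimal illustration, not a reference implementation: the helper names `link_tmpfile` and `write_file_atomically` are hypothetical, the fallback (a `mkstemp()` file that is hard-linked into place) is just one possible choice, and the `/proc/self/fd` variant of `linkat()` is the unprivileged alternative described in the open(2) manual page.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: give an unnamed O_TMPFILE file a name. */
static int link_tmpfile(int fd, const char *path)
{
    char proc[64];

    /* Direct form; may need CAP_DAC_READ_SEARCH. */
    if (linkat(fd, "", AT_FDCWD, path, AT_EMPTY_PATH) == 0)
        return 0;
    /* Unprivileged form from open(2); needs /proc mounted. */
    snprintf(proc, sizeof(proc), "/proc/self/fd/%d", fd);
    return linkat(AT_FDCWD, proc, AT_FDCWD, path, AT_SYMLINK_FOLLOW);
}

/* Hypothetical helper: create dir/name so that the name only appears
 * once the contents are complete. Returns 0 on success, -1 on error
 * (including when dir/name already exists). */
int write_file_atomically(const char *dir, const char *name, const char *data)
{
    char path[4096], tmp[4096];
    size_t len = strlen(data);
    int fd, ret = -1;

    if (snprintf(path, sizeof(path), "%s/%s", dir, name) >= (int)sizeof(path))
        return -1;

    /* Preferred route: unnamed file on the same filesystem as dir. */
    fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
    if (fd >= 0) {
        if (write(fd, data, len) == (ssize_t)len)
            ret = link_tmpfile(fd, path);
        close(fd);
        if (ret == 0)
            return 0;
    }

    /* Fallback (old kernel or unsupported filesystem): a named
     * temporary file, hard-linked to the final name when complete. */
    if (snprintf(tmp, sizeof(tmp), "%s/.tmpXXXXXX", dir) >= (int)sizeof(tmp))
        return -1;
    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) == (ssize_t)len && fchmod(fd, 0644) == 0)
        ret = link(tmp, path);
    close(fd);
    unlink(tmp);
    return ret;
}
```

Both routes use a hard link rather than `rename()`, so an existing file with the same name is never replaced; whether that is the right behaviour depends on the application.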
+
+<div class="slide">
+ <h1>Network busy-polling [3.11] (1)</h1>
+ <p>A conventional network request/response process looks like:</p>
+ <small><!-- ew -->
+ <ol class="incremental">
+ <li>
+ Task calls <tt>send()</tt>; network stack constructs a
+ packet; driver adds it to hardware Tx queue
+ </li>
+ <li>
+ Task calls <tt>poll()</tt> or <tt>recv()</tt>, which blocks;
+ kernel puts it to sleep and possibly idles the CPU
+ </li>
+ <li>
+ Network adapter receives response and generates IRQ, waking
+ up CPU
+ </li>
+ <li>
+ Driver's IRQ handler schedules polling of the hardware Rx
+ queue (NAPI)
+ </li>
+ <li>
+ Kernel runs the driver's NAPI poll function, which passes
+ the response packet into the network stack
+ </li>
+ <li>
+ Network stack decodes packet headers and adds packet to
+ the task's socket
+ </li>
+ <li>
+ Network stack wakes up sleeping task; scheduler switches
+ to it and the socket call returns
+ </li>
+ </ol>
+ </small>
+</div>
+
+<div class="slide">
+ <h1>Network busy-polling [3.11] (2)</h1>
+ <ul class="incremental">
+ <li>
+ If driver supports busy-polling, it tags each packet with
+ the receiving NAPI context, and kernel tags sockets
+ </li>
+ <li>
+ When busy-polling is enabled, <tt>poll()</tt>
+ and <tt>recv()</tt> call the driver's busy poll function to
+ check for packets synchronously (up to some time limit)
+ </li>
+ <li>
+ If the response usually arrives quickly, this reduces overall
+ request/response latency as there are no context switches or
+ power transitions
+ </li>
+ <li>
+ Time limit set by sysctl (<tt>net.core.busy_poll</tt>,
+ <tt>net.core.busy_read</tt>) or socket option (<tt>SOL_SOCKET,
+ SO_BUSY_POLL</tt>); requires tuning
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>Lustre filesystem [3.12]</h1>
+ <ul>
+ <li>
+ A distributed filesystem, popular for cluster computing
+ applications
+ </li>
+ <li>
+ Developed out-of-tree since 1999, but now added to Linux staging
+ directory
+ </li>
+ <li>
+ Was included in squeeze but dropped from wheezy as it didn't
+ support Linux 3.2
+ </li>
+ <li>
+ Userland is now missing from Debian
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>Btrfs offline dedupe [3.12]</h1>
+ <ul class="incremental">
+ <li>
+ Btrfs generally does COW rather than updating in-place, allowing
+ snapshots and file copies to defer the actual copying and save
+ space
+ </li>
+ <li>
+ Filesystems may still end up with multiple copies of the same
+ file content
+ </li>
+ <li>
+ Btrfs doesn't actively merge these duplicates, but userland can
+ tell it to do so
+ </li>
+ <li>
+ Many file dedupe tools are packaged for Debian, but none that
+ use this Btrfs feature (such as bedup)
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>nftables [3.13]</h1>
+ <ul class="incremental">
+ <li>
+ Linux has several firewall APIs - iptables, ip6tables, arptables
+ and ebtables
+ </li>
+ <li>
+ All limited to single protocol, and need a kernel module for
+ each match type and each action
+ </li>
+ <li>
+ Kernel's internal netfilter API is more flexible
+ </li>
+ <li>
+ nftables exposes more of this flexibility, allowing userland
+ to provide firewall code for a specialised VM (similar to BPF)
+ </li>
+ <li>
+ nftables userland tool uses this API and is already packaged
+ </li>
+ <li>
+ Eventually, old APIs will be removed and old userland
+ tools must be ported to use nftables
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>User-space lockdep [3.14]</h1>
+ <ul>
+ <li>
+ Kernel threads and interrupts all run in same address space,
+ using several different synchronisation mechanisms
+ </li>
+ <li>
+ Easy to introduce bugs that can result in deadlock, but hard to
+ reproduce them
+ </li>
+ <li>
+ Kernel's 'lockdep' system dynamically tracks locking operations
+ and detects <em>potential</em> deadlocks
+ </li>
+ <li>
+ Now available as a userland library! Except we need to package
+ it (build from linux-tools source package)
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>arm64 and ppc64el ports</h1>
+ <ul class="incremental">
+ <li>
+ 'arm64' architecture was added in Linux 3.7, but was not yet
+ usable, and no real hardware was available at the time
+ </li>
+ <li>
+ Upstream Linux arm64 kernel, and Debian packages, should now run
+ on emulators and real hardware
+ </li>
+ <li>
+ 'powerpc' architecture has been available for many years,
+ but didn't support kernel running little-endian
+ </li>
+ <li>
+ Linux 3.13 added little-endian kernel support, along with new
+ userland ELF ABI variant - we call it ppc64el
+ </li>
+ <li>
+ Both ports now being bootstrapped in unstable and are candidates
+ for jessie release
+ </li>
+ </ul>
+</div>
+
+<div class="slide">
+ <h1>File-private locking [3.15]</h1>
+ <ul class="incremental">
+ <li>
+ POSIX says that closing a file descriptor removes
+ the <em>process</em>'s locks on that file
+ </li>
+ <li>
+ What if process has multiple file descriptors for the same
+ file? It loses all locks obtained through any descriptor!
+ </li>
+ <li>
+ Multithreaded processes may require serialisation around
+ file open/close to ensure they open each file exactly once
+ </li>
+ <li>
+ Hard and symbolic links can hide that two files are really the
+ same
+ </li>
+ <li>
+ Linux now provides file-private locks, associated with a
+ specific open file and removed when last descriptor for the
+ open file is closed
+ </li>
+ </ul>
+</div>
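The difference from classic POSIX locks can be sketched as below. One assumption to flag: in the released 3.15 kernel this feature is exposed as "open file description locks" through the `fcntl()` commands `F_OFD_SETLK`, `F_OFD_SETLKW` and `F_OFD_GETLK`, so the sketch uses those names rather than "file-private". The helper names `ofd_write_lock` and `ofd_lock_demo` are hypothetical, and a C library new enough to define `F_OFD_SETLK` is assumed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Try to take a whole-file write lock owned by the open file behind fd.
 * Returns 0 on success, -1 with errno set (EAGAIN/EACCES on conflict). */
int ofd_write_lock(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0, /* 0 means "to end of file" */
        .l_pid = 0, /* must be 0 for these locks */
    };

    return fcntl(fd, F_OFD_SETLK, &fl);
}

/* Hypothetical demo: returns 0 if the locks behave as described. */
int ofd_lock_demo(void)
{
    char path[] = "/tmp/ofd-demo-XXXXXX";
    int fd1 = mkstemp(path);
    int fd2, fd3, ok;

    if (fd1 < 0)
        return -1;
    fd2 = open(path, O_RDWR); /* second, independent open file */
    fd3 = open(path, O_RDWR); /* third one, closed again below */
    unlink(path);
    if (fd2 < 0 || fd3 < 0) {
        close(fd1);
        if (fd2 >= 0)
            close(fd2);
        if (fd3 >= 0)
            close(fd3);
        return -1;
    }

    ok = ofd_write_lock(fd1) == 0;
    /* With classic POSIX locks, this close would drop the lock. */
    close(fd3);
    /* fd2 is a different open file, so it conflicts even in-process. */
    ok = ok && ofd_write_lock(fd2) == -1 &&
         (errno == EAGAIN || errno == EACCES);
    /* Last descriptor for the first open file: its lock goes away. */
    close(fd1);
    ok = ok && ofd_write_lock(fd2) == 0;
    close(fd2);
    return ok ? 0 : -1;
}
```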
+
+<div class="slide">
+ <h1>Multiqueue block devices [3.16]</h1>
+ <ul class="incremental">
+ <li>
+ Each block device has a command queue (possibly shared with
+ other devices)
+ </li>
+ <li>
+ Queue may be partly implemented by hardware (NCQ) or only
+ in software
+ </li>
+ <li>
+ A single queue means initiation is serialised and completion
+ involves IPI - can be bottleneck for fast devices
+ </li>
+ <li>
+ High-end SSDs support multiple queues, but kernel needed changes
+ to use them
+ </li>
+ <li>
+ <tt>mtip32xx</tt> driver now supports multiqueue, but SCSI
+ drivers don't yet - may be backport-able?
+ </li>
</ul>
</div>