<html xmlns="http://www.w3.org/1999/xhtml">
<head>
-<title>What's new in the Linux kernel - DebConf 2013</title>
+<title>What's new in the Linux kernel - DebConf 2014</title>
<!-- metadata -->
<meta name="generator" content="S5" />
<meta name="version" content="S5 1.1" />
<div id="header">
</div>
<div id="footer">
-<h1>DebConf 2013</h1>
+<h1>DebConf 2014</h1>
<h2>What's new in the Linux kernel</h2>
</div>
<ul>
<li>
Professional software engineer by day, Debian developer by night
+ (or sometimes the other way round)
</li>
<li>
Regular Linux contributor in both roles since 2008
</li>
<li>
- Maintaining a net driver in my day job, plus core networking
- and PCI code as necessary
+ Working on various drivers and kernel code in my day job
</li>
<li>
Debian kernel team member, now doing most of the unstable
every week or two)
<ul>
<li>
- ...though some features aren't ready to use when they firat
+ ...though some features aren't ready to use when they first
appear in a release
</li>
</ul>
</li>
<li>
- For 'wheezy' we chose to freeze with Linux 3.2, which was
- getting pretty old by the time of release
+ Since my talk last year, Linus has made 6 releases (3.11-3.16)
</li>
<li>
Good news: we have lots of new kernel features in testing/unstable
</div>
<div class="slide">
- <h1>Team device driver [3.3]</h1>
+ <h1>Recap of last year's features (1)</h1>
<ul class="incremental">
<li>
- Alternative to the bonding driver - simpler, modular, high-level
- control deferred to userland
+ Team device driver: userland package (libteam) was uploaded in
+ October
</li>
<li>
- Basic configuration can be done with <tt>ip</tt>, but it really
- needs new tools - <tt>teamd</tt>, <tt>teamnl</tt>, etc.
+ Transcendent memory: frontswap, zswap and Xen tmem will be
+ enabled in next kernel upload
</li>
<li>
- Make it work: see
- <a href="http://bugs.debian.org/695850">http://bugs.debian.org/695850</a>
+ New KMS drivers: should all work with current Xorg drivers
+ </li>
+ <li>
+ Module signing: still not enabled, but probably will be if we
+ do Secure Boot
</li>
</ul>
</div>
<div class="slide">
- <h1>Transcendent memory [3.0-3.5]</h1>
+ <h1>Recap of last year's features (2)</h1>
<ul class="incremental">
<li>
- Abstract storage for memory pages, expected to be slower than
- regular memory but faster than disk
- </li>
- <li>
- Can provide a second layer of page cache (cleancache and frontswap)
+ More support for discard: still not enabled at install time
+ (<a href="https://bugs.debian.org/690977">#690977</a>)
</li>
<li>
- Pages stored by hypervisor (Xen), compressed local memory
- (zcache) or cluster of machines (RAMster)
+ More support for containers: XFS was fixed, and user namespaces
+ have been enabled
</li>
<li>
- Not yet enabled in Debian kernels, and needs some thought about
- configuration
+ bcache: userland package (bcache-tools) still not quite ready
+ (<a href="https://bugs.debian.org/708132">#708132</a>)
</li>
<li>
- Make it work: see
- <a href="https://lwn.net/Articles/454795/">https://lwn.net/Articles/454795/</a>
- and send proposal to debian-kernel
+ ARMv7 multiplatform: d-i works on <em>some</em> platforms but
+ I'm still not sure which. Some progress on GPU drivers, but not
+ in Debian yet.
</li>
</ul>
</div>
<div class="slide">
- <h1>New KMS drivers [3.3-3.10]</h1>
- <ul class="incremental">
+ <h1>Unnamed temporary files [3.11]</h1>
+ <ul>
<li>
- DRM/KMS drivers added for old, new and virtual hardware -
- AST, DisplayLink, Hyper-V, Matrox G200, QEMU Cirrus
+ Open directory with option <tt>O_TMPFILE</tt> to create an
+ unnamed temporary file on that filesystem
</li>
<li>
- Should be more robust than purely user-mode drivers, and
- compatible with Secure Boot
+      As with <tt>tmpfile()</tt>, the file disappears on
+ last <tt>close()</tt>
</li>
<li>
- Current X drivers don't work with these, so the kernel drivers
- are disabled for now
+ File can be linked into the filesystem using
+ <tt>linkat(..., AT_EMPTY_PATH)</tt>, allowing for 'atomic'
+ creation of file with complete contents and metadata
</li>
<li>
- Make it work: join the X Strike Force and package the new X
- drivers
+ Not supported on all filesystem types, so you will usually need
+ a fallback
</li>
</ul>
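  <p>
    A minimal sketch of the pattern above, assuming a Linux system with
    glibc. The helper name <tt>create_file_atomically</tt> is hypothetical,
    and the <tt>/proc/self/fd</tt> fallback (for when
    <tt>AT_EMPTY_PATH</tt> is refused for lack of privilege) is an addition
    taken from the open(2) manual page, not from the slide.
  </p>

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from the talk): create dir/name with the given
 * contents so that the name only appears once the data is complete.
 * Returns 0 on success, -1 on error with errno set. */
int create_file_atomically(const char *dir, const char *name,
                           const char *data, size_t len)
{
    char path[4096], proc[64];
    int saved;

    /* Unnamed file on the filesystem holding dir. This fails (for example
     * with EOPNOTSUPP or EISDIR) where O_TMPFILE is unsupported, so real
     * callers need a fallback such as mkstemp() plus rename(). */
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    /* Write the complete contents before the file has any name. */
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        data += n;
        len -= (size_t)n;
    }

    if (snprintf(path, sizeof(path), "%s/%s", dir, name) >= (int)sizeof(path)) {
        errno = ENAMETOOLONG;
        goto fail;
    }

    /* Give the finished file its name. AT_EMPTY_PATH may require
     * privilege, so fall back to linking through /proc/self/fd. */
    if (linkat(fd, "", AT_FDCWD, path, AT_EMPTY_PATH) < 0) {
        snprintf(proc, sizeof(proc), "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, proc, AT_FDCWD, path, AT_SYMLINK_FOLLOW) < 0)
            goto fail;
    }
    return close(fd);

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

  <p>
    A caller might use
    <tt>create_file_atomically("/tmp", "report.txt", buf, len)</tt> and
    switch to a conventional named temporary file if it returns -1.
  </p>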
</div>
<div class="slide">
- <h1>Module signing [3.7]</h1>
- <ul class="incremental">
+ <h1>Lustre filesystem [3.12]</h1>
+ <ul>
+ <li>
+ A distributed filesystem, popular for cluster computing
+ applications
+ </li>
<li>
- Kernel modules can be signed at build time, and the kernel
- configured to refuse loading unsigned modules
+ Developed out-of-tree since 1999, but now added to Linux staging
+ directory
</li>
<li>
- Necessary but not sufficient to implement Secure Boot -
- we would also need signed kernel images and some other
- restrictions when booted in this mode
+ Was included in squeeze but dropped from wheezy as it didn't
+ support Linux 3.2
</li>
<li>
- Make Secure Boot work: come to the meeting on Tuesday
+ Userland is now missing from Debian
</li>
</ul>
</div>
<div class="slide">
- <h1>More support for discard</h1>
+ <h1>Network busy-polling [3.11] (1)</h1>
+ <p>A conventional network request/response process looks like:</p>
+ <small><!-- ew -->
+ <ol class="incremental">
+ <li>
+ Task calls <tt>send()</tt>; network stack constructs a
+ packet; driver adds it to hardware Tx queue
+ </li>
+ <li>
+ Task calls <tt>poll()</tt> or <tt>recv()</tt>, which blocks;
+ kernel puts it to sleep and possibly idles the CPU
+ </li>
+ <li>
+ Network adapter receives response and generates IRQ, waking
+ up CPU
+ </li>
+ <li>
+ Driver's IRQ handler schedules polling of the hardware Rx
+ queue (NAPI)
+ </li>
+ <li>
+ Kernel runs the driver's NAPI poll function, which passes
+ the response packet into the network stack
+ </li>
+ <li>
+ Network stack decodes packet headers and adds packet to
+ the task's socket
+ </li>
+ <li>
+ Network stack wakes up sleeping task; scheduler switches
+ to it and the socket call returns
+ </li>
+ </ol>
+ </small>
+</div>
+
+<div class="slide">
+ <h1>Network busy-polling [3.11] (2)</h1>
<ul class="incremental">
<li>
- Flash devices (and thin-provisioned SANs) can be more efficient
- if the filesystem 'discards' unused disk space
+      If driver supports busy-polling, it tags each packet with
+      the receiving NAPI context, and kernel copies that tag to
+      the socket the packet is delivered to
</li>
<li>
- Requires support in hardware, driver, filesystem and any layered
- device drivers - e.g. LVM, RAID (added in 3.7)
+ When busy-polling is enabled, <tt>poll()</tt>
+ and <tt>recv()</tt> call the driver's busy poll function to
+ check for packets synchronously (up to some time limit)
</li>
<li>
- Must be explicitly enabled, but d-i doesn't do this by default
+ If the response usually arrives quickly, this reduces overall
+      request/response latency as there are no context switches or
+      power transitions
</li>
<li>
- Make it work: fix <a href="http://bugs.debian.org/690977">http://bugs.debian.org/690977</a>
+      Time limit set by sysctl (<tt>net.core.busy_poll</tt>,
+      <tt>net.core.busy_read</tt>) or socket option (<tt>SOL_SOCKET,
+ SO_BUSY_POLL</tt>); requires tuning
</li>
</ul>
</div>
<div class="slide">
- <h1>More support for containers</h1>
+ <h1>Btrfs offline dedupe [3.12]</h1>
<ul class="incremental">
<li>
- Containers are lightweight VMs - run on the same kernel as host,
- but with limited privileges and resources
- </li>
- <li>
- Previously done by OpenVZ and Linux-VServer; gradually being
- reimplemented upstream
+ Btrfs generally does COW rather than updating in-place, allowing
+ snapshots and file copies to defer the actual copying and save
+ space
</li>
<li>
- User namespaces (added in 3.7) support the existence of a
- <tt>root</tt> user inside the container that is unprivileged
- outside the container
+ Filesystems may still end up with multiple copies of the same
+ file content
</li>
<li>
- Currently somewhat experimental, and requires filesystem
- changes which haven't been done for XFS
+ Btrfs doesn't actively merge these duplicates, but userland can
+ tell it to do so
</li>
<li>
- Make user namespaces work: send patches to upstream XFS
- developers (this one's hard)
+      Many file dedupe tools are packaged for Debian, but none of
+      the tools that use this Btrfs feature (e.g. bedup) are
</li>
</ul>
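  <p>
    As a rough sketch of how userland can ask for a merge, the code below
    uses the <tt>BTRFS_IOC_FILE_EXTENT_SAME</tt> ioctl from
    <tt>linux/btrfs.h</tt>. The slide does not name the ioctl, so treat
    that choice and the helper name <tt>dedupe_range</tt> as assumptions.
    The kernel compares the two ranges itself and only shares them if the
    data is identical.
  </p>

```c
#include <linux/btrfs.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Hypothetical helper: ask the filesystem to make the first len bytes of
 * dst_fd share storage with the first len bytes of src_fd.
 * Returns the number of bytes deduplicated, or -1 if the ioctl failed or
 * the kernel reported an error or differing data. */
long long dedupe_range(int src_fd, int dst_fd, unsigned long long len)
{
    /* One source range followed by an array of one destination. */
    struct btrfs_ioctl_same_args *args =
        calloc(1, sizeof(*args) + sizeof(struct btrfs_ioctl_same_extent_info));
    long long result = -1;

    if (!args)
        return -1;

    args->logical_offset = 0;          /* start of range in the source */
    args->length = len;                /* usually must be block-aligned */
    args->dest_count = 1;
    args->info[0].fd = dst_fd;         /* destination file */
    args->info[0].logical_offset = 0;  /* start of range in the destination */

    /* status is 0 on success, negative on error, and
     * BTRFS_SAME_DATA_DIFFERS if the contents did not match. */
    if (ioctl(src_fd, BTRFS_IOC_FILE_EXTENT_SAME, args) == 0 &&
        args->info[0].status == 0)
        result = (long long)args->info[0].bytes_deduped;

    free(args);
    return result;
}
```

  <p>
    A dedupe tool would first find candidate duplicates itself (for
    example by hashing file contents) and then call something like this
    for each matching range.
  </p>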
</div>
<div class="slide">
- <h1>bcache [3.10]</h1>
+ <h1>nftables [3.13]</h1>
<ul class="incremental">
<li>
- Turns a fast block device into a cache for a larger, slower
- device (see also: dm-cache, EnhanceIO)
- </li>
- <li>
- Needs its own set of userland tools
- </li>
- <li>
- Make it work:
- see <a href="http://bugs.debian.org/708132">http://bugs.debian.org/708132</a>
- (maybe just needs a sponsor)
+ Linux has several firewall APIs - iptables, ip6tables, arptables
+ and ebtables
</li>
- </ul>
-</div>
-
-<div class="slide">
- <h1>ARMv7 multiplatform</h1>
- <ul class="incremental">
<li>
- Until recently, each ARM kernel image could support only a small
- set of different chips
+ All require a specific kernel module for each type of match
+ and each possible action
</li>
<li>
- Debian 'armmp' kernel now supports ARMv7 SoCs from Calxeda,
- Freescale and Marvell, and others should be supported soon
+ Userland could only use the four protocol-specific APIs,
+ although the internal netfilter API is more flexible
</li>
<li>
- Debian could run on a much larger range of ARM hardware - but we
- need installer and boot loader support to make this easy
+ nftables exposes more of this flexibility, allowing userland
+ to provide firewall code for a specialised VM (similar to BPF)
</li>
<li>
- Make it work: join the ARM porters and d-i team
+ nftables userland tool uses this API and is already packaged
</li>
<li>
- Make the GPUs work: join a reverse-engineering project
+ Eventually, the old APIs will be removed and the old userland
+ tools must be ported to use nftables
</li>
</ul>
</div>