1 <!doctype debiandoc system>
2 <!-- -*- mode: sgml; mode: fold -*- -->
4 <title>DSync File List Format</title>
6 <author>Jason Gunthorpe <email>jgg@debian.org</email></author>
7 <version>$Id: filelist.sgml,v 1.4 1999/11/15 07:59:49 jgg Exp $</version>
13 Copyright © Jason Gunthorpe, 1998-1999.
15 DSync and this document are free software; you can redistribute them and/or
16 modify them under the terms of the GNU General Public License as published
17 by the Free Software Foundation; either version 2 of the License, or (at your
18 option) any later version.
21 For more details, on Debian GNU/Linux systems, see the file
22 /usr/doc/copyright/GPL for the full license.
29 <!-- ===================================================================== -->
32 The DSync file list is a crucial part of the DSync system, it provides the
33 client with access to a list of files and file attributes for all the files
34 in a directory tree. Much information is compacted into the per-file structure
35 that may be used by the client in reconstructing the directory tree. In spirit
36 it is like the common ls-lR files that mirrors have, but in practice it is
37 radically different, most striking is that it is stored in a compacted binary
38 format and may optionally contain MD5 hashes.
41 The file list for a directory tree may be either dynamically generated by the
42 server or generated only once like the ls-lR files. In fact with a static
43 file list it is possible to use the <em>rsync method</> to transfer only the
44 differences in the list which is a huge boon for sites with over 50000 files
45 in their directory trees
48 Internally the file list is stored as a series of directory blocks in no set
49 order. Each block has a relative path from the base to the directory itself
50 and a list of all files in that directory. Things are not stored recursively
51 so that the client can have fixed memory usage when managing the list.
52 Depending on how the generator is configured the order of the directories
53 may be breadth first or depth first, or perhaps completely random. The client
54 should make no assumptions about the ordering of anything in the file.
57 Since the list may be generated on the fly by the server it is necessary for
58 it to be streamable. To this effect there will be no counts or sizes that
59 refer to anything outside of the current record. This assures that the
60 generator will be able to build a file list without negligable server side
61 overhead. Furthermore a focus is placed on making things as small as possible,
62 to this end usefull items like record length indicators are omitted. This
63 does necessarily limit the ability to handle format changes.
67 <!-- Data Stream {{{ -->
68 <!-- ===================================================================== -->
71 The data stream is encoded as a series of variable length numbers, fixed
72 length numbers and strings. The use of variable length number encoding
73 was chosen to accomidate sites with over 100000 files, mostly below 16k,
74 using variable length encoding will save approximately 400k of data and still
75 allow some items that are very large.
78 Numbers are coded as a series of bytes of non-fixed length, the highest bit
79 of each byte is 1 if the next byte is part of this number. Bytes are ordered
80 backwards from the least significant to the most significant inorder to
81 simplify decoding, any omitted bits can be assumed to be 0. Clients should
82 decode into their largest type and fatally error if a number expands to
83 larger than that. All numbers are positive.
86 Strings are coded in pascal form, with a length number preceeding a series
87 of 8 bit characters making up the string. The strings are coded in UTF.
90 The first records in the file should be a header record followed by any
91 include/exclude records to indicate how the list was generated. Following
92 that is the actual file list data.
95 The records all have the same form, they start with an 8 bit tag value and
96 then have the raw record data after. The main header has a set of flags for
97 all of the records types, these flags are used to designate optional portions
98 of the record. For instance a
99 file record may not have a md5 hash or uid/gid values, those would be marked
100 off in the flags. Generally every non-critical value is optional. The records
101 and their tags are as follows:
105 <item> 1 - Directory Marker
106 <item> 2 - Directory Start
107 <item> 3 - Directory End
108 <item> 4 - Normal File
110 <item> 6 - Device Special
112 <item> 8 - Include/Exclude
114 <item> 10 - Group Map
115 <item> 11 - Hard Link
116 <item> 12 - End Marker
117 <item> 13 - RSync Checksums
118 <item> 14 - Aggregate File
119 <item> 15 - RSync End
123 The header record is placed first in the file followed by Directory records
124 and then by a number of file type records. The Directory Start/End are used
125 to indicate which directory the file records are in. The approach is to
126 create a bundle of file type records for each directory that are stored
127 non-recursively. The directory marker records are used with depth-first
128 traversal to create unseen directories with the proper permissions.
131 <!-- ===================================================================== -->
134 The header is the first record in the file and contains some information about
139 uint8 Tag; // 0 for the header
152 This field should contain the hex value 0x97E78AB which designates the file
153 as a DSync file list. Like all numbers it should be stored in network byte
157 <tag>MinorVersion<item>
158 These two fields designate the revision of the format. The major version
159 should be increased if an incompatible change is made to the structure of
160 the file, otherwise the minor version should reflect any changes. The current
161 major/minor is 0 and 0. Compatibility issues are discussed later on.
164 Inorder to encode time in a single 32 bit signed integer the format uses a
165 shifting epoch. Epoch is set to a time in seconds from the unix
166 epoch. All other times are relative to this time.
167 In this way we can specify any date 68 years in either direction from any
168 possible time. Doing so allows us to encode time using only 32 bits. The
169 generator should either error or truncate if a time value exceeds this
170 representation. This does impose the limitation that the difference between
171 the lowest stored date and highest stored date must be no more than 136 years.
174 This designates the number of items in the flag array.
177 Each possible record type has a flag value that is used to indicate what
178 items the generator emitted. There is no per-record flag in order to save
179 space. The flag array is indexed by the record ID.
183 <!-- Directory Marker {{{ -->
184 <!-- ===================================================================== -->
185 <sect>Directory Marker, Directory Start and Directory
187 The purpose of the directory marker record is to specify directories that
188 must be created before a directory start record can be processed. It is needed
189 to ensure the correct permissions and ownership are generated while the
190 contents are in transfer.
193 A Directory Start record serves to indicate a change of directory. All further
194 file type records will refer to the named directory until a Directory End
195 record is processed marking the final modification for this directory. It is
196 not possible to nest directory start directives, in fact a Directory Start
197 record implies a Directory End record for the previosly Started Directory
200 The plain directory record is a file type record that refers to a directory
201 file type. All of these record types describe the same thing used in different
202 contexts so share the same structure.
207 uint8 Tag; // 1, 2 or 7 for the header
217 <tag>Flags [from the header]<item>
218 Optional portions of the structure are Permissions (1<<0) and user/group
219 (1<<1). The bit is set to 1 if they are present.
222 This is the number of seconds since the file list epoch, it is the modification
223 date of the directory.
225 <tag>Permissions<item>
226 This is the standard unix permissions in the usual format.
230 These are the standard unix user/group for the directory. They are indirected
231 through the user/group maps described later on.
234 The path from the base of the file list to the directory this record describes.
235 However ordinary directory types have a single name relative to the last
236 Directory Start record.
239 <!-- Directory End {{{ -->
240 <!-- ===================================================================== -->
243 The purpose of the directory end marker is to signafy that their will be no
244 more file type records from this directory. Directory Start and Directory
245 End records must be paired. The intent of this record is to allow future
246 expansion, NOT to allow recursive directory blocks. A Directory Start
247 record will imply a Directory End record if the previous was not terminated.
250 There are no data members, it is the basic 1 item record. If the data stream
251 terminates with an open directory block it is assumed to be truncated and
255 <!-- Normal File {{{ -->
256 <!-- ===================================================================== -->
259 A normal file is a simple, regular file. It has the standard set of unix
260 attributes and an optional MD5 hash for integrity checking.
276 <tag>Flags [from the header]<item>
277 Optional portions of the structure are Permissions (1<<0), user/group
278 (1<<1), and MD5 (1<<2). The bit is set to 1 if they are present.
281 This is the number of seconds since the file list epoch, it is the modification
284 <tag>Permissions<item>
285 This is the standard unix permissions in the usual format.
289 These are the standard unix user/group for the directory. They are indirected
290 through the user/group maps described later on.
293 The name of the item. It should have no pathname components and is relative
294 to the last Directory Start record.
297 This is a MD5 hash of the file.
300 This is the size of the file in bytes.
304 <!-- ===================================================================== -->
307 This encodes a normal unix symbolic link. Symlinks do not have permissions
308 or size, but do have optional ownership.
323 <tag>Flags [from the header]<item>
324 Optional portions of the structure are, user/group
325 (1<<0). The bit is set to 1 if they are present.
328 This is the number of seconds since the file list epoch, it is the modification
333 These are the standard unix user/group for the directory. They are indirected
334 through the user/group maps described later on.
337 The name of the item. It should have no pathname components and is relative
338 to the last Directory Start record.
340 <tag>Compression<item>
341 Common use of symlinks makes them very easy to compress, the compression
342 byte allows this. It is an 8 bit byte with the first 7 bits representing an
343 unsigned number and the 8th bit as being a flag. The first 7 bits describe
344 how many bytes of the last symlink should be prepended to To and if the 8th
345 bit is set then Name is appended to To.
348 This is the file the symlink is pointing to. It is an absolute string taken
349 as is. The client may perform checking on it if desired. The string is
350 compressed as described in the Compression field.
353 <!-- Device Special {{{ -->
354 <!-- ===================================================================== -->
357 Device Special records encode unix device special files, which have a major
358 and a minor number corrisponding to some OS specific attribute. These also
359 encode fifo files, anything that can be created by mknod.
374 <tag>Flags [from the header]<item>
375 Optional portions of the structure areuser/group
376 (1<<0). The bit is set to 1 if they are present.
379 This is the number of seconds since the file list epoch, it is the modification
382 <tag>Permissions<item>
383 This non-optional field is used to encode the type of device and the
384 creation permissions.
387 This is the OS specific 'dev_t' field for mknod.
391 These are the OS dependent device numbers.
394 The name of the item. It should have no pathname components and is relative
395 to the last Directory Start record.
398 This is the file the symlink is pointing to.
401 <!-- Include/Exclude {{{ -->
402 <!-- ===================================================================== -->
403 <sect>Include and Exclude
405 The include/exclude list used to generate the file list is encoded after
406 the header record. It is stored as an ordered set of include/exclude records
407 acting as a filter. If no record matches then the pathname is assumed to
408 be included otherwise the first matching record decides.
411 struct IncludeExclude
420 <tag>Flags [from the header]<item>
424 This is the sort of rule, presently 1 is an include rule and 2 is an exclude
428 This is the textual pattern used for matching.
432 <!-- User/Group Map {{{ -->
433 <!-- ===================================================================== -->
436 In order to properly transfer users and groups the names are converted from
437 a local number into a file list number and a number to name mapping. When
438 the remote side reads the file list it directs all UID/GID translations
439 through the mapping to create the real names and then does a local lookup.
440 This also provides some compressesion in the file list as large UIDs are
441 converted into smaller values through the mapping.
444 The generator is expected to emit these records at any place before the IDs
457 <tag>Flags [from the header]<item>
458 Optional portions of the structure are RealID (1<<0).
461 This is the ID used internally in the file list, it should be monotonically
462 increasing each time a Map record is created so that it is small and unique.
465 This is the ID used in the filesystem on the generating end. This information
466 maybe used if the user selected to regenerate IDs without translation.
469 <!-- Hard Link {{{ -->
470 <!-- ===================================================================== -->
473 A hard link record is used to record a file that is participating in a hard
474 link. The only information we know about the link is the inode and device
475 on the local machine, so we store this information. The client will have to
476 reconstruct the linkages if possible.
494 <tag>Flags [from the header]<item>
495 Optional portions of the structure are Permissions (1<<0), user/group
496 (1<<1), and MD5 (1<<2). The bit is set to 1 if they are present.
499 This is the number of seconds since the file list epoch, it is the modification
503 This is the unique ID number for the hardlink. It is composed from the
504 device inode pair in a generator dependent way. The exact nature of the
505 value is unimportant, only that two hard link records with the same serial
506 should be linked together. It is recommended that the generator compress
507 hard link serial numbers into small monotonically increasing IDs.
509 <tag>Permissions<item>
510 This is the standard unix permissions in the usual format.
514 These are the standard unix user/group for the directory. They are indirected
515 through the user/group maps described later on.
518 The name of the item. It should have no pathname components and is relative
519 to the last Directory Start record.
522 This is a MD5 hash of the file.
525 This is the size of the file in bytes.
528 <!-- End Marker {{{ -->
529 <!-- ===================================================================== -->
532 The End Marker is the final record in the stream, if it is missing the stream
533 is assumed to be incomplete.
537 uint8 Tag; // 12 for the header
544 This field should contain the hex value 0xBA87E79 which is designed to
545 prevent a correputed stream as begin a legitimate end marker.
549 <!-- RSync Checksums {{{ -->
550 <!-- ===================================================================== -->
551 <sect>RSync Checksums
553 The checksum record contains the list of checksums for a file and represents
554 the start of a RSync description block which may contain RSync Checksums,
555 a Normal File entry or Aggregate Files records.
557 struct RSyncChecksums
563 uint160 Sums[ceil(FileSize/BlockSize)];
568 The size of each block in the stream in bytes.
571 The total size of the the file in bytes.
574 The actual checksum data. The format has the lower 32 bytes as the weak
575 checksum and the upper 128 as the strong checksum.
578 <!-- Aggregate File {{{ -->
579 <!-- ===================================================================== -->
582 If the generator was given a list of included files this record will be
583 emitted after the rsync checksum record, once for each file. The given
584 paths are files that are likely to contain fragments of the larger file.
588 uint8 Tag; // 14 for this record
598 <!-- RSync End {{{ -->
599 <!-- ===================================================================== -->
602 The purpose of the directory end marker is to signafy that the RSync data
603 is finished. RSync blocks begin with the RSync checksum record, then are
604 typically followed by a Normal File record describing the name and attributes
605 of the file and then optionally followed by a set of Aggregate File records.
608 There are no data members, it is the basic 1 item record. If the data stream
609 terminates with an open block it is assumed to be truncated and an error
614 <!-- Handling Compatibility {{{ -->
615 <!-- ===================================================================== -->
616 <sect>Handling Compatibility
618 The format has no provision for making backwards compatible changes, even
619 minor ones. What was provided is a way to make a generator that is both
620 forwards and backwards compatible with clients, this is done by disabling
621 generation of unsupported items and masking them off in the flags.
624 To deal with this a client should examine the header and determine if it has
625 a suitable major version, the minor version should largely be ignored. The
626 client should then examine the flags values and for all records it understands
627 ensure that no bits are masked on that it does not understand. Records that
628 it cannot handle should be ignored at this point. When the client is
629 parsing it should abort if it hits a record it does not support.
631 <!-- Client Requirements {{{ -->
632 <!-- ===================================================================== -->
633 <sect>Client Requirements
635 The client attempting to verify syncronisity of a local file tree and a
636 tree destribed in a file list must do three things, look for extra local files,
637 manage the UID/GID mappings and maintain a map of hardlinks. These items
638 corrispond to the only necessary memory usage on the client.
641 It is expected that the client will use the timestamp, size and possibly
642 MD5 hash to match the local file against the remote one to decide if it
646 Hardlinks are difficult to handle, but represent a very usefull feature. The
647 client should track all hard links until they are associated with a local
648 file+inode, then all future links to that remote inode can be recreated
653 <!-- Overview {{{ -->
654 <!-- ===================================================================== -->
657 The <em>rsync method</> was invented by Andrew Tridgell and originally
658 implemented in the rsync program. DSync has a provision to make use of the
659 <em>rsync method</> for transfering differences between files effeciently,
660 however the implemention is not as bandwidth efficient as what the rsync
661 program uses, emphasis is placed on generator efficiency.
664 Primarily the <em>rsync method</> makes use of a series of weak and strong
665 block checksums for each block in a file. Blocks are a uniform size and
666 are uniformly distributed about the source file. In order to minimize server
667 loading the checksum data is generated for the file on the server and then
668 sent to the client - this might optionally be done from a cached file. The
669 client is responsible for performing the checksumming and searching on its
673 In contrast rsync has the client send its checksums to the server and the
674 server sends back commands to reconstruct the file. This is more bandwidth
675 efficient because only one round trip is required and there is a higher chance
676 that more blocks will be matched and not need to be sent to the client.
679 Furthermore a feature designed for use by CD images is provided where a file
680 can be specified as the aggregation of many smaller files. The aggregated
681 files are specified only by giving the file name. The client is expected to
682 read the file (probably from the network) and perform checksum searching
683 against the provided table.
686 <!-- CD Images {{{ -->
687 <!-- ===================================================================== -->
690 The primary and most complex use of the rsync data is for forming CD images
691 on the fly from a mirror and a CD source. This is extremly usefull beacause
692 CD images take up alot of space and bandwidth to mirror, while they are
693 mearly aggregates of (possibly) already mirrored data. Using checksums
694 and a file listing allows the CD image to be reconstructed from any mirror
695 and reduces the loading on primary CD image servers.
698 The next use of checksums is to 'freshen' a CD image during development. If
699 a image is already present that contains a subset of the required data the
700 checksums generally allow a large percentage of that data to be reused.
703 Since the client is responsible for reconstruction and checksum searching it
704 is possible to perform in place reconstruction and in place initial generation
705 that does not require a (large!) temporary file.