Updated: 2019-12-31
Created: 2005-10-31



File system references (170414)

Older references are not quite accurate, because things in kernel 2.6 are considerably better than in kernel 2.4, and filesystem maintainers have reacted to older unfavourable benchmarks by tuning their designs. So the references below are ordered most recent first.

General
  • The ext2 page.
  • The ext3 page and the ext3 mailing list.
  • The ext4 page.
  • The JFS project and the JFS mailing list archive.
  • The XFS project and the XFS mailing list archive.
  • The ReiserFS page.
  • The Reiser4 page.
  • F2FS.
  • Bcachefs.
  • OpenZFS, 2017-03-16.
  • Comparison of filesystems, 2005-08-26.
  • Linux Filesystem Overview, 2005-08-02.
Descriptions (20200111)
  • ZFS: Using allocation classes, 2017-03-20.
  • A ZFS developer's analysis of the good and bad in Apple's new APFS file system, 2016-06-26.
  • Files Are Hard, 2015-12-12.
  • Results with btrfs and zfs, 2013-12-07.
  • XFS: the filesystem of the future?, 2012-01-20.
  • Building the next generation file system for Windows: ReFS, 2012-01-16.
  • XFS: Adventures in Metadata Scalability, 2012-01-18.
  • Advanced tuning for XFS, 2011-12.
  • AFS cell 'ipp-garching.mpg.de', 2011-09-22.
  • A look inside the OCFS2 filesystem, 2010-09-01.
  • Solving the ext3 latency problem, 2009-04-14.
  • OpenAFS + Object Storage, 2008-05-22.
  • Improving fsck Speeds in ext4, 2007-09-18.
  • FUSEWiki - Filesystems.
  • A Reliable and Portable Multimedia File System, 2006-07-18.
  • Shortening fsck Time on ext2, 2006-07-18.
  • Why NFS Sucks, 2006-07-18.
  • OCFS2, 2006-07-18.
  • Proposal and plan for ext2/3 future development work, 2006-06-28.
  • Ext3 for large filesystems, 2006-06-12.
  • A Brief History of UNIX File Systems, 2005-05-17.
  • Interview: Hans Reiser, 2005-09-13.
  • State of the Art: Where we are with the Ext3 filesystem [paper], 2005.
  • Advanced Linux File Systems, 2005-05-10.
  • Large File Support in Linux, 2005-02-15.
  • Linux ext3 FAQ, 2004-10-14.
  • EXT3 File System mini-HOWTO, 2004-04-23.
  • Linux filesystems, 2003-08.
  • The Google filesystem, 2003-10.
  • XFS for Linux, 2003-08-05.
  • Journaling filesystems for Linux, 2002-12-15.
  • Planned extensions to the Linux Ext2/Ext3 filesystem, 2002-06.
  • Journal File Systems in Linux, 2002-02-24.
  • JFS for Linux, 2002-02.
  • Introducing XFS, 2002-01-01.
  • Interview With the People Behind JFS, ReiserFS & XFS, 2001-08-28.
  • RedHat's new journaling filesystem: ext3, 2001.
  • Linux Filesystems HOWTO, 2000-08-22.
  • EXT3 journaling filesystem, 2000-07-20.
  • Journal Filesystems, 2000-07.
  • JFS layout, 2000-05-01.
  • JFS overview, 2000-01-01.
  • Journaling the Linux ext2fs Filesystem, 1998.
  • Scalability in the XFS File System, 1996-01.
Benchmarks
Warnings: many of these benchmarks are not only designed somewhat naively, but also omit some truly essential aspects of the context, like the elevator or the filesystem readahead; benchmarks under Linux 2.6 can give very different results from those under Linux 2.4; SCSI and ATA/IDE disc drives have very, very different performance profiles, including sync reporting.
  • git-annex-centric benchmark of the filesystems, 2016-07-06, for Btrfs, ext4, ReiserFS, XFS, ZFS.
  • Real World Benchmarks Of The EXT4 File-System, 2008-12-03 (and comments).
  • Exploring High Bandwidth Filesystems on Large Systems [slides version], 2006-08-09.
  • Effects of Filesystem Fragmentation, 2006-07-18.
  • Exploring High Bandwidth Filesystems on Large Systems [document version], 2006-07-18.
  • State of the Art: Where we are with the Ext3 filesystem [slides], 2005.
  • Linux File Systems Comparisons/Benchmarks, 2005-07-19.
  • Benchmarks Of ReiserFS Version 4, 2005-06-17.
  • 2.6FileSystemBenchmarks, 2005-06-07.
  • JFS and ext3 are generally the fastest under a database, while XFS and Reiser seem to be pretty slow, 2005-05-11.
  • XFS's real talent is hidden in recovery, 2005-05-11.
  • Filesystems comparison for present time (r4,r3,jfs,xfs,ext3), 2005-04-12.
  • Linux filesystem speed comparison, 2005-04-10.
  • fsbench, 2005-01-25.
  • The internet of tomorrow today, 2004-11-07.
  • Benchmarking Filesystems, 2004-06.
  • Benchmarking Maildir Delivery on Linux Filesystems, 2004-05-14.
  • Linux 2.6 performance in the corporate datacenter, 2004-01.
  • ReiserFS v. ext2 v. ext3 v. JFS v. XFS, 2003-04-03.
  • Linux File System Benchmarks, 2003-10-28.
  • Filesystem Tests, 2003-08-06.
  • Journaling filesystems for Linux, 2002-08-15.
  • Ext3 vs Reiserfs, 2002-07-11.
  • Tuning an Oracle8i Database running Linux, Part 2: the RAW Facts on Filesystems, 2002.
  • Linux Filesystems Comparison: ext2, ext3, xfs, Reiserfs, 2001-12-25.
Online discussions
Warning: some of these discussions are listed here because I think that they are notably wrong. Some pointers are to single articles, some to threads.
  • Large FOSS filesystems.

File system features (120407)

Desktop filesystem features
Feature                        | ext3          | JFS                  | XFS
Block sizes                    | 1024-4096     | 4096                 | 512-4096
Max fs size                    | 8TiB (2^43B)  | 32PiB (2^55B)        | 8EiB (2^63B); 16TiB (2^44B) on 32b systems
Max file size                  | 1TiB (2^40B)  | 4PiB (2^52B)         | 8EiB (2^63B); 16TiB (2^44B) on 32b systems
Max files/fs                   | 2^32          | 2^32                 | 2^32
Max files/dir                  | 2^32          | 2^31                 | 2^32
Max subdirs/dir                | 2^15          | 2^16                 | 2^32
Number of inodes               | fixed         | dynamic              | dynamic
Indexed dirs                   | option        | auto                 | auto
Small data in inodes           | no            | auto (xattrs, dirs)  | auto (xattrs, extent maps)
fsck speed                     | slow          | fast                 | fast
fsck space                     | ?             | 32B per inode        | 2GiB RAM per 1TiB + 200B per inode (half on 32b CPU)
Redundant metadata             | yes           | yes                  | no
Bad block handling             | yes           | mkfs only            | no
Tunable commit interval        | yes           | no                   | metadata
Supports VFS lock              | yes           | yes                  | yes
Has own lock/snapshot          | no            | no                   | yes
Names                          | 8 bit         | UTF-16 or 8 bit      | 8 bit
noatime                        | yes           | yes                  | yes
O_DIRECT                       | yes           | yes                  | yes
barrier                        | yes           | no                   | yes (and checks)
commit interval                | yes           | no                   | no
EA/ACLs                        | both          | both                 | both
Quotas                         | both          | both                 | both
DMAPI                          | no            | patch                | option
Case insensitive               | no            | mkfs only            | mkfs only (since 2.6.28)
Supported by GRUB              | yes           | yes                  | mostly
Can grow                       | online        | online only          | online only
Can shrink                     | offline       | no                   | no
Journals data                  | option        | no                   | no
Journals what                  | blocks        | operations           | operations
Journal disabling              | yes           | yes                  | no
Journal size                   | fixed         | fixed                | grow/shrink
Resize journal                 | offline       | maybe                | offline
Journal on another partition   | yes           | yes                  | yes
Special features or misfeatures:
  ext3: In place convert from ext2. MS Windows drivers.
  JFS:  Case insensitive option. Low CPU usage. DCE DFS compatible. OS2 compatible.
  XFS:  Real time (streaming) section. IRIX compatible. Very large write behind. Project (subtree) quotas. Superblock on sector 0.


File system hints

This section is about known hints and issues with various aspects of common filesystems. They can be just inconveniences or limitations or severe performance problems.

File system hints for JFS (121226)

  • Support for TRIM only from kernel version 3.7 and later.
  • No support for barriers; but the flush interval to the journal is very short.
  • JFS in kernel version 2.6.8 has a significant memory leak.
  • JFS can handle bad blocks only when a filetree is created, additional ones cannot be handled.
  • The journal can be disabled for fast writing, but disabling the journal is not safe and should not be used for anything other than reloading backups (see the sketch after this list).
  • Since each growing extent is allocated space from an allocation group, and each allocation group will only allocate space to a single growing extent at a time, if the number of extents (or files) being grown is greater than the number of allocation groups, some processes will block.
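
A minimal sketch of the journal-disabling approach mentioned above, assuming a hypothetical device /dev/sdb1 and restore mount point /srv/restore; JFS exposes this as the nointegrity mount option, and the filetree should be checked and remounted with the journal enabled before normal use:

    # reload a backup without journalling (fast, but unsafe across a crash)
    mount -t jfs -o nointegrity /dev/sdb1 /srv/restore
    # ... reload the backup into /srv/restore ...
    umount /srv/restore
    jfs_fsck /dev/sdb1                                  # re-establish a consistent state
    mount -t jfs -o integrity /dev/sdb1 /srv/restore    # back to normal journalled operation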

File system hints for ext3 (120304)

  • An ext3 filetree can get very fragmented.
  • Creating a large filetree can take a long time to initialize block groups.
  • Timestamps (including modification time) have a granularity of 1 second, which means that multiple updates per second are not recorded. This can impact make processing and fsync.
  • When fsync is issued all outstanding updates in the journal are written out (in some popular cases), making it a very expensive operation.
  • Flushing from memory to disk happens every 5 seconds by default, which can be too frequent, and can mask the lack of fsync in applications (see the commit-interval sketch after this list).
  • Older kernels (such as in RHEL5) support only filetrees with 128B inodes, but newer tools create filetrees with 256B inodes by default.
  • Older versions of GRUB can only read filetrees with 128B inodes.
  • Maximum filetree size is 8TiB.
  • Support for only 32k subdirectories in a directory.
  • Support for only 32k hard links to a file.
  • Support for TRIM and FSTRIM only from kernel version 2.6.36.
  • Checking a damaged filesystem can take months.
  • Directory indices or ACL blocks can be allocated away from directory data and lead to terrible performance.
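
As a minimal illustration of the flushing and TRIM points above (the mount point /home is hypothetical): the ext3 commit= mount option lengthens the journal commit interval, and fstrim discards unused blocks on kernels that support it for ext3 (2.6.36 or newer):

    # lengthen the commit interval from the default 5 seconds to 30 seconds
    mount -o remount,commit=30 /home
    # discard unused blocks on an SSD backed filetree
    fstrim -v /home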

File system hints for ext4 (120922)

  • In-place conversion from ext3 leaves existing files allocated as they were.
  • Timestamps (including modification time) have a granularity of 1 second if the older ext3 compatible inode size of 128 bytes is used or kept. If so, multiple updates per second are not recorded. This can impact make processing and fsync.
  • Flushing to disk is much less frequent than for ext3, which increases the chances of data loss unless barriers are enabled and fsync is used by applications.
  • It is possible to create ext4 filetrees larger than 8TiB but this requires a recent kernel with a page cache that supports that.
  • It is possible to create ext4 filetrees larger than 16TiB, but this requires not just a recent kernel but also mke2fs 1.42 or newer.
  • Online resize of an ext4 filetree from less than 16TiB to more than 16TiB is possible from kernel release 3.3, but only if the filetree was created with 64 bit offsets (see the sketch after this list).
  • Since kernel version 3.5, ext4 metadata is optionally checksummed.
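
A minimal sketch of creating and then growing a large ext4 filetree, assuming a hypothetical LVM volume /dev/vg0/big, mke2fs 1.42 or newer, and a kernel recent enough (3.3 or newer) for online growing past 16TiB:

    # create the filetree with 64 bit block offsets so it can exceed 16TiB
    mkfs.ext4 -O 64bit /dev/vg0/big
    mount /dev/vg0/big /srv/big
    # enlarge the underlying volume, then grow the filetree while mounted
    lvextend -L +10T /dev/vg0/big
    resize2fs /dev/vg0/big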

File system hints for XFS (140925)

Kernel version dependent hints:

  • 3.17: fixes a long standing bug that meant that by default, when a filetree's space is increased, any newly created allocation groups are not used for inode allocations. A workaround is to remount first with option inode32 and then remount again with option inode64 (see the sketch after this list).
  • 3.4.5: fixes a delayed segment allocation issue.
  • 3.2.x: bug that causes needless wakeups in xfsaild.
  • 3.2.x: Bug of double unlocking of the ilock.
  • 3.2.12 and older: The default i/o scheduler, CFQ, will defeat much of the parallelization in XFS.
  • 2.6.39 and newer: Support for TRIM and FSTRIM.
  • 2.6.37: There is a known bug in the VFS which impacts XFS.
  • 2.6.35 and older: Mounting a filesystem that was mounted with inode64 without it can cause problems.
  • 2.6.32 and newer: The su and sw parameters are obtained automatically from Linux MD, with xfsprogs 3.1.1 or newer.
  • 2.6.27: Case insensitive filenames.
  • 2.6.27.3: Fixes for some regressions.
  • 2.6.21 and older: On crash a file can have nulls inside.
  • 2.6.17 (point versions up to .6): Bug that leads to directory corruption.
  • 2.6.17 and newer: Barriers are enabled by default.
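
A minimal sketch of the inode32/inode64 remount workaround mentioned in the 3.17 item above, assuming a hypothetical mount point /srv/big that has just been grown:

    # force XFS to re-evaluate which AGs are eligible for inode allocation
    mount -o remount,inode32 /srv/big
    mount -o remount,inode64 /srv/big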

Kernel version independent hints:

  • Since the superblock is at sector 0 of the filetree volume, one cannot have a partition boot loader or other metadata in the same volume.
  • XFS does not handle bad blocks at all.
  • Applications that don't issue fsync can get files full of zero blocks if there is a crash. This is not an issue with XFS, but with the applications.
  • In 32b mode it is possible to use a filetree that is larger than fsck can repair.
  • In 32b mode or without the inode64 option inodes will only be allocated in the first 1TiB (if the sector size is 512B) of space.
  • In 32b mode with 4KiB kernel stacks there is a strong possibility of stack overflow, and a near certainty if exported by NFS.
  • In 64b mode kernel stacks are 8KiB by default, but there is still some possibility of stack overflow with NFS exports, especially on top of DM/LVM2.
  • The inode64 option rotors directories across AGs, and then attempts to allocate space for new files in the AG containing the directory. This is quite different from the alternative: without inode64, if you create a bunch of files in the same directory, XFS will scatter their extents all over the disk rather than trying to allocate them next to each other.
  • When a file is opened the entire list of extents is loaded and kept in memory, which can consume a lot of memory for highly fragmented files.
  • Larger inode sizes can store more extended attributes and inode extents maps for greater efficiency.
  • Up to 64KiB of extended attributes are supported.
  • Filetree freezing hangs all applications accessing files in that filetree.
  • It is possible to move an internal journal to an external journal, but not vice versa, unless the filetree had an internal journal to start with.
  • Since metadata access is serialized by allocation group, if all allocation groups are in use to grow extents, writing can stop for all other files, or similarly if the files are in the same allocation group. Having more allocation groups typically improves multithreaded performance.
  • If the XFS allocation group size is a multiple of the underlying RAID stripe then the allocation groups and (and their metadata) may end up on the same disks, preventing parallel IO across the stripe. If mkfs.xfs can discover the underlying RAID geometry it will warn about this with the message:
    Warning: AG size is a multiple of stripe width. This can cause performance problems by aligning all AGs on the same disk. To avoid this, run mkfs with an AG size that is one stripe unit smaller, for example %llu.
    The solution indicated is to manually specify an allocation group size that is not congruent with the stripe width, usually a bit smaller (see the sketch after this list).
  • Configuring an external journal will disable XFS' write barrier support.
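
A minimal sketch of the AG size advice above, with purely hypothetical RAID geometry (6 data discs with a 512KiB stripe unit, giving a 3MiB stripe width) and device name; the only point is that the agsize given to mkfs.xfs is a multiple of the stripe unit but not of the full stripe width:

    # 4GiB minus one 512KiB stripe unit is 4193792KiB, which is not a
    # multiple of the 3MiB stripe width, so AG headers rotate across discs
    mkfs.xfs -d su=512k,sw=6,agsize=4193792k /dev/md0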

File system hints for Btrfs

Inside the Btrfs notes page.

File system hints for ZFS (190422)

Release independent:
  • The arecord must match the largest record size.
  • Vdevs cannot be changed or expanded or deleted.
  • All IO is done by at least a full recordsize, regardless of how much is actually read.
  • Removing snapshots can be very, very slow.
Release dependent:
  • In Ubuntu LTS 16, hot spares don't work.
  • In older FreeBSD versions spares can become accidentally UNAVAIL; to fix, detach and reattach them.

File system hints for NFS (191231)

Release dependent:

  • NFS protocol versions up to NFSv3 only report timestamps, including modification times, with a granularity of 1 second, even if the exported filesystem has a finer timestamp granularity.
  • The Linux NFS client does not do incremental flushing, but it issues COMMIT packets with no range, that flush all pending writes.
  • NFS over TCP will not restart a half-open connection.
  • NFSv4 exports can all be placed under the same directory, which must then be exported with fsid=0; the exported filetree paths then do not contain the name of that directory. But this is optional, and exports can be done as separate filetrees too.
  • Using des-cbc-crc:normal keytab entries with newer versions of Kerberos may require editing /etc/krb5.conf.
  • NFSv4 ID mapping must be enabled, and the mapping domain must be explicitly set and exactly the same between client and host. It does not need to be the DNS domain, but conventionally it is.
  • NFSv4 ID mapping with Kerberos authentication only works if the undocumented variable Local-Realm is set; it must be the same between client and host, and must be the name of the Kerberos realm.
  • When mounting with Kerberos authentication the name given for the server must be identical to that of the service principal for the server, and must be a canonical name of the server, that is, the address it resolves to must resolve back to the same name, perhaps because of a bug in GNU LIBC which affects the Kerberos library.

Release dependent:

  • Fixed from 4.14 and 4.19: in some cases where programs rename an NFS file, under NFS 4.0 but not 4.1, the Linux NFS client does not revalidate a file handle, resulting in some programs reporting a Stale file handle error.
  • In kernel 2.6.32 there is a bug in UDP offloading that causes freezes and corruption in NFS.
  • In some recent versions of the Linux kernel's NFS client autotuning can result in instability:
    • a change in RHEL 6.3: sunrpc.tcp_max_slot_table_entries dynamically allocating RPC slots up to the maximum (65536).
    • Reverting to previous limit of 128 recovered system stability.
  • Kerberos security export syntax must be done with a pseudo client of gss/krb5 or gss/krb5[ip] with kernel versions older than 2.6.23 or nfs-utils versions older than 1.11.
  • NFSv4 with Kerberos can only use des-cbc-crc:normal (also in RHEL) keytab entries in kernels older than 2.6.35; a massive update to the Linux GSS support module in that release lifted this restriction.
    With kernel versions that only support des-cbc-crc enctypes, the GSS server dæmon will print debug messages if unsupported enctypes are used.

Summary of conditions for a working NFSv4 with Kerberos GSSAPI authentication and/or encryption:

  • The rpc_pipefs filesystem must be mounted on both fileserver and client, usually at /var/lib/nfs/rpc_pipefs/ or /var/run/rpc_pipefs/ depending on distribution, and the Pipefs-Directory parameter in idmapd.conf must be set to that path.
  • The id mapping parameter Domain in idmapd.conf must be set to the same value on client(s) and server(s), and usually is the lower case version of the relevant DNS domain name.
  • The Local-Realm parameter in idmapd.conf must be set to the same value on client(s) and server(s) and must be the Kerberos realm name used for both.
  • The rpcsec_gss_krb5 kernel module must be loaded.
  • The file gssapi_mech.conf must list the gssapi_krb5 shared object with mechglue_internal_krb5_init as the initialization function.
  • On all involved systems the /etc/krb5.keytab must have the relevant host and service entries:
    • The server must have in its keytab the keys for the nfs/ and host/ service principal for its canonical DNS name, unless the svcgss dæmon is configured otherwise.
    • Each client must have in its keytab the keys for the host/ and nfs/ service principal for its own canonical DNS name, unless the gss dæmon is configured to run with the -n option, or the gss dæmon is a new one that can use just the host/ service principal.
    The canonical DNS names for the machine should be all lowercase, and because of canonical name ambiguities using multiple IP addresses on a host may not work.
    To sort out canonicalization issues the /etc/hosts file or the DNS zone may need to be carefully edited.
  • These dæmons should be running:
    • The usual rpcbind (or portmap) plus idmap on both server and client.
    • svcgss on the server, unless this is a recent version of the NFS Linux utilities, in which case it is not needed. Reading the man page carefully can be useful.
    • gss dæmon on the client. Reading the man page carefully can be useful.
  • The exports file on the server must list the relevant filetrees as exported with one of the gss/ security types, and the exported filetree must be mounted on the client using the same security type (a minimal configuration sketch follows this list).
  • Each user accessing a NFS filetree mounted with Kerberos security must have their own Kerberos principal, the nfs/ principal for the client is used only to mount the filetree, but does not grant any user on the client access to the files.
  • If the mounting succeeds the Kerberos credential cache /tmp/krb5cc_machine_REALM will have a key from the local host's service principal and a key for the server's service principal.
  • The type of salt should be normal, unless using a kaserver, in which case it should be afs3, or v4 only if relying on a Kerberos4 KDC.
  • The Kerberos host and service keytab entries must have enctype des-cbc-crc for kernel versions older than 2.6.35.
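
A minimal configuration sketch tying together the idmapd.conf, exports and mount points above; all names (example.com, EXAMPLE.COM, nfs.example.com, /srv/nfs4) are hypothetical, and the sec=krb5 export option shown assumes a kernel and nfs-utils newer than the versions mentioned above (older combinations need the gss/krb5 pseudo-client export syntax instead):

    # /etc/idmapd.conf, identical on clients and servers
    [General]
    Domain = example.com
    Local-Realm = EXAMPLE.COM
    Pipefs-Directory = /var/lib/nfs/rpc_pipefs

    # /etc/exports on the server: NFSv4 pseudo-root exported with Kerberos security
    /srv/nfs4    *(rw,sync,sec=krb5,fsid=0,no_subtree_check)

    # on a client, once the keytab entries and dæmons are in place
    mount -t nfs4 -o sec=krb5 nfs.example.com:/ /mnt/nfs4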

Some useful pages for using NFSv4 with Kerberos:

  • http://wiki.linux-nfs.org/wiki/Nfsv4_configuration.
  • http://www.citi.umich.edu/projects/nfsv4/linux/using-nfsv4.html.
  • http://linuxcostablanca.blogspot.co.uk/2012/02/nfsv4-myths-and-legends.html.
  • https://we.riseup.net/stefani/kerberos-and-nfs4.
  • http://wiki.debian.org/NFS/Kerberos.
  • http://wiki.linux-nfs.org/wiki/Enduser_doc_kerberos.
  • http://www.itp.uzh.ch/~dpotter/howto/kerberos.
  • http://sadiquepp.blogspot.co.uk/2009/02/how-to-configure-nfsv4-with-kerberos-in.html.
  • http://wiki.linux-nfs.org/wiki/General_troubleshooting_recommendations.
  • https://docs.fedoraproject.org/en-US/Fedora/17/html/FreeIPA_Guide/kerb-nfs.html.

File system hints for OpenAFS (130913)

Version dependent:

  • From version 1.6.2 to 1.6.5 inclusive a bug in the new asynchronous IO code means that core dumps (and probably other kernel initiated IO) cannot be made to OpenAFS volumes.
  • From OpenAFS versions 1.4.15, 1.6.5, 1.7.26 it is possible to use Kerberos keys with better enctypes than des-cbc-crc and des-cbc-md5, and they should be used because 56-bit DES encryption is quite easy to break.
  • In OpenAFS up to and including 1.6.1 there is a bug with select and having more than 1024 file descriptors open that can cause memory corruption in the fileserver or the salvageserver; this often results in a hung salvageserver process and error messages in the FileLog.
    The easiest solution is to put the line in the script that starts BOS. The best solution is to upgrade to the latest release, as 1.6.1 has this and other known issues.
    This bug affects the Debian 7/Wheezy packages for OpenAFS at least up to version 1.6.1-3+deb7u1. There are 1.6.5.1 packages for Debian 7/Wheezy in the wheezy-backports archive.
  • In OpenAFS version 1.6.0 a bug can lead to extreme overpinging of file servers.
  • AFS protocol encryption is available in OpenAFS from version 1.4.11 or from version 1.5.60, and probably it has always been in OpenAFS.
  • OpenAFS versions 1.6 and newer can put the cache on any filesystem as the relevant code has been rewritten. However it is useful to have the cache on a filetree without a journal, as the cache is ephemeral.
  • The client cache and the partitions up to and including OpenAFS version 1.4 must be on an ext2 or ext3 filesystem.
  • OpenAFS with Kerberos can only use des-cbc-crc:normal tickets; since version 1.2.11 it can also use des-cbc-md4 and des-cbc-md5, and using others may require editing /etc/krb5.conf.

Version independent:

  • OpenAFS terminology is sometimes different from common usage. In particular a volume is actually a subtree of AFS directories and files, and a partition that holds volumes is actually a subtree of some native operating system filesystem, whether the partition is on a fileserver or is the cache on a client.
  • OpenAFS file servers will use as a partition anything that is mounted under directories whose name begins with vicep in the system's root directory.
  • It is possible to use a file as the block device for a partition holding OpenAFS volumes, as long as it is mounted via a loop device.
  • The partition for AFS volumes on an OpenAFS fileserver does not need to be in its own dedicated block device, and neither does the AFS cache filetree on an OpenAFS client, but out of space conditions caused by space in the filetree being less than that declared to OpenAFS may be handled badly. The /vicepAB partitions which are not mount points will however be ignored unless they contain a file called AlwaysAttach.
  • AFS cell names are case insensitive, but they are stored internally in upper case and printed in lower case. By convention they should always be specified in lower case, as there are default mappings to case sensitive Kerberos realm names in all upper case and to case insensitive DNS domain names in all lower case.
  • The afsio program cannot use dynroot because it relies on libafscp which does not handle synthetic roots.
  • OpenAFS uses UDP and implements a window style flow control algorithm similar to TCP's, but the maximum window size is much smaller, which limits performance on links with a large BDP. The protocol allows up to 256 outstanding packets, but versions of OpenAFS limit that to 32 packets, with the exception of the YFS version which allows the full 256 packets.
  • The default network buffers sizes for OpenAFS fileservers are usually very inadequate:

    So, setting a UDP buffer of 8Mbytes from user space is _just_ enough to handle 4096 incoming RX packets on a standard ethernet. However, it doesn't give you enough overhead to handle pings and other management packets. 16Mbytes should be plenty providing that you don't

    a) Dramatically increase the number of threads on your fileserver
    b) Increase the RX window size
    c) Increase the ethernet frame size of your network (what impact this has depends on the internals of your network card implementation)
    d) Have a large number of 1.6.0 clients on your network

    To summarise, and to stress Dan's original point - if you're running with the fileserver default buffer size (64k, 16 packets), or with the standard Linux maximum buffer size (128k, 32 packets), you almost certainly don't have enough buffer space for a loaded fileserver.

  • Since a read-only replica will always be preferred to a read-write one, as long as at least one read-only replica exists OpenAFS will not use the read-write one, even if all the read-only replicas are unavailable.
    Therefore it is always a good idea if a volume has read-only replicas to create an additional read-only replica in the same partition as the read-write one, as that is essentially free as it does not require file copying.
    It is also a good idea because, in case of a release (updating read-only volumes to have the same content as the read-write volume), the read-only replica in the same partition gets updated very quickly, and then the other read-only replicas get updated from it, reducing the latency of the release operation (see the sketch after this list).
  • Because of a (difficult to fix) bug it is strongly recommended to avoid having read-only and read-write replicas in different partitions on the same server, because at boot it could happen that the partition with the read-only replica is the first to be discovered by OpenAFS, and then the read-write replica is never attached.
    Having read-only and read-write replicas of the same volume in different partitions on the same server is a design error, and there are checks against that, but some corner cases can be missed by the checks.
  • When reconfiguring an OpenAFS service special care must be taken when changing the database server with the lowest IP address.
  • When reconfiguring the addresses of OpenAFS DB servers the client caches must be restarted, or reset with fs newcell.
  • When changing the addresses of OpenAFS file servers there are important cautions concerning the use of fs changeaddr which is however rarely needed.
  • For various quorum related reasons the number of AFS db servers should be odd (1, 2).
  • It is possible to authenticate OpenAFS clients against multiple cells or as multiple users, because the client authentication cache can hold distinct AFS tokens; even if the Kerberos credential cache can only hold those for one principal.
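
A minimal sketch of adding an essentially free read-only replica on the same partition as the read-write volume, as recommended above; the server, partition and volume names are hypothetical:

    # define a read-only site on the partition that already holds the RW volume
    vos addsite afs1.example.com /vicepa home.user1
    # propagate the RW contents to all read-only sites
    vos release home.user1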

File system hints for Lustre (120307)

  • Not all Lustre releases are equally reliable, and choosing a good one for a specific environment can require some experimentation.
  • MGS, MDS and OSS can reside on the same system, and the same system can run those for several Lustre instances.
  • Mounting an OST on the OSS system that holds it can cause memory resource deadlock and is not recommended.
  • The MDS can be rather CPU bound.
  • The best RPC size for transfers is 1MiB, and aligning storage and user requests to 1MiB boundaries can give very large performance increases.
  • Operations that query or update the inodes, such as changing the size of a file, can be very slow, as all the OSSes across which the file is striped must be contacted.
  • It is much better to use Lustre's own lfs find command than the platform one (example after this list).
  • Re-exporting a Lustre mount via SMB or NFS can give very poor performance.
  • Dual-linking Lustre servers on two separate network for Lustre-client and intra-Lustre communications can avoid a lot of problems.
  • LNET imposes some restrictions on changing IP addresses while the system is running.
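
A minimal sketch of the 1MiB alignment and find hints above, with a hypothetical mount point /lustre; note that older Lustre releases spell the stripe size option -s rather than -S:

    # stripe new files created under this directory in 1MiB chunks across all OSTs
    lfs setstripe -S 1M -c -1 /lustre/bigfiles
    # use Lustre's own find rather than the platform find
    lfs find /lustre -type f -size +1G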

Some of my notes on filesystems (120307)

These are pointers to some of the entries in my technical blog where filesystems are discussed:

  • 150316 Contortions needed for effective use of some advising operations
  • 120222b Log structured and COW filesystems
  • 120222 Filesystem recovery and soft updates and journaling
  • 120220 Code size as indicator of filesystem complexity
  • 120218b A COW, snapshotting version of 'ext3' and 'ext4'
  • 120128b Presentation on petascale filesystems
  • 120123 Types of clusters and cluster filesystems
  • 120120 OCFS2 with DRBD
  • 120118 OCFS2 a nice filesystem with good performance
  • 120111 Ambiguous filesystem terminology and change
  • 120108 Good transfer rates on bulk file-tree copy
  • 120104 Switching from JFS to XFS on my data file-trees
  • 111230 Filesystems for SSDs
  • 090508 Amazing filesystem news from Red Hat
  • 090131 Impressive JFS and eSATA performance
  • 080822 A new log structured filesystem design
  • 080516 Large storage pools are rarely necessary
  • 080417b Large storage pools and Lustre
  • 080415 Dimensions of filesystem performance
  • 080407 A cheap large reliable storage pool system
  • 080406 Much improved filesystem checking for XFS
  • 080216 A RAID and filesystem perversity
  • 080210 Some more data on filesystem checking speed
  • 070923b So the cases where RAID5 makes sense are...
  • 070923 Yet another RAID5 perversity
  • 070914 Another used filesystem test
  • 070701b Disappointing Linux NFSv3 writing misfeature and workaround
  • 070331c Check of a 5TB filesystem takes 12 hours
  • 070127 More RAID5/RAID6 madness
  • 061031b EMC2 often recommends RAID3
  • 061031 Storage wire and command protocols, and SAN vs. NAS
  • 061022b RAID5 perversions
  • 061022 XFS etc. performance for parallel IO and fragmentation
  • 061015 Options for mailboxing and tagged queueing
  • 061014 Effect of elevator on multistream reading performance
  • 061013 Evolution of a video-on-demand system
  • 060914b Game load times, fragmentation; reporting to base
  • 060729 Volumes and filesystem tags
  • 060724b Partitions, extended partitions and 'ms-sys -p' on NTFS
  • 060723 Now IO has priorities too under Linux
  • 060702 The 'ext4' filesystem and RHEL
  • 060625b Swap space misallocation in Linux
  • 060514 Quick write speed test for NFS and CIFS
  • 060513 Quick read speed test for NFS and CIFS
  • 060510 Quick speed test for Reiser4
  • 060424b Summary of fsck times
  • 060424 Some larger filesystem informal speed tests
  • 060423b Filesystem free space and fragmentation
  • 060422b Disc-to-disc defragmenting and backups
  • 060416 Retesting JFS performance over time
  • 060323 comments on a report on the LKML that a bug in the disk queue managed in Linux means that writes to disk can be delayed a great deal.
  • 060306b on performance degradation after 2 months of using JFS after a fresh install of Fedora 5.
  • 051226b on performance degradation after 4 weeks of using JFS after a fresh install of Fedora.
  • 051219 on my switch to ext2 for all my MS Windows filesystems except the boot one.
  • 051204 on surprising speed test results with JFS and ext3 with and without extended attributes and ext3's new hash directory indices.
  • 051127 on filesystems and partition size, and how large should partitions be.
  • 051108 with comments on a filesystem with 4 million inodes in a 138GB partition, and fsck.
  • 051101 on JFS speed degradation after 6 weeks of use.
  • 051030b with some comments on the ZFS from Sun.
  • 051014 on the filesystem usage of some popular apps that are slow to startup.
  • 051012d with code example for advising for IO access patterns and their ineffectiveness under Linux.
  • 051012 on several rather interesting threads in the XFS mailing list.
  • 051011b on preallocating when overwriting.
  • 051011 on filesystems, advising and preallocations.
  • 051010 on large block sizes or fragmentation in filesystems and the davtools package to visualize ext3 fragmentation.
  • 051009 on a case where fsck takes more than one month, and some filesystems being VLDBs.
  • 051008 on better disk and memory kernel parameters.
  • 051003 on having switched to JFS for Linux, and considering switching to ext2 for MS Windows.
  • 050925 on some details of the way I did some previous quick tests of filesystem speed.
  • 050917 on some suspected troubles with JFS and noatime.
  • 050916 on disk performance, read-ahead and filesystems, and turning ext3 into something else.
  • 050915 on some speculation about performance issues starting programs and filesystems.
  • 050914 on space overheads for metadata and internal fragmentation in various filesystem types.
  • 050913 on how filesystem performance degrades with time.
  • 050912 again on what works means for filesystems.
  • 050910 on the various meanings of works for file systems.
  • 050908 on testing speed for various filesystems on a root filesystem.
  • 050907 on comparing elevators.
  • 050906 comparing various filesystems as to what they are good at.
  • 050523 on filesystems and write caching.

JFS structure summary (051031)

This is a summary in my own words of this more detailed description of JFS data structures. But there is a much better PDF version of the same document, with inline illustrations, also available inside this RPM from SUSE.

Basic entities
Partition
A partition is a container, and has merely a size and a sector size, also called a partition block size, which defines IO granularity (and is usually the same for all partitions on the physical medium); a partition only contains an aggregate.
Extent
A contiguous sequence of blocks, wholly contained in one allocation group. The maximum size of an extent is 2^24-1 blocks, or almost 64GiB. There are a few types of extents; one of them is ABNR, which describes an extent containing zero bytes only.
Map
A map is a collection of extents that contains a B+-tree index rooted in the first extent of the collection; for example it can be an index of extents for a file body, in which case it is an allocation map, or an index of inode names for a directory, in which case it is called a directory map; the extents in a map are described in the map itself. The root extent of the map is called btree and the leaf extents are called xtrees (and contain an array of entries called xads) if they are for an allocation map, and dtrees if they are for a directory map.
File body
A file body is a sequence of one or more extents, the extents being listed in an allocation map. The extents may be from different allocation groups.
Inode
An inode is a 512 byte descriptor for the attributes of a file or directory, and contains also the root of a file body's allocation map, or of a directory map.
Aggregates
Aggregate
An aggregate is about allocating space, and has a size and an aggregate block size, which defines the granularity of allocation of space to files, and currently must be 4096.
  • Aggregates have a primary and a backup superblock.
  • Aggregates contain one or more allocation groups.
  • Aggregates have a primary and a backup aggregate inode table, which must be exactly one inode extent (32 inodes) long.
  • Aggregates may contain one or more filesets, but currently only one is allowed.
  • Aggregates also have some space reserved for use by jfs_fsck.
Allocation group
An allocation group, also known as an AG, is merely a section of an aggregate. There is no data structure associated with an allocation group; all data structures belong either to the aggregate or to a fileset.
  • There can be up to 128 AGs in an aggregate, and each must be at least 8192 blocks or 32MiB.
  • Each allocation group must contain a number of blocks that is a power-of-2 multiple of the number of blocks described by a dmap page.
  • If multiple files are growing, each allocates extents from a different allocation group if possible.
Aggregate inode table
The aggregate inode table is an inode allocation map for the inodes that are used internally by the aggregate, and are not user visible (that is, are not part of any fileset). The inodes defined in the table are:
  • Number 0 is reserved.
  • Number 1 is the aggregate inode table itself.
  • Number 2 is the block allocation map file.
  • Number 3 is the inline log file.
  • Number 4 is the bad blocks file.
  • Number 16 is the fileset root file.
Since the aggregate inode table file refers to itself, the first extent of its inode allocation map has a well known constant address (just after the superblock).
Block allocation map
The block allocation map, also called bmap, is a file (not a B+-tree, despite being called map) divided into 4KiB pages. The first block is the
bmap control page, and then there are up to three levels of dmap control pages that point to many dmap pages. Each dmap page contains:
  • Two arrays of 2^13 bits where each bit corresponds to a block of the aggregate, and the bit is 1 if the block is in use. Because of the limit of three levels of dmap control pages, there can be at most 2^30 dmap pages, and thus at most 2^43 blocks in an aggregate.
  • Some metadata, including a buddy tree that defines a buddy system of the free and allocated blocks. The buddy tree also extends upwards in the dmap control pages.
The block allocation map contains information that is redundant with that of inode allocation maps, so it can be fully reconstructed, but only with a full scan of the aggregate and fileset inode tables.
Inline log
A sequence of blocks towards the end of an aggregate that is used to record intended modifications to aggregate or fileset metadata.
Bad blocks
This is a file whose extents cover all the bad blocks discovered by jfs_fsck if any.
Inode allocation maps
Inode allocation map
An inode allocation map is the file body of an inode table file, not a map. This file body contains as the first 4KiB block a control page called dinomap, and after that a number of extents called inode allocation groups.
The dinomap contains:
  • The AG free inode lists array.
  • The AG free inode extents lists array.
  • The IAG free list.
  • The IAG free next.
which segment the information held in the inode allocation map by allocation group.
AG free inode lists array
The AG free inode lists array contains a list header for each AG. Each list threads together all the IAGs in that AG that have some free inode entries.
AG free inode extents lists array
The AG free inode extents lists array contains a list header for each AG, and each list threads together all the IAGs in an AG that have some free inode extents.
IAG free list
The IAG free list array contains a list header for each AG, and each list contains the numbers of those IAGs in the AG whose inodes are all free.
IAG free next
The IAG free next is the number of the next IAG to append (if required) to an inode allocation map, or equivalently the number of IAGs in an inode allocation map plus 1.
Inode allocation group
An inode allocation group, also called IAG, is a 4KiB block that describes up to 128 inode table extents, for a total of up to 4096 inode table entries.
An inode allocation group can be in any allocation group, but all the inode table extents it describes must be in the same allocation group as the first one, unlike the extents of a general purpose file body, which can be in any allocation group; as soon as its first inode table extent is allocated in an allocation group, the inode allocation group is tied to it, until all such extents are freed.
Once allocated, inode allocation groups are never freed, but their inode table extents may be freed.
Inode table extent
Inode table extents are pointed to by inode allocation groups, and each must be 16KiB in length, and contains 32 inode table entries.
Filesets
Fileset
A fileset is a collection of named inodes. Filesets are defined as and by a fileset inode table, which is an inode allocation map file. It contains these inodes:
  • Number 0 is reserved.
  • Number 1 is a file containing extended fileset information.
  • Number 2 is a directory which is the root of the fileset naming tree.
  • Number 3 is a file containing the ACL for the fileset.
  • Number 4 and following are used for the other files or directories in the fileset, all must be reachable from the directory at number 2.
File
A file is an inode with an attached (optional) allocation map describing a file body that contains data; a particular case of a file is a symbolic link, where the data in the file is a path name.
Directory
A directory is an inode with a list of names and corresponding inode numbers; the list is either contained entirely within the inode if it is small, or is an attached directory map containing dtree entries.
