Wishlist items

  • "Seed devices": hard read-only devices whose contents are copied on write to another device rather than modified in place (btrfs has this; useful for base devices for virtualization, among other things).
  • Nonce-misuse-resistant authenticated encryption, such as AES-SIV or HS1-SIV (Closes potential hole regarding nonce reuse and "external" snapshots, as might happen to VMs or systems with externally-managed storage like iSCSI).
  • Some form of "secure delete" functionality. (However, see this LWN article regarding implementation strategies and pitfalls).
  • A simplified userspace API with no hierarchy, only blobs identified by unique integer keys (eternaleye thinks this might be useful for object-capability systems, such as Robigalia).
  • An API like the above, but supporting multiple streams per blob, possibly with string identifiers (needs further examination; the intent is to match the needs of CephFS for OSD backends).
  • More advanced caching algorithms; one potentially-relevant paper is Pannier: A Container-based Flash Cache for Compound Objects.
  • "Asymmetrical" compression algorithms, for which only decompression is supported (XZ is a nice candidate here, and would be a very good match for some seed device use cases).
  • RAID-6 with parity 3 or greater; this could potentially use Andrea Mazzoleni's technique for generating Cauchy matrices compatible with Linux's current RAID-5 and RAID-6 formats, providing a clean upgrade path.
  • "Inline" forward error correction, possibly using a fountain code like RaptorQ.
  • Support Trusted/Encrypted kernel keyring keys, in order to take advantage of TPMs.
  • Support for multiple key slots.
    • LUKS2 has a new keyslot system that better supports two-factor auth and other external keying mechanisms.
  • Ponder the ramifications of (and safe defaults for) compression in the presence of encryption.
  • Swap file support.
  • Support (a subset of?) the Ext[234] attributes denoting special behaviors:
    • +c for compressed files
    • +C for disabling copy-on-write
    • +e for extent-based storage (always set? btrfs doesn't set it...)
    • +E for displaying that a file is encrypted
    • +N for displaying that a file's contents are inlined into the inode
    • +s for files that should be securely erased on delete
    • +u for files that should permit being "undeleted"
    • Others seem less relevant, but may be worth investigating.
  • "Remote Subvolumes", subvolumes that reside on a subset of the filesystem's physical devices and can be deactivated so that the physical devices can be detached (if all subvolumes on them are remote and deactivated) without unmounting the entire filesystem.
  • Per-file allocation policies - highly useful for VM disks, potential zvol killer.
  • Conversion of multi-device filesystems (potentially relevant for btrfs)
    • May be possible to use the new getfsmap ioctl; it seems to provide device information, and it looks like it iterates the whole FS (perfect for doing a whole-FS conversion, so it may even simplify single-device cases).
  • A fully-explicit mount syntax that does no scanning at all, possibly with a purely-userspace mount.bcachefs helper for scanning that then invokes the explicit form. Allows eliminating scanning/registration from the kernel.
    • Strawman: mount -t bcachefs bcachefs -o dev=...,dev=... /mount/point
  • Support the i_version inode attribute, useful for NFSv4, backup tools, indexers, and other things that want to have a counter that updates whenever something changes about a file.
  • Support DAX, to allow using NVM devices directly and reduce page-cache pressure
    • Caveat: May need careful thinking if the kernel assumes it can write directly if DAX is supported; getting checksums invalidated would suck.
  • Support converting filesystems not just to bcachefs volumes, but to image files on bcachefs volumes (and once reflinks are in, to both at once)
    • Notably, if conversion made an image as the first step, the subsequent extraction of files to bcachefs files could be a completely userspace process, which merely reflinks ranges of content out. This would also avoid the risks facing the current system, where the converter notes where the data lives in the "inner" bcachefs image, and then the content is moved out from under it (such as by GC, or btrfs rebalancing).
  • Support "adopting" devices, by importing their content into an existing filesystem using the existing extents on the old device. In theory, could supplant "adding a cache" in a minimally-invasive way.
  • Add a bcachefs command to export a path as a block device (this would satisfy some bcache use cases that bcachefs is clumsy for today, and in combination with the above two items may make migration from other filesystems easier).
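
As a rough illustration of the fully-explicit mount item above, here is a minimal Python sketch of what a purely-userspace helper might do once it has finished scanning: turn a list of member devices into the strawman command line. The dev= option is only the strawman from this list, not an existing mount option, and the function name build_mount_argv and the example device paths are made up for illustration.

```python
import shlex


def build_mount_argv(devices, mount_point):
    """Build the explicit, scan-free mount command from the strawman.

    `devices` is the list of member device paths the helper found by
    scanning in userspace; nothing here asks the kernel to scan.
    """
    if not devices:
        raise ValueError("at least one device is required")
    for dev in devices:
        # A comma would be read as an option separator by mount(8).
        if "," in dev:
            raise ValueError(f"device path contains a comma: {dev!r}")
    # Hypothetical option syntax: one dev= entry per member device.
    options = ",".join(f"dev={dev}" for dev in devices)
    return ["mount", "-t", "bcachefs", "bcachefs", "-o", options, mount_point]


if __name__ == "__main__":
    argv = build_mount_argv(["/dev/sda1", "/dev/sdb1"], "/mnt/pool")
    # Print the command instead of running it, since the syntax is a strawman.
    print(shlex.join(argv))
```

Returning an argv list rather than a shell string keeps quoting out of the helper; a real helper would presumably exec the result once the explicit form exists.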
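
For the Ext[234] attribute item above, the following Python sketch shows how the listed chattr letters correspond to bits in the inode flags word. The flag values are the FS_*_FL constants from the Linux UAPI header linux/fs.h; the table name WISHLIST_FLAGS and the function attr_letters are invented here, and nothing in this sketch says which of these bits bcachefs does or should honor.

```python
# Inode flag bits as defined in the Linux UAPI header <linux/fs.h>.
FS_SECRM_FL = 0x00000001        # 's': secure deletion
FS_UNRM_FL = 0x00000002         # 'u': undelete
FS_COMPR_FL = 0x00000004        # 'c': compress file
FS_ENCRYPT_FL = 0x00000800      # 'E': encrypted file
FS_EXTENT_FL = 0x00080000       # 'e': extent-based storage
FS_NOCOW_FL = 0x00800000        # 'C': do not copy-on-write
FS_INLINE_DATA_FL = 0x10000000  # 'N': data inlined into the inode

# Letters in the same order as the wishlist above (hypothetical table name).
WISHLIST_FLAGS = [
    ("c", FS_COMPR_FL),
    ("C", FS_NOCOW_FL),
    ("e", FS_EXTENT_FL),
    ("E", FS_ENCRYPT_FL),
    ("N", FS_INLINE_DATA_FL),
    ("s", FS_SECRM_FL),
    ("u", FS_UNRM_FL),
]


def attr_letters(flags):
    """Return the wishlist letters whose bits are set in `flags`."""
    return "".join(letter for letter, bit in WISHLIST_FLAGS if flags & bit)


if __name__ == "__main__":
    # Example flags word: compressed and no-CoW both set.
    print(attr_letters(FS_COMPR_FL | FS_NOCOW_FL))
```

On a real file, the flags word would come from the FS_IOC_GETFLAGS ioctl, which is what lsattr uses; that call is left out here to keep the sketch runnable anywhere.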