In the current (4.6) kernel, asking ext4_map_blocks() for a large allocation will result in only 8MB being allocated. For DAX, we want to see 1GB allocations.
We believe the limitation is due to the on-disk format:
    struct ext4_extent {
        __le32 ee_block;    /* first logical block extent covers */
        __le16 ee_len;      /* number of blocks covered by extent */
        __le16 ee_start_hi; /* high 16 bits of physical block */
        __le32 ee_start_lo; /* low 32 bits of physical block */
    };
but the in-memory format supports much larger extents:
    struct extent_status {
        struct rb_node rb_node;
        ext4_lblk_t es_lblk;  /* first logical block extent covers */
        ext4_lblk_t es_len;   /* length of extent in block */
        ext4_fsblk_t es_pblk; /* first physical block */
    };
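To put rough numbers on the on-disk format, here is a minimal userspace sketch of the arithmetic. It assumes that the top bit of ee_len is reserved to mark unwritten extents, so an initialized extent covers at most 1 << 15 blocks (the value the kernel calls EXT_INIT_MAX_LEN). The names SKETCH_INIT_MAX_LEN, max_extent_bytes() and extents_needed() are made up for this illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: an initialized on-disk extent covers at most 1 << 15 blocks,
 * because the top bit of the 16-bit ee_len marks unwritten extents. */
#define SKETCH_INIT_MAX_LEN (1u << 15)

/* Largest number of bytes a single on-disk extent can map. */
static uint64_t max_extent_bytes(uint32_t block_size)
{
    assert(block_size != 0);
    return (uint64_t)SKETCH_INIT_MAX_LEN * block_size;
}

/* Number of on-disk extents needed to cover `bytes`, rounded up. */
static uint64_t extents_needed(uint64_t bytes, uint32_t block_size)
{
    uint64_t per_extent = max_extent_bytes(block_size);

    return (bytes + per_extent - 1) / per_extent;
}
```

With 4KB blocks, max_extent_bytes(4096) is 134217728 (128MB) and extents_needed(1ull << 30, 4096) is 8. Under this assumption the on-disk format alone would not explain an 8MB ceiling, which is exactly the kind of thing the research step of the project should confirm or refute.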
ext4 currently assumes a 1:1 mapping between on-disk extents and in-memory extents, but there's really no reason not to split in-memory extents into on-disk extents, and merge on-disk extents into in-memory extents.
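As a rough illustration of what splitting and merging could look like, here is a userspace sketch using simplified stand-ins for the two structures. Everything in it is hypothetical: struct mem_extent, struct disk_extent, split_extent() and merge_extents() are invented for this example, the fields are kept in CPU byte order, all extents are assumed to be initialized (written), and the per-extent cap of 1 << 15 blocks is an assumption about the on-disk format. Real kernel code would also have to deal with the extent tree, locking and journalling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DISK_MAX_LEN (1u << 15)  /* assumed cap for one on-disk extent */

struct mem_extent {   /* stand-in for extent_status, without the rb_node */
    uint32_t lblk;    /* first logical block */
    uint32_t len;     /* length in blocks */
    uint64_t pblk;    /* first physical block */
};

struct disk_extent {  /* stand-in for ext4_extent, in CPU byte order */
    uint32_t block;
    uint16_t len;
    uint16_t start_hi;
    uint32_t start_lo;
};

/* Reassemble the 48-bit physical block number of an on-disk extent. */
static uint64_t disk_pblk(const struct disk_extent *ex)
{
    return ((uint64_t)ex->start_hi << 32) | ex->start_lo;
}

/* Split one in-memory extent into on-disk sized pieces.  Returns the number
 * of pieces required; nothing is written if that exceeds out_cap. */
static size_t split_extent(const struct mem_extent *in,
                           struct disk_extent *out, size_t out_cap)
{
    size_t needed = (size_t)(((uint64_t)in->len + SKETCH_DISK_MAX_LEN - 1) /
                             SKETCH_DISK_MAX_LEN);
    uint32_t done = 0;
    size_t i;

    /* The physical range must fit in the 48 bits the format provides. */
    assert(((in->pblk + in->len) >> 48) == 0);
    if (needed > out_cap)
        return needed;

    for (i = 0; i < needed; i++) {
        uint32_t left = in->len - done;
        uint32_t chunk = left < SKETCH_DISK_MAX_LEN ? left : SKETCH_DISK_MAX_LEN;
        uint64_t pblk = in->pblk + done;

        out[i].block = in->lblk + done;
        out[i].len = (uint16_t)chunk;
        out[i].start_hi = (uint16_t)(pblk >> 32);
        out[i].start_lo = (uint32_t)pblk;
        done += chunk;
    }
    return needed;
}

/* Merge n on-disk extents, sorted by logical block, into in-memory extents.
 * Neighbours are merged only when they are contiguous both logically and
 * physically.  out must have room for n entries; returns the count written. */
static size_t merge_extents(const struct disk_extent *in, size_t n,
                            struct mem_extent *out)
{
    size_t m = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint64_t pblk = disk_pblk(&in[i]);

        if (m > 0) {
            struct mem_extent *last = &out[m - 1];

            if ((uint64_t)last->lblk + last->len == in[i].block &&
                last->pblk + last->len == pblk &&
                (uint64_t)last->len + in[i].len <= UINT32_MAX) {
                last->len += in[i].len;  /* extend the previous extent */
                continue;
            }
        }
        out[m].lblk = in[i].block;
        out[m].len = in[i].len;
        out[m].pblk = pblk;
        m++;
    }
    return m;
}
```

In this sketch, splitting a 262144-block in-memory extent (1GB with 4KB blocks) yields 8 on-disk extents, and feeding those 8 extents back through merge_extents() reproduces the original single extent. A round trip like that is the basic property any real split/merge code would need to preserve.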
The next limit on the size returned by ext4_map_blocks() is probably 2GB, imposed by:
    struct ext4_map_blocks {
        ext4_fsblk_t m_pblk;
        ext4_lblk_t m_lblk;
        unsigned int m_len;
        unsigned int m_flags;
    };
The type of m_len could easily be changed, but this isn't necessary for supporting 1GB pages with DAX. It may be desirable for supporting 16GB pages on, e.g., PPC64, but that would be extra credit for this assignment.
So, the project has three parts, plus an optional fourth:
- Research the ext4 block allocator, and make sure this really is the reason for the 8MB limit.
- Write kernel code to split and merge ext4 extents when converting between the on-disk and in-memory formats.
- Remove the 8MB limit. Test like crazy. Use the infrastructure in xfstests to make sure this is really happening.
- Change the type of m_len (optional).