
RFC: Precalculated swap log for supporting devices with bigger write block sizes #2122

Open
de-nordic opened this issue Nov 15, 2024 · 5 comments


@de-nordic
Collaborator

Where we are

At this point MCUboot, while moving sectors around, logs the status of operations with three flags per sector. On devices with a write block size of 4 bytes this amounts to 12 bytes per moved sector. This becomes problematic when you want to make MCUboot work with devices that have bigger write block sizes: for devices that require 8 bytes it is already 24 bytes, and it becomes 48 bytes for a 16-byte write block size. When you get to devices with blocks of, say, 512 bytes, this becomes hard to handle.

Note: when using "sector" here I mean a logical sector of N * erase-block-size, taking the device with the biggest erase-block-size.
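The growth of the per-sector status cost can be sanity-checked with a quick calculation; the helper name below is illustrative, not MCUboot code:

```c
#include <stdint.h>

/* Three status flags per moved sector, each padded out to one write block. */
static uint32_t status_bytes_per_sector(uint32_t write_block_size)
{
    return 3u * write_block_size;
}
/* status_bytes_per_sector(4)   -> 12   (4-byte write block)
 * status_bytes_per_sector(8)   -> 24
 * status_bytes_per_sector(16)  -> 48
 * status_bytes_per_sector(512) -> 1536 (hard to handle)                  */
```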

What can we do?

The proposal here is to replace the swap log with an information block, a layout record, that would state the desired state of a slot; the swap algorithm would then just try to restore the desired state from what it finds on flash.

The layout record could consist of an array of SHAs of sectors, in the order they should appear in the slot. Note that a SHA-256 digest is 32 bytes long, so it is already shorter than a log entry on a device with a 16-byte write block size (48 bytes).

For example

    [ State header ]
    [ sha(sector0) ]
    [ sha(sector1) ]
    ...
    ...
    ...
    [ sha(sectorN) ]
    [ sha(layout record) ]

Where sha is whatever SHA we select (it could be configurable; SHA-256 seems fine), and sectorX is a logical-sector-sized block on the device, within the slot.
Each slot will have its own expected layout record, so there would be two such blocks. The layout record's SHA is just there to make sure that the record is complete.

Note that, if we put the header size aside, the log size is now just N * sizeof(sha) and is aligned to the write block size as a whole, not per log entry (actually it will be aligned to the erase block size, or logical sector).
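A minimal sketch of what such a record could look like; the struct layout, field names, and the SHA-256 digest length are assumptions for illustration, not an agreed format:

```c
#include <stdint.h>

#define SECTOR_SHA_LEN 32u /* assuming SHA-256 */

/* Hypothetical state header; the per-sector digests and the digest of the
 * whole record would follow it in flash. */
struct layout_record_hdr {
    uint32_t magic;        /* marks a valid record                   */
    uint32_t sector_count; /* N logical sectors covered by the record */
};

/* Whole-record size: header + N sector digests + the record digest,
 * rounded up to the erase block size as discussed above. */
static uint32_t layout_record_size(uint32_t sectors, uint32_t erase_size)
{
    uint32_t raw = (uint32_t)sizeof(struct layout_record_hdr)
                 + (sectors + 1u) * SECTOR_SHA_LEN;
    return (raw + erase_size - 1u) / erase_size * erase_size;
}
```

For example, 128 logical sectors with a 4 KB erase block give a raw record of 8 + 129 * 32 = 4136 bytes, occupying two erase blocks (8192 bytes).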

Unresolved questions:

  1. Where do we store this record?
    Storing logs at the end of a slot is probably not a good idea; we should have reserved space outside of the slots, and the entire slot should be dedicated to the application image. The problem here will be devices with big erase sizes, as the advantage of a small log record would be killed by using an entire large, independent erase block just to store it; for such devices it is better to have the record inside the slot. This could probably be configurable depending on device layout.
  2. How to mark test/confirm.
    Generally, if we just assume that MCUboot always attempts to restore the layout according to the pre-defined log, then the mere appearance of the log would mean that the swap should happen, and confirmation could be just removal of the log. This is again a problem for devices with big erase blocks, so they would have to mark somehow that they want to stay with the tested layout anyway.
  3. Should the application carry its own layout record?
    Probably, specifically when signed. Also, calculating the record on device is risky, but should be evaluated.

@nvlsianpu @d3zd3z @nordicjm @butok @erwango

@butok
Contributor

butok commented Nov 19, 2024

To clarify.
Will these SHAs be generated and preinstalled by imagetool, or computed at run-time?

@de-nordic
Collaborator Author

To clarify. Will these SHAs be generated and preinstalled by imagetool, or computed at run-time?

Actually that is point 3 of the unresolved questions. I guess we can have it one of the following ways, probably configurable:

  1. the SHA table is calculated and carried by every image
    + we have it pre-calculated and are sure that it is not corrupted by hardware errors
    +/- kind of saves time on calculation, but we mostly have accelerators to do that job anyway
    - takes up space within the image
  2. calculated on hardware:
    + saves image space
    +/- calculation time may depend on having an accelerator
    - risk of calculating something wrong
  3. combo: the SHA of each page is not pre-calculated, but the SHA of the SHA table is
    + saves image space
    +/- calculation time may depend on having an accelerator
    + even if we have calculated something wrong, or the area where we calculate the SHA is broken, the image-carried SHA of the SHA table will prevent us from bricking the device
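A rough sketch of option 3, with hypothetical names; a toy 32-bit FNV-1a hash stands in for SHA-256 just to keep the example self-contained:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy 32-bit FNV-1a; a real implementation would use SHA-256, likely via a
 * hardware accelerator. */
static uint32_t fnv1a(const uint8_t *p, size_t len, uint32_t h)
{
    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

#define FNV_INIT 2166136261u

/* Hash every sector on the device, then hash the resulting table and compare
 * against the pre-calculated value carried by the image. A mismatch means
 * either the slot content or our own calculation is broken, so we refuse to
 * swap instead of risking bricking the device. Returns 1 on success. */
static int build_and_check_table(const uint8_t *slot, size_t sector_size,
                                 size_t sectors, uint32_t image_table_hash,
                                 uint32_t *table /* sectors entries */)
{
    for (size_t i = 0; i < sectors; i++) {
        table[i] = fnv1a(slot + i * sector_size, sector_size, FNV_INIT);
    }
    uint32_t got = fnv1a((const uint8_t *)table,
                         sectors * sizeof(table[0]), FNV_INIT);
    return got == image_table_hash;
}
```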

@nordicjm
Copy link
Collaborator

I don't see how it could be done without having the hash table on the flash. If a device reboots while a page is being swapped, how are you going to know which part goes with which partial image? One sector will be corrupted. If you don't have a checksum/hash of the sectors for each image, you can't know where that is, nor can you re-generate an "in RAM" hash list, because then you have a huge number of operations trying to put a bunch of hashes together to come up with the "sha of the sha table".

@de-nordic
Copy link
Collaborator Author

de-nordic commented Nov 19, 2024

I don't see how it could be done without having the hash table on the flash. If a device reboots while a page is being swapped, how are you going to know which part goes with which partial image? One sector will be corrupted. If you don't have a checksum/hash of the sectors for each image, you can't know where that is, nor can you re-generate an "in RAM" hash list, because then you have a huge number of operations trying to put a bunch of hashes together to come up with the "sha of the sha table".

No, we do intend to have it on flash; the question is where we get it from when we start the swap (and then where we place it).

@PavelChromyNXP

Hello all,

I believe it is a very good idea to implement a more efficient algorithm for image swapping.
We discussed a similar approach internally at NXP quite a long time ago, and there was also a design proposal, although it never got to implementation due to the significant redesign of MCUboot required to achieve the goal (it would not be an isolated change).

For optimal operation the whole swapping process may need to be altered, as the current swapping methods have significant limitations and/or drawbacks.
The current algorithm with logging not only does not fit devices with larger write phrases, it is also quite wasteful, taking 3 writes per swapping step while only 2 are actually necessary (but that's another story).

Let me share our analysis and description of the design we have been thinking about.

Generic thoughts:

The swapping using scratch area is not ideal due to FLASH wear of the scratch and unnecessary FLASH operations performed.

The swapping with the move-by-one approach scatters the wear across a larger area of FLASH, but there are still ~twice as many FLASH operations as required.
Moreover, moving the primary (executable) slot is not really a neat solution, as it adds unnecessary state to be handled during recovery.

It would be fairly easy to handle an offset of the image in the secondary/staging slot, because there is no strong (technical) requirement for the placement of the image (the code never gets executed directly from there).
So that either the image could be staged with an offset, or better, the swapping could be done so that the original image would be backed up to the secondary/staging slot with an offset.
The swapping would need to be performed backwards, from the end of the image down, utilizing the unused space following the staged image to backup the original one.

Having metadata (trailer) at the end of the slots is not really a practical solution, as that space cannot be used to store the image itself anyway.
There could simply be a reserved area that does not need to be treated as part of the slot. (That area would then be a clear choice for storing the hashes.)

Overhead calculation/estimations:

It is quite clear that for fail-safe image swapping it is necessary to have at least a single spare erase sector.
However the swapping steps do not necessarily need to correspond with erase sectors. Each swapping step may span multiple of them.

Example: let's assume each slot consists of 1024 erase sectors. We divide each slot into 128 chunks to be swapped (8 erase sectors each).
The overhead of erase sectors needed for failsafe operation is 1/128 of the slot size, that is less than 1%, which is more than acceptable.
So in practice, disregarding the actual slot size, there is no big problem with limiting the number of swap steps (and hashes) to a constant number, say 128 as in the setup presented.
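The overhead figures from the example can be verified quickly; SHA-256 is assumed, as elsewhere in this thread:

```c
#include <stdint.h>

/* Erase sectors grouped into each swap chunk: 1024 / 128 = 8, so one spare
 * chunk costs 1/128 of the slot, i.e. under 1%. */
static uint32_t erase_sectors_per_chunk(uint32_t erase_sectors, uint32_t chunks)
{
    return erase_sectors / chunks;
}

/* Size of the hash table: 128 chunks * 32 bytes (SHA-256) = 4096 bytes,
 * plus a single extra 32-byte hash for the other image. */
static uint32_t hash_table_bytes(uint32_t chunks, uint32_t sha_len)
{
    return chunks * sha_len;
}
```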

But do we really need to have a hash table for both images for successful recovery? Actually not. Swapped chunks are not scattered randomly; we know the swapping algorithm. We just need to find the crossing point where the swapping was interrupted.
For that, the hash table of just one of the images is needed. Once we put that image together, it is clear where and how to get the other one, possibly keeping a single hash for its final verification.
In absolute numbers, the space needed for SHA-256 hashes would be 4 KB + 32 bytes in total: the hash table plus one extra hash for the other image.
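The crossing-point search could then be as simple as the sketch below: walk the chunks in swap order and find the first one whose actual digest no longer matches the expected table of the one image we track (names and the 32-bit digest type are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first chunk whose actual digest differs from the
 * expected one, i.e. the point where swapping was interrupted; returns n when
 * the whole slot already matches the expected layout. Everything before the
 * returned index has already been swapped; everything from it onward still
 * holds the old content, so recovery knows exactly where to resume. */
static size_t find_crossing_point(const uint32_t *actual,
                                  const uint32_t *expected, size_t n)
{
    size_t i = 0;
    while (i < n && actual[i] == expected[i]) {
        i++;
    }
    return i;
}
```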

Going further, the image typically would not take up the whole slot. The application designer likely leaves a margin of at least 10% or more.
If the actual image occupies 80% of the slot size, then 20% is unused and could be leveraged for image swapping. In such a case a smart swapping algorithm could adapt and finish in 5 steps while keeping just 6 hashes.
It is disputable whether such an optimization makes sense given the increased complexity; surely not in the initial implementation. This is maybe just for your consideration.

Where to store the hashes:

Rather not pre-calculated in the image itself for several reasons:

  • If the swapping gets interrupted, it would be very impractical to determine where the hash table was at the moment of interruption and which image it actually belonged to.
  • It would not save us from on-device hash calculation, as that is still needed for verification (on-device hashing needs to be implemented in any case).
  • The size of the swapping/hashing chunk is not a property of the image but rather depends on the configuration of the bootloader.
  • Consider an MCU with external XIP FLASH. Each manufacturing batch of the device could use a different FLASH part (with a different erase sector size). Well-designed firmware should not depend on such a detail.
  • For a neat design, the swapping algorithm should not depend on the actual content it is processing, but rather be generic to allow for flexibility/modularity. All it needs to know are the addresses/lengths of the data to swap.

Hashes in the FLASH:

  • A straightforward solution; ~4 KB of overhead for a reasonably large FLASH memory may not be that big a deal.

In RAM:

  • Doable. The hashes for all blocks would have to be re-calculated and the table reconstructed upon recovery from failure.
  • A hash of the table is needed for its reconstruction.
  • For the table from the example above, the hash of the 4096-byte table would have to be calculated up to 128 times. That roughly corresponds to calculating a hash over about 500 KB of data.
  • If that seems slow, it would be possible to speed it up by also storing a CRC32 of the table, reconstructing the table according to it, verifying the hash only when the CRC matches, and retrying if needed.
  • Using a dynamic programming technique, the total amount of data passing through the CRC32 algorithm could be cut down by half.
  • In any case the recovery time should not be a big deal if it finishes in some reasonable time (e.g. within a minute), as this is an exceptional scenario, not expected to happen under normal operation.
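To put a number on the worst case described above (illustrative helper, same 128-chunk/SHA-256 assumptions):

```c
#include <stdint.h>

/* Worst case for in-RAM reconstruction: the whole hash table is re-hashed
 * once per candidate crossing point, i.e. up to 128 * (128 * 32) = 524288
 * bytes = 512 KB, matching the "about 500 KB" estimate above. */
static uint32_t worst_case_hashed_bytes(uint32_t chunks, uint32_t hash_len)
{
    uint32_t table_size = chunks * hash_len; /* 128 * 32 = 4096 bytes */
    return chunks * table_size;              /* up to 128 re-hashes   */
}
```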

Summary:

  • It would be a good idea to have reserved space to keep bootloader metadata/config separate from the image slots. Ideally, having two instances in independent erase blocks, to allow for a safe ping-pong style update, is preferred.
  • It is not beneficial to insist on always placing the image at the very beginning of the secondary slot. Relaxing this requirement reduces FLASH wear and possibly the number of states to be taken into account during recovery.
  • Swapping does not need to be done in units of the erase sector size. The unit could be any multiple of it; it is a matter of choice with an overhead tradeoff.
  • A full hash table is needed just for one of the images. For the other one a single hash is sufficient.
  • Having the hashes in RAM is possible but likely does not pay off if there is space for metadata allocated in the FLASH.
