Using MpuPacker to Minimize Waste

The Cortex-M v7 MPU requires regions to be a power of two in size and aligned to their size. This can waste a lot of memory if they aren’t ordered efficiently. We developed our MpuPacker utility to help with this by optimizing the ordering and also recommending regions that can be reduced in size or for which subregions can be disabled. It processes the linker command file (.icf) and map file to do this, and it outputs a text file with the recommendations. All discussion here is in regard to the IAR EWARM tools.

Background

The linker works with blocks. Blocks can contain code and data sections (e.g. .text, .data, etc.) and other blocks. At the top level, each block becomes a region in the MPU. It is these blocks that are ordered by MpuPacker, and this section of the .icf file is indicated by a special marker comment. We define a block for each memory space (e.g. SRAM, Flash, etc.) that contains these region blocks. For example:

define block sram_block with fixed order, size = sramsz*5/8, alignment = sramsz
                        {block sys_data, block fsdd_data, block EMAC_BUF, block nslo_data,
                         block ns_data, block lcd_data, block fpu_data, block usbh_data,
                         block usbhdp_data, block nsdp_data, block usbddp_data,
                         block usbd_data, rw};

Here, sys_data, ns_data, and others will become regions in the MPU images for different tasks. The ordering of this list is one of the main recommendations MpuPacker makes.

In general, blocks pack most tightly when ordered largest to smallest. However, it is not that simple: the best order also depends on the particular set of block sizes and on subregions. Subregions are a feature of the MPU that helps compensate for the size/alignment restriction. A region has 8 subregions of equal size that may be independently disabled. Disabling them at the end of a region allows a smaller region to be located in that space.
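The subregion arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of MpuPacker; the function name effective_size is made up here. It assumes subregions are disabled only at the end of a region, and it relies on the ARMv7-M rule that regions smaller than 256 bytes do not support subregions.

```python
def effective_size(region_size: int, enabled_eighths: int = 8) -> int:
    """Bytes covered by a region when only its first n of 8 subregions are enabled.

    Hypothetical helper for illustration; not an MpuPacker function.
    """
    if region_size < 32 or region_size & (region_size - 1):
        raise ValueError("region size must be a power of two, 32 bytes or more")
    if not 1 <= enabled_eighths <= 8:
        raise ValueError("enabled subregions must be between 1 and 8")
    if enabled_eighths != 8 and region_size < 256:
        raise ValueError("regions under 256 bytes have no subregions")
    return region_size // 8 * enabled_eighths


# A 0x20000-byte region with its last 2 subregions disabled covers 0x18000 bytes,
# leaving 0x8000 bytes free for a smaller region.
print(hex(effective_size(0x20000, 6)))
```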

Region Ordering

MpuPacker first shows the existing statement in the .icf file and an analysis of the block sizes and gaps:

*** Unoptimized Fixed Order Region ***

define block sram_block with fixed order, size = sramsz*5/8, alignment = sramsz
                        {block sys_data, block usbh_data, block usbhdp_data, block fsdd_data,
                         block ns_data, block nslo_data, block EMAC_BUF, block lcd_data,
                         block nsdp_data, block usbddp_data, block fpu_data,
                         block usbd_data, rw}

Details for Region  sram_block  ARMv7 MPU Alignment
       Block sys_data      Size   20000   8/8-size  20000   Start 20000000  End 2001ffff
       Block usbh_data     Size    8000   6/8-size   6000   Start 20020000  End 20025fff  *
       Block usbhdp_data   Size    2000   8/8-size   2000   Start 20026000  End 20027fff
       (gap)               Size   18000                     Start 20028000  End 2003ffff  *
       Block fsdd_data     Size   20000   6/8-size  18000   Start 20040000  End 20057fff  *
       Block ns_data       Size       -
       Block nslo_data     Size       -
       Block EMAC_BUF      Size       -
       Block lcd_data      Size     200   7/8-size    1c0   Start 20058000  End 200581bf  *
       Block nsdp_data     Size       -
       Block usbddp_data   Size       -
       Block fpu_data      Size      20   8/8-size     20   Start 200581c0  End 200581df
       Block usbd_data     Size       -
       Total Gaps          Size   18000
       End Free            Size    7e9c

Gaps are marked (gap), and blocks marked ‘-’ are not present due to the configuration of this build. Here is the optimized ordering:

*** Optimized Fixed Order Region ***
define block sram_block with fixed order, size = sramsz*5/8, alignment = sramsz
                        {block sys_data, block fsdd_data, block usbh_data, block usbhdp_data,
                         block lcd_data, block fpu_data, block ns_data,
                         block nslo_data, block EMAC_BUF, block nsdp_data,
                         block usbddp_data, block usbd_data, rw};

Details for Region  sram_block  ARMv7 MPU Alignment
       Block sys_data      Size   20000   8/8-size  20000   Start 20000000  End 2001ffff
       Block fsdd_data     Size   20000   6/8-size  18000   Start 20020000  End 20037fff  *
       Block usbh_data     Size    8000   6/8-size   6000   Start 20038000  End 2003dfff  *
       Block usbhdp_data   Size    2000   8/8-size   2000   Start 2003e000  End 2003ffff
       Block lcd_data      Size     200   7/8-size    1c0   Start 20040000  End 200401bf  *
       Block fpu_data      Size      20   8/8-size     20   Start 200401c0  End 200401df
       Block ns_data       Size       -
       Block nslo_data     Size       -
       Block EMAC_BUF      Size       -
       Block nsdp_data     Size       -
       Block usbddp_data   Size       -
       Block usbd_data     Size       -
       Total Gaps          Size       0
       End Free            Size    7e9c

Total gaps went from 0x18000 to 0. (In this example the optimized blocks happen to be in decreasing size order; that is usually not the case.) Notice that the second block (fsdd_data) uses only 0x18000 bytes because its last 2 subregions are disabled, which makes it possible to squeeze a 0x8000-byte region (usbh_data) in right after it. Similarly, the lcd_data region has a subregion disabled, allowing fpu_data to fit. The n/8 fractions make it easy to see how many subregions are disabled at the end.
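To check the two tables by hand, the placement rule can be sketched in Python. This is an illustration under simple assumptions, not MpuPacker's actual algorithm: each block is placed at the next address that is a multiple of its region size, and it occupies n/8 of that size. The function name layout and the tuple format are invented for this example; the sizes and fractions are taken from the tables above.

```python
def layout(base, blocks):
    """Place (name, region_size, enabled_eighths) blocks in the given order.

    Returns (placements, total_gap), where placements holds (name, start, end).
    """
    addr = base
    placements = []
    total_gap = 0
    for name, size, eighths in blocks:
        start = (addr + size - 1) & ~(size - 1)  # align up to the region size
        total_gap += start - addr                # bytes skipped to reach alignment
        end = start + size // 8 * eighths        # first address past the block
        placements.append((name, start, end - 1))
        addr = end
    return placements, total_gap


SRAM_BASE = 0x20000000
unoptimized = [("sys_data", 0x20000, 8), ("usbh_data", 0x8000, 6),
               ("usbhdp_data", 0x2000, 8), ("fsdd_data", 0x20000, 6),
               ("lcd_data", 0x200, 7), ("fpu_data", 0x20, 8)]
optimized = [("sys_data", 0x20000, 8), ("fsdd_data", 0x20000, 6),
             ("usbh_data", 0x8000, 6), ("usbhdp_data", 0x2000, 8),
             ("lcd_data", 0x200, 7), ("fpu_data", 0x20, 8)]

for label, order in (("unoptimized", unoptimized), ("optimized", optimized)):
    placed, gap = layout(SRAM_BASE, order)
    print(label, "total gaps:", hex(gap))
    for name, start, end in placed:
        print(f"  {name:12s} {start:08x} {end:08x}")
```

Trying every ordering with a function like this is one brute-force way to search for a gap-free layout, though the text does not say how MpuPacker itself searches.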

To use the optimized ordering, simply copy the define block statement shown above the table and paste it into the .icf file, replacing the old statement.

Here is an optimized ordering for the Flash:

Details for Region  rom_block  ARMv7 MPU Alignment
       Block sys_code      Size   20000   8/8-size  20000   Start 08000000  End 0801ffff
       Block fsdd_code     Size   20000   5/8-size  14000   Start 08020000  End 08033fff  *
       Block lcd_code      Size    4000   5/8-size   2800   Start 08034000  End 080367ff  *
       Block fpu_code      Size     800   8/8-size    800   Start 08036800  End 08036fff
       Block usbhdp_code   Size    1000   7/8-size    e00   Start 08037000  End 08037dff  *
       (gap)               Size     200                     Start 08037e00  End 08037fff  *
       Block opcon_code    Size     400   8/8-size    400   Start 08038000  End 080383ff
       Block led_code      Size     400   7/8-size    380   Start 08038400  End 0803877f  *
       (gap)               Size    7880                     Start 08038780  End 0803ffff  *
       Block usbh_code     Size   10000   5/8-size   a000   Start 08040000  End 08049fff  *
       Block nslo_code     Size       -
       Block ns_code       Size       -
       Block nsdp_code     Size       -
       Block usbddp_code   Size       -
       Block usbd_code     Size       -
       Total Gaps          Size    7a80
       End Free            Size   918c6

Study the actual sizes and starting addresses on the top lines to see how blocks were packed into the areas made available by disabled subregions. I’m sure you can appreciate that manually optimizing this would be a pain, and as your system is developed, sizes will change, requiring it to be done repeatedly. Clearly it’s a big help to have a tool instantly figure this out, and that’s why we developed it.

Block Tails

Before worrying about region ordering, the first thing to do is to get code and data blocks sized properly for the actual amount of code and data. The coarse adjustment is the region size, and the fine adjustment is the disabled subregions at the end. Here is an example of what MpuPacker displays for the code regions:

rom_block
                                        addr           tail         subreg      opt
                                        ----           ----         ------      ---
     ucom_code          const        0x0800447a      0x1b86         0x1000       S
     sys_code           uninit       0x08016e60      0x91a0         0x4000       S
     fs_code            const        0x0802b994       0x66c         0x2000        
     fsdp_code          const        0x0802c774       0xc8c          0x400       R
     fsdd_code          const        0x0802d400      0x6c00         0x4000       R
     usbhdp_code        const        0x08034cc8       0x138          0x200        
     fpu_code           uninit       0x08035000       0x800          0x100       -
     opcon_code         const        0x08035900       0x300           0x80       R
     led_code           const        0x08035d80       0x200           0x80       R
     lcd_code           const        0x0803a53c       0x2c4          0x800        
     usbh_code          const        0x0804841c      0x1be4         0x2000         
                                                 ----------
                                          Total:    0x158fe (88318)

The tails are the wasted memory, and in this case the Total shows 88318 bytes. For example, if a region size is 0x400 but only 0x1F0 bytes are used, you can divide the region size by 2 to get 0x200, and 0x1F0 still fits. Alternatively, if the number of bytes used is 0x370, a subregion can be disabled to make the effective region size 0x380. (Subregion size is 0x400 / 8 = 0x80.)
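The coarse and fine adjustments can be combined into one calculation. The Python sketch below is a hypothetical helper, not MpuPacker code: it picks the smallest power-of-two region that holds the used bytes, then the fewest enabled subregions that still cover them. It assumes the ARMv7-M minimum region size of 32 bytes and that regions under 256 bytes have no subregions.

```python
def fit_region(used: int) -> tuple[int, int]:
    """Return (region_size, enabled_eighths) for the tightest fit of `used` bytes.

    Hypothetical helper for illustration; not an MpuPacker function.
    """
    size = 32                      # smallest ARMv7-M region
    while size < used:
        size *= 2                  # coarse adjustment: next power of two
    if size < 256:
        return size, 8             # too small to have subregions
    subregion = size // 8
    eighths = -(-used // subregion)  # fine adjustment: ceiling division
    return size, eighths


for used in (0x1F0, 0x370):
    size, eighths = fit_region(used)
    print(f"used {used:#x}: region {size:#x} at {eighths}/8 "
          f"= {size // 8 * eighths:#x} bytes")
```

For 0x1F0 bytes this gives a 0x200 region at 8/8, and for 0x370 bytes a 0x400 region at 7/8 (0x380 bytes), matching the two cases in the paragraph above.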

The opt column indicates the recommended optimization for you to make. R means the region size is too big and can be reduced by a factor of two or more. In the .icf, divide that region size by 2 (and also change the n/8 to 8/8 on that line). S means that one or more additional subregions may be disabled. Do this by reducing n in the n/8 fraction for that region in the .icf file. Handle each of the lines and then run MpuPacker again to see the new results. If an R didn’t go away, divide that region size by 2 again. If an S didn’t, reduce the n/8 again. Repeat until the opt column is empty. Then the regions are tight, and you can look at the Region Ordering section and copy the optimized definitions to the .icf.

In the table above, the ucom_code line shows that the tail (waste) is bigger than the subregion size, so clearly a subregion can be disabled, hence the S in the opt column. The .icf has these lines:

define exported symbol ucomcsz   = 0x8000;
…
define block ucom_code with fixed order, size = ucomcsz*6/8, alignment = ucomcsz

Dividing the size by 2 would mean changing the ucomcsz value from 0x8000 to 0x4000. Disabling a subregion would mean changing 6/8 to 5/8. Note that you will never reduce 5/8 to 4/8, since you would instead divide the block size by 2 and return to 8/8 for the multiplier. If this were possible, MpuPacker would indicate R on that line.
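The R/S decision and the repeat-until-empty procedure can be sketched as follows. The rules here are an assumption inferred from the description above (R when the used bytes fit in half the region, S when the tail is at least one subregion); MpuPacker's real logic may differ, and the function names recommend and tighten are invented for this example. The used byte count for ucom_code is derived from the table: 0x8000 * 6/8 minus the 0x1b86 tail is 0x447a.

```python
def recommend(region_size: int, eighths: int, used: int) -> str:
    """Return 'R', 'S', or '' for one block, using the assumed rules."""
    subregion = region_size // 8
    tail = subregion * eighths - used       # unused bytes at the end of the block
    if region_size > 32 and used <= region_size // 2:
        return "R"                          # the region itself can be halved
    if region_size >= 256 and tail >= subregion:
        return "S"                          # one more subregion can be disabled
    return ""


def tighten(region_size: int, eighths: int, used: int) -> tuple[int, int]:
    """Apply recommendations repeatedly until none remain."""
    while True:
        opt = recommend(region_size, eighths, used)
        if opt == "R":
            region_size, eighths = region_size // 2, 8  # halve, return to 8/8
        elif opt == "S":
            eighths -= 1                                # disable one subregion
        else:
            return region_size, eighths


# ucom_code: ucomcsz = 0x8000 at 6/8 with 0x447a bytes used
print(recommend(0x8000, 6, 0x447a))
print(tighten(0x8000, 6, 0x447a))
```

Under these assumptions ucom_code gets an S and settles at 0x8000 with 5/8. The loop never goes from 5/8 to 4/8, because a block that fits in 4/8 would be flagged R first.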

Internal Gaps

MpuPacker also reports information about internal gaps within a region, which can occur when a nested block specifies an alignment. Nesting like this is done because most MPUs have only 8 slots. If a task needs access to regions A and B but has only 1 available MPU slot, B could be nested in A (assuming it is acceptable for other tasks needing A to also be able to access B). A task needing only B would have only region B in its MPU slot, so it can’t access the rest of A. B must still be sized and aligned according to the usual rule, so if any other blocks or sections precede it in A, there could be a gap between them and the start of B. The solution is usually to move B to the start of the list for A. This is another issue to fix before getting the new region ordering.
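A small Python sketch shows where such a gap comes from. The addresses and sizes below are made-up values for illustration, and internal_gap is a hypothetical helper, not an MpuPacker function. It assumes B must start on a multiple of its own region size and that A starts on an address that already satisfies B's alignment.

```python
def internal_gap(parent_start: int, preceding_bytes: int, nested_size: int) -> int:
    """Bytes skipped inside A so that nested block B starts on its alignment."""
    addr = parent_start + preceding_bytes                      # first free byte in A
    aligned = (addr + nested_size - 1) & ~(nested_size - 1)    # align up for B
    return aligned - addr


A_START = 0x08020000   # hypothetical start of region A
B_SIZE = 0x2000        # hypothetical region size of nested block B

# B listed after 0x1234 bytes of other content in A: B must wait for 0x08022000
print(hex(internal_gap(A_START, 0x1234, B_SIZE)))
# B listed first in A: no gap
print(hex(internal_gap(A_START, 0, B_SIZE)))
```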

Conclusion

The size and alignment restrictions of the v7 MPU are unfortunate, but MpuPacker is a big help in dealing with them. The v8 MPU does not have these restrictions; its regions need only be a multiple of 32 bytes in size and 32-byte aligned, so MpuPacker is not needed for it. However, the majority of Cortex-M based systems in existence are v7, and an important goal of SecureSMX is to allow OEMs to improve the security of existing systems. We have put a lot of work into supporting v7, not just with MpuPacker, but by handling similar issues in the code.
