The Cortex-M v7 MPU requires each region to be a power of two in size and aligned to its size. This can waste a lot of memory if the regions aren’t ordered efficiently. We developed our MpuPacker utility to help with this: it optimizes the ordering and also recommends regions that can be reduced in size or for which more subregions can be disabled. It processes the linker configuration file (.icf) and the map file to do this, and it outputs a text file with the recommendations. All discussion here applies to the IAR EWARM tools.
Background
The linker works with blocks. Blocks can contain code and data sections (e.g. .text, .data, etc.) and other blocks. At the top level, each block becomes a region in the MPU. It is these blocks that are ordered by MpuPacker, and this section of the .icf file is indicated by a special marker comment. We define a block for each memory space (e.g. SRAM, Flash, etc.) that contains these region blocks. For example:
define block sram_block with fixed order, size = sramsz*5/8, alignment = sramsz
{block sys_data, block fsdd_data, block EMAC_BUF, block nslo_data,
block ns_data, block lcd_data, block fpu_data, block usbh_data,
block usbhdp_data, block nsdp_data, block usbddp_data,
block usbd_data, rw};
Here, sys_data, ns_data, and others will become regions in the MPU images for different tasks. The ordering of this list is one of the main recommendations MpuPacker makes.
In general, blocks pack most tightly when ordered largest to smallest. However, it is not that simple: the best order depends on the particular set of block sizes and on subregions. Subregions are an MPU feature that helps compensate for the size/alignment restriction. A region has 8 subregions of equal size that may be independently disabled. Disabling subregions at the end of a region allows a smaller region to be located in that space.
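To make the rules concrete, here is a minimal Python sketch of the size/alignment check and of the two MPU_RASR fields that encode a region's size and its disabled subregions. The function names are invented for this illustration and are not part of MpuPacker. The encodings follow the ARMv7-M architecture: region size is 2^(SIZE+1), SRD bit i disables subregion i counting from the lowest address, and subregions are only available for regions of 256 bytes or more.

```python
def is_valid_v7_region(base, size):
    """True if size is a power of two (at least 32 bytes) and base is aligned to it."""
    return size >= 32 and size & (size - 1) == 0 and base % size == 0


def rasr_size_field(size):
    """MPU_RASR.SIZE value for a power-of-two region: size == 2 ** (SIZE + 1)."""
    if size < 32 or size & (size - 1):
        raise ValueError("region size must be a power of two >= 32")
    return size.bit_length() - 2


def srd_mask(size, enabled_eighths):
    """MPU_RASR.SRD value that disables the trailing (8 - enabled_eighths) subregions."""
    if not 1 <= enabled_eighths <= 8:
        raise ValueError("enabled_eighths must be 1..8")
    if size < 256 and enabled_eighths != 8:
        raise ValueError("subregions need a region of at least 256 bytes")
    return (0xFF << enabled_eighths) & 0xFF


# A 0x20000-byte region with its last two subregions disabled (6/8 enabled):
print(rasr_size_field(0x20000), hex(srd_mask(0x20000, 6)))
```

With 6/8 enabled, only the first 0x18000 bytes of the 0x20000 region are covered, and the remaining 0x8000 bytes are free to hold another, smaller region.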
Region Ordering
MpuPacker first shows the existing statement in the .icf file and an analysis of the block sizes and gaps:
*** Unoptimized Fixed Order Region ***
define block sram_block with fixed order, size = sramsz*5/8, alignment = sramsz
{block sys_data, block usbh_data, block usbhdp_data, block fsdd_data,
block ns_data, block nslo_data, block EMAC_BUF, block lcd_data,
block nsdp_data, block usbddp_data, block fpu_data,
block usbd_data, rw};
Details for Region sram_block   ARMv7 MPU Alignment
Block sys_data     Size 20000  8/8-size 20000  Start 20000000  End 2001ffff
Block usbh_data    Size  8000  6/8-size  6000  Start 20020000  End 20025fff *
Block usbhdp_data  Size  2000  8/8-size  2000  Start 20026000  End 20027fff
(gap)              Size 18000                  Start 20028000  End 2003ffff *
Block fsdd_data    Size 20000  6/8-size 18000  Start 20040000  End 20057fff *
Block ns_data      Size -
Block nslo_data    Size -
Block EMAC_BUF     Size -
Block lcd_data     Size   200  7/8-size   1c0  Start 20058000  End 200581bf *
Block nsdp_data    Size -
Block usbddp_data  Size -
Block fpu_data     Size    20  8/8-size    20  Start 200581c0  End 200581df
Block usbd_data    Size -
Total Gaps         Size 18000
End Free           Size  7e9c
Gaps are marked (gap), and blocks marked '-' are not present due to the configuration of this build. Here is the optimized ordering:
*** Optimized Fixed Order Region ***
define block sram_block with fixed order, size = sramsz*5/8, alignment = sramsz
{block sys_data, block fsdd_data, block usbh_data, block usbhdp_data,
block lcd_data, block fpu_data, block ns_data,
block nslo_data, block EMAC_BUF, block nsdp_data,
block usbddp_data, block usbd_data, rw};
Details for Region sram_block   ARMv7 MPU Alignment
Block sys_data     Size 20000  8/8-size 20000  Start 20000000  End 2001ffff
Block fsdd_data    Size 20000  6/8-size 18000  Start 20020000  End 20037fff *
Block usbh_data    Size  8000  6/8-size  6000  Start 20038000  End 2003dfff *
Block usbhdp_data  Size  2000  8/8-size  2000  Start 2003e000  End 2003ffff
Block lcd_data     Size   200  7/8-size   1c0  Start 20040000  End 200401bf *
Block fpu_data     Size    20  8/8-size    20  Start 200401c0  End 200401df
Block ns_data      Size -
Block nslo_data    Size -
Block EMAC_BUF     Size -
Block nsdp_data    Size -
Block usbddp_data  Size -
Block usbd_data    Size -
Total Gaps         Size     0
End Free           Size  7e9c
Total gaps went from 0x18000 to 0. (The blocks happen to end up in decreasing size order here, but that is usually not the case.) Notice that the second line (fsdd_data) uses only 0x18000 bytes because 2 subregions are disabled at the end, making it possible to squeeze the 0x8000-byte usbh_data region into that space. Similarly, the lcd_data region has a subregion disabled, allowing fpu_data to fit. The n/8 fractions make it easy to see how many subregions are disabled at the end.
To use the optimized ordering, simply copy the lines above the table and paste them into the .icf, replacing the old lines.
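The arithmetic behind the two SRAM tables can be reproduced in a few lines of Python. This is only a sketch, not MpuPacker's code: it places each block at the next address aligned to its region size, lets it occupy n/8 of that size, and sums the gaps. The function name and tuple layout are choices made for this example; the block sizes and fractions are copied from the tables above, with the absent ('-') blocks left out.

```python
def layout(base, blocks):
    """Place blocks in order; return ([(name, start, end)], total_gap).

    Each block is (name, region_size, enabled_eighths).
    """
    rows, addr, total_gap = [], base, 0
    for name, size, eighths in blocks:
        start = (addr + size - 1) & ~(size - 1)  # align up to the region size
        total_gap += start - addr                # bytes skipped to reach alignment
        end = start + size * eighths // 8        # only n/8 of the region is used
        rows.append((name, start, end - 1))
        addr = end
    return rows, total_gap


UNOPTIMIZED = [
    ("sys_data", 0x20000, 8), ("usbh_data", 0x8000, 6), ("usbhdp_data", 0x2000, 8),
    ("fsdd_data", 0x20000, 6), ("lcd_data", 0x200, 7), ("fpu_data", 0x20, 8),
]
OPTIMIZED = [
    ("sys_data", 0x20000, 8), ("fsdd_data", 0x20000, 6), ("usbh_data", 0x8000, 6),
    ("usbhdp_data", 0x2000, 8), ("lcd_data", 0x200, 7), ("fpu_data", 0x20, 8),
]

for label, order in (("unoptimized", UNOPTIMIZED), ("optimized", OPTIMIZED)):
    rows, gap = layout(0x20000000, order)
    print(label, "total gaps:", hex(gap))
    for name, start, end in rows:
        print(f"  {name:12} {start:08x} {end:08x}")
```

The start and end addresses it prints can be compared line by line with the Start and End columns in the tables.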
Here is an optimized ordering for the Flash:
Details for Region rom_block    ARMv7 MPU Alignment
Block sys_code     Size 20000  8/8-size 20000  Start 08000000  End 0801ffff
Block fsdd_code    Size 20000  5/8-size 14000  Start 08020000  End 08033fff *
Block lcd_code     Size  4000  5/8-size  2800  Start 08034000  End 080367ff *
Block fpu_code     Size   800  8/8-size   800  Start 08036800  End 08036fff
Block usbhdp_code  Size  1000  7/8-size   e00  Start 08037000  End 08037dff *
(gap)              Size   200                  Start 08037e00  End 08037fff *
Block opcon_code   Size   400  8/8-size   400  Start 08038000  End 080383ff
Block led_code     Size   400  7/8-size   380  Start 08038400  End 0803877f *
(gap)              Size  7880                  Start 08038780  End 0803ffff *
Block usbh_code    Size 10000  5/8-size  a000  Start 08040000  End 08049fff *
Block nslo_code    Size -
Block ns_code      Size -
Block nsdp_code    Size -
Block usbddp_code  Size -
Block usbd_code    Size -
Total Gaps         Size  7a80
End Free           Size 918c6
Study the actual sizes and starting addresses on the top lines to see how blocks were packed into the areas made available by disabled subregions. I’m sure you can appreciate that manually optimizing this would be a pain, and as your system is developed, sizes will change, requiring it to be done repeatedly. Clearly it’s a big help to have a tool instantly figure this out, and that’s why we developed it.
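To get a feel for the search involved, here is a brute-force sketch in Python that tries every ordering of the Flash blocks listed above and keeps the one with the smallest total gap. How MpuPacker actually searches is not described here, so this should not be read as its algorithm; exhaustive search is only practical for a handful of blocks (8 blocks already means 40320 orderings). The names total_gap and best_order are made up for this example.

```python
from itertools import permutations


def total_gap(base, blocks):
    """Sum of alignment gaps when blocks (name, region_size, eighths) are placed in order."""
    addr, gap = base, 0
    for _name, size, eighths in blocks:
        start = -(-addr // size) * size      # round addr up to a multiple of size
        gap += start - addr
        addr = start + size * eighths // 8   # effective size with trailing subregions off
    return gap


def best_order(base, blocks):
    """Exhaustively search all orderings for the smallest total gap."""
    return min(permutations(blocks), key=lambda order: total_gap(base, order))


# Present blocks from the rom_block table, in the order shown there.
ROM_BLOCKS = [
    ("sys_code", 0x20000, 8), ("fsdd_code", 0x20000, 5), ("lcd_code", 0x4000, 5),
    ("fpu_code", 0x800, 8), ("usbhdp_code", 0x1000, 7), ("opcon_code", 0x400, 8),
    ("led_code", 0x400, 7), ("usbh_code", 0x10000, 5),
]

base = 0x08000000
print("table order gaps:", hex(total_gap(base, ROM_BLOCKS)))
best = best_order(base, ROM_BLOCKS)
print("best found:", hex(total_gap(base, best)), [name for name, _, _ in best])
```

Because the sum of the effective block sizes is fixed, minimizing the total gap is the same as minimizing the end address of the last block.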
Block Tails
Before worrying about region ordering, the first thing to do is to get code and data blocks sized properly for the actual amount of code and data. The coarse adjustment is the region size, and the fine adjustment is the disabled subregions at the end. Here is an example of what MpuPacker displays for the code regions:
rom_block
                      addr        tail    subreg  opt
                      ----        ----    ------  ---
ucom_code    const    0x0800447a  0x1b86  0x1000  S
sys_code     uninit   0x08016e60  0x91a0  0x4000  S
fs_code      const    0x0802b994  0x66c   0x2000
fsdp_code    const    0x0802c774  0xc8c   0x400   R
fsdd_code    const    0x0802d400  0x6c00  0x4000  R
usbhdp_code  const    0x08034cc8  0x138   0x200
fpu_code     uninit   0x08035000  0x800   0x100   -
opcon_code   const    0x08035900  0x300   0x80    R
led_code     const    0x08035d80  0x200   0x80    R
lcd_code     const    0x0803a53c  0x2c4   0x800
usbh_code    const    0x0804841c  0x1be4  0x2000
----------
Total: 0x158fe (88318)
The tails are the wasted memory, and in this case the Total shows 88318 bytes. For example, if a region size is 0x400 but only 0x1F0 is used, you can divide the region size by 2 to get 0x200, and 0x1F0 still fits. Alternatively, if the number of bytes used is 0x370, a subregion can be disabled to make the effective region size 0x380. (Subregion size is 0x400 / 8 = 0x80.)
The opt column indicates the recommended optimization for you to make. R means the region size is too big and can be reduced by a factor of two or more. In the .icf, divide that region size by 2 (and also change the n/8 to 8/8 on that line). S means that one or more additional subregions may be disabled. Do this by reducing n in the n/8 fraction for that region in the .icf file. Handle each of the lines and then run MpuPacker again to see the new results. If an R didn’t go away, divide that region by 2 again. If an S didn’t, reduce the n/8 again. Repeat until the opt column is empty. Then the regions are tight, and you can look at the Region Ordering section and copy the optimized definitions to the .icf.
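The R and S rules can be expressed as a small Python function. This is a sketch inferred from the description above, not MpuPacker's actual logic: it reports R when the used bytes would fit in half the region, otherwise S when the tail is at least one subregion long, otherwise nothing. The function name and argument order are assumptions made for the example, and the '-' marker seen in the table is not modeled.

```python
def recommend(region_size, enabled_eighths, used):
    """Return (tail, subregion_size, opt) for one region.

    region_size     -- full power-of-two region size from the .icf
    enabled_eighths -- n in the n/8 multiplier
    used            -- bytes actually occupied by code or data
    """
    subregion = region_size // 8
    tail = region_size * enabled_eighths // 8 - used  # wasted bytes at the end
    if used <= region_size // 2:
        opt = "R"   # fits in half the region: divide the size by 2, return to 8/8
    elif tail >= subregion:
        opt = "S"   # at least one more whole subregion is unused: reduce n
    else:
        opt = ""    # region is tight
    return tail, subregion, opt


# The 0x400 examples from the text:
print(recommend(0x400, 8, 0x1F0))  # 0x1F0 fits in 0x200
print(recommend(0x400, 8, 0x370))  # one subregion can be disabled
print(recommend(0x400, 7, 0x370))  # after disabling it, the region is tight
```

Running it after each .icf change mirrors the iterative procedure: keep adjusting until every region returns an empty opt.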
In the table above, the ucom_code line shows that the tail (waste) is bigger than the subregion size, so clearly a subregion can be disabled, hence the S in the opt column. The .icf has these lines:
define exported symbol ucomcsz = 0x8000;
…
define block ucom_code with fixed order, size = ucomcsz*6/8, alignment = ucomcsz
Dividing the size by 2 would mean changing the ucomcsz value from 0x8000 to 0x4000. Disabling a subregion would mean changing 6/8 to 5/8. Note that you will never reduce 5/8 to 4/8, since you would instead divide the block size by 2 and return to 8/8 for the multiplier. If halving were possible, MpuPacker would indicate R on that line.
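The end point of this iteration can also be computed directly. The Python sketch below finds the smallest power-of-two region that holds a given number of used bytes and then the smallest n/8 multiplier that still covers them. It is an illustration only, and the function name is hypothetical. The value 0x447a is the used size implied by the ucom_code line if that block starts at 0x08000000, which is an assumption. The 256-byte limit on subregions is an ARMv7-M architecture rule.

```python
def tightest_fit(used):
    """Return (region_size, enabled_eighths) for the tightest v7 region holding `used` bytes."""
    size = 32                      # smallest v7 region
    while size < used:
        size *= 2                  # smallest power of two that holds the data
    eighths = 8
    if size >= 256:                # subregions exist only for regions >= 256 bytes
        while size * (eighths - 1) // 8 >= used:
            eighths -= 1           # disable one more trailing subregion
    return size, eighths


size, eighths = tightest_fit(0x447A)
print(hex(size), f"{eighths}/8")
```

Because the region size chosen is already the smallest power of two that fits, half of it is always too small, so the multiplier never drops to 4/8.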
Internal Gaps
MpuPacker also reports information about internal gaps within a region, which can happen due to nesting of a block that specifies an alignment. Nesting like this is done because most MPUs have only 8 slots. If a task needs access to regions A and B, but it has only 1 available MPU slot, B could be nested in A (assuming it is ok for other tasks needing A to also be able to access B). A task needing only B would have only region B in its MPU slot, and then it can’t access the rest of A. It is necessary to size and align B according to the usual rule, so if any other blocks or sections precede it in A, there could be a gap between them and the start of B. The solution is usually to move B to the start of the list for A. This is another issue to fix before getting the new region ordering.
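The size of such an internal gap is simple to estimate. The Python sketch below uses made-up numbers and a hypothetical function name, and it assumes the parent block A starts on a multiple of B's region size, which holds whenever A is a region at least as large as B.

```python
def internal_gap(preceding_bytes, nested_region_size):
    """Bytes skipped inside the parent so the nested block starts on its own alignment."""
    return -preceding_bytes % nested_region_size


# Hypothetical case: 0x1234 bytes of other sections come before a 0x1000-byte block B.
print(hex(internal_gap(0x1234, 0x1000)))  # B listed after other content in A
print(hex(internal_gap(0, 0x1000)))       # B moved to the start of A
```

With B first in the list, nothing precedes it and the gap disappears, which is why moving B to the start of A is usually the fix.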
Conclusion
The size and alignment restrictions of the v7 MPU are unfortunate, but MpuPacker is a big help in dealing with them. The v8 MPU does not have these restrictions: its regions need only be a multiple of 32 bytes in size and 32-byte aligned, so MpuPacker is not needed for it. However, the majority of Cortex-M based systems in existence are v7, and an important goal of SecureSMX is to allow OEMs to improve the security of existing systems. We have put a lot of work into supporting v7, not just with MpuPacker, but by handling similar issues in the code.