    (gdb) bt
    #0  _hang () at startup/startup.s:136
    #1  <signal handler called>
    #2  0x0001c918 in prvPortStartFirstTask () at portable/GCC/ARM_CM4F/port.c:303
    #3  0x0001ca02 in xPortStartScheduler () at portable/GCC/ARM_CM4F/port.c:395
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)
How to approach such issues?
Let's make some assumptions before going further:
- The processor is a 32-bit ARM.
- You're using the GCC toolchain.
- The OS is FreeRTOS.
- The stack is configured to grow downwards (the standard way).
- You don't have a memory-dumping mechanism or true post-mortem analysis tools/scripts.
- You can catch the exception using GDB.
With these assumptions in mind, this is what I do to find the root cause:
1. Connect through GDB and wait until the bug reproduces.
2. When GDB catches the exception, print current task name:
    (gdb) p pxCurrentTCB->pcTaskName
    $6 = "Bug task\000\000"
3. Find in the source code what stack size was allocated for this task. Example:
    #define BUG_TASK_SIZE 256
    xTaskCreate(bug_task, "BUG", BUG_TASK_SIZE, NULL, 1, NULL);
FreeRTOS takes the stack depth in words, not bytes, so to find out how many bytes are reserved for this task's stack, multiply the "BUG_TASK_SIZE" value by the word size. A 32-bit ARM has a 4-byte word, so the actual stack size is 256 * 4 = 1024 bytes (1 kB).
4. Find the lowest possible stack address:
    (gdb) p pxCurrentTCB->pxStack
    $7 = (StackType_t *) 0x20010400
5. Add the stack size to get stack range:
0x20010400 + 0x400 (1kB) = 0x20010800;
The stack of this task is between 0x20010400 and 0x20010800.
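The arithmetic in steps 3 to 5 can be sketched as a small host-side helper. This is only an illustration: the function name `stack_range` is made up here, and the 4-byte word size is the assumption stated above for a 32-bit ARM.

```python
WORD_SIZE = 4  # bytes per stack word on a 32-bit ARM (assumption from the text)


def stack_range(px_stack, depth_words, word_size=WORD_SIZE):
    """Return (lowest, highest) address of a task's stack.

    px_stack    -- value of pxCurrentTCB->pxStack (lowest stack address)
    depth_words -- stack depth passed to xTaskCreate, counted in words
    """
    low = px_stack
    high = px_stack + depth_words * word_size
    return low, high


low, high = stack_range(0x20010400, 256)
print(f"stack: {low:#010x} .. {high:#010x} ({high - low} bytes)")
```

With the values from this session it prints the same 0x20010400 .. 0x20010800 range as calculated by hand.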
6. Read current top of the stack:
    (gdb) p pxCurrentTCB->pxTopOfStack
    $8 = (volatile StackType_t *) 0x2001073c
So far, we get:
    0x20010800 <- beginning of stack
    |
    |
    0x2001073c <- current top of stack
    ...
    0x20010400 <- end of stack
7. Calculate how many bytes of the stack were used:
0x20010800 - 0x2001073c = 0xC4 (196 bytes which are 49 words)
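As a cross-check of the subtraction, here is a minimal sketch that derives the used byte and word counts and builds the matching GDB dump command. The helper names `stack_usage` and `dump_command` are hypothetical, and a downward-growing stack with 4-byte words is assumed, as above.

```python
def stack_usage(stack_high, top_of_stack, word_size=4):
    """Return (used_bytes, used_words) for a stack that grows downwards."""
    used_bytes = stack_high - top_of_stack
    if used_bytes < 0 or used_bytes % word_size != 0:
        raise ValueError("top of stack is above the stack start or misaligned")
    return used_bytes, used_bytes // word_size


def dump_command(stack_high, top_of_stack, word_size=4):
    """Build the GDB command that dumps every used word, starting at the top of stack."""
    _, used_words = stack_usage(stack_high, top_of_stack, word_size)
    return f"x/{used_words}wx {top_of_stack:#x}"


print(stack_usage(0x20010800, 0x2001073C))   # (196, 49)
print(dump_command(0x20010800, 0x2001073C))  # x/49wx 0x2001073c
```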
8. Dump the stack:
    (gdb) x/49wx 0x2001073c
    0x2001073c: 0x2000e41c 0x200107a8 0x2000e3f8 0x00000000
    0x2001074c: 0x00000000 0x20010758 0x20010758 0x00000000
    0x2001075c: 0x00000001 0x200107a8 0x00000000 0x00000000
    0x2001076c: 0x00000000 0x20015f90 0x2001626c 0x0102fea9
    0x2001077c: 0x20015ed4 0x0000000a 0x20010798 0x20010798
    0x2001078c: 0x00021869 0x00020cf8 0x01000000 0x0102fea9
    0x2001079c: 0x00000000 0x00000000 0x00000000 0x200107b0
    0x200107ac: 0x00020e89 0x200107b8 0x00020f99 0x00020e71
    0x200107bc: 0x00020e81 0x200107c8 0x00020bf5 0x00020e71
    0x200107cc: 0x00020e81 0x00000000 0x00000000 0x00020f71
    0x200107dc: 0x02000000 0x200107e8 0x0002101b 0x00020e71
    0x200107ec: 0x00020e81 0x00000000 0x0001c8a5 0x00000000
    0x200107fc: 0x00000000
9. Pass it to arm-none-eabi-addr2line:
You can pass the values one by one, write a small script, or format them as a single column and paste them:
    arm-none-eabi-addr2line -e <path-to-elf>
    <paste stack data>
    0x2000e41c
    0x200107a8
    0x2000e3f8
    0x00000000
    0x00000000
    0x20010758
    0x20010758
    0x00000000
    0x00000001
    0x200107a8
    0x00000000
    0x00000000
    0x00000000
    0x20015f90
    0x2001626c
    0x0102fea9
    0x20015ed4
    0x0000000a
    0x20010798
    0x20010798
    0x00021869
    0x00020cf8
    0x01000000
    0x0102fea9
    0x00000000
    0x00000000
    0x00000000
    0x200107b0
    0x00020e89
    0x200107b8
    0x00020f99
    0x00020e71
    0x00020e81
    0x200107c8
    0x00020bf5
    0x00020e71
    0x00020e81
    0x00000000
    0x00000000
    0x00020f71
    0x02000000
    0x200107e8
    0x0002101b
    0x00020e71
    0x00020e81
    0x00000000
    0x0001c8a5
    0x00000000
    0x00000000
The addr2line tool will try to parse each value as a code address. Some of those values are data, so you'll get garbage that you can ignore. Values that match one of your source files will be printed with a specific line number. All in all, the output will look something like this:
    heap_4.c:?
    heap_4.c:?
    :?
    :?
    heap_4.c:?
    heap_4.c:?
    :?
    :?
    heap_4.c:?
    :?
    :?
    :?
    zzzz_sd.c:?
    zzzz_sd.c:?
    ??:0
    main.c:?
    :?
    heap_4.c:?
    heap_4.c:?
    /home/yyy/devel/xxx/app/zz/src/zzzz_sd.c:331 <<--- CHECK THIS LINE
    heap_4.c:?
    /home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:395
    /home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:324
    /home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:329
    heap_4.c:?
    /home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:227
    /home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:324
    /home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:329
    :?
    :?
    /home/yyy/devel/xxx/app/zzz/src/zzz_sd.c:387
    ??:0
    heap_4.c:?
    /home/yyy/devel/xxx/app/zzz/src/zzz_sd.c:422
    /home/yyy/devel/xxx/app/zzz/src/zzz_sd.c:324
    /home/kowyyyalmic/devel/xxx/app/zzz/src/zzz_sd.c:329
    :?
    /home/kowalmic/devel/xxx/portable/GCC/ARM_CM4F/port.c:269
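Turning the GDB dump into a one-column list is easy to script. The sketch below is one possible way to do it, not the only one: it strips the row addresses from `x/<N>wx` output and builds an addr2line argument list. The function names and the ELF path `firmware.elf` are placeholders chosen for this example.

```python
def parse_gdb_words(dump):
    """Extract the data words from GDB `x/<N>wx` output, dropping the row addresses."""
    words = []
    for line in dump.splitlines():
        # A dump row looks like "0x2001073c: 0x2000e41c 0x200107a8 ..."
        _, colon, data = line.partition(":")
        if not colon:
            continue  # not a dump row (blank line, prompt, ...)
        words.extend(int(token, 16) for token in data.split())
    return words


def addr2line_command(elf_path, words):
    """Build an argument list: addr2line accepts addresses after `-e <elf>`."""
    return ["arm-none-eabi-addr2line", "-e", elf_path] + [f"{w:#010x}" for w in words]


dump = """\
0x2001078c: 0x00021869 0x00020cf8 0x01000000 0x0102fea9
0x200107fc: 0x00000000
"""
words = parse_gdb_words(dump)
print("\n".join(f"{w:#010x}" for w in words))    # one address per line, ready to paste
print(addr2line_command("firmware.elf", words))  # placeholder ELF path
```

The returned list can be handed to `subprocess.run` on a machine where the toolchain is installed; the sketch deliberately stops short of running the tool.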
10. Analyze:
At this point there are no strict rules for how to proceed. Check the source lines resolved by the addr2line tool (note that the addresses are LR values, not PC values); they should point you as close as possible to the offending function. There is a good chance that the top-most resolved source line comes right after a misused memcpy/memset.
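Since the interesting values are saved LRs, the list can be narrowed down before reading it. On Thumb-only cores such as the Cortex-M4F seen in the backtrace, a return address stored in LR has bit 0 set, so odd values that fall inside the code region are the likeliest candidates. The sketch below applies that heuristic; the function name and the code region 0x00000000 to 0x00100000 are assumptions for illustration, so take the real bounds from your linker script.

```python
def thumb_return_candidates(words, code_start, code_end):
    """Keep words that look like Thumb return addresses inside the code region.

    A candidate must lie in [code_start, code_end) and have bit 0 set.
    The Thumb bit is cleared in the result to get the real instruction address.
    """
    candidates = []
    for word in words:
        if code_start <= word < code_end and (word & 1):
            candidates.append(word & ~1)
    return candidates


# Hypothetical code region; a few words taken from the dump above.
CODE_START, CODE_END = 0x00000000, 0x00100000
sample = [0x00021869, 0x00020CF8, 0x01000000, 0x0102FEA9, 0x20010798]
print([f"{a:#010x}" for a in thumb_return_candidates(sample, CODE_START, CODE_END)])
```

This is only a filter: an odd data value can still slip through, and an even value such as 0x00020cf8 is dropped even though it lies in the code region.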
For example, one of the last LRs pushed onto the stack was parsed by addr2line as:
/home/yyy/devel/xxx/app/zz/src/zzzz_sd.c:331
If you check the source line of zzzz_sd.c you can see for instance:
    328: void xxx_get_ipv4_addr_raw(char *addr)
    329: {
    330:     memcpy(addr, m_xxx_iface->ip_addr, IPV6_LENGTH);
    331: }
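The address points to the instruction right after the branch to memcpy. Now look at the memcpy call above and the bug becomes obvious: a function meant to return an IPv4 address copies IPV6_LENGTH bytes, so it can write past the end of a buffer sized for IPv4. Got it!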