XCC Compiler Bug - No Exit Condition in Assembly for uint8_t Array

vergil19
Member++
Posts: 21
Joined: Thu Jan 20, 2022 3:54 am

Post by vergil19 »

Compiler Version: 15.2.1
Operating System: Ubuntu 22.04

When compiling code that uses a `uint8_t` array, specifically `OLED_GRAM[128][8]`,
the XCC compiler generates assembly code in which the loop has no exit condition.
This leads to a runtime memory access exception, as shown by the following error message when running the program:

Code: Select all

xrun: Program received signal ET_LOAD_STORE, Memory access exception.
When using a `uint32_t` array in place of the `uint8_t` array, the compiler generates the correct assembly code with proper exit conditions, and no runtime exceptions occur.

The code and the assembly output are in the attachments.


CousinItt
Respected Member
Posts: 365
Joined: Wed May 31, 2017 6:55 pm

Post by CousinItt »

There's a common problem in XCC 14.x which I think you are seeing here. When using dual-issue instructions to access data, the data has to be aligned on a 64-bit boundary. The usual workaround is to use an alias to declare the array, for example in C:

Code: Select all

static uint64_t aligned_block[DATA_SIZE / UINT8_PER_UINT64] = {0};

static uint8_t * data_block = (uint8_t *) &aligned_block[0];
The problem might not show up as often when using a 32-bit int array, but that's just down to luck.

If using xc you may need to declare data as unsafe or use a different pointer type to use this workaround.
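For reference, a self-contained sketch of the alias workaround above. The value of DATA_SIZE and the definition of UINT8_PER_UINT64 are assumptions made for illustration (the snippet above does not define them), and data_block_is_aligned() is a hypothetical helper that only checks the resulting address.

```c
#include <stdint.h>

/* Hypothetical values for illustration; not taken from the original post. */
#define DATA_SIZE 1024u /* buffer size in bytes, a multiple of 8 */
#define UINT8_PER_UINT64 (sizeof(uint64_t) / sizeof(uint8_t))

/* Backing storage declared as uint64_t so that it gets 64-bit alignment. */
static uint64_t aligned_block[DATA_SIZE / UINT8_PER_UINT64] = {0};

/* Byte-wise view of the same storage. */
static uint8_t *data_block = (uint8_t *)&aligned_block[0];

/* Hypothetical helper: returns 1 if the byte view starts on an
   8-byte boundary, 0 otherwise. */
int data_block_is_aligned(void) {
    return ((uintptr_t)data_block % sizeof(uint64_t)) == 0;
}
```

Writes through data_block land in aligned_block, so the rest of the code can keep treating the buffer as a plain byte array.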
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

@vergil19 How did you generate the disassembled code? I'm interested because some instructions are grouped in {} for dual issue. This looks nice and improves readability.

@CousinItt I don't think it is a dual-issue problem. I think it is a compiler bug. Let me explain why.

I use "Compiler version: 14.3.3" with the xtimecomposer, but it is also reproducible with "XTC version: 15.1.4".
I used the following code to reproduce vergil19's bug:

Code: Select all

#include <stdio.h>
#include <stdint.h>

uint8_t OLED_GRAM[128][8];

void oled_Refresh_GRAM(void) {
    uint8_t i, n;

    for (i = 0; i < 8; i++) {
        for (n = 0; n < 128; n++) {
            OLED_GRAM[n][i] = 0x00;
            if (n > 127) {
                // compiler with -O3
                // n > 125 => n becomes greater than 127; sometimes trash output
                // n > 126 => ends as expected
                // n > 127 => ET_LOAD_STORE, Memory access exception
                printf(" OLED_GRAM -> i = %d n = %d \n", i, n);
            }
//            printf(" OLED_GRAM -> i = %d, n = %d \n", i, n);
        }
    }
}

int main(void) {
    printf("Hello World\n");

    oled_Refresh_GRAM();

    return 0;
}
I have compiled the code and disassembled it with:

Code: Select all

xobjdump -d -S test.xe > xcc14.txt
I have attached xcc14.txt. The important section is:

Code: Select all

int main(void) {
.label7      0x000400f4: ff 17:       nop (0r)        
             0x000400f6: 82 7f:       dualentsp (u6)  0x2
    printf("Hello World\n");
             0x000400f8: 00 f0 4d 7f: ldaw (lu6)      r11, cp[0xd]
             0x000400fc: 8c 91:       add (2rus)      r0, r11, 0x0
             0x000400fe: ff 17:       nop (0r)        
             0x00040100: 00 f0 21 d0: bl (lu10)       0x21 <puts>
             0x00040104: 00 f0 12 60: ldaw (lru6)     r0, dp[0x12]
            OLED_GRAM[n][i] = 0x00;
             0x00040108: 40 68:       ldc (ru6)       r1, 0x0 
             0x0004010a: ff 17:       nop (0r) 
.label14     0x0004010c: 11 f8 ec 8f: st8 (l3r)       r1, r0[r1]
#include <stdio.h>
#include <stdint.h>
             
uint8_t OLED_GRAM[128][8];

void oled_Refresh_GRAM(void) {
    uint8_t i, n;

    for (i = 0; i < 8; i++) {
        for (n = 0; n < 128; n++) {
             0x00040110: 80 94:       add (2rus)      r0, r0, 0x8
             0x00040112: ff 17:       nop (0r)
             0x00040114: 00 f0 03 77: bu (lu6)        -0x3 <.label14>

<_get_cmdline>:
.label6      0x00040118: 00 f0 40 77: entsp (lu6)     0x0
             0x0004011c: 24 90:       add (2rus)      r2, r1, 0x0
             0x0004011e: 10 90:       add (2rus)      r1, r0, 0x0
             0x00040120: 0d 68:       ldc (ru6)       r0, 0xd
             0x00040122: 00 f0 6f d8: ldap (lu10)     r11, 0x6f <_DoSyscall>
             0x00040126: fb 27:       bau (1r)        r11

The problem is in the line:

Code: Select all

0x00040114: 00 f0 03 77: bu (lu6)        -0x3 <.label14>
It is an unconditional jump to label14, which means it is an endless loop: each iteration increases the address in r0 by 8 until r0 is out of bounds, and then the exception happens.
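To make the effect concrete, here is a rough C model of the three instructions in that loop (st8, add, bu). It is only a sketch: the array name faulty_gram and the max_steps cap are assumptions added so that the model terminates and never writes outside the array. The emitted code has no such cap.

```c
#include <stddef.h>
#include <stdint.h>

#define ROWS 128
#define COLS 8

static uint8_t faulty_gram[ROWS][COLS];

/* Models the emitted loop: store a byte, advance the base by 8, branch back.
   max_steps is a hypothetical safety cap that exists only in this model.
   Returns how many stores would have fallen outside the array. */
size_t count_out_of_bounds_stores(size_t max_steps) {
    uint8_t *bytes = (uint8_t *)faulty_gram;
    size_t offset = 0; /* plays the role of r0, relative to the array start */
    size_t out_of_bounds = 0;

    for (size_t step = 0; step < max_steps; step++) {
        if (offset < sizeof faulty_gram) {
            bytes[offset] = 0x00; /* st8 r1, r0[r1] with r1 == 0 */
        } else {
            out_of_bounds++; /* on hardware this store eventually traps */
        }
        offset += COLS; /* add r0, r0, 0x8 */
        /* bu .label14: branch back with no exit test */
    }
    return out_of_bounds;
}
```

With a cap of 128 steps every store stays inside the 1024-byte array; every step beyond that falls outside it, which matches the exception seen at runtime.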

If you write "uint16_t i, n;" instead of "uint8_t i, n;", the code works as expected (attachment xcc14_16.txt). The important code section is:

Code: Select all

int main(void) {
.label7      0x000400f4: ff 17:       nop (0r)
             0x000400f6: 84 7f:       dualentsp (u6)  0x4 
             0x000400f8: ff 17:       nop (0r)        
             0x000400fa: 02 55:       stw (ru6)       r4, sp[0x2]
    printf("Hello World\n");
             0x000400fc: 00 f0 4d 7f: ldaw (lu6)      r11, cp[0xd]
             0x00040100: 8c 91:       add (2rus)      r0, r11, 0x0
             0x00040102: ff 17:       nop (0r)        
             0x00040104: 00 f0 29 d0: bl (lu10)       0x29 <puts>
             0x00040108: 00 68:       ldc (ru6)       r0, 0x0
             0x0004010a: ff 17:       nop (0r) 
             0x0004010c: 02 f0 40 68: ldc (lru6)      r1, 0x80
             0x00040110: 00 f0 92 60: ldaw (lru6)     r2, dp[0x12]

    oled_Refresh_GRAM();
             0x00040114: 30 90:       add (2rus)      r3, r0, 0x0
             0x00040116: ff 17:       nop (0r)        
.label15     0x00040118: b8 90:       add (2rus)      r11, r2, 0x0
             0x0004011a: 44 90:       add (2rus)      r4, r1, 0x0
            OLED_GRAM[n][i] = 0x00;
.label14     0x0004011c: 8f f9 ec 8f: st8 (l3r)       r0, r11[r3]
        for (n = 0; n < 128; n++) {
             0x00040120: 01 99:       sub (2rus)      r4, r4, 0x1
             0x00040122: bc 96:       add (2rus)      r11, r11, 0x8
             0x00040124: 00 f0 03 75: bt (lru6)       r4, -0x3 <.label14>
#include <stdio.h>
#include <stdint.h>

uint8_t OLED_GRAM[128][8];

void oled_Refresh_GRAM(void) {
    uint16_t i, n;

    for (i = 0; i < 8; i++) {
             0x00040128: bd 90:       add (2rus)      r11, r3, 0x1
             0x0004012a: ff 17:       nop (0r)
             0x0004012c: 8f b2:       eq (2rus)       r4, r3, 0x7
             0x0004012e: bc 91:       add (2rus)      r3, r11, 0x0
             0x00040130: 00 f0 07 7d: bf (lru6)       r4, -0x7 <.label15>
             0x00040134: 00 68:       ldc (ru6)       r0, 0x0
             0x00040136: 02 5d:       ldw (ru6)       r4, sp[0x2]
             0x00040138: ff 17:       nop (0r)
             0x0004013a: c4 77:       retsp (u6)      0x4
Here, the line

Code: Select all

 0x00040124: 00 f0 03 75: bt (lru6)       r4, -0x3 <.label14>

contains a conditional jump to label14. That is what we want. "r4" is initialized with 0x80 = 128 in the line

Code: Select all

0x0004011a: 44 90:       add (2rus)      r4, r1, 0x0

and decreased for each loop iteration by the line:

Code: Select all

  0x00040120: 01 99:       sub (2rus)      r4, r4, 0x1
As long as "r4" is true, that is, nonzero, the pc jumps to label14; otherwise execution continues with the next instruction.
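Read back into C, the correct uint16_t version behaves roughly like the count-down loop below. This is a hand-written model of the disassembly, not compiler output; the names gram16 and clear_gram_countdown are made up for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define ROWS 128
#define COLS 8

static uint8_t gram16[ROWS][COLS];

/* Models the loop structure of the uint16_t build.
   Returns the number of byte stores performed. */
unsigned clear_gram_countdown(void) {
    uint8_t *bytes = (uint8_t *)gram16;
    unsigned stores = 0;

    for (unsigned col = 0; col < COLS; col++) { /* r3: outer index i */
        size_t offset = 0;                      /* r11 = r2 (array base) */
        unsigned remaining = ROWS;              /* r4 = r1 = 0x80 */
        do {
            bytes[offset + col] = 0x00;         /* st8 r0, r11[r3] */
            stores++;
            remaining--;                        /* sub r4, r4, 0x1 */
            offset += COLS;                     /* add r11, r11, 0x8 */
        } while (remaining != 0);               /* bt r4, .label14 */
    }
    return stores;
}
```

The inner loop ends when the counter reaches zero, so it performs exactly 128 stores per column and 1024 in total.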
fabriceo
XCore Addict
Posts: 186
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

dsteinwe wrote: Thu Dec 28, 2023 2:14 pm @vergil19 How do you have generated the disassembled code? I'm interested in, because some instructions are coupled in a {} for dual issue. This looks nice and improves readability.
Hi,
you need to find the corresponding .s file generated by the compiler. To do that easily, just add a comment like this somewhere in the original file, for example before or inside your for() loop:
asm volatile ("#mycomment:");
Then select #mycomment: and use the right-click menu to search the workspace.
This will show a tree of all files in .build containing this text, where you will find the .s file. Double-click it and the editor will open at the line of your comment.
cheers
Ross
XCore Expert
Posts: 968
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

dsteinwe wrote: Thu Dec 28, 2023 2:14 pm How do you have generated the disassembled code? I'm interested in, because some instructions are coupled in a {} for dual issue. This looks nice and improves readability.
Use -save-temps to save intermediate files (i.e. .s files)

-fcomment-asm can also be handy
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

Hello fabriceo and ross!

Thank you very much for your tips. I will try them out as soon as I have cured the virus. They may also help with my problem I have posted in the last few days.

I wish you a happy new year!
fabriceo
XCore Addict
Posts: 186
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

Hello
What an amazing bug... I just tested it out of curiosity, and it happens at -O1, -O2 and -O3.
It is scary.

It has to do with the compiler removing unused code and data.
If you print the content of the OLED table within the for loop and change (n>127) to just (n), then it produces good code.

Optimization always gives us surprises, but here the infinite loop "bu .label" is hard to justify.
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

The problem also occurs with one-dimensional arrays. Furthermore, it can also be reproduced with while loops. Here is my test code:

Code: Select all

#include <stdio.h>
#include <stdint.h>

uint32_t OLED_GRAM[128];

void oled_Refresh_GRAM(void) {
    uint8_t n = 0;
    asm volatile ("#WHILE_LOOP");
    while (n < 128) {
            //OLED_GRAM[n][i] = 0x00;
            OLED_GRAM[n] = 0x00;
            if (n > 127) {
                // compiler with -O3
                // n > 125 => n becomes greater than 127; sometimes trash output
                // n > 126 => ends as expected
                // n > 127 => ET_LOAD_STORE, Memory access exception
                //printf(" OLED_GRAM -> i = %d n = %d \n", i, n);
                printf(" OLED_GRAM -> n = %d \n", n);
            }
            asm volatile ("#WHILE_LOOP_INC");
            n++;
        }
}


int main(void) {
    printf("Hello World\n");

    oled_Refresh_GRAM();

    return 0;
}
Ross
XCore Expert
Posts: 968
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

I've raised the issue internally and it has been reproduced. Initial reports from our compiler team:

It affects the XC compiler only, at optimisation levels > 0

It doesn't affect the C compiler at any optimisation level

The type of the array isn't significant but the type of the counter variable is.

A smaller example to replicate:

Code: Select all

extern unsigned arr[128];

void f(void) {
    unsigned char n;

    for (n = 0; n < 128; n++) {
      arr[n] = 0x00;
    }
}
The currently suggested workaround is to use int for array indexing - it's likely more efficient anyway.
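Applied to the reproducer from earlier in the thread, the suggested workaround might look like the sketch below. The OLED_ROWS and OLED_COLS macros and the oled_count_nonzero() helper are assumptions added for illustration. This has not been verified on an affected toolchain; it only shows the change in counter type.

```c
#include <stdint.h>

#define OLED_ROWS 128
#define OLED_COLS 8

uint8_t OLED_GRAM[OLED_ROWS][OLED_COLS];

/* Same loop as the reproducer, but with int counters instead of uint8_t. */
void oled_Refresh_GRAM(void) {
    int i, n;

    for (i = 0; i < OLED_COLS; i++) {
        for (n = 0; n < OLED_ROWS; n++) {
            OLED_GRAM[n][i] = 0x00;
        }
    }
}

/* Hypothetical helper: counts the bytes that are still nonzero. */
int oled_count_nonzero(void) {
    int count = 0;

    for (int n = 0; n < OLED_ROWS; n++) {
        for (int i = 0; i < OLED_COLS; i++) {
            if (OLED_GRAM[n][i] != 0) {
                count++;
            }
        }
    }
    return count;
}
```

The array keeps its uint8_t element type; only the loop counters change, which fits the report that the counter type is what matters.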