Rare and sporadic ET_ILLEGAL_RESOURCE exceptions

magnus
Member
Posts: 13
Joined: Tue Aug 06, 2019 8:40 pm


Post by magnus »

Hi,

I'm having an issue where we occasionally get an ET_ILLEGAL_RESOURCE exception. Unfortunately we have not been able to reproduce this reliably.

CPU: XE232-1024-FB374

We usually stress the device with network traffic. Notably, high throughput isn't the problem; timing seems to be what triggers it.
Pushing 2000 packets per second through the whole stack for long periods works fine, but sending just a handful(?) of identical packets with "good timing" seems to trigger it.

With a debugger attached we most commonly get exactly the same result (though we have also seen a different one nearby):

```
xrun: Program received signal ET_ILLEGAL_RESOURCE, Resource exception.
[Switching to tile[0] core[5]]
0x0004ee92 in _i.xtcp_if._chan.get_ipconfig ()
```
which is on the chkct instruction:

```
<_i.xtcp_if._chan.get_ipconfig>:
    0x0004ee80: 00 f0 44 77: entsp (lu6)     0x4
    0x0004ee84: 02 55:       stw (ru6)       r4, sp[0x2]
    0x0004ee86: 02 87:       getr (rus)      r4, 0x2
    0x0004ee88: 90 17:       setd (r2r)      res[r4], r0
    0x0004ee8a: 0d 68:       ldc (ru6)       r0, 0xd
    0x0004ee8c: c0 10:       add (3r)        r0, r4, r0
    0x0004ee8e: 80 af:       out (r2r)       res[r4], r0
    0x0004ee90: 12 4f:       outct (rus)     res[r4], 0x2
    0x0004ee92: 11 cf:       chkct (rus)     res[r4], 0x1      <-------
    0x0004ee94: 41 54:       stw (ru6)       r1, sp[0x1]
    0x0004ee96: 12 4f:       outct (rus)     res[r4], 0x2
    0x0004ee98: 41 64:       ldaw (ru6)      r1, sp[0x1]
    0x0004ee9a: c0 90:       add (2rus)      r0, r4, 0x0
    0x0004ee9c: 21 f0 9c d1: bl (lu10)       0x859c <__interface_client_call>
    0x0004eea0: 11 cf:       chkct (rus)     res[r4], 0x1
    0x0004eea2: e4 17:       freer (1r)      res[r4]
    0x0004eea4: 02 5d:       ldw (ru6)       r4, sp[0x2]
    0x0004eea6: c4 77:       retsp (u6)      0x4
```
Possible reasons according to https://www.xmos.com/download/xCORE-200 ... )(1.1).pdf (for chkct r, s; here r is res[r4] and s is 0x1):
ET_ILLEGAL_RESOURCE: r is not pointing to a channel resource, or the resource is not in use.
ET_ILLEGAL_RESOURCE: r contains a data token.
ET_ILLEGAL_RESOURCE: r contains a control token different from s.

I'm not quite sure how to verify the channel resource, but I can't see any corruption of the interface variables etc.
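To reason about which of these three conditions the failing chkct res[r4], 0x1 could be hitting, it may help to write them down as a tiny software model. This is a hypothetical C sketch for reasoning only, not XMOS code: chanend_model_t, push_token and model_chkct are made-up names, and the token values CT_END = 0x1 and CT_PAUSE = 0x2 are assumed to correspond to the immediates in the disassembly above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one xCORE channel end, for reasoning only.
   Assumed token values, matching the immediates seen in the disassembly
   (outct ..., 0x2 and chkct ..., 0x1). */
enum { CT_END = 0x1, CT_PAUSE = 0x2 };
#define BUF_SIZE 8

typedef struct {
    bool is_control;        /* control token vs. data token */
    unsigned char value;
} token_t;

typedef struct {
    bool in_use;            /* allocated (getr) and not yet freed (freer) */
    bool is_chanend;        /* resource type is a channel end */
    token_t buf[BUF_SIZE];  /* pending input tokens, oldest first */
    size_t head, count;
} chanend_model_t;

typedef enum { CHK_OK, CHK_WOULD_BLOCK, CHK_ILLEGAL_RESOURCE } chk_result_t;

/* Append an incoming token, as if it had arrived from the other side. */
void push_token(chanend_model_t *c, bool is_control, unsigned char value) {
    assert(c->count < BUF_SIZE);
    c->buf[(c->head + c->count) % BUF_SIZE] = (token_t){ is_control, value };
    c->count++;
}

/* Models the three listed ET_ILLEGAL_RESOURCE conditions for chkct r, s. */
chk_result_t model_chkct(chanend_model_t *r, unsigned char s) {
    if (r == NULL || !r->is_chanend || !r->in_use)
        return CHK_ILLEGAL_RESOURCE;      /* not a chanend, or not in use */
    if (r->count == 0)
        return CHK_WOULD_BLOCK;           /* hardware would pause the thread */
    token_t t = r->buf[r->head];
    if (!t.is_control || t.value != s)
        return CHK_ILLEGAL_RESOURCE;      /* data token, or a different control token */
    r->head = (r->head + 1) % BUF_SIZE;   /* matching token is consumed */
    r->count--;
    return CHK_OK;
}
```

For example, after push_token(&c, false, 0x42), a call to model_chkct(&c, CT_END) returns CHK_ILLEGAL_RESOURCE in this model (a data token arrived where the END control token was expected), whereas an empty buffer only yields CHK_WOULD_BLOCK. In other words, in this model a late reply on its own would just stall the thread; a trap needs either a wrong token or an invalid/unused resource.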

Our (calling) code:

```
xtcp_ipconfig_t ipconfig;
i_xtcp.get_ipconfig(ipconfig);
```
Other side (xtcp lwip): https://github.com/xmos/lib_xtcp/blob/b ... 69-L508C69

```
case i_xtcp[unsigned i].get_ipconfig(xtcp_ipconfig_t &ipconfig):
  memcpy(&ipconfig.ipaddr, &my_netif.ip_addr, sizeof(xtcp_ipaddr_t));
  memcpy(&ipconfig.netmask, &my_netif.netmask, sizeof(xtcp_ipaddr_t));
  memcpy(&ipconfig.gateway, &my_netif.gw, sizeof(xtcp_ipaddr_t));
  break;
```
About as simple as it gets...

I'd like to better understand what could cause this. I've verified that the interface variables etc. haven't been overwritten, but I'm unsure about the actual structures used under the hood.

We do get some MAYBEs in the compiler's constraint check, for example:

```
Constraint check for tile[0]:
  Cores available:            8,   used:          7+.  MAYBE
  Timers available:          10,   used:          7+.  MAYBE
  Chanends available:        32,   used:         29+.  MAYBE

Constraint check for tile[1]:
  Cores available:            8,   used:          8+.  MAYBE
  Timers available:          10,   used:          8+.  MAYBE
  Chanends available:        32,   used:         32+.  MAYBE

Constraint check for tile[2]:
  Cores available:            8,   used:          8+.  MAYBE
  Timers available:          10,   used:          8+.  MAYBE
  Chanends available:        32,   used:         26+.  MAYBE

  [...]
```
Is this anything to be concerned about? Is there any way to verify the actual resource usage? I've assumed it is completely unrelated, but I realize I don't actually know...
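One rough way to read the report is to treat a count with a "+" suffix as a lower bound (an assumption about what the "+" means) and look at the remaining headroom per resource. The following is a hypothetical C helper, not an XMOS tool: parse_constraint, headroom and at_risk are illustrative names, and the line format is copied from the report above, so other tool versions may print it differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the constraint-check report, e.g.
   "  Chanends available:        32,   used:         32+.  MAYBE" */
typedef struct {
    char name[32];
    int available;
    int used;             /* a lower bound when is_lower_bound is set */
    bool is_lower_bound;  /* assumed meaning of the "+" suffix */
    bool maybe;           /* line ends in MAYBE */
} constraint_t;

/* Returns false for lines that are not resource lines (headers, "[...]"). */
bool parse_constraint(const char *line, constraint_t *out) {
    assert(line != NULL && out != NULL);
    char suffix = 0;
    int n = sscanf(line, " %31[A-Za-z] available: %d, used: %d%c",
                   out->name, &out->available, &out->used, &suffix);
    if (n < 3)
        return false;
    out->is_lower_bound = (n == 4 && suffix == '+');
    out->maybe = strstr(line, "MAYBE") != NULL;
    return true;
}

/* Free resources left; only a best case when used is a lower bound. */
int headroom(const constraint_t *c) { return c->available - c->used; }

/* With a lower bound, zero headroom means the real usage may exceed the limit. */
bool at_risk(const constraint_t *c) {
    return c->is_lower_bound ? headroom(c) <= 0 : headroom(c) < 0;
}
```

Fed the tile[1] line "Chanends available: 32, used: 32+. MAYBE", this reports zero headroom with a possibly higher real count, so that is the line that deserves the closest look; tile[0]'s 29+ leaves at most three chanends free.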

We usually compile with Community 14.3.3, but have verified that the issue persists with 14.4.1 as well as XTC 15.2.1 (although code size grew considerably, so we had to trim things down).

Any pointers or thoughts?


akp
XCore Expert
Posts: 579
Joined: Thu Nov 26, 2015 11:47 pm

Post by akp »

It looks like you have a large application. I suggest you pare it back to the minimum necessary to test xtcp and then add features back in until it breaks. With MAYBEs, I would have thought you'd be more likely to fail on the getr instruction if it can't allocate a resource.

I can't recall why you get MAYBEs in constraint checks. I had them for a while and then changed my code to remove them; I think it was most likely due to some assembly code I was using.
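The point about getr can be illustrated with a fixed-size pool of channel ends. This is a hypothetical C model, not the real allocator: pool_getr and pool_freer are invented names, ids 1..32 are illustrative, and it assumes (as a labeled assumption) that an exhausted pool yields the invalid id 0, so the failure would surface at the first instruction that uses that id rather than at the getr itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CHANENDS 32   /* per tile, matching "Chanends available: 32" above */

/* Hypothetical model of per-tile chanend allocation (getr/freer).
   Assumption: when no chanend is free, allocation returns the invalid id 0. */
typedef struct { bool used[NUM_CHANENDS]; } chanend_pool_t;

unsigned pool_getr(chanend_pool_t *p) {
    for (unsigned i = 0; i < NUM_CHANENDS; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            return i + 1;          /* non-zero id: allocation succeeded */
        }
    }
    return 0;                      /* pool exhausted: invalid id */
}

void pool_freer(chanend_pool_t *p, unsigned id) {
    assert(id >= 1 && id <= NUM_CHANENDS && p->used[id - 1]);
    p->used[id - 1] = false;       /* slot becomes available again */
}
```

If, say, 29 chanends are already held for the lifetime of the program, only three further runtime allocations (such as the getr at the start of get_ipconfig) can succeed at the same time in this model; a fourth concurrent one gets id 0.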

Post by magnus »

Thanks! Yeah, it is uncomfortably large. I remember being concerned about the MAYBEs a few years ago and thought I had verified they were fine, but now I can't remember any details... So I don't trust that I did my homework on that, or that all the assumptions still hold.

We haven't written any assembly ourselves, but we do use the GET_SHARED_GLOBAL macros to share memory (and we use quite a few xcore/xmos libraries that might do something along those lines).

Good suggestion, I will try that. I have tried to reproduce it on an XK-AUDIO-216-MC-AB development board, but I probably need to understand the issue better before that route will succeed.