XUF216-512-TQ128 boot failed

Technical discussions around xCORE processors (e.g. xcore-200 & xcore.ai).
Neo
New User
Posts: 3
Joined: Wed Aug 26, 2020 7:48 am


Post by Neo »

Hello everyone,

Our board uses the XUF216-512-TQ128, which has an IS25LQ016B flash integrated in the package. After running for some days, a small number of boards failed to boot. The XTAG dump state is shown below:


xrun: Program received signal ET_ECALL, Application exception.
0xfff005ce in ?? ()

***** Active Cores *****
2 tile[1] core[0] 0xfff00518 in ?? ()
* 1 tile[0] core[0] 0xfff005ce in ?? ()

Thread 2 (tile[1] core[0]):

***** Call Stack *****
#0 0xfff00518 in ?? ()

***** Disassembly *****
0xfff00518: in (2r) r1, res[r6] *
0xfff0051a: setd (r2r) res[r6], r1
0xfff0051c: in (2r) r2, res[r6] *
0xfff0051e: ldw (ru6) r4, dp[0x0]
0xfff00520: mkmsk (rus) r5, 0x20

***** Registers *****
r0 0x0 0
r1 0x0 0
r2 0x0 0
r3 0x40000 262144
r4 0x200100 2097408
r5 0x100200 1049088
r6 0x10002 65538
r7 0x80a 2058
r8 0x0 0
r9 0x10023 65571
r10 0x3 3
r11 0x40b 1035
cp 0x0 0
dp 0xfff01eb0 -1040720
sp 0x7ff7c 524156
lr 0xfff00342 -1047742
pc 0xfff00518 -1047272
sr 0x40 64
spc 0x0 0
ssr 0x0 0
et 0x0 0
ed 0x0 0
sed 0x0 0
kep 0xfff00400 -1047552
ksp 0xfff00518 -1047272

Thread 1 (tile[0] core[0]):

***** Call Stack *****
#0 0xfff005ce in ?? ()

***** Disassembly *****
0xfff005ce: ecallt (1r) r7
0xfff005d0: ldc (ru6) r11, 0x1
0xfff005d2: out (r2r) res[r1], r11 *
0xfff005d4: syncr (1r) res[r1] *
0xfff005d6: setc (ru6) res[r0], 0x0 *

***** Registers *****
r0 0x10100 65792
r1 0x10000 65536
r2 0x40100 262400
r3 0x106 262
r4 0x40000 262144
r5 0xd15ab1e 219523870
r6 0xedb88320 -306674912
r7 0x6522df69 1696784233
r8 0x0 0
r9 0x23 35
r10 0x6 6
r11 0x0 0
cp 0x0 0
dp 0xfff01eb0 -1040720
sp 0x7ff7c 524156
lr 0xfff00342 -1047742
pc 0xfff005ce -1047090
sr 0x51 81
spc 0xfff005ce -1047090
ssr 0x0 0
et 0x8 8
ed 0x0 0
sed 0x0 0
kep 0xfff00400 -1047552
ksp 0xfff00402 -1047550


This is a CRC error, but the image in flash is actually correct: we dumped the entire flash and compared it. After the flash was dumped, the board worked again, so the board was repaired. We guessed that the QE bit in the flash was wrong, since we know the flash in the XUF216 should be in quad (4-bit) mode. We wrote some code to read the QE bit via the QSPI port (not using quadflashlib) and ran it from the xTime IDE on another bad board, and found that QE was zero. We then wrote more code to set the QE bit, and the bad board was repaired.
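On the CRC reading of the dump: r6 on tile[0] holds 0xedb88320, which is the reflected form of the standard CRC-32 polynomial, and the trap is an `ecallt` on r7, which holds a non-zero value. That is consistent with a CRC comparison failing. A minimal C sketch of a standard CRC-32 built on the same polynomial is given here for reference. The initial value, the final inversion and the byte ordering are the common zlib conventions and are assumptions; this is not confirmed to be the boot ROM's exact algorithm, and `crc32_bitwise` is a name made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 polynomial; the same constant appears in r6 of the dump. */
#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Bit-at-a-time CRC-32 using the usual zlib conventions
 * (initial value 0xFFFFFFFF, final inversion). These conventions are
 * an assumption and may differ from what the boot ROM does. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when a 1 bit falls out. */
            if (crc & 1u)
                crc = (crc >> 1) ^ CRC32_POLY_REFLECTED;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Running a routine like this over a flash dump and over the original image is one way to compare them; matching results support the observation that the stored image itself was intact.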

My questions are:
1. Why was the QE bit cleared? Our code only calls quadflashlib.
2. In this case, how can the xCORE fix the QE error? I know the ROM cannot be changed, but could the QE error be fixed in the AES loader, which can be burned into OTP?
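
The QE check and repair described above can be sketched in C roughly as follows. This is only an illustration under assumptions: the opcodes (RDSR 0x05, WREN 0x06, WRSR 0x01) and the QE position (bit 6 of the status register) are the values commonly documented for ISSI IS25LQ parts and should be verified against the IS25LQ016B datasheet. `spi_xfer_fn`, `ensure_qe_set` and the in-memory `fake_flash` are hypothetical names, and the real transport over the XUF216 flash ports is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed opcodes and status-register bits; check the flash datasheet. */
#define CMD_WRSR 0x01u
#define CMD_RDSR 0x05u
#define CMD_WREN 0x06u
#define SR_WIP   (1u << 0)   /* write in progress */
#define SR_WEL   (1u << 1)   /* write enable latch */
#define SR_QE    (1u << 6)   /* quad enable */

/* Hypothetical transport: one chip-select-framed transaction that
 * sends tx_len bytes and then reads rx_len bytes. */
typedef void (*spi_xfer_fn)(void *ctx, const uint8_t *tx, size_t tx_len,
                            uint8_t *rx, size_t rx_len);

static uint8_t read_status(spi_xfer_fn xfer, void *ctx)
{
    const uint8_t cmd = CMD_RDSR;
    uint8_t sr = 0;
    xfer(ctx, &cmd, 1, &sr, 1);
    return sr;
}

/* Returns true if QE is set on exit. The status register is written
 * only when QE reads back clear; other bits are preserved. */
bool ensure_qe_set(spi_xfer_fn xfer, void *ctx, unsigned max_polls)
{
    uint8_t sr = read_status(xfer, ctx);
    if (sr & SR_QE)
        return true;

    const uint8_t wren = CMD_WREN;
    xfer(ctx, &wren, 1, NULL, 0);

    const uint8_t wrsr[2] = {
        CMD_WRSR, (uint8_t)((sr & ~(SR_WIP | SR_WEL)) | SR_QE)
    };
    xfer(ctx, wrsr, 2, NULL, 0);

    /* Poll until the write completes or the poll budget runs out. */
    for (unsigned i = 0; i < max_polls; i++) {
        sr = read_status(xfer, ctx);
        if (!(sr & SR_WIP))
            break;
    }
    return (sr & SR_WIP) == 0 && (sr & SR_QE) != 0;
}

/* In-memory stand-in for the flash, used only to exercise the logic. */
struct fake_flash { uint8_t status; bool wel; unsigned sr_writes; };

void fake_xfer(void *ctx, const uint8_t *tx, size_t tx_len,
               uint8_t *rx, size_t rx_len)
{
    struct fake_flash *f = ctx;
    if (tx_len == 0)
        return;
    switch (tx[0]) {
    case CMD_RDSR:
        if (rx && rx_len) rx[0] = f->status;
        break;
    case CMD_WREN:
        f->wel = true;
        break;
    case CMD_WRSR:
        if (f->wel && tx_len >= 2) {
            f->status = tx[1];
            f->wel = false;
            f->sr_writes++;
        }
        break;
    default:
        break;
    }
}
```

On real hardware `fake_xfer` would be replaced by a routine that drives the flash pins in single-bit SPI mode. Calling `ensure_qe_set` early at startup would be one way to apply the repair automatically, but it can only help once application code is already running, so it does not cover the case where the boot ROM itself fails to load the image.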



Thanks!


mon2
XCore Legend
Posts: 1913
Joined: Thu Jun 10, 2010 11:43 am

Post by mon2 »

Hi.

Go through the checklist inside the CPU datasheet.

Post the schematic of this design for a review.

Double check that the power-up voltage sequencing is correct. Is the current draw of each rail correct?

Is the reset supervisor working correctly?

Check the soldering of this CPU, which is critical.

It is good that you can access the CPU over JTAG (xtag3).

Have you attempted to run the basic XMOS USB HID mouse demo on your custom PCBA? Does that run?

Post by Neo »

Actually, we copied the official demo board almost exactly and only replaced a few parts. We swapped the XMOS chips between a bad board and the official demo board: the bad board became OK and the demo board became bad. So we think the PCBA is fine and the flash QE bit is the root cause.

Post by mon2 »

The XMOS internal flash is from ISSI. You could consider contacting them for advice, but here is a general thread worth a quick review:

https://microchipsupport.force.com/s/ar ... Corruption

ISSI has great support. They were very hands-on in investigating another case involving their devices and XMOS OTP firmware. See the long thread somewhere in this forum.

If the issue is truly XMOS CPU related, then the working CPU should fail on the official kit?

I suggest running the XMOS USB HID mouse IP for a few hours or longer to confirm whether the QE bit gets erased.

Is your IP performing reads/writes to the flash during runtime?

Post by Neo »

>>> If the issue is truly XMOS CPU related, then the working CPU should fail on the official kit?
I'm not sure what 'working CPU' means. We took the XMOS CPU from our bad board and mounted it on the official kit, and then the official kit failed to boot.

>>> I suggest running the XMOS USB HID mouse IP for a few hours or longer to confirm whether the QE bit gets erased.
Boot failure is a low-probability event: we have 100 boards and only 4 of them failed to boot. We used three methods to fix them:
1. Dump the entire flash over XTAG.
2. Run our IP from the xTime IDE.
3. Run a special IP from the xTime IDE that sets the QE bit.

>>> Is your IP performing reads/writes to the flash during runtime?
Yes, our IP reads data from flash and updates the boot partition as needed. But our IP only calls the quadflashlib interface; it does not modify the QE bit.


If we want to use the AES loader to set the QE bit, where can I find the source code of the AES loader so we can customize it? The ROM would load the AES loader from OTP, and the AES loader would then set the QE bit and load the flash loader...
We think this is a solution to this fault; we are not worried about who clears the QE bit or when.

Post by mon2 »

The AES code is not open source. Fairly sure this is still accurate, but you can open a support ticket with XMOS for confirmation.

The repair of the QE bit is a workaround. Something about your 4 boards is different and causes the QE bit to be corrupted.

Keep in mind that XMOS has shipped millions of these chips to customers including Sony. If this were a serious issue, this forum would be flooded with similar complaints.

Check your power rails, the reset supervisor circuits and the PCBA for the non-working boards. Some transient or other event is causing this issue. Yes, the QE bit then gets corrupted and you are performing the workaround.

READING the flash over XTAG should not correct the QE bit. These are read transactions only. I do believe that an xflash WRITE to a supported QSPI device automatically enables the QE bit.

Are the lines used by the internal flash loaded by external hardware? Check for this.

It will not hurt to ask XMOS or even ISSI support for help. ISSI will know about their flash inside this CPU.

In this forum I have posted an FAE contact at ISSI USA who was excellent on low-level flash memory support a few years ago. If required, I can share the details again.

You could try a test: disable the part of your code that calls the flash IP, enable QE before running your code, and then run to see if the code continues to run.

Post by mon2 »

Repost of the ISSI FAE:

KJ Jang
Integrated Silicon Solution, Inc.
Application Engineer
Phone: 408.969.5137

KyeongJin Jang
kyeongjin_jang[at_usual_stuff]issi.com