Sneaky problem with libusb & audio on transfer to host [solved]

Technical questions regarding the xTIMEcomposer, xSOFTip Explorer and Programming with XMOS.
dsteinwe
Experienced Member
Posts: 87
Joined: Wed Jun 29, 2016 8:59 am


Post by dsteinwe »

Hi,

I have a sneaky problem when my custom sound card sends audio data to the host. When I record the data stream from my sound card on the PC, artefacts can be heard when I play back the recorded stream. The artefacts are also visible in the audio editor. I have attached a screenshot ("audio editor.png"). It shows a recording at 48000 Hz, where the samples have the values 1x20,000 and 5x10,000 on the left channel and 0 on the right channel. The artefacts happen approximately every 0.5 s and last about 10 ms. I then sniffed the USB traffic with Wireshark and found some errors in the transfer (see screenshot "wireshark.png"). At first I thought it was a hardware problem, but by chance I discovered that it is a software problem. I create the test signal (1x20,000 and 5x10,000 on the left, 0 on the right) directly in the code. During testing I always had an external source connected at the same sample rate. When I disconnected the external source, the artefacts no longer occurred! My current thoughts are:
  1. The XUD thread has too few CPU cycles to process the transmission in time, because too many cores are used on the USB tile. The XUD library documentation says that the thread must run at at least 80 MHz.
  2. It takes too long after receiving the SOF interrupt to call "XUD_SetReady_InPtr()" to hand the next samples to the host.
  3. Something totally different ...
Ad 1) I've checked that I'm not using too many cores. I have 5 threads/cores running on the USB tile, including the XUD thread. There are 2 further threads that are marked as distributed, so they don't consume a core. I don't think that is the problem.
Ad 2) If this were the case, I think the recorded samples would look different in the audio editor. I would expect artefacts on every transmitted sample block (6 samples per channel at 48 kHz). Am I wrong?
Ad 3) No ideas, yet.

What irritates me is that the output is transmitted bit-perfect even at 192 kHz, while the input already has problems at 48 kHz. Do you have any ideas?
You do not have the required permissions to view the files attached to this post.


View Solution

Post by dsteinwe »

It seems that the core that sends the data to libxud and receives the data from the input core is critical in terms of execution time. I deactivated the code that copies the samples from the input to a page buffer that libxud reads from. Then I connected a signal source to the input and recorded an input in parallel. The recording still looks fine. The select-case statement where the SOF event and the input-signal event are received seems to be the time-critical part.
CousinItt
Respected Member
Posts: 275
Joined: Wed May 31, 2017 6:55 pm

Post by CousinItt »

I don't have any detailed knowledge of xmos USB audio, so just a few general points.

1) Is the device clock synchronised in some way with the USB? If so, could problems with the source clock affect the reliability? For example, have you tried it with more than one PC? Are there any detectable differences if you do?

2) xmos libraries often use the most general method for exchanging data, but in this case the XUD documentation says:
It is important to note that, for performance reasons, cores communicate with the XUD library using both XC channels and shared memory communication. Therefore, all cores using the XUD library must be on the same tile as the library itself.
Assuming you're doing that already, is there any possibility that there's some mistake in handling pointers to shared memory? If all is good there, are there any tweaks you can make to the code to speed up transfer of data blocks, or is it already close to optimal?

3) Lastly, if you're using -O3 optimisation, it might be worth dropping the level to -O2 (not necessarily all modules) and seeing if anything improves.

Post by dsteinwe »

Hi CousinItt,

thanks for your post. It was useful to me.
Is the device clock synchronised in some way with the USB?
I use implicit clock feedback for the input audio stream. This means that the host triggers SOFs at 8 kHz, and the SOF interrupts give the device feedback about how long 125 µs lasts on the host. Depending on this interval duration, the device sends more or fewer samples, according to the clock of the source. As far as I can remember, the XMOS example also uses implicit feedback for input.
In which case, if there were problems with the source clock would that affect the reliability? For example, have you tried it with more than one PC? Are there any detectable differences if you do?
Yes, I think the source clock affects the problem, but in my opinion it is not the cause. I'll explain a few lines later, because I have some new observations.
2) xmos libraries often use the most general method for exchanging data, but in this case the XUD documentation says:
It is important to note that, for performance reasons, cores communicate with the XUD library using both XC channels and shared memory communication. Therefore, all cores using the XUD library must be on the same tile as the library itself.
Assuming you're doing that already, is there any possibility that there's some mistake in handling pointers to shared memory?
Yes, those are very important points that I have considered. I think I use libxud correctly with shared memory, and the cores are on the same tile. If I were using the shared memory in a wrong way, that would cause corrupted samples to be transferred, but it would not cause protocol errors. Therefore I conclude that the errors must have another source.
If all is good there, are there any tweaks you can make to the code to speed up transfer of data blocks, or is it already close to optimal?
Indeed, I have improved the code a little, and now it works fine at 48 kHz but not at 192 kHz. I then wrote some test code where dummy data is written to the shared memory. I also added a delay in microseconds and tested 2 values: 160 µs and 70 µs. 160 µs causes artefacts, 70 µs does not. I haven't probed the exact limit, but I think it is obvious now: the sample data must be transferred to the host in less than 125 µs. That means my code has to consume less than 125 µs to process the input samples, including the CPU cycles required by libxud. I guess my production code exceeds this limit in some situations, because the SOF clock and the input source clock deviate from each other. That deviation could be why the artefacts happen rhythmically, but this is only a guess for now.
3) Lastly, if you're using -O3 optimisation, it might be worth dropping the level to -O2 (not necessarily all modules) and seeing if anything improves.
I use "-O3", but I thought "-O3" optimizes the code more aggressively than "-O2". I haven't compiled the project with "-O2", but I would expect somewhat more artefacts with it. Am I wrong, or have I misunderstood the compiler switches?

Currently, I'm focusing on reducing the time spent processing the input samples.

Post by CousinItt »

Re -O3 vs -O2, I seem to remember reading somewhere that the -O3 switch can sometimes be buggy. However I don't have any hard evidence to hand and I have had -O3 code working reliably (e.g. in the ethernet library).

Post by dsteinwe »

Re -O3 vs -O2, I seem to remember reading somewhere that the -O3 switch can sometimes be buggy. However, I don't have any hard evidence to hand and I have had -O3 code working reliably (e.g. in the ethernet library).
I didn't know that. Fortunately, I haven't had any issues with -O3 so far. I searched a little and found this post in the xcore forum: lib_ethernet and ET_LOAD_STORE. That is the only post I have found on this issue. BTW, I have found a corresponding thread on Stack Overflow for GCC, where a difference occurs when the code is compiled with "-O2" versus "-O3": https://stackoverflow.com/questions/83962/do-i-have-a-gcc-optimization-bug-or-a-c-code-problem. Does anyone know whether the XC compiler is derived from GCC?

In the meantime, I have made some improvements to the code. Currently, I'm running a test to verify that the input is bit-perfect. It has been running for about 2 hours without an error. While improving the code, I had some other sample errors. Surprisingly, the errors occurred when the optical cable was bent at the end. It seems the cable is broken. I have never seen such an issue before, and it occurs only at 192 kHz. Even more surprisingly, the parity bit must be set correctly in the S/PDIF stream, otherwise the S/PDIF transceiver IC would have disabled the output.

I have also tested, using a delay, how much time reserve I have. It is not much: about 60 ticks. I have no idea which code part consumes so much time. Anyhow, it seems to be sufficient, or as Dave from the EEVblog says: "It's enough for Australia" ;-).

Have a nice week!