XUD - Support for Multiple Transactions per Microframe

kenmac
Member++
Posts: 21
Joined: Tue May 13, 2014 9:37 am

Post by kenmac »

During my recent cross-platform experimentation with my custom version of the XMOS UAC2 reference design, I have noticed problems when using high channel counts and bit rates on OSX and Linux. I have narrowed this down to the fact that Windows accepts out-of-spec wMaxPacketSize values in isochronous endpoint descriptors, and will quite happily deal with packet sizes over the 1024-byte maximum without splitting them into multiple transactions.

The throughput I require demands that I send more than 1024 bytes per microframe, and therefore, to be within the USB spec, the isochronous endpoint data should be divided between multiple transactions per microframe.
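
For reference, a high-bandwidth endpoint advertises its extra transactions in wMaxPacketSize itself: in USB 2.0, bits 10..0 carry the payload size per transaction and bits 12..11 the number of additional transactions per microframe (0, 1 or 2). A minimal sketch of that encoding follows. The helper name `hb_wmaxpacketsize` is made up for illustration and is not part of XUD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an XUD API: build the wMaxPacketSize field of a
 * high-speed isochronous endpoint descriptor (USB 2.0, section 9.6.6).
 *   bits 10..0  : maximum payload per transaction (up to 1024 bytes)
 *   bits 12..11 : additional transactions per microframe (0, 1 or 2)
 */
static uint16_t hb_wmaxpacketsize(unsigned bytes_per_transaction,
                                  unsigned transactions_per_uframe)
{
    assert(bytes_per_transaction <= 1024u);
    assert(transactions_per_uframe >= 1u && transactions_per_uframe <= 3u);

    return (uint16_t)((bytes_per_transaction & 0x7FFu) |
                      ((transactions_per_uframe - 1u) << 11));
}
```

For example, two transactions of 768 bytes per microframe would be described as `hb_wmaxpacketsize(768, 2)`, i.e. 0x0B00. Whether a given host driver honours the field is a separate question.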

I have noticed that in the recent XUD version the asynchronous SetReady functions no longer have a PIDn parameter (to specify transaction/packet number within microframe). e.g.

Code:

 XUD_SetReady_In(aud_to_host_usb_ep, PIDn_DATA0, p+4, len);
Does XUD actually support multiple transactions per microframe using the asynchronous I/O functions? Did it previously support them, as the former inclusion of PIDn_DATAx tokens in the function calls suggests?

Would appreciate if somebody could enlighten me on XUD's capabilities regarding these issues.

Regards,

Kenny


Ross
XCore Expert
Posts: 962
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

kenmac wrote: Does XUD actually support multiple transactions per microframe using the asynchronous I/O functions? Did it previously support them, as the former inclusion of PIDn_DATAx tokens in the function calls suggests?
XUD can handle multiple transactions per microframe for endpoint types other than isochronous (e.g. bulk).

Since you mention audio, my guess is that the functionality you are effectively asking for is high-bandwidth isochronous endpoints.

The short answer is "no", these are not currently supported. As you allude to, these have a different PID scheme (DATA2, MDATA). The plan is to handle this PID toggling/checking for high-bandwidth endpoints in the library, but this work has not been completed yet.

That said, I've just taken a quick look at the code and the following should work for the IN (xCORE transmit) side of things if you would like to generate the PIDs manually:

- Mark the endpoint as Iso (to avoid the built-in PID toggling)
- Add a PID param to the existing (or a new) SetReady function(s)
- Add the following line to your new SetReady function(s):

Code:

    // Set packet PID: store pid into word 4 of the endpoint structure
    asm ("stw %0, %1[4]"::"r"(pid),"r"(ep));
No warranty on that one I'm afraid - but let us know how you get on :)
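
To make the effect of that `stw` line concrete: on xCORE, `stw` with an immediate index stores a word at base + 4 * index bytes, so the line writes the PID into word 4 of whatever the endpoint handle points at. A host-side sketch of the same store is below. The names `mock_ep_t` and `mock_ep_set_pid` are invented for illustration, and the real XUD endpoint layout is not public, so only the single PID slot from the asm above is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Word offset taken from the "stw %0, %1[4]" line above. */
#define EP_PID_WORD 4u

/* Stand-in for the opaque XUD endpoint handle (assumption: the handle is
 * the address of a word-addressed structure, as the asm implies). */
typedef uintptr_t mock_ep_t;

/* Host-side equivalent of the inline asm: write the PID into word 4 of
 * the structure the handle refers to. */
static void mock_ep_set_pid(mock_ep_t ep, uint32_t pid)
{
    assert(ep != 0);
    volatile uint32_t *words = (volatile uint32_t *)ep;
    words[EP_PID_WORD] = pid;
}
```

This only models the store itself. It says nothing about when XUD reads that word, which is the part that needs checking on hardware.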
kenmac
Member++
Posts: 21
Joined: Tue May 13, 2014 9:37 am

Post by kenmac »

Hi Ross,

I have tried what you recommended regarding setting PIDs in the XUD_SetReady_In functions but with little success. The main problem seems to be that XUD deals with MDATA PIDs differently from DATA1 and DATA2 tokens and does not first send an IN token. As the XUD_ep interface is opaque, it is difficult to know from publicly available information whether this can be corrected.

I am also unsure if the timing requirements of multiple transactions per microframe can be met by sequencing the transactions using a thread external to the main XUD thread, and using XUD_SetData_Select with XUD_SetReady_In. I might have been able to confirm this if the above limitation had not been present.

I have heard that there are projects out there using up to 40 channels of audio, which I assume would require more bandwidth than one 1024-byte transaction per microframe would allow. If this is the case, are you aware of what method they used to get around this obstacle? Can you give an indication of when the XUD modification that does this iso PID sequencing internally will be released?

Regards,

Kenny
Ross
XCore Expert
Posts: 962
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

kenmac wrote:Hi Ross,

I have tried what you recommended regarding setting PIDs in the XUD_SetReady_In functions but with little success. The main problem seems to be that XUD deals with MDATA PIDs differently from DATA1 and DATA2 tokens and does not first send an IN token. As the XUD_ep interface is opaque, it is difficult to know from publicly available information whether this can be corrected.
Hi Kenny,

The XMOS device doesn't send the IN tokens - these are sent from the host. If the host isn't sending them, then you may have modified the descriptors incorrectly, or the driver you are using simply doesn't support high-bandwidth endpoints.
Ross
XCore Expert
Posts: 962
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

kenmac wrote:
I am also unsure if the timing requirements of multiple transactions per microframe can be met by sequencing the transactions using a thread external to the main XUD thread, and using XUD_SetData_Select with XUD_SetReady_In. I might have been able to confirm this if the above limitation had not been present.
You should be able to get something going - someone has previously mimicked a camera interface using a similar trick.
kenmac wrote: I have heard that there are projects out there using up to 40 channels of audio, which I assume would require more bandwidth than one 1024-byte transaction per microframe would allow. If this is the case, are you aware of what method they used to get around this obstacle? Can you give an indication of when the XUD modification that does this iso PID sequencing internally will be released?
I am told that OS X does now support high-bandwidth endpoints; prior to this support, multiple endpoints have been used successfully. I don't have an indication of time-scales at the moment - as always it will depend on demand, driver support etc. Let us know if you have an interesting project in mind :)
kenmac
Member++
Posts: 21
Joined: Tue May 13, 2014 9:37 am

Post by kenmac »

Ross wrote: You should be able to get something going - someone has previously mimicked a camera interface using a similar trick.
I have gone over the approach you suggested again to be sure, but with the same conclusions. MDATA packets do not get transmitted with the preceding IN token required for each packet, as the DATA0, DATA1 and DATA2 packets do. I have tried sending the IN token manually using the same SetReady function, but doing this does not populate the required fields in the token other than the PID (i.e. ENDP and ADDR).

I am not sure how the camera interface project managed this. Perhaps they had additional information regarding the opaque Endpoint byte interface in XUD? Perhaps they were not using the non-blocking endpoint I/O? If you have any ideas I would be interested in hearing. Right now I am convinced it is not possible without an XUD modification, but I would be happy to learn otherwise.
Ross wrote:I am told that OS X does now support high-bandwidth endpoints; prior to this support, multiple endpoints have been used successfully. I don't have an indication of time-scales at the moment - as always it will depend on demand, driver support etc. Let us know if you have an interesting project in mind :)
The project is a composite device with VCOM and a 16-channel, 192 kHz, 24-bit UAC2 soundcard. The calculations for the required isoc transaction length are as follows:
Audio                          USB Isoc Endpoint
chans              16          F_uframe (Hz)              8000
bitdepth (bits)    32          Transaction size (bytes)   1024
sr (Hz)            192000      bw (bps)                   65536000
bw (bps)           98304000

Bytes per transaction required = 1536

It is clear that we need multiple transactions per uframe as 1024 is the largest isoc transaction size in the USB spec (although not in Windows).
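
The arithmetic behind those figures can be written out as a small sketch. The function and macro names below are illustrative only. The result is the nominal average payload per microframe, truncated to whole bytes, with the 1024-byte isochronous limit from the USB 2.0 spec.

```c
#include <assert.h>
#include <stdint.h>

#define UFRAMES_PER_SECOND        8000u  /* high-speed microframe rate */
#define MAX_ISO_TRANSACTION_BYTES 1024u  /* USB 2.0 isochronous limit  */

/* Nominal audio payload per microframe, truncated to whole bytes. */
static uint32_t bytes_per_uframe(uint32_t channels, uint32_t slot_bits,
                                 uint32_t sample_rate_hz)
{
    assert(slot_bits % 8u == 0u);
    uint64_t bits_per_second = (uint64_t)channels * slot_bits * sample_rate_hz;
    return (uint32_t)(bits_per_second / 8u / UFRAMES_PER_SECOND);
}

/* Transactions of at most 1024 bytes needed to carry that payload. */
static uint32_t transactions_per_uframe(uint32_t payload_bytes)
{
    return (payload_bytes + MAX_ISO_TRANSACTION_BYTES - 1u) /
           MAX_ISO_TRANSACTION_BYTES;
}
```

With the figures above, `bytes_per_uframe(16, 32, 192000)` gives 1536 and `transactions_per_uframe(1536)` gives 2, so a single 1024-byte transaction per microframe is not enough.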
Ross
XCore Expert
Posts: 962
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

kenmac wrote: I have gone over the approach you suggested again to be sure, but with the same conclusions. MDATA packets do not get transmitted with the preceding IN token required for each packet, as the DATA0, DATA1 and DATA2 packets do. I have tried sending the IN token manually using the same SetReady function, but doing this does not populate the required fields in the token other than the PID (i.e. ENDP and ADDR).
The device doesn't send the IN tokens - the host does - if the host doesn't send an IN token the device cannot respond. You need to resolve the issue with the host not requesting the data before you go any further.

The lib doesn't do anything with the PIDs - it just sends them out - the only thing it does is do the toggling (for non-iso endpoints) in XUD_EpFuncs.S
kenmac
Member++
Posts: 21
Joined: Tue May 13, 2014 9:37 am

Post by kenmac »

Ross wrote:The device doesn't send the IN tokens - the host does - if the host doesn't send an IN token the device cannot respond. You need to resolve the issue with the host not requesting the data before you go any further.
You are right. It is easy to forget that the host is the only thing that can issue requests in USB. All we are doing is readying the endpoint to send data when it receives an IN request/token from the host. Your comment led me to discover something I was doing wrong. I was sending MDATA packets as part of the IN transactions, but these should only feature in OUT transactions. The protocol for multiple transactions per microframe differs between IN and OUT, which I overlooked before. This, however, did not fix the problem overall.
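
For anyone following along, the IN/OUT difference can be sketched as below, based on the high-bandwidth isochronous rules in USB 2.0 section 5.9.2. IN data counts down (DATA2, DATA1, DATA0) so that the first PID tells the host how many transactions follow. OUT data uses MDATA for every packet except the last, whose PID encodes the total. The type and function names are made up for illustration, and the enum values are the 4-bit PID codes from the spec, not necessarily the constants XUD uses.

```c
#include <assert.h>

/* 4-bit USB data PID codes (USB 2.0, table 8-1). */
enum usb_data_pid {
    PID_DATA0 = 0x3,
    PID_DATA1 = 0xB,
    PID_DATA2 = 0x7,
    PID_MDATA = 0xF
};

enum iso_dir { ISO_IN, ISO_OUT };

/* PID of transaction `index` (0-based) out of `total` transactions in one
 * microframe on a high-bandwidth isochronous endpoint. */
static enum usb_data_pid hb_iso_pid(enum iso_dir dir, unsigned total,
                                    unsigned index)
{
    static const enum usb_data_pid by_count[3] = {
        PID_DATA0, PID_DATA1, PID_DATA2
    };

    assert(total >= 1u && total <= 3u);
    assert(index < total);

    if (dir == ISO_IN) {
        /* IN: DATA2, DATA1, DATA0 - the last packet is always DATA0. */
        return by_count[total - 1u - index];
    }

    /* OUT: MDATA for all but the last packet; the last PID gives the count. */
    if (index + 1u < total) {
        return PID_MDATA;
    }
    return by_count[total - 1u];
}
```

So for two transactions per microframe, the device would send DATA1 then DATA0 on an IN endpoint, while the host would send MDATA then DATA1 on an OUT endpoint.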

What I am getting now is IN tokens being issued constantly in Windows with no OUT tokens. On OSX I get both IN and OUT tokens and audio playback is possible, but there are gaps/silence in playback at regular intervals. This is likely due to the fact that I am not actually sending multiple transactions per uframe, but rather single transactions in consecutive uframes with alternating PIDs, as confirmed by my bus sniffer.

Image

Is this because the host/driver is not requesting multiple IN transactions per uframe or that XUD is not sending out the two packets fast enough?

In the first case there is clearly a host/UAC2 driver issue regarding support for multiple transactions per uframe, and there is not that much that can be done to fix this, short of writing a custom driver. I guess the other option would be to use multiple isoc endpoints like you suggested, but this would involve substantial modification of the firmware.

If the second case is true, all I can think of is to create a separate thread devoted to servicing the second XUD_SetReady_In call which sends the second packet in the uframe. My theory is that since this loop would have a higher priority than the default XUD_SetData_Select service routine for the IN endpoint, it might be able to meet the timing requirements.

How do you feel about those suggestions?

Cheers,

Kenny
Ross
XCore Expert
Posts: 962
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

kenmac wrote:This is likely due to the fact that I am not actually sending multiple transactions per uframe, but rather single transactions in consecutive uframes with alternating PIDs, as confirmed by my bus sniffer.

Image

Is this because the host/driver is not requesting multiple IN transactions per uframe or that XUD is not sending out the two packets fast enough?
The trace is missing the SOFs, but looking at the timing (about 125 us?) it looks like you are only getting one IN token packet from the host per microframe. So the first case.
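
Once a capture does include SOFs, the check becomes mechanical: count the IN tokens seen between consecutive SOFs. A rough sketch of that check over an exported token list is below. The `token_kind` enum and the function name are hypothetical and not tied to any particular analyser's export format.

```c
#include <assert.h>
#include <stddef.h>

/* Token kinds as they might appear in an exported bus trace (assumed). */
enum token_kind { TOK_SOF, TOK_IN, TOK_OUT };

/* Largest number of IN tokens seen between two consecutive SOFs.
 * Tokens before the first SOF are ignored because their microframe
 * is incomplete in the capture. */
static unsigned max_in_per_uframe(const enum token_kind *tokens, size_t n)
{
    unsigned current = 0, best = 0;
    int seen_sof = 0;

    assert(tokens != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (tokens[i] == TOK_SOF) {
            seen_sof = 1;
            current = 0;            /* a new microframe starts here */
        } else if (seen_sof && tokens[i] == TOK_IN) {
            current++;
            if (current > best) {
                best = current;
            }
        }
    }
    return best;
}
```

A result of 1 means the host never asks for a second packet within a microframe, which points at the host/driver rather than at the device timing.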
kenmac
Member++
Posts: 21
Joined: Tue May 13, 2014 9:37 am

Post by kenmac »

Ross wrote:The trace is missing the SOFs, but looking at the timing (about 125 us?) it looks like you are only getting one IN token packet from the host per microframe. So the first case.
I believe you are correct. The host/driver does seem to be the problem. I have investigated further on multiple platforms and have witnessed completely different behaviour on each.

I agree the screen grab I posted before wasn't as helpful as it could have been without the SOFs. I have posted bus sniffer data for Linux, OSX and Windows, with SOFs this time ;)

The most interesting thing I have spotted is that, despite a technical note published by Apple indicating that there is no support for multiple transactions per uframe in their drivers, the data captured for the Apple case is closer to the expected data than the other two platforms. I have included the grabs and a description below.

I am not convinced it will be possible to use multiple transactions per uframe across all three platforms at present without custom drivers. Part of the benefit of using UAC2 is to avoid the need for proprietary drivers, so this is not likely to be an option.


Linux
Image

OUT transaction is not split up into multiple parts despite being over 1024 bytes.
IN transaction is split up into multiple parts, and despite appearing to be between SOF tokens, the delta time interval between tokens suggests that it is not (151 us > 125 us microframe period).
Data returned to test bench is munged.

Windows
Image

No OUT transactions are issued.
IN transactions are trying to follow the protocol for multiple isoc transactions per uframe, but the host doesn't seem to be requesting the packets fast enough. Similar to Linux, except with worse timing properties.

OSX
Image

OUT transactions appear to have been executed correctly according to USB spec for multiple isoc transactions per uframe.
IN transactions are between SOFs, but a closer look at the timing shows negative intervals, suggesting the transaction belongs to the previous uframe. The previous uframe has only 1 OUT transaction in between SOFs. This pattern repeats.
Data received is partially correct, but with drop outs.