External memory interface for xcores?

jhrose
Active Member
Posts: 40
Joined: Mon Dec 14, 2009 11:18 am

Post by jhrose »

Hei,

Heater wrote:
I understand from David May's comments on another thread that the cores are already capable of addressing much larger amounts of memory.
Can you give a link please as I cannot search this?

On one hand, as an embedded programmer used to DSPs, micros and operating systems, I'm with you, as it would make life a lot more familiar. However, David May wrote that as programmers we need to think differently about memory usage (http://www.xmoslinkers.org/forum/viewtopic.php?f=3&t=80), though in the same post he also wrote that XMOS would think about adding more memory if there is a good reason.

I imagine adding an external memory interface, allowing the processor to address external RAM, would be rather costly in real estate and could add significantly to the unit price. And a link speed increase (proposed for an XS1-L2 variant?) would seem to be needed to drive modern fast SDRAMs. For reasons such as these I wrote the XOSS proposal (http://www.xcore.com/forum/download/file.php?id=16), which located the operating system and memory management on an external processor connected via XLinks and proposed an XC-language extension to "swap" memory with XS1 cores in software. Would this approach, or one like it, which views XMOS as a co-processor, be preferable to morphing XMOS along the traditional architecture route?


Heater
Respected Member
Posts: 296
Joined: Thu Dec 10, 2009 10:33 pm

Post by Heater »

jhrose: Sorry, I can't find a link for you. Maybe it was not David, maybe I imagined it... nah, somewhere there was a discussion about increasing the internal RAM capacity, and someone was thinking that the instruction set needed expanding to accommodate it. It was pointed out that the current instruction formats are quite capable of handling larger spaces.

I agree there are a lot of applications for which solutions can be designed using "communicating sequential processes" where each process has limited memory space.

Video generation is one. I imagine there is a solution for that which involves an array of synchronized cores, each driving a subset of the scan lines. In such a case each core only needs a video buffer for its own scan lines. Still, that's a heavyweight solution for a smallish display.
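To make the scan-line idea concrete, here is a rough sketch of how the lines might be divided among cores and how much buffer each core would then need. The display size, pixel format and core count are assumptions picked only for illustration, and `split_scan_lines` and `band_buffer_bytes` are hypothetical helper names, not anything from the XMOS tools.

```python
def split_scan_lines(height, n_cores):
    """Split scan lines 0..height-1 into n_cores contiguous bands.

    Any remainder lines are handed out one each to the first cores.
    """
    base, extra = divmod(height, n_cores)
    bands = []
    start = 0
    for core in range(n_cores):
        count = base + (1 if core < extra else 0)
        bands.append(range(start, start + count))
        start += count
    return bands


def band_buffer_bytes(band, width, bytes_per_pixel):
    """Video buffer size needed by a core that owns only this band."""
    return len(band) * width * bytes_per_pixel


# Assumed example values (not from the thread): 320x240, 2 bytes/pixel, 4 cores.
WIDTH, HEIGHT, BYTES_PER_PIXEL, N_CORES = 320, 240, 2, 4

bands = split_scan_lines(HEIGHT, N_CORES)
for core, band in enumerate(bands):
    size = band_buffer_bytes(band, WIDTH, BYTES_PER_PIXEL)
    print(f"core {core}: lines {band.start}..{band.stop - 1}, {size} bytes")
```

With these assumed numbers each core holds a quarter of the frame, which shows the trade-off: the per-core buffer shrinks, but only by spending more cores on the display.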

Now using an XMOS as a co-processor for those real-time tasks and "software defined silicon" is an excellent idea. But a co-processor to what?

The idea of an external bus interface is turned inside out if the XMOS is a co-processor for, say, an ARM based system.
Folknology
XCore Legend
Posts: 1274
Joined: Thu Dec 10, 2009 10:20 pm

Post by Folknology »

Hi Julian, great to get your opinion on this. Having read the XOSS proposal previously, I like the idea, particularly in the case of building more 'sophisticated XMOS systems' and integrating with existing platforms and systems like Linux/VX etc.

However, I am not sure it works in the simpler cases, as it adds further complexity. Take for example an XS1-G4 (512) solution. If a memory mapping to an external 32-bit port were implemented, one could simply hook up the SRAM/SDRAM directly to a G4 core. If, however, they used the more sophisticated XOSS route, they would have to add an FPGA and its design tools, process and maintenance into the equation. So on the grounds of XMOS's 'Softchip', 'Revolutionising electronics' and keeping it simple (fewer chips and connections, and a simpler tool chain), I think the 32-bit address bus wins.

However, when using other members of the XS1 family, XOSS becomes more attractive, due to either the limited pin count (I/O lines available per core) or the fact that you dedicate one to memory management anyhow and thus compromise efficiency. But again you still have the issues of complexity to deal with by adding in an FPGA and its tool chains. Indeed, a more ruthless supplier may look at the final chip count and complexity and instead use a larger FPGA, designing the XS1 out completely...

P.S. I love the idea of running a virtual XS1 core inside the FPGA, that's just so mind-boggling. I must have missed that on the previous read...

P.P.S The link Heater is referring to is here http://www.xcore.com/forum/viewtopic.ph ... 01&start=0

I would be interested in your response to this, Julian, in terms of making it simple.

regards
Al
Folknology
XCore Legend
Posts: 1274
Joined: Thu Dec 10, 2009 10:20 pm

Post by Folknology »

I think both the 32-bit address pins and XOSS either add significant complication or cannot be used across the XS1 family (i.e. they only benefit the 2 full I/O chips, 128/512 pin). By far the greatest benefit across the whole XMOS XS1 chip range would be an increase in internal RAM and replacing OTP with flash. This will solve many of the smaller memory problems whilst sticking to the 'Softcore' and keeping it simple. The more memory-intensive applications not solved by this would probably benefit from an XOSS-like enhancement. Unless we come up with perhaps a third way, maybe a paging/caching or DMA-like pattern, by combining Links with a smaller set (say 8/16) of the I/O pins.

Perhaps someone with more know-how about the Links operation can help here.

Thoughts?
jhrose
Active Member
Posts: 40
Joined: Mon Dec 14, 2009 11:18 am

Post by jhrose »

Heater wrote:
But a co-processor to what?
Any host with an XLink interface would do. One with a large FLASH to store the bootable XMOS program image might be useful.

Folknology wrote:
However I am not sure it works in the simpler cases...
The XOSS diagrams depict a single XMOS connected to an FPGA-subsystem doing the external memory interface. In systems with few or no XMOS devices networked to that interface you're probably right in respect of cost; the questionable benefit in these cases comes from having the host operating system interface of a XOSS-like design. All system designers need to make a cost estimate, which here would include comparing an XMOS network against an FPGA, and I would hope for systems with more XMOS devices connected the equation balances more favourably, irrespective of having the host O/S feature.

A simpler solution than XOSS, for a system with few networked XMOS parts, is to directly attach one XMOS processor to a host through an XLink (as in l.h.s. illustration 3 in XOSS), and have a software protocol for data exchange. The problem then is to design the software to attain the performance required for your app, which is what this thread is about.

These things we can do today (almost).

I'll gladly be corrected, but I imagine implementing an external address bus in XMOS is not going to be low-cost. Providing the I/O pins is one aspect, but you also need to route the internal memory address and data lines through a crossbar switch, to allow the CPU access to both internal SRAM and external SDRAM at different times. Then you may have interference when a thread accessing some external address temporarily blocks all other threads (badness), so you may want a per-thread level-2 cache too. And you need to modify the CPU hold-off logic if an external access is required, to suspend the thread concerned. And as mentioned you either need to drive the SDRAM interface (refresh and stuff), or select an auto-refresh device, and you really should be targeting 400 MHz DDR2/3. And there are probably issues of power, timing and other hardware stuff that's beyond my understanding.

Hmm. Instead of a direct EMIF, maybe it would be easier to add an 8-channel (per-thread) DMA device on-chip which interfaces to an external memory on behalf of the CPU when an external address range is placed onto the address bus. But then you need a paged memory of sorts, so the CPU/DMA knows how much to load into SRAM and where. And for data you also need to write back to external memory, which needs a cache table. Hmm.
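As a rough illustration of the "paged memory plus cache table" bookkeeping mentioned above, here is a toy software model. It is only a sketch under assumptions: a `bytearray` stands in for the external memory, the page size and page count are arbitrary, least-recently-used eviction is my choice rather than anything proposed in the thread, and `PagedMemory` is a hypothetical name. It says nothing about how real XMOS hardware would do it.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy model: a few on-chip 'SRAM' pages cached in front of external RAM.

    Assumes len(external) is a multiple of page_size.
    """

    def __init__(self, external, page_size=256, max_pages=4):
        self.external = external      # bytearray standing in for external RAM
        self.page_size = page_size
        self.max_pages = max_pages
        # The "cache table": page number -> [page data, dirty flag], in LRU order.
        self.pages = OrderedDict()
        self.loads = 0
        self.writebacks = 0

    def _page(self, addr):
        number, offset = divmod(addr, self.page_size)
        if number in self.pages:
            self.pages.move_to_end(number)    # mark as most recently used
        else:
            if len(self.pages) >= self.max_pages:
                self._evict()
            start = number * self.page_size
            data = bytearray(self.external[start:start + self.page_size])
            self.pages[number] = [data, False]
            self.loads += 1
        return self.pages[number], offset

    def _evict(self):
        # Drop the least recently used page, writing it back only if modified.
        number, (data, dirty) = self.pages.popitem(last=False)
        if dirty:
            start = number * self.page_size
            self.external[start:start + len(data)] = data
            self.writebacks += 1

    def read(self, addr):
        entry, offset = self._page(addr)
        return entry[0][offset]

    def write(self, addr, value):
        entry, offset = self._page(addr)
        entry[0][offset] = value
        entry[1] = True                       # page now differs from external RAM

    def flush(self):
        while self.pages:
            self._evict()


external = bytearray(4096)
mem = PagedMemory(external, page_size=256, max_pages=2)
mem.write(10, 0xAB)      # loads page 0 and dirties it
mem.write(300, 0xCD)     # loads page 1 and dirties it
mem.read(1000)           # needs page 3: evicts page 0 and writes it back
mem.flush()              # writes back page 1; page 3 is clean
```

Even this toy version needs a table, dirty flags and an eviction policy, which is roughly the extra complexity the post is worried about.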
Folknology wrote:
I love the idea of running a virtual XS1 core inside the FPGA
Dude, that's your idea. And XOSS' provision of an XLink interface belongs to Ali/Paul.

Folknology wrote:
The link Heater is referring to is ...
Thanks, that adds clarity.

Folknology wrote:
I would be interested in your response to this Julian in terms of the making it simple.
Wish I could make it so. Probably what's missing is a good book on Systems and Software Design for the Xcore.
Folknology
XCore Legend
Posts: 1274
Joined: Thu Dec 10, 2009 10:20 pm

Post by Folknology »

I think going the DMA route is definitely worth investigating as it could be useful well beyond the scope of this thread and memory interfacing.

My initial thinking about DMA combined with Links and XMOS 'Services' and channels came about when considering how to deal with very fast data sources, such as multiple ADCs for high-speed sampling or other fast data inputs from radar to disk arrays. Thus if the concept could be used with these as well as memory, it's more likely to be useful and fly with XMOS.

Thoughts?
jhrose
Active Member
Posts: 40
Joined: Mon Dec 14, 2009 11:18 am

Post by jhrose »

Hei,

In my previous post I briefly mooted DMA as an alternative to a CPU-addressable EMIF and decided it didn't make things any easier to implement - though I should have made that concluding statement more clearly.

Here, I think you're proposing a general purpose on-chip DMA device be added, that can be programmed to perform a variety of I/O functions. So presumably it has some memory-mapped control registers, and can raise an event to the CPU; it is connected to the links and switching sub-system, and also to the I/O ports; has a 32-bit wide data interface that is capable of passing 1-, 2-, 4-, 8-, 16- and 32-bit word transactions. And the on-chip SRAM would be dual-ported, to allow the CPU and the DMA concurrent access. Is this about it? (If you mean it to access an EMIF too, then possibly add some of the complexities mooted previously.)

As the CPU does these things already, why not just run a software DMA process in parallel?

Or do you see a way to get high-speed data transfers with a DMA, through the I/O ports?
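The "software DMA process" question above can be sketched in ordinary threaded code. This is only an analogy under stated assumptions: a Python thread stands in for a hardware thread, a `queue.Queue` stands in for a channel, and a `threading.Event` stands in for the completion event raised to the CPU. The `software_dma` function and the request format are hypothetical, and none of this reflects real XC code or timing.

```python
import queue
import threading


def software_dma(requests, memory):
    """Worker loop: copy each requested block into memory, then signal done.

    Each request is (source words, destination offset, completion event).
    A request of None shuts the worker down.
    """
    while True:
        job = requests.get()
        if job is None:
            break
        source, dest, done = job
        for i, word in enumerate(source):
            memory[dest + i] = word
        done.set()


memory = [0] * 64                       # stand-in for on-chip SRAM
requests = queue.Queue()                # stand-in for a channel to the DMA thread
worker = threading.Thread(target=software_dma, args=(requests, memory))
worker.start()

done = threading.Event()
requests.put((range(100, 108), 16, done))   # ask for 8 words at offset 16
# ... the requesting thread is free to do other work here ...
done.wait()                                 # block until the transfer completes

requests.put(None)                          # stop the worker
worker.join()
print(memory[16:24])
```

The requesting thread only pays for posting the request and waiting on the event, which is the appeal of the idea; whether a dedicated thread can move data fast enough is the open question the post raises.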
lilltroll
XCore Expert
Posts: 956
Joined: Fri Dec 11, 2009 3:53 am
Location: Sweden, Eskilstuna

Post by lilltroll »

I have no idea about the large LED display, but today I started to play with a 320*240 (6bit*RGB=18) TFT display with touchscreen, mounted on a PCB card.
On the card you can choose to access the TFT directly by the 18-bit standard, or to communicate with a driver chip with a 256 Kb video memory including table lookup for 1-2-4-6-8 bits RGB com. + the ability for grayscale dithering.
Since the TFT part is expensive anyway, isn't it a rather good solution to just use an external video memory if you are doing graphics, since the driver chips sell in large quantities!?
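For a feel of the numbers, here is a small calculation of how much memory one full frame of the 320*240 display takes at the pixel depths mentioned above. It assumes tightly packed pixels with no padding, which a real driver chip may not use, and `frame_bytes` is just a hypothetical helper name.

```python
WIDTH, HEIGHT = 320, 240


def frame_bytes(bits_per_pixel, width=WIDTH, height=HEIGHT):
    """Bytes for one tightly packed frame, rounded up to a whole byte."""
    return (width * height * bits_per_pixel + 7) // 8


# Lookup-table depths from the post, plus direct 18-bit RGB.
for bpp in (1, 2, 4, 6, 8, 18):
    print(f"{bpp:2d} bits/pixel: {frame_bytes(bpp):6d} bytes")
```

At 18 bits per pixel a packed frame comes to 172,800 bytes, while the 8-bit lookup mode needs 76,800 bytes, so the palette modes cut the frame store considerably.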
Probably not the most confused programmer anymore on the XCORE forum.
kster59
XCore Addict
Posts: 162
Joined: Thu Dec 31, 2009 8:51 am

Post by kster59 »

My project already incorporates an ARM and XMOS on the same PCB.

I found the XLINK protocol is not reasonable to implement on the ARM if you want any reasonable speed.

Instead I plan to use full duplex SPI at 25 MHz to communicate with the XMOS. If I need more speed, I can use 2 SPI channels at 50 MHz.

The ARM has a deep FIFO buffer for hardware SPI and DMA transfers between SPI and internal memory. This is substantially faster than interrupting code every time you get an XLINK request. This also lets you run a non-real-time OS like Linux without running a special kernel.

As pointed out in the other thread, you can buy a Hawkboard for $89 to play with. ARM chips are $5 even in low quantities and have all the peripherals you need.

I still prefer XMOS for fast I/O control, however.
jhrose
Active Member
Posts: 40
Joined: Mon Dec 14, 2009 11:18 am

Post by jhrose »

Hei,
kster59 wrote:
I found the XLINK protocol is not reasonable to implement on the ARM if you want any reasonable speed. Instead I plan to use full duplex SPI at 25 MHz to communicate with the XMOS.
That's good analysis, along with your point about using a non-real time OS. Can you publish your ARM driver source code and any performance test results you obtained? Your results may suggest an FPGA or CPLD would best provide a firmware XLINK interface, and to use SPI as you say with a general purpose processor. (Though I also wonder whether, say, a TI C6000 DSP variant with serial peripheral/EDMA could be made to interface well enough.)