SPI APIs of OpenCR

OpusK · June 8, 2018, 12:36am

Continuing the discussion from OpenCR 1.0 mapping Arduino pins to capabilities:

OpusK · June 8, 2018, 12:38am

@Kurt,
Just as I answered the previous topic, I have created a new topic about SPI APIs.

OpusK · June 8, 2018, 12:47am

I wonder if some of this, I should discuss here? Or should some of this be a github issue?

This is a free community so you can discuss it where you want
This discussion seems to be better as users are more active.

Next up try to make a version that uses the DMA support. (transferFast ?). My quick look, appears like I need to call something like beginFast instead of begin? Will figure it out.
Wondering if it would make sense to simply have the transferFast check to see if the DMA had been previously init…

I think the API can be improved enough.

For example, we could design the SPI DMA to be enabled through the parameters of the begin () function, and the user would have the design to use only the transfer () function.

What do you think about this?

KurtE · June 8, 2018, 1:37pm

Thanks @OpusK,

Through you and @ROBOTIS-Will I found the information on the right DMA channel for SPI4. Will make those changes today and play around.

I will also play around some with the API. Awhile ago on the Arduino developers mail list, there were some discussions on trying to come up with a standard extensions to the SPI library for faster SPI transfers and the like. Example:
https://groups.google.com/a/arduino.cc/forum/#!mydiscussions/developers/Q12I7sB_Elk
Not sure if that link will work or not. If not, I could paste in here some of the stuff from that discussion.

I don’t think we ever reached an agreement on all of it, but I did get some of it into the Teensy board (PJRC) SPI library. So I will see how hard it would be to add some of this here.

Things like: SPI.transfer(buf, retbuf, count);
So you have a multibyte transfer, where you have specify both the TX buffer as well as the RX buffer and a count. Either the TX or RX can be NULL. If RX is NULL it is a simple write operation. if TX is NULL, it sends some default byte. On Teensy I have a function that allows you to specify which one…

Question is, are there any reasons why when you use an API like this you would not want fast mode? For example on the ESP… boards, these would always go to DMA… Any reason why we should not do that here?

On the Teensy board, I know that Paul (owner…) wants Arduino to come up with some standard form of some event models. Where the user can have events happen and code defined to process those events. You can choose on when these callbacks should happen (on the interrupt, quickly, idle, never…).

The reason I mention this is on Teensy I added an Asynchronous version of the transfer function that uses the event object. Before this I had an Asynch mothod proposed/working where you specified a callback function.

So for example I could have code that, looked like:
SPI.transfer(txbuf, rxbuf, cnt, myEvent)
Where the event object was setup before the call.

So with this, you could setup a display driver, that does an Asynchronous update, by simply passing in the Event object to the transfer, and either have the event call back function set a flag saying you are done, or have the event object setup, where you can simply query it to see if it was triggered.

Would be great to get some form of this in here as well (either simple callback or…) But that would probably be pass 2…

KurtE · June 10, 2018, 4:23pm

Some updates. I was able to get DMA to work for TX on SPI4 I have pushed up to the forked branch up on github:

Then started working on adding the Transfer(txbuf, rxbuf, cnt).

So far it is implemented to check to see if DMA is enabled and use the DMA for TX only, otherwise use the standard HAL transfer. Running into some interesting stuff.

That is: with TX only: HAL_SPI_Transmit(_hspi, (uint8_t *)buf, count, 0xffff)
Works just as quickly as using the: drv_spi_start_dma_tx(_hspi, (uint8_t *)buf, count);
code and wait for the DMA to complete.

If you look at the output from Logic Analyzer:

You can see that see that a full screen update of my SSD1306 using a modified version of the Teensyview library took about .62ms to update the display 128x64 mono…

If I enable the DMA, It took more or less the same time as you can see from LA output:

But it introduces another issue. That is my display code makes the assumption that when the transfer returns, that the full transfer has completed and at that point it unasserts the CS signal.

That is not currently the case with the writeFast like code. It retuns when the DMA operation completes and there are no more data to startup a new DMA transfer. The issue is that the SPI hardware has not completed the actual output, so we drop the CS signal while it is still doing output. Looks like there are still two bytes being output when CS is unasserted…

I know on the Teensy I need to check a different status to know that the SPI transfer was fully complete. Wonder if we can get that status here as well?

Edit: Put in a fix to wait for SPI not busy before return…

 if((_hspi->Instance->SR & SPI_FLAG_BSY) == 0)

OpusK · June 11, 2018, 12:34pm

I can not give the correct answer because the original designer is not me, but I guess that the API was designed to be able to choose the Polling method and the DMA method.

I think that the SPI can design the API to use DMA only.

Cool!

I agree with your opinion. To provide a reliable SPI DMA API, ensure that no other SPI transfer requests are executed until the previous SPI transfer is complete.

I think we can check HAL_SPI_STATE_BUSY_TX_RX.
Is there a reason not to use the HAL_SPI_TransmitReceive_DMA() function?

KurtE · June 11, 2018, 1:08pm

Thanks @OpusK,

I will probably clean this up a little and bit and try out a few more things and then maybe extract into own branch to issue a Pull Request…

As you mentioned could/shoud probably use the HAL_SPI_STATE_BUSY_TX_RX function. It had it’s own timeout so I just extracted the relevant part.

I will try adding in DMA RX stuff, in order to try out a full DMA transfer (not just write) and see if there is any speed differences. Currently I am thinking of just simplify the synchronous transfer to use the NON DMA internal functions as there appears to be no real gain by using the DMA. This is true on the Teensy, but I believe is different on some other boards like Arduino DUE and maybe some of the ESP boards.

Where DMA can make it better is the ability to do the SPI Asynchronous. Example have a display attached to the OpenCR, that you wish to display some stuff while running the Turtlebot. With the Asynch update ability you could do some quick updates in memory and issue the update hopefully without impacting the main Turtlebot timing loops.

Not sure if it would make sense to update the IMU SPI code to work asynch as well? Looks like most reads are like 6 bytes each. Not sure if would make sense to combine…

Now back to playing!

OpusK · June 11, 2018, 2:24pm

Cool!
Thank you for your enthusiasm and contribution

I think there will be no problem to think.

However, since Turtlebot users use IMU values, I think we should update them after testing.

KurtE · June 12, 2018, 1:24am

I think there will be no problem to think.

However, since Turtlebot users use IMU values, I think we should update them after testing.

I totally agree that would need to be tested…

Today I was taking some time to try to understand how the DMA works with SpI in different options.

That it, we already have the code in place to do TX only… I was putting in the code to do RX only, but noticed in HAL_SPI_Receive_DMA, that if we are the master and we are configured as SPI_DIRECTION_2LINES than this function won’t directly work and then calls off to
HAL_SPI_TransmitReceive_DMA with the receive buffer used as transmit buffer as well, which is probably not what I would want.

I think what I would prefer it to update the HAL_SPI_TransmitReceive_DMA function to allow me to pass in NULL for the pTxData. Currently this would fail. What I think I would like to do, is to then use a static 1 byte buffer probably init to 0, I am then wondering if it would be valid to then update the tx DMA channel data to remove the DMA_MINC_ENABLE field and instead disable the increment… Again I did something similar on Teensy, and also had a function to set the actual transmit character.

Will try it out probably tomorrow.

KurtE · June 14, 2018, 6:04pm

@OpusK and others:

I have, more bits and pieces in place and I am starting some different testings…

For example some simple Arduino Test app, that I can watch the data coming out on Logic Analyzer.

#include <SPI.h>
#define SPIT SPI
#define CS_PIN 10
#define SMALL_TRANSFER_SIZE 128
#define BUFFER_SIZE (320*240*2) // size of ILI9341 display... 
uint8_t buffer[BUFFER_SIZE];

void setup() {
  pinMode(CS_PIN, OUTPUT);
  digitalWriteFast(CS_PIN, HIGH);
  while (!Serial && millis() < 4000) ;  // wait for Serial port
  Serial.begin(115200);
  SPIT.begin();
  Serial.println("SPI Test program");  

  Serial.print("Buffer size: ");
  Serial.print(sizeof(buffer), DEC);
  Serial.print(" ");
  Serial.println(BUFFER_SIZE, DEC);
}

void loop() {
  // put your main code here, to run repeatedly:
  while (Serial.read() != -1) ; // Make sure queue is empty.
  Serial.println("Press any key to run test");
  while (!Serial.available()) ; // will loop until it receives something
  while (Serial.read() != -1) ; // loop until queue is empty
  Serial.println("Ready to start tests");
  SPIT.beginTransaction(SPISettings(8000000, MSBFIRST, SPI_MODE0));
  Serial.println("After Begin Transaction");
  for (int i=0; i < BUFFER_SIZE; i++) buffer[i] = i & 0xff;  
  Serial.println("Transfer Small"); Serial.flush();
  digitalWriteFast(CS_PIN, LOW);
  SPIT.transfer(buffer, buffer, SMALL_TRANSFER_SIZE);
  digitalWriteFast(CS_PIN, HIGH);
  delay(5);

  for (int i=0; i < BUFFER_SIZE; i++) buffer[i] = i & 0xff;  
  Serial.println("write Small"); Serial.flush();
  digitalWriteFast(CS_PIN, LOW);
  SPIT.transfer(buffer, NULL, SMALL_TRANSFER_SIZE);
  digitalWriteFast(CS_PIN, HIGH);
  delay(5);

  for (int i=0; i < BUFFER_SIZE; i++) buffer[i] = i & 0xff;  
  Serial.println("read Small"); Serial.flush();
  digitalWriteFast(CS_PIN, LOW);
  SPIT.transfer(NULL, buffer, SMALL_TRANSFER_SIZE);
  digitalWriteFast(CS_PIN, HIGH);
  delay(5);

  SPI.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));
  for (int i=0; i < BUFFER_SIZE; i++) buffer[i] = i / 1024;  
  Serial.println("Transfer Full"); Serial.flush();
  digitalWriteFast(CS_PIN, LOW);
  SPIT.transfer(buffer, buffer, BUFFER_SIZE);
  digitalWriteFast(CS_PIN, HIGH);
  delay(5);

  for (int i=0; i < BUFFER_SIZE; i++) buffer[i] = i / 1024;  
  Serial.println("write full"); Serial.flush();
  digitalWriteFast(CS_PIN, LOW);
  SPIT.transfer(buffer, NULL, BUFFER_SIZE);
  digitalWriteFast(CS_PIN, HIGH);
  delay(5);

  for (int i=0; i < BUFFER_SIZE; i++) buffer[i] = i & 0xff;  
  Serial.println("read full"); Serial.flush();
  digitalWriteFast(CS_PIN, LOW);
  SPIT.transfer(NULL, buffer, BUFFER_SIZE);
  digitalWriteFast(CS_PIN, HIGH);
  delay(5);
  Serial.println("Tests completed");
  SPIT.endTransaction();  
}

The large buffer size is setup for the size of a ILI9341 display, when we use our own Frame buffer…

So far I have not added the Asynchronous calls as wanted to see if normal Synch outputs worked…

Turns out there are issues:
If you look at the simple Transfer function, it looks like:

void SPIClass::transfer(const void * buf, void * retbuf, size_t count) {
  if ((count == 0) || ((buf == NULL) && (retbuf == NULL))) return;    // nothing to do

  bool dma_enabled = drv_spi_dma_enabled(_hspi);
  if (retbuf == NULL) { 
    // write only transfer
      HAL_SPI_Transmit(_hspi, (uint8_t *)buf, count, 0xffff);
  } else if (buf == NULL) {
    // Read only buffer
    HAL_SPI_Receive(_hspi, (uint8_t*)retbuf, count, 0xffff);
  } else {
    // standard Read/write buffer transfer
    // start off without DMA support
    HAL_SPI_TransmitReceive(_hspi, (uint8_t *)buf, (uint8_t*)retbuf, count, 0xffff);

  }
}

Running into two main issues: The first is the call to HAL_SPI_Receive is not working at all
You can see some of it in the Logic Analyzer output:

The two little blips are where the receives are tried and nothing shows up on the SPI pins, we only see the CS pin change from my code.

EDIT: Program error on my side, fixed in above

The second issue is in the TX only or TXRX case we are not able to send a full large buffer. The HAL_SPI_Transmit… functions have their count fields as uint16_t size, while the apis are size_t or uint32_t.a

Two obvious possible fixes:
a) Have top level transfer function, iterate calling the HAL function with up to 64K at a time… Actually max is 64K-1…

b) Update HALL calls to uint32_t need to update hspi fields lit TxXferCount to be updated as well. Looks pretty straight forward.

I am leaning to b) as their DMA functions handle uint32_t size and they have the code to break up the underlying DMA transfers to the max size, but this is all handled internal to function.
Thoughts?

As for the Asynch (DMA) version of API, will test out. I know already the RX only mode diverts to the TXRX.

I am also trying to decide how best to handle the end of transfer timing…
As mentioned earlier the TX DMA event will finish before the transfer is complete. So if you do something like release CS or change Data/Command state at this time, it will likely screw up…

Fix for Synchronous is to wait until the SPI Busy flag is cleared and then you know it is done.

But for ASynch mode, not sure if this is best approach. That is you do the DMA so the processor is free to do other things, but then the code hangs for about two byte transfer time, before continuing. Which reduces the benefits for Asynch…

There are a couple of other options:
a) Try to setup an interrupt to happen that tests for busy state…

b) switch TX only to use TXRX, do like I suggested for RX only mode and maybe in this case have a dummy one byte receive buffer that does not increment, and then trigger everything on RX DMA complete, which will be after the last bytes were sent/received… I have done something like this on some teensy boards do to some similar issue. Semi-downside is 2nd dma channel will be in use…

Again thoughts… B) again if not difficult to set/clear the DINC flag is probably maybe the most robust…

KurtE · June 18, 2018, 3:02am

Quick update: I have a version working with the Aynch code using the MINC flags for either the RX or TX modes, which I think is working.

For example the TX only mode sets up the dummy RX buffer which does not increment. Wastes memory bust time, but took care of the issue with returning from call before the actual transfer was completed.

RX only, I am defaulting to sending 0… May change later to allow you to set.

Fixed a few things along the way, like found out that the clock divider is different on SPI4 than it is on SPI2… I believe SPI2 and 3 have max speed of 25mhzwhere the SPI1, 4, 5 can go twice as fast 50mhz, do dividers are different…

So got some of it working. Tried on my updated version of Sparkfun’s Teensyview driver and got it to work, then setup to use two of these displays with one on each buss. Took a bit to get to work,
Note: these displays are SSD1306 SPI displays, I have some other ones from Amazon/Ebay that have double the display…

Again mostly working, but my Test app, hangs one display randomly which I still am debugging. Not sure if some interrupt is being swallowed…

But when it is working properly, it is nice to see both SPI busses working concurrently.

As you can see in the logic analyzer output. The top 4 signals are the MosI, SCK, CS, DC) of the first display ON SPI. The bottom 4 signals re for the new SPI buss (SPI4) again with same signals. These are running at 6.7mhz

If anyone wishes to try some of this out, I updated my OpenCR branch/fork, There is a link earlier in this thread.

Also pushed up an updated version of my Teensyview code:

Can put up my test program if interested… Looks like upload does not allow zip files or the like?

Kurt

OpusK · June 18, 2018, 11:38am

You can upload freely!
If you can not upload a .zip file, try using .7z or modifying the extension name like .zi

KurtE · June 18, 2018, 12:51pm

Maybe I am missing something in the forums User Interface, but the upload button only appears to allow Jpeg, PNG, gif, and pdf… Again could probably rename to one of those…

But for now I simply put the test program as an example under the Sparkfun_TeensyView_Arduino_Library github project… So it is up there now.

Next up, Will create a new branch, with just the SPI stuff in it… Probably reduce out of it some of the intermediate stuff. Example I set up functions to do Reads, but now I bypass that…

Edit I pushed up a branch with mostly just SPI changes. I did include a simple change to print.h/cpp as well that removed warning messages if you did something like: Serial.printf(“My text”); It would complain about converting String to char*, which is easily fixed by changing the functions input parameter to be: cont char *

Again as I mentioned in Pull Request, there are several parts to the change, that you may want to look at and decide if you want or not. Things like the Asynchronous transfer function, which currently is setup to use the Teensy proposed way of using another object EventResonponder… I brought over a very reduced version of it which only allows call Immediate as that is what my Test programs used. On Teensy there are options to call duing idle (yield), or timer… Could go to simple callback function and/or query functions…

Also was not sure what to call the new SPI object so I named it SPI_EXT (for external connector)… Wish we could simply use SPI, SPI1, SPI2… But SPI1/SPI2 are the internal hardware names…

KurtE · June 29, 2018, 12:59pm

I thought I would mention, that the current stuff has been merged into the OpenCR develop branch

So done with SPI updates for now!