OTA firmware upgrades over LTE?



  • Hi all,

    I've got the ESP32 Basic Core kit and the COM.LTE module. Just wondering if anyone has had any luck getting OTA firmware upgrades working over the LTE modem, where the ESP32 pulls the upgrade from a web server. All the libraries I have found are for upgrading over Wi-Fi. I have tried various guides, but none of them worked, and most of the others are for pushing an update from a PC to the board.

    Any help would be most appreciated!

    Thanks!!



  • Hello @aezero

    the linked example works for me. It is using TinyGSM library which supports multiple modems. I've tested it with a SIM7080 modem and CAT-M1.

    Thanks
    Felix



  • @felmue Thank you so much!!! I wish I had gotten some kind of notification when you posted this! Otherwise I wouldn't have burned an entire working week failing to get other people's libraries to work!

    I loaded up your sketch and tweaked it to my modem (SIM7600) and it flawlessly downloaded a bin file and flashed it. Curiously, when I merged it into my sketch, I get the output below when trying to download the same bin file. I wonder if maybe my sketch takes up too much memory and it can't store it or something.

    UPDATE 1: I did a packet capture on the web server. I can see the server transmitting the entire file, and I can see the ESP32 acknowledging the entire transmission. It's not disconnecting while downloading, so it must be something from when it goes from memory into flash. The "actually read" value does fluctuate, but is always in the range of 120000-160000.

    UPDATE 2: I just noticed the duration value is always 59s +/- 0.5. With your original version, it is always 35s.

    UPDATE 3: This 59s thing caught my attention. I tried switching the SIM over to another APN. The original APN uses Carrier-Grade NAT and the new one gives the modem a real public IP without any NAT. Now the new firmware downloads successfully (after around 105s). I'm guessing the 59s is actually the CGNAT timeout. That being said, I need this thing to work behind CGNAT for a few reasons. So I guess this now boils down to two questions:

    • Why is my sketch taking so much longer? It's basically your script pasted into mine almost verbatim, yet it takes exactly three times as long (about 105s versus 35s), curiously...
    • Is there any way to decouple the flashing process from the LTE connection? I could see in my packet capture that the download completed in under one second. I'm assuming that data is stored in some sort of buffer on the modem itself. Is there a way to do something like disconnecting from the LTE but keeping the data in the buffer and letting the flash happen at its own pace? Or how do I keep the NAT translation on the LTE connection alive?

    Any ideas?

    Content-length: 233040
    Update begin ok
    Reading response data

    0.00% 7.69% 15.38% 23.08% 30.77% 38.46% 46.15% 53.85% 55.73%
    Error Occurred. Error #: 12

    Content-Length: 233040
    Actually read: 129867
    Duration: 58.87s
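
    A quick sanity check on the numbers in that log, as a rough sketch (the helper names are made up for illustration). At the average rate seen before the error, the full image would need roughly 106 s, which lines up with the ~105 s download on the public-IP APN. So the write rate looks about the same in both cases; the transfer just gets cut off near the 59 s mark.

```cpp
#include <cstdio>

// Average throughput in bytes per second.
double bytesPerSecond(double bytes, double seconds) {
    return bytes / seconds;
}

// Time needed to move `totalBytes` at a constant `rate` (bytes per second).
double secondsNeeded(double totalBytes, double rate) {
    return totalBytes / rate;
}

// Prints the estimate using the values from the log above.
void printEstimate() {
    const double rate = bytesPerSecond(129867.0, 58.87);  // "Actually read" / "Duration"
    const double full = secondsNeeded(233040.0, rate);    // Content-Length at that rate
    std::printf("rate: %.0f B/s, full image: %.1f s\n", rate, full);
}
```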



  • Hello @aezero

    hmm, not sure why your sketch is taking longer (unless the firmware is bigger).

    I think if you want to decouple the flash from the download process you'd need to look into examples using SPIFFS. However in my opinion this is kind of redundant as SPIFFS is in flash as well. So the download would go into SPIFFS first and then copy from flash to flash. (Unless I am missing something.)

    When I check with tcpdump on my server I see it reading multiple small chunks over the period of the download (about 37s). In other words I don't think there is a buffer on the way or in the modem in my case.

    Are you using the default modem.restart();? I found it can take quite some time for the modem to be ready after that. Plus the modem sometimes sends some unsolicited messages (e.g. +CPIN: READY, SMS DONE, PB DONE) as well which I think might interfere with the download.

    You could try using modem.init(); instead and / or add some sort of delay / check if the modem is ready or not. I've added the following line modem.testAT(30000); before String modemInfo = modem.getModemInfo(); which helped the download when using a SIM7600G.
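
    Roughly what I mean, as a minimal sketch. prepareModem is a made-up helper name and Modem stands for whatever TinyGSM modem object your sketch uses; init() and testAT() are the calls mentioned above. It is written as a template only so it doesn't depend on a particular modem type.

```cpp
#include <stdint.h>

// Sketch only: `Modem` is any object offering init() and testAT(timeout_ms),
// as the TinyGSM modem classes do.
template <typename Modem>
bool prepareModem(Modem& modem, uint32_t readyTimeoutMs = 30000) {
    // Use init() rather than restart(), so the modem is not reset first.
    if (!modem.init()) {
        return false;
    }
    // Wait (up to the timeout) until the modem answers AT commands,
    // before asking it for anything else.
    return modem.testAT(readyTimeoutMs);
}
```

    In the sketch this would be called once, before modem.getModemInfo().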

    BTW: from what I can tell my setup uses CGNAT IP starts with 10.x.x.x.


    Thanks
    Felix

    P.S. github example updated



  • @felmue
    Hi Felix!

    My setup differs from yours in that I initialize the modem, do a post of data from the onboard sensors which includes the version number, then the server returns a message to it to indicate that it needs to download a firmware update. That's where I have your code run within a function instead of the main loop. So the modem already has been initialized and has working data connectivity.

    I've tried a number of things over the weekend to no avail, including:

    • Changing TINY_GSM_RX_BUFFER to very low values
    • Reinitializing the modem within my updateFirmware function (including this step makes it almost your code verbatim)
    • Removing the lte.connected() and lte.available() from the while loops

    I forgot to mention, I'm downloading the same replacement firmware when I use your code and mine (it just says "It worked!" on the console), so it won't contribute any differences.
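
    For reference, the read loop I keep poking at has roughly this shape. This is a simplified sketch, not Felix's actual code: streamToSink and its parameters are made-up names, Client is anything with available(), read() and connected() like the LTE client, Sink is anything with write() like the updater, and nowMs would be millis() on the board.

```cpp
#include <stddef.h>
#include <stdint.h>

// Copies up to contentLength bytes from client to sink in small chunks.
// Stops early if the client disconnects, a read or write fails, or no data
// arrives for idleTimeoutMs. Returns the number of bytes actually written.
template <typename Client, typename Sink, typename Clock>
size_t streamToSink(Client& client, Sink& sink, size_t contentLength,
                    uint32_t idleTimeoutMs, Clock nowMs) {
    uint8_t buf[512];
    size_t total = 0;
    uint32_t lastDataMs = nowMs();

    while (total < contentLength) {
        int avail = client.available();
        if (avail > 0) {
            // Never read more than the buffer holds or the body still needs.
            size_t want = static_cast<size_t>(avail);
            if (want > sizeof(buf)) want = sizeof(buf);
            if (want > contentLength - total) want = contentLength - total;

            int got = client.read(buf, want);
            if (got <= 0) break;
            size_t n = static_cast<size_t>(got);
            if (sink.write(buf, n) != n) break;  // flash write failed
            total += n;
            lastDataMs = nowMs();
        } else if (!client.connected()) {
            break;  // peer closed the connection and nothing is left to read
        } else if (nowMs() - lastDataMs > idleTimeoutMs) {
            break;  // connection is up but has gone quiet for too long
        }
    }
    return total;
}
```

    Comparing the return value with the Content-Length gives the "Actually read" check from the log.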



  • Hello @aezero

    just to be clear, with your setup, the firmware update always fails, correct?

    • Does your code have tasks (other than the mandatory loop())?
    • Is your power source stable?

    Thanks
    Felix



  • Hi Felix,

    That's correct, it fails every time. When tinkering with it yesterday, I had a couple of runs that actually got past the 60 second timeout, but I couldn't pin it down to a specific reason. I started digging around online about TinyGSM. They even have this on their main GitHub page:

    "If you are able to open a TCP connection but have the connection close before receiving data, try adding a keep-alive header to your request. Some modules (ie, the SIM7000 in SSL mode) will immediately throw away any un-read data when the remote server closes the connection - sometimes without even giving a notification that data arrived in the first place. When using MQTT, to keep a continuous connection you may need to reduce your keep-alive interval (PINGREQ/PINGRESP)."

    That does somewhat align with what I'm seeing. I added a .htaccess file onto the server to enable keep-alive with a pretty long interval, but it doesn't seem to have helped.
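
    On the request side, the header that quote refers to would look something like this. It's only a sketch: buildFirmwareRequest is a made-up helper, and the host and path are whatever your server uses. On the board the string would be sent with the client's print().

```cpp
#include <string>

// Builds a minimal HTTP/1.1 GET request that explicitly asks the server
// to keep the connection open.
std::string buildFirmwareRequest(const std::string& host, const std::string& path) {
    std::string req;
    req += "GET " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Connection: keep-alive\r\n";
    req += "\r\n";  // blank line ends the header section
    return req;
}
```

    Note that this is an HTTP-level setting only. It doesn't put any packets on the wire while the connection sits idle, which may be why it made no difference here.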

    Regarding tasks, I do have it doing a few things, which I had completely working before I started on this OTA endeavour. It is an M5Stack Core with a display, an ENV III sensor, and a battery.

    • In setup():
      • After powering on, it checks if it is running on battery (to alert me about power outages)
      • Initialize the ENV III sensor
      • Initialize the screen
      • Initialize the modem
    • In loop():
      • Update the screen
      • Get temperature/humidity readings
      • It calls a function to submit the results to my server via HTTP
      • Handle any error that the server returns, which includes it needing a firmware update. It then calls a function that has your code.
      • Then there is a while loop that runs for five minutes to watch for the buttons to be pressed.

    In regards to power, the same failure occurs whether it is running on the internal battery or an external charger.

    I think my priority here is to find a way to make the connection stay up through the 60 second barrier that seems to be imposed by the CGNAT. I am planning on deploying a number of these units, and many of the locations are rural with very poor cellular coverage, so I am expecting slow transfers. I just picked up a SIM from another carrier, which I am going to test to see if their CGNAT behaves differently.

    Thanks for all your guidance!

    UPDATE: Different carrier didn't have an issue with their CGNAT. I'm thinking it's because that one is a consumer service, and the one that's giving me problems is M2M specifically.



  • Hi @felmue!

    Good news! I got this working using the most ridiculous solution. My 220KiB test firmware would write in about 105 seconds, so approximately 2KiB/sec. I enabled mod_ratelimit on Apache and gave a 2KiB/sec rate limit to my firmware directory.

    <IfModule ratelimit_module>
    SetOutputFilter RATE_LIMIT
    SetEnv rate-limit 2
    </IfModule>

    Now the firmware gets written at roughly the same rate at which it downloads, keeping the TCP session active the whole time, instead of the "burst then die while idle" connection behavior from before.
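
    For anyone picking a value for their own setup, here is the arithmetic as a small sketch. rateLimitKiBps is a made-up helper name; mod_ratelimit takes the rate-limit value in KiB/s, so the idea is to round the measured write rate down to a whole number of KiB/s.

```cpp
#include <stdint.h>

// Largest whole KiB/s value that does not exceed the measured write rate
// (imageBytes written in writeSeconds). Returns 0 if the duration is unknown,
// and at least 1 otherwise.
uint32_t rateLimitKiBps(uint32_t imageBytes, uint32_t writeSeconds) {
    if (writeSeconds == 0) {
        return 0;
    }
    uint32_t bytesPerSec = imageBytes / writeSeconds;
    uint32_t kib = bytesPerSec / 1024;
    return kib > 0 ? kib : 1;
}
```

    With my numbers, rateLimitKiBps(233040, 105) gives 2.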

    Thank you for all your help!



  • Hello @aezero

    I am glad to hear you found a solution. And thank you for sharing.

    BTW: for me a rate limit of 10 reduces the download time (from about 70 s to 55 s).

    Thanks
    Felix