Recently I have created some great SharePoint flows using Microsoft Flow. One problem I have been battling with, however, is the 429 (Too Many Requests) error that occurs when your SharePoint Online environment is struggling to keep up with the requests and starts throttling them.
The 429 error can occur in different places during your process. When one of the SharePoint connector actions gives you this error, you can handle it, for example by sending an email to a user. But when your trigger gets this error, you are in trouble.
Until recently, the only place where I could find the 429 trigger failures was the analytics. But you can also find them in the run history, under Failed checks.
OK, this is great, but there is still no way to recover the failed runs.
Therefore I went looking for options to make my flows more reliable. At this stage I don’t care about failures inside my flow; I’m just looking at the failures at flow startup.
Like actions, flow triggers also have a Retry Policy. By default Flow will retry 4 times at an exponential interval. For my triggers I changed this to 20 retries at an interval of 20 seconds. That adds up to roughly 400 seconds of retrying (a little under 7 minutes), which should cover the throttling recovery time. I’m not saying that this 20/20 option is the best possible configuration, but it does seem to do the trick!
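To make the arithmetic behind the 20/20 setting explicit, here is a minimal Python sketch. The FixedRetryPolicy class is a hypothetical model of my own, not part of any Flow API, and it assumes the 20 seconds is a fixed wait before each retry and that the requests themselves take no time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FixedRetryPolicy:
    """Hypothetical model of a fixed-interval retry policy (not a Flow API)."""

    count: int             # number of retries after the first failed attempt
    interval_seconds: int  # assumed fixed wait before each retry

    def retry_offsets(self) -> List[int]:
        """Seconds after the first failure at which each retry would fire."""
        return [self.interval_seconds * n for n in range(1, self.count + 1)]

    def coverage_seconds(self) -> int:
        """Total time spent waiting before the policy gives up."""
        return self.count * self.interval_seconds


# The 20 retries at 20 seconds described above.
policy = FixedRetryPolicy(count=20, interval_seconds=20)
print(policy.coverage_seconds())   # total seconds covered by the retries
print(policy.retry_offsets()[:3])  # when the first three retries fire
```

Changing count or interval_seconds shows quickly how long a trigger would keep trying before it finally ends up under Failed checks.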
Should we now set the retry policy on every action?
I don’t think that would be a good idea! I’m quite happy for my flows to fail when the system is struggling. Then I can at least inform my users from within my flow. Pushing more retries at the system might make it harder for it to recover. Exiting the flow quickly in that case may not be a bad idea.
Did you know that you can check the SharePoint Online health score with PnP PowerShell? Simply use Get-PnPHealthScore to check your score. The score runs from 0 (healthy) to 10, so this way you can check how close you are to the worst score of 10, at which point the system will reject any interaction until it has recovered.
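If you would rather look at the score outside PowerShell, the same idea can be sketched in Python. SharePoint responses normally carry an X-SharePointHealthScore header, and I am assuming here that this is the value the cmdlet reports. The function names and the back-off threshold of 7 are my own illustrative choices, not anything SharePoint or PnP prescribes.

```python
from typing import Mapping, Optional

HEALTH_HEADER = "X-SharePointHealthScore"  # assumed source of the score
WORST_SCORE = 10                           # 0 is healthy, 10 is the worst


def read_health_score(headers: Mapping[str, str]) -> Optional[int]:
    """Return the health score from response headers, or None if absent or invalid."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == HEALTH_HEADER.lower():
            try:
                score = int(value.strip())
            except ValueError:
                return None
            return score if 0 <= score <= WORST_SCORE else None
    return None


def should_back_off(score: Optional[int], threshold: int = 7) -> bool:
    """Hypothetical rule: stop sending requests once the score reaches the threshold."""
    return score is not None and score >= threshold


# Example with headers as they might come back from any HTTP client.
sample_headers = {"Content-Type": "text/html", "X-SharePointHealthScore": "8"}
score = read_health_score(sample_headers)
print(score, should_back_off(score))
```

The headers can come from whatever HTTP client you already use. The sketch only needs a plain mapping of header names to values, so it does not make any request to SharePoint itself.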