We had an interesting workflow issue the other day. For some inexplicable reason our state machine workflow was terminated. After digging around in the database trying to reconstruct what might have happened we finally figured it out:
We have a transaction scope activity which sends an update to the database. The update failed and the exception caused the fault handler on the activity to kick in. The fault handler changed the state of the state machine to a custom state called 'TechnicalError'. We explicitly modeled this state because it allows our administrators to recover a workflow from a technical error and essentially restart the workflow. In the InitializeState of the 'TechnicalError' state we wanted to update some data in the database. This also failed, since the cause of the original error was that we had lost connectivity to our database.
Next?
Since we had no fault handler on this database action, the workflow crashed. Because connectivity to our workflow persistence database was also lost, we now had a situation where the workflow in memory was inconsistent with the data in our line of business application database and also inconsistent with the last persisted state in the workflow persistence database.
The workflow runtime itself never crashed. Several minutes later the workflow persistence database came back online and the in-memory state of the workflow (which was terminated) was synced with the workflow persistence database. However, our line of business database was never updated. The timestamps on the updates in the workflow persistence database were minutes apart from the last updates in the line of business database, which made it hard to reconstruct what had happened.
Solution?
We now have a fault handler on the InitializeState of the 'TechnicalError' state. If either the workflow persistence database or the line of business application database is unavailable, a delay is introduced and the workflow then retries the transition to the 'TechnicalError' state. This way the workflow should never terminate because of lost database connectivity. The only scenario left is where the machine running the workflow is turned off. If this happens, the workflow will recover from its last save point and life should be good.
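The retry idea can be sketched outside of the workflow designer. The snippet below is plain Python, not Workflow Foundation code, and every name in it (DatabaseUnavailable, enter_technical_error, initialize_state) is made up for illustration. It only shows the shape of the logic: try the initialization, and if the database is unreachable, wait and enter the state again.

```python
import time


class DatabaseUnavailable(Exception):
    """Hypothetical error raised when a database call cannot connect."""


def enter_technical_error(initialize_state, delay_seconds=30.0, sleep=time.sleep):
    """Retry the 'TechnicalError' initialization until it succeeds.

    initialize_state: callable standing in for the database update done
                      when the state is entered (an assumption).
    sleep:            injectable so the delay can be replaced in tests.
    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            initialize_state()
            return attempts
        except DatabaseUnavailable:
            # Plays the role of the fault handler: wait, then try to
            # enter the state again instead of letting the error escape.
            sleep(delay_seconds)
```

Only the connectivity error is caught here on purpose. Anything else still surfaces, so a genuine bug in the initialization does not turn into an endless retry loop.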