Over the past few months I have lost valuable statistical data that I can no longer access. As time goes on the statistics become more and more important, but HA does very little to help when things go wrong. What I would like to see:
- More robustness in the database design to prevent failures from corrupting the database and, when corruption does occur, more effort by HA to recover and restore the data.
- There needs to be a repair alert when the database has an issue. At present a corrupt database is simply renamed and a new one created. The error does show up in the log, but unless you look there every day it's easily missed, and if you restart it's lost altogether.
- When the database is renamed and a new one created, HA needs to give users an easy-to-use method of merging a backup of the database into the live database, or of recovering data from the renamed “corrupt” database and merging it into the new live version.
Worst case, if I knew the database had been corrupted when the system started, I could restore from backup. I might lose a day of data, but that’s better than losing months of statistics.
A better case would be the ability to restore up to the last write from a database log.
Ideal would be for HA, in the event of failure, to offer recovery from backup and data recovery during startup, rather than just renaming the database and continuing.
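
To illustrate the kind of check I have in mind, here is a rough sketch of something that could be run by hand or from a scheduled script. It assumes the default SQLite recorder database and only uses Python's standard `sqlite3` module with SQLite's `PRAGMA integrity_check`. The function name `check_sqlite_integrity` is my own and is not part of HA, and this is not how HA itself detects corruption.

```python
import sqlite3
from pathlib import Path


def check_sqlite_integrity(db_path):
    """Return a list of problems found; an empty list means the file looks healthy.

    Hypothetical helper for illustration only, not part of Home Assistant.
    """
    # Open read-only so the check itself cannot modify the file.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as err:
        return [f"cannot open database: {err}"]
    try:
        # SQLite returns a single row containing "ok" when no problems are found.
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as err:
        return [f"integrity check failed: {err}"]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    return [] if messages == ["ok"] else messages
```

Pointed at a copy of the database file (by default `home-assistant_v2.db` in the config directory, as far as I know), a non-empty result would be the moment to raise an alert and reach for a backup, before more data is written to a fresh, empty database. Running it against a copy or a backup is safer than checking the live file while HA is writing to it.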