Some bots went offline today, which led to web account outages. The cause was a hard drive failure on one of our bot servers. The affected bots were quickly moved to a nearby server… which then crashed under the extra load, too. This was unexpected.
The good news is that while our server team was trying to reanimate these machines, the development team managed to develop a system that allows SmartBots to quickly detach (or, say, temporarily forget) any set of broken bots. This will also help during SL rolling restarts, when large random sets of bots go offline.
We are sorry if your bots were offline today. Everything is running smoothly now!
P.S. We are going to completely replace the second, unreliable server tomorrow. All bots are expected to stay online during the migration.