whoami
I’m an ICT engineer and have been working for 4 years at a nice small biopharma company in Switzerland with lots of really smart people and, most importantly, an awesome IT team :). I’m passionate about software engineering and cybersecurity.
The problem
At the end of 2023, our backup system detected an issue with one of our servers, and as a result the backup couldn’t be completed. That server hosts an MS SQL database that retrieves and stores data from desktop clients across our labs. Those clients control complicated instruments that run complex analyses (not relevant for us cool kids). An important thing to note is that this server has a very low tolerance for downtime: if the desktop client cannot send the results to the database server after a run, all the data is lost (maybe bad software design, I don’t know…), and because we are talking about cells and biology stuff, each run counts.
After opening the Event Viewer in Windows, these were the errors:
As a quick fix, we started using the MS SQL backup system to dump the database (don’t judge, sometimes there are just too many things to do). It worked for a while, but eventually a user told the team that some analyses were not accessible anymore.
So the hard drive has a bad block. Pretty scary, but to fix things, it is often useful to know what broke them.
Investigation
Lead 1 - EDR (it’s always the AV’s fault, right?)
Because we had just finished configuring and deploying our new Endpoint Detection and Response (EDR) system a week before, I jumped to the conclusion that the problem was probably due to the EDR agent analyzing / disturbing the backup process too much while the backup was being made. So the pretty straightforward thing to do was to disable the agent and try the backup again, right? Guess what, it didn’t work! Then I thought, OK, let’s completely uninstall the EDR agent. That also didn’t work. At that moment, I realized I was in for a ride.