Access failures with Ukrainian hosting. How was it?
It was «amazing». For example, infrastructure performance would drop for 1-3 minutes and then quickly recover. This seemingly small delay snowballed into longer temporary delays and seriously disrupted our business processes.
In this specific situation the problem looked like this: the failures that lasted «only 3 minutes» created a huge queue to our database, the 1C system broke through the lock during those 3 minutes and reset all the documents in our divisions and stores. Let’s just say we weren’t too happy with that outcome.
For an individual employee, the process resembled a thriller. Imagine that a manager downloads several waybills and the failure happens; the manager has no idea whether the transaction went through or not. Just like his colleagues, the manager now has to delete all downloaded transactions and enter the waybills manually. It was a boring, irritating and useless process that seemed to last forever. Our monthly turnover was about UAH 50 000 000, and because of these failures the company could have lost up to 10%, i.e. about UAH 5 000 000 per month!
I’m not comfortable with generating bad press, which is why I won’t name this hosting provider.
Our peak load is approximately 700 documents downloaded as packages in the morning, from 9:30 to 11 a.m., and the same amount in the afternoon, from 3 to 5 p.m. That’s approximately 15 000 documents daily. The documents were all downloaded at the same time and went through different kinds of processing: checking a single waybill may involve 20 (!) positions.
We should mention the seasonal pattern of these failures with our former Ukrainian provider: as a rule, they happened at the beginning and the end of the month (around 10 days), in the middle of the month (when everybody submits documents to the Pension Fund) and on the 20th (tax document period)”.
Comments from SIM-Networks:
Such delays usually happen when a server is designed for, say, 5 clients but the provider places 10 customers on it, naively hoping that they won’t use all of the capacity they have already paid for. We believe that it does not matter whether clients use the capacity they lease: as long as they keep paying, this capacity is theirs, and theirs only, to do with as they please.
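The overselling described above can be put into numbers. The following is a minimal Python sketch, not any provider's actual capacity model: the function names `oversubscription` and `is_overloaded`, and the assumption that every customer uses the same fraction of its paid share, are illustrative only. The figures 5 and 10 come from the example above.

```python
def oversubscription(designed_clients: int, placed_customers: int) -> float:
    """Ratio of capacity sold to capacity the server was designed for."""
    if designed_clients <= 0:
        raise ValueError("designed_clients must be positive")
    return placed_customers / designed_clients


def is_overloaded(designed_clients: int, placed_customers: int, usage: float) -> bool:
    """True when total demand exceeds what the server was designed for.

    usage is the (assumed uniform) fraction of its paid share that each
    customer is using at the moment, between 0.0 and 1.0.
    """
    return placed_customers * usage > designed_clients


ratio = oversubscription(5, 10)
print(f"Oversubscription ratio: {ratio:.1f}x")

# Hypothetical usage levels, chosen only to show where contention starts.
for usage in (0.3, 0.5, 0.8, 1.0):
    state = "overloaded" if is_overloaded(5, 10, usage) else "ok"
    print(f"each customer at {usage:.0%} of paid capacity: {state}")
```

Under these assumptions the server copes only while customers use, on average, no more than half of what they pay for; as soon as several neighbors approach their full share at once, everyone on the server is affected.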
The customer continues, «We were baffled that the provider did not foresee that clients might use all of their capacity at the same time. As a result, the resources that the provider guaranteed, and that we ordered and paid for, were not provided! There were no failures when our neighbors weren’t using their capacity.
We spent a lot of time proving to the provider that the issue was on their end: we set up monitoring and wrote letters every time a failure happened. Only after a year of this nightmare did the provider agree to buy an SSD-rackmount for their data center, «specially for us». It was still too early to be optimistic.
The migration of our data to this rackmount was handled terribly. The provider promised to transfer the database within a day, over the weekend. By dinnertime on Sunday, the restructuring of the new rackmount had still not been finished, so we asked the provider to move the data back to where it had been. The provider insisted that the system was up and told us that they had deleted the old database: they assumed the migration was complete and saw no reason to keep storing it. We were utterly shocked. In other words, they had destroyed our old database before the new SSD-rackmount was even restructured, and we still had to launch our entire retail network.
Faced with this emergency, we restored all systems from backups. The full restructuring of our «new» SSD-rackmount ended up taking an entire week. During this time, we had to work on our backup infrastructure, which was significantly slower. We don’t need to tell you what a week-long standstill means for an online retailer.
We were ready to rent physical equipment from this hosting provider. They informed us that we’d need approximately USD 85 000 and suggested that we buy the equipment ourselves and install it at their data center. Naturally, we weren’t interested.
In search of a solution, we started testing various hosting providers, both domestic and European, and saw the difference.
It took us 16 minutes to download 100 waybills on our own servers. With 5 employees downloading 100 waybills at once, those 16 minutes stretched to between 40 minutes and 1 hour, and the whole job took up to 2 hours.
On the SSD-based hosting of our previous domestic provider, the same process took 9 minutes.
With SSD-based hosting in a European data center, everything was downloaded in about 4 minutes. The main task we were trying to solve was eliminating the queues to the database, and the capacities of SIM-Networks helped us achieve that.
After gaining this unsuccessful but extremely useful experience, we decided to rent physical infrastructure instead of virtual capacities from a European provider. We first transferred the essential 1C services to the new infrastructure. After the initial success, we rented 2 additional reserve servers and transferred our cluster, which included a file server, Microsoft Exchange, etc. In total, we rented several servers and a rackmount in the data center.
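The three timings quoted above are easier to compare as time per waybill and as a speedup over the starting point. The snippet below is a small illustrative calculation in Python: the labels and helper names are hypothetical, and only the 100-waybill batch size and the 16, 9 and 4 minute figures come from the text.

```python
WAYBILLS = 100  # batch size used in the tests described above

# Minutes needed to download the batch, as quoted in the text.
timings_min = {
    "own servers": 16,
    "previous domestic provider (SSD)": 9,
    "European data center (SSD)": 4,
}


def seconds_per_waybill(minutes: float, batch: int = WAYBILLS) -> float:
    """Average time spent on one waybill, in seconds."""
    return minutes * 60 / batch


def speedup(baseline_min: float, minutes: float) -> float:
    """How many times faster a setup is than the baseline."""
    return baseline_min / minutes


baseline = timings_min["own servers"]
for name, minutes in timings_min.items():
    print(
        f"{name}: {seconds_per_waybill(minutes):.1f} s per waybill, "
        f"{speedup(baseline, minutes):.2f}x vs own servers"
    )
```

This compares single-batch timings only; it does not model the queueing seen when several employees download waybills at the same time.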