MeTa

Unexpected downtime (MAJOR INCIDENT)


Dear MU-tizens,

  OVH (our hosting provider) is having serious problems with its data centers.

OVH Quote:

Quote
March 10, 2021 3:28AM UTC
[Identified] We are currently facing a major incident in our DataCenter of Strasbourg with a fire declared in the building SBG2. Firefighters were immediately on the scene but could not control the fire in SBG2. The whole site has been isolated, which impacts all our services on SBG1, SBG2, SBG3 and SBG4. If your production is in Strasbourg, we recommend to activate your Disaster Recovery Plan. All our teams are fully mobilized along with the firefighters. We will keep you updated as more information becomes available.

Our servers are in the SBG3 building.


This is a very unfortunate situation 

I surely hope your contingency plan works @MeTa

Let's hope for the best!

  • Like 1

- Artagosul

 

 

Quote

Update 10am. We have finished shutting down the UPS in SBG3; they are now off. We are looking to enter SBG3 and check the servers. The goal is to create a plan to restart at least SBG3/SBG4, and maybe SBG1. To do so, we need to check the network rooms too.

 

Quote

Update 11:20am. All servers in SBG3 are okay. They are off, but not impacted. We are creating a plan for how to restart them and connect them to the network. No ETA. Now we will verify SBG1.

 

Quote

Update 1PM:

Plan for the next 1-2 weeks:

1) rebuilding 20 kV for SBG3

2) rebuilding 240 V in SBG1/SBG4

3) verifying DWDM/routers/switches in network room A (SBG1), checking the Paris/Frankfurt fibers

4) rebuilding network room B (in SBG5), checking the Paris/Frankfurt fibers

 


That is the plan for rebuilding the permanent installations. In the short term they will bring in generators to power up the unaffected buildings once they have finished checking all the buildings.

 

Quote

Update 4pm:

We plan to restart SBG1+SBG4+the network by Monday, March 15 and SBG3 by Friday, March 19.

 


Sorry to say it, but if they stick to deadlines the way @MeTa does with updates ...

 

No offense... it's just a joke... or not 😂😂😂

  • Haha 1

2 hours ago, Mandarin said:

Sorry to say it, but if they stick to deadlines the way @MeTa does with updates ...

 

No offense... it's just a joke... or not 😂😂😂

You have to understand that, besides administering the server and "working" on it, @MeTa almost certainly also has a day job to show up for.

As for the company in question, this is its purpose: storing data. Besides the physical fire, there is almost certainly also a psychological fire at that company to resolve this as quickly as possible (reboot and stuff), so a fast resolution is in their interest too, because otherwise they lose customers or face other legal and financial problems.

Meanwhile it has started snowing, so get the Christmas tree back out and decorate it, put on a carol, have some wine, and wait for the fix. It's not as if we could do anything ourselves.

 

  • Like 2

- Artagosul

 

 

Quote

 

[Monitoring] Summary:
• At 00:47 CET on Wednesday 10 March 2021, a fire broke out in a room at one of our four OVHcloud data centers in Strasbourg (SBG2).
• The fire was contained by the early hours of the morning.
• There are no injuries.
• The fire mostly destroyed the SBG2 data center and partially damaged the SBG1 data center (4 of the 12 rooms destroyed). The two other OVHcloud data centers in Strasbourg were not affected by the fire; the SBG3 and SBG4 servers are currently switched off but undamaged.
• The cause of the fire has yet to be established and an investigation has been launched as mandated by the authorities.

Actions taken by OVHcloud:
• The technical and commercial teams have been working since this morning to inform our customers and handle the unavailability of our Strasbourg site.
• The company’s founder, Octave Klaba, has been on site since this morning with the industrial and technical teams.
• To help us handle customer requests we recommend following real-time updates posted on our US Status page or by opening a support request where you have been impacted by replying to this email directly.

Our three priorities are as follows:
   1. Reserve infrastructures at our other data centers for our affected customers: we have a stock of new servers at the Roubaix and Gravelines sites, ready to be delivered to the majority of affected customers. We will further enhance availability in these data centers, with the production of nearly 10,000 new servers in the coming weeks. Affected customers will be notified about this process as soon as possible.
   2. Secure the site now that we have regained access, clean it up, and reconnect the electricity and the network for the three affected data centers.
   3. Continue to assess the impact on our customers’ servers at the affected data centers, in order to find the best solutions.

We are doing everything we can to ensure a continuity of service to our customers:
• We are working on a plan to relaunch the two unaffected data centers (SBG3 and SBG4), the partially affected data center (SBG1), as well as our network, as quickly as possible.
• We ask that our customers exercise caution around the emails they receive: in times of crisis, it is common for malicious activity (phishing, spam, etc.) to increase. It is more important than ever to stay alert.

Impact on our operation:
• We are continuing to assess the impact of this incident, particularly for the customers whose data was located in the data center destroyed by the fire.
• All of our services in our other France-based data centers and across the world (including 15 data centers in Europe and two in the US) are fully operational. Our mission is to provide our customers with the highest quality of services to support their online activities and we know how important this is to them.

We sincerely apologize for the issues caused by this fire. We will continue to communicate with the greatest transparency about the cause of the fire and its consequences. We are assessing the environmental impact by working with the relevant authorities on a procedure to confirm that no pollution was caused. At this stage, we can confirm that the local residents are not at any risk. We are continuously assessing the impact of this incident and will communicate as transparently as possible on the progress of our analyses and the solutions to be implemented. All of our communication channels, including our incident tracking platform, can be accessed so that you can stay informed of developments in real time.

 

 

  • Like 3

Quote

 

Summary:

We’ve started to propose replacement infrastructure (Dedicated Servers) in Roubaix and Gravelines data centers to our customers.
To address the demand, additional assembly lines will be in place in the next 48 hours. This will triple our production capacity.
On the Strasbourg site:
Diagnosis and inventory are currently in progress.
We are in the process of cleaning up and repairing the damaged buildings to guarantee a safe working environment for our teams.
We have finished removing the fire extinguishing foam and water from the street. We will remove the water from the water tank today.
SBG2 will need to be almost entirely reconstructed.
SBG1 was heavily damaged.
Following a preliminary audit, the network room in SBG1 will be restored at the beginning of next week.
The provisional date for restoring the power supply is Monday, March 15.
In the coming days, servers will be activated room by room after the audit.
SBG4: preliminary audits did not reveal any issues. Our ambition now is to restore power during the week of March 22 and then gradually reactivate all services.
SBG3 was not impacted by the fire. We aim to restore the power and network during the week of March 22 and then gradually reactivate all services.

 

 

  • Like 1

Quote

 

Our three priorities are as follows:
Priority 1: Restoring services to SBG1, SBG3, and SBG4
Priority 2: Providing infrastructures in other data centers for our affected customers
Priority 3: Implementing all DRP (Disaster Recovery Plan) mechanisms with our customers

Action plan -
Priority #1:
SBG-1
- Situation : 4 of 12 rooms were damaged
- Electrical restart : From Monday 15 March
- Server restart : Progressive restoration of undamaged servers by 22 March

SBG-2
- Situation : Building out of use
- Electrical restart : Audit and inventory of premises
- Server restart : Replacing infrastructures in other data centers

SBG-3
- Situation : Servers undamaged
- Electrical restart : Tests for the high voltage supply will be carried out this weekend. Provisional restart on 15 March
- Server restart : A progressive restart of all services is estimated from Monday, 22 March.

SBG-4
- Situation : Servers undamaged
- Electrical restart : Scheduled for Monday, 15 March
- Server restart : A progressive restart of all services is estimated from Monday, 22 March.

 

 

  • Like 1


Any update from the 15th on the SBG-3 situation? 😄


- Artagosul

 

 


SBG-3


- Situation : Data center operational. Servers undamaged.
- Electrical restart : Accomplished
- Network restart : Accomplished
- Server restart : Test today on one rack and if conclusive provisional estimate for Friday 19 March for a progressive restart of the services.

  • Like 1
  • Dislike 1

58 minutes ago, BioHaZzarD said:

Honestly, I don't really know what that means myself, it's a mystery there, but we all want to know a date for when the server will reopen.

On 19.03.2021 they managed to bring SBG3 (the room where the server is located) back into operation. The servers are being started one at a time, to avoid further incidents.

Check the link above, and when the square numbered S330A04 is marked green, it means the server is operational and we can resume activity.

 

  • Like 2

2 hours ago, MeTa said:

On 19.03.2021 they managed to bring SBG3 (the room where the server is located) back into operation. The servers are being started one at a time, to avoid further incidents.

Check the link above, and when the square numbered S330A04 is marked green, it means the server is operational and we can resume activity.

 

OK, thank you very much!

  • Like 1

Quote

Update March 24, 10pm

pCS: we hope to have all the servers cleaned and up by Friday. We need 24h to restart the whole cluster and resync some data. Then we will give RO access to pCS on Saturday. Full RW from Sunday. Can be faster.

 

  • Like 3
  • Panic 2


Hello,

We have finally received access to the server and we're currently checking all data for integrity issues.

We'll need to complete some tasks before we can restart the server:

1. Check all data integrity, including the server configuration and database

2. Test for potential issues in our server-to-website data communication

3. Update Gold Member status and temporary items by adding the offline time to the expiry date
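
The post does not say how the integrity check in step 1 is carried out. One common approach, sketched here purely as an illustration, is to compare each file's SHA-256 digest against a manifest recorded before the outage. The manifest format and the names `sha256_of` and `find_corrupted` are assumptions made for this example, not part of the actual server tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(manifest: dict[str, str], root: Path) -> list[str]:
    """List files that are missing or whose digest differs from the manifest.

    `manifest` maps a path relative to `root` to the digest recorded
    before the outage (hypothetical format, assumed for this sketch).
    """
    bad = []
    for relative_name, expected in manifest.items():
        candidate = root / relative_name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(relative_name)
    return sorted(bad)
```

An empty list from `find_corrupted` would mean every file listed in the manifest is present and unchanged.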

Updates will be posted as we complete the steps described above.

 

Best Regards,

MeTa

  • Like 2


Hello everyone,

 

As promised, we are happy to announce that file integrity has been successfully checked and no data loss has been detected.

We’ve extended the Gold Member status of all eligible users by 40 days.

We’ve extended the wings expiration date by 40 days for all users who had active wings on their characters at the date of the incident. If your wings show as expired, they had already expired by the date of the incident.
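
The extension rule described above can be sketched as follows. This is only an illustration of the rule, not the script that was actually run: the function name `extended_expiry` is hypothetical, and the incident start time is taken from the OVHcloud summary quoted earlier in the thread.

```python
from datetime import datetime, timedelta

# Assumed values: the incident start comes from the OVHcloud summary
# (00:47 CET on 10 March 2021); the 40-day extension from this announcement.
INCIDENT_START = datetime(2021, 3, 10, 0, 47)
EXTENSION = timedelta(days=40)


def extended_expiry(expires_at: datetime) -> datetime:
    """Return the new expiry date for wings or Gold Member status.

    Items still active when the incident began get 40 extra days;
    items that had already expired by then are left unchanged.
    """
    if expires_at > INCIDENT_START:
        return expires_at + EXTENSION
    return expires_at
```

For example, wings due to expire on 20 March 2021 would now expire on 29 April 2021, while wings that expired on 1 March 2021 stay expired.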

 

Now that we can carry on with our job, here are the tasks we are currently working on:

1. Item Shop: the item shop is in a pretty advanced state, but further work is required. Since we didn’t have access to the database needed to complete it, we were forced to pause its development.

2. Small bug fixes and player-suggested improvements (more details will be posted at the delivery date)

3. Free reward in diamonds and Gold Member for players/new players (more details will be posted at the delivery date)

4. Adding more monsters to each spot in order to speed up the reset process.

 

Best Regards,

MeTa

 

  • Like 3
  • Love 1
  • Thanks 2
