Network Connectivity Issues
Incident Report for Cartika
Postmortem

We want to provide some additional context regarding the incident seen today.

On all Linux environments, Cartika operates a central set of firewall rules. This allows us to push blocks, fixes and proactive defence to all servers in the fleet when needed, especially when a new type of attack is detected by our team or by our proactive defence mechanisms. This is all done to further Cartika's compliance initiatives and is the final step in our goal of compliance with the NIST framework (https://www.nist.gov/industry-impacts/cybersecurity-and-privacy).

Tonight, our team rolled out a similar proactive approach to our Managed Windows servers.

During the roll-out, we began to notice two problems:

1) Reports of connectivity issues between our Toronto and Dallas facilities

2) Reports of some client VMs being unable to communicate with each other

Our team began to investigate the reports and determined the following:

A) One of the IP ranges we use was omitted in error from our firewall configurations

B) Due to the changes, a synchronization error between our Domain Controllers had begun to occur, causing various connection errors. Because our Linux environments also connect to the same Domain Controllers, some clients on Linux environments also began to see connectivity difficulties.

A firewall fix for the omitted range was pushed within 10 minutes of the initial reports coming in. However, the sync errors between Domain Controllers were preventing some client devices from picking up this change in a timely manner.

Cartika's NOC in Toronto immediately declared an incident, bringing additional staff on-site to resolve the issue. The NOC and our team worked together to bring all Domain Controllers back in sync. Once they were syncing again, the firewall push was accepted by all client devices, resolving the incident.

While the firewall changes and tonight's maintenance were absolutely needed, we discovered during this process that there are a few key areas we will work to improve upon in the future.

We will be modifying our Change Management procedures to ensure that we have more eyes on any proposed set of firewall changes before they go live in a customer environment.
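As an illustration only, and not a description of Cartika's actual tooling, a review like this could be supplemented by an automated check that every required IP range is covered by the proposed allow list before a push. The sketch below uses Python's standard `ipaddress` module. The function name `find_uncovered_ranges` and the sample ranges (RFC 5737 documentation addresses) are hypothetical.

```python
import ipaddress


def find_uncovered_ranges(required, allowed):
    """Return the required networks not fully contained in any single allowed network.

    This is deliberately conservative: a required range that is only covered
    by the union of several smaller allowed ranges is still reported.
    """
    allowed_nets = [ipaddress.ip_network(a) for a in allowed]
    missing = []
    for r in required:
        net = ipaddress.ip_network(r)
        covered = any(
            # subnet_of() raises TypeError across IP versions, so compare versions first
            net.version == a.version and net.subnet_of(a)
            for a in allowed_nets
        )
        if not covered:
            missing.append(r)
    return missing


# Hypothetical ranges for illustration (RFC 5737 documentation blocks).
REQUIRED = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
PROPOSED_ALLOW = ["192.0.2.0/24", "203.0.113.0/25", "198.51.100.0/24"]

missing = find_uncovered_ranges(REQUIRED, PROPOSED_ALLOW)
print(missing)  # ['203.0.113.0/24'] because the /25 covers only half of the /24
```

A check of this kind could run as a gate in the change process, blocking the push whenever the returned list is not empty.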

We deeply and sincerely apologize to the clients impacted tonight. We will be running a further internal postmortem on tonight's incident over the coming days.

Matt Cianfarani

COO - Cartika

Posted Jul 19, 2019 - 01:18 EDT

Resolved
This incident has been resolved.
Posted Jul 19, 2019 - 01:10 EDT
Monitoring
We have resolved connectivity in our Toronto facility as well as pushed a fix for FTP to the fleet. This change may take a few hours to fully propagate to all servers.

We are changing this incident to Monitoring Status and will monitor things for the next few hours before declaring it fully resolved.
Posted Jul 19, 2019 - 00:45 EDT
Update
Connectivity has been restored in our Dallas, TX facility. Some clients may be experiencing issues with FTP to a few Windows servers in that locale. We are pushing a patch shortly to resolve this.

Toronto (Access Management and connectivity) is still being worked on at this time.
Posted Jul 19, 2019 - 00:31 EDT
Update
We are currently working to resolve an issue with our Domain Controllers that is causing connection difficulties for some clients. Our team is working as swiftly as possible to fix impacted systems. Additional updates to follow.
Posted Jul 18, 2019 - 20:57 EDT
Update
We have resolved Exchange connectivity in Toronto and are working to issue a fix for Dallas. Any customers with continued connectivity issues in the Toronto locale should contact support@cartika.com so we can investigate.
Posted Jul 18, 2019 - 19:15 EDT
Identified
We have identified an issue with our Microsoft Exchange services and are pushing a fix to them now. We are also working with impacted clients and network vendors to fully resolve all connectivity issues. Further updates to follow.
Posted Jul 18, 2019 - 18:29 EDT
Investigating
We are currently investigating reports of network connectivity issues impacting several portions of our network. Updates will be posted here as they become available.
Posted Jul 18, 2019 - 17:33 EDT
This incident affected: Cartika Dallas (Shared Email, Shared Web, Shared Database, Cloud Email, Public Cloud, Access Management, Mobile VPN) and Cartika Toronto (Shared Email, Shared Web, Shared Database, Cloud Email, Public Cloud, Access Management, Mobile VPN).