10/24/2016 11:00 a.m. Eastern Time: Today, at approximately 10:20 a.m. Eastern Time, our domain, secureto.us, came under a significant DDoS (distributed denial of service) attack. This was similar to the attack experienced by many of the largest websites this past Friday, which seemed to take out half the internet, including Amazon, CNN, FoxNews, and many others. We were fortunate to avoid any issues last week; however, today's attack disrupted our service. Most customers were not affected until approximately 10:50 a.m., at which time all processing stopped.
The attack overloaded the ARP cache on the primary firewall for our Columbus server, and before we had resolved the issue, our secondary firewall also became overloaded, causing a temporary outage. After becoming aware of the disturbances, we logged into our DNS server and initiated countermeasures by changing the IP address for secureto.us and increasing the DDoS protection on the new IP address. By approximately 11:00 a.m. Eastern Time, the new IP address was in place and we confirmed all calls were processing normally. The secureto.us websites were unreachable for some customers during the change until their local DNS servers or caches picked up the new IP address. Some customers had hung calls that had to be cleared before calls could resume; these were handled on a case-by-case basis.
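For customers who want to check whether their local DNS has picked up a changed address, the following is a minimal Python sketch, not a tool we provide. The function names are hypothetical, and the address in the usage note is a documentation placeholder, not our actual IP address.

```python
import socket
from typing import Callable, List


def resolved_addresses(hostname: str) -> List[str]:
    """Return the IPv4 addresses the local resolver currently gives for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def sees_new_address(
    hostname: str,
    new_ip: str,
    resolve: Callable[[str], List[str]] = resolved_addresses,
) -> bool:
    """Return True if the local resolver already returns new_ip for hostname."""
    try:
        return new_ip in resolve(hostname)
    except OSError:
        # Name resolution failed entirely (socket.gaierror is an OSError).
        return False
```

Calling sees_new_address("secureto.us", "203.0.113.10") would report whether the machine running it already resolves the domain to the given (placeholder) address. A False result usually means the old record is still cached and will expire on its own.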
10/4/2016 12:20 p.m. Eastern Time: L3 (Level 3) is reporting full restoration at this time and our test calls are completing normally.
10/4/2016 11:48 a.m. Eastern Time L3 outage update:
We are in the process of routing toll-free numbers around the affected carrier. We cannot do the same for local numbers.
10/4/2016 10:47 a.m. Eastern Time - Underlying carrier outage - L3 (Level 3)
We are experiencing indications of a network impairment on the L3 (Level 3) network, one of our underlying carriers.
Initial:
We are currently experiencing issues with several services as a result of a major underlying carrier network event:
911 Termination Calls to certain destinations that traverse our underlying carrier are currently failing over to our national call center.
Toll Free Inbound calls received via our underlying carrier are currently not completing as expected. L3 (Level 3) is working to route these TF numbers around the impacted carrier.
Telephone Numbers received from L3 (Level 3) are impacted for origination calls.
We are working with our underlying carrier to resolve this issue as soon as possible.
The following Products are impacted:
Origination for L3 (Level 3) numbers routed to us
Termination to L3 (Level 3) numbers not owned by us
L3 (Level 3) Toll Free Origination
E911 calls are being routed to our backup National Routing center.
07/15/2016 8:23 a.m. Eastern Time - There was an issue with our primary server. The primary handed off to the secondary server with no interruption. The time you are referencing is when our vendor was trying to bring the primary server back online; when the switch happened, the root cause of the first issue created a 3-4 minute outage. We then discovered that the root cause was with one of our Caller ID Name vendors, so we have taken them offline on our primary and secondary servers. We will work with them on our dev servers before we bring them back online. All services have now been restored and we are running on the primary server.
06/07/2016 All services were fully restored at 12:37:10 Eastern time. We will update this ticket and the public comments with an RFO when it becomes available.
06/17/2016 12:15 Eastern time. We experienced an outage that lasted approximately 20 minutes. We switched to the secondary OpenSIPS server to correct the problem, and calls were immediately processed normally. Registrations slowly came back to normal over the next hour, with 80% online within 5 minutes. We will investigate further and update again.
04/26/2016 15:58 EST VoIP Innovations (VI) reporting issue resolved. We saw no impact from this.
On 04/26/2016 at 8:00 EST, one of our carriers, VoIP Innovations (VI), made us aware of an issue affecting both origination and termination. We have been watching termination and have not seen any issues, since they are generally our secondary carrier in over 95% of rate centers, and even if they do not respond, we have secondary carriers in those rate centers as well. As for origination, they have only about 25% of our DIDs, usually in remote areas. Again, we have not yet seen any issues. However, if you do have a complaint about missing inbound calls, let us know and we will check whether it is a VI number. If it is, we can put in a carrier-level forward to a temporary number that we will add to the account.
At 8:07 a.m. EST, I accidentally removed all current registrations with a rogue query. This would affect inbound calls only and will recover on the next registration cycle.
The admin system is now synced with OpenSIPS again. All services are working normally and we are back on the primary servers.
Call processing was interrupted at 9:44 a.m. this morning. Service was restored at 10:02 a.m. EST. We are working to determine when the problem started and will provide an RFO. The admin system cannot communicate with the OpenSIPS database at this time. We are still working on resolving this.
We did have a limited outage on April 20th from 13:30 to 13:40 EST, during which some registrations were missed and some calls were not processed. A customer had set up a looped forward from himself to himself, which caused a MySQL slowdown. We corrected the problem quickly, and it appears to have had a very limited effect.
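A forwarding loop of this kind can be caught before it is saved by following the chain of forwards and checking for a repeat. The sketch below is illustrative only and is not the check we run; the function name and the dictionary of forwards are hypothetical.

```python
from typing import Dict, List, Optional


def find_forward_loop(forwards: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow call forwards from start; return the looping chain, or None if it ends."""
    seen: List[str] = []
    current = start
    while current in forwards:
        if current in seen:
            # We have come back to a number already visited: report the cycle.
            return seen[seen.index(current):] + [current]
        seen.append(current)
        current = forwards[current]
    # The chain ended at a number with no forward set, so there is no loop.
    return None
```

With forwards = {"6145550100": "6145550100"}, find_forward_loop(forwards, "6145550100") returns ["6145550100", "6145550100"], which is the self-to-self case described above. Longer loops across several numbers are reported the same way.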
There was a load problem on April 11th 2016 from 3:52 p.m. EST to 4:18 p.m. EST (26 minutes). During that time calls were degraded and some failed. We are investigating why this occurred and why we were not notified sooner.
Inbound calls to many of our customers failed this morning. On Oct 1 2015 at 11:41 a.m. we noticed a significant number of inbound call failures. The calls are reaching our servers; however, we are not seeing responses to the INVITEs sent to customers' IP addresses. Our primary data center is reporting that they found an issue in the routing through Chicago, IL. Chicago is a major hub for the northern portion of the country and we are working with our data center to route around it. At approximately 1:17 p.m. EST our data center reported that a BGP-directed route is now routing SIP traffic properly. All calls seem to be completing at this time.
If you do not have call fail over set up in the Member's portal, please log in and do so. Click on "Manage Trunks" -> "Telephone Numbers" to see the grid. You may set one number for all calls to fail over to or a different fail over number for each DID.