SRCDS Steam group


Spike Dropouts [RESOLVED]
#1
Hello,
First off to explain my problem I believe I should list off the server specifications:

CPU: E3-1225 @ 3.2GHZ
RAM: 12GB
Port: 100Mbit
OS: Windows Server 2008

The problem I am experiencing is:
Between about 6PM and 9PM I experience an issue with my dedicated server: while tracking it in HLSW, constant dropouts begin to appear and become closer and closer together, then slowly fade away.
Image:
[Image: Y0Aeq.png]

What I have done:
  • Checked if it happens in the morning - No
  • Confirmed with Datacenter that the network has capacity and that my server is by no means being capped
  • Checked CPU + RAM load, both sit comfortably under 15%
  • Tried different gameservers (CS:S/DOD:S/L4D) - Eliminating any kind of Mod problem.
  • Checked it against other servers in the same datacenter and they don't get the error
  • Multiple users on a multitude of ISPs, connecting from various countries, experience lag due to this - Eliminating me & my connection
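
For anyone wanting to pin down a time window like this with numbers rather than by eye, here is a minimal Python sketch. It assumes you already have a log of probe results as (timestamp, responded) pairs from whatever monitor you use; the function name and the sample data are made up for illustration and are not something HLSW produces.

```python
from collections import Counter
from datetime import datetime


def dropouts_by_hour(samples):
    """Count failed probes per hour of day.

    samples: iterable of (datetime, responded) pairs, where responded is
    False when the server did not answer the probe.
    """
    counts = Counter(ts.hour for ts, responded in samples if not responded)
    return dict(sorted(counts.items()))


# Made-up sample data, purely to show the shape of the input.
example = [
    (datetime(2012, 1, 1, 9, 0), True),
    (datetime(2012, 1, 1, 18, 5), False),
    (datetime(2012, 1, 1, 18, 40), False),
    (datetime(2012, 1, 1, 20, 15), False),
    (datetime(2012, 1, 1, 22, 0), True),
]
print(dropouts_by_hour(example))
```

A table like this, collected over a few days, is easier to hand to a datacenter than a screenshot.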

I am really stuck for ideas as to what it could be, and I'm keen to hear any feedback from anyone who might know what is causing this.
Thanks,
Black Sheep
#2
I monitor my game servers via HLSW as well. I see this exact same issue occasionally with my servers. It's actually more common than you think, especially when hosting in a public datacenter where other servers may be subject to DDOS attacks. These attacks can disrupt the entire network segment that the victim server is on, and it usually takes the datacenter personnel blacklisting the IP of the victim server before the issue can be corrected. I can tell you this for sure: it was most likely a network issue, either at the data center or specifically on your box. So what caused it? Well, if it's gone, then you likely had one of a few things happening:

1. Your server, or another one on the same network link as yours, was suffering from a low to medium level DDOS attack which was gobbling up the outbound capacity of your server's network connection as it responded to the attackers' requests. This is particularly evident on 10 Mbps or 100 Mbps connections, which have less bandwidth and can't bear the burden of a full DDOS attack as well as a 1 Gbps connection can.

2. There was a problem with the outbound network route from your server to you and the others affected by this problem.

3. A router at the D/C or somewhere in your path to your server was having issues and causing loss in the network.

I spent a few days dealing with this issue about a month ago, when it started and didn't go away like it usually does after reporting it to the data center as an attack. In that particular case, I discovered that a mis-configured router at the datacenter was causing it because someone had changed a setting on it. It was only after I pressed them that they acknowledged the mistake and corrected it. Amazingly, they credited me a month of service because it hosed us so badly and it took their customer to figure out the problem.

There are several tools that may help you diagnose this issue in the future. If you know about them already, forgive me. One is the command prompt utility "pathping". You can open a command prompt and type "pathping xxx.xxx.xxx.xxx", where xxx.xxx.xxx.xxx is the IP of your destination. This will test the connection and report the latency and loss to it from wherever you are running it. Basic but handy.
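
If you want to keep a record of pathping runs to compare good hours against bad ones, something like the following Python sketch could wrap the command. Treat it as a rough illustration: the function names and the log file naming are my own invention, 203.0.113.10 is a placeholder address, and the file naming assumes an IPv4 target. The -n (skip name resolution) and -q (queries per hop) switches are standard pathping options on Windows.

```python
import subprocess
from datetime import datetime


def build_pathping_command(target, queries_per_hop=None, resolve_names=False):
    """Build the argument list for the Windows pathping utility."""
    cmd = ["pathping"]
    if not resolve_names:
        cmd.append("-n")  # skip reverse DNS lookups so the run finishes sooner
    if queries_per_hop is not None:
        cmd += ["-q", str(queries_per_hop)]  # number of probes sent to each hop
    cmd.append(target)
    return cmd


def log_filename(target, when):
    """Name the log after the target and start time so runs can be compared."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return "pathping_{}_{}.txt".format(target.replace(".", "-"), stamp)


def run_and_save(target, queries_per_hop=None):
    """Run pathping (Windows only) and save its raw output to a timestamped file."""
    started = datetime.now()
    result = subprocess.run(
        build_pathping_command(target, queries_per_hop),
        capture_output=True,
        text=True,
    )
    path = log_filename(target, started)
    with open(path, "w") as handle:
        handle.write(result.stdout)
    return path
```

Calling run_and_save("203.0.113.10") once in the morning and once during the bad window would give you two files to put side by side.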

Another great diagnostic tool can help determine exactly where on the network the problem is occurring. Download and install PingPlotter (http://www.pingplotter.com/). Once it is installed on your PC, you can ping your server's IP(s) with it and check each hop along the way for loss and timeouts. While the problem is actually occurring, PingPlotter will show the loss and timeouts at the particular hop in the path to your server where they occur. You can then use that info to report it to the data center so they can resolve it.
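
When reading per-hop results, one common rule of thumb (my own addition, not something PingPlotter does for you under this name) is that loss at a middle hop only matters if it carries through to the destination, since routers often deprioritize replies to probes addressed to themselves. A minimal Python sketch of that rule, assuming you have copied the per-hop loss percentages into a list by hand:

```python
def first_persistent_loss_hop(hops, threshold=1.0):
    """Return the label of the earliest hop from which loss continues
    unbroken to the destination, or None if the destination is clean.

    hops: list of (label, loss_percent) in path order, destination last.
    """
    culprit = None
    # Walk backwards from the destination while every hop still shows loss.
    for label, loss in reversed(hops):
        if loss >= threshold:
            culprit = label
        else:
            break
    return culprit


# Made-up numbers for illustration only.
path = [
    ("hop 1", 0.0),
    ("hop 2", 12.0),  # isolated loss, later hops recover
    ("hop 3", 0.0),
    ("hop 4", 8.0),   # loss starts here and persists
    ("hop 5", 9.0),   # destination
]
print(first_persistent_loss_hop(path))
```

In this made-up path the hop to report would be hop 4, not hop 2.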

Best of luck!
About Me:
I help people who at least try to help themselves. Please use the "Search" button before posting a new topic.
If you post, give us the info we need from the "READ ME FIRST" sticky at the top of each thread!

I'm here to share my experiences to help others. If I'm wrong about something, don't hold it against me, educate me.
I'm not perfect and try to learn from every failure, yours and mine.
#3
We had a similar issue with a facility in KC. It occurred for months, we pressed them to fix it. They admitted it was a faulty edge but still never fixed it. We ended up switching facilities.

Because your drops appear to be so consistent in length, I'm willing to bet it's faulty or overloaded routing.

Contact your facility, show them your packet loss. Make them fix it.

If you're unsure whether it's your problem or theirs, then do what scso suggested:
Quote:There are several tools that may help you diagnose this issue in the future. If you know about them already, forgive me. One is the command prompt utility "pathping". You can open a command prompt and type "pathping xxx.xxx.xxx.xxx", where xxx.xxx.xxx.xxx is the IP of your destination. This will test the connection and report the latency and loss to it from wherever you are running it. Basic but handy.
Pathping your server's IP AND the gateway of your server (if accessible) to determine whether it's a hardware fault or a network fault.
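
One way to read the two results side by side is sketched below in Python. This is only a rough heuristic under my own assumptions: the threshold, the function name and the wording of the verdicts are made up, and the loss figures would be the end-to-end percentages you read off the two pathping runs.

```python
def classify_loss(gateway_loss, server_loss, threshold=1.0):
    """Rough guess at where loss originates, from loss % to gateway and server."""
    gateway_bad = gateway_loss >= threshold
    server_bad = server_loss >= threshold
    if server_bad and gateway_bad:
        # Loss is already there before traffic reaches the box.
        return "network: loss already present at the gateway"
    if server_bad:
        # The path up to the gateway is clean, so look at the box or its link.
        return "box or its link: gateway is clean, server is not"
    if gateway_bad:
        # Routers often drop probes aimed at themselves while forwarding fine.
        return "probably harmless: gateway drops probes but the server answers"
    return "no loss seen"


print(classify_loss(gateway_loss=7.0, server_loss=9.0))
```

If both show loss during the bad window, that is the evidence to send to the facility.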
Looking for a game server? Visit fullfrag.com and pick one up as low as $2.50 / mo!
#4
Hey guys,
Thanks for your replies! You were right, hehe. After 2 weeks (ish) of pressing the issue, they came back with:

On one of our backhaul links providing services to the **HIDDEN FOR PRIVACY** Data Centre the device started to re-learn the mac addresses of some switches in another data centre of ours. To prevent a loop in traffic on our network, the system shut the link down and fell over to our redundant carrier.

It took a short while for all traffic to fall over, and engineers then rectified the mac address issues and switched the link back a few hours later (this was seamless).

We are looking at ways to improve the speed of the fail over and ways to prevent this in the future.

We apologise for any inconvenience that this may have caused.
#5
Awesome! Really good news! Congrats! Can you please edit your original post title and add [RESOLVED] so people know it's fixed?

Take care!