Isolated lag spikes (stability issues)
Server OS: CentOS 5.6
Processor: 1x AMD Opteron 4184 (6 cores @ 2800 MHz)
RAM: 16 GB
Game(s): Team Fortress 2
Start Up Command: ./srcds_run -game tf -tickrate 66 +fps_max 0 +maxplayers 24 -port 27015 +exec server.cfg +map cp_freight_final1 (map and ports change for each instance)
Bandwidth: I'm billed at 10 Mbits, but it's not capped. I can burst up to 1 Gbps.
Running four 24-slot servers. Each has been assigned its own core (1-4); cores 0 and 5 have no srcds processes on them. When uncapped, all servers run at 980 FPS under full load (96 players). For normal operations, fps_max 70 is set in each server.cfg.
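For reference, the layout above can be sketched as follows. This is only an illustration: it assumes the cores are assigned with taskset, and the consecutive port numbers and the launch_cmd helper are hypothetical. It prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: print the pinned launch command for one instance.
# Assumes taskset (util-linux) is the pinning tool; nothing is executed here.
launch_cmd() {
    core="$1"; port="$2"; map="$3"
    echo "taskset -c $core ./srcds_run -game tf -tickrate 66 +fps_max 0 +maxplayers 24 -port $port +exec server.cfg +map $map"
}

# One instance per core 1-4; consecutive ports are an assumption.
port=27015
for core in 1 2 3 4; do
    launch_cmd "$core" "$port" cp_freight_final1
    port=$((port + 1))
done
```

Each printed line would be run from that instance's own installation directory, under that instance's own user.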

All installations are separate. Each server has its own user and login.

The server hums along famously most of the time, with lots of compliments. Now, with the addition of two game servers on cores 3 and 4, there are lag spikes two or three times a map. Everyone freezes, lots of packets are dropped, then the game resyncs. All servers lag at the same time.

No other processes are intended to be running on the server. This box is ONLY for srcds.

Since each server is running on its own core, and I have 16 GB of RAM, why would moving from two servers to four affect anything? What shared resources are being used up?

What additional information is needed to diagnose this?
Are there any kernel panics or anything like that in your logs?
a kernel panic would stop the whole machine, you would notice that without looking into the logs...

can you try doing an fps-meter measurement? if you see fps drops there, you probably have some process (possibly, but not necessarily, one srcds server) going mad and taking all cpus. but in my experience that is rather unlikely, so I would suspect (external) network problems, in which case you will not see any fps drops on the server. maybe your network bandwidth is capped after all, but in some strange indirect way. or your hoster has some other problems...
No system crashes to report. No kernel oops or panics in any logs. I'll wait for peak time later today to use fps-meter.

Can one srcds server utilize other CPUs when its affinity has been set? What shared resources could be compromised by a greedy process?
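One way to check this rather than guess is to list the affinity of every thread in a given srcds process. The sketch below assumes taskset from util-linux is installed and that /proc/PID/task is available; show_affinity is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Print the current CPU affinity list of every thread in a process.
# Usage: show_affinity PID
# Assumes taskset (util-linux) is on PATH; the helper name is hypothetical.
show_affinity() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        tid="${task##*/}"          # thread ID is the directory name
        taskset -cp "$tid"         # prints "pid N's current affinity list: ..."
    done
}
```

Running show_affinity with the PID of one server should show the same single core on every line; any thread listing a wider range is free to run on other cores.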

Also, from what I've read on these forums, CentOS is just a giant headache, and I've got almost 100 clients connected. Could this be caused by my choice of distribution?

I'm going to keep gScramble on for fps-meter, but I'm suspicious that it might be a runaway plugin.

The test's showing is poor, whatever "Almost Perfect" might suggest. There are clearly a number of issues to address yet.

I was in a voice chat (hosted elsewhere) where the users in the server could tell me when they experienced a lag spike. The only "lag spike" experienced by the users was at 22:04. Would this suggest network or configuration issues?
Ah, here's an interesting development: The spikes apparently occur whenever one of the four servers changes to a different map. All the servers share the same RAM, but I have 16 GB and should be good to go. They all have their own cores. So that leaves...the Hard Disk.

I'm running an SSD as the sole drive on SATA II. Is that possibly my bottleneck?
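To test the disk theory, the time the CPUs spend waiting on I/O could be logged while a map change is forced on one server. This is a rough sketch that reads the aggregate "cpu" line of /proc/stat, where the fifth value is cumulative iowait ticks; iowait_ticks and log_iowait are hypothetical helper names.

```shell
#!/bin/sh
# Print the cumulative iowait tick count from a /proc/stat-style file.
# On the aggregate "cpu" line the values are: user nice system idle iowait ...
iowait_ticks() {
    awk '$1 == "cpu" {print $6}' "${1:-/proc/stat}"
}

# Sample once per second and print the per-second increase with a timestamp.
# Usage: log_iowait [samples]   (defaults to 5 samples)
log_iowait() {
    samples="${1:-5}"
    prev=$(iowait_ticks)
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep 1
        cur=$(iowait_ticks)
        echo "$(date '+%H:%M:%S') iowait_delta=$((cur - prev))"
        prev=$cur
        i=$((i + 1))
    done
}
```

Starting log_iowait 120 just before a map change and comparing its timestamps with the moments players report a freeze would show whether the spikes line up with I/O wait at all.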
(06-27-2011, 06:12 AM)Mr_Wiggles Wrote:  The test's showing is poor, whatever "Almost Perfect" might suggest. There are clearly a number of issues to address yet.
ignore the rating, I should probably remove it ;-) it's very outdated and never was really good...

HDD will not be your bottleneck (gameservers shouldn't read from HDD except when changing map and on some other rare occasions), especially if it's an SSD.

did you try without plugins? sometimes some plugins take a lot of cpu and can cause horrible lags... (do that test even if you tested those plugins somewhere else!)
I will remove all plugins except for vanilla SourceMod and test tomorrow. Thanks for your continued help.
Hi, I would suggest you compile a newer kernel. Try 2.6.39 with an RT patch.
Very similar FPS-meter results. This is with all plugins disabled except very basic MetaMod & SourceMod. And only one server running.

We'll try a newer kernel. If I get similar results, it's time to adjust network settings.
There doesn't appear to be a newer RT kernel patch than

Should I try a different kernel regardless?
use a different patch. try zen.
I feel so foolish. I didn't know there was a different RT project. Thank you.

I'll hammer on a new kernel this weekend and let you know how it goes.

Edit: Oh my. Such colorful version names.

