Extremely High CPU Usage
#1
Hey guys,

We've been suffering from high CPU usage for a while now (since the game came out, actually). It's a 64-slot SRCDS server that runs Zombie* variations, i.e. Zombie:Reloaded or, in the past, ZombieMod. SRCDS always manages to pin our core at 100%. We're running both of Monk's addons (Adaptive Usleep and LibFastTime), and they seem to help. However, when 50 or so players join, shit hits the fan and the lag starts. Server FPS drops to an all-time low and in-game pings shoot way up (A2S_INFO replies are unaffected, surprisingly).

We're running Debian 6 (Squeeze) x86_64 with the stock Linux kernel at the moment (tried Zen-Kernel, didn't do much; Classic RCU is still broken, so we are twiddling our thumbs):
Kernel HZ: 100 Hz
Dynamic ticks on
TSC as our system's clocksource
Disabled SELinux and all of that other *fun* stuff.
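
The clocksource setting listed above can be read back from sysfs to confirm what the kernel is actually using. A minimal sketch, assuming the standard Linux sysfs location; the function name is made up for illustration:

```python
from pathlib import Path

# Standard sysfs directory on Linux kernels that expose clocksource selection.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base: Path = CLOCKSOURCE_DIR) -> dict:
    """Return the active clocksource and the ones the kernel offers."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return {"current": current, "available": available}
```

Calling read_clocksource() on a box configured as described should report "tsc" as the current source, if the setting really took effect.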

CPU: Intel Xeon X3460 Lynnfield 2.8GHz (Runs at 3-3.4 with Turbo Boost)
Mainboard: SUPERMICRO X8SIL-F Micro ATX Server Board
HD: Western Digital Caviar Blue WD5000AAKS
Ram: KVR1333D3D8R9S/2G x2

I've mailed a couple of employees at Valve, and they've *forwarded* the information along. On top of that, I was told they do not support any servers over 32 slots as it's outside their testing domain; 'You're on your own'.

Anyway, if anyone has any ideas on how we can get our CPU usage lower, we're all ears and will test practically anything. Fair warning though: we will not use Windows, as signatures change far too often on it, forcing me to fire up IDA to find the new hex values. Symbol names are MUCH nicer. I was also told by numerous people that Linux handles higher slot counts better, which is another reason why we are currently using it.
Reply
#2
(10-17-2010, 04:55 PM)jimmy69 Wrote:  Hey guys,


The Xeon X3460 Lynnfield 2.8GHz is the problem. I have had one, and custom kernel or not, your system will not handle a 64-slot server. Maybe half that, 32, might run OK. You can try one of Terrorkarotte's kernels here.
First try the 2.6.33.5-zen3-ub-100hz kernel, then try 2.6.33.5-zen3-ub-1000hz; they may work as they have for me.
Reply
#3
64 slots is a lot; it usually takes a lot of effort to get it running at all (and it will never run really smoothly). try this:
- reduce the tickrate to 33 (sv_maxcmdrate 33 and sv_maxupdaterate 33, or maybe 34 for both) and match the fps to the tickrate (fps_max 34 or so). since the orange box update this will not decrease quality (I am now, say, 95% sure of this).
- do not use any tricks like "adaptive usleep" or "LibFastTime"; whatever they do, they will probably make things worse in your case. those things are usually designed for high-end, low-slot-count servers and were created before the OB update.
- try out different kernels that are optimized for maximum throughput, not for minimal latency, i.e. do *not* use RT patches, but try ZEN with settings for servers.
- run srcds with realtime or fifo scheduling (see my howto, "resched.sh").
- disable all plugins you do not absolutely require.
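
For the realtime/fifo scheduling point, a rough sketch of what moving a running process to SCHED_FIFO looks like from Python. This is not the "resched.sh" script from the howto, whose contents are not shown in this thread; the function name and the default priority of 50 are assumptions for illustration only, and the call needs root (or CAP_SYS_NICE) to succeed:

```python
import os


def set_fifo(pid: int, priority: int = 50) -> bool:
    """Try to move `pid` to SCHED_FIFO; return False if not permitted.

    On Linux this changes only the thread whose id is passed, so a
    multi-threaded server would need one call per thread id.
    """
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lo <= priority <= hi:
        raise ValueError(f"priority must be between {lo} and {hi}")
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False  # needs root or CAP_SYS_NICE
    return True
```

Be careful with this on a single core: a FIFO task that never sleeps can starve everything else running there.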

this is only a start. you will have to experiment a lot on your own. also, cyberthug could be right and your cpu cannot handle this. most people already run into problems at 32 slots or even lower. in theory your cpu would have to be approximately twice as fast as theirs, which is probably far from the case... if you happen to be thinking about a new cpu, look for one with the maximum possible performance per core, as srcds is (basically) single threaded...

ah and btw: do not pay too much attention to the cpu usage; quite often it has nothing to do with reality (neither in stats, top, htop or whatever). instead make sure the fps is always stable at (or above) your tickrate.
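
The "fps at or above tickrate" rule comes down to the time budget per tick. A minimal sketch of that arithmetic; the function names are made up, and 66 and 33 are simply the tickrates discussed in this thread:

```python
def frame_budget_ms(rate: float) -> float:
    """Milliseconds available per frame (or tick) at the given rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate


def keeps_up(server_fps: float, tickrate: float) -> bool:
    """True when the server produces at least one frame per tick."""
    return server_fps >= tickrate
```

At tickrate 66 the server has about 15 ms per tick, at 33 about 30 ms. A server that has dropped to 30 fps fails keeps_up(30, 66), which is when players start to feel it.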
http://www.fpsmeter.org
http://wiki.fragaholics.de/index.php/EN:Linux_Optimization_Guide (Linux Kernel HOWTO!)
Do not ask technical questions via PM!
Reply
#4
FPS drops to 10-30, from 937 (Adaptive usleep) that's my issue at the moment. LibFastTime helps my usage, Adaptive Usleep makes my *kernel changes* usable.
Reply
#5
(10-17-2010, 06:24 PM)jimmy69 Wrote:  FPS drops to 10-30, from 937 (Adaptive usleep) that's my issue at the moment. LibFastTime helps my usage, Adaptive Usleep makes my *kernel changes* usable.

Lynnfield is low grade. I have proven it before: even with BEpingboost.c and the kernel at 100 Hz I couldn't get a stable 500 fps, and with Turbo Boost I'm sure it's struggling to keep up. Xeons aren't meant to boost or overclock in any way. Trust me, I have tried to get low-end servers to run CSS OB at high fps and 32 slots. Another thing: is this a dedicated server? If so, you've got to think about power. The power supply could be a cheapie, meaning the CPU is underpowered. There are many things that could cause this. Just my thoughts.
Reply
#6
(10-17-2010, 06:43 PM)cyberthug Wrote:  
(10-17-2010, 06:24 PM)jimmy69 Wrote:  FPS drops to 10-30, from 937 (Adaptive usleep) that's my issue at the moment. LibFastTime helps my usage, Adaptive Usleep makes my *kernel changes* usable.

Lynnfield is low grade. I have proven it before: even with BEpingboost.c and the kernel at 100 Hz I couldn't get a stable 500 fps, and with Turbo Boost I'm sure it's struggling to keep up. Xeons aren't meant to boost or overclock in any way. Trust me, I have tried to get low-end servers to run CSS OB at high fps and 32 slots. Another thing: is this a dedicated server? If so, you've got to think about power. The power supply could be a cheapie, meaning the CPU is underpowered. There are many things that could cause this. Just my thoughts.

I'm sorry, but that makes no sense at all. How is the Lynnfield architecture considered low grade? It's based on the i7's Nehalem architecture. The X3460 is essentially a rebinned Core i7 860. And if they were not meant to be boosted, then why is turbo mode enabled on these CPUs by default?

And the CPU not "getting enough power" is not the issue. That's not the way it works. There is no reason to believe that the power supply is a "cheapie". If it were the case, then the chances of having it supply the server with "dirty power" would be high, and every component could be fried by now. If the CPU was not getting enough power, then the server would constantly crash -- we would definitely know there was a hardware issue going on - and, that is not the case here.
Reply
#7
I've tried Zen and I've optimized the kernel (haven't touched glibc or anything else), as I said in the initial post. That made all the difference it could; at the moment I'm looking for advice other than absolutely nuking my server performance at a fake 33 tick (setting those rates won't even bring down my CPU usage one bit, I've tried it). The preloaded modules only seem to help my performance and CPU usage (which is what they were designed to do).

Thank you RayW for bringing common sense to the table. At the moment I'm pretty lost with these replies as they're not remotely helpful, even though that was their intention. The OP hasn't been fully read.

EDIT: My FPS is well above 900 before the CPU gets pinned at around 50 players.
Reply
#8
(10-17-2010, 07:28 PM)jimmy69 Wrote:  I've tried Zen and I've optimized the kernel (haven't touched glibc or anything else), as I said in the initial post. That made all the difference it could; at the moment I'm looking for advice other than absolutely nuking my server performance at a fake 33 tick (setting those rates won't even bring down my CPU usage one bit, I've tried it). The preloaded modules only seem to help my performance and CPU usage (which is what they were designed to do).

Thank you RayW for bringing common sense to the table. At the moment I'm pretty lost with these replies as they're not remotely helpful, even though that was their intention. The OP hasn't been fully read.

EDIT: My FPS is well above 900 before the CPU gets pinned at around 50 players.
BehaartesEtwas and I have told you 64 slots are too much
Reply
#9
(10-17-2010, 09:41 PM)cyberthug Wrote:  BehaartesEtwas and I have told you 64 slots are too much
I know man. Depending on the map, my FPS starts to drop dramatically due to the core being pinned at around 40-45 players. It drops below the acceptable threshold at around 50-55 players (on some maps this isn't even an issue, and 64 players are just fine).

The point of this thread was to optimize my system and hopefully serve as a guide for others with what they can do since Valve doesn't seem to care the least bit. So far, the replies have been an exact repeat of the initial post.
Reply
#10
The engine is only designed to support a maximum of 32 players. Running 64 slots on the new engine is futile at best; the new code is just way too expensive, and throwing a lower HZ (100) at it is not going to make the game run any better under load. Maybe a little, but nothing you can measure or notice.

You could do some profiling of it with strace to see what it's spending most of its time doing. It's probably eating up CPU in nanosleep, so you'll have to lower what fps_max is set to (64 slots should run at an fps of 66 anyway (1:1 fps/tick)).
http://leaf.dragonflybsd.org/~gary

“The two most common elements in the universe are hydrogen and stupidity.”
Reply
#11
Ah shoot.
Code:
root@Tramicia:~# strace -cf -p 9042
Process 9042 attached with 5 threads - interrupt to quit
[ Process PID=9054 runs in 32 bit mode. ]
^CProcess 9042 detached
Process 9047 detached
Process 9052 detached
Process 9053 detached
Process 9054 detached
System call usage summary for 32 bit mode:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 58.99    9.085366         238     38148      9834 futex
 24.03    3.701374          59     62575           select
  7.66    1.180000         543      2172           nanosleep
  5.06    0.778574         429      1816           read
  4.06    0.624596        2271       275           fsync
  0.19    0.030000       15000         2         2 restart_syscall
  0.00    0.000258           1       252           getdents
  0.00    0.000221           0    408504           gettimeofday
  0.00    0.000090           0     76016           sendto
  0.00    0.000024           0    121206      3860 recvfrom
  0.00    0.000018           0     37298           recv
  0.00    0.000011           0     35181           send
  0.00    0.000000           0      2102           write
  0.00    0.000000           0       877       441 open
  0.00    0.000000           0       438           close
  0.00    0.000000           0        69           unlink
  0.00    0.000000           0      2768           time
  0.00    0.000000           0       132       132 access
  0.00    0.000000           0         1           rename
  0.00    0.000000           0        22           times
  0.00    0.000000           0         1           ioctl
  0.00    0.000000           0       120           munmap
  0.00    0.000000           0        86           mprotect
  0.00    0.000000           0     11020           _llseek
  0.00    0.000000           0       220           poll
  0.00    0.000000           0       121           mmap2
  0.00    0.000000           0      1221       756 stat64
  0.00    0.000000           0       325           fstat64
  0.00    0.000000           0      1147           fcntl64
  0.00    0.000000           0      8224           clock_gettime
  0.00    0.000000           0         2           socket
  0.00    0.000000           0         2           connect
  0.00    0.000000           0      3859      3859 accept
  0.00    0.000000           0         1           shutdown
  0.00    0.000000           0         5           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00   15.400532                816208     18884 total
root@Tramicia:~#
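
A summary table like the one above is easier to compare across runs once it is parsed into records. A minimal sketch, assuming the column layout shown in this post (% time, seconds, usecs/call, calls, optional errors, syscall); the class and function names are made up for illustration:

```python
import re
from typing import NamedTuple


class SyscallStat(NamedTuple):
    pct_time: float
    seconds: float
    usecs_per_call: int
    calls: int
    errors: int
    name: str


# One data row: % time, seconds, usecs/call, calls, optional errors, syscall name.
ROW = re.compile(r"^\s*([\d.]+)\s+([\d.]+)\s+(\d+)\s+(\d+)(?:\s+(\d+))?\s+(\w+)\s*$")


def parse_strace_summary(text: str) -> list:
    """Parse the table printed by `strace -c`, skipping headers and the total row."""
    stats = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m or m.group(6) == "total":
            continue
        pct, secs, usecs, calls, errors, name = m.groups()
        stats.append(SyscallStat(float(pct), float(secs), int(usecs),
                                 int(calls), int(errors or 0), name))
    return stats
```

Sorting the result by seconds or by calls makes it quick to see whether the heavy hitters are the same before and after a config change.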
Reply
#12
Try letting it run for about 10 minutes to get a better sample. Running processes in profiling mode will slow them down a little bit.
Reply
#13
I noticed; everyone's in-game ping shot up to 500 with 60 clients in game lol >.<

EDIT: 12 or so minutes of tracing (54-60 Real Clients):
Code:
root@Tramicia:~# strace -cf -p 9042
Process 9042 attached with 5 threads - interrupt to quit
[ Process PID=9054 runs in 32 bit mode. ]
^CProcess 9042 detached
Process 9047 detached
Process 9052 detached
Process 9053 detached
Process 9054 detached
System call usage summary for 32 bit mode:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.19  132.270169         305    433366    119883 futex
 25.73   46.499326         108    430864           select
  0.44    0.801059          59     13639           nanosleep
  0.38    0.691791        2882       240           fsync
  0.22    0.396512          20     19845           read
  0.03    0.060000       30000         2         2 restart_syscall
  0.00    0.001917           0   3959501           gettimeofday
  0.00    0.001128           0    699857           sendto
  0.00    0.000548           1       468           getdents
  0.00    0.000312           0   1492545     53618 recvfrom
  0.00    0.000115           0    204686           send
  0.00    0.000105           0    211428           recv
  0.00    0.000040           0    111987           clock_gettime
  0.00    0.000016           0     53618     53618 accept
  0.00    0.000012           0     34779           time
  0.00    0.000011           0     67228           _llseek
  0.00    0.000009           0      6332      1404 stat64
  0.00    0.000000           0      2732           write
  0.00    0.000000           0      3452       819 open
  0.00    0.000000           0      2654           close
  0.00    0.000000           0        60           unlink
  0.00    0.000000           0        93        93 access
  0.00    0.000000           0         8         8 mkdir
  0.00    0.000000           0       260           times
  0.00    0.000000           0        16           ioctl
  0.00    0.000000           0      1722           munmap
  0.00    0.000000           0      1700           mprotect
  0.00    0.000000           0         4           flock
  0.00    0.000000           0       225           poll
  0.00    0.000000           0      1722           mmap2
  0.00    0.000000           0      1907           fstat64
  0.00    0.000000           0       944           fcntl64
  0.00    0.000000           0        32           socket
  0.00    0.000000           0        32           connect
  0.00    0.000000           0        16           shutdown
  0.00    0.000000           0        80           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00  180.723070               7758044    229445 total
root@Tramicia:~#

I hope that's more helpful than the last trace.

EDIT (again): If Userspace = Userland, is there any chance 'kmctrl' is available for public use?

http://docs.google.com/viewer?url=http://people.summit-servers.com/monk.pdf
Reply
#14
It's spending a lot of time in futexes, and each futex call is eating about 300 usecs, hrm. A lot of errors, too.
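
Both figures can be checked directly against the futex row of the 12-minute trace (132.270169 seconds, 433366 calls, 119883 errors). A small sketch of the arithmetic; the helper names are made up:

```python
def usecs_per_call(seconds: float, calls: int) -> float:
    """Average time per call in microseconds."""
    return seconds * 1_000_000 / calls


def error_rate(errors: int, calls: int) -> float:
    """Fraction of calls that returned an error."""
    return errors / calls
```

With the numbers from that row this gives roughly 305 usecs per futex call, and a bit over a quarter of all futex calls returning an error.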

Can you do

strace -o /root/whatever.log -f -p 12345 (whatever the pid is) and look for the futex lines to see what is causing them to error?

kmctrl is just a program that was written to interface with a driver / mode in the kernel. It only works on IA32 systems (no amd64), it's very beta, and it causes panics which I haven't been able to solve. The patches for all that stuff are about 2k in size, but again, it's all beta.
Reply
#15
Code:
9047  futex(0xf700e098, FUTEX_WAIT_PRIVATE, 1619845, {0, 49983279} <unfinished ...>
9047  <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
9047  futex(0xf700e07c, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>


9042  <... futex resumed> )             = 0
9047  futex(0xf700e098, FUTEX_WAIT_PRIVATE, 1620049, {0, 80971314} <unfinished ...>
9042  futex(0xf700e098, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0xf700e094, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
9047  <... futex resumed> )             = -1 EAGAIN (Resource temporarily unavailable)
9042  <... futex resumed> )             = 0
9047  futex(0xf700e07c, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
9042  futex(0xf700e07c, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
9047  <... futex resumed> )             = -1 EAGAIN (Resource temporarily unavailable)
9042  <... futex resumed> )             = 0
9047  futex(0xf700e07c, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
9042  futex(0xf700e058, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
Grepped futex: http://pastebin.com/UEE10zEP
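
To see which errors dominate across the whole log rather than in a short excerpt, the futex return lines can be tallied by errno. A minimal sketch, assuming the `strace -f -o` line format shown above; the function name is made up. Note that ETIMEDOUT on a timed FUTEX_WAIT and EAGAIN when the futex word has already changed are both documented outcomes in futex(2), so a large count of either is not by itself a fault:

```python
import re
from collections import Counter

# Matches the return part of a finished or resumed futex call, e.g.
#   9047  <... futex resumed> )  = -1 ETIMEDOUT (Connection timed out)
FUTEX_ERR = re.compile(r"futex.*=\s*-1\s+([A-Z]+)")


def count_futex_errors(lines) -> Counter:
    """Count errno names returned by futex calls in `strace -f` output."""
    counts = Counter()
    for line in lines:
        m = FUTEX_ERR.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Passing an open file handle for the log and printing most_common() on the result gives the breakdown in one go.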

That sucks about kmctrl.
Reply

