HL1 & HL2 Booster Library
#91
name the function to do this...
http://www.fpsmeter.org
http://wiki.fragaholics.de/index.php/EN:Linux_Optimization_Guide (Linux Kernel HOWTO!)
Do not ask technical questions via PM!
Reply
#92
Name the function to do what? read time from the TSC?

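/* Read the low 32 bits of the TSC into `low` (x86 only; EDX is clobbered). */
#define rdtscl(low) \
__asm__ __volatile__ ("rdtsc" : "=a" (low) : : "edx")
/* "rep; nop" is the encoding of PAUSE, a hint that we are in a spin loop. */
#define rop() \
__asm__ __volatile__ ("rep; nop" ::: "memory")

Then something like this to busy-wait for one 'tick' (the cycle count per tick has to be calibrated first):

unsigned int start, now;
unsigned int cycles_per_tick; /* set this to the calibrated TSC cycles per tick before the loop */
rdtscl(start);
do {
rop();
rdtscl(now);
} while ((now - start) < cycles_per_tick);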
http://leaf.dragonflybsd.org/~gary

“The two most common elements in the universe are hydrogen and stupidity.”
Reply
#93
(01-22-2010, 09:03 PM)d4f Wrote:  From my experience fps drops from 1000 to 900 are not noticeable, since that's still over 9x faster than the max. client update rate and well below the network latency variation. However, I agree that stable fps can easily be achieved without a lib, especially since the libs don't really change anything but simply use somewhat more precise equivalents of the commands.

v2 follows a completely different approach: it tries to reduce CPU usage and syscalls as well as produce (mostly, not always) high-precision nanosleep, rather than correct timing issues from past runs.

You cannot have low CPU usage when the engine is calling nanosleep fps_max times per second.

Calling nanosleep with low latency eats up large amounts of CPU. You cannot have low CPU usage when you're trying for precision. Precision = more CPU usage.

Instead of trying to 'hack' in stable FPS, try fixing the engine bugs, because there are so many. I still don't know why people care about stable FPS when the engine only uses 1 frame for each tick on CSS.
Reply
#94
(01-06-2010, 03:49 AM)Rom1 Wrote:  
(01-06-2010, 03:34 AM)saintjimmygd Wrote:  On my SRCDS server, without this lib, the FPS fluctuates from 900 to 1000 (RT kernel 2.6.31.6).
With the lib installed, it makes no difference! In gameplay it really doesn't matter whether you use it or not.
With FPS=4000 it still fluctuates from 3700 to 4000 and the pings are the same as before.
So why should I use it? Really, is there any difference?


Playing at 1000 or 4000 fps makes almost no difference. You're right !
Stability is the most important thing.

This lib makes a lot of clock_gettime calls on the monotonic source. That can be a performance killer!
With gettimeofday, it's slower on a 64-bit kernel and twice as fast on a 32-bit one.
For usleep it makes a lot of gettime calls, which slow down the calculation.

So don't use it as it is for now. Wait for a new, cleaner version using rdtsc and another algorithm.

clock_gettime uses the same backends as gettimeofday, and it's served from the vDSO page that the kernel maps into every process.

gettimeofday is faster to read on a 64-bit kernel than on a 32-bit one. If usleep calls clock_gettime, which is another syscall, there's another 300us on the fast path (it's more like 600: the sysenter into the kernel, then the sysexit).

64bit kernel:
gettimeofday ( 79716us / 1000000runs ) = 0.079716us

32bit kernel:
gettimeofday ( 1951527us / 1000000runs ) = 1.951527us

64bit kernel + 32 bit test code

gettimeofday ( 80232us / 1000000runs ) = 0.080232us

64bit kernel + FAL patch (experimental kernel patch written by me)

gettimeofday ( 50191us / 1000000runs ) = 0.050191us

The 64-bit kernel wins, even with 32-bit test code.
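
The test code behind these numbers was not posted, so the following is only a minimal sketch of how a per-call figure like the ones above could be measured. The function name gettimeofday_cost_us and the use of clock_gettime(CLOCK_MONOTONIC) as the stopwatch are assumptions made for illustration, not the original benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: call gettimeofday() `runs` times and return the
 * average cost of one call in microseconds. */
double gettimeofday_cost_us(long runs)
{
    struct timespec start, end;
    struct timeval tv;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < runs; i++)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Total elapsed time in microseconds. */
    double total_us = (end.tv_sec - start.tv_sec) * 1e6
                    + (end.tv_nsec - start.tv_nsec) / 1e3;
    return total_us / (double)runs;
}
```

Calling gettimeofday_cost_us(1000000) and printing the result with printf("%fus\n", ...) gives a number comparable to the right-hand side of the lines above. The loop overhead is included in the result, so very small differences between runs should not be over-interpreted.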
Reply
#95
(01-30-2010, 06:18 PM)Monk Wrote:  Name the function to do what? read time from the TSC?

You are forgetting some things. Maybe you can read out the TSC directly without a syscall. But your server will be at 100% CPU load if you don't call some sleep function (or at least sched_yield) while you are waiting. And all those functions are syscalls.

But even if you live with 100% load, you will effectively have "syscalls" in your time-critical sections. If something at a higher priority needs the CPU (e.g. a driver), the kernel will suspend srcds and give the driver some CPU time. That is a context switch just like any syscall.

Also, your last post shows that a syscall can be as fast as 0.05us. What is the point of avoiding 0.05us?

I have done some other measurements: with an "appropriate" kernel (rt or ck patches, configured according to my HOWTO) and SCHED_FIFO, the "wakeup" latency (i.e. how much longer than requested the sleep function sleeps) is around 9us (RMS value). There is no need to make this any better. 9us is far below any fluctuations in the network latencies, and far below the 0.5ms average fluctuation from the 1000 fps.
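
For readers who want to check the wakeup latency on their own box, here is a rough sketch of such a measurement. It is not the tool used for the 9us figure: the struct oversleep_stats and the function measure_oversleep are hypothetical names, it reports mean and worst-case oversleep rather than an RMS value, and it does not switch the process to SCHED_FIFO (that would have to be done separately, e.g. with chrt).

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

struct oversleep_stats {
    int samples;     /* number of sleeps measured */
    double mean_us;  /* average oversleep in microseconds */
    double max_us;   /* worst oversleep in microseconds */
};

/* Sleep `sleep_us` microseconds `iterations` times and record how much
 * longer than requested each sleep actually took. */
struct oversleep_stats measure_oversleep(long sleep_us, int iterations)
{
    struct oversleep_stats st = { 0, 0.0, 0.0 };
    struct timespec req = { sleep_us / 1000000L, (sleep_us % 1000000L) * 1000L };
    double sum_us = 0.0;

    for (int i = 0; i < iterations; i++) {
        struct timespec before, after;

        clock_gettime(CLOCK_MONOTONIC, &before);
        nanosleep(&req, NULL);
        clock_gettime(CLOCK_MONOTONIC, &after);

        double elapsed_us = (after.tv_sec - before.tv_sec) * 1e6
                          + (after.tv_nsec - before.tv_nsec) / 1e3;
        double over_us = elapsed_us - (double)sleep_us;

        sum_us += over_us;
        if (i == 0 || over_us > st.max_us)
            st.max_us = over_us;
        st.samples++;
    }
    if (st.samples > 0)
        st.mean_us = sum_us / st.samples;
    return st;
}
```

A call such as measure_oversleep(1000, 1000) roughly mimics a server sleeping once per frame at 1000 fps. The measured value also contains the cost of the two clock_gettime calls, which is small compared to the latencies discussed here.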

So, Monk, if you say other bugs in the engine should be fixed, I completely agree with you. But how are you planning to do this? By changing some syscalls or system library functions we can't really change the engine...
Reply
#96
(01-31-2010, 08:40 PM)BehaartesEtwas Wrote:  
(01-30-2010, 06:18 PM)Monk Wrote:  Name the function to do what? read time from the TSC?

You are forgetting some things. Maybe you can read out the TSC directly without a syscall. But your server will be at 100% CPU load if you don't call some sleep function (or at least sched_yield) while you are waiting. And all those functions are syscalls.

But even if you live with 100% load, you will effectively have "syscalls" in your time-critical sections. If something at a higher priority needs the CPU (e.g. a driver), the kernel will suspend srcds and give the driver some CPU time. That is a context switch just like any syscall.

Also, your last post shows that a syscall can be as fast as 0.05us. What is the point of avoiding 0.05us?

I have done some other measurements: with an "appropriate" kernel (rt or ck patches, configured according to my HOWTO) and SCHED_FIFO, the "wakeup" latency (i.e. how much longer than requested the sleep function sleeps) is around 9us (RMS value). There is no need to make this any better. 9us is far below any fluctuations in the network latencies, and far below the 0.5ms average fluctuation from the 1000 fps.

So, Monk, if you say other bugs in the engine should be fixed, I completely agree with you. But how are you planning to do this? By changing some syscalls or system library functions we can't really change the engine...

100% load? No, you wrap it in a function and delay it with nops, so it doesn't suck up CPU. It's way more precise than waiting on the kernel. But it's overkill, just like using RT kernels for game servers. The engine has bugs, like calling expensive, deprecated APIs from userland.
Reply
#97
there is very little difference between a syscall and a context switch performed by the scheduler. so either you have 100% cpu load or you have context switches. if you spin on nops etc., you have no control over when the kernel takes the cpu away from you. so doing some controlled sleep is actually better, because you give the kernel the time it wants while you aren't doing anything, but can have the full cpu when you need it (sched_fifo with a high priority gives you the cpu without interruption long enough to render a frame, but not to play a war).
as I wrote, the precision of kernel sleeps is sufficient on appropriate kernels. but if you want to improve it, why don't you sleep a little shorter than required and do the rest of the waiting using rdtsc? i.e. sleep only 900us and busy-wait 100us. that could help, maybe...
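
The "sleep a little shorter, then busy-wait the rest" idea could look roughly like the sketch below. Several things here are assumptions for illustration: clock_gettime(CLOCK_MONOTONIC) stands in for rdtsc so that no TSC calibration is needed, and the names monotonic_ns and hybrid_wait_us are made up. The 900us/100us split from the post corresponds to hybrid_wait_us(1000, 100).

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static long long monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Wait `total_us` microseconds: sleep in the kernel for all but the last
 * `spin_us` microseconds, then busy-wait until the deadline.
 * Returns the time actually spent waiting, in nanoseconds. */
long long hybrid_wait_us(long total_us, long spin_us)
{
    long long start = monotonic_ns();
    long long deadline = start + (long long)total_us * 1000LL;

    if (total_us > spin_us) {
        long long sleep_ns = (long long)(total_us - spin_us) * 1000LL;
        struct timespec req = {
            .tv_sec  = (time_t)(sleep_ns / 1000000000LL),
            .tv_nsec = (long)(sleep_ns % 1000000000LL)
        };
        nanosleep(&req, NULL);  /* coarse part: give the CPU back */
    }

    while (monotonic_ns() < deadline)
        ;                       /* fine part: spin until the deadline */

    return monotonic_ns() - start;
}
```

If the kernel sleep overshoots by more than the spin margin, the loop exits immediately and the wait is simply late, so the margin has to be larger than the typical wakeup latency of the kernel in use.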

replacing expensive APIs might help of course. but I wouldn't see every syscall as a deprecated API. virtually every low-level function has to run in the kernel, and that is not a problem. usleep and gettimeofday are certainly not deprecated. and tampering with gettimeofday can be dangerous because it is (almost) our *only* instrument for measuring the quality (i.e. the stats command). it will not improve the quality if we save a call that takes 0.05us one or two times per 1000us (i.e. per frame).
Reply
#98
Busy waiting is what the code above does. I pad it with nops in assembly to reduce CPU consumption. As I said before, it's overkill.

In the end, the Windows binaries are way more optimized than the Linux binaries. I just wish one day Valve would dump this server fps crap and use what Quake ha[d,s]: sv_fps.
Reply
#99
I am not familiar with the quake engine... what is sv_fps?
Reply
Well, all these boosters gave me a lot of errors and server interruptions. I hope the next version will be better.

About my system: I had an AMD Athlon X64 dual core 2.3 GHz with 2 GB RAM. This was my result with the 2.6.31.12-rt20 kernel:
http://fpsmeter.org/p,view;43665.html

I switched my HLDS-based server to a different machine, an Intel P4 2.8 GHz with 2 GB RAM, with kernel 2.6.32-ck2, no boosters, only resched.sh and one idler:
http://fpsmeter.org/p,view;45879.html

I have a few questions for BehaartesEtwas. This is resched.sh:
Quote:
#!/bin/sh

PIDS=`ps ax | grep sirq-hrtimer | grep -v grep | sed -e "s/^ *//" -e "s/ .*$//"`
for p in $PIDS; do
    chrt -f -p 99 $p
done

PIDS=`ps ax | grep sirq-timer | grep -v grep | sed -e "s/^ *//" -e "s/ .*$//"`
for p in $PIDS; do
    chrt -f -p 51 $p
done

PIDS=`pidof srcds_i686`
for p in $PIDS; do
    chrt -f -p 98 $p
done

PIDS=`pidof srcds_i486`
for p in $PIDS; do
    chrt -f -p 98 $p
done

PIDS=`pidof srcds_amd`
for p in $PIDS; do
    chrt -f -p 98 $p
done

PIDS=`pidof hlds_i686`
for p in $PIDS; do
    chrt -f -p 98 $p
done

PIDS=`pidof hlds_i486`
for p in $PIDS; do
    chrt -f -p 98 $p
done

PIDS=`pidof hlds_amd`
for p in $PIDS; do
    chrt -f -p 98 $p
done

You said
Quote:Note: sirq-hrtimer seems to depend on the distribution or exact kernel version, I have also seen softirq-hrtimer & ksoftirqd. Try running:
ps ax | grep hrtimer
There should be something like [softirq-hrtimer/N] for each CPU (N is a number). Strip the brackets and the /N and use what is left...

I don't understand. I typed ps ax | grep hrtimer and it gave me this:
Quote:5267 pts/0 S+ 0:00 grep hrtimer

So, do I have to change something in the resched.sh file?
Also, what more can I do to stabilize my FPS? The second graph was "my lucky 5 minutes"; in reality I have drops from 1000 to 200 :(
The RT and CK patches behave the same on my machine. So what else should I do to improve the quality of my CS 1.6 server?
I use tsc; hpet isn't available in my BIOS. Also, it's Debian Linux x86.
Reply
(02-06-2010, 01:02 AM)saintjimmygd Wrote:  I don't understand. I typed ps ax | grep hrtimer and it gave me this:
Quote:5267 pts/0 S+ 0:00 grep hrtimer

ck patches don't have hrtimer threads, so there is no way to set their priority. The corresponding parts of resched.sh will simply have no effect.
Reply
Hello, I need help.

I need a start script for this LD_PRELOAD with screen (start.sh).
Reply
Screen doesn't work directly with LD_PRELOAD.
Write a little wrapper script.
srcds_wrapper:
Code:
#!/bin/bash
FPS=2000 LD_PRELOAD=~/boost.so ./srcds_run "$@"
Then do "chmod u+x srcds_wrapper".
Start your server with "screen -mdS screenname ./srcds_wrapper -game cstrike -ip xxx.xxx.xxx.xxx -port xxxx -maxplayers xx +map xxx"
If you are using your own start script, search for "srcds_run" in your script and replace it with "srcds_wrapper".

If you are using a web interface, you can rename the symbolic link srcds_run to srcds_start and create a new srcds_run in your server directory:
Code:
#!/bin/bash
FPS=2000 LD_PRELOAD=~/boost.so ./srcds_start "$@"
Then do "chmod u+x srcds_run".
Start your server with "screen -mdS screenname ./srcds_run -game cstrike -ip xxx.xxx.xxx.xxx -port xxxx -maxplayers xx +map xxx"

Note: Please use the correct path to the boost lib. Mine is in my home directory.
Reply
Hello,

Quote:okay thanks ;D
Reply
(02-08-2010, 06:52 PM)BehaartesEtwas Wrote:  
(02-06-2010, 01:02 AM)saintjimmygd Wrote:  I don't understand. I typed ps ax | grep hrtimer and it gave me this:
Quote:5267 pts/0 S+ 0:00 grep hrtimer

ck patches don't have hrtimer threads, so there is no way to set their priority. The corresponding parts of resched.sh will simply have no effect.

On ck it's ksoftirqd instead of hrtimer, so the pattern has to be "grep ksoftirqd":

ps ax | grep ksoftirqd:

3 ? S 0:00 [ksoftirqd/0]
4 ? S 0:00 [ksoftirqd/1]
31591 pts/1 S+ 0:00 grep ksoftirqd

so just modify the appropriate part in resched.sh
Reply

