Boosting SRCDS for linux without a RT kernel
#16
(10-17-2011, 05:52 PM)Monk Wrote:  Adding 'nice' to a process on top of 'chrt' is irrelevant. In linux_sched.c, chrt gives more of a 'benefit' to process scheduling than 'nice' ever could. In my own tests, a process under SCHED_FIFO (unixbyte, dd and others) shows no noticeable change with nice values anywhere from -20 to 20. I haven't found any cases where nice is even used anymore; personally, I think it's ancient cruft from the days of SysV/ancient BSD systems.

You should simply remove it and use say, ionice.

Nice catch. I suspected as much but didn't bother to check inside linux_sched. This will be updated. Cheers!

Regarding ionice, it is already in use (check [[ "$CFQ" ...).
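
For reference, this is roughly the shape of the two calls involved; the PID variable and the priority values here are placeholders of mine, the script resolves its own:

# Rough sketch only: put srcds under SCHED_FIFO, then set its I/O class.
# $SRCDS_PID and priority 98 are illustrative, not what the script ships with.
chrt -f -p 98 "$SRCDS_PID"       # SCHED_FIFO with static priority 98
ionice -c2 -n0 -p "$SRCDS_PID"   # best-effort class, highest priority level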

Also, Monk, to keep this discussion going ... regarding processor affinity: the script currently pins all srcds threads to one core. Still, Source logs show that there is at least one worker pool made of 4 threads (presumably because there are four cores on my box). Would it make sense to spread those threads away from core 3, where srcds_run and its main thread would stay pinned? The thing is, other than pseudo-PID ordering, knowing which thread is the main one might get a bit tricky.
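
Something like this sketch is what I have in mind (the lowest-TID-is-the-main-thread assumption is exactly the pseudo-PID ordering guess, not verified; $SRCDS_PID is a placeholder):

# Hypothetical sketch: keep the lowest TID (presumably the main thread)
# on core 3, spread every other thread over cores 0-2.
MAIN_TID=$(ls /proc/"$SRCDS_PID"/task | sort -n | head -n 1)
taskset -cp 3 "$MAIN_TID"
for TID in $(ls /proc/"$SRCDS_PID"/task | sort -n | tail -n +2); do
    taskset -cp 0-2 "$TID"
done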

Thanks for your input!

#17
On older systems, CPU0 would always service interrupts, rdtsc, LAPIC, etc. Newer designs allow these to exist on all CPUs.
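
You can see how that plays out on any given box at a glance:

# Per-CPU interrupt counts; the header row shows one column per CPU
head -n 20 /proc/interrupts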

Honestly, I don't even bother to set affinity. Task switching between CPU{0-8} on modern CPUs costs a few ns at most, depending on your setup. It's a pessimistic optimization, and the only applications that need it are near-nanosecond-sensitive ones like realtime workloads; but most realtime systems aren't even x86, they're ARM or other embedded CPUs.

You could get creative and remove all of the asm("hlt")s from the kernel, which would stop it from ever entering HLT mode. That would reduce wakeup/locking latency, so you may see a benefit there.


http://leaf.dragonflybsd.org/~gary

“The two most common elements in the universe are hydrogen and stupidity.”

#18
(10-18-2011, 11:19 AM)Monk Wrote:  Honestly, I don't even bother to set affinity. Task switching between CPU{0-8} on modern CPUs costs a few ns at most, depending on your setup. It's a pessimistic optimization, and the only applications that need it are near-nanosecond-sensitive ones like realtime workloads ...

I believe the point of setting affinity in 2011, on a PC, is to minimize L1/L2 cache thrashing and TLB flushing. A process pinned to a single core means less cache churn. Here's what makes me believe it's still relevant:

Architecture matters. A dual-socket box (2x dual core with hyperthreading) isn't the same as a single-socket one (1x quad core with hyperthreading); I suppose you meant the latter. Maybe you would agree that in the first case setting affinity makes sense, since not even the L3 caches are shared.
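
To be concrete, the socket layout is easy to check (sysfs layout as found on current kernels):

# Which physical package (socket) each CPU belongs to
for c in /sys/devices/system/cpu/cpu[0-9]*; do
    echo "$c -> package $(cat "$c"/topology/physical_package_id)"
done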

AFAIK, you only get one contiguous chunk of data at a time when talking to the memory controller, be it on the CPU or in the northbridge. Any kind of memory-controller contention will only make the TLB/cache refill period even longer after a task migrates from one core to another. If that ever errs towards the 3ms-10ms side, we're in for a "LAG SPIIIIKE" on the srcds side.

Interesting figures and benchmarks in there: Performance Impact of Resource Contention in Multicore Systems. I'm not claiming it proves my point, but it at least shows that what they call task density, together with resource contention, is still very much a present-day problem, and not something only relevant to RT systems.
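
If I find the time, a crude way to settle this would be to compare counters between a pinned and an unpinned run. A sketch of what I mean (launch arguments and event availability on this CPU/kernel are assumptions):

# Hypothetical A/B test: run the server pinned, sample its counters for a minute,
# then repeat without the taskset prefix and compare migrations/cache misses.
taskset -c 3 ./srcds_run -game cstrike &
perf stat -e cpu-migrations,context-switches,cache-misses -p $! sleep 60
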
#19
Err. A TLB miss is not 3 to 10 ms. The TLB is controlled by the MMU, and TLB misses cost between 5 and 100 clock cycles, which is a few nanoseconds depending on the CPU speed (100 cycles at 3 GHz is about 33 ns); certainly not in the ms range. If a TLB miss were in the ms range, the x86 architecture would be unusable for just about anything at all.

Memory-controller contention has nothing to do with any 'TLB cache / refresh period', because the MMU controls the TLB, which lives on the CPU itself. The refresh period is controlled by the OS, not the hardware: programs in the virtual address space are mapped by the kernel, which hints to the TLB via the CR3 register.

Here are the TLB-flushing prototypes in the Linux kernel:

void flush_tlb_all(void);                        /* flush the TLB on every CPU */
void flush_tlb_mm(struct mm_struct *mm);         /* flush entries for one address space */
void flush_tlb_range(struct vm_area_struct *vma, /* flush a page range within one VMA */
                     unsigned long start, unsigned long end);

There are others, such as the COW (copy-on-write) prototypes that touch the cache to hint at the hardware, etc.

I think you're grossly overstating how much 'performance' (?) a gameserver will lose if a) affinity is allowed to operate on all CPUs, or b) a TLB hit/miss occurs. Personally, I've seen no evidence at all that any of this affinity business helps anything.

http://leaf.dragonflybsd.org/~gary

“The two most common elements in the universe are hydrogen and stupidity.”

#20
I guess it's best to try both options. Especially since srcds is now multithreaded, binding it to one single core will certainly be counter-productive for large servers. For small servers it might help in cases where one server takes all the CPU and therefore blocks the others, but then something is already wrong in the first place...
http://www.fpsmeter.org
http://wiki.fragaholics.de/index.php/EN:Linux_Optimization_Guide (Linux Kernel HOWTO!)
Do not ask technical questions via PM!
#21
(10-10-2011, 07:27 PM)mrzor Wrote:  
(10-10-2011, 03:21 PM)sarmadsoomro Wrote:  This is not working in CentOS.
Can you guide me on how to make it run?
Thanks

What is not working?

On CentOS: how do I make it run?

