Show HN: Ray Hosting – Topology-aware game server orchestrator made from scratch

https://ray-hosting.com/en-US
2•bardhyliis•1h ago
Hey HN, I have built a game server orchestrator from scratch. As a solo dev it took me 3+ years of almost 10 hours a day to finally complete it, since I started at the beginning of 2023. I'm 26 years old now!

I couldn't have imagined, even in my dreams, the complexity and the amount of research this project would take, but hey, here it is: my greatest professional achievement so far.

Below I will try to break down some of the core and most important features of my game server orchestrator.

1. CORE PINNING & CCD CACHE ALIGNMENT

I had to research and understand CPU cache layouts. I found out that if my game containers, which are started with docker run, span different core complex dies (CCDs) or share SMT sibling threads with a busy neighbor, cache thrashing (the L3 across CCDs, the per-core caches between SMT siblings) ruins single-core tick efficiency.

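So I confined all non-game-server processes to core 0 and its SMT sibling, thread 12, by isolating every other thread with kernel boot parameters in GRUB.

I used nohz_full to stop the periodic scheduler tick (up to 1000 Hz, depending on the kernel config) on the isolated threads whenever a single task is running there, which avoids needless interruptions that would pollute the caches.

I also offloaded RCU callback processing away from the game threads (rcu_nocbs) so that it is handled on threads 0 and 12, avoiding micro-interruptions and leaving as close to 100% of the performance as possible to the game containers: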

  GRUB_CMDLINE_LINUX_DEFAULT="nomodeset isolcpus=1-11,13-23 nohz_full=1-11,13-23 rcu_nocbs=1-11,13-23"
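
A quick sanity check for the CPU lists on that kernel command line can be sketched in Python. This is only an illustration: the helper names parse_cpu_list and housekeeping_cpus are made up here, and the 24-thread total is taken from the core 0 / thread 12 pairing described in the post.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand a kernel CPU list such as "1-11,13-23" into a set of thread ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def housekeeping_cpus(isolated_spec: str, total_threads: int) -> list[int]:
    """Threads left to the OS: every thread that is not listed as isolated."""
    isolated = parse_cpu_list(isolated_spec)
    return [cpu for cpu in range(total_threads) if cpu not in isolated]


# With the list from the GRUB line, only thread 0 and its sibling 12 remain.
print(housekeeping_cpus("1-11,13-23", 24))  # [0, 12]
```

The same list is passed to isolcpus, nohz_full and rcu_nocbs, so one parser covers all three parameters.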

As for the game containers, as I mentioned, I use docker run directly, since swarm is not needed here and would actually be bad design. The orchestrator service uses an algorithm to calculate which core on which CCD is the best place to pin each game server container:

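  // Zen 4 core complex die (CCD) mapping in C#
  // Assumes 2 CCDs and that threads i and i + N/2 are SMT siblings (N = totalHardwareThreads)
  int siblingOffset = totalHardwareThreads / 2;  // number of physical cores, e.g. 12
  int coresPerCcd = siblingOffset / 2;           // physical cores per CCD, e.g. 6
  // CCD that owns hardware thread i
  int getCcdId(int i) => ((i % siblingOffset) < coresPerCcd) ? 0 : 1;
  // SMT sibling of hardware thread i
  int getSibling(int i) => (i < siblingOffset) ? (i + siblingOffset) : (i - siblingOffset);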
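
To show how such a mapping can drive placement, here is a minimal Python sketch. It ports the two helpers as written and adds a scoring rule that is purely an assumption for illustration (idle SMT sibling first, then the least loaded CCD); pick_thread, busy and reserved are hypothetical names, not the orchestrator's actual algorithm.

```python
from typing import Optional


def ccd_id(thread: int, total_threads: int) -> int:
    """CCD that owns a hardware thread (2 CCDs assumed, as in the C# snippet)."""
    sibling_offset = total_threads // 2
    cores_per_ccd = sibling_offset // 2
    return 0 if (thread % sibling_offset) < cores_per_ccd else 1


def sibling(thread: int, total_threads: int) -> int:
    """SMT sibling of a hardware thread."""
    sibling_offset = total_threads // 2
    if thread < sibling_offset:
        return thread + sibling_offset
    return thread - sibling_offset


def pick_thread(busy: set[int], total_threads: int, reserved: set[int]) -> Optional[int]:
    """Return the free thread with the lowest contention score, or None."""

    def score(thread: int) -> tuple[bool, int, int]:
        # Hypothetical ranking: avoid a busy SMT sibling, then prefer the
        # CCD with fewer busy threads, then the lowest thread id.
        sibling_busy = sibling(thread, total_threads) in busy
        own_ccd = ccd_id(thread, total_threads)
        ccd_load = sum(1 for t in busy if ccd_id(t, total_threads) == own_ccd)
        return (sibling_busy, ccd_load, thread)

    free = [t for t in range(total_threads) if t not in busy and t not in reserved]
    return min(free, key=score) if free else None


# Threads 1 and 2 (CCD 0) are taken, 0 and 12 are reserved for the OS:
print(pick_thread({1, 2}, 24, {0, 12}))  # 6
```

In this example thread 6 wins because it sits on the idle CCD 1 and its sibling, thread 18, is free.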
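
I also set the memory limit and the memory reservation to the same value (--memory == --memory-reservation), so every game container has a fixed, guaranteed amount of RAM and cannot grow at the expense of its neighbours, which avoids the noisy-neighbour problem. Note that the reservation on its own is only a soft limit; what actually blocks swap usage for a container is setting --memory-swap to the same value as --memory.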
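
A sketch of how the pinning and memory flags could be assembled for docker run, in Python. The function name, container name and image are hypothetical. The --memory-swap flag is an addition that the post does not mention: Docker documents that setting it to the same value as --memory leaves the container without swap access.

```python
def docker_run_args(name: str, image: str, cpu_set: list[int], memory_mb: int) -> list[str]:
    """Build a docker run argv with a pinned cpuset and a fixed memory budget."""
    mem = f"{memory_mb}m"
    return [
        "docker", "run", "-d",
        "--name", name,
        # Pin the container to the chosen hardware threads.
        "--cpuset-cpus", ",".join(str(cpu) for cpu in sorted(cpu_set)),
        # Hard limit and soft reservation set to the same value.
        "--memory", mem,
        "--memory-reservation", mem,
        # Assumption, not from the post: equal to --memory, so no swap is available.
        "--memory-swap", mem,
        image,
    ]


args = docker_run_args("mc-1", "example/minecraft:latest", [18, 6], 4096)
print(" ".join(args))
```

Passing the list to subprocess.run(args, check=True) would start the container without going through a shell.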

Since the orchestrator always looks for the most performant threads for each game server, the CPU of the host node gets fragmented over time. For this case I have an algorithm that simulates the best placement for every running game container on the host node and then relocates some or all of the containers dynamically, live, without restarting a container or disconnecting any active player, using:

  docker update --cpuset-cpus="{cpuSet}" {containerName}
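
The simulate-then-relocate idea can be sketched as a small packing problem. The Python below is an assumption-laden illustration, not the real algorithm: it uses best-fit decreasing to keep each container on a single CCD, and the container names, thread counts and the ccd_threads layout are invented for the example.

```python
def plan_relocation(containers: dict[str, int],
                    ccd_threads: dict[int, list[int]]) -> dict[str, list[int]]:
    """Best-fit decreasing: biggest containers first, each kept on one CCD."""
    free = {ccd: list(threads) for ccd, threads in ccd_threads.items()}
    plan: dict[str, list[int]] = {}
    for name, wanted in sorted(containers.items(), key=lambda kv: (-kv[1], kv[0])):
        fitting = [ccd for ccd, threads in free.items() if len(threads) >= wanted]
        if not fitting:
            raise ValueError(f"no single CCD can hold {name}")
        # Tightest CCD that still fits, so large gaps stay available.
        ccd = min(fitting, key=lambda c: (len(free[c]), c))
        plan[name] = free[ccd][:wanted]
        free[ccd] = free[ccd][wanted:]
    return plan


def update_commands(current: dict[str, list[int]],
                    plan: dict[str, list[int]]) -> list[str]:
    """Emit docker update commands only for containers whose cpuset changes."""
    commands = []
    for name in sorted(plan):
        if sorted(current.get(name, [])) != sorted(plan[name]):
            cpu_set = ",".join(str(t) for t in sorted(plan[name]))
            commands.append(f'docker update --cpuset-cpus="{cpu_set}" {name}')
    return commands


ccds = {0: [1, 2, 3, 4], 1: [6, 7, 8, 9]}
current = {"rust-b": [1, 2, 3], "cs-c": [4, 6], "mc-a": [8, 9]}  # cs-c spans both CCDs
plan = plan_relocation({"mc-a": 2, "rust-b": 3, "cs-c": 2}, ccds)
print(update_commands(current, plan))
# ['docker update --cpuset-cpus="6,7" cs-c']
```

Only the container that straddles both CCDs is moved; the other two already match the simulated plan and are left alone.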

2. EBPF/XDP + NFTABLES DDOS PROTECTION

Game servers get constantly bombarded by DDoS attacks, from bots or from people targeting them specifically for many different reasons; it could be what's called a script kiddie or sometimes even salty gamers, xd.

In the beginning I tried to use UFW but ended up getting rid of it since it conflicts with Docker (Docker manages its own iptables rules for published ports, so UFW rules do not apply to them). It took me quite some time to realize this because I was still researching how things work at the network level.

To get the best protection I decided on specific, per-port connection rate limits. If a limit is hit, the offender's IP is put on a blacklist with a specific timer and immediately registered in the eBPF map as well. The IPs are added dynamically and are removed from each list/map when the ban expires.
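
The limit-then-ban flow can be modelled in user space to make the logic concrete. This Python sketch is only a model under assumptions: in the real setup the counting happens in nftables and the drop in the eBPF map, and the class name, the fixed-window counter, the port and the numbers below are all hypothetical.

```python
class PortRateLimiter:
    """Fixed-window per-port limiter with a timed blacklist (illustrative model)."""

    def __init__(self, limits: dict[int, int], window_s: float = 1.0, ban_s: float = 300.0):
        self.limits = limits          # port -> max new connections per window
        self.window_s = window_s
        self.ban_s = ban_s
        self.counters: dict[tuple[str, int], tuple[float, int]] = {}
        self.banned: dict[str, float] = {}   # ip -> time at which the ban expires

    def expire(self, now: float) -> None:
        """Drop bans whose timer has run out."""
        for ip in [ip for ip, until in self.banned.items() if until <= now]:
            del self.banned[ip]

    def allow(self, ip: str, port: int, now: float) -> bool:
        self.expire(now)
        if ip in self.banned:
            return False
        start, count = self.counters.get((ip, port), (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0          # a new window begins
        count += 1
        self.counters[(ip, port)] = (start, count)
        limit = self.limits.get(port)      # ports without a rule are not limited
        if limit is not None and count > limit:
            self.banned[ip] = now + self.ban_s
            return False
        return True


limiter = PortRateLimiter({25565: 3}, window_s=1.0, ban_s=10.0)
results = [limiter.allow("203.0.113.7", 25565, t) for t in (0.0, 0.25, 0.5, 0.75)]
print(results)  # [True, True, True, False]
```

The fourth connection inside the same second exceeds the limit of 3, so the address is banned until t = 10.75 and is accepted again afterwards.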

There is an edge case though: a lot of games have many different mods and plugins which can increase the packet rate. Even though I have tried my best to account for this in the default rate limit rules, I also allow game server owners to adjust these limits if needed, Cloudflare-style, by providing 4 profiles: Standard, Loose, Strict, UnderAttack.

I have optimized the Standard one as best as I could, based on real-life data, and it should be enough for 99% of the servers. The other profiles can be used in rarer cases, for heavily modded servers for example.
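
One simple way to model the four profiles is as multipliers on a base limit. The Python sketch below is hypothetical throughout: the multiplier values and the effective_limit helper are invented for illustration, and only the profile names come from the post.

```python
from enum import Enum


class Profile(Enum):
    STANDARD = "Standard"
    LOOSE = "Loose"
    STRICT = "Strict"
    UNDER_ATTACK = "UnderAttack"


# Hypothetical multipliers; the real values are not given in the post.
MULTIPLIER = {
    Profile.STANDARD: 1.0,
    Profile.LOOSE: 2.0,
    Profile.STRICT: 0.5,
    Profile.UNDER_ATTACK: 0.25,
}


def effective_limit(base_per_second: int, profile: Profile) -> int:
    """Scale a base per-port limit by the selected profile, never below 1."""
    return max(1, int(base_per_second * MULTIPLIER[profile]))


for profile in Profile:
    print(profile.value, effective_limit(200, profile))
```

A heavily modded server would pick Loose to double its budget, while UnderAttack cuts it to a quarter.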

So the best approach I found for DDoS mitigation is using nftables with per-port limits for each game server.

I have also bumped the rmem_max/wmem_max buffers to 16 MB so that game-container threads don't block when pushing large bursts of data, such as map data, into the socket buffers. By default these buffers are tiny, around 200 KB; with bigger buffers the player ticks are processed quicker.

Since users need to manage their game files (uploading, downloading, editing, deleting, etc.), I use FireQOS to prioritize game traffic. Game traffic gets the fast lane and is never throttled by whatever clients do in their file manager, which keeps the game free of ping spikes.

I also use TCP BBR congestion control instead of the Linux default, CUBIC. CUBIC is loss-based: it assumes that any packet loss between the game server and the player means network congestion, so it reduces the transmission speed, which in turn causes lag spikes and rubber-banding. BBR instead measures the actual bandwidth and round-trip time between the game server and the player and paces the packets at a rate the path can sustain, which avoids rubber-banding.

I also use fq (fair queueing) to prevent a single game server owner from using all the bandwidth, for example when someone decides to upload or download huge files.

  # BBR Congestion Control
  net.core.default_qdisc = fq
  net.ipv4.tcp_congestion_control = bbr

  # UDP/TCP Buffer Expansion
  net.core.rmem_max = 16777216
  net.core.wmem_max = 16777216
  net.core.rmem_default = 16777216
  net.core.wmem_default = 16777216
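
A small checker can confirm that a sysctl file really carries these values. The Python sketch below parses the text format used above; the helper names parse_sysctl and missing_settings are assumptions made for this example.

```python
EXPECTED = {
    "net.core.default_qdisc": "fq",
    "net.ipv4.tcp_congestion_control": "bbr",
    "net.core.rmem_max": str(16 * 1024 * 1024),   # 16 MiB = 16777216 bytes
    "net.core.wmem_max": str(16 * 1024 * 1024),
}


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and # or ; comments."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(text: str, expected: dict[str, str]) -> list[str]:
    """Return the keys that are absent or set to a different value."""
    actual = parse_sysctl(text)
    return sorted(key for key, value in expected.items() if actual.get(key) != value)


conf = """
# BBR Congestion Control
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
net.core.rmem_max = 16777216
"""
print(missing_settings(conf, EXPECTED))  # ['net.core.wmem_max']
```

On a live Linux host the same keys can be read from /proc/sys by replacing the dots with slashes, for example /proc/sys/net/core/rmem_max.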

3. SSR CACHE POISONING SOLUTION

To avoid Angular SSR cache poisoning I have two endpoints. The first, /graphql, serves public, read-only data that is cached directly on Cloudflare. This endpoint immediately rejects any request that carries an auth header, rejecting the entire request, in order to prevent cache poisoning and any state sharing between requests. The second endpoint, /secure, handles all authenticated data and does not cache anything.

All my web services (front end, API, database calls) also communicate over my private WireGuard mesh, which adds a layer of security. During SSR in Node.js I skip TLS handshakes entirely, since they add a bit of latency, by using the local Docker swarm network for direct access to my API.
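
The two-endpoint split can be expressed as a tiny routing policy. This Python sketch is a model under assumptions, not the actual implementation: the route_policy name, the 400 status, the Cache-Control values and treating Cookie as a credential header are all choices made for the example.

```python
# Headers treated as credentials; including "cookie" is an assumption.
CREDENTIAL_HEADERS = {"authorization", "cookie"}


def route_policy(path: str, headers: dict[str, str]) -> tuple[int, str]:
    """Return (status, Cache-Control) for a request, before any handler runs."""
    names = {name.lower() for name in headers}
    if path == "/graphql":
        if names & CREDENTIAL_HEADERS:
            # Reject outright so per-user data can never enter the shared cache.
            return 400, "no-store"
        return 200, "public, max-age=60"
    if path == "/secure":
        # Authenticated data is never cached.
        return 200, "private, no-store"
    return 404, "no-store"


print(route_policy("/graphql", {"Accept": "application/json"}))
print(route_policy("/graphql", {"Authorization": "Bearer abc"}))
print(route_policy("/secure", {"Authorization": "Bearer abc"}))
```

Because the public route refuses credentials instead of ignoring them, a cached response can never have been rendered with someone's session.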

-----

As I mentioned, I'm a solo dev and I'm bootstrapping this entirely out of my own pocket. I have two bare-metal nodes, one in Europe and the other in the central USA.

Today my goal is to see how my orchestrator handles real-world usage before I scale up, so I invite anyone to spin up a game server using my free trial and try to break my system.

Anyone who wishes can go directly to https://ray-hosting.com/en-US/free-trial and register to automatically claim the free trial. It requires a credit card though, solely for abuse protection. Or, if you don't want to put your card down, which is understandable, I can spin up a trial for you directly from my admin panel after you register so that you can test my system's abilities; just drop a comment here, since I will be watching the thread today. I would really love to hear honest thoughts and opinions on the architecture, deployment speed, or anything else you want to discuss.

PS: I'm not a native English speaker, so I had a hard time putting this together, lol. Btw, I do have a lot more to say about my platform, but for now this drained me. Lol, thank you very much for reading.
