Long-time lurker, many accounts, one at a time, no abuse. Hi. Yesterday's discussion about layer duplication and adjustment for popular open-weight models on Hugging Face led to this submission.
Since GPT ~3.5 it has been apparent that computers can simulate humans, as far as a computer is concerned. The dead-internet theory is said to have originated circa 2012, but I've had difficulty verifying that, even after searching archive.org.
All this turmoil is what makes offline, on-prem setups so important.
Here is the current hacker infra:
1. choose a platform, then decide on the linux kernel.
2. qemu/kvm maybe incus, or docker or lxc/lxd, point being: step two is isolate from the platform chosen in step one
3. choose to use kvm, because it has the largest development community. believe me: you don't want to get bogged down with the associated dependencies of bhyve or jails, or, for that matter, WSL2 (or 1).
4. mainstream, mainline, main, all day.
5. either libvirt or maybe just choose the easy option and run plain qemu.
6. the host os can run the virtual machine's kernel with very little overhead or added latency if the interactions with the host kernel are tailored.
7. after sorting everything out on the host (use incus?) pass through some bare metal to your pet, some performance cores and a couple dozen gigs of ram, maybe a 6000 gpu or two, some nics, doesn't matter, all the host hardware is passed through
8. except the hypervisor. is it xen? Can't be, not ideal, losing market share. Likely qemu/kvm = LINUX.
9. HYPERVISOR OPTIMUM ACHIEVED. Local and global minmax measured and optimal global apex...
10. profit???
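For anyone who wants steps 3 through 7 in concrete form, here is a rough sketch of the kind of qemu invocation they imply. It only builds and prints the command, it does not launch anything. The function names, the PCI address `0000:01:00.0`, and the image name `pet.qcow2` are placeholders I made up for illustration; the qemu flags themselves (`-enable-kvm`, `-cpu host`, `-smp`, `-m`, `-drive`, `-device vfio-pci`) are standard, but a real passthrough setup also needs IOMMU and vfio configured on the host, which this does not cover.

```python
import os
import shlex


def kvm_available(path="/dev/kvm"):
    """Return True if the KVM device node exists and is readable and writable."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def build_qemu_command(cores, ram_gib, pci_devices, disk_image):
    """Assemble a qemu-system-x86_64 argument list for a KVM guest.

    All arguments are illustrative; nothing here is taken from a real setup.
    """
    cmd = [
        "qemu-system-x86_64",
        "-enable-kvm",        # use KVM acceleration instead of pure emulation
        "-cpu", "host",       # expose the host CPU model to the guest
        "-smp", str(cores),   # number of virtual CPUs
        "-m", f"{ram_gib}G",  # guest RAM
        "-drive", f"file={disk_image},if=virtio,format=qcow2",
    ]
    # One vfio-pci device per host PCI address handed to the guest (GPU, NIC, ...).
    for addr in pci_devices:
        cmd += ["-device", f"vfio-pci,host={addr}"]
    return cmd


if __name__ == "__main__":
    # Placeholder values: 8 cores, 24 GiB RAM, one passed-through PCI device.
    cmd = build_qemu_command(8, 24, ["0000:01:00.0"], "pet.qcow2")
    print("KVM usable:", kvm_available())
    print(shlex.join(cmd))
```

Building the argument list in a function rather than a shell one-liner makes it easy to inspect before running, and libvirt would express the same thing as domain XML if you go that route in step 5.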
Comments
yjtpesesu2•1h ago
Oh, that's just the infra for the infra. Then use something like graphllm from matteo, and of course llama.cpp from greg, tailor your model selection to your hardware.
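A back-of-the-envelope way to do the "tailor to your hardware" part, as a sketch only: estimate weight memory from parameter count and bits per weight, then pick the biggest model that fits. The candidate labels, the 4.5 bits-per-weight figure, and the 1.2 overhead factor are all assumptions I picked for illustration, not numbers from any tool, and the estimate ignores context length entirely.

```python
def estimate_gib(params_billion, bits_per_weight, overhead=1.2):
    """Rough memory estimate in GiB: weights only, padded by a guessed overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30


def pick_model(candidates, budget_gib):
    """Return the largest candidate whose estimate fits the budget, or None."""
    fitting = [c for c in candidates if estimate_gib(c[1], c[2]) <= budget_gib]
    return max(fitting, key=lambda c: estimate_gib(c[1], c[2]), default=None)


# Hypothetical candidates: (label, parameters in billions, bits per weight).
CANDIDATES = [
    ("small-7b-q4", 7, 4.5),
    ("mid-13b-q4", 13, 4.5),
    ("large-70b-q4", 70, 4.5),
]

if __name__ == "__main__":
    for budget in (8, 24, 48):
        print(budget, "GiB ->", pick_model(CANDIDATES, budget))
```

Treat the result as a starting point and check actual memory use once the model is loaded.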
yjtpesesu2•1h ago