The request processing pipeline is split into small tasks. A single I/O worker accepts incoming requests. When a new request arrives, the I/O worker publishes a read-request task, which is then picked up by one of the generic workers. From there, the request continues through a series of smaller tasks: a query-processing task parses the query, an execution task performs the operation, and finally a write-response task sends the result back to the client. Each stage publishes the next task to the queue, and that task may be picked up by a different worker. The maximum number of workers is controlled by a configuration parameter, and simple logic retires idle workers and spawns new ones as the workload grows.
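The staged hand-off described above can be sketched roughly as follows. This is not Sklad's actual code, just a minimal illustration of the pattern: each stage does its piece of work and then publishes the next stage back onto a shared queue, so any idle worker can pick it up. The stage names and payloads are invented for the example.

```python
import queue
import threading

tasks = queue.Queue()
results = []

def read_request(req):
    # Stand-in for the I/O worker: accept a request and publish
    # the first pipeline task to the shared queue.
    tasks.put(("parse", req))

def parse_query(req):
    tasks.put(("execute", req.upper()))     # stand-in for query parsing

def execute(q):
    tasks.put(("respond", f"result:{q}"))   # stand-in for the operation

def respond(res):
    results.append(res)                     # stand-in for writing to the client

STAGES = {"parse": parse_query, "execute": execute, "respond": respond}

def worker():
    # Generic worker loop: pull whatever task is next, regardless of stage.
    while True:
        item = tasks.get()
        if item is None:        # sentinel: retire this worker
            tasks.task_done()
            break
        stage, payload = item
        STAGES[stage](payload)
        tasks.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

read_request("get key1")
tasks.join()                    # wait until every stage has completed
for _ in workers:               # retire the pool
    tasks.put(None)
for w in workers:
    w.join()

print(results)                  # prints ['result:GET KEY1']
```

Note that because every stage goes back through the queue, the three stages of a single request may run on three different workers; the queue is the only coordination point.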
Sklad also collects internal metrics such as request latency, task latency, queue wait time, and the number of pending memtables and active workers. Right now these metrics are exposed through a dedicated metrics request, but I’d like to use them as inputs for adaptive behavior in the engine, for example scaling the worker pool, deciding when to run compaction, and potentially tuning SSTable parameters such as memtable size or Bloom filter bits per key.
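As a sketch of what that adaptive behavior could look like, the policy below turns the metrics listed above into scaling and compaction decisions. Everything here is an assumption for illustration: the function names, the thresholds, and the idea that compaction should back off under high request latency are not part of Sklad today.

```python
# Hypothetical adaptive policies driven by Sklad's existing metrics.
# Thresholds are placeholders, not tuned values.

MAX_WORKERS = 16    # would come from the existing max-workers config parameter
MIN_WORKERS = 1

def scale_workers(active_workers: int, avg_queue_wait_ms: float) -> int:
    """Grow the pool when tasks queue up; retire a worker when idle."""
    if avg_queue_wait_ms > 5.0 and active_workers < MAX_WORKERS:
        return active_workers + 1
    if avg_queue_wait_ms < 0.5 and active_workers > MIN_WORKERS:
        return active_workers - 1
    return active_workers

def should_compact(pending_memtables: int, avg_request_latency_ms: float) -> bool:
    """Compact before memtables pile up, but not while the engine is
    already struggling to serve requests."""
    return pending_memtables >= 4 and avg_request_latency_ms < 10.0

print(scale_workers(4, 12.0))    # queue backing up -> grow to 5
print(scale_workers(4, 0.1))     # workers idle     -> shrink to 3
print(should_compact(5, 2.0))    # backlog, low latency -> True
```

The same shape of pure decision function could later be extended to SSTable tuning (memtable size, Bloom filter bits per key), with the metrics request doubling as a way to observe what the policy is doing.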