Standard benchmarks (like BEIR/MS MARCO) are great, but they are likely already in distribution for foundation models training sets, and crucially, they lack the complex, structured metadata needed to test real-world filtering scenarios (e.g., "Find docs from region X, between dates Y and Z, with tag A").
datasetFactory is an orchestrated LLM pipeline that turns a single natural language prompt into a (potentially) massive, structured evaluation dataset.
tacoooooooo•34m ago
datasetFactory is an orchestrated LLM pipeline that turns a single natural language prompt into a (potentially) massive, structured evaluation dataset.