I run a lot of headless Screaming Frog crawls on servers.
The main bottleneck is that while the SF CLI can consume configuration files (.seospiderconfig), it cannot produce them. If you want to run a crawl with complex settings (like custom extractions or specific excludes), you are forced to open the desktop GUI, configure the crawl manually, save the config, and upload it to the server.
You can't just script the file generation because the configs are serialized Java objects (binary blobs), not JSON or XML.
I decided to reverse engineer the format. A hex dump confirmed it was standard Java serialization. Instead of writing a fragile parser, I realized I could use the application's own JARs to handle the heavy lifting.
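For anyone curious what that hex dump check looks like: Java serialization streams always begin with the magic bytes 0xACED followed by stream version 0x0005, so the format is easy to spot. A few lines of Python are enough to confirm it (the filename is just a placeholder):

```python
# Java serialization streams begin with the magic 0xACED and stream
# version 0x0005 -- exactly what the hex dump showed.
with open("example.seospiderconfig", "rb") as f:  # placeholder filename
    header = f.read(4)

print("Standard Java serialization stream"
      if header == b"\xac\xed\x00\x05"
      else f"Unexpected header: {header.hex()}")
```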
I built two tools to solve this:
Python Library: Uses JPype to bridge Python to the local SF JARs. You can instantiate config objects, modify them (e.g., config.set_user_agent(...)), and serialize them back to disk. Great for Airflow/Python pipelines; a rough sketch of how the bridge works follows this list.
Java Utility: A standalone CLI tool to do the same thing if you prefer a native Java environment or don't want the Python overhead.
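To give a feel for the approach (this is not the library's actual API), here's a minimal sketch of the JPype bridge. The JAR path and the config class/setter names are assumptions for illustration only; SF's internal packages are undocumented, so the real identifiers differ:

```python
import jpype
import jpype.imports

# Assumption: SF JARs at a typical Linux install path; adjust for yours.
jpype.startJVM(classpath=["/usr/share/screamingfrogseospider/*"])

from java.io import FileOutputStream, ObjectOutputStream

# Hypothetical class name -- SF's package layout is internal, so treat
# this identifier (and the setter below) as placeholders.
SeoSpiderConfig = jpype.JClass("uk.co.screamingfrog.seospider.config.Config")

config = SeoSpiderConfig()
config.setUserAgent("my-pipeline-bot/1.0")  # placeholder setter

# Plain Java serialization writes the same binary blob the GUI saves.
out = ObjectOutputStream(FileOutputStream("generated.seospiderconfig"))
out.writeObject(config)
out.close()
```

The key design choice: because the application's own classes do the (de)serialization, the output stays byte-compatible with whatever the GUI produces, with no hand-written parser to break on the next SF release.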
What this enables:
True Headless Automation: Generate valid configs on the fly right before a crawl runs (first sketch after this list).
Diffing: Compare two binary config files to debug "config drift" (e.g., seeing exactly why a crawl limit changed); second sketch after this list.
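For the automation case, the flow is: generate the config in Python, then hand it straight to the headless CLI. A minimal sketch, assuming the generated.seospiderconfig from the earlier example and the documented SF CLI flags (verify them against your installed version):

```python
import subprocess

# Launch a headless crawl with the freshly generated config.
# Flags per the SF CLI docs (--crawl, --headless, --config,
# --output-folder); double-check against your SF version.
subprocess.run(
    [
        "screamingfrogseospider",
        "--crawl", "https://example.com",
        "--headless",
        "--config", "generated.seospiderconfig",
        "--output-folder", "/tmp/crawl-output",
    ],
    check=True,
)
```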
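And for diffing, one approach is to deserialize both blobs through the same JARs and compare whatever the objects expose. This sketch leans on toString(), which may or may not be informative for SF's classes; a real implementation would walk getters field by field. Filenames and the JAR path are placeholders:

```python
import difflib
import jpype
import jpype.imports

jpype.startJVM(classpath=["/usr/share/screamingfrogseospider/*"])  # assumed path

from java.io import FileInputStream, ObjectInputStream

def load_config(path):
    # readObject() resolves the SF classes because they're on the classpath.
    stream = ObjectInputStream(FileInputStream(path))
    try:
        return stream.readObject()
    finally:
        stream.close()

before = load_config("last-week.seospiderconfig")
after = load_config("today.seospiderconfig")

# str() maps to toString(); usefulness depends on SF's implementations.
print("\n".join(difflib.unified_diff(
    str(before).splitlines(), str(after).splitlines(),
    fromfile="last-week", tofile="today", lineterm="")))
```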
Feedback welcome—especially on the JPype implementation, as that was the trickiest part to stabilize!