I've been involved in a few different discussions about building something along these lines, but we never moved forward because we realized the effort of keeping up with new content across all public entities would be enormous. And without close communication with the various entities and governance solution providers, it would be very easy to become an annoyance and make them want to block our crawlers.
How do you all address such concerns, and handle throttling your data gathering in order to not become a problem for the original content providers?
codingdave•1h ago