The Tool: https://websiteaiscore.com/gist-compliance-check
The Context (The Paper): To understand the tool, you have to understand the problem Google is solving with GIST: redundancy is expensive. When generating an AI answer, the model cannot feed 10k search results into the context window because that costs too much compute. If the top 5 results are semantically identical (consensus content), the model wastes tokens processing duplicates.
The GIST algorithm solves this via Max-Min Diversity:
Utility Score: It selects a high-value source.
The Radius: It draws a mathematical conflict radius around that content based on semantic similarity.
The Lockout: Any content inside that radius is rejected to save compute, regardless of domain authority.
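To make the three steps concrete, here is a rough sketch of a greedy "pick by utility, then lock out the neighbors" loop. This is my simplified reading, not the paper's actual algorithm: the function names, the `radius` threshold of 0.9, the utility scores, and the toy 2-D embeddings are all made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def greedy_diverse_select(candidates, radius=0.9, k=3):
    """candidates: list of (name, utility, embedding) tuples.

    Hypothetical simplification: visit sources from highest to lowest
    utility and reject any source whose similarity to an already
    selected source is at or above `radius` (the lockout).
    """
    selected = []
    for name, utility, emb in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(cosine_similarity(emb, s[2]) < radius for s in selected):
            selected.append((name, utility, emb))
        if len(selected) == k:
            break
    return [s[0] for s in selected]

# Toy data (assumed values): "b" is nearly identical to "a", "c" is different.
candidates = [
    ("a", 0.95, [1.0, 0.0]),
    ("b", 0.90, [0.99, 0.05]),
    ("c", 0.70, [0.0, 1.0]),
]
picked = greedy_diverse_select(candidates, radius=0.9, k=2)
print(picked)
```

In this toy run, "b" has the second-highest utility but sits inside the radius around "a", so it is skipped in favor of the lower-utility but distinct "c".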
How my implementation works: I wanted to see if we could programmatically detect whether a piece of content falls inside this "redundancy radius." The tool uses an LLM to analyze the top-ranking URLs for a specific query, computes their vector embeddings, and measures the semantic cosine similarity between them and your input.
If the overlap is too high (simulating the GIST lockout), the tool flags the content as providing zero marginal utility to the model.
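For anyone who wants to sanity-check the scoring idea, the comparison step can be sketched roughly like this. It assumes you already have embeddings for the top-ranking pages and for your own content. The `redundancy_check` name, the 0.85 threshold, and the 3-D vectors are placeholders, not the values the tool actually uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def redundancy_check(input_vec, top_vecs, threshold=0.85):
    """Flag the input if it is too similar to any top-ranking page.

    `threshold` is an assumed cutoff standing in for the lockout radius.
    """
    scores = [cosine_similarity(input_vec, v) for v in top_vecs]
    max_sim = max(scores, default=0.0)
    return {"max_similarity": max_sim, "flagged": max_sim >= threshold}

# Placeholder embeddings; real ones would come from an embedding model.
top_vecs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
near_duplicate = redundancy_check([0.88, 0.12, 0.01], top_vecs)
distinct = redundancy_check([0.0, 0.1, 0.95], top_vecs)
print(near_duplicate["flagged"], distinct["flagged"])
```

Taking the maximum similarity (rather than the average) mirrors the lockout idea: being too close to a single already-selected source is enough to be redundant.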
I’d love feedback on the accuracy of the similarity scoring.