That should in theory prevent overly redacted documents for political purposes.
An approach that could be rolled out today would be redacting with human review, but showing what % of redactions the AI would have done, and also showing the prompt given to the AI to perform redactions.
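To make the "% of redactions the AI would have done" idea concrete, here is a minimal sketch. It assumes redactions are recorded as `(start, end)` character offsets into the document text; that representation and the function names are hypothetical, not taken from any existing tool.

```python
def covered(spans):
    """Return the set of character offsets covered by (start, end) spans."""
    return {i for start, end in spans for i in range(start, end)}


def ai_agreement(human_spans, ai_spans):
    """Fraction of human-redacted characters the AI would also have redacted."""
    human = covered(human_spans)
    if not human:
        return 0.0  # nothing was redacted, so there is nothing to compare
    return len(human & covered(ai_spans)) / len(human)


# Hypothetical offsets, for illustration only.
human_spans = [(0, 10), (50, 60)]
ai_spans = [(0, 10), (55, 70)]
print(f"{ai_agreement(human_spans, ai_spans):.0%} of human redactions match the AI's")
```

A low percentage would flag documents where the human reviewers redacted far more than the published prompt calls for.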
Whoever did these "bad" redactions doesn't even know how to use a PDF Editor.
We have paralegals and lawyers "mark for redaction", then review the documents, then "apply redactions". It's literally been done by thousands of lawyers/paralegals for decades. This is just someone not following the process and procedure, and making mistakes. It's actually quite amateurish. You should never, ever screw up redactions if you follow the proper process. Good on the X-ray project for trying to find errors.
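As a rough sketch of that mark, review, apply sequence (assuming plain text and character offsets; the `RedactionJob` class is hypothetical and not how any particular PDF editor works), the key property is that applying actually removes the characters rather than drawing over them:

```python
from dataclasses import dataclass, field


@dataclass
class RedactionJob:
    text: str
    marks: list = field(default_factory=list)  # (start, end) offsets marked for redaction
    reviewed: bool = False

    def mark(self, start, end):
        self.marks.append((start, end))
        self.reviewed = False  # a new mark invalidates any earlier review

    def review(self):
        self.reviewed = True  # stands in for a human sign-off

    def apply(self):
        if not self.reviewed:
            raise RuntimeError("marks must be reviewed before they are applied")
        out = self.text
        # Work from the end so earlier offsets stay valid.
        # Assumes marks do not overlap.
        for start, end in sorted(self.marks, reverse=True):
            out = out[:start] + "[REDACTED]" + out[end:]
        return out


job = RedactionJob("Call Jane Doe at 555-0100.")
job.mark(5, 13)
job.mark(17, 25)
job.review()
print(job.apply())
```

Using a fixed-length placeholder also avoids leaking the length of what was removed.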
I just want to add that applying black highlights on top of text is, in fact, the "old" way of redacting: it was common to do this, then simply print the page with the black bars and send the paper as the final product.
Whoever did it is probably old, and may have done it thinking they were going to print it on paper afterwards!! Just guessing as to why someone would do this.
Especially with the "draw a black box over it" method, the text also stops being trivially mouse-selectable (even if CTRL+A might still work).
Another possibility is, of course, that whoever was responsible for this knew exactly what they were doing, but this way they can claim an honest mistake rather than intentionally leaking the data.
Yes; that's presumably included in being "amateurish" and "not following proper process".
Anyway, I made X-ray to analyze the millions of documents we have in CourtListener so that we can try to educate people about the issue.
The analysis was fun. We used S3 batch jobs, but we haven’t done the hard part of looking at the results and reporting them out. One day.
> Information Leaking from Redaction Marks: Even when content is properly removed, the redaction marks themselves can leak some information if not done carefully. For example, if you have a black box exactly covering a word, the length of that black box gives a clue to the word’s length (and potentially its identity).
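A toy illustration of the width leak described in that quote: given the width of a black box and a list of candidate names, keep only the candidates whose rendered width matches. The per-character widths below are hypothetical values in 1/1000 em, not measured from a real font, and this is not a claim about what X-ray does.

```python
# Hypothetical advance widths in 1/1000 em; a real attack would read
# these from the metrics of the font embedded in the PDF.
WIDTHS = {"i": 278, "l": 278, "t": 278, "a": 556, "e": 556,
          "n": 556, "o": 556, "m": 833, "w": 722, " ": 278}
DEFAULT_WIDTH = 556


def text_width(text, font_size):
    """Width of the text in points at the given font size."""
    return sum(WIDTHS.get(c, DEFAULT_WIDTH) for c in text) * font_size / 1000


def candidates_fitting(box_width, names, font_size, tolerance=0.5):
    """Names whose rendered width is within tolerance (points) of the box width."""
    return [n for n in names
            if abs(text_width(n, font_size) - box_width) <= tolerance]


names = ["tom", "william", "anne"]
print(candidates_fitting(26.7, names, font_size=12))
```

With a proportional font, even a short candidate list is often narrowed to one or two names this way.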
Does X-ray employ glyph spacing attacks and try to exploit font metric leaks?
PDF redaction fails are everywhere and it's usually because people don't understand that covering text with a black box doesn't actually remove the underlying data.
I see this constantly in compliance. People think they're protecting sensitive info but the original text is still there in the PDF structure.
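A minimal sketch of how such a failure can be detected, assuming you have already pulled word bounding boxes and filled black rectangles out of a page with a PDF library of your choice. The `Box` type, the sample coordinates, and the 0.9 threshold are all hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_fraction(word: Box, rect: Box) -> float:
    """Fraction of the word's area that lies inside the rectangle."""
    w = min(word.x1, rect.x1) - max(word.x0, rect.x0)
    h = min(word.y1, rect.y1) - max(word.y0, rect.y0)
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((word.x1 - word.x0) * (word.y1 - word.y0))


def hidden_text(words, black_rects, threshold=0.9):
    """Words still present in the page's text layer but covered by a black rectangle."""
    return [text for text, box in words
            if any(overlap_fraction(box, r) >= threshold for r in black_rects)]


# Made-up page contents for illustration.
words = [("Jane", Box(100, 700, 130, 712)), ("visited", Box(135, 700, 175, 712))]
black_rects = [Box(98, 698, 132, 714)]
print(hidden_text(words, black_rects))
```

Anything this returns is text a reader could recover with copy and paste, which is exactly the failure being described.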
I'm almost fully convinced that someone did this badly on purpose, bad redactions included, since surely people tasked with redacting a bunch of files receive some instructions on what to do and what not to do?
text=about them to damage their credibility when they tried to go public with their stories of being
text=Epstein also threatened harm to victims and helped release damaging stories
text=attorneys' fees and case costs in litigation related to this conduct.
text=Defendants also attempted to conceal their criminal sex trafficking and abuse
text=$327,497.48 and $6,487.04 in New York City
text=trafficking and abuse conduct.
text=destroy evidence relevant to ongoing court proceedings involving Defendants' criminal sex
text=Epstein also instructed one or more Epstein Enterprise participant-witnesses to
text=trafficked and sexually abused.
text=conduct by paying large sums of money to participant-witnesses, including by paying for their
Hopefully this is the straw that breaks the camel's back.