Pro-democracy HK tycoon Jimmy Lai convicted in national security trial

https://www.bbc.com/news/articles/cp844kjj37vo
103•onemoresoop•37m ago•27 comments

Carrier Landing in Top Gun for the NES

https://relaxing.run/blag/posts/top-gun-landing/
194•todsacerdoti•2h ago•72 comments

$50 PlanetScale Metal Is GA for Postgres

https://planetscale.com/blog/50-dollar-planetscale-metal-is-ga-for-postgres
65•ksec•1h ago•24 comments

P-computers can solve spin-glass problems faster than quantum systems

https://news.ucsb.edu/2025/022239/new-ucsb-research-shows-p-computers-can-solve-spin-glass-proble...
25•magoghm•1w ago•4 comments

Avoid UUIDv4 Primary Keys

https://andyatkinson.com/avoid-uuid-version-4-primary-keys
205•pil0u•7h ago•214 comments

Thousands of U.S. farmers have Parkinson's. They blame a deadly pesticide

https://www.mlive.com/news/2025/12/thousands-of-us-farmers-have-parkinsons-they-blame-a-deadly-pe...
201•bikenaga•2h ago•142 comments

It seems that OpenAI is scraping [certificate transparency] logs

https://benjojo.co.uk/u/benjojo/h/Gxy2qrCkn1Y327Y6D3
83•pavel_lishin•3h ago•57 comments

Speech and Language Processing (3rd ed. draft)

https://web.stanford.edu/~jurafsky/slp3/
31•atomicnature•1w ago•6 comments

Adafruit: Arduino’s Rules Are ‘Incompatible With Open Source’

https://thenewstack.io/adafruit-arduinos-rules-are-incompatible-with-open-source/
358•MilnerRoute•22h ago•194 comments

DNA Learning Center: Mechanism of Replication 3D Animation

https://dnalc.cshl.edu/resources/3d/04-mechanism-of-replication-advanced.html
61•timschmidt•1w ago•15 comments

Roomba maker goes bankrupt, Chinese owner emerges

https://news.bloomberglaw.com/bankruptcy-law/robot-vacuum-roomba-maker-files-for-bankruptcy-after...
486•nreece•16h ago•569 comments

Unscii

http://viznut.fi/unscii/
257•Levitating•13h ago•33 comments

If AI replaces workers, should it also pay taxes?

https://english.elpais.com/technology/2025-11-30/if-ai-replaces-workers-should-it-also-pay-taxes....
376•PaulHoule•16h ago•606 comments

Invader: Where to Spot the 8-Bit Street Art in London

https://londonist.com/london/art-and-photography/invader-where-to-spot-the-8-bit-street-art-in-lo...
52•zeristor•1w ago•17 comments

Arborium: Tree-sitter code highlighting with Native and WASM targets

https://arborium.bearcove.eu/
179•zdw•13h ago•31 comments

Optery (YC W22) Hiring CISO, Release Manager, Tech Lead (Node), Full Stack Eng

https://www.optery.com/careers/
1•beyondd•5h ago

Ask HN: What Are You Working On? (December 2025)

350•david927•1d ago•1133 comments

Samsung may end SATA SSD production soon

https://www.techradar.com/computing/storage-backup/looking-for-a-cheap-ssd-dont-wait-samsung-coul...
43•Krontab•2h ago•27 comments

SoundCloud has banned VPN access

https://old.reddit.com/r/SoundCloudMusic/comments/1pltd19/soundcloud_just_banned_vpn_access/
180•empressplay•14h ago•134 comments

We Put Flock Under Surveillance: Go Make Them Behave Differently [video]

https://www.youtube.com/watch?v=W420BOqga_s
40•huvarda•2h ago•6 comments

Ask HN: Is building a calm, non-gamified learning app a mistake?

22•hussein-khalil•1h ago•32 comments

AI agents are starting to eat SaaS

https://martinalderson.com/posts/ai-agents-are-starting-to-eat-saas/
285•jnord•17h ago•285 comments

$5 whale listening hydrophone making workshop

https://exclav.es/2025/08/03/dinacon-2025-passive-acoustic-listening/
80•gsf_emergency_6•4d ago•27 comments

John Varley has died

http://floggingbabel.blogspot.com/2025/12/john-varley-1947-2025.html
141•decimalenough•14h ago•54 comments

The Whole App is a Blob

https://drobinin.com/posts/the-whole-app-is-a-blob/
125•valzevul•13h ago•72 comments

The Java Ring: A Wearable Computer (1998)

https://www.nngroup.com/articles/javaring-wearable-computer/
35•cromulent•5d ago•32 comments

Common Rust Lifetime Misconceptions

https://github.com/pretzelhammer/rust-blog/blob/master/posts/common-rust-lifetime-misconceptions.md
87•CafeRacer•11h ago•41 comments

Show HN: I wrote a book – Debugging TypeScript Applications (in beta)

https://pragprog.com/titles/aodjs/debugging-typescript-applications/
44•ozornin•1w ago•17 comments

The Problem of Teaching Physics in Latin America (1963)

https://calteches.library.caltech.edu/46/2/LatinAmerica.htm
80•rramadass•20h ago•77 comments

How well do you know C++ auto type deduction?

https://www.volatileint.dev/posts/auto-type-deduction-gauntlet/
75•volatileint•5d ago•100 comments

It seems that OpenAI is scraping [certificate transparency] logs

https://benjojo.co.uk/u/benjojo/h/Gxy2qrCkn1Y327Y6D3
83•pavel_lishin•3h ago

Comments

drwhyandhow•3h ago
This has long been the case! I think their whole business model is based on scraping, lol.
Aurornis•2h ago
This could be OpenAI, or it could be another company using their header pattern.

It has long been common for scrapers to adopt the header patterns of search engine crawlers to hide in logs and bypass simple filters. The logical next step is for smaller AI players to present themselves as the largest players in the space.

Some search engines provide a list of their scraper IP ranges specifically so you can verify if scraper activity is really them or an imitator.

EDIT: Thanks to the comment below for looking this up and confirming this IP matches OpenAI’s range.

jsheard•2h ago
In this case it is actually OpenAI, the IP (74.7.175.182) is in one of their published ranges (74.7.175.128/25).

https://openai.com/searchbot.json
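
The range check described above can be reproduced with Python's standard `ipaddress` module. This is a minimal sketch: the address and CIDR block are the ones quoted in the comment, while the helper name `ip_in_ranges` and the idea of passing the ranges as a plain list are assumptions made for illustration. A real check would load the full set of ranges from the operator's published file.

```python
import ipaddress

def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

# Values quoted in the comment above (only one published range is listed here).
PUBLISHED = ["74.7.175.128/25"]

print(ip_in_ranges("74.7.175.182", PUBLISHED))  # True: .182 is within .128-.255
print(ip_in_ranges("74.7.175.10", PUBLISHED))   # False: outside the /25
```

Checking the source address rather than the User-Agent header is what makes the verification hard to fake, since the header is entirely under the client's control.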

I don't know if imitating a major crawler is really worth it. It may work against very naive filters, but it's easy to check definitively whether you're faking, so it just hands ammo to more advanced filters that do check.

  $ curl -I https://www.cloudflare.com
  HTTP/2 200

  $ curl -I -H "User-Agent: Googlebot" https://www.cloudflare.com
  HTTP/2 403
Aurornis•2h ago
Thanks for looking it up!
827a•2h ago
Thousands of systems, from Google to script kiddies to OpenAI to Nigerian call scammers to cybersecurity firms, actively watch the certificate transparency logs for exactly this reason. Yawn.
H8crilA•2h ago
For those that never looked at the CT logs: https://crt.sh/?q=ycombinator.com

(the site may occasionally fail to load)
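
For anyone who would rather script this lookup than browse it, here is a rough sketch. It assumes crt.sh returns JSON when `output=json` is added to the query and that each entry carries a `name_value` field with newline-separated names; both are assumptions about the service, not something stated in this thread. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

def hostnames_from_crtsh(entries: list[dict]) -> set[str]:
    """Collect the distinct names listed in crt.sh-style JSON entries."""
    names: set[str] = set()
    for entry in entries:
        # Assumed format: name_value may hold several names separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return names

def fetch_crtsh(domain: str) -> list[dict]:
    """Query crt.sh for a domain (network call; the site may time out)."""
    query = urllib.parse.urlencode({"q": domain, "output": "json"})
    with urllib.request.urlopen("https://crt.sh/?" + query, timeout=30) as resp:
        return json.load(resp)

# Offline sample in the assumed format, so the parsing can be tried without the network.
sample = [
    {"name_value": "news.example.com\nwww.example.com"},
    {"name_value": "WWW.example.com"},
]
print(sorted(hostnames_from_crtsh(sample)))  # ['news.example.com', 'www.example.com']
```

Calling `hostnames_from_crtsh(fetch_crtsh("ycombinator.com"))` would run the same query as the link above, subject to the same occasional load failures.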

Eikon•1h ago
Shameless plug :)

https://www.merklemap.com/search?query=ycombinator.com&page=...

Entries are indexed by subdomain instead of by certificate (click an entry to see all certificates for that subdomain).

Also, you can search for any substring (that was quite the journey to implement so it's fast enough across almost 5B entries):

https://www.merklemap.com/search?query=ycombi&page=0

pavel_lishin•1h ago
What's the yawn for?
xpe•1h ago
Presumably this is well-known among people that already know about this.

P.S. In the hopes of making this more than just a sarcastic comment, the question of "How do people bootstrap knowledge?" is kind of interesting. [1]

> To tackle a hard problem, it is often wise to reuse and recombine existing knowledge. Such an ability to bootstrap enables us to grow rich mental concepts despite limited cognitive resources. Here we present a computational model of conceptual bootstrapping. This model uses a dynamic conceptual repertoire that can cache and later reuse elements of earlier insights in principled ways, modelling learning as a series of compositional generalizations. This model predicts systematically different learned concepts when the same evidence is processed in different orders, without any extra assumptions about previous beliefs or background knowledge. Across four behavioural experiments (total n = 570), we demonstrate strong curriculum-order and conceptual garden-pathing effects that closely resemble our model predictions and differ from those of alternative accounts. Taken together, this work offers a computational account of how past experiences shape future conceptual discoveries and showcases the importance of curriculum design in human inductive concept inferences.

[1]: https://www.nature.com/articles/s41562-023-01719-1

jfindper•1h ago
It implies that this is boring and not article/post-worthy (which I agree with).

Certificate transparency logs are intended to be consumed by others. That is indeed what is happening. Not interesting.

pavel_lishin•1h ago
> It implies that this is boring and not article/post-worthy (which I agree with).

It's certainly news to me, and presumably some others, that this exists.

jfindper•1h ago
Which part is news?

If certificate transparency is new to you, I feel like there are significantly more interesting articles and conversations that could/should have been submitted instead of "A public log intended for consumption exists, and a company is consuming that log". This post would do literally nothing to enlighten you about CT logs.

If the fact that OpenAI is scraping certificate transparency logs is new and interesting to you, I'd love to know why it is interesting. Perhaps I'm missing something.

Way more interesting reads for people unfamiliar with what certificate transparency is, in my opinion, than this "OpenAI read my CT log" post:

https://googlechrome.github.io/CertificateTransparency/log_s...

https://certificate.transparency.dev/

JumpCrisscross•46m ago
> Certificate transparency logs are intended to be consumed by others. That is indeed what is happening. Not interesting

Oh, I read this as indicating OpenAI may make a move into the security space.

prettyblocks•24m ago
Even if it's just for their internal security initiatives it would make sense given how massive they are. Threat hunting via cert monitoring is very effective.
moralestapia•1h ago
Because it's hardly news in its context.
irishcoffee•1h ago
Everyone does it, it’s no big deal. “Yes officer I was speeding, so was everyone else!”

Gross.

edvinbesic•1h ago
You are implying that a law is being broken, but isn't this the equivalent of going to city hall to pull public land records?
formerly_proven•1h ago
The whole point of CT logs is to make issuance of certificates in the public WebPKI… public.
tsimionescu•1h ago
The whole point of the CT logs is to be a public list of all domains which have TLS certs issued by the Web PKI. People are reading this list. I really don't see what is either surprising or in any way problematic in doing so.
jfindper•1h ago
The intended purpose of certificate transparency logs is to be viewed by others!

Perhaps you should save your "gross" judgement for when you better understand what's happening?

ekr____•1h ago
With that said, given that (1) pre-certificates in the log are big and (2) lifetimes are shortening and so there will be a lot of duplicates, it seems like it would be good for someone to make a feed that was just new domain names.
Eikon•44m ago
Merklemap offers that: https://www.merklemap.com/documentation/live-tail
agwa•11m ago
There's an extension to static-ct-api, currently implemented by Sunlight logs, that provides a feed of just SANs and CNs: https://github.com/FiloSottile/sunlight/blob/main/names-tile...

For example:

  curl https://tuscolo2026h1.skylight.geomys.org/tile/names/000 | gunzip

(It doesn't deduplicate if the same domain name appears in multiple certificates, but it's still a substantial reduction in bandwidth compared to serving the entire (pre)certificate.)
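
Since the feed does not deduplicate, a consumer that only wants new domain names has to do it itself. The sketch below is one hedged way to do that over any iterable of names; it does not parse the names-tile format (which is not described in this thread), and `new_names` is a hypothetical helper.

```python
from collections.abc import Iterable, Iterator
from typing import Optional

def new_names(stream: Iterable[str], seen: Optional[set[str]] = None) -> Iterator[str]:
    """Yield each name the first time it appears, skipping repeats."""
    seen = set() if seen is None else seen
    for raw in stream:
        # Normalize case and a trailing root dot so trivial variants collapse.
        name = raw.strip().lower().rstrip(".")
        if name and name not in seen:
            seen.add(name)
            yield name

feed = ["a.example.com", "A.example.com.", "b.example.com", "a.example.com"]
print(list(new_names(feed)))  # ['a.example.com', 'b.example.com']
```

An in-memory set grows without bound, so at the scale of the full CT ecosystem a more compact or persistent structure would be needed. Passing in an existing `seen` set lets the caller carry state across batches.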
raldi•57m ago
What reason?
electroly•53m ago
The CT log tells you about new websites as soon as they come online. Good if you're intending to scrape the web.
1vuio0pswjnm7•32m ago
"... for exactly this reason."

Needs clarification. What reason?

gmerc•1h ago
Let's prompt inject it
throwaway613745•1h ago
OpenAI is scraping everything that is publicly accessible. Everything.
Aachen•1h ago
Yet they provide the user agents and IP address ranges which they scrape from, and say they respect robots.txt

I run a web server and so see a lot of scrapers, and OpenAI is one of the ones that appear to respect the limits you set. Many (if not most) others don't even meet that ethical standard, so I wouldn't say "OpenAI scrapes everything they can access. Everything" without qualification. That doesn't seem to be true, at least not until someone puts a file behind a robots.txt deny rule and finds that ChatGPT (or another of OpenAI's products) has knowledge of it.
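
To see what "respecting robots.txt" means mechanically, here is a small sketch using Python's standard `urllib.robotparser`. The rules, the URLs, and the `GPTBot` user-agent token are illustrative assumptions and not taken from this thread. The code only shows how a well-behaved crawler would interpret the rules, not whether any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules for one user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical rules: block one crawler from a single directory.
RULES = """\
User-agent: GPTBot
Disallow: /private/
"""

print(allowed(RULES, "GPTBot", "https://example.com/private/notes.html"))  # False
print(allowed(RULES, "GPTBot", "https://example.com/index.html"))          # True
```

The test proposed above (put a file behind a deny rule and see whether a model later knows about it) is the empirical counterpart to this check.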

warkdarrior•13m ago
So do Google, Microsoft/Bing, Yandex, etc. How else would they make sure their search/chatbot/q&a products are up to date?
_pdp_•1h ago
I wonder if this can be used to contaminate OpenAI search indexes?
bombcar•1h ago
If you want to avoid this to some extent, get a wildcard certificate. LE supports them: https://community.letsencrypt.org/t/acme-v2-production-envir...

Then all they know is the main domain, and you can somewhat hide in obscurity.
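
The reason this works is that the logged certificate carries only the wildcard name, which covers any single-label subdomain without naming it. A simplified sketch of that matching rule follows; `wildcard_covers` is a hypothetical helper, and it ignores details such as partial-label wildcards and internationalized names.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """True if a certificate name such as *.example.com covers hostname."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard stands in for exactly one label
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels

print(wildcard_covers("*.example.com", "staging.example.com"))  # True
print(wildcard_covers("*.example.com", "a.b.example.com"))      # False
print(wildcard_covers("*.example.com", "example.com"))          # False
```

A CT log reader sees `*.example.com` and learns nothing about `staging`, though the subdomain can still leak through DNS or links, which is why this is only hiding in obscurity.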

lysace•1h ago
Unfortunately, they are a bit more bothersome to automate (depending on your DNS provider/setup) because wildcard issuance requires DNS-based validation (the DNS-01 challenge).
jsheard•1h ago
Yep, but next year they intend to launch an alternative DNS challenge which doesn't require changing DNS records with every renewal. Instead you'll create a persistent TXT record containing a public key, and then any ACME client which has the private key can keep requesting new certs forever.

https://letsencrypt.org/2025/12/02/from-90-to-45#making-auto...

8cvor6j844qw_d6•14m ago
Great to hear, one fewer API key needed for the DNS records.
cortesoft•40m ago
If you are using a non-standard DNS provider that doesn’t integrate with certbot, cert-manager, or whatever you are using, it is pretty easy to set up an acme-dns server to handle it:

https://github.com/joohoi/acme-dns

vault•51m ago
Correct, that's what I did with caddy, which is now periodically renewing my wildcard certificate through a DNS-01 challenge.
8cvor6j844qw_d6•16m ago
Do you know whether Caddy still updates automatically through apt if you build a custom Caddy binary with the DNS provider plugin?

Also, which DNS provider did you go with? The GitHub issue pages for the DNS provider plugins suggest that some are maintained more actively than others.

matt3210•1h ago
Your content is stolen for training the moment you put it up
jfindper•1h ago
It is an _incredible_ stretch to frame certificate transparency logs as "content" in the creative sense.

The whole purpose of this data is to be consumed by 3rd-parties.

integralid•12m ago
I don't see an issue with OAI scraping public logs.

But what GP probably meant is that OAI definitely uses this log to get a list of new websites in order to scrape them later. This is a pretty standard way to use CT logs: you get a list of domains to scrape instead of relying solely on hyperlinks.

poormathskills•1h ago
Is it still “scraping” when these transparency logs exist precisely to be consumed this way?
8cvor6j844qw_d6•47m ago
Has anyone gone with wildcard certificates to avoid disclosing subdomains in certificate transparency logs?
toddgardner•7m ago
If you want to learn more about Certificate Transparency logs and how to pull and search them, we just published a three-part series on how we did this at CertKit: https://www.certkit.io/blog/searching-ct-logs