Show HN: No more writing shitty regexes to police usernames

https://www.username.dev

19•choraria•1mo ago

Every product that allows usernames eventually ships the same broken solution. Someone adds a blacklist. Then a regex. Then another regex copied from StackOverflow. It works just long enough to ship, and then `admin`, `support`, city names, brand impersonation, and obvious slurs start leaking through anyway. Everyone knows it’s fragile, but it gets ignored because "it’s just usernames".

I’ve had to rebuild this logic across multiple products, and I got tired of pretending it’s a solved problem. So I built *username.dev*, an API that answers a more useful question than "is this taken?" — it tells you what a username actually represents.

Instead of returning a boolean, the API classifies usernames into real categories like brands, public figures, places, system-reserved terms, dictionary words, premium handles, and offensive content, and returns structured metadata you can actually make decisions with. That means blocking impersonation without breaking legitimate users, stopping abuse without maintaining massive regex lists, and even monetizing high-demand usernames if that’s part of your product.

Under the hood it’s intentionally boring infrastructure: Cloudflare Workers at the edge, KV for fast reads, D1 for usage and analytics, and a simple HTTP endpoint (`GET /check?input=foo`). P95 latency sits around 300ms globally. There’s no ML magic, no black box, and no attempt to be clever — just fast, deterministic classification.

Pricing is usage-based and prepaid because subscriptions for infrastructure like this are annoying. There’s a free tier with 1,000 requests and no credit card. Use it, throw it away, or rip the idea off.

If you think regex blacklists are "good enough", usernames don’t matter, or this is a trivial problem, you’re probably already shipping bugs — they’re just not loud enough yet.

Tell me why this is a bad idea, what edge cases I’m missing, or what you’ve duct-taped together instead.

— Sourabh

Comments

sampli•1mo ago

I want all the SaaS in my stack

choraria•1mo ago

Hey @sampli — was there some kind of bundling that you were looking for?

maxall4•1mo ago

I can’t tell if this is some complex joke or a real product. This is literally string.contains() as a service.

Edit: 300ms?!

gs17•1mo ago

I think there's some value in providing a huge dictionary of things to test against, with tagging for what things are to help filter. This doesn't do a great job at it, and it would make 100x more sense as a library, but it's a little more than just string.contains().

maxall4•1mo ago

Sure, but I’m not convinced that producing a blacklist and filtering system is that difficult. More importantly, it’s little things like this that slowly and insidiously degrade the user experience. Sure it starts with one 300ms API call, maybe most people won’t notice. But when you reach for solutions like this to every minor technical problem, the next thing you know it takes 5 seconds to sign-up.

choraria•1mo ago

My take on latency in general is this: You may just use the API to flag (not act) in an async way. This way, you can just alert/monitor and decide later whether or not to take any actions while keeping the flow non-blocking. Another approach would be to run it against existing handles to see what opportunities exist (ex: premium usernames, impersonators etc.).

gs17•1mo ago

Sounds like a good opportunity for some kind of batching feature.

choraria•1mo ago

Yes; I've gotten that request from another person on LinkedIn too for bulk checking existing usernames. Will work on releasing that shortly too. Thanks for being helpful and constructive all the way throughout the convo :)

choraria•1mo ago

Not a joke (I'm taking this in the spirit intended) but I can see there are TONS of things I need to be improving on:

1. latency: my original goal was to make it sub-10s but with checking for auth, cold starts, the actual lookup, couldn't get it to do better than 2-300ms. I need to improve this though and I will.
2. increased list size: currently, the lookup happens across 1.7million records (will go up to 2.5m in the next days/weeks) BUT I don't think that would ever cover ALL scenarios.
3. better categorisation

tommy_axle•1mo ago

Ok so taylorswift is reserved but taylor_swift and realtaylorswift can be used? It seems like impersonation would still be a problem.

chaps•1mo ago

Hah no kidding. I tried just, "bill_gates" --

  {
    "username": "bill_gates",
    "isReserved": false,
    "isDeleted": false,
    "categories": []
  }

what's the point of this thing...?

gs17•1mo ago

It's odd that they focused so much on "it's better than regexes" when it doesn't handle these cases where a regex would do well.

choraria•1mo ago

The comment on regex was really because that's what I did when I built internal reserved usernames list of 2 of my URL shortener projects. I love regex, btw. BUT, I don't think they cover all of what we need with usernames specifically. Shared some more insights on the thread about variations too (like underscores etc.).

bpt3•1mo ago

Why would I want billgates to be reserved in the first place, unless I'm Microsoft?

And the definition of a "public figure" is absurdly broad and inconsistent. Some very common names are flagged as reserved for what are extremely minor celebrities at best (for example, an assistant coach of a college basketball team, or an actor with barely any formal credits), and some obscure athletes are marked as reserved while others are not.

choraria•1mo ago

Well, to clarify, this API is really for folks who're building platforms that require usernames. For ex: imagine if you were building the next Twitter or anything that requires usernames. There, you'd want to know what's happening with these kinds of usernames, which people are now prepared to pay for too (premium usernames). Similarly, for cases where the names are offensive or profane, you may want to block outright.

As for the definition of specific categories (more specifically public figures), you're right. Currently, it's just me building this and so I had to decide where to draw the line. I just drew it around the entire earth, which I know is NOT the best approach but that's the one I went with just to ensure I cover all bases. Honestly, the API would tell if and why a username could be deemed reserved/premium. What to do with this info is really up to the platforms that are consuming it. They could let it slide, do nothing, just flag and monitor, block etc.

choraria•1mo ago

I thought about this and decided against complicating ways in which this can be restricted. Honestly, this is a super simple challenge to solve. Perhaps I should introduce this as an API parameter to detect variations. That way, not just taylor_swift but t_aylorswift, ta_ylorswift etc. could also be detected and flagged.

As for realtaylorswift, I thought about that too. I don't think (and this is my personal opinion, obviously) most platforms would want to restrict this, because then it really becomes unmanageable. I could obviously be wrong though, and these could very easily be introduced to the API also (i.e. detect obvious username patterns); I'm totally open to adding that as an API parameter too.
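
A minimal sketch of the separator-variation idea discussed above, assuming the reserved list is available as an in-memory set. The `RESERVED` entries and the function names are hypothetical and are not part of the username.dev API; the point is only that stripping separators before the lookup catches `taylor_swift` and `t_aylorswift` with a single comparison, while prefixed forms such as `realtaylorswift` still pass.

```python
import re

# Hypothetical reserved list; the real service is said to look up ~1.7 million records.
RESERVED = {"taylorswift", "billgates"}


def canonical(username: str) -> str:
    """Lowercase the name and drop every character that is not a-z or 0-9."""
    return re.sub(r"[^a-z0-9]", "", username.lower())


def is_reserved_variant(username: str) -> bool:
    """True if the name equals a reserved entry once separators are removed."""
    return canonical(username) in RESERVED
```

Note that this canonical form also discards non-ASCII letters, so a real implementation would need a deliberate Unicode policy.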

chaps•1mo ago

Friend, with respect, these "simple challenge"s really start to add up very quickly, especially after edge cases.

Highly recommend you read this and similar posts: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

gs17•1mo ago

> I can safely assume that this dictionary of bad words contains no people’s names in it.

This is a big one for this kind of project, and I've never been sure how usernames for people named Kike should be handled.

choraria•1mo ago

Good point. Currently, I've got "kike" as a Spanish dictionary word and also a public figure. Honestly, the job of this API stops there. It tells the platform that this username needs to be handled differently than "randomusername7346783" which has absolutely no value. Now, what we do with this info is really up to admins/platform owners. They could simply do nothing, flag and monitor, charge a premium or block outright. Totally their call but they can now programmatically decide that.
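
To illustrate what "programmatically decide" could look like on the platform side, here is a rough sketch that maps categories from a response to an action. The response shape follows the JSON pasted earlier in the thread (`username`, `isReserved`, `isDeleted`, `categories`); the category names are ones mentioned in this discussion, and the policy table and action names are assumptions made for the example.

```python
import json

# Hypothetical policy table: category name -> action.
# Category names come from the thread; the actions are assumptions.
POLICY = {
    "restricted": "block",
    "brand": "review",
    "public_figure": "review",
}

# Actions ordered from least to most severe.
SEVERITY = ["allow", "review", "block"]


def decide(response_body: str) -> str:
    """Return the most severe action triggered by any category in the response."""
    data = json.loads(response_body)
    actions = [POLICY.get(category, "allow") for category in data.get("categories", [])]
    return max(actions, key=SEVERITY.index, default="allow")
```

Unknown categories fall through to "allow" here, which keeps the policy permissive by default; a stricter platform might send them to "review" instead.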

gs17•1mo ago

It definitely should be in a list of offensive terms too (and offensive dictionaries by language could be even more useful, telling moderators why it was flagged is valuable).

choraria•1mo ago

I see. Will re-run through the categories and the datasets from which I've adopted the names and categories. Maybe either I missed something or it might've not existed in the import in the first place. But noted. Also, thanks :)

choraria•1mo ago

Damn! Just read the title and a few lines from the post but will definitely go through it fully and thoroughly. Thanks for sharing.

I didn't mean to reduce the complexity of the challenge. Was mostly trying to convey that the specific cases being discussed, should be something that I could quickly solution and incorporate in the API.

You're right about ALL the different kinds of edge cases that exist though and really, I'm trying to have this API be the go-to solution for it. Clearly, it's still not there. But it will be. I'm now more sure than ever.

CamJN•1mo ago

I hate to say it but checking if a string is ~= some identifier might actually be something an llm might be useful for, since it doesn't need to be 100% accurate and does need to evaluate the string against a massive number of potential transformations.

bpt3•1mo ago

Yes, a classifier based on similarity metrics would be more useful than whatever is going on behind the scenes here, which seems to be completely based on string matching and a not very creative dictionary of offensive terms.
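
As a rough illustration of similarity-based matching (not an LLM and not a trained classifier), the sketch below uses `difflib` from the Python standard library to find reserved names that are close to a candidate. The `RESERVED` list, the function name, and the 0.8 cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical reserved list for the example.
RESERVED = ["taylorswift", "billgates", "cocacola"]


def closest_reserved(username: str, cutoff: float = 0.8) -> list:
    """Return up to three reserved names whose similarity ratio meets the cutoff."""
    return difflib.get_close_matches(username.lower(), RESERVED, n=3, cutoff=cutoff)
```

With this cutoff, `taylor_swift`, `realtaylorswift`, and `coca-cola` each map back to their reserved form, while an unrelated handle returns an empty list. Comparing against every entry is linear in the list size, so a list with millions of records would need an index or a pre-filter in front of it.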

choraria•1mo ago

Interesting! Didn't think about it that way. Currently, it's a super dumb system. There's a list of ~1.7 million records and the API simply looks up against that. Super lazy approach. Was avoiding running the API through OpenAI or another model but didn't think about hosting a classifier/LLM myself. Might consider it in the future.

Full disclosure: I'm not a developer. I understand tech architectures well. Can code (have coded in JS pre-AI too) BUT will figure this out as I go along. Thanks and truly appreciate the input.

Edit note: added million next to 1.7. fml!

Dumbledumb•1mo ago

So, I can’t use my legal name as a username because some random town with a few thousand people is named the same?

choraria•1mo ago

That would depend on the folks implementing the API

In its current state, I'd look at the API to check for reserved / premium names (or something that's profane).

If it makes sense contextually: imagine if you were building the next Twitter. I'm guessing you'd want to have a way to charge for premium names and in-turn need a way to detect what's premium. For the most part, first and last names are pretty premium and people pay (they do!) for such usernames.

eptcyka•1mo ago

I can easily generate valid yet foul names that I’d prefer to not allow if I was into censoring usernames.

choraria•1mo ago

Tell me about 'em. Will add to the list. I doubt I'll be able to stop ALL variations but I really am determined to manually keep this list updated as best as possible. Currently at 1.7 million records; will be at around 2.5 million in the coming months and I suspect this will just keep increasing.

warmedcookie•1mo ago

It's pretty hard to be thorough in censorship because humans are good at spotting patterns and can easily see the masked profanity. Ex. f0u0c0k
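
A small sketch of how masking of this kind might be partly undone before a list lookup, assuming two cheap normalizations: look-alike substitution (so `sh1t` becomes its plain form) and dropping filler characters (so `f0u0c0k` does too). The `BLOCKED` set, the `LEET` table, and the function names are hypothetical. Matching is exact on the whole normalized name, which avoids Scunthorpe-style false positives but misses compounds.

```python
import re

# Tiny hypothetical block list, using the examples that appear in this thread.
BLOCKED = {"fuck", "shit"}

# Assumed look-alike substitutions; a real table would be much larger.
LEET = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)


def candidates(username: str) -> set:
    """Return two normalized readings of the name."""
    lowered = username.lower()
    # Reading 1: treat digits and symbols as look-alike letters.
    substituted = re.sub(r"[^a-z]", "", lowered.translate(LEET))
    # Reading 2: treat digits and symbols as filler and drop them.
    stripped = re.sub(r"[^a-z]", "", lowered)
    return {substituted, stripped}


def is_masked_profanity(username: str) -> bool:
    """True if either normalized reading is exactly a blocked term."""
    return not candidates(username).isdisjoint(BLOCKED)
```

This only covers two masking styles; as the comment above says, people spot patterns that a fixed set of rules will not.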

choraria•1mo ago

No disagreements there. My goal is to make this THE BEST list (of all, if any) out there. As much as I see AI taking over handling variations, it'll still need an exhaustive source of truth about what's what and I really think I'd be able to provide that with this API.

nicpottier•1mo ago

Congrats on the launch!

Do you expect / want this to be a business? This feels like the kind of thing where anybody big enough to pay for it will build it in house. And your pricing seems so cheap that even if you do win some it won't be enough.

Genuine curiosity but 300ms seems slow? Am I missing something? How big is the blacklist?

choraria•1mo ago

Thanks and I do appreciate the comment too.

I'm a bit unsure about its future as a business but for now, hoping it becomes my first app with some paying users. I typically think small scale but you're right. I suppose most big companies already have an in-house way to deal with it.

Idea behind this was super charged because there wasn't a global reserve list already available for folks to access.

On the latency, I'll work on improving it. Currently, the list (not a blacklist :P) is about 1.7 million records. I suspect it to go to 2.5M in the next few days. I should probably stop using Cloudflare Workers, KV and D1 to instantly improve on that.

nlh•1mo ago

I love that you’re tackling this problem, and congrats on launching and getting this on HN!

This does feel like a real problem. The thing that concerns me (and likely other devs here) is that it adds an additional remote API dependency for a very core part of a system when a lot of people are trying to keep those dependencies to an absolute minimum. When your service goes down (not if), everyone who’s dependent on you will not be able to register new users, etc.

Is there any way you can offer this as a library instead? You deserve to get paid of course - maybe provide the library and initial data and charge for updates / premium checks, something like that.

choraria•1mo ago

Super valid and fair. Thanks for taking the time and writing this too. In tears (on the inside) because of some validation around problem statement. I am exploring providing this as a pay-once service too, where you get a point-in-time CSV/JSON export and then folks pay to update data. Felt like too much work for the first release so didn't get to it.

As for the original concern though, here's some thoughts: You may just use it to flag (not act) in an async way. This way, you can just alert/monitor and decide later whether or not to take any actions while keeping the flow non-blocking. Another approach would be to run it against existing handles to see what opportunities exist (ex: premium usernames, impersonators etc.).

BUT, thanks again for the input. I'll definitely make this happen!

tommy_axle•1mo ago

I see a service like this as being in the ip lookup API category (like ipinfo.io) but I wanted to mention that for this (and IP lookup, captcha etc) I would expect that if the service is down then you allow the registrations then review later, and not simply prevent all registrations.
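
The fail-open behaviour described above could be sketched roughly as follows. Here `check` stands in for whatever function actually calls the remote service; it is a hypothetical callable, as are the timeout value and the returned field names. The `isReserved` key follows the response pasted earlier in the thread, and blocking on it is only one possible policy.

```python
from concurrent.futures import ThreadPoolExecutor


def check_fail_open(username, check, timeout_s=0.5):
    """Run check(username) with a deadline; on any failure, allow and flag for review."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(check, username)
    try:
        result = future.result(timeout=timeout_s)
        # Service answered in time: act on its verdict (assumed policy).
        return {"allow": not result.get("isReserved", False), "needs_review": False}
    except Exception:
        # Timeout, network error, or malformed response: do not block sign-up.
        return {"allow": True, "needs_review": True}
    finally:
        # Do not wait for a slow call to finish before returning.
        pool.shutdown(wait=False)
```

Registrations that come back with `needs_review` set can be queued and re-checked once the service is reachable again.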

choraria•1mo ago

Interesting. I think you're right (on the API category this falls under). Also love the approach on keeping this API async. Makes so much more sense that way.

gs17•1mo ago

I'm not understanding your categories. Every dictionary word is flagged? It seems any first or last name is a "public_figure" ("apple" is a "public_figure" and also a "brand", I guess that means there's someone named Apple? Tim Apple?)?

It "blocks profanity", but "shithead", "assfucker", etc. are allowed (not to mention obfuscating a restricted term even slightly, e.g. "sh1t")? Yes, the Scunthorpe problem exists, but you can do better, and should if you're expecting people to pay to wait 500ms.

Something that detects these sorts of things very well could actually be worth paying for, although it still would probably be better off as a library.

choraria•1mo ago

Thanks and this gives me more perspective too. Here's what I'm hearing:

- need to improve categorisation (some are miscategorised, some categories don't make sense)
- better list; more subsets to block (fair and very true). This is an evolving list and so I'll work on constantly adding more to it (currently has ~1.7million records; will go to 2.5 in the next few days)
- latency is a killer

Again, I said it in another comment too, I'm pretty happy with this (tears on the inside) because the problem at least is validated in some way.

I just need to do better in terms of solutioning; which, IMO, is doable.

bpt3•1mo ago

Why do I care as a website owner whether someone uses a brand name (e.g. cocacola) as their username on my site?

Same question, but for place names which seems completely innocuous?

Instead of us telling you why this is a bad idea, can you tell us why this is a good idea and what bugs we are shipping currently that this prevents?

gs17•1mo ago

I could see social-media-ish websites not wanting those names to prevent impersonation. They'd be deciding if they want to risk friction when a big name joins the platform (@cocacola needs Coca-Cola to verify) or risk threats from that big name's legal department (when @cocacola gets registered by someone who just posts furry porn of their mascot bear). It could just set a flag to require the account to verify or be renamed.

bpt3•1mo ago

I get the argument in theory, but then I'll just register coca-cola (which is available), cocacola_furry (which is available), C0CAC0LA (which is available), etc.

You're signing up to play a game you can't win preemptively IMO.

As an aside, cocacola is also "available", despite being listed as an example of what you don't want to allow on the homepage and presumably would be flagged as a reserved brand name handle by this service.

choraria•1mo ago

You're right about the variations there. I did think about it but decided NOT to add that in this version (felt like over-complicating the process), which I've now come to understand IS a required criterion. Will work on improving this.

As for @cocacola — that's on me. I've not yet gotten to the bottom half of the list of categories here: https://docs.username.dev/reference/categories (need to work on "government" and below). "company" is listed there and I suspect "cocacola" should be covered there.

In hindsight, I should've reserved names that I'm showing in the flipping text of the hero title but I didn't want to game the system or make it seem more reliant than it currently is. Which, again, I'm learning is not so reliant to begin with anyway.

PS. Love the passion around the topic here. One thing that I'm happy about is getting the problem validated. It's not in my head, I'm not the only one experiencing it, this is real. AND I WILL SOLVE IT :)

choraria•1mo ago

Fair. I suppose most newer platforms may not think too much about it. So here's the pitch though: Imagine you're building the next Twitter (or, you know the platform has the potential to become the next Twitter). Knowing what we know now about social media platforms, where, users are open to paying for premium usernames (ex: @apple, @cocacola, @media etc.), it would be nice to at least flag/know if there are folks trying to reserve with these usernames. You could decide later / async what to do about it but you'll at least have a way to flag. Similarly, you can also avoid profanity or abusive words from seeping in the platform also. You may want to restrict/block 'em outright.

As for bugs: what I see happening now is folks either have a static list (which is already bad; not a bug) or have pattern-matching to avoid these (which isn't foolproof). Regex/pattern matching can only help in cases where we have "real" or "try" or "something" as a pre/postfix. It may handle more complex cases too, but it doesn't really identify a wide range of premium / reserved names. IMO, for this, we will need a dictionary of sorts, which is what I'm hoping to achieve with this API.

It's a giant manual list. I'm a human maintaining it. Just need to do better in terms of the API / deliverability side of things.

bpt3•1mo ago

Thanks for the response.

> Fair. I suppose most newer platforms may not think too much about it. So here's the pitch though: Imagine you're building the next Twitter (or, you know the platform has the potential to become the next Twitter). Knowing what we know now about social media platforms, where, users are open to paying for premium usernames (ex: @apple, @cocacola, @media etc.), it would be nice to at least flag/know if there are folks trying to reserve with these usernames. You could decide later / async what to do about it but you'll at least have a way to flag. Similarly, you can also avoid profanity or abusive words from seeping in the platform also. You may want to restrict/block 'em outright.

How many people are trying to build the next twitter? I would guess it's approximately zero, so I think you'll need a wider target audience to generate meaningful revenue.

It's much easier for the next twitter to just institute a policy that says handles can be modified by the platform as needed and deal with the "problem" post hoc.

> As for bugs: what I see happening now is folks either have a static list (which is already bad; not a bug) or have pattern-matching to avoid these (which isn't full proof). Regex/pattern matching can only help in cases where we have "real" or "try" or "something" as a pre/postfix. More complex cases but don't really identify a wide range of premium / reserved names. IMO, for this, we will need a dictionary of sorts, which is what I'm hoping to achieve with this API.

Based on what you've said, you're also using a static list, correct?

Long term, I suppose the actual value proposition is not that using a list is a bug, but you have the "best" list due to your scale and people can outsource managing their own version?

To me, the issue is that this isn't a solvable problem using your current approach because people are more creative than a list of banned strings and you're severely outnumbered at scale.

choraria•1mo ago

Right on all counts. Twitter is a rather simplified example. I see it as something that literally every platform can use. Say, ProductHunt, other platforms that offer product launches, link-in-bio tools etc. etc. I'm a bit bullish around the market because, regardless of me knowing all of 'em, the challenge of using usernames exists in general.

On the static list, yes. Me too. But I keep updating mine as well. For ex: on day 1, "apple" was just a dictionary word. On day 2, it was also classified as a brand. Also, every quarter, half-year or year, there are newer companies and public figures whose usernames become significant. Currently, though manually, I intend to maintain this list for the long run.

As for a better, permanent solution, on another comment, I came across using an LLM/classifier for this (based on my understanding, that's not just asking OpenAI but building an LLM of my own) where I have the "best" source of truth and the LLM handles all variations. I think it actually is solvable to an extent now. Though, I'm not sure what the final solution looks like. I WILL SOLVE THIS THOUGH :D

delduca•1mo ago

Hmm… I do know, certain usernames in one language can have a bad meaning in others

choraria•1mo ago

True. I've tried to add language where possible. I think currently, it's only on dictionary words, so if the username is a dictionary word in another language, it would be flagged. It may or may not show up under the "restricted" category though.

cracki•1mo ago

Site is AI-generated. The post to HN is AI-generated.

As other comments point out, lots of holes.

I think nobody should pay for that.

choraria•1mo ago

- site is AI generated: yes. I'm NOT a developer. I vibe-coded it using Cursor and other AI tools
- post is AI generated: not 100%. I wrote the whole thing myself (promise). The sentiment is real, so is all the context. I just asked AI to polish it. Had made too many typos in my original text. To avoid being labelled as "AI content", I now make video responses for the most part. Please check my twitter (same username) and you'll see.
- lots of holes: you bet! what I'm happy about though is that the problem statement is validated to an extent. I see multiple people ack'ing that the problem is real. It's just that my solution is bad. I can improve it and I will.
- paying: yes, you're right. IMO, they should try first. complain, complain, complain so I can get to fixing issues (like from many of the comments here) and only if they need to make more API requests, they could then choose to pay

WDYT?

dsfdsfdsffdsfs•1mo ago

Credits need to expire in X months. That way you don't have to keep the service running if it turns out not to get traction.

choraria•1mo ago

I think there's a general aversion to subscriptions at the moment so wanted to offer this on a usage-based pricing to begin with. I hope not to have to, but I will switch to a subscription if that's what most users end up asking for. Thanks for the note and comment though. Much appreciated.

bpt3•1mo ago

As a counter to my skeptical comments elsewhere, one way this could be a more useful service is if you go poll well known social media sites and see what the handle is associated with.

If the handle is taken by what seems like the same content/brand owner across FB, IG, reddit, X, etc. then that could add weight to a decision to reserve it (and be provided as useful context to your user as to why you recommend it be reserved), and if it's associated with something like hate speech or just crappy content someone who is doing brand research can know to look for alternatives.

choraria•1mo ago

YOU NAILED IT! That's part of what I want to explore next too. One of the other common things I've heard is to check other social media platforms to see if a username is reserved. But, namevine.com already did that. Maybe it doesn't do it as accurately anymore. Almost all platforms are super protective of their API usage in that way. In any case, that's definitely the next challenge to solve for.

drcongo•1mo ago

You might need to invest in a copy of Roger's Profanisaurus, I just registered something absolutely obscene.

choraria•1mo ago

That's helpful. I think I need to beef up profanity keywords A LOT more than I'd originally anticipated. On it!
