Element: setHTML() method

https://developer.mozilla.org/en-US/docs/Web/API/Element/setHTML

268•todsacerdoti•3mo ago

Comments

michalpleban•3mo ago

So is this basically a safe version of innerHTML?

Octoth0rpe•3mo ago

Yes, although a slightly more relevant way of putting it would be that it's an inbuilt DOMPurify (dompurify being an npm package commonly used to sanitize html before injecting it).
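As a rough sketch of what that looks like in application code: the helper below prefers `setHTML()` when the element has it, and otherwise falls back to `textContent`, which displays the markup as literal text instead of parsing it. The name `renderUntrusted` is made up for illustration, and the fallback is an assumption; a real app might fall back to DOMPurify instead.

```javascript
// Hypothetical helper: inject untrusted markup as safely as the platform allows.
function renderUntrusted(element, untrustedHtml) {
  if (typeof element.setHTML === "function") {
    // Sanitizing path: the browser parses the string and drops XSS-unsafe content.
    element.setHTML(untrustedHtml);
    return "sanitized";
  }
  // No setHTML() available: show the markup as plain text rather than parsing it.
  element.textContent = untrustedHtml;
  return "text";
}
```

The return value only reports which path was taken, which is handy while browser support is still uneven.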

ngold•3mo ago

Is this basically doing the same thing as https now? But for http, and firefox just never implemented a simple fix for its entire existence until now?

I obviously know nothing about this, but I still find it fascinating. Or am I off my block.

masklinn•3mo ago

This has nothing whatsoever to do with http.

bilekas•3mo ago

XSS isn't related to https/ssl. SSL is the secure connection between you and the server, but XSS is the injection of data into the site which will be executed in your browser in this case. The connection isn't relevant.

https://developer.mozilla.org/en-US/docs/Web/Security/Attack...

intrasight•3mo ago

I'm confused as to why you need a "safe" version if you're the one generating and injecting the HTML.

evbogue•3mo ago

Why should a web page only have a single person generating and injecting HTML into it?

intrasight•3mo ago

A single company. Why would I let another company inject HTML into my page?

afavour•3mo ago

There's this newfangled concept called social media where you let other people post content that exists on your web site. You're rarely allowed to post HTML because of the associated issues with sanitizing it. setHTML could help with that.

president_zippy•3mo ago

I just had a flashback to the heyday of MySpace. Now that I think about it though, Neocities has the "social networking" of being able to discover other people's pages and give each other likes and comments.

Hmmm...

mpeg•3mo ago

Or CMS content, or even anything that comes from the user outside of social media content and could cause a reflected XSS

for example, a search query, or a redirect url, or a million other things

intrasight•3mo ago

The analogy doesn't hold markup ;)

Whether I generate a whole page or generate a partial page and then add HTML to it is equivalent from a safety perspective.

matmo•3mo ago

Isn't this kinda like asking "why does my gun need a safety if I'm the only one consciously pulling the trigger"?

theendisney•3mo ago

It was kind of strange to have bbcode and wiki markup specifically to avoid allowing users to use html.

masklinn•3mo ago

Gruber’s original markdown tool passes HTML straight through, it was designed to make writing long-form content easier.

Markdown implementations can do any of that, only allowing a whitelist of HTML elements (GFM), or not allowing HTML at all.

halapro•3mo ago

If you generate it from completely static and known values, have at it.

If you include user-provided data, then you should sanitize it for HTML.

jeroenhd•3mo ago

As it turns out, verifying that HTML is safe to render without neutering HTML down to a whitelist of elements is actually quite difficult. That's not great when you're rendering user-generated content.

Solutions in the form of pre-existing HTML sanitisation libraries have existed for years but countless websites still manage to get XSS'd every year because not everyone capable of writing code is capable of writing secure code.

masklinn•3mo ago

1. Because you commonly are not.

2. Because it’s really easy to fuck up and leak attacker controlled content in markup, especially when the environment provides tons of tools to do things wrong and none to do things right. IME even when the environment provides tons of tools to do things right it’s an uphill battle (universe, idiots, yadda yadda).

zarzavat•3mo ago

Markdown parsers output unsafe HTML.

idreyn•3mo ago

It is to render untrusted (user-generated) HTML without letting them slip in markup like script tags that could harm other users.

bawolff•3mo ago

Because sometimes you generate html based on user input and upwards of 98% of web related security vulnerabilities have been this.

evilpie•3mo ago

We enabled this by default in Firefox Nightly (only) this week.

spankalee•3mo ago

I'll be very excited to use this in Lit when it hits baseline.

While lit-html templates are already XSS-hardened because template strings aren't forgeable, we do have utilities like `unsafeHTML()` that let you treat untrusted strings as HTML, which are currently... unsafe.

With `Element.setHTML()` we can make a `safeHTML()` directive and let the developer specify sanitizer options too.

StrauXX•3mo ago

Why don't you use DOMPurify right now? It's battle tested and supports configs just like this proposal.

ffsm8•3mo ago

Why would the framework do that?

The app developers can still use that right now, but if the framework forces its usage it'd unnecessarily increase package size for people that didn't need it.

spankalee•3mo ago

One, lit-html doesn't have any dependencies.

Two, even if we did, DOMPurify is ~2.7x bigger than lit-html core (3.1Kb minzipped), and the unsafeHTML() directive is less than 400 bytes minzipped. It's just really big to take on a sanitizer, and which one to use is an opinion we'd have to have. And lit-html is extensible and people can already write their own safeHTML() directive that uses DOMPurify.

For us it's a lot simpler to have safe templates, an unsafe directive, and not parse things too finely in between.

A built-in API is different for us though. It's standard, stable, and should eventually be well known by all web developers. We can integrate it with no extra dependencies or code, and just adopt the standard platform options.

senfiaj•3mo ago

There is also a native DOMParser API. I wrote an example of an HTML sanitizer that uses DOMParser: https://waspdev.com/articles/2025-05-07/how-to-sanitize-html...

leenify•3mo ago

Are you certain that this is secure? What about parsing depth/DOM clobbering, etc?

See https://mizu.re/post/exploring-the-dompurify-library-bypasse... for an example of why this is really hard. Please do not roll your own sanitizers; DOMPurify has very good maintenance hygiene, and the maintainer is an expert. I have reported a bunch of issues and never waited for more than two hours for a response in the past. He is also one of the leading authors of the specification behind `setHTML`.

bawolff•3mo ago

Their example only supports a very small subset of html, which makes the problem much easier.

senfiaj•3mo ago

My code accepts only a very limited subset of HTML tags and their respective attributes. (<a>, <img>, <font>, <br>, <b>, <strong>, <i>, <em>, <del>, <s>, <u>, <p>, <hr>, <li>, <ul>, <ol>).

I could easily add more, like headings or tables. Just decided to not overwhelm the readers. But all of the allowed elements / attributes here are harmless. When I'm copying them, I'm only copying the known safe elements and attributes (forbids unknown attributes, including styles/scripts, event handlers, style attributes, ids, or even classes). I have fine control over the allowed elements / attributes and the structure. This makes things much easier. For a basic html content management this kind of filtering is fine since DOMParser actually does the heavy lifting.
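The allowlist part of that approach can be sketched without touching the DOM at all. The snippet below is not the code from the linked article; it is a minimal illustration that assumes the parsing has already been done (for example by DOMParser) and only decides which tags and attributes to copy. The names `ALLOWED`, `isSafeUrl`, and `filterElement` are hypothetical.

```javascript
// Hypothetical allowlist: tag name -> attributes permitted on that tag.
const ALLOWED = new Map([
  ["a", ["href"]],
  ["img", ["src", "alt"]],
  ["b", []],
  ["p", []],
]);

// Only http(s) URLs survive, which rejects javascript: and data: URLs.
// Relative URLs are resolved against a placeholder base first.
function isSafeUrl(value) {
  try {
    const { protocol } = new URL(value, "https://example.invalid/");
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false;
  }
}

// Returns null for unknown tags, otherwise the tag plus only its known-safe attributes.
function filterElement(tagName, attrs) {
  const tag = tagName.toLowerCase();
  const allowedAttrs = ALLOWED.get(tag);
  if (!allowedAttrs) return null;
  const kept = {};
  for (const name of allowedAttrs) {
    const value = attrs[name];
    if (typeof value !== "string") continue;
    if ((name === "href" || name === "src") && !isSafeUrl(value)) continue;
    kept[name] = value;
  }
  return { tag, attrs: kept };
}
```

Because attributes are copied from the allowlist rather than filtered out of the input, anything unknown (event handlers, `style`, `id`, `class`) never reaches the output.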

Sure, DOMPurify is powerful and handles much more complex use cases (doesn't it also use DOMParser though?), no doubts about that. But a basic CMS probably has to handle basic HTML text elements. I guess inline SVG sanitation is more complicated (maybe just use ordinary <img> instead?).

If you have some html example that will inject js/css or cause any unexpected behavior in my code example, please provide that HTML.

somat•3mo ago

So that's why template literals are broken. I am not much of a JS dev but sometimes I play one on TV. and I was cursing up a storm because I could not get templates to work the way I wanted them to. And I quote "What do you mean template strings are not strings? What idiot designed this."

If curious I had a bright idea for a string translation library, yes, I know there are plenty of great internationalization libraries, but I wanted to try it out. the idea was to just write normalish template strings so the code reads well, then the translation engine would lookup the template string in the language table and replace it with the translated template string, this new template string is the one that would be filled. But I could not get it to work. I finally gave up and had to implement "the template system we have at home" from scratch just to get anything working.

To the designers of JS template literals, I apologize, you were blocking an attack vector that never crossed my mind. It was the same thing the first time I had to do the cors dance. I thought it was just about the stupidest thing I had ever seen. "This protects nothing, it only works when the client(the part you have no control over) decides to do it" The idea that you need protection after you have deliberately injected unknown malicious code(ads) into your web app took me several days of hard thought to understand.

normie3000•3mo ago

I've written a fair number of custom template literals, and I don't understand what your complaint is. Can you share more details?

somat•3mo ago

js can't use a string as a template.

my example: a table to lookup translated templates. most translation engines require you to use placeholder strings. this lets you use the template directly as the optional lookup key.

simplified with some liberties taken as this can't be done with template literals. Easy enough to fake with some regexes and loops. but I was a bit surprised that the built in js templates are limited in this manner.

    // Illustrative only: String.prototype.format() does not exist in JavaScript.
    const translate_table = {
      'Where is the ${thing}': '${thing} はどこですか',
    };

    function t(template, args) {
      if (translate_table[template] === undefined) {
        // No translation available: fill in the original template.
        return template.format(args);
      }
      else {
        return translate_table[template].format(args);
      }
    }

    user_dialog(t('Where is the ${thing}', { thing: users_thing }));

I even dug deep into tagged templates, but they can't do this either. The only solution I found was a variant of eval() and at that point I would rather write my own template engine.
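For reference, the "regexes and loops" workaround can be quite small. This is only a sketch under assumptions: placeholders are plain `${name}` with word characters, the templates are ordinary quoted strings (not backtick literals), and the names `TRANSLATIONS`, `format`, and `translate` are invented here.

```javascript
// Hypothetical translation table keyed by the untranslated template text.
// These are ordinary strings, so ${thing} is kept literally until format() runs.
const TRANSLATIONS = new Map([
  ["Where is the ${thing}", "${thing} はどこですか"],
]);

// Replace each ${name} placeholder with the matching own property of `args`.
// Unknown placeholders are left in place so missing values are easy to spot.
function format(template, args) {
  return template.replace(/\$\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(args, name) ? String(args[name]) : match);
}

// Look up the translated template, falling back to the original text.
function translate(template, args) {
  const translated = TRANSLATIONS.get(template) ?? template;
  return format(translated, args);
}
```

Note that nothing here is evaluated as code, so a placeholder can only name a value, never run an expression.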

normie3000•3mo ago

I think I understand what you're suggesting, and I think it can be achieved with javascript template literals. It might be easier to understand with a usage example instead of an implementation example.

The only restriction may be that variable placeholders in additional translations might need to be positional rather than named.

jitl•3mo ago

You can make your tagged template literal return an array of tokens, so the developer gets to write naturally and no one has to deal with parsing. Just use the json stringified token array as the key in your translation map.

Here's how the tagged template literal maps to tokens:

    t`Where is the ${t.thing()}` ->
    ["Where is the ", ["thing"]] // ["variable name"]

Example rendering a translated string directly:

    t`Where is the ${t.thing(user_data)}?`.toString()

It's an internet forum so I made it as short as possible over all other style factors. Untested - just trying to express the idea.

    /** @typedef {[name: string, value?: unknown]} Variable */
    /** @typedef {string | Variable} Token */
    isVariable = Array.isArray
    bind = (token, values) =>
      isVariable(token) ? [token[0], values[token[0]]] : token
    unbind = (token, values) => {
      if (isVariable(token) && token.length > 1) {
        if (values) {
          values[token[0]] = token[1]
        }
        return [token[0]]
      }
      return token
    }
    render = token => (isVariable(token) ? token[1] : token)
    /**
     * Render a translated string:
     * ```
     * t`Some kind of ${t.thing(user_data)}`.toString()
     * ```
     */
    t = (literals, ...args) => {
      // template = ["some kind of ", ""]
      //     args = [t.thing]
      // zip -> ["some kind of ", t.thing, ""]
      const tokens = literals.flatMap((literal, i) =>
        i === 0 ? literal : [args[i - 1], literal],
      )
      return methods(tokens)
    }
    // Arrow functions have no `this` of their own, so refer to `tokens` directly.
    methods = tokens =>
      Object.assign(tokens, {
        bound: values => methods(tokens.map(token => bind(token, values))),
        unbound: values => methods(tokens.map(token => unbind(token, values))),
        toKey: values => JSON.stringify(tokens.unbound(values)),
        toString: () => {
          const values = Object.create(null)
          const translated = TRANSLATION_TABLE[tokens.toKey(values)]
          const resolved = translated
            ? translated.map(token => bind(token, values))
            : tokens
          return resolved.map(render).join("")
        },
      })

    // Proxy so t.anyKey returns the variable constructor
    t = new Proxy(t, {
      get: (target, name) =>
        Reflect.get(target, name) ?? ((...args) => [name, ...args]),
    })

    // Example:
    const TRANSLATION_TABLE = {
      // This can be JSON.stringify round tripped fine
      [t`Some kind of ${t.thing()}`.toKey()]: t`${t.thing()} はどこですか`,
    }
    function handleEvent(event) {
      alert(t`Some kind of ${t.thing(event.thing)}`)
    }

    const prepared = t`Avoids ${t.repeated()} JSON.stringify lookups`
    function calledInLoop() {
      console.log(prepared.bound({ repeated: "lots" }).toString())
    }

vasvir•3mo ago

Yes the CORS threat model was also reversed for me. Couldn't understand it. Eventually I got it...

spankalee•3mo ago

What do you mean "broken"? Template literals are great.

bawolff•3mo ago

This has nothing to do with xss or security. It's also pretty common for template literals/string interpolation to work like this. There are a couple of exceptions, but the majority of programming languages do it this way.

It's why they are called "literals".

somat•3mo ago

As far as I can tell JS has no way to symbolically handle unformatted templates and then format them later.

For example, you can't do this.

  const t1 = new Template('Hello ${name}');

  const str_1 = t1.format({'name':user_name});

You could argue, perhaps correctly, that this is by design and doing something like this is a mistake. But when my whole clever idea depended on doing exactly this, I was a bit surprised when it does not work with native templates.
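The closest native equivalent is probably to wrap the literal in a function, so the interpolation runs when the function is called rather than when the literal is written. This does not give lookup by template text, which is what the idea above needed; it is only a sketch of deferred formatting, and `greetings` and `greet` are hypothetical names.

```javascript
// A "template" here is a function that evaluates a template literal on demand.
const greeting = ({ name }) => `Hello ${name}`;

// Hypothetical table mapping a language code to such a function.
const greetings = {
  en: greeting,
  ja: ({ name }) => `こんにちは、${name}`,
};

// Pick the template for the language, falling back to English, then fill it in.
function greet(lang, values) {
  const template = greetings[lang] ?? greetings.en;
  return template(values);
}
```

The trade-off is that translations now have to be shipped as code instead of data.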

bawolff•3mo ago

Sure. And you can't do it in php either.

I'm not saying it's right or wrong just that php is following the trend with this feature when it comes to language design.

I know i said earlier its not for security, but it could very well be for security (not xss though) as format string injection is a common vulnerability in c and python which allow this sort of thing.

leenify•3mo ago

Thank you for the effort to bring this to life together with Freddy!

redbell•3mo ago

> This feature is not Baseline because it does not work in some of the most widely-used browsers.

This is interesting, but it appears to be in its early days as none of the major browsers seem to support it.. yet.

JadeNB•3mo ago

A sibling comment by evilpie says that it is enabled in Firefox Nightly: https://news.ycombinator.com/item?id=45674985

MarsIronPI•3mo ago

Actually, it exists behind an about:config as far back as 138. So if you enable it, it even works in the current ESR.

CaptainOfCoit•3mo ago

Really happy to see it, after 25 years (https://www.bugcrowd.com/glossary/cross-site-scripting-xss/) of surviving without it. It always struck me as an obvious missing part of the DOM API, and I still don't know why it took this long time.

But mostly I'm just happy that it's finally here, I do appreciate all the hard work people have been doing to get this live.

theendisney•3mo ago

Yes

<sc<script>ript>

bawolff•3mo ago

I think DOM api people just really wanted everyone to use .appendChild() methods.

I think there is an interesting lesson here about how security is partially an ergonomic problem.

AlienRobot•3mo ago

Great functionality, terrible name.

varun_ch•3mo ago

I sometimes wonder what the DOM APIs could look like in a hypothetical world where we could start over with everything.

hexasquid•3mo ago

It looks like this isn't a standard yet.

jonathrg•3mo ago

Why? Does it not set the HTML?

netsharc•3mo ago

It doesn't say "There's a lot of hidden sanitizing stuff inside this method" from the name...

Something like "setSafeHTML()" would be preferable. (Since it's Mozilla, there should be a few committee meetings to come up with the appropriate name)...

hoppp•3mo ago

Well, could it be safelySetHTML instead of setSafeHTML?

The second one could imply the HTML is already safe, while the first one is a safe way to set HTML.

If it's just setHTML then it could imply that you don't care if it's safe or not.

AlienRobot•3mo ago

There is already an innerHTML property for elements. This doesn't set the outer HTML, so it's literally setInnerHTML2.

hexasquid•3mo ago

After a minute of digging, found discussion here: https://github.com/WICG/sanitizer-api/issues/100 Perhaps it can be reopened (or a new issue can be opened) regarding naming.

ajkjk•3mo ago

The name seems ideal to me.

dzogchen•3mo ago

Neat. I think once this is adopted by HTMX (or similar libraries) you don't need to sanitize on the server side anymore?

dylan604•3mo ago

Do you honestly feel that we will ever be in a place for the server to not need to sanitize data from the client? Really? I don't. Any suggestion to me of "not needing to sanitize data from client" will immediately have me thinking the person doing the suggesting is not very good at their job, really new, or trying to scam me.

There's no reason to not sanitize data from the client, yet every reason to sanitize it.

jsmith99•3mo ago

It's arguably easier just to sanitise at display time otherwise you have problems like double escaping.

bpt3•3mo ago

Easier does not mean better, which seems to be true in this case given the many, many vulnerabilities that have been exploited over the years due to a lack of input sanitization.

padjo•3mo ago

In this case easier is actually better. Sanitize a string at the point where you are going to use it. The locality makes it easy to verify that sanitation has been done correctly for the context. The alternative means you have to maintain a chain of custody for the string and ensure it is safe.

dylan604•3mo ago

if you are using it at the client, sure, but then why is the server involved? if you are sending it to the server, you need to treat it like it is always coming from a hacker with very bad intentions. i don't care where the data comes from, my server will sanitize it for its own protection. after all, just because it left "clean" from your browser does not mean it was not interfered with elsewhere upstream TLS be damned. if we've double encoded something, that's fine, it won't blow up the server. at the end of that day, that's what is most important. if some double decoding doesn't happen correctly on the client, then <shrugEmoji>

padjo•3mo ago

Yeah as an Irish person with an apostrophe in their name this attitude is why my name routinely gets mangled or I get told my name is invalid.

You don’t escape input. You safely store it in the database and then sanitize it at the point where you’re going to use it.

strbean•3mo ago

It can be a complicated and error-prone process, mainly in scenarios where you have multiple mediums that require different sanitizers. Obviously you should do it. But in such scenarios, the best practice is to sanitize as close to the place it is used as possible. I've seen terrible codebases where they tried to apply multiple layers of sanitization on user input before storing to the DB, then reverse the unneeded layers before output. Obviously this didn't work.

Point being, if you can move sanitization even closer to where it is used, and that sanitization is actually provided by the standard library of the platform in question, that's a massive win.

immibis•3mo ago

By "sanitise" what's really meant is usually "escape". User typed their display name as <script>. You want the screen to say their display name, which is <script>. Therefore you send &lt;script&gt;. That's not their display name - that's just what you write in HTML to get their display name to appear on the screen. You shouldn't store it in the database in the display_name column.
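A minimal sketch of that split, assuming the stored value is the raw string and escaping happens only when building markup. `escapeHtml` is a hand-written helper here, not a built-in.

```javascript
// Escape the five characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// What goes in the display_name column: the name exactly as typed.
const storedName = "<script>";

// What goes in the page: the escaped form, produced at output time.
const markup = `<span>${escapeHtml(storedName)}</span>`;
```

Escaping is not idempotent: running `escapeHtml` on an already escaped value turns `&amp;` into `&amp;amp;`, which is the double-escaping problem that comes from escaping before storage and again at display.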

strbean•3mo ago

Agreed. The codebase I'm thinking of was html encoding stuff before storing it, then when they needed to e.g. send an SMS, trying to remember to decode. Terrible.

dylan604•3mo ago

You're making a bad assumption that client side code was the last place the submitted string was altered in the path to the server. The man in the middle might have a different idea and should always be protected against on the server where it is the last place to sanitize it.

strbean•3mo ago

Well, you have to sanitize for the transport medium, otherwise you can't sanitize at all afterwards. But if I'm sending user content in JSON and I didn't sanitize it for insertion into HTML, what man in the middle is going to be compromised? Furthermore, how can I possibly protect an unknown intermediary without knowing what it is going to do with it?

Maybe it is going to try to copy a value into a 20 char buffer, I don't know!

padjo•3mo ago

Sanitize as close as possible to where it is used is usually best, then you don’t have to keep track of what’s sanitized and what’s not sanitized for very long.

(Especially important if sanitation is not idempotent!)

auxiliarymoose•3mo ago

If you sanitize on the server, you are making assumptions about what is safe/unsafe for your clients. It's possible to make these assumptions correctly, but that requires keeping them in sync with all clients which is hard to do correctly.

Something that's sanitized from an HTML standpoint is not necessarily sanitized for native desktop & mobile applications, client UI frameworks, etc. For example, with Cloudflare's Cloudbleed security incident, malformed img tags sent by origin servers (which weren't by themselves unsafe in browsers) caused their edge servers to append garbage (including miscellaneous secure data) from heap memory to some requests that got indexed by search engines.

Sanitization is always the sole responsibility of the consumer of the content to make sure it presents any inbound data safely. Sometimes the "consumer" is colocated on the server (e.g. for server rendered HTML + no native/API users) but many times it's not.

dylan604•3mo ago

> If you sanitize on the server, you are making assumptions about what is safe/unsafe for your clients.

No. I'm making decisions on what is safe for my server. I'm a back end guy, I don't really care about your front end code. I will never deem your front end code's requests as trustworthy. If the front end code cannot properly handle encoding, the back end code will do what it needs to do to not allow stupid string injection attacks. I don't know where your request has been. Just because you think it came from your code in the browser does not mean that was the last place it was altered before hitting the back end.

auxiliarymoose•3mo ago

How can user input be unsafe on the server? Are you evaluating it somehow?

User-generated content shouldn't be trusted in that way (inbound requests from client, data fields authored by users, etc.)

dylan604•3mo ago

Is that a serious question?

INSERT INTO table (user_name) VALUES ...

Are you one of today's 10000 on server side sanitizing of user data?

krapp•3mo ago

Are you one of today's 10000 on using parameterized queries and prepared statements?

Unless you're doing something stupid like concatenating strings into SQL queries, there's no need to "sanitize" anything going into a database. SQL injection is a solved problem.

Coming from the database and sending to the client, sure. But unless you're doing something stupid like concatenating strings into SQL statements it hasn't been necessary to "sanitize" data going into a database in ages.

Edit: I didn't realize until I reread this comment that I repeated part of it twice, but I'm keeping it in because it bears repeating.
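To make the contrast with string concatenation concrete, here is a small sketch that builds a parameterized query from a tagged template. The `sql` tag is hypothetical; the `$1` placeholder style and the `{ text, values }` shape are assumed to follow PostgreSQL conventions, and the table and column names are only examples.

```javascript
// Hypothetical `sql` tag: user values never become part of the query text.
function sql(strings, ...values) {
  // Join the literal parts with numbered placeholders: $1, $2, ...
  const text = strings.reduce((acc, part, i) => acc + "$" + i + part);
  return { text, values };
}

const userName = "Robert'); DROP TABLE students;--";
const query = sql`INSERT INTO users (user_name) VALUES (${userName})`;
// query.text   -> "INSERT INTO users (user_name) VALUES ($1)"
// query.values -> ["Robert'); DROP TABLE students;--"]
```

The hostile string travels to the database as data in `values`, so there is nothing in it to escape or strip.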

hoppp•3mo ago

SQL injection is solved if you use dependencies that solve it of course.

Other than SQL injection there is command or log injection, file names need to be sanitized or any user uploaded content for XSS and that includes images. Any incoming JSON data should be sanitized, extra fields removed etc.

Log injection is a pretty nasty sort of hack that depending on how the logs are processed can lead to XSS or Command injection
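Log injection in plain-text logs usually relies on line breaks or other control characters in a user-supplied field. A minimal sketch of neutralizing them, assuming one log entry per line; `toLogField` is a made-up name and the `\xNN` notation is just one possible encoding.

```javascript
// Replace ASCII control characters (including CR and LF) with a visible \xNN form
// so a user-supplied value cannot start a forged log line.
function toLogField(value) {
  return String(value).replace(/[\u0000-\u001f\u007f]/g, (ch) =>
    "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0"));
}

const attackerName = "bob\nADMIN login ok";
const line = "login failed for user=" + toLogField(attackerName);
```

This only covers forged lines. If the logs are later shown in an HTML viewer, they still need escaping at that point.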

auxiliarymoose•3mo ago

Communicating with a SQL driver by concatenating strings containing user input and then evaluating it? wat?

I'm very interested in what tech stack you are using where this is a problem.

jfengel•3mo ago

People do it all the time, on any tech stack that lets you execute command strings. A lot of early databases didn't even support things like parameterized inserts.

leenify•3mo ago

As the stuff is rendered on the front-end how do you deal with tags where you do not even have the information to decide how they shall be parsed on the server?

This seems rather ignorant and, in my experience, leads to security issues, such as CVE-2023-38500 or CVE-2023-23627. This is not decidable on the server-side, so you will always mess stuff like this up. Sanitization can only work properly on the client for HTML.

wasmperson•3mo ago

I interpreted your question as "do I now no longer need to escape user-generated data in the HTML sent by the server in response to requests by HTMX?" The short answer is no, you still need to escape it:

- HTMX adds extra significance to HTML attributes which aren't accounted for by the built-in sanitizer

- HTMX can't add a custom sanitizer because it wouldn't be able to distinguish between intentional and malicious uses of those attributes

- Even if the HTMX client library sanitized all of the HTML from the server, you can't guarantee that all requests to the server will come from HTMX: browsers can navigate to your "back-end" URLs directly. While you can protect yourself from this using HTTP headers, that's not something I'd feel comfortable relying on since it would be easy to not notice when you've accidentally gotten it wrong.

The HTMX website has a longer explainer on how to protect yourself from XSS when using the library:

https://htmx.org/essays/web-security-basics-with-htmx/

ishouldbework•3mo ago

> It then removes any HTML entities that aren't allowed by the sanitizer configuration, and further removes any XSS-unsafe elements or attributes — whether or not they are allowed by the sanitizer configuration.

Emphasis mine. I do not understand this design choice. If I explicitly allow `script` tag, why should it be stripped?

If the method was called setXSSSafeSubsetOfHTML sure I guess, but feels weird for setHTML to have impossible-to-override filter.

evilpie•3mo ago

If you want to use an XSS-unsafe Sanitizer you have to use setHTMLUnsafe.

jmull•3mo ago

I guess they are going for a safe default... the idea is people who don't carefully read the docs or carefully monitor the provenance of their dynamically generated HTML will probably reach for "setHTML()".

Meanwhile, there's "setHTMLUnsafe()" and, of course, good old .innerHTML.
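For completeness, a sketch of passing a custom configuration to `setHTML()`. The `{ sanitizer: { elements, attributes } }` shape is an assumption based on the draft Sanitizer API and may differ between browser versions; `COMMENT_SANITIZER` and `setCommentBody` are hypothetical names.

```javascript
// Hypothetical allowlist for comment bodies.
// Assumption: the option shape below matches the draft Sanitizer API.
const COMMENT_SANITIZER = {
  elements: ["p", "b", "i", "em", "strong", "a"],
  attributes: ["href"],
};

function setCommentBody(element, untrustedHtml) {
  if (typeof element.setHTML !== "function") {
    throw new Error("Element.setHTML() is not available in this browser");
  }
  // Per the docs quoted above, XSS-unsafe content is removed even if a
  // configuration tries to allow it, so adding "script" here would not help.
  element.setHTML(untrustedHtml, { sanitizer: COMMENT_SANITIZER });
}
```

Throwing when the method is missing is a deliberate choice in this sketch: silently falling back to `innerHTML` would remove the protection.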

strbean•3mo ago

This is primarily an ergonomic addition, so it kinda makes sense to me to not make the dangerous footguns more ergonomic in the process. You can still assign `innerHTML` etc. to do the dangerous thing.

hsbauauvhabzb•3mo ago

Ideally this should be called dangerouslySetInnerHTML but hindsight blah blah

meowface•3mo ago

I agree, though I also agree with the parent that the method name is a little bit confusing. "safeSetHTML" or "setUntrustedHTML" or something would be clearer.

strbean•3mo ago

Idk about that, there's a good argument that the most obvious methods should be the safe ones. That's what juniors will probably jump to first. If you need the unsafe ones, you'll probably be able to figure that out and find them quickly.

jfengel•3mo ago

I like React's dangerouslySetInnerHTML. The name so clearly conveys "you can do this but you really, really, really shouldn't".

domenicd•3mo ago

Indeed, the web platform now has setHTML() and setHTMLUnsafe() to replace the innerHTML setter.

There's also getHTML() (which has extra capabilities over the innerHTML getter).

meowface•3mo ago

Okay, I've changed my mind and agree this is better, then. I wasn't aware they were adding two new methods. That is the safest way to do it.

SoftTalker•3mo ago

Why not name it what it does: sanitizeAndSetHTML

xp84•3mo ago

Naming things in that manner hasn’t proven to be a good idea over the years.

When you have 2 of something and one is safe/better and the other one is known to be problematic, you give the awkward name to the problematic one and the obvious name to the safe/better one. Noobs oughtn’t to be attempting the other one, and anyone who is mature enough to have reason to do it is mature enough to appreciate the reason behind that complexity.

pwdisswordfishy•3mo ago

It doesn't matter when the "unsafe" method is already so entrenched and easy to reach for.

xp84•3mo ago

Sure it does. A baby developer today has a good chance of discovering setHTML first. The most “with it” keep abreast of great new additions to the DOM API. We just have to educate the mid-levels (and hope the AI that does most of the actual code authoring for the juniors gets the memo quickly).

wewtyflakes•3mo ago

Wouldn't that open the floodgates by allowing code that could itself call `setHTML` again but then further revise the args to escalate its privileges?

systoll•3mo ago

A script tag would be able to call setHTMLUnsafe, bypassing whatever sanitation you configured.

I’d’ve made it a runtime error to call setHTML with an unsafe config, but Javascript tends toward implicit reinterpretation rather than erroring-out.

recursivecaveat•3mo ago

You have to make the safe version the ergonomic one. Many many C++ memory bugs are a result of the standards committee making the undefined behaviour version of an operation even 3 characters shorter than the safe one. (They're still doing it too! I found another example added in C++23 recently)

masklinn•3mo ago

> feels weird for setHTML to have impossible-to-override filter.

It really doesn’t. We’ve decades of experience telling us that safe behaviour is critical.

> I do not understand this design choice. If I explicitly allow `script` tag, why should it be stripped?

Because there’s an infinitesimal number of situations where it’s not broken, and that means you should have to put in work to get there.

`innerHTML` still exists, and `setHTMLUnsafe` has no filtering whatsoever by default (not even the script deactivation innerHTML performs).

ishouldbework•3mo ago

I did not notice setHTMLUnsafe exists. That makes it (in my, unimportant, opinion) fine.

ibowankenobi•3mo ago

The API design could be better. Document fragments are designed to be reused. It should accept an optional fragment key which accepts a document fragment. If not a fragment, throw; if it has children, empty the contents first.

spankalee•3mo ago

In what way are document fragments meant to be reused?

They empty their contents into the new parent when they're appended, so they can't be meaningfully appended a second time without rebuilding them.

`<template>` is meant to be reused, since you're meant to clone it in order to use it, and then you can clone it again.

ibowankenobi•3mo ago

You can absolutely reuse a document fragment

https://ibrahimtanyalcin.github.io/Cahir/

the whole rendering uses a single fragment.

halapro•3mo ago

You can absolutely not reuse a DocumentFragment. The moment you append it to a node, the fragment is emptied.

https://dom.spec.whatwg.org/#mutation-algorithms

> To insert a node into a parent before a child [...]:

> If node is a DocumentFragment node:

> Remove its children

ibowankenobi•3mo ago

People have transformed their brains by using frameworks and do not understand how the DOM works.

I pasted A LIVE example to prove you wrong and you still point me to a WHATWG link. YES, when you append it, it is emptied! Keep a reference to the same fragment and REAPPEND to it! REUSE it. If you want to empty it without appending, call replaceChildren() since it inherits from Node.

Why are people stubborn about things they don't know????

spankalee•3mo ago

We are only talking about the DOM and not about frameworks.

DocumentFragments empty their contents when appended. This is standard DOM behavior. To "reuse" a DocumentFragment after appending it somewhere you have to repopulate it with _new_ DOM, which is no different from creating a new fragment.

At that point are you really arguing that you can keep a container and keep refilling it and that counts as reuse in the sense we mean? Reuse in spirit is reusing the DOM in the container, not just the empty container.

padjo•3mo ago

As someone who has dealt with more than my fair share of content injection vulnerabilities over the years, this is great to see at last. It’s kinda crazy that this is only coming now while other, more cumbersome solutions like CSP have been around for years.

modinfo•3mo ago

Cursor built a pseudo-sethtml: https://github.com/skorotkiewicz/pseudo-sethtml

exdeejay_•3mo ago

This code only does the most basic and naive regex filtering that even a beginner XSS course's inputs would work against. With the Node example code and input string:

  <p>Hello <scr<script>ipt>alert(1)</scr<script>ipt> World</p>

The program outputs:

  $ node .
  <p>Hello <script>alert(1)</script> World</p>
  {
    sanitizedHTML: '<p>Hello <script>alert(1)</script> World</p>',
    wasModified: true,
    removedElements: [],
    removedAttributes: []
  }

Asking a chatbot to make a security function and then posting it for others to use without even reviewing it is not only disrespectful, but dangerous and grossly negligent. Please take this down.
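To see why the nested input gets through, here is a minimal sketch that reproduces the bypass. It assumes the filter does a single global replace with a tag-stripping pattern like the one quoted from node.ts:52 further down the thread. `stripTagOnce` is a hypothetical name for illustration, not a function from that repository.

```javascript
// Hypothetical helper: remove every <tag ...> and </tag> in ONE pass.
// This mirrors the naive approach; it is deliberately NOT a safe sanitizer.
function stripTagOnce(html, tag) {
  const regex = new RegExp(`<\\/?${tag}[^>]*>`, "gi");
  return html.replace(regex, "");
}

const input = "<p>Hello <scr<script>ipt>alert(1)</scr<script>ipt> World</p>";

// The inner "<script>" is removed, which splices "<scr" and "ipt>"
// back together into a brand new "<script>" that is never rescanned.
const output = stripTagOnce(input, "script");
console.log(output);
```

Removing a match can create a new match out of the surrounding text, and a single replace pass never looks at its own output. Looping until nothing changes closes this particular hole, but it still leaves you pattern-matching on text instead of working on a parsed tree.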

codedokode•3mo ago

I wonder why Cursor chose a regex approach when it is widely known to be the wrong method. Is it a result of training on low-quality forums for beginners?

foldr•3mo ago

It does seem like a weirdly bad result. I got something more sensible that used DOMParser when I gave GPT-5 the following prompt:

> Write a JavaScript function for sanitizing arbitrary untrusted HTML input before setting a DOM element’s innerHTML attribute.

I won’t post it here in case someone tries to use it, but it wasn’t just doing regex munging.

bilekas•3mo ago

It doesn't really matter, but if you give it the exact same prompt it will give different results every time. And if you don't know how to write one properly yourself, you really shouldn't be blindly trusting AI to produce something correct. But these are the source of all future employment for developers and engineers who actually know things.

sph•3mo ago

  node.ts:52: const regex = new RegExp(`<\\/?${tag}[^>]*>`, "gi");
  node.ts:72: const regex = new RegExp(`\\s+${attr}\\s*=\\s*["'][^"']*["']`, "gi");
  node.ts:94: const tagRegex = /<(\w+)[^>]*>/g;

https://stackoverflow.com/questions/1732348/regex-match-open...

LLMs are not intelligent enough to figure out that the post is non-satirical and you should indeed avoid parsing HTML with regexes.

On the other hand, there is a non-zero chance that a vibe coded HTML parser will eventually include obscure references to ritual infanticide and other eldritch entities of the Basic Multilingual Plane.

_the_inflator•3mo ago

Maybe it is then time to have something beyond "use strict" at the beginning of a JavaScript document as one way to use the statement.

I think a config object in which you define script options like sanitization and other script configuration might be helpful.

After all, backward compatibility almost always needs to be ensured, and this might work. I am no spec guy, it is just an idea. React makes use of "use client/server", so this would be more central and explicit.

ricardobeat•3mo ago

ESM modules are already in strict mode by default.

rictic•3mo ago

A somewhat related spec, at the page level rather than the module level, is Content Security Policy, which lets a page disable various unsafe browser features: https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP

One of my favorite features in there is trusted types enforcement: https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Typ...

Lets you create your own API for what code is allowed to create arbitrary, potentially unsafe HTML at runtime, so you can allow secure templating systems but disallow code that just concats strings together naively.

cheeaun•3mo ago

Found a polyfill here https://github.com/mozilla/sanitizer-polyfill

dkyc•3mo ago

This just uses DOMPurify under the hood

Izkata•3mo ago

...yes, that's what a polyfill is: a javascript implementation of a new spec that's only applied when the current browser doesn't yet support the new spec. This lets devs start using it right away, then when it has enough support across browsers the polyfill can be removed without changing their code.
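If you would rather not ship a polyfill at all, feature detection with a conservative fallback is another option. The sketch below is an assumption about how one might wire that up, not something from the polyfill repository: `setUntrustedHTML` is a hypothetical helper, and falling back to `textContent` (which shows the markup as plain text) is a design choice that trades formatting for safety.

```javascript
// Hypothetical helper: prefer the built-in sanitizing setter when the
// browser provides it, otherwise fall back to inserting plain text.
function setUntrustedHTML(element, html) {
  if (typeof element.setHTML === "function") {
    element.setHTML(html); // browser sanitizes with its default config
    return "setHTML";
  }
  element.textContent = html; // no parsing at all, markup shown literally
  return "textContent";
}
```

In a page this would be called as `setUntrustedHTML(document.querySelector("#comment"), userInput)`, where the selector is just an example. The return value only reports which path was taken, which is handy for logging how many users still hit the fallback.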

sergeykish•3mo ago

So `.setHTML("<script>...</script>")` does not set HTML?

xp84•3mo ago

Sounds reasonable enough to me. 99.99% of the time you’re in an actual script, if you mean to execute code, you’d just execute it yourself, rather than making a script tag full of code and sticking that tag into a random DOM element. That’s why the default wouldn’t honor the script tag and there’d be an “unsafe” method explicitly named as such to hint that you’re doing something weird.

amelius•3mo ago

But it breaks an abstraction. Sometimes you just want to take working HTML and insert it into a document. It will be painful if suddenly this does not work, and you have to dig into the documentation to see why.

rictic•3mo ago

It is also painful when your app gets hacked, accounts get taken over and abused, user data is compromised, and so on. For serious sites it's worth the pain to turn on security enforcement features.

amelius•3mo ago

Ok, but be sure to make it optional. Putting 10 locks on your door is great for security, but it's not for everyone.

And instead of this security feature some might want to take a more fundamental look at security which might lead them to a completely different design. Again, make it optional.

kibwen•3mo ago

It is optional. Use setHTMLUnsafe.

joquarky•3mo ago

Then just use innerHTML, it's not going away.

xp84•3mo ago

If a developer so green that they don’t know what script injection risk is, and doesn’t know about innerHTML vs this method, stumbles into that scenario, I want them to encounter friction and have to dig into the documentation to find out why their script tag wasn’t run. Then they can start to learn how to do their job correctly. Having everything “just work” unsafely by default is not a viable best practice on the Web in 2025. Things have been slowly changing in this direction for at least a decade.

In fact, it’s better for the industry even if a few such individuals are so pained by having to learn about and handle security that they just quit web development entirely. Just like aspiring pilots who can’t stand checklists and safety rules should pursue a different career.

WA•3mo ago

Neither does

    .innerHTML = "<script>...</script>"

codedokode•3mo ago

I don't like this. This could be implemented as a JS library. I believe browsers should provide a minimal API so that they are smaller and easier to create. As for a safe alternative to innerHTML, it is called innerText.

csmantle•3mo ago

I think innerText and setHTML() have different purposes. The former inserts the whole string as a text leaf, while the latter tries to preserve structures that are meaningful in context.

---

Libraries can surely do the same job, but then the exact behavior would vary among a sea of those libs. Having specs defined [0] for such an interface would hopefully iron out much of these variations, as well as enabling some performance gains.

[0]: https://wicg.github.io/sanitizer-api/#dom-element-sethtml

codedokode•3mo ago

And if you need something that is not in a spec, you have to use a library anyway. Also the point was that browsers should be as simple as possible and not like a whole new OS.

codezero•3mo ago

innerText has wildly inconsistent implementations across browsers.

petralithic•3mo ago

> I believe browsers should provide the minimal API so that they are smaller and easier to create.

That ship has long since sailed. Browsers are so complex that it takes quite some effort to support the various levels of 9s of the percentage of compatibility with standards, not to mention the browser makers themselves define many of the standards.

CGamesPlay•3mo ago

Is “XSS-unsafe” precisely defined anywhere? I assume it means “any access to the JS interpreter”, but assuming in this context seems decidedly unsafe.

pyth0•3mo ago

It appears you can tune what is sanitized from the input via the "sanitizer" optional parameter. The default sanitizer is however defined in a spec linked on the docs page [1] with the actual sanitize operation specified as well [2].

[1] https://wicg.github.io/sanitizer-api/#dom-element-sethtml

[2] https://wicg.github.io/sanitizer-api/#sanitize

CGamesPlay•3mo ago

Ah, perfect, the "remove unsafe" operation is what I was looking for. It includes a list of elements and a list of attributes. These appear to apply regardless of the sanitizer configuration you use: the original MDN link demonstrates allowlisting "script" and seeing that it is removed anyway.

https://wicg.github.io/sanitizer-api/#sanitizerconfig-remove...
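As a rough mental model, the safe method behaves as if a fixed baseline were subtracted from whatever you allowlist. The toy sketch below only illustrates that idea. It is not the spec algorithm, `effectiveAllowedElements` is a hypothetical name, and the `ALWAYS_REMOVED` set is an illustrative assumption rather than the spec's actual list (which also covers attributes).

```javascript
// Illustrative assumption: a few elements treated as always unsafe.
// The real baseline lives in the Sanitizer API spec and is longer.
const ALWAYS_REMOVED = new Set(["script", "iframe", "object", "embed"]);

// Toy model: whatever the caller allowlists, baseline-unsafe names
// are dropped before the configuration takes effect.
function effectiveAllowedElements(requested) {
  return requested.filter((name) => !ALWAYS_REMOVED.has(name.toLowerCase()));
}

console.log(effectiveAllowedElements(["p", "script", "b"]));
```

So asking for `script` in the config is not an error in this model; the request is quietly narrowed, which matches the behaviour described in the MDN example.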

Traubenfuchs•3mo ago

So this is the easier, built in successor to

    innerHTML = trustedTypes.createPolicy('myPolicy', {
      createHTML: (input) => DOMPurify.sanitize(input, {RETURN_TRUSTED_TYPE: true})
    }).createHTML()

textlanes33•3mo ago

This is good news for me. Finally! A safer and more predictable alternative to innerHTML.

ulrischa•3mo ago

For 5 years everybody has been saying jQuery is no longer necessary. But really basic functions like this take a long time to replace jQuery.

halapro•3mo ago

jQuery does not sanitize HTML. This is why jQuery is no longer necessary, even if people think it is.

ulrischa•3mo ago

There is the jQuery bashing again.

    let sanitizedHTML = $('<div>').text(unsanitizedHTML).html();

wccrawford•3mo ago

You can 100% do that same thing without jQuery. It's not even complicated.

And that is not what the new .setHTML() does.
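The distinction is escaping versus sanitizing. Setting text and reading it back as HTML amounts to escaping: every tag, safe or not, becomes literal text. A minimal dependency-free sketch of escaping follows. `escapeHtml` is a hypothetical helper name, and the exact set of characters it encodes is an assumption (it also encodes quotes, which is a common choice for attribute contexts).

```javascript
// Hypothetical helper: turn markup-significant characters into entities
// so the browser displays the string instead of parsing it as HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities survive
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<b onclick="x">hi</b>'));
```

Escaping keeps nothing of the original structure, so a harmless `<b>` is lost along with the `onclick`. A sanitizer such as setHTML() instead parses the input and keeps the safe parts as real elements.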

awayredbarron•3mo ago

>Verbose I/O element

Parsing > "DocumentFragment"

Returns proc. exit status [0]/[1] for browser HTML incompatibility.

xnorswap•3mo ago

That's really good to hear.

I've found LLMs will happily generate XSS vulnerable code, which will make things worse for a while until they can be trained better.

In fact, I found it really difficult to get claude-code to use templating libraries and not default to hand-written templating with XSS vulnerabilities and injecting content directly, even after going through options with it.

There's also a difference between escaping and sanitisation which can be tricky to handle and track, and it can even be dangerous to try to mix different approaches or sanitizers.

Having a safe backstop in the form of setHTML() to use will be a fantastic addition to narrow the scope of ways to get it wrong.

tartoran•3mo ago

This whole page renders extremely poorly on mobile, beyond usable I'd say.

waldothedog•3mo ago

Working well for me on mobile!

erickotato•3mo ago

I don't really get it. Why is this needed on the client? If all we want is to prevent XSS attacks, wouldn't it be more effective to sanitize on the server? Am I missing something?
