While lit-html templates are already XSS-hardened because template strings aren't forgeable, we do have utilities like `unsafeHTML()` that let you treat untrusted strings as HTML, which are currently... unsafe.
With `Element.setHTML()` we can make a `safeHTML()` directive and let the developer specify sanitizer options too.
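A rough sketch of the core of such a helper, outside of lit-html's directive machinery. The function name `setSanitizedHTML` and the plain-text fallback are assumptions for illustration, not anything lit-html ships; the only platform call used is `element.setHTML(html, { sanitizer })`.

```javascript
// Hypothetical helper, not part of lit-html: prefer the built-in sanitizer
// and fall back to plain text when Element.setHTML() is unavailable.
function setSanitizedHTML(element, html, sanitizer) {
  if (typeof element.setHTML === "function") {
    // Pass sanitizer options through only when the caller supplied them.
    element.setHTML(html, sanitizer ? { sanitizer } : undefined);
    return "setHTML";
  }
  // Fallback: textContent never parses markup, so nothing can execute.
  element.textContent = html;
  return "textContent";
}
```

In a browser this might be called as `setSanitizedHTML(el, untrusted, { elements: ["p", "b", "em"] })`, assuming the sanitizer config shape from the current draft spec. Falling back to text loses formatting but fails closed.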
The app developers can still use that right now, but if the framework forced its usage it'd unnecessarily increase package size for people that didn't need it.
Two, even if we did, DOMPurify is ~2.7x bigger than lit-html core (3.1Kb minzipped), and the unsafeHTML() directive is less than 400 bytes minzipped. It's just really big to take on a sanitizer, and which one to use is an opinion we'd have to have. And lit-html is extensible and people can already write their own safeHTML() directive that uses DOMPurify.
For us it's a lot simpler to have safe templates, an unsafe directive, and not parse things too finely in between.
A built-in API is different for us though. It's standard, stable, and should eventually be well known by all web developers. We can integrate it with no extra dependencies or code, and just adopt the standard platform options.
See https://mizu.re/post/exploring-the-dompurify-library-bypasse... for an example of why this is really hard. Please do not roll your own sanitizers; DOMPurify has very good maintenance hygiene, and the maintainer is an expert. I have reported a bunch of issues and never waited for more than two hours for a response in the past. He is also one of the leading authors of the specification behind `setHTML`.
I could easily add more, like headings or tables. Just decided to not overwhelm the readers. But all of the allowed elements / attributes here are harmless. When I'm copying them, I'm only copying the known safe elements and attributes (forbids unknown attributes, including styles/scripts, event handlers, style attributes, ids, or even classes). I have fine control over the allowed elements / attributes and the structure. This makes things much easier. For a basic html content management this kind of filtering is fine since DOMParser actually does the heavy lifting.
Sure, DOMPurify is powerful and handles much more complex use cases (doesn't it also use DOMParser though?), no doubts about that. But a basic CMS probably only has to handle basic HTML text elements. I guess inline SVG sanitization is more complicated (maybe just use an ordinary <img> instead?).
If you have some html example that will inject js/css or cause any unexpected behavior in my code example, please provide that HTML.
In case you're curious: I had a bright idea for a string translation library. Yes, I know there are plenty of great internationalization libraries, but I wanted to try it out. The idea was to write normal-ish template strings so the code reads well; the translation engine would then look up the template string in the language table and replace it with the translated template string, and that new template string is the one that would be filled. But I could not get it to work. I finally gave up and had to implement "the template system we have at home" from scratch just to get anything working.
To the designers of JS template literals, I apologize, you were blocking an attack vector that never crossed my mind. It was the same thing the first time I had to do the CORS dance. I thought it was just about the stupidest thing I had ever seen. "This protects nothing, it only works when the client (the part you have no control over) decides to do it." The idea that you need protection after you have deliberately injected unknown malicious code (ads) into your web app took me several days of hard thought to understand.
my example: a table to lookup translated templates. most translation engines require you to use placeholder strings. this lets you use the template directly as the optional lookup key.
simplified with some liberties taken as this can't be done with template literals. Easy enough to fake with some regexes and loops. but I was a bit surprised that the built in js templates are limited in this manner.
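// Stand-in for the wished-for String.prototype.format(), which JS does not
// have: fill ${name} placeholders with a regex.
const format = (template, args) =>
  template.replace(/\$\{(\w+)\}/g, (_, name) => String(args[name]));

const translate_table = {
  'Where is the ${thing}': '${thing} はどこですか',
};

function t(template, args) {
  // Fall back to the original template when no translation exists.
  return format(translate_table[template] ?? template, args);
}

user_dialog(t('Where is the ${thing}', { thing: users_thing }));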
I even dug deep into tagged templates, but they can't do this either. The only solution I found was a variant of eval() and at that point I would rather write my own template engine.
The only restriction may be that variable placeholders in additional translations might need to be positional rather than named.
Here's how the tagged template literal maps to tokens:
t`Where is the ${t.thing()}` ->
["Where is the ", ["thing"]] // ["variable name"]
Example rendering a translated string directly: t`Where is the ${t.thing(user_data)}?`.toString()
It's an internet forum, so I made it as short as possible over all other style factors. Untested - just trying to express the idea.
/** @typedef {[name: string, value?: unknown]} Variable */
/** @typedef {string | Variable} Token */
isVariable = Array.isArray
bind = (token, values) =>
isVariable(token) ? [token[0], values[token[0]]] : token
unbind = (token, values) => {
if (isVariable(token) && token.length > 1) {
if (values) {
values[token[0]] = token[1]
}
return [token[0]]
}
return token
}
render = token => (isVariable(token) ? token[1] : token)
/**
* Render a translated string:
* ```
* t`Some kind of ${t.thing(user_data)}`.toString()
* ```
*/
t = (literals, ...args) => {
// template = ["some kind of ", ""]
// args = [t.thing]
// zip -> ["some kind of ", t.thing, ""]
const tokens = literals.flatMap((literal, i) =>
i === 0 ? literal : [args[i - 1], literal],
)
return methods(tokens)
}
methods = tokens =>
// Arrow functions have no own `this`, so refer to `tokens` directly.
Object.assign(tokens, {
bound: values => methods(tokens.map(token => bind(token, values))),
unbound: values => methods(tokens.map(token => unbind(token, values))),
toKey: values => JSON.stringify(tokens.unbound(values)),
toString: () => {
const values = Object.create(null)
const translated = TRANSLATION_TABLE[tokens.toKey(values)]
const resolved = translated
? translated.map(token => bind(token, values))
: tokens
return resolved.map(render).join("")
},
})
// Proxy so t.anyKey returns the variable constructor
t = new Proxy(t, {
get: (target, name) =>
Reflect.get(target, name) ?? ((...args) => [name, ...args]),
})
// Example:
const TRANSLATION_TABLE = {
// This can be JSON.stringify round tripped fine
[t`Some kind of ${t.thing()}`.toKey()]: t`${t.thing()} はどこですか`,
}
function handleEvent(event) {
alert(t`Some kind of ${t.thing(event.thing)}`)
}
const prepared = t`Avoids ${t.repeated()} JSON.stringify lookups`
function calledInLoop() {
console.log(prepared.bound({ repeated: "lots" }).toString())
}
It's why they are called "literals".
For example, you can't do this.
const t1 = new Template('Hello ${name}');
const str_1 = t1.format({'name':user_name});
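For comparison, a small sketch of what the language does offer. A `${...}` inside an ordinary string is just text, a template literal is evaluated on the spot, and the closest native stand-in for a reusable template is a function wrapping the literal. The names `greeting` and the sample values are made up for illustration.

```javascript
// In an ordinary string, ${name} is just text; nothing is interpolated.
const plain = 'Hello ${name}';

// A template literal is evaluated immediately, where it is written.
const user_name = "Ada";
const eager = `Hello ${user_name}`;

// Wrapping the literal in a function defers evaluation until the call.
const greeting = ({ name }) => `Hello ${name}`;
const late = greeting({ name: "Grace" });
```

The function form can be reused with different values, but it cannot be looked up or swapped by its text at runtime, which is what the translation-table idea needs.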
You could argue, perhaps correctly, that this is by design and doing something like this is a mistake. But when my whole clever idea depended on doing exactly this, I was a bit surprised when it did not work with native templates.
I'm not saying it's right or wrong, just that PHP is following the trend with this feature when it comes to language design.
I know I said earlier it's not for security, but it could very well be for security (not XSS though), as format string injection is a common vulnerability in C and Python, which allow this sort of thing.
This is interesting, but it appears to be in its early days as none of the major browsers seem to support it.. yet.
But mostly I'm just happy that it's finally here. I do appreciate all the hard work people have been doing to get this live.
<sc<script>ript>
I think there is an interesting lesson here about how security is partially an ergonomic problem.
Something like "setSafeHTML()" would be preferable. (Since it's Mozilla, there should be a few committee meetings to come up with the appropriate name)...
The second one could imply the HTML is already safe, while the first one is a safe way to set HTML.
If it's just setHTML then it could imply that you don't care if it's safe or not.
There's no reason to not sanitize data from the client, yet every reason to sanitize it.
You don’t escape input. You safely store it in the database and then sanitize it at the point where you’re going to use it.
Point being, if you can move sanitization even closer to where it is used, and that sanitization is actually provided by the standard library of the platform in question, that's a massive win.
Maybe it is going to try to copy a value into a 20 char buffer, I don't know!
(Especially important if sanitation is not idempotent!)
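A minimal sketch of why the point of use matters, using plain HTML escaping as the simplest stand-in for sanitization (the two are not the same thing). The helper `escapeHtml` is hypothetical; it shows that applying the transformation twice changes the result, so escaping on the way into storage and again on output corrupts the data.

```javascript
// Hypothetical helper: escape the five characters that matter in HTML
// text and quoted attribute values.
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Store the raw value; escape only when building markup.
const stored = "Tom & <Jerry>";
const once = escapeHtml(stored);  // "Tom &amp; &lt;Jerry&gt;"
const twice = escapeHtml(once);   // "Tom &amp;amp; &amp;lt;Jerry&amp;gt;"
```

Keeping the stored value raw also leaves it usable for consumers that are not HTML at all, each of which needs its own encoding.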
Something that's sanitized from an HTML standpoint is not necessarily sanitized for native desktop & mobile applications, client UI frameworks, etc. For example, with Cloudflare's CloudBleed security incident, malformed img tags sent by origin servers (which weren't by themselves unsafe in browsers) caused their edge servers to append garbage (including miscellaneous secure data) from heap memory to some requests that got indexed by search engines.
Sanitization is always the sole responsibility of the consumer of the content to make sure it presents any inbound data safely. Sometimes the "consumer" is colocated on the server (e.g. for server rendered HTML + no native/API users) but many times it's not.
No. I'm making decisions on what is safe for my server. I'm a back end guy, I don't really care about your front end code. I will never deem your front end code's requests as trustworthy. If the front end code cannot properly handle encoding, the back end code will do what it needs to do to not allow stupid string injection attacks. I don't know where your request has been. Just because you think it came from your code in the browser does not mean that was the last place it was altered before hitting the back end.
User-generated content shouldn't be trusted in that way (inbound requests from client, data fields authored by users, etc.)
INSERT INTO table (user_name) VALUES ...
Are you one of today's 10000 on server side sanitizing of user data?
Unless you're doing something stupid like concatenating strings into SQL queries, there's no need to "sanitize" anything going into a database. SQL injection is a solved problem.
Coming from the database and sending to the client, sure. But unless you're doing something stupid like concatenating strings into SQL statements it hasn't been necessary to "sanitize" data going into a database in ages.
Edit: I didn't realize until I reread this comment that I repeated part of it twice, but I'm keeping it in because it bears repeating.
Other than SQL injection there is command or log injection, file names need to be sanitized or any user uploaded content for XSS and that includes images. Any incoming JSON data should be sanitized, extra fields removed etc.
Log injection is a pretty nasty sort of hack that depending on how the logs are processed can lead to XSS or Command injection
I'm very interested in what tech stack you are using where this is a problem.
This seems rather ignorant and, in my experience, leads to security issues, such as CVE-2023-38500 or CVE-2023-23627. This is not decidable on the server-side, so you will always mess stuff like this up. Sanitization can only work properly on the client for HTML.
- HTMX adds extra significance to HTML attributes which aren't accounted for by the built-in sanitizer
- HTMX can't add a custom sanitizer because it wouldn't be able to distinguish between intentional and malicious uses of those attributes
- Even if the HTMX client library sanitized all of the HTML from the server, you can't guarantee that all requests to the server will come from HTMX: browsers can navigate to your "back-end" URLs directly. While you can protect yourself from this using HTTP headers, that's not something I'd feel comfortable relying on since it would be easy to not notice when you've accidentally gotten it wrong.
The HTMX website has a longer explainer on how to protect yourself from XSS when using the library:
Emphasis mine. I do not understand this design choice. If I explicitly allow `script` tag, why should it be stripped?
If the method was called setXSSSafeSubsetOfHTML sure I guess, but feels weird for setHTML to have impossible-to-override filter.
Meanwhile, there's "setHTMLUnsafe()" and, of course, good old .innerHTML.
There's also getHTML() (which has extra capabilities over the innerHTML getter).
When you have 2 of something and one is safe/better and the other one is known to be problematic, you give the awkward name to the problematic one and the obvious name to the safe/better one. Noobs oughtn’t to be attempting the other one, and anyone who is mature enough to have reason to do it is mature enough to appreciate the reason behind that complexity.
I’d’ve made it a runtime error to call setHTML with an unsafe config, but Javascript tends toward implicit reinterpretation rather than erroring-out.
It really doesn’t. We’ve decades of experience telling us that safe behaviour is critical.
> I do not understand this design choice. If I explicitly allow `script` tag, why should it be stripped?
Because there’s an infinitesimal number of situations where it’s not broken, and that means you should have to put in work to get there.
`innerHTML` still exists, and `setHTMLUnsafe` has no filtering whatsoever by default (not even the script deactivation innerHTML performs).
They empty their contents into the new parent when they're appended, so they can't be meaningfully appended a second time without rebuilding them.
`<template>` is meant to be reused, since you're meant to clone it in order to use it, and then you can clone it again.
https://ibrahimtanyalcin.github.io/Cahir/
the whole rendering uses a single fragment.
https://dom.spec.whatwg.org/#mutation-algorithms
> To insert a node into a parent before a child [...]:
> If node is a DocumentFragment node:
> Remove its children
I pasted A LIVE example to prove you wrong and you still send me a WHATWG link. YES, when you append it, it is emptied! Keep a reference to the same fragment and REAPPEND to it! REUSE it. If you want to empty it without appending, call replaceChildren(), which DocumentFragment also provides.
Why are people stubborn on things they dont know????
DocumentFragments empty their contents when appended. This is standard DOM behavior. To "reuse" a DocumentFragment after appending it somewhere you have to repopulate it with _new_ DOM, which is no different from creating a new fragment.
At that point are you really arguing that you can keep a container and keep refilling it and that counts as reuse in the sense we mean? Reuse in spirit is reusing the DOM in the container, not just the empty container.
<p>Hello <scr<script>ipt>alert(1)</scr<script>ipt> World</p>
The program outputs:
$ node .
<p>Hello <script>alert(1)</script> World</p>
{
sanitizedHTML: '<p>Hello <script>alert(1)</script> World</p>',
wasModified: true,
removedElements: [],
removedAttributes: []
}
Asking a chatbot to make a security function and then posting it for others to use without even reviewing it is not only disrespectful, but dangerous and grossly negligent. Please take this down.
> Write a JavaScript function for sanitizing arbitrary untrusted HTML input before setting a DOM element’s innerHTML attribute.
I won’t post it here in case someone tries to use it, but it wasn’t just doing regex munging.
node.ts:52: const regex = new RegExp(`<\\/?${tag}[^>]*>`, "gi");
node.ts:72: const regex = new RegExp(`\\s+${attr}\\s*=\\s*["'][^"']*["']`, "gi");
node.ts:94: const tagRegex = /<(\w+)[^>]*>/g;
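A small sketch reproducing the bypass with the tag-stripping pattern quoted from node.ts:52. The wrapper name `stripTag` is made up here; only the regex comes from the quoted lines. A single pass deletes the inner `<script>` tags, and the fragments on either side close up into real ones.

```javascript
// Same pattern as the quoted node.ts:52, with tag = "script".
const stripTag = (html, tag) =>
  html.replace(new RegExp(`<\\/?${tag}[^>]*>`, "gi"), "");

const input = "<p>Hello <scr<script>ipt>alert(1)</scr<script>ipt> World</p>";
const output = stripTag(input, "script");
// Removing the inner tags splices the outer fragments into real ones:
// output === "<p>Hello <script>alert(1)</script> World</p>"
```

Looping until the string stops changing would close this particular hole, but it would still be string munging rather than parsing, which is the underlying problem.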
https://stackoverflow.com/questions/1732348/regex-match-open...
LLMs are not intelligent enough to figure that the post is non-satirical and you should indeed avoid parsing HTML with regexes.
On the other hand, there is a non-zero chance that a vibe coded HTML parser will eventually include obscure references to ritual infanticide and other eldritch entities of the Basic Multilingual Plane.
I think a config object in which you define script options, like sanitization and other script configuration, might be helpful.
After all, backward compatibility almost always needs to be ensured, and this might work. I am no spec guy, it is just an idea. React makes use of "use client/server", so this would be more central and explicit.
One of my favorite features in there is trusted types enforcement: https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Typ...
Lets you create your own API for what code is allowed to create arbitrary, potentially unsafe HTML at runtime, so you can allow secure templating systems but disallow code that just concats strings together naively.
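A minimal sketch of such a policy, under a few assumptions: the policy name `escape-only` and the `escapeHtml` stand-in are invented for illustration, and a real policy would more likely call a proper sanitizer. The code feature-detects `trustedTypes` so it also runs where the API is missing.

```javascript
// Hypothetical escaping function standing in for a real sanitizer.
const escapeHtml = (s) =>
  String(s).replace(/[&<>"']/g, (ch) => `&#${ch.charCodeAt(0)};`);

// In browsers with Trusted Types, createPolicy() returns an object whose
// createHTML() wraps the result in a TrustedHTML value. Elsewhere (Node,
// older browsers) fall back to a plain object with the same method.
const policyRules = { createHTML: (input) => escapeHtml(input) };
const htmlPolicy =
  typeof trustedTypes !== "undefined"
    ? trustedTypes.createPolicy("escape-only", policyRules)
    : policyRules;

const safe = String(htmlPolicy.createHTML("<img src=x onerror=alert(1)>"));
```

Enforcement itself comes from the page's Content-Security-Policy (`require-trusted-types-for 'script'`); with that header set, assigning a plain string to `innerHTML` throws, and only values produced by a policy are accepted.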
And instead of this security feature some might want to take a more fundamental look at security which might lead them to a completely different design. Again, make it optional.
In fact, it’s better for the industry even if a few such individuals are so pained by having to learn about and handle security that they just quit web development entirely. Just like aspiring pilots who can’t stand checklists and safety rules should pursue a different career.
.innerHTML = "<script>...</script>"
Libraries can surely do the same job, but then the exact behavior would vary among a sea of those libs. Having specs defined [0] for such an interface would hopefully iron out much of these variations, as well as enabling some performance gains.
[0]: https://wicg.github.io/sanitizer-api/#dom-element-sethtml
That ship has long since sailed. Browsers are so complex that it takes quite some effort to support the various levels of 9s of the percentage of compatibility with standards, not to mention the browser makers themselves define many of the standards.
[1] https://wicg.github.io/sanitizer-api/#dom-element-sethtml
https://wicg.github.io/sanitizer-api/#sanitizerconfig-remove...
innerHTML = trustedTypes.createPolicy('myPolicy', {
createHTML: (input) => DOMPurify.sanitize(input, {RETURN_TRUSTED_TYPE: true})
}).createHTML()?
And that is not what the new .setHTML() does.
Parsing > "DocumentFragment"
Returns proc. exit status [0]/[1] for browser HTML incompatibility.
I've found LLMs will happily generate XSS vulnerable code, which will make things worse for a while until they can be trained better.
In fact, I found it really difficult to get claude-code to use templating libraries and not want to default to hand-written templating with XSS vulnerabilities and injecting content directly, even after going through options with it.
There's also a difference between escaping and sanitisation which can be tricky to handle and track, and it can even be dangerous to try to mix different approaches or sanitizers.
Having a safe backstop in the form of setHTML() to use will be a fantastic addition to narrow the scope of ways to get it wrong.
ngold•3mo ago
I obviously know nothing about this, but I still find it fascinating. Or am I off my block.
bilekas•3mo ago
https://developer.mozilla.org/en-US/docs/Web/Security/Attack...
president_zippy•3mo ago
Hmmm...
mpeg•3mo ago
for example, a search query, or a redirect url, or a million other things
intrasight•3mo ago
Whether I generate a whole page or generate a partial page and then add HTML to it is equivalent from a safety perspective.
masklinn•3mo ago
Markdown implementations can do any of that, only allowing a whitelist of HTML elements (GFM), or not allowing HTML at all.
halapro•3mo ago
If you include user-provided data, then you should sanitize it for HTML.
jeroenhd•3mo ago
Solutions in the form of pre-existing HTML sanitisation libraries have existed for years but countless websites still manage to get XSS'd every year because not everyone capable of writing code is capable of writing secure code.
masklinn•3mo ago
2. Because it’s really easy to fuck up and leak attacker controlled content in markup, especially when the environment provides tons of tools to do things wrong and none to do things right. IME even when the environment provides tons of tools to do things right it’s an uphill battle (universe, idiots, yadda yadda).