It's probably a side effect of what is IMO another bad design of that language: letter casing determining field visibility, instead of using a keyword or a sigil. If your field has to be named "User" to be public, and the corresponding entry in the JSON has all-lowercase "user" as the key (probably because the JSON was defined first, and most languages have "field names start with lowercase" as part of their naming conventions), you have to either ignore case when matching, or manually map every field. They probably wanted to be "intuitive" and not require manual mapping.
then you specify the key to be "user"? Isn't that the point of being able to remap names? Except you can't, because you have no choice about whether your data is deserialised case-sensitively or not.
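To make the mismatch concrete, here's a minimal sketch using the stdlib encoding/json (struct and payload are made up): the tag remaps the name, but Unmarshal still accepts any casing of the key, and there's no option on Unmarshal to turn that off.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    type Request struct {
        // The tag maps the exported Go field to the lowercase JSON key.
        User string `json:"user"`
    }

    func main() {
        var r Request
        // encoding/json matches keys case-insensitively, so "USER" still lands in the field.
        _ = json.Unmarshal([]byte(`{"USER":"alice"}`), &r)
        fmt.Println(r.User) // alice
    }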
I've written plenty of Rust code to turn camelCase into snake_case and "it's too much effort" has never been a problem. It's a minor bother that helps prevent real security issues like the ones listed in this article.
Even if you want to help lazy programmers, I don't think there's a good reason to confuse "User" and "uſER" by default.
The accidental omitempty and "-" are a good example of the weirdness, even if they might not cause problems in practice.
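For anyone who hasn't hit these: the first comma-separated item in a json tag is the field name, so a sketch of the two accidents looks like this (this follows encoding/json's documented tag behaviour; the field names are made up):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    type Example struct {
        A string `json:"omitempty"`  // meant as an option, actually renames the key to "omitempty"
        B string `json:",omitempty"` // the option applied correctly, default name kept
        C string `json:"-"`          // skipped entirely
        D string `json:"-,"`         // serialised under the literal key "-"
    }

    func main() {
        out, _ := json.Marshal(Example{A: "a", B: "b", C: "c", D: "d"})
        fmt.Println(string(out)) // {"omitempty":"a","B":"b","-":"d"}
    }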
It's one of many examples of 80/20 design in Go: 80% of functionality with 20% of complexity and cost.
Struct tags address an important scenario in an easy to use way.
But they don't try to address other scenarios, like annotations do. They are not function tags. They're not variable tags. They are not general purpose annotations. They are annotations for struct fields and struct fields only.
Are they as powerful as annotations or macros? Of course not, not even close.
Are they as complex to implement, understand, use? Also not.
80/20 design. 80% of functionality at 20% of cost.
There's no free lunch here, and the compromises Go makes to achieve its outcomes have shown themselves to be error-prone in ways that were entirely predictable at design time.
It does occasionally, although I'll push back on the "often". Go's simplifications allow most of the codebase to be... well... simple.
This does come at the cost of some complexity on the edge cases. That's a trade off I'm perfectly willing to make. The weird parts being complex is something I'm willing to accept in exchange for the normal parts being simple, as opposed to constantly dealing with a higher amount of complexity to make the edge cases easier.
> There's no free lunch here
This I'll agree with as well. The lunch is not free, but it's very reasonably priced (like one of those hole in the wall restaurants that serves food way too good for what you pay for it).
> the compromises Go makes to achieve its outcomes have shown themselves to be error-prone in ways that were entirely predictable at design time.
I also agree here, although I see this as a benefit. The things that are error prone are clear enough that they can be seen at design time. There's no free lunch here either: something has to be error prone, and I like the trade-offs that Go has made on which parts are. Adding significant complexity to reduce those error-prone places has, in my experience, just increased the surface area of the error-prone sections of other languages.
Could you make the case that some other spot in design space is a better trade-off? Absolutely, especially for a particular problem. But this spot seems to work really well for ~95% of things.
Or mutability modifiers. Yes, that's an extra feature, and there's an undeniable appeal to having fewer features. But being able to flag things as immutable will make all the code you deal with easier in future.
Or consider how they left out generics for 15 years. It simplifies the language in some ways, sure, but when you needed generics, you had to use reflection, which is way more complicated than generics. Even macros, unpopular as they are, are better than codegen.
Again, I understand the appeal of minimalism, but a cost/benefit analysis of any of these features shows them to be a massive net gain. A blanket policy of "no, we can't have nice things" is needlessly austere, and it leaves everyone worse off imo.
Exactly this.
Basically: have a complex compression algorithm? Yes, it's complex, but the resulting filesize (= program complexity) will be low.
If you use a very basic compression algorithm, it's easier to understand the algorithm, but the filesize will be much bigger.
It's a trade-off. However, as professionals, I think we should really strive to put time to properly learn the good complex compression algorithm once and then benefit for all the programs we write.
[insert Pike's Google young programmers quote here]
That's just not the philosophy of the language. The convention in Go is to be as obvious as possible, at the cost of more efficient designs. Some people like it, others don't. It bothers me, so I stopped using Go.
Are you somehow under the impression that Go is unique in having a terse way to map fields to fields?
> It’s really quite novel once you understand it.
It's the opposite of novel; putting ad-hoc annotations in unstructured contexts is what people used to do before Java 5.
This allows you to derive a safe parser from the structural data, and you can make said parser be really strict. See e.g., Wuffs or Langsec for examples of approaches here.
What constraints? Ignoring decades of programming language developments since C89?
So in .NET, like Java as you mention, we have attributes, e.g.

    [JsonPropertyName("username")]
    [JsonIgnore]

etc. This is simple and obvious. The JsonPropertyName attribute is an override; you can set naming policies for the whole class: camelCase by default, with kebab-case, snake_case, etc. as alternative defaults.
C#/.NET of course has the benefit of having public properties, which are serialised by default, and private properties, which aren't, so you're unlikely to be exposing things you don't want to expose.
This contrasts with Go's approach, much like Python's, of using casing convention to determine private vs public fields. (Please correct me if I'm wrong on this?)
The first example still confuses me though, because either you want IsAdmin to come from the user, in which case you still want to deserialise it, or you don't, in which case it shouldn't even be in your DTO at all.
Deserialisation there is a bit of a red-herring, as there should be a validation step which includes, "Does this user have the rights to create an admin?".
The idea of having a user class, which gets directly updated using properties straight from deserialized user input, feels weird to me, but I'd probably be dismissed as an "enterprise programmer" who wants to put layers between everything.
I think calling it a convention is misleading.
In Python, you can access an _field just by writing obj._field. It's not enforced, only a note to the user that they shouldn't do that.
But in Go, obj.field is a compiler error. Fields that start with a lowercase letter really are private, and this is enforced.
So I think it's better to think of it as true private fields, just with a... unique syntax.
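A quick sketch of what that enforcement means in practice (single file, so the cross-package compile error is only noted in a comment):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    type User struct {
        Name  string // exported: serialised, and reachable from other packages
        email string // unexported: skipped by encoding/json; u.email from another package won't compile
    }

    func main() {
        out, _ := json.Marshal(User{Name: "alice", email: "a@example.com"})
        fmt.Println(string(out)) // {"Name":"alice"}
    }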
Go actually ties visibility to casing, instead of using separate annotations. And it will not serialise private fields, only public.
Python has no concept of visibility at all, conventionally you should not access attributes prefixed with `_` but it won't stop you.
Any serious Python project will use at least one linter or typechecker, which can easily enforce this.
You can just not use them though - you can unmarshal to a map instead and select the keys you want, perform validation etc and then set the values.
Same when publishing - I prefer to have an explicit view which defines the keys exposed, rather than publishing all by default based on these poorly understood string keys attached to types.
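A minimal sketch of the unmarshal-to-a-map approach described above (the names and validation are made up); only the keys you explicitly pull out ever reach your own struct:

    package main

    import (
        "encoding/json"
        "errors"
        "fmt"
    )

    type User struct {
        Username string
        isAdmin  bool
    }

    func newUserFromJSON(data []byte) (User, error) {
        var raw map[string]json.RawMessage
        if err := json.Unmarshal(data, &raw); err != nil {
            return User{}, err
        }
        nameJSON, ok := raw["username"]
        if !ok {
            return User{}, errors.New("username is required")
        }
        var name string
        if err := json.Unmarshal(nameJSON, &name); err != nil {
            return User{}, errors.New("username must be a string")
        }
        // Keys we never asked for (e.g. "isAdmin") are simply ignored.
        return User{Username: name}, nil
    }

    func main() {
        u, err := newUserFromJSON([]byte(`{"username":"alice","isAdmin":true}`))
        fmt.Println(u, err) // {alice false} <nil>
    }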
The reason it's like that is that Go is philosophically very much against the idea of annotations and macros, and very strongly in favour of clear, upfront control flow, and this is one of the reasons I love the language. But it does come at the cost of a few highly useful use cases for annotations (like mapping JSON and XML, etc.) becoming obtuse to use.
The idea of more compile-time macros in Go is interesting to me, but at the same time the ease of debugging and understanding the Go control flow in my programs is one of the reasons I love it so much, and I would not want to invite the possibility of "magic" web frameworks that would inevitably result from more metaprogramming ability in Go. So I guess I'm prepared to live with this consequence. :/
The solution is usually to have an even better language: one where the type system is so powerful that such hacks are not necessary. Unfortunately, that also means you have to learn that type system to be productive in the language, and you have to learn it more or less upfront - which is not something that Google wanted for golang due to the turnover.
What might be interesting is a language ecosystem, where one can write parts of a system in one language and other parts in another. The BEAM and JVM runtimes allow for this but I don't think I've seen any good examples of different languages commingling and playing to their strengths.
> The BEAM and JVM runtimes allow for this but I don't think I've seen any good examples of different languages commingling and playing to their strengths.
Probably because the runtime is always the lowest common denominator. That being said, there are lots of tools e.g. written in Scala but then being used by Java, such as Akka or Spark. And the other way around of course.
Annotations have no control flow, they just attach metadata to items. The difference with struct tags being that that metadata is structured.
Would it not be better to:

    type CreateUserRequest struct {
        Username string
        Password string
    }

    type UserView struct {
        Username string
        IsAdmin  bool
    }

etc.? No need to have just one model that maps 1:1 to your DB row. This applies to all languages.
One reason is to avoid copying data constantly. I don't just mean this from an efficiency perspective, but also (and maybe more so) from a simplicity one. If you have a library for shoving data into a struct mechanistically, but you then take the data from that struct and shove it into an additional struct, what's the point of the library? You're writing the code to move the data anyway.
In my dayjob I see this tendency constantly to have a lot of different very narrow structs that somehow integrate into some library, and then a TON of supporting code to copy between those structs. Only to then do very little actually useful work with any of the data at the end. I generally think you'd be happier with fatter structs that integrated less with weird "struct-filling" libraries.
Maybe that's the problem to solve, rather than exposing the entire internal world to the outside? Because different views of the same entities is pretty critical otherwise it's way too easy to start e.g. returning PII to public endpoints because some internal process needed it.
That's not at all what I said.
You don't need a struct to avoid exposing internal data. If you're building a JSON object, you can just not write the code to format some fields out. You don't need a new data layout for that.
Did you fail to read the article somehow?
Few if any modern languages require you to write the code to format individual fields out. Even Go does not, though when it comes to JSON most fields need to be annotated because of the choice it made to tie casing to visibility; but then it doesn't even require you to opt the type itself into being serializable, every struct is serializable by default.
Skipping fields on a serializable structure is what requires extra work.
Super annoying if you need to do it by hand, and wastes compute and memory if you actually need to do copies of copies, but this is the mapping part of "object relational mapping", the M in ORM. Skipping it is a bad idea.
Your business/domain model should not be tied directly to your persistence model. It's a common mistake that's responsible for like half of the bad rep ORMs get. Data structures may look superficially similar, but they represent different concepts with different semantics and expectations. If you skip on that, you'll end up with tons of stupid mistakes like 'masklinn mentions, and more subtle bugs when the concepts being squashed together start pulling in opposite directions over time.
    record Profile(int id, String login, boolean isAdmin) {}

create a mapper for it:

    interface UserMapper {
        // arrays are just one example, plain models and plenty
        // of other data structures are supported
        Profile[] usersToProfiles(User[] user);

        // other mappers...
    }

and then use it:

    class UserController {
        //
        @GET("/profiles")
        Profile[] getUserProfiles() {
            var users = userRepo.getUsers();
            return userMapper.usersToProfiles(users);
        }
    }

As long as fields' names match, everything will be handled for you. Adding another "view" of your users requires creating that "view" (as a record or as a plain class) and adding just one line to the mapper interface, even if that class contains all User's fields but one. So no need to write and maintain 19+ lines of copying data around. It also handles nested/recursive entities, nulls, etc. It's also using codegen, not reflection, so performance is exactly the same as if you had written it by hand, and the code is easy to read.
Go developers usually "don't need these complications", so this is just another self-inflicted problem. Or maybe it's solved, look around.
> create a mapper for it:
> ...
> Go developers usually "don't need these complications", so this is just another self-inflicted problem.
In Go:

    type DTO struct {
        A, B, C string
    }

Somewhere in your API layer:

    // copy the fields to the DTO
    return DTO{A: o.A, B: o.B, C: o.C}

I fail to see where the "self-inflicted problem" is and why it requires a whole library (which seems to require around the same number of lines of code at the end of the day, if you count the imports and the additional mapper interface)?

    type SmallerThing struct {
        Id      int
        Login   string
        IsAdmin bool
    }

    type UserController struct {
        SmallerThing
        OtherField  Whatever
        OtherField2 SomethingElse
    }

In principle this could break down if you need super, super complicated non-overlapping mappings; in practice I have yet to need that.

2. Exposing internal models to APIs directly also makes it hard to refactor code, because refactoring would change APIs, which would require updating the clients (especially problematic when the clients are owned by other teams). I've seen this firsthand too in a large legacy project: people were afraid to refactor the code because whenever they tried, it broke the clients downstream. So instead of refactoring, they just added various complex hacks to avoid touching the old core code (and of course, their models also mapped directly to the UI).
In the end, codebases like that, with no layer separation, become really hard to maintain and full of security problems.
All because they thought it was "simpler" to skip writing ~10 lines of extra boilerplate per model to map models to DTOs.
Lack of layer separation becomes a problem in the long term. When you're just starting out, it may seem like overengineering, but it isn't.
I actually agree, but you're setting up a false dichotomy. I do believe in strong layering at the interfaces, for exactly the reasons you line up. What I don't believe in is what I might call "struct annotation based parsing" at those interfaces.
Typically, you don't want to pass DTOs around your code. Usually, you take in that struct, and then immediately have some code to poke it into the appropriate places in your actual data structures. It's very often much easier to simply take a well structured but more direct and generic interpretation of the input data, and write the code to poke it into the correct places directly.
It is not that you should define your inputs separately from your internal data storage. It's that the specification of your input structure shouldn't exist as a struct, it should exist as the consequence of your parsing code.
> When you're just starting out, it may seem like overengineering, but it isn't
It's a real shame that the internet has driven us to assume everybody is a novice.
Sorry, English is not my native language. I didn't mean to say you're a novice.
> Usually, you take in that struct, and then immediately have some code to poke it into the appropriate places in your actual data structures
> It's that the specification of your input structure shouldn't exist as a struct, it should exist as the consequence of your parsing code.
> It is not that you should define your inputs separately from your internal data storage. It's that the specification of your input structure shouldn't exist as a struct, it should exist as the consequence of your parsing code.
Can you give an example?
Sure. I like taking Jackson (the Java library) as an example, since it actually supports both models. The way I've seen it used mostly is with jackson-databind. Here you define classes and annotate the fields with data that tells the library how to marshall them to/from json. Superficially, I find that similar to how Go or SerDe (from rust) suggests you handle that. In that programming model, I agree it makes total sense to declare some classes separately from your core structures, for all the reasons we've talked about.
The other model Jackson has is what they call the Jackson Tree Model. In this model you get back a representation of a Json Object. From that object you can get fields, those fields can themselves be objects, arrays, or immediate. It's an AST if you're a compiler person.
The first model might lead to code like this:

    public class Person {
        @JsonProperty
        String name;

        @JsonProperty
        String phoneNo;
    }

Usually, the annotations won't be able to fully specify the constraints of your code, so you'll see usage code like this:

    if (personDto.phoneNo.length() != 10) return Http(400, "Phone number must be 10 chars");
    person.phone = personDto.phoneNo;

With the Tree Model you'd instead get a representation of the raw JSON from the client and pull out the fields you care about yourself:

    var phoneNoObj = jsonBody.get("phoneNo");
    if (phoneNoObj == null) return Http(400, "Phone number is required");
    var phoneNoStr = phoneNoObj.asString();
    if (phoneNoStr == null) return Http(400, "Phone number must be a string");
    if (phoneNoStr.length() != 10) return Http(400, "Phone number must be 10 chars");
    person.phone = phoneNoStr;

Notice that we are now just doing a single application-specific parse of the JSON, and while we were at it we also got to surface a bunch more relevant errors. The Jackson Tree Model is obviously pretty inefficient, but there are ways to implement it that make it more efficient too.

Don't think of it as doing a little useful work at the end; think of it as doing all the useful work in the centre. Your core logic should be as close to a pure implementation without external libraries as possible (ideally zero, but that is often not easily achievable), but call out to external libraries and services to get its work done as appropriate. That does mean a fair amount of data copying, but IMHO it's worth it. Testing copies is easy and localised, whereas understanding the implications of a JSON (or Protobuf, or Django, or whatever) object carried deep into one's core logic and passed into other services and libraries is very, very difficult.
There’s a common theme with the ORM trap here. The cost of a little bit of magic is often higher than the cost of a little bit of copying.
At most, you can argue that simple serialization libraries (Go's is indeed one of the best) make it more tempting to "just send the data" in such a design, so if you squint really (really) hard, you can call this a "footgun" I guess.
But the rest of the headline is 100% nonsense. This is not about "Go" or "parsers". At all.
Nothing in the article discusses a parser or anything like a parser bug.
The article doesn't like that the semantics of the user-facing API wrapped around the parser is, I guess, "easy to make mistakes with". That's an article about API design, at most. But that's boring and specious and doesn't grab clicks, so they want you to think that Go's parsers are insecure instead.
The security failure is not the parsing library, but failing to model your application architecture properly.
And it doesn't claim to. The article is titled "footguns" not "bugs". A footgun is just something that is easy to misuse due to unintuitive or unexpected behavior.
Yes it does. The title is literally "Unexpected security footguns in Go's parsers". The article didn't identify a single footgun. This is just bad design.
It's deliberately misleading clickbait. You know it. I know it. We all know it.
If you want to have a considered discussion about pitfalls with the use of automatic serialization paradigms across trust boundaries, I'm here for it. If you just want to flame about Go, get better source material. This one isn't the hill to stand on.
[1] Which, again, has a really first rate serialization story; but not one fundamentally different from any of a zillion others. Cooking data from untrusted sources is just plain hard, and not something that anyone (much less the author of this awful blog post) is going to solve with a serialization API.
It was a bad design choice to allow JSON keys with different capitalisation and to serialise all public struct members by default.
These decisions can easily result in the creation of an insecure system.
This isn’t intended to be a ding against Go; it’s a universal problem across programming languages (otherwise there’d be no differentials at all). But they’re worth noting, and I think the post amply demonstrates their security relevance.
See eg rails' strong params for the opposite approach: opt-in.
> In our opinion, this is the most critical pitfall of Go’s JSON parser because it differs from the default parsers for JavaScript, Python, Rust, Ruby, Java, and all other parsers we tested.
It would be kind of difficult to argue that this is not about Go.
Don't get me wrong, I love Go just as much as the next person, but this article definitely lays bare some issues with serialization specific to Go.
(This is why the more formal definition of a parser is useful to ground on: a parser is a type of recognizer, and disagreements between recognizers that claim to recognize the same thing can be exploited. This doesn’t require a bug per se, only difference, which is why it’s a footgun.)
This was an explicit decision for convenience, because the Go struct fields will be Capitalized to export them but JSON tends to follow a lower case convention.
Gob is a format made for Go, so it does not have a field naming convention mismatch; what's on wire is the Go convention.
encoding/xml isn't really structured as a "convert between equivalent-shaped structs and a message" utility; its struct tags are more like a query language, like a simpler cousin of XPath. Most real-world code using it does things like

    type Person struct {
        FirstName string `xml:"name>first"`
        LastName  string `xml:"name>last"`
    }

whereas with JSON there was a clear desire to have

    {"email": "jdoe@example.com", "name": "John Doe"}

parse with as little noise as possible:

    type Person struct {
        Email string
        Name  string
    }

Any change to your DB schema is liable to become a breaking change on your API. If you need separate types for your requests and responses, so be it.
Transfer Objects are not the Storage Objects.
I agree with your stance, but creating lots of DTOs is more work and we developers are lazy. It's incredibly common to see a single "User" type used everywhere. This also makes all User-related functions re-usable in all contexts (again: it's the fast and lazy approach).
Just to have the assurance that, regardless of programming language, you're guaranteed a consistent ser/de experience.
But it’s not, for reasons that have more to do with the languages themselves, than parsing
e.g. C++ numbers are different than Java numbers are different than Python numbers are different than JavaScript numbers
ditto for strings
(I have no idea if that is the case with protobuf, I don't have enough experience with it.)
Again, the problem has more to do with the programming languages themselves, rather than with protobufs or parsing.
Protobuf has both signed and unsigned integers - the initial use case was C++ <-> C++ communication
Java doesn't have unsigned integers
Python has arbitrary precision integers
JavaScript traditionally only had doubles, which means it can represent integers up to 53 bit exactly. It has since added arbitrary size integers -- but that doesn't mean that the protobuf libraries actually use them
---
These aren't the only possibilities -- every language is fundamentally different
OCaml has 31- or 63-bit integers IIRC
https://protobuf.dev/programming-guides/encoding/#int-types
And again, strings also differ between all these languages -- there are three main choices, which are basically 8-bit, 16-bit, or 32-bit code units
Go and Rust favor 8-bit units; Java and JavaScript favor 16-bit units; and Python/C/C++ favors 32-bit units (which are code points)
As long as a language has bytes and arrays, you can implement anything on top of them, like unsigned integers, 8-bit strings, UTF-8 strings, UCS-2, whatever you want. Sure it won't be native types, so it will probably be slower and could have an awkward memory layout, but it's possible
Granted, if a language is so gimped that it doesn't even have integers (as you mentioned JavaScript), then that language will not be able to fully support it indeed.
I recommend writing a protobuf generator for your favorite language. The less it looks like C++, the more hard decisions you'll have to make
If you try your approach, you'll feel the "tax" when interacting with idiomatic code, and then likely make the opposite decision
---
Re: "so gimped" --> this tends to be what protobuf API design discussion are like. Users of certain languages can't imagine the viewpoints of users of other languages
e.g. is unsigned vs. signed the way the world is? Or an implementation detail.
And it's a problem to be MORE expressive than C/C++ -- i.e. from idiomatic Python code, the protobuf data model also causes a problem
Even within C/C++, there is more than one dialect -- C++ 03 versus C++ 11 with smart pointers (and probably more in the future). These styles correspond to the protobuf v1 and protobuf v2 APIs
(I used both protobuf v1 and protobuf v2 for many years, and did a design review for the protobuf v3 Python API)
In other words, protobufs aren't magic; they're another form of parsing, combined with code generation, which solve some technical problems, and not others. They also don't resolve arguments about parsing and serialization!
Are there that many implementations of protobuf? How many just wrap the C lib and proto compiler? Consistency can be caused by an underlying monoculture, although that's turtles all the way down because protobuf is not YAML is not JSON, etc.
Off in the weeds already, and all because I implemented a pure Python deserializer / dissector simply because there wasn't one.
so the user can send in unknown fields all they want, the code will only accept the username and firstname strings, and ignore the other ones.
same with fetching data and sending it to the user. i fetch only the fields i want and create the correct datastructures before invoking the marshaling step.
there are no footguns. if you expect your parser to protect you, you are using it wrong. they were not designed for that.
input -> parse -> extract the fields we want, which are valid -> create a data-structure with those fields.
data -> get fields i want -> create datastructures with only wanted fields -> write to output format
This would be solved (as you described), by ensuring that the downstream layer uses only contents that are verified in the security check layer.
If they are using a microservice then: Security check API -> return verified data (i.e. re-serialize the verified JSON or XML into byte form, NOT the original input) -> Processing layer i.e. userCreate API uses verified data.
This is the method we used in fixing the ruby-saml example.
See: https://bsky.app/profile/filippo.abyssdomain.expert/post/3le...
The part of the article that I read before getting annoyed at the clickbaity title is basically "if you trust external data here's how you can blame that design decision on the parser".
(I'm not a Go developer, just tried the language casually).
    type User struct {
        Username string `json:"username_json_key,omitempty"`
        Password string `json:"password"`
        isAdmin  bool
    }

https://go.dev/play/p/1m-6hO93Xce

That may break other things - `gorm`, for example, will ignore private fields - inconvenient if you want to serialise `User` to your DB.
If you mean "public API", yep, 100% agree. Internal API between microservices though? Perfectly safe and cromulent, I'd say.
When that boundary is moved to outside the application, so an HTTP API between microservices, I feel even more strongly (though indeed still not as strongly as in what you call a "public API").
E.g. I have seen plenty of times a situation where a bunch of applications were managed within one team, the team split up, and now this "internal API" has become an API between teams, suddenly making it "public" (when viewed from the teams' perspective).
The logic should be "Parse, don't validate"[0] and after that you work on the parsed data.
[0]: https://hn.algolia.com/?q=https%3A%2F%2Flexi-lambda.github.i...
I’m a little curious to try and build an API where parsing must be exact, and changes always result in a new version of the API. I don’t actually think it would be too difficult, but perhaps some extra tooling around downgrading responses and deprecating old versions may need to be built.
If you're writing a server, I believe the rule is that any once-valid input must stay valid forever, so you just never delete fields. The main benefit of DisallowUnknownFields is that it makes it easier for clients to know when they've sent something wrong or useless.
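For reference, DisallowUnknownFields is a method on json.Decoder rather than an option on Unmarshal; a minimal sketch:

    package main

    import (
        "encoding/json"
        "fmt"
        "strings"
    )

    type CreateUserRequest struct {
        Username string `json:"username"`
    }

    func main() {
        dec := json.NewDecoder(strings.NewReader(`{"username":"alice","isAdmin":true}`))
        dec.DisallowUnknownFields()

        var req CreateUserRequest
        err := dec.Decode(&req)
        fmt.Println(err) // json: unknown field "isAdmin"
    }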
What actually makes sense is versioning your interfaces (and actually anything you serialize at all), with the version designator being easily accessible without parsing the entire message. (An easy way to have that is to version the endpoint URLs: /api/v1, /api/v2, etc).
For some time, you support two (or more) versions. Eventually you drop the old version if it's problematic. You never have to guess, and can always reject unknown fields.
Especially the case in frameworks that prescribe a format for routing.
For instance, there simply isn't a "correct" way for a parser to handle duplicate keys. Because the problem is different layers seeing them differently, you can hit it anywhere duplicate keys are treated differently, and it's not like Go is the only thing to implement "last wins". It doesn't matter what you do. Last wins? Varies from the many "first wins" implementations. First wins? Varies from the many "last wins" implementations. Nondeterministically choose? Now you conflict with everyone, even yourself, sometimes. Crash or throw an exception or otherwise fail? Now you've got a potential DoS. There's no way for a parser to win here, in any language. The code using the parser has some decisions to make.
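To make the duplicate-key point concrete, this is the stdlib's silent last-wins behaviour (a small sketch, not from the article):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    func main() {
        var m map[string]int
        _ = json.Unmarshal([]byte(`{"a": 1, "a": 2}`), &m)
        fmt.Println(m["a"]) // 2 -- the last duplicate wins, with no error
    }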
Another example, the streaming JSON decoder "accepts" trailing garbage data because by the act of using the streaming JSON decoder you have indicated a willingness to potentially decode more JSON data. You can use this to handle newline-separated JSON, or other interesting JSON protocols where you're not committing to the string being just one JSON value and absolutely nothing else. It's not "an issue they're not planning on fixing", it's a feature with an absolutely unavoidable side effect in the context of streams. The JSON parser stops reading the stream at the end of the complete JSON object, by design, and anything else would be wrong because it would be consuming a value to "check" whether the next thing is JSON or not, when you may not even be "claiming" that the "next thing" is JSON, and whatever input it consumed to verify a claim that nobody is even making would itself be a bug.
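That's what makes things like newline-delimited JSON work; roughly:

    package main

    import (
        "encoding/json"
        "fmt"
        "io"
        "strings"
    )

    func main() {
        // Two JSON values on one stream; each Decode call consumes exactly one value.
        dec := json.NewDecoder(strings.NewReader("{\"n\": 1}\n{\"n\": 2}\n"))
        for {
            var v struct{ N int }
            if err := dec.Decode(&v); err == io.EOF {
                break
            } else if err != nil {
                panic(err)
            }
            fmt.Println(v.N) // 1, then 2
        }
    }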
Accepting user input into sensitive variables is a common mistake I've seen multiple times in a number of languages. The root problem there is more the tension between convenience and security than the languages themselves; any language can make it so convenient to load data that developers accidentally load more data than they realize.
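In Go terms, the pattern being described is roughly this (a made-up struct, not the article's exact code):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    type User struct {
        Username string `json:"username"`
        IsAdmin  bool   `json:"isAdmin"` // sensitive, but nothing stops a client from sending it
    }

    func main() {
        var u User
        _ = json.Unmarshal([]byte(`{"username":"mallory","isAdmin":true}`), &u)
        fmt.Println(u.IsAdmin) // true -- attacker-controlled input reached a sensitive field
    }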
Etc. The best lesson to take away from this is that there is more than meets the eye with JSON and XML, and they're harder to use safely than their ease of use suggests.
Although in the interests of fairness, I also consider case insensitivity in the JSON field names to be a mistake; maybe it should be an option, JSON can get messy in the real world, but it's a bad default. I have other quibbles, but most of them are things where there isn't a correct answer where you can unambiguously say that some choice is wrong. JSON is really quite fundamentally messier than people realize, and XML, while generally more tightly specified at the grammar level than JSON is, is generally quite messy in the protocols people build on top of it.
The problem with trying to ensure that each parser behaves the same for all input is twofold:

- JSON and XML specifications are complex, with lots of quirks, so it's not feasible.
- It does not solve the fundamental issue of the processing layer not using the same data that is verified in the verification layer.

Note: the processing layer parses the original input bytes, while the verification layer verifies a struct that is parsed using another parser.

    Processed: Proc(input)
    Verified:  VerifyingParser(input)
So, I don't think that's a relevant critique. I think any ambiguous case in parsing untrusted user input should raise an error, and anyone working on code with untrusted data should be ready to handle errors.
On another note, it's mind-blowing that a single string can parse as XML, JSON, and YAML.
(I mean, don't use SAML to begin with, but.)
We patched the gosaml2 (and other go saml libraries), by ensuring only the authenticated bytes are processed (not the original XML document). You can see the patches here: https://github.com/russellhaering/goxmldsig/commit/e1c8a5b89... https://github.com/russellhaering/gosaml2/commit/99574489327...
> I just wrote my own for my SAML.
Curious to see your implementation for SAML and XML Signatures.
[1]: https://bsky.app/profile/filippo.abyssdomain.expert/post/3le...
It never occurred to me to ever (in any language) have a DTO with fields I wish (let alone require, for security) not to unmarshal.
This seems doubly strange in Go, the language of "Yes, absolutely DO repeat yourself!"
Side rant:
JS (even with TypeScript) pisses me off, as this is unavoidable. Splats make it worse. But still, a distinct DTO and business object, and not using splats, would help. (Maybe make ... a lint error.)
In Go etc., if a struct doesn't have a field foo then there will not be a field foo at runtime. In JS there might be, unless you bring in libraries to help prevent it.
You are relying on someone remembering to use zod on every fetch.
But you can find it in many places.
2. C++ has some unusual OO features -- friends, multiple-inheritance.
3. Most importantly, Java is significantly more approachable than C++ due to automatic memory management.
Or make some "entry gate" service not only validate/authorize requests but also re-encode them into a certainly valid shape. In the example with AuthService/ProxyService from "Attack scenario 2", make the Auth Service return not a simple "yep/nope" in response, but a properly re-serialized request instead (if it's allowed in). So if e.g. AuthService takes a request with two fields "UserAction" and "AdminAction" and allows the "UserAction", it would respond with a request object that has the "UserAction" field in it but not "AdminAction", because the service ignored that field and so did not copy it into the response.
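Roughly, with made-up names (checkPermissions stands in for the real authorization logic): the auth service decodes into a struct that models only what it is willing to forward, verifies it, and re-serialises that struct, so a field like "AdminAction" that was never modelled can't survive the hop.

    package main

    import (
        "encoding/json"
        "errors"
        "fmt"
    )

    // AllowedRequest models only the fields the auth service will pass along.
    type AllowedRequest struct {
        UserAction string `json:"userAction"`
    }

    // checkPermissions is a stand-in for the real authorization check.
    func checkPermissions(action string) error {
        if action == "" {
            return errors.New("no user action supplied")
        }
        return nil
    }

    func authorize(raw []byte) ([]byte, error) {
        var req AllowedRequest
        if err := json.Unmarshal(raw, &req); err != nil {
            return nil, err
        }
        if err := checkPermissions(req.UserAction); err != nil {
            return nil, err
        }
        // Re-serialise the verified struct, not the original bytes.
        return json.Marshal(req)
    }

    func main() {
        out, err := authorize([]byte(`{"userAction":"list","adminAction":"deleteAll"}`))
        fmt.Println(string(out), err) // {"userAction":"list"} <nil>
    }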
It's interesting that decisions made about seemingly-innocuous conditions like 'what if there are duplicate keys' have a long tail of consequences
I don't really agree, as "surprising" is a stench in any API. But it's their project.
Parsing JSON Is a Minefield (2018) - https://news.ycombinator.com/item?id=40555431 - June, 2024 (56 comments)
et al https://hn.algolia.com/?query=parsing%20json%20is%20a%20mine...