IMO, it's nuts to purposely introduce random bugs into the apps of everyone who uses your dependency.
The change of behavior here would prevent the specific problem, but would make the general problem worse.
This change creates a new code path that did not exist when all the app code was developed and tested, so there's a decent chance something bad can happen that it's not prepared for.
Keeping the first key/value would be a poor idea since it is a behavior not mentioned in the RFC.
The idea that your JSON parser can catch this kind of error in the general case just doesn’t make sense.
Changing the default behavior of the parser would catch this specific problem, by coincidence, but would also create new problems. What those are is as yet unknown, but we will find out (unless someone gains the sense to back off this bad idea before it makes it into production).
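For reference, a minimal sketch of the long-standing behavior at stake here (on recent versions of the gem this same call is what starts warning, and eventually raising, by default):

    require "json"

    # RFC 8259 leaves duplicate keys unspecified; the Ruby parser has
    # historically kept the last occurrence.
    JSON.parse('{"a": 1, "a": 2}')  # => {"a" => 2}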
I have a huge selection of mostly handwritten JSON data that powers a benchmark collection. Sometimes mistakes happen and keys get duplicated. That is not a big problem: if, e.g., a graphics card appears twice in an array, nothing bad can happen. Even if the actual value is wrong because of such a duplication (like a benchmark update gone wrong), that will usually not be a problem, because there is a lot of other data that will place the component at around the correct position regardless.
Now, what will happen is that a Ruby update forces me to update the gems, then the new JSON gem will crash on my data, and then I will have to take time I don't have to fix something that does not need fixing, for a project that does not really generate income. Awesome.
The right solution here is to specify the currently unspecified duplicate-key behaviour as whatever this parser does today. That prevents applications from randomly running into new bugs. Then print a warning, which is already done now. And then maybe offer an opt-in to reject input data that has duplicated keys, as sketched below. Not wasting developers' time by making the breakage the default.
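Roughly this shape, reusing the `allow_duplicate_key` option name that appears later in this thread (a sketch of the proposed opt-in, not a confirmed API):

    require "json"

    doc = '{"enabled": true, "enabled": false}'

    JSON.parse(doc)                              # old default: last key wins
    JSON.parse(doc, allow_duplicate_key: false)  # strict opt-in: raises on duplicates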
And that's why breaking dependencies are unacceptable (if the old behaviour is not completely broken), and relying on invisible deprecation messages is not okay. The sqlite gem did the same thing: it completely broke its old API for how parameters could be supplied, did not revert the change even when the chaos it caused was reported, and then took it as a personal insult when I opened a discussion about how that was still a problem.
The YAML and psych gems are another nice example: just this month they suddenly could not write a YAML file with YAML::Store.new anymore. I had to work around that with `file.write(Psych.dump(theyamlhash))`; https://gitlab.com/onli/sustaphones/-/commit/fecae4bb2ee36c8... is the commit. If I got that right, this is about `permitted_classes: [Date]` not being given, and not being givable, to psych through YAML::Store. Made me pause.
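For concreteness, a sketch of the workaround as I read the linked commit (`theyamlhash` and the file name are illustrative):

    require "psych"
    require "date"

    theyamlhash = { "checked" => Date.today }

    # Dump directly with Psych instead of going through YAML::Store...
    File.write("data.yml", Psych.dump(theyamlhash))

    # ...because reading Date values back now needs an allow-list that
    # YAML::Store did not let me pass through:
    data = Psych.safe_load(File.read("data.yml"), permitted_classes: [Date])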
Is the ruby community breaking? I'm not aware of gems producing developer hostile situations like that before.
The whole point of the article is to explain why, while I have empathy for code owners who will be impacted by various changes, I sometimes believe the benefits outweigh the cost.
I even linked to an example of a very nasty security issue that would have been prevented if that change had been made sooner.
> Is the ruby community breaking?
No, the community is trying to fix past mistakes. For instance the YAML change you are cursing about has been the source of numerous security vulnerabilities, so tenderlove had to do something to remove that massive footgun.
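For anyone unfamiliar, a minimal sketch of that class of footgun (illustrative payload, not one of the actual CVEs):

    require "yaml"
    require "ostruct"

    payload = "--- !ruby/object:OpenStruct\ntable:\n  :admin: true\n"

    # Under Psych < 4, YAML.load would instantiate arbitrary Ruby objects
    # from untrusted input like this. Under Psych 4+, loading defaults to
    # safe parsing and raises instead:
    begin
      YAML.load(payload)
    rescue Psych::DisallowedClass => e
      puts e.message  # "Tried to load unspecified class: OpenStruct"
    end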
I totally get that it annoyed you, you can't imagine how much code I had to update to deal with that one.
But again, while I totally agree maintainers should have empathy for their users and avoid needless deprecations, it'd be good if users had a bit of empathy for maintainers who are sometimes stuck in "damned if you do, damned if you don't" situations...
I appreciated the "avoiding breakage" sentiment included in the post (and commented accordingly below). I'm just really of the opinion that dependencies should not force me to do something like set `allow_duplicate_key: true` if it is not absolutely necessary, and I do not see why it is absolutely necessary here. I saw the link to a security vulnerability, but why not let devs set `allow_duplicate_key: false` manually in contexts where it matters? Avoiding churn is more important; this is not a "the API will cause random code to be executed" situation, unlike the create_additions situation you describe and possibly unlike the YAML situation. There I understand the need (even with YAML, I'm just sad about the currently broken default code path).
Also, we saw with YAML how it wasn't viable to make such a change without breakage via the intermediate dependencies (and I was very happy you mentioned that API not being nice) and churn via the direct ones. The same is very likely to happen here: programs will break because their dependencies do not set the new option when they store JSON.
Because that wouldn't have prevented the issue I linked to.
Default settings are particularly important, because most of the gem's users have no idea that duplicated keys are even a concern.
JSON is used a lot to parse untrusted data, so having strict and safe defaults is particularly valuable.
In your case the JSON documents are trusted, so I understand this change is of negative value for you, but I believe it has positive value overall when I account for all the users of the gem.
Additionally, even for the trusted configuration file or similar case, I believe most users would see value in not letting duplicated keys go unnoticed, because duplicated keys are almost always a mistake and can be hard to track down.
e.g. a developer might have some `config.json` with:
    {
      "enabled": true,
      // many more keys,
      "enabled": false
    }
And waste time figuring out why `JSON.parse(doc)["enabled"]` is `false` when they can see it's clearly `true` in their config.

So again, I understand you are or will be annoyed, because your use case isn't the majority one, but I'm trying to cater to lots of different users.
If going over your code to add the option is really too much churn for your low-maintenance project, then, as I mention in the post, a totally OK solution for such cases is to monkey patch and move on:
require "json"
module JSONAllowDuplicateKey
def parse(doc, options = nil)
options = { allow_duplicate_key: true }.merge(options || {})
super(doc, options)
end
end
JSON.singleton_class.prepend(JSONAllowDuplicateKey)I get that the option to detect and prevent this is good. The warning is seriously useful. But it does not need to be a new default to forbid the old valid inputs. And I do not understand why the option to do so would not have helped in the hacker issue.
Thanks for the code snippet. Actually, it is not a lot of code for me to change, and I likely do not have an intermediate dependency affected, I hope. The problem is the dependencies vastly outnumbering my program, so work like this potentially adds up.
Don't feel obliged to discuss this further if it is a waste of your time, but it is important for me to try to argue against churn.
Just clarifying this one:
> I do not understand why the option to do so would not have helped in the hacker issue.
Because users aren't omniscient. When a user needs to parse some JSON, they'll reach for `JSON.parse`, which they either already know about or will find with a cursory search. That will have solved their problem, so they are unlikely to look in detail at the dozen or so options this method takes, and won't consider the possibility of duplicated keys nor their potentially nefarious impact.
Hence why the defaults are so important.
> it is important for me to try to argue against churn
It's alright, I'm with you on this in general, just not in this particular case.
I've always used

    require 'json/ext'

This uses the C extension version and is much faster.
Meaning Ruby ships with that gem pre-installed, but it's still a gem and you can upgrade/downgrade it in your Gemfile if you so choose.
As for `require "json/ext"`, that's outdated knowledge.
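If I remember correctly, a plain require is all that's needed on current versions, and you can check which parser got loaded:

    require "json"

    # The C extension is picked up automatically when it's available:
    JSON.parser  # => JSON::Ext::Parser on CRuby with the extension built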
Software that emits JSON with duplicate keys causes unspecified behavior in downstream processors, and therefore its output can be regarded as broken.
If you put out JSON with duplicate keys, you are responsible for whatever consequently blows up in your face or the faces of downstream users.
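If you want to guard against that on the producing side, a small defensive sketch (`to_json_strict` is a hypothetical helper, not part of any gem):

    require "json"

    # Build JSON from key/value pairs, refusing to serialize duplicates.
    def to_json_strict(pairs)
      dupes = pairs.map(&:first).tally.select { |_, n| n > 1 }.keys
      raise ArgumentError, "duplicate keys: #{dupes.join(', ')}" if dupes.any?
      JSON.generate(pairs.to_h)
    end

    to_json_strict([["enabled", true], ["enabled", false]])
    # => ArgumentError: duplicate keys: enabled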
Mixing the two has always been a smell to me, but maybe you're right.