Would Ruby be as successful if it had had all those complicated features right from the start?
Or do all languages start from a nice, simple, clean slate to get developers hooked, until the language is famous enough to be well developed and starts to resemble all the other big programming languages?
The standard library also has String, CString, CStr, OsString, and OsStr.
The latter four are for niche situations. 99.9% of the time, it's similar to Java: &str is Java's String, String is Java's StringBuffer/StringBuilder.
String
&str
&mut str
&'static str
etc.
These are just the language semantics. The other string types are non-Rust strings: filesystem paths, C strings, etc. You only deal with them when working with specific OS and binding interfaces.
95% of the time you'll just be using String.
Why do you say that? I would say the opposite.
Ruby isn’t making all strings immutable here. Just string literals. You are free to allocate mutable strings that can be appended to, to your heart’s content. It is extremely rare that modifying a literal is intended behavior, since their contents are permanently persisted throughout the lifetime of your program. With your example, this would be like having one shared global buffer for your final document.
Ruby is not a web focused scripting language.
JavaScript is much more of a "web-focused scripting language" than Ruby is, and it is quite happy with immutable strings (only).
> I think the comment is about that you now need to choose mutable vs immutable, and that is framed as a consequence of broader adoption.
Ruby has also had immutable (frozen) strings for a very long time, so you've always had the choice. What is changing is that string literals are (eventually) going to migrate from "mutable, with a strongly encouraged file-level switch to make them immutable" to "immutable".
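To make the "you've always had the choice" point concrete, here is a minimal sketch of getting a mutable string without relying on literals being mutable. The method names `build_report` and `mutable_copy_examples` are made up for illustration; `String.new`, `dup`, and unary `+` are standard String API.

```ruby
# Hypothetical helpers: none of these names come from the thread.

# Build up a document in a buffer that was never a literal.
def build_report(lines)
  buffer = String.new                 # empty and mutable
  lines.each { |line| buffer << line << "\n" }
  buffer
end

# Three ways to get a mutable copy of a frozen string.
def mutable_copy_examples
  frozen = "header".freeze            # stands in for a frozen literal
  [frozen.dup, +frozen, String.new(frozen)]
end
```

Only the literal itself is frozen; anything you allocate or copy explicitly can still be appended to.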
> The selectors in parentheses may be replaced with other selectors by modifying the compiler and recompiling all methods in the system. The other selectors are built into the virtual machine.
> Any objects referred to in a CompiledMethod's bytecodes that do not fall into one of the categories above must appear in its literal frame. The objects ordinarily contained in a literal frame are
> shared variables (global, class, and pool)
> most literal constants (numbers, characters, strings, arrays, and symbols)
> most message selectors (those that are not special)
> Objects of these three types may be intermixed in the literal frame. If an object in the literal frame is referenced twice in the same method, it need only appear in the literal frame once. The two bytecodes that refer to the object will refer to the same location in the literal frame.
> Two types of object that were referred to above, temporary variables and shared variables, have not been used in the example methods. The following example method for Rectangle merge: uses both types. The merge: message is used to find a Rectangle that includes the areas in both the receiver and the argument.
http://www.mirandabanda.org/bluebook/bluebook_chapter26.html
'justastring' at: 6 put: $S; yourself
'justaString' .
However #'justasymbol' at: 6 put: $S; yourself
errorNoModification
self error: 'symbols can not be modified.'
Evaluating this yields an error in both Squeak and Pharo. What Smalltalk are you using? I'm going to guess Cuis, in which case your example holds, but is misleading. Consider:
a:='justastring'.
b:='justastring'.
a at: 6 put: $S.
a, ' = ', b.
'justaString = justaString' .
Notice, modifying "a" also modified "b," because of the shared literal frame entry. This is why you were traditionally admonished to avoid directly modifying string literals. (Which wasn't an issue given the design of the string classes, and the general poor manners of destructively modifying a string argument of unknown origin.)
| a b |
a := 'justastring'.
b := 'justastring'.
a == b
true .
Mutable strings are totally possible (and not even especially hard) in compiled, statically typed, and lower-level languages. They're just not especially performant, and are sometimes a footgun.
> all those complicated features right from the start
Arguably, mutable strings are the more complicated feature. Removing them by default simplifies the language, or at least forces you to go out of your way to find the complexity.
What? Mutable strings are more performant generally. Sometimes immutability allows you to use high level algorithms that provide better performance, but most code doesn't take advantage of that.
An obviously good change: massive performance improvements, and not hard to implement, but it's still going to be such a headache and dependency hell.
https://www.ruby-lang.org/en/news/2015/12/25/ruby-2-3-0-rele...
Most linting setups I've seen since then have required this line. I don’t expect many libraries to run afoul of this, and this warning setting will make finding them easy and safe. This will be nothing like the headache Python users faced transitioning to 3.
I agree. It has been a well-advertised, loudly announced migration path with a clear timeframe.
The rest of the changes were a bit annoying but mostly boring; some things could have been done better here too, but the string encoding thing was the main issue that caused people to hold on to Python 2 for a long time.
The frozen string literal changes are nothing like it. It's been "good practise" to do this for years, fixing any errors that come up is trivial, there is a long migration path, and AFAIK there are no plans to remove "frozen_string_literal: false". It's just a change in the default from false to true, not a change in features.
"Learning lessons" doesn't mean "never do anything like this ever again". You're the one who failed to learn from Python 3, by simply saying "language change bad" without deeper understanding of what went wrong with Python 3, and how to do things better. Other languages like Go also make incompatible changes to the language, but do so in a way that learned the lessons from Python 3 (which is why you're not seeing people complain about it).
And since that flag really doesn't require lots of work in the VM, it's likely to be kept around pretty much forever.
I recall it was a bit bumpy, but not all that rough in the end. I suppose static type checking helps here to find all the ways how it could be used. There was a switch to allow running old code (to make strings and buffers interchangeable).
Ruby is not doing that. It's transitioning from mutable strings that can be frozen, with no special treatment of literals (unless you opt in to literals being frozen on a per-file basis), to mutable strings with all string literals frozen.
fooLit = "foo"
fooVar = "f".concat("o").concat("o")
This would have fooLit be frozen at parse time. In this situation there would be "foo", "f", and "o" as frozen strings; and fooLit and fooVar would be two different strings since fooVar was created at runtime.
Creating a string that happens to be present in the frozen strings wouldn't create a new one.
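A small sketch of the literal vs runtime distinction, written so it does not depend on the magic comment: `"foo".freeze` on a literal stands in for a frozen literal here, and the string is built with `+` because `"f".concat("o")` would itself raise FrozenError once "f" is a frozen literal. The names `FOO_LIT`, `foo_lit_again`, and `foo_var` are made up for the example.

```ruby
# Assumption: CRuby, where a literal followed by .freeze is deduplicated.
FOO_LIT = "foo".freeze

def foo_lit_again
  "foo".freeze            # same content, written as a second literal
end

def foo_var
  "f" + "o" + "o"         # built at runtime, so a fresh mutable object
end
```

On CRuby, `FOO_LIT.equal?(foo_lit_again)` holds because both literals resolve to one shared object, while `foo_var` is equal by value (`==`) but is a different, unfrozen object.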
irb(main):001> str = "f".concat("o").concat("o")
=> "foo"
irb(main):002> str.frozen?
=> false
irb(main):003> str.freeze
=> "foo"
irb(main):004> str.frozen?
=> true
irb(main):005> str = str.concat("bar")
(irb):5:in 'String#concat': can't modify frozen #<Class:#<String:0x000000015807ec58>>: "foo" (FrozenError)
from (irb):5:in '<main>'
from <internal:kernel>:168:in 'Kernel#loop'
from /opt/homebrew/Cellar/ruby/3.4.4/lib/ruby/gems/3.4.0/gems/irb-1.14.3/exe/irb:9:in '<top (required)>'
from /opt/homebrew/opt/ruby/bin/irb:25:in 'Kernel#load'
from /opt/homebrew/opt/ruby/bin/irb:25:in '<main>'
With immutable string literals, string literals can be reused.
You make an arrow function that takes an object as input, and calls another with a string and a field from the object, for instance to populate a lookup table. You probably don’t want someone changing map keys out from under you, because you’ll break resize. So copies are being made to ensure this?
Though since Ruby already has symbols which act as immutable interned strings, frozen literals might just piggyback on that, with frozen strings being symbols under the hood.
1. Strings have a flag (FL_FREEZE) that is set when the string is frozen. This is checked whenever a string would be mutated, to prevent it.
2. There is an interned string table for frozen strings.
> Does it keep a reference count to each unique string that requires a set lookup to update on each string instance’s deallocation?
This I am less sure about. I poked around in the implementation for a bit, but I am not sure of the answer. It appears to me that it just deletes it, but that cannot be right; I suspect I'm missing something. I only dig around in Ruby internals once or twice a year :)
The interned string table uses weak references. Any string added to the interned string table has the `FL_FSTR` flag set on it, and when a string is freed, if it has that flag, the GC knows to remove it from the interned string table.
The keyword to know to search for this in the VM is `fstring`, that's what interned strings are called internally:
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
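From the Ruby side, the interned table can be observed through `String#-@`, which is documented as returning a frozen, possibly pre-existing copy of the string. A minimal sketch, assuming CRuby 2.5 or later; the helper names `interned` and `runtime_foo` are hypothetical.

```ruby
# Look a string up in the interned (fstring) table via unary minus.
def interned(str)
  -str
end

# Build "foo" at runtime so each call allocates a fresh object.
def runtime_foo
  %w[f o o].join
end
```

Two separately built strings are different objects, but on CRuby `interned(runtime_foo)` hands back the same frozen object both times, which is the deduplication the table provides.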
But not actually stated it's the plan. I'd bet whatever LLM wrote the article took it as a stronger statement than it is.
I had to explain the same reasoning on Reddit the other day. Perhaps it’s time to take this as feedback and update the blog.
Btw I just asked gpt to write an article on the same topic, with a reference to the Ruby issues page. And it DID NOT add the future proposal part. So LLMs are definitely smarter than me.
The move is so we can avoid allocating a string each time we declare and use it, since it will be frozen by default. It is a big optimization, mainly for GC. Before, we had to do this optimization by hand if we did not intend to modify the string:
# before
def my_method
  do_stuff_with("My String") # 1 allocation at each call
end

# before, optim
MY_STRING = "My String".freeze # this does 2 allocations, 1 of which, at init, gets GC'd quite early
def my_method
  do_stuff_with(MY_STRING)
end

# after
def my_method
  do_stuff_with("My String") # 1 allocation the first time
end
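One way to see the difference is to compare object identity across calls. This is only a sketch: the `with_*` method names are invented, `do_stuff_with` is reduced to returning the `object_id` it receives, and `"My String".freeze` stands in for a frozen literal so the example does not depend on the magic comment.

```ruby
# Stand-in for the real work: report which object was passed in.
def do_stuff_with(str)
  str.object_id
end

MY_STRING = "My String".freeze

def with_constant
  do_stuff_with(MY_STRING)
end

def with_frozen_literal
  do_stuff_with("My String".freeze)   # behaves like a frozen literal
end

def with_plain_literal
  do_stuff_with("My String")          # mutable literal unless frozen by default
end
```

Calling `with_constant` or `with_frozen_literal` twice reports the same id, so nothing new is allocated per call. For `with_plain_literal`, Ruby versions where literals are still mutable by default allocate a fresh string each time, so the ids usually differ.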
But this move also complicates string manipulation, in the sense that it will nudge users toward immutable operations that tend to allocate a lot of strings:
foo.upcase.reverse
# VS
bar = foo.dup
bar.upcase!
bar.reverse!
So now we have to be deliberate about it:
my_string = +"My String" # it is not frozen
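Here are the two styles side by side as a runnable sketch. The method names are hypothetical; `dup`, `upcase!`, `reverse!`, and unary `+` are standard String methods.

```ruby
# Non-destructive chain: each step returns a new string.
def shout_backwards_copying(foo)
  foo.upcase.reverse
end

# One explicit copy, then in-place edits on that copy.
def shout_backwards_in_place(foo)
  bar = foo.dup
  bar.upcase!     # returns nil when nothing changed, so do not chain it
  bar.reverse!
  bar
end

# Unary plus gives a mutable string even when the literal is frozen.
def deliberate_mutable
  my_string = +"My String"
  my_string << "!"
  my_string
end
```

Both helpers accept a frozen argument and leave it untouched; the in-place version just trades the intermediate strings for one up-front `dup`.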
We have had frozen string literals for quite a while now, enabled file by file with the "frozen_string_literal: true" comment. I've seen it recommended by the community, and it is the de facto standard in most codebases I've seen. It is generally enforced by code quality tools like RuboCop.
So the mutable vs immutable distinction is well known, and as it is part of the language, well, people should know the ins and outs.
I'm just a bit surprised that they devised this long path toward real frozen string literals, because the migration has already been ongoing for years with the "frozen_string_literal: true" comment. Maybe it is to add proper warnings etc. in a way that does not "touch" code? I prefer the explicit file-by-file comment. And for deps, well, the version bump of Ruby adding frozen string literals by default is quite a filter already.
Well, Ruby is alive and well, and that is what matters :)
I say sorta late to the party, as I think it is more than fair to say there was not much of a party that folks were interested in in the lisp world. :D
Oh, I think I see some nameless person I know over there. Well-met Lisper, but goodbye!
The original plan was to make the breaking change in 3.0, but that plan was canceled because it broke too much code all at once.
Hence why I proposed this multi-step plan to ease the transition.
See the discussion on the tracker if you are curious: https://bugs.ruby-lang.org/issues/20205
SUB_ME = ':sub_me'.freeze
def my_method(method_argument)
  foo = 'foo_:sub_me'
  foo.sub!(SUB_ME, method_argument)
  foo
end
which, without `# frozen_string_literal: true`, I believe allocates a string when the application loads (it sounds like it might be 2) and another string at runtime, and then mutates that. That seems like it's better than doing
# frozen_string_literal: true
FOO = 'foo_:sub_me'
SUB_ME = ':sub_me'
def my_method(method_argument)
  FOO.sub(SUB_ME, method_argument)
end
because that will allocate the frozen string to `FOO` when the application loads, then make a copy of it to `foo` at runtime, then mutate that copy. That means two strings that never leave memory (FOO, SUB_ME) and one that has to be GCed (return value) instead of just one that never leaves memory (SUB_ME) and one that has to be GCed (foo/return value).
This is true in particular when FOO is only used in `my_method`. If it's also used in `my_other_method` and it logically makes sense for both methods to use the same base string, then it's beneficial to use the wider-scope constant.
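For comparison, a sketch of both patterns written so they behave the same whether or not literals are frozen. `TEMPLATE`, `with_sub`, and `with_sub_bang` are names invented for the example; the explicit `.freeze` calls stand in for the magic comment.

```ruby
SUB_ME = ":sub_me".freeze
TEMPLATE = "foo_:sub_me".freeze

# Non-destructive: sub returns a new string and TEMPLATE stays as it is.
def with_sub(method_argument)
  TEMPLATE.sub(SUB_ME, method_argument)
end

# Destructive: take a mutable string first, then edit it in place.
def with_sub_bang(method_argument)
  foo = +"foo_:sub_me"    # unary plus yields a mutable string either way
  foo.sub!(SUB_ME, method_argument)
  foo
end
```

Both return the same value. The only thing frozen literals rule out is calling `sub!` directly on the literal or constant, which raises FrozenError.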
(The reason this seems reasonable in an application is that the method defines the string, mutates it, and sends it along, which primarily works because I work on a small team. Ostensibly it should send a frozen string, though I rarely do that in practice because my rule is don't mutate a string outside the context in which it was defined, and that seems sensible enough.)
Am I mistaken and/or is there another, perhaps more common pattern that I'm not thinking about that makes this desirable? Presumably I can just add # frozen_string_literal: false to my files if I want so this isn't a complaint. I'm just curious to know the reasoning since it is not obvious to me.
Variables don't "contain" a string, they just point to objects on the heap.
So:
my_string = same_string = "Hello World"
Here both variables are essentially pointers to a pre-existing object on the heap, and that object is immutable.
So I sometimes wonder why JIT isn't used as a motivation to move / remove features. Basically, if you want JIT to work, your code has to be x-ready or without feature x. So if you still want those performance improvements, you will have to move forward.
Before that, Ruby did "support encodings" in a sense, but a lot of the APIs were byte oriented. It was awkward in general.
https://web.archive.org/web/20180331093051/http://graysoftin...
`<<` is the in-place append operator for strings and arrays, while `+` makes a copy. So `+=` creates a new string and rebinds the variable.
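A short sketch of that difference, using a second variable that points at the same object. The method names are made up; each returns both strings plus whether the two variables still refer to one object.

```ruby
# << mutates the object, so every variable pointing at it sees the change.
def append_in_place
  a = String.new("Hello")
  b = a
  a << " World"
  [a, b, a.equal?(b)]
end

# += builds a new string and rebinds only the variable on the left.
def append_with_rebind
  a = String.new("Hello")
  b = a
  a += " World"
  [a, b, a.equal?(b)]
end
```

With `<<`, `b` changes along with `a`. With `+=`, `b` still holds the original "Hello" and the two variables no longer share an object.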
Good reminder that anyone can go on the internet, just say stuff, and be wrong.
Most but not all of these were performance related. If it took a few days to run that’s fine. Major versions don’t come out that often.