Real-world data transformations can get gnarly very quickly, and SQL is the perfect common denominator compared to dplyr, which is still niche...
How do you feel about polars?
It’s certainly miles better than pandas, which has a terrible API in addition to being comically inefficient. In my group, we generally use it for any new work, and have also swapped out pandas for polars in critical spots of our existing code - the latter giving a huge benefit relative to the amount of work it took.
I largely agree with you on SQL being the common denominator, but there are some things that are just awkward in SQL, and much easier to do in Python or another general-purpose language.
What an awful thing to imagine. It's already the programming language of choice for egregious abuses of good practice.
People I've worked with who used R and managed data / did analysis didn't really seem too concerned with long-term maintenance.
Secondary observation: these same people were the first to preach the AI coding gospel.
Unless you’re the poor schmuck who is given the task of running the code written by the previous analyst, who has probably already left the company. Often it’s easier to just throw something together from scratch and then look for a new job, perpetuating the problem.
Two things that make me wonder if they can possibly turn out good quality R.
Perhaps a true test of AGI will be when you ask it to write an application in R and it refuses for fear of what people might think.
First, usage: Using R for our undergrads in the time of LLMs is brilliant. ChatGPT slops out working code for their needs. Not pretty, but it works better than in 2022.
Second, development: Mastering R is hard, because it's kalkül. Tidyverse mediates some of it, but still. This is the perfect breeding ground for slopification. Let's see.
Third, errata: I would love to know the percentage of science built on R to this day. I mean insights and analysis supported by it and its vast packages. What if somewhere, deep down in the stack, there is an ancient bug that dented all of this? I think AI might help us here, or will review slop negate this?
Science is built on libraries with experience, that have been validated extensively against reality. The code was often written by people who have since retired and died, because that exact same code has been validated and pinned to reality for decades. It is of course possible that a load-bearing bug survives for a long time, conspiring with an incorrect model of reality to give validated results, but wide use tends to eliminate these things.
If you’re making a package for a small team or aren’t pushing it to a large audience then just keep it on a GitHub repository. It is almost as easy to install from GitHub with devtools as it is to install.packages().
Posit is obviously the only organization with the pull to do that, and I feel like they got pulled in 10 directions during the move to AI and trying to also support Python. R Shiny is dead too which sucks because reflex.dev just copied them and ate their lunch in 3 months.
Not to mention the ridiculous styling/formatting of most tidyverse users, which Wickham and others seem to promote. One of the reasons R has lost ground to other languages recently is that most R code these days is ugly.
As a working data scientist, I know I am not a computer scientist or a 10x engineer (hell, I am probably a 0.8x engineer), but that's not where my expertise is. My engineer co-workers are 0.01x data scientists, but you won't see me complaining that they don't know the Central Limit Theorem or how to build a causal inference engine.
They are the coding equivalent of orchestral viola jokes. By which I mean fundamentally grounded in truth.
jdw64•1h ago
RA_Fisher•1h ago
AI speeds up learning, so I bet that’s what you’re noticing with R.
As an aside, the best programmers these days are probabilistic programmers (who write stochastic functions). Our languages are Stan and PyMC. Both can be called by Python or R, and AI writes all of them extremely well. So it seems to me that the underlying language matters less than ever.
davemp•1h ago
Programming isn’t even a field in the same way as prob&stats. Computer science does in fact have non-deterministic subfields such as information theory.
RA_Fisher•20m ago
Information theory doesn’t even incorporate utility.
jdw64•53m ago
R these days mostly uses the tidyverse, which feels like a variant of DOP (Data-Oriented Programming). It's a kind of data flow, so it's different from typical OOP. I also occasionally work with statisticians (being a freelancer, ETL work is more common than you'd think), and I know what you mean by Stan and PyMC. I know they're powerful tools for Bayesian statistics and multilevel modeling. I know the basic syntax and examples, but I wouldn't say I know them well. My level is mainly focused on the scientists who hire me, and those tools still don't come up often in my country.
That said, I think we differ on the bigger picture because academic code isn't everything. Academic code is typically algorithm-centric, like LeetCode problems, but most production work revolves around code hygiene and responsibility (algorithms are usually already established ones). Anyway, that's not the main point. What you said is mostly correct, but my focus was on something else: even people who studied at that level can be surprisingly clumsy at expressing themselves through programming. Regardless, thanks for your input, and I agree that AI is good at programming. But using a programming language generally means understanding its tradeoffs, and R is tricky in that regard since it feels like a mix of OOP and DOP variants.
latexr•46m ago
Is anyone arguing “AI is always bad”? I think the argument is clearly “the negatives outweigh the positives”.
jdw64•40m ago
PaulHoule•23m ago
At risk of sounding like ChatGPT, it's not an R thing, it's a general thing. Turn [showdead] on in your profile and see how Show HN is flooded with AI slop projects and we all know GitHub is drowning in it.
colechristensen•14m ago
Some other researcher, often with limited skills in your native tongue, even more limited skills in software development best practices, wrote some code for a paper between 5 and 50 years ago and your PI has told you to use that code and some OTHER code together at the same time to validate some experiment he wants you to do.
In the past you would take days/weeks/months to get this to work, but with an LLM?
I'm envious of the grad students of today for the amount of nonsense which is bypassable.