Too many R packages: CRAN is inundated with submissions

https://rworks.dev/posts/too-many-R-packages/

43•ionychal•2h ago

Comments

jdw64•1h ago

People would typically choose based on CRAN TaskViews or follow conventional methodologies, but what I notice from this is that R is truly a language used only by those who use it. And the people who use it are usually master's students or professors; it's rarely used at the undergraduate level. So even those with that level of academic background and training must have had their own implementation roadblocks. Could that be why the use of R has exploded with the help of AI? Looking at this, I think it's fair to understand that even domain experts found programming difficult. Seeing this, can we really say that AI is always bad? For some people, it has become both the hands and a voice for their words.

RA_Fisher•1h ago

Programming is a lot easier than statistics bc it’s deterministic, whereas statistics is stochastic (that extends and encompasses deterministic functions).

AI speeds up learning, so I bet that’s what you’re noticing with R.

As an aside, the best programmers these days are probabilistic programmers (who write stochastic functions). Our languages are Stan and PyMC. Both can be called by Python or R, and AI writes all of them extremely well. So it seems to me that the underlying language matters less than ever.
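A minimal sketch of the claim above that stochastic functions encompass deterministic ones, written in plain Python rather than Stan or PyMC. The names `linear` and `noisy_linear` are hypothetical and chosen only for illustration: a deterministic function is treated as the special case of a stochastic one whose noise scale is zero.

```python
import random


def linear(x, slope=2.0, intercept=1.0):
    """Deterministic: the same input always gives the same output."""
    return slope * x + intercept


def noisy_linear(x, sigma, rng, slope=2.0, intercept=1.0):
    """Stochastic: the deterministic part plus Gaussian noise of scale sigma."""
    return linear(x, slope, intercept) + rng.gauss(0.0, sigma)


rng = random.Random(0)
# With sigma = 0 the noise term vanishes and the deterministic function is recovered.
exact = noisy_linear(3.0, 0.0, rng)
# With sigma > 0 repeated calls give different values around linear(3.0).
draws = [noisy_linear(3.0, 1.0, rng) for _ in range(5)]
```

The random generator is passed in explicitly so that a fixed seed makes the draws reproducible.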

davemp•1h ago

Picking up on some Dunning-Kruger effect here.

Programming isn’t even a field in the same way as prob&stats. Computer science does in fact have non-deterministic subfields such as information theory.

RA_Fisher•20m ago

There’ll always be boundary tending, true. Only a portion of CS deals with stochastic functions though, whereas all of statistics is stochastic. That makes a big difference, bc the world is complex.

Information theory doesn’t even incorporate utility.

jdw64•53m ago

I partially agree, but I also differ on some points. The part I agree with is that probabilistic programming is difficult and that advanced programmers tend to enjoy it. Where I differ is on the claim that programming is deterministic. At the script level, programming is deterministic and sequential, but once it crosses a certain threshold, it becomes absolutely probabilistic. That's because latency, locks, and asynchronous communication start to intervene. If programming were fully deterministic, C's undefined behavior wouldn't exist; everyone would have prevented it.
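A small sketch of the latency point above, assuming Python's standard `asyncio`. The helpers `worker`, `run_workers`, and `completion_order` are hypothetical names. Each task waits for a random delay standing in for network latency, so the set of results is fixed but the order in which tasks finish depends on timing rather than on the order they were written.

```python
import asyncio
import random


async def worker(name, rng, finished):
    # Simulated latency: the delay is drawn at run time.
    await asyncio.sleep(rng.uniform(0.0, 0.01))
    finished.append(name)


async def run_workers(names, seed=None):
    rng = random.Random(seed)
    finished = []
    # All workers run concurrently; gather waits until every one is done.
    await asyncio.gather(*(worker(n, rng, finished) for n in names))
    return finished


def completion_order(names, seed=None):
    """Return the names in the order their tasks finished."""
    return asyncio.run(run_workers(names, seed))


order = completion_order(["a", "b", "c"])
# The same three names always come back, but their order is not guaranteed.
```

Sorting the result, or collecting it by key instead of by arrival, is the usual way to get a reproducible answer back out of this kind of code.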

R these days mostly uses the tidyverse, which feels like a variant of DOP (Data-Oriented Programming). It's a kind of data flow, so it's different from typical OOP. I also occasionally work with statisticians (being a freelancer, ETL work is more common than you'd think), and I know what you mean by Stan and PyMC. I know they're powerful tools for Bayesian statistics and multilevel modeling. I know the basic syntax and examples, but I wouldn't say I know them well. My level is mainly focused on the scientists who hire me, and those tools still don't come up often in my country.

That said, I think we differ on the bigger picture because academic code isn't everything. Academic code is typically algorithm‑centric, like LeetCode problems, but most production work revolves around code hygiene and responsibility (algorithms are usually already established ones). Anyway, that's not the main point. What you said is mostly correct, but my focus was on something else: even people who studied at that level can be surprisingly clumsy at expressing themselves through programming. Regardless, thanks for your input, and I agree that AI is good at programming. But using a programming language generally means understanding its tradeoffs, and R is tricky in that regard since it feels like a mix of OOP and DOP variants.

latexr•46m ago

> Seeing this, can we really say that AI is always bad?

Is anyone arguing “AI is always bad”? I think the argument is clearly “the negatives outweigh the positives”.

jdw64•40m ago

You're right. I think I overstated it. Since English isn't my native language, I might have used some stronger words than intended. Thank you for pointing that out.

PaulHoule•23m ago

There is some great stuff in R but from a software engineering level I'd much rather data scientists work in Python.

At risk of sounding like ChatGPT, it's not an R thing, it's a general thing. Turn [showdead] on in your profile and see how Show HN is flooded with AI slop projects and we all know GitHub is drowning in it.

colechristensen•14m ago

A considerable amount of work for grad students is answering the question: "How the f#$% do I get this code to compile and run"

Some other researcher, often with limited skills in your native tongue, even more limited skills in software development best practices, wrote some code for a paper between 5 and 50 years ago and your PI has told you to use that code and some OTHER code together at the same time to validate some experiment he wants you to do.

In the past you would take days/weeks/months to get this to work, but with an LLM?

I'm envious of the grad students of today for the amount of nonsense which is bypassable.

Mairoce•1h ago

Frankly the bigger problem is an overreliance among R instructors on the tidyverse, an ever-expanding ecosystem of redundant functions and anti-patterns. They’re teaching new R users that everything can be solved with yet another package import and skipping over teaching them how to use the already powerful and intuitive base packages.

mjhay•1h ago

I’m not saying it doesn’t have flaws, but the tidyverse is still the most coherent and functional ML/stat computing ecosystem I’ve ever used. R packages outside of the tidyverse can get pretty gnarly. Even the R stdlib is usually considered to be inconsistent and riddled with legacy cruft.

331c8c71•1h ago

It's certainly quite pleasant to work with...but I would rather use sql for etl, the backend be whatever it needs to be...

The real world data transformations can get gnarly very quickly and sql is the perfect common denominator compared to dplyr which is still niche...

How do you feel about polars?

mjhay•55m ago

I’m a big fan of Polars. It’s really fast and memory efficient. With the lazy streaming functionality, I’ve been able to easily process 1 TB+ of data on a single machine (you do have to be careful to not do any operation that would cause the whole DF to materialize in that case).
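To illustrate why streaming avoids materializing the whole dataset, here is a rough standard-library analogy. It is not the Polars API, and `streaming_group_sum` is a hypothetical helper. Rows are read one at a time and folded into a running total per group, so memory use grows with the number of groups rather than the number of rows.

```python
import csv
import io
from collections import defaultdict


def streaming_group_sum(lines, key, value):
    """Sum the `value` column per `key`, holding only one row in memory at a time."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):  # DictReader yields rows lazily
        totals[row[key]] += float(row[value])
    return dict(totals)


# A tiny in-memory stand-in for a file that would be too large to load at once.
sample = io.StringIO("group,amount\na,1.5\nb,2.0\na,0.5\n")
result = streaming_group_sum(sample, "group", "amount")
# result == {"a": 2.0, "b": 2.0}
```

Calling `list(csv.DictReader(lines))` first would be the equivalent of materializing the whole frame, which is the operation the comment warns against.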

It’s certainly miles better than Pandas, which has a terrible API in addition to being comically inefficient. In my group, we generally use it for any new work, and have also swapped out pandas for polars in critical spots of our existing code - the latter giving a huge benefit relative to the amount of work it took.

I largely agree with you on SQL being the common denominator, but there are some things that are just awkward in SQL, and much easier to do in Python or other general purpose language.

nickcageinacage•49m ago

vibe coding hell is the reason

greenavocado•48m ago

The solution to this problem will be a web of trust featuring a vouching system that auto-closes PRs by default. I already see this being implemented in projects.

dofm•44m ago

R slop. Oof.

What an awful thing to imagine. It's already the programming language of choice for egregious abuses of good practice.

ActionHank•36m ago

I do wonder if there isn't enough computer science / software engineering that is being taught as part of data science.

People I've worked with that used R and managed data / did analysis didn't really seem too concerned with long term maintenance.

Secondary observation: these same people were the first to preach the AI coding gospel.

mjhay•30m ago

Bingo. The typical data scientist has a masters or PhD in a non-CS quantitative field, and has had exactly zero CS or software eng classes. It’s a shame, because once you get over some of the idiosyncrasies, R is a really powerful and flexible functional language.

mr_toad•28m ago

> People I've worked with that used R and managed data / did analysis didn't really seem too concerned with long term maintenance.

Unless you’re the poor schmuck who is given the task of running the code written by the previous analyst, who has probably already left the company. Often it’s easier to just throw something together from scratch and then look for a new job, perpetuating the problem.

dofm•25m ago

One of the things that always reassures me about LLMs is that as well as being trained on languages with reasonably well-designed grammars, they will also have seen lots of examples of good practice in their training set.

Two things that make me wonder if they can possibly turn out good quality R.

Perhaps a true test of AGI will be when you ask it to write an application in R and it refuses for fear of what people might think.

ianbooker•19m ago

I see "AI and R" in three perspectives:

First, usage: Using R for our undergrads in the time of LLMs is brilliant. ChatGPT slops out working code for their needs. Not pretty, but it works better than in 2022.

Second, development: Mastering R is hard, because it's a formal calculus (Kalkül). Tidyverse mediates some of it, but still. This is the perfect breeding ground for slopification. Let's see.

Third, errata: I would love to know the percentage of science built on R to this day. I mean insights and analysis supported by it and its vast packages. What if somewhere, deep down in the stack there is an ancient bug that dented all of this? I think AI might help us here, or will review slop negate this?

colechristensen•8m ago

>What if somewhere, deep down in the stack there is an ancient bug that dented all of this?

Science is built on libraries with experience, that have been validated extensively against reality. Code often written by people who have retired and died because that exact same code has been validated and pinned to reality for decades. It is of course possible that a load bearing bug survives for a long time conspiring with an incorrect model of reality to give validated results, but wide use tends to eliminate these things.

parsimo2010•16m ago

I feel like CRAN should be used for packages that are expressly made for others to use, and with effort put into the documentation and vignettes.

If you’re making a package for a small team or aren’t pushing it to a large audience then just keep it on a GitHub repository. It is almost as easy to install from GitHub with devtools as it is to install.packages().
