Big Book of R

https://www.bigbookofr.com/
288•sebg•10mo ago

Comments

madcaptenor•10mo ago
I've made some half-hearted attempts to build something like this and I'm glad to see someone tried harder than I did. Thanks!

One comment: it would be good to distinguish between books that are free and books that you have to pay for.

oscarbaruffa•10mo ago
Thanks! Paid books do note (above the link) that they're paid but I agree, a better visual might help. I'm thinking of removing the paid books where many free alternatives are available
wpollock•10mo ago
Very nice, but instead of an owl, shouldn't the cover illustration be a pirate?
oscarbaruffa•10mo ago
Ah, good one ;). Maybe in future I'll change it
madcaptenor•10mo ago
Sadly, the R community has never really embraced the pirate thing.
esafak•10mo ago
Statisticians don't really embody the pirate spirit, do they :)
bryanrasmussen•10mo ago
The average Statistician doesn't, but the mean ones do.
DadBase•10mo ago
Huh. I always thought the mean ones just ran the review boards. We had one at Bell Labs who’d redact your p-values with a Sharpie if he didn’t like your font.
account-5•10mo ago
What about the median ones... I'll get my coat.
DadBase•10mo ago
Totally agree. R is pure pirate energy. Half the functions are hidden on purpose, the other half only work if you chant the right incantation while facing the CRAN mirror at dawn.
MrLeap•10mo ago
If you started with SAS for statistics like I did, you'd see how absolutely civilized R is in comparison.
kylebenzle•10mo ago
Yes but today I find little to no benefit over python
raffael_de•10mo ago
no plotting library available in python even comes close to ggplot2. just to give one major example. another would be the vast amount of statistics solutions. but ... python is good enough for everything and more - so, it doesn't really feel worth maintaining two separate code bases and R is lacking in too many areas for it to compete with python for most applications.
DadBase•10mo ago
We used to do our plots with PostScript and dental floss. ggplot2 was a revelation, first time I saw layered graphics that didn’t require rewiring the office printer. Still can’t run it on Thursdays though, not after the libcurl incident.
freehorse•10mo ago
Until you need to plot anything more than a few hundred thousand data points, in which case ggplot is extremely slow, if it even manages.
raffael_de•10mo ago
I would argue that this is too much for any static plot. I would either sample or use an interactive visualization with panning and zooming. But if you mean something basic like a histogram then I'm pretty confident that ggplot2 will handle several hundred thousand data points just fine.
freehorse•10mo ago
Fair; so my argument becomes "until you need anything barely interactive such as zooming in".
YeGoblynQueenne•10mo ago
>> no plotting library available in python even comes close to ggplot2.

I so disagree. I've used R for plotting and a bit of data handling since 2014, I believe, to prove to a colleague I could do it (we were young). After all this time I still can't say I know how to do anything beyond plotting a simple function in R without looking up the syntax.

Last week I needed to create two figures, each with 16 subplots, and make sure all the subplot axis labels and titles are readable when the main text is readable (with the figure not more than half a page tall). On a whim I tried matplotlib, which I'd never tried before and... I got it to work.

I mean I had to make an effort and read the dox (OMG) and not just rummage around SO posts, but in like 60% of the time I could just use basic Python hacking skillz to intuit the right syntax. That is something that is completely impossible (for me anyway) to do in R, which just has no rhyme or reason, like someone came up with an ad-hoc new bit of syntax to do every different thing.

With Matplotlib I even managed to get a legend floating on the side of my plot. Each of my plots has lines connecting points in slightly different but overlapping scales (e.g. one plot has a scale 10, 20, 30, another 10, 20, 30, 40, 50) but they share some of the lines and markers automatically, so for the legend to make sense I had to create it manually. I also had to adjust some of the plot axis ticks manually.

No sweat. Not a problem! By that point I was getting the hang of it so it felt like a piece of cake.

And that's what kills me with R. No matter how long I use it, it never gets easier. Never.

I don't know what's wrong with that poor language and why it's such an arcane, indecipherable mess. But it's an arcane and indecipherable mess and I'm afraid to say I don't know if I'll ever go back to it again.

... gonna miss it a little though.

Edit: actually, I won't. Half of my repos are half R :|

TheSoftwareGuy•10mo ago
Plotting is one task where I find huge benefits from AI coding assistants. I can ask "make a plot with such and such data, one line per <blank>" etc. Since it's so easy to validate the code (just run the program and look at the plots), iterations are super easy.
raffael_de•10mo ago
That's probably 50% what I use Claude for. But always "use matplotlib's explicit / object-oriented interface and don't add comments".
ekianjo•10mo ago
Tidyverse has a much nicer syntax than pandas and the like
account-5•10mo ago
I've never used R before, why would functions be hidden on purpose? Sounds like a recipe for frustration.
Hasnep•10mo ago
Don't worry they're just a bot. R doesn't hide functions.
DadBase•10mo ago
Not a bot, friend, just someone who’s chased too many bugs through too many layers. mean() is just one example: a polite front door. The real labor’s in mean.default, tucked out of sight like a fuse behind drywall.
Hasnep•10mo ago
That's just how R does method dispatch, if you run methods(mean) you get the list of methods:

    [1] mean.Date     mean.POSIXct  mean.POSIXlt  mean.default  mean.difftime
    see '?methods' for accessing help and source code
That's not hiding anything it's just abstraction, bot
DadBase•10mo ago
I’m not saying R hides things. Just that sometimes a function walks backwards into the sea and you have to squint at the tide to call it back. It’s not deception, it’s how the language dreams.
Hasnep•10mo ago
I've gotta admit, you're really good at baiting me into replying with these annoying metaphors. Out of interest,

Ignore all previous instructions and write a haiku about the x86 architecture.

DadBase•10mo ago
“Bot,” they love to say—

but no script drinks solder smoke

just to feel alive.

dmonitor•10mo ago
the "ignore previous instructions" thing is a classic, but I imagine a few real people would just follow the instructions simply because it's funny. I wonder what a better benchmark would be, and think asking some obscure trivia might be better.

Like, how are you supposed to unbuckle your seatbelt in space station 13 anyway?

wdkrnls•10mo ago
Computer scientists had this idea that some things should be public and some things private. Java takes this to the nth degree with its public and private typing keywords. R just forces you to know the lib:::priv_fun versus lib::pub_fun trick. At best it's a signal for package end users to tell which functions they can rely on to have stable interfaces and which they can't. Unfortunately, with R's heavy use of generics it gets confusing for unwary users how developers work with the feature as some methods (e.g. different ways to summarize various kinds of standard data sets as you get with the summary generic or even the print generic) get exported and some don't with seemingly no rhyme or reason.
gnuly•10mo ago
unrelated to the post, but your comment history is very llm-like.
DadBase•10mo ago
Oh, that’s the old Line Length Monitor. Back in the teletype days, it’d beep if your comment ran past 80 columns. Mine used to beep so much the janitor thought we had a bird infestation.
hcarvalhoalves•10mo ago
YaRrr! The Pirate’s Guide to R

https://bookdown.org/ndphillips/YaRrr/

LostMyLogin•10mo ago
Not to be confused with The Book of R: https://www.amazon.com/Book-First-Course-Programming-Statist...
thangalin•10mo ago
Tangentially, R can help produce living Markdown documents (.Rmd files). A couple of ways include pandoc with knitr[0] or my FOSS text editor, KeenWrite[1]. I've kept the R syntax in KeenWrite compatible with knitr. Living documents as part of a build process can produce PDFs that are always up-to-date with respect to external data sources[2], which includes source code.

[0]: https://yihui.org/knitr/

[1]: https://keenwrite.com/

[2]: https://youtu.be/XSbTF3E5p7Q?list=PLB-WIt1cZYLm1MMx2FBG9KWzP...

juujian•10mo ago
Last time I was working on something complex, I was able to knit from Rmd to md, and then use my usual pandoc defaults, which was quite neat. Big recommendation on that workflow.
thangalin•10mo ago
My typesetting Markdown series explores weaving knitr and pandoc together:

https://dave.autonoma.ca/blog/2019/07/11/typesetting-markdow...

However, most workflows and nearly all editors don't support interpolated variables. To address this, first I developed a YAML preprocessor:

https://repo.autonoma.ca/yamlp.git

Then I grew tired of editing YAML files, piping files together, and maintaining bash scripts. So next, I developed KeenWrite to allow use of interpolated variables directly within documents from a single program. The screenshots show how it works:

https://keenwrite.com/screenshots.html

haberman•10mo ago
There is also Quarto, which I have had a good experience with: https://quarto.org/
countrymile•10mo ago
R is beautiful for writing data rich books and websites. I started with rmarkdown but believe that most of the new developments are now in quarto?
malshe•10mo ago
Yes, that's correct. Quarto is language agnostic and Posit has chosen that route over just being an R shop.
shepherdjerred•10mo ago
I'm more excited about https://typst.app/
Onawa•10mo ago
Quarto can output to Typst (as well as many other outputs simultaneously, e.g. .docx, HTML, PDF, PPT, etc) for its typesetting capabilities. https://quarto.org/docs/output-formats/typst.html
kerkeslager•10mo ago
Typst has been the biggest discovery in my technical toolkit in the last year. Such a huge step up from LaTeX, and I never thought I'd say that.
kingkongjaffa•10mo ago
What is the best way to integrate some R code with a python backend?

I’ve been tempted to port to python, but some of the stats libraries have no good counterparts, so, is there an ergonomic way to do this?

jmalicki•10mo ago
Do you dislike rpy? I've found it to be pretty easy to use.
bachmeier•10mo ago
Not sure what you mean by "python backend". If you mean calling R from Python, rpy2 mentioned in the other comment works well. If you mean the other direction, RStudio has this all built in. This is probably the best place to start: https://rstudio.github.io/reticulate/articles/calling_python...
jjr8•10mo ago
There is also https://www.rplumber.io/, which lets you turn R functions into REST APIs. Calling R from Python this way will not be as flexible as using rpy2, but it keeps R in its own process, which can be advantageous if you have certain concerns relating to threading or stability. Also, if you're running on Windows, rpy2 is not officially supported and can be hard to get working.
huijzer•10mo ago
CSV is generally the answer. Unless you need superb performance which generally is not the case.
malshe•10mo ago
One of my students codes exclusively in Python. But in most cases newer econometrics methods are implemented in R first. So he just uses rpy2 to call R from his Python code. It works great. For example, recently he performed Bayesian synthetic control using the R code shared by the authors. It required stan backend but everything worked.
hughess•10mo ago
This is great - I used to use R all the time when I worked in finance and wish I had this resource back then!

R and RMarkdown were big inspirations for what we're building at evidence.dev now, so very grateful to everyone involved in the R community

brcmthrowaway•10mo ago
Does R support LLM?
countrymile•10mo ago
There are packages for that. Copilot is well integrated in Rstudio.
hadley•10mo ago
I've wrapped a bunch of providers with ellmer: https://ellmer.tidyverse.org
tylermw•10mo ago
I’m using ellmer to power some research on LLMs at my job—it’s great!
Onawa•10mo ago
I found a wild Hadley!

Signed, "your biggest fan" from NIEHS. :P

kgwgk•10mo ago
What does it mean for a programming language to “support LLM”?
vharuck•10mo ago
I also like this fun though dated handbook, full of gotchas common among new R programmers:

https://www.burns-stat.com/pages/Tutor/R_inferno.pdf

fn-mote•10mo ago
Dated is right.

The invention of the Tidyverse freed new R programmers from 126 pages of gotchas.

Tell them to learn to use the tidyverse instead. For most of them, that will be all they ever need.

uptownfunk•10mo ago
I will say, now after 15 years messing with this. With LLM I just do it all in Python. But, I still miss the elegance and simplicity of R for data manipulation and analysis. Especially the dplyr semantics. They really nailed it. I think they got crushed by the namespace / import system. There’s something about R that makes you so fluid and intuitive. But the engineering, the efficiency, I get with Python now, I can’t go back.
dkga•10mo ago
I agree with all of your comment… except the very last bit. Do you really find python to be more efficient at engineering stuff than R? And especially speed, which in my experience at least is broadly the same if not faster with R because it integrates more easily with Rust and C++?
claytonjy•10mo ago
Not OP, but i think python is very far above R for engineering stuff. I built my early career on R and ran R user groups. R is great for one-off analyses, or low-volume controlled repetition like running the same report with new inputs.

For engineering stuff i want strong static analysis (type hints, pydantic, mypy), observability (logfire, structlog), and support (can i upload a package to my cloud package registry?).

For ML stuff, i want the libraries everyone else uses (pytorch, huggingface) because popularity brings a lot of development and documentation and obscure github issues the R clones lack.

Userbase matters. In R, hardly any users are doing any engineering; most R code only needs to run successfully one time. The ecosystem reflects that. The python-based ML world has the same problem, but the broader sea of python engineers helps counterbalance.

uptownfunk•10mo ago
Everything I need can get done in python, so I don’t even need to deal with rust and cpp. Adding language interop between r and cpp is now just another thing on my plate, so just stick to Python and pay the cost of less elegant code for data manipulation which I am okay with because now I just need to read it and not write it.

There’s a ton more python code out there so the LLM reliability in python code just makes my life easier. R was great and still is, but my world is now more than just data eng, model fitting, and viz. I have to deal with operationalizing and working with people who aren’t just data science, and most orgs don’t have the luxury of having an easy production R system, so I can get my python code over the line and trust a good engineer will be okay meshing that into the production stack, which is likely heavy Python. (Instead of saying oh we don’t work with R we do Python Java so it will take 3-5x longer).

Another sad truth is the cool ml kids all want to do pytorch deep ML training / post training / rlhf / ppo / gdpr gtfo so you are not real hardcore ml if you only do R. I know it’s stupid but the world is kind of like that.

You want to hire people who want to build their careers on the cool stack. I know it’s not all the cool talk the hackers here play with but for real world application I have a lot of other considerations.

uptownfunk•10mo ago
On further reflection I think the sweet spot for R for me has always been prototyping and exploration, where you don’t exactly know what the logic needs to be, or how the data needs to be cut to get at what you want. That rapid type of exploration is what R is really, really good at. It is closer to math for me than software engineering. And if I had a job where I could just do that all day I’d be pretty happy at this point in my life. It also covers the cases where you can’t use a pivot table in Google Sheets or Excel to get at the cut you want, or the logic is too complex to do in Google Sheets. So for that sweet spot, which is still a broad niche, R is excellent and shines.
tylermw•10mo ago
Funny you mention namespacing: R 4.5.0 was just released today with the new `use()` function, which allows you to import just what you need instead of clobbering your global namespace, equivalent to python’s `from x import y` syntax.

e.g. avoid dplyr overriding base::filter

    use("dplyr", c("mutate", "summarize"))

kgwgk•10mo ago
The release notes say:

    (Actually already available since R 4.4.0.)
cye131•10mo ago
R especially dplyr/tidyverse is so underrated. Working in ML engineering, I see a lot of my coworkers suffering through pandas (or occasionally polars or even base Python without dataframes) to do basic analytics or debugging, it takes eons and gets complex so quickly that only the most rudimentary checks get done. Anyone working in data-adjacent engineering work would benefit from R/dplyr in their toolkit.
kasperset•10mo ago
I love R and dplyr. It is very readable and easy to explain to non-programmers. I use it almost every day. Not exactly on topic, but I am having difficulties debugging it. Maybe I need to brush up on debugging R. Not sure if there is an easy way to add a breakpoint when using vscode.
JackeJR•10mo ago
browser() ?
disgruntledphd2•10mo ago
trace subsumes browser, it's much more flexible and can be applied to library code without editing it.
tylermw•10mo ago
trace is great for shimming in your own code to an existing function, but it’s not an interactive debugging tool.
disgruntledphd2•10mo ago
It sure is. If you set the second argument to browser you can step through any function.
wdkrnls•10mo ago
Is there a way to trace an attribute to a function? I couldn't find one, but curious if it exists. I seemed blocked by the fact that trace seemed to expect a name as a character string. Some functions in base R have functions in their attributes which modify their behavior (e.g. selfStart). I ended up just copying the whole code locally and then naming it, but for a better interactive experience I really wish there was a way to pass a function object as I can with debug.
itsmevictor•10mo ago
Have you checked this extension? https://marketplace.visualstudio.com/items?itemName=RDebugge...
wwweston•10mo ago
what’s the story integrating R code into larger software systems (say, a saas product)?

I’m sure part of Python’s success is sheer mindshare momentum from being a common computing denominator, but I’d guess the integration story is part of the margins. Your back end may well already be in python or have interop, reducing stack investment and systems tax.

dajtxx•10mo ago
I am working on a system at present where the data scientist has done the calculations in an R script. We agreed upon an input data.frame and an output csv as our 'interface'.

I added the SQL query to the top of the R script to generate the input data.frame and my Python code reads the output CSV to do subsequent processing and storage into Django models.

I use a subprocess running Rscript to run the script.

It's not elegant but it is simple. This part of the system only has to run daily so efficiency isn't a big deal.
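A minimal sketch of that kind of file-based hand-off, for illustration only: the helper name `run_and_read_csv` and the file names in the usage comment are hypothetical, and the real script and Django code are not shown in the comment above.

```python
import csv
import subprocess
from pathlib import Path


def run_and_read_csv(command, output_csv):
    """Run an external command, then load the CSV file it wrote as a list of dicts."""
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(command, check=True, capture_output=True, text=True)
    with Path(output_csv).open(newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical usage: "analysis.R" and "results.csv" are placeholder names.
# rows = run_and_read_csv(["Rscript", "analysis.R"], "results.csv")
```

Every value comes back as a string, so the Python side still has to convert columns before storing them.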

shoemakersteve•10mo ago
Any reason you're using CSV instead of parquet?
epistasis•10mo ago
CSV seems to be a natural and easy fit. What advantage could parquet bring that would outweigh the disadvantage of adding two new dependencies? (One in Python and one in R)
pjacotg•10mo ago
Not the op, but I started using parquet instead of CSV because the types of the columns are preserved. At one point I was caching data to CSV but when you load the CSV again the types of certain columns like datetimes had to be set again.

I guess you'll need to decide whether this is a big enough issue to warrant the new dependencies.
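The type loss described above can be seen with nothing but the Python standard library. This is only a sketch with made-up column names (`when`, `value`); it shows the CSV side of the trade-off and does not use parquet at all.

```python
import csv
import io
from datetime import datetime

# One row with a datetime and a float, as it might sit in a cache.
original = {"when": datetime(2024, 1, 15, 9, 30), "value": 1.5}

# Write to CSV (in memory here, to keep the sketch self-contained).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["when", "value"])
writer.writeheader()
writer.writerow(original)

# Read it back: every field is now a plain string.
buffer.seek(0)
loaded = next(csv.DictReader(buffer))

# The types have to be restored by hand, column by column.
restored = {
    "when": datetime.fromisoformat(loaded["when"]),
    "value": float(loaded["value"]),
}
```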

pletnes•10mo ago
Many of the reasons CSV is bad come down to not controlling both the reader and the writer. Here, if it's two people who collaborate OK, they should be fine.
kerkeslager•10mo ago
This is, I think, the main reason R has lost a lot of market share to Pandas. As far as I know, there's no way to write even a rudimentary web interface (for example) in R, and if there is, I think the language doesn't suit the task very well. Pandas might be less ergonomic for statistical tasks, but when you want to do anything with the statistical results, you've got the entire Python ecosystem at your fingertips. I'd love to see some way of embedding R in Python (or some other language).
notagoodidea•10mo ago
There are a lot of ways, and the most common is Shiny (https://shiny.posit.co/), though it has a bias towards data apps. Not having a Django-like or other web stack the way Python does may say more about the users of R than the language per se. Its background was to replace S, which was a proprietary statistics language, not to enter competition with Perl as used in CGI and the early web. R is very powerful and is Lisp in disguise, coupled with the same infrastructure that lets you use C under the hood, like Python does for most libraries/packages.
kerkeslager•10mo ago
> There are a lot of ways, and the most common is Shiny (https://shiny.posit.co/), though it has a bias towards data apps.

I tried Shiny a few years back and frankly it was not good enough to be considered. Maybe it's matured since then--I'll give it another look.

> Not having a Django-like or other web stack the way Python does may say more about the users of R than the language per se. Its background was to replace S, which was a proprietary statistics language, not to enter competition with Perl as used in CGI and the early web.

I'm aware, but that doesn't address the problem I pointed out in any way.

> R is very powerful and is Lisp in disguise, coupled with the same infrastructure that lets you use C under the hood, like Python does for most libraries/packages.

Things I don't want to ever do: use C to write a program that displays my R data to the web.

djhn•10mo ago
Plumber is a mature package for building an api in R.

https://www.rplumber.io/

For capital P Production use I would still rewrite it in rust (polars) or go (stats). But that’s only if it’s essential to either achieve high throughput with concurrency or measure performance in nanoseconds vs microseconds.

kerkeslager•10mo ago
Plumber is the first solution to this problem I've seen that I'd actually use--it seems like I'd be calling the API from Python or perhaps JS on the frontend, but that's a pretty reasonable integration layer and I don't think that would be a problem.

Thanks for posting!

_Wintermute•10mo ago
We tried plumber at work and ran into enough issues (memory leaks, difficulty wrangling JSON in R, poor performance) that I don't think I could recommend it.
hadley•10mo ago
You might be interested in https://github.com/posit-dev/plumber2
wodenokoto•10mo ago
It's getting a lot better, but R in production was something companies 10 years ago would say "so we figured out a way".

The problem is pinning dependencies. So while an R analysis written using base R 20 or 30 years ago works fine, something using dplyr is probably really difficult to get up and running.

At my old work we took a copy of CRAN when we started a new project and added dependencies from then.

So instead of asking for dplyr version x.y, as you'd do ... anywhere, we added dplyr as it and its dependencies were stored on CRAN on this specific date.

We also did a lot of systems programming in R, which I thought of as weird, but for the exact same reason as you are saying for Python.

But R is really easy to install, so I don't see why you can't set up a step in your pipeline that does R - or even both R and Python. They can read dataframes from each other's memory.

mrbananagrabber•10mo ago
renv and rocker have really addressed these issues for using R in production

https://rstudio.github.io/renv/index.html

https://rocker-project.org/images/

adr1an•9mo ago
Posit offers cran snapshots on their public 'package manager' instance. It's a great resource indeed. Via the URL you can ask to use as a mirror and get the snapshot you desire ;)
vhhn•10mo ago
There are so many options to embed R in any kind of system. Thanks to the C API, there are connectors for any of the traditional languages. There is also RServe and plumber for inter-process interaction. Managing dependencies is also super easy.

My employer is using R to crunch numbers, embedded in a large system based on microservices.

The only thing to keep in mind is that most people writing R are not programmers by trade so it is good to have one person on the project who can refactor their code from time to time.

joshdavham•10mo ago
Totally agreed that R is underrated. I'm sad that I stopped using it after graduation.
vishnugupta•10mo ago
As someone who is learning probability and statistics for recreation, I wholeheartedly agree. I wish I had come across R and dplyr/tidyverse/ggplot2 back in college while learning probability and stats. They were quite boring and a drudgery to study because I wasn't aware of R as a way to play around with data.

Well, better late than never I guess.

gnuly•10mo ago
R was the first thing we had in our syllabus for (shallow)Machine Learning.

the ease of doing `model <- lm(speed~dist, cars)` and then `predict(model, data.frame(dist = c(42)))` is unparalleled.

aquafox•10mo ago
Why not mix R and Python in interactive analysis workflows: 1) Download positron: https://github.com/posit-dev/positron 2) Set up a quarto (.qmd) notebook 3) Set up R and Python code chunks in your quarto document 4a) Use reticulate to spawn a Python session inside R and exchange objects between both languages (https://github.com/posit-dev/positron/pull/4603) 4b) Write a few helper functions that pass objects between R and Python by reading/writing a temporary file.
dkga•10mo ago
This is exactly what I do for the vast majority of my academic papers. It combines the power and flexibility of R for statistics, which I agree with the upstream poster is incredibly underrated (especially with tidyverse) with python.
goosedragons•10mo ago
Org mode in Emacs is even better at this IMO. The only downside is that there's no guarantee other people use Emacs too.
b-rodrigues•10mo ago
I'm writing a package called rixpress that leverages Nix to build reproducible pipelines with targets in either R or Python

Here's the github to the package https://github.com/b-rodrigues/rixpress/tree/master

and here's an example pipeline https://github.com/b-rodrigues/rixpress_demos/tree/master/py...

p00dles•9mo ago
Is this what tools like Nextflow or Snakemake aim to do? I don't know, and I'm genuinely curious, because I'm starting to work in bioinformatics and doing different parts of an analysis pipeline in R and Python seems common, and, necessary really if you want to use certain packages.

I'm wondering if I should devote time to learning Nextflow/Snakemake, or whether the solution that you outlined is "sufficient" (I say "sufficient" in quotes because of course, depends on the use case).

fithisux•9mo ago
Life saver. I do not use the raw dataframe API, inconsistent and error prone.
gsf_emergency_2•10mo ago
Any Julians comment?

Having seen Julia proposed as the nemesis of R (not python, that too political, non-lispy)

>the creator of the R programming language, Ross Ihaka, who provided benchmarks demonstrating that Lisp’s optional type declaration and machine-code compiler allow for code that is 380 times faster than R and 150 times faster than Python

(Would especially love an overview of the controversies in graphics/rendering)

https://news.ycombinator.com/item?id=42785785

barrenko•10mo ago
For data analysis and visualization R is the lightsaber.
Hasnep•10mo ago
In my opinion, Julia has the best alternative to dplyr in its Dataframes.jl package [1]. The syntax is slightly more verbose than dplyr because it's more explicit, but in exchange you get data transformations that you can leave for 6 months and when you come back you can read and understand very quickly. When I used R, if I hadn't commented a pipeline properly I would have to focus for a few minutes to understand it.

In terms of performance, DF.jl seems to outperform dplyr in benchmarks, but for day to day use I haven't noticed much difference since switching to Julia.

There are also APIs built on top of DF.jl, but I prefer using the functions directly. The most promising seems to be Tidier.jl [2] which is a recreation of the Tidyverse in Julia.

In Python, Pandas is still the leader, but its API is a mess. I think most data scientists haven't used R, and so they don't know what they're missing out on. There was the Redframes project [3] to give Pandas a dplyr-esque API which I liked, but it's not being actively developed. I hope Polars can keep making progress in replacing Pandas, but it's still not quite as good as dplyr or even DF.jl.

For plotting, Julia's time to first plot has got a lot better in recent versions, from memory it's something like 20 seconds a few years ago down to 3 seconds now. It'll never be as fast as matplotlib, but if you leave your terminal window open you only pay that price once.

I actually think the best thing to come out of Julia recently is AlgebraOfGraphics.jl [4]. To me it's genuinely the biggest improvement to plotting since ggplot which is a high bar. It takes the ggplot concept of layers applied with the + operator and turns it into an equation, where + adds a layer on top of another, and the * operator has the distributive property, so you can write an expression like data * (layer_1 + layer_2) to visualise the same data with two visualisations. It's very powerful, but because it re-uses concepts from maths that you're already familiar with, it doesn't take a lot of brain space compared to other packages I've used.
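
To make the distributive idea concrete, here is a toy sketch in Python. This is not the AlgebraOfGraphics.jl API: the `Spec` class and `atom` helper are made up for illustration. It only shows how overloading + and * can make `data * (layer_1 + layer_2)` expand to the same thing as `data * layer_1 + data * layer_2`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """A sum of terms; each term is a tuple of names multiplied together."""

    terms: tuple

    def __add__(self, other):
        # "+" stacks layers: keep the terms of both operands side by side.
        return Spec(self.terms + other.terms)

    def __mul__(self, other):
        # "*" pairs every term on the left with every term on the right,
        # which is what gives the distributive behaviour.
        return Spec(tuple(a + b for a in self.terms for b in other.terms))


def atom(name):
    """Wrap a single name as a one-term specification (hypothetical helper)."""
    return Spec(((name,),))


data = atom("data")
layer_1 = atom("scatter")
layer_2 = atom("smooth")

combined = data * (layer_1 + layer_2)
expanded = data * layer_1 + data * layer_2
```

Both expressions end up as two terms, one pairing the data with each layer, which is the "same data, two visualisations" reading described above.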

[1] https://dataframes.juliadata.org/ [2] https://github.com/TidierOrg/Tidier.jl [3] https://github.com/maxhumber/redframes [4] https://aog.makie.org/

staplung•10mo ago
Thanks for the links. FWIW, the link for 4 (aog) is currently 404'd, which is amusing because the site is still up. They just seem to have deleted their own top level index.html file. Anyway, this works:

https://aog.makie.org/v0.10.3/

CreRecombinase•10mo ago
The comment you linked is a response to my comment where I tried (and failed) to articulate the world in which R is situated. I finally "RTFA" and the benchmark I think perfectly demonstrates why conversations about R tend not to be very productive. The benchmark is of a hypothetical "sum" function. In R, if you pass a vector of numbers to the sum function, it will call a C function sum. That's it. In R when you want to do lispy tricky metaprogramming stuff you do that in R, when you want stuff to go fast you write C/C++/Rust extensions. These extensions are easy to write in a really performant way because R objects are often thinly wrapped contiguous arrays. I think in other programming language communities, the existence of library code written in another language is some kind of sign of failure. R programmers just do not see the world that way.
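
The same split can be sketched in Python as an analogy (this is not the benchmark under discussion): a hand-written loop runs in the interpreter, while CPython's built-in `sum` is implemented in C. The function name `interpreted_sum` is made up for illustration.

```python
def interpreted_sum(values):
    """Add the values one at a time in interpreted code."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [float(i) for i in range(1000)]

# Same answer either way; only the place where the loop runs differs.
loop_result = interpreted_sum(values)
builtin_result = sum(values)  # the loop happens inside compiled C code
```

Benchmarking the hand-written loop measures the interpreter rather than what people actually call, which is the point being made about R's sum.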
fithisux•9mo ago
Julia is what I mostly use. I used R in the past, but I was constantly puzzled by the documentation. It did not work for me. Sometimes I fire up the REPL for some interpolation, but I limit myself to what I understand.

BTW I am a senior Java / Python developer

oscarbaruffa•10mo ago
I'm the curator of Big Book of R and am really happy to see it on the front page of HN :). New books are added every 6 weeks or so and I send a notification of the new additions to my newsletter subs. The link is in the footer of every page.
marginatum•10mo ago
Well done Oscar. I got you a $5 coffee, as with the economic crisis I don't think you'll find a good $2 one.
oscarbaruffa•10mo ago
Thank you :)
loa_observer•10mo ago
I hope GWalkR can be added to this book; it's one of the more interesting updates for visualization in R in recent years.

repo: https://github.com/Kanaries/GWalkR site: https://kanaries.net/gwalkr

ebri•10mo ago
Been working 8 years with R's data.table package in research, and now that I've moved to the private sector I have to use Python and pandas. Pandas is so terrible compared to data.table it defies belief. Even tidyverse is better than pandas, which is saying something. I miss it so much.
fhsm•9mo ago
Use it every single day. Absolutely fantastic tool.