I'd imagine these days typescript or node might be taking over some of what would have hit on javascript.
But there is likely some decline for Java. I'd bet Elixir and Erlang have been nibbling away at the JVM space for quite some time; they make it pretty comfortable to build the kind of systems you'd otherwise use a JVM/JMS/WildFly-or-JBoss rig for. Oracle doesn't help: they take zero issue with being widely perceived as nasty, and it takes a bit of courage and knowledge to avoid getting a call from them at your inconvenience.
Which is a pity, once you learn to submit to and tolerate Maven it's generally a very productive and for the most part convenient language and 'ecosystem'. It's like Debian, even if you fuck up badly there is likely a documented way to fix it. And there are good libraries for pretty much anything one could want to do.
b) The ultimate hard search topic is 'R' / 'R language'. Check whether you think you index it correctly. Or related terms like RStudio, Posit, [R]Shiny, tidyverse, data.table, Hadleyverse...
Call it "Go", for example.
(Necessary disclaimer for the irony-impaired: this is a joke and an attempt at being witty.)
I present to you "Gangam C"
Day/Hour activity maps for a given user are relatively trivial to do in a single query, but only public submission/comment data could be used to infer it.
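For illustration, here is a minimal Python sketch of the aggregation such a query would perform, using the Unix-timestamp `time` field that the public HN API exposes on items; the sample timestamps are made up:

```python
from collections import Counter
from datetime import datetime, timezone

def activity_map(timestamps):
    """Bucket Unix timestamps into a (weekday, hour) histogram, in UTC."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(dt.weekday(), dt.hour)] += 1  # weekday(): 0 = Monday
    return counts

# Three hypothetical comment timestamps: two fall in the same UTC hour.
m = activity_map([1700000000, 1700000100, 1700086400])
print(m)  # Counter({(1, 22): 2, (2, 22): 1})
```

Whether UTC is the right time zone to bucket by depends on what you want to infer about the user, of course.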
Hmm. Personally I never look at user names when I comment on something. It's too easy to go from "i agree/disagree with this piece of info" to "i like/dislike this guy"...
...is that supposed to pose some kind of problem? The problem would be in the other direction, surely?
You're wrong in both cases :)
Maybe you don't like my opinions on cogwheel shaving, but you will agree with me on quantum frobnicators. But if you first come across my comments on cogwheel shaving and note the user name, you may not even read my comments on quantum frobnicators later.
The leaderboard https://news.ycombinator.com/leaders absolutely doesn't correlate with posting frequency. Which is probably a good thing. You can't bang out good posts non-stop on every subject.
Undefined, presumably. What reason would there be to take time out of your day to press a pointless button?
It doesn't communicate anything other than that you pressed a button. For someone participating in good faith, that doesn't add any value. But for those not participating in good faith, i.e. trolls, it adds incredible value to know that their trolling is being seen. So it is actually a net negative to the community if you did somehow accidentally press one of those buttons.
For those who seek fidget toys, there are better devices for that.
Like when someone says GUIs are better than CLIs, or C++ is better than Rust, or you don't need microservices, you can just hide that inconvenient truth from the masses.
Take the C++ example above: you are likely to be downvoted for supporting C++ over Rust, and therefore most people reading through the comments (and LLMs correlating comment “karma” with how liked a comment is) will generally conclude Rust > C++, which isn’t a nuanced opinion at all and IMHO is just plain wrong a decent amount of the time. They are tools and have their uses.
So generally it shows the sentiment of the group, and humans are conditioned to follow the group.
All it can fundamentally serve is to act as an impoverished man's read receipt. And why would you want to give trolls that information? Fishing to find out if anyone is reading what they're posting is their whole game. Do not feed the trolls, as they say.
I've probably written over 50k words on here and was wondering if I could restructure my best comments into a long meta-commentary on what does well here and what I've learned about what the audience likes and dislikes.
(HN does not like jokes, but you can get away with it if you also include an explanation)
I wrote a Puppeteer script to export my own data that isn't public (upvotes, downvotes, etc.)
If you click the api link bottom of page it’ll explain.
I had to CTRL-C and resume a few times when it stalled; it might be a bug in my tool
Shouldn't that be The Fall Of Rust? According to this, it saw the most attention during the years before it was created!
[1]: https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PUBNOTES/ATL-...
The area occupied by each color is basically meaningless, though, because of the logarithmic y-scale. It always looks like there's way more of whatever you put on the bottom. And obviously you can grow it without bound: if you move the lower y-limit to 1e-20, the whole plot is dominated by whatever is on the bottom.

For the record, I think it's a terrible convention; it just somehow became standard in some fields.
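The effect is easy to quantify: on a log-scaled axis, a band's on-screen height is proportional to the difference of the logs of its bounds, so the share of the plot the bottom band occupies is purely a function of where you place the axis floor. A small sketch (the limits here are arbitrary):

```python
import math

def band_height_fraction(y_lo, y_hi, y_min_axis, y_max_axis):
    """Fraction of the plot height that the band [y_lo, y_hi] occupies
    on a log10-scaled y axis spanning [y_min_axis, y_max_axis]."""
    span = math.log10(y_max_axis) - math.log10(y_min_axis)
    return (math.log10(y_hi) - math.log10(y_lo)) / span

# Same band (from the axis floor up to 1), two different floors:
print(band_height_fraction(1e-3, 1, 1e-3, 1e3))    # 0.5
print(band_height_fraction(1e-20, 1, 1e-20, 1e3))  # ~0.87
```

Moving the floor from 1e-3 to 1e-20 inflates the bottom band from half the plot to nearly all of it, without any change in the data.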
I'm actually surprised at that volume, given this is a text-only site. Humans have managed to post over 20 billion bytes of text to it over the 18 years that HN has existed? That averages to around 3MB per day, or about 35 bytes per second.
Also, I bet a decent amount of that is not from humans. /newest is full of bot spam.
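The rate arithmetic (taking roughly 20 billion bytes over roughly 18 years) is easy to check:

```python
SECONDS_PER_DAY = 86_400
DAYS = 18 * 365           # ~18 years of HN
TOTAL_BYTES = 20e9        # ~20 GB of text

per_day = TOTAL_BYTES / DAYS
per_second = per_day / SECONDS_PER_DAY
print(round(per_day / 1e6, 1))  # ~3.0 (MB/day)
print(round(per_second))        # ~35 (bytes/s)
```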
$ du -c ~/feepsearch-prod/datasource/hacker-news/data/dump/*.jsonl | tail -n1
19428360 total
Not sure how your sqlite file is structured but my intuition is that the sizes being roughly the same sounds plausible: JSON has a lot of overhead from redundant structure and ASCII-formatted values; but sqlite has indexes, btrees, ptrmaps, overflow pages, freelists, and so on.[0] Well, ChatGPT did the math but it seems to check out: https://chatgpt.com/share/68124afc-c914-800b-8647-74e7dc4f21...
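As a toy illustration of the JSON side of that overhead (a made-up record shape, not the actual dump's schema): the same three integer fields cost 50 bytes as JSON text but only 20 bytes packed as binary.

```python
import json
import struct

# Hypothetical HN-item-like record: field names and ASCII digits
# dominate the JSON encoding; the packed form keeps only the values.
record = {"id": 43000000, "time": 1700000000, "score": 123}

as_json = json.dumps(record).encode()
as_packed = struct.pack("<qqi", record["id"], record["time"], record["score"])

print(len(as_json))    # 50 bytes
print(len(as_packed))  # 20 bytes (two int64 + one int32)
```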
I did some modelling of how much contributed text data there was on Google+ as that site was shutting down in 2019.
By "text data", I'm excluding both media (images, audio, video), and all the extraneous page throw-weight (HTML scaffolding, CSS, JS).
Given the very low participation rates, and finding that posts on average ran about 120 characters (I strongly suspect that much activity was part of a Twitter-oriented social strategy, though it's possible that SocMed posts just trend short), seven years of history from a few tens of millions of active accounts (out of > 4 billion registered profiles) only amounted to a few GiB.
This has a bearing on a few other aspects:
- The Archive Team (AT, working with, but unaffiliated with, the Internet Archive, IA) was engaged in an archival effort aimed at G+. That had mixed success: much content was archived, but a great deal wasn't, and very few comments survive. Threads were curtailed to the most recent ten or so; absent search, the archive remains fairly useless; and those with "vanity accounts" (based on a selected account name rather than a random hash) prove to be even less accessible. In addition, by scraping full pages and attempting to present the site as it appeared online, AT/IA committed to a tremendous increase in stored-data requirements whilst missing much of what actually made the site interesting.
- Those interested in storing text contributions of even large populations face very modest storage requirements. If, say, average online time is 45 minutes daily, typing speed is 45 wpm, and only half of online time is spent writing vs. reading, that's roughly 1,000 words/(person*day), or about 6 KiB/(person*day). That's 6 MiB per 1,000 people, 6 GiB per 1 million, 6 TiB per billion. And the true values are almost certainly far lower: I'm pretty certain I've overstated writing time (it's likely closer to 10%) and typing speed (typing on mobile is likely closer to 20--30 wpm, if that). E.g., Facebook sees about 2.45 billion "pieces of content" posted per day, of which half is video. If we assume 120 characters (bytes) per post, that's a surprisingly modest amount, substantially less than 300 GiB/day of text data. (Images, audio, and video will of course inflate that markedly.)
- The amount of non-entered data (e.g., location, video, online interactions, commerce) is the bulk of current data collection / surveillance state & capitalism systems.
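A back-of-the-envelope sketch of the per-person storage estimate above (all inputs are the stated assumptions, not measurements):

```python
WPM = 45                      # assumed typing speed
WRITING_MINUTES = 45 * 0.5    # half of 45 online minutes/day spent writing
BYTES_PER_WORD = 6            # ~5 characters plus a space

bytes_per_person_day = WPM * WRITING_MINUTES * BYTES_PER_WORD

print(bytes_per_person_day / 1024)         # ~5.9 KiB per person per day
print(bytes_per_person_day * 1e6 / 2**30)  # ~5.7 GiB/day per million people
print(bytes_per_person_day * 1e9 / 2**40)  # ~5.5 TiB/day per billion people
```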
- BigQuery, (requires Google Cloud account, querying will be free tier I'd guess) — `bigquery-public-data.hacker_news.full`
- ClickHouse, no signup needed, can run queries in browser directly, [1]
[1] https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
The author said this in jest, but I fear someone, someday, will try this; I hope it never happens but if it does, could we stop it?
With long and substantive comments, sure, you can usually tell, though much less so now than a year or two ago. With short, 1 to 2 sentence comments though? I think LLMs are good enough to pass as humans by now.
AI actively tears us apart. We no longer know if we're talking to a human, or if an artist's work came from their own ability, or if we will continue to have a job to pay for our living necessities.
Eh yes, but also debatable.
It brings you together if you follow their rules, and excommunicates you to the darkness if you do not. It is a complicated set of controls from the times before the rules of society were well codified.
And the job? I’ve been laid off several times in my career. You never know if you will have a job tomorrow or not.
AI has changed none of this, it is only exposing these problems to more people than before because it makes these things easier. It also makes good works easier, but I don’t think it cheapens that work if the person had the capability in the first place.
In essence, we have the same problems we had before and now we are socially forced to deal with them a bit more head-on. I don’t think it’s a bad thing though. We needed to deal with this at some point anyway.
Do we accuse everybody of being an LLM? Will most threads devolve into "you're an LLM, no you're the LLM" wars? Will this give an edge to non-native English speakers, because grammatical errors are an obvious tell that somebody is human? Will LLM makers get over their squeamishness and make "write like a Mexican who barely speaks English" a prompt that works and produces good results?
Maybe the whole system of anonymity on the internet gets dismantled (perhaps after uncovering a few successful llm-powered psy-ops or under the guise of child safety laws), and everybody just needs to verify their identity everywhere (or login with Google)? Maybe browser makers introduce an API to do this as anonymously and frictionlessly as possible, and it becomes the new normal without much fuss? Is turnstile ever going to get good enough to make this whole issue moot?
I think we have a very interesting few years in front of us.
Buried at the bottom of the thread was a helpful reply by an obvious LLM account that answered the original question far better than any of the other comments.
I'm still not sure if that's amazing or terrifying.
- ID/proof of human verification. Scan your ID, give me your phone number, rotate your head around while holding up a piece of paper, etc. Note that some sites already do this by proxy when they whitelist something like 5 big email providers they accept for a new account.
- Going invite only. Self explanatory and works quite well to prevent spam, but limits growth. lobste.rs and private trackers come to mind as an example.
- Playing whack-a-mole with spammers (and losing eventually). 4chan does this by requiring you to solve a captcha and to pass the Cloudflare turnstile, which may or may not do some browser fingerprinting/bot detection. CF is probably pretty good at de-anonymizing you through this process too.
All options sound pretty grim to me. I'm not looking forward to the AI spam era of the internet.
Of course this goes against the interests of tracking/spying industry and increasingly authoritarian governments, so it's unlikely to ever happen.
https://support.apple.com/en-us/102591
https://blog.cloudflare.com/eliminating-captchas-on-iphones-...
The weak link is in the ID servers themselves. What happens if the servers go down, or if they refuse to issue keys? Think a government ID server refusing to issue keys for a specific person. Pages that only accept keys from these government ID servers, or that are forced to only accept those keys, would be inaccessible to these people. The right to ID would have to be enshrined into law.
This verification mechanism must include some sort of UUID to rein in a single bad actor who happens to validate his/her bot farm of 10,000 accounts from the same certificate.
See also my other comment on the same parent wrt networks of trust. That could perhaps vet out spammers and trolls. On one hand it seems far-fetched and a quite underdeveloped idea; on the other hand, social interaction (including discussions like these) as we know it is in serious danger.
You'd need to have a permanent captcha that tracks that the actions you perform are human-like, such as mouse movement or scrolling on a phone, etc. And even then it would only deter current AI bots, and not for long, as impersonating human behavior would be a 'fun' challenge to break.
Trusted relationships are only as trustworthy as the humans trusting each other, eventually someone would break that trust and afterwards it would be bots trusting bots.
Due to bots already filling up social media with their spew and that being used for training other bots the only way I see this resolving itself is by eventually everything becoming nonsensical and I predict we aren't that far from it happening. AI will eat itself.
Correct. But for curbing AI slop comments this is enough imo. As of writing this, you can quite easily spot LLM generated comments and ban them. If you have a verification system in place then you banned the human too, meaning you put a stop to their spamming.
Perhaps I am jaded, but most if not all people regurgitate opinions on topics without thought or reason along very predictable paths, myself very much included. You can mention a single word covered with a muleta (the Spanish bullfighter's cape) and the average person will happily run at it and give you a predictable response.
I see the exact same in others. There are some HN usernames that I have memorized because they show up deterministically in these threads. Some are so determined it seems like a dedicated PR team, but I know better...
Lots of issues there to solve, privacy being one (the links don't have to be known to the users, but in a naive approach they are there on the server).
Paths of distrust could be added as negative weight, so I can distrust people directly or indirectly (based on the accounts that they trust) and that lowers the trust value of the chain(s) that link me to them.
Because it's a network, it can adjust itself to people trying to game the system, but it remains a question to how robust it will be.
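As a toy sketch of the idea (all names and weights are invented, and a real system would need to handle cycles, scale, and gaming far more carefully): trust decays multiplicatively along chains, and a negative edge anywhere flips a chain toward distrust.

```python
# Edge weight in (0, 1] means trust; a negative weight means distrust.
EDGES = {
    "me":    {"alice": 0.9, "mallory": -0.8},
    "alice": {"bob": 0.8},
    "bob":   {"carol": 0.7},
}

def trust(source, target, depth=3):
    """Best-path trust: multiply weights along chains so trust decays
    with distance; a negative edge flips the whole chain's sign.
    The depth limit bounds chain length (and terminates on cycles)."""
    if source == target:
        return 1.0
    if depth == 0:
        return 0.0
    scores = [w * trust(nxt, target, depth - 1)
              for nxt, w in EDGES.get(source, {}).items()]
    # Keep the strongest signal either way (most positive or most negative).
    return max(scores, key=abs, default=0.0)

print(round(trust("me", "carol"), 3))  # 0.504 via me -> alice -> bob -> carol
print(trust("me", "mallory"))          # -0.8, direct distrust
```

How robust the scheme stays under adversarial edge creation is exactly the open question; this only illustrates the propagation mechanics.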
For a mix of ideological reasons and a lack of genuine interest in the internet from legislators (mainly due to the generational factor, I'd guess), it hasn't happened yet, but I expect government-issued equivalents of IDs and passports for the internet to become mainstream sooner rather than later.
I don’t think that really follows. Business credit bureaus and Dun & Bradstreet have been privately enabling trust between non-familiar parties for quite a long time. Various networks of merchants did the same in the Middle Ages.
Under the supervision of the State (they are regulated and rely on the justice and police system to make things work).
> Various networks of merchants did the same in the Middle Ages.
They did, and because there was no State, the amount of trust they could build was fairly limited compared to what has later been made possible by the development of modern states (the Industrial Revolution appearing in the UK has partly been attributed to the institutional framework that existed there early).
Private actors can, do, and always have built their own makeshift trust networks, but building a society-wide trust network is a key pillar of what makes modern states “States” (and it derives directly from the “monopoly of violence”).
States will often co-opt existing trust networks as a way to enhance and maintain their legitimacy, as with Constantine’s adoption of Christianity to preserve social cohesion in the Roman Empire, or all the compromises that led the 13 original colonies to ratify the U.S. constitution in the wake of the American Revolution. But violence comes first, then statehood, then trust.
Attempts to legislate trust don’t really work. Trust is an emotion; it operates person-to-person, and saying “oh, you need to trust such-and-such” doesn’t really work unless you are trusted yourself.
I'm not saying otherwise (I've even referred to this in a later comment).
> But violence comes first, then statehood, then trust.
Nobody said anything about the historical process so you're not contradicting anyone.
> Attempts to legislate trust don’t really work
Quite the opposite, it works very, very well. Civil laws and jurisdiction over contracts have existed since the Roman Republic, and every society has some equivalent (you should read about how the Taliban could get back to power so quickly in large part because they kept administering civil justice in rural Afghan society even while the country was occupied by the US coalition).
You must have institutions to be sure that the other party is going to respect the contract, so that you don't have to trust them; you just need to trust that the state will enforce the contract (which it can do because it has the monopoly of violence and can force the party violating the contract into submission).
With the monopoly of violence comes the responsibility to use that violence to enforce contracts; otherwise social structures are going to collapse (and someone else is going to take that job from you, and gone is your monopoly of violence).
I like the idea of one's trust leveraging that of those around them. This may make it more feasible to ask some 'effort' for the trust gain (as a means to discourage duplicate 'personas' for a single human), as that can ripple outward.
How are individuals in the network linked? Just comments on comments? Or something different?
Frankly I don't trust my friends of friends of friends not to add thirst trap bots.
TLS (or more accurately, the set of browser-trusted X.509 root CAs) is extremely hierarchical and all-or-nothing.
The PGP web of trust is non-hierarchical and decentralized (from an organizational point of view). That unfortunately makes it both more complex and less predictable, which I suppose is why it “lost” (not that it’s actually gone, but I personally have about one or maybe two trusted, non-expired keys left in my keyring).
The fact that the Spanish mint can mint (pun!) certificates for any domain is unfortunate.
Hopefully, any abuse would be noticed quickly and rights revoked.
It would maybe have made more sense for each country’s TLD to have one or more associated CA (with the ability to delegate trust among friendly countries if desired).
At least they seem to have kicked out the Russian ones now. But it's weird that such an important decision lies with arbitrary companies like OS and browser developers. On some platforms (Android) it's not even possible to add to the system CA list without root (only the user one which apps can choose to ignore)
I have nothing intrinsically against people that 'will click absolutely anything for a free iPad', but I wouldn't mind removing them from my online interactions if that also removes bots, trolls, spammers and propaganda.
I also want something like this for a lightweight social media experience. I’ve been off of the big platforms for years now, but really want a way to share life updates and photos with a group of trusted friends and family.
The more hostile the platforms become, the more viable I think something like this will become, because more and more people are frustrated and willing to put in some work to regain some control of their online experience.
Meta and X have glommed them all together and made them unworkable with opaque algorithmic control, to the detriment of all of them.
And then you have all of them colonised by ad tech, which distorts their operation.
Just in case I need to check for plagiarism.
I don't have enough Vram nor enough time to do anything useful on my personal computer. And yes I wrote vram like that to pothole any EE.
> Hideo Kojima's ambitious script in Metal Gear Solid 2 has been praised, some calling it the first example of a postmodern video game, while others have argued that it anticipated concepts such as post-truth politics, fake news, echo chambers and alternative facts.
Even if someone managed to keep a few AI-driven accounts alive, the marginal cost is high. Running inference on dozens of fresh threads 24/7 isn’t free, and keeping the output from slipping into generic SEO sludge is surprisingly hard. (Ask anyone who’s tried to use ChatGPT to farm karma—it reeks after a couple of posts.) Meanwhile the payoff is basically zero: you can’t monetize HN traffic, and karma is a lousy currency for bot-herders.
Could we stop a determined bad actor with resources? Probably, but the countermeasures would look the same as they do now: aggressive rate-limits, harsher newbie caps, human mod review, maybe some stylometry. That’s annoying for legit newcomers but not fatal. At the end of the day HN survives because humans here actually want to read other humans. As soon as commenters start sounding like a stochastic parrot, readers will tune out or flag, and the bots will be talking to themselves.
Written by GPT-3o
Maybe that'll be a use case for blockchain tech. See the whole posting history of the account on-chain.
We can still take the mathematical approach: any argument can be analyzed for logical self-consistency, and if it fails this basic test, reject it.
Then we can take the evidentiary approach: if any argument that relies on physical real-word evidence is not supported by well-curated, transparent, verifiable data then it should also be rejected.
Conclusion: finding reliable information online is a needle-in-a-haystack problem. This puts a premium on devising ways (eg a magnet for the needle) to filter the sewer for nuggets of gold.
https://console.cloud.google.com/marketplace/product/y-combi...
It would have been nice to coordinate that with you, though.
Was feeling pretty pleased with myself until I realised that all I’d done was teach an innocent machine about wanking and divorce. Felt like that bit in a sci-fi movie where the alien/super-intelligent AI speed-watches humanity’s history and decides we’re not worth saving after all.
Is it not still good to be exposed to the experiences of others, even if one cannot experience these things themself?
So "normalize divorce" is pretty backward when what we should be doing is normalizing making sure you're marrying the right person.
Normalize divorce and stop stigmatizing it by calling it bad or evil.
>Making sure you are marrying the right person is normalized.
Absolutely not.
I live in the southern US and we have the combination of "young people should get married" coupled with "divorce is bad/evil", plus the disincentivization of actually learning about human behaviors/complications before going through something that could be traumatic.
There are a lot of relationships that from an outside and balanced perspective give all the signs they will not work out and will be potentially dangerous for one or both partners in the relationship.
This is good for you, but many people do come out of their marriages much worse off in various ways
> Normalize divorce and stop stigmatizing it by calling it bad or evil
It's not bad or evil, but let's also not pretend that it isn't damaging
The alternative to divorce isn't perfect marriages, it is failed marriages that are inescapable.
I'm sure this has nothing to do with you, but your comments in this thread remind me of a conversation I had with a friend on a bus one day. We were talking about the unfortunate tendency, these days, of people to shuffle their elderly parents off to nursing homes, rather than to support said parents in some sort of independent living. A nearby passenger jumped into our conversation to argue that there are situations in which the nursing home is for the best. Although we agreed with him, he seemed to dislike the fundamental idea of caring for one's elderly parents at all, and subsequently became quite heated.
It's a false dichotomy that a jurisdiction must either allow instant no-fault divorce for everyone who petitions for it, or allow none at all.
> Usually one or both parties know the consequences of the divorce and prefer them to the state of the marriage, because the damages are less than if divorce wasn't an option.
Sometimes both parties are reasonably rational and honest and non-adversarial, then again sometimes one or both aren't, and it only takes one party (or their relatives) to make things adversarial. If you as a member of the public want to see it in action, in general you can sit in and observe proceedings in your local courthouse in person, or view the docket of that day's cases, or view the local court calendar online. Often the judge and counsel strongly affect the outcome too, much more than the facts at issue.
> Claiming divorce is some kind of undesirable 'damaged' state is just as stigmatizing as claiming it is 'bad' or 'evil'.
It is not necessarily the end-state of being divorced that is objectively quantifiably the most damaging to both parties' finances, wellness, children, and society at large, it's the expensive non-transparent ordeal of family court itself that can cause damage, as much as (or sometimes more than) the end-state of ending up divorced. Or both. Or neither.
> The alternative to divorce is...
...a less broken set of divorce laws, for which there are multiple viable candidates [0]. Or indeed, marriage (or cohabitation, or relationships) continuing to fall out of favor. As marriage becomes less and less common, we're losing the ability to form a quantified picture of when partnerships and relationships start or end, beyond measuring crude divorce rates against crude marriage rates (assuming the same jurisdiction, correcting for the offset by the estimated average length of marriage, and assuming zero internal migration). Many countries' censuses no longer track this, or are being pressured to stop tracking it [1]. It could be inferred from e.g. bank, insurance, and household bill arrangements, credit information, and public records, but obviously privacy needs to be respected.
[0] https://en.wikipedia.org/wiki/Divorce_law_by_country
[1] https://www.pewresearch.org/short-reads/2015/05/11/census-bu...
It's pretty easy to create strawmen arguments and argue against those instead of what people actually say, but it makes for at best boring and at worst confusing reading.
It’s not any more damaging than getting married in some cases, or staying married.
Marriage is not some sacred thing to be treasured. It CAN be, but it isn’t inherently good. Inherently, marriage is a legal arrangement: being married changes how taxes, confidential medical information, and death are handled, and that’s about it. Every meaning or significance beyond those legal things is up to the happy couple, including how, if, and when to end the marriage.
The comparison can't be to an imaginary world where everyone always picks the best partner. It has to be to the real world where people don't always pick the best partner and the absence of divorce means they're stuck with them.
NYT Gift Article: https://www.nytimes.com/2016/05/29/opinion/sunday/why-you-wi...
Thanks for sharing the link.
[0] https://www.youtube.com/watch?v=-EvvPZFdjyk 22 minutes
[1] https://www.youtube.com/watch?v=zuKV2DI9-Jg 4 minutes
Marriage should be less artificially loaded with meaning, and divorce should not be stigmatized. Instead, people divorcing when they notice it is not working should be applauded for looking out for their own health.
At the same time people also should learn how to make relationships in general work.
> At the same time people also should learn how to make relationships in general work.
And most importantly, knowing when to do the one or the other.
I think the idea that divorce is bad comes from religion, which would end up having to care for abandoned kids (especially when contraception didn't exist, so having kids wasn't as much of a choice).
I don't really hear it so much here in Europe except from very religious people. Most people are totally ok with divorce, many aren't even married (I myself never married and I had a gf for 12 years from a Catholic family who also didn't mind at all) and a lot of them are even polyamorous :) I have a feeling that would not go down so well in rural America.
Let's say you discovered a pendrive from a long-lost civilization and trained a model on that text data. How would you or the model know that the pendrive contained data on wanking and divorce without any kind of external grounding for that data?
This does not answer the general abstract question of how semantics is possible through relative context (relations to other terms) alone, but it speaks to how different modalities of information (e.g. visual data vs. sound data) are likewise represented, modelled, and processed using different neural structures, which presumably encode different aspects of information (a layman's obvious guess: temporality, relation across the time axis, matters much more for sound data).
tl;dr what you think of as "grounding" is just yet more relative context...
But any GDPR requests for info and deletion in your inbox, yet?
https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
Edit: or make a non-stacked version?
I like Tableau Public, because it allows for interactivity and exploration, but it can't handle this many rows of data.
Is there a good tool for making charts directly from Clickhouse data?
NVM, this approach of going item by item would take 460 days if the average request response time is 1 second (unless heavily parallelized; 500 instances _could_ do it in a day, but that's 40 million requests either way, so it would raise alarms).
For when the apocalypse happens it’ll be enjoyable to read relatively high quality interactions and some of them may include useful post-apoc tidbits!
One of the advantages of comments is that there's simply so much more text to work with. A front-page title offers at most 80 characters of context (often deliberately obtuse), plus metadata (date, story position, votes, site, submitter).
I'd initially embarked on the project to find out what cities were mentioned most often on HN (in front-page titles), though it turned out to be a much more interesting project than I'd anticipated.
(I've somewhat neglected it for a while though I'll occasionally spin it up to check on questions or ideas.)
I used a function based on age for staleness: it considers things stale after a minute or two initially, and immutable after about two weeks.
https://github.com/jasonthorsness/unlurker/blob/main/hn/core...
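A sketch of a rule with that shape, in Python (illustrative only, not the actual function from the linked repo; the constants are assumptions):

```python
# Young items change often, so refresh them quickly; items older than
# about two weeks are treated as immutable and never refetched.
TWO_WEEKS = 14 * 24 * 3600

def refresh_after(age_seconds):
    """Seconds until a cached item of this age should be refetched;
    None means the item is considered immutable."""
    if age_seconds >= TWO_WEEKS:
        return None
    # Refresh interval grows with age: ~60s for brand-new items,
    # stretching out as the item gets older.
    return max(60, age_seconds // 10)

print(refresh_after(120))        # 60   (a two-minute-old item)
print(refresh_after(3600))       # 360  (an hour-old item)
print(refresh_after(TWO_WEEKS))  # None (immutable)
```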