Also I'm surprised Cloudflare hasn't shut them down like they do for other dodgy sites.
Error HTTP 451 Unavailable For Legal Reasons
In response to a legal order, Cloudflare has taken steps to limit access to this website through Cloudflare's pass-through security and CDN services within Belgium
I used to get archive.org blocked and had to contact my provider to have the filters taken off.
By comparison, on my work network (TalkTalk) I can resolve the domain but I get a connection reset from the site.
I think this might be the first time I've hit a DNS block. It feels rather eerie seeing people talking about a site that, from my point of view, doesn't even exist...
There are a lot of legitimate reasons to want to use scraping sites that UK copyright law is not nuanced enough to protect, and so blanket bans just end up emerging at the demands of copyright owners (which more often than not, means Disney or Springer).
8.8.8.8, 9.9.9.9, and many others exist.
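For anyone trying to work out which kind of block they are hitting: the three failure modes described in this thread (a name that does not resolve, a connection reset after the name resolves, and an HTTP 451 from the CDN) surface as different exceptions in Python's standard library. A minimal sketch, assuming plain `urllib`; the names `classify_block` and `probe` are made up for illustration, and real filtering can look different (for example, a resolver that returns a bogus address instead of failing).

```python
import socket
import urllib.error
import urllib.request


def classify_block(error):
    """Map an exception raised while fetching a URL to a rough block type."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(error, urllib.error.HTTPError):
        if error.code == 451:
            return "legal block (HTTP 451)"
        return f"HTTP error {error.code}"
    # urllib wraps low-level socket failures in URLError; unwrap the cause.
    if isinstance(error, urllib.error.URLError) and isinstance(error.reason, BaseException):
        error = error.reason
    if isinstance(error, socket.gaierror):
        return "DNS failure (name does not resolve)"
    if isinstance(error, ConnectionResetError):
        return "connection reset (name resolves, traffic is cut)"
    return "other failure"


def probe(url, timeout=10):
    """Try to fetch `url` and describe what happened (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "reachable"
    except OSError as error:  # URLError and HTTPError are OSError subclasses
        return classify_block(error)
```

Running `probe(...)` on the same URL from two networks, or before and after switching the system resolver to one of the public ones mentioned above, should show whether the block sits at the DNS level or further along the path.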
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...
- DDOS attacks
- Spamming
- UK-like surveillance laws
- LLM scraping
Why is it that there is almost no initiative for this?
The easiest way to mitigate those problems would be to decrease openness and centralize more. That might lead to even worse things than DDOS.
- DRM.
- Owner-unfriendly device locks (such as manufacturer-controlled secure boot or locked-down OSes).
- Inability to audit network traffic from one's own devices, e.g. an IoT device.
- Remote attestation, when in opposition to open computing.
I could also see folks seeing the use of cryptography as "having something to hide" - I don't personally agree.
Proof of work and micropayments (eg. Xanadu or Internet Mail 2000) schemes solve spamming and LLM scraping, but are more expensive or more CPU-intensive.
P2P systems like FreeNet too, but they are harder to use and more storage intensive and make it easier to spy on individual users.
Tor solves UK-like surveillance laws but it's slower and makes it easier to spam.
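To make the "more CPU-intensive" cost of proof of work concrete, here is a rough hashcash-style sketch. It is an assumption-laden illustration, not the scheme used by Xanadu, Internet Mail 2000, or any real system: the sender must find a nonce such that the SHA-256 hash of the message plus nonce starts with a given number of zero bits, while the receiver checks it with a single hash. The function names and the `message:nonce` encoding are invented for this example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(message: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap check for the receiver: one hash per message."""
    digest = hashlib.sha256(message + b":" + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(message: bytes, difficulty: int) -> int:
    """Expensive search for the sender: about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(message, nonce, difficulty):
            return nonce
```

Each extra bit of difficulty doubles the sender's expected work while the receiver's cost stays constant, which is the asymmetry that makes bulk spamming or scraping expensive. It also makes every legitimate request slower, which is the tradeoff described above.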
So see, there are initiatives, but people treat it as a joke, maybe because of when it was released.
Even if it's decentralised, it'll be banned one way or another and you'll be hunted down.
(Not to mention the astronomical technical work it would be; you can't just replace "The Entire Internet")
The tweet only names Meta, but it would be very surprising if OpenAI didn't do the same thing.
Either this practice is judged (or legislated) to be fair use, or copyright is done. It's also that simple.
Copyright law exists for a reason. Trying to improve an LLM doesn't give you the right to flout our legal system. Yes, other countries might have an advantage in LLM training as a result but so be it.
If it's judged as fair use, then yes. And then it's not flouting anything.
Remember the whole point of fair use is to benefit society by allowing reuse of material in ways that don't directly copy large portions of the material verbatim.
For example, nonfiction authors already "just take it" when reviews describe the main points of their book without paying them a cent. The justification is that it's for the greater good, and rights are limited.
How do you think masked language models work?
[1] https://www.404media.co/judge-rules-training-ai-on-authors-b...
I'll stop you right there - I really don't think that applies at all. Does 'society' really benefit when the whole thing is a funnel for enormous amounts of wealth to go to already-gigantic companies like Microsoft?
If you don't like it, there's a process for changing how it works, but don't expect an easy path to success. Various people will object, and will have to be won over to your way of thinking.
That's a rather bastardized and twisted representation of copyright and fair use.
The "whole point" of copyright was to promote the authorship of original creative works by legally protecting the financial income of those authors. The "whole point" of fair use was to make exceptions in cases where it's clear that the usage doesn't result in a market substitute and deprive original authors of their income.
The end-goal of LLMs is to ingest all of that original content and reproduce it with expert-level accuracy, promising to be the know-all, end-all product. If the wildly optimistic predictions of LLM proponents turn out to be correct, then people will never buy a book again; they will have no reason to. And this is precisely what copyright was designed to protect authors against.
That phrase is carrying a lot of water, isn't it? Trillions of dollars worth by some estimates.
I'll ignore the legality aspects in my response. I think coming up with a representative sample of all relevant information would be better in the long term (teams will not be outcompeted on long time horizons). Why don't the companies do this? Because it is easier to just "carpet bomb the parameter space" and worry about the potential confounding [1] and sampling bias [2] later. Coming up with a representative sample requires domain expertise and that is expensive in terms of time and money. But it reduces the total amount of training data and should reduce the amount of time and resources it takes to build the models. That may matter now that models are quite large.
This is definitely a design decision with tradeoffs on both sides. I can entertain the notion that we don't have time to sample things, but I think we are all too often dismissing the long-term benefits of proper sampling.
(In terms of the legality aspects, judges are trying to "split the baby" [3] in my opinion by saying that training on stuff you got legally is OK but training on pirated material isn't. So nobody is going to recommend training on pirated material in the first place.)
[1] https://en.wikipedia.org/wiki/Confounding
[2] https://en.wikipedia.org/wiki/Sampling_bias
[3] https://www.404media.co/judge-rules-training-ai-on-authors-b...
Meta managed to get into a private ebook torrent tracker called Bibliotik a few years ago to use for training Llama and the resulting publicity essentially killed the tracker.
Infinite love to the team <3
Everyone involved is taking on significant personal liability and hosting expenses. Not sure what more you expect.
"Information wants to be free," which means that any cost of producing that information can be abstracted away due to ideological inconvenience.
Out of a $20 book, the author earns about $1 - $1.5; for e-books it's about $1.7 - $2.
The value from book sales goes to the retailer and the publisher: two large corporations, and in the case of Amazon, a single big corporation.
So please, cry me a river about Amazon's lost profits, earned on the backs of book authors.
~25% VAT and then the publishers and retailers take their cut. The government takes another 40% in income and payroll taxes from that. The leftovers are what the author gets.
Buying from yourself is probably the biggest markup you can get.
It really might be better to publish for free and set up a Buy Me a Coffee page.
The real one has been down for a long time.
Last but not least?
About recent events.
We are still alive and kicking. In recent weeks we’ve seen increased attacks on our mission. We are taking steps to harden our infrastructure and operational security. The work of securing humanity’s legacy is worth fighting for.
Since we started in 2022, we have liberated tens of millions of books, scientific articles, magazines, newspapers, and more. These are now forever protected from destruction by natural disasters, wars, budget cuts, and other catastrophes, thanks to everyone who helps with torrenting.
Anna’s Archive itself has organized some of the largest scrapes: we acquired tens of millions of files from IA Controlled Digital Lending, HathiTrust, DuXiu, and many more.
We have also scraped and published the largest book metadata collections in history: WorldCat, Google Books, and others. With this we’ll be able to identify which books are still missing from our collections, and prioritize saving the rarest ones.
Much thanks to all of our volunteers for making these projects happen.
We’ve forged some incredible partnerships. We’ve partnered with two LibGen forks, STC/Nexus, Z-Library. We’ve secured tens of millions additional files through these partnerships. And they are helping the mission by mirroring our files.
Unfortunately we have seen the disappearance of one of the LibGen forks. We don’t have further information about what happened there, but are saddened by this development.
There is a new entrant: WeLib. They appear to have mirrored most of our collection, and use a fork of our codebase. We have copied some of their user interface improvements, and are grateful for that push. Sadly, we are not seeing them share any new collections, nor share their codebase improvements. Since they haven’t shown commitment to contributing back to the ecosystem, we advise extreme caution. We recommend not using them.
In the meantime, we have some exciting projects in the works. We have hundreds of terabytes in new collections sitting on our servers, waiting to be processed. If you’re at all interested in helping out, feel free to check out our Volunteering and Donate pages. We run all of this on a minimal budget, so any help is greatly appreciated.
Keep fighting.
Apologies for the minor grumble, but on mobile I used to be able to browse search results much more effectively; the new design only fits ~4-5 results on a screen.
I've been using WeLib since April and had a good experience so far
Otherwise, please explain how I am missing your point.
That's an odd combination.
That is an odd demand for a site that thrives on piracy. Don't steal from the thieves? When you take from others it's liberation, when others take from you it's parasitic; that's certainly a convenient coincidence.
I'd assume there are many people who don't help out purely because of legal fears, something i2p could help with.
* ability to fund shadow libraries without fear of censorship
* lists with a single item still count as lists
Not really helping in the big picture, here, guys.
They are even offering decent bounties: https://software.annas-archive.li/AnnaArchivist/annas-archiv...
Whoever is running it must be doing really well for themselves laundering all that crypto.
Also interestingly they don't offer a tor onion service, while the admin is most certainly technically competent to administer one given that he no doubt uses tor to insulate himself from his enterprise and launder crypto. What is the reasoning for that?
Obviously, since Anna's Archive is breaking the law, it can't conform itself to the normal legal/regulatory system that governs non-profit organizations. It can certainly still claim to be acting in the spirit of a non-profit, and it's up to you to decide whether you trust that claim. Nobody's forcing you to give them money.
I really, really don't think that anybody is being fooled or misled into thinking that Anna's Archive is a "legitimate" audited organization when they describe themselves as a non-profit.
This is very geography-specific. In the US, 501(c)(3)s (what most people think of when they say "non-profit" where I am) have no general requirement for audits. There's also plenty of non-profit-by-some-definition organizations that never file a Form 1023, giving up some benefits of the 501(c)(3) regulations but in exchange being even less regulated.
The primary difference between a non-profit and a for-profit is that a non-profit does not distribute profit to shareholders, including the founders.
A non-profit is a corporate legal structure. An unregistered organization could be a cabal, a gang, a syndicate, a fellowship, a religion, a movement, a private club, or something else.
I would love to see someone try to explain to the IRS why all those purchases of Amazon gift cards and Monero for the transparently illegal organization should be deductible though
The use of crypto is entirely one of necessity, as controlling information and knowledge is something powerful people have clear stakes in. Many countries wield their financial systems to hold or acquire power. Information and knowledge are one form of such power.
Everything points to the Anna's Archive team being passionate ideologues as opposed to some criminal enterprise focused on profit motives.
Anonymous international fugitive?
> Nobody is getting rich off of donations.
How can anyone aside from the beneficiary know that?
The extent to which the controller can get rich off this enterprise depends entirely on the unknown quantity of donated funds (and deals with AI companies) and his skill at laundering crypto (which darknet marketplace controllers doing far more illegal stuff can do).
They're getting donations as much as megaupload was getting donations for premium accounts...
People pay for higher bandwidth and no wait time, not to support the "cause". It's a farce to call these donations.
And obviously people do get rich off of it, as you can see from the slew of file hosting services.
Thus, who gives a shit if they're taking money from those who voluntarily subscribe? They still offer an absolutely incredible free service to who knows how many people who otherwise wouldn't be able to afford so much access to so much free information.
Given the behavior of the pro-copyright business interests and legal bodies of the world, and the outright hypocrisy of openly creating one set of rules on content piracy for certain corporations while applying another, harsher rule system for those who aren't so nicely connected, smug moralizing about something like Anna's Archive has little grounding.
And aside from picking random crap out of your ass for smearing arbitrarily, what shred of evidence do you have of anyone there laundering crypto, and how?
The controller's freedom. If they didn't launder it they wouldn't be free.
> They still offer an absolutely incredible free service
Actually their free downloads aren't particularly good when compared to some of the other online services that 'leech' from them.
And their torrent strategy could be altruistic, but it could also be self-interested: it spreads storage costs around, attracts more contributions, and provides insurance against hard drive seizures.
What mainly interests me is how much money they are actually making, I suspect it's very profitable.
A pretty rich thing to say when your mission is piracy.
I'm not against piracy at all, quite the contrary, but this is quite laughable.
https://annas-archive.org/donate
I'll also say that when too much money starts becoming a part of this, trouble will increase dramatically. I realize this sort of endeavor costs a lot of time and money, but it's a line we should probably be aware of.
Some AI company techbros like this data trove even harder, and limit their pretending to publicly saying things like "we're changing the world" (and "AI could be bad if you don't give us money and lock out competitors") but really only care about wealth and power.
Certain sanctioned countries that culturally value literature and science might also appreciate this. (This last category, I'm much-much more sympathetic to, and wish them well in their intellectual pursuits and appreciation of the humanities, though we should really find a better way to share that doesn't undermine Western economies and many people's livelihoods.)
Moreover, the many AI companies now buying and pulping copies of every book in existence seem to be really changing the used book market. Prices are going up dramatically, and before this year it was very rare not to find a single copy anywhere in the world of whatever old book one desired.
As someone who spends a disproportionate amount on books and shares your concern for not making life even more difficult for authors, these services going away would be a tremendous regression.
And I am one of the best customers of these 3 physical shops in my town.
So sure, I don't buy the latest trends based on ads. I investigate a lot to buy GREAT stuff. Sometimes the shopkeeper has a headache finding the obscure stuff I discovered online that NOBODY knows exists.
Am I an exception?
I don't know but those services are great to maintain a freedom of choice.
Yes, I think you're an exception, sorry.
We will never have real data on this. But simply on its face, I find it extremely hard to believe that most consumers have a strong enough moral compass to go out of their way to buy something they already have access to. Maybe they will for a tiny handful of special books that they want hard copies of, or authors they really like, but not for most media they consume.
This type of system also becomes a popularity contest for creators; you are supporting the people you like as opposed to whose work you want to read. If an author says something you disagree with, it's easy to just read their work without paying them. I'm not against consumer boycotts, but it should generally come with a sacrifice on both sides--for consumers, that means missing out on the product or service.
You are free to feel however you want about this. I can certainly see the immense societal value of making things accessible to more people. But I flat out don't believe the "piracy doesn't lead to lost sales" shtick; of course it does.
From above:
'The Dutch firm Ecorys was commissioned to research the impact of piracy for several months, eventually submitting a 304-page report to the EU in May 2015. The report concluded that: “In general, the results do not show robust statistical evidence of displacement of sales by online copyright infringements. That does not necessarily mean that piracy has no effect but only that the statistical analysis does not prove with sufficient reliability that there is an effect.”
The report found that illegal downloads and streams can actually boost legal sales of games, according to the report. The only negative link the report found was with major blockbuster films: “The results show a displacement rate of 40 percent which means that for every ten recent top films watched illegally, four fewer films are consumed legally.”'
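As a quick sanity check on what a "displacement rate" means in that quote, the arithmetic can be written out. This is only a sketch of the definition as the quote states it (legal views lost per illegal view); the function name is invented here and nothing below comes from the report's actual methodology.

```python
def legal_views_lost(illegal_views: float, displacement_rate: float) -> float:
    """Legal consumption displaced by a given number of illegal views."""
    return illegal_views * displacement_rate


# The quoted figure for recent top films: a 40 percent displacement rate.
lost = legal_views_lost(illegal_views=10, displacement_rate=0.40)
print(f"{lost:.0f} fewer legal views per 10 illegal views")
```

By the same definition, a rate of zero would mean pirated copies replace no sales at all, and a negative rate would correspond to the "piracy boosts sales" effect the report describes for games.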
I'm not as certain as you are. Correlation does not imply causation, but media sales have trended upwards in the age of piracy which leads to some interesting hypotheses.
A few years ago Shirley Manson (lead singer of the 90s band Garbage) accused YouTube of making its fortune off the backs of content creators - basically charging the entire enterprise as being one big exercise in copyright infringement. And yet the music industry, as well as Hollywood, seem to be doing better and better each year in terms of dollars made. Some of the distribution models have changed - broadcast and cable television are pretty dead in the water, but the entertainment industries in general seem to be doing better than ever. And yeah lots of individual artists are still getting raw deals from Spotify and labels etc. as they always have. But industry-wise, in terms of dollar amounts, it seems there's more money to be made than ever before from creating and selling entertainment.
The statement you made that I absolutely agree with is that it's hard to get real world data on this. An individual who is able to get free access to something may be unlikely to ever pay for that same thing. But the answer to the question: "Does piracy hurt the industry's bottom line, or help it on the whole?" is a very difficult question to answer. And we have to consider the even harder stuff to measure. Things like: is a teenager who pirates recorded media more or less likely to buy merch and concert tickets? More or less likely to buy a special edition package with tangible collector items?
At the end of the day, I have no clue.
I also offer all of this being very pro-capitalism and pro-intellectual-property. I don't condone piracy. But if we're just looking at raw data and trying to form our hypothesis, we have to start with the fact that the raw data points to upward trends on the whole.
> I'm not against consumer boycotts, but it should generally come with a sacrifice on both sides--for consumers, that means missing out on the product or service.
I'm curious as to why you feel this way, genuinely. The decision to boycott means that there is no sale, full stop, so no money is being handed over. Why does anything after that matter? The important part, the money, is already decided from the start.
The main reason to download "pirated" books is that they get rid of all annoying barriers that exist in "legitimate" copies. It's a better product.
For example, if you pirated an ebook and liked it, you'd likely buy a physical copy.
How many of the authors whose books are in Anna's Archive are happy about it?
I personally am pro Anna's archive (and sci-hub, etc) because I believe it benefits society to have better read citizens. That said, I have some misgivings, because under our current system, there are issues with law and remuneration.
How do you expect people to find your book?
Also, but too late now, if I had known your attitude, I'd not have bought your book.
That being said, do you know if their offering of your material has had a significant impact on your revenue, or is it more the principle of the matter?
You have invested in an idea that has been created by power structures through culture: that you are getting harmed by someone else's freedom. The people that will/want to support your work will do so out of a desire to do so, not because the law says it's right.
Many people are deceived into thinking that law breakers are immoral and harmful to society, but I don't think that's the case. The people that care too much about copyright are too invested in demanding a return for their efforts. Whatever happened to the priority of making the world a better place first and foremost, and having faith that you will be compensated in some fashion for your efforts?
I was just trying to finish this writer’s corpus on a reread of their later material. It’s not that I’m cheap. I own a paper and audiobook copy of several of my favorite books. Including this author, so I’ve paid her twice. I just avoided the trap some of my friends long ago were falling into of hoarding books, by only keeping books I intend to read again. So any completionist tendencies have always been resolved via library or electronic editions.
I’m getting older now, and my first real confrontation with my own mortality came up with books. I have several years worth of books even if I were retired and reading three or four a week. New things come out all the time, and new voices. I haven’t read some of these books in ten years or more. Am I really going to read them again before… So a couple years ago I reread Dune for what will likely be the last time and sold my ratty old yellow copies to a used bookstore. If I do it again it will likely be audiobook.
Yes, there are other reasons why the music industry fell, but when your main demographic can always go to bittorrent to get their music if prices are too high, then there is only so much you can do with the price of music.
Yeah, I remember the 90's, music was huge, and there were so many good bands (Smashing Pumpkins, Nirvana, REM, White Stripes... Or if you're more into popular music, Michael Jackson, Whitney Houston...). Now, music is de-valued and cheap and our music scene has been decimated. Personally, I think we should try to find ways to support musicians, writers, thinkers, artists...
... but if that is not your thing, what can I do? Just give it a thought.