Anna's archive has already fulfilled G's needs (training Gemini) so now it's time to pretend it never existed ;)
Did Anna's Archive also organize much of the world's information and make it universally accessible, for some time?
This should remain as the top comment.
Feels weird to say, but I have found Yandex, of all places, to be an excellent search engine for content that gets taken down by DMCA requests.
E.g. if you want to watch a movie that's not on Netflix via a web stream, the search results are far better.
Feels like Google circa 2005.
I've been playing around with a variety of search engines such as Kagi, Startpage, Ecosia, DDG.
All of them are better than Google at finding relevant results. Lol
Google is way too "personalized".
Google hides the most relevant results on the 3rd page. It was confirmed in trial disclosures a few months ago. Their concern isn’t public search.
Seems to not be empirically true.
You can turn off personalization. (Operating under the assumption that most people search for facts, I personally don't see why one would ever want personalized results.)
Location-based personalization is pretty useful - if I search for 'Bob's Discount Linguine' I want the one in my neighborhood.
Lots of niche fields (like programming) also reuse common English words to mean specific things - if I search for e.g. 'locking', it's nice to get results related to asynchronous programming instead of locksmiths because Google knows I regularly search for programming-related terminology.
Of course it's questionable whether Google does a good job at any of this, but I absolutely see the value.
I just add another keyword to narrow the search result. I don’t think I’ve ever wanted results based on anything other than the query.
I won't bother defending Google-style personalization as it exists for their search results, but since collisions in terminology across fields are common, it's not that hard to see how actual, thoughtful personalization could be useful. Someone searching for "Kafka" is going to want very different results based on whether they're thinking of software or literature. Opinions may also differ over the usefulness of sources, even for people ultimately interested primarily in facts; I find Kagi-style personalization (make your own domain list) very useful, but across Kagi's userbase Reddit is simultaneously one of the most lowered, most raised, and most pinned domains: https://kagi.com/stats?stat=leaderboard
Anecdotally, I find myself appending 'reddit' to search terms very frequently. It's effectively shorthand for "I want to read about people's direct experience with this thing", and Reddit is huge and well crawled by search engines. It's astroturfed to hell, especially around political topics, but I feel like it's easy to tell when discussions about random products are authentic.
> "Kafka" is going to want very different results based on whether they're thinking of software or literature.
Speak for yourself. I've worked in several "Kafka-esque" software organizations.
> I personally don't see why one would ever want personalized results.
The same short combination of words can mean very different things to different people. My favorite example of this is "C string" because when I was a kid learning C I was introduced to a whole new class of lingerie because Google didn't really personalize results back then. Now when I search "C string" Google knows exactly what I mean.
Yep, Yandex all day when I wanna wear an eye patch and pirate the seas.
I just tested, indeed very good results!
Google does search now? I mean, it's great to see, but I'm not sure how this is going to challenge the convenience of my chosen brand of chatbot being able to find the same info without my being scammed by 100 SEO-optimised junk sites.
I have heard that chatbots aren’t affected by spam as much as Google when you ask them to search, is that true?
Not sure. I understand they used to do search though.
(Love the username, BTW.)
Yeah they’re pretty terrible now. Reminds me, this is an interesting article about search engines getting worse and failing, but the author didn’t get into the spam aspect iirc: https://archive.org/details/search-timeline
No matter what my chosen brand of chatbot is, it can't help but hallucinate between 25% and 90% of the links it offers me. If it's not hallucinating, it's just proxying a Google search for you.
Weird, I get pretty great results. Maybe I had hallucination rates like that 2 years ago, but not today.
Browser based iOS usage of ChatGPT, by chance?
1. Your chatbot doesn't have its own internet scale search index.
2. You're being given information that may or may not be coming in part from junk sites. All you've done is give up the agency to look at sources and decide for yourself which ones are legitimate.
As for point one, is that true? I thought ChatGPT and Perplexity had their own indexes.
I'm not sure I've ever relied on Google to tell me what a site like this had, when the site itself is fully indexed, as this one is. Free-text search over the metadata of title, author, format, date (when available) seems to work.
They don’t have full-text search of document contents though, do they? I know Google wouldn’t have this for AA pages either, just curious.
Good point. So there is definitely a social utility in search over text which google does have, for the trove it scanned, hands and cats-pawprints and all.
I’m pretty sure Google indexing pages from Anna’s archive would only get metadata, because AA doesn’t have the full text of the books on those pages. I think to get the full text you have to download the torrents, and I don’t think Google was doing that.
No, that's more Meta's trick, and they were "only doing it for the articles", not the pictures. I think. I dunno...
They were doing it for the videos too, but only for "personal use"...
https://www.wired.com/story/meta-claims-downloaded-porn-at-c...
Google has already removed URLs from the first page of "search" results.
Good thing that Google hasn't been a part of my life for a while now. I use DuckDuckGo for search.
I've seen DDG censor stuff that was still on Google.
https://www.google.com/search?q=Anna%27s+Archive
Google's march to irrelevance continues at full steam.
They've got a long way ahead of them then, considering they still handle something like 97% of all search queries.
Actually ~90%, but that does not include AI search (ChatGPT et al).
https://www.klatch.co.uk/search-engine-market-share
And still it’s the top result in Google if one searches for Anna’s archive. How is it that that search result hasn’t been removed?
Presumably, the home page doesn't contain any copyright violations. This is only DMCA stuff targeting individual links.
I was surprised that those pages showed up in book title searches at all. Makes sense to get rid of them; you don't want a search for a book to be topped by a link to pirate the book. The top-level domains still come up, and people who know they want to pirate a book can still find the site.
Google search keeps getting less useful every day.
Are they in ChatGPT and other LLM providers? No need for Google.