Funnily enough, these days it indicates the article was written by a human. I had a dev join my team who made a few typos, and it gave me a chuckle, as it’s a whole class of mistake I hadn’t seen in a while.
This is not how email works, though.
I wonder if it is a generation gap thing. The young folks these days have probably used only Gmail, Proton, or one of the other big email services that abstract away all the technical details of sending and receiving emails. Without some visibility into how emails are composed and sent, they may never have learned that email headers are not a definitive source of truth but are entirely user-defined and can be set to anything.
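To make the header point concrete, here is a minimal sketch using Python's standard `email` library. The addresses, the subject, and the `build_message` helper are made up for illustration; the point is only that nothing in message composition checks the From header against who is actually writing it.

```python
from email.message import EmailMessage


def build_message(claimed_sender: str, recipient: str) -> EmailMessage:
    """Compose a message whose From header is whatever the caller says it is."""
    msg = EmailMessage()
    msg["From"] = claimed_sender      # plain text chosen by the sender, not verified here
    msg["To"] = recipient
    msg["Subject"] = "Quarterly report"   # hypothetical subject
    msg.set_content("Nothing verified the From header above.")
    return msg


# Both addresses are placeholders on reserved example domains.
msg = build_message("ceo@example.com", "you@example.org")
print(msg["From"])
print(msg.as_string())
```

The message is only built here, not sent. Whether a receiving server accepts or flags it is a separate question from what the header says.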
trollbridge's point about scrapers using residential IPs and targeting authentication endpoints matches what we've seen. The scrapers have gotten sophisticated. They're not just crawling, they're probing.
The economics are broken. Running a small site used to cost almost nothing. Now you need to either pay for CDN/protection or spend time playing whack-a-mole with bad actors.
ronsor hosting a front-page HN project on 32MB RAM is impressive and also highlights how much bloat we've normalized. The scraper problem is real, but so is the software efficiency problem.
edit: I feel their pain - I've spent the past week fighting AI scrapers on multiple sites hitting routes that somehow bypass Cloudflare's cache. Thousands of requests per minute, often to URLs that have never even existed. Baidu and OpenAI, I'm looking at you.
Plus hitting the endpoints for authentication that return 403 over and over.
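For anyone wanting to see this pattern in their own logs, a rough sketch of how one might count rejected requests per client. It assumes access log lines in the common/combined log format; the `count_rejected` helper, the regex, and the sample lines are all invented for illustration and would need adjusting to a real log layout.

```python
import re
from collections import Counter

# Assumed layout: ip ident user [timestamp] "request line" status size ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def count_rejected(lines, statuses=("403", "404")):
    """Count requests per client IP whose status code is in `statuses`."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(2) in statuses:
            hits[m.group(1)] += 1
    return hits


# Made-up sample lines using documentation-reserved IP ranges.
sample = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "POST /login HTTP/1.1" 403 153',
    '203.0.113.7 - - [01/Jan/2025:00:00:02 +0000] "POST /login HTTP/1.1" 403 153',
    '198.51.100.2 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 5120',
    '203.0.113.7 - - [01/Jan/2025:00:00:04 +0000] "GET /never-existed HTTP/1.1" 404 0',
]
rejected = count_rejected(sample)
for ip, n in rejected.most_common():
    print(ip, n)
```

Per-IP counts are less useful when the traffic is spread across many residential addresses, so grouping by path or user agent instead may tell you more.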
Oh you're so deterministic.
Look at how many sites still get "HN hugged" (formerly known as "slashdotted").
At this point, I have to assume that most software is too inefficient to be exposed to the Internet, and that becomes obvious with any real load.
I also do not have a robots.txt, so Google doesn't index.
Got some scanners that left a message about how to index or deindex, but it was like 3 lines total in my log (that's not abusive).
But yeah, blocking the whole of Asia stopped soooo much of the net-shit.
That doesn't sound right. I don't have a robots.txt either, but Google indexes everything for me.
I think this is a recent change.
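For what it's worth, here is a small sketch of how a standards-following parser treats "no rules", using Python's `urllib.robotparser`. This models the standard library's parser, not Google's crawler, and an empty rule list is used as a stand-in for a missing file, which is an assumption on my part. The `allowed` helper and the URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, agent, url):
    """Return whether `agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)   # parse in-memory lines, no network access
    return parser.can_fetch(agent, url)


url = "https://example.com/page"   # placeholder URL

# No rules at all: nothing forbids the crawler.
no_rules = allowed([], "Googlebot", url)

# An explicit blanket disallow is what actually asks crawlers to stay out.
blocked = allowed(["User-agent: *", "Disallow: /"], "Googlebot", url)

print(no_rules, blocked)
```

In this model, keeping a crawler out takes an explicit Disallow rule; having no rules reads as permission.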