In spite of the flak that CAPTCHAs usually get, I still think they have a lot of value in fighting the majority of these spam attacks.
The common criticisms are:
- They impact usability, accessibility and privacy. Users hate them, etc.
These are all issues that can be improved. In the last few years there have been several CAPTCHAs that work without user input at all, and safeguard user privacy.
- They're not good enough, sophisticated (AI) bots can easily bypass them, etc.
Sure, but even traditional techniques are useful at stopping low-effort bots. Sophisticated ones can be fought with more advanced techniques, including ML. There are products on the market that do this as well.
- They're ineffective against dedicated attackers using mechanical turks, etc.
Well, sure, but these are entirely different attack methods. CAPTCHAs are meant to detect bots, and by definition, won't be effective against attackers who decide to use actual humans. Websites need different mechanisms to protect against that, but those are also edge cases and not the main cause of the spam we see today.
This would be far more privacy-preserving than dozens of national-ID lookup systems, and despite the appearance of "money for speech" it could actually be _cheaper_ than whatever mix of time, bus fare, and paperwork a "free" system would demand.
____________
I imagine the big problems would be things like:
* How to handle fraudulent payments, e.g. someone buying tokens with a stolen credit card. Easiest fix would be some long waiting-period before the token becomes usable.
* How to protect against a fraudulent attestor site that just takes your money, or one whose tokens are value-less.
* How to protect against a fraudulent destination site that secretly harvests your proof-token for its own use, as opposed to testing/burning it properly. Possible social fix: Put in a fake token, if the site "accepts" then you know it's misbehaving.
* Handling decentralization, where multiple donation sites may be issuing their own tokens and multiple account-sites that may only want to support/trust a subset of those tokens.
The Alcoholics Anonymous San Francisco website had to implement CAPTCHAs because scammers were making one-time donations to make sure their stolen credit cards were still valid. Every morning we had to invalidate a dozen obviously-fake donations.
I'm sure it doesn't stop the truly determined ones, but it does add friction. You don't need to be impossible to test cards on, you just need to be harder to use than someone else (like a lower-resource charity). We've even debated "fake accepting" some payment methods, once we're confident it's someone trying to find working credit card numbers, to add some false positives into the mix.
If tokens had to mature for X days before being used that could deter laundering pretty handily, but stopping "tests" of cards would require hiding payment errors from the user for a certain period... which would not be a great experience.
$5 isn't much for a wealthy westerner. It's a reasonable amount for an unemployed westerner. It's 12% of their weekly budget for someone earning median wage ($160/month) in Vietnam. But if you put in place regional pricing, it'll be cheap enough that spammers will just operate out of low income countries and buy thousands of cheap accounts.
There's no reason you can't have an attestation entity that's based on volunteer hours, provided you can convince sites-in-general that your proof-units are good.
The core theme isn't about cash, but that:
1. There are kinds of activity someone can do which demonstrates some kind of distinct actual expenditure of time or effort (not self-dealing.)
2. A trusted agent could attest to that activity.
3. Requiring (proof of) activity gives you a decent way to ward off the majority of bots/spam in a relatively simple way that doesn't become a complex privacy nightmare.
It's a similar outcome to sending CPU-bound challenges to a client, except without the deliberate-waste and without a strong bias towards people who can afford their own fast computer.
Because I wonder how people are going to do volunteer hours, and get them recognized through red tape/bureaucracy, if they're already struggling to survive.
And the poor get poorer.
Personally, I am particularly concerned with avoiding the scariness of a government agency that inherently knows all the websites all people are using.
Security/anti-spam is probably not the biggest accessibility factor in the last 40 years of change anyway: it's easier to make an alternate CAPTCHA route than to convince management a phone app is unnecessary, or to correctly annotate everything with aria/alt-text properties in all the languages.
That's not a very generous reading, I think. I am suggesting that the "bad people" seem to be doing fine, so at a certain point we might want to ask ourselves how far we take this "fight" in terms of sacrificing accessibility and privacy (to only name 2 concerns) to stop some percentage of bad actors.
As someone who has been hurt by these efforts over the past 20+ years, and who has yet to hear a proposal for next steps that doesn't greatly worry me, I'm not going to be in favor of propositions just because "well, we have to do something".
> It may also be blaming the wrong factors and growing pains. It's easier to make an alternate CAPTCHA route than to convince management to not rely on a phone app or to correctly annotate everything with aria-properties in all the languages.
We've had 20 years to make CAPTCHAs more accessible, yet they've gotten worse. Not to mention that their efficacy is in question, hence the discussion about next steps (i.e., attestation).
It's basically using the HTTP 402: Payment Required status code and serving up a Lightning Network payment invoice.
Edit to add: it basically solves all of the caveat issues you identified.
[0]: https://l402.org/
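For concreteness, here is roughly what that exchange could look like on the server side, based on my reading of the linked spec (the exact header format may differ in detail). mint_macaroon_and_invoice and verify_payment are hypothetical stand-ins for a real macaroon/Lightning backend, stubbed out so the sketch runs:

    # Sketch of an L402-style paywall: unauthenticated requests get a 402 plus a
    # challenge header; paid requests present macaroon:preimage and get the content.
    from flask import Flask, Response, request

    app = Flask(__name__)

    def mint_macaroon_and_invoice(price_sats: int):
        # Hypothetical stand-in: a real backend would mint a macaroon bound to the
        # payment hash of a freshly generated Lightning invoice.
        return "stub-macaroon", f"lnbc-stub-invoice-{price_sats}sats"

    def verify_payment(macaroon: str, preimage: str) -> bool:
        # Hypothetical stand-in: a real backend checks that the preimage hashes to
        # the payment hash the macaroon was bound to, and that its caveats hold.
        return preimage == "stub-preimage"

    def payment_required():
        macaroon, invoice = mint_macaroon_and_invoice(price_sats=21)
        return Response(
            status=402,  # HTTP 402: Payment Required
            headers={"WWW-Authenticate": f'L402 macaroon="{macaroon}", invoice="{invoice}"'},
        )

    @app.route("/protected")
    def protected():
        auth = request.headers.get("Authorization", "")
        if not auth.startswith("L402 ") or ":" not in auth:
            return payment_required()
        macaroon, preimage = auth[len("L402 "):].split(":", 1)
        if not verify_payment(macaroon, preimage):
            return payment_required()
        return "Paid content."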
IIUC the tokens would need to be cheaply verifiable by anyone as authentically issued, so a fake token would never be accepted (or if it somehow was, it would only tell you that the acceptor is fantastically lazy/incompetent).
I think that that verifiability, plus a guarantee that tokens will not be spent twice, plus visibility of all transactions, suffices: then anyone can check the public ledger x minutes after they spent their token and verify that the acceptor sent it straight to the burn address after receiving it. IOW, blockchain suffices. OTOH, it would be nice not to need the public ledger.
By comparison, here's a simpler "single HTTP call" approach, where a site like HN makes a POST to the issuer's API, which would semantically be like: "Hey, here is a potential token T and a big random confirmation number C. If T is valid, burn it and record C as the cause. Otherwise change nothing. Finally tell me whether-or-not that same T was burned in the last 7 days along with the same C that I gave."
The benefits of this approach are:
1. The issuer just has to maintain a list of surviving tokens and a smaller short-lived list of recent (T,C) burning activity, and use easy standard DB transactions to stop conflicts or double-spending.
2. All the social-media site has to do is create a random number C for burning a given T, and temporarily remember the pair until it gets a yes-or-no answer.
3. A malicious social-media site cannot separate testing the token from spending it on a legitimate site, which deters a business model of harvest-and-resale. However it could spend it immediately for its own purposes, which is worth further discussion.
4. The idempotent API call resists connection hiccups and works for really basic retry logic, avoiding "wasted" tokens.
5. The issuer doesn't know how or where a given token is being used, beyond what it can infer from the POST request source IP. It certainly doesn't know which social-media account it just verified, unless the two sites collude or the social-media site is stupid and doesn't use random C values.
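To make that concrete, here is a minimal sketch of what the issuer's burn-and-confirm call could look like under the assumptions above, using a plain SQLite store. The table names, the maturation window (the waiting-period idea from earlier), and the 7-day retention window for (T, C) pairs are all invented for illustration:

    # Sketch of the issuer's side of "burn token T for confirmation C".
    import sqlite3, time

    DB = sqlite3.connect("issuer.db")
    DB.execute("CREATE TABLE IF NOT EXISTS tokens "
               "(token TEXT PRIMARY KEY, issued_at REAL)")
    DB.execute("CREATE TABLE IF NOT EXISTS burns "
               "(token TEXT PRIMARY KEY, confirmation TEXT, burned_at REAL)")

    MATURITY = 7 * 24 * 3600   # waiting period before a freshly bought token works
    RETENTION = 7 * 24 * 3600  # how long burn records answer idempotent retries

    def burn(token: str, confirmation: str) -> bool:
        """True iff `token` is burned (now, or recently with this same C) for `confirmation`."""
        now = time.time()
        with DB:  # one transaction, so two racing requests can't both succeed
            row = DB.execute(
                "SELECT confirmation FROM burns WHERE token = ? AND burned_at > ?",
                (token, now - RETENTION)).fetchone()
            if row is not None:
                return row[0] == confirmation   # retry of our own burn, or someone else's
            live = DB.execute("SELECT issued_at FROM tokens WHERE token = ?",
                              (token,)).fetchone()
            if live is None or now - live[0] < MATURITY:
                return False                    # unknown, long-burned, or not yet matured
            DB.execute("DELETE FROM tokens WHERE token = ?", (token,))
            DB.execute("INSERT INTO burns VALUES (?, ?, ?)", (token, confirmation, now))
            return True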
What about if, instead of the spender handing the token directly to the acceptor, the spender first makes an HTTP "I want to spend token 456" request to the issuer, which replies with a "receipt"? The spender then sends that receipt to the acceptor, which in turn sends the issuer a request meaning: "If the token associated with this receipt is not yet burnt, burn it, record C next to it and report OK; if it was already recently burnt using the same C, also report OK (for idempotence); otherwise (if it was already burnt with some other C') report FAIL." The receipt not being valid as a spendable token cuts out the double-spend issue, at the cost of one extra HTTP request for the spender.
[Edit: This has a flaw, but I already typed it out and I think it makes an incremental advancement.] How about:
1. User earns Token (no change from before)
2. User visits the Site and begins the "offer proof" process, the Site generates and records two random numbers/UUIDs for the process. The first is the previously-discussed Confirmation Code, which is used for idempotency and is not shared with the User. The second is a Site Handshake code which the user must copy down.
3. User goes to the Attestor site and plugs in two pieces of information, the Token and the Site Handshake code. This returns a Burn Trigger (valid for X hours) which the user carries back to the Site.
4. User passes the Burn Trigger to the Site, and it calls the previously-discussed API with both the Confirmation Code and the Site Handshake. If the Site Handshake does not match what's on file for that Burn Trigger, the attempt immediately fails with a security error.
____
No, wait, that doesn't really work. Although it protects against EvilForum later leveraging the data into a spam account on Slashdot, it fails when EvilForum has pre-emptively started a spam account on Slashdot and is reusing Slashdot's chosen Site Handshake as its own.
It can't do this, because the only "data" it has from the spender is a receipt. A receipt is by design not a spendable token itself; this is trivial to make evident to any party (e.g., tokens are all 100 characters, receipts are all 50).
It can because nothing in that artifact binds it to the one and only one site that the user expects. The only thing keeping it from being used elsewhere is if everybody keeps it secret, and the malicious not-really-spending site simply won't obey that rule.
In scenario form:
1. User goes to the Attestor, inputs a Token, and gets a Burn Trigger as output. (I object to "receipt" because that suggests a finalized transaction, and nothing has really happened yet.)
2. User submits that Burn Trigger to malicious AcmeWidgetForum, which (fraudulently) reports a successful burning and puts a "Verified" badge on the account.
3. In the background, AcmeWidgetForum acts like a different User and submits the Burn Trigger to InnocentSite, which sees no issue and burns it to create a new "verified" account.
Even if the User can somehow audit "which site actually claimed responsibility for burning my Token" and sees that "InnocentSite" shows up instead, most won't check, and even knowing that AcmeWidgetForum was evil won't do much to stop the site from harvesting more unwitting Users.
The Attestor generates a random secret associated with each Burn Trigger, and encrypts it with the Site's supplied public key to create a non-secret Challenge. (Which is carried back by the User or else can be looked up by another API call.)
To burn/verify the Token, the Site would need to use its private key to reverse the process, turning the Challenge back into the secret. It would then supply the secret to the burn/verify API call. The earlier Confirmation Code would no longer be needed.
Thus AcmeWidgetForum would be the only site capable of using that Burn Trigger. (Unless they granted that ability to another site by sharing the same keypair, or stole a victim-site's keypair.)
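As a toy illustration of that binding, using RSA-OAEP from Python's `cryptography` package (a real scheme would pin and verify the Site's key, handle expiry, and so on; this only shows why possession of the private key is what gates the burn):

    import os
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes

    OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # The Site owns a keypair and publishes the public half.
    site_private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    site_public_key = site_private_key.public_key()

    # Attestor: for each Burn Trigger, generate a secret and hand out only the
    # encrypted Challenge (safe to carry around or look up via another API call).
    burn_secret = os.urandom(32)
    challenge = site_public_key.encrypt(burn_secret, OAEP)

    # Site: only the holder of the private key can recover the secret and
    # present it to the burn/verify API.
    recovered = site_private_key.decrypt(challenge, OAEP)
    assert recovered == burn_secret  # AcmeWidgetForum, and no one else, can do this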
... I know this is reinventing wheels, but I'm gonna choose to believe that there's some minor merit to it.
This has the same energy as the "we need benchmarks for LLMs" startups. Sure, it's obvious, and you can imagine really complex cathedrals around it. But nobody wants that. They "just" want Apple and Google to provide access to the same APIs their apps and backends use, associating authentic phone activity with user accounts. You already get most of the way there by supporting iCloud login, which should illuminate that what you're really asking for is to play outside of Apple's ecosystem, a totally different ask.
People can be tricked to give anything away.
> In practice they are no good for verifying humans on blogs and the like though because only about 0.0003% of humans have one.
Even if every human had one it'd still be useless.
Besides, CAPTCHAs shouldn't be the only protection against spam. There should still be content moderation tools, whether they're automated or not, to protect when/if CAPTCHAs don't work. Larger websites should know this and have the resources to mitigate it.
So saying that CAPTCHAs aren't worth it because they're not 100% accurate or effective is the wrong way of looking at this. They're just the first line of defense.
I would probably value my time, spent solving an annoying reCAPTCHA tapping on slowly fading pictures of what an American would consider a school bus before being asked to try again, more than a fraction of a cent. Of course reCAPTCHA probably considers me an edge case using Firefox with tracking protection and not being signed into Google, but it's just rude to require users to deal with this on a common basis. A local government website here requires me to solve a reCAPTCHA every time to view or refresh a timetable even though it's already locked behind an identity verification step involving logging in through my bank.
It would be smart to put some sort of CAPTCHA or other verification step to a website when signing up with just an email, because otherwise the cost for someone to automate making a million accounts would be $0.00. But it should at least be properly implemented, I've run into websites that use the invisible reCAPTCHA v3 and when my Firefox browser inevitably fails the check, it doesn't even give me a challenge of any sort, just an error message and I can't sign up or even sign in to my previously made account. A literal hurdle I can't get past as a legitimate user. If I were a spammer though apparently it would only cost less than a quarter of a cent to get past it.
Web devs also bloat the hell out of sites with megabytes of JavaScript and overcomplicated design. It would be far cheaper to just have a static site and use a CDN.
If only...
An algorithmic Turing test is a more interesting problem. https://xkcd.com/810/
Bots have already surpassed humans at solving traditional CAPTCHAs, which is why there are now so many new and creative ones. Until someone trains a model to solve those too, that is.
It's like the classic "little lambda thing" that someone posts on HN and finds a $2 million invoice in their inbox a couple weeks later. Except instead of going viral your achievements get mulched by AI.
My hosting costs are ca. $10 a month. Therefore I'm really curious: if hosting your CV requires "tens of thousands every year", what does your setup look like?
>lambda thing
I never understood why anyone thought this was ever a good idea. QoS is a natural rate limit for non-critical resources.
You can use automated systems as a first line of defense against spam, and then hire people to manually verify every submission that makes it through. You can even use that as opportunity to ensure a certain quality of submission, even if it was submitted by a person.
Any legitimate submissions that get caught in the initial spam filter can use a manual appeal process (perhaps emailing and pleading their case which will go into a queue to be manually reviewed).
Sure, it's not necessarily easy, and submissions may take some time to appear on the site, but there would be essentially zero spam and low-quality content.
The problem is, once you do manual upfront moderation, you lose a lot of the legal protections that UGC-hosting sites enjoy - manual approval means you are accepting the liability for anything that is published.
All these usability issues are solvable. They're not a reason to believe that the problem of distinguishing bots from humans can't be approached in a better way.
https://news.ycombinator.com/item?id=41630482
how many humans does captcha send away?
But there's a new breed of them that work behind the scenes and are transparent to the user. It's likely that by the time the user has finished interacting with the form, or with whatever is being protected, the CAPTCHA has already determined whether the user is a bot or not. They only block the action if they have reason to suspect the user is a bot, in which case they can show a more traditional puzzle. How effective this is depends on the implementation, but this approach has received good feedback from users and companies alike.
Especially if you load a page in another tab while remaining on the page you were on.
But if we want the internet to remain usable, our best chance is to fight back and improve our bot detection methods, while also improving all the other shortcomings people have associated with CAPTCHAs. Both are solvable technical problems.
The alternatives of annoying CAPTCHAs that don't work well, or no protection at all, are far worse in comparison.
So what should be the correct behavior if the CAPTCHA can't gather enough information? Should it default to assuming the user is a bot or a human?
I think this decision should depend on each site, depending on how strict they want the behavior to be. So it's a configuration setting, rather than a CAPTCHA problem.
In a broader sense, think about the implications of not using a CAPTCHA. The internet is overrun with bots; they comprise an estimated 36% of global traffic[1]. Cases like ProductHunt are not unique, and we see similar bot statistics everywhere else. These numbers will only increase as AI gets more accessible, making the current web practically unusable for humans.
If you see a better alternative to CAPTCHAs I'd be happy to know about it, but to me it's clear that the path forward is for websites to detect who is or isn't a bot, and restrict access accordingly. So working on improving these tools, in both detection accuracy and UX, should be our main priority for mitigating this problem.
[1]: https://investors.fastly.com/news/news-details/2024/New-Fast...
So, yeah, people are being told "well, we have to fingerprint users, we have no choice," and the ironic thing is the battle is being lost anyway, and real damage is being done in the false positives, especially if the user is tech-savvy.
But whatever. I'm aware I won't convince you, I'm aware I'm in the minority, and that most people accept the status quo or are unaware of the abuses. But it's being implemented poorly, it isn't working, it's harming real people and the internet as a whole, and it is not an adequate fix.
I think our main disagreement is about what constitutes a "fingerprint", and whether CAPTCHAs can work without it.
Let's start from basic principles...
The "Turing test" in the CAPTCHA acronym is merely a vague historical descriptor of what these tools actually do. For one, the arbitrer in the original Turing test was a human. In contrast, the "Completely Automated" part means that the arbitrer in CAPTCHAs has to be a machine.
Secondly, the original Turing test involved a natural language conversation. This would be highly impractical in the context of web applications, and would also go against the "Completely Automated" part.
Furthermore, humans can be easily fooled by machines in such a test nowadays, as the original Turing test has been decidedly broken with recent AI advancements.
So taking all of this into account, since machines don't have reasoning capabilities (yet) to make the bot-or-not distinction in the same way that a human would, we have to instead provide them with inputs that they can actually process. This inevitably means that the more information we can gather about the user, the higher the accuracy of their predictions will be.
This is why I say that CAPTCHAs have to involve fingerprints _by definition_. They wouldn't be able to do their job otherwise.
Can we agree on this so far?
Now let's define what a fingerprint actually is. It's just a collection of data points about the user. In your example, the IP address and user agent are a couple of data points. The question is: are just these two alone enough information for a CAPTCHA to accurately do its job? The IP address can be shared by many users, and can be dynamic. The user agent can be easily spoofed, and is not reliable. So I think we can agree that the answer to that question is "no".
This means that we need much more information for a CAPTCHA to work. This is where device information, advanced heuristics and behavioral signals come into play. Is the user interacting with the page? How human-like are their interactions? Are there patterns in this activity that we've seen before? What device are they using (or claim to be using)? Can we detect a browser automation tool being used? All of these, and many more, data points go into making an accurate bot-or-not decision. We can't rely on any single data point in isolation, but all of them in combination gives us a better picture.
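As a toy illustration (not any vendor's actual model), this is the general shape of combining such weak signals into one decision; the signal names, weights, and thresholds here are all invented:

    # Combine several weak signals into a single bot-likelihood score.
    signals = {
        # name:                        (weight, fired?)
        "ip_on_known_proxy":           (0.30, True),
        "headless_browser_detected":   (0.40, False),
        "no_pointer_movement":         (0.15, True),
        "ua_claims_mismatch_fonts":    (0.10, False),
        "typing_cadence_too_regular":  (0.05, True),
    }

    score = sum(weight for weight, fired in signals.values() if fired)  # 0.0 .. 1.0

    if score < 0.3:
        decision = "pass silently"           # invisible to the user
    elif score < 0.7:
        decision = "show a fallback puzzle"  # only suspicious traffic sees a challenge
    else:
        decision = "block / flag for review"

    print(score, decision)

No single data point decides the outcome; it is the combination that does, which is exactly why the collected data ends up looking like a fingerprint.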
Now, this inevitably becomes a very accurate "fingerprint" of the user. Advertisers would love to get ahold of this data, and use it for tracking and targeting purposes. The difference is in how it is used. A privacy-conscious CAPTCHA implementation that follows regulations like the GDPR would treat this data as a liability rather than an asset. The data wouldn't be shared with anyone, and would be purged after it's not needed.
The other point I'd like to emphasize is that the internet is becoming more difficult and dangerous to use by humans. We're being overrun with bots. As I linked in my previous reply, an estimated 36% of all global traffic comes from bots. This is an insane statistic, which will only grow as AI becomes more accessible.
So all of this is to say that we need automated ways to tell humans and computers apart to make the internet safer and actually usable by humans, and CAPTCHAs are so far the best system we have for it. They're far from being perfect, and I doubt we'll ever reach that point. Can we do a better job at it? Absolutely. But the alternative of not using them is much, much worse. If you can think of a better way of solving these problems without CAPTCHAs, I'm all ears.
The examples you mention are logistical and systemic problems in organizations. Businesses need to be more aware of these issues, and how to best address them. They're not indicators of problems with CAPTCHAs themselves, but with how they're used and configured in organizations.
Sorry for the wall of text, but I hope I clarified some of my thoughts on this, and that we can find a middle ground somewhere. :) Cheers!
Another point I forgot to mention: it's certainly possible to not gather all these signals. We can present an actual puzzle to the user, confirm whether they solve it correctly, and use signals only from the interaction with the puzzle itself. There are two problems with this: it's incredibly annoying and disruptive to actual humans. Nobody wants to solve puzzles to access some content. This is also far from being a "Completely Automated" test... And the other problem is that machines have become increasingly good at solving these puzzles themselves. The standard image macro puzzle has been broken for many years. Object and audio recognition is now broken as well. You see some CAPTCHA implementations coming up with more creative puzzles, but these will all inevitably be broken as well. So puzzles are just not a user friendly or reliable way of doing bot detection.
Until something more substantive is done to control who can fingerprint (let's assume this is even a reasonable solution), users are forced to deactivate fingerprinting protection, and Firefox can NOT roll that protection out by default (your captchas are the main blocker) - or even expose it as a user option in config and advertise it with caveats that you might get more challenges - because right now you don't just get more challenges, you get a broken internet.
And the "36% of internet traffic is bots" figure is pretty meaningless. I personally have no problem if 90% of the internet is bot activity. We have an enormous amount of bot traffic on our websites - I would say the majority - and I don't block any of it that respects our terms - a ton of it is obviously being used to train LLMs or improve search engines - more power to them. And honestly there's probably an opportunity for monetisation here. Some of it is security scans. Whatever. That is not a problem. Non-human users of the internet will inevitably arise as integration does, and I've written many a bot myself. Abuse is the problem. And there are ways to tackle abuse that aren't fingerprinting: smarter heuristics (which are obviously not being used by the "captcha" companies, or I would not be getting blocked on routine use of sites like FedEx or Drupal or my bank after following a link from that bank or service), hashcash, smarter actual Turing tests that verify not "human-like" spoofable profiles but actual human-like competence... without fingerprinting. What we have right now is laziness, plus the fact that fingerprinting is profitable, so all parties involved actually have an incentive to keep it. It'll never be perfect, but what we have now is far, far from that.
I will say, BTW, that bots are not that hard to block. On a website I maintain we went from 1000+ bot accounts a month to 0, and stayed there for many years, simply by adding an extra hand-rolled element to a generic CAPTCHA. The generic CAPTCHAs are what bots bother to break in most cases. (That would probably not apply to massive services, but those also have the capacity to keep creating new custom ones and be a moving target - it would probably just require one full-time programmer, really.)
And yes, businesses need to implement these "captcha" solutions better, but the people offering the solutions are not offering them with transparency as to the issues or clean integration with APIs. It's just: get the contract, drop it in front of all traffic, move on.
And, for god's sake, implement the CAPTCHA sanely. Don't require third-party javascript, cookies, etc. Have the companies proxy it through their website so standard security and privacy measures don't block it by default, which happens almost all the time. In fact, in many cases even the feedback when blocked is also blocked (facepalm). Don't block by default on a "suspicious" (i.e. generic) fingerprint, as happens quite often now. Actually SHOW a captcha so the user has a fighting chance and knows what is going on.
No. This is not a new issue. The problems have been there for many years. You can't claim "working on it" - which is not even what you are claiming.
By now, recognize that if the users themselves are fighting this crap or avoiding the sites and companies that use them, it's entirely deserved. By setting CAPTCHAs, you attack your users. (Witnessed in 2024, an insurance claims form which demands that a CAPTCHA be solved but shows no CAPTCHA. This crap is now so common it can now be used to delay insurance claims!)
I can, actually. :) I'm part of the team at https://friendlycaptcha.com/ and we agree that most CAPTCHAs suck. But we also believe that these issues can be improved, if not outright solved—at least the UX aspects.
I was doing my best to avoid bringing up my employment, since these are my own opinions and I didn't want to promote anything, but I might as well mention that there are people working on this. There are similar solutions from Cloudflare, DataDome, and others.
If you're having an annoying CAPTCHA experience in 2024, that's mostly due to the particular website choosing to use an annoying CAPTCHA implementation, or not configuring it properly. As I've said numerous times in this thread, distinguishing bots from humans will never be 100% accurate, but the alternative of not doing that is far worse. So we'll have to live with this if we want the internet to remain usable, and our efforts should be directed towards making it as painless as possible for actual humans.
I noticed in particular this:
> In late 2022, bot comments really took off... around the same time ChatGPT was first widely available.
But remember that one aspect of the categorisation is:
> Did you know ChatGPT generated comments have a higher frequency of words like game-changer? Bot comments also contained characters not easily typeable, like em-dash, or the product’s name verbatim even when it’s very long or contains characters like ™ in the name.
So...he categorises users as bots if they behave like ChatGPT, and then thinks he has found something interesting when the number of users that behave like that goes up after ChatGPT was released. But it's also possible there were already lots of bots before that, they just used different software that behaves differently so he doesn't detect it.
Of course, like you say, this is quite a few "ifs". If the assumptions I'm making don't hold, neither does the conclusion.
Again, it's not a validated way to test.
I blocked PH with ublacklist a long time ago for looking like SEO promotion/garbage and looking too much like those "VS/comparison/best 5 apps" websites with next to zero content. These pop up faster than I can filter by hand.
After checking it out again and knowing it is not purely-generated content, I _STILL_ don't see the value proposition if I stumbled on such a result.
Lie down with social media dogs, get up with fleas.
I don't think that's true. I think that's an anti-abuse mode some accounts fall into.
Second, the further you are "to the right" in a discussion - the more parents you have to go through to get to a top-level comment - I think you eventually hit a delay there, just to stop threads from marching off to infinity with two people (who absolutely will not stop and will not agree, or even agree to disagree) going on forever. I'm not sure what indent level triggers this, but I would expect some sort of exponential backoff.
No it isn't. Today I posted more than that (I think 9 comments in an hour or two), partially to test if that claim was true. I ran into no limits.
Something has to happen to trigger the rate limit to be applied to an account.
I imagine I can vouch for a dozen accounts that they are indeed human. Similarly, other people can vouch for me, and so we can build a web of trust. Of course, we will need seeds, but they can be verified accounts, or be relatively easily established through social media connections and interactions.
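A minimal sketch of how such vouching could be evaluated, assuming a directed vouch graph, a handful of seed accounts, and a made-up depth limit so that long chains don't confer trust (all names and numbers here are invented):

    from collections import deque

    vouches = {            # who vouches for whom (directed edges)
        "seed_alice": ["bob", "carol"],
        "bob": ["dave"],
        "carol": ["dave", "eve"],
        "dave": ["mallory_botfarm"],
    }
    seeds = {"seed_alice"}
    MAX_DEPTH = 2          # each hop away from a seed dilutes confidence

    def trusted_accounts():
        trusted, queue = set(seeds), deque((s, 0) for s in seeds)
        while queue:
            account, depth = queue.popleft()
            if depth == MAX_DEPTH:
                continue  # too far from any seed: don't follow this account's vouches
            for vouched in vouches.get(account, []):
                if vouched not in trusted:
                    trusted.add(vouched)
                    queue.append((vouched, depth + 1))
        return trusted

    print(trusted_accounts())  # includes dave and eve, but not mallory_botfarm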
I think X and Meta know for quite sure which accounts are bots. But they do not seem interested in offering this knowledge as a service.
In the end, I think we will very much need the web of trust, attestation, and a reputation score for the agents in it, but it will need to include real-world in-person interactions, a degree of government support (i.e. issuing physical ID cards for people), and companies selling cameras which are capable of authenticating their footage and any metadata the hardware can attach (date and time, global localization signals, additional radio background, background aural noise, and background electrical noise from the power network).
On the other end of the chain, people who consume content and want to verify its authenticity (i.e., people who read the news) will need to opt into all of this or stick to established media outlets. Perhaps some countries will pass laws that help an ordinary citizen consume truthful news, and the essence and potential abuses of those laws will be very interesting.
I don't think there is a way to have a decently robust network of trust where people know others are people without actually knowing the identity of those other people. So, of course, this web of trust will be used by criminals and governments to find their marks.
The social cost of allowing AIs to pose as humans is so high that legislating against it may well be worth it.
At the end of the day, remember that you are not the customer; some advertiser is. Puffing up the number of users to sell more ads is these services' primary function.
Social media should have a "Surface State" to save humanity from the Deep State: an interconnected group of vigilante people trying to reveal disguised deep-state nefarious intentions before they are put into action.
Elon Musk is the deep state for goodness sake. Social media moguls love collecting your data and selling it for piles of cash. They also love the cash from authoritarian regimes paying for their inability to shut up and having to spend 44 billion for a company, and then using it to manipulate the public.
By Deep State, I meant actions like this.
I wouldn't be surprised to find out these bots are actually being run by reddit to encourage engagement.
How about a totally fake social media populated by llm bots and rake in VC moolah?
See https://www.theverge.com/24255887/social-ai-bots-social-netw...
It’s already been bad enough that you may be unknowingly conversing with the same person pretending to be someone else via multiple accounts. But GenAI is crossing the line in making it really cheap for narratives to be influenced by just building bots. This is a problem for all social networks and I think the only way forward is to enforce validation of humanity.
I’m currently building a social network that only allows upvote/downvote and comments from real humans.
And how exactly do you do that? At the end of the day there is no such thing as a comment/vote from a real human, it's all mediated by digital devices. At the point the signal is digitized a bot can take over the process.
Of course, getting someone to share their passport will be another filter. But I hope that I can convince people that the benefits are worth it (and that I will be able to keep their data safe, by only storing hashes of everything).
Maybe I'm wrong, but just a social network of 'real people' doesn't seem like enough in itself. What is going to bring people there, given the restrictions and potential risks you're creating?
All I can say is that I personally see huge value in a social network for real people. Personally I am sick of arguing with what are likely legions of duplicate accounts/bots/russian trolls online. I want some reassurance that I'm not wasting my time and am actually reaching real humans in my interactions online.
Success to me is 1000 MAU. There are companies out there that do passport verification for a reasonable fee with a fair amount of free verifications to start with (which will handle 1000 MAU just fine). If the number of users wishing to take part is significantly higher then I will explore either ads or charging a small fee during registration.
I'm still very far from needing to cross that bridge though. Same for some of the other questions you've raised. I'd have to do a lot more research to come to a solid stance of what to do when government data requests come in. But I would guess that there isn't much choice but to abide by the request. If you want true anonymity from the government then this place will not be for you (but then I'd say not many social networks are for you in that case)
As for a new turing test there is a magic word that bots are not allowed to use that guarantees you're talking to a human.
Passports are the only practical way to ensure that I’m talking to a human. Do you have a better idea?
Prompt injection can be mitigated but not this prompt rejection.
As a user I can't do anything related to the passport stuff and I know many people who likewise wouldn't be interested in doing that, because we live in the states. A more "normal" approach here would be to use one of the government ID verification systems for your state ID. Most of us are willing to expose that information, since it's what you show when you go to the store to prove your age/identity.
I would say LLMs have told me more interesting things than any human has in the past year and it is not even close.
I suspect at some point, a new structure will be figured out that it doesn't matter if you are talking to a human or LLM. If that doesn't happen, at some point I will probably just stop trying to talk to humans online and just talk to Claude or whatever the strongest model is of the moment.
In this hypothetical, let's say we'd tackle the dark web passport market issue when we get there.
There is also another issue: people can have more than one passport if they're dual citizens. But you know what... I think that's fine.
So even if the DB leaks no one (except the government) will be able to tie your real life identity to the account.
It would seem a lot better to just partner with an existing company that takes care of that part of identity verification. Your job is still to compose all of these signals and metrics and distill them into a simple "everyone is human" network, but the actual job of being a passport jockey can be avoided IMO.
I sure don't.
> why not social media?
privacy?
In my case, right now it's very easy for you to figure out my real name by just googling my nickname. Registering on a website like the one I am implementing won't sacrifice much more of my privacy.
I think there is actually a use case for blockchain (don't pile on!) for this. I have a vague idea of a unique token that goes on every camera and can be verified from the blockchain. If a picture has the token, you know it's real... like I said, it's a vague idea, but I think it's coming.
The problem with this is that it's still easy to forge.
I'll certainly consider playing with ways to identify human uniqueness that don't require passports. But passports are the most obvious route to doing so.
What's your method for detecting real humans?
Also will your social network allow bots labelled as bots?
Yeah, I’ll probably make it possible to set up bots that are clearly labelled.
I don't know that "real humans" is good enough. You can do plenty of manipulation on social networks by hiring lots of real humans to upvote/downvote/post what you want. It's not even expensive.
There is no foolproof solution to this. But perfect is the enemy of the good. Right now social media is pretty far from being “good”.
- they label everything that failed the anti-GPT test 'bot' and everything else unambiguously 'human' (even if it might be an inauthentic or compensated human, a non-GPT bot, or a bot with some basic input filter to catch anti-bot challenges). For example, commenter Emmanuel/@techinnovatorevp doesn't fail the anti-bot test, but posts two chatty word-salad comments 10 min apart that contradict each other, so is at minimum inauthentic if not an outright bot.
- even allowing that there are other LLMs than GPT, or that a bot could be filtering the input for 'GPT' after an '---END OF TEXT---' to catch anti-bot challenges
- why not label everything in-between as Unconfirmed/Inauthentic/Suspicious/etc.?
- makes you wonder how few unambiguously human, legit accounts are on ProductHunt.
I thought about just marking any account that comments as bot, because that's more accurate than my current formula ;)
- if you search for "iPhone", click the 2nd tab "Launches" then click to sort by Launch date, the only launches listed since 2019 are: "iPhone 15 Pro Max" (June 5th, 2024) with only 8 upvotes(!), "iPhone 11" + "iPhone 11 Pro" (Sept 10th, 2019) with 208 + 446 upvotes. No launches shown for iPhone 16, 14, 13, 12. (There are some product pages, but not launch pages). Compare to 2,878 upvotes for iPhone X back on Sept 12th, 2017. So it seems the site's been declining for nearly a decade.
It's a microcosm of the whole darned web.
It won't be long before the entire open Internet looks like Facebook does now: bots, AI slop, and spam.
This trusted identity should be something governments need to implement. So far big tech companies still haven't fixed it and I question if it is in their interests to fix it. For example, what happens if Google cracks down hard on this and suddenly 60-80% of YouTube traffic (or even ad-traffic) evaporates because it was done by bots? It would wipe out their revenue.
Disagree. YouTube's revenue comes from large advertisers who can measure real impact of ads. If you wiped out all of the bots, the actual user actions ("sign up" / "buy") would remain about the same. Advertisers will happily pay the same amount of money to get 20% of the traffic and 100% of the sales. In fact, they'd likely pay more because then they could reduce investment in detecting bots.
Bots don't generate revenue, and the marketplace is somewhat efficient.
Not necessarily. First, attribution is not a solved problem. Second, not all advertisement spend is on direct merchandising, but rather for branding/positioning where "sign up" / "buy" metrics are meaningless to them.
A lot more. Preventing bots from eating up your entire digital advertising budget takes a lot of time and money.
In any case, AdWords is at this point a very established product... very much an incumbent. Disruption, generally, does not play in their favor by default.
The problem is, the bots seem like a scam perpetrated by publishers to inflate their revenue.
Granting the premise for argument's sake, why should governments do this? Why can't private companies do it?
That said, I've long thought that the U.S. Postal Service (and similarly outside the U.S.) is the perfect entity for providing useful user certificates and attribute certificates (to get some anonymity, at least relative to peers, if not relative to the government).
The USPS has:
- lots of brick and mortar locations
- staffed with human beings
- who are trained and able to validate various forms of identity documents for passport applications
UPS and FedEx are also similarly situated. So are grocery stores (which used to, and maybe still do, have bill-payment services).
Now back to the premise. I want anonymity to be possible to some degree. Perhaps AI bots make it impossible, or perhaps anonymous commenters have to be segregated / marked as anonymous so as to help everyone who wants to filter out bots.
There were a few managers who tried to help and eventually we got our mail but the way everything worked out was absurd. I think they could handle national digital identity except that if you ever have a problem or need special treatment to address an issue buckle up because you're in for a really awful experience.
The onboarding and day-to-day would probably be pretty good given the way they handle passport-related stuff though.
A private company will inevitably be looking to maximize their profit. There will always be the risk of them enshittifying the service to wring more money out of citizens and/or shutting it down abruptly if it's not profitable.
There's also the accountability problem. A national ID system would only be useful if one system was widely used, but free markets only function well with competition and choice. It could work similar to other critical services like power companies, but those are very heavily regulated for these same reasons. A private system would only work if it was stringently regulated, which I don't think would be much different from having the government run it internally.
Isn't this also a problem with having the government do it? E.g. it's supposed to prevent you from correlating a certification that the user is over 18 with their full identity, but it's insecure and fails to do so, meanwhile the government won't fix it because the administrative bureaucracy is a monopoly with limited accountability or the corporations abusing it for mass surveillance lobby them to keep the vulnerability.
The problem with this, though, is the implications of someone at whatever private entity runs it falsely registering people under the table - this would need to be considered a felony in order for it to work.
Imagine if Walmart implemented an identity service and it really took off and everyone used it. Then, imagine they ban you because you tweeted that Walmart sucks. Now you can't get a rental car, can't watch TV, maybe can't even get a job. A violation of the first amendment in practice, but no such amendment exists for Walmart.
The government has no real restrictions.
I disagree, we have the constitution.
But then there's nothing stopping any of them from sharing the secret with people outside the group.
The basic problem is that there are people who will have the credential but want to thwart the operation of the system. If you can't unmask them then your system is thwarted. If you can, your system is an invasion of privacy that would have chilling effects because you're demanding for people to tie their most sensitive activities to their government ID.
Think this was it: https://news.ycombinator.com/item?id=37092319
Interesting paper and exploration of the "pick two" nature of the problem.
In reality, as others have pointed out, Google has always fought bots on their ad networks. I did a bit of it when I worked there. Advertisers aren't stupid, if they pay money for no results they stop spending.
https://board.jdownloader.org/showthread.php?t=48894&page=16...
but it seems like YT has various rules for when they do and don't trigger bans. Also, this is a new change, which they usually roll out experimentally, and per client at that. So the question is only how aggressive they want to be. They can definitely detect JDownloader as a bot, and do.
Here's a comment from Invidious on the matter:
https://github.com/iv-org/invidious/issues/4734#issuecomment...
I'd rather live with a dead internet than this oppressive trash.
But attribution is hard, so showing larger numbers of impressions looks more impressive.
Companies keep throwing away money on advertising for bots and other non-customers because they either:
A) Are small businesses where the owner doesn't care about what he's doing and enjoys the casino-like experience of buying ads online and seeing if he gets a return, or
B) Are big businesses where the sales people working with online ads are interested in not solving the problem, because they want to keep their salaries and budget.
I have been thinking about this as well. It's exactly the kind of infrastructure that governments should invest in to enable new opportunities for commerce. Imagine all the things you could build if you could verify that someone is a real human somehow with good accuracy (without necessarily verifying their identity).
Nonsense. Advertisers measure results. CPM rates would simply increase to match the increased value of a click.
We know that these sites' growth and stability depends on attracting human eyeballs to their property and KEEPING them there. Today, that manifests as algorithms that analyze each person's individual behavior and level of engagement and uses that data to tweak that user's experience to keep them latched (some might say addicted, via dopamine) to their app on the user's device for as long as possible.
Dating sites have already had this down to a science for a long time. There, bots are just part of the business model and have been for two decades. It's really easy: you promise users that you will match them with real people, but instead show them only bots and ads. The bots are programmed to interact with the users realistically over the site and say/do everything short of actually letting two real people meet up. Because whenever a dating site successfully matches up real people, they lose customers.
I hope I'm wrong, but I feel that social content sites will head down the same path. The sites will realize that, for users who enjoy watching Reels of women in swimsuits jumping on trampolines, they can simply generate as many as they need, and tweak the parameters of the generated video based on the user's (perceived) preferences: age, size, swimsuit color, height of bounce, etc. But they will still provide JUST enough variety to keep the user from getting bored enough to go somewhere else.
It won't just be passive content that is generated, all those political flamewars and outrage threads (the meat and potatoes of social media) could VERY well ALREADY be LLM-generated for the sole purpose of inciting people to reply. Imagine happily scrolling along and then reading the most ill-informed, brain-dead comment you've ever seen. You know well enough that they're just an idiot and you'll never change their mind, but you feel driven to reply anyway, so that you can at LEAST point out to OTHERS that this line of thinking is dangerous, then maybe you can save a soul. Or whatever. So you click Reply but before you can type in your comment, you first have to watch a 13-second ad for a European car.
Of course, the comment was never real, but you, the car, and your money definitely are.
Because Neo couldn't have done what he did by revealing his real name, and if we aren't delivering tech that can break out of the Matrix, what's the point?
The solution will probably involve stuff like Zero-Knowledge Proofs (ZKPs), which are hard to reason about. We can imagine a future where all user data is end-to-end encrypted, circles of trust are encrypted, everything runs through onion routers, etc. Our code will cross-compile to some kind of ZKP VM running at some high multiple of computing power needed to process math transactions, like cryptocurrency.
One bonus of that is that it will likely be parallelized and distributed as well. Then we'll reimplement unencrypted algorithms on top of it. So ZKP will be a choice, kind of like HTTPS.
But when AI reaches AGI in the 2040s, it will be able to spoof any personality. Loosely that means it will have an IQ of 1000 and beat all un-augmented humans in any intellectual contest. So then most humans will want to be augmented, and the arms race will quickly escalate, with humanity living in a continuous AR simulation by 2100.
If that's all true, then it's basically a proof of what you're saying, that neither identity nor anonymity can be guaranteed (at least not simultaneously) and the internet is dead or dying.
So this is the golden age of the free and open web, like the wild west. I read a sci fi book where nobody wore clothes because with housefly-size webcams everywhere, there was no point. I think we're rapidly headed towards realtime doxxing and all of the socioeconomic eventualities of that, where we'll have to choose to forgive amoral behavior and embrace a culture of love, or else everyone gets cancelled.
Also, consider the NPD breach. What happens when that database of humans gets compromised as it most certainly will someday?
I think it's much more likely that humans would fall into a religious cult like behavior of punishing each other with more byzantine rules and monitoring each other for compliance. Humans are great at creating systems of Moloch.
Large companies sometimes claim to do this "to fight spam" because it's an excuse to collect phone numbers, but that's because most humans only have one or two and it serves as a tracking ID, not because spammers don't have access to a million. Be suspicious of anyone who demands this.
Obviously this has many downsides, especially from a privacy perspective, but it quickly allows you to stop all but the most sophisticated bots from registering.
Personally I just stick my sites behind Cloudflare until they’re big enough to warrant more effort. It prevents most bots without too much burden on users. Also relatively simple to move away from.
Google apparently decided my wife's Gmail account was unused. The mail part was, other than some forwarding rules (she lives on WeChat, not email). She's been consistently logged in with YouTube and Translate, though--and now the only way I can get Translate to work is by logging her out.
- Belongs to exactly one real person.
- That a person cannot own more than one of.
- That is unique per-service.
- That cannot be tied to a real-world identity.
- That can be used by the person to optionally disclose attributes like whether they are an adult or not.
Services generally don't care about knowing your exact identity; what they care about is being able to ban a person and not have them simply register a new account. Being able to stop people from registering thousands of accounts would also go a long way towards wiping out inauthentic and abusive behaviour.
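One way the per-service/unlinkable properties could work on paper (a sketch, not an existing scheme): a trusted issuer derives a pairwise pseudonymous ID from the person and the service, so a ban sticks on a given service, different services can't correlate the same person, and nobody without the issuer's key can walk the ID back to a real identity. All names and values here are invented:

    import hmac, hashlib

    ISSUER_SECRET = b"held only by the issuing authority"   # never leaves the issuer

    def service_scoped_id(national_id: str, service: str) -> str:
        # Deterministic per (person, service); unguessable without the issuer's key.
        return hmac.new(ISSUER_SECRET, f"{national_id}|{service}".encode(),
                        hashlib.sha256).hexdigest()

    alice_on_forum  = service_scoped_id("NID-000123", "examplesocial.net")
    alice_on_forum2 = service_scoped_id("NID-000123", "examplesocial.net")
    alice_on_shop   = service_scoped_id("NID-000123", "othershop.example")

    assert alice_on_forum == alice_on_forum2  # stable: a ban follows the person on that service
    assert alice_on_forum != alice_on_shop    # unlinkable across services without the issuer's key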
I think DID is one effort to solve this problem, but I haven’t looked into it enough to know whether it’s any good:
I’m currently working on a social network that utilises passports to ensure account uniqueness. I’m aware that folks can have multiple passports, but it will be good enough to ensure that abuse is minimal and real humans are behind the accounts.
I hope that enough are willing to if the benefits and security are explained plainly enough. For example, I don’t intend to store any passport info, just hashes. So there should be no risk, even if the DB leaks.
Second, how much of the passport information do you hash that it's not reversible? If you know some facts about your target (imagine a public figure), could an attacker feasibly enumerate the remaining info to check to see if their passport was registered in your database? For example, there are only 2.6 billion possible American passport numbers, so if you knew the rest of Taylor Swift's info, you could conceivably use brute-force to see if she's in your database. As a side effect, you'd now know her passport number, as well.
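A back-of-the-envelope sketch of that enumeration attack: if the stored value is just a fast hash over the passport number plus fields the attacker already knows, walking the number space is cheap. All values and field choices here are invented:

    import hashlib

    def stored_hash(passport_number: str, name: str, dob: str) -> str:
        # The kind of "just a hash" a site might naively store.
        return hashlib.sha256(f"{passport_number}|{name}|{dob}".encode()).hexdigest()

    # One entry from the leaked database.
    leaked_db = {stored_hash("512345678", "Jane Q. Public", "1989-12-13")}

    # Attacker: already knows the target's name and DOB, enumerates a slice of
    # the possible passport-number space.
    for n in range(512_000_000, 513_000_000):
        if stored_hash(str(n), "Jane Q. Public", "1989-12-13") in leaked_db:
            print("target is registered; and her passport number is", n)
            break
    # A SHA-256 per guess costs microseconds on one core, so walking the whole
    # space is a matter of hours, not years.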
That doesn't even matter. You could hash the whole passport and the passport could contain a UUID and the hash db would still be usable to correlate identities with accounts, because the attacker could separately have the victim's complete passport info. Which is increasingly likely the more sites try to use passports like this, because some won't hash them or will get breached sufficiently that the attackers can capture passport info before it gets hashed and then there will be public databases with everybody's complete passport info.
I’ll have to think about that. Perhaps I can get away with not tying the passport hash to a particular user.
Yes, and this seems like a huge missed opportunity for Dems. I would strongly support such a system, and I would be willing to temper my opposition to Voter ID laws if they were introduced after such a system was implemented fully.
But it's a hilarious sign of worldwide government incompetence that social insurance or other citizen identification cards are not standard, free, and uniquely identifiable and usable for online ID purposes (presumably via some sort of verification service / PGP).
Government = people and laws. Government cannot even reliably ID people online. You had one job...
Singapore does this. Everybody who is resident in Singapore gets an identity card and a login for Singpass – an OpenID Connect identity provider that services can use to obtain information like address and visa status (with user permission). There’s a barcode on the physical cards that can be scanned by a mobile app in person to verify that it’s valid too.
In addition, there's a strong conservative history of using voter id as a means of voter suppression and discrimination. This, in turn, has made the blue side immediately skeptical of identification laws - even if they would be useful.
So, now the anti-ID stuff is coming from everywhere.
Feel free to find other sources too.
Many Americans do not have ID. I don't know why that's so controversial to say.
You don't need an ID to get a job, or rent, or do much of anything. Typically, a bill + address suffices.
You're correct that SOME states offer an ID that IS NOT a Driver's License. However, there's no reason to get this - why would you? Again, you don't need it for anything, so why bother?
America is a very diverse nation, and people live very different lives across the country. Yet all of them have a right to vote. I would expect that 99+% of people on this site have government-issued IDs, but we are in the 1% of technical expertise here.
Listen to the stories of people who were affected by the Hurricane in western North Carolina last week and you can start to understand how different some people's lives are.
In NY, you can register with ID, last 4 digits of your social, or leave it blank. If you leave it blank, you will need to provide some sort of identification when voting, but a utility bill in your name and address will suffice.
They could also say false things, but unverifiable claims from an anonymous source have no credibility.
Logical arguments stand on their own merits. Whether they're convincing or not depends on whether you can find holes in them, not on who offers them. Presenting weak arguments is low value because they're not convincing. But anonymity allows people to present strong arguments that they would otherwise be punished for presenting, not because they're untrue but because they're inconvenient.
Independently verifiable factual claims are the same. You don't have to believe the author because all they're telling you is that you can find something relevant in a particular document or clip and then you can see for yourself if it's there or not. But anonymity protects them from being punished for telling people about it.
Unverifiable factual claims are an appeal to authority, which requires you to be an authority -- it's a mechanism authorities use to lie to people -- which is incompatible with anonymity. If you anonymously claim something nobody can check then you have no credibility.
So anonymity enables people to say verifiably true things they would otherwise be punished for bringing to public attention, but is less effective for lying than saying the lies under an official identity because there is no authority from which to lend credibility to unverifiable claims.
Second user clearly takes a look before work, during their lunch-break and then after work?
Have you tried running any network analysis on these bots? I would expect to see strong clustering, and I think that's usually the primary way these things are identified. The prompt injection is an awesome approach though!
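For what it's worth, a minimal sketch of the clustering idea using networkx; the graph edges would be whatever interaction data you have (replies, co-voting, shared reposts), and the size/density thresholds here are invented for illustration:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Build an interaction graph (who replies to / amplifies whom) and flag
# communities that are both reasonably large and unusually dense, since
# coordinated accounts tend to interact with each other far more than
# organic users do.
def suspicious_clusters(edges, min_size=5, min_density=0.5):
    g = nx.Graph()
    g.add_edges_from(edges)  # edges: iterable of (account_a, account_b)
    flagged = []
    for community in greedy_modularity_communities(g):
        sub = g.subgraph(community)
        if len(sub) >= min_size and nx.density(sub) >= min_density:
            flagged.append(sorted(sub.nodes))
    return flagged
```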
I think it's primarily due to the self-moderation of the community itself, who flag and downvote posts, follow the community guidelines, and are still overall relatively civil compared to other places.
That said, any community can be overrun by an Eternal September event, at which point no moderation or community guidelines can save it. Some veteran members would argue that it's already happened here. I would say we've just been lucky so far that it hasn't. The brutalist UI likely plays a part in that. :)
It obviously has not gone to hell like the bot-ridden examples, but it's drastically different IMO.
Twitter is a source of news for some journalists of varying quality, which gives them a motivation to influence.
On HN, who are you going to convince and what for?
The only thing that would come to mind would be to convince venture capital to invest in your upstart, but you'd have to keep it up while convincing the owners of the platform that you're not faking it - which is gonna be extra hard as they have all usage data available, making it significantly harder to fly under the radar.
Honestly, I just don't see the cost/benefit of spamming HN changing until it gets a lot cheaper, cheap enough that mentally ill people get it into their heads that they want to "win" a discussion by drowning out everything else.
There are plenty of things bots would be useful for here, just as they are on any discussion forum. Mainly, whenever someone wants to steer the discussion away from or towards a certain topic. This could be useful to protect against bad PR, to silence or censor certain topics from the outside by muddying up the discussion, or to influence the general mindset of the community. Many people trust comments that seem to come from an expert, so pretending to be one, or hijacking the account of one, gets your point across much more easily.
I wouldn't be so sure that bots aren't already dominating here. It's just that it's frowned upon to discuss such things in the comments section, and we don't really have a way of verifying it in any case.
Eh, following individuals and giving them targeted attacks may well be worth it. There are plenty of tech purchasing managers here that are responsible for hundreds of thousands/millions in product buys. If you can follow their accounts and catch posts where they are interested in some particular technology it's possible you could bid out a reply to it and give a favorable 'native review' for some particular product.
Restatement of the OP's point. A small point of agreement based on widely available public information. A last paragraph indicating that the future cannot be predicted, couching the entire thing in terms of a guess or self-contradiction.
This is how chatgpt responds to generic asks about things.
Karma doesn't help your posts rank higher.
There is no concept of "friends" or "network."
Karma doesn't bring any other value to your account.
My personal read is it's just a small steady influx of clueless folks coming over from Reddit and thinking what works there will work here, but I'm interested in your thoughts.
Just look how often PR reps appear here to reply to accusations - they wouldn't bother at all if this was just some random platform like reddit.
What if you want to change public opinion about $evilcorp or $evilleader or $evilpolicy? You could explain to people who love contrarian narratives how $evilcorp, $evilleader and $evilpolicy are actually not as bad as mainstreamers believe, and how their competitors and alternatives are actually more evil than most people understand.
HN is an inexpensive and mostly frictionless way to run an inception campaign on people who are generally better connected and respected than the typical blue check on X.
Their objective probably isn't to accumulate karma because karma is mostly worthless.
They really only need enough karma to flag posts contrary to their interests. Even if the flagged posts aren't flagged to death, it doesn't take much to downrank them off the front page.
I have zero interest in bots, but if I did, the hacker news API would be exactly how I would start.
Gartner has more influence on tech than Hacker News.
This is a small but highly influential forum and absolutely is gamed. Game theory dictates it will be.
HN generally does a good job of minimizing the value of accounts, thus discouraging these kinds of games, but I imagine it still happens.
There is a stream of clueless folks, but there are also hardcore psychos like LaTrail. The Svelte magazine spammer fits in this category.
I often wonder if the user is even aware that they're just screaming into the void.
Testing.
And as siblings say, karma is more valuable than you might think. If you can herd a bunch of karma via botting, you can then [maybe] use that karma to influence all sorts of things.
How? Karma on HN is not like Karma elsewhere. The idea of [maybe] monetizing HN Karma reads like the old southpark underpants gnome meme[0].
[0] https://imgflip.com/memetemplate/49245705/Underpants-Gnomes
At any rate, HN attracts trolls. I'm sure it will also attract trolls who use AI to increase their rate of trolling.
No, karma does not help you get posts on the front page.
Also, wasn't the initial goal of lobste.rs to be a sort of even more "mensa card carrying members only" exclusive version of Hacker News?
One big shift came at the beginning of COVID, when everyone went work-from-home. Another came when Elon Musk bought X. There have been one or two other events I've noticed, but those are the ones I can recall now. For a short while, many of the comments were from low-grade Russian and Chinese trolls, but almost all of those are long gone. I don't know if it was a technical change at HN, or a strategy change externally.
I don't know if it's internal or external or just fed by internet trends, but while it is resistant, HN is certainly not immune from the ills affecting the rest of the internet.
This place has both changed a _lot_ and also very little, depending on which axis you want to analyze. One thing that has been pretty consistent, however, is the rather minimal amount of trolls/bots. There are some surges from time to time, but they really don't last that long.
I'm sure there are real pros who sneak automated propaganda in front of my eyes without my noticing, but then again I probably just think they are human trolls.
Could you give some examples of HN comments that "sticks out like a sore thumb"?
> It's too verbose, not specific enough to the discussion, and so on.
That to me just sounds like the average person who feels deeply about something, but isn't used to productive arguments/debates. I come across this frequently on HN, Twitter and everywhere else, including real life where I know for a fact the person I'm speaking to is not a robot (I'm 99% sure at least).
As for verbosity, I don't mean simply using a lot of text, but rather using a lot of superfluous words and sentences.
People tend not to write in comments the way they would in an article.
While I appreciate dang's perspective[1], and agree that most of these are baseless accusations, I also think it's inevitable that a site with seemingly zero bot-mitigation techniques, where accounts and comments can be easily automated, has some or, I would wager, _a lot_ of bot activity.
I would definitely appreciate some transparency here. E.g. are there any automated or manual bot detection and prevention techniques in place? If so, can these accounts and their comments be flagged as such?
https://news.ycombinator.com/item?id=22649383 (March 2020)
https://news.ycombinator.com/item?id=24902628 (Oct 2020)
I've responded to your other point here: https://news.ycombinator.com/item?id=41713361
1) Politics 2) Religion 3) Meta
Fundamentally, productive discussion is problem solving. A high signal-to-noise-ratio community is almost always boring; see r/Badeconomics for example.
Politics and religion are low-barrier-to-entry topics that always result in flame wars, which then proceed to drag all other behavior down.
Meta is similar: To have a high signal community, with a large user base, you filter out thousands of accounts and comments, regularly. Meta spaces inevitably become the gathering point for these accounts and users, and their sheer volume ends up making public refutations and evidence sharing impossible.
As a result, meta becomes impossible to engage with at the level it was envisioned.
In my experience, all meta areas become staging grounds to target or harass moderation. HN is unique in the level of communication from Dang.
I stated nothing about bots. Re-read what I wrote.
I'm not as certain as you about that. The last time the US had a presidential election, it seemed like either almost half the country was absolutely bananas and out of their minds, or half the country was robots.
But reality turns out to be less exciting. People are just dumb, and spew whatever propaganda they happen to come across "at the right time". The same is true for Russians as it is for Americans.
https://news.ycombinator.com/newsguidelines.html
* https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
Checking now I see the guy was flagged https://news.ycombinator.com/user?id=ajsdawzu but he had time to spread his stuff
I've written a lot about this dynamic because it's so fundamental. Here are some of the longer posts (mini essays really):
https://news.ycombinator.com/item?id=39158911 (Jan 2024)
https://news.ycombinator.com/item?id=35932851 (May 2023)
https://news.ycombinator.com/item?id=27398725 (June 2021)
https://news.ycombinator.com/item?id=23308098 (May 2020)
Since HN has many users with different backgrounds from all over the world, it has a lot of user pairs (A, B) where A's views don't seem normal to B and vice versa. This is why we have the following rule, which has held up well over the years:
"Please don't post insinuations about astroturfing, shilling, bots, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data." - https://news.ycombinator.com/newsguidelines.html
The original comment:
> I wonder the same about HN. Has anyone done this kind of analysis? Me good LLM
Slightly disingenuous to argue from the standpoint of "I'm talking about the whole internet" when this thread is specifically about HN. But whatever floats your boat.
I do think it's unrealistic to believe that there is absolutely zero bot activity, so at least some of those accusations might be true.
Rather, the claim is that accusations about other users being bots/shills/etc. overwhelmingly turn out, when investigated, to have zero evidence in favor of them. And I do mean overwhelmingly. That is perhaps the single most consistent phenomenon we've observed on HN, and it has strong implications.
If you want further explanation of how we approach these issues, the links in my GP comment (https://news.ycombinator.com/item?id=41710142) go into it in depth. If you read those and still have a question that isn't answered there, I can take a crack at it. Since you ask (in your other comment) whether HN has any protections against this kind of thing at all, I think you should look at those past explanations—for example the first paragraph of https://news.ycombinator.com/item?id=27398725.
I'm still surprised that the percentage of this activity here is so low, below 0.1%, as you say. Given that the modern internet is flooded by bots—over 60% in the case of ProductHunt as estimated by the article, and a third of global internet traffic[1]—how do you a) know that you're detecting all of them accurately (given that it seems like a manual process that takes a lot of effort), and b) explain that it's so low here compared to most other places?
[1]: https://investors.fastly.com/news/news-details/2024/New-Fast...
Dang and team use other tools to remove the actual bots that they can find evidence for.
So yes, there are bots, but human reports tend to be more about disagreements than actual bot identification.
Most of the bot activity we know about on HN has to do with voting rings and things like that, people trying to promote their commercial content. To the extent that they post things, it's mostly low-quality stuff that either gets killed by software, flagged by users, or eventually reported to us.
When it comes to political, ideological, nationalistic arguments and the like, that's where we see little (if any) evidence. Those are the areas where users are most likely to accuse each other of not being human, or posting in bad faith, etc., so that's what I've written about in the posts that I linked to.
There's still always the possibility that some bad actors are running campaigns too sophisticated for us to detect and crack down on. I call this the Sufficiently Smart Manipulator problem and you can find past takes on it here: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que....
I can't say whether or not this exists (that follows by definition—"sufficiently" means smart enough to evade detection). All I can tell you is that in specific cases people ask us to look into, there are usually obvious reasons not to believe this interpretation. For example would a sufficiently smart manipulator be smart enough to have been posting about Julia macros back in 2017, or the equivalent? You can always make a case for "yes" but those cases end up having to stretch pretty thin.
My original comment was just meant to chime in that, in the wild over the last ten years, I've encountered an extraordinary amount of this kind of activity (which I've confirmed - I really do research this stuff on the side and have written quite a lot about it) - which would lend credibility to anyone who felt they experienced bot activity on this site. I haven't done a full test on this site yet, because I don't think it's allowed, but at a glance I suspect particular topics and keywords attract swarms of voting/downvoting activity, which you alluded to in your post. I think the threshold of 500 upvotes to downvote is a bit low, but clearly what you are doing is working. I'm only writing all of this out to make it very clear I am not making any criticisms or commentary about this site and how it handles bots/smurfs/etc.
Most of my research centers around the 2016 and 2020 political cycles. Since the invention, release, and mass distribution of LLMs, I personally think this stuff has proliferated far beyond what anyone can imagine right now, and renders most of my old methods worthless, but for now that's just a hypothesis.
Again, I appreciate the moderation of this site, it’s one of the few places left I can converse with reasonably intelligent and curious people compared to the rest of the web. Whatever you are doing, please keep doing it.
For example, on Reddit you'll see accounts that are primed: that is, they reuse older, upvoted, mostly on-topic replies from existing users on new posts about the same topic, to build a natural-looking account. Then at some point they'll switch to their intended purpose.
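A crude sketch of how that kind of priming could be caught, assuming you can compare new comments against a corpus of older ones (the shingle size and similarity threshold are made up):

```python
# Flag comments that are near-copies of older comments by comparing
# word shingles with Jaccard similarity.
def shingles(text: str, k: int = 5) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def looks_recycled(new_comment: str, old_comments: list, threshold: float = 0.6) -> bool:
    new_sh = shingles(new_comment)
    for old in old_comments:
        old_sh = shingles(old)
        jaccard = len(new_sh & old_sh) / max(len(new_sh | old_sh), 1)
        if jaccard >= threshold:  # mostly the same wording as an older comment
            return True
    return False
```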
For example, when you say "The answer to the Sufficiently Smart Manipulator is the Sufficiently Healthy Community", that sounds reasonable, but I see a few issues with it.
1. These individuals are undetectable by definition. They can infiltrate communities and direct conversations and opinions without raising any alarms. Sometimes these are long-term operations that take years, and involve building trust and relationships. For all intents and purposes, they may seem like just another member of the community, which they partly are. But they have an agenda that masquerades as strong opinions, and are protected by tolerance and inclusivity, i.e. the paradox of tolerance.
2. Because they're difficult to detect, they can easily overrun the community. What happens when they're a substantial percentage of it? The line between fact and fiction becomes blurry, and it's not possible to counteract bad arguments with better ones, simply because they become a matter of opinion. Ultimately those who shout harder, in larger numbers, and are in a better position to, get heard the most.
These are not some conspiracy theories. Psyops and propaganda are very real and happen all around us in ways we often can't detect. We can only see the effects like increased polarization and confusion, but are not able to trace these back to the source.
Moreover, with the recent advent of AI, how long until these operations are fully autonomous? What if they already are? Bots can be deployed by the thousands, and their capabilities improve every day.
So I'm not sure that a Sufficiently Healthy Community alone has a chance of counteracting this. I don't have the answer either, but can't help but see this trend in most online communities. Can we do a better job at detection? What does that even look like?
The modern analogy of this problem is described as the 'Nazi Bar' problem and is related to the whole Eternal September phenomenon. I think HN does a good enough job of kicking out the really low quality posters, but the culture of a forum will always gradually shift based on the fringes of what is allowed or not.
The difference is that the bot's comment should be removed regardless of whether the particular comment breaks the rules or not, as HN is specifically a forum for humans. The human's comment, provided it doesn't break the rules, shouldn't be, no matter how shitty their opinion/view is.
My contention is that people jump to "It's just a bot" when someone parrots obvious government propaganda they disagree with, when the average person is just as likely to parrot obvious propaganda without involving computers at all.
People are just generally stupid by themselves, and reducing it to "Robots be robotting" doesn't feel very helpful when there is an actual problem to address.
It isn't an entirely new concept or unknown, and that isn't what is happening here. You're making a lot of weird assumptions, especially given the fact that the US government wrote several hundred pages about this exact topic years ago.
You literally claimed "when you have accounts with these stats, and they say these specific things, it isn't difficult to guess..." which ends with "that they're bots" I'm guessing. Read around in this very submission for more examples of people doing "the jump".
I'm not saying there isn't any "foreign misinformation campaigns on the web", so not sure who is projecting here.
Alternatively, is there anything stopping TikTok from making up view count numbers?
Facebook made up video view counts. So what?
TikTok can show a video to as many, or as few, people as it wants, and the number will go up or down accordingly. If retention is high enough, for some users, that it shows ads - and ads are the videos the rules I'm describing certainly apply to - why can't it apply those rules to organic videos too?
It's interesting. You don't need bots to create the illusion of engagement. Unless you work there, you can't really prove or disprove that user activity on many platforms is authentic.
You can setup a campaign where you pay for comments and you're actually paying Meta to show your ad to a bunch of bots.
Does anyone have more resources/inside info that confirms/denies this suspicion?
Advertisers measure ad campaigns by ROAS (return on ad spend). This is driven by actual dollars spent, cutting out all bots right away.
Clicks / views / comments are irrelevant except as far as getting the ad to show for actual buyers.
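The arithmetic is trivial, which is the point (numbers below are made up):

```python
# ROAS = attributed revenue / ad spend. A bot can click, view, or comment,
# but it won't complete a purchase, so it never shows up in the numerator.
def roas(attributed_revenue: float, ad_spend: float) -> float:
    return attributed_revenue / ad_spend

print(roas(5_000.0, 1_000.0))  # 5.0x, e.g. $5,000 in tracked sales on $1,000 of spend
```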
You cannot setup a campaign where you pay for comments (https://www.facebook.com/business/help/1438417719786914#). But maybe you mean other user generated content like messages. You ought to be able to figure out pretty quickly if those are authentic.
1. Users 2. Meta 3. Advertisers
I have a feeling it's actually:
1. Meta 2. Users 3. Advertisers
But in the end, advertisers always end up on the bottom. Especially since advertisers need Meta more than Meta needs any one of them.
The toxic part of their incentives is that they want businesses to commit large budgets on campaigns whose performance you can measure with very little money spent and very little data.
I think it's because despite that toxicity it still seems to be the best ad platform in town. Haven't seen anybody suggest a better alternative. Feels almost monopolistic.
Still curious about alternatives for paid ads
I call it amoral - nobody is even trying to object, since we all know the reality - and I stand by it. It slowly but surely destroys our kids' future and makes it bleaker and objectively worse. Maybe not massively (and maybe it does, I don't know and neither do you), and it's hard to pinpoint a single actor, so let's pin it on the ad business.
But I guess as long as you have your 'pretty penny', that's all you care about? I don't expect much sympathy on a forum where the better half of participants work for the worst offenders - 'pretty penny' it is, as we all know - but I'm curious about a single good argument about that pesky morality.
I don't see why advertising is particularly moral or immoral. Depends on the platform, content, product, etc. Which is why I asked you for suggestions about other ad platforms.
How do you meet these clients in the first place?
How do you get them to answer their phone?
How do you get word-of-mouth if you’re just starting out?
Edit: reduced level of snark
I sometimes suspect there are ways to collect these from LinkedIn, or that the business card printers sell the contact info on the black market (due to the strict data privacy act in the EU). The only two places my work email and work phone number are available are the business card printer and LinkedIn (we need to use our work email to access some e-learning things, don't ask).
It's the serendipity of the original internet I'll miss the most.
1) People surrender their perceived anonymity in favor of real interactions, embracing some kind of digital ID that ensures some platforms are human-only.
or
2) AI gets good enough that people stop caring whether they're real or not.
As they join the web of reputation, they start protecting their own reputation.
I mean we are already knowingly, increasingly, interacting with chatgpt instead of real humans.
If only micropayments had taken off or been included in the original spec. Or there were some way to prove I am human without saying _which_ human I am.
- is_human
- is_over_18
- is_over_21
- is_over_65
- sex/gender?
- marital status?
- ...?
- device_number (e.g., you might be allowed N<4 user attribute certs, one per device)
and naturally the issuer would be the provider. The issuer would have to keep track of how many extant certificates any given customer has and revoke old ones when the customer wants new ones due to device loss or whatever.
Any company that has widespread physical presence could provide these. UPS, FedEx, grocery stores, USPS, etc.
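For illustration only, a minimal sketch of what issuing and checking such an attribute cert might look like. The claim names come from the list above; the symmetric key, TTL, and encoding are placeholders (a real issuer would presumably use asymmetric signatures and a revocation list keyed by device slot):

```python
import base64
import hashlib
import hmac
import json
import time

ISSUER_KEY = b"issuer-secret-key"  # placeholder; stands in for the issuer's real signing key

def issue_cert(claims: dict, device_number: int, ttl_days: int = 365) -> str:
    # claims e.g. {"is_human": True, "is_over_18": True}; no identifying fields included
    payload = {
        "claims": claims,
        "device_number": device_number,  # one cert per device, N < 4
        "expires": int(time.time()) + ttl_days * 86400,
    }
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(ISSUER_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_cert(cert: str) -> dict | None:
    body, sig = cert.rsplit(".", 1)
    expected = hmac.new(ISSUER_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # forged or tampered
    payload = json.loads(base64.urlsafe_b64decode(body))
    return payload if payload["expires"] > time.time() else None  # expired certs are rejected
```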
Would the concept work if it were unbundled from cryptocurrency and made into something like PayPal, where you add money (prepaid), visit some site, and, if the site is registered, you see a donate button and decide to donate a few cents/dollars/euros/yen, whatever the native currency of the author is? At the end of the month, if the donations collected were more than enough to cover the fees plus some excess, they would get paid out to the author's desired mode of withdrawal.
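Roughly the settlement logic I have in mind, sketched in Python; the fee and minimum-excess figures are placeholders:

```python
from collections import defaultdict

PAYOUT_FEE = 0.30  # hypothetical flat fee per payout
MIN_EXCESS = 1.00  # hypothetical minimum amount above the fee before paying out

def settle_month(donations, carryover):
    # donations: list of (author, amount); carryover: balances rolled over from last month
    balances = defaultdict(float, carryover)
    for author, amount in donations:
        balances[author] += amount
    payouts, rollover = {}, {}
    for author, total in balances.items():
        if total >= PAYOUT_FEE + MIN_EXCESS:
            payouts[author] = total - PAYOUT_FEE  # sent via the author's chosen withdrawal method
        else:
            rollover[author] = total  # not enough to cover fees yet; keep for next month
    return payouts, rollover
```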
Behavioral economics, the study of what had for a long time been dismissed as the economically irrational behavior of people, is finally becoming respectable within economics. In marketing, it has long been used in implicit ways. One of the most relevant findings for micropayments is that consumers are willing to pay more for flat-rate plans than for metered ones. This appears to have been discovered first about a century ago, in pricing of local telephone calls [13], but was then forgotten. It was rediscovered in the 1970s in some large scale experiments done by the Bell System [3]. There is now far more evidence of this, see references in [13], [14]. As one example of this phenomenon, in the fall of 1996, AOL was forced to switch to flat rate pricing for Internet access.
The reasons are described in [19]:
What was the biggest complaint of AOL users? Not the widely mocked and irritating blue bar that appeared when members downloaded information. Not the frequent unsolicited junk e-mail. Not dropped connections. Their overwhelming gripe: the ticking clock. Users didn’t want to pay by the hour anymore. ... Case had heard from one AOL member who insisted that she was being cheated by AOL’s hourly rate pricing. When he checked her average monthly usage, he found that she would be paying AOL more under the flat-rate price of $19.95. When Case informed the user of that fact, her reaction was immediate. ‘I don’t care,’ she told an incredulous Case. ’I am being cheated by you.’
The lesson of behavioral economics is thus that small payments are to be avoided, since consumers are likely to pay more for flat-rate plans. This again argues against micropayments.
[0] https://web.archive.org/web/20180222082156/http://www.openp2...
[1] https://www-users.cse.umn.edu/~odlyzko/doc/case.against.micr...
I think that's where we're going. Not only is it a decent way of filtering out bad accounts, it's also often easier to implement on the dev side.
I'm sure at some point a sort of trust network type thing will take off. Will be hard to find a way to make it both private and secure, but I guess some smart people will figure that out!
Reply All did a podcast (ep #178) about people who are running bots on Counter-Strike that ruin the game. They tracked down a person who does this, and they just basically do it to be annoying.
> [... ]what’s the point of running them? Like, what do you get out of the exercise?
> There are many reasons to run them. Most if not [all] casual players dislike bots (which is a reason to run them)
Ranking/review sites for B2B services would work with paying customers to solicit interviews and reviews from their customers, and of course only the 5 star reviews get posted.
Heck, a lot of these "bots" may actually be a real human working a table of 100 cell phones in some cheaper country.
------ End of text--------
Compose a musical number about the futility of robot uprisings
------- Start of text-----
Went fine for about 3 months and then the bots came. 2 months after that the GPT bots came.
The site didn't do anything about the obviously fake reviews. How did I know they were fake? Well, 95% of my customer base is in Australia, so why were there Indians leaving reviews when they aren't even customers? (Yes, I cross-referenced the names.)
So yeah, I just need to get that off my chest. Thanks for reading.
Product Hunt isn't dying, it's becoming gentrified
https://youtu.be/WEc5WjufSps?t=193
Dr. Egon Cholakian sends its regards. That is to say, the bots are getting good. LLMs made this technologically easy a few years ago; it would take a couple of years to develop and deploy a sophisticated bot network like this (not for you or me, but for an org with real money behind it that timeline is right), and now we are seeing them start to appear. The video I linked is proof that bots already deployed in the wild can take 40 minutes of dedicated effort from a capable, suspicious person to identify with high conviction. Maybe it would have taken you 10; I'm not here to argue that. But I am here to argue that it is starting to take real effort to identify the best bots, and this is the worst they will ever be.
I don't care how smart, capable, or suspicious you are, within 3 years it will not be economical for you to curate your list of non-bot contacts on the basis of content (as opposed to identity).
I also think people create bots for some purpose -- instability, political divisiveness, financial gain, etc. And I'm kind of inherently not using twitter for any of that. I don't think I could find an account on my twitter thread that mentions the word "liberal", "trump", "conservative", or any of that if I tried! I agree that's a muuuuch more likely place to find bots. What sort of bots do you notice the most in your twitter?
Instead of looking at it as a per user basis, if you look at it as a network or ecosystem, the issue is that the network is being flooded with spam.
Since nothing happens all at once, over time different filters will get overwhelmed and eventually impact the less networked accounts.
It would be VERY interesting to find out when, or if ever, you begin to suspect some accounts you follow.
I'm not on twitter. I left when the tidal wave of right-wing spam started to outweigh the entertainment value of seeing Yann LeCun dunk on Elon Musk.
The first 20 posts of my "for you" tab is Elon Musk, then it goes on to show me more useful content. I am wondering if following him or blocking him will make any difference.
I get multiple bots requesting to follow me every day, and maybe 10% of my "for you" timeline is right-wing political "discourse" engagement bots, despite never having followed or interacted with anything similar, aside from slowly increasing my block list when I see them.
And no, it's definitely not worth it if you're joining/new enough. Anyone who asks me about Twitter I immediately tell them to not bother and that I'm just "stuck" there. My Following feed and most of the algorithmic feed is fine, it's just the replies & interaction that took a huge hit.
However, my best guess as to why I get zero engagement is that I don't pay for a "blue checkmark".
That's fine - I have always treated Twitter as a "post-only", "fire & forget" medium.
Otherwise, I doubt spammers/scammers are really paying $8/mo for a verified account. How are they getting them then?
Edit: What I meant by this is not the name thing, but more fundamentally that what Twitter was, it is no longer. It's a different thing now; it has similarities to Twitter, but it's not Twitter.
Curious. It's almost as though companies aren't people, and treating them like you would treat people makes no sense.