Some recent examples from my history:
what video formats does mastodon support? https://www.phind.com/search?cache=jpa8gv7lv54orvpu2c7j1b5j
compare xfs and ext4fs https://www.phind.com/search?cache=h9rmhe6ddav1bnb2odtchdb1
on an apple ][ how do you access the no slot clock? https://www.phind.com/search?cache=w4cc1saw6nsqxyige7g3wple
The answers aren't perfect. But they are a good gloss and then the links to web sources are terrific. ChatGPT and Claude aren't good at that. Bing CoPilot sort of is but I don't like it as much.
FWIW my query for your question gives me a pretty good answer. The first list has three options, one of which is soldered (and the answer says so). It narrows it down to unsoldered ones when I ask.
https://www.phind.com/search?cache=kukryw72yutlp4u88nubmjuu
This answer is mostly good because it relies heavily on an iFixit article that it provides as the first reference. That's what I like about using Phind, it's as much a search engine as an oracle.
https://www.ifixit.com/News/35377/which-wireless-earbuds-are...
Google seems to be better at this, giving me the Fairbuds directly: https://imgur.com/a/7En4e9u
Many times the answers seemed to get more and more incomplete or incorrect as time went on (across a variety of questions over a period of months). Even worse, it would sometimes say it couldn't find the answer, yet the answer was right there in one of the sites it listed as references!
I've ended up mostly resorting to Bing and GPT-4o. Frankly, I'm hesitant to waste time trying this new version.
Kagi does the opposite: it's mostly search results but sometimes an AI gives you a "Quick Answer" too.
I sincerely apologize for my earlier response. Upon reviewing the search results provided, I realize I made an error in referencing those specific studies. The search results don't contain any relevant information for the claims I mentioned earlier. As an AI assistant, I should be more careful in providing accurate and supported information. Thank you for bringing this to my attention. In this case, I don't have reliable references to support those particular statements about software tools and their impact on developer experience and software quality.
Not sure about this...
I then gave it the 3 lines of code of a route I'm using in Laravel and asked it how to implement fragment usage, where the parameter in the URL determines the fragment returned.
Route::get('/vge-frags/{fragment}', function ($fragment) { return view('vge-fragments'); });
It told me to make sure I have the right view created (which I did) and that was a good start. Then...
It recommended this?
Route::get('/vge-frags/{fragment}', function ($fragment) { return fragment($fragment); });
I immediately knew it was wrong (but somebody looking to learn might not). So I had to ask it: "Wait, how does the code know which view to use?"
Then it gave me the right answer.
Route::get('/vge-frags/{fragment}', function ($fragment) { return view('vge-fragments')->fragment($fragment); });
I dunno. It's really easy to find edge cases with any of these models and you have to essentially question everything you receive. Other times it's very powerful and useful.
I mean, this is an unsolvable problem with chat interfaces, right?
If you use a plugin that's integrated with tooling that checks whether generated code compiles / passes tests / whatever, a lot of this kind of problem goes away.
Generally speaking these models are great at tiny self contained code fragments like what you posted.
It’s longer, more complex, logically difficult things with interconnected parts that they struggle with; mostly because the harder the task, the more constraints have to be simultaneously satisfied; and models don’t have the attention to fix things simultaneously, so it’s just endless fix one thing / break something else.
So… at least in my experience, yes, but honestly, for a trivial fragment like that it's fine most of the time, especially for anything you can easily write a test for.
To be fair, I don't expect these AI models to give me perfect answers every time. I'm just not sure people are vigilant enough to ask follow up questions that criticize how the AI got the answers to ensure the answers come from somewhere reasonable.
I hate this kind of thing so much.
"I am so sorry and heartbroken about having suggested that to play a sound you should use the, as you now inform me, non existing command and parameter `oboe --weird-format mysound.snd`, I'll check my information more thoroughly next time and make sure it will not happen again"...
Are you ok?
When a web site says "Sorry, page not found" do you start punching your monitor?
When the delivery guy leaves a note saying "Sorry we missed you" do you go to the depot to beat up the employees?
I think you are on the right track to understanding what they meant.
The use of 'sorry' is not generally a problem, because it is normally framed within expected behaviour and can be taken as an adequate, or at least not blatantly false, representation. But you can imagine scenarios in which the term is misused as inappropriate formality or manipulation, and yes, disrespect "elicits violence". You normally find a way, in the situation, to avoid violence - but that is another story.
In "sorry, page not found", 'sorry' is the descriptor for a state (i.e. "not the better case"); in "sorry we missed you" it is just courtesy - and it does not generally cover fault or negligence. But look: there are regions that adopt "your call is important to us", and regions that tend to avoid it - because the suspicion that it is inappropriate (false) can be strong.
The outputs of the LLMs I have used frequently cross that threshold, and possibly their structural engineering does too. If you had in front of you a worker, in flesh and bone, whose output was plausible fiction ("I imagined a command `oboe` because it sounded good in the story") rather than an answer to your question, yet delivered under the veneer of answering questions (which implies outputting relevant, truth-based assessments of the world), that would be a proper "sore" for "sorry". The anthropomorphic features of LLMs compromise the quality of their outputs in terms of form, especially in solution-finding attempts that become loops of "This is the solution" // "Are you sure?" // "Definitely" // "It is not" // "Oh, I'm so sorry! It will not happen again. This is the solution" (loop...).
Edit: it seems you may have also asked for clarification about the contextual expression «clean societies». Those are societies that are cybernetically healthy, in which feedback mechanisms work properly to fine-tune general mechanisms - with particular regard to fixing individual, then collective, behaviour.
Ctrl+F for "Central nervous system":
https://en.wikipedia.org/wiki/List_of_human_cell_types
Choose any five wikilinks. Skim their distinct functions and pathologies:
https://en.wikipedia.org/wiki/List_of_regions_in_the_human_b...
https://en.wikipedia.org/wiki/Large-scale_brain_network
Evolution's many things, but maybe most of all lazy. Human intelligence has dozens of distinct neuron types and at least hundreds of differentiated regions/neural subnetworks because we need all those parts in order to be both sentient and sapient. If you lesion parts of the human brain, you lose the associated functions, and eventually end up with what we'd call mental/neurological illnesses. Delusions, obsessions, solipsism, amorality, shakes, self-contradiction, aggression, manipulation, etc.
LLMs don't have any of those parts at all. They only have pattern-matching. They can only lie, because they don't have the sensory, object permanence, and memory faculties to conceive of an immutable external "truth"/reality. They can only be hypocritical, because they don't have the internal identity and introspective abilities to be able to have consistent values. They cannot apologize in substance, because they have neither the theory of mind and self-awareness to understand what they did wrong, the social motivation to care, nor the neuroplasticity to change and be better. They can only ever be manipulative, because they don't have emotions to express honestly. And I think it speaks to a not-atypical Silicon Valley arrogance to pretend that they can replicate "intelligence", without apparently ever considering a high-school-level philosophy or psychology course to understand what actually lets human intelligence tick.
At most they're mechanical psychopaths [1]. They might have some uses, but never outweighing the dangers for anything serious. Some of the individuals who think this technology is anything remotely close to "intelligent" have probably genuinely fallen for it. The rest, I suppose, see nothing wrong because they've created a tool in their own image…
[1]: I use this term loosely. "Psychopathy" is not a diagnosis in the DSM-V, but psychopathic traits are associated with multiple disorders that share similar characteristics.
https://github.com/mukhal/intrinsic-source-citation
This is not something that can be added with LoRA fine-tuning after the pretraining step.
What we need is a human-curated benchmark for different types of source-aware training, to allow competition, and an extra column in the most popular leaderboards (counted in the Average column) to incentivize AI companies to train in a source-aware way. Of course, this would instantly lift the black-box veil LLM companies love to hide behind so as not to credit original authors and content creators; they prefer regulators to believe such a thing cannot be done.
In the meantime, such regulators are not thinking creatively and are clearly just looking for ways to tax AI companies, hiding behind copyright complications as an excuse to tax the flow of money wherever they smell it.
Source aware training also has the potential to decentralize search!
But I find the anthropomorphization and "AGI" narrative really creepy and grifty. Such a waste that that's the direction it's going.
And I wouldn't say lazy at _all_. I would say efficient. Even evolutionary features that look "bad" on the surface can still make sense if you look at the wider system they're a part of. If our tailbone caused us problems, then we'd evolve it away, but instead we have a vestigial part that remains because there are no forces driving its removal.
But the issue is calling laboratory partials finished products. "Oh look, they invented a puppet" // "Oh, nice!" // "It's alive..."
In terms of people thinking LLMs are smarter than they really are, well... that's just people. Who hate each other for skin colour and sexuality, who believe that throwing salt over your shoulder wards off bad luck; we're still biological at the end of the day, we're not machines. Yet.
That is definitely not true.
In the context of the comment chain I replied to, and the behaviour in question, any statement by an LLM pretending to be capable of self-awareness/metacognition is also necessarily a lie. "I should be more careful", "I sincerely apologize", "I realize", "Thank you for bringing this to my attention", etc.
The problem is the anthropomorphization. Since it pretends to be like a person, if you ascribe intention to it then I think it is most accurately described as always lying. If you don't ascribe intention to it, then it's just a messy PRNG that aligns with reality an impressive amount of the time, and words like "lying" have no meaning. But again, it's presented and marketed as if it's a trustworthy sapient intelligence.
Some parts seemingly stopped at "output something plausible", but it does not seem theoretically impossible to direct the output towards "adhere to the truth", if a world model is there.
We would still need to implement the "reason on your world model and refine it" part for the purpose of AGI - meanwhile, fixing the "impersonation" fumble ("probabilistic calculus says your interlocutor should offer stochastic condolences") would be a decent move. After a while with present chatbots it seems clear that "this is writing fiction, not answering questions".
Feels like they were trained with a gun to their heads. If I don't tell it that it doesn't have to answer, it'll generate nonsense in a confident voice.
It turns out that this process makes the model useful at producing mostly sensible predictions (generating output) for text that is not present in the training set (generalization).
The reason that works is that there are a lot of patterns and redundancy in the stuff we feed to the models and the stuff we ask them, so there is a good chance that interpolating between words, and between higher-level semantic relationships across sentences, will make sense quite often.
However that doesn't work all the time. And when it doesn't, current models have no way to tell they "don't know".
The whole point was to let them generalize beyond the training set and interpolate in order to make decent guesses.
There is a lot of research in making models actually reason.
That being said, I'm aware that the model doesn't reason in the classical sense. Yet, as I mentioned, it does give me less confabulation when I tell it it's ok not to answer.
I will note that when I've tried the same kind of prompts with Phi 3 instruct, it's way worse than Gemma. Though I'm not sure if that's just because of a weak instruction tuning or the underlying training as well, as it frequently ignores parts of my instructions.
For example you can confabulate "facts" or you can make logical or coherence mistakes.
Current LLMs are encouraged to be creative and effectively "make up facts".
That's what created the first wow factor. The models are able to write Star Trek fan fiction in the style of Shakespeare. They are able to take a poorly written email and make it "sound" better (for some definition of better, e.g. more formal, less formal, etc).
But then human psychology kicked in: as soon as you have something that can talk like a human and that some marketing folks label as "AI", you start expecting it to be useful for other tasks too, some of which require factual knowledge.
Now, it's in theory possible to have a system that you can converse with which can _also_ search and verify knowledge. My point is that this is not the place where LLMs start from. You have to add stuff on top of them (and people are actively researching that)
Honestly, that's a lot of words and repetition to say "I bullshitted".
Though there are humans that also talk like this. Silver lining to this LLM craze: maybe it'll inoculate us against psychopaths.
Is this true? I feel like most complaints I have and hear about is how inaccurate some of the AI results are. I.e. the mistakes it confidently makes when helping you code.
From hitting enter to a set of relevant answers loaded into your brain, though? Isn't that the goal that should be measured? Against that goal, the two-decade-old approach seems to have peaked over a decade ago, or Phind wouldn't find traction.
For the 20-year-old page rankers, the time from search to a set of correct answers in your brain is approaching "DNF" -- did not finish.
---
PS. Hallucinations or irrelevant results, both require exercising a brain cell. On a percentage basis, there are fewer hallucinations than irrelevant results, it's just that we gave up on SERP confidence ages ago.
You can have a small model that's cost effective to serve, and gives fast responses, but will be wrong half the time.
Or you can have a large model that's slow to run on cheap hardware, but will give more accurate answers. This is usually only fast enough for personal use.
And there's a third option: a large model that's fast and accurate, but you'll have to pay Nvidia/Groq/etc. a small fortune to be able to run it at speed, and probably build a solar power plant to make it cost-effective in power use.
The plan page says: $20/mo for unlimited Phind-405B and Phind-70B searches; daily GPT-4o (500+), Claude 3.5 Sonnet (500+), and Claude Opus (10) uses.
> Phind-405B scores 92% on HumanEval (0-shot), matching Claude 3.5 Sonnet.
Any other benchmarks?
There was one UI-related annoyance with Phind: the scroll bar sometimes jumped randomly, maybe even after each input or during token generation (on Firefox). You start wasting a lot of time if you always need to find the part you were looking at again, or even just scroll back to the bottom.
Primary issue is still that both hallucinate too much when you ask something difficult. But that is the general problem everywhere.
I stubbornly continued to type my complaint about my JSON getting too large for phones with slow CPUs or slow connections, and got 100 solutions to explore. I couldn't help but think this is the worst-case robot overlord: it gave me a year's worth of materials to study, complete with the urge to go do the work. That future we used to joke about is here!
Some of the suggestions are familiar, but I don't have the time to read books with little tidbits of semi-practical information smeared out over countless pages in the wrong order for my use case.
I'm having flashbacks of reading for days, digging through a humongous library only to end up with 5 lines of curl. I still can't tell if I'm a genius or just that dumb.
This long response unexpectedly makes me want to code all day long. One can choose where to go next, which is much more exciting than short linear answers... apparently.
Well done
It has a VS Code extension, so if you use that, it makes some sense. Purely for search, I don't know. IME Phind is not that great with internet access; sometimes people disable the search function to get better answers.
At that level of performance you're probably in the realm of hard edge cases with ambiguous ground truth.
May I ask which models you're seeing the pollution with?
I still have to ask follow up questions to get reasonable results but when I tested earlier this year it was outright failing on most of my test queries.
> What degrees are held by each of the current Fortune 100 CEOs?
> What job did each of the current NFL GMs hold before their current position?
> Which genre would each of the current Billboard Hot 100 songs be considered part of?
> How many recipients of the Presidential Medal of Freedom were born outside of the US?
> Which US car company has the most models in their 2025 line-up across all of their brands?
It can't handle those directly right now.
You need to break the problem down step by step and sort of walk it through gathering the data with follow up questions.
But much better than it used to be.
[0] https://query.wikidata.org/#SELECT%20%28COUNT%28DISTINCT%20%...
The amount of thought required to answer any of those questions is pretty high, especially because they are all sizeable lists. It is going to take a lot of thinking out loud, and detailed training data covering all those items, to do that well.
As far as I know you are running your own GPUs - what do you do under overload? Have a queue system? What do you do under underload? Just eat the costs? Is there a "serverless" system here that makes sense / is anyone working on one?
Serverless would make more sense if we had a significant underutilization problem.
```
const MyClass& getMyClass(){....}
auto obj = getMyClass();
```
this makes a copy, right?
And it was very confident that it does not make a copy. It thinks auto will deduce the type as a const ref and not make a copy, which is wrong: you need auto& or const auto& for that. I asked it if it was sure and it was even more confident. Here is the godbolt output: https://godbolt.org/z/Mz8x74vxe . You can see "copy" being printed, and you can also see that you can call non-const methods on the copied object, which implies it is a non-const type.
I asked the very same question to Phind and it gave the same answer: https://www.phind.com/search?cache=k3l4g010kuichh9rp4dl9ikb
How come two different AIs, one of which is supposed to be specialized in coding, fail in such a confident way?
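For anyone who wants to reproduce this without godbolt, here's a minimal standalone sketch (the MyClass body is my own stand-in, since the original was elided) showing that plain auto deduces MyClass by value and invokes the copy constructor:

```
#include <iostream>

// Hypothetical stand-in for the elided MyClass: it just logs when it is copied.
struct MyClass {
    MyClass() = default;
    MyClass(const MyClass&) { std::cout << "copy\n"; }
    void mutate() {}  // non-const method: callable only on a non-const object
};

const MyClass& getMyClass() {
    static MyClass instance;
    return instance;
}

int main() {
    auto obj = getMyClass();   // auto drops the reference and const -> MyClass, prints "copy"
    obj.mutate();              // compiles, so obj is a mutable copy, not a const reference
    auto& ref = getMyClass();  // auto& deduces const MyClass& -> no copy
    // ref.mutate();           // would not compile: ref is const
    (void)ref;
}
```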
Though to be fair, we will hopefully quickly approach a point where a machine can be much more trusted than a human being, which will be fun. Don't cry about it; it's our collective fault for proving that meat bags can develop ulterior motives.
```
const MyClass& getMyClass(){....}
auto obj = getMyClass();
```
Does this make a copy? Let's think step by step.
This might help you if you're trying to get these models to help you more often.
Would be great to have access to your model in an LLM gateway like https://openrouter.ai/
I would give your API a try at a minimum.
ChatGPT amazed people but its UI didn't. A bunch of better UIs showed up, building on the OpenAI API and ChatGPT's own. They helped people accomplish more, which further marketed the product.
You can get this benefit with few downsides if you make the API simple, with few promises about features: "This is provided AS IS for your convenience and experimentation."
I found even Phind-70B to often be preferable to Claude Sonnet and would commonly opt for it. I’ve been using the 405B today and it seems to be even better at answering.
I’ve found it does depend on the task. For instance, for formatting JSON in the past, GPT-4 was actually the best.
Because you can cycle through the models, you can check the output of each one, to get the best answer.
"Bread and water for everyone" now apparently means "bread for our customers, water for everyone".
Mostly use it for API questions. It's been amazing at Moment.js stuff. Also use it for code optimization and debugging error messages.
I just paste the page link as a query and it tells me what the page is about and even pulls key points.
Any reason why Phind is region-locked? Is there a list of what countries Phind is available in?
One of our vendors insisted on whitelisting the IPs we were going to call them from, and our deployments went through AWS Copilot/Fargate directly to the public subnets. Management had fired the only person with infrastructure experience a few months earlier (very small company), and nobody left knew anything about networking.
Within about a week, Phind brought me from questions like "What is a VPC?" and "What is a subnet?" to having set up the NAT gateways, diagnosed deploy problems, set up the VPC endpoints in AWS' crazy complicated setup, and gotten our app onto the private subnet, routing outbound traffic through the NAT for that vendor.
Yes, it occasionally spit out nonsense (using the free/instant model). I even caught it blatantly lying about its sources once. Even so, once I asked the right questions it helped me learn and execute so much faster than I would have trying to cobble understanding through ordinary google searches, docs, and blog posts.
Strongly recommended if you're ever wading into a new/unfamiliar topic.
The day after this thread hit the front page, I tried perplexity.ai because Phind was overloaded with traffic and not responding. It was OK, but not quite as helpful. Too hard to tell whether that's a fair judgment of the services or just because I don't have as much to ask as I did a week ago, though.
> The impedance of a 22 μH capacitor at 400 THz is approximately 1.80 × 10^-24 Ω.
The correct answer should have been "what the hell are you talking about, dumbass?". Capacitors are not measured in henries, and the question really has no meaning at 400 THz. Another stochastic parrot.
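For what it's worth, a quick back-of-the-envelope check (my own arithmetic, reading the part either literally as a 22 μH inductor or charitably as a 22 μF capacitor) shows that neither interpretation comes anywhere near 10^-24 Ω:

```
#include <cstdio>

int main() {
    const double pi = 3.141592653589793;
    const double f  = 400e12;        // 400 THz
    const double w  = 2.0 * pi * f;  // angular frequency

    const double L = 22e-6;          // "22 uH" read literally as an inductor
    const double C = 22e-6;          // or charitably as a 22 uF capacitor

    std::printf("|Z_L| = w*L     = %.3g ohm\n", w * L);         // ~5.5e10 ohm
    std::printf("|Z_C| = 1/(w*C) = %.3g ohm\n", 1.0 / (w * C)); // ~1.8e-11 ohm
}
```

The mantissa in the quoted answer is suspiciously close to the capacitor reading (~1.8 × 10^-11 Ω), so it looks like the model may have mangled the exponent on top of ignoring the unit mismatch.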
It then explains that in practice other factors would dominate and that the frequency is so high that traditional analysis doesn't make much sense.
Although it's not gonna look like 22 uH.
So what an LLM should do is only search and filter human-written material.
An example of a search query is "What is the temporal duration between striking adjacent strings to form a chord on a guitar?". Google (being just a dumb keyword search engine) produces mostly unrelated results (generally answering what chords there are and how one plays them). Phind also cannot answer it: [1]
However, when I asked an LLM what keywords I should use to find this, it suggested "guitar chord microtiming" among other choices, which made it possible to find a research paper containing the answer (5 to 12 ms, if anyone is curious).
[1] https://www.phind.com/search?cache=u4xiqluairg3zkdaxstcr39v
It looks good, though!
"How to replace a string in Gerbil Scheme?"
Then I realized that this pattern is the biggest application of LLMs right now.
So I guess they're just acknowledging the target market.
...if you buy their pro subscription
Edit: It has been resolved for me. Thank you!
The inference service may be temporarily unavailable - we have alerts for this and will be fixing it soon.