Some recent examples from my history:
what video formats does mastodon support? https://www.phind.com/search?cache=jpa8gv7lv54orvpu2c7j1b5j
compare xfs and ext4fs https://www.phind.com/search?cache=h9rmhe6ddav1bnb2odtchdb1
on an apple ][ how do you access the no slot clock? https://www.phind.com/search?cache=w4cc1saw6nsqxyige7g3wple
The answers aren't perfect. But they are a good gloss and then the links to web sources are terrific. ChatGPT and Claude aren't good at that. Bing CoPilot sort of is but I don't like it as much.
FWIW my query for your question gives me a pretty good answer. The first list has three options, one of which is soldered (and the answer says so). It narrows it down to unsoldered ones when I ask.
https://www.phind.com/search?cache=kukryw72yutlp4u88nubmjuu
This answer is mostly good because it relies heavily on an iFixit article that it provides as the first reference. That's what I like about using Phind, it's as much a search engine as an oracle.
https://www.ifixit.com/News/35377/which-wireless-earbuds-are...
Google seems to be better at this, giving me the Fairbuds directly: https://imgur.com/a/7En4e9u
Many times the answers seemed to get more and more incomplete or incorrect as time went on (across a variety of questions over a period of months). Even worse, it would sometimes say it couldn't find the answer, yet the answer was right there in one of the sites it listed as references!
I've ended up mostly resorting to Bing and GPT-4o. Frankly, I'm hesitant to waste time trying this new version.
Kagi does the opposite: it's mostly search results but sometimes an AI gives you a "Quick Answer" too.
I sincerely apologize for my earlier response. Upon reviewing the search results provided, I realize I made an error in referencing those specific studies. The search results don't contain any relevant information for the claims I mentioned earlier. As an AI assistant, I should be more careful in providing accurate and supported information. Thank you for bringing this to my attention. In this case, I don't have reliable references to support those particular statements about software tools and their impact on developer experience and software quality.
Not sure about this...
I then gave it the 3 lines of code of a route I'm using in Laravel and asked it how to implement fragment usage, where the parameter in the URL determines the fragment returned.
Route::get('/vge-frags/{fragment}', function ($fragment) { return view('vge-fragments'); });
It told me to make sure I have the right view created (which I did) and that was a good start. Then...
It recommended this?
Route::get('/vge-frags/{fragment}', function ($fragment) { return fragment($fragment); });
I immediately knew it was wrong (but somebody looking to learn might not). So I had to ask it: "Wait, how does the code know which view to use?"
Then it gave me the right answer.
Route::get('/vge-frags/{fragment}', function ($fragment) { return view('vge-fragments')->fragment($fragment); });
I dunno. It's really easy to find edge cases with any of these models and you have to essentially question everything you receive. Other times it's very powerful and useful.
I mean, this is an unsolvable problem with chat interfaces, right?
If you use a plugin that's integrated with tooling that checks whether generated code compiles / passes tests / whatever, a lot of this kind of problem goes away.
Generally speaking these models are great at tiny self contained code fragments like what you posted.
It’s longer, more complex, logically difficult things with interconnected parts that they struggle with; mostly because the harder the task, the more constraints have to be simultaneously satisfied; and models don’t have the attention to fix things simultaneously, so it’s just endless fix one thing / break something else.
So… at least in my experience, yes, but honestly, for a trivial fragment like that it's fine most of the time, especially for anything you can easily write a test for.
To be fair, I don't expect these AI models to give me perfect answers every time. I'm just not sure people are vigilant enough to ask follow up questions that criticize how the AI got the answers to ensure the answers come from somewhere reasonable.
I hate this kind of thing so much.
"I am so sorry and heartbroken about having suggested that to play a sound you should use the, as you now inform me, non existing command and parameter `oboe --weird-format mysound.snd`, I'll check my information more thoroughly next time and make sure it will not happen again"...
Are you ok?
When a web site says "Sorry, page not found" do you start punching your monitor?
When the delivery guy leaves a note saying "Sorry we missed you" do you go to the depot to beat up the employees?
I think you are on the right track to understanding what they meant.
The use of 'sorry' is not generally a problem, because it is normally framed within expected behaviour and can be taken as an adequate, or at least not blatantly false, representation. But you can imagine scenarios in which the term is misused as inappropriate formality or manipulation, and yes, disrespect "elicits violence". You normally find a way, in the situation, to avoid violence - but that is another story.
In "sorry, page not found", 'sorry' is the descriptor for a state (i.e. "not the better case"); in "sorry we missed you" it is just courtesy - and it does not generally cover fault or negligence. But look: there are regions that adopt "your call is important to us", and regions that tend to avoid it - because the suspicion that it is inappropriate (false) can be strong.
The outputs of the LLMs I have used frequently cross that threshold, and possibly their structural engineering does too. If you had in front of you a worker, in flesh and bone, whose output was plausible fiction ("I imagined a command `oboe` because it sounded good in the story") rather than an answer to your question, yet delivered under the veneer of answering questions (which implies outputting relevant, truth-based assessments of the world), that would be a proper "sore" for "sorry". The anthropomorphic features of LLMs compromise the quality of their outputs in terms of form, especially in solution-finding attempts that become loops of "This is the solution" // "Are you sure?" // "Definitely" // "It is not" // "Oh, I'm so sorry! It will not happen again. This is the solution" (loop...).
Edit: it seems you may have also asked for clarification about the contextual expression «clean societies». Those are societies that are cybernetically healthy, in which feedback mechanisms work properly to fine-tune general mechanisms - with particular regard to fixing individual, then collective, behaviour.
Ctrl+F for "Central nervous system":
https://en.wikipedia.org/wiki/List_of_human_cell_types
Choose any five wikilinks. Skim their distinct functions and pathologies:
https://en.wikipedia.org/wiki/List_of_regions_in_the_human_b...
https://en.wikipedia.org/wiki/Large-scale_brain_network
Evolution's many things, but maybe most of all lazy. Human intelligence has dozens of distinct neuron types and at least hundreds of differentiated regions/neural subnetworks because we need all those parts in order to be both sentient and sapient. If you lesion parts of the human brain, you lose the associated functions, and eventually end up with what we'd call mental/neurological illnesses. Delusions, obsessions, solipsism, amorality, shakes, self-contradiction, aggression, manipulation, etc.
LLMs don't have any of those parts at all. They only have pattern-matching. They can only lie, because they don't have the sensory, object permanence, and memory faculties to conceive of an immutable external "truth"/reality. They can only be hypocritical, because they don't have the internal identity and introspective abilities to be able to have consistent values. They cannot apologize in substance, because they have neither the theory of mind and self-awareness to understand what they did wrong, the social motivation to care, nor the neuroplasticity to change and be better. They can only ever be manipulative, because they don't have emotions to express honestly. And I think it speaks to a not-atypical Silicon Valley arrogance to pretend that they can replicate "intelligence", without apparently ever considering a high-school-level philosophy or psychology course to understand what actually lets human intelligence tick.
At most they're mechanical psychopaths [1]. They might have some uses, but never outweighing the dangers for anything serious. Some of the individuals who think this technology is anything remotely close to "intelligent" have probably genuinely fallen for it. The rest, I suppose, see nothing wrong because they've created a tool in their own image…
[1]: I use this term loosely. "Psychopathy" is not a diagnosis in the DSM-V, but psychopathic traits are associated with multiple disorders that share similar characteristics.
https://github.com/mukhal/intrinsic-source-citation
This is not something that can be added with LoRA fine-tuning after the pretraining step.
What we need is a human-curated benchmark for different types of source-aware training, to allow competition, and an extra column in the most popular leaderboards (counted in the Average column) to incentivize AI companies to train in a source-aware way. Of course, this would instantly lift the black-box veil LLM companies love to hide behind so as not to credit original authors and content creators; they prefer regulators to believe such a thing cannot be done.
In the meantime, such regulators are not thinking creatively and are clearly just looking for ways to tax AI companies, hiding behind copyright complications as an excuse to tax the flow of money wherever they smell it.
Source aware training also has the potential to decentralize search!
But I find the anthropomorphization and "AGI" narrative really creepy and grifty. Such a waste that that's the direction it's going.
And I wouldn't say lazy at _all_. I would say efficient. Even evolutionary features that look "bad" on the surface can still make sense if you look at the wider system they're a part of. If our tailbone caused us problems, then we'd evolve it away, but instead we have a vestigial part that remains because there are no forces driving its removal.
But the issue is calling laboratory partials finished products. "Oh look, they invented a puppet" // "Oh, nice!" // "It's alive..."
In terms of people thinking LLMs are smarter than they really are, well... that's just people. Who hate each other for skin colour and sexuality, who believe that throwing salt over your shoulder wards off bad luck; we're still biological at the end of the day, we're not machines. Yet.
That is definitely not true.
In the context of the comment chain I replied to, and the behaviour in question, any statement by an LLM pretending to be capable of self-awareness/metacognition is also necessarily a lie. "I should be more careful", "I sincerely apologize", "I realize", "Thank you for bringing this to my attention", etc.
The problem is the anthropomorphization. Since it pretends to be like a person, if you ascribe intention to it then I think it is most accurately described as always lying. If you don't ascribe intention to it, then it's just a messy PRNG that aligns with reality an impressive amount of the time, and words like "lying" have no meaning. But again, it's presented and marketed as if it's a trustworthy sapient intelligence.
Some parts seemingly stopped at "output something plausible", but it does not seem theoretically impossible to direct the output towards "adhere to the truth", if a world model is there.
We would still need to implement the "reason on your world model and refine it" part for the purpose of AGI - meanwhile, fixing the "impersonation" fumble ("probabilistic calculus says your interlocutor should offer stochastic condolences") would be a decent move. After a while with present chatbots it seems clear that "this is writing fiction, not answering questions".
Feels like they were trained with a gun to their heads. If I don't tell it that it doesn't have to answer, it'll generate nonsense in a confident voice.
It turns out that this process makes the model useful at producing mostly sensible predictions (generating output) for text that is not present in the training set (generalization).
The reason that works is that there are a lot of patterns and redundancy in the stuff we feed to the models and the stuff we ask them, so there is a good chance that interpolating between words, and between higher-level semantic relationships across sentences, will make sense quite often.
However that doesn't work all the time. And when it doesn't, current models have no way to tell they "don't know".
The whole point was to let them generalize beyond the training set and interpolate in order to make decent guesses.
There is a lot of research in making models actually reason.
That being said, I'm aware that the model doesn't reason in the classical sense. Yet, as I mentioned, it does give me less confabulation when I tell it it's ok not to answer.
I will note that when I've tried the same kind of prompts with Phi 3 instruct, it's way worse than Gemma. Though I'm not sure if that's just because of a weak instruction tuning or the underlying training as well, as it frequently ignores parts of my instructions.
For example you can confabulate "facts" or you can make logical or coherence mistakes.
Current LLMs are encouraged to be creative and effectively "make up facts".
That's what created the first wow factor. The models are able to write Star Trek fan fiction in the style of Shakespeare. They are able to take a poorly written email and make it "sound" better (for some definition of better, e.g. more formal, less formal, etc).
But then human psychology kicked in: as soon as you have something that can talk like a human and that some marketing folks label as "AI", you start expecting it to be useful for other tasks too, some of which require factual knowledge.
Now, it's in theory possible to have a system that you can converse with which can _also_ search and verify knowledge. My point is that this is not the place where LLMs start from. You have to add stuff on top of them (and people are actively researching that)
Honestly, that's a lot of words and repetition to say "I bullshitted".
Though there are humans that also talk like this. Silver lining to this LLM craze: maybe it'll inoculate us against psychopaths.
Is this true? I feel like most complaints I have and hear about is how inaccurate some of the AI results are. I.e. the mistakes it confidently makes when helping you code.
From hitting enter to a set of relevant answers loaded into your brain, though? Isn't that the goal that should be measured? Against that goal, the two-decade-old approach seems to have peaked over a decade ago, or Phind wouldn't find traction.
For the 20-year-old page rankers, the time from search to a set of correct answers in your brain is approaching "DNF" -- did not finish.
---
PS. Hallucinations or irrelevant results, both require exercising a brain cell. On a percentage basis, there are fewer hallucinations than irrelevant results, it's just that we gave up on SERP confidence ages ago.
You can have a small model that's cost effective to serve, and gives fast responses, but will be wrong half the time.
Or you can have a large model that's slow to run on cheap hardware, but will give more accurate answers. This is usually only fast enough for personal use.
And there's a third option: a large model that's fast and accurate, but you'll have to pay Nvidia/Groq/etc. a small fortune to be able to run it at speed, and probably build a solar power plant to make it cost-effective in power use.
The plan page says: $20/mo for unlimited Phind-405B and Phind-70B searches; daily GPT-4o (500+), Claude 3.5 Sonnet (500+), and Claude Opus (10) uses.
> Phind-405B scores 92% on HumanEval (0-shot), matching Claude 3.5 Sonnet.
Any other benchmarks?
There was one UI-related annoyance with Phind: the scroll bar sometimes jumped randomly, maybe even after each input or during token generation (on Firefox). You start wasting a lot of time if you always need to find the part you were looking at again, or even just scroll back to the bottom.
Primary issue is still that both hallucinate too much when you ask something difficult. But that is the general problem everywhere.
I stubbornly continued to type my complaint about my JSON getting too large for phones with slow CPUs or slow connections, and got 100 solutions to explore. I couldn't help but think this is the worst-case robot overlord: it gave me a year's worth of materials to study, complete with the urge to go do the work. That future we used to joke about is here!
Some of the suggestions are familiar, but I don't have the time to read books with little tidbits of semi-practical information smeared out over countless pages in the wrong order for my use case.
I'm having flashbacks of reading for days, digging through a humongous library only to end up with 5 lines of curl. I still can't tell if I'm a genius or just that dumb.
This long response unexpectedly makes me want to code all day long. One can choose where to go next, which is much more exciting than short linear answers... apparently.
Well done
It has a VS Code extension, so if you use that, it makes some sense. Purely for search, I don't know. IME Phind is not that great with internet access; sometimes people disable the search function to get better answers.
At that level of performance you're probably in the realm of hard edge cases with ambiguous ground truth.
May I ask which models you're seeing the pollution with?
I still have to ask follow up questions to get reasonable results but when I tested earlier this year it was outright failing on most of my test queries.
> What degrees are held by each of the current Fortune 100 CEOs?
> What job did each of the current NFL GMs hold before their current position?
> Which genre would each of the current Billboard Hot 100 songs be considered part of?
> How many recipients of the Presidential Medal of Freedom were born outside of the US?
> Which US car company has the most models in their 2025 line-up across all of their brands?
It can't handle those directly right now.
You need to break the problem down step by step and sort of walk it through gathering the data with follow up questions.
But much better than it used to be.
[0] https://query.wikidata.org/#SELECT%20%28COUNT%28DISTINCT%20%...
The amount of thought required to answer any of those questions is pretty high, especially because they are all sizeable lists. It is going to take a lot of thinking out loud, and detailed training data covering all those items, to do that well.
As far as I know you are running your own GPUs - what do you do under overload? Have a queue system? What do you do under underload? Just eat the costs? Is there a "serverless" system here that makes sense / is anyone working on one?
Serverless would make more sense if we had a significant underutilization problem.
```
const MyClass& getMyClass(){....}
auto obj = getMyClass();
```
this makes a copy, right?
And it was very confident that it does not make a copy. It thinks auto will deduce the type as a const ref and not make a copy, which is wrong: you need auto& or const auto& for that. I asked it if it was sure and it was even more confident. Here is the godbolt output: https://godbolt.org/z/Mz8x74vxe . You can see "copy" being printed, and you can also see that you can call non-const methods on the copied object, which implies it is a non-const type.
I asked the very same question to Phind and it gave the same answer: https://www.phind.com/search?cache=k3l4g010kuichh9rp4dl9ikb
How come two different AIs, one of which is supposed to be specialized in coding, fail in such a confident way?
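For anyone who wants to reproduce this without godbolt, here's a minimal standalone sketch (the MyClass body is my own stand-in, since the original was elided) showing that plain auto deduces MyClass by value and invokes the copy constructor:

```
#include <iostream>

// Hypothetical stand-in for the elided MyClass: it just logs when it is copied.
struct MyClass {
    MyClass() = default;
    MyClass(const MyClass&) { std::cout << "copy\n"; }
    void mutate() {}  // non-const method: callable only on a non-const object
};

const MyClass& getMyClass() {
    static MyClass instance;
    return instance;
}

int main() {
    auto obj = getMyClass();   // auto drops the reference and const -> MyClass, prints "copy"
    obj.mutate();              // compiles, so obj is a mutable copy, not a const reference
    auto& ref = getMyClass();  // auto& deduces const MyClass& -> no copy
    // ref.mutate();           // would not compile: ref is const
    (void)ref;
}
```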
Though to be fair, we will hopefully quickly approach a point where a machine can be much more trusted than a human being, which will be fun. Don't cry about it; it's our collective fault for proving that meat bags can develop ulterior motives.
```
const MyClass& getMyClass(){....}
auto obj = getMyClass();
```
Does this make a copy? Let's think step by step.
This might help you if you're trying to get these models to help you more often.
Would be great to have access to your model in an LLM gateway like https://openrouter.ai/
I would give your API a try at a minimum.
ChatGPT amazed people but its UI didn't. A bunch of better UIs showed up, building on the OpenAI API and ChatGPT's own. They helped people accomplish more, which further marketed the product.
You can get this benefit with few downsides if you make the API simple, with few promises about features: "This is provided AS IS for your convenience and experimentation."
I found even Phind-70B to often be preferable to Claude Sonnet and would commonly opt for it. I’ve been using the 405B today and it seems to be even better at answering.
I’ve found it does depend on the task. For instance, for formatting JSON in the past, GPT-4 was actually the best.
Because you can cycle through the models, you can check the output of each one, to get the best answer.
"Bread and water for everyone" now apparently means "bread for our customers, water for everyone".
Mostly use it for API questions. It's been amazing at Moment.js stuff. Also use it for code optimization and debugging error messages.
I just paste the page link as a query and it tells me what the page is about and even pulls key points.
Any reason why Phind is region-locked? Is there a list of what countries Phind is available in?
One of our vendors insisted on whitelisting the IPs we were going to call them from, and our deployments went through AWS Copilot/Fargate directly to the public subnets. Management had fired the only person with infrastructure experience a few months earlier (very small company), and nobody left knew anything about networking.
Within about a week, Phind brought me from questions like "What is a VPC?" and "What is a subnet?" to having set up the NAT gateways, diagnosed deploy problems, set up the VPC endpoints in AWS' crazy complicated setup, and gotten our app onto the private subnet, routing outbound traffic through the NAT for that vendor.
Yes, it occasionally spit out nonsense (using the free/instant model). I even caught it blatantly lying about its sources once. Even so, once I asked the right questions it helped me learn and execute so much faster than I would have trying to cobble understanding through ordinary google searches, docs, and blog posts.
Strongly recommended if you're ever wading into a new/unfamiliar topic.
The day after this thread hit the front page, I tried perplexity.ai because Phind was overloaded with traffic and not responding. It was OK, but not quite as helpful. Too hard to tell whether that's a fair judgment of the services or just because I don't have as much to ask as I did a week ago, though.
> The impedance of a 22 μH capacitor at 400 THz is approximately 1.80 × 10^-24 Ω.
The correct answer should have been "what the hell are you talking about, dumbass?". Capacitors are not measured in henries, and the question really has no meaning at 400 THz. Another stochastic parrot.
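For what it's worth, a quick back-of-the-envelope check (my own arithmetic, reading the part either literally as a 22 μH inductor or charitably as a 22 μF capacitor) shows that neither interpretation comes anywhere near 10^-24 Ω:

```
#include <cstdio>

int main() {
    const double pi = 3.141592653589793;
    const double f  = 400e12;        // 400 THz
    const double w  = 2.0 * pi * f;  // angular frequency

    const double L = 22e-6;          // "22 uH" read literally as an inductor
    const double C = 22e-6;          // or charitably as a 22 uF capacitor

    std::printf("|Z_L| = w*L     = %.3g ohm\n", w * L);         // ~5.5e10 ohm
    std::printf("|Z_C| = 1/(w*C) = %.3g ohm\n", 1.0 / (w * C)); // ~1.8e-11 ohm
}
```

The mantissa in the quoted answer is suspiciously close to the capacitor reading (~1.8 × 10^-11 Ω), so it looks like the model may have mangled the exponent on top of ignoring the unit mismatch.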
It then explains that in practice other factors would dominate and that the frequency is so high that traditional analysis doesn't make much sense.
Although it's not gonna look like 22 uH.
So what an LLM should do is only search and filter human-written material.
An example of a search query is "What is the temporal duration between striking adjacent strings to form a chord on a guitar?". Google (being just a dumb keyword search engine) produces mostly unrelated results (generally answering what chords there are and how one plays them). Phind also cannot answer it: [1]
However, when I asked an LLM what keywords I should use to find this, it suggested "guitar chord microtiming" among other choices, which made it possible to find a research paper containing the answer (5 to 12 ms, if anyone is curious).
[1] https://www.phind.com/search?cache=u4xiqluairg3zkdaxstcr39v
It looks good, though!
"How to replace a string in Gerbil Scheme?"
Then I realized that this pattern is the biggest application of LLMs right now.
So I guess they're just acknowledging the target market.
...if you buy their pro subscription
Edit: It has been resolved for me. Thank you!
The inference service may be temporarily unavailable - we have alerts for this and will be fixing it soon.