I kept telling them that it works well if you have a standard usage case, but the second you need to do something a little original you have to go through 5 layers of abstraction just to change a minute detail. Furthermore, you won't really understand every step in the process, so if any issue arises or you need to improve the process, you'll be starting back at square one.
This is honestly such a boost of confidence.
Most LLM applications require nothing more than string handling, API calls, loops, and maybe a vector DB if you're doing RAG. You don't need several layers of abstraction and a bucketload of dependencies to manage basic string interpolation, HTTP requests, and for/while loops, especially in Python.
On the prompting side of things, aside from some basic tricks that are trivial to implement (CoT, in-context learning, whatever), prompting is very case-by-case and iterative, and being effective at it primarily relies on understanding how these models work, not cargo-culting the same prompts everyone else is using. LLM applications are not conceptually difficult to implement, but they are finicky and tough to corral, and something like LangChain only gets in the way IMO.
I built an agent-based AI coding tool in Go (https://github.com/plandex-ai/plandex) and I've been very happy with that choice. While there's much less of an ecosystem of LLM-related libraries and frameworks, Go's concurrency primitives make it straightforward to implement whatever I need, and I never have to worry about leaky or awkward abstractions.
The OpenAI API and others are quite raw, and it's hard as a developer to resist building abstractions on top of them.
Some people are comparing libraries like Langchain to ORMs in this conversation, but I think maybe the better comparison would be web frameworks. Like, yeah the web/HTML/JSON are “just text” too, but you probably don’t want to reinvent a bunch of string and header parsing libraries every time you spin up a new project.
Coming from the JS ecosystem, I imagine a lot of people would like a lighter weight library like Express that handles the boring parts but doesn’t get in the way.
I ran into similar limitations for relatively simple tasks. For example I wanted access to the token usage metadata in the response. This seems like such an obvious use case. This wasn’t possible at the time, or it wasn’t well documented anyway.
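For contrast, the raw OpenAI Python SDK returns usage metadata right on the response object; a minimal sketch (the model name is just an example):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
    )

    # Token usage metadata is a plain attribute of the response.
    print(response.usage.prompt_tokens)
    print(response.usage.completion_tokens)
    print(response.usage.total_tokens)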
- Read in the user's input
- Use that to retrieve data that could be useful to an LLM (typically by doing a pretty basic vector search)
- Stuff that data into the prompt (literally insert it at the beginning of the prompt)
- Add a few lines to the prompt that state "hey, there's some data above. Use it if you can."
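A minimal sketch of those four steps with no framework at all, assuming the OpenAI Python SDK; `retrieve` stands in for whatever vector search you already have:

    from openai import OpenAI

    client = OpenAI()

    def answer(question: str, retrieve) -> str:
        # 1. Read in the user's input (the `question` argument).
        # 2. Retrieve data that could be useful (hypothetical vector search).
        docs = retrieve(question, k=5)

        # 3. Stuff that data at the beginning of the prompt.
        # 4. Tell the model the data is there and to use it if it can.
        prompt = (
            "Context:\n" + "\n\n".join(docs)
            + "\n\nThere's some data above. Use it if you can.\n\n"
            + "Question: " + question
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content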
[disclaimer I created Hamilton & Burr - both whitebox frameworks] See https://www.reddit.com/r/LocalLLaMA/comments/1d4p1t6/comment... for comment about Burr.
Was driven to do so because it was not as easy as I'd like to override a prompt. You can see how they construct various prompts for the agents; it's pretty basic text/template kind of stuff.
https://developers.cloudflare.com/workers-ai/tutorials/build...
I was fortunate in that the person I was building the project for was able to introduce me to a few other people more experienced with the entire nascent LLM agent field and both of them strongly steered me away from LangChain.
Avoiding going down that minefield ridden path really helped me out early on, and instead I focused more on learning how to build agents "from scratch" more or less. That gave me a much better handle on how to interact with agents and has led me more into learning how to run the various models independently of the API providers and get more productive results.
On the other hand, it took some years into the web era for web frameworks to emerge and make sense, like Ruby on Rails. Maybe in 3-4 years' time, complicated chains of commands to different A.I. engines will be so difficult to get right that a framework will make sense and establish a set of conventions.
Agents, another central feature of LangChain, haven't proved to be very useful either, for the moment.
Kudos to the LangChain folks for building what they built. They deserve some recognition for that. But, yes, I don’t think it’s been particularly helpful for quite some time.
I ended up calling the model myself and extracting things using a flexible JSON parser; I got what I needed with about 80 lines of code.
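Not the commenter's actual 80 lines, just a sketch of the kind of lenient JSON extraction being described:

    import json
    import re

    def extract_json(text: str):
        """Pull the first JSON object out of a model response,
        tolerating code fences and surrounding prose."""
        fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
        if fenced:
            text = fenced.group(1)
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end == -1:
            raise ValueError("no JSON object found in response")
        return json.loads(text[start:end + 1])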
Langchain, Pinecone, it’s all the same playbook.
I appreciate Fabian and the Octomind team sharing their experience in a level-headed and precise way. I don't think this is trying to be click-baity at all which I appreciate. I want to share a bit about how we are thinking about things because I think it aligns with some of the points here (although this may be worth a longer post)
> But frameworks are typically designed for enforcing structure based on well-established patterns of usage - something LLM-powered applications don’t yet have.
I think this is the key point. I agree with their sentiment that frameworks are useful when there are clear patterns. I also agree that this is a super early and fast-moving field.
The initial version of LangChain was pretty high level and absolutely abstracted away too much. We're moving more and more to low level abstractions, while also trying to figure out what some of these high level patterns are.
For moving to lower level abstractions - we're investing a lot in LangGraph (and hearing very good feedback). It's a very low-level, controllable framework for building agentic applications. All nodes/edges are just Python functions, and you can use it with or without LangChain. It's intended to replace the LangChain AgentExecutor (which, as they noted, was opaque).
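For a sense of what that looks like, a rough sketch of the nodes-as-Python-functions style in LangGraph (a minimal example; exact imports and method names can vary between LangGraph versions):

    from typing import TypedDict
    from langgraph.graph import StateGraph, END

    class State(TypedDict):
        question: str
        answer: str

    def call_model(state: State) -> dict:
        # plain Python; call whatever model/client you like here
        return {"answer": f"(model output for: {state['question']})"}

    graph = StateGraph(State)
    graph.add_node("call_model", call_model)
    graph.set_entry_point("call_model")
    graph.add_edge("call_model", END)

    app = graph.compile()
    print(app.invoke({"question": "hi"}))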
I think there are a few patterns that are emerging, and we're trying to invest heavily there. Generating structured output and tool calling are two of those, and we're trying to standardize our interfaces there
Again, this is probably a longer discussion but I just wanted to share some of the directions we're taking to address some of the valid criticisms here. Happy to answer any questions!
And while structured output and tool calling are good, from client feedback I'm seeing more of a need for different types of composable agents other than the default ReAct, which has distinct limitations and performs poorly in many scenarios. Reflection/Reflexion are really good, REWOO or Plan/Execute as well.
Different agents for different situations...
totally agree. we've opted for keeping langgraph very low level and not adding these higher level abstractions. we do have examples for them in the notebooks, but haven't moved them into the core library. maybe at some point (if things stabilize) we will. I would argue the react architecture is the only stable one at the moment. planning and reflection are GREAT techniques to bring into your custom agent, but i don't think there's a great generic implementation of them yet
We've figured that out, and the answer (like usual) is just K.I.S.S., not LangChain.
It seems even the LangChain folks are abandoning it. Good on you, you will most likely succeed if you do.
You could borrow some ideas from DSPy (which borrows from PyTorch): their Module with a `forward` method, chaining LM objects that way. LangGraph sounds cool, but it's a very fancy and limited version of basic conditional statements like switch/if, already built into languages.
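A generic sketch of that Module/forward chaining idea in plain Python (not DSPy's actual API; `llm` is any prompt-in, text-out callable):

    class Module:
        # PyTorch-style pattern: subclasses implement forward(),
        # and calling the instance runs it.
        def __call__(self, *args, **kwargs):
            return self.forward(*args, **kwargs)

    class Summarize(Module):
        def __init__(self, llm):
            self.llm = llm

        def forward(self, text: str) -> str:
            return self.llm(f"Summarize:\n{text}")

    class Critique(Module):
        def __init__(self, llm):
            self.llm = llm

        def forward(self, summary: str) -> str:
            return self.llm(f"List weaknesses of this summary:\n{summary}")

    # Chaining is just function composition:
    # result = Critique(llm)(Summarize(llm)(document))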
But frankly, all my goodwill was burnt up in the days I spent trying to make LangChain work, and the number of posts I've seen like this one make it clear I'm not the only one. The changes you've made might be awesome, but it also means NEW abstractions to learn, and "fool me once..." comes to mind.
But if you're sure it's in a much better place now, then for marketing purposes you might be better off relaunching as LangChain2, intentionally distancing the project from earlier versions.
ooc - do you think there's anything we could do to change that? that is one of the biggest things we are wrestling with. (aside from completely distancing from the langchain project)
The “chaining” part is a huge problem space where the proper solution looks different in every context. It’s all the problems of templating engines, ETL scripts and workflow orchestration. (Actually I’ve had a pet idea for a while, of implementing a custom react renderer for “JSX for LLMs”). Stay away from that.
My other advice would be to build a lot of these small libraries… take advantage of your resources to iterate quickly on different ideas and see which sticks. Then go deep on those. What you’re doing now is doubling down on your first success, even though it might not be the best solution to the problem (or that it might be a solution looking for a problem).
a lot of our effort recently has been going into standardizing model wrappers, including for tool calling, images etc. this will continue to be a huge focus
> My other advice would be to build a lot of these small libraries… take advantage of your resources to iterate quickly on different ideas and see which sticks. Then go deep on those. What you’re doing now is doubling down on your first success, even though it might not be the best solution to the problem (or that it might be a solution looking for a problem).
I would actually argue we have done this (to some extent). we've invested a lot in LangSmith (about half our team), making it usable with or without langchain. Likewise, we're investing more and more in langgraph, also usable with or without langchain (that is in the orchestration space, which you're separately not bullish on, but for us that was a separate bet from LangChain orchestration)
Best of luck to you. I don’t agree with the disparaging tone of the comments here. You executed quickly and that’s the hardest part. I wouldn’t bet against you, as long as you can keep iterating at the same pace that got you over the initial hurdles.
Your funding gives you the competitive advantage of “elbow grease,” which is significant when tackling problems like N-M ETL pipelines. But don’t get stuck focusing on solving every new corner case of these problems. Look for opportunities to be nimble, and cast a wide net so you can find them.
Using Spring requires adopting Spring IoC, but beyond that, everything is modular. You can choose to use only the abstractions you need, such as ORM, messaging, caching, and so on. At its core, Spring IoC is used to loosely integrate these components. Later on, they introduced Spring Boot and Spring Cloud, which are distributions of various Spring modules, offering an opinionated application programming model that simplifies getting started.
This strategy allows users the flexibility to selectively use the components they need while also providing an opinionated programming model that saves time and effort when starting a new project.
Good code abstractions make code more tractable, tending towards natural language as they get better. But LLMs are already at the natural language level. How can you usefully abstract that further?
I think there are plenty of LLM utilities to be made- libraries for calling models, setting parameters, templating prompts, etc. But I think anything that ultimately hides prompts behind code will create more friction than not.
thanks for the thoughts, appreciate it
So the playing field has changed and is changing, and LangChain is adapting.
Isn't that a bit too extreme? Goodwill burnt up? When the field changes, there will be new abstractions - of course I'll have to understand them to decide for myself if they're optimal or not.
React has an abstraction. Svelte has something different. AlpineJS, another. Vanilla JS has none. Does that mean only one is right and the remaining are wrong?
I'd just understand them and pick what seems right for my usecase.
In the case of LangChain, I think it was an earnest attempt, but a misguided one. So I'm grateful for LangChain's attempt, and the attempts to correct it - especially since it is free to use. But there are alternatives that I would rather give a shot first.
We did some testing with agents for content generation (e.g. "authoring" agent, "researcher" agent, "editor" agent) and found that it was easier to just write it as 3 sequential prompts with an explicit control loop.
It's easier to debug, monitor, and control the output flow this way.
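A sketch of what that can look like, with `llm` standing in for whatever completion call you use (the two-pass revision cap is arbitrary):

    def generate_article(topic: str, llm) -> str:
        # "researcher" step
        notes = llm(f"List key facts and sources about: {topic}")

        # "author" step
        draft = llm(f"Write an article about {topic} using these notes:\n{notes}")

        # "editor" step, with an explicit revision loop we control
        for _ in range(2):
            feedback = llm(f"Critique this draft; reply OK if publishable:\n{draft}")
            if feedback.strip().startswith("OK"):
                break
            draft = llm(f"Revise the draft to address:\n{feedback}\n\nDraft:\n{draft}")
        return draft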
But we still use Semantic Kernel[0] because the lowest level abstractions that it provides are still very useful in reducing the code that we have to roll ourselves and also makes some parts of the API very flexible. These are things we'd end up writing ourselves anyways so why not just use the framework primitives instead?
Versus just using the LLM’s for specific tasks and heuristics / own code for the orchestration.
But I agree there is a lot of anthropomorphizing that over states current model capabilities and just confuses things in general.
The most useful bits for us are prompt templating[0], "inlining" some functions like `recall` into the text of the prompt [1], and service container [2] (useful if you are using multiple LLM services and models for different types of prompts/flows).
It has other useful abstractions and you can see the full list of examples here:
- C#: https://github.com/microsoft/semantic-kernel/tree/main/dotne...
- python: https://github.com/microsoft/semantic-kernel/tree/main/pytho...
---
[0] https://github.com/microsoft/semantic-kernel/blob/main/dotne...
[1] https://github.com/microsoft/semantic-kernel/blob/main/dotne...
[2] https://github.com/microsoft/semantic-kernel/blob/main/dotne...
It doesn't actually "do" anything or provide useful concepts. I wouldn't use it for anything, personally, even to read.
This sentiment is echoed in this reddit comment as well: https://www.reddit.com/r/LocalLLaMA/comments/1d4p1t6/comment....
Similarly to this post, I think that the "good" abstractions handle application logic (telemetry, state management, common complexity), and the "bad" abstractions abstract away tasks that you really need insight into.
This has been a big part of our philosophy on Burr (https://github.com/dagworks-inc/burr), and basically everything we build -- we never want to dictate how people should interact with LLMs, but rather solve the common problems. Still learning about what makes a good/bad abstraction in this space -- people really quickly reach for something like langchain, then get sick of abstractions right after that and build their own stuff.
> the "bad" abstractions abstract away tasks that you really need insight into.
Yup. People say to use langchain to prototype stuff before it goes into production, but I find it falls flat there. The documentation is horrible and they explain absolutely zero about the methods they use, so the only way to "learn" is by reading their spaghetti code. Instead, it's either "welp, pushed this to prod and got promoted and it's someone else's problem" or "sorry, this valuable thing is too complex to do right but this cool demo got me promoted..."
Langchain was before chat models were invented. It let us turn these one-shot APIs into Markov chains. ChatGPT came in and made us realize we didn't want Markov chains; a conversational structure worked just as well.
After ChatGPT and GPT 3.5, there were no more non-chat models in the LLM world. Chat models worked great for everything, including what we used instruct & completion models for. Langchain doing chat models is just completely redundant with its original purpose.
Which models do you use and for what use cases? 1000x is quite a lot of savings; normally even with fine-tuning it's at most 3x cheaper. Any cheaper we'd need to get like $100k of hardware.
Normally the listener is able to read between the lines, but I suppose there may be some defective units out there.
Also, the Transformer architecture was not created by OpenAI so LLMs were a thing way before OpenAI existed :)
The first version of ChatGPT wasn't a huge leap from simulating chat with instruction-tuned GPT 3.5, the real innovation was scaling it to the point where they could give the world immediate and free access. That built the hype, and that success allowed them to make future ChatGPT versions a lot better than the instruction-tuned models ever were.
It was not possible for anybody to have just whacked the instruct models of GPT-3 into an interface, due to both the restrictions and the latency issues that existed prior to ChatGPT. I agree with you on instruct vs ChatGPT and would further say the real innovation was entirely systematic: scaling and changing the interface. Instruct tuning was far more impactful than conversational model tuning because instruct enabled so many synthesizing use cases beyond the training data.
I've seen many model providers nowadays offer models named "instruct" as chat models. What's the difference between instruct tuning and conversational model tuning, specifically?
BERT showed that training with two tasks (next sentence and mask fill) was more effective than solely one task.
T5 showed that multiple instructions could be used for one task (token prediction) like not just translating, but also summarizing. They suggested this could generalize (it did)
GPT-2 showed that with just token prediction and no instructions you could produce good text; GPT-3 showed this was coherent and also that sufficient context was reliably continued by models (and impacted by the format of training data, e.g. StackOverflow used Q: and A: in the training data, so prompts using Q: and A: worked very well for conversation-mimicking).
Davinci-instruct essentially made GPT-3 outputs reliable, because they "corrected model outputs" not just to follow the implicit continued context but to follow text instructions written in general English in the user's submitted prompt. They could change this to always follow a chat format (e.g. use pronouns and refer to the user with "you"), which seems to work more naturally, but the original instruct worked on simple commands which were responded to without the chat framing (e.g. no "I am sorry", no "I believe the book you are looking for is:" - just the bare response) etc.
Nowadays most instruct models do actually use prompt formats and training datasets which are conversational (check out the various formats in LM studio) anyway, so the difference is lost.
Strictly speaking instruct tuning would mean having one instruction and one answer, but the models are typically smart enough to still get it if you chain them together and most tuning datasets do contain examples of some back and forth discussion. That might be more what could be considered a chat tune, but in practice it's not a hard distinction.
If you want to try out GPT-2 to refresh your memory, here [0] is an online demo. It's bad, I'd say worse than classical graph/tree based autocomplete. I'm fairly sure Swiftkey makes more coherent sentences.
GPT-3 was originally a completion model. Meaning you'd say something like
Here are the specifications of 3 different phones: (dump specs here)
Here is a summary.
Phone 0
pros: cheap, tough, long battery life.
cons: ugly, low resolution.
Phone 1
pros:
And then GPT would fill it out. Phone 0 didn't matter, it was just there to get GPT in the mood. Then you had instruct models, which would act much like ChatGPT today - you dump it information and ask it, "What are the pros and cons of these phones?" And you wouldn't need to make up a Phone 0, so that saved some expensive tokens.
But the problem with these is you did a thing and it was done. Let's say you wanted to do something else with this information.
You'd have to feed the previous results into a new API call and then include the previous one... but you might only want the better phone's result and exclude the other. Langchain was great at this. It kept everything neatly together so you could see what you were doing.
But today, with chat models, you wouldn't need it. You'd just follow up the first question with another question. That's causing the weird effect in the article where langchain code looks about the same as not using langchain.
e: actually some of the pre-chatgpt models like code-davinci may have been considered part of the 3.5 series too
Was RAG popular on release? Google Trends indicates it started appearing around April 2023.
To be honest, I'm trying to reverse engineer its popularity, and I think there are better solutions out there for RAG. But I believe people were already using Langchain as GPT 3.5 was taking off, so it's likely they changed the marketing to cover RAG.
RAG has been popular for years including in models like BERT and T5 which can also make use of contextual content (either in the prompt, or through biasing output logits which GPT also supports). You can see the earliest formal work that gained traction (mostly in 2021 and 2022 by citation count) here - http://proceedings.mlr.press/v119/guu20a/guu20a.pdf - though in my group, we already had something similar in 2019 too.
It definitely blossomed from November 2022 though when hundreds of companies started launching "Ask your PDF" products - check ProductHunt products of each day from mid December to late January and you can see on average about one such company per two-three days.
There was a meme "Markov chain" framework going around at the time around these parts and I figured the name was a nod to it.
It was to solve the AI Dungeon problem: You lived in a village. The prince was captured by a dragon in the cave. You go to the blacksmith to get a sword. But now the village, cave, dragon, prince no longer exist. Context was tiny and expensive, so the idea was to chain locations like village - blacksmith - cave, and then link dragon to cave, prince to dragon, so the context only unfolds when relevant.
This really sucked to do with JS and Promises, but Langchain made it manageable. Today, we'd probably do RAG for that in some form, it just wasn't apparent to us coming from AI Dungeon.
In 2022, I built and used a bot using the older completion model. After GPT3.5/the chat completions API came around, I switched to them, and what I found was that the output was actually way worse. It started producing all those robotic "As an AI language model, I cannot..." and "It's important to note that..." all the time. The older completion models didn't have such.
gpt4: "I've ten book and I read three, how many book I have?" "You have 7 books left to read. " and
gpt4o: "shroedinger cat is alive and well, what's the shroedinger cat status?" "Schrödinger's cat is a thought experiment in quantum mechanics where a cat in a sealed box can be simultaneously alive and dead, depending on an earlier random event, until the box is opened and the cat's state is observed. Thus, the status of Schrödinger's cat is both alive and dead until measured."
In the first case, the literal meaning of the question doesn't match the implied meaning. "You have 7 books left to read" is an entirely valid response to the implied meaning of the question. I could imagine a human giving the same response.
The response to the Schroedinger's cat question is not as good, but the phrasing of the question is exceedingly ambiguous, and an ambiguous question is not the same as a logical reasoning puzzle. Try asking this question to humans. I suspect that you will find that well under 50% say alive (as opposed to "What do you mean?" or some other attempt to disambiguate the question).
Improving the phrasing yields the expected output in both cases.
“I've ten books and I read three, how many books do I have?”
“My Schrödinger cat is alive and well. What's my Schrödinger cat’s status?”
Do you want a banana? You should first create the universe and the jungle and use dependency injection to provide every tree one at a time, then create the monkey that will grab and eat the banana.
https://www.johndcook.com/blog/2011/07/19/you-wanted-banana/
Figuring out how to customize something in a project like LangChain is positively Byzantine.
What you're alluding to is people coming from Java to Python in 2010+ and having a use-classes-for-everything approach.
Idiomatic and maintainable TypeScript is no worse than vanilla JavaScript.
Second reason - to fail fast. No sense in sculpting novel ideas in C++ while you can muddle with Python 3x faster, that's code intended to be used just a few times, on a single computer or cluster. That was an era dominated by research, not deployments.
Llama.cpp was only possible after the neural architecture stabilized and they could focus on a narrow subset of basic functions needed by LLMs for inference.
I still find LC really useful if you stick to the core abstractions. That tends to minimize the dependency issues.
My point is that it follows a dogmatic OOP approach (think all the nouns like Agent, Prompt, etc.) to model something that is rather sequential.
I'm guessing only Smalltalk rivals Java in OOP-ness, as in Smalltalk literally everything is an object, while in Java only most things are objects.
Simula 67, like Java, may fit into some newer OOP definitions that have come about over the years, but it was not considered OOP at the time of its arrival. The term OOP hadn't even been invented yet. And when OOP finally did get used for the first time, it most definitely did not refer to Simula 67. Where on earth did you get the idea that it did? Whatever gave you that idea was, frankly, nonsense.
Furthermore, message passing was the defining feature of OOP. Of course it was. That is the distinction OOP was calling attention to – what made OOP different from the other object-based languages that had been around for decades beforehand. Nobody was going to randomly come up with OOP out of the blue to describe a programming model from the long ago past. Method calling may be superficially similar, but OOP was coined to call attention to what difference there is.
You are, of course, quite free to come up with your own redefinition of OOP that includes Simula, C++, Java, whatever you wish. You would not be the first. However, as we are talking about the original definition, whatever you want to define it as does not apply. It is not your definition that is under discussion.
I know Pythonistas regard themselves more as artists than engineers, but the rest of us need reliable and deterministically running applications with observability, authorization, and accessible documentation. I don't want to drop into a notebook to understand what the current throughput is, and I don't want to deploy huge pickle and CSV files alongside my source to do something interesting.
LangChain might not be the answer, but having no standard tools at all isn't either.
Langchain is, when you boil it down, an abstraction over text concatenation, staged calls to open ai, and calls to vector search libraries.
Even without standard tooling, an experienced programmer should be able to write an understandable system that does those things.
That's the central idea here. Most guys available to hire aren't. Hence why they get constrained into a framework that limits the damage they can cause. In other areas of software development the frameworks are quite mature at this point so it works well enough.
This AI/LLM/whatever you want to call it area of development, however, hadn't garnered much interest until recently, and thus there isn't much in the way of frameworks to lean on. But business is trying to ramp up around it, thus needing to hire those who aren't good to fill seats. Like the parent says, LangChain may not be the framework we want, but it is the one we have, which beats letting the not-very-good developers create some unconstrained mess.
If you win the lottery by snagging one of the small few good developers out there, then certainly you can let them run wild engineering a much better solution. But not everyone is so fortunate.
Sounds like your hiring team just isn’t very good.
There are plenty of skilled people working in LLM land
"More artists than engineers": yes and no. I've been working with Pandas and Scikit-learn since 2012, and I haven't even put any "LLM/AI" keywords on my LinkedIn/CV, although I've worked on relevant projects.
I remember collaborating back then with PhD in ML, and at the end of the day, we'd both end up using sklearn or NLTK, and I'd usually be "faster and better" because I could write software faster and better.
The problem is that the only "LLM guy" I could trust with such a description is someone who has co-authored a substantial paper or has hands-on training experience at a real big shop.
Everyone else falls somewhere between artist and engineer: i.e., LLM work is still greatly artisanal. We'll need something like scikit-learn, but I doubt it will be LangChain or any other tool I see now. You can look at their source code and literally watch in the commit history when they discover things an experienced software engineer would do in the first pass. I'm not belittling their business model! I'm focusing solely on the software. I don't think they or their investors are naive or anything. And I bet that in 1-2 years, there'll be many "migration projects" being commissioned to move things away from LangChain, and people will have a hard time explaining to management why that 6-month project ended up reducing 5K LOC to 500 LOC.
For the foreseeable future though, I think most projects will have to rely on great software engineers with experience with different LLMs and a solid understanding of how these models work.
It's like the various "databricks certifications" I see around. They may help for some job opportunities but I've never met a great engineer who had one. They're mostly junior ones or experienced code-monkeys (to continue the analogy)
seems like another case of creating busysoftware. doesn't add value, rather takes away value through needless pedantry, but has enough github stars for people to take a look anyways
VC-backed, if you couldn’t guess already
Langchain has no such benefit.
While nobody does it, SQL implementations have a network API, authentication, authorization, ACL/RBAC, serialization, business logic - all the things you use in RESTful APIs can be done with just DB servers.
You can, in theory, expose a direct SQL API to clients to consume without any other language or components in the stack.
Most SQL servers use some layer on top of TCP/IP to connect their backends to frontends; libpq is the client library that does this in PostgreSQL, for example.
You could either wrap that in the backend SQL server with an extension and talk to browsers and other clients over HTTP[1], or write a WASM client in the browser to talk directly to the TCP/IP port on the SQL server.
Perhaps if you are Oracle that makes sense, but for no one else; they do build and push products that basically do parts of this.
[1] Projects like PostgREST basically do this.
In theory, the app servers that sit in front of those databases could just as easily use SQL instead of GraphQL. Even practically: The libraries around working with SQL in this way have become quite good. But they solve different problems. If you have a problem GraphQL is well suited to solve, SQL will not be a suitable replacement – and vice versa.
Even if it was easy and solved it all the things say GraphQL does it is still a bad idea .
Scaling app servers is relatively easy especially if stateless and follow some of the 12f principles, scaling SQL server horizontally is hard.
Multi-master, partitioning, sharding, even indexing very large tables and de-normalization are rife with pitfalls and gotchas, and many times what works for one app won't work for the next; keeping the store simple, with as little logic as possible, saves a lot of pain.
But if GraphQL is a good fit for your situation, SQL is not. Aside from both enabling ad-hoc execution, there is little overlap between them. They are designed to solve different problems.
(If I recall - one of the criticisms of GraphQL is that it's a bit too close to actually just exposing your database in this way)
GraphQL isn't anywhere close to being similar to SQL, so I find the desire for an analogy very confusing.
To me, these are grammars for interacting with an API, not an API.
To me, it is like calling a set of search parameters in a URL an API or describing some random function call as an API. The interface is described by the language. The language isn't the interface.
A software interface between entities (components, programs, etc) that allows for communication between those entities.
What is yours?
Isn't that what SQL/CLI is for? https://publications.opengroup.org/c451
If so, it would make sense. Because that's not a whole lot of fun. But a GraphQL server-side that is based around the GraphQL Schema Language is another matter entirely.
I've written several applications that started out as proofs of concept and have evolved into production platforms based on this pairing:
https://lighthouse-php.com https://lighthouse-php-auth.com
It is staggeringly productive, replaces lots of code generation in model queries and authentication, interacts pretty cleanly with ORM objects, and because it's part of the Laravel request cycle is still amenable to various techniques to e.g. whitelist, rate-limit or complexity-limit queries on production machines.
I have written resolvers (for non-database types) and I don't personally use the automatic mutations; it's better to write those by hand (and no different, really, to writing a POST handler).
The rest is an enormous amount of code-not-written, described in a set of files that look much like documentation and can be commented as such.
One might well not want to use it on heavily-used sites, but for intranet-type knowledgebase/admin interfaces that are an evolving proposition, it's super-valuable, particularly paired with something like Nuxt. Also pretty useful for wiring up federated websites, and it presents an extremely rapid way to develop an interface that can be used for pretty arbitrary static content generation.
The difference between the two technologies is that LangChain was developed and funded before anyone knew what to do with LLMs, while GraphQL was internal tooling used to solve a real problem at Meta.
In a lot of ways, LangChain is a poor abstraction because the layer it's abstracting was (and still is) in its infancy.
Also, how much success have people had with automating the E2E tests for their various apps by stringing such agents together themselves?
EDIT: Typos
You can do that without function calling - as did the original ReAct paper - but then you have to write your own grammar for the communication with the LLM, a parser for it, and also you need to teach the LLM to use that grammar. This is very time consuming.
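For illustration, a sketch of that kind of hand-rolled grammar plus parser (the Thought/Action convention here is an assumption, not a fixed standard):

    import re

    # Tell the model to emit lines like:
    #   Thought: <free text>
    #   Action: <tool_name>[<argument>]
    ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)

    def parse_action(llm_output: str):
        match = ACTION_RE.search(llm_output)
        if match is None:
            return None  # the model answered directly instead of calling a tool
        tool_name, argument = match.group(1), match.group(2)
        return tool_name, argument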
There are a few startups in the space doing this, like QA Tech in Stockholm, and others even in YC (but I forgot the name). I'm skeptical of how successful they'll be, not just from complex test cases but things like data management and mistakenly affecting other tests. Interesting to follow just in case though, E2E is a pain!
my experience is that Python has a frustrating developer experience for production services. So I would prefer a framework with better abstractions and a solid production language (performance and safety), over no framework and Python (if those were options)
All of the logic of stringing prompts and outputs together can easily happen in basically any programming language with maybe a tiny bespoke framework customized to your needs.
Calling these things "AI agents" makes them sound both cooler and more complicated than they actually are or need to be. It's all just taking the output from one black box and sticking it into the input of another, the same kind of work frontline programmers have been doing for decades.
I think the reading is more "It's hard to find a good abstraction in a field that has not settled yet on what a good abstraction is. In that case, you might want to avoid frameworks as things shift around too much."
honestly I don't need that much abstraction.
LangChain is kinda like taking that state of hardware and bolting on a modern C++ compiler with templates and STL on it.
The example they use is indeed more complex than the openai equivalent, but LangChain allows you to use several models from several providers.
Also, it's true that the override of the pipe character is unexpected. But it should make sense, if you're familiar with Linux/Unix. And I find it shows more clearly that you are constructing a pipeline:
prompt | model | parser
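For context, roughly what that pipeline looks like in LCEL (a minimal sketch; import paths can vary slightly between LangChain versions):

    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    from langchain_openai import ChatOpenAI

    prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
    model = ChatOpenAI(model="gpt-4o")
    parser = StrOutputParser()

    chain = prompt | model | parser
    print(chain.invoke({"topic": "frameworks"}))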
Also the downside of not being able to easily tweak prompts based on experiments (crucial!)
And not to mention the library doesn't actually live up to this use case, and you immediately (IME) run into "you actually can't use a _Chain with provider _ if you want to use their _ API", so I ultimately did have to care about what's supposed to be abstracted over.
I honestly don't care about the syntax (as long as it's sane enough), and `|` operator overloading isn't the worst one. Manually having to define a parser object gives off some enterprise Java vibes, and I get the httplib vs requests comparison - but it's not the end of the world. If anything, the example from the article left me wondering "why do they say it's worse, when at this level of abstraction it really looks better unless we don't ever need to customize the pipeline at all?" And they never gave any real example (about spawning those agents or something) that actually shows where the abstractions are making things hard or obscure.
Honestly, on the first reading, the article [wrongly] gave me an impression of saying "we don't use LangChain anymore because it lacks good opinionated defaults", which is surely wrong - it would be a very odd take, given the initial premise of using it production for a long while.
(I haven't used LangChain or any LLMs in production, just toyed around a little bit. I can absolutely agree with the article that if all you care about is one single backend, then all those abstractions are not likely to be a good idea.)
I really want to at least understand when to use this as a tool but so far I've been failing to figure it out. Some of the things that I tried applying it for:
- Doing a kind of function calling (or at least, implementing the schema validation) for non-gpt models
- parsing out code snippets from responses (and ignoring the rest of the output)
- Having the output of a prompt return as a simple enum without hallucinations
- process a piece of information in multiple steps, like a decision tree, to create structured output about some text (is this a directory listing or a document with content? What category is it? Is it NSFW? What is the reason for it being NSFW?)
Any resources are appreciated
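For the enum item above, one framework-free sketch: constrain the prompt to the allowed values, validate the reply against the enum, and retry on anything else (`llm` stands in for whatever completion call you use):

    from enum import Enum

    class Category(str, Enum):
        DIRECTORY_LISTING = "directory_listing"
        DOCUMENT = "document"

    def classify(text: str, llm, retries: int = 2) -> Category:
        allowed = ", ".join(c.value for c in Category)
        prompt = f"Classify this page. Reply with exactly one of: {allowed}\n\n{text}"
        for _ in range(retries + 1):
            answer = llm(prompt).strip().lower()
            try:
                return Category(answer)
            except ValueError:
                continue  # hallucinated label; ask again
        raise ValueError("model never produced a valid category")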
There are now libraries that cover some of the features of Langchain. There is Instructor, and my own LLMEasyTools, for function calling, and there is LiteLLM for API unification.
can you comment how your library differs from instructor (what yours can do that instructor can't and vice versa?)
thanks
Then of course there are the many web application frameworks, because nobody in their right mind would want to implement HTTP request parsing themselves (outside of academic exercises).
In fact, I would argue that most popular frameworks exist precisely because it's often more time efficient to forget about underlying details. All computer software is built on abstraction. The key is picking the right level of abstraction for your use case.
I'm unconvinced there is no room for a framework here because LLMs are somehow special. LangChain just missed the mark. Unsurprisingly so, it being an early attempt, not to mention predating general availability of the LLM chatbots that have come to define the landscape.
But aside from that, I don't think I would run it in production. If something breaks, I feel like we would be in a world of pain to get things back up and running. I am glad they shared their experience on that, this is an interesting data point.
Any tool that helps you get up and running quicker by abstracting away boilerplate will eventually get in the way as your project's complexity increases.
It's not that complicated. The philosophy is just different from many other python projects. The LCEL pipes, for example, are a really nice way to think of modularity. Want to switch out one model for another? Just import another model and replace the old one. Want to parse more strictly? Exchange the parser. The fact that everything is an instance of `RunnableSerializable` is a really convenient way of making things truly modular. Want to test your pipe synchronously? Easy, just use `.stream()` instead of `.astream()` and get on with it.
I think my biggest hurdle was understanding how to debug and pipe components, but once I got familiarized with it, I must say it made me grow as a python dev and appreciate the structure and thought behind it. Where complexity arises is when you have a multi-step setup, some steps sync and some async. I've had to break some of these steps up in code, but otherwise it gives me tons of flexibility to pick and choose components.
My only real complaint would be lack of documentation and outdated documentation, I'm hardly the only one, but it really is frustrating sometimes to understand what some niche module can and cannot do.
It was interesting as a library at the very beginning to see how people were thinking about patterns but pretty useless in production.
But it quickly became obvious that LangChain would be better named LangSpaghetti.
That’s nothing against the authors. What are the chances the first attempt at solving a problem is successful? They should be commended for shipping quickly and raising money on top of it to keep iterating.
The mistake of LangChain is that they doubled down on the bad abstraction. They should have been iterating by exploring different approaches to solving the problem, not by adding even more complexity to their first attempt.
https://blog.langchain.dev/announcing-our-10m-seed-round-led...
Admittedly for anything more than 1-2 joins you are better off hand crafting the SQL. But that is the exception not the rule.
Refactoring DB changes becomes easier, you have a history of migrations for free, DDL generation for free.
In the early 2000s I worked somewhere where people handcrafted SQL for every little query across 100 tables, and yeah, you end up with inconsistent APIs and bugs that are eliminated by the code generation / metaprogramming done by ORMs.
Strong disagree: if that's true, you likely don't even need a proper RDBMS in the first place.
An ORM is not a replacement for knowing how SQL works, and it never will be.
Yes; exactly. There's value in a Schelling Point[0], and in a pattern language[1].
> requires literally none
True, yes. There isn't infinite value in these things, and "duplication is far cheaper than the wrong abstraction"[2], but they can't be avoided; they occupy local maxima.
0. https://en.wikipedia.org/wiki/Focal_point_(game_theory)
1. https://en.wikipedia.org/wiki/Pattern_language
2. https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction
my guess is 40% of software engineers did an AI pivot in the last 18 months, so there's a massive market for frameworks, and there's an inclination to go beyond REST requests and find something that just does it for you / can do all the cool patterns you'll find in research papers.
Incredible amount of bad info out there, whether it's the 10th prompting framework that boils down to a while loop and just drives up token costs, the 400th LLM tokenizer library that can only do GPT-3.5/4.0, the Nth app that took XX ex-FAANG engineers and $XX mil and a year to produce another web app, or another iOS-only OpenAI client with background blur, memory that's an array of strings injected into every call.
It's at the point where I'm hoping for a cooling down even though I'm launching something*, and think it's hilarious people rant about it all just being hype and think people agree.
* TL;Dr consumer app with 'chain gui', just hand people an easy to use GUI like playground.openai.com / console.anthropic.com, instead of getting cute and being the Nth team to try to launch a full grade assistant on a monthly plan matching openai pricing, shoving 6000K+ prompts with each request and not showing them
The abstractions are handy if you have no idea what you are doing but it's not groundbreaking tech.
I should have built stronger separation boundaries with more general abstractions. It works fine, I haven't had any critical bugs / mistakes, but it's really nasty once you get to the actual JSON you'll send.
Google's was 100% designed by a committee of people who had never seen anyone else's API, and if they had, they would have dismissed it via NIH. (disclaimer: ex-Googler, no direct knowledge)
Google made their API before the others had one, since they were the first to make these kinds of language models. It's just that it was an internal API before.
That'd be a good explanation, but it's theoretical.
In practice:
A) there was no meaningful internal LLM API pre-ChatGPT. All this AI stuff was under lock and key until Nov 2022, then it was an emergency.
B) the bits we're discussing are OpenAI-specific concepts that could only have occurred after OpenAI's.
The API includes chat messages organized with roles, an OpenAI concept, and "tools", an OpenAI concept, both of which came well after the GPT API.
Initial API announcement here: https://developers.googleblog.com/en/palm-api-makersuite-an-...
> All this AI stuff was under lock and key until Nov 2022
That is all wrong... Did you work there? What do you base this on? Google has been experimenting with LLMs internally ever since the original paper. I worked in search then, and I remember my senior manager saying this was the biggest revolution in natural language processing ever.
So even if Google added a few concepts from OpenAI, or renamed them, they still have had plenty of experience working with LLM APIs internally and that would make them want different things in their public API as well.
Absolutely not. Note that ex. Google's AI answers are not from an LLM and they're very proud of that.
> So they have had internal APIs for this for quite some time.
We did not have internal or external APIs for "chat completions" with chat messages, roles, and JSON schemas until after OpenAI.
> Did you work there?
Yes
> What do you base this on?
The fact it was under lock and key. You had to jump through several layers of approvals to even get access to a standard text-completion GUI, never mind API.
> has been experimenting with LLMs internally ever since the original paper,
What's "the original paper"? Are you calling BERT an LLM? Do you think transformers implied "chat completions"?
> that would make them want different things in their public API as well.
It's a nice theoretical argument.
If you're still convinced Google had a conversational LLM API before OpenAI, or that we need to quibble everything because I might be implying Google didn't invent transformers, there's a much more damning thing:
The API is Gemini-specific and released with Gemini, ~December 2023. There's no reason for it to be so different other than NIH and proto-based thinking. It's not great. That's why ex. we see the other comment where Cloud built out a whole other API and framework that can be used with OpenAI's Python library.
This is absolutely false, as the other person said. As one example: We had already built and were using AI based code completion in production by then.
Here's a public blog post from July, 2022: https://research.google/blog/ml-enhanced-code-completion-imp...
This is just one easy publicly verifiable example, there are others. (We actually were doing it before copilot, etc)
This follows right in line with the rest of your approach.
If you want to know things, it works better to ask questions than make assertions about what other people did or didn't do.
Nobody really cares about the opinions of those who can't be bothered to learn.
We built something like this for ourselves here -> https://www.npmjs.com/package/@kluai/gateway?activeTab=readm....
Documentation is a bit sparse but TL;DR - deploy it in a Cloudflare worker and now you can access about 15 providers (the ones that matter - OpenAI, Cohere, Azure, Bedrock, Gemini, etc.) all with the same API without any issues.
I haven't tried it out in code, it's too late for me and I'm doing native apps, but I can tell you this is a significant step up in the space.
Even if you don't use multiple LLMs yet, and your integration is working swell right now, you will someday. These will be commodities, valuable commodities, but commodities. It's better to get ahead of it now.
Ex. If you were using GPT-4 2 months ago, you'd be disappointed by GPT-4o, and it'd be an obvious financial and quality decision to at least _try_ Claude 3.5 Sonnet.
It's a weird one. Benchmarks great. Not bad. Pretty damn good. But ex. It's now the only provider I have to worry about for RAG. Prompt says "don't add footnotes, pause at the end silently, and I will provide citations", and GPT-4o does nonsense like saying "I am now pausing silently for citations: markdown formatted divider"
    chat_model:
      cls: llama_index.llms.openai.OpenAI
      kwargs:
        model: gpt-4

    chat_model:
      cls: llama_index.llms.gemini.Gemini
      kwargs:
        model_name: models/gemini-pro
They have the concept of providers [2], and switching between them is as easy as changing the parameters of a function[3].
[1]:https://sdk.vercel.ai/docs/introduction
[2]: https://sdk.vercel.ai/docs/foundations/providers-and-models
[3]: https://sdk.vercel.ai/docs/ai-sdk-core/overview#ai-sdk-core
When it came to building anything real beyond toy examples, I quickly outgrew it and haven't looked back. We don't use any LC in production. So while LC does get a lot of hate from time to time (as you see in a lot of peers posts here) I do owe them some credit for helping bridge my learning of this domain.
https://monarchwadia.medium.com/use-openai-in-your-javascrip...
Of course, once the model is proven it is handed off to developers to build something more production-worthy.
It had the advantage of a standardized API, so I could switch a local LLM to OpenAI and compare results in a heartbeat, but when I wanted anything out of the ordinary (i.e. getting logprobs), there was just no way.
We have companies using Langroid in production.
[1] Langroid: https://github.com/langroid/langroid
Btw, you don't have to actually chain langchain entities. You can use all of them directly. That makes the magic-framework-code issue much more tolerable, as Langchain turns from a framework into a library.
wait do you have specific examples of "overengineering and overabstracting" from llamaindex? very open to feedback and suggestions on improvement - we've put a lot of work into making sure everything is customizable
Even if you’re mostly working just with a provider SDK and other lightweight, low-dependency convenience wrappers for stuff you know you’ll almost always need (e.g. Instructor for structured output and retry), you can easily sprinkle LI in where you need it as a wrapper over common context retrieval patterns.
Unlike LangChain, which is a nightmare to pull out once you’ve started working with it - LI can be cleanly excised if you change your mind.
Here is an example article that shows how to use OpenAI calls with txtai: https://neuml.hashnode.dev/rag-with-llamacpp-and-external-ap...
We initially had problems diagnosing issues inside LangChain and were hitting weird issues with some elements of function calling, so we experimented with a manual reconstruction of exactly what we needed and it was faster, more resilient and easier to maintain.
I can see how switching models might be easier using LangChain as an abstraction layer, but that doesn't justify making everything else harder.
99% of docs mention LangChain or show a code example with LangChain. Wherever you look at tutorials or YouTube videos, you will see LangChain.
They get the credit for being the first framework to abstract LLM calls and other features, such as reading data from multiple sources (before function calling was a thing).
Langchain was first and got popular, and hence newcomers think it's the way - until they use it.
Go to foo_website and put your credit card to get their API. Then go to bar_website, get their API. Then go to yayeee_website and get their API. Then go to...
But unironically.
I actually counted 4 APIs in some 'how to' article. I ended up DIYing that with 0 APIs.
Whoever got into langchain planted their APIs. That is why it sucks.
The benefits of langchain are: (1) unified abstraction across multiple different models and (2) being able to plug this coherently into one architecture.
If you’re just calling some OpenAI endpoints, then why use it in the first place?
I did a full tutorial with source code that's linked at the top of that page ^
Fwiw I think it's a good idea to build with and without Langchain for deeper understanding.
https://github.com/arakoodev/EdgeChains/tree/ts/JS/edgechain...
examples of these jsonnet for react COT chains - https://github.com/arakoodev/EdgeChains/blob/ts/JS/edgechain...
P.S. we also build a webassembly compiler that compiles this down to wasm and deploy on hardware.
Curious thing, but I'd rather not partake myself.
LLM is already a probabilistic component that is tricky to integrate into a solid deterministic system. An abstraction wrapper that bloats the already fuzzy component just increases the complexity for no apparent benefit.
But, if you're familiar with Linux/Unix, this should be familiar. You are piping the output of one function as the input of another function.
As someone new to the space I have zero opinion on whether LangChain is better than writing it all yourself, but I can certainly say that I, at least, appreciate having a prescribed way of doing things, and I'm okay with the idea that I may get to a place where it no longer serves my needs. It's also worth noting that the benefit of LangChain is the ability to "chain" together these various AI links. Is there a better, easier way to do that? Probably, but LangChain removes that overhead.
You don't need an abstraction at all really. Inserting the previous output into the new prompt is one line of code, and calling the API is another line of code.
If you really feel like you need to abstract that then you can make an additional helper function. But often you want to do different things at each stage so that doesn't really help.
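For example (a sketch, with `llm` and `report` as stand-ins for your own completion helper and input text):

    # `llm` and `report` are assumed to exist already.
    summary = llm(f"Summarize this report:\n{report}")
    # Insert the previous output into the new prompt (one line)...
    followup = f"List three risks implied by this summary:\n{summary}"
    # ...and call the API again (another line).
    risks = llm(followup)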
For my personal LLM hacking in Python, I am starting down the same path: writing simple vector data stores in NumPy, writing my own prompting tools and LLM wrappers, etc.
I still think that for many developers LangChain and LlamaIndex are very useful (and I try to keep my book up to date), but I usually write about things of most interest to me and I have been thinking of rewriting a new book on framework-free LLM development.