> MCP discards this lesson, opting for schemaless JSON with optional, non-enforced hints. Type validation happens at runtime, if at all. When an AI tool expects an ISO-8601 timestamp but receives a Unix epoch, the model might hallucinate dates rather than failing cleanly. In financial services, this means a trading AI could misinterpret numerical types and execute trades with the wrong decimal precision. In healthcare, patient data types get coerced incorrectly, potentially leading to wrong medication dosing recommendations. Manufacturing systems lose sensor reading precision during JSON serialization, leading to quality control failures.
Having worked with LLMs every day for the past few years, I can easily see every single one of these things happening.
I can practically see it playing out now: there is some huge incident of some kind, in some system or service with an MCP component somewhere, with some elaborate post-mortem revealing that some MCP server somewhere screwed up and output something invalid, the LLM took that output and hallucinated god knows what, its subsequent actions threw things off downstream, etc.
It would essentially be a new class of software bug caused by integration with LLMs, and it is almost sure to happen when you combine it with other sources of bug: human error, the total lack of error checking or exception handling that LLMs are prone to (they just hallucinate), a bunch of gung-ho startups "vibe coding" new services on top of the above, etc.
I foresee this being followed by a slew of Twitter folks going on endlessly about AGI hacking the nuclear launch codes, which will probably be equally entertaining.
Before 2023 I always thought that all the bugs and glitches of technology in Star Trek were totally made up and would never happen this way.
Post-LLM I am absolutely certain that they will happen exactly that way.
I am not sure what LLM integrations have to do with engineering anymore, or why it makes sense to essentially put all your company's infrastructure into external control. And that is not even scratching the surface with the lack of reproducibility at every single step of the way.
It "somehow works" isn't engineering.
Never mind the quality or if it's even going to work in production.
And maybe that's all that's needed, I don't really know.
I'm sure that's just me being the old curmudgeon of a software engineer I am, wishing people thought about more than one user using a system and 2 engineers supporting it.
Consider this: everything will "somehow work" if the system has been around for generations and is complex enough that no single human can hold all of it in their head at any given time.
It is easy to keep a system high quality, well maintained, and well understood for a year with a small team, but imagine doing that for 100+ years, with a system constantly growing in complexity, across generations of maintainers, with people rotating in and out.
Maybe we are stuck in the past and are in the process of disillusioning ourselves, and that's why the older generation is so reluctant to use vibe coding tools?
What's great is that a whole new area of research is available, because previous methodologies don't seem to work anymore and we have to find different ways to get better results out of vibe coding workflows.
But it sure is fast.
So very much like an LLM accessing multiple pieces of functionality across different tools and API endpoints (if you want to imagine it that way).
While it is seemingly very knowledgeable, it is rather stupid. It gets duped by nefarious actors, and it has a class of elementary bugs that put the crew into awkward positions.
Most professional software engineers might previously have regarded these scenarios as implausible, given that the "failure model" of current software is quite blunt, and especially given how far into the future the series takes place.
Now we see that computational tasks are becoming less predictable and less straightforward, with cascading failures instead of blunt, direct ones. Interacting with an LLM when it starts to hallucinate might be compared to talking with a person in psychosis.
So you get things like this in the Star Trek universe: https://www.youtube.com/watch?v=kUJh7id0lK4
Which makes a lot more sense, and becomes a lot more plausible and relatable, given our current implementations of AI/LLMs.
I wanted to add that in Star Trek they always talk in technobabble, things like "Computer, create a matrix from a historic person who was knowledgeable in a specialized surgery field," and then the holodeck creates an approximation of that avatar, with the programming and simulated/hallucinated expertise.
The holodeck is a special kind of weird because sooo many accidents happen due to sloppy coding: the ship's AI creates flawed programs that later hurt crew members when safety protocols fail or get ignored/bypassed, which we now see mirrored in the rising field of red-team prompt engineering.
Additionally, in Star Trek instead of coding on tablets, they usually just show analytics data or debug views of what the ship's computer created. The crew never actually code on a computer, and if they do they primarily just "vibe code" it by saying absurd things like "Computer, analyze the enemy ship's frequency and create a phasing shield emitter to block their phasers" (or something like that) and the computer generates those programs on the fly.
The cool part that I liked the most is when Voyager's neural packs (think of them as the AI-to-system control adapters) actually got sick with a biological virus because they were essentially made out of brain matter.
I liked that part too. I hadn't paid much attention before, but it was a fun revelation that the computer is run by a bunch of brain-tissue pouches. The LLM's "guts," so to speak, are pretty much a collection of brain-tissue clumps semantically: weights and connections, as opposed to some database of logical assertions like the expert systems people envisioned in the 1980s.
The author even later says that MCP supports JSON Schema, but also claims "you can't generate type-safe clients," which is plainly untrue: plenty of JSON Schema code generators exist.
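For what it's worth, the claim is easy to check: JSON Schema code generators exist for most ecosystems (json-schema-to-typescript, quicktype, datamodel-code-generator, etc.). A hedged sketch of the idea, with a made-up schema, and a dataclass standing in for the kind of typed model such a generator emits:

```python
from dataclasses import dataclass

# A JSON Schema of the kind an MCP tool can declare for its output
# (hypothetical example):
schema = {
    "type": "object",
    "properties": {
        "symbol": {"type": "string"},
        "quantity": {"type": "integer"},
    },
    "required": ["symbol", "quantity"],
}

# A generator such as datamodel-code-generator turns that schema into
# a typed model roughly like this (its real output defaults to pydantic
# models rather than plain dataclasses):
@dataclass
class ToolResult:
    symbol: str
    quantity: int

result = ToolResult(symbol="ACME", quantity=100)
```

With the generated type in place, a mismatched payload fails at construction or type-check time instead of silently flowing onward.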
Claude will happily cast your int into a 2023 Toyota Yaris and keep on hallucinating things.
> Cast an integer into the type of a 2023 Toyota Yaris using Javascript
(GPT-4o mini)
> To cast an integer into the type of a 2023 Toyota Yaris in JavaScript, you would typically create a class or a constructor function that represents the Toyota Yaris. Then, you can create an instance of that class using the integer value. Here's an example of how you might do this:
    // Define a class for the 2023 Toyota Yaris
    class ToyotaYaris {
      constructor(modelYear, value) {
        this.modelYear = modelYear;
        this.value = value;
      }

      displayInfo() {
        console.log(`Model Year: ${this.modelYear}, Value: ${this.value}`);
      }
    }

    // Function to cast an integer into a Toyota Yaris object
    function castToYaris(integerValue) {
      const modelYear = 2023; // Set the model year
      return new ToyotaYaris(modelYear, integerValue);
    }

    // Example usage
    const integerValue = 20000; // Example integer value
    const yaris = castToYaris(integerValue);
    yaris.displayInfo(); // Output: Model Year: 2023, Value: $20000
Claude Code validated the response against the schema and did not pass the response to the LLM.
test - test_tool (MCP)(input: "foo")
⎿ Error: Output validation error: 'bar' is not of type 'integer'
It works in this instance. On this run. It is not guaranteed to work next time. There is an error percentage here that makes it _INEVITABLE_ that eventually, with enough executions, the validation will pass when it should fail.
It will choose not to pass this to the validator, at some point in the future. It will create its own validator, at some point in the future. It will simply pretend like it did any of the above, at some point in the future.
This might be fine for your B2B use case. It is not fine for underlying infrastructure for a financial firm or communications.
what happens if this schema validation fails here - what will the mcp server respond with and what will the llm do next (in a deterministic sense)?
llm tool call -> mcp client validates the schema -> mcp client calls the tool -> mcp server validates the schema
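To make the shape of that chain concrete, here is a minimal sketch; every name is hypothetical, and a real client uses a full JSON Schema validator rather than this toy type check. The point is only the ordering: each step is plain sequential code, executed before the model sees anything.

```python
TYPES = {"string": str, "integer": int, "boolean": bool}

def conforms(value, schema):
    """Toy stand-in for a real JSON Schema validator: flat objects only."""
    return all(isinstance(value.get(k), TYPES[spec["type"]])
               for k, spec in schema["properties"].items())

def run_tool_call(call, tool):
    # 1. mcp client validates the arguments the LLM produced
    if not conforms(call["arguments"], tool["input_schema"]):
        return {"is_error": True, "content": "invalid arguments"}
    # 2. mcp client calls the tool (a real server re-validates on its side)
    result = tool["fn"](**call["arguments"])
    # 3. client validates the result before the LLM ever sees it
    if not conforms(result, tool["output_schema"]):
        return {"is_error": True,
                "content": "Output validation error: result does not match schema"}
    return {"is_error": False, "content": result}

buggy_tool = {
    "fn": lambda city: {"temp_c": "warm"},  # wrong output type on purpose
    "input_schema": {"properties": {"city": {"type": "string"}}},
    "output_schema": {"properties": {"temp_c": {"type": "integer"}}},
}
out = run_tool_call({"arguments": {"city": "Oslo"}}, buggy_tool)
# out["is_error"] is True: the invalid result is replaced by an error message
```

Whether the error or the result goes back to the model, the routing decision was made by this code, not by the model.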
At the server level it's just a good old JSON API at this point, and the server would send the usual error message it would send out to anyone.
Can you guarantee it will validate every time? Can you guarantee that the way MCPs/tool calling are implemented (already an incredible joke that only Python-brained developers would inflict upon the world) will always go through the validation layer? Are you even sure which part of Claude handles this validation? Sure, it didn't cast an int into a Toyota Yaris. Will it cast "70Y074" into one? Maybe a 2022 one. What if there are parsing rules embedded in a string, will it respect them every time? What if you use it outside of Claude Code and just ask nicely through the API, can you guarantee this validation still works? Or that they won't break it next week?
The whole point of it is, whichever LLM you're using is already too dumb to not trip when lacing its own shoes. Why you'd trust it to reliably and properly parse input badly described by a terrible format is beyond me.
Yes, to the extent you can guarantee the behavior of any third-party software, you can (which you can't fully guarantee no matter what spec the software supposedly implements, so the gaps aren't an MCP issue). "The app enforces schema compliance before handing the results to the LLM" is deterministic behavior in the traditional app that provides the toolchain, the interface between the tools (and the user) and the LLM; it is not non-deterministic behavior driven by the LLM. Hence, "before handing the results to the LLM."
> The whole point of it is, whichever LLM you're using is already too dumb to not trip when lacing its own shoes. Why you'd trust it to reliably and properly parse input badly described by a terrible format is beyond me.
The toolchain is parsing, validating, and mapping the data into the format preferred by the chosen model's prompt template. The LLM has nothing to do with that, because by definition it has to happen before the LLM can see the data.
You aren't trusting the LLM.
The LLM has everything to do with that. The LLM is literally choosing to do that. I don't know why this point keeps getting missed or side-stepped.
It WILL, at some point in the future and given enough executions, as a matter of statistical certainty, simply not do that above, or pretend to do the above, or do something totally different at some point in the future.
No, the LLM doesn't control on a case-by-case basis what the toolchain does between the LLM putting a tool call request in an output message and the toolchain calling the LLM afterwards.
If the toolchain is programmed to always validate tool responses against the JSON schema provided by MCP server before mapping into the LLM prompt template and calling the LLM again to handle the response, that is going to happen 100% of the time. The LLM doesn't choose it. It CAN'T because the only way it even knows that the data has come back from the tool call is that the toolchain has already done whatever it is programmed to do, ending with mapping the response into a prompt and calling the LLM again.
Even before MCP, or even before models specifically trained for tool calling with vendor-provided templates (but after the ReAct architecture was described), it was a weekend project to implement a basic framework supporting tool calling around a local or remote LLM. I don't think you need to do that to understand how silly the claim is that the LLM controls what the toolchain does with each response and might make it skip validation, but doing it will certainly give you a visceral understanding of how silly it is.
The pieces here are:
* Claude Code, a Node (Javascript) application that talks to MCP server(s) and the Claude API
* The MCP server, which exposes some tools through stdin or HTTP
* The Claude API, which is more structured than "text in, text out".
* The Claude LLM behind the API, which generates a response to a given prompt
Claude Code is a Node application. CC is configured in JSON with a list of MCP servers. When CC starts up, CC's Javascript initialises each server and, as part of that, gets a list of callable functions.
When CC calls the LLM API with a user's request, it's not just "here is the user's words, do it". There are multiple slots in the request object, one of which is a "tools" block, a list of the tools that can be called. Inside the API, I imagine this is packaged into a prefix context string like "you have access to the following tools: tool(args) ...". The LLM API probably has a bunch of prompts it runs through (figure out what type of request the user has made, maybe using different prompts to make different types of plan, etc.) and somewhere along the way the LLM might respond with a request to call a tool.
The LLM API call then returns the tool call request to CC, in a structured "tool_use" block separate from the freetext "hey good news, you asked a question and got this response". The structured block means "the LLM wants to call this tool."
CC's JS then calls the server with the tool request and gets the response. It validates the response (e.g., JSON schemas) and then calls the LLM API again bundling up the success/failure of the tool call into a structured "tool_result" block. If it validated and was successful, the LLM gets to see the MCP server's response. If it failed to validate, the LLM gets to see that it failed and what the error message was (so the LLM can try again in a different way).
The idea is that if a tool call is supposed to return a CarMakeModel string ("Toyota Tercel") and instead returns an int (42), JSON Schemas can catch this. The client validates the server's response against the schema, and calls the LLM API with
    {
      "type": "tool_result",
      "tool_use_id": "abc123",
      "is_error": true,
      "content": [
        {
          "type": "text",
          "text": "Expected string, got integer."
        }
      ]
    }
So the LLM isn't choosing to call the validator; the deterministic Javascript that is Claude Code chooses to call the validator. There are plenty of ways for this to go wrong: the client (Claude Code) has to validate; int vs string isn't the same as "is a valid timestamp/CarMakeModel/etc"; if you helpfully put the thing that failed into the error message ("Expected string, got integer (42)") then the LLM gets 42 and might choose to interpret that as a CarMakeModel if it's having a particularly bad day; the LLM might say "well, that didn't work, but let's assume the answer was Toyota Tercel, a common car make and model"; ... We're reaching here, yet these are possible.
But the basic flow has validation done in deterministic code and hiding the MCP server's invalid responses from the LLM. The LLM can't choose not to validate. You seemed to be saying that the LLM could choose not to validate, and your interlocutor was saying that was not the case.
I hope this helps!
No, they're literally just skipping an entire step in how LLMs actually "use" MCP.
MCP is just a standard, largely for humans. LLMs do not give a singular fuck about it. Some might be fine-tuned for it to decrease erroneous output, but at the end of the day it's just system prompts.
And respectfully, your example misunderstands what is going on:
>* The Claude API, which is more structured than "text in, text out".
>* The Claude LLM behind the API, which generates a response to a given prompt
No. That's not what "this" is. LLMs use MCP to discover tools they can call, aka function/tool calling. MCP is just an agreed-upon format; it doesn't do anything magical. It's just a way of aligning the structure across companies, teams, and people.
There is not an "LLM behind the API"; while a specific tool might implement its overall feature set using LLMs, that's totally irrelevant to what's being discussed and to the principal point of contention.
Which is this: an LLM interacting with other tools via MCP still needs system prompts or fine-tuning to do so. Neither of those things is predictable or deterministic. They will fail at some point in the future. That is indisputable. It is a matter of statistical certainty.
It's not up for debate. And an agreed upon standard between humans that ultimately just acts as convention is not going to change that.
It is GRAVELY concerning that so many people are trying to throw around technical jargon they are clearly ill-equipped to use. The magic rules all.
No, you are literally misunderstanding the entire control flow of how an LLM toolchain uses both the model and any external tools (whether specified via MCP or not, but the focus of the conversation is MCP).
> MCP is just a standard, largely for humans.
The standard is for humans implementing both tools and the toolchains that call them.
> LLM's do not give a singular fuck about it.
Correct. LLM toolchains (which, if they can connect to tools via MCP, are also MCP clients) care about it. LLMs don't care about it because the toolchain is the thing that actually calls both the LLM and the tools. And that's true whether the toolchain is a desktop frontend with a local, in-process llama.cpp backend for running the LLM, or the Claude Desktop app with a remote connection to the Anthropic API for calling the LLM, or whatever.
> Some might be fine tuned for it to decrease erroneous output,
No, they aren't. Most models used for tool calling now are specially trained for it, with a well-defined format for requesting tool calls from the toolchain and receiving results back from it (though this isn't necessary for tool calling to work; people were using the ReAct pattern in toolchains to do it with regular chat models, without any special training or prespecified prompt/response format for tool calls, just by having the toolchain inject tool-related instructions into the prompt and reading LLM responses to see if the model was asking for tool calls). But none of the models that exist now are fine-tuned for MCP, nor do they need to be, because they literally never see it.

The toolchain reads LLM responses, identifies tool call requests, takes any that map to tools defined via MCP, routes them down the channel (HTTP or subprocess stdio) specified by the MCP config, and does the reverse with responses from the MCP server: validating them, then mapping them into a prompt template that specifies where tool responses go and how they are formatted. It does the same thing (minus the MCP parts) for tools that aren't specified by MCP (frontends might have their own built-in tools, or other mechanisms for custom tools that predate MCP support). The LLM sees no difference between MCP tools, other tools, or a human reading the message with the tool request and manually crafting a response that goes directly back.
> LLM's use MCP to discover tools they can call,
No, they don't. LLM frontends, which are traditional deterministic programs, use MCP to do that, and to find schemas for what should be sent to and expected from the tools. LLMs don’t see the MCP specs, and get information from the toolchain in prompts in formats that are model-specific and unrelated to MCP that tell them what tools they can request calls be made to and what they can expect back.
> an LLM interacting with other tools via MCP still needs system prompts or fine tuning to do so. Both of those things are not predictable or deterministic. They will fail at some point in the future. That is indisputable.
That's not, contrary to your description, a point of contention.
The point of contention is that the validation of data returned by an MCP server against the schema provided by the server is not predictable or deterministic. Confusing these two issues can only happen if you think the model does something with each response that controls whether or not the toolchain validates it, which is impossible, because the toolchain does whatever validation it is programmed to do before the model sees the data. The model has no way to know there is a response until that happens.
Now, can the model make requests that don't fit the toolchain's expectations, due to unpredictable model behavior? Sure. Can the model do dumb things with the post-validation response data after the toolchain has validated it, mapped it into the model's prompt template, and called the model with that prompt, for the same reason? Abso-fucking-lutely.
Can the model do anything to tell the toolchain not to validate response data for a tool call the toolchain decided to make on the model's behalf, if the toolchain is programmed to validate the response data against the schema provided by the tool server? No, it can't. It can't even know that the tool was provided by an MCP server and that that might be an issue, nor can it know that the toolchain made the request, nor can it know that the toolchain received a response until the toolchain has done what it is programmed to do with the response, through the point of populating the prompt template and calling the model with the resulting prompt, by which point any validation it was programmed to do has been done and is an immutable part of history.
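A minimal sketch of that control flow (all names hypothetical, loosely modeled on Anthropic-style tool_use/tool_result message shapes): the toolchain reads the model's response, routes any tool calls through its own registry, and appends a tool_result message before calling the model again. The model never touches this code.

```python
def handle_model_response(conversation, response, registry):
    """Deterministic toolchain step: run any requested tools, then
    append their results to the conversation for the next model call."""
    results = []
    for block in response["content"]:
        if block["type"] != "tool_use":
            continue  # plain text blocks go straight back to the user
        tool = registry[block["name"]]      # MCP, built-in, or other:
        output = tool(**block["input"])     # the model can't tell which
        results.append({"type": "tool_result",
                        "tool_use_id": block["id"],
                        "content": str(output)})
    if results:
        conversation.append({"role": "user", "content": results})
    return conversation  # next step: call the model with this context

registry = {"add": lambda a, b: a + b}
response = {"content": [{"type": "tool_use", "id": "t1",
                         "name": "add", "input": {"a": 2, "b": 3}}]}
convo = handle_model_response([], response, registry)
# convo[-1] now carries the tool_result "5" back toward the model
```

Any validation step the toolchain performs would sit inside this function, between calling the tool and appending the result, which is why the model cannot skip it.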
You are REALLY, REALLY misunderstanding how this works. Like severely.
You think MCP is being used for some other purpose despite the one it was explicitly designed for... which is just weird and silly.
>Confusing these two issues can only happen if you think the model does something with each response that controls whether or not the toolchain validates it
No, you're still just arguing against something no one is arguing for the sake of pretending like MCP is doing something it literally cannot do or fundamentally fix about how LLM's operate.
I promise you if you read this a month from now with a fresh pair of eyes you will see your mistake.
That will depend on what MCP client you are using and how they've handled it.
How does the AI bypass the MCP layer to make the request? The assumption is (as I understand it) the AI says “I want to make MCP request XYZ with data ABC” and it sends that off to the MCP interface which does the heavy lifting.
If the MCP interface is doing the schema checks, and tossing errors as appropriate, how is the AI routing around this interface to bypass the schema enforcement?
It doesn't. I don't know why the other commenters are pretending this step does not happen.
There is a prompt that basically tells the LLM to use the generated manifest/configuration files. The LLM still has to not hallucinate in order to properly call the tools over JSON-RPC and properly follow the MCP protocol. It then also has to make sense of the structured prompts that define the tools in the MCP manifest/configuration file.
It's system prompts all the way down. Here's a good read on some of the underlying/supporting concepts: https://huggingface.co/docs/hugs/en/guides/function-calling
Why this fact is seemingly being lost in this thread, I have no idea, but I don't have anything nice to say about it so I won't :). Other than we're all clearly quite screwed, of course.
MCP is there to make things standard for humans, with expected formats. The LLMs really couldn't give a shit and have nothing super special about how they interact with MCP configuration files or the protocol (other than some additional fine-tuning, again, to make wrong output less likely).
No, there isn't. The model sees no difference between MCP-supplied tools, tools built into the toolchain, and tools supplied by any other method. The prompt simply provides tool names, arguments, and response types to the model. The toolchain, a conventional deterministic program, reads the model's response, finds anything matching the model's defined format for tool calls, parses out the call names and arguments, looks up the names in its own internal list of tools to see whether they are internal, MCP-supplied, or other tools, and routes the calls appropriately. It then gathers responses, does any validation it is designed to do, maps the validated results into where the model's prompt template specifies tool results should go, and calls the model again with a new message appended to the previous conversation context containing the tool results.
This really is not that hard to understand. The LLM must be "bootstrapped" with tool definitions and it must retain stable enough context to continue to call those tools into the future.
This will fail at some point, with any model. It will pretend to do a tool call, it will simply not do the tool call, or it will attempt to call a tool that does not exist, or any of the above or anything else not listed here. It is a statistical certainty.
I don't know why people are pretending MCP does something to fix this, or that MCP is special in anyway. It won't, and it's not.
Make sure you have a good understanding of the overall model: https://hackteam.io/blog/your-llm-does-not-care-about-mcp/
Then take a look at research like this: https://www.archgw.com/blogs/detecting-hallucinations-in-llm...
As for your comments on LLM pretending to do tool calls, sure. That's not what the original thread comments were discussing. There are ways to mitigate this with proper context and memory management but it is more advanced.
That is what the original article is describing, and what the comments misunderstood or purposefully over-simplified; it also extends to being able to trace these issues across a large number of calls/invocations at scale.
>MCP has none of this richness. No machine-readable contracts beyond basic JSON schemas means you can’t generate type-safe clients or prove to auditors that AI interactions follow specified contracts.
>MCP ignores this completely. Each language implements MCP independently, guaranteeing inconsistencies. Python’s JSON encoder handles Unicode differently than JavaScript’s JSON encoder. Float representation varies. Error propagation is ad hoc. When frontend JavaScript and backend Python interpret MCP messages differently, you get integration nightmares. Third-party tools using different MCP libraries exhibit subtle incompatibilities only under edge cases. Language-specific bugs require expertise in each implementation, rather than knowledge of the protocol.
>Tool invocations can’t be safely retried or load-balanced without understanding their side effects. You can’t horizontally scale MCP servers without complex session affinity. Every request hits the backend even for identical, repeated queries.
Somehow commenters confused a server <-> client interaction, which has been a non-issue for decades, with making the rest of the "call stack" dependable. What leads to that level of confusion I can only guess: inexperience and religious zealotry.
It's also worth noting that certain commenters saying I "should" (I'm using this word on purpose) read the spec is also pretty laughable, considering how vague the "protocol" itself is.
>Clients SHOULD validate structured results against this schema.
Have fun with that one. MCP could have at least copied the XML/SOAP process around this and we'd be better off.
Which again leads back to the article's ultimate premise. MCP does a lot of talking and not a lot of walking; it's pointless at best and is going to lead to A LOT of integration headaches.
The MCP interface (Claude Code in this case) is doing the schema checks. Claude Code will refuse to provide the result to the LLM if it does not pass the schema check, and the LLM has no control over that.
Which is completely fucking irrelevant to what everyone else is discussing.
> Which is completely fucking irrelevant to what everyone else is discussing.
Not sure what you think is going on, but that is literally the question this subthread is debating, starting with an exchange in which the salient claims were:
From: https://news.ycombinator.com/item?id=44849695
> Claude Code validated the response against the schema and did not pass the response to the LLM.
From: https://news.ycombinator.com/item?id=44850894
> This time.
> Can you guarantee it will validate it every time ?
I can't guarantee that behavior will remain the same any more than for any other software. But all this happens before the LLM is even involved.
> The whole point of it is, whichever LLM you're using is already too dumb to not trip when lacing its own shoes. Why you'd trust it to reliably and properly parse input badly described by a terrible format is beyond me.
You are describing why MCP supports JSON Schema. It requires parsing & validating the input using deterministic software, not LLMs.
No. It is not. You are still misunderstanding how this works. It is "choosing" to pass this to a validator or some other tool, _for now_. As a matter of pure statistics, it will simply not do this at some point in the future on some run.
It is inevitable.
Or write a simple MCP server and a client that uses it. FastMCP is easy: https://gofastmcp.com/getting-started/quickstart
You are quite wrong. The LLM "chooses" to use a tool, but the input (provided by the LLM) is validated with JSON Schema by the server, and the output is validated by the client (Claude Code). The output is not provided back to the LLM if it does not comply with the JSON Schema, instead an error is surfaced.
I think the others are trying to point out that, statistically speaking, in at least one run the LLM might do something other than choose to use the correct tool. I.e., 1 out of (say) 1 million runs it might do something else.
The question is whether, having observed Claude Code validating a tool response before handing the response back to the LLM, you can count on that validation on future calls, not whether you can count on the LLM calling a tool in a similar situation.
>The LLM "chooses" to use a tool
Take a minute to just repeat this a few times.
LLMs cannot decide to skip this validation. They can only decide not to call the tool.
So is your criticism that MCP doesn't specify if and when tools are called? If so then you are essentially asking for a massive expansion of MCP's scope to turn it into an orchestration or workflow platform.
No, it's not. The validation happens at the frontend before the LLM sees the response. There is no way for the LLM to choose anything about what happens.
The cool thing about having coded a basic ReAct pattern implementation (before MCP, or even models trained on a specific prompt format for tool calls, existed; none of that impacts the basic pattern) is that it gives a pretty visceral understanding of what is going on here. All that's changed since is per-model standardization of prompt and response patterns on the frontend<->LLM side and, with MCP, of the protocol on the frontend<->tool side.
"1979010112345" looks like it might be Jan 1 1979 01:23:45 formatted as an integer, but read as a unix timestamp in milliseconds it is really Sep 17 2032 05:01:52.
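The ambiguity is easy to demonstrate. Interpreted as milliseconds since the Unix epoch, those digits land in 2032; interpreted as seconds, they point tens of thousands of years out and don't even fit in a datetime:

```python
from datetime import datetime, timezone

raw = 1979010112345  # reads like 1979-01-01 01:23:45 written as digits

# Interpreted as milliseconds since the Unix epoch:
as_ms = datetime.fromtimestamp(raw / 1000, tz=timezone.utc)
print(as_ms.strftime("%b %d %Y %H:%M:%S"))  # Sep 17 2032 05:01:52

# Interpreted as seconds it would be roughly the year 64,000, far past
# datetime.max (year 9999), so the conversion simply fails:
try:
    datetime.fromtimestamp(raw, tz=timezone.utc)
except (OverflowError, OSError, ValueError):
    print("out of range as seconds")
```

Nothing in a bare integer field tells a consumer which interpretation was intended; that is exactly the kind of contract a schema hint alone doesn't enforce.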
It is absolutely possible to do this, and to generate client code which complies with ISO-8601 in JS/TS. Large amounts of financial services would not work if this was not the case.
See the c# support for ISO-8601 strings: https://learn.microsoft.com/en-us/dotnet/standard/base-types...
`DateTime` is not an ISO-8601 type. It can _parse_ an ISO-8601 formatted string.
And even past that, there are Windows-specific idiosyncrasies with how the `DateTime` class implements the parsing of these strings and how it stores the resulting value.
This is exactly the point: a string is just a data interchange format in the context of a DateTime, and C# provides (as far as I can tell) a complete way of accessing the ISO-8601 specification on the language object. It also supports type-safe generation of clients and client object (or struct) generation from the ISO-8601 string format.
> And even past that, there are Windows-specific idiosyncrasies with how the `DateTime` class implements the parsing of these strings and how it stores the resulting value.
Not really. The Windows statements in the article (and I use this on Linux for financial-services software) relate to automatic defaults for the formatting of generated strings. All of these may be set explicitly within the code itself.
That was based on decades of experience in .NET and Windows. Not the article ;).
Related but distinct from serialization.
https://www.w3.org/TR/xmlschema-2/#isoformats
And so you can write a web service in Java and use it from C#, or whatever.
The merchants of complexity are disappointed. It turns out that even machines don't care for 'machine-readable' formats; even the machines prefer human-readable formats.
The only entities on this planet who appreciate so-called 'machine-readability' are bureaucrats; and they like it for the same reason that they like enterprise acronyms... Literally the opposite of readability.
this is like saying "HTTP doesn't do json validation", which, well, yeah.
LLMs are basically automating PEBKAC
When desktop OSes came out, hardware resources were scarce, so all the desktop OSes (DOS, Windows, MacOS) forgot the lessons from Unix: multi-user support, preemptive multitasking, etc. 10 years later PC hardware was faster than workstations from the 90s, yet we're still stuck with OSes riddled with limitations that stopped making sense in the 80s.
When smartphones came out there was this gold rush and hardware resources were scarce so OSes (iOS, Android) again forgot all the lessons. 10 years later mobile hardware was faster than desktop hardware from the 00s. We're still stuck with mistakes from the 00s.
AI basically does the same thing. It's all led by very bright 20- and 30-year-olds who weren't even born when Windows was first released.
Our field is doomed under a Cascade of Attention-Deficit Teenagers: https://www.jwz.org/doc/cadt.html (copy paste the link).
It's all gold rushes, and nobody does Dutch urban-infrastructure design over decades. Which makes sense, as this is all driven by the US, where long-term planning is anathema.
Of course this keeps happening
Already happening.
https://www.infosecurity-magazine.com/news/atlassian-ai-agen...
May have changed, but unlikely. I worked with medical telemetry as a young man and it was impressed upon me thoroughly how important parsing timestamps correctly was. I have a faint memory, possibly false, of this being the first time I wrote unit tests (and without the benefit of a test framework).
We even accounted for the lack of NTP by recalculating times off of the timestamps in their message headers.
And the reasons I was given were incident review as well as malpractice cases. A drug administered three seconds before a heart attack starts is a very different situation than one administered eight seconds after the patient crashed. We saw recently with the British Post Office's Horizon scandal how lives can be ruined by bad data, and in medical data a minute is a world of difference.
I also work in healthcare, and we've seen HL7v2 messages with impossible timestamps. (E.g., in the spring-forward gap.)
As RPC mechanisms go, HTTP is notable for how few of the classic blunders its designers made in 1.0 of the spec. Clock-skew correction is just my favorite: technically it exists for cache directives, but it's invaluable for coordination across machines. There are reasons HTTP 2.0 waited decades to happen. It just mostly worked.
The stuff about type validation is incorrect. You don't need client-side validation. You shouldn't be using APIs you don't trust as tools and you can always add instructions about the LLM's output format to convert to different formats.
MCP is not the issue. The issue is that people are using the wrong tools or their prompts are bad.
If you don't like the format of an MCP tool and don't want to give formatting instructions to the LLM, you can always create your own MCP service which outputs data in the correct format. You don't need the coercion to happen on the client side.
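A sketch of that wrapper idea (the tool name and upstream API here are hypothetical): the wrapping server normalizes the payload before the model ever sees it, so no client-side coercion is needed.

```python
from datetime import datetime, timezone

# Hypothetical upstream tool that returns a raw epoch, as many APIs do.
def upstream_get_order(order_id: str) -> dict:
    return {"order_id": order_id, "created": 1700000000}  # Unix seconds

# Wrapper tool: same data, but normalized to ISO-8601 before the LLM sees it.
def get_order(order_id: str) -> dict:
    raw = upstream_get_order(order_id)
    created = datetime.fromtimestamp(raw["created"], tz=timezone.utc)
    return {"order_id": raw["order_id"],
            "created": created.isoformat()}  # "2023-11-14T22:13:20+00:00"

print(get_order("A-42")["created"])
```

The LLM then only ever sees the normalized format, regardless of what the upstream system emits.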
If we can get an internal, sensitive-data-handling agent to ingest a crafted prompt, either via direct prompt injection against a more abstract “parent” agent, or by tainting an input file/URL it’s told to process, we can plant what I have internally coined an “unfolding injection.”
The injection works like a parasitic goal, it doesn’t just trick one agent, it rewrites the downstream intent. As the orchestrator routes tasks to other agents, each one treats the tainted instructions as legitimate and works toward fulfilling them.
Because many orchestrations re-summarize, re-plan, or synthesize goals between steps, the malicious instructions can actually gain fidelity as they propagate. By the time they reach a sensitive action (exfiltration, privilege escalation, external calls), there’s no trace of the original “weird” wording, just a confidently stated, fully-integrated sub-goal.
It’s essentially a supply-chain attack on the orchestration layer: you compromise one node in the agent network, and the rest “help” you without realizing it. Without explicit provenance tracking and policy enforcement between agents, this kind of unfolding injection is almost trivial to pull off, and we've been able to compromise entire environments based on the information the agentic system provided us, or just gave us either a bind or reverse shell in the case it has cli access and ability to figure out its own network constraints.
SSRF has been making a HUGE return in agentic systems, and I'm sad DEF CON and Black Hat didn't really have many talks on this subject this year, because it is a currently evolving security domain and an entirely new method of exploitation. The entire point of agentic systems is non-determinism, but that also makes them a security nightmare. As a researcher, though, this is basically a gold mine of all sorts of new vulnerabilities we'll be seeing. If you work as a bug-bounty hunter and see a new listing for an AI company, I can almost assuredly say you can get a pretty massive payout just by exploiting the innate trust between agents and the internal tools they leverage. Even if you don't have the architecture docs for the agentic system, you can likely taint the initial task with a prompt injection that gets the downstream agents to list out the orchestration flow: creatively adjust your prompt for different types of orchestration, for how the company might be doing prompt engineering on each agent's persona and assigned task, and for the limited input validation between agents, then have the results reported back up to the parent agent.
If an LLM can be shown to be useful 80% of the time to the JS mindset this is fine, and the remaining 20% can be resolved once we're being paid for the rest, Pareto principle be damned.
Mostly, no. Whether it's the client sending (statically) bad data or the server returning (statically) bad data, schema validation on the other end (assuming somehow it is allowed by the toolchain on the sending end) should reject it before it gets to the custom code of the MCP server or MCP client.
For arguments that are the right type but wrong because of the state of the universe, yes, the server receiving it should send a useful error message back to the client. But that's a different issue.
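As a sketch of what "reject it before it gets to the custom code" can mean for tool arguments (hand-rolled here for brevity; a real toolchain would use a full JSON Schema validator):

```python
# Minimal sketch of checking call arguments against a tool's declared
# inputSchema before any custom server code runs. Toy version: e.g. bools
# would pass the integer check here; a real validator handles such cases.
TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def check_args(schema: dict, args: dict) -> list[str]:
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name in args and not isinstance(args[name], TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}, got {type(args[name]).__name__}")
    return errors

schema = {"type": "object",
          "properties": {"symbol": {"type": "string"}, "qty": {"type": "integer"}},
          "required": ["symbol", "qty"]}

print(check_args(schema, {"symbol": "ACME", "qty": 100}))    # []
print(check_args(schema, {"symbol": "ACME", "qty": "100"}))  # ['qty: expected integer, got str']
```

Statically bad data never reaches the handler; state-of-the-universe errors are, as noted, a separate concern.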
At some point we have to decide as a community of engineers that we have to stop building tools that are little more than loaded shotguns pointed at our own feet.
GIEMGO garbage in even more garbage out
Ironically, it's achieved this - but that's an indictment of USB-C, not an accomplishment of MCP. Just like USB-C, MCP is a nigh-universal connector with very poorly enforced standards for what actually goes across it. MCP's inconsistent JSON parsing and lack of protocol standardization is closely analogous to USB-C's proliferation of cable types (https://en.wikipedia.org/wiki/USB-C#Cable_types); the superficial interoperability is a very leaky abstraction over a much more complicated reality, which IMO is worse than just having explicitly different APIs/protocols.
Previously, you could reasonably expect a USB-C on a desktop/laptop of an Apple Silicon device, to be USB4 40Gbps Thunderbolt, capable of anything and everything you may want to use it for.
Now, some of them are USB3 10Gbps. Which ones? Gotta look at the specs or tiny icons, I guess?
Apple could have chosen to have the self-documenting USB-A ports to signify the 10Gbps limitation of some of these ports (conveniently, USB-A is limited to exactly 10Gbps, making it perfect for the use-case of having a few extra "low-speed" ports at very little manufacturing cost), but instead, they've decided to further dilute the USB-C brand. Pure innovation!
With the end user likely still having to use a USB-C to USB-A adapter anyway, because the majority of thumb drives, keyboards and mice still require a USB-A port — even the USB-C ones that use USB-C on the keyboard/mouse itself. (But, of course, that's all irrelevant because you can always spend 2x+ as much for a USB-C version of any of these devices, and the fact that the USB-C variants are less common or inferior to USB-A is of course irrelevant when hype and fanaticism are more important than utility and usability.)
What's disallowed is for a non-host to have USB-A, hence, USB-A to USB-A is impossible, because one side of the cable has to be connected to a "device" that's not acting in host mode.
Only the host is allowed to have USB-A.
This is exactly why USB-A is superior to USB-C for host-only ports on embedded devices like routers (as well as auxiliary USB ports on your desktop or monitor).
Generally, many modern travel routers have one USB-C and one USB-A port. Without any documentation or pictograms, you can be relatively sure that the USB-A port is for data and the USB-C port is for power (hopefully through USB-PD), because USB-A is a host-only port and couldn't possibly be used to power up the router.
USB-C is great for USB-OTG and the bidirectional modes, when the same port could be used for both the host and the peripheral device functions, like on the smartphones. https://en.wikipedia.org/wiki/USB_On-The-Go
If the port can ONLY be used in host-mode, and does NOT support Alt Mode, Thunderbolt, or bidirectional USB-PD, then USB-A is a far more fitting connector, to signify all of the above.
Unfortunately, no one understood SOAP back then.
(Additional context: Maintaining a legacy SOAP system. I have nothing good to say about SOAP and it should serve as a role model for no one)
It doesn't take very long for people to start romanticizing things as soon as they're not in vogue. Even when the painfulness is still fresh in memory, people lament over how stupid new stuff is. Well I'm not a fan of schemaless JSON APIs (I'm one of those weird people that likes protobufs and capnp much more) but I will take 50 years of schemaless JSON API work over a month of dealing with SOAP again.
/“xml is like violence, if it’s not working just use more!”
No.
SOAP uses that, but SOAP involves a whole lot of spec about how you do that, and that's even before (as the article seems to) treat SOAP as meaning SOAP + the set of WS-* standards built around it.
And I actually like XML-based technologies. XML Schema is still unparalleled in its ability to compose and verify the format of multiple document types. But man, SOAP was such a beast for no real reason.
Instead of a simple spec for remote calls, it turned into a spec that described everything and nothing at the same time. SOAP supported all kinds of transport protocols (SOAP over email? Sure!), RPC with remote handles (like CORBA), regular RPC, self-describing RPC (UDDI!), etc. And nothing worked out of the box, because the nitty-gritty details of authentication, caching, HTTP response code interoperability and other "boring" stuff were just left as an exercise to the reader.
Part of this is the nature of XML. There's a million ways to do things. Should some data be an attribute of the tag or should it be another tag? Perhaps the data should go in the body between the tags? HTML, which shares XML's SGML lineage, has this problem; e.g., you can seriously specify <font face="Arial">text</font> rather than have the font as a property of the wrapping tag. There's a million ways to specify anything and everything, and that's why it makes a terrible data interchange format. The reader and writer must have the exact same schema in mind, and there's no way to fall back on a default because there's simply no single correct way to do things in XML. So everything had to be specified very, very precisely, which added huge amounts of work that a non-XML format with decent defaults would not require.
This became a huge problem for SOAP, and it's why I hate it. Every implementation had different default ways of handling even the simplest data structures passed between them, and they were never compatible unless you spent weeks specifying the schema down to a fine-grained level.
In general XML is problematic due to the lack of clear canonical ways of doing pretty much anything. You might say "but I can specify it with a schema," and to that I say: my problem with XML is that you need a schema for even the simplest use case in the first place.
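The "million ways" point fits in three lines of XML; all of these can encode the same fact, and only a schema tells the reader which one to expect:

```xml
<!-- Attribute, child element, or text content: pick one, per schema -->
<user name="Ada"/>
<user><name>Ada</name></user>
<user>Ada</user>
```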
But parts of XML infrastructure were awesome. I could define a schema for the data types, and have my IDE auto-complete and validate the XML documents as I typed them. I could also validate the input/output data and provide meaningful errors.
And yeah, I also worked with XML and got burned many times by small incompatibilities that always happen due to its inherent complexity. If XML were just a _bit_ simpler, it could have worked so much better.
Unfortunately as usual when a new technology cycle comes, everything gets thrown away, including the good parts.
Generally it worked very well when both ends were written in the same programming language and was horseshit if they weren’t. No wonder Microsoft liked SOAP so much.
IBM thought they were good at lockin, until Bill Gates came along.
I've been on the other side of high-feature serialization protocols, and even at large tech companies, something like migrating to gRPC is a multi-year slog that can even fail a couple of times because it asks so much of you.
MCP, at its core, is a standardization of a JSON API contract, so you don't have to do as much post-training to generate various tool calling style tokens for your LLM.
I think you meant that is why JSON won instead of XML?
Not just XML, but a lot of other serialization formats and standards, like SOAP, protobuf in many cases, yaml, REST, etc.
People say REST won, but tell me: how many places actually implement REST, rather than just using it as a stand-in term for casual JSON blobs sent to HTTP URLs?
Now, YAML has quite a few shortcomings compared to JSON (if you don't believe me, look at its handling of the string no, discussed on HN), so, at least to me, it's obvious why JSON won.
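That handling of the string no is YAML 1.1's implicit typing, often called the Norway problem; a small fragment of the kind that bites people (behavior as in YAML 1.1-era parsers such as PyYAML):

```yaml
# YAML 1.1 implicit typing: these ISO country codes do not survive parsing.
countries: [GB, NO, SE]    # loads as ["GB", false, "SE"]
answer: no                 # loads as false, not the string "no"
quoted: "no"               # quoting is the only safe spelling
```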
SOAP, don't get me started on that, it's worth less than XML, protobuf is more efficient but less portable, etc.
That's backwards reasoning. XML was too complicated, so they decided on a simpler JSON.
CORBA emerged in 1991 with another crucial insight: in heterogeneous environments, you can’t just “implement the protocol” in each language and hope for the best. The OMG IDL generated consistent bindings across C++, Java, Python, and more, ensuring that a C++ exception thrown by a server was properly caught and handled by a Java client. The generated bindings guaranteed that all languages saw identical interfaces, preventing subtle serialization differences.
Yes, CORBA was such a success. It baked in the core assumptions that the network is transparent, reliable, and symmetric, so you could create an object on one machine, pass a reference to it to another machine, and everything was supposed to just work.
Which is not what happens in the real world, with timeouts, retries, congested networks, and crashing computers.
Oh, and CORBA C++ bindings had been designed before the STL was standardized. So they are a crawling horror, other languages were better.
And its complexity and size now are rivaling the specs of the good old XML-infused times.
Didn’t get that job: one of the interviewers asked me to write concurrent code and didn’t like my answer, but his had a race condition in it, and I was unsuccessful in convincing him he was wrong. He was relying on preemption not occurring on a certain instruction (or multiprocessing not happening). During my tenure at the job I did take, the real flaws in the Java Memory Model came out, and his answer became very wrong and mine only slightly.
On a more general note, I see in many critical comments here what I perceive to be a category error. Using JSON to pass data between web client and server, even in more complex web apps, is not the same thing as supporting two-way communication between autonomous software entities that are tasked to do something, perhaps something critical. There could be millions of these exchanges in some arbitrarily short time period, thus any possibility of error is multiplied accordingly, and the effect of any error can cascade if the system does not fail early. I really don't believe this is a case where "worse is better." To use an analogy: yes, everyday English is a versatile language that works great for most use cases; but when you really need to nail things down, with no tolerance for ambiguity, you get legalese or some other jargon. Or CORBA, or gRPC, etc.
If only that were true. Litigation happens every single day over the meanings of contracts and laws that were drafted by well-trained and experienced attorneys.
Comparatively, programming languages are very constrained. The environments in which they are interpreted and executed are far better understood than any human courtroom.
Your point is an interesting one but it’s painting with too broad a brush.
Perhaps you meant “less tolerance”?
- SOAP - interop required supporting Document- or RPC-style bindings between systems, or a combination; XML and its schemas are also horribly verbose.
- CORBA - the libraries and frameworks were complex; modern languages at the time avoided them in favor of simpler standards (e.g. Java's Jini).
- gRPC - designed for speed, not readability; requires mappings.
It's telling that these days REST and JSON (via req/resp, webhooks, or even streaming) are the modern backbone of RPC. The above standards are either shoved aside or, in gRPC's case, used only where extreme throughput is needed.
Since REST and JSON are the plat du jour, MCP probably aligns with that design paradigm rather than the dated legacy protocols.
No, they're the medium of the web.
The author is specifically addressing enterprise integration into business workflows - not showing stuff in a browser.
It seems to be a game of catch-up for most things AI. That said, my school of thought is that certain technologies are just too big to be figured out early on - web frameworks, blockchain, ... - but the gap starts to shrink eventually. With AI, we'll just have to keep sharing ideas and caution like you have here. Such very interesting times we live in.
This is really obvious when they talk about tracing and monitoring, which seem to be the main points of criticism anyway.
They bemoan that they can't trace across MCP calls, assuming somehow there would be a person administering all the MCPs. Of course each system has tracing in whatever fashion fits that system. They are just not the same system, nor owned by the same people, let alone the same companies.
Same as monitoring cost. Oh, you can’t know who racked up the LLM costs? Well of course you can, these systems are already in place and there are a million of ways to do this. It has nothing to do with MCP.
Reading this, I think it's rather a blessing to start fresh and without the learnings of 40 years of failed protocols or whatever
1. Lessons.
2. Fairly sure all of Google is built on top of protobuf.
You'd still need basically the entire existing MCP spec to cover the use cases if it replaced JSON-RPC with Swagger or protobuf, plus additional material to cover the gaps and complications that that switch would involve.
If it was based on OpenAPI, servers using transports other than HTTP would need to implement an HTTP server.
0: https://modelcontextprotocol.io/specification/2025-06-18/bas...
It's amusing to watch people refer to MCP as a set of tools, or a framework, or an SDK you can invoke, or something or other, across a wide range of forums. It's just a standard. A convention. Calling it a protocol is a stretch as well. But there's no meat to it, really.
If you just used Rest API's, you'd need to create little "tools" (say, another executable) locally that the LLM can invoke that can call those API's. MCP standardizes what those tools should act like and their overall lifecycle model.
The references to it being like USB are also quite frankly absurd and delusional.
But that's the caliber of developer we're dealing with today.
I agree that swagger leaves a lot unplanned. I disagree about the local use case because (1) we could just run local HTTP servers easily and (2) I frankly assume the future of MCP is mostly remote.
Returning to JSON-RPC: it's a poorly executed RPC protocol. There is an excellent HackerNews thread on it, but the TLDR is that parsing JSON is expensive and complex; modern services are built from tons of tools (e.g. load balancers), and making those tools parse JSON is very expensive. Many people in that thread mention alternative ways to implement JSON-RPC, but those depend on new clients.
I know this because I wish it did. You can approximate streaming responses by using progress notifications. If you want something like the LLM partial response streaming, you'll have to extend MCP with custom capabilities flags. It's totally possible to extend it in this way, but then it's non standard.
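On the wire, the progress-notification approximation looks roughly like this (shape per the MCP spec's progress utility; the field values here are illustrative):

```json
{
  "jsonrpc": "2.0",
  "method": "notifications/progress",
  "params": {
    "progressToken": "req-42",
    "progress": 512,
    "total": 2048,
    "message": "Fetched 512 of 2048 records"
  }
}
```

It reports how far along a request is, but it carries no partial result payload, which is why it only approximates true response streaming.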
Perhaps you are alluding to the fact that it's bidirectional protocol (by spec at least).
Sure, they might still find themselves in highly regulated industries where risk avoidance trumps innovation everyday, all day.
MCP is for _the web_ , it started with stdio only because Anthropic was learning lessons from building Claude Code.
Author also seems to expect that the result from MCP tool usage will feed directly into an LLM. This is preposterous and a recipe for disaster. Obviously you'd validate structured responses against a schema, check for harmful content, etc. etc.
> Author also seems to expect that the result from MCP tool usage will feed directly to an LLM
Isn't this exactly what MCP is for? Most tools I've come across are to feed context from other sources directly to the LLM. I believe this is the most common use-case for the protocol.
The greater problem is industry misunderstanding and misalignment with what agents are and where they are headed.
Web platforms of the world believe agents will be embedded in networked distributed infrastructure. So we should ship an MCP platform in our service mesh for all of the agents running in containers to connect to.
I think this is wrong, and continues to be butchered as the web pushes a hard narrative that we need to enable web-native agents & their sdks/frameworks that deploy agents as conventional server applications. These are not agents nor the early evolutionary form of them.
Frontier labs will be the only providers of the actual agentic harnesses. And we are rapidly moving to computer-use agents. MCP servers were intended to serve as single-instance deployments for single harnesses, i.e. a single MCP server on my desktop for my Claude Desktop.
> In financial services, this means a trading AI could misinterpret numerical types and execute trades with the wrong decimal precision.
If you are letting an LLM execute trades with no guardrails then it is a ticking time bomb no matter what protocol you use for the tool calls.
> When an AI tool expects an ISO-8601 timestamp but receives a Unix epoch, the model might hallucinate dates rather than failing cleanly.
If your process breaks because of a hallucinated date -- don't use an LLM for it.
Also Erlang uses RPCs for pretty much all "synchronous" interactions but it's pretty minimal in terms of ceremony. Seems pretty reliable.
So this is a serious question because hand rolling "40 years" of best practices seems hard, what should we be using for RPC?
Multics vs Unix, XML-based SOAP vs JSON-based REST APIs, XHTML's failure, JavaScript itself, … I could keep going.
So I’ve resigned myself to admitting that we are doomed to reimplement the “good enough” every time, and continue to apply bandaid after bandaid to gradually fix problems after we rediscover them, slowly.
https://en.m.wikipedia.org/wiki/Worse_is_better
It's been confirmed over and over since then. And I say that as someone who naturally gravitates towards "better" solutions.
In other words, many technical problems flow rather predictably from decision-making boundaries that don't internalize the externalities.
Ever heard someone say "if you care about X, run for office"? The same applies to technology. If one cares about good designs, one must promote organizational and societal structures that actually have a fighting chance at bringing those about.
The days of nerds and hackers not caring about broader dynamics and structures are long gone. Sitting back and letting the business folks have control is fine if you want them to optimize for the existing incentives. But if you want to change the rules of the game, you gotta jump in at the deep end.
The world we could have lived in... working web forms validations, working microdata...
- Electron disregards 40 years of best deployment practices,
- Web disregards 40 years of best GUI practices,
- Fast CPUs and lots of RAM disregards 40 years of best software optimization techniques,
there are probably many more examples.
windows 10 is easier to use than windows 95.
osx is easier to use than mac.. whatever they named their old versions.
It goes on and on. I can have 50 browser tabs open at the same time, each one hosting a highly complicated app, ranging from media playback to chat rooms to custom statistical calculators. I don't need to install anything for any of these apps, I just type in a short string in my url bar. And they all just work, at the same time.
Things are in fact better now.
The supercomputers in our pockets (that used to be telephones, but don't work well for that anymore) will let us run the programs that one of two companies allow us to run, which will run most apps... as long as the hardware is as recent as our laptops/desktops.
Yes, we're very advanced. In the past 20 years, we have achieved... the same thing we had 20 years ago... only with more hardware requirements, programming languages, and frameworks. Today you can do anything... as long as it's on a web page, on recent hardware (and God help you if you haven't updated your software in the past month)
Things are also getting better now that Intel is dying. I mean, the new Apple silicon chips are astoundingly fast and energy efficient, an M1 from 5 years ago is still going strong and probably won't truly need replacing for another 2. Similar for Ryzen chips from 5 years ago!
Things have changed a lot in 20 years. In 2005 we didn't consume all of our video / audio media online. We didn't have social media, just blogs and RSS readers. YouTube had just been released. TikTok, Facebook and Twitter didn't exist. Hypermedia today is very rich and necessitates a lot of resources. But at the same time, most work the past 10 years has been on native apps (on mobile particularly but also PCs), not web sites. Most people don't use the web browser as much.
Considering the fact that there were like five GUIs in existence between 1950 and 1980, I find that extremely hard to believe.
Point-by-point for the article's gripes:
- distributed tracing/telemetry - open discussion at https://github.com/modelcontextprotocol/modelcontextprotocol...
- structured tool annotation for parallelizability/side-effects/idempotence - this actually already exists at https://modelcontextprotocol.io/specification/2025-06-18/sch... but it's not well documented in https://modelcontextprotocol.io/specification/2025-06-18/ser... - someone should contribute to improving this!
- a standardized way in which the costs associated with an MCP tool call can be communicated to the MCP Client and reported to central tracking - nothing here I see, but it's a really good idea!
- serialization issues e.g. "the server might report a date in a format unexpected by the client" - this isn't wrong, but since the consumer of most tool responses is itself an LLM, there's a fair amount of mitigation here. And in theory an MCP Client can use an LLM to detect under-specified/ambiguous tool specifications, and could surface these issues to the integrator.
Now, I can't speak to the speed at which Maintainers and Core Maintainers are keeping up with the community's momentum - but I think it's meaningful that the community has momentum for evolving the specification!
I see this post in a highly positive light: MCP shows promise because you can iterate on these kinds of structured annotations, in the context of a community that is actively developing their MCP servers. Legacy protocols aren't engaging with these problems in the same way.
MCP is not a protocol. It doesn't protocolize anything of use. It's just "here's some symbols, do with them whatever you want," leaving it there but then advertising that as a feature of its universality. It provides about as much of a protocol as TCP does, but rebuilt on top of five OSI layers, again.
It's not a security issue, it's an ontological issue.
That being said, MCP as a protocol has a fairly simple niche: provide context that can be fed to a model to perform some task. MCP covers the discovery process around presenting those tools and resources to an Agent in a standardized manner. And it includes several other aspects that are useful in this niche, like "sampling" and "elicitations". Is it perfect? Not at all. But it's a step in the right direction.
The crowd saying "just point it at an OpenAPI service" does not seem to fully understand the current problem space. Can many LLMs extract meaning from un-curated API response messages? Sure. But they are also burning up context holding junk that isn't needed. Part of MCP is the acknowledgement that general API responses aren't the right way to feed the model the context it needs. MCP is supposed to be taking a concrete task, performing all the activities needed to gather the info or effect the change, then generating clean context meant for the LLM. If you design an OpenAPI service around those same goals, then it could easily be added to an Agent. You'd still need to figure out all the other aspects, but you'd be close. But at that point you aren't pointing an Agent at a random API, you're pointing it at a purpose-made API. And then you have to wonder: why not something like MCP that's designed for that purpose from the start?
I'll close by saying there are an enormous number of MCP Servers out there that are poorly written, thin wrappers on general APIs, or have some other bad aspects. I attribute a lot of this to the rise in AI Coding Agents allowing people with poor comprehension of the space enabling them to crank out this... Noise.
There are also great examples of MCP Servers to be found. They are the ones that have thoughtful designs, leverage the spec fully, and provide nice clean context for the Agent to feed to the LLM.
I can envision a future where we can simply point an agent at a series of OpenAPI services and the agent uses its models to self-assemble what we consider the MCP server today. Basically it would curate access to the APIs into a set of focused tools and the code needed to generate the final context. That's not quite where we are today. It's likely not far off, though.
MCP seems to be the ultimate inheritor of RESTful architectural practices as described by Roy Fielding, but instead of needing a human on the other end to interact with links (HATEOAS), you can have an LLM bridging the gap between user intent and concrete subsequent calls.
So many issues around versioning and fragility here will likely start to disappear. Like embeddings, I'd put a bet on this being one of the major unexpected architectural improvements: LLMs at key glue points to create looser coupling, letting us build our scaffolding a bit higher before it collapses, to borrow a Steve Jobs analogy.
https://github.com/modelcontextprotocol/modelcontextprotocol...
The Python Bindings PR is here modelcontextprotocol/rust-sdk#172
The Typescript Bindings PR is here modelcontextprotocol/rust-sdk#183
MCP Bench at https://github.com/unimcp/mcpbench
Otherwise the larger picture is that MCP is a land grab for building an ecosystem around integrations to get access to data. Your LLM agent is not valuable if it can't access things for you... and from a market perspective, enterprise pays a lot for this stuff already. And yes, MCP is not really thought out for enterprise at all... At least thankfully they added stateless connections to the spec...
CORBA did pretty much everything wrong, which makes it a great anti-example. Automatic client generation? Fuck that.
Actually, MCP uses a normative TypeScript schema (and, from that, an autogenerated JSON Schema) for the protocol itself, and the individual tool calls also are specified with JSON Schema.
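For concreteness, here is roughly what one entry of a server's discovered tool list looks like, sketched as a Python dict. The `name`/`description`/`inputSchema` field names follow the MCP `tools/list` shape; the tool itself is invented for illustration.

```python
# Hypothetical entry from an MCP server's tools/list response.
# The inputSchema is plain JSON Schema, which is what clients are
# expected to enforce when constructing tool calls.
tool = {
    "name": "get_patient_weight",
    "description": "Return the most recent recorded weight for a patient.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "patient_id": {"type": "string"},
            "unit": {"type": "string", "enum": ["kg", "lb"]},
        },
        "required": ["patient_id"],
    },
}
```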
> Type validation happens at runtime, if at all.
That's not a consequence of MCP "opting for schemaless JSON" (which it factually does not); for tool calls, it's a consequence of MCP being a discovery protocol where the tools, and thus the applicable schemas, are discovered at runtime.
If you are using MCP as a way to wire up highly static components, you can do discovery against the servers once they are wired up, statically build the clients around the defined types, and build your toolchain to raise errors if the discovery responses change in the future. But that's not really the world MCP is built for. Yes, that means that the toolchain needs, if it is concerned about schema enforcement, to use and apply the relevant schemas at runtime. So, um, do that?
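A minimal sketch of what "apply the relevant schemas at runtime" can mean in practice. The schema below is assumed to have come from a server's `tools/list` response; the `check_args` helper is hypothetical and only handles a small subset of JSON Schema (required keys plus primitive type checks), where a real client would use a full JSON Schema validator.

```python
# Map JSON Schema primitive type names to Python runtime checks.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}

def check_args(schema: dict, args: dict) -> list[str]:
    """Return a list of validation errors; empty means the call is well-formed."""
    errors = []
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    for key, sub in schema.get("properties", {}).items():
        if key in args and "type" in sub:
            if not isinstance(args[key], JSON_TYPES[sub["type"]]):
                errors.append(f"{key}: expected {sub['type']}")
    return errors

# Schema as discovered from the server (hypothetical tool).
schema = {
    "type": "object",
    "properties": {"patient_id": {"type": "string"}},
    "required": ["patient_id"],
}

assert check_args(schema, {"patient_id": "p-42"}) == []
assert check_args(schema, {}) == ["missing required argument: patient_id"]
```

The point is that the validation step is cheap and mechanical; nothing forces a client to pass unchecked arguments through to the model or the server.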
WSDL is just pure nonsense. The idea that software would need to decide which API endpoints it needs on its own, is just profoundly misguided... Literally nobody and nothing ever reads the WSDL definitions; it's just poor man's documentation, at best.
LLMs only reinforce the idea that WSDL is a dumb idea because it turns out that even the machines don't care for your 'machine-friendly' format and actually prefer human-friendly formats.
Once you have an MCP tool working with a specific JSON API, it will keep working unless the server makes breaking changes to the API while in production, which is terrible practice. But anyway, if you use a server, it means you trust the server. Client-side validation is dumb; like people who need to put tape over their mouths because they don't trust themselves to follow through on their diet plans.
WSDLs being available from the servers allows (a) clients to validate the requests they make before sending them to the server, and (b) developers (or in principle even AI) with access to the server to create a client without needing further out-of-band specifications.
I don't buy this idea that code should be generated automatically without a human involved (at least as a reviewer).
I also don't buy the idea that clients should validate their requests before sending to the server. The client's code should trust itself. I object to any idea of code (or any entity) not trusting itself. That is a flawed trust model.
In theory. In reality, Java could talk to Java, and M$ stuff could talk to other M$ stuff, and pretty much everyone else was left out in the cold. Consistent cross-language interop never actually happened, despite the claims that it would.
MCP started from an accessibility direction, which is why it’s catching on.
Headaches are inevitable, but don’t network effects often dominate technological superiority?
On that note; some of these “best practices” arguably haven’t worked out. “Be conservative with what you send, liberal with what you receive” has turned even decent protocols into a dumpster fire, so why keep the charade going?
Failed protocols such as TCP adopted Postel's law as a guiding principle, and we all know how that worked out!
The ISO8601 v Unix epoch example seems very weak to me. I'd certainly expect any model to be capable of distinguishing between these things, so, it doesn't seem like a big deal that either one would be allowed in a JSON.
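That disambiguation doesn't even need to be left to the model; a client can normalize timestamps deterministically before anything reaches the LLM. A sketch, where the type-based heuristic is mine and not from any spec:

```python
from datetime import datetime, timezone

def parse_timestamp(value) -> datetime:
    """Accept either a Unix epoch (number) or an ISO-8601 string,
    and normalize both to a timezone-aware datetime."""
    if isinstance(value, (int, float)):           # Unix epoch, in seconds
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return datetime.fromisoformat(value)          # ISO-8601 string

assert parse_timestamp(0).year == 1970
assert parse_timestamp("2024-06-10T00:00:00+00:00").year == 2024
```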
Honestly, my view that nothing of value ever gets published on medium, is strongly reinforced here.
But why did the designers make that choice when they had any of half a dozen other RPC protocols to choose from?
> The ISO8601 v Unix epoch example seems very weak to me. I'd certainly expect any model to be capable of distinguishing between these things
What about the medical records issue? How is the model to distinguish a weight in kgs from one in pounds?
Wouldn't medical records actually be better in JSON, because the field could expressly have a "kg" or "lb" suffix within the value of the field itself, or even in the name of the field, like "weight-in-kg" or "weight-in-lb"? This is actually the beauty of JSON compared to other formats where these things may end up being just a unitless integer.
The biggest problem with medical data would probably remain the human factor, where regardless of the format used by the machines and by MCP, the underlying data may already be incorrect or not coded properly, so, if anything, AI would likely have a better chance of interpreting the data correctly than the API provider blindly mislabelling unitless data.
A field name doesn't turn an integer field into a unit of anything. Suffixing a unit as part of the value only makes the field stringly-typed and complicates parsing.
BTW weight can take fractional values. That might not matter much for a 125kg adult, but it certainly matters for a newborn weighing 3.1kg.
And I bring up weight because I personally identified a unit conversion bug in a well-known medical records suite where a weight measured in pounds was displayed as kilograms. A dose of opioids sufficient to relieve traumatic pain in a 125kg person would kill someone who weighs 125lb. Medical record software is not in the move-fast-break-things category.
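The arithmetic behind that danger is worth spelling out: treating 125 lb as 125 kg inflates any weight-based dose by roughly 2.2x. A back-of-the-envelope sketch (the dose unit is arbitrary, just there to show the factor cancels):

```python
KG_PER_LB = 0.453592

true_weight_kg = 125 * KG_PER_LB   # the patient actually weighs ~56.7 kg
mislabeled_kg = 125.0              # pounds displayed as kilograms

dose_per_kg = 1.0                  # arbitrary weight-based dose unit
overdose_factor = (mislabeled_kg * dose_per_kg) / (true_weight_kg * dose_per_kg)
print(round(overdose_factor, 2))   # prints 2.2
```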