I took the unofficial IKEA US dataset (originally scraped by jeffreyszhou) and converted all 30,511 products into a flat, markdown-like protocol called CommerceTXT.
The goal: See if a flatter structure is more efficient for LLM context windows.
The results:
- Size: 30k products across 632 categories.
- Efficiency: The text version uses ~24% fewer tokens (3.6M saved total) compared to the equivalent minified JSON.
- Structure: Files are organized in folders (e.g. /products/category/), which helps with testing hierarchical retrieval routers.
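To give a feel for where the savings could come from, here is a minimal sketch comparing a minified JSON record with a flat "Key: value" rendering of the same data. The product record, its field names, and the flat layout are all made up for illustration and are not the actual CommerceTXT syntax. The token count is a crude regex proxy rather than a real LLM tokenizer, so it will not reproduce the ~24% figure.

```python
import json
import re

# Hypothetical product record; field names are illustrative, not from the dataset.
product = {
    "name": "Example bookcase",
    "price": 79.99,
    "currency": "USD",
    "category": "bookcases",
    "in_stock": True,
}


def to_minified_json(record):
    # Compact separators remove the optional whitespace json.dumps adds by default.
    return json.dumps(record, separators=(",", ":"))


def to_flat_text(record):
    # One "Key: value" line per field. Illustrative layout, not the CommerceTXT spec.
    return "\n".join(f"{key}: {value}" for key, value in record.items())


def rough_token_count(text):
    # Crude proxy: every word run and every punctuation mark counts as one token.
    return len(re.findall(r"\w+|[^\w\s]", text))


json_text = to_minified_json(product)
flat_text = to_flat_text(product)
json_tokens = rough_token_count(json_text)
flat_tokens = rough_token_count(flat_text)

print("minified JSON:", json_tokens, "rough tokens")
print("flat text:    ", flat_tokens, "rough tokens")
```

Under this proxy, most of the difference comes from the quotes, braces, and commas that JSON needs and the flat layout does not. A real tokenizer merges some of that punctuation into neighboring tokens, so the gap would likely be smaller in practice.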
The link goes to the dataset on Hugging Face which has the full benchmarks.
Parser code is here: https://github.com/commercetxt/commercetxt
Happy to answer questions about the conversion logic!
These things should be put under /.well-known [1], not in the root.
It’s not ideal but representative of the tension between user experience and technical correctness.
Why would somebody even want to access that file? It doesn't make any sense to make that more user friendly, it's for LLMs.
You only have to look at how different services handle arrays in query strings to understand that serialising it is conceptually easier.
Comes up a lot in search or filter APIs. I'm sure there was some effort many moons ago to create a QUERY method for that.
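To make the array point concrete, here is a small sketch using Python's standard library. The parameter name `color` and the three conventions shown are just common examples, not a claim about any particular service.

```python
import json
from urllib.parse import parse_qs, urlencode

colors = ["red", "blue"]

# Convention 1: repeat the key once per value.
repeated = urlencode({"color": colors}, doseq=True)

# Convention 2: PHP/Rails-style bracket suffix on the key.
bracketed = urlencode({"color[]": colors}, doseq=True)

# Convention 3: a single comma-joined value.
comma_joined = urlencode({"color": ",".join(colors)})

for query in (repeated, bracketed, comma_joined):
    # A generic parser only recovers the list for the first form; the others
    # need service-specific knowledge of the key suffix or the delimiter.
    print(query, "->", parse_qs(query))

# Serialising the filter into a request body has one unambiguous form.
body = json.dumps({"color": colors})
print(body)
```

The same two-element list produces three different query strings, and a generic parser reads each one differently. The JSON body round-trips without any extra convention.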
For example, Google’s indexers already use this to surface pricing data. https://developers.google.com/search/docs/appearance/structu...
JSON-LD is just read-only metadata for machines.
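For anyone who hasn't seen it, this is roughly what that pricing metadata looks like. A minimal sketch: the product name and price are made up, and only the schema.org `Product`/`Offer` types and the `price`/`priceCurrency` properties are taken from the vocabulary itself.

```python
import json

# Hypothetical product; the values are illustrative only.
product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example bookcase",
    "offers": {
        "@type": "Offer",
        "price": "79.99",
        "priceCurrency": "USD",
    },
}

# Pages typically embed this inside a <script type="application/ld+json"> tag.
snippet = (
    '<script type="application/ld+json">'
    + json.dumps(product_jsonld)
    + "</script>"
)
print(snippet)
```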
Or just a handy open data set you could use to prove out the concept?
Huh? I don't think that's true. There are usually some structural elements inside the package that are meant to be thrown away (usually cardboard or paper), and all Ikea boxes definitely have a lot of air inside them. I'm not sure what would make you say otherwise, unless it's a joke I'm missing?
It's funny because it makes zero sense in the body of an initial post!
In comments replying to people downthread - maybe. But opening a top-level post with "Original Poster here" is just silly and shows a lack of respect for community etiquette.
https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...
>be me
Seeing it as a lack of respect is a huge stretch. And it's kinda conceited to accuse someone of that on the basis of a two-word opener.