I made Ethos, an open-source tool to visualize the discourse on Hacker News. It extracts entities, tracks sentiment, and groups discussions by concept.
Check it out: https://ethos.devrupt.io
This was a "budget build" experiment. I managed to ship it for under $1 in infra costs. Originally I was using `qwen3-8b` for the LLM and `qwen3-embedding-8b` for the embedding, but I ran into some capacity issues with that model and decided to use `llama-3.1-8b-instruct` to stay within a similar budget while having higher throughput.
What LLM or embedding would you have used within the same price range? It would need to be a model that supports structured output.
How bad do you think it is that `llama-3.1` is paired with a higher-dimension embedding model? I originally wanted to keep the LLM and embedding within the same family, but I'm not sure if there is much point in that.
Repo: https://github.com/devrupt-io/ethos
I'm looking for feedback on which metrics (sentiment vs. concepts) you find most interesting! PRs welcome!
Also, hello fellow taylor.
There is a bug in the entity tracking. For the entity "github", it shows a positive sentiment. HN does NOT like GitHub (for reasons good or bad). If you click on it, it shows seemingly unrelated stories.
Right now it seems to be only using one level of the parent comment hierarchy.
(Source: https://github.com/devrupt-io/ethos/blob/67670eb2855b84d389d...)
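A rough sketch of what walking the full parent chain could look like, instead of stopping at one level. This is not the Ethos code: the `items` dict, `ancestor_chain`, and `build_context` are hypothetical names, and the only assumption borrowed from the HN API is that each item carries an `id`, an optional `parent`, and a `text` or `title`.

```python
from typing import Optional


def ancestor_chain(items: dict[int, dict], comment_id: int,
                   max_depth: Optional[int] = None) -> list[dict]:
    """Return the ancestors of a comment, nearest parent first.

    `items` is a hypothetical in-memory map of item id -> item dict.
    `max_depth=None` walks all the way up to the root story.
    """
    chain: list[dict] = []
    seen = {comment_id}  # guards against cycles in malformed data
    current = items.get(comment_id)
    while current is not None:
        parent_id = current.get("parent")
        if parent_id is None or parent_id in seen:
            break  # reached the root story (or a cycle)
        parent = items.get(parent_id)
        if parent is None:
            break  # parent was not fetched; stop with what we have
        chain.append(parent)
        seen.add(parent_id)
        if max_depth is not None and len(chain) >= max_depth:
            break
        current = parent
    return chain


def build_context(items: dict[int, dict], comment_id: int,
                  max_depth: Optional[int] = None) -> str:
    """Join the ancestors (root first) and the comment into one prompt context."""
    chain = ancestor_chain(items, comment_id, max_depth)
    # Stories have a title, comments have text.
    parts = [a.get("title") or a.get("text", "") for a in reversed(chain)]
    parts.append(items[comment_id].get("text", ""))
    return "\n---\n".join(parts)
```

With `max_depth=1` this reproduces the one-level behavior; leaving it as `None` gives the model the whole thread above the comment, at the cost of a longer prompt.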
I think the budget is noticeable in the sentiment analysis, unfortunately. The tags and entity recognition are good, but the sentiment ratings themselves seem pretty sloppy.
You are an expert analyst of the Hacker News community. Analyze submissions for
the underlying ideas, concepts, technologies, and entities being discussed.
Write all summaries in third-person analytical prose. Do NOT start sentences
with "The user", "The commenter", "The author", or "This post". Instead, lead
with the substance: describe the idea, argument, or phenomenon directly.
Good: "Decentralized identity systems could reduce reliance on corporate
gatekeepers." Bad: "The user discusses how decentralized identity systems work."
(Source: https://github.com/devrupt-io/ethos/blob/67670eb2855b84d389d...)
In my experience it's much more effective to reference key terms or ideas in the JSON schema and then explain those and their constraints in the system prompt.
This is one reason why people often think one model performs better than another for tasks they are both capable of. The real question IMO becomes: does packing in all of that extra input prompt (a) eat too much context or (b) increase cost too much?
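To make the schema-plus-prompt idea above concrete, here is a minimal sketch. The field names (`summary`, `entities`, `sentiment`), the notes, and the helper functions are all made up for illustration and are not taken from the Ethos repo. The point is only that each key in the schema gets its own explanation and constraints in the system prompt.

```python
import json

# Hypothetical output schema; the real Ethos schema may look different.
SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "entities": {"type": "array", "items": {"type": "string"}},
        "sentiment": {"type": "string",
                      "enum": ["positive", "neutral", "negative"]},
    },
    "required": ["summary", "entities", "sentiment"],
}

# One explanation per schema key, keyed by the same names the schema uses.
FIELD_NOTES = {
    "summary": "One or two sentences that lead with the idea itself.",
    "entities": "Companies, products, and technologies named in the text.",
    "sentiment": "The attitude expressed toward the entities, not the tone "
                 "of the writing. One of: positive, neutral, negative.",
}


def build_system_prompt(schema: dict, notes: dict) -> str:
    """Explain every schema field by name so the model sees the same terms twice."""
    lines = ["You analyze Hacker News submissions and reply with JSON only.",
             "Fields:"]
    for name in schema["properties"]:
        lines.append(f"- {name}: {notes[name]}")
    return "\n".join(lines)


def check_reply(raw: str, schema: dict) -> dict:
    """Very small sanity check on a model reply; not a full JSON Schema validator."""
    data = json.loads(raw)
    missing = [key for key in schema["required"] if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["sentiment"] not in schema["properties"]["sentiment"]["enum"]:
        raise ValueError(f"unexpected sentiment: {data['sentiment']!r}")
    return data
```

The extra prompt text is paid for on every request, so the context and cost trade-off raised above still applies. Keeping the notes to one line per field is one way to limit it.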
We will post an update on this on our blog in the future: https://blog.devrupt.io/
If I could make a suggestion: please make the green colors in the sentiment split wheel more distinct. They look very similar right now.
Side note: this is cool, but the sentiment analysis could be a bit more sophisticated in v2.
Congrats, I guess.
What model does it use?
What vector database is it using?