### What we built
Sumble is a knowledge graph for go-to-market teams. It lets you run rich queries to identify prospects at a granular level and do very targeted outreach.
Sumble allows you to find:
- tech stacks (in larger companies, down to the team or buying group level)
- key projects those teams are working on (cloud migrations, GenAI initiatives, etc.)
- people involved in those key projects
For example, here's a list of GenAI projects at Capital One that involve RAG/Vector databases: https://sumble.com/l/6sDqKmhyAH
And this view includes a list of people who we think are involved in a particular project being undertaken by the AI Foundation Team at Capital One: https://sumble.com/l/j8mbRrDsly
These views allow you to reach out to that team with a granular understanding of what they are working on.
### Inspiration
Sumble was very much inspired by our experience at Kaggle:
1. Kaggle’s public-data platform showed us how hungry people are for high-quality data (the metrics on that product were really strong)
2. At Google we saw knowledge graphs unlock powerful and composable queries
### Trying it out
- The app is live today; you’ll need to log in (Google OAuth or magic links)
- Most functionality and data are free; we only charge individual users for bulk exports
### How it works (briefly)
- Sources: job posts, resume data, company websites (more to come!)
- Extraction & linking: We use LLMs (mostly fine-tuned models) to extract entities from the source text and link them (company → team → people on a team → projects the team is undertaking → technology the team uses)
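
To make the entity chain above concrete, here is a minimal sketch of how the extracted entities might be nested once linked. The class names, field names, and sample values are all hypothetical and chosen for illustration; this is not Sumble's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    # All fields are hypothetical stand-ins for entities extracted from sources.
    name: str
    people: list[str] = field(default_factory=list)
    projects: list[str] = field(default_factory=list)
    technologies: list[str] = field(default_factory=list)


@dataclass
class Company:
    name: str
    teams: list[Team] = field(default_factory=list)


def teams_using(company: Company, technology: str) -> list[Team]:
    """Return the teams whose extracted tech stack includes `technology` (case-insensitive)."""
    wanted = technology.lower()
    return [t for t in company.teams if wanted in (x.lower() for x in t.technologies)]


# Made-up extraction output, for illustration only.
example = Company(
    name="ExampleCorp",
    teams=[
        Team("AI Platform", ["Person A"], ["RAG search"], ["PyTorch", "pgvector"]),
        Team("Data Infra", ["Person B"], ["Cloud migration"], ["Spark"]),
    ],
)

matching = teams_using(example, "pytorch")
print([t.name for t in matching])
```

Keeping people, projects, and technologies attached to the team (rather than only to the company) is what allows a query to land on a specific buying group.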
### What’s next
- Adding more sources so you can run even more composable queries
- Opening an API so devs can hit the graph directly
- Much later: expand to use cases beyond GTM
### Feedback
- Is the web app intuitive?
- What queries do you want us to prioritize supporting in an API?
- What additional external data sources would you like us to prioritize?
- What workflow improvements/integrations would you find most helpful?
Also: Please don't evolve the UI. It's perfect as it is.
My last startup was selling to SMBs. It looks like Sumble is most likely targeted at mid-market and enterprise companies. Any plans to expand coverage into the long tail of smaller companies?
Our goal is to have complete coverage of active companies and organizations in the world, plus an understanding of companies that previously existed but are no longer active (these appear extensively in CRMs and add noise).
We prioritize expanding data coverage in areas that we hear are most useful from our current users and customers.
I do find myself wanting to transform the data (especially the stuff in job descriptions) using an LLM, e.g. for scoring companies/contacts or looking for more subtle signals. Sometimes I do this manually but exporting a bunch of JDs from Sumble isn't possible AFAIK. Or doing it in Sumble would be great, too.
Awesome to see it on HN. Congrats on the launch!
We're planning to make that workflow much better in four ways this year:
1. Adding an API to make it easier to consume the data programmatically (next 2 months)
2. Enabling LLMs to run directly on tabular results in Sumble, which would let you pull job description context into the LLM call
3. Experimenting with an MCP endpoint, to see if that's helpful for these workflows as well
4. Experimenting with adding Sumble scoring models
1. I couldn't find some key people that I know work in an organization. How accurate is the data?
2. I don't know if this is happening because you are getting lots of traffic now, but each query takes 20-30 seconds which is unusable.
> - Is the web app intuitive?
Yes
> - What queries do you want us to prioritize supporting in an API?
Maybe this is specific, but I want to filter by headcount within a job function (i.e., find organizations that have 50-200 software engineers regardless of their total headcount).
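
A rough sketch of the filter described here, assuming the data were available as one `(organization, job_function)` row per person. The function name, row shape, and sample organizations are all hypothetical; nothing here reflects an existing Sumble API.

```python
from collections import Counter


def orgs_with_function_headcount(people, job_function, low, high):
    """Return organizations whose headcount in `job_function` falls within [low, high]."""
    counts = Counter(org for org, fn in people if fn == job_function)
    return sorted(org for org, n in counts.items() if low <= n <= high)


# Made-up rows: one (organization, job_function) pair per person.
people = (
    [("SmallCo", "software engineer")] * 10
    + [("MidCo", "software engineer")] * 120
    + [("MidCo", "sales")] * 900
    + [("BigCo", "software engineer")] * 5000
)

result = orgs_with_function_headcount(people, "software engineer", 50, 200)
print(result)
```

Note that MidCo qualifies even though its total headcount is over 1,000, because only the engineering count is compared against the range.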
> - What additional external data sources would you like us to prioritize?
> - What workflow improvements/integrations would you find most helpful?
I don't really care as long as the data is as accurate as possible. Lead generation/research is a slow enough process that I don't think workflows matter.
Thanks! What queries are you finding painful? Most should take under a second, though some are expensive.
You're not alone! We've heard this from others as well, and we're planning to add it soon.
The job functions we currently classify have mostly been driven by the needs of our early users/customers (companies building products/tools/infrastructure for data and software engineering teams), with a focus on handling the multilingual aspects of those functions well across countries.
We're aiming to extend this in two ways:
1. Adding job title and job description full-text search, to handle the long tail of use cases (in-flight project)
2. Extending the job function classification to the full universe of jobs that people can have
Congrats on the launch!
An API could be helpful for enriching internal sources. MCP would definitely make sense as well.
Right now, we've focused on normalizing several key entities (e.g. organizations including parent/subsidiary relations, technologies, people, and job functions), and capturing the relations between these as well as additional useful metadata like location and industry.
From a backend standpoint, this is currently implemented as structured relational tables for query performance and simplicity (e.g. count up all teams mentioning pytorch in job posts, rolling up across parents and subsidiaries, and sort by the biggest organizations descending).
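
As an illustration of that kind of rollup over relational tables, here is a minimal sketch using SQLite and a recursive CTE. The table names, columns, and sample rows are assumptions made up for this example, not Sumble's schema, and results are sorted by rolled-up team count as a stand-in for "biggest organizations".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: organizations with a parent pointer, and one row
-- per (org, team, technology) mention extracted from job posts.
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TABLE team_tech_mentions (org_id INTEGER, team TEXT, technology TEXT);

INSERT INTO organizations VALUES
  (1, 'ParentCo', NULL),
  (2, 'SubsidiaryA', 1),
  (3, 'SubsidiaryB', 2),
  (4, 'OtherCo', NULL);

INSERT INTO team_tech_mentions VALUES
  (1, 'Research', 'pytorch'),
  (2, 'Search', 'pytorch'),
  (3, 'Ads', 'pytorch'),
  (3, 'Ads', 'spark'),
  (4, 'ML Platform', 'pytorch'),
  (4, 'Data', 'spark');
""")

ROLLUP_QUERY = """
WITH RECURSIVE ancestry(org_id, root_id) AS (
    -- Start from top-level organizations, then walk down to every descendant.
    SELECT id, id FROM organizations WHERE parent_id IS NULL
    UNION ALL
    SELECT o.id, a.root_id
    FROM organizations o JOIN ancestry a ON o.parent_id = a.org_id
)
SELECT root.name, COUNT(*) AS team_count
FROM (SELECT DISTINCT org_id, team
      FROM team_tech_mentions WHERE technology = ?) AS m
JOIN ancestry a ON a.org_id = m.org_id
JOIN organizations root ON root.id = a.root_id
GROUP BY root.id
ORDER BY team_count DESC, root.name
"""

rows = conn.execute(ROLLUP_QUERY, ("pytorch",)).fetchall()
print(rows)
```

The recursive CTE maps every subsidiary, at any depth, to its top-level parent, so teams at SubsidiaryA and SubsidiaryB are counted under ParentCo.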
Future direction here is TBD as we expand the sources that we cover and types of queries that can be computed across these sources.
There have been many attempts at building high-quality public knowledge graphs that haven't hit escape velocity.
We're focusing on a structured, commercially relevant subset of the problem as a starting point, to generate a critical mass of usage and funding that will enable us to build the bigger vision: a highly structured, up-to-date, and trusted repository of all the facts about the world that is easy to browse, query, and integrate programmatically into all the relevant workflows (including for grounding LLMs).
Tools like Clay and Apollo are often misused for spammy cold outreach—which rarely works. The real value lies in enriching leads who’ve already shown interest, helping align marketing efforts with the right prospects. Beyond that, more data doesn’t always improve GTM decisions.
I'd love to hear (and learn) how others would want to use this tool specifically for GTM.
1. Revenue Operations teams
Integrate Sumble's data programmatically to help with account scoring, territory planning, account qualification/disqualification, and CRM data cleanup. We provide feature matrices that feed ML models for large sales teams.
2. Individual AEs/SDRs
Many salespeople have a small universe of named accounts that they go deep on. They use Sumble to understand the buying groups within their target accounts, which relevant technologies these groups use, and any relevant projects under way (e.g. data infrastructure migrations, cloud migrations, and GenAI projects can be critical signals for many of our customers).
For ongoing awareness of key changes within accounts, we work with our enterprise customers to define all the signals that are relevant to their sales plays, and send email/Slack notifications when any of these signals occur in their accounts.
Sales reps with a larger universe of accounts (e.g. the SMB/commercial tier) use us to filter out a lot of the noise in their territory and understand which accounts are real, active businesses that are potential users of their product and worth spending time on.
3. Marketing
Marketers use us to figure out which accounts to focus on, and to spin up very targeted LinkedIn/Facebook/etc. campaigns to reach their most likely potential users and buyers.
It requires you to sign in, which is then followed by marketing emails.