It lets you query all sorts of things: niche ones like CAN DBC files, weird ones like C# code, interesting ones like Git repositories, and regular stuff like CSV, TSV and various others.
I experiment quite a bit, so I'm hybridizing the engine with LLMs and doing other weird stuff that is more or less practical :-)
I also wanted to share some recent developments in this little project, as I hope they might be interesting to some of you.
New Experimental Plugins:
* Git Plugin (Beta): I've been working on Git repository querying - managed to test it on the EF Core repo (16k commits) and it seems to work okay
* Roslyn Plugin (Beta): Added basic C# code analysis capabilities
For the very first time: I've extended CROSS APPLY to use computed results as arguments! Now the operator can use values from the current row as inputs. Here's an example:
    SELECT
        f.DirectoryName,
        f.FileName
    FROM #os.directories('/some/path', false) d
    CROSS APPLY #os.files(d.FullName, true) f
    WHERE d.Name IN ('Folder1', 'Folder2')
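To make the semantics concrete, here is a rough Python sketch of what a correlated CROSS APPLY like the one above boils down to: for every row on the left, call the right-hand source with a value taken from that row and emit one output row per result. This is only an illustration of the idea, not how the engine is implemented. The function names (`directories`, `files`, `cross_apply`) are made up for this sketch, and it assumes the two boolean arguments in the query mean "recurse into subdirectories".

```python
import os
from typing import Iterator, List, Set, Tuple


def directories(path: str) -> Iterator[os.DirEntry]:
    """Left side: yield the immediate subdirectories of `path` (no recursion)."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield entry


def files(path: str) -> Iterator[Tuple[str, str]]:
    """Right side: yield (directory, file name) for every file under `path`, recursively."""
    for current_dir, _subdirs, names in os.walk(path):
        for name in names:
            yield current_dir, name


def cross_apply(root: str, wanted: Set[str]) -> List[Tuple[str, str]]:
    """Nested-loop evaluation: the right side is re-evaluated for each left row."""
    rows = []
    for d in directories(root):
        # The WHERE clause only looks at the left row, so filtering here
        # gives the same result as filtering after the apply.
        if d.name not in wanted:
            continue
        # The argument of the right-hand source comes from the current left row.
        for directory, file_name in files(d.path):
            rows.append((directory, file_name))
    return rows
```

A left row whose right side yields nothing (an empty folder) simply disappears from the output, which is the difference between CROSS APPLY and OUTER APPLY.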
After another pack of fixes I'm finally able to query multiple git repositories AT ONCE!

    with ProjectsToAnalyze as (
        select
            dir2.FullName as FullName
        from #os.directories('D:\repos', false) dir1
        cross apply #os.directories(dir1.FullName, false) dir2
        where
            dir2.Name = '.git'
    )
    select
        c.Message,
        c.Author,
        c.CommittedWhen
    from ProjectsToAnalyze p
    cross apply #git.repository(p.FullName) r
    cross apply r.Commits c
    where c.AuthorEmail = 'my-email@email.ok'
    order by c.CommittedWhen desc
Under the Hood:
- Added a Buckets feature for memory management (currently just testing it with the Roslyn plugin)
- Moved to .NET 8
- Added CROSS/OUTER APPLY operators
- Made some improvements to error messages and runtime behavior
New piping features: I've been experimenting with piping capabilities:
* Image Analysis with LLMs:
    ./Musoq.exe image encode "image.jpg" | ./Musoq.exe run query "select s.Shop, s.ProductName, s.Price from ..."
* Text Data Extraction:
    Get-Content "ticket.txt" | ./Musoq.exe run query "select t.TicketNumber, t.CustomerName ... from #stdin.text('Ollama', 'llama3.1') t"
* Data Source Combination:
    { docker image ls; ./Musoq.exe separator; docker container ls } | ./Musoq.exe run query "..."
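The general idea behind the data source combination example (several command outputs on one stream, with a marker line between them) can be sketched in a few lines of Python. This is only an illustration of the technique under stated assumptions, not what the tool does internally: the `SEPARATOR` value and the `split_sections` function are placeholders invented for this sketch, and the real marker printed by `./Musoq.exe separator` may look completely different.

```python
from typing import List

# Placeholder marker, made up for this sketch; not the tool's real separator.
SEPARATOR = "---SECTION-SEPARATOR---"


def split_sections(stream_text: str, separator: str = SEPARATOR) -> List[List[str]]:
    """Split one combined text stream into per-command sections.

    Each line that equals the separator (ignoring surrounding whitespace)
    closes the current section and opens a new one.
    """
    sections: List[List[str]] = [[]]
    for line in stream_text.splitlines():
        if line.strip() == separator:
            sections.append([])
        else:
            sections[-1].append(line)
    return sections
```

Each resulting section can then be treated as its own table-like input, for example the first one as the image list and the second one as the container list.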
I'm working on comprehensive documentation: <https://puchaczov.github.io/Musoq/>
I especially encourage you to look at the "Practical Examples and Applications" and "Data Sources" sections, where you can see all the tables the tool currently provides.
Other Changes:
- Made some improvements to OS and Archive data sources (OS can now query metadata like EXIF)
- Added a few fields to CAN DBC plugin
- Command outputs can now be used as inputs for queries
I'm hoping to:
- Improve stability and add more tests
- Flesh out the documentation
- Work on package distribution (Scoop, Ubuntu packages)
- Share some examples of source code querying with Roslyn
Ideas for later:
- Robust WHERE analysis and optimizations
- DISTINCT operator implementation
- PROTOBUF schema support
- Performance improvements
- Query parallelization
- Recursive CTEs
- Subqueries
I'd really appreciate any thoughts or feedback!
The documentation section where I wrote a short analysis of EF Core with the Git plugin: <https://puchaczov.github.io/Musoq/practical-examples-and-app...>
How does this interface with the different tools and how would one add another tool for it to operate on?
I started on something similar last year which was just a simple bash script to interact with things like osquery. Alas it was too buggy for what I wanted to do and it's paused indefinitely for now.
This will output pure JSON or CSV. This way you can use other tools like jq, grep, csvtoolkit or whatever you need to further process your data.
To dig deeper, just look at: https://github.com/Puchaczov/Musoq.CLI
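If you would rather post-process the JSON output in a script than in jq, something along these lines should work. It is a minimal sketch that assumes the output is a JSON array with one object per row, keyed by column name; check the actual output shape before relying on it. The `load_rows` and `count_by` helpers are made up for this example, and the column names are borrowed from the commits query earlier in the thread.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def load_rows(json_text: str) -> List[Dict[str, Any]]:
    """Parse query output, assuming a JSON array of row objects."""
    rows = json.loads(json_text)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of row objects")
    return rows


def count_by(rows: List[Dict[str, Any]], column: str) -> Dict[Any, int]:
    """Count how many rows share each value of `column` (missing values count as None)."""
    return dict(Counter(row.get(column) for row in rows))
```

In a pipeline you would read the text from standard input (`sys.stdin.read()`) and pass it to `load_rows`.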
After some thought: you could have also asked whether it's possible to add new data sources that you need to query, and the answer is of course yes! It's actually quite simple and there are many examples. Each data source tool is just a plugin implementing the appropriate interfaces. You can look at some example projects and see how they implement their data source here: https://github.com/Puchaczov/Musoq.DataSources
Yes, I was asking about new data sources. For example, if I wanted to add GitHub so I could query my GH issues with Musoq, how would I do that?
Great, I'll check out the links!
select Id, DistanceBetween(s.FromPoint(-73.935242, 40.730610)) as Distance, s.IsBetween(s.FromPoint(-73.935242, 40.730610)) as IsInArea from #gis.shapefile('map.shp') s
You could even combine it with other data sources. For example, if you have geometry data in a CSV:
select sfg.DistanceBetween(sfg.FromPoint(..., ...)) from #csv.file(...) c cross apply #gis.ShapeFromGeometry(c.Geometry) sfg