What it does: I created a Python script that dumps your entire Git repository into a single file. This makes it much easier to use with Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems.
Key Features:
- Respects .gitignore patterns
- Generates a tree-like directory structure
- Includes file contents for all non-excluded files
- Customizable file type filtering
Why I find it useful for LLM/RAG:
- Full Context: It gives LLMs a complete picture of my project structure and implementation details.
- RAG-Ready: The dumped content serves as a great knowledge base for retrieval-augmented generation.
- Better Code Suggestions: LLMs seem to understand my project better and provide more accurate suggestions.
- Debugging Aid: When I ask for help with bugs, I can provide the full context easily.
How to use it:

Example: `python dump.py /path/to/your/repo output.txt .gitignore py js tsx`
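For anyone curious how a dump like this can work, here is a minimal sketch. It is not the actual repo2file code: the function names (`load_ignore_patterns`, `is_ignored`, `dump_repo`) are made up for illustration, and the ignore matching uses `fnmatch`, which only approximates real .gitignore semantics (no negation, no anchored patterns).

```python
import fnmatch
import os


def load_ignore_patterns(ignore_path):
    """Read non-empty, non-comment lines from a .gitignore-style file."""
    if not ignore_path or not os.path.isfile(ignore_path):
        return []
    with open(ignore_path, encoding="utf-8") as fh:
        lines = (line.strip() for line in fh)
        # Drop trailing slashes so "node_modules/" matches the directory name.
        return [line.rstrip("/") for line in lines if line and not line.startswith("#")]


def is_ignored(rel_path, patterns):
    """Simplified matching: test the whole path and each path component."""
    parts = rel_path.split("/")
    return any(
        fnmatch.fnmatch(rel_path, pat) or any(fnmatch.fnmatch(p, pat) for p in parts)
        for pat in patterns
    )


def dump_repo(repo_root, patterns=(), extensions=()):
    """Return one string: a directory tree, then the contents of each kept file."""
    tree_lines, file_sections = [], []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        rel_dir = os.path.relpath(dirpath, repo_root).replace(os.sep, "/")
        rel_dir = "" if rel_dir == "." else rel_dir
        # Prune ignored directories (and .git) in place so os.walk skips them.
        dirnames[:] = sorted(
            d for d in dirnames
            if d != ".git" and not is_ignored(f"{rel_dir}/{d}".lstrip("/"), patterns)
        )
        depth = rel_dir.count("/") + 1 if rel_dir else 0
        if rel_dir:
            tree_lines.append("  " * (depth - 1) + os.path.basename(dirpath) + "/")
        for name in sorted(filenames):
            rel_path = f"{rel_dir}/{name}".lstrip("/")
            if is_ignored(rel_path, patterns):
                continue
            # An empty extension list means "keep every file type".
            if extensions and not name.endswith(tuple("." + e for e in extensions)):
                continue
            tree_lines.append("  " * depth + name)
            with open(os.path.join(dirpath, name), encoding="utf-8", errors="replace") as fh:
                file_sections.append(f"{rel_path}\n{fh.read()}")
    return "\n".join(tree_lines) + "\n\n" + "\n\n".join(file_sections)
```

Calling `dump_repo("/path/to/your/repo", load_ignore_patterns(".gitignore"), ("py", "js", "tsx"))` would roughly mirror the example command above. A library that implements the full .gitignore rules would be a safer choice for real use.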
It's still a work in progress, but I've found it really helpful in my workflow with AI coding assistants (Claude/OpenAI). I'd love to hear your thoughts and suggestions, or whether anyone else finds this useful!
https://github.com/artkulak/repo2file
P.S. If anyone wants to contribute or has ideas for improvement, I'm all ears!
- [files-to-prompt](https://github.com/simonw/files-to-prompt) (from the GOAT simonw)
- [code2prompt](https://github.com/mufeedvh/code2prompt)
- https://gh-repo-dl.cottonash.com/
- [1filellm](https://github.com/jimmc414/1filellm)
- [repopack](https://github.com/yamadashy/repopack)
- [ingest](https://github.com/sammcj/ingest)
What makes yours better?
What makes his better? Since you're asking, I tried these and here's my verdict:
- [files-to-prompt](https://github.com/simonw/files-to-prompt) (from the GOAT simonw) --> There's no option to specify which files to include; you have to work backwards with the ignore option
- [code2prompt](https://github.com/mufeedvh/code2prompt) --> It always puts the output in the paste buffer, even if you specify an output file
- https://gh-repo-dl.cottonash.com/ --> There's no CLI
- [1filellm](https://github.com/jimmc414/1filellm) --> Many dependencies and a complicated setup (you have to set up a GitHub access token, which I've never done)
- [repopack](https://github.com/yamadashy/repopack) and [ingest](https://github.com/sammcj/ingest) --> Haven't tried these yet, but they actually look promising...
Which one of these do you find the best? It’s quite tempting to write one myself for something as simple as this.
I guess the difference is that your script produces a complete copy, whereas aider uses a concise summary, which is necessary when the context window is full.
- treat '-' as stdout
- named arguments
- don't detect ignore files by checking whether they start with '.', because then a local .gitignore isn't found and gets treated as an extension :)
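The three suggestions above could look something like this with `argparse`. This is just a sketch: the flag names (`--repo`, `--output`, `--ignore-file`, `--ext`) and the helper `open_output` are hypothetical and not part of the existing script.

```python
import argparse
import sys


def build_parser():
    """Named arguments, so the ignore file never has to be guessed from its name."""
    parser = argparse.ArgumentParser(description="Dump a repository into one file.")
    parser.add_argument("--repo", required=True, help="path to the repository")
    parser.add_argument("--output", default="-", help="output file, or '-' for stdout")
    parser.add_argument("--ignore-file", default=None, help="explicit path to an ignore file")
    parser.add_argument("--ext", nargs="*", default=[], help="extensions to include, e.g. py js")
    return parser


def open_output(target):
    """Return (stream, should_close); '-' maps to stdout, which must not be closed."""
    if target == "-":
        return sys.stdout, False
    return open(target, "w", encoding="utf-8"), True
```

Because the ignore file is passed by name, a plain `.gitignore` in the current directory can no longer be mistaken for an extension.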
Put code blocks inside 3 backticks at the beginning and 3 backticks at the end, since it's the default for each file.
Remove the dashes to save tokens.
In the title for the code blocks, put the full relative path to the file, since some projects have many files with the same name.
```js
console.log(2+2);
```
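A rough sketch of that formatting in Python, in case it helps. The helper name `format_file_block` is made up, and putting the relative path on its own line above the fence is just one way to read "title".

```python
def format_file_block(rel_path, content, lang=""):
    """Wrap one file's content in a fenced block titled with its relative path."""
    fence = "`" * 3
    # Use a longer fence if the content itself contains a run of backticks
    # that would otherwise close the block early.
    while fence in content:
        fence += "`"
    body = content if content.endswith("\n") else content + "\n"
    return f"{rel_path}\n{fence}{lang}\n{body}{fence}\n"
```

For example, `format_file_block("src/utils/index.js", "console.log(2+2);", "js")` gives the path line followed by a `js` block like the one above.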
But I like the idea of tarballing it, as ndr_ suggested. I'm thinking that could be the move here.
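If tarballing means packing the selected files into one archive, Python's standard `tarfile` module can do it in a few lines. This is only a guess at the intent; the function name `tar_files` and the path-to-text mapping it takes are assumptions for illustration.

```python
import io
import tarfile


def tar_files(files):
    """Pack a {relative_path: text} mapping into gzip-compressed tar bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for rel_path, text in sorted(files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=rel_path)
            info.size = len(data)  # size must be set before adding the member
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

The returned bytes can be written to a `.tar.gz` file. Note that the result is binary, so it suits tools that accept archives rather than pasting straight into a prompt.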
In case anyone wants to see my workflows: https://github.com/atxtechbro/shell-tooling