Deepseek_chat_rag

Answers queries with retrieval-augmented generation over your indexed documents. Text extracted from PDF, DOCX, TXT, and CSV files is stored in a Chroma database, and the most relevant passages are retrieved to ground each answer.

Author

samaraxmmar

No License

Quick Info

Last Updated 14/6/2025

DeepSeek RAG Chatbot 🤖

An intelligent chatbot powered by Groq, LangChain, and ChromaDB to chat with your documents.

🌟 Introduction

DeepSeek RAG Chatbot is a powerful and intuitive application that allows you to have conversations with your own documents. By leveraging the speed of the Groq LPU Inference Engine and the versatility of LangChain, this tool transforms your static files (PDFs, DOCX, TXT, CSV) into an interactive knowledge base.

Simply upload your documents, and the system will automatically process, index, and prepare them for your questions. The user-friendly interface, built with Streamlit, makes it easy for anyone to get instant, accurate answers drawn directly from the provided content.

✨ Key Features

  • Multi-Format Document Support: Upload and process various file types, including .pdf, .docx, .txt, and .csv.
  • High-Speed Inferencing: Powered by Groq, delivering responses at exceptional speed for a fluid, real-time conversational experience.
  • Advanced RAG Pipeline: Utilizes LangChain for robust Retrieval-Augmented Generation, ensuring answers are relevant and contextually accurate.
  • Efficient Vector Storage: Employs ChromaDB to create and manage a persistent vector database of your document embeddings for fast retrieval.
  • User-Friendly Interface: A clean and simple web UI built with Streamlit that includes real-time processing feedback and chat history.
  • Open Source & Customizable: The source code is available on GitHub, so you can customize it or integrate it into other projects.
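
To make the multi-format support more concrete, here is a minimal sketch of how an upload could be routed by file extension. It is not taken from the project's code: `extract_text` and `SUPPORTED_EXTENSIONS` are hypothetical names, only the standard library is used, and PDF/DOCX parsing is deliberately left out because the parser the project relies on is not stated here.

```python
import csv
import io
from pathlib import Path

# Extensions named in the feature list above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".csv"}


def extract_text(filename: str, data: bytes) -> str:
    """Return plain text for an uploaded file (hypothetical helper)."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    if ext == ".txt":
        return data.decode("utf-8", errors="replace")
    if ext == ".csv":
        rows = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
        # Flatten each row into one line so it can be chunked like prose.
        return "\n".join(", ".join(cell.strip() for cell in row) for row in rows)
    # PDF and DOCX need a third-party parser; this sketch does not guess
    # which one the project uses.
    raise NotImplementedError(f"{ext} parsing requires an external library")
```

Rejecting unknown extensions up front keeps unsupported files out of the index instead of failing later during embedding.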

⚙️ How It Works

The application follows a sophisticated Retrieval-Augmented Generation (RAG) architecture to provide answers from your documents.

RAG Architecture Diagram

  1. Document Loading: You upload your documents (PDF, DOCX, etc.) through the Streamlit interface.
  2. Text Splitting & Embedding: The system loads the documents, splits them into smaller, manageable chunks, and generates vector embeddings for each chunk.
  3. Vector Indexing: These embeddings are stored in a ChromaDB vector store, creating a searchable index of your documents' content.
  4. User Query: You ask a question in the chat interface.
  5. Context Retrieval: The system takes your query, embeds it, and performs a similarity search in ChromaDB to retrieve the most relevant document chunks (the "context").
  6. Response Generation: The retrieved context and your original query are passed to the Groq-powered language model, which generates a human-like, accurate answer based on the provided information.
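
The six steps above can be sketched in a few dozen lines of plain Python. This is an illustration of the flow only, not the project's implementation: word counts stand in for real embeddings, the hypothetical `TinyVectorStore` class stands in for ChromaDB, and the final call to the Groq-hosted model is replaced by `build_prompt`, which only assembles the text that would be sent. The chunk sizes and prompt wording are assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Step 2: split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Step 3: in-memory stand-in for the vector index."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, chunks: List[str]) -> None:
        self._items.extend((chunk, embed(chunk)) for chunk in chunks)

    def search(self, query: str, k: int = 2) -> List[str]:
        """Step 5: embed the query and return the k most similar chunks."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, context: List[str]) -> str:
    """Step 6 (input side): combine retrieved context with the question."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


# Usage: index three short "documents", then retrieve context for a question.
store = TinyVectorStore()
store.add([
    "Invoices are due within 30 days.",
    "The office is closed on public holidays.",
    "Refunds take five business days.",
])
question = "When are invoices due?"
prompt = build_prompt(question, store.search(question, k=1))
```

In the real application the prompt would be sent to the language model and the reply shown in the chat; swapping the toy pieces for an embedding model and a persistent vector store changes the quality of retrieval, not the shape of the pipeline.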

🚀 Getting Started

Follow these steps to set up and run the project on your local machine.

Prerequisites

  • Python 3.8+
  • A Groq API Key. You can get one for free at GroqCloud.
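
Before continuing, you may want to confirm both prerequisites from Python. The snippet below is a small sketch, not part of the project: `check_prerequisites` is a hypothetical helper, and the environment variable name `GROQ_API_KEY` is an assumption about how the key is supplied (the project may load it differently, for example from a `.env` file).

```python
import os
import sys
from typing import List, Mapping, Tuple

MIN_PYTHON = (3, 8)
KEY_VAR = "GROQ_API_KEY"  # assumed variable name; adjust to the project's config


def check_prerequisites(
    env: Mapping[str, str] = os.environ,
    version: Tuple[int, int] = sys.version_info[:2],
) -> List[str]:
    """Return a list of human-readable problems; empty means ready to go."""
    problems = []
    if tuple(version) < MIN_PYTHON:
        problems.append("Python 3.8 or newer is required")
    if not env.get(KEY_VAR, "").strip():
        problems.append(f"{KEY_VAR} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```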

1. Clone the Repository

git clone https://github.com/samaraxmmar/Deepseek_chat_rag.git
cd Deepseek_chat_rag