- goal: produce more accurate and relevant outputs
- retrieval model: finds the most relevant information
- generative model: creates content based on the input
#### [Graph RAG vs Vector RAG: A Comprehensive Tutorial with Code Examples](https://ragaboutit.com/graph-rag-vs-vector-rag-a-comprehensive-tutorial-with-code-examples/)
#### [What Is RAG? (Retrieval Augmented Generation)](https://www.clarifai.com/blog/what-is-rag-retrieval-augmented-generation)
- LLMs are good at answering questions about topics they have a large amount of training data on
- problems with LLMs: outdated knowledge, knowledge gaps, hallucinations
- RAG is a component of conversational AI: adds more context, including real-time data, to the prompt
- "the answer to this question is in this text": the LLM finds the answer and puts it into natural language
- integrates the LLM with a knowledge base
- RAG is capable of real-time information retrieval
- naïve RAG (see the retrieval sketch at the end of this note):
	- knowledge-base retrieval: a prepared vector store DB optimized for textual similarity search
	- chunk size is a balance: smaller chunks let more distinct pieces of information fit in the context, bigger chunks keep more detail per chunk
- RAG uses prompt templates that include: user prompt, system instructions, historical and retrieved context; the orchestration layer fills in the template and passes it to the LLM
- the prompt is turned into a vector by the embedding model
- the embedding model turns text into meaning (a numerical representation)
- the LLM's max context includes the length of its response: each time it runs, it creates a token and appends it to the prompt as the answer builds up (see the generation-loop sketch at the end)
- prompts
	- I am reading this article on naive RAG: https://www.clarifai.com/blog/what-is-rag-retrieval-augmented-generation
	- is it essential for retrieval models to use actual relational databases, or could they use text documents? what sources can a retrieval-based model use?
	- what is the LLM + RAG + control logic called? what is the general name for the thing I interact with when I'm prompting ChatGPT or Gemini or Claude? (conversational AI)
	- give me some details on what knowledge bases RAGs use
	- based on the article, what are the "various components" the orchestration layer interacts with?
	- is the RAG the control logic? (no)
	- what are LangChain and Semantic Kernel?
	- should I consider turning my second brain into a static knowledge base? what are the basic considerations for doing this? are there any setups that wouldn't need me to modify anything and can just intake my whole second brain as-is?
	- are there instructions for getting an LLM running on my laptop in under 20 minutes? what are the important metrics when considering LLMs? give me a table of open-source LLMs with those metrics (yep, done)
	- regarding the knowledge base: what is a vector store? what are common chunk sizes? are vectors all numbers that represent some length of text? is there a more official name for the piece of information that gets turned into a vector? go in depth into the process for turning that piece of information into numbers. is it fair to say the vector is a number that conveys meaning?
	- is the LLM used at multiple stages of a response, both before and after the RAG step?
	- why and how does the RAG use embeddings (numerical representations of text)?
	- are token limits the same as the context window?
	- go more in depth about how the RAG associates the vectorized prompt with other vectors in the vector store
	- what versions of conversational AI started using RAG? what decides which previous prompts' information should be added to the new prompt?
	- how automated can information chunking be in the second-brain context? is it common for chunks to overlap so a concept isn't split across a boundary? (see the chunking sketch below)
	- how much does RAG in modern conversational AI depend on web searches, and how much on static knowledge bases?
	- how well do RAGs use structured data like JSON?
	- what does it mean that GPT-2 is a transformers model?
	- define hallucination in terms of LLMs

![[how a RAG works.png]]
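
A minimal sketch of the naïve RAG loop from the notes above: embed the prompt, rank vector-store chunks by cosine similarity, and fill a prompt template with the top matches. The `embed()` function here is a toy bag-of-words stand-in for a real embedding model (a real setup would use something like sentence-transformers); the chunks, names, and template wording are illustrative assumptions, not taken from the article.

```python
# Toy naive-RAG retrieval: embed -> similarity search -> fill prompt template.
import numpy as np

VOCAB = ["rag", "retrieval", "vector", "llm", "context", "chunk", "token"]

def embed(text: str) -> np.ndarray:
    """Toy embedding: count vocabulary words in the text (stand-in for a real model)."""
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# The "vector store": each knowledge-base chunk stored alongside its embedding.
chunks = [
    "RAG adds retrieval so the LLM can cite real context",
    "A vector store holds one embedding per chunk of text",
    "Token limits cap how much context fits in the prompt",
]
store = [(c, embed(c)) for c in chunks]

def retrieve(prompt: str, k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the embedded prompt, keep the top k."""
    q = embed(prompt)
    ranked = sorted(store, key=lambda pair: cosine_similarity(q, pair[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Orchestration step: fill the prompt template with the retrieved context.
TEMPLATE = "System: answer using only the context below.\nContext:\n{context}\n\nUser: {question}"
question = "What does a vector store hold?"
filled = TEMPLATE.format(context="\n".join(retrieve(question)), question=question)
print(filled)  # this filled prompt is what the orchestration layer passes to the LLM
```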
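
A minimal sketch of why the response length counts against the context window, as the notes say: generation is autoregressive, so every produced token is appended to the running sequence before the next model step, and prompt plus answer share one budget. `MAX_CONTEXT`, the token format, and `model_step` are all illustrative assumptions.

```python
MAX_CONTEXT = 8192  # illustrative limit, not any particular model's

def generate(prompt_tokens: list[str], model_step) -> list[str]:
    """Autoregressive loop: each new token is appended to the sequence,
    so the prompt and the growing answer share one context budget."""
    seq = list(prompt_tokens)
    while len(seq) < MAX_CONTEXT:   # stop when the context window is full
        nxt = model_step(seq)       # stand-in for one LLM forward pass
        if nxt == "<eos>":          # model signals it is done
            break
        seq.append(nxt)             # the answer consumes context too
    return seq[len(prompt_tokens):] # everything after the prompt is the answer

# usage with a dummy model that immediately stops:
# generate("what is RAG".split(), lambda seq: "<eos>")  ->  []
```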
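
And a minimal sketch of automated chunking with overlap, relevant to the second-brain question above: fixed-size word windows where each window shares some words with the previous one, so a concept that straddles a boundary still appears whole in at least one chunk. Whitespace tokenization and the sizes are illustrative; real pipelines often chunk by model tokens, sentences, or headings instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap`
    words with the previous window."""
    words = text.split()
    step = chunk_size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]

# e.g. a 500-word note yields windows over words 0-199, 160-359, 320-499
```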