Knowledge Base

Guidelines on how to manage knowledge bases.
By DigiAI INTL
3 articles

Knowledge Maintenance

Viewing Text Chunks

Each document you upload to the knowledge base is split into segments known as text chunks. You can browse the full content of these chunks in the chunk list.

Reviewing Chunk Quality

The quality of the document chunks strongly affects how well the knowledge base answers questions. Review the chunks manually before linking the knowledge base to any agent. Automated chunking by character length, identifiers, or NLP semantic techniques reduces the labor of handling large volumes of text, but document formats and their semantic contexts are complex, so manual checks are still needed to catch what automated chunking misses.

Important factors to observe when assessing chunk quality include:

1. Chunks that are too short, potentially losing semantic meaning.
2. Chunks that are too long, introducing semantic noise that may reduce accuracy.
3. Noticeable semantic truncation: when text exceeds the segment length limit it is forcibly cut off, so the cut content is missing during recall.

Adding Text Chunks

To add custom chunks to a document, open the chunk list and click "Add a Chunk." You can add one or more chunks tailored to the document.

Editing Text Chunks

In the chunk list, you can edit the chunks you have added, updating their text content and keywords to keep them clear and relevant.

Disabling Documents

You can exclude whole documents or individual chunks from indexing in the dataset. To disable a document, use the disable button in the dataset document list. To disable either the entire document or a particular chunk, use the controls in the document details. A disabled document or chunk is not indexed until you click enable to restore it.
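
The three quality factors listed under Reviewing Chunk Quality can be pre-screened with a short script before the manual review. The following is a minimal sketch, not part of the product: the function name `review_chunks`, the `ChunkIssue` type, and the length thresholds are all hypothetical, and the truncation check is only a rough heuristic (a chunk that does not end in sentence-final punctuation may have been cut off).

```python
from dataclasses import dataclass

# Hypothetical thresholds (assumptions); tune them for your own documents.
MIN_CHARS = 50
MAX_CHARS = 1000
# Characters that usually close a sentence; anything else hints at a cut-off.
SENTENCE_ENDINGS = (".", "!", "?", '"', ")")


@dataclass
class ChunkIssue:
    index: int    # position of the chunk in the input list
    problem: str  # "too_short", "too_long", or "possible_truncation"


def review_chunks(chunks, min_chars=MIN_CHARS, max_chars=MAX_CHARS):
    """Flag chunks that deserve a closer manual look."""
    issues = []
    for i, chunk in enumerate(chunks):
        text = chunk.strip()
        if len(text) < min_chars:
            issues.append(ChunkIssue(i, "too_short"))
        if len(text) > max_chars:
            issues.append(ChunkIssue(i, "too_long"))
        # Heuristic only: headings and list items will also trigger this.
        if text and not text.endswith(SENTENCE_ENDINGS):
            issues.append(ChunkIssue(i, "possible_truncation"))
    return issues
```

A flagged chunk is only a candidate for review. Whether it should be edited, merged, or disabled remains a manual decision.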

Last updated on Oct 21, 2024

Knowledge Settings

To manage the settings of a knowledge base, select it and open the [Settings] tab. There you can adjust the name, description, indexing method, embedding model, and retrieval settings.

Knowledge Base Name

Assign a unique name that distinguishes this knowledge base from the others, so it is easy to find and manage.

Knowledge Description

Describe the content and purpose of the documents in the knowledge base clearly, so users understand what information is available.

Indexing Methods

- High-Quality Mode: Uses a configurable embedding model to convert text chunks into numerical vectors for compact storage, which also improves the accuracy of user interactions with LLMs.
- Economical Mode: Uses an offline vector engine with keyword indexing. It lowers operating costs because it consumes no additional tokens, at the price of reduced search precision. This mode supports inverted indexing only.

Embedding Model

You can change the embedding model used by the knowledge base. Switching models triggers a re-embedding of all documents, and the previous embeddings are erased to keep the data consistent.

Retrieval Settings

Retrieval runs both a full-text search and a vector search, then reorders the combined results to find those that best match the user's query.

Weight Settings

Adjust the balance between semantic and keyword priorities to suit your organization's search needs. Full-text (keyword) search is precise when specific terms are known, while semantic search ranks results by vector distance, which is particularly useful in multilingual contexts.

- Semantic Value of 1: Uses semantic search only, retrieving relevant content even when the query does not exactly match the terms in the knowledge base.
- Keyword Value of 1: Uses keyword search only, for fast, precise lookups. Ideal for large databases or when specific terms are known.
- Custom Weights: Besides the single-mode options, you can set custom weights to find the balance between keyword and semantic search that best serves your business objectives.

TopK

Sets the number of text chunks retrieved based on similarity to the user's question. The default is 3. Higher values return more text chunks.

Score Threshold

Sets a minimum similarity score, 0.5 by default, that filters out less relevant text chunks. Raising the threshold returns fewer, but potentially more relevant, chunks.
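
To make the interaction between Weight Settings, TopK, and Score Threshold concrete, the sketch below combines them in one function. It is an illustration under stated assumptions, not the product's actual implementation: it assumes each candidate chunk already has a semantic score and a keyword score in the range 0 to 1, that the two weights sum to 1, and that the final score is a simple weighted sum. The function name `hybrid_retrieve` and its parameters are hypothetical. Only the defaults of 3 for TopK and 0.5 for the score threshold come from the settings described above.

```python
def hybrid_retrieve(candidates, semantic_weight=0.7, top_k=3, score_threshold=0.5):
    """Rank candidate chunks by a weighted mix of semantic and keyword scores.

    candidates: iterable of (chunk_id, semantic_score, keyword_score),
                with both scores assumed to lie in [0, 1].
    semantic_weight: 1.0 means semantic search only, 0.0 means keyword only.
    Returns up to top_k (chunk_id, score) pairs, best first.
    """
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    keyword_weight = 1.0 - semantic_weight  # assumption: weights sum to 1

    scored = [
        (chunk_id, semantic_weight * sem + keyword_weight * kw)
        for chunk_id, sem, kw in candidates
    ]
    # Score Threshold: drop chunks below the minimum combined score.
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    # TopK: keep only the highest-scoring chunks.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

With `semantic_weight=1.0` the keyword score is ignored entirely, and with `semantic_weight=0.0` only the keyword score counts, mirroring the two single-mode options. Raising `score_threshold` or lowering `top_k` can only shrink the result list.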

Last updated on Oct 21, 2024