Model Parameters
Large language models generate the next token by analyzing the sequence of tokens that came before it. Instead of always choosing the single most likely token—which would result in dull and repetitive text—the model picks from a range of possible tokens, allowing for more varied and interesting output.
HammerAI offers several settings that let users fine-tune how the AI generates responses, providing control over creativity, coherence, and variability.
Temperature (Creativity Control)
The Temperature parameter determines the level of randomness in the AI’s responses, directly influencing its creativity and adherence to predefined behavior.
Lower values
- AI remains focused and precise.
- Responses are logical, predictable, and stay in line with the character’s original personality.
- Ideal for structured conversations and rule-based characters.
Higher values
- AI responses become more varied and imaginative.
- The AI may deviate from predefined behavior, resulting in unexpected or more creative responses.
- Useful for unpredictable or roleplay-driven interactions.
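To make this concrete, temperature can be pictured as a divisor applied to the model's raw scores (logits) before they are converted into probabilities. The sketch below is illustrative only, not HammerAI's actual code; the function name and example logits are invented for the demonstration.

```python
import numpy as np

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Turn raw logits into a probability distribution, scaled by temperature."""
    scaled = logits / max(temperature, 1e-6)  # guard against division by zero
    exp = np.exp(scaled - scaled.max())       # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(apply_temperature(logits, 0.5))  # sharply peaked: almost always the top token
print(apply_temperature(logits, 1.5))  # flatter: lower-ranked tokens get real chances
```

At 0.5 the top token dominates; at 1.5 the probabilities spread out, which is exactly the focused-versus-imaginative trade-off described above.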
Repetition Penalty (Redundancy Control)
The Repetition Penalty setting controls how often the AI reuses words, phrases, or ideas from previous responses, preventing it from getting stuck in repetitive loops.
Lower values
- AI may repeat phrases frequently, keeping responses consistent but sometimes redundant.
- Risk of conversational loops where the AI reuses the exact same wording excessively.
Higher values
- AI actively avoids repetition, resulting in more diverse and novel responses.
- Produces more creative or unexpected phrasing, but can sometimes lead to incoherent responses.
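One common way samplers implement this (used by llama.cpp-style backends, though HammerAI's exact scheme isn't documented here) is to weaken the scores of tokens that have already appeared. A minimal sketch:

```python
import numpy as np

def apply_repetition_penalty(logits: np.ndarray, generated_ids, penalty: float):
    """Reduce the scores of tokens already present in the output so far."""
    logits = logits.copy()
    for token_id in set(generated_ids):
        # Dividing positive scores and multiplying negative ones means a
        # penalized token always loses probability, regardless of sign.
        if logits[token_id] > 0:
            logits[token_id] /= penalty
        else:
            logits[token_id] *= penalty
    return logits
```

A penalty of 1.0 leaves scores unchanged; values above 1.0 push the AI away from words it has already used.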
Top P (Probability Mass Sampling)
The Top P setting determines the range of words the AI considers when generating a response by keeping only the smallest set of top candidates whose combined probability reaches the chosen threshold.
Lower values
- Selects responses from a smaller, focused vocabulary, leading to controlled output.
- Ensures precise, rule-following behavior.
Higher values
- AI considers a wider variety of words, increasing response variability.
- Encourages more dynamic and natural conversations.
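This technique is commonly called nucleus sampling: sort the candidates by probability, keep the smallest set whose combined probability reaches the threshold, and discard the rest. A sketch under that standard definition, not necessarily HammerAI's exact implementation:

```python
import numpy as np

def top_p_filter(probs: np.ndarray, top_p: float) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability >= top_p."""
    order = np.argsort(probs)[::-1]                    # most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]                              # the "nucleus"
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()                   # renormalize survivors
```

With Top P at 0.9, the long tail of unlikely words is cut off while the AI still chooses freely among the plausible ones.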
Top K (Word Choice Limit)
The Top K setting limits the number of word choices the AI has when generating a response, controlling how "deep" it searches its vocabulary.
Lower values
- AI selects from a narrower set of words, ensuring high precision and predictability.
- Helps the AI stay in character and follow structured guidelines.
Higher values
- AI expands its vocabulary, making responses less predictable and more creative.
- This may lead to unusual phrasing or less structured dialogue.
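Top K is even simpler than Top P: keep the K highest-probability candidates and discard everything else. A sketch of the standard filter:

```python
import numpy as np

def top_k_filter(probs: np.ndarray, top_k: int) -> np.ndarray:
    """Zero out everything but the top_k most likely tokens, then renormalize."""
    keep = np.argsort(probs)[::-1][:top_k]  # indices of the K best candidates
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()
```

A Top K of 1 is greedy decoding (always the single most likely word); larger values widen the vocabulary the AI can draw from.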
Response Token Limit (Max Response Tokens)
The Max Response Tokens setting determines the maximum length of AI-generated responses. A slider allows users to fine-tune responses to be either short and concise or longer and more detailed.
- Increasing the token limit results in more comprehensive responses but may slow processing and require more system memory.
- Lowering the token limit ensures faster response times but may lead to less detailed answers.
- Adjusting this setting helps users balance AI response quality and system efficiency.
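The token limit acts as a hard cap on the generation loop: the model stops either when it emits an end-of-sequence token or when the cap is reached, whichever comes first. A conceptual sketch, where model_step is a hypothetical stand-in for the real model call:

```python
def generate(model_step, prompt_ids, max_response_tokens, eos_id):
    """Generate until the model stops on its own or the token cap is hit."""
    ids = list(prompt_ids)
    output = []
    for _ in range(max_response_tokens):  # hard cap on response length
        next_id = model_step(ids)         # hypothetical: ids in, next token out
        if next_id == eos_id:             # the model finished naturally
            break
        output.append(next_id)
        ids.append(next_id)
    return output
```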
Memory Lock (MLock)
The MLock (Memory Lock) option controls whether the AI’s allocated memory should be locked to enhance stability. Users should enable or disable this setting based on their system’s available memory and whether stability issues arise.
- Enabling MLock pins the model's memory in RAM so it cannot be swapped to disk, leading to more stable performance for users with sufficient RAM.
- Disabling MLock allows the system to manage memory dynamically but may lead to performance fluctuations.
- If the system does not have enough RAM, enabling MLock may cause the application to crash due to insufficient memory availability.
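HammerAI's internals aren't shown in this guide, but in llama-cpp-python, a widely used binding for llama.cpp models, the equivalent switch is the use_mlock constructor flag; the model path below is a placeholder. A sketch:

```python
from llama_cpp import Llama

# use_mlock=True asks the operating system to pin the model weights in RAM
# so they are never paged out to disk. On a machine without enough free
# memory, locking can fail or crash instead of degrading gracefully.
llm = Llama(
    model_path="./model.gguf",  # placeholder path
    use_mlock=True,
)
```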
Context Size
The Context Size setting allows users to define how much of the conversation history the AI retains. This setting is crucial for balancing AI recall capabilities and system efficiency.
- Larger context sizes allow for more long-term memory retention, improving the AI’s ability to reference previous messages within a conversation.
- Higher context settings consume significantly more RAM, leading to performance slowdowns or crashes if the system is underpowered.
- If the AI struggles to recall earlier parts of the conversation, increasing this setting may improve its memory.
- If the AI lags, slows down, or crashes, lowering the context size can improve performance and reduce resource consumption.
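Conceptually, the context size is a fixed token budget: once the conversation exceeds it, the oldest tokens fall out of the window and the AI can no longer see them. A sketch of that trimming logic (the function and the reserve value are invented for illustration):

```python
def fit_to_context(history_ids, context_size, reserve_for_response=256):
    """Keep only the most recent tokens that fit inside the context window."""
    # Reserve room for the AI's reply; everything older than the remaining
    # budget falls out of the window and is effectively forgotten.
    budget = max(context_size - reserve_for_response, 0)
    if budget == 0:
        return []
    return history_ids[-budget:]
```

Memory use also scales with this setting, since the backend's attention cache grows roughly in proportion to the window, which is why large contexts can exhaust RAM on smaller systems.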
How These Settings Work Together
By fine-tuning these parameters, users can shape the AI’s conversational style to fit their specific needs. These settings interact to create different AI behaviors:
Low Temperature + Low Top P + Low Top K
- Best for structured conversations, rule-following AI, and professional or task-oriented chatbots.
- AI will remain on-topic, precise, and predictable.
High Temperature + High Top P + High Top K
- Best for open-ended storytelling, creative writing, and highly imaginative conversations.
- AI will generate varied, dynamic, and unpredictable responses.
Balancing Repetition Penalty
- Increasing the Repetition Penalty helps keep the AI from repeating itself excessively, adding variety to responses.
- Lowering the Repetition Penalty allows the AI to maintain consistency in personality and phrasing.
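Putting it all together, a full sampling step chains these filters in sequence. The sketch below reuses the helper functions from the earlier sections; the default values are common community starting points, not HammerAI's official defaults:

```python
import numpy as np

def sample_next_token(logits, generated_ids, temperature=0.8,
                      repetition_penalty=1.1, top_k=40, top_p=0.9, rng=None):
    """One illustrative sampling step combining all of the settings above."""
    rng = rng or np.random.default_rng()
    # Order matters: penalize repeats on the raw scores, apply temperature to
    # get probabilities, then narrow the candidate pool with Top K and Top P.
    logits = apply_repetition_penalty(logits, generated_ids, repetition_penalty)
    probs = apply_temperature(logits, temperature)
    probs = top_k_filter(probs, top_k)
    probs = top_p_filter(probs, top_p)
    return int(rng.choice(len(probs), p=probs))
```

Lowering temperature, Top P, and Top K together collapses this pipeline toward always picking the single most likely token; raising all three opens it up to the long tail of the vocabulary.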