An LLM semantic caching system that aims to improve user experience by reducing response time through cached query-result pairs.
Updated Jun 30, 2025 - Python
Redis Vector Library (RedisVL) -- the AI-native Python client for Redis.
mimir is a drop-in proxy that caches LLM API responses using semantic similarity, reducing costs and latency for repeated or similar queries.
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.
Unified AI gateway for 30+ LLMs (OpenAI, Anthropic, Bedrock, Azure, etc.) with caching, guardrails, A/B testing, and cost controls. A fast, scalable, Go-native alternative to LiteLLM and Kong AI Gateway.
Reliable and Efficient Semantic Prompt Caching with vCache
Redis integration for Google Agent Development Kit (ADK) - Memory, Sessions, Search Tools, MCP
Redis Vector Library (RedisVL) -- the AI-native Java client for Redis.
This is a RAG-based chatbot that incorporates a semantic cache and guardrails.
This repository contains sample code demonstrating how to implement a verified semantic cache using Amazon Bedrock Knowledge Bases to prevent hallucinations in Large Language Model (LLM) responses while improving latency and reducing costs.
Enterprise AI traffic gateway — unified compliance, routing across 20+ LLM providers, semantic cache, quotas, and audit. SDK / network / OS-layer intercept.
High-performance LLM query cache with semantic search. Reduces API costs by 80% and latency from 8.5 s to 1 ms using Redis and the Qdrant vector database. Multi-provider support (OpenAI, Anthropic).
Local-first semantic cache for AI agents. A small C daemon + CLI that remembers what your agent learned across sessions. Plugs into Claude Code, Codex, Gemini CLI, and Claude Desktop / ChatGPT via MCP. No LLM calls, no SaaS, no API key.
OpenAI-compatible LLM gateway that reduces API costs using Redis exact cache and Qdrant semantic cache.
🏆 #1 on an LLM routing benchmark · Cheapest LLM router with memory · Open-source parallel multi-LLM execution across 47+ providers
Enhance LLM retrieval performance with Azure Cosmos DB Semantic Cache. Learn how to integrate and optimize caching strategies in real-world web applications.
AI real-estate automation platform: Telegram bot, RAG, apartment search, CRM workflows, voice agent, Langfuse observability, and Dockerized AI runtime.
Self-hosted spend firewall for OpenAI / Anthropic / Gemini. Hard per-user & per-project budget caps that block runaway costs before the API call, plus cost-per-customer tracking, semantic caching, and failover. One line of code, single Go binary.
Ultra-fast Semantic Cache Proxy written in pure C
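Most of the projects above share one mechanism: reuse a stored LLM response when a new query is close enough to an earlier one, and some (such as the OpenAI-compatible gateway pairing a Redis exact cache with a Qdrant semantic cache) check for an exact match before falling back to similarity search. The sketch below illustrates that two-tier lookup in plain Python. It is not taken from any listed project. `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, the 0.8 threshold is an arbitrary assumption, and the linear scan stands in for a vector database.

```python
import hashlib
import math
import re
from collections import Counter
from typing import Callable, Optional


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Two-tier cache: exact match on normalized text, then nearest-neighbour by similarity."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed cutoff; real systems tune this per workload
        self.exact: dict = {}  # sha256 of normalized query -> response
        self.entries: list = []  # (embedding, response) pairs

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivially different strings share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, query: str) -> Optional[str]:
        # Tier 1: exact match.
        hit = self.exact.get(self._key(query))
        if hit is not None:
            return hit
        # Tier 2: linear scan for the most similar stored query.
        q = toy_embed(query)
        best_score, best_resp = 0.0, None
        for emb, resp in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_resp = score, resp
        return best_resp if best_score >= self.threshold else None

    def put(self, query: str, response: str) -> None:
        self.exact[self._key(query)] = response
        self.entries.append((toy_embed(query), response))


def cached_call(cache: SemanticCache, query: str, llm: Callable[[str], str]) -> str:
    """Return a cached response if one matches; otherwise call `llm` and store its answer."""
    hit = cache.get(query)
    if hit is not None:
        return hit
    response = llm(query)
    cache.put(query, response)
    return response
```

With this toy embedding, "What is semantic caching exactly?" scores about 0.89 against a stored "What is semantic caching?" and is served from the cache, while an unrelated question misses and reaches the model. The threshold is the key design choice: lowering it raises the hit rate but also the risk of answering a different question with a stale response. Production systems replace the linear scan with an approximate nearest-neighbour index, for example in Redis or Qdrant.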