ci: pre-build FAISS indices in Docker image to fix ~60 min graph init #296
Open
luarss wants to merge 3 commits into
Graph initialisation during the docker-eval CI job was taking ~60 minutes because all 6 FAISS vector indices were rebuilt from scratch at container startup on every run (HuggingFace CPU inference over the full corpus, or Google Gemini embedding API exhausting its quota and retrying with 60 s minimum backoff).

Root cause:
- HybridRetrieverChain.create_hybrid_retriever() takes the slow embed_docs() path when faiss_db/<name> does not exist on disk.
- The faiss_data named volume is empty on every CI run (docker compose down --volumes is called between jobs), so the indices are never reused.

Fix:
- Add backend/scripts/build_faiss.py: runs RetrieverTools.initialize() with EMBEDDINGS_TYPE=HF, FAST_MODE=true, and contextual_rerank=False at Docker build time, saving all 6 FAISS indices into the image layer.
- Add a RUN step in the Dockerfile that calls the script after the dataset is downloaded. Docker layer caching means the step is skipped on re-runs where neither source nor data changed.
- Set ENV EMBEDDINGS_TYPE=HF / HF_EMBEDDINGS=thenlper/gte-large as container defaults so runtime matches the pre-built indices (override in .env or via docker run -e if a different model is needed).
- Add contextual_rerank: bool = True param to RetrieverTools.initialize() so the build script can skip loading the cross-encoder model, keeping the Docker build dependency-light.

On first CI run with an empty faiss_data volume Docker copies the pre-built indices from the image into the volume automatically, so the container finds faiss_db/<name> at startup and takes the load_db() path instead. Graph init drops from ~60 min to a few seconds.

Note: ensure backend/.env (or ci-secret.yaml) sets EMBEDDINGS_TYPE=HF to match the pre-built indices; using a different model at runtime causes a vector dimension mismatch.

Signed-off-by: Jack Luar <jluar@precisioninno.com>
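The load-versus-rebuild decision named in the root cause can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's implementation: `get_or_build_index`, `load_fn`, and `build_fn` are hypothetical names standing in for the fast `load_db()` path and the slow `embed_docs()` path, and the only behaviour taken from the commit message is that an index counts as present when `faiss_db/<name>` exists on disk.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def get_or_build_index(
    faiss_root: Path,
    name: str,
    load_fn: Callable[[Path], T],
    build_fn: Callable[[Path], T],
) -> tuple[str, T]:
    """Return ("load", index) if faiss_root/name exists, else ("build", index).

    load_fn and build_fn are hypothetical callables: the first stands in for
    the fast load-from-disk path, the second for the slow embedding path.
    """
    index_dir = faiss_root / name
    if index_dir.is_dir():
        # Fast path: an index is already on disk (e.g. baked into the image).
        return "load", load_fn(index_dir)
    # Slow path: nothing on disk, so create the directory and build into it.
    index_dir.mkdir(parents=True, exist_ok=True)
    return "build", build_fn(index_dir)
```

With pre-built indices in the image, every call at container startup would be expected to take the first branch, which is where the reported drop in graph init time comes from.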
bffb994 to caba92a
Signed-off-by: Jack Luar <jluar@precisioninno.com>
Move FAISS index construction out of the main Dockerfile into a dedicated Dockerfile.faiss-cache built only by the new build-faiss.yml secret CI workflow (manual trigger + on upload.yml completion). The main Dockerfile now does COPY --from=faiss-cache-image, so PRs never rebuild indices.

build_faiss.py gains per-index hash-based skipping: a manifest.json tracks a SHA256 of each index's source file paths and sizes. On each secret CI run the previous faiss_db/ is extracted from the old cache image and placed in the build context; unchanged indices are skipped entirely.

Index names are migrated from the fragile counter-based similarity_INST1..6 to stable explicit names (general, install, commands, yosys_rtdocs, klayout, errinfo) so that manifest keys are stable across runs.

Signed-off-by: Jack Luar <jluar@precisioninno.com>
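The per-index skipping described above could look something like the following sketch. It assumes a flat manifest mapping each index name to one hex digest and one source directory per index; the actual manifest.json layout and the helper names (`fingerprint_sources`, `indices_to_rebuild`, `write_manifest`) are hypothetical and not taken from build_faiss.py.

```python
import hashlib
import json
from pathlib import Path


def fingerprint_sources(source_dir: Path) -> str:
    """SHA256 over the sorted relative paths and sizes of files in source_dir."""
    digest = hashlib.sha256()
    for path in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(source_dir).as_posix()
        # Only path and size are hashed, mirroring the commit message;
        # file contents are deliberately not read.
        digest.update(f"{rel}\0{path.stat().st_size}\n".encode("utf-8"))
    return digest.hexdigest()


def indices_to_rebuild(sources: dict[str, Path], manifest_path: Path) -> list[str]:
    """Return index names whose fingerprint differs from the stored manifest."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    return [
        name for name, src in sources.items()
        if old.get(name) != fingerprint_sources(src)
    ]


def write_manifest(sources: dict[str, Path], manifest_path: Path) -> None:
    """Record the current fingerprint of every index source directory."""
    manifest = {name: fingerprint_sources(src) for name, src in sources.items()}
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

One trade-off of hashing paths and sizes rather than contents is speed at the cost of precision: an edit that leaves a file's size unchanged would not trigger a rebuild under this scheme.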
Summary
docker-eval CI was taking ~60 min because all 6 FAISS vector indices were rebuilt from scratch at container startup on every run. With HuggingFace CPU inference over the full corpus this is inherently slow; with Google Gemini embeddings, quota exhaustion triggers a min=60 s exponential backoff that compounds across hundreds of batches.

The indices are now pre-built at Docker build time with HF embeddings and FAST_MODE=true. At container startup HybridRetrieverChain.create_hybrid_retriever() detects faiss_db/<name> on disk and takes the fast load_db() path, dropping graph init from ~60 min to a few seconds.

Changes
| File | Change |
| --- | --- |
| backend/scripts/build_faiss.py | Runs RetrieverTools.initialize() with contextual_rerank=False (no cross-encoder download) at build time; saves all 6 indices to faiss_db/ |
| backend/Dockerfile | Adds RUN … build_faiss.py after dataset clone; sets ENV EMBEDDINGS_TYPE=HF / ENV HF_EMBEDDINGS=thenlper/gte-large as container defaults |
| backend/src/agents/retriever_tools.py | Adds contextual_rerank: bool = True param to initialize() (backward-compatible default); threads it through to all 6 HybridRetrieverChain calls |

How the volume interaction works
docker-compose.yml mounts faiss_data:/ORAssistant-backend/faiss_db. When the volume is empty (first run after docker compose down --volumes), Docker automatically copies the pre-built indices from the image into the volume. On subsequent starts the volume already has the indices. Either way the container finds faiss_db/<name> at startup.

Required CI-side change
Ensure backend/.env (written by ci-secret.yaml) sets EMBEDDINGS_TYPE=HF. The runtime embedding model must match the build-time model: using GOOGLE_GENAI at runtime against HF-built indices causes a vector dimension mismatch.

Test plan
- uv run pytest tests/test_retriever_tools.py passes (24/24 ✓ locally)
- docker compose build completes and faiss_db/ is populated inside the image
- curl http://localhost:8000/conversations/ready returns {"status":"ready"} within 2–3 polling iterations (~20–30 s) after docker compose up
- docker-eval CI job finishes in < 20 min total
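The readiness check in the test plan could be automated with a small poller along these lines. This is only a sketch: the 10 s interval is inferred from "2–3 polling iterations (~20–30 s)" and is an assumption, as are the helper names `fetch_status` and `wait_until_ready`; the URL and the `{"status":"ready"}` body are the ones quoted in the test plan.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the "status" field of the JSON body, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("status")
    except (OSError, ValueError, AttributeError):
        # OSError covers connection errors (URLError is a subclass);
        # ValueError covers invalid JSON; AttributeError a non-object body.
        return None


def wait_until_ready(
    url: str,
    interval_s: float = 10.0,  # assumed polling interval
    max_polls: int = 30,
    fetch: Callable[[str], Optional[str]] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll url until its status is "ready"; return the number of polls used."""
    for attempt in range(1, max_polls + 1):
        if fetch(url) == "ready":
            return attempt
        if attempt < max_polls:
            sleep(interval_s)
    raise TimeoutError(f"{url} not ready after {max_polls} polls")
```

A CI step might call `wait_until_ready("http://localhost:8000/conversations/ready")` after `docker compose up` and fail the job if the returned poll count is higher than expected. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running container.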