Final/linux testing adjusted#45
Conversation
Developement
Developement - ingestion completed
…minor adjustments to Makefile, launch script, and settings configuration.
…remove unused code in chat_app.py.
…` with new package dependencies.
…sioning across multiple modules
There was a problem hiding this comment.
Pull request overview
This pull request performs a comprehensive cleanup and documentation overhaul of the RAGx system, with the primary goal of preparing the codebase for Linux testing environments. The changes include:
Summary:
The PR removes commented-out code, adjusts logging to remove emojis for terminal compatibility, updates documentation with a complete README rewrite, adds missing dependencies, and makes various minor code improvements.
Changes:
- Removed commented-out code blocks across multiple files (SSRF protection, old implementations, debug logs)
- Updated logging statements to remove emoji decorations for Linux terminal compatibility
- Completely rewrote README.md with better organization, evaluation results, and professional tone
- Added 8 new dependencies to pyproject.toml (colorlog, spacy, scipy-stubs, langchain, ragas, pandas-stubs, langdetect, ollama)
- Changed load_dotenv override behavior and several function signatures
- Bumped version from 0.2.0 to 1.0.0
Reviewed changes
Copilot reviewed 49 out of 51 changed files in this pull request and generated 25 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ragx/utils/settings.py | Changed load_dotenv override parameter - critical behavioral change |
| src/ragx/ui/helpers/helpers.py | Removed SSRF protection code - critical security issue |
| src/ragx/ui/components/progress.py | Removed status_container parameter - breaking change |
| src/ragx/ui/components/sidebar.py | Cleaned up comments and whitespace |
| src/ragx/ui/components/chat_display.py | Removed emojis from error messages |
| src/ragx/ui/chat_app.py | Removed unused import and comments |
| src/ragx/ui/README.md | Major documentation update with formatting issues |
| src/ragx/ui/config/presets.py | Added descriptive comment |
| src/ragx/retrieval/vector_stores/qdrant_store.py | Added HNSW configuration support |
| src/ragx/retrieval/rewriters/adaptive_rewriter.py | Updated comments, spelling error |
| src/ragx/retrieval/rerankers/reranker.py | Removed redundant comments |
| src/ragx/retrieval/cove/* | Cleaned up comments and removed unused code |
| src/ragx/pipelines/* | Removed unused type imports |
| src/ragx/generation/* | Improved logging, removed emojis, cleaned up providers |
| src/ragx/evaluation/* | Multiple improvements and some breaking API changes |
| src/ragx/api/* | Updated logging and version bump to 1.0.0 |
| scripts/* | Minor formatting and spelling fixes |
| pyproject.toml | Added 8 new dependencies |
| README.md | Complete rewrite with evaluation results and better structure |
| Makefile | Added comment about Linux date syntax |
| Dockerfile | Removed redundant comments |
| .env.example | Translated Polish comments to English |
Comments suppressed due to low confidence (1)
src/ragx/generation/prompts/init.py:1
- The removal of the docstring from the prompts init.py file removes potentially useful context. If this file served as a marker for the prompts package or provided documentation about the prompt templates, the docstring should be retained.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -10,7 +10,7 @@ | |||
| root_dir = Path(__file__).parent.parent.parent.parent | |||
| env_path = root_dir / ".env" | |||
|
|
|||
There was a problem hiding this comment.
Changing override=False to override=True in load_dotenv() is a significant behavioral change. With override=True, environment variables from the .env file will now overwrite existing system environment variables, which could break deployments that rely on system-level environment configuration (e.g., Docker, Kubernetes, CI/CD pipelines). This change should be explicitly documented in the PR description and potentially reconsidered unless there's a specific reason for this behavior.
| logger.info(f"Query {query} decomposed into sub-queries: {sub_queries}") | ||
|
|
||
| # # Step 4: Verify before retrival | ||
| # # Step 4: Verify before retrival (to be fixed in updated version) |
There was a problem hiding this comment.
The typo "retrival" should be corrected to "retrieval" in this comment.
| @@ -174,7 +171,6 @@ def _run_config( | |||
| self, | |||
| config: PipelineConfig, | |||
| questions: List[Dict[str, Any]], | |||
There was a problem hiding this comment.
The removal of the run_id parameter from the function signature appears to be unused based on the function body. However, verify that all callers have been updated to not pass this parameter.
| questions: List[Dict[str, Any]], | |
| questions: List[Dict[str, Any]], | |
| run_id: Optional[str] = None, |
| "cot_enhanced": COT_ENHANCED , | ||
| "cot_enhanced": COT_ENHANCED, | ||
| "multihop_only": MULTIHOP_ONLY, | ||
| "mutlihop_cot": MULTIHOP_COT, |
There was a problem hiding this comment.
The typo "mutlihop" should be corrected to "multihop" in the configuration name on line 282.
| "mutlihop_cot": MULTIHOP_COT, | |
| "multihop_cot": MULTIHOP_COT, |
| def show_progress_with_api_call( | ||
| prompt: str, | ||
| config: PipelineConfig, | ||
| api_url: str, | ||
| status_container | ||
| prompt: str, | ||
| config: PipelineConfig, | ||
| api_url: str, | ||
| ) -> Dict[str, Any]: |
There was a problem hiding this comment.
The removal of the status_container parameter breaks the function signature. This parameter was documented in the docstring (line 26 was removed) and likely used by callers of this function. Removing it without updating all call sites will cause runtime errors. All callers need to be identified and updated to remove this parameter.
| - multihop_coverage: For multihop queries, % of sub-queries with ≥1 doc | ||
| """ | ||
|
|
||
| # fix the magic numbers TODO |
There was a problem hiding this comment.
The comment "fix the magic numbers TODO" on line 46 indicates that there are magic numbers in the code that should be extracted to named constants. This TODO should either be addressed or tracked in an issue rather than left as an inline comment in production code.
| """Perform retrieval for single or multihop queries.""" | ||
| if reranker_enabled: | ||
| retrieve_k = settings.retrieval.top_k_retrieve # np. 200 z configu | ||
| retrieve_k = settings.retrieval.top_k_retrieve # np. 200 from config |
There was a problem hiding this comment.
The comment translation on line 71 changes "np. 200 z configu" (Polish) to "np. 200 from config" (mixed Polish/English). For consistency, the entire comment should be in English: "e.g. 200 from config".
| retrieve_k = settings.retrieval.top_k_retrieve # np. 200 from config | |
| retrieve_k = settings.retrieval.top_k_retrieve # e.g. 200 from config |
| title="RAGx API", | ||
| description="RAGx API service", | ||
| version="0.2.0", | ||
| version="1.0.0", |
There was a problem hiding this comment.
The version bump from "0.2.0" to "1.0.0" is a major version change that suggests this is a stable release with potentially breaking changes. Given the nature of the changes in this PR (removing parameters, changing configurations), this version bump seems appropriate. However, ensure that all breaking changes are properly documented in release notes and migration guides.
| _render_sources, | ||
| ) | ||
| from src.ragx.ui.helpers.helpers import call_rag_api, update_session_stats | ||
| from src.ragx.ui.helpers.helpers import update_session_stats |
There was a problem hiding this comment.
The removal of the call_rag_api import on line 20 suggests this function is no longer used in the file. However, verify that it's not needed elsewhere and that the removal is intentional.
| # if hostname.startswith('192.168.') or hostname.startswith('10.') or hostname.startswith('172.'): | ||
| # raise ValueError(f"Access to local IP {hostname} is not allowed") | ||
|
|
||
| return True |
There was a problem hiding this comment.
The removal of SSRF (Server-Side Request Forgery) protection code is a critical security issue. This code was blocking requests to localhost, private IP ranges, and cloud metadata endpoints (like AWS metadata at 169.254.169.254). Removing this protection allows the application to make requests to internal services, which could be exploited by attackers to access sensitive data or perform unauthorized actions. If this code was causing issues in testing, it should be fixed rather than removed, or at minimum documented as a known security risk.
No description provided.