Releases: HKUDS/RAG-Anything
v1.3.1
What's Changed
- Fix duplicate detection for text-bearing `insert_content_list` calls by deferring `doc_status` creation until after LightRAG `ainsert` runs.
- Apply the same deferral to `process_document_complete` for parsed documents with text.
- Keep early `doc_status` creation for multimodal-only content that does not call `ainsert`.
- Add regression coverage for content-list and parsed-document duplicate handling.
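To make the ordering change concrete, here is a minimal sketch of one plausible failure mode. It assumes the text insert skips any document ID that already has a `doc_status` record. `ToyStore`, `insert_text` and `insert_content` are hypothetical stand-ins, not RAG-Anything or LightRAG APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for a doc_status table plus a text index."""

    doc_status: dict[str, str] = field(default_factory=dict)
    texts: dict[str, str] = field(default_factory=dict)

    def insert_text(self, doc_id: str, text: str) -> bool:
        # Mimics an ainsert-style duplicate check: an ID that already has a
        # status record is treated as "already processed" and skipped.
        if doc_id in self.doc_status:
            return False
        self.texts[doc_id] = text
        self.doc_status[doc_id] = "processed"
        return True


def insert_content(store: ToyStore, doc_id: str, text: str | None, defer_status: bool) -> bool:
    """Return True if the text was indexed (hypothetical helper)."""
    if not defer_status:
        # Old ordering: the status record is created first, so the text
        # insert sees it and wrongly treats the document as a duplicate.
        store.doc_status.setdefault(doc_id, "pending")
    if text:
        return store.insert_text(doc_id, text)
    # Multimodal-only content never reaches insert_text, so its status
    # record has to be created on this path instead.
    store.doc_status.setdefault(doc_id, "processed")
    return False
```

With `defer_status=True`, a first-time insert succeeds and a repeated call with the same ID is still rejected, while content without text still gets a status record even though it never reaches the text insert.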
Validation
- `PYTHONPATH=. .venv/bin/python -m pytest tests/test_insert_content_list.py tests/test_parser_wiring.py tests/test_parser_kwargs.py -q`
- `uvx ruff check raganything/processor.py tests/test_insert_content_list.py raganything/__init__.py`
- `.venv/bin/python -m compileall raganything/processor.py tests/test_insert_content_list.py raganything/__init__.py`
v1.3.0
What's Changed
⚠️ Behavior changes worth noting
- `DoclingParser` now uses the Docling Python API instead of shelling out to the `docling` CLI. This means:
  - You now need `pip install docling` to use it (the `docling` executable on PATH alone is no longer sufficient).
  - The `env={...}` kwarg on `DoclingParser` parse methods is still accepted for compatibility but is now ignored; set the relevant environment variables in the parent process or pass `_get_converter` kwargs (`artifacts_path`, `table_mode`, …).
  - `<file_stem>.json` and `<file_stem>.md` artifacts written under `<output_dir>/<file_stem>/docling/` are still produced, but via `export_to_dict()`/`export_to_markdown()` rather than the CLI serializer; the logical content is the same, but the files are not byte-identical.
  - `check_installation()` now tests Python importability rather than probing the CLI on PATH.
- `MineruParser` subprocess calls now run with a default timeout (configurable) and raise `TimeoutError` instead of hanging indefinitely.
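A subprocess timeout of this kind can be sketched with the standard library alone. The helper name `run_with_timeout` and the 600-second default are assumptions for illustration; `MineruParser`'s actual parameter name and default are not stated here. Note that `subprocess.run` raises `subprocess.TimeoutExpired`, which is not a subclass of the built-in `TimeoutError`, so the sketch converts one into the other.

```python
from __future__ import annotations

import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run cmd and raise TimeoutError instead of hanging indefinitely.

    Hypothetical helper; the default timeout is illustrative only.
    """
    try:
        return subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed and reaped the child here.
        raise TimeoutError(f"{cmd[0]} did not finish within {timeout} s") from exc


if __name__ == "__main__":
    result = run_with_timeout([sys.executable, "-c", "print('parsed')"], timeout=30)
    print(result.stdout.strip())
```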
✨ New features
- feat(parser): add remote URL support for DoclingParser by @bueno12223 in #195
- feat(omml): add OMML equation extraction utility for DOCX documents (closes #259) by @Abdeltoto in #262
- feat: add MiniMax provider support by @octo-patch in #264
- feat(examples): make LLM and vision model names configurable via env vars by @zhangzhenfei in #231
- feat: add Ollama integration example (closes #118) by @jwchmodx in #238
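Reading model names from the environment with in-code fallbacks is a common pattern for this kind of example configuration. The variable names `LLM_MODEL` and `VISION_MODEL` and the default model strings below are hypothetical; check the updated example scripts for the names they actually read.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical defaults, used only when the variables are unset or empty.
DEFAULT_LLM_MODEL = "gpt-4o-mini"
DEFAULT_VISION_MODEL = "gpt-4o"


def resolve_model_names(env: Mapping[str, str] | None = None) -> tuple[str, str]:
    """Return (llm_model, vision_model), preferring environment overrides."""
    env = os.environ if env is None else env
    # `or` also treats an empty value (e.g. `LLM_MODEL=`) as unset.
    llm_model = env.get("LLM_MODEL") or DEFAULT_LLM_MODEL
    vision_model = env.get("VISION_MODEL") or DEFAULT_VISION_MODEL
    return llm_model, vision_model
```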
🛠 Refactor / performance
- refactor(parser): replace Docling CLI subprocess with Python API (closes #222) by @Abdeltoto in #261
🐛 Bug fixes
- fix: create doc_status even when LightRAG lacks multimodal insert args (closes #244) by @DeepaliPaspule in #255
- fix: prevent crashes from uninitialized LightRAG, env-var stripping, and parser cleanup by @jwchmodx in #240
- fix: add timeout parameter to MinerU subprocess to prevent indefinite hang (#172) by @peterCheng123321 in #254
- fix: pass entity_chunks_storage and relation_chunks_storage to all merge_nodes_and_edges calls (#241) by @peterCheng123321 in #250
- fix: handle messages= kwarg in vision_model_func (insert_content_list_example) (#28) by @peterCheng123321 in #252
- fix: forward system_prompt parameter in aquery_with_multimodal (#257) by @kuishou68 in #258
- fix(examples): preserve embedding kwargs with partial by @txhno in #263
- fix: demote misleading LibreOffice 'not found' warning to debug (closes #230) by @jwchmodx in #237
- fix: strip `<think>` tags from modal processor fallback responses (closes #159) by @jwchmodx in #236
- fix: create example log directory correctly by @haosenwang1018 in #242
- fix(init): remove duplicate `__all__` assignment (#267) by @kuishou68 in #268
- fix: improve PDF parser handling by @davidangularme in #243
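Some reasoning models prefix their answer with a `<think>…</think>` block, which breaks naive JSON or text parsing in a fallback path. A minimal regex-based sketch of the stripping step follows; the function name is hypothetical, and this is not necessarily how #236 implements it.

```python
import re

# A complete <think>...</think> block; DOTALL lets it span newlines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# An opening tag with no closing tag, e.g. from a truncated response.
_UNCLOSED_THINK = re.compile(r"<think>.*\Z", re.DOTALL | re.IGNORECASE)


def strip_think_tags(response: str) -> str:
    """Remove reasoning blocks so only the model's answer remains."""
    cleaned = _THINK_BLOCK.sub("", response)
    cleaned = _UNCLOSED_THINK.sub("", cleaned)
    return cleaned.strip()
```

The non-greedy `.*?` removes each block separately, so text sitting between two blocks is kept.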
New Contributors
- @jwchmodx made their first contribution in #237
- @davidangularme made their first contribution in #243
- @zhangzhenfei made their first contribution in #231
- @bueno12223 made their first contribution in #195
- @kuishou68 made their first contribution in #268
- @peterCheng123321 made their first contribution in #250
- @txhno made their first contribution in #263
- @octo-patch made their first contribution in #264
- @Abdeltoto made their first contribution in #261
- @DeepaliPaspule made their first contribution in #255
Full Changelog: v1.2.10...v1.3.0
v1.2.10
What's Changed
- fix: use a single docling command for json and md formats by @wkpark in #198
- fix: normalize MinerU 2.0 field names for backward compatibility (#89) by @teamauresta in #202
- feat: add vLLM backend integration by @teamauresta in #201
- Fix potential path traversal and local file read vulnerabilities by @RinZ27 in #197
- feat(parser): add optional PaddleOCR backend by @SaqlainXoas in #199
- fix: prevent same-name file collision in parser output directories (#51) by @teamauresta in #203
- feat: add get_version() helper by @haosenwang1018 in #214
- feat: support environment variables in parsers by @wkpark in #210
- chore: export new public APIs in `__init__.py` by @Jah-yee in #219
- test: expand coverage for core config, utils, and batch parser by @Jah-yee in #218
- feat: add custom parser plugin system (closes #151) by @Jah-yee in #215
- feat: add processing events and callbacks system by @Jah-yee in #217
- fix(examples): use openai_embed.func to prevent double EmbeddingFunc wrapping by @syshin0116 in #223
- fix: use valid CID font names for Chinese text rendering (fixes #24) by @Exploreunive in #226
- fix: preserve full_entities metadata when adding multimodal entities by @Exploreunive in #228
- fix: handle closed event loop in close() to eliminate atexit warning (fixes #135) by @Exploreunive in #225
- feat: add retry and circuit breaker utilities for LLM calls (mitigates #172) by @Jah-yee in #216
- feat: add multilingual prompt template support (closes #85) by @Jah-yee in #220
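Retry with exponential backoff and a circuit breaker are complementary: retries absorb short transient failures, while the breaker stops sending calls to a provider that keeps failing. The sketch below shows both patterns in their textbook form; `retry`, `CircuitBreaker`, `CircuitOpenError` and their parameters are hypothetical and do not describe the utilities added in #216.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Minimal consecutive-failure circuit breaker (hypothetical sketch)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Half-open: allow a trial call; one more failure reopens it.
            self._opened_at = None
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit fully
        return result


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying with exponential backoff (0.5 s, 1 s, 2 s, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying against an open circuit only wastes time
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The two would typically be composed as `retry(lambda: breaker.call(call_llm))`, where `call_llm` is a placeholder for the actual request. The injectable `clock` and `sleep` arguments are there so the sketch can be tested without real waiting.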
New Contributors
- @wkpark made their first contribution in #198
- @teamauresta made their first contribution in #202
- @RinZ27 made their first contribution in #197
- @SaqlainXoas made their first contribution in #199
- @haosenwang1018 made their first contribution in #214
- @Jah-yee made their first contribution in #219
- @syshin0116 made their first contribution in #223
- @Exploreunive made their first contribution in #226
Full Changelog: v1.2.9...v1.2.10
v1.2.9
What's Changed
- feat: RAG-Anything runs offline by @LaansDole in #122
- Fix status comparison case mismatch in processor.py by @yrangana in #142
- [Doc]: fixing typos in a couple of tiles by @didier-durand in #162
- Feat query add system prompt by @EightyOliveira in #166
- Enhance the `process_folder_complete` function by @ikmak in #165
- Allow configuring full-path in file_path fields in RAG by @hanlianlu in #171
- using parser logger instead of logging by @ikmak in #180
- fix: resolve multiple bugs in RAGAnything initialization and parsing by @majiayu000 in #182
- Feature: batch dry run by @adevol in #185
- Fix #186: MinerU 2.7.0+ caused filepath errors with new default backend by @hanlianlu in #188
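A case-insensitive status comparison of the kind the #142 title suggests can be sketched as follows. The `DocStatus` enum and `is_processed` helper are hypothetical; the point is that stored status values may differ in case or type from the enum members they are compared against.

```python
from enum import Enum


class DocStatus(str, Enum):
    """Hypothetical status values, for illustration only."""

    PENDING = "pending"
    PROCESSED = "processed"
    FAILED = "failed"


def is_processed(raw_status: object) -> bool:
    """True if raw_status means PROCESSED, whatever its case or type."""
    if isinstance(raw_status, DocStatus):
        # str() of a str-mixin Enum member returns e.g. "DocStatus.PROCESSED"
        # rather than its value, so compare members directly.
        return raw_status is DocStatus.PROCESSED
    # Plain strings may arrive as "PROCESSED", "Processed", " processed ", ...
    return str(raw_status).strip().lower() == DocStatus.PROCESSED.value
```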
New Contributors
- @yrangana made their first contribution in #142
- @didier-durand made their first contribution in #162
- @EightyOliveira made their first contribution in #166
- @ikmak made their first contribution in #165
- @hanlianlu made their first contribution in #171
- @majiayu000 made their first contribution in #182
- @adevol made their first contribution in #185
Full Changelog: v1.2.8...v1.2.9
v1.2.8
What's Changed
- Add RAGAnything processing to LightRAG's webui by @hzywhite in #97
- Add RAGAnything processing to LightRAG's webui by @hzywhite in #113
- fix: replace `__del__` with atexit to fix RAGAnything cleanup warning by @liz-in-tech in #106
- feat: Add support for Chinese characters in PDF generation by @hongdongjian in #103
- Feat: LM Studio integration example and uv implementation by @LaansDole in #99
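Registering cleanup with `atexit` instead of relying on a `__del__` method is a common way to avoid shutdown-time warnings. A minimal sketch with a hypothetical `Resource` class, not RAGAnything's actual cleanup code:

```python
import atexit


class Resource:
    """Hypothetical object that owns something needing explicit cleanup."""

    def __init__(self) -> None:
        self.closed = False
        # Run close() at normal interpreter exit instead of relying on
        # __del__, which is not guaranteed to run for objects still alive
        # when the interpreter shuts down.
        atexit.register(self.close)

    def close(self) -> None:
        """Release resources; safe to call more than once."""
        if self.closed:
            return
        self.closed = True
        # Explicitly closed, so the exit-time hook is no longer needed.
        atexit.unregister(self.close)
```

One trade-off: the registered bound method keeps the object alive until exit unless `close()` is called explicitly.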
New Contributors
- @hzywhite made their first contribution in #97
- @liz-in-tech made their first contribution in #106
- @hongdongjian made their first contribution in #103
- @LaansDole made their first contribution in #99
Full Changelog: v1.2.7...v1.2.8
v1.2.7
v1.2.6
What's Changed
- Update .gitignore to include AI-related files and directories by @BenjaminX in #68
- Add Batch Processing and Enhanced Markdown Features by @ShorthillsAI in #64
New Contributors
- @BenjaminX made their first contribution in #68
- @ShorthillsAI made their first contribution in #64
Full Changelog: v1.2.5...v1.2.6