* remove title for slack
* initial working code
* simplification
* improvements
* name change to information_content_model
* avoid boost_score > 1.0
* nit
* EL comments and improvements
Improvements:
- proper import of information content model from cache or HF
- warm up for information content model
Other:
- EL PR review comments
* nit
* requirements version update
* fixed Dockerfile
* new home for model_server configs
* default off
* small updates
* YS comments - pt 1
* renaming to chunk_boost & chunk table def
* saving and deleting chunk stats in new table
* saving and updating chunk stats
* improved dict score update
* create columns for individual boost factors
* RK comments
* Update migration
* manual import reordering
* add timings for syncing
* add more logging
* more debugging
* refactor multipass/db check out of VespaIndex
* circular imports?
* more debugging
* add logs
* various improvements
* additional logs to narrow down issue
* use global httpx pool for the main vespa flows in celery. Use in more places eventually.
* cleanup debug logging, etc
* remove debug logging
* this should use the secondary index
* mypy
* missed some logging
* review fixes
* refactor get_default_document_index to use search settings
* more missed logging
* fix circular refs
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: pablodanswer <pablo@danswer.ai>
* Add support for filtering 0xFDD0-0xFDEF Unicode range
- Update remove_invalid_unicode_chars to handle 0xFDD0-0xFDEF range
- Add comprehensive test cases for Unicode character sanitization
- Fix issue with illegal code point 0xFDDB in Vespa indexing
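
A minimal sketch of the kind of filtering described above, assuming a regex-based approach. The helper name `strip_noncharacter_block` is hypothetical and is not the project's `remove_invalid_unicode_chars`; it covers only the U+FDD0..U+FDEF noncharacter range and ignores any other characters the real function may strip.

```python
import re

# Hypothetical helper for illustration only; not the project's implementation.
# U+FDD0..U+FDEF is a contiguous block of Unicode noncharacters, which
# includes the U+FDDB code point mentioned in the commit.
_NONCHARACTER_BLOCK_RE = re.compile("[\uFDD0-\uFDEF]")


def strip_noncharacter_block(text: str) -> str:
    """Remove every code point in the U+FDD0..U+FDEF range from text."""
    return _NONCHARACTER_BLOCK_RE.sub("", text)
```

Characters just outside the range (U+FDCF and U+FDF0) are left untouched, so legitimate Arabic presentation forms adjacent to the block survive.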
Co-Authored-By: Chris Weaver <chris@onyx.app>
* Remove unused pytest import
Co-Authored-By: Chris Weaver <chris@onyx.app>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Chris Weaver <chris@onyx.app>
* temporarily disabling validate indexing fences
* add back a few startup checks in the cloud
* use common vespa client to perform health check
* log vespa url and try using http1 on light worker index methods
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>