* fix acl prefixing
* increase timeout a tad
* block access to init'ing DocumentAccess directly, fix test to work with ee/MIT
* fix env var checks
---------
Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
* remove title for slack
* initial working code
* simplification
* improvements
* name change to information_content_model
* avoid boost_score > 1.0
* nit
* EL comments and improvements
Improvements:
- proper import of information content model from cache or HF
- warm up for information content model
Other:
- EL PR review comments
* nit
* requirements version update
* fixed docker file
* new home for model_server configs
* default off
* small updates
* YS comments - pt 1
* renaming to chunk_boost & chunk table def
* saving and deleting chunk stats in new table
* saving and updating chunk stats
* improved dict score update
* create columns for individual boost factors
* RK comments
* Update migration
* manual import reordering
* early work in progress
* rename utility script
* move actual data seeding to a shareable function
* add test
* make the test pass with the fix
* fix comment
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* wip checkpointing/continue on failure
more stuff for checkpointing
Basic implementation
FE stuff
More checkpointing/failure handling
rebase
rebase
initial scaffolding for IT
IT to test checkpointing
Cleanup
cleanup
Fix it
Rebase
Add todo
Fix actions IT
Test more
Pagination + fixes + cleanup
Fix IT networking
fix it
* rebase
* Address misc comments
* Address comments
* Remove unused router
* rebase
* Fix mypy
* Fixes
* fix it
* Fix tests
* Add drop index
* Add retries
* reset lock timeout
* Try hard drop of schema
* Add timeout/retries to downgrade
* rebase
* test
* test
* test
* Close all connections
* test closing idle only
* Fix it
* fix
* try using null pool
* Test
* fix
* rebase
* log
* Fix
* apply null pool
* Fix other test
* Fix quality checks
* Test not using the fixture
* Fix ordering
* fix test
* Change pooling behavior
* add timings for syncing
* add more logging
* more debugging
* refactor multipass/db check out of VespaIndex
* circular imports?
* more debugging
* add logs
* various improvements
* additional logs to narrow down issue
* use global httpx pool for the main vespa flows in celery. Use in more places eventually.
* cleanup debug logging, etc
* remove debug logging
* this should use the secondary index
* mypy
* missed some logging
* review fixes
* refactor get_default_document_index to use search settings
* more missed logging
* fix circular refs
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: pablodanswer <pablo@danswer.ai>
* prototype tools for handling prod issues
* add some commands
* add batching and dry run options
* custom redis tool
* comment
* default to app config settings for redis
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
* doc_sync is refactored
* maybe this works
* tested to work!
* mypy fixes
* enabled integration tests
* fixed the test
* added external group sync
* testing should work now
* mypy
* confluence doc id fix
* got group sync working
* addressed feedback
* renamed some vars and fixed mypy
* conf fix?
* added wiki handling to confluence connector
* test fixes
* revert google drive connector
* fixed groups
* hotfix
* fresh indexing feature branch
* cherry pick test
* Revert "cherry pick test"
This reverts commit 2a624220687affdda3de347e30f2011136f64bda.
* set multitenant so that vespa fields match when indexing
* cleanup pass
* mypy
* pass through env var to control celery indexing concurrency
* comments on task kickoff and some logging improvements
* disentangle configuration for different workers and beats.
* use get_session_with_tenant
* comment out all of update.py
* rename to RedisConnectorIndexingFenceData
* first check num_indexing_workers
* refactor RedisConnectorIndexingFenceData
* comment out on_worker_process_init
* missed a file
* scope db sessions to short lengths
* update launch.json template
* fix types
* code review