* Allow curators to create permission-synced connectors
* removed editing of permission-synced connectors for curators
* updated tests to use access type instead of is_public
* update copy
* cloud auth referral source
* minor clarity
* k
* minor modification to be best practice
* typing
* Update ReferralSourceSelector.tsx
* Update ReferralSourceSelector.tsx
---------
Co-authored-by: hagen-danswer <hagen@danswer.ai>
* doc_sync is refactored
* maybe this works
* tested to work!
* mypy fixes
* enabled integration tests
* fixed the test
* added external group sync
* testing should work now
* mypy
* confluence doc id fix
* got group sync working
* addressed feedback
* renamed some vars and fixed mypy
* conf fix?
* added wiki handling to confluence connector
* test fixes
* revert google drive connector
* fixed groups
* hotfix
* add provisioning on data plane
* functional but scrappy
* minor cleanup
* minor clean up
* k
* simplify
* update provisioning
* improve import logic
* ensure proper conditional
* minor pydantic update
* minor config update
* nit
* refactor RedisConnectorDeletion into RedisConnector
* refactor redis stop and deletion
* port pruning
* nest pruning
* port deletion
* port indexing
* refactor into individual files
* refactor redis connector index to take search settings at init
* move back to debug level log
* refactor doc set and user group (mostly)
* mypy fixes
* refactoring changes
* everything working for service account
* works with service account
* combined scopes
* copy change
* oauth prep
* Works for oauth and service account credentials
* mypy
* merge fixes
* Refactor Google Drive connector
* finished backend
* auth changes
* if it's stupid but it works, it's not stupid
* npm run dev fixes
* addressed change requests
* string fix
* minor fixes and cleanup
* spacing cleanup
* Update connector.py
* everything done
* testing!
* Delete backend/tests/daily/connectors/google_drive/file_generator.py
* cleaned up
---------
Co-authored-by: Chris Weaver <25087905+Weves@users.noreply.github.com>
* check for index swap
* initial bones
* kk
* k
* k:
* nit
* nit
* rebase + update
* nit
* minor update
* k
* minor integration test fixes
* nit
* ensure we build test docker image
* remove one space
* k
* ensure we wipe volumes
* remove log
* typo
* nit
* k
* k
* fresh indexing feature branch
* cherry pick test
* Revert "cherry pick test"
This reverts commit 2a624220687affdda3de347e30f2011136f64bda.
* set multitenant so that vespa fields match when indexing
* cleanup pass
* mypy
* pass through env var to control celery indexing concurrency
* comments on task kickoff and some logging improvements
* disentangle configuration for different workers and beats.
* use get_session_with_tenant
* comment out all of update.py
* rename to RedisConnectorIndexingFenceData
* first check num_indexing_workers
* refactor RedisConnectorIndexingFenceData
* comment out on_worker_process_init
* missed a file
* scope db sessions to short lengths
* update launch.json template
* fix types
* code review
* fix where num_indexing_workers falls back
* remove extra brace
* use native rate limiting in the confluence client
* upgrade urllib3 to v2.2.3 to support retries in confluence client
* improve logging so that progress is visible.
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
* Add celery redis helper. Used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow syncs with no tasks to run, because we create certain objects with no entries that are initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* multiple celery workers
* update logs as well and set prefetch multipliers appropriate to the worker intent
* add db refresh to connector deletion
* add some preliminary locking
* organize tasks into separate files
* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
* add multi workers to dev_run_background_jobs.py
* update supervisord with some recommended settings for celery
* name celery workers and shorten dev script prefixing
* add configurable SQLAlchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)
* fix comments
* autoscale sqlalchemy pool size to celery concurrency (allow override later?)
* supervisord needs the percent symbols escaped
* use name as primary check, some minor refactoring and type hinting too.
* stash merge (may not function yet)
* remove dead code
* more cleanup
* remove dead file
* we shouldn't be checking for deletion attempts in the db any more
* print cc_pair_id
* print status on status mismatch again
* add logging when cc_pair isn't present
* don't index any ingestion-type connectors, and don't pause any connectors that aren't active
* add more specific check for deletion completion
* remove flaky mediawiki test site
* move is_pruning
* remove unused code
* remove old function
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* add tenant provisioning to data plane
* minor typing update
* ensure tenant router included
* proper auth check
* update disabling logic
* validated basic provisioning
* use new kv store