* Update mode to be a default parameter in `FileStore.read`
* Move query history exporting process to be a background job instead
* Move hardcoded report-file-naming to a common utility function
* Add type annotations
* Update download component
* Implement button to re-ping and download CSV file; fix up some backend file-checking logic
* De-indent logic (w/ early return)
* Return different error codes depending on the type of task status
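A minimal sketch of what mapping task status to response codes could look like, assuming a FastAPI endpoint; the route path, `TaskStatus` enum, and in-memory status store are placeholders, not the repo's actual code:

```python
from enum import Enum
from fastapi import APIRouter, HTTPException
from fastapi.responses import JSONResponse

router = APIRouter()

class TaskStatus(str, Enum):  # hypothetical status enum
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"
    SUCCESS = "success"

# stand-in for real task tracking; the actual code persists this elsewhere
_TASK_STATUS: dict[str, TaskStatus] = {}

@router.get("/query-history-export/{task_id}")
def export_status(task_id: str) -> JSONResponse:
    status = _TASK_STATUS.get(task_id)
    if status is None:
        raise HTTPException(status_code=404, detail="No such export task")
    if status == TaskStatus.FAILED:
        raise HTTPException(status_code=500, detail="Export task failed")
    if status in (TaskStatus.PENDING, TaskStatus.RUNNING):
        # not an error, just not finished -- the frontend keeps re-pinging
        return JSONResponse(status_code=202, content={"status": status.value})
    return JSONResponse(status_code=200, content={"status": status.value})
```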
* Add more resilient failure-retry mechanisms
* Remove default parameter in helper function
* Use popup for error messaging
* Update return code
* Update web/src/app/ee/admin/performance/query-history/DownloadAsCSV.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add type to useState call
* Update backend/ee/onyx/server/query_history/api.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/file_store/file_store.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/ee/onyx/background/celery/apps/primary.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Move rerender call to after check
* Run formatter
* Add type conversions back (smh greptile)
* Remove duplicated call to save_file
* Move non-fallible logic out of try-except block
* Pass date-ranges into API call
* Convert to ISO strings before passing them into the API call
* Add API to list all tasks
* Create new pydantic model to represent tasks to return instead
* Change helper to only fetch query-history tasks
* Use `shared_tasks` instead of old method
* Address more comments from PR; consolidate how task name is generated
* Mark task as failed if any exception is raised
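Assuming `shared_tasks` above refers to Celery's `shared_task` decorator, a hedged sketch of a background export task that marks itself failed on any exception; the task name, status store, and CSV contents are placeholders:

```python
import csv
import io
from celery import shared_task

# naive in-memory status store; the real code presumably persists this elsewhere
_TASK_STATUS: dict[str, str] = {}

@shared_task(name="export_query_history", bind=True)
def export_query_history(self, start_iso: str, end_iso: str) -> str:
    task_id = self.request.id
    _TASK_STATUS[task_id] = "running"
    try:
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["query", "user", "time"])  # placeholder header row
        # ... the real task would stream query-history rows between start/end here ...
        _TASK_STATUS[task_id] = "success"
        return buf.getvalue()
    except Exception:
        # any exception marks the task as failed before propagating to Celery
        _TASK_STATUS[task_id] = "failed"
        raise
```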
* Change the task object which is returned back to the FE
* Add a table to display previously generated query-history CSVs
* Add timestamps to task; delete tasks as soon as file finishes processing
* Raise exception if start_time is not present
* Convert hard-coded string to constant
* Add "Generated At" field to table
* Return task list in sorted order (based on start-time)
* Implement pagination
* Remove unused props and cleanup tailwind classes
* Change the name of kickoff button
* Redesign how previous query exports are viewed
* Make button a constant width even when contents change
* Remove timezone information before comparing
* Decrease interval time for re-pinging API
* Add timezone to start-time creation
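A small sketch of the timezone handling described in the two items above (values are illustrative):

```python
from datetime import datetime, timezone

# create the start-time as timezone-aware UTC
start_time = datetime.now(tz=timezone.utc)

# some stored timestamps may be naive, so strip tzinfo before comparing
stored_time = datetime(2024, 1, 1, 12, 0, 0)  # naive example value
if start_time.replace(tzinfo=None) >= stored_time:
    print("start_time is newer")
```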
* Add a refreshInterval for getting updated task status
* Add new background queue
* Make small verbiage edits and remove the error popup when max-retries is hit
* Change up heavy worker to recognize new task in new module
* Ensure `celery_app` is imported
* Change how `celery_app` is imported and defined
* Update comment on why `celery_app` must be imported
* Add basic skeleton for new beat task to cleanup any dead / failed query-history-export tasks
* Move cleanup task to different worker / queue
* Implement cleanup task
* Add return type
* Address comment on PR
* Remove delimiter from prefix
* Change name of function to be more descriptive
* Remove delimiter from prefix constant
* Move function invocation closer to usage location
* Move imports to top of file
* Move variable up a scope due to undefined error
* Remove dangling if-statement
* Make function more pure-functional
* Remove redefinition
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* wip checkpointing/continue on failure
* more stuff for checkpointing
* Basic implementation
* FE stuff
* More checkpointing/failure handling
* rebase
* rebase
* initial scaffolding for IT
* IT to test checkpointing
* Cleanup
* cleanup
* Fix it
* Rebase
* Add todo
* Fix actions IT
* Test more
* Pagination + fixes + cleanup
* Fix IT networking
* fix it
* rebase
* Address misc comments
* Address comments
* Remove unused router
* rebase
* Fix mypy
* Fixes
* fix it
* Fix tests
* Add drop index
* Add retries
* reset lock timeout
* Try hard drop of schema
* Add timeout/retries to downgrade
* rebase
* test
* test
* test
* Close all connections
* test closing idle only
* Fix it
* fix
* try using null pool
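For context on the null-pool items: SQLAlchemy's `NullPool` opens a fresh connection per checkout and closes it on release, so nothing idles between test steps. A minimal sketch (the URL is a placeholder):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# NullPool: no connections are held open between uses
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost:5432/testdb",  # placeholder URL
    poolclass=NullPool,
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
# the connection is fully closed here, not returned to a pool
```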
* Test
* fix
* rebase
* log
* Fix
* apply null pool
* Fix other test
* Fix quality checks
* Test not using the fixture
* Fix ordering
* fix test
* Change pooling behavior
* doc_sync is refactored
* maybe this works
* tested to work!
* mypy fixes
* enabled integration tests
* fixed the test
* added external group sync
* testing should work now
* mypy
* confluence doc id fix
* got group sync working
* addressed feedback
* renamed some vars and fixed mypy
* conf fix?
* added wiki handling to confluence connector
* test fixes
* revert google drive connector
* fixed groups
* hotfix
* fresh indexing feature branch
* cherry pick test
* Revert "cherry pick test"
This reverts commit 2a62422068.
* set multitenant so that vespa fields match when indexing
* cleanup pass
* mypy
* pass through env var to control celery indexing concurrency
* comments on task kickoff and some logging improvements
* disentangle configuration for different workers and beats.
* use get_session_with_tenant
* comment out all of update.py
* rename to RedisConnectorIndexingFenceData
* first check num_indexing_workers
* refactor RedisConnectorIndexingFenceData
* comment out on_worker_process_init
* missed a file
* scope db sessions to short lengths
* update launch.json template
* fix types
* code review
* fix where num_indexing_workers falls back
* remove extra brace
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
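A hedged sketch of how the kombu tables can be excluded from alembic autogenerate; the table names shown are kombu's usual defaults, and the exact filter in the repo may differ:

```python
# sketch for alembic's env.py
def include_object(object, name, type_, reflected, compare_to):
    # skip the tables kombu's SQLAlchemy transport creates for celery
    if type_ == "table" and name in ("kombu_message", "kombu_queue"):
        return False
    return True

# then, inside run_migrations_online()/offline():
# context.configure(connection=connection, target_metadata=target_metadata,
#                   include_object=include_object)
```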
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
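A minimal sketch of a process-global redis connection pool (host and port are placeholders):

```python
import redis

# one shared pool per process; clients are cheap wrappers around it
REDIS_POOL = redis.ConnectionPool(host="localhost", port=6379, db=0)  # placeholder host/port

def get_redis_client() -> redis.Redis:
    return redis.Redis(connection_pool=REDIS_POOL)

# usage
r = get_redis_client()
r.set("health", "ok")
```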
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
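A sketch of the kind of settings a `celeryconfig.py` might centralize, using standard lowercase Celery setting names; the broker URL and queue names are placeholders:

```python
# celeryconfig.py (sketch)
broker_url = "redis://localhost:6379/0"          # placeholder
result_backend = "redis://localhost:6379/1"      # placeholder

task_default_queue = "celery"
task_routes = {
    # route sync tasks to a dedicated queue (names illustrative)
    "tasks.vespa_metadata_sync": {"queue": "vespa_metadata_sync"},
}
worker_prefetch_multiplier = 1   # don't hoard tasks on long-running workers
task_acks_late = True

# a worker/app module then loads it with:
#   celery_app.config_from_object("celeryconfig")
```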
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
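A fragment of what the corresponding alembic migration could look like; the `document` table name and column types are assumptions:

```python
import sqlalchemy as sa
from alembic import op

def upgrade() -> None:
    op.add_column("document", sa.Column("last_modified", sa.DateTime(timezone=True), nullable=True))
    op.add_column("document", sa.Column("last_synced", sa.DateTime(timezone=True), nullable=True))

def downgrade() -> None:
    op.drop_column("document", "last_synced")
    op.drop_column("document", "last_modified")
```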
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow zero task syncs to run, since we create certain objects with no entries that are initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* multiple celery workers
* update logs as well and set prefetch multipliers appropriate to the worker intent
* add db refresh to connector deletion
* add some preliminary locking
* organize tasks into separate files
* celery auto-associates tasks created inside another task, which bloats the result metadata considerably; trail=False prevents this.
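A sketch of the `trail=False` fix (task names are illustrative):

```python
from celery import shared_task

@shared_task(trail=False)  # don't record child tasks in this task's result metadata
def monitor_sync_tasks() -> None:
    for doc_id in ["doc-1", "doc-2"]:  # illustrative ids
        # each .delay() here would otherwise be appended to the parent's
        # "children" trail in the result backend, bloating the stored result
        sync_single_document.delay(doc_id)

@shared_task
def sync_single_document(doc_id: str) -> None:
    ...
```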
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
* add multi workers to dev_run_background_jobs.py
* update supervisord with some recommended settings for celery
* name celery workers and shorten dev script prefixing
* add configurable SQLAlchemy engine settings on startup (needed for various intents like the API server, different celery workers and tasks, etc.)
* fix comments
* autoscale sqlalchemy pool size to celery concurrency (allow override later?)
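A hedged sketch of scaling the SQLAlchemy pool to the worker's concurrency; the environment variable names and defaults are assumptions, not the repo's actual settings:

```python
import os
from sqlalchemy import create_engine

# assumed env vars -- the real code presumably reads its own settings
celery_concurrency = int(os.environ.get("CELERY_WORKER_CONCURRENCY", "4"))
db_url = os.environ.get("POSTGRES_URL", "postgresql+psycopg2://user:pass@localhost/onyx")

engine = create_engine(
    db_url,
    pool_size=celery_concurrency,     # one pooled connection per unit of concurrency
    max_overflow=celery_concurrency,  # allow temporary bursts beyond the base pool
    pool_pre_ping=True,               # drop dead connections before handing them out
)
```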
* supervisord needs the percent symbols escaped
* use name as primary check, some minor refactoring and type hinting too.
* addressing code review
* fix import
* fix prune_documents_task references
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Deleting a connector should redirect to the indexing status page
* minor update to dev background jobs
* update refresh logic
* remove print statement
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* avoid reindexing secondary indexes after they succeed
* use postgres application names to facilitate connection debugging
* centralize all postgres application_name constants in the constants file
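A sketch of setting a postgres `application_name` per process so connections are identifiable in `pg_stat_activity` (the constant name and URL are placeholders):

```python
from sqlalchemy import create_engine

# illustrative constant -- the real ones live in the shared constants file
POSTGRES_CELERY_WORKER_APP_NAME = "celery_worker_heavy"

engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost:5432/onyx",  # placeholder URL
    connect_args={"application_name": POSTGRES_CELERY_WORKER_APP_NAME},
)
# in psql: SELECT pid, application_name, state FROM pg_stat_activity;
```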
* missed a couple of files
* mypy fixes
* update dev background script