* Update mode to be a default parameter in `FileStore.read`
* Move query history exporting process to be a background job instead
* Move hardcoded report-file-naming to a common utility function
* Add type annotations
* Update download component
* Implement button to re-ping and download CSV file; fix up some backend file-checking logic
* De-indent logic (w/ early return)
* Return different error codes depending on the type of task status
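A minimal sketch of what mapping task status to response codes could look like, assuming a FastAPI endpoint; the route path, `TaskStatus` enum, and in-memory status store are placeholders, not the repo's actual code:

```python
from enum import Enum
from fastapi import APIRouter, HTTPException
from fastapi.responses import JSONResponse

router = APIRouter()

class TaskStatus(str, Enum):  # hypothetical status enum
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"
    SUCCESS = "success"

# stand-in for real task tracking; the actual code persists this elsewhere
_TASK_STATUS: dict[str, TaskStatus] = {}

@router.get("/query-history-export/{task_id}")
def export_status(task_id: str) -> JSONResponse:
    status = _TASK_STATUS.get(task_id)
    if status is None:
        raise HTTPException(status_code=404, detail="No such export task")
    if status == TaskStatus.FAILED:
        raise HTTPException(status_code=500, detail="Export task failed")
    if status in (TaskStatus.PENDING, TaskStatus.RUNNING):
        # not an error, just not finished -- the frontend keeps re-pinging
        return JSONResponse(status_code=202, content={"status": status.value})
    return JSONResponse(status_code=200, content={"status": status.value})
```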
* Add more resilient failure-retry mechanisms
* Remove default parameter in helper function
* Use popup for error messaging
* Update return code
* Update web/src/app/ee/admin/performance/query-history/DownloadAsCSV.tsx
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add type to useState call
* Update backend/ee/onyx/server/query_history/api.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/onyx/file_store/file_store.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update backend/ee/onyx/background/celery/apps/primary.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Move rerender call to after check
* Run formatter
* Add type conversions back (smh greptile)
* Remove duplicated call to save_file
* Move non-fallible logic out of try-except block
* Pass date-ranges into API call
* Convert to ISO strings before passing them into the API call
* Add API to list all tasks
* Create new pydantic model to represent tasks to return instead
* Change helper to only fetch query-history tasks
* Use `shared_tasks` instead of old method
* Address more comments from PR; consolidate how task name is generated
* Mark task as failed if any exception is raised
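Assuming `shared_tasks` above refers to Celery's `shared_task` decorator, a hedged sketch of a background export task that marks itself failed on any exception; the task name, status store, and CSV contents are placeholders:

```python
import csv
import io
from celery import shared_task

# naive in-memory status store; the real code presumably persists this elsewhere
_TASK_STATUS: dict[str, str] = {}

@shared_task(name="export_query_history", bind=True)
def export_query_history(self, start_iso: str, end_iso: str) -> str:
    task_id = self.request.id
    _TASK_STATUS[task_id] = "running"
    try:
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["query", "user", "time"])  # placeholder header row
        # ... the real task would stream query-history rows between start/end here ...
        _TASK_STATUS[task_id] = "success"
        return buf.getvalue()
    except Exception:
        # any exception marks the task as failed before propagating to Celery
        _TASK_STATUS[task_id] = "failed"
        raise
```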
* Change the task object which is returned back to the FE
* Add a table to display previously generated query-history CSVs
* Add timestamps to task; delete tasks as soon as file finishes processing
* Raise exception if start_time is not present
* Convert hard-coded string to constant
* Add "Generated At" field to table
* Return task list in sorted order (based on start-time)
* Implement pagination
* Remove unused props and cleanup tailwind classes
* Change the name of kickoff button
* Redesign how previous query exports are viewed
* Make button a constant width even when contents change
* Remove timezone information before comparing
* Decrease interval time for re-pinging API
* Add timezone to start-time creation
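A small sketch of the timezone handling described in the two items above (values are illustrative):

```python
from datetime import datetime, timezone

# create the start-time as timezone-aware UTC
start_time = datetime.now(tz=timezone.utc)

# some stored timestamps may be naive, so strip tzinfo before comparing
stored_time = datetime(2024, 1, 1, 12, 0, 0)  # naive example value
if start_time.replace(tzinfo=None) >= stored_time:
    print("start_time is newer")
```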
* Add a refreshInterval for getting updated task status
* Add new background queue
* Make small verbiage edits and remove the error popup when max-retries is hit
* Change up heavy worker to recognize new task in new module
* Ensure `celery_app` is imported
* Change how `celery_app` is imported and defined
* Update comment on why `celery_app` must be imported
* Add basic skeleton for new beat task to cleanup any dead / failed query-history-export tasks
* Move cleanup task to different worker / queue
* Implement cleanup task
* Add return type
* Address comment on PR
* Remove delimiter from prefix
* Change name of function to be more descriptive
* Remove delimiter from prefix constant
* Move function invocation closer to usage location
* Move imports to top of file
* Move variable up a scope due to undefined error
* Remove dangling if-statement
* Make function more pure-functional
* Remove redefinition
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* wip checkpointing/continue on failure
* more stuff for checkpointing
* Basic implementation
* FE stuff
* More checkpointing/failure handling
* rebase
* rebase
* initial scaffolding for IT
* IT to test checkpointing
* Cleanup
* cleanup
* Fix it
* Rebase
* Add todo
* Fix actions IT
* Test more
* Pagination + fixes + cleanup
* Fix IT networking
* fix it
* rebase
* Address misc comments
* Address comments
* Remove unused router
* rebase
* Fix mypy
* Fixes
* fix it
* Fix tests
* Add drop index
* Add retries
* reset lock timeout
* Try hard drop of schema
* Add timeout/retries to downgrade
* rebase
* test
* test
* test
* Close all connections
* test closing idle only
* Fix it
* fix
* try using null pool
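For context on the null-pool items: SQLAlchemy's `NullPool` opens a fresh connection per checkout and closes it on release, so nothing idles between test steps. A minimal sketch (the URL is a placeholder):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# NullPool: no connections are held open between uses
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost:5432/testdb",  # placeholder URL
    poolclass=NullPool,
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
# the connection is fully closed here, not returned to a pool
```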
* Test
* fix
* rebase
* log
* Fix
* apply null pool
* Fix other test
* Fix quality checks
* Test not using the fixture
* Fix ordering
* fix test
* Change pooling behavior
* doc_sync is refactored
* maybe this works
* tested to work!
* mypy fixes
* enabled integration tests
* fixed the test
* added external group sync
* testing should work now
* mypy
* confluence doc id fix
* got group sync working
* addressed feedback
* renamed some vars and fixed mypy
* conf fix?
* added wiki handling to confluence connector
* test fixes
* revert google drive connector
* fixed groups
* hotfix
* fresh indexing feature branch
* cherry pick test
* Revert "cherry pick test"
This reverts commit 2a62422068.
* set multitenant so that vespa fields match when indexing
* cleanup pass
* mypy
* pass through env var to control celery indexing concurrency
* comments on task kickoff and some logging improvements
* disentangle configuration for different workers and beats.
* use get_session_with_tenant
* comment out all of update.py
* rename to RedisConnectorIndexingFenceData
* first check num_indexing_workers
* refactor RedisConnectorIndexingFenceData
* comment out on_worker_process_init
* missed a file
* scope db sessions to short lengths
* update launch.json template
* fix types
* code review
* fix where num_indexing_workers falls back
* remove extra brace
* first cut at redis
* some new helper functions for the db
* ignore kombu tables in alembic migrations (used by celery)
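A hedged sketch of how the kombu tables can be excluded from alembic autogenerate; the table names shown are kombu's usual defaults, and the exact filter in the repo may differ:

```python
# sketch for alembic's env.py
def include_object(object, name, type_, reflected, compare_to):
    # skip the tables kombu's SQLAlchemy transport creates for celery
    if type_ == "table" and name in ("kombu_message", "kombu_queue"):
        return False
    return True

# then, inside run_migrations_online()/offline():
# context.configure(connection=connection, target_metadata=target_metadata,
#                   include_object=include_object)
```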
* multiline commands for readability, add vespa_metadata_sync queue to worker
* typo fix
* fix returning tuple fields
* add constants
* fix _get_access_for_document
* docstrings!
* fix double function declaration and typing
* fix type hinting
* add a global redis pool
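A minimal sketch of a process-global redis connection pool (host and port are placeholders):

```python
import redis

# one shared pool per process; clients are cheap wrappers around it
REDIS_POOL = redis.ConnectionPool(host="localhost", port=6379, db=0)  # placeholder host/port

def get_redis_client() -> redis.Redis:
    return redis.Redis(connection_pool=REDIS_POOL)

# usage
r = get_redis_client()
r.set("health", "ok")
```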
* Add get_document function
* use task_logger in various celery tasks
* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit
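A sketch of the kind of settings a `celeryconfig.py` might centralize, using standard lowercase Celery setting names; the broker URL and queue names are placeholders:

```python
# celeryconfig.py (sketch)
broker_url = "redis://localhost:6379/0"          # placeholder
result_backend = "redis://localhost:6379/1"      # placeholder

task_default_queue = "celery"
task_routes = {
    # route sync tasks to a dedicated queue (names illustrative)
    "tasks.vespa_metadata_sync": {"queue": "vespa_metadata_sync"},
}
worker_prefetch_multiplier = 1   # don't hoard tasks on long-running workers
task_acks_late = True

# a worker/app module then loads it with:
#   celery_app.config_from_object("celeryconfig")
```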
* Add celery redis helper. used in a subsequent PR
* kombu warning getting spammy since celery is not self managing its queue in Postgres any more
* add last_modified and last_synced to documents
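A fragment of what the corresponding alembic migration could look like; the `document` table name and column types are assumptions:

```python
import sqlalchemy as sa
from alembic import op

def upgrade() -> None:
    op.add_column("document", sa.Column("last_modified", sa.DateTime(timezone=True), nullable=True))
    op.add_column("document", sa.Column("last_synced", sa.DateTime(timezone=True), nullable=True))

def downgrade() -> None:
    op.drop_column("document", "last_synced")
    op.drop_column("document", "last_modified")
```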
* fix task naming convention
* use celeryconfig.py
* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc
* change vespa index log line to debug
* mypy fixes
* update alembic migration
* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call
* mypy
* switch to monotonic time
* fix startup dependencies on redis
* rebase alembic migration
* kombu cleanup - fail silently
* mypy
* add redis_host environment override
* update REDIS_HOST env var in docker-compose.dev.yml
* update the rest of the docker files
* in flight
* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.
* allow zero task syncs to run, since we create certain objects with no entries that are initially marked as out of date
* add back writing to vespa on indexing
* actually working connector deletion
* update contributing guide
* backporting fixes from background_deletion
* renaming cache to cache_volume
* add redis password to various deployments
* try setting up pr testing for helm
* fix indent
* hopefully this release version actually exists
* fix command line option to --chart-dirs
* fetch-depth 0
* edit values.yaml
* try setting ct working directory
* bypass testing only on change for now
* move files and lint them
* update helm testing
* some issues suggest using --config works
* add vespa repo
* add postgresql repo
* increase timeout
* try amd64 runner
* fix redis password reference
* add comment to helm chart testing workflow
* rename helm testing workflow to disable it
* adding clarifying comments
* address code review
* missed a file
* remove commented warning ... just not needed
* fix imports
* refactor to use update_single
* mypy fixes
* add vespa test
* multiple celery workers
* update logs as well and set prefetch multipliers appropriate to the worker intent
* add db refresh to connector deletion
* add some preliminary locking
* organize tasks into separate files
* celery auto-associates tasks created inside another task, which bloats the result metadata considerably; trail=False prevents this.
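A sketch of the `trail=False` fix (task names are illustrative):

```python
from celery import shared_task

@shared_task(trail=False)  # don't record child tasks in this task's result metadata
def monitor_sync_tasks() -> None:
    for doc_id in ["doc-1", "doc-2"]:  # illustrative ids
        # each .delay() here would otherwise be appended to the parent's
        # "children" trail in the result backend, bloating the stored result
        sync_single_document.delay(doc_id)

@shared_task
def sync_single_document(doc_id: str) -> None:
    ...
```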
* code review fixes
* move monitor_usergroup_taskset to ee, improve logging
* add multi workers to dev_run_background_jobs.py
* update supervisord with some recommended settings for celery
* name celery workers and shorten dev script prefixing
* add configurable SQLAlchemy engine settings on startup (needed for various intents like the API server, different celery workers and tasks, etc.)
* fix comments
* autoscale sqlalchemy pool size to celery concurrency (allow override later?)
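A hedged sketch of scaling the SQLAlchemy pool to the worker's concurrency; the environment variable names and defaults are assumptions, not the repo's actual settings:

```python
import os
from sqlalchemy import create_engine

# assumed env vars -- the real code presumably reads its own settings
celery_concurrency = int(os.environ.get("CELERY_WORKER_CONCURRENCY", "4"))
db_url = os.environ.get("POSTGRES_URL", "postgresql+psycopg2://user:pass@localhost/onyx")

engine = create_engine(
    db_url,
    pool_size=celery_concurrency,     # one pooled connection per unit of concurrency
    max_overflow=celery_concurrency,  # allow temporary bursts beyond the base pool
    pool_pre_ping=True,               # drop dead connections before handing them out
)
```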
* supervisord needs the percent symbols escaped
* use name as primary check, some minor refactoring and type hinting too.
* addressing code review
* fix import
* fix prune_documents_task references
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* Deleting a connector should redirect to the indexing status page
* minor update to dev background jobs
* update refresh logic
* remove print statement
---------
Co-authored-by: pablodanswer <pablo@danswer.ai>
* avoid reindexing secondary indexes after they succeed
* use postgres application names to facilitate connection debugging
* centralize all postgres application_name constants in the constants file
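A sketch of setting a postgres `application_name` per process so connections are identifiable in `pg_stat_activity` (the constant name and URL are placeholders):

```python
from sqlalchemy import create_engine

# illustrative constant -- the real ones live in the shared constants file
POSTGRES_CELERY_WORKER_APP_NAME = "celery_worker_heavy"

engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost:5432/onyx",  # placeholder URL
    connect_args={"application_name": POSTGRES_CELERY_WORKER_APP_NAME},
)
# in psql: SELECT pid, application_name, state FROM pg_stat_activity;
```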
* missed a couple of files
* mypy fixes
* update dev background script