25 Commits

Author SHA1 Message Date
Chris Weaver
f1fc8ac19b
Connector checkpointing (#3876)
* wip checkpointing/continue on failure

more stuff for checkpointing

Basic implementation

FE stuff

More checkpointing/failure handling

rebase

rebase

initial scaffolding for IT

IT to test checkpointing

Cleanup

cleanup

Fix it

Rebase

Add todo

Fix actions IT

Test more

Pagination + fixes + cleanup

Fix IT networking

fix it

* rebase

* Address misc comments

* Address comments

* Remove unused router

* rebase

* Fix mypy

* Fixes

* fix it

* Fix tests

* Add drop index

* Add retries

* reset lock timeout

* Try hard drop of schema

* Add timeout/retries to downgrade

* rebase

* test

* test

* test

* Close all connections

* test closing idle only

* Fix it

* fix

* try using null pool

* Test

* fix

* rebase

* log

* Fix

* apply null pool

* Fix other test

* Fix quality checks

* Test not using the fixture

* Fix ordering

* fix test

* Change pooling behavior
2025-02-16 02:34:39 +00:00
pablodanswer
42a0f45a96 update 2025-02-04 12:06:11 -08:00
Richard Kuo (Danswer)
e4180cefba don't duplicate test module names 2025-01-28 15:24:05 -08:00
skylares
f67b5356fa
Create google drive e2e test (#3635)
* Create e2e google drive test

* Drive sync issue

* Add endpoints for group syncing

* google e2e fixes/improvements and add xfail to zendesk tests

* mypy errors

* Key change

* Small changes

* Merged main to fix group sync issue

* Update test_permission_sync.py

* Update google_drive_api_utils.py

* Update test_zendesk_connector.py

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2025-01-28 14:12:57 -08:00
pablodanswer
bee74ac360 mark slack perm sync as flaky 2024-12-16 11:50:03 -08:00
pablodanswer
e9b41bddc9 gmail configuration update 2024-12-14 14:53:02 -08:00
pablodanswer
699a02902a nit 2024-12-13 12:50:02 -08:00
pablodanswer
64f0ad8b26 fix drive tests (nit) 2024-12-13 11:36:39 -08:00
pablodanswer
21ec5ed795 welcome to onyx 2024-12-13 09:56:10 -08:00
pablonyx
0770a587f1
remove slack workspace (#3394)
* remove slack workspace

* update client tokens

* fix up

* clean up docs

* fix up tests
2024-12-12 01:01:43 +00:00
pablodanswer
238509c536 silence 2024-12-11 13:48:37 -08:00
rkuo-danswer
7f1e4a02bf
Feature/kill indexing (#3213)
* checkpoint

* add celery termination of the task

* rename to RedisConnectorPermissionSyncPayload, add RedisLock to more places, add get_active_search_settings

* rename payload

* pretty sure these weren't named correctly

* testing in progress

* cleanup

* remove space

* merge fix

* three dots animation on Pausing

* improve messaging when connector is stopped or killed and animate buttons

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-11-28 05:32:45 +00:00
hagen-danswer
09d3e47c03
Perm sync behavior change (#3262)
* Change external permissions behavior

* fixed behavior

* added error handling

* LLM the goat

* comment

* simplify

* fixed

* done

* limits increased

* added a ton of logging

* uhhhh
2024-11-27 20:04:15 +00:00
hagen-danswer
a50a3944b3
Make curators able to create permission synced connectors (#3126)
* Make curators able to create permission synced connectors

* removed editing permission synced connectors for curators

* updated tests to use access type instead of is_public

* update copy
2024-11-13 18:58:23 +00:00
rkuo-danswer
d703e694ce
limited role api keys (#3115)
* in progress PoC

* working limited user, needs routes to be marked next

* make selected endpoint available to limited user role

* xfail on test_slack_prune

* add comment to sync function

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-11-13 16:15:43 +00:00
hagen-danswer
fdc4811fce
doc sync celery refactor (#3084)
* doc_sync is refactored

* maybe this works

* tested to work!

* mypy fixes

* enabled integration tests

* fixed the test

* added external group sync

* testing should work now

* mypy

* confluence doc id fix

* got group sync working

* addressed feedback

* renamed some vars and fixed mypy

* conf fix?

* added wiki handling to confluence connector

* test fixes

* revert google drive connector

* fixed groups

* hotfix
2024-11-12 23:57:14 +00:00
Yuhong Sun
b49a9ab171
Seeding (#2902)
* checkpoint

* k

* k

* k

* fixed slack api calls

* missed one

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-10-24 23:45:48 +00:00
hagen-danswer
802086ee57
Refactored Confluence Connector (#2859)
* Refactored Confluence Connector

* rename metadataconnector to slimconnector

Finish rename

* danswer->onyx

* added rec

* typo

* refactored doc_sync for confluence

* mypy + enable tests

* tested and fixed for confluence cloud

* fixed all server syncing

* fixed connector test

* mypy+connector test fixes

* addressed richards comments

* minor fix
2024-10-21 23:03:40 +00:00
hagen-danswer
deb66a88aa dont fail flaky tests 2024-10-17 13:37:50 -07:00
hagen-danswer
0c102ebb5c simplified the document search function 2024-10-17 12:13:42 -07:00
hagen-danswer
5063b944ec Make flakey test still run but not fail CI 2024-10-17 11:36:59 -07:00
rkuo-danswer
0a0215ceee
check last_pruned instead of is_pruning (#2748)
* check last_pruned instead of is_pruning

* try using the ThreadingHTTPServer class for stability and avoiding blocking single-threaded behavior

* add startup delay to web server in test

* just explicitly return None if we can't parse the datetime

* switch to uvicorn for test stability
2024-10-16 18:52:27 +00:00
Richard Kuo (Danswer)
057321a59f disable flaky test 2024-10-08 13:40:35 -07:00
rkuo-danswer
3404c7eb1d
Feature/background prune 2 (#2583)
* first cut at redis

* some new helper functions for the db

* ignore kombu tables in alembic migrations (used by celery)

* multiline commands for readability, add vespa_metadata_sync queue to worker

* typo fix

* fix returning tuple fields

* add constants

* fix _get_access_for_document

* docstrings!

* fix double function declaration and typing

* fix type hinting

* add a global redis pool

* Add get_document function

* use task_logger in various celery tasks

* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit

* Add celery redis helper. used in a subsequent PR

* kombu warning getting spammy since celery is not self managing its queue in Postgres any more

* add last_modified and last_synced to documents

* fix task naming convention

* use celeryconfig.py

* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc

* change vespa index log line to debug

* mypy fixes

* update alembic migration

* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call

* mypy

* switch to monotonic time

* fix startup dependencies on redis

* rebase alembic migration

* kombu cleanup - fail silently

* mypy

* add redis_host environment override

* update REDIS_HOST env var in docker-compose.dev.yml

* update the rest of the docker files

* in flight

* harden indexing-status endpoint against db changes happening in the background.  Needs further improvement but OK for now.

* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date

* add back writing to vespa on indexing

* actually working connector deletion

* update contributing guide

* backporting fixes from background_deletion

* renaming cache to cache_volume

* add redis password to various deployments

* try setting up pr testing for helm

* fix indent

* hopefully this release version actually exists

* fix command line option to --chart-dirs

* fetch-depth 0

* edit values.yaml

* try setting ct working directory

* bypass testing only on change for now

* move files and lint them

* update helm testing

* some issues suggest using --config works

* add vespa repo

* add postgresql repo

* increase timeout

* try amd64 runner

* fix redis password reference

* add comment to helm chart testing workflow

* rename helm testing workflow to disable it

* adding clarifying comments

* address code review

* missed a file

* remove commented warning ... just not needed

* fix imports

* refactor to use update_single

* mypy fixes

* add vespa test

* multiple celery workers

* update logs as well and set prefetch multipliers appropriate to the worker intent

* add db refresh to connector deletion

* add some preliminary locking

* organize tasks into separate files

* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.

* code review fixes

* move monitor_usergroup_taskset to ee, improve logging

* add multi workers to dev_run_background_jobs.py

* update supervisord with some recommended settings for celery

* name celery workers and shorten dev script prefixing

* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)

* fix comments

* autoscale sqlalchemy pool size to celery concurrency (allow override later?)

* supervisord needs the percent symbols escaped

* use name as primary check, some minor refactoring and type hinting too.

* stash merge (may not function yet)

* remove dead code

* more cleanup

* remove dead file

* we shouldn't be checking for deletion attempts in the db any more

* print cc_pair_id

* print status on status mismatch again

* add logging when cc_pair isn't present

* don't indexing any ingestion type connectors, and don't pause any connectors that aren't active

* add more specific check for deletion completion

* remove flaky mediawiki test site

* move is_pruning

* remove unused code

* remove old function

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-10-07 18:16:17 +00:00
hagen-danswer
c2088602e1
Implement source testing framework + Slack (#2650)
* Added permission sync tests for Slack

* moved folders

* prune test + mypy

* added wait for indexing to cc_pair creation

* commented out check

* should fix other tests

* added slack channel pool

* fixed everything and mypy

* reduced flake
2024-10-02 23:16:07 +00:00