114 Commits

Author SHA1 Message Date
pablodanswer
a202e2bf9d Improvements to Redis + Vespa debugging 2025-02-06 13:30:06 -08:00
pablonyx
4affc259a6
Password reset tenant (#3895)
* nots

* functional

* minor naming cleanup

* nit

* update constant

* k
2025-02-05 03:17:11 +00:00
pablodanswer
125e5eaab1 various mypy improvements 2025-02-04 12:06:10 -08:00
rkuo-danswer
4fe99d05fd
add timings for syncing (#3798)
* add timings for syncing

* add more logging

* more debugging

* refactor multipass/db check out of VespaIndex

* circular imports?

* more debugging

* add logs

* various improvements

* additional logs to narrow down issue

* use global httpx pool for the main vespa flows in celery. Use in more places eventually.

* cleanup debug logging, etc

* remove debug logging

* this should use the secondary index

* mypy

* missed some logging

* review fixes

* refactor get_default_document_index to use search settings

* more missed logging

* fix circular refs

---------

Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: pablodanswer <pablo@danswer.ai>
2025-01-29 23:24:44 +00:00
pablonyx
7f10494bbe
Better vespa interface (#3781)
* k

* much cleaner vespa util class

* log

* typing

* improvement

* improve
2025-01-26 21:22:44 +00:00
Yuhong Sun
cbf98c0128
Fix Seeding Link for Support Use Case (#3784) 2025-01-25 19:39:36 -08:00
pablonyx
2a1bb4ac41
Vespa scripts + Redis script update (#3758)
* update onyx redis script

* looking good

* simplify comments

* remove unnecessary apps option

* iterate

* fix typing
2025-01-23 23:46:17 +00:00
pablonyx
ccb16b7484
Indexing latency check fix (#3747)
* add logs + update dev script

* update conig

* remove prints

* temporarily turn off

* va

* update

* fix

* finalize monitoring updates

* update
2025-01-23 17:14:26 +00:00
skylares
af953ff8a3
Paginate Query History table (#3592)
* Add pagination for query history table

* Fix method name

* Fix mypy
2025-01-17 15:31:42 -08:00
hagen-danswer
b1957737f2
refactored _add_user_filter usage (#3674)
* refactored db.connector_credential_pair

* Rerfactored the db.credentials user filtering

* the restr
2025-01-14 23:35:52 +00:00
rkuo-danswer
2ae91f0f2b
Feature/redis prod tool (#3619)
* prototype tools for handling prod issues

* add some commands

* add batching and dry run options

* custom redis tool

* comment

* default to app config settings for redis

---------

Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
2025-01-09 21:34:07 +00:00
pablonyx
d7bc32c0ec
Fully remove visit API (#3621)
* v1

* update indexing logic

* update updates

* nit

* clean up args

* update for clarity + best practices

* nit + logs

* fix

* minor clean up

* remove logs

* quick nit
2025-01-08 13:49:01 -08:00
pablonyx
ddec239fef
Improved indexing (#3594)
* nit

* k

* add steps

* main util functions

* functioning fully

* quick nit

* k

* typing fix

* k

* address comments
2025-01-05 23:31:53 +00:00
Chris Weaver
f895e5f7d0
Speedup orphan doc cleanup script (#3596)
* Speedup orphan doc cleanup script

* Fix mypy
2025-01-05 14:28:25 +00:00
skylares
c191e23256
Pagination Hook (#3494)
* Backend changes for pagination hook + Paginated users table

* Frontend changes for pagination hook

* Fix invited users endpoint

* Fix layout shift & add enter to submit user invites

* mypy

* Cleanup

* Resolve PR concerns & remove UserStatus

* Fix errors

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2025-01-03 14:32:55 -08:00
Richard Kuo
3eb72e5c1d Revert "More efficient Vespa indexing (#3552)"
This reverts commit 27832167819f063bc2f4436765ff27f0ccba2b64.
2024-12-31 09:40:23 -08:00
pablonyx
2783216781
More efficient Vespa indexing (#3552)
---------

Co-authored-by: Chris Weaver <25087905+Weves@users.noreply.github.com>
2024-12-30 18:51:14 -08:00
Yuhong Sun
814f97c2c7
MT Cloud Monitoring (#3465) 2024-12-15 16:05:03 -08:00
pablodanswer
21ec5ed795 welcome to onyx 2024-12-13 09:56:10 -08:00
hagen-danswer
ef9942b751
Related permission docs to cc_pair to prevent orphan docs (#3336)
* Related permission docs to cc_pair to prevent orphan docs

* added script

* group sync deduping

* logging
2024-12-04 21:00:54 +00:00
Weves
b01a1b509a Add basic loadtest script 2024-12-04 10:53:48 -08:00
Yuhong Sun
62a4aa10db
Refactor Search (#3233) 2024-11-23 13:42:54 -08:00
hagen-danswer
fdc4811fce
doc sync celery refactor (#3084)
* doc_sync is refactored

* maybe this works

* tested to work!

* mypy fixes

* enabled integration tests

* fixed the test

* added external group sync

* testing should work now

* mypy

* confluence doc id fix

* got group sync working

* addressed feedback

* renamed some vars and fixed mypy

* conf fix?

* added wiki handling to confluence connector

* test fixes

* revert google drive connector

* fixed groups

* hotfix
2024-11-12 23:57:14 +00:00
Yuhong Sun
55919f596c
PG Dev Max Connections (#3082) 2024-11-07 11:51:23 -08:00
Richard Kuo (Danswer)
0ed77aa8a7 Merge branch 'main' of https://github.com/danswer-ai/danswer into feature/reset_indexes 2024-10-25 12:00:25 -07:00
Richard Kuo (Danswer)
84d551eda4 Merge branch 'patch-1' of https://github.com/Yash-2707/danswer into feature/reset_indexes 2024-10-25 09:35:45 -07:00
Chris Weaver
4a47e9a841
Add strict json mode (#2917) 2024-10-24 22:38:46 -07:00
Yuhong Sun
b49a9ab171
Seeding (#2902)
* checkpoint

* k

* k

* k

* fixed slack api calls

* missed one

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-10-24 23:45:48 +00:00
rkuo-danswer
2b9a751b96
working chat feedback dump script (with api addition) (#2891)
* working chat feedback dump script (with api addition)

* mypy fix

* comment out pydantic models (but leave for reference)

* small code review tweaks

* bump to clear vercel issue?
2024-10-24 19:50:09 +00:00
pablodanswer
14e75bbd24
add default schema config (#2888)
* add default schema config

* resolve circular import

* k
2024-10-23 23:12:17 +00:00
rkuo-danswer
9105f95d13
Feature/celery refactor (#2813)
* fresh indexing feature branch

* cherry pick test

* Revert "cherry pick test"

This reverts commit 2a624220687affdda3de347e30f2011136f64bda.

* set multitenant so that vespa fields match when indexing

* cleanup pass

* mypy

* pass through env var to control celery indexing concurrency

* comments on task kickoff and some logging improvements

* disentangle configuration for different workers and beats.

* use get_session_with_tenant

* comment out all of update.py

* rename to RedisConnectorIndexingFenceData

* first check num_indexing_workers

* refactor RedisConnectorIndexingFenceData

* comment out on_worker_process_init

* missed a file

* scope db sessions to short lengths

* update launch.json template

* fix types

* code review
2024-10-22 22:57:36 +00:00
Yuhong Sun
eccec6ab7c
Notion Fix Nested Properties (#2877) 2024-10-22 14:10:31 -07:00
YASH
8f236a1288
Update reset_indexes.py
Error Handling: Add more specific error handling to make it easier to debug issues.
Configuration Management: Use environment variables or a configuration file for settings like DOCUMENT_INDEX_NAME and DOCUMENT_ID_ENDPOINT.
Logging: Improve logging to include more details about the operations.
Retry Mechanism: Add a retry mechanism for network requests to handle transient errors.
Testing: Add unit tests for the functions to ensure they work as expected
2024-10-22 17:37:07 +05:30
rkuo-danswer
6913efef90
fresh indexing feature branch (#2790)
* fresh indexing feature branch

* cherry pick test

* Revert "cherry pick test"

This reverts commit 2a624220687affdda3de347e30f2011136f64bda.

* set multitenant so that vespa fields match when indexing

* cleanup pass

* mypy

* pass through env var to control celery indexing concurrency

* comments on task kickoff and some logging improvements

* use get_session_with_tenant

* comment out all of update.py

* rename to RedisConnectorIndexingFenceData

* first check num_indexing_workers

* refactor RedisConnectorIndexingFenceData

* comment out on_worker_process_init

* fix where num_indexing_workers falls back

* remove extra brace
2024-10-18 22:40:05 +00:00
pablodanswer
bfe963988e
various multi tenant improvements (#2803)
* various multi tenant improvements

* nit

* ensure consistent db session operations

* minor robustification
2024-10-15 20:10:57 +00:00
pablodanswer
f40c5ca9bd
Add tenant context (#2596)
* add proper tenant context to background tasks

* update for new session logic

* remove unnecessary functions

* add additional tenant context

* update ports

* proper format / directory structure

* update ports

* ensure tenant context properly passed to ee bg tasks

* add user provisioning

* nit

* validated for multi tenant

* auth

* nit

* nit

* nit

* nit

* validate pruning

* evaluate integration tests

* at long last, validated celery beat

* nit: minor edge case patched

* minor

* validate update

* nit
2024-10-10 16:34:32 +00:00
Yuhong Sun
5d356cc971
Remove Perm Sync Script Dev (#2712) 2024-10-07 13:50:30 -07:00
rkuo-danswer
fbf51b70d0
Feature/celery multi (#2470)
* first cut at redis

* some new helper functions for the db

* ignore kombu tables in alembic migrations (used by celery)

* multiline commands for readability, add vespa_metadata_sync queue to worker

* typo fix

* fix returning tuple fields

* add constants

* fix _get_access_for_document

* docstrings!

* fix double function declaration and typing

* fix type hinting

* add a global redis pool

* Add get_document function

* use task_logger in various celery tasks

* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit

* Add celery redis helper. used in a subsequent PR

* kombu warning getting spammy since celery is not self managing its queue in Postgres any more

* add last_modified and last_synced to documents

* fix task naming convention

* use celeryconfig.py

* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc

* change vespa index log line to debug

* mypy fixes

* update alembic migration

* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call

* mypy

* switch to monotonic time

* fix startup dependencies on redis

* rebase alembic migration

* kombu cleanup - fail silently

* mypy

* add redis_host environment override

* update REDIS_HOST env var in docker-compose.dev.yml

* update the rest of the docker files

* in flight

* harden indexing-status endpoint against db changes happening in the background.  Needs further improvement but OK for now.

* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date

* add back writing to vespa on indexing

* actually working connector deletion

* update contributing guide

* backporting fixes from background_deletion

* renaming cache to cache_volume

* add redis password to various deployments

* try setting up pr testing for helm

* fix indent

* hopefully this release version actually exists

* fix command line option to --chart-dirs

* fetch-depth 0

* edit values.yaml

* try setting ct working directory

* bypass testing only on change for now

* move files and lint them

* update helm testing

* some issues suggest using --config works

* add vespa repo

* add postgresql repo

* increase timeout

* try amd64 runner

* fix redis password reference

* add comment to helm chart testing workflow

* rename helm testing workflow to disable it

* adding clarifying comments

* address code review

* missed a file

* remove commented warning ... just not needed

* fix imports

* refactor to use update_single

* mypy fixes

* add vespa test

* multiple celery workers

* update logs as well and set prefetch multipliers appropriate to the worker intent

* add db refresh to connector deletion

* add some preliminary locking

* organize tasks into separate files

* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.

* code review fixes

* move monitor_usergroup_taskset to ee, improve logging

* add multi workers to dev_run_background_jobs.py

* update supervisord with some recommended settings for celery

* name celery workers and shorten dev script prefixing

* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)

* fix comments

* autoscale sqlalchemy pool size to celery concurrency (allow override later?)

* supervisord needs the percent symbols escaped

* use name as primary check, some minor refactoring and type hinting too.

* addressing code review

* fix import

* fix prune_documents_task references

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-09-27 00:50:55 +00:00
hagen-danswer
2274cab554
Added permission syncing (#2340)
* Added permission syncing on the backend

* Rewored to work with celery

alembic fix

fixed test

* frontend changes

* got groups working

* added comments and fixed public docs

* fixed merge issues

* frontend complete!

* frontend cleanup and mypy fixes

* refactored connector access_type selection

* mypy fixes

* minor refactor and frontend improvements

* get to fetch

* renames and comments

* minor change to var names

* got curator stuff working

* addressed pablo's comments

* refactored user_external_group to reference users table

* implemented polling

* small refactor

* fixed a whoopsies on the frontend

* added scripts to seed dummy docs and test query times

* fixed frontend build issue

* alembic fix

* handled is_public overlap

* yuhong feedback

* added more checks for sync

* black

* mypy

* fixed circular import

* todos

* alembic fix

* alembic
2024-09-19 22:07:36 +00:00
hagen-danswer
f3cea79c1c
Deleting a connector should redirect to the indexing status page (#2504)
* Deleting a connector should redirect to the indexing status page

* minor update to dev background jobs

* update refresh logic

* remove print statement

---------

Co-authored-by: pablodanswer <pablo@danswer.ai>
2024-09-18 21:38:35 +00:00
Yuhong Sun
6cec31088d
CONTRIBUTING updates (#2354) 2024-09-07 14:05:36 -07:00
Weves
39c946536c Fix deletion due to foreign key issue 2024-09-02 17:56:43 -07:00
Yuhong Sun
5ab4d94d94
Logging Level Update (#2165) 2024-08-18 21:53:40 -07:00
Weves
0853d1a8f1 Update force deletion script 2024-08-14 23:29:26 -07:00
Nathan Schwerdfeger
c7e5b11c63
EE Connector Deletion Bugfix + Refactor (#2042)
---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2024-08-11 20:33:07 -07:00
Yuhong Sun
c8ead6a0dc
Need Reindexing Flag Setup (#2102) 2024-08-09 17:44:57 -07:00
rkuo-danswer
7c283b090d
Feature/postgres connection names (#1998)
* avoid reindexing secondary indexes after they succeed

* use postgres application names to facilitate connection debugging

* centralize all postgres application_name constants in the constants file

* missed a couple of files

* mypy fixes

* update dev background script
2024-07-31 20:36:30 +00:00
rkuo-danswer
546bfbd24b
autoscale with pool=thread crashes celery. remove and use concurrency… (#1929)
* autoscale with pool=thread crashes celery. remove and use concurrency instead (to be improved later)

* update dev background script as well
2024-07-25 00:15:27 +00:00
rkuo-danswer
6ee74bd0d1
fix pointers to various background tasks and scripts (#1914) 2024-07-24 10:12:51 -07:00
Weves
6222f533be Update force delete script to handle user groups 2024-07-21 22:22:37 -07:00