297 Commits

Author SHA1 Message Date
hagen-danswer
a50a3944b3
Make curators able to create permission synced connectors (#3126)
* Make curators able to create permission synced connectors

* removed editing permission synced connectors for curators

* updated tests to use access type instead of is_public

* update copy
2024-11-13 18:58:23 +00:00
hagen-danswer
eb0e20b9e4 quick fix for google doc sync 2024-11-13 07:24:29 -08:00
pablodanswer
22189f02c6
Add referral source to cloud on data plane (#3096)
* cloud auth referral source

* minor clarity

* k

* minor modification to be best practice

* typing

* Update ReferralSourceSelector.tsx

* Update ReferralSourceSelector.tsx

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-11-13 00:42:25 +00:00
hagen-danswer
fdc4811fce
doc sync celery refactor (#3084)
* doc_sync is refactored

* maybe this works

* tested to work!

* mypy fixes

* enabled integration tests

* fixed the test

* added external group sync

* testing should work now

* mypy

* confluence doc id fix

* got group sync working

* addressed feedback

* renamed some vars and fixed mypy

* conf fix?

* added wiki handling to confluence connector

* test fixes

* revert google drive connector

* fixed groups

* hotfix
2024-11-12 23:57:14 +00:00
pablodanswer
9272d6ebfe
Remove ee (#3093)
* move api key to non-ee

* finalize previous migration

* move token rate limit to non-ee

* general cleanup

* update

* update

* finalize

* finalize

* ensure callable

* k
2024-11-09 20:51:36 +00:00
pablodanswer
f6d8f5ca89
Migrate tenant upgrades to data plane (#3051)
* add provisioning on data plane

* functional but scrappy

* minor cleanup

* minor clean up

* k

* simplify

* update provisioning

* improve import logic

* ensure proper conditional

* minor pydantic update

* minor config update

* nit
2024-11-08 17:13:29 +00:00
pablodanswer
25f5c12750 remove print 2024-11-06 13:49:16 -08:00
pablodanswer
187a7d2da2 validated approach 2024-11-06 13:49:16 -08:00
pablodanswer
467ce4e3f3 fix usage report pagination 2024-11-06 13:21:00 -08:00
pablodanswer
2cb33b1fb4
add default api keys for cloud users (#3044)
* add default api keys for cloud users

* add cohere as well

* naming
2024-11-04 19:11:12 +00:00
hagen-danswer
2cd1e6be00
gmail refactor + permission syncing (#3021)
* initial frontend changes and shared google refactoring

* gmail connector is reworked

* added permission syncing for gmail

* tested!

* Added tests for gmail connector

* fixed tests and mypy

* temp fix

* testing done!

* rename

* test fixes maybe?

* removed irrelevant tests

* anotha one

* refactoring changes

* refactor finished

* maybe these fixes work

* dumps

* final fixes
2024-11-04 18:06:23 +00:00
Chris Weaver
c2d04f591d
Add drive sections (#3040)
* ADd header support for drive

* Fix mypy

* Comment change

* Improve

* Cleanup

* Add comment
2024-11-03 22:10:45 +00:00
pablodanswer
46e5ffa3ae
add validated + reformatted dynamic beat acquisition (#3006)
* add validated + reformatted dynamic beat acquisition

* validate

* reorg

* nit

* address comments

* update

* typing

* ensure versioned apps capture

* Remove locks (#3017)

* add validated + reformatted dynamic beat acquisition

* initial removal of locks!

* minor

* remove unecessary locks

* update

* nit

* k

* K8s jobs (#3033)

* add k8s configs

* k

* update config

* k

* improved timeouts + worker configs

* improve workers
2024-11-03 10:27:25 -08:00
rkuo-danswer
5f5cc9a724
Feature/redis connector refactor (#2992)
* refactor RedisConnectorDeletion into RedisConnector

* refactor redis stop and deletion

* port pruning

* nest pruning

* port deletion

* port indexing

* refactor into individual files

* refactor redis connector index  to take search settings at init

* move back to debug level log

* refactor doc set and user group (mostly)

* mypy fixes
2024-11-02 19:53:04 +00:00
pablodanswer
e4bb14d4e1
Super user (#2944)
* add super user

* nits
2024-11-02 17:29:23 +00:00
Chris Weaver
ecf4923a3a
Fix answer with specified doc ids (#2703)
* Fix

Fix

Refactor

more

more

fix

refactor

Fix circular imports

Refactor

Move tests around

* Add quote support

* Testing

* More testing

* Fix image generation slowness

* Remove unused exception

* Fix UT

* fix stop generating

* minor typo

* minor logging updates for clarity

---------

Co-authored-by: pablodanswer <pablo@danswer.ai>
2024-11-01 19:50:20 +00:00
pablodanswer
753293cefb
Basic multi tenant api key (#3004)
* basic multi tenant api key

* organization

* nit

* clean
2024-11-01 19:34:51 +00:00
pablodanswer
6d543f3d4f
Do not count API keys as users (#3022)
* don't count api keys as users

* typing
2024-11-01 19:34:30 +00:00
hagen-danswer
71d4fb98d3
Refactored Google Drive Connector + Permission Syncing (#2945)
* refactoring changes

* everything working for service account

* works with service account

* combined scopes

* copy change

* oauth prep

* Works for oauth and service account credentials

* mypy

* merge fixes

* Refactor Google Drive connector

* finished backend

* auth changes

* if its stupid but it works, its not stupid

* npm run dev fixes

* addressed change requests

* string fix

* minor fixes and cleanup

* spacing cleanup

* Update connector.py

* everything done

* testing!

* Delete backend/tests/daily/connectors/google_drive/file_generator.py

* cleaned up

---------

Co-authored-by: Chris Weaver <25087905+Weves@users.noreply.github.com>
2024-11-01 02:25:00 +00:00
Chris Weaver
5be457e321
Add alternative auth header (#2999) 2024-10-30 19:10:03 +00:00
pablodanswer
53e916552b
tenant seeding docs (#2925)
* tenant seeding docs

* k
2024-10-27 18:48:47 +00:00
pablodanswer
da3c5e3711
Feat: add clean logging for api routes (#2928)
* feat: add clean logging for api routes

* nit

* `MULTI_TENANT` must be shared config

* nit
2024-10-27 05:15:41 +00:00
pablodanswer
1261d859ac
Tenant aware JWT strategy (#2943)
* add tenantJWTSrategy

* nit
2024-10-26 23:27:40 +00:00
pablodanswer
9b147ae437
Tenant integration tests (#2913)
* check for index swap

* initial bones

* kk

* k

* k:

* nit

* nit

* rebase + update

* nit

* minior update

* k

* minor integration test fixes

* nit

* ensure we build test docker image

* remove one space

* k

* ensure we wipe volumes

* remove log

* typo

* nit

* k

* k
2024-10-25 18:47:17 +00:00
Chris Weaver
4a47e9a841
Add strict json mode (#2917) 2024-10-24 22:38:46 -07:00
Yuhong Sun
b49a9ab171
Seeding (#2902)
* checkpoint

* k

* k

* k

* fixed slack api calls

* missed one

---------

Co-authored-by: hagen-danswer <hagen@danswer.ai>
2024-10-24 23:45:48 +00:00
rkuo-danswer
2b9a751b96
working chat feedback dump script (with api addition) (#2891)
* working chat feedback dump script (with api addition)

* mypy fix

* comment out pydantic models (but leave for reference)

* small code review tweaks

* bump to clear vercel issue?
2024-10-24 19:50:09 +00:00
pablodanswer
0545fb4443
Multitenant redis update (#2889)
* add multi tenancy to redis

* rename context var

* k

* args -> kwargs

* minor update to kv interface

* robustify
2024-10-24 02:12:25 +00:00
pablodanswer
14e75bbd24
add default schema config (#2888)
* add default schema config

* resolve circular import

* k
2024-10-23 23:12:17 +00:00
pablodanswer
8b72264535
Gating Notifications (#2868)
* functional notifications

* typing

* minor

* ports

* nit

* verify functionality

* pretty
2024-10-23 20:20:20 +00:00
rkuo-danswer
9105f95d13
Feature/celery refactor (#2813)
* fresh indexing feature branch

* cherry pick test

* Revert "cherry pick test"

This reverts commit 2a624220687affdda3de347e30f2011136f64bda.

* set multitenant so that vespa fields match when indexing

* cleanup pass

* mypy

* pass through env var to control celery indexing concurrency

* comments on task kickoff and some logging improvements

* disentangle configuration for different workers and beats.

* use get_session_with_tenant

* comment out all of update.py

* rename to RedisConnectorIndexingFenceData

* first check num_indexing_workers

* refactor RedisConnectorIndexingFenceData

* comment out on_worker_process_init

* missed a file

* scope db sessions to short lengths

* update launch.json template

* fix types

* code review
2024-10-22 22:57:36 +00:00
hagen-danswer
914da2e4cb
Confluence polish (#2874) 2024-10-22 20:41:47 +00:00
hagen-danswer
802086ee57
Refactored Confluence Connector (#2859)
* Refactored Confluence Connector

* rename metadataconnector to slimconnector

Finish rename

* danswer->onyx

* added rec

* typo

* refactored doc_sync for confluence

* mypy + enable tests

* tested and fixed for confluence cloud

* fixed all server syncing

* fixed connector test

* mypy+connector test fixes

* addressed richards comments

* minor fix
2024-10-21 23:03:40 +00:00
pablodanswer
a24b465663
Minor tenant ID improvements (#2850)
* add migration dockerfile

* address edge case

* k

* k

* k

* nit

* k

* k

* k

* k

* remove

* k

* add comment
2024-10-20 23:48:00 +00:00
rkuo-danswer
457e7992a4
missing tenant_id as optional param (#2851)
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-10-19 21:10:42 +00:00
rkuo-danswer
6913efef90
fresh indexing feature branch (#2790)
* fresh indexing feature branch

* cherry pick test

* Revert "cherry pick test"

This reverts commit 2a624220687affdda3de347e30f2011136f64bda.

* set multitenant so that vespa fields match when indexing

* cleanup pass

* mypy

* pass through env var to control celery indexing concurrency

* comments on task kickoff and some logging improvements

* use get_session_with_tenant

* comment out all of update.py

* rename to RedisConnectorIndexingFenceData

* first check num_indexing_workers

* refactor RedisConnectorIndexingFenceData

* comment out on_worker_process_init

* fix where num_indexing_workers falls back

* remove extra brace
2024-10-18 22:40:05 +00:00
rkuo-danswer
5b78299880
use native rate limiting in the confluence client (#2837)
* use native rate limiting in the confluence client

* upgrade urllib3 to v2.2.3 to support retries in confluence client

* improve logging so that progress is visible.
2024-10-18 18:15:43 +00:00
pablodanswer
a159779d39
prevent alembic from configuring logger (#2826)
* k

* k
2024-10-17 16:31:17 +00:00
hagen-danswer
938a65628d rearrange logging 2024-10-17 09:01:51 -07:00
hagen-danswer
5d390b65eb Added logging for when a member has no email or username 2024-10-17 08:47:46 -07:00
pablodanswer
db0779dd02
Session id: int -> UUID (#2814)
* session id: int -> UUID

* nit

* validated

* validated downgrade + upgrade + all functionality

* nit

* minor nit

* fix test case
2024-10-16 22:18:45 +00:00
pablodanswer
1a9921f63e
Redirect with query param (#2811)
* validated

* k

* k

* k

* minor update
2024-10-16 17:26:44 +00:00
pablodanswer
bfe963988e
various multi tenant improvements (#2803)
* various multi tenant improvements

* nit

* ensure consistent db session operations

* minor robustification
2024-10-15 20:10:57 +00:00
hagen-danswer
494fda906d
Confluence permission sync fix for server deployment (#2784)
* initial commit

* Made perm sync with with cql

* filter fix

* undo connector changes

* fixed everything

* whoops
2024-10-14 20:52:57 +00:00
pablodanswer
20df20ae51
Multi tenant vespa (#2762)
* add vespa multi tenancy

* k

* formatting

* Billing (#2667)

* k

* data -> control

* nit

* nit: error handling

* auth + app

* nit: color standardization

* nit

* nit: typing

* k

* k

* feat: functional upgrading

* feat: add block for downgrading to seats < active users

* add auth

* remove accomplished todo + prints

* nit

* tiny nit

* nit: centralize security

* add tenant expulsion/gating + invite user -> increment billing seat no.

* add cloud configs

* k

* k

* nit: update

* k

* k

* k

* k

* nit
2024-10-12 23:53:11 +00:00
hagen-danswer
101b010c5c
Improved logging and added comments (#2763)
* Improved logging and added comments

* fix exception logging

* cleanup
2024-10-10 17:37:27 +00:00
pablodanswer
f40c5ca9bd
Add tenant context (#2596)
* add proper tenant context to background tasks

* update for new session logic

* remove unnecessary functions

* add additional tenant context

* update ports

* proper format / directory structure

* update ports

* ensure tenant context properly passed to ee bg tasks

* add user provisioning

* nit

* validated for multi tenant

* auth

* nit

* nit

* nit

* nit

* validate pruning

* evaluate integration tests

* at long last, validated celery beat

* nit: minor edge case patched

* minor

* validate update

* nit
2024-10-10 16:34:32 +00:00
hagen-danswer
804de3248e
google drive permission sync cleanup (#2749) 2024-10-09 21:17:22 +00:00
rkuo-danswer
3404c7eb1d
Feature/background prune 2 (#2583)
* first cut at redis

* some new helper functions for the db

* ignore kombu tables in alembic migrations (used by celery)

* multiline commands for readability, add vespa_metadata_sync queue to worker

* typo fix

* fix returning tuple fields

* add constants

* fix _get_access_for_document

* docstrings!

* fix double function declaration and typing

* fix type hinting

* add a global redis pool

* Add get_document function

* use task_logger in various celery tasks

* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit

* Add celery redis helper. used in a subsequent PR

* kombu warning getting spammy since celery is not self managing its queue in Postgres any more

* add last_modified and last_synced to documents

* fix task naming convention

* use celeryconfig.py

* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc

* change vespa index log line to debug

* mypy fixes

* update alembic migration

* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call

* mypy

* switch to monotonic time

* fix startup dependencies on redis

* rebase alembic migration

* kombu cleanup - fail silently

* mypy

* add redis_host environment override

* update REDIS_HOST env var in docker-compose.dev.yml

* update the rest of the docker files

* in flight

* harden indexing-status endpoint against db changes happening in the background.  Needs further improvement but OK for now.

* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date

* add back writing to vespa on indexing

* actually working connector deletion

* update contributing guide

* backporting fixes from background_deletion

* renaming cache to cache_volume

* add redis password to various deployments

* try setting up pr testing for helm

* fix indent

* hopefully this release version actually exists

* fix command line option to --chart-dirs

* fetch-depth 0

* edit values.yaml

* try setting ct working directory

* bypass testing only on change for now

* move files and lint them

* update helm testing

* some issues suggest using --config works

* add vespa repo

* add postgresql repo

* increase timeout

* try amd64 runner

* fix redis password reference

* add comment to helm chart testing workflow

* rename helm testing workflow to disable it

* adding clarifying comments

* address code review

* missed a file

* remove commented warning ... just not needed

* fix imports

* refactor to use update_single

* mypy fixes

* add vespa test

* multiple celery workers

* update logs as well and set prefetch multipliers appropriate to the worker intent

* add db refresh to connector deletion

* add some preliminary locking

* organize tasks into separate files

* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.

* code review fixes

* move monitor_usergroup_taskset to ee, improve logging

* add multi workers to dev_run_background_jobs.py

* update supervisord with some recommended settings for celery

* name celery workers and shorten dev script prefixing

* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)

* fix comments

* autoscale sqlalchemy pool size to celery concurrency (allow override later?)

* supervisord needs the percent symbols escaped

* use name as primary check, some minor refactoring and type hinting too.

* stash merge (may not function yet)

* remove dead code

* more cleanup

* remove dead file

* we shouldn't be checking for deletion attempts in the db any more

* print cc_pair_id

* print status on status mismatch again

* add logging when cc_pair isn't present

* don't indexing any ingestion type connectors, and don't pause any connectors that aren't active

* add more specific check for deletion completion

* remove flaky mediawiki test site

* move is_pruning

* remove unused code

* remove old function

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-10-07 18:16:17 +00:00
pablodanswer
0da736bed9
Tenant provisioning in the dataplane (#2694)
* add tenant provisioning to data plane

* minor typing update

* ensure tenant router included

* proper auth check

* update disabling logic

* validated basic provisioning

* use new kv store
2024-10-06 04:08:35 +00:00