431 Commits

Author SHA1 Message Date
joachim-danswer
c5adbe4180 Knowledge Graph v1 (#4626)
* db setup

* transfer 1 - incomplete

* more adjustments

* relationship table + query update

* temp view creation

* restructuring

* nits

* updates

* separate read_only engine

* extraction revamp

* focus on metadata relatonships 1

* dev

* migration downgrade fix

* rebase migration change

* a3+

* progress

* base

* new extraction

* progress

* fixed KG extraction

* nits

* updates

* simplifications & cleanup

* fixes

* updates

* more feature flag checks

* fixes

* extraction process fix

* read-only user creation as part of setup

* fix for missing entity attributes

* kg read-only user creation as part of migration

* typo

* EL initial comments

* initial Account/SF Connector chnges

* SF Connector update

 - include account information

* base w/ salesforce

* evan updates + quite a bit more

* kg-filtered search

* EL changes pt 2

* migrations and env vars

* quick migration fix

* migration update

* post_rebase fixes

* mypy fixes

* test fixes

* test fix

* test fix

* read_only pool + misc

* nf

* env vars

* test improvements

* salesforce fix

* test update

* small changes

* small adjustments

* SF Connector fix & kg_stage removal for one table

* mypy fix

* small fixes

* EL + RK (pt 1) comments

* nit

* setting updated

* Salesforce test update

* EL comments

* read-only user replacement & cleanup

* SQL View fix

* converting entity type-name separators

* sql view group ownership

* view fix

* SQL tweak

* dealing with docs that were skipped by indexing

* increased error handling

* more error handling

* Output formatting fix

* kg-incremental-reindexing

* 0-doc found improvement

* celery

* migration correction

* timeout adjustments

* nit

* Updated migration

* Entity Normalization for KG Dev 1 (#4746)

* feat: trigrams column

* fix: reranking and db

* feat: v1

* fix: convert to orm

* feat: parallel

* fix: default to id_name

* fix: renamed semantic_id and semantic_id_trigrams

* fix: scalar subquery

* fix: tuning + redundancy

* fix: threshold

* fix: typo

* fix: shorten names

* wip

* fix: reverted

* feat: config

* feat: works but it was dumb

* feat: clustering works

* fix: mypy

* normalization <-> language awareness for SQL generation

* small type fixes

---------

Co-authored-by: joachim-danswer <joachim@danswer.ai>

* mypy

* typo and dead code

* kg_time_fencing

* feat: remove temp views on migration downgrade

* remove functions and triggers for now

* rebase adjustments

* EL code review results

* quick fix + trigger/funcs for single tenant

* fix: typo, mypy, dead code

* fix: autoflake

* small updatesd

* nit

* fix: typo

* early + faster view creation

* Extension creation in MT migration

* nit changes to default ETs

* Incremental Clustering and KG Refactor V1 (#4784)

Optimized/restructured incremental clustering. New pipeline actually that moves vespa updates to clustering.
Also, celery configuration has been updated.
---------

Co-authored-by: joachim-danswer <joachim@danswer.ai>

* prompt tweak & ET extraction reset

* more general hierarchical structure

* feat: better vespa reset logic

* prompt optimization and entity replacemants

* small prompt changes

* KG Refactor V2 (#4814)

Clustering & Extraction improvements & various nits 

Co-authored-by: joachim-danswer <joachim@danswer.ai>

* add connector-level coverage days

* fix: nit

* initial  EL responses

* refactor: helper functions for formatting

* fix: more helper fns & comments

* fix: comment code that's been implemented elsewhere

* fix: tenant_id missing arg

* fix: removed debugging stuff

* fix: moved kg_interactions db query to helper fn

* fix: tenant_id

* fix: tenant_id & removed outdated helper fn

* fix always set entity class

* fix: typo

* fix alembic heads

* fix: celery logging

* fix: migrations fix

* fix: multi tenant permissions

* fix: temp connector fix

* fix: downgrade

* Fix upgrade migration

* fix: tenant for normalization

* added additional acl

* stray EL comments

* fix: connector test

* fix mypy

* fix: temporary connector test fix

* fix: jira connector test

* nit

* small nits

* fix: black

* fix: mypy

* fix: mypy

---------

Co-authored-by: Rei Meguro <36625832+Orbital-Web@users.noreply.github.com>
2025-06-07 23:14:20 +00:00
Wenxi
dc4b9bc003 Fixed indexing when no sites are specified (#4822)
* Fixed indexing when no sites are specificed

* Added test for Sharepoint all sites index

* Accounted for paginated results.

* Typing

* Typing

---------

Co-authored-by: Wenxi Onyx <wenxi-onyx@Wenxis-MacBook-Pro.local>
2025-06-05 23:25:20 +00:00
Rei Meguro
4bb3ee03a0 Update GitHub Connector metadata (#4769)
* feat: updated github metadata

* feat: nullity check

* feat: more metadata

* feat: userinfo

* test: connector test + more metadata

* feat: num files changed

* feat str

* feat: list of str
2025-06-04 18:33:14 +00:00
Chris Weaver
094cc940a4 Small embedding model cleanups (#4820)
* Small embedding model cleanups

* fix

* address greptile

* fix build
2025-06-04 00:10:44 +00:00
joachim-danswer
80ecdb711d New metadata for Jira for KG (#4785)
* new metadata components

* nits & tests
2025-06-03 20:12:56 +00:00
Chris Weaver
a599176bbf Improve reasoning detection (#4817)
* Improve reasoning detection

* Address greptile comments

* Fix mypy
2025-06-03 20:01:12 +00:00
Chris Weaver
84d916e210 Fix hard delete of agentic chats (#4803)
* Fix hard delete of agentic chats

* Update backend/tests/integration/tests/chat/test_chat_deletion.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Address Greptile comments

* fix tests

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2025-06-03 11:14:11 -07:00
rkuo-danswer
ef4d5dcec3 new slack rate limiting approach (#4779)
* fix slack rate limit retry handler for groups

* trying to mitigate memory usage during csv download

* Revert "trying to mitigate memory usage during csv download"

This reverts commit 48262eacf6.

* integrated approach to rate limiting

* code review

* try no redis setting

* add pytest-dotenv

* add more debugging

* added comments

* add more stats

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-05-29 19:49:32 +00:00
Evan Lohn
ca20e527fc fix tool calling for bedrock claude models (#4761)
* fix tool calling for bedrock claude models

* unit test

* fix unit test
2025-05-23 01:13:18 +00:00
Rei Meguro
9dbe12cea8 Feat: Search Eval Testing Overhaul (provide ground truth, categorize query, etc.) (#4739)
* fix: autoflake & import order

* docs: readme

* fix: mypy

* feat: eval

* docs: readme

* fix: oops forgot to remove comment

* fix: typo

* fix: rename var

* updated default config

* fix: config issue

* oops

* fix: black

* fix: eval and config

* feat: non tool calling query mod
2025-05-21 19:25:10 +00:00
rkuo-danswer
e78637d632 mitigate memory usage during csv download (#4745)
* mitigate memory usage during csv download

* more gc tweaks

* missed some small changes

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-05-21 00:44:27 +00:00
Evan Lohn
cac03c07f7 v1 answer refactor (#4721)
* v1 answer refactor

* fix tests

* good catch, tests

* more cleanup
2025-05-20 23:34:27 +00:00
Raunak Bhagat
95dabfaa18 fix: Add back Teams' replies processing (#4744)
* Add replies to document construction and edit tests

* Update tests

* Add replies processing to teams

* Fix test

* Add try-except block around potential failure

* Update entity-id during ConnectorFailure raise
2025-05-20 22:55:28 +00:00
rkuo-danswer
e92c418e0f Feature/openapi (#4710)
* starting openapi support

* fix app / app_fn

* send gitignore

* dedupe function names

* add readme

* modify gitignore

* update launch template

* fix unused path param

* fix mypy

* local tests pass

* first pass at making integration tests work

* fixes

* fix script path

* set python path

* try full path

* fix output dir

* fix integration test

* more build fixes

* add generated directory

* use the config

* add a comment

* add

* modify tsconfig.json

* fix index linting bugs

* tons of lint fixes

* new gitignore

* remove generated dir

* add tasks template

* check for undefined explicitly

* fix hooks.ts

* refactor destructureValue

* improve readme

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-05-20 21:33:18 +00:00
Evan Lohn
e0f5b95cfc full drive perm sync 2025-05-19 21:06:43 -07:00
Evan Lohn
b76e4754bf anthropic fix (#4733)
* anthropic fix

* naming
2025-05-19 20:34:29 +00:00
Rei Meguro
d64f479c9f feat: error handling & optimization (#4722) 2025-05-19 20:27:22 +00:00
Rei Meguro
30d9ce1310 feat: search quality eval (#4720)
* fix: import order

* test examples

* fix: import

* wip: reranker based eval

* fix: import order

* feat: adjuted score

* fix: mypy

* fix: suggestions

* sorry cvs, you must go

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: mypy

* fix: suggestions

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2025-05-15 23:44:33 +00:00
Evan Lohn
2af2b7f130 fix connector tests and drive indexing (#4715)
* fix connector tests and drive indexing

* fix other test

* fix checkpoint data bug
2025-05-15 19:15:46 +00:00
Raunak Bhagat
312e3b92bc perf: Implement checkpointing for Teams Connector. (#4601)
* Add basic foundation for teams checkpointing classes

* Fix slack connector main entrypoint

* Saving changes

* Finish teams checkpointing impl

* Remove commented out code

* Remove more unused code

* Move code around

* Add threadpool to process requests in parallel

* Fix mypy errors / warnings

* Move test import to main function only

* Address nits on PR

* Remove unnecessary check prior to entering while-loop

* Remove print statement

* Change exception message

* Address more nits

* Use indexing instead of destructuring

* Add back invocation of `run_with_timeout` instead of a direct call

* Revert slack testing code

* Move early return to before second API call

* Pull fetch to team outside of loop

* Address nits on PR

* Add back client-side filtering

* Updated connector to return after a team's indexing is finished

* Add type ignore

* Implement proper datetime range fetching

* Address comment on PR

* Rename function

* Change exception type when no team with the given id was found

* Address nit on PR

* Add comment on why `page_loaded` is needed to be specified explicitly

* Remove duplicated calls to fetching channels

* Use helper function for thread-based yielding instead of manual logic

* Move datetime filtering to message-level instead

* Address more comments on PR

* Add new utility function for yielding sections

* Add additional utility function

* Add teams tests

* Edit error message

* Address nits on PR

* Promote url-prefix to be a class level constant

* Fix mypy error

* Remove start/end parameters from function that doesn't use them anymore; move around comments

* Address more nits on PR

* Add comment
2025-05-14 04:30:57 +00:00
Chris Weaver
b19515e25d Fix window_start (#4689)
* Fix window_start

* Add comment
2025-05-12 00:11:20 +00:00
rkuo-danswer
1a8b7abd00 add test (#4676)
* add test

* comment

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-05-09 21:38:51 +00:00
Evan Lohn
4c0423f27b fix github cursor pagination infinite loop (#4673)
* fix infinite loop

* unit test for infinite loop issue

* mypy version

* more logging

* unbound locals
2025-05-09 21:35:37 +00:00
Chris Weaver
519aeb6a1f Drive perm sync enhancement (#4672)
* Enhance drive perm sync

* add tests

* more stuff

* fixes

* Fix

* Speed up

* Add missing file

* Address EL comments

* Add ondelete=CASCADE

* Improve comment
2025-05-08 03:12:41 +00:00
Raunak Bhagat
d744c0dab4 fix: Fix error in which channel names would not have the leading "#" removed (#4664)
* Fix failing entrypoint into slack connector

* Pre-filter channel names upon instantiation of slack connector class

* Add decrypt script

* Add slack connector tests

* Fix mypy errors on decrypt.py

* Add property to SlackConnector class

* Add some basic tests

* Move location of tests

* Change name of env token

* Add secrets for Slack

* Add more parameterized cases

* Change env variable name

* Change names

* Update channel names

* Edit tests

* Modify tests

* Only import type in __main__

* Fix tests to actually test connectors

* Pass parameter to fixture directly
2025-05-07 04:55:21 +00:00
Chris Weaver
70df685709 Non default schema fix (#4667)
* Use correct postgres schema

* Remove raw Session() use

* Refactor + add test

* Fix comment
2025-05-06 20:35:59 -07:00
Chris Weaver
f85ef78238 Add more logging for confluence perm-sync + handle case where permiss… (#4586)
* Add more logging for confluence perm-sync + handle case where permissions are removed from the access token

* Make required permissions are explicit

* more

* Add slim fetch limit + mark all cc pairs of source type as successful upon group sync

* Add to dev compose

* Small teams fix

* Add file

* Add single limit pagination for confluence

* Restrict to server only

* more logging

* cleanup

* Cleanup

* Remove CONFLUENCE_CONNECTOR_SLIM_FETCH_LIMIT

* Handle teams error

* Fix ut

* Remove db dependency from confluence_doc_sync

* move stuff back to debug
2025-05-06 18:35:14 +00:00
Evan Lohn
113876b276 id not set in checkpoint FINAL (#4656)
* it will never happen again.

* fix perm sync issue

* fix perm sync issue2

* ensure member emails map is populated

* other fix for perm sync

* address CW comments

* nit
2025-05-03 00:10:21 +00:00
rkuo-danswer
0db2ad2132 memory optimize task generation for connector deletion (#4645)
* memory optimize task generation for connector deletion

* test

* fix up integration test docker file

* more no-cache

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-05-01 10:47:26 -07:00
Evan Lohn
6436b60763 github cursor pagination (#4642)
* v1 of cursor pagination

* mypy

* unit tests

* CW comments
2025-04-30 19:09:20 -07:00
rkuo-danswer
e254fdc066 add sendgrid as option (#4639)
* add sendgrid as option

* code review

* mypy

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-04-30 07:33:15 +00:00
Chris Weaver
9be3da2357 Fix gitlab (#4629)
* Fix gitlab

* Add back assert
2025-04-28 17:42:49 -07:00
Evan Lohn
eebfa5be18 Confluence server api time fix (#4589)
* tolerance of confluence api weirdness

* remove checkpointing

* remove skipping logic from checkpointing

* add back checkpointing

* switch confluence checkpointing to be based on page starts

* address CW comments and fix unit tests

* some mitigations of bad confluence api

* new checkpointing approach and testing fixes

* fix test

* CW comments
2025-04-28 06:06:29 +00:00
Evan Lohn
5db676967f no more duplicate files during folder indexing (#4579)
* no more duplicate files during folder indexing

* cleanup checkpoint after a shared folder has been finished

* cleanup

* lint
2025-04-28 01:01:20 +00:00
Evan Lohn
5ca7a7def9 fix migration and add test (#4615) 2025-04-25 21:27:59 +00:00
Chris Weaver
92b5e1adf4 Add support for overriding user list (#4616)
* Add support for overriding user list

* Fix

* Add typing

* pythonify
2025-04-25 15:15:23 -07:00
Chris Weaver
23c6e0f3bf Single source of truth for image capability (#4612)
* Single source of truth for image capability

* Update web/src/app/admin/assistants/AssistantEditor.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix tests

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2025-04-25 20:37:16 +00:00
Evan Lohn
151aabea73 specific user emails for drive connector (#4608)
* specific user emails for drive connector

* fix drive connector tests

* fix connector tests
2025-04-25 18:49:20 +00:00
Chris Weaver
d711680069 Add e2e test for assistant creation/edit (#4597)
* Add e2e test for assistant creation/edit

* Skip initial full reset to have seeded connector
2025-04-25 13:21:34 -07:00
rkuo-danswer
c9a609b7d8 Bugfix/slack bot channel config (#4585)
* friendlier handling of slack channel retrieval

* retry on downgrade_postgres deadlock

* fix comment

* text

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-04-23 20:00:03 +00:00
rkuo-danswer
0d4c600852 out of process retry for multitenant test reset (#4566)
* tool to generate vespa schema variations for our cloud

* extraneous assign

* use a real templating system instead of search/replace

* fix float

* maybe this should be double

* remove redundant var

* template the other files

* try a spawned process

* move the wrapper

* fix args

* increase timeout

* run multitenant reset operations out of process as well

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2025-04-21 23:30:18 +00:00
Evan Lohn
eb569bf79d add emails to retry with on 403 (#4565)
* add emails to retry with on 403

* attempted fix for connector test

* CW comments

* connector test fix

* test fixes and continue on 403

* fix tests

* fix tests

* fix concurrency tests

* fix integration tests with llmprovider eager loading
2025-04-21 23:27:31 +00:00
Raunak Bhagat
b97628070e feat: Add ability to specify max input token limit for custom LLM providers (#4510)
* Add multi text array field

* Add multiple values to model configuration for a custom LLM provider

* Fix reference to old field name

* Add migration

* Update all instances of model_names / display_model_names to use new schema migration

* Update background task

* Update endpoints to not throw errors

* Add test

* Update backend/alembic/versions/7a70b7664e37_add_models_configuration_table.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update backend/onyx/background/celery/tasks/llm_model_update/tasks.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix list comprehension nits

* Update web/src/components/admin/connectors/Field.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update web/src/app/admin/configuration/llm/interfaces.ts

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Implement greptile recommendations

* Update backend/onyx/db/llm.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update backend/onyx/server/manage/llm/api.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update backend/onyx/background/celery/tasks/llm_model_update/tasks.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update backend/onyx/db/llm.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix more greptile suggestions

* Run formatter again

* Update backend/onyx/db/models.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add relationship to `LLMProvider` and `ModelConfigurations` classes

* Use sqlalchemy ORM relationships instead of manually populating fields

* Upgrade migration

* Update interface

* Remove all instances of model_names and display_model_names from backend

* Add more tests and fix bugs

* Run prettier

* Add types

* Update migration to perform data transformation

* Ensure native llm providers don't have custom max input tokens

* Start updating frontend logic to support custom max input tokens

* Pass max input tokens to LLM class (to be passed into `litellm.completion` call later)

* Add ModelConfigurationField component for custom llm providers

* Edit spacing and styling of model configuration matrix

* Fix error message displaying bug

* Edit opacity of `FiX` field for first index

* Change opacity back

* Change roundness

* Address comments on PR

* Perform fetching of `max_input_tokens` at the beginning of the callgraph and rope it throughout the entire callstack

* Change `add` to `execute`

* Move `max_input_tokens` into `LLMConfig`

* Fix bug with error messages not being cleared

* Change field used to fetch LLMProvider

* Fix model-configuration UI

* Address comments

* Remove circular import

* Fix failing tests in GH

* Fix failing tests

* Use `isSubset` instead of equality to determine native vs custom LLM Provider

* Remove unused import

* Make responses always display max_input_tokens

* Fix api endpoint to hit

* Update types in web application

* Update object field

* Fix more type errors

* Fix failing llm provider tests

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2025-04-21 04:30:21 -07:00
rkuo-danswer
2111eccf07 Feature/vespa jinja (#4558)
* tool to generate vespa schema variations for our cloud

* extraneous assign

* use a real templating system instead of search/replace

* fix float

* maybe this should be double

* remove redundant var

* template the other files

* try a spawned process

* move the wrapper

* fix args

* increase timeout

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2025-04-20 22:28:55 +00:00
evan-danswer
dc62d83a06 File connector tests (#4561)
* danswer to onyx plus tests for file connector

* actually add test
2025-04-19 15:54:30 -07:00
evan-danswer
5681df9095 address getting attachments forever (#4562)
* address getting attachments forever

* fix unit tests
2025-04-19 15:53:27 -07:00
Chris Weaver
6666300f37 Fix flakey web test (#4551)
* Fix flakey web test

* Increase wait time

* Another attempt to fix

* Simplify + add new test

* Fix web tests
2025-04-19 15:12:11 -07:00
evan-danswer
953a4e3793 v1 file connector with metadata (#4552) 2025-04-17 23:02:34 +00:00
Chris Weaver
6df1c6c72f Pull in more fields for Jira (#4547)
* Pull in more fields for Jira

* Fix tests

* Fix

* more fix

* Fix

* Fix S3 test

* fix
2025-04-17 01:52:50 +00:00
evan-danswer
5acae2dc80 fix re-processing of previously seen docs Confluence (#4544)
* fix re-processing of previously seen docs

* performance
2025-04-16 23:16:21 +00:00