67 Commits

Author SHA1 Message Date
joachim-danswer
c5adbe4180 Knowledge Graph v1 (#4626)
* db setup

* transfer 1 - incomplete

* more adjustments

* relationship table + query update

* temp view creation

* restructuring

* nits

* updates

* separate read_only engine

* extraction revamp

* focus on metadata relationships 1

* dev

* migration downgrade fix

* rebase migration change

* a3+

* progress

* base

* new extraction

* progress

* fixed KG extraction

* nits

* updates

* simplifications & cleanup

* fixes

* updates

* more feature flag checks

* fixes

* extraction process fix

* read-only user creation as part of setup

* fix for missing entity attributes

* kg read-only user creation as part of migration

* typo

* EL initial comments

* initial Account/SF Connector changes

* SF Connector update

 - include account information

* base w/ salesforce

* evan updates + quite a bit more

* kg-filtered search

* EL changes pt 2

* migrations and env vars

* quick migration fix

* migration update

* post_rebase fixes

* mypy fixes

* test fixes

* test fix

* test fix

* read_only pool + misc

* nf

* env vars

* test improvements

* salesforce fix

* test update

* small changes

* small adjustments

* SF Connector fix & kg_stage removal for one table

* mypy fix

* small fixes

* EL + RK (pt 1) comments

* nit

* setting updated

* Salesforce test update

* EL comments

* read-only user replacement & cleanup

* SQL View fix

* converting entity type-name separators

* sql view group ownership

* view fix

* SQL tweak

* dealing with docs that were skipped by indexing

* increased error handling

* more error handling

* Output formatting fix

* kg-incremental-reindexing

* 0-doc found improvement

* celery

* migration correction

* timeout adjustments

* nit

* Updated migration

* Entity Normalization for KG Dev 1 (#4746)

* feat: trigrams column

* fix: reranking and db

* feat: v1

* fix: convert to orm

* feat: parallel

* fix: default to id_name

* fix: renamed semantic_id and semantic_id_trigrams

* fix: scalar subquery

* fix: tuning + redundancy

* fix: threshold

* fix: typo

* fix: shorten names

* wip

* fix: reverted

* feat: config

* feat: works but it was dumb

* feat: clustering works

* fix: mypy

* normalization <-> language awareness for SQL generation

* small type fixes

---------

Co-authored-by: joachim-danswer <joachim@danswer.ai>

* mypy

* typo and dead code

* kg_time_fencing

* feat: remove temp views on migration downgrade

* remove functions and triggers for now

* rebase adjustments

* EL code review results

* quick fix + trigger/funcs for single tenant

* fix: typo, mypy, dead code

* fix: autoflake

* small updates

* nit

* fix: typo

* early + faster view creation

* Extension creation in MT migration

* nit changes to default ETs

* Incremental Clustering and KG Refactor V1 (#4784)

Optimized/restructured incremental clustering. New pipeline actually that moves vespa updates to clustering.
Also, celery configuration has been updated.
---------

Co-authored-by: joachim-danswer <joachim@danswer.ai>

* prompt tweak & ET extraction reset

* more general hierarchical structure

* feat: better vespa reset logic

* prompt optimization and entity replacements

* small prompt changes

* KG Refactor V2 (#4814)

Clustering & Extraction improvements & various nits

Co-authored-by: joachim-danswer <joachim@danswer.ai>

* add connector-level coverage days

* fix: nit

* initial EL responses

* refactor: helper functions for formatting

* fix: more helper fns & comments

* fix: comment code that's been implemented elsewhere

* fix: tenant_id missing arg

* fix: removed debugging stuff

* fix: moved kg_interactions db query to helper fn

* fix: tenant_id

* fix: tenant_id & removed outdated helper fn

* fix always set entity class

* fix: typo

* fix alembic heads

* fix: celery logging

* fix: migrations fix

* fix: multi tenant permissions

* fix: temp connector fix

* fix: downgrade

* Fix upgrade migration

* fix: tenant for normalization

* added additional acl

* stray EL comments

* fix: connector test

* fix mypy

* fix: temporary connector test fix

* fix: jira connector test

* nit

* small nits

* fix: black

* fix: mypy

* fix: mypy

---------

Co-authored-by: Rei Meguro <36625832+Orbital-Web@users.noreply.github.com>
2025-06-07 23:14:20 +00:00
Weves
58c641d8ec Remove ordering-only flow 2025-06-02 18:29:42 -07:00
Weves
94985e24c6 Adjust user file access 2025-06-02 17:28:49 -07:00
Chris Weaver
0c7ba8e2ac Fix/add back search with files (#4767)
* Allow search w/ user files

* more

* More

* Fix

* Improve prompt

* Combine user files + regular uploaded files
2025-05-24 15:44:39 -07:00
Evan Lohn
cac03c07f7 v1 answer refactor (#4721)
* v1 answer refactor

* fix tests

* good catch, tests

* more cleanup
2025-05-20 23:34:27 +00:00
rkuo-danswer
e254fdc066 add sendgrid as option (#4639)
* add sendgrid as option

* code review

* mypy

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-04-30 07:33:15 +00:00
rkuo-danswer
94de23fe87 Bugfix/chat images 2 (#4630)
* don't hardcode -1

* extra spaces

* fix binary data in blurb

* add note to binary handling

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-04-30 01:29:10 +00:00
pablonyx
df67ca18d8 My docs cleanup (#4519)
* update

* improved my docs

* nit

* nit

* k

* push changes

* update

* looking good

* k

* fix preprocessing

* try a fix

* k

* update

* nit

* k

* quick nits

* Cleanup / fixes

* Fixes

* Fix build

* fix

* fix quality checks

---------

Co-authored-by: Weves <chrisweaver101@gmail.com>
2025-04-25 05:20:33 +00:00
joachim-danswer
669b668463 updated logging and basic search expansion procedure 2025-04-22 11:58:02 -07:00
Raunak Bhagat
b97628070e feat: Add ability to specify max input token limit for custom LLM providers (#4510)
* Add multi text array field

* Add multiple values to model configuration for a custom LLM provider

* Fix reference to old field name

* Add migration

* Update all instances of model_names / display_model_names to use new schema migration

* Update background task

* Update endpoints to not throw errors

* Add test

* Update backend/alembic/versions/7a70b7664e37_add_models_configuration_table.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update backend/onyx/background/celery/tasks/llm_model_update/tasks.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix list comprehension nits

* Update web/src/components/admin/connectors/Field.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update web/src/app/admin/configuration/llm/interfaces.ts

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Implement greptile recommendations

* Update backend/onyx/db/llm.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update backend/onyx/server/manage/llm/api.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update backend/onyx/background/celery/tasks/llm_model_update/tasks.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update backend/onyx/db/llm.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix more greptile suggestions

* Run formatter again

* Update backend/onyx/db/models.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add relationship to `LLMProvider` and `ModelConfigurations` classes

* Use sqlalchemy ORM relationships instead of manually populating fields

* Upgrade migration

* Update interface

* Remove all instances of model_names and display_model_names from backend

* Add more tests and fix bugs

* Run prettier

* Add types

* Update migration to perform data transformation

* Ensure native llm providers don't have custom max input tokens

* Start updating frontend logic to support custom max input tokens

* Pass max input tokens to LLM class (to be passed into `litellm.completion` call later)

* Add ModelConfigurationField component for custom llm providers

* Edit spacing and styling of model configuration matrix

* Fix error message displaying bug

* Edit opacity of `FiX` field for first index

* Change opacity back

* Change roundness

* Address comments on PR

* Perform fetching of `max_input_tokens` at the beginning of the callgraph and rope it throughout the entire callstack

* Change `add` to `execute`

* Move `max_input_tokens` into `LLMConfig`

* Fix bug with error messages not being cleared

* Change field used to fetch LLMProvider

* Fix model-configuration UI

* Address comments

* Remove circular import

* Fix failing tests in GH

* Fix failing tests

* Use `isSubset` instead of equality to determine native vs custom LLM Provider

* Remove unused import

* Make responses always display max_input_tokens

* Fix api endpoint to hit

* Update types in web application

* Update object field

* Fix more type errors

* Fix failing llm provider tests

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2025-04-21 04:30:21 -07:00
evan-danswer
68c6c1f4f8 refactor to use stricter typing (#4513)
* refactor to use stricter typing

* older version of ruff
2025-04-14 17:23:07 +00:00
rkuo-danswer
24184024bb Bugfix/dependency updates (#4482)
* bump fastapi and starlette

* bumping llama index and nltk and associated deps

* bump to fix python-multipart

* bump aiohttp

* update package lock for examples/widget

* bump black

* sentencesplitter has changed namespaces

* fix reorder import check, fix missing passlib

* update package-lock.json

* black formatter updated

* reformatted again

* change to black compatible reorder

* change to black compatible reorder-python-imports fork

* fix pytest dependency

* black format again

* we don't need cdk.txt. update packages to be consistent across all packages

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2025-04-10 08:23:02 +00:00
evan-danswer
1718b8f677 fix claude bug (#4493)
* fix claude bug

* fixed tests
2025-04-10 00:59:18 +00:00
pablonyx
8db80a6bb1 Add latency metrics (#4472)
* k

* update

* Update chat_backend.py

nit

---------

Co-authored-by: evan-danswer <evan@danswer.ai>
2025-04-08 21:23:26 +00:00
evan-danswer
10f1ac5da1 use persona info when creating tool args (#4397)
* use persona info when creating tool args

* fixed unit test

* include system message

* fix unit test

* nit
2025-04-08 02:55:36 +00:00
evan-danswer
9c73099241 Drive smart chip indexing (#4459)
* WIP

* WIP almost done, but realized we can just do basic retrieval

* rebased and added scripts

* improved approach to extracting smart chips

* remove files from previous branch

* fix connector tests

* fix test
2025-04-07 21:52:45 +00:00
pablonyx
b02af9b280 Div Con (#4442)
* base setup

* Improvements + time boxing

* time box fix

* mypy fix

* EL Comments

* CW comments

* date awareness

---------

Co-authored-by: joachim-danswer <joachim@danswer.ai>
2025-04-04 00:52:00 +00:00
evan-danswer
54b883d0ca fix large docs selected in chat pruning (#4412)
* fix large docs selected in chat pruning

* better approach to length restriction

* comments

* comments

* fix unit tests and minor pruning bug

* remove prints
2025-04-03 15:48:10 +00:00
pablonyx
3a3b2a2f8d add user files (#4152) 2025-04-01 16:19:44 -07:00
joachim-danswer
e988c13e1d Additional logging for the path from Search Results to LLM Context (#4387)
* added logging

* nit

* nit
2025-03-31 00:38:43 +00:00
Chris Weaver
22e00a1f5c Fix duplicate docs (#4378)
* Initial

* Fix duplicate docs

* Add tests

* Switch to list comprehension

* Fix test
2025-03-28 22:25:26 +00:00
Chris Weaver
0d0588a0c1 Remove OnyxContext (#4376)
* Remove OnyxContext

* Fix UT

* Fix tests v2
2025-03-28 12:39:51 -07:00
pablonyx
7dcec6caf5 Fix session touching (#4363)
* fix session touching

* Revert "fix session touching"

This reverts commit c473d5c9a2.

* Revert "Revert "fix session touching""

This reverts commit 26a71d40b6.

* update

* quick nit
2025-03-27 01:18:46 +00:00
Chris Weaver
d123713c00 Fix GPU status request in sync flow (#4318)
* Fix GPU status request in sync flow

* tweak

* Fix test

* Fix more tests
2025-03-21 11:11:00 -07:00
rkuo-danswer
426883bbf5 Feature/agentic buffered (#4231)
* rename agent test script to prevent pytest autodiscovery

* first cut

* fix log message

* fix up typing

* add a sample test

---------

Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
2025-03-10 15:48:42 +00:00
evan-danswer
b7da91e3ae improved basic search latency (#4186)
* improved basic search latency

* address PR comments + minor cleanup
2025-03-06 22:22:59 +00:00
pablonyx
20f2b9b2bb Add image support for search (#4090)
* add support for image search

* quick fix up

* k

* k

* k

* k

* nit

* quick fix for connector tests
2025-03-05 17:44:18 +00:00
evan-danswer
a7125662f1 Fix gpt o-series code block formatting (#4089)
* prompt addition for gpt o-series to encourage markdown formatting of code blocks

* fix to match https://simonwillison.net/tags/markdown/

* chris comment

* chris comment
2025-02-24 00:59:48 +00:00
evan-danswer
4a4e4a6c50 thread utils respect contextvars (#4074)
* thread utils respect contextvars now

* address pablo comments

* removed tenant id from places it was already being passed

* fix rate limit check and pablo comment
2025-02-24 00:43:21 +00:00
evan-danswer
e304ec4ab6 Agent search history displayed answer (#4052) 2025-02-19 15:52:16 -08:00
pablonyx
47fd4fa233 Strict Tenant ID Enforcement (#3871)
* strict tenant id enforcement

* k

* k

* nit

* merge

* nit

* k
2025-02-19 00:52:56 +00:00
evan-danswer
2b2ba5478c new is_agentic flag for chatmessages (#4026)
* new is_agentic flag for chatmessages

* added cancelled error to db

* added cancelled error to returned message
2025-02-18 04:20:33 +00:00
evan-danswer
5ca898bde2 Force use tool overrides (#4024)
* initial rename + timeout bump

* query override
2025-02-17 21:01:24 +00:00
joachim-danswer
86bd121806 no reranking if local model w/o GPU for Agent Search (#4011)
* no reranking if local model w/o GPU

* more efficient gpu status calling

* fix unit tests

---------

Co-authored-by: Evan Lohn <evan@danswer.ai>
2025-02-17 14:13:24 +00:00
evan-danswer
217569104b added context type for when internet search tool is used (#3930) 2025-02-08 20:44:38 -08:00
evan-danswer
29f5f4edfa fixed citations when sections selected (#3914)
* removed some dead code and fixed citations when a search request is made with sections selected

* fix black formatting issue
2025-02-05 22:16:07 +00:00
Yuhong Sun
49fd76b336 Tool Call Error Display (#3897) 2025-02-04 16:12:50 -08:00
pablodanswer
125e5eaab1 various mypy improvements 2025-02-04 12:06:10 -08:00
Yuhong Sun
506a9f1b94 Yuhong 2025-02-03 20:10:51 -08:00
Evan Lohn
71304e4228 always persist in agent search 2025-02-03 20:10:51 -08:00
Evan Lohn
29440f5482 alembic heads, basic citations, search pipeline state 2025-02-03 20:10:51 -08:00
Evan Lohn
5a95a5c9fd large number of PR comments addressed 2025-02-03 20:10:51 -08:00
Evan Lohn
118e8afbef reworked config to have logical structure 2025-02-03 20:10:51 -08:00
Evan Lohn
e191e514b9 fixed find and replace issue 2025-02-03 20:10:51 -08:00
joachim-danswer
0578c31522 rename retrieval & consolidate_sub_answers (initial and refinement) 2025-02-03 20:10:51 -08:00
Evan Lohn
6c7f8eaefb first pass at dead code deletion 2025-02-03 20:10:50 -08:00
Evan Lohn
110c9f7e1b nit 2025-02-03 20:10:50 -08:00
Evan Lohn
1a22af4f27 AgentPromptConfig in Answer class 2025-02-03 20:10:50 -08:00
Evan Lohn
7d494cd65e allowed empty Search Tool for non-agentic search 2025-02-03 20:10:50 -08:00
Evan Lohn
2d8486bac4 stop infos when done streaming answers 2025-02-03 20:10:50 -08:00