101 Commits

Author SHA1 Message Date
rkuo-danswer
3404c7eb1d
Feature/background prune 2 (#2583)
* first cut at redis

* some new helper functions for the db

* ignore kombu tables in alembic migrations (used by celery)

* multiline commands for readability, add vespa_metadata_sync queue to worker

* typo fix

* fix returning tuple fields

* add constants

* fix _get_access_for_document

* docstrings!

* fix double function declaration and typing

* fix type hinting

* add a global redis pool

* Add get_document function

* use task_logger in various celery tasks

* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit

* Add celery redis helper. used in a subsequent PR

* kombu warning getting spammy since celery is not self managing its queue in Postgres any more

* add last_modified and last_synced to documents

* fix task naming convention

* use celeryconfig.py

* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc

* change vespa index log line to debug

* mypy fixes

* update alembic migration

* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call

* mypy

* switch to monotonic time

* fix startup dependencies on redis

* rebase alembic migration

* kombu cleanup - fail silently

* mypy

* add redis_host environment override

* update REDIS_HOST env var in docker-compose.dev.yml

* update the rest of the docker files

* in flight

* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.

* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date

* add back writing to vespa on indexing

* actually working connector deletion

* update contributing guide

* backporting fixes from background_deletion

* renaming cache to cache_volume

* add redis password to various deployments

* try setting up pr testing for helm

* fix indent

* hopefully this release version actually exists

* fix command line option to --chart-dirs

* fetch-depth 0

* edit values.yaml

* try setting ct working directory

* bypass testing only on change for now

* move files and lint them

* update helm testing

* some issues suggest using --config works

* add vespa repo

* add postgresql repo

* increase timeout

* try amd64 runner

* fix redis password reference

* add comment to helm chart testing workflow

* rename helm testing workflow to disable it

* adding clarifying comments

* address code review

* missed a file

* remove commented warning ... just not needed

* fix imports

* refactor to use update_single

* mypy fixes

* add vespa test

* multiple celery workers

* update logs as well and set prefetch multipliers appropriate to the worker intent

* add db refresh to connector deletion

* add some preliminary locking

* organize tasks into separate files

* celery auto associates tasks created inside another task, which bloats the result metadata considerably. trail=False prevents this.

* code review fixes

* move monitor_usergroup_taskset to ee, improve logging

* add multi workers to dev_run_background_jobs.py

* update supervisord with some recommended settings for celery

* name celery workers and shorten dev script prefixing

* add configurable sql alchemy engine settings on startup (needed for various intents like API server, different celery workers and tasks, etc)

* fix comments

* autoscale sqlalchemy pool size to celery concurrency (allow override later?)

* supervisord needs the percent symbols escaped

* use name as primary check, some minor refactoring and type hinting too.

* stash merge (may not function yet)

* remove dead code

* more cleanup

* remove dead file

* we shouldn't be checking for deletion attempts in the db any more

* print cc_pair_id

* print status on status mismatch again

* add logging when cc_pair isn't present

* don't index any ingestion type connectors, and don't pause any connectors that aren't active

* add more specific check for deletion completion

* remove flaky mediawiki test site

* move is_pruning

* remove unused code

* remove old function

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-10-07 18:16:17 +00:00
evan-danswer
089c734f63
disabled llm when skip_gen_ai_answer_question set (#2687)
* disabled llm when skip_gen_ai_answer_question set

* added unit test

* typing
2024-10-06 18:10:02 +00:00
Chris Weaver
728a41a35a
Add heartbeat to indexing (#2595) 2024-09-29 19:26:40 -07:00
Chris Weaver
50dd3c8beb
Add size limit to jira tickets (#2586) 2024-09-28 12:49:13 -07:00
pablodanswer
316b6b99ea
Tooling testing (#2533)
* add initial testing

* add custom tool testing

* update ports

* update tests - additional coverage

* update types
2024-09-23 20:09:01 +00:00
pablodanswer
f404c4b448
Move code block default language creation to citation processing (#2501)
* move code block default language creation to citation processing

* add test cases

* update copy
2024-09-19 06:00:58 +00:00
rkuo-danswer
f531d071af
Feature/background deletion (#2337)
* first cut at redis

* some new helper functions for the db

* ignore kombu tables in alembic migrations (used by celery)

* multiline commands for readability, add vespa_metadata_sync queue to worker

* typo fix

* fix returning tuple fields

* add constants

* fix _get_access_for_document

* docstrings!

* fix double function declaration and typing

* fix type hinting

* add a global redis pool

* Add get_document function

* use task_logger in various celery tasks

* add celeryconfig.py to simplify configuration. Will be used in a subsequent commit

* Add celery redis helper. used in a subsequent PR

* kombu warning getting spammy since celery is not self managing its queue in Postgres any more

* add last_modified and last_synced to documents

* fix task naming convention

* use celeryconfig.py

* the big one. adds queues and tasks, updates functions to use the queues with priorities, etc

* change vespa index log line to debug

* mypy fixes

* update alembic migration

* fix fence ordering, rename to "monitor", fix fetch_versioned_implementation call

* mypy

* switch to monotonic time

* fix startup dependencies on redis

* rebase alembic migration

* kombu cleanup - fail silently

* mypy

* add redis_host environment override

* update REDIS_HOST env var in docker-compose.dev.yml

* update the rest of the docker files

* in flight

* harden indexing-status endpoint against db changes happening in the background. Needs further improvement but OK for now.

* allow no task syncs to run because we create certain objects with no entries but initially marked as out of date

* add back writing to vespa on indexing

* actually working connector deletion

* update contributing guide

* backporting fixes from background_deletion

* renaming cache to cache_volume

* add redis password to various deployments

* try setting up pr testing for helm

* fix indent

* hopefully this release version actually exists

* fix command line option to --chart-dirs

* fetch-depth 0

* edit values.yaml

* try setting ct working directory

* bypass testing only on change for now

* move files and lint them

* update helm testing

* some issues suggest using --config works

* add vespa repo

* add postgresql repo

* increase timeout

* try amd64 runner

* fix redis password reference

* add comment to helm chart testing workflow

* rename helm testing workflow to disable it

* adding clarifying comments

* address code review

* missed a file

* remove commented warning ... just not needed

* fix imports

* refactor to use update_single

* mypy fixes

* add vespa test

* add db refresh to connector deletion

* code review fixes

* move monitor_usergroup_taskset to ee, improve logging

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2024-09-18 16:50:11 +00:00
rkuo-danswer
2fe49e5efb
add ssl testing for redis against a cloud instance (#2422) 2024-09-13 10:28:04 -07:00
pablodanswer
ebe3674ca7
update for edge case (#2336) 2024-09-05 17:58:49 +00:00
hagen-danswer
61b5bd569b
Reworked chunking to support mega chunks (#2032) 2024-08-14 22:18:53 -07:00
Yuhong Sun
d60fb15ad3
Allowing users to set Search Settings (#2106) 2024-08-10 20:48:58 -07:00
pablodanswer
0a8d44b44c
quote processing for lengthy intros (#2103) 2024-08-10 11:09:45 -07:00
pablodanswer
cc8a6da8e3
improve llm-generated citations (account for edge case) (#2096)
* improve llm-generated citations (account for edge case)

* additional test case
2024-08-10 02:06:39 +00:00
rkuo-danswer
be9ed319d5
add unit test for quotes (#2085)
* add unit test for quotes

* test answer and quotes together
2024-08-08 18:20:07 +00:00
pablodanswer
9eb48ca2c3
account for empty links + fix quote processing 2024-08-07 20:55:18 -07:00
rkuo-danswer
fcc4c30ead
don't skip the start of the json answer value (#2067) 2024-08-06 23:59:13 +00:00
Yuhong Sun
036d5c737e
No Null Embeddings (#1982) 2024-07-30 19:54:49 -07:00
hagen-danswer
3938a053aa
Rework tokenizer (#1957) 2024-07-29 23:01:49 -07:00
rkuo-danswer
4a0a927a64
fix removed parameter in MediaWikiConnector (#1970) 2024-07-29 18:47:30 +00:00
Yuhong Sun
9651ea828b
Handling Metadata by Vector and Keyword (#1909) 2024-07-24 11:05:56 -07:00
Weves
0d52e99bd4
Improve confluence rate limiting 2024-07-14 16:40:45 -07:00
Yuhong Sun
e90c66c1b6
Include Titles in Chunks (#1817) 2024-07-12 09:42:24 -07:00
Yuhong Sun
08c6e821e7
Merge Sections Logic (#1801) 2024-07-10 20:14:02 -07:00
pablodanswer
09a11b5e1a
Fix citations + unit tests (#1760) 2024-07-10 10:05:20 -07:00
Weves
97d058b8b2
Fix mypy for mediawiki tests 2024-05-25 17:16:47 -07:00
Andrew Sansom
94018e83b0
Add MediaWiki and Wikipedia Connectors (#1250)
* Add MediaWikiConnector first draft

* Add MediaWikiConnector first draft

* Add MediaWikiConnector first draft

* Add MediaWikiConnector sections for each document

* Add MediaWikiConnector to constants and factory

* Integrate MediaWikiConnector with connectors page

* Unit tests + bug fixes

* Allow adding multiple mediawikiconnectors

* add wikipedia connector

* add wikipedia connector to factory

* improve docstrings of mediawiki connector backend

* improve docstrings of mediawiki connector backend

* move wikipedia and mediawiki icon locations in admin page

* undo accidental commit of modified docker compose yaml
2024-05-24 08:51:20 -07:00
Yuhong Sun
546815dc8c
Consolidate File Processing (#1449) 2024-05-11 23:11:22 -07:00
Yuhong Sun
a17060af5a
Provide Additional Context for Chunk Options in APIs (#1330) 2024-04-14 18:32:22 -07:00
Weves
f135ba9c0c
Rework LLM answering flow 2024-03-25 13:34:03 -07:00
Itay
a4053501d0
CI: adding prettier to pre-commit (#1009) 2024-01-28 13:03:39 -08:00
Itay
0ce992e22e
CI: Run Python tests (#1001) 2024-01-28 12:59:51 -08:00
Jeremi Joslin
d07647c597
Fix typo in gmail test connector (#981) 2024-01-24 12:01:26 -08:00
Itay
692fdb4597
Gmail Connector (#946)
---------

Co-authored-by: Yuhong Sun <yuhongsun96@gmail.com>
2024-01-22 16:25:10 -08:00
Yuhong Sun
65fde8f1b3
Chat Backend (#801) 2023-12-14 22:14:37 -08:00
Weves
08909b40b0
Add rate limiting wrapper + add to Document360 2023-10-29 18:00:17 -07:00
Yuhong Sun
fe117513b0
Reorganize and Cleanup for Hybrid Search (#643) 2023-10-28 14:24:28 -07:00
Weves
3554e29b8d
Add updated_at to UI + add time range selector 2023-10-23 23:32:16 -07:00
Yuhong Sun
6a449f1fb1
Introduce Recency Bias (#592) 2023-10-19 12:54:35 -07:00
Chris Weaver
1bd76f528f
Document explorer admin page (#590) 2023-10-18 18:41:39 -07:00
Yuhong Sun
a5d2759fbc
Recreate Tables from HTML (#588) 2023-10-18 11:16:40 -07:00
Yuhong Sun
595f61ea3a
Add Retrieval to Chat History (#577) 2023-10-15 13:40:07 -07:00
Yuhong Sun
6b305c56b3
Use Sentence Aware Splitter (#452) 2023-09-16 16:28:16 -07:00
Weves
cf2bd8a40c
highlighting 2023-09-12 11:35:37 -07:00
Yuhong Sun
ec4d0b856c
Added boost to rerank step (#360) 2023-08-30 23:12:55 -07:00
Yuhong Sun
2a339ec34b
Prevent too many tokens to GPT (#245) 2023-07-28 16:00:26 -07:00
Yuhong Sun
d6ca865034
Support GPT4All in memory (#230) 2023-07-23 12:26:14 -07:00
Yuhong Sun
e4820045f9
Add metadata to GPT (#140) 2023-07-14 16:54:42 -07:00
Chris Weaver
2f54795631
Basic Slack Bot Support (#128) 2023-07-03 14:26:33 -07:00
Yuhong Sun
6d1b750077
DAN-59 Fix all the mypy issues (#38) 2023-05-12 20:37:52 -07:00
Chris Weaver
e390906ac1
Make Document source required (#4) 2023-04-29 16:49:27 -07:00