16 Commits

Author SHA1 Message Date
rkuo-danswer
4fe99d05fd
add timings for syncing (#3798)
* add timings for syncing

* add more logging

* more debugging

* refactor multipass/db check out of VespaIndex

* circular imports?

* more debugging

* add logs

* various improvements

* additional logs to narrow down issue

* use global httpx pool for the main vespa flows in celery. Use in more places eventually.

* cleanup debug logging, etc

* remove debug logging

* this should use the secondary index

* mypy

* missed some logging

* review fixes

* refactor get_default_document_index to use search settings

* more missed logging

* fix circular refs

---------

Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: pablodanswer <pablo@danswer.ai>
2025-01-29 23:24:44 +00:00
devin-ai-integration[bot]
b82123563b
Fix Unicode sanitization for Vespa document indexing (#3831)
* Add support for filtering 0xFDD0-0xFDEF Unicode range

- Update remove_invalid_unicode_chars to handle 0xFDD0-0xFDEF range
- Add comprehensive test cases for Unicode character sanitization
- Fix issue with illegal code point 0xFDDB in Vespa indexing

Co-Authored-By: Chris Weaver <chris@onyx.app>

* Remove unused pytest import

Co-Authored-By: Chris Weaver <chris@onyx.app>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Chris Weaver <chris@onyx.app>
2025-01-29 18:32:00 +00:00
pablonyx
8ddd95d0d4
Fix exceptional seeding delay (#3723)
* fix seeding waiting

* k

* updated
2025-01-21 01:02:13 +00:00
pablodanswer
d98746b988 remove unnecessary logs 2025-01-08 17:03:15 -08:00
pablonyx
d7bc32c0ec
Fully remove visit API (#3621)
* v1

* update indexing logic

* update updates

* nit

* clean up args

* update for clarity + best practices

* nit + logs

* fix

* minor clean up

* remove logs

* quick nit
2025-01-08 13:49:01 -08:00
pablonyx
82eab9d704
Doc explore fix (#3614)
* k

* k

* add comment
2025-01-06 19:42:07 +00:00
pablonyx
e3b2c9d944
Tracking update (#3605)
* tracking update

* k
2025-01-06 17:17:00 +00:00
pablonyx
ddec239fef
Improved indexing (#3594)
* nit

* k

* add steps

* main util functions

* functioning fully

* quick nit

* k

* typing fix

* k

* address comments
2025-01-05 23:31:53 +00:00
Richard Kuo
8f0fb70bbf fix formatting 2025-01-02 23:21:54 -08:00
rkuo-danswer
2232702e99
retry the individual delete's (#3580)
* retry the individual delete's

* need to raise inside the retry

* just use retry for now

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2025-01-02 17:39:37 -08:00
Richard Kuo
3eb72e5c1d Revert "More efficient Vespa indexing (#3552)"
This reverts commit 27832167819f063bc2f4436765ff27f0ccba2b64.
2024-12-31 09:40:23 -08:00
pablonyx
2783216781
More efficient Vespa indexing (#3552)
---------

Co-authored-by: Chris Weaver <25087905+Weves@users.noreply.github.com>
2024-12-30 18:51:14 -08:00
rkuo-danswer
e9b10e8b41
temporarily disabling validate indexing fences (#3502)
* temporarily disabling validate indexing fences

* add back a few startup checks in the cloud

* use common vespa client to perform health check

* log vespa url and try using http1 on light worker index methods

---------

Co-authored-by: Richard Kuo <rkuo@rkuo.com>
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
2024-12-19 01:32:09 +00:00
pablonyx
2847ab003e
Prompting (#3372)
* auto generate start prompts

* post rebase clean up

* update for clarity
2024-12-16 21:34:43 +00:00
pablodanswer
d0483dd269 temporary vespa bump for tests 2024-12-15 21:41:21 -08:00
pablodanswer
21ec5ed795 welcome to onyx 2024-12-13 09:56:10 -08:00