* prompt addition for gpt o-series to encourage markdown formatting of code blocks
* fix to match https://simonwillison.net/tags/markdown/
* chris comment
* chris comment
* thread utils respect contextvars now
* address pablo comments
* removed tenant id from places it was already being passed
* fix rate limit check and pablo comment
* added timeouts for agent llm calls
* timing suggestions in agent config
* improved timeout that actually exits early
* added new global timeout and connection timeout distinction
* fixed error raising bug and made entity extraction recoverable
* warnings and refactor
* mypy
---------
Co-authored-by: joachim-danswer <joachim@danswer.ai>
* add timings for syncing
* add more logging
* more debugging
* refactor multipass/db check out of VespaIndex
* circular imports?
* more debugging
* add logs
* various improvements
* additional logs to narrow down issue
* use global httpx pool for the main vespa flows in celery. Use in more places eventually.
* cleanup debug logging, etc
* remove debug logging
* this should use the secondary index
* mypy
* missed some logging
* review fixes
* refactor get_default_document_index to use search settings
* more missed logging
* fix circular refs
---------
Co-authored-by: Richard Kuo (Danswer) <rkuo@onyx.app>
Co-authored-by: pablodanswer <pablo@danswer.ai>
* Add support for filtering 0xFDD0-0xFDEF Unicode range
- Update remove_invalid_unicode_chars to handle 0xFDD0-0xFDEF range
- Add comprehensive test cases for Unicode character sanitization
- Fix issue with illegal code point 0xFDDB in Vespa indexing
Co-Authored-By: Chris Weaver <chris@onyx.app>
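For illustration, a minimal sketch of what filtering the 0xFDD0-0xFDEF range can look like. This is not the actual remove_invalid_unicode_chars implementation: the function name `strip_noncharacters` and the pattern are hypothetical, and including U+FFFE/U+FFFF alongside the range is an assumption.

```python
import re

# Hypothetical pattern: the U+FDD0..U+FDEF noncharacter block named in the
# commit, plus U+FFFE and U+FFFF (an assumption, also Unicode noncharacters).
_NONCHARACTERS = re.compile(r"[\uFDD0-\uFDEF\uFFFE\uFFFF]")


def strip_noncharacters(text: str) -> str:
    """Return text with the matched noncharacter code points removed."""
    return _NONCHARACTERS.sub("", text)


if __name__ == "__main__":
    # U+FDDB is the code point mentioned in the commit message.
    print(repr(strip_noncharacters("abc\uFDDBdef")))
```

Code points just outside the range (for example U+FDCF and U+FDF0) are assigned Arabic characters and are left untouched by this pattern.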
* Remove unused pytest import
Co-Authored-By: Chris Weaver <chris@onyx.app>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Chris Weaver <chris@onyx.app>
* Added Permission Syncing for Salesforce
* cleanup
* updated connector doc conversion
* finished salesforce permission syncing
* fixed connector to batch Salesforce queries
* tests!
* k
* Added error handling and check for ee and sync type for postprocessing
* comments
* minor touchups
* tested to work!
* done
* mypy
* lil cleanup
* minor comment
- renamed post-reranking/validation citation information consistently to final_... (example: doc_id_to_rank_map -> final_doc_id_to_rank_map)
- changed and renamed objects containing initial ranking information (now: display_...) to be consistent with the final rankings (final_...). Specifically, displayed_search_results changed from {} to []
- for CitationInfo, changed citation_num from 'x-th citation in response stream' to the initial position of the doc [NOTE: test implications]
- changed tests:
  - onyx/backend/tests/unit/onyx/chat/stream_processing/test_citation_processing.py
  - onyx/backend/tests/unit/onyx/chat/stream_processing/test_citation_substitution.py
* Fix mismatch between documents shown and citation numbers in text
When the document order presented to the LLM differs from the order shown to the user, the wrong doc numbers are cited.
Fix:
- SearchTool.get_search_result now returns both the final and the initial ranking
- initial ranking is passed through a few objects and used for replacement in citation processing
Notes:
- the citation_num in the CitationInfo() object has not been changed.
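A rough sketch of the substitution idea described above, under stated assumptions: citations appear in the text as `[n]`, where n is a 1-based position in the ranking given to the LLM. The helpers `build_citation_map` and `substitute_citations` are hypothetical names for illustration, not the functions in the codebase.

```python
import re


def build_citation_map(final_doc_ids: list[str], displayed_doc_ids: list[str]) -> dict[int, int]:
    """Map 1-based positions in the final (LLM) ranking to 1-based display positions."""
    display_pos = {doc_id: i + 1 for i, doc_id in enumerate(displayed_doc_ids)}
    return {
        i + 1: display_pos[doc_id]
        for i, doc_id in enumerate(final_doc_ids)
        if doc_id in display_pos
    }


def substitute_citations(text: str, citation_map: dict[int, int]) -> str:
    """Rewrite each [n] to the displayed position; unknown numbers are left as-is."""

    def repl(match: re.Match) -> str:
        n = int(match.group(1))
        return f"[{citation_map.get(n, n)}]"

    # A single re.sub pass, so an already-rewritten number is never remapped again.
    return re.sub(r"\[(\d+)\]", repl, text)


if __name__ == "__main__":
    final = ["doc_b", "doc_c", "doc_a"]      # order presented to the LLM
    displayed = ["doc_a", "doc_b", "doc_c"]  # order shown to the user
    mapping = build_citation_map(final, displayed)
    print(substitute_citations("See [1] and [3].", mapping))
```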
* PR fixes
- linting
- removed erroneous tab
- added a substitution test case
- adjusted original citation extraction use case
* Included a key test and
* Fixed extra spaces
* Updated test documentation
Updated:
- test_citation_substitution (changed description)
- test_citation_processing (removed data only relevant for the substitution)
* make pywikibot store its working files in a system provided temp directory
* move the config setting around
---------
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
* try rate limiting through redis
* fix circular import issue
* fix bad formatting of family string
* Revert "fix bad formatting of family string"
This reverts commit be688899e5b4dd189dc13d9fec1d0f3ade07ad4f.
* redis usage optional
* disable test that doesn't match with new design
* fix formatting
* fix poorly structured doc id, fix empty page id, fix family_class_dispatch invalid name (no spaces), fix setting id with int pageid
* fix mediawiki test