16 Commits

Author SHA1 Message Date
pablonyx
65fd8b90a8
add image indexing tests (#4477)
* address file path

* k

* update

* update

* nit- fix typing

* k

* should path

* in a good state

* k

* k

* clean up file

* update

* update

* k

* k

* k
2025-04-11 22:16:37 +00:00
rkuo-danswer
24184024bb
Bugfix/dependency updates (#4482)
* bump fastapi and starlette

* bumping llama index and nltk and associated deps

* bump to fix python-multipart

* bump aiohttp

* update package lock for examples/widget

* bump black

* sentencesplitter has changed namespaces

* fix reorder import check, fix missing passlib

* update package-lock.json

* black formatter updated

* reformatted again

* change to black compatible reorder

* change to black compatible reorder-python-imports fork

* fix pytest dependency

* black format again

* we don't need cdk.txt. update packages to be consistent across all packages

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
Co-authored-by: Richard Kuo <rkuo@rkuo.com>
2025-04-10 08:23:02 +00:00
pablonyx
0acd50b75d docx bugfix 2025-04-04 18:20:31 -07:00
pablonyx
c3c9a0e57c
Docx parsing (#4455)
* looks okay

* k

* k

* k

* update values

* k

* quick fix
2025-04-04 23:36:43 +00:00
pablonyx
3a3b2a2f8d add user files (#4152) 2025-04-01 16:19:44 -07:00
rkuo-danswer
f08fa878a6
refactor file extension checking and add test for blob s3 (#4369)
* refactor file extension checking and add test for blob s3

* code review

* fix checking ext

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-03-27 18:57:44 +00:00
rkuo-danswer
036648146d
possible fix for confluence query filter (#4280)
* possible fix for confluence query filter

* nuke the attachment filter query ... it doesn't work!

---------

Co-authored-by: Richard Kuo (Onyx) <rkuo@onyx.app>
2025-03-27 00:35:14 +00:00
pablonyx
1b7d710b2a
Fix links from file metadata (#4324)
* quick fix

* clarify comment

* fix file metadata

* k
2025-03-22 18:21:47 +00:00
pablonyx
f87e559cc4
Separate out indexing-time image analysis into new phase (#4228)
* Separate out indexing-time image analysis into new phase

* looking good

* k

* k
2025-03-12 22:26:05 +00:00
pablonyx
5883336d5e
Support image indexing customization (#4261)
* working well

* k

* ready to go

* k

* minor nits

* k

* quick fix

* k

* k
2025-03-12 20:03:45 +00:00
pablonyx
20f2b9b2bb
Add image support for search (#4090)
* add support for image search

* quick fix up

* k

* k

* k

* k

* nit

* quick fix for connector tests
2025-03-05 17:44:18 +00:00
pablonyx
47fd4fa233
Strict Tenant ID Enforcement (#3871)
* strict tenant id enforcement

* k

* k

* nit

* merge

* nit

* k
2025-02-19 00:52:56 +00:00
pablodanswer
125e5eaab1 various mypy improvements 2025-02-04 12:06:10 -08:00
pablonyx
2ad86aa9a6
Unstructured fix (#3809)
* fix v1

* temporary patch for pdfs

* nit
2025-01-28 16:46:27 +00:00
Chris Weaver
1db778baa8
Small airtable refactor + handle files with uppercase extensions (#3598)
* Small airtable refactor + handle files with uppercase extensions

* Fix mypy
2025-01-05 11:27:50 -08:00
pablodanswer
21ec5ed795 welcome to onyx 2024-12-13 09:56:10 -08:00