Reduce ranking scores for short chunks without actual information (#4098)

* remove title for slack

* initial working code

* simplification

* improvements

* name change to information_content_model

* avoid boost_score > 1.0

* nit

* EL comments and improvements

Improvements:
  - proper import of information content model from cache or HF
  - warm up for information content model

Other:
  - EL PR review comments

* nit

* requirements version update

* fixed docker file

* new home for model_server configs

* default off

* small updates

* YS comments - pt 1

* renaming to chunk_boost & chunk table def

* saving and deleting chunk stats in new table

* saving and updating chunk stats

* improved dict score update

* create columns for individual boost factors

* RK comments

* Update migration

* manual import reordering
joachim-danswer
2025-03-13 10:35:45 -07:00
committed by GitHub
parent ba82888e1e
commit 463340b8a1
31 changed files with 898 additions and 34 deletions

@@ -99,6 +99,7 @@ def generate_dummy_chunk(
),
document_sets={document_set for document_set in document_set_names},
boost=random.randint(-1, 1),
aggregated_chunk_boost_factor=random.random(),
tenant_id=POSTGRES_DEFAULT_SCHEMA,
)
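
For illustration only, a minimal sketch of how a per-chunk boost factor could lower a ranking score without ever raising it, in the spirit of the "avoid boost_score > 1.0" item above. The function name `apply_chunk_boost`, the multiplicative combination, and the clamp to [0.0, 1.0] are assumptions made for this example and are not taken from the commit; only the field name `aggregated_chunk_boost_factor` appears in the diff.

```python
def apply_chunk_boost(
    relevance_score: float, aggregated_chunk_boost_factor: float
) -> float:
    """Scale a relevance score by a chunk boost factor.

    Hypothetical helper: the factor is clamped to [0.0, 1.0] so that it
    can only reduce a score, never increase it.
    """
    # Clamp the factor; values above 1.0 would boost rather than reduce.
    factor = min(max(aggregated_chunk_boost_factor, 0.0), 1.0)
    # Assumed combination rule: a simple product of score and factor.
    return relevance_score * factor


if __name__ == "__main__":
    # A short, low-information chunk with factor 0.5 has its score halved.
    print(apply_chunk_boost(0.8, 0.5))
    # A factor above 1.0 is clamped, so the score is left unchanged.
    print(apply_chunk_boost(0.8, 1.7))
```

With this clamp, the `random.random()` values used by the dummy chunk generator (which fall in [0.0, 1.0)) pass through unchanged.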