mirror of https://github.com/danswer-ai/danswer.git, synced 2025-09-21 14:12:42 +02:00
Updated Contributing for Celery (#629)
@@ -6,7 +6,7 @@ As an open source project in a rapidly changing space, we welcome all contributi
|
|||||||
|
|
||||||
## 💃 Guidelines
|
## 💃 Guidelines
|
||||||
### Contribution Opportunities
|
### Contribution Opportunities
|
||||||
The [GitHub issues](https://github.com/danswer-ai/danswer/issues) page is a great place to start for contribution ideas.
|
The [GitHub Issues](https://github.com/danswer-ai/danswer/issues) page is a great place to start for contribution ideas.
|
||||||
|
|
||||||
Issues that have been explicitly approved by the maintainers (aligned with the direction of the project)
|
Issues that have been explicitly approved by the maintainers (aligned with the direction of the project)
|
||||||
will be marked with the `approved by maintainers` label.
|
will be marked with the `approved by maintainers` label.
|
||||||
@@ -19,7 +19,9 @@ If you have a new/different contribution in mind, we'd love to hear about it!
|
|||||||
Your input is vital to making sure that Danswer moves in the right direction.
|
Your input is vital to making sure that Danswer moves in the right direction.
|
||||||
Before starting on implementation, please raise a GitHub issue.
|
Before starting on implementation, please raise a GitHub issue.
|
||||||
|
|
||||||
And always feel free to message us (Chris Weaver / Yuhong Sun) on Slack / Discord directly about anything at all.
|
And always feel free to message us (Chris Weaver / Yuhong Sun) on
|
||||||
|
[Slack](https://join.slack.com/t/danswer/shared_invite/zt-1u3h3ke3b-VGh1idW19R8oiNRiKBYv2w) /
|
||||||
|
[Discord](https://discord.gg/TDJ59cGV2X) directly about anything at all.
|
||||||
|
|
||||||
|
|
||||||
### Contributing Code
|
### Contributing Code
|
||||||
@@ -44,8 +46,8 @@ We would love to see you there!
|
|||||||
|
|
||||||
|
|
||||||
## Get Started 🚀
|
## Get Started 🚀
|
||||||
Danswer being a fully functional app, relies on several external pieces of software, specifically:
|
Danswer being a fully functional app, relies on some external pieces of software, specifically:
|
||||||
- Postgres (Relational DB)
|
- [Postgres](https://www.postgresql.org/) (Relational DB)
|
||||||
- [Vespa](https://vespa.ai/) (Vector DB/Search Engine)
|
- [Vespa](https://vespa.ai/) (Vector DB/Search Engine)
|
||||||
|
|
||||||
This guide provides instructions to set up the Danswer specific services outside of Docker because it's easier for
|
This guide provides instructions to set up the Danswer specific services outside of Docker because it's easier for
|
||||||
@@ -54,11 +56,9 @@ development purposes but also feel free to just use the containers and update wi
|
|||||||
|
|
||||||
|
|
||||||
### Local Set Up
|
### Local Set Up
|
||||||
We've tested primarily with Python versions >= 3.11 but the code should work with Python >= 3.9.
|
It is recommended to use Python versions >= 3.11.
|
||||||
|
|
||||||
This guide skips a few optional features for simplicity, reach out if you need any of these:
|
This guide skips setting up User Authentication for the purpose of simplicity
|
||||||
- User Authentication feature
|
|
||||||
- File Connector background job
|
|
||||||
|
|
||||||
|
|
||||||
#### Installing Requirements
|
#### Installing Requirements
|
||||||
@@ -93,18 +93,11 @@ playwright install
|
|||||||
|
|
||||||
|
|
||||||
#### Dependent Docker Containers
|
#### Dependent Docker Containers
|
||||||
First navigate to `danswer/deployment/docker_compose`, then start up the containers with:
|
First navigate to `danswer/deployment/docker_compose`, then start up Vespa and Postgres with:
|
||||||
|
|
||||||
Postgres:
|
|
||||||
```bash
|
```bash
|
||||||
docker compose -f docker-compose.dev.yml -p danswer-stack up -d relational_db
|
docker compose -f docker-compose.dev.yml -p danswer-stack up -d document_index relational_db
|
||||||
```
|
```
|
||||||
|
(document_index refers to Vespa and relational_db refers to Postgres)
|
||||||
Vespa:
|
|
||||||
```bash
|
|
||||||
docker compose -f docker-compose.dev.yml -p danswer-stack up -d index
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
#### Running Danswer
|
#### Running Danswer
|
||||||
|
|
||||||
@@ -115,27 +108,33 @@ mkdir dynamic_config_storage
|
|||||||
|
|
||||||
To start the frontend, navigate to `danswer/web` and run:
|
To start the frontend, navigate to `danswer/web` and run:
|
||||||
```bash
|
```bash
|
||||||
AUTH_TYPE=disabled npm run dev
|
npm run dev
|
||||||
```
|
|
||||||
_for Windows, run:_
|
|
||||||
```bash
|
|
||||||
(SET "AUTH_TYPE=disabled" && npm run dev)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Package the Vespa schema. This will only need to be done when the Vespa schema is updated locally.
|
||||||
|
|
||||||
The first time running Danswer, you will need to run the DB migrations for Postgres.
|
|
||||||
Navigate to `danswer/backend` and with the venv active, run:
|
|
||||||
```bash
|
|
||||||
alembic upgrade head
|
|
||||||
```
|
|
||||||
|
|
||||||
Additionally, we have to package the Vespa schema deployment:
|
|
||||||
Navigate to `danswer/backend/danswer/datastores/vespa/app_config` and run:
|
Navigate to `danswer/backend/danswer/datastores/vespa/app_config` and run:
|
||||||
```bash
|
```bash
|
||||||
zip -r ../vespa-app.zip .
|
zip -r ../vespa-app.zip .
|
||||||
```
|
```
|
||||||
- Note: If you don't have the `zip` utility, you will need to install it prior to running the above
|
- Note: If you don't have the `zip` utility, you will need to install it prior to running the above
|
||||||
|
|
||||||
|
The first time running Danswer, you will also need to run the DB migrations for Postgres.
|
||||||
|
After the first time, this is no longer required unless the DB models change.
|
||||||
|
|
||||||
|
Navigate to `danswer/backend` and with the venv active, run:
|
||||||
|
```bash
|
||||||
|
alembic upgrade head
|
||||||
|
```
|
||||||
|
|
||||||
|
Next, start the task queue which orchestrates the background jobs.
|
||||||
|
Jobs that take more time are run async from the API server.
|
||||||
|
|
||||||
|
Still in `danswer/backend`, run:
|
||||||
|
```bash
|
||||||
|
python ./scripts/dev_run_background_jobs.py
|
||||||
|
```
|
||||||
|
|
||||||
To run the backend API server, navigate back to `danswer/backend` and run:
|
To run the backend API server, navigate back to `danswer/backend` and run:
|
||||||
```bash
|
```bash
|
||||||
AUTH_TYPE=disabled \
|
AUTH_TYPE=disabled \
|
||||||
@@ -153,33 +152,6 @@ powershell -Command "
|
|||||||
"
|
"
|
||||||
```
|
```
|
||||||
|
|
||||||
To run the background job to check for connector updates and index documents, navigate to `danswer/backend` and run:
|
|
||||||
```bash
|
|
||||||
PYTHONPATH=. DYNAMIC_CONFIG_DIR_PATH=./dynamic_config_storage python danswer/background/update.py
|
|
||||||
```
|
|
||||||
_For Windows:_
|
|
||||||
```bash
|
|
||||||
powershell -Command " $env:PYTHONPATH='.'; $env:DYNAMIC_CONFIG_DIR_PATH='./dynamic_config_storage'; python danswer/background/update.py "
|
|
||||||
```
|
|
||||||
|
|
||||||
To run the background job to check for periodically check for document set updates, navigate to `danswer/backend` and run:
|
|
||||||
```bash
|
|
||||||
PYTHONPATH=. DYNAMIC_CONFIG_DIR_PATH=./dynamic_config_storage python danswer/background/document_set_sync_script.py
|
|
||||||
```
|
|
||||||
_For Windows:_
|
|
||||||
```bash
|
|
||||||
powershell -Command " $env:PYTHONPATH='.'; $env:DYNAMIC_CONFIG_DIR_PATH='./dynamic_config_storage'; python danswer/background/document_set_sync_script.py "
|
|
||||||
```
|
|
||||||
|
|
||||||
To run Celery, which handles deletion of connectors + syncing of document sets, navigate to `danswer/backend` and run:
|
|
||||||
```bash
|
|
||||||
PYTHONPATH=. DYNAMIC_CONFIG_DIR_PATH=./dynamic_config_storage celery -A danswer.background.celery worker --loglevel=info --concurrency=1
|
|
||||||
```
|
|
||||||
_For Windows:_
|
|
||||||
```bash
|
|
||||||
powershell -Command " $env:PYTHONPATH='.'; $env:DYNAMIC_CONFIG_DIR_PATH='./dynamic_config_storage'; celery -A danswer.background.celery worker --loglevel=info --concurrency=1 "
|
|
||||||
```
|
|
||||||
|
|
||||||
Note: if you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
|
Note: if you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
|
||||||
|
|
||||||
### Formatting and Linting
|
### Formatting and Linting
|
||||||
|
@@ -1,4 +1,5 @@
|
|||||||
# This file is purely for development use, not included in any builds
|
import argparse
|
||||||
|
import os
|
||||||
import subprocess
|
import subprocess
|
||||||
import threading
|
import threading
|
||||||
|
|
||||||
@@ -16,18 +17,20 @@ def monitor_process(process_name: str, process: subprocess.Popen) -> None:
|
|||||||
break
|
break
|
||||||
|
|
||||||
|
|
||||||
def run_celery() -> None:
|
def run_jobs(exclude_indexing: bool) -> None:
|
||||||
cmd_worker = [
|
cmd_worker = [
|
||||||
"celery",
|
"celery",
|
||||||
"-A",
|
"-A",
|
||||||
"danswer.background.celery",
|
"danswer.background.celery",
|
||||||
"worker",
|
"worker",
|
||||||
|
"--pool=threads",
|
||||||
|
"--autoscale=3,10",
|
||||||
"--loglevel=INFO",
|
"--loglevel=INFO",
|
||||||
"--concurrency=1",
|
"--concurrency=1",
|
||||||
]
|
]
|
||||||
|
|
||||||
cmd_beat = ["celery", "-A", "danswer.background.celery", "beat", "--loglevel=INFO"]
|
cmd_beat = ["celery", "-A", "danswer.background.celery", "beat", "--loglevel=INFO"]
|
||||||
|
|
||||||
# Redirect stderr to stdout for both processes
|
|
||||||
worker_process = subprocess.Popen(
|
worker_process = subprocess.Popen(
|
||||||
cmd_worker, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
|
cmd_worker, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
|
||||||
)
|
)
|
||||||
@@ -35,7 +38,6 @@ def run_celery() -> None:
|
|||||||
cmd_beat, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
|
cmd_beat, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
|
||||||
)
|
)
|
||||||
|
|
||||||
# Monitor outputs using threads
|
|
||||||
worker_thread = threading.Thread(
|
worker_thread = threading.Thread(
|
||||||
target=monitor_process, args=("WORKER", worker_process)
|
target=monitor_process, args=("WORKER", worker_process)
|
||||||
)
|
)
|
||||||
@@ -44,10 +46,37 @@ def run_celery() -> None:
|
|||||||
worker_thread.start()
|
worker_thread.start()
|
||||||
beat_thread.start()
|
beat_thread.start()
|
||||||
|
|
||||||
# Wait for threads to finish
|
if not exclude_indexing:
|
||||||
|
update_env = os.environ.copy()
|
||||||
|
update_env["PYTHONPATH"] = "."
|
||||||
|
update_env["DYNAMIC_CONFIG_DIR_PATH"] = "./dynamic_config_storage"
|
||||||
|
update_env["FILE_CONNECTOR_TMP_STORAGE_PATH"] = "./dynamic_config_storage"
|
||||||
|
cmd_indexing = ["python", "danswer/background/update.py"]
|
||||||
|
|
||||||
|
indexing_process = subprocess.Popen(
|
||||||
|
cmd_indexing,
|
||||||
|
env=update_env,
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
stderr=subprocess.STDOUT,
|
||||||
|
text=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
indexing_thread = threading.Thread(
|
||||||
|
target=monitor_process, args=("INDEXING", indexing_process)
|
||||||
|
)
|
||||||
|
|
||||||
|
indexing_thread.start()
|
||||||
|
indexing_thread.join()
|
||||||
|
|
||||||
worker_thread.join()
|
worker_thread.join()
|
||||||
beat_thread.join()
|
beat_thread.join()
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
run_celery()
|
parser = argparse.ArgumentParser(description="Run background jobs.")
|
||||||
|
parser.add_argument(
|
||||||
|
"--no-indexing", action="store_true", help="Do not run indexing process"
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
run_jobs(args.no_indexing)
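
The pattern the updated script relies on (an opt-out `--no-indexing` flag, plus child processes whose stderr is merged into stdout and read on a thread) can be sketched in isolation. This is a minimal sketch and not the project's code. The body of `monitor_process` is not shown in the diff, so the version here is a hypothetical stand-in, and `parse_exclude_indexing`, `run_monitored`, and the `sink` list are names introduced only for illustration.

```python
import argparse
import subprocess
import sys
import threading


def parse_exclude_indexing(argv: list[str]) -> bool:
    """Return True when --no-indexing is passed, mirroring the flag in the diff."""
    parser = argparse.ArgumentParser(description="Run background jobs.")
    parser.add_argument(
        "--no-indexing", action="store_true", help="Do not run indexing process"
    )
    return parser.parse_args(argv).no_indexing


def monitor_process(
    process_name: str, process: subprocess.Popen, sink: list[str]
) -> None:
    """Hypothetical stand-in: the real monitor_process body is elided in the diff."""
    assert process.stdout is not None
    # stderr lines arrive on this stream too, because of stderr=subprocess.STDOUT.
    for line in process.stdout:
        sink.append(f"{process_name}: {line.rstrip()}")
    process.wait()


def run_monitored(process_name: str, cmd: list[str]) -> list[str]:
    """Start cmd with merged output and collect prefixed lines on a thread."""
    collected: list[str] = []
    process = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    thread = threading.Thread(
        target=monitor_process, args=(process_name, process, collected)
    )
    thread.start()
    thread.join()  # returns once the child closes its output
    return collected
```

Because the flag uses `action="store_true"`, `parse_exclude_indexing([])` yields `False`, so indexing runs unless the caller opts out. `run_monitored` can be tried with any short-lived command, for example `run_monitored("DEMO", [sys.executable, "-c", "print('hi')"])`.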
|
@@ -11,7 +11,7 @@ services:
|
|||||||
uvicorn danswer.main:app --host 0.0.0.0 --port 8080"
|
uvicorn danswer.main:app --host 0.0.0.0 --port 8080"
|
||||||
depends_on:
|
depends_on:
|
||||||
- relational_db
|
- relational_db
|
||||||
- index
|
- document_index
|
||||||
restart: always
|
restart: always
|
||||||
ports:
|
ports:
|
||||||
- "8080:8080"
|
- "8080:8080"
|
||||||
@@ -23,7 +23,7 @@ services:
|
|||||||
- GEN_AI_HOST_TYPE=${GEN_AI_HOST_TYPE:-}
|
- GEN_AI_HOST_TYPE=${GEN_AI_HOST_TYPE:-}
|
||||||
- NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL=${NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL:-}
|
- NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL=${NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL:-}
|
||||||
- POSTGRES_HOST=relational_db
|
- POSTGRES_HOST=relational_db
|
||||||
- VESPA_HOST=index
|
- VESPA_HOST=document_index
|
||||||
- AUTH_TYPE=${AUTH_TYPE:-disabled}
|
- AUTH_TYPE=${AUTH_TYPE:-disabled}
|
||||||
- QA_TIMEOUT=${QA_TIMEOUT:-}
|
- QA_TIMEOUT=${QA_TIMEOUT:-}
|
||||||
- VALID_EMAIL_DOMAINS=${VALID_EMAIL_DOMAINS:-}
|
- VALID_EMAIL_DOMAINS=${VALID_EMAIL_DOMAINS:-}
|
||||||
@@ -60,7 +60,7 @@ services:
|
|||||||
command: /usr/bin/supervisord
|
command: /usr/bin/supervisord
|
||||||
depends_on:
|
depends_on:
|
||||||
- relational_db
|
- relational_db
|
||||||
- index
|
- document_index
|
||||||
restart: always
|
restart: always
|
||||||
environment:
|
environment:
|
||||||
- INTERNAL_MODEL_VERSION=${INTERNAL_MODEL_VERSION:-openai-chat-completion}
|
- INTERNAL_MODEL_VERSION=${INTERNAL_MODEL_VERSION:-openai-chat-completion}
|
||||||
@@ -69,7 +69,7 @@ services:
|
|||||||
- GEN_AI_ENDPOINT=${GEN_AI_ENDPOINT:-}
|
- GEN_AI_ENDPOINT=${GEN_AI_ENDPOINT:-}
|
||||||
- GEN_AI_HOST_TYPE=${GEN_AI_HOST_TYPE:-}
|
- GEN_AI_HOST_TYPE=${GEN_AI_HOST_TYPE:-}
|
||||||
- POSTGRES_HOST=relational_db
|
- POSTGRES_HOST=relational_db
|
||||||
- VESPA_HOST=index
|
- VESPA_HOST=document_index
|
||||||
- API_BASE_OPENAI=${API_BASE_OPENAI:-}
|
- API_BASE_OPENAI=${API_BASE_OPENAI:-}
|
||||||
- API_TYPE_OPENAI=${API_TYPE_OPENAI:-}
|
- API_TYPE_OPENAI=${API_TYPE_OPENAI:-}
|
||||||
- API_VERSION_OPENAI=${API_VERSION_OPENAI:-}
|
- API_VERSION_OPENAI=${API_VERSION_OPENAI:-}
|
||||||
@@ -129,7 +129,7 @@ services:
|
|||||||
- "5432:5432"
|
- "5432:5432"
|
||||||
volumes:
|
volumes:
|
||||||
- db_volume:/var/lib/postgresql/data
|
- db_volume:/var/lib/postgresql/data
|
||||||
index:
|
document_index:
|
||||||
image: vespaengine/vespa:8
|
image: vespaengine/vespa:8
|
||||||
restart: always
|
restart: always
|
||||||
ports:
|
ports:
|
||||||
|
@@ -11,14 +11,14 @@ services:
|
|||||||
uvicorn danswer.main:app --host 0.0.0.0 --port 8080"
|
uvicorn danswer.main:app --host 0.0.0.0 --port 8080"
|
||||||
depends_on:
|
depends_on:
|
||||||
- relational_db
|
- relational_db
|
||||||
- index
|
- document_index
|
||||||
restart: always
|
restart: always
|
||||||
env_file:
|
env_file:
|
||||||
- .env
|
- .env
|
||||||
environment:
|
environment:
|
||||||
- AUTH_TYPE=${AUTH_TYPE:-google_oauth}
|
- AUTH_TYPE=${AUTH_TYPE:-google_oauth}
|
||||||
- POSTGRES_HOST=relational_db
|
- POSTGRES_HOST=relational_db
|
||||||
- VESPA_HOST=index
|
- VESPA_HOST=document_index
|
||||||
volumes:
|
volumes:
|
||||||
- local_dynamic_storage:/home/storage
|
- local_dynamic_storage:/home/storage
|
||||||
- file_connector_tmp_storage:/home/file_connector_storage
|
- file_connector_tmp_storage:/home/file_connector_storage
|
||||||
@@ -33,14 +33,14 @@ services:
|
|||||||
command: /usr/bin/supervisord
|
command: /usr/bin/supervisord
|
||||||
depends_on:
|
depends_on:
|
||||||
- relational_db
|
- relational_db
|
||||||
- index
|
- document_index
|
||||||
restart: always
|
restart: always
|
||||||
env_file:
|
env_file:
|
||||||
- .env
|
- .env
|
||||||
environment:
|
environment:
|
||||||
- AUTH_TYPE=${AUTH_TYPE:-google_oauth}
|
- AUTH_TYPE=${AUTH_TYPE:-google_oauth}
|
||||||
- POSTGRES_HOST=relational_db
|
- POSTGRES_HOST=relational_db
|
||||||
- VESPA_HOST=index
|
- VESPA_HOST=document_index
|
||||||
volumes:
|
volumes:
|
||||||
- local_dynamic_storage:/home/storage
|
- local_dynamic_storage:/home/storage
|
||||||
- file_connector_tmp_storage:/home/file_connector_storage
|
- file_connector_tmp_storage:/home/file_connector_storage
|
||||||
@@ -69,7 +69,7 @@ services:
|
|||||||
- .env
|
- .env
|
||||||
volumes:
|
volumes:
|
||||||
- db_volume:/var/lib/postgresql/data
|
- db_volume:/var/lib/postgresql/data
|
||||||
index:
|
document_index:
|
||||||
image: vespaengine/vespa:8
|
image: vespaengine/vespa:8
|
||||||
restart: always
|
restart: always
|
||||||
ports:
|
ports:
|
||||||
|
@@ -38,7 +38,6 @@ SESSION_EXPIRE_TIME_SECONDS=86400
|
|||||||
|
|
||||||
# The following are for configuring User Authentication, supported flows are:
|
# The following are for configuring User Authentication, supported flows are:
|
||||||
# disabled
|
# disabled
|
||||||
# simple (email/password + user account creation in Danswer)
|
|
||||||
# google_oauth (login with google/gmail account)
|
# google_oauth (login with google/gmail account)
|
||||||
# oidc (only in Danswer enterprise edition)
|
# oidc (only in Danswer enterprise edition)
|
||||||
# saml (only in Danswer enterprise edition)
|
# saml (only in Danswer enterprise edition)
|
||||||
|