Updated Contributing for Celery (#629)

This commit is contained in:
Yuhong Sun 2023-10-25 18:26:02 -07:00 committed by GitHub
parent fbb05e630d
commit 9a51745fc9
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 74 additions and 74 deletions


@@ -6,7 +6,7 @@ As an open source project in a rapidly changing space, we welcome all contributi
## 💃 Guidelines
### Contribution Opportunities
The [GitHub Issues](https://github.com/danswer-ai/danswer/issues) page is a great place to start for contribution ideas.
Issues that have been explicitly approved by the maintainers (aligned with the direction of the project)
will be marked with the `approved by maintainers` label.
@@ -19,7 +19,9 @@ If you have a new/different contribution in mind, we'd love to hear about it!
Your input is vital to making sure that Danswer moves in the right direction.
Before starting on implementation, please raise a GitHub issue.
And always feel free to message us (Chris Weaver / Yuhong Sun) on
[Slack](https://join.slack.com/t/danswer/shared_invite/zt-1u3h3ke3b-VGh1idW19R8oiNRiKBYv2w) /
[Discord](https://discord.gg/TDJ59cGV2X) directly about anything at all.
### Contributing Code
@@ -44,8 +46,8 @@ We would love to see you there!
## Get Started 🚀
Danswer, being a fully functional app, relies on some external pieces of software, specifically:
- [Postgres](https://www.postgresql.org/) (Relational DB)
- [Vespa](https://vespa.ai/) (Vector DB/Search Engine)
This guide provides instructions to set up the Danswer-specific services outside of Docker because it's easier for
@@ -54,11 +56,9 @@ development purposes but also feel free to just use the containers and update wi
### Local Set Up
It is recommended to use Python versions >= 3.11.
This guide skips setting up User Authentication for simplicity.
#### Installing Requirements
@@ -93,18 +93,11 @@ playwright install
#### Dependent Docker Containers
First navigate to `danswer/deployment/docker_compose`, then start up Vespa and Postgres with:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d document_index relational_db
```
(document_index refers to Vespa and relational_db refers to Postgres)
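Before moving on, you can confirm that both containers are accepting connections. Below is a minimal sketch; the host ports are assumptions based on the defaults mapped in `docker-compose.dev.yml` (5432 for Postgres, 19071 for the Vespa config server):

```python
# Poll a TCP port until it accepts connections; returns False on timeout.
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(1)
    return False


# Example (ports assumed from the compose file's defaults):
# wait_for_port("localhost", 5432)   # Postgres
# wait_for_port("localhost", 19071)  # Vespa config server
```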
#### Running Danswer
@@ -115,27 +108,33 @@ mkdir dynamic_config_storage
To start the frontend, navigate to `danswer/web` and run:
```bash
npm run dev
```
Package the Vespa schema. This will only need to be done when the Vespa schema is updated locally.
Navigate to `danswer/backend/danswer/datastores/vespa/app_config` and run:
```bash
zip -r ../vespa-app.zip .
```
- Note: If you don't have the `zip` utility, you will need to install it prior to running the above
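If `zip` isn't available, the same archive can be produced with Python's standard library. This is a sketch (`zip_dir` is a helper name made up for this example) that mirrors `zip -r ../vespa-app.zip .`:

```python
# Build vespa-app.zip without the `zip` CLI.
import os
import zipfile


def zip_dir(src_dir: str, out_path: str) -> None:
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the schema directory,
                # just like zipping from inside it
                zf.write(full, os.path.relpath(full, src_dir))


# Run from danswer/backend/danswer/datastores/vespa/app_config:
if os.path.basename(os.getcwd()) == "app_config":
    zip_dir(".", "../vespa-app.zip")
```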
The first time running Danswer, you will also need to run the DB migrations for Postgres.
After the first time, this is no longer required unless the DB models change.
Navigate to `danswer/backend` and with the venv active, run:
```bash
alembic upgrade head
```
Next, start the task queue, which orchestrates the background jobs.
Longer-running jobs are run asynchronously, outside of the API server.
Still in `danswer/backend`, run:
```bash
python ./scripts/dev_run_background_jobs.py
```
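Under the hood, `dev_run_background_jobs.py` starts the Celery worker, Celery beat, and (unless `--no-indexing` is passed) the indexing job, relaying each child process's output from a monitor thread. The core pattern, reduced to a runnable sketch (the label and demo child process are made up for illustration; the real script prints the lines rather than collecting them):

```python
# Minimal version of the monitor pattern: spawn a child process and
# relay its combined stdout/stderr, prefixed with a label.
import subprocess
import sys
import threading

captured: list[str] = []  # collected here for demonstration only


def monitor_process(name: str, process: subprocess.Popen) -> None:
    assert process.stdout is not None
    for line in process.stdout:
        captured.append(f"{name}: {line.rstrip()}")


proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
t = threading.Thread(target=monitor_process, args=("DEMO", proc))
t.start()
t.join()
proc.wait()
```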
To run the backend API server, navigate back to `danswer/backend` and run:
```bash
AUTH_TYPE=disabled \
@@ -153,33 +152,6 @@ powershell -Command "
"
```
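To sanity-check that the API server came up, you can poll it over HTTP. A sketch follows; port 8080 comes from the uvicorn command above, and `/docs` relies on FastAPI's default interactive-docs route, which is an assumption here:

```python
# Return True once an HTTP endpoint answers with a 2xx status.
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


# http_ok("http://localhost:8080/docs")
```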
Note: if you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.
### Formatting and Linting


@@ -1,4 +1,5 @@
# This file is purely for development use, not included in any builds
import argparse
import os
import subprocess
import threading
@@ -16,18 +17,20 @@ def monitor_process(process_name: str, process: subprocess.Popen) -> None:
            break


def run_jobs(exclude_indexing: bool) -> None:
    cmd_worker = [
        "celery",
        "-A",
        "danswer.background.celery",
        "worker",
        "--pool=threads",
        "--autoscale=3,10",
        "--loglevel=INFO",
    ]

    cmd_beat = ["celery", "-A", "danswer.background.celery", "beat", "--loglevel=INFO"]

    # Redirect stderr to stdout for both processes
    worker_process = subprocess.Popen(
        cmd_worker, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
@@ -35,7 +38,6 @@ def run_celery() -> None:
        cmd_beat, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )

    worker_thread = threading.Thread(
        target=monitor_process, args=("WORKER", worker_process)
    )
@@ -44,10 +46,37 @@ def run_celery() -> None:
    worker_thread.start()
    beat_thread.start()

    if not exclude_indexing:
        update_env = os.environ.copy()
        update_env["PYTHONPATH"] = "."
        update_env["DYNAMIC_CONFIG_DIR_PATH"] = "./dynamic_config_storage"
        update_env["FILE_CONNECTOR_TMP_STORAGE_PATH"] = "./dynamic_config_storage"
        cmd_indexing = ["python", "danswer/background/update.py"]

        indexing_process = subprocess.Popen(
            cmd_indexing,
            env=update_env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        indexing_thread = threading.Thread(
            target=monitor_process, args=("INDEXING", indexing_process)
        )
        indexing_thread.start()
        indexing_thread.join()

    worker_thread.join()
    beat_thread.join()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run background jobs.")
    parser.add_argument(
        "--no-indexing", action="store_true", help="Do not run indexing process"
    )
    args = parser.parse_args()

    run_jobs(args.no_indexing)


@@ -11,7 +11,7 @@ services:
      uvicorn danswer.main:app --host 0.0.0.0 --port 8080"
    depends_on:
      - relational_db
      - document_index
    restart: always
    ports:
      - "8080:8080"
@@ -23,7 +23,7 @@ services:
      - GEN_AI_HOST_TYPE=${GEN_AI_HOST_TYPE:-}
      - NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL=${NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL:-}
      - POSTGRES_HOST=relational_db
      - VESPA_HOST=document_index
      - AUTH_TYPE=${AUTH_TYPE:-disabled}
      - QA_TIMEOUT=${QA_TIMEOUT:-}
      - VALID_EMAIL_DOMAINS=${VALID_EMAIL_DOMAINS:-}
@@ -60,7 +60,7 @@ services:
    command: /usr/bin/supervisord
    depends_on:
      - relational_db
      - document_index
    restart: always
    environment:
      - INTERNAL_MODEL_VERSION=${INTERNAL_MODEL_VERSION:-openai-chat-completion}
@@ -69,7 +69,7 @@ services:
      - GEN_AI_ENDPOINT=${GEN_AI_ENDPOINT:-}
      - GEN_AI_HOST_TYPE=${GEN_AI_HOST_TYPE:-}
      - POSTGRES_HOST=relational_db
      - VESPA_HOST=document_index
      - API_BASE_OPENAI=${API_BASE_OPENAI:-}
      - API_TYPE_OPENAI=${API_TYPE_OPENAI:-}
      - API_VERSION_OPENAI=${API_VERSION_OPENAI:-}
@@ -129,7 +129,7 @@ services:
      - "5432:5432"
    volumes:
      - db_volume:/var/lib/postgresql/data
  document_index:
    image: vespaengine/vespa:8
    restart: always
    ports:


@@ -11,14 +11,14 @@ services:
      uvicorn danswer.main:app --host 0.0.0.0 --port 8080"
    depends_on:
      - relational_db
      - document_index
    restart: always
    env_file:
      - .env
    environment:
      - AUTH_TYPE=${AUTH_TYPE:-google_oauth}
      - POSTGRES_HOST=relational_db
      - VESPA_HOST=document_index
    volumes:
      - local_dynamic_storage:/home/storage
      - file_connector_tmp_storage:/home/file_connector_storage
@@ -33,14 +33,14 @@ services:
    command: /usr/bin/supervisord
    depends_on:
      - relational_db
      - document_index
    restart: always
    env_file:
      - .env
    environment:
      - AUTH_TYPE=${AUTH_TYPE:-google_oauth}
      - POSTGRES_HOST=relational_db
      - VESPA_HOST=document_index
    volumes:
      - local_dynamic_storage:/home/storage
      - file_connector_tmp_storage:/home/file_connector_storage
@@ -69,7 +69,7 @@ services:
      - .env
    volumes:
      - db_volume:/var/lib/postgresql/data
  document_index:
    image: vespaengine/vespa:8
    restart: always
    ports:


@@ -38,7 +38,6 @@ SESSION_EXPIRE_TIME_SECONDS=86400
# The following are for configuring User Authentication, supported flows are:
# disabled
# simple (email/password + user account creation in Danswer)
# google_oauth (login with google/gmail account)
# oidc (only in Danswer enterprise edition)
# saml (only in Danswer enterprise edition)