Mirror of https://github.com/danswer-ai/danswer.git (synced 2025-03-17 21:32:36 +01:00)
Updated Contributing for Celery (#629)
commit 9a51745fc9 (parent fbb05e630d)
@@ -6,7 +6,7 @@ As an open source project in a rapidly changing space, we welcome all contributi

## 💃 Guidelines
### Contribution Opportunities
The [GitHub issues](https://github.com/danswer-ai/danswer/issues) page is a great place to start for contribution ideas.
The [GitHub Issues](https://github.com/danswer-ai/danswer/issues) page is a great place to start for contribution ideas.

Issues that have been explicitly approved by the maintainers (aligned with the direction of the project)
will be marked with the `approved by maintainers` label.
@@ -19,7 +19,9 @@ If you have a new/different contribution in mind, we'd love to hear about it!
Your input is vital to making sure that Danswer moves in the right direction.
Before starting on implementation, please raise a GitHub issue.

And always feel free to message us (Chris Weaver / Yuhong Sun) on Slack / Discord directly about anything at all.
And always feel free to message us (Chris Weaver / Yuhong Sun) on
[Slack](https://join.slack.com/t/danswer/shared_invite/zt-1u3h3ke3b-VGh1idW19R8oiNRiKBYv2w) /
[Discord](https://discord.gg/TDJ59cGV2X) directly about anything at all.


### Contributing Code
@@ -44,8 +46,8 @@ We would love to see you there!


## Get Started 🚀
Danswer being a fully functional app, relies on several external pieces of software, specifically:
- Postgres (Relational DB)
Danswer being a fully functional app, relies on some external pieces of software, specifically:
- [Postgres](https://www.postgresql.org/) (Relational DB)
- [Vespa](https://vespa.ai/) (Vector DB/Search Engine)

This guide provides instructions to set up the Danswer specific services outside of Docker because it's easier for
@@ -54,11 +56,9 @@ development purposes but also feel free to just use the containers and update wi


### Local Set Up
We've tested primarily with Python versions >= 3.11 but the code should work with Python >= 3.9.
It is recommended to use Python versions >= 3.11.

This guide skips a few optional features for simplicity, reach out if you need any of these:
- User Authentication feature
- File Connector background job
This guide skips setting up User Authentication for the purpose of simplicity


#### Installing Requirements
@@ -93,18 +93,11 @@ playwright install


#### Dependent Docker Containers
First navigate to `danswer/deployment/docker_compose`, then start up the containers with:

Postgres:
First navigate to `danswer/deployment/docker_compose`, then start up Vespa and Postgres with:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d relational_db
docker compose -f docker-compose.dev.yml -p danswer-stack up -d document_index relational_db
```

Vespa:
```bash
docker compose -f docker-compose.dev.yml -p danswer-stack up -d index
```

(document_index refers to Vespa and relational_db refers to Postgres)

#### Running Danswer

@@ -115,27 +108,33 @@ mkdir dynamic_config_storage

To start the frontend, navigate to `danswer/web` and run:
```bash
AUTH_TYPE=disabled npm run dev
```
_for Windows, run:_
```bash
(SET "AUTH_TYPE=disabled" && npm run dev)
npm run dev
```

Package the Vespa schema. This will only need to be done when the Vespa schema is updated locally.

The first time running Danswer, you will need to run the DB migrations for Postgres.
Navigate to `danswer/backend` and with the venv active, run:
```bash
alembic upgrade head
```

Additionally, we have to package the Vespa schema deployment:
Nagivate to `danswer/backend/danswer/datastores/vespa/app_config` and run:
```bash
zip -r ../vespa-app.zip .
```
- Note: If you don't have the `zip` utility, you will need to install it prior to running the above

The first time running Danswer, you will also need to run the DB migrations for Postgres.
After the first time, this is no longer required unless the DB models change.

Navigate to `danswer/backend` and with the venv active, run:
```bash
alembic upgrade head
```

Next, start the task queue which orchestrates the background jobs.
Jobs that take more time are run async from the API server.

Still in `danswer/backend`, run:
```bash
python ./scripts/dev_run_background_jobs.py
```

To run the backend API server, navigate back to `danswer/backend` and run:
```bash
AUTH_TYPE=disabled \
@@ -153,33 +152,6 @@ powershell -Command "
"
```

To run the background job to check for connector updates and index documents, navigate to `danswer/backend` and run:
```bash
PYTHONPATH=. DYNAMIC_CONFIG_DIR_PATH=./dynamic_config_storage python danswer/background/update.py
```
_For Windows:_
```bash
powershell -Command " $env:PYTHONPATH='.'; $env:DYNAMIC_CONFIG_DIR_PATH='./dynamic_config_storage'; python danswer/background/update.py "
```

To run the background job to check for periodically check for document set updates, navigate to `danswer/backend` and run:
```bash
PYTHONPATH=. DYNAMIC_CONFIG_DIR_PATH=./dynamic_config_storage python danswer/background/document_set_sync_script.py
```
_For Windows:_
```bash
powershell -Command " $env:PYTHONPATH='.'; $env:DYNAMIC_CONFIG_DIR_PATH='./dynamic_config_storage'; python danswer/background/document_set_sync_script.py "
```

To run Celery, which handles deletion of connectors + syncing of document sets, navigate to `danswer/backend` and run:
```bash
PYTHONPATH=. DYNAMIC_CONFIG_DIR_PATH=./dynamic_config_storage celery -A danswer.background.celery worker --loglevel=info --concurrency=1
```
_For Windows:_
```bash
powershell -Command " $env:PYTHONPATH='.'; $env:DYNAMIC_CONFIG_DIR_PATH='./dynamic_config_storage'; celery -A danswer.background.celery worker --loglevel=info --concurrency=1 "
```

Note: if you need finer logging, add the additional environment variable `LOG_LEVEL=DEBUG` to the relevant services.

### Formatting and Linting
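The background-jobs portion of the guide above now reduces to a single step. A minimal sketch of that step, assuming the venv is active and the working directory is `danswer/backend` as the guide states (`LOG_LEVEL=DEBUG` is the optional setting from the logging note, and applying it to this command is an assumption):

```bash
# Start the Celery worker, Celery beat, and the indexing job via the consolidated dev script
python ./scripts/dev_run_background_jobs.py

# Optional: finer logging, per the note above
LOG_LEVEL=DEBUG python ./scripts/dev_run_background_jobs.py
```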
@@ -1,4 +1,5 @@
# This file is purely for development use, not included in any builds
import argparse
import os
import subprocess
import threading

@@ -16,18 +17,20 @@ def monitor_process(process_name: str, process: subprocess.Popen) -> None:
            break


def run_celery() -> None:
def run_jobs(exclude_indexing: bool) -> None:
    cmd_worker = [
        "celery",
        "-A",
        "danswer.background.celery",
        "worker",
        "--pool=threads",
        "--autoscale=3,10",
        "--loglevel=INFO",
        "--concurrency=1",
    ]

    cmd_beat = ["celery", "-A", "danswer.background.celery", "beat", "--loglevel=INFO"]

    # Redirect stderr to stdout for both processes
    worker_process = subprocess.Popen(
        cmd_worker, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
@@ -35,7 +38,6 @@ def run_celery() -> None:
        cmd_beat, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )

    # Monitor outputs using threads
    worker_thread = threading.Thread(
        target=monitor_process, args=("WORKER", worker_process)
    )
@@ -44,10 +46,37 @@ def run_celery() -> None:
    worker_thread.start()
    beat_thread.start()

    # Wait for threads to finish
    if not exclude_indexing:
        update_env = os.environ.copy()
        update_env["PYTHONPATH"] = "."
        update_env["DYNAMIC_CONFIG_DIR_PATH"] = "./dynamic_config_storage"
        update_env["FILE_CONNECTOR_TMP_STORAGE_PATH"] = "./dynamic_config_storage"
        cmd_indexing = ["python", "danswer/background/update.py"]

        indexing_process = subprocess.Popen(
            cmd_indexing,
            env=update_env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )

        indexing_thread = threading.Thread(
            target=monitor_process, args=("INDEXING", indexing_process)
        )

        indexing_thread.start()
        indexing_thread.join()

    worker_thread.join()
    beat_thread.join()


if __name__ == "__main__":
    run_celery()
    parser = argparse.ArgumentParser(description="Run background jobs.")
    parser.add_argument(
        "--no-indexing", action="store_true", help="Do not run indexing process"
    )
    args = parser.parse_args()

    run_jobs(args.no_indexing)
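A brief usage sketch for the updated script, based on the `--no-indexing` flag added above; it assumes the working directory is `danswer/backend` with the venv active, as the contributing guide describes:

```bash
# Default: run the Celery worker, Celery beat, and the indexing job together
python ./scripts/dev_run_background_jobs.py

# Skip the indexing process and run only the Celery worker and beat
python ./scripts/dev_run_background_jobs.py --no-indexing
```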
@@ -11,7 +11,7 @@ services:
      uvicorn danswer.main:app --host 0.0.0.0 --port 8080"
    depends_on:
      - relational_db
      - index
      - document_index
    restart: always
    ports:
      - "8080:8080"
@@ -23,7 +23,7 @@ services:
      - GEN_AI_HOST_TYPE=${GEN_AI_HOST_TYPE:-}
      - NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL=${NUM_DOCUMENT_TOKENS_FED_TO_GENERATIVE_MODEL:-}
      - POSTGRES_HOST=relational_db
      - VESPA_HOST=index
      - VESPA_HOST=document_index
      - AUTH_TYPE=${AUTH_TYPE:-disabled}
      - QA_TIMEOUT=${QA_TIMEOUT:-}
      - VALID_EMAIL_DOMAINS=${VALID_EMAIL_DOMAINS:-}
@@ -60,7 +60,7 @@ services:
    command: /usr/bin/supervisord
    depends_on:
      - relational_db
      - index
      - document_index
    restart: always
    environment:
      - INTERNAL_MODEL_VERSION=${INTERNAL_MODEL_VERSION:-openai-chat-completion}
@@ -69,7 +69,7 @@ services:
      - GEN_AI_ENDPOINT=${GEN_AI_ENDPOINT:-}
      - GEN_AI_HOST_TYPE=${GEN_AI_HOST_TYPE:-}
      - POSTGRES_HOST=relational_db
      - VESPA_HOST=index
      - VESPA_HOST=document_index
      - API_BASE_OPENAI=${API_BASE_OPENAI:-}
      - API_TYPE_OPENAI=${API_TYPE_OPENAI:-}
      - API_VERSION_OPENAI=${API_VERSION_OPENAI:-}
@@ -129,7 +129,7 @@ services:
      - "5432:5432"
    volumes:
      - db_volume:/var/lib/postgresql/data
  index:
  document_index:
    image: vespaengine/vespa:8
    restart: always
    ports:
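With the Vespa service renamed from `index` to `document_index` in this compose file, the dev commands reference the new service name. The first command below is taken from the guide above; the `ps` check is an optional addition, assuming both are run from `danswer/deployment/docker_compose`:

```bash
# Start Vespa (now called document_index) and Postgres for local development
docker compose -f docker-compose.dev.yml -p danswer-stack up -d document_index relational_db

# Optional: confirm both containers are up under the new service name
docker compose -f docker-compose.dev.yml -p danswer-stack ps
```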
@@ -11,14 +11,14 @@ services:
      uvicorn danswer.main:app --host 0.0.0.0 --port 8080"
    depends_on:
      - relational_db
      - index
      - document_index
    restart: always
    env_file:
      - .env
    environment:
      - AUTH_TYPE=${AUTH_TYPE:-google_oauth}
      - POSTGRES_HOST=relational_db
      - VESPA_HOST=index
      - VESPA_HOST=document_index
    volumes:
      - local_dynamic_storage:/home/storage
      - file_connector_tmp_storage:/home/file_connector_storage
@@ -33,14 +33,14 @@ services:
    command: /usr/bin/supervisord
    depends_on:
      - relational_db
      - index
      - document_index
    restart: always
    env_file:
      - .env
    environment:
      - AUTH_TYPE=${AUTH_TYPE:-google_oauth}
      - POSTGRES_HOST=relational_db
      - VESPA_HOST=index
      - VESPA_HOST=document_index
    volumes:
      - local_dynamic_storage:/home/storage
      - file_connector_tmp_storage:/home/file_connector_storage
@@ -69,7 +69,7 @@ services:
      - .env
    volumes:
      - db_volume:/var/lib/postgresql/data
  index:
  document_index:
    image: vespaengine/vespa:8
    restart: always
    ports:
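The same rename applies in this second compose file, and `VESPA_HOST` now points at `document_index`. As an illustrative check only (this file's name is not visible in the diff, so the example reuses the dev compose file named in the guide), the Vespa container is now addressed by its new service name:

```bash
# Tail the logs of the renamed Vespa service
docker compose -f docker-compose.dev.yml -p danswer-stack logs -f document_index
```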
@@ -38,7 +38,6 @@ SESSION_EXPIRE_TIME_SECONDS=86400

# The following are for configuring User Authentication, supported flows are:
# disabled
# simple (email/password + user account creation in Danswer)
# google_oauth (login with google/gmail account)
# oidc (only in Danswer enterprise edition)
# saml (only in Danswer enterprise edition)
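To make the supported flows concrete, a minimal sketch of how this setting might appear in the deployment's `.env` file (referenced by the compose files above via `env_file`); the value shown is illustrative:

```bash
# Pick one of the flows listed above; the compose default shown above is google_oauth,
# while the contributing guide uses disabled for local development.
AUTH_TYPE=google_oauth
```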