Adds:
- actual error message in UI for indexing failure
- if a connector is disabled, stops indexing immediately (after the current batch of documents) to allow for deletion
- adds num docs indexed for the current run + a speed
This allow using Danswer in typical (non-google) enterprise environments.
* Access Tokens can be very large. A token without claims is already 1100 bytes for me (larger than allowed in danswer by default). With roles I got a 12kB token. For that reason I changed the field to TEXT in the database.
* Danswer used to swallow most errors when OIDC would fail. Nodejs forwards a request to the backend and swallows all errors. Even within the backend we catched all ValueErrors and only returned the last exception with the request. Added full stack trace logging to allow debugging issues with userinfo and other endpoints.
* Allow changing name of the login provider on the login button.
* Changed variables and URLs to generic OAUTH_XX (without google in the name) but kept compatibility with the existing google integration
* Tested again Keycloak with OpenID Connect
Next steps:
* Claim to role mappings
* Auto login/SSO (Login button is just an extra click)
This is a workaround around intermittent issues where sementic_identifier becomes None for some reason. It usually recovers when documents are rescraped.
Obviously, we do not yet understand the issue and are interested in a better solution.
- Adds some offset to the `start` for the Google Drive connector to give time for `modifiedTime` to propagate so we don't miss updates
- Moves fetching folders into a separate call since folder `modifiedTime` doesn't get updated when a file in the folder is updated
- Uses `connector_credential_pair.last_successful_index_time` instead of `updated_at` to determine the `start` for poll connectors
Danswer background crashes when the index task for a deleted source is still in the task queue. Without this is won't recover without manual database cleanup.