Debugging Self-Hosted Server

codezninja · December 31, 2024, 6:58pm

Hi! First, I want to thank you for building such an amazing tool. I just started using it on one of my laptops, and it’s definitely been a great help to my workflow.

I decided to take the next step and set up my own self-hosted sync server. This is where I’m hitting some issues that I’m not sure how to debug. I’m running the Docker image on a HashiCorp Nomad cluster that I use for my other workloads. The Postgres database is on a separate Mac mini server.

I’m not as familiar with Rust or Postgres as I would like, but I created a role and user that the Atuin server should be able to use to create the schemas and interact with the Postgres server.

On server startup, this is what I’m seeing. It seems to have worked, because I can see that it created the tables:

2024-12-31T14:53:23.080443Z DEBUG sqlx::query: summary="SELECT current_database()" db.statement="" rows_affected=0 rows_returned=1 elapsed=3.81986ms elapsed_secs=0.00381986
2024-12-31T14:53:23.083132Z DEBUG sqlx::query: summary="SELECT pg_advisory_lock($1)" db.statement="" rows_affected=1 rows_returned=1 elapsed=2.620061ms elapsed_secs=0.002620061
2024-12-31T14:53:23.085279Z  INFO sqlx::postgres::notice: relation "_sqlx_migrations" already exists, skipping
2024-12-31T14:53:23.085520Z DEBUG sqlx::query: summary="CREATE TABLE IF NOT …" db.statement="\n\nCREATE TABLE IF NOT EXISTS _sqlx_migrations (\n  version BIGINT PRIMARY KEY,\n  description TEXT NOT NULL,\n  installed_on TIMESTAMPTZ NOT NULL DEFAULT now(),\n  success BOOLEAN NOT NULL,\n  checksum BYTEA NOT NULL,\n  execution_time BIGINT NOT NULL\n);\n" rows_affected=0 rows_returned=0 elapsed=2.137481ms elapsed_secs=0.002137481
2024-12-31T14:53:23.090937Z DEBUG sqlx::query: summary="SELECT version FROM _sqlx_migrations …" db.statement="\n\nSELECT\n  version\nFROM\n  _sqlx_migrations\nWHERE\n  success = false\nORDER BY\n  version\nLIMIT\n  1\n" rows_affected=0 rows_returned=0 elapsed=5.315447ms elapsed_secs=0.005315447
2024-12-31T14:53:23.093620Z DEBUG sqlx::query: summary="SELECT version, checksum FROM …" db.statement="\n\nSELECT\n  version,\n  checksum\nFROM\n  _sqlx_migrations\nORDER BY\n  version\n" rows_affected=19 rows_returned=19 elapsed=2.549985ms elapsed_secs=0.002549985
2024-12-31T14:53:23.094904Z DEBUG sqlx::query: summary="SELECT current_database()" db.statement="" rows_affected=0 rows_returned=1 elapsed=1.207947ms elapsed_secs=0.001207947
2024-12-31T14:53:23.096931Z DEBUG sqlx::query: summary="SELECT pg_advisory_unlock($1)" db.statement="" rows_affected=1 rows_returned=1 elapsed=1.985064ms elapsed_secs=0.001985064

The two problems I’m trying to figure out:

Why do requests sometimes take a minimum of 30 seconds when I run curl inside the Atuin server’s Docker container?
Why do I get 400 and 500 errors when I try to register with the new server from my client to set up sync?

2024-12-31T14:54:47.648016Z DEBUG request{method=GET uri=/user/ejacob version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-12-31T14:55:17.649704Z ERROR request{method=GET uri=/user/ejacob version=HTTP/1.1}:get{user.username="ejacob"}: atuin_server::handlers::user: database error: pool timed out while waiting for an open connection
2024-12-31T14:55:17.649742Z DEBUG request{method=GET uri=/user/ejacob version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=30001 ms status=500
2024-12-31T14:55:17.649749Z ERROR request{method=GET uri=/user/ejacob version=HTTP/1.1}: tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=30001 ms
2024-12-31T14:55:18.032667Z DEBUG request{method=POST uri=/register version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-12-31T14:55:48.059284Z ERROR request{method=POST uri=/register version=HTTP/1.1}:register: atuin_server::handlers::user: failed to add user: Other(pool timed out while waiting for an open connection

Location:
    crates/atuin-server-postgres/src/lib.rs:53:39)
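
To double-check the timing, here is a small sketch that computes the gap between the "started processing request" line and the error line for the GET /user/ejacob request above. The timestamps are copied from the log; the function and constant names are just mine for illustration. The result is almost exactly 30 seconds, which looks more like a fixed timeout than a slow query. As far as I know, sqlx's pool acquire timeout defaults to 30 seconds, but treat that as my assumption rather than something I have confirmed for Atuin.

```python
from datetime import datetime

# Timestamps copied from the GET /user/ejacob log lines above.
STARTED = "2024-12-31T14:54:47.648016Z"
FAILED = "2024-12-31T14:55:17.649704Z"

# Format of the timestamps in the server log (UTC, microsecond precision).
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds elapsed between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


elapsed = seconds_between(STARTED, FAILED)
print(f"request waited {elapsed:.3f} s before failing")
```

This agrees with the latency=30001 ms that tower_http reports for the same request.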

My current thought is that the 30-second delay I’m seeing might be the culprit, but I’m unsure how to debug it. I’ve set the following environment variables on the server container, but nothing so far is helping me understand what’s causing the slowness:

RUST_LOG =  "debug"
ATUIN_LOG = "debug"
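
One check I could run separately from Atuin is to time a plain TCP connection to the Postgres host from the same network as the container. Below is a minimal sketch of that idea, assuming Python is available somewhere on the Nomad client (the Atuin image itself may not ship it). The function name is hypothetical, and it only tests reachability, not authentication or the connection pool.

```python
import socket
import time
from typing import Optional


def time_tcp_connect(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    started = time.monotonic()
    try:
        # create_connection resolves the host name and connects in one step,
        # so DNS delays are included in the measured time.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - started
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return None
```

For example, time_tcp_connect("postgres.example.internal", 5432) would probe a placeholder host name on the default Postgres port. A None result or a delay of several seconds would suggest the problem is on the network or DNS path between the Nomad client and the Mac mini rather than in Atuin itself.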

One thing I will mention is that the Nomad VM runs on Proxmox and the whole underlying filesystem is ZFS. I did see some topics saying that ZFS has issues with SQLite, but I’m unclear whether that’s my issue, since I’m using the server side with Postgres.

I’m also running the latest Atuin, 18.4.0, on both client and server, and Postgres is 17.2.