So, recently I spun up cAdvisor to provide some metrics for the Grafana dashboard. I created both the docker-compose.yml and prometheus.yml thusly:
prometheus.yml:
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets:
          - cadvisor:8080
docker-compose.yml:
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - 9090:9090
    command:
      - --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    depends_on:
      - cadvisor
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    ports:
      - 8080:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    depends_on:
      - redis
  redis:
    image: redis:latest
    container_name: redis
    ports:
      - 6379:6379
Placed them both in /tmp/cadvisor/ and ran docker compose up. All well and good, got some metrics to feed Grafana and all would seem jippity jippity.
Next day I notice Prometheus is offline. Hmm, check everything out. Logs complaining of a missing prometheus.yml. On a hunch I recreated the above prometheus.yml and placed it back in /tmp/cadvisor/, restarted Prometheus, and it fired right up: no runs, no drips, no errors. Before I uploaded the new prometheus.yml, I noticed that there was now a directory named prometheus.yml in /tmp/cadvisor/, which was empty. Deleted it.
Next day, same scenario. Missing prometheus.yml, directory called prometheus.yml in /tmp/cadvisor/. I thought well, if it’s getting deleted, change the permissions, and continued my daily affairs.
Today, same exact scenario. So, wtf, over? Run some commands:
stat /tmp/cadvisor/prometheus.yml
sudo lsof /tmp/cadvisor/prometheus.yml
grep "delete" /var/log/syslog
I can see that the file IS being deleted, but I cannot seem to track down what is deleting it. It’s like there is a cron job that fires off every day at a certain time and deletes prometheus.yml, and in its place creates a directory called prometheus.yml, effectively taking Prometheus offline. I have no such cron job tho.
Any ideas? Suggestions? Ancient wizardry? Any mystical incantations or tomes to consult?
You have two things happening at once.
First, there’s a process that cleans up tmp files according to a configuration. Yours is probably set to clean files older than a day.
https://www.freedesktop.org/software/systemd/man/latest/tmpfiles.d.html for more information
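If you want to see which age limit actually applies on your machine, something like this might help. It's only a sketch: it assumes a systemd-based distro, and `tmp_rules` is a made-up helper name that filters tmpfiles.d-style lines down to the /tmp entries.

```shell
#!/bin/sh
# tmp_rules: hypothetical helper (not part of systemd). Reads tmpfiles.d-style
# config on stdin and prints the entries whose path is /tmp or below it.
# Column layout per tmpfiles.d(5): Type Path Mode User Group Age Argument
tmp_rules() {
    awk '$1 !~ /^#/ && ($2 == "/tmp" || $2 ~ /^\/tmp\//) { print $1, $2, "age=" $6 }'
}

# On a systemd machine (assumption), feed it the merged configuration:
#   systemd-tmpfiles --cat-config | tmp_rules
# and check when the cleanup timer fires next:
#   systemctl list-timers systemd-tmpfiles-clean.timer
```

The age column is the interesting one: anything under that path that hasn't been touched for longer than that is fair game for the next cleanup run.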
Second, as to the folder: Docker will create a directory at a bind mount’s host path if nothing exists there. So Docker tries to find your Prometheus file, doesn’t find it, then creates a directory with that name and mounts it instead.
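One way to catch this before Docker quietly swaps in a directory is a small pre-flight check. This is a sketch, not anything Docker ships: `check_bind_source` is a hypothetical helper, and the path in the usage comment is the one from the post.

```shell
#!/bin/sh
# check_bind_source: hypothetical pre-flight check to run before
# `docker compose up`. Succeeds only if the path is a regular file.
check_bind_source() {
    if [ -d "$1" ]; then
        echo "$1 is a directory (probably created by Docker); remove it and restore the file" >&2
        return 1
    elif [ ! -f "$1" ]; then
        echo "$1 is missing; Docker would create a directory in its place" >&2
        return 1
    fi
}

# Usage:
#   check_bind_source /tmp/cadvisor/prometheus.yml && docker compose up -d
```

That way a missing config fails loudly at startup instead of showing up as a confusing Prometheus error the next day.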
You should move the files out of tmp. That’ll solve all your problems.
Done. Standing by to stand by.
Thank you!
ETA: Follow up with another silly question: Why wouldn’t changing the permissions keep the file from being deleted by the internal process?
My guess is the cleanup process is running as root and clobbers anything it sees regardless of permissions. But that’s a guess. I’ve never tried keeping long term data in tmp.
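For what it's worth, there's a second reason on top of the root one: on Linux, deleting a file is governed by write permission on the directory that contains it, not by the permissions on the file itself. A small sketch to see this for yourself, using a throwaway directory from mktemp so nothing here touches the real files:

```shell
#!/bin/sh
# Show that a read-only file can still be deleted when its directory is writable.
demo_dir=$(mktemp -d)
touch "$demo_dir/prometheus.yml"
chmod 444 "$demo_dir/prometheus.yml"   # read-only for everyone

rm -f "$demo_dir/prometheus.yml"       # succeeds: unlinking needs write access to the directory

if [ -e "$demo_dir/prometheus.yml" ]; then
    result="still there"
else
    result="deleted"
fi
echo "$result"
rmdir "$demo_dir"
```

So chmod on the file changes who can read or modify its contents, but not who can remove it.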
Why wouldn’t changing the permissions keep the file from being deleted by the internal process?
That’s like keeping your lunch laying outside on the sidewalk, getting stepped on by people and destroyed, and then wondering if your lunch would be safer if you put it in a stronger bag (but still left it on the sidewalk).
Don’t leave your lunch outside laying on the sidewalk, regardless of what you might do to “protect” it. Don’t keep important files in /tmp
Don’t leave your lunch outside laying on the sidewalk
I get that. It would seem tho, you could make the file immutable with
sudo chattr +i /tmp/cadvisor/prometheus.yml
Yes, many bad ideas are possible to implement. At least temporarily. Until the next cleanup process figures out how to remove cadvisor dir regardless of file contents. Or the next OS release turns /tmp into a ram disk. Or… or… or…
Yes, it’s a fun academic exercise to think through possible mitigations. And in the end, it will still be dumb to keep this in /tmp
I was asking in the generic sense, not directly related to the above issue, but thank you.
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s18.html
Don’t put important stuff in /tmp. Put it in /opt or something.
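A rough sketch of the move, in case it helps. The `relocate_stack` helper is made up for illustration, and /opt/cadvisor is just one possible destination (writing there will likely need sudo).

```shell
#!/bin/sh
# relocate_stack: hypothetical helper that moves the two config files
# from one directory to another, creating the destination if needed.
relocate_stack() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in docker-compose.yml prometheus.yml; do
        # Skip anything that is not a regular file, e.g. the empty
        # prometheus.yml directory Docker may have left behind.
        [ -f "$src/$f" ] && mv "$src/$f" "$dest/"
    done
    return 0
}

# Usage, assuming /opt/cadvisor as the new home:
#   docker compose down                        # run in the old directory first
#   relocate_stack /tmp/cadvisor /opt/cadvisor
#   cd /opt/cadvisor && docker compose up -d
```

Since the compose file mounts ./prometheus.yml by relative path, it should keep working from the new directory without edits.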
…it’s in /tmp…