This is awesome! Great work @katernoir!
I’m trying to set up Grafana/Prometheus and I feel like I’m missing something.
I set prometheus = true in config.toml. I checked that it’s listening on port 26660, and it appears Grafana is connecting:
netstat -an |grep 26660
tcp 0 0 127.0.0.1:41142 127.0.0.1:26660 ESTABLISHED
tcp 0 0 127.0.0.1:41146 127.0.0.1:26660 ESTABLISHED
tcp6 0 0 :::26660 :::* LISTEN
tcp6 0 0 127.0.0.1:26660 127.0.0.1:41142 ESTABLISHED
tcp6 0 0 127.0.0.1:26660 127.0.0.1:41146 ESTABLISHED
However, in the dashboard I see this:
And I see lots of 503 errors in the logs when the dashboard is running:
t=2018-07-20T03:09:37+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/query status=503 remote_addr=127.0.0.1 time_ms=1 size=59 referer="http://localhost:3000/d/ajjGYQdmz/cosmos-network-dashboard?refresh=5s&orgId=1"
t=2018-07-20T03:09:37+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/query status=503 remote_addr=127.0.0.1 time_ms=4 size=59 referer="http://localhost:3000/d/ajjGYQdmz/cosmos-network-dashboard?refresh=5s&orgId=1"
You need to run Prometheus and have it scrape the 26660 target by editing prometheus.yml. Prometheus itself will listen on port 9090. You then point your data source in Grafana to <your_address_running_prometheus>:9090.
The config in prometheus.yml can be as simple as this.
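For reference, a minimal prometheus.yml along those lines might look like the following. The job name is arbitrary, and the localhost target assumes Prometheus runs on the same machine as gaiad; adjust both for your setup:

```yaml
global:
  scrape_interval: 5s          # how often to pull metrics

scrape_configs:
  - job_name: 'gaiad'          # arbitrary label for this scrape job
    static_configs:
      - targets: ['localhost:26660']   # Tendermint Prometheus metrics port
```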
We’ve turned on Hubble Alerts and Events for gaia-7001.
Instructions for how to use and subscribe are here:
@katernoir Do you plan on updating this for 7004?
@kwunyeung How about the telegram bot for 7004?
You mean the Grafana dashboard? It should work with any network your node is running, as long as the connection to Prometheus doesn’t change, so there’s no need to update it. I haven’t tried it on 7004 yet, though.
We have updated it and I’m testing with it now. I keep receiving absent-validator notifications if our validator node didn’t send a vote at a certain height. You can add the bot and subscribe to your validator address to try it.
Where is prometheus.yml?
Thanks for your work on this! I’m trying to configure the dashboard to monitor one of my validators. Could you please provide some guidance on the required settings for the Prometheus data source, shown in the screenshot?
The URL in the HTTP section needs to be configured to the HTTP API URL of the Prometheus server that is scraping your validator input.
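As a concrete example, assuming Prometheus runs on the same host as Grafana on its default port, the data source settings would be roughly as follows (the name and URL here are assumptions for a local setup, not required values):

```
Name:   CosmosDataSource
Type:   Prometheus
URL:    http://localhost:9090
Access: Server
```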
The Grafana documentation on this topic might be helpful as well.
Can you share your prometheus.yml so that we can understand the setup better?
Hey, try switching the data source access mode in Grafana from Server to Browser.
Because I’ve been asked this a lot, I’ve put together a short step-by-step guide to setting up Grafana with my dashboard. I hope this works and helps people get started.
Step: Install Grafana (http://docs.grafana.org/installation/debian/) & start it
Step: In .gaiad/config.toml set prometheus=true
Step: Restart gaiad to apply config changes
Step: Download Prometheus (https://prometheus.io/docs/introduction/first_steps/) and edit prometheus.yml
Add the following:
# COSMOS MONITORING
# The job name is added as a label `job=<job_name>` to any timeseries scraped.
- job_name: 'cosmops'
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  static_configs:
    - targets: ['localhost:26660']
      labels:
        group: 'cosmops'
Step: start prometheus with: ./prometheus --config.file=prometheus.yml
Step: Open Grafana in Browser & Do initial Setup
Step: Under Configuration -> Data Source -> Add a new Data Source
Scrape Interval: 5s
Rest is Default
-> Save&Test should add DataSource
Step: In Grafana go to Dashboard -> Import
Step: Paste 7044 (this is my Dashboard template for Grafana), choose "CosmosDataSource" as Data Source
Step: You should now have a working Dashboard
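Before wiring up Grafana, it can be worth sanity-checking the scrape target itself: gaiad serves Prometheus text-format metrics on port 26660. The snippet below sketches that check against a canned sample, since the live output varies; the metric name shown is illustrative, not guaranteed gaiad output:

```shell
# Sketch: verify a Prometheus text-format payload contains metric samples.
# A live check would be:  curl -s http://localhost:26660/metrics
cat > /tmp/metrics_sample.txt <<'EOF'
# HELP tendermint_consensus_height Height of the chain.
# TYPE tendermint_consensus_height gauge
tendermint_consensus_height 123456
EOF

# Count metric samples (lines that are not '#' comments).
count=$(grep -cv '^#' /tmp/metrics_sample.txt)
echo "metric samples: $count"
```

If the count is zero against the real endpoint, Prometheus has nothing to scrape and the dashboard panels will stay empty.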
I’m working on some monitoring and alerting for validators and sentries -
1 - Using Icinga for alerts
2 - Updating a Grafana/Prometheus dashboard
3 - Log analysis
I plan to open source the tools when they’re ready.
For starters, I’m wondering if anyone has done any research into log patterns that indicate missed pre-commits?
Adding feedback from -
@mattharrop If set to do so, gaiad will write every signature in each block to syslog. Just check for your validator’s ID in the block’s signatures; if it’s not there, that’s a miss.
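A rough sketch of that check, assuming you can extract one line of signatures per block from syslog. The log format and validator ID below are made up for illustration; real gaiad log lines will differ:

```shell
# Hypothetical per-block signature lines; real gaiad output differs.
cat > /tmp/block_sigs.log <<'EOF'
height=1042 signed: A1B2C3 D4E5F6 778899
height=1043 signed: D4E5F6 778899
EOF

VALIDATOR="A1B2C3"   # your validator's ID (format assumed)
# Flag any block whose signature line lacks the validator's ID.
while read -r line; do
  if ! printf '%s\n' "$line" | grep -q "$VALIDATOR"; then
    echo "missed pre-commit at ${line%% *}"
  fi
done < /tmp/block_sigs.log
```

Against the sample above this flags height=1043, where the validator’s ID is absent from the signature list.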
In our old gaiabot, it utilizes a systemd package to monitor the journal. If you run your gaiad as a systemd service, the journal can be read from it. You may take a look.
However, I don’t quite like this approach, as it uses a lot of resources to keep checking every line of the process’s journal log to decide whether it should send out an alert message.
This post of mine is almost 2 years old. I’m not keeping this up-to-date anymore. Please find some more recent information.