docs: add dashboards documentation page with BBE Probes explanation #191
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,26 @@ | ||||||||||||||||||
| # Dashboards | ||||||||||||||||||
|
|
||||||||||||||||||
| GameFabric provides predefined Grafana dashboards for monitoring your infrastructure. | ||||||||||||||||||
| You can find these under "Dashboards" in your Grafana instance. | ||||||||||||||||||
|
|
||||||||||||||||||
| ## BBE Probes from Nodes | ||||||||||||||||||
|
|
||||||||||||||||||
| This dashboard shows BlackBox Exporter (BBE) probe results from each of your assigned nodes to predefined targets, including major cloud providers (AWS, Azure, GCP) and DNS servers (such as 1.1.1.1 and 8.8.8.8). | ||||||||||||||||||
|
||||||||||||||||||
|
|
||||||||||||||||||
| ### Purpose | ||||||||||||||||||
|
|
||||||||||||||||||
| Use this dashboard to quickly identify whether game server issues are caused by network connectivity problems to a particular cloud provider rather than bugs in your application code. | ||||||||||||||||||
|
Comment on lines +8 to +12
Member
Suggested change
IMO no need for a |
||||||||||||||||||
|
|
||||||||||||||||||
| ### Interpreting the Dashboard | ||||||||||||||||||
|
|
||||||||||||||||||
| - **Red sections** indicate the timespan during which a probe failed. | ||||||||||||||||||
| - **Short probe failures** are usually nothing to worry about. | ||||||||||||||||||
| - **Prolonged failures** to a single target (for example, a cloud provider your game doesn't use, or a backup DNS server) may have no impact on your game servers. | ||||||||||||||||||
|
Contributor
see above comment
Comment on lines +16 to +18
|
||||||||||||||||||
| - If probe failures to **multiple targets persist**, GameFabric automatically sets the status to degraded on [status.gamefabric.com](https://status.gamefabric.com). | ||||||||||||||||||
|
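The interpretation rules above can be sketched as a small triage helper. This is only an illustration: the `FailureWindow` type, the `triage` function, the 60-second cutoff for a "short" failure, and the two-target cutoff for "multiple targets" are all hypothetical and are not values used by GameFabric or by status.gamefabric.com.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureWindow:
    """One red section on the dashboard (hypothetical representation)."""
    target: str   # probe target label, e.g. "aws" or "8.8.8.8"
    start: float  # window start, seconds since epoch
    end: float    # window end, seconds since epoch

    @property
    def duration(self) -> float:
        return self.end - self.start


# Assumed thresholds for illustration only; tune them for your own game.
SHORT_FAILURE_SECONDS = 60.0
MULTI_TARGET_COUNT = 2


def triage(windows: list[FailureWindow], used_targets: set[str]) -> str:
    """Return a coarse reading of one node's failed-probe windows."""
    prolonged = [w for w in windows if w.duration > SHORT_FAILURE_SECONDS]
    if not prolonged:
        # Only short failures (or none): usually nothing to worry about.
        return "ignore"
    prolonged_targets = {w.target for w in prolonged}
    if len(prolonged_targets) >= MULTI_TARGET_COUNT:
        # Several targets failing for a long time suggests a broader problem.
        return "investigate"
    if prolonged_targets & used_targets:
        # The single failing target is one the game actually depends on.
        return "investigate"
    # A prolonged failure to a target the game does not use.
    return "likely-unaffected"
```

For example, `triage([FailureWindow("gcp", 0, 600)], {"aws"})` returns `"likely-unaffected"`, because the only prolonged failure is to a provider the game does not use.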
|
||||||||||||||||||
| ### Best Practices | ||||||||||||||||||
|
|
||||||||||||||||||
| Nodes can occasionally experience network issues—100% reliability is not guaranteed. Game developers should implement their servers to be tolerant of network issues by: | ||||||||||||||||||
|
Contributor
I know this is written by AI, but the em dash isn't the right punctuation here, or at least the sentence should probably be inverted/changed like this: "Full network reliability is not guaranteed{,.} [Nn]odes can occasionally experience network issues."
||||||||||||||||||
|
|
||||||||||||||||||
| - Retrying failed connections | ||||||||||||||||||
| - Gracefully terminating the game server after multiple connection attempts fail | ||||||||||||||||||
|
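The two practices above (retrying, then shutting down cleanly once retries are exhausted) could look roughly like the following sketch. It uses only the Python standard library. The function names, the exponential backoff, the default attempt count, and the host `backend.example.com` are assumptions made for illustration, not part of GameFabric.

```python
import socket
import sys
import time


def connect_with_retry(host, port, attempts=5, base_delay=0.5, timeout=3.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Try to connect up to `attempts` times with exponential backoff.

    `connect` and `sleep` are injectable so the logic can be tested
    without real network access or real waiting.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait 0.5s, 1s, 2s, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise ConnectionError(
        f"giving up on {host}:{port} after {attempts} attempts"
    ) from last_error


def main() -> int:
    """Hypothetical game server entry point."""
    try:
        conn = connect_with_retry("backend.example.com", 443)
    except ConnectionError as exc:
        # Terminate gracefully: log the reason and exit with a non-zero
        # code instead of hanging or crashing mid-session.
        print(f"shutting down: {exc}", file=sys.stderr)
        return 1
    with conn:
        pass  # normal game server work would happen here
    return 0
```

A real server would call `sys.exit(main())` from its entry point and would also release any game-specific resources before returning.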
hloeffler marked this conversation as resolved.
Outdated
|
||||||||||||||||||
Please add a highlighted section stating that this dashboard is not causally consistent with network issues. It only indicates that specific routes from the server to the predefined targets might be disrupted. Network issues may occur even though no probes fail, and conversely, game servers might not experience connectivity issues even though probes are failing. Failing probes do not equal "network issues" per se.
Because of the vantage point of the probes, this is also only a local view. Probes towards cloud provider endpoints are a) highly selective: there is one public, global endpoint per cloud provider and no regional or "other" paths to the targets, and those paths might not be disrupted to the same degree as the paths towards the global endpoints (this depends on their implementation); and b) aimed at individual cloud provider services, not the entire cloud platform, so they give only a selective view of which services might encounter issues. For example, failing probes towards AWS S3 do not mean that all traffic towards AWS experiences connection issues.
thank you, will do