
Commit e6ae386

fix: implement UTC-day-skipped analytics client rotation
Reuse clients across today's and yesterday's hashes, and rewrite yesterday matches to today's hash instead of introducing another identity column. This keeps the identity model simple, preserves short-lived continuity across adjacent UTC midnights, and still rotates a client once a UTC day has been skipped. Also move page-view dedup ahead of session updates so duplicate hits do not inflate page_view_count, duration, or exit metrics. Add the migration to merge duplicate clients, enforce unique (site_id, hash), and update tests/docs for the new rotation strategy.
1 parent ce6ee6a commit e6ae386
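
The lookup order described in the commit message can be sketched roughly as follows. This is a minimal in-memory illustration, not the repository's code: `Store`, `Resolve`, and the simplified `Client` type are hypothetical names, and the real service works against the database through the repository layer. The sketch also assumes that a match on today's hash wins when both hashes exist, which the commit message does not spell out.

```go
package main

import "fmt"

// Client is a hypothetical, simplified stand-in for the commit's Client model.
type Client struct {
	ID   int64
	Hash string
}

// Store is an illustrative in-memory index keyed by (site ID, hash).
type Store struct {
	nextID  int64
	clients map[string]*Client
}

func NewStore() *Store { return &Store{clients: map[string]*Client{}} }

func key(siteID int64, hash string) string { return fmt.Sprintf("%d:%s", siteID, hash) }

// Resolve returns the client for a request, given the visitor's hash for
// today and for yesterday (both computed per UTC day by the caller).
func (s *Store) Resolve(siteID int64, todayHash, yesterdayHash string) *Client {
	// 1. Today's hash already known: reuse the client as-is.
	if c, ok := s.clients[key(siteID, todayHash)]; ok {
		return c
	}
	// 2. Only yesterday matches: rewrite the same row to today's hash.
	if c, ok := s.clients[key(siteID, yesterdayHash)]; ok {
		delete(s.clients, key(siteID, yesterdayHash))
		c.Hash = todayHash
		s.clients[key(siteID, todayHash)] = c
		return c
	}
	// 3. Neither matches (a full UTC day was skipped, or a first visit): new client.
	s.nextID++
	c := &Client{ID: s.nextID, Hash: todayHash}
	s.clients[key(siteID, todayHash)] = c
	return c
}

func main() {
	s := NewStore()
	mon := s.Resolve(1, "h-mon", "h-sun")
	tue := s.Resolve(1, "h-tue", "h-mon") // adjacent day: same row, rewritten hash
	thu := s.Resolve(1, "h-thu", "h-wed") // Wednesday skipped: new client
	fmt.Println(mon.ID == tue.ID, tue.Hash, thu.ID != mon.ID)
}
```

Because the row is rewritten in place, only one hash per client is ever stored, which is why no second identity column is needed.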

20 files changed

Lines changed: 964 additions & 229 deletions

ANALYTICS.md

Lines changed: 6 additions & 3 deletions
@@ -8,9 +8,11 @@ Server-generated visitor ID computed from minimized request signals:
 - Hash algorithm: truncated HMAC-SHA-256
 - Key derivation: site-scoped, daily key derived from a server secret
 - Inputs: internal site ID, truncated IP prefix, browser family, device class
-- Daily rotation: visitor ID changes every UTC day
+- UTC-day-skipped rotation: the server checks today's and yesterday's hash
+- Adjacent-day reuse: if only yesterday matches, the same client row is rewritten to today's hash
+- New client only after a full UTC day was skipped
 - No client-side storage or cookies
-- Same visitor receives consistent ID throughout the day
+- Same visitor receives a consistent ID within the day and across an adjacent UTC midnight
 - Country is not part of the visitor ID
 - The server secret helps reduce the impact of database-only leaks by making visitor IDs harder to recompute outside the app

@@ -28,6 +30,7 @@ Filters non-human traffic:
 Prevents duplicate counting:
 - 10-second deduplication window per visitor per page
 - Filters double-clicks, script reloads, same-path SPA updates within 10s
+- Duplicate hits are ignored before page-view counters or session exit metrics change
 - Ensures accurate page view metrics

 ## Query Parameters
@@ -56,7 +59,7 @@ Tracks browsing sessions:
 ## Privacy

 - No client-side cookies or persistent identifiers
-- Visitor IDs rotate daily
+- Visitor IDs use UTC-day-skipped rotation
 - Visitor IDs are derived server-side from minimized signals
 - Site-scoped keying prevents reuse across sites
 - Keyed visitor IDs reduce the value of database-only leaks
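
The deduplication bullets in ANALYTICS.md say that duplicate hits are dropped before any counter changes. A minimal sketch of that ordering is below. `Tracker`, `Session`, and `RecordPageView` are hypothetical names, timestamps are plain Unix seconds to keep the example small, and the treatment of a hit exactly 10 seconds after the previous one (counted here) is an assumption.

```go
package main

import "fmt"

// dedupWindowSeconds mirrors the 10-second window from ANALYTICS.md.
const dedupWindowSeconds = 10

// Session holds the metrics that duplicate hits must not touch (illustrative fields).
type Session struct {
	PageViewCount int
	ExitPath      string
	LastSeenUnix  int64
}

// Tracker is a hypothetical in-memory stand-in for the analytics service.
type Tracker struct {
	lastHit  map[string]int64   // last counted hit per client and path
	sessions map[int64]*Session // one session per client, for brevity
}

func NewTracker() *Tracker {
	return &Tracker{lastHit: map[string]int64{}, sessions: map[int64]*Session{}}
}

// RecordPageView runs the dedup check first and updates session metrics
// only for hits that survive it. It reports whether the hit was counted.
func (t *Tracker) RecordPageView(clientID int64, path string, atUnix int64) bool {
	k := fmt.Sprintf("%d|%s", clientID, path)
	if prev, ok := t.lastHit[k]; ok && atUnix-prev < dedupWindowSeconds {
		return false // duplicate: counters and exit metrics stay untouched
	}
	t.lastHit[k] = atUnix

	s, ok := t.sessions[clientID]
	if !ok {
		s = &Session{}
		t.sessions[clientID] = s
	}
	s.PageViewCount++
	s.ExitPath = path
	s.LastSeenUnix = atUnix
	return true
}

func main() {
	tr := NewTracker()
	tr.RecordPageView(7, "/pricing", 100)
	tr.RecordPageView(7, "/pricing", 104) // within 10s on the same path: ignored
	tr.RecordPageView(7, "/docs", 105)
	tr.RecordPageView(7, "/pricing", 111) // 11s after the first counted hit
	fmt.Println(tr.sessions[7].PageViewCount, tr.sessions[7].ExitPath) // prints "3 /pricing"
}
```

If the session were updated before the dedup check, the ignored hit at second 104 would still have bumped the page-view count and last-seen time, which is the inflation the commit message describes.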

PRIVACY.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ Lovely Eye is self-hosted analytics. The site owner is the data controller. This

 ## Visitor Identifiers

-We derive a daily-rotating visitor identifier on the server. It is based on a keyed hash of site ID, truncated IP prefix, browser family, and device class. It changes every UTC day and is not a persistent identifier. This keyed approach helps reduce the impact of database-only leaks because the stored analytics rows do not include enough information to recompute the identifier on their own.
+We derive a keyed visitor identifier on the server from site ID, truncated IP prefix, browser family, and device class. The hash is computed per UTC day, but the server reuses the same client across `today` and `yesterday`; if only yesterday matches, that row is rewritten to today's hash. A new client is created only after a UTC day was skipped, so the identifier is still short-lived and not persistent. This keyed approach helps reduce the impact of database-only leaks because the stored analytics rows do not include enough information to recompute the identifier on their own.

 ## IP Addresses Under GDPR

README.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ Self-hosted web analytics with a Go backend and React dashboard. Built for low-r

 ## Features

-- **Privacy-first**: no analytics cookies, daily visitor ID rotation with a keyed server-side visitor ID
+- **Privacy-first**: no analytics cookies, keyed server-side visitor identity with UTC-day-skipped rotation
 - **Bot filtering**: excludes crawlers, scrapers, monitoring bots.
 - **Lightweight**: runtime consumes around ~15MB of RAM on AMD processor.
 - **SQLite and PostgreSQL** supported.
@@ -146,7 +146,7 @@ After you started your containers:

 Country tracking downloads the GeoIP database on demand when at least one site enables it. If the download fails, the dashboard will show the error in site settings.

-Analytics visitor identity is server-generated, rotates daily in UTC, and is derived from a keyed hash of site ID, truncated IP prefix, browser family, and device class. Country tracking is kept separate from visitor identity. The dedicated analytics identity secret helps reduce the impact of database-only leaks, because visitor IDs cannot be recomputed from stored analytics data alone.
+Analytics visitor identity is server-generated and derived from a keyed hash of site ID, truncated IP prefix, browser family, and device class. The hash is computed per UTC day, but the server reuses the same client across `today` and `yesterday`; if only yesterday matches, that row is rewritten to today's hash. Country tracking stays separate from visitor identity, sessions still expire after 30 minutes of inactivity, and the dedicated analytics identity secret helps reduce the impact of database-only leaks because visitor IDs cannot be recomputed from stored analytics data alone.

 ## Custom Events

server/CONTRIBUTING.md

Lines changed: 4 additions & 1 deletion
@@ -29,8 +29,11 @@

 ## Analytics identity

-- Visitor identity is server-generated and rotates daily in UTC
+- Visitor identity is server-generated and uses UTC-day-skipped rotation
 - Identity is derived from a keyed hash of: site ID, truncated IP prefix (`/24` for IPv4, `/64` for IPv6), browser family, and device class
+- The server checks today's and yesterday's hash; if only yesterday matches, it rewrites that client row to today's hash
+- A new client is created only after a full UTC day was skipped
+- Sessions still use 30-minute inactivity
 - Country tracking stays separate from visitor identity and is only used for reporting when enabled
 - Set `ANALYTICS_IDENTITY_SECRET` to control the identity key explicitly
 - If `ANALYTICS_IDENTITY_SECRET` is unset, the server falls back to `JWT_SECRET`
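
The identity bullets above can be illustrated with a short sketch of how a per-day keyed hash might be derived. The docs only state the ingredients (truncated HMAC-SHA-256, a site-scoped daily key, and the `/24` and `/64` prefixes), so the message layout, the two-step key derivation, and the 32-hex-character truncation below are assumptions, and `truncateIP`, `dayKey`, and `visitorHash` are hypothetical helper names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/netip"
	"time"
)

// truncateIP keeps the /24 of an IPv4 address or the /64 of an IPv6 address.
func truncateIP(raw string) (string, error) {
	addr, err := netip.ParseAddr(raw)
	if err != nil {
		return "", err
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	bits := 64
	if addr.Is4() {
		bits = 24
	}
	prefix, err := addr.Prefix(bits) // host bits are zeroed
	if err != nil {
		return "", err
	}
	return prefix.String(), nil
}

// dayKey formats a timestamp as its UTC calendar day.
func dayKey(t time.Time) string { return t.UTC().Format("2006-01-02") }

func mac(key []byte, msg string) []byte {
	h := hmac.New(sha256.New, key)
	h.Write([]byte(msg))
	return h.Sum(nil)
}

// visitorHash derives a site-scoped daily key from the secret, then hashes
// the minimized signals with it. Layout and truncation are assumptions.
func visitorHash(secret []byte, siteID int64, utcDay, ipPrefix, browser, device string) string {
	dailyKey := mac(secret, fmt.Sprintf("%d|%s", siteID, utcDay))
	sum := mac(dailyKey, ipPrefix+"|"+browser+"|"+device)
	return hex.EncodeToString(sum[:16])
}

func main() {
	secret := []byte("example-secret-of-at-least-32-characters")
	prefix, err := truncateIP("203.0.113.77")
	if err != nil {
		panic(err)
	}
	now := time.Date(2024, 5, 2, 0, 30, 0, 0, time.UTC)
	today := visitorHash(secret, 1, dayKey(now), prefix, "firefox", "desktop")
	yesterday := visitorHash(secret, 1, dayKey(now.AddDate(0, 0, -1)), prefix, "firefox", "desktop")
	fmt.Println(prefix, len(today), today != yesterday) // prints "203.0.113.0/24 32 true"
}
```

Computing both `today` and `yesterday` for each request is what lets the server check the two hashes without storing any longer-lived identifier.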

server/e2e/README.md

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@ No source code as test dependencies. Exceptions:
 - basic types/constants for data integrity
 - generated operations: `import operations "github.com/lovely-eye/server/e2e/generated"`

-Analytics e2e tests should use a fixed `ANALYTICS_IDENTITY_SECRET` so visitor identity stays deterministic across test runs.
+Analytics e2e tests should use a fixed `ANALYTICS_IDENTITY_SECRET` so visitor identity stays deterministic across test runs, including the UTC-day-skipped `today`/`yesterday` client reuse path.

server/internal/README.md

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@ Modules:
 - `./models` - Domain models with [Bun](https://github.com/uptrace/bun) annotations. Defines User, Site, Client, Session, Event, and event definition entities.
 - `./repository` - Data access layer. Provides CRUD operations for all models using [Bun ORM](https://github.com/uptrace/bun).
 - `./server` - Application bootstrap and HTTP server setup. Wires all dependencies and configures routes.
-- `./services` - Business logic layer. Contains SiteService and AnalyticsService with domain operations, including pseudonymous visitor identity and session handling.
+- `./services` - Business logic layer. Contains SiteService and AnalyticsService with domain operations, including pseudonymous visitor identity with UTC-day-skipped rotation and 30-minute session handling.

server/internal/auth/README.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ No CSRF tokens needed. See [discussion](https://www.reddit.com/r/node/comments/1
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `JWT_SECRET` | generated at startup if empty | Secret key for signing tokens. Must be at least 32 characters when set. Set it explicitly in production if sessions should survive restarts. |
-| `ANALYTICS_IDENTITY_SECRET` | falls back to `JWT_SECRET` | Optional dedicated secret for analytics visitor identity. Must be at least 32 characters when set. Helps reduce the impact of database-only leaks by making visitor IDs harder to recompute. |
+| `ANALYTICS_IDENTITY_SECRET` | falls back to `JWT_SECRET` | Optional dedicated secret for analytics visitor identity. Must be at least 32 characters when set. Analytics uses it for the daily UTC hashes behind UTC-day-skipped rotation, and it helps reduce the impact of database-only leaks by making visitor IDs harder to recompute. |
 | `JWT_ACCESS_EXPIRY_MINUTES` | `15` | Access token lifetime in minutes |
 | `JWT_REFRESH_DAYS` | `7` | Refresh token lifetime in days |
 | `SECURE_COOKIES` | `true` | Set to `true` in production (requires HTTPS) |

server/internal/models/models.go

Lines changed: 7 additions & 6 deletions
@@ -85,15 +85,16 @@ type Country struct {
 	Name string `bun:"name,notnull,type:varchar(128)" json:"name"`
 }

-// Client represents a pseudonymous visitor identity within the current rotation window.
-// Stores coarse client attributes used for analytics breakdowns.
+// Client represents a pseudonymous visitor identity resolved by UTC-day-skipped
+// rotation. The stored hash is a daily UTC key; matching yesterday rewrites the
+// same row to today's hash so continuity survives adjacent UTC-day boundaries.
 type Client struct {
 	bun.BaseModel `bun:"table:clients,alias:c"`

-	ID int64 `bun:"id,pk,autoincrement" json:"id"`
-	SiteID int64 `bun:"site_id,notnull" json:"site_id"`
-	Hash string `bun:"hash,notnull,type:varchar(64)" json:"hash"` // Truncated HMAC-SHA-256 hex over site-scoped, minimized visitor signals
-	Country string `bun:"country,type:varchar(2)" json:"country"`
+	ID int64 `bun:"id,pk,autoincrement" json:"id"`
+	SiteID int64 `bun:"site_id,notnull,unique:clients_site_id_hash" json:"site_id"`
+	Hash string `bun:"hash,notnull,type:varchar(64),unique:clients_site_id_hash" json:"hash"` // Truncated HMAC-SHA-256 hex over site-scoped daily UTC visitor signals
+	Country string `bun:"country,type:varchar(2)" json:"country"`
 	Device ClientDevice `bun:"device,notnull,default:0" json:"device"`
 	Browser ClientBrowser `bun:"browser,notnull,default:0" json:"browser"`
 	OS ClientOS `bun:"os,notnull,default:0" json:"os"`
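
A unique `(site_id, hash)` constraint can only be added once existing duplicates are gone, which is why the commit message pairs it with a merge migration. The migration itself is not shown in this excerpt, so the following is only an in-memory sketch of the idea: `mergeDuplicates` is a hypothetical helper, and keeping the lowest ID per pair is an assumption. The returned remap is what dependent rows (for example sessions) would need in order to point at the surviving client.

```go
package main

import (
	"fmt"
	"sort"
)

// Client is a simplified stand-in with only the fields the constraint covers.
type Client struct {
	ID     int64
	SiteID int64
	Hash   string
}

// pair is the (site_id, hash) key that must become unique.
type pair struct {
	siteID int64
	hash   string
}

// mergeDuplicates keeps the lowest-ID row per (site ID, hash) and returns the
// survivors plus a map from each dropped client ID to its surviving ID.
func mergeDuplicates(clients []Client) ([]Client, map[int64]int64) {
	sorted := append([]Client(nil), clients...) // copy so the input stays untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	keep := map[pair]int64{}
	remap := map[int64]int64{}
	var out []Client
	for _, c := range sorted {
		k := pair{c.SiteID, c.Hash}
		if survivor, ok := keep[k]; ok {
			remap[c.ID] = survivor // duplicate: repoint dependents to the survivor
			continue
		}
		keep[k] = c.ID
		out = append(out, c)
	}
	return out, remap
}

func main() {
	out, remap := mergeDuplicates([]Client{
		{ID: 1, SiteID: 1, Hash: "a"},
		{ID: 2, SiteID: 1, Hash: "b"},
		{ID: 3, SiteID: 1, Hash: "a"}, // duplicate of client 1
		{ID: 4, SiteID: 2, Hash: "a"}, // same hash, different site: kept
	})
	fmt.Println(len(out), remap) // prints "3 map[3:1]"
}
```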
