Skip to content

Fix critical memory leaks and add graceful shutdown#26

Open
statico wants to merge 4 commits intodaniela-hase:mainfrom
statico:main
Open

Fix critical memory leaks and add graceful shutdown#26
statico wants to merge 4 commits intodaniela-hase:mainfrom
statico:main

Conversation

@statico
Copy link
Copy Markdown

@statico statico commented Jan 29, 2026

Overview

This PR fixes multiple critical memory leaks that cause unbounded memory growth in Docker deployments, eventually hitting memory limits and causing instability.

Note: This work was completed entirely using Claude Code, Anthropic's CLI tool for software development.

Critical Issues Fixed

1. UDP Socket Leak (CRITICAL)

  • Problem: A new UDP socket was created for every ONVIF discovery probe response and never closed
  • Impact: With frequent discovery probes (every few seconds), this accumulated hundreds of unclosed sockets, eventually causing file descriptor exhaustion
  • Fix: Reuse the main discovery socket instead of creating ephemeral sockets

2. Missing Graceful Shutdown (CRITICAL)

  • Problem: No cleanup on SIGTERM/SIGINT signals, leaving all sockets and servers open
  • Impact: Docker container restarts leave zombie resources, compounding memory issues
  • Fix: Added comprehensive shutdown handlers that properly close all resources

3. TCP Proxy Leak (CRITICAL)

  • Problem: TCP proxy instances were never tracked or cleaned up
  • Impact: RTSP and snapshot proxy connections accumulate without cleanup
  • Fix: Track all proxy instances and call .end() on shutdown

4. xml2js Parser Leak (CRITICAL)

  • Problem: xml2js.parseString() creates a new Parser instance on every call, which happened on every discovery probe
  • Impact: Hundreds of parser instances created per hour, each with its own memory overhead
  • Fix: Create reusable Parser instance in constructor

5. SOAP Client HTTP Agent Leak (HIGH)

  • Problem: SOAP client in config-builder maintains HTTP connection pools that are never destroyed
  • Impact: HTTP agents accumulate during config generation
  • Fix: Destroy HTTP agent in finally block to ensure cleanup

6. Snapshot File I/O (HIGH)

  • Problem: snapshot.png read from disk synchronously on every HTTP request
  • Impact: Excessive file I/O and potential file handle accumulation under load
  • Fix: Cache snapshot in memory on first read

7. Context Loss Bug (MEDIUM)

  • Problem: HTTP server callback passed without binding, breaking this context
  • Impact: Snapshot cache doesn't work, falling back to file I/O every time
  • Fix: Use .bind(this) when creating HTTP server

8. Duplicate Event Listeners (MEDIUM)

  • Problem: Debug listeners could be added multiple times if enableDebugOutput() called repeatedly
  • Impact: Event listener accumulation causing memory growth
  • Fix: Guard against duplicate listener registration

9. Global Prototype Pollution (LOW)

  • Problem: Date.prototype.getUTCHours modified globally and never restored
  • Impact: Global state pollution affecting all Date instances
  • Fix: Save and restore original method in finally block

Additional Improvements

  • Added error handlers to HTTP server and discovery socket to prevent crashes
  • Replaced deprecated url.parse() with modern URL API
  • Extracted discovery response generation to separate method for better organization
  • Added error handling for XML parse failures

Testing Recommendations

Monitor Docker container memory under load:

docker stats --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"

Memory usage should now stabilize instead of growing unbounded, especially under continuous ONVIF discovery traffic.

Bonus: GitHub Actions Workflow

Added automatic Docker image builds to GitHub Container Registry on every commit, with multi-platform support (amd64/arm64).

Impact

These fixes address the root causes of memory leaks that would eventually hit the 1.5G Docker memory limit, causing container instability. The server should now run indefinitely without memory growth.


Developed with: Claude Code by Anthropic

statico and others added 4 commits January 28, 2026 16:48
- Fix UDP socket leak: reuse discovery socket instead of creating new ones
- Add graceful shutdown on SIGTERM/SIGINT for Docker compatibility
- Cache snapshot.png in memory to prevent repeated file I/O
- Prevent duplicate event listener registration in debug mode
- Add error handlers for HTTP server and discovery socket
- Implement shutdown() method to properly clean up resources

Co-authored-by: Claude Code <claude@anthropic.com>
- Store and properly clean up TCP proxy instances on shutdown
- Fix 'this' binding in HTTP server listen callback
- Clean up SOAP client HTTP agent in config-builder
- Restore Date.prototype.getUTCHours after modification to prevent global pollution
- Add try-finally blocks to ensure cleanup happens even on errors

Co-authored-by: Claude Code <claude@anthropic.com>
- Create reusable xml2js Parser instance instead of new parser on every discovery message
- Cache XML parser options to avoid recreating array/object on each parse
- Extract discovery response generation to separate method for better code organization
- Replace deprecated url.parse with modern URL API
- Add error handling for XML parse failures in discovery

This fixes a critical memory leak where a new Parser instance was created
for every ONVIF discovery probe, which could happen hundreds of times per hour.

Co-authored-by: Claude Code <claude@anthropic.com>
- Builds and pushes Docker image to ghcr.io on every commit to main
- Also builds (but doesn't push) on pull requests for testing
- Supports multi-platform builds (amd64 and arm64)
- Uses GitHub Actions cache for faster builds
- Automatically tags images with branch name, commit SHA, and 'latest'

The workflow uses GITHUB_TOKEN for authentication, which is automatically
provided by GitHub Actions - no manual token setup required.

Co-authored-by: Claude Code <claude@anthropic.com>
@statico
Copy link
Copy Markdown
Author

statico commented Jan 29, 2026

(Human here: This tool was awesome but clearly running out of memory after a few hours. I asked Claude to fix all memory leaks and it seems to be stable now.)

@thom-vend
Copy link
Copy Markdown

Those fixes helped me, I need more fix and vendor WSDL to run offline: https://github.com/thom-vend/onvif-server

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants