[License: Unlicense](https://choosealicense.com/licenses/unlicense/) [Closed pull requests](https://github.com/kunduso-org/github-self-hosted-runner-amazon-ec2-terraform/pulls?q=is%3Apr+is%3Aclosed) [Pull requests](https://GitHub.com/kunduso-org/github-self-hosted-runner-amazon-ec2-terraform/pull/)
[Closed issues](https://github.com/kunduso-org/github-self-hosted-runner-amazon-ec2-terraform/issues?q=is%3Aissue+is%3Aclosed) [Issues](https://GitHub.com/kunduso-org/github-self-hosted-runner-amazon-ec2-terraform/issues/)
[terraform-infra-provisioning](https://github.com/kunduso-org/github-self-hosted-runner-amazon-ec2-terraform/actions/workflows/terraform.yml) [code-scan](https://github.com/kunduso-org/github-self-hosted-runner-amazon-ec2-terraform/actions/workflows/code-scan.yml)

# GitHub Self-Hosted Runner on Amazon EC2 with Terraform

This repository contains Terraform infrastructure code to deploy scalable, self-hosted GitHub Actions runners on Amazon EC2 instances. The solution provides automated runner provisioning, lifecycle management, and secure deregistration using AWS Auto Scaling Groups, Lambda functions, and CloudWatch logging.

## Features

- **High Availability**: Maintains consistent runner capacity using an AWS Auto Scaling Group that automatically replaces instances across multiple Availability Zones
- **Secure Authentication**: Uses GitHub App authentication for secure API access
- **Automated Lifecycle Management**: Registers and deregisters runners automatically, with two deregistration mechanisms (a Lambda function and a systemd service)
- **Automated Deregistration**: Prevents orphaned runners in the GitHub organization using lifecycle hooks and Lambda functions
- **Unified Logging**: Centralized CloudWatch logging that tracks the complete runner lifecycle
- **Network Security**: Runs in private subnets, with a NAT Gateway for outbound internet access
- **Encryption**: KMS encryption for secrets, CloudWatch logs, EFS storage, SNS topics, and Lambda functions
- **Performance Optimization**: EFS with tuned NFS parameters, plus a Lambda layer to reduce cold start times
- **Cost Optimization**: EFS storage for a shared runner workspace and dependency cache, which reduces runner startup time

## Architecture

The solution deploys:
- **VPC with public/private subnets** across multiple Availability Zones
- **Auto Scaling Group** with EC2 instances running GitHub Actions runners
- **Auto Scaling Lifecycle Hooks** for graceful runner deregistration on instance termination
- **SNS Topic** with KMS encryption for lifecycle event notifications
- **Lambda function** for automated runner deregistration via the GitHub API
- **Lambda Layer** with the PyJWT and cryptography dependencies for optimized performance
- **Dead Letter Queue** that captures failed Lambda invocations for inspection and reprocessing
- **EFS file system** for shared runner workspace storage with optimized NFS parameters
- **CloudWatch log groups** for unified, structured lifecycle logging
- **Secrets Manager** for secure storage of the GitHub App credentials
- **SSM Parameter Store** for the runner configuration scripts and deregistration service
- **Systemd Service** as a backup deregistration mechanism
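
The lifecycle hook, SNS topic, and Lambda function work as a chain. When the Auto Scaling Group starts terminating an instance, the hook sends a notification to the SNS topic, and the topic invokes the deregistration Lambda. The sketch below shows how such a handler might pull the instance details out of the SNS envelope. It then builds the arguments for `CompleteLifecycleAction`, which releases the hook once deregistration is done.

This is a minimal illustration, not this repository's Lambda code. The helper names `parse_lifecycle_message` and `complete_action_kwargs` are hypothetical. The sketch assumes the hook sends standard Auto Scaling lifecycle notification JSON.

```python
import json
from typing import Optional


def parse_lifecycle_message(sns_event: dict) -> Optional[dict]:
    """Extract an Auto Scaling lifecycle notification from an SNS-triggered Lambda event.

    Returns None for anything that is not an instance-termination event, such as
    the autoscaling:TEST_NOTIFICATION that AWS sends when a hook is first created.
    """
    # SNS delivers the notification body as a JSON string inside the record.
    message = json.loads(sns_event["Records"][0]["Sns"]["Message"])
    if message.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
        return None
    return message


def complete_action_kwargs(message: dict, result: str = "CONTINUE") -> dict:
    """Build keyword arguments for boto3's autoscaling.complete_lifecycle_action."""
    return {
        "LifecycleHookName": message["LifecycleHookName"],
        "AutoScalingGroupName": message["AutoScalingGroupName"],
        "LifecycleActionToken": message["LifecycleActionToken"],
        "LifecycleActionResult": result,
        "InstanceId": message["EC2InstanceId"],
    }


if __name__ == "__main__":
    # Hypothetical sample event shaped like an SNS-delivered lifecycle notification.
    sample = {"Records": [{"Sns": {"Message": json.dumps({
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "LifecycleHookName": "example-hook",
        "AutoScalingGroupName": "example-asg",
        "LifecycleActionToken": "example-token",
        "EC2InstanceId": "i-0123456789abcdef0",
    })}}]}
    msg = parse_lifecycle_message(sample)
    print(complete_action_kwargs(msg))
    # In a real handler, after the runner has been deregistered:
    # boto3.client("autoscaling").complete_lifecycle_action(**complete_action_kwargs(msg))
```

Returning `None` for the test notification keeps the handler from trying to deregister a runner that does not exist.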

## Prerequisites

Before deploying this infrastructure, ensure that the following prerequisites are met:

### AWS Setup
- An AWS account with permissions to create and manage the resources in this repository
- An OpenID Connect identity provider in AWS IAM, and an IAM role whose trust policy allows this GitHub repository to assume it ([detailed setup guide](https://skundunotes.com/2023/02/28/securely-integrate-aws-credentials-with-github-actions-using-openid-connect/))
- The ARN of that IAM role stored as a GitHub secret named `IAM_ROLE`, which the `terraform.yml` workflow references via `${{ secrets.IAM_ROLE }}`

### GitHub Setup
- A GitHub organization where the self-hosted runners will be registered
- A GitHub App created in the organization with the following permissions:
  - Repository permissions: `Actions (Read)`, `Administration (Read)`, `Metadata (Read)`
  - Organization permissions: `Self-hosted runners (Write)`
- GitHub App credentials (App ID, Installation ID, and Private Key) stored in AWS Secrets Manager
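
The App ID, Installation ID, and private key are used together in three steps:

1. Sign a short-lived JSON Web Token (JWT) with the private key.
2. Exchange that JWT for an installation access token.
3. Call the organization's self-hosted runner endpoints with the installation token.

The sketch below builds the JWT claims and the GitHub REST URLs involved, without making any network calls. Signing would be done with PyJWT, for example `jwt.encode(claims, private_key, algorithm="RS256")`. The helper names are hypothetical. Matching a runner by name also assumes that runners register with their EC2 instance ID as the runner name, which this README does not confirm.

```python
import time
from typing import Optional

GITHUB_API = "https://api.github.com"


def app_jwt_claims(app_id: str, now: Optional[int] = None) -> dict:
    """Claims for a GitHub App JWT (to be signed with RS256 using the App's private key)."""
    now = int(time.time()) if now is None else now
    return {
        "iat": now - 60,      # backdate 60 s to tolerate clock drift
        "exp": now + 9 * 60,  # GitHub rejects expirations more than 10 minutes out
        "iss": app_id,
    }


def installation_token_url(installation_id: str) -> str:
    # POST with "Authorization: Bearer <JWT>"; the returned token is valid for one hour.
    return f"{GITHUB_API}/app/installations/{installation_id}/access_tokens"


def list_runners_url(org: str) -> str:
    # GET with "Authorization: Bearer <installation token>".
    return f"{GITHUB_API}/orgs/{org}/actions/runners"


def find_runner_id(runners_response: dict, runner_name: str) -> Optional[int]:
    """Look up a runner's numeric ID in a 'list self-hosted runners' response."""
    for runner in runners_response.get("runners", []):
        if runner.get("name") == runner_name:
            return runner["id"]
    return None


def delete_runner_url(org: str, runner_id: int) -> str:
    # DELETE on this URL removes the runner from the organization.
    return f"{GITHUB_API}/orgs/{org}/actions/runners/{runner_id}"


if __name__ == "__main__":
    # Hypothetical response fragment; runner names are assumed to be instance IDs.
    response = {"total_count": 1, "runners": [{"id": 42, "name": "i-0123456789abcdef0"}]}
    runner_id = find_runner_id(response, "i-0123456789abcdef0")
    print(delete_runner_url("example-org", runner_id))
```

The list endpoint is paginated (`per_page` accepts up to 100), so a real implementation would need to page through results in organizations with many runners.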

### Infracost Integration (Optional)
- An `INFRACOST_API_KEY` stored as a GitHub Actions secret for cost estimation
- A GitHub Actions variable `INFRACOST_SCAN_TYPE` set to either `hcl_code` or `tf_plan`, depending on the desired scan type
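
The two scan types correspond to the two kinds of input that `infracost breakdown --path` accepts:

- `hcl_code`: a directory of Terraform code.
- `tf_plan`: a Terraform plan exported to JSON with `terraform show -json`.

The sketch below shows one way a workflow step might choose between them. The function name and the `tfplan` and `plan.json` file names are assumptions, not taken from this repository's workflow.

```python
from typing import List


def infracost_commands(scan_type: str, tf_dir: str = ".") -> List[List[str]]:
    """Map an INFRACOST_SCAN_TYPE value to the commands a workflow step might run.

    For tf_plan, the stdout of `terraform show -json tfplan` must be redirected
    to plan.json before Infracost reads it.
    """
    if scan_type == "hcl_code":
        # Infracost parses the Terraform HCL directly; no plan is required.
        return [["infracost", "breakdown", "--path", tf_dir]]
    if scan_type == "tf_plan":
        return [
            ["terraform", "plan", "-input=false", "-out=tfplan"],
            ["terraform", "show", "-json", "tfplan"],  # redirect stdout to plan.json
            ["infracost", "breakdown", "--path", "plan.json"],
        ]
    raise ValueError(f"INFRACOST_SCAN_TYPE must be 'hcl_code' or 'tf_plan', got {scan_type!r}")
```

The trade-off between the two:

- Scanning HCL is faster and needs no AWS credentials.
- Scanning a plan reflects the resolved values Terraform will actually apply.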

## Usage

This infrastructure is deployed automatically by the GitHub Actions workflow defined in `.github/workflows/terraform.yml`. The workflow automates the full CI/CD process: security scanning, cost estimation, and infrastructure deployment.

### Automated Deployment Pipeline

The `terraform.yml` workflow includes the following automated stages:

#### 1. **Terraform Validation and Planning**
- **Terraform Format Check**: Ensures the code follows canonical formatting
- **Terraform Validation**: Checks that the configuration is syntactically valid and internally consistent
- **Terraform Plan**: Generates an execution plan showing the proposed changes
- **Plan Output**: Posts the detailed plan as a PR comment for review

#### 2. **Security and Cost Analysis**
- **Checkov Security Scan**: Identifies security misconfigurations and compliance issues
- **Infracost Analysis**: Provides cost estimates for infrastructure changes
- **Cost Comparison**: Shows the cost difference between the current and proposed infrastructure

#### 3. **Automated Deployment**
- **Trigger**: Automatically deploys on pushes to the `main` branch
- **Authentication**: Uses OIDC for secure, temporary AWS credentials
- **Terraform Apply**: Provisions the infrastructure with the GitHub App credentials
- **State Management**: Maintains Terraform state in a remote backend
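
To catch failures before opening a pull request, the validation and planning stage can be reproduced locally. The sketch below runs the corresponding Terraform commands in order and stops at the first failure. It assumes the Terraform CLI is on `PATH` and that credentials for the remote backend are available for `init` and `plan`. The flags shown are standard Terraform CLI options, but the exact flags used by `terraform.yml` may differ.

```python
import shutil
import subprocess
import sys
from typing import List

# Commands mirroring the workflow's validation and planning stage.
CHECKS: List[List[str]] = [
    ["terraform", "fmt", "-check", "-recursive"],          # canonical formatting
    ["terraform", "init", "-input=false"],                 # providers and backend
    ["terraform", "validate"],                             # syntax and internal consistency
    ["terraform", "plan", "-input=false", "-out=tfplan"],  # proposed changes
]


def run_checks(checks: List[List[str]] = CHECKS) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    if shutil.which(checks[0][0]) is None:
        print(f"{checks[0][0]} not found on PATH", file=sys.stderr)
        return 127
    for cmd in checks:
        print("$", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_checks()` from the repository root before pushing surfaces the same formatting and validation errors that would otherwise appear in the PR checks.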

### Configuration Steps

#### 1. Configure GitHub Secrets
Set up the following secrets in your GitHub repository:
- `IAM_ROLE`: ARN of the OIDC-assumable IAM role
- `THIS_GITHUB_APP_ID`: GitHub App ID for runner authentication
- `THIS_GITHUB_INSTALLATION_ID`: GitHub App Installation ID
- `THIS_GITHUB_PRIVATE_KEY`: GitHub App private key
- `INFRACOST_API_KEY`: API key for cost estimation (optional)

#### 2. Store GitHub App Credentials in AWS
Create a secret in AWS Secrets Manager with GitHub App credentials:
```json
{
  "app_id": "123456",
  "installation_id": "12345678",
  "private_key": "the-private-key"
}
```
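
Because the private key is a multi-line PEM file, its line breaks must be escaped as `\n` inside the JSON string, and serializing with `json.dumps` does this automatically. The sketch below builds the secret payload from PEM text. The function name, the `.pem` file path, and the secret name in the commented `boto3` call are hypothetical, so match them to what the Terraform code in this repository expects.

```python
import json


def build_github_app_secret(app_id: str, installation_id: str, private_key_pem: str) -> str:
    """Return a SecretString holding the GitHub App credentials as JSON."""
    if "BEGIN" not in private_key_pem:
        raise ValueError("private key does not look like a PEM block")
    payload = {
        "app_id": app_id,
        "installation_id": installation_id,
        # json.dumps escapes the PEM's newlines as \n, keeping the JSON valid.
        "private_key": private_key_pem.strip() + "\n",
    }
    return json.dumps(payload)


# Usage (hypothetical file path and secret name; requires boto3 and AWS credentials):
# with open("github-app.private-key.pem") as f:
#     secret_string = build_github_app_secret("123456", "12345678", f.read())
# import boto3
# boto3.client("secretsmanager").create_secret(
#     Name="example-github-app-credentials", SecretString=secret_string)
```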

### Deployment Process

#### Pull Request Workflow
1. **Create Feature Branch**: Make changes in a feature branch
2. **Open Pull Request**: Triggers validation, security scanning, and cost analysis
3. **Review Automation**:
   - Terraform plan posted as a PR comment
   - Checkov findings displayed in the PR
   - Infracost analysis showing the cost impact
4. **Merge to Main**: Triggers automatic deployment

#### Production Deployment
1. **Automatic Trigger**: Merging to the `main` branch starts deployment
2. **Secure Authentication**: OIDC provides temporary AWS credentials
3. **Infrastructure Provisioning**: Terraform applies the changes to AWS
4. **Validation**: Deployment success is confirmed through the workflow logs

### Monitoring and Validation

#### Deployment Status
- **Workflow Badge**: Click the terraform-infra-provisioning badge above for real-time status
- **GitHub Actions Logs**: Detailed logs are available in the Actions tab
- **Terraform State**: Remote state tracks all deployed resources

#### Runner Validation
- **GitHub Organization**: Verify that the runners appear in the organization's Actions settings
- **CloudWatch Logs**: Monitor the registration process in the `/{name}/lifecycle` log group
- **Auto Scaling Group**: Check that EC2 instances are launching successfully
- **EFS Mount**: Verify that the shared workspace storage is accessible
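
Runner registration can also be checked programmatically. The organization endpoint `GET /orgs/{org}/actions/runners` returns each runner's `status` (`online` or `offline`) and a `busy` flag. The sketch below summarizes such a response and flags offline runners, which may indicate an instance that terminated without deregistering. The function name is hypothetical. So is the assumption that the online count should match the Auto Scaling Group's desired capacity.

```python
from typing import Dict, List


def summarize_runners(runners_response: dict, expected_online: int) -> Dict[str, object]:
    """Summarize a 'list self-hosted runners' response against an expected online count."""
    runners: List[dict] = runners_response.get("runners", [])
    online = [r["name"] for r in runners if r.get("status") == "online"]
    offline = [r["name"] for r in runners if r.get("status") != "online"]
    busy = [r["name"] for r in runners if r.get("busy")]
    return {
        "online": online,
        "offline": offline,  # candidates for orphaned registrations
        "busy": busy,
        "healthy": len(online) >= expected_online and not offline,
    }
```

The fetch itself would use the installation-token authentication described in the GitHub Setup prerequisites.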

## Configuration

### Key Variables
The infrastructure can be customized by modifying the default values in `variables.tf`:

- `region`: AWS region for deployment (default: "us-west-2")
- `name`: Prefix for all resource names (default: "github-self-hosted-runner")
- `github_organization`: GitHub organization name (must be updated)
- `runner_instance_type`: EC2 instance type for runners (default: "t3.medium")
- `runner_min_size`: Minimum number of runners (default: 1)
- `runner_max_size`: Maximum number of runners
- `runner_desired_capacity`: Desired number of runners
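
Instead of editing the defaults in `variables.tf`, the same variables can be supplied through a `terraform.tfvars.json` file, which Terraform loads automatically. The sketch below renders such a file after checking the constraint an Auto Scaling Group enforces: `min_size <= desired_capacity <= max_size`. The values are placeholders. This README lists no defaults for `runner_max_size` and `runner_desired_capacity`, so the numbers shown are illustrative only.

```python
import json


def render_tfvars(variables: dict) -> str:
    """Validate runner capacity settings and return terraform.tfvars.json content."""
    lo = variables["runner_min_size"]
    hi = variables["runner_max_size"]
    want = variables["runner_desired_capacity"]
    # An Auto Scaling Group rejects a desired capacity outside [min, max].
    if not lo <= want <= hi:
        raise ValueError(f"need min <= desired <= max, got {lo}, {want}, {hi}")
    return json.dumps(variables, indent=2) + "\n"


# Placeholder values; github_organization must be set to your organization.
example = {
    "region": "us-west-2",
    "name": "github-self-hosted-runner",
    "github_organization": "example-org",
    "runner_instance_type": "t3.medium",
    "runner_min_size": 1,
    "runner_max_size": 3,
    "runner_desired_capacity": 2,
}

# from pathlib import Path
# Path("terraform.tfvars.json").write_text(render_tfvars(example))
```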

### Logging Structure
The solution writes every lifecycle phase to a single log group, `/{name}/lifecycle`, with one log stream per instance and phase:
```
/{name}/lifecycle/
├── {instance-id}/registration
├── {instance-id}/execution
└── {instance-id}/deregistration
```
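
Given this layout, each phase of an instance's lifecycle maps to one log group and one log stream. The sketch below derives those names and shows the `get_log_events` call that would read a stream. It assumes `/{name}/lifecycle` is the log group and `{instance-id}/{phase}` is the stream name, which is how the tree reads but is not otherwise confirmed. The function name is hypothetical.

```python
from typing import Tuple

PHASES = ("registration", "execution", "deregistration")


def lifecycle_log_location(name: str, instance_id: str, phase: str) -> Tuple[str, str]:
    """Return (log group, log stream) for one instance's lifecycle phase."""
    if phase not in PHASES:
        raise ValueError(f"phase must be one of {PHASES}, got {phase!r}")
    return f"/{name}/lifecycle", f"{instance_id}/{phase}"


# Reading a stream (requires boto3 and AWS credentials):
# import boto3
# group, stream = lifecycle_log_location(
#     "github-self-hosted-runner", "i-0123456789abcdef0", "registration")
# events = boto3.client("logs").get_log_events(
#     logGroupName=group, logStreamName=stream, startFromHead=True)["events"]
# for event in events:
#     print(event["message"])
```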

## Security Considerations

- All runners operate in private subnets with no direct internet access; outbound traffic goes through the NAT Gateway
- GitHub App authentication provides scoped, time-limited access tokens
- All secrets are encrypted using customer-managed KMS keys
- CloudWatch logs are encrypted at rest with KMS
- The EFS file system uses encryption in transit and at rest
- SNS topics and Lambda functions are encrypted with customer-managed KMS keys
- Lambda functions run inside the VPC, in private subnets
- The Dead Letter Queue is encrypted to protect failed event payloads
- Security groups restrict network access to the necessary ports only
- IAM roles follow the principle of least privilege

## Troubleshooting

### Common Issues
1. **Runner registration failures**: Check the GitHub App permissions and the credentials in Secrets Manager
2. **Instance launch failures**: Verify the VPC configuration and security group rules
3. **Deregistration issues**: Check the Lambda function logs in CloudWatch and any messages in the dead letter queue
4. **Network connectivity**: Ensure the NAT Gateway is properly configured for private subnet internet access
5. **Lambda deregistration failures**: Check the Lambda function logs, the VPC configuration, and connectivity to the GitHub API
6. **EFS mount issues**: Verify the NFS security group rules and that a mount target is available in every AZ
7. **Lifecycle hook timeouts**: Check the 5-minute hook timeout configuration and the Lambda function's duration metrics
8. **SNS delivery failures**: Verify the SNS topic permissions and the Lambda subscription configuration
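
Assume the queue is configured as the Lambda function's dead letter queue. SNS invokes Lambda asynchronously, so after Lambda exhausts its retries for a failed invocation, it sends the original event to that queue. The message carries `RequestID`, `ErrorCode`, and `ErrorMessage` attributes.

The sketch below turns one SQS message into a short summary that includes the affected instance ID. It expects the message as returned by `receive_message` with `MessageAttributeNames=["All"]`. It also assumes the queued event is an SNS-delivered Auto Scaling lifecycle notification. The function name and the `dlq_url` placeholder are hypothetical.

```python
import json
from typing import Dict, Optional


def summarize_dlq_message(sqs_message: dict) -> Dict[str, Optional[str]]:
    """Summarize one Lambda dead-letter message from sqs.receive_message(...)["Messages"]."""
    attrs = sqs_message.get("MessageAttributes", {})

    def attr(key: str) -> Optional[str]:
        return attrs.get(key, {}).get("StringValue")

    instance_id = None
    try:
        event = json.loads(sqs_message["Body"])
        notification = json.loads(event["Records"][0]["Sns"]["Message"])
        instance_id = notification.get("EC2InstanceId")
    except (KeyError, IndexError, TypeError, ValueError):
        pass  # body is not an SNS-wrapped lifecycle notification
    return {
        "request_id": attr("RequestID"),
        "error_code": attr("ErrorCode"),
        "error_message": attr("ErrorMessage"),
        "instance_id": instance_id,
    }


# Usage (requires boto3 and AWS credentials; dlq_url is a placeholder):
# import boto3
# resp = boto3.client("sqs").receive_message(
#     QueueUrl=dlq_url, MaxNumberOfMessages=10, MessageAttributeNames=["All"])
# for m in resp.get("Messages", []):
#     print(summarize_dlq_message(m))
```

An instance ID found here points at a runner that may still be registered in GitHub and need manual removal.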

### Monitoring
- CloudWatch logs provide detailed, structured lifecycle tracking
- Auto Scaling Group metrics show scaling activities and lifecycle hook status
- Lambda function metrics indicate deregistration success rates and error patterns
- Dead Letter Queue metrics show failed Lambda executions that require investigation
- EFS performance metrics monitor storage throughput and connection counts
- SNS topic metrics track message delivery and failure rates
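
The deregistration success rate mentioned above can be derived from the standard `AWS/Lambda` metrics `Invocations` and `Errors`. The sketch below computes it from the `Datapoints` lists that CloudWatch's `get_metric_statistics` returns for the `Sum` statistic. The helper name is hypothetical, and the `FunctionName` dimension value in the commented call is a placeholder.

```python
from typing import List


def success_rate(invocation_points: List[dict], error_points: List[dict]) -> float:
    """Fraction of invocations that did not error, from 'Sum' datapoints."""
    invocations = sum(p["Sum"] for p in invocation_points)
    errors = sum(p["Sum"] for p in error_points)
    if invocations == 0:
        return 1.0  # no invocations in the window, so nothing failed
    return (invocations - errors) / invocations


# Fetching the datapoints (requires boto3 and AWS credentials):
# import boto3, datetime
# cw = boto3.client("cloudwatch")
# end = datetime.datetime.now(datetime.timezone.utc)
# def points(metric):
#     return cw.get_metric_statistics(
#         Namespace="AWS/Lambda", MetricName=metric,
#         Dimensions=[{"Name": "FunctionName", "Value": "example-deregistration-function"}],
#         StartTime=end - datetime.timedelta(days=1), EndTime=end,
#         Period=3600, Statistics=["Sum"])["Datapoints"]
# print(success_rate(points("Invocations"), points("Errors")))
```

Asynchronous retries count as separate invocations, so a single failing event can lower the rate more than once.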

## Contributing

Contributions are welcome! Please follow these guidelines:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

Please ensure that:
- Code follows Terraform best practices
- All resources include appropriate tags
- Security considerations are addressed
- Documentation is updated for any new features

If you find any issues or have suggestions for improvement, please feel free to open an issue.

## License

This code is released under the Unlicense. See [LICENSE](LICENSE) for details.