Prevent PII leakage in PR descriptions and add structured formatting #12
Open
galel12 wants to merge 1 commit into
Problem
The bot's generated PR descriptions expose internal PII (Red Hat email addresses, internal Jira URLs) on public GitHub repositories, and the unstructured raw dump of Jira ticket content makes PRs difficult to review.
Root Cause
The PR body was constructed by embedding ticket.Fields.Description, ticket.Fields.Assignee.EmailAddress, and config.Jira.BaseURL directly into a fmt.Sprintf call with no sanitization or formatting. There was no scrubbing layer and no structured output: the AI-generated code changes were committed, but the PR description was always a static template filled with raw Jira data.
Solution
- Prompt templates (claude_prompt.tmpl, gemini_prompt.tmpl, ticket_prompt.tmpl) instruct the AI to output a structured ## PR Description section with Problem / Root Cause / Solution subsections.
- parsePRDescription() extracts that section from the AI's response and uses it as the PR body.
- scrubPII() strips email addresses and internal URLs before anything reaches GitHub.
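
For reviewers who want a concrete picture of the extract-then-scrub flow, here is a minimal sketch. It is not the code in this PR: the signatures, the regular expressions, the placeholder strings, and the internalBaseURL parameter (standing in for config.Jira.BaseURL) are all assumptions made for illustration. It also assumes the Problem / Root Cause / Solution subsections use a deeper heading level than ## PR Description.

```go
package main

import (
	"regexp"
	"strings"
)

// emailPattern is a deliberately simple matcher chosen for this sketch;
// it is not a full RFC 5322 parser and may miss unusual addresses.
var emailPattern = regexp.MustCompile(`[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}`)

// parsePRDescription returns the text under the "## PR Description" heading,
// up to the next level-2 heading or the end of the response.
// The boolean is false when the heading is missing or the section is empty,
// so the caller can fall back to some other PR body.
func parsePRDescription(response string) (string, bool) {
	const heading = "## PR Description"
	lines := strings.Split(response, "\n")

	start := -1
	for i, line := range lines {
		if strings.TrimSpace(line) == heading {
			start = i + 1
			break
		}
	}
	if start < 0 {
		return "", false
	}

	end := len(lines)
	for i := start; i < len(lines); i++ {
		// "### Problem" does not match this prefix, so subsections are kept.
		if strings.HasPrefix(lines[i], "## ") {
			end = i
			break
		}
	}

	body := strings.TrimSpace(strings.Join(lines[start:end], "\n"))
	return body, body != ""
}

// scrubPII replaces email addresses and any URL that starts with
// internalBaseURL (hypothetical parameter) with neutral placeholders.
func scrubPII(text, internalBaseURL string) string {
	text = emailPattern.ReplaceAllString(text, "[redacted email]")

	base := strings.TrimRight(internalBaseURL, "/")
	if base != "" {
		// Match the base URL plus any path, stopping at whitespace or a
		// closing bracket so Markdown link syntax stays intact.
		urlPattern := regexp.MustCompile(regexp.QuoteMeta(base) + `[^\s)>\]]*`)
		text = urlPattern.ReplaceAllString(text, "[internal link]")
	}
	return text
}
```

In this sketch the scrubber runs on the extracted section as the last step before the body is sent to GitHub, so it also catches anything the AI copied from the ticket into its description. Applying the same call to any fallback body would keep the guarantee when the section is missing.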