This repository contains the code and curated test sets for MOCSR, a code-summary refinement workflow based on code evidence, retrieved dimension examples, iterative optimization, CodeRPE four-dimensional evaluation, and final metric aggregation.
MOCSR is evaluated on summaries generated by three baseline systems: EP4CS, CodeT5, and HGA.
The repository is organized as follows:

- `dataset/java/test_new.jsonl` and `dataset/python/test_new.jsonl`: curated test sets released with this repository.
- `result/<model>/<language>/*.jsonl`: released baseline and optimized result files.
- `result/<model>/<language>/optimization_effect_metrics/*.jsonl`: released record-level metric outputs, where available.
- `result/<model>/<language>/extract_*_evidence.py`: builds structured code evidence.
- `result/<model>/<language>/retrieve_dimension_examples.py`: retrieves similar examples for each target sample.
- `RQ1/`: main optimization scripts for EP4CS, CodeT5, and HGA outputs.
- `RQ2/`: ablation scripts.
- `RQ3/`: retrieval and model-variant analysis scripts.
- `CodeRPE/`: four-dimensional evaluation scripts.
- `Evaluate/`: final metric scripts. These scripts report only `corpus_BLEU`, `mean_METEOR`, `mean_ROUGE_L`, and the four-dimensional scores.
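To make `mean_ROUGE_L` concrete, here is a minimal sketch of an LCS-based ROUGE-L F-measure averaged over a corpus. It is not the code in `Evaluate/`: whitespace tokenization and the recall weight `beta=1.2` are assumptions, and the helper names `lcs_length`, `rouge_l`, and `mean_rouge_l` are hypothetical. The repository's scripts may tokenize, lowercase, or weight differently.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    """ROUGE-L F-measure on whitespace tokens (assumed tokenization and beta)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0  # also covers empty candidates or references
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return ((1 + beta ** 2) * precision * recall) / (recall + beta ** 2 * precision)


def mean_rouge_l(candidates: List[str], references: List[str]) -> float:
    """Average the per-sample ROUGE-L scores over the corpus."""
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    if not candidates:
        return 0.0
    scores = [rouge_l(c, r) for c, r in zip(candidates, references)]
    return sum(scores) / len(scores)
```

Averaging per-sample scores is what the `mean_` prefix suggests; `corpus_BLEU`, by contrast, is usually computed from n-gram counts pooled over the whole corpus rather than averaged.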
Only the two newly extracted test sets are included in this repository:
- `dataset/java/test_new.jsonl`
- `dataset/python/test_new.jsonl`
For the remaining Java and Python training/validation/test data, prepare CodeSearchNet from the CodeXGLUE code-to-text task:
https://github.com/microsoft/CodeXGLUE/tree/main/Code-Text/code-to-text
After downloading and cleaning CodeSearchNet, place the processed files under:
- `dataset/java/train.jsonl`
- `dataset/java/valid.jsonl`
- `dataset/java/test.jsonl`
- `dataset/python/train.jsonl`
- `dataset/python/valid.jsonl`
- `dataset/python/test.jsonl`
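Before running any script, it can help to confirm that the expected files are in place and parse as JSON Lines. The sketch below is a hypothetical checker (`load_jsonl` and `check_dataset` are illustrative names, not repository code). It takes the file layout from this README but makes no assumption about the record fields, which are not specified here; it only checks that every non-empty line is a JSON value.

```python
import json
from pathlib import Path
from typing import Dict, List

# File layout taken from this README; record fields are deliberately not checked.
EXPECTED_FILES = [
    "dataset/java/train.jsonl",
    "dataset/java/valid.jsonl",
    "dataset/java/test.jsonl",
    "dataset/java/test_new.jsonl",
    "dataset/python/train.jsonl",
    "dataset/python/valid.jsonl",
    "dataset/python/test.jsonl",
    "dataset/python/test_new.jsonl",
]


def load_jsonl(path: Path) -> List[dict]:
    """Parse one JSON value per non-empty line, reporting the first bad line."""
    records = []
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return records


def check_dataset(root: Path) -> Dict[str, int]:
    """Map each expected file to its record count, or -1 if it is missing."""
    counts = {}
    for rel in EXPECTED_FILES:
        path = root / rel
        counts[rel] = len(load_jsonl(path)) if path.is_file() else -1
    return counts
```

Run from the repository root, `check_dataset(Path("."))` reports `-1` for each CodeSearchNet split that has not been prepared yet.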
Released baseline and optimized result files are provided under `result/<model>/<language>/`.
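To see which released result files are available for each model and language, a short directory scan is enough. This is a hypothetical helper (`list_result_files` is not part of the repository). It assumes only the `result/<model>/<language>/*.jsonl` layout described above and deliberately skips the deeper `optimization_effect_metrics/` folders.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def list_result_files(result_root: Path) -> Dict[Tuple[str, str], List[str]]:
    """Group result/<model>/<language>/*.jsonl file names by (model, language)."""
    grouped: Dict[Tuple[str, str], List[str]] = {}
    # The pattern matches exactly three levels below result_root, so record-level
    # metric files under optimization_effect_metrics/ are not included.
    for path in sorted(result_root.glob("*/*/*.jsonl")):
        model, language = path.parent.parent.name, path.parent.name
        grouped.setdefault((model, language), []).append(path.name)
    return grouped
```

For example, `list_result_files(Path("result"))` would contain an `("EP4CS", "Java")` key once that folder holds JSONL files.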
Create or activate the project environment, then install dependencies:
conda activate CodeRPE
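pip install -r requirements.txt

Set the API configuration through environment variables instead of editing the scripts. The commands below use PowerShell syntax; in a POSIX shell, use `export OPENAI_API_KEY="your_api_key"` and `export OPENAI_BASE_URL="https://api.deepseek.com"` instead: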
$env:OPENAI_API_KEY="your_api_key"
$env:OPENAI_BASE_URL="https://api.deepseek.com"

Then run the pipeline:

- Prepare the CodeSearchNet Java/Python data and keep the released curated test sets in `dataset/java/test_new.jsonl` and `dataset/python/test_new.jsonl`.
- Generate baseline summaries with EP4CS, CodeT5, or HGA and place the prediction JSONL files under the corresponding `result/<model>/<language>/` directory.
- Build the code evidence, for example:
  python result/EP4CS/Java/extract_java_evidence.py
  python result/EP4CS/Python/extract_python_evidence.py
- Retrieve similar examples, for example:
  python result/EP4CS/Java/retrieve_dimension_examples.py
  python result/EP4CS/Python/retrieve_dimension_examples.py
- Run the corresponding optimization script in `RQ1/`, `RQ2/`, or `RQ3/`, for example:
  python RQ1/optimize_ep4cs_summaries_.py
  python RQ1/optimize_python_ep4cs_summaries.py
- Run the CodeRPE four-dimensional evaluation:
  python CodeRPE/setting_performance_ep4cs_full_optimized.py --method ablation --num-samples 1000 --force
  python CodeRPE/setting_performance_python_ep4cs_full_optimized.py --method ablation --num-samples 1000 --force
- Compute the final metrics:
  python Evaluate/compute_ep4cs_optimization_metrics.py
  python Evaluate/compute_python_ep4cs_optimization_metrics.py

The final metric scripts write `auto_metrics_summary.csv` and `auto_metrics_summary.json` to the corresponding `result/<model>/<language>/optimization_effect_metrics/` directory.
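The EP4CS example commands from the workflow can be chained into one driver so the steps always run in order. This is a sketch under stated assumptions, not a script shipped with the repository. `EP4CS_STEPS`, `build_commands`, and `run_pipeline` are hypothetical names. Java and Python scripts are paired by the order in which the examples list them. The driver assumes it is launched from the repository root. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from typing import List

# Script paths copied from the EP4CS examples; CodeT5, HGA, RQ2/ and RQ3/ use
# other file names that are not listed here.
EP4CS_STEPS = {
    "Java": [
        ["result/EP4CS/Java/extract_java_evidence.py"],
        ["result/EP4CS/Java/retrieve_dimension_examples.py"],
        ["RQ1/optimize_ep4cs_summaries_.py"],
        ["CodeRPE/setting_performance_ep4cs_full_optimized.py",
         "--method", "ablation", "--num-samples", "1000", "--force"],
        ["Evaluate/compute_ep4cs_optimization_metrics.py"],
    ],
    "Python": [
        ["result/EP4CS/Python/extract_python_evidence.py"],
        ["result/EP4CS/Python/retrieve_dimension_examples.py"],
        ["RQ1/optimize_python_ep4cs_summaries.py"],
        ["CodeRPE/setting_performance_python_ep4cs_full_optimized.py",
         "--method", "ablation", "--num-samples", "1000", "--force"],
        ["Evaluate/compute_python_ep4cs_optimization_metrics.py"],
    ],
}


def build_commands(language: str) -> List[List[str]]:
    """Prefix each step with the interpreter that is running this driver."""
    return [[sys.executable, *step] for step in EP4CS_STEPS[language]]


def run_pipeline(language: str, dry_run: bool = True) -> List[List[str]]:
    """Print each command; unless dry_run, execute it and stop at the first failure."""
    commands = build_commands(language)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps never run
            # on top of a failed one.
            subprocess.run(cmd, check=True)
    return commands
```

Stopping at the first failing step matters here because each stage reads the files the previous stage wrote. Call `run_pipeline("Java", dry_run=False)` only after the baseline predictions are in place.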
Notes:

- API keys are not stored in the repository. Provide them through the `OPENAI_API_KEY` and `OPENAI_BASE_URL` environment variables.
- The released JSONL files under `result/` are experiment artifacts used by the evaluation scripts. Cache folders and regenerated non-JSONL metric reports are excluded from git.
- The same evidence extraction and retrieval workflow applies to the EP4CS, CodeT5, and HGA variants for both Java and Python.
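Scripts that follow the "keys only from the environment" rule can fail early with a clear message when the key is missing. The sketch below is hypothetical (`ApiConfig` and `load_api_config` are illustrative names, not repository code). It reads only the two variables named above and treats an unset base URL as `None`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ApiConfig:
    api_key: str
    base_url: Optional[str]


def load_api_config(env: Mapping[str, str] = os.environ) -> ApiConfig:
    """Read the API settings from environment variables, never from source files."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # An empty or absent base URL becomes None, so the consuming client keeps
    # its own default endpoint.
    base_url = env.get("OPENAI_BASE_URL", "").strip() or None
    return ApiConfig(api_key=api_key, base_url=base_url)
```

If a script uses the `openai` Python SDK (an assumption; check `requirements.txt`), the result can be passed on as `OpenAI(api_key=cfg.api_key, base_url=cfg.base_url)`.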