An AI-powered fake invoice detection system that uses machine learning to identify fraudulent invoices by analyzing text and numerical patterns.
- OCR Integration: Extract text from invoice images
- ML-Based Detection: XGBoost and Random Forest models
- Anomaly Detection: Identify unusual patterns
- Web Interface: React frontend with FastAPI backend
- Real-time Analysis: Upload and analyze invoices instantly
- FastAPI: Python web framework for building APIs
- XGBoost: Gradient-boosted decision tree library
- scikit-learn: ML utilities and preprocessing
- TextBlob: Natural language processing
- pandas: Data manipulation
- Tesseract OCR: Text extraction from images
- React: User interface
- Material-UI: React UI component library
- Axios: API communication
# Clone the repository
git clone https://github.com/shreeyashree-65/fake-invoice-detector.git
cd fake-invoice-detector
# Run automatic setup
python setup.py setup
# Start development servers
python setup.py dev

# Or set up the backend manually
cd backend
pip install -r requirements.txt
python src/data_generator.py  # Generate training data
python src/model_trainer.py   # Train ML models
python app.py                 # Start FastAPI server

# Then the frontend (from the repository root, in a separate terminal)
cd frontend
npm install
npm start

# Build and run with Docker
docker-compose up --build
# Or use the setup script
python setup.py docker-build
python setup.py docker-start

For image processing (OCR), install Tesseract:
- Download Tesseract from: https://github.com/UB-Mannheim/tesseract/wiki
- Install and add to PATH
- Update pytesseract.pytesseract.tesseract_cmd in ocr_processor.py to point to your Tesseract executable
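Instead of hardcoding the path, ocr_processor.py could look it up at startup. The sketch below is a hypothetical helper (find_tesseract is not part of the project): it prefers a tesseract binary already on PATH and otherwise falls back to the default location used by the UB-Mannheim Windows installer, assuming the default install directory was kept.

```python
import os
import shutil
from typing import Optional

# Default install location of the UB-Mannheim Windows build
# (assumption: the installer's default directory was not changed).
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(extra_candidates: tuple = ()) -> Optional[str]:
    """Return a path to the tesseract executable, or None if none is found.

    Hypothetical helper for illustration only.
    """
    # 1. Prefer a tesseract binary that is already on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 2. Otherwise try explicit candidate locations, then the Windows default.
    for candidate in (*extra_candidates, DEFAULT_WINDOWS_PATH):
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found; install it and add it to PATH.")
    else:
        print(f"Using Tesseract at: {path}")
        # In ocr_processor.py the result would then be assigned as:
        #   pytesseract.pytesseract.tesseract_cmd = path
```

If Tesseract is on PATH, no change is strictly needed, since pytesseract calls the plain tesseract command by default.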
- Text Features: Vendor name patterns, description analysis
- Numerical Features: Amount patterns, tax calculations
- Anomaly Detection: Isolation Forest for unusual patterns
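The exact features the project computes are not listed here, so the following is only a minimal sketch of what "vendor name patterns" and "tax calculations" could look like as numeric features. The Invoice fields, feature names, and rules are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Invoice:
    # Hypothetical structured invoice; field names are illustrative.
    vendor_name: str
    description: str
    subtotal: float
    tax_rate: float  # e.g. 0.18 for 18%
    tax_amount: float
    total: float


def extract_features(inv: Invoice) -> dict:
    """Turn one invoice into a flat dict of numeric features (hypothetical)."""
    expected_tax = inv.subtotal * inv.tax_rate
    return {
        # Text features: odd vendor names are a common fraud signal.
        "vendor_has_digits": float(bool(re.search(r"\d", inv.vendor_name))),
        "vendor_all_caps": float(inv.vendor_name.isupper()),
        "description_word_count": float(len(inv.description.split())),
        # Numerical features: do the amounts add up?
        "tax_error": abs(expected_tax - inv.tax_amount),
        "total_error": abs(inv.subtotal + inv.tax_amount - inv.total),
        "is_round_amount": float(inv.subtotal > 0 and inv.subtotal % 100 == 0),
    }


if __name__ == "__main__":
    inv = Invoice("Acme Supplies Ltd", "Office chairs x4", 400.0, 0.18, 72.0, 472.0)
    print(extract_features(inv))
```

Dicts like these can be collected into a pandas DataFrame and passed to the classifiers and the Isolation Forest.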
- XGBoost: Primary classifier for fraud detection
- Random Forest: Ensemble method for robustness
- Isolation Forest: Anomaly detection for unknown patterns
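How the three models' outputs are combined is not described, so here is one plausible scheme, sketched under stated assumptions: a weighted soft vote over the two classifiers' P(fake) values, with the Isolation Forest result reported as a separate review flag. The 0.6/0.4 weights, the 0.5 threshold, and the needs_review policy are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelScores:
    xgb_fake_prob: float  # P(fake) from the XGBoost classifier
    rf_fake_prob: float   # P(fake) from the Random Forest classifier
    is_anomaly: bool      # True if Isolation Forest flagged the invoice as an outlier


def combine_scores(s: ModelScores, xgb_weight: float = 0.6,
                   threshold: float = 0.5) -> dict:
    """Weighted soft vote over the two classifiers (weights are hypothetical)."""
    rf_weight = 1.0 - xgb_weight
    fake_prob = xgb_weight * s.xgb_fake_prob + rf_weight * s.rf_fake_prob
    label = "fake" if fake_prob >= threshold else "genuine"
    # Confidence = probability assigned to the predicted class.
    confidence = fake_prob if label == "fake" else 1.0 - fake_prob
    return {
        "label": label,
        "confidence": round(confidence, 4),
        # Hypothetical policy: anomalies are surfaced for manual review
        # rather than overriding the classifiers' vote.
        "needs_review": s.is_anomaly,
    }


if __name__ == "__main__":
    print(combine_scores(ModelScores(0.9, 0.7, False)))
```

With trained scikit-learn and XGBoost models, the probabilities would typically come from predict_proba(X)[:, 1] (assuming label 1 means fake), and IsolationForest.predict returns -1 for outliers and 1 for inliers.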
- Synthetic invoice data with realistic patterns
- 50% genuine invoices with proper formatting
- 50% fake invoices with suspicious indicators
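A balanced synthetic dataset like this can be produced with a seeded random generator. This is a minimal sketch, not the project's data_generator.py: the vendor list, the fixed 18% tax rate, and the two injected fraud indicators (understated tax and an all-caps vendor name with a number suffix) are illustrative assumptions.

```python
import random
from typing import Any

VENDORS = ["Acme Supplies Ltd", "Northwind Traders", "Globex Corporation"]  # hypothetical
TAX_RATE = 0.18  # hypothetical fixed rate


def make_invoice(rng: random.Random, fake: bool) -> dict:
    subtotal = round(rng.uniform(50, 5000), 2)
    tax = round(subtotal * TAX_RATE, 2)
    vendor = rng.choice(VENDORS)
    if fake:
        # Inject two suspicious indicators (illustrative only):
        tax = round(tax * rng.uniform(0.3, 0.8), 2)              # understated tax
        vendor = vendor.upper() + f" {rng.randint(100, 999)}"    # odd vendor name
    return {"vendor_name": vendor, "subtotal": subtotal, "tax_rate": TAX_RATE,
            "tax_amount": tax, "total": round(subtotal + tax, 2), "label": int(fake)}


def generate_dataset(n: int, seed: int = 42) -> list:
    """Generate n invoices: the first n // 2 genuine (label 0), the rest fake (label 1)."""
    rng = random.Random(seed)
    rows: list[dict[str, Any]] = [make_invoice(rng, fake=(i >= n // 2)) for i in range(n)]
    rng.shuffle(rows)  # avoid all genuine rows preceding all fake rows
    return rows


if __name__ == "__main__":
    data = generate_dataset(1000)
    print(sum(r["label"] for r in data), "fake out of", len(data))
```

For an even n the split is exactly 50/50; for an odd n the fake class gets one extra row.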
- Upload Invoice: Drag and drop invoice image
- OCR Processing: Text extraction from image
- Feature Extraction: Analyze text and numerical patterns
- ML Prediction: Classify as genuine or fake
- Results: View prediction with confidence score
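Between OCR and prediction, the raw text has to be turned into structured amounts. The sketch below shows one way to do that with regular expressions, assuming each amount sits at the end of a line that starts with a label such as "Subtotal", "Tax" or "Total". The labels, patterns, and the consistency check are hypothetical; real invoice layouts vary widely.

```python
import re
from typing import Optional


def _line_amount(label: str) -> re.Pattern:
    # Line starting with `label`, ending in an amount like 1,234.56.
    return re.compile(rf"^\s*{label}\b.*?([\d,]+\.\d{{2}})\s*$", re.I | re.M)


# Hypothetical field labels.
FIELD_PATTERNS = {
    "subtotal": _line_amount(r"sub\s*-?\s*total"),
    "tax": _line_amount(r"(?:tax|vat|gst)"),
    "total": _line_amount(r"(?:grand\s+)?total"),
}


def parse_amounts(ocr_text: str) -> dict:
    """Extract subtotal, tax and total from OCR text; None if a field is missing."""
    out: dict[str, Optional[float]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        m = pattern.search(ocr_text)
        out[field] = float(m.group(1).replace(",", "")) if m else None
    return out


def totals_consistent(amounts: dict, tol: float = 0.01) -> Optional[bool]:
    """True if subtotal + tax matches total within tol; None if data is missing."""
    if any(amounts.get(k) is None for k in ("subtotal", "tax", "total")):
        return None
    return abs(amounts["subtotal"] + amounts["tax"] - amounts["total"]) <= tol


if __name__ == "__main__":
    text = "ACME SUPPLIES LTD\nSubtotal: 1,200.00\nTax (18%): 216.00\nTotal: $1,416.00\n"
    amounts = parse_amounts(text)
    print(amounts, totals_consistent(amounts))
```

Because each pattern requires the label at the start of a line, "Subtotal" is not mistaken for "Total", and the "18" in "Tax (18%)" is skipped because it has no decimal part.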
This project is licensed under the MIT License - see the LICENSE file for details.
SHREEYA P S - shreeyashree-65