AI-Powered Podcast Creation Platform
NetAI Podcast Studio is a web application that turns documents into podcasts: it generates a script with a large language model (LLM) and converts that script to audio with a text-to-speech (TTS) service. Multiple AI providers are supported for both steps, so each can be configured independently.
- Document Upload & Parsing: Supports PDF, DOCX, ODT, and TXT files
- AI-Powered Script Generation: Creates podcast scripts based on uploaded documents
- Text-to-Speech Synthesis: Converts scripts to audio with multiple voice options
- Podcast Management: Create, edit, and manage your podcast collection
- Multi-Language Support: Supports 18+ languages for podcast voices, with a fully translated interface for a subset of them
- Google Gemini: Default provider with built-in integration
- OpenAI Compatible: Supports OpenAI, Cerebras, Mistral, xAI (Grok), and OpenRouter
- Anthropic Claude: Specialized support for Claude models
- Custom APIs: Configure any OpenAI-compatible API endpoint
- Google Gemini TTS: High-quality voice synthesis
- OpenAI TTS: Standard OpenAI voices (alloy, echo, fable, onyx, nova, shimmer)
- Microsoft Edge TTS: Free browser-based multilingual voices
- OpenAudio-S1 / XTTS: Custom TTS server support
- Supertonic: OpenAI-compatible TTS with voice mixing capabilities
- Multiple Formats: Solo host or conversation styles
- Narration Styles: Professional, Educational, Conversational, Storytelling, Documentary, Explainer
- Duration Control: Preset durations (0.5-15 minutes) or custom lengths
- Speaker Configuration: Customize speaker names and voices
- Voice Preview: Test voices before generating full podcasts
- Authentication System: User registration, login, and password recovery
- Persistent Storage: Podcasts are saved per user account
- Settings Management: Configure AI providers, API keys, and debug options
- Framework: Svelte 4 with TypeScript
- Styling: Tailwind CSS
- Build Tool: Vite
- State Management: Svelte stores with localStorage persistence
- Google GenAI SDK: For Gemini integration
- Edge TTS Universal: Browser-based TTS
- LAME.js: MP3 encoding
- PDF.js: PDF parsing
- Mammoth.js: DOCX parsing
- JSZip: ODT parsing
- Web Audio API: Audio buffer manipulation
- PCM/WAV/MP3 Conversion: Multiple format support
- Chunked Processing: Efficient handling of large audio files
- Memory Optimization: Progressive audio buffer combination
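The PCM-to-WAV step listed above can be sketched as follows. This is a minimal illustration of the standard 44-byte RIFF/WAVE header for 16-bit PCM, not the project's actual `audio.ts`; the function name `encodeWav` and the default sample rate of 24000 Hz are assumptions made for the example.

```typescript
// Hypothetical helper: wrap raw 16-bit PCM samples in a minimal WAV container.
function encodeWav(
  samples: Int16Array,
  sampleRate = 24000, // assumed default, adjust to the TTS provider's output
  channels = 1,
): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // total file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // WAV stores samples little-endian.
  samples.forEach((sample, i) => {
    view.setInt16(44 + i * bytesPerSample, sample, true);
  });
  return new Uint8Array(buffer);
}
```

The resulting bytes can be wrapped in a `Blob` with type `audio/wav` for playback or download in the browser.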
- Node.js (v18+ recommended)
- npm or yarn
- API keys for your chosen AI providers
- Install dependencies: npm install
- Configure API keys:
  - Create a .env.local file
  - Add your Gemini API key: VITE_API_KEY=your_gemini_api_key
  - For other providers, configure through the Settings page
- Run the development server: npm run dev
- Access the application: Open http://localhost:5173 in your browser
npm run build
npm run preview
The project includes a Docker configuration for easy deployment:
# Build and run with Docker Compose
docker-compose up --build
The application will be available at http://localhost:8080
Create a .env.local file with the following variables:
# Gemini API Key (required for Gemini provider)
VITE_API_KEY=your_gemini_api_key
# Optional: Set default provider configurations
# VITE_DEFAULT_LLM_PROVIDER=gemini
# VITE_DEFAULT_TTS_PROVIDER=gemini
Configure AI providers through the in-app Settings page:
- LLM Provider: Select from available options
- TTS Provider: Choose your preferred TTS service
- API URLs: Configure custom endpoints
- API Keys: Store your credentials locally in the browser (localStorage)
- Debug Options: Adjust logging levels
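To illustrate how a custom API URL and key from the settings above might be used, here is a minimal sketch that builds a request for an OpenAI-compatible chat completions endpoint. The `LlmSettings` shape and the `buildChatRequest` name are hypothetical and the project's own settings store may look different; only the `/chat/completions` path, the Bearer header, and the `model`/`messages` body follow the OpenAI-compatible convention.

```typescript
// Hypothetical settings shape; not taken from the project's settingsStore.
interface LlmSettings {
  apiUrl: string; // base URL, e.g. "https://api.openai.com/v1"
  apiKey: string;
  model: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) a chat completions request.
function buildChatRequest(settings: LlmSettings, prompt: string): ChatRequest {
  // Tolerate a trailing slash in the configured base URL.
  const base = settings.apiUrl.replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${settings.apiKey}`,
      },
      body: JSON.stringify({
        model: settings.model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

A caller would pass the result to `fetch(url, init)`. Keeping the request construction separate from the network call makes it easy to check the URL and headers when debugging CORS or authentication errors.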
- Upload Documents: Drag and drop or select files
- Configure Podcast: Set topic, duration, style, and language
- Customize Speakers: Select voices and names
- Generate Script: AI creates the podcast script
- Review & Edit: Modify the generated script if needed
- Synthesize Audio: Convert script to speech
- Play & Download: Listen and export your podcast
- Edit: Modify existing podcast scripts
- Delete: Remove unwanted podcasts
- Export: Download scripts (TXT) or audio (WAV/MP3)
src/
├── components/ # UI Components
├── services/ # AI Service Integrations
├── utils/ # Utility Functions
├── stores/ # State Management
├── types/ # Type Definitions
├── constants/ # Configuration Constants
├── locales/ # Internationalization
└── App.svelte # Main Application
- geminiService.ts: Google Gemini integration
- ttsServices.ts: TTS service abstraction layer
- docParser.ts: Document parsing utilities
- audio.ts: Audio processing and conversion
- text.ts: Text chunking and validation
- userStore: User authentication and profile
- settingsStore: AI provider configurations
- i18n: Internationalization and localization
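Stores with localStorage persistence, as used for the user and settings state above, can be sketched roughly like this. The example hand-rolls an object that follows the Svelte store contract (`subscribe` calls back immediately and returns an unsubscribe function) rather than importing from `svelte/store`, so it runs on its own; `persistentStore` and `KeyValueStorage` are hypothetical names, not the project's code.

```typescript
// Minimal subset of the Web Storage interface, so the sketch also runs outside a browser.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

type Subscriber<T> = (value: T) => void;

// Hypothetical helper: a store that mirrors every change into storage as JSON.
function persistentStore<T>(key: string, initial: T, storage: KeyValueStorage) {
  // Restore the previous value if one was saved; otherwise start from the default.
  const saved = storage.getItem(key);
  let value: T = saved !== null ? (JSON.parse(saved) as T) : initial;
  const subscribers = new Set<Subscriber<T>>();

  return {
    subscribe(run: Subscriber<T>): () => void {
      subscribers.add(run);
      run(value); // the store contract requires an immediate first call
      return () => {
        subscribers.delete(run);
      };
    },
    set(next: T): void {
      value = next;
      storage.setItem(key, JSON.stringify(next));
      subscribers.forEach((run) => run(next));
    },
  };
}
```

In the browser, `localStorage` satisfies `KeyValueStorage` directly, for example `persistentStore("settings", { llmProvider: "gemini" }, localStorage)`. Note that the sketch does not guard against corrupt JSON in storage.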
- API Key Management: Keys are stored in localStorage (browser-only)
- CORS Handling: Proper error messages for API configuration issues
- Input Validation: Document parsing with error handling
- Authentication: Local user management system
- Chunked Processing: Large documents and audio files are processed in chunks
- Memory Management: Efficient audio buffer handling
- Progressive Loading: Real-time feedback during generation
- Caching: Local storage for user preferences and podcasts
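As an illustration of the chunked processing mentioned above, the following sketch splits a long text into chunks of bounded size, preferring sentence boundaries so that each chunk can be sent to an LLM or TTS service separately. It is a simplified assumption about how such chunking could work, not the project's `text.ts`; the name `chunkText` and the sentence heuristic are invented for the example.

```typescript
// Hypothetical helper: split text into chunks of at most maxChars characters.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");

  // A sentence is text ending in ., ! or ? plus trailing whitespace,
  // or whatever remains at the end of the input.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  const pushChunk = (part: string): void => {
    const trimmed = part.trim();
    if (trimmed.length > 0) chunks.push(trimmed);
  };

  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when the next sentence would overflow the current one.
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      pushChunk(current);
      current = "";
    }
    current += sentence;
    // A single sentence longer than the limit is hard-split.
    while (current.length > maxChars) {
      pushChunk(current.slice(0, maxChars));
      current = current.slice(maxChars);
    }
  }
  pushChunk(current);
  return chunks;
}
```

The heuristic treats every period as a sentence end, so abbreviations and decimals may produce slightly uneven chunks; that only affects where the boundaries fall, not the size limit.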
The application supports multiple languages with complete translations for:
- English (en)
- German (de)
- Spanish (es)
- French (fr)
- Italian (it)
Additional languages are available for voice selection.
- Modern Browsers: Chrome, Firefox, Safari, Edge
- Required APIs: Web Audio API, Fetch API, Blob URLs
- Fallbacks: Graceful degradation for unsupported features
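A check for the required APIs listed above could look roughly like the sketch below. It inspects a window-like object and reports what is missing, which a UI could use to decide on a fallback. The function name and the decision to also accept the legacy `webkitAudioContext` prefix are assumptions for illustration.

```typescript
// The browser capabilities listed as required.
type Capability = "Web Audio API" | "Fetch API" | "Blob URLs";

// Hypothetical helper: report which required APIs are missing from a
// window-like object.
function missingCapabilities(scope: Record<string, unknown>): Capability[] {
  const url = scope["URL"] as { createObjectURL?: unknown } | undefined;
  const hasAudio =
    typeof scope["AudioContext"] === "function" ||
    typeof scope["webkitAudioContext"] === "function"; // legacy Safari prefix
  const hasFetch = typeof scope["fetch"] === "function";
  const hasBlobUrls =
    typeof scope["Blob"] === "function" &&
    typeof url?.createObjectURL === "function";

  const checks: Array<[Capability, boolean]> = [
    ["Web Audio API", hasAudio],
    ["Fetch API", hasFetch],
    ["Blob URLs", hasBlobUrls],
  ];
  return checks.filter(([, ok]) => !ok).map(([name]) => name);
}
```

In a browser the call would be `missingCapabilities(globalThis as unknown as Record<string, unknown>)`; an empty result means all three APIs are present.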
npm run check
- TypeScript: Strict type checking
- ESLint: Code style enforcement
- Prettier: Code formatting
Enable debug logging in Settings:
- Set log level to DEBUG for detailed console output
- Monitor network requests and API responses
- Check audio processing steps and buffer operations
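The effect of the log level setting can be pictured with a small leveled logger like the one below. This is only a sketch: apart from DEBUG, the level names are assumptions, and `createLogger` is a hypothetical function rather than the project's logging code.

```typescript
// Level names other than DEBUG are assumptions for this sketch.
const LEVELS = ["DEBUG", "INFO", "WARN", "ERROR"] as const;
type LogLevel = (typeof LEVELS)[number];

// Hypothetical helper: returns a log function that drops messages below minLevel.
function createLogger(
  minLevel: LogLevel,
  sink: (line: string) => void = console.log,
) {
  const threshold = LEVELS.indexOf(minLevel);
  return (level: LogLevel, message: string): void => {
    if (LEVELS.indexOf(level) >= threshold) {
      sink(`[${level}] ${message}`);
    }
  };
}
```

With `createLogger("DEBUG")` every message reaches the console, while a higher level such as `"WARN"` suppresses the detailed output about audio buffers and API responses.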
Contributions are welcome! Please follow the existing code style and architecture patterns.
- Additional AI provider integrations
- Enhanced document parsing support
- Improved audio processing algorithms
- Additional language translations
- UI/UX improvements
This project is licensed under the MIT License.
For issues or questions, please open a GitHub issue or contact the maintainers.
Future enhancements planned:
- Cloud synchronization
- Collaborative editing
- Advanced audio effects
- Video podcast support
- Analytics and insights
- API for programmatic access
