NetAI Podcast Studio

NetAI Podcast Studio

AI-Powered Podcast Creation Platform

Overview

NetAI Podcast Studio is a comprehensive web application that enables users to create professional podcasts from their documents using advanced AI technologies. The platform supports multiple AI providers for both language model (LLM) and text-to-speech (TTS) services, offering flexibility and customization options.

Features

Core Functionality

Document Upload & Parsing: Supports PDF, DOCX, ODT, and TXT files
AI-Powered Script Generation: Creates podcast scripts based on uploaded documents
Text-to-Speech Synthesis: Converts scripts to audio with multiple voice options
Podcast Management: Create, edit, and manage your podcast collection
Multi-Language Support: Supports 18+ languages with localized interfaces

AI Provider Support

Language Models (LLM)

Google Gemini: Default provider with built-in integration
OpenAI Compatible: Supports OpenAI, Cerebras, Mistral, xAI (Grok), and OpenRouter
Anthropic Claude: Specialized support for Claude models
Custom APIs: Configure any OpenAI-compatible API endpoint

Text-to-Speech (TTS)

Google Gemini TTS: High-quality voice synthesis
OpenAI TTS: Standard OpenAI voices (alloy, echo, fable, onyx, nova, shimmer)
Microsoft Edge TTS: Free browser-based multilingual voices
OpenAudio-S1 / XTTS: Custom TTS server support
Supertonic: OpenAI-compatible TTS with voice mixing capabilities

Podcast Customization

Multiple Formats: Solo host or conversation styles
Narration Styles: Professional, Educational, Conversational, Storytelling, Documentary, Explainer
Duration Control: Preset durations (0.5-15 minutes) or custom lengths
Speaker Configuration: Customize speaker names and voices
Voice Preview: Test voices before generating full podcasts

User Management

Authentication System: User registration, login, and password recovery
Persistent Storage: Podcasts are saved per user account
Settings Management: Configure AI providers, API keys, and debug options

Technical Stack

Frontend

Framework: Svelte 4 with TypeScript
Styling: Tailwind CSS
Build Tool: Vite
State Management: Svelte stores with localStorage persistence

AI Services

Google GenAI SDK: For Gemini integration
Edge TTS Universal: Browser-based TTS
LAME.js: MP3 encoding
PDF.js: PDF parsing
Mammoth.js: DOCX parsing
JSZip: ODT parsing

Audio Processing

Web Audio API: Audio buffer manipulation
PCM/WAV/MP3 Conversion: Multiple format support
Chunked Processing: Efficient handling of large audio files
Memory Optimization: Progressive audio buffer combination

Installation & Setup

Prerequisites

Node.js (v18+ recommended)
npm or yarn
API keys for your chosen AI providers

Quick Start

Install dependencies:
```
npm install
```
Configure API keys:
- Create .env.local file
- Add your Gemini API key: VITE_API_KEY=your_gemini_api_key
- For other providers, configure through the Settings page
Run the development server:
```
npm run dev
```
Access the application: Open http://localhost:5173 in your browser

Production Build

npm run build
npm run preview

Docker Deployment

The project includes a Docker configuration for easy deployment:

# Build and run with Docker Compose
docker-compose up --build

The application will be available at http://localhost:8080

Configuration

Environment Variables

Create a .env.local file with the following variables:

# Gemini API Key (required for Gemini provider)
VITE_API_KEY=your_gemini_api_key

# Optional: Set default provider configurations
# VITE_DEFAULT_LLM_PROVIDER=gemini
# VITE_DEFAULT_TTS_PROVIDER=gemini

Settings Page

Configure AI providers through the in-app Settings page:

LLM Provider: Select from available options
TTS Provider: Choose your preferred TTS service
API URLs: Configure custom endpoints
API Keys: Securely store your credentials
Debug Options: Adjust logging levels

Usage

Creating a Podcast

Upload Documents: Drag and drop or select files
Configure Podcast: Set topic, duration, style, and language
Customize Speakers: Select voices and names
Generate Script: AI creates the podcast script
Review & Edit: Modify the generated script if needed
Synthesize Audio: Convert script to speech
Play & Download: Listen and export your podcast

Managing Podcasts

Edit: Modify existing podcast scripts
Delete: Remove unwanted podcasts
Export: Download scripts (TXT) or audio (WAV/MP3)

Architecture

Component Structure

src/
├── components/          # UI Components
├── services/           # AI Service Integrations
├── utils/              # Utility Functions
├── stores/            # State Management
├── types/             # Type Definitions
├── constants/         # Configuration Constants
├── locales/           # Internationalization
└── App.svelte         # Main Application

Key Services

geminiService.ts: Google Gemini integration
ttsServices.ts: TTS service abstraction layer
docParser.ts: Document parsing utilities
audio.ts: Audio processing and conversion
text.ts: Text chunking and validation

State Management

userStore: User authentication and profile
settingsStore: AI provider configurations
i18n: Internationalization and localization

Security

API Key Management: Keys are stored in localStorage (browser-only)
CORS Handling: Proper error messages for API configuration issues
Input Validation: Document parsing with error handling
Authentication: Local user management system

Performance Optimization

Chunked Processing: Large documents and audio files are processed in chunks
Memory Management: Efficient audio buffer handling
Progressive Loading: Real-time feedback during generation
Caching: Local storage for user preferences and podcasts

Internationalization

The application supports multiple languages with complete translations for:

English (en)
German (de)
Spanish (es)
French (fr)
Italian (it)

Additional languages are available for voice selection.

Browser Compatibility

Modern Browsers: Chrome, Firefox, Safari, Edge
Required APIs: Web Audio API, Fetch API, Blob URLs
Fallbacks: Graceful degradation for unsupported features

Development

Running Tests

npm run check

Code Quality

TypeScript: Strict type checking
ESLint: Code style enforcement
Prettier: Code formatting

Debugging

Enable debug logging in Settings:

Set log level to DEBUG for detailed console output
Monitor network requests and API responses
Check audio processing steps and buffer operations

Contributing

Contributions are welcome! Please follow the existing code style and architecture patterns.

Key Areas for Contribution

Additional AI provider integrations
Enhanced document parsing support
Improved audio processing algorithms
Additional language translations
UI/UX improvements

License

This project is licensed under the MIT License.

Support

For issues or questions, please open a GitHub issue or contact the maintainers.

Roadmap

Future enhancements planned:

Cloud synchronization
Collaborative editing
Advanced audio effects
Video podcast support
Analytics and insights
API for programmatic access

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.kilo/plans		.kilo/plans
public		public
scripts		scripts
src		src
.gitignore		.gitignore
README.md		README.md
chunking-fix-plan.md		chunking-fix-plan.md
docker-compose.yml		docker-compose.yml
index.html		index.html
jest-import-meta-transform.js		jest-import-meta-transform.js
jest.config.js		jest.config.js
metadata.json		metadata.json
nginx.conf		nginx.conf
package-lock.json		package-lock.json
package.json		package.json
postcss.config.js		postcss.config.js
tailwind.config.js		tailwind.config.js
test-model-fetching.js		test-model-fetching.js
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
vite.config.ts		vite.config.ts
vite.config.ts.timestamp-1762161600656-3b1816af96209.mjs		vite.config.ts.timestamp-1762161600656-3b1816af96209.mjs
web-ui.py		web-ui.py

Folders and files

Latest commit

History

Repository files navigation

NetAI Podcast Studio

Overview

Features

Core Functionality

AI Provider Support

Language Models (LLM)

Text-to-Speech (TTS)

Podcast Customization

User Management

Technical Stack

Frontend

AI Services

Audio Processing

Installation & Setup

Prerequisites

Quick Start

Production Build

Docker Deployment

Configuration

Environment Variables

Settings Page

Usage

Creating a Podcast

Managing Podcasts

Architecture

Component Structure

Key Services

State Management

Security

Performance Optimization

Internationalization

Browser Compatibility

Development

Running Tests

Code Quality

Debugging

Contributing

Key Areas for Contribution

License

Support

Roadmap

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages