An intelligent, AI-powered research assistant for materials science discovery. Built with LangGraph and OpenAI models, it autonomously explores scientific databases, validates findings, checks intellectual property status, and provides comprehensive materials recommendations.
- 🤖 ReAct Agent Architecture: Autonomous reasoning and action loop powered by LangGraph
- 🔬 Multi-Database Integration: Materials Project, PubChem, SureChEMBL, and Web Search
- 💬 Conversational Interface: Natural language queries with clarification capabilities
- 🧠 Smart Memory Management: Two-tier memory system with automatic session cleanup
- 🎨 Modern UI: React + Tailwind CSS with markdown-formatted responses
- 🖼️ Multimodal Support: Chemical structure visualization with image generation
- 📊 Observability: Integrated Langfuse tracing for debugging and monitoring
- ⚡ Async Architecture: Non-blocking I/O for fast, parallel database queries
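To illustrate the async point above, the sketch below runs three lookups concurrently with `asyncio.gather`. The coroutine names are hypothetical placeholders that only sleep to simulate network latency. The project's real database wrappers may be structured differently.

```python
import asyncio
import time


# Hypothetical stand-ins for real database wrappers; each one simulates
# network latency with asyncio.sleep instead of making an HTTP call.
async def query_materials_project(term: str) -> dict:
    await asyncio.sleep(0.2)
    return {"source": "Materials Project", "query": term}


async def query_pubchem(term: str) -> dict:
    await asyncio.sleep(0.2)
    return {"source": "PubChem", "query": term}


async def query_surechembl(term: str) -> dict:
    await asyncio.sleep(0.2)
    return {"source": "SureChEMBL", "query": term}


async def search_all(term: str) -> list:
    # gather() runs the coroutines concurrently and returns results in call
    # order, so total latency tracks the slowest query, not the sum of all three.
    return await asyncio.gather(
        query_materials_project(term),
        query_pubchem(term),
        query_surechembl(term),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(search_all("TiO2"))
    for r in results:
        print(r["source"], r["query"])
    print(f"elapsed: {time.perf_counter() - start:.2f}s")  # roughly 0.2s, not 0.6s
```

With real I/O-bound HTTP calls the benefit is the same: while one request waits on the network, the event loop services the others.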
The system uses a ReAct (Reasoning + Acting) pattern where the agent:
- Reasons about which tools and databases to use
- Acts by executing searches and analyses
- Observes results and decides next steps
- Responds with well-formatted, actionable insights
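The reason, act, observe, respond cycle above can be sketched without any framework. In this example a scripted function stands in for the LLM, and the two tools are hypothetical stubs. The actual agent uses LangGraph with an OpenAI model, so this only outlines the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One decision produced by the (stubbed) reasoning model."""
    thought: str
    action: Optional[str] = None  # tool to call, or None when ready to answer
    action_input: str = ""
    answer: str = ""


# Hypothetical tool stubs; the real agent wraps Materials Project, PubChem, etc.
TOOLS: dict = {
    "compound_lookup": lambda q: f"properties of {q}",
    "patent_search": lambda q: f"patent hits for {q}",
}


def scripted_model(question: str, history: list) -> Step:
    """Stand-in for the LLM: chooses the next step from the observations so far."""
    if not history:
        return Step("Start with basic chemistry.", "compound_lookup", question)
    if len(history) == 1:
        return Step("Now check IP status.", "patent_search", question)
    summary = "; ".join(obs for _, obs in history)
    return Step("Enough information gathered.", answer=f"Summary: {summary}")


def react_loop(question: str, max_steps: int = 5) -> str:
    history: list = []  # (action, observation) pairs
    for _ in range(max_steps):
        step = scripted_model(question, history)  # Reason
        if step.action is None:
            return step.answer  # Respond
        tool: Callable[[str], str] = TOOLS[step.action]
        observation = tool(step.action_input)  # Act
        history.append((step.action, observation))  # Observe
    return "Stopped: step limit reached."


if __name__ == "__main__":
    print(react_loop("titanium dioxide"))
    # Summary: properties of titanium dioxide; patent hits for titanium dioxide
```

The `max_steps` cap works like LangGraph's recursion limit. It stops a model that never decides to answer.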
The agent autonomously:
- Translates vague queries into specific database searches
- Asks clarifying questions when needed (max 5 questions)
- Validates scientific accuracy of results
- Checks patent/IP status for novelty assessment
- Formats responses with proper markdown structure
| Tool | Description | Use Cases |
|---|---|---|
| Materials Project | 150k+ inorganic materials with computed properties | Band gap, formation energy, elasticity, crystal structure |
| PubChem | 111M+ organic compounds with chemical properties | SMILES, molecular formulas, safety data, physical properties |
| SureChEMBL | Patent database with 20M+ chemical structures | IP status, novelty assessment, patent landscape analysis |
| Exa.ai | Semantic web search engine | Real-world applications, pricing, definitions, validation |
| Image Generation | Chemical structure visualization | SMILES to PNG conversion for multimodal analysis |
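As an example of what a database tool wraps, the sketch below queries PubChem's public PUG REST interface for compound properties by name. It uses only the standard library and is synchronous for brevity. The function names are hypothetical, and the project's own PubChem wrapper is not shown here. The 30-second timeout mirrors the `HTTP_TIMEOUT` default.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name: str, properties: list) -> str:
    """Build a PUG REST URL that looks up compound properties by name."""
    # Percent-encode the name so spaces and special characters are URL-safe.
    return (
        f"{PUG_REST}/compound/name/{quote(name, safe='')}"
        f"/property/{','.join(properties)}/JSON"
    )


def fetch_properties(name: str, properties: list, timeout: float = 30.0) -> list:
    """Perform the request (needs network access) and return the property rows."""
    with urlopen(pubchem_property_url(name, properties), timeout=timeout) as resp:
        payload = json.load(resp)
    # PUG REST wraps property results as {"PropertyTable": {"Properties": [...]}}.
    return payload["PropertyTable"]["Properties"]


if __name__ == "__main__":
    print(pubchem_property_url("titanium dioxide", ["MolecularFormula", "MolecularWeight"]))
```

Calling `fetch_properties("aspirin", ["MolecularFormula", "MolecularWeight"])` returns one dict per matching compound, each including its `CID`.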
The agent uses a two-tier memory system with automatic cleanup:
- Short-term Memory (`InMemorySaver`): Stores conversation history per session in RAM
  - Fast access for active conversations
  - Automatically cleaned up when sessions become inactive
  - Lost on server restart (by design for development)
- Long-term Memory (`AsyncSqliteStore`): User-specific facts and preferences persisted to SQLite
  - Survives server restarts
  - Stores user preferences, industry context, and search history
  - Persisted in `backend/long_term_memory.db`
- Session Cleanup: Automatic memory management
  - Removes orphaned sessions when users refresh or create new sessions
  - Cleans up inactive sessions after a configurable timeout (default: 30 minutes)
  - Prevents RAM bloat from accumulated conversations
  - Configurable via `SESSION_CLEANUP_INACTIVE_MINUTES` and `SESSION_CLEANUP_ON_NEW_SESSION`
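One way to implement the timeout-based cleanup described above is a registry of last-activity timestamps. `SessionRegistry` and its methods are hypothetical names, not the project's code. The sketch tracks timestamps only; the `InMemorySaver` conversation state that a real backend would also discard is not modeled here.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionRegistry:
    """Hypothetical tracker of last-activity times for in-RAM sessions."""
    inactive_minutes: float = 30.0  # mirrors SESSION_CLEANUP_INACTIVE_MINUTES
    cleanup_on_new_session: bool = True  # mirrors SESSION_CLEANUP_ON_NEW_SESSION
    last_seen: dict = field(default_factory=dict)

    def touch(self, session_id: str, now: Optional[float] = None) -> None:
        """Record activity for a session, e.g. on every incoming message."""
        self.last_seen[session_id] = time.time() if now is None else now

    def start_session(self, session_id: str, previous_id: Optional[str] = None,
                      now: Optional[float] = None) -> None:
        """Register a new session, dropping the one it replaces if configured to."""
        if self.cleanup_on_new_session and previous_id is not None:
            # A page refresh orphans the old session, so discard it immediately.
            self.last_seen.pop(previous_id, None)
        self.touch(session_id, now)

    def cleanup_inactive(self, now: Optional[float] = None) -> list:
        """Remove sessions idle longer than the timeout and return their IDs."""
        now = time.time() if now is None else now
        cutoff = now - self.inactive_minutes * 60
        stale = [sid for sid, seen in self.last_seen.items() if seen < cutoff]
        for sid in stale:
            # A real backend would also discard that session's conversation state here.
            del self.last_seen[sid]
        return stale


if __name__ == "__main__":
    reg = SessionRegistry()
    reg.touch("old", now=0)
    reg.start_session("new", now=45 * 60)
    print(reg.cleanup_inactive(now=45 * 60))  # ['old']
```

Passing `now` explicitly keeps the logic deterministic and easy to test. In production the methods fall back to `time.time()`.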
- Python 3.10+ (tested on 3.13)
- Node.js 18+ (for frontend)
- API Keys (required):
  - OpenAI API Key
  - Materials Project API Key (free registration)
  - Exa.ai API Key (for web search)
- Optional:
  - Langfuse Account (for observability)
```bash
git clone https://github.com/DavidAkinpelu/materials_discovery_agent.git
cd materials_discovery_agent
```

Backend setup (from the repository root):

```bash
cd backend
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure environment variables
cp .env.example .env
# Edit .env and add your API keys:
# OPENAI_API_KEY=sk-...
# MATERIALS_PROJECT_API_KEY=...
# EXA_API_KEY=...
```

Frontend setup (from the repository root):

```bash
cd frontend
# Install dependencies
npm install
```

Terminal 1 - Backend (Port 8000):

```bash
cd backend
source venv/bin/activate  # Windows: venv\Scripts\activate
python main.py
```

Terminal 2 - Frontend (Port 5000):

```bash
cd frontend
npm run dev
```

Access the application: open your browser to http://localhost:5000
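If the backend fails to start, a missing API key is a common cause. The hypothetical helper script below checks that the three required keys are set, either in `backend/.env` or in the environment. It is not part of the repository. Its parser is deliberately minimal and only handles simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY", "MATERIALS_PROJECT_API_KEY", "EXA_API_KEY")


def parse_env_text(text: str) -> dict:
    """Minimal KEY=VALUE parser that skips blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline " # comment" and surrounding whitespace.
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def missing_keys(env: dict) -> list:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    env_file = Path("backend/.env")
    file_values = parse_env_text(env_file.read_text()) if env_file.exists() else {}
    merged = {**file_values, **os.environ}  # real environment variables win
    missing = missing_keys(merged)
    print("Missing: " + ", ".join(missing) if missing else "All required keys are set.")
```

A placeholder value such as `sk-...` still counts as set. The check only catches keys that are absent or left empty.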
To enable Langfuse tracing (optional):
- Sign up at langfuse.com
- Create a project and copy your keys
- Add them to `backend/.env`:
  ```bash
  LANGFUSE_PUBLIC_KEY=pk-...
  LANGFUSE_SECRET_KEY=sk-...
  LANGFUSE_HOST=https://cloud.langfuse.com
  ```
- Restart the backend; traces will appear in your Langfuse dashboard
All settings are centralized in `backend/config.py` using Pydantic settings. Override them via environment variables:

```bash
LLM_MODEL=gpt-4o                      # OpenAI model name
LLM_TEMPERATURE=0.0                   # Temperature (0.0-2.0)

DEFAULT_SEARCH_RESULTS=10             # Default number of search results
HTTP_TIMEOUT=30                       # HTTP request timeout (seconds)

BACKEND_PORT=8000                     # Backend API port
FRONTEND_PORT=5000                    # Frontend dev server port

SESSION_CLEANUP_INACTIVE_MINUTES=30   # Clean up sessions inactive for 30+ minutes
SESSION_CLEANUP_ON_NEW_SESSION=true   # Auto-cleanup when a new session is created
```

Manual cleanup: trigger cleanup via the `POST /api/cleanup-sessions` endpoint (useful for scheduled tasks or memory monitoring).
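The override behavior can be approximated with the standard library alone. The sketch below is a stand-in for the Pydantic settings class: its defaults are the values listed above, but the field layout of the real `backend/config.py` is an assumption. Pydantic settings also loads `.env` files, reports richer validation errors, and matches names case-insensitively by default; this sketch does none of that.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Stdlib stand-in for backend/config.py; defaults are the documented ones."""
    LLM_MODEL: str = "gpt-4o"
    LLM_TEMPERATURE: float = 0.0
    DEFAULT_SEARCH_RESULTS: int = 10
    HTTP_TIMEOUT: int = 30
    BACKEND_PORT: int = 8000
    FRONTEND_PORT: int = 5000
    SESSION_CLEANUP_INACTIVE_MINUTES: int = 30
    SESSION_CLEANUP_ON_NEW_SESSION: bool = True

    def __post_init__(self) -> None:
        # Enforce the documented temperature range.
        if not 0.0 <= self.LLM_TEMPERATURE <= 2.0:
            raise ValueError("LLM_TEMPERATURE must be between 0.0 and 2.0")


def _coerce(raw: str, target: type):
    """Convert a string environment value to the field's declared type."""
    if target is bool:
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return target(raw)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings, letting environment variables override the defaults."""
    env = os.environ if env is None else env
    overrides = {f.name: _coerce(env[f.name], f.type) for f in fields(Settings) if f.name in env}
    return Settings(**overrides)


if __name__ == "__main__":
    print(load_settings({"LLM_TEMPERATURE": "0.2", "BACKEND_PORT": "9000"}))
```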
- `backend/long_term_memory.db`: User preferences and facts (long-term memory)
Note: Database files are automatically created on first run and are gitignored. Session cleanup prevents RAM bloat from orphaned conversations.
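For scheduled cleanup, a small client can call the manual cleanup endpoint on a timer. This sketch assumes the backend runs on its default port 8000 and accepts an empty POST body. The function names are hypothetical, and only the HTTP status is returned because the response format is not documented here.

```python
from urllib.request import Request, urlopen


def build_cleanup_request(base_url: str = "http://localhost:8000") -> Request:
    """Prepare a POST to the session-cleanup endpoint (no request body)."""
    return Request(f"{base_url.rstrip('/')}/api/cleanup-sessions", method="POST")


def trigger_cleanup(base_url: str = "http://localhost:8000", timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code (backend must be running)."""
    with urlopen(build_cleanup_request(base_url), timeout=timeout) as resp:
        return resp.status
```

For example, a cron job or systemd timer could run a script that calls `trigger_cleanup()` every 30 minutes, matching the default inactivity timeout.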
Monitor agent execution, tool calls, and reasoning traces in real time:
- Sign up: langfuse.com
- Create a project and copy your keys
- Configure them in `backend/.env`:
  ```bash
  LANGFUSE_PUBLIC_KEY=pk-...
  LANGFUSE_SECRET_KEY=sk-...
  LANGFUSE_HOST=https://cloud.langfuse.com
  ```
- View traces in your Langfuse dashboard
Traces include:
- Full conversation flow
- Tool execution timing
- LLM prompts and responses
- Error tracking
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a branch: `git checkout -b feature/your-feature`
- Make changes and test thoroughly
- Run linters (if you add them): `black`, `isort`, `mypy`
- Commit: `git commit -m "feat: add your feature"`
- Push: `git push origin feature/your-feature`
- Open a Pull Request with a clear description
- New Data Sources: Add wrappers for Crystallography Open Database, NIST, etc.
- Advanced Queries: Multi-step reasoning, comparison queries
- UI Enhancements: Data visualization, export features
- Testing: Unit tests for tools, integration tests for agent
- Documentation: Tutorials, use case examples
- Performance: Caching strategies, query optimization
- Python: Follow PEP 8, use type hints
- TypeScript: Use strict mode, follow React best practices
- Commits: Use conventional commits (feat:, fix:, docs:, etc.)
This project is licensed under the MIT License - see the LICENSE file for details.
- Materials Project for the comprehensive materials database
- PubChem (NCBI) for chemical compound data
- SureChEMBL (EMBL-EBI) for patent chemistry data
- LangChain/LangGraph for the agent framework
- OpenAI for the language models
- Exa.ai for semantic web search
- Issues: Open an issue on GitHub for bugs or feature requests
- Discussions: Use GitHub Discussions for questions and ideas
- Email: akinpeluakorede01@gmail.com
Built for materials scientists, chemists, and researchers