Materials Discovery Agent

An intelligent, AI-powered research assistant for materials science discovery. Built with LangGraph and OpenAI models, it autonomously explores scientific databases, validates findings, checks intellectual property status, and provides comprehensive materials recommendations.

✨ Features

🤖 ReAct Agent Architecture: Autonomous reasoning and action loop powered by LangGraph
🔬 Multi-Database Integration: Materials Project, PubChem, SureChEMBL, and Web Search
💬 Conversational Interface: Natural language queries with clarification capabilities
🧠 Smart Memory Management: Two-tier memory system with automatic session cleanup
🎨 Modern UI: React + Tailwind CSS with markdown-formatted responses
🖼️ Multimodal Support: Chemical structure visualization with image generation
📊 Observability: Integrated Langfuse tracing for debugging and monitoring
⚡ Async Architecture: Non-blocking I/O for fast, parallel database queries

🏗️ Architecture

ReAct Agent Workflow

The system uses a ReAct (Reasoning + Acting) pattern where the agent:

Reasons about which tools and databases to use
Acts by executing searches and analyses
Observes results and decides next steps
Responds with well-formatted, actionable insights

The agent autonomously:

Translates vague queries into specific database searches
Asks clarifying questions when needed (max 5 questions)
Validates scientific accuracy of results
Checks patent/IP status for novelty assessment
Formats responses with proper markdown structure
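
The loop described above can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph implementation: `Step`, `AgentState`, `run_react`, and the scripted `reason` function are hypothetical names, and a real agent would call an LLM where `reason` is invoked. The only detail taken from this README is the cap of 5 clarifying questions.

```python
from dataclasses import dataclass, field

MAX_CLARIFYING_QUESTIONS = 5  # cap stated in this README


@dataclass
class Step:
    """One decision produced by the reasoning step (hypothetical structure)."""
    kind: str        # "tool", "ask", or "respond"
    name: str = ""   # tool name when kind == "tool"
    text: str = ""   # tool input, clarifying question, or final answer


@dataclass
class AgentState:
    query: str
    observations: list = field(default_factory=list)
    questions_asked: int = 0


def run_react(query, reason, tools, ask_user, max_steps=10):
    """Run a reason -> act -> observe loop until the agent responds."""
    state = AgentState(query)
    for _ in range(max_steps):
        step = reason(state)                                     # Reason
        if step.kind == "tool":
            result = tools[step.name](step.text)                 # Act
            state.observations.append(f"{step.name}: {result}")  # Observe
        elif step.kind == "ask":
            if state.questions_asked >= MAX_CLARIFYING_QUESTIONS:
                # Budget used up: record that so the reasoner can move on.
                state.observations.append("clarification budget exhausted")
                continue
            state.questions_asked += 1
            state.observations.append(f"user: {ask_user(step.text)}")
        else:
            return step.text                                     # Respond
    return "Stopped: step limit reached."


def scripted_reason(state):
    """Stand-in for the LLM: search once, then answer with the observation."""
    if not state.observations:
        return Step("tool", "materials_search", state.query)
    return Step("respond", text=f"Found: {state.observations[-1]}")


if __name__ == "__main__":
    demo_tools = {"materials_search": lambda q: f"3 candidates for '{q}'"}
    print(run_react("wide band gap oxides", scripted_reason, demo_tools,
                    ask_user=lambda question: ""))
```

Keeping the question counter in the loop state, rather than in the prompt, means the limit holds even if the model keeps asking.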

🛠️ Integrated Tools & Databases

| Tool | Description | Use Cases |
| --- | --- | --- |
| Materials Project | 150k+ inorganic materials with computed properties | Band gap, formation energy, elasticity, crystal structure |
| PubChem | 111M+ organic compounds with chemical properties | SMILES, molecular formulas, safety data, physical properties |
| SureChEMBL | Patent database with 20M+ chemical structures | IP status, novelty assessment, patent landscape analysis |
| Exa.ai | Semantic web search engine | Real-world applications, pricing, definitions, validation |
| Image Generation | Chemical structure visualization | SMILES to PNG conversion for multimodal analysis |
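
Because the tools are independent, they can be queried concurrently. The sketch below shows one way to fan out a query with `asyncio.gather`, under the assumption that each tool is an async function. The three `search_*` coroutines are hypothetical stubs that only simulate latency; they do not call the real services, and the project's actual tool wrappers may look different.

```python
import asyncio

HTTP_TIMEOUT = 30  # seconds; mirrors the HTTP_TIMEOUT setting in this README


async def search_materials_project(query):
    await asyncio.sleep(0.01)  # stand-in for a network call
    return {"source": "Materials Project", "query": query}


async def search_pubchem(query):
    await asyncio.sleep(0.01)
    return {"source": "PubChem", "query": query}


async def search_surechembl(query):
    await asyncio.sleep(0.01)
    return {"source": "SureChEMBL", "query": query}


TOOLS = {
    "materials_project": search_materials_project,
    "pubchem": search_pubchem,
    "surechembl": search_surechembl,
}


async def query_all(query, tools=None, timeout=HTTP_TIMEOUT):
    """Run every tool concurrently; one failure does not cancel the others."""
    tools = TOOLS if tools is None else tools
    tasks = [asyncio.wait_for(tool(query), timeout) for tool in tools.values()]
    # return_exceptions=True puts errors and timeouts into the result list
    # instead of raising, so each tool's outcome can be inspected separately.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return dict(zip(tools, results))


if __name__ == "__main__":
    for name, result in asyncio.run(query_all("TiO2")).items():
        print(name, result)
```

With this shape, total latency is roughly that of the slowest tool rather than the sum of all of them, and a database that is down shows up as an exception value next to the results that did arrive.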

🧠 Memory & State Management

The agent uses a two-tier memory system with automatic cleanup:

Short-term Memory (InMemorySaver): Stores conversation history per session in RAM
- Fast access for active conversations
- Automatically cleaned up when sessions become inactive
- Lost on server restart (by design for development)
Long-term Memory (AsyncSqliteStore): User-specific facts and preferences persisted to SQLite
- Survives server restarts
- Stores user preferences, industry context, and search history
- Persisted in backend/long_term_memory.db
Session Cleanup: Automatic memory management
- Removes orphaned sessions when users refresh/create new sessions
- Cleans up inactive sessions after configurable timeout (default: 30 minutes)
- Prevents RAM bloat from accumulated conversations
- Configurable via SESSION_CLEANUP_INACTIVE_MINUTES and SESSION_CLEANUP_ON_NEW_SESSION
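
To make the two tiers concrete, here is a stdlib-only sketch of the idea. It does not use LangGraph's `InMemorySaver` or `AsyncSqliteStore`; `SessionStore` and `UserFactStore` are hypothetical classes that only mimic the behavior described above (RAM-only history with inactivity cleanup, and user facts persisted to SQLite). The table layout is an assumption.

```python
import sqlite3
import time


class SessionStore:
    """Short-term tier: per-session history kept in RAM."""

    def __init__(self, inactive_minutes=30):
        self.inactive_seconds = inactive_minutes * 60
        self._history = {}      # session_id -> list of messages
        self._last_active = {}  # session_id -> timestamp of last message

    def append(self, session_id, message, now=None):
        now = time.time() if now is None else now
        self._history.setdefault(session_id, []).append(message)
        self._last_active[session_id] = now

    def cleanup(self, now=None):
        """Drop sessions idle longer than the timeout; return their ids."""
        now = time.time() if now is None else now
        stale = [sid for sid, seen in self._last_active.items()
                 if now - seen > self.inactive_seconds]
        for sid in stale:
            del self._history[sid]
            del self._last_active[sid]
        return stale

    def active_sessions(self):
        return sorted(self._history)


class UserFactStore:
    """Long-term tier: user facts persisted to SQLite."""

    def __init__(self, path=":memory:"):
        # Pass a file path (for example long_term_memory.db) to survive restarts.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS facts ("
            "user_id TEXT, key TEXT, value TEXT, PRIMARY KEY (user_id, key))"
        )

    def put(self, user_id, key, value):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO facts VALUES (?, ?, ?)",
                (user_id, key, value),
            )

    def get(self, user_id, key):
        row = self.conn.execute(
            "SELECT value FROM facts WHERE user_id = ? AND key = ?",
            (user_id, key),
        ).fetchone()
        return row[0] if row else None
```

The split keeps bulky conversation history out of the database while small, durable facts such as a user's industry context outlive any single session.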

🚀 Quick Start

Prerequisites

Python 3.10+ (tested on 3.13)
Node.js 18+ (for frontend)
API Keys (required):
- OpenAI API Key
- Materials Project API Key (free registration)
- Exa.ai API Key (for web search)
Optional:
- Langfuse Account (for observability)

1. Clone the Repository

git clone https://github.com/DavidAkinpelu/materials_discovery_agent.git
cd materials_discovery_agent

2. Backend Setup

cd backend

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment variables
cp .env.example .env
# Edit .env and add your API keys:
#   OPENAI_API_KEY=sk-...
#   MATERIALS_PROJECT_API_KEY=...
#   EXA_API_KEY=...
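
Before starting the backend, it can help to confirm that the three required keys are actually set. The helper below is a hypothetical, stdlib-only check and is not part of the repository. It handles simple `KEY=value` lines only and does not validate the keys against the services.

```python
from pathlib import Path

# Required keys as listed in the prerequisites of this README.
REQUIRED_KEYS = ("OPENAI_API_KEY", "MATERIALS_PROJECT_API_KEY", "EXA_API_KEY")


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


def check_env_file(path=".env"):
    """Return missing keys for the given file; all keys if it does not exist."""
    env_file = Path(path)
    if not env_file.is_file():
        return list(REQUIRED_KEYS)
    return missing_keys(env_file.read_text(encoding="utf-8"))


if __name__ == "__main__":
    missing = check_env_file()
    print("Missing keys:", ", ".join(missing) if missing else "none")
```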

3. Frontend Setup

cd frontend  # from the repository root; use cd ../frontend if you are still in backend/

# Install dependencies
npm install

4. Run the Application

Terminal 1 - Backend (Port 8000):

cd backend
source venv/bin/activate  # Windows: venv\Scripts\activate
python main.py

Terminal 2 - Frontend (Port 5000):

cd frontend
npm run dev

Access the Application: Open your browser to http://localhost:5000

5. Optional: Enable Langfuse Observability

Sign up at langfuse.com
Create a project and copy your keys

Add to backend/.env:

LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...
LANGFUSE_HOST=https://cloud.langfuse.com

Restart the backend - traces will appear in your Langfuse dashboard

⚙️ Configuration

All settings are centralized in backend/config.py using Pydantic settings. Override via environment variables:

Model Configuration

LLM_MODEL=gpt-4o                    # OpenAI model name
LLM_TEMPERATURE=0.0                  # Temperature (0.0-2.0)

Search & Query Settings

DEFAULT_SEARCH_RESULTS=10            # Default number of search results
HTTP_TIMEOUT=30                      # HTTP request timeout (seconds)

Server Ports

BACKEND_PORT=8000                    # Backend API port
FRONTEND_PORT=5000                   # Frontend dev server port

Session Management

SESSION_CLEANUP_INACTIVE_MINUTES=30  # Clean up sessions inactive for 30+ minutes
SESSION_CLEANUP_ON_NEW_SESSION=true   # Auto-cleanup when new session created

Manual Cleanup: Trigger cleanup via POST /api/cleanup-sessions endpoint (useful for scheduled tasks or memory monitoring).
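
For a scheduled task, the endpoint can be called from a short script. This sketch assumes the backend runs on the default port 8000, that the endpoint accepts an empty request body, and that no authentication is required; none of these are confirmed by this README, and the response format is unknown, so the raw body is returned as text.

```python
import urllib.request


def build_cleanup_request(base_url="http://localhost:8000"):
    """Build a POST request for the manual cleanup endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/cleanup-sessions",
        data=b"",  # assumption: the endpoint needs no request body
        method="POST",
    )


def trigger_cleanup(base_url="http://localhost:8000", timeout=30):
    """Send the request and return (HTTP status, raw response text)."""
    request = build_cleanup_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


if __name__ == "__main__":
    status, body = trigger_cleanup()
    print(status, body)
```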

Database Files

backend/long_term_memory.db - User preferences and facts (long-term memory)

Note: Database files are automatically created on first run and are gitignored. Session cleanup prevents RAM bloat from orphaned conversations.
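
The override behavior in this section can be illustrated with a small stdlib-only loader. This is a stand-in for explanation and not the project's Pydantic-based `backend/config.py`: the `Settings` dataclass and `load_settings` function are hypothetical, and the defaults are copied from the example values shown above, so the real defaults may differ.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    llm_model: str = "gpt-4o"
    llm_temperature: float = 0.0
    default_search_results: int = 10
    http_timeout: int = 30
    backend_port: int = 8000
    frontend_port: int = 5000
    session_cleanup_inactive_minutes: int = 30
    session_cleanup_on_new_session: bool = True


def _parse_bool(raw):
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(environ=None):
    """Build Settings, letting upper-case environment variables override defaults."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for f in fields(Settings):
        raw = environ.get(f.name.upper())
        if raw is None:
            continue  # keep the default
        kind = type(f.default)  # convert using the type of the default value
        overrides[f.name] = _parse_bool(raw) if kind is bool else kind(raw)
    settings = Settings(**overrides)
    if not 0.0 <= settings.llm_temperature <= 2.0:
        raise ValueError("LLM_TEMPERATURE must be between 0.0 and 2.0")
    return settings


if __name__ == "__main__":
    print(load_settings())
```

For example, `load_settings({"BACKEND_PORT": "9000"})` changes only the backend port and leaves every other setting at its default.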

📊 Observability with Langfuse

Monitor agent execution, tool calls, and reasoning traces in real-time:

Sign up: langfuse.com
Create project and copy keys

Configure in backend/.env:

LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...
LANGFUSE_HOST=https://cloud.langfuse.com

View traces at your Langfuse dashboard

Traces include:

Full conversation flow
Tool execution timing
LLM prompts and responses
Error tracking

🤝 Contributing

Contributions are welcome! Here's how you can help:

Development Workflow

Fork the repository
Create a branch: git checkout -b feature/your-feature
Make changes and test thoroughly
Run linters (if you add them): black, isort, mypy
Commit: git commit -m "feat: add your feature"
Push: git push origin feature/your-feature
Open a Pull Request with a clear description

Areas for Contribution

New Data Sources: Add wrappers for Crystallography Open Database, NIST, etc.
Advanced Queries: Multi-step reasoning, comparison queries
UI Enhancements: Data visualization, export features
Testing: Unit tests for tools, integration tests for agent
Documentation: Tutorials, use case examples
Performance: Caching strategies, query optimization

Code Style

Python: Follow PEP 8, use type hints
TypeScript: Use strict mode, follow React best practices
Commits: Use conventional commits (feat:, fix:, docs:, etc.)

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Materials Project for the comprehensive materials database
PubChem (NCBI) for chemical compound data
SureChEMBL (EMBL-EBI) for patent chemistry data
LangChain/LangGraph for the agent framework
OpenAI for the LLM
Exa.ai for semantic web search

📧 Contact & Support

Issues: Open an issue on GitHub for bugs or feature requests
Discussions: Use GitHub Discussions for questions and ideas
Email: akinpeluakorede01@gmail.com

Built for materials scientists, chemists, and researchers
