IdeaSearch Fitter Demo
IdeaSearch Fitter Demo usage examples - Streamlit-based intelligent symbolic regression web application
🎬 Demo Video
IdeaSearch/ideasearch-fit-demo
🚀 IdeaSearch Fitter Demo Usage Tutorial
📖 Overview
IdeaSearch Fitter Demo is an intelligent symbolic regression web application built on the IdeaSearch framework. It uses large language models to automatically discover mathematical expressions: draw a curve or upload data, and the application searches for best-fit formulas.
✨ Key Features
- 🎨 Interactive Drawing Canvas - Intuitively draw target curves with support for multiple drawing modes
- 📁 File Upload Support - Support NPZ data file upload and multi-dimensional feature fitting
- 🤖 Multi-Model Support - Integrated with mainstream LLMs like GPT, Gemini, Qwen, DeepSeek
- 🧠 Fuzzy Mode - Use natural language theory descriptions to assist fitting
- 📊 Real-time Visualization - Dynamically display fitting progress and result comparisons
- 🏝️ Island Evolution Algorithm - Parallel exploration of multiple solution spaces to improve fitting quality
- 📈 Pareto Front Analysis - Balance expression complexity and fitting accuracy
- ⚙️ Physical Unit Validation - Ensure generated expressions have correct dimensions
🛠️ Environment Setup
1. Clone the Repository
# Clone repository
git clone https://github.com/IdeaSearch/ideasearch-fit-demo
cd ideasearch-fit-demo
2. Install uv Package Manager
uv is a fast and reliable Python package manager and is the recommended way to set up this project:
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or install via pip
pip install uv
3. Configure Environment and Dependencies
# Sync dependency environment
uv sync
4. Configure API Keys
Copy the example configuration file and edit it:
# Copy example configuration
cp api_keys.json.example api_keys.json
# Edit configuration file
nano api_keys.json # or use other editors
API key configuration format:
{
"Gemini_2.5_Flash": [{
"api_key": "your-gemini-api-key-here",
"base_url": "https://generativelanguage.googleapis.com/v1beta",
"model": "gemini-2.0-flash-exp"
}],
"GPT_4o_Mini": [{
"api_key": "your-openai-api-key",
"base_url": "https://api.openai.com/v1",
"model": "gpt-4o-mini"
}],
"Qwen_Plus": [{
"api_key": "your-qwen-api-key",
"base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"model": "qwen-plus"
}]
}
Supported Model Names (Please configure strictly according to the following names):
- Gemini Series: Gemini_2.5_Flash, Gemini_2.5_Pro, Gemini_Pro
- OpenAI Series: GPT_4o, GPT_4o_Mini, GPT_4_Turbo
- Domestic Models: Qwen_Plus, Qwen_Max, Qwen3, Doubao
- Open Source Models: Deepseek_V3, Grok_4
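Because the top-level keys must match the supported names exactly, it can help to check the file before launching. The following is a minimal sketch, not part of the application: `find_config_problems` and `check_file` are hypothetical helpers, and the required fields are assumed from the example configuration above.

```python
import json

# Names copied from the supported-model list above.
SUPPORTED_MODEL_NAMES = {
    "Gemini_2.5_Flash", "Gemini_2.5_Pro", "Gemini_Pro",
    "GPT_4o", "GPT_4o_Mini", "GPT_4_Turbo",
    "Qwen_Plus", "Qwen_Max", "Qwen3", "Doubao",
    "Deepseek_V3", "Grok_4",
}
# Assumption: every entry needs the three fields shown in the example file.
REQUIRED_FIELDS = ("api_key", "base_url", "model")


def find_config_problems(config):
    """Return a list of human-readable problems in a parsed api_keys.json."""
    problems = []
    for name, entries in config.items():
        if name not in SUPPORTED_MODEL_NAMES:
            problems.append(f"unsupported model name: {name}")
        if not isinstance(entries, list) or not entries:
            problems.append(f"{name}: expected a non-empty list of entries")
            continue
        for index, entry in enumerate(entries):
            for field in REQUIRED_FIELDS:
                # A missing key and an empty string are both reported.
                if not isinstance(entry, dict) or not entry.get(field):
                    problems.append(f"{name}[{index}]: missing '{field}'")
    return problems


def check_file(path="api_keys.json"):
    """Load the configuration file from disk and report any problems."""
    with open(path, encoding="utf-8") as handle:
        return find_config_problems(json.load(handle))
```

An empty list from `check_file()` means the names and fields look structurally correct; it does not verify that the keys themselves are valid.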
🚀 Launch Application
After configuration, start the application:
# Use launch script (recommended)
./run.sh
# Or start manually
uv run streamlit run app.py --server.port 8501
The application will automatically open in your browser at http://localhost:8501
📖 Usage Guide
The application provides two main tabs for different usage scenarios:
🎨 Tab 1: Draw Curve Fitting
This is the most intuitive way to use the application, suitable for quick exploration and teaching demonstrations.
Operation Steps
- Draw Curves
  - Draw target curves on the left canvas
  - Supports three drawing modes:
    - Free Drawing: Hand-draw curves of any shape
    - Straight Line: Draw line segments
    - Points: Mark data points one by one
  - Adjustable line width (1-10 pixels)
  - Can enable 📷 Pass Image option to pass canvas images to vision-capable models (like Gemini)
- Configure Parameters (Right sidebar)
  - Model Selection: Recommend using Gemini_2.5_Flash (best balance of speed and quality)
  - Function Configuration: Select available mathematical functions
    - Basic functions: sin, cos, tan, exp, log, sqrt, abs
    - Advanced functions: sinh, cosh, tanh, asin, acos, atan
  - Fitting Parameter Tuning:
    - Number of Islands: 3-8 (Recommended: 5) - Number of parallel search populations
    - Number of Cycles: 3-10 (Recommended: 5) - Number of evolution generations
    - Unit Interactions: 5-10 (Recommended: 8) - LLM calls per generation
    - Target Score: 80.0 (Recommended) - Early stop when reached
  - Fuzzy Mode: Check to enable natural language theory description assistance
- Data Preview
  - The right side displays extracted data point information
  - Shows X, Y ranges and data point scatter plots
  - Confirm data quality before starting fitting
- Execute Fitting
  - Click the ▶️ Start Fitting button
  - Observe real-time progress and log output
  - During fitting you can see:
    - Current best score and expression
    - Real-time fitting curve comparison plots
    - API call counts and runtime
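The data-preview step works on points extracted from the drawing. As a rough illustration of what that extraction involves, the sketch below maps canvas pixels to data coordinates and reports the X and Y ranges. It is not the application's code: the function names, the canvas size, and the target data ranges are all assumptions chosen for the example.

```python
import numpy as np


def pixels_to_data(pixel_xy, canvas_width, canvas_height,
                   x_range=(0.0, 10.0), y_range=(0.0, 10.0)):
    """Map canvas pixels (origin top-left, y down) to data coordinates (y up)."""
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    x_min, x_max = x_range
    y_min, y_max = y_range
    x = x_min + pixel_xy[:, 0] / canvas_width * (x_max - x_min)
    # Flip the vertical axis: pixel row 0 is the top of the canvas.
    y = y_max - pixel_xy[:, 1] / canvas_height * (y_max - y_min)
    order = np.argsort(x)  # sort by x so the points read as a curve
    return x[order], y[order]


def summarize(x, y):
    """Collect the kind of information a data preview would show."""
    return {
        "n_points": int(x.size),
        "x_range": (float(x.min()), float(x.max())),
        "y_range": (float(y.min()), float(y.max())),
    }
```

A very narrow X range or a handful of points in the summary is a sign that the drawing should be redone before spending API calls on fitting.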
Fitting Results Interpretation
After fitting completion, the application displays:
- 📊 Fitting Comparison Plot: Original curve vs AI-discovered fitting curve
- 📈 Score History: Shows fitting quality improvement over iterations
- 📊 Pareto Front: Analyzes trade-off between expression complexity and accuracy
- 📞 API Call Log: Detailed model call records and performance statistics
📁 Tab 2: File Upload Fitting
This is the preferred method for professional users, supporting complex multi-dimensional data and physical unit validation.
Data Preparation
Prepare NPZ files containing the following keys:
- 'x': Input features, shape (n_samples, n_features)
- 'y': Output targets, shape (n_samples,)
- 'error': Optional error data, shape (n_samples,)
Python example code:
import numpy as np
# Generate example data: F = m * a (Newton's second law)
m = np.random.uniform(1, 10, 100) # mass kg
a = np.random.uniform(0.5, 20, 100) # acceleration m/s^2
F = m * a # force N
error = np.random.normal(0, 0.1, 100) # measurement error
# Save as NPZ file
x = np.column_stack([m, a]) # input feature matrix
y = F # output target
np.savez('physics_data.npz', x=x, y=y, error=error)
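Before uploading, the file can be checked locally against the key and shape requirements listed above. This is a minimal sketch using only NumPy; `validate_npz` is a hypothetical helper and does not reproduce the application's own validation.

```python
import io

import numpy as np


def validate_npz(source):
    """Check the 'x', 'y', and optional 'error' arrays in an NPZ file.

    `source` may be a path or a binary file-like object. Returns a summary
    dict, or raises ValueError describing the first problem found.
    """
    with np.load(source) as data:
        if "x" not in data.files or "y" not in data.files:
            raise ValueError("NPZ file must contain 'x' and 'y'")
        x, y = data["x"], data["y"]
        error = data["error"] if "error" in data.files else None
    if x.ndim != 2:
        raise ValueError(f"'x' must have shape (n_samples, n_features), got {x.shape}")
    if y.shape != (x.shape[0],):
        raise ValueError(f"'y' must have shape ({x.shape[0]},), got {y.shape}")
    if error is not None and error.shape != y.shape:
        raise ValueError(f"'error' must have shape {y.shape}, got {error.shape}")
    return {"n_samples": x.shape[0], "n_features": x.shape[1],
            "has_error": error is not None}


# Usage: round-trip a small in-memory file instead of touching the disk.
buffer = io.BytesIO()
np.savez(buffer, x=np.ones((5, 2)), y=np.ones(5))
buffer.seek(0)
summary = validate_npz(buffer)
```

For the file produced by the example above, `validate_npz('physics_data.npz')` should report 100 samples, 2 features, and that error data is present.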
Operation Steps
- Upload Data File
  - Click Choose NPZ File to upload data
  - System will automatically validate data format
  - Display basic data information: number of samples, features, whether errors are included
- Variable Configuration (Key step)
  Set in ⚙️ Variable Configuration area:
  - Basic Description:
    - Input Description: Describe the physical meaning of input data
      Example: "Use object's mass and acceleration to derive force acting on the object"
  - Output Variables:
    - Output Variable Name: F
    - Output Variable Description: force
    - Output Variable Unit: kg*m/s^2
  - Input Variable Configuration: Configure for each input feature:
    - Variable Name: m, a (corresponding to mass, acceleration)
    - Unit: kg, m/s^2
    - Description: mass, acceleration
- Advanced Options
  - Enable Unit Validation: When checked, performs dimensional analysis to ensure generated expressions are physically correct
  - Uncheck to skip unit checking, suitable for pure mathematical fitting
- Parameter Tuning
  - Sidebar parameters are the same as in drawing mode
  - For complex data, recommend:
    - Number of Islands: 6-8
    - Number of Cycles: 8-10
    - Enable Fuzzy mode
- Execution and Results
  - Click ▶️ Start Fitting
  - For multi-dimensional data, results are shown as predicted vs actual scatter plots
  - Ideally, points should be distributed near the y=x line
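How close the points lie to the y=x line can also be expressed numerically. The sketch below computes two standard measures, RMSE and the coefficient of determination R², from actual and predicted values. These are generic statistics offered for illustration; they are not the 0-100 score the application reports, and `agreement_metrics` is a hypothetical name.

```python
import numpy as np


def agreement_metrics(y_true, y_pred):
    """Return RMSE and R^2 between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    # R^2 is undefined when the actual values are all identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "r2": r2}
```

A perfect fit gives an RMSE of 0 and an R² of 1; predicting the mean of the data for every sample gives an R² of 0.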
⚙️ Configuration Parameter Details
IdeaSearch Core Parameters
Canvas Configuration Parameters
Data Processing Parameters
Fitter Configuration Parameters
🎯 Parameter Tuning Guide
Key Parameter Explanations
Parameter | Recommended Value | Description | Tuning Suggestions |
---|---|---|---|
Number of Islands | 3-8 | Number of parallel evolution populations | Increasing it improves diversity but consumes more API calls |
Number of Cycles | 3-10 | Number of evolution generations | More cycles usually yield better results |
Unit Interactions | 5-10 | LLM calls per cycle | Balance exploration depth and cost |
Target Score | 80.0 | Automatic stop threshold | Adjust based on accuracy requirements (0-100) |
Sample Temperature | 10-30 | Generation randomness control | A high temperature increases creativity; a low temperature is more stable |
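To get a feel for cost before a run, a rough upper bound on LLM calls can be computed from the three search parameters. This sketch assumes that every island makes `unit_interactions` calls in every cycle, which the documentation does not confirm; `estimate_max_llm_calls` is a hypothetical helper, and stopping early at the target score would lower the real number.

```python
def estimate_max_llm_calls(islands, cycles, unit_interactions):
    """Upper bound on LLM calls, assuming each island calls the model
    `unit_interactions` times per cycle (an assumption, see above)."""
    for name, value in (("islands", islands), ("cycles", cycles),
                        ("unit_interactions", unit_interactions)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return islands * cycles * unit_interactions
```

Under that assumption, the recommended drawing-mode settings (5 islands, 5 cycles, 8 unit interactions) give at most 200 calls, while the upper end of each range (8, 10, 10) gives at most 800.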
🔧 Troubleshooting
Common Problem Solutions
Q: Application fails to start?
# Check Python version (requires 3.10+)
python --version
# Reinstall dependencies
uv sync
Q: API calls failing?
- Check if api_keys.json format is correct
- Confirm API keys are valid and have balance
- Verify network connection
- Check if model names exactly match configuration file key names
Q: Fitting results unsatisfactory?
- Increase search intensity: Raise number of islands and cycles
- Enable Fuzzy mode: Use natural language theory descriptions
- Try different models: GPT-4o usually performs better than Mini versions
- Optimize data quality: Ensure canvas curves are clear and data is evenly distributed
- Adjust function library: Choose appropriate basic functions based on expected function types
Q: Memory or performance issues?
- Lower number of islands and cycles
- Use more lightweight models
- Reduce number of data points
- Turn off some unnecessary visualizations
Log Viewing
The application automatically saves detailed logs in the logs/ directory:
logs/
├── fit_20231208_143022/ # Fitting process logs
├── db_20231208_143022/ # IdeaSearcher database files
└── ...
Each fitting creates an independent timestamped directory containing:
- Complete fitting process records
- API call details
- Error messages and debug output
- Best expressions and Pareto front data
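Since each run gets its own timestamped directory, the most recent one can be found by parsing the names. The sketch below assumes the `fit_YYYYMMDD_HHMMSS` pattern shown in the listing above; `parse_run_timestamp` and `latest_run` are hypothetical helpers.

```python
from datetime import datetime


def parse_run_timestamp(dir_name, prefix="fit_"):
    """Return the datetime encoded in a name like fit_20231208_143022, or None."""
    if not dir_name.startswith(prefix):
        return None
    try:
        return datetime.strptime(dir_name[len(prefix):], "%Y%m%d_%H%M%S")
    except ValueError:
        return None  # right prefix, but not a timestamped run directory


def latest_run(dir_names, prefix="fit_"):
    """Pick the most recent run directory name, or None if there is none."""
    stamped = [(parse_run_timestamp(name, prefix), name) for name in dir_names]
    stamped = [(stamp, name) for stamp, name in stamped if stamp is not None]
    return max(stamped)[1] if stamped else None
```

In practice the names would come from something like `os.listdir("logs")`, and passing `prefix="db_"` selects the database directories instead.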
📚 Technical Architecture
Core Components
- Streamlit: Web application framework
- IdeaSearch-framework: Core optimization engine
- IdeaSearch-fit: Symbolic regression adapter
- streamlit-drawable-canvas: Drawing canvas component
Data Flow
User Input (Canvas/File) → Data Preprocessing → IdeaSearchFitter → IdeaSearcher → LLM Calls → Expression Generation → Evaluation and Selection → Result Display
File Structure
🚀 Advanced Features
Fuzzy Mode
Fuzzy mode is a unique feature of IdeaSearch: the LLM first generates natural language theory descriptions, which are then converted to mathematical expressions:
- Theory Generation: LLM analyzes data characteristics and generates physical or mathematical theory hypotheses
- Expression Conversion: Convert natural language theories to specific mathematical formulas
- Iterative Optimization: Continue refining theories and expressions based on fitting results
Applicable scenarios:
- Physical law discovery
- Complex nonlinear relationship modeling
- Symbolic regression requiring interpretability
Physical Unit Validation
When unit validation is enabled, the system will:
- Dimensional Analysis: Check dimensional consistency of expressions
- Unit Derivation: Verify if output units match expectations
- Correction Suggestions: Provide correction suggestions for expressions that don't conform to units
This ensures generated expressions are physically meaningful.
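The idea behind dimensional analysis can be shown with a small sketch that represents each unit as a map from base unit to exponent, so that multiplying quantities adds exponents. This is an illustration only, not the validator the application uses; `parse_unit` and `multiply` are hypothetical helpers, and the parser handles only simple strings such as `kg*m/s^2`.

```python
import re


def parse_unit(text):
    """Parse a unit string like 'kg*m/s^2' into {'kg': 1, 'm': 1, 's': -2}."""
    dims = {}
    # Each factor is preceded by '*' or '/'; the first factor counts as '*'.
    pattern = r"([*/]?)\s*([A-Za-z]+)(?:\^(-?\d+))?"
    for operator, symbol, power in re.findall(pattern, text):
        exponent = int(power) if power else 1
        if operator == "/":
            exponent = -exponent
        dims[symbol] = dims.get(symbol, 0) + exponent
    return {symbol: exp for symbol, exp in dims.items() if exp != 0}


def multiply(a, b):
    """Dimensions of a product: exponents of matching base units add up."""
    merged = dict(a)
    for symbol, exp in b.items():
        merged[symbol] = merged.get(symbol, 0) + exp
    return {symbol: exp for symbol, exp in merged.items() if exp != 0}


# Check the candidate expression F = m * a against the declared output unit.
mass, acceleration = parse_unit("kg"), parse_unit("m/s^2")
is_consistent = multiply(mass, acceleration) == parse_unit("kg*m/s^2")
```

With the units from the file-upload example, `m * a` matches the declared output unit `kg*m/s^2`, whereas a candidate such as `m + a` would be rejected because the two terms have different dimensions.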
Island Evolution Algorithm
- Parallel Search: Multiple "islands" simultaneously evolve different expression populations
- Population Exchange: Islands periodically exchange excellent individuals
- Diversity Maintenance: Avoid premature convergence to local optima
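The three ideas above can be sketched with a toy island model. To stay self-contained, the example evolves plain numbers by random mutation instead of asking an LLM for expressions, so it shows only the island structure (separate populations, periodic migration) and not how IdeaSearch generates candidates. All names and parameters here are assumptions.

```python
import random


def island_search(fitness, n_islands=3, population=6, generations=20,
                  migrate_every=5, seed=0):
    """Toy island model: higher `fitness` is better; candidates are floats."""
    rng = random.Random(seed)
    islands = [[rng.uniform(-10, 10) for _ in range(population)]
               for _ in range(n_islands)]
    for generation in range(1, generations + 1):
        for index, members in enumerate(islands):
            # Mutate every member, then keep the best of parents and children.
            children = [member + rng.gauss(0, 1.0) for member in members]
            ranked = sorted(members + children, key=fitness, reverse=True)
            islands[index] = ranked[:population]
        if generation % migrate_every == 0:
            # Ring migration: each island's best replaces its neighbour's worst.
            best = [members[0] for members in islands]
            for index in range(n_islands):
                islands[(index + 1) % n_islands][-1] = best[index]
    return max((m for members in islands for m in members), key=fitness)
```

For example, `island_search(lambda a: -(a - 3.0) ** 2)` searches for the value closest to 3. Because each island keeps its best members, the result can never be worse than the best candidate in the initial populations.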
Pareto Front Optimization
Balances two objectives:
- Fitting Accuracy: Degree of expression matching with data
- Expression Complexity: Simplicity of formulas
Helps users find optimal balance between accuracy and interpretability.
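A Pareto front over these two objectives can be computed by discarding every candidate that another candidate beats or ties on both complexity and error while beating it strictly on at least one. The sketch below is a generic implementation for illustration; the candidate tuples, the complexity measure, and the function name are assumptions, not the application's data model.

```python
def pareto_front(candidates):
    """Keep non-dominated (expression, complexity, error) tuples.

    Lower is better for both complexity and error.
    """
    front = []
    for name, complexity, error in candidates:
        dominated = any(
            (c <= complexity and e <= error) and (c < complexity or e < error)
            for _, c, e in candidates
        )
        if not dominated:
            front.append((name, complexity, error))
    return sorted(front, key=lambda item: item[1])


# Made-up values purely for illustration.
candidates = [
    ("a*x", 3, 0.9),
    ("exp(x)", 4, 1.5),          # worse than a*x on both objectives
    ("a*x+b", 5, 0.4),
    ("a*sin(x)+b", 7, 0.4),      # same error as a*x+b but more complex
    ("a*x**2+b*x+c", 9, 0.1),
]
front = pareto_front(candidates)
```

Here the front keeps `a*x`, `a*x+b`, and `a*x**2+b*x+c`: each is the most accurate option at its level of complexity, which is the trade-off the Pareto plot visualizes.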
📈 Performance Optimization Suggestions
- Model Selection: Gemini 2.5 Flash provides best cost-effectiveness
- Batch Processing: Use larger unit interaction numbers to reduce network overhead
- Caching: System automatically caches intermediate results
- Parallelization: Island algorithm naturally supports parallel computing
- Early Stopping: Set reasonable target scores to avoid overfitting
🤝 Contribution and Feedback
Encountered a problem or have a suggestion for improvement?
- 📋 Check GitHub Issues
- 🆕 Create New Issue
- 📧 Contact development team
🎯 Start exploring AI-driven symbolic regression!
Let large language models drive research and promote scientific discovery