IdeaSearch Fitter Demo usage examples - Streamlit-based intelligent symbolic regression web application

🚀 IdeaSearch Fitter Demo Usage Tutorial

IdeaSearch Fitter Demo is an intelligent symbolic regression web application built on the IdeaSearch framework. It uses large language models to discover mathematical expressions automatically: you draw a curve or upload data, and the AI searches for best-fit formulas.

✨ Key Features

🎨 Interactive Drawing Canvas - Intuitively draw target curves with support for multiple drawing modes
📁 File Upload Support - Upload NPZ data files and fit multi-dimensional features
🤖 Multi-Model Support - Integrated with mainstream LLMs such as GPT, Gemini, Qwen, and DeepSeek
🧠 Fuzzy Mode - Use natural language theory descriptions to assist fitting
📊 Real-time Visualization - Dynamically display fitting progress and result comparisons
🏝️ Island Evolution Algorithm - Parallel exploration of multiple solution spaces to improve fitting quality
📈 Pareto Front Analysis - Balance expression complexity and fitting accuracy
⚙️ Physical Unit Validation - Ensure generated expressions have correct dimensions

🛠️ Environment Setup

1. Clone the Repository

# Clone repository
git clone https://github.com/IdeaSearch/ideasearch-fit-demo
cd ideasearch-fit-demo

2. Install uv Package Manager

uv is a fast, reliable Python package manager and is the recommended way to install this project:

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or install via pip
pip install uv

3. Configure Environment and Dependencies

# Sync dependency environment
uv sync

4. Configure API Keys

Copy the example configuration file, then edit it:

# Copy example configuration
cp api_keys.json.example api_keys.json

# Edit configuration file
nano api_keys.json  # or use other editors

API key configuration format:

{
  "Gemini_2.5_Flash": [{
    "api_key": "your-gemini-api-key-here",
    "base_url": "https://generativelanguage.googleapis.com/v1beta",
    "model": "gemini-2.0-flash-exp"
  }],
  "GPT_4o_Mini": [{
    "api_key": "your-openai-api-key",
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4o-mini"
  }],
  "Qwen_Plus": [{
    "api_key": "your-qwen-api-key",
    "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "model": "qwen-plus"
  }]
}

Supported Model Names (configuration keys must match these names exactly):

Gemini Series: Gemini_2.5_Flash, Gemini_2.5_Pro, Gemini_Pro
OpenAI Series: GPT_4o, GPT_4o_Mini, GPT_4_Turbo
Chinese Models: Qwen_Plus, Qwen_Max, Qwen3, Doubao
Other Models: Deepseek_V3, Grok_4
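
Because the model names must match the configuration keys exactly, a quick structural check before launch can catch typos early. The sketch below is not part of the application: `validate_api_keys` and `validate_api_keys_file` are hypothetical helpers, and they assume only the file layout shown above (each model name maps to a list of entries with `api_key`, `base_url`, and `model`).

```python
import json

# Fields every entry is assumed to need, based on the example configuration above.
REQUIRED_FIELDS = {"api_key", "base_url", "model"}


def validate_api_keys(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means the layout looks right."""
    problems = []
    for name, entries in config.items():
        if not isinstance(entries, list) or not entries:
            problems.append(f"{name}: expected a non-empty list of entries")
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                problems.append(f"{name}[{i}]: expected an object")
                continue
            missing = REQUIRED_FIELDS - entry.keys()
            if missing:
                problems.append(f"{name}[{i}]: missing {sorted(missing)}")
    return problems


def validate_api_keys_file(path: str = "api_keys.json") -> list[str]:
    """Load the JSON file (raises on malformed JSON) and check its layout."""
    with open(path, encoding="utf-8") as fh:
        return validate_api_keys(json.load(fh))
```

Calling `validate_api_keys_file()` from the repository root returns an empty list when every entry has the three fields. It does not verify that the keys themselves are valid.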

🚀 Launch Application

After configuration, start the application:

# Use launch script (recommended)
./run.sh

# Or start manually
uv run streamlit run app.py --server.port 8501

The application opens automatically in your browser at http://localhost:8501

📖 Usage Guide

The application provides two main tabs for different usage scenarios:

🎨 Tab 1: Draw Curve Fitting

This is the most intuitive way to use the application, well suited to quick exploration and teaching demonstrations.

Operation Steps

1. Draw Curves
- Draw the target curve on the left canvas
- Three drawing modes are supported:
  - Free Drawing: Hand-draw curves of any shape
  - Straight Line: Draw line segments
  - Points: Mark data points one by one
- Line width is adjustable (1-10 pixels)
- Enable the 📷 Pass Image option to pass the canvas image to vision-capable models (such as Gemini)

2. Configure Parameters (Right sidebar)
- Model Selection: Gemini_2.5_Flash is recommended (best balance of speed and quality)
- Function Configuration: Select the mathematical functions available to the fitter
```
Basic functions: sin, cos, tan, exp, log, sqrt, abs
Advanced functions: sinh, cosh, tanh, asin, acos, atan
```
- Fitting Parameter Tuning:
  - Number of Islands: 3-8 (Recommended: 5) - Number of parallel search populations
  - Number of Cycles: 3-10 (Recommended: 5) - Number of evolution generations
  - Unit Interactions: 5-10 (Recommended: 8) - LLM calls per cycle
  - Target Score: 80.0 (Recommended) - Fitting stops early once this score is reached
- Fuzzy Mode: Check to let natural language theory descriptions assist fitting

3. Data Preview
- The right side displays information about the extracted data points
- Shows the X and Y ranges and a scatter plot of the data points
- Confirm data quality before starting the fit

4. Execute Fitting
- Click the ▶️ Start Fitting button
- Watch the real-time progress and log output
- During fitting you can see:
  - Current best score and expression
  - Real-time comparison plot of the fitted curve
  - API call counts and runtime

Fitting Results Interpretation

After fitting completion, the application displays:

📊 Fitting Comparison Plot: Original curve vs AI-discovered fitting curve
📈 Score History: Shows fitting quality improvement over iterations
📊 Pareto Front: Analyzes trade-off between expression complexity and accuracy
📞 API Call Log: Detailed model call records and performance statistics

📁 Tab 2: File Upload Fitting

This is the preferred method for professional users, supporting complex multi-dimensional data and physical unit validation.

Data Preparation

Prepare NPZ files containing the following keys:

'x': Input features, shape (n_samples, n_features)
'y': Output targets, shape (n_samples,)
'error': Optional error data, shape (n_samples,)

Python example code:

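import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the file is reproducible
n_samples = 100
sigma = 0.1  # standard deviation of the simulated measurement noise (N)

# Generate example data: F = m * a (Newton's second law)
m = rng.uniform(1, 10, n_samples)  # mass, kg
a = rng.uniform(0.5, 20, n_samples)  # acceleration, m/s^2
F = m * a + rng.normal(0, sigma, n_samples)  # force with measurement noise, N
error = np.full(n_samples, sigma)  # per-sample uncertainty of y

# Save as NPZ file
x = np.column_stack([m, a])  # input feature matrix, shape (100, 2)
y = F  # output target, shape (100,)
np.savez('physics_data.npz', x=x, y=y, error=error)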
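
Before uploading, it can help to confirm locally that a file follows the key and shape layout listed above. The following is a minimal sketch under that assumption. `check_npz_arrays` and `check_npz_file` are hypothetical helper names, and the application's own validation may be stricter or differ in detail.

```python
import numpy as np


def check_npz_arrays(x, y, error=None):
    """Validate shapes against the documented layout; return (n_samples, n_features)."""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.ndim != 2:
        raise ValueError(f"'x' must be 2-D (n_samples, n_features), got shape {x.shape}")
    n_samples, n_features = x.shape
    if y.shape != (n_samples,):
        raise ValueError(f"'y' must have shape ({n_samples},), got {y.shape}")
    if error is not None and np.asarray(error).shape != (n_samples,):
        raise ValueError(f"'error' must have shape ({n_samples},)")
    return n_samples, n_features


def check_npz_file(path):
    """Open an NPZ file, require the 'x' and 'y' keys, and validate the shapes."""
    with np.load(path) as data:
        for key in ("x", "y"):
            if key not in data.files:
                raise KeyError(f"missing required key {key!r}")
        error = data["error"] if "error" in data.files else None
        return check_npz_arrays(data["x"], data["y"], error)
```

For the file written by the example above, `check_npz_file('physics_data.npz')` would report 100 samples and 2 features.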

Operation Steps

1. Upload Data File
- Click Choose NPZ File to upload data
- The system automatically validates the data format
- Basic data information is displayed: number of samples, number of features, and whether error data is included

2. Variable Configuration (Key step)

Set the following in the ⚙️ Variable Configuration area:

Basic Description:
- Input Description: Describe the physical meaning of the input data
```
Example: "Use the object's mass and acceleration to derive the force acting on the object"
```
Output Variables:
- Output Variable Name: F
- Output Variable Description: force
- Output Variable Unit: kg*m/s^2
Input Variable Configuration (configure each input feature):
- Variable Name: m, a (corresponding to mass, acceleration)
- Unit: kg, m/s^2
- Description: mass, acceleration

3. Advanced Options
- Enable Unit Validation: When checked, dimensional analysis is performed so that generated expressions are dimensionally consistent
- Uncheck to skip unit checking, which suits purely mathematical fitting

4. Parameter Tuning
- Sidebar parameters are the same as in drawing mode
- For complex data, the recommended settings are:
  - Number of Islands: 6-8
  - Number of Cycles: 8-10
  - Fuzzy mode enabled

5. Execution and Results
- Click ▶️ Start Fitting
- For multi-dimensional data, results are shown as a scatter plot of predicted vs actual values
- Ideally, the points lie close to the y=x line
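
How closely the points follow the y=x line can be summarized with the coefficient of determination, R². The sketch below is a standard textbook formula offered for illustration. It is not necessarily how the application computes its 0-100 score, which is not defined in this tutorial, and `r_squared` is a hypothetical helper name.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (1.0 means a perfect match)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared distance of each point from the y=x line.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

A value near 1 corresponds to points hugging the y=x line. A value near 0 means the expression predicts no better than the mean of the data.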

🎯 Parameter Tuning Guide

Key Parameter Explanations

| Parameter | Recommended Value | Description | Tuning Suggestions |
| --- | --- | --- | --- |
| Number of Islands | 3-8 | Number of parallel evolution populations | Increasing it improves diversity but consumes more API calls |
| Number of Cycles | 3-10 | Number of evolution generations | More cycles usually yield better results |
| Unit Interactions | 5-10 | LLM calls per cycle | Balance exploration depth and cost |
| Target Score | 80.0 | Automatic stop threshold | Adjust based on accuracy requirements (0-100) |
| Sample Temperature | 10-30 | Generation randomness control | Higher temperature increases creativity; lower temperature is more stable |
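
Since islands, cycles, and unit interactions all multiply API usage, a rough budget estimate is useful before a long run. The sketch below rests on an assumption this tutorial does not confirm: that every island makes `unit_interactions` LLM calls in every cycle. If the setting is instead shared across islands, the true figure is lower, and reaching the target score early reduces it further. `max_llm_calls` is a hypothetical helper.

```python
def max_llm_calls(islands: int, cycles: int, unit_interactions: int) -> int:
    """Upper bound on LLM calls, ASSUMING each island makes `unit_interactions` calls per cycle."""
    if min(islands, cycles, unit_interactions) < 1:
        raise ValueError("all three settings must be at least 1")
    return islands * cycles * unit_interactions


# Recommended defaults from the table: 5 islands, 5 cycles, 8 unit interactions.
default_budget = max_llm_calls(5, 5, 8)
# Upper end of every recommended range: 8 islands, 10 cycles, 10 unit interactions.
heavy_budget = max_llm_calls(8, 10, 10)
```

Under this assumption the defaults allow up to 200 calls and the heaviest recommended settings up to 800, which is why raising several parameters at once can increase cost quickly.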

🔧 Troubleshooting

Common Problem Solutions

Q: Application fails to start?

# Check Python version (requires 3.10+)
python --version

# Reinstall dependencies
uv sync

Q: API calls failing?

Check that the api_keys.json format is correct
Confirm that the API keys are valid and the accounts have remaining balance
Verify the network connection
Check that the selected model names exactly match the key names in the configuration file

Q: Fitting results unsatisfactory?

Increase search intensity: Raise number of islands and cycles
Enable Fuzzy mode: Use natural language theory descriptions
Try different models: GPT-4o usually performs better than Mini versions
Optimize data quality: Ensure canvas curves are clear and data is evenly distributed
Adjust function library: Choose appropriate basic functions based on expected function types

Q: Memory or performance issues?

Lower number of islands and cycles
Use more lightweight models
Reduce number of data points
Turn off some unnecessary visualizations

Log Viewing

The application automatically saves detailed logs in the logs/ directory:

logs/
├── fit_20231208_143022/    # Fitting process logs
├── db_20231208_143022/     # IdeaSearcher database files
└── ...

Each fitting run creates its own timestamped directory containing:

Complete fitting process records
API call details
Error messages and debug output
Best expressions and Pareto front data
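
To jump straight to the most recent run, the timestamp in each directory name can be parsed. The sketch below assumes the `fit_YYYYMMDD_HHMMSS` naming pattern inferred from the listing above. `parse_run_timestamp` and `latest_fit_dir` are hypothetical helpers, not part of the application.

```python
from datetime import datetime
from pathlib import Path


def parse_run_timestamp(dir_name: str) -> datetime:
    """Parse a name like 'fit_20231208_143022' or 'db_20231208_143022' into a datetime."""
    _prefix, date_part, time_part = dir_name.split("_", 2)
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")


def latest_fit_dir(logs_root="logs"):
    """Return the most recent fit_* directory under logs_root, or None if there is none."""
    runs = [p for p in Path(logs_root).glob("fit_*") if p.is_dir()]
    return max(runs, key=lambda p: parse_run_timestamp(p.name), default=None)
```

A directory whose name does not follow the assumed pattern makes `parse_run_timestamp` raise `ValueError`, which is a deliberate choice here so that unexpected names are noticed rather than silently skipped.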

📚 Technical Architecture

Core Components

Streamlit: Web application framework
IdeaSearch-framework: Core optimization engine
IdeaSearch-fit: Symbolic regression adapter
streamlit-drawable-canvas: Drawing canvas component

Data Flow

User Input (Canvas/File) → Data Preprocessing → IdeaSearchFitter → IdeaSearcher → LLM Calls → Expression Generation → Evaluation and Selection → Result Display

File Structure

app.py
api_keys.json
api_keys.json.example
pyproject.toml
run.sh
README.md
ARCHITECTURE.md

🚀 Advanced Features

Fuzzy Mode

Fuzzy mode is a distinctive feature of IdeaSearch. The LLM first generates natural language theory descriptions, which are then converted into mathematical expressions, in three stages:

Theory Generation: LLM analyzes data characteristics and generates physical or mathematical theory hypotheses
Expression Conversion: Convert natural language theories to specific mathematical formulas
Iterative Optimization: Continue refining theories and expressions based on fitting results
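
The three stages above can be sketched as a simple loop. This is only an illustration of the control flow, not IdeaSearch's implementation: `fuzzy_fit` is a hypothetical function, and the three callables it takes (`propose_theory`, `theory_to_expression`, `score`) are placeholders for the LLM calls and the evaluator, whose real interfaces are not described in this tutorial.

```python
def fuzzy_fit(propose_theory, theory_to_expression, score, n_rounds=3):
    """Theory text -> expression -> score, feeding the best result back as context.

    All three callables are hypothetical stand-ins:
      propose_theory(feedback: str) -> str        (theory generation)
      theory_to_expression(theory: str) -> str    (expression conversion)
      score(expression: str) -> float             (higher is better)
    """
    best = None  # (score, theory, expression) of the best round so far
    feedback = ""
    for _ in range(n_rounds):
        theory = propose_theory(feedback)
        expression = theory_to_expression(theory)
        current = score(expression)
        if best is None or current > best[0]:
            best = (current, theory, expression)
        # Iterative optimization: the next proposal sees the best result so far.
        feedback = f"Best so far: {best[2]} (score {best[0]:.1f})"
    return best
```

Keeping the theory text alongside the expression is what gives the result its interpretability: the returned tuple pairs each formula with the reasoning that produced it.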

Applicable scenarios:

Physical law discovery
Complex nonlinear relationship modeling
Symbolic regression requiring interpretability

Physical Unit Validation

When unit validation is enabled, the system will:

Dimensional Analysis: Check dimensional consistency of expressions
Unit Derivation: Verify if output units match expectations
Correction Suggestions: Suggest corrections for expressions whose units do not match

This ensures generated expressions are physically meaningful.
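
The idea behind dimensional analysis can be shown with a small sketch. It is not the application's checker, whose internals are not described here. It assumes a simple representation in which each quantity's unit is a dictionary mapping base units to exponents, and `combine` is a hypothetical helper.

```python
def combine(left: dict, right: dict, sign: int = 1) -> dict:
    """Multiply (sign=1) or divide (sign=-1) two units by adding or subtracting exponents."""
    result = dict(left)
    for base, exponent in right.items():
        result[base] = result.get(base, 0) + sign * exponent
    # Drop base units whose exponents cancelled out.
    return {base: exp for base, exp in result.items() if exp != 0}


# Units from the F = m * a example: kg, m/s^2, and kg*m/s^2.
MASS = {"kg": 1}
ACCELERATION = {"m": 1, "s": -2}
FORCE = {"kg": 1, "m": 1, "s": -2}

product_units = combine(MASS, ACCELERATION)             # units of m * a
quotient_units = combine(MASS, ACCELERATION, sign=-1)   # units of m / a

product_is_valid = product_units == FORCE    # m * a has the units of force
quotient_is_valid = quotient_units == FORCE  # m / a does not
```

An expression such as m / a would be rejected under this scheme because its units (kg*s^2/m) differ from the declared output unit, no matter how well it fits the data numerically.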

Island Evolution Algorithm

Parallel Search: Multiple "islands" simultaneously evolve different expression populations
Population Exchange: Islands periodically exchange excellent individuals
Diversity Maintenance: Avoid premature convergence to local optima
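
The three ideas above can be illustrated on a toy problem. This is a generic island-model sketch, not IdeaSearch's implementation: individuals here are plain numbers rather than expressions, mutation is Gaussian noise rather than an LLM call, and `evolve_islands` with all of its parameters is a hypothetical name.

```python
import random


def evolve_islands(fitness, n_islands=5, n_cycles=5, pop_size=8, migrate_every=2, seed=0):
    """Toy island model minimizing `fitness`; returns the best individual found."""
    rng = random.Random(seed)
    # Parallel search: every island starts with its own random population.
    islands = [[rng.uniform(-10, 10) for _ in range(pop_size)] for _ in range(n_islands)]
    for cycle in range(1, n_cycles + 1):
        for i, population in enumerate(islands):
            # Mutate every individual, then keep the best pop_size of parents + children.
            children = [x + rng.gauss(0, 1) for x in population]
            islands[i] = sorted(population + children, key=fitness)[:pop_size]
        if cycle % migrate_every == 0:
            # Population exchange: each island's best replaces the worst on the next island.
            best = [population[0] for population in islands]
            for i in range(n_islands):
                islands[(i + 1) % n_islands][-1] = best[i]
    return min((x for population in islands for x in population), key=fitness)
```

For example, `evolve_islands(lambda x: (x - 3) ** 2)` searches for the minimum of a parabola. Because islands only exchange individuals every `migrate_every` cycles, each one explores independently in between, which is what helps preserve diversity.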

Pareto Front Optimization

Balances two objectives:

Fitting Accuracy: Degree of expression matching with data
Expression Complexity: Simplicity of formulas

This helps users find a good balance between accuracy and interpretability.
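
A Pareto front keeps only the candidates that no other candidate beats on both objectives at once. The sketch below shows the standard dominance test. The candidate expressions, complexity values, and errors are invented for illustration, and the application's own complexity measure is not specified in this tutorial.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates, sorted by complexity.

    Each candidate is (expression, complexity, error); lower is better for both.
    """
    front = []
    for name, complexity, error in candidates:
        # Dominated: some candidate is no worse on both objectives and strictly better on one.
        dominated = any(
            c <= complexity and e <= error and (c < complexity or e < error)
            for _, c, e in candidates
        )
        if not dominated:
            front.append((name, complexity, error))
    return sorted(front, key=lambda item: item[1])


# Hypothetical candidates: (expression, complexity, fitting error).
CANDIDATES = [
    ("a*x", 3, 0.40),
    ("a*x + b", 5, 0.25),
    ("a*sin(x) + b", 6, 0.30),
    ("a*x**2 + b*x + c", 9, 0.05),
    ("a*exp(x) + b*x + c", 11, 0.05),
]

front_names = [name for name, _, _ in pareto_front(CANDIDATES)]
```

Here `a*sin(x) + b` drops out because `a*x + b` is both simpler and more accurate, and `a*exp(x) + b*x + c` drops out because a simpler expression reaches the same error. The three remaining expressions each represent a different trade-off.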

📈 Performance Optimization Suggestions

Model Selection: Gemini 2.5 Flash provides best cost-effectiveness
Batch Processing: Use larger unit interaction numbers to reduce network overhead
Caching: System automatically caches intermediate results
Parallelization: Island algorithm naturally supports parallel computing
Early Stopping: Set reasonable target scores to avoid overfitting

🤝 Contribution and Feedback

Running into problems or have suggestions for improvement?

📋 Check GitHub Issues
🆕 Create New Issue
📧 Contact development team

🎯 Start exploring AI-driven symbolic regression!

Let large language models drive research and promote scientific discovery
