
FAQ

Installation & Environment Setup

How do I install Second Me on Windows / Linux / Mac / Docker?

  • Recommended solution: Use Docker (cross-platform support: Mac, Windows, Linux).

  • Notes for Windows users:

    • Additional installation of make is required (via MinGW or WSL).

    • Not recommended to use native Windows environment (not fully tested).

  • Non-Docker installation: Ensure all dependencies are installed (e.g., brew, poetry, Python 3.12).

  • Advanced users: Bare-metal deployment on Mac is suggested for maximum performance.
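As a sketch, the recommended Docker route looks like the following; `make setup` is the target referenced elsewhere in this FAQ, and your version's Makefile may use different target names:

```shell
# Sketch of the Docker-based install; ensure Docker is running first.
git clone https://github.com/mindverse/Second-Me.git
cd Second-Me
make setup    # "setup" is the target this FAQ refers to; check `make help` or the Makefile
```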

Can I shut down my computer during training?

  • Supports checkpoint resumption: Training progress is saved in resource/ and data/ directories. Restart to continue training.

  • Note: Shutting down will terminate the current training process; the service needs to be restarted.

Does it support GPU acceleration?

  • Under development. Docker GPU support can be combined with local Ollama.
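If you experiment with combining Docker GPU support and a local Ollama, a hedged sketch might look like this (requires the NVIDIA Container Toolkit; the image name and port are Ollama's defaults, not part of Second Me itself):

```shell
# Illustrative only: run Ollama in Docker with GPU access.
# Requires the NVIDIA Container Toolkit on the host.
docker run -d --gpus all -p 11434:11434 ollama/ollama
```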

How to use proxy or resolve network issues during installation?

  • Select different sources based on your region/country for installation.

Model/Training

How do I train with a local model (e.g., Ollama, Gemma, Qwen)?

  • Docker users: Replace 127.0.0.1 in the API Endpoint with host.docker.internal.
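For example, rewriting a typical Ollama endpoint (the port and `/v1` path are Ollama's OpenAI-compatible defaults):

```shell
# From inside a Docker container, 127.0.0.1 points at the container itself,
# so the host's Ollama must be addressed as host.docker.internal.
endpoint="http://127.0.0.1:11434/v1"                     # Ollama's default OpenAI-compatible endpoint
endpoint="${endpoint/127.0.0.1/host.docker.internal}"    # rewrite for Docker
echo "$endpoint"
```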

Why does the model fail during training?

  • Common causes:

    • Insufficient Docker memory limits (increase memory allocation).

    • Incorrect model configuration (verify parameter compatibility).

What to do if ChromaDB reports embedding dimension mismatch?

  • Solutions:

    • Delete data/chroma_db and retrain.

    • Ensure embedding model dimensions match (e.g., 768 vs. 3072).

  • Appendix:
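A minimal sketch of the delete-and-retrain fix, assuming the store lives at `data/chroma_db` as named above (the `mkdir` below only simulates an existing store for demonstration):

```shell
# Clear the stale vector store so retraining rebuilds it with the new embedding dimension.
store="data/chroma_db"     # path as used elsewhere in this FAQ
mkdir -p "$store"          # stand-in for a store created with the old dimension
rm -rf "$store"            # after deletion, retrain to rebuild the store
```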

Why is embedding failing with OpenAI error even when using Ollama?

  • The OpenAI SDK is used, so error paths may include "OpenAI," but requests are sent to the configured model endpoint, not OpenAI's service.

What is the recommended size for training data?

  • Keep the training data between 10k and 100k for stability. Larger datasets may cause timeouts or memory issues.

Can I reuse API calls to save money on retraining?

  • Yes, intermediate data is saved and won’t trigger repeated API calls.

Features & Architecture

What’s the difference between Second Me and me.bot?

  • Second Me: the open-source personal LLM framework.

  • Me.Bot: an online app built on this framework.

Can I run multiple Second Me instances?

  • Supported: Ensure sufficient hardware resources and resolve port conflicts.
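If you happen to deploy with Docker Compose, one way to keep instances separate is distinct project names; the project names below are illustrative, and whether your deployment uses Compose at all depends on your setup:

```shell
# Illustrative: separate Compose project names avoid container-name clashes;
# you must still give each instance distinct published host ports.
docker compose -p secondme-a up -d
docker compose -p secondme-b up -d
```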

Can I use Second Me in my own agent framework?

  • Open API and MCP service support for direct integration.

Why does embedding stage use a different model than chat stage?

  • Technical reason: Not all model vendors provide both chat and embedding interfaces. OpenAI does, but DeepSeek, for example, does not (for now). Training requires both interfaces, so they must be configured separately.

Errors & Debugging

No rule to make target 'setup' error?

  • Troubleshooting:

    • Confirm you’re in the project root directory.

    • Verify Makefile integrity.

"Too many open files" during training?

  • Possible cause: a memory leak. Please open an issue if you encounter this situation.

  • When reporting issues, include:

    • OS (Mac/Linux).

    • Memory configuration (e.g., 16GB).

    • Docker version (if applicable).

  • Note: Avoid sharing private data in logs.
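While diagnosing on Mac/Linux, it can also help to inspect the shell's open-file limit, since a low soft limit produces the same error:

```shell
# Inspect the per-process open-file soft limit; a low value can also cause
# "Too many open files" independently of any leak.
soft_limit=$(ulimit -n)
echo "current open-file soft limit: $soft_limit"
# To raise it for the current shell session (hard limit permitting):
# ulimit -n 65536
```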

Can’t enter training page or web UI crashes?

  • Debug steps:

    • Run make status to check service status.

    • Verify no network conflicts (e.g., port occupancy).
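A quick port-occupancy check might look like this; the port number is illustrative, so use whatever port your instance is configured to listen on:

```shell
# Check whether another process already holds the service's port.
port=8002                      # illustrative; substitute your instance's configured port
if lsof -i ":$port" >/dev/null 2>&1; then
  status="in use"
else
  status="free"
fi
echo "Port $port is $status"
```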

What does "entities.parquet - no such file or directory" mean?

  • Cause: Insufficient data extraction model capability.

  • Suggestion: Switch to high-performance models (e.g., OpenAI API).

“Permission denied (publickey)” when cloning repository?

  • SSH key not set up. Use HTTPS instead:

    git clone https://github.com/mindverse/Second-Me.git
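Alternatively, if you prefer SSH, generate a key and register it with your GitHub account. This sketch writes to a scratch directory; in practice you would use `~/.ssh/id_ed25519` and paste the `.pub` contents into GitHub's SSH keys settings:

```shell
# Generate an SSH keypair (scratch directory here; use ~/.ssh/id_ed25519 in practice).
keydir=$(mktemp -d)
ssh-keygen -t ed25519 -C "you@example.com" -f "$keydir/id_ed25519" -N "" -q
ls "$keydir"    # id_ed25519 (private) and id_ed25519.pub (add the .pub to GitHub)
```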

Why "internal server error"?

  • Typical cause: a text chunk most likely exceeds the maximum length limit, because the configured embedding model is inconsistent with the maximum length set by the project.

  • Action: Adjust the EMBEDDING_MAX_TEXT_LENGTH parameter in the .env file to match your embedding model's actual limits.
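For example, the edit can be scripted with `sed`. This demonstration runs on a scratch file, and 512 is only a placeholder value; use your embedding model's actual maximum input length:

```shell
# Demonstrate rewriting EMBEDDING_MAX_TEXT_LENGTH on a scratch copy of .env.
envfile=$(mktemp)
echo 'EMBEDDING_MAX_TEXT_LENGTH=3072' > "$envfile"   # stand-in for the project's .env
sed -i.bak 's/^EMBEDDING_MAX_TEXT_LENGTH=.*/EMBEDDING_MAX_TEXT_LENGTH=512/' "$envfile"
grep EMBEDDING_MAX_TEXT_LENGTH "$envfile"
```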

Step "generate_biography" failed?

  • For paid models (OpenAI/DeepSeek), common errors:

    • openai.BadRequestError: Error code:

      • 400 - Bad Request

        • Reason: The request body format is incorrect.

        • Solution: Check whether the model name and API key are correct (there may be extra spaces after the model name).

      • 401 - Unauthorized

        • Reason: Invalid API key, authentication failed.

        • Solution: Verify that your API key is correct. If you don’t have one, create an API key first.

      • 402 - Insufficient Balance

        • Reason: Insufficient account balance.

        • Solution: Check your account balance and top up on the recharge page.

      • 422 - Unprocessable Entity

        • Reason: Invalid parameters in the request body.

        • Solution: Adjust the parameters based on the error message.

      • 429 - Too Many Requests

        • Reason: Request rate (TPM or RPM) limit reached.

        • Solution: Plan your request rate appropriately.

      • 500 - Internal Server Error

        • Reason: Server internal error.

        • Solution: Retry later. If the issue persists, contact the server provider.

      • 503 - Service Unavailable

        • Reason: Server is overloaded.

        • Solution: Retry your request later.

  • For errors like Biography generation failed: must be......, not...... or Expecting value......:

    • Upgrade the model

      • Select a more capable model to ensure that the generation quality meets the demand.

    • Switch to an API model

      • Switch to a cloud-based API service that supports the OpenAI protocol to work around local compute or compatibility limitations.

  • For errors like Biography generation failed: Request timed out, the cause is usually insufficient local computing resources, resulting in a model response timeout. The following optimization measures are recommended:

    • Use cloud API services

      • Use APIs that support the OpenAI protocol to call the model, avoiding local hardware performance limitations and ensuring stable generation.
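For the transient errors above (429, 500, 503), a simple retry with exponential backoff often helps; the helper function below is a sketch of my own, not part of Second Me:

```shell
# Retry a command with exponential backoff; useful for 429/500/503-style failures.
retry_with_backoff() {          # usage: retry_with_backoff <max_attempts> <cmd> [args...]
  max=$1; shift
  attempt=1; delay=1
  while ! "$@"; do
    [ "$attempt" -ge "$max" ] && return 1   # give up after <max_attempts> tries
    sleep "$delay"
    delay=$((delay * 2)); attempt=$((attempt + 1))
  done
}

retry_with_backoff 3 true && echo "succeeded"
```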

Issue with Embedding Model?

  • Embedding failures are rare; when they do occur, they can usually be resolved as follows:

    • Use a better model (e.g., OpenAI) or host a local high-performance extractor.

sqlite3.OperationalError: no such column: collections.topic?

  • Delete the data directory where ChromaDB stores data.

  • Restart the application to reinitialize ChromaDB (run make restart, or make docker-restart-all for Docker deployments, depending on your platform).

Training stuck at "Training to create Second Me -> train"?

  • Resource suggestion: the training process is memory-intensive, so allocating more memory can speed up training; 16 GB or more is recommended.

Other Questions

Can I use Logseq, Notion, me.bot logs for training?

  • Yes, convert to plaintext/markdown before uploading.

Why are some of my memory files missing after upload?

  • Current UI displays only 100 files; pagination is under development.

Does Mindverse recruit interns or collaborators?

  • Yes! Contact Scarlett or Kevin for opportunities.


Guide: Refer to Custom Model Config (Ollama).md.

Partially resolved in PR #207.

It may be that the local directory has already been initialized and ChromaDB needs to be re-initialized (refer to PR #207).