# multimodal-localization

**Repository Path**: aHeiDaBai/multimodal-localization

## Basic Information

- **Project Name**: multimodal-localization
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-24
- **Last Updated**: 2025-11-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Qwen3-VL model weights:

```bash
https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct
```

GitHub repository:

```bash
https://github.com/QwenLM/Qwen3-VL
```

In Qwen, the text encoder and the image encoder are kept separate in the encoder stage, and the two modalities are fused in the decoder. GLIP, by contrast, appears to fuse the text encoder and the image encoder already in the encoder stage:

```bash
https://github.com/microsoft/GLIP
```

Fine-tuning the model:

```bash
https://github.com/microsoft/GLIP?tab=readme-ov-file#fine-tuning
```

Florence-2 model weights:

```bash
https://huggingface.co/microsoft/Florence-2-large
https://huggingface.co/microsoft/Florence-2-base
```

Florence-2 paper:

```bash
https://arxiv.org/abs/2311.06242
```
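
The note above contrasting Qwen (modalities fused in the decoder) with GLIP (modalities fused in the encoder) can be illustrated with a toy sketch. This is not code from either repository: the function names (`attend`, `encode_separately`, `encode_with_fusion`, `decode`) are hypothetical, the "attention" is a single-head dot-product with no learned weights, and the tiny 2-dimensional feature vectors are made up for illustration. It only shows the structural difference: with separate encoders the image features do not depend on the text, while with encoder-stage fusion they do.

```python
import math


def attend(queries, keys):
    """Toy single-head dot-product attention; keys also serve as values."""
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [
                sum(w / total * k[i] for w, k in zip(weights, keys))
                for i in range(len(q))
            ]
        )
    return out


def add(xs, ys):
    """Element-wise residual connection over two equal-length sequences."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def encode_separately(image, text):
    """Hypothetical 'separate encoders': each modality attends only to itself."""
    return add(image, attend(image, image)), add(text, attend(text, text))


def encode_with_fusion(image, text):
    """Hypothetical 'encoder-stage fusion': each modality attends to the other."""
    return add(image, attend(image, text)), add(text, attend(text, image))


def decode(image_feats, text_feats):
    """Hypothetical 'decoder-stage fusion': self-attention over the joined sequence."""
    seq = image_feats + text_feats
    return add(seq, attend(seq, seq))


# Made-up features: two image tokens and two alternative one-token prompts.
image = [[1.0, 0.0], [0.0, 1.0]]
text_a = [[1.0, 1.0]]
text_b = [[-2.0, 0.5]]

# Separate encoders: image features are identical regardless of the prompt.
print(encode_separately(image, text_a)[0] == encode_separately(image, text_b)[0])  # True

# Encoder-stage fusion: image features change with the prompt.
print(encode_with_fusion(image, text_a)[0] == encode_with_fusion(image, text_b)[0])  # False

# In the separate-encoder design the modalities first meet in the decoder.
fused = decode(*encode_separately(image, text_a))
```

In the separate-encoder path, `decode` is the first place where an image token can see a text token, which matches the note's description of Qwen. In the fused path, the encoder outputs are already text-conditioned before any decoder runs, which matches the note's tentative reading of GLIP.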