# multimodal-localization

**Repository Path**: aHeiDaBai/multimodal-localization

## Basic Information

- **Project Name**: multimodal-localization
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-24
- **Last Updated**: 2025-11-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


Qwen3-VL model weights:

```bash
https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct
```

GitHub repository:

```bash
https://github.com/QwenLM/Qwen3-VL
```


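In Qwen, the text encoder and the image encoder are separate at the encoder stage; the two modalities are only fused in the decoder.

GLIP, by contrast, appears to fuse the text encoder and the image encoder already at the encoder stage.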


```bash
https://github.com/microsoft/GLIP
```
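
The contrast between the two fusion points noted above can be illustrated with a toy sketch. This is not code from Qwen3-VL or GLIP: the function names (`attend`, `decoder_side_fusion`, `encoder_side_fusion`) are hypothetical, the attention has no learned projections, and causal masking is omitted. It only shows where the modalities first interact in each pattern.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Toy single-head dot-product attention (no learned projections)."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        outputs.append([sum(w * kv[i] for w, kv in zip(weights, keys_values))
                        for i in range(dim)])
    return outputs


def add(xs, ys):
    """Element-wise residual addition of two token lists."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def decoder_side_fusion(image_tokens, text_tokens):
    """Assumed Qwen-style pattern: the modalities are encoded independently
    and first interact when the decoder attends over the joint sequence."""
    sequence = image_tokens + text_tokens  # concatenate the token lists
    return add(sequence, attend(sequence, sequence))


def encoder_side_fusion(image_tokens, text_tokens):
    """Assumed GLIP-style pattern: each modality cross-attends to the other
    inside the encoder, so image features already depend on the text."""
    fused_image = add(image_tokens, attend(image_tokens, text_tokens))
    fused_text = add(text_tokens, attend(text_tokens, image_tokens))
    return fused_image, fused_text


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy image tokens
    text = [[0.5, 0.5], [2.0, 0.0]]               # two toy text tokens
    print(len(decoder_side_fusion(image, text)))  # length of joint sequence
    fused_image, fused_text = encoder_side_fusion(image, text)
    print(len(fused_image), len(fused_text))
```

In the encoder-side variant the image features change whenever the text prompt changes, which is the property that matters for text-conditioned localization; in the decoder-side variant the image encoder output is the same for every prompt.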


Fine-tuning the model:

```bash
https://github.com/microsoft/GLIP?tab=readme-ov-file#fine-tuning
```


Florence-2 model weights:

```bash
https://huggingface.co/microsoft/Florence-2-large
https://huggingface.co/microsoft/Florence-2-base
```


Florence-2 paper:

```bash
https://arxiv.org/abs/2311.06242
```