# edw

**Repository Path**: nodets/edw

## Basic Information

- **Project Name**: edw
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-22
- **Last Updated**: 2025-09-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

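# edw (Edit Distance Weighting): a text similarity library

EDW is a text similarity library built on the edit distance algorithm. It supports Chinese word segmentation and pinyin matching, and is suited to a range of text matching scenarios.

## Features

- Multi-level similarity: character, word, and pinyin levels
- Custom weights: adjustable weights for the insert, delete, and substitute operations
- Preset weight templates: ready-made weight configurations for common scenarios
- Performance optimizations: result caching and a memory pool
- Chinese support: integrates jieba word segmentation and pinyin-pro pinyin conversion
- Chainable API: a fluent programming interface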

## Installation

```bash
npm install edw
```

## Quick start

```javascript
import edw from 'edw';

// Basic (character-level) text similarity
console.log(edw.similarity('北京', '南京')); // 0.5

// Word-level similarity
console.log(edw.similarity('我喜欢苹果', '我喜欢香蕉', 'word')); // 0.6

// Pinyin-level similarity
console.log(edw.similarity('重庆', '崇庆', 'pinyin')); // 1.0

// Batch similarity against a single query
const cities = ['北京', '南京', '上海', '广州'];
const results = edw.batch(cities, '京');
console.log(results);
// [
//   { text: '北京', similarity: 0.5 },
//   { text: '南京', similarity: 0.5 },
//   { text: '上海', similarity: 0 },
//   { text: '广州', similarity: 0 }
// ]
```

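## API reference

### Core functions

#### `similarity(s1, s2, level?, config?)`

Computes the similarity of two texts.

Parameters:
- `s1`: the first text string
- `s2`: the second text string
- `level`: comparison level (`'char' | 'word' | 'pinyin'`), defaults to `'char'`
- `config`: weight configuration object

Returns: a similarity score between 0 and 1.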
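
The README does not say how the 0-1 score is derived from the edit distance. One common convention, assumed here and not confirmed for edw, is `1 - distance / length of the longer input`. The sketch below uses two hypothetical helpers, `levenshtein` and `normalizedSimilarity`, which are not part of the edw API; under that assumption they reproduce the character-level values shown in the quick start.

```javascript
// Hypothetical helper (not part of edw): unweighted Levenshtein distance.
// Array.from splits by code point, so each CJK character counts as one unit.
function levenshtein(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] holds the distance between the processed prefix of s and t[0..j)
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete s[i - 1]
        curr[j - 1] + 1,    // insert t[j - 1]
        prev[j - 1] + cost  // substitute, or keep on a match
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Assumed normalization: 1 - distance / length of the longer input.
function normalizedSimilarity(a, b) {
  const maxLen = Math.max(Array.from(a).length, Array.from(b).length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

console.log(normalizedSimilarity('北京', '南京')); // 0.5
console.log(normalizedSimilarity('京', '上海'));   // 0
```

With custom weights the library may normalize differently, so treat this only as a way to read the default character-level scores.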

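#### `distance(s1, s2, level?, config?)`

Computes the edit distance between two texts.

Takes the same parameters as `similarity` and returns the edit distance between the two texts.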

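#### `batch(targets, query, level?, config?)`

Computes the similarity between each target text and a query text.

Parameters:
- `targets`: array of target texts
- `query`: the query text
- `level`: comparison level
- `config`: weight configuration object

Returns: an array of results sorted by similarity in descending order.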
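
The result shape of `batch` (objects with `text` and `similarity`, sorted in descending order) can be pictured as a map followed by a sort. The sketch below is an illustration under assumptions: `rankBySimilarity` and the toy scorer `charOverlap` are hypothetical names, and the scorer is a stand-in rather than edw's edit-distance scoring.

```javascript
// Hypothetical helper (not part of edw): score every target, then sort descending.
function rankBySimilarity(targets, query, score) {
  return targets
    .map((text) => ({ text, similarity: score(text, query) }))
    .sort((a, b) => b.similarity - a.similarity); // stable sort: ties keep input order
}

// Toy scorer for illustration only: the share of the target's characters
// that also occur in the query.
function charOverlap(text, query) {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;
  const queryChars = new Set(Array.from(query));
  return chars.filter((c) => queryChars.has(c)).length / chars.length;
}

const ranked = rankBySimilarity(['上海', '北京', '广州', '南京'], '京', charOverlap);
console.log(ranked.map((r) => r.text)); // [ '北京', '南京', '上海', '广州' ]
```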

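### Weight configuration

A weight configuration object may contain the following properties:

```javascript
const config = {
  insert: 1,           // weight of an insert operation
  delete: 1,           // weight of a delete operation
  substitute: 1,       // weight of a substitute operation
  position: (pos, char) => 1, // position weight function
  charType: (char) => 1,      // character-type weight function
  usePinyinFallback: true     // whether to fall back to pinyin matching
};
```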
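
To show how the `insert`, `delete`, and `substitute` weights can change the result, here is a minimal sketch of a weighted edit distance. It is not edw's implementation: `weightedDistance` is a hypothetical function, and the `position`, `charType`, and `usePinyinFallback` options are left out because the README does not describe how they enter the cost.

```javascript
// Hypothetical helper (not part of edw): edit distance with per-operation weights.
function weightedDistance(source, target, weights = {}) {
  const { insert = 1, delete: del = 1, substitute = 1 } = weights;
  const s = Array.from(source);
  const t = Array.from(target);
  // dp[i][j] = minimal cost of turning the first i units of s into the first j of t
  const dp = Array.from({ length: s.length + 1 }, () => new Array(t.length + 1).fill(0));
  for (let i = 1; i <= s.length; i++) dp[i][0] = dp[i - 1][0] + del;
  for (let j = 1; j <= t.length; j++) dp[0][j] = dp[0][j - 1] + insert;
  for (let i = 1; i <= s.length; i++) {
    for (let j = 1; j <= t.length; j++) {
      const sub = s[i - 1] === t[j - 1] ? 0 : substitute;
      dp[i][j] = Math.min(
        dp[i - 1][j] + del,     // delete s[i - 1]
        dp[i][j - 1] + insert,  // insert t[j - 1]
        dp[i - 1][j - 1] + sub  // substitute, or keep on a match
      );
    }
  }
  return dp[s.length][t.length];
}

console.log(weightedDistance('北京', '南京'));                      // 1
console.log(weightedDistance('北京', '南京', { substitute: 3 }));   // 2
console.log(weightedDistance('北京', '北京大学', { insert: 0.5 })); // 1
```

In the second call a substitution costs 3, so deleting one character and inserting another (total cost 2) becomes the cheaper path.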

### Preset weights

```javascript
// Emphasize leading characters
edw.similarity('北京大学', '南京大学', 'char', edw.PRESET_WEIGHTS.frontHeavy);

// Emphasize Chinese characters
edw.similarity('北京大学', '南京大学', 'char', edw.PRESET_WEIGHTS.chineseHeavy);

// Emphasize numeric characters
edw.similarity('版本1.0', '版本2.0', 'char', edw.PRESET_WEIGHTS.numberHeavy);
```

### Chainable API

```javascript
// Chained calls
const result = edw.edw
  .q('北京大学')
  .setLevel('word')
  .use('chineseHeavy')
  .similarity('南京大学');

// Batch computation
const results = edw.edw
  .q('京')
  .batch(['北京', '南京', '上海']);
```

### Convenience functions

```javascript
// Character-level similarity
edw.charSimilarity('北京', '南京');

// Word-level similarity
edw.wordSimilarity('我喜欢苹果', '我喜欢香蕉');

// Pinyin-level similarity
edw.pinyinSimilarity('重庆', '崇庆');

// Edit distance
edw.editDistance('北京', '南京');

// Quick batch computation
edw.quickBatch(['北京', '南京'], '京');
```

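## Performance optimizations

EDW applies several performance optimization techniques:

1. LRU cache: caches computed results to avoid repeated work
2. Memory pool: reuses array objects to reduce garbage collection
3. Algorithm selection: picks the algorithm best suited to each case
4. Fast paths: handles special cases through shortcuts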
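
The first item can be sketched as follows. This is a generic LRU memoizer built on `Map`, shown only to illustrate the idea; the names `LruCache` and `memoize` are hypothetical, and edw's internal cache may be structured differently.

```javascript
// Hypothetical LRU cache (not edw's internal one). A Map iterates in insertion
// order, so the first key is always the least recently used entry.
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);      // re-insert to mark the entry as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      this.map.delete(this.map.keys().next().value); // evict the oldest entry
    }
  }
}

// Wrap a two-argument scoring function so repeated pairs are served from the cache.
function memoize(fn, cache) {
  return (a, b) => {
    const key = JSON.stringify([a, b]);
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const value = fn(a, b);
    cache.set(key, value);
    return value;
  };
}

let calls = 0;
const slowScore = (a, b) => { calls += 1; return a === b ? 1 : 0; };
const cachedScore = memoize(slowScore, new LruCache(2));
cachedScore('北京', '南京');
cachedScore('北京', '南京'); // served from the cache
console.log(calls); // 1
```

A bounded cache keeps memory use predictable when the same query is compared against many targets over time.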

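## Use cases

- Text matching in search engines
- Data cleaning and deduplication
- Content matching in recommender systems
- Question matching in customer-service chatbots
- Address matching and normalization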

## License

MIT