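Chinese Analyzers

Chinese text needs special handling in Elasticsearch: written Chinese has no spaces between words, so a plugin is required to segment it into words before indexing. Common choices include the IK analyzer and jieba; this article focuses on the IK analyzer.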

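1. Introduction to the IK Analyzer

The IK analyzer is one of the most widely used Chinese analysis plugins for Elasticsearch. It provides two modes, fine-grained segmentation (ik_max_word) and smart segmentation (ik_smart), and suits Chinese search and text analysis.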

2. Installing the IK Analyzer

Run the following command from the Elasticsearch installation directory (the one that contains bin/):

bash

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/vx.x.x/elasticsearch-analysis-ik-x.x.x.zip

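(Replace x.x.x with the plugin version, which must match the version of your Elasticsearch installation.)

Restart Elasticsearch after the installation completes. On a multi-node cluster, install the plugin on every node and restart each one.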
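Because the plugin version has to match the running Elasticsearch version, it can help to read the version from the node before building the install command. The following is a minimal Python sketch, assuming an unauthenticated node at http://localhost:9200 (adjust for TLS or credentials). It reuses the URL pattern from the command above; the helper names are illustrative.

```python
import json
import urllib.request

# URL pattern copied from the install command in this section.
IK_URL_TEMPLATE = (
    "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/"
    "v{version}/elasticsearch-analysis-ik-{version}.zip"
)


def version_from_root_response(body: dict) -> str:
    """Extract the version string from the JSON returned by `GET /`."""
    return body["version"]["number"]


def ik_install_command(version: str) -> str:
    """Build the elasticsearch-plugin install command for the given version."""
    return "./bin/elasticsearch-plugin install " + IK_URL_TEMPLATE.format(version=version)


def fetch_es_version(base_url: str = "http://localhost:9200") -> str:
    """Ask a running node for its version (assumption: no auth, plain HTTP)."""
    with urllib.request.urlopen(base_url) as resp:
        return version_from_root_response(json.load(resp))
```

Calling `print(ik_install_command(fetch_es_version()))` against a live node prints the command to run. Whether a release exists for that exact version still has to be checked on the plugin's releases page. After the restart, `./bin/elasticsearch-plugin list` or `GET /_cat/plugins` should show the IK plugin.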

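3. Configuring the IK Analyzer

Specify the IK analyzers when creating the index. The plugin registers ik_max_word and ik_smart by name, so the mapping can reference them directly and the explicit analysis block in settings is optional: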

json

PUT /my-chinese-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_smart": {
          "type": "ik_smart"
        },
        "ik_max_word": {
          "type": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

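ik_max_word: splits text at the finest granularity, emitting every dictionary word it can find, including overlapping ones; suited to indexing because it maximizes recall
ik_smart: splits text at the coarsest granularity into non-overlapping words; suited to searching because the query produces fewer, more precise terms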
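To see the mapping in use, a document can be indexed and then searched with a match query on the content field. The following is a minimal Python sketch, assuming an unauthenticated node at http://localhost:9200 and the my-chinese-index index created above. The helper names (build_request, index_doc_request, match_request, send) are illustrative and not part of any Elasticsearch client.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth
INDEX = "my-chinese-index"        # index name used in this section


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = None
    if body is not None:
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def index_doc_request(doc_id, content):
    """Index one document; refresh=true makes it searchable right away (demo only)."""
    return build_request("PUT", f"/{INDEX}/_doc/{doc_id}?refresh=true", {"content": content})


def match_request(query_text):
    """Full-text search on `content`; the query text is analyzed with ik_smart."""
    return build_request("POST", f"/{INDEX}/_search", {"query": {"match": {"content": query_text}}})


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

For example, `send(index_doc_request(1, "快速学习 Elasticsearch 中文分词"))` followed by `send(match_request("中文分词"))` indexes one document and searches for it. The document is analyzed with ik_max_word at index time, and the query with ik_smart at search time.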

4. Testing the Analysis Result

Use the _analyze API to check how a piece of text is tokenized:

json

GET /my-chinese-index/_analyze
{
  "analyzer": "ik_max_word",
  "text": "快速学习 Elasticsearch 中文分词"
}
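Running the same text through both analyzers makes the difference between the two modes visible. The following is a minimal Python sketch, assuming an unauthenticated node at http://localhost:9200 with the IK plugin installed. The function names are illustrative. The only response field it relies on is the `tokens` array returned by the _analyze API, where each entry carries a `token` string.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth


def analyze_body(analyzer, text):
    """Request body for the _analyze API."""
    return {"analyzer": analyzer, "text": text}


def extract_tokens(response):
    """Pull the token strings out of an _analyze response, in order."""
    return [entry["token"] for entry in response["tokens"]]


def analyze(analyzer, text, index="my-chinese-index"):
    """Call _analyze on a live node and return the list of token strings."""
    request = urllib.request.Request(
        f"{ES_URL}/{index}/_analyze",
        data=json.dumps(analyze_body(analyzer, text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_tokens(json.load(resp))


def compare_modes(text):
    """Return the tokens produced by each IK mode for the same text."""
    return {name: analyze(name, text) for name in ("ik_max_word", "ik_smart")}
```

Calling `compare_modes("快速学习 Elasticsearch 中文分词")` against a live node returns one token list per analyzer. The ik_max_word list is normally the longer of the two, since it also emits overlapping words.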

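5. Other Chinese Analysis Options

jieba: a segmenter that originated as a Python library; it is integrated into Elasticsearch through a third-party plugin
Official Chinese analysis: the Smart Chinese analysis plugin (analysis-smartcn) maintained by Elastic, installed separately; it offers fewer customization options than IK

With a Chinese analyzer installed and configured, Elasticsearch can index and search Chinese text by word, which generally improves the relevance of search results.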
