# Elasticsearch Basics: Hands-On

Basic operations

Create an index and set its number of shards and replicas

PUT /lib
{
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 0
        }
    }
}

Create an index with default settings

PUT /lib2

Defaults: 5 shards and 1 replica (in Elasticsearch 6.x and earlier; 7.0 and later default to 1 shard).

View an index's settings

GET /lib/_settings
GET /lib2/_settings

View the settings of all indices

GET _all/_settings

Create a document with a specified ID, using PUT

PUT /lib/user/1
{
  "first_name":"Jane",
  "last_name":"Smith",
  "age":36,
  "about":"I like to collect rock albums",
  "interests":["music"]
}

Create a document without specifying an ID, using POST (Elasticsearch generates the ID)

POST /lib/user/
{
  "first_name":"Douglas",
  "last_name":"Fir",
  "age":23,
  "about":"I like to build cabinets",
  "interests":["forestry"]
}

Get a document by ID

GET /lib/user/1
GET /lib/user/r9yyymwBwc87EyodP5Hu

Return only selected fields

GET /lib/user/1?_source=age,about

Overwrite (full replace) with PUT: send the complete document with the new values

PUT /lib/user/1
{
  "first_name":"Jane",
  "last_name":"Smith",
  "age":30,
  "about":"I like to collect rock albums",
  "interests":["music"]
}

Update selected fields only, using POST

POST /lib/user/1/_update
{
  "doc":{
    "age":33
  }
}

Delete a single document

DELETE /lib/user/1

Delete an index

DELETE /lib2

Get multiple documents in one request

GET /_mget
{
  "docs":[
    {
      "_index":"lib",
      "_type":"user",
      "_id":1

    },
    {
      "_index":"lib",
      "_type":"user",
      "_id":2

    },
    {
      "_index":"lib",
      "_type":"user",
      "_id":3

    }
  ]
}

Get multiple documents, returning only specific fields

GET /_mget
{
  "docs":[
    {
      "_index":"lib",
      "_type":"user",
      "_id":1,
      "_source":["age","interests"]

    },
    {
      "_index":"lib",
      "_type":"user",
      "_id":2,
      "_source":["age"]

    }
  ]
}

Get multiple documents of the same index and type

GET /lib/user/_mget
{
  "docs":[
    {
      "_id":1
    },
    {
      "_type":"user",
      "_id":2
    }
  ]
}

Get multiple documents, shorthand form

GET /lib/user/_mget
{
  "ids":["1","2","3"]
}

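Bulk: batch operations on documents

Bulk format: {action:{metadata}}\n {requestbody}\n

action: the operation to perform

create: create the document only if it does not already exist

update: update a document

index: create a new document or replace an existing one

delete: delete a document

metadata: _index, _type, _id

Difference between create and index: if the document already exists, create fails with a "document already exists" error, whereas index succeeds and replaces it.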

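Example of an action with its metadata (pretty-printed here; in a real bulk request it must fit on a single line):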

{
    "delete": {
        "_index": "lib",
        "_type": "user",
        "_id": "1"
    }
}
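The line-oriented format described above can be sketched in Python. This is only an illustration of how the body is laid out, not a client implementation; the helper name `build_bulk_body` and the triple layout `(action, metadata, source)` are assumptions made for this example.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, source) triples into a bulk request body.

    `source` is None for actions such as delete that carry no request body.
    (Hypothetical helper, written for this example only.)
    """
    lines = []
    for action, metadata, source in operations:
        # Action line: {"<action>": {<metadata>}} on a single line.
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            # The request body follows on its own single line.
            lines.append(json.dumps(source))
    # Every line, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_id": 1}, {"title": "Java", "price": 55}),
    ("delete", {"_index": "lib", "_type": "user", "_id": "1"}, None),
])
print(body, end="")
```

The delete action contributes one line, while index contributes two (action plus document), which is why a pretty-printed JSON object cannot be sent as-is.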

Bulk: add several documents to one index and type

POST /lib2/books/_bulk
{"index":{"_id":1}}
{"title":"Java","price":55}
{"index":{"_id":2}}
{"title":"Html5","price":45}
{"index":{"_id":3}}
{"title":"PHP","price":35}
{"index":{"_id":4}}
{"title":"Python","price":50}

Get the documents that were just added

GET /lib2/books/_mget
{
  "ids":["1","2","3","4"]
}

Bulk: mix several actions in one request. In order, the actions below delete a document, create a document, index a document (create or replace), and update a document.

POST /lib2/books/_bulk
{"delete":{"_index":"lib2","_type":"books","_id":4}}
{"create":{"_index":"tt","_type":"ttt","_id":100}}
{"name":"lisi"}
{"index":{"_index":"tt","_type":"ttt"}}
{"name":"zhaosi"}
{"update":{"_index":"lib2","_type":"books","_id":4}}
{"doc":{"price":50}}

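A batch of 1,000 to 5,000 documents, or 5 to 15 MB, is generally recommended. By default a request cannot exceed 100 MB; the limit can be changed in the Elasticsearch configuration file (elasticsearch.yml in the config directory under $ES_HOME) through the http.max_content_length setting.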
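One way to stay within these recommendations is to split the documents into batches on the client before sending them. The sketch below is a minimal illustration; the function name `chunk_documents` and its default limits (1,000 documents, 5 MB) are assumptions chosen from the ranges above, and the byte count only approximates the size of the final request.

```python
import json

def chunk_documents(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents that stay within max_docs and about max_bytes."""
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch before it would exceed either limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch

docs = [{"title": f"book-{i}", "price": i} for i in range(2500)]
batches = list(chunk_documents(docs))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Each batch would then be sent as its own bulk request.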

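Version control

Elasticsearch uses optimistic locking to keep data consistent: a client operating on a document does not lock and unlock it, but only specifies the version it expects to operate on.

If the version matches, Elasticsearch lets the operation proceed; if it conflicts, Elasticsearch rejects the operation and throws a VersionConflictEngineException.

Internal versioning: uses _version.

External versioning: Elasticsearch handles external version numbers somewhat differently from internal ones.

Instead of checking that _version equals the value given in the request, it checks that the current _version is less than the given value. If the request succeeds, the external version number is stored as the document's _version.

To keep _version in sync with an externally maintained version, use version_type=external.

Overwrite with PUT, passing a version number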

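With internal versioning, the version in the request must equal the document's current _version:

PUT /lib/user/3?version=1
{
  "first_name":"Jane",
  "last_name":"Smith",
  "age":35,
  "about":"I like to collect rock albums",
  "interests":["music"]
}

With version_type=external, the version in the request must be greater than the document's current _version: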

PUT /lib/user/3?version=6&version_type=external
{

  "first_name":"Jane",
  "last_name":"Smith",
  "age":35,
  "about":"I like to collect rock albums",
  "interests":["music"]
}
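The two checks can be modeled in a few lines of Python. This is a simplified sketch of the rules described above, not Elasticsearch's implementation; `VersionConflict` and `check_version` are hypothetical names introduced for the example.

```python
class VersionConflict(Exception):
    """Stand-in for Elasticsearch's version conflict error (hypothetical)."""

def check_version(current, requested, version_type="internal"):
    """Return the _version the document would carry after a successful write."""
    if version_type == "internal":
        # Internal: the requested version must equal the stored one;
        # a successful write then increments the stored version.
        if requested != current:
            raise VersionConflict(f"current={current}, requested={requested}")
        return current + 1
    if version_type == "external":
        # External: the requested version must be strictly greater,
        # and it becomes the new stored version.
        if requested <= current:
            raise VersionConflict(f"current={current}, requested={requested}")
        return requested
    raise ValueError(f"unknown version_type: {version_type}")

print(check_version(1, 1))              # 2
print(check_version(2, 6, "external"))  # 6
```

Calling `check_version(2, 1)` or `check_version(6, 6, "external")` raises `VersionConflict`, mirroring the conflict cases.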

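Using mappings

A mapping defines the data type of each field in a type, along with related properties such as how the field is analyzed.

Field types and properties can be defined up front when the index is created, so that date fields are handled as dates, numeric fields as numbers, string fields as strings, and so on.

Core data types

text and keyword

The text type is used to index long text. Before indexing, the text is analyzed into a set of terms, which are then indexed so that Elasticsearch can search on them. By default, text fields cannot be used for sorting or aggregation.

The keyword type is not analyzed. It can be used for filtering, sorting, and aggregation, and a keyword field can only be searched by its exact value.

Numeric types: long, integer, short, byte, double, float
Date type: date (not analyzed)
String type: string (analyzed)
Boolean type: boolean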

......

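Mapping parameters

enabled: when false, the field is only stored and is not available for search or aggregation. "enabled": true (default) | false
index: whether to build an inverted index for the field; when false, the field is not indexed. "index": true (default) | false
index_options: which information is stored in the inverted index. Four options: docs (document numbers), freqs (document numbers + term frequencies), positions (document numbers + term frequencies + positions, typically used for proximity queries), offsets (document numbers + term frequencies + positions + offsets, typically used for highlighting). Analyzed fields default to positions, all others to docs. "index_options": "docs"
norms: whether to store normalization factors (the length factor and index-time boost). Enabled by default for analyzed fields and disabled by default for not-analyzed fields ({"enabled": false}). Recommended for fields that take part in scoring; can be turned off for fields used only for filtering and aggregation, since norms consume extra memory. "norms": {"enabled": true, "loading": "lazy"}
doc_values: whether to enable doc values, which are used for sorting and aggregation. Enabled by default for not_analyzed fields and not available for analyzed fields; they greatly improve sorting and aggregation performance and save memory. "doc_values": true (default) | false
fielddata: whether to enable fielddata on a text field so that it can be sorted and aggregated. It applies to analyzed fields and improves performance when they take part in sorting or aggregation; for not-analyzed fields, doc_values is recommended instead. "fielddata": {"format": "disabled"}
store: whether the field's value is stored separately from _source. With the default (false), the field can be searched but its value cannot be retrieved on its own. "store": false (default) | true
coerce: whether to enable automatic type conversion, for example string to number or floating point to integer. "coerce": true (default) | false
multifields: use multi-fields flexibly to cover varied business needs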

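dynamic: controls whether the mapping is updated automatically

"dynamic": true (default) | false | strict

date_detection: whether date values are detected automatically. "date_detection": true (default) | false
For details on dynamic and date_detection, see "Elasticsearch dynamic mapping strategy".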

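analyzer: the analyzer to use; the default is the standard analyzer. "analyzer": "ik"
boost: field-level score weighting; the default is 1.0. "boost": 1.23
fields: provides several index modes for one field, for example the same value indexed once analyzed and once not analyzed. "fields": {"raw": {"type": "string", "index": "not_analyzed"}}
ignore_above: strings longer than the given length (100 characters here) are ignored and not indexed. "ignore_above": 100
include_in_all: whether the field is included in the _all field; the default is true unless index is set to no. "include_in_all": true
null_value: a substitute value for missing fields; only usable with string fields, and the null value of an analyzed field is analyzed as well. "null_value": "NULL"
position_increment_gap: affects proximity and phrase queries; can be set on multi-valued fields or analyzed fields, and a slop can be given at query time; the default is 100. "position_increment_gap": 0
search_analyzer: the analyzer used at search time; by default the same as analyzer. For example, index with standard + ngram and search with standard to implement autocomplete. "search_analyzer": "ik"
similarity: the scoring strategy for a field; the default is the TF/IDF algorithm, and the setting only applies to string and analyzed fields. "similarity": "BM25"
term_vector: term vectors are not stored by default. Supported values: yes (terms), with_positions (terms + positions), with_offsets (terms + offsets), with_positions_offsets (terms + positions + offsets). They speed up the fast vector highlighter but enlarge the index, so they are not suited to large data volumes. "term_vector": "no"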
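The three values of dynamic listed above can be illustrated with a small in-memory model. This is a sketch of the commonly documented behavior (true adds new fields to the mapping, false leaves them unindexed, strict rejects the document); the function `apply_dynamic` and its return shape are assumptions made for this example.

```python
def apply_dynamic(mapping_fields, doc, dynamic="true"):
    """Return (fields in the mapping afterwards, fields of doc that get indexed)."""
    if dynamic not in ("true", "false", "strict"):
        raise ValueError(f"unknown dynamic setting: {dynamic}")
    known = set(mapping_fields)
    indexed = []
    for field in doc:
        if field in known:
            indexed.append(field)
        elif dynamic == "true":
            # New field: add it to the mapping and index it.
            known.add(field)
            indexed.append(field)
        elif dynamic == "strict":
            # New field: reject the whole document.
            raise ValueError(f"field [{field}] is not allowed by the mapping")
        # dynamic == "false": the field stays in _source but is not indexed.
    return known, indexed

doc = {"name": "u1", "nickname": "uu"}
print(apply_dynamic({"name"}, doc, "true"))   # mapping gains "nickname"
print(apply_dynamic({"name"}, doc, "false"))  # "nickname" is not indexed
```

With "strict", the same document raises an error instead of changing the mapping.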

View an index's mapping

GET /lib2/books/_mapping

{
  "lib2": {
    "mappings": {
      "books": {
        "properties": {
          "index": {
            "properties": {
              "_id": {
                "type": "long"
              }
            }
          },
          "price": {
            "type": "long"
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

Query all records of type books in index lib2

GET /lib2/books/_search

Query by condition: all records whose title is html5

GET /lib2/books/_search?q=title:html5

Query by condition: all records whose post_date is 2019-09-01

GET /lib2/books/_search?q=post_date:2019-09-01

Object type

PUT /lib5/person/1
{
  "name":"Tome",
  "age":25,
  "birthday":"1985-09-01",
  "address":{
    "country":"china",
    "province":"guangdong",
    "city":"shenzhen"
  }
}

Get the default settings

GET /lib5/_settings

Create a mapping manually, specifying each field's type and whether it is indexed

PUT /lib6
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "books":{
      "properties":{
          "title":{"type":"text"},
          "name":{"type":"text","analyzer":"standard"},
          "publish_date":{"type":"date","index":false},
          "price":{"type":"double"},
          "number":{"type":"integer"}
      }
    }
  }
}

GET /lib6/_settings
GET /lib6/_mapping


PUT /lib3
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "user":{
      "properties":{
        "name":{"type":"text"},
        "address":{"type":"text"},
        "age":{"type":"integer"},
        "interests":{"type":"text"},
        "birthday":{"type":"date"}
      }
    }
  }
}

Querying non-Chinese text

PUT /lib3/user/1
{
  "name":"u1",
  "address":"address1 guangzhou",
  "age":31,
  "birthday":"2011-09-08",
  "interests":"duanlian aaaaaaaaaaaaaaaaaa"
}

PUT /lib3/user/2
{
  "name":"u2",
  "address":"address shenzhen",
  "age":32,
  "birthday":"2012-09-08",
  "interests":"duanlian bbbbbbbbbbbbbbb"
}

PUT /lib3/user/3
{
  "name":"u3",
  "address":"address beijing",
  "age":33,
  "birthday":"2013-09-08",
  "interests":"duanlian cccccccccccccccc"
}
PUT /lib3/user/4
{
  "name":"u4",
  "address":"address shanghai",
  "age":34,
  "birthday":"2014-09-04",
  "interests":"duanlian hejj44444444444444"
}

Query by field

GET /lib3/user/_search?q=name:u3

Query by field and sort the results

GET /lib3/user/_search?q=interests:duanlian&sort=age:desc

Query by field with a term query

GET /lib3/user/_search
{
  "query":{
    "term":{"name":"u1"}
  }
}

Query by field with a terms query: documents containing any of several keywords

GET /lib3/user/_search
{
  "query":{
    "terms":{"interests":["aaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbbb"]}
  }
}

Query by field with a terms query, with pagination

GET /lib3/user/_search
{
  "from":0,
  "size":2,
  "query":{
    "terms":{"interests":["duanlian"]}
  }
}

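max_score: the highest relevance score among the hits of the current search

A term query looks up the exact term in the inverted index and knows nothing about analyzers. It is suited to keyword, numeric, and date fields.

A match query is aware of the analyzer: it analyzes the query text for the field first and then runs the search.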

GET /lib3/user/_search
{
  "query": {
    "match": {
      "name": "亮"
    }
  }
}

GET /lib3/user/_search
{
  "query": {
    "match": {
      "age": "28"
    }
  }
}
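The difference between term and match can be reproduced with a toy inverted index. The sketch below is a deliberately simplified model: `analyze` only splits on non-word characters and lowercases, which approximates the standard analyzer for Latin text but is not the real implementation, and all function names are made up for this example.

```python
import re

def analyze(text):
    """Toy analyzer: split into word tokens and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def build_index(docs, field):
    """Map each analyzed token to the set of document IDs containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index

def term_query(index, term):
    # No analysis: the term is looked up exactly as given.
    return index.get(term, set())

def match_query(index, text):
    # The query text is analyzed first; any matching token is a hit.
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits

docs = {1: {"title": "Java"}, 2: {"title": "Html5"}, 3: {"title": "PHP"}}
index = build_index(docs, "title")
print(term_query(index, "PHP"))   # set()
print(term_query(index, "php"))   # {3}
print(match_query(index, "PHP"))  # {3}
```

The term query for "PHP" finds nothing because the index only holds the lowercased token, while the match query lowercases its input before the lookup.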

match_all: query all documents

GET /lib3/user/_search
{
  "query": {"match_all": {}}
}

multi_match: run the same query text against several fields

GET /lib3/user/_search
{
  "query": {
    "multi_match": {
      "query": "28",
      "fields": ["name","age"]
    }
  }
}

match_phrase: phrase matching; the document must contain the phrase exactly as given (the terms cannot change places)

GET /lib3/user/_search
{
  "query": {
    "match_phrase": {
      "interests": "跑步，听音乐"
    }
  }
}

Specify which fields to return

GET /lib3/user/_search
{
  "_source": ["age","name","address","birthday"],
  "query": {
    "match": {
      "interests": "跑步"
    }

  }
}


GET /lib3/user/_search
{
  "query": {"match_all": {}},
  "_source": {
    "includes": ["name","address","interests"],
    "excludes": ["age","birthday"]
  }
}


GET /lib3/user/_search
{
  "query": {"match_all": {}},
  "_source": {
    "includes": ["name","addr*","interests"],
    "excludes": ["age","birthday"]
  }
}
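The effect of includes and excludes, including a wildcard such as addr*, can be sketched as a filter over a document's top-level fields. This is an approximation for illustration only: `filter_source` is a hypothetical helper, it ignores nested paths, and it uses Python's fnmatch patterns as a stand-in for Elasticsearch's wildcard matching.

```python
from fnmatch import fnmatchcase

def filter_source(source, includes=None, excludes=None):
    """Keep fields matching any include pattern and no exclude pattern."""
    result = {}
    for field, value in source.items():
        if includes and not any(fnmatchcase(field, p) for p in includes):
            continue
        if excludes and any(fnmatchcase(field, p) for p in excludes):
            continue
        result[field] = value
    return result

user = {
    "name": "u1",
    "address": "address1 guangzhou",
    "age": 31,
    "birthday": "2011-09-08",
    "interests": "duanlian",
}
print(filter_source(user, ["name", "addr*", "interests"], ["age", "birthday"]))
```

Here addr* selects address, and the excluded fields age and birthday are dropped.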

Sorting

GET /lib3/user/_search
{
  "query": {
    "match_all": {}
  },
  "sort":[
      {
          "age":{
            "order": "desc"
          }
      }
    ]
}

Query names that have the prefix "明"

GET /lib3/user/_search
{
  "query": {
    "match_phrase_prefix": {
      "name":{
        "query": "明"
      }
    }

  }
}

Query records with age >= 30 && < 35

GET /lib3/user/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lt": 35
      }
    }
  }
}

wildcard: wildcard query; find names that start with "小"

GET /lib3/user/_search
{
  "query": {
    "wildcard": {
        "name":"小*"
    }
  }
}

When querying with Latin letters, remember that terms are all lowercase after analysis

GET /lib3/user/_search
{
  "query": {
    "wildcard": {
      "name":"w?m"
    }
  }
}
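Why the pattern has to be lowercase can be seen by matching it against terms as they might sit in the index. The sketch below assumes the analyzer produced the lowercase terms listed in `indexed_terms` (an illustrative guess, not actual analyzer output) and uses Python's fnmatch, whose * and ? behave like the wildcard query's for these simple patterns.

```python
from fnmatch import fnmatchcase

def wildcard_match(indexed_terms, pattern):
    """Return the indexed terms matching a case-sensitive * / ? pattern."""
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]

# Assumed lowercase terms for names such as "晚秋明 WQM" and "小明 XM".
indexed_terms = ["wqm", "xm", "ml", "mm"]
print(wildcard_match(indexed_terms, "w?m"))  # ['wqm']
print(wildcard_match(indexed_terms, "W?M"))  # []
```

The uppercase pattern finds nothing because the comparison is made against the lowercased terms, not the original text.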

fuzzy: fuzzy query

GET /lib3/user/_search
{
  "query": {
    "fuzzy": {
        "name":"明"
    }
  }
}

value: the keyword to search for

GET /lib3/user/_search
{
  "query": {
      "fuzzy": {
          "interests":{
            "value":"步"
          }
    }
  }
}
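Fuzzy matching is based on edit distance between the query value and the indexed terms. The sketch below uses plain Levenshtein distance as a simplified stand-in; Elasticsearch's own matching has additional options (such as treating a transposition as one edit) that are not modeled here, and `fuzzy_match` with its `max_edits` parameter is a hypothetical helper.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

def fuzzy_match(terms, value, max_edits=1):
    """Return the terms within max_edits of value."""
    return [t for t in terms if edit_distance(t, value) <= max_edits]

terms = ["duanlian", "guangzhou"]
print(fuzzy_match(terms, "duanlan"))   # ['duanlian']
print(fuzzy_match(terms, "shanghai"))  # []
```

A misspelling one edit away from an indexed term still matches, while an unrelated word does not.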

Highlighting

GET /lib3/user/_search
{
  "query":{
    "match":{
      "interests":"唱歌"
    }
  },
  "highlight": {
    "fields": {
       "interests": {}
    }
  }
}

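The ik plugin ships with two analyzers

ik_max_word: splits the text at the finest granularity, producing as many words as possible
ik_smart: splits the text at the coarsest granularity; characters already assigned to one word are not reused in another

Create a custom index that uses a Chinese analyzer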

DELETE /lib3

PUT /lib3
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "user":{
      "properties":{
         "name":{"type":"text","analyzer":"ik_max_word"},
         "address":{"type":"text","analyzer":"ik_max_word"},
         "age":{"type":"integer"},
         "interests":{"type":"text","analyzer":"ik_max_word"},
         "birthday":{"type":"date"}
      }
    }
  }
}



PUT /lib3/user/1
{
  "name":"小明 XM",
  "address":"广州中山大道",
  "age":31,
  "birthday":"2011-09-08",
  "interests":"跑步，听音乐"
}
PUT /lib3/user/2
{
  "name":"晚秋明 WQM",
  "address":"广州黄鹂大道西",
  "age":18,
  "birthday":"2012-09-08",
  "interests":"跑步，听音乐，唱歌"
}
PUT /lib3/user/3
{
  "name":"明亮 ML",
  "address":"中山市中山大道",
  "age":28,
  "birthday":"2013-09-08",
  "interests":"跑步，听音乐，唱歌"
}

PUT /lib3/user/4
{
  "name":"明明 MM",
  "address":"深圳市宝安大道",
  "age":35,
  "birthday":"2014-09-08",
  "interests":"听音乐，唱歌，跳舞"
}

"小明" is indexed as a single term; searching for its two characters separately finds nothing

GET /lib3/user/_search
{
  "query": {
    "term": {
      "name":"小明"
    }
  }
}

Filter queries

A filter does not compute relevance scores and its results can be cached, so a filter is faster than a query.

GET /lib2/books/_search
{
  "post_filter": {"term":{"price":35}}
}

GET /lib2/books/_search
{
  "post_filter": {"terms":{"price":[35,55]}}
}

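text fields are analyzed by default and the analyzer lowercases the terms, so the value in a term filter should be written in lowercase: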

GET /lib2/books/_search
{
  "query": {
    "bool": {
      "filter": [
          {"term":{"title":"php"}}
      ]
    }
  }
}

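bool filter queries

Format:

{"bool":{"must":[],"should":[],"must_not":[]}}

must: conditions that must match (AND)
should: conditions that may or may not match (OR)
must_not: conditions that must not match (NOT)

Query documents whose title is php or java and whose price != 35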

GET /lib2/books/_search
{
  "query": {
    "bool": {
      "should": [
          {"term":{"title":"php"}},
          {"term":{"title":"java"}}
      ],
      "must_not": [
        {"term":{"price":35}}
      ]
    }
  }
}

Query (price=45 and title=html5) or (title=java)

GET /lib2/books/_search
{
  "query": {
    "bool": {
      "should": [
          {"term":{"title":"java"}},
          {
            "bool": {
                "must": [
                  {"term":{"price":45}},
                  {"term":{"title":"html5"}}
                ]

            }
          }
      ]
    }
  }
}
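The way must, should, and must_not combine can be checked against a small in-memory list. This is a sketch of the boolean logic only (no scoring, no filter clause); `term` and `bool_query` are hypothetical helpers, and the titles are written in lowercase to stand for the analyzed terms.

```python
def term(field, value):
    """Clause that matches documents whose field equals value exactly."""
    return lambda doc: doc.get(field) == value

def bool_query(must=(), should=(), must_not=()):
    """Combine clauses: must = AND, should = OR, must_not = NOT."""
    def matches(doc):
        if any(clause(doc) for clause in must_not):
            return False
        if not all(clause(doc) for clause in must):
            return False
        # Without a must clause, at least one should clause has to match.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return matches

books = [
    {"title": "java", "price": 55},
    {"title": "html5", "price": 45},
    {"title": "php", "price": 35},
    {"title": "python", "price": 50},
]

q1 = bool_query(should=[term("title", "php"), term("title", "java")],
                must_not=[term("price", 35)])
q2 = bool_query(should=[term("title", "java"),
                        bool_query(must=[term("price", 45),
                                         term("title", "html5")])])
print([b["title"] for b in books if q1(b)])  # ['java']
print([b["title"] for b in books if q2(b)])  # ['java', 'html5']
```

Nesting a bool inside should, as in q2, is how an (A and B) or C condition is expressed.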

Query records with age >= 10 && <= 55

GET /lib3/user/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gte": 10,
            "lte": 55
          }
        }
      }
    }
  }
}

Query records that have a name field

GET /lib3/user/_search
{
  "query": {"bool": {
    "filter": {
      "exists": {
        "field": "name"
      }
    }
  }}
}

Define a mapping with a field that is not analyzed or indexed

PUT /lib7
{
  "mappings": {
    "items":{
      "properties":{
        "itemID":{
          "type":"text",
          "index":false
        }
      }
    }
  }
}

GET /lib7/_mapping
