# Elasticsearch Basic Operations
Simple operations
Create an index and set the number of shards and replicas
PUT /lib
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 0
}
}
}

Create an index with default settings
PUT /lib2
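Defaults (Elasticsearch 6.x and earlier): 5 primary shards, 1 replica. From 7.0 onward the default is 1 primary shard.

View the index settings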
GET /lib/_settings
GET /lib2/_settings

View the settings of all indices
GET _all/_settings

Create a document with an explicit ID using PUT
PUT /lib/user/1
{
"first_name":"Jane",
"last_name":"Smith",
"age":36,
"about":"I like to collect rock albums",
"interests":["music"]
}

Create a document without an ID using POST (Elasticsearch generates the ID)
POST /lib/user/
{
"first_name":"Douglas",
"last_name":"Fir",
"age":23,
"about":"I like to build cabinets",
"interests":["forestry"]
}

Fetch a document by ID
GET /lib/user/1
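GET /lib/user/r9yyymwBwc87EyodP5Hu

Return only selected fields
GET /lib/user/1?_source=age,about

Overwrite a document with PUT by sending the complete document (fields left out of the request body are lost)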
PUT /lib/user/1
{
"first_name":"Jane",
"last_name":"Smith",
"age":30,
"about":"I like to collect rock albums",
"interests":["music"]
}

Update selected fields with POST
POST /lib/user/1/_update
{
"doc":{
"age":33
}
}

Delete a single document
DELETE /lib/user/1

Delete an index
DELETE /lib2

Fetch multiple documents in one request
GET /_mget
{
"docs":[
{
"_index":"lib",
"_type":"user",
"_id":1
},
{
"_index":"lib",
"_type":"user",
"_id":2
},
{
"_index":"lib",
"_type":"user",
"_id":3
}
]
}

Fetch multiple documents, returning only specific fields
GET /_mget
{
"docs":[
{
"_index":"lib",
"_type":"user",
"_id":1,
"_source":["age","interests"]
},
{
"_index":"lib",
"_type":"user",
"_id":2,
"_source":["age"]
}
]
}

Fetch multiple documents from the same index and type
GET /lib/user/_mget
{
"docs":[
{
"_id":1
},
{
"_type":"user",
"_id":2
}
]
}

Fetch multiple documents, shorthand form
GET /lib/user/_mget
{
"ids":["1","2","3"]
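}

Bulk operations
Format of a bulk request: {action:{metadata}}\n {requestbody}\n
action (the operation to perform):
create: create the document only if it does not exist yet
update: partially update a document
index: create a new document or replace an existing one
delete: delete a document
metadata: _index, _type, _id
Difference between create and index: if the document already exists, create fails with a "document already exists" error, whereas index succeeds and replaces it.
Example: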
{
"delete": {
"_index": "lib",
"_type": "user",
"_id": "1"
}
}

Bulk indexing of documents
POST /lib2/books/_bulk
{"index":{"_id":1}}
{"title":"Java","price":55}
{"index":{"_id":2}}
{"title":"Html5","price":45}
{"index":{"_id":3}}
{"title":"PHP","price":35}
{"index":{"_id":4}}
{"title":"Python","price":50}

Fetch the documents in one request
GET /lib2/books/_mget
{
"ids":["1","2","3","4"]
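}

Bulk request mixing several actions, in order: delete a document, create a document, index (create or replace) a document, and partially update a document
POST /lib2/books/_bulk
{"delete":{"_index":"lib2","_type":"books","_id":4}}
{"create":{"_index":"tt","_type":"ttt","_id":100}}
{"name":"lisi"}
{"index":{"_index":"tt","_type":"ttt"}}
{"name":"zhaosi"}
{"update":{"_index":"lib2","_type":"books","_id":4}}
{"doc":{"price":50}}

A bulk request of 1,000 to 5,000 documents, or 5 to 15 MB, is generally recommended. By default a request cannot exceed 100 MB; this limit can be changed with http.max_content_length in the Elasticsearch configuration file (config/elasticsearch.yml under $ES_HOME).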
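The bulk format above (one action line, then an optional source line, each terminated by a newline) can be sketched in Python. This is only an illustration that builds the request body as a string; the helper names `build_bulk_body` and `chunked` are made up for this example, and sending the body to a cluster is left out.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) triples into a bulk request body.

    `source` is ignored for delete, which takes no request body line.
    """
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        if action != "delete":
            lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


def chunked(items, size):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


actions = [
    ("index", {"_id": 1}, {"title": "Java", "price": 55}),
    ("index", {"_id": 2}, {"title": "Html5", "price": 45}),
    ("delete", {"_id": 3}, None),
]
body = build_bulk_body(actions)
print(body, end="")
```

To stay within the recommended batch size, a long action list could be split with `chunked(actions, 1000)` and each batch serialized separately.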
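Version control
Elasticsearch uses optimistic locking to keep data consistent: a client does not lock and unlock a document before operating on it, it only specifies the version it expects to operate on.
If the version matches, Elasticsearch performs the operation; if it conflicts, Elasticsearch rejects the request and raises a VersionConflictEngineException.
Internal versioning: uses the document's _version.
External versioning: Elasticsearch handles an external version number slightly differently from an internal one.
Instead of checking that _version equals the value given in the request, it checks that the current _version is lower than the given value. If the request succeeds, the external version number is stored as the document's _version.
To keep _version in step with an externally maintained version, use version_type=external.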
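The two checks described above can be sketched as a small Python simulation. It is not Elasticsearch code: `VersionConflict` and `check_version` are hypothetical names standing in for the server-side behaviour, under the assumption that a successful internal write increments the version by one.

```python
class VersionConflict(Exception):
    """Stand-in for Elasticsearch's VersionConflictEngineException."""


def check_version(current, requested, version_type="internal"):
    """Return the document's version after a successful write, or raise."""
    if version_type == "internal":
        # Internal versioning: the requested version must equal the current one.
        if requested != current:
            raise VersionConflict(f"current {current}, requested {requested}")
        return current + 1
    if version_type == "external":
        # External versioning: the requested version must be strictly greater.
        if requested <= current:
            raise VersionConflict(f"current {current}, requested {requested}")
        return requested
    raise ValueError(f"unknown version_type: {version_type}")


print(check_version(1, 1))              # internal write on a matching version
print(check_version(5, 6, "external"))  # external write with a higher version
```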
Overwrite a document with PUT by sending the complete document
- With internal versioning, version must equal the current version
PUT /lib/user/3?version=1
{
"first_name":"Jane",
"last_name":"Smith",
"age":35,
"about":"I like to collect rock albums",
"interests":["music"]
}
- With version_type=external, version must be greater than the current version
PUT /lib/user/3?version=6&version_type=external
{
"first_name":"Jane",
"last_name":"Smith",
"age":35,
"about":"I like to collect rock albums",
"interests":["music"]
}

Using mappings
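A mapping defines the data type of each field in a type, along with related properties such as how the field is analyzed.
When creating an index you can define field types and properties up front, so that date fields are handled as dates, numeric fields as numbers, string fields as strings, and so on.
Core data types
text and keyword
A text field is used to index long text. Before indexing, the text is analyzed into individual terms, which are indexed so that Elasticsearch can search on them. By default a text field cannot be used for sorting or aggregations.
A keyword field is not analyzed. It can be used for filtering, sorting, and aggregations, and it matches only on its exact, complete value.
Numeric types: long, integer, short, byte, double, float
Date type: date (not analyzed)
String type: string (analyzed; replaced by text and keyword from 5.0 onward)
Boolean type: boolean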
......
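Properties supported in a mapping
- enabled: when set to false, the field is kept in _source only and cannot be searched or aggregated. "enabled": true (default) | false
- index: whether to build an inverted index for the field; with false the field is not indexed and cannot be searched. "index": true (default) | false
- index_options: which information is stored in the inverted index. Four values are available: docs (document numbers), freqs (document numbers + term frequencies), positions (document numbers + term frequencies + positions, usually needed for proximity queries), offsets (document numbers + term frequencies + positions + offsets, usually used for highlighting). Analyzed fields default to positions, all others to docs. "index_options": "docs"
- norms: whether to store the normalization factors used in scoring (length factor and index-time boost). Can be turned off if the field is only used for filtering and aggregations. Enabled by default for analyzed fields and disabled for non-analyzed fields. Recommended for fields that take part in scoring; costs extra memory. "norms": {"enabled": true, "loading": "lazy"} (2.x syntax; 5.0 and later take a plain boolean)
- doc_values: whether to enable doc values, which are used for sorting and aggregations. Enabled by default for not_analyzed fields and not available for analyzed fields; improves sorting and aggregation performance considerably and saves memory. "doc_values": true (default) | false
- fielddata: whether to enable fielddata on a text field so that it can be sorted and aggregated. Intended for analyzed fields; for non-analyzed fields doc_values is recommended instead. "fielddata": {"format": "disabled"}
- store: whether the field value is stored separately from _source. With false the field can be searched, but its value cannot be retrieved on its own. "store": false (default) | true
- coerce: whether to convert data types automatically, for example string to number or floating point to integer. "coerce": true (default) | false
- multifields: index the same field in several ways to cover different requirements
- dynamic: controls automatic mapping updates. "dynamic": true (default) | false | strict
- date_detection: whether date values are detected automatically. "date_detection": true (default) | false
- For details on dynamic and date_detection, see "Elasticsearch dynamic mapping strategy"
- analyzer: the analyzer to use; the default is the standard analyzer. "analyzer": "ik"
- boost: field-level score weight, 1.0 by default. "boost": 1.23
- fields: indexes one field in several ways, for example once analyzed and once not analyzed. "fields": {"raw": {"type": "string", "index": "not_analyzed"}}
- ignore_above: strings longer than the configured length (100 characters here) are ignored and not indexed. "ignore_above": 100
- include_in_all: whether the field is included in the _all field; true by default unless index is set to no. "include_in_all": true
- null_value: the value substituted for a missing (null) field. Only string fields can use it; on an analyzed field the null value is analyzed as well. "null_value": "NULL"
- position_increment_gap: affects proximity queries; can be set on multi-valued fields or analyzed fields. A query can specify a slop; the default gap is 100. "position_increment_gap": 0
- search_analyzer: the analyzer used at search time; defaults to the value of analyzer. For example, indexing with standard + ngram and searching with standard implements autocomplete. "search_analyzer": "ik"
- similarity: the scoring algorithm for a field; TF/IDF by default (BM25 from 5.0 onward). Only applies to string and analyzed fields. "similarity": "BM25"
- term_vector: term vectors are not stored by default. Supported values: yes (terms), with_positions (terms + positions), with_offsets (terms + offsets), with_positions_offsets (terms + positions + offsets). Improves the performance of the fast vector highlighter but enlarges the index, so it is not suitable for large data volumes. "term_vector": "no"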
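Of the properties listed above, dynamic is the one whose three settings are easiest to confuse. The Python sketch below simulates them on a plain dictionary; it is not Elasticsearch code, `apply_dynamic` and `guess_type` are hypothetical names, and the type guessing is reduced to a few cases.

```python
def guess_type(value):
    """Rough imitation of dynamic type detection for a few JSON value kinds."""
    if isinstance(value, bool):  # bool must be tested before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def apply_dynamic(mapping, document, dynamic=True):
    """Return the mapping as it would look after indexing `document`."""
    updated = dict(mapping)
    for field, value in document.items():
        if field in updated:
            continue
        if dynamic == "strict":
            # strict: an unknown field makes the whole request fail
            raise ValueError(f"unknown field [{field}] rejected by strict mapping")
        if dynamic is True:
            # true: the new field is added to the mapping automatically
            updated[field] = guess_type(value)
        # false: the field is left out of the mapping and is not searchable
    return updated


base = {"title": "text"}
print(apply_dynamic(base, {"title": "Java", "price": 55}))
print(apply_dynamic(base, {"price": 55}, dynamic=False))
```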
View an index's mapping
GET /lib2/books/_mapping
Response:
{
"lib2": {
"mappings": {
"books": {
"properties": {
"index": {
"properties": {
"_id": {
"type": "long"
}
}
},
"price": {
"type": "long"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}

Search all documents of type books in the lib2 index
GET /lib2/books/_search

Search by condition: documents whose title is html5
GET /lib2/books/_search?q=title:html5

Search by condition: documents whose post_date is 2019-09-01
GET /lib2/books/_search?q=post_date:2019-09-01

Object type
PUT /lib5/person/1
{
"name":"Tome",
"age":25,
"birthday":"1985-09-01",
"address":{
"country":"china",
"province":"guangdong",
"city":"shenzhen"
}
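}

View the settings the index was created with by default
GET /lib5/_settings

Create a mapping manually, specifying each field's type and whether it is indexed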
PUT /lib6
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0
},
"mappings": {
"books":{
"properties":{
"title":{"type":"text"},
"name":{"type":"text","analyzer":"standard"},
"publish_date":{"type":"date","index":false},
"price":{"type":"double"},
"number":{"type":"integer"}
}
}
}
}
GET /lib6/_settings
GET /lib6/_mapping

PUT /lib3
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0
},
"mappings": {
"user":{
"properties":{
"name":{"type":"text"},
"address":{"type":"text"},
"age":{"type":"integer"},
"interests":{"type":"text"},
"birthday":{"type":"date"}
}
}
}
}

Querying non-Chinese text
PUT /lib3/user/1
{
"name":"u1",
"address":"address1 guangzhou",
"age":31,
"birthday":"2011-09-08",
"interests":"duanlian aaaaaaaaaaaaaaaaaa"
}
PUT /lib3/user/2
{
"name":"u2",
"address":"address shenzhen",
"age":32,
"birthday":"2012-09-08",
"interests":"duanlian bbbbbbbbbbbbbbb"
}
PUT /lib3/user/3
{
"name":"u3",
"address":"address beijing",
"age":33,
"birthday":"2013-09-08",
"interests":"duanlian cccccccccccccccc"
}
PUT /lib3/user/4
{
"name":"u4",
"address":"address shanghai",
"age":34,
"birthday":"2014-09-04",
"interests":"duanlian hejj44444444444444"
}

Search by field
GET /lib3/user/_search?q=name:u3

Search by field and sort the results
GET /lib3/user/_search?q=interests:duanlian&sort=age:desc

Search by field with a term query
GET /lib3/user/_search
{
"query":{
"term":{"name":"u1"}
}
}

Search by field with a terms query, matching documents that contain any of several terms
GET /lib3/user/_search
{
"query":{
"terms":{"interests":["aaaaaaaaaaaaaaaaaa","bbbbbbbbbbbbbbb"]}
}
}

Search by field with a terms query, with pagination
GET /lib3/user/_search
{
"from":0,
"size":2,
"query":{
"terms":{"interests":["duanlian"]}
}
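}

max_score is the highest relevance score among the documents matched by the search.
A term query looks up the exact term in the inverted index and is unaware of analyzers. It suits keyword, numeric, and date fields.
A match query is analyzer-aware: it analyzes the query text for the field first and then searches for the resulting terms.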
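The difference between the two query types can be illustrated with a toy inverted index in Python. This is a simplified sketch, not how Elasticsearch is implemented: `analyze` is a crude stand-in for the standard analyzer (split on non-word characters, lowercase), and all function names are made up for the example.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs, field):
    """Map each analyzed term to the set of document IDs containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """Look up the value exactly as given, without analysis."""
    return index.get(value, set())


def match_query(index, text):
    """Analyze the query text first, then match any resulting term."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {1: {"title": "Java"}, 2: {"title": "Html5"}, 3: {"title": "PHP"}}
index = build_index(docs, "title")
print(term_query(index, "PHP"))   # not found: the indexed term is lowercase
print(term_query(index, "php"))
print(match_query(index, "PHP"))
```

The index only holds lowercase terms, so the term query succeeds only when the caller supplies the term in its indexed form, while the match query normalizes the input itself.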
GET /lib3/user/_search
{
"query": {
"match": {
"name": "亮"
}
}
}
GET /lib3/user/_search
{
"query": {
"match": {
"age": "28"
}
}
}

match_all returns all documents
GET /lib3/user/_search
{
"query": {"match_all": {}}
}

multi_match runs the same query text against several fields
GET /lib3/user/_search
{
"query": {
"multi_match": {
"query": "28",
"fields": ["name","age"]
}}
}

match_phrase matches a phrase: the terms must appear exactly as in the phrase, in the same order
GET /lib3/user/_search
{
"query": {
"match_phrase": {
"interests": "跑步,听音乐"
}
}
}

Select which fields are returned in search results
GET /lib3/user/_search
{
"_source": ["age","name","address","birthday"],
"query": {
"match": {
"interests": "跑步"
}
}
}
GET /lib3/user/_search
{
"query": {"match_all": {}},
"_source": {
"includes": ["name","address","interests"],
"excludes": ["age","birthday"]
}
}
GET /lib3/user/_search
{
"query": {"match_all": {}},
"_source": {
"includes": ["name","addr*","interests"],
"excludes": ["age","birthday"]
}
}

Sorting
GET /lib3/user/_search
{
"query": {
"match_all": {}
},
"sort":[
{
"age":{
"order": "desc"
}
}
]
}

Find names that have “明” as a prefix
GET /lib3/user/_search
{
"query": {
"match_phrase_prefix": {
"name":{
"query": "明"
}
}
}
}

Find documents with age >= 30 and < 35
GET /lib3/user/_search
{
"query": {
"range": {
"age": {
"gte": 30,
"lt": 35
}
}
}
}

wildcard query: find names containing a term that starts with “小”
GET /lib3/user/_search
{
"query": {
"wildcard": {
"name":"小*"
}
}
}

When querying Latin text, remember that analyzed terms are all lowercase, so the pattern must be lowercase too
GET /lib3/user/_search
{
"query": {
"wildcard": {
"name":"w?m"
}
}
}

fuzzy query
GET /lib3/user/_search
{
"query": {
"fuzzy": {
"name":"明"
}
}
}

value holds the term to search for
GET /lib3/user/_search
{
"query": {
"fuzzy": {
"interests":{
"value":"步"
}
}
}
}

Highlighting
GET /lib3/user/_search
{
"query":{
"match":{
"interests":"唱歌"
}
},
"highlight": {
"fields": {
"interests": {}
}
}
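}

The ik plugin provides two analyzers
ik_max_word: splits the text at the finest granularity, producing as many terms as possible
ik_smart: splits the text at the coarsest granularity; text already assigned to one term is not reused in another
Create an index with an explicit mapping that uses the Chinese analyzer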
DELETE /lib3
PUT /lib3
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0
},
"mappings": {
"user":{
"properties":{
"name":{"type":"text","analyzer":"ik_max_word"},
"address":{"type":"text","analyzer":"ik_max_word"},
"age":{"type":"integer"},
"interests":{"type":"text","analyzer":"ik_max_word"},
"birthday":{"type":"date"}
}
}
}
}
PUT /lib3/user/1
{
"name":"小明 XM",
"address":"广州中山大道",
"age":31,
"birthday":"2011-09-08",
"interests":"跑步,听音乐"
}
PUT /lib3/user/2
{
"name":"晚秋明 WQM",
"address":"广州黄鹂大道西",
"age":18,
"birthday":"2012-09-08",
"interests":"跑步,听音乐,唱歌"
}
PUT /lib3/user/3
{
"name":"明亮 ML",
"address":"中山市中山大道",
"age":28,
"birthday":"2013-09-08",
"interests":"跑步,听音乐,唱歌"
}
PUT /lib3/user/4
{
"name":"明明 MM",
"address":"深圳市宝安大道",
"age":35,
"birthday":"2014-09-08",
"interests":"听音乐,唱歌,跳舞"
}

“小明” is indexed as a single term, so searching for either of its two characters on its own finds nothing
GET /lib3/user/_search
{
"query": {
"term": {
"name":"小明"
}
}
}

Filter queries
A filter does not compute relevance scores and its results can be cached, so a filter is generally faster than a scoring query.
GET /lib2/books/_search
{
"post_filter": {"term":{"price":35}}
}
GET /lib2/books/_search
{
"post_filter": {"terms":{"price":[35,55]}}
}

A text field is analyzed by default and the standard analyzer lowercases its terms, so a term filter must use the lowercase form:
GET /lib2/books/_search
{
"query": {
"bool": {
"filter": [
{"term":{"title":"php"}}
]
}
}
}

bool filter queries
Format:
{"bool":{"must":[],"should":[],"must_not":[]}}
must: conditions that must match (AND)
should: conditions that may or may not match (OR)
must_not: conditions that must not match (NOT)
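How the three clause types combine can be sketched in Python by evaluating them against in-memory documents. This is an illustration only, not Elasticsearch code; `term` and `bool_query` are hypothetical helpers, and the sample titles are written in lowercase to mimic analyzed terms. One detail is worth noting: when a bool query has no must (or filter) clause, at least one should clause has to match.

```python
def term(field, value):
    """Return a predicate that checks a field for an exact value."""
    return lambda doc: doc.get(field) == value


def bool_query(must=(), should=(), must_not=()):
    """Combine predicates the way a bool query combines its clauses."""
    def matches(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # Without must clauses, at least one should clause has to match.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return matches


books = [
    {"title": "java", "price": 55},
    {"title": "html5", "price": 45},
    {"title": "php", "price": 35},
    {"title": "python", "price": 50},
]

# title is php or java, and price is not 35
q1 = bool_query(should=[term("title", "php"), term("title", "java")],
                must_not=[term("price", 35)])
# (price is 45 and title is html5) or title is java
q2 = bool_query(should=[term("title", "java"),
                        bool_query(must=[term("price", 45), term("title", "html5")])])

print([b["title"] for b in books if q1(b)])
print([b["title"] for b in books if q2(b)])
```

Because `bool_query` returns a predicate, bool clauses can be nested inside each other, which mirrors nesting a bool query inside a should clause.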
Find documents whose title is php or java and whose price != 35
GET /lib2/books/_search
{
"query": {
"bool": {
"should": [
{"term":{"title":"php"}},
{"term":{"title":"java"}}
],
"must_not": [
{"term":{"price":35}}
]
}
}
}

Find documents where (price = 45 and title = html5) or (title = java)
GET /lib2/books/_search
{
"query": {
"bool": {
"should": [
{"term":{"title":"java"}},
{
"bool": {
"must": [
{"term":{"price":45}},
{"term":{"title":"html5"}}
]
}
}
]
}
}
}

Find documents with age >= 10 and <= 55
GET /lib3/user/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"gte": 10,
"lte": 55
}
}
}
}
}
}

Find documents that have a name field
GET /lib3/user/_search
{
"query": {"bool": {
"filter": {
"exists": {
"field": "name"
}
}
}}
}

Define a mapping with a field that is not indexed
PUT /lib7
{
"mappings": {
"items":{
"properties":{
"itemID":{
"type":"text",
"index":false
}
}
}
}
}
GET /lib7/_mapping