Search Suggestions: Suggest
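Overview
A search feature is usually expected to offer "search suggestions", also called "search completion": while the user is typing a query, the engine completes the input automatically or corrects misspellings. This improves how precisely the query matches documents and, in turn, the user's search experience. This is Suggest.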
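The four suggesters

term suggester
As its name implies, the term suggester matches suggestions only against the individual terms produced by the tokenizer. It does not consider the relationship between terms.

POST <index>/_search
{
  "suggest": {
    "<suggest_name>": {
      "text": "<search_content>",
      "term": {
        "suggest_mode": "<suggest_mode>",
        "field": "<field_name>"
      }
    }
  }
}

Options
text: the text the user searched for.
field: the field from which suggestion candidates are taken.
analyzer: the analyzer applied to the suggest text.
size: the maximum number of suggestions returned for each term of the suggest text.
sort: how the suggestions for each term are sorted. Only the following two values are allowed:
    score: sort by score first, then by document frequency, then by the term itself.
    frequency: sort by document frequency first, then by score, then by the term itself.
suggest_mode: the suggest mode, also an enumeration:
    missing: the default. Only generate suggestions for terms that are not in the index.
    popular: only return suggestions that occur in more documents than the original search term.
    always: suggest any matching suggestion for each term of the suggest text.
max_edits: the maximum edit distance a candidate may have in order to be considered a suggestion. Only 1 or 2 is allowed; any other value results in a bad request error. Defaults to 2.
prefix_length: the minimum number of leading characters a candidate must share with the term.
min_word_length: the minimum length a term of the suggest text must have in order to be included.
min_doc_freq: the minimum number of documents a suggestion must appear in.
max_term_freq: the maximum document frequency a term of the suggest text may have and still be checked. Terms above this threshold are frequent, and therefore usually spelled correctly, so they are skipped.

phrase suggester
Compared with the term suggester, the phrase suggester takes the context of the suggest text into account, that is, the other tokens of the sentence, instead of matching purely on per-token edit distance. It can pick better suggestions based on co-occurrence and frequency.

Note: the phrase suggester requires a mapping to be created first.

Options
real_word_error_likelihood: defaults to 0.95, which tells Elasticsearch that 5% of the terms in the index are misspelled. The lower this value, the more terms that exist in the index are treated as misspellings, even when they are correct.
max_errors: the maximum share of terms that may be considered misspellings in order to form a correction. Defaults to 1, meaning that only corrections with at most one misspelled term are returned.
confidence: defaults to 1.0. The value acts as a threshold relative to the score of the input phrase: only suggestions that score above it are returned. For example, a confidence of 1.0 only returns suggestions that score higher than the input phrase, while 0 returns the top candidates regardless of score.
collate: tells Elasticsearch to check each suggestion against a specified query and to prune suggestions for which no matching document exists in the index. The query is a template (for instance a match query) in which the current suggestion is available as the {{suggestion}} parameter; more fields can be added under the "params" object. When "prune" is set to true, every suggestion in the response carries an additional field "collate_match" that indicates whether the suggestion matched any document.
direct_generator: the phrase suggester uses candidate generators to produce a list of possible terms for each term in the given text. A single candidate generator is similar to a term suggester called for each individual term in the text. The output of the generators is then scored in combination with the candidates for the other terms. Currently only one type of candidate generator is supported, direct_generator. The suggest API accepts a list of generators under the key direct_generator, and each generator in the list is called for every term in the original text.

completion suggester
The completion suggester provides auto-completion and supports three kinds of queries: prefix, fuzzy, and regex. Its main use case is Auto Completion.

In this scenario a request has to be sent to the backend every time the user types a character, so when the user types quickly the backend must respond very fast. The implementation therefore uses a different data structure from the two suggesters above. Lookups do not go through the inverted index; instead, the analyzed data is encoded into an FST (finite state transducer) that is stored together with the index. For an open index, ES loads the whole FST into memory, which makes prefix lookups extremely fast. However, an FST can only be used for prefix lookups, and this is the limitation of the completion suggester.

completion: a field type specific to ES, provided for suggest. It is memory-based and performs very well.

prefix query: suggestions based on a prefix query are the most common kind of search suggestion.
    prefix: the search text entered on the client.
    field: the field that holds the suggestions.
    size: the number of suggestions to return. Defaults to 5.
    skip_duplicates: whether to filter out duplicate suggestions. Defaults to false.
fuzzy query
    fuzziness: the allowed edit distance. Defaults to AUTO.
    transpositions: if true, a transposition counts as one change instead of two. Defaults to true.
    min_length: the minimum length of the input before fuzzy suggestions are returned. Defaults to 3.
    prefix_length: the minimum length of the input that is not checked for fuzzy alternatives. Defaults to 1.
    unicode_aware: if true, all measurements (fuzzy edit distance, transpositions, and lengths) are taken in Unicode code points instead of bytes. This is slightly slower than raw bytes, so it defaults to false.
regex query: the prefix can be expressed as a regular expression. Not recommended.

context suggester
The completion suggester considers all documents in the index, but for smart suggestions it is usually better to filter by some criteria and, possibly, to boost suggestions that have certain attributes.

contexts: the context objects. More than one can be defined.
    name: the name of the context, used to tell apart the different context objects of the same index. It has to be specified at query time.
    type: the type of the context. Two types are currently supported, category and geo, used to categorize suggest items and to attach a geographic location, respectively.
    boost: a weight used to raise the ranking.
    path: without path, the context values have to be given under contexts.<name> every time a document is PUT. If path is set in the mapping, this is no longer necessary, because the values are read from the field that path points to. The mapping is defined once while PUTting data is a frequent operation, so this simplifies the code. Isn't that a neat explanation? What turns up in online searches is mostly translations of the official documentation; this one is not.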
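To make max_edits more concrete, here is a minimal Python sketch of an edit distance in which an insertion, deletion, substitution, or swap of two adjacent characters each counts as one edit. It is only an illustration: the function names and the vocabulary list are assumptions made for this example (the words are taken from the sample titles used in this post), and Elasticsearch's internal distance implementation is different and heavily optimized.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal-string-alignment distance between a and b.

    Insert, delete, substitute, and adjacent swap each cost one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "hc" -> "ch".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def candidates(term, vocabulary, max_edits=2):
    """Hypothetical helper: words within max_edits of term, sorted by name."""
    return sorted(
        w for w in vocabulary if w != term and edit_distance(term, w) <= max_edits
    )


# Illustrative vocabulary, taken from the sample titles in this post.
VOCABULARY = [
    "baoqiang", "baoqiangba", "baoqiangda", "baoqiangdada",
    "baoqiangge", "baoqian", "baoqia",
]

if __name__ == "__main__":
    print(candidates("baoqing", VOCABULARY, max_edits=1))
    print(candidates("baoqing", VOCABULARY, max_edits=2))
```

With max_edits set to 1 only "baoqiang" qualifies for the misspelled "baoqing"; raising it to 2 also admits "baoqian" and "baoqia", which shows why a larger value widens the candidate set.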
# term suggester
DELETE news
POST _bulk
{ "index": { "_index": "news", "_id": 1 } }
{ "title": "baoqiang bought a new hat with the same color of this font, which is very beautiful baoqiangba baoqiangda baoqiangdada baoqian baoqia" }
{ "index": { "_index": "news", "_id": 2 } }
{ "title": "baoqiangge gave birth to two children, one is upstairs, one is downstairs baoqiangba baoqiangda baoqiangdada baoqian baoqia" }
{ "index": { "_index": "news", "_id": 3 } }
{ "title": "baoqiangge s money was rolled away baoqiangba baoqiangda baoqiangdada baoqian baoqia" }
{ "index": { "_index": "news", "_id": 4 } }
{ "title": "baoqiangda baoqiangda baoqiangda baoqiangda baoqiangda baoqian baoqia" }

GET news/_mapping

POST _analyze
{
  "text": [
    "BaoQiang bought a new hat with the same color of this font, which is very beautiful",
    "BaoQiangGe gave birth to two children, one is upstairs, one is downstairs",
    "BaoQiangGe s money was rolled away"
  ]
}

POST /news/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "baoqing baoqiang",
      "term": { "suggest_mode": "always", "field": "title", "min_doc_freq": 3 }
    }
  }
}

GET /news/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "baoqing baoqiang",
      "term": { "suggest_mode": "popular", "field": "title" }
    }
  }
}

GET /news/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "baoqing baoqiang",
      "term": { "suggest_mode": "popular", "field": "title", "max_edits": 2, "max_term_freq": 1 }
    }
  }
}

GET /news/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "baoqing baoqiang",
      "term": { "suggest_mode": "always", "field": "title", "max_edits": 2 }
    }
  }
}

DELETE news2
POST _bulk
{ "index": { "_index": "news2", "_id": 1 } }
{ "title": "baoqiang4" }
{ "index": { "_index": "news2", "_id": 2 } }
{ "title": "baoqiang4 baoqiang3" }
{ "index": { "_index": "news2", "_id": 3 } }
{ "title": "baoqiang4 baoqiang3 baoqiang2" }
{ "index": { "_index": "news2", "_id": 4 } }
{ "title": "baoqiang4 baoqiang3 baoqiang2 baoqiang" }

POST /news2/_search
{
  "suggest": {
    "second-suggestion": {
      "text": "baoqian baoqiang baoqiang2 baoqiang3",
      "term": { "suggest_mode": "popular", "field": "title" }
    }
  }
}

# phrase suggester
DELETE test
PUT test
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
        "analyzer": {
          "trigram": { "type": "custom", "tokenizer": "standard", "filter": ["lowercase", "shingle"] }
        },
        "filter": {
          "shingle": { "type": "shingle", "min_shingle_size": 2, "max_shingle_size": 3 }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": { "trigram": { "type": "text", "analyzer": "trigram" } }
      }
    }
  }
}

GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ { "type": "shingle", "min_shingle_size": 2, "max_shingle_size": 3 } ],
  "text": "lucene and elasticsearch"
}

# min_shingle_size: 2,
# max_shingle_size: 3
GET test/_analyze
{
  "analyzer": "trigram",
  "text": "lucene and elasticsearch"
}
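As a rough picture of what the shingle filter configured above emits, the following Python sketch builds word n-grams of size 2 to 3 and, like the filter's default behavior, also keeps the single words. It assumes plain whitespace splitting as a stand-in for the standard tokenizer, and the function name is made up for this example; token positions and offsets are not modeled.

```python
def shingles(tokens, min_size=2, max_size=3, output_unigrams=True):
    """Return the tokens plus their word n-grams of min_size..max_size words."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])  # the single word itself
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


if __name__ == "__main__":
    # Lowercasing mirrors the lowercase filter of the trigram analyzer.
    words = "lucene and elasticsearch".lower().split()
    for token in shingles(words):
        print(token)
```

For "lucene and elasticsearch" this yields six tokens: the three words, the two-word shingles "lucene and" and "and elasticsearch", and the three-word shingle covering the whole text. These multi-word tokens are what let the phrase suggester score a candidate in the context of its neighbors.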
# Keep the test index created above: deleting it here would drop the
# trigram mapping that the title.trigram field below depends on.
POST test/_bulk
{ "index": { "_id": 1 } }
{ "title": "lucene and elasticsearch" }
{ "index": { "_id": 2 } }
{ "title": "lucene and elasticsearhc" }
{ "index": { "_id": 3 } }
{ "title": "luceen and elasticsearch" }

POST test/_search
GET test/_mapping
POST test/_search
{
  "suggest": {
    "text": "Luceen and elasticsearhc",
    "simple_phrase": {
      "phrase": {
        "field": "title.trigram",
        "max_errors": 2,
        "gram_size": 1,
        "confidence": 0,
        "direct_generator": [ { "field": "title.trigram", "suggest_mode": "always" } ],
        "highlight": { "pre_tag": "<em>", "post_tag": "</em>" }
      }
    }
  }
}
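On the client side, the suggestions have to be pulled out of the "suggest" section of the search response. The sketch below shows one way to do that in Python for a response of the usual shape (one entry per suggestion name, each with a list of options). The helper name is an assumption, and the sample response is hand-written for illustration: its scores are made up and are not output from a real cluster.

```python
def suggestion_texts(response, name):
    """Collect option texts for the suggestion `name`, best score first."""
    texts = []
    for entry in response.get("suggest", {}).get(name, []):
        options = sorted(
            entry.get("options", []),
            key=lambda option: option.get("score", 0.0),
            reverse=True,
        )
        texts.extend(option["text"] for option in options)
    return texts


# Hand-written sample in the shape of a phrase suggester response.
# The scores are invented for this example.
SAMPLE_RESPONSE = {
    "suggest": {
        "simple_phrase": [
            {
                "text": "Luceen and elasticsearhc",
                "offset": 0,
                "length": 24,
                "options": [
                    {"text": "luceen and elasticsearch", "score": 0.2},
                    {
                        "text": "lucene and elasticsearch",
                        "highlighted": "<em>lucene</em> and <em>elasticsearch</em>",
                        "score": 0.5,
                    },
                ],
            }
        ]
    }
}

if __name__ == "__main__":
    print(suggestion_texts(SAMPLE_RESPONSE, "simple_phrase"))
```

Sorting by score on the client is defensive; a "did you mean" hint would typically just show the first text, or its highlighted form when highlight tags were requested.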
# completion suggester
# The ik_max_word analyzer requires the IK analysis plugin.
DELETE suggest_carinfo
PUT suggest_carinfo
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "fields": {
          "suggest": { "type": "completion", "analyzer": "ik_max_word" }
        }
      },
      "content": { "type": "text", "analyzer": "ik_max_word" }
    }
  }
}

POST _bulk
{ "index": { "_index": "suggest_carinfo", "_id": 1 } }
{ "title": "宝马X5 两万公里准新车", "content": "这里是宝马X5图文描述" }
{ "index": { "_index": "suggest_carinfo", "_id": 2 } }
{ "title": "宝马5系", "content": "这里是奥迪A6图文描述" }
{ "index": { "_index": "suggest_carinfo", "_id": 3 } }
{ "title": "宝马3系", "content": "这里是奔驰图文描述" }
{ "index": { "_index": "suggest_carinfo", "_id": 4 } }
{ "title": "奥迪Q5 两万公里准新车", "content": "这里是宝马X5图文描述" }
{ "index": { "_index": "suggest_carinfo", "_id": 5 } }
{ "title": "奥迪A6 无敌车况", "content": "这里是奥迪A6图文描述" }
{ "index": { "_index": "suggest_carinfo", "_id": 6 } }
{ "title": "奥迪双钻", "content": "这里是奔驰图文描述" }
{ "index": { "_index": "suggest_carinfo", "_id": 7 } }
{ "title": "奔驰AMG 两万公里准新车", "content": "这里是宝马X5图文描述" }
{ "index": { "_index": "suggest_carinfo", "_id": 8 } }
{ "title": "奔驰大G 无敌车况", "content": "这里是奥迪A6图文描述" }
{ "index": { "_index": "suggest_carinfo", "_id": 9 } }
{ "title": "奔驰C260", "content": "这里是奔驰图文描述" }
{ "index": { "_index": "suggest_carinfo", "_id": 10 } }
{ "title": "nir奔驰C260", "content": "这里是奔驰图文描述" }

GET suggest_carinfo/_search?pretty
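{
  "suggest": {
    "car_suggest": {
      "prefix": "奥迪",
      "completion": { "field": "title.suggest" }
    }
  }
}

# 1. The memory cost is high: in the original wording, the high performance is bought with a large amount of memory.
# 2. Only prefix search is supported. If the user's input is not a prefix, recall can be very low.
POST suggest_carinfo/_search
{
  "suggest": {
    "car_suggest": {
      "prefix": "宝马5系",
      "completion": {
        "field": "title.suggest",
        "skip_duplicates": true,
        "fuzzy": { "fuzziness": 2 }
      }
    }
  }
}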
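The prefix-only limitation noted in the comments above can be pictured with a small Python sketch. It uses a sorted list with binary search as a deliberately simple stand-in for the in-memory FST; the class name is an assumption, and the real completion field indexes the analyzed input (here via ik_max_word), so actual results can differ from this model.

```python
from bisect import bisect_left


class PrefixIndex:
    """Toy prefix lookup over a sorted list (a stand-in for the FST)."""

    def __init__(self, inputs):
        self._sorted = sorted(set(inputs))

    def complete(self, prefix, size=5):
        """Return up to `size` stored inputs that start with `prefix`."""
        # All strings sharing a prefix are contiguous in sorted order,
        # starting at the insertion point of the prefix itself.
        start = bisect_left(self._sorted, prefix)
        matches = []
        for candidate in self._sorted[start:]:
            if len(matches) == size or not candidate.startswith(prefix):
                break
            matches.append(candidate)
        return matches


if __name__ == "__main__":
    index = PrefixIndex([
        "奥迪Q5 两万公里准新车", "奥迪A6 无敌车况", "奥迪双钻",
        "宝马5系", "nir奔驰C260",
    ])
    print(index.complete("奥迪"))  # inputs that start with the prefix
    print(index.complete("A6"))    # "A6" occurs mid-string, so nothing matches
```

Lookups are fast because only a contiguous slice is inspected, but an input such as "A6", which appears in the middle of a title, finds nothing. That is the recall problem described in point 2 above.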
GET suggest_carinfo/_doc/10
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": ["奔驰AMG 两万公里准新车"]
}

POST suggest_carinfo/_search
{
  "suggest": {
    "car_suggest": {
      "regex": "nir",
      "completion": { "field": "title.suggest", "size": 10 }
    }
  }
}

# context suggester
# Defines a category context named place_type, where the categories must be sent with the suggestions.
# Defines a geo context named location, where the categories must be sent with the suggestions.
DELETE place
PUT place
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "contexts": [
          { "name": "place_type", "type": "category" },
          { "name": "location", "type": "geo", "precision": 4 }
        ]
      }
    }
  }
}

PUT place/_doc/1
{
  "suggest": {
    "input": [ "timmys", "starbucks", "dunkin donuts" ],
    "contexts": { "place_type": [ "cafe", "food" ] }
  }
}
PUT place/_doc/2
{
  "suggest": {
    "input": [ "monkey", "timmys", "Lamborghini" ],
    "contexts": { "place_type": [ "money" ] }
  }
}

GET place/_search
POST place/_search?pretty
{
  "suggest": {
    "place_suggestion": {
      "prefix": "sta",
      "completion": {
        "field": "suggest",
        "size": 10,
        "contexts": { "place_type": [ "cafe", "restaurants" ] }
      }
    }
  }
}
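# Suggestions in some categories can be boosted higher than others. The following filters suggestions by category and additionally boosts the suggestions associated with certain categories.
GET place/_search
POST place/_search?pretty
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "contexts": {
          "place_type": [
            { "context": "cafe" },
            { "context": "money", "boost": 2 }
          ]
        }
      }
    }
  }
}

# Geo location filter
PUT place/_doc/3
{
  "suggest": {
    "input": "timmys",
    "contexts": {
      "location": [
        { "lat": 43.6624803, "lon": -79.3863353 },
        { "lat": 43.6624718, "lon": -79.3873227 }
      ]
    }
  }
}
POST place/_search
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "contexts": { "location": { "lat": 43.662, "lon": -79.380 } }
      }
    }
  }
}

# Defines a category context named place_type, where the categories are read from the cat field.
# Defines a geo context named location, where the categories are read from the loc field.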
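DELETE place_path_category
PUT place_path_category
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "contexts": [
          { "name": "place_type", "type": "category", "path": "cat" },
          { "name": "location", "type": "geo", "precision": 4, "path": "loc" }
        ]
      },
      "loc": { "type": "geo_point" }
    }
  }
}
# If the mapping has a path, the following index request is enough to add the categories.
# These suggestions will be associated with the cafe and food categories.
# If the context mapping references another field and the categories are also indexed explicitly, the suggestions are indexed with both sets of categories.
PUT place_path_category/_doc/1
{
  "suggest": ["timmys", "starbucks", "dunkin donuts"],
  "cat": ["cafe", "food"]
}
POST place_path_category/_search?pretty
{
  "suggest": {
    "place_suggestion": {
      "prefix": "tim",
      "completion": {
        "field": "suggest",
        "contexts": { "place_type": [ { "context": "cafe" } ] }
      }
    }
  }
}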
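The category filtering and boosting used in the context suggester requests above can be modeled in a few lines of Python. This is only a mental model under stated assumptions: the score is taken to be the suggestion weight multiplied by the boost of the best matching category, every document has weight 1, and the class and function names are invented for this example. It does not reproduce Elasticsearch's implementation.

```python
from dataclasses import dataclass


@dataclass
class SuggestDoc:
    doc_id: str
    inputs: list
    categories: set
    weight: int = 1  # assumption: the sample documents set no weight


def context_suggest(docs, prefix, contexts):
    """Filter by category, then rank by weight * best matching boost.

    `contexts` maps a category name to its boost (1 means no boost).
    """
    hits = []
    for doc in docs:
        boosts = [b for cat, b in contexts.items() if cat in doc.categories]
        if not boosts:
            continue  # no requested category matches: filtered out
        for text in doc.inputs:
            if text.startswith(prefix):
                hits.append((doc.weight * max(boosts), doc.doc_id, text))
    # Highest score first, document id as a stable tie-breaker.
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


# The two documents indexed into the place index earlier in this post.
DOCS = [
    SuggestDoc("1", ["timmys", "starbucks", "dunkin donuts"], {"cafe", "food"}),
    SuggestDoc("2", ["monkey", "timmys", "Lamborghini"], {"money"}),
]

if __name__ == "__main__":
    print(context_suggest(DOCS, "sta", {"cafe": 1, "restaurants": 1}))
    print(context_suggest(DOCS, "tim", {"cafe": 1, "money": 2}))
```

In this model the "tim" query returns both documents, with document 2 ranked first because its money category carries a boost of 2, while the "sta" query only reaches document 1 because document 2 has neither requested category.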