Elasticsearch highlighting

Date: 2020-09-05  Tags: es, highlighting

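Elasticsearch supports three highlighters.

Unified highlighter

The unified highlighter uses the Lucene Unified Highlighter. It breaks the text into sentences and uses the BM25 algorithm to score individual sentences as if they were documents in the corpus. It also supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the default highlighter.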

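Plain highlighter

The plain highlighter uses the standard Lucene highlighter. It tries to reflect the query matching logic in terms of word importance and any word positioning criteria in phrase queries.

Note:

The plain highlighter works best for highlighting simple query matches in a single field. To reflect the query logic accurately, it builds a tiny in-memory index and re-runs the original query criteria through Lucene's query execution planner to get low-level match information for the current document. This is repeated for every field and every document that needs highlighting. If you want to highlight many fields in many documents with complex queries, the recommendation is to use the unified highlighter on postings or term_vector fields.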

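Fast vector highlighter

The fvh highlighter uses the Lucene Fast Vector highlighter. It can be used on fields whose mapping sets term_vector to with_positions_offsets.

The fvh highlighter:

Can be customized with a boundary_scanner.

Requires term_vector to be set to with_positions_offsets, which increases the size of the index.

Can combine matches from multiple fields into one result. See matched_fields.

Can assign different weights to matches at different positions, so that, for example, phrase matches are sorted above term matches when highlighting a boosting query that boosts phrase matches over term matches.

Note:

The fvh highlighter does not support span queries. If you need support for span queries, try another highlighter, such as the unified highlighter.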
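The highlighter is selected with the "type" setting of the highlight section, which can also be set per field. As a minimal sketch, assuming the request body is built in Python before being sent to Elasticsearch, the helper below (highlight_section is a hypothetical name, not part of any client library) rejects anything other than the three types described above.

```python
import json

# The three highlighter types described in this post.
HIGHLIGHTER_TYPES = {"unified", "plain", "fvh"}


def highlight_section(field_types):
    """Build a "highlight" section that sets the highlighter type per field.

    field_types maps a field name to one of HIGHLIGHTER_TYPES.
    This is a hypothetical helper for illustration only.
    """
    for field, highlighter in field_types.items():
        if highlighter not in HIGHLIGHTER_TYPES:
            raise ValueError(f"unknown highlighter type for {field}: {highlighter}")
    return {"fields": {field: {"type": h} for field, h in field_types.items()}}


body = {
    "query": {"match": {"content": "fox"}},
    # fvh is only usable here because the field stores term vectors
    # with positions and offsets.
    "highlight": highlight_section({"content": "fvh"}),
}
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a GET test_index/_search request.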

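number_of_fragments

The maximum number of fragments to return. If it is set to 0, no fragments are returned; instead, the entire field content is highlighted and returned. This is handy for short texts such as a title or an address, where fragmentation is not needed. If number_of_fragments is 0, fragment_size is ignored. Defaults to 5.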
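As a small sketch of the number_of_fragments = 0 case, the Python snippet below builds a highlight section that returns a whole field instead of fragments. The field name title is an assumption for illustration (the test index in this post only has content), and whole_field_highlight is a hypothetical helper.

```python
import json


def whole_field_highlight(field):
    """Highlight section that returns the entire field content, highlighted.

    number_of_fragments = 0 disables fragmentation, so fragment_size
    would be ignored even if it were set.
    """
    return {"fields": {field: {"number_of_fragments": 0}}}


# "title" is a hypothetical short text field, used only as an example.
body = {
    "query": {"match": {"title": "fox"}},
    "highlight": whole_field_highlight("title"),
}
print(json.dumps(body, indent=2))
```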

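fragment_size

The size of a highlighted fragment, in characters. Defaults to 100.

Note:

Because the highlighter splits the text into sentences, fragment_size interacts with sentence boundaries. Suppose the sentence containing the match is 30 characters long, fragment_size is 70, and the next sentence is 50 characters long. Adding the next sentence would exceed 70 characters, so rather than return a cut-off sentence, ES returns only the current 30 characters. If the next sentence were shorter than 40 characters, it would be included as well.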

PUT test_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "english",
          "term_vector": "with_positions_offsets",
          "fields": {
            "plain": {
              "type": "text",
              "analyzer": "standard",
              "term_vector": "with_positions_offsets"
            }
          }
        }
      }
    }
  }
}

PUT test_index/_doc/doc1
{
  "content": "For you I'm only a fox like a hundred thousand other foxes. But if you tame me, we'll need each other. You'll be the only boy in the world for me. I'll be the only fox in the world for you."
}

GET test_index/_search
{
  "query": {
    "match_phrase": { "content": "fox like" }
  },
  "highlight": {
    "type": "unified",
    "number_of_fragments": 3,
    "fragment_size": 30,
    "fields": {
      "content": {}
    }
  }
}

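Result 1, with "fragment_size": 30:

"highlight" : {
  "content" : [
    "For you I'm only a fox like a hundred"
  ]
}

The sentence "For you I'm only a fox like a hundred thousand other foxes." is longer than 30 characters, so it is cut at the first word boundary after the 30-character mark, which gives a 37-character fragment here.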

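Result 2, with "fragment_size": 70:

"highlight" : {
  "content" : [
    "For you I'm only a fox like a hundred thousand other foxes."
  ]
}

70 characters is more than the current sentence needs but not enough to also fit the next one, so only the current sentence is returned.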

Result 3, with "fragment_size": 130:

"highlight" : {
  "content" : [
    "For you I'm only a fox like a hundred thousand other foxes. But if you tame me, we'll need each other."
  ]
}

130 characters is enough to cover the next sentence too, so it is returned as well.
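The results for fragment sizes 30, 70 and 130 are consistent with a simple rule: start at the sentence that contains the match, cut it at the next word boundary if it is longer than fragment_size, and otherwise keep appending whole sentences while they still fit. The Python sketch below is a toy model of that rule, written only to check the arithmetic of these examples. It is not how Elasticsearch or Lucene implements fragmenting; split_sentences and model_fragment are hypothetical names, the sentence splitter is deliberately naive, and the case where one sentence is broken into several fragments is not modeled.

```python
import re

TEXT = (
    "For you I'm only a fox like a hundred thousand other foxes. "
    "But if you tame me, we'll need each other. "
    "You'll be the only boy in the world for me. "
    "I'll be the only fox in the world for you."
)


def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def model_fragment(text, match, fragment_size):
    """Toy model of sentence-bounded fragments (an assumption, not the ES code)."""
    sentences = split_sentences(text)
    # Index of the first sentence that contains the matched text.
    start = next(i for i, s in enumerate(sentences) if match in s)
    first = sentences[start]

    if len(first) > fragment_size:
        # Sentence too long: cut at the first whitespace at or after fragment_size.
        end = fragment_size
        while end < len(first) and not first[end].isspace():
            end += 1
        return first[:end]

    # Sentence fits: append following sentences while the total still fits.
    fragment = first
    for following in sentences[start + 1:]:
        candidate = fragment + " " + following
        if len(candidate) > fragment_size:
            break
        fragment = candidate
    return fragment


for size in (30, 70, 130):
    print(size, "->", model_fragment(TEXT, "fox like", size))
```

Under these assumptions the model returns the same three fragments as the results for sizes 30, 70 and 130, which supports the reading that the limit is applied per sentence and that an over-long sentence is cut on a word boundary.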

Result 4, with "fragment_size": 1:

"highlight" : {
  "content" : [
    "I'm only a fox",
    "like a hundred"
  ]
}

If the size is too small, the matched words within a single sentence are split across two fragments.

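highlight_query

Highlights matches for a query other than the search query. This is especially useful if you use a rescore query, because highlighting does not take rescore queries into account by default.

Note:

Elasticsearch does not validate in any way that highlight_query contains the search query, so it is possible to define it in such a way that legitimate query results are not highlighted. In general, you should include the search query as part of the highlight_query.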
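One way to follow that advice is to wrap the search query and the extra query in a single bool query and pass that as highlight_query. The Python sketch below only builds the request body; combined_highlight_query is a hypothetical helper, and using bool/should to combine the two queries is one reasonable choice, not the only one.

```python
import json


def combined_highlight_query(search_query, extra_query):
    """Wrap both queries in a bool/should so matches of either are highlighted."""
    return {"bool": {"should": [search_query, extra_query]}}


search_query = {"match_phrase": {"content": "fox like"}}
# Stand-in for an additional query, for example one used for rescoring.
extra_query = {"match_phrase": {"content": "only fox"}}

body = {
    "query": search_query,
    "highlight": {
        "fields": {
            "content": {
                "highlight_query": combined_highlight_query(search_query, extra_query)
            }
        }
    },
}
print(json.dumps(body, indent=2))
```

Because the search query is part of the highlight_query, documents that matched the search still get their matches highlighted, in addition to matches of the extra query.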

GET test_index/_search
{
  "query": {
    "match_phrase": { "content": "fox like" }
  },
  "highlight": {
    "type": "unified",
    "number_of_fragments": 3,
    "fragmenter": "span",
    "fragment_size": 100,
    "fields": {
      "content": {
        "highlight_query": {
          "match_phrase": {
            "content": "only fox"
          }
        }
      }
    }
  }
}

Result:

"highlight" : {
  "content" : [
    "I'll be the only fox in the world for you."
  ]
}

The result shows that the condition used for highlighting has been replaced: the fragment matches "only fox" from highlight_query, not "fox like" from the search query.

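no_match_size

The amount of text to return from the beginning of the field if there are no matching fragments to highlight. Defaults to 0 (nothing is returned).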

GET test_index/_search
{
  "query": {
    "match_phrase": { "content": "fox like" }
  },
  "highlight": {
    "type": "unified",
    "number_of_fragments": 3,
    "fragmenter": "span",
    "fragment_size": 100,
    "no_match_size": 20,
    "fields": {
      "content": {
        "highlight_query": {
          "match_phrase": {
            "content": "aa"
          }
        }
      }
    }
  }
}

Result:

"highlight" : {
  "content" : [
    "For you I'm only a fox"
  ]
}

Nothing matches the highlight query, but because no_match_size is set, the beginning of the field is still returned.

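matched_fields

Combines matches on multiple fields to highlight a single field. This is most intuitive for multi-fields that analyze the same string in different ways. All matched_fields must have term_vector set to with_positions_offsets, but only the field into which the matches are combined is loaded, so only that field benefits from having store set to yes. Only valid for the fvh highlighter.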
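Since every field listed in matched_fields needs term vectors with positions and offsets, it can be useful to check the mapping before sending such a request. The Python sketch below does this against a copy of the test_index properties from this post; field_definition and fields_missing_offsets are hypothetical helpers, and the path lookup only handles the simple multi-field and object layouts shown here.

```python
# Copy of the "properties" section of the test_index mapping used in this post.
MAPPING_PROPERTIES = {
    "content": {
        "type": "text",
        "analyzer": "english",
        "term_vector": "with_positions_offsets",
        "fields": {
            "plain": {
                "type": "text",
                "analyzer": "standard",
                "term_vector": "with_positions_offsets",
            }
        },
    }
}


def field_definition(properties, path):
    """Resolve a dotted path such as "content.plain" inside mapping properties."""
    head, *rest = path.split(".")
    node = properties[head]
    for name in rest:
        # Multi-field sub-fields live under "fields"; object fields use "properties".
        children = node.get("fields", {})
        if name not in children:
            children = node.get("properties", {})
        node = children[name]
    return node


def fields_missing_offsets(properties, matched_fields):
    """Return the matched_fields whose term_vector is not with_positions_offsets."""
    return [
        field
        for field in matched_fields
        if field_definition(properties, field).get("term_vector")
        != "with_positions_offsets"
    ]


print(fields_missing_offsets(MAPPING_PROPERTIES, ["content.plain", "content"]))
```

An empty list means every listed field has the term vector setting that matched_fields requires.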

GET test_index/_search
{
  "query": {
    "query_string": {
      "query": "running scissors",
      "fields": ["content", "content.plain^10"]
    }
  },
  "highlight": {
    "type": "fvh",
    "number_of_fragments": 2,
    "fields": {
      "content.plain": {
        "matched_fields": ["content.plain", "content"]
      }
    }
  }
}