Apache Solr extract, highlight HTML elements based on query, filter query terms

Question

Update (+18d): edited the title and provided an answer addressing the original question. tl;dr: I am indexing HTML pages and dumping the

...

content as a snippet for search query returns. However, I don't want or need all that content (just the context around the query-matched text).

Background

With these in my [classic] schema, and these in my solrconfig.xml, I get this

Accepted Answer

Update [2020-12-31]

Please overlook the answering of my own question, as 18 days have passed with one comment and no answers.

I am building a search page with Solr as the backend, inspired by the following Ajax Solr tutorial.

https://github.com/evolvingweb/ajax-solr

Ultimately, I decided to forgo Solr highlighting in favor of a more flexible, bespoke JavaScript (JS) solution.

Basically, I:

- collect the Solr query (q) and filter query (fq) values (terms) in an array (simplified example shown below; more complete JS code appended)

    for (var i = 0, l = this.manager.response.response.docs.length; i < l; i++) {
        var doc = this.manager.response.response.docs[i];
    }

- extract sentences matching those terms (words) via a JS regular expression

    /* Mark each sentence boundary with "|", then split on that marker. */
    var mySentences = doc.p.toString().replace(/([.?!])\s*(?=['"A-Z])/g, "$1|").split("|");

  where doc.p is a Solr field (defined in schema.xml) corresponding to the text of indexed HTML p-elements (<p>…</p>).

  Details: see "Split string into sentences in javascript".

- highlight those query terms

    var query = this.manager.store.get('q').value; /* or loop over array */
    const replacer = (str, replace) => {
        const re = new RegExp(`(${replace})`, 'gi');
        /* The wrapping markup was stripped from this copy of the answer; <mark> is an assumption. */
        return str.replaceAll(re, '<mark>$1</mark>');
    };
    var doc_p_hl = replacer(doc.p.toString(), query);

  Details: see "JavaScript replaceAll case-insensitive search using variable rather than a string".

- use those term-highlighted strings as snippets on the frontend

- apply a similar approach to the highlighting of query terms in the full documents, doc.p.toString() …

Addendum

Here is the JS code I wrote to collect Solr “q” and “fq” terms in an array. Note that Solr returns a single fq as a string, and multiple fq terms as an array.

    var q_arr = [];
    var fq_arr = [];
    var highlight_arr = [];
    var snippets_arr = [];
    var fq_vals = [];

    /* Collect the main query unless it is missing or the match-all query. */
    if ((this.manager.store.get('q').value !== undefined) && (this.manager.store.get('q').value !== '*:*')) {
        var query = this.manager.store.get('q').value;
        q_arr.push(query);
        highlight_arr.push(query);
        console.log('q_arr:', q_arr, '| type:', typeof q_arr, '| length:', q_arr.length);
    }

    var doc_responseHeader = this.manager.response.responseHeader;
    if (doc_responseHeader.params.fq !== undefined) {
        /* ONE "fq" (FILTER QUERY) TERM: */
        if (typeof doc_responseHeader.params.fq === 'string' || doc_responseHeader.params.fq instanceof String) {
            fq_arr.push(doc_responseHeader.params.fq);
        }
        /* MORE THAN ONE "fq" (FILTER QUERY) TERM: */
        else if (Array.isArray(doc_responseHeader.params.fq)) {
            for (var i = 0, l = doc_responseHeader.params.fq.length; i < l; i++) {
                fq_arr.push(doc_responseHeader.params.fq[i].toString());
            }
        }
        /* Strip the "keywords:" field prefix so only the bare terms remain. */
        fq_vals = fq_arr.map(function(x){ return x.replace(/keywords:/g, ''); });
        console.log('fq_vals', fq_vals, '| type:', typeof fq_vals, '| length:', fq_vals.length);
        for (var j = 0, m = fq_vals.length; j < m; j++) {
            highlight_arr.push(fq_vals[j].toString());
        }
    }
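The steps described in this answer (collect terms, split into sentences, keep the matching sentences, highlight the terms) can be pulled together in one place. The following is only a minimal sketch, not the code actually used on the search page: the function names escapeRegExp, collectTerms, splitSentences and buildSnippets are hypothetical, the <mark> wrapper is an assumed choice of highlight tag, and the inputs are plain values rather than the Ajax Solr manager object. It also escapes regex metacharacters in the terms, which the snippets above do not do.

```javascript
// Hypothetical helpers; none of these names come from Solr or Ajax Solr.

// Escape regex metacharacters so a term such as "c++" is matched literally.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Solr echoes a single fq as a string and several fq values as an array.
// Normalise both cases and strip the "keywords:" field prefix.
function collectTerms(q, fq) {
  const terms = [];
  if (q !== undefined && q !== '*:*') terms.push(q);
  const fqList = fq === undefined ? [] : (Array.isArray(fq) ? fq : [fq]);
  for (const f of fqList) terms.push(String(f).replace(/^keywords:/, ''));
  return terms.filter((t) => t.length > 0);
}

// Split after ., ? or ! when the next character is a capital letter or a quote.
// Like the regex in the answer, this also splits abbreviations such as "U.S."
// and assumes the text itself contains no "|" character.
function splitSentences(text) {
  return text.replace(/([.?!])\s*(?=['"A-Z])/g, '$1|').split('|');
}

// Keep only sentences containing at least one term; wrap each hit in <mark> (assumed tag).
function buildSnippets(text, terms) {
  if (terms.length === 0) return [];
  const pattern = '(' + terms.map(escapeRegExp).join('|') + ')';
  const tester = new RegExp(pattern, 'i');      // non-global: safe to reuse with test()
  const replacer = new RegExp(pattern, 'gi');   // global: replaces every occurrence
  return splitSentences(text)
    .filter((sentence) => tester.test(sentence))
    .map((sentence) => sentence.replace(replacer, '<mark>$1</mark>'));
}

// Usage with made-up sample data.
const terms = collectTerms('solr', 'keywords:highlighting');
const text = 'Solr indexes HTML pages. This sentence is unrelated. ' +
  'Highlighting marks the matched terms!';
console.log(buildSnippets(text, terms));
```

With real data, q and fq would come from this.manager.store.get('q').value and this.manager.response.responseHeader.params.fq, and text from doc.p.toString(). If the indexed text can contain markup, it is probably worth HTML-escaping it before inserting the highlight tags.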


Answer