I am a caffeinated, busy software junkie. Daily I help teams with solution engineering solutions.

Monday, August 18, 2014

SOLR deep pagination fix. (For now it scales)

In my earlier SOLR enterprise search implementations, there was no need to export large amounts of data in Excel or CSV form. The GM implementation, however, needs to export a range of search results (for example, rows 300K to 500K out of 1 million matching results) so that statistical analysis can be run on them.

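The SOLR REST API does allow fetching results in small blocks by varying the start parameter. However, because of the way pagination is implemented (with all the ranking math), search response time increases roughly linearly as start increases. To return rows [start, start + rows), SOLR has to collect and rank the top start + rows matches and then discard the first start of them. The following Jira issue explains the details.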

https://issues.apache.org/jira/browse/SOLR-5463
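As a rough illustration of why deep offsets get slower, here is a minimal in-memory sketch of start/rows paging. It is a simplification and not SOLR's actual collector code. To return the page at offset start, the ranking step keeps the best start + rows hits in a priority queue, sorts them, and throws the first start away. The Doc class, the integer scores and the sampleHits data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class OffsetPagingSketch {

    // A toy search hit: a hypothetical unique id plus a relevance score.
    static final class Doc {
        final String id;
        final int score;
        Doc(String id, int score) { this.id = id; this.score = score; }
    }

    // Ranking order: higher score first, ties broken by id ascending.
    static final Comparator<Doc> RANK =
            Comparator.comparingInt((Doc d) -> d.score).reversed()
                      .thenComparing((Doc d) -> d.id);

    // Peak number of hits held in the heap during the last call to page().
    static int lastPeakHeapSize = 0;

    // Offset paging: to return [start, start + rows) we must keep the best
    // (start + rows) hits, sort them, then discard the first `start`.
    static List<Doc> page(List<Doc> hits, int start, int rows) {
        int k = start + rows;
        lastPeakHeapSize = 0;
        // Reversed comparator, so the heap head is the worst of the current top k.
        PriorityQueue<Doc> topK = new PriorityQueue<>(RANK.reversed());
        for (Doc d : hits) {
            topK.offer(d);
            if (topK.size() > k) {
                topK.poll(); // evict the lowest-ranked hit
            }
            lastPeakHeapSize = Math.max(lastPeakHeapSize, topK.size());
        }
        List<Doc> sorted = new ArrayList<>(topK);
        sorted.sort(RANK);
        // Everything before `start` was collected and sorted only to be thrown away.
        return new ArrayList<>(sorted.subList(Math.min(start, sorted.size()), sorted.size()));
    }

    // Sample data: ids doc0..doc(n-1) with scores 0..2, so scores tie.
    static List<Doc> sampleHits(int n) {
        List<Doc> hits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            hits.add(new Doc("doc" + i, i % 3));
        }
        return hits;
    }

    static List<String> ids(List<Doc> docs) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.id);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Doc> result = page(sampleHits(10), 4, 3);
        System.out.println(ids(result) + " heap=" + lastPeakHeapSize);
    }
}
```

For page(sampleHits(10), 4, 3) the heap peaks at 7 entries to return 3 rows. With start=304000 and rows=1000, the same approach would hold 305,000 entries to return a 1,000-row page.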

Compare the start and QTime values in the two log lines below (QTime is in milliseconds).

SOLR 4.6.1 search request and response:

/solr path=/select params={start=304000&q=*&json.nl=map&wt=javabin&version=2&rows=1000} hits=698889 status=0 QTime=1089

SOLR 4.7.1 search request and response:

/solr path=/select params={start=304000&q=*&json.nl=map&wt=javabin&version=2&rows=1000} hits=698889 status=0 QTime=108

For the same start=304000 request, QTime drops from 1089 ms to 108 ms, roughly a 10x improvement.

Key SolrJ Java code changes to use the new "cursor mark" feature:

String cursorVal = "*"; // default value: "*" asks SOLR for the first page

for (int i = 0; i < noOfTrips; i++) {
    int start = i * bucketSize; // absolute row offset of this page (bookkeeping only)
    // build the search request; SOLR rejects a cursorMark request unless start is 0,
    // so the cursor, not start, tracks the position in the result set
    s.setStart(0);
    s.setCursorMark(cursorVal);

    // execute the search request
    SimpleSearchResponse searchResponse = performSimpleSearch1(s);
    // now process the search results

    // carry the returned cursor mark into the next request
    cursorVal = searchResponse.getCursorValue();
}

???

////////////
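if (cursorVal != null) {
    query.set("cursorMark", cursorVal);
} else {
    query.set("cursorMark", "*"); // "*" starts from the first result
}
// cursorMark also needs a sort that includes the uniqueKey field, e.g.
// query.addSort("id", SolrQuery.ORDER.asc) if the uniqueKey is "id" (assumption)

// new SolrJ API method (4.7+) that returns the cursor mark for the next request
QueryResponse response = getServer().query(query);
String value = response.getNextCursorMark();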
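To make the loop above concrete without a running SOLR server, here is a minimal in-memory simulation of the idea behind cursorMark. It is not SOLR's implementation. The cursor carries the sort values of the last row returned, and the next page is the best rows hits that sort strictly after it. An unchanged cursor signals the end. The sketch also covers the export case: a cursor cannot jump straight to row 300K, so the walk still starts at "*" and simply skips rows before the range. Doc, Page, nextPage, exportRange and the sample data are hypothetical. The id field stands in for the uniqueKey tiebreaker that SOLR requires in the sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class CursorPagingSketch {

    // A toy search hit: `id` stands in for SOLR's uniqueKey field.
    static final class Doc {
        final String id;
        final int score;
        Doc(String id, int score) { this.id = id; this.score = score; }
    }

    // Sort: score descending, then id ascending. The unique id tiebreaker makes
    // the order total, so "strictly after the cursor" is always well defined.
    static final Comparator<Doc> RANK =
            Comparator.comparingInt((Doc d) -> d.score).reversed()
                      .thenComparing((Doc d) -> d.id);

    // One page of results plus the cursor to send with the next request.
    static final class Page {
        final List<Doc> docs;
        final Doc next;
        Page(List<Doc> docs, Doc next) { this.docs = docs; this.next = next; }
    }

    // Largest heap size seen during the last exportRange() call.
    static int peakHeapSize = 0;

    // Best `rows` hits sorting strictly after `cursor` (null plays cursorMark="*").
    // The heap never holds more than `rows` hits, however deep the cursor is.
    static Page nextPage(List<Doc> hits, Doc cursor, int rows) {
        PriorityQueue<Doc> topK = new PriorityQueue<>(RANK.reversed());
        for (Doc d : hits) {
            if (cursor != null && RANK.compare(d, cursor) <= 0) {
                continue; // at or before the cursor: already returned
            }
            topK.offer(d);
            if (topK.size() > rows) {
                topK.poll(); // evict the lowest-ranked hit
            }
            peakHeapSize = Math.max(peakHeapSize, topK.size());
        }
        List<Doc> sorted = new ArrayList<>(topK);
        sorted.sort(RANK);
        // With nothing left, hand back the same cursor (SOLR likewise returns
        // an unchanged nextCursorMark once the results are exhausted).
        Doc next = sorted.isEmpty() ? cursor : sorted.get(sorted.size() - 1);
        return new Page(sorted, next);
    }

    // Exports rows [from, to) of the ranking by walking pages from the start.
    static List<Doc> exportRange(List<Doc> hits, int from, int to, int rows) {
        peakHeapSize = 0;
        List<Doc> out = new ArrayList<>();
        Doc cursor = null; // cursorMark="*"
        int rowIndex = 0;  // absolute position of the next row returned
        while (rowIndex < to) {
            Page page = nextPage(hits, cursor, rows);
            if (page.next == cursor) {
                break; // unchanged cursor: no more results
            }
            for (Doc d : page.docs) {
                if (rowIndex >= from && rowIndex < to) {
                    out.add(d);
                }
                rowIndex++;
            }
            cursor = page.next;
        }
        return out;
    }

    // Sample data: ids doc0..doc(n-1) with scores 0..2, so scores tie.
    static List<Doc> sampleHits(int n) {
        List<Doc> hits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            hits.add(new Doc("doc" + i, i % 3));
        }
        return hits;
    }

    static List<String> ids(List<Doc> docs) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.id);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ids(exportRange(sampleHits(10), 3, 8, 2)));
    }
}
```

In SolrJ the same walk would set cursorMark on the query, keep start at 0, and sort on the uniqueKey. It would stop when getNextCursorMark() returns the value that was sent. Each page then costs about the same, whether it is the first page or the 300th.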
