Monday, August 18, 2014

SOLR deep pagination fix. (For now it scales)

So far, my earlier SOLR implementations of enterprise search had no need to export large amounts of data in Excel or CSV form. However, the GM implementation needs to export a range of search results (for example, rows 300K to 500K of 1 million search results found). This is to perform some kind of statistical analysis of the results.

The SOLR REST API does allow fetching results in small blocks with different start parameters. However, because of the way pagination was implemented (with all the ranking math), search response time increases linearly as the start param increases. The following Jira task explains this in more detail.
https://issues.apache.org/jira/browse/SOLR-5463
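
To get a feel for why large start values hurt, here is a rough cost model in Java. It is only a sketch and not Solr code. It assumes that a start/rows request makes the server collect start + rows ranked documents, while a cursor request collects only rows documents but has to walk from row 0 because a cursor cannot jump to an offset. The class and method names are made up for this post.

```java
// Rough cost model, not Solr code: counts how many ranked documents the
// server has to collect in total for an export, under the assumptions above.
final class DeepPagingCost {

    // start/rows paging: each request collects (start + rows) documents.
    static long offsetPagingCost(long firstRow, long lastRow, long rows) {
        long total = 0;
        for (long start = firstRow; start < lastRow; start += rows) {
            total += start + rows;
        }
        return total;
    }

    // Cursor paging: each request collects only `rows` documents, but the
    // walk begins at row 0 because a cursor cannot skip ahead to an offset.
    static long cursorPagingCost(long lastRow, long rows) {
        long requests = (lastRow + rows - 1) / rows; // ceiling division
        return requests * rows;
    }

    public static void main(String[] args) {
        // Export rows 300K to 500K in blocks of 1000.
        System.out.println(offsetPagingCost(300_000, 500_000, 1_000)); // 80100000
        System.out.println(cursorPagingCost(500_000, 1_000));          // 500000
    }
}
```

Under this model the cursor walk touches far fewer documents even though it makes more requests (500 instead of 200), which is the trade-off to keep in mind when only a slice of the results is needed.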


Compare the start and QTime values in the log lines below (QTime is in milliseconds).

SOLR 4.6.1 search request & response numbers:

/solr path=/select params={start=304000&q=*&json.nl=map&wt=javabin&version=2&rows=1000} hits=698889 status=0 QTime=1089


SOLR 4.7.1 search request & response numbers:

/solr path=/select params={start=304000&q=*&json.nl=map&wt=javabin&version=2&rows=1000} hits=698889 status=0 QTime=108


Key SolrJ Java code changes to use the new "cursor mark" feature:

    String cursorVal = "*";  // default value

    for (int i = 0; i < noOfTrips; i++) {
        // build search request
        s.setStart(0);  // with a cursor mark, start has to stay 0
        s.setCursorMark(cursorVal);

        // execute your search request
        SimpleSearchResponse searchResponse = performSimpleSearch1(s);
        // now process search results

        // process cursor mark value
        cursorVal = searchResponse.getCursorValue();
    }

    // set the cursor mark on the query
    if (cursorVal != null) {
        query.set("cursorMark", cursorVal);
    } else {
        query.set("cursorMark", "*");
    }

    // new SolrJ API method which returns the next cursor mark value
    QueryResponse response = getServer().query(query);
    String value = response.getNextCursorMark();
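
The loop above runs a fixed number of trips. Another way to stop is to keep going until the returned cursor mark is the same as the one that was sent, which is how Solr signals that there is nothing left to read. Below is a small in-memory sketch of that loop. It does not talk to Solr: Page, fetchPage and exportAll are hypothetical stand-ins, and the cursor here is just the last id seen, whereas a real Solr cursor mark is an opaque string. Also keep in mind that a real cursor query needs start=0 and a sort that includes the uniqueKey field.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for a Solr index, used only to show the cursor loop.
// Page, fetchPage and exportAll are hypothetical names, not SolrJ API.
final class CursorWalkSketch {

    static final class Page {
        final List<Integer> docs;
        final String nextCursorMark;

        Page(List<Integer> docs, String nextCursorMark) {
            this.docs = docs;
            this.nextCursorMark = nextCursorMark;
        }
    }

    // Returns up to `rows` ids that sort after the cursor position.
    // "*" means "start from the beginning", as in Solr.
    static Page fetchPage(List<Integer> sortedIds, String cursorMark, int rows) {
        int lastSeen = "*".equals(cursorMark)
                ? Integer.MIN_VALUE
                : Integer.parseInt(cursorMark);
        List<Integer> docs = new ArrayList<>();
        for (int id : sortedIds) {
            if (id > lastSeen && docs.size() < rows) {
                docs.add(id);
            }
        }
        // An empty page hands back the same cursor it was given.
        String next = docs.isEmpty()
                ? cursorMark
                : String.valueOf(docs.get(docs.size() - 1));
        return new Page(docs, next);
    }

    // Walks the whole result set page by page until the cursor stops moving.
    static List<Integer> exportAll(List<Integer> sortedIds, int rows) {
        List<Integer> exported = new ArrayList<>();
        String cursorMark = "*";
        while (true) {
            Page page = fetchPage(sortedIds, cursorMark, rows);
            exported.addAll(page.docs);
            if (page.nextCursorMark.equals(cursorMark)) {
                break; // cursor did not move: nothing left to read
            }
            cursorMark = page.nextCursorMark;
        }
        return exported;
    }
}
```

With SolrJ the same shape applies: send the cursor with query.set("cursorMark", ...), read response.getNextCursorMark(), and stop when the two are equal.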

