Tuesday, August 26, 2014

Search Relevancy Issues: A Good Example

My keyword search terms are "sony laptop" at bestbuy.com.
See the following screenshot of the search results.




Clearly I am looking for a Sony laptop, yet the results contain all types of items.
There are also many marketplace items.
I am looking for items from BestBuy.

After selecting the 4Gig/8Gig facets and the BestBuy Item tab, the system displays all the laptops.
See the following picture.




(You can try the same query at Amazon or Walmart and feel the difference.)

Search Findability Issues: A Good Example

I don’t know what happened to BestBuy keyword search.
Lately, after moving to a new home, I have been searching for fridges by model number.
See the following screenshot.
My input is WSF26C3EXF, the model number of the highly rated “Whirlpool 26.4-cu Ft. Side-by-Side Refrigerator”.



How bad is it when a model number search fails without any results?

Try the same input at Lowe's or Amazon.
The item shows up like a charm.

The above case is one fine example of a “findability” issue with search on an ecommerce site.
The next topic is a few relevancy examples.

Tuesday, August 19, 2014

Google Maps APIs… Interesting findings. (Issues with Google Maps API web services)

For a while, I have been using geocoding services to figure out the longitude and latitude of a few small business addresses so that we can suggest better service within a certain distance. As long as the address is correct, most geocoding services work OK. (Some are good at European and Asian addresses, whereas Google does a great job with US addresses.) However, the main focus of this post is that I ended up with some legacy data containing business addresses. (Most of the data was created in the early 90s. There was no consistent way the addresses were created, updated, or maintained.) These are still valid, operating businesses, but our data is incorrect. See the following example.

https://maps.googleapis.com/maps/api/geocode/xml?address=3100%20N%20COMMERCE%20ST,FORT%20WORTH,MI,US

address:
3100 N COMMERCE ST, FORT WORTH, MI, US

Google comes back with a response that contains a corrected address in TX state and a TX lat/long:
3100 North Commerce Street, Fort Worth, TX 76164, USA


Based on the data set, I know that this business address belongs to MI state.
After ignoring the first address field, the lat/long now comes back as
Fort Worth Drive, Macomb, MI 48044, USA


Even though my input contains MI, US, Google does not suggest any valid lat/long values in MI.
So I end up making one more call that ignores the first field in my address.
I know two more use cases like this and will append them to this blog post.
Bottom line: dirty data leads to more dirty data.
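
The two-call workaround can be sketched roughly as follows. This is only an illustration, not the code I actually run: the GeocodeFallback class, its Result holder and the geocoder function are made-up names, and the geocoder stands in for whatever client really calls the geocoding web service. The idea is to trust the state from the data set, and if the first answer lands in another state, drop the first address field and ask again.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of any Google client library.
final class GeocodeFallback {

    /** Minimal result holder (assumption); a real geocoding response carries many more fields. */
    static final class Result {
        final String formattedAddress;
        final String state;

        Result(String formattedAddress, String state) {
            this.formattedAddress = formattedAddress;
            this.state = state;
        }
    }

    /**
     * Geocodes the full address first. If the returned state differs from the
     * state we trust from the data set, drops the first comma-separated field
     * and tries exactly once more.
     */
    static Optional<Result> geocode(String address, String trustedState,
                                    Function<String, Optional<Result>> geocoder) {
        Optional<Result> first = geocoder.apply(address);
        if (first.isPresent() && trustedState.equalsIgnoreCase(first.get().state)) {
            return first; // first call already agrees with the trusted state
        }
        int comma = address.indexOf(',');
        if (comma < 0) {
            return Optional.empty(); // no field left to drop
        }
        String relaxed = address.substring(comma + 1).trim();
        // Second call: accept the answer only if it is in the trusted state.
        return geocoder.apply(relaxed)
                .filter(r -> trustedState.equalsIgnoreCase(r.state));
    }
}
```

For the example above, the first call would return the TX address, the state check would fail, and the second call with "FORT WORTH, MI, US" would return the Macomb, MI result.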

Even with the Google Maps API, I end up calling twice.
Bad choice? Any other alternatives?
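
One alternative that might be worth trying is the Geocoding API's components parameter, which filters results by fields such as administrative_area and country. I have not verified that it fixes this particular address, so treat the sketch below as an assumption to test. The GeocodeUrl class and its build method are names made up for illustration; only the endpoint and the components parameter come from the API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that only builds the request URL; it does not call the service.
final class GeocodeUrl {

    /** Builds a geocode request that also passes the trusted state and country as a components filter. */
    static String build(String address, String state, String country) {
        // Component filters are separated by a pipe: administrative_area:MI|country:US
        String components = "administrative_area:" + state + "|country:" + country;
        try {
            return "https://maps.googleapis.com/maps/api/geocode/xml"
                    + "?address=" + URLEncoder.encode(address, "UTF-8")
                    + "&components=" + URLEncoder.encode(components, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }
}
```

If the filter behaves as hoped, the state from the data set would constrain the first call and the second call might not be needed.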

On a side note: for my input N COMMERCE ST,FORT WORTH,MI,US
Bing Maps gives me the following lat/long coordinates.






Monday, August 18, 2014

SOLR deep pagination fix. (For now it scales)

So far, in my earlier SOLR implementations of enterprise search, there was no need to export large amounts of data in Excel or CSV form. However, with the GM implementation, there is a need to export a range of search results (for example, rows 300K to 500K of 1 million search results found). This is to perform some kind of statistical analysis of the results.

The SOLR REST API does allow fetching results in small blocks with different start parameters. However, because of the way pagination was implemented (with all the ranking math), search response time increases linearly as the start param increases. The following Jira issue explains more details.
https://issues.apache.org/jira/browse/SOLR-5463


See the start and QTime param values.

SOLR 4.6.1 search request & response numbers:

/solr path=/select params={start=304000&q=*&json.nl=map&wt=javabin&version=2&rows=1000} hits=698889 status=0 QTime=1089


SOLR 4.7.1 search request & response numbers:

/solr path=/select params={start=304000&q=*&json.nl=map&wt=javabin&version=2&rows=1000} hits=698889 status=0 QTime=108


Key SolrJ Java code changes to use the new "cursor mark" feature:

    String cursorVal = "*"; // default value for the first request

    for (int i = 0; i < noOfTrips; i++) {
        int start = i * bucketSize;
        // build search request
        s.setStart(start); // set start (note: a raw Solr request must keep start=0 when cursorMark is set)
        s.setCursorMark(cursorVal);

        // execute your search request
        SimpleSearchResponse searchResponse = performSimpleSearch1(s);
        // now process search results

        // process cursor mark value: feed it into the next request
        cursorVal = searchResponse.getCursorValue();
    }

    ////////////
    // pass the cursor mark to Solr; "*" starts a new cursor
    if (cursorVal != null)
        query.set("cursorMark", cursorVal);
    else
        query.set("cursorMark", "*");

    // new SolrJ API method which returns the cursor mark value
    QueryResponse response = getServer().query(query);
    String value = response.getNextCursorMark();
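
A self-contained sketch of the range export with a cursor could look like this. It is not my production code: CursorExport, Page and the fetchPage function are hypothetical names, and fetchPage stands in for the SolrJ part shown above (set "cursorMark" on the query, run it, read getNextCursorMark()). Two facts from the Solr documentation matter here: the sort must include the uniqueKey field, and start has to stay at 0 when cursorMark is used. A cursor cannot jump straight to row 300K, so the sketch walks from the beginning and keeps only the rows inside the requested range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper; fetchPage is assumed to wrap the real SolrJ call.
final class CursorExport {

    /** One page of rows plus the cursor mark to send with the next request. */
    static final class Page {
        final List<String> rows;
        final String nextCursorMark;

        Page(List<String> rows, String nextCursorMark) {
            this.rows = rows;
            this.nextCursorMark = nextCursorMark;
        }
    }

    /**
     * Walks the result set from the first row using a cursor and keeps the rows
     * whose zero-based position falls in [from, to).
     */
    static List<String> exportRange(long from, long to, Function<String, Page> fetchPage) {
        List<String> out = new ArrayList<>();
        String cursor = "*"; // "*" starts a new cursor
        long position = 0;   // zero-based position of the next row we will see
        while (position < to) {
            Page page = fetchPage.apply(cursor);
            for (String row : page.rows) {
                if (position >= from && position < to) {
                    out.add(row);
                }
                position++;
            }
            // Solr returns the same cursor mark once there are no more results.
            if (cursor.equals(page.nextCursorMark)) {
                break;
            }
            cursor = page.nextCursorMark;
        }
        return out;
    }
}
```

For the 300K to 500K export this would be called as exportRange(300000, 500000, fetchPage). The rows before 300K are still fetched and thrown away, but each request stays cheap because Solr no longer has to rank everything up to the start offset.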