Wednesday, May 15, 2013

Moving from FAST to SOLR: Project Updates

Some background: Decision to move to SOLR was already made. Team hardly knows anything about FAST ESP. However business team know what they want in the new SOLR platform. (A clone of FAST based search system while addressing their pain points)

  Current status: Phase I of the project was completed & system is already in production (With all complexities, we wrapped the project in record time.)

  Positives: SOLR is fast in-terms of content indexing & searches. (Overall QPS is greater than FAST and current sizing issues are resolved.)

  Challenges: 1) During implementation we noticed strong customizations around business rules. This functionality is not available in SOLR/Lucene. I did some domain specific customizations. 2) We are replacing a search product with decent relevancy (because of all business rules) & we started late with relevancy. Relevancy echo system includes fine-tuning of similarity algorithms (tf/idf, B25 etc.) plus fine-tuning of synonyms/ spell-check modules. SOLR synonyms/spell check modules need more improvements/core bug fixes. Again I did more customizations to meet the needs. 3) Dynamic range facets & site taxonomy traversal/updates need future work. Basic stuff is working. However if the taxonomy changes often, doing incremental updates is a complex issue. For now, we have a workaround in place. Some extent business rules stuff was invented to work around some of these problems. Map reduce & Graph DB frameworks seems to solve issues around dynamic range facets/dynamic taxonomies. Exploring simple integration approaches between Hadoop/SOLR.

  Luck factor: Existing FAST based search was not completely leveraging FAST strong document processing capabilities(Linguistic normalization/ sentiment /Taxonomy etc). So we managed with little customizations around Lucene analyzers.

1 comment:

Anonymous said...

hey ! B25 is on the web http://besancon25.biz