Tuesday, May 15, 2012

Lucene revolution conference 2012

After long time, I attended Lucene revolution conference. Overall this is good experience. Few sessions are too good & few missed the mark.

Personally I like the following sessions because of the contents/presenters energy/passion behind the search technology.

Automata Invasion
Challenges in Maintaining a High Performance Search Engine Written in Java
Updateable Fields in Lucene and other Codec Applications
Solr 4: The SolrCloud Architecture

Also “Stump the Chump” questions are interesting & learned quite a bit.
I won small prizes too. In general, Lucid imagination uploads the
conference vedios at the following location. Keep watching.
I also missed few good sessions in Big data area.

http://www.lucidimagination.com/devzone/videos-podcasts/conference-videos

Friday, May 04, 2012

Mapping hierarchical data in to SOLR

Experiments with Solr 4.0 (beta)

While modeling hierarchical data in XML easy, (for example org charts, Bill of materials structures) mapping to a persistent storage is very challenging. Relational SQL manages it however fetching/ updating the hierarchies is very difficult. Even books are written on mapping Tree/Graphs in to RDBMS world.

Consider simple hierarchy list looks like this:
Satya
Saketh
Dhanvi
Venkata
Dhasa

The most common and familiar known method is adjacency model, and it usually works as every node knows the adjacency node. (In SOLR world, ID contains in the unique value & parent field contains parent. Assume for the root node it is null or same.)

In SOLR each row is a document:

SOLRID Name Parent
01 Satya NULL
02 Saketh 01
03 Dhanvi 01
04 Venkata 01
05 Dhasa 04

Latest SOLR(4.0 beta?) Join functionality gives you the full hierarchy (or any piece of it) very quickly and easily.

Example queries:

1) Give me complete hierarchical list: q= {!join from=id to=parent}id:*
2) Give me immediate childs of Satya: q={!join from=id to=parent}id:Satya

An example SOLR query component to pull other objects

Consider Linked-in example. i.e. I have a set of connections. With single SOLRrequest, following SOLR component brings all first level connections information. In RDBMS world, typical example is employee table. i.e. bring me first level employees of manager. Assume single table contains both employee and manager information. (typical Adjacency list)
In terms of physical SOLR mapping, every document contains a connection field which contains the list(Adjacency list) of connections (root ids)

Configuration point of view, add the following to solrconfig.xml



public class ExampleComponent extends SearchComponent
{
public static final String COMPONENT_NAME = "example";

@Override
public void prepare(ResponseBuilder rb) throws IOException
{

}
@SuppressWarnings("unchecked")
@Override
public void process(ResponseBuilder rb) throws IOException
{
DocSlice slice = (DocSlice) rb.rsp.getValues().get("response");
SolrIndexReader reader = rb.req.getSearcher().getReader();
SolrDocumentList rl = new SolrDocumentList();
int docId=0;//// at this point consider only one rootid.
for (DocIterator it = slice.iterator(); it.hasNext(); ) {
docId = it.nextDoc();
Document doc = reader.document(docId);
String id = (String)doc.get("id");
String connections = (String)doc.get("contains");
System.out.println("\n id:"+id+" contains-->"+connections);
List<String> list = new ArrayList<String>();
list.add(id);//add rootid too. If we have joins in solr4.0
int pos = 0, end;
while ((end = connections.indexOf(',', pos)) >= 0) {
list.add(connections.substring(pos, end));
pos = end + 1;
}
BooleanQuery bq = new BooleanQuery();
Iterator<String> cIter = list.iterator();
while (cIter.hasNext()) {
String anExp = cIter.next();
TermQuery tq = new TermQuery(new Term("id",anExp));
bq.add(tq, BooleanClause.Occur.SHOULD);
}
SolrIndexSearcher searcher = rb.req.getSearcher();
DocListAndSet results = new DocListAndSet();
results.docList = searcher.getDocList(bq, null, null,0, 100,rb.getFieldFlags());
System.out.println("\n results.docList-->"+results.docList.size() );
rl.setNumFound(results.docList.size());
rb.rsp.getValues().remove("response");
rb.rsp.add("response", results.docList);
}
}

@Override
public String getDescription() {
return "Information";
}
@Override
public String getVersion() {
return "Solr gur";
}
@Override
public String getSourceId() {
return "Satya Solr Example";
}
@Override
public String getSource() {
return "$URL: }
@Override
public URL[] getDocs() {
return null;
}
}