After long time, I attended Lucene revolution conference. Overall this is good experience. Few sessions are too good & few missed the mark.
Personally I like the following sessions because of the contents/presenters energy/passion behind the search technology.
Automata Invasion
Challenges in Maintaining a High Performance Search Engine Written in Java
Updateable Fields in Lucene and other Codec Applications
Solr 4: The SolrCloud Architecture
Also “Stump the Chump” questions are interesting & learned quite a bit.
I won small prizes too. In general, Lucid imagination uploads the
conference vedios at the following location. Keep watching.
I also missed few good sessions in Big data area.
http://www.lucidimagination.com/devzone/videos-podcasts/conference-videos
Daily I help teams with solution engineering aspect of connected vehicle data projects. (massive datasets & always some new datasets with new car models aka new technologies.) Lately in the spare time, applying some of the ML/Deep learning techniques on datasets (many are create based on observations of real datasets)To Share some thoughts on my work (main half of this blog) and the other half will be about my family and friends.
Tuesday, May 15, 2012
Friday, May 04, 2012
Mapping hierarchical data in to SOLR
Experiments with Solr 4.0 (beta)
While modeling hierarchical data in XML easy, (for example org charts, Bill of materials structures) mapping to a persistent storage is very challenging. Relational SQL manages it however fetching/ updating the hierarchies is very difficult. Even books are written on mapping Tree/Graphs in to RDBMS world.
Consider simple hierarchy list looks like this:
Satya
Saketh
Dhanvi
Venkata
Dhasa
The most common and familiar known method is adjacency model, and it usually works as every node knows the adjacency node. (In SOLR world, ID contains in the unique value & parent field contains parent. Assume for the root node it is null or same.)
In SOLR each row is a document:
SOLRID Name Parent
01 Satya NULL
02 Saketh 01
03 Dhanvi 01
04 Venkata 01
05 Dhasa 04
Latest SOLR(4.0 beta?) Join functionality gives you the full hierarchy (or any piece of it) very quickly and easily.
Example queries:
1) Give me complete hierarchical list: q= {!join from=id to=parent}id:*
2) Give me immediate childs of Satya: q={!join from=id to=parent}id:Satya
While modeling hierarchical data in XML easy, (for example org charts, Bill of materials structures) mapping to a persistent storage is very challenging. Relational SQL manages it however fetching/ updating the hierarchies is very difficult. Even books are written on mapping Tree/Graphs in to RDBMS world.
Consider simple hierarchy list looks like this:
Satya
Saketh
Dhanvi
Venkata
Dhasa
The most common and familiar known method is adjacency model, and it usually works as every node knows the adjacency node. (In SOLR world, ID contains in the unique value & parent field contains parent. Assume for the root node it is null or same.)
In SOLR each row is a document:
SOLRID Name Parent
01 Satya NULL
02 Saketh 01
03 Dhanvi 01
04 Venkata 01
05 Dhasa 04
Latest SOLR(4.0 beta?) Join functionality gives you the full hierarchy (or any piece of it) very quickly and easily.
Example queries:
1) Give me complete hierarchical list: q= {!join from=id to=parent}id:*
2) Give me immediate childs of Satya: q={!join from=id to=parent}id:Satya
An example SOLR query component to pull other objects
Consider Linked-in example. i.e. I have a set of connections. With single SOLRrequest, following SOLR component brings all first level connections information. In RDBMS world, typical example is employee table. i.e. bring me first level employees of manager. Assume single table contains both employee and manager information. (typical Adjacency list)
In terms of physical SOLR mapping, every document contains a connection field which contains the list(Adjacency list) of connections (root ids)
Configuration point of view, add the following to solrconfig.xml
public class ExampleComponent extends SearchComponent
{
public static final String COMPONENT_NAME = "example";
@Override
public void prepare(ResponseBuilder rb) throws IOException
{
}
@SuppressWarnings("unchecked")
@Override
public void process(ResponseBuilder rb) throws IOException
{
DocSlice slice = (DocSlice) rb.rsp.getValues().get("response");
SolrIndexReader reader = rb.req.getSearcher().getReader();
SolrDocumentList rl = new SolrDocumentList();
int docId=0;//// at this point consider only one rootid.
for (DocIterator it = slice.iterator(); it.hasNext(); ) {
docId = it.nextDoc();
Document doc = reader.document(docId);
String id = (String)doc.get("id");
String connections = (String)doc.get("contains");
System.out.println("\n id:"+id+" contains-->"+connections);
List<String> list = new ArrayList<String>();
list.add(id);//add rootid too. If we have joins in solr4.0
int pos = 0, end;
while ((end = connections.indexOf(',', pos)) >= 0) {
list.add(connections.substring(pos, end));
pos = end + 1;
}
BooleanQuery bq = new BooleanQuery();
Iterator<String> cIter = list.iterator();
while (cIter.hasNext()) {
String anExp = cIter.next();
TermQuery tq = new TermQuery(new Term("id",anExp));
bq.add(tq, BooleanClause.Occur.SHOULD);
}
SolrIndexSearcher searcher = rb.req.getSearcher();
DocListAndSet results = new DocListAndSet();
results.docList = searcher.getDocList(bq, null, null,0, 100,rb.getFieldFlags());
System.out.println("\n results.docList-->"+results.docList.size() );
rl.setNumFound(results.docList.size());
rb.rsp.getValues().remove("response");
rb.rsp.add("response", results.docList);
}
}
@Override
public String getDescription() {
return "Information";
}
@Override
public String getVersion() {
return "Solr gur";
}
@Override
public String getSourceId() {
return "Satya Solr Example";
}
@Override
public String getSource() {
return "$URL: }
@Override
public URL[] getDocs() {
return null;
}
}
Subscribe to:
Posts (Atom)