Friday, February 15, 2008

Integrate full-text search functionality into your applications with Lucene (searching)

Part 2: Full-text searching.

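// Assumptions: the query text is passed in as queryString, and indexDir and
// the "content"/"path" fields are the ones used in Part 1 (indexing).
public List search(String queryString) throws IOException, ParseException {
    List searchResult = new ArrayList();
    IndexSearcher indexSearcher = null;

    // a) Get the searcher for the index.
    indexSearcher = new IndexSearcher(indexDir);

    // b) Build the default query parser.
    //    (Since it was a demo, I accept all wildcard characters in the input.
    //    It is pure fun to see how Lucene responds to wild queries.)
    QueryParser parser = new QueryParser("content", new StandardAnalyzer());
    Query query = parser.parse(queryString);

    // c) Now search the index.
    //    (We already built it in the earlier step.)
    Hits hits = indexSearcher.search(query);

    // Collect the stored path of every hit, in the order Lucene returns them.
    for (int i = 0; i < hits.length(); i++) {
        searchResult.add(hits.doc(i).get("path"));
    }
    indexSearcher.close();
    return searchResult;
}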

My conclusions:
Lucene is a pure Java product that provides:
* ranked searching; best results are returned first {I need to test more here}
* a good number of query types:
phrase queries, wildcard queries, proximity queries, range queries, etc.
* fielded searching (e.g., title, path, contents)
* date-range searching
* sorting by any field
* multiple-index searching (I am working on this one right now)
* simultaneous update and searching
I am looking forward to a C/C++ implementation.
I love C++ stuff because it fits into our product stack very nicely.
I am writing custom code to parse our source code (C, C++ & CORBA).
Soon I will update this blog with the key code blocks.

Wednesday, January 23, 2008

Integrate full-text search functionality into your applications with Lucene (indexing)

Part 1: Indexing your data.

Any full-text search functionality involves indexing the data first, and Lucene is no different in this approach. Once your data is indexed, it can run full-text searches very fast. I indexed 17,000 HTML files (my product documentation) in less than 5 minutes.

Creating the index writer & adding documents are the key methods.
The rest of the methods are for bookkeeping.
The following code indexes the .html and .htm files in a folder. (It recursively walks the nested folders & indexes each file.)


// your data
public static final String dataDir = "D:\\webapps\\help";
// the directory that is used to store the Lucene index
private final String indexDir = "D:\\help_index";

public static String src1 = "";   // path of the file currently being visited
public IndexWriter indexWriter;
public static int numF;           // number of files visited
public static int numD;           // number of directories visited

public void openIndexWriter() throws IOException
{
    Directory fsDirectory = FSDirectory.getDirectory(indexDir);
    Analyzer analyzer = new StandardAnalyzer();
    indexWriter = new IndexWriter(fsDirectory, true, analyzer);
    indexWriter.setWriteLockTimeout(IndexWriter.WRITE_LOCK_TIMEOUT * 100);
}

public void closeIndexWriter() throws IOException
{
    indexWriter.optimize();
    indexWriter.close();
}

public void indexFiles(String strPath) throws IOException
{
    File src = new File(strPath);
    if (src.isDirectory())
    {
        numD++;
        String list[] = src.list();
        // list() returns null when the directory cannot be read, so check
        // for that instead of catching NullPointerException.
        if (list == null) return;
        for (int i = 0; i < list.length; i++)
        {
            src1 = src.getAbsolutePath() + File.separatorChar + list[i];
            File file = new File(src1);
            /*
             * Try checks like read/write access here.
             */
            if (file.isDirectory()) indexFiles(src1);
            else
            {
                numF++;
                if (src1.endsWith(".html") || src1.endsWith(".htm")) {
                    addDocument(src1, indexWriter);
                }
            }
        }
    }
}
public boolean createIndex() throws IOException {
    if (ifIndexExist()) {
        return true;
    }
    File dir = new File(dataDir);
    if (!dir.exists()) {
        return false;
    }
    File[] htmls = dir.listFiles();
    Directory fsDirectory = FSDirectory.getDirectory(indexDir);
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriter indexWriter = new IndexWriter(fsDirectory, analyzer, true);
    for (int i = 0; i < htmls.length; i++) {
        String htmlPath = htmls[i].getAbsolutePath();
        if (htmlPath.endsWith(".html") || htmlPath.endsWith(".htm")) {
            addDocument(htmlPath, indexWriter);
        }
    }
    // Close the writer so the added documents are flushed to the index.
    indexWriter.close();
    return true;
}
/**
 * Add one document to the Lucene index
 */
public void addDocument(String htmlPath, IndexWriter indexWriter) {
    //System.out.println("\n adding file to index "+htmlPath );
    HTMLDocParser htmlParser = new HTMLDocParser(htmlPath);
    String path = htmlParser.getPath();
    String title = htmlParser.getTitle();
    Reader content = htmlParser.getContent();
    Document document = new Document();
    document.add(new Field("path", path, Field.Store.YES, Field.Index.NO));
    document.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
    document.add(new Field("content", content));
    try {
        indexWriter.addDocument(document);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

// Driver code. The method header is missing from the listing, so the one
// below is an assumption; im is an instance of the class that holds the
// methods above.
public static void main(String[] args) throws IOException {
    im.openIndexWriter();
    File src = new File(dataDir);
    if (!src.exists()) {
        System.out.println("\n DATA DIR DOES NOT EXIST");
        return;
    }
    long start = System.currentTimeMillis();
    System.out.println("\n INDEXING STARTED");
    im.indexFiles(dataDir);
    im.closeIndexWriter();
    long end = System.currentTimeMillis();
    long diff = (end - start) / 1000;
    System.out.println("\n Time (in seconds) taken to index the whole help = " + diff);
    System.out.println("Number of files :\t" + numF);
    System.out.println("Number of dirs :\t" + numD);
}
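
The folder walk is the part of the indexer that does not depend on Lucene at all, so it can be tried on its own. Below is a minimal sketch of that step only, not the code from my indexer: the class name HtmlFileWalker and its methods are made up for illustration, and unlike the listing above it matches the .html/.htm extensions case-insensitively and collects the files in a list instead of adding them to an index.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene or of the listing above): walks a
// folder tree and collects the .html/.htm files that would be indexed.
public class HtmlFileWalker {

    public static List<File> collectHtmlFiles(File root) {
        List<File> found = new ArrayList<File>();
        walk(root, found);
        return found;
    }

    private static void walk(File dir, List<File> found) {
        // listFiles() returns null when dir is not a directory or cannot be read.
        File[] children = dir.listFiles();
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                walk(child, found);          // recurse into nested folders
            } else if (isHtml(child.getName())) {
                found.add(child);
            }
        }
    }

    // Assumption: extensions are compared case-insensitively.
    public static boolean isHtml(String name) {
        String lower = name.toLowerCase();
        return lower.endsWith(".html") || lower.endsWith(".htm");
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : collectHtmlFiles(root)) {
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

Run it with the data folder as the only argument to see which files the indexer would pick up before spending time on a full indexing run.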

Friday, January 11, 2008

Populating a MySQL table from an MS Excel {aka .csv} file

I am working on a small project for a non-profit organization.
{I felt the Apache, PHP & MySQL combination fits their needs.
I will explain that application a little later.}

I have already received some data in an MS Excel file.
Strangely, some trailing columns are missing in some records after saving the Excel file as a .csv file.

The following command fails with MySQL "column truncated" errors.

mysql> load data infile 'C://bea//temple//dpexport.csv' into table donar_info_4 fields terminated by ',' OPTIONALLY ENCLOSED BY '"' lines terminated by '\n';

After spending a little more time with the MySQL documentation,
I found out that the IGNORE option does the magic &
I am able to load the csv files into my MySQL table. Just add IGNORE right after the file name in the infile clause (before "into table").

Wednesday, January 02, 2008

Is the Amazon Kindle the next iPod?

A few days back, I was a little early for a movie and was waiting on a sofa outside the theater.
Suddenly a young man sat next to me, seriously browsing the web on his Amazon Kindle. (What made me curious was that he was reading one of my favorite web sites, the New York Times.) As I was paying attention to the gadget, he slowly started talking about all the good things about it and offered to let me try it. I quickly checked the weight and the look and feel of a blog and an e-book. It was not heavy, and the look and feel was very good and natural. I liked it a lot. I was so tempted that, after coming home, I checked the Amazon website for the Kindle.
(It is a little pricey, it was sold out, and there seems to be a lot of demand for the gadget.) Most of the reviews are positive. This incident reminds me of an older one, my first experience with the iPod. Some 4+ years back, while coming back from my vacation (India -> London -> US), the person sitting next to me was proudly explaining and talking about his new iPod. (If I remember correctly, it was the second month after the iPod launch.) Now I am seeing the same thing happen with Amazon's Kindle. I liked the way it was designed (content downloads to the device automatically), the convenience, and the thought process behind the Kindle. I was surprised that Amazon came up with this kind of device. {Since its initial launch as a major online book store, this is the best thing from Amazon. My personal opinion.} I hope to own an Amazon Kindle very soon. It fits my taste.

Wednesday, December 19, 2007

An update on my progress in Kiva

Today I received an e-mail from Kiva with a happy holidays message, and I felt I needed to update my blog about my progress as a small Kiva lender. I discovered Kiva in April 2007 via a PBS (tpt) Frontline program, and I wrote about it on this same blog. I was amazed by the Kiva founder’s spirit, the website, and the ease of loaning $25 to entrepreneurs around the world. I felt this is the best way to help the “poorest of the poor” or "unlucky souls". I felt this is the right thing to do, so I have been contributing to Kiva whatever is left in my savings account. (In the end, I am an immigrant here, working hard for my living.) I used to donate some good $ to a few charities, but for some reason I felt this is the best way to make people responsible. {Donation creates more dependency.} The fact that I receive no interest on my $ loans is immaterial. The important thing is being repaid, in most cases. To date, my Kiva loan portfolio contains over 59 loans, including 5 loans that have been repaid completely and then re-loaned, & I reached my little goal of always having loans out to 50 entrepreneurs. Now I am targeting 100 entrepreneurs in the next two years.
Following is my Kiva lender page: http://www.kiva.org/lender/dhasa

Friday, November 16, 2007

Java DefaultListModel performance issues

Recently I received a big customer escalation in the search domain.
Basically, the end user was searching for enterprise information based on input criteria.
In our Java rich client, we show a simple dialog to select from the list of users in the enterprise. This action was taking 10+ minutes.
The UI works fine for a simple 100 to 1000 users.
However, the customer was testing with 10K-plus users.
After analyzing all the code on the server side, I finally looked at the client layer.
At the client, server data was getting added to a DefaultListModel with addElement() in a for loop.
The real culprit is the addElement() method: every call fires a list data event to the registered listeners.
After reading the implementation of this method & its sequence of event calls, and
a little bit of browsing on the Java forums, I found out that we should not use this class as-is for large lists. Yes, never use DefaultListModel directly for large lists. This problem still exists in JDK 1.5. There are multiple solutions to this problem. Just Google it. You will find many.
I made a quick fix based on some advice from Sun's forums. (Basically it is fast & I am seeing a 90% improvement.)

Steps:

1) Extend your DefaultListModel as shown below

class FastListModel extends DefaultListModel {
    private boolean listenersEnabled = true;

    public boolean getListenersEnabled() { return listenersEnabled; }

    public void setListenersEnabled(boolean enabled) { listenersEnabled = enabled; }

    public void fireIntervalAdded(Object source, int index0, int index1) {
        if (getListenersEnabled()) {
            super.fireIntervalAdded(source, index0, index1);
        }
    }
}

2) Add list listener to your list model

ListDataListener listener = new ListDataListener() {
    public void intervalAdded(ListDataEvent e) { list.ensureIndexIsVisible(e.getIndex1()); }
    public void intervalRemoved(ListDataEvent e) { }
    public void contentsChanged(ListDataEvent e) { }
};
model.addListDataListener(listener);

3) Turn on & off listener explicitly

model.setListenersEnabled(false);

// add content
for (int i = 0; i < ...; i++) { model.addElement(...); }

// now enable the listeners

model.setListenersEnabled(true);
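
To see the effect without a full UI, here is a small self-contained sketch of the same idea. It is not the production fix: the names BulkListModelDemo, QuietListModel, addAllQuietly and countEvents are invented for this example, and it uses the generic DefaultListModel that newer JDKs ship (Java 7 and later). It also fires one event for the whole added range once the bulk add is done, which is my assumption about how a view should be told to refresh after the listeners were muted.

```java
import javax.swing.DefaultListModel;
import javax.swing.event.ListDataEvent;
import javax.swing.event.ListDataListener;
import java.util.ArrayList;
import java.util.List;

public class BulkListModelDemo {

    // Same idea as FastListModel, plus a hypothetical bulk-add helper.
    static class QuietListModel<E> extends DefaultListModel<E> {
        private boolean listenersEnabled = true;

        @Override
        protected void fireIntervalAdded(Object source, int index0, int index1) {
            if (listenersEnabled) {
                super.fireIntervalAdded(source, index0, index1);
            }
        }

        // Adds every item with events muted, then fires a single event
        // covering the whole range that was added.
        public void addAllQuietly(List<? extends E> items) {
            if (items.isEmpty()) {
                return;
            }
            int first = getSize();
            listenersEnabled = false;
            try {
                for (E item : items) {
                    addElement(item);
                }
            } finally {
                listenersEnabled = true;
            }
            fireIntervalAdded(this, first, getSize() - 1);
        }
    }

    // Returns how many intervalAdded events a listener sees for n bulk-added items.
    public static int countEvents(int n) {
        final int[] events = {0};
        QuietListModel<String> model = new QuietListModel<String>();
        model.addListDataListener(new ListDataListener() {
            public void intervalAdded(ListDataEvent e) { events[0]++; }
            public void intervalRemoved(ListDataEvent e) { }
            public void contentsChanged(ListDataEvent e) { }
        });
        List<String> users = new ArrayList<String>();
        for (int i = 0; i < n; i++) {
            users.add("user" + i);
        }
        model.addAllQuietly(users);
        return events[0];
    }

    public static void main(String[] args) {
        System.out.println("events fired for 10000 users: " + countEvents(10000));
    }
}
```

With a plain DefaultListModel the listener would be called once per addElement(); here it is called once for the whole batch, which is where the time saving comes from.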

Wednesday, October 03, 2007

New Java Runtime methods

If you are building complex Java clients or Eclipse RCP based clients, always use
Java Runtime methods like maxMemory(), freeMemory(), totalMemory() etc. to know the memory usage of your application or module. It always helps.
Also use availableProcessors() wisely if your application is spawning too many threads.
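
A minimal sketch of how these calls can be put together. The class name RuntimeStats and its two helper methods are made up for illustration, and sizing threads by processor count is only a common rule of thumb, not something the JDK prescribes.

```java
public class RuntimeStats {

    private static final long MB = 1024L * 1024L;

    // Heap currently in use: the heap the JVM has allocated so far
    // minus the part of it that is still free.
    public static long usedMemoryBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Hypothetical sizing rule: one worker per available processor, at least one.
    public static int suggestedThreadCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MB)        : " + rt.maxMemory() / MB);
        System.out.println("allocated heap (MB)  : " + rt.totalMemory() / MB);
        System.out.println("free in heap (MB)    : " + rt.freeMemory() / MB);
        System.out.println("used heap (MB)       : " + usedMemoryBytes() / MB);
        System.out.println("available processors : " + rt.availableProcessors());
        System.out.println("suggested threads    : " + suggestedThreadCount());
    }
}
```

Logging these numbers before and after a heavy operation gives a rough picture of how much memory a module takes; the values are snapshots and move with garbage collection.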

Thursday, September 06, 2007

Hooray, XSLT is now part of JDK 1.5

XSLT is now part of JDK 1.5 itself.
(Thank god, we don’t have to download Xalan, Xerces etc.
& set up big class paths to author style sheets.)

On the minus side, the bundled Java implementation of Xalan sucks badly.
It works fine for simple transformation use cases
but fails for the complex ones.

(I will write up those test cases in the next post.)

Following is a sample Test.java that transforms the input XML into HTML (or any other output) using the Java Xalan libs.

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class Test
{
    public static void main(String[] args) throws Exception
    {
        Source source = new StreamSource(new FileInputStream("C://AE_html.xsl"));

        Transformer t = TransformerFactory.newInstance().newTransformer(source);

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new FileInputStream("c://users.xml"));

        System.out.println("Transforming...");
        t.transform(new DOMSource(doc), new StreamResult(new FileOutputStream("C://users.html")));

        //System.out.println( "doc as text content"+ doc.getTextContent() );
        //t.transform(new DOMSource(doc), new StreamResult(System.out));
    }
}
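
For a quick experiment that needs no files on disk, the same JDK classes also work on strings. This is only a sketch under stated assumptions: the class name InMemoryXslt, the sample user list and the tiny style sheet are all invented here and stand in for the real users.xml and AE_html.xsl, which are not shown in this post.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class InMemoryXslt {

    // Hypothetical sample input; stands in for users.xml.
    public static final String XML =
        "<users><user>alice</user><user>bob</user></users>";

    // Hypothetical style sheet; stands in for AE_html.xsl.
    public static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/users'>"
      + "<ul><xsl:for-each select='user'><li><xsl:value-of select='.'/></li></xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the style sheet to the XML and returns the result as a string.
    public static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers do not have to deal with a checked exception.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Keeping the XML and the style sheet in strings makes it easy to shrink a failing transformation down to a small test case.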

Thursday, August 30, 2007

MN State Fair pictures continued





We spent a lot of time at the pet animal section. Both Dhanvi & Saketh enjoyed it a lot.
I observed a big change in the poultry section (i.e. lots of new rabbits). This change is good.

Tuesday, August 28, 2007

MN State Fair pictures




A few pictures from the flower stand. Compared to last year's display, this year's flower stand is less attractive, but still a good one. I simply love this one. A must-see for me.

Monday, August 27, 2007

State Fair pictures






We went to the State Fair over the weekend. It has become a ritual to visit the State Fair every year.
A few pictures from the fair. I will post more later.



Monday, June 04, 2007

More online references

A few more alternatives to Wikipedia (www.wikipedia.org) that I am finding more useful:
Scholarpedia
www.scholarpedia.org

Conservapedia
www.conservapedia.org

Citizendium
www.citizendium.org

Wednesday, April 11, 2007

Became a member in kiva

As usual, yesterday, I was watching PBS documentaries.
The following program was really inspiring. (It is a small part of Frontline.)

(A Little Goes a Long Way)
Watch it online first.
http://www.pbs.org/frontlineworld/stories/uganda601/

I immediately decided to become a member of Kiva.
I believe in the concept (microfinance & direct lending to the needy).
I have watched many programs on microfinance,
but I never knew how to be part of it or whether an individual can join.
www.kiva.org helped me to contribute.
At present I am thinking of contributing to the same for another year.
I started with 6 & my goal is to help 50 all the time.
Please read the FAQ etc. on www.kiva.org before jumping in.

“It's a new, direct and sustainable way to fight global poverty, and the way I see it, I get a higher return on $25 helping someone build a future than the interest my checking account pays. “


Following is my lender page in kiva.
http://www.kiva.org/lender/dhasa

Wednesday, November 29, 2006

Saketh stills for passport applications. continued.


How to pull that camera?

See saketh stills for passport applications. continued.


How to stop these guys?

See saketh stills for passport applications. continued.


No more photos for the day. Got it.

See saketh stills for passport applications. continued.


How is my new still?

See saketh stills for passport applications. continued.


Why so many photos? I am sad, but I don't give up.

See saketh stills for passport applications.


After struggling for 1/2 hour, finally his mom gave up for the day.

Sunday, May 21, 2006