One of the object's string properties can hold arbitrary values.
Now I want to filter the list using different kinds of input patterns (aka single-field search), in particular accepting wildcards (*, ?, + etc.; escaping them is fun in Java).
sample code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The wildcard search "*+PATAC+*" translates to the regex ".*\+PATAC\+.*":
// '+' is a regex metacharacter, so it has to be escaped.
Pattern pat = Pattern.compile(".*\\+PATAC\\+.*");

String str = "*+PATAC+*";
Matcher matcher = pat.matcher(str);
boolean flag = matcher.find(); // true
System.out.println("1) matcher result->" + flag);
if (flag)
    System.out.println("pattern found: " + str);

str = "adjkfh+PATAC+ajdskfhhk";
matcher = pat.matcher(str);
flag = matcher.find(); // true
System.out.println("2) matcher result->" + flag);
if (flag)
    System.out.println("pattern found: " + str);

str = "PATAC";
matcher = pat.matcher(str);
flag = matcher.find(); // false - no surrounding '+' characters
System.out.println("3) matcher result->" + flag);
if (flag)
    System.out.println("pattern found: " + str);

str = "adjkfh+PATAC+";
matcher = pat.matcher(str);
flag = matcher.find(); // true
System.out.println("4) matcher result->" + flag);
if (flag)
    System.out.println("pattern found: " + str);

str = "+PATAC+testingsuffixchars";
matcher = pat.matcher(str);
flag = matcher.find(); // true
System.out.println("5) matcher result->" + flag);
if (flag)
    System.out.println("pattern found: " + str);
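To avoid hand-escaping every search, a small helper can translate whatever wildcard expression the user types into a regex. This is just a minimal sketch; WildcardFilter and wildcardToPattern are names made up here for illustration:

import java.util.regex.Pattern;

public class WildcardFilter
{
    // Translate a user-entered wildcard expression into a regex:
    // '*' means any run of characters, '?' means any single character,
    // and everything else (including '+') is quoted so it matches literally.
    public static Pattern wildcardToPattern(String wildcard)
    {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray())
        {
            if (c == '*')
                regex.append(".*");
            else if (c == '?')
                regex.append(".");
            else
                regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args)
    {
        Pattern pat = wildcardToPattern("*+PATAC+*");
        System.out.println(pat.matcher("adjkfh+PATAC+ajdskfhhk").matches()); // true
        System.out.println(pat.matcher("PATAC").matches());                 // false
    }
}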
Day to day I help teams with the solution engineering aspects of connected-vehicle data projects (massive datasets, and always some new dataset arriving with new car models, aka new technologies). Lately, in my spare time, I have been applying ML/deep learning techniques to datasets (many of them created based on observations of real datasets). I will share some thoughts on my work here (one half of this blog); the other half will be about my family and friends.
Tuesday, May 24, 2011
Sample code to create SOLR document from CSV file
One guy stopped by the office & asked me to index this legacy data, so the quick & dirty solution is the following. (Maybe Perl etc. would do it too, but this one gives me more flexibility.)
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;

public class CsvToSolrDoc
{
    public String columnName(int i)
    {
        // workarounds workarounds: map CSV column positions to SOLR field names
        if (i == 0) return "id";
        if (i == 1) return "whatever_you_want_as_field_name";
        return null;
    }

    public void csvToXML(String inputFile, String outputFile) throws java.io.FileNotFoundException, java.io.IOException
    {
        BufferedReader br = new BufferedReader(new FileReader(inputFile));
        FileWriter fw = new FileWriter(outputFile);
        String line = null;

        // Write the XML declaration and the root <add> element
        fw.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        fw.write("<add>\n");

        while ((line = br.readLine()) != null)
        {
            String[] values = line.split(",");
            fw.write("<doc>\n");
            for (int j = 0; j < values.length; j++)
            {
                // One <field> element per CSV column
                fw.write("<field name=\"" + columnName(j) + "\">");
                fw.write(values[j].trim());
                fw.write("</field>\n");
            }
            fw.write("</doc>\n");
        }

        // Now we're at the end of the file, so close the XML document,
        // flush the buffer to disk, and close the newly-created file.
        fw.write("</add>\n");
        fw.flush();
        fw.close();
        br.close();
    }

    public static void main(String argv[]) throws java.io.IOException
    {
        CsvToSolrDoc cp = new CsvToSolrDoc();
        cp.csvToXML("c:\\tmp\\m2.csv", "c:\\tmp\\m2.xml");
    }
}
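The quick & dirty version above does not escape XML special characters, so a value containing &, < or quotes would break the generated file. A small helper along these lines (a hypothetical escapeXml, not part of the original code) could be wrapped around values[j].trim() before writing:

// Hypothetical helper: escape the five XML special characters before
// writing a CSV value into the <field> element. '&' must be replaced first.
public static String escapeXml(String value)
{
    return value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
}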
SOLR project stories. Lack of SOLR post filter support
For the past few quarters, I have been working on a project to implement security on object documents. My goal is to decorate every SOLR document with an ACL field. This ACL field is used to determine which users have access to the document; the ACL syntax is something like +u(dave) -g(support), etc. My thought is to process these ACL fields after the search, i.e. I want to run the query component's results through some kind of post filter. However, SOLR does not offer any direct mechanism to specify such a post filter along with the search request. At the Lucene level there is an option to specify a filter, but the current AS-IS implementation sucks: it iterates over the entire document set, which is painful for large collections, and we also need SOLR's distributed search capabilities. On top of that, while computing the ACL fields I tried to encode user names etc. with Base64, URLEncoder.encode and the like. For a small set of strings this works OK, but for large sets it is a pain and ultimately hurts search performance.
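With no server-side hook, the interim approach amounts to filtering on the client after the results come back. A rough sketch of that idea (the document representation and the ACL check here are simplified placeholders, not the actual project code):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AclPostFilter
{
    // Drop every result document whose ACL field does not grant access
    // to the given user. The "acl" field is assumed to hold entries like "+u(dave)".
    public static List<Map<String, Object>> filterByAcl(
            List<Map<String, Object>> results, String user)
    {
        List<Map<String, Object>> allowed = new ArrayList<Map<String, Object>>();
        for (Map<String, Object> doc : results)
        {
            String acl = (String) doc.get("acl");
            if (acl != null && acl.contains("+u(" + user + ")"))
            {
                allowed.add(doc);
            }
        }
        return allowed;
    }
}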
Another blocker: the encoder/decoder test code below.
// Needs java.net.URLEncoder and java.net.URLDecoder; the enclosing method
// must declare throws java.io.UnsupportedEncodingException.
long startTime = System.currentTimeMillis();
String inputText = "Hello#$#%^#^&world";
String encodedText;
String decodedText;
for (int i = 0; i < 50000; i++)
{
    String baseString = i + " " + inputText;
    encodedText = URLEncoder.encode(baseString, "UTF-8");
    decodedText = URLDecoder.decode(encodedText, "UTF-8");
}
long endTime = System.currentTimeMillis();
long elapsedTime = endTime - startTime;
System.out.println("\n URLEncoder/decoder Elapsed Time = " + elapsedTime + "ms");
>>>>
Elapsed Time = 2246ms
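Since the same user and group names get encoded over and over, one obvious mitigation (my own assumption, not something SOLR provides) is to cache the encoded form so each distinct string is only encoded once:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.concurrent.ConcurrentHashMap;

public class EncodedAclCache
{
    private final ConcurrentHashMap<String, String> cache =
            new ConcurrentHashMap<String, String>();

    // Encode once, then serve repeats from the map instead of re-encoding.
    public String encode(String value) throws UnsupportedEncodingException
    {
        String cached = cache.get(value);
        if (cached == null)
        {
            cached = URLEncoder.encode(value, "UTF-8");
            cache.put(value, cached);
        }
        return cached;
    }
}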