Tuesday, November 18, 2014

My 2014 Black Friday wish list

With Black Friday approaching quickly, the following is my wish list.
I have a $1,500 budget + a $450 Best Buy gift card + a $50 Kohl's gift card.


a) Two laptops (Core i5 processor; sadly, my Toshiba laptop died this year.)
    Most likely I will go with Best Buy to use up the gift card, or
    the HP Pavilion 15 laptop with 15.6" screen from Office Depot
   
 b) Xbox 360 games
       Both Dhanvi & Saketh want Call of Duty: Advanced Warfare, & two other recent games
       All these games are $59.99
       Looking for a buy 2, get 3rd free deal
      
 c) A nice corner desk + desk chair (found something at Staples)

 d) A ladder, badly needed for the new house (Home Depot or Lowe's)

 e) A nice large rug for the living room (Sam's or Costco)


 f) Tri-ply stainless steel cookware

My take on the 2nd day of Lucene Revolution 2014 at Washington, DC

Following is a summary.
The day started with the SolrCloud-at-Apple presentation by the Apple team. It was a solid presentation in terms of the challenges faced during the SolrCloud implementation at Apple. They are still implementing, and it seems they need lots of new features (in the disaster-recovery space) and are trying to automate as much as possible. They called out some JIRA tasks, and they are contributing something back to the community. Finally, something back to the community. This session was followed by a Lucidworks presentation on scaling SolrCloud for massive data, which builds on top of the Apple story. Good one. After the above general sessions, I attended the following separate tracks.
a) Solr on HDFS – Past, Present, and Future (Introductory): running Solr on HDFS & the challenges.
Solid presentation by Mark Miller.
This one was just OK. I felt it was a purely hypothetical use case. He was saying some existing customer moved from Solr 3.1 to Solr 4.10 by simply copying the Solr 3.1 config & schema.xml files into Solr 4.10, etc., etc. I asked interesting questions after the session. Still, lots of things are hidden.
Solid proof-of-concept (POC) effort. (Basically, using Freebase, he is trying to classify the content.) At some point, I worked on similar POCs. Most likely some site will use it for some kind of subscription-based searches… I don't think this will make it big.
General talk on spatial search. Good one... Overall, I am facing different problems in this space. For my employer, geodistance is not very useful; I need true driving-distance-based search results. After the session, I also talked to David about this one. It seems this is not possible in the current Solr geo plugin space… I am thinking of forking David's original code and making it customer-specific.
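(For context, here is a minimal SolrJ sketch of the stock straight-line geo filtering discussed above; the field name "store", the Austin lat/lon point, and the endpoint are made-up values. This is great-circle distance only, which is exactly the limitation: no driving distance.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("{!geofilt sfield=store pt=30.26,-97.74 d=10}"); // within 10 km, straight line
q.addSort("geodist(store,30.26,-97.74)", SolrQuery.ORDER.asc);    // nearest first
QueryResponse rsp = server.query(q);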
Actually, I know the challenges in relevancy, and they are difficult to cover in a 30-minute session. There were not many alternatives, and a friend dragged me to the session anyway. Routine talk: search alone is not good enough, relevancy is needed, & you need to consider signals, end users, click-through, conversion rates… Don't expect much from a Ph.D. guy in a 30-minute talk.
Simply superb. The presentation started with the problem statement, i.e., the problem with image search, & a quick demo of how color search makes the difference. After the demo, he explained how they implemented it using Solr. I simply loved the approach.

10 Keys to Solr's Future, Grant Ingersoll, Lucidworks. A typical Grant talk. Nothing special. I was not expecting anything special, either.

After the Apple talk, I talked to the Apple Solr team (2 folks) about the size of their search team. The response was, "Oh, we are 8 to 10 people. We are small & our productivity is great," etc… I stopped listening to him. He does not know he is talking to someone who delivered 3 large-scale Solr enterprise implementations single-handedly….

Overall an average 2nd day. Talked to a few people about their Solr challenges.
One of the interesting conversations was about an enterprise cloud search implementation by the Hitachi folks.
Final thoughts… For 3 years in a row, I have been attending Lucene Revolution, & now I feel most of the sessions are repeats. (Maybe I am doing too many things in Solr.)
At this point, my take is that I will not attend next year's Lucene Revolution. (90% decided on this one.)

I will enclose a few pictures later.




Dhanvi and Saketh Chess Updates

After moving to Austin, Dhanvi joined a chess club & started playing USCF-rated chess tournaments.
With the last Spicewood Elementary tournament, he completed his first year.
The following picture shows his progress.




He dreams of becoming a GM & spends some time on chess daily.
For most of the year, he was consistent with his play.
The sad part is we are unable to find the right teacher. All his games are based on books & YouTube videos.

Hoping we will find a nice chess mentor.


Saketh is also following in his brother's footsteps.
He started 6 months later, with the 2014 spring Rackspace tournament.
Still, he is learning & winning & enjoying.


Monday, November 17, 2014

Few pictures from Spicewood Elementary Chess Tournament

Both Dhanvi & Saketh did well. (3.5 out of 5 points)
Considering that Dhanvi was sick, his performance was good.

I will write more later.



Friday, November 14, 2014

Lucene Revolution 2014 from Washington, DC, November 13

A few pictures from the keynote.



The keynote from the first CTO was fine. A different perspective on public/government partnership via IT.
I will write more about some of the sessions after reaching Austin.
The "Stump the Chump" session setup was very bad.
I have an interesting problem in the supply chain space. So far no luck. Hopefully I will touch base with a few other Lucene/Solr committers sometime on the 2nd day.



Thursday, October 30, 2014

Few pictures from Kealing Chess Tournament

Dhanvi's consistency is still an issue. I will post his one-year progress sometime during Thanksgiving weekend.


Thursday, September 04, 2014

On-demand refresh of Materialized views

The use case is simple: some OLTP system is committing & updating the database.
In the reporting system, users want to see the updates based on an end-user refresh.
The core idea: whenever the end user clicks "Refresh data", a REST API call invokes the following JDBC code to refresh the targeted materialized views.
(From my code vault... the good & old RDBMS days..)

    public RefreshDataResponse refreshView(String[] names) {

        RefreshDataResponse response = new RefreshDataResponse();
        HashMap<String, String> map = new HashMap<String, String>();

        long lStartTime = System.currentTimeMillis();
        Connection connection = null;
        try {
            connection = jdbcHelper.dataSource.getConnection();

            // JDBC call-escape syntax; '?' asks Oracle to force the refresh
            // (fast refresh if possible, complete refresh otherwise)
            String prefix = "{call DBMS_SNAPSHOT.REFRESH('";
            String suffix = "','?')}";

            for (String name : names) {
                String finalCall = prefix + name + suffix;
                System.out.println(" final call " + finalCall);

                CallableStatement stmt = connection.prepareCall(finalCall);
                stmt.execute(); // actually run the refresh for this view
                stmt.close();
                map.put(name, "TRUE"); // refreshed without throwing
            }
            response.setStatus("OK");
            response.setIdAndStatus(map);
        } catch (Exception e) {
            response.setStatus("ERROR");
            response.setErrorMessage(e.getLocalizedMessage());
            e.printStackTrace();
        } finally {
            try { if (connection != null) connection.close(); } catch (Exception ignore) {}
        }

        long lEndTime = System.currentTimeMillis();
        long difference = lEndTime - lStartTime;
        System.out.println("MV refresh elapsed ms: " + difference + " in sec: " + difference / 1000);

        return response;
    }
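
For completeness, here is a hedged sketch of how such a refresh endpoint could be exposed over REST. This is illustrative only: the JAX-RS wiring, the path, and the MvRefreshService holder are my assumptions, not the original service code.

import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Consumes;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Hypothetical JAX-RS wrapper; only refreshView() above is from the original post.
@Path("/views/refresh")
public class RefreshResource {

    private final MvRefreshService service = new MvRefreshService(); // assumed holder of refreshView()

    @POST
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    public RefreshDataResponse refresh(String[] viewNames) {
        // invoked when the end user clicks "Refresh data" in the reporting UI
        return service.refreshView(viewNames);
    }
}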

Tuesday, August 26, 2014

Search Relevancy Issues: a good example

My keyword search terms are "sony laptop" at bestbuy.com.
See the following screenshot with the search results.




Clearly I am looking for a Sony laptop, & the results contain all types of items,
including many marketplace items.
I am looking for items from Best Buy itself.

After selecting the 4GB/8GB facets & the Best Buy items tab, the system displays all the laptops.
See the following picture.




(You can try the same query at Amazon or Walmart and feel the difference.)

Search Findability Issues: a good example

I don't know what happened to the Best Buy keyword search.
Lately, after moving to the new home, I have been searching for fridges by model number.
See the following screenshot.
My input is WSF26C3EXF, the model number of the highly rated "Whirlpool 26.4-cu ft Side-by-Side Refrigerator".



How bad is it when a model number search fails with no results at all?

Try the same input at Lowe's or Amazon.
The item shows up like a charm.

The above case is one fine example demonstrating the "findability" issue of search on an ecommerce site.
The next topic is a few more relevancy examples.

Tuesday, August 19, 2014

Google Maps APIs… interesting findings. (Issues with the Google Maps API web services)

   For a while, I have been using some kind of geocoding service to figure out the longitude & latitude of a few small-business addresses so that we can suggest better service within a certain distance. As long as the address is correct, most geocoding services work OK. (Some are good at European & Asian addresses, whereas Google does a great job with US addresses.) However, the main focus of this post is that I ended up with some legacy data containing business addresses. (Most of the data was created in the early 90's; there was no consistent way addresses were created, updated, or maintained.) These are still valid businesses & doing business; however, our data is incorrect. See the following example.

https://maps.googleapis.com/maps/api/geocode/xml?address=3100%20N%20COMMERCE%20ST,FORT%20WORTH,MI,US

address:
3100 N COMMERCE ST, FORT WORTH, MI, US

Google comes back with a response containing a corrected address belonging to the state of TX & a TX Lat/Long:
3100 North Commerce Street, Fort Worth, TX 76164, USA


Based on the data set, I know that this business address belongs to the state of MI.
After ignoring the first address field, the Lat/Long now comes back as
Fort Worth Drive, Macomb, MI 48044, USA


Despite my input containing MI, US, Google does not suggest any valid Lat/Long values in MI.
So I end up making one more call that ignores the first field in my address.
I know two more use cases like this & will append them to this blog post.
Bottom line: dirty data leads to more dirty data.

Even with the Google Maps API, I end up calling twice.
Bad choice? Any other alternatives?
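
For what it's worth, a minimal sketch of the two-pass fallback I ended up with. geocode() and stateOf() are hypothetical helpers: one calls the Geocoding web service and returns the formatted address (or null), the other pulls the two-letter state out of that address.

// Pass 1: geocode the full address; trust it only if Google agrees on the state.
// Pass 2: drop the (possibly dirty) street line and geocode city/state alone.
public String resolveAddress(String street, String city, String state) throws Exception {
    String result = geocode(street + "," + city + "," + state + ",US");
    if (result != null && state.equals(stateOf(result))) {
        return result;
    }
    return geocode(city + "," + state + ",US"); // second, fallback call
}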

On a side note: for my input N COMMERCE ST, FORT WORTH, MI, US,
Bing Maps gives me the following Lat/Long coordinates.






Monday, August 18, 2014

SOLR deep pagination fix. (For now it scales)

So far, in my earlier Solr implementations of enterprise search, there was no need to export large amounts of data in Excel or CSV form. However, with the GM implementation, there is a need to export a range of search results (for example, rows 300K to 500K of 1 million search results found). This is to perform some kind of statistical analysis on the results.

The Solr REST API does allow fetching results with different start parameters in small blocks. However, the way pagination was implemented (with all the ranking math), as the start param increases, search response time increases linearly. The following Jira task explains more details.
https://issues.apache.org/jira/browse/SOLR-5463


See the start & QTime param values.

Solr 4.6.1 search request & response numbers:

/solr path=/select params={start=304000&q=*&json.nl=map&wt=javabin&version=2&rows=1000} hits=698889 status=0 QTime=1089


Solr 4.7.1 search request & response numbers:

/solr path=/select params={start=304000&q=*&json.nl=map&wt=javabin&version=2&rows=1000} hits=698889 status=0 QTime=108


Key SolrJ Java code changes to use the new "cursor mark" feature:

String cursorVal = "*"; // "*" is the required initial cursorMark value

for (int i = 0; i < noOfTrips; i++) {
    // build the search request
    // note: do NOT set a start offset here -- Solr rejects cursorMark
    // requests unless start is 0; the cursor itself tracks the position
    s.setCursorMark(cursorVal);

    // execute the search request
    SimpleSearchResponse searchResponse = performSimpleSearch1(s);
    // ... process the search results ...

    // carry the returned cursor mark into the next trip
    cursorVal = searchResponse.getCursorValue();
}

////////////
// inside the search helper, the cursor mark is passed to Solr:
if (cursorVal != null)
    query.set("cursorMark", cursorVal);
else
    query.set("cursorMark", "*");

// new SolrJ API method which returns the next cursor mark value
QueryResponse response = getServer().query(query);
String value = response.getNextCursorMark();
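
For reference, a minimal self-contained SolrJ deep-paging loop along the same lines. The collection URL and the uniqueKey field name "id" are placeholders; note that cursorMark requires a sort that includes the uniqueKey field.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class DeepPager {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(1000);
        q.setSort("id", SolrQuery.ORDER.asc); // cursorMark requires a sort on the uniqueKey
        String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor); // "cursorMark"
            QueryResponse rsp = server.query(q);
            // ... export rsp.getResults() to CSV/Excel here ...
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) break; // cursor stopped advancing: no more results
            cursor = next;
        }
        server.shutdown();
    }
}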


Sunday, March 30, 2014

Rackspace Spring 2014 Chess Tournament

Both Dhanvi & Saketh participated and won trophies.
For Saketh, this was his first tournament & it was a positive one.
I will add more details about this event in a blog post.
For now, a few pictures.

Wednesday, March 05, 2014

SOLR Velocity template based web UI for the database dictionary, aka a database walker

  The primary use case here: in most large IT organizations, lots of internal IT applications use some kind of RDBMS as a back-end, and over the years one will see hundreds of databases. Again, typical silo-style operations, i.e., each department focuses on its own needs only, so in general one ends up seeing lots of duplication of data. In my case, I ended up analyzing a very large database (hundreds of schemas or tablespaces, thousands of tables, and a 5-digit number of columns), plus views, materialized views, stored procedures, and many more.

  I noticed lots of people using Oracle SQL Developer for analysis, jumping from one table or view to other tables in other schemas. After seeing this, I wrote a small database walker. Its primary purpose is to crawl the entire Oracle data dictionary and produce XML. I feed this to Solr so that I can build a simple Google-like interface, using Solr's default Velocity-template-based web UI, to search for tables, columns, schemas, primary keys, and many more. I will host the entire project on GitHub. In this post, I am including the Oracle table metadata only (i.e., table columns, primary key, imported & exported keys, column metadata, etc.; I wrote more code to pull stored-procedure code, etc.).

  // Helper: writes one <field name="...">value</field> line.
  // Values are assumed XML-safe; add escaping if dictionary names can contain &, <, or >.
  private static void writeField(FileWriter fw, String name, String value) throws Exception {
      fw.write("<field name=\"" + name + "\">" + value + "</field>\n");
  }

  public static void tableInfo(DatabaseMetaData meta, String tableName,
                               String tableType, String schemaName) throws Exception {

      String catalog = null;
      String schemaPattern = schemaName;
      String tableNamePattern = tableName;
      String columnNamePattern = null;

      String outputFile = stageDir + schemaName + "_" + tableName + ".xml";

      File f = new File(outputFile);
      if (f.exists()) {
          System.out.print("Skipping -> " + outputFile);
          return;
      }

      FileWriter fw;
      try {
          fw = new FileWriter(outputFile);
      } catch (Exception e) {
          System.out.print("Error ... Skipping -> " + outputFile);
          return;
      }

      fw.write("<add>\n");
      fw.write("<doc>\n");

      // table-level fields
      writeField(fw, "id", tableName);
      writeField(fw, "tableName", tableName);
      writeField(fw, "tableType", tableType);
      writeField(fw, "schemaName", schemaName);

      // one colName/colMeta pair per physical column
      ResultSet result = meta.getColumns(catalog, schemaPattern, tableNamePattern, columnNamePattern);
      while (result.next()) {
          String columnName = result.getString("COLUMN_NAME");
          String columnTypeStr = result.getString("TYPE_NAME");

          writeField(fw, "colName", columnName);
          writeField(fw, "colMeta", columnName + "," + columnTypeStr);

          // pull logical (data-model) metadata, if we have any for this column
          String[] logicalData = LogicalMetadata.getLogicalData(schemaName, tableName, columnName);
          if (logicalData != null && logicalData.length >= 7) {
              writeField(fw, columnName + "_lan", logicalData[4]); // logical attribute name
              writeField(fw, columnName + "_lad", logicalData[6]); // logical attribute description
          }
      }
      result.close();

      // primary keys (de-duplicated)
      ResultSet result1 = meta.getPrimaryKeys(catalog, schemaName, tableNamePattern);
      HashSet<String> set = new HashSet<String>();
      while (result1.next()) {
          String columnName = result1.getString("COLUMN_NAME");
          if (!set.contains(columnName)) {
              writeField(fw, "primaryKey", columnName);
              set.add(columnName);
          }
      }
      result1.close();

      // exported keys: which tables reference this table
      ResultSet rs = meta.getExportedKeys(catalog, schemaPattern, tableNamePattern);
      while (rs.next()) {
          String fkTableName = rs.getString("FKTABLE_NAME");
          String fkColumnName = rs.getString("FKCOLUMN_NAME");
          int fkSequence = rs.getInt("KEY_SEQ");
          writeField(fw, "ExportedKeys_Table_Colum_Seq",
                  fkTableName + "." + fkColumnName + "." + fkSequence);
      }
      rs.close();

      // imported keys: which tables this table references
      ResultSet foreignKeys = meta.getImportedKeys(catalog, schemaName, tableNamePattern);
      while (foreignKeys.next()) {
          String fkTableName = foreignKeys.getString("FKTABLE_NAME");
          String fkColumnName = foreignKeys.getString("FKCOLUMN_NAME");
          String pkTableName = foreignKeys.getString("PKTABLE_NAME");
          String pkColumnName = foreignKeys.getString("PKCOLUMN_NAME");
          writeField(fw, "ImportedKeys_Table_Colum_Seq",
                  fkTableName + "." + fkColumnName + "." + pkTableName + "." + pkColumnName);
      }
      foreignKeys.close();

      fw.write("</doc>\n");
      fw.write("</add>\n");

      fw.flush();
      fw.close();
  }

Monday, March 03, 2014

Very old: XSLT/XSL code to convert an XML file into SOLR documents (aka XML)

Based on content needs, I have used different strategies to produce Solr input documents.
One of my older posts covers Excel input to Solr input documents.
In this particular use case, a simple XSLT stylesheet converts an incoming eCommerce XML content feed into Solr documents & pushes these documents to a Solr staging folder.
From there, our custom Solr component consumes these XML files to add or update content at regular intervals. I will post that custom Solr component logic later on this blog. For now, I am including a sample stylesheet which produces a Solr input XML document based on another XML input feed.

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" />

  <xsl:template match="/products">
     <add>
         <xsl:apply-templates />
     </add>
  </xsl:template>


 <xsl:template match="product">
     <doc>
       <xsl:element name="field">
            <xsl:attribute name="name">id</xsl:attribute>
            <xsl:value-of select="./@skuNo"/>
        </xsl:element>

       <xsl:element name="field">
            <xsl:attribute name="name">status</xsl:attribute>
            <xsl:value-of select="./@status"/>
        </xsl:element>
       <xsl:element name="field">
            <xsl:attribute name="name">brand</xsl:attribute>
            <xsl:value-of select="./brand"/>
        </xsl:element>
       <xsl:element name="field">
            <xsl:attribute name="name">description</xsl:attribute>
            <xsl:value-of select="./descriptions/description"/>
        </xsl:element>
       <xsl:element name="field">
            <xsl:attribute name="name">department</xsl:attribute>
            <xsl:value-of select="./department/@name"/>
        </xsl:element>

       <!-- one can debate later… this is for now -->



       <xsl:element name="field">
            <xsl:attribute name="name">categoryname</xsl:attribute>
            <xsl:value-of select="./class/@name"/>
        </xsl:element>


        <xsl:call-template name="addCurrentPrice">
        <xsl:with-param name="price" select="."/>
</xsl:call-template>

        <xsl:call-template name="addTaxonomyContent">
        <xsl:with-param name="skid" select="."/>
</xsl:call-template>

    </doc>
 </xsl:template>

<xsl:template name="addCurrentPrice">
    <xsl:param name="product"/>

      <xsl:for-each select="./offers/offer/prices/*">
          <xsl:element name="field">
               <xsl:attribute name="name"><xsl:value-of select="@type"/></xsl:attribute>
               <xsl:value-of select="@amount"/>
        </xsl:element>
      </xsl:for-each>
</xsl:template>

<xsl:template name="addTaxonomyContent">
    <xsl:param name="product"/>
      <xsl:for-each select="./hierarchy/*">
           <xsl:variable name="ctr" select="position()"/>
           <xsl:variable name="variable" select="concat('tname_',$ctr)"/>

          <xsl:element name="field">
               <xsl:attribute name="name"><xsl:value-of select="$variable"/></xsl:attribute>
               <xsl:value-of select="@name"/>
        </xsl:element>
      </xsl:for-each>
</xsl:template>


</xsl:stylesheet>
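
To round this out, a minimal sketch of running the transform from Java via JAXP. The file names feed.xml and feed-to-solr.xsl and the staging path are made up for illustration; any feed matching the /products structure above works.

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FeedToSolrXml {
    public static void main(String[] args) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("feed-to-solr.xsl"));
        // writes the <add><doc>...</doc></add> Solr document into the staging folder
        t.transform(new StreamSource("feed.xml"),
                    new StreamResult("staging/solr-input.xml"));
    }
}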

Sunday, March 02, 2014

Sloppy Kohl's sale sign

After moving to Austin, TX, this is the 2nd time I have visited the nearby Kohl's location (zip code 78759).
During the first visit, in the last week of December, I noticed the store was very sloppy. I thought it was because of the peak sales season. (I was comparing w.r.t. the Kohl's Minneapolis stores.)
Yesterday, I went to buy shoes for Saketh. The store is still sloppy, with few customers.
See the following picture. (The sale price is greater than the regular price.)
I keep getting too many e-mails/physical mailers about sales; however, at the store level their execution is very poor. Maybe it is time Kohl's management slowed down on sales promotion & focused on the basics.


Wednesday, February 26, 2014

Personal stock portal… YQL experiments.

   In general, lots of portals provide stock quote information. For example, at finance.yahoo.com, one can enter a valid stock ticker & get the stock information. From that point, if you want to know more about the company profile, you have to click one more link, i.e., one more web service call. If you want to know last year's statistics, that's one more click, one more web service call. A couple of years back, after learning a little bit of YQL, I did a small POC: given a stock ticker, the system displays a complete snapshot of the company in one click. It was a fun ride; however, with all the bugs in YQL, it remained a POC only.

Tons of code, but I will dump a few important methods.

////
String stockSym = "AAPL";
SupplierSection stockInfo = new SupplierSection(stockSym);
stockInfo.createHttpClient();
stockInfo.crawlFinanceInfo(stockSym);
stockInfo.crawlProfileInfo(stockSym);
stockInfo.crawlKeyStats(stockSym);
stockInfo.crawlRssInfo(stockSym);
stockInfo.crawlQuantInfo(stockSym);

//stockInfo.printAllPublicInfo();
try {
    //stockInfo.writeToXmlFile();
    //stockInfo.writeToXmlFile1(stockSym);
    stockInfo.writehtmlFile1(stockSym);
} catch (Exception e) {
    e.printStackTrace();
}


 ///basic company info
 public void crawlProfileInfo(String stockSym) {
   try {
     // the encoded YQL below decodes to:
     //   select * from yahoo.finance.stocks where symbol="<stockSym>"
     GetMethod httpGet = new GetMethod("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.stocks%20where%20symbol%3D%22"+stockSym+"%22&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys");
     httpGet.setFollowRedirects( true );

     System.out.println("executing request in crawlProfileInfo " );
     int responseCode = httpClient.executeMethod(httpGet);
     byte[] data = httpGet.getResponseBody();

     if (responseCode >= 400) {
       System.out.println("Failed to send request "+ responseCode);
     } else {
       String xmlString = new String( data, ENCODING );

       DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
       DocumentBuilder db = factory.newDocumentBuilder();
       InputSource inStream = new InputSource();
       inStream.setCharacterStream(new StringReader(xmlString));
       Document doc = db.parse(inStream);

       // pull the handful of profile fields we care about
       String[] fields = { "CompanyName", "Sector", "Industry",
                           "FullTimeEmployees", "start", "end" };
       for (String fName : fields) {
         profileInfo.put(fName, getFieldValue(doc, fName));
       }
     }
   } catch (Exception e) {
     e.printStackTrace();
   } finally {
   }
 }


 public void crawlKeyStats(String stockSym) {
   try {    
         GetMethod httpGet = new GetMethod("http://query.yahooapis.com/v1/public/yql?q=SELECT%20*%20FROM%20yahoo.finance.keystats%20WHERE%20symbol%3D'"+stockSym+"'&diagnostics=false&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys");
         httpGet.setFollowRedirects( true );
           
     System.out.println("executing request in crawlKeyStats" );
     int responseCode = httpClient.executeMethod(httpGet);
     byte[] data = httpGet.getResponseBody();
   

     if (responseCode >= 400) {
     System.out.println("Failed to send request "+ responseCode);
     }else{
     
    String  xmlString = new String( data, ENCODING );
    System.out.println("xml string "+ xmlString);

       DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
       DocumentBuilder db = factory.newDocumentBuilder();
       InputSource inStream = new InputSource();
       inStream.setCharacterStream(new StringReader(xmlString));
       Document doc = db.parse(inStream); 
       processKeyStatsResponse(doc);

     }

   } catch (Exception e) {
     e.printStackTrace();
   } finally {
   }
 }
 
 
 public void crawlRssInfo(String stockSym) {
  
   try {    
         GetMethod httpGet = new GetMethod("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D'http%3A%2F%2Ffinance.yahoo.com%2Fq%3Fs%3D"+stockSym+"'%20and%20xpath%3D'%2F%2Fdiv%5B%40id%3D%22yfi_headlines%22%5D%2Fdiv%5B2%5D%2Ful%2Fli%2Fa'&diagnostics=true");
         httpGet.setFollowRedirects( true );
           
     System.out.println("executing request in crawlRssInfo" );
     int responseCode = httpClient.executeMethod(httpGet);
     byte[] data = httpGet.getResponseBody();
   

     if (responseCode >= 400) {
     System.out.println("Failed to send request "+ responseCode);
     }else{
    String  xmlString = new String( data, ENCODING );
    //System.out.println("xml string "+ xmlString);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = factory.newDocumentBuilder();
        InputSource inStream = new InputSource();
        inStream.setCharacterStream(new StringReader(xmlString));
        Document doc = db.parse(inStream); 
        processRssResponse(doc); 
       }
     
   } catch (Exception e) {
     e.printStackTrace();
   } finally {
   }  
 
 
    }
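
One closing thought on the code above: the hand-encoded YQL URLs are brittle. Here is a small hypothetical helper (not from the original POC) that builds them with URLEncoder instead:

import java.net.URLEncoder;

// Builds a YQL request URL from a plain-text query, letting URLEncoder
// handle the percent-encoding instead of doing it by hand.
public static String buildYqlUrl(String yql) throws Exception {
    return "http://query.yahooapis.com/v1/public/yql?q="
        + URLEncoder.encode(yql, "UTF-8")
        + "&env=" + URLEncoder.encode("store://datatables.org/alltableswithkeys", "UTF-8");
}

// usage:
// String url = buildYqlUrl("select * from yahoo.finance.stocks where symbol=\"AAPL\"");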