Friday, October 08, 2021

Loving Excalidraw

I noticed it in a presentation and have started using it for my upcoming project.



Monday, September 20, 2021

A case study on how to screw up a great product & team

Business and systems architects had a good vision for a critical enterprise data product. After some key decisions, everyone worked hard, and a few years later a good product was deployed and operational.

Some wanted to make it too great, and at the same time someone in leadership decided it was his baby and wanted to live off it for the rest of his life. So he hired incompetent leaders to manage people.

As a result, contributors who had consistently delivered were ignored while newcomers were promoted. This news alone devastated the key people. Once they realized what was happening, they simply moved on within weeks. All of this happened while the product was still evolving. Now a good product has a mediocre product team, and quality issues have started. (This is a SAD yet real story of a company that dreams BIG yet makes bad decisions.)

A people leader needs to inspire (or, worst case, stay neutral); otherwise great turns into shit quickly.

More later....


Monday, August 02, 2021

Apache Spark JSON parsing: confusing errors

Input JSON:

{
  "shipping_address": {
    "street_address": "1600 Pen Avenue NW",
    "city": "Washington",
    "state": "DC",
    "type": "business",
    "additionalProperties": {
      "test": "one",
      "test1": "two"
    }
  }
}

Spark code:

# File location and type
file_location = "/FileStore/tables/mock_example-1.json"
file_type = "json"

# CSV options
infer_schema = "false"
first_row_is_header = "false"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)



Error: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column

Solution: by default, Spark expects one JSON message per line.

We generally use Notepad or a similar editor to format JSON examples (just to validate the structure of the documents). If you save the formatted JSON, Spark will fail with the above error.

The Notepad JSON plugin also offers a compressJSON option, so compress the document and save it. It then works fine.
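
The compress-and-save step can also be done in code rather than with an editor plugin. Below is a minimal sketch using only the Python standard library, assuming the document is small enough to hold in memory. The function name compact_json and the shortened sample document are illustrative and not part of the original notebook.

```python
import json


def compact_json(pretty_text: str) -> str:
    """Return the JSON document re-serialized on a single line."""
    # Parsing first also validates the structure of the document.
    parsed = json.loads(pretty_text)
    # Compact separators drop the whitespace a formatter inserts.
    return json.dumps(parsed, separators=(",", ":"))


# Shortened, pretty-printed sample (illustrative only).
pretty = """
{
  "shipping_address": {
    "city": "Washington",
    "state": "DC"
  }
}
"""

one_line = compact_json(pretty)
print(one_line)
```

Writing one_line back to the file gives Spark the one-message-per-line layout it reads by default. Alternatively, the Spark JSON reader accepts .option("multiLine", "true") for a document that spans several lines.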


Saturday, April 24, 2021

Weekday (Monday) pure EV averages: distance driven and electricity consumed

Based on 50K vehicles, average distance driven/kWh consumed.



In box plot form





kWh consumption (data includes some outliers)



After outlier cleanup
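
The post does not say which cleanup rule was used. As one common possibility, here is a sketch of the 1.5 x IQR rule (the same rule a box plot uses for its whiskers). The function name remove_outliers_iqr and the kWh readings are made up for illustration and are not the 50K-vehicle data.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up daily kWh readings; 95.0 stands in for an outlier.
daily_kwh = [7.8, 8.1, 8.4, 8.9, 9.3, 9.6, 10.2, 10.5, 11.0, 95.0]
cleaned = remove_outliers_iqr(daily_kwh)

print("mean before cleanup:", statistics.mean(daily_kwh))
print("mean after cleanup:", statistics.mean(cleaned))
```

A single extreme reading pulls the mean up sharply, which is why the averages are worth comparing before and after cleanup.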












Wednesday, March 24, 2021

Connected Car Q&A

Every day, I work on connected car data projects.

Lately, a few people have repeatedly asked what that means, so I put together a brief Q&A.

I am sharing it here. (It is a small slice, but I hope it is useful.)


What is a connected car?

    A car that has access to the Internet and can communicate with traditional automotive components, such as the engine and electronics, as well as with the driver's smart devices, all via a telematics* system.

What type of car data are we talking about, and how is it useful to a driver, vehicle owner, automaker, or 3rd party?

The most common use of car data is to improve the driving experience by collecting data about driver behavior events, i.e., from ignition on to ignition off. This data improves the following experiences for the driver:

·        Finding fuel location/battery charging station as needed

·        Local business searches and promotions

·        Journey route weather/traffic updates

·        Real-time data communication about emergency situations (flat tire, crash, etc.)

·        Location sharing, fast theft response

·        Insurance discounts based on good driving behaviors/usage-based discounts

For vehicle manufacturers:

a)      Data helps to measure the performance/reliability of the vehicles. Data helps to pinpoint unforeseen issue(s) with new & old vehicles. Data helps with voluntary recalls of vehicles for specific issues.

b)     Data helps to catch fraudulent warranty claims/odometer tampering issues

c)      Various service offerings, such as oil changes and end-of-life brake pad changes

d)     Offering customer services, for example geofence boundaries for a family of drivers that includes teen drivers
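
To make the geofence idea in (d) concrete, here is a minimal sketch of a circular geofence check based on the haversine great-circle distance. It is only an illustration of the concept, not how any particular automaker implements it. The function names, the coordinates, and the 5 km radius are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def inside_geofence(position, center, radius_km):
    """True if the reported position lies within the circular fence."""
    return haversine_km(*position, *center) <= radius_km


# Hypothetical fence: 5 km around a home location.
home = (38.8977, -77.0365)
nearby = (38.9072, -77.0369)    # roughly 1 km from home
far_away = (39.2904, -76.6122)  # tens of km from home

print(inside_geofence(nearby, home, 5.0))
print(inside_geofence(far_away, home, 5.0))
```

A back office service could run a check like this on each reported vehicle position and notify the account owner when a teen driver leaves the boundary.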

For 3rd party companies:

·        Vehicle location data helps in forecasting live traffic conditions

·        3rd party insurance providers can offer discounts based on driving behaviors and usage

·        Automatically reported pothole information helps improve road conditions

·        Near-real-time weather data helps forecast real-time weather

 

Usually, telematics* systems are integrated with satellite navigation systems, onboard computers, and back office systems. Beyond data collection, the back office also controls and refreshes the software inside cars via over-the-air updates.