noticed as part of a presentation & I started using it for my upcoming project.
Daily, I help teams with the solution engineering aspects of connected vehicle data projects (massive datasets, and always some new datasets with new car models, aka new technologies). Lately, in my spare time, I have been applying some ML/deep learning techniques to datasets (many are created based on observations of real datasets). I share some thoughts on my work here (the main half of this blog); the other half will be about my family and friends.
Friday, October 08, 2021
Monday, September 20, 2021
A case study on how to screw up a great product & team
Business and systems architects had a good vision for a critical enterprise data product. After some key decisions, everyone worked hard, and a few years later a good product was deployed and operational.
Some wanted to make it even greater. At the same time, someone in the leadership decided it was his baby and wanted to survive on it for the rest of his life, so he hired incompetent leaders to manage people.
As a result, contributors who consistently delivered were ignored while newcomers were promoted. This news alone devastated the key people. Once they realized what was going on, they simply moved on within weeks. All of this happened while the product was still evolving. Now a good product has become a mediocre one with a mediocre team, and quality issues have started (this is a SAD yet real story of a company that dreams BIG yet makes bad decisions).
A people leader needs to inspire (or, worst case, stay neutral); otherwise great turns into shit quickly.
More later....
Monday, August 02, 2021
Apache Spark JSON parsing: confusing errors
input json:
{
"shipping_address": {
"street_address": "1600 Pen Avenue NW",
"city": "Washington",
"state": "DC",
"type": "business",
"additionalProperties": {
"test": "one",
"test1": "two"
}
}
}
Spark code:
# File location and type
file_location = "/FileStore/tables/mock_example-1.json"
file_type = "json"
# CSV options
infer_schema = "false"
first_row_is_header = "false"
delimiter = ","
# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
.option("inferSchema", infer_schema) \
.option("header", first_row_is_header) \
.option("sep", delimiter) \
.load(file_location)
display(df)
Error:
Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column
Solution: by default, Spark expects one JSON message per line. With a formatted (multi-line) file, no single line is a valid record, so every row ends up in the corrupt record column.
In general we use Notepad etc. to format JSON examples (just to validate the structure of the documents). If you save the formatted JSON, Spark will fail with the above error.
The Notepad JSON plugin offers a compressJSON option too, so compress the file and save it. Then it works fine.
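The same compress step can also be done in plain Python instead of an editor plugin. This is only a minimal sketch: the function name `to_single_line_json` is made up for illustration, and the sample is a shortened copy of the input JSON above.

```python
import json


def to_single_line_json(pretty_text: str) -> str:
    """Parse formatted JSON and re-serialize it on a single line."""
    record = json.loads(pretty_text)
    # Compact separators and no indent: json.dumps emits no newlines here.
    return json.dumps(record, separators=(",", ":"))


# Shortened copy of the formatted input from this post.
pretty = """
{
  "shipping_address": {
    "street_address": "1600 Pen Avenue NW",
    "city": "Washington",
    "state": "DC"
  }
}
"""

one_line = to_single_line_json(pretty)
print(one_line)
```

Writing `one_line` to the file gives Spark the one-message-per-line layout it expects. Another route is the `multiLine` option of Spark's JSON reader (`.option("multiLine", "true")`), which reads formatted JSON without compressing it first.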
Saturday, April 24, 2021
Weekday (Monday) pure EV averages: distance driven and electricity consumed
Based on 50K vehicles, average distance driven/kWh consumed.
In box plot form
kWh consumption (data includes some outliers)
After outlier cleanup
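The post does not say how the outliers were removed. One common choice that matches a box plot is the 1.5 x IQR whisker rule, so here is a minimal sketch under that assumption. The function name `drop_iqr_outliers` and the kWh readings are hypothetical, not the real dataset.

```python
from statistics import quantiles


def drop_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed box-plot whisker rule)."""
    q1, _median, q3 = quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical Monday kWh readings, one per vehicle (illustrative only).
monday_kwh = [8.2, 9.5, 10.1, 11.0, 11.4, 12.3, 13.0, 95.0]

cleaned = drop_iqr_outliers(monday_kwh)
print(cleaned)
```

In this toy list only the 95.0 reading falls outside the whiskers, so it is the one value dropped before the averages are recomputed.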
Wednesday, March 24, 2021
Connected Car Q&A
Every day, I work on connected car data projects.
Lately a few people keep asking what that means, so I put together a brief Q&A.
It is something I am sharing (a small slice, but hopefully useful).
What is a connected car?
A car that has access to the Internet and can communicate with traditional automotive components, such as the engine and electronics, as well as the smart devices of a driver, all via the telematics* system.
What type of car data are we talking about, and how is it useful to a driver, vehicle owner, automaker, or 3rd party?
The most common use of car data is to improve the driving experience by collecting data about driver behavior events, i.e., from ignition on to ignition off. This data improves the following experiences for the driver.
· Finding fuel location/battery charging station as needed
· Local business searches and promotions
· Journey route weather/traffic updates
· Real-time data communications about emergency situations (flat tire, crash, etc.)
· Location sharing, fast theft response
· Insurance discounts based on good driving behaviors/usage-based discounts
For vehicle manufacturers:
a) Data helps to measure the performance/reliability of the vehicles. Data helps to pinpoint unforeseen issues with new & old vehicles. Data helps with voluntary recalls of vehicles for specific issues.
b) Data helps to catch fraudulent warranty claims/odometer tampering issues.
c) Various service offerings such as oil changes, end-of-life brake pad changes, etc.
d) Offering customer services, for example geofence boundaries for a family of drivers, including teen drivers.
For 3rd party companies:
Vehicle location data helps in forecasting live traffic conditions.
3rd party insurance providers can offer discounts based on driving behaviors and usage.
Automatic pothole information helps improve road conditions.
Near real-time weather data helps forecast real-time weather.
Usually telematics* systems are integrated with satellite navigation systems, onboard computers, and back-office systems. Besides collecting data, the back office also controls and refreshes the software inside cars via over-the-air updates.