Wednesday, January 29, 2020

Hive Timestamp column, default query results behavior, common misunderstanding and formatting to a target zone


    Similar to many RDBMS, Hive Timestamp datatype and it is stored as bigint and are stored as an offset from the UNIX epoch. Usually Timestamps are interpreted to be timezone-less. However end user runs a Hive query (via Hue or any tools), it returns the columns in the default server time zone. (in our case, EST ad team started saying your app is   not saving data in UTC, we need UTC format etc. etc.  After spending sometime with Hive language manual and with some of the past issues, I found an work around via Hive JIRA.

Idea of this query is vehicle sensor data is stored in data table.
Intent of the query is to know what is the value of a sensor on a given day.



select  ID, element_cd,
from_utc_timestamp(to_utc_timestamp(from_unixtime(unix_timestamp(timstm_column) ),'EST'), 'UTC'),
element_val
from  data_elements
where yr_nbr = 2019 and mth_nbr = 11 and day_nbr in (30)
and element_cd='POWER_LEVEL'


Lot’s of indirection. i.e. taking timestamp column & formatting it to unix timestamp and then first converting to default time zone and then to target time zone.  ( Above query returns timestamp value as UTC formatt.  My  HIVE servers are running in EST)

No comments: