Input JSON:
{
"shipping_address": {
"street_address": "1600 Pen Avenue NW",
"city": "Washington",
"state": "DC",
"type": "business",
"additionalProperties": {
"test": "one",
"test1": "two"
}
}
}
Spark code:
# File location and type
file_location = "/FileStore/tables/mock_example-1.json"
file_type = "json"
# CSV options
infer_schema = "false"
first_row_is_header = "false"
delimiter = ","
# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
.option("inferSchema", infer_schema) \
.option("header", first_row_is_header) \
.option("sep", delimiter) \
.load(file_location)
display(df)
Error:
Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column
Solution: Spark usually expects one JSON record per line.
We generally use an editor such as Notepad to format JSON examples (just to validate the structure of the documents). If you save the formatted, multi-line JSON, Spark cannot parse the individual lines as complete records, so it fails with the above error.
The Notepad JSON plugin also offers a compress JSON option, so compress the document to a single line and save it. After that it works fine.
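If an editor plugin is not at hand, the same compression can be done with a few lines of Python. This is only a minimal sketch using the standard json module; the helper name compact_json_text is hypothetical, and the sample text is the input JSON from above.

```python
import json


def compact_json_text(pretty_text: str) -> str:
    """Re-serialize a formatted JSON document onto a single line.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    parsed = json.loads(pretty_text)
    # separators=(",", ":") drops the spaces json.dumps adds by default
    return json.dumps(parsed, separators=(",", ":"))


# The formatted input JSON from above, as it would look when saved pretty-printed
pretty = """{
  "shipping_address": {
    "street_address": "1600 Pen Avenue NW",
    "city": "Washington",
    "state": "DC",
    "type": "business",
    "additionalProperties": {
      "test": "one",
      "test1": "two"
    }
  }
}"""

compact = compact_json_text(pretty)
print(compact)
```

Saving the compact string as the file content gives Spark one record per line. Alternatively, Spark's JSON reader has a multiLine option (for example .option("multiLine", "true") on the reader above) that should let it parse the formatted file directly; I have not run it against this exact file.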