JSON
This document explains how to load JSON format data files into Doris. Doris supports loading standard JSON format data and can flexibly handle different JSON data structures through parameter configuration, supporting field extraction from JSON data and handling nested structures.
Loading Methods
The following loading methods support JSON format data:
Supported JSON Formats
Doris supports the following three JSON formats:
Multiple Rows Represented as Array
Suitable for batch loading multiple rows of data, requirements:
- Root node must be an array
- Each element in the array is an object representing a row of data
- Must set
strip_outer_array=true
Example data:
[
{"id": 123, "city": "beijing"},
{"id": 456, "city": "shanghai"}
]
// Supports nested structures
[
{"id": 123, "city": {"name": "beijing", "region": "haidian"}},
{"id": 456, "city": {"name": "beijing", "region": "chaoyang"}}
]
Single Row Represented as Object
Suitable for loading single row data, requirements:
- Root node must be an object
- The entire object represents one row of data
Example data:
{"id": 123, "city": "beijing"}
// Supports nested structures
{"id": 123, "city": {"name": "beijing", "region": "haidian"}}
Typically used with Routine Load method, such as single messages in Kafka.
Multiple Object Rows Separated by Delimiter
Suitable for batch loading multiple rows of data, requirements:
- Each line is a complete JSON object
- Must set
read_json_by_line=true
- Line delimiter can be specified using
line_delimiter
parameter, default is\n
Example data:
{"id": 123, "city": "beijing"}
{"id": 456, "city": "shanghai"}
Parameter Configuration
Parameter Support
The following table lists the JSON format parameters supported by various loading methods:
Parameter | Default Value | Stream Load | Broker Load | Routine Load | TVF |
---|---|---|---|---|---|
json paths | None | jsonpaths | properties.jsonpaths | properties.jsonpaths | jsonpaths |
json root | None | json_root | properties.json_root | properties.json_root | json_root |
strip outer array | false | strip_outer_array | properties.strip_outer_array | properties.strip_outer_array | strip_outer_array |
read json by line | false | read_json_by_line | Always true | Not supported | read_json_by_line, default true |
fuzzy parse | false | fuzzy_parse | properties.fuzzy_parse | Not supported | fuzzy_parse |
num as string | false | num_as_string | properties.num_as_string | properties.num_as_string | num_as_string |
- Stream Load: Parameters are specified directly through HTTP Headers, e.g.,
-H "jsonpaths: $.data"
- Broker Load: Parameters are specified through
PROPERTIES
, e.g.,PROPERTIES("jsonpaths"="$.data")
- Routine Load: Parameters are specified through
PROPERTIES
, e.g.,PROPERTIES("jsonpaths"="$.data")
- TVF: Parameters are specified in TVF statements, e.g.,
S3("jsonpaths"="$.data")
Parameter Description
JSON Path
- Purpose: Specifies how to extract fields from JSON data
- Type: String array
- Default Value: None, defaults to matching column names
- Usage Examples:
-- Basic usage
["$.id", "$.city"]
-- Nested structures
["$.id", "$.info.city", "$.data[0].name"]
JSON Root
- Purpose: Specifies the parsing starting point for JSON data
- Type: String
- Default Value: None, defaults to parsing from root node
- Usage Example:
-- Original data
{
"data": {
"id": 123,
"city": "beijing"
}
}
-- Set json_root
json_root = $.data
Strip Outer Array
- Purpose: Specifies whether to remove the outermost array structure
- Type: Boolean
- Default Value: false
- Usage Example:
-- Original data
[
{"id": 1, "city": "beijing"},
{"id": 2, "city": "shanghai"}
]
-- Set strip_outer_array=true
Read JSON By Line
- Purpose: Specifies whether to read JSON data line by line
- Type: Boolean
- Default Value: false
- Usage Example:
-- Original data (one complete JSON object per line)
{"id": 1, "city": "beijing"}
{"id": 2, "city": "shanghai"}
-- Set read_json_by_line=true
Fuzzy Parse
- Purpose: Accelerates JSON data loading efficiency
- Type: Boolean
- Default Value: false
- Limitations:
- Field order in each row of the Array must be identical
- Usually used with strip_outer_array
- Performance: Can improve loading efficiency by 3-5 times
Num As String
- Purpose: Specifies whether to parse JSON numeric types as strings
- Type: Boolean
- Default Value: false
- Use Cases:
- Handling numbers outside numeric range
- Avoiding numeric precision loss
- Usage Example:
-- Original data
{
"id": "12345678901234567890",
"price": "99999999.999999"
}
-- Set num_as_string=true, price field will be parsed as string
Relationship between JSON Path and Columns
During data loading, JSON Path and Columns serve different responsibilities:
JSON Path: Defines data extraction rules
- Extracts fields from JSON data according to specified paths
- Extracted fields are reordered according to the order defined in JSON Path
Columns: Defines data mapping rules
- Maps extracted fields to target table columns
- Can perform column reordering and transformation
These two parameters are processed serially: first, JSON Path extracts fields from source data and forms an ordered dataset, then Columns maps these data to table columns. If Columns is not specified, extracted fields will be mapped directly according to table column order.
Usage Examples
Using JSON Path Only
Table structure and data:
-- Table structure
CREATE TABLE example_table (
k2 int,
k1 int
);
-- JSON data
{"k1": 1, "k2": 2}
Load command:
curl -v ... -H "format: json" \
-H "jsonpaths: [\"$.k2\", \"$.k1\"]" \
-T example.json \
http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load
Load result:
+------+------+
| k1 | k2 |
+------+------+
| 2 | 1 |
+------+------+
Using JSON Path + Columns
Using the same table structure and data, adding columns parameter:
Load command:
curl -v ... -H "format: json" \
-H "jsonpaths: [\"$.k2\", \"$.k1\"]" \
-H "columns: k2, k1" \
-T example.json \
http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load
Load result:
+------+------+
| k1 | k2 |
+------+------+
| 1 | 2 |
+------+------+
Field Reuse
Table structure and data:
-- Table structure
CREATE TABLE example_table (
k2 int,
k1 int,
k1_copy int
);
-- JSON data
{"k1": 1, "k2": 2}
Load command:
curl -v ... -H "format: json" \
-H "jsonpaths: [\"$.k2\", \"$.k1\", \"$.k1\"]" \
-H "columns: k2, k1, k1_copy" \
-T example.json \
http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load
Load result:
+------+------+---------+
| k2 | k1 | k1_copy |
+------+------+---------+
| 2 | 1 | 1 |
+------+------+---------+
Nested Field Mapping
Table structure and data:
-- Table structure
CREATE TABLE example_table (
k2 int,
k1 int,
k1_nested1 int,
k1_nested2 int
);
-- JSON data
{
"k1": 1,
"k2": 2,
"k3": {
"k1": 31,
"k1_nested": {
"k1": 32
}
}
}
Load command:
curl -v ... -H "format: json" \
-H "jsonpaths: [\"$.k2\", \"$.k1\", \"$.k3.k1\", \"$.k3.k1_nested.k1\"]" \
-H "columns: k2, k1, k1_nested1, k1_nested2" \
-T example.json \
http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load
Load result:
+------+------+------------+------------+
| k2 | k1 | k1_nested1 | k1_nested2 |
+------+------+------------+------------+
| 2 | 1 | 31 | 32 |
+------+------+------------+------------+
Usage Examples
This section demonstrates the usage of JSON format in different loading methods.
Stream Load
# Using JSON Path
curl --location-trusted -u <user>:<passwd> \
-H "format: json" \
-H "jsonpaths: [\"$.id\", \"$.city\"]" \
-T example.json \
http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load
# Specifying JSON root
curl --location-trusted -u <user>:<passwd> \
-H "format: json" \
-H "json_root: $.events" \
-T example.json \
http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load
# Reading JSON by line
curl --location-trusted -u <user>:<passwd> \
-H "format: json" \
-H "read_json_by_line: true" \
-T example.json \
http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load
Broker Load
-- Using JSON Path
LOAD LABEL example_db.example_label
(
DATA INFILE("s3://bucket/path/example.json")
INTO TABLE example_table
FORMAT AS "json"
PROPERTIES
(
"jsonpaths" = "[\"$.id\", \"$.city\"]"
)
)
WITH S3
(
...
);
-- Specifying JSON root
LOAD LABEL example_db.example_label
(
DATA INFILE("s3://bucket/path/example.json")
INTO TABLE example_table
FORMAT AS "json"
PROPERTIES
(
"json_root" = "$.events"
)
)
WITH S3
(
...
);
Routine Load
-- Using JSON Path
CREATE ROUTINE LOAD example_db.example_job ON example_table
PROPERTIES
(
"format" = "json",
"jsonpaths" = "[\"$.id\", \"$.city\"]"
)
FROM KAFKA
(
...
);
TVF Load
-- Using JSON Path
INSERT INTO example_table
SELECT *
FROM S3
(
"path" = "s3://bucket/example.json",
"format" = "json",
"jsonpaths" = "[\"$.id\", \"$.city\"]",
...
);
-- Specifying JSON root
INSERT INTO example_table
SELECT *
FROM S3
(
"path" = "s3://bucket/example.json",
"format" = "json",
"json_root" = "$.events",
...
);