JSON

This document explains how to load JSON format data files into Doris. Doris supports loading standard JSON format data and can flexibly handle different JSON data structures through parameter configuration, supporting field extraction from JSON data and handling nested structures.

Loading Methods

The following loading methods support JSON format data:

Supported JSON Formats

Doris supports the following three JSON formats:

Multiple Rows Represented as Array

Suitable for batch loading multiple rows of data, requirements:

Root node must be an array
Each element in the array is an object representing a row of data
Must set strip_outer_array=true

Example data:

[
    {"id": 123, "city": "beijing"},
    {"id": 456, "city": "shanghai"}
]

// Supports nested structures
[
    {"id": 123, "city": {"name": "beijing", "region": "haidian"}},
    {"id": 456, "city": {"name": "beijing", "region": "chaoyang"}}
]

Single Row Represented as Object

Suitable for loading single row data, requirements:

Root node must be an object
The entire object represents one row of data

Example data:

{"id": 123, "city": "beijing"}

// Supports nested structures
{"id": 123, "city": {"name": "beijing", "region": "haidian"}}

Note

Typically used with Routine Load method, such as single messages in Kafka.

Multiple Object Rows Separated by Delimiter

Suitable for batch loading multiple rows of data, requirements:

Each line is a complete JSON object
Must set read_json_by_line=true
Line delimiter can be specified using line_delimiter parameter, default is \n

Example data:

{"id": 123, "city": "beijing"}
{"id": 456, "city": "shanghai"}

Parameter Configuration

Parameter Support

The following table lists the JSON format parameters supported by various loading methods:

Parameter	Default Value	Stream Load	Broker Load	Routine Load	TVF
json paths	None	jsonpaths	properties.jsonpaths	properties.jsonpaths	jsonpaths
json root	None	json_root	properties.json_root	properties.json_root	json_root
strip outer array	false	strip_outer_array	properties.strip_outer_array	properties.strip_outer_array	strip_outer_array
read json by line	false	read_json_by_line	Always true	Not supported	read_json_by_line, default true
fuzzy parse	false	fuzzy_parse	properties.fuzzy_parse	Not supported	fuzzy_parse
num as string	false	num_as_string	properties.num_as_string	properties.num_as_string	num_as_string
compression format	plain	compress_type(supported from 3.0.5)	PROPERTIES.compress_type	compress_type

Note

Stream Load: Parameters are specified directly through HTTP Headers, e.g., -H "jsonpaths: $.data"
Broker Load: Parameters are specified through PROPERTIES, e.g., PROPERTIES("jsonpaths"="$.data")
Routine Load: Parameters are specified through PROPERTIES, e.g., PROPERTIES("jsonpaths"="$.data")
TVF: Parameters are specified in TVF statements, e.g., S3("jsonpaths"="$.data")
If you need to load the JSON object at the root node of a JSON file, the jsonpaths should be specified as $., e.g., PROPERTIES("jsonpaths"="$.")

Parameter Description

JSON Path

Purpose: Specifies how to extract fields from JSON data
Type: String array
Default Value: None, defaults to matching column names

Usage Examples:

-- Basic usage
["$.id", "$.city"]

-- Nested structures
["$.id", "$.info.city", "$.data[0].name"]

JSON Root

Purpose: Specifies the parsing starting point for JSON data
Type: String
Default Value: None, defaults to parsing from root node

Usage Example:

-- Original data
{
  "data": {
    "id": 123,
    "city": "beijing"
  }
}

-- Set json_root
json_root = $.data

Strip Outer Array

Purpose: Specifies whether to remove the outermost array structure
Type: Boolean
Default Value: false

Usage Example:

-- Original data
[
  {"id": 1, "city": "beijing"},
  {"id": 2, "city": "shanghai"}
]

-- Set strip_outer_array=true

Read JSON By Line

Purpose: Specifies whether to read JSON data line by line
Type: Boolean
Default Value: false

Usage Example:

-- Original data (one complete JSON object per line)
{"id": 1, "city": "beijing"}
{"id": 2, "city": "shanghai"}

-- Set read_json_by_line=true

Fuzzy Parse

Purpose: Accelerates JSON data loading efficiency
Type: Boolean
Default Value: false
Limitations:
- Field order in each row of the Array must be identical
- Usually used with strip_outer_array
Performance: Can improve loading efficiency by 3-5 times

Num As String

Purpose: Specifies whether to parse JSON numeric types as strings
Type: Boolean
Default Value: false
Use Cases:
- Handling numbers outside numeric range
- Avoiding numeric precision loss

Usage Example:

-- Original data
{
  "id": "12345678901234567890",
  "price": "99999999.999999"
}
-- Set num_as_string=true, price field will be parsed as string

Relationship between JSON Path and Columns

During data loading, JSON Path and Columns serve different responsibilities:

JSON Path: Defines data extraction rules

Extracts fields from JSON data according to specified paths
Extracted fields are reordered according to the order defined in JSON Path

Columns: Defines data mapping rules

Maps extracted fields to target table columns
Can perform column reordering and transformation

These two parameters are processed serially: first, JSON Path extracts fields from source data and forms an ordered dataset, then Columns maps these data to table columns. If Columns is not specified, extracted fields will be mapped directly according to table column order.

Usage Examples

Using JSON Path Only

Table structure and data:

-- Table structure
CREATE TABLE example_table (
    k2 int,
    k1 int
);

-- JSON data
{"k1": 1, "k2": 2}

Load command:

curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\"]" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load

Load result:

+------+------+
| k1   | k2   |
+------+------+
|    2 |    1 | 
+------+------+

Using JSON Path + Columns

Using the same table structure and data, adding columns parameter:

Load command:

curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\"]" \
    -H "columns: k2, k1" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load

Load result:

+------+------+
| k1   | k2   |
+------+------+
|    1 |    2 | 
+------+------+

Field Reuse

Table structure and data:

-- Table structure
CREATE TABLE example_table (
    k2 int,
    k1 int,
    k1_copy int
);

-- JSON data
{"k1": 1, "k2": 2}

Load command:

curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\", \"$.k1\"]" \
    -H "columns: k2, k1, k1_copy" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load

Load result:

+------+------+---------+
| k2   | k1   | k1_copy |
+------+------+---------+
|    2 |    1 |       1 |
+------+------+---------+

Nested Field Mapping

Table structure and data:

-- Table structure
CREATE TABLE example_table (
    k2 int,
    k1 int,
    k1_nested1 int,
    k1_nested2 int
);

-- JSON data
{
    "k1": 1,
    "k2": 2,
    "k3": {
        "k1": 31,
        "k1_nested": {
            "k1": 32
        }
    }
}

Load command:

curl -v ... -H "format: json" \
    -H "jsonpaths: [\"$.k2\", \"$.k1\", \"$.k3.k1\", \"$.k3.k1_nested.k1\"]" \
    -H "columns: k2, k1, k1_nested1, k1_nested2" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/db_name/table_name/_stream_load

Load result:

+------+------+------------+------------+
| k2   | k1   | k1_nested1 | k1_nested2 |
+------+------+------------+------------+
|    2 |    1 |         31 |         32 |
+------+------+------------+------------+

Usage Examples

This section demonstrates the usage of JSON format in different loading methods.

Stream Load

# Using JSON Path
curl --location-trusted -u <user>:<passwd> \
    -H "format: json" \
    -H "jsonpaths: [\"$.id\", \"$.city\"]" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load

# Specifying JSON root
curl --location-trusted -u <user>:<passwd> \
    -H "format: json" \
    -H "json_root: $.events" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load

# Reading JSON by line
curl --location-trusted -u <user>:<passwd> \
    -H "format: json" \
    -H "read_json_by_line: true" \
    -T example.json \
    http://<fe_host>:<fe_http_port>/api/example_db/example_table/_stream_load

Broker Load

-- Using JSON Path
LOAD LABEL example_db.example_label
(
    DATA INFILE("s3://bucket/path/example.json")
    INTO TABLE example_table
    FORMAT AS "json"
    PROPERTIES
    (
        "jsonpaths" = "[\"$.id\", \"$.city\"]"
    )
)
WITH S3 
(
    ...
);

-- Specifying JSON root
LOAD LABEL example_db.example_label
(
    DATA INFILE("s3://bucket/path/example.json")
    INTO TABLE example_table
    FORMAT AS "json"
    PROPERTIES
    (
        "json_root" = "$.events"
    )
)
WITH S3 
(
    ...
);

Routine Load

-- Using JSON Path
CREATE ROUTINE LOAD example_db.example_job ON example_table
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.id\", \"$.city\"]"
)
FROM KAFKA
(
    ...
);

TVF Load

-- Using JSON Path
INSERT INTO example_table
SELECT *
FROM S3
(
    "path" = "s3://bucket/example.json",
    "format" = "json",
    "jsonpaths" = "[\"$.id\", \"$.city\"]",
    ...
);

-- Specifying JSON root
INSERT INTO example_table
SELECT *
FROM S3
(
    "path" = "s3://bucket/example.json",
    "format" = "json",
    "json_root" = "$.events",
    ...
);

Loading Methods​

Supported JSON Formats​

Multiple Rows Represented as Array​

Single Row Represented as Object​

Multiple Object Rows Separated by Delimiter​

Parameter Configuration​

Parameter Support​

Parameter Description​

JSON Path​

JSON Root​

Strip Outer Array​

Read JSON By Line​

Fuzzy Parse​

Num As String​

Relationship between JSON Path and Columns​

Usage Examples​

Using JSON Path Only​

Using JSON Path + Columns​

Field Reuse​

Nested Field Mapping​

Usage Examples​

Stream Load​

Broker Load​

Routine Load​

TVF Load​

Loading Methods

Supported JSON Formats

Multiple Rows Represented as Array

Single Row Represented as Object

Multiple Object Rows Separated by Delimiter

Parameter Configuration

Parameter Support

Parameter Description

JSON Path

JSON Root

Strip Outer Array

Read JSON By Line

Fuzzy Parse

Num As String

Relationship between JSON Path and Columns

Usage Examples

Using JSON Path Only

Using JSON Path + Columns

Field Reuse

Nested Field Mapping

Usage Examples

Stream Load

Broker Load

Routine Load

TVF Load