Variant
Use this page to load CSV or JSON data into VARIANT.
If you are still choosing between default behavior, sparse columns, DOC mode, and Schema Template, start with Variant Workload Guide. For syntax, indexes, limits, and configuration, see VARIANT.
Usage Limitations
Supports CSV and JSON formats.
Storage Optimization (V3)
For new VARIANT tables, especially wide JSON workloads, start with Storage Format V3 unless you already have a reason not to. For the architectural rationale, see Storage Format V3.
To enable it, specify the storage format in the table properties:
CREATE TABLE table_v3 (
id BIGINT,
data VARIANT
)
DISTRIBUTED BY HASH(id) BUCKETS 32
PROPERTIES (
"storage_format" = "V3"
);
Loading CSV Format
Step 1: Prepare Data
Create a CSV file named test_variant.csv with the following content:
14186154924|PushEvent|{"avatar_url":"https://avatars.githubusercontent.com/u/282080?","display_login":"brianchandotcom","gravatar_id":"","id":282080,"login":"brianchandotcom","url":"https://api.github.com/users/brianchandotcom"}|{"id":1920851,"name":"brianchandotcom/liferay-portal","url":"https://api.github.com/repos/brianchandotcom/liferay-portal"}|{"before":"abb58cc0db673a0bd5190000d2ff9c53bb51d04d","commits":[""],"distinct_size":4,"head":"91edd3c8c98c214155191feb852831ec535580ba","push_id":6027092734,"ref":"refs/heads/master","size":4}|1|2020-11-14 02:00:00
Step 2: Create Table in Database
Execute the following SQL statement to create the table:
CREATE TABLE IF NOT EXISTS testdb.test_variant (
id BIGINT NOT NULL,
type VARCHAR(30) NULL,
actor VARIANT NULL,
repo VARIANT NULL,
payload VARIANT NULL,
public BOOLEAN NULL,
created_at DATETIME NULL,
INDEX idx_payload (`payload`) USING INVERTED PROPERTIES("parser" = "english") COMMENT 'inverted index for payload'
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(id) BUCKETS 10
properties("replication_num" = "1");
Step 3: Load Data
Using stream load as an example, use the following command to load data:
curl --location-trusted -u root: -T test_variant.csv -H "column_separator:|" http://127.0.0.1:8030/api/testdb/test_variant/_stream_load
Example of load results:
{
"TxnId": 12,
"Label": "96cd6250-9c78-4a9f-b8b3-2b7cef0dd606",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 1,
"NumberLoadedRows": 1,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 660,
"LoadTimeMs": 213,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 6,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 183,
"ReceiveDataTimeMs": 14,
"CommitAndPublishTimeMs": 20
}
Step 4: Check Loaded Data
Use the following SQL query to check the loaded data:
mysql> select * from testdb.test_variant\G
*************************** 1. row ***************************
id: 14186154924
type: PushEvent
actor: {"avatar_url":"https://avatars.githubusercontent.com/u/282080?","display_login":"brianchandotcom","gravatar_id":"","id":282080,"login":"brianchandotcom","url":"https://api.github.com/users/brianchandotcom"}
repo: {"id":1920851,"name":"brianchandotcom/liferay-portal","url":"https://api.github.com/repos/brianchandotcom/liferay-portal"}
payload: {"before":"abb58cc0db673a0bd5190000d2ff9c53bb51d04d","commits":[""],"distinct_size":4,"head":"91edd3c8c98c214155191feb852831ec535580ba","push_id":6027092734,"ref":"refs/heads/master","size":4}
public: 1
created_at: 2020-11-14 02:00:00
Loading JSON Format
Step 1: Prepare Data
Create a JSON file named test_variant.json with the following content:
{"id": "14186154924","type": "PushEvent","actor": {"id": 282080,"login":"brianchandotcom","display_login": "brianchandotcom","gravatar_id": "","url": "https://api.github.com/users/brianchandotcom","avatar_url": "https://avatars.githubusercontent.com/u/282080?"},"repo": {"id": 1920851,"name": "brianchandotcom/liferay-portal","url": "https://api.github.com/repos/brianchandotcom/liferay-portal"},"payload": {"push_id": 6027092734,"size": 4,"distinct_size": 4,"ref": "refs/heads/master","head": "91edd3c8c98c214155191feb852831ec535580ba","before": "abb58cc0db673a0bd5190000d2ff9c53bb51d04d","commits": [""]},"public": true,"created_at": "2020-11-13T18:00:00Z"}
Step 2: Create Table in Database
Execute the following SQL statement to create the table:
CREATE TABLE IF NOT EXISTS testdb.test_variant (
id BIGINT NOT NULL,
type VARCHAR(30) NULL,
actor VARIANT NULL,
repo VARIANT NULL,
payload VARIANT NULL,
public BOOLEAN NULL,
created_at DATETIME NULL,
INDEX idx_payload (`payload`) USING INVERTED PROPERTIES("parser" = "english") COMMENT 'inverted index for payload'
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(id) BUCKETS 10;
Step 3: Load Data
Using stream load as an example, use the following command to load data:
curl --location-trusted -u root: -T test_variant.json -H "format:json" http://127.0.0.1:8030/api/testdb/test_variant/_stream_load
Example of load results:
{
"TxnId": 12,
"Label": "96cd6250-9c78-4a9f-b8b3-2b7cef0dd606",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 1,
"NumberLoadedRows": 1,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 660,
"LoadTimeMs": 213,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 6,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 183,
"ReceiveDataTimeMs": 14,
"CommitAndPublishTimeMs": 20
}
Step 4: Check Loaded Data
Use the following SQL query to check the loaded data:
mysql> select * from testdb.test_variant\G
*************************** 1. row ***************************
id: 14186154924
type: PushEvent
actor: {"avatar_url":"https://avatars.githubusercontent.com/u/282080?","display_login":"brianchandotcom","gravatar_id":"","id":282080,"login":"brianchandotcom","url":"https://api.github.com/users/brianchandotcom"}
repo: {"id":1920851,"name":"brianchandotcom/liferay-portal","url":"https://api.github.com/repos/brianchandotcom/liferay-portal"}
payload: {"before":"abb58cc0db673a0bd5190000d2ff9c53bb51d04d","commits":[""],"distinct_size":4,"head":"91edd3c8c98c214155191feb852831ec535580ba","push_id":6027092734,"ref":"refs/heads/master","size":4}
public: 1
created_at: 2020-11-14 02:00:00
Step 5: Check type inference
Running desc command to view schema information, sub-columns will automatically expand at the storage layer and undergo type inference.
mysql> desc test_variant;
+------------------------------------------------------------+------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------------------------------------------+------------+------+-------+---------+-------+
| id | BIGINT | No | true | NULL | |
| type | VARCHAR(*) | Yes | false | NULL | NONE |
| actor | VARIANT | Yes | false | NULL | NONE |
| created_at | DATETIME | Yes | false | NULL | NONE |
| payload | VARIANT | Yes | false | NULL | NONE |
| public | BOOLEAN | Yes | false | NULL | NONE |
+------------------------------------------------------------+------------+------+-------+---------+-------+
6 rows in set (0.07 sec)
mysql> set describe_extend_variant_column = true;
Query OK, 0 rows affected (0.01 sec)
mysql> desc test_variant;
+------------------------------------------------------------+------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------------------------------------------+------------+------+-------+---------+-------+
| id | BIGINT | No | true | NULL | |
| type | VARCHAR(*) | Yes | false | NULL | NONE |
| actor | VARIANT | Yes | false | NULL | NONE |
| actor.avatar_url | TEXT | Yes | false | NULL | NONE |
| actor.display_login | TEXT | Yes | false | NULL | NONE |
| actor.id | INT | Yes | false | NULL | NONE |
| actor.login | TEXT | Yes | false | NULL | NONE |
| actor.url | TEXT | Yes | false | NULL | NONE |
| created_at | DATETIME | Yes | false | NULL | NONE |
| payload | VARIANT | Yes | false | NULL | NONE |
| payload.action | TEXT | Yes | false | NULL | NONE |
| payload.before | TEXT | Yes | false | NULL | NONE |
| payload.comment.author_association | TEXT | Yes | false | NULL | NONE |
| payload.comment.body | TEXT | Yes | false | NULL | NONE |
....
+------------------------------------------------------------+------------+------+-------+---------+-------+
406 rows in set (0.07 sec)
DESC can be used to specify partition and view the schema of a particular partition. The syntax is as follows:
DESCRIBE ${table_name} PARTITION ($partition_name);