IPV6

description

IPv6 type, stored in UInt128 format in 16 bytes, used to represent IPv6 addresses. The range of values is ['::', 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff'].

Inputs that exceed the value range or have invalid format will return NULL

example

Create table example:

CREATE TABLE ipv6_test (
  `id` int,
  `ip_v6` ipv6
) ENGINE=OLAP
DISTRIBUTED BY HASH(`id`) BUCKETS 4
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

Insert data example:

insert into ipv6_test values(1, '::');
insert into ipv6_test values(2, '2001:16a0:2:200a::2');
insert into ipv6_test values(3, 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff');
insert into ipv6_test values(4, 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffffg'); // invalid data

Select data example:

mysql> select * from ipv6_test order by id;
+------+-----------------------------------------+
| id   | ip_v6                                   |
+------+-----------------------------------------+
|    1 | ::                                      |
|    2 | 2001:16a0:2:200a::2                     |
|    3 | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff |
|    4 | NULL                                    |
+------+-----------------------------------------+

Best Practice: store mixed IPv4/IPv6 in a single IPV6 column with an inverted index

A single IPV6 column can store both IPv4 and IPv6 addresses: write IPv4 in its IPv4-mapped IPv6 form (::ffff:x.y.z.w) — for example 192.168.1.1 is stored as ::ffff:192.168.1.1 — while native IPv6 (such as 2001:16a0:2:200a::2) is written as-is. This lets a mixed-IP field be managed in one column, and an inverted index on that column accelerates IP filtering, supporting equality (=), IN, and CIDR functions such as is_ip_address_in_range().

Schema

CREATE TABLE access_log (
  `id`            BIGINT NOT NULL,
  `request_time`  DATETIME NOT NULL,
  `client_ip`     IPV6 NULL COMMENT 'Stored uniformly as IPv6; IPv4 stored in ::ffff: form',
  INDEX idx_client_ip (`client_ip`) USING INVERTED
) ENGINE=OLAP
DUPLICATE KEY(`id`, `request_time`)
DISTRIBUTED BY HASH(`id`) BUCKETS 4
PROPERTIES (
  "replication_allocation" = "tag.location.default: 1"
);

Normalizing on load

Regardless of the load method, the key is to use the same expression in the column mapping to convert the raw IP string into IPv6 (below, ip_str is the source field and client_ip is the table's IPV6 column):

client_ip = COALESCE(
    ipv4_to_ipv6(to_ipv4_or_null(ip_str)),
    to_ipv6_or_null(ip_str)
)

A plain IPv4 string is first parsed to IPV4 by to_ipv4_or_null(), then converted to ::ffff:x.y.z.w by ipv4_to_ipv6();
If it is not a valid IPv4, COALESCE falls back to to_ipv6_or_null() and parses it as native IPv6 text;
If both fail, NULL is written. To reject invalid data instead, replace _or_null with the strict to_ipv4 / to_ipv6 and handle it via the load error threshold.

Routine Load

CREATE ROUTINE LOAD example_db.access_log_job ON access_log
COLUMNS TERMINATED BY ",",
COLUMNS(
  id, request_time, ip_str,
  client_ip = COALESCE(
      ipv4_to_ipv6(to_ipv4_or_null(ip_str)),
      to_ipv6_or_null(ip_str)
  )
)
PROPERTIES ("desired_concurrent_number" = "3", "max_error_number" = "1000")
FROM KAFKA ("kafka_broker_list" = "broker:9092", "kafka_topic" = "ip_logs");

Stream Load

Apply the same conversion in the columns header:

curl --location-trusted -u user:passwd \
  -H "format: csv" \
  -H "column_separator: ," \
  -H "columns: id, request_time, ip_str, client_ip=COALESCE(ipv4_to_ipv6(to_ipv4_or_null(ip_str)), to_ipv6_or_null(ip_str))" \
  -T data.csv \
  "http://<fe_host>:8030/api/example_db/access_log/_stream_load"

INSERT INTO (migrating from an old VARCHAR table)

Convert the raw IP string to IPv6 in the SELECT (ip_str is the VARCHAR column of the old table):

INSERT INTO access_log (id, request_time, client_ip)
SELECT
  id, request_time,
  COALESCE(
      ipv4_to_ipv6(to_ipv4_or_null(ip_str)),
      to_ipv6_or_null(ip_str)
  )
FROM access_log_varchar;

Rewriting queries (important)

Once everything is stored as mapped IPv6, IPv4 query conditions must also be rewritten into mapped IPv6 form; otherwise a type / address-family mismatch causes the index to be skipped:

IPv6 CIDR: write it directly.

is_ip_address_in_range(client_ip, '2001:4860:4801::/48')

IPv4 CIDR → mapped IPv6 CIDR: add the ::ffff: prefix to the address and add +96 to the prefix length (the first 96 bits of a mapped address are a fixed prefix).

IPv4 CIDR mapped IPv6 CIDR
10.42.0.0/16 ::ffff:10.42.0.0/112
192.178.4.0/24 ::ffff:192.178.4.0/120
3.219.120.76/32 ::ffff:3.219.120.76/128
```
is_ip_address_in_range(client_ip, '::ffff:10.42.0.0/112')
```

IPv4 CIDR	mapped IPv6 CIDR
`10.42.0.0/16`	`::ffff:10.42.0.0/112`
`192.178.4.0/24`	`::ffff:192.178.4.0/120`
`3.219.120.76/32`	`::ffff:3.219.120.76/128`

Exact IPv4 match: compare against the mapped value too.

client_ip = ipv4_to_ipv6(to_ipv4('3.219.120.76'))
-- or
client_ip = to_ipv6('::ffff:3.219.120.76')

Allowlist (keep only rows in the range): use is_ip_address_in_range(...) directly as the filter.
```
WHERE is_ip_address_in_range(client_ip, '::ffff:10.42.0.0/112')
```
Blocklist (exclude rows in the range): use NOT is_ip_address_in_range(...), and avoid the is_ip_address_in_range(...) = 0 form (which is harder to push down to the index).
```
WHERE NOT is_ip_address_in_range(client_ip, '::ffff:10.42.0.0/112')
```

Full example (a stat query that excludes several ranges). Because it contains CIDR range conditions, lower the threshold first to ensure the index is used:

SET inverted_index_skip_threshold = 0;

SELECT client_ip, COUNT(*) AS cnt
FROM access_log
WHERE request_time BETWEEN '2026-06-01 00:00:00' AND '2026-06-02 00:00:00'
  AND client_ip NOT IN (to_ipv6('::ffff:3.219.120.76'))               -- exact IPv4 exclusion -> mapped
  AND NOT is_ip_address_in_range(client_ip, '::ffff:10.42.0.0/112')   -- IPv4 /16 -> /112
  AND NOT is_ip_address_in_range(client_ip, '2001:4860:4801::/48')    -- IPv6 range
GROUP BY client_ip;

Conditions for hitting the index

The column must be a native IPV4 / IPV6 type with a USING INVERTED inverted index built on it;
The CIDR argument of is_ip_address_in_range() must be a constant string;
An IPV6 column only handles IPv6 CIDRs (IPv4 ranges must first be converted to ::ffff: mapped form), and an IPV4 column only handles IPv4 CIDRs; a type / CIDR-family mismatch skips the index and falls back to expression filtering;
The session variable inverted_index_skip_threshold defaults to 50: when more than 50% of the rows in a segment match the index, the index is skipped and data is read directly (bypass).

caution

CIDR range queries need a lower inverted_index_skip_threshold

For a CIDR range query, is_ip_address_in_range() is internally split into two BKD queries, >= range start and <= range end. Each side alone often matches more than 50% of the rows (even when the final intersection is narrow), triggering bypass and skipping the index. Such queries therefore need SET inverted_index_skip_threshold = 0; (or a smaller value) in the session to hit the index reliably. Equality / IN are high-selectivity point lookups and are unaffected.

keywords

IPV6

IPV6​

description​

example​

Best Practice: store mixed IPv4/IPv6 in a single IPV6 column with an inverted index​

Schema​

Normalizing on load​

Routine Load​

Stream Load​

INSERT INTO (migrating from an old VARCHAR table)​

Rewriting queries (important)​

Conditions for hitting the index​

keywords​

IPV6

description

example

Best Practice: store mixed IPv4/IPv6 in a single IPV6 column with an inverted index

Schema

Normalizing on load

Routine Load

Stream Load

INSERT INTO (migrating from an old VARCHAR table)

Rewriting queries (important)

Conditions for hitting the index

keywords