Skip to main content

Index Optimization: Tips for Prefix Indexes and Inverted Indexes

Overview

Doris indexes are data structures used to accelerate query filtering. Using indexes appropriately can significantly improve query performance.

Doris currently supports two categories of indexes:

Index categoryIncluded typesCharacteristics
Built-in indexesPrefix index, ZoneMap indexAutomatically generated at table creation, no extra maintenance
Secondary indexesInverted index, Bloomfilter index, N-Gram Bloomfilter index, Bitmap indexCreated by users on demand, can be managed independently

During business optimization, effectively leveraging indexes based on business characteristics can greatly improve query and analysis performance, and is one of the key techniques for performance tuning.

For detailed introductions to each index type, refer to the Table Index chapter. This article starts from real cases and introduces index usage tips and optimization recommendations for several typical scenarios.

Pre-reading Self-check Checklist

  • You understand the current table's Key column definitions and order
  • You have identified high-frequency filter fields in the business
  • You have evaluated whether you can rebuild the table or only append secondary indexes
  • You have grasped the applicable scenarios of different index types

Case 1: Adjust Key Column Order to Accelerate Queries with Prefix Indexes

In Optimizing Table Schema Design, the article introduced how to choose appropriate fields as Key fields and use the Key column sorting feature of Doris to accelerate queries. This case further extends that scenario.

Background

Doris has a built-in prefix index: at table creation time, the first 36 bytes of the table Key are automatically taken as the prefix index. When the query condition matches the prefix of the prefix index, query speed can be significantly improved.

Problem: Key Column Order Does Not Match the Query Pattern

The original CREATE TABLE statement is as follows:

CREATE TABLE `t1` (
`c1` VARCHAR(10) NULL,
`c2` VARCHAR(10) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
DISTRIBUTED BY HASH(`c2`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

The corresponding business SQL patterns are as follows:

select * from t1 where t1.c2 = '1';
select * from t1 where t1.c2 in ('1', '2', '3');

In the schema above, c1 comes first and c2 comes after, but the queries filter on the c2 field. In this case, the prefix index cannot be leveraged for acceleration.

Optimization: Adjust Column Order

Place the c2 column in the first field position so that the prefix index covers the business filter condition:

CREATE TABLE `t1` (
`c2` VARCHAR(10) NULL,
`c1` VARCHAR(10) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c2`)
DISTRIBUTED BY HASH(`c1`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
Optimization tip

When defining the schema column order, refer to the high-frequency, high-priority columns used in business query filters to fully leverage the acceleration capability of the Doris prefix index.

Case 2: Use Inverted Indexes to Accelerate Queries

Applicable Scenarios

Doris supports inverted indexes as secondary indexes, used to accelerate the following business scenarios:

  • Full-text retrieval on text-type fields;
  • Equality queries on string, numeric, or datetime fields;
  • Range queries on string, numeric, or datetime fields.

Advantages

The creation and management of inverted indexes are independent: business performance can be conveniently optimized without affecting the original table schema and without re-importing table data.

For typical use cases, syntax, and examples, refer to Inverted Index. This section does not repeat them.

Optimization recommendation

When the original table structure and Key definitions are inconvenient to optimize, or when the cost of re-importing data is high, inverted indexes provide a flexible acceleration option for optimizing business execution performance.

Index Selection Comparison

Index typeApplicable queriesRequires table rebuildRequires data re-importTypical field types
Prefix indexEquality, range, prefix matchingYes (adjust Key)YesKey columns sorted in front
ZoneMap indexRange filteringNo (automatic)NoAll columns
Inverted indexFull-text retrieval, equality, rangeNoNoString, numeric, datetime
Bloomfilter indexHigh-cardinality equality filterNoNo (takes effect incrementally)String, numeric
N-Gram BloomfilterLIKE fuzzy matchingNoNo (takes effect incrementally)String
Bitmap indexLow-cardinality equality filterNoNo (takes effect incrementally)Enumeration fields

FAQ and Common Issues

Q1: Why are my queries still slow after creating an index?

Possible reasons:

  • The query condition does not hit any indexed column;
  • The Key column order does not match the filter condition, so the prefix index does not take effect;
  • The data volume is small, and the index does not bring noticeable benefit;
  • The index has not yet taken effect on historical data (some secondary indexes only take effect immediately on newly written data).

Q2: Does the prefix index need to be created manually?

No. At table creation, Doris automatically takes the first 36 bytes of the Key columns as the prefix index. To make the prefix index effective, place high-frequency business filter fields at the front of the Key columns.

Q3: How to choose between an inverted index and a Bloomfilter index?

  • Full-text retrieval, fuzzy matching, range queries: prefer inverted indexes;
  • Exact equality queries on high-cardinality fields: choose Bloomfilter indexes for lower overhead.

Q4: Does adjusting the Key column order require rebuilding the table?

Yes. Key column order is part of the table schema definition; after adjustment, you must rebuild the table and re-import the data.

Summary

In schema tuning, index optimization is as important as table-level schema optimization. Doris provides multiple index types:

  • Built-in indexes: prefix index, ZoneMap index;
  • Secondary indexes: inverted index, Bloomfilter, N-Gram Bloomfilter, Bitmap.

Using these indexes appropriately can significantly improve business query and analysis speed across multiple scenarios. It is recommended to first evaluate the high-frequency business filter fields, then choose the appropriate index type based on factors such as whether the table can be rebuilt and the data volume.