WINDOW_FUNNEL
Description
The WINDOW_FUNNEL function analyzes user behavior sequences by searching for event chains within a specified time window and calculating the maximum number of completed steps in the event chain. This function is particularly useful for conversion funnel analysis, such as analyzing user conversion from website visits to final purchases.
The function works according to the algorithm:
- The function searches for data that triggers the first condition in the chain and sets the event counter to 1. This is the moment when the sliding window starts.
- If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is disrupted, the counter is not incremented.
- If the data has multiple event chains at varying points of completion, the function will only output the size of the longest chain.
Syntax
WINDOW_FUNNEL(<window>, <mode>, <timestamp>, <event_1>[, event_2, ... , event_n])
Parameters
| Parameter | Description |
|---|---|
<window> | window is the length of time window in seconds |
<mode> | There are four modes in total, default, deduplication, fixed, and increase. For details, please refer to the Mode below. |
<timestamp> | timestamp specifies column of DATETIME type, sliding time window works on it |
<event_n> | evnet_n is boolean expression like eventID = 1004 |
Mode
- `default`: Standard funnel calculation. Doris searches for the longest event chain that matches the specified order within the time window. Events that do not match any condition are ignored.
- `deduplication`: Based on `default`, but an event that has already been matched in the current chain cannot appear again. For example, if the condition list is [event1='A', event2='B', event3='C', event4='D'] and the original event chain is `A-B-C-B-D`, the second `B` breaks the chain, so the matched event chain is `A-B-C` and the max level is `3`.
- `fixed`: The chain must advance in the specified order and cannot skip intermediate steps. If an event that matches a later condition appears before its immediate predecessor is matched, the chain stops. Starting from Doris 4.1, events that do not match any condition are ignored and do not break the chain. For example, with [event1='A', event2='B', event3='C', event4='D'], `A-B-D-C` returns `A-B` and level `2`; with `A-B-X-C-D` (`X` matches none of the conditions), Doris 4.1 and later returns `A-B-C-D`, while earlier versions stop at `A-B`.
- `increase`: Based on `default`, but matched events must have strictly increasing timestamps. If two matched events have the same timestamp, the later event cannot advance the chain.
Return Value
Returns an integer representing the maximum number of consecutive steps completed within the specified time window.
Examples
example1: default mode
Using the default mode, find out the maximum number of consecutive events corresponding to different user_id, with a time window of 5 minutes:
CREATE TABLE events(
user_id BIGINT,
event_name VARCHAR(64),
event_timestamp datetime,
phone_brand varchar(64),
tab_num int
) distributed by hash(user_id) buckets 3 properties("replication_num" = "1");
INSERT INTO
events
VALUES
(100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1),
(100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2),
(100123, 'order', '2022-05-14 10:04:00', 'HONOR', 3),
(100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4),
(100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1),
(100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2),
(100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6),
(100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1),
(100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2),
(100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1),
(100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5);
SELECT
user_id,
window_funnel(
300,
"default",
event_timestamp,
event_name = 'login',
event_name = 'visit',
event_name = 'order',
event_name = 'payment'
) AS level
FROM
events
GROUP BY
user_id
order BY
user_id;
+---------+-------+
| user_id | level |
+---------+-------+
| 100123 | 3 |
| 100125 | 3 |
| 100126 | 2 |
| 100127 | 2 |
+---------+-------+
For user_id=100123, because the time when the payment event occurred exceeds the time window, the matched event chain is login-visit-order.
example2: deduplication mode
Use the deduplication mode to find out the maximum number of consecutive events corresponding to different user_ids, with a time window of 1 hour:
CREATE TABLE events(
user_id BIGINT,
event_name VARCHAR(64),
event_timestamp datetime,
phone_brand varchar(64),
tab_num int
) distributed by hash(user_id) buckets 3 properties("replication_num" = "1");
INSERT INTO
events
VALUES
(100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1),
(100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2),
(100123, 'login', '2022-05-14 10:03:00', 'HONOR', 3),
(100123, 'order', '2022-05-14 10:04:00', 'HONOR', 4),
(100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4),
(100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1),
(100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2),
(100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6),
(100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1),
(100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2),
(100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1),
(100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5);
SELECT
user_id,
window_funnel(
3600,
"deduplication",
event_timestamp,
event_name = 'login',
event_name = 'visit',
event_name = 'order',
event_name = 'payment'
) AS level
FROM
events
GROUP BY
user_id
order BY
user_id;
+---------+-------+
| user_id | level |
+---------+-------+
| 100123 | 2 |
| 100125 | 3 |
| 100126 | 2 |
| 100127 | 2 |
+---------+-------+
For user_id=100123, after matching the visit event, the login event appears repeatedly, so the matched event chain is login-visit.
example3: fixed mode
Use the fixed mode to find out the maximum number of consecutive events corresponding to different user_id, with a time window of 1 hour:
CREATE TABLE events(
user_id BIGINT,
event_name VARCHAR(64),
event_timestamp datetime,
phone_brand varchar(64),
tab_num int
) distributed by hash(user_id) buckets 3 properties("replication_num" = "1");
INSERT INTO
events
VALUES
(100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1),
(100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2),
(100123, 'login2', '2022-05-14 10:03:00', 'HONOR', 3),
(100123, 'order', '2022-05-14 10:04:00', 'HONOR', 4),
(100123, 'payment', '2022-05-14 10:10:00', 'HONOR', 4),
(100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1),
(100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2),
(100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6),
(100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1),
(100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2),
(100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1),
(100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5);
SELECT
user_id,
window_funnel(
3600,
"fixed",
event_timestamp,
event_name = 'login',
event_name = 'visit',
event_name = 'order',
event_name = 'payment'
) AS level
FROM
events
GROUP BY
user_id
order BY
user_id;
+---------+-------+
| user_id | level |
+---------+-------+
| 100123 | 4 |
| 100125 | 3 |
| 100126 | 2 |
| 100127 | 2 |
+---------+-------+
For user_id=100123, login2 does not match any condition in the funnel. Starting from Doris 4.1, such unrelated events do not break the fixed chain, so the matched event chain is login-visit-order-payment. In Doris versions earlier than 4.1, the same data would stop at login-visit and return level 2.
example4: increase mode
Use the increase mode to find out the maximum number of consecutive events corresponding to different user_id, with a time window of 1 hour:
CREATE TABLE events(
user_id BIGINT,
event_name VARCHAR(64),
event_timestamp datetime,
phone_brand varchar(64),
tab_num int
) distributed by hash(user_id) buckets 3 properties("replication_num" = "1");
INSERT INTO
events
VALUES
(100123, 'login', '2022-05-14 10:01:00', 'HONOR', 1),
(100123, 'visit', '2022-05-14 10:02:00', 'HONOR', 2),
(100123, 'order', '2022-05-14 10:04:00', 'HONOR', 4),
(100123, 'payment', '2022-05-14 10:04:00', 'HONOR', 4),
(100125, 'login', '2022-05-15 11:00:00', 'XIAOMI', 1),
(100125, 'visit', '2022-05-15 11:01:00', 'XIAOMI', 2),
(100125, 'order', '2022-05-15 11:02:00', 'XIAOMI', 6),
(100126, 'login', '2022-05-15 12:00:00', 'IPHONE', 1),
(100126, 'visit', '2022-05-15 12:01:00', 'HONOR', 2),
(100127, 'login', '2022-05-15 11:30:00', 'VIVO', 1),
(100127, 'visit', '2022-05-15 11:31:00', 'VIVO', 5);
SELECT
user_id,
window_funnel(
3600,
"increase",
event_timestamp,
event_name = 'login',
event_name = 'visit',
event_name = 'order',
event_name = 'payment'
) AS level
FROM
events
GROUP BY
user_id
order BY
user_id;
+---------+-------+
| user_id | level |
+---------+-------+
| 100123 | 3 |
| 100125 | 3 |
| 100126 | 2 |
| 100127 | 2 |
+---------+-------+
For user_id=100123, the timestamp of the payment event and the timestamp of the order event occur in the same second and are not incremented, so the matched event chain is login-visit-order.