
HTTP

The HTTP table-valued function (TVF) lets users read data returned by any HTTP endpoint as if it were a relational table. As long as the returned data is in a supported format, it can be queried and analyzed directly with SQL. The supported data formats are csv, csv_with_names, csv_with_names_and_types, json, parquet, and orc.

Typical use cases include:

  • Querying data files hosted on HTTP/HTTPS (e.g., GitHub or S3).
  • Directly querying HTTP API endpoints that return JSON-formatted data.
  • Accessing datasets hosted on Hugging Face.
Note: Supported since version 4.0.2.

Syntax

HTTP(
"uri" = "<uri>",
"format" = "<format>"
[, "<optional_property_key>" = "<optional_property_value>" [, ...] ]
)

Required Parameters

| Parameter | Description |
| --- | --- |
| uri | The HTTP address to access. Supports the http, https, and hf protocols. Can be a URL to a data file or an API endpoint that returns data. |
| format | Data format, i.e., how the content returned by the HTTP endpoint is parsed. Supports csv, csv_with_names, csv_with_names_and_types, json, parquet, and orc. |

For hf:// (Hugging Face), please refer to Analyzing Hugging Face Data.

Optional Parameters

| Parameter | Description | Notes |
| --- | --- | --- |
| http.header.xxx | Specifies an arbitrary HTTP header, which is passed directly to the HTTP client. | e.g., "http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb...", the resulting header will be Authorization: Bearer hf_MWYzOJJoZEymb... |
| http.enable.range.request | Whether to use range requests to access the HTTP service. Default is true. | |
| http.max.request.size.bytes | Maximum access size limit when using non-range request mode. Default is 100 MB. | |

When http.enable.range.request is true, the system first attempts to access the HTTP service using range requests. If the HTTP service does not support range requests, it automatically falls back to non-range request mode. In non-range request mode, the maximum data access size is limited by http.max.request.size.bytes.
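As a rough sketch of how the size limit might be raised for a server that does not support range requests, the query below sets http.max.request.size.bytes explicitly. The URL is a placeholder, and the example assumes the value is given as a plain integer number of bytes (here 524288000, i.e. 500 × 1024 × 1024); neither is confirmed by the text above.

```sql
-- Hypothetical endpoint; replace with a real file URL.
-- Range requests stay enabled (the default). If the server does not
-- support them, the fallback to non-range mode is capped by the value below.
SELECT COUNT(*) FROM
HTTP(
"uri" = "https://example.com/data/large_file.parquet",
"format" = "parquet",
-- Assumed to be a plain byte count: 500 * 1024 * 1024
"http.max.request.size.bytes" = "524288000"
);
```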

Examples

Reading Data Files over HTTP

  • Read CSV data from GitHub

    SELECT COUNT(*) FROM
    HTTP(
    "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
    "format" = "csv",
    "column_separator" = ","
    );
  • Read Parquet data from GitHub

    SELECT arr_map, id FROM
    HTTP(
    "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
    "format" = "parquet"
    );
  • Read JSON data from GitHub and use DESC FUNCTION to view the schema

    DESC FUNCTION
    HTTP(
    "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
    "format" = "json",
    "strip_outer_array" = "true"
    );

Querying HTTP API Endpoints

The HTTP table function can directly query HTTP API endpoints that return JSON-formatted data. For example, to query a REST API that returns JSON and expects a bearer token:

SELECT * FROM
HTTP(
"uri" = "https://api.example.com/v1/data",
"format" = "json",
"http.header.Authorization" = "Bearer your_token",
"strip_outer_array" = "true"
);
Tip: For HTTP API endpoints that do not support range requests, the system automatically falls back to non-range request mode. You can also disable range requests manually via the http.enable.range.request parameter.
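A minimal sketch of disabling range requests up front, which may save the initial range attempt against an API known not to support it. The endpoint and token are the same placeholders used in the example above, and passing the boolean as the string "false" is an assumption based on how the other properties are written.

```sql
-- Placeholder endpoint and token, as in the previous example.
SELECT * FROM
HTTP(
"uri" = "https://api.example.com/v1/data",
"format" = "json",
"http.header.Authorization" = "Bearer your_token",
"strip_outer_array" = "true",
-- Skip the range-request attempt and fetch the response in one request
"http.enable.range.request" = "false"
);
```

With range requests disabled, the response size is still subject to http.max.request.size.bytes.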