unnest

package
v0.6.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 29, 2022 License: Apache-2.0 Imports: 12 Imported by: 0

README

UNNEST

函数说明

UNNEST是一个表函数,出现在sql的from子句中,用于将json[object|array]类型的数据展开为多行,每行包含json中的一个元素.

语法结构

> UNNEST(src[, path[, outer]])

相关参数

参数 说明 类型
src 必要参数,待展开的数据源 类型可以是json列或json字符串
path 可选参数,指明待展开数据源的具体json路径。默认为"$",展开整个json数据 path字符串
outer 可选参数,如果数据源展开后结果行为0,是否加上一个空行作为标记。默认为false bool类型

返回表结构

字段名 类型 说明
col varchar 数据源的名称。如果数据源是列,则是列名;如果数据源是json字符串,则是"UNNEST_DEFAULT"
seq int32 数据源中元素的索引,从0开始
key varchar 展开元素的键名,如果父级元素是数组,则为null
path varchar 展开元素的在数据源中的路径
index int32 展开元素在父级元素中的索引,如果数据源是对象,则为null
value varchar 展开元素的值
this varchar 展开元素的父级元素值

示例

> select *
> from unnest('{"a":1,"b":2,"c":3}') as u;
+----------------+------+------+------+-------+-------+--------------------------+
| col            | seq  | key  | path | index | value | this                     |
+----------------+------+------+------+-------+-------+--------------------------+
| UNNEST_DEFAULT |    0 | a    | $.a  |  NULL | 1     | {"a": 1, "b": 2, "c": 3} |
| UNNEST_DEFAULT |    0 | b    | $.b  |  NULL | 2     | {"a": 1, "b": 2, "c": 3} |
| UNNEST_DEFAULT |    0 | c    | $.c  |  NULL | 3     | {"a": 1, "b": 2, "c": 3} |
+----------------+------+------+------+-------+-------+--------------------------+

> select *
> from unnest('{"a":1,"b":2,"c":3}') as u
> where u.`key` = 'b';
+----------------+------+------+------+-------+-------+--------------------------+
| col            | seq  | key  | path | index | value | this                     |
+----------------+------+------+------+-------+-------+--------------------------+
| UNNEST_DEFAULT |    0 | b    | $.b  |  NULL | 2     | {"a": 1, "b": 2, "c": 3} |
+----------------+------+------+------+-------+-------+--------------------------+

> select *
> from unnest('{"a":1,"b":2,"c":3}',"$.b") as u;
Empty set (0.01 sec)

> select *
> from unnest('{"a":1,"b":2,"c":3}',"$.b",true) as u;
+----------------+------+------+------+-------+-------+--------------------------+
| col            | seq  | key  | path | index | value | this                     |
+----------------+------+------+------+-------+-------+--------------------------+
| UNNEST_DEFAULT |    0 | NULL | $.b  |  NULL | NULL  | 2                        |

> drop table if exists t1;
> create table t1 (a json,b int);
> insert into t1
> values ('{"a":1,"b":[{"c":2,"d":3},false,4],"e":{"f":true,"g":[null,true,1.1]}}',1);
> insert into t1
> values ('[1,true,false,null,"aaa",1.1,{"t":false}]',2);
> select * from unnest(t1.a, "$.b") as u;
+------+------+------+--------+-------+------------------+------------------------------+
| col  | seq  | key  | path   | index | value            | this                         |
+------+------+------+--------+-------+------------------+------------------------------+
| a    |    0 | NULL | $.b[0] |     0 | {"c": 2, "d": 3} | [{"c": 2, "d": 3}, false, 4] |
| a    |    0 | NULL | $.b[1] |     1 | false            | [{"c": 2, "d": 3}, false, 4] |
| a    |    0 | NULL | $.b[2] |     2 | 4                | [{"c": 2, "d": 3}, false, 4] |
+------+------+------+--------+-------+------------------+------------------------------+

> select * from unnest(t1.a, "$.b[0]") as u;
+------+------+------+----------+-------+-------+------------------+
| col  | seq  | key  | path     | index | value | this             |
+------+------+------+----------+-------+-------+------------------+
| a    |    0 | c    | $.b[0].c |  NULL | 2     | {"c": 2, "d": 3} |
| a    |    0 | d    | $.b[0].d |  NULL | 3     | {"c": 2, "d": 3} |
+------+------+------+----------+-------+-------+------------------+

> select distinct(f.seq) from unnest(t1.a, "$") as f;
+-------+
| f.seq |
+-------+
|     0 |
|     1 |
+-------+

注意事项

  • key,index和value全为null则代表当前行是outer为true时默认添加的空行

执行流程

  1. 数据源是json列 ...unnest -> project(jsonCol) -> tableScan
  2. 数据源是json字符串 ...unnest -> project(default) -> valueScan(parse jsonStr)

实现细节

数据源为json字符串
  1. 构建plan时将存储在tree.Unnest中的参数序列化后存储到unnestNode.TableDef.TableFunctionParam
  2. unnestNode中添加valueScan节点
  3. valueScan.TableDef.TableFunctionParam赋值为tree.Unnest中存储的json字符串转化的字节切片
  4. 在编译阶段首先将valueScanTableDef.TableFunctionParam存入scope.Datasource.Bat
  5. 在scope中添加vm.Unnest指令,并通过unnestNode.TableDef.TableFunctionParam构建运行参数
  6. 执行阶段通过bytejson包解析json字节切片,解析path字符串,通过json,path, outer,filter参数调用bytejson.Unnest函数,返回UnnestResult结果集
  7. 通过makeBatch组装UnnestResult结果集为batch
数据源为json列
  1. 构建plan时将存储在tree.Unnest中的参数序列化后存储到unnestNode.TableDef.TableFunctionParam
  2. unnestNode中添加tableScan节点,并根据tree.Unnest中的参数初始化tableScanTableDef
  3. 编译阶段在scope中添加vm.Unnest指令,并通过unnestNode.TableDef.TableFunctionParam构建运行参数
  4. 执行阶段通过bytejson包解析由tableScan传递bytejson字节切片,解析path字符串,通过json,path, outer,filter参数调用bytejson.Unnest 函数,返回UnnestResult结果集
  5. 通过makeBatch组装UnnestResult结果集为batch

filter参数是根据tree.Unnest中的Attrs字段构建的string切片,其目的是为了在bytejson.Unnest函数中过滤不需要的结果集

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Call

func Call(_ int, proc *process.Process, arg any) (bool, error)

func Prepare

func Prepare(_ *process.Process, arg any) error

func String

func String(arg any, buf *bytes.Buffer)

Types

type Argument

type Argument struct {
	Es *Param
}

func (*Argument) Free

func (arg *Argument) Free(proc *process.Process, pipelineFailed bool)

type ExternalParam

type ExternalParam struct {
	ColName string
	Path    string
	Outer   bool
}

type Param

type Param struct {
	Attrs    []string
	Cols     []*plan.ColDef
	ExprList []*plan.Expr
	ColName  string
	// contains filtered or unexported fields
}

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL