2020-11-20

Hive Syntax Notes

Query Syntax

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list [HAVING condition]]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY | ORDER BY col_list]
]
[LIMIT number]

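ORDER BY sorts the whole input globally, so it runs on a single reducer. With
large inputs this can take a long time.
SORT BY is not a global sort: rows are sorted before they are fed to each
reducer. If you sort with SORT BY and set mapred.reduce.tasks > 1, only each
reducer's output is guaranteed to be sorted; the overall result is not.
DISTRIBUTE BY (columns) sends rows to different reducers according to the given
columns, using a hash of those columns.
CLUSTER BY (columns) does what DISTRIBUTE BY does and also sorts by those columns.
So when the DISTRIBUTE BY and SORT BY columns are the same and the sort is
ascending, CLUSTER BY = DISTRIBUTE BY + SORT BY.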
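
The difference between the four clauses is easier to see side by side. The queries below are only a sketch: the table orders_demo(user_id BIGINT, amount DOUBLE) is hypothetical and is not part of these notes.

```sql
-- Hypothetical table: orders_demo(user_id BIGINT, amount DOUBLE)

-- Global sort: a single reducer produces one fully ordered result.
SELECT user_id, amount
FROM orders_demo
ORDER BY amount DESC
LIMIT 10;

-- Per-reducer sort: rows with the same user_id land on the same reducer,
-- and each reducer's output is sorted by user_id, then amount descending.
SET mapred.reduce.tasks = 4;
SELECT user_id, amount
FROM orders_demo
DISTRIBUTE BY user_id
SORT BY user_id, amount DESC;

-- Same column for distribution and ascending sort:
-- shorthand for DISTRIBUTE BY user_id SORT BY user_id.
SELECT user_id, amount
FROM orders_demo
CLUSTER BY user_id;
```

With more than one reducer, the last two queries give output that is ordered within each reducer's file but not across files.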

Useful Functions Summary

Regex replace: regexp_replace, e.g. regexp_replace('2020.12.2','\\.','-')

Timestamp to date: from_unixtime, e.g. from_unixtime(1323308943,'yyyyMMdd')

Date to UNIX timestamp: unix_timestamp; the default format is yyyy-MM-dd HH:mm:ss, e.g. unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss')

Date difference: datediff; returns the first argument minus the second, in days

Date formatting: date_format, e.g. date_format('2019-12-12','yyyy-MM') -- the date string must be in yyyy-MM-dd format
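
The regex and date functions above can be tried together in one statement. This is a minimal sketch that assumes a Hive version that allows SELECT without a FROM clause (0.13 or later); the column aliases and the datediff arguments are made up for illustration.

```sql
SELECT
  regexp_replace('2020.12.2', '\\.', '-')                   AS dashed,     -- '2020-12-2'
  from_unixtime(1323308943, 'yyyyMMdd')                     AS day_str,    -- depends on the session time zone
  unix_timestamp('20111207 13:01:03', 'yyyyMMdd HH:mm:ss')  AS ts,         -- depends on the session time zone
  datediff('2020-12-02', '2020-11-20')                      AS days,       -- 12 (first minus second)
  date_format('2019-12-12', 'yyyy-MM')                      AS month_str;  -- '2019-12'
```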

If function: if, e.g. if(1=1,100,200); calls can be nested

Conditional function: CASE, i.e. CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END. Returns c if a equals b, e if a equals d, and f otherwise
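
A short sketch of a nested if next to a CASE expression. The table scores_demo(name STRING, score INT) and the labels are hypothetical, chosen only to illustrate the syntax.

```sql
-- Hypothetical table: scores_demo(name STRING, score INT)
SELECT
  name,
  -- Nested if: the inner if is evaluated only when score >= 60.
  if(score >= 60, if(score >= 90, 'excellent', 'pass'), 'fail') AS grade,
  -- Simple CASE: compares score against each WHEN value in turn.
  CASE score WHEN 100 THEN 'perfect' WHEN 0 THEN 'zero' ELSE 'other' END AS label
FROM scores_demo;
```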

String concatenation: concat

String concatenation with a separator: concat_ws

Substring: substr, substring

Trim leading and trailing spaces: trim

Split string: split; returns an array
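
The string functions listed above, applied to literal values. A minimal sketch, again assuming SELECT without FROM is available; the aliases are arbitrary.

```sql
SELECT
  concat('a', 'b', 'c')          AS c1,  -- 'abc'
  concat_ws('-', 'a', 'b', 'c')  AS c2,  -- 'a-b-c'
  substr('hello', 2, 3)          AS c3,  -- 'ell' (positions are 1-based)
  trim('  hi  ')                 AS c4,  -- 'hi'
  split('a,b,c', ',')            AS c5;  -- ["a","b","c"]
```

Note that the second argument of split is a regular expression, so characters such as '.' or '|' need escaping.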

Expand a column into rows: explode; expands a Map or an Array

LATERAL VIEW: LATERAL VIEW explode(split(goodid,',')) goods AS goods_id2; here goods acts as a virtual table
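
In a full query the LATERAL VIEW clause follows the FROM table. The sketch below assumes a hypothetical table order_goods(order_id BIGINT, goodid STRING), where goodid holds a comma-separated list such as '1,2,3'.

```sql
-- Hypothetical table: order_goods(order_id BIGINT, goodid STRING)
-- Each input row yields one output row per element of the split array.
SELECT o.order_id, goods.goods_id2
FROM order_goods o
LATERAL VIEW explode(split(o.goodid, ',')) goods AS goods_id2;
```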

OVER window functions: count(*) OVER (xxxx)
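
As one possible way to fill in the OVER clause, the sketch below counts rows per department without collapsing them the way GROUP BY would. The table emp_demo(name STRING, dept STRING, salary DOUBLE) is hypothetical.

```sql
-- Hypothetical table: emp_demo(name STRING, dept STRING, salary DOUBLE)
-- Every row is kept; dept_size repeats the row count of that row's department.
SELECT name, dept,
       count(*) OVER (PARTITION BY dept) AS dept_size
FROM emp_demo;
```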

Reflection function: reflect, e.g. reflect("java.lang.Math","max",col1,col2); calls methods of existing Java classes, which often removes the need to write a UDF.

RANK, ROW_NUMBER, DENSE_RANK with OVER: these functions can be used to get the top N rows per group
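
A common shape for the top-N-per-group query is to number rows inside each partition in a subquery and filter on that number outside. This is a sketch over a hypothetical table emp_demo(name STRING, dept STRING, salary DOUBLE), with N = 3 chosen arbitrarily.

```sql
-- Hypothetical table: emp_demo(name STRING, dept STRING, salary DOUBLE)
-- Top 3 salaries per department.
SELECT name, dept, salary
FROM (
  SELECT name, dept, salary,
         row_number() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
  FROM emp_demo
) t
WHERE rn <= 3;
```

row_number() always gives distinct numbers, so ties are broken arbitrarily. Swapping in rank() or dense_rank() keeps tied rows together: rank() leaves gaps after a tie (1, 1, 3), while dense_rank() does not (1, 1, 2).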

Type conversion: cast, e.g. select cast('1' as bigint) from tableName;