The table that Hudi creates in Hive on sync is as follows:
CREATE EXTERNAL TABLE `dateformatsinglepartitiondemo`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `age` bigint,
  `location` string,
  `name` string,
  `sex` string,
  `ts` bigint)
PARTITIONED BY (
  `date` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/hudi-partitions/dateFormatSinglePartitionDemo'
TBLPROPERTIES (
  'last_commit_time_sync'='20200816155107',
  'transient_lastDdlTime'='1597564276')
[Figure: result of querying table dateformatsinglepartitiondemo]
2.2 Multiple partitions
Multi-partitioning covers the case where several fields together form the partition, such as the location and sex fields used above. The core configuration is as follows:
- DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to location,sex;
- hoodie.datasource.hive_sync.partition_fields is set to location,sex, matching the partition fields used when writing to Hudi;
- DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.ComplexKeyGenerator;
- hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.MultiPartKeysValueExtractor;
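The multi-partition settings can be collected into one options map. The following is a minimal Python sketch, not the author's code: the helper name `multi_partition_options` is hypothetical, and the two `hoodie.datasource.write.*` strings are assumed to be the string values behind the `DataSourceWriteOptions` constants in the Hudi version in use.

```python
# String forms of the option keys. The two write.* keys are assumed to be the
# values behind the DataSourceWriteOptions constants named in the text.
PARTITIONPATH_FIELD = "hoodie.datasource.write.partitionpath.field"
KEYGENERATOR_CLASS = "hoodie.datasource.write.keygenerator.class"
HIVE_PARTITION_FIELDS = "hoodie.datasource.hive_sync.partition_fields"
HIVE_PARTITION_EXTRACTOR = "hoodie.datasource.hive_sync.partition_extractor_class"


def multi_partition_options(fields):
    """Build the partition-related options for a multi-field partition.

    Hypothetical helper: the same comma-joined field list is used on both
    the Hudi write side and the Hive sync side so the two cannot drift.
    """
    joined = ",".join(fields)
    return {
        PARTITIONPATH_FIELD: joined,
        HIVE_PARTITION_FIELDS: joined,
        KEYGENERATOR_CLASS: "org.apache.hudi.keygen.ComplexKeyGenerator",
        HIVE_PARTITION_EXTRACTOR: "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    }


options = multi_partition_options(["location", "sex"])
for key, value in sorted(options.items()):
    print(f"{key}={value}")
```

In a PySpark job, a map like this would typically be merged with the remaining required write options (table name, record key, and so on) and passed to the writer through `options(**options)`.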
The table that Hudi creates in Hive on sync is as follows:
CREATE EXTERNAL TABLE `multipartitiondemo`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `age` bigint,
  `date` string,
  `name` string,
  `ts` bigint)
PARTITIONED BY (
  `location` string,
  `sex` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/hudi-partitions/multiPartitionDemo'
TBLPROPERTIES (
  'last_commit_time_sync'='20200816160557',
  'transient_lastDdlTime'='1597565166')
[Figure: result of querying table multipartitiondemo]
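2.3 No partition
The non-partitioned case has no partition field at all: the dataset written to Hudi is not partitioned. The core configuration is as follows:
- DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to an empty string;
- hoodie.datasource.hive_sync.partition_fields is set to an empty string, matching the partition field used when writing to Hudi;
- DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.NonpartitionedKeyGenerator;
- hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.NonPartitionedExtractor;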
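The non-partitioned settings can be written out the same way. This is a small Python sketch under the same assumptions: the `hoodie.datasource.write.*` key strings are assumed to be the values behind the `DataSourceWriteOptions` constants, and both `non_partitioned_options` and `partition_fields_match` are hypothetical helpers introduced only for illustration.

```python
def non_partitioned_options():
    """Partition-related options for a dataset with no partition field.

    Hypothetical helper; the write.* key strings are assumed to match the
    DataSourceWriteOptions constants named in the text.
    """
    return {
        "hoodie.datasource.write.partitionpath.field": "",
        "hoodie.datasource.hive_sync.partition_fields": "",
        "hoodie.datasource.write.keygenerator.class":
            "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
        "hoodie.datasource.hive_sync.partition_extractor_class":
            "org.apache.hudi.hive.NonPartitionedExtractor",
    }


def partition_fields_match(options):
    """Check that the write side and the Hive sync side name the same fields."""
    written = options["hoodie.datasource.write.partitionpath.field"]
    synced = options["hoodie.datasource.hive_sync.partition_fields"]
    return written == synced


options = non_partitioned_options()
print(partition_fields_match(options))
```

The check is deliberately trivial: it encodes the one rule the list repeats for every partition type, namely that the Hive sync partition fields mirror the fields used on write.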
The table that Hudi creates in Hive on sync is as follows:
CREATE EXTERNAL TABLE `nonpartitiondemo`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `age` bigint,
  `date` string,
  `location` string,
  `name` string,
  `sex` string,
  `ts` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/hudi-partitions/nonPartitionDemo'
TBLPROPERTIES (
  'last_commit_time_sync'='20200816161558',
  'transient_lastDdlTime'='1597565767')
[Figure: result of querying table nonpartitiondemo]
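2.4 Hive-style partitions
Besides the common partition layouts above, there is also the Hive-style partition format, for example location=beijing/sex=male. With location and sex as the partition fields, the core configuration is as follows:
- DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY() is set to location,sex;
- hoodie.datasource.hive_sync.partition_fields is set to location,sex, matching the partition fields used when writing to Hudi;
- DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY() is set to org.apache.hudi.keygen.ComplexKeyGenerator;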
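- hoodie.datasource.hive_sync.partition_extractor_class is set to org.apache.hudi.hive.MultiPartKeysValueExtractor, which returns one value per key=value path segment (SlashEncodedDayPartitionValueExtractor expects a three-level yyyy/mm/dd path and does not fit a two-field partition);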
- DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY() is set to true;
The resulting partition directories of the Hudi dataset have the form:
/location=beijing/sex=male
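To make the difference between the plain and the Hive-style layout concrete, here is a plain-Python illustration. It is not Hudi's implementation: `partition_path` and `partition_values` are hypothetical helpers that only mimic the path layout described in this section and the way `key=value` segments map back to partition values, and the sample record is made up apart from the beijing and male values.

```python
def partition_path(record, fields, hive_style):
    """Join the partition field values into a relative partition path.

    Illustrative only: mimics the layout described in the text,
    "value/value" by default and "field=value/field=value" in Hive style.
    """
    parts = []
    for field in fields:
        value = str(record[field])
        parts.append(f"{field}={value}" if hive_style else value)
    return "/".join(parts)


def partition_values(path):
    """Recover one partition value per path segment."""
    values = []
    for segment in path.strip("/").split("/"):
        # Drop the "field=" prefix when the segment is Hive style.
        values.append(segment.split("=", 1)[1] if "=" in segment else segment)
    return values


# Hypothetical sample record; only location and sex matter here.
record = {"name": "sample", "location": "beijing", "sex": "male"}
plain = partition_path(record, ["location", "sex"], hive_style=False)
hive = partition_path(record, ["location", "sex"], hive_style=True)

print(plain)                   # beijing/male
print(hive)                    # location=beijing/sex=male
print(partition_values(hive))  # ['beijing', 'male']
```

Both layouts yield the same two partition values, which is why the Hive table synced from either layout is partitioned by the same location and sex columns.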
The table that Hudi creates in Hive on sync is as follows:
CREATE EXTERNAL TABLE `hivestylepartitiondemo`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `age` bigint,
  `date` string,
  `name` string,
  `ts` bigint)
PARTITIONED BY (
  `location` string,
  `sex` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/tmp/hudi-partitions/hiveStylePartitionDemo'
TBLPROPERTIES (
  'last_commit_time_sync'='20200816172710',
  'transient_lastDdlTime'='1597570039')