Search before asking
- I searched in the issues and found nothing similar.
Paimon version
0.9
Compute Engine
Hive: 2.1-cdh-6.3-1
Minimal reproduce step
Create a Paimon table in Beeline:
SET hive.metastore.warehouse.dir=/user/hive/warehouse;
CREATE TABLE hive_test_table(
  a INT COMMENT 'The a field',
  b STRING COMMENT 'The b field'
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';
Then try to insert data:
insert into hive_test_table values (3, 'paimon');
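For context, Paimon 0.9 compresses Parquet data files with zstd by default, which is why this simple insert exercises the shaded zstd codec in the stack trace below. As an untested workaround sketch (it assumes the file.compression table option is honored when set through Hive TBLPROPERTIES, as other Paimon options are), the table could be created with a non-zstd codec:

-- Hypothetical workaround table: a non-zstd codec keeps the write path off
-- zstd-jni entirely. Whether 'file.compression' set via TBLPROPERTIES is
-- forwarded to Paimon on this path is an assumption.
CREATE TABLE hive_test_table_snappy(
  a INT COMMENT 'The a field',
  b STRING COMMENT 'The b field'
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
TBLPROPERTIES ('file.compression'='snappy');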
What doesn't meet your expectations?
The MapReduce job fails with the error below.
2024-11-17 22:11:46,830 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing operator TS[0]
2024-11-17 22:11:46,830 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing operator SEL[1]
2024-11-17 22:11:47,196 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<tmp_values_col1:string,tmp_values_col2:string>
2024-11-17 22:11:47,203 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing operator FS[3]
2024-11-17 22:11:47,204 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2024-11-17 22:11:47,621 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Using serializer : org.apache.paimon.hive.PaimonSerDe@6ede46f6 and formatter : org.apache.paimon.hive.mapred.PaimonOutputFormat@66273da0
2024-11-17 22:11:47,621 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.healthChecker.script.timeout is deprecated. Instead, use mapreduce.tasktracker.healthchecker.script.timeout
2024-11-17 22:11:47,634 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://cdh01.daniel.com:8020/user/hive/warehouse/default.db/_tmp.hive_test_table/000000_0
2024-11-17 22:11:47,781 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[3]: records written - 1
2024-11-17 22:11:47,913 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.zstd]
2024-11-17 22:11:48,249 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: records read - 1
2024-11-17 22:11:48,249 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[0]: Total records read - 1. abort - false
2024-11-17 22:11:48,249 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0, RECORDS_IN:1,
2024-11-17 22:11:48,250 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[3]: records written - 1
2024-11-17 22:11:48,250 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: TOTAL_TABLE_ROWS_WRITTEN:1, RECORDS_OUT_1_default.hive_test_table:1,
2024-11-17 22:11:48,272 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1731851152753_0005_m_000000_0 is done. And is in the process of committing
2024-11-17 22:11:48,276 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1731851152753_0005_m_000000_0 is allowed to commit now
2024-11-17 22:11:48,357 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.UnsatisfiedLinkError: com.github.luben.zstd.ZstdOutputStreamNoFinalizer.recommendedCOutSize()J
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.recommendedCOutSize(Native Method)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:30)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:17)
at org.apache.paimon.shade.org.apache.parquet.hadoop.codec.ZstandardCodec.createOutputStream(ZstandardCodec.java:107)
at org.apache.paimon.shade.org.apache.parquet.hadoop.codec.ZstandardCodec.createOutputStream(ZstandardCodec.java:100)
at org.apache.paimon.shade.org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:176)
at org.apache.paimon.shade.org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:168)
at org.apache.paimon.shade.org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:59)
at org.apache.paimon.shade.org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:389)
at org.apache.paimon.shade.org.apache.parquet.column.impl.ColumnWriteStoreBase.flush(ColumnWriteStoreBase.java:186)
at org.apache.paimon.shade.org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:29)
at org.apache.paimon.shade.org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:185)
at org.apache.paimon.shade.org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:124)
at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:112)
at org.apache.paimon.format.parquet.writer.ParquetBulkWriter.close(ParquetBulkWriter.java:52)
at org.apache.paimon.io.SingleFileWriter.close(SingleFileWriter.java:170)
at org.apache.paimon.io.RowDataFileWriter.close(RowDataFileWriter.java:104)
at org.apache.paimon.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:131)
at org.apache.paimon.io.RollingFileWriter.close(RollingFileWriter.java:168)
at org.apache.paimon.append.AppendOnlyWriter$DirectSinkWriter.flush(AppendOnlyWriter.java:418)
at org.apache.paimon.append.AppendOnlyWriter.flush(AppendOnlyWriter.java:219)
at org.apache.paimon.append.AppendOnlyWriter.prepareCommit(AppendOnlyWriter.java:207)
at org.apache.paimon.operation.AbstractFileStoreWrite.prepareCommit(AbstractFileStoreWrite.java:210)
at org.apache.paimon.operation.MemoryFileStoreWrite.prepareCommit(MemoryFileStoreWrite.java:152)
at org.apache.paimon.table.sink.TableWriteImpl.prepareCommit(TableWriteImpl.java:253)
at org.apache.paimon.table.sink.TableWriteImpl.prepareCommit(TableWriteImpl.java:260)
at org.apache.paimon.hive.mapred.PaimonOutputCommitter.commitTask(PaimonOutputCommitter.java:95)
at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:343)
at org.apache.hadoop.mapred.Task.commit(Task.java:1341)
at org.apache.hadoop.mapred.Task.done(Task.java:1185)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:351)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
2024-11-17 22:11:48,460 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2024-11-17 22:11:48,460 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2024-11-17 22:11:48,461 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
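The failure mode is telling: the JVM resolved the ZstdOutputStreamNoFinalizer class but could not bind its native method recommendedCOutSize()J. A missing symbol like this (rather than an outright failure to load the library) typically means an older, incompatible zstd-jni native library was already loaded into the task JVM, for example from another jar on the cluster classpath or from java.library.path. A diagnostic sketch to look for conflicting copies (the parcel paths below are assumptions for this CDH layout):

# Look for zstd-jni jars shipped with the cluster that could shadow the copy
# Paimon expects (paths are assumptions for a CDH 6.3 parcel layout):
find /opt/cloudera/parcels/CDH -name 'zstd-jni*.jar' 2>/dev/null
# List the native libraries bundled inside a candidate jar:
unzip -l /opt/cloudera/parcels/CDH/jars/zstd-jni-*.jar | grep -i zstd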
Anything else?
I checked whether my cluster supports zstd compression. I used the following command to run an MR job:
hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-examples*.jar wordcount -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec -Dmapreduce.map.output.compress=true -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec wcin wcout-zst
The job ran successfully. Part of its log follows (see also the note after the log):
2024-11-17 22:07:43,023 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2024-11-17 22:07:43,410 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 2
2024-11-17 22:07:43,410 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2024-11-17 22:07:43,421 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2024-11-17 22:07:43,565 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://cdh01.daniel.com:8020/user/hive/wcin/yum.log:0+0
2024-11-17 22:07:43,672 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 67108860(268435440)
2024-11-17 22:07:43,672 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 256
2024-11-17 22:07:43,672 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 214748368
2024-11-17 22:07:43,672 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 268435456
2024-11-17 22:07:43,672 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 67108860; length = 16777216
2024-11-17 22:07:43,679 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2024-11-17 22:07:43,696 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2024-11-17 22:07:43,715 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.zst]
2024-11-17 22:07:43,739 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1731851152753_0004_m_000000_0 is done. And is in the process of committing
2024-11-17 22:07:43,770 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1731851152753_0004_m_000000_0' done.
2024-11-17 22:07:43,778 INFO [main] org.apache.hadoop.mapred.Task: Final Counters for attempt_1731851152753_0004_m_000000_0: Counters: 29
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=220970
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=116
HDFS: Number of bytes written=0
HDFS: Number of read operations=3
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
HDFS: Number of bytes read erasure-coded=0
Map-Reduce Framework
Map input records=0
Map output records=0
Map output bytes=0
Map output materialized bytes=90
Input split bytes=116
Combine input records=0
Combine output records=0
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=41
CPU time spent (ms)=430
Physical memory (bytes) snapshot=482156544
Virtual memory (bytes) snapshot=2589884416
Total committed heap usage (bytes)=480247808
Peak Map Physical memory (bytes)=482156544
Peak Map Virtual memory (bytes)=2589884416
File Input Format Counters
Bytes Read=0
2024-11-17 22:07:43,879 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2024-11-17 22:07:43,879 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2024-11-17 22:07:43,879 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
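Note that this wordcount test exercises Hadoop's built-in org.apache.hadoop.io.compress.ZStandardCodec, which goes through libhadoop and the system libzstd. The failing Paimon path instead uses zstd-jni (com.github.luben.zstd), which bundles and loads its own native library, so the two results are not in conflict. To confirm what the Hadoop side reports, one can run:

# Lists the native codecs libhadoop was built with (zlib, snappy, zstd, ...);
# this says nothing about zstd-jni, which loads its own bundled library.
hadoop checknative -a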
Are you willing to submit a PR?
- I'm willing to submit a PR!