[SUPPORT] OOM errors while creating a table using Bulk Insert operation #12116
After reading this issue, I tried adding the following configuration to my hudi options:
But that failed as well, with this exception in the executors:
@dataproblems Looks like the issue is with the timeline server. Can you disable timeline-server-based markers (set hoodie.write.markers.type to DIRECT) and try once? If that also doesn't work, you can try disabling the timeline server itself. Can you please give more information about your environment? I just want to understand the issue with the timeline server.
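The two-step mitigation suggested above can be sketched as write options (a minimal sketch; the option keys follow the Hudi configuration reference, and these would be merged into the existing options passed to `df.write.format("hudi").options(...)`):

```python
# Sketch of the suggested mitigation: first switch to DIRECT markers so
# marker files are written straight to storage instead of going through
# the embedded timeline server.
hudi_options = {
    "hoodie.write.markers.type": "DIRECT",
}

# If DIRECT markers alone don't help, the second step suggested is to
# disable the embedded timeline server itself:
hudi_options["hoodie.embed.timeline.server"] = "false"
```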
@ad1happy2go - Did you mean that? I'm running this on EMR release 6.11.0. What other information (other than the Spark version / spark-submit command) would you need?
@dataproblems Thanks, and sorry for overlooking that comment. The error in #12116 (comment) is very generic. Do you see any other exception? Can you check the node health? Are executor nodes going down due to OOM?
@dataproblems Can you share the entire driver log from the run without timeline-server markers? From the exception it's not clear which part of the code is breaking.
@ad1happy2go - I see a couple of entries similar to:
in the driver logs. Does that give you an idea? (I'm not sure if I can share the entire log file.) I also see:
this exception in one of the executor logs that I spot-checked.
Hi @dataproblems, additionally, you mentioned that an OOM error is occurring. After setting the
Hi @rangareddy - I still see exit code 137 in the driver log; I only checked a few of the executor logs and pasted the exception in my previous comment. This is a single writer creating the base table. I also tried creating the base table with less than 1% of my data (around 100 GB) and the job just gets stuck here (see attached screenshot). I can see that the data files are in S3, but the commit file isn't there yet. I'm also not sure what Hudi is doing in that stage.
@dataproblems I noticed that only 400 tasks are getting created. This may be the main problem, as the tasks are already taking more than 18 minutes. Can you find out why only 400 tasks were created?
What's the nature of your input data? It looks like your input dataframe is only creating 400 partitions. Can you try a repartition before saving to see if it works?
@ad1happy2go - That's likely because I'm setting the bulk insert shuffle parallelism to 400. I've tried other values, but all result in similar outcomes. Can you elaborate on the "nature of the data" question? Here's another screenshot where the partition / parallelism count was higher.
@dataproblems Can you remove this config and try? Since Hudi 0.14.x we auto-calculate the required parallelism. On the above screenshot, can you expand the failure reason and send the stacktrace?
@ad1happy2go - I removed that config for the second screenshot. I couldn't find the executor with the exact error message shown in the screenshot, but here's something that I do see:
Also, for some reason the driver claims that the application succeeded, but the Spark UI as well as the data sink in S3 show that the data was never written completely:
The log entry
@dataproblems This doesn't give much insight. Is it possible to attach the complete driver log and one executor log?
@ad1happy2go - Sure. Here you go. You will see the stack trace in the driver.log file. |
@dataproblems Any reason why you are using such a high --conf spark.executor.heartbeatInterval=900s? It should be much lower than spark.network.timeout. Can you try leaving these at their defaults? I see a lot of issues with the Spark configs. You shouldn't be using --conf spark.driver.maxResultSize=0 either, as driver result collection will then have no limit at all. You may increase it up to 4 GB if required, but keep a check on it. Also, did you try turning the timeline server back on, i.e. setting only "hoodie.embed.timeline.server" to "true", since it is used to build the file system image? Keep the markers as DIRECT only.
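The config advice above can be sketched as follows (the first two values are Spark's documented defaults and the 4g cap is the figure suggested in the comment; all are illustrative, not tuned for this workload):

```python
# Suggested direction: keep the executor heartbeat well below
# spark.network.timeout, and cap driver result collection instead of
# disabling the limit with 0.
spark_confs = {
    "spark.executor.heartbeatInterval": "10s",  # Spark default
    "spark.network.timeout": "120s",            # Spark default; must exceed the heartbeat
    "spark.driver.maxResultSize": "4g",         # bounded, rather than 0 (unlimited)
}

def seconds(v: str) -> int:
    # Tiny helper to compare the two duration strings above.
    return int(v.rstrip("s"))
```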
@ad1happy2go - We were getting heartbeat timeout exceptions, which resulted in the increased values for those configurations. Same with the driver maxResultSize: I got an exception about the driver result being larger than the maxResultSize of 5g that I was using previously, so I removed the limit to mitigate that error. When I tried using DIRECT markers but set
@dataproblems There is something wrong in your setup if such a large data size is getting collected to the driver. Is it possible to share the Hudi timeline? What's the size of the commit files?
@ad1happy2go - Given that this is creating the table, there is only a single commit requested. Both the commit.requested and commit.inflight objects are 0 B in size; we never get to the .commit file, since the job fails before writing all of the data. On a separate note - when I do disable
The Spark job is merely reading from S3 and writing the data back in Hudi format on our end; there are no operations we perform that would result in the dataset being collected on the driver, so I would defer to you on that front. Usually it's in the mapToPair operation in the HoodieJavaRdd file or in the save operation, as seen in the previous screenshots.
@dataproblems There is some problem here. For 100 GB of data you should not have a 40 MB commit file. Another possible cause is a lot of small files in the input; if that's the case, can you try repartitioning the dataframe before saving it?
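One way to pick a repartition count before saving is to size partitions against a per-task input budget; a back-of-the-envelope sketch (the ~120 MB-per-task figure is an assumption for illustration, not a Hudi recommendation):

```python
def target_partitions(total_bytes: int,
                      bytes_per_task: int = 120 * 1024 * 1024) -> int:
    """Ceil-divide the input size by an assumed per-task budget (~120 MB)."""
    return max(1, -(-total_bytes // bytes_per_task))

# For the ~100 GB sample discussed above this suggests several hundred
# partitions:
n = target_partitions(100 * 1024**3)
# df.repartition(n).write.format("hudi").options(**hudi_options).save(path)
```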
@ad1happy2go, I have about 6 partitions for the sample dataset that I'm using.
Experiment 1: Repartition before saving
Let me update you on how the repartition exercise goes and see if it results in a smaller size for the
Experiment 2: Repartition with
| Data Size | Commit Requested | Commit | Total Time |
|---|---|---|---|
| 47 MB | 00:03 | 00:40 | 37 seconds |
| 400 MB | 00:24 | 02:13 | 1 minute 49 seconds |
| 800 MB | 00:08 | 03:02 | 2 minutes 54 seconds |
| 1.9 GB | 00:38 | 07:03 | 6 minutes 25 seconds |
| 3.8 GB | 00:28 | 16:55 | 16 minutes 27 seconds |
| 37 GB | n/a | n/a | Didn't create the table (still in running / stuck state after 88 minutes) |
From my observations, table creation time increases roughly linearly at first, but from 1.9 GB to 3.8 GB it becomes non-linear. Also, we were not able to get the table created for the 37 GB dataset. Can you see if you can reproduce this on your end? It would be useful to learn which parameters and configuration worked for you. I tried the random data example with and without the repartition; both times, I saw that partial data was written to S3 and then the job went into the long pause / running state. See attached screenshot.
@ad1happy2go - did you get a chance to take a look at the results / data I posted last week? |
@dataproblems For Experiment 3, I can clearly see the problem is with parallelism. It's creating just 100 tasks, and they have already been running for 1.6 h. Can you try to increase the parallelism in this case? To do this, you have to increase the repartition factor along with the dataset, which is 100 in your case.
Follow up questions:
Update on Experiment 3: Based on your suggestion, I updated the
Stacktrace
Followed up with @ad1happy2go in Hudi Office Hours and got more things to try.
Follow Up on Random Data: Use sort mode as
@dataproblems Good to know that at least you are unblocked. Although bulk_insert with NONE sort mode should be faster than "insert" and ideally needs fewer resources, so it's strange that with the same resources bulk_insert failed but insert succeeded. Can you try increasing hoodie.bulkinsert.shuffle.parallelism to 10000 and see if it succeeds?
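The suggestion above, expressed as a write option (a sketch; 10000 is the value proposed in the comment, not a general recommendation):

```python
hudi_options = {
    "hoodie.datasource.write.operation": "bulk_insert",
    # Raise the bulk-insert shuffle parallelism as suggested:
    "hoodie.bulkinsert.shuffle.parallelism": "10000",
}
```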
@ad1happy2go - Unfortunately, even after increasing the
@dataproblems Thanks for the update. Let us know how insert works with 20x size. |
@ad1happy2go - I increased the min file count to
Another thing I noticed is that the file sizing isn't quite optimal: I have one partition that is larger than the rest due to the inherent nature of the data, and Hudi creates 160 MB files for a partition that potentially has multiple TBs of data.
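For the file-sizing point: Hudi's Parquet base-file size is governed by the `hoodie.parquet.max.file.size` option (default around 120 MB), so larger files for the big partition could be requested explicitly. A sketch with illustrative values, not a recommendation for this table:

```python
hudi_options = {
    # Roll over to a new base file past ~512 MB instead of the ~120 MB default.
    "hoodie.parquet.max.file.size": str(512 * 1024 * 1024),
    # Files below this size are treated as "small" and topped up on later writes.
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
}
```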
@dataproblems Can you please share the Spark UI screenshots? On this
Why do you think 160 MB files are not optimal? Also, can you check the size of your timeline files under .hoodie?
@ad1happy2go A similar issue happened to me as well in bulk_insert mode: when I tried to insert 34 GB of JSON input with 100 partitions into a Hudi table (approx. 5.5 GB as Parquet), I got an
I am using hoodie.bulkinsert.shuffle.parallelism = '10', hoodie.write.markers.type = 'direct', and hoodie.embed.timeline.server = 'false'. As @dataproblems stated, when I started the job with less memory I got
But as you pointed out, the number of tasks is low for me as well, due to reading the 100 source partitions; let me also try repartitioning the dataset to 200 and increasing hoodie.bulkinsert.shuffle.parallelism to 500. Are there any further suggestions / optimizations I should make? In Spark SQL Hudi table creation, bulk_insert is the only mode available by default. I know I can switch to the DataFrame APIs, but using SQL, can we switch to a normal insert?
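On the Spark SQL question: in Hudi 0.14+ the INSERT INTO operation can reportedly be selected per session via a config. A sketch based on the Hudi configuration docs (verify the exact key against your Hudi version; older releases used `hoodie.sql.bulk.insert.enable` instead, and the table names below are placeholders):

```sql
-- Ask Hudi to use a plain insert instead of bulk_insert for INSERT INTO
-- in this session (Hudi 0.14+):
SET hoodie.spark.sql.insert.into.operation = insert;

INSERT INTO my_hudi_table SELECT * FROM source_table;
```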
Describe the problem you faced
I am unable to create a Hudi table from my data with POPULATE_META_FIELDS enabled. I can create the table with POPULATE_META_FIELDS set to false.
To Reproduce
Steps to reproduce the behavior:
There are a total of 68 billion unique record keys and my total dataset is around 5TB.
Expected behavior
I should be able to create the table without any exceptions.
Environment Description
Hudi version : 0.15.0, 1.0.0-beta1, 1.0.0-beta2
Spark version : 3.3, 3.4
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Additional context
Spark submit command:
Hudi Bulk Insert Options:
I've tried GLOBAL_SORT, PARTITION_SORT, and NONE => all result in the same error.
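The three values above correspond to Hudi's bulk-insert sort-mode option; for reference, a sketch of how one of them would be set (option key per the Hudi configuration reference):

```python
# Sort modes tried in this issue; all reportedly produced the same error.
sort_modes = ["GLOBAL_SORT", "PARTITION_SORT", "NONE"]

hudi_options = {
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.bulkinsert.sort.mode": sort_modes[2],  # e.g. NONE
}
```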
Stacktrace
This is the piece / stage that fails:
The executors have these errors:
I see exit code 137 in the driver logs and "OOM: Java Heap Space" in the stdout logs.