
Blas GEMM launch failed #1

@to-where


How can I solve this problem? Environment:
CUDA 10.0
TensorFlow 1.13.1

WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/metrics_impl.py:363: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2024-11-08 02:47:12.989408: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2024-11-08 02:47:13.289686: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-11-08 02:47:13.289907: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6d92bb0 executing computations on platform CUDA. Devices:
2024-11-08 02:47:13.289941: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2024-11-08 02:47:13.291961: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2496010000 Hz
2024-11-08 02:47:13.294180: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6f112a0 executing computations on platform Host. Devices:
2024-11-08 02:47:13.294224: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2024-11-08 02:47:13.294433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.8
pciBusID: 0000:01:00.0
totalMemory: 24.00GiB freeMemory: 22.77GiB
2024-11-08 02:47:13.294481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2024-11-08 02:47:13.515059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-11-08 02:47:13.515134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2024-11-08 02:47:13.515143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2024-11-08 02:47:13.515453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-11-08 02:47:13.515571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2457 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
Training will begin..
Batch_size: 4
Batch norm use?: False
Decoder arch: pcn
Last best_validation_loss: 100000.0
2024-11-08 02:47:45.424046: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2024-11-08 02:48:43.328016: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 530.66MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2024-11-08 02:48:43.328106: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 530.66MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2024-11-08 02:48:43.416925: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(4, 1024), b.shape=(4, 3072), m=1024, n=3072, k=4
[[{{node gradients/pcn_decoder_1/fc_2/MatMul_grad/MatMul_1}}]]
[[{{node Adam/update}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/workspace/SAUM/train.py", line 182, in
train(config)
File "/workspace/SAUM/train.py", line 112, in train
_, target_loss, summary = sess.run([optimizer.train, model.target_loss, train_summary], feed_dict=feed_dict)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(4, 1024), b.shape=(4, 3072), m=1024, n=3072, k=4
[[node gradients/pcn_decoder_1/fc_2/MatMul_grad/MatMul_1 (defined at /workspace/SAUM/optimizer.py:31) ]]
[[node Adam/update (defined at /workspace/SAUM/optimizer.py:31) ]]

Caused by op 'gradients/pcn_decoder_1/fc_2/MatMul_grad/MatMul_1', defined at:
File "/workspace/SAUM/train.py", line 182, in
train(config)
File "/workspace/SAUM/train.py", line 42, in train
optimizer = importlib.import_module('optimizer').optimizer(lr_config, model.global_step, model.target_loss)
File "/workspace/SAUM/optimizer.py", line 15, in init
self.train = self.create_train()
File "/workspace/SAUM/optimizer.py", line 31, in create_train
global_step=self.global_step)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 403, in minimize
grad_loss=grad_loss)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 512, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 664, in gradients
unconnected_gradients)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 965, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 420, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 965, in
lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_grad.py", line 1132, in _MatMulGrad
grad_b = gen_math_ops.mat_mul(a, grad, transpose_a=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 5333, in mat_mul
name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1801, in init
self._traceback = tf_stack.extract_stack()

...which was originally created as op 'pcn_decoder_1/fc_2/MatMul', defined at:
File "/workspace/SAUM/train.py", line 182, in
train(config)
File "/workspace/SAUM/train.py", line 39, in train
model = model_module.model(config, inputs_pl, npts_pl, gt_pl, is_training_pl)
File "/workspace/SAUM/models/pcn.py", line 11, in init
super().init(config, inputs, npts, gt, is_training)
File "/workspace/SAUM/models/model.py", line 29, in init
self.outputs = self.network(inputs, npts, is_training)
File "/workspace/SAUM/models/pcn.py", line 54, in network
coarse, decoder_points = self.decoder(GFV, is_training)
File "/workspace/SAUM/models/pcn.py", line 17, in decoder
coarse = mlp(GFV, coarse_feat_dims, is_training, self.use_bn)
File "/workspace/SAUM/utils/tf_util.py", line 23, in mlp
scope='fc_%d' % (len(layer_dims) - 1))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1855, in fully_connected
outputs = layer.apply(inputs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1227, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 530, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/layers/core.py", line 975, in call
outputs = gen_math_ops.mat_mul(inputs, self.kernel)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 5333, in mat_mul
name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(4, 1024), b.shape=(4, 3072), m=1024, n=3072, k=4
[[node gradients/pcn_decoder_1/fc_2/MatMul_grad/MatMul_1 (defined at /workspace/SAUM/optimizer.py:31) ]]
[[node Adam/update (defined at /workspace/SAUM/optimizer.py:31) ]]

PrefetchDataZMQ successfully cleaned-up.
PrefetchDataZMQ successfully cleaned-up.
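
In case it helps triage: two things commonly behind "Blas GEMM launch failed" in TF 1.x are GPU memory contention when the session is created and an unsupported GPU/toolkit combination. The RTX 3090 reported in the log is compute capability 8.6 (Ampere), which CUDA 10.0 and TensorFlow 1.13.1 predate, so prebuilt cuBLAS kernels may simply not exist for it; the log also shows TensorFlow creating the device with only 2457 MB despite 22.77 GiB being free. A minimal sketch to separate the two causes (assuming the stock TF 1.13 API, not the SAUM scripts):

import tensorflow as tf

# Build one small GPU matmul; it goes through cublasSgemm, the same routine
# that fails in the training run.
with tf.device('/gpu:0'):
    a = tf.random_normal([1024, 1024])
    b = tf.random_normal([1024, 1024])
    c = tf.matmul(a, b)

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # do not grab all GPU memory up front

with tf.Session(config=config) as sess:
    print(sess.run(tf.reduce_sum(c)))

If this small matmul fails the same way, the problem is most likely the CUDA 10.0 / TF 1.13.1 build not supporting the sm_86 GPU, and a CUDA 11.x toolchain with a matching TensorFlow build (or a pre-Ampere GPU) is probably needed. If it succeeds, the allow_growth setting is worth passing to the training session as well.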
