
Blas GEMM launch failed #1

@to-where


How can I solve this problem? Environment:
CUDA 10.0
TensorFlow 1.13.1

WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/metrics_impl.py:363: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2024-11-08 02:47:12.989408: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2024-11-08 02:47:13.289686: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-11-08 02:47:13.289907: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6d92bb0 executing computations on platform CUDA. Devices:
2024-11-08 02:47:13.289941: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2024-11-08 02:47:13.291961: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2496010000 Hz
2024-11-08 02:47:13.294180: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6f112a0 executing computations on platform Host. Devices:
2024-11-08 02:47:13.294224: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2024-11-08 02:47:13.294433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.8
pciBusID: 0000:01:00.0
totalMemory: 24.00GiB freeMemory: 22.77GiB
2024-11-08 02:47:13.294481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2024-11-08 02:47:13.515059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-11-08 02:47:13.515134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2024-11-08 02:47:13.515143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2024-11-08 02:47:13.515453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1194] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-11-08 02:47:13.515571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2457 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
Training will begin..
Batch_size: 4
Batch norm use?: False
Decoder arch: pcn
Last best_validation_loss: 100000.0
2024-11-08 02:47:45.424046: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2024-11-08 02:48:43.328016: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 530.66MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2024-11-08 02:48:43.328106: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 530.66MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2024-11-08 02:48:43.416925: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(4, 1024), b.shape=(4, 3072), m=1024, n=3072, k=4
[[{{node gradients/pcn_decoder_1/fc_2/MatMul_grad/MatMul_1}}]]
[[{{node Adam/update}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/workspace/SAUM/train.py", line 182, in
train(config)
File "/workspace/SAUM/train.py", line 112, in train
_, target_loss, summary = sess.run([optimizer.train, model.target_loss, train_summary], feed_dict=feed_dict)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(4, 1024), b.shape=(4, 3072), m=1024, n=3072, k=4
[[node gradients/pcn_decoder_1/fc_2/MatMul_grad/MatMul_1 (defined at /workspace/SAUM/optimizer.py:31) ]]
[[node Adam/update (defined at /workspace/SAUM/optimizer.py:31) ]]

Caused by op 'gradients/pcn_decoder_1/fc_2/MatMul_grad/MatMul_1', defined at:
File "/workspace/SAUM/train.py", line 182, in
train(config)
File "/workspace/SAUM/train.py", line 42, in train
optimizer = importlib.import_module('optimizer').optimizer(lr_config, model.global_step, model.target_loss)
File "/workspace/SAUM/optimizer.py", line 15, in init
self.train = self.create_train()
File "/workspace/SAUM/optimizer.py", line 31, in create_train
global_step=self.global_step)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 403, in minimize
grad_loss=grad_loss)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 512, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 664, in gradients
unconnected_gradients)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 965, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 420, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py", line 965, in
lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_grad.py", line 1132, in _MatMulGrad
grad_b = gen_math_ops.mat_mul(a, grad, transpose_a=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 5333, in mat_mul
name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1801, in init
self._traceback = tf_stack.extract_stack()

...which was originally created as op 'pcn_decoder_1/fc_2/MatMul', defined at:
File "/workspace/SAUM/train.py", line 182, in
train(config)
File "/workspace/SAUM/train.py", line 39, in train
model = model_module.model(config, inputs_pl, npts_pl, gt_pl, is_training_pl)
File "/workspace/SAUM/models/pcn.py", line 11, in init
super().init(config, inputs, npts, gt, is_training)
File "/workspace/SAUM/models/model.py", line 29, in init
self.outputs = self.network(inputs, npts, is_training)
File "/workspace/SAUM/models/pcn.py", line 54, in network
coarse, decoder_points = self.decoder(GFV, is_training)
File "/workspace/SAUM/models/pcn.py", line 17, in decoder
coarse = mlp(GFV, coarse_feat_dims, is_training, self.use_bn)
File "/workspace/SAUM/utils/tf_util.py", line 23, in mlp
scope='fc_%d' % (len(layer_dims) - 1))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1855, in fully_connected
outputs = layer.apply(inputs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1227, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 530, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/layers/core.py", line 975, in call
outputs = gen_math_ops.mat_mul(inputs, self.kernel)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 5333, in mat_mul
name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(4, 1024), b.shape=(4, 3072), m=1024, n=3072, k=4
[[node gradients/pcn_decoder_1/fc_2/MatMul_grad/MatMul_1 (defined at /workspace/SAUM/optimizer.py:31) ]]
[[node Adam/update (defined at /workspace/SAUM/optimizer.py:31) ]]

PrefetchDataZMQ successfully cleaned-up.
PrefetchDataZMQ successfully cleaned-up.
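
In case it helps triage: two things commonly behind "Blas GEMM launch failed" in TF 1.x are GPU memory contention when the session is created and an unsupported GPU/toolkit combination. The RTX 3090 reported in the log is compute capability 8.6 (Ampere), which CUDA 10.0 and TensorFlow 1.13.1 predate, so prebuilt cuBLAS kernels may simply not exist for it; the log also shows TensorFlow creating the device with only 2457 MB despite 22.77 GiB being free. A minimal sketch to separate the two causes (assuming the stock TF 1.13 API, not the SAUM scripts):

import tensorflow as tf

# Build one small GPU matmul; it goes through cublasSgemm, the same routine
# that fails in the training run.
with tf.device('/gpu:0'):
    a = tf.random_normal([1024, 1024])
    b = tf.random_normal([1024, 1024])
    c = tf.matmul(a, b)

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # do not grab all GPU memory up front

with tf.Session(config=config) as sess:
    print(sess.run(tf.reduce_sum(c)))

If this small matmul fails the same way, the problem is most likely the CUDA 10.0 / TF 1.13.1 build not supporting the sm_86 GPU, and a CUDA 11.x toolchain with a matching TensorFlow build (or a pre-Ampere GPU) is probably needed. If it succeeds, the allow_growth setting is worth passing to the training session as well.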
