[Bug Report] RL env crashes with a PhysX error when the cup hits the table  #1460

@cidxb

Description

This is the error log:

  0%|          | 10/32000 [00:00<22:27, 23.74it/s]

2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] PhysX error: Synchronizing GPU Narrowphase failed! 700
, FILE /builds/omniverse/physics/physx/source/gpunarrowphase/src/PxgNarrowphaseCore.cpp, LINE 1259
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] PhysX error: Fetching GPU Narrowphase failed! 700
, FILE /builds/omniverse/physics/physx/source/gpunarrowphase/src/PxgNarrowphaseCore.cpp, LINE 1367
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuEventRecord failed with error 700
, FILE /builds/omniverse/physics/physx/source/gpucommon/include/PxgCudaUtils.h, LINE 57
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] PhysX error: SynchronizeStreams cuStreamWaitEvent failed with error 700

========= (some repeated errors omitted here) =========


2024-11-25 09:13:08 [11,738ms] [Error] [omni.physx.plugin] Cuda context manager error, simulation will be stopped and new cuda context manager will be created.
2024-11-25 09:13:08 [11,739ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.tensors/plugins/gpu/GpuArticulationView.cpp: 650
2024-11-25 09:13:08 [11,739ms] [Error] [omni.physx.tensors.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.tensors/plugins/gpu/CudaKernels.cu: 382
2024-11-25 09:13:08 [11,739ms] [Error] [omni.physx.tensors.plugin] Failed to fetch DOF velocity attribute
  0%|          | 19/32000 [00:00<26:31, 20.09it/s]
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/hydra.py", line 91, in hydra_main
    func(env_cfg, agent_cfg, *args, **kwargs)
  File "/home/xxx/workspace/isaaclab_px/source/standalone/workflows/skrl/train.py", line 178, in main
    runner.run()
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/skrl/utils/runner/torch/runner.py", line 376, in run
    self._trainer.train()
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/skrl/trainers/torch/sequential.py", line 81, in train
    self.single_agent_train()
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/skrl/trainers/torch/base.py", line 182, in single_agent_train
    next_states, rewards, terminated, truncated, infos = self.env.step(actions)
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/skrl/envs/wrappers/torch/isaaclab_envs.py", line 63, in step
    self._observations, reward, terminated, truncated, self._info = self._env.step(actions)
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/gymnasium/wrappers/order_enforcing.py", line 56, in step
    return self.env.step(action)
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/direct_rl_env.py", line 332, in step
    self.scene.update(dt=self.physics_dt)
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/scene/interactive_scene.py", line 374, in update
    articulation.update(dt)
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/assets/articulation/articulation.py", line 202, in update
    self._data.update(dt)
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/assets/articulation/articulation_data.py", line 78, in update
    self.joint_acc
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/assets/articulation/articulation_data.py", line 350, in joint_acc
    self._joint_acc.data = (self.joint_vel - self._previous_joint_vel) / time_elapsed
  File "/home/xxx/workspace/isaaclab_px/source/extensions/omni.isaac.lab/omni/isaac/lab/assets/articulation/articulation_data.py", line 340, in joint_vel
    self._joint_vel.data = self._root_physx_view.get_dof_velocities()
  File "/home/xxx/anaconda3/envs/RL/lib/python3.10/site-packages/isaacsim/extsPhysics/omni.physics.tensors/omni/physics/tensors/impl/api.py", line 446, in get_dof_velocities
    raise Exception("Failed to get DOF velocities from backend")
Exception: Failed to get DOF velocities from backend

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 328
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 331
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 334
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 337
2024-11-25 09:13:08 [11,774ms] [Error] [omni.physx.fabric.plugin] CUDA error: an illegal memory access was encountered: ../../../extensions/runtime/source/omni.physx.fabric/plugins/DirectGpuHelper.cpp: 340
2024-11-25 09:13:08 [11,775ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Annotators' for removal
2024-11-25 09:13:08 [11,776ms] [Warning] [omni.graph.core.plugin] Could not find category 'Replicator:Core' for removal
========= (the following warning repeats many times) =========

2024-11-25 09:13:08 [11,778ms] [Warning] [omni.physx.plugin] PhysX warning: /builds/omniverse/physics/physx/source/gpucommon/src/PxgCudaMemoryAllocator.cpp, FILE /builds/omniverse/physics/physx/source/gpucommon/src/PxgCudaMemoryAllocator.cpp, LINE 167
2024-11-25 09:13:08 [11,778ms] [Warning] [omni.physx.plugin] PhysX warning: /builds/omniverse/physics/physx/source/gpucommon/src/PxgCudaMemoryAllocator.cpp, FILE /builds/omniverse/physics/physx/source/gpucommon/src/PxgCudaMemoryAllocator.cpp, LINE 167


2024-11-25 09:13:08 [11,784ms] [Error] [omni.physx.plugin] PhysX error: Failed to unload CUDA module data, returned 700., FILE /builds/omniverse/physics/physx/source/cudamanager/src/CudaContextManager.cpp, LINE 817
2024-11-25 09:13:08 [11,846ms] [Warning] [carb] Recursive unloadAllPlugins() detected!
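
I shortened the log above by hand where messages repeat. For anyone who wants to do the same on the full log, here is a minimal sketch that counts identical messages. It assumes the line format shown above (timestamp, elapsed ms, level, source, message); the function name `collapse_repeats` is just something I made up for illustration, and continuation lines such as `, FILE ..., LINE ...` are simply skipped.

```python
import re
from collections import Counter

# Matches lines such as:
# 2024-11-25 09:13:08 [11,526ms] [Error] [omni.physx.plugin] some message
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[\d,]+ms\] "
    r"\[(?P<level>\w+)\] \[(?P<source>[^\]]+)\] (?P<message>.*)$"
)


def collapse_repeats(lines):
    """Count identical (level, source, message) triples in first-seen order.

    Lines that do not match the assumed format (tracebacks, progress bars,
    ", FILE ..." continuation lines) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match:
            key = (match["level"], match["source"], match["message"])
            counts[key] += 1
    # Counter keeps insertion order, so the first occurrence decides position.
    return list(counts.items())


def format_summary(lines):
    """Return one summary line per distinct message, prefixed with its count."""
    return [
        f"{n}x [{level}] [{source}] {message}"
        for (level, source, message), n in collapse_repeats(lines)
    ]
```

Passing the lines of the saved console output to `format_summary` gives one line per distinct message, which makes it easier to see that the first failure is the GPU narrowphase error and everything after it is fallout.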

System Info

  • Commit:
  • Isaac Sim Version: 4.2.02
  • OS: Ubuntu 22.04
  • GPU: RTX 4070
  • CUDA: 12.4
  • GPU Driver: 550.120

Additional context

The problem occurs when I test my reinforcement learning environment with the skrl training script. In the environment I spawn a cup that falls onto a table. The cup is a RigidObjectCfg object, and the table is an AssetBaseCfg.

I have the stream open, so I can see that the crash happens when the cup hits the table. Has anyone encountered this, or does anyone know how to fix it?
Thanks!

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
  • [x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo
