Skip to content

Conversation

@luke-gruber
Copy link
Contributor

We can avoid taking this barrier if we're not incremental marking or lazy sweeping. I found this was taking a significant amount of samples when profiling Psych.load in multiple ractors due to the vm barrier. With this change, we get significant improvements in ractor benchmarks that allocate lots of objects.

-- Psych.load benchmark --

Before:            After:
r:   itr:   time   r:   itr:   time
0    #1:  960ms    0    #1:  943ms
0    #2:  979ms    0    #2:  939ms
0    #3:  968ms    0    #3:  948ms
0    #4:  963ms    0    #4:  946ms
0    #5:  964ms    0    #5:  944ms
1    #1:  947ms    1    #1:  940ms
1    #2:  950ms    1    #2:  947ms
1    #3:  962ms    1    #3:  950ms
1    #4:  947ms    1    #4:  945ms
1    #5:  947ms    1    #5:  943ms
2    #1: 1131ms    2    #1: 1005ms
2    #2: 1153ms    2    #2:  996ms
2    #3: 1155ms    2    #3: 1003ms
2    #4: 1205ms    2    #4: 1012ms
2    #5: 1179ms    2    #5: 1012ms
4    #1: 1555ms    4    #1: 1209ms
4    #2: 1509ms    4    #2: 1244ms
4    #3: 1529ms    4    #3: 1254ms
4    #4: 1512ms    4    #4: 1267ms
4    #5: 1513ms    4    #5: 1245ms
6    #1: 2122ms    6    #1: 1584ms
6    #2: 2080ms    6    #2: 1532ms
6    #3: 2079ms    6    #3: 1476ms
6    #4: 2021ms    6    #4: 1463ms
6    #5: 1999ms    6    #5: 1461ms
8    #1: 2741ms    8    #1: 1630ms
8    #2: 2711ms    8    #2: 1632ms
8    #3: 2688ms    8    #3: 1654ms
8    #4: 2641ms    8    #4: 1684ms
8    #5: 2656ms    8    #5: 1752ms

Comment on lines 2015 to 2018
bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace));
if (needs_gc) {
gc_enter(objspace, gc_enter_event_continue, &lock_lev); // takes vm barrier, try to avoid
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about an early return instead?

Suggested change
bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace));
if (needs_gc) {
gc_enter(objspace, gc_enter_event_continue, &lock_lev); // takes vm barrier, try to avoid
}
bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace));
if (!needs_gc) {
return;
}

@luke-gruber luke-gruber force-pushed the heap_prepare_page_no_vm_barrier branch from 910514f to 4552e5c Compare September 2, 2025 19:30
{
unsigned int lock_lev;
gc_enter(objspace, gc_enter_event_continue, &lock_lev);
bool needs_gc = is_incremental_marking(objspace) || (heap->free_pages == NULL && is_lazy_sweeping(objspace));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about extracting a macro:

/* In lazy sweeping or the previous incremental marking finished and
 * did not yield a free page. */
#define needs_continue_sweeping(objspace, heap) \
    ((heap)->free_pages == NULL && is_lazy_sweeping(objspace))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nobu Thanks, done!

@luke-gruber luke-gruber force-pushed the heap_prepare_page_no_vm_barrier branch from f179f13 to b3a2555 Compare September 16, 2025 19:37
@launchable-app

This comment has been minimized.

@jhawthorn jhawthorn self-assigned this Oct 7, 2025
We can avoid taking this barrier if we're not incremental marking or lazy sweeping.
I found this was taking a significant amount of samples when profiling `Psych.load`
in multiple ractors due to the vm barrier. With this change, we get significant improvements
in ractor benchmarks that allocate lots of objects.

-- Psych.load benchmark --

```
Before:            After:
r:   itr:   time   r:   itr:   time
0    ruby#1:  960ms    0    ruby#1:  943ms
0    ruby#2:  979ms    0    ruby#2:  939ms
0    ruby#3:  968ms    0    ruby#3:  948ms
0    ruby#4:  963ms    0    ruby#4:  946ms
0    ruby#5:  964ms    0    ruby#5:  944ms
1    ruby#1:  947ms    1    ruby#1:  940ms
1    ruby#2:  950ms    1    ruby#2:  947ms
1    ruby#3:  962ms    1    ruby#3:  950ms
1    ruby#4:  947ms    1    ruby#4:  945ms
1    ruby#5:  947ms    1    ruby#5:  943ms
2    ruby#1: 1131ms    2    ruby#1: 1005ms
2    ruby#2: 1153ms    2    ruby#2:  996ms
2    ruby#3: 1155ms    2    ruby#3: 1003ms
2    ruby#4: 1205ms    2    ruby#4: 1012ms
2    ruby#5: 1179ms    2    ruby#5: 1012ms
4    ruby#1: 1555ms    4    ruby#1: 1209ms
4    ruby#2: 1509ms    4    ruby#2: 1244ms
4    ruby#3: 1529ms    4    ruby#3: 1254ms
4    ruby#4: 1512ms    4    ruby#4: 1267ms
4    ruby#5: 1513ms    4    ruby#5: 1245ms
6    ruby#1: 2122ms    6    ruby#1: 1584ms
6    ruby#2: 2080ms    6    ruby#2: 1532ms
6    ruby#3: 2079ms    6    ruby#3: 1476ms
6    ruby#4: 2021ms    6    ruby#4: 1463ms
6    ruby#5: 1999ms    6    ruby#5: 1461ms
8    ruby#1: 2741ms    8    ruby#1: 1630ms
8    ruby#2: 2711ms    8    ruby#2: 1632ms
8    ruby#3: 2688ms    8    ruby#3: 1654ms
8    ruby#4: 2641ms    8    ruby#4: 1684ms
8    ruby#5: 2656ms    8    ruby#5: 1752ms
```
@luke-gruber luke-gruber force-pushed the heap_prepare_page_no_vm_barrier branch from b3a2555 to 004d042 Compare November 3, 2025 18:53
@luke-gru luke-gru merged commit 16af727 into ruby:master Nov 3, 2025
87 checks passed
tenderlove added a commit to tenderlove/ruby that referenced this pull request Nov 3, 2025
* master: (381 commits)
  ZJIT: Implement include_p for opt_(new|dup)array_send YARV insns (ruby#14885)
  Avoid taking vm barrier in heap_prepare() (ruby#14425)
  [ruby/json] ext/json/ext/json.h: Add missing newline at end of file
  [ruby/json] Fix duplicate 'inline' declaration specifier
  [ruby/json] Fix check_dependency
  [ruby/json] parser.c: Always inline `json_eat_whitespace`
  [ruby/json] parser.c: use `rb_str_to_interned_str` over `rb_funcall`
  [ruby/json] parser.c: Extract `json_string_cacheable_p`
  [ruby/json] parser.c: simplify sorted insert loop in rstring_cache_fetch
  [ruby/json] parser.c: Skip checking for escape sequences in `rstring_cache_fetch`
  [ruby/json] Centralize macro definitions
  [DOC] Tweaks for String#to_f
  pend on Windows for timeouts
  Fix incorrect RUBY_DEBUG range
  Make Namespace.root visible not only for debugging
  Use CFUNC namespace only for IFUNC frames, its behavior should be unchanged
  Fix use of inappropriate debug flag
  Add flag to ignore EXPERIMENTAL warnings
  No need to call rb_define_class/module_under_id
  Add basic namespace tests
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants