Commit 8c01e8f

Merge branch 'develop-3.0' into develop

martinsumner committed Jun 7, 2022
2 parents fc9b739 + 582bbcc commit 8c01e8f

Showing 4 changed files with 260 additions and 24 deletions.
15 changes: 15 additions & 0 deletions README.md
@@ -0,0 +1,15 @@
# Riak - a distributed, decentralised data storage system.

To build riak, Erlang OTP 22 or higher is required.

`make rel` will build a release which can be run via `rel/riak/bin/riak start`. Riak is primarily configured via `rel/riak/etc/riak.conf`.
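As an illustration, a minimal `riak.conf` for a single local node might contain entries such as the following (key names come from the default schema; the values are placeholders rather than recommendations):

```
nodename = riak@127.0.0.1
ring_size = 64
storage_backend = leveled
listener.http.internal = 127.0.0.1:8098
listener.protobuf.internal = 127.0.0.1:8087
```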

To make a package, install appropriate build tools for your operating system and run `make package`.

To create a local multi-node build environment use `make devclean; make devrel`.

To test Riak use [Riak Test](https://github.com/basho/riak_test/blob/develop-3.0/doc/SIMPLE_SETUP.md).

Up-to-date documentation is not available, but work on [documentation](https://www.tiot.jp/riak-docs/riak/kv/2.9.10/) is ongoing, and the core information available in the [legacy documentation](https://docs.riak.com/riak/kv/latest/index.html) is still generally relevant.

Issues and PRs can be tracked via [Riak Github](https://github.com/basho/riak/issues) or [Riak KV Github](https://github.com/basho/riak_kv/issues).
22 changes: 0 additions & 22 deletions README.org

This file was deleted.

42 changes: 42 additions & 0 deletions RELEASE-NOTES.md
@@ -1,3 +1,45 @@
# Riak KV 3.0.10 Release Notes

This release is focused on improving memory management, especially with the leveled backend, and improving the efficiency and ease of configuration of tictacaae full-sync.

- Improved [memory management of leveled SST files](https://github.com/martinsumner/leveled/pull/371) that contain rarely accessed data.

- Fix a bug whereby leveled_sst files could spend an [extended time in the delete_pending state](https://github.com/martinsumner/leveled/pull/377), causing significant short-term increases in memory usage when there are work backlogs in the penciller.

- Change the queue for reapers and erasers so that [they overflow to disk](https://github.com/basho/riak_kv/issues/1807), rather than simply consuming more and more memory.

- Change the replrtq (nextgenrepl) queue to use the same [overflow queue mechanism](https://github.com/basho/riak_kv/issues/1817) as used by the reaper and erasers.

- Change the default full-sync mechanism for tictacaae (nextgenrepl) full-sync to `auto_check`, which attempts to [automatically learn and use information about modified date-ranges](https://github.com/basho/riak_kv/issues/1815) in full-sync checks. The related changes also make full-sync by default bi-directional, reducing the amount of wasted effort in full-sync queries.

- Add [a peer discovery feature](https://github.com/basho/riak_kv/issues/1804) for replrtq (nextgenrepl) so that new nodes added to the cluster can be automatically recognised without configuration changes. By default this is disabled, and should only be enabled once both clusters have been upgraded to at least 3.0.10.

- Allow for underlying beam memory management and scheduler configuration to be exposed via riak.conf to allow for further performance tests on these settings. Note initial tests indicate the potential for [significant improvements when using the leveled backend](https://github.com/basho/riak_kv/issues/1826).

- Fix a potential issue whereby corrupted objects would prevent AAE (either legacy or nextgenrepl) [tree rebuilds](https://github.com/basho/riak_kv/issues/1824) from completing.

- Improved [handling of key amnesia](https://github.com/basho/riak_kv/issues/1813), to prevent rebounding of objects, and also introduce a reader process (like reaper and eraser) to which read repairs can be queued with overflow to disk.

Some caveats for this release exist:

- The release does not support OTP 20; only OTP 22 is supported. Updating some long out-of-date components has led to a requirement for the minimum OTP version to be lifted.

- Volume and performance testing with the leveled backend now uses the following non-default settings:

```
erlang.schedulers_busywait = none
erlang.schedulers_busywait_dirtycpu = none
erlang.schedulers_busywait_dirtyio = none
erlang.async_threads = 4
erlang.schedulers.force_wakeup_interval = 0
erlang.schedulers.compaction_of_load = true
leveled_reload_recalc = enabled
```

- To maintain backwards compatibility with older linux versions, the [latest version of basho's leveldb](https://github.com/basho/leveldb/releases/tag/2.0.37) is not yet supported. This is likely to change in the next release, where support for older linux versions will be dropped.

- The release process has [exposed an issue](https://github.com/basho/riak_kv/issues/1831) via a recently extended test. This issue is pre-existing, and not specific to this release.
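For context, the `erlang.*` settings above are riak.conf names that the schema maps onto BEAM emulator flags in vm.args (for example `erlang.async_threads = 4` becomes `+A 4`). The sketch below illustrates that translation in Python; the mapping table is abridged and the real translation is performed by the cuttlefish schema library, not this code.

```python
# Illustrative sketch: how riak.conf erlang.* settings map onto vm.args
# emulator flags. Abridged table; the real work is done by cuttlefish.
MAPPINGS = {
    "erlang.schedulers_busywait": "+sbwt",
    "erlang.schedulers_busywait_dirtycpu": "+sbwtdcpu",
    "erlang.schedulers_busywait_dirtyio": "+sbwtdio",
    "erlang.async_threads": "+A",
    "erlang.schedulers.force_wakeup_interval": "+sfwi",
    "erlang.schedulers.compaction_of_load": "+scl",
}

def to_vm_args(conf):
    """Render known riak.conf key/value pairs as vm.args flag lines."""
    return [f"{MAPPINGS[key]} {value}" for key, value in conf.items()
            if key in MAPPINGS]

# A subset of the non-default settings listed above, as emulator flags:
print(to_vm_args({
    "erlang.schedulers_busywait": "none",
    "erlang.async_threads": 4,
    "erlang.schedulers.force_wakeup_interval": 0,
}))
# → ['+sbwt none', '+A 4', '+sfwi 0']
```

Settings without a vm.args mapping (such as `leveled_reload_recalc`) are handled elsewhere in the schema and are simply ignored by this sketch.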

# Riak KV 3.0.9 Release Notes

This release contains stability, monitoring and performance improvements.
205 changes: 203 additions & 2 deletions priv/riak.schema
@@ -143,25 +143,226 @@
merge
]}.

%% VM scheduler collapse, part 1 of 2
%% @doc Riak changes the VM default wakeup interval in order to reduce the
%% risk of scheduler collapse, prior to the availability of Dirty NIFs in
%% later OTP versions. When using the leveled backend exclusively (either for
%% AAE or object storage) this change is likely unnecessary, and the VM default
%% of 0 can be used.
{mapping, "erlang.schedulers.force_wakeup_interval", "vm_args.+sfwi", [
{default, 500},
{datatype, integer},
merge
]}.

%% VM scheduler collapse, part 2 of 2
%% @doc Riak changes the compaction_of_load default from true to false. This
%% is part of the strategy for preventing scheduler collapse in older VMs.
%% When using the leveled backend exclusively (either for AAE or object
%% storage), this change from the standard BEAM defaults is likely unnecessary
%% - and compaction_of_load can be re-enabled.
{mapping, "erlang.schedulers.compaction_of_load", "vm_args.+scl", [
{default, "false"},
merge
]}.

%% @doc Sets the number of threads in async thread pool, valid range
%% is 0-1024. If thread support is available, the default is 64.
%%
%% More information at: http://erlang.org/doc/man/erl.html
%%
%% Large async_thread pools are likely now unnecessary if exclusively using
%% the leveled backend due to dirty NIFs, and so can be set to a much smaller
%% value (potentially 1).
{mapping, "erlang.async_threads", "vm_args.+A", [
{default, 64},
{datatype, integer},
{validators, ["range:0-1024"]},
merge
]}.

%% VM emulator ignore break signal (prevent ^C / ^G q)
{mapping, "erlang.vm.ignore_break_signal", "vm_args.+Bi", [
{default, "true"},
merge
]}.

%% @doc The VM single block carrier threshold (KB) for process heap
{mapping, "erlang.eheap_memory.sbct", "vm_args.+MHsbct", [
{commented, 512},
{datatype, integer},
merge
]}.

%% @doc The VM single block carrier threshold (KB) for binary heap
{mapping, "erlang.binary_memory.sbct", "vm_args.+MBsbct", [
{commented, 512},
{datatype, integer},
merge
]}.

%% @doc The VM multi block carrier large size for process heap
{mapping, "erlang.eheap_memory.lmbcs", "vm_args.+MHlmbcs", [
{commented, 5120},
{datatype, integer},
merge
]}.

%% @doc The VM multi block carrier large size for binary heap
{mapping, "erlang.binary_memory.lmbcs", "vm_args.+MBlmbcs", [
{commented, 5120},
{datatype, integer},
merge
]}.

%% @doc The VM multi block carrier small size for process heap
{mapping, "erlang.eheap_memory.smbcs", "vm_args.+MHsmbcs", [
{commented, 256},
{datatype, integer},
merge
]}.

%% @doc The VM multi block carrier small size for binary heap
{mapping, "erlang.binary_memory.smbcs", "vm_args.+MBsmbcs", [
{commented, 256},
{datatype, integer},
merge
]}.

%% @doc Set allocation strategy for binary multiblock carriers. Default is
%% not predictable - do not rely on aoffcbf being the default. For more info
%% see:
%% https://github.com/erlang/otp/blob/master/erts/emulator/internal_doc/CarrierMigration.md
{mapping, "erlang.binary_memory.as", "vm_args.+MBas", [
{commented, "aoffcbf"},
{datatype, {enum, [bf, aobf, aoff, aoffcbf, aoffcaobf, ageffcaoff, ageffcbf, ageffcaobf, gf]}},
merge
]}.

%% @doc Set allocation strategy for process multiblock carriers. Default is
%% not predictable - do not rely on aoffcbf being the default. For more info
%% see:
%% https://github.com/erlang/otp/blob/master/erts/emulator/internal_doc/CarrierMigration.md
{mapping, "erlang.eheap_memory.as", "vm_args.+MHas", [
{commented, "aoffcbf"},
{datatype, {enum, [bf, aobf, aoff, aoffcbf, aoffcaobf, ageffcaoff, ageffcbf, ageffcaobf, gf]}},
merge
]}.

%% @doc Set scheduler binding. This is either unbound (the default - u), or
%% can be set to the default binding type of the deployed release of
%% OTP (db).
%% For more info see: https://www.erlang.org/doc/man/erl.html#+sbt
%% Note that if non-Riak work is activated on the same node - e.g. as part
%% of batch operational jobs, or monitoring - allowing schedulers to be bound
%% can result in significant and unpredictable negative outcomes. There may be
%% other ways of achieving similar performance improvements - e.g. by
%% right-sizing scheduler counts - that are lower risk than scheduler binding.
%% If a CPU topology cannot be determined, the binding will default to unbound
%% even when a binding is configured. To confirm binding, use `remote_console`
%% and view:
%% `erlang:system_info(scheduler_bindings).`
{mapping, "erlang.schedulers_binding", "vm_args.+stbt", [
{commented, "u"},
{datatype, {enum, [u, db]}},
merge
]}.

%% @doc Busy wait of schedulers
%% Sets scheduler busy wait threshold. Defaults to medium. The threshold
%% determines how long schedulers are to busy wait when running out of work
%% before going to sleep.
%% Significant improvements in efficiency may be gained by disabling busy
%% waiting.
{mapping, "erlang.schedulers_busywait", "vm_args.+sbwt", [
{commented, "none"},
{datatype, {enum, [none, very_short, short, medium, long, very_long]}},
merge
]}.

%% @doc Busy wait of dirty cpu schedulers
%% Sets scheduler busy wait threshold. Defaults to short. The threshold
%% determines how long schedulers are to busy wait when running out of work
%% before going to sleep.
%% Significant improvements in efficiency may be gained by disabling busy
%% waiting.
{mapping, "erlang.schedulers_busywait_dirtycpu", "vm_args.+sbwtdcpu", [
{commented, "none"},
{datatype, {enum, [none, very_short, short, medium, long, very_long]}},
merge
]}.

%% @doc Busy wait of dirty io schedulers
%% Sets scheduler busy wait threshold. Defaults to short. The threshold
%% determines how long schedulers are to busy wait when running out of work
%% before going to sleep.
%% Significant improvements in efficiency may be gained by disabling busy
%% waiting.
{mapping, "erlang.schedulers_busywait_dirtyio", "vm_args.+sbwtdio", [
{commented, "none"},
{datatype, {enum, [none, very_short, short, medium, long, very_long]}},
merge
]}.

%% @doc Set the Percentage of Schedulers to be online
%% Expressed as A:B - for every vCPU in the system, what percentage should
%% have a scheduler (A), and what percentage of those schedulers should be
%% online by default (B).
%% Do not set unless guided by performance tests for the specific setup and
%% workload.
{mapping, "erlang.schedulers_online_percentage", "vm_args.+SP",[
{commented, "100:75"},
{validators, ["scheduler_percentage"]},
merge
]}.

%% @doc Set the Percentage of Dirty CPU Schedulers to be online
%% When using the leveled backend a relatively low number of dirty schedulers
%% (e.g. 25%) is likely to be required due to the low proportion of NIFs in
%% use.
%% The percentages cannot exceed those of schedulers_online_percentage,
%% which will default to 100% of CPU.
%% Do not set unless guided by performance tests for the specific setup and
%% workload.
{mapping, "erlang.schedulers_dirtycpu_online_percentage", "vm_args.+SDPcpu",[
{commented, "50:25"},
{validators, ["scheduler_percentage"]},
merge
]}.

%% @doc Set the absolute limit of Dirty IO Schedulers to be online
%% When using the leveled backend a relatively high number of dirty
%% schedulers may be required relative to the CPU count, depending on the
%% concurrent disk throughput possible.
%% Unlike the scheduler percentages, this is set as an absolute number
%% between 1 and 1024 (default is 10).
%% Do not set unless guided by performance tests for the specific setup and
%% workload.
{mapping, "erlang.schedulers_dirtyio_online", "vm_args.+SDio",[
{commented, 10},
{datatype, integer},
{validators, ["scheduler_absolute"]},
merge
]}.
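%% For reference, the commented defaults in the three scheduler mappings
%% above correspond to the following riak.conf entries (do not set without
%% performance testing):
%%   erlang.schedulers_online_percentage = 100:75
%%   erlang.schedulers_dirtycpu_online_percentage = 50:25
%%   erlang.schedulers_dirtyio_online = 10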

{validator,
"scheduler_percentage",
"must be A:B where both A and B are integers, 0 < x =< 100",
fun(PercPerc) ->
    case string:tokens(PercPerc, ":") of
        [A, B] ->
            try
                AV = list_to_integer(A),
                BV = list_to_integer(B),
                AV =< 100 andalso AV > 0 andalso BV =< 100 andalso BV > 0
            catch
                %% non-numeric input should fail validation, not crash
                error:badarg -> false
            end;
        _ ->
            false
    end
end}.

{validator,
"scheduler_absolute",
"must be 1 to 1024",
fun(Value) ->
is_integer(Value) andalso Value =< 1024 andalso Value >= 1
end}.
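The two validators above can be sketched in Python as follows — an illustrative port for readers unfamiliar with Erlang, not the shipped code:

```python
def valid_scheduler_percentage(value: str) -> bool:
    """Port of the "scheduler_percentage" validator: accepts "A:B"
    where A and B are both integers in the range 1..100."""
    parts = value.split(":")
    if len(parts) != 2:
        return False
    try:
        a, b = (int(p) for p in parts)
    except ValueError:
        # non-numeric tokens fail validation
        return False
    return 0 < a <= 100 and 0 < b <= 100


def valid_scheduler_absolute(value) -> bool:
    """Port of the "scheduler_absolute" validator: an integer 1..1024."""
    return isinstance(value, int) and 1 <= value <= 1024


print(valid_scheduler_percentage("100:75"))  # True
print(valid_scheduler_percentage("0:50"))    # False
print(valid_scheduler_absolute(10))          # True
```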

{{#devrel}}
%% Because of the 'merge' keyword in the proplist below, the docs and datatype
%% are pulled from the leveldb schema.
