[WIP] Add io_uring support to redis #9440
base: unstable
Conversation
Use io_uring to do poll_add/poll_remove. Make sure you have liburing installed and a kernel that supports it. Signed-off-by: Betty <[email protected]>
Signed-off-by: Betty <[email protected]> Signed-off-by: Jiufei Xue <[email protected]>
io_uring_prep_poll_add() mistakenly always uses EPOLLIN. Correct it by using poll_mask. Signed-off-by: Hao Xu <[email protected]>
Register files when creating clients in redis-benchmark. Signed-off-by: Hao Xu <[email protected]> Signed-off-by: Joseph Qi <[email protected]>
Add an argument to turn the sqpoll feature on/off. Signed-off-by: Hao Xu <[email protected]>
How much CPU is used while this is ongoing? Someone mentioned to me that the submission queue side of io_uring will use additional CPU in the background for sending the requests. It would be good to understand the performance gain of this compared to just enabling io-threads for writes.
Thank you for your comments, my friends. Yes, you are right. Five CPUs were given to redis-server for its five threads (we use "taskset -c 24-28 redis-server"). When we open sqpoll, it needs an additional CPU. But if we close sqpoll, it just needs the same number of CPUs, no additional CPU. Even if we close sqpoll, it can still get an improvement.
@HanGuangyu What is the difference between open and closed sq polling? I'm not familiar with that terminology. Also, to clarify your setup, you ran both the benchmark and the server locally? I want to run this against some testing we did internally, where we didn't find very high returns from just using io-threads, but we also have custom kernel tuning which might have reduced the benefits. The code looks pretty straightforward all things considered, and it seems still in line with the longer-term multi-threaded improvements (having multiple event loops serving a subset of clients). I'd be inclined to take a deeper look at the code to try to understand it better and see if we can figure out how to merge it.
@madolson Hi, I'm so happy about your quick responses and professional attitude. You made me feel the full vitality of the Redis community. I'm sorry that I didn't explain the information clearly in advance. I will offer whatever I can to help you understand it.

1. sqpoll (Submission Queue Polling)

We need to set up two ring buffers with io_uring_setup(). We then tell the kernel via the io_uring_enter() system call that we have added a submission queue entry (SQE) to the submission queue ring buffer. We can also add multiple SQEs before making the system call. But io_uring also has a submission queue polling mode, in which a kernel thread (the io_uring-sq thread below) polls the submission queue for new entries, so the application can avoid the io_uring_enter() system call entirely. This is done via a special setup flag, IORING_SETUP_SQPOLL (see the liburing sketch after the commands below).

2. Clarifying the setup

## didn't open SQ polling
taskset -c 24-28 redis-server
taskset -c 30 redis-benchmark -c 500 -n 10000000 -q
## open SQ polling
taskset -c 24-28 redis-server --sqpoll
# query pid of sq, and bind it to cpu
ps -ef|grep io_uring-sq
taskset -pc 29 {sq_pid}
taskset -c 30 redis-benchmark -c 500 -n 10000000 -q
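For context, here is a minimal liburing sketch of how a ring with SQ polling is enabled (my illustration under the explanation above, not the PR's exact code); the io_uring-sq kernel thread pinned by taskset above is spawned because of this flag:

#include <liburing.h>
#include <string.h>

/* Sketch: create a ring with submission queue polling enabled. With
 * IORING_SETUP_SQPOLL the kernel spawns an io_uring-sq thread that
 * polls the SQ, so the application can usually skip io_uring_enter(). */
static int setup_sqpoll_ring(struct io_uring *ring, unsigned entries) {
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));
    params.flags = IORING_SETUP_SQPOLL;
    params.sq_thread_idle = 2000; /* ms of idle before the thread sleeps */
    return io_uring_queue_init_params(entries, ring, &params);
}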
Best wishes to you. If you have any questions, call me.
@HanGuangyu Sorry for being so slow to circle back, I'm going to review this tomorrow!
@madolson Don't feel bad, my friends. I can understand you, I know you're very busy.
@madolson Could I get a response when you're not busy (ಥ﹏ಥ), my friend?
@HanGuangyu Sorry for getting delayed; I looked through everything but the io_uring details and left some questions. Still trying to think through how to better encapsulate the async code.
#ifdef HAVE_IO_URING
    if (aeCreateFileEvent(el, fd, AE_READABLE, acceptTcpHandler, NULL) == AE_ERR) {
        serverPanic("Unrecoverable error creating server.ipfd file event.");
    }
#endif
Just to clarify my understanding, we are re-registering this since it will be de-registered each time it fires. Right?
Yes
In ae_iouring.c, aeApiAddEvent() calls io_uring_prep_poll_add(), which submits a single-shot poll request. Using io_uring_prep_poll_multishot() would avoid re-registering the event here.
Ref: https://man.archlinux.org/man/extra/liburing/io_uring_prep_poll_add.3.en
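To make that concrete, a hedged sketch of the multishot variant (assuming liburing 2.x and kernel >= 5.13, which added multishot poll; state mirrors the PR's naming only loosely, and event_data is a hypothetical per-fd pointer):

/* Sketch: arm a multishot poll so the fd stays registered after each
 * completion, instead of re-adding a single-shot poll every time. */
struct io_uring_sqe *sqe = io_uring_get_sqe(state->ring);
if (!sqe) return -1;                     /* SQ ring is full */
io_uring_prep_poll_multishot(sqe, fd, poll_mask);
io_uring_sqe_set_data(sqe, event_data);  /* hypothetical per-fd pointer */
/* On each completion, IORING_CQE_F_MORE set in cqe->flags means the
 * poll request is still armed and will keep producing completions. */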
size_t objlen;
clientReplyBlock *o;

#if defined(HAVE_IO_URING)
It would help readability if using io_uring were controlled by a regular Redis config flag rather than a macro.
I agree with you, a config flag is a better option. The macro was used just to implement the POC quickly.
Could you give me a suggestion about the implementation form of the config flag? Or is there a reference to existing examples in Redis?
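For reference, boolean configs at the time of this PR are declared in the standardConfig table in config.c; a hypothetical sketch of what an io_uring toggle could look like there (the name io-uring-enabled and the field server.io_uring_enabled are made up for illustration):

/* Entry to add inside the configs[] table in config.c (hypothetical
 * config name and server field; IMMUTABLE_CONFIG because the event
 * loop backend cannot change at runtime): */
createBoolConfig("io-uring-enabled", NULL, IMMUTABLE_CONFIG,
                 server.io_uring_enabled, 0, NULL, NULL),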
c->wiov.iov_len = c->bufpos-c->sentlen;
c->wiov.iov_base = c->buf+c->sentlen;
What are the failure modes of submitting a request to the submission queue? It would be nice if we could encapsulate all of the submission into connWrite; it would make this code much easier to maintain. We just need an accounting mechanism.
While reading this code, it seems like we should introduce a new "buffered IO" type so we can abstract the different requirements of the submission queue. The type would have APIs for adding data to the buffer, pulling data out of the buffer, and acknowledging data from the buffer, I think?
My other thought is that connections should maybe also have a callback function?
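To make that idea concrete, one possible shape for such a type (purely illustrative; none of these names exist in Redis today):

/* Hypothetical "buffered IO" type: data is appended, handed to the
 * submission queue, and reclaimed only after the completion queue
 * acknowledges it. */
typedef struct bufferedIO {
    char *buf;         /* backing storage */
    size_t len;        /* bytes written into buf */
    size_t submitted;  /* bytes handed to the SQ, awaiting completion */
    size_t acked;      /* bytes confirmed by the CQ, safe to reclaim */
} bufferedIO;

size_t bioAppend(bufferedIO *b, const char *data, size_t n); /* add data */
struct iovec bioPending(bufferedIO *b);  /* data not yet submitted */
void bioAck(bufferedIO *b, size_t n);    /* mark n bytes completed */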
- failure modes of submitting a request to the submission queue

We use io_uring_queue_init(MAX_ENTRIES, state->ring, 0); to initialize an io_uring instance, which has two queues. MAX_ENTRIES is a macro defined at the beginning of ae_iouring.c by #define MAX_ENTRIES 16384; it sets the length of the io_uring queue, i.e. the number of SQEs (Submission Queue Entries) in the queue. This is a POC solution. (In my opinion, it could be defined in redis.conf and modified by the user. Could I get your opinion, my friends?)

Before we submit a request to the submission queue, we use io_uring_get_sqe to get an available SQE and then populate the data for it. For now in this POC, we return -1 if all SQEs are used:

struct io_uring_sqe *sqe = io_uring_get_sqe(state->ring);
if (!sqe) return -1;

And we can try to make it stronger if you agree and are interested.
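One common way to make it stronger (a liburing pattern offered as a sketch, not the PR's current code) is to flush and retry when the ring is full:

/* Sketch: when io_uring_get_sqe() returns NULL the SQ ring is full;
 * submit the queued SQEs to free entries, then try once more. */
struct io_uring_sqe *sqe = io_uring_get_sqe(state->ring);
if (!sqe) {
    io_uring_submit(state->ring);   /* flush pending SQEs to the kernel */
    sqe = io_uring_get_sqe(state->ring);
    if (!sqe) return -1;            /* still full: report the error */
}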
- encapsulate all of the submission into connWrite

Sounds good, I will implement it.

- new "buffered IO" type

I basically agree now, but I'm sorry that I need some time to gain a deeper understanding before implementing it.

- a callback function in connection

Could you give me more information about what you have in mind?
}
#if defined(HAVE_IO_URING)
    if (aeCreateFileEventWithBuf(server.el, c->conn->fd, AE_WRITABLE,
As a general comment, we wanted to hide all of the ae code within the connection objects, so this would be breaking that particular barrier.
I will try to fix it. And could I get more information about this rule?
@@ -3341,6 +3351,7 @@ void initServer(void) {

    /* Register a readable event for the pipe used to awake the event loop
     * when a blocked client in a module needs attention. */
    aeRegisterFile(server.el, server.module_blocked_pipe[0]);
?
I'm sorry that I didn't explain it in advance. This is an advanced feature of io_uring: file registration.
Every time an operation is issued for a file descriptor, the kernel has to spend cycles mapping the file descriptor to its internal representation. For repeated operations over the same file, io_uring allows you to pre-register those files and save on the lookup.
The io_uring_register() system call registers user buffers or files for use in an io_uring(7) instance referenced by fd. Registering files or user buffers allows the kernel to take long-term references to internal data structures or create long-term mappings of application memory, greatly reducing per-I/O overhead.
For more details, see: https://unixism.net/loti/ref-iouring/io_uring_register.html. Or talk with me if you are confused about it.
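As a sketch of the mechanism (standard liburing calls; client_fd is a hypothetical descriptor and this is not the PR's exact code), files are registered once and later referenced by index with IOSQE_FIXED_FILE:

/* Sketch: register a set of fds once, then refer to them by index
 * so the kernel skips the per-I/O fd lookup. */
int fds[1] = { client_fd };
io_uring_register_files(ring, fds, 1);

struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
io_uring_prep_poll_add(sqe, 0, POLLIN);  /* 0 = index, not a raw fd */
sqe->flags |= IOSQE_FIXED_FILE;          /* interpret fd as an index */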
@@ -1127,6 +1139,10 @@ static void acceptCommonHandler(connection *conn, int flags, char *ip) {
        freeClient(connGetPrivateData(conn));
        return;
    }

#ifdef HAVE_IO_URING
    readQueryFromClient(conn);
What is this for?
@@ -115,10 +115,19 @@ client *createClient(connection *conn) {
     * in the context of a client. When commands are executed in other
     * contexts (for instance a Lua script) we need a non connected client. */
    if (conn) {
#if !defined(HAVE_IO_URING)
Why not enable tcp no delay?
I'm so sorry, this is a typo. I will fix it. TCP no delay should be enabled.
#if defined(HAVE_IO_URING)
        /* Do not create poll event */
        conn->read_handler = readDoneFromClient;
        conn->write_handler = writeDoneToClient;
Why install a write handler immediately?
I have the same question...
Hello @HanGuangyu. I work with @madolson at AWS and we are interested in this great effort to improve Redis performance!
All of these might have good reasons to be enabled in some systems, but I think we need a clear picture of the io_uring performance boost with and without the extra io-thread and with kernel mitigations disabled. Thank you, and it's nice to read your good work!
Hello @ranshid, nice to meet you.
We will also find a new environment to test io_uring performance with and without kernel mitigations disabled, but I'm sorry that it may take some time. Thank you. If you want to learn any information about this work, call me. It's nice to work with you; it's so good to participate in the work of the Redis community.
@HanGuangyu Hey, sorry for falling off for a bit. We ended up getting busy with 7.0, but this is an important item that we want to improve in Redis, so I'm wondering if you are still around and want to help merge this?
If you are, please look into the possibility of building this on top of #9320.
@HanGuangyu now that #9320 is merged, it would be interesting to see if we can leverage that, and extend it so that it's flexible enough to allow io_uring support to be added in a similar way.
Sorry for my late response. A problem is that we currently lack a sufficient understanding of Redis and #9320.
@oranagra @madolson
@HanGuangyu No worries. There is definitely interest in the community, so I'm sure someone will pick it up eventually. Appreciate you starting the conversation, though!
@HanGuangyu I am interested in this work and found some possible bugs. Please see #9441 (comment) for details.
Hi @madolson, @oranagra, @HanGuangyu
I can pick it up and continue the work if the community is still interested.
@lipzhu Hey, we are interested in this. Please make sure to post a summary of what you are planning to look into. Just want to make sure what you are doing is productive and likely to get pulled.
    if (c->flags & CLIENT_CLOSE_AFTER_REPLY) {
        freeClientAsync(c);
    }
    if (c->flags & CLIENT_CLOSE_AFTER_REPLY) return;
I don't understand this logic; it seems the same as line 1746?
@HanGuangyu Any updates on this? The result looks very promising; we are interested in this as well.
@congwang Welcome to continue the work if you are interested.
@sundb Thank you! I will take a look. From my quick glance, it looks like registered buffers are not yet utilized to eliminate the data copy between kernel space and user space. I think that would make the performance even better.
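For reference, registered buffers look roughly like this with liburing (a hedged sketch; client_fd and the buffer size are illustrative, not code from this PR):

/* Sketch: register a buffer once so the kernel keeps a long-term
 * mapping, avoiding per-I/O pinning of user memory; then issue
 * "fixed" reads that reference the registered buffer by index. */
char buf[16 * 1024];
struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
io_uring_register_buffers(ring, &iov, 1);

struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
io_uring_prep_read_fixed(sqe, client_fd, buf, sizeof(buf), 0, 0);
/* last argument is the index of the registered buffer */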
io_uring is a powerful new asynchronous I/O API for Linux by Jens Axboe from Facebook; it has been part of the Linux kernel since version 5.1. For more details about io_uring, see Lord of the io_uring.
This work is a POC (Proof of Concept) done in cooperation between the Alibaba team of the Anolis community and Uniontech Software Technology Co., Ltd. We have experimentally added io_uring support to Redis and obtained good performance improvements in tests. If this work is approved by the community, we will proceed.
Test Environment
Test Data
Please see the table below for the performance improvement data. We also published it on the Anolis Chinese website.
Test Command
Test Tool: redis-benchmark
We ran the benchmark and the redis-server on the same server, and both redis-server and redis-benchmark were bound to fixed CPU cores.
Looking forward to your comments, best wishes to you.
Related-Issue: #9441