Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSH connection to RISCV failed. #1769

Open
chLFF opened this issue Aug 26, 2024 · 11 comments
Open

SSH connection to RISCV failed. #1769

chLFF opened this issue Aug 26, 2024 · 11 comments

Comments

@chLFF
Copy link

chLFF commented Aug 26, 2024

What happened?

I try to ssh connect to my RISC-V device by python asyncssh, but some how failed.

Python3 asyncssh code:

import asyncio
import asyncssh

if __name__ == "__main__":

    async def run_coro() -> None:
        """ Test coroutine showing the Connection reset by peer bug """

        async with asyncssh.connect("10.0.90.162", port=22, username="device", password="device", known_hosts=None) as conn:
            result = await conn.run("ls /")
            print(" --- ".join(result.stdout.splitlines()))

    asyncio.run(run_coro())

Fail log:

Traceback (most recent call last):
  File "/home/user/workspace/ssh.py", line 13, in <module>
    asyncio.run(run_coro())
  File "/home/user/miniconda3/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/user/workspace/ssh.py", line 9, in run_coro
    async with asyncssh.connect("10.0.90.162", port=22, username="bianbu", password="bianbu", known_hosts=None) as conn:
  File "/home/user/miniconda3/lib/python3.11/site-packages/asyncssh/misc.py", line 298, in __aenter__
    self._coro_result = await self._coro
                        ^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.11/site-packages/asyncssh/connection.py", line 8811, in connect
    return await asyncio.wait_for(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.11/asyncio/tasks.py", line 442, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.11/site-packages/asyncssh/connection.py", line 453, in _connect
    await options.waiter
asyncssh.misc.ConnectionLost: Connection lost

What i found?

I use zlib-ng (version 2.2.1) instead of zlib on my RISC-V board. When i turn WITH_RUNTIME_CPU_DETECTION off, everything went well.
But things work fine on ARM. Guess something is wrong with riscv_check_features.

How to reproduce?

Make and install zlib-ng with/without WITH_RUNTIME_CPU_DETECTION. Compare the asyncssh connection using the code above.

  1. WITH_RUNTIME_CPU_DETECTION on -> failed.
    cmake .. -DZLIB_COMPAT=on -DWITH_RUNTIME_CPU_DETECTION=on
-- The following features have been enabled:

 * CMAKE_BUILD_TYPE, Build type: Release (default)
 * WITH_GZFILEOP, Compile with support for gzFile related functions
 * ZLIB_COMPAT, Compile with zlib compatible API
 * ZLIB_ENABLE_TESTS, Build test binaries
 * ZLIBNG_ENABLE_TESTS, Test zlib-ng specific API
 * WITH_SANITIZER, Enable sanitizer support
 * WITH_GTEST, Build gtest_zlib
 * WITH_OPTIM, Build with optimisation
 * WITH_NEW_STRATEGIES, Use new strategies
 * WITH_RUNTIME_CPU_DETECTION, Build with runtime CPU detection
 * WITH_RVV, Build with RVV intrinsics
  1. WITH_RUNTIME_CPU_DETECTION off -> succeed.
    cmake .. -DZLIB_COMPAT=on -DWITH_RUNTIME_CPU_DETECTION=off
-- The following features have been enabled:

 * CMAKE_BUILD_TYPE, Build type: Release (selected)
 * WITH_GZFILEOP, Compile with support for gzFile related functions
 * ZLIB_COMPAT, Compile with zlib compatible API
 * ZLIB_ENABLE_TESTS, Build test binaries
 * ZLIBNG_ENABLE_TESTS, Test zlib-ng specific API
 * WITH_SANITIZER, Enable sanitizer support
 * WITH_GTEST, Build gtest_zlib
 * WITH_OPTIM, Build with optimisation
 * WITH_NEW_STRATEGIES, Use new strategies
 * WITH_RVV, Build with RVV intrinsics

What i want?

World peace.

@zlib-ng zlib-ng deleted a comment Aug 26, 2024
@zlib-ng zlib-ng deleted a comment Aug 26, 2024
@zlib-ng zlib-ng deleted a comment Aug 26, 2024
@zlib-ng zlib-ng deleted a comment Aug 26, 2024
@mtl1979
Copy link
Collaborator

mtl1979 commented Aug 26, 2024

Run-time feature detection on RISC-V is still quite a new feature and not all kernels support it, so knowing the kernel version will be important.

@chLFF
Copy link
Author

chLFF commented Aug 27, 2024

Linux kernel 6.6.36 on SpacemiT K1.

I compile and run riscv_feature.c as a demo, but it looks just fine.
(Zbc detection is only used in demo, not in libz.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

#include <sys/auxv.h>

// see asm/hwcap.h
#define ISA_V_HWCAP (1 << ('v' - 'a'))
#define ISA_ZBC_HWCAP 45

struct riscv_cpu_features {
    int has_rvv;
    int has_zbc;
};

int is_kernel_version_greater_or_equal_to_6_5() {
    struct utsname buffer;
    uname(&buffer);

    int major, minor;
    if (sscanf(buffer.release, "%d.%d", &major, &minor) != 2) {
        // Something bad with uname()
        return 0;
    }

    if (major > 6 || major == 6 && minor >= 5)
        return 1;
    return 0;
}

void riscv_check_features_compile_time(struct riscv_cpu_features *features) {
    printf("riscv_check_features_compile_time\n");
#if defined(__riscv_v) && defined(__linux__)
    features->has_rvv = 1;
#else
    features->has_rvv = 0;
#endif

#if defined(__riscv_zbc) && defined(__linux__)
    features->has_zbc = 1;
#else
    features->has_zbc = 0;
#endif
}

void riscv_check_features_runtime(struct riscv_cpu_features *features) {
    printf("riscv_check_features_runtime\n");
    unsigned long hw_cap = getauxval(AT_HWCAP);
    features->has_rvv = hw_cap & ISA_V_HWCAP;
    features->has_zbc = hw_cap & ISA_ZBC_HWCAP;
}

void riscv_check_features(struct riscv_cpu_features *features) {
    if (is_kernel_version_greater_or_equal_to_6_5())
        riscv_check_features_runtime(features);
    else
        riscv_check_features_compile_time(features);

    printf("has_rvv = %08x\n", features->has_rvv);
    printf("has_zbc = %08x\n", features->has_zbc);
}

int main(){
    struct riscv_cpu_features rf;
    riscv_check_features(&rf);
}
~ % ./demo
riscv_check_features_runtime
has_rvv = 00200000
has_zbc = 0000002d

@chLFF chLFF changed the title Python asyncssh connection to RISCV failed. SSH connection to RISCV failed. Aug 27, 2024
@chLFF
Copy link
Author

chLFF commented Aug 27, 2024

SSH connection to the RISC-V board using -C shows the same problem as asyncssh.

~ % ssh -C board@ip
board@ip's password:
Connection closed by ip port 22

The issue can be reproduced more efficiently in this way.

@chLFF
Copy link
Author

chLFF commented Aug 27, 2024

Things also happend on Linux kernel 6.1.
6.1 won't run into runtime check, so i assume that something is wrong with is_kernel_version_greater_or_equal_to_6_5().
Then i force feature checking in compile time with -DWITH_RUNTIME_CPU_DETECTION=on and

void Z_INTERNAL riscv_check_features(struct riscv_cpu_features *features) {
    // if (is_kernel_version_greater_or_equal_to_6_5())
    //     riscv_check_features_runtime(features);
    // else
        riscv_check_features_compile_time(features);
}

Connection succeed.
Any idea about is_kernel_version_greater_or_equal_to_6_5()?

@phprus
Copy link
Contributor

phprus commented Aug 27, 2024

@chLFF

If your CPU supports RVV, please replace the riscv_check_features function (

void Z_INTERNAL riscv_check_features(struct riscv_cpu_features *features) {
if (is_kernel_version_greater_or_equal_to_6_5())
riscv_check_features_runtime(features);
else
riscv_check_features_compile_time(features);
}
) with:

void Z_INTERNAL riscv_check_features(struct riscv_cpu_features *features) {
    features->has_rvv = 1;
}

If the error is reproducible, then the problem is in the implementation of zlib-ng algorithms for RISC-V.

@chLFF
Copy link
Author

chLFF commented Aug 27, 2024

@phprus
The replacement you mentioned in #1769 (comment) is basically the same as my modification in #1769 (comment). They both make connection work (with -DWITH_RUNTIME_CPU_DETECTION=on).

And thanks for reminding me.
With -DWITH_RVV=off -DWITH_RUNTIME_CPU_DETECTION=on, the connection succeed.
With -DWITH_RVV=on -DWITH_RUNTIME_CPU_DETECTION=off, the connection succeed.
With -DWITH_RVV=on -DWITH_RUNTIME_CPU_DETECTION=on (original setting), the connection failed.
So it seems that WITH_RVV and WITH_RUNTIME_CPU_DETECTION can not be turned on together.

@phprus
Copy link
Contributor

phprus commented Aug 27, 2024

#1769 (comment) is not equivalent to #1769 (comment)

For test we need to manual set of features->has_rvv to 1.

@chLFF
Copy link
Author

chLFF commented Aug 27, 2024

@chLFF

If your CPU supports RVV, please replace the riscv_check_features function (

void Z_INTERNAL riscv_check_features(struct riscv_cpu_features *features) {
if (is_kernel_version_greater_or_equal_to_6_5())
riscv_check_features_runtime(features);
else
riscv_check_features_compile_time(features);
}

) with:

void Z_INTERNAL riscv_check_features(struct riscv_cpu_features *features) {
    features->has_rvv = 1;
}

If the error is reproducible, then the problem is in the implementation of zlib-ng algorithms for RISC-V.

I follow this to manual set of features->has_rvv to 1. No error shows up during connection.
i.e.
With -DWITH_RVV=on -DWITH_RUNTIME_CPU_DETECTION=on, is_kernel_version_greater_or_equal_to_6_5() commented out, the connection succeed.

And i don't understand why the equivalence is false. Because i also manually set features->has_rvv to 1 but only inside riscv_check_features_compile_time(features) which determined during compiling.

@phprus
Copy link
Contributor

phprus commented Aug 27, 2024

And i don't understand why the equivalence is false. Because i also manually set features->has_rvv to 1 but only inside riscv_check_features_compile_time(features) which determined during compiling.

riscv_check_features_compile_time has macro conditions.

To differentiate a feature detector bug and an implementation bug, we need to completely remove the feature detector from the source and set the features->has_rvv flag unconditionally.

@chLFF
Copy link
Author

chLFF commented Aug 27, 2024

@phprus Thanks very much, i got it. We do have rvv feature and rvv is turned on and thing goes well. So that there is no problem with RISC-V zlib-ng algorithms? As i said, is_kernel_version_greater_or_equal_to_6_5() causes the error?

@mtl1979
Copy link
Collaborator

mtl1979 commented Aug 29, 2024

@phprus Thanks very much, i got it. We do have rvv feature and rvv is turned on and thing goes well. So that there is no problem with RISC-V zlib-ng algorithms? As i said, is_kernel_version_greater_or_equal_to_6_5() causes the error?

Only function call in is_kernel_version_greater_or_equal_to_6_5() that can cause issues is uname... You could write a simple program to check what it returns for buffer.release and compare if you get the same result if you output it in stderr while running Python... If it returns garbage or string with no NULL-termination, then it can crash...

I will try to get all the feature detection fixes added in #1770, when I know the root cause of all the failures and crashes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@phprus @mtl1979 @chLFF and others