Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Tests for Base.cwstring #56123

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

rileysheridan
Copy link

I made simple unit tests for this function, but I would like some clarification on empty string and NUL terminated string inputs. cwstring currently throws an error for both, but I test for valid return vectors. If that's wrong, I can change the first and third tests.

Also, sorry for the wall of filemode changes. I tried to remove them from the commit but kept getting "unable to index file" errors. Advice on how to fix would be appreciated!

test/strings/basic.jl Outdated Show resolved Hide resolved
Fixes equality operator in first test of "cwstring" testset

Co-authored-by: Sukera <[email protected]>
@giordano
Copy link
Contributor

Also, sorry for the wall of filemode changes. I tried to remove them from the commit but kept getting "unable to index file" errors. Advice on how to fix would be appreciated!

What operating system are you using? How are you interacting with git? The command line interface/VS Code/something else?

@@ -1409,3 +1409,18 @@ end
@test transcode(String, transcode(UInt8, transcode(UInt16, str))) == str
end
end

@testset "cwstring" begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base.cwstring is only defined on Windows:

if ccall(:jl_get_UNAME, Any, ()) === :NT
"""
Base.cwstring(s)
Converts a string `s` to a NUL-terminated `Vector{Cwchar_t}`, suitable for passing to C
functions expecting a `Ptr{Cwchar_t}`. The main advantage of using this over the implicit
conversion provided by [`Cwstring`](@ref) is if the function is called multiple times with the
same argument.
This is only available on Windows.
"""
function cwstring(s::AbstractString)
bytes = codeunits(String(s))
0 in bytes && throw(ArgumentError("embedded NULs are not allowed in C strings: $(repr(s))"))
return push!(transcode(UInt16, bytes), 0)
end
end
This testset should only be run on Windows, otherwise it'll fail on other platforms, which means you should change this to

if Sys.iswindows()
    @testset "cwstring" begin
        # ...
    end
end

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Testset still fails (intentionally) on Windows, will my branch still be merged or should a fix addressing the tests be pushed first?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't going to merge a PR with broken tests (unless explicitly marked so), but I'm not clear why we should just add a broken test. What's broken? The test? The code?

Copy link
Author

@rileysheridan rileysheridan Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base.cwstring() converts an input string s to a NUL-terminated character vector but currently throws an error for any input string s that contains a NUL character anywhere. This is the problem test/input

str_2 = "Wordu\000"
# ...
@test Base.cwstring(str_2) == UInt16[0x0057, 0x006f, 0x0072, 0x0064, 0x0000]

where I input a string with a terminating NUL expecting cwstring() to trim the NUL off the end of the input and proceed to convert the remainder as normal, but instead cwstring() errors. Sorry for not clarifying earlier, but my ultimate question is should we add terminal NUL trimming to cwstring() or just leave the function alone? If the latter, I can just change my broken test to

@test_throws ArgumentError Base.cwstring(str_2)

so all tests in the testset work and we can merge this PR without issue.

@giordano giordano added system:windows Affects only Windows test This change adds or pertains to unit tests strings "Strings!" labels Oct 19, 2024
@rileysheridan
Copy link
Author

What operating system are you using? How are you interacting with git? The command line interface/VS Code/something else?

Sorry it took me so long to get back to you. Right now I’m on Windows and am just using the CLI (Cygwin) for Git interaction.

When I first opened this PR, I had an SSH connection to my GitHub account from Windows Command Prompt but not from Cygwin. I was originally swapping between Command Prompt and Cygwin (building Julia with Cygwin, git push through Command Prompt, for example). It’s been so long I don’t remember exactly how my bad workflow worked, but I know it was likely the cause behind the random filemode changes and made reverting them difficult. I eventually connected Cygwin to my GitHub, used just Cygwin, and have not had similar permission issues or “unable to index file” errors since.

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these tests! This was indeed a hole in our test coverage.

str_3 = "aܣ𒀀"
@test Base.cwstring(str_0) == UInt16[0x0000]
@test_throws ArgumentError Base.cwstring(str_1)
@test Base.cwstring(str_2) == UInt16[0x0057, 0x006f, 0x0072, 0x0064, 0x0000]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@test Base.cwstring(str_2) == UInt16[0x0057, 0x006f, 0x0072, 0x0064, 0x0000]
@test_throws ArgumentError Base.cwstring(str_2)

I do not think we should have automatic NUL trimming on cwstring.

  • There would be a slight performance cost to check for that
  • Generally, cwstring(s) produces a Vector{Cwchar_t} that represents s exactly. If we trimmed trailing NUL characters then the returned value would not fully describe the input
  • I don't see a case where someone would have a String or AbstractString that is semantically NUL terminated and want to automatically strip that character.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
strings "Strings!" system:windows Affects only Windows test This change adds or pertains to unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants