-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide shell version of strgen.c to allow use without building a binary. #13
Conversation
The output has been verified to be identical to that of the C language version for all files in this project by comparing the resulting .o files.
Codecov Report
@@ Coverage Diff @@
## master #13 +/- ##
======================================
Coverage 99.9% 99.9%
======================================
Files 16 16
Lines 3091 3091
======================================
Hits 3088 3088
Misses 3 3
Continue to review full report at Codecov.
|
Okay, this patch has more advanced POSIX shell scripting than I know, so I am going to make sure I understand it first, after reading about First, I am assuming that the lines after line 53 were to help you write the script and can be removed after I merge. Please let me know if I am wrong. Second, I assume that lines up to line 19 are just handling command-line arguments, the same way that Lines 21 and 22 are what I am not sure I understand. The I will assume that is the case for the rest of this comment. Lines 24-35 seem to be handling the optional command-line arguments. Lines 37-50 seem to be where the magic is happening. Since the script is running Going forward on that assumption, the here doc appears to be of the same form used by Then lines 46 and 47 is is where more magic happens. That is where the array is created. Line 47 is still a mystery to me. Obviously, the
First, it appears to be using the The first part is easy: it is using Anyway, if it is, then the The The The The What this probably means is that instead of creating an char array listed as numbers, it is creating a string literal. Am I correct about everything? Or did I get something wrong? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate to say this, but this change breaks the bc
completely. It causes seg faults (see the build and click on any one of the four jobs in the build). The original code was correct, and it had a cast for a reason. I will attempt to explain why.
The code for bc
's vectors is sort-of a copy of Dyna's vector, also written by me. The idea is a type-agnostic container, kind of like the C++ STL, but without templates. In order to be type-agnostic, it has to implement by hand, and at runtime, the sort of code that you would get by compiling C++ templates. That means that it has to manually calculate pointers to elements.
As you can probably tell from Dyna, it uses a char array underneath as well, but it returns a void*
, which the user can then assign to a pointer of the correct type. The reason that it has a char*
underneath but returns a void*
is because the C standard doesn't require casts to or from a void*
, but it also says that pointer arithmetic on void*
is undefined, while on char*
, it is guaranteed to be calculated by individual bytes, since char
is defined as the smallest addressable type.
The bc
vector does exactly the same thing (see bc_vec_item()
and friends). Basically, even though it is a char*
underneath, it is doing an address calculation, taking the size of the type that the container holds into account.
In bc_vm_envArgs()
, the vector v
is created to hold the type char*
. This means that underneath, in bc_vec_item()
, it is doing a size calculation equivalent to:
v.v + idx * sizeof(char*)
This all means that for v
specifically, even though v.v
is technically a char*
, it actually contains a char*
, so it is actually a char**
.
As a result, by taking its address, you are sending a char***
to bc_args()
, which then, assuming it has a char**
, sends that to getopt_long()
, which then accesses wrongly and causes the seg fault.
Perhaps a clearer line of code would be the following:
s = bc_args((int) v.len - 1, bc_vec_item(&v, 0));
That line is functionally equivalent to the original line of code.
The reason I did not use that was because I did not want to assume that the compiler would optimize it away, but since it only happens once per run (at most), it might be better to do that instead.
Please either change it back or change it to the line of code above.
Hi Gavin, understood ... I could have made the script far more cryptic, BTW ;-)
Yes, I had pasted in some information to help me get the parameters right.
Yes, the script takes up to 7 parameters and does just assign variable
Yes, exec used in that way connects the files as if the script had been The script could just as well wrap use redirection on the "cat << EOF",
Yes, this could also be done at the places where the variables are used
Yes.
Yes.
The -e is just followed by any sed commands that are passed on the command
Yes, using double quotes had been possible, but then the \ characters had to
Yes. The sed commands can be preceded with a line range, where number are
Yes.
Yes.
Yes.
Yes, one long string literal that is broken in lots of double-quoted fragments, which are to be
|
Thank you for your patience with me as I understood the script. To me, it is imperative that I understand everything in my repo, in case I need to make changes.
I am so glad that you didn't. :P
You explained it so much more concisely.
Totally fine. I don't care about LOC, I care about readability, which you seem to have prioritized. (The
I agree. Thank you.
The original
Because this is the case, would it be possible to put
That way, it a little clearer to read, at least for me, and it seems the POSIX standard requires implementations to be able to take more that 1
Clever. And I am glad there was a solution to this.
Okay, perfect. There is just one problem: the limit on the length of string literals in the C standard. (See §5.2.4.1.1.) This was the original reason I used Right now, all of the items that strings are generated from produce strings less than 4095 characters, so technically, this script would work for everyone. However, I have been thinking about adding a few more functions to Now, I assume that Therefore, my proposal is to keep
Personally, I prefer (1). What do you think? I will go with (1) by default if I don't get a reply before you go to bed. (I am assuming you are in Germany.) |
Anyway, I'm no native speaker of English and therefore the change is only a crude example for what might be added ... ;-)
I will be doing some work on this, but otherwise, it is good enough. Merged. |
The shell version has been verified to lead to identical object files from generated C source files.
It should be a fully compatible replacement for strgen.c.
The reason I rewrote this tool is that I have prepared an import of bc/dc into the FreeBSD base system and having to compile a build tool as a prerequisite of building bc and dc is to be avoided, if at all possible.
I have verified, that the compilation succeeds if "GEN = strgen" is replaced by "GEN = strgen.sh" in Makefile.in.
On systems that do not directly execute the shell script that way, GEN_EMU may be set to /bin/sh or any other Bourne Shell compatible interpreter.