Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go] Unable to JSON marshal float64 arrays which contain a NaN value #40563

Closed
xxgreg opened this issue Mar 14, 2024 · 9 comments
Closed

[Go] Unable to JSON marshal float64 arrays which contain a NaN value #40563

xxgreg opened this issue Mar 14, 2024 · 9 comments

Comments

@xxgreg
Copy link

xxgreg commented Mar 14, 2024

array.Float64.MarshallJSON() returns the following error for the example code below:

json: unsupported value: NaN
package main

import (
	"fmt"
	"math"

	"github.com/apache/arrow/go/v16/arrow/array"
	"github.com/apache/arrow/go/v16/arrow/memory"
)

func main() {

	builder := array.NewFloat64Builder(memory.DefaultAllocator)
	builder.Append(math.NaN())

	array_ := builder.NewArray()

	bs, err := array_.MarshalJSON()
	if err != nil {
		fmt.Println("error:", err)
	}

	fmt.Println(string(bs))
}

Consider updating the marshalling code for Float64 and other floating point array types with something like:

func (a *array.Float64) MarshalJSON() ([]byte, error) {
	vals := make([]any, a.Len())
	for i := 0; i < a.Len(); i++ {
		if !a.IsValid(i) {
			vals[i] = nil
			continue
		}

		f := a.Value(i)
		switch {
		case math.IsNaN(f):
			vals[i] = "NaN"
		case math.IsInf(f, 1):
			vals[i] = "+Inf"
		case math.IsInf(f, -1):
			vals[i] = "-Inf"
		default:
			vals[i] = f
		}
	}

	return json.Marshal(vals)
}

Component(s)

Go

@zeroshade
Copy link
Member

Hmm, if we do this we'd probably need to also update the corresponding UnmarshalJSON code to properly handle NaN so that we're still symmetrical. I'll mark this as a "good first issue" for someone to pick up

@verma-kartik
Copy link
Contributor

verma-kartik commented Apr 7, 2024

@zeroshade @xxgreg I would like to make the above changes. I can assign myself via take right?
Also, I tried the changes on local, and the change is working as expected. I observed that:

  1. There are not any unit tests for MarshalJSON() and UnmarshalJSON, shall I also add them to for this issue?
  2. /go/arrow/array has separate files for handling Float16 data type, but Float32, Float64, and others are together in numeric.gen.go. Is there an explicit reason to do so or just technical debt to break numeric.gen.go in multiple files?

@zeroshade
Copy link
Member

Hey @verma-kartik, using take would indeed assign yourself to this issue. It would also assign you automatically if you file a PR containing GH-30563 in it's title.

As for your observations:

There are not any unit tests for MarshalJSON() and UnmarshalJSON, shall I also add them to for this issue?

While there aren't tests that explicitly call MarshalJSON and UnmarshalJSON they are both indirectly tested via https://github.com/apache/arrow/blob/main/go/arrow/array/util_test.go and https://github.com/apache/arrow/blob/main/go/arrow/array/json_reader_test.go which call FromJSON, use json.Marshal and use a json.Reader respectively. So you should be able to add the relevant tests for this into one of those files.

  1. /go/arrow/array has separate files for handling Float16 data type, but Float32, Float64, and others are together in numeric.gen.go. Is there an explicit reason to do so or just technical debt to break numeric.gen.go in multiple files?

The reason why Float16 is separate from those is due to the fact that the code gen has to handle float16.Num as opposed to just using float16 like it can do with the other data types which are native. Ideally, the technical debt here would be to replace the numeric.gen.go file to utilize Go generics instead.

@verma-kartik
Copy link
Contributor

take

@verma-kartik
Copy link
Contributor

@zeroshade Thanks for pointing me in the right direction for tests.

@verma-kartik
Copy link
Contributor

verma-kartik commented Apr 8, 2024

@zeroshade @xxgreg math.inf() is of type float64. I have two approaches on how to handle these for float32:
Either, typecast the value to float64 before checking for math.IsInf(),

...
case math.IsInf(float64(f), 1):
      vals[i] = "+Inf"
...

Or, use the math.Float32frombits() to interpret the uint32 value of IEEE-752 of +Inf as float32,

...
case math.Float32frombits(0x7F800000) == f:
      vals[i] = "+Inf"
...

Similarly for handling NaN and -Inf. Which one (or neither) is feasible?

For float16, in, there is Inf() but not NegInf(). It can be added,

...
func NegInf() Num { return Num{bits: 0xFC00} }
...

@zeroshade
Copy link
Member

zeroshade commented Apr 8, 2024

Personally, I would just use strconv.FormatFloat which will properly handle the +Inf, -Inf and NaN cases for 32 or 64 bit floats

verma-kartik pushed a commit to verma-kartik/arrow that referenced this issue Apr 9, 2024
zeroshade added a commit that referenced this issue Apr 19, 2024
…NaN value (#41109)

### Rationale for this change

From this [issue](#40563 (comment)).

### What changes are included in this PR?

MarshalJSON() is changed for Float32, Float64 to handle NaN, +Inf and -Inf values. I was only able to handle NaN for Float16. 

UnmarshalJSON() was already handling NaN, +Inf and -Inf in the right way. 

### Are these changes tested?

These changes are tested. Tests are included in those files where I find it was logically appropriate to do so.

### Are there any user-facing changes?

They will be able to handle corner values for Float32, Float64 now.

* GitHub Issue: #40563

Lead-authored-by: verma-kartik <[email protected]>
Co-authored-by: work-vermakartik <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
@zeroshade
Copy link
Member

Issue resolved by pull request 41109
#41109

@zeroshade zeroshade added this to the 16.1.0 milestone Apr 19, 2024
@xxgreg
Copy link
Author

xxgreg commented Apr 20, 2024

🎉 Thanks team! @verma-kartik @zeroshade

raulcd pushed a commit that referenced this issue Apr 29, 2024
…NaN value (#41109)

### Rationale for this change

From this [issue](#40563 (comment)).

### What changes are included in this PR?

MarshalJSON() is changed for Float32, Float64 to handle NaN, +Inf and -Inf values. I was only able to handle NaN for Float16. 

UnmarshalJSON() was already handling NaN, +Inf and -Inf in the right way. 

### Are these changes tested?

These changes are tested. Tests are included in those files where I find it was logically appropriate to do so.

### Are there any user-facing changes?

They will be able to handle corner values for Float32, Float64 now.

* GitHub Issue: #40563

Lead-authored-by: verma-kartik <[email protected]>
Co-authored-by: work-vermakartik <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
tolleybot pushed a commit to tmct/arrow that referenced this issue May 2, 2024
…ain a NaN value (apache#41109)

### Rationale for this change

From this [issue](apache#40563 (comment)).

### What changes are included in this PR?

MarshalJSON() is changed for Float32, Float64 to handle NaN, +Inf and -Inf values. I was only able to handle NaN for Float16. 

UnmarshalJSON() was already handling NaN, +Inf and -Inf in the right way. 

### Are these changes tested?

These changes are tested. Tests are included in those files where I find it was logically appropriate to do so.

### Are there any user-facing changes?

They will be able to handle corner values for Float32, Float64 now.

* GitHub Issue: apache#40563

Lead-authored-by: verma-kartik <[email protected]>
Co-authored-by: work-vermakartik <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
tolleybot pushed a commit to tmct/arrow that referenced this issue May 4, 2024
…ain a NaN value (apache#41109)

### Rationale for this change

From this [issue](apache#40563 (comment)).

### What changes are included in this PR?

MarshalJSON() is changed for Float32, Float64 to handle NaN, +Inf and -Inf values. I was only able to handle NaN for Float16. 

UnmarshalJSON() was already handling NaN, +Inf and -Inf in the right way. 

### Are these changes tested?

These changes are tested. Tests are included in those files where I find it was logically appropriate to do so.

### Are there any user-facing changes?

They will be able to handle corner values for Float32, Float64 now.

* GitHub Issue: apache#40563

Lead-authored-by: verma-kartik <[email protected]>
Co-authored-by: work-vermakartik <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
rok pushed a commit to tmct/arrow that referenced this issue May 8, 2024
…ain a NaN value (apache#41109)

### Rationale for this change

From this [issue](apache#40563 (comment)).

### What changes are included in this PR?

MarshalJSON() is changed for Float32, Float64 to handle NaN, +Inf and -Inf values. I was only able to handle NaN for Float16. 

UnmarshalJSON() was already handling NaN, +Inf and -Inf in the right way. 

### Are these changes tested?

These changes are tested. Tests are included in those files where I find it was logically appropriate to do so.

### Are there any user-facing changes?

They will be able to handle corner values for Float32, Float64 now.

* GitHub Issue: apache#40563

Lead-authored-by: verma-kartik <[email protected]>
Co-authored-by: work-vermakartik <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
rok pushed a commit to tmct/arrow that referenced this issue May 8, 2024
…ain a NaN value (apache#41109)

### Rationale for this change

From this [issue](apache#40563 (comment)).

### What changes are included in this PR?

MarshalJSON() is changed for Float32, Float64 to handle NaN, +Inf and -Inf values. I was only able to handle NaN for Float16. 

UnmarshalJSON() was already handling NaN, +Inf and -Inf in the right way. 

### Are these changes tested?

These changes are tested. Tests are included in those files where I find it was logically appropriate to do so.

### Are there any user-facing changes?

They will be able to handle corner values for Float32, Float64 now.

* GitHub Issue: apache#40563

Lead-authored-by: verma-kartik <[email protected]>
Co-authored-by: work-vermakartik <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
…ain a NaN value (apache#41109)

### Rationale for this change

From this [issue](apache#40563 (comment)).

### What changes are included in this PR?

MarshalJSON() is changed for Float32, Float64 to handle NaN, +Inf and -Inf values. I was only able to handle NaN for Float16. 

UnmarshalJSON() was already handling NaN, +Inf and -Inf in the right way. 

### Are these changes tested?

These changes are tested. Tests are included in those files where I find it was logically appropriate to do so.

### Are there any user-facing changes?

They will be able to handle corner values for Float32, Float64 now.

* GitHub Issue: apache#40563

Lead-authored-by: verma-kartik <[email protected]>
Co-authored-by: work-vermakartik <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
kou pushed a commit to apache/arrow-go that referenced this issue Aug 30, 2024
…NaN value (#41109)

### Rationale for this change

From this [issue](apache/arrow#40563 (comment)).

### What changes are included in this PR?

MarshalJSON() is changed for Float32, Float64 to handle NaN, +Inf and -Inf values. I was only able to handle NaN for Float16. 

UnmarshalJSON() was already handling NaN, +Inf and -Inf in the right way. 

### Are these changes tested?

These changes are tested. Tests are included in those files where I find it was logically appropriate to do so.

### Are there any user-facing changes?

They will be able to handle corner values for Float32, Float64 now.

* GitHub Issue: #40563

Lead-authored-by: verma-kartik <[email protected]>
Co-authored-by: work-vermakartik <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants