This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

Error: object extended beyond tape #16

Open
dvillaveces-tq opened this issue Feb 7, 2023 · 4 comments

Comments

@dvillaveces-tq

docker run -it -v "$(pwd)/tmp/mrf-parse:/tmp/mrf-parse:rw" dancarbone/danielchalef-mrfparse pipeline -i https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz -o /tmp/mrf-parse/outputs -p -1 -s /tmp/mrf-parse/data/filters/tic_500_shoppable_svcs.csv

INFO[2023-02-04T18:31:29Z] Running step: Download
INFO[2023-02-04T18:31:45Z] Downloaded 47691958 bytes from https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz to /tmp/mrfparse3638373359/src/CO_CBPLMED0000.json.gz
INFO[2023-02-04T18:31:45Z] Step Download completed in 16 seconds
INFO[2023-02-04T18:31:45Z] Running step: Split
Reading /tmp/mrfparse3638373359/src/CO_CBPLMED0000.json.gz
Closing /tmp/mrfparse3638373359/split/provider_references_00.jsonl after 4.898750 seconds
Closing /tmp/mrfparse3638373359/split/in_network_00.jsonl after 0.561741 seconds
/tmp/mrfparse3638373359/split/root.json written successfully
Completed in 5.465689 secondsINFO[2023-02-04T18:31:51Z] Step Split completed in 6 seconds
INFO[2023-02-04T18:31:51Z] Running step: Parse
INFO[2023-02-04T18:31:51Z] Loaded 493 services.
INFO[2023-02-04T18:31:51Z] Found 3 files.
INFO[2023-02-04T18:31:51Z] MrfRoot file parsed: /tmp/mrfparse3638373359/split/root.json
INFO[2023-02-04T18:31:51Z] Found in_network_rate file/tmp/mrfparse3638373359/split/root.json
INFO[2023-02-04T18:31:51Z] Parsing in_network_rates: /tmp/mrfparse3638373359/split/in_network_00.jsonl
ERRO[2023-02-04T18:31:51Z] Fatal error in /app/pkg/mrfparse/mrf/in_network_rates.go#390: corrupt input: object extended beyond tape

@danielchalef
Owner

I'm unable to reproduce this with the same input. See below. Can you confirm you're using the latest pull from main and Go 1.19.x?

daniel@server1 ➜  mrfparse git:(main) ✗ ./out/bin/mrfparse pipeline -i https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz  -o /tmp/out -s data/tic_500_shoppable_services.csv -p 0
INFO[2023-02-07T17:50:57Z] Running step: Download
INFO[2023-02-07T17:51:01Z] Downloaded 47691958 bytes from https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz to /tmp/mrfparse3704144958/src/CO_CBPLMED0000.json.gz
INFO[2023-02-07T17:51:01Z] Step Download completed in 4 seconds
INFO[2023-02-07T17:51:01Z] Running step: Split
Reading /tmp/mrfparse3704144958/src/CO_CBPLMED0000.json.gz
Closing /tmp/mrfparse3704144958/split/provider_references_00.jsonl after 3.308092 seconds
Closing /tmp/mrfparse3704144958/split/in_network_00.jsonl after 0.515044 seconds
/tmp/mrfparse3704144958/split/root.json written successfully
Completed in 3.824363 secondsINFO[2023-02-07T17:51:04Z] Step Split completed in 3 seconds
INFO[2023-02-07T17:51:04Z] Running step: Parse
INFO[2023-02-07T17:51:04Z] Loaded 493 services.
INFO[2023-02-07T17:51:04Z] Found 3 files.
INFO[2023-02-07T17:51:04Z] MrfRoot file parsed: /tmp/mrfparse3704144958/split/root.json
INFO[2023-02-07T17:51:04Z] Found in_network_rate file/tmp/mrfparse3704144958/split/root.json
INFO[2023-02-07T17:51:04Z] Parsing in_network_rates: /tmp/mrfparse3704144958/split/in_network_00.jsonl
INFO[2023-02-07T17:51:04Z] Completed reading negotiated_rates: /tmp/mrfparse3704144958/split/in_network_00.jsonl
INFO[2023-02-07T17:51:05Z] Found 4980 providers in in_network_rates.
INFO[2023-02-07T17:51:05Z] Found provider_references fileprovider_references_00.jsonl
INFO[2023-02-07T17:51:05Z] Parsing provider references: /tmp/mrfparse3704144958/split/provider_references_00.jsonl
INFO[2023-02-07T17:51:06Z] Completed reading provider references: /tmp/mrfparse3704144958/split/provider_references_00.jsonl
INFO[2023-02-07T17:51:06Z] Found 275675 providers. Matched on 4980 providers.
INFO[2023-02-07T17:51:06Z] Step Parse completed in 2 seconds
INFO[2023-02-07T17:51:06Z] Running step: Clean
INFO[2023-02-07T17:51:06Z] Step Clean completed in 0 seconds

@frishrash

frishrash commented Feb 20, 2023

docker run -it -v "$(pwd)/tmp/mrf-parse:/tmp/mrf-parse:rw" dancarbone/danielchalef-mrfparse pipeline -i https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz -o /tmp/mrf-parse/outputs -p -1 -s /tmp/mrf-parse/data/filters/tic_500_shoppable_svcs.csv

INFO[2023-02-04T18:31:29Z] Running step: Download
INFO[2023-02-04T18:31:45Z] Downloaded 47691958 bytes from https://antm-pt-prod-dataz-nogbd-nophi-us-east1.s3.amazonaws.com/anthem/CO_CBPLMED0000.json.gz to /tmp/mrfparse3638373359/src/CO_CBPLMED0000.json.gz
INFO[2023-02-04T18:31:45Z] Step Download completed in 16 seconds
INFO[2023-02-04T18:31:45Z] Running step: Split
Reading /tmp/mrfparse3638373359/src/CO_CBPLMED0000.json.gz
Closing /tmp/mrfparse3638373359/split/provider_references_00.jsonl after 4.898750 seconds
Closing /tmp/mrfparse3638373359/split/in_network_00.jsonl after 0.561741 seconds
/tmp/mrfparse3638373359/split/root.json written successfully
Completed in 5.465689 secondsINFO[2023-02-04T18:31:51Z] Step Split completed in 6 seconds
INFO[2023-02-04T18:31:51Z] Running step: Parse
INFO[2023-02-04T18:31:51Z] Loaded 493 services.
INFO[2023-02-04T18:31:51Z] Found 3 files.
INFO[2023-02-04T18:31:51Z] MrfRoot file parsed: /tmp/mrfparse3638373359/split/root.json
INFO[2023-02-04T18:31:51Z] Found in_network_rate file/tmp/mrfparse3638373359/split/root.json
INFO[2023-02-04T18:31:51Z] Parsing in_network_rates: /tmp/mrfparse3638373359/split/in_network_00.jsonl
ERRO[2023-02-04T18:31:51Z] Fatal error in /app/pkg/mrfparse/mrf/in_network_rates.go#390: corrupt input: object extended beyond tape

I had a similar issue when I built on Windows. I traced it back, and it looks like a bug in fakesimdjson. When I forced the parser to use simdjson directly, the error went away.

@danielchalef
Owner

Thanks. That's helpful context. @dcarbone may be interested in taking a look ^

@dcarbone
Contributor

Yeah, I saw this when dealing with particularly large files. I probably won't have much time to look into it for a little while, unfortunately :\
