Parse read_nsrdb_psm4 header with the csv module to keep quoted commas (fixes #2736)#2771
Open
gaoflow wants to merge 1 commit into
Open
Parse read_nsrdb_psm4 header with the csv module to keep quoted commas (fixes #2736)#2771gaoflow wants to merge 1 commit into
gaoflow wants to merge 1 commit into
Conversation
read_nsrdb_psm4 split the three header lines with a naive str.split(','),
which broke spectral-on-demand files whose column names are quoted fields
containing commas (e.g. '"GaAs (Bauhuis et al., 2009)"'). Such names were
split into spurious columns, raising on read. Parse the header lines with
the csv module so quoted fields are kept intact.
Fixes pvlib#2736
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
docs/sphinx/source/referencefor API changes. (n/a — no API change)docs/sphinx/source/whatsnewfor all changes.What this fixes
read_nsrdb_psm4parsed its three header lines with a naivestr.split(','):The NSRDB spectral-on-demand CSVs reported in #2736 have quoted column
names that contain commas, e.g.
These are valid CSV (the commas are inside quotes), and
pandas.read_csvparses the data rows correctly — but
str.split(',')splits each quoted nameinto multiple fragments, inflating the column count. The mismatch between the
mis-split
names/usecolsand the correctly-parsed data then raises on read.The change
Parse the three header lines with the
csvmodule (which honors quoting)instead of
str.split(','). For ordinary (unquoted) files this is identicalto the previous behavior, so the existing readers are unaffected.
This addresses the parsing crash that @kandersolar confirmed should be
supported. The further
map_variables=Trueunit handling for spectral files(W/m²/µm → W/m²/nm) mentioned in the issue is a separate enhancement and is
left out of scope here.
Reproduction (before this PR)
After the fix the quoted columns survive intact
(
'GaAs (Bauhuis et al., 2009)','InGaP (Gray, 2008)').A regression test (
test_read_nsrdb_psm4_quoted_columns_with_commas) is addedthat fails on
mainand passes with this change; the existingread_nsrdb_psm4tests continue to pass.