Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pivot table - support groups fields #2130

Open
2 of 6 tasks
ZephyrZiggurat opened this issue Jul 19, 2023 · 12 comments
Open
2 of 6 tasks

Pivot table - support groups fields #2130

ZephyrZiggurat opened this issue Jul 19, 2023 · 12 comments
Labels
area:load/save Problems related to loading or saving. enhancement Feature already exists, but should be enahanced. feature:pivot-table Pivot tables (NOT pivot table styles) triaged Checked and and verified that it is actionable (not dup, has required info)

Comments

@ZephyrZiggurat
Copy link

Do you want to request a feature or report a bug?

  • Bug
  • Feature
  • Question

Did you test against the latest CI build?

  • Yes
  • No

If you answered No, please test with the latest development build first.

I cloned and ran locally via ClosedXML.Sandbox.

Version of ClosedXML

0.102.0

What is the current behavior?

Exception during load of a vendor provided spreadsheet.

System.ArgumentOutOfRangeException: The column 'xyz' does not appear in the source range. (Parameter 'sourceName')
   at ClosedXML.Excel.XLPivotFields.Add(String sourceName, String customName)
   at ClosedXML.Excel.XLPivotFields.Add(String sourceName)
   at ClosedXML.Excel.XLWorkbook.LoadSpreadsheetDocument(SpreadsheetDocument dSpreadsheet)
   at ClosedXML.Excel.XLWorkbook.LoadSheets(Stream stream)
   at ClosedXML.Excel.XLWorkbook.Load(Stream stream)
   at ClosedXML.Excel.XLWorkbook..ctor(Stream stream, LoadOptions loadOptions)
   at ClosedXML.Excel.XLWorkbook..ctor(Stream stream)
   ...

What is the expected behavior or new feature?

Spreadsheet is loaded.

Is this a regression from the previous version?

Yes. Works in 0.101.0

Reproducibility

Code to reproduce problem:
The exception happens during the load of an existing file with a lot of pivot tables and connections. I'm sorry but a code sample isnt really helpful here.

new XLWorkbook("file")
  • I attached a sample spreadsheet. (You can drag files on to this issue)

I dont have enough Excel experience to recreate this issue myself and I cant attach the file from the vendor.


I did spend a bit troubleshooting in the code and noticed a couple things.

At the previous commit, there was a check that the field is available before trying to add it in several places:

var cacheField = pivotTableCacheDefinitionPart.PivotCacheDefinition.CacheFields.ElementAt(cf.Index.Value) as CacheField;
if (pt.SourceRangeFieldsAvailable.Contains(cacheField.Name?.Value))
pivotField = pf.Name != null
? pt.ColumnLabels.Add(cacheField.Name, pf.Name.Value)
: pt.ColumnLabels.Add(cacheField.Name.Value);
else
continue;

I assume this is what avoided the exception that is currently being thrown in XLPivotFields.Add.


When i skipped the bad adds in troubleshooting, I also ran into an exception where wss is null in ParsePivotSourceReference:

private XLPivotSourceReference ParsePivotSourceReference(PivotTableCacheDefinitionPart pivotTableCacheDefinitionPart)
{
// TODO: Implement other sources besides worksheetSource
// But for now assume names and references point directly to a range
var wss = pivotTableCacheDefinitionPart.PivotCacheDefinition.CacheSource.WorksheetSource;
if (!String.IsNullOrEmpty(wss.Id))
{

wss can be null with external sources so it needs to return early.

@rklec
Copy link

rklec commented Jul 31, 2023

May be related/same as #783, also having that error with spurious pivotTable and especially apparently data only in xl/pivotCache/pivotCacheDefinitions[0-9].xml. Pivot tables where apparently there, but removed long ago, but cannot find a way to cleanly delete the pivot cache and I tried all the ways - it is probably, because the pivot tables themselve do not exist anymore.

Re-creating or running these macros or so to refresh the pivot cache kinda works and "just" makes the error return a new (non-existent) column/ArgumentOutOfRangeException.

@rklec
Copy link

rklec commented Jul 31, 2023

I can confirm this error does not happen with v0.101.0 and is a regression/new in v0.102.0.

@jahav
Copy link
Member

jahav commented Jul 31, 2023

Repro workbook

using var wb = new XLWorkbook(@"c:\Temp\Issues\2130\ColumnDeletedFromWorksheetButNotFromPivotTable.xlsx");

ColumnDeletedFromWorksheetButNotFromPivotTable.xlsx

The root of the issue is that the source data in the workbook are stale. Pivot tables have their own copy of data they use to calculate and display pivot table (docs).

The problem is that ClosedXML doesn't keep this data and just uses the original data from worksheet. If you delete a column from the worksheet, data are gone even from pivot table. Basically, the current state is fundamentally built on sand and needs to be fixed.

  • I need an abstraction of pivot cache (IXLPivotCache). That was added in 0.102 .
  • I need to add shared string table, because pivot cache data references strings through SST. That was already merged to 0.103.
  • I need to load data to the the cache from workbook and save them back to the workbook. TBD.

If you click refresh all on pivot table, the exception might go away. I will look if I can add some simple check to ignore pivot fields that are missing from source workbook.

@rklec
Copy link

rklec commented Aug 1, 2023

Thanks. That reproducer Excel file works and I can reproduce the issue with that Excel with v0.102.0, but not 0.101. As such downgrading is a feasable workaround.

If you click refresh all on pivot table, the exception might go away.

Yeah and that did not (fully) work for them. The thing is somehow the pivot table in my XLSX file has somehow been deleted completely, while the cache remains (and in the XSLX/ZIP file I can see) and yes, I don't know how that could have happen, as I did not create that Excel file.

I will look if I can add some simple check to ignore pivot fields that are missing from source workbook.

IMHO that would be great. Or a simple LoadOption to ignore all pivot fields/stuff maybe, because one may not actually need it and disabling it could be the simplest solution (for now?).

@rklec
Copy link

rklec commented Aug 1, 2023

The thing is somehow the pivot table in my XLSX file has somehow been deleted completely, while the cache remains (and in the XSLX/ZIP file I can see) and yes, I don't know how that could have happen, as I did not create that Excel file.

Note I have reproduced this again with my customer-provided Excel file (which I described above) and even in ClosedXML v0.100.3 up to the current stable one, it does fail with the same error.

That is interesting, because your ColumnDeletedFromWorksheetButNotFromPivotTable.xlsx is indeed different here and works with v0.101.0 as said.
That means there has to be a different (older/additional) error, so should I open a new issue about that?

@jahav
Copy link
Member

jahav commented Oct 7, 2023

Resolved through #2178, #2182, #2183 and #2186. The original post didn't contain a test file, but pivot cache now loads records and uses pivot cache instead of data from workbook directly.

That is loading only.

Pivot table saving code doesn't work properly for pivot tables with deleted source (or some related reason). Saved files (example) without a valid reference to source data will have a empty pivot table (also shows an error on load because RefreshOnLoad is hardcoded to true in saving code likely because it's buggy and Excel saves us), though it is possible to remove and add fields to recreate the table because data are kept even with deleted source.

But it doesn't crash on load that is the sound of progress 🥇

image

image

@jahav jahav closed this as completed Oct 7, 2023
@ZephyrZiggurat
Copy link
Author

ZephyrZiggurat commented Oct 11, 2023

Unfortunately it's still failing to load my vendor provided spreadsheet. It's a mess inside, believe me.

I don't need any of the data from the pivot tables, so I'm sorry that I'm adding additional work to handle my edge-case.

Should I open a new issue since this continues to not work for this file? And since I'm running into two different issues, let me know if I should open another issue to split them.


  1. Issue where the cacheSource type is external:
<pivotCacheDefinition ... backgroundQuery="1"
    createdVersion="6" refreshedVersion="6" minRefreshableVersion="3" recordCount="0"
    supportSubquery="1" supportAdvancedDrill="1">
        <cacheSource type="external" connectionId="21" />
        <cacheField ...

This one I think I know how to pretty easily fix. I pointed out in the original description how wss can be null in ParsePivotSourceReference in this instance. But with the latest source changes there's an additional check needed when it tries to fallback to find a similar pivot cache. If it'd help, I can submit a PR with these changes.


  1. Issue with a different definition. The cached records are missing a column that is defined in the definition. Digging into the xml, the column missing from the cache is defined like this:
<cacheField name="Months" numFmtId="0" databaseField="0">
    <fieldGroup ...

The presence of the databaseField attribute seems to be the only difference with this field compared to others. This would be fixable by having ReadCacheFields continue when DatabaseField is not null but I have no idea the implications of such a change. I'm not sure how to proceed with this one.

Edit: I tried out continuing based on the presence of DatabaseField and it was a big mistake. It's needed later to load the column names and data. A different approach is needed 😄

@ZephyrZiggurat
Copy link
Author

@jahav sorry to ping you, but I assume you didnt get a prior notification

@jahav jahav changed the title non-generated Pivot Table regression Add support for groups fields Oct 25, 2023
@jahav jahav changed the title Add support for groups fields Pivot table - support groups fields Oct 25, 2023
@jahav jahav removed this from the v0.104 milestone Oct 25, 2023
@jahav
Copy link
Member

jahav commented Oct 25, 2023

The pivot table is using group fields. ClosedXML has a long way before that is supported.

As far as databaseFields:
The fields with databaseField="0" are calculated fields and grouped fields. I know there is an ancient WIP PR in unknown state #885 for calculated fields, I havent even used grouped fields, much less done any work on them.

MS-OI29500 (that is basically a patch over ISO29500 how does Excel actually implement the OOXML and fixes a lot of bugs in the standard) says

2.1.790
Part 1 Section 18.10.1.77, r (PivotCache Record)
a. The standard does not specify restriction on the number of child elements. In Office, the number of child elements shall be equal to the number of cacheField elements with the databaseField attribute equal to 1 or true.

@jahav jahav reopened this Oct 25, 2023
@jahav jahav added area:load/save Problems related to loading or saving. triaged Checked and and verified that it is actionable (not dup, has required info) feature:pivot-table Pivot tables (NOT pivot table styles) feature-request Issue is a feature request, not a bug. enhancement Feature already exists, but should be enahanced. and removed feature-request Issue is a feature request, not a bug. labels Oct 25, 2023
@ZephyrZiggurat
Copy link
Author

Would it make sense in the current implementation to add a DatabaseField property (nullable bool) to XLPivotCacheValuesStats and set it during the reading of cached fields? It could be used to [temporarily] ignore loading of calculated and grouped fields (via checking for non-null) during the processing of column labels and values.

@jahav
Copy link
Member

jahav commented Oct 26, 2023

No. It makes no sense to work with advanced features when I don't even have proper representation of basic features.

@ZephyrZiggurat
Copy link
Author

ZephyrZiggurat commented Oct 8, 2024

@jahav hello again! I would love to be able to make use of the performance improvements and enhancements you've been working on.

Would you be opposed to skipping over any DatabaseFields inside ReadCacheFields?

image

This resolves one of the issues with my input file.

The other issue is with an external CacheSource so the WorksheetSource is null.

image
image

I can create 2 PR's if you are amenable to the changes above. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:load/save Problems related to loading or saving. enhancement Feature already exists, but should be enahanced. feature:pivot-table Pivot tables (NOT pivot table styles) triaged Checked and and verified that it is actionable (not dup, has required info)
Projects
None yet
Development

No branches or pull requests

3 participants