Skip to content

Implement styles#538

Open
ddimaria wants to merge 2 commits intotafia:masterfrom
ddimaria:master
Open

Implement styles#538
ddimaria wants to merge 2 commits intotafia:masterfrom
ddimaria:master

Conversation

@ddimaria
Copy link

@ddimaria ddimaria commented Jul 24, 2025

Add style support for Calamine.

Basic api for getting styles

fn test_worksheet_style_iter() {
    let mut xlsx: Xlsx<_> = wb("styles.xlsx");
    let styles = xlsx.worksheet_style("Sheet 1").unwrap();

    for (row, styles) in styles.rows().enumerate() {
        for (col, style) in styles.iter().enumerate() {
            println!("row: {}, col: {}, style: {:?}", row, col, style);
        }
    }
}

Here is the relevant structs so far:

struct Style {
    font: Option<Font>,
    fill: Option<Fill>,
    borders: Option<Borders>,
    alignment: Option<Alignment>,
    number_format: Option<NumberFormat>,
    protection: Option<Protection>,
    style_id: Option<u32>,
}

struct Color {
    alpha: u8,
    red: u8,
    green: u8,
    blue: u8,
}

struct Borders {
    left: Border,
    right: Border,
    top: Border,
    bottom: Border,
    diagonal_down: Border,
    diagonal_up: Border,
}

struct Font {
    name: Option<String>,
    size: Option<f64>,
    weight: FontWeight,
    style: FontStyle,
    underline: UnderlineStyle,
    strikethrough: bool,
    color: Option<Color>,
    family: Option<String>,
}

struct Alignment {
    horizontal: HorizontalAlignment,
    vertical: VerticalAlignment,
    text_rotation: TextRotation,
    wrap_text: bool,
    indent: Option<u8>,
    shrink_to_fit: bool,
}

struct Fill {
    pattern: FillPattern,
    foreground_color: Option<Color>,
    background_color: Option<Color>,
}

struct NumberFormat {
    format_code: String,
    format_id: Option<u32>,
}

struct Protection {
    locked: bool,
    hidden: bool,
}

@jmcnamara
Copy link
Collaborator

Just a quick note to say that overall this looks good and I think it is going in the right direction.

I'll give more detailed feedback early next week.

For code comments you are already using correct sentence-case but could you also add a full stop/period at the end of each line. This makes it easier to change comments into documentation. It isn't a rule that has been adhered to in the existing code but I would like to improve that going forward.

@azarmadr
Copy link

Man, i wish i discovered this pull early, i wanted to know if a cell is stricken or not. Sat whole day and edited the code myself. Came till adding if stricken or not from the styles.

Learnt a lot from my experience though.
i'll test this pull against my code.

@azarmadr
Copy link

Is the data and styles coming together?
If not, then would we have to read the excel twice to get data, once and then style?

@ddimaria
Copy link
Author

Is the data and styles coming together? If not, then would we have to read the excel twice to get data, once and then style?

Data and Formats have always been processed separately (e.g. worksheet_range() and worksheet_formula()). This PR follows the format pattern (worksheet_style() for styles and worksheet_layout() for column widths and row heights).

When I finish the base work, I'm going to look into optimizations:

  1. More optimal storage (Vec vs Run-Length Encoding) for styles
  2. Looping (to your point, combining read operations to reduce memory/cpu usage)

@azarmadr
Copy link

i got it, even formulae were being used similarly. Was able to satisfy my requirement with worksheet_sytle over a range.

Another doubt, why every item in the style is an option?
All the fonts, borders, and others are available as ids in cellxfs?

hope i am not bothering you.😊

@ddimaria
Copy link
Author

Another doubt, why every item in the style is an option?

@azarmadr this is because the particular style may not exist for a given cell.

@ddimaria
Copy link
Author

ddimaria commented Aug 13, 2025

@jmcnamara We're currently using this fork in production and it's working well. What do I need to do to get this merged with Calamine? Some things to discuss:

  • Is the API agreeable?
  • What type of documentation is needed?
  • Add more testing?
  • Should this be a compiler feature flag?
  • This is only implemented for xlsx, is that OK?

@jmcnamara
Copy link
Collaborator

What do I need to do to get this merged with Calamine?

You should create a non-draft PR with all the WIP files and code removed. I will try to do a review on the current code.

Is the API agreeable?

Overall I think it is okay but I may have specific comments during review.

What type of documentation is needed?

Copy the style of the Xlsx documentation:

  • One line description.
  • One-two paragraph explanation.
  • Parameters section.
  • Errors section if the method Result<(), XlsxError>.
  • Examples section.

https://docs.rs/calamine/latest/calamine/struct.Xlsx.html#method.merged_regions_by_sheet

Add more testing?

That is always good. :-)

Should this be a compiler feature flag?

I think not.

This is only implemented for xlsx, is that OK?

Probably not but lets try get it right for Xlsx first.

STYLE_FEATURE.md Outdated
@@ -0,0 +1,253 @@
# Style Support in Calamine
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this files from the final PR or convert it to documentation.

/// The cell value
pub value: Data,
/// The cell style
pub style: Option<Style>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking the style is never optional in the Excel file formats. There is always an implicit 0 style that is the default and is defined in the theme.xml file. Note, I'm not necessarily saying that it shouldn't be an Option<>.

src/datatype.rs Outdated
self.style = None;
}

/// Check if the cell has style information
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, an Excel cell will always have style information even if it is implicit for size reasons.

let mut style = None;

// Extract style ID if present
if let Ok(Some(style_id_str)) =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the style id is 0 and not None if not present.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done, please review.

}

/// Get effective column width (custom or default)
pub fn get_effective_column_width(&self, column: u32) -> f64 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This effective width is probably not the width that the user would expect and also may be dependent on the default font for the file.

See the following for the calculation in the other direction: https://github.com/jmcnamara/rust_xlsxwriter/blob/main/src/worksheet.rs#L19340-L19368

However, there may not be any simple way to return the correct value in all cases.

}

/// Get effective row height (custom or default)
pub fn get_effective_row_height(&self, row: u32) -> f64 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment for row height although the height calculation is simpler.

use crate::style::*;
use crate::XlsxError;

/// Get default Excel 2007+ theme color scheme (Office theme)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using /// for non-public non-doc comments. Also this is an example of where proper sentence punctuation should be applied.

@jmcnamara
Copy link
Collaborator

jmcnamara commented Aug 14, 2025

First review done. Some general comments:

  • Please squash and or rebase WIP code and reverts into a small number of logical commits or one commit.
  • Fix the clippy and or format warnings the are breaking the CI.
  • Add a full stop/period at the end of all comments.
  • Avoid /// comments for non-public non-doc comments. Use // instead.
  • Post an explanation here for the setting functions in CellData and Style.

@jmcnamara
Copy link
Collaborator

@ddimaria Any update on this?

};

if style_id < self.styles.len() {
style = Some(self.styles[style_id].clone());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is in scope for this project, but at work we've been dealing with a lot of very large workbooks and I'm concerned about the performance effect of cloning the style for each cell (especially because the style object includes strings).

Personally I'd prefer just exposing the style ID and also the styles table and allowing users to map it as required. Although I understand the ergonomics of that are not as nice, I still think it would avoid some potential performance impact.

To be clear I'm saying this without any data to back it up, but anecdotally I made a change to how this library parses formulas and got a significant* speed improvement by avoiding extra allocations for each cell.

Here's an example from a different PR: #568 (comment)

Let me know if you have any thoughts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@louisdewar I share your concern. A good data structure for styles is Run Length Encoding (since styles often repeat). I want to add that feature, as originally discussed here: #538 (comment), though it's been tabled in this PR by the maintainer: #538 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hough it's been tabled in this PR by the maintainer: #538 (comment)

I tabled it because "stretch goal" sounded like a nice to have. If, in fact, it is something that is required to get reasonable performance then that will need to be done before merge.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to get some perf number before implementing. We've been using this branch in production and haven't noticed perf issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to run some very basic benchmarking today and show the results. I'll just comment out the clone and see what impact that has and then do some flamegraphs to see if I can see a meaningful difference.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @louisdewar! I want to do the same, time permitting, with the addition of comparing perf of style vs non-style parts of the library. It also won't take very long to implement RLE.

Copy link
Contributor

@louisdewar louisdewar Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay.

Simple benchmark:

use std::env;
use calamine::{open_workbook_auto, Reader};

fn main() {
    let file_path = env::args().nth(1).expect("Please provide a file path");

    let mut workbook = open_workbook_auto(&file_path)
        .expect("Failed to open workbook");

    for sheet_name in workbook.sheet_names() {
        let range = workbook
            .worksheet_range(&sheet_name)
            .expect("Failed to read worksheet");

        for row in range.rows() {
            for cell in row {
                std::hint::black_box(cell);
            }
        }
    }
}

Before

# warmup = 2
Benchmark 1: ./target/release/examples/parse_tmp ~/Downloads/DESRI_BASE.xlsm
  Time (mean ± σ):      1.285 s ±  0.005 s    [User: 1.206 s, System: 0.075 s]
  Range (min … max):    1.275 s …  1.292 s    10 runs

https://share.firefox.dev/4i0pUAo
7.2% of total extraction time is spent doing a specific style.clone().

After removing .clones

I'd imagine we would have comparible performance between this and just storing the style ID for each cell.

$ git diff
diff --git a/src/xlsx/cells_reader.rs b/src/xlsx/cells_reader.rs
index b1b1168..e60dc2f 100644
--- a/src/xlsx/cells_reader.rs
+++ b/src/xlsx/cells_reader.rs
@@ -132,7 +132,7 @@ where
                     };

                     if style_id < self.styles.len() {
-                        style = Some(self.styles[style_id].clone());
+                        // style = Some(self.styles[style_id].clone());
                     }

                     loop {
@@ -211,7 +211,7 @@ where
                     };

                     if style_id < self.styles.len() {
-                        style = Some(self.styles[style_id].clone());
+                        // style = Some(self.styles[style_id].clone());
                     }

                     loop {
@@ -352,7 +352,8 @@ where
                     {
                         if let Ok(style_id) = atoi_simd::parse::<usize>(style_id_str) {
                             if style_id < self.styles.len() {
-                                self.styles[style_id].clone()
+                                Style::new()
+                                // self.styles[style_id].clone()
                             } else {
                                 Style::new()
                             }
# warmup = 2
Benchmark 1: ./target/release/examples/parse_tmp ~/Downloads/DESRI_BASE.xlsm
  Time (mean ± σ):      1.103 s ±  0.005 s    [User: 1.038 s, System: 0.061 s]
  Range (min … max):    1.093 s …  1.110 s    10 runs

https://share.firefox.dev/47WkCBo

About a 14% reduction in time.

Notes

I assume the easiest way to get this speedup is to simply store style ID in the cells and expose the style table.

I haven't profiled memory, I'm not sure what the best test of this is but I imagine this would also show a reasonable impact assuming you were storing the entire workbook in memory e.g. doing a .collect() on the cell reader.

Also this benchmark could easily be biased, I just picked a large xlsx I was using at work, other workbooks might show something different.

My laptop is quite fast (m4 pro), I can imagine the different is a lot bigger on slower servers etc...

@jmcnamara
Copy link
Collaborator

@ddimaria Any update on this. I would really like to get this upstream in the next few months.

See also: #597

@louisdewar
Copy link
Contributor

I'm also willing to help out on this if that would be useful

@jmcnamara
Copy link
Collaborator

jmcnamara commented Jan 13, 2026

@louisdewar Thanks. That would be great. Could you start by rebasing the PR to the latest HEAD, fixing any issues, and creating a new PR.

Or, if you want to submit your own approach that is fine too.

@louisdewar
Copy link
Contributor

@jmcnamara sounds good. I should have some time in the next few weeks. I think my preference would be to squash all these commits into 1 with a Co-Authored-By and make changes on top or figure out ways to make smaller incremental PRs - I'll decide once I have time to dig into this in detail.

Also it might be good to let me know if you have any thoughts about implementation / things you want to see before I start trying to tidy this up.

@jmcnamara
Copy link
Collaborator

Also it might be good to let me know if you have any thoughts about implementation / things you want to see before I start trying to tidy this up.

@louisdewar

Nothing major. I made some comments above on the original PR. My main change would be to remove the Option<> from style_id because there is always an implicit style ID of 0 = "General". I would also have similar concerns to you about the memory footprint of the proposed design. I'd agree with your comment above:

Personally I'd prefer just exposing the style ID and also the styles table and allowing users to map it as required. Although I understand the ergonomics of that are not as nice, I still think it would avoid some potential performance impact.

#538 (comment)

I don't know if you saw this comment/proposal (See also: #597) but there is now a Rust based Excel number formatting crate called ssfmt that could be plugged in so that the user could have access to a Data or DataType function like:

fn get_formatted_number(&self) -> Option<String> {}

All the style IDs and lookups could then be hidden in the background from the user.

@ddimaria
Copy link
Author

ddimaria commented Jan 15, 2026

@ddimaria Any update on this. I would really like to get this upstream in the next few months.

See also: #597

@jmcnamara I've been a bit busy and haven't been able to return to this. My next body of work is to improve the performance of excel imports (using Calamine), so I can handle this then. That said, if there is a pressing need and someone wants to take it over, they certainly can.

Here is my plan:

  • Merge in main, resolve lints
  • Implement RunLength Encoding and look for perf improvements
  • Add criterion perf tests (run before and after RLE implementation)
  • Clean up documentation

Let me know how you'd like to proceed. Thanks! -David

@jmcnamara
Copy link
Collaborator

Thanks.

As a first step it would be good to have a rebased PR with the CI checks passing. Maybe submit that as a new PR to supersede this one. Due to the number of comments this page often fails to load.

@ddimaria
Copy link
Author

Finally back on this 😄

@ddimaria
Copy link
Author

I've merged main into this branch, resolving conflicts and fixing tests. I also have a 1M cell style file being used in new benchmark tests (used criterion). I'm moving on to flamegraph and will start improvements based on the results.

@ddimaria
Copy link
Author

I just implemented RLE and other optimizations. I'll close this PR and open a new one tomorrow, and will squash all of my commits. The perf is a lot better now, and there is less of a memory footprint.

@ddimaria
Copy link
Author

I added support for Rich Text, which handles various formats within a single cell. This is how excel stores it:

<si>
  <r>
    <rPr>
      <b/>  <!-- bold -->
      <sz val="11"/>
      <color rgb="FF0000FF"/>
      <rFont val="Calibri"/>
    </rPr>
    <t>Bold blue text</t>
  </r>
  <r>
    <rPr>
      <i/>  <!-- italic -->
      <sz val="11"/>
      <color rgb="FFFF0000"/>
    </rPr>
    <t> italic red text</t>
  </r>
</si>

wolfiesch added a commit to SynthGL/calamine that referenced this pull request Feb 14, 2026
Resolves merge conflicts in Cargo.toml (rstest version) and
tests/test.rs (pivot table tests vs style tests — keep both).

This merge adds:
- Style struct with Font/Fill/Borders/Alignment/NumberFormat/Protection
- styles.xml parser (src/xlsx/style_parser.rs)
- worksheet_style() API returning row×col style grid
- RLE-compressed StyleRange for memory efficiency

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@ddimaria
Copy link
Author

I noticed a bug in colors where tints weren't being handled. This is now implemented.

@ddimaria
Copy link
Author

ddimaria commented Mar 2, 2026

@jmcnamara I need to implement conditional formatting and validations, is it OK to add them to this PR?

@jmcnamara
Copy link
Collaborator

@ddimaria This PR is already very large. It would be best to get this stabilized first (is it suitable for review now?).

You can create other PRs that are dependent on this on. Conditional formatting and Data Validation should be separate PRs.

@ddimaria
Copy link
Author

ddimaria commented Mar 2, 2026

@ddimaria This PR is already very large. It would be best to get this stabilized first (is it suitable for review now?).

You can create other PRs that are dependent on this on. Conditional formatting and Data Validation should be separate PRs.

That works for me. I just rebased my fork's commits onto upstream/master and squashed into a single commit.
I'll do a final read through the code this morning and mark this ready for review.

@Erik1000
Copy link

@ddimaria This PR is already very large. It would be best to get this stabilized first (is it suitable for review now?).
You can create other PRs that are dependent on this on. Conditional formatting and Data Validation should be separate PRs.

That works for me. I just rebased my fork's commits onto upstream/master and squashed into a single commit. I'll do a final read through the code this morning and mark this ready for review.

is this PR now ready for review and can be merge? 👀

@ddimaria
Copy link
Author

@ddimaria This PR is already very large. It would be best to get this stabilized first (is it suitable for review now?).
You can create other PRs that are dependent on this on. Conditional formatting and Data Validation should be separate PRs.

That works for me. I just rebased my fork's commits onto upstream/master and squashed into a single commit. I'll do a final read through the code this morning and mark this ready for review.

is this PR now ready for review and can be merge? 👀

@Erik1000 I believe it is, though some conflicts just popped up. Let me review those and CI and it will be ready to review/merge.

@ddimaria ddimaria marked this pull request as ready for review March 17, 2026 14:53
@ddimaria
Copy link
Author

@jmcnamara this PR is ready for review. I'm staring conditional formatting in another branch.

@jmcnamara
Copy link
Collaborator

this PR is ready for review. I'm staring conditional formatting in another branch.

Ok. Fix the cargo fmt issues and if present, any clippy warnings. I think it would be best to resubmit this as a new PR. There are a lot of out of date comments on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants