Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

Enable est column width calculation #866

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Conversation

xwiz
Copy link

@xwiz xwiz commented Feb 4, 2022

This is a first step to resolving #129

This approach primarily computes the width size by using an estimate created

After the PR, the library may be used as follows (sample outputs are attached below):

<?php

use Box\Spout\Writer\Common\Creator\WriterEntityFactory;
use Box\Spout\Common\Type;

require __DIR__ . '/vendor/autoload.php';

function generateRows()
{
    $data = [];
    $headers = [];
    for($i=0;$i<10;$i++) {
        $headers[] = "COL-".generateRandomString(5);
    }
    
    $data = [
        $headers
    ];
    
    for($i=0; $i< 10; $i++) {
        $row = [];
        $rand = mt_rand(1, 30);
        foreach ($headers as $header) {
            $randLen = mt_rand(1, $rand);
            $row[] = generateRandomString($randLen);
        }
        $data[] = $row;
    }
    return $data;
}

function generateRandomString($length = 10) {
    $characters = 'aaaaeeeeiiiiiiiioooouuuubcdefghijklmnopqrstuvwxyz';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

$data = generateRows();

$writer = WriterEntityFactory::createWriter(Type::ODS);
$writer->setWidthCalculation(1);
$writer->openToFile('spouter.ods');

foreach ($data as $row) {
    $writer->addRow(WriterEntityFactory::createRowFromArray($row));
}

$writer->close();

?>

spouter.xlsx
spouter.ods

@CLAassistant
Copy link

CLAassistant commented Feb 4, 2022

CLA assistant check
All committers have signed the CLA.

@xwiz
Copy link
Author

xwiz commented Feb 5, 2022

@adrilo I noticed two tests are failing and it seems the test has an error... The two particular resource files involved are generated from a unix environment instead of the current environment.. I think those two files should be dynamically generated to ensure they are compatible with the particular environment..

Meanwhile I've attached further examples of using fixed width style calculation below..

<?php

use Box\Spout\Common\Type;
use Box\Spout\Writer\Common\Creator\Style\StyleBuilder;
use Box\Spout\Writer\Common\Creator\WriterEntityFactory;

require __DIR__ . '/vendor/autoload.php';

function generateRows()
{
    $data = [];
    $headers = [];
    for($i=0;$i<10;$i++) {
        $n = $i+1;
        $headers[] = "COL-$n";
    }
    
    $data = [
        $headers
    ];
    
    for($i=0; $i< 10; $i++) {
        $row = [];
        $rand = mt_rand(1, 30);
        foreach ($headers as $header) {
            $randLen = mt_rand(1, $rand);
            $row[] = generateRandomString($randLen);
        }
        $data[] = $row;
    }
    return $data;
}

function generateRandomString($length = 10) {
    $characters = 'aaaaeeeeiiiiiiiioooouuuubcdefghijklmnopqrstuvwxyz';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

$data = generateRows();
$defaultStyle = (new StyleBuilder())
    ->setFontSize(36)
    ->build();
$boldtyle = (new StyleBuilder())
    ->setFontSize(42)
    ->setFontBold()
    ->build();
//$writer = WriterEntityFactory::createWriter(Type::ODS);
$writer = WriterEntityFactory::createWriter(Type::XLSX);
$writer->setWidthCalculation(2)->setDefaultRowStyle($defaultStyle)->openToFile('spoutw.xlsx');
//$writer->openToFile('spouter.xlsx');

foreach ($data as $i => $row) {
    if ($i == 0) {
        $writer->addRow(WriterEntityFactory::createRowFromArray($row, $boldtyle));
    } else {
        $writer->addRow(WriterEntityFactory::createRowFromArray($row));
    }
}

$writer->close();

spoutw.xlsx
spoutw.ods

* @param $pos Position to insert
* @param $content Content to insert
*/
public static function insertToFile($fp, $pos, $content)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be very costly to rewrite the contents, especially with large spreadsheet

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I will try both approaches and do a benchmark today. It didn't seem to matter for small files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrilo So I ran some tests and this is not actually a bad solution at all... (performance differences range from 3% to 7% and seems to get better for larger amount of rows).. (Only tested 100,000, 500,000 and 1 million rows though)

The difference are quite negligible most likely because the file is still in memory the whole time and the only costly operation here is really calling stream_get_contents.

image
image


/**
* Instead of seeking and re-writing from position, a better hack might be to write dummy empty data
* Enough to take care of any length, then carefully overwrite
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's the strategy I've used elsewhere

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrilo Benchmark result with overwriting empty spaces... (negligible again but may be useful)

image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested 5,000 rows, 500,000 rows and 50,000 rows... The script is included below..

<?php

use Box\Spout\Common\Type;
use Box\Spout\Writer\Common\Creator\Style\StyleBuilder;
use Box\Spout\Writer\Common\Creator\WriterEntityFactory;

require __DIR__ . '/vendor/autoload.php';
ini_set('memory_limit', -1);

function generateRows()
{
    $headers = [];
    for($i=0;$i<10;$i++) {
        $n = $i+1;
        $headers[] = "COL-$n";
    }
    
    yield $headers;
    
    for($i=0; $i< 500000; $i++) {
        $row = [];
        $rand = mt_rand(1, 30);
        foreach ($headers as $header) {
            $randLen = mt_rand(1, $rand);
            $row[] = generateRandomString($randLen);
        }
        yield $row;
    }
}

function generateRandomString($length = 10) {
    $characters = 'aaaaeeeeiiiiiiiioooouuuubcdefghijklmnopqrstuvwxyz';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

$data = generateRows();
$defaultStyle = (new StyleBuilder())
    ->setFontSize(36)
    ->build();
$boldtyle = (new StyleBuilder())
    ->setFontSize(42)
    ->setFontBold()
    ->build();
//$writer = WriterEntityFactory::createWriter(Type::ODS);
$start = hrtime(true);
$writer = WriterEntityFactory::createWriter(Type::XLSX);
$writer->setWidthCalculation(1)->setDefaultRowStyle($defaultStyle)->openToFile('spout_trunk.xlsx');
//$writer->openToFile('spouter.xlsx');

foreach ($data as $i => $row) {
    if ($i == 0) {
        $writer->addRow(WriterEntityFactory::createRowFromArray($row, $boldtyle));
    } else {
        $writer->addRow(WriterEntityFactory::createRowFromArray($row));
    }
}

$writer->close();

$end = hrtime(true);
$elapsed = ($end - $start)/1000000.0;
echo "$elapsed ms counted with truncate".PHP_EOL;


$data = generateRows();
$start = hrtime(true);
$writer = WriterEntityFactory::createWriter(Type::XLSX);
//width calculation style 3 just for testing purposes
$writer->setWidthCalculation(3)->setDefaultRowStyle($defaultStyle)->openToFile('spout_over.xlsx');
//$writer->openToFile('spouter.xlsx');

foreach ($data as $i => $row) {
    if ($i == 0) {
        $writer->addRow(WriterEntityFactory::createRowFromArray($row, $boldtyle));
    } else {
        $writer->addRow(WriterEntityFactory::createRowFromArray($row));
    }
}

$writer->close();

$end = hrtime(true);
$elapsed = ($end - $start)/1000000.0;

echo "$elapsed ms counted with overwrite";

public function autoSetWidth($cell, $style, $zeroBasedIndex)
{
$size = 1 + strlen($cell->getValue());//ensure we have at least 1 space
$size *= $style->isFontBold() ? 1.2 : 1.0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you come up with the 1.2 ratio?

Copy link
Author

@xwiz xwiz Feb 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually meant to be 700/400.. but the 1.2 seemed to fit better.. and that may be because tinier letters like 'i', like you rightly pointed out; occupy less space..

src/Spout/Writer/Common/Entity/Worksheet.php Outdated Show resolved Hide resolved
@Slamdunk
Copy link

Hi, instead of overwriting dummy portion of the end file, just like ODS already did it with openspout/openspout#45 now also XLSX writes the rows in a temporary file and only after that it writes all the remaining data, thanks to the speed of stream_copy_to_stream.

So now column with, alongside auto-sizing, could be easily supported and already achievable by user code without affecting performances 🚀

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants