Adding in Full OCR text as string #435

Merged
merged 7 commits into from
Feb 16, 2024

Conversation

@skalupa (Collaborator) commented Feb 16, 2024

Describe the change
There is a desire to have the full extracted OCR text available as a single string for searching, alongside the text extracted as an array. This PR adds that string under a new field, "string_text", so that existing parsing rules are not affected. The scan_ocr test case is also updated to reflect this change.

Additionally, fixed the formatting of several scanner files so that they pass the code style check.
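
As a rough illustration of the idea described above (not the scanner's actual code), the sketch below builds both representations from one OCR result. The helper name `build_ocr_event` and its `text_field` parameter are hypothetical; the default field name follows the description ("string_text"), although the sample output below shows the field as 'full_text'. Joining on single spaces is also an assumption and may differ from how the scanner handles line breaks.

```python
from typing import Dict, List, Union


def build_ocr_event(
    ocr_output: bytes, text_field: str = "string_text"
) -> Dict[str, Union[bytes, List[bytes]]]:
    """Hypothetical helper: return the word array plus one joined string."""
    # bytes.split() with no argument splits on any ASCII whitespace
    # and drops empty tokens.
    words = ocr_output.split()
    return {
        "text": words,                 # existing array field, unchanged
        text_field: b" ".join(words),  # new single-string field
    }


event = build_ocr_event(b"Lorem Ipsum\nLorem ipsum dolor sit amet,")
print(event["text"])
print(event["string_text"])
```

Keeping the array under its existing key and adding the string under a separate key is what leaves existing parsing rules untouched.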

Describe testing procedures
Tested locally with the scan_ocr test cases, which were slightly modified to reflect the changes in the text fields.

Sample output

{'elapsed': 19.439179,
         'flags': [],
       +  'full_text': b'Lorem Ipsum Lorem ipsum dolor sit amet, consectetur adipisci'
       +               b'ng elit. Cras lobortis sem dui. Morbi at magna quis ligula f'
       +               b'aucibusconsectetur feugiat at purus. Sed nec lorem nibh. Nam'
       +               b' vel libero odio. Vivamus tempus non enim egestas pretium.Ve'
       +               b'stibulum turpis arcu, maximus nec libero quis, imperdiet sus'
       +               b'cipit purus. Vestibulum blandit quis lacus nonsollicitudin. '
       +               b'Nullam non convallis dui, et aliquet risus. Sed accumsan ull'
       +               b'amcorper vehicula. Proin non urna facilisis,condimentum eros'
       +               b' quis, suscipit purus. Morbi euismod imperdiet neque ferment'
       +               b'um dictum. Integer aliquam, erat sitamet fringilla tempus, m'
       +               b'auris ligula blandit sapien, et varius sem mauris eu diam. S'
       +               b'ed fringilla neque est, in laoreetfelis tristique in. Donec '
       +               b'luctus velit a posuere posuere. Suspendisse sodales pellente'
       +               b'sque quam.',
          'text': [b'Lorem',
                   b'Ipsum',
                   b'Lorem',
                   b'ipsum',
                   b'dolor',
                   b'sit',
                   b'amet,',
                   b'consectetur',
                   b'adipiscing',
                   b'elit.',
                   b'Cras',
                   b'lobortis',
                   b'sem',
                   b'dui.',
                   b'Morbi',
                   b'at',
                   b'magna',
                   b'quis',
                   b'ligula',
                   b'faucibus',
                   b'consectetur',
                   b'feugiat',
                   b'at',
                   b'purus.',
                   b'Sed',
                   b'nec',
                   b'lorem',
                   b'nibh.',
                   b'Nam',
                   b'vel',
                   b'libero',
                   b'odio.',
                   b'Vivamus',
                   b'tempus',
                   b'non',
                   b'enim',
                   b'egestas',
                   b'pretium.',
                   b'Vestibulum',
                   b'turpis',
                   b'arcu,',
                   b'maximus',
                   b'nec',
                   b'libero',
                   b'quis,',
                   b'imperdiet',
                   b'suscipit',
                   b'purus.',
                   b'Vestibulum',
                   b'blandit',
                   b'quis',
                   b'lacus',
                   b'non',
                   b'sollicitudin.',
                   b'Nullam',
                   b'non',
                   b'convallis',
                   b'dui,',
                   b'et',
                   b'aliquet',
                   b'risus.',
                   b'Sed',
                   b'accumsan',
                   b'ullamcorper',
                   b'vehicula.',
                   b'Proin',
                   b'non',
                   b'urna',
                   b'facilisis,',
                   b'condimentum',
                   b'eros',
                   b'quis,',
                   b'suscipit',
                   b'purus.',
                   b'Morbi',
                   b'euismod',
                   b'imperdiet',
                   b'neque',
                   b'fermentum',
                   b'dictum.',
                   b'Integer',
                   b'aliquam,',
                   b'erat',
                   b'sit',
                   b'amet',
                   b'fringilla',
                   b'tempus,',
                   b'mauris',
                   b'ligula',
                   b'blandit',
                   b'sapien,',
                   b'et',
                   b'varius',
                   b'sem',
                   b'mauris',
                   b'eu',
                   b'diam.',
                   b'Sed',
                   b'fringilla',
                   b'neque',
                   b'est,',
                   b'in',
                   b'laoreet',
                   b'felis',
                   b'tristique',
                   b'in.',
                   b'Donec',
                   b'luctus',
                   b'velit',
                   b'a',
                   b'posuere',
                   b'posuere.',
                   b'Suspendisse',
                   b'sodales',
                   b'pellentesque',
                   b'quam.']}
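
A small sketch of why the single string helps with searching, assuming a space-joined string like the one in the sample output: a phrase that spans several words never matches any one array element, but it does match the full string. The variable names here are illustrative only.

```python
# A shortened word array in the same shape as the 'text' field above.
words = [b"Lorem", b"ipsum", b"dolor", b"sit", b"amet,"]

# Assumed construction of the full-text field: words joined by single spaces.
full_text = b" ".join(words)

phrase = b"ipsum dolor"

# Searching the array element by element misses multi-word phrases.
in_array = any(phrase in word for word in words)

# Searching the single string finds the phrase directly.
in_string = phrase in full_text

print(in_array, in_string)
```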

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of and tested my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

@skalupa skalupa marked this pull request as draft February 16, 2024 16:16
@phutelmyer phutelmyer marked this pull request as ready for review February 16, 2024 18:25
@phutelmyer (Contributor) left a comment

Works as designed, thank you!

@phutelmyer phutelmyer merged commit 48de208 into master Feb 16, 2024
3 checks passed