Dorbii/eol_matcher

EOLMatch

EOLMatch is a Go CLI for normalizing inconsistent software names and matching them to products from endoflife.date.

It is designed for cases like:

  • app v10
  • app 10
  • app version 10

All of these can normalize to the same logical product before similarity matching.

What It Does

  • Fetches product data from https://endoflife.date/api/v1/products/full
  • Stores a slim local JSON database
  • Matches input app names from CSV to known products
  • Outputs match results with scores
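The fetch-and-slim steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `slimFromFull`, `fetchAndSave`, and the `keep` list are hypothetical names, and the sketch assumes the full API response wraps products in a `result` array whose entries carry extra keys that can simply be dropped. Release objects are kept as-is.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// fullURL is the endpoint named in this README.
const fullURL = "https://endoflife.date/api/v1/products/full"

// keep lists the per-product keys retained in the slim DB (assumed whitelist).
var keep = []string{"name", "label", "aliases", "category", "releases"}

// envelope models only the "result" array; root metadata is ignored on decode.
type envelope struct {
	Result []map[string]json.RawMessage `json:"result"`
}

// slimFromFull (hypothetical helper) drops root metadata and any product key
// that is not in keep, then re-encodes the remainder.
func slimFromFull(full []byte) ([]byte, error) {
	var in envelope
	if err := json.Unmarshal(full, &in); err != nil {
		return nil, fmt.Errorf("decode full response: %w", err)
	}
	out := envelope{Result: make([]map[string]json.RawMessage, 0, len(in.Result))}
	for _, p := range in.Result {
		s := make(map[string]json.RawMessage, len(keep))
		for _, k := range keep {
			if v, ok := p[k]; ok {
				s[k] = v
			}
		}
		out.Result = append(out.Result, s)
	}
	return json.Marshal(out)
}

// fetchAndSave (hypothetical helper) downloads the full list and writes the slim DB.
func fetchAndSave(path string) error {
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(fullURL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	full, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	slim, err := slimFromFull(full)
	if err != nil {
		return err
	}
	return os.WriteFile(path, slim, 0o644)
}

func main() {
	// Offline demonstration on a made-up payload; call fetchAndSave for real data.
	sample := []byte(`{"total":1,"result":[{"name":"ubuntu","label":"Ubuntu","extra":{},"releases":[]}]}`)
	slim, err := slimFromFull(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(slim))
}
```

Keeping the slimming step separate from the HTTP call makes it easy to exercise without network access.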

Slim JSON Schema Used

The local DB intentionally keeps only the following fields for each product:

{
  "result": [
    {
      "name": "ubuntu",
      "label": "Ubuntu",
      "aliases": ["ubuntu-linux"],
      "category": "os",
      "releases": [
        {
          "name": "22.04",
          "codename": "Jammy Jellyfish",
          "label": "22.04 'Jammy Jellyfish' (LTS)",
          "releaseDate": "2022-04-21",
          "isLts": true,
          "ltsFrom": "2022-04-21",
          "isEoas": false,
          "eoasFrom": "2024-09-30",
          "isEol": false,
          "eolFrom": "2027-04-01",
          "isDiscontinued": false,
          "discontinuedFrom": "2027-04-01",
          "isEoes": true,
          "eoesFrom": "2032-04-09",
          "isMaintained": true,
          "latest": {
            "name": "22.04.2",
            "date": "2022-04-21",
            "link": "https://wiki.ubuntu.com/JammyJellyfish/ReleaseNotes/"
          },
          "custom": {
            "chromeVersion": "M136",
            "nodeVersion": "22.15"
          }
        }
      ]
    }
  ]
}

Root metadata fields from the full API (schema_version, total, generated_at, etc.) are not stored.
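For readers who want to load the slim DB from Go, the schema above could map onto types along these lines. The type names (`DB`, `Product`, `Release`, `Latest`) and the `parseSlimDB` helper are assumptions for illustration and may not match `internal/eoldb/types.go`. Date fields are modeled as `*string` on the assumption that they can be null or absent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DB mirrors the slim file: only the "result" array is present.
type DB struct {
	Result []Product `json:"result"`
}

// Product holds the per-product fields shown in the schema above.
type Product struct {
	Name     string    `json:"name"`
	Label    string    `json:"label"`
	Aliases  []string  `json:"aliases"`
	Category string    `json:"category"`
	Releases []Release `json:"releases"`
}

// Release models one release cycle; pointer fields tolerate null or missing values.
type Release struct {
	Name             string         `json:"name"`
	Codename         *string        `json:"codename"`
	Label            string         `json:"label"`
	ReleaseDate      *string        `json:"releaseDate"`
	IsLts            bool           `json:"isLts"`
	LtsFrom          *string        `json:"ltsFrom"`
	IsEoas           bool           `json:"isEoas"`
	EoasFrom         *string        `json:"eoasFrom"`
	IsEol            bool           `json:"isEol"`
	EolFrom          *string        `json:"eolFrom"`
	IsDiscontinued   bool           `json:"isDiscontinued"`
	DiscontinuedFrom *string        `json:"discontinuedFrom"`
	IsEoes           bool           `json:"isEoes"`
	EoesFrom         *string        `json:"eoesFrom"`
	IsMaintained     bool           `json:"isMaintained"`
	Latest           *Latest        `json:"latest"`
	Custom           map[string]any `json:"custom"`
}

// Latest describes the most recent point release within a cycle.
type Latest struct {
	Name string `json:"name"`
	Date string `json:"date"`
	Link string `json:"link"`
}

// parseSlimDB (hypothetical helper) decodes slim DB bytes into typed values.
func parseSlimDB(data []byte) (DB, error) {
	var db DB
	if err := json.Unmarshal(data, &db); err != nil {
		return DB{}, fmt.Errorf("decode slim db: %w", err)
	}
	return db, nil
}

func main() {
	raw := []byte(`{"result":[{"name":"ubuntu","label":"Ubuntu","aliases":["ubuntu-linux"],
		"category":"os","releases":[{"name":"22.04","isLts":true,"eolFrom":"2027-04-01"}]}]}`)
	db, err := parseSlimDB(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	p := db.Result[0]
	fmt.Println(p.Name, p.Aliases, p.Releases[0].Name, p.Releases[0].IsLts)
}
```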

Project Layout

eolmatch/
  main.go
  go.mod
  internal/
    eoldb/
      types.go
      fetch.go
      load.go
    match/
      normalize.go
      match.go
    utils/
      patternMatcher.go
  testdata/
    fake_eol_products_slim.json
    fake_apps.csv
    fake_matches.csv
    eol_products_real_slim.json

Requirements

  • Go 1.22+

Build

From the repo root:

cd eolmatch
go build -o eolmatch

CLI Flags

  • -f: fetch the full product list from endoflife.date and save it as the slim DB
  • -db: path to the local slim JSON DB (default eol_products_slim.json)
  • -s: similarity threshold in [0.0, 1.0] (default 0.92)
  • -i: input CSV path (columns: name,version)
  • -o: output CSV path (default matches.csv)
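As an illustration of how these flags and defaults might be wired up with Go's standard `flag` package, here is a small sketch. The `config` struct and `parseFlags` function are hypothetical names, and the range check on `-s` is an assumption based on the documented [0.0, 1.0] interval rather than confirmed behavior of the tool.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// config collects the CLI options listed above (hypothetical struct).
type config struct {
	Fetch     bool
	DBPath    string
	Threshold float64
	Input     string
	Output    string
}

// parseFlags parses args (without the program name) using the documented defaults.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("eolmatch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr; the caller reports them
	fs.BoolVar(&c.Fetch, "f", false, "fetch products and save slim DB")
	fs.StringVar(&c.DBPath, "db", "eol_products_slim.json", "path to local slim JSON DB")
	fs.Float64Var(&c.Threshold, "s", 0.92, "similarity threshold in [0.0, 1.0]")
	fs.StringVar(&c.Input, "i", "", "input CSV path")
	fs.StringVar(&c.Output, "o", "matches.csv", "output CSV path")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	// Assumed validation: reject thresholds outside the documented interval.
	if c.Threshold < 0 || c.Threshold > 1 {
		return config{}, fmt.Errorf("-s must be in [0.0, 1.0], got %v", c.Threshold)
	}
	return c, nil
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```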

Usage

1. Fetch real product data

cd eolmatch
go run . -f -db testdata/eol_products_real_slim.json

2. Match inventory CSV against DB

cd eolmatch
go run . \
  -db testdata/eol_products_real_slim.json \
  -i testdata/fake_apps.csv \
  -o testdata/fake_matches.csv \
  -s 0.92

Input CSV Format

The header row is optional.

name,version
node v18,18
node 18,18
node version 18,18

The only required column is name; version is optional.
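A reader for this format could look something like the sketch below, built on `encoding/csv`. The `App` type and the `readApps`/`parseApps` helpers are hypothetical, and the header detection (first cell equals "name", case-insensitively) is an assumption about how an optional header might be recognized.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// App is one input row; Version stays empty when the column is missing.
type App struct {
	Name    string
	Version string
}

// readApps parses rows of "name[,version]" and skips an optional header row.
func readApps(r io.Reader) ([]App, error) {
	cr := csv.NewReader(r)
	cr.FieldsPerRecord = -1 // rows may have one or two columns
	cr.TrimLeadingSpace = true
	rows, err := cr.ReadAll()
	if err != nil {
		return nil, fmt.Errorf("read input csv: %w", err)
	}
	var apps []App
	for i, row := range rows {
		if len(row) == 0 {
			continue
		}
		name := strings.TrimSpace(row[0])
		// Assumed header rule: the first row is a header if its first cell is "name".
		if i == 0 && strings.EqualFold(name, "name") {
			continue
		}
		if name == "" {
			continue // name is the only required column
		}
		a := App{Name: name}
		if len(row) > 1 {
			a.Version = strings.TrimSpace(row[1])
		}
		apps = append(apps, a)
	}
	return apps, nil
}

// parseApps is a convenience wrapper for in-memory input.
func parseApps(data string) ([]App, error) {
	return readApps(strings.NewReader(data))
}

func main() {
	apps, err := parseApps("name,version\nnode v18,18\nnode 18\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, a := range apps {
		fmt.Printf("%q %q\n", a.Name, a.Version)
	}
}
```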

Output CSV Format

matched,aggregated_name,original_name,original_version,matched_candidate,score
true,nodejs,node v18,18,node,0.9940
false,,unknown app,1.0,,

Columns:

  • matched: whether the score met the threshold
  • aggregated_name: canonical product slug from endoflife.date
  • original_name: name from input CSV
  • original_version: version from input CSV
  • matched_candidate: product name/label/alias that matched
  • score: similarity score (empty when unmatched)
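To make the column rules concrete, the following sketch writes rows in this layout with `encoding/csv`. The `Match` type and helper names are hypothetical. Formatting the score with four decimals follows the sample row above, and leaving aggregated_name, matched_candidate, and score empty for unmatched rows follows the second sample row.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// Match is one output row (hypothetical type).
type Match struct {
	Matched          bool
	AggregatedName   string
	OriginalName     string
	OriginalVersion  string
	MatchedCandidate string
	Score            float64
}

var header = []string{"matched", "aggregated_name", "original_name",
	"original_version", "matched_candidate", "score"}

// row renders a Match as CSV fields; unmatched rows keep only the original input.
func (m Match) row() []string {
	if !m.Matched {
		return []string{"false", "", m.OriginalName, m.OriginalVersion, "", ""}
	}
	return []string{"true", m.AggregatedName, m.OriginalName, m.OriginalVersion,
		m.MatchedCandidate, strconv.FormatFloat(m.Score, 'f', 4, 64)}
}

// writeMatches writes the header followed by one line per match.
func writeMatches(w io.Writer, matches []Match) error {
	cw := csv.NewWriter(w)
	if err := cw.Write(header); err != nil {
		return err
	}
	for _, m := range matches {
		if err := cw.Write(m.row()); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

// renderCSV is a convenience wrapper that returns the CSV as a string.
func renderCSV(matches []Match) (string, error) {
	var b strings.Builder
	if err := writeMatches(&b, matches); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderCSV([]Match{
		{Matched: true, AggregatedName: "nodejs", OriginalName: "node v18",
			OriginalVersion: "18", MatchedCandidate: "node", Score: 0.994},
		{OriginalName: "unknown app", OriginalVersion: "1.0"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```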

Name Normalization

Before matching, names are normalized by:

  • lowercasing
  • removing version words (version, ver)
  • stripping trailing numeric version segments
  • normalizing non-alphanumeric separators to spaces
  • collapsing repeated whitespace

This is what makes app v10, app 10, and app version 10 converge.
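The steps above can be sketched as a single function. This is an illustrative approximation, not the contents of `internal/match/normalize.go`: the name `normalize`, the step ordering, and the rule that a `v`-prefixed number such as `v10` counts as a trailing version segment are all assumptions made so that the three example inputs converge.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// versionWords are dropped wherever they appear in the name.
var versionWords = map[string]bool{"version": true, "ver": true}

// isVersionToken reports whether tok is all digits, optionally prefixed by "v"
// (assumed rule, so that "v10" and "10" are both treated as version segments).
func isVersionToken(tok string) bool {
	digits := strings.TrimPrefix(tok, "v")
	if digits == "" {
		return false
	}
	for _, r := range digits {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// normalize applies the listed steps to a raw software name.
func normalize(name string) string {
	lowered := strings.ToLower(name)
	// Turn every non-alphanumeric rune into a space.
	spaced := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return r
		}
		return ' '
	}, lowered)
	// strings.Fields splits on whitespace runs, which also collapses repeats.
	var toks []string
	for _, t := range strings.Fields(spaced) {
		if !versionWords[t] {
			toks = append(toks, t)
		}
	}
	// Strip trailing version segments, but never the last remaining token.
	for len(toks) > 1 && isVersionToken(toks[len(toks)-1]) {
		toks = toks[:len(toks)-1]
	}
	return strings.Join(toks, " ")
}

func main() {
	for _, in := range []string{"app v10", "app 10", "app version 10"} {
		fmt.Printf("%q -> %q\n", in, normalize(in)) // each prints "app" on the right
	}
}
```

Replacing separators before stripping means a dotted version such as `18.2` becomes two numeric tokens, and both are removed by the trailing-segment loop.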

Notes

  • Matching currently compares each input against every product's name, label, and aliases.
  • Candidate normalization is precomputed once per run, so repeated input rows do not re-normalize the full product list.
  • Runtime is roughly O(N * M) for N input rows and M products, because every row is scored against every product.
  • Threshold 0.94-0.97: stricter, fewer false positives.
  • Threshold 0.90-0.93: balanced (the default 0.92 falls in this range).
  • Threshold 0.85-0.89: looser, more recall.
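The matching loop described in these notes might be structured like the sketch below. The similarity metric here is a normalized Levenshtein ratio, swapped in purely for illustration; this README does not say which metric EOLMatch uses, so scores from this sketch will not necessarily match the tool's. All type and function names are hypothetical, and `lower` stands in for the real normalization step.

```go
package main

import (
	"fmt"
	"strings"
)

// Product carries only the fields used for matching.
type Product struct {
	Name    string
	Label   string
	Aliases []string
}

// candidate pairs a normalized string with the product slug it belongs to.
type candidate struct{ slug, raw, norm string }

// buildCandidates normalizes every name, label, and alias once, up front.
func buildCandidates(products []Product, normalize func(string) string) []candidate {
	var out []candidate
	for _, p := range products {
		names := append([]string{p.Name, p.Label}, p.Aliases...)
		for _, n := range names {
			if n != "" {
				out = append(out, candidate{slug: p.Name, raw: n, norm: normalize(n)})
			}
		}
	}
	return out
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein computes edit distance with a two-row dynamic program.
func levenshtein(a, b []rune) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(b)]
}

// similarity maps edit distance into [0, 1]; 1 means identical strings.
func similarity(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	longest := len(ra)
	if len(rb) > longest {
		longest = len(rb)
	}
	if longest == 0 {
		return 1
	}
	return 1 - float64(levenshtein(ra, rb))/float64(longest)
}

// bestMatch scans all candidates (the O(M) inner loop) for an already-normalized input.
func bestMatch(input string, cands []candidate, threshold float64) (slug, raw string, score float64, ok bool) {
	best := -1.0
	var bc candidate
	for _, c := range cands {
		if s := similarity(input, c.norm); s > best {
			best, bc = s, c
		}
	}
	if best < threshold {
		return "", "", 0, false
	}
	return bc.slug, bc.raw, best, true
}

// lower is a stand-in normalizer for this sketch.
func lower(s string) string { return strings.ToLower(s) }

func main() {
	cands := buildCandidates([]Product{{Name: "nodejs", Label: "Node.js", Aliases: []string{"node"}}}, lower)
	slug, raw, score, ok := bestMatch("node", cands, 0.92)
	fmt.Println(slug, raw, score, ok)
}
```

Because candidates are built once and reused for every row, the per-row cost is only the similarity scan.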

Data Source

Product data comes from endoflife.date (https://endoflife.date/api/v1/products/full).
