Skip to content

Conversation

@PythonFZ
Copy link
Member

@PythonFZ PythonFZ commented Oct 23, 2025

WIP

Summary by CodeRabbit

  • New Features

    • RDKit-first substructure matching (hydrogen modes, atom-mapping order) and fragment-aware grouping
    • Molecular-graph rendering with bond-order-aware visuals
    • Bundled Packmol binary, new Packmol CLI, and default packing using the bundled binary
  • Documentation

    • Rewritten atom-selection and Packmol guides/notebooks with RDKit examples, diagrams, and visualization guidance
    • Updated index/quick-start and examples to RDKit workflows
  • Infrastructure

    • Multi-platform wheel/sdist CI workflow and packaging changes
  • Tests

    • Tests migrated to RDKit-centric inputs; expanded mapping and hydrogen coverage

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 23, 2025

Warning

Rate limit exceeded

@PythonFZ has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 18 minutes and 34 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 38d6384 and 4960442.

📒 Files selected for processing (1)
  • .github/workflows/build-wheels.yaml (1 hunks)

Walkthrough

This PR shifts substructure matching to an RDKit-centered API, adds fragment-aware grouping and visualization, bundles/compiles a Packmol binary with a hatch build hook and wrapper/CLI, extends utilities (graph drawing, density, unwrap, bond detection), updates docs/notebooks, tests, CI, and packaging metadata.

Changes

Cohort / File(s) Summary
Substructure & selection (RDKit-centric)
src/molify/substructure.py, src/molify/substructure.py (new API), tests/test_substructure.py, docs/source/atom_selection.ipynb
Rewrote matching to RDKit: match_substructure(Chem.Mol, smarts_or_smiles, hydrogens, mapped_only); added group_matches_by_fragment, get_substructures, updated visualization helpers and tests; notebooks converted to RDKit-based examples and reorganized sections.
Packmol bundling & invocation
src/molify/packmol.py, src/molify/pack.py, hatch_build.py, .gitmodules, external/packmol, pyproject.toml
Added packmol wrapper (get/run/CLI) that uses a bundled binary; pack() defaults to bundled binary when packmol=None; added git submodule, Hatch build hook to compile Packmol into wheels, and packaging/script entry packmol.
Utilities & visualization
src/molify/utils.py, molify/utils.py, molify/utils.py (new/updated)
Added utilities: bond-order mapping, connected-component helpers, density/box calculations, unwrap_structures, rdkit_determine_bonds, and draw_molecular_graph with _draw_edges_with_bond_orders; added find_packmol_executable.
Documentation & notebooks
docs/source/index.rst, docs/source/conf.py, docs/source/packmol_tools.ipynb, docs/source/atom_selection.ipynb, README.md
Expanded index and docs, added sphinxcontrib.mermaid to Sphinx extensions, rewrote notebooks to RDKit flows and fragment grouping, updated kernelspec display_name to rdkit2ase, and noted bundled Packmol in README.
Public exports & package API
src/molify/__init__.py, src/molify/substructure.py, src/molify/packmol.py
Updated exports: added group_matches_by_fragment, draw_molecular_graph, packmol API; removed legacy selectors and Julia packmol helpers (e.g., get_packmol_julia_version).
Tests & call sites
tests/*.py, tests/integration/*.py, tests/conftest.py
Tests updated to use RDKit Mol inputs and new substructure signatures; removed explicit packmol arguments to rely on bundled default; removed tests tied to Julia-packmol helpers.
CI / Build workflows
.github/workflows/build-wheels.yaml, .github/workflows/pytest.yaml, .github/workflows/pages.yaml, (removed) .github/workflows/publish.yaml
Added multi-platform build/sdist workflow; consolidated OS matrix for pytest; upgraded checkout action and enabled recursive submodules; removed old publish workflow and Packmol install steps.
Hatch build hook & wheel packaging
hatch_build.py, pyproject.toml
New Hatch build hook compiles Packmol and embeds binary into wheel, marks wheel platform-specific; pyproject.toml bumped version, added build hooks, wheel target, docs dependency sphinxcontrib-mermaid, and packmol script entry.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Substruct as match_substructure
    participant RDKit
    participant Mol as Chem.Mol

    User->>Substruct: match_substructure(Mol, "SMARTS/SMILES", hydrogens, mapped_only)
    activate Substruct
    Substruct->>Substruct: parse pattern & validate maps
    Substruct->>RDKit: run substructure search
    RDKit-->>Substruct: matches (atom index tuples)
    Substruct->>Substruct: apply hydrogen mode & mapped ordering
    Substruct-->>User: tuple of matches
    deactivate Substruct
Loading
sequenceDiagram
    participant User
    participant PackAPI as molify.pack
    participant PackMod as molify.packmol
    participant Bin as packmol binary

    User->>PackAPI: pack(..., packmol=None)
    activate PackAPI
    PackAPI->>PackMod: get_packmol_binary()
    PackMod-->>PackAPI: /.../src/molify/binaries/packmol
    PackAPI->>Bin: subprocess(stdin = pack input)
    Bin-->>PackAPI: stdout/stderr
    PackAPI-->>User: packed ASE Atoms
    deactivate PackAPI
Loading
sequenceDiagram
    participant HatchBuild
    participant Hook as CustomBuildHook
    participant Make as make (external/packmol)
    participant BinDest as src/molify/binaries/

    HatchBuild->>Hook: initialize(version, build_data)
    activate Hook
    Hook->>Hook: locate gfortran (candidates)
    alt gfortran found
        Hook->>Make: run make FORTRAN=<gfortran>
        Make-->>Hook: compiled binary
        Hook->>BinDest: copy binary into package
        Hook->>HatchBuild: mark wheel platform-specific
    else not found
        Hook-->>HatchBuild: raise RuntimeError (guidance)
    end
    deactivate Hook
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–75 minutes

Key areas requiring extra attention:

  • src/molify/substructure.py — RDKit parsing, hydrogen modes, mapped ordering, grouping, and visualization deduplication.
  • hatch_build.py — Fortran tool detection, make invocation, binary inclusion, and wheel metadata mutation.
  • src/molify/packmol.py & src/molify/pack.py — binary discovery, subprocess error handling, CLI behavior.
  • src/molify/utils.py — unwrap, bond-detection, density, and drawing implementations.
  • docs/notebooks — notebook cell correctness with updated APIs and kernelspec.

Possibly related PRs

Poem

🐰 I hopped through code with whiskers bright,

Packmol bundled, compiled by night.
RDKit maps the atoms true,
Graphs draw bonds in pretty view.
Docs and tests now dance in light.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.15% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'update documentation' is vague and generic. While documentation updates are present (notebooks, conf.py, index.rst), the PR encompasses major changes far beyond documentation: new core modules (packmol.py, substructure.py), significant API refactors (match_substructure signature, pack behavior), infrastructure additions (build hooks, GitHub workflows), and version bumps. The title fails to capture the scope or main intent. Clarify the title to reflect the actual primary change—either the documentation overhaul or the underlying RDKit-centric substructure matching refactor, API consolidation, and packaging improvements (e.g., 'Refactor substructure matching to RDKit-centric API and enhance documentation' or 'Add RDKit integration, bundle Packmol, and update docs').
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link

codecov-commenter commented Oct 23, 2025

Codecov Report

❌ Patch coverage is 80.66914% with 104 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.36%. Comparing base (243158f) to head (4960442).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/molify/utils.py 56.18% 85 Missing ⚠️
src/molify/packmol.py 40.00% 15 Missing ⚠️
src/molify/substructure.py 97.93% 3 Missing ⚠️
src/molify/pack.py 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #80      +/-   ##
==========================================
- Coverage   96.76%   92.36%   -4.40%     
==========================================
  Files          25       26       +1     
  Lines        1853     1873      +20     
==========================================
- Hits         1793     1730      -63     
- Misses         60      143      +83     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
molify/utils.py (2)

76-85: calculate_box_dimensions has unit/dimension errors.

total_mass is in amu; dividing by density (kg/m³) and “/1000” yields inconsistent units. Return value is used as Å, but the math doesn’t convert kg→ų. Fix to compute volume in m³, convert to ų, then cube‑root.

 def calculate_box_dimensions(images: list[ase.Atoms], density: float) -> list[float]:
-    """Calculates the dimensions of the simulation box
-
-    based on the molar volume and target density.
-    """
-    total_mass = sum(sum(atoms.get_masses()) for atoms in images)
-    molar_volume = total_mass / density / 1000  # m^3 / mol
-    volume_per_mol = molar_volume * ase.units.m**3 / ase.units.mol
-    box_edge = volume_per_mol ** (1 / 3)
-    return [box_edge] * 3
+    """Compute cubic box edges (Å) for given images at target density (kg/m³).
+
+    Assumes `images` is the list of all molecules to be packed in one box.
+    """
+    # Total mass in kg (get_masses returns amu)
+    total_mass_kg = sum(sum(atoms.get_masses()) for atoms in images) * ase.units._amu
+    # Volume in m³ from density (kg/m³)
+    volume_m3 = total_mass_kg / density
+    # Convert to ų
+    volume_A3 = volume_m3 * 1e30
+    box_edge_A = volume_A3 ** (1 / 3)
+    return [box_edge_A] * 3

169-173: Comment says BFS, code uses DFS.

Update the comment for accuracy.

-        # now do a bfs traversal to unwrap the molecule
+        # now do a DFS traversal to unwrap the molecule
🧹 Nitpick comments (7)
docs/source/index.rst (1)

175-178: Section numbering jumps from “2.” to “4.”

Change “4. Full Reference” to “3. Full Reference” for consistency.

-**4. Full Reference**
+**3. Full Reference**
docs/source/packmol_tools.ipynb (1)

307-329: Runtime/version-dependent output in docs.

“Packmol.jl version: v0.1.12” will drift. Consider:

  • Mark cell as “do not execute” for published docs, or
  • Print a generic success message and hide exact version, or
  • Guard with a short timeout and fallback text.

This keeps the docs resilient across environments.

molify/utils.py (2)

239-260: Fragile parsing of Julia Packmol version.

Splitting on spaces and indexing parts[2] may break with format changes. Prefer a regex search for “Packmol” and a semver token, or expose the query in docs and handle absence gracefully.

-        for line in result.stdout.splitlines():
-            if "Packmol" in line:
-                parts = line.split()
-                if len(parts) >= 3:
-                    return parts[2]
+        import re
+        for line in result.stdout.splitlines():
+            if "Packmol" in line:
+                m = re.search(r'Packmol.*\b(v?\d+(?:\.\d+){1,3})\b', line)
+                if m:
+                    return m.group(1)

70-73: Minor: uses private ASE constant.

ase.units._amu is private; consider hard-coding the CODATA value or referencing public constants to reduce churn risk. Low priority.

molify/__init__.py (1)

35-35: Consider documenting draw_molecular_graph as optional.

If you keep the public symbol, please note in README/docs that plotting requires extra deps (matplotlib), or provide an extras_require pin.

docs/source/atom_selection.ipynb (2)

244-260: Strip committed outputs (large images/printed dumps).

Keep notebooks lean; outputs drift across environments and increase repo size. Use nbstripout or pre-commit to clear outputs.

Also applies to: 248-260, 393-401


133-134: Notebook lint: useless expression warning.

Static analysis flagged a useless expression; remove stray bare expressions that don’t affect execution.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a810754 and 546ad35.

📒 Files selected for processing (5)
  • docs/source/atom_selection.ipynb (9 hunks)
  • docs/source/index.rst (1 hunks)
  • docs/source/packmol_tools.ipynb (2 hunks)
  • molify/__init__.py (2 hunks)
  • molify/utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
molify/__init__.py (1)
molify/utils.py (2)
  • draw_molecular_graph (263-417)
  • unwrap_structures (88-178)
🪛 Ruff (0.14.1)
molify/utils.py

352-352: Avoid specifying long messages outside the exception class

(TRY003)


369-372: Avoid specifying long messages outside the exception class

(TRY003)

docs/source/atom_selection.ipynb

133-133: Found useless expression. Either assign it to a variable or remove it.

(B018)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pytest (3.11, ubuntu-latest, 20.15.1)
  • GitHub Check: pytest (3.10, ubuntu-latest, 20.15.1)
  • GitHub Check: pytest (3.13, ubuntu-latest, 20.15.1)
  • GitHub Check: pytest (3.12, ubuntu-latest, 20.15.1)
🔇 Additional comments (3)
docs/source/index.rst (2)

4-8: No branding mismatch—the documentation is correct. The official package name is "molify" (per pyproject.toml), and the docs consistently use it for imports and function calls. The GitHub repository is named rdkit2ase, but repository names and package names commonly differ and is not an issue. Users will correctly install pip install molify and import import molify.


27-40: All required Mermaid configuration is already in place—no action needed.

Verification confirms that sphinxcontrib.mermaid is configured in docs/source/conf.py (line 25) and sphinxcontrib-mermaid>=1.0.0 is included in the docs dependency group in pyproject.toml. The Mermaid diagrams should render correctly as-is.

docs/source/packmol_tools.ipynb (1)

40-49: Strip notebook outputs and configure nbstripout in pre-commit to prevent recurrence.

The notebook contains 8 output cells with embedded base64-encoded images (2 images, ~3KB each encoded), resulting in unnecessary file bloat. This is part of a broader pattern: all 5 notebooks in docs/source/ carry embedded outputs (13–22 cells each), inflating the repository unnecessarily.

Recommendation remains valid. Strip outputs from this notebook (and ideally all others) and configure nbstripout in .git/.pre-commit-config.yaml to enforce clean notebooks going forward.

visualize_selected_molecules,
)
from molify.utils import unwrap_structures
from molify.utils import draw_molecular_graph, unwrap_structures
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Eager import adds a hard matplotlib dependency at import time.

Importing draw_molecular_graph from init forces matplotlib even when users only need core conversions. Make it lazy (try/except shim) or move to a viz submodule to keep base install slim.

-from molify.utils import draw_molecular_graph, unwrap_structures
+from molify.utils import unwrap_structures
+try:
+    # Optional: available only if plotting deps are installed
+    from molify.utils import draw_molecular_graph  # type: ignore
+except Exception:  # ImportError and others (e.g., backend issues)
+    def draw_molecular_graph(*_, **__):  # pragma: no cover
+        raise RuntimeError(
+            "molify.draw_molecular_graph requires matplotlib. "
+            "Install optional plotting deps, e.g. `pip install molify[plot]`."
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from molify.utils import draw_molecular_graph, unwrap_structures
from molify.utils import unwrap_structures
try:
# Optional: available only if plotting deps are installed
from molify.utils import draw_molecular_graph # type: ignore
except Exception: # ImportError and others (e.g., backend issues)
def draw_molecular_graph(*_, **__): # pragma: no cover
raise RuntimeError(
"molify.draw_molecular_graph requires matplotlib. "
"Install optional plotting deps, e.g. `pip install molify[plot]`."
)
🤖 Prompt for AI Agents
molify/__init__.py around line 18: the file currently does a top-level import of
draw_molecular_graph which forces a matplotlib dependency on import; remove that
eager import and instead either (A) move draw_molecular_graph (and any viz
helpers) into a new submodule molify.viz and export it from there, updating
documentation, or (B) replace the top-level import with a lazy shim: wrap the
import in a try/except ImportError and set draw_molecular_graph to a small
function that raises a clear error suggesting installation of the optional viz
extras (e.g., "pip install molify[viz]") or dynamically import the real function
inside draw_molecular_graph when called; ensure unwrap_structures (core) remains
imported at top-level.

Comment on lines 358 to 364
if layout == "spring":
if weight_by_bond_order:
# Use bond_order as weights - higher order = stronger spring
pos = nx.spring_layout(graph, weight="bond_order", seed=42)
else:
pos = nx.spring_layout(graph, seed=42)
elif layout == "circular":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Guard spring_layout weights when bond_order is None.

NetworkX expects numeric weights; None may error or be ignored inconsistently. Precompute numeric weights (fallback to 1.0).

-    if layout == "spring":
-        if weight_by_bond_order:
-            # Use bond_order as weights - higher order = stronger spring
-            pos = nx.spring_layout(graph, weight="bond_order", seed=42)
+    if layout == "spring":
+        if weight_by_bond_order:
+            # Build a view with numeric weights: None -> 1.0
+            g_w = nx.Graph()
+            g_w.add_nodes_from(graph.nodes(data=True))
+            for u, v, data in graph.edges(data=True):
+                w = data.get("bond_order")
+                g_w.add_edge(u, v, weight=float(w) if isinstance(w, (int, float)) else 1.0)
+            pos = nx.spring_layout(g_w, weight="weight", seed=42)
         else:
             pos = nx.spring_layout(graph, seed=42)
🤖 Prompt for AI Agents
In molify/utils.py around lines 358 to 364, the spring_layout call uses edge
attribute "bond_order" directly which may contain None values and NetworkX
requires numeric weights; ensure all edges have numeric bond_order before
calling spring_layout by iterating edges and replacing None (or non-numeric)
bond_order values with a numeric fallback (e.g., 1.0), then call
nx.spring_layout(graph, weight="bond_order", seed=42). If weight_by_bond_order
is False, leave current behavior unchanged.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (12)
docs/source/index.rst (1)

149-161: Tighten uv install steps.

Consider adding explicit uv venv/create step to avoid confusion for new contributors. For example: “uv venv && . .venv/bin/activate && uv sync”.

docs/source/packmol_tools.ipynb (3)

41-87: Clear notebook outputs before committing.

Large base64 images/prints bloat the repo and go stale. Strip outputs (e.g., nbconvert --clear-output or nbstripout) and keep execution_count null.

Also applies to: 248-268


321-329: Guard Packmol discovery.

The try/except is good; consider also showing a short preflight helper (e.g., checking shutil.which for “packmol.jl”/“packmol”) so users know why cells are skipped.


75-87: Reproducibility note.

You pass default seed=42 in pack(); consider calling it explicitly in examples to make determinism obvious.

tests/test_substructure.py (3)

17-21: Skip tests when Packmol is unavailable.

Gate Packmol-dependent tests to improve local dev ergonomics.

+import shutil
+
 @pytest.mark.parametrize("packmol", ["packmol.jl"])
-def test_match_substructure_box(packmol):
+@pytest.mark.skipif(
+    shutil.which("packmol.jl") is None and shutil.which("packmol") is None,
+    reason="Packmol not installed",
+)
+def test_match_substructure_box(packmol):
@@
 @pytest.mark.parametrize("packmol", ["packmol.jl"])
-def test_get_substructure_box(packmol):
+@pytest.mark.skipif(
+    shutil.which("packmol.jl") is None and shutil.which("packmol") is None,
+    reason="Packmol not installed",
+)
+def test_get_substructure_box(packmol):
@@
 @pytest.mark.parametrize("packmol", ["packmol.jl"])
-def test_iter_fragments(packmol, remove_connectivity):
+@pytest.mark.skipif(
+    shutil.which("packmol.jl") is None and shutil.which("packmol") is None,
+    reason="Packmol not installed",
+)
+def test_iter_fragments(packmol, remove_connectivity):
@@
 @pytest.mark.parametrize("packmol", ["packmol.jl"])
-def test_match_substructure_ions_with_none_bond_orders(packmol):
+@pytest.mark.skipif(
+    shutil.which("packmol.jl") is None and shutil.which("packmol") is None,
+    reason="Packmol not installed",
+)
+def test_match_substructure_ions_with_none_bond_orders(packmol):

Also applies to: 43-49, 61-72, 417-439


24-27: Reduce brittleness of exact index assertions.

Packmol ordering can vary; prefer semantic checks (element sets/counts) over fixed indices to avoid flakes across platforms.

Also applies to: 25-27


260-266: Add a negative test for mapped_only without maps.

Documented behavior is implicit; either enforce an error or test current behavior. Example new test:

+def test_mapped_only_without_maps_returns_full_match():
+    mol = Chem.MolFromSmiles("CCO")
+    mol = Chem.AddHs(mol)
+    # Current behavior: returns full match when no maps are present
+    matches = molify.match_substructure(mol, "CO", mapped_only=True)
+    assert matches == ((1, 2),)
docs/source/atom_selection.ipynb (1)

98-101: Clarify mapped_only semantics.

Add a short note: if mapped_only=True and no atom maps exist in the pattern, the full match is returned. This avoids surprises.

molify/substructure.py (4)

108-121: Consider stricter mapped_only behavior.

If mapped_only=True but the pattern has no atom maps, returning full matches can surprise users. Either raise a ValueError or document explicitly.

-        if mapped_only and mapped_pattern_indices:
+        if mapped_only and not mapped_pattern_indices:
+            raise ValueError("mapped_only=True requires mapped atoms in the pattern")
+        if mapped_only:
             core_atoms = tuple(match[idx] for idx in mapped_pattern_indices)
         else:
             core_atoms = match

236-273: Allow passing match options through get_substructures.

Expose hydrogens/mapped_only to make this utility more flexible.

-def get_substructures(
-    atoms: ase.Atoms,
-    pattern: str,
-    **kwargs,
-) -> list[ase.Atoms]:
+def get_substructures(
+    atoms: ase.Atoms,
+    pattern: str,
+    *,
+    hydrogens: tp.Literal["include", "exclude", "isolated"] = "exclude",
+    mapped_only: bool = False,
+    **kwargs,
+) -> list[ase.Atoms]:
@@
-    mol = ase2rdkit(atoms, **kwargs)
-    matches = match_substructure(mol, pattern)
+    mol = ase2rdkit(atoms, **kwargs)
+    matches = match_substructure(
+        mol, pattern, hydrogens=hydrogens, mapped_only=mapped_only
+    )

387-396: Lazy-import pyplot and avoid prints.

Reduce import overhead and use logging instead of print to keep libraries quiet.

-import matplotlib.pyplot as plt
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:  # for type checkers only
+    import matplotlib.pyplot as plt
@@
-    colors = plt.cm.tab10.colors
+    # Lazy import to avoid heavy import at module import time
+    import matplotlib.pyplot as plt  # type: ignore
+    colors = plt.cm.tab10.colors
@@
-    if not candidate_mols:
-        print("No molecules to draw with the given selections.")
-        return None
+    if not candidate_mols:
+        import logging
+        logging.getLogger(__name__).info(
+            "No molecules to draw with the given selections."
+        )
+        return None

Also applies to: 445-461, 474-482


81-87: Tidy exception messages (TRY003).

Shorten messages or use dedicated exception classes to satisfy linters. Optional, but keeps noise down.

-        raise ValueError(f"Invalid SMARTS/SMILES: {smarts_or_smiles}")
+        raise ValueError("Invalid SMARTS/SMILES pattern")

-                raise ValueError(f"Atom map label '{map_num}' is used multiple times")
+                raise ValueError("Duplicate atom map label found")

-            raise ValueError(
-                f"Invalid value for `hydrogens`: {hydrogens!r}. "
-                "Expected one of 'include', 'exclude', 'isolated'."
-            )
+            raise ValueError("hydrogens must be one of: include, exclude, isolated")

Also applies to: 95-96, 162-165

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 546ad35 and 08eed70.

📒 Files selected for processing (6)
  • docs/source/atom_selection.ipynb (2 hunks)
  • docs/source/index.rst (1 hunks)
  • docs/source/packmol_tools.ipynb (2 hunks)
  • molify/__init__.py (2 hunks)
  • molify/substructure.py (2 hunks)
  • tests/test_substructure.py (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • molify/init.py
🧰 Additional context used
🧬 Code graph analysis (2)
molify/substructure.py (2)
molify/ase2x.py (1)
  • ase2rdkit (187-220)
molify/utils.py (1)
  • find_connected_components (32-62)
tests/test_substructure.py (5)
molify/ase2x.py (1)
  • ase2rdkit (187-220)
molify/substructure.py (4)
  • match_substructure (13-170)
  • get_substructures (236-272)
  • group_matches_by_fragment (173-233)
  • visualize_selected_molecules (387-482)
molify/smiles2x.py (2)
  • smiles2conformers (64-149)
  • smiles2atoms (33-61)
molify/pack.py (1)
  • pack (158-263)
tests/conftest.py (3)
  • ethanol_water (8-14)
  • alanine_dipeptide (18-20)
  • alanine_dipeptide_box (24-29)
🪛 Ruff (0.14.1)
molify/substructure.py

86-86: Avoid specifying long messages outside the exception class

(TRY003)


95-95: Avoid specifying long messages outside the exception class

(TRY003)


162-165: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pytest (3.11, ubuntu-latest, 20.15.1)
  • GitHub Check: pytest (3.12, ubuntu-latest, 20.15.1)
  • GitHub Check: pytest (3.10, ubuntu-latest, 20.15.1)
  • GitHub Check: pytest (3.13, ubuntu-latest, 20.15.1)
🔇 Additional comments (5)
docs/source/index.rst (2)

22-40: Mermaid blocks need Sphinx support configured.

Confirm sphinxcontrib-mermaid (or equivalent) is enabled in docs/conf.py and on your docs builder (e.g., RTD). Otherwise these diagrams won’t render.


47-53: Nice: documents connectivity schema and None bond orders.

Matches current ase2rdkit behavior that tolerates None bond orders.

docs/source/atom_selection.ipynb (1)

32-44: LGTM; SMARTS usage and RDKit-first flow are clear.

Examples correctly use SMARTS and explicit ase2rdkit(suggestions=[...]).

Also applies to: 55-60, 72-77

molify/substructure.py (2)

172-233: Group-by-fragment logic is clear and correct.

Deduplication with order preservation is a nice touch.


275-314: iter_fragments: robust fallback without networkx.

Good defensive implementation using adjacency if networkx is missing.

@PythonFZ PythonFZ mentioned this pull request Nov 5, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08eed70 and 8cfa171.

📒 Files selected for processing (2)
  • docs/source/index.rst (1 hunks)
  • docs/source/packmol_tools.ipynb (2 hunks)
🔇 Additional comments (4)
docs/source/packmol_tools.ipynb (1)

380-380: Verify the nbformat_minor downgrade is intentional.

The notebook format was downgraded from nbformat_minor: 5 to nbformat_minor: 4. This change moves to an older Jupyter Notebook format version (4.4 instead of 4.5). While this may improve compatibility with older Jupyter installations, please confirm this was intentional and not an inadvertent change during editing.

docs/source/index.rst (3)

172-172: The review comment is incorrect; no issues found.

The modules toctree entry on line 172 is valid. The file docs/source/modules.rst exists in the repository, and sphinx.ext.autodoc is properly configured in conf.py. The documentation build will not fail due to this reference.


64-98: Code examples correctly reference exported public APIs — verified.

All four conversion functions used in the Quick Start examples (rdkit2ase, ase2rdkit, rdkit2networkx, ase2networkx) are properly exported in molify's __init__.py, listed in __all__, and defined in their respective modules. The documentation examples accurately use the public API.


27-39: No issues found with Mermaid configuration or documentation structure.

sphinxcontrib-mermaid (>=1.0.0) is properly configured in conf.py and has appropriate version constraints in pyproject.toml. The Mermaid diagram syntax in both flowcharts is valid (graph LR with proper node/edge formatting), and all six conversion functions referenced in the diagrams and code examples exist in the molify API. The modules.rst reference is valid, and documentation structure is complete.

Comment on lines 110 to 133
import molify
# Generate 10 different 3D conformers of water
water = molify.smiles2conformers("O", numConfs=10)
**Molecular System Building**

For many applications, it is useful to solvate molecules or create mixtures.
molify provides a convenient interface to Packmol_ for building molecular systems.

.. code-block:: python
import molify
# Generate conformers for water and ethanol
water = molify.smiles2conformers("O", numConfs=10)
etoh = molify.smiles2conformers("CCO", numConfs=10)
# Pack 5 water and 5 ethanol molecules into a box at 800 kg/m³
# Note: Requires Packmol.jl (Julia) or Packmol to be installed
box = molify.pack(
data=[water, etoh], counts=[5, 5], density=800, packmol="packmol.jl"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify Advanced Features functions and documentation links.

The Advanced Features section references molify.smiles2conformers() and molify.pack() functions. Verify these are exported in the public API and that the documentation link targets (:doc:packmol_tools, :doc:ase_tools, etc. on line 135) exist and resolve correctly during documentation build.

Run the following script to verify function exports and documentation references:


🏁 Script executed:

#!/bin/bash
# Description: Verify Advanced Features functions are exported and doc links exist

echo "=== Checking for function definitions ==="
rg -n "def (smiles2conformers|pack)\s*\(" --type py

echo -e "\n=== Checking for referenced documentation files ==="
fd -e rst "^(packmol_tools|ase_tools|rdkit_tools|atom_selection|networkx_tools)\.rst$" docs/source/

Length of output: 168



---

🏁 Script executed:

```shell
#!/bin/bash
# Description: Verify function exports and find actual documentation references

echo "=== Checking molify/__init__.py for function exports ==="
rg -n "smiles2conformers|pack" molify/__init__.py -C 3

echo -e "\n=== Checking index.rst line 135 for documentation link targets ==="
sed -n '135,140p' docs/source/index.rst

echo -e "\n=== Finding all .rst files in docs/source/ ==="
fd -e rst docs/source/ | head -20

Length of output: 296



---

🏁 Script executed:

```shell
#!/bin/bash
# Description: Verify function exports and find actual documentation references

echo "=== Checking molify/__init__.py for function exports ==="
rg -n "smiles2conformers|pack" molify/__init__.py -C 3

echo -e "\n=== Checking index.rst line 135 for documentation link targets ==="
sed -n '135,140p' docs/source/index.rst

echo -e "\n=== Finding all .rst files in docs/source/ ==="
fd -e rst docs/source/

Length of output: 1277



---

🏁 Script executed:

```shell
#!/bin/bash
# Description: Verify documentation link targets exist

echo "=== Finding all .rst files in docs/source/ ==="
fd . docs/source/ -e rst

echo -e "\n=== Checking specifically for referenced doc files ==="
for docname in packmol_tools ase_tools rdkit_tools atom_selection networkx_tools; do
  if [ -f "docs/source/${docname}.rst" ]; then
    echo "✓ Found: docs/source/${docname}.rst"
  else
    echo "✗ Missing: docs/source/${docname}.rst"
  fi
done

Length of output: 349



---

🏁 Script executed:

```shell
#!/bin/bash
# Description: Check full docs structure and verify this is the issue

echo "=== Full directory structure of docs/ ==="
find docs -type f -name "*.rst" | sort

echo -e "\n=== Check if documentation is incomplete or links are wrong ==="
echo "Searching for any mention of packmol, ase, rdkit in .rst files..."
rg -l "packmol|ase|rdkit" docs/source/

Length of output: 416


Fix broken documentation links in Advanced Features section.

The functions smiles2conformers and pack are correctly exported in the public API. However, the documentation links on line 135 are broken: they reference :doc: targets (which expect .rst files) but the actual documentation files are .ipynb (Jupyter notebooks). The referenced files packmol_tools.rst, ase_tools.rst, rdkit_tools.rst, and atom_selection.rst do not exist; they are named packmol_tools.ipynb, ase_tools.ipynb, etc. This will cause broken links during documentation builds. Update the references to use the correct format (e.g., link to the .ipynb files directly or convert the documentation source files to .rst).

🤖 Prompt for AI Agents
In docs/source/index.rst around lines 110 to 133 (and the broken link at ~135),
the docs reference non-existent :doc: targets (packmol_tools.rst, ase_tools.rst,
rdkit_tools.rst, atom_selection.rst); update those links to point to the actual
notebook files (packmol_tools.ipynb, ase_tools.ipynb, rdkit_tools.ipynb,
atom_selection.ipynb) or convert the notebooks to .rst and update the targets
accordingly — specifically replace the :doc: references with direct
links/appropriate Sphinx roles that reference the .ipynb files (or change the
source filenames to .rst and keep :doc:) so the documentation builder resolves
them.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
molify/__init__.py (1)

17-21: Matplotlib dependency still forced at import time.

The past review comment regarding eager import of draw_molecular_graph remains unresolved. Importing it at the top level forces matplotlib even when users only need core conversions.

molify/utils.py (2)

336-336: Documentation still incorrect for node_size default.

The past review comment remains unresolved. The docstring states "Default is 3000" but the signature defaults to 500.


401-407: Bond order None handling still missing for spring_layout.

The past review comment remains unresolved. When weight_by_bond_order=True, edges with bond_order=None may cause NetworkX errors or inconsistent behavior in spring_layout.

🧹 Nitpick comments (2)
molify/utils.py (2)

264-303: LGTM!

The auto-detection logic correctly prioritizes Julia wrapper over system binary and provides helpful error messages. The implementation is clean and well-documented.

Optional style improvements (per Ruff hints):

  • Line 290: Consider moving pass to an else block for clarity
  • Line 300-303: Consider extracting the error message to a constant if it's reused elsewhere

382-398: LGTM!

The implementation correctly:

  • Handles both provided and new axes
  • Delegates bond order visualization appropriately
  • Uses Jmol color scheme with proper fallback
  • Manages figure lifecycle to prevent double display in notebooks

Optional: Extract error messages at lines 395 and 412-415 to constants per Ruff TRY003 hint.

Also applies to: 417-460

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cfa171 and b9b63e6.

📒 Files selected for processing (4)
  • docs/source/packmol_tools.ipynb (2 hunks)
  • molify/__init__.py (2 hunks)
  • molify/pack.py (4 hunks)
  • molify/utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
molify/pack.py (1)
molify/utils.py (3)
  • calculate_box_dimensions (77-86)
  • find_packmol_executable (264-303)
  • get_packmol_julia_version (232-261)
molify/__init__.py (2)
molify/substructure.py (2)
  • group_matches_by_fragment (173-233)
  • match_substructure (13-170)
molify/utils.py (2)
  • draw_molecular_graph (306-460)
  • find_packmol_executable (264-303)
🪛 Ruff (0.14.3)
molify/utils.py

290-290: Consider moving this statement to an else block

(TRY300)


300-303: Avoid specifying long messages outside the exception class

(TRY003)


395-395: Avoid specifying long messages outside the exception class

(TRY003)


412-415: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pytest (3.10, ubuntu-latest, 20.15.1)
  • GitHub Check: pytest (3.13, ubuntu-latest, 20.15.1)
  • GitHub Check: pytest (3.11, ubuntu-latest, 20.15.1)
  • GitHub Check: pytest (3.12, ubuntu-latest, 20.15.1)
🔇 Additional comments (13)
docs/source/packmol_tools.ipynb (8)

7-11: LGTM! Clear introduction with proper prerequisites.

The updated introduction clearly explains what Packmol is and links to the external installation requirement. This helps users understand the prerequisites before proceeding.


42-65: LGTM! Clear and well-structured example.

The water box example demonstrates the workflow effectively:

  1. Generate conformers
  2. Pack with appropriate density (1000 kg/m³ for water)
  3. Display results with clear print statements

The code is clean and educational.


81-84: LGTM! Connectivity verification is helpful.

The connectivity check demonstrates that bond information is preserved through the packing process, which is important for users to understand. The expected bond count calculation (10 molecules × 2 O-H bonds) is correct.


184-227: LGTM! Comprehensive mixture example.

The water-ethanol mixture example effectively demonstrates:

  1. Creating conformers for multiple molecule types
  2. Packing mixed systems with appropriate density
  3. Visualizing the result with RDKit

The code is clear and the density choice (800 kg/m³) is reasonable for such a mixture.


245-249: LGTM! Useful verification example.

The density calculation using molify.utils.calculate_density() provides a practical way for users to verify their packed systems match the target density.


305-305: Verify nbformat_minor downgrade.

The nbformat_minor was changed from 5 to 4, which represents a downgrade to an older notebook format. Please verify:

  • Is this downgrade intentional for backward compatibility?
  • Are there any notebook features (from nbformat 5) that might break with this change?
  • Was the notebook edited with an older Jupyter version?

Modern Jupyter installations (6.x+) use nbformat 5, so downgrading might indicate compatibility requirements or an editing tool issue.


1-306: Consider WIP status and coverage before merging.

While the notebook documentation is comprehensive and well-structured, please note:

  1. WIP Status: The PR description indicates this is "WIP" (Work In Progress). Confirm the PR is ready for review and merge.

  2. Coverage Concerns: Per the PR objectives, coverage dropped significantly (96.76% → 92.69%, −4.07%). The molify/utils.py file has only 7.77% patch coverage with 83 missing lines. Consider addressing test coverage before finalizing this documentation update.

The tutorial content itself is excellent—clear examples, good structure, and comprehensive parameter explanations.


22-22: The import import molify is correct; no inconsistency exists.

The original concern misunderstood Python naming conventions. The package name in pyproject.toml is "molify"—this is what must be imported. The kernel's display_name of "rdkit2ase" and the repository name "zincware/rdkit2ase" are separate identities. Additionally, rdkit2ase is a function exported from the molify package, not a package name. All imports across the codebase correctly use import molify, consistent with the actual package name.

Likely an incorrect or invalid review comment.

molify/pack.py (3)

12-16: LGTM!

The import statement correctly includes find_packmol_executable to support the new auto-detection feature.


223-225: LGTM!

The auto-detection logic is correctly placed early in the function to fail fast if no packmol installation is found, before any expensive conformer selection.


169-169: Backward compatibility verified—change is safe and improves robustness.

The auto-detection implementation at find_packmol_executable() properly handles both fallback paths: Julia wrapper (preferred) and native binary in PATH. Existing tests with explicit packmol parameter are unaffected; tests relying on the default will now auto-detect and benefit from Julia support if available. Clear RuntimeError is raised if neither installation is found.

molify/utils.py (2)

2-2: LGTM!

The new imports support the visualization and executable detection features. All are appropriately used in the new functions.

Also applies to: 5-5, 9-9, 14-16


463-571: LGTM!

The bond order visualization correctly:

  • Handles None and unknown bond orders with appropriate fallbacks
  • Draws single/double/triple bonds with parallel lines
  • Renders aromatic bonds with solid+dashed style
  • Uses proper perpendicular offset calculations

Comment on lines 134 to 155
"By default, `pack()` automatically detects your Packmol installation. It searches for one of the following:\n",
"- **Native binary** (`packmol`)\n",
"- **Julia version** (`packmol.jl`)\n",
"\n",
"You can explicitly specify which version to use:\n",
"\n",
"```python\n",
"# Let auto-detection choose (recommended)\n",
"molify.pack([water], [10], 1000)\n",
"\n",
"# Explicitly use Julia version\n",
"molify.pack([water], [10], 1000, packmol=\"packmol.jl\")\n",
"\n",
"# Use custom path\n",
"molify.pack([water], [10], 1000, packmol=\"/path/to/packmol\")\n",
"```\n",
"\n",
"To check which executable will be used:\n",
"```python\n",
"executable = molify.find_packmol_executable()\n",
"print(f\"Using: {executable}\")\n",
"```"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify auto-detection preference order.

The documentation states that auto-detection searches for native binary first (line 136), then Julia version (line 137). However, line 272 in "Key Takeaways" says it "Prefers Julia version (packmol.jl) for better compatibility."

Please clarify and ensure consistency:

  • Which version is actually preferred by the auto-detection?
  • Update the documentation to be consistent throughout.

🏁 Script executed:

#!/bin/bash
# Find the implementation of find_packmol_executable to verify preference order

echo "=== Searching for find_packmol_executable implementation ==="
ast-grep --pattern 'def find_packmol_executable($$$) {
  $$$
}'

echo -e "\n=== Alternative search with rg ==="
rg -n "def find_packmol_executable" -A 20 --type py

Length of output: 1145


Update documentation to match actual auto-detection order.

The notebook documentation incorrectly lists native binary first (line 136), then Julia version (line 137). However, the actual implementation in molify/utils.py:268 prioritizes Julia version first for cross-platform compatibility, then the native binary.

Swap the order in the bulleted list:

  • Julia version (packmol.jl) — preferred for cross-platform compatibility
  • Native binary (packmol)

This will align the documentation with the real behavior implemented in find_packmol_executable().

🤖 Prompt for AI Agents
In docs/source/packmol_tools.ipynb around lines 134 to 155, the bulleted
auto-detection order is reversed relative to the implementation: update the list
so **Julia version** (`packmol.jl`) appears first and **Native binary**
(`packmol`) second, and add the note that the Julia version is preferred for
cross-platform compatibility so the documentation matches molify/utils.py:268
behavior.

* build wheels

* `build_data['pure_python'] = False`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* brew install gcc before build

* search for gfortran

* Handle packmol.exe on Windows platforms

- Check for both 'packmol' and 'packmol.exe' after compilation
- Windows builds produce packmol.exe, Unix/Linux/macOS produce packmol
- Always install as 'packmol' (no extension) for cross-platform consistency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* update

* remove `packmol.jl`

* Fix macOS deployment target and Windows Unicode error

- Set MACOSX_DEPLOYMENT_TARGET=14.0 for macOS wheel compatibility
- Remove Unicode checkmark (✓) that causes UnicodeEncodeError on Windows
- Add CIBW_BEFORE_ALL_MACOS to install gcc before building

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* set wheel dependens

* Add PyPI publishing and improve workflow

- Remove gcc installation (pre-installed on runners)
- Rename build_sdist → make_sdist with fetch-depth: 0
- Add upload_all job for PyPI publishing on releases
- Use trusted publishing with pypa/gh-action-pypi-publish
- Download all cibw-* artifacts before publishing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* run pytest on all versions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update build targets

* move files / fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update packmol docs

* run notebook

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set `MACOSX_DEPLOYMENT_TARGET=15.0`

* remove `windows-11-arm` tests

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/source/index.rst (1)

134-177: Recheck the external Packmol link target

The narrative and build hook now bundle the Packmol Fortran source under external/packmol, but the Packmol_ reference currently points to the Packmol.jl Julia wrapper on GitHub. Consider pointing this link to the actual upstream Packmol project you vendor (e.g., the Fortran repository) or clarifying that you are specifically referencing the Julia wrapper if that’s intentional.

♻️ Duplicate comments (3)
src/molify/utils.py (2)

258-264: Doc default vs signature: node_size mismatch.

The docstring states node_size default is 3000 (line 260), but the function signature uses node_size=500 (line 234). These should match.

Apply this diff to align the docstring:

     node_size : int, optional
-        Size of the nodes. Default is 3000.
+        Size of the nodes. Default is 500.

325-331: Guard spring_layout weights when bond_order is None.

When weight_by_bond_order=True and layout="spring", the code passes weight="bond_order" to nx.spring_layout. However, edge bond_order attributes can be None (as seen in ase2networkx output when connectivity is distance-based). NetworkX's spring layout expects numeric weights and may error or behave inconsistently with None values.

Consider this approach to ensure numeric weights:

     if layout == "spring":
         if weight_by_bond_order:
-            # Use bond_order as weights - higher order = stronger spring
-            pos = nx.spring_layout(graph, weight="bond_order", seed=42)
+            # Ensure all bond_order values are numeric (fallback None to 1.0)
+            for u, v, data in graph.edges(data=True):
+                if data.get("bond_order") is None or not isinstance(data.get("bond_order"), (int, float)):
+                    data["bond_order"] = 1.0
+            pos = nx.spring_layout(graph, weight="bond_order", seed=42)
         else:
             pos = nx.spring_layout(graph, seed=42)
tests/test_substructure.py (1)

70-134: Fix “explict” typos in comments for explicit‑hydrogen SMARTS tests

In the explicit‑hydrogen SMARTS section, the comments still read “explict only hydrogens are returned” (Lines 122 and 132). Please correct these to “explicit” (e.g., “only explicit hydrogens are returned”).

This was already flagged in a previous review; updating the comments will avoid confusion.

🧹 Nitpick comments (7)
src/molify/utils.py (1)

31-61: Consider removing the NetworkX fallback.

NetworkX is already a hard dependency (imported at line 8), so the ImportError fallback path in this function is unreachable. The manual DFS implementation is well-written but appears to be dead code.

If you want to simplify:

 def find_connected_components(connectivity: list[tuple[int, int, float]]):
-    try:
-        import networkx as nx
-
-        graph = nx.Graph()
-        for i, j, _ in connectivity:
-            graph.add_edge(i, j)
-        for component in nx.connected_components(graph):
-            yield component
-    except ImportError:
-        adjacency = defaultdict(list)
-        for i, j, _ in connectivity:
-            adjacency[i].append(j)
-            adjacency[j].append(i)
-
-        visited = set()
-        for start in adjacency:
-            if start in visited:
-                continue
-
-            component = []
-            stack = [start]
-            while stack:
-                node = stack.pop()
-                if node in visited:
-                    continue
-                visited.add(node)
-                component.append(node)
-                stack.extend(n for n in adjacency[node] if n not in visited)
-
-            yield component
+    graph = nx.Graph()
+    for i, j, _ in connectivity:
+        graph.add_edge(i, j)
+    for component in nx.connected_components(graph):
+        yield component
tests/test_compress.py (1)

24-58: Freeze test is solid; you can slightly tighten the OH-distance checks

The freeze_molecules=True test nicely checks density, fragment count, and that water geometries are preserved. Currently you only assert one O–H distance per molecule depending on the O index; if you want stronger coverage, you could explicitly identify the O and both H indices and assert both O–H distances against ref_water. Not required, but would make the intent even clearer.

hatch_build.py (2)

14-41: _find_gfortran is pragmatic; version selection could be more robust (optional)

The approach of preferring plain gfortran and then scanning common versioned locations is practical. If you ever need to distinguish between multiple installed versions (e.g., gfortran-9, gfortran-11, gfortran-13), the current lexicographic matches.sort(reverse=True) may not pick the numerically highest version. Not urgent, but if you hit such setups, consider parsing the suffix as an integer and sorting numerically.


43-116: Minor lint nits: unused version argument and long error strings

Ruff’s ARG002/TRY003 hints here are mostly cosmetic: version in initialize is required by the hook interface but unused (you could rename it to _version), and the multi-line error messages are fine but could be moved into a dedicated exception class if you want to satisfy TRY003. These are optional cleanups and do not affect behavior.

src/molify/packmol.py (1)

21-29: Consider Windows .exe handling in get_packmol_binary

get_packmol_binary always looks for a file named binaries/packmol. On Windows, the compiled executable is typically packmol.exe; unless your build hook explicitly names the file without an extension, this will raise the “not found” error even when the binary exists.

It’s safer to either:

  • Ensure the build hook always produces a consistent cross‑platform filename, or
  • Add platform‑aware lookup (e.g., try packmol.exe on Windows before failing).
src/molify/pack.py (1)

64-78: Consider adding a timeout or reusing the higher‑level run_packmol helper

_run_packmol calls subprocess.run without a timeout and lets CalledProcessError bubble up directly. For long‑running or stuck Packmol runs, this can hang the caller indefinitely, unlike the run_packmol(..., timeout=...) helper which already centralizes nicer error handling.

Longer‑term, it may be worth either:

  • Threading an optional timeout parameter through to _run_packmol, or
  • Reusing molify.packmol.run_packmol here so Packmol invocation is handled in one place.
src/molify/substructure.py (1)

13-170: Robust implementation with comprehensive hydrogen handling.

The function correctly handles pattern parsing, atom mapping validation, and three hydrogen inclusion modes. The duplicate detection in "include" mode (lines 134-136) and the map number sorting (lines 99-106) are particularly well-thought-out.

Static analysis suggests extracting long error messages to constants or class-level attributes (TRY003). This is a minor style preference:

# At module level
_INVALID_PATTERN_MSG = "Invalid SMARTS/SMILES: {}"
_DUPLICATE_MAP_MSG = "Atom map label '{}' is used multiple times"
_INVALID_HYDROGENS_MSG = "Invalid value for `hydrogens`: {}. Expected one of 'include', 'exclude', 'isolated'."

# Then in the function:
raise ValueError(_INVALID_PATTERN_MSG.format(smarts_or_smiles))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9b63e6 and 93c9cf6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (24)
  • .github/workflows/build-wheels.yaml (1 hunks)
  • .github/workflows/pages.yaml (1 hunks)
  • .github/workflows/publish.yaml (0 hunks)
  • .github/workflows/pytest.yaml (1 hunks)
  • .gitmodules (1 hunks)
  • README.md (1 hunks)
  • docs/source/index.rst (1 hunks)
  • docs/source/packmol_tools.ipynb (2 hunks)
  • external/packmol (1 hunks)
  • hatch_build.py (1 hunks)
  • pyproject.toml (4 hunks)
  • src/molify/__init__.py (2 hunks)
  • src/molify/pack.py (6 hunks)
  • src/molify/packmol.py (1 hunks)
  • src/molify/substructure.py (1 hunks)
  • src/molify/utils.py (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/integration/test_ase2x.py (0 hunks)
  • tests/integration/test_networkx2x.py (0 hunks)
  • tests/test_com.py (1 hunks)
  • tests/test_compress.py (2 hunks)
  • tests/test_solvate.py (4 hunks)
  • tests/test_substructure.py (8 hunks)
  • tests/test_utils.py (0 hunks)
💤 Files with no reviewable changes (4)
  • tests/test_utils.py
  • tests/integration/test_networkx2x.py
  • tests/integration/test_ase2x.py
  • .github/workflows/publish.yaml
✅ Files skipped from review due to trivial changes (1)
  • external/packmol
🧰 Additional context used
🧬 Code graph analysis (8)
tests/conftest.py (1)
src/molify/pack.py (1)
  • pack (143-247)
tests/test_solvate.py (2)
src/molify/smiles2x.py (1)
  • smiles2conformers (64-149)
src/molify/pack.py (1)
  • pack (143-247)
src/molify/utils.py (2)
src/molify/ase2x.py (1)
  • ase2networkx (97-184)
src/molify/rdkit2x.py (1)
  • rdkit2networkx (61-125)
src/molify/__init__.py (2)
src/molify/substructure.py (2)
  • group_matches_by_fragment (173-233)
  • match_substructure (13-170)
src/molify/utils.py (1)
  • draw_molecular_graph (230-384)
tests/test_compress.py (1)
src/molify/pack.py (1)
  • pack (143-247)
tests/test_substructure.py (5)
src/molify/ase2x.py (1)
  • ase2rdkit (187-220)
src/molify/substructure.py (3)
  • match_substructure (13-170)
  • get_substructures (236-272)
  • group_matches_by_fragment (173-233)
src/molify/smiles2x.py (2)
  • smiles2conformers (64-149)
  • smiles2atoms (33-61)
src/molify/pack.py (1)
  • pack (143-247)
tests/conftest.py (3)
  • ethanol_water (8-12)
  • alanine_dipeptide (16-18)
  • alanine_dipeptide_box (22-25)
src/molify/pack.py (2)
src/molify/packmol.py (1)
  • get_packmol_binary (8-29)
src/molify/utils.py (1)
  • calculate_box_dimensions (75-84)
src/molify/substructure.py (2)
src/molify/ase2x.py (1)
  • ase2rdkit (187-220)
src/molify/utils.py (1)
  • find_connected_components (31-61)
🪛 LanguageTool
README.md

[style] ~46-~46: Using many exclamation marks might seem excessive (in this case: 6 exclamation marks for a text that’s 1405 characters long)
Context: ...u like packmol, give it a star on GitHub! ```py from molify import pack, smiles2...

(EN_EXCESSIVE_EXCLAMATION)

🪛 Ruff (0.14.4)
hatch_build.py

36-41: Avoid specifying long messages outside the exception class

(TRY003)


43-43: Unused method argument: version

(ARG002)


56-59: Avoid specifying long messages outside the exception class

(TRY003)


70-70: Starting a process with a partial executable path

(S607)


78-78: subprocess call: check for execution of untrusted input

(S603)


79-79: Starting a process with a partial executable path

(S607)


88-88: Avoid specifying long messages outside the exception class

(TRY003)


96-100: Avoid specifying long messages outside the exception class

(TRY003)

src/molify/utils.py

28-28: Avoid specifying long messages outside the exception class

(TRY003)


182-182: Avoid specifying long messages outside the exception class

(TRY003)


208-208: Consider moving this statement to an else block

(TRY300)


212-216: Avoid specifying long messages outside the exception class

(TRY003)


319-319: Avoid specifying long messages outside the exception class

(TRY003)


336-339: Avoid specifying long messages outside the exception class

(TRY003)

src/molify/packmol.py

24-27: Avoid specifying long messages outside the exception class

(TRY003)


56-56: Avoid specifying long messages outside the exception class

(TRY003)


61-61: subprocess call: check for execution of untrusted input

(S603)


71-75: Avoid specifying long messages outside the exception class

(TRY003)


77-79: Avoid specifying long messages outside the exception class

(TRY003)


93-93: subprocess call: check for execution of untrusted input

(S603)


102-102: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

src/molify/pack.py

72-72: subprocess call: check for execution of untrusted input

(S603)


237-237: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

src/molify/substructure.py

86-86: Avoid specifying long messages outside the exception class

(TRY003)


95-95: Avoid specifying long messages outside the exception class

(TRY003)


162-165: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: pytest (3.10, windows-latest)
  • GitHub Check: pytest (3.11, windows-latest)
  • GitHub Check: pytest (3.12, windows-latest)
  • GitHub Check: Build wheels on windows-11-arm
  • GitHub Check: Build wheels on macos-latest
  • GitHub Check: Build wheels on macos-15-intel
  • GitHub Check: Build wheels on windows-latest
  • GitHub Check: pytest (3.13, windows-latest)
  • GitHub Check: pytest (3.13, macos-15-intel)
  • GitHub Check: Build wheels on ubuntu-latest
  • GitHub Check: Build wheels on ubuntu-24.04-arm
  • GitHub Check: Build wheels on macos-latest
  • GitHub Check: Build wheels on macos-15-intel
  • GitHub Check: Build wheels on ubuntu-latest
  • GitHub Check: Build wheels on windows-latest
  • GitHub Check: Build wheels on windows-11-arm
  • GitHub Check: Build wheels on ubuntu-24.04-arm
🔇 Additional comments (36)
.github/workflows/pages.yaml (1)

29-31: LGTM! Checkout action updated and submodules enabled.

The upgrade from actions/checkout@v2 to v5 brings the workflow up to date, and the submodules: recursive configuration correctly enables fetching the external/packmol submodule introduced in this PR.

src/molify/utils.py (7)

18-28: LGTM! Bond order mapping is correct.

The function correctly maps numeric bond orders (1.0, 2.0, 3.0, 1.5) to RDKit BondType enums, including proper handling of aromatic bonds (1.5).


64-72: LGTM! Density calculation is correct.

The conversion from atomic mass units to kg and from ų to m³ is properly implemented.


75-84: LGTM! Box dimension calculation is correct.

The function correctly computes cubic box dimensions from total mass and target density.


87-177: LGTM! Unwrapping algorithm is well-designed.

The DFS-based approach to unwrapping molecular structures across periodic boundaries is sound, and the comprehensive docstring with examples is excellent.


180-216: LGTM! Bond determination logic handles edge cases well.

The special cases for single-atom ions (alkali metals, halogens) and PF6⁻ are reasonable workarounds for RDKit's bond perception limitations. The multi-charge-state fallback strategy is robust.


219-227: LGTM! Simple conversion wrapper.

The function correctly converts a list of SMILES strings to NetworkX graphs via RDKit molecules.


387-495: LGTM! Bond order visualization is well-implemented.

The edge drawing logic correctly handles single, double, triple, and aromatic bonds with appropriate line styles and offsets. The perpendicular offset calculation is mathematically sound.

tests/conftest.py (2)

11-11: LGTM! Test updated for default Packmol backend.

The removal of the explicit packmol parameter aligns with the new bundled Packmol approach, where the default behavior uses the built-in binary.


24-24: LGTM! Test updated for default Packmol backend.

Consistent with the changes throughout the test suite to rely on the bundled Packmol binary by default.

.gitmodules (1)

1-3: LGTM! Packmol submodule properly configured.

The submodule addition points to the official m3g/packmol repository and enables the bundled Packmol integration described throughout this PR.

src/molify/__init__.py (3)

10-16: LGTM! Public API updated with new substructure functionality.

The addition of group_matches_by_fragment exposes the new fragment-aware matching capability introduced in this PR.


17-20: LGTM! New visualization function exposed.

The addition of draw_molecular_graph to the public API provides users with bond-order-aware molecular graph visualization.


24-47: LGTM! all list updated consistently.

The exports correctly reflect the API additions (group_matches_by_fragment, draw_molecular_graph) and are consistent with the imports.

README.md (1)

44-46: LGTM! Documentation accurately reflects bundled Packmol.

The updated README correctly states that molify now ships with Packmol, consistent with the submodule integration and build changes in this PR. The acknowledgment of the upstream project is appropriate.

tests/test_com.py (1)

106-108: LGTM! Test updated for default Packmol backend.

The removal of the packmol parameter is consistent with the new default bundled behavior throughout the test suite.

tests/test_solvate.py (1)

9-126: LGTM! Tests updated to use bundled Packmol backend.

All pack() calls throughout this test file have been consistently updated to remove the explicit packmol parameter, relying on the new default bundled binary. Test function names have been appropriately simplified to remove backend-specific references. The test logic and assertions remain unchanged.

tests/test_compress.py (1)

7-22: Default-pack backend usage in test_compress looks good

Using molify.pack(..., density=500) without an explicit backend and then validating both pre- and post-compression densities is consistent with the new default/bundled Packmol behavior; the test is clear and appropriately strict.

.github/workflows/pytest.yaml (1)

24-30: Broader OS matrix is reasonable; just confirm runner/tooling support

The expanded matrix plus actions/checkout@v5 with recursive submodules is a nice improvement. Given the new hatch build hook compiles Packmol with gfortran/make, please double-check that all configured runners (especially ubuntu-24.04-arm, macos-15-intel, and windows-latest) either (a) have the necessary toolchain preinstalled or (b) install it before uv sync, and that the runner labels all exist as expected in GitHub Actions.

docs/source/index.rst (1)

4-132: Index rewrite is clear and aligned with the current API

The new introduction, conversion diagrams, and quick-start examples (RDKit↔ASE↔NetworkX, conformer generation, and packing) read well and match the current public API (rdkit2ase, ase2rdkit, smiles2conformers, pack, etc.). The hidden toctree plus explicit sections give a good high-level overview without overloading the front page.

hatch_build.py (1)

90-116: Based on the web search result, Hatchling's force_include documentation confirms that the mapping keys support both relative and absolute paths. Relative paths are interpreted from the project root, while absolute paths are supported as external sources.

Looking at the current code, target_binary is constructed as an absolute filesystem path (via Path objects), and the current implementation uses str(target_binary) as the key. According to Hatchling's documentation, this approach is valid and supported—there is no inherent "path-resolution issue" with absolute paths as the review comment suggests.

The suggested refactor to use relative paths would also work correctly, but it is not necessary to avoid issues. The review comment's premise that absolute paths cause problems on different machines or CI environments is not supported by Hatchling's documented behavior.


The review comment mischaracterizes how Hatchling handles force_include paths. Absolute paths (which the current code uses) are supported and properly handled by Hatchling's builder. While changing to relative paths would also be correct, the current implementation is not problematic and does not require fixing.

Likely an incorrect or invalid review comment.

pyproject.toml (1)

3-79: Project metadata, docs extra, and build hook wiring verified as correct

Verification confirms all configuration is properly wired:

  • molify.packmol:packmol_cli exists at src/molify/packmol.py:82 and is exported
  • hatch_build.py is present with complete logic to compile packmol, place the binary in src/molify/binaries/, and include it in the wheel via force_include
  • Hatch build hook is properly registered and wheel target correctly specifies packages = ["src/molify"] for the src layout
  • Build hook marks the wheel as platform-specific, ensuring correct wheel tags
.github/workflows/build-wheels.yaml (2)

15-30: Overall workflow structure for wheels and sdists looks solid

Checkout with recursive submodules, cibuildwheel usage, uv SDist build, and artifact aggregation for PyPI publishing are wired together cleanly and align with the new bundled Packmol setup.

Also applies to: 31-51, 60-68


12-13: The runner labels are valid GitHub-hosted labels and will not fail scheduling.

All three runner labels (ubuntu-24.04-arm, windows-11-arm, and macos-15-intel) are documented as available GitHub-hosted runner labels in 2025. The review comment's claim that these "do not match the usual GitHub-hosted runner labels" and are "likely to fail scheduling" is incorrect; these are standard, supported labels.

Regarding MACOSX_DEPLOYMENT_TARGET=15.0: This is a deliberate policy choice to target macOS 15+, not a misconfiguration. While it does restrict compatibility to newer macOS versions, this may be intentional based on project requirements and is not a technical error.

Likely an incorrect or invalid review comment.

src/molify/packmol.py (1)

82-99: CLI shim is straightforward and aligns with the bundled‑binary design

Forwarding stdin/stdout/stderr and propagating the Packmol return code makes the packmol console entry behave like the underlying binary, while still benefiting from your bundled distribution. Exporting packmol_cli via __all__ keeps the module surface tidy.

Also applies to: 102-102

tests/test_substructure.py (5)

155-215: Hydrogen‑handling tests exercise the new semantics well

The tests around hydrogens="exclude", "include", "isolated" and explicit [H] SMARTS on ethanol_mol closely mirror the documented behavior in match_substructure and should catch regressions in ordering and hydrogen selection.


219-251: Mapped SMILES + hydrogen filtering coverage looks strong

The mapped_only=True tests with and without hydrogens for "C[O:1]" validate both map‑number ordering and hydrogen inclusion/exclusion/isolated behavior. This is a good, targeted regression suite for the mapping logic in match_substructure.


268-386: Fragment‑grouping tests align with group_matches_by_fragment design

The test_group_matches_by_fragment and *_mapped_order_box cases verify:

  • Grouping by disconnected fragments,
  • Stable ordering of indices inside groups, and
  • Consistent behavior under different mapping orders and hydrogen modes.

These tests nicely pin down the fragment logic and mapped‑atom ordering.


412-450: Good regression test for ase2rdkit handling of None bond orders

test_match_substructure_ions_with_none_bond_orders faithfully reproduces the reported failure mode by injecting None bond orders into connectivity and then asserting that match_substructure still finds Na⁺, Cl⁻, and water. This should prevent regressions in the new ase2rdkit robustness logic.


452-510: Visualization tests cover the main usage patterns

The visualize_selected_molecules tests (basic, empty, overlapping, multiple selections, custom alpha) give good smoke coverage of the visualization API and color‑precedence behavior without over‑specifying image internals.

src/molify/pack.py (1)

150-177: Bundled‑binary default and improved logging look correct

Updating pack() to accept packmol: str | None and default to str(get_packmol_binary()) cleanly switches the public API to the bundled Packmol binary while still allowing a custom executable path. The added logging of the Packmol path when mixture.<format> is missing should make debugging misconfigurations significantly easier, without changing the happy path behavior.

Also applies to: 202-227, 235-242

docs/source/packmol_tools.ipynb (1)

7-12: Notebook now matches the bundled‑Packmol API and examples are consistent

The updated notebook correctly documents that molify ships with a pre‑compiled Packmol binary and demonstrates pack() usage without backend arguments, along with connectivity preservation, mixture creation, RDKit visualization, and density calculation. This aligns well with the new pack()/get_packmol_binary() behavior.

Just ensure molify.utils.calculate_density is present and public so the density example keeps working as shown.

Also applies to: 29-36, 75-86, 219-249, 274-279

src/molify/substructure.py (4)

1-11: LGTM!

Imports are well-organized and appropriate for the module's functionality.


173-233: LGTM! Clean grouping logic with proper deduplication.

The fragment assignment logic (lines 218-222) correctly assumes that substructure matches are within connected components. The duplicate removal (lines 229-230) preserves order efficiently using the seen set pattern.


236-272: LGTM! Clean integration between ASE and RDKit.

The function correctly bridges ASE and RDKit worlds, with clear documentation that **kwargs are passed to ase2rdkit for bond detection control.


316-384: LGTM! Well-structured helper functions.

The fragment collection logic correctly maps indices (lines 342-358), and the deduplication via canonical SMILES (line 376) is the right approach for identifying structurally identical molecules.

Comment on lines +43 to +89
def initialize(self, version, build_data):
"""Compile packmol before building the wheel."""
# Paths
project_root = Path(self.root)
packmol_source = project_root / "external" / "packmol"
binaries_dir = project_root / "src" / "molify" / "binaries"
target_binary = binaries_dir / "packmol"

# Ensure binaries directory exists
binaries_dir.mkdir(parents=True, exist_ok=True)

# Verify packmol source exists
if not packmol_source.exists():
raise RuntimeError(
f"Packmol source not found at {packmol_source}. "
"Run: git submodule update --init --recursive"
)

# Find gfortran compiler
gfortran_path = self._find_gfortran()
print(f"Using gfortran: {gfortran_path}")

# Compile packmol
print("Compiling packmol...")
try:
# Clean previous build artifacts
subprocess.run(
["make", "clean"],
cwd=packmol_source,
check=False, # Don't fail if clean fails
capture_output=True,
)

# Override FORTRAN variable to use detected gfortran
# Note: Not using -j4 due to Fortran module dependencies
result = subprocess.run(
["make", f"FORTRAN={gfortran_path}"],
cwd=packmol_source,
check=True,
capture_output=True,
text=True,
)
print(result.stdout)
except subprocess.CalledProcessError as e:
print(f"Compilation failed!\nstdout: {e.stdout}\nstderr: {e.stderr}")
raise RuntimeError(f"Failed to compile packmol: {e}") from e

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Build hook behavior is good but currently hard-requires toolchain on all build platforms

The hook correctly ensures Packmol is built from external/packmol and surfaces useful diagnostics on failure. However, it unconditionally raises if the submodule is missing or gfortran/make compilation fails, which will cause pip/uv installs (and CI on platforms without a Fortran toolchain, especially Windows) to fail outright. If you intend Packmol to be strictly required, this is fine; otherwise, you might want to (a) gate compilation on an env var, (b) skip/soft-fail on unsupported platforms, or (c) provide a clearer, platform-specific message about missing compilers.


🌐 Web query:

How does Hatchling’s build hook lifecycle handle failures in `initialize` for `BuildHookInterface`? Is failing the hook the recommended way to enforce a hard dependency on native tooling, or is there a documented pattern for making such tooling optional?

💡 Result:

Short answer: If your BuildHookInterface.initialize raises an exception, the hook failure aborts the build — initialize runs immediately before each build and unhandled exceptions will stop the build. That makes failing the hook an effective way to force a hard failure, but it’s not the recommended pattern for expressing tooling requirements. Instead you should either declare the tool as a build dependency (so the build frontend will install it) or make the hook detect the tool and (when optional) mutate build_data (e.g. add deps or skip actions). The Hatch build-hook docs show initialize/finalize behavior and the CLI supports build hooks adding build-time dependencies; an existing issue/discussion also demonstrates using hooks to add dependencies for editable builds rather than hard-failing. [1][2][3]

Sources

  • Hatch build-hook reference (initialize/finalize semantics). [1]
  • Hatch CLI / build command (hooks, --hooks-only / --no-hooks options). [2]
  • Issue/discussion demonstrating using build hooks to add dependencies for editable builds (example of making a hook add dependencies rather than relying on hard failures). [3]

I'll verify the codebase to confirm whether there are any fallback mechanisms, optional tooling handling, or build dependency declarations that might affect the severity of this review comment.


---

🏁 Script executed:

```shell
#!/bin/bash
# Check for pyproject.toml and build configuration
fd -e "pyproject.toml" -e "setup.py" -e "setup.cfg" | head -20

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for any conditional/optional compilation logic in Python build files
rg -n "optional|windows|platform|skip|soft.fail" --type=py -i $(fd -e "*.py" | grep -E "(build|setup|hatch)" | head -10)

Length of output: 2749


🏁 Script executed:

#!/bin/bash
# Check if there's any documentation about platform support or compilation requirements
fd -e "README*" -e "INSTALL*" -e "CONTRIBUTING*" -e "docs" -type f | head -20

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Examine pyproject.toml for build dependencies and hook configuration
fd "pyproject.toml" -x cat {}

Length of output: 1809


🏁 Script executed:

#!/bin/bash
# Look at the entire hatch_build.py to see if there's any handling we missed
cat hatch_build.py

Length of output: 4322


Hard requirement on Fortran toolchain will fail builds on unsupported platforms (especially Windows)

The hook unconditionally raises on missing gfortran, make, or packmol source, which conflicts with Hatchling's documented patterns for handling native tooling. This will break builds on Windows and any CI environment without a Fortran compiler, despite the codebase showing intent to support multiple platforms (evidenced by the commented-out cibuildwheel config that explicitly notes Windows requires special setup).

Instead of raising unconditionally:

  • Declare gfortran/make as build dependencies so the build frontend can install them, OR
  • Detect unavailable tooling and mutate build_data to skip compilation or add conditional dependencies, OR
  • Gate compilation behind an environment variable for optional builds

See hatch_build.py lines 36–38 (_find_gfortran), 56–58 (submodule check), and 73 (compilation failure) — all currently unconditional raises.

🧰 Tools
🪛 Ruff (0.14.4)

43-43: Unused method argument: version

(ARG002)


56-59: Avoid specifying long messages outside the exception class

(TRY003)


70-70: Starting a process with a partial executable path

(S607)


78-78: subprocess call: check for execution of untrusted input

(S603)


79-79: Starting a process with a partial executable path

(S607)


88-88: Avoid specifying long messages outside the exception class

(TRY003)

Comment on lines +54 to +69
input_path = Path(input_file)
if not input_path.exists():
raise FileNotFoundError(f"Input file not found: {input_path}")

binary = get_packmol_binary()

try:
result = subprocess.run(
[str(binary)],
stdin=open(input_path),
capture_output=True,
text=True,
timeout=timeout,
check=True,
)
return result.stdout + result.stderr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Close the input file used for stdin in run_packmol

run_packmol passes stdin=open(input_path) directly into subprocess.run without closing the file object. This can leak file descriptors if run_packmol is called many times in a process.

Wrap the open call in a context manager:

-    try:
-        result = subprocess.run(
-            [str(binary)],
-            stdin=open(input_path),
-            capture_output=True,
-            text=True,
-            timeout=timeout,
-            check=True,
-        )
+    try:
+        with input_path.open() as fin:
+            result = subprocess.run(
+                [str(binary)],
+                stdin=fin,
+                capture_output=True,
+                text=True,
+                timeout=timeout,
+                check=True,
+            )

I'll verify the review comment by examining the actual code and checking for similar patterns in the codebase.


---

🏁 Script executed:

```shell
#!/bin/bash
# Examine the actual code at the specified lines in src/molify/packmol.py
if [ -f "src/molify/packmol.py" ]; then
  echo "=== Lines 45-75 of src/molify/packmol.py ==="
  sed -n '45,75p' src/molify/packmol.py
  echo ""
  echo "=== Full run_packmol function ==="
  grep -n "def run_packmol" src/molify/packmol.py
else
  echo "File not found"
fi

Length of output: 1035


🏁 Script executed:

#!/bin/bash
# Search for similar subprocess.run calls with stdin=open pattern
echo "=== Searching for stdin=open patterns in codebase ==="
rg -n "stdin\s*=\s*open" --type py
echo ""
echo "=== Searching for all subprocess.run calls ==="
rg -n "subprocess\.run" --type py -A 5 -B 2

Length of output: 2356


🏁 Script executed:

#!/bin/bash
# Check for imports and context at the top of packmol.py
echo "=== First 40 lines of src/molify/packmol.py ==="
head -40 src/molify/packmol.py

Length of output: 965


Close the file descriptor leak in run_packmol at line 63

The code passes stdin=open(input_path) to subprocess.run() without closing the file. This creates a file descriptor leak when run_packmol is called repeatedly, potentially exhausting system resources. The codebase already demonstrates the correct pattern in src/molify/pack.py (lines 71-77).

Apply the suggested context manager fix:

-    try:
-        result = subprocess.run(
-            [str(binary)],
-            stdin=open(input_path),
-            capture_output=True,
-            text=True,
-            timeout=timeout,
-            check=True,
-        )
+    try:
+        with input_path.open() as fin:
+            result = subprocess.run(
+                [str(binary)],
+                stdin=fin,
+                capture_output=True,
+                text=True,
+                timeout=timeout,
+                check=True,
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
input_path = Path(input_file)
if not input_path.exists():
raise FileNotFoundError(f"Input file not found: {input_path}")
binary = get_packmol_binary()
try:
result = subprocess.run(
[str(binary)],
stdin=open(input_path),
capture_output=True,
text=True,
timeout=timeout,
check=True,
)
return result.stdout + result.stderr
input_path = Path(input_file)
if not input_path.exists():
raise FileNotFoundError(f"Input file not found: {input_path}")
binary = get_packmol_binary()
try:
with input_path.open() as fin:
result = subprocess.run(
[str(binary)],
stdin=fin,
capture_output=True,
text=True,
timeout=timeout,
check=True,
)
return result.stdout + result.stderr
🧰 Tools
🪛 Ruff (0.14.4)

56-56: Avoid specifying long messages outside the exception class

(TRY003)


61-61: subprocess call: check for execution of untrusted input

(S603)

🤖 Prompt for AI Agents
In src/molify/packmol.py around lines 54 to 69, run_packmol opens the input file
with open(input_path) and passes that file object to subprocess.run as stdin
without closing it, causing a file descriptor leak; change it to open the input
file using a context manager (with open(input_path, "r") as f:) and pass that f
to subprocess.run so the file is automatically closed after the call, preserving
the same subprocess.run arguments and returning result.stdout + result.stderr.

Comment on lines +275 to +313
def iter_fragments(atoms: ase.Atoms) -> list[ase.Atoms]:
"""Iterate over connected molecular fragments in an ASE Atoms object.
If a 'connectivity' field is present in ``atoms.info``, it will be used
to determine fragments. Otherwise, ``ase.build.separate`` will be used.
Parameters
----------
atoms : ase.Atoms
A structure that may contain one or more molecular fragments.
Yields
------
ase.Atoms
Each connected component (fragment) in the input structure.
Examples
--------
>>> from molify import smiles2atoms
>>> from rdkit.Chem import CombineMols
>>>
>>> # Create multi-fragment system
>>> ethanol = smiles2atoms("CCO")
>>> methanol = smiles2atoms("CO")
>>> combined = ethanol + methanol
>>>
>>> # Iterate over fragments
>>> fragments = list(iter_fragments(combined))
>>> len(fragments)
2
"""
if "connectivity" in atoms.info:
# connectivity is a list of tuples (i, j, bond_type)
connectivity = atoms.info["connectivity"]
for component in find_connected_components(connectivity):
yield atoms[list(component)]
else:
for molecule in separate(atoms):
yield molecule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix incorrect return type annotation for generator function.

The return type annotation states -> list[ase.Atoms] but the function uses yield (lines 310, 313), making it a generator. This will cause type checking failures.

Apply this diff:

-def iter_fragments(atoms: ase.Atoms) -> list[ase.Atoms]:
+def iter_fragments(atoms: ase.Atoms) -> tp.Generator[ase.Atoms, None, None]:
     """Iterate over connected molecular fragments in an ASE Atoms object.

Alternatively, if you want to return a list instead of a generator, replace yield with list accumulation:

 def iter_fragments(atoms: ase.Atoms) -> list[ase.Atoms]:
     """Iterate over connected molecular fragments in an ASE Atoms object.
     
     ...
-    Yields
+    Returns
     ------
-    ase.Atoms
-        Each connected component (fragment) in the input structure.
+    list[ase.Atoms]
+        List of connected components (fragments) in the input structure.
     
     ...
     """
+    result = []
     if "connectivity" in atoms.info:
         connectivity = atoms.info["connectivity"]
         for component in find_connected_components(connectivity):
-            yield atoms[list(component)]
+            result.append(atoms[list(component)])
     else:
         for molecule in separate(atoms):
-            yield molecule
+            result.append(molecule)
+    return result

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/molify/substructure.py around lines 275 to 313, the function
iter_fragments is annotated as returning list[ase.Atoms] but uses yield, so
update the signature to a generator/iterator type (e.g., Iterator[ase.Atoms] or
Iterable[ase.Atoms]) and add the corresponding import from typing at the top of
the file; alternatively, if you prefer a concrete list return, collect fragments
into a list and return it instead of yielding. Ensure the chosen change is
consistent with callers and type hints elsewhere.

Comment on lines +387 to +482
def visualize_selected_molecules(
mol: Chem.Mol,
*args,
mols_per_row: int = 4,
sub_img_size: tuple[int, int] = (200, 200),
legends: list[str] | None = None,
alpha: float = 0.5,
):
"""Visualize molecules with optional atom highlighting.
If no atom selections are provided, displays the molecule without highlights.
Duplicate molecular structures will only be plotted once.
Parameters
----------
mol : Chem.Mol
The RDKit molecule object, which may contain multiple fragments.
*args : list[int] or tuple[int]
Variable number of lists/tuples containing atom indices to be highlighted.
Each selection will be assigned a different color from matplotlib's tab10
colormap. If no arguments provided, displays the molecule without highlights.
mols_per_row : int, default=4
Number of molecules per row in the grid.
sub_img_size : tuple[int, int], default=(200, 200)
Size of each molecule image in pixels.
legends : list[str], optional
Custom legends for each molecule. If None, default legends will be used.
alpha : float, default=0.5
Transparency level for the highlighted atoms (0.0 = fully transparent,
1.0 = opaque).
Returns
-------
PIL.Image
A PIL image object of the grid.
Examples
--------
>>> from molify import smiles2atoms, ase2rdkit, match_substructure
>>> from molify import visualize_selected_molecules
>>>
>>> # Create and convert molecule
>>> mol = ase2rdkit(smiles2atoms("Cc1ccccc1"))
>>>
>>> # Find aromatic and aliphatic carbons
>>> aromatic = match_substructure(mol, "c")
>>> aliphatic = match_substructure(mol, "[C;!c]")
>>>
>>> # Flatten matches for visualization
>>> aromatic_flat = [idx for match in aromatic for idx in match]
>>> aliphatic_flat = [idx for match in aliphatic for idx in match]
>>>
>>> # Visualize with highlights
>>> visualize_selected_molecules(mol, aromatic_flat, aliphatic_flat)
<PIL.Image>
"""
# Handle empty args case - display molecule without highlights
if not args:
img = Draw.MolsToGridImage(
[mol],
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=legends if legends is not None else ["Molecule 0"],
)
return img

# Collect highlighted fragments
candidate_mols, candidate_highlights, candidate_colors = (
_collect_highlighted_fragments(mol, args, alpha)
)

if not candidate_mols:
print("No molecules to draw with the given selections.")
return None

# Filter for unique molecules
mols_to_draw, highlight_lists, highlight_colors = _filter_unique_molecules(
candidate_mols, candidate_highlights, candidate_colors
)

# Draw the grid
final_legends = (
legends
if legends is not None
else [f"Molecule {i}" for i in range(len(mols_to_draw))]
)

img = Draw.MolsToGridImage(
mols_to_draw,
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=final_legends,
highlightAtomLists=highlight_lists,
highlightAtomColors=highlight_colors,
)
return img
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Document or handle the None return case.

The docstring states the function returns PIL.Image, but line 460 returns None when no molecules match the selections. This inconsistency could cause issues for callers expecting an image.

Consider one of these approaches:

Option 1: Document the None return (simplest)

 Returns
 -------
-PIL.Image
-    A PIL image object of the grid.
+PIL.Image or None
+    A PIL image object of the grid, or None if no molecules contain
+    any of the selected atoms.

Option 2: Raise an exception instead (fail fast)

     if not candidate_mols:
-        print("No molecules to draw with the given selections.")
-        return None
+        raise ValueError(
+            "No molecules contain any of the selected atoms. "
+            "Check that atom indices are valid and correspond to the molecule."
+        )

Option 3: Return a blank/placeholder image (always returns an image)

     if not candidate_mols:
-        print("No molecules to draw with the given selections.")
-        return None
+        # Return placeholder image when no matches
+        from PIL import Image, ImageDraw
+        img = Image.new('RGB', sub_img_size, 'white')
+        draw = ImageDraw.Draw(img)
+        draw.text((10, sub_img_size[1]//2), "No matches found", fill='black')
+        return img
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def visualize_selected_molecules(
mol: Chem.Mol,
*args,
mols_per_row: int = 4,
sub_img_size: tuple[int, int] = (200, 200),
legends: list[str] | None = None,
alpha: float = 0.5,
):
"""Visualize molecules with optional atom highlighting.
If no atom selections are provided, displays the molecule without highlights.
Duplicate molecular structures will only be plotted once.
Parameters
----------
mol : Chem.Mol
The RDKit molecule object, which may contain multiple fragments.
*args : list[int] or tuple[int]
Variable number of lists/tuples containing atom indices to be highlighted.
Each selection will be assigned a different color from matplotlib's tab10
colormap. If no arguments provided, displays the molecule without highlights.
mols_per_row : int, default=4
Number of molecules per row in the grid.
sub_img_size : tuple[int, int], default=(200, 200)
Size of each molecule image in pixels.
legends : list[str], optional
Custom legends for each molecule. If None, default legends will be used.
alpha : float, default=0.5
Transparency level for the highlighted atoms (0.0 = fully transparent,
1.0 = opaque).
Returns
-------
PIL.Image
A PIL image object of the grid.
Examples
--------
>>> from molify import smiles2atoms, ase2rdkit, match_substructure
>>> from molify import visualize_selected_molecules
>>>
>>> # Create and convert molecule
>>> mol = ase2rdkit(smiles2atoms("Cc1ccccc1"))
>>>
>>> # Find aromatic and aliphatic carbons
>>> aromatic = match_substructure(mol, "c")
>>> aliphatic = match_substructure(mol, "[C;!c]")
>>>
>>> # Flatten matches for visualization
>>> aromatic_flat = [idx for match in aromatic for idx in match]
>>> aliphatic_flat = [idx for match in aliphatic for idx in match]
>>>
>>> # Visualize with highlights
>>> visualize_selected_molecules(mol, aromatic_flat, aliphatic_flat)
<PIL.Image>
"""
# Handle empty args case - display molecule without highlights
if not args:
img = Draw.MolsToGridImage(
[mol],
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=legends if legends is not None else ["Molecule 0"],
)
return img
# Collect highlighted fragments
candidate_mols, candidate_highlights, candidate_colors = (
_collect_highlighted_fragments(mol, args, alpha)
)
if not candidate_mols:
print("No molecules to draw with the given selections.")
return None
# Filter for unique molecules
mols_to_draw, highlight_lists, highlight_colors = _filter_unique_molecules(
candidate_mols, candidate_highlights, candidate_colors
)
# Draw the grid
final_legends = (
legends
if legends is not None
else [f"Molecule {i}" for i in range(len(mols_to_draw))]
)
img = Draw.MolsToGridImage(
mols_to_draw,
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=final_legends,
highlightAtomLists=highlight_lists,
highlightAtomColors=highlight_colors,
)
return img
def visualize_selected_molecules(
mol: Chem.Mol,
*args,
mols_per_row: int = 4,
sub_img_size: tuple[int, int] = (200, 200),
legends: list[str] | None = None,
alpha: float = 0.5,
):
"""Visualize molecules with optional atom highlighting.
If no atom selections are provided, displays the molecule without highlights.
Duplicate molecular structures will only be plotted once.
Parameters
----------
mol : Chem.Mol
The RDKit molecule object, which may contain multiple fragments.
*args : list[int] or tuple[int]
Variable number of lists/tuples containing atom indices to be highlighted.
Each selection will be assigned a different color from matplotlib's tab10
colormap. If no arguments provided, displays the molecule without highlights.
mols_per_row : int, default=4
Number of molecules per row in the grid.
sub_img_size : tuple[int, int], default=(200, 200)
Size of each molecule image in pixels.
legends : list[str], optional
Custom legends for each molecule. If None, default legends will be used.
alpha : float, default=0.5
Transparency level for the highlighted atoms (0.0 = fully transparent,
1.0 = opaque).
Returns
-------
PIL.Image or None
A PIL image object of the grid, or None if no molecules contain
any of the selected atoms.
Examples
--------
>>> from molify import smiles2atoms, ase2rdkit, match_substructure
>>> from molify import visualize_selected_molecules
>>>
>>> # Create and convert molecule
>>> mol = ase2rdkit(smiles2atoms("Cc1ccccc1"))
>>>
>>> # Find aromatic and aliphatic carbons
>>> aromatic = match_substructure(mol, "c")
>>> aliphatic = match_substructure(mol, "[C;!c]")
>>>
>>> # Flatten matches for visualization
>>> aromatic_flat = [idx for match in aromatic for idx in match]
>>> aliphatic_flat = [idx for match in aliphatic for idx in match]
>>>
>>> # Visualize with highlights
>>> visualize_selected_molecules(mol, aromatic_flat, aliphatic_flat)
<PIL.Image>
"""
# Handle empty args case - display molecule without highlights
if not args:
img = Draw.MolsToGridImage(
[mol],
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=legends if legends is not None else ["Molecule 0"],
)
return img
# Collect highlighted fragments
candidate_mols, candidate_highlights, candidate_colors = (
_collect_highlighted_fragments(mol, args, alpha)
)
if not candidate_mols:
print("No molecules to draw with the given selections.")
return None
# Filter for unique molecules
mols_to_draw, highlight_lists, highlight_colors = _filter_unique_molecules(
candidate_mols, candidate_highlights, candidate_colors
)
# Draw the grid
final_legends = (
legends
if legends is not None
else [f"Molecule {i}" for i in range(len(mols_to_draw))]
)
img = Draw.MolsToGridImage(
mols_to_draw,
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=final_legends,
highlightAtomLists=highlight_lists,
highlightAtomColors=highlight_colors,
)
return img
Suggested change
def visualize_selected_molecules(
mol: Chem.Mol,
*args,
mols_per_row: int = 4,
sub_img_size: tuple[int, int] = (200, 200),
legends: list[str] | None = None,
alpha: float = 0.5,
):
"""Visualize molecules with optional atom highlighting.
If no atom selections are provided, displays the molecule without highlights.
Duplicate molecular structures will only be plotted once.
Parameters
----------
mol : Chem.Mol
The RDKit molecule object, which may contain multiple fragments.
*args : list[int] or tuple[int]
Variable number of lists/tuples containing atom indices to be highlighted.
Each selection will be assigned a different color from matplotlib's tab10
colormap. If no arguments provided, displays the molecule without highlights.
mols_per_row : int, default=4
Number of molecules per row in the grid.
sub_img_size : tuple[int, int], default=(200, 200)
Size of each molecule image in pixels.
legends : list[str], optional
Custom legends for each molecule. If None, default legends will be used.
alpha : float, default=0.5
Transparency level for the highlighted atoms (0.0 = fully transparent,
1.0 = opaque).
Returns
-------
PIL.Image
A PIL image object of the grid.
Examples
--------
>>> from molify import smiles2atoms, ase2rdkit, match_substructure
>>> from molify import visualize_selected_molecules
>>>
>>> # Create and convert molecule
>>> mol = ase2rdkit(smiles2atoms("Cc1ccccc1"))
>>>
>>> # Find aromatic and aliphatic carbons
>>> aromatic = match_substructure(mol, "c")
>>> aliphatic = match_substructure(mol, "[C;!c]")
>>>
>>> # Flatten matches for visualization
>>> aromatic_flat = [idx for match in aromatic for idx in match]
>>> aliphatic_flat = [idx for match in aliphatic for idx in match]
>>>
>>> # Visualize with highlights
>>> visualize_selected_molecules(mol, aromatic_flat, aliphatic_flat)
<PIL.Image>
"""
# Handle empty args case - display molecule without highlights
if not args:
img = Draw.MolsToGridImage(
[mol],
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=legends if legends is not None else ["Molecule 0"],
)
return img
# Collect highlighted fragments
candidate_mols, candidate_highlights, candidate_colors = (
_collect_highlighted_fragments(mol, args, alpha)
)
if not candidate_mols:
print("No molecules to draw with the given selections.")
return None
# Filter for unique molecules
mols_to_draw, highlight_lists, highlight_colors = _filter_unique_molecules(
candidate_mols, candidate_highlights, candidate_colors
)
# Draw the grid
final_legends = (
legends
if legends is not None
else [f"Molecule {i}" for i in range(len(mols_to_draw))]
)
img = Draw.MolsToGridImage(
mols_to_draw,
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=final_legends,
highlightAtomLists=highlight_lists,
highlightAtomColors=highlight_colors,
)
return img
def visualize_selected_molecules(
mol: Chem.Mol,
*args,
mols_per_row: int = 4,
sub_img_size: tuple[int, int] = (200, 200),
legends: list[str] | None = None,
alpha: float = 0.5,
):
"""Visualize molecules with optional atom highlighting.
If no atom selections are provided, displays the molecule without highlights.
Duplicate molecular structures will only be plotted once.
Parameters
----------
mol : Chem.Mol
The RDKit molecule object, which may contain multiple fragments.
*args : list[int] or tuple[int]
Variable number of lists/tuples containing atom indices to be highlighted.
Each selection will be assigned a different color from matplotlib's tab10
colormap. If no arguments provided, displays the molecule without highlights.
mols_per_row : int, default=4
Number of molecules per row in the grid.
sub_img_size : tuple[int, int], default=(200, 200)
Size of each molecule image in pixels.
legends : list[str], optional
Custom legends for each molecule. If None, default legends will be used.
alpha : float, default=0.5
Transparency level for the highlighted atoms (0.0 = fully transparent,
1.0 = opaque).
Returns
-------
PIL.Image
A PIL image object of the grid.
Examples
--------
>>> from molify import smiles2atoms, ase2rdkit, match_substructure
>>> from molify import visualize_selected_molecules
>>>
>>> # Create and convert molecule
>>> mol = ase2rdkit(smiles2atoms("Cc1ccccc1"))
>>>
>>> # Find aromatic and aliphatic carbons
>>> aromatic = match_substructure(mol, "c")
>>> aliphatic = match_substructure(mol, "[C;!c]")
>>>
>>> # Flatten matches for visualization
>>> aromatic_flat = [idx for match in aromatic for idx in match]
>>> aliphatic_flat = [idx for match in aliphatic for idx in match]
>>>
>>> # Visualize with highlights
>>> visualize_selected_molecules(mol, aromatic_flat, aliphatic_flat)
<PIL.Image>
"""
# Handle empty args case - display molecule without highlights
if not args:
img = Draw.MolsToGridImage(
[mol],
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=legends if legends is not None else ["Molecule 0"],
)
return img
# Collect highlighted fragments
candidate_mols, candidate_highlights, candidate_colors = (
_collect_highlighted_fragments(mol, args, alpha)
)
if not candidate_mols:
raise ValueError(
"No molecules contain any of the selected atoms. "
"Check that atom indices are valid and correspond to the molecule."
)
# Filter for unique molecules
mols_to_draw, highlight_lists, highlight_colors = _filter_unique_molecules(
candidate_mols, candidate_highlights, candidate_colors
)
# Draw the grid
final_legends = (
legends
if legends is not None
else [f"Molecule {i}" for i in range(len(mols_to_draw))]
)
img = Draw.MolsToGridImage(
mols_to_draw,
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=final_legends,
highlightAtomLists=highlight_lists,
highlightAtomColors=highlight_colors,
)
return img
Suggested change
def visualize_selected_molecules(
mol: Chem.Mol,
*args,
mols_per_row: int = 4,
sub_img_size: tuple[int, int] = (200, 200),
legends: list[str] | None = None,
alpha: float = 0.5,
):
"""Visualize molecules with optional atom highlighting.
If no atom selections are provided, displays the molecule without highlights.
Duplicate molecular structures will only be plotted once.
Parameters
----------
mol : Chem.Mol
The RDKit molecule object, which may contain multiple fragments.
*args : list[int] or tuple[int]
Variable number of lists/tuples containing atom indices to be highlighted.
Each selection will be assigned a different color from matplotlib's tab10
colormap. If no arguments provided, displays the molecule without highlights.
mols_per_row : int, default=4
Number of molecules per row in the grid.
sub_img_size : tuple[int, int], default=(200, 200)
Size of each molecule image in pixels.
legends : list[str], optional
Custom legends for each molecule. If None, default legends will be used.
alpha : float, default=0.5
Transparency level for the highlighted atoms (0.0 = fully transparent,
1.0 = opaque).
Returns
-------
PIL.Image
A PIL image object of the grid.
Examples
--------
>>> from molify import smiles2atoms, ase2rdkit, match_substructure
>>> from molify import visualize_selected_molecules
>>>
>>> # Create and convert molecule
>>> mol = ase2rdkit(smiles2atoms("Cc1ccccc1"))
>>>
>>> # Find aromatic and aliphatic carbons
>>> aromatic = match_substructure(mol, "c")
>>> aliphatic = match_substructure(mol, "[C;!c]")
>>>
>>> # Flatten matches for visualization
>>> aromatic_flat = [idx for match in aromatic for idx in match]
>>> aliphatic_flat = [idx for match in aliphatic for idx in match]
>>>
>>> # Visualize with highlights
>>> visualize_selected_molecules(mol, aromatic_flat, aliphatic_flat)
<PIL.Image>
"""
# Handle empty args case - display molecule without highlights
if not args:
img = Draw.MolsToGridImage(
[mol],
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=legends if legends is not None else ["Molecule 0"],
)
return img
# Collect highlighted fragments
candidate_mols, candidate_highlights, candidate_colors = (
_collect_highlighted_fragments(mol, args, alpha)
)
if not candidate_mols:
print("No molecules to draw with the given selections.")
return None
# Filter for unique molecules
mols_to_draw, highlight_lists, highlight_colors = _filter_unique_molecules(
candidate_mols, candidate_highlights, candidate_colors
)
# Draw the grid
final_legends = (
legends
if legends is not None
else [f"Molecule {i}" for i in range(len(mols_to_draw))]
)
img = Draw.MolsToGridImage(
mols_to_draw,
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=final_legends,
highlightAtomLists=highlight_lists,
highlightAtomColors=highlight_colors,
)
return img
def visualize_selected_molecules(
mol: Chem.Mol,
*args,
mols_per_row: int = 4,
sub_img_size: tuple[int, int] = (200, 200),
legends: list[str] | None = None,
alpha: float = 0.5,
):
"""Visualize molecules with optional atom highlighting.
If no atom selections are provided, displays the molecule without highlights.
Duplicate molecular structures will only be plotted once.
Parameters
----------
mol : Chem.Mol
The RDKit molecule object, which may contain multiple fragments.
*args : list[int] or tuple[int]
Variable number of lists/tuples containing atom indices to be highlighted.
Each selection will be assigned a different color from matplotlib's tab10
colormap. If no arguments provided, displays the molecule without highlights.
mols_per_row : int, default=4
Number of molecules per row in the grid.
sub_img_size : tuple[int, int], default=(200, 200)
Size of each molecule image in pixels.
legends : list[str], optional
Custom legends for each molecule. If None, default legends will be used.
alpha : float, default=0.5
Transparency level for the highlighted atoms (0.0 = fully transparent,
1.0 = opaque).
Returns
-------
PIL.Image
A PIL image object of the grid.
Examples
--------
>>> from molify import smiles2atoms, ase2rdkit, match_substructure
>>> from molify import visualize_selected_molecules
>>>
>>> # Create and convert molecule
>>> mol = ase2rdkit(smiles2atoms("Cc1ccccc1"))
>>>
>>> # Find aromatic and aliphatic carbons
>>> aromatic = match_substructure(mol, "c")
>>> aliphatic = match_substructure(mol, "[C;!c]")
>>>
>>> # Flatten matches for visualization
>>> aromatic_flat = [idx for match in aromatic for idx in match]
>>> aliphatic_flat = [idx for match in aliphatic for idx in match]
>>>
>>> # Visualize with highlights
>>> visualize_selected_molecules(mol, aromatic_flat, aliphatic_flat)
<PIL.Image>
"""
# Handle empty args case - display molecule without highlights
if not args:
img = Draw.MolsToGridImage(
[mol],
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=legends if legends is not None else ["Molecule 0"],
)
return img
# Collect highlighted fragments
candidate_mols, candidate_highlights, candidate_colors = (
_collect_highlighted_fragments(mol, args, alpha)
)
if not candidate_mols:
# Return placeholder image when no matches
from PIL import Image, ImageDraw
img = Image.new('RGB', sub_img_size, 'white')
draw = ImageDraw.Draw(img)
draw.text((10, sub_img_size[1]//2), "No matches found", fill='black')
return img
# Filter for unique molecules
mols_to_draw, highlight_lists, highlight_colors = _filter_unique_molecules(
candidate_mols, candidate_highlights, candidate_colors
)
# Draw the grid
final_legends = (
legends
if legends is not None
else [f"Molecule {i}" for i in range(len(mols_to_draw))]
)
img = Draw.MolsToGridImage(
mols_to_draw,
molsPerRow=mols_per_row,
subImgSize=sub_img_size,
legends=final_legends,
highlightAtomLists=highlight_lists,
highlightAtomColors=highlight_colors,
)
return img
🤖 Prompt for AI Agents
In src/molify/substructure.py around lines 387 to 482, the function docstring
claims it returns a PIL.Image but the code returns None at line ~460 when no
molecules match; update the docstring and type information to reflect that the
function may return None (i.e., return type Optional[PIL.Image]) and document
the None case explicitly (or alternatively choose to raise a ValueError or
return a placeholder image—if you pick that, implement the chosen behavior and
update the docstring accordingly). Ensure imports/types reflect Optional and
that the docstring describes when None is returned and what callers should do.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_substructure.py (1)

70-135: Explicit-[H] SMARTS tests are valuable; consider clarifying the comments.

The BMIM/BF₄⁻ tests (including the explicit-[H] SMARTS cases) are a strong regression suite for ase2rdkit and match_substructure. The only nit is the comments:

  • Lines 113–114 and 133–134 currently say “explicit only hydrogens are returned”, which reads awkwardly and is slightly misleading for the hydrogens="include" case (heavy atoms are also returned).

A clearer wording would be:

-        # When pattern has explicit [H], only heavy atoms are returned by default
+        # When the pattern has explicit [H], only heavy atoms are returned by default
@@
-        # explicit only hydrogens are returned
+        # Only hydrogens from the explicit-[H] pattern are returned
@@
-        # explicit only hydrogens are returned
-        assert len(match) == 25  # 10 heavy + 15 H
+        # All heavy atoms plus their explicit hydrogens are returned
+        assert len(match) == 25  # 10 heavy + 15 H
♻️ Duplicate comments (2)
src/molify/substructure.py (2)

275-313: iter_fragments return type annotation is inconsistent with its generator behavior.

iter_fragments uses yield in both branches, so it’s a generator, but the signature advertises -> list[ase.Atoms] and the docstring still uses a “Yields” section. This will confuse type checkers and readers, and it was already flagged previously.

A minimal fix is to adjust the return type (and possibly the docstring heading):

-def iter_fragments(atoms: ase.Atoms) -> list[ase.Atoms]:
+def iter_fragments(atoms: ase.Atoms) -> tp.Generator[ase.Atoms, None, None]:
@@
-    Yields
-    ------
-    ase.Atoms
-        Each connected component (fragment) in the input structure.
+    Yields
+    ------
+    ase.Atoms
+        Each connected component (fragment) in the input structure.

Callers like list(iter_fragments(...)) remain valid.


387-482: visualize_selected_molecules docstring should acknowledge the None return.

When there are no candidate molecules to draw (no selections intersect any fragment), the function prints a message and returns None, but the docstring claims it always returns a PIL.Image. This mismatch was noted earlier and can surprise callers.

If you keep the current behavior, please update the docstring “Returns” section, e.g.:

-    Returns
-    -------
-    PIL.Image
-        A PIL image object of the grid.
+    Returns
+    -------
+    PIL.Image or None
+        A PIL image object of the grid, or None if no molecules contain
+        any of the selected atoms.

Optionally, consider replacing the print with a logged warning or a ValueError if you’d rather fail fast.

🧹 Nitpick comments (2)
src/molify/substructure.py (1)

13-170: match_substructure logic matches the tests; consider dropping the unused noqa.

The SMARTS/SMILES parsing, atom-map validation, map-ordering, and hydrogen handling modes all align with the test expectations and the docstring examples. The # noqa: C901 on Line 13 is no longer needed per Ruff (RUF100: unused noqa), so you can safely remove it to keep the file lint-clean.

src/molify/utils.py (1)

180-217: rdkit_determine_bonds heuristic is useful; consider guarding the MolFromXYZ failure path.

The charge-scan loop plus special cases for monoatomic ions and PF₆⁻ are practical. One edge case to consider is when rdkit.Chem.MolFromXYZBlock(xyz) returns None (e.g., malformed XYZ); in that case DetermineBonds(mol, ...) will raise a non-ValueError exception that isn’t caught, and the final ValueError won’t be reached. A simple defensive check before the charge loop (e.g., if mol is None: raise ValueError("Failed to parse XYZ block")) would make failures easier to interpret.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 93c9cf6 and de2b25b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • docs/source/index.rst (2 hunks)
  • docs/source/packmol_tools.ipynb (2 hunks)
  • pyproject.toml (4 hunks)
  • src/molify/substructure.py (1 hunks)
  • src/molify/utils.py (1 hunks)
  • tests/test_substructure.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (3)
tests/test_substructure.py (5)
src/molify/ase2x.py (1)
  • ase2rdkit (187-220)
src/molify/substructure.py (4)
  • match_substructure (13-170)
  • get_substructures (236-272)
  • group_matches_by_fragment (173-233)
  • visualize_selected_molecules (387-482)
src/molify/smiles2x.py (2)
  • smiles2conformers (64-149)
  • smiles2atoms (33-61)
src/molify/pack.py (1)
  • pack (143-247)
tests/conftest.py (3)
  • ethanol_water (8-12)
  • alanine_dipeptide (16-18)
  • alanine_dipeptide_box (22-25)
src/molify/substructure.py (2)
src/molify/ase2x.py (1)
  • ase2rdkit (187-220)
src/molify/utils.py (1)
  • find_connected_components (31-61)
src/molify/utils.py (2)
src/molify/ase2x.py (1)
  • ase2networkx (97-184)
src/molify/rdkit2x.py (1)
  • rdkit2networkx (61-125)
🪛 Ruff (0.14.4)
src/molify/substructure.py

13-13: Unused noqa directive (non-enabled: C901)

Remove unused noqa directive

(RUF100)


86-86: Avoid specifying long messages outside the exception class

(TRY003)


95-95: Avoid specifying long messages outside the exception class

(TRY003)


162-165: Avoid specifying long messages outside the exception class

(TRY003)

src/molify/utils.py

28-28: Avoid specifying long messages outside the exception class

(TRY003)


182-182: Avoid specifying long messages outside the exception class

(TRY003)


208-208: Consider moving this statement to an else block

(TRY300)


212-216: Avoid specifying long messages outside the exception class

(TRY003)


230-230: Unused noqa directive (non-enabled: C901)

Remove unused noqa directive

(RUF100)


321-321: Avoid specifying long messages outside the exception class

(TRY003)


338-341: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: pytest (3.12, ubuntu-latest)
  • GitHub Check: Build wheels on windows-11-arm
  • GitHub Check: Build wheels on windows-latest
  • GitHub Check: Build wheels on ubuntu-latest
  • GitHub Check: Build wheels on ubuntu-24.04-arm
  • GitHub Check: Build wheels on ubuntu-latest
  • GitHub Check: Build wheels on windows-11-arm
  • GitHub Check: Build wheels on windows-latest
  • GitHub Check: Build wheels on ubuntu-24.04-arm
🔇 Additional comments (12)
docs/source/index.rst (1)

4-180: Index page restructure and cross-links look consistent with the new API.

The updated overview, quick start, advanced features, and the toctree/doc links (including smiles2conformers, pack, and the packmol_tools/ase_tools/rdkit_tools/atom_selection targets) are coherent with the current public API and the notebook-based docs layout.

docs/source/packmol_tools.ipynb (1)

4-307: Packmol notebook narrative matches the current packing API.

The water box and water+ethanol examples, connectivity discussion, and the density analysis via calculate_density all line up with the new pack behavior and the RDKit/ASE-based workflow; this is a good, concrete companion to the updated docs.

tests/test_substructure.py (6)

7-56: Core substructure and get_substructures tests are well-aligned with the new API.

These tests validate CH₃ matching, multi-molecule boxes, and NO₃ extraction (both single structure and packed box) via match_substructure and get_substructures with suggestions. They give good coverage of the RDKit-centric path and atom-index expectations.


58-68: iter_fragments tests correctly cover connectivity-present and -absent cases.

Using remove_connectivity to toggle between explicit connectivity and ase.build.separate ensures both code paths in iter_fragments are exercised and confirm the expected 10 water trimers in all cases.


181-261: Hydrogen-handling and mapped-SMILES tests give strong behavioral guarantees.

The parametrized tests for hydrogens="exclude", "include", and "isolated" plus the mapped-SMILES cases (with and without hydrogens) tightly pin down the ordering and contents of returned indices, which is crucial for users relying on deterministic mapping semantics.


269-372: Fragment grouping and map-order tests thoroughly exercise grouping logic.

The group_matches_by_fragment tests with ethanol+water and alanine dipeptide (including error-on-duplicate-label and map-order invariants) robustly validate fragment detection and map-number sorting, both with and without hydrogens. This matches the documented behavior of group_matches_by_fragment and match_substructure(mapped_only=True).


414-452: None bond-order regression test correctly protects the ase2rdkit fix.

test_match_substructure_ions_with_none_bond_orders accurately reproduces the reported failure mode by forcing all bond_order values to None and then verifying ase2rdkit and match_substructure work for Na⁺, Cl⁻, and water. This is an excellent high-value regression test.


459-512: Visualization tests sensibly cover basic usage patterns.

The visualization tests (basic, empty selections, overlapping/multiple selections, and varying alpha) assert that visualize_selected_molecules returns a non-None image under valid selections, which is a pragmatic smoke test for the new highlighting interface.

src/molify/utils.py (4)

18-62: Connectivity helpers are straightforward and robust.

bond_type_from_order and find_connected_components are simple and clear; the latter’s NetworkX-first, DFS-fallback approach avoids hard dependency issues while only touching the i, j parts of the connectivity tuples, so None bond orders are harmless here.


64-178: Density and unwrapping utilities look consistent with ASE conventions.

calculate_density and calculate_box_dimensions follow ASE units, and unwrap_structures’ DFS over the ase2networkx graph (with image offsets via cell.scaled_positions) matches the documented behavior of unwrapping bonded fragments without mutating the original atoms. The example in the docstring reflects the implemented algorithm.


219-227: suggestions2networkx is a clean bridge from SMILES to graphs.

The function correctly normalizes each SMILES string via RDKit (including Chem.AddHs) and reuses rdkit2networkx, keeping SMILES-to-graph logic centralized.


389-497: Bond-order edge rendering is well thought out.

_draw_edges_with_bond_orders correctly encodes single/double/triple/aromatic bonds as parallel lines with perpendicular offsets, and gracefully falls back to thin or single lines when bond_order is None or unknown. The geometry math is simple and guards against zero-length edges.

Comment on lines +230 to +386
def draw_molecular_graph( # noqa: C901
graph: nx.Graph,
layout: Literal["spring", "circular", "kamada_kawai"] = "kamada_kawai",
figsize: tuple[float, float] = (8, 8),
node_size: int = 500,
font_size: Optional[int] = None,
show_bond_orders: bool = True,
weight_by_bond_order: bool = True,
ax: Optional[Axes] = None,
) -> Figure:
"""Draw a molecular graph with bond order visualization.
This function visualizes NetworkX molecular graphs with proper representation
of chemical bonds. Single, double, triple, and aromatic bonds are displayed
with appropriate line styles.
Parameters
----------
graph : nx.Graph
NetworkX graph with node attribute 'atomic_number' and edge attribute
'bond_order'. Can be created from molify conversion functions like
ase2networkx() or rdkit2networkx().
layout : Literal["spring", "circular", "kamada_kawai"], optional
Layout algorithm to use for node positioning. Options are:
* `spring`: Force-directed layout
* `circular`: Nodes arranged in a circle
* `kamada_kawai`: Force-directed using Kamada-Kawai algorithm (default)
figsize : tuple[float, float], optional
Figure size as (width, height) in inches. Default is (8, 8).
node_size : int, optional
Size of the nodes. Default is 500.
font_size : int, optional
Font size for node labels (atomic numbers). If None, automatically scaled
based on node_size (approximately 0.66 * sqrt(node_size)). Default is None.
show_bond_orders : bool, optional
If True, visualize different bond orders with multiple parallel lines.
Default is True.
weight_by_bond_order : bool, optional
If True and layout is 'spring', use bond_order as edge weights in the
spring layout algorithm. Higher bond orders pull atoms closer together.
This makes double/triple bonds appear shorter than single bonds.
Default is True.
ax : Axes, optional
Matplotlib axes to draw on. If None, creates a new figure.
Returns
-------
Figure
Matplotlib figure containing the molecular graph visualization.
Notes
-----
**Node Colors:**
- Nodes are colored using Jmol color scheme based on atomic numbers
- Common colors: Carbon (gray), Hydrogen (white), Oxygen (red),
Nitrogen (blue)
**Bond Order Visualization:**
- Single bond (1.0): 1 solid line
- Double bond (2.0): 2 parallel solid lines
- Triple bond (3.0): 3 parallel solid lines
- Aromatic bond (1.5): 1 solid line + 1 dashed line (parallel)
- Unknown (None): 1 thin solid line
Examples
--------
>>> from molify import smiles2atoms, ase2networkx, draw_molecular_graph
>>> atoms = smiles2atoms("C=C=O") # Ketene
>>> graph = ase2networkx(atoms)
>>> fig = draw_molecular_graph(graph, layout='spring')
>>> # For benzene showing aromatic bonds
>>> benzene = smiles2atoms("c1ccccc1")
>>> graph_benzene = ase2networkx(benzene)
>>> fig = draw_molecular_graph(graph_benzene, show_bond_orders=True)
"""
# Calculate font_size and linewidth based on node_size if not provided
if font_size is None:
font_size = int(0.66 * np.sqrt(node_size))

linewidth = 0.091 * np.sqrt(node_size)

# Create figure if axes not provided
if ax is None:
fig, ax = plt.subplots(figsize=figsize)
created_fig = True
else:
fig_or_none = ax.get_figure()
if fig_or_none is None:
raise ValueError("Provided axes is not attached to a figure")
# Cast to Figure (assuming user provides proper Axes, not SubFigure axes)
fig = cast(Figure, fig_or_none)
created_fig = False

# Compute node positions using the specified layout
if layout == "spring":
if weight_by_bond_order:
# Use bond_order as weights - higher order = stronger spring
pos = nx.spring_layout(graph, weight="bond_order", seed=42)
else:
pos = nx.spring_layout(graph, seed=42)
elif layout == "circular":
pos = nx.circular_layout(graph)
elif layout == "kamada_kawai":
pos = nx.kamada_kawai_layout(graph)
else:
raise ValueError(
f"Unknown layout '{layout}'. "
f"Choose from 'spring', 'circular', 'kamada_kawai'."
)

# Draw edges with bond order visualization
if show_bond_orders:
_draw_edges_with_bond_orders(graph, pos, ax)
else:
nx.draw_networkx_edges(graph, pos, ax=ax, width=2)

# Get node colors based on atomic numbers using Jmol colors
node_colors = []
for node in graph.nodes():
atomic_number = graph.nodes[node].get("atomic_number", 0)
# jmol_colors is indexed by atomic number (1-indexed for elements)
if atomic_number > 0 and atomic_number < len(jmol_colors):
node_colors.append(jmol_colors[atomic_number])
else:
node_colors.append([0.5, 0.5, 0.5]) # Gray for unknown

# Draw nodes with Jmol colors
nx.draw_networkx_nodes(
graph,
pos,
node_size=node_size,
node_color=node_colors,
edgecolors="black",
linewidths=linewidth,
ax=ax,
)

# Draw labels (atomic numbers)
labels = nx.get_node_attributes(graph, "atomic_number")
nx.draw_networkx_labels(
graph, pos, labels=labels, font_size=font_size, font_color="black", ax=ax
)

# Add margins to prevent node clipping at edges
ax.margins(0.15)
ax.axis("off")
fig.tight_layout()

# Close figure to prevent double display in notebooks
# (only if we created it; otherwise it's managed externally)
if created_fig:
plt.close(fig)

return fig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

spring layout weighting breaks when bond_order is None; normalize weights before calling NetworkX.

draw_molecular_graph advertises support for graphs from ase2networkx() and rdkit2networkx(), but edges from ase2networkx have bond_order=None. With layout="spring" and weight_by_bond_order=True (the default), this leads to:

pos = nx.spring_layout(graph, weight="bond_order", seed=42)

which will pass None weights into NetworkX’s spring layout. Depending on NetworkX version, this is likely to raise a TypeError or produce undefined behavior, and it contradicts the docstring example that uses draw_molecular_graph(ase2networkx(atoms), layout='spring').

To make this robust, normalize weights before calling spring_layout, e.g.:

-    if layout == "spring":
-        if weight_by_bond_order:
-            # Use bond_order as weights - higher order = stronger spring
-            pos = nx.spring_layout(graph, weight="bond_order", seed=42)
-        else:
-            pos = nx.spring_layout(graph, seed=42)
+    if layout == "spring":
+        if weight_by_bond_order:
+            # Build a view with numeric weights: None (or missing) -> 1.0
+            g_w = nx.Graph()
+            g_w.add_nodes_from(graph.nodes(data=True))
+            for u, v, data in graph.edges(data=True):
+                w = data.get("bond_order")
+                w_num = float(w) if isinstance(w, (int, float)) else 1.0
+                g_w.add_edge(u, v, weight=w_num)
+            pos = nx.spring_layout(g_w, weight="weight", seed=42)
+        else:
+            pos = nx.spring_layout(graph, seed=42)

Also, Ruff reports the # noqa: C901 on the function signature as unused; you can drop it once complexity is acceptable to your linter.


I'll verify the critical claims in this review comment by examining the codebase.



🏁 Script executed:

#!/bin/bash
# Find and examine ase2networkx and rdkit2networkx implementations
fd -e py -type f | xargs rg -l "def ase2networkx|def rdkit2networkx" 2>/dev/null | head -5

Length of output: 212


🏁 Script executed:

#!/bin/bash
# Search for ase2networkx function definition and check bond_order assignment
rg -A 30 "def ase2networkx" --type py

Length of output: 1562


🏁 Script executed:

#!/bin/bash
# Search for rdkit2networkx function definition  
rg -A 30 "def rdkit2networkx" --type py

Length of output: 1465


🏁 Script executed:

#!/bin/bash
# Check NetworkX version in dependencies
fd -e txt -o -e toml -o -e cfg | xargs rg -i "networkx|nx" 2>/dev/null | head -20

Length of output: 1488


🏁 Script executed:

#!/bin/bash
# Search for tests using draw_molecular_graph with spring layout
rg -B 5 -A 5 "draw_molecular_graph.*spring|spring.*draw_molecular_graph" --type py

Length of output: 672


🏁 Script executed:

#!/bin/bash
# Check if noqa comment C901 is actually unused by examining function complexity
rg -C 10 "def draw_molecular_graph" --type py | head -40

Length of output: 1069


🏁 Script executed:

#!/bin/bash
# Verify if there are test cases exercising the specific docstring example
rg -B 2 -A 10 "smiles2atoms.*ase2networkx.*draw_molecular_graph" --type py

Length of output: 795


🏁 Script executed:

#!/bin/bash
# Check what happens when bond_order is None in edge data
rg -B 5 -A 10 "bond_order.*None|None.*bond_order" --type py

Length of output: 27836


🏁 Script executed:

#!/bin/bash
# Look for version pin on networkx
cat pyproject.toml requirements.txt setup.py 2>/dev/null | grep -i networkx

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Check if there's actual error handling or validation for bond_order values
rg -B 3 -A 3 "weight_by_bond_order" --type py

Length of output: 1358


#!/usr/bin/env python3
# Test what NetworkX 3.4.2+ does with None values in edge weights
import networkx as nx

# Create a simple graph with None edge weight
g = nx.Graph()
g.add_node(0)
g.add_node(1)
g.add_edge(0, 1, bond_order=None)

try:
    # Try to use spring_layout with bond_order as weight
    pos = nx.spring_layout(g, weight="bond_order", seed=42)
    print("SUCCESS: spring_layout handled None weights without error")
    print(f"Positions: {pos}")
except TypeError as e:
    print(f"TypeError caught: {e}")
except Exception as e:
    print(f"Exception caught ({type(e).__name__}): {e}")

🏁 Script executed:

#!/bin/bash
# Search for actual tests that call draw_molecular_graph with spring layout
rg -B 5 -A 10 "draw_molecular_graph" tests/ --type py

Length of output: 0



🏁 Script executed:

#!/bin/bash
# Find _draw_edges_with_bond_orders implementation
rg -B 5 -A 40 "def _draw_edges_with_bond_orders" --type py

Length of output: 2138


🏁 Script executed:

#!/bin/bash
# Check if there are any tests that exercise spring layout with None bond_order
rg -B 10 -A 10 "spring.*ase2networkx|ase2networkx.*spring" tests/ --type py

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for any error handling around NetworkX spring_layout calls
rg -B 5 -A 5 "spring_layout" src/ --type py

Length of output: 746


🏁 Script executed:

#!/bin/bash
# Check actual implementations and edge handling in spring layout path
rg -B 2 -A 2 "weight_by_bond_order.*True" src/ --type py

Length of output: 254



🌐 Web query:

NetworkX spring_layout None edge weight values behavior

💡 Result:

Short answer: spring_layout treats the parameter weight as the name of the edge attribute that scales the attractive spring force; if you pass weight=None then the algorithm treats the graph as unweighted (every edge weight = 1). [1][2]

References

  • spring_layout docstring (weight: string or None — "If None, then all edge weights are 1"). [1]
  • NetworkX release notes / API: "Use weight=None to consider all weights equally (unweighted case)". [2]


🌐 Web query:

NetworkX spring_layout TypeError None values edge attributes weights

💡 Result:

Cause

  • spring_layout expects the edge attribute named by weight to contain numeric values (or weight=None to treat all edges as weight=1). If some edges have the attribute but its value is None, the layout code will try numeric operations / comparisons on None and raise a TypeError. See the spring_layout docs which state weight is a string or None and that None means all edge weights are 1. [1][2]

Fixes (pick one)

  • Tell spring_layout to ignore edge attributes and use uniform weights:
    • nx.spring_layout(G, weight=None)
  • Replace missing/None edge weights with numbers before calling spring_layout:
    • for u,v,d in G.edges(data=True):
      if d.get('weight') is None:
      d['weight'] = 1.0
  • Or set a default for all edges:
    • nx.set_edge_attributes(G, 1.0, 'weight')

References

  • NetworkX spring_layout (documentation). [1]
  • NetworkX layout source/docs (weight parameter behavior). [2]


🏁 Script executed:

#!/bin/bash
# Check Ruff configuration for C901 (McCabe complexity)
fd -name "ruff.toml" -o -name "pyproject.toml" -o -name ".ruff.toml" | xargs cat 2>/dev/null | grep -A 5 -B 5 "C901\|complexity\|mccabe"

Length of output: 230


🏁 Script executed:

#!/bin/bash
# Count cyclomatic complexity indicators in draw_molecular_graph (if, elif, for, etc.)
rg -A 160 "def draw_molecular_graph" src/molify/utils.py | grep -E "^\s+(if|elif|for|while|except)" | wc -l

Length of output: 2


Fix broken spring layout with None bond_order values from ase2networkx; normalize weights before calling spring_layout.

The review comment is accurate. When spring_layout expects the edge attribute named by weight to contain numeric values, if some edges have the attribute but its value is None, the layout code will try numeric operations on comparisons on None and raise a TypeError. The docstring example calls draw_molecular_graph(ase2networkx(atoms), layout='spring'), and since ase2networkx() produces edges with bond_order=None, this will fail at runtime with the current code.

The suggested fix—normalizing None weights to 1.0 before passing to spring_layout—is the correct approach. The secondary point about the unused # noqa: C901 comment is less critical but reasonable to address once the complexity is reduced or confirmed acceptable by your linter.

🧰 Tools
🪛 Ruff (0.14.4)

230-230: Unused noqa directive (non-enabled: C901)

Remove unused noqa directive

(RUF100)


321-321: Avoid specifying long messages outside the exception class

(TRY003)


338-341: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In src/molify/utils.py around lines 230-386, the spring_layout call breaks when
some edges have bond_order=None; normalize edge weights before calling
spring_layout by creating a numeric edge attribute (e.g., "spring_weight") where
you set spring_weight = float(bond_order) for numeric bond_order values and
spring_weight = 1.0 for None/unknown, then call nx.spring_layout(graph,
weight="spring_weight", seed=42); ensure you either remove the temporary
attribute after layout or set it on a shallow copy of the graph so original
bond_order values remain unchanged.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/build-wheels.yaml (1)

25-25: Consider normalizing submodule fetch strategy for consistency.

The build_wheels job uses submodules: recursive (line 25) while build_sdist uses submodules: true (line 46). This inconsistency is unlikely to cause issues, but since the comment flags packmol as critical, both jobs should fetch submodules identically to ensure the SDist includes all dependencies.

Update line 46 to match line 25 for clarity:

      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
-         submodules: true
+         submodules: recursive

Also applies to: 46-46

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de2b25b and 38d6384.

📒 Files selected for processing (1)
  • .github/workflows/build-wheels.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: pytest (3.10, windows-latest)
  • GitHub Check: pytest (3.11, windows-latest)
  • GitHub Check: pytest (3.12, macos-15-intel)
  • GitHub Check: pytest (3.12, macos-latest)
  • GitHub Check: pytest (3.13, windows-latest)
  • GitHub Check: pytest (3.12, windows-latest)
  • GitHub Check: Build wheels on windows-11-arm
  • GitHub Check: Build wheels on macos-15-intel
  • GitHub Check: Build wheels on windows-latest
  • GitHub Check: Build wheels on macos-latest
  • GitHub Check: Build wheels on ubuntu-latest
  • GitHub Check: Build wheels on ubuntu-24.04-arm
🔇 Additional comments (3)
.github/workflows/build-wheels.yaml (3)

3-11: ✓ Critical blocker resolved: release trigger now present.

The previous review flagged that the upload_pypi job was unreachable because the workflow lacked a release trigger. This has now been fixed—the workflow correctly triggers on release: { types: [published] }, so PyPI uploads will execute when you publish a GitHub Release.


31-31: Verify that macOS 15.0 deployment target is intentional.

Setting MACOSX_DEPLOYMENT_TARGET=15.0 (macOS Sequoia, released Sept 2024) is extremely restrictive and will exclude all users on macOS 11–14. This is unusually aggressive for package distribution.

Confirm this is intentional and aligns with your project's minimum macOS version requirements. If you need broader compatibility, consider a lower deployment target (e.g., 11.0 or 10.9).


20-20: Verify ARM runner availability in your GitHub environment.

The matrix includes ubuntu-24.04-arm and windows-11-arm runners. These are not standard GitHub-hosted runners; they require either GitHub Large Runners (paid tier) or custom self-hosted runner configuration. If these runners are unavailable, the matrix will fail to expand or jobs will hang waiting for runners.

Confirm:

  • Are you using GitHub Large Runners?
  • Are the runner labels (ubuntu-24.04-arm, windows-11-arm) correctly registered?
  • If not needed, consider removing ARM variants to reduce cost/complexity.

@PythonFZ PythonFZ merged commit 82600fe into main Nov 15, 2025
30 checks passed
@PythonFZ PythonFZ deleted the update-docs branch November 15, 2025 09:45
@coderabbitai coderabbitai bot mentioned this pull request Nov 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants