Skip to content

[ENH] ICA LiNGAM algorithm in casual_discovery#2998

Draft
Jatinbhardwaj-093 wants to merge 11 commits intopgmpy:devfrom
Jatinbhardwaj-093:causal_discovery/ICALiNGAM
Draft

[ENH] ICA LiNGAM algorithm in casual_discovery#2998
Jatinbhardwaj-093 wants to merge 11 commits intopgmpy:devfrom
Jatinbhardwaj-093:causal_discovery/ICALiNGAM

Conversation

@Jatinbhardwaj-093
Copy link
Contributor

@Jatinbhardwaj-093 Jatinbhardwaj-093 commented Mar 16, 2026

The following checklist is mandatory.

Your PR will be closed if you remove the checklist or do not answer the questions to a satisfactory level. Use of LLM is strictly forbidden for any part of this checklist (even for improving language).

Your checklist for this pull request

  • Have you followed all the steps from our Contributing Guide?
  • Does the PR fully address the linked issue and is within its defined scope? If you are still working on the PR, mark it as draft.
  • Are all the GitHub Actions checks passing? If not, mark your PR as draft while you fix it.

Please answer the following questions:

  • Did you use an LLM for any assistance? Please describe how and what you used it for?
    Yes. I used Gemini 3.1 primarily for assistance in understanding and summarizing parts of the LiNGAM research paper. In particular, it helped clarify
    The algorithm for permutation of the weight matrix.
    The overall steps of the LiNGAM algorithm.
    The statistical pruning methods described in Section 6 of the paper (Wald test and A Chi-Square Test for Evaluating the Overall Fit of the Estimated Model), which are not yet implemented in the current code.
    The implementation itself was written by me based on the original paper and my understanding of the algorithm.
  • What steps have you taken to verify that the changes correctly address the issue? And what edge cases have you considered?
    The implementation was written based on the LiNGAM paper and verified by running the algorithm on synthetic datasets with known causal structures.
    A small number of tests were generated with the help of an AI tool and then manually reviewed. Currently, two simple test cases are included to verify that the discovered causal graph matches the expected structure.
  • Has the LLM added try-except blocks? They will need to be removed; any error handling must be explicit.
    No.
  • Have you used LLM for generating tests? They need to be compressed into a smaller number of tests without reducing coverage.
    Some initial test scaffolding was generated with the help of an AI tool and then reviewed manually. However, the test suite is not finalized in this PR. A more complete and compact set of tests will be added in a subsequent update.

Issue number(s) that this pull request fixes

List of changes to the codebase in this pull request

  • Added the algorithm and few test.

@Jatinbhardwaj-093 Jatinbhardwaj-093 force-pushed the causal_discovery/ICALiNGAM branch from 23a75bd to 6570834 Compare March 17, 2026 07:20
@Jatinbhardwaj-093
Copy link
Contributor Author

Remaining Tasks

  • Section 6: Statistical Tests for Pruning Edges.
    • Wald Test for Examining Significance of Edges
  • Tests
    • Add tests on real datasets
    • Add few more synthetic/random tests with known causal structures.
  • Add proper documentation

@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 95.95376% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.57%. Comparing base (95637a5) to head (6570834).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
pgmpy/causal_discovery/LiNGAM.py 94.06% 7 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##              dev    #2998    +/-   ##
========================================
  Coverage   95.57%   95.57%            
========================================
  Files         504      506     +2     
  Lines       29122    29295   +173     
========================================
+ Hits        27833    27999   +166     
- Misses       1289     1296     +7     
Files with missing lines Coverage Δ
pgmpy/causal_discovery/__init__.py 100.00% <100.00%> (ø)
pgmpy/tests/test_causal_discovery/test_LiNGAM.py 100.00% <100.00%> (ø)
pgmpy/causal_discovery/LiNGAM.py 94.06% <94.06%> (ø)

Copy link
Member

@ankurankan ankurankan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jatinbhardwaj-093 I see that you are also trying to include expert knowledge in the implementation. Did you find a reference paper for it? If not, I would suggest keeping the initial implementation without it.

@Jatinbhardwaj-093
Copy link
Contributor Author

@Jatinbhardwaj-093 I see that you are also trying to include expert knowledge in the implementation. Did you find a reference paper for it? If not, I would suggest keeping the initial implementation without it.

I initially assumed that expert knowledge was a native feature in pgmpy, which is why I didn’t explore it in depth during the first draft. I will look for relevant reference papers to support its inclusion. If I’m unable to find sufficient backing, I’ll remove or revise this part accordingly.

@Jatinbhardwaj-093
Copy link
Contributor Author

@ankurankan

I was thinking about the variant parameter you suggested. Please correct me if I’m wrong, but my understanding is that this would correspond to different types of LiNGAM algorithms, such as ICA, Direct, VAR, Kernel, and others.

If supporting multiple variants is a primary goal, would it make sense to create a single master wrapper class that selects the appropriate LiNGAM implementation based on the variant parameter? Which will have this parameter.

@ankurankan
Copy link
Member

ankurankan commented Mar 17, 2026

@Jatinbhardwaj-093 In the case of LiNGAM, I think the two main algorithms are Direct and ICA. I don't think there is a significant overlap between the two, so might be better to implement them in separate classes and not have a variant argument. Sorry, if I suggested to add that earlier.

# Step 1: Apply an ICA algorithm to obtain a decomposition X = AS where S has
# the same size as X and contains in its rows the independent components.
# From here on, we will exclusively work with W = A^-1.
ica = FastICA(random_state=self.random_state, max_iter=1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest taking an instance of the FastICA as the argument itself. This makes the method composable, and users can fine-tune the FastICA hyperparameters without us having to design our method in a way that allows users to specify that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants