Update CL XML and head_matter fields with data from CAP #4614
@flooie, to you for triage, analysis, or both! :)
I found a problem. I was testing the command with this cluster: https://www.courtlistener.com/opinion/1539264/go/ (https://static.case.law/a2d/191/html/0138-01.html) and I saw that the resulting XML removed the link from the first footnote. The first footnote in the updated XML also contains a wrong link:
<footnote data-label="1" id="footnote_1_1">
<footnote citation-index="1" href="#fn2_ref" label="139">1</footnote>
<p data-blocks="[["BL_175.11",175,[157,2661,695,68]]]" id="b175-8">. Schibi v. Schibi, <citation data-cite="136 Conn. 190" data-index="0" href="/citations/?q=136%20Conn.%20190">136 Conn. 190</citation>, <citation data-cite="69 A.2d 831" data-index="1" href="/citations/?q=69%20A.2d%20831">69 A.2d 831</citation>, <citation data-cite="14 A.L.R. 2d 620" data-index="2" href="/citations/?q=14%20A.L.R.%202d%20620">14 A.L.R.2d 620</citation>.</p>
</footnote>
When we ran the Harvard merger command to update opinions and metadata, it also fixed the footnotes (regenerated the tag and link), and that may be breaking the update_cap_html_with_cl_xml function.
Here is how we fixed the footnotes to be linked correctly: https://github.com/freelawproject/courtlistener/blob/main/cl/corpus_importer/management/commands/harvard_merge.py#L518
This is the XML of the cluster I mentioned above:
<?xml version="1.0" encoding="utf-8"?><opinion type="majority"><author id="b174-23"> HOOD, Chief Judge. </author><p id="b174-24"> This appeal is by a husband from an order dismissing his complaint seeking a divorce on the ground of five years voluntary separation. </p><p id="b174-25"> The facts, as found by the trial court, are these. A child was born out of wedlock to the parties in April of 1955. In November of that year the parties were legally married, but separated eight days later and have not lived together since that time. Prior to and at the time of the marriage the parties agreed that the purpose of the marriage was to give the child a legal name and that if they were not satisfied with the marriage a divorce could be obtained. </p><p id="b174-26"> The trial court denied the divorce on the ground that the agreement of the parties prior to and at the time of marriage was collusive and contrary to law. </p><p id="b175-4"><span citation-index="1" class="star-pagination" label="139"> *139 </span> The parties, as the court found, were legally married. Although a marriage is entered into solely for the purpose of legitimizing a child born out of wedlock, such a marriage is a valid one. <a class="footnote" href="#fn1" id="fn1_ref"> 1 </a> The court also found that the parties had lived separate and apart for more than five years. The court did not expressly state that the separation was voluntary, but that is implicit in its finding, and there is no intimation in the record that the separation was other than voluntary. Under our law proof of a valid marriage and five years voluntary separation entitles either party to a divorce. <a class="footnote" href="#fn2" id="fn2_ref"> 2 </a> The sole question is whether the agreement of the parties at the time of marriage bars granting of the divorce. </p><p id="b175-5"> The agreement did not constitute collusion in a legal sense. 
In general it may be said that collusion, in the law of divorce, implies a corrupt agreement by which evidence is fabricated or suppressed in an attempt to deceive the court and obtain a divorce where legal grounds do not exist. Such was not the case here, but the trial court apparently was of the opinion that an agreement before marriage that if the marriage was unsatisfactory the parties could and would separate and thereafter obtain a divorce, was collusive in nature and contrary to law. </p><p id="b175-6"> When our divorce law was amended in 1935 to include five years voluntary separation as a ground for divorce, it made possible that parties to a marriage could put an end to the marriage by their own voluntary action and after the required period either party could have the marriage legally dissolved. In such a dissolution proceeding there is no question of the innocence or guilt of either party and the reason for the separation is not material. The only issue is the existence of the voluntary separation for the required time. </p><p id="b175-7"> The result is that an agreement by the parties prior to entering marriage that they may voluntarily separate, end the marriage and be divorced, is nothing more than a recognition of the rights given them by law. Such an agreement cannot be said to-be contrary to law. </p><p id="b175-11"> Reversed with instructions to award appellant a divorce. </p><div class="footnotes"><div class="footnote" id="fn1" label="1"><a class="footnote" href="#fn1_ref"> 1 </a><p id="b175-8"> . Schibi v. Schibi, 136 Conn. 190, 69 A.2d 831, 14 A.L.R.2d 620. </p></div><div class="footnote" id="fn2" label="2"><a class="footnote" href="#fn2_ref"> 2 </a><p id="b175-24"> . Code 1961, 16-403. </p></div></div></opinion>
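A minimal sketch of how the mismatch described above could be detected automatically, assuming the CL-style markup shown in this XML (a `div class="footnote"` with id `fnN` whose first `a` child links back to `#fnN_ref`). The function name `find_mismatched_footnotes` and the sample string are hypothetical and are not part of the PR or of harvard_merge.py:

```python
import xml.etree.ElementTree as ET


def find_mismatched_footnotes(xml_text: str) -> list[tuple[str, str]]:
    """Return (footnote id, back-link href) pairs whose back-link does not
    point at the footnote's own in-text reference.

    Assumes the convention seen in the thread: footnote div id "fn1"
    should link back to "#fn1_ref".
    """
    root = ET.fromstring(xml_text)
    problems = []
    for div in root.iter("div"):
        # Skip the wrapper <div class="footnotes"> and any unrelated divs.
        if div.get("class") != "footnote":
            continue
        fn_id = div.get("id")
        back_link = div.find("a")  # first direct <a> child, if any
        href = back_link.get("href") if back_link is not None else None
        if href != f"#{fn_id}_ref":
            problems.append((fn_id, href))
    return problems


# Hypothetical sample: footnote 1 wrongly links back to footnote 2's reference.
SAMPLE = (
    '<opinion type="majority">'
    '<p>Text <a class="footnote" href="#fn1" id="fn1_ref">1</a></p>'
    '<div class="footnotes">'
    '<div class="footnote" id="fn1" label="1">'
    '<a class="footnote" href="#fn2_ref">1</a><p>Note one.</p></div>'
    '<div class="footnote" id="fn2" label="2">'
    '<a class="footnote" href="#fn2_ref">2</a><p>Note two.</p></div>'
    "</div></opinion>"
)

mismatches = find_mismatched_footnotes(SAMPLE)
```

A check like this could run before and after the update so that a cluster whose footnote links change unexpectedly gets logged rather than silently overwritten.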
Taking a look at this. We might be able to just plug in the fix_footnotes logic as a fix. @flooie, you had mentioned making some site-wide modifications to footnotes sometime soon; does that come into play here at all?
@quevon24 I'm looking at this again and I don't think we should expect this to display the same when we re-import it, as the XML is still just CAP XML but this page is displaying a Lawbox import. We did look at examples of how these footnotes are being brought in from CAP, and it seemed like it wasn't an issue, pending @flooie's work on footnote styling.
I've already checked the opinions where xml_harvard is the main source and I didn't find any problems when updating the XML, since I didn't find any footnotes with the format I described above. So they should be displayed correctly after the XML is updated in that case. I don't think there is a problem here, since xml_harvard is not the main source shown in CourtListener for this cluster. What do you think, @flooie?
use VerboseCommand instead of BaseCommand
improve log messages to make them more readable
remove tqdm
…te-cl-xml-from-cap
I did these changes:
Sorry this took some time to get back to you on. I have a number of concerns about this PR and I think it might be helpful to hop onto a call after Thanksgiving. I'm going to highlight one particular problem that I came across but I think there may be a handful of others.
As far as I can tell, there is no attempt to merge the changes we identified with the CAP data. Let me take a random example: a case where something on CAP went attorney-happy and tagged basically the entire document as attorney tags.
At the end of the CAP file you get
<opinion type="majority">
</opinion>
See links at bottom.
Because we have to do this at scale, I can sheepishly state that we didn't get it all correct. We converted all the errant attorney tags to p tags and wrapped all the content after the court into the opinion, and did not create a headmatter. We had no method to correctly parse out where the opinion actually started here.
When I tested your code it took the entire opinion prior to the empty opinion and stashed it as headmatter and generated an empty opinion. It reverted all the fixes we generated and made 26 attorney tags.
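One possible guard against this failure mode, sketched under assumptions: before overwriting CL data, check whether the CAP XML contains an `opinion` element with no text and, if so, skip or flag the case instead of stashing everything as headmatter. The helper name `has_empty_opinion`, the `casebody` wrapper, and the sample strings are hypothetical; real CAP files may carry XML namespaces that this sketch does not handle:

```python
import xml.etree.ElementTree as ET


def has_empty_opinion(cap_xml: str) -> bool:
    """Return True if any <opinion> element contains no text at all.

    Assumes un-namespaced tags, as in the snippet quoted in this thread.
    """
    root = ET.fromstring(cap_xml)
    for opinion in root.iter("opinion"):
        # itertext() yields the element's own text plus all descendant text.
        if not "".join(opinion.itertext()).strip():
            return True
    return False


# Hypothetical samples modeled on the case described above.
BROKEN = (
    "<casebody>"
    "<attorneys>Entire opinion text mis-tagged as attorneys.</attorneys>"
    '<opinion type="majority">\n</opinion>'
    "</casebody>"
)
HEALTHY = (
    "<casebody>"
    "<attorneys>Jane Doe, for appellant.</attorneys>"
    '<opinion type="majority"><p>Reversed.</p></opinion>'
    "</casebody>"
)

skip_broken = has_empty_opinion(BROKEN)
skip_healthy = has_empty_opinion(HEALTHY)
```

Cases that trip the check could be written to a review list so the fixes already applied on the CL side are not reverted.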
https://ia902209.us.archive.org/10/items//law.free.cap.p3d.443/597.12576835.json
https://www.courtlistener.com/opinion/8255415/white-v-premo/#p3
I think this PR is going to have to wait until the front end 🤞 gets approved this week so we can do a thorough inspection of how these eventual changes would affect the front end and the css.
Description
This PR introduces a new management command, update_cap_cases, along with corresponding unit tests. The command is designed to update CourtListener (CL) cases with the latest data from the Caselaw Access Project (CAP).
Key Changes
update_cap_cases.py management command
test_update_cap_cases.py for unit testing
Testing
Unit tests have been added for core functionality in the new command.
Note
The crosswalk files must be generated with the generate_capcrosswalk.py command before this script will work.