Introduction to Nettlesome
Nettlesome is a Python library that lets you use readable English phrases as semantic tags for your data. Once you’ve created tags, Nettlesome can automate the process of comparing the tags to determine whether they have the same meaning, whether one tag implies another, whether one tag contradicts another, or whether one tag is consistent with another.
This tutorial will show you how to use some of Nettlesome’s basic
classes: Predicate
,
Statement
, and Comparison
.
“Predicates” as descriptors
A Predicate
should contain an
English-language phrase in the past tense. The use of the past
tense reflects Nettlesome’s original use case of legal
analysis, which is usually backward-looking, determining the legal
effect of past acts or past conditions.
To create a useful Predicate
, you might
remember from grade school that
in a simple sentence the “subject” and any “objects” are typically
nouns, while the “predicate” is a verb phrase that describes an action
or relationship of those nouns. In Nettlesome you can consider marking the
subject and object nouns as “terms” that should be represented by placeholders
in the Predicate
's template string. Generally,
concepts that are part of your data model are good choices to designate
as “terms”, while other nouns that aren’t relevant enough to be part
of your data model don’t need to be terms.
For instance, this example might be a good Predicate
for a data model about determining whether an individual is an owner or
employee of a company:
>>> from nettlesome import Predicate
>>> account_for_company = Predicate(content="$applicant opened a bank account for $company")
Because $applicant
and $company
are marked as placeholders for
Term
s, you’ll be able to add more data about
those entities later. But
notice that the words “bank account” haven’t been replaced by a placeholder.
That means you won’t be able to link this Predicate
to a Term
representing the bank account. That should be
okay as long as your data model
doesn’t need to include specific details about bank accounts.
Predicates also have a truth
attribute that can be used to establish
that one Predicate contradicts
another.
>>> no_account_for_company = Predicate(
>>> "$applicant opened a bank account for $company",
>>> truth=False)
>>> str(no_account_for_company)
'it was false that $applicant opened a bank account for $company'
The truth
attribute will become more significant as we build up to
create more complex objects.
Formatting predicates
Placeholders for terms in a Predicate
should
be marked up using
the placeholder syntax for a string.Template
, which is a part of the
Python standard library.
The placeholders you choose always need to $start_with_a_dollar_sign
and they need to be valid Python identifiers, which means they can’t
contain spaces. If a placeholder needs to be adjacent to a
non-whitespace character, you also need to
${wrap_it_in_curly_braces}
.
Don’t use capitalization or end punctuation to signal the beginning or
end of the phrase in a Predicate
,
because the phrase may be used in a
context where it’s only part of a longer sentence.
The use of different placeholders doesn’t
cause Predicate
s to be considered to have
different meanings. The example below demonstrates this using
the means()
method, which tests
whether two Nettlesome objects have the same meaning.
>>> account_for_partnership = Predicate(content="$applicant opened a bank account for $partnership")
>>> account_for_company.means(account_for_partnership)
True
If you need to mention the same term more than once
in a Predicate
, use
the same placeholder for that term each time. If you later create a
Statement
object using the
same Predicate
, you will only include each
unique Term
once, in the order they first appear.
In this example, a Predicate
's template has
two placeholders referring to the identical Term
.
Even though the rest of the text is the same, the
reuse of the same Term
means that
the Predicate
has a different meaning.
>>> account_for_self = Predicate(content="$applicant opened a bank account for $applicant")
>>> account_for_self.means(account_for_company)
False
Linking predicates to entities
Basically, a Statement
is
a Predicate
plus
the Term
s that need to be included to
make the Statement
a complete phrase.
>>> from nettlesome import Statement, Entity
>>> statement = Statement(
>>> predicate=account_for_company,
>>> terms=[Entity(name="Sarah"), Entity(name="Acme Corporation")])
>>> str(statement)
'the statement that <Sarah> opened a bank account for <Acme Corporation>'
An Entity
is a Term
representing a person or thing. If you’re lucky
enough to be able to run effective Named Entity Recognition techniques
on your dataset, you may already have good candidates for the
Entity
objects that should be included in
your Statement
s. The data
model of an Entity
in Nettlesome includes
just a name
attribute, an attribute indicating whether
the Entity
should be considered
generic
, and a plural
attribute mainly used to determine whether
the word “was” after the Entity
should be
replaced with “were”.
>>> not_at_school = Predicate(content="$group were at school", truth=False)
>>> plural_statement = Statement(not_at_school, terms=[Entity(name="the students", plural=True)])
>>> str(plural_statement)
'the statement it was false that <the students> were at school'
>>> singular_statement = Statement(not_at_school, terms=[Entity(name="Lee", plural=False)])
>>> str(singular_statement)
'the statement it was false that <Lee> was at school'
Generic Terms
The generic
attribute is more subtle than the plural
attribute.
An Entity
should be marked as generic
if
it’s really being used as a
stand-in for a broader category. For instance, in singular_statement
above, the fact that <Lee> is generic indicates that
the Statement
isn’t really about a specific incident when Lee was not at school.
Instead, it’s more about the concept of someone not being at school. In
Nettlesome, when angle brackets appear around the string representation
of an object, that’s an indication that the object is generic.
If two Statement
s have different
generic Entities but they’re otherwise the
same, they’re still considered to have the same meaning as one another.
That’s the case even if one of the Entities is
plural while the other is singular.
>>> plural_statement.means(singular_statement)
True
However, sometimes you need to label an Entity
as being somehow sui
generis, so that Statements about that Entity aren’t really applicable
to other, generic Entities. In that case, you can set the Entity’s
generic
attribute to False and it’ll no longer be found to have the
same meaning as generic Entities.
>>> harry_statement = Statement(not_at_school, terms=Entity(name="Harry Potter", generic=False))
>>> harry_statement.means(singular_statement)
False
By default, Entities are generic and Statements are not generic. Both of these defaults can be changed when you create instances of the respective classes.
Comparing quantitative statements
The Comparison
class extends the concept
of a Predicate
. A Comparison
still contains a truth value and a template string, but that template
should be used to identify a quantity that will be compared to an
expression using a sign
such as an equal sign or a greater-than sign.
This expression must be a constant: either an integer, a floating point
number, or a physical Quantity
expressed in units that can be parsed
using the pint library.
>>> from nettlesome import Comparison
>>> weight_in_pounds = Comparison(
>>> "the weight of ${driver}'s vehicle was",
>>> sign=">",
>>> expression="26000 pounds")
>>> pounds_statement = Statement(weight_in_pounds, terms=Entity(name="Alice"))
>>> str(pounds_statement)
"the statement that the weight of <Alice>'s vehicle was greater than 26000 pound"
Statement
s including Comparison
s
will handle unit conversions when
applying operations like implies()
or contradicts()
.
>>> weight_in_kilos = Comparison(
>>> "the weight of ${driver}'s vehicle was",
>>> sign="<=",
>>> expression="3000 kilograms")
>>> kilos_statement = Statement(weight_in_kilos, terms=Entity(name="Alice"))
>>>> str(kilos_statement)
"the statement that the weight of <Alice>'s vehicle was no more than 3000 kilogram"
>>> pounds_statement.contradicts(kilos_statement)
True
Formatting comparisons
To encourage consistent phrasing, the template string in every
Comparison
object must end with the word “was”.
If you phrase a Comparison
with an inequality sign using
truth=False
, Nettlesome will silently modify your statement so it
can have truth=True
with a different sign. In this example, the
user’s input indicates that it’s false that the weight of marijuana
possessed by a defendant was an ounce or more. Nettlesome interprets
this to mean it’s true that the weight was less than one ounce.
>>> drug_comparison_with_upper_bound = Comparison(
>>> "the weight of marijuana that $defendant possessed was",
>>> sign=">=",
>>> expression="1 ounce",
>>> truth=False)
>>> str(drug_comparison_with_upper_bound)
'that the weight of marijuana that $defendant possessed was less than 1 ounce'
An expression can also be a Python datetime.date
.
>>> license_date = Comparison(
>>> "the date $dentist became a licensed dentist was",
>>> sign="<",
>>> expression=date(1990, 1, 1))
>>> str(license_date)
'that the date $dentist became a licensed dentist was less than 1990-01-01'
When the number needed for a Comparison
is neither
a date
nor a physical quantity that
can be described with physical units like “pounds” or “meters”, you should
phrase the text in the template string to explain what the number
describes. The template string will still need to end with the word
“was”. The value of the expression
parameter should be an integer or a
floating point number, not a string to be parsed.
>>> three_children = Comparison(
>>> "the number of children in ${taxpayer}'s household was",
>>> sign="=",
>>> expression=3)
>>> str(three_children)
"that the number of children in ${taxpayer}'s household was exactly equal to 3"
Comparing groups of statements
If you pass a list of Statement
s to
the FactorGroup
constructor, you can then check to see whether
those Statements, taken as a group, implies another Statement or group of Statements.
Here, the use of placeholders that are identical except for a digit on
the end indicates to Nettlesome that the positions of the Entities in those places should
be considered interchangeable. (In this example, if site1
is a
certain distance away from site2
, then site2
must also be the
same distance away from site1
.)
>>> from nettlesome import FactorGroup
>>> more_than_100_yards = Comparison(
>>> "the distance between $site1 and $site2 was",
>>> sign=">",
>>> expression="100 yards")
>>> less_than_1_mile = Comparison(
>>> "the distance between $site1 and $site2 was",
>>> sign="<",
>>> expression="1 mile")
>>> protest_facts = FactorGroup(
>>> [Statement(
>>> more_than_100_yards,
>>> terms=[Entity(name="the political convention"), Entity(name="the police cordon")]),
>>> Statement(
>>> less_than_1_mile,
>>> terms=[Entity(name="the police cordon"), Entity(name="the political convention")])])
>>> str(protest_facts)
"FactorGroup(['the statement that the distance between <the political convention> and <the police cordon> was greater than 100 yard', 'the statement that the distance between <the police cordon> and <the political convention> was less than 1 mile'])"
>>> more_than_50_meters = Comparison(
>>> "the distance between $site1 and $site2 was",
>>> sign=">",
>>> expression="50 meters")
>>> less_than_2_km = Comparison(
>>> "the distance between $site1 and $site2 was",
>>> sign="<=",
>>> expression="2 km")
>>> speech_zone_facts = FactorGroup(
>>> [Statement(
>>> more_than_50_meters,
>>> terms=[Entity(name="the free speech zone"), Entity(name="the courthouse")]),
>>> Statement(
>>> less_than_2_km,
>>> terms=[Entity(name="the free speech zone"), Entity(name="the courthouse")])])
>>> protest_facts.implies(speech_zone_facts)
True