-
3.4 SQLDisplayFilter
-
4.1 Fields
4.2 Comparing Values
4.3 Field Types
4.5 Slice Operator
4.7 Functions
Python Display Filter Query Language (PyDFQL) offers an intuitive and powerful query language, similar to Wireshark's display filter, for working with various data structures and formats, including Python dictionaries, lists, objects, and SQL databases.
This user guide is designed to take you on a comprehensive journey of PyDFQL, starting from the installation process, and then diving into its application across diverse data formats. It will provide you with a detailed explanation of the query language syntax, and demonstrate its power and flexibility through a series of practical examples.
By the end of this guide, you will gain an in-depth understanding of PyDFQL. You will be equipped with the skills necessary to effectively filter, analyse, and extract valuable insights from complex and varied data sources, enhancing your data manipulation and analysis workflows.
To install PyDFQL, you can use the pip package manager. Open your terminal or command prompt and run the following command:
pip3 install pydfql
Now that you have installed pydfql
,
let's move on to the next section and explore how to quickly get started with it.
The pydfql
library provides support for filtering data from various sources.
This section will provide an overview of how pydfql
can be used for different data sources.
The ObjectDisplayFilter
enables filtering a list of objects.
Example:
from dataclasses import dataclass
from pydfql import ObjectDisplayFilter
@dataclass
class Actor:
name: list
age: dict
gender: str
actors = [
Actor(["Laurence", "Fishburne"], {"born": "1961"}, "male"),
Actor(["Keanu", "Reeves"], {"born": "1964"}, "male"),
Actor(["Joe", "Pantoliano"], {"born": "1951"}, "male"),
Actor(["Carrie-Anne", "Moss"], {"born": "1967"}, "female")
]
filter_query = "age.born > 1960"
filtered_data = ObjectDisplayFilter(actors).filter(filter_query)
print(list(filtered_data))
The DictDisplayFilter
allows to filter a list of dictionaries.
Example:
from pydfql import DictDisplayFilter
actors = [
{"name": ["Laurence", "Fishburne"], "age": {"born": "1961"}, "gender": "male"},
{"name": ["Keanu", "Reeves"], "age": {"born": "1964"}, "gender": "male", "power": ["flight", "bullet-time"]},
{"name": ["Joe", "Pantoliano"], "age": {"born": "1951"}, "gender": "male"},
{"name": ["Carrie-Anne", "Moss"], "age": {"born": "1967"}, "gender": "female"}
]
filter_query = "age.born > 1960 and age.born < 1965"
filtered_data = DictDisplayFilter(actors).filter(filter_query)
print(list(filtered_data))
The ListDisplayFilter
allows filtering a list of lists.
Example:
from pydfql import ListDisplayFilter
data = [
["Morpheus", "Laurence Fishburne", 38, "male", False],
["Neo", "Keanu Reeves", 35, "male", False],
["Cipher", "Joe Pantoliano", 48, "male", True],
["Trinity", "Carrie-Anne Moss", 32, "female", False]
]
filter_query = "age < 40"
field_names = ["name", "actor", "age", "gender", "killed"]
filtered_data = ListDisplayFilter(data, field_names).filter(filter_query)
print(filtered_data)
The SQLDisplayFilter
allows filtering a SQL database table.
Example:
from pydfql import SQLDisplayFilter
import sqlite3
database_file = './data/sqlite_example.sqlite'
connection = sqlite3.connect(database_file)
table_name = "Actors"
filter_query = "age > 30"
filtered_data = SQLDisplayFilter(connection, table_name).filter(filter_query)
print(filtered_data)
For a more advanced example checkout the SQLite Display Filter example.
The query language provides a wide range of operations, comparisons, and logical operators to help retrieving the desired subset of data. This section introduces the query language and its components, including fields, comparisons, field types, combining expressions, slice operator, membership operator, and functions. Through examples and explanations, you will learn how to construct effective filter queries to extract the desired data.
The simplest display filter is one that displays only items where a specific field is present in the dataset:
name
This works also for nested fields, whereby keys are joined by a dot (.):
age.born
You can build display filters that compare values using a number of different comparison operators. A complete list of available comparison operators is shown in the following table:
English | C-like | Description | Example |
---|---|---|---|
eq | == | Equal | name == Neo |
ne | != | Not equal | name != Neo |
gt | > | Greater than | age > 10 |
lt | < | Less than | age < 128 |
ge | >= | Greater than or equal to | age <= 128 |
le | <= | Less than or equal to | age <= 128 |
contains | ~= | field contains a value | age contains 3 |
matches | ~ | field matches a Perl-compatible regular expression | name matches 3 |
& | Bitwise AND is non-zero | age & 0x20 |
You can express integers in decimal, octal, or hexadecimal. The following display filters are equivalent:
age.born == 32
age.born == 0x02
age.born == 040
Strings are a sequence of characters. Characters can also be specified using a byte escape sequence using hex \xhh.
String comparison can be done using the ==
, !=
, contains
and matches
-operator:
name == Neo and name == \x4e\x65\x6f
name contains e
name matches *.e$
When using strings containing special characters like brackets and spaces they should be enclosed by single- or double-quotes:
name == "(Neo)"
name == 'Trin ity'
Ethernet addresses are 6 bytes separated by a colon (:), dot (.), or dash (-) with one or two bytes between separators:
eth.dst == ff:ff:ff:ff:ff:ff
eth.dst == ff-ff-ff-ff-ff-ff
eth.dst == ffff.ffff.ffff
IPv4 addresses are octets separated by a dot (.). In addition, IPv4 address ranges can be specified using a nmap like notation:
ip.addr in { 192.168.0.1/24 }
ip.addr in { 192.168.0.1-254 }
ip.addr in { 192.168.0.1,2,3 }
For more information regarding the nmap like IPv4 address notation see the IPRanger documentation.
IPv6 addresses are hexadecimal parts separated by colons (:). Both compact- and expressive IPv6 notations are supported:
ipv6 == 2001:0db8:0000:08d3:0000:8a2e:0070:7344 and ipv6 == 2001:db8:0:8d3:0:8a2e:70:7344
ipv6 == 2001:db8:0:0:0:0:1428:57ab and ipv6 == 2001:db8::1428:57ab
The value of a date- or time-field is expressed as a string. You can find some examples below:
published > 2000
published < 2000
published <= 2003/05/11
published <= 2003-05-11
See the python-dateutil documentation for more information.
Expressions can be combined using the following logical operators:
English | C-like | Description | Example |
---|---|---|---|
and | && | Logical AND | age >= 32 and gender == male |
or | || | Logical OR | name == Neo or name == Trinity |
xor | ^^ | Logical XOR | gender == female xor power |
not | ! | Logical NOT | gender == male and not (age > 35) |
The values of fields can be sliced similar to Pythons slice operator. This can be done by placing a pair of brackets [] containing a comma separated list of range specifiers.
The following example uses the n:m
format to specify a single range. In this case n is the beginning offset and m is
the length of the range being specified:
name[0:2] == Ne
The example below uses the n-m
format to specify a single range. In this case n is the beginning offset and m is the
ending offset:
name[1-2] == Ne
The example below uses the :m
format, which takes everything from the beginning of a sequence to offset m.
It is equivalent to 0:m
:
name[:2] == Ne
The example below uses the n:
format, which takes everything from offset n to the end of the sequence:
name[2:] == o
The example below uses the n
format to specify a single range. In this case the element in the sequence at offset
n is selected. This is equivalent to n:1
:
name[0] == N
name[-1] == o
It is also possible to string together single ranges in a comma separated list to form compound ranges as shown below:
name[:1,2-3] == Neo
The slice operator is content aware and currently recognizes Strings
, IPv4-
, IPv6-
, and
Ethernet-Addresses
. The examples below show their usage:
mac[0] == 00
mac[:2] == 00:83
mac[1-2] == 00:83
mac[1-2,1-2] == 00:83:00:83
ipv4[0] == 127
ipv4[0:2] == 127.0
ipv4[:2] == 127.0
ipv4[1-2] == 127.0
ipv4[0,1] == 127.0
ipv4[1-2,1-2] == 127.0.127.0
ipv6[0] == 2001
ipv6[0:2] == 2001:0db8
ipv6[:2] == 2001:0db8
ipv6[1-2] == 2001:0db8
ipv6[0,1] == 2001:0db8
ipv6[1-2,1-2] == 2001:0db8:2001:0db8
Expressions can be used to test a field for membership in a set of values or fields.
After the field name, use the in
-operator followed by the set items surrounded by braces ({}
).
For example, to display packets with a TCP source or destination port of 80, 443, or 8080,
you can use tcp.port in {80, 443, 8080}
. Set elements must be separated by commas.
The set of values can also contain ranges e.g. tcp.port in {443,4430..4434}
.
Sets are not just limited to numbers, other types can be used as well:
http.request.method in {"HEAD", "GET"}
tcp.port in {443, 4430..4434}
ip.addr in { 10.2.2.2/24 }
The display filter language has a number of functions to convert fields:
Function | Description | Example |
---|---|---|
upper | Converts a string field to uppercase. | upper(name) == NEO |
lower | Converts a string field to lowercase. | lower(name) == neo |
len | Returns the length of a string. | len(name) == 3 |
In this section, we will walk through various examples to demonstrate the practical usage of the PyDFQL.
Use PyDFQL to filter and manipulate data in CSV files. You can apply complex filtering rules that go beyond the typical capabilities of most CSV parsers:
python3 examples/csv_display_filter.py data/example.csv
# Enter ?help for a list of commands.
> fields
name
actor
age
gender
killed
> filter name == Neo
name | age | gender | killed
---- | --- | ------ | ------
Neo | 35 | male | False
1 row in set (0.01 secs)
See examples/csv_display_filter.py for implementation details.
JSON data can be deeply nested and complex to work with. PyDFQL allows you to filter and extract relevant information from such structures easily:
python3 examples/json_display_filter.py data/example.json
# Enter ?help for a list of commands.
> fields
gender
name
power
actor
age.born
> filter name == Neo
{"name": "Neo", "actor": ["Keanu", "Reeves"], "age": {"born": "1964"}, "gender": "male", "power": ["flight", "bullet-time"]}
1 row in set (0.01 secs)
See examples/json_display_filter.py for implementation details.
If you are dealing with network analysis and use tools like Nmap, PyDFQL can help filter and analyse the scan results:
python3 examples/nmap_display_filter.py data/nmap_example.xml
# Enter ?help for a list of commands.
> fields
host
port
protocol
status
service
> filter port == 179
host | port | protocol | status | service
------------ | ---- | -------- | ------ | -------
72.14.207.99 | 179 | tcp | closed | bgp
72.14.253.83 | 179 | tcp | closed | bgp
2 rows in set (0.01 secs)
> filter lower(service) ~= apache
host | port | protocol | status | service
-------------- | ---- | -------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------
66.35.250.168 | 80 | tcp | open | product: Apache httpd version: 1.3.39 extrainfo: (Unix) PHP/4.4.7
64.13.134.48 | 80 | tcp | open | product: Apache httpd version: 2.2.2 extrainfo: (Fedora)
204.152.191.37 | 80 | tcp | open | product: Apache httpd version: 2.2.2 extrainfo: (Fedora)
199.185.137.3 | 80 | tcp | open | product: Apache httpd
204.152.190.12 | 80 | tcp | open | product: Apache httpd version: 2.0.61 extrainfo: (Unix) mod_ssl/2.0.61 DAV/2 mod_fastcgi/2.4.2 mod_apreq2-20051231/2.6.0
204.152.190.12 | 443 | tcp | open | product: Apache httpd version: 2.0.61 extrainfo: mod_ssl/2.0.61 DAV/2 mod_fastcgi/2.4.2 mod_apreq2-20051231/2.6.0 hostname: rt.NetBSD.org
6 rows in set (0.01 secs)
See examples/nmap_display_filter.py for implementation details.
For SQLite databases, PyDFQL can be used as an advanced filter mechanism. This is especially useful for data exploration, allowing you to filter records without writing complex SQL queries:
python3 examples/sqlite_display_filter.py data/example.sqlite
# Enter ?help for a list of commands.
> tables
Actors
> use Actors
Database changed
> fields
name
actor
age
gender
killed
> filter name == Neo
name | actor | age | gender | killed
---- | ------------ | --- | ------ | ------
Neo | Keanu Reeves | 35 | male | 0
1 row in set (0.01 secs)
See examples/sqlite_display_filter.py for implementation details.
This project wouldn't be possible without these awesome projects:
- wireshark display filter: Display filter for filtering network packages
- parameterized: Parameterized testing with any Python test framework
- pyparsing: Creating PEG-parsers made easy
- ipranger: Parsing and matching IPv4-addresses
- python-dateutil: Parsing and comparing dates