# pytregex

**Repository Path**: tanloong/pytregex

## Basic Information

- **Project Name**: pytregex
- **Description**: Tregex-like constituency tree matcher
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-07-04
- **Last Updated**: 2025-07-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

[Tregex](https://nlp.stanford.edu/software/tregex.html) is a Java program for identifying patterns in constituency trees. PyTregex provides similar functionality in Python.

## Usage

### Command-line

Install it with `pip install pytregex` and run it with `python -m pytregex`.

```sh
$ pip install pytregex

$ echo '(NP(DT The)(NN battery)(NN plant))' | python -m pytregex pattern 'NP < NN' -filter
# (NP (DT The) (NN battery) (NN plant))
# (NP (DT The) (NN battery) (NN plant))
# There were 2 matches in total.

$ echo '(NP(DT The)(NN battery)(NN plant))' > trees.txt
$ python -m pytregex pattern 'NP < NN' ./trees.txt
# (NP (DT The) (NN battery) (NN plant))
# (NP (DT The) (NN battery) (NN plant))
# There were 2 matches in total.

$ python -m pytregex pattern 'NP < NN' -C ./trees.txt
# 2

$ python -m pytregex pattern 'NP < NN=a' -h a ./trees.txt
# (NN battery)
# (NN plant)
# There were 2 matches in total.

$ python -m pytregex explain '<'
# 'A < B' means A immediately dominates B

$ python -m pytregex pprint '(NP(DT The)(NN battery)(NN plant))'
# NP
# ├── DT
# │   └── The
# ├── NN
# │   └── battery
# └── NN
#     └── plant
```

### Inline

```python
from pytregex.tregex import TregexPattern

p = TregexPattern("NP < NN=a")
matches = p.findall("(NP(DT The)(NN battery)(NN plant))")
handles = p.get_nodes("a")

print("matches nodes:\n{}\n".format("\n".join(str(m) for m in matches)))
print("named nodes:\n{}".format("\n".join(str(h) for h in handles)))

# Output:
# matches nodes:
# (NP (DT The) (NN battery) (NN plant))
# (NP (DT The) (NN battery) (NN plant))
#
# named nodes:
# (NN battery)
# (NN plant)
```

See [tests](tests/test_tregex.py) for more examples.

## Differences from Tregex

Tregex is whitespace-sensitive: it distinguishes between `|` and `␣|␣` (where `␣` denotes a space). PyTregex ignores whitespace and uses different symbols in place of `␣|␣`.

| | Tregex | PyTregex |
|---|---|---|
| node disjunction | `A\|B` | `A\|B`<br>`A␣\|␣B` |
| condition disjunction | `A<B␣\|␣<C` | `A<B␣\|\|␣<C`<br>`A<B\|\|<C` |
| expression disjunction | `A␣\|␣B` | N/A |
| expression separation | N/A | `A;B`<br>`A␣;␣B` |
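
To make the two PyTregex disjunction forms in the table concrete, here is a minimal sketch of what they mean, written in plain Python. It does not use PyTregex and is not how PyTregex is implemented: the `Node` class and the `parse`, `walk`, and `has_child` helpers are hypothetical names introduced only for illustration, and the sketch assumes well-formed bracketed trees.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical toy tree node; not a PyTregex class.
    label: str
    children: list[Node] = field(default_factory=list)


def parse(text: str) -> Node:
    """Parse one bracketed tree such as '(NP (DT The) (NN plant))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            # A bare token is a leaf (a word).
            node = Node(tokens[pos])
            pos += 1
            return node
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            node.children.append(read())
        pos += 1  # skip ")"
        return node

    return read()


def walk(node: Node):
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def has_child(node: Node, label: str) -> bool:
    """True if `node` immediately dominates a node labelled `label` (the `<` relation)."""
    return any(child.label == label for child in node.children)


tree = parse("(S (NP (DT The) (NN plant)) (VP (VBD closed)))")

# Condition disjunction, written `NP < DT || < JJ` in PyTregex notation:
# an NP that immediately dominates a DT or immediately dominates a JJ.
np_with_dt_or_jj = [
    n.label
    for n in walk(tree)
    if n.label == "NP" and (has_child(n, "DT") or has_child(n, "JJ"))
]

# Node disjunction, written `NN|VBD`: a node labelled NN or VBD.
nn_or_vbd = [n.label for n in walk(tree) if n.label in {"NN", "VBD"}]

print(np_with_dt_or_jj)  # ['NP']
print(nn_or_vbd)         # ['NN', 'VBD']
```

In node disjunction the alternatives are labels for a single node, whereas in condition disjunction the alternatives are whole relations attached to the same head node. That is why the second form needs its own symbol once whitespace is no longer significant.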