# API References¶

## Class Summary¶

• ParseException: Exception thrown when we can’t parse the whole string.
• GrammarException: Exception thrown when we can’t construct the grammar.
• MetaGrammar: A meta grammar used to extract symbol names (expressed as variables) during grammar construction time.
• Grammar: Grammar user interface.
• GrammarElement: Basic grammar symbols (terminal or non-terminal).
• StringCs: Case-sensitive string (usually a terminal) symbol that can be a word or phrase.
• String: Case-insensitive version of StringCs.
• SetCs: Case-sensitive strings in which matching any will lead to parsing success.
• Set: Case-insensitive version of SetCs.
• RegexCs: Case-sensitive string matching with regular expressions.
• Regex: Case-insensitive version of RegexCs.
• GrammarExpression: An expression usually involving a binary combination of two GrammarElement objects.
• And: A “+” expression that requires matching a sequence.
• Or: A “|” expression that requires matching any one.
• GrammarElementEnhance: Enhanced grammar symbols for Optional/OneOrMore etc.
• Optional: Optional matching (0 or 1 time).
• OneOrMore: OneOrMore matching (1 or more times).
• ZeroOrMore: ZeroOrMore matching (0 or more times).
• Null: Null state, used internally.
• GrammarImpl: Actual grammar implementation that is returned by a Grammar construction.
• Production: Abstract class for a grammar production in the form LHS -> RHS.
• ExpressionProduction: Wrapper of GrammarExpression.
• ElementProduction: Wrapper of GrammarElement.
• ElementEnhanceProduction: Wrapper of GrammarElementEnhance.
• TreeNode: A tree structure to represent parser output.
• Edge: An edge in the chart.
• Agenda: An agenda for ordering edges that will enter the chart.
• ParseResult: Parse result converted from TreeNode output, providing easy access by list or attribute style.
• Chart: A 2D chart (list) to store graph edges.
• IncrementalChart: A 2D chart (list of list) that expands its size as more edges are added.
• ChartRule: Rules applied in parsing, such as scan/predict/fundamental.
• TopDownInitRule: Initialize the chart when we get started by inserting the goal.
• BottomUpScanRule: Rules used in bottom-up scanning.
• TopDownPredictRule: Predict an edge if it’s not complete and add it to the chart.
• LeftCornerPredictScanRule: Left-corner rules: only add productions whose left-corner non-terminal can parse the lexicon.
• BottomUpPredictRule: In bottom-up parsing, predict an edge if it’s not complete and add it to the chart.
• TopDownScanRule: Scan lexicon from the top down.
• CompleteRule: Complete an incomplete edge from the agenda by merging with a matching completed edge from the chart, or vice versa.
• ParsingStrategy: Parsing strategy used in TopDown, BottomUp, LeftCorner parsing.
• TopDownStrategy: Top-down parsing strategy.
• BottomUpStrategy: Bottom-up parsing strategy.
• LeftCornerStrategy: Left-corner parsing strategy.
• RobustParser: A robust, incremental chart parser.

## Class API Details¶

parsetron.py, a semantic parser written in pure Python.

class parsetron.Agenda(*args, **kwargs)[source]

Bases: object

An agenda for ordering edges that will enter the chart. Current implementation is a wrapper around collections.deque.

collections.deque supports both FILO (collections.deque.pop()) and FIFO (collections.deque.popleft()). FILO functions like a stack: edges are popped immediately after they are pushed in. This has the merit of finishing the parse sooner, especially when newly completed edges can be popped right away for prediction.
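The FILO/FIFO distinction can be illustrated with a plain collections.deque (the string edge names stand in for Edge objects):

```python
from collections import deque

# Agenda wraps a deque: edges pushed with append() can leave
# FILO-style via pop() (stack) or FIFO-style via popleft() (queue).
agenda = deque()
agenda.append("edge1")
agenda.append("edge2")

assert agenda.pop() == "edge2"      # FILO: last pushed, first popped
assert agenda.popleft() == "edge1"  # FIFO: first pushed, first popped
```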

__weakref__

list of weak references to the object (if defined)

append(edge)[source]

Add a single edge to the agenda. The edge can be either complete or incomplete.

Parameters: edge (Edge) –
extend(edges)[source]

Add a sequence of edges to agenda.

Parameters: edges (list(Edge)) – a list of edges
pop()[source]

Pop an edge from agenda (stack).

Returns: an edge Edge
class parsetron.And(exprs)[source]

An “+” expression that requires matching a sequence.

class parsetron.BottomUpPredictRule[source]

In bottom-up parsing, predict an edge if it’s not complete and add it to the chart.

class parsetron.BottomUpScanRule[source]

Rules used in bottom up scanning.

class parsetron.Chart(size)[source]

Bases: object

A 2D chart (list) to store graph edges. Edges can be accessed via: Chart.edges[start][end] and return value is a set of edges.

Parameters: size (int) – chart size, normally len(tokens) + 1.
__weakref__

list of weak references to the object (if defined)

_most_compact_trees(parent_edge, tokens=None)[source]

Try to eliminate spurious ambiguities by producing the most compact/flat tree. This mainly involves removing Optional/ZeroOrMore nodes.

add_edge(edge, prev_edge, child_edge, lexicon=u'')[source]

Add edge to the chart, with backpointers to the previous edge and the child edge.

Parameters: edge (Edge) – newly formed edge prev_edge (Edge) – the left (previous) edge where edge is coming from child_edge (Edge) – the right (child) edge whose completion moved the dot of prev_edge Returns: whether this edge is newly inserted (does not already exist)
best_tree_with_parse_result(trees)[source]

Return a tuple of the smallest tree among trees and its parse result.

Parameters: trees (list) – a list of TreeNode Returns: a tuple of (best tree, its parse result) tuple(TreeNode, ParseResult)
filter_completed_edges(start, lhs)[source]

Find all edges whose start position matches start and whose LHS matches lhs. For instance, both edges:

[1, 1] NP ->  * NNS CC NNS
[1, 3] NP ->  * NNS


match start=1 and lhs=NP.

Returns: a list of edges list(Edge)
filter_edges_for_completion(end, rhs_after_dot)[source]

Find all edges with matching end position and RHS nonterminal directly after the dot as rhs_after_dot. For instance, both edges:

[1, 1] NNS ->  * NNS CC NNS
[1, 1] NP ->  * NNS


match end=1 and rhs_after_dot=NNS

filter_edges_for_prediction(end)[source]

Return a list of edges ending at end.

Parameters: end (int) – end position Returns: list(Edge)
get_edge_lexical_span(edge)[source]

syntactic sugar for calling self.get_lexical_span(edge.start, edge.end)

Parameters: edge (Edge) – an edge Returns: (int, int)
get_lexical_span(start, end=None)[source]

Get the lexical span chart covers from start to end. For instance, for the following sentence:

please [turn off] the [lights]


with parsed items in [], then the lexical span will look like:

when end = None, function assumes end=start+1
if start = 0 and end = None, then return (1, 3)
if start = 1 and end = None, then return (4, 5)
if start = 0 and end = 1, then return (1, 5)

Parameters: start (int) – start position end (int) – end position Returns: (int, int)
print_backpointers()[source]

Return a string representing the current state of all backpointers.

set_lexical_span(start, end, i=None)[source]

Set the lexical span at i in the chart to (start, end). If i is None, it defaults to the last slot in the chart (self.chart_i-1). Lexical span here means the span that chart position i points to in the original sentence. For instance, for the following sentence:

please [turn off] the [lights]


with parsed items in [], then the lexical span will look like:

start = 1, end = 3, i = 0
start = 4, end = 5, i = 1

trees(tokens=None, all_trees=False, goal=None)[source]

Yield all possible trees this chart covers. If all_trees is False, then only the most compact trees for each goal are yielded. Otherwise yield all trees (warning: can be thousands).

Parameters: tokens (list) – a list of lexicon tokens all_trees (bool) – if False, then only print the smallest tree goal (GrammarElement, None) – the root of this tree (usually Grammar.GOAL) Returns: a tuple of (tree index, TreeNode) tuple(int, TreeNode)
class parsetron.ChartRule[source]

Bases: object

Rules applied in parsing, such as scan/predict/fundamental. New rules need to implement the apply() method.

__weakref__

list of weak references to the object (if defined)

class parsetron.CompleteRule[source]

Complete an incomplete edge from the agenda by merging with a matching completed edge from the chart, or complete an incomplete edge from the chart by merging with a matching completed edge from the agenda.

class parsetron.Edge(start, end, production, dot)[source]

Bases: object

An edge in the chart with the following fields:

Parameters: start (int) – the starting position end (int) – the end position, so span = end - start production (Production) – the grammar production dot (int) – the dot position on the RHS. Anything before the dot has been consumed; anything after is waiting to be completed
get_rhs_after_dot()[source]

Returns the RHS symbol after dot. For instance, for edge:

[1, 1] NP ->  * NNS


it returns NNS.

If no symbol is after dot, then return None.

Returns: RHS after dot GrammarElement
is_complete()[source]

Whether this edge is completed.

Return type: bool
merge_and_forward_dot(edge)[source]

Move the dot of self forward by one position and change the end position of self edge to end position of edge. Then return a new merged Edge. For instance:

self: [1, 2] NNS ->  * NNS CC NNS
edge: [2, 3] NNS -> men *


Returns a new edge:

[1, 3] NNS -> NNS * CC NNS


Requires that edge.start == self.end

Returns: a new edge Edge
scan_after_dot(phrase)[source]

Scan phrase with RHS after the dot. Returns a tuple of (lexical_progress, rhs_progress) in booleans.

Returns: a tuple tuple(bool, bool)
span()[source]

The span this edge covers, alias of end - start. For instance, for edge:

[1, 3] NP ->  * NNS


it returns 2

Returns: an int int
class parsetron.ElementEnhanceProduction(element, rhs=None)[source]

Wrapper of GrammarElementEnhance. An ElementEnhanceProduction has the following assertion true:

LHS == RHS[0]
class parsetron.ElementProduction(element)[source]

Wrapper of GrammarElement. An ElementProduction has the following assertion true:

LHS == RHS[0]
class parsetron.ExpressionProduction(lhs, rhs)[source]

Wrapper of GrammarExpression.

class parsetron.Grammar[source]

Bases: object

Grammar user interface. Users should inherit this grammar and define a final grammar GOAL as class variable.

It’s a wrapper around GrammarImpl but does not expose any internal functions. So users can freely define their grammar without worrying about name pollution. However, when a Grammar is constructed, a GrammarImpl is actually returned:

>>> g = Grammar()


now g is the real grammar (GrammarImpl)

Warning

Grammar elements have to be defined as class variables instead of instance variables for the Grammar object to extract variable names in string

Warning

Users have to define a GOAL variable in Grammar (similar to start variable S conventionally used in grammar definition)

__metaclass__

alias of MetaGrammar

__weakref__

list of weak references to the object (if defined)

static test()[source]

A method to be batch called by pytest (through test_grammars.py). Users should give examples of what this Grammar parses and use these examples for testing.
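A minimal sketch of defining and using a Grammar, based on the conventions documented above (class variables, a required GOAL, and a static test()). The class and element names here are illustrative, not part of parsetron itself:

```python
from parsetron import Grammar, Optional, Regex, String, RobustParser

class LightGrammar(Grammar):
    # Grammar elements must be CLASS variables so MetaGrammar
    # can extract their variable names during construction.
    action = String("blink")
    light = Regex(r"(lights|light|lamp)")
    # A GOAL variable is required (like the start symbol S).
    GOAL = action + Optional(light)

    @staticmethod
    def test():
        # Users should give examples of what this Grammar parses.
        parser = RobustParser(LightGrammar())
        tree, result = parser.parse("blink lights")
        assert result.action == "blink"

# Constructing the Grammar actually returns a GrammarImpl:
g = LightGrammar()
```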

class parsetron.GrammarElement[source]

Bases: object

Basic grammar symbols (terminal or non-terminal).

Developers inheriting this class should implement the following functions:

A grammar element carries the following attributes:

• is_terminal: whether this element is terminal or non-terminal.

• name: the name of this element, usually set by the set_name() function or implicitly by the __call__() function.

• variable_name: automatically extracted variable name in string through the Grammar construction.

• canonical_name: if neither name nor variable_name is set, then a canonical name is assigned trying to be as expressive as possible.

• as_list: whether saves result in a hierarchy as a list, or just flat

• ignore: whether to be ignored in ParseResult

__add__(other)[source]

Implement the + operator. Returns And.

__call__(name)[source]

Shortcut for set_name()

__mul__(other)[source]

Implements the * operator, followed by an integer or a tuple/list:

• e * m: m repetitions of e (m > 0)
• e * (m, n) or e * [m, n]: m to n repetitions of e (all inclusive)
• m or n in (m,n)/[m,n] can be None

for instance (=> stands for “is equivalent to”):

• e * (m, None) or e * (m,) => m or more instances of e => e * m + ZeroOrMore (e)
• e * (None, n) or e * (0, n) => 0 to n instances of e
• e * (None, None) => ZeroOrMore (e)
• e * (1, None) => OneOrMore (e)
• e * (None, 1) => Optional (e)
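The repetition shortcuts above can be written out as follows (a sketch; e is any GrammarElement, here a hypothetical String):

```python
from parsetron import String, Optional, OneOrMore, ZeroOrMore

e = String("very")

three = e * 3                  # exactly 3 repetitions of e
two_to_four = e * (2, 4)       # 2 to 4 repetitions, inclusive
at_least_two = e * (2, None)   # 2 or more: e * 2 + ZeroOrMore(e)
up_to_three = e * (None, 3)    # 0 to 3 repetitions
any_number = e * (None, None)  # equivalent to ZeroOrMore(e)
maybe_one = e * (None, 1)      # equivalent to Optional(e)
```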
__or__(other)[source]

Implement the | operator. Returns Or.

__radd__(other)[source]

Implement the + operator. Returns And.

__ror__(other)[source]

Implement the | operator. Returns Or.

__weakref__

list of weak references to the object (if defined)

_parse(instring)[source]

Main parsing method to be implemented by developers. Raises ParseException when there is no parse.

Parameters: instring (str) – input string to be parsed Returns: True if full parse else False (bool) Raises: ParseException
default_name()[source]

default canonical name.

Returns: a string str
ignore()[source]

Call this function to make this grammar element not appear in the parse result. Returns: self

parse(instring)[source]

Main parsing method to be called by users. Raises ParseException when there is no parse. Returns True if the whole string is parsed and False if input string is not parsed but no exception is thrown either (e.g., parsing with Null element)

Parameters: instring (str) – input string Returns: True if the whole string is parsed else False
production()[source]

Converts this GrammarElement (used by the user) to a Production (used by the parser).

replace_result_with(value)[source]

replace the result lexicon with value. This is a shortcut to:

self.set_result_action(lambda r: r.set(value))

Parameters: value – any object Returns: self
run_post_funcs(result)[source]

Run functions set by set_result_action() after getting parsing result.

Parameters: result (ParseResult) – parsing result Returns: None
set_name(name)[source]

Set the name of a grammar symbol. Usually the name of a GrammarElement is set by its variable name, for instance:

>>> light = String("light")


but in on-the-fly construction, one can call set_name():

>>> Optional(light).set_name("optional_light")


or shorten it to a function call like name setting:

>>> Optional(light)("optional_light")


The function returns a new shallow copied GrammarElement object. This allows reuse of common grammar elements in complex grammars without name collision.

Parameters: name (str) – name of this grammar symbol Returns: a self copy (with different id and hash) GrammarElement
set_result_action(*functions)[source]

Set functions to call after parsing. For instance:

>>> number = Regex(r"\d+").set_result_action(lambda x: int(x))


It can be a list of functions too:

>>> def f1(): pass  # do something
>>> def f2(): pass  # do something
>>> number = Regex(r"\d+").set_result_action(f1, f2)

Parameters: functions – a list of functions Returns: self
yield_productions()[source]

Yield how this element/expression produces grammar productions

class parsetron.GrammarElementEnhance(expr)[source]

Enhanced grammar symbols for Optional/OneOrMore etc.

yield_productions()[source]

Yield how this expression produces grammar productions. A GrammarElementEnhance class should implement its own.

exception parsetron.GrammarException[source]

Exception thrown when we can’t construct the grammar.

__weakref__

list of weak references to the object (if defined)

class parsetron.GrammarExpression(exprs)[source]

An expression usually involving a binary combination of two GrammarElement objects. The resulting GrammarExpression is a non-terminal and does not implement the parsing function _parse().

yield_productions()[source]

Yield how this expression produces grammar productions. A GrammarExpression class should implement its own.

class parsetron.GrammarImpl(name, dct)[source]

Bases: object

Actual grammar implementation that is returned by a Grammar construction.

__init__(name, dct)[source]

This __init__() function should only be called from MetaGrammar but never explicitly.

Parameters: name (str) – name of this grammar class dct (dict) – __class__.__dict__ field
__weakref__

list of weak references to the object (if defined)

_build_grammar_recursively(element, productions)[source]

Build a grammar from element. This mainly includes recursively extracting And/Or GrammarExpressions from element.

Parameters: element (GrammarExpression) – a GrammarExpression productions (set) – assign to set() when calling the first time Returns: a set of Production set(Production)
_eliminate_null_and_expand()[source]

Eliminate the Null elements in grammar by introducing more productions without Null elements. For each production with Null, add a new production without. For instance:

S => Optional(A) B Optional(C)
Optional(A) => NULL    --> remove
Optional(A) => A
Optional(C) => NULL
Optional(C) => C       --> remove


becomes:

S => Optional(A) B Optional(C)
Optional(A) => A
Optional(C) => C
S => B Optional(C)     --> added
S => Optional(A) B     --> added


The rationale behind this is that NULL elements call for a lot of extra computation and are highly ambiguous. This function increases the size of the grammar but gains extra parsing speed. A real-world comparison on one parsing task:

• without eliminating: 1.6s, _fundamental_rule() was called 38K times, taking 50% of all computing time. 2c52b18d5fcfb901b55ff0506d75c3f41073871c
• with eliminating: 0.6s, _fundamental_rule() was called 23K times, taking 36% of all computing time. 33a1f3f541657ddf0204d02338d94a7e89473d86
_extract_var_names(dct)[source]

Given a dictionary, extract all variable names. For instance, given:

light_general_name = Regex(r"(lights|light|lamp)")


extract the mapping from id(light_general_name) to “light_general_name”

Parameters: dct (dict) – a grammar dictionary Returns: a dictionary mapping from id(variable) to variable name
_set_element_canonical_name(element)[source]

Set the canonical name field of element, if not set yet

Parameters: element (GrammarElement) – a grammar element Returns: None
_set_element_variable_name(element)[source]

Set the variable_name field of element

Parameters: element (GrammarElement) – a grammar element Returns: True if found else False
build_leftcorner_table()[source]

For each grammar production, build two mappings from the production to:

1. its left corner RHS element (which is a pre-terminal);
2. the terminal element that does the actual parsing job.
filter_productions_for_prediction_by_lhs(lhs)[source]

Yield all productions whose LHS is lhs.

Parameters: lhs (GrammarElement) – a grammar element Returns: a production generator generator(Production)
filter_productions_for_prediction_by_rhs(rhs_starts_with)[source]

Yield all productions whose RHS[0] is rhs_starts_with.

Parameters: rhs_starts_with (GrammarElement) – a grammar element Returns: a production generator generator(Production)
filter_terminals_for_scan(lexicon)[source]

Yield all terminal productions that parse lexicon.

Parameters: lexicon (str) – a string to be parsed Returns: a production generator generator(Production)
get_left_corner_nonterminals(prod)[source]

Given a grammar production, return a set of its left-corner non-terminal productions, or a set containing prod itself if none is found, e.g.:

S => A B
A => C D
A => e f
B => b
C => c


passing S as prod will return the set of productions for A and C.

Parameters: prod (Production) – a grammar production Returns: set(Production)
get_left_corner_terminals(prod)[source]

Given a grammar production, return a set of its left-corner terminal productions, or an empty set if none is found, e.g.:

S => A B
A => C D
A => e f
B => b
C => c


passing S as prod will return the set of productions for e and c.

Parameters: prod (Production) – a grammar production Returns: set(Production)
class parsetron.IncrementalChart(init_size=10, inc_size=10)[source]

A 2D chart (list of list) that expands its size as more edges are added.

Parameters: size (int) – current size of chart max_size (int) – total capacity of chart; if exceeded, the chart grows by inc_size inc_size (int) – size to increase by when max_size is filled
__init__(init_size=10, inc_size=10)[source]
Parameters: init_size – the initial size inc_size – extra size to grow by when the chart is filled up
increase_capacity()[source]

Increase the capacity of the current chart by self.inc_size

class parsetron.LeftCornerPredictScanRule[source]

Left corner rules: only add productions whose left corner non-terminal can parse the lexicon.

class parsetron.MetaGrammar[source]

Bases: type

A meta grammar used to extract symbol names (expressed as variables) during grammar construction time. This provides a cleaner way than using obj.__class__.__dict__, whose __dict__ has to be accessed via an extra and explicit function call.

class parsetron.Null[source]

Null state, used internally

_parse(instring)[source]

Always returns False, no exceptions.

Parameters: instring (str) – input string to be parsed Returns: False
class parsetron.OneOrMore(expr)[source]

OneOrMore matching (1 or more times).

yield_productions()[source]

Yield how this expression produces grammar productions. If A = OneOrMore(B), then this yields:

A => B
A => B A

class parsetron.Optional(expr)[source]

Optional matching (0 or 1 time).

yield_productions()[source]

Yield how this expression produces grammar productions. If A = Optional(B), then this yields:

A => NULL
A => B

class parsetron.Or(exprs)[source]

An “|” expression that requires matching any one.

exception parsetron.ParseException[source]

Exception thrown when we can’t parse the whole string.

__weakref__

list of weak references to the object (if defined)

class parsetron.ParseResult(name, lexicon, as_flat=True, lex_span=(None, None))[source]

Bases: object

Parse result converted from TreeNode output, providing easy access by list or attribute style, for instance:

result['color']
result.color


Results are flattened as much as possible, meaning: deep children are elevated to the top as much as possible as long as there are no name conflicts. For instance, given the following parse tree:

(GOAL
(And(action_verb, OneOrMore(one_parse))
(action_verb "flash")
(OneOrMore(one_parse)
(one_parse
(light_name
(Optional(light_quantifiers)
(light_quantifiers "both")
)
(ZeroOrMore(light_specific_name)
(light_specific_name "top")
(light_specific_name "bottom")
)
(Optional(light_general_name)
(light_general_name "light")
)
)
(ZeroOrMore(color)
(color "red")
)
)
(one_parse
(light_name
(ZeroOrMore(light_specific_name)
(light_specific_name "middle")
)
(Optional(light_general_name)
(light_general_name "light")
)
)
(ZeroOrMore(color)
(color "green")
)
)
(one_parse
(light_name
(ZeroOrMore(light_specific_name)
(light_specific_name "bottom")
)
)
(ZeroOrMore(color)
(color "purple")
)
)
)
)
)


The parse result looks like:

{
"action_verb": "flash",
"one_parse": [
{
"one_parse": "both top bottom light red",
"light_name": "both top bottom light",
"light_quantifiers": "both",
"ZeroOrMore(color)": "red",
"color": "red",
"ZeroOrMore(light_specific_name)": "top bottom",
"Optional(light_general_name)": "light",
"light_general_name": "light",
"Optional(light_quantifiers)": "both",
"light_specific_name": [
"top",
"bottom"
]
},
{
"one_parse": "middle light green",
"light_name": "middle light",
"ZeroOrMore(color)": "green",
"color": "green",
"ZeroOrMore(light_specific_name)": "middle",
"Optional(light_general_name)": "light",
"light_general_name": "light",
"light_specific_name": "middle"
},
{
"one_parse": "bottom purple",
"light_name": "bottom",
"ZeroOrMore(color)": "purple",
"color": "purple",
"ZeroOrMore(light_specific_name)": "bottom",
"light_specific_name": "bottom"
}
],
"And(action_verb, OneOrMore(one_parse))": "flash both top bottom
light red middle light green bottom purple",
"GOAL": "flash both top bottom light red middle light green bottom
purple",
"OneOrMore(one_parse)": "both top bottom light red middle light green
bottom purple"
}


The following holds true given the above result:

assert result.action_verb == "flash"
assert result['action_verb'] == "flash"
assert type(result.one_parse) is list
assert result.one_parse[0].color == 'red'
assert result.one_parse[0].light_specific_name == ['top', 'bottom']
assert result.one_parse[1].light_specific_name == 'middle'


Note how the parse result is flattened w.r.t. the tree. Basic principles of flattening are:

• value of result access is either a string or another ParseResult object
• If a node has >= 1 children with the same name, make the name hold a list
• Else make the name hold a string value.
__weakref__

list of weak references to the object (if defined)

add_item(k, v)[source]

Add a k => v pair to result

add_result(result, as_flat)[source]

Add another result to the current result.

Parameters: result (ParseResult) – another result as_flat (bool) – whether to flatten result.
get(item=None, default=None)[source]

Get the value of item, if not found, return default. If item is not set, then get the main value of ParseResult. The usual value is a lexicon string. But it can be different if the ParseResult.set() function is called.

items()[source]

Return the dictionary of items in result

keys()[source]

Return the set of names in result

lex_span(name=None)[source]

Return the lexical span of this result (if name=None) or the result name. For instance:

_, result = parser.parse('blink lights')
assert result.lex_span() == (1,3)
assert result.lex_span("action") == (1,2)

Parameters: name (str) – when None, return self span; otherwise, return child result’s span Returns: (int, int)
name()[source]

Return the result name

Returns: a string str
names()[source]

Return the set of names in result

set(value)[source]

Set the value of ParseResult. value is not necessarily a string though: post functions from GrammarElement.set_result_action() can pass a different value to value.

values()[source]

Return the set of values in result

class parsetron.ParsingStrategy(rule_list)[source]

Bases: object

Parsing strategy used in TopDown, BottomUp, LeftCorner parsing. Each strategy consists of a list of various ChartRule objects.

__weakref__

list of weak references to the object (if defined)

class parsetron.Production(lhs, rhs)[source]

Bases: object

Abstract class for a grammar production in the form:

LHS -> RHS (RHS is a list)

A grammar production is used by the parser while a grammar element by the user.

Parameters: lhs – a single LHS of GrammarElement rhs (list) – a list of RHS elements, each of which is a GrammarElement
__weakref__

list of weak references to the object (if defined)

static factory(lhs, rhs=None)[source]

A Production factory that constructs new productions according to the type of lhs. Users can either call this function or directly call Production constructors.

Parameters: lhs – a single LHS of GrammarElement rhs (list, GrammarElement) – RHS elements (single or a list), each of which is a GrammarElement
class parsetron.Regex(pattern, flags=2, match_whole=True)[source]

Case-insensitive version of RegexCs.

class parsetron.RegexCs(pattern, flags=0, match_whole=True)[source]

Case-sensitive string matching with regular expressions. e.g.:

>>> color = RegexCs(r"(red|blue|orange)")
>>> digits = RegexCs(r"\d+")


Or pass a compiled regex:

>>> import re
>>> color = RegexCs(re.compile(r"(red|blue|orange|a long list)"))

Parameters: flags (int) – standard re flags match_whole (bool) – whether matching the whole string (default: True).

Warning

If match_whole=False, then r"(week|weeks)" will throw a ParseException when parsing “weeks”, but r"(weeks|week)" will parse “weeks” successfully.
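The ordering pitfall follows from Python’s re alternation, which is first-match rather than longest-match; a plain re sketch:

```python
import re

# With alternatives ordered short-first, re stops at the first branch
# that matches, consuming only "week" out of "weeks"...
m = re.match(r"(week|weeks)", "weeks")
assert m.group() == "week"   # prefix match only; the trailing "s" is left over

# ...while long-first alternatives consume the whole token, which is
# why match_whole=False requires careful ordering of alternatives.
m = re.match(r"(weeks|week)", "weeks")
assert m.group() == "weeks"
```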

RegexType

alias of SRE_Pattern

class parsetron.RobustParser(grammar, strategy=<parsetron.ParsingStrategy object>)[source]

Bases: object

A robust, incremental chart parser.

Parameters: grammar – user defined grammar, a GrammarImpl type. strategy (ParsingStrategy) – top-down or bottom-up parsing
__weakref__

list of weak references to the object (if defined)

_parse_multi_token(sent_or_tokens, chart=None, lex_start=None)[source]

Parse sentences while being able to tokenize multiple tokens, for instance:

kill lights -> “kill” “lights”
turn off lights -> “turn off” “lights”

Each quote-enclosed (multi-)token is recognized as a phrase.

This function doesn’t parse unrecognizable tokens.

clear_cache()[source]

Clear all history when the parser is to parse another sentence. Mainly used in server mode for incremental parsing

incremental_parse(single_token, is_final, only_goal=True, is_first=False)[source]

Incrementally parse one token at a time. Returns the best parse tree and parse result.

Parameters: single_token (str) – a single word is_final (bool) – whether the current single_token is the last one in the sentence only_goal (bool) – only output trees with GOAL as root node is_first (bool) – whether single_token is the first token Returns: (best parse tree, parse result) tuple(TreeNode, ParseResult) or (None, None)
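A sketch of a token-by-token loop over incremental_parse, per the parameters above (MyGrammar is an assumed user-defined Grammar, not part of parsetron):

```python
from parsetron import RobustParser

# Parser for some user-defined grammar (assumed to exist).
parser = RobustParser(MyGrammar())

tokens = "turn off the lights".split()
for i, token in enumerate(tokens):
    tree, result = parser.incremental_parse(
        token,
        is_final=(i == len(tokens) - 1),  # last token finalizes the parse
        is_first=(i == 0),                # first token starts a new parse
    )
# tree/result are (None, None) until a full GOAL parse is available.

parser.clear_cache()  # reset history before parsing the next sentence
```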
incremental_parse_to_chart(single_token, chart)[source]

Incrementally parse one token at a time. Returns (chart, is_token_accepted).

Parameters: single_token (str) – a single word chart (RobustChart) – the previously returned chart; on first call, set it to None Returns: a tuple of (chart, parsed_tokens)
parse(string)[source]

Parse an input sentence in string and return the best (tree, result).

Parameters: string – tokenized input Returns: (best tree, best parse) tuple(TreeNode, ParseResult)
parse_multi_token_skip_reuse_chart(sent)[source]

Parse sentence with capabilities to:

• multi_token: recognize multiple tokens as one phrase (e.g., “turn off”)
• skip: throw away tokens not licensed by the grammar (e.g., speech fillers “um...”)
• reuse_chart: doesn’t waste computation by reusing the chart from last time. This makes the function call up to 25% faster than without reusing the chart.

Parameters: sent (str) – a sentence in string Returns: the chart and the newly parsed tokens tuple(Chart, list(str))
parse_string(string)[source]

alias of parse().

parse_to_chart(string)[source]

Parse a whole sentence into a chart and parsed tokens. This gives a raw chart where all trees or the single best tree can be drawn from.

Parameters: string (str) – input sentence that’s already tokenized Returns: parsing chart and newly parsed tokens tuple(Chart, list(str))
print_parse(string, all_trees=False, only_goal=True, best_parse=True, print_json=False, strict_match=False)[source]

Print the parse tree given input string.

Parameters: string (str) – input string all_trees (bool) – whether to print all trees (warning: massive output) only_goal (bool) – only print the tree licensed by the final goal best_parse (bool) – print the best single tree ranked by smallest size strict_match (bool) – strictly match input with parse output (for test purposes) Returns: True if there is a parse else False
class parsetron.Set(strings)[source]

Case-insensitive version of SetCs.

class parsetron.SetCs(strings, caseless=False)[source]

Case-sensitive strings in which matching any will lead to parsing success. This is a shortcut for a disjunction of StringCs objects (|), or a Regex (r"(a|b|c|...)").

Input can be one of the following forms:

• a string with elements separated by spaces (defined by regex r"\s+")
• otherwise an iterable

For instance, the following inputs are equivalent:

>>> "aa bb cc"
>>> ["aa", "bb", "cc"]
>>> ("aa", "bb", "cc")
>>> {"aa", "bb", "cc"}


The following are also equivalent:

>>> "0123...9"
>>> "0 1 2 3 .. 9"
>>> ["0", "1", "2", "3", ..., "9"]
>>> ("0", "1", "2", "3", ..., "9")
>>> {"0", "1", "2", "3", ..., "9"}

class parsetron.String(string)[source]

Case-insensitive version of StringCs.

class parsetron.StringCs(string)[source]

Case-sensitive string (usually a terminal) symbol that can be a word or phrase.

class parsetron.TopDownInitRule[source]

Initialize the chart when we get started by inserting the goal.

class parsetron.TopDownPredictRule[source]

Predict edge if it’s not complete and add it to chart

class parsetron.TopDownScanRule[source]

Scan lexicon from top down

class parsetron.TreeNode(parent, children, lexicon, lex_span)[source]

Bases: object

A tree structure to represent parser output. parent should be a chart Edge while children should be a list of TreeNode. lexicon is the matched string if this node is a leaf node.

Parameters: parent (Edge) – an edge in Chart children (list) – a list of TreeNode lexicon (str) – matched string when this node is a leaf node. lex_span ((int,int)) – (start, end) of lexical token offset.
__weakref__

list of weak references to the object (if defined)

dict_for_js()[source]

Represents this tree as a dict so that a JSON format can be extracted by:

json.dumps(node.dict_for_js())

Returns: a dict
size()[source]

size is the total number of non-terminals and terminals in the tree

Returns: int int
to_parse_result()[source]

Convert this TreeNode to a ParseResult. The result is flattened as much as possible following:

• if the parent node has as_list=True (ZeroOrMore and OneOrMore), then its children are not flattened;
• children are flattened (meaning: they are elevated to the same level as their parents) in the following cases:
• child is a leaf node
• parent has as_list=False and all children have no name conflicts (e.g., in p -> {c1 -> {n -> "lexicon1"}, c2 -> {n -> "lexicon2"}}, n will be elevated to the same levels of c1 and c2 separately, but not to the same level of p).
Returns: ParseResult
class parsetron.ZeroOrMore(expr)[source]

ZeroOrMore matching (0 or more times).

yield_productions()[source]

Yield how this expression produces grammar productions. If A = ZeroOrMore(B), then this yields:

A => NULL
A => B
A => B A


or (semantically equivalent):

A => NULL
A => OneOrMore(B)

parsetron._ustr(obj)[source]

Drop-in replacement for str(obj) that tries to be Unicode friendly. It first tries str(obj). If that fails with a UnicodeEncodeError, it tries unicode(obj), then either returns the unicode object or encodes it with the default encoding.

parsetron.find_word_boundaries(string)[source]

Given a string, such as “my lights are off”, return a tuple:

0: a list containing all word boundaries in tuples (start(inclusive), end(exclusive)): [(0, 2), (3, 9), (10, 13), (14, 17)]
1: a set of all start positions: {0, 3, 10, 14}
2: a set of all end positions: {2, 9, 13, 17}
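A stdlib sketch of equivalent behavior (an illustrative reimplementation of the described return values, not parsetron’s actual code):

```python
import re

def find_word_boundaries(string):
    # One (start, end) pair per whitespace-delimited word;
    # start is inclusive, end is exclusive.
    boundaries = [(m.start(), m.end()) for m in re.finditer(r"\S+", string)]
    starts = {s for s, _ in boundaries}
    ends = {e for _, e in boundaries}
    return boundaries, starts, ends

bounds, starts, ends = find_word_boundaries("my lights are off")
assert bounds == [(0, 2), (3, 9), (10, 13), (14, 17)]
assert starts == {0, 3, 10, 14}
assert ends == {2, 9, 13, 17}
```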

parsetron.strip_string(string)[source]

Merge multiple spaces into a single space.
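A plausible one-liner with re (illustrative; not necessarily parsetron’s exact implementation):

```python
import re

def strip_string(string):
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", string)

assert strip_string("turn   off  the lights") == "turn off the lights"
```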