Natural Language Parsing with Ruby
Hi, today I want to share a way to parse natural language with Ruby, using the treetop gem.
Why Do I Need It?
Imagine we need a user of our application to enter some rule or condition to solve a problem. We could use a traditional field like [type="text"] to get it, right?
But what if this input is so complex (too many logical or comparison operators, for instance) that you would need a lot of fields to pull it off?
I came across this situation a while ago, and the best solution I found was to use natural language, letting users enter rules in their own native language.
Hands-on
Treetop lets us define the syntax that is going to be parsed, so we need to create a treetop file with the desired rules.
Let's suppose we need to take some action only if the following rule is true: "if number of orders is greater than X" (where X is an integer).
Below, we can see a treetop file describing the above statement.
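A minimal sketch of what such a grammar might look like follows. It assumes the number of orders has already been substituted into the statement before parsing (for example "if 10 is greater than 5"). Only GreaterThanOperator and Value are names taken from this post; the grammar name OrderRule and the rule names are illustrative assumptions. In a project the grammar would normally live in its own .treetop file; here it is held in a heredoc so the snippet is plain Ruby.

```ruby
# Hypothetical grammar for statements such as "if 10 is greater than 5".
# GreaterThanOperator and Value are the node classes named in the post;
# the grammar name and the rule names are assumptions made for this sketch.
ORDER_RULE_GRAMMAR = <<~'TREETOP'
  grammar OrderRule
    rule statement
      'if' space number space 'is' space operator space number
    end

    rule operator
      'greater than' <GreaterThanOperator>
    end

    rule number
      [0-9]+ <Value>
    end

    rule space
      [\s]+
    end
  end
TREETOP

# With the treetop gem installed, the grammar could be loaded with:
#   require "treetop"
#   Treetop.load_from_string(ORDER_RULE_GRAMMAR)
puts ORDER_RULE_GRAMMAR
```

Treetop names the generated parser class after the grammar, so a grammar called OrderRule would produce OrderRuleParser.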
OK, we have the syntax set; now we need to parse and evaluate the statement.
Our system does not know how to interpret the assertion, so we need to help it. Let's create a file to provide that assistance.
The text_value method of each class inheriting from Treetop::Runtime::SyntaxNode represents the value returned when parsing a statement, so here we tell it to return > whenever the parser finds the snippet greater than (which is linked to GreaterThanOperator in the treetop file).
If a class does not override the text_value method, it returns the same text that appears in the assertion. To get that value back, though, we still need to create the class, as we did with the Value class.
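To make the role of text_value concrete, here is a minimal sketch. So that it runs without the gem, it uses a small stand-in base class (SyntaxNodeStandIn, a hypothetical name) that mimics how a syntax node returns the slice of input it matched. In a real project both classes would inherit from Treetop::Runtime::SyntaxNode instead, and the parser would create the nodes for us.

```ruby
# Stand-in for Treetop::Runtime::SyntaxNode (an assumption for this sketch,
# so it runs without the gem). By default, text_value returns the part of
# the input that the node matched.
class SyntaxNodeStandIn
  attr_reader :input, :interval

  def initialize(input, interval)
    @input = input
    @interval = interval
  end

  def text_value
    input[interval]
  end
end

# Overrides text_value: the words "greater than" become the Ruby operator ">".
class GreaterThanOperator < SyntaxNodeStandIn
  def text_value
    ">"
  end
end

# No override: text_value is simply the matched text, e.g. "5".
class Value < SyntaxNodeStandIn
end

statement = "if 10 is greater than 5"
operator = GreaterThanOperator.new(statement, 9...21) # covers "greater than"
value = Value.new(statement, 22...23)                 # covers "5"

puts operator.text_value
puts value.text_value
```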
OK, now our system knows how to deal with the statement, so let's parse it:
Here we call the parser, passing the statement with the already known number of orders, and clean the tree (as written in this post).
We iterate over the values extracted from our assertion, build a valid Ruby statement, and then evaluate it with the eval method.
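The final step can be sketched without Treetop at all. The snippet below assumes the cleaned tree has already yielded the text values "10", ">" and "5" (hypothetical data for the statement "if 10 is greater than 5"), and the helper names are illustrative. Because eval runs whatever string it receives, the sketch checks the tokens first, and it also shows a variant that avoids eval by calling the operator as a method.

```ruby
# Text values assumed to come from the cleaned parse tree (hypothetical data).
extracted = ["10", ">", "5"]

# Operators we are willing to evaluate; anything else is rejected.
ALLOWED_OPERATORS = %w[> < >= <= ==].freeze

# Builds a Ruby expression such as "10 > 5" and evaluates it with eval,
# after checking that both operands are digits and the operator is known.
def evaluate_with_eval(left, operator, right)
  valid = left.match?(/\A\d+\z/) && right.match?(/\A\d+\z/) &&
          ALLOWED_OPERATORS.include?(operator)
  raise ArgumentError, "unexpected token in rule" unless valid

  eval("#{left.to_i} #{operator} #{right.to_i}")
end

# Same comparison without eval: the operator is sent to the integer as a method.
def evaluate_with_send(left, operator, right)
  raise ArgumentError, "unexpected operator" unless ALLOWED_OPERATORS.include?(operator)

  Integer(left, 10).public_send(operator, Integer(right, 10))
end

result = evaluate_with_eval(*extracted)
puts "Rule holds? #{result}"
```

The eval-free variant is worth considering when the rule text comes directly from users, since it never turns their input into executable code.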
This is a simple example, but the same approach extends to more complex rules and more elaborate statements.
You can check out the source code here.
See you.