Natural Language Parsing with Ruby

Hi! Today I want to share a way to parse natural language with Ruby, using the treetop gem.

Why Do I Need It?

Imagine we need a user of our application to enter some rule or condition to solve a problem. We could use a traditional field like [type="text"] to get it, right?

But what if the input is so complex (many logical/comparison operators, for instance) that you would need a lot of fields to pull it off?

I came across this situation a while ago, and the best solution I found was to let users write the rules in natural language, in their own mother tongue.

Hands-on

Treetop lets us define the syntax that is going to be parsed, so we need to create a treetop file with the desired rules.

Let's suppose we need to take some action only if the following rule is true: "if number of orders is greater than X" (where X is an integer).

Below is a treetop file describing that statement.

# my_grammar.treetop
grammar MyGrammar
  rule root
    if space number_of_orders space greater_than space value
  end

  rule if
    "if"
  end

  rule number_of_orders
    "number of orders is"
  end

  rule space
    [\s]+
  end

  rule greater_than
    "greater than" <GreaterThanOperator>
  end

  rule value
    [0-9]+ <Value>
  end
end
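
To make it concrete which statements the root rule accepts, here is a rough regular-expression equivalent in plain Ruby. It is only an illustration, not how Treetop works internally (Treetop builds a syntax tree), and the names ROOT_PATTERN and threshold_for are made up for this sketch.

```ruby
# Rough regex equivalent of the `root` rule: "if", the fixed phrases,
# whitespace between them, and a run of digits at the end.
# Illustration only; Treetop builds a syntax tree, not a MatchData.
ROOT_PATTERN = /\Aif\s+number of orders is\s+greater than\s+([0-9]+)\z/

# Hypothetical helper: returns the threshold as an Integer,
# or nil when the statement does not match the pattern.
def threshold_for(statement)
  match = ROOT_PATTERN.match(statement)
  match && match[1].to_i
end

p threshold_for("if number of orders is greater than 20") # => 20
p threshold_for("if number of orders is less than 20")    # => nil
```

The second call returns nil because the grammar has no rule for "less than"; supporting it would mean adding another operator rule.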

OK, the syntax is set. Now we need to parse and evaluate the statement.

Our system does not know how to interpret the assertion yet, so we need to help it. Let's create a file with some assistance.

# node_extensions.rb
module MyGrammar
  class GreaterThanOperator < Treetop::Runtime::SyntaxNode
    def text_value
      ">"
    end
  end

  class Value < Treetop::Runtime::SyntaxNode
  end
end

The text_value method of each class that inherits from Treetop::Runtime::SyntaxNode is the value returned for that node when a statement is parsed. Here we return > whenever the parser finds the snippet greater than (which is linked to GreaterThanOperator in the treetop file).

When a class does not override text_value, it returns the matched text from the statement unchanged. We still need to declare the class, as we did with Value, because the parser code shown next discards every node that is a plain Treetop::Runtime::SyntaxNode.
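
Here is a small sketch of that difference that runs without the gem. FakeSyntaxNode is a stand-in I wrote for this example (it is not Treetop's class); it only mimics the idea that a node's default text_value is the slice of the input it matched.

```ruby
# Stand-in for Treetop::Runtime::SyntaxNode, written only for this sketch.
class FakeSyntaxNode
  def initialize(input, interval)
    @input = input
    @interval = interval
  end

  # Default behaviour: the slice of the input that this node matched.
  def text_value
    @input[@interval]
  end
end

# Overrides text_value, like GreaterThanOperator in node_extensions.rb.
class FakeGreaterThan < FakeSyntaxNode
  def text_value
    ">"
  end
end

# No override, like Value: the matched text comes back unchanged.
class FakeValue < FakeSyntaxNode
end

input = "if number of orders is greater than 20"
operator = FakeGreaterThan.new(input, 23...35) # covers "greater than"
value = FakeValue.new(input, 36...38)          # covers "20"

rule = "50" + operator.text_value + value.text_value
puts rule # prints 50>20
```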

OK, now that our system knows how to deal with the statement, let's parse it:

# parser.rb
require "treetop"

BASE_PATH = File.expand_path(File.dirname(__FILE__))
require File.join(BASE_PATH, "node_extensions.rb")

class Parser
  Treetop.load(File.join(BASE_PATH, "my_grammar.treetop"))

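  def self.go(statement, number_of_orders)
    parser = MyGrammarParser.new

    tree = parser.parse(statement)
    # parse returns nil when the statement does not match the grammar
    raise ArgumentError, "Parse error: #{parser.failure_reason}" if tree.nil?

    # keep only the nodes backed by our own classes
    nodes = clean_tree(tree)
    # build a Ruby expression such as "50>20"
    rule = nodes.inject(number_of_orders.to_s) do |final, current|
      final + current.text_value
    end
    puts rule
    eval rule
  end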

  def self.clean_tree(root_node)
    return if(root_node.elements.nil?)
    root_node.elements.delete_if{|node| node.class.name == "Treetop::Runtime::SyntaxNode" }
    root_node.elements.each {|node| self.clean_tree(node) }
  end
end

assertion = Parser.go "if number of orders is greater than 20", 50
puts assertion

Here we call the parser, passing the statement along with the already known number of orders, and clean the tree (as written in this post).
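
To show what the cleaning step leaves behind, here is a sketch with stand-in node types (PlainNode, OperatorNode, ValueNode and prune are names invented for this example; in the real code the nodes come from Treetop). The idea is the same as in clean_tree: drop every node of the plain base class and keep the ones backed by our own classes.

```ruby
# Stand-in node types; in the post these are Treetop::Runtime::SyntaxNode
# and the classes from node_extensions.rb.
PlainNode = Struct.new(:text_value, :elements)
class OperatorNode < PlainNode; end
class ValueNode < PlainNode; end

# Same idea as clean_tree: remove plain nodes, recurse into what is left,
# and return the surviving children of the given node.
def prune(node)
  return if node.elements.nil?
  node.elements.delete_if { |child| child.instance_of?(PlainNode) }
  node.elements.each { |child| prune(child) }
end

root = PlainNode.new("if number of orders is greater than 20", [
  PlainNode.new("if", nil),
  PlainNode.new(" ", nil),
  PlainNode.new("number of orders is", nil),
  PlainNode.new(" ", nil),
  OperatorNode.new(">", nil),
  PlainNode.new(" ", nil),
  ValueNode.new("20", [PlainNode.new("2", nil), PlainNode.new("0", nil)])
])

kept = prune(root)
p kept.map(&:text_value) # => [">", "20"]
```

Only the operator and the value survive, which is why those two needed classes of their own.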

We iterate over the values extracted from our assertion, build a valid Ruby expression (here, 50>20), and then evaluate it with the eval method.
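
If evaluating a generated string makes you uneasy, one possible variation (not what the parser above does) swaps eval for public_send with a whitelist of operators. ALLOWED_OPERATORS and evaluate_rule are hypothetical names for this sketch. It also sidesteps a quirk of eval: Ruby reads an integer literal with a leading zero, such as 020, as octal.

```ruby
# Operators the rule language may use; only ">" appears in this post.
ALLOWED_OPERATORS = [">"].freeze

# Hypothetical helper: compares the number of orders with the parsed value
# without building a string for eval.
def evaluate_rule(number_of_orders, operator, value)
  unless ALLOWED_OPERATORS.include?(operator)
    raise ArgumentError, "unsupported operator: #{operator}"
  end

  # Integer(value, 10) forces base 10, so "020" means twenty, not sixteen.
  Integer(number_of_orders).public_send(operator, Integer(value, 10))
end

puts evaluate_rule(50, ">", "20") # prints true
puts evaluate_rule(10, ">", "20") # prints false
```

The operator and value strings would come from the text_value of the cleaned nodes, in the same order the parser yields them.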

This is a simple example, but the same approach extends to more complex rules and statements.

You can check out the source code here.

See you.

Written on November 10, 2014
