<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Automation Inc. &#187; ruby regex</title>
	<atom:link href="http://blog.frameos.org/tag/ruby-regex/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.frameos.org</link>
	<description>Finding the perfect blend between fun and money</description>
	<lastBuildDate>Tue, 03 Jan 2012 19:20:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>ruby performance bits: regex libraries</title>
		<link>http://blog.frameos.org/2008/12/06/ruby-performance-bits-regex-libraries/</link>
		<comments>http://blog.frameos.org/2008/12/06/ruby-performance-bits-regex-libraries/#comments</comments>
		<pubDate>Sat, 06 Dec 2008 22:57:26 +0000</pubDate>
		<dc:creator>rubiojr</dc:creator>
				<category><![CDATA[In the LAB]]></category>
		<category><![CDATA[Scripts]]></category>
		<category><![CDATA[ruby regex]]></category>

		<guid isPermaLink="false">http://rubiojr.netcorex.org/blog/?p=45</guid>
		<description><![CDATA[Interesting numbers measuring the performance of the different regular expression &#8216;engines&#8217; available in ruby. The tests were run by parsing a file containing 1 million lines of apache log. ruby1.8.7 require 'benchmark' logfile = '/data/logs/httpd/old/1m' fields = [ '(\d+\.\d+\.\d+\.\d+)', # ip '(.*?)', # ident '(.*?)', # user '\[(.*?)\]', # datetime '"(.*?)"', [...]]]></description>
			<content:encoded><![CDATA[Interesting numbers measuring the performance of the different regular expression &#8216;engines&#8217; available in ruby.

The tests were run by parsing a file containing 1 million lines of apache log.

<strong>ruby1.8.7</strong>
<pre lang="ruby" >
require 'benchmark'

logfile = '/data/logs/httpd/old/1m'
fields = [
  '(\d+\.\d+\.\d+\.\d+)',    # ip
  '(.*?)',                   # ident
  '(.*?)',                   # user
  '\[(.*?)\]',               # datetime
  '"(.*?)"',                 # request
  '(\d+)',                   # code
  '(-|\d+)',                 # size
  '"(.*?)"',                 # referer
  '"(.*?)"',                 # user-agent 
  '\[(.*?)\]'                # vhost
] 

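# Join the per-field patterns with \s+ and anchor the result to the whole line
re = "^#{fields.join('\s+')}$"
regex = /#{re}/

# Time three full passes over the log, matching every line against the regex
Benchmark.bm do |x|
  3.times do
    x.report {
      File.open logfile do |f|
        f.each_line do |l|
          regex.match(l)
        end
      end
    }
  end
end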
</pre>
<code>
rubiojr@desire:~/lparse_bench$ ruby ruby_regex.rb 
         user     system      total        real
    42.820000   0.430000  43.250000 ( 43.578923)
    42.640000   0.520000  43.160000 ( 43.495074)
    42.780000   0.540000  43.320000 ( 43.369753)
</code>
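
To make it easier to see what the benchmark actually matches, here is a minimal sketch that applies the same pattern to a single line. The sample log line is made up for illustration (it is not taken from the 1 million line file), and it assumes the custom log format ends with the vhost in square brackets, as the field list suggests.

```ruby
# Same field list as the benchmark script above.
fields = [
  '(\d+\.\d+\.\d+\.\d+)',    # ip
  '(.*?)',                   # ident
  '(.*?)',                   # user
  '\[(.*?)\]',               # datetime
  '"(.*?)"',                 # request
  '(\d+)',                   # code
  '(-|\d+)',                 # size
  '"(.*?)"',                 # referer
  '"(.*?)"',                 # user-agent
  '\[(.*?)\]'                # vhost
]

# Fields are separated by one or more whitespace characters.
separator = '\s+'
regex = Regexp.new('^' + fields.join(separator) + '$')

# Hypothetical log line, invented for this example only.
line = '192.0.2.10 - - [06/Dec/2008:22:57:26 +0000] ' \
       '"GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0" [blog.example.org]'

m = regex.match(line)

# One capture per entry in the field list, in the same order.
ip, ident, user, datetime, request, code, size, referer, agent, vhost = m.captures

puts "ip=#{ip} request=#{request} code=#{code} size=#{size} vhost=#{vhost}"
```

Each of the ten groups ends up in <code>m.captures</code>, so the benchmark is measuring a full parse of every line and not just a yes/no match.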
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;

<strong>ruby1.9.1 r19983 (running the same code as above)</strong>
<code>
rubiojr@desire:~/lparse_bench$ /opt/ruby1.9/bin/ruby ruby_regex.rb 
         user     system      total        real
    13.200000   0.150000  13.350000 ( 13.357268)
    13.190000   0.180000  13.370000 ( 13.401657)
    13.220000   0.120000  13.340000 ( 13.333555)
</code>
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;

<strong>ruby1.8.7 + oniguruma</strong>
<pre lang='ruby'>
require 'rubygems'
require 'oniguruma'
require 'benchmark'

logfile = '/data/logs/httpd/old/1m'
fields = [               
  '(\d+\.\d+\.\d+\.\d+)',    # ip
  '(.*?)',                   # ident
  '(.*?)',                   # user
  '\[(.*?)\]',               # datetime
  '"(.*?)"',                 # request
  '(\d+)',                   # code
  '(-|\d+)',                 # size
  '"(.*?)"',                 # referer
  '"(.*?)"',                 # user-agent 
  '\[(.*?)\]'                # vhost
]

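# Same pattern as before, but compiled with the Oniguruma engine
re = "^#{fields.join('\s+')}$"
regex = Oniguruma::ORegexp.new(re)

# Time three full passes over the log, matching every line against the regex
Benchmark.bm do |x|
  3.times do
    x.report {
      File.open logfile do |f|
        f.each_line do |l|
          regex.match(l)
        end
      end
    }
  end
end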
</pre>

<code>
rubiojr@desire:~/lparse_bench$ ruby oniguruma_regex.rb 
         user     system      total        real
    12.500000   0.570000  13.070000 ( 13.146782)
    12.520000   0.600000  13.120000 ( 13.242722)
    12.470000   0.530000  13.000000 ( 13.010874)
</code>

<a href="http://oniguruma.rubyforge.org/">Oniguruma</a> is part of ruby1.9, which seems to explain why the results for ruby1.8.7 + oniguruma and ruby1.9.1 are so similar.
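
The similarity can be put into numbers with a small sketch that averages the wall-clock ("real") column of the three runs above and compares each setup against plain ruby1.8.7. The timings are copied from the outputs in this post; the <code>mean</code> helper is just something written for this example.

```ruby
# Wall-clock ("real") seconds copied from the benchmark outputs above.
runs = {
  'ruby1.8.7'             => [43.578923, 43.495074, 43.369753],
  'ruby1.9.1'             => [13.357268, 13.401657, 13.333555],
  'ruby1.8.7 + oniguruma' => [13.146782, 13.242722, 13.010874]
}

# Arithmetic mean of a list of timings (helper written for this example).
def mean(values)
  values.inject(0.0) { |sum, v| sum + v } / values.size
end

# Plain ruby1.8.7 is the baseline every other setup is compared against.
baseline = mean(runs['ruby1.8.7'])

runs.each do |name, times|
  avg = mean(times)
  printf("%-22s mean %.2fs  speedup vs ruby1.8.7: %.2fx\n", name, avg, baseline / avg)
end
```

Both Oniguruma-based setups come out a bit over 3 times faster than the stock ruby1.8.7 engine, and within a few percent of each other.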

Clearly, the ruby1.8 engine is not the ideal candidate for the <a href="http://wikis.sun.com/display/WideFinder/Wide+Finder+Home">Wide Finder Project</a> :D

<strong>Update: jruby1.1.6 RC1 (IcedTea6 1.3.1)</strong>
<code>
rubiojr@desire:~/lparse_bench$ jruby ruby_regex.rb 
         user     system      total        real
    17.260000   0.220000  17.480000 ( 18.085660)
    16.430000   0.160000  16.590000 ( 16.888714)
    16.240000   0.220000  16.460000 ( 16.728076)
</code>
JRuby 1.1 includes a Java port of Oniguruma, according to the &#8220;<a href="http://wiki.jruby.org/wiki/JRuby_1.1">release notes</a>&#8221;.]]></content:encoded>
			<wfw:commentRss>http://blog.frameos.org/2008/12/06/ruby-performance-bits-regex-libraries/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

