Santana's Tech Notes: December 2005

Friday, December 30, 2005

Matching ISO-8859-1 strings with Ruby

While refactoring a braindead legacy application, I needed to split employees' full names, historically stored in a single field of the employees table, into last names and first names. Piece of cake, even though Spanish names can be a bit more complicated than their English counterparts. Here is the regular expression:

# matches 'Gomez', 'de la Cruz', 'de los Santos', etc.
apellido = '((?:(?:de|del|la|las|los|y|san)\s+)*(?:\w|\#)+)\s+'

# matches the rest of the string after the last names
nombres = "(.*)"

# the complete regular expression
re = /#{apellido}#{apellido}#{nombres}/i

It's not perfect, but it covers most of our cases; it failed on only one.
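To see what the expression captures, here is a minimal sketch that wraps it in a helper. The constant names and the split_name method are hypothetical, and the upper-case sample values are an assumption about how the data looks; the pattern itself is the one above.

```ruby
# Last-name pattern from the post: zero or more particles ("de", "la",
# "los", ...) followed by one word; '#' stands in for 'Ñ' in US-ASCII data.
APELLIDO = '((?:(?:de|del|la|las|los|y|san)\s+)*(?:\w|\#)+)\s+'

# Everything after the two last names is taken as the given names.
NOMBRES = '(.*)'

# Two last names followed by the given names, case-insensitive.
NAME_RE = /#{APELLIDO}#{APELLIDO}#{NOMBRES}/i

# Hypothetical helper: returns [first last name, second last name,
# given names], or nil when the string does not match.
def split_name(fullname)
  m = NAME_RE.match(fullname)
  m && m.captures
end

p split_name('GOMEZ DE LA CRUZ JUAN CARLOS')
# => ["GOMEZ", "DE LA CRUZ", "JUAN CARLOS"]
p split_name('NU#EZ DE LOS SANTOS MARIA DE JESUS')
# => ["NU#EZ", "DE LOS SANTOS", "MARIA DE JESUS"]
```

Note that 'SANTOS' is not mistaken for the particle 'san': a particle must be followed by whitespace, so the engine backtracks and takes 'SANTOS' as the last name itself.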

These names were written in US-ASCII, and thus without accented letters or 'Ñ' (did you notice the '#' in the regular expression? It stands in for the 'Ñ'). But what if they had been used?

Ruby supports a few encodings, including UTF-8, which would be enough to match those characters. Unfortunately, the database was created with the ISO-8859-1 encoding, and converting it to UTF-8 was not an option because many programs and (very old) printers depend on ISO-8859-1.

Ruby supports ISO-8859-1 with its new regular expression engine code-named Oniguruma, but only in the development branch (1.9). Oniguruma will be included in Ruby 2.0.

That left one option: converting the string from ISO-8859-1 to UTF-8 before matching it against the regular expression. Ruby's interface to the iconv library does the conversion:

require 'iconv'

# We want to convert from ISO-8859-1 to UTF-8
c = Iconv.new('UTF-8', 'ISO-8859-1')

# This is an ISO-8859-1 string
fullname = "Núñez de los Santos María de Jesús"

# Converting
utf_fullname = c.iconv(fullname)

# We can test it, splitting the name into words:
utf_fullname.scan(/\w+/u)

Since \w now matches accented letters and 'ñ', the previous code splits fullname into words.

Notice the extra 'u' after the regular expression. It's an option telling Ruby to treat the string as UTF-8 encoded.
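The code above targets Ruby 1.8. As a hedged aside, assuming a Ruby 1.9 or later interpreter (Iconv was dropped from the standard library in Ruby 2.0), String#encode performs the same conversion, and the last-name pattern can accept accented letters directly. The helper names below are hypothetical, and [[:alpha:]] replaces \w because in Ruby 1.9+ \w matches ASCII word characters only.

```ruby
# Sketch for Ruby 1.9 and later: String#encode does the
# ISO-8859-1 -> UTF-8 conversion that Iconv did in Ruby 1.8.

# Same last-name pattern as before, with [[:alpha:]] in place of
# (?:\w|\#): since Ruby 1.9, \w only matches ASCII word characters, while
# [[:alpha:]] also matches letters such as 'ú' and 'ñ' in UTF-8 strings.
APELLIDO_UTF8 = '((?:(?:de|del|la|las|los|y|san)\s+)*[[:alpha:]]+)\s+'
NAME_RE_UTF8  = /#{APELLIDO_UTF8}#{APELLIDO_UTF8}(.*)/i

# Reads the bytes of latin1 as ISO-8859-1 and returns a UTF-8 copy,
# like Iconv.new('UTF-8', 'ISO-8859-1').iconv(latin1) did.
def to_utf8(latin1)
  latin1.encode('UTF-8', 'ISO-8859-1')
end

# Splits a full name into words, accented letters included.
def name_words(latin1)
  to_utf8(latin1).scan(/[[:alpha:]]+/)
end

# Returns [first last name, second last name, given names], or nil.
def split_name_utf8(latin1)
  m = NAME_RE_UTF8.match(to_utf8(latin1))
  m && m.captures
end

# Simulate a value read from the ISO-8859-1 database.
fullname = 'Núñez de los Santos María de Jesús'.encode('ISO-8859-1')
p name_words(fullname)
p split_name_utf8(fullname)
```

With the names stored in their real spelling, the '#' placeholder for 'Ñ' is no longer needed.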

Sunday, December 18, 2005

Ruby and Tk on Solaris

Learning Ruby has been a pleasant and rewarding experience. Since May of this year it has become one of my most valuable assets for solving problems at work, replacing Perl.

The need to understand how Ruby on Rails works started the quest. And programming in Ruby was such a joy that I felt compelled to expand its use beyond scripting and web application development.

Some scripts were asking for a GUI, so that seemed like the next step to take. There are a handful of toolkits with Ruby bindings, like Qt, Gtk and FOX, but I chose Tk to begin with. My tests ran on Windows, but not on Solaris. When requiring tk I got this error:

LoadError: No such file to load -- tcltklib

Google turned up no instructions on where to get and how to install the missing tcltklib file. Hopefully this post will change that and help those having the same problem.

The missing library lives in the ext/tcltklib directory of the Ruby source distribution. After reading its README file, the installation is really simple (run these from that directory):

# ruby extconf.rb --with-tcl-include=/usr/sfw/include
# make && make install

Trivial, but easy to overlook.
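A quick way to confirm that the build worked is to try loading the extension before requiring tk. This is a minimal sketch: the method name tcltklib_available? is hypothetical, and it only checks the library named in the LoadError above, not the Tcl/Tk installation as a whole.

```ruby
# Returns true if the tcltklib extension (the file the LoadError
# complained about) can be loaded, false otherwise.
def tcltklib_available?
  require 'tcltklib'
  true
rescue LoadError
  false
end

if tcltklib_available?
  puts "tcltklib found: require 'tk' should work"
else
  puts 'tcltklib missing: build it from ext/tcltklib in the Ruby sources'
end
```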

Monday, December 12, 2005

Unlocking the Uniface repository. Part I.

Maintaining other people's code is OK if it is well designed and has clear, up-to-date documentation. But if the code is badly designed (if it was ever designed at all), with outdated or non-existent documentation, and, to make matters worse, built with an old, discontinued, closed-source tool... well, it can be challenging.

That is my reality at work: maintaining an aberration partially written in Uniface, a tool from Compuware, in its version 7.2.06, which is no longer supported.

Concerned about security, I started to audit this "software". And there I was, facing around 1200 lines of code to replace, just for starters. For obvious reasons I can't explain more about the flaw ;-) but I can tell you that these ~1200 lines, scattered all over the source code, had a pattern in common.

I can hear the Perl hackers saying: "just pass them through a regex". But when it comes to source code storage, Uniface 7.2.06 is "cooked apart", as we say in Mexico: a different story altogether.

A Uniface project's source code is not kept as files in a filesystem. It is stored in a database, in a format of its own that is undocumented, or at least not publicly documented. I couldn't find any information about it on the web or on the Uniface CDs. I bet I would have to pay for some advanced training.

Uniface hackers would advise: "if you want to access the repository, install the metadictionary". It's true that the metadictionary "decodes" the source code stored in the database, but I would still need to write a Uniface application that uses the new entities created by the metadictionary, and even then I wouldn't get my find & replace with regular expressions.

And you guessed it: Uniface 7.2.06 has no option to find & replace text outside a trigger (a piece of code attached to a component, model, entity, etc.), let alone with regular expressions. Heck, even this find & replace option is an undocumented feature. Go figure.

Exporting the source code to XML was not an option either: Uniface 7.2.06 doesn't eat its own dog food, as it cannot load the XML files it generates itself. Exporting to and editing a TRX file is possible, but not supported by Compuware, and would require loading the whole repository back again.

I had no option left but to modify each of those ~1200 lines, one by one. Such is life.

Of course not.

Stay tuned for the second tech note on this topic, where the Uniface repository will be laid bare.

Sunday, December 11, 2005

Informix adapter for Ruby on Rails

Like many others lately, I've been touched by Ruby on Rails (RoR). It's indeed a nice framework to work with. I could talk about the beauty of its ORM implementation (ActiveRecord) or about the dynamic nature of Ruby that made Rails possible, but others have done so already, and far better than I could.

Rails can talk to many DBMSs, but unfortunately not to the one I have to use at work: Informix. OK, it wasn't actually that bad: obstacles are opportunities. This obstacle gave me the chance to write my first Rails adapter, and at the same time gave Rails a chance to be OK'ed by the PHB.

The Informix adapter is already usable, but lacks a couple of features. I've built test applications with it without any problems whatsoever. Anyway, I will publish it when it is 100% ready. Testers are welcome!

binpatch 1.0.0 released

The first release of binpatch is out. Three years have passed since binpatch first saw the light, building binary patches for OpenBSD 3.1, and four years since the messages that started it all.

Back then, binary patches for OpenBSD were a crazy idea. Nowadays a handful of different proposals have arisen, each with its own particular approach. binpatch's approach is to stay true to the OpenBSD philosophy of simplicity.

This release comes with documentation and a sample Makefile to help you create your own binary patches.

Home page: http://openbsdbinpatch.sourceforge.net/
Download: http://sourceforge.net/projects/openbsdbinpatch/
