4 Jan 2008

Mob intelligence - Please feed the pluralizer

I’ve always been very interested in how groups of people come together to create a body of information.

Now a few weeks ago, I posted some code to pluralize nouns. It’s based on the Ruby on Rails code, and it has been tweaked to include more words and ported to PHP and ActionScript.

It occurred to me that maybe people out there could help make the pluralizer smarter. So I’ve written a small app that lets you input words to test out the pluralizer. If the pluralizer gets any of the words wrong, just type in the correct plural and the code will correct itself.
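
For the curious, here is a rough sketch of one way a "teachable" pluralizer could be wired up. It is not the code behind the app. The class name, the in-memory hash, and the naive fallback are all assumptions made for illustration; the idea is simply that user corrections get checked before the regular rules.

```ruby
# Hypothetical sketch: names and storage are illustrative only.
class TeachablePluralizer
  def initialize
    @taught = {} # singular (lowercase) => plural supplied by a user
  end

  # Record a correction typed in by a user.
  def teach(singular, plural)
    @taught[singular.downcase] = plural
  end

  # Taught words win; everything else falls through to the rules.
  def pluralize(word)
    @taught.fetch(word.downcase) { fallback(word) }
  end

  private

  # Stand-in for the real rule-based pluralizer.
  def fallback(word)
    word + 's'
  end
end

pluralizer = TeachablePluralizer.new
puts pluralizer.pluralize('criterion') # naive guess from the fallback
pluralizer.teach('criterion', 'criteria')
puts pluralizer.pluralize('criterion') # now uses the taught form
```

A real version would need persistent storage and some defense against junk entries, which this sketch leaves out.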

pluralizer thumbnail

Will this work? I don’t know. It all depends on whether enough people find it interesting to teach the pluralizer new words. It also depends on people not screwing it up with junk words.

Try it out and LMK what you think.

17 Dec 2007

Improved pluralizing in PHP, ActionScript, and RoR

I found a number of issues with the pluralizing code I posted the other day, and I felt compelled to fix it. While I was at it, I ported the code to ActionScript 3 and then back to Ruby, so you can use the pluralizer in your Flex or RoR apps.

BTW, for those of you who love language and know regular expressions (this means you, lori and nj!), please help me spot any problems with these rules.
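
If you want to poke at rules like these, it helps to remember that they are tried in order and the first match wins, and that "|" in a regular expression splits the entire pattern unless it is wrapped in a group. The sketch below uses a tiny made-up rule table (not the full list) and a hypothetical helper, pluralize_with, to show how an ungrouped alternation can catch words it was never meant for.

```ruby
# Hypothetical mini rule table, tried top to bottom; first match wins.
RULES = [
  [/(matr|vert|ind)(?:ix|ex)$/i, '\1ices'],
  [/(x|ch|ss|sh)$/i,             '\1es'],
  [/([^aeiouy]|qu)y$/i,          '\1ies'],
  [/$/,                          's']
].freeze

def pluralize_with(rules, word)
  rules.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word =~ pattern
  end
  word
end

# Same table, but the first rule lacks the (?:...) group. Its second
# branch is then just /ex$/, which matches any word ending in "ex".
LOOSE = [[/(matr|vert|ind)ix|ex$/i, '\1ices']] + RULES.drop(1)

%w[index matrix box city complex].each do |word|
  puts "#{word}: #{pluralize_with(RULES, word)} / #{pluralize_with(LOOSE, word)}"
end
```

With the grouped rule, "complex" falls through to the generic "x" rule; with the ungrouped one it gets mangled, because the capture group is empty when the "ex" branch matches.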

The reason it was important for me to get these pluralization rules right is that I am using them for human-readable strings. In my project, users get to name things however they want, and it is nice to be able to pluralize (or singularize) these words when communicating with the user.
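
To make that concrete, here is a minimal sketch of the kind of message I mean. The pluralize function here is a tiny stand-in with three rules, and count_phrase is a made-up name that mirrors the idea of the pluralize_if helper in the class below.

```ruby
# Tiny stand-in pluralizer; a real one would use the full rule tables.
def pluralize(word)
  return word.sub(/y$/i, 'ies') if word =~ /[^aeiou]y$/i
  return word + 'es' if word =~ /(x|ch|ss|sh)$/i
  word + 's'
end

# Leave the noun alone for a count of exactly 1, pluralize otherwise.
def count_phrase(count, noun)
  count == 1 ? "1 #{noun}" : "#{count} #{pluralize(noun)}"
end

# The nouns are user-supplied names, so they are not known in advance.
puts "You have #{count_phrase(1, 'gallery')}."
puts "You have #{count_phrase(3, 'gallery')}."
puts "You have #{count_phrase(0, 'box')}."
```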

(As a side note, I find the Ruby on Rails idea of auto-pluralizing table names a bit bizarre. I like having magic frameworks that do all the work for me, but I want my frameworks to have predictable behavior!! I mean… how bad would it be if the table that holds the “person” object was called “person” instead of “people”?)

Thanks again to the Rails team for getting this ball rolling, and to Paul Osman for the original PHP version of this code.

On to the code. All code below is covered under the MIT license.

PHP:

// Thanks to http://www.eval.ca/articles/php-pluralize (MIT license)
//           http://dev.rubyonrails.org/browser/trunk/activesupport/lib/active_support/inflections.rb (MIT license)
//           http://www.fortunecity.com/bally/durrus/153/gramch13.html
//           http://www2.gsu.edu/~wwwesl/egw/crump.htm
//
// Changes (12/17/07)
//   Major changes
//   --
//   Fixed irregular noun algorithm to use regular expressions just like the original Ruby source.
//       (this allows for things like fireman -> firemen)
//   Fixed the order of the singular array, which was backwards.
//
//   Minor changes
//   --
//   Removed incorrect pluralization rule for /([^aeiouy]|qu)ies$/ => $1y
//   Expanded on the list of exceptions for *o -> *oes, and removed rule for buffalo -> buffaloes
//   Removed dangerous singularization rule for /([^f])ves$/ => $1fe
//   Added more specific rules for singularizing lives, wives, knives, sheaves, loaves, and leaves and thieves
//   Added exception to /(us)es$/ => $1 rule for houses => house and blouses => blouse
//   Added exceptions for feet, geese and teeth
//   Added rule for deer -> deer

// Changes:
//   Removed rule for virus -> viri
//   Added rule for potato -> potatoes
//   Added rule for *us -> *uses

class Inflect
{
    static $plural = array(
        '/(quiz)$/i'               => "$1zes",
        '/^(ox)$/i'                => "$1en",
        '/([m|l])ouse$/i'          => "$1ice",
        '/(matr|vert|ind)(?:ix|ex)$/i' => "$1ices",
        '/(x|ch|ss|sh)$/i'         => "$1es",
        '/([^aeiouy]|qu)y$/i'      => "$1ies",
        '/(hive)$/i'               => "$1s",
        '/(?:([^f])fe|([lr])f)$/i' => "$1$2ves",
        '/(shea|lea|loa|thie)f$/i' => "$1ves",
        '/sis$/i'                  => "ses",
        '/([ti])um$/i'             => "$1a",
        '/(tomat|potat|ech|her|vet)o$/i'=> "$1oes",
        '/(bu)s$/i'                => "$1ses",
        '/(alias)$/i'              => "$1es",
        '/(octop)us$/i'            => "$1i",
        '/(ax|test)is$/i'          => "$1es",
        '/(us)$/i'                 => "$1es",
        '/s$/i'                    => "s",
        '/$/'                      => "s"
    );

    static $singular = array(
        '/(quiz)zes$/i'             => "$1",
        '/(matr)ices$/i'            => "$1ix",
        '/(vert|ind)ices$/i'        => "$1ex",
        '/^(ox)en$/i'               => "$1",
        '/(alias)es$/i'             => "$1",
        '/(octop|vir)i$/i'          => "$1us",
        '/(cris|ax|test)es$/i'      => "$1is",
        '/(shoe)s$/i'               => "$1",
        '/(o)es$/i'                 => "$1",
        '/(bus)es$/i'               => "$1",
        '/([m|l])ice$/i'            => "$1ouse",
        '/(x|ch|ss|sh)es$/i'        => "$1",
        '/(m)ovies$/i'              => "$1ovie",
        '/(s)eries$/i'              => "$1eries",
        '/([^aeiouy]|qu)ies$/i'     => "$1y",
        '/([lr])ves$/i'             => "$1f",
        '/(tive)s$/i'               => "$1",
        '/(hive)s$/i'               => "$1",
        '/(li|wi|kni)ves$/i'        => "$1fe",
        '/(shea|loa|lea|thie)ves$/i'=> "$1f",
        '/(^analy)ses$/i'           => "$1sis",
        '/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i'  => "$1$2sis",
        '/([ti])a$/i'               => "$1um",
        '/(n)ews$/i'                => "$1ews",
        '/(h|bl)ouses$/i'           => "$1ouse",
        '/(corpse)s$/i'             => "$1",
        '/(us)es$/i'                => "$1",
        '/s$/i'                     => ""
    );

    static $irregular = array(
        'move'   => 'moves',
        'foot'   => 'feet',
        'goose'  => 'geese',
        'sex'    => 'sexes',
        'child'  => 'children',
        'man'    => 'men',
        'tooth'  => 'teeth',
        'person' => 'people'
    );

    static $uncountable = array(
        'sheep',
        'fish',
        'deer',
        'series',
        'species',
        'money',
        'rice',
        'information',
        'equipment'
    );

    public static function pluralize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular singular forms
        foreach ( self::$irregular as $pattern => $result )
        {
            $pattern = '/' . $pattern . '$/i';

            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string);
        }

        // check for matches using regular expressions
        foreach ( self::$plural as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function singularize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular plural forms
        foreach ( self::$irregular as $result => $pattern )
        {
            $pattern = '/' . $pattern . '$/i';

            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string);
        }

        // check for matches using regular expressions
        foreach ( self::$singular as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function pluralize_if($count, $string)
    {
        if ($count == 1)
            return "1 $string";
        else
            return $count . " " . self::pluralize($string);
    }
}

ActionScript:

package
{
public class Inflect
{
    private static var plural : Array = [
        [/(quiz)$/i,                     "$1zes"],
        [/^(ox)$/i,                      "$1en"],
        [/([m|l])ouse$/i,                "$1ice"],
        [/(matr|vert|ind)(?:ix|ex)$/i,   "$1ices"],
        [/(x|ch|ss|sh)$/i,               "$1es"],
        [/([^aeiouy]|qu)y$/i,            "$1ies"],
        [/(hive)$/i,                     "$1s"],
        [/(?:([^f])fe|([lr])f)$/i,       "$1$2ves"],
        [/(shea|lea|loa|thie)f$/i,       "$1ves"],
        [/sis$/i,                        "ses"],
        [/([ti])um$/i,                   "$1a"],
        [/(tomat|potat|ech|her|vet)o$/i, "$1oes"],
        [/(bu)s$/i,                      "$1ses"],
        [/(alias|status)$/i,             "$1es"],
        [/(octop)us$/i,                  "$1i"],
        [/(ax|test)is$/i,                "$1es"],
        [/(us)$/i,                       "$1es"],
        [/s$/i,                          "s"],
        [/$/i,                           "s"]
    ];

    private static var singular : Array = [
        [/(quiz)zes$/i,             "$1"],
        [/(matr)ices$/i,            "$1ix"],
        [/(vert|ind)ices$/i,        "$1ex"],
        [/^(ox)en$/i,               "$1"],
        [/(alias|status)es$/i,      "$1"],
        [/(octop|vir)i$/i,          "$1us"],
        [/(cris|ax|test)es$/i,      "$1is"],
        [/(shoe)s$/i,               "$1"],
        [/(o)es$/i,                 "$1"],
        [/(bus)es$/i,               "$1"],
        [/([m|l])ice$/i,            "$1ouse"],
        [/(x|ch|ss|sh)es$/i,        "$1"],
        [/(m)ovies$/i,              "$1ovie"],
        [/(s)eries$/i,              "$1eries"],
        [/([^aeiouy]|qu)ies$/i,     "$1y"],
        [/([lr])ves$/i,             "$1f"],
        [/(tive)s$/i,               "$1"],
        [/(hive)s$/i,               "$1"],
        [/(li|wi|kni)ves$/i,        "$1fe"],
        [/(shea|loa|lea|thie)ves$/i,"$1f"],
        [/(^analy)ses$/i,           "$1sis"],
        [/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i,  "$1$2sis"],
        [/([ti])a$/i,               "$1um"],
        [/(n)ews$/i,                "$1ews"],
        [/(h|bl)ouses$/i,           "$1ouse"],
        [/(corpse)s$/i,             "$1"],
        [/(us)es$/i,                "$1"],
        [/s$/i,                     ""]
    ];

    private static var irregular : Array = [
        ['move'   , 'moves'],
        ['foot'   , 'feet'],
        ['goose'  , 'geese'],
        ['sex'    , 'sexes'],
        ['child'  , 'children'],
        ['man'    , 'men'],
        ['tooth'  , 'teeth'],
        ['person' , 'people']
    ];

    private static var uncountable : Array = [
        'sheep',
        'fish',
        'deer',
        'series',
        'species',
        'money',
        'rice',
        'information',
        'equipment'
    ];

    public static function pluralize( string : String ) : String
    {
        var pattern : RegExp;
        var result : String;

        // save some time in the case that singular and plural are the same
        if (uncountable.indexOf(string.toLowerCase()) != -1)
          return string;

        // check for irregular singular forms
        var item : Array;
        for each ( item in irregular )
        {
            pattern = new RegExp(item[0] + "$", "i");
            result = item[1];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        // check for matches using regular expressions
        for each ( item in plural)
        {
            pattern = item[0];
            result = item[1];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        return string;
    }

    public static function singularize( string : String ) : String
    {
        var pattern : RegExp;
        var result : String;

        // save some time in the case that singular and plural are the same
        if (uncountable.indexOf(string.toLowerCase()) != -1)
            return string;

        // check for irregular plural forms
        var item : Array;
        for each ( item in irregular )
        {
            pattern = new RegExp(item[1] + "$", "i");
            result = item[0];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        // check for matches using regular expressions
        for each ( item in singular)
        {
            pattern = item[0];
            result = item[1];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        return string;
    }

    public static function pluralizeIf(count : int, string : String) : String
    {
        if (count == 1)
            return "1 " + string;
        else
            return count.toString() + " " + pluralize(string);
    }
}
}

Ruby on Rails (use this to replace your Inflect.rb):

Inflector.inflections do |inflect|
    inflect.plural(/$/, 's')
    inflect.plural(/s$/i, 's')
    inflect.plural(/(us)$/i, '\\1es')
    inflect.plural(/(ax|test)is$/i, '\\1es')
    inflect.plural(/(octop)us$/i, '\\1i')
    inflect.plural(/(alias)$/i, '\\1es')
    inflect.plural(/(bu)s$/i, '\\1ses')
    inflect.plural(/(tomat|potat|ech|her|vet)o$/i, '\\1oes')
    inflect.plural(/([ti])um$/i, '\\1a')
    inflect.plural(/sis$/i, 'ses')
    inflect.plural(/(shea|lea|loa|thie)f$/i, '\\1ves')
    inflect.plural(/(?:([^f])fe|([lr])f)$/i, '\\1\\2ves')
    inflect.plural(/(hive)$/i, '\\1s')
    inflect.plural(/([^aeiouy]|qu)y$/i, '\\1ies')
    inflect.plural(/(x|ch|ss|sh)$/i, '\\1es')
    inflect.plural(/(matr|vert|ind)(?:ix|ex)$/i, '\\1ices')
    inflect.plural(/([m|l])ouse$/i, '\\1ice')
    inflect.plural(/^(ox)$/i, '\\1en')
    inflect.plural(/(quiz)$/i, '\\1zes')

    inflect.singular(/s$/i, '')
    inflect.singular(/(us)es$/i, '\\1')
    inflect.singular(/(corpse)s$/i, '\\1')
    inflect.singular(/(h|bl)ouses$/i, '\\1ouse')
    inflect.singular(/(n)ews$/i, '\\1ews')
    inflect.singular(/([ti])a$/i, '\\1um')
    inflect.singular(/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i, '\\1\\2sis')
    inflect.singular(/(^analy)ses$/i, '\\1sis')
    inflect.singular(/(shea|loa|lea|thie)ves$/i, '\\1f')
    inflect.singular(/(li|wi|kni)ves$/i, '\\1fe')
    inflect.singular(/(hive)s$/i, '\\1')
    inflect.singular(/(tive)s$/i, '\\1')
    inflect.singular(/([lr])ves$/i, '\\1f')
    inflect.singular(/([^aeiouy]|qu)ies$/i, '\\1y')
    inflect.singular(/(s)eries$/i, '\\1eries')
    inflect.singular(/(m)ovies$/i, '\\1ovie')
    inflect.singular(/(x|ch|ss|sh)es$/i, '\\1')
    inflect.singular(/([m|l])ice$/i, '\\1ouse')
    inflect.singular(/(bus)es$/i, '\\1')
    inflect.singular(/(o)es$/i, '\\1')
    inflect.singular(/(shoe)s$/i, '\\1')
    inflect.singular(/(cris|ax|test)es$/i, '\\1is')
    inflect.singular(/(octop|vir)i$/i, '\\1us')
    inflect.singular(/(alias)es$/i, '\\1')
    inflect.singular(/^(ox)en$/i, '\\1')
    inflect.singular(/(vert|ind)ices$/i, '\\1ex')
    inflect.singular(/(matr)ices$/i, '\\1ix')
    inflect.singular(/(quiz)zes$/i, '\\1')

    inflect.irregular('person', 'people')
    inflect.irregular('tooth', 'teeth')
    inflect.irregular('man', 'men')
    inflect.irregular('child', 'children')
    inflect.irregular('sex', 'sexes')
    inflect.irregular('goose', 'geese')
    inflect.irregular('foot', 'feet')
    inflect.irregular('move', 'moves')

    inflect.uncountable(%w(equipment information rice money species series deer fish sheep))
end

14 Dec 2007

How to pluralize in PHP (and please help me check the code)

An improved version of the code has been posted here. Please use that version instead of this version.

At some point, I should post about what I have been doing over the past few months. Suffice it to say that I am tinkering around with a few ideas and getting reacquainted with things like PHP and AJAX.

In the meantime, here is a small code snippet for how to pluralize in PHP. The original code came from two sources:

http://www.eval.ca/articles/php-pluralize (no license specified)
http://solarphp.com (BSD license)

The first source derives from the Rails source, which is covered under the MIT license. Since the BSD license is a tiny tiny bit more restrictive than the MIT license, I think that means that this code is covered under BSD.

I made a few changes from the two original versions:

  1. Started with www.eval.ca version
  2. Changed nested arrays to associative array and wrapped it in a class
  3. Added unpluralization rules from solarphp.com
  4. Changed suspicious pluralization rules
    1. Removed unorthodox virus -> viri in favor of general rule for *us -> *uses (e.g., viruses, cactuses, caucuses)
    2. I noticed a rule for buffalo -> buffaloes and tomato -> tomatoes but not one for potato->potatoes. I added it.

One quibble. It kind of bothers me that these algorithms have such specific rules for pluralizations that are unlikely to come up in computer software (ox -> oxen? octopus -> octopi?) and yet the rules are obviously not complete, because I found at least three problems just by inspection. Whether you agree or disagree with how I pluralized virus, the plural of cactus is not cactus, the plural of caucus is not caucus, and the plural of potato is not potatos.

Did I miss any pluralizations? Did I overstep my bounds by adding a rule that says *us -> *uses?
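
To show what the general *us -> *uses rule does next to a specific exception, here is a small sketch. It uses only three rules and a made-up helper name, pluralize_us, so treat it as an illustration of rule ordering rather than the code from this post.

```ruby
# Hypothetical three-rule table; the first matching rule wins.
US_RULES = [
  [/(octop)us$/i, '\1i'],  # specific exception listed first
  [/(us)$/i,      '\1es'], # general *us -> *uses
  [/$/,           's']     # default
].freeze

def pluralize_us(word)
  US_RULES.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word =~ pattern
  end
  word
end

%w[virus cactus caucus octopus potato].each do |word|
  puts "#{word} -> #{pluralize_us(word)}"
end
```

Note that "potato" still comes out wrong with only these three rules, which is exactly why it needs its own entry.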

// Thanks to http://www.eval.ca/articles/php-pluralize (MIT license)
// As well as http://solarphp.com/trac/changeset/2214?format=diff&new=2214 (BSD license)

// Changes:
//   Removed rule for virus -> viri
//   Added rule for potato -> potatoes
//   Added rule for *us -> *uses

class Inflect
{
    static $plural = array(
        '/(quiz)$/i'               => "$1zes",
        '/^(ox)$/i'                => "$1en",
        '/([m|l])ouse$/i'          => "$1ice",
        '/(matr|vert|ind)(?:ix|ex)$/i' => "$1ices",
        '/(x|ch|ss|sh)$/i'         => "$1es",
        '/([^aeiouy]|qu)y$/i'      => "$1ies",
        '/([^aeiouy]|qu)ies$/i'    => "$1y",
        '/(hive)$/i'               => "$1s",
        '/(?:([^f])fe|([lr])f)$/i' => "$1$2ves",
        '/sis$/i'                  => "ses",
        '/([ti])um$/i'             => "$1a",
        '/(buffal|tomat|potat)o$/i'=> "$1oes",
        '/(bu)s$/i'                => "$1ses",
        '/(alias|status)$/i'       => "$1es",
        '/(octop)us$/i'            => "$1i",
        '/(ax|test)is$/i'          => "$1es",
        '/(us)$/i'                 => "$1es",
        '/s$/i'                    => "s",
        '/$/'                      => "s"
    );

    static $singular = array(
        '/(n)ews$/i'                => "$1ews",
        '/([ti])a$/i'               => "$1um",
        '/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i'  => "$1$2sis",
        '/(^analy)ses$/i'           => "$1sis",
        '/([^f])ves$/i'             => "$1fe",
        '/(hive)s$/i'               => "$1",
        '/(tive)s$/i'               => "$1",
        '/([lr])ves$/i'             => "$1f",
        '/([^aeiouy]|qu)ies$/i'     => "$1y",
        '/(s)eries$/i'              => "$1eries",
        '/(m)ovies$/i'              => "$1ovie",
        '/(x|ch|ss|sh)es$/i'        => "$1",
        '/([m|l])ice$/i'            => "$1ouse",
        '/(bus)es$/i'               => "$1",
        '/(o)es$/i'                 => "$1",
        '/(shoe)s$/i'               => "$1",
        '/(cris|ax|test)es$/i'      => "$1is",
        '/(octop|vir)i$/i'          => "$1us",
        '/(alias|status)es$/i'      => "$1",
        '/^(ox)en$/i'               => "$1",
        '/(vert|ind)ices$/i'        => "$1ex",
        '/(matr)ices$/i'            => "$1ix",
        '/(quiz)zes$/i'             => "$1",
        '/(us)es$/i'                => "$1",
        '/s$/i'                     => ""
    );

    static $irregular = array(
        array( 'move',   'moves'    ),
        array( 'sex',    'sexes'    ),
        array( 'child',  'children' ),
        array( 'man',    'men'      ),
        array( 'person', 'people'   )
    );

    static $uncountable = array(
        'sheep',
        'fish',
        'series',
        'species',
        'money',
        'rice',
        'information',
        'equipment'
    );

    public static function pluralize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular singular forms
        foreach ( self::$irregular as $noun )
        {
            if ( strtolower( $string ) == $noun[0] )
                return $noun[1];
        }

        // check for matches using regular expressions
        foreach ( self::$plural as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function singularize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular plural forms
        foreach ( self::$irregular as $noun )
        {
            if ( strtolower( $string ) == $noun[1] )
                return $noun[0];
        }

        // check for matches using regular expressions
        foreach ( self::$singular as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function pluralize_if($count, $string)
    {
        if ($count == 1)
            return "1 $string";
        else
            return $count . " " . self::pluralize($string);
    }
}

Enjoy!

13 Nov 2007

The National - Boxer

Just a quick note to say that this album.. whoa. Really good. Lush, but not overblown. Moody but not morose.

Musically, it’s a cross between Wilco, Joy Division, and Lou Reed, and is fronted by Matt Berninger’s deep, rich vocals which wrap around your mind and make you question why you spend your time listening to other bands which, in retrospect, sound a bit like whiny teenagers. If anything, maybe the vocals are a little too prominent, but for me, it works.

The first track, “Fake Empire”, gives a sense of what the album is all about. It starts out with spare vocals accompanied by a piano playing what starts out sounding like a badly played three against two pattern, which reveals itself to be a much more complex pattern once the percussion kicks in. The song builds and swells, adding horns (without sounding like a carnival or a mariachi band, which almost all bands that include horns end up doing..)

And all of this without sounding pretentious or showy.

19 Oct 2007

Even Apple sometimes screws up UI

Based on yesterday’s episode, I decided to try Buzzword. In order to do that, I needed to install the latest Flash Player.

The installation failed with the cryptic message “The file flashplayer.xpt could not be written.” This message could have been more helpful, but that’s really Adobe’s fault, not Apple’s.

I tracked down the problem to the fact that Firefox was installed using the “skuwamoto” account. Meanwhile, I was trying to install Flash Player using the “household” account.

So I decided to change the owner of Firefox to “household”. Easy, right? Guess again. Changing the owner of an application turns out to be kind of difficult.

~

I started by changing the owner of the file using the info dialog like so:

Info dialog for Firefox

The install still failed. Take 30 seconds and try to guess why.

Ok. Being a software developer, I knew that applications were really “packages” which are a special kind of Unix folder. If you do a “show package contents”, you find that the Firefox package only contains one folder:

Firefox package contents

And after doing a “get info” on the folder, I found that the package contents still had “skuwamoto” as the owner. ARRGGGHHH!!!

Permissions for package contents

Why on earth would you want to set the owner of an application without setting the owner of the enclosed folders?? Also note that the original info dialog had no option to “Apply to enclosed items”, so even if you knew that this was something to watch out for, there is no way to fix this without manually opening the package and inspecting the contents.
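
For what it’s worth, a quick script can do the manual inspection for you. The sketch below walks a folder and lists everything inside it that is not owned by the expected user. The helper name paths_not_owned_by is made up, and the application path in the usage comment is an assumption about where Firefox lives.

```ruby
require 'find'
require 'etc'

# Returns every path under root (including root itself) whose owner
# is not expected_uid. lstat is used so symlinks are checked
# themselves rather than followed.
def paths_not_owned_by(root, expected_uid)
  mismatches = []
  Find.find(root) do |path|
    mismatches << path unless File.lstat(path).uid == expected_uid
  end
  mismatches
end

# Example usage (account name from this post, path assumed):
#   uid = Etc.getpwnam('household').uid
#   puts paths_not_owned_by('/Applications/Firefox.app', uid)
```

Once you know what is mismatched, Ruby’s FileUtils.chown_R (or its command-line equivalent) can change the owner of a folder and everything in it, given sufficient privileges.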

I mean… I had trouble figuring out what was going on, and I write software for a living.