~ programming ~

17
Dec
2007

Improved pluralizing in PHP, ActionScript, and RoR

I found a number of issues with the pluralizing code I posted the other day, and I felt compelled to fix it. While I was at it, I ported the code to ActionScript 3 and then back to Ruby, so you can use the pluralizer in your Flex or RoR apps.

BTW, for those of you who love language and know regular expressions (this means you, lori and nj!), please help me spot any problems with these rules.

The reason it was important for me to get these pluralization rules rights is that I am using these for human-readable strings. In my project, users get to name things however they want and it is nice to be able to pluralize (or singularize) these words when communicating with the user.

(As a side note, I think the Ruby on Rails idea of auto-pluralizing table names to be a bit bizarre. I like having magic frameworks that do all the work for me, but I want my frameworks to have predictable behavior!! I mean… how bad would it be if the table that holds the “person” object was called “person” instead of “people”?)

Thanks again to the Rails team for getting this ball rolling, and to Paul Osman for the original PHP version of this code.

On to the code. All code below is covered under the MIT license.

PHP:

// Thanks to http://www.eval.ca/articles/php-pluralize (MIT license)
//           http://dev.rubyonrails.org/browser/trunk/activesupport/lib/active_support/inflections.rb (MIT license)
//           http://www.fortunecity.com/bally/durrus/153/gramch13.html
//           http://www2.gsu.edu/~wwwesl/egw/crump.htm
//
// Changes (12/17/07)
//   Major changes
//   --
//   Fixed irregular noun algorithm to use regular expressions just like the original Ruby source.
//       (this allows for things like fireman -> firemen
//   Fixed the order of the singular array, which was backwards.
//
//   Minor changes
//   --
//   Removed incorrect pluralization rule for /([^aeiouy]|qu)ies$/ => $1y
//   Expanded on the list of exceptions for *o -> *oes, and removed rule for buffalo -> buffaloes
//   Removed dangerous singularization rule for /([^f])ves$/ => $1fe
//   Added more specific rules for singularizing lives, wives, knives, sheaves, loaves, and leaves and thieves
//   Added exception to /(us)es$/ => $1 rule for houses => house and blouses => blouse
//   Added excpetions for feet, geese and teeth
//   Added rule for deer -> deer

// Changes:
//   Removed rule for virus -> viri
//   Added rule for potato -> potatoes
//   Added rule for *us -> *uses

class Inflect
{
    static $plural = array(
        '/(quiz)$/i'               => "$1zes",
        '/^(ox)$/i'                => "$1en",
        '/([m|l])ouse$/i'          => "$1ice",
        '/(matr|vert|ind)ix|ex$/i' => "$1ices",
        '/(x|ch|ss|sh)$/i'         => "$1es",
        '/([^aeiouy]|qu)y$/i'      => "$1ies",
        '/(hive)$/i'               => "$1s",
        '/(?:([^f])fe|([lr])f)$/i' => "$1$2ves",
        '/(shea|lea|loa|thie)f$/i' => "$1ves",
        '/sis$/i'                  => "ses",
        '/([ti])um$/i'             => "$1a",
        '/(tomat|potat|ech|her|vet)o$/i'=> "$1oes",
        '/(bu)s$/i'                => "$1ses",
        '/(alias)$/i'              => "$1es",
        '/(octop)us$/i'            => "$1i",
        '/(ax|test)is$/i'          => "$1es",
        '/(us)$/i'                 => "$1es",
        '/s$/i'                    => "s",
        '/$/'                      => "s"
    );

    static $singular = array(
        '/(quiz)zes$/i'             => "$1",
        '/(matr)ices$/i'            => "$1ix",
        '/(vert|ind)ices$/i'        => "$1ex",
        '/^(ox)en$/i'               => "$1",
        '/(alias)es$/i'             => "$1",
        '/(octop|vir)i$/i'          => "$1us",
        '/(cris|ax|test)es$/i'      => "$1is",
        '/(shoe)s$/i'               => "$1",
        '/(o)es$/i'                 => "$1",
        '/(bus)es$/i'               => "$1",
        '/([m|l])ice$/i'            => "$1ouse",
        '/(x|ch|ss|sh)es$/i'        => "$1",
        '/(m)ovies$/i'              => "$1ovie",
        '/(s)eries$/i'              => "$1eries",
        '/([^aeiouy]|qu)ies$/i'     => "$1y",
        '/([lr])ves$/i'             => "$1f",
        '/(tive)s$/i'               => "$1",
        '/(hive)s$/i'               => "$1",
        '/(li|wi|kni)ves$/i'        => "$1fe",
        '/(shea|loa|lea|thie)ves$/i'=> "$1f",
        '/(^analy)ses$/i'           => "$1sis",
        '/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i'  => "$1$2sis",
        '/([ti])a$/i'               => "$1um",
        '/(n)ews$/i'                => "$1ews",
        '/(h|bl)ouses$/i'           => "$1ouse",
        '/(corpse)s$/i'             => "$1",
        '/(us)es$/i'                => "$1",
        '/s$/i'                     => ""
    );

    static $irregular = array(
        'move'   => 'moves',
        'foot'   => 'feet',
        'goose'  => 'geese',
        'sex'    => 'sexes',
        'child'  => 'children',
        'man'    => 'men',
        'tooth'  => 'teeth',
        'person' => 'people'
    );

    static $uncountable = array(
        'sheep',
        'fish',
        'deer',
        'series',
        'species',
        'money',
        'rice',
        'information',
        'equipment'
    );

    public static function pluralize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular singular forms
        foreach ( self::$irregular as $pattern => $result )
        {
            $pattern = '/' . $pattern . '$/i';

            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string);
        }

        // check for matches using regular expressions
        foreach ( self::$plural as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function singularize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular plural forms
        foreach ( self::$irregular as $result => $pattern )
        {
            $pattern = '/' . $pattern . '$/i';

            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string);
        }

        // check for matches using regular expressions
        foreach ( self::$singular as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function pluralize_if($count, $string)
    {
        if ($count == 1)
            return "1 $string";
        else
            return $count . " " . self::pluralize($string);
    }
}

ActionScript:

package
{
public class Inflect
{
    private static var plural : Array = [
        [/(quiz)$/i,                     "$1zes"],
        [/^(ox)$/i,                      "$1en"],
        [/([m|l])ouse$/i,                "$1ice"],
        [/(matr|vert|ind)ix|ex$/i,       "$1ices"],
        [/(x|ch|ss|sh)$/i,               "$1es"],
        [/([^aeiouy]|qu)y$/i,            "$1ies"],
        [/(hive)$/i,                     "$1s"],
        [/(?:([^f])fe|([lr])f)$/i,       "$1$2ves"],
        [/(shea|lea|loa|thie)f$/i,       "$1ves"],
        [/sis$/i,                        "ses"],
        [/([ti])um$/i,                   "$1a"],
        [/(tomat|potat|ech|her|vet)o$/i, "$1oes"],
        [/(bu)s$/i,                      "$1ses"],
        [/(alias|status)$/i,             "$1es"],
        [/(octop)us$/i,                  "$1i"],
        [/(ax|test)is$/i,                "$1es"],
        [/(us)$/i,                       "$1es"],
        [/s$/i,                          "s"],
        [/$/i,                           "s"]
    ];

    private static var singular : Array = [
        [/(quiz)zes$/i,             "$1"],
        [/(matr)ices$/i,            "$1ix"],
        [/(vert|ind)ices$/i,        "$1ex"],
        [/^(ox)en$/i,               "$1"],
        [/(alias|status)es$/i,      "$1"],
        [/(octop|vir)i$/i,          "$1us"],
        [/(cris|ax|test)es$/i,      "$1is"],
        [/(shoe)s$/i,               "$1"],
        [/(o)es$/i,                 "$1"],
        [/(bus)es$/i,               "$1"],
        [/([m|l])ice$/i,            "$1ouse"],
        [/(x|ch|ss|sh)es$/i,        "$1"],
        [/(m)ovies$/i,              "$1ovie"],
        [/(s)eries$/i,              "$1eries"],
        [/([^aeiouy]|qu)ies$/i,     "$1y"],
        [/([lr])ves$/i,             "$1f"],
        [/(tive)s$/i,               "$1"],
        [/(hive)s$/i,               "$1"],
        [/(li|wi|kni)ves$/i,        "$1fe"],
        [/(shea|loa|lea|thie)ves$/i,"$1f"],
        [/(^analy)ses$/i,           "$1sis"],
        [/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i,  "$1$2sis"],
        [/([ti])a$/i,               "$1um"],
        [/(n)ews$/i,                "$1ews"],
        [/(h|bl)ouses$/i,           "$1ouse"],
        [/(corpse)s$/i,             "$1"],
        [/(us)es$/i,                "$1"],
        [/s$/i,                     ""]
    ];

    private static var irregular : Array = [
        ['move'   , 'moves'],
        ['foot'   , 'feet'],
        ['goose'  , 'geese'],
        ['sex'    , 'sexes'],
        ['child'  , 'children'],
        ['man'    , 'men'],
        ['tooth'  , 'teeth'],
        ['person' , 'people']
    ];

    private static var uncountable : Array = [
        'sheep',
        'fish',
        'deer',
        'series',
        'species',
        'money',
        'rice',
        'information',
        'equipment'
    ];

    public static function pluralize( string : String ) : String
    {
        var pattern : RegExp;
        var result : String;

        // save some time in the case that singular and plural are the same
        if (uncountable.indexOf(string.toLowerCase()) != -1)
          return string;

        // check for irregular singular forms
        var item : Array;
        for each ( item in irregular )
        {
            pattern = new RegExp(item[0] + "$", "i");
            result = item[1];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        // check for matches using regular expressions
        for each ( item in plural)
        {
            pattern = item[0];
            result = item[1];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        return string;
    }

    public static function singularize( string : String ) : String
    {
        var pattern : RegExp;
        var result : String

        // save some time in the case that singular and plural are the same
        if (uncountable.indexOf(string.toLowerCase()) != -1)
            return string;

        // check for irregular singular forms
        var item : Array;
        for each ( item in irregular )
        {
            pattern = new RegExp(item[1] + "$", "i");
            result = item[0];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
       }

       // check for matches using regular expressions
       for each ( item in singular)
       {
            pattern = item[0];
            result = item[1];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
       }

       return string;

    }

    public static function pluralizeIf(count : int, string : String) : String
    {
        if (count == 1)
            return "1 " + string;
        else
            return count.toString() + " " + pluralize(string);
    }
}
}

Ruby on Rails (use this to replace your Inflect.rb):

Inflector.inflections do |inflect|
    inflect.plural(/$/, 's')
    inflect.plural(/s$/i, 's')
    inflect.plural(/(us)$/i, '\\1es')
    inflect.plural(/(ax|test)is$/i, '\\1es')
    inflect.plural(/(octop)us$/i, '\\1i')
    inflect.plural(/(alias)$/i, '\\1es')
    inflect.plural(/(bu)s$/i, '\\1ses')
    inflect.plural(/(tomat|potat|ech|her|vet)o$/i, '\\1oes')
    inflect.plural(/([ti])um$/i, '\\1a')
    inflect.plural(/sis$/i, 'ses')
    inflect.plural(/(shea|lea|loa|thie)f$/i, '\\1ves')
    inflect.plural(/(?:([^f])fe|([lr])f)$/i, '\\1\\2ves')
    inflect.plural(/(hive)$/i, '\\1s')
    inflect.plural(/([^aeiouy]|qu)y$/i, '\\1ies')
    inflect.plural(/(x|ch|ss|sh)$/i, '\\1es')
    inflect.plural(/(matr|vert|ind)(?:ix|ex)$/i, '\\1ices')
    inflect.plural(/([m|l])ouse$/i, '\\1ice')
    inflect.plural(/^(ox)$/i, '\\1en')
    inflect.plural(/(quiz)$/i, '\\1zes')

    inflect.singular(/s$/i, '')
    inflect.singular(/(us)es$/i, '\\1')
    inflect.singular(/(corpse)s$/i, '\\1')
    inflect.singular(/(h|bl)ouses$/i, '\\1ouse')
    inflect.singular(/(n)ews$/i, '\\1ews')
    inflect.singular(/([ti])a$/i, '\\1um')
    inflect.singular(/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i, '\\1\\2sis')
    inflect.singular(/(^analy)ses$/i, '\\1sis')
    inflect.singular(/(shea|loa|lea|thie)ves$/i, '\\1f')
    inflect.singular(/(li|wi|kni)ves$/i, '\\1fe')
    inflect.singular(/(hive)s$/i, '\\1')
    inflect.singular(/(tive)s$/i, '\\1')
    inflect.singular(/([lr])ves$/i, '\\1f')
    inflect.singular(/([^aeiouy]|qu)ies$/i, '\\1y')
    inflect.singular(/(s)eries$/i, '\\1eries')
    inflect.singular(/(m)ovies$/i, '\\1ovie')
    inflect.singular(/(x|ch|ss|sh)es$/i, '\\1')
    inflect.singular(/([m|l])ice$/i, '\\1ouse')
    inflect.singular(/(bus)es$/i, '\\1')
    inflect.singular(/(o)es$/i, '\\1')
    inflect.singular(/(shoe)s$/i, '\\1')
    inflect.singular(/(cris|ax|test)es$/i, '\\1is')
    inflect.singular(/(octop|vir)i$/i, '\\1us')
    inflect.singular(/(alias)es$/i, '\\1')
    inflect.singular(/^(ox)en/i, '\\1')
    inflect.singular(/(vert|ind)ices$/i, '\\1ex')
    inflect.singular(/(matr)ices$/i, '\\1ix')
    inflect.singular(/(quiz)zes$/i, '\\1')

    inflect.irregular('person', 'people')
    inflect.irregular('tooth', 'teeth')
    inflect.irregular('man', 'men')
    inflect.irregular('child', 'children')
    inflect.irregular('sex', 'sexes')
    inflect.irregular('goose', 'geese')
    inflect.irregular('foot', 'feet')
    inflect.irregular('move', 'moves')

    inflect.uncountable(%w(equipment information rice money species series deer fish sheep))
end
5
Apr
2006

AS3 — on the lack of private and protected constructors

As I was talking about using objects as enums, someone made a comment about the lack of private and protected constructors. I know that the this is a sore spot, so I thought I’d explain my view of how we got here, and give my two cents.

The problem is that unlike what we did with ActionScript 2, we are working tightly with people from Mozilla and others to standardize on ECMAScript edition 4. The ECMAScript standard is not final, but we are adhering as closely as we can to the spec as it develops.
More »

8
Mar
2006

FAB - Flex / AJAX bridge

FAB - Flex/AJAX Bridge - is a library created by Ely Greenfield, who is a good friend and Flex Architect.

FAB lets you control Flex applications using JavaScript. Method calls just work, and getters and setters are converted to method calls (because browsers don’t support getters and setters yet).

You can even attach event listeners from JavaScript and pass function objects back and forth. For example, you can create an MXML file with a button in it and drive all of the logic from within your HTML/JavaScript.

MXML:

<mx:Application xmlns:mx=”http://www.macromedia.com/2005/mxml”>
    <fab:FABridge xmlns:fab=”bridge.*” />
    <mx:Button id=”okButton” label=”OK” />
</mx:Application>

JavaScript:

function init()
{
    var flexApp = FABridge.flash.root();
    flexApp.okButton().addEventListener(”click”, handleClick);
}

function handleClick(event)
{
    // handle the click event here.
}

You can read more about it on Ely’s blog, which I predict will be worth reading.

http://www.quietlyscheming.com/blog/2006/03/06/flex-and-ajax/

6
Mar
2006

Slides from Flashforward 2006

As promised, here are my slides and notes from Flashforward. Some things you will need to know:

  1. The layout syntax uses the new beta 2 syntax, which is not out yet.
  2. The notes cover a bit less material than the slides. That’s because the notes were finalized before the slides were, in order to get them printed on paper for conference attendees. If I have time, I may extend the notes but with so much going on, I am not sure I will have time to do this.

[Slides - ppt] [Slides - pdf]
[Speaker notes - doc] [Speaker notes - pdf]

24
Feb
2006

Evolution of a feature - layout constraints

We spend a lot of time here on the Flex / Flex Builder team arguing over exactly how each piece of the framework should work, and how the syntax should look.

I thought it might be interesting to have a glimpse into how some of these features evolve, and why. I’m picking a particularly controversial example: layout constraints.
More »