~ actionscript ~

7
Jan
2008

The pluralizer: weekend results

Over the weekend, people tried out around three hundred words, and taught the pluralizer a bunch of new words.

Some words I rejected:
German words (Buch -> Bücher, for example)
Kanji / Chinese (木 -> 木)
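
One way to screen out submissions like these automatically is a simple character filter. The following is only a sketch of that idea, not the filter the pluralizer app actually uses; the constant and method names are made up for illustration.

```ruby
# Hypothetical filter: accept only words made of unaccented ASCII letters,
# optionally joined by single hyphens or apostrophes.
ENGLISH_WORD = /\A[a-z]+(?:[-'][a-z]+)*\z/i

def plausible_english_word?(word)
  ENGLISH_WORD.match?(word)
end

puts plausible_english_word?("singularity")  # true
puts plausible_english_word?("Bücher")       # false
puts plausible_english_word?("木")            # false
```

A filter like this cannot tell a real English word from nonsense made of ASCII letters, so it only removes the most obvious rejects.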

Someone also claimed that the plural of “singularity” is “not applicable”. Uh… funny, but my physics background tells me that you can have more than one singularity. Black holes, anyone?

If you would like to feed the pluralizer, it can be found here.

4
Jan
2008

Mob intelligence - Please feed the pluralizer

I’ve always been very interested in how groups of people come together to create a body of information.

Now a few weeks ago, I posted some code to pluralize nouns. It’s based on the Ruby on Rails code, and it has been tweaked to include more words and ported to PHP and ActionScript.

It occurred to me that maybe people out there could help make the pluralizer smarter. So I’ve written a small app that lets you input words to test out the pluralizer. If the pluralizer gets any of the words wrong, just type in the correct plural and the code will correct itself.
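
To make the "correct itself" idea concrete, here is a minimal sketch of one way it could work: taught words are stored in a lookup table that is checked before the general rules. This is an assumption about the approach, not the app's actual code; the class name and the naive "add s" fallback are invented for the example.

```ruby
# Hypothetical sketch: user-taught corrections are consulted before the rules.
class TeachablePluralizer
  def initialize
    @learned = {}  # singular (lowercase) => plural taught by a user
  end

  # Record a correction supplied by a user.
  def teach(singular, plural)
    @learned[singular.downcase] = plural
  end

  # Prefer a taught plural; otherwise fall back to a naive "add s" rule
  # that stands in for the full rule table.
  def pluralize(word)
    @learned.fetch(word.downcase) { word + "s" }
  end
end

pluralizer = TeachablePluralizer.new
puts pluralizer.pluralize("cactus")  # cactuss (the naive fallback gets it wrong)
pluralizer.teach("cactus", "cacti")
puts pluralizer.pluralize("cactus")  # cacti
```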


Will this work? I don’t know. It all depends on whether enough people find it interesting to teach the pluralizer new words. It also depends on people not screwing it up with junk words.

Try it out and LMK what you think.

17
Dec
2007

Improved pluralizing in PHP, ActionScript, and RoR

I found a number of issues with the pluralizing code I posted the other day, and I felt compelled to fix them. While I was at it, I ported the code to ActionScript 3 and then back to Ruby, so you can use the pluralizer in your Flex or RoR apps.

BTW, for those of you who love language and know regular expressions (this means you, lori and nj!), please help me spot any problems with these rules.

The reason it was important for me to get these pluralization rules right is that I am using them for human-readable strings. In my project, users get to name things however they want, and it is nice to be able to pluralize (or singularize) these words when communicating with the user.

(As a side note, I find the Ruby on Rails idea of auto-pluralizing table names a bit bizarre. I like having magic frameworks that do all the work for me, but I want my frameworks to have predictable behavior! I mean… how bad would it be if the table that holds the “person” object was called “person” instead of “people”?)

Thanks again to the Rails team for getting this ball rolling, and to Paul Osman for the original PHP version of this code.

On to the code. All code below is covered under the MIT license.

PHP:

// Thanks to http://www.eval.ca/articles/php-pluralize (MIT license)
//           http://dev.rubyonrails.org/browser/trunk/activesupport/lib/active_support/inflections.rb (MIT license)
//           http://www.fortunecity.com/bally/durrus/153/gramch13.html
//           http://www2.gsu.edu/~wwwesl/egw/crump.htm
//
// Changes (12/17/07)
//   Major changes
//   --
//   Fixed irregular noun algorithm to use regular expressions just like the original Ruby source.
//       (this allows for things like fireman -> firemen)
//   Fixed the order of the singular array, which was backwards.
//
//   Minor changes
//   --
//   Removed incorrect pluralization rule for /([^aeiouy]|qu)ies$/ => $1y
//   Expanded on the list of exceptions for *o -> *oes, and removed rule for buffalo -> buffaloes
//   Removed dangerous singularization rule for /([^f])ves$/ => $1fe
//   Added more specific rules for singularizing lives, wives, knives, sheaves, loaves, leaves, and thieves
//   Added exception to /(us)es$/ => $1 rule for houses => house and blouses => blouse
//   Added exceptions for feet, geese and teeth
//   Added rule for deer -> deer

// Changes:
//   Removed rule for virus -> viri
//   Added rule for potato -> potatoes
//   Added rule for *us -> *uses

class Inflect
{
    static $plural = array(
        '/(quiz)$/i'               => "$1zes",
        '/^(ox)$/i'                => "$1en",
        '/([ml])ouse$/i'           => "$1ice",
        '/(matr|vert|ind)(?:ix|ex)$/i' => "$1ices",
        '/(x|ch|ss|sh)$/i'         => "$1es",
        '/([^aeiouy]|qu)y$/i'      => "$1ies",
        '/(hive)$/i'               => "$1s",
        '/(?:([^f])fe|([lr])f)$/i' => "$1$2ves",
        '/(shea|lea|loa|thie)f$/i' => "$1ves",
        '/sis$/i'                  => "ses",
        '/([ti])um$/i'             => "$1a",
        '/(tomat|potat|ech|her|vet)o$/i'=> "$1oes",
        '/(bu)s$/i'                => "$1ses",
        '/(alias)$/i'              => "$1es",
        '/(octop)us$/i'            => "$1i",
        '/(ax|test)is$/i'          => "$1es",
        '/(us)$/i'                 => "$1es",
        '/s$/i'                    => "s",
        '/$/'                      => "s"
    );

    static $singular = array(
        '/(quiz)zes$/i'             => "$1",
        '/(matr)ices$/i'            => "$1ix",
        '/(vert|ind)ices$/i'        => "$1ex",
        '/^(ox)en$/i'               => "$1",
        '/(alias)es$/i'             => "$1",
        '/(octop|vir)i$/i'          => "$1us",
        '/(cris|ax|test)es$/i'      => "$1is",
        '/(shoe)s$/i'               => "$1",
        '/(o)es$/i'                 => "$1",
        '/(bus)es$/i'               => "$1",
        '/([ml])ice$/i'             => "$1ouse",
        '/(x|ch|ss|sh)es$/i'        => "$1",
        '/(m)ovies$/i'              => "$1ovie",
        '/(s)eries$/i'              => "$1eries",
        '/([^aeiouy]|qu)ies$/i'     => "$1y",
        '/([lr])ves$/i'             => "$1f",
        '/(tive)s$/i'               => "$1",
        '/(hive)s$/i'               => "$1",
        '/(li|wi|kni)ves$/i'        => "$1fe",
        '/(shea|loa|lea|thie)ves$/i'=> "$1f",
        '/(^analy)ses$/i'           => "$1sis",
        '/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i'  => "$1$2sis",
        '/([ti])a$/i'               => "$1um",
        '/(n)ews$/i'                => "$1ews",
        '/(h|bl)ouses$/i'           => "$1ouse",
        '/(corpse)s$/i'             => "$1",
        '/(us)es$/i'                => "$1",
        '/s$/i'                     => ""
    );

    static $irregular = array(
        'move'   => 'moves',
        'foot'   => 'feet',
        'goose'  => 'geese',
        'sex'    => 'sexes',
        'child'  => 'children',
        'man'    => 'men',
        'tooth'  => 'teeth',
        'person' => 'people'
    );

    static $uncountable = array(
        'sheep',
        'fish',
        'deer',
        'series',
        'species',
        'money',
        'rice',
        'information',
        'equipment'
    );

    public static function pluralize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular singular forms
        foreach ( self::$irregular as $pattern => $result )
        {
            $pattern = '/' . $pattern . '$/i';

            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string);
        }

        // check for matches using regular expressions
        foreach ( self::$plural as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function singularize( $string )
    {
        // save some time in the case that singular and plural are the same
        if ( in_array( strtolower( $string ), self::$uncountable ) )
            return $string;

        // check for irregular plural forms
        foreach ( self::$irregular as $result => $pattern )
        {
            $pattern = '/' . $pattern . '$/i';

            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string);
        }

        // check for matches using regular expressions
        foreach ( self::$singular as $pattern => $result )
        {
            if ( preg_match( $pattern, $string ) )
                return preg_replace( $pattern, $result, $string );
        }

        return $string;
    }

    public static function pluralize_if($count, $string)
    {
        if ($count == 1)
            return "1 $string";
        else
            return $count . " " . self::pluralize($string);
    }
}
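
The rule tables are scanned from top to bottom and the first pattern that matches wins, which is why the specific rules have to come before the catch-all ones. Here is a minimal Ruby sketch of that lookup using a trimmed subset of the rules; `PLURAL_RULES` and `pluralize` are names chosen for this illustration. It also shows why the Ruby on Rails version of the matrix/vertex/index rule groups its alternation as `(?:ix|ex)`: without the group, the `ex$` branch stands on its own and matches any word ending in "ex".

```ruby
# First-match-wins lookup over an ordered rule table (illustrative subset).
PLURAL_RULES = [
  [/(quiz)$/i,                   '\1zes'],
  [/(matr|vert|ind)(?:ix|ex)$/i, '\1ices'],
  [/(x|ch|ss|sh)$/i,             '\1es'],
  [/([^aeiouy]|qu)y$/i,          '\1ies'],
  [/$/,                          's']      # catch-all, must stay last
].freeze

def pluralize(word)
  PLURAL_RULES.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word.match?(pattern)
  end
  word
end

puts pluralize("vertex")  # vertices
puts pluralize("box")     # boxes
puts pluralize("city")    # cities
puts pluralize("cat")     # cats

# Alternation binds more loosely than concatenation, so the ungrouped form
# reads as "(matr|vert|ind)ix" OR "ex$".
ungrouped = /(matr|vert|ind)ix|ex$/i
puts "reflex".sub(ungrouped, '\1ices')  # reflices
puts pluralize("reflex")                # reflexes
```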

ActionScript:

package
{
public class Inflect
{
    private static var plural : Array = [
        [/(quiz)$/i,                     "$1zes"],
        [/^(ox)$/i,                      "$1en"],
        [/([ml])ouse$/i,                 "$1ice"],
        [/(matr|vert|ind)(?:ix|ex)$/i,   "$1ices"],
        [/(x|ch|ss|sh)$/i,               "$1es"],
        [/([^aeiouy]|qu)y$/i,            "$1ies"],
        [/(hive)$/i,                     "$1s"],
        [/(?:([^f])fe|([lr])f)$/i,       "$1$2ves"],
        [/(shea|lea|loa|thie)f$/i,       "$1ves"],
        [/sis$/i,                        "ses"],
        [/([ti])um$/i,                   "$1a"],
        [/(tomat|potat|ech|her|vet)o$/i, "$1oes"],
        [/(bu)s$/i,                      "$1ses"],
        [/(alias|status)$/i,             "$1es"],
        [/(octop)us$/i,                  "$1i"],
        [/(ax|test)is$/i,                "$1es"],
        [/(us)$/i,                       "$1es"],
        [/s$/i,                          "s"],
        [/$/i,                           "s"]
    ];

    private static var singular : Array = [
        [/(quiz)zes$/i,             "$1"],
        [/(matr)ices$/i,            "$1ix"],
        [/(vert|ind)ices$/i,        "$1ex"],
        [/^(ox)en$/i,               "$1"],
        [/(alias|status)es$/i,      "$1"],
        [/(octop|vir)i$/i,          "$1us"],
        [/(cris|ax|test)es$/i,      "$1is"],
        [/(shoe)s$/i,               "$1"],
        [/(o)es$/i,                 "$1"],
        [/(bus)es$/i,               "$1"],
        [/([ml])ice$/i,             "$1ouse"],
        [/(x|ch|ss|sh)es$/i,        "$1"],
        [/(m)ovies$/i,              "$1ovie"],
        [/(s)eries$/i,              "$1eries"],
        [/([^aeiouy]|qu)ies$/i,     "$1y"],
        [/([lr])ves$/i,             "$1f"],
        [/(tive)s$/i,               "$1"],
        [/(hive)s$/i,               "$1"],
        [/(li|wi|kni)ves$/i,        "$1fe"],
        [/(shea|loa|lea|thie)ves$/i,"$1f"],
        [/(^analy)ses$/i,           "$1sis"],
        [/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i,  "$1$2sis"],
        [/([ti])a$/i,               "$1um"],
        [/(n)ews$/i,                "$1ews"],
        [/(h|bl)ouses$/i,           "$1ouse"],
        [/(corpse)s$/i,             "$1"],
        [/(us)es$/i,                "$1"],
        [/s$/i,                     ""]
    ];

    private static var irregular : Array = [
        ['move'   , 'moves'],
        ['foot'   , 'feet'],
        ['goose'  , 'geese'],
        ['sex'    , 'sexes'],
        ['child'  , 'children'],
        ['man'    , 'men'],
        ['tooth'  , 'teeth'],
        ['person' , 'people']
    ];

    private static var uncountable : Array = [
        'sheep',
        'fish',
        'deer',
        'series',
        'species',
        'money',
        'rice',
        'information',
        'equipment'
    ];

    public static function pluralize( string : String ) : String
    {
        var pattern : RegExp;
        var result : String;

        // save some time in the case that singular and plural are the same
        if (uncountable.indexOf(string.toLowerCase()) != -1)
          return string;

        // check for irregular singular forms
        var item : Array;
        for each ( item in irregular )
        {
            pattern = new RegExp(item[0] + "$", "i");
            result = item[1];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        // check for matches using regular expressions
        for each ( item in plural)
        {
            pattern = item[0];
            result = item[1];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        return string;
    }

    public static function singularize( string : String ) : String
    {
        var pattern : RegExp;
        var result : String;

        // save some time in the case that singular and plural are the same
        if (uncountable.indexOf(string.toLowerCase()) != -1)
            return string;

        // check for irregular plural forms
        var item : Array;
        for each ( item in irregular )
        {
            pattern = new RegExp(item[1] + "$", "i");
            result = item[0];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        // check for matches using regular expressions
        for each ( item in singular)
        {
            pattern = item[0];
            result = item[1];

            if (pattern.test(string))
            {
                return string.replace(pattern, result);
            }
        }

        return string;
    }

    public static function pluralizeIf(count : int, string : String) : String
    {
        if (count == 1)
            return "1 " + string;
        else
            return count.toString() + " " + pluralize(string);
    }
}
}
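
Both ports above turn each irregular entry into a pattern anchored at the end of the word, which is what makes compounds like fireman -> firemen work. The Ruby sketch below isolates just that step with a small subset of the table; the names are chosen for this example and it is not a replacement for the full code. It also shows the flip side: suffix matching can over-reach on words that merely end in an irregular noun.

```ruby
# Illustrative subset of the irregular table; names are chosen for this sketch.
IRREGULAR = {
  "child"  => "children",
  "man"    => "men",
  "person" => "people"
}.freeze

# Return the irregular plural if any table entry matches the END of the word,
# or nil so the caller can fall through to the regular rules.
def pluralize_irregular(word)
  IRREGULAR.each do |singular, plural|
    pattern = /#{Regexp.escape(singular)}$/i
    return word.sub(pattern, plural) if word.match?(pattern)
  end
  nil
end

puts pluralize_irregular("fireman")     # firemen
puts pluralize_irregular("grandchild")  # grandchildren
puts pluralize_irregular("human")       # humen (suffix matching over-reaches)
p pluralize_irregular("table")          # nil
```

If cases like "human" matter, one option is to add them to the table as their own entries ahead of "man".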

Ruby on Rails (use this to replace your inflections.rb):

Inflector.inflections do |inflect|
    inflect.plural(/$/, 's')
    inflect.plural(/s$/i, 's')
    inflect.plural(/(us)$/i, '\\1es')
    inflect.plural(/(ax|test)is$/i, '\\1es')
    inflect.plural(/(octop)us$/i, '\\1i')
    inflect.plural(/(alias)$/i, '\\1es')
    inflect.plural(/(bu)s$/i, '\\1ses')
    inflect.plural(/(tomat|potat|ech|her|vet)o$/i, '\\1oes')
    inflect.plural(/([ti])um$/i, '\\1a')
    inflect.plural(/sis$/i, 'ses')
    inflect.plural(/(shea|lea|loa|thie)f$/i, '\\1ves')
    inflect.plural(/(?:([^f])fe|([lr])f)$/i, '\\1\\2ves')
    inflect.plural(/(hive)$/i, '\\1s')
    inflect.plural(/([^aeiouy]|qu)y$/i, '\\1ies')
    inflect.plural(/(x|ch|ss|sh)$/i, '\\1es')
    inflect.plural(/(matr|vert|ind)(?:ix|ex)$/i, '\\1ices')
    inflect.plural(/([ml])ouse$/i, '\\1ice')
    inflect.plural(/^(ox)$/i, '\\1en')
    inflect.plural(/(quiz)$/i, '\\1zes')

    inflect.singular(/s$/i, '')
    inflect.singular(/(us)es$/i, '\\1')
    inflect.singular(/(corpse)s$/i, '\\1')
    inflect.singular(/(h|bl)ouses$/i, '\\1ouse')
    inflect.singular(/(n)ews$/i, '\\1ews')
    inflect.singular(/([ti])a$/i, '\\1um')
    inflect.singular(/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i, '\\1\\2sis')
    inflect.singular(/(^analy)ses$/i, '\\1sis')
    inflect.singular(/(shea|loa|lea|thie)ves$/i, '\\1f')
    inflect.singular(/(li|wi|kni)ves$/i, '\\1fe')
    inflect.singular(/(hive)s$/i, '\\1')
    inflect.singular(/(tive)s$/i, '\\1')
    inflect.singular(/([lr])ves$/i, '\\1f')
    inflect.singular(/([^aeiouy]|qu)ies$/i, '\\1y')
    inflect.singular(/(s)eries$/i, '\\1eries')
    inflect.singular(/(m)ovies$/i, '\\1ovie')
    inflect.singular(/(x|ch|ss|sh)es$/i, '\\1')
    inflect.singular(/([ml])ice$/i, '\\1ouse')
    inflect.singular(/(bus)es$/i, '\\1')
    inflect.singular(/(o)es$/i, '\\1')
    inflect.singular(/(shoe)s$/i, '\\1')
    inflect.singular(/(cris|ax|test)es$/i, '\\1is')
    inflect.singular(/(octop|vir)i$/i, '\\1us')
    inflect.singular(/(alias)es$/i, '\\1')
    inflect.singular(/^(ox)en/i, '\\1')
    inflect.singular(/(vert|ind)ices$/i, '\\1ex')
    inflect.singular(/(matr)ices$/i, '\\1ix')
    inflect.singular(/(quiz)zes$/i, '\\1')

    inflect.irregular('person', 'people')
    inflect.irregular('tooth', 'teeth')
    inflect.irregular('man', 'men')
    inflect.irregular('child', 'children')
    inflect.irregular('sex', 'sexes')
    inflect.irregular('goose', 'geese')
    inflect.irregular('foot', 'feet')
    inflect.irregular('move', 'moves')

    inflect.uncountable(%w(equipment information rice money species series deer fish sheep))
end
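
The PHP and ActionScript versions include a small helper for building count strings, which is the human-readable use case these rules were written for. A rough Ruby equivalent could look like the sketch below. It takes the pluralizer as a callable so that it runs on its own; the `pluralize_if` signature and the naive lambda are assumptions made for this example, and in an app you would pass in the real inflector instead.

```ruby
# Build a label such as "1 widget" or "3 widgets".
# `pluralizer` is any callable that maps a singular word to its plural.
def pluralize_if(count, word, pluralizer)
  count == 1 ? "1 #{word}" : "#{count} #{pluralizer.call(word)}"
end

# Naive stand-in for the real rule table, used only to keep this runnable.
naive = ->(word) { word + "s" }

puts pluralize_if(1, "widget", naive)  # 1 widget
puts pluralize_if(3, "widget", naive)  # 3 widgets
puts pluralize_if(0, "widget", naive)  # 0 widgets
```
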
15
Jun
2006

Update: Avoid ints in ActionScript

After my post, Grant Skinner ran some experiments on the performance discrepancy between Number and int. He originally posted a basic test, but has since expanded his test cases to show how different mathematical operations perform.

You can see his results here.

15
Jun
2006

Avoid ints in ActionScript

The more I play with Flex, the more I learn, and the more I learn about ints, the less I want to use them. I’ve concluded that I’m going to stop using ints unless I really need them.

Reason 1: Numbers may actually be faster than ints

Surprising, but true. ECMAScript Edition 4 is designed to be a language that is as compatible as possible with earlier versions of ECMAScript. As it turns out, this makes it difficult to ensure that math works “correctly” in seemingly innocuous cases.

// Assumes this method lives in a Flex class that imports mx.controls.Alert.
public function timingTest() : void
{
	var intTime : Number;
	var numberTime : Number;

	var i : int;
	var j : int = 0;

	intTime = (new Date()).time;
	for (i=0; i<10000000; i++)
		j = (j + 15) / 7;

	intTime = (new Date()).time - intTime;

	var n : Number;
	var m : Number = 0;

	numberTime = (new Date()).time;
	for (n=0; n<10000000; n++)
		m = (m + 15) / 7;

	numberTime = (new Date()).time - numberTime;

	var message : String =
		"int version: " + intTime + "ms\n" +
		"Number version: " + numberTime + "ms";

	Alert.show(message);
}

Which version do you think wins? On my machine, the int version takes 331ms, while the Number version takes 291ms. Why is this? Let’s look at the following expression:

j = (j + 15) / 7;

What happens if you start with the value j = 2^31 - 1? In some languages, you would run into overflow issues as soon as you add 15 to it. ECMAScript, however, has a looser concept of numbers. The system is supposed to move smoothly from ints to doubles as needed. Because of this, virtually all math is done internally as Number, not as int.

Given that everything is being done as a Number anyway, converting from int to Number and back again only adds cost, which is why the int version is slower.
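
The overflow case is easy to check by hand. The sketch below uses Ruby purely as a calculator, since its integers never overflow: `wrap_int32` is a hypothetical helper that simulates what a fixed-width signed 32-bit add would produce, and the last line shows the same sum carried out as a double, which is how Number behaves.

```ruby
INT32_MAX = 2**31 - 1  # 2147483647

# Wrap an arbitrary integer into the signed 32-bit range, the way a
# fixed-width int would behave on overflow.
def wrap_int32(value)
  ((value + 2**31) % 2**32) - 2**31
end

sum = INT32_MAX + 15
puts wrap_int32(sum)       # -2147483634
puts INT32_MAX.to_f + 15   # 2147483662.0
```

The double has plenty of room for the true result, so doing the arithmetic as Number avoids the wraparound entirely.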

There is a second counterintuitive reason for using Number over int, which is that Number actually lets you store integral values more precisely than ints do…