Improved pluralizing in PHP, ActionScript, and RoR
I found a number of issues with the pluralizing code I posted the other day, and I felt compelled to fix it. While I was at it, I ported the code to ActionScript 3 and then back to Ruby, so you can use the pluralizer in your Flex or RoR apps.
BTW, for those of you who love language and know regular expressions (this means you, lori and nj!), please help me spot any problems with these rules.
The reason it was important for me to get these pluralization rules right is that I am using them for human-readable strings. In my project, users get to name things however they want, and it is nice to be able to pluralize (or singularize) these words when communicating with the user.
(As a side note, I find the Ruby on Rails idea of auto-pluralizing table names a bit bizarre. I like having magic frameworks that do all the work for me, but I want my frameworks to have predictable behavior! I mean… how bad would it be if the table that holds the “person” object was called “person” instead of “people”?)
Thanks again to the Rails team for getting this ball rolling, and to Paul Osman for the original PHP version of this code.
On to the code. All code below is covered under the MIT license.
PHP:
// Thanks to http://www.eval.ca/articles/php-pluralize (MIT license)
// http://dev.rubyonrails.org/browser/trunk/activesupport/lib/active_support/inflections.rb (MIT license)
// http://www.fortunecity.com/bally/durrus/153/gramch13.html
// http://www2.gsu.edu/~wwwesl/egw/crump.htm
//
// Changes (12/17/07)
// Major changes
// --
// Fixed irregular noun algorithm to use regular expressions just like the original Ruby source.
// (this allows for things like fireman -> firemen)
// Fixed the order of the singular array, which was backwards.
//
// Minor changes
// --
// Removed incorrect pluralization rule for /([^aeiouy]|qu)ies$/ => $1y
// Expanded on the list of exceptions for *o -> *oes, and removed rule for buffalo -> buffaloes
// Removed dangerous singularization rule for /([^f])ves$/ => $1fe
// Added more specific rules for singularizing lives, wives, knives, sheaves, loaves, leaves, and thieves
// Added exception to /(us)es$/ => $1 rule for houses => house and blouses => blouse
// Added exceptions for feet, geese and teeth
// Added rule for deer -> deer
// Changes:
// Removed rule for virus -> viri
// Added rule for potato -> potatoes
// Added rule for *us -> *uses
class Inflect
{
static $plural = array(
'/(quiz)$/i' => "$1zes",
'/^(ox)$/i' => "$1en",
'/([ml])ouse$/i' => "$1ice",
'/(matr|vert|ind)(?:ix|ex)$/i' => "$1ices",
'/(x|ch|ss|sh)$/i' => "$1es",
'/([^aeiouy]|qu)y$/i' => "$1ies",
'/(hive)$/i' => "$1s",
'/(?:([^f])fe|([lr])f)$/i' => "$1$2ves",
'/(shea|lea|loa|thie)f$/i' => "$1ves",
'/sis$/i' => "ses",
'/([ti])um$/i' => "$1a",
'/(tomat|potat|ech|her|vet)o$/i'=> "$1oes",
'/(bu)s$/i' => "$1ses",
'/(alias)$/i' => "$1es",
'/(octop)us$/i' => "$1i",
'/(ax|test)is$/i' => "$1es",
'/(us)$/i' => "$1es",
'/s$/i' => "s",
'/$/' => "s"
);
static $singular = array(
'/(quiz)zes$/i' => "$1",
'/(matr)ices$/i' => "$1ix",
'/(vert|ind)ices$/i' => "$1ex",
'/^(ox)en$/i' => "$1",
'/(alias)es$/i' => "$1",
'/(octop|vir)i$/i' => "$1us",
'/(cris|ax|test)es$/i' => "$1is",
'/(shoe)s$/i' => "$1",
'/(o)es$/i' => "$1",
'/(bus)es$/i' => "$1",
'/([ml])ice$/i' => "$1ouse",
'/(x|ch|ss|sh)es$/i' => "$1",
'/(m)ovies$/i' => "$1ovie",
'/(s)eries$/i' => "$1eries",
'/([^aeiouy]|qu)ies$/i' => "$1y",
'/([lr])ves$/i' => "$1f",
'/(tive)s$/i' => "$1",
'/(hive)s$/i' => "$1",
'/(li|wi|kni)ves$/i' => "$1fe",
'/(shea|loa|lea|thie)ves$/i'=> "$1f",
'/(^analy)ses$/i' => "$1sis",
'/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i' => "$1sis",
'/([ti])a$/i' => "$1um",
'/(n)ews$/i' => "$1ews",
'/(h|bl)ouses$/i' => "$1ouse",
'/(corpse)s$/i' => "$1",
'/(us)es$/i' => "$1",
'/s$/i' => ""
);
static $irregular = array(
'move' => 'moves',
'foot' => 'feet',
'goose' => 'geese',
'sex' => 'sexes',
'child' => 'children',
'man' => 'men',
'tooth' => 'teeth',
'person' => 'people'
);
static $uncountable = array(
'sheep',
'fish',
'deer',
'series',
'species',
'money',
'rice',
'information',
'equipment'
);
public static function pluralize( $string )
{
// save some time in the case that singular and plural are the same
if ( in_array( strtolower( $string ), self::$uncountable ) )
return $string;
// check for irregular singular forms
foreach ( self::$irregular as $pattern => $result )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for matches using regular expressions
foreach ( self::$plural as $pattern => $result )
{
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string );
}
return $string;
}
public static function singularize( $string )
{
// save some time in the case that singular and plural are the same
if ( in_array( strtolower( $string ), self::$uncountable ) )
return $string;
// check for irregular plural forms
foreach ( self::$irregular as $result => $pattern )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for matches using regular expressions
foreach ( self::$singular as $pattern => $result )
{
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string );
}
return $string;
}
public static function pluralize_if($count, $string)
{
if ($count == 1)
return "1 $string";
else
return $count . " " . self::pluralize($string);
}
}
ActionScript:
package
{
public class Inflect
{
private static var plural : Array = [
[/(quiz)$/i, "$1zes"],
[/^(ox)$/i, "$1en"],
[/([ml])ouse$/i, "$1ice"],
[/(matr|vert|ind)(?:ix|ex)$/i, "$1ices"],
[/(x|ch|ss|sh)$/i, "$1es"],
[/([^aeiouy]|qu)y$/i, "$1ies"],
[/(hive)$/i, "$1s"],
[/(?:([^f])fe|([lr])f)$/i, "$1$2ves"],
[/(shea|lea|loa|thie)f$/i, "$1ves"],
[/sis$/i, "ses"],
[/([ti])um$/i, "$1a"],
[/(tomat|potat|ech|her|vet)o$/i, "$1oes"],
[/(bu)s$/i, "$1ses"],
[/(alias|status)$/i, "$1es"],
[/(octop)us$/i, "$1i"],
[/(ax|test)is$/i, "$1es"],
[/(us)$/i, "$1es"],
[/s$/i, "s"],
[/$/i, "s"]
];
private static var singular : Array = [
[/(quiz)zes$/i, "$1"],
[/(matr)ices$/i, "$1ix"],
[/(vert|ind)ices$/i, "$1ex"],
[/^(ox)en$/i, "$1"],
[/(alias|status)es$/i, "$1"],
[/(octop|vir)i$/i, "$1us"],
[/(cris|ax|test)es$/i, "$1is"],
[/(shoe)s$/i, "$1"],
[/(o)es$/i, "$1"],
[/(bus)es$/i, "$1"],
[/([ml])ice$/i, "$1ouse"],
[/(x|ch|ss|sh)es$/i, "$1"],
[/(m)ovies$/i, "$1ovie"],
[/(s)eries$/i, "$1eries"],
[/([^aeiouy]|qu)ies$/i, "$1y"],
[/([lr])ves$/i, "$1f"],
[/(tive)s$/i, "$1"],
[/(hive)s$/i, "$1"],
[/(li|wi|kni)ves$/i, "$1fe"],
[/(shea|loa|lea|thie)ves$/i,"$1f"],
[/(^analy)ses$/i, "$1sis"],
[/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i, "$1sis"],
[/([ti])a$/i, "$1um"],
[/(n)ews$/i, "$1ews"],
[/(h|bl)ouses$/i, "$1ouse"],
[/(corpse)s$/i, "$1"],
[/(us)es$/i, "$1"],
[/s$/i, ""]
];
private static var irregular : Array = [
['move' , 'moves'],
['foot' , 'feet'],
['goose' , 'geese'],
['sex' , 'sexes'],
['child' , 'children'],
['man' , 'men'],
['tooth' , 'teeth'],
['person' , 'people']
];
private static var uncountable : Array = [
'sheep',
'fish',
'deer',
'series',
'species',
'money',
'rice',
'information',
'equipment'
];
public static function pluralize( string : String ) : String
{
var pattern : RegExp;
var result : String;
// save some time in the case that singular and plural are the same
if (uncountable.indexOf(string.toLowerCase()) != -1)
return string;
// check for irregular singular forms
var item : Array;
for each ( item in irregular )
{
pattern = new RegExp(item[0] + "$", "i");
result = item[1];
if (pattern.test(string))
{
return string.replace(pattern, result);
}
}
// check for matches using regular expressions
for each ( item in plural)
{
pattern = item[0];
result = item[1];
if (pattern.test(string))
{
return string.replace(pattern, result);
}
}
return string;
}
public static function singularize( string : String ) : String
{
var pattern : RegExp;
var result : String;
// save some time in the case that singular and plural are the same
if (uncountable.indexOf(string.toLowerCase()) != -1)
return string;
// check for irregular plural forms
var item : Array;
for each ( item in irregular )
{
pattern = new RegExp(item[1] + "$", "i");
result = item[0];
if (pattern.test(string))
{
return string.replace(pattern, result);
}
}
// check for matches using regular expressions
for each ( item in singular)
{
pattern = item[0];
result = item[1];
if (pattern.test(string))
{
return string.replace(pattern, result);
}
}
return string;
}
public static function pluralizeIf(count : int, string : String) : String
{
if (count == 1)
return "1 " + string;
else
return count.toString() + " " + pluralize(string);
}
}
}
Ruby on Rails (use this to replace your inflections.rb):
Inflector.inflections do |inflect|
inflect.plural(/$/, 's')
inflect.plural(/s$/i, 's')
inflect.plural(/(us)$/i, '\\1es')
inflect.plural(/(ax|test)is$/i, '\\1es')
inflect.plural(/(octop)us$/i, '\\1i')
inflect.plural(/(alias)$/i, '\\1es')
inflect.plural(/(bu)s$/i, '\\1ses')
inflect.plural(/(tomat|potat|ech|her|vet)o$/i, '\\1oes')
inflect.plural(/([ti])um$/i, '\\1a')
inflect.plural(/sis$/i, 'ses')
inflect.plural(/(shea|lea|loa|thie)f$/i, '\\1ves')
inflect.plural(/(?:([^f])fe|([lr])f)$/i, '\\1\\2ves')
inflect.plural(/(hive)$/i, '\\1s')
inflect.plural(/([^aeiouy]|qu)y$/i, '\\1ies')
inflect.plural(/(x|ch|ss|sh)$/i, '\\1es')
inflect.plural(/(matr|vert|ind)(?:ix|ex)$/i, '\\1ices')
inflect.plural(/([ml])ouse$/i, '\\1ice')
inflect.plural(/^(ox)$/i, '\\1en')
inflect.plural(/(quiz)$/i, '\\1zes')
inflect.singular(/s$/i, '')
inflect.singular(/(us)es$/i, '\\1')
inflect.singular(/(corpse)s$/i, '\\1')
inflect.singular(/(h|bl)ouses$/i, '\\1ouse')
inflect.singular(/(n)ews$/i, '\\1ews')
inflect.singular(/([ti])a$/i, '\\1um')
inflect.singular(/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i, '\\1sis')
inflect.singular(/(^analy)ses$/i, '\\1sis')
inflect.singular(/(shea|loa|lea|thie)ves$/i, '\\1f')
inflect.singular(/(li|wi|kni)ves$/i, '\\1fe')
inflect.singular(/(hive)s$/i, '\\1')
inflect.singular(/(tive)s$/i, '\\1')
inflect.singular(/([lr])ves$/i, '\\1f')
inflect.singular(/([^aeiouy]|qu)ies$/i, '\\1y')
inflect.singular(/(s)eries$/i, '\\1eries')
inflect.singular(/(m)ovies$/i, '\\1ovie')
inflect.singular(/(x|ch|ss|sh)es$/i, '\\1')
inflect.singular(/([ml])ice$/i, '\\1ouse')
inflect.singular(/(bus)es$/i, '\\1')
inflect.singular(/(o)es$/i, '\\1')
inflect.singular(/(shoe)s$/i, '\\1')
inflect.singular(/(cris|ax|test)es$/i, '\\1is')
inflect.singular(/(octop|vir)i$/i, '\\1us')
inflect.singular(/(alias)es$/i, '\\1')
inflect.singular(/^(ox)en$/i, '\\1')
inflect.singular(/(vert|ind)ices$/i, '\\1ex')
inflect.singular(/(matr)ices$/i, '\\1ix')
inflect.singular(/(quiz)zes$/i, '\\1')
inflect.irregular('person', 'people')
inflect.irregular('tooth', 'teeth')
inflect.irregular('man', 'men')
inflect.irregular('child', 'children')
inflect.irregular('sex', 'sexes')
inflect.irregular('goose', 'geese')
inflect.irregular('foot', 'feet')
inflect.irregular('move', 'moves')
inflect.uncountable(%w(equipment information rice money species series deer fish sheep))
end
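
To show how the rule tables are meant to be read, here is a minimal sketch in plain Ruby with no Rails dependency. It uses only a handful of the rules above, and the `MiniInflect` module and its method names are made up for illustration. The point is the lookup order: uncountable words first, then irregular nouns, then the regular expressions from most specific to most general, with the first match winning.

```ruby
# Minimal sketch, assuming a tiny subset of the rules above.
# MiniInflect is a hypothetical name, not part of Rails or of the PHP class.
module MiniInflect
  UNCOUNTABLE = %w[sheep fish deer].freeze

  IRREGULAR = { 'person' => 'people', 'child' => 'children' }.freeze

  # Ordered from most specific to most general; the first match wins.
  PLURAL = [
    [/(quiz)$/i, '\1zes'],
    [/(x|ch|ss|sh)$/i, '\1es'],
    [/([^aeiouy]|qu)y$/i, '\1ies'],
    [/s$/i, 's'],
    [/$/, 's']
  ].freeze

  def self.pluralize(word)
    # Singular and plural are identical for uncountable words.
    return word if UNCOUNTABLE.include?(word.downcase)

    # Irregular nouns are matched as suffixes, like the PHP version.
    IRREGULAR.each do |singular, plural|
      pattern = /#{singular}$/i
      return word.sub(pattern, plural) if pattern.match?(word)
    end

    PLURAL.each do |pattern, replacement|
      return word.sub(pattern, replacement) if pattern.match?(word)
    end
    word
  end

  # Counterpart of pluralize_if: only pluralize when the count is not 1.
  def self.pluralize_if(count, word)
    count == 1 ? "1 #{word}" : "#{count} #{pluralize(word)}"
  end
end

puts MiniInflect.pluralize_if(1, 'category') # prints "1 category"
puts MiniInflect.pluralize_if(3, 'category') # prints "3 categories"
puts MiniInflect.pluralize_if(2, 'person')   # prints "2 people"
```

This is the kind of human-readable string the rules are for: the user names a thing, and the interface can still say “3 categories” instead of “3 category(s)”.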


[...] One of those useful but sometimes hard to find (good) PHP scripts. How to pluralize in PHP. It’s a work in progress with tweaking, but looks mostly solid. Tags: PHP [...]
I think your singular for “uses” (the word, not the suffix) will turn out to be “us” instead of “use”.
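
That behavior is easy to reproduce. A minimal sketch in plain Ruby, using two of the singular rules from the post; the whole-word rule for “use” is a hypothetical fix and is not in the original code, and the helper name is made up:

```ruby
# Two of the singular rules from the post, in the order they are tried.
SUFFIX_RULES = [
  [/(us)es$/i, '\1'],
  [/s$/i, '']
].freeze

# Hypothetical fix: a whole-word rule for "uses", tried before the suffix rule.
PATCHED_RULES = ([[/^(use)s$/i, '\1']] + SUFFIX_RULES).freeze

# Apply the first rule whose pattern matches; leave the word alone otherwise.
def singularize_with(rules, word)
  rules.each do |pattern, replacement|
    return word.sub(pattern, replacement) if pattern.match?(word)
  end
  word
end

puts singularize_with(SUFFIX_RULES, 'uses')     # prints "us"
puts singularize_with(PATCHED_RULES, 'uses')    # prints "use"
puts singularize_with(PATCHED_RULES, 'viruses') # prints "virus"
```

Anchoring the extra rule with `^` keeps it from touching words that merely end in “uses”.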
Here’s an odd one (plurals according to M-W):
“scarf” –> “scarfs”
“dwarf” –> “dwarfs”, although apparently “dwarves” is also OK
“serf” –> “serfs” (? — M-W doesn’t give a plural, but that’s my guess)
but “wharf” –> “wharves”
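
For reference, the *f rule from the post turns all four of those words into *ves forms. A minimal sketch in plain Ruby; the `KEEP_F` exception list and the helper name are hypothetical, and which words belong on the list depends on the dictionary you trust:

```ruby
# The plural rule for *fe / *lf / *rf words, copied from the post.
F_PATTERN = /(?:([^f])fe|([lr])f)$/i
F_REPLACEMENT = '\1\2ves'

# Hypothetical exception list: words that should simply take an "s".
KEEP_F = %w[serf].freeze

def pluralize_f(word)
  return "#{word}s" if KEEP_F.include?(word.downcase)

  word.sub(F_PATTERN, F_REPLACEMENT)
end

puts 'serf'.sub(F_PATTERN, F_REPLACEMENT) # prints "serves" (rule alone)
puts pluralize_f('serf')                  # prints "serfs"
puts pluralize_f('scarf')                 # prints "scarves"
puts pluralize_f('wharf')                 # prints "wharves"
```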
Cool. Thanks, Eylon! That rule (*uses -> *us) is the one I am the most worried about.
Hmm… Odd thing about scarfs. I would have thought it would be scarves!
Anyone else?
The plural of “octopus” is either “octopodes”, if you’re being pedantically correct, or “octopuses” if you’re not.
Also, a couple of your regexps are wrong. The alternation operator “|” has the lowest precedence, so it splits the whole pattern into alternatives unless you group it with parentheses. Thus your /(matr|vert|ind)ix|ex$/ means “(matr|vert|ind)ix anywhere, or ex at the end of the word”, so it matches any word ending in “ex” and would turn “complex” into “complices”. You could improve it by changing it to /(matr|vert|ind)(ix|ex)$/, which would match everything you intend, but would also match some things you don’t, e.g., “matrex” and “indix”.
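
A quick way to check how the ungrouped and grouped patterns behave is to run them side by side. A minimal sketch in plain Ruby; the constant and method names are made up for illustration:

```ruby
# Ungrouped: parsed as "(matr|vert|ind)ix" OR "ex$".
LOOSE_IX = /(matr|vert|ind)ix|ex$/i
# Grouped: the stem must be followed by "ix" or "ex" at the end of the word.
GROUPED_IX = /(matr|vert|ind)(?:ix|ex)$/i

# Returns the pluralized word, or nil when the pattern does not apply.
def pluralize_ix(pattern, word)
  pattern.match?(word) ? word.sub(pattern, '\1ices') : nil
end

puts pluralize_ix(LOOSE_IX, 'index')            # prints "indices"
puts pluralize_ix(LOOSE_IX, 'complex')          # prints "complices"
puts pluralize_ix(GROUPED_IX, 'index')          # prints "indices"
puts pluralize_ix(GROUPED_IX, 'complex').inspect # prints "nil"
```

With the grouped pattern, “complex” falls through to the later (x|ch|ss|sh) rule and becomes “complexes”.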
Thanks, tet! Never heard of “octopodes”. I prefer “octopuses” myself (why use an irregular form when you can use a regular one, right?). I think the original RoR code included “octopi” in order to show how irregular nouns would be handled.
Thanks also for finding the bug in the regular expression. I took that verbatim from the first port of the code to PHP and I hadn’t spotted that error.
Anyone else?
Guys … your code will be used by many third parties, so you do have a certain responsibility. Do not take simplification of English in your own hands. Irregularities are there to be respected.
Hi Hristo
Thanks for the comments. I’m totally in agreement about needing to respect the language.
If your comment was in reference to my earlier comment about irregular pluralizations (like octopodes), let me explain a bit more.
Everyone agrees on certain rules (the plural of person is people) whereas other rules are disputed (the plural of octopus might be octopuses, octopi, or octopodes). When both a regular and an irregular form are accepted pluralizations, I tend to prefer the regular form. This is not me being lazy as a programmer. This is just my preference in how to use the English language. As another example, the original Ruby on Rails code pluralized virus as viri. I don’t know anyone who uses that word in common usage, so I removed that rule.
even people/person is not in 100% agreement ;)
http://www.worldwidewords.org/articles/people.htm
and according to m-w.com, the plural of “scarf” is “scarves” OR “scarfs”. it’s a bit confusing because entry #1 just says “scarfs”, but that’s for “either of the chamfered or cutaway ends that fit together to form a scarf joint”. it’s entry #3 (“broad band of cloth worn about the shoulders, around the neck, or over the head” — what i would think is the more common use/meaning) that says “scarves” or “scarfs” are both ok.
(i wish m-w would let me directly link to the specific entry!)
Thanks for this class, I can use it just like in Ruby on Rails.
This class is very useful when I work with classes in PHP, especially for the plural and singular functions.
Hi Sho, this looks fantastic. I’ve updated my blog post to include a link here. Great work!
Cool, but a little confused, read it again.
Thanks for this! I got here via a couple hops away from Google (ended up here via Paul Osman’s blog post), and it’s exactly what I was looking for to de-pluralize my table names for my MySQL->PHP Class converter.
[...] I’m using pluralize from kuwamoto, http://kuwamoto.org/2007/12/17/improved-pluralizing-in-php-actionscript-and-ror/ [...]
Nice work man .. :)