Improved pluralizing in PHP, ActionScript, and RoR
I found a number of issues with the pluralizing code I posted the other day, and I felt compelled to fix it. While I was at it, I ported the code to ActionScript 3 and then back to Ruby, so you can use the pluralizer in your Flex or RoR apps.
BTW, for those of you who love language and know regular expressions (this means you, lori and nj!), please help me spot any problems with these rules.
The reason it was important for me to get these pluralization rules right is that I am using them for human-readable strings. In my project, users get to name things however they want, and it is nice to be able to pluralize (or singularize) these words when communicating with the user.
(As a side note, I find the Ruby on Rails idea of auto-pluralizing table names a bit bizarre. I like having magic frameworks that do all the work for me, but I want my frameworks to have predictable behavior!! I mean… how bad would it be if the table that holds the “person” object was called “person” instead of “people”?)
Thanks again to the Rails team for getting this ball rolling, and to Paul Osman for the original PHP version of this code.
On to the code. All code below is covered under the MIT license.
PHP:
// Thanks to http://www.eval.ca/articles/php-pluralize (MIT license)
// http://dev.rubyonrails.org/browser/trunk/activesupport/lib/active_support/inflections.rb (MIT license)
// http://www.fortunecity.com/bally/durrus/153/gramch13.html
// http://www2.gsu.edu/~wwwesl/egw/crump.htm
//
// Changes (12/17/07)
// Major changes
// --
// Fixed irregular noun algorithm to use regular expressions just like the original Ruby source.
// (this allows for things like fireman -> firemen)
// Fixed the order of the singular array, which was backwards.
//
// Minor changes
// --
// Removed incorrect pluralization rule for /([^aeiouy]|qu)ies$/ => $1y
// Expanded on the list of exceptions for *o -> *oes, and removed rule for buffalo -> buffaloes
// Removed dangerous singularization rule for /([^f])ves$/ => $1fe
// Added more specific rules for singularizing lives, wives, knives, sheaves, loaves, leaves, and thieves
// Added exception to /(us)es$/ => $1 rule for houses => house and blouses => blouse
// Added exceptions for feet, geese and teeth
// Added rule for deer -> deer
// Changes:
// Removed rule for virus -> viri
// Added rule for potato -> potatoes
// Added rule for *us -> *uses
class Inflect
{
static $plural = array(
'/(quiz)$/i' => "$1zes",
'/^(ox)$/i' => "$1en",
'/([ml])ouse$/i' => "$1ice",
'/(matr|vert|ind)(?:ix|ex)$/i' => "$1ices",
'/(x|ch|ss|sh)$/i' => "$1es",
'/([^aeiouy]|qu)y$/i' => "$1ies",
'/(hive)$/i' => "$1s",
'/(?:([^f])fe|([lr])f)$/i' => "$1$2ves",
'/(shea|lea|loa|thie)f$/i' => "$1ves",
'/sis$/i' => "ses",
'/([ti])um$/i' => "$1a",
'/(tomat|potat|ech|her|vet)o$/i'=> "$1oes",
'/(bu)s$/i' => "$1ses",
'/(alias)$/i' => "$1es",
'/(octop)us$/i' => "$1i",
'/(ax|test)is$/i' => "$1es",
'/(us)$/i' => "$1es",
'/s$/i' => "s",
'/$/' => "s"
);
static $singular = array(
'/(quiz)zes$/i' => "$1",
'/(matr)ices$/i' => "$1ix",
'/(vert|ind)ices$/i' => "$1ex",
'/^(ox)en$/i' => "$1",
'/(alias)es$/i' => "$1",
'/(octop|vir)i$/i' => "$1us",
'/(cris|ax|test)es$/i' => "$1is",
'/(shoe)s$/i' => "$1",
'/(o)es$/i' => "$1",
'/(bus)es$/i' => "$1",
'/([ml])ice$/i' => "$1ouse",
'/(x|ch|ss|sh)es$/i' => "$1",
'/(m)ovies$/i' => "$1ovie",
'/(s)eries$/i' => "$1eries",
'/([^aeiouy]|qu)ies$/i' => "$1y",
'/([lr])ves$/i' => "$1f",
'/(tive)s$/i' => "$1",
'/(hive)s$/i' => "$1",
'/(li|wi|kni)ves$/i' => "$1fe",
'/(shea|loa|lea|thie)ves$/i'=> "$1f",
'/(^analy)ses$/i' => "$1sis",
'/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i' => "$1sis",
'/([ti])a$/i' => "$1um",
'/(n)ews$/i' => "$1ews",
'/(h|bl)ouses$/i' => "$1ouse",
'/(corpse)s$/i' => "$1",
'/(us)es$/i' => "$1",
'/s$/i' => ""
);
static $irregular = array(
'move' => 'moves',
'foot' => 'feet',
'goose' => 'geese',
'sex' => 'sexes',
'child' => 'children',
'man' => 'men',
'tooth' => 'teeth',
'person' => 'people'
);
static $uncountable = array(
'sheep',
'fish',
'deer',
'series',
'species',
'money',
'rice',
'information',
'equipment'
);
public static function pluralize( $string )
{
// save some time in the case that singular and plural are the same
if ( in_array( strtolower( $string ), self::$uncountable ) )
return $string;
// check for irregular singular forms
foreach ( self::$irregular as $pattern => $result )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for matches using regular expressions
foreach ( self::$plural as $pattern => $result )
{
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string );
}
return $string;
}
public static function singularize( $string )
{
// save some time in the case that singular and plural are the same
if ( in_array( strtolower( $string ), self::$uncountable ) )
return $string;
// check for irregular plural forms
foreach ( self::$irregular as $result => $pattern )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for matches using regular expressions
foreach ( self::$singular as $pattern => $result )
{
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string );
}
return $string;
}
public static function pluralize_if($count, $string)
{
if ($count == 1)
return "1 $string";
else
return $count . " " . self::pluralize($string);
}
}
ActionScript:
package
{
public class Inflect
{
private static var plural : Array = [
[/(quiz)$/i, "$1zes"],
[/^(ox)$/i, "$1en"],
[/([ml])ouse$/i, "$1ice"],
[/(matr|vert|ind)(?:ix|ex)$/i, "$1ices"],
[/(x|ch|ss|sh)$/i, "$1es"],
[/([^aeiouy]|qu)y$/i, "$1ies"],
[/(hive)$/i, "$1s"],
[/(?:([^f])fe|([lr])f)$/i, "$1$2ves"],
[/(shea|lea|loa|thie)f$/i, "$1ves"],
[/sis$/i, "ses"],
[/([ti])um$/i, "$1a"],
[/(tomat|potat|ech|her|vet)o$/i, "$1oes"],
[/(bu)s$/i, "$1ses"],
[/(alias|status)$/i, "$1es"],
[/(octop)us$/i, "$1i"],
[/(ax|test)is$/i, "$1es"],
[/(us)$/i, "$1es"],
[/s$/i, "s"],
[/$/i, "s"]
];
private static var singular : Array = [
[/(quiz)zes$/i, "$1"],
[/(matr)ices$/i, "$1ix"],
[/(vert|ind)ices$/i, "$1ex"],
[/^(ox)en$/i, "$1"],
[/(alias|status)es$/i, "$1"],
[/(octop|vir)i$/i, "$1us"],
[/(cris|ax|test)es$/i, "$1is"],
[/(shoe)s$/i, "$1"],
[/(o)es$/i, "$1"],
[/(bus)es$/i, "$1"],
[/([ml])ice$/i, "$1ouse"],
[/(x|ch|ss|sh)es$/i, "$1"],
[/(m)ovies$/i, "$1ovie"],
[/(s)eries$/i, "$1eries"],
[/([^aeiouy]|qu)ies$/i, "$1y"],
[/([lr])ves$/i, "$1f"],
[/(tive)s$/i, "$1"],
[/(hive)s$/i, "$1"],
[/(li|wi|kni)ves$/i, "$1fe"],
[/(shea|loa|lea|thie)ves$/i,"$1f"],
[/(^analy)ses$/i, "$1sis"],
[/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i, "$1sis"],
[/([ti])a$/i, "$1um"],
[/(n)ews$/i, "$1ews"],
[/(h|bl)ouses$/i, "$1ouse"],
[/(corpse)s$/i, "$1"],
[/(us)es$/i, "$1"],
[/s$/i, ""]
];
private static var irregular : Array = [
['move' , 'moves'],
['foot' , 'feet'],
['goose' , 'geese'],
['sex' , 'sexes'],
['child' , 'children'],
['man' , 'men'],
['tooth' , 'teeth'],
['person' , 'people']
];
private static var uncountable : Array = [
'sheep',
'fish',
'deer',
'series',
'species',
'money',
'rice',
'information',
'equipment'
];
public static function pluralize( string : String ) : String
{
var pattern : RegExp;
var result : String;
// save some time in the case that singular and plural are the same
if (uncountable.indexOf(string.toLowerCase()) != -1)
return string;
// check for irregular singular forms
var item : Array;
for each ( item in irregular )
{
pattern = new RegExp(item[0] + "$", "i");
result = item[1];
if (pattern.test(string))
{
return string.replace(pattern, result);
}
}
// check for matches using regular expressions
for each ( item in plural)
{
pattern = item[0];
result = item[1];
if (pattern.test(string))
{
return string.replace(pattern, result);
}
}
return string;
}
public static function singularize( string : String ) : String
{
var pattern : RegExp;
var result : String;
// save some time in the case that singular and plural are the same
if (uncountable.indexOf(string.toLowerCase()) != -1)
return string;
// check for irregular plural forms
var item : Array;
for each ( item in irregular )
{
pattern = new RegExp(item[1] + "$", "i");
result = item[0];
if (pattern.test(string))
{
return string.replace(pattern, result);
}
}
// check for matches using regular expressions
for each ( item in singular)
{
pattern = item[0];
result = item[1];
if (pattern.test(string))
{
return string.replace(pattern, result);
}
}
return string;
}
public static function pluralizeIf(count : int, string : String) : String
{
if (count == 1)
return "1 " + string;
else
return count.toString() + " " + pluralize(string);
}
}
}
Ruby on Rails (use this to replace your Inflect.rb):
Inflector.inflections do |inflect|
inflect.plural(/$/, 's')
inflect.plural(/s$/i, 's')
inflect.plural(/(us)$/i, '\\1es')
inflect.plural(/(ax|test)is$/i, '\\1es')
inflect.plural(/(octop)us$/i, '\\1i')
inflect.plural(/(alias)$/i, '\\1es')
inflect.plural(/(bu)s$/i, '\\1ses')
inflect.plural(/(tomat|potat|ech|her|vet)o$/i, '\\1oes')
inflect.plural(/([ti])um$/i, '\\1a')
inflect.plural(/sis$/i, 'ses')
inflect.plural(/(shea|lea|loa|thie)f$/i, '\\1ves')
inflect.plural(/(?:([^f])fe|([lr])f)$/i, '\\1\\2ves')
inflect.plural(/(hive)$/i, '\\1s')
inflect.plural(/([^aeiouy]|qu)y$/i, '\\1ies')
inflect.plural(/(x|ch|ss|sh)$/i, '\\1es')
inflect.plural(/(matr|vert|ind)(?:ix|ex)$/i, '\\1ices')
inflect.plural(/([ml])ouse$/i, '\\1ice')
inflect.plural(/^(ox)$/i, '\\1en')
inflect.plural(/(quiz)$/i, '\\1zes')
inflect.singular(/s$/i, '')
inflect.singular(/(us)es$/i, '\\1')
inflect.singular(/(corpse)s$/i, '\\1')
inflect.singular(/(h|bl)ouses$/i, '\\1ouse')
inflect.singular(/(n)ews$/i, '\\1ews')
inflect.singular(/([ti])a$/i, '\\1um')
inflect.singular(/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i, '\\1sis')
inflect.singular(/(^analy)ses$/i, '\\1sis')
inflect.singular(/(shea|loa|lea|thie)ves$/i, '\\1f')
inflect.singular(/(li|wi|kni)ves$/i, '\\1fe')
inflect.singular(/(hive)s$/i, '\\1')
inflect.singular(/(tive)s$/i, '\\1')
inflect.singular(/([lr])ves$/i, '\\1f')
inflect.singular(/([^aeiouy]|qu)ies$/i, '\\1y')
inflect.singular(/(s)eries$/i, '\\1eries')
inflect.singular(/(m)ovies$/i, '\\1ovie')
inflect.singular(/(x|ch|ss|sh)es$/i, '\\1')
inflect.singular(/([ml])ice$/i, '\\1ouse')
inflect.singular(/(bus)es$/i, '\\1')
inflect.singular(/(o)es$/i, '\\1')
inflect.singular(/(shoe)s$/i, '\\1')
inflect.singular(/(cris|ax|test)es$/i, '\\1is')
inflect.singular(/(octop|vir)i$/i, '\\1us')
inflect.singular(/(alias)es$/i, '\\1')
inflect.singular(/^(ox)en$/i, '\\1')
inflect.singular(/(vert|ind)ices$/i, '\\1ex')
inflect.singular(/(matr)ices$/i, '\\1ix')
inflect.singular(/(quiz)zes$/i, '\\1')
inflect.irregular('person', 'people')
inflect.irregular('tooth', 'teeth')
inflect.irregular('man', 'men')
inflect.irregular('child', 'children')
inflect.irregular('sex', 'sexes')
inflect.irregular('goose', 'geese')
inflect.irregular('foot', 'feet')
inflect.irregular('move', 'moves')
inflect.uncountable(%w(equipment information rice money species series deer fish sheep))
end
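The PHP and ActionScript versions check their rules from top to bottom, while Rails checks the most recently added rule first, which is why the Ruby list above is written in the opposite order. As a rough sketch of the lookup order all three share (uncountable words, then irregular nouns, then the first matching regular expression), here is a standalone Ruby version with only a handful of the rules. The `MiniInflect` module and its trimmed rule lists are made up for illustration; they are not part of Rails or of the code above.

```ruby
# Hypothetical, trimmed-down sketch of the lookup order used above.
module MiniInflect
  UNCOUNTABLE = %w(sheep fish deer).freeze

  IRREGULAR = [
    ['person', 'people'],
    ['man',    'men'],
    ['foot',   'feet']
  ].freeze

  # Most specific rules first; the catch-all goes last.
  PLURAL = [
    [/(quiz)$/i,          '\1zes'],
    [/(x|ch|ss|sh)$/i,    '\1es'],
    [/([^aeiouy]|qu)y$/i, '\1ies'],
    [/s$/i,               's'],
    [/$/,                 's']
  ].freeze

  def self.pluralize(word)
    # 1. Words whose singular and plural are the same.
    return word if UNCOUNTABLE.include?(word.downcase)

    # 2. Irregular nouns, matched as suffixes so "fireman" works too.
    IRREGULAR.each do |singular, plural|
      pattern = /#{singular}$/i
      return word.sub(pattern, plural) if word =~ pattern
    end

    # 3. First regular-expression rule that matches wins.
    PLURAL.each do |pattern, replacement|
      return word.sub(pattern, replacement) if word =~ pattern
    end
    word
  end

  def self.pluralize_if(count, word)
    count == 1 ? "1 #{word}" : "#{count} #{pluralize(word)}"
  end
end

puts MiniInflect.pluralize('fireman')    # firemen
puts MiniInflect.pluralize('city')       # cities
puts MiniInflect.pluralize('sheep')      # sheep
puts MiniInflect.pluralize_if(1, 'box')  # 1 box
puts MiniInflect.pluralize_if(3, 'box')  # 3 boxes
```

Because the first match wins, the order of the rule list matters as much as the patterns themselves: a catch-all placed too early would shadow every rule after it.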


[...] One of those useful but sometimes hard to find (good) PHP scripts. How to pluralize in PHP. It’s a work in progress with tweaking, but looks mostly solid. Tags: PHP [...]
I think your singular for “uses” (the word, not the suffix) will turn out to be “us” instead of “use”.
Here’s an odd one (plurals according to M-W):
“scarf” –> “scarfs”
“dwarf” –> “dwarfs”, although apparently “dwarves” is also OK
“serf” –> “serfs” (? — M-W doesn’t give a plural, but that’s my guess)
but “wharf” –> “wharves”
Cool. Thanks, Eylon! That rule (*uses -> *us) is the one I am the most worried about.
Hmm… Odd thing about scarfs. I would have thought it would be scarves!
Anyone else?
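To see what the rules in the post do with the words discussed above, the relevant patterns can be tried in isolation. This is only a sketch: the `first_match` helper is hypothetical, and the two short rule lists are excerpts from the post's tables, not the full set.

```ruby
# Hypothetical helper: apply the first rule whose pattern matches.
def first_match(word, rules)
  rules.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word =~ pattern
  end
  word
end

# Excerpt of the plural rules that decide how words ending in "f" are handled.
F_PLURAL = [
  [/(?:([^f])fe|([lr])f)$/i, '\1\2ves'],
  [/$/,                      's']
]

# Excerpt of the singular rules that decide how words ending in "uses" are handled.
US_SINGULAR = [
  [/(bus)es$/i, '\1'],
  [/(us)es$/i,  '\1'],
  [/s$/i,       '']
]

%w(scarf dwarf serf wharf).each do |w|
  puts "#{w} -> #{first_match(w, F_PLURAL)}"
end
# scarf -> scarves, dwarf -> dwarves, serf -> serves, wharf -> wharves

%w(buses statuses uses).each do |w|
  puts "#{w} -> #{first_match(w, US_SINGULAR)}"
end
# buses -> bus, statuses -> status, uses -> us
```

With these rules, “serf” comes out as “serves” and “uses” as “us”, so words like these would need their own exceptions, for example an entry in the irregular list.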
The plural of “octopus” is either “octopodes”, if you’re being pedantically correct, or “octopuses” if you’re not.
Also a couple of your regexps are wrong. The alternation operator “|” has the lowest precedence, so it applies to everything on either side of it, not just the adjacent characters. If you want it to apply to only part of the pattern, then you need to group it with parentheses. Thus /(matr|vert|ind)ix|ex$/ means “(matr|vert|ind)ix anywhere in the word, or ex at the end”, so it matches “matrix” but also any word ending in “ex”, such as “complex”. You could improve it by changing it to /(matr|vert|ind)(ix|ex)$/, which would match everything you intend, but would also match some things you don’t, e.g., “matrex” and “indix”.
Thanks, tet! Never heard of “octopodes”. I prefer “octopuses” myself (why use an irregular form when you can use a regular one, right?). I think the original RoR code included “octopi” in order to show how irregular nouns would be handled.
Thanks also for finding the bug in the regular expression. I took that verbatim from the first port of the code to PHP and I hadn’t spotted that error.
Anyone else?
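The grouping issue discussed above is easy to check by running both forms of the pattern against a few words. In Ruby (and in PCRE, which PHP uses) `|` has the lowest precedence, so the ungrouped pattern splits into two alternatives: `(matr|vert|ind)ix` and `ex$`. The sample words below are chosen for illustration only.

```ruby
UNGROUPED = /(matr|vert|ind)ix|ex$/i
GROUPED   = /(matr|vert|ind)(?:ix|ex)$/i

%w(matrix vertex index complex).each do |word|
  ungrouped = word.sub(UNGROUPED, '\1ices')
  grouped   = word.sub(GROUPED, '\1ices')
  puts "#{word}: ungrouped=#{ungrouped} grouped=#{grouped}"
end
# matrix:  ungrouped=matrices  grouped=matrices
# vertex:  ungrouped=vertices  grouped=vertices
# index:   ungrouped=indices   grouped=indices
# complex: ungrouped=complices grouped=complex
```

The ungrouped pattern happens to give the right answer for the three intended words, but it also rewrites any other word ending in “ex”. With the grouped pattern, “complex” is left alone by this rule, so a later rule such as the one for words ending in “x” can handle it.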
Guys … your code will be used by many third parties, so you do have a certain responsibility. Do not take simplification of English in your own hands. Irregularities are there to be respected.
Hi Hristo
Thanks for the comments. I’m totally in agreement about needing to respect the language.
If your comment was in reference to my earlier comment about irregular pluralizations (like octopodes), let me explain a bit more.
Everyone agrees on certain rules (the plural of person is people) whereas other rules are disputed (the plural of octopus might be octopuses, octopi, or octopodes). When both a regular and an irregular form are accepted pluralizations, I tend to prefer the regular form. This is not me being lazy as a programmer. This is just my preference in how to use the English language. As another example, the original Ruby on Rails code pluralized virus as viri. I don’t know anyone who uses that word in common usage, so I removed that rule.
even people/person is not in 100% agreement ;)
http://www.worldwidewords.org/articles/people.htm
and according to m-w.com, the plural of “scarf” is “scarves” OR “scarfs”. it’s a bit confusing because entry #1 just says “scarfs”, but that’s for “either of the chamfered or cutaway ends that fit together to form a scarf joint”. it’s entry #3 (”broad band of cloth worn about the shoulders, around the neck, or over the head” — what i would think is the more common use/meaning) that says “scarves” or “scarfs” are both ok.
(i wish m-w would let me directly link to the specific entry!)
thx for this class, i can use it like on Ruby on Rails.
this class is very useful when i work with classes in PHP, especially when dealing with plural and singular functions
Hi Sho, this looks fantastic. I’ve updated my blog post to include a link here. Great work!
Cool, but a little confused, read it again.
Thanks for this! I got here via a couple hops away from Google (ended up here via Paul Osman’s blog post), and it’s exactly what I was looking for to de-pluralize my table names for my MySQL->PHP Class converter.
This is great. I started to write my own and thought someone must have done a better job already. Thanks for putting it under the MIT License. I’m including the PHP version in another MIT-Licensed project. If you prefer I don’t, just let me know.
Just a quick one, might want to add ‘mice’ to the $uncountable array. Very nice functions!
Sweet. I ported it to c#. Do you want to add it to your list?
Sure! Just post it as a comment on this post for now. At some point, I will update this code and do a new post. I’ll link to your version when I do.
I tried to copy and paste, but it didn’t save it.
Dude, you rock! I was just writing my own class that does just this… and it was taking me longer than I thought to figure out all the rules for singularizing & pluralizing… thanx a million! great work!
+1 for the thanks
Bit late but for reference for those who might be wondering how to check if a word is actually plural or not:
public static function is_plural ($string)
{
return Inflect::pluralize($string) == $string;
}
public static function is_singular ($string)
{
return Inflect::singularize($string) == $string;
}
Great Class Btw!
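One caveat with this kind of check, sketched here in Ruby with a deliberately tiny rule set (the `pluralize` and `plural?` helpers below are stand-ins, not the class from the post): it only works when pluralizing an already-plural word leaves it unchanged. That holds for words ending in “s”, but an irregular plural such as “people” falls through to the catch-all rule, and uncountable words count as both singular and plural.

```ruby
# Stand-in data, much smaller than the tables in the post.
IRREGULAR   = { 'person' => 'people', 'man' => 'men' }
UNCOUNTABLE = %w(sheep fish)

def pluralize(word)
  return word if UNCOUNTABLE.include?(word.downcase)

  IRREGULAR.each do |singular, plural|
    pattern = /#{singular}$/i
    return word.sub(pattern, plural) if word =~ pattern
  end

  return word if word =~ /s$/i  # same effect as the /s$/ => "s" rule
  word + 's'                    # catch-all rule
end

# Same idea as is_plural above: a word is plural if pluralizing changes nothing.
def plural?(word)
  pluralize(word) == word
end

puts plural?('cats')    # true
puts plural?('cat')     # false
puts plural?('people')  # false, because pluralize('people') gives "peoples"
puts plural?('sheep')   # true
```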
[...] However, for a more full-fledged version, we now use the inflection class originally created by Sho Kuwamoto with slight modifications. If you have a look at the class, it makes use of simple regular [...]
[...] wrote a simple function that did the task for me. I know there are PHP Pluralize functions out there, but they are a bit overkill for my needs. I got this idea from the way Django templates handles [...]
Thanks for a great script. Just thought I’d mention that the script bummed out with a syntax error on PHP 5.1.4 running on Redhat EH4. It worked fine for PHP 5.2.9-2 running on XP. I fixed the syntax error by changing the double quotes surrounding the array contents to single quotes (e.g. “$1zes” became ‘$1zes’).