PHP Types
This is a short ramble about PHP’s type system and the meaning of our programs.
Semantics
A programming system can usually be considered as two different parts:
syntax and semantics. Syntax is the particular way that things
need to be written in order for them to be “well formed” or
“syntactically correct”; in other words, syntax separates valid programs
from invalid programs based on how they’re written (for example,
my$ =; is invalid PHP). Semantics, on the other hand, separates valid
and invalid programs based on their behaviour or meaning. This is much
trickier, since misbehaving programs are much harder to spot than
incorrectly written ones, and in order to spot “bad” behaviour we need
to know what “good” behaviour is, and even what we mean by “behaviour”.
In PHP, the “meaning” of the code we write is that the computer will start at the first line of the first file it is given and treat each statement it finds as an instruction to carry out (statements are joined together by semicolons, which mean “and then”), until it either runs out of instructions or is told to stop. These instructions can modify the state of the computer. Computer scientists would call this a form of “operational semantics”: the meaning of the code depends on some physical operation being performed. In this case, the meaning of the code is the changes it makes to the state of the computer (its memory).
This simple definition allows us to spot that code like this:
$x = 0;
$x = $x;
is ‘wrong’ because we know that the second line will never change the state, and thus is ‘meaningless’ from the point of view of PHP. In this case the mistake is harmless, since by definition it doesn’t change the behaviour. There are much harder problems to deal with though, where the incorrect instruction depends on the state of the program when it is run, and we can use much more sophisticated techniques to find them.
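The “no change of state” reading of the two-line example can be checked directly. The following is only a small sketch: $snapshots is a hypothetical helper variable introduced here to record the state after each statement, and is not part of the original example.

```php
<?php
// $snapshots is a hypothetical record of the state after each statement.
$snapshots = [];

$x = 0;
$snapshots[] = $x; // state after the first statement

$x = $x;
$snapshots[] = $x; // state after the second statement

// The second statement left the state exactly as it found it.
var_dump($snapshots[0] === $snapshots[1]); // bool(true)
```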
Types
Understanding more about the semantics of PHP values, and in particular PHP’s type system, allows us to spot and prevent many mistakes in the behaviour of our programs.
The type of some value, in a very non-rigorous sense, depends on both the way we represent that value and what we can do with it. Computers are built out of circuits which carry electricity, but the design of these circuits causes the electric signals to behave like numbers. Thus the most basic kind of thing, or type of value, that a computer can handle is (whole) numbers, since they’re built into the hardware of the machine.
In a similar way to using circuits that behave like numbers, we can make numbers behave like other types of things, if handled correctly. For example we can represent fractions with whole numbers by using scientific notation, such as ‘15×10⁻⁴’ to mean ‘0.0015’. We can represent letters and other characters as numbers if we follow a scheme (an encoding) like ‘01 is A, 02 is B, 03 is C…’, and we can use sequences of these characters to represent more complex types of thing, like PHP code and HTML pages, by inventing syntax to write them in.
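To make the toy encoding concrete, here is a rough sketch of it in PHP. The function names encode_letter and decode_letter are made up for this illustration, and the sketch assumes plain upper-case ASCII letters; it is not how PHP itself stores text.

```php
<?php
// Toy encoding from the text: 1 is A, 2 is B, ... 26 is Z.
// encode_letter and decode_letter are hypothetical names.
function encode_letter(string $letter): int
{
    return ord(strtoupper($letter)) - ord('A') + 1;
}

function decode_letter(int $code): string
{
    return chr($code - 1 + ord('A'));
}

// Turn each letter of "CAR" into its number, then back again.
$codes = array_map('encode_letter', str_split('CAR'));
print_r($codes);
echo implode('', array_map('decode_letter', $codes)), "\n";

// A fraction stored as two whole numbers: 15 × 10^-4.
$mantissa = 15;
$exponent = -4;
echo $mantissa * 10 ** $exponent, "\n";
```

The same pair of whole numbers (15 and -4) only means 0.0015 if we agree to read it that way, which is the point of the paragraph above.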
Because all of these values are, ultimately, numbers (and then
electric signals in the computer), the computer itself doesn’t know if a
value is meant to represent something of another type or not. Thus it is
important to know what type of values we are dealing with in our
programs if we’re to handle them correctly. For example, we may try to
write out the string "CAR" by saying:
echo "C" + "A" + "R";
However, if PHP is using the text representation described above (it doesn’t, but this won’t affect the examples) then this will actually be the same as an instruction to do:
echo 03 + 01 + 18;
This is obviously not what we were expecting, since it’s the number 22.
There’s been a mistake in our use of types here.
However, it gets worse. We now have the number 22, but we were expecting
a string so we’re going to make another type error as we treat 22 as a
string, which will end up being a "V".
As far as the computer is concerned, "V" is the result we wanted, since
that’s what we asked for, but from our point of view this data has
become corrupted. This is why semantics is difficult to get right.
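The two type errors can be traced by hand. This is just a sketch of the hypothetical situation described above, with the letter codes of the toy scheme (1 is A … 26 is Z) written out as plain integers; the variable names are invented for the illustration.

```php
<?php
// Letter codes under the toy scheme (1 is A ... 26 is Z), written by hand.
$c = 3;
$a = 1;
$r = 18;

// First type error: adding values that were meant to be letters.
$sum = $c + $a + $r;

// Second type error: reading the resulting number back as a letter.
$letter = chr($sum - 1 + ord('A'));

echo $sum, " is read back as ", $letter, "\n"; // prints: 22 is read back as V
```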
Luckily, PHP doesn’t blindly assume that we’re treating types correctly. Instead, in PHP each value contains information about what type it is, and if we try to do things which don’t make sense for that type of value (such as summing strings like above) then it will give us an error message. Note that PHP will sometimes try to guess what you might have meant, but it is dangerous to rely on this compared to specifying exactly what you mean.
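As a quick illustration of values carrying their own type, the sketch below asks PHP what it thinks a few values are, and then tries to add two non-numeric strings. How that addition is reported depends on the PHP version: as far as I know PHP 8 throws a TypeError, while older versions warn (or say nothing) and carry on with 0, so the code allows for both. The variable names are invented for the example.

```php
<?php
// Every PHP value knows its own type.
$types = [gettype("C"), gettype(3), gettype(1.5), gettype(true)];
print_r($types); // string, integer, double, boolean

// Try to sum two strings that are not numbers.
$left = "C";
$right = "A";
$outcome = 'no error';
try {
    $result = $left + $right;
} catch (\TypeError $e) {
    // Reached on PHP versions that refuse non-numeric strings in arithmetic.
    $outcome = 'TypeError';
}
echo $outcome, "\n";
```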
If we want to fix the above example, and attach one string to the end
of another (as opposed to summing them), then we can use the
concatenation operator, a full stop (.), rather than the addition
operator, the plus (+). This means we can write our example correctly
as:
echo "C" . "A" . "R";
This won’t give a type error, and it won’t corrupt our values. The
downside, of course, is that strings and numbers remain different
types, so we should be deliberate about mixing them. Fortunately, it’s
quite easy to convert numbers into strings (just write them out in
decimal), and the strval function will do this for us. For example, if
we want to write out the number of cars, which is stored in a variable
called $car_num, then we could say:
echo "There are " . $car_num . " cars";
PHP accepts this, because the concatenation operator quietly converts the number into a string for us, but that is exactly the kind of guessing that is dangerous to rely on. We can state the conversion explicitly as:
echo "There are " . strval($car_num) . " cars";
Of course, we may not know what our variable contains (the state may
change it), in which case using strval is a good idea if the variable
might not be a string. There are similar functions for integers (whole
numbers) and “floating point” numbers (fractions) called intval and
floatval, although keep in mind that most things can be turned into
strings, but only properly formatted strings can be turned into numbers
correctly; for example floatval('-4.3') will work, but
floatval('minus four point three') won’t.
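Here is a short sketch of these conversions. Note that floatval doesn’t complain about a badly formatted string; it just hands back 0, which is why checking first can be worthwhile. The helper to_float_or_null is a hypothetical function written for this example, not something PHP provides.

```php
<?php
// to_float_or_null is a hypothetical helper, not a PHP built-in.
function to_float_or_null(string $text): ?float
{
    // Only convert strings that PHP itself recognises as numeric.
    return is_numeric($text) ? floatval($text) : null;
}

var_dump(strval(12));                                 // string(2) "12"
var_dump(intval('42'));                               // int(42)
var_dump(floatval('-4.3'));                           // float(-4.3)
var_dump(floatval('minus four point three'));         // float(0)
var_dump(to_float_or_null('minus four point three')); // NULL
```

Returning null for unusable input is one possible design; throwing an exception would be another.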
Conclusion
Although types are very useful for finding programming mistakes, PHP’s type system, like most of PHP, is limited to the handful of types that have been hard-coded into the implementation by its developers. For example, we’re not allowed to define our own types (e.g. angles) or operators (e.g. vector product), there are no higher-level types (types of types, e.g. coordinates) and we can’t define our own encodings (e.g. an HTML type). We can overcome some of these difficulties by using PHP’s primitive “object” system, which I may talk about in a future post.
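As a small taste of how objects can stand in for a missing type, here is a rough sketch of an “angle”. The Angle class, its methods and the choice to store degrees are all assumptions made for this illustration, and it is only one of many ways such a type could be modelled.

```php
<?php
// Angle is a hypothetical class for illustration only.
final class Angle
{
    private $degrees;

    public function __construct(float $degrees)
    {
        // Normalise into the range [0, 360).
        $this->degrees = fmod(fmod($degrees, 360.0) + 360.0, 360.0);
    }

    // A method stands in for the "+" operator we are not allowed to define.
    public function add(Angle $other): Angle
    {
        return new Angle($this->degrees + $other->degrees);
    }

    public function degrees(): float
    {
        return $this->degrees;
    }
}

$total = (new Angle(350.0))->add(new Angle(20.0));
echo $total->degrees(), "\n"; // prints: 10
```

Because add only accepts another Angle, mixing an angle with an ordinary number is rejected rather than silently summed.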