PHP arrays are a hybrid of traditional C-style arrays (i.e., index-based lookup) and hash tables (plus a few extra bits). That flexibility comes with a decent amount of overhead: they use a lot of memory! Even something as basic as an array of 64-bit integers uses around 152 bytes per entry:
[Figure: memory usage in MB for an array with n 64-bit integers]
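To reproduce a per-entry figure on your own setup, a rough measurement like the one below can help. It is only a sketch: the function name is made up for illustration, and the number it reports depends heavily on the PHP version and build, so it may differ a lot from the figure quoted above.

```php
<?php
// Hypothetical helper (not from the original post): estimates how many bytes
// PHP allocates per element when a native array is filled with $n integers.
function measure_native_array_bytes_per_entry($n)
{
    $before = memory_get_usage();
    $data = array();
    for ($i = 0; $i < $n; $i++) {
        $data[] = $i;
    }
    // $data is still alive here, so its allocation is included in the reading.
    $after = memory_get_usage();

    return ($after - $before) / $n;
}

printf("%.1f bytes per entry\n", measure_native_array_bytes_per_entry(100000));
```

Running it with a few different values of n gives a feel for how the overhead scales.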
In situations where you’re just using the array for storage (that is, you don’t need things like multiple types, string keys, nested arrays, etc.), this overhead is significant, particularly as the array size grows. For these cases, it may make sense to try a different approach, one that significantly reduces the overall memory footprint while maintaining decent lookup performance. It’s not suitable for every circumstance, but when needed it’s a very useful tool to have.
Implementation
This is a very basic implementation of a C-style array solely for 64-bit integers, but it should serve to illustrate the overall idea: using a packed binary string to implement a more efficient data structure. The example requires PHP 5.6.3+, since the 64-bit integer formats for pack were first added then, but the function itself has existed for much longer and the approach can be adapted to work on older versions.
    class Array_Int64
    {
        // Packed binary string holding 8 bytes per item.
        private $_backing = '';

        public function __construct()
        {
            assert(PHP_INT_SIZE === 8, 'Class requires 64-bit integer support');
        }

        public function append($item)
        {
            $this->_backing .= pack('P', $item);
        }

        public function count()
        {
            return $this->_binary_strlen($this->_backing) / PHP_INT_SIZE;
        }

        public function get($index)
        {
            if (!is_numeric($index)) {
                return null;
            }
            $index = (int) $index;
            // Reject negative indexes too; substr would count them from the end.
            if ($index < 0 || $index >= $this->count()) {
                return null;
            }
            $packed = $this->_binary_substr($this->_backing, $index * PHP_INT_SIZE, PHP_INT_SIZE);
            $unpacked = unpack('P', $packed);
            if (is_array($unpacked)) {
                return array_shift($unpacked);
            }
            return null;
        }

        // Byte-length strlen, even when mbstring overloads the string functions.
        protected function _binary_strlen($str)
        {
            if (function_exists('mb_internal_encoding') && (ini_get('mbstring.func_overload') & 2)) {
                return mb_strlen($str, '8bit');
            }
            return strlen($str);
        }

        // Byte-offset substr, even when mbstring overloads the string functions.
        protected function _binary_substr($str, $start, $length = null)
        {
            if (function_exists('mb_internal_encoding') && (ini_get('mbstring.func_overload') & 2)) {
                return mb_substr($str, $start, $length, '8bit');
            }
            return substr($str, $start, $length);
        }
    }
Discussion
Memory Usage
The custom array uses drastically less memory. At 450,000 items, the native array uses almost 18 times as much memory as our custom array.
Backing Storage
The storage itself is pretty simple: we’re storing our data structure in a packed binary string rather than using the formal array language construct. Thus, our total memory usage will be the length of the string plus the minimal internal overhead PHP uses for operations on that string.
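As a small illustration of that layout, the sketch below packs three integers and inspects the resulting string. It assumes a 64-bit PHP build with the 'P' format available and that mbstring.func_overload is not enabled, so plain strlen and substr work on bytes. The variable names are illustrative only.

```php
<?php
// Build a backing string from three integers, 8 bytes each.
$backing = '';
foreach (array(1, 2, 258) as $value) {
    $backing .= pack('P', $value); // 'P' = 64-bit, little-endian byte order
}

$bytes = strlen($backing);                      // 24 (3 items * 8 bytes)
$thirdSlot = bin2hex(substr($backing, 16, 8));  // "0201000000000000"

echo $bytes, " bytes, third slot: ", $thirdSlot, "\n";
```

The hex dump shows 258 (0x0102) stored low byte first, and the string length is exactly eight bytes per item with no per-item bookkeeping.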
Item Count
The number of items in our “array” of 64-bit integers is _binary_strlen($backing) / 8. Note the use of the custom function _binary_strlen rather than strlen to ensure we’re working in a binary-safe mode: when mbstring.func_overload is enabled, strlen may count characters rather than bytes.
Insertion
Inserting an item into the array can be as basic as concatenating the result of pack('P', $value). It’s entirely possible to use a more targeted approach to insertions too, such as ensuring the contents remain sorted for binary search. Ultimately, it can be as simple or complex as needed as long as it supports the desired data set and the actions that will be performed on it.
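One way the sorted variant could look is sketched below. These helpers are hypothetical and not part of the class above. They assume a 64-bit PHP build with the 'P' format available and no mbstring.func_overload, so plain strlen and substr are byte-based.

```php
<?php
// Hypothetical helpers; the names are illustrative only.

// Reads the integer stored in slot $index of the packed string.
function int64_at($backing, $index)
{
    $unpacked = unpack('P', substr($backing, $index * 8, 8));
    return $unpacked[1]; // unnamed unpack formats are keyed starting at 1
}

// Binary search: returns the first slot whose value is >= $value.
function int64_lower_bound($backing, $value)
{
    $low = 0;
    $high = (int) (strlen($backing) / 8);
    while ($low < $high) {
        $mid = $low + (int) (($high - $low) / 2);
        if (int64_at($backing, $mid) < $value) {
            $low = $mid + 1;
        } else {
            $high = $mid;
        }
    }
    return $low;
}

// Returns a new backing string with $value spliced in at its sorted position.
function int64_insert_sorted($backing, $value)
{
    $offset = int64_lower_bound($backing, $value) * 8;
    return substr($backing, 0, $offset) . pack('P', $value) . substr($backing, $offset);
}

$sorted = '';
foreach (array(42, 7, 19) as $value) {
    $sorted = int64_insert_sorted($sorted, $value);
}
// Slots now hold 7, 19, 42 in that order.
```

Finding the position takes O(log n) reads, but the splice copies the string, so each insertion is still O(n). That trade-off tends to suit data that is written rarely and searched often.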
Lookup
Since this is effectively a C-style array, lookup is done by calculating the offset from the first item and grabbing n bytes (in this case, 8) from that position. Like the item count, it’s important to note the use of the custom function _binary_substr rather than substr to ensure binary safety.
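The offset arithmetic can be sketched as a standalone function. This is an illustrative variant rather than the class method itself: int64_get is a made-up name, and the code assumes a 64-bit PHP build with the 'P' format and byte-based strlen and substr (no mbstring.func_overload).

```php
<?php
// Hypothetical standalone getter; the name is illustrative only.
function int64_get($backing, $index)
{
    $width = 8; // bytes per 64-bit integer
    $count = (int) (strlen($backing) / $width);
    if (!is_int($index) || $index < 0 || $index >= $count) {
        return null; // out of range: nothing to read
    }
    $offset = $index * $width; // distance in bytes from the first item
    $unpacked = unpack('P', substr($backing, $offset, $width));
    return $unpacked[1]; // unnamed unpack formats are keyed starting at 1
}

$backing = pack('P', 10) . pack('P', 20) . pack('P', 30);
$second = int64_get($backing, 1);  // 20
$missing = int64_get($backing, 3); // null, since only slots 0..2 exist
```

Because every item has the same width, the lookup is a single multiplication plus one substring read, regardless of how many items are stored.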
Conclusions
This approach isn’t a universal replacement for PHP arrays, but it does offer a better tool for some situations. When PHP code needs to run on a variety of hosts, it helps the program stay within a low value for the INI setting memory_limit, which is a fairly common constraint. It’s especially useful when running in environments where the developer can’t control or predict the system limits.
For a more robust, generic replacement, another format to look at is MessagePack. Like the example above, it uses a packed binary string to implement array-like functionality while saving significantly on the required memory.