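Language      | Overloaded string/bytes type | Unambiguous array-of-bytes type | Unambiguous textual-string type
--------------+------------------------------+---------------------------------+--------------------------------
C++ 11        | std::string                  | std::vector<byte>               |
C#            |                              | byte[]                          | string
Java          |                              | byte[]                          | String
Perl 5        | SCALAR                       |                                 |
PHP 5         | string                       |                                 |
Python <= 2.5 | str                          |                                 | unicode
Python >= 2.6 | str                          | bytearray (4)                   | unicode
Python 3      |                              | bytes, bytearray                | str
Ruby 1.8      | String                       | – (5)                           |
Ruby 1.9      | String                       |                                 |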

C++ 11

Because we use a box type, Variant, in C++, any ambiguity in interpreting a string is easily handled by qualifying the value container. It doesn't seem like too much of a stretch to me to take the same approach in bindings for languages that don't offer disambiguated types.
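As a rough illustration of what qualifying the value container could look like in a binding for a language with a single overloaded string type, here is a minimal sketch in Python. The TaggedValue class, its kind constants, and its helper methods are hypothetical names and are not the actual Variant API; the only idea taken from the text above is that the container, rather than the payload's type, records whether the data is binary or text.

```python
class TaggedValue(object):
    """Hypothetical stand-in for a Variant-style box: a payload plus an explicit kind."""

    BINARY = "binary"
    STRING = "string"

    def __init__(self, payload, kind):
        if kind not in (self.BINARY, self.STRING):
            raise ValueError("unknown kind: %r" % (kind,))
        self.payload = payload
        self.kind = kind

    @classmethod
    def binary(cls, payload):
        # The caller states the intent; the payload's type is not inspected.
        return cls(payload, cls.BINARY)

    @classmethod
    def string(cls, payload):
        return cls(payload, cls.STRING)


# The same overloaded value can be boxed either way; the tag carries
# the developer's intention through to whatever encodes the value.
raw = "hello"
as_bytes = TaggedValue.binary(raw)
as_text = TaggedValue.string(raw)
```

The trade-off is that the developer has to state the intent explicitly at every call site, which is the price of not having a type signal to infer it from.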

C++ 11 doesn't seem to have a dedicated type for Unicode text. It has wide characters (not the same thing), and it has literal syntax for Unicode strings (u8"...", u"...", and U"..."). These literals resolve, however, to arrays of char, char16_t, or char32_t, so there's no type signal we can easily use to figure out the developer's intention.

Perl 5

I know too little about Perl to say what's going on here. Scalar::Util's reftype seems to offer a way to get more type information, but for a string it reports only SCALAR, which doesn't distinguish text from bytes. Perl has an array type, but using it for byte arrays doesn't appear to be recommended.

[jross@localhost ~]$ perl -e 'use Scalar::Util qw(reftype); my $foo = "hello"; print reftype(\$foo) . "\n"'
SCALAR
[jross@localhost ~]$ perl -e 'use Scalar::Util qw(reftype); my $foo = "hello"; print reftype([]) . "\n"'
ARRAY

PHP 5

PHP 5 has an array type that could hold bytes, but it's really an ordered map with integer keys, which I would consider too inefficient for this application.

Python 2

Python 2's 'bytes' type (added in 2.6) is simply an alias for str, so we can't use it to disambiguate. Python >= 2.6 does, however, have bytearray, which I think would serve well enough.
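A minimal sketch of how a binding might use bytearray as the binary signal follows. The classify function and its return labels are hypothetical names chosen for illustration. The sketch assumes Python 2.6 or later, and it also runs on Python 3.3+, where bytes is a distinct type rather than an alias.

```python
def classify(value):
    """Label a value as 'binary', 'text', or 'ambiguous' from its type alone."""
    if isinstance(value, bytearray):
        # bytearray is unambiguous on Python 2.6+ and on Python 3.
        return "binary"
    if bytes is not str and isinstance(value, bytes):
        # Only reached where bytes is a real type; on Python 2, bytes is str.
        return "binary"
    if isinstance(value, type(u"")):
        # unicode on Python 2, str on Python 3.
        return "text"
    if isinstance(value, str):
        # Python 2 str: could be either text or raw bytes.
        return "ambiguous"
    raise TypeError("unsupported type: %r" % (type(value),))
```

Under this scheme a Python 2 caller who wants a value sent as binary wraps it explicitly, for example bytearray(data), and a plain str stays ambiguous until the binding picks a default.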

Ruby 1.8

Ruby <= 1.8 doesn't have explicit string encodings, and I can't tell what the default is.

Ruby >= 1.9

At first glance Ruby >= 1.9 seems to offer everything we need, but it doesn't. It has strings with encodings; it doesn't have an explicit binary data type that is easily distinguished from text. One could use an Array of ints, but that's perhaps less efficient than we need.

irb(main):003:0> x = [1, 2, 3, 255]
=> [1, 2, 3, 255]
irb(main):004:0> x.class
=> Array
irb(main):016:0> x = "holla"
=> "holla"
irb(main):017:0> x.class
=> String
irb(main):018:0> x.encoding
=> #<Encoding:UTF-8>
2 Comments

    1. Older 2.x versions of Python have 'buffer', which looks like it could work.

      http://docs.python.org/2/library/functions.html#buffer