I'm using this code to get standard output from an external program:
>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
The communicate() method returns an array of bytes:
>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n'
However, I'd like to work with the output as a normal Python string, so that I could print it like this:
>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2
I thought that's what the binascii.b2a_qp() method is for, but when I tried it, I got the same byte array again:
>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n'
Does anybody know how to convert the bytes value back to string? I mean, using the "batteries" instead of doing it manually. And I'd like it to be ok with Python 3.
You need to decode the bytes object to produce a string. UTF-8 is a very common encoding, but you need to use the encoding your data is actually in.
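A minimal sketch of the decode step, assuming the data is UTF-8. The sample bytes are made up for illustration and are not the asker's actual output:

```python
# Sample bytes standing in for captured program output (hypothetical data).
raw = b"total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n"

# bytes.decode() turns bytes into str using the named encoding.
text = raw.decode("utf-8")

print(type(text).__name__)
print(text, end="")
```

If the bytes are not valid in the encoding you name, decode() raises UnicodeDecodeError instead of guessing.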
You need to decode the byte string and turn it into a character (Unicode) string.
I think this way is easy:
bytes = [112, 52, 52]
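The snippet above stops at the list of integers. One way to finish it, as a sketch: map each integer to its character with chr(), or build a bytes object and decode it. The variable is renamed here (an assumption, not the answerer's code) so that it does not shadow the built-in bytes type.

```python
byte_values = [112, 52, 52]

# chr() maps each integer to the character with that code point.
# For values 0-255 this is the same as Latin-1 decoding, so it is only
# correct when the data really is ASCII or Latin-1.
via_chr = "".join(map(chr, byte_values))

# Same result for ASCII data, but with the encoding stated explicitly.
via_decode = bytes(byte_values).decode("ascii")

print(via_chr)     # p44
print(via_decode)  # p44
```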
I think what you actually want is this:
>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')
Aaron's answer was correct, except that you need to know WHICH encoding to use. And I believe that Windows uses 'windows-1252'. It only matters if you have some unusual (non-ASCII) characters in your content, but then it does make a difference.
By the way, the fact that it DOES matter is the reason that Python moved to using two different types for binary and text data: it can't convert magically between them because it doesn't know the encoding unless you tell it! The only way YOU would know is to read the Windows documentation (or read it here).
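If you don't know the encoding, then to read binary input into a string in a way that works on both Python 3 and Python 2, use the ancient MS-DOS cp437 encoding:
import sys
PY3K = sys.version_info >= (3, 0)
lines = []
for line in stream:  # stream: any binary file-like object
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))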
Because encoding is unknown, expect non-English symbols to translate to characters of cp437 (English chars are not translated, because they match in most single byte encodings and UTF-8).
Decoding arbitrary binary input to UTF-8 is unsafe, because you may get this:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid
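The traceback above can be reproduced with a short sketch. The sample bytes are an assumption, chosen so that byte 0xff sits at position 2 as in the error message. cp437 is shown alongside because it assigns a character to every byte value.

```python
data = b"\x00\x01\xffsd"  # hypothetical input with a byte that is invalid in UTF-8

try:
    data.decode("utf-8")
    utf8_failed = False
    bad_position = None
except UnicodeDecodeError as exc:
    utf8_failed = True
    bad_position = exc.start  # index of the first offending byte
    print(exc.reason, "at position", bad_position)

# cp437 maps all 256 byte values, so this does not raise, but the characters
# are only meaningful if the data really was cp437.
as_cp437 = data.decode("cp437")
print(repr(as_cp437))
```

The cp437 result never fails and can be encoded back to the same bytes, but any non-ASCII characters in it are a guess unless the source really used that code page.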
The same applies to latin-1, which was popular (default?) for Python 2. See the missing points in Codepage Layout - it is where Python chokes with infamous ordinal not in range.
UPDATE 20150604: There are rumors that Python 3 has a surrogateescape error strategy for encoding stuff into binary data without data loss and crashes, but it needs conversion tests [binary] -> [str] -> [binary] to validate both performance and reliability.
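The surrogateescape error handler does exist in Python 3 (it is described in PEP 383). A small [binary] -> [str] -> [binary] round-trip check, as a sketch rather than a performance or reliability test:

```python
data = bytes(range(256))  # every possible byte value; the upper half is not valid UTF-8 here

# Undecodable bytes become lone surrogate code points instead of raising.
text = data.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
restored = text.encode("utf-8", errors="surrogateescape")

print(restored == data)
```

Note that the intermediate string contains lone surrogates, so encoding it with the default strict handler, for example when printing, raises UnicodeEncodeError.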
P.S. I used to be a Python fanboy like you, then I took an ordinal not in range.
Set universal_newlines to True, i.e.
command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]
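A portable sketch of the same idea. A child Python process stands in for `ls -l` here (an assumption, so that the example also runs where `ls` is not available):

```python
import subprocess
import sys

# Hypothetical stand-in command: a child interpreter that prints two lines.
cmd = [sys.executable, "-c", "print('file1'); print('file2')"]

# universal_newlines=True opens the pipe in text mode, so stdout is str, not bytes.
# Decoding uses the locale's preferred encoding unless `encoding` is passed.
out, _ = subprocess.Popen(
    cmd, stdout=subprocess.PIPE, universal_newlines=True
).communicate()

print(type(out).__name__)
print(out.splitlines())
```

On Python 3.7 and later, text=True is an alias for universal_newlines=True.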
While @Aaron Maenpaa's answer just works, a user recently asked
Is there any more simply way? 'fhand.read().decode("ASCII")' [...] It's so long!
You can call decode() with no arguments.
decode() has default arguments:
codecs.decode(obj, encoding='utf-8', errors='strict')
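A short sketch of those defaults on Python 3. The sample bytes are a made-up UTF-8 string, not data from the question:

```python
import codecs

raw = b"caf\xc3\xa9"  # hypothetical UTF-8 encoded sample

# bytes.decode() defaults to encoding='utf-8', errors='strict'.
short = raw.decode()
explicit = raw.decode(encoding="utf-8", errors="strict")

# codecs.decode() is the function form with the same defaults.
functional = codecs.decode(raw)

print(short == explicit == functional)
```

The short form is only a shortcut when the data really is UTF-8. For any other encoding the name still has to be passed.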
To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').