Commit 9426b20

Doc: update urllib.request examples to handle gzip compression
1 parent 77c22d6 commit 9426b20

1 file changed: +20 -12 lines changed

Doc/library/urllib.request.rst

Lines changed: 20 additions & 12 deletions
@@ -1252,9 +1252,13 @@ it::
 
    >>> import urllib.request
    >>> with urllib.request.urlopen('https://www.python.org/') as f:
-   ...     print(f.read(300))
-   ...
-   b'<!doctype html>\n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 8]> <html class="no-js ie8 lt-ie9">
+   ...     # The response may be compressed (for example, 'gzip').
+   ...     print(f.headers.get('Content-Encoding'))
+   ...     data = f.read()
+   ...     if f.headers.get('Content-Encoding') == 'gzip':
+   ...         import gzip
+   ...         data = gzip.decompress(data)
+   ...     print(data[:300].decode('utf-8', errors='replace'))
 
 Note that urlopen returns a bytes object. This is because there is no way
 for urlopen to automatically determine the encoding of the byte stream
@@ -1272,25 +1276,29 @@ As the python.org website uses *utf-8* encoding as specified in its meta tag, we
 will use the same for decoding the bytes object::
 
    >>> with urllib.request.urlopen('https://www.python.org/') as f:
-   ...     print(f.read(100).decode('utf-8'))
+   ...     # Check for compression and decode appropriately.
+   ...     enc = f.headers.get('Content-Encoding')
+   ...     data = f.read()
+   ...     if enc == 'gzip':
+   ...         import gzip
+   ...         data = gzip.decompress(data)
+   ...     print(data[:100].decode('utf-8', errors='replace'))
    ...
-   <!doctype html>
-   <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
-   <!-
 
 It is also possible to achieve the same result without using the
 :term:`context manager` approach::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('https://www.python.org/')
    >>> try:
-   ...     print(f.read(100).decode('utf-8'))
+   ...     enc = f.headers.get('Content-Encoding')
+   ...     data = f.read()
+   ...     if enc == 'gzip':
+   ...         import gzip
+   ...         data = gzip.decompress(data)
+   ...     print(data[:100].decode('utf-8', errors='replace'))
    ... finally:
    ...     f.close()
-   ...
-   <!doctype html>
-   <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
-   <!--
 
 In the following example, we are sending a data-stream to the stdin of a CGI
 and reading the data it returns to us. Note that this example will only work
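
The three changed examples repeat the same steps: read the body, check the Content-Encoding header, decompress if it is gzip, then decode. A minimal sketch of that pattern as a standalone function follows. The name `decode_body` and its parameters are hypothetical and not part of this commit or of urllib. The case-insensitive header comparison is an added assumption (the examples in the diff compare against 'gzip' exactly), and the demonstration uses locally compressed bytes so that it runs without network access.

```python
import gzip


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return the body as text, undoing gzip compression if the header says so.

    Hypothetical helper for illustration; it only handles 'gzip'.
    """
    # Normalise the header value: it may be missing, padded, or upper-case.
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    # errors="replace" mirrors the examples in the diff: undecodable bytes
    # become U+FFFD instead of raising UnicodeDecodeError.
    return raw.decode(charset, errors="replace")


# Offline demonstration with a locally compressed payload.
html = "<!doctype html>\n<html lang=\"en\">"
compressed = gzip.compress(html.encode("utf-8"))

print(decode_body(compressed, "gzip"))          # compressed body
print(decode_body(html.encode("utf-8"), None))  # uncompressed body, no header
```

With a real response object `f` from `urllib.request.urlopen`, the call would presumably look like `decode_body(f.read(), f.headers.get('Content-Encoding'))`, which leaves the slicing (`[:100]` or `[:300]`) to the caller.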
