Scripting languages should provide the everyday facilities that every programmer uses, and fetching content from the Internet is one of them. Python provides modules that allow fetching URLs. urllib2 is a Python module that helps in fetching URLs (Uniform Resource Locators). It offers a very simple interface in the form of the urlopen function, which is capable of fetching URLs using a variety of different protocols. The module also offers ways of handling basic authentication, cookies, proxies and so on. These are provided by objects called handlers and openers.
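As a rough sketch of the handler/opener idea, a basic-authentication handler can be plugged into an opener like this (the URL, realm and credentials are placeholders, not values from this article):
import urllib2
# Handler that answers HTTP basic-auth challenges for the given realm and URL
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password('Protected Area', 'http://www.nove.org/private/', 'username', 'password')
# Build an opener around the handler and install it globally,
# so later urlopen() calls use it automatically
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.nove.org/private/')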
Case 1 – Make a call
In order to make a call to a URL we can use:
import urllib2
url = 'http://www.nove.org'
response = urllib2.urlopen(url)
print response.read()
This is the simplest way of making a call to a URL and getting the response.
HTTP Requests – As we know, HTTP is based on requests and responses: the client makes a request and the server sends back a response. urllib2 has a way to create a Request object that represents the HTTP request; when sent to the server, it returns a response object.
The request, once created, can be opened with the same urlopen() method. The response is a file-like object, which means it can be processed by calling .read() on it:
req = urllib2.Request('http://www.nove.org')
response = urllib2.urlopen(req)
data_page = response.read()
urllib2 uses the same Request interface to handle FTP too, for example:
req = urllib2.Request('ftp://example.com/')
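Besides .read(), the response object exposes a few helpers for inspecting the result; a short sketch, reusing the URL from the example above:
import urllib2
response = urllib2.urlopen('http://www.nove.org')
print response.geturl()    # final URL, after any redirects
print response.info()      # the response headers
print response.getcode()   # HTTP status code, e.g. 200 (Python 2.6+)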
Case 2 – Post requests
urllib2 can be used for posting data too. When submitting HTML forms, the data needs to be encoded before sending and then passed to the Request object as the data argument. The encoding is done with the urllib module rather than urllib2.
This can be done as:
import urllib
import urllib2
url = 'http://www.nova.com/SameServlet'
values = {'name' : 'Nova', 'location' : 'Hyderabad', 'language' : 'Python Call'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
data_page = response.read()
Once we have the values, we run them through urllib.urlencode() before attaching them to the request. With the encoded data we create a Request object, passing the URL and the data, and then urlopen() is called on the request object to get the response.
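The same urlencode() step also works for GET-style requests, where the encoded values are appended to the URL as a query string instead of being sent in the request body. A minimal sketch, using the same placeholder URL and values:
import urllib
import urllib2
url = 'http://www.nova.com/SameServlet'
values = {'name' : 'Nova', 'location' : 'Hyderabad'}
query_string = urllib.urlencode(values)
# Appending the encoded values to the URL sends them as GET parameters
response = urllib2.urlopen(url + '?' + query_string)
data_page = response.read()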
Case 3 – Delete Requests
There will be cases where we need to use other methods, such as PUT or DELETE, to perform operations. A DELETE operation can be done as:
url = "http://nova.com/" + assetID   # assetID identifies the resource to delete
req = urllib2.Request(url, data='1121')
req.get_method = lambda: 'DELETE'     # override the HTTP method on this request
urllib2.urlopen(req).read()
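The same override covers PUT as well; a brief sketch, with a placeholder URL and payload:
import urllib
import urllib2
url = 'http://nova.com/assets/1121'
data = urllib.urlencode({'location' : 'Hyderabad'})
req = urllib2.Request(url, data)
req.get_method = lambda: 'PUT'   # would otherwise default to POST because data is set
response = urllib2.urlopen(req)
print response.read()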
Case 4 – Headers
Headers play an important role when making calls to external web resources. The User-Agent header is one important piece of information that can identify the source of the request.
So in order to add the header to the request we can use:
request = urllib2.Request('http://localhost:8080/')
request.add_header('User-agent', 'www.nove.com')
response = urllib2.urlopen(request)
data = response.read()
After creating a Request object,
use add_header() to set the user agent value before opening
the request.
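Headers can also be supplied up front through the headers dictionary of the Request constructor; a brief sketch with the same placeholder values:
import urllib2
headers = {'User-agent' : 'www.nove.com'}
request = urllib2.Request('http://localhost:8080/', headers=headers)
response = urllib2.urlopen(request)
data = response.read()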
I hope this article on the basics of urllib2 will help people dig deeper into the library.