How To Remove Scheme From Url In Python?

I am working with an application, written with Flask, that returns URLs. I want the URL displayed to the user to be as clean as possible, so I want to remove the http:// from it. I l

Solution 1:

I don't think urlparse offers a single method or function for this. This is how I'd do it:

from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

url = 'HtTp://stackoverflow.com/questions/tagged/python?page=2'

def strip_scheme(url):
    # urlparse lowercases the scheme, so geturl() starts with "http://"
    parsed = urlparse(url)
    scheme = "%s://" % parsed.scheme
    return parsed.geturl().replace(scheme, '', 1)

print(strip_scheme(url))

Output:

stackoverflow.com/questions/tagged/python?page=2

With plain string handling alone, you would have to deal with http, https, and possibly other schemes yourself. This approach also handles unusual casing of the scheme.
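To illustrate the point about schemes and casing, here is a minimal sketch comparing the urlparse-based function with a plain string replacement. The helper name naive_strip and the example.com URLs are hypothetical and used only for the comparison.

```python
from urllib.parse import urlparse


def strip_scheme(url):
    # Same approach as above: let urlparse normalize the scheme first
    parsed = urlparse(url)
    scheme = "%s://" % parsed.scheme
    return parsed.geturl().replace(scheme, "", 1)


def naive_strip(url):
    # Hypothetical plain-string version: only knows lowercase "http://"
    return url.replace("http://", "", 1)


samples = [
    "HtTp://example.com/a",
    "https://example.com/a?b=1",
    "ftp://example.com/pub",
]

for sample in samples:
    print(sample, "->", strip_scheme(sample), "|", naive_strip(sample))
```

In this sketch, naive_strip leaves all three sample URLs unchanged, because none of them contains the exact lowercase string "http://".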

Solution 2:

If you are working with these URLs programmatically, then rather than using a string replace, I suggest having urlparse rebuild the URL without a scheme.

The ParseResult object is a (named) tuple, so you can create another one with the fields you don't want blanked out.

# py2/3 compatibility
try:
    from urllib.parse import urlparse, ParseResult
except ImportError:
    from urlparse import urlparse, ParseResult


def strip_scheme(url):
    parsed_result = urlparse(url)
    return ParseResult('', *parsed_result[1:]).geturl()

You can remove any component of the ParseResult by replacing the corresponding field with an empty string.
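As a rough sketch of that idea, the function below blanks out whichever fields are named in a drop argument. The name strip_components and its drop parameter are hypothetical, not part of urllib; the sketch assumes Python 3.

```python
from urllib.parse import urlparse, ParseResult


def strip_components(url, drop=("scheme",)):
    # `drop` is a hypothetical parameter: names of ParseResult fields to blank
    parsed = urlparse(url)
    fields = [
        "" if name in drop else value
        for name, value in zip(parsed._fields, parsed)
    ]
    return ParseResult(*fields).geturl()


print(strip_components("https://yoman/hi?whatup#top", drop=("query", "fragment")))
# https://yoman/hi
print(strip_components("https://yoman/hi?whatup"))
# //yoman/hi?whatup
```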

It's important to note there is a functional difference between this answer and @Lukas Graf's answer. The most likely one to matter is that the '//' component of a URL isn't technically part of the scheme, so this answer preserves it, whereas @Lukas Graf's answer removes it.

>>> Lukas_strip_scheme('https://yoman/hi?whatup')
'yoman/hi?whatup'
>>> strip_scheme('https://yoman/hi?whatup')
'//yoman/hi?whatup'
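If the goal is a clean URL for display, the leading '//' can be dropped as a separate step. The following is a minimal sketch under that assumption; strip_scheme_for_display is a hypothetical name, and the code assumes Python 3.

```python
from urllib.parse import urlparse, ParseResult


def strip_scheme(url):
    # Rebuild the URL with an empty scheme; '//' + netloc is preserved
    parsed_result = urlparse(url)
    return ParseResult("", *parsed_result[1:]).geturl()


def strip_scheme_for_display(url):
    # Hypothetical helper: also drop the leading '//' left by strip_scheme
    stripped = strip_scheme(url)
    if stripped.startswith("//"):
        return stripped[2:]
    return stripped


print(strip_scheme("https://yoman/hi?whatup"))
# //yoman/hi?whatup
print(strip_scheme_for_display("https://yoman/hi?whatup"))
# yoman/hi?whatup
```

Keeping the two steps separate means the '//' form, which is still a valid scheme-relative reference, remains available when the result is used as a link rather than shown as text.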

Solution 3:

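I've seen this done in Flask libraries and extensions. It is worth noting that you can do it this way, although it relies on the underscore-prefixed ._replace method of the ParseResult/SplitResult. Despite the leading underscore, _replace is a documented namedtuple method rather than a private one.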

from urllib.parse import urlsplit, urlunsplit  # Python 2: from urlparse import urlsplit, urlunsplit

url = 'HtTp://stackoverflow.com/questions/tagged/python?page=2'
split_url = urlsplit(url)
# >>> SplitResult(scheme='http', netloc='stackoverflow.com', path='/questions/tagged/python', query='page=2', fragment='')
split_url_without_scheme = split_url._replace(scheme="")
# >>> SplitResult(scheme='', netloc='stackoverflow.com', path='/questions/tagged/python', query='page=2', fragment='')
new_url = urlunsplit(split_url_without_scheme)
# >>> '//stackoverflow.com/questions/tagged/python?page=2'
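The same steps can be wrapped in a small function. This is a sketch assuming Python 3; the name strip_scheme_split is hypothetical. Note that the result keeps the leading '//', as in Solution 2.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_scheme_split(url):
    # _replace returns a new SplitResult with only the scheme changed
    without_scheme = urlsplit(url)._replace(scheme="")
    return urlunsplit(without_scheme)


print(strip_scheme_split("HtTp://stackoverflow.com/questions/tagged/python?page=2"))
# //stackoverflow.com/questions/tagged/python?page=2
```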

Solution 4:

A simple regex search and replace works if you only need to handle lowercase http:// and https://.

import re
def strip_scheme(url: str):
    return re.sub(r'^https?:\/\/', '', url)
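If other schemes or mixed casing need to be covered, the pattern can be widened. The sketch below is one possible variant, not the original answer's code: it matches any scheme of the form described in RFC 3986 (a letter followed by letters, digits, '+', '-', or '.') followed by '://'. The name strip_any_scheme is hypothetical.

```python
import re

# A letter, then letters/digits/'+'/'.'/'-', then '://', at the start only
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def strip_any_scheme(url: str) -> str:
    # Hypothetical variant of strip_scheme that is not limited to http(s)
    return SCHEME_PREFIX.sub("", url)


print(strip_any_scheme("HtTp://stackoverflow.com/questions/tagged/python?page=2"))
# stackoverflow.com/questions/tagged/python?page=2
print(strip_any_scheme("ftp://example.com/pub"))
# example.com/pub
print(strip_any_scheme("example.com/path"))
# example.com/path
```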
