导航栏: 首页 评论列表

URL Module

默认分类 2019/03/08 03:09

URL#

Stability: 2 - Stable

The url module provides utilities for URL resolution and parsing. It can be accessed using:

const url = require('url');

URL Strings and URL Objects#

A URL string is a structured string containing multiple meaningful components. When parsed, a URL object is returned containing properties for each of these components.

The url module provides two APIs for working with URLs: a legacy API that is Node.js specific, and a newer API that implements the same WHATWG URL Standard used by web browsers.

While the Legacy API has not been deprecated, it is maintained solely for backwards compatibility with existing applications. New application code should use the WHATWG API.

A comparison between the WHATWG and Legacy APIs is provided below. Above the URL 'http://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash', properties of an object returned by the legacy url.parse() are shown. Below it are properties of a WHATWG URL object.

WHATWG URL's origin property includes protocol and host, but not username or password.

┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              href                                              │
├──────────┬──┬─────────────────────┬────────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │          host          │           path            │ hash  │
│          │  │                     ├─────────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │    hostname     │ port │ pathname │     search     │       │
│          │  │                     │                 │      │          ├─┬──────────────┤       │
│          │  │                     │                 │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.example.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │    hostname     │ port │          │                │       │
│          │  │          │          ├─────────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │          host          │          │                │       │
├──────────┴──┼──────────┴──────────┼────────────────────────┤          │                │       │
│   origin    │                     │         origin         │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴────────────────────────┴──────────┴────────────────┴───────┤
│                                              href                                              │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
(all spaces in the "" line should be ignored — they are purely for formatting)

Parsing the URL string using the WHATWG API:

const myURL =
  new URL('https://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash');

Parsing the URL string using the Legacy API:

const url = require('url');
const myURL =
  url.parse('https://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash');

The WHATWG URL API#

Class: URL#

Browser-compatible URL class, implemented by following the WHATWG URL Standard. Examples of parsed URLs may be found in the Standard itself. The URL class is also available on the global object.

In accordance with browser conventions, all properties of URL objects are implemented as getters and setters on the class prototype, rather than as data properties on the object itself. Thus, unlike legacy urlObjects, using the delete keyword on any properties of URL objects (e.g. delete myURL.protocol, delete myURL.pathname, etc) has no effect but will still return true.

Constructor: new URL(input[, base])#

Creates a new URL object by parsing the input relative to the base. If base is passed as a string, it will be parsed equivalent to new URL(base).

const myURL = new URL('/foo', 'https://example.org/');
// https://example.org/foo

A TypeError will be thrown if the input or base are not valid URLs. Note that an effort will be made to coerce the given values into strings. For instance:

const myURL = new URL({ toString: () => 'https://example.org/' });
// https://example.org/

Unicode characters appearing within the hostname of input will be automatically converted to ASCII using the Punycode algorithm.

const myURL = new URL('https://測試');
// https://xn--g6w251d/

This feature is only available if the node executable was compiled with ICU enabled. If not, the domain names are passed through unchanged.

In cases where it is not known in advance if input is an absolute URL and a base is provided, it is advised to validate that the origin of the URL object is what is expected.

let myURL = new URL('http://Example.com/', 'https://example.org/');
// http://example.com/

myURL = new URL('https://Example.com/', 'https://example.org/');
// https://example.com/

myURL = new URL('foo://Example.com/', 'https://example.org/');
// foo://Example.com/

myURL = new URL('http:Example.com/', 'https://example.org/');
// http://example.com/

myURL = new URL('https:Example.com/', 'https://example.org/');
// https://example.org/Example.com/

myURL = new URL('foo:Example.com/', 'https://example.org/');
// foo:Example.com/

url.hash#

Gets and sets the fragment portion of the URL.

const myURL = new URL('https://example.org/foo#bar');
console.log(myURL.hash);
// Prints #bar

myURL.hash = 'baz';
console.log(myURL.href);
// Prints https://example.org/foo#baz

Invalid URL characters included in the value assigned to the hash property are percent-encoded. Note that the selection of which characters to percent-encode may vary somewhat from what the url.parse() and url.format() methods would produce.

url.host#

Gets and sets the host portion of the URL.

const myURL = new URL('https://example.org:81/foo');
console.log(myURL.host);
// Prints example.org:81

myURL.host = 'example.com:82';
console.log(myURL.href);
// Prints https://example.com:82/foo

Invalid host values assigned to the host property are ignored.

url.hostname#

Gets and sets the hostname portion of the URL. The key difference between url.host and url.hostname is that url.hostname does not include the port.

const myURL = new URL('https://example.org:81/foo');
console.log(myURL.hostname);
// Prints example.org

myURL.hostname = 'example.com:82';
console.log(myURL.href);
// Prints https://example.com:81/foo

Invalid hostname values assigned to the hostname property are ignored.

url.href#

Gets and sets the serialized URL.

const myURL = new URL('https://example.org/foo');
console.log(myURL.href);
// Prints https://example.org/foo

myURL.href = 'https://example.com/bar';
console.log(myURL.href);
// Prints https://example.com/bar

Getting the value of the href property is equivalent to calling url.toString().

Setting the value of this property to a new value is equivalent to creating a new URL object using new URL(value). Each of the URL object's properties will be modified.

If the value assigned to the href property is not a valid URL, a TypeError will be thrown.

url.origin#

Gets the read-only serialization of the URL's origin.

const myURL = new URL('https://example.org/foo/bar?baz');
console.log(myURL.origin);
// Prints https://example.org
const idnURL = new URL('https://測試');
console.log(idnURL.origin);
// Prints https://xn--g6w251d

console.log(idnURL.hostname);
// Prints xn--g6w251d

url.password#

Gets and sets the password portion of the URL.

const myURL = new URL('https://abc:xyz@example.com');
console.log(myURL.password);
// Prints xyz

myURL.password = '123';
console.log(myURL.href);
// Prints https://abc:123@example.com

Invalid URL characters included in the value assigned to the password property are percent-encoded. Note that the selection of which characters to percent-encode may vary somewhat from what the url.parse() and url.format() methods would produce.

url.pathname#

Gets and sets the path portion of the URL.

const myURL = new URL('https://example.org/abc/xyz?123');
console.log(myURL.pathname);
// Prints /abc/xyz

myURL.pathname = '/abcdef';
console.log(myURL.href);
// Prints https://example.org/abcdef?123

Invalid URL characters included in the value assigned to the pathname property are percent-encoded. Note that the selection of which characters to percent-encode may vary somewhat from what the url.parse() and url.format() methods would produce.

url.port#

Gets and sets the port portion of the URL.

The port value may be a number or a string containing a number in the range 0 to 65535 (inclusive). Setting the value to the default port of the URL objects given protocol will result in the port value becoming the empty string ('').

The port value can be an empty string in which case the port depends on the protocol/scheme:

protocolport
"ftp"21
"file"
"gopher"70
"http"80
"https"443
"ws"80
"wss"443

Upon assigning a value to the port, the value will first be converted to a string using .toString().

If that string is invalid but it begins with a number, the leading number is assigned to port. If the number lies outside the range denoted above, it is ignored.

const myURL = new URL('https://example.org:8888');
console.log(myURL.port);
// Prints 8888

// Default ports are automatically transformed to the empty string
// (HTTPS protocol's default port is 443)
myURL.port = '443';
console.log(myURL.port);
// Prints the empty string
console.log(myURL.href);
// Prints https://example.org/

myURL.port = 1234;
console.log(myURL.port);
// Prints 1234
console.log(myURL.href);
// Prints https://example.org:1234/

// Completely invalid port strings are ignored
myURL.port = 'abcd';
console.log(myURL.port);
// Prints 1234

// Leading numbers are treated as a port number
myURL.port = '5678abcd';
console.log(myURL.port);
// Prints 5678

// Non-integers are truncated
myURL.port = 1234.5678;
console.log(myURL.port);
// Prints 1234

// Out-of-range numbers which are not represented in scientific notation
// will be ignored.
myURL.port = 1e10; // 10000000000, will be range-checked as described below
console.log(myURL.port);
// Prints 1234

Note that numbers which contain a decimal point, such as floating-point numbers or numbers in scientific notation, are not an exception to this rule. Leading numbers up to the decimal point will be set as the URL's port, assuming they are valid:

myURL.port = 4.567e21;
console.log(myURL.port);
// Prints 4 (because it is the leading number in the string '4.567e21')

url.protocol#

Gets and sets the protocol portion of the URL.

const myURL = new URL('https://example.org');
console.log(myURL.protocol);
// Prints https:

myURL.protocol = 'ftp';
console.log(myURL.href);
// Prints ftp://example.org/

Invalid URL protocol values assigned to the protocol property are ignored.

Special Schemes#

The WHATWG URL Standard considers a handful of URL protocol schemes to be special in terms of how they are parsed and serialized. When a URL is parsed using one of these special protocols, the url.protocol property may be changed to another special protocol but cannot be changed to a non-special protocol, and vice versa.

For instance, changing from http to https works:

const u = new URL('http://example.org');
u.protocol = 'https';
console.log(u.href);
// https://example.org

However, changing from http to a hypothetical fish protocol does not because the new protocol is not special.

const u = new URL('http://example.org');
u.protocol = 'fish';
console.log(u.href);
// http://example.org

Likewise, changing from a non-special protocol to a special protocol is also not permitted:

const u = new URL('fish://example.org');
u.protocol = 'http';
console.log(u.href);
// fish://example.org

The protocol schemes considered to be special by the WHATWG URL Standard include: ftp, file, gopher, http, https, ws, and wss.

url.search#

Gets and sets the serialized query portion of the URL.

const myURL = new URL('https://example.org/abc?123');
console.log(myURL.search);
// Prints ?123

myURL.search = 'abc=xyz';
console.log(myURL.href);
// Prints https://example.org/abc?abc=xyz

Any invalid URL characters appearing in the value assigned the search property will be percent-encoded. Note that the selection of which characters to percent-encode may vary somewhat from what the url.parse() and url.format() methods would produce.

url.searchParams#

Gets the URLSearchParams object representing the query parameters of the URL. This property is read-only; to replace the entirety of query parameters of the URL, use the url.search setter. See URLSearchParams documentation for details.

url.username#

Gets and sets the username portion of the URL.

const myURL = new URL('https://abc:xyz@example.com');
console.log(myURL.username);
// Prints abc

myURL.username = '123';
console.log(myURL.href);
// Prints https://123:xyz@example.com/

Any invalid URL characters appearing in the value assigned the username property will be percent-encoded. Note that the selection of which characters to percent-encode may vary somewhat from what the url.parse() and url.format() methods would produce.

url.toString()#

The toString() method on the URL object returns the serialized URL. The value returned is equivalent to that of url.href and url.toJSON().

Because of the need for standard compliance, this method does not allow users to customize the serialization process of the URL. For more flexibility, require('url').format() method might be of interest.

url.toJSON()#

The toJSON() method on the URL object returns the serialized URL. The value returned is equivalent to that of url.href and url.toString().

This method is automatically called when an URL object is serialized with JSON.stringify().

const myURLs = [
  new URL('https://www.example.com'),
  new URL('https://test.example.org')
];
console.log(JSON.stringify(myURLs));
// Prints ["https://www.example.com/","https://test.example.org/"]

Class: URLSearchParams#

The URLSearchParams API provides read and write access to the query of a URL. The URLSearchParams class can also be used standalone with one of the four following constructors. The URLSearchParams class is also available on the global object.

The WHATWG URLSearchParams interface and the querystring module have similar purpose, but the purpose of the querystring module is more general, as it allows the customization of delimiter characters (& and =). On the other hand, this API is designed purely for URL query strings.

const myURL = new URL('https://example.org/?abc=123');
console.log(myURL.searchParams.get('abc'));
// Prints 123

myURL.searchParams.append('abc', 'xyz');
console.log(myURL.href);
// Prints https://example.org/?abc=123&abc=xyz

myURL.searchParams.delete('abc');
myURL.searchParams.set('a', 'b');
console.log(myURL.href);
// Prints https://example.org/?a=b

const newSearchParams = new URLSearchParams(myURL.searchParams);
// The above is equivalent to
// const newSearchParams = new URLSearchParams(myURL.search);

newSearchParams.append('a', 'c');
console.log(myURL.href);
// Prints https://example.org/?a=b
console.log(newSearchParams.toString());
// Prints a=b&a=c

// newSearchParams.toString() is implicitly called
myURL.search = newSearchParams;
console.log(myURL.href);
// Prints https://example.org/?a=b&a=c
newSearchParams.delete('a');
console.log(myURL.href);
// Prints https://example.org/?a=b&a=c

Constructor: new URLSearchParams()#

Instantiate a new empty URLSearchParams object.

Constructor: new URLSearchParams(string)#

Parse the string as a query string, and use it to instantiate a new URLSearchParams object. A leading '?', if present, is ignored.

let params;

params = new URLSearchParams('user=abc&query=xyz');
console.log(params.get('user'));
// Prints 'abc'
console.log(params.toString());
// Prints 'user=abc&query=xyz'

params = new URLSearchParams('?user=abc&query=xyz');
console.log(params.toString());
// Prints 'user=abc&query=xyz'

Constructor: new URLSearchParams(obj)#