URL Encoding: Complete Guide
URL encoding, also known as percent-encoding, is essential for building web applications that handle user input, query parameters, and special characters correctly. Improper URL encoding causes broken links, security vulnerabilities, and data corruption. This comprehensive guide explains how URL encoding works and when to use different encoding functions.
What is URL Encoding?
URL encoding converts characters that have special meaning in URLs, or that aren't allowed in URLs at all, into a form that can be transmitted safely. When you see %20 in a URL, that is a space character that has been URL-encoded.
The process is called percent-encoding because each such character is replaced with a percent sign (%) followed by its hexadecimal byte value. The space character (ASCII 32, hex 20) becomes %20. The ampersand & (ASCII 38, hex 26) becomes %26. This lets these characters appear in URLs as data instead of being interpreted as URL structure.
URLs can safely contain only a limited set of characters. The unreserved characters, which never need encoding, are letters (A-Z, a-z), digits (0-9), and four special characters: hyphen (-), underscore (_), period (.), and tilde (~). All other characters should be percent-encoded when they appear in URL components where they might be misinterpreted.
The encoding process is straightforward: take the character's UTF-8 byte representation, then write each byte as %XX, where XX is the byte's two-digit hexadecimal value. An ASCII character that needs encoding becomes a single %XX sequence. Non-ASCII characters become several: ö is the two UTF-8 bytes C3 B6, so it becomes %C3%B6.
URL encoding is not encryption or obfuscation; it is purely about character safety. The encoded form is trivially decodable, and anyone reading a URL can tell that %20 means a space. The purpose is technical correctness, not security or privacy.
Understanding URL encoding is essential for building URLs programmatically from user input, parsing URLs correctly to extract data, debugging broken links, preventing injection attacks in web applications, and working with APIs that pass data in URLs.
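To make the byte-by-byte process concrete, here is a minimal JavaScript sketch of percent-encoding done by hand. The function name percentEncode is hypothetical; it assumes the unreserved set listed above and uses the built-in TextEncoder to obtain UTF-8 bytes. In real code you would call encodeURIComponent, which behaves almost identically (it additionally leaves ! ' ( ) * unencoded).

```javascript
// Unreserved characters: letters, digits, and - . _ ~
const UNRESERVED = /[A-Za-z0-9\-._~]/;

// Hypothetical helper: percent-encode every UTF-8 byte that is not unreserved.
function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && UNRESERVED.test(ch)) {
      out += ch; // safe ASCII character, keep as-is
    } else {
      // Two uppercase hex digits per byte, e.g. 0x20 -> "%20"
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("a b&ö"));       // a%20b%26%C3%B6
console.log(decodeURIComponent("%C3%B6")); // ö
```

Decoding simply reverses the steps: collect the bytes from each %XX sequence and interpret them as UTF-8.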
Reserved Characters
Certain characters are "reserved" in URLs because they have special meaning in URL structure. When these characters appear in URL data (such as query parameter values), they must be encoded so they are not interpreted as structural elements. The main reserved characters and their roles are: ? separates the path from the query string; & separates query parameters; = separates parameter names from values; / separates path segments; # marks the fragment identifier; : separates the scheme from the host and the host from the port; @ separates user info from the host; and + represents a space in query strings that use the application/x-www-form-urlencoded format.
Consider this URL: https://example.com/search?q=cats&dogs. It parses as two query parameters: q=cats and dogs (with no value). If you intended to search for "cats&dogs" as a single phrase, the & has been misinterpreted. The correct encoding is ?q=cats%26dogs.
Failing to encode reserved characters causes real problems. A query parameter value containing & or = can be split into the wrong names and values. A file name containing ? or # cuts the path short, because everything after that character is read as the query string or fragment. An API that accepts structured data in query parameters can be manipulated through unencoded special characters.
Some reserved characters are safe in certain URL components. A / in the path separates segments and should not be encoded if that is your intent, but the same / inside a query parameter value should be encoded to avoid confusion. This context sensitivity is why different encoding functions exist.
Non-ASCII (Unicode) characters always need encoding in URLs. Modern browsers display decoded URLs for readability, but the underlying request uses the encoded form. Domain names are handled differently: an internationalized domain name (IDN) like münchen.de is converted to Punycode, xn--mnchen-3ya.de, at the DNS level rather than percent-encoded, though browsers hide this complexity.
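The effect of an unencoded reserved character is easy to observe with the WHATWG URL parser built into browsers and Node.js. This sketch (the helper name paramsOf is hypothetical) parses the cats&dogs example both ways, shows a # in a file name ending the path, and shows the Punycode conversion for münchen.de. The exact console formatting shown in comments is Node.js's.

```javascript
// Hypothetical helper: return the query parameters of a URL as [name, value] pairs.
function paramsOf(url) {
  return [...new URL(url).searchParams];
}

// Unencoded "&": the parser sees two parameters.
console.log(paramsOf("https://example.com/search?q=cats&dogs"));
// [ [ 'q', 'cats' ], [ 'dogs', '' ] ]

// Encoded "&" (%26): one parameter whose value contains "&".
console.log(paramsOf("https://example.com/search?q=cats%26dogs"));
// [ [ 'q', 'cats&dogs' ] ]

// An unencoded "#" in a file name ends the path; the rest becomes the fragment.
const broken = new URL("https://example.com/files/report#1.pdf");
console.log(broken.pathname, broken.hash); // /files/report #1.pdf

// Encoding the file name keeps it inside the path.
const fixed = new URL("https://example.com/files/" + encodeURIComponent("report#1.pdf"));
console.log(fixed.pathname); // /files/report%231.pdf

// Domain names are converted to Punycode, not percent-encoded.
console.log(new URL("https://münchen.de/").hostname); // xn--mnchen-3ya.de
```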
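For URL building: encode any value that might contain special characters, encode all user input, and use the appropriate encoding for the context (path, query, or fragment). When in doubt, encode: percent-encoding a character inside a data value that didn't strictly need it is usually harmless, while leaving a reserved character unencoded can change the URL's meaning. The one thing to avoid is encoding the same value twice, which turns %20 into %2520.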
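These guidelines can be combined into a small URL builder. This is a minimal sketch, assuming you start from raw, unencoded values; buildUrl is a hypothetical helper, not a standard API. Each path segment goes through encodeURIComponent and each query value through URLSearchParams, so every value is encoded exactly once.

```javascript
// Hypothetical helper: build a URL from a base, raw path segments, and raw query values.
function buildUrl(base, segments, query) {
  // Encode each segment separately so a "/" inside a segment cannot create a new one.
  const path = segments.map(encodeURIComponent).join("/");
  const url = new URL(path, base);
  // searchParams encodes names and values (spaces become "+").
  for (const [name, value] of Object.entries(query)) {
    url.searchParams.append(name, value);
  }
  return url.toString();
}

console.log(buildUrl("https://example.com/", ["files", "Q&A notes/2024"], { q: "cats & dogs" }));
// https://example.com/files/Q%26A%20notes%2F2024?q=cats+%26+dogs

// Encoding an already-encoded value a second time corrupts it.
console.log(encodeURIComponent(encodeURIComponent("a b"))); // a%2520b
```

One caveat: segments such as ".." survive encodeURIComponent unchanged and are still treated as "go up one level" by the URL parser, so user-supplied segments may need separate validation.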
encodeURI vs encodeURIComponent
JavaScript provides two URL encoding functions that serve different purposes, and using the wrong one is a common source of bugs.
encodeURI() is designed to encode a complete URI. It leaves characters that have structural meaning in URIs unencoded, including :, /, ?, #, @, &, =, +, and $. Use encodeURI when you have a complete URL and want to make it safe for transmission without breaking its structure. Example: encodeURI("https://example.com/path with spaces/page?param=value") returns "https://example.com/path%20with%20spaces/page?param=value". The structural characters (:, /, ?, =) are preserved while the spaces are encoded.
encodeURIComponent() encodes everything except letters, digits, and the characters - _ . ! ~ * ' ( ). It is designed for encoding individual URI components, particularly query parameter values, and it encodes characters that encodeURI preserves, including &, =, /, and ?. Example: encodeURIComponent("cats & dogs?") returns "cats%20%26%20dogs%3F". The spaces, &, and ? are all encoded because they would have special meaning if inserted into a URL.
When to use each: use encodeURIComponent for query parameter values ("?q=" + encodeURIComponent(userInput)) and for path segments when building URLs piece by piece. Use encodeURI when you have a complete, well-formed URL that might contain spaces or non-ASCII characters.
Common mistakes: using encodeURI on parameter values allows injection. If userInput is "a&b=c", encodeURI won't help, because the & and = remain unencoded. Using encodeURIComponent on a complete URL breaks it by encoding its structural characters.
For query strings specifically, many developers use URLSearchParams instead of manual encoding: new URLSearchParams({q: "cats & dogs"}).toString() produces "q=cats+%26+dogs". It handles the encoding automatically and produces properly formatted query strings, using + for spaces as in form encoding.
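The difference between the two functions, including the "a&b=c" injection case, can be checked directly in Node.js or a browser console. This sketch uses only built-in functions; the variable names and the encodeRFC3986Component helper are illustrative. The helper is a common pattern for callers who also want ! ' ( ) * encoded, since encodeURIComponent leaves those five characters alone.

```javascript
const userInput = "a&b=c";

// encodeURI leaves & and = alone, so the input injects a second parameter.
const bad = "https://example.com/search?q=" + encodeURI(userInput);
// encodeURIComponent encodes them, so the input stays one value.
const good = "https://example.com/search?q=" + encodeURIComponent(userInput);

console.log(bad);  // https://example.com/search?q=a&b=c
console.log(good); // https://example.com/search?q=a%26b%3Dc
console.log(new URL(bad).searchParams.get("b"));  // c
console.log(new URL(good).searchParams.get("q")); // a&b=c

// encodeURIComponent on a whole URL destroys its structure.
console.log(encodeURIComponent("https://example.com/a?b=c"));
// https%3A%2F%2Fexample.com%2Fa%3Fb%3Dc

// Illustrative helper: also encode ! ' ( ) *, which encodeURIComponent leaves as-is.
function encodeRFC3986Component(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
}
console.log(encodeRFC3986Component("it's (fine)!")); // it%27s%20%28fine%29%21

// URLSearchParams does the query-string encoding for you (spaces become "+").
console.log(new URLSearchParams({ q: "cats & dogs" }).toString()); // q=cats+%26+dogs
```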
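Other languages have similar pairs, but they don't all split along the same line. Python's urllib.parse.quote() and quote_plus() differ mainly in that quote_plus() encodes spaces as + and, by default, also encodes /. PHP's urlencode() encodes spaces as + (form encoding), while rawurlencode() follows RFC 3986 and encodes them as %20. Before using any of these, check which characters the function leaves unencoded and how it encodes spaces.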
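One difference that separates Python's quote() from quote_plus(), and PHP's rawurlencode() from urlencode(), is how spaces are encoded, and the same split exists in JavaScript. This sketch compares encodeURIComponent (which produces %20) with URLSearchParams (which uses form encoding and produces +), and shows that decoding must use the matching rules: decodeURIComponent does not turn + back into a space.

```javascript
const value = "cats & dogs/pets";

// Component encoding: space -> %20
console.log(encodeURIComponent(value)); // cats%20%26%20dogs%2Fpets

// Form encoding (application/x-www-form-urlencoded): space -> +
console.log(new URLSearchParams({ v: value }).toString()); // v=cats+%26+dogs%2Fpets

// Decode with the matching rules: only the form parser treats + as a space.
console.log(decodeURIComponent("cats+dogs"));             // cats+dogs
console.log(new URLSearchParams("v=cats+dogs").get("v")); // cats dogs
```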