Percent Encoding Explained

Percent encoding is the technical name for what most people call URL encoding. Understanding how it works helps you debug URL issues, work with web APIs, and build applications that handle special characters correctly.

How It Works

Percent encoding converts characters into a URL-safe form using a simple rule: represent each byte of the character's UTF-8 encoding as a percent sign followed by two hexadecimal digits.

For ASCII characters, the process is straightforward: look up the character's ASCII code, convert it to hexadecimal, and prefix it with %. The space character has ASCII code 32, which is 20 in hexadecimal, so a space becomes %20. The ampersand & has ASCII code 38 (hex 26), so it becomes %26.

Non-ASCII characters (such as ö, 中, or emoji) are first encoded as UTF-8 bytes, and then each byte is percent-encoded separately. The German ö (U+00F6) encodes to the UTF-8 bytes C3 B6, becoming %C3%B6. The Chinese character 中 (U+4E2D) encodes to the UTF-8 bytes E4 B8 AD, becoming %E4%B8%AD.

Decoding reverses the process: find each %XX sequence, convert the hex digits to a byte, collect all the bytes of a multi-byte sequence, and decode the bytes as UTF-8. This is why encoding and decoding must use the same character encoding (UTF-8 is the standard now, but legacy systems might use other encodings).

The percent sign itself must be encoded as %25. Otherwise, the decoder cannot distinguish a literal % from the start of an encoded sequence. For the same reason, encoding something that is already encoded produces double encoding: %20 becomes %2520, which decodes to %20, not to a space.

Decoding should happen exactly once; never decode data that has already been decoded. Over-decoding can cause security issues when an application decodes a value, makes a security decision based on it, and then something downstream decodes it again. This is a common source of path traversal and injection vulnerabilities.
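The steps above can be sketched by hand in JavaScript. This is a minimal illustration, not a replacement for the built-in encodeURIComponent and decodeURIComponent: the helper names percentEncode and percentDecode are hypothetical, and the choice to leave letters, digits, and - . _ ~ unencoded is an assumption taken from the "unreserved" set in RFC 3986, which the text above does not spell out.

```javascript
// Characters left as-is (assumption: the RFC 3986 "unreserved" set).
const UNRESERVED = /[A-Za-z0-9\-._~]/;
const HEX_PAIR = /^[0-9A-Fa-f]{2}$/;
const utf8Encoder = new TextEncoder(); // always produces UTF-8 bytes

// Hypothetical helper: character -> UTF-8 bytes -> %XX per byte.
function percentEncode(text) {
  let out = "";
  for (const byte of utf8Encoder.encode(text)) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && UNRESERVED.test(ch)) {
      out += ch; // safe ASCII character, keep it readable
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

// Hypothetical helper: %XX -> bytes -> UTF-8 text. Performs ONE decoding pass.
function percentDecode(text) {
  const bytes = [];
  let i = 0;
  while (i < text.length) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === "%" && HEX_PAIR.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 3;
    } else {
      // Literal character: pass its own UTF-8 bytes through unchanged.
      const ch = String.fromCodePoint(text.codePointAt(i));
      bytes.push(...utf8Encoder.encode(ch));
      i += ch.length;
    }
  }
  // Default TextDecoder is UTF-8; invalid byte sequences become U+FFFD.
  return new TextDecoder().decode(Uint8Array.from(bytes));
}

console.log(percentEncode(" "));   // %20
console.log(percentEncode("&"));   // %26
console.log(percentEncode("ö"));   // %C3%B6
console.log(percentEncode("中"));  // %E4%B8%AD
console.log(percentEncode("%20")); // %2520 (double encoding)

// Why decoding twice is dangerous: a double-encoded "../secret".
const raw = "%252E%252E%252Fsecret";
const once = percentDecode(raw);   // "%2E%2E%2Fsecret", no ".." visible yet
const twice = percentDecode(once); // "../secret"
console.log(once, twice);
```

In the last example, a check for ".." performed on the once-decoded value passes, yet a second decode downstream yields a traversal path, which is the over-decoding problem described above.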

Common Encoded Characters

Knowing common encoded characters helps you read and debug URLs quickly. While the formula (character → UTF-8 bytes → %XX) is simple, recognizing these on sight saves time.

Essential ASCII encodings: space is %20 (or + in query strings), ampersand & is %26, equals = is %3D, question mark ? is %3F, hash # is %23, slash / is %2F, plus + is %2B, percent % is %25, colon : is %3A, at-sign @ is %40.

Bracket and quote encodings: left bracket [ is %5B, right bracket ] is %5D, left curly { is %7B, right curly } is %7D, double quote " is %22, single quote ' is %27, backtick ` is %60, less than < is %3C, greater than > is %3E.

Common non-ASCII encodings: non-breaking space is %C2%A0, euro sign € is %E2%82%AC, pound sign £ is %C2%A3, registered trademark ® is %C2%AE. German umlauts: ä is %C3%A4, ö is %C3%B6, ü is %C3%BC. Note that these are all multi-byte UTF-8 sequences.

Spaces are special: %20 is the standard percent-encoding for a space. However, in the application/x-www-form-urlencoded format (used for HTML form submission), spaces become +. This is why query strings sometimes show + for spaces. Both are valid in different contexts; our tool supports both conventions.

When reading URLs: if you see %2F in a path, it is an encoded slash (a literal / character in data, not a path separator). If you see %3F in a query string, it is an encoded question mark. Understanding these helps you spot encoding issues: %252F means the percent sign was itself encoded (giving %25) and is followed by 2F. This is double encoding and usually a bug.
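The two space conventions and the double-encoding pattern can be observed with standard JavaScript built-ins. The sketch below is illustrative only: the parameter name q and the helper looksDoubleEncoded are assumptions made for the example, and the helper is a rough heuristic that may flag a legitimately encoded percent sign.

```javascript
const value = "cats & dogs";

// Generic percent-encoding: a space becomes %20.
const percentStyle = encodeURIComponent(value);
console.log(percentStyle); // cats%20%26%20dogs

// Form encoding (application/x-www-form-urlencoded): a space becomes +.
const formStyle = new URLSearchParams({ q: value }).toString();
console.log(formStyle); // q=cats+%26+dogs

// A literal plus sign therefore has to travel as %2B.
const literalPlus = new URLSearchParams({ q: "1+1" }).toString();
console.log(literalPlus); // q=1%2B1

// Decoding differs too: decodeURIComponent leaves + untouched,
// while URLSearchParams treats it as a space.
const genericDecoded = decodeURIComponent("a+b%20c");
const formDecoded = new URLSearchParams("q=a+b%20c").get("q");
console.log(genericDecoded); // a+b c
console.log(formDecoded);    // a b c

// Hypothetical heuristic: %25 followed by two hex digits suggests that
// an already-encoded value was encoded a second time.
function looksDoubleEncoded(text) {
  return /%25[0-9A-Fa-f]{2}/.test(text);
}
console.log(looksDoubleEncoded("/files/a%252Fb")); // true
console.log(looksDoubleEncoded("/files/a%2Fb"));   // false
```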

When to Encode

Knowing when to encode, and when not to, prevents both broken URLs and security vulnerabilities. The general principle is: encode when inserting data into a URL, but be precise about what "inserting data" means.

Always encode user input before putting it in URLs. Any value that comes from outside your control (form fields, database values, API responses) should be encoded when building URLs. This prevents injection attacks and ensures special characters don't break URL parsing.

Encode query parameter values. When building a URL like "/search?q=" + term, the term must be encoded. If the term is "cats", encoding does no harm. If the term is "cats & dogs", failing to encode breaks the URL. Use encodeURIComponent() in JavaScript.

Encode path segments when building paths dynamically. If a user creates a page titled "A/B Testing", the URL path shouldn't contain a literal /. Encode the title to create the path. Some applications have different rules (allowing / in some contexts), so understand your URL structure.

Don't double-encode. If you receive a URL that is already properly encoded, encoding it again creates %25 sequences everywhere. The sequence %20 becomes %2520, which decodes to the literal string "%20" rather than a space. This is a common bug when passing URLs between systems.

Don't encode URL structure that you mean to keep. If you have a complete valid URL like "https://example.com/path?a=b", encodeURIComponent would break it by encoding the :, /, ?, and = characters. Use encodeURI for complete URLs, or better, build URLs piece by piece with proper component encoding.

Browsers automatically encode query strings produced by form submission. When JavaScript or server code builds URLs, you are responsible for encoding. API libraries often have built-in URL builders; use them rather than string concatenation.

When receiving encoded URLs, decode once for processing. Don't decode, process, and decode again unless you specifically expect multi-level encoding (rare). Log decoded values for debugging, but be careful about re-encoding them for display or storage.
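The guidelines above can be sketched with JavaScript built-ins. This is a minimal illustration under stated assumptions: the "/pages/" prefix and the host example.com are placeholders, and the URL and URLSearchParams classes stand in for the "built-in URL builders" mentioned above.

```javascript
const term = "cats & dogs";  // untrusted query value
const title = "A/B Testing"; // untrusted path segment

// Encode each piece of data as it is inserted into the URL.
const searchUrl = "/search?q=" + encodeURIComponent(term);
console.log(searchUrl); // /search?q=cats%20%26%20dogs

// "/pages/" is a hypothetical route; the slash in the title becomes %2F.
const pagePath = "/pages/" + encodeURIComponent(title);
console.log(pagePath); // /pages/A%2FB%20Testing

// A complete URL: encodeURI keeps the structure, encodeURIComponent does not.
const full = "https://example.com/path?a=b";
const keptStructure = encodeURI(full);
const brokenStructure = encodeURIComponent(full);
console.log(keptStructure);   // https://example.com/path?a=b
console.log(brokenStructure); // https%3A%2F%2Fexample.com%2Fpath%3Fa%3Db

// A URL builder handles the encoding (form style, so spaces become +).
const built = new URL("https://example.com/search");
built.searchParams.set("q", term);
console.log(built.toString()); // https://example.com/search?q=cats+%26+dogs

// Double encoding: encoding an already-encoded value escapes its % signs.
const encodedTwice = encodeURIComponent(encodeURIComponent("a b"));
console.log(encodedTwice); // a%2520b

// Decode once on receipt; the builder's parser performs that single pass.
const received = new URL("https://example.com/search?q=cats+%26+dogs");
const query = received.searchParams.get("q");
console.log(query); // cats & dogs
```

Building the URL through the URL object avoids string concatenation entirely, at the cost of using the + convention for spaces in the query string.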

Try the Tool

URL Encoder/Decoder
