HTMLEntityCodec#decode incorrectly decodes upper-case accented letters as their lower-case counterparts

Description

From bja...@twigkit.com on March 25, 2013 12:05:24

What steps will reproduce the problem?
1. System.out.println(new org.owasp.esapi.codecs.HTMLEntityCodec().decode("&Aacute;"));

What is the expected output? What do you see instead?
I would expect the HTMLEntityCodec to correctly decode "&Aacute;" as a capital "Á". Instead, it outputs a lower-case "á". The same is true for all HTML entities whose encoding fits the "&*acute;" pattern.

What version of the product are you using? On what operating system?
Version 2.0.1 on MacOS X 10.8.2.

Does this issue affect only a specified browser or set of browsers?
Nope, this is an API issue.

Please provide any additional information below.
Checking out your source code from trunk (25/3/2013), it seems the problem is line 253 of HTMLEntityCodec.java (in method getNamedEntity):

possible.append(Character.toLowerCase(input.next()));

Here it is turning everything into lower case as it reads the input stream, thereby losing the case information for accented letters.
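
To illustrate the effect described above, here is a minimal, self-contained sketch. It does not use ESAPI and is not the project's actual code or fix: the class EntityCaseDemo, its three-entry table, and both lookup methods are hypothetical names made up for this example. It contrasts a lookup that lower-cases the entity name first (the reported behaviour) with one that tries an exact-case match before falling back to lower case. The fallback is an assumption about what a fix might want, so that names written as "LT" would still resolve.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class; not part of ESAPI.
final class EntityCaseDemo {
    // Tiny illustrative entity table (assumption: the real table is far larger).
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("Aacute", '\u00C1'); // capital A with acute accent
        ENTITIES.put("aacute", '\u00E1'); // small a with acute accent
        ENTITIES.put("lt", '<');
    }

    // Mimics the reported problem: the name is lower-cased before the lookup,
    // so "Aacute" and "aacute" both resolve to the lower-case letter.
    static Character lookupLowercased(String name) {
        return ENTITIES.get(name.toLowerCase(Locale.ROOT));
    }

    // One possible alternative: try the name exactly as written, and only
    // fall back to the lower-cased name when there is no exact match.
    static Character lookupPreservingCase(String name) {
        Character exact = ENTITIES.get(name);
        return exact != null ? exact : ENTITIES.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Print code points in hex to avoid console-encoding surprises.
        System.out.println(Integer.toHexString(lookupLowercased("Aacute")));
        System.out.println(Integer.toHexString(lookupPreservingCase("Aacute")));
    }
}
```

With this table, lookupLowercased("Aacute") returns U+00E1 (the lower-case letter), while lookupPreservingCase("Aacute") returns U+00C1 (the capital).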

Original issue: http://code.google.com/p/owasp-esapi-java/issues/detail?id=296

Environment

None

Status

Assignee

Unassigned

Reporter

Max Gelman
