Lucent Sky AVM relies on correct encoding information to analyze the source code of an application. Additionally, incorrect encoding settings might cause build errors for .NET and Java applications.
This article describes how Lucent Sky AVM handles character encoding, as well as ways to troubleshoot encoding errors.
How Lucent Sky AVM handles character encoding
.NET applications
Encodings in .NET applications are handled as described in the All applications section below, except during the Build phase, which is managed by .NET Framework and/or Microsoft Build Tools. MSBuild and the ASP.NET Compilation Tool use the operating system's system locale setting (also known as Language for non-Unicode programs) as the native encoding.
Java applications
Lucent Sky AVM will search for a file named org.eclipse.core.resources.prefs, which is the document Eclipse uses to keep track of file encoding formats for a project.
- If org.eclipse.core.resources.prefs exists, Lucent Sky AVM uses the character encoding settings it contains. For files not listed in org.eclipse.core.resources.prefs, the encoding is determined as described in the All applications section below.
- If org.eclipse.core.resources.prefs does not exist, Lucent Sky AVM detects character encodings as described in the All applications section below.
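To see which files a project's org.eclipse.core.resources.prefs actually covers, the following minimal Python sketch parses the file and looks up a path. It assumes the usual Eclipse layout of `encoding//<path>=<charset>` entries plus an optional `encoding/<project>=<charset>` entry, and it does not handle escaped characters in keys. The function names are hypothetical, and this is not how Lucent Sky AVM itself reads the file.

```python
def parse_eclipse_encoding_prefs(text):
    """Map resource paths (no leading slash) to their declared encodings."""
    encodings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and properties-style comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        # Only "encoding/..." keys carry encoding settings.
        if not sep or not key.startswith("encoding/"):
            continue
        resource = key[len("encoding/"):].lstrip("/")
        encodings[resource] = value.strip()
    return encodings


def declared_encoding(path, encodings):
    """Return the encoding declared for this exact file, or None.

    None means the file is not listed, so its encoding would have to be
    determined by detection instead.
    """
    return encodings.get(path.replace("\\", "/").lstrip("/"))
```

For example, `declared_encoding("src/Legacy.java", parse_eclipse_encoding_prefs(prefs_text))` returns the declared charset for a listed file and `None` for an unlisted one.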
All applications
- Lucent Sky AVM detects whether a file uses a Unicode encoding (such as UTF-8) or a native encoding (such as Big5, Shift JIS, or Windows-1252).
- For files with a native encoding, the native encoding is determined in the following order:
- Encoding set in scan arguments (such as Encoding,Big5)
- Encoding set in the custom runtime (if the application uses one)
- Encoding defined in the storage configurations file (storage.config, also known as the cluster configuration file)
- The native encoding most frequently used in the application's files with a native encoding
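The order above can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the Unicode check (byte order mark or a clean UTF-8 decode) is a common heuristic rather than Lucent Sky AVM's actual detector, and the function and parameter names are hypothetical.

```python
from collections import Counter

UTF8_BOM = b"\xef\xbb\xbf"
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def looks_unicode(data: bytes) -> bool:
    """Heuristic: a BOM or a strict UTF-8 decode marks the file as Unicode."""
    if data.startswith(UTF8_BOM) or data.startswith(UTF16_BOMS):
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def resolve_native_encoding(scan_argument=None, custom_runtime=None,
                            storage_config=None, detected=()):
    """Pick the native encoding using the documented order.

    The first three arguments are explicitly configured values; `detected`
    holds one detected native encoding name per non-Unicode file.
    """
    for configured in (scan_argument, custom_runtime, storage_config):
        if configured:
            return configured
    counts = Counter(detected)
    # Fall back to the most frequently detected native encoding, if any.
    return counts.most_common(1)[0][0] if counts else None
```

With nothing configured, `resolve_native_encoding(detected=["Big5", "Shift_JIS", "Big5"])` falls through to the frequency rule, while any configured value short-circuits it.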
Troubleshooting
Follow the steps below to resolve encoding problems:
.NET applications
Because MSBuild and the ASP.NET Compilation Tool use the operating system's system locale setting as the native encoding, encoding issues might occur if the application's native encoding differs from the system locale of the Lucent Sky AVM instance.
To specify the native encoding used by MSBuild, you can set the value of the <CodePage> property in the project file to the codepage of the native encoding (such as <CodePage>932</CodePage>). You can alternatively set the CodePage property by using the BuildProperties scan argument (for example, BuildProperties,CodePage=932).
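Before committing to a CodePage value, it can help to confirm that the source files decode under that code page. The Python sketch below is a rough sanity check. It assumes Python ships a `cp<number>` codec for the code page (true for 932, 950, and 1252, among others), and the helper names and default file patterns are hypothetical.

```python
from pathlib import Path


def decodes_with_code_page(data: bytes, code_page: int) -> bool:
    """True if the bytes decode strictly under the given Windows code page.

    Raises LookupError if Python has no codec named cp<code_page>.
    """
    try:
        data.decode(f"cp{code_page}")
    except UnicodeDecodeError:
        return False
    return True


def files_failing_code_page(root, code_page, patterns=("*.cs", "*.vb")):
    """List source files under `root` that do not decode with the code page."""
    failures = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            if not decodes_with_code_page(path.read_bytes(), code_page):
                failures.append(path)
    return failures
```

A clean result is necessary but not sufficient: single-byte code pages accept almost any byte sequence, so a file can decode without error and still display the wrong characters.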
To specify the native encoding used by the ASP.NET Compilation Tool, you can set the fileEncoding attribute of the configuration/system.web/globalization element (such as <globalization fileEncoding="Big5" />) in the application's top-level web.config file.
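To check what a web.config currently declares, the following Python sketch reads the fileEncoding attribute using the standard library's ElementTree. It assumes the root configuration element has no XML namespace, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def web_config_file_encoding(xml_text):
    """Return the fileEncoding declared in web.config, or None if absent."""
    root = ET.fromstring(xml_text)
    # Look only at system.web directly under the root <configuration>,
    # not at copies nested inside <location> elements.
    for child in root:
        if child.tag == "system.web":
            node = child.find("globalization")
            if node is not None:
                return node.get("fileEncoding")
    return None
```

Calling it with the contents of the top-level web.config returns, for example, `"Big5"` when `<globalization fileEncoding="Big5" />` is present, and `None` when the attribute or element is missing.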
Java applications
- If org.eclipse.core.resources.prefs exists, check that its structure is correct, and that the files associated with the error have their encoding format properly specified.
- If you are not certain that org.eclipse.core.resources.prefs specifies a file's encoding accurately, try removing that file's entry from org.eclipse.core.resources.prefs so that Lucent Sky AVM detects the file's encoding instead.
- If org.eclipse.core.resources.prefs does not exist, take the steps listed under the All applications section below.
All applications
- If a native encoding has been set as a scan argument, check that the argument is correctly specified.
- If a native encoding has been set in the custom runtime, check that it is correctly specified.
- If a native encoding has been set in the storage configurations file (storage.config), check that it is correctly specified.
- Check that the application uses only one native character encoding. If it uses two or more, Lucent Sky AVM will use the one detected in the greatest number of source files.
- When an application contains multiple native encodings, consider converting the files so that they all use a single character encoding, then run the scan again.
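Converting files to a single encoding can be scripted. The Python sketch below re-encodes content from a known source encoding. UTF-8 as the default target is an assumption, not a Lucent Sky AVM requirement, so pass whichever encoding the application standardizes on. The function names are hypothetical.

```python
from pathlib import Path


def transcode(data: bytes, source_encoding: str,
              target_encoding: str = "utf-8") -> bytes:
    """Re-encode bytes; raises UnicodeDecodeError if the source is wrong."""
    return data.decode(source_encoding).encode(target_encoding)


def convert_file(path, source_encoding, target_encoding="utf-8"):
    """Rewrite one file in place using the target encoding.

    Bytes are read and written directly so line endings stay untouched.
    Back up the file or rely on version control before running this.
    """
    path = Path(path)
    path.write_bytes(
        transcode(path.read_bytes(), source_encoding, target_encoding)
    )
```

Decoding strictly means a file whose real encoding differs from `source_encoding` will often fail loudly instead of being silently corrupted, although single-byte source encodings can still mask a mismatch.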