Friday 24 June 2011

String and memory leaks

Most Java users are probably aware that a String object is more complex than just an array of char. To make strings in Java more efficient, additional measures were taken: for instance, the String pool was created to save memory by reusing the same String objects instead of allocating new ones. Another optimization that I want to talk about today is the offset and count fields in String instances.
Why were those fields added? Their purpose is to reuse already allocated structures in some string operations, like taking a substring of a given string. The idea is that instead of creating a new char array for a substring we can just 'reuse' the old one. To be exact, this is what the String.substring() method does: instead of copying the char array for the returned object, it creates a new String that reuses the char[] of the old one. Only the values of the offset and count fields (which indicate the beginning and the length of the new string) are different. Because the substring operation is used quite often, this mechanism helps to save a lot of memory. It is important to add that this can work only because String objects are immutable. See the following snippet:

public static void sendEmail(String emailUrl) {
    String email = emailUrl.substring(7); // 'mailto:' prefix has 7 letters
    int at = email.indexOf("@");
    String userName = email.substring(0, at);
    String domainName = email.substring(at + 1); // skip the '@' itself
}

public static void main(String[] args) {
    sendEmail("mailto:user_name@domain_name.com");
}

Thanks to the way substring() is implemented, when we extract email, userName and domainName three new String objects are created, but the char array is not copied. All the new string variables reuse the character array from emailUrl. Without that reuse we would hold roughly three arrays' worth of characters (emailUrl, a copy for email, and copies for userName and domainName together), so for really long URLs we save approximately 2/3 of the memory we would use otherwise. Great, right?
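To make the sharing easier to picture, here is a minimal sketch of the mechanism. SharedString is a hypothetical, simplified stand-in that I made up for illustration, not the real java.lang.String source; it only models the three fields described above (the char array, offset and count).

```java
public class SharedStringDemo {

    /** Hypothetical, simplified model of a string that shares its backing array. */
    static final class SharedString {
        final char[] value; // backing array, possibly shared with other instances
        final int offset;   // index in value where this string starts
        final int count;    // number of characters in this string

        SharedString(String s) {
            this(s.toCharArray(), 0, s.length());
        }

        private SharedString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        /** Returns a view on the same array; no characters are copied. */
        SharedString substring(int begin, int end) {
            if (begin < 0 || end > count || begin > end) {
                throw new StringIndexOutOfBoundsException();
            }
            return new SharedString(value, offset + begin, end - begin);
        }

        SharedString substring(int begin) {
            return substring(begin, count);
        }

        /** Index of c relative to the start of this string, or -1 if absent. */
        int indexOf(char c) {
            for (int i = 0; i < count; i++) {
                if (value[offset + i] == c) {
                    return i;
                }
            }
            return -1;
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        SharedString emailUrl = new SharedString("mailto:user_name@domain_name.com");
        SharedString email = emailUrl.substring(7);
        SharedString userName = email.substring(0, email.indexOf('@'));

        System.out.println(userName);                         // prints "user_name"
        System.out.println(userName.value == emailUrl.value); // prints "true"
    }
}
```

The second line of output is the whole point: userName is a tiny string, but it still points at the full array that belongs to emailUrl.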

Ok… now the dark side of that optimization! Check out this snippet:

public static final String START_TAG = "<title>";
public static final String END_TAG = "</title>";

public static String getPageTitle(String pageUrl) {
    // retrieve the HTML with a helper function:
    String htmlPage = getPageContent(pageUrl);

    // parse the page content to get the title
    int start = htmlPage.indexOf(START_TAG);
    start = start + START_TAG.length();
    int end = htmlPage.indexOf(END_TAG);
    String title = htmlPage.substring(start, end);
    return title;
}

Here we are extracting the title from an HTML page. Can you see the problem with this code? It looks simple and correct, right?

Now, try to imagine that the htmlPage String is huge, more than 100,000 characters, but the title of this page has only 50 characters. Because of the optimization mentioned above, the returned object will reuse the char array of htmlPage instead of creating a new one… and this means that instead of a small String object you get back a String that keeps the whole 100,000-character array alive!! If your code invokes getPageTitle() many times, you may find out that you have stored only a thousand titles and you are already out of memory!! Scary, right?
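A quick back-of-the-envelope calculation shows how bad this gets. This is only a rough sketch: it counts the characters alone at 2 bytes per char and ignores object headers, and the class and method names are mine, chosen just for this example.

```java
public class RetainedMemoryEstimate {

    /** Bytes taken by the characters of a char array of the given length. */
    static long charArrayBytes(long chars) {
        return chars * Character.BYTES; // a Java char is 2 bytes
    }

    public static void main(String[] args) {
        long titles = 1_000;       // number of titles we keep around
        long pageLength = 100_000; // characters in each HTML page
        long titleLength = 50;     // characters in each title

        // Each title pins the whole page array when the array is shared.
        long shared = titles * charArrayBytes(pageLength);
        // Each title owns a small array of its own when it is copied.
        long copied = titles * charArrayBytes(titleLength);

        System.out.println(shared); // prints 200000000
        System.out.println(copied); // prints 100000
    }
}
```

So a thousand titles that should fit in about 100 KB end up holding on to about 200 MB of page content.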

Of course there is an easy solution for that: instead of returning title in line 13, you can return new String(title). The String(String) constructor copies only the part of the underlying char array that the string actually uses, so the new title will really hold just its 50 characters, and the huge array can be garbage collected once htmlPage is no longer referenced. Now we are safe :)
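Here is a minimal, self-contained sketch of the fixed version. To keep it runnable I made a few assumptions: the HTML is passed in directly instead of being fetched with getPageContent(), the method is renamed to extractTitle, and I added checks for a missing tag that the snippet above does not have. As far as I know, later JDK releases (starting with Java 7 update 6) changed substring() to copy the characters itself, in which case the extra new String(...) is redundant but harmless.

```java
public class PageTitleDemo {

    static final String START_TAG = "<title>";
    static final String END_TAG = "</title>";

    /** Returns the page title as an independent String, or null if there is none. */
    static String extractTitle(String htmlPage) {
        int start = htmlPage.indexOf(START_TAG);
        if (start < 0) {
            return null; // no opening tag
        }
        start += START_TAG.length();
        int end = htmlPage.indexOf(END_TAG, start);
        if (end < 0) {
            return null; // no closing tag
        }
        // Explicit copy, so the title does not keep the page's char array alive
        // on JVMs where substring() shares it.
        return new String(htmlPage.substring(start, end));
    }

    public static void main(String[] args) {
        // Build a large fake page with a short title.
        StringBuilder html = new StringBuilder("<html><head><title>My page</title></head><body>");
        for (int i = 0; i < 100_000; i++) {
            html.append('x');
        }
        html.append("</body></html>");

        System.out.println(extractTitle(html.toString())); // prints "My page"
    }
}
```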

So what is the lesson here? Always use new String(String)? No… In general the String optimizations are really helpful and it is worth taking advantage of them. You just have to be careful with what you are doing and be aware of what is going on 'under the hood' of your code. The String class API is not intuitive in some situations, so beware!
