I am trying to split the Value using a separator. But I am finding the surprising results

String data = "5|6|7||8|9||";
String[] split = data.split("\\|");
System.out.println(split.length);

I am expecting to get 8 values. [5,6,7,EMPTY,8,9,EMPTY,EMPTY] But I am getting only 6 values.

Any idea and how to fix. No matter EMPTY value comes at anyplace, it should be in array.

Solution 1

split(delimiter) by default removes trailing empty strings from result array. To turn this mechanism off we need to use overloaded version of split(delimiter, limit) with limit set to negative value like

String[] split = data.split("\\|", -1);

Little more details:
split(regex) internally returns result of split(regex, 0) and in documentation of this method you can find (emphasis mine)

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.

If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

Exception:

It is worth mentioning that removing trailing empty string makes sense only if such empty strings were created by the split mechanism. So for "".split(anything) since we can't split "" farther we will get as result [""] array.
It happens because split didn't happen here, so "" despite being empty and trailing represents original string, not empty string which was created by splitting process.

Solution 2

From the documentation of String.split(String regex):

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

So you will have to use the two argument version String.split(String regex, int limit) with a negative value:

String[] split = data.split("\\|",-1);

Doc:

If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

This will not leave out any empty elements, including the trailing ones.

Solution 3

String[] split = data.split("\\|",-1);

This is not the actual requirement in all the time. The Drawback of above is show below:

Scenerio 1:
When all data are present:
    String data = "5|6|7||8|9|10|";
    String[] split = data.split("\\|");
    String[] splt = data.split("\\|",-1);
    System.out.println(split.length); //output: 7
    System.out.println(splt.length); //output: 8

When data is missing:

Scenerio 2: Data Missing
    String data = "5|6|7||8|||";
    String[] split = data.split("\\|");
    String[] splt = data.split("\\|",-1);
    System.out.println(split.length); //output: 5
    System.out.println(splt.length); //output: 8

Real requirement is length should be 7 although there is data missing. Because there are cases such as when I need to insert in database or something else. We can achieve this by using below approach.

    String data = "5|6|7||8|||";
    String[] split = data.split("\\|");
    String[] splt = data.replaceAll("\\|$","").split("\\|",-1);
    System.out.println(split.length); //output: 5
    System.out.println(splt.length); //output:7

What I've done here is, I'm removing "|" pipe at the end and then splitting the String. If you have "," as a seperator then you need to add ",$" inside replaceAll.

Solution 4

From String.split() API Doc:

Splits this string around matches of the given regular expression. This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

Overloaded String.split(regex, int) is more appropriate for your case.

Solution 5

you may have multiple separators, including whitespace characters, commas, semicolons, etc. take those in repeatable group with []+, like:

 String[] tokens = "a , b,  ,c; ;d,      ".split( "[,; \t\n\r]+" );

you'll have 4 tokens -- a, b, c, d

leading separators in the source string need to be removed before applying this split.

as answer to question asked:

String data = "5|6|7||8|9||";
String[] split = data.split("[\\| \t\n\r]+");

whitespaces added just in case if you'll have those as separators along with |