Thursday, June 22, 2006

url-pattern matching

I'm writing an intercepting filter for a web application, and I wanted to do pattern matching in the same vein as the url-pattern element in the Servlet spec. Strictly speaking, the spec only treats '*' as a wildcard in two forms: a path prefix such as /context/* and an extension such as *.do. I wanted something a little looser, where '*' can appear anywhere in the pattern. For example, /context/*.do would match /context/servlet.do and /context/different.do and, as I interpret it, /context/path/to/another/servlet.do.

I set out to write a pattern matching algorithm to handle this, and this is what I came up with:


private boolean urlMatches(String uri, String urlPattern) {
    // If the pattern is an empty string,
    // interpret it as "nothing", not "everything"
    if (urlPattern.trim().length() == 0) {
        return false;
    }
    // Identical strings always match
    if (uri.equals(urlPattern)) {
        return true;
    }
    // Patterns without wildcards must match exactly
    if (urlPattern.indexOf('*') < 0) {
        return false;
    }
    boolean firstPositionWildCard = urlPattern.startsWith("*");
    boolean lastPositionWildCard = urlPattern.endsWith("*");

    // split() drops trailing empty strings, so "/context/*" yields {"/context/"}
    String[] parts = urlPattern.split("\\*");
    for (int i = 0; i < parts.length; i++) {
        String part = parts[i].trim();
        // Without a wildcard in the last position, the final part must
        // end the URI, otherwise "/a/*.do" would match "/a/x.do.bak"
        if (i == parts.length - 1 && !lastPositionWildCard) {
            return uri.endsWith(part);
        }
        int index = uri.indexOf(part);
        if (index < 0) {
            return false;
        }
        // The first time through, the index must be 0
        // when there's not a wildcard char in the first position
        if (i == 0 && index > 0 && !firstPositionWildCard) {
            return false;
        }
        // Keep matching against whatever follows this part
        uri = uri.substring(index + part.length());
    }
    return true;
}


I haven't tested it for more than an afternoon, but I think it's pretty solid. This is of course written in Java, but the logic should be similar for just about any other language.
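
As a cross-check, the same idea can be sketched with java.util.regex instead of hand-rolled substring scanning. This is a different technique from the method above, not a drop-in copy of it: the class name UrlGlob and its two methods are made up for illustration, and it assumes that '*' may match any run of characters (including '/') and that the pattern must cover the whole URI.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original filter
final class UrlGlob {

    // Translate a '*' glob into a regex: literal runs are quoted so that
    // characters like '.' lose their regex meaning, and each '*' becomes ".*"
    static Pattern toRegex(String urlPattern) {
        StringBuilder regex = new StringBuilder();
        // A limit of -1 keeps trailing empty strings, so a trailing '*' survives
        String[] parts = urlPattern.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String uri, String urlPattern) {
        // Assumption carried over from the post: an empty pattern
        // means "nothing", not "everything"
        if (urlPattern.trim().isEmpty()) {
            return false;
        }
        // Matcher.matches() requires the entire URI to match,
        // so the pattern is anchored at both ends
        return toRegex(urlPattern).matcher(uri).matches();
    }
}
```

With this sketch, UrlGlob.matches("/context/path/to/another/servlet.do", "/context/*.do") returns true, while UrlGlob.matches("/context/servlet.do.bak", "/context/*.do") returns false because the whole URI has to be consumed. Compiling a regex on every call is wasteful in a filter, so in practice the compiled Pattern would probably be cached per url-pattern.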