A new class -->String

Frits van Bommel fvbommel at REMwOVExCAPSs.nl
Wed Apr 4 01:37:55 PDT 2007


jinheking wrote:
> I want to make a Class like java's String Class.
> module jdk2d.lang;
> 
> import std.stdio;
> /*
> * The <code>String</code> class represents character strings. All
> * string literals in D programs, such as <code>"abc"</code>, are
> * implemented as instances of this class.

I don't think that last bit is going to happen: string literals in D are 
built-in character arrays, not instances of a user-defined class.

> * @author  Caoqi
> * @version 0.001, 07/03/30
> * @since   JDK2D 0.02
> * @url     http://jinheking.javaeye.com/admin
> * <p>
> */
> 
> public class  String{
>   private final wchar[] value;
> 
>   /** The offset is the first index of the storage that is used. */
>   private final int offset;
> 
>   /** The count is the number of characters in the String. */
>   private final int count;

Those last two are unnecessary. A wchar[] already carries its own start 
and length, so slicing takes care of everything.
So all you need is
	private final wchar[] value;
(and when the new constness proposal comes through, add const (or better 
yet, invariant) to that)
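
To illustrate why a slice replaces the offset/count pair, here is a 
minimal sketch. It assumes a current D compiler, and the variable names 
are only for illustration:

```d
import std.stdio;

void main()
{
    wchar[] text = "The quick brown fox"w.dup;

    // A slice is a (pointer, length) view into the same memory, so it
    // already records where the substring starts and how long it is.
    wchar[] word = text[4 .. 9];

    assert(word.length == 5);
    assert(word == "quick"w);
    assert(word.ptr == text.ptr + 4); // same storage, no copy was made

    writeln(word);
}
```

The slice plays the role of value + offset + count in one field.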

>   /*
>    * Initializes a newly created {@code String} object so that it represents
>    * an empty character sequence.  Note that use of this constructor is
>    * unnecessary since Strings are immutable.
>    */
>   this(){
>    this.offset = 0;
>    this.count = 0;
>    this.value = new wchar[0];

A null array has length 0, so just
	this.value = null;
will do fine.

>   }
> 
>   String opAssign(wchar[] value) {
>    int size = value.length;
>    offset = 0;
>    count = size;
>    value = value;
>    return this;

(You forgot a 'this' in front of the left-hand 'value' in that assignment; 
as written it assigns the parameter to itself)

Better still:
	this.value = value.dup;
to be sure the code outside can't modify the contents.
Once constness comes through, add an overload that takes an invariant 
wchar[] and doesn't dup.
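
A cut-down sketch of the class with the copy in place, to show what the 
.dup buys you. It assumes a current D compiler (so 'final' is dropped and 
the parameter is const(wchar)[]); opIndex is an addition of mine, there 
only so the example can look at the contents:

```d
class String
{
    private wchar[] value;

    this(const(wchar)[] value)
    {
        // Copy the caller's array so later writes to it can't reach us.
        this.value = value.dup;
    }

    size_t length() { return value.length; }
    bool isEmpty() { return value.length == 0; }

    // Hypothetical accessor, not in the original class.
    wchar opIndex(size_t i) { return value[i]; }
}

void main()
{
    wchar[] buffer = "abc"w.dup;
    auto s = new String(buffer);

    buffer[0] = 'x';          // the caller changes its own array
    assert(s[0] == 'a');      // the String still holds the old contents
    assert(s.length == 3);
    assert(!s.isEmpty);
}
```

Without the .dup the first assert would fail, because both arrays would 
refer to the same memory.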

>     }
> 
>   /*
>      * Allocates a new {@code String} so that it represents the sequence of
>      * characters currently contained in the character array argument. The
>      * contents of the character array are copied; subsequent modification of
>      * the character array does not affect the newly created string.
>      *
>      * @param  value
>      *         The initial value of the string
>      */
>     public this(wchar[] value) {
>    int size = value.length;
>    this.offset = 0;
>    this.count = size;
>    this.value = value;

Same as above:
	this.value = value.dup;

>     }
> 
>   /**
>      * Returns the length of this string.
>      * The length is equal to the number of <a href="Character.html#unicode">Unicode
>      * code units</a> in the string.
>      *
>      * @return  the length of the sequence of characters represented by this
>      *          object.
>      */
>     public int length() {
>         return count;

	return value.length;

>     }
> 
>     /**
>      * Returns <tt>true</tt> if, and only if, {@link #length()} is <tt>0</tt>.
>      *
>      * @return <tt>true</tt> if {@link #length()} is <tt>0</tt>, otherwise
>      * <tt>false</tt>
>      *
>      * @since JDK2D 0.01
>      */
>     public bool isEmpty() {
>     return count == 0;

	return value.length == 0;

>     }
>   /**
>      * Returns the index within this string of the first occurrence of the
>      * specified substring. The integer returned is the smallest value
>      * <i>k</i> such that:
>      * <blockquote><pre>
>      * this.startsWith(str, <i>k</i>)
>      * </pre></blockquote>
>      * is <code>true</code>.
>      *
>      * @param   str   any string.
>      * @return  if the string argument occurs as a substring within this
>      *          object, then the index of the first character of the first
>      *          such substring is returned; if it does not occur as a
>      *          substring, <code>-1</code> is returned.
>      */
> 
>     public int indexOf(String str) {
>    return indexOf(str, 0);
>     }
> 
>     public int indexOf(wchar[] str) {
>    return indexOf(str, 0);
>     }
> 
>     /**
>      * Returns the index within this string of the first occurrence of the
>      * specified substring, starting at the specified index.  The integer
>      * returned is the smallest value <tt>k</tt> for which:
>      * <blockquote><pre>
>      *     k &gt;= Math.min(fromIndex, this.length()) && this.startsWith(str, k)
>      * </pre></blockquote>
>      * If no such value of <i>k</i> exists, then -1 is returned.
>      *
>      * @param   str         the substring for which to search.
>      * @param   fromIndex   the index from which to start the search.
>      * @return  the index within this string of the first occurrence of the
>      *          specified substring, starting at the specified index.
>      */
>     public int indexOf(String str, int fromIndex) {
>         return indexOf(value, offset, count,
>                        str.value, str.offset, str.count, fromIndex);
>     }
> 
>     public int indexOf(wchar[] wstr, int fromIndex) {
>        String str=new String(wstr);
>         return indexOf(value, offset, count,
>                        str.value, str.offset, str.count, fromIndex);
>     }
> 
>     /**
>      * Code shared by String and StringBuffer to do searches. The
>      * source is the character array being searched, and the target
>      * is the string being searched for.
>      *
>      * @param   source       the characters being searched.
>      * @param   sourceOffset offset of the source string.
>      * @param   sourceCount  count of the source string.
>      * @param   target       the characters being searched for.
>      * @param   targetOffset offset of the target string.
>      * @param   targetCount  count of the target string.
>      * @param   fromIndex    the index to begin searching from.
>      */
>     static int indexOf(wchar[] source, int sourceOffset, int sourceCount,
>                        wchar[] target, int targetOffset, int targetCount,
>                        int fromIndex) {
>    if (fromIndex >= sourceCount) {
>               return (targetCount == 0 ? sourceCount : -1);
>    }
>        if (fromIndex < 0) {
>            fromIndex = 0;
>        }
>    if (targetCount == 0) {
>        return fromIndex;
>    }
> 
>           wchar first  = target[targetOffset];
>           int max = sourceOffset + (sourceCount - targetCount);
> 
>           for (int i = sourceOffset + fromIndex; i <= max; i++) {
>               /* Look for first character. */
>               if (source[i] != first) {
>                 i++;
>                   while (i <= max && source[i] != first){
>                    i++;
>                   }
>               }
> 
>               /* Found first character, now look at the rest of v2 */
>               if (i <= max) {
>                   int j = i + 1;
>                   int end = j + targetCount - 1;
>                   for (int k = targetOffset + 1; j < end && (source[j] == target[k]);j++){
>                     k++;
>                   }
> 
>                   if (j == end) {
>                       /* Found whole string. */
>                       return i - sourceOffset;
>                   }
>               }
>           }
>           return -1;
>       }

Using slicing so offset and counts don't need to be passed will allow 
you to implement these more cleanly, I think.
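
Something along these lines, as a rough sketch rather than a drop-in 
replacement. It assumes a current D compiler, keeps the same return 
values as the code above for out-of-range fromIndex and empty targets, 
and uses a plain slice comparison instead of the hand-written inner loop:

```d
/// Index of the first occurrence of `target` in `source` at or after
/// `fromIndex`, or -1 if there is none.
ptrdiff_t indexOf(const(wchar)[] source, const(wchar)[] target,
                  ptrdiff_t fromIndex = 0)
{
    size_t start = fromIndex < 0 ? 0 : cast(size_t) fromIndex;

    if (start >= source.length)
        return target.length == 0 ? cast(ptrdiff_t) source.length : -1;
    if (target.length == 0)
        return cast(ptrdiff_t) start;
    if (target.length > source.length)
        return -1;

    // Last position at which the target can still fit.
    size_t last = source.length - target.length;
    for (size_t i = start; i <= last; i++)
    {
        // Compare a window of the source against the target.
        if (source[i .. i + target.length] == target)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

void main()
{
    auto text = "The quick brown fox jumped over the lazy dog."w;
    assert(indexOf(text, "z"w) == 38);
    assert(indexOf(text, "the"w) == 32);
    assert(indexOf(text, "cat"w) == -1);
    assert(indexOf(text, "o"w, 13) == 17);
}
```

The member functions then reduce to one-liners that pass this.value and 
str.value straight through.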

> 
>  char[] toString(){
>   return cast(char[])std.utf.toUTF8(this.value);

This is broken for your implementation but just fine for mine. (It 
doesn't work for substrings in yours, it'll convert the entire string 
instead of just the part this String object represents)
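
A small sketch of the slice-only version, to show that toString then 
covers exactly the part the object represents. It assumes a current D 
compiler and Phobos; substring() is a hypothetical helper of mine that 
is not in the code above:

```d
import std.utf : toUTF8;

class String
{
    private wchar[] value;

    this(const(wchar)[] value)
    {
        this.value = value.dup;
    }

    // Hypothetical helper: the new object stores only the sliced part.
    String substring(size_t begin, size_t end)
    {
        return new String(value[begin .. end]);
    }

    override string toString()
    {
        // value holds exactly this string's characters, nothing more.
        return toUTF8(value);
    }
}

void main()
{
    auto s = new String("The quick brown fox"w);
    auto sub = s.substring(4, 9);

    assert(sub.toString() == "quick");
    assert(s.toString() == "The quick brown fox");
}
```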

>  }
> }
> 
> public static void  main() {
>    String str = new String("The quick brown fox jumped over the lazy dog.");
>    String s1 = new String("abc");
>    s1="abc";
>    writefln(s1);
>    printf("%d\n",str.indexOf("z"));
>    printf("%d\n",str.isEmpty());
> }

As a final note, there's a (less Java-like) string class implementation 
at http://www.dprogramming.com/dstring.php. It stores its data as 
char[], wchar[] or dchar[] and tries to use whichever is the most 
space-efficient yet can still store every character in a single array 
element. The last bit allows it to implement "intuitive" slicing, never 
slicing in the middle of a group of UTF-8 or UTF-16 code units that form 
a single code point (if I got the terminology right).
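
To show the problem that kind of slicing avoids, here is a short sketch 
using only built-in array types. It assumes a current D compiler and says 
nothing about how dstring itself is implemented:

```d
void main()
{
    // 'a', U+1D11E (MUSICAL SYMBOL G CLEF), 'b'
    wstring w = "a\U0001D11Eb"w;
    dstring d = "a\U0001D11Eb"d;

    // In UTF-16 the clef needs two code units (a surrogate pair).
    assert(w.length == 4);
    // In UTF-32 every code point is exactly one element.
    assert(d.length == 3);

    // w[1 .. 2] would cut the pair in half: w[1] is a lone high surrogate.
    assert(w[1] >= 0xD800 && w[1] <= 0xDBFF);

    // The same slice on the dchar array yields the whole character.
    assert(d[1 .. 2] == "\U0001D11E"d);
}
```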


