Text handling

Applications with a graphical user interface (and games surely fall into this category) interact with users by displaying text and by expecting textual input from them. We have already scratched the surface of this topic in the previous chapter by using the QString class. Now, we will go into more detail.

Manipulating strings

Text in Qt is internally encoded using Unicode, which can represent characters from almost all languages spoken in the world and is the de facto standard for native text encoding in most modern operating systems. Be aware, though, that unlike the QString class, the C++ language does not use Unicode by default. Thus, each string literal (that is, each bare piece of text you wrap in quotation marks) that you enter in your code needs to be converted to Unicode before it can be stored in any of Qt's string handling classes. By default, this is done implicitly, assuming that the string literal is UTF-8 encoded, but QString provides a number of static methods to convert from other encodings, such as QString::fromLatin1() or QString::fromUtf16(). This conversion is done at runtime, which adds overhead to the program's execution time, especially if you do a lot of such conversions. Luckily, there is a solution for this:

QString str = QStringLiteral("I'm writing my games using Qt");

You can wrap your string literal in a call to QStringLiteral, as shown in the preceding code, which, if your compiler supports it, performs the conversion at compile time. It's a good habit to wrap all your string literals in QStringLiteral, but it is not required, so don't worry if you forget to do that.

We will not go into great detail here when describing the QString class, as in many aspects it is similar to std::string, which is part of the standard C++. Instead, we will focus on the differences between the two classes.

Encoding and decoding text

The first difference has already been mentioned: QString keeps the data encoded as Unicode. This has the advantage of being able to express text in virtually any language, at the cost of having to convert from other encodings. The most popular encodings (UTF-8, UTF-16, and Latin1) have convenience methods in QString for converting from and to the internal representation. Qt knows how to handle many other encodings as well, using the QTextCodec class.

Tip

You can list the codecs supported on your installation by using the QTextCodec::availableCodecs() static method. In most installations, Qt can handle almost 1,000 different text codecs.

Most Qt entities that handle text can access instances of this class to transparently perform the conversion. If you want to perform such conversion manually, you can ask Qt for an instance of a codec by its name and make use of the fromUnicode() and toUnicode() methods:

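#include <QTextCodec>
// ...
// "你好" encoded as Big5. The bytes are spelled out explicitly because
// the source file itself is usually saved as UTF-8, not Big5.
QByteArray big5Encoded = "\xA7\x41\xA6\x6E";
QTextCodec *big5Codec = QTextCodec::codecForName("Big5");
QString text = big5Codec->toUnicode(big5Encoded);
QTextCodec *utf8Codec = QTextCodec::codecForMib(106); // UTF-8
QByteArray utf8Encoded = utf8Codec->fromUnicode(text);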

Basic string operations

The most basic tasks that involve text strings are those where you add or remove characters from the string, concatenate strings, and access the string's content. In this regard, QString offers an interface that is compatible with std::string, but it also goes beyond that, exposing many more useful methods.

Adding data at the beginning or at the end of the string can be done using the prepend() and append() methods, which have several overloads that accept different objects that can hold textual data, including the classic const char* array. Inserting data in the middle of a string can be done with the insert() method, which takes the position at which to start inserting as its first argument and the actual text as its second argument. The insert method has largely the same overloads as prepend and append (older Qt versions lack the const char* variant). Removing characters from a string is similar. The basic way to do this is to use the remove() method, which accepts the position at which to start deleting and the number of characters to delete, as shown:

QString str = QStringLiteral("abcdefghij");
str.remove(2, 4); // str = "abghij"

There is also a remove overload that accepts another string. When called, all its occurrences are removed from the original string. This overload has an optional argument that states whether comparison should be done in the default case-sensitive (Qt::CaseSensitive) or case-insensitive (Qt::CaseInsensitive) way:

QString str = QStringLiteral("Abracadabra");
str.remove(QStringLiteral("ab"), Qt::CaseInsensitive); // str = "racadra"

To concatenate strings, you can either simply add two strings together or you can append one string to the other:

QString str1 = QStringLiteral("abc");
QString str2 = QStringLiteral("def");
QString str1_2 = str1+str2;
QString str2_1 = str2;
str2_1.append(str1);

Accessing strings can be divided into two use cases. The first is when you wish to extract a part of the string. For this, you can use one of these three methods: left(), right(), and mid() that return the given number of characters from the beginning or end of the string or extract a substring of a specified length, starting from a given position in the string:

QString original = QStringLiteral("abcdefghij");
QString l = original.left(3); // "abc"
QString r = original.right(2); // "ij"
QString m = original.mid(2, 5); // "cdefg"

The second use case is when you wish to access a single character of the string. The use of the index operator works with QString in a similar fashion as with std::string, returning a copy or non-const reference to a given character that is represented by the QChar class, as shown in the following code:

QString str = "foo";
QChar f = str[0]; // read access: f is a copy of the first character
str[0] = 'g'; // write access: str is now "goo"

In addition to this, Qt offers a dedicated method—at()—that returns a copy of the character:

QChar f = str.at(0);

Tip

You should prefer at() over the index operator for operations that do not modify the character. It makes the read-only intent explicit, and since at() is a const method, it never forces QString to make a deep copy of its implicitly shared data.

The string search and lookup

The second group of functionality is related to searching within a string. You can use methods such as startsWith(), endsWith(), and contains() to look for substrings at the beginning, at the end, or at an arbitrary place in the string. The number of occurrences of a substring in the string can be retrieved by using the count() method.

Tip

Be careful, there is also a count() method that doesn't take any parameters and returns the number of characters in the string.

If you need to know the exact position of the match, you can use indexOf() or lastIndexOf() to receive the position in the string where the match occurs. The first searches forward and the second searches backward. Besides the substring to look for, each of these calls takes two optional parameters: the first is the position in the string where the search begins, and the second determines whether the search is case-sensitive (similar to how remove works). The start position lets you find all the occurrences of a given substring:

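#include <QtDebug>
// ...
QString str = QStringLiteral("Orangutans like bananas.");
int pos = str.indexOf("an");
while (pos != -1) { // indexOf() returns -1 when there are no more matches
  qDebug() << "'an' found at position" << pos;
  pos = str.indexOf("an", pos + 1); // continue just after the last match
}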

Dissecting strings

There is one more group of useful string functionality that sets QString apart from std::string: cutting strings into smaller parts and building larger strings from smaller pieces.

Very often, a string contains substrings that are glued together by a repeating separator. A common case is the Comma-separated Values (CSV) format where a data record is encoded in a single string where fields in the record are separated by commas. While you could extract each field from the record using functions that you already know (for example, indexOf), an easier way exists. QString contains a split() method that takes the separator string as its parameter and returns a list of strings that are represented in Qt by the QStringList class. Then, dissecting the record into separate fields is as easy as calling the following code:

QString record = "1,4,8,15,16,24,42";
QStringList fields = record.split(",");
for(int i=0; i< fields.count(); ++i){
  qDebug() << fields.at(i);
}

The inverse of this method is the join() method present in the QStringList class, which returns all the items in the list as a single string merged together with a given separator:

QStringList fields = { "1", "4", "8", "15", "16", "24", "42" }; // C++11 syntax!
QString record = fields.join(",");

Converting between numbers and strings

QString also provides some methods for convenient conversion between textual and numerical values. Methods such as toInt(), toDouble(), or toLongLong() make it easy to extract numerical values from strings. Apart from toDouble(), they all take two optional parameters—the first one is a pointer to a bool variable that is set to true or false depending on whether the conversion was successful or not. The second parameter specifies the numerical base (for example, binary, octal, decimal, or hexadecimal) of the value. The toDouble() method only takes a bool pointer to mark the success or failure as shown in the following code:

bool ok;
int v1 = QString("42").toInt(&ok, 10); // v1 = 42, ok = true
long long v2 = QString("0xFFFFFF").toLongLong(&ok, 16); // v2 = 16777215, ok = true
double v3 = QString("not really a number").toDouble(&ok); //v3 = 0.0, ok = false

A static method called number() performs the conversion in the other direction—it takes a numerical value and number base and returns the textual representation of the value:

QString txt = QString::number(255, 16); // txt = "ff" (lowercase, without a "0x" prefix)

If you have to combine both QString and std::string in one program, QString offers you the toStdString() and fromStdString() methods to perform an adequate conversion.

Tip

Some of the other classes that represent values also provide conversions to and from QString. An example of such a class is QDate, which represents a date and provides the fromString() and toString() methods.

Using arguments in strings

A common task is to have a string that needs to be dynamic in such a way that its content depends on the value of some external variable—for instance, you would like to inform the user about the number of files being copied, showing "copying file 1 of 2" or "copying file 2 of 5" depending on the value of counters that denote the current file and total number of files. It might be tempting to do this by assembling all the pieces together using one of the available approaches:

QString str = "Copying file " + QString::number(current) + " of "+QString::number(total);

There are a number of drawbacks to such an approach. The biggest of them is the problem of translating the string into other languages (this will be discussed later in this chapter), because the grammar of another language might require the two arguments to be positioned differently than in English.

Instead, Qt allows us to specify positional parameters in strings and then replace them with real values. Positions in the string are marked with the % sign followed by a number (for example, %1, %2, and so on). Each call to arg() replaces the lowest-numbered marker still present in the string with the value passed to it. Our file copy message construction code then becomes:

QString str = QStringLiteral("Copying file %1 of %2")
                  .arg(current).arg(total);

The arg method can accept single characters, strings, integers, and real numbers and its syntax is similar to that of QString::number().

Regular expressions

Let's briefly talk about regular expressions, usually shortened to regex or regexp. You will need them whenever you have to check whether a string or parts of it match a given pattern, or when you want to find specific parts inside the text and possibly extract them. Both the validity check and the finding/extraction are based on the so-called pattern of the regular expression, which describes the format a string must have to be valid, to be found, or to be extracted. Since this book is focused on Qt, there is unfortunately no time to cover regular expressions in depth. This is not a huge problem, however, since you can find plenty of good websites that provide introductions to regular expressions on the Internet. A short introduction can be found in Qt's documentation of QRegExp as well.

Even though there are many flavors of regular expression syntax, the one that Perl uses has become the de facto standard. Accordingly, with QRegularExpression, Qt offers Perl-compatible regular expressions.

Note

QRegularExpression was first introduced in Qt 5. In previous versions, you'll find the older QRegExp class. Since QRegularExpression is closer to the Perl standard and executes much faster than QRegExp, we advise you to use QRegularExpression whenever possible. Nevertheless, the QRegExp documentation is still worth reading for its general introduction to regular expressions.