Chapter 7

Strings

Rough translation - Please send feedback on errors and ambiguities to [email protected]

7.1 A compound data type

So far we have seen three data types: int, float, and string. Strings are qualitatively different from the other two because they are made up of smaller pieces: characters.

Types that consist of smaller pieces are called compound data types or structured data types. Depending on what we are doing, we may want to treat a compound data type as a single thing, or we may want to access its parts. It is useful to have both options available.

The square bracket operator selects a single character from the string.

>>> fruit = "banana"
>>> letter = fruit[1]
>>> print letter

The expression fruit[1] selects character number 1 from fruit. The variable letter refers to the result. When we print letter, we get a surprise:

a

The first letter of "banana" is not a. Unless you're a computer scientist. For perverse reasons, computer scientists always start counting at 0. The 0th ("zeroth") letter of "banana" is b. The 1st ("first") letter is a and the 2nd ("second") letter is n.

If you want to have the zeroth letter of a string, you just put a 0, or any expression that results in the value zero, in the square brackets.

>>> letter = fruit[0]
>>> print letter
b

The expression in the square brackets is called an index. An index specifies a member of an ordered set, in this case the set of characters in a string. The index indicates which one you want. It can be any integer expression.
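As a small illustration of the last point, the sketch below uses a variable and an arithmetic expression as indices. The name i is introduced here only for illustration, and print is written with parentheses so that the snippet also runs under Python 3.

```python
fruit = "banana"
i = 1                      # hypothetical helper variable, not from the text

first = fruit[0]           # a literal as the index: the zeroth letter
second = fruit[i]          # a variable as the index
third = fruit[i + 1]       # an arithmetic expression as the index

print(first + second + third)   # prints: ban
```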

7.2 Length

The len function returns the number of characters in a string:

>>> fruit = "banana"
>>> len(fruit)
6

To get the last letter of a string, you might try something like this:

length = len(fruit)
last = fruit[length]       # ERROR!

But that won't work. It causes the runtime error IndexError: string index out of range. This is because there is no 6th letter in the word "banana". Because we start counting at 0, the 6 letters are numbered from 0 to 5. To get the last character we need to subtract 1 from the length:

length = len(fruit)
last = fruit[length-1]

Alternatively, we can use negative indices, which count backward from the end of the string. The expression fruit[-1] returns the last letter, fruit[-2] the second to last, and so on.
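The following sketch puts the two approaches side by side and also shows the IndexError being raised. The variable names also_last, second_last, and out_of_range are assumptions made for this example; print uses parentheses so the code also runs under Python 3.

```python
fruit = "banana"
length = len(fruit)

last = fruit[length - 1]       # last letter, computed from the length
also_last = fruit[-1]          # last letter, using a negative index
second_last = fruit[-2]        # the letter before the last one

print(last == also_last)       # prints: True

# Indexing at position len(fruit) is out of range and raises IndexError.
try:
    fruit[length]
    out_of_range = False
except IndexError:
    out_of_range = True

print(out_of_range)            # prints: True
```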

7.3 Traversal and the for loop

Many computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a traversal. One way to code a traversal is with a while loop:

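index = 0
while index < len(fruit):
    letter = fruit[index]
    print letter
    index = index + 1

This loop traverses the string and prints each letter on a line by itself. The loop condition is index < len(fruit), so when index is equal to the length of the string, the condition is false and the body of the loop is not executed. The last character accessed is the one with the index len(fruit)-1, which is the last character in the string.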

Exercise: Write a function that takes a string as an argument and prints the letters from back to front, one per line.

Using an index to traverse a set of values is so common that Python provides an alternative, simpler syntax for it, the for loop:

for char in fruit:
    print char

Each time through the loop, the next character in the string is assigned to the variable char. The loop continues until no characters are left.
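To check that the two loop forms visit the same characters in the same order, the sketch below collects the output of each into a string instead of printing it. The names while_result and for_result are assumptions made for this example.

```python
fruit = "banana"

# Processing the string with a while loop and an explicit index.
while_result = ""
index = 0
while index < len(fruit):
    while_result = while_result + fruit[index] + "\n"
    index = index + 1

# The same processing with a for loop; no index is needed.
for_result = ""
for char in fruit:
    for_result = for_result + char + "\n"

print(while_result == for_result)   # prints: True
```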

The following example shows how to use concatenation and a for loop to generate an abecedarian ("ABC") series, that is, a series or list in which the elements appear in alphabetical order. For example, in Robert McCloskey's book Make Way for Ducklings, the names of the ducklings are Jack, Kack, Lack, Mack, Nack, Ouack, Pack, and Quack. The following loop prints these names in order:

prefixes = "JKLMNOPQ"
suffix = "ack"

for letter in prefixes:
    print letter + suffix

The output of the program is:

Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack

Of course, that is not quite right, because "Ouack" and "Quack" are misspelled.

Exercise: Modify the program to fix this error.

7.4 String slices

A segment of a string is called a slice. Selecting a slice is similar to selecting a single character:

>>> s = "Peter, Paul, and Mary"
>>> print s[0:5]
Peter
>>> print s[7:11]
Paul
>>> print s[17:21]
Mary

The operator [n:m] returns the part of the string from the nth character to the mth character, including the first but excluding the last. This behavior is a bit counterintuitive; it makes more sense if you imagine the indices pointing between the characters, as in the following diagram:

If you omit the first index (before the colon), the section begins at the beginning of the string. If you omit the second index, the section goes to the end of the string. Consequently:

>>> fruit = "banana"
>>> fruit[:3]
'ban'
>>> fruit[3:]
'ana'
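As a sketch of the idea that indices point between the characters, the comment in the code below lays out the positions for "banana". The variable names head, tail, and middle are assumptions made for this example.

```python
fruit = "banana"

# Indices can be pictured as sitting between the characters:
#
#     b   a   n   a   n   a
#   0   1   2   3   4   5   6
#
head = fruit[:3]      # everything between positions 0 and 3
tail = fruit[3:]      # everything between position 3 and the end
middle = fruit[2:5]   # everything between positions 2 and 5

print(head)                   # prints: ban
print(tail)                   # prints: ana
print(middle)                 # prints: nan
print(head + tail == fruit)   # prints: True
```

Splitting at any position n and concatenating fruit[:n] with fruit[n:] gives back the original string, which is one practical benefit of excluding the character at the second index.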

What do you think s[:] means?

7.5 String comparison

The comparison operators also work on strings. To see if two strings are equal:

if word == "banana":
    print "Yes, we have no bananas!"

Other comparison operations can be used to put words in alphabetical order:

if word < "banana":
    print "Your word, " + word + ", comes before banana."
elif word > "banana":
    print "Your word, " + word + ", comes after banana."
else:
    print "Yes, we have no bananas!"

You should be aware, though, that Python does not handle uppercase and lowercase letters the way people do. All the uppercase letters come before all the lowercase letters. As a result:

Your word, Zebra, comes before banana.

A common way to address this problem is to convert strings to a standard format, such as all lowercase, before performing the comparison. A more difficult problem is making the program realize that zebras are not fruit.
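One possible way to do the conversion is the string method lower, which is not introduced in the text above and is used here as an assumption. The function name compare_ignoring_case is likewise hypothetical. This is a sketch, not the only way to normalize strings.

```python
def compare_ignoring_case(word, reference):
    """Return -1, 0, or 1 depending on whether word sorts before,
    the same as, or after reference once both are lowercased."""
    a = word.lower()
    b = reference.lower()
    if a < b:
        return -1
    elif a > b:
        return 1
    else:
        return 0

# Without conversion, the uppercase Z sorts before the lowercase b.
print("Zebra" < "banana")                         # prints: True

# After conversion, Zebra comes after banana, as people expect.
print(compare_ignoring_case("Zebra", "banana"))   # prints: 1
```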

7.6 Strings are immutable

It is tempting to use the [] operator on the left side of an assignment, with the intention of changing a character in a string. For example:

greeting = "Hello, world!"
greeting[0] = 'J'            # ERROR!
print greeting

Instead of producing the output Jello, world!, this code produces the runtime error TypeError: object doesn't support item assignment.

Strings are immutable, which means you can't change an existing string. The best you can do is create a new string that is a variation on the original:

greeting = "Hello, world!"
newGreeting = 'J' + greeting[1:]
print newGreeting

The solution here is to concatenate a new first letter onto a slice of greeting. This operation has no effect on the original string.
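The sketch below runs both attempts in one place: it catches the TypeError from the assignment and then confirms that building a new string leaves the original untouched. The names assignment_failed and new_greeting are assumptions made for this example.

```python
greeting = "Hello, world!"

# Assigning to a single character is not allowed for strings.
try:
    greeting[0] = 'J'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Building a new string from a new first letter and a slice works.
new_greeting = 'J' + greeting[1:]

print(assignment_failed)   # prints: True
print(new_greeting)        # prints: Jello, world!
print(greeting)            # prints: Hello, world!
```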

7.7 A function called find

What does the following function do?

def find(str, ch):
    index = 0
    while index < len(str):
        if str[index] == ch:
            return index
        index = index + 1
    return -1

In a sense, find is the opposite of the [] operator. Instead of finding the letter corresponding to a given index, it takes a letter and finds the index where the letter is located. If the letter does not appear at all, the function returns -1.

This is the first example we have seen of a return statement inside a loop. If str[index] == ch, the function returns the value of index immediately, breaking out of the loop prematurely.

If the letter does not appear in the string, the program exits the loop normally and then returns -1.

This pattern of computation is sometimes called a "eureka" traversal because as soon as we find what we are looking for, we can cry "Eureka!" and stop looking.
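A few sample calls may make the behavior concrete. The function is restated here so that the snippet is self-contained; the parameter is renamed to text (an assumption of this sketch) to avoid hiding the built-in name str.

```python
def find(text, ch):
    index = 0
    while index < len(text):
        if text[index] == ch:
            return index          # "Eureka": stop at the first match
        index = index + 1
    return -1                     # the loop ended without a match

print(find("banana", "a"))        # prints: 1
print(find("banana", "n"))        # prints: 2
print(find("banana", "z"))        # prints: -1
```

Note that only the first occurrence is reported: "banana" contains three a's, but the search stops at index 1.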

Exercise: Modify the find function so that it takes a third parameter, the index in the string at which to start the search.

7.8 Loops and Counters

The following program counts how often the letter a appears in a string:

fruit = "banana"
count = 0
for char in fruit:
    if char == 'a':
        count = count + 1
print count

This program demonstrates another pattern of computation called a counter. The variable count is initialized to 0 and then increased by 1 each time an a is found. (We also say that the variable is incremented. This is the opposite of decrementing, which decreases the value of the variable by 1.)
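To show incrementing and decrementing together, the sketch below adds a second variable, remaining, that is decremented once per character processed. That variable is an assumption made for this example and is not part of the program above.

```python
fruit = "banana"
count = 0                  # counter: how many a's have been seen so far
remaining = len(fruit)     # hypothetical: characters not yet processed

for char in fruit:
    if char == 'a':
        count = count + 1          # increment
    remaining = remaining - 1      # decrement

print(count)       # prints: 3
print(remaining)   # prints: 0
```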

Exercise: Encapsulate this code in a function named countLetters, and generalize it so that it takes two parameters: the string and the letter.
Exercise: Rewrite this function so that, instead of traversing the string, it uses the three-parameter version of find from the previous exercise.

7.9 The string module

The string module contains useful functions that manipulate strings. As usual, we need to import the module before we can use it:

>>> import string

The string module contains a function called find that does the same thing as the function we wrote. To call it, we have to specify the name of the module and the name of the function, using dot notation:

>>> fruit = "banana"
>>> index = string.find(fruit, "a")
>>> print index
1

This example demonstrates one of the benefits of modules: they help avoid collisions between the names of built-in functions and the names of user-defined functions. By using dot notation we can specify which version of find we want.

In fact, string.find is more general than our version. First, it can find substrings, not just letters:

>>> string.find("banana", "na")
2

Furthermore, it takes an additional argument that defines the index with which the search should start:

>>> string.find("banana", "na", 3)
4

Or it can take two additional arguments specifying a range of indexes:

>>> string.find("bob", "b", 1, 2)
-1

In this example, the search fails because the letter b does not appear in the index range from 1 to 2 (not including 2).
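The function string.find exists only in Python 2. Strings also have a find method that takes the same optional arguments and is available in both Python 2 and Python 3. As a sketch, the calls above can be written with the method form; the variable names are assumptions made for this example.

```python
fruit = "banana"

first_a = fruit.find("a")          # corresponds to string.find(fruit, "a")
first_na = fruit.find("na")        # substring search
later_na = fruit.find("na", 3)     # start the search at index 3
no_b = "bob".find("b", 1, 2)       # search only index 1; index 2 is excluded

print(first_a)     # prints: 1
print(first_na)    # prints: 2
print(later_na)    # prints: 4
print(no_b)        # prints: -1
```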

7.10 Classification of characters

It is often necessary to look at a single character and see whether it is a lower or upper case letter, or whether it is a letter or a number. The string module provides some constants that are useful for this purpose.

The string string.lowercase contains all letters that the system considers to be lowercase. Similarly, string.uppercase contains all capital letters. Try the following and see what result you get:

>>> print string.lowercase
>>> print string.uppercase
>>> print string.digits

We can use these constants and find to classify characters. For example, if find(lowercase, ch) returns a value other than -1, then ch must be a lowercase letter:

def isLower(ch):
    return find(string.lowercase, ch) != -1

Alternatively, we can also use the in operator, which determines whether a character occurs in a string:

def isLower(ch):
    return ch in string.lowercase

As yet another alternative, we can use the comparison operators:

def isLower(ch):
    return 'a' <= ch <= 'z'

If ch is between a and z, it must be a lowercase letter.
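The sketch below checks that the three approaches agree on a handful of sample characters. It makes several assumptions: it uses string.ascii_lowercase, which is available in both Python 2 and Python 3, in place of string.lowercase, which exists only in Python 2; it uses the string method find; and the function names are hypothetical.

```python
import string

def is_lower_find(ch):
    # find returns -1 when ch does not occur in the constant
    return string.ascii_lowercase.find(ch) != -1

def is_lower_in(ch):
    return ch in string.ascii_lowercase

def is_lower_compare(ch):
    return 'a' <= ch <= 'z'

samples = "aZ3z "
agree = True
for ch in samples:
    if not (is_lower_find(ch) == is_lower_in(ch) == is_lower_compare(ch)):
        agree = False

print(agree)   # prints: True
```

The comparison only covers single characters from the sample string; the versions are not guaranteed to agree on longer strings or on the empty string.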

Exercise: Think about which version of isLower you expect to be the fastest. Can you think of reasons other than speed to prefer one version over another?

Another constant defined in the string module may surprise you when you look at it:

>>> print string.whitespace

Whitespace characters move the cursor without printing anything. They create the white space between visible characters (at least on white paper). The constant string.whitespace contains all the whitespace characters, including space, tab (\t), and newline (\n).
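Because printing string.whitespace directly shows nothing visible, one way to inspect it is with the built-in function repr, which displays escape sequences instead of moving the cursor. This is a sketch; the exact order of the characters depends on the Python version, and the variable names are assumptions made for this example.

```python
import string

# repr makes the invisible characters visible as escape sequences.
print(repr(string.whitespace))

has_space = " " in string.whitespace
has_tab = "\t" in string.whitespace
has_newline = "\n" in string.whitespace

print(has_space and has_tab and has_newline)   # prints: True
```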

There are a number of other useful functions in the string module, but this book is not intended to be a reference manual, unlike the Python Library Reference. It is available, along with other extensive documentation, on the Python website: www.python.org

7.11 Glossary

compound data type
A data type in which the values are made up of components, or elements, that are themselves values.
traverse
To iterate through the elements of a set, performing a similar operation on each.
index
A variable or value used to select a member of an ordered set, such as a character from a string.
slice
A part of a string specified by a range of indices.
mutable
A compound data type whose elements can be assigned new values.
counter
A variable used to count something, usually initialized to zero and then incremented.
increment
To increase the value of a variable by one.
decrement
To decrease the value of a variable by one.
whitespace
Any of the characters that move the cursor without printing visible characters. The constant string.whitespace contains all the whitespace characters.