strtok() and strtok_r()

 

http://hi.baidu.com/jason1512/item/173cda25c096518d9c63d1e6

[Repost] strtok() and strtok_r()

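Note:
The following comment is taken from the Linux 2.6.29 kernel source (the latest at the time of writing). It shows that the kernel no longer uses strtok(); it has been replaced by strsep(), which is reentrant and expected to be faster.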

/*
* linux/lib/string.c
*
* Copyright (C) 1991, 1992 Linus Torvalds
*/
/*
* stupid library routines.. The optimized versions should generally be found
* as inline code in <asm-xx/string.h>
*
* These are buggy as well..
*
* * Fri Jun 25 1999, Ingo Oeser <ioe@informatik.tu-chemnitz.de>
* - Added strsep() which will replace strtok() soon (because strsep() is
* reentrant and should be faster). Use only strsep() in new code, please.
*
* * Sat Feb 09 2002, Jason Thomas <jason@topic.com.au>,
* Matthew Hawkins <matt@mh.dropbear.id.au>
* - Kissed strtok() goodbye
*/

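Most of us have run into strtok(), yet it always seems to cause some trouble, so this post takes a close look at it.

First, here is the explanation from MSDN: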

char *strtok( char *strToken, const char *strDelimit );

Parameters
strToken
String containing token or tokens.
strDelimit
Set of delimiter characters.

Return Value

Returns a pointer to the next token found in strToken. They return NULL when no more tokens are found. Each call modifies strToken by substituting a NULL character for each delimiter that is encountered.

Remarks

The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call.

Security Note These functions incur a potential threat brought about by a buffer overrun problem. Buffer overrun problems are a frequent method of system attack, resulting in an unwarranted elevation of privilege.

On the first call to strtok, the function skips leading delimiters and returns a pointer to the first token in strToken, terminating the token with a null character. More tokens can be broken out of the remainder of strToken by a series of calls to strtok. Each call to strtok modifies strToken by inserting a null character after the token returned by that call. To read the next token from strToken, call strtok with a NULL value for the strToken argument. The NULL strToken argument causes strtok to search for the next token in the modified strToken. The strDelimit argument can take any value from one call to the next so that the set of delimiters may vary.

Note Each function uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.

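Confusing, isn't it?

Put simply: the first call returns the first delimiter-separated substring. After that, pass NULL as the first argument, and each further call returns the next substring from what remains.

Let's look at an example: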

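#include <stdio.h>
#include <string.h>

int main(void) {
    char test1[] = "feng,ke,wei";   /* writable array */
    char *test2 = "feng,ke,wei";    /* pointer to a string literal; discussed after the output */
    char *p;

    p = strtok(test1, ",");
    while (p) {
        printf("%s\n", p);
        p = strtok(NULL, ",");
    }
    return 0;
}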

Output:

feng
ke
wei

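But if we call p = strtok(test2, ",") instead, the program fails with a memory error. Why? Does it have something to do with the static variable inside strtok? Let's look at its source code: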

/***
*strtok.c - tokenize a string with given delimiters
*
*       Copyright (c) Microsoft Corporation. All rights reserved.
*
*Purpose:
*       defines strtok() - breaks string into series of token
*       via repeated calls.
*
*******************************************************************************/

#include <cruntime.h>
#include <string.h>
#ifdef _MT
#include <mtdll.h>
#endif  /* _MT */

/***
*char *strtok(string, control) - tokenize string with delimiter in control
*
*Purpose:
*       strtok considers the string to consist of a sequence of zero or more
*       text tokens separated by spans of one or more control chars. the first
*       call, with string specified, returns a pointer to the first char of the
*       first token, and will write a null char into string immediately
*       following the returned token. subsequent calls with zero for the first
*       argument (string) will work thru the string until no tokens remain. the
*       control string may be different from call to call. when no tokens remain
*       in string a NULL pointer is returned. remember the control chars with a
*       bit map, one bit per ascii char. the null char is always a control char.
*       // Blogger's note: this is already very detailed, better than MSDN!
*
*Entry:
*       char *string - string to tokenize, or NULL to get next token
*       char *control - string of characters to use as delimiters
*
*Exit:
*       returns pointer to first token in string, or if string
*       was NULL, to next token
*       returns NULL when no more tokens remain.
*
*Uses:
*
*Exceptions:
*
*******************************************************************************/

char * __cdecl strtok (
        char * string,
        const char * control
        )
{
        unsigned char *str;
        const unsigned char *ctrl = control;

        unsigned char map[32];
        int count;

#ifdef _MT
        _ptiddata ptd = _getptd();
#else  /* _MT */
        static char *nextoken;          // static variable that saves the rest of the string
#endif  /* _MT */

        /* Clear control map */
        for (count = 0; count < 32; count++)
                map[count] = 0;

        /* Set bits in delimiter table */
        do {
                map[*ctrl >> 3] |= (1 << (*ctrl & 7));
        } while (*ctrl++);

        /* Initialize str. If string is NULL, set str to the saved
         * pointer (i.e., continue breaking tokens out of the string
         * from the last strtok call) */
        if (string)
                str = string;           // the original string, passed on the first call
        else
#ifdef _MT
                str = ptd->_token;
#else  /* _MT */
                str = nextoken;         // the saved remainder, used when the first argument is NULL
#endif  /* _MT */

        /* Find beginning of token (skip over leading delimiters). Note that
         * there is no token iff this loop sets str to point to the terminal
         * null (*str == '\0') */
        while ( (map[*str >> 3] & (1 << (*str & 7))) && *str )
                str++;

        string = str;                   // string now marks the start of the token to be returned

        /* Find the end of the token. If it is not the end of the string,
         * put a null there. */
        // This is the core of the routine: find the next delimiter and overwrite
        // it with '\0', which then terminates the returned token.
        for ( ; *str ; str++ )
                if ( map[*str >> 3] & (1 << (*str & 7)) ) {
                        *str++ = '\0';  // ① this write modifies the contents of the caller's string
                        break;
                }

        /* Update nextoken (or the corresponding field in the per-thread data
         * structure */
#ifdef _MT
        ptd->_token = str;
#else  /* _MT */
        nextoken = str;                 // save the remainder in the static variable for the next call
#endif  /* _MT */

        /* Determine if a token has been found. */
        if ( string == str )
                return NULL;
        else
                return string;
}


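1. Introduction to strtok

As is well known, strtok splits a string according to the delimiters supplied by the caller (there may be more than one delimiter character, for example ",."), and keeps going until it reaches the terminating "\0".

For example, with delimiter "," and string "Fred,John,Ann", strtok extracts the three strings "Fred", "John" and "Ann".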

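The C code for this is:

int in = 0;
char buffer[] = "Fred,John,Ann";
char *p[4];            /* three tokens plus the final NULL stored by the loop */
char *buf = buffer;
while ((p[in] = strtok(buf, ",")) != NULL) {
    in++;
    buf = NULL;
}

As the code shows, the first call to strtok takes the address of the target string as its first argument (buf = buffer), and every later call takes NULL (buf = NULL). The pointer array p[] holds the results: p[0] = "Fred", p[1] = "John", p[2] = "Ann". The contents of buffer have become Fred\0John\0Ann\0.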

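2. The weakness of strtok
Let's change the task. We have the string "Fred male 25,John male 62,Anna female 16" and we want to sort its contents into a struct:

struct person {
    char name[25];
    char sex[7];    /* "female" needs 6 characters plus the terminating '\0' */
    char age[4];
};

One way to do this is to extract each substring delimited by ",", and then split that substring on " " (space).
For example: extract "Fred male 25", then split it into "Fred", "male" and "25".
I wrote a small program to demonstrate the process: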

#include <stdio.h>
#include <string.h>
#define INFO_MAX_SZ 255

int main(void)
{
    int in = 0;
    char buffer[INFO_MAX_SZ] = "Fred male 25,John male 62,Anna female 16";
    char *p[20];
    char *buf = buffer;

    /* outer loop: split records on ',' */
    while ((p[in] = strtok(buf, ",")) != NULL) {
        buf = p[in];
        /* inner loop: split fields on ' ' (this overwrites strtok's saved position) */
        while ((p[in] = strtok(buf, " ")) != NULL) {
            in++;
            buf = NULL;
        }
        p[in++] = "***";   /* marks the end of a record */
        buf = NULL;
    }

    printf("Here we have %d strings\n", in);
    for (int j = 0; j < in; j++)
        printf(">%s<\n", p[j]);
    return 0;
}

The program prints:
Here we have 4 strings
>Fred<
>male<
>25<
>***<
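That is only a small part of the data, not what we wanted. Why? Because strtok keeps its place in a single static pointer, and the inner loop overwrites the position saved by the outer loop. Here is how the code above runs. Each \0 is a byte that strtok has written into the string, and the comments note where strtok's internal pointer ends up:

1. "Fred male 25,John male 62,Anna female 16" // outer loop starts

2. "Fred male 25\0John male 62,Anna female 16" // outer call returns "Fred male 25"; internal pointer at "John"; enter inner loop

3. "Fred\0male 25\0John male 62,Anna female 16" // inner call returns "Fred"; internal pointer moved back to "male"

4. "Fred\0male\025\0John male 62,Anna female 16" // inner call returns "male"; internal pointer at "25"

5. "Fred\0male\025\0John male 62,Anna female 16" // inner call returns "25"; the next inner call hits "\0" and returns to the outer loop

6. "Fred\0male\025\0John male 62,Anna female 16" // outer call resumes at that same "\0", returns NULL, and the run ends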

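3. Using strtok_r
In this situation we should use strtok_r, the reentrant version of strtok.
char *strtok_r(char *s, const char *delim, char **ptrptr);

Unlike strtok, strtok_r requires the caller to supply a pointer in which it saves its position, instead of relying on a built-in static pointer.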
The same program rewritten with strtok_r:

#include <stdio.h>
#include <string.h>
#define INFO_MAX_SZ 255

int main(void)
{
    int in = 0;
    char buffer[INFO_MAX_SZ] = "Fred male 25,John male 62,Anna female 16";
    char *p[20];
    char *buf = buffer;

    char *outer_ptr = NULL;   /* saved position for the ',' split */
    char *inner_ptr = NULL;   /* saved position for the ' ' split */

    while ((p[in] = strtok_r(buf, ",", &outer_ptr)) != NULL) {
        buf = p[in];
        while ((p[in] = strtok_r(buf, " ", &inner_ptr)) != NULL) {
            in++;
            buf = NULL;
        }
        p[in++] = "***";      /* marks the end of a record */
        buf = NULL;
    }

    printf("Here we have %d strings\n", in);
    for (int j = 0; j < in; j++)
        printf(">%s<\n", p[j]);
    return 0;
}

This time the output is:
Here we have 12 strings
>Fred<
>male<
>25<
>***<
>John<
>male<
>62<
>***<
>Anna<
>female<
>16<
>***<


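Here is how the code above runs. Each \0 is a byte that strtok_r has written into the string, and the comments note where outer_ptr and inner_ptr point:

1. "Fred male 25,John male 62,Anna female 16" // outer loop starts

2. "Fred male 25\0John male 62,Anna female 16" // outer_ptr at "John"; enter inner loop

3. "Fred\0male 25\0John male 62,Anna female 16" // inner_ptr at "male"

4. "Fred\0male\025\0John male 62,Anna female 16" // inner_ptr at "25"

5. "Fred\0male\025\0John male 62,Anna female 16" // inner loop hits "\0" and returns to the outer loop; outer_ptr is still at "John"

6. "Fred\0male\025\0John male 62\0Anna female 16" // outer_ptr at "Anna"; enter inner loop again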
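Now back to the earlier question of why strtok(test2, ",") crashes. The source shows the answer: the function modifies the original string.

So when char *test2 = "feng,ke,wei" is passed as the first argument, the write at position ① fails. test2 points to a string literal, which is stored in the string-constant area. That area must not be modified (writing to a string literal is undefined behavior in C, and on most systems the memory is read-only), hence the memory error. With char test1[] = "feng,ke,wei", on the other hand, the characters are stored in an array on the stack, so they can be modified.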

By now you should have a clearer picture of the string-constant area.

