ID: 0013760
Project: CMake
Category: CMake
View Status: public
Date Submitted: 2012-11-29 16:13
Last Update: 2016-06-10 14:31
Reporter: Andreas Mohr
Assigned To: Kitware Robot
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: moved
Platform: PC 32bit
OS: Linux
OS Version: Debian stable
Product Version: CMake 2.8.9
Target Version:
Fixed in Version:
Summary: 0013760: file(STRINGS): very questionable (sufficiently certainly buggy?) behaviour for square brackets
Description: I just tried parsing a section that resembles part of an IDL file (a format whose spec has sections enclosed in '[' and ']').
I was quite astonished by the result of file(STRINGS) on this
(after already having spent a sizeable chunk of the day on various other file(STRINGS) specifics, to add insult to injury).

Why in h*ll would file(STRINGS) take such specific care about the format of the text file?
Don't tell me that it's because of (quoting docs) "Intel Hex and Motorola S-record files", which could possibly happen to have certain '['-enclosed sections. That would be a sad result for an otherwise (in the case of non-Intel/Motorola files) supposedly(?) sufficiently generic file(STRINGS) functionality.

Needless to say having any []-enclosed yet originally *multi-line* content
end up delivered as a *single* line within foreach() processing is very problematic when contrasted against my expectations.
If it actually is correct handling (for certain aspects of "correct") and there's no quite standard CMake mechanism explanation for this that I managed to miss, then docs should definitely be corrected to mention this possibly '['-specific handling.

Any ideas or comments about this?

Severity major since it's data corrupting (e.g. applying a regex with start-of-line/end-of-line constraints - ^$ - line by line *will* cause headaches or worse).

Thank you!
Steps To Reproduce:
cmake_minimum_required(VERSION 2.8)

project(file_strings_bug_test NONE)

macro(write_file _file _content)
  file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/${_file}" "${_content}")
endmacro(write_file _file _content)

macro(read_file _file)
  file(STRINGS "${CMAKE_CURRENT_BINARY_DIR}/${_file}" _content_list)
  foreach(line_ ${_content_list})
    message("line ${_file}: ${line_}")
  endforeach(line_ ${_content_list})
endmacro(read_file _file)

set("content_ok" "Hello
World
My Worrying
Test")

set(content_ko "[${content_ok}]")
set(content_ko2 "Hi
There
[${content_ok}] ")

write_file(file_ok "${content_ok}")
write_file(file_ko "${content_ko}")
write_file(file_ko2 "${content_ko2}")

read_file(file_ok)
read_file(file_ko)
read_file(file_ko2)
Additional Information:
$ cmake ..
line file_ok: Hello
line file_ok: World
line file_ok: My Worrying
line file_ok: Test
line file_ko: [Hello;World;My Worrying;Test]
line file_ko2: Hi
line file_ko2: There
line file_ko2: [Hello;World;My Worrying;Test]
-- Configuring done
-- Generating done
-- Build files have been written to: /home/andi/prg/cmake_tests/file_strings_bug_test/build
Tags: No tags attached.

  Notes
(0031770)
David Cole (manager)
2012-11-29 16:52

The results are as I would expect them...

It is equivalent to running the "strings" command line utility on the file.

Why do you think file(STRINGS) should return "lines" of text? It returns strings.

If you want to have newlines contained in the returned strings, try using the NEWLINE_CONSUME argument (meaning consume newlines into the returned strings...):

  file(STRINGS "${CMAKE_CURRENT_BINARY_DIR}/${_file}" _content_list NEWLINE_CONSUME)
(0031771)
Andreas Mohr (reporter)
2012-11-29 17:27

Hi,

first, thank you for your fast response! (hmm, I sense a recurring pattern...)

Darn, right, of course UNIX "strings" has a purpose which is rather different from line splitting. Don't know how I managed to get that wrong.
I think adding a docs phrase like "similar to the strings UNIX utility" would be useful.

To achieve the very same output with "strings" on my IDL file, I had to use strings -n 1, though.


However, that being said, I'm still unconvinced that all is fine in la-la land.

I realized that having one '[' added into the previously alpha-only file (i.e., even with the closing ']' omitted) will drastically change the splitting behaviour.

I'm currently talking of this content:

hello
[Hello
World
My Worrying
Test

And in this case even "strings -n 1 file_ko" output is *different* from what CMake produces:
$ strings -n 1 file_k
hello
[Hello
World
My Worrying
Test


line file_ko: hello
line file_ko: [Hello;World;My Worrying;Test


Note that prepending '*', '<' or '{' instead of '[' does not produce this effect (nor do '%', '!', ':', '\"', '-'); it's *only* '[' which does that.

A parser "feature" (parser state machine paying close attention to its somehow "special" '[' char and then starting to do weird things) seems more and more likely.
(0031772)
Andreas Mohr (reporter)
2012-11-29 17:37

Doing a file(STRINGS) over a large binary e.g. /bin/touch will cause one to realize that foreach() output keeps alternating between single-element and concatenated-elements dumping (right after it encountered one of '['/']' chars each time...), whereas "strings -n 1" totally does not do that.
(0031778)
David Cole (manager)
2012-11-30 06:51

I can't quite figure out why file(STRINGS) cares about "[" characters...

The loop in the code starting here:
  https://github.com/Kitware/CMake/blob/e0af55a5f4cd84db1cc5a3517e730ea8c6332f45/Source/cmFileCommand.cxx#L582

...should only ever treat the file input character by character, and pull ASCII strings out of it. The '[' and ']' are not treated specially at all. They should fall squarely in the middle of the "(c >= 0x20 && c < 0x7F)" character range.

This is a very weird issue...
(0031779)
David Cole (manager)
2012-11-30 06:55

FYI... Here is some test code in the CMake source tree that reliably does a foreach over lines of text in a text file:

  https://github.com/Kitware/CMake/blob/e0af55a5f4cd84db1cc5a3517e730ea8c6332f45/Tests/CMakeTests/CheckSourceTreeTest.cmake.in#L249

...although you do have to add and then remove a special end-of-line character on each line to account for semicolons in the output. 'E' in this case.
(0031781)
Brad King (manager)
2012-11-30 08:26

The [] behavior is probably in the

 foreach(line_ ${_content_list})

line where variable expansion does not separate on ';' inside square brackets. Therefore, if '[' appears on one line and ']' appears on a later line, the foreach will not necessarily see them as separate lines.
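
To make that explanation concrete, here is a minimal sketch that can be run with `cmake -P`. It assumes the list value below is a fair stand-in for what file(STRINGS) returned for file_ko2 in the report. The placeholder tokens `__LBRACKET__` and `__RBRACKET__` are arbitrary names chosen for illustration, and are assumed not to occur in the input. The masking step is one possible workaround, not something proposed in this thread.

```cmake
# Hypothetical stand-in for the list file(STRINGS) produced for file_ko2.
set(_content_list "Hi;There;[Hello;World;My Worrying;Test]")

# List expansion does not split on ';' between an unbalanced '[' and its ']',
# so the bracketed part counts as a single element here.
list(LENGTH _content_list _count)
message("elements seen by list(): ${_count}")

# Workaround sketch: hide the brackets behind placeholder tokens so that
# every ';' is treated as a separator, then restore them per element.
string(REPLACE "[" "__LBRACKET__" _masked "${_content_list}")
string(REPLACE "]" "__RBRACKET__" _masked "${_masked}")

foreach(line_ ${_masked})
  string(REPLACE "__LBRACKET__" "[" line_ "${line_}")
  string(REPLACE "__RBRACKET__" "]" line_ "${line_}")
  message("line: ${line_}")
endforeach()
```

With the documented list-splitting rule (a ';' is not a separator while '[' and ']' are unbalanced), the first message should report 3 elements, while the masked loop should visit "[Hello", "World", "My Worrying" and "Test]" separately. Note that the unquoted `${_masked}` expansion drops empty elements, so blank lines would still be lost.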
(0042162)
Kitware Robot (administrator)
2016-06-10 14:28

Resolving issue as `moved`.

This issue tracker is no longer used. Further discussion of this issue may take place in the current CMake Issues page linked in the banner at the top of this page.

 Issue History
Date Modified Username Field Change
2012-11-29 16:13 Andreas Mohr New Issue
2012-11-29 16:52 David Cole Note Added: 0031770
2012-11-29 17:27 Andreas Mohr Note Added: 0031771
2012-11-29 17:37 Andreas Mohr Note Added: 0031772
2012-11-30 06:51 David Cole Note Added: 0031778
2012-11-30 06:55 David Cole Note Added: 0031779
2012-11-30 08:26 Brad King Note Added: 0031781
2016-06-10 14:28 Kitware Robot Note Added: 0042162
2016-06-10 14:28 Kitware Robot Status new => resolved
2016-06-10 14:28 Kitware Robot Resolution open => moved
2016-06-10 14:28 Kitware Robot Assigned To => Kitware Robot
2016-06-10 14:31 Kitware Robot Status resolved => closed

