Significant changes to the "ore" package are laid out below for each release.

===============================================================================

VERSION 1.7.4

- Named groups would not be propagated to match matrices unless the regex was
  pre-compiled using `ore()`. This has been corrected.
- A compiler warning about a `printf`-type format specification has been
  resolved.

===============================================================================

VERSION 1.7.3.1

- A potential mismatch between the C compiler configured for R and the one used
  by the configure script has been resolved.
- A modulus of very large match numbers is now used when printing, rather than
  nothing. This has the side-effect of silencing a warning.

===============================================================================

VERSION 1.7.3

- The package will now properly detect a plain locale like "UTF-8" on start-up.
- C prototype warnings have been resolved, and the problematic `sprintf()`
  function is now avoided.

===============================================================================

VERSION 1.7.2.1

- C prototype-related warnings have been resolved, as required for CRAN. In
  order to facilitate this, an updated version of the embedded hash table
  library has been imported from the Ruby source tree.

===============================================================================

VERSION 1.7.2

- A handful of small memory leaks have been plugged.
- The `README` has been updated to detail the current divergence between CRAN
  and mainline versions of the package.

===============================================================================

VERSION 1.7.1.1

- The `glass` dataset and connection support have been removed from the CRAN
  build of the package because they both produce notes from `R CMD check` and
  there is no known workaround.
- A handful of small memory leaks have been plugged.

===============================================================================

VERSION 1.7.1

- The `binary` argument to `ore_file()` is now stored in an attribute in its
  return value. The `ore_search()` function additionally uses this attribute to
  determine whether or not to set the `text` element of its return value.
  Treating a binary file's entire contents as a string is unwise, as they may
  include embedded nuls and other problem bytes.
- A minor tweak has been made to the `es()` function, which appreciably
  improves its performance in simple cases.
- A potential buffer overrun and protection bug have both been corrected.

===============================================================================

VERSION 1.7.0

- The Onigmo library has been updated to version 6.2.0.
- R connections are now supported as text sources, allowing URLs, gzipped files
  and pipes (amongst others; see `?connections`) to be straightforwardly
  interfaced with the package. The package's C backend has been substantially
  refactored to support files and connections in all the core functions.
- The new `ore_switch()` function selects between a number of possible outputs
  by matching each element of its input against a series of regular
  expressions. This can be a useful way to handle different possible forms of
  the same information.
- The new `ore_repl()` function is a relative of the existing `ore_subst()`,
  but differs in how it is vectorised. Unlike `ore_subst()`, it will replicate
  the source text if necessary to ensure that all specified replacements are
  used. The `es()` function now uses `ore_repl()`, and so can produce output
  vectors longer than its input.
- `ore_subst()` gains a "start" argument, like `ore_search()`, and now accepts
  multiple replacements, which will be applied in sequence to different
  matches.
- `ore_match()` is a new alias of `ore_search()`.
- Substitution group references can now include "\\0" for the whole match
  string, and allow for group numbers higher than 9.
  Group numbers that are out of range should now produce an error, rather than
  potentially leading to a segfault. Named group references should also be more
  robust.
- A multilingual, multiscript dataset, "glass", is now included with the
  package thanks to Frank da Cruz and many contributors.
- Names and `NA` values are now propagated from text arguments that are
  character vectors.
- There is a new option (`ore.keepNA`) to propagate `NA`s in `ore_ismatch()`,
  and the infix functions `%~%` and `%~~%`, rather than convert them implicitly
  to `FALSE`. This is off by default for backwards compatibility.
- Printing has been improved, and in particular "orematches" objects now show
  (just) the elements that matched.
- Underscore-separated names are now used in preference in the package
  documentation, following the trend in other packages, but the
  period-separated versions are still available.
- Detection of the platform's native encoding has been improved, and this will
  be asserted in regular expressions created by `ore()`, rather than them being
  marked as being in "unknown" encoding by default as before. Handling of
  encodings has been refined elsewhere too.

===============================================================================

VERSION 1.6.3

- The ore.file() function now performs path expansion on its argument.
- The package is now more careful to check that "external" pointers are valid
  before dereferencing them, thereby avoiding potential segfaults. (Reported by
  Thomas Weise, #5).
- Multiarchitecture support is now explicitly declared, ensuring that 32-bit
  and 64-bit versions are built when installing from source on Windows.
  (Reported by Ioannis Mamalikidis, #6).

===============================================================================

VERSION 1.6.2

- Additional warnings from GCC 8 and rchk have been resolved.

===============================================================================

VERSION 1.6.1

- The Onigmo library has been updated to version 6.1.3.
- Various compiler and sanitiser warnings have been resolved.

===============================================================================

VERSION 1.6.0

- The Onigmo library has been updated to version 6.1.1.
- Files will now be searched incrementally, meaning that large files do not
  need to be read in their entirety if the match is near the beginning. This is
  disabled by default if all matches are requested.
- Additional arguments to ore.search() are now accessible through an ellipsis
  argument to ore.ismatch().
- The results of a substitution function are now recycled across matches, and
  can therefore be of any length.
- A handful of subtle garbage-collection heisenbugs have been corrected.
  (Reported by Tomas Kalibera.)

===============================================================================

VERSION 1.5.0

- The object returned by ore.search() now generally has the class "orematches".
  This class has its own indexing and print methods to avoid copious output and
  allow quicker access to multiple matches.
- The ore.search() and ore.split() functions now accept zero-length text
  vectors, returning an empty list rather than producing an error.
- The matches() and groups() functions now use the last match by default, if
  called with no argument.

===============================================================================

VERSION 1.4.0

- The new ore.escape() function can be used to escape characters which would
  usually have special significance in a regular expression. This can be
  helpful when incorporating arbitrary strings into a regex using ore().

===============================================================================

VERSION 1.3.0

- It is now possible to search directly in text files, using their native
  encoding, or in binary files, byte-by-byte. To do this, a call to the new
  ore.file() function should replace the usual text argument to ore.search().
  All of Onigmo's many encodings are supported for text files.
- A new infix operator, %~|%, can be used to filter vectors by whether they
  match a regular expression.
- The "matches" and "groups" methods for lists now drop nonmatching elements by
  default. This helps avoid many NULLs in results.
- The "print" method for "orematch" objects now works correctly with
  double-width characters.

===============================================================================

VERSION 1.2.2

- Zero-length matches could lead to segmentation faults. This has been
  corrected.
- Calls to ore.dict() which include other function calls are now handled
  properly. (Reported by Jon Olav Vik.)
- The es() function would previously only honour its rounding arguments if all
  matches were numeric. This has been corrected.

===============================================================================

VERSION 1.2.1

- The es() function would previously only replace the first embedded
  expression. That has been corrected.
- Attributes are now ignored when comparing "ore" objects in tests. This led to
  a failure on versions of R prior to 3.2.0, at least on Windows.

===============================================================================

VERSION 1.2.0

- Oniguruma's support for alternative regular expression syntaxes is now
  exposed by the ore() function. At present only "fixed" (i.e., literal)
  patterns are supported in addition to the default, which has always been
  "ruby".
- The ore.lastmatch() function gains a "simplify" argument.
- Underscore-separated function names are now available as aliases to their
  period-separated equivalents. For example, you can use ore_search() instead
  of ore.search(), if you prefer.
- The use of variables holding patterns could sometimes cause errors in ore().
  This has been corrected.

===============================================================================

VERSION 1.1.0

- The package now provides functionality for creating and working with a
  pattern dictionary, and a few predefined patterns are included.
  This allows common regular expressions to be easily named, stored, and
  integrated into larger expressions later. The key new function is ore.dict().
- The new es() function provides expression substitution, a convenient way to
  embed R expressions within strings, and a good example of the package's
  substitution functions in action.
- Printing of "orematch" objects is now much clearer, particularly for long or
  multiline search texts.
- Group matches are now passed to substitution functions, in an attribute.
- The "ore.colour" (or "ore.color") option can now be used to determine whether
  to print in colour, rather than using the crayon package (although the latter
  is still the default).

===============================================================================

VERSION 1.0.7

- Printing "orematch" objects no longer leads to errors. This was due to a
  mismatch in function names between the C code and the calling R code.

===============================================================================

VERSION 1.0.6

- All of the core functions have now switched to primarily C implementations,
  providing substantial performance gains. Minor changes in the handling of
  text encodings have occurred alongside this.
- The limit of 128 matches per string has been removed, due to more
  sophisticated memory management.
- The arguments to ore.subst() have been reordered slightly, to avoid
  inadvertent partial matching when using a substitution function.

===============================================================================

VERSION 1.0.5

- Calculation of match offsets within the C code has been made much more
  efficient.

===============================================================================

VERSION 1.0.4

- Default methods have been added for matches() and groups(), which return NA.
- The "rex" test is now skipped if that package is not available.
- Various low-level warnings/errors from UBSAN have been addressed.

===============================================================================

VERSION 1.0.3

- Almost all of the work of ore.search() is now done in C code rather than R,
  for performance. In testing, ore is now appreciably faster than base R for
  simple searches, when the regex is precompiled. Optimisation of other
  functions will follow in later releases.
- The package no longer requires any specified minimum version of R (although
  it has not been tested on old releases).

===============================================================================

VERSION 1.0.2

- A bug in the ore.subst() function that could produce low-level errors or
  segmentation faults has been fixed.

===============================================================================

VERSION 1.0.1

- Tests should no longer fail in locales which do not use a UTF-8 encoding.
- Documentation tweaks.

===============================================================================

VERSION 1.0.0

- First public release.

===============================================================================