This shows you the differences between two versions of the page.
— |
gnucap:user:istring [2018/09/24 07:42] (current) felixs created |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | == case sensitivity in maps == | ||
+ | gnucap provides a flag for case sensitivity that affects the parsing and | ||
+ | interpretation of identifiers. It is user controlled. The main use case is the | ||
+ | support for mostly obsolete but still widely used input languages such as | ||
+ | SPICE. the case sensitivity flag affects declaration and lookup. | ||
+ | |||
+ | In case insensitive mode, an identifier is cast to lower case when it is | ||
+ | entered, and when it is looked up in a dictionary. This can be seen when | ||
+ | switching to the SPICE language, which turns case sensitivity off. | ||
+ | |||
+ | gnucap> spice | ||
+ | gnucap-spice>.param a=1. | ||
+ | gnucap-spice>.param B=2. | ||
+ | gnucap-spice>.param | ||
+ | a= 1. b= 2. | ||
+ | |||
+ | internally, these names are stored in lower case, not case preserving, similar | ||
+ | to names in FAT12, but lower case. back to verilog mode, things look sensitive | ||
+ | again. | ||
+ | |||
+ | gnucap-spice>.verilog | ||
+ | gnucap-verilog>param C=2. | ||
+ | gnucap-verilog>param c=3. | ||
+ | gnucap-verilog>param b=4. | ||
+ | gnucap-verilog>param | ||
+ | .C( 2.), .a( 1.), .b( 4.), .c(3.) | ||
+ | |||
+ | in particular, the actual case is lost, and the parameter B is gone. | ||
+ | |||
+ | gnucap-verilog>eval C | ||
+ | C= 1. | ||
+ | gnucap-verilog>eval B | ||
+ | B=B | ||
+ | gnucap-verilog>eval b | ||
+ | b= 4. | ||
+ | |||
+ | similarly, when a parameter "X" is declared in case sensitive mode, it is | ||
+ | invisible in case insensitive mode. This can have surprising consequences, when | ||
+ | running simulations (e.g. parameter sweeps) in spice mode. | ||
+ | |||
+ | Newer case insensitive file systems (e.g. NTFS, VFAT, HFS+) are case | ||
+ | preserving, not without odd side effects. on NTFS, readme.txt and a Readme.txt | ||
+ | can coexist, according to wikipedia. Since file systems usually fix the | ||
+ | sensitivity mode, the side effects are limited. But the approaches do not work | ||
+ | very well in gnucap. | ||
+ | |||
+ | gnucap should be case preserving internally, but must implement a lookup that | ||
+ | depends on the current mode. | ||
+ | |||
+ | gnucap> options noinsensitive | ||
+ | gnucap> param A=1. | ||
+ | gnucap> options insensitive | ||
+ | gnucap> eval A | ||
+ | A= 1. | ||
+ | gnucap> eval a | ||
+ | a= 1. | ||
+ | gnucap> param A=2. | ||
+ | gnucap> eval A | ||
+ | A= 2. | ||
+ | gnucap> eval a | ||
+ | a= 2. | ||
+ | gnucap> param A=3. | ||
+ | gnucap> eval A | ||
+ | A= 3. | ||
+ | gnucap> eval a | ||
+ | a= 3. | ||
+ | |||
+ | With an approriate lookup, no mangling will be required in case insensitive | ||
+ | mode. But how to implement an appropriate lookup? Gnucap stores parameters in | ||
+ | an std::map<std::string, PARAMETER>. The problem with that is, if "AAAA" is a | ||
+ | key, a case insensitive lookup for "aaaa" will be tricky. The expectation is, | ||
+ | that "AAAA" will be found. however, Maps store keys in binary trees, and the | ||
+ | search is implemented as a bisection, for performance reasons. | ||
+ | In a linear search, we could simply cast the search string to upper, as well as | ||
+ | all the keys that we compare it to. In the map above, we'd have to do a lookup | ||
+ | for all strings that "aaaa" could possibly match. Sixteen in this case, but | ||
+ | much worse for longer strings. | ||
+ | |||
+ | The issue with the lookup boils down to the fact that the insensitive matches | ||
+ | could be anywhere in the map, which is sorted by the order induced by the ASCII | ||
+ | encoding, 1 < 2 < ... < 9 < A < B ... < Z < a < b < ... < z. The following | ||
+ | example illustrates the pattern, which easily scales to any size. | ||
+ | |||
+ | AAAA * | ||
+ | ABBB | ||
+ | AAaa * | ||
+ | AABB | ||
+ | Aaaa * | ||
+ | Bbbb | ||
+ | aaaa * | ||
+ | |||
+ | In contrast, if the keys were sorted insensitively first and blocks with | ||
+ | congruent keys sorted individually, case sensitive, the order would be | ||
+ | |||
+ | AAAA * | ||
+ | AAaa * | ||
+ | Aaaa * | ||
+ | aaaa * | ||
+ | AABB | ||
+ | ABBB | ||
+ | Bbbb. | ||
+ | |||
+ | Putting all possible matches to our query next to each other. In an std::map | ||
+ | sorted like this, lookups and inserts with either case (in)sensitivity mode | ||
+ | will work, while the mode may change in between. | ||
+ | |||
+ | - implement the string comparison operator | ||
+ | - encapsulate it in a custom string type | ||
+ | - provide case sensitive and insensitive lookup, by default these should use OPT::case_insensitive. | ||
+ | |||
+ | (this is implemented in the paramap-* branch) | ||
+ | |||
+ | == CARD_LIST == | ||
+ | |||
+ | With this kind of string in place, it will be possible to also implement an | ||
+ | efficient CARD_LIST::find, supporting both case sensitivities. CARD_LISTs | ||
+ | implement the lists of CARDs, which represent all user input, including the | ||
+ | order, for each individual scope. The CARDs could be models or components or | ||
+ | subcircuit declarations or just comments. | ||
+ | |||
+ | Unlike in PARAM_LISTs there can be many cards with the same key (short_label) | ||
+ | in one CARD_LIST, which is implemented as a linked list. The lookup is | ||
+ | implemented as a linear search, and hence slow. And if there was no case | ||
+ | sensitivity flag, this could be easily fixed. an std::multimap<string, | ||
+ | list_iterator> or std::multiset<list_iterator> right next to the std::list | ||
+ | could be used to lookup the position. | ||
+ | |||
+ | So the list needs to stay the same, and instead of the multiset, we need a | ||
+ | custom multiset that works transparently through case sensitivity changes. | ||
+ | One way of doing this, is to nest sets. One set, with elements sorted case | ||
+ | insensitively holds (at most) one multiset for each string class, a "bucket". | ||
+ | a bucket may then hold arbitrarily many CARDs, but in case sensitive ordering. | ||
+ | |||
+ | For example, if there are cards A A a B b b c, they will end up in three | ||
+ | buckets. | ||
+ | |||
+ | A - A | ||
+ | - A | ||
+ | - a | ||
+ | B - B | ||
+ | - b | ||
+ | - b | ||
+ | c - c | ||
+ | |||
+ | A case insensitive lookup for b will then try and find it in the B bucket, then | ||
+ | pick any of the cards there. similarly, a case sensitive lookup will pick one | ||
+ | of the lower case cards. since std::multiset preserves the insertion order, it | ||
+ | will be possible to fine tune the lookup. | ||
+ | |||
+ | (this is implemented in the multimap-* branch) |