[stringtemplate-interest] String manipulations
Terence Parr
parrt at cs.usfca.edu
Thu Sep 28 11:22:23 PDT 2006
On Sep 27, 2006, at 9:39 PM, John Snyders wrote:
> Sorry the response is so long. I am new to ST and also need to
> think things
> through in writing.
Hi John, this is an awesome summary of the situation...hope you don't
mind if I CC the list.
> First let's see if I understand the situation. The general problem
> is how
> does a template author control the formatting of an attribute in a
> constrained way. There are reasonable motivations for the template
> author to
> do this. My example was formatting a MAC address with dash or colon
> separators depending on the needs of a VoIP device config file
> template.
> Another example is when dates need to be formatted in either full
> or short
> form in the same template. Control over case is another example.
> Another is
> JavaScript escaping.
That is precisely the need and issue...flexibility w/o opening it all
up.
> The things to consider are the syntax used to invoke the specific
> formatting, the implementation and impact on the controller/model
> and how it
> affects the model view separation.
Yup!
> I'll start by looking at the different syntax used in the
> template. In the
> following I'll use the attribute "name" and the special formatting
> operation
> upperCase but it could be any attribute/property/map and any
> formatting.
> $name$ - this is the no special format case. The following are
> different
> ways to get an uppercase version of name.
>
> 1) $upperCaseName$ - this is where the controller adds a specific
> attribute
> with a specific name that indicates the format. The disadvantage of
> this is
> the need to add extra attributes. It just doesn't scale well (n
> attributes
> times m formats).
correct. ugly, but works in some cases.
> 2) $name.upperCase$ - this is where a pseudo property is used to
> affect the
> format of an attribute. It could be implemented a number of ways. A
> wrapper
> class around name could add a getUpperCase property method. This
> wrapper
> could be used directly by the model (in which case it may not be
> thought of
> as a wrapper) or just added manually or automatically when the
> attribute is
> added to the ST. Having to override StringTemplate to get the
> automatic
> wrapping along with the overhead of an extra wrapper class is the
> motivation
> for your singleton renderer.
Correct.
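For concreteness, here is a minimal sketch of the wrapper idea in option 2. The class name FormattedString is hypothetical and not part of ST; the only assumption about ST is that $name.upperCase$ resolves to a getUpperCase() getter, as John describes.

```java
// Hypothetical wrapper (not part of ST): exposes a formatted variant of a
// String as a bean-style property.
// For reflective lookup from another package the class would have to be public.
final class FormattedString {
    private final String value;

    public FormattedString(String value) {
        this.value = value;
    }

    // Intended to be reached from a template as $name.upperCase$.
    public String getUpperCase() {
        return value.toUpperCase();
    }

    // Plain $name$ renders through toString() and stays unformatted.
    @Override
    public String toString() {
        return value;
    }
}
```

The controller would push new FormattedString("john") as the "name" attribute instead of the raw String, which is exactly the extra wrapping overhead mentioned above.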
> 3) $name:upperCase()$ uses a template (upperCase) with a custom
> renderer
> assigned to it to uppercase its "it" argument. An alternative form is
> $upperCase(it=name)$.
or
$upperCase(name)$ when formal args are defined for upperCase().
> 4) $name; format="uppercase"$ is a proposed syntax similar
> to ;separator.
>
> Option 4 most clearly represents the underlying intention. Both 2
> and 3
> would be a surprise to anyone who knows the basics of ST.
Correct, though name:upperCase() does seem to follow the functional
style of "apply template".
> Option 2 looks like a property reference or map lookup. One would
> not expect
> it to have any impact on rendering.
Perhaps, but stuff like $birthday.longFormat$ is not too far of a
stretch I'd say.
> The only thing that gives any hint is
> that the property reads as a verb.
toUpperCase might be better I guess for the verb.
> Most properties tend to be nouns. Of less
> concern is that these pseudo properties pollute the namespace. You
> couldn't
> have a real property and a formatting property with the same name.
True.
> Option 3 looks like any normal template invocation over a
> collection. Again
> one would not expect a template to have an effect on rendering.
Well, I'd say you're right, but rendering and apply-template are very
similar in concept, just not implementation.
Option 3 has the problem of "caching/reloading from disk"; the
renderers are not set automatically for the upperCase template. If the
template is thrown out and reloaded by the ST group, then the renderer
is lost.
> I haven't
> used ST enough but I think template names would tend to be nouns so
> again
> the fact that the template name reads as a verb could indicate that it
> affects rendering but this is a very weak argument for this syntax being
> understandable.
Well, you can say that the $x:y()$ means format x as a y like
$m:method()$. It's not a huge stretch in my view. Better than $x.y$
as you point out about the properties. The "()" denotes an action
sort of.
> Option 1 isn't too bad. It uses upperCase as an adjective to modify
> the noun
> "name" and it is part of the specification of the set of attributes
> so its
> meaning can be understood. Although it has the same effect (an
> uppercase
> version of name being output) it is not a renderer.
>
> So from a syntax perspective I like option 4 the best. The concern
> with
> option 4 is that it could be used to call a model method with a string
> argument. This is the slippery slope argument. At first I agreed
> with this
> argument against option 4 but now I'm not sure. I am compelled by the
> expressiveness of it.
Yes, but I cannot open that hole I don't think. Allowing you to call
a random method seems way too open.
> It is probably not a good argument (we already slipped so why not
> slip some
> more) but I noticed that I can already call a method with a string.
Correct, but it has to be only toString() not a random method. Abuse
is clear when you see it. If you alter toString() to wipe the hard
drive, I cannot prevent this hole. At some level, 32 bits of binary
data have to be converted by toString() into a string of digits 0..9
for integers, for example.
> With this renderer I can do some simple math in the template
> public static class BadRenderer implements AttributeRenderer
> {
>     // Parses "op,a,b" and evaluates it: logic that does not belong in a renderer.
>     public String toString(Object o)
>     {
>         if (o instanceof String)
>         {
>             String s = (String)o;
>             String[] args = s.split(",");
>             if (args.length >= 3)
>             {
>                 int a = Integer.parseInt(args[1]);
>                 int b = Integer.parseInt(args[2]);
>                 int c = 0;
>                 if (args[0].equals("+"))
>                 {
>                     c = a + b;
>                 }
>                 else if (args[0].equals("-"))
>                 {
>                     c = a - b;
>                 }
>                 return String.valueOf(c);
>             }
>             return "bad input";
>         }
>         return o.toString();
>     }
> }
> .
> StringTemplate bad = builtinTemplates.defineTemplate("bad", "$it$");
> bad.registerRenderer(String.class, new BadRenderer());
>
> Now in my template I have
> 3+5 is $bad(it="+,3,5")$
> and it prints out
> 3+5 is 8
>
> I don't know if this indicates that option 4 isn't that bad or if per
> StringTemplate renderers should not be allowed. One could argue that
> property references should not be allowed because they can have side
> effects.
They must not have side-effects. If you alter a property like
$user.name$ to update the database, that is something I cannot
prevent. Again, abuse is clear when getName() wipes the hard drive.
> That would just leave scalar attributes and maps, and public object
> fields. But wait I can implement the same functionality as above
> with a map.
> Just create a class that implements Map and the get method can do
> anything
> with the key string. Example of use: $badmap.({+,3,5})$. You were
> probably
> aware of this. The old documentation says: "You may pass in
> instances of
> type HashMap and Hashtable but cannot pass in objects implementing
> the Map
> because that would allow all sorts of wacky stuff like database
> access." But
> now you can pass in Map. What changed your mind?
There is a case where you want to have Map access strings for i18n in
a database. The potential for abuse is there as you've shown with
badmap above. My philosophy is to make bad behavior as inconvenient
as possible and to clearly highlight it (getName() wiping drive)
while still allowing flexibility and being a practical system.
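To make the badmap case concrete, here is a rough sketch of a Map whose get() computes instead of looking anything up. The class name BadMap and the "op,a,b" key format follow John's example; the implementation itself is an assumption, and containsKey() is overridden only in case a caller checks for the key before calling get().

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Illustrative only: a Map that stores nothing and does arithmetic on its key.
final class BadMap extends AbstractMap<String, String> {
    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        return Collections.emptySet(); // no real entries
    }

    @Override
    public boolean containsKey(Object key) {
        return true; // pretend every key exists
    }

    @Override
    public String get(Object key) {
        String[] args = String.valueOf(key).split(",");
        if (args.length != 3) {
            return "bad input";
        }
        int a = Integer.parseInt(args[1].trim());
        int b = Integer.parseInt(args[2].trim());
        switch (args[0].trim()) {
            case "+": return String.valueOf(a + b);
            case "-": return String.valueOf(a - b);
            default:  return "bad input";
        }
    }
}
```

Nothing in the Map interface stops this, which is why the abuse has to be made obvious rather than impossible.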
> Here is another distinction between options 2, 3, and 4. With
> option 2 there
> is no way that the rendering can be applied to anything but the
> value of the
> attribute. With 3 the rendering done by the template upperCase can
> apply to
> attributes, string literals, and templates.
Very true. Hadn't thought of that.
> When applied to templates only
> attributes referenced within that template get uppercased. Example
> from my
> previous email (correcting the typo):
> $upperCase(it={$message(p0=name)$})$ produces "Hello JOHN!", not
> "HELLO JOHN!"
If args are defined you can say:
$upperCase({$message(name)$})$
Hmm...seems that the whole message should be uppercased, but you're
right...it only acts on the rendering of attributes, not string
literals. So option 3 seems not so good.
> With option 4 I think the expected behavior is that the formatting
> would
> apply to the whole template. The reason is that the ;separator syntax
> applies to templates as in:
> $requestParameters.(k) : {[$it$]}; separator=", "$ which produces a
> comma
> separated list of values enclosed in square brackets (ex: [a], [b],
> [c]).
> So I would expect ${ hello $name$!};format="upperCase"$ to produce
> "HELLO
> JOHN!"
Correct.
> Another problem with option 4 is that the template makes
> assumptions about
> the types of attributes. For example if format short applies to
> dates then
> $name;format="short"$ makes no sense. This breaks your rule 4.
What if format="foo" didn't call foo() on the object? What if it
called format(object,"foo") with "foo" as the format string? If
format is defined, cool else it's ignored.
Actually doesn't this present the renderer problem again? String
would have to have a renderer defined...oh, but you could do that
globally with a singleton renderer registered for String. If you
ever anywhere said $somestring; format="abbrev"$, the
stringRenderer.format(somestring, "abbrev") would be called.
Hmm...opens things up a bit, but I guess if format wipes the drive
it's clear you're abusing the format method.
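A minimal sketch of that idea, assuming a per-type registry and a renderer method taking the object plus the format string. FormatRenderer, FormatRegistry, and StringFormats are hypothetical names for illustration, and the "abbrev" truncation rule is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: the renderer receives the value and the format name.
interface FormatRenderer {
    String format(Object value, String formatName);
}

// Hypothetical registry: one globally registered renderer per attribute class.
final class FormatRegistry {
    private final Map<Class<?>, FormatRenderer> renderers = new HashMap<>();

    public void register(Class<?> type, FormatRenderer renderer) {
        renderers.put(type, renderer);
    }

    // Roughly what $value; format="formatName"$ could reduce to.
    public String render(Object value, String formatName) {
        FormatRenderer r = renderers.get(value.getClass());
        if (r == null) {
            return value.toString(); // no renderer for this type: format is ignored
        }
        return r.format(value, formatName);
    }
}

// Example renderer for Strings; the format names are just strings it understands.
final class StringFormats implements FormatRenderer {
    @Override
    public String format(Object value, String formatName) {
        String s = (String) value;
        if ("upperCase".equals(formatName)) {
            return s.toUpperCase();
        }
        if ("abbrev".equals(formatName)) {
            // Invented rule for illustration: keep the first three characters.
            return s.length() > 3 ? s.substring(0, 3) + "." : s;
        }
        return s; // unknown format name: fall back to the plain value
    }
}
```

With a StringFormats instance registered for String.class, render("john", "upperCase") gives "JOHN", while an unknown format or an unregistered type falls through to plain toString().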
> Finally back to the original question. I'm not sure I fully
> understand your
> example:
> "$name.toUpper$ would be converted to
> r = renderer.get("String");
> r.toUpper(name);
> So, in the controller you register String->SeansHTMLRendererThingie
> and then call $name.seansMethodForManipulatingStringsInACoolWay$."
>
> It seems to me that the main difference is that this way uses the
> data type
> "String" to lookup a renderer and then reflection is used to find a
> format
> method such as "toUpper" on that renderer and calls it. Where
> presumably
> option 4 would be implemented by passing the object to render and
> the format
> string to a renderer.
Yes.
> If this is the difference then YES I think it is the right solution
> for the
> implementation.
Meaning the name.toUpper or name; format="toUpper"?
> It constrains the possible set of renderers to just what the
> renderer class provides and makes it very clear what should happen
> when
> there is a type mismatch. (renderer.get("String") should return
> null if it
> doesn't know what to do with strings and then the next step would be
> skipped.)
>
> However I think the implementation is independent of the syntax
> used. Why
> couldn't option 4 ($name;format="upperCase"$) be translated to
> r = renderer.get("String");
> r.upperCase(name);
Interesting...ok, we don't want $x; format="y"$ to call x.y(), but we
can allow it to call $xClassRenderer.y(x)$. That is better...
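The reflective variant could look something like the sketch below: the renderer is found by the attribute's class, and the format name selects a method on the renderer, never on the model object. ReflectiveFormatter and StringMethods are hypothetical names, and falling back to toString() when no method matches is one possible policy, not a settled one.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical renderer class: its public methods are the only formats allowed.
final class StringMethods {
    public String upperCase(String s) { return s.toUpperCase(); }
    public String lowerCase(String s) { return s.toLowerCase(); }
}

final class ReflectiveFormatter {
    private final Map<Class<?>, Object> renderers = new HashMap<>();

    public void register(Class<?> type, Object renderer) {
        renderers.put(type, renderer);
    }

    // $x; format="y"$ never calls x.y(); it calls renderer.y(x).
    public String render(Object x, String format) {
        Object renderer = renderers.get(x.getClass());
        if (renderer == null) {
            return x.toString(); // no renderer registered for this type
        }
        try {
            // Look for a public method named after the format, taking x's type.
            Method m = renderer.getClass().getMethod(format, x.getClass());
            return String.valueOf(m.invoke(renderer, x));
        } catch (ReflectiveOperationException e) {
            return x.toString(); // no such format method: render unformatted
        }
    }
}
```

The set of callable methods is bounded by what the renderer class declares, so the template cannot reach arbitrary methods on the model.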
> The same amount of information is available to the implementation
> in both
> syntaxes.
I have also just increased use of options such as wrap so this is not
a new "concept".
> To sum up I would like to see option 4 supported and implemented as
> you
> described. The formatting would apply to (the result of) templates
> to be
> consistent with the ;separator.
Yes, so I'd evaluate the entire expression, even for something like
$names:{$i$. <b>$it$</b>}; format="upperCase"$
Here, the entire list of crap would be uppercased including the $i$
numbers. Right?
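In plain Java terms, the intended order of operations is roughly the following: render the whole expression to a string first, then apply the format to that result. This is a hand-written imitation of the template above rather than ST code, and the class and method names are made up.

```java
import java.util.List;

final class WholeExpressionFormat {
    // Imitates $names:{$i$. <b>$it$</b>}$, where $i$ is the 1-based index.
    public static String applyTemplate(List<String> names) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < names.size(); i++) {
            out.append(i + 1).append(". <b>").append(names.get(i)).append("</b>");
        }
        return out.toString();
    }

    // The format option is applied to the fully rendered text, literals included.
    public static String renderUpperCased(List<String> names) {
        return applyTemplate(names).toUpperCase();
    }
}
```

For the names "john" and "terence" this yields "1. <B>JOHN</B>2. <B>TERENCE</B>": the index digits pass through unchanged, but the literal markup is uppercased along with the attribute values.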
> If there is a type mismatch (such as
> $today;format="upperCase"$ where upperCase works on Strings and
> today is a
> Date) then the empty string is returned (or perhaps an exception
> would be
> better). Using a format string that doesn't exist as a method could
> also be
> defined to return an empty string or throw an exception (not sure
> which is
> best).
Hmm...I think perhaps that should reduce to simply $today$ if there
is no formatter...perhaps that allows you to remove a renderer w/o
breaking stuff?
> This handling of type mismatches and unsupported formats with option 4
> syntax is preferred over the option 2 syntax. Option 2 would end up
> potentially hiding properties. The person creating properties and
> the person
> defining renderers would have to coordinate because they are
> sharing the
> same namespace. Option 4 is a little more efficient since you know
> you are
> done if you don't find a format method you don't have to go looking
> for a
> property.
I think we'll need a severity/pedantic option for ST soon so you can
say what becomes an exception and what is ignored.
>
> Thanks for asking,
Thanks for your excellent analysis and suggestions!
Terence