Follow-up post explaining research rationale
Joe Duarte via Digitalmars-d
digitalmars-d at puremagic.com
Mon May 9 12:09:35 PDT 2016
Hi all,
As I mentioned on the other thread where I asked about D syntax,
I'm a social scientist about to launch some studies of the
effects of PL syntax on learnability, motivation to pursue
programming, and differential gender effects on these factors.
This is a long post – some of you wanted to know more about my
research goals and rationale, and I also said I would post
separately on the gender issue, so here we go...
As you know, women are starkly underrepresented in software
engineering roles. I'm interested in zooming back to the
decisions people are making when they're 16 or 19 re: programming
as a career. I'm interested in people's *first encounters* with
programming, in high school or college, how men and women might
differentially assess programming as a career option, and why.
Let me note a few things: Someone on the other thread thought
that my hypothesis was that women don't become programmers
because of the semicolons and curly braces in PL syntax. That's
not one of my hypotheses. I do think PL syntax is a large
problem, and I have some hypotheses about how it
disproportionately deters qualified women, but the issues I see
go much deeper than what I've called the "punctuation noise" of
semicolons and curly braces. (I definitely don't have any
hypotheses about female perceptions of the aesthetics of curly
braces, which some posters had inferred – none of this is about
female aesthetic preferences.)
Also, I don't think D is particularly problematic – it has
cleaner and clearer syntax than its contemporaries (well, we'll
need careful research to know if it truly is clearer to a
targeted population). I plan to use D as a presumptive *clearer
syntax* condition in some studies – we'll see how it goes.
Lastly, I'm not approaching the gender issue from an ideological
or PC Principal perspective. My work will focus mostly on
cognitive science and pedagogical factors – as you'll see below,
I'm interested in diversity issues from lots of angles, but I
don't subscribe to the diversity ideology that is fashionable in
American academia.
One D-specific question I do have: Have any women ever posted
here? I scoured a bunch of threads here recently and couldn't
find a female poster. By this I mean a poster whose supplied name
was female, where a proper name was supplied (some people just
have usernames). Of course we don't really know who is posting,
and there could be some George Eliot situations, but the
presence/absence of self-identified women is useful enough. Women
are underrepresented in programming, but the skew in online
programming communities is even more extreme – we're seeing
near-zero percent in lots of boards. This is not a D-specific
problem. Does anyone know of occasions where women posted here?
Links?
Getting back to the research, recent studies have argued that one
reason women are underrepresented in certain STEM fields is that
smart women have more options than smart men. So think of the
right tail of the bell curve, the men and women in that region on
the relevant aptitudes for STEM fields. There's some evidence
that smart women have a broader set of skills -- *on average* --
than equivalently smart men, perhaps including better social
skills (or more interest in social interaction). This probably
fits with stereotypes and intuitions a lot of people already held
(lots of stereotypes are accurate, as probability distributions
and so forth).
I'm interested in monocultures and diversity issues in a number
of domains. I've done some recent work on the lack of
philosophical and political diversity in social science,
particularly in social psychology, and how this has undermined
the quality and validity of our research (here's a recent paper
by me and my colleagues in Behavioral and Brain Sciences:
http://dx.doi.org/10.1017/S0140525X14000430). My interest in the
lack of gender diversity in programming is an entirely different
research area, but there isn't much rigorous social science and
cognitive psychology research on this topic, which surprised me.
I think it's an important and interesting issue. I also think a
lot of the diversity efforts that are salient in tech right now
are acting far too late in the cycle, sort of just waiting for
women and minorities to show up. The skew starts long before
people graduate with a CS degree, and I think Google, Microsoft,
Apple, Facebook, et al. should think deeply about how programming
language design might be contributing to these effects
(especially before they roll out any more C-like programming
languages).
Informally, I think what's happening in many cases is that when
smart women are exposed to programming, it looks ridiculous and
they think something like "Screw this – I'm going to med school",
or any of a thousand permutations of that sentiment.
Mainstream PL syntax is extremely unintuitive and poorly designed
by known pedagogical, epistemological, and communicative science
standards. The vast majority of people who are introduced to
programming do not pursue it (likely true of many fields, but
programming may retain a smaller share than most – this point
requires a lot more context). I'm open to the possibility that
the need to master the bizarre syntax of incumbent programming
languages might serve as a useful filter for qualities valuable
in a programmer, but I'm not sure how good or precise the filter
is.
Let me give you a sense of the sorts of issues I'm thinking of.
Here is a C sample from ProgrammingSimplified.com. It finds the
frequency of characters in a string:
    int main()
    {
        char string[100];
        int c = 0, count[26] = {0};

        printf("Enter a string\n");
        gets(string);

        while (string[c] != '\0')
        {
            /** Considering characters from 'a' to 'z' only
                and ignoring others */
            if (string[c] >= 'a' && string[c] <= 'z')
                count[string[c]-'a']++;
            c++;
        }

        for (c = 0; c < 26; c++)
        {
            /** Printing only those characters
                whose count is at least 1 */
            if (count[c] != 0)
                printf("%c occurs %d times in the entered string.\n",c+'a',count[c]);
        }

        return 0;
    }
There's a lot going on here from a learning, cognitive science
and linguistic encoding standpoint.
1. There's no clear distinction between types and names. It's
just plain text run-on phrases like "char string". string is an
unfortunate name here, and reminds us that this would be a type
in many modern languages, but my point here is that there's
nothing to visually distinguish types from names. I would make
types parenthetical or use a hashtag, so: MyString (char) or
MyString #char (and definitely with types at the end of the
declaration, with names and values up front and uninterrupted by
type names – I'll be testing my hunches here).
2. There's some stuff about an integer c that equals 0, then
something called count – it's not clear if this is a type or a
name, since it's all by itself and doesn't follow the pattern we
saw with int main and char string. It also seems to equal zero.
Things that equal zero are strange in this context, and we often
see bizarre x = 0 statements in programming when we don't mean it
to actually equal zero, or not for long, but PL syntax usually
doesn't include an explicit concept of a *starting value*, even
though that's what it often is. We see this further down in the
for loop.
3. The word *print* is being used to mean display on the screen.
That's odd. Actually, the non-word printf is being used. We'd
probably want to just say: display "Enter a string"
4. We switch the person or voice from an imperative "do this" as
in printf, to some sort of narrator third-person voice with
"gets". Who are we talking to? Who are we talking about? Who is
getting? The alignment is the same as printf, and there's not an
apparent actor or procedure that we would be referring to.
(Relatedly, the third-person puts command that is so common in
Ruby always makes me think of Silence of the Lambs – "It puts the
lotion on its skin"... Or more recently, the third-person style
of the Faceless Men, "a girl has no name", etc.)
5. Punctuation characters that already have strong semantics in
English are used in ways that are inconsistent with and unrelated
to those semantics. For example, exclamation marks are jarring next to an
equals sign, and it's not clear why such syntax is desirable.
Same for percentage signs used to insert variables, rather than
expressing a percentage. (I predict that the curly brace style of
variable insertion in some HTML templating languages will be more
intuitive for learners – they isolate the insertion, and don't
have any conflicting semantics in normal English.)
I realize that some of this sprouted from the need to overload
English punctuation in the ASCII-constrained computing world of
the 1970s. The historical rationales for PL syntax decisions
don't bear much on my research questions on learnability and the
cognitive models people form when programming.
6. There are a bunch of semicolons and curly braces, and it's not
clear why they're needed. Compilation will fail or the program
will be broken if any of these characters are missing.
7. There are many other things going on here, lots of
observations one could make from pedagogical, logical
representation, and engineering standpoints.
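To make point 6 concrete, here is a minimal sketch of my own (it is not from ProgrammingSimplified.com, and the function names are hypothetical). It assumes a C99 or later compiler. The two functions differ only in whether the loop body has curly braces; both compile, but they do different things, which is the "program will be broken" case rather than the "compilation will fail" case.

```c
/* Hypothetical helper: sums the array and counts the elements visited.
   The loop body is wrapped in braces, so both statements repeat. */
int sum_braced(const int *values, int len, int *visited)
{
    int sum = 0;
    *visited = 0;
    for (int i = 0; i < len; i++)
    {
        sum += values[i];
        (*visited)++;
    }
    return sum;
}

/* The same text with the braces deleted. It still compiles, but only
   the first statement belongs to the loop; the increment runs exactly
   once, after the loop has finished. The indentation is misleading on
   purpose, and some compilers will warn about it. */
int sum_unbraced(const int *values, int len, int *visited)
{
    int sum = 0;
    *visited = 0;
    for (int i = 0; i < len; i++)
        sum += values[i];
        (*visited)++;
    return sum;
}
```

Called on a three-element array, both functions return the same sum, but the braced version reports three elements visited and the unbraced version reports one. Nothing on the page tells a beginner why.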
Now, there are some reasonable hypotheses having to do with
programming/tech culture and its effects on gender diversity. I
think some of those can intertwine with PL design issues. I also
think there might be an issue with the quality and appeal
of today's computing platforms, and the perceived power of
computers to do amazing and interesting things. I don't think the
platforms people are introduced to in CS education are very good
at generating excitement about what computers can do. It would be
interesting to gauge what sorts of things people think they might
be able to create, what sorts of problems they think they could
solve, or new interfaces they could implement, after their
introduction to programming. What horizons do they see? For
example, there used to be a lot of excitement about what
computers could do for education. Those visions have not
materialized, and it's not clear that computing is doing anything
non-trivial in education for reasoning ability, unlocking math
aptitude, writing creativity, etc. It might actually be a net
harm, with its effects on attention spans and language
development, though this will be very complicated to assess.
Mobile has reinvigorated some idealism and creativity about
computing. But the platforms people are introduced to or forced
to use when learning programming are not mobile platforms, since
you can't build complex applications on the devices themselves.
Unix and Linux are extremely popular in CS, but are terrible
examples for blue sky thinking about computing. Forcing people to
learn Vim or Emacs, grep, and poorly designed command line
interfaces that dump a bunch of unformatted text at you is a
disastrous decision from a pedagogical standpoint. (See the
BlueJ project for an effort to do something about this.) These tools do
nothing to illustrate what new and exciting things you could
build with computers, and they seem to mold students into a
rigid, conformist *nix, git, and markdown monoculture where
computing is reduced to bizarre manipulations of ASCII text on a
black 1980s DOS-like screen, and constantly fiddling with and
repairing one's operating system just to be able to continue to
work on this DOS-like screen (Unix/Linux requires a lot of
maintenance and troubleshooting overhead, especially for
beginners – if they also have to do this while learning
programming, then programming itself could be associated with a
life of neverending, maddening hassles and frustrations). The
debugging experience on Unix/Linux will be painful. From a
pedagogical standpoint, this situation looks like a doomsday
scenario, the worst CS education approach we could devise.
The nuisance/hassle overhead of programming is probably worth a
few studies in conjunction with my studies on syntax, and I'd
guess the issues are related – the chance of success in
programming, in getting a simple program to just work, is pretty
low. It's not clear that it *needs* to be so low, and I want to
isolate any platform/toolchain factors from any PL syntax
factors. (The factors may not exist – I could be wrong across the
board.)
That's all I've got for now. This isn't as well-organized as I'd
like, but I wanted to get something out now or I'd likely let it
slip for weeks.