
nightpool:

nostalgebraist:

Can someone who knows about OOP give me some cultural context for something like Sandi Metz’s Four Rules?

1. Your class can be no longer than 100 lines of code.
2. Your methods can be no longer than five lines of code.
3. You can pass no more than four parameters and you can’t just make it one big hash.
4. When a call comes into your Rails controller, you can only instantiate one object to do whatever it is that needs to be done. And your view can only know about one instance variable.

I’ve never used Ruby at all, and I imagine that the people who do this stuff are doing very different kinds of programming than what I do.  But this sounds hellish, mostly because of rule #2.  The things I want my program to do are already complicated.  The last thing I want to do is force myself to add additional complexity by artificially dividing the task into many tiny pieces spread over many disparate bits of code, each of which will have its own name whose meaning I am guaranteed to forget later.

(I understand that rules like these are meant to build good habits and should not be literally applied all the time.  I just don’t understand how #2 is a good habit.)

hey! I like oop! i took a three-day class from sandi metz once! I am probably at least somewhat qualified to talk about this topic, which is frankly a first for me on tumblr.

The first thing to understand is that these ideas are coming from the perspective of backend web development—specifically, Rails development. Rails has very little boilerplate, and Ruby itself is a very expressive language. Some things that would probably take me 15 lines to write in Java only take me 3 in Ruby, for example. So if you’re working in a different language, inflate those lines by at least 2x. And specifically the cultural context here (as other people have mentioned) is about small team entry-level app development—they’re meant for people who don’t feel very confident writing OO code yet, and keeping a moderately sized codebase clean.

That said, here’s the core idea behind OOP: Code should be readable, composable, and maintainable. We think objects are a good way to achieve those goals.

The first rule is motivated by the concepts of Single Responsibility and Interface Segregation. These say that your class should only be responsible for one thing, and that no consumer of that class should be forced to depend on methods it doesn’t need. The rule of thumb here is that you should keep your classes (and thus the interfaces those classes expose) very small, to increase the flexibility of the class, to keep code that interfaces with it from becoming brittle, and to increase your ability to reuse and compose the code you write.
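As a toy sketch of what that separation buys you (the class and method names here are my own invention, not Metz’s): a consumer that only needs formatting depends on a class that only formats, and never has to know anything about persistence.

```python
# A hypothetical sketch of Single Responsibility / Interface Segregation.
# ReportPrinter knows only how to format; ReportStore knows only how to
# persist. A consumer that just needs formatting never touches storage.
class ReportPrinter:
    def render(self, report):
        return "\n".join(f"{key}: {value}" for key, value in report.items())

class ReportStore:
    def __init__(self):
        self._saved = []

    def save(self, report):
        self._saved.append(dict(report))

    def count(self):
        return len(self._saved)
```

Because each class is tiny, swapping out or reusing either one doesn’t disturb consumers of the other.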

The third rule is also about interface segregation. Specifically, it’s about the concept of coupling. If you have a method with lots and lots of parameters, it’s probably due to one of two things: either that method is so highly coupled to another place in your code that no one else is ever going to reasonably be able to consume it—you need 4 of this object, 6 of the other, the moon at three-quarters gibbous, that sort of thing. Or it’s a sign that some of your variables are so closely related—an x and a y coordinate, for example—that you’re almost never going to pass someone one without passing the other, and they should be encapsulated into another object.
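For the x-and-y case, a sketch of what that encapsulation might look like (all names here are mine, purely for illustration):

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

# Before: the tightly coupled pair travels as loose parameters everywhere.
def distance_loose(x1, y1, x2, y2):
    return ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5

# After: callers pass the pair as a single value, and the signature shrinks.
def distance(a, b):
    return ((b.x - a.x) ** 2 + (b.y - a.y) ** 2) ** 0.5
```

Nothing about the computation changed; the parameter list just stopped leaking the fact that x and y always travel together.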

The fourth rule is pretty specific to model view controller code so i’m just gonna pass over it for now (unless people really want to hear about it?)

The second rule, undoubtedly, is the most controversial. Five lines?? Who can get anything done in five lines? Well, this goes back to the idea that your code should be readable and composable. Instead of writing one long function to encompass the work you want to do, you should break that work up into a mix of objects and behaviors that you can compose into the method that you want.

Here’s an example from some code I just wrote. It parses a HTML page to extract a list of peer-review assignments for students:


# Assumes `page` is a BeautifulSoup document, `reviews` and `names`
# are dicts, and `reviews_for` is a collections.defaultdict(list).
for student in page.select(".student_reviews"):
    user_id = student.find(class_="student_review_id").text
    assigned_ids = student.select(".peer_review .user_id")
    reviews[user_id] = assigned_ids[0].text if assigned_ids else None
    for i in assigned_ids:
        reviews_for[i.text].append(user_id)
    name = student.find(class_="assessor_name").text.split(", ")
    names[user_id] = " ".join([name[-1]] + name[:-1])

This is working code, but it’s a complete pain to read, understand, and reason about. Let’s try to sketch out what a more object oriented version might look like:


class Section(object):
    def __init__(self, page):
        self.page = page

    @classmethod
    def from_page(cls, page):
        return cls(page)

    def student_dom_objects(self):
        return self.page.select(".student_reviews")

    def students(self):
        return [Student.from_dom_object(e) for e in self.student_dom_objects()]

    def reviews(self):
        return {s.id(): s.reviews() for s in self.students()}

    def reviews_for(self, student):
        return [s for s in self.students() if student.id() in s.reviews()]

class Student(object):
    def __init__(self, dom_obj):
        self.dom_obj = dom_obj

    @classmethod
    def from_dom_object(cls, dom_obj):
        return cls(dom_obj)

    def id(self):
        return self.dom_obj.find(class_="student_review_id").text

    def reviews(self):
        return [i.text for i in self.dom_obj.select(".peer_review .user_id")]

    def name_given(self):
        return self.dom_obj.find(class_="assessor_name").text

    def name_formatted(self):
        last, first = self.name_given().split(", ")
        return " ".join((first, last))

There’s a lot that happened here, but the most obvious change is that all of our actual imperative code got removed—I elided the Section.from_page(page) call that kicks things off—and now everything is declarative instead of imperative. One benefit is that all of our methods instantly become shorter. And, because we’re composing methods, it becomes really easy to maintain—there’s only one place to change any given thing, and it’s obvious where that place might be. And now we have a clean, easy-to-use interface that’s easy to extend.

Obviously, there’s more to it than that, but that’s basically my version of the “Intro to OO” pitch. Does that clear things up a little?

Thanks, this is also helpful.

I am wary of composability as an ideal, for reasons I stated here.  As you get more and more objects and methods involved in performing a given task, you’re allowing the actual code that performs that task to be spread more and more widely across the code base, and requiring the reader to trace back more steps in order to have full comprehension of what any line is actually doing.  And if you want to follow what the code is doing closely, you have to jump around nonlinearly more and more.  @gattsuru​ used the phrase “ravioli code” for this, and Googling it, it seems like other people have made the same complaint, e.g.:

I should have noted why I think that Ravioli Code is a bad thing (and hence that those who think it is good style are doing a disservice to their trade). The problem is that it tends to lead to functions (methods, etc.) without true coherence, and it often leaves the code to implement even something fairly simple scattered over a very large number of functions. Anyone having to maintain the code has to understand how all the calls between all the bits work, recreating almost all the badness of Spaghetti Code except with function calls instead of GOTO. It is far better to ensure that each function has a strong consistent description (e.g. “this function frobnicates the foobar”, which you should attach to the function somehow - in C, by a comment because there’s no stronger metadata scheme) rather than splitting it up into smaller pieces (“stage 1 of preparing to frobnicate the foo part of the foobar”, etc.) with less coherence. The principal reason why this is better is precisely that it makes the code easier overall to understand.

Of course, it makes sense to group things together if they actually tend to get re-used together (like having x and y coordinates be attributes of the same object), or to put some code in a function of its own if it forms a distinct conceptual block.  But my understanding is that the “ifs” in the previous sentence should be read as “if-and-only-ifs”; the point of abstractions like functions and objects is to group some things together as well as to separate them from other things, in order to exploit actual regularities of the task or conceptual domain in your code.

If, instead, you try to make everything as modular as possible all the time, you’re no longer making useful “these form a group, apart from these other things” distinctions; you’re just splitting everything up as finely as it possibly can be.

gattsuru:

nostalgebraist:

Can someone who knows about OOP give me some cultural context for something like Sandi Metz’s Four Rules?

1. Your class can be no longer than 100 lines of code.
2. Your methods can be no longer than five lines of code.
3. You can pass no more than four parameters and you can’t just make it one big hash.
4. When a call comes into your Rails controller, you can only instantiate one object to do whatever it is that needs to be done. And your view can only know about one instance variable.

I’ve never used Ruby at all, and I imagine that the people who do this stuff are doing very different kinds of programming than what I do.  But this sounds hellish, mostly because of rule #2.  The things I want my program to do are already complicated.  The last thing I want to do is force myself to add additional complexity by artificially dividing the task into many tiny pieces spread over many disparate bits of code, each of which will have its own name whose meaning I am guaranteed to forget later.

(I understand that rules like these are meant to build good habits and should not be literally applied all the time.  I just don’t understand how #2 is a good habit.)

There are a lot of tradeoffs involved, and they’re different in different environments.  Breaking methods up into smaller subroutines does have advantages, especially in more heavily object-oriented code – I’ve seen a C# UI-facing class tens of thousands of lines long, with individual methods spanning over a thousand lines.  That’s not just tricky to follow; even comprehending the scope limitations gets difficult, even though a similarly long pure data-ingestion piece wouldn’t be unusual or unreasonable.

Obviously, it’s possible to go too far the other way.  In data processing, splitting highly related actions into dozens of small packets is another failure mode, and one common among folk trained in Java.

And even if you avoid ravioli code, trying to force too much logic onto a single line can be much worse than the same actual process happening over several lines.  And deeper call stacks incur (usually though not always trivial) performance penalties, along with sometimes making logging more complicated.

However, that’s not the typical use case for Ruby, and especially not for Metz-focused Ruby code.  It’s very much marketed as a web, UI, and scripting language, rather than a bulk data processing, rapid ingestion, or analytics language.  Even though it can do those things too, most of the advice Metz gives is aimed at small teams, generally working in the entry-level application programming field.  Here, the tradeoffs strongly favor shorter code.

Likewise, Rule #3 can either have folk combine related variables into objects like structures (good, until you learn the difference between value and reference types the hard way) or produce a Global God Object (terrible!… unless you’re on a microcontroller).
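A minimal illustration of the reference-type surprise mentioned above, in Python (where user-defined objects are always passed by reference; the class here is my own toy example, not from the thread):

```python
# Bundling x/y into one object is convenient, but the object is a
# reference type: every name bound to it sees every mutation.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

origin = Point(0, 0)
alias = origin      # copies the reference, not the object...
alias.x = 5         # ...so this mutation is visible through both names
```

In a language with value-type structs (C#'s `struct`, say), the assignment would have copied the data instead, which is exactly the distinction people learn the hard way.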

The other part… this isn’t just meant to be for building good habits.  The Metz Rule #0 isn’t “unless you must”, but specifically “unless your pair agrees”.  These rules exist so that the exceptions must be known by the other programmers likely to review or support that code.  That’s obviously not a very useful rule for hobbyists or research programmers, but in entry-level web and app development it makes much more sense, especially as your peers are likely to have run into the leaky abstractions that are more common in complex code anyway.

The other Metz Rule #0 is that once you’re an experienced coder, you needs must make decisions yourself.  These rules are intended for people who don’t have the experience to be confident making design documents.

Thanks, this is helpful – especially the second-to-last paragraph.

(via gattsuru)

absurdseagull:

nostalgebraist:

Can someone who knows about OOP give me some cultural context for something like Sandi Metz’s Four Rules?

1. Your class can be no longer than 100 lines of code.
2. Your methods can be no longer than five lines of code.
3. You can pass no more than four parameters and you can’t just make it one big hash.
4. When a call comes into your Rails controller, you can only instantiate one object to do whatever it is that needs to be done. And your view can only know about one instance variable.

I’ve never used Ruby at all, and I imagine that the people who do this stuff are doing very different kinds of programming than what I do.  But this sounds hellish, mostly because of rule #2.  The things I want my program to do are already complicated.  The last thing I want to do is force myself to add additional complexity by artificially dividing the task into many tiny pieces spread over many disparate bits of code, each of which will have its own name whose meaning I am guaranteed to forget later.

(I understand that rules like these are meant to build good habits and should not be literally applied all the time.  I just don’t understand how #2 is a good habit.)

I am not a professional so take my thoughts with a grain of salt. #2 seems sensible in particular if you read it as:

“Each method should be morally 5 lines”

Or:
“Methods should be specialized and do exactly one thing.”

Keeping each method rather specialized and giving them sensible names helps with human readability. It makes it obvious when you could reuse code (so you get to fix it all at once). And smaller pieces of code are easier to debug.

Also it makes your code more structured so you can maintain it and the other people you’re working with can too.

Keeping each function small and specialized lets you cut down on coding time and subtle errors at the cost of planning beforehand.
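A toy illustration of the “fix it all at once” point above (all the names here are mine): two callers share one small, sensibly named helper, so a change to the format lands in exactly one place.

```python
# One specialized helper, reused by two callers. If the name format
# ever changes, there is a single place to fix it.
def full_name(last_first):
    # "Doe, Jane" -> "Jane Doe"
    last, first = last_first.split(", ")
    return f"{first} {last}"

def greeting(last_first):
    return f"Hello, {full_name(last_first)}!"

def mailing_label(last_first):
    return full_name(last_first).upper()
```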

Now it may be unnecessary if your project is small, but I dunno, it matches how I structure my thoughts anyways

I understand the “methods should do exactly one thing” principle.  I’m just having a hard time seeing how an extremely low line-number limit is going to enforce this successfully.  It’ll prevent you from having a method do more than one thing, but it will also force you to spread some things over multiple methods so that each does less than one thing.

@wirehead-wannabe mentioned line number limits being helpful on a project, and I can imagine them being helpful, particularly in domains where your program isn’t doing a lot of math.  (Several people mentioned web code being different from scientific code in this respect.)

But it seems like if the line number limit gets too low, you’ll have this problem where not only are you spreading individual tasks over several methods, you’re also treating lines of “actual” code (i.e. lines that don’t call a method) as a costly resource that must be spread as thinly as possible.  So for every line of code that does something you could have understood before the project began, you’re maximizing the number of lines that just pass the buck around to different abstractions in this system you’ve invented from scratch.

Thus most of your code isn’t going to be stuff you can look and say “OK, I know exactly what this is doing.”  For most of it, you’re only going to be able to say “I understand what this is doing insofar as I understand exactly what all of these abstractions do,” and when the abstractions were all made by a fallible human (you) and the code will be read by other people (and you in the future), this seems … really bad.  You lose the ability to say “okay, I know exactly what is going on at this point in the code” because almost nothing ever gets done without invoking lots of other methods and objects, all of which could conceivably be broken.

(via namelessdistribution)

Can someone who knows about OOP give me some cultural context for something like Sandi Metz’s Four Rules?

1. Your class can be no longer than 100 lines of code.
2. Your methods can be no longer than five lines of code.
3. You can pass no more than four parameters and you can’t just make it one big hash.
4. When a call comes into your Rails controller, you can only instantiate one object to do whatever it is that needs to be done. And your view can only know about one instance variable.

I’ve never used Ruby at all, and I imagine that the people who do this stuff are doing very different kinds of programming than what I do.  But this sounds hellish, mostly because of rule #2.  The things I want my program to do are already complicated.  The last thing I want to do is force myself to add additional complexity by artificially dividing the task into many tiny pieces spread over many disparate bits of code, each of which will have its own name whose meaning I am guaranteed to forget later.

(I understand that rules like these are meant to build good habits and should not be literally applied all the time.  I just don’t understand how #2 is a good habit.)

I’ve been frantically MATLABbing this week and I notice I keep typing things like “x - mean(x)” or “x / peak2peak(x)” or “x / norm(x)” and having to copy/paste x to produce these expressions was annoying

There are plenty of sensible solutions to this problem, like just defining functions that compute these things given x, or using anonymous functions or whatever.  But I kept craving syntax like

“x - mean(some_keyword)”

where some_keyword means “the thing on the other side of this expression” (or maybe “the previous thing in this expression,” to avoid having two of them mirror one another)

I’ve never seen this before and it doesn’t seem like a particularly good or useful idea, but I think the reason I’m craving it is that it would work like a pronoun, and I’m used to those from natural language.  It feels natural to be able to use a generic word for “the thing I just mentioned.”  Which is kinda interesting I guess
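The “plenty of sensible solutions” mentioned above would just be small named helpers. In plain Python rather than MATLAB (the function names are my own), something like:

```python
# Small helpers so the expression never has to repeat x.
def centered(xs):
    # x - mean(x)
    m = sum(xs) / len(xs)
    return [v - m for v in xs]

def normalized(xs):
    # x / norm(x)
    n = sum(v * v for v in xs) ** 0.5
    return [v / n for v in xs]
```

The MATLAB equivalent would be a pair of one-line functions or anonymous functions; either way the “pronoun” problem disappears because x is only written once, as an argument.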

nightpool asked: Yeah, I think the thing here is that this isn't really a Mac vs. Unix thing—it's a "really complicated, niche open source projects with complicated dependencies usually have really bad installation stories" thing, compounded by the fact that it's an academic/scientific project, so it's probably not going to be super well written or built with a focus on the end user to begin with.

Yeah, I realized after posting that my IBAMR example was almost “cheating” because it’s exactly the sort of thing that one would expect to see these problems with, if with anything.  It’s not like I’m in a world where niche open source projects tend to come with nice Mac OS binaries, after all.

Actually, after thinking more about it, I realized that my earlier post was really about several completely distinct issues, which can all be stated in a much more straightforward way:

(1) Things frequently go wrong when I compile things from source, and fixing these problems is frustrating and takes a long time.  To some extent this is just “computers are computers,” but I do wonder whether I’m doing something wrong that is making this happen more frequently than it does for the usual user.  (Or maybe I did a bunch of little things wrong in the past that are invisibly creating problems now, and I should back up all my media and stuff and start from a clean slate again.)

(2) I keep getting into this situation with dependencies that take up a lot of disk space.  Again, this makes more sense for scientific software, where the tool I want may be built on top of some big scientific toolkit that the user is assumed to already have/use.  I am curious whether there’s some way I can set things up to do more minimal installations if possible.  (I probably don’t need the visualization software for the thing your thing is built on top of.)

(3) If binaries are available, they may help me avoid the above problems.  I keep seeing things like “we have a binary for Ubuntu” or whatever – not just “for Linux” say – and I’m never sure what range of binaries I’d be able to use if I got a particular OS.  With the Mac OS, there’s just the one thing.

(Although another perspective might be “Mac OS is just another OS that a binary may exist for, and often doesn’t.”  But at least with Mac stuff I feel like I know the territory, I guess?)

togglesbloggle:

plain-dealing-villain:

togglesbloggle:

nostalgebraist:

This post is about me being a n00b and I hope you can enlighten me

I currently use the Mac OS and I already use the Unix command line all the time.  It seems sensible to try to think of myself as “someone who uses a type of Unix,” rather than “someone who uses the specific products whose pleasant aesthetics derive from Steve Jobs’ personal sense of Platonic ‘correctness’ and which are produced under (as far as I can tell) inhumane conditions even by the standards of the industries involved”

The biggest problem with this plan is that installing things which are specifically made for Steve Jobs’ private cosmos is painless, while installing things which are made for Unix users is painful

There are things I want to use, and they have dependencies.  Sometimes these dependencies take a long time to install and take up a great deal of disk space.  What I actually wanted was just whatever little bit of functionality in the dependency is used by the tool I want.  But I don’t know how to get just that, and so I end up setting aside gigabytes of infrastructure because somewhere in those gigabytes is a simple function that the tool wanted to call, or some object definitions that the author of the tool found convenient.

This is a problem even with things that are not compiled from source – say, a simple python script which won’t run unless I install a gigantic set of modules.  If I am compiling from source, other problems may arise.  It may not compile with my default compiler and I may have to try another one, or hunt down the combination of flags the thing wants.  It may have expectations about my directory structure which aren’t true.  I may have to help it deal with multiple existing versions of the same thing, some or all of which were installed by other simple tools that “depended on” them.

But when I get a native Mac installer for the tool, it is always small if the tool is small.  It will basically always work with no extra help from me, and it will not want to put lots of extra stuff on my hard drive.  You would think that compiling from source would result in smoother compatibility than with binaries, making everything work with my specific configuration.  But Mac-specific binaries always work perfectly and are always small for things that seem like they should be small.

Am I doing something wrong?  At the moment, if I get a “native Mac installer” I expect to be able to use the thing within 5 minutes and for the thing to take up almost no space if it is simple.  If I get a normal sort of thing from github which doesn’t care that I have a Mac, I prepare to potentially spend an afternoon installing it and to let it have a gig.  If I declare that I will have nothing to do with Steve Jobs’ turtlenecks-and-Foxconn world anymore, I will no longer get the former.  This feels wrong.  Is it actually natural?

The general rule of thumb is that for a home Linux machine, your root directory will almost never get above 25 gigs.  There are some exceptions, mostly for people that are trying to run an elaborate home server setup or work in a large number of different environments each with their own vast network of dependencies or something, but in general you can expect a fairly lean library.  (Linux boxes can of course get *absurdly* small, but that’s for custom situations and not general use.)

I have never used a Mac machine with a console habitually, so I don’t have much basis for comparison, but the difference may be that the Mac has a poor package manager?  On Ubuntu, for example, apt-get is the standard tool for downloading and installing programs, and usually makes good choices about dependencies.  Follow-up with apt-get autoremove and apt-get clean will pare down unused dependencies automatically and reliably, although I don’t think they have the kind of laser-like focus to get rid of every single unused function (that would be pretty scary, anyway!).  If Homebrew doesn’t handle cleanup as elegantly (or uses more of a shotgun approach to dependencies) this may be the root of your problem.

That said, I don’t think the Linux installation process will ever be as clean as the Mac one; walled gardens have their advantages, and a lot of the elegance will come from Apple’s strong control over APIs and the app ecosystem.  The Linux advantage is always going to be variety, not simplicity.  So yeah, I’d say your instincts are wrong.

Fun contextual story: I just upgraded my home box from Ubuntu 14 to 16 (which should tell you about where my level of expertise is, i.e. not much).  I partitioned home and root directories separately back when I set the machine up, and didn’t leave much space for root- a little under 14 gigs.  That’s been enough, but doesn’t leave me a whole lot of breathing room, and the upgrade process wanted 3 gigs of free space that I didn’t have.  Sensible cleanup only got me two and a half.  So I deleted all of Unity (that is, the graphical desktop environment; it’s a bit bloated and much loathed in the community), navigated the upgrade process from the terminal, and then reinstalled Unity in version 16.04 of the OS.  When I think about reasons to use Linux at home, it’s more tricks like that, rather than an ‘everything just works’ sort of feeling.

Mac OS has no package manager at all, unless you download Homebrew. Which every programmer who’s worked in a Mac shop does, but @nostalgebraist doesn’t program enough that it’s a guarantee.

That would do it! @nostalgebraist, you should certainly use a package manager if you don’t already. That will change your expectations for Steve-free computing for the better.

Ah, I see.  (Well, I sort of see.)  Thanks.

I have Homebrew and pip and Luarocks, and I use them, because I’ve been told to by installation instructions.  However, I now realize that I had not really known what they were, so thank you for alerting me.

For instance, I think I need to be making use of the cleaning functions?  But I am a n00b and need to be pointed to an explanation of exactly what these do and how they compare between managers.

For instance, I just spent 5 minutes on Google trying to ask what the Homebrew equivalent of “apt-get autoremove” is.  As far as I can tell, there isn’t one.  There is “brew cleanup”, which removes outdated packages and also cleans the cache (the latter is the equivalent of “apt-get clean” I think?), but that is not the same as removing unused dependencies that were automatically installed.  I also found this thread (only 25 days old), which states that Homebrew doesn’t track which packages were automatically installed and which were requested by the user.

So now I’m in the following situation: I know that I should be using package managers.  The reason for this is that there are certain core desirable things that a package manager will do for me, which I should (presumably) expect from anything that is called a “package manager.”  I read @togglesbloggle‘s helpful post and inferred “ah, the things that ‘apt-get clean’ and ‘apt-get autoremove’ do must be among these core desirable things.”  But now it seems that Homebrew can’t do what “apt-get autoremove” does.

What’s missing for me here is an explicit description of what these “core desirable things” are, and how to figure out what to do with a new package manager if presented with one.  For instance, even if I figure out Homebrew, I also want to understand package managers well enough to know, right away, what I ought to expect out of pip and LuaRocks.

P. S. this is a sidenote, but it’s amusing and illustrative of the difficulties I have when trying to make sense of all of these Unix installation complexities.  In the thread I linked above, one poster argues (I think??) that Homebrew shouldn’t have an equivalent of “apt-get autoremove” – in an almost beautifully high-context post which I think I can halfway make sense of, but which would require (at minimum) its own round of Googling to really get:

Should this be closed then? I don’t see much actual user benefit here other than possibly basically irrelevant disk space savings. The costs of inevitable reinstalls of the same build-time only deps over and over outweighs that ten-fold unless we’re adding some overwrought DSL for marking some build-time deps sufficiently “common” to merit “protection” from aggressive, needless cleaning behaviors, while others are relegated to a lesser, dispensable category. Given how broken build.with? can be, I’m also pretty sure any implementation of this is just going to break user installations unless we have the fabled declarative option system.


I’m trying to remember some of the sorts of issues I’ve had (the ones that have consumed afternoons).  Unfortunately I’d only be able to reconstruct most of the details by actually trying to install the damn things again.  But here are some things I remember:

(Cut for length and limited audience)


(via togglesbloggle)