Municipal Solar Power Production

Sid Ghodke
Sep 13, 2017
Panorama shot of San Francisco, taken from Twin Peaks

I wanted to see how solar power production is affected by seasonality and cloud cover. Thankfully, the California Independent System Operator (CAISO) publishes an hour-by-hour report of power production from each municipal power station in California. (A sample report can be found here.)

Since the reports are generated and saved daily, I wrote a quick wget command to download all the reports for 2016. Now I have the production numbers for all municipal power facilities in California. 🙌
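The post doesn't show the wget command itself. A minimal Python sketch of the same idea generates one URL per day of 2016 and prints them one per line, so the list can be handed to wget. The URL template below is a placeholder, not the real report location; swap in the actual address of the daily reports.

```python
from datetime import date, timedelta

# Placeholder (assumption): replace with the real location of the daily report files.
URL_TEMPLATE = "https://example.com/reports/{ymd}.txt"


def daily_report_urls(year, template=URL_TEMPLATE):
    """Return one report URL for every calendar day of `year`, in order."""
    day = date(year, 1, 1)
    urls = []
    while day.year == year:
        urls.append(template.format(ymd=day.strftime("%Y%m%d")))
        day += timedelta(days=1)
    return urls


if __name__ == "__main__":
    # One URL per line, ready for `wget -i`.
    print("\n".join(daily_report_urls(2016)))
```

Redirecting the output to a file (say `urls.txt`) and running `wget -i urls.txt` would fetch every file in the list. 2016 is a leap year, so the list has 366 entries.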

Next up was to parse and transform the raw data into a date-indexed pandas DataFrame. The raw report files contain both renewable and non-renewable production, so hard-coding the section titles that repeat in every file as search terms made it easy to split each report into separate tables.
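Here is a sketch of that search-term approach. It assumes each report is a tab-separated text file in which every table sits under a fixed title line, starts with a header row whose first cell is `Hour`, and has one row per hour numbered 1 to 24. The title strings, that layout, and the `parse_section` helper are assumptions for illustration, not the verified report format.

```python
import io

import pandas as pd

# Assumed section titles, used as hard-coded search terms (illustrative, not verified).
RENEWABLES_TITLE = "Hourly Breakdown of Renewable Resources (MW)"
TOTALS_TITLE = "Hourly Breakdown of Total Production by Resource Type (MW)"


def parse_section(text, title, day):
    """Parse the tab-separated table that follows `title` into a date-indexed DataFrame.

    Raises StopIteration if `title` does not appear in `text`.
    """
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.strip() == title)
    table = []
    for line in lines[start + 1:]:
        cells = line.strip().split("\t")
        if not table:
            # Skip anything between the title and the header row.
            if cells[0] == "Hour":
                table.append(line.strip())
            continue
        if not cells[0].isdigit():
            break  # the first non-numeric row (e.g. a blank line) ends the table
        table.append(line.strip())
    df = pd.read_csv(io.StringIO("\n".join(table)), sep="\t")
    # Assumption: hours are numbered 1..24, so hour 1 is stamped 00:00 on `day`.
    hours = df.pop("Hour").to_numpy() - 1
    df.index = pd.Timestamp(day) + pd.to_timedelta(hours, unit="h")
    df.index.name = "timestamp"
    return df
```

Parsing each day's file this way and calling `pd.concat(frames).sort_index()` on the results would give a single date-indexed frame for the whole year.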

The next step was to plot the data. To validate that the data was parsed and transformed correctly, I generated a quick scatterplot.

Plot made with Apple’s Numbers (x-axis is timestamp in seconds, y-axis is power production in megawatts)

Cool, right off the bat we can see that there are 24 data points for any given day (one for each hour of production). Interestingly, there seem to be bands and patterns occurring on nearby days. We can also see some empty regions, where production appears to have shifted. In general, there appear to be peaks and valleys at different times of the year.
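The 24-points-per-day observation can also be checked in code before any plotting. This sketch assumes the date-indexed frame from the parsing step (one row per hour on a `DatetimeIndex`); both helper names are hypothetical. Days with more or fewer than 24 rows, or days missing entirely, are candidates for the odd bands and empty regions in the scatterplot.

```python
import pandas as pd


def days_with_unexpected_counts(df, expected=24):
    """Return the row count for every date that doesn't have exactly `expected` rows."""
    counts = df.groupby(df.index.date).size()
    return counts[counts != expected]


def missing_days(df):
    """Return the calendar days between the first and last reading that have no rows at all."""
    days = df.index.normalize()
    all_days = pd.date_range(days.min(), days.max(), freq="D")
    return all_days.difference(days.unique())
```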

Let's now focus only on photovoltaic (PV) production. Bringing this data into matplotlib, we can apply some fancier plotting techniques. If we divide the day into 4 equal parts, we can use a different marker shape for each part to show when in the day a data point was collected. (I filtered out midnight to 5:59 am, since the sun is not in the sky then.)
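The quarter-of-day split could be sketched like this with pandas and matplotlib. The `SOLAR PV` column name, the marker shapes, and the helper names are assumptions. Integer division of the hour by 6 gives quarter 0 (00:00 to 05:59), which is dropped, and then quarters 1, 2 and 3.

```python
import pandas as pd

# Marker shape and legend label per 6-hour quarter of the day (shapes are arbitrary choices).
QUARTER_STYLE = {
    1: ("^", "06:00-11:59"),
    2: ("o", "12:00-17:59"),
    3: ("s", "18:00-23:59"),
}


def add_day_quarter(df):
    """Drop midnight to 05:59 and tag each remaining row with its quarter of the day (1-3)."""
    out = df[df.index.hour >= 6].copy()
    out["quarter"] = out.index.hour // 6  # 06-11 -> 1, 12-17 -> 2, 18-23 -> 3
    return out


def plot_pv_by_quarter(df, column="SOLAR PV"):
    """Scatter PV output over the year, with one marker shape per quarter of the day."""
    import matplotlib.pyplot as plt  # local import: add_day_quarter() works without matplotlib

    fig, ax = plt.subplots(figsize=(14, 5))
    for quarter, group in add_day_quarter(df).groupby("quarter"):
        marker, label = QUARTER_STYLE[quarter]
        ax.scatter(group.index, group[column], marker=marker, s=6, label=label)
    ax.set_ylabel("Power production (MW)")
    ax.legend(title="Time of day")
    return fig, ax
```

Keeping the labeling step separate from the plotting step makes it easy to check the filtering on its own before drawing anything.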

Solar Photovoltaic Power Production for 2016 (Megawatts)

I added markers for the equinoxes and solstices to highlight the seasons and to study the effects of the shortest and longest days of the year.

  • With the new marker shapes, we can see the effect of summer’s late sunsets on power production from 1800 to 2400 hrs around the summer solstice. The trend also traces a nice curve, peaking just after the solstice, when sunsets are at their latest.
  • Morning power production is scattered nearly randomly in winter and spring, but seems to merge into the top power band in summer.
  • And finally, peak PV power production in winter is nearly 2000 MW lower than the peak in summer.
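One way to reproduce both the season markers and the winter-versus-summer peak comparison is to treat the 2016 equinox and solstice dates as season boundaries. This sketch assumes astronomical seasons, UTC calendar dates, a `SOLAR PV` column, and an existing matplotlib `Axes` to draw the markers on; the post doesn't say exactly how its seasons or peaks were defined, and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd

# 2016 equinox and solstice dates (UTC calendar dates), used as season boundaries.
BOUNDARIES = pd.DatetimeIndex(["2016-03-20", "2016-06-20", "2016-09-22", "2016-12-21"])
BOUNDARY_NAMES = ["March equinox", "June solstice", "September equinox", "December solstice"]
# searchsorted() returns 0..4; positions 0 and 4 both fall in winter.
SEASONS = np.array(["winter", "spring", "summer", "autumn", "winter"])


def season_of(index):
    """Map each timestamp to its astronomical season for 2016."""
    return SEASONS[BOUNDARIES.searchsorted(index, side="right")]


def seasonal_peaks(df, column="SOLAR PV"):
    """Peak hourly production (MW) within each season."""
    return df[column].groupby(season_of(df.index)).max()


def add_season_markers(ax):
    """Draw a dashed vertical line and a label at each equinox and solstice on `ax`."""
    for ts, name in zip(BOUNDARIES, BOUNDARY_NAMES):
        ax.axvline(ts, color="gray", linestyle="--", linewidth=1)
        ax.text(ts, 0.98, name, rotation=90, ha="right", va="top",
                fontsize=8, transform=ax.get_xaxis_transform())
```

With a full year of data, `peaks["summer"] - peaks["winter"]` (where `peaks = seasonal_peaks(df)`) gives the kind of seasonal gap described in the last bullet.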

Lastly, I was curious to see what PV production would look like broken down by hour, so I used a box-and-whisker plot to show the distribution of power output for each hour. Combined into one chart, it looked like this.

Nice!

  • We can see that, on average, the sun comes up around 7 am across all seasons.
  • The distribution of production in the morning is relatively compact, with the majority of data points less than 1 MW away from the average.
  • Only the mid-day distribution is more compact. At mid-day the distribution is skewed: the lower bound is still about 1 MW away from the average, but the upper bound is much closer to it (about 0.5 MW).
  • Late afternoon (around 1500 hrs) is where we start to see a drop-off in power production, visible as the lower bound falling away.
  • Finally, the change in sunset times shows up here as well: the broader distribution of power production from 1700 hrs onward reflects the influence of longer summer days.
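The per-hour observations above can also be computed rather than read off the chart. This sketch assumes the same date-indexed frame and `SOLAR PV` column, and treats the 25th and 75th percentiles as the box's lower and upper bounds when measuring distance from the mean; the post's notion of "bound" may differ. Function names are hypothetical.

```python
import pandas as pd


def hourly_distribution(df, column="SOLAR PV"):
    """Summarize production for each hour of the day across all days in `df`."""
    by_hour = df[column].groupby(df.index.hour)
    summary = by_hour.describe()[["min", "25%", "50%", "mean", "75%", "max"]]
    # Assumption: the box's lower/upper bounds are the 25th/75th percentiles.
    summary["below_mean"] = summary["mean"] - summary["25%"]
    summary["above_mean"] = summary["75%"] - summary["mean"]
    summary.index.name = "hour"
    return summary


def plot_hourly_boxplot(df, column="SOLAR PV"):
    """Draw one box-and-whisker per hour of the day."""
    import matplotlib.pyplot as plt  # local import: hourly_distribution() works without matplotlib

    series = df[column]
    hours = sorted(series.index.hour.unique())
    groups = [series[series.index.hour == h].to_numpy() for h in hours]
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.boxplot(groups, positions=hours)
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Power production (MW)")
    return fig, ax
```

Comparing `below_mean` and `above_mean` row by row is one way to quantify the mid-day skew described above.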

In conclusion, this post summarized some interesting observations from visualizing solar report data. There is still a lot that could be done with this dataset; a few ideas come to mind:

  • Integrate weather reports to better estimate total solar output (removing cloud and weather influences from the data)
  • Build a model to predict power production given the date, time, and city in California
  • Gather data for the last 3 years to see how solar output has changed


Engineer at First Opinion App. Working on Algorithms and TeleHealth.